For example, the duration of the fetch is a combination of network time of the request reaching the server, server processing time, and network time of the response. Each and every one of these steps "leaks" information both about the client and the server.
For example, if the total duration is very small (say, <10ms) then we can reasonably intuit that we might be talking to a local cache, which means that the client has previously fetched this resource. Alternatively, if the duration is slightly higher (say, <50ms) then we can reasonably guess that the client is on a low-latency network (e.g. fast 4G or WiFi). We can also append random data to the URL to make it unique and rule out the various HTTP caches along the way. From there, we can try making more requests to the server and observe how the fetch duration changes to infer change in server processing times and/or larger responses being sent to the client.
If we're really crafty, we can also use the properties of the network transport like CWND induced roundtrips in TCP (see TCP Slow Start), and other quirks of local network configuration, as additional signals to infer properties (e.g. size) of the response—see TIME, HEIST attacks. If the response is compressed and also happens to reflect submitted data, then there is also the possibility of using a compression oracle attack (see BREACH) to extract data from the response.
Each and every step in the fetch process—from the client generating the request and putting it on the wire, through the network hops to the server, the server processing time, the response properties, and the network hops back to the client—"leaks" information about the properties of the client, network, server, and response. This is not a bug; it's a fact of life. Borrowing an explanation from our physicist friends: putting a system to work amounts to extracting energy from it, which we can then measure and interrogate to learn facts about said system.
Eyes glazing over yet? The practical implication is that if the necessary server precautions are missing, the use of the above techniques can reveal private information about you and your relationship to that server - e.g. login status, group affiliation, and more. This requires a bit more explanation…
The fact that we can use side-channel information, such as the duration of a fetch, to extract information about the response is not, by itself, all that useful. After all, if I give you a URL you can just use your own HTTP client to fetch it and inspect the bytes on the wire. However, what does make it dangerous is if you can co-opt my client (my browser) to make an authenticated request on my behalf and inspect the (opaque) response that contains my private content. Then, even if you can't access the response directly, you can observe any of the aforementioned properties of the fetch and extract private information about my client and the response. Let's make it concrete…
Say there is kittens.com, on which I have an account to pin my favorite images: when I sign in, kittens.com responds with a private token that is used to authenticate me on future visits. Later, I head to shady.com to view more pictures of kittens... Unbeknownst to me, while I'm browsing shady.com, the page issues background requests on my behalf to kittens.com with the goal of attempting to learn something about my status on said site. How does shady.com make a credentialed request? A simple image element is sufficient:
<img src="https://kittens.com/favorites" alt="Yay authenticated kittens!">
<!-- Image element is not the only mechanism with this behavior, others
include script, object, video, etc. Also, there is JavaScript... -->
<script>
var img = new Image();
img.src = "https://kittens.com/favorites"
</script>
The browser processes the image element, initializes a request for https://kittens.com/favorites, attaches my HTTP cookies associated with kittens.com, and dispatches the request. The target server (kittens.com) sees a valid authentication cookie and dutifully sends back the HTML response containing my favorite kittens. Of course, the image tag will choke on the HTML and fire an error callback, but that doesn't matter: even though we can't inspect the response, we can still learn a lot by observing the timing of the authenticated request-response flow.
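To make this concrete, here is a hedged sketch of how shady.com might time the opaque request. The thresholds mirror the earlier intuition and are illustrative assumptions, not calibrated values; the measurement half is browser-only and shown in comments.

```javascript
// Illustrative timing classifier: the thresholds are assumptions, not
// calibrated values (a real attack would calibrate against the target).
function classifyDuration(ms) {
  if (ms < 10) return "local cache";               // resource fetched before
  if (ms < 50) return "low-latency network";       // e.g. WiFi or fast 4G
  return "higher-latency network or slow server";
}

// In the browser, the attacker page could drive it like this:
//
//   var img = new Image();
//   var start = performance.now();
//   img.onerror = function () {
//     // onerror fires even though the HTML response is opaque to us
//     console.log(classifyDuration(performance.now() - start));
//   };
//   // random suffix busts intermediate HTTP caches
//   img.src = "https://kittens.com/favorites?x=" + Math.random();
```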
With the benefit of a few decades of experience under our belt, and if we were rebuilding the web platform from scratch, we probably wouldn't allow such "no-cors" authenticated requests without explicit CORS opt-in from the server, just as we do today for XMLHttpRequest and the Fetch API. Alas, that would be a major breaking change, so it's off the table. However, not all is lost: kittens.com can deploy additional logic to protect itself, and its users, against such cross-origin attacks.
The core issue is that the browser attaches the target origin's cookies to "no-cors" requests regardless of the origin that initiates the request. In theory, the target origin could look at the Referer header, but the attacker can hide the initiating origin—e.g. via a no-referrer policy. Similarly, the Origin header is only sent on CORS requests, so that won't help either. However, SameSite cookies give us the exact behavior we want:
Here, we update [RFC6265] with a simple mitigation strategy that allows servers to declare certain cookies as "same-site", meaning they should not be attached to "cross-site" requests… Note that the mechanism outlined here is backwards compatible with the existing cookie syntax. Servers may serve these cookies to all user agents; those that do not support the "SameSite" attribute will simply store a cookie which is attached to all relevant requests, just as they do today.
SameSite cookies have two modes: "strict" and "lax". In strict mode, cookies are not sent on cross-site top-level navigations, which offers strong protection but requires some additional deployment considerations. In lax mode, cookies are sent for top-level navigations—e.g. navigations initiated by <a> elements, window.open(), <link rel=prerender>—which offers reasonable protection. Do read the IETF spec; it provides good guidance.
HTTP/1.1 200 OK
...
Set-Cookie: SID=31d4d96e407aad42; SameSite=Strict
Using our example above, if kittens.com sets the SameSite attribute on its authentication cookie, then the image request initiated by shady.com would not contain the authentication cookie, due to the mismatch between the initiating origin and the origin that set the cookie, and would result in a generic unauthenticated response—e.g. a redirect to a login page. If you're kittens.com, enabling SameSite cookies should be a no-brainer.
More generally, if your site or service does not intentionally provide cross-origin resources (e.g. embeddable widgets, site plugins, etc.), then you should use SameSite cookies as your default.
SameSite cookies are supported in Chrome (since M51) and Opera 39, and are under consideration in Firefox. Let's hope the other browsers are fast followers. Last but not least, it's worth noting that, as a user, you can also block third-party cookies in your browser to protect yourself from this type of cross-origin attack.
Maybe there is. There is an infinite supply of reasons why the application can fall off the fast path: overloaded networks and servers, transient network routing issues, device throttling due to energy or heat constraints, competition for resources with other processes on the user's device, and the list goes on and on. It is impossible to anticipate all the edge cases that can knock our applications off the fast path, but one thing we know for certain: they will happen. The question is, how are you going to deal with it?
Carving out the fast path is not enough. We need to make our applications resilient.
Resilient applications provide guardrails that protect our users from the inevitable performance failures. They anticipate these problems ahead of time, have mechanisms in place to detect them, know how to adapt to them at runtime, and as a result, are able to deliver a reliable user experience despite these complications.
I won't rehash every point in the video, but let's highlight the key themes:
(9m3s) Seemingly small amounts of performance variability in critical components quickly add up to create less-than-ideal conditions. We must design our systems to detect and deal with such cases—e.g. set explicit SLAs on all requests and specify upfront how violations will be handled.
(16m28s) The "performance inequality" gap is growing. There are two market forces at play: a race for features and performance, and high demand for lower prices. These are not entirely at odds; cheap devices are also getting faster, but the flagships are racing ahead at a much faster pace.
(19m45s) "Fast" devices show spectacular peak performance in benchmarks, but real-world performance is more complicated: we often have to trade off raw performance against energy costs and thermal constraints, compete for shared resources with other applications, and so on.
(23m35s) Mobile networks provide an infinite supply of performance entropy, regardless of the continent, country, and provider—e.g. the chances of a device connecting to a 4G network in some of the largest European countries are effectively a coin flip; just because you "have a signal" doesn't mean the connection will succeed; see "Resilient Networking".
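For example, one lightweight way to act on the first point (explicit SLAs on all requests) is to race each request against an explicit budget and handle violations deliberately. A minimal sketch, where the budget value and the fallback behavior are illustrative assumptions:

```javascript
// Race any promise-returning operation against an explicit time budget.
// If the budget is exceeded, fail fast so the app can degrade gracefully
// instead of hanging indefinitely.
function withSLA(promise, budgetMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("SLA violated: " + budgetMs + "ms budget exceeded")),
      budgetMs
    );
  });
  // Clear the timer so a lost race doesn't leave a dangling rejection.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (browser or Node 18+); endpoint and budget are placeholders:
//   withSLA(fetch("/api/critical"), 2000)
//     .then(handleResponse)
//     .catch(showFallbackUI); // explicit, pre-planned degraded experience
```

With fetch specifically, the same idea can be expressed with AbortController so the underlying request is actually cancelled rather than just abandoned.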
If we ignore the above and only optimize for the fast path, we shouldn't be surprised when the application goes off the rails, and our users complain about unreliable performance. On the other hand, if we accept the above as "normal" operational constraints of a complex system, we can engineer our applications to anticipate these challenges, detect them, and adapt to them at runtime (31m39s):
We're missing primitives that enable control over how and where CPU, GPU, and network resources are allocated by the browser. To the browser, all scripts look the same. To the developer, some are more important than others. Today, the web platform lacks the tools to bridge this gap, and that's at least one reason why delivering reliable performance is often an elusive goal for many.
Conceptually, the above problem is nothing new. For example, Linux control groups (cgroups) address the very same issues "higher up" in the stack: multiple processes compete for a finite number of available resources on the device, and cgroups provide a mechanism by which resource allocation (CPU, GPU, memory, network, etc.) can be specified and enforced at a per-process level - e.g. this process is allowed to use at most 10% of the CPU and 128MB of RAM, is rate-limited to 500Kbps of peak bandwidth, and is only allowed to download 10MB in total.
The problem is that we, as site developers, have no way to communicate and specify similar policies for resources that run on our sites. Today, including a script or an iframe gives it the keys to the kingdom: these resources execute with the same priority and with unrestricted access to the CPU, GPU, memory, and the network. As a result, the best we can do is cross our fingers and hope for the best.
As a thought experiment, it may be worth considering what a cgroups-like policy could look like in the browser, and what we would want to control. What follows is a handwavy sketch, based on frequent performance failure cases found in the wild and conversations with teams that have found themselves in these types of predicaments:
<!-- "background" group should receive low CPU and network priority
and consume at most 5% of the available CPU and network resources -->
<meta http-equiv="cgroup" name="background"
content="cpu-share 0.05; cpu-priority low;
net-share 0.05; net-priority low;">
<!-- "app" group should receive high CPU priority and be allowed to
consume up to 80% of available CPU resources (don't hog all of CPU),
but be allowed to consume all of the available network resources -->
<meta http-equiv="cgroup" name="app"
content="cpu-share 0.8; cpu-priority high;
net-share 1.0; net-priority high">
<!-- "ads" group should receive at most 20% of the cpu and have lower
scheduling and network priority then "app" content. -->
<meta http-equiv="cgroup" name="ads"
content="cpu-share 0.2; cpu-priority medium;
net-share 0.8; net-priority medium">
...
<!-- assign following resources to "app" group -->
<link cgroup="app" rel="stylesheet" href="/style.css">
<script cgroup="app" src="/app.js" async></script>
<!-- assign following resources to "ads" group -->
<script cgroup="ads" src="/ads-manager.js" async></script>
<iframe cgroup="ads" src="//3rdparty.com/widget"></iframe>
<!-- assign following resources to "background" group -->
<script cgroup="background" src="analytics.js" async></script>
The above is not an exhaustive list of plausible directives; don't fixate on the syntax. The key point, and question, is whether it would be useful—both to site developers and browser developers—to have such annotations communicate the preferred priorities and resource allocation strategy on their page - e.g. some scripts are more important than others, some network fetches should have lower relative priority, and so on.
Well, it may not be able to enforce them, in the strict sense of the word. For example, if a "background" script is scheduled and decides to monopolize the renderer thread and run for 20 frames, there isn't much that the runtime can do—today, at least. However, the runtime can use the provided information to decide which callback or function to schedule next, or how to prioritize the loading of resources. Some browsers may be able to do a better job of enforcing such policies than others, but even small scheduling optimizations can yield significant user-visible wins. Today, the browser is running blind.
Further, once the browser knows the "desired allocation", it can flag and warn the developer when there is a mismatch at runtime - e.g. it can fire events via PerformanceObserver to notify the app of violations, allowing the developer to gather and act on this data. In effect, this could be the first step towards enabling attribution and visibility into the real-world runtime performance and impact of various resources.
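PerformanceObserver already gives us this shape of API for real entry types such as "longtask"; a hypothetical "cgroup-violation" entry type could be consumed the same way. A sketch, with the aggregation split out so it is easy to follow:

```javascript
// Aggregate total duration by entry name, e.g. total blocking time
// attributable to each script or group, for later attribution.
function summarize(entries) {
  const byName = {};
  for (const e of entries) {
    byName[e.name] = (byName[e.name] || 0) + e.duration;
  }
  return byName;
}

// Browser-only wiring: "longtask" is a real entry type today; a
// "cgroup-violation" entry type is an assumption, not a shipped API.
if (typeof PerformanceObserver !== "undefined") {
  new PerformanceObserver(function (list) {
    console.log(summarize(list.getEntries()));
  }).observe({ entryTypes: ["longtask"] });
}
```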
Perhaps an idea worth exploring?
Except, what is an "average page", exactly? Intuitively, it is a page that is representative of the web at large in its payload size, distribution of bytes between different content types, etc. More technically, it is a measure of central tendency of the underlying distribution - e.g. for a normal distribution the average is the central peak, with 50% of values greater and 50% smaller than its value. Which, of course, raises the question: what is the shape and type of the distribution for transferred bytes, and does it match this model? Let's plot the histogram and the CDF plots...
Let's start with the obvious: the transfer size is not normally distributed, there is no meaningful "central value", and talking about the mean is misleading, if not deceiving - see "Bill Gates walks into a bar...". We need a much richer and more nuanced statistical vocabulary to capture what's going on here, and an even richer set of tools and methods to analyze how these values change over time. The "average page" is a myth.
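To see why, consider a toy example with made-up page weights: a single heavy outlier drags the mean far above what a "typical" page weighs, while the median barely moves.

```javascript
// Mean of a list of numbers.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Median: middle value of the sorted list (average of the two middle
// values for even-length lists).
function median(xs) {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Nine "typical" page weights (KB, made-up) plus one obese outlier:
const pagesKB = [300, 400, 500, 600, 700, 800, 900, 1000, 1100, 20000];
console.log(mean(pagesKB));   // 2630 — the misleading "average page"
console.log(median(pagesKB)); // 750 — what a typical page actually weighs
```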
Coming up with a small set of descriptive statistics for a dataset is hard, and attempting to reduce a dataset as rich as HTTP Archive down to a single one is an act of folly. Instead, we need to visualize the data and start asking questions.
For example, why are some pages so heavy? A cursory look shows that the heaviest ~3% by page weight, both for desktop (>7374KB) and mobile (>4048KB), are often due to a large number of (and/or heavy) images. Emphasis on often, because a deeper look at the most popular content types shows outliers in each and every category. For example, plotting the CDFs for desktop pages yields:
We have pages that fetch tens of megabytes of HTML, images, video, and fonts, as well as high single-digit megabytes of JavaScript and CSS. Each of these "obese" outliers is worth digging into, but we'll leave that for a separate investigation. Let's compare this data to the mobile dataset.
Lots of outliers as well, but the tails for mobile pages are not nearly as long. This alone explains much of the dramatic "average page" difference (desktop: 2227KB, mobile: 1253KB) — averages are easily skewed by a few large numbers. Focusing on the average leads us to believe that mobile pages are significantly "lighter", whereas in reality all we can say so far is that the desktop distribution has a longer tail with much heavier pages.
To get a better sense for the difference in distributions between the desktop and mobile pages, let's exclude the heaviest 3% that compress all of our graphs and zoom in on the [0, 97%] interval:
Mobile pages do appear to consume fewer bytes. For example, a 1000KB budget would allow the client to fully fetch ~38% of desktop pages vs. ~54% of mobile pages. However, while the savings for mobile pages are present across all content types, the absolute differences for most of them are not drastic. Most of the total byte difference is explained by fewer image bytes. Structurally, mobile pages are not dramatically different from desktop pages.
Comparing the CDFs against the year prior shows that transfer sizes for most content types have increased for both desktop and mobile pages. However, there are some unexpected and interesting results as well:
In terms of bytes fetched, for everything but images, mobile pages are roughly a year behind their desktop counterparts. Intuitively, this makes sense: just because we're working with a smaller screen doesn't mean the required functionality is less, or less complex.
My goal here is to raise questions, not to provide answers; this is a very shallow analysis of a very rich dataset. For a deeper and more hands-on look at this data, take a look at my Datalab workbook. Better yet, clone it, run your own analysis, and share your results! If we want to talk about the trends, outliers, and their causes on the web, then we need to understand this data at a much deeper level.
Unfortunately, many web applications get this wrong because they fail to account for the mobile lifecycle: they listen for the wrong events, which may never fire, or ignore the problem entirely at the high cost of a poor user experience. To be fair, the web platform doesn't make this easy, exposing (too) many different events: visibilityState, pageshow, pagehide, beforeunload, unload. Which should we use, and when?
You cannot rely on the pagehide, beforeunload, and unload events to fire on mobile platforms. This is not a bug in your favorite browser; this is due to how all mobile operating systems work. An active application can transition into a "background state" via several routes:

Once the application has transitioned to the background state, it may be killed without any further ceremony - e.g. the OS may terminate the process to reclaim resources, or the user can swipe away the app in the task manager. As a result, you should assume that "clean shutdowns" that fire the pagehide, beforeunload, and unload events are the exception, not the rule.
To provide a reliable and consistent user experience, both on desktop and mobile, the application must use the Page Visibility API and execute its session save and restore logic on every visibilitychange event. This is the only event your application can count on.
// query current page visibility state: prerender, visible, hidden
var pageVisibility = document.visibilityState;
// subscribe to visibility change events
document.addEventListener('visibilitychange', function() {
// fires when user switches tabs, apps, goes to homescreen, etc.
if (document.visibilityState == 'hidden') { ... }
// fires when app transitions from prerender, user returns to the app / tab.
if (document.visibilityState == 'visible') { ... }
});
If you're counting on unload to save state, record and report analytics data, and execute other relevant logic, then you're missing a large fraction of mobile sessions where unload will never fire. Similarly, if you're counting on the beforeunload event to prompt the user about unsaved data, then you're ignoring that "clean shutdowns" are the exception, not the rule.
Use the Page Visibility API and forget that the other events even exist. Treat every transition to visible as a new session: restore previous state, reset your analytics counters, and so on. Then, when the application transitions to hidden, end the session: save user and app state, beacon your analytics, and perform all other necessary work.
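As a sketch of the "end the session on hidden" pattern, where the /analytics endpoint and payload shape are assumptions for illustration:

```javascript
// Serialize whatever the app needs to end (and later restore) a session.
function sessionPayload(state) {
  return JSON.stringify({ endedAt: Date.now(), state: state });
}

// Browser-only wiring: sendBeacon queues the data reliably even as the
// page is being backgrounded, unlike an async XHR fired from unload.
if (typeof document !== "undefined") {
  document.addEventListener("visibilitychange", function () {
    if (document.visibilityState === "hidden") {
      navigator.sendBeacon("/analytics", sessionPayload({ page: location.pathname }));
    }
  });
}
```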
In the long term, all you need is the Page Visibility API. As of today, you will have to augment it with one other event — pagehide, to be specific — to account for the "when the page is being unloaded" case. For the curious, here's a full matrix of which events fire in each browser today (based on my manual testing):
- visibilitychange works reliably for task-switching on mobile platforms.
- beforeunload is of limited value, as it only fires on desktop navigations.
- unload does not fire on mobile, nor on desktop Safari.

The good news is that Page Visibility reliably covers task-switching scenarios across all platforms and browser vendors. The bad news is that, today, Firefox is the only implementation that fires the visibilitychange event when the page is unloaded — there are open Chrome, WebKit, and Edge bugs to address this. Once those are resolved, visibilitychange is the only event you'll need to provide a great user experience.
Modern browsers try their best to anticipate what connections the site will need before the actual request is made. By initiating early "preconnects", the browser can set up the necessary sockets ahead of time and eliminate the costly DNS, TCP, and TLS roundtrips from the critical path of the actual request. That said, as smart as modern browsers are, they cannot reliably predict all the preconnect targets for each and every website.
The good news is that we can — finally — help the browser; we can tell the browser which sockets we will need ahead of initiating the actual requests via the new preconnect hint shipping in Firefox 39 and Chrome 46! Let's take a look at some hands-on examples of how and where you might want to use it.
Your application may not know the full resource URL ahead of time due to conditional loading logic, UA adaptation, or other reasons. However, if the origin from which the resources are going to be fetched is known, then a preconnect hint is a perfect fit. Consider the following example with Google Fonts, both with and without the preconnect hint:
In the first trace, the browser fetches the HTML and discovers that it needs a CSS resource residing on fonts.googleapis.com. With that downloaded, it builds the CSSOM, determines that the page will need two fonts, and initiates requests for each from fonts.gstatic.com — first, though, it needs to perform the DNS, TCP, and TLS handshakes with that origin; once the socket is ready, both requests are multiplexed over the HTTP/2 connection.
<link href='https://fonts.gstatic.com' rel='preconnect' crossorigin>
<link href='https://fonts.googleapis.com/css?family=Roboto+Slab:700|Open+Sans' rel='stylesheet'>
In the second trace, we add the preconnect hint to our markup, indicating that the application will fetch resources from fonts.gstatic.com. As a result, the browser begins the socket setup in parallel with the CSS request, completes it ahead of time, and allows the font requests to be sent immediately! In this particular scenario, preconnect removes three RTTs from the critical path and eliminates over half a second of latency.
Note the crossorigin attribute on the preconnect hint: font resources are fetched in anonymous CORS mode, and the browser maintains a separate pool of sockets for this mode.
In addition to declaring preconnect hints via HTML markup, we can also deliver them via an HTTP Link header. For example, to achieve the same preconnect benefits as above, the server could have delivered the preconnect hint without modifying the page markup - see below. The Link header mechanism allows each response to indicate to the browser which other origins it should connect to ahead of time; for example, included widgets and dependencies can help optimize performance by indicating which other origins they will need, and so on.
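For instance, reusing the fonts origin from the earlier example, the response header could look like this (a sketch of the Link header syntax, not output captured from a real server):

```http
Link: <https://fonts.gstatic.com>; rel=preconnect; crossorigin
```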
We don't have to declare all preconnect origins upfront. The application can invoke preconnects in response to user input, anticipated activity, or other user signals with the help of JavaScript. For example, consider the case where an application anticipates the likely navigation target and issues an early preconnect:
function preconnectTo(url) {
var hint = document.createElement("link");
hint.rel = "preconnect";
hint.href = url;
document.head.appendChild(hint);
}
The user starts on jsbin.com; at the ~3.0 second mark the page determines that the user might be navigating to engineering.linkedin.com and initiates a preconnect for that origin; at the ~5.0 second mark the user initiates the navigation, and the request is dispatched without blocking on DNS, TCP, or TLS handshakes — nearly a second saved for the navigation!
Preconnect is an important tool in your optimization toolbox. As the above examples illustrate, it can eliminate many costly roundtrips from your request path — in some cases reducing request latency by hundreds or even thousands of milliseconds. That said, use it wisely: each open socket incurs costs on both the client and server, and you want to avoid opening sockets that might go unused. As always: apply, measure real-world impact, and iterate to get the best performance mileage out of this feature.
Finally, do note that preconnect directives are treated as optimization hints: the browser might not act on every directive every time, and it is allowed to adjust its logic and perform only a partial handshake - e.g. fall back to a DNS lookup only, or DNS+TCP for TLS connections. Keep this in mind when debugging.
Ok, so loading a page is complicated business, so what? Well, if there is no way to reliably predict how long a load might take, why do so many browsers still show a progress bar? At best, the 0-100 indicator is a lie that misleads the user; worse, its success criterion forces developers to optimize for "onload time", which misses the progressive rendering experience that modern applications aim to deliver. Browser progress bars fail both users and developers; we can and should do better.
To be clear, progress indicators are vital to helping the user understand that an operation is in progress. The browser needs to show some form of a busy indicator, and the important questions are: what type of indicator, whether progress can be estimated, and what criteria are used to trigger its display.
Some browsers have already replaced "progress bars" with "indeterminate indicators", dropping the pretense of predicting and estimating something that they can't. However, this treatment is inconsistent between browser vendors, and even between the same browser on different platforms — e.g. many mobile browsers use progress bars, whereas their desktop counterparts use indeterminate indicators. We need to fix this.
Also, while we're on the subject, what are the conditions that trigger the browser's busy indicator anyway? Today the indicator is shown only while the page is loading: it is active until the onload event fires, which is supposed to indicate that the page has finished fetching all of its resources and is now "ready". However, in a world optimized for progressive rendering, this is an increasingly unhelpful concept: the presence of an outstanding request does not mean the user can't or shouldn't interact with the page; many pages defer fetching and further processing until after onload; many pages trigger fetching and processing based on user input.
Time to onload is a bad performance metric, and one that developers have been gaming for a while. Making it the success criterion for the busy indicator is a decision worth revisiting. For example, instead of relying on what is now an arbitrary initialization milestone, what if the indicator represented the page's ability to accept and process user input?

The initial page load is simply a special case of painting the first frame (ideally in <1000ms), at which time the page is unable to process user input. Past the first frame, if the UI thread is busy once again, the browser can and should show the same indicator. Changing the busy indicator to signal interactivity would stop penalizing progressive rendering, remove the incentive to keep gaming onload, and create direct incentives for developers to build and optimize for smooth and jank-free experiences.
The ambiguity and lack of developer override in the above spec language is a big gap and a performance problem. First, the ambiguity leaves us with inconsistent behavior across browsers; second, the lack of developer override means that we are either rendering content that should be blocked, or unnecessarily blocking rendering where a fallback would have been acceptable. There isn't a single strategy that works best in all cases.
How often does the above algorithm get invoked? What's the delta between the time the browser was first ready to render text and the font became available? Speaking of which, how long does it typically take the font download to complete? Can we just initiate the font fetch earlier to solve the problem?
As it happens, Chrome already tracks the necessary metrics to answer all of the above. Open a new tab and head to chrome://histograms to inspect the metrics for your profile and navigation history (for the curious, check out histograms.xml in the Chromium source). The specific metrics we are interested in are:
- WebFont.HadBlankText: count of times text rendering was blocked.
- WebFont.BlankTextShownTime: duration of blank text due to blocked rendering.
- WebFont.DownloadTime.*: time to fetch the font, segmented by filesize.
- PLT.NT_Request: time to first response byte (TTFB).

Inspecting your own histograms will, undoubtedly, reveal some interesting insights. However, is your profile data representative of the global population? Chrome aggregates anonymized usage statistics from opted-in users to help the engineering team improve Chrome's features and performance, and I've pulled the same global metrics for Chrome for Android. Let's take a look...
| | 50th | 75th | 95th |
|---|---|---|---|
| WebFont.DownloadTime.0.Under10KB | ~400 ms | ~750 ms | ~2300 ms |
| WebFont.DownloadTime.1.10KBTo50KB | ~500 ms | ~900 ms | ~2600 ms |
| WebFont.DownloadTime.2.50KBTo100KB | ~600 ms | ~1100 ms | ~3800 ms |
| WebFont.DownloadTime.3.100KBTo1MB | ~800 ms | ~1500 ms | ~5000 ms |
| WebFont.BlankTextShownTime | ~350 ms | ~750 ms | ~2300 ms |
| PLT.NT_Request | ~150 ms | ~380 ms | ~1300 ms |

| | No blank text | Had blank text |
|---|---|---|
| WebFont.HadBlankText | ~71% | ~29% |
29% of page loads on Chrome for Android displayed blank text: the user agent knew the text it needed to paint, but was blocked from doing so due to the unavailable font resource. In the median case the blank text time was ~350 ms, ~750 ms for the 75th percentile, and a scary ~2300 ms for the 95th.
Looking at the font download times, it is also clear that even the smallest fonts (<10KB) can take multiple seconds to complete. Further, the time to fetch the font is significantly higher than the time to the first HTML response byte (see PLT.NT_Request), which may contain text that can be rendered. As a result, even if we were able to start the font fetch in parallel with the HTML request, there would still be many cases where we would have to block text rendering. More realistically, the font fetch is delayed until we know it is required, which means waiting for the HTML response, building the DOM, and resolving styles, all of which defer text rendering even further.
As the above data illustrates, fetching the font sooner and optimizing the resource filesize are both important but not sufficient to eliminate the "blank text problem". The network fetch may take a while, and we can't control that.
That said, knowing this, we can provide the necessary controls to developers to specify the desired text rendering strategy: there are cases where using a fallback is a valid strategy, and there are cases when rendering should be blocked. Both strategies are valid and can coexist on the same page depending on the content being rendered.
In short, text is almost always the single most important asset on the page, and we need to give developers control over how and when it's rendered. The CSS font rendering proposal should, I hope, resolve this.
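In the meantime, a page can implement its own rendering strategy with the CSS Font Loading API: render immediately with a fallback font, and switch to the web font only if it arrives within a time budget. A minimal sketch — the 300 ms budget, the `withTimeout` helper, and the `MyWebFont` name are illustrative assumptions, not part of any spec:

```javascript
// Resolve with the result of `promise` if it settles within `ms`
// milliseconds; otherwise resolve with `fallback` instead.
function withTimeout(promise, ms, fallback) {
  return Promise.race([
    promise,
    new Promise(function(resolve) {
      setTimeout(function() { resolve(fallback); }, ms);
    })
  ]);
}

// In the browser, document.fonts.load() (CSS Font Loading API) returns
// a promise for the requested font faces; give it a 300 ms budget:
//
// withTimeout(document.fonts.load('1em MyWebFont'), 300, null)
//   .then(function(faces) {
//     document.documentElement.className +=
//       faces ? ' webfont-active' : ' webfont-fallback';
//   });
```

The class toggle lets CSS decide which font stack to apply, so text is never invisible for longer than the stated budget.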
All connections are slow some of the time. All connections fail some of the time. All users experience these behaviors on their devices regardless of their carrier, geography, or underlying technology — 4G, 3G, or 2G.
Networks are not reliable, latency is not zero, and bandwidth is not infinite. Most applications ignore these simple truths and design for the best-case scenario, which leads to broken experiences whenever the network deviates from its optimal case. We treat these cases as exceptions but in reality they are the norm.
Building a product for a market dominated by 2G vs. 3G vs. 4G users might require an entirely different architecture and set of features. However, a 3G user is also a 2G user some of the time; a 4G user is both a 3G and a 2G user some of the time; all users are offline some of the time. A successful application is one that is resilient to fluctuations in network availability and performance: it can take advantage of the peak performance, but it plans for and continues to work when conditions degrade.
Failing to plan for variability in network performance is planning to fail. Instead, we need to accept this condition as a normal operational case and design our applications accordingly. A simple but effective strategy is to adopt a "Chaos Monkey approach" within our development cycle:
Degraded network performance and offline are the norm, not the exception. You can't bolt on an offline mode, or add a "degraded network experience" after the fact, just as you can't add performance or security as an afterthought. To succeed, we need to design our applications with these constraints in mind from the beginning.
Are you using a network proxy to emulate a slow network? That's a start, but it doesn't capture the real experience of your average user: a 4G user is fast most of the time and slow or offline some of the time. We need better tools that can emulate and force these behaviors when we develop our applications. Testing against localhost, where latency is zero and bandwidth is infinite, is a recipe for failure.
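One way to force these behaviors during development is to wrap the application's fetch function with an interceptor that randomly injects latency and failures — a sketch of the "Chaos Monkey approach" where the failure rate and delay bounds are made-up tuning knobs:

```javascript
// Wrap a fetch-like function: with probability `failRate` reject the
// request outright; otherwise delay it by a random amount (up to
// `maxDelay` ms) before passing it through. `random` is injectable
// so the behavior can be made deterministic in tests.
function chaosFetch(realFetch, opts) {
  var failRate = opts.failRate;          // e.g. 0.1 → 10% of requests fail
  var maxDelay = opts.maxDelay;          // e.g. 2000 → up to 2 s of latency
  var random = opts.random || Math.random;
  return function(url, init) {
    if (random() < failRate) {
      return Promise.reject(new Error('chaos: simulated network failure'));
    }
    var delay = Math.floor(random() * maxDelay);
    return new Promise(function(resolve) {
      setTimeout(function() { resolve(realFetch(url, init)); }, delay);
    });
  };
}

// Development builds only — swap it in for the real fetch:
// window.fetch = chaosFetch(window.fetch.bind(window),
//                           { failRate: 0.1, maxDelay: 2000 });
```

Running the app against this wrapper quickly surfaces code paths that assume the network is always fast and always available.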
We need APIs and frameworks that can facilitate and guide us to make the right design choices to account for variability in network performance. For the web, Service Worker is going to be a critical piece: it enables offline, and it allows full control over the request lifecycle, such as controlling SLAs, background updates, and more.
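For example, a Service Worker can enforce a response SLA: race the network against a timeout, and serve a cached copy when the network is too slow or unavailable. A sketch with the network and cache functions injected so the policy itself is testable; the 2-second budget is an illustrative assumption:

```javascript
// Try the network, but fall back to cacheLookup() if the response
// does not arrive within `slaMs` milliseconds, or if the network
// request fails outright. Both injected functions return promises.
function fetchWithSLA(networkFetch, cacheLookup, slaMs) {
  return function(request) {
    var network = networkFetch(request).catch(function() {
      return cacheLookup(request);       // hard failure: go to cache now
    });
    var timeout = new Promise(function(resolve) {
      setTimeout(function() { resolve(cacheLookup(request)); }, slaMs);
    });
    return Promise.race([network, timeout]);
  };
}

// In a Service Worker it might be wired up as:
// self.addEventListener('fetch', function(event) {
//   event.respondWith(
//     fetchWithSLA(fetch,
//                  function(req) { return caches.match(req); },
//                  2000)(event.request));
// });
```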
But, despite all of its pitfalls, UA/device detection is a fact of life, a growing business, and an enabling business requirement for many. The problem is that UA/device detection frequently misclassifies capable clients (e.g. IE11 was forced to change its UA string); leads to compatibility nightmares; and can't account for continually changing user and runtime preferences. That said, when deployed correctly it can also be a force for good.
Browser vendors would love to drop the User-Agent string entirely, but that would break too many things. However, while it is fashionable to demonize UA/device detection, the root problem is not in the intent behind it, but in how it is currently deployed. Instead of "detecting" (i.e. guessing) the client capabilities through an opaque version string, we need to change the model to allow the user agent to "report" the necessary capabilities.
Granted, this is not a new idea, but previous attempts seem to introduce as many issues as they solve: they seek to standardize the list of capabilities; they require agreement between multiple slow-moving parties (UA vendors, device manufacturers, etc); they are over-engineered - RDF, seriously? Instead, what we need is a platform primitive that is:
Here is the good news: this mechanism exists, it's Service Worker. Let's take a closer look...
Service worker is an event-driven Web Worker, which responds to events dispatched from documents and other sources… The service worker is a generic entry point for event-driven background processing in the Web Platform that is extensible by other specifications - see explainer, starter, and cookbook docs.
A simple way to understand Service Worker is to think of it as a scriptable proxy that runs in your browser and is able to see, modify, and respond to, all requests initiated by the page it is installed on. As a result, the developer can use it to annotate outbound requests (via HTTP request headers, URL rewriting) with relevant capability advertisements:
This is not a proposal or a wishlist; this is possible today, and is a direct result of enabling powerful low-level primitives in the browser - hooray. As such, now it's only a question of establishing the best practices: what do we report, in what format, and how do we optimize for interoperability? Let's consider a real-world example...
Our goal is to deliver the optimal — fast and visually pleasing — video startup experience to our users. Simply starting with the lowest bitrate is suboptimal: fast, but consistently poor visual quality for all users, even for those with a fast connection. Instead, we want to pick a starting bitrate that can deliver the best visual experience from the start, while minimizing playback delays and rebuffers. We don't need to be perfect, but we should account for the current network weather on the client. Once the video starts playing, the adaptive bitrate streaming will take over and adjust the stream quality up or down as necessary.
The combination of Service Worker and Network Information API make this trivial to implement:
```javascript
// register the service worker
navigator.serviceWorker.register('/worker.js').then(
  function(reg) { console.log('Installed successfully', reg) },
  function(err) { console.log('Worker installation failed', err) }
);

// ... worker.js
self.addEventListener('fetch', function(event) {
  var requestURL = new URL(event.request.url);

  // Intercept same-origin /video/* requests
  if (requestURL.origin == location.origin &&
      /^\/video\//.test(requestURL.pathname)) {
    // append the MD header, set value to NetInfo's downlinkMax:
    // http://w3c.github.io/netinfo/#downlinkmax-attribute
    event.respondWith(
      fetch(event.request.url, {
        headers: { 'MD': navigator.connection.downlinkMax }
      })
    );
  }
});
```
The service worker intercepts all same-origin /video/* requests; downlinkMax is available in Chrome 41; the server uses the reported MD value to determine the starting bitrate, and responds with the appropriate video chunk.

We have full control over the request flow and are able to add additional data to the request prior to dispatching it to the server. Best of all, this logic is transparent to the application, and you are free to customize it further. For example, want to add an explicit user override to set a starting bitrate? Prompt the user, send the value to the worker, and have it annotate requests with whatever value you feel is optimal.
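On the server side, the handler only needs to map the reported downlink estimate to a starting rendition. A sketch — the bitrate ladder, thresholds, and rendition names below are invented for illustration, not drawn from any real service:

```javascript
// Map the reported downlink estimate (Mbit/s, as carried in the MD
// request header) to a starting rendition. Thresholds are illustrative.
function pickStartingBitrate(downlinkMaxMbps) {
  var mbps = parseFloat(downlinkMaxMbps);
  if (isNaN(mbps)) return '480p';   // hint missing or invalid: safe default
  if (mbps >= 10)  return '1080p';
  if (mbps >= 5)   return '720p';
  if (mbps >= 2)   return '480p';
  return '240p';
}

// e.g. in a Node HTTP handler:
// var rendition = pickStartingBitrate(req.headers['md']);
```

Once playback begins, adaptive bitrate streaming takes over, so this mapping only needs to be roughly right, not perfect.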
Service Worker enables us (web developers) to define, customize, and deploy new capability reports at will: we can rewrite requests, implement content-type or origin specific rules, account for user preferences, and more. The new open questions are: what capabilities do our servers need to know about, and what's the best way to deliver them?
It will be tempting to report every plausibly useful property about a client. Please think twice before doing this, as it can add significant overhead to each request - be judicious. Similarly, it makes sense to optimize for interoperability: use parameter names and formats that work well with existing infrastructure and services - caches and CDNs, optimization services, and so on. For example, the MD and DPR request headers used in the above examples come from Client-Hints; the DPR and RW hints are already used to optimize images with the resrc.it service.

Now is the time to experiment. There will be missteps and poor initial implementations, but good patterns and best practices will emerge. Most importantly, the learning cycle for testing and improving this infrastructure is now firmly in the hands of web developers: deploy Service Worker, experiment, learn, and iterate.