A gentle introduction to HTTP/2

Let’s start with HTTP/0.9

The original HTTP, defined in 1991, was developed by Tim Berners-Lee, the inventor of the World Wide Web. It was a text-based request and response protocol. It supported only the GET method, and the only response type was HTML. It was used solely for sharing documents, mostly about physics. The connection was closed after each request.

Later, HTTP/0.9 was extended into HTTP/1.0. Request and response headers were added, and you could also request images, text files, CSS and other resource types.

In 1999, HTTP/1.1 arrived. It introduced persistent connections (keep-alive), chunked transfer encoding and the Host header. The Host header made it possible to host multiple sites on a single IP address. It was a huge SUCCESS!

Problems with HTTP/1.1:

  • It wasn’t designed for today’s web pages
    • 100+ HTTP requests, and 2MB+ page size.
  • Requires multiple connections
  • Lack of prioritization
  • Verbose headers
  • Head of line blocking

Let’s break it down.

Multiple connections:

When an app requires 100+ HTTP requests, the browser runs into its limit on the number of connections it can open per host; most browsers allow 6 simultaneous connections. This becomes a problem because each connection takes time to establish and to become efficient: every new connection needs a three-way handshake.

Before HTTP/1.1, each resource required its own three-way handshake, which was very wasteful.

HTTP/1.1 introduced the Connection header and made keep-alive (persistent) connections the default. This removed the need for a three-way handshake on every request, since multiple requests could be sent over a single TCP connection.

TCP is a reliable protocol: for each packet sent, the sender receives an acknowledgement.

This introduces the head-of-line blocking problem. If data segment 2 fails to arrive, it blocks the segments behind it, and therefore the subsequent requests.

There are other costs as well, such as slow start and the sliding window, where TCP adjusts the transfer rate to the condition of the network and the receiver. These mechanisms are known as congestion control and flow control.

Lack of prioritization:

Priority is decided by the browser, not by the developer or the application. There is no way to specify the order of responses. Browsers need to decide how best to use the connections and in which order to request resources. There is no prioritization built into HTTP/1.1.

Verbose headers:

There is no header compression. You can use gzip to compress the content; however, headers such as Cookie, User-Agent, Referer and others are not compressed. HTTP is stateless by default, so Cookie, User-Agent and other headers are sent to the server with every request, which is inefficient, especially for very high-volume sites.

Head of line blocking:

Once you create a connection and send a request for a resource, that connection is dedicated to that request; you can’t use it again until the response comes back. For example, if you need to get 30 resources from a host, you can get only 6 at a time. Once you request 6 resources, the others must wait for those requests to finish. So each HTTP/1.1 connection works in a serial way (serial request and response). This is called head-of-line blocking.
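The 30-resources example can be put into numbers with a small sketch. It assumes, purely for illustration, that every request takes exactly one round trip and that all connections are already open; the function names are mine.

```python
import math


def request_rounds(resource_count: int, connections_per_host: int = 6) -> int:
    """Sequential rounds needed when each connection carries one request at a time."""
    if resource_count < 0 or connections_per_host <= 0:
        raise ValueError("resource_count must be >= 0 and connections_per_host > 0")
    return math.ceil(resource_count / connections_per_host)


def waiting_time_ms(resource_count: int, rtt_ms: int, connections_per_host: int = 6) -> int:
    """Rough waiting time, assuming one round trip per request (a simplification)."""
    return request_rounds(resource_count, connections_per_host) * rtt_ms


if __name__ == "__main__":
    # 30 resources over 6 connections means 5 rounds of requests.
    print(request_rounds(30), "rounds,", waiting_time_ms(30, rtt_ms=100), "ms")
```

Even in this idealized model, the 25th resource cannot start until four full rounds have finished.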

These are the problems with HTTP/1.1. Then we have other problems like bandwidth and latency.

Bandwidth is measured in bits per second, and it is relatively easy to add more of it to a system. It is usually referred to as network bandwidth, data bandwidth or digital bandwidth.

Latency is the time interval between cause and effect in a system. On the internet, latency is typically the time interval between a request and its response. Latency is measured in milliseconds and depends on distance and the speed of light, so there is not much you can do about it. You can use a CDN to reduce latency; however, that comes at a price.

For more information about latency and its impact, read “It’s the Latency, Stupid” by Stuart Cheshire.

Increasing bandwidth helps improve web performance, page loading times and so on. However, there is a limit to it. On the other hand, reducing latency improves performance almost linearly. You can read this post for more information about bottlenecks. The tests in that post indicate that improving latency is more effective than improving bandwidth when it comes to web page optimization.

If we were to compare the internet to a highway, bandwidth is the number of lanes, and latency is the time it takes to travel a specific distance, which depends on traffic, the speed limit and other factors.
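To get a feel for why latency matters so much, here is a deliberately simple model: load time is the raw transfer time plus the time spent waiting on round trips. The formula and all the numbers are my assumptions for illustration, not measurements.

```python
def estimated_load_time(total_bytes: int, bandwidth_bps: float,
                        rtt_s: float, round_trips: int) -> float:
    """Toy model: transfer time plus time spent waiting on round trips."""
    transfer_s = total_bytes * 8 / bandwidth_bps
    waiting_s = round_trips * rtt_s
    return transfer_s + waiting_s


if __name__ == "__main__":
    # Hypothetical page: 2 MB, 20 sequential round trips.
    base = estimated_load_time(2_000_000, 10_000_000, rtt_s=0.1, round_trips=20)
    double_bandwidth = estimated_load_time(2_000_000, 20_000_000, rtt_s=0.1, round_trips=20)
    half_latency = estimated_load_time(2_000_000, 10_000_000, rtt_s=0.05, round_trips=20)
    print(f"baseline:         {base:.2f} s")
    print(f"double bandwidth: {double_bandwidth:.2f} s")
    print(f"half latency:     {half_latency:.2f} s")
```

With these made-up numbers, halving latency saves more time than doubling bandwidth, and adding even more bandwidth keeps shrinking only the transfer part.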

Goals of HTTP/2

  • Minimize impact of Latency
  • Avoid head of line blocking
  • Use single connection per host
  • Backwards compatible
    • Fall back to HTTP/1.1
    • HTTP/1.1 methods, status, headers still work.

The biggest success of HTTP/2 is reducing latency by introducing full request and response multiplexing.

HTTP/2 Major Features

  • Binary framing layer
    • Not a text-based protocol anymore; binary protocols are easier to parse and more robust.
  • Resource prioritization
  • Single TCP connection
    • Fully multiplexed
    • Able to send multiple requests in parallel over a single TCP connection.
  • Header compression
    • It uses HPACK to reduce overhead.
  • Server push

HTTP/2 introduces some more improvements; for more details, see the HTTP/2 specification, RFC 7540.

The binary framing layer doesn’t use any text. Today we can trace and debug HTTP/1.1 by reading the text on the wire; with this change, debugging requires tools that understand the binary format.
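As a small taste of what such tooling does, the sketch below decodes the fixed 9-byte header that starts every HTTP/2 frame, following the layout in RFC 7540: a 24-bit payload length, an 8-bit type, 8 bits of flags, one reserved bit and a 31-bit stream identifier. The function name and the returned dictionary are my own choices.

```python
import struct

# Frame type codes defined in RFC 7540, section 6.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes) -> dict:
    """Decode the 9-byte header that precedes every HTTP/2 frame payload."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    # The 24-bit length is read as one byte plus one 16-bit integer.
    length_hi, length_lo, frame_type, flags, stream = struct.unpack(">BHBBI", header)
    return {
        "length": (length_hi << 16) | length_lo,
        "type": FRAME_TYPES.get(frame_type, "UNKNOWN"),
        "flags": flags,
        "stream_id": stream & 0x7FFFFFFF,  # drop the reserved top bit
    }


if __name__ == "__main__":
    # A DATA frame header: 5-byte payload, flags 0x1, stream 3.
    print(parse_frame_header(bytes([0, 0, 5, 0x0, 0x1, 0, 0, 0, 3])))
```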

Resource prioritization allows browsers and developers to prioritize the resources requested. Priorities can be changed at any time based on the resources or the application. So far so good. However, if there is a problem with a high-priority resource, the browser can intervene and request the low-priority resources.

The most important change in HTTP/2 is the single TCP connection per host, which solves a lot of problems. HTTP/2 multiplexes request and response frames from various streams. Far fewer resources are used: there are no repeated three-way handshakes, no repeated TCP slow start, and no head-of-line blocking at the HTTP level.
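Multiplexing is easier to picture with a toy model. The sketch below cuts several responses into small frames, interleaves them round-robin on one "wire", and reassembles them by stream id on the other side. Real HTTP/2 scheduling also takes priorities and flow control into account; the round-robin order, the tiny frame size and the function names here are assumptions for illustration only.

```python
from itertools import zip_longest


def split_into_frames(payload: bytes, frame_size: int) -> list[bytes]:
    """Cut one response body into fixed-size chunks."""
    return [payload[i:i + frame_size] for i in range(0, len(payload), frame_size)]


def multiplex(streams: dict[int, bytes], frame_size: int = 4) -> list[tuple[int, bytes]]:
    """Interleave frames from all streams, round-robin, onto a single connection."""
    per_stream = {sid: split_into_frames(data, frame_size) for sid, data in streams.items()}
    wire = []
    for frames_in_round in zip_longest(*per_stream.values()):
        for stream_id, frame in zip(per_stream.keys(), frames_in_round):
            if frame is not None:  # this stream has already finished
                wire.append((stream_id, frame))
    return wire


def demultiplex(wire: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Reassemble each stream from the interleaved frames using the stream id."""
    streams: dict[int, bytes] = {}
    for stream_id, frame in wire:
        streams[stream_id] = streams.get(stream_id, b"") + frame
    return streams


if __name__ == "__main__":
    responses = {1: b"<html>hello</html>", 3: b"body{margin:0}", 5: b"console.log(1)"}
    for stream_id, frame in multiplex(responses):
        print(stream_id, frame)
```

Because every frame carries its stream id, a slow response on one stream does not stop frames of the other streams from being delivered.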


When a user requests a web page, headers such as Cookie, User-Agent and others are sent with every single request. It doesn’t make much sense to send the same User-Agent with every request. To solve this problem, HPACK introduced the dynamic table.

When you send a request to a host, the headers in the following table are sent along. On subsequent requests, the full values are not sent again; only the short compression values (indexes) are transmitted. If User-Agent doesn’t change, its full value is never sent again on that connection.

HTTP headers

Header       Original value                     Compression value
Method       GET                                2
Scheme       HTTP                               6
User-agent   Mozilla Firefox/Windows/10/Blah    34
Host         Yahoo                              20
Cookie       Blah                               89
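The idea behind the table can be sketched in a few lines. This is a toy model, not HPACK itself: it has no static table, no Huffman coding and no size limit or eviction, and the index numbers it hands out do not match the ones in the table above. The class name is hypothetical.

```python
class ToyHeaderTable:
    """Remembers header fields already sent on one connection (simplified, not real HPACK)."""

    def __init__(self) -> None:
        self._index_of: dict[tuple[str, str], int] = {}

    def encode(self, headers: dict[str, str]) -> list:
        """Return literal (name, value) pairs for new fields and small integers for known ones."""
        encoded = []
        for name, value in headers.items():
            field = (name, value)
            if field in self._index_of:
                encoded.append(self._index_of[field])  # already known: send only the index
            else:
                self._index_of[field] = len(self._index_of) + 1
                encoded.append(field)  # first time: send the full field
        return encoded


if __name__ == "__main__":
    table = ToyHeaderTable()
    request = {":method": "GET", "user-agent": "Mozilla Firefox/Windows/10/Blah", "cookie": "Blah"}
    print(table.encode(request))  # full fields on the first request
    print(table.encode(request))  # only indexes on the second request
```

If only the cookie changes on a later request, only the new cookie value travels in full; the other fields stay as indexes.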


Server push is still experimental today: servers try to predict what will be requested next and push it to the client.

ALPN (Application-Layer Protocol Negotiation) is needed to negotiate HTTP/2 over TLS.

As part of the spec, HTTP/2 doesn’t require HTTPS. However, if you use HTTPS, you need TLS 1.2 or later and must avoid certain cipher suites.
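On the client side, the TLS part of this can be sketched with Python’s standard ssl module. The snippet only prepares a TLS context that requires TLS 1.2 or later and offers "h2" through ALPN, with HTTP/1.1 as the fallback; it does not speak HTTP/2 itself, since the standard library has no HTTP/2 client. The function name is hypothetical.

```python
import ssl


def make_http2_client_context() -> ssl.SSLContext:
    """TLS context that requires TLS 1.2+ and offers HTTP/2 via ALPN."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Offer HTTP/2 first, then fall back to HTTP/1.1.
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


if __name__ == "__main__":
    ctx = make_http2_client_context()
    print(ctx.minimum_version)
```

After a handshake on a socket wrapped with this context, selected_alpn_protocol() on the wrapped socket reports which protocol the server picked, if any.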

Can you use HTTP/2?

With HTTP/2 we have a single multiplexed connection, which means some performance optimization techniques, such as CSS sprites, JS bundling and domain sharding, become more or less obsolete. Since requests are cheaper and more efficient with HTTP/2, resources can be cached, modified and requested independently. There are of course tradeoffs, and you need to decide how to implement this.

However, I think web performance principles like making fewer HTTP requests and sending as little and as infrequently as possible still apply.

While many major Internet giants are using HTTP/2, it is still not widely adopted. I assume adoption will take a while, and this exciting new protocol will mature along the way.

Here are some demos, showing the difference between HTTP/1.1 vs HTTP/2.

Akamai demo

CDN 77 demo



Types of performance testing

Today performance testing includes many types of tests.

  • Latency Test – Measures end-to-end transaction time. The result can differ depending on whether it is measured from the user’s point of view or within the data center.
  • Throughput Test – Measures the number of concurrent transactions a system can handle.
  • Load Test – A pass/fail test of whether the system can handle the load or not.
  • Stress Test – Finds the breaking point of a system.
  • Endurance Test – Measures whether any anomalies appear during tests.
  • Capacity Planning Test – Finds out whether the system performs as expected based on capacity planning and provisioning.
  • Degradation Test – Finds out when the system performance degrades.

Several of these tests should be run to properly understand your working environment.


Resource hints for performance optimization

Resource hints can help you boost your web site/app performance.

You can read about resource hints on W3 or on scotch.io; below is a brief summary.

You can also check resource hints support on caniuse.com.

dns-prefetch pre-resolves DNS hostnames for objects in the page. Read more in Optimizing DNS Lookups. Usage is as follows:

<link rel="dns-prefetch" href="//cdn.example.com" crossorigin>

preconnect hints the browser to initiate connections early. This includes the DNS lookup, the TCP handshake and TLS negotiation. Usage is as follows:

<link rel="preconnect" href="//cdn.example.com" crossorigin>

prefetch fetches assets needed for a later navigation and places them in the cache. Usage is as follows:

<link rel="prefetch" href="/images/hello.jpg">

prerender pre-renders a page in the background. Usage is as follows:

<link rel="prerender" href="/about/index.html">

Bundling and minification

Bundling and minification are good at reducing HTTP requests, which has a positive impact on performance.

Fewer HTTP requests will boost your website speed.

Today many web applications use several CSS and JS files. Each JS and CSS file requires an HTTP request that goes to the edge or the origin. Even if you are using persistent connections and multiplexing, there is an associated latency cost. To avoid round trips (RTT), JS and CSS files can be bundled into a single file and minified. Fewer bytes mean fewer round trips, which means less time spent.

Moreover, you can compress the HTML output and remove white space along with new lines. This also improves the response time.

For text files, bundling and minification combined with HTTP compression make a difference. You will feel the difference right away. The server compresses objects before they are sent, which can result in up to a 90% reduction in bytes on the wire.

All textual content (html, js, css, svg, xml, json, fonts, etc.) can benefit from compression and minification.


HTTP compression (gzip)

You should use compression for efficient HTTP transfer. Compression reduces bandwidth usage and transfer size, which improves transfer speed.

The request header for HTTP compression is Accept-Encoding: gzip, deflate

The response header for HTTP compression is Content-Encoding, set to the encoding the server actually applied, for example Content-Encoding: gzip

HTTP compression is enabled on the server side, and the client must support it.
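To see how much gzip can save on text, here is a small sketch using Python’s standard gzip module. The sample CSS is made up and very repetitive, so the saving it shows is on the optimistic side; real pages will vary.

```python
import gzip


def gzip_savings(body: bytes) -> float:
    """Fraction of bytes saved by gzip (0.9 means 90% smaller)."""
    if not body:
        raise ValueError("body must not be empty")
    compressed = gzip.compress(body)
    return 1 - len(compressed) / len(body)


if __name__ == "__main__":
    # Hypothetical, highly repetitive stylesheet.
    css = b".button { color: red; margin: 0; padding: 0; }\n" * 200
    print(f"original: {len(css)} bytes, saved: {gzip_savings(css):.0%}")
```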

Client side caching (Browser caching)

Nothing is faster than serving resources from the client’s machine. You won’t use any network resources.

The fastest request is the one you don’t make. The directive that tells the browser how long to cache an object is called the Time to Live or TTL.

It is not easy to decide on the best TTL for a resource; however, there are some heuristics.

Client-side caching TTL can be set through the HTTP header “Cache-Control” with the directive “max-age” (in seconds), or through the “Expires” header.

Static content like images, JS, CSS and other files can be versioned. Once the version changes, the client makes a request to get the newer version from the server.
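As a rough illustration, this is how the two headers could be produced for a given TTL. The helper name is hypothetical, and the current time is passed in explicitly so the result is easy to check.

```python
from email.utils import formatdate


def caching_headers(ttl_seconds: int, now: float) -> dict[str, str]:
    """Build Cache-Control and Expires headers for a TTL, given the current Unix time."""
    return {
        "Cache-Control": f"max-age={ttl_seconds}",
        # Expires is an absolute HTTP date in GMT.
        "Expires": formatdate(now + ttl_seconds, usegmt=True),
    }


if __name__ == "__main__":
    import time
    print(caching_headers(3600, now=time.time()))
```

When both headers are present, caches that understand Cache-Control give max-age precedence over Expires.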


In this scheme, v is the version number of the file. Once it changes, the client goes to the server and requests the changed static file, in our case a CSS file.

If you are using a CDN, they usually embrace client-side caching.
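One common way to do this is to put the version in the query string. The sketch below assumes a URL of the form /css/site.css?v=... and derives the version from a hash of the file content, so it changes automatically whenever the file changes. Both the URL shape and the hashing are my assumptions; a hand-maintained version number works just as well.

```python
import hashlib


def versioned_url(path: str, content: bytes) -> str:
    """Append a version derived from the file content, e.g. /css/site.css?v=1a2b3c4d."""
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={version}"


if __name__ == "__main__":
    print(versioned_url("/css/site.css", b"body { margin: 0; }"))
```

As long as the content stays the same the URL stays the same, so the browser keeps using its cached copy; a changed file produces a new URL and therefore a fresh request.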


Avoid HTTP redirects

Aside from SEO purposes, intentional redirects are bad for performance.

On many high-traffic sites you can see an HTTP 301 redirect. This is usually done for SEO purposes. HTTP 301 redirects can be cached.

However, a redirect to another domain means a new HTTP connection, which can add a DNS lookup and latency.

If it is a redirect on the same domain, you can use rewrite rules to avoid new connections and provide transparency to the user.

On a different note, HTTP 301 and 302 have newer counterparts: 308 and 307, respectively.
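The difference between the four codes comes down to two questions: is the redirect permanent, and must the client repeat the request with the same method? With 301 and 302, clients are allowed to turn a POST into a GET, while 307 and 308 require the method to stay the same. The lookup helper below is hypothetical and only summarizes that.

```python
from http import HTTPStatus

# For each redirect status: is it permanent, and must the request method be kept?
REDIRECTS = {
    HTTPStatus.MOVED_PERMANENTLY: {"permanent": True, "keeps_method": False},    # 301
    HTTPStatus.FOUND: {"permanent": False, "keeps_method": False},               # 302
    HTTPStatus.TEMPORARY_REDIRECT: {"permanent": False, "keeps_method": True},   # 307
    HTTPStatus.PERMANENT_REDIRECT: {"permanent": True, "keeps_method": True},    # 308
}


def pick_redirect(permanent: bool, keep_method: bool) -> int:
    """Return the status code that matches the wanted behavior."""
    wanted = {"permanent": permanent, "keeps_method": keep_method}
    for status, traits in REDIRECTS.items():
        if traits == wanted:
            return int(status)
    raise LookupError("no matching redirect status")


if __name__ == "__main__":
    print(pick_redirect(permanent=True, keep_method=True))
```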

See this stackoverflow question.


Optimizing DNS Lookups

A DNS lookup needs to be made before connecting to a host. It is essential that this resolution happens as fast as possible.

The following can be implemented as best practices:

Limit the number of unique hostnames in your application. Having too many domains in your app increases your response time due to DNS lookups. On the other hand, recall Domain Sharding; you will have to balance the two.

You can use dns-prefetch, a browser hint, to resolve the DNS names of resources ahead of time.

<link rel="dns-prefetch" href="//ajax.googleapis.com">

In this example, while the initial HTML is being loaded and processed, DNS resolution will take place for this particular host.

You can easily spot these browser hints in the source code of amazon.com and other high-traffic sites. You will also see a meta tag that enables or disables DNS prefetching in the browser.

<meta http-equiv="x-dns-prefetch-control" content="on">
<link rel="dns-prefetch" href="https://s3.images.amazon.com">

With this approach, DNS lookups happen in the background, so that when the user needs a resource, the browser doesn’t have to do an additional DNS lookup. This reduces latency when the user takes action.

Hopefully these techniques will reduce DNS lookups.
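A quick way to see how many DNS lookups a page may trigger is to count the unique hostnames among its resource URLs. The helper below is a hypothetical sketch; the URL list is made up, and relative URLs are skipped because they reuse the page’s own hostname.

```python
from urllib.parse import urlsplit


def unique_hostnames(urls: list[str]) -> set[str]:
    """Collect the distinct hostnames referenced by a list of resource URLs."""
    hostnames = set()
    for url in urls:
        hostname = urlsplit(url).hostname
        if hostname:  # relative URLs have no hostname of their own
            hostnames.add(hostname)
    return hostnames


if __name__ == "__main__":
    resources = [
        "https://cdn.example.com/app.js",
        "https://cdn.example.com/app.css",
        "//ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js",
        "/images/hello.jpg",
    ]
    print(sorted(unique_hostnames(resources)))
```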


Domain Sharding

Even though this method is considered an anti-pattern or obsolete nowadays (for HTTP/2), it is worth mentioning and still valid for HTTP/1.1 applications. Domain sharding is the most extreme, and also possibly the most successful, HTTP/1 optimization.

As I mentioned in the Connection Management post, browsers usually open 6 connections per host/domain. To increase the performance of a web page/app, we need the browser to open more connections. This way the assets/objects on that page can be downloaded in parallel. So we shard our application to load resources from multiple domains.

If we want a faster web site or application, it is possible to force the browser to open more connections. Instead of serving all resources from the same domain, say www.foobar.com, we can split them over several domains: www1.foobar.com, www2.foobar.com, www3.foobar.com. Each of these domains resolves to the same server, and the web browser will open 6 connections to each (in our example, we will have 18 connections). This technique is called domain sharding.
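One practical detail is deciding which shard serves which asset. A minimal sketch, assuming the three hostnames from the example above: hash the asset path and pick a shard from the result, so the same asset always maps to the same hostname and stays cacheable under one URL. The function names and the choice of CRC32 are mine.

```python
import zlib

SHARDS = ["www1.foobar.com", "www2.foobar.com", "www3.foobar.com"]


def shard_for(asset_path: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically, so an asset always maps to the same hostname."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(asset_path.encode("utf-8")) % len(shards)]


def max_parallel_connections(shard_count: int, per_host_limit: int = 6) -> int:
    """Upper bound on simultaneous connections the browser may open."""
    return shard_count * per_host_limit


if __name__ == "__main__":
    for path in ("/img/logo.png", "/css/site.css", "/js/app.js"):
        print(path, "->", shard_for(path))
    print(max_parallel_connections(len(SHARDS)), "connections at most")
```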

Without domain sharding:

With domain sharding:

With some DNS and app server tricks you don’t have to host the files/assets on a different server, you can use the same server to serve them.

The cost we have here is extra DNS lookups and connection setup for the sharded domains.

It is also worth mentioning that a wildcard (*) certificate is useful for the shards you are using.

While domain sharding helps performance by providing higher parallelism, there are some drawbacks as well.

First of all, domain sharding introduces extra complexity to our application and code. It has a cost associated with it during development. There is no perfect number of shards. Each connection to a shard consumes resources (IO, CPU), and the connections compete with each other for bandwidth, which causes poor TCP performance.



Connection Management in HTTP

Connection management is a key topic in HTTP that affects performance. Opening and maintaining connections impacts the performance of web sites and web applications.

HTTP uses TCP for its reliability. There have been different models of connection management throughout the evolution of HTTP.

Initially the connection model was short-lived connections. Prior to HTTP/1.1, a new connection was set up, used and disposed of for every request. As you can imagine, this has a major effect on performance. At that time it was the simplest deliverable solution for a working HTTP. Each HTTP request is completed on its own connection; this means a TCP handshake happens before each HTTP request, and these are serialized.

Opening each TCP connection is a resource-consuming operation. Several messages (RTTs) must be exchanged between the client and the server. Network latency and bandwidth affect performance when a request needs sending.

The TCP handshake itself is time-consuming, but a TCP connection adapts to its load, becoming more efficient with more sustained (or warm) connections. Short-lived connections do not make use of this efficiency feature of TCP, and performance degrades because every request is transmitted over a new, cold connection.
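The cost of a cold connection can be sketched with an idealized slow-start model: the sender starts with a small window and doubles it every round trip. The model ignores packet loss and the receiver’s window, and the initial window of 10 segments is an assumption on my part, so treat the numbers as illustrative only.

```python
def round_trips_to_send(segments: int, initial_window: int = 10) -> int:
    """Round trips needed when the window doubles every round trip (idealized slow start)."""
    if segments < 0 or initial_window <= 0:
        raise ValueError("segments must be >= 0 and initial_window > 0")
    window = initial_window
    sent = 0
    trips = 0
    while sent < segments:
        sent += window   # send a full window in this round trip
        window *= 2      # the window doubles for the next one
        trips += 1
    return trips


if __name__ == "__main__":
    # 100 segments: 10 + 20 + 40 + 80 covers them in 4 round trips.
    print(round_trips_to_send(100))
```

A warm connection that has already grown its window could send the same 100 segments in fewer round trips, which is the efficiency that short-lived connections throw away.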

With HTTP/1.1, two new models were introduced: persistent connections and HTTP pipelining.

A picture is worth a thousand words.

The persistent-connection (keep-alive) model keeps connections open between successive requests, reducing the time needed to open new connections and thus saving resources. HTTP pipelining goes one step further and sends multiple requests without waiting for a response.

The Connection header was introduced with HTTP/1.1, and persistent connections are enabled by default.

A persistent connection is one that remains open for a period and can be reused for several requests, saving the need for a new TCP handshake and utilizing TCP’s performance. This connection will not stay open forever: idle connections are closed after some time.

One drawback of persistent connections is that they consume resources on servers, so they must be closed after a period of time.

HTTP pipelining is either not enabled in browsers or has been removed from them due to its complexity.
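Connection reuse can be observed with nothing but the Python standard library. The sketch below starts a tiny local HTTP/1.1 server, sends two requests through one http.client connection, and records the client’s local port after each request; the same port both times means the same TCP connection was reused. The handler and function names are hypothetical, and the server exists only for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where the response ends,
        # which is required for the connection to be reusable.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def local_ports_for_two_requests() -> list[int]:
    """Send two requests over one connection and return the client port used for each."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    ports = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # the body must be consumed before the next request
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    first, second = local_ports_for_two_requests()
    print("same TCP connection reused:", first == second)
```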


Cookieless domains

“An HTTP cookie (also called web cookie, Internet cookie, browser cookie or simply cookie) is a small piece of data sent from a website and stored on the user’s computer by the user’s web browser while the user is browsing. Cookies were designed to be a reliable mechanism for websites to remember stateful information (such as items added in the shopping cart in an online store) or to record the user’s browsing activity (including clicking particular buttons, logging in, or recording which pages were visited in the past). They can also be used to remember arbitrary pieces of information that the user previously entered into form fields such as names, addresses, passwords, and credit card numbers.” Wikipedia.

Cookies are sent to the server in the request header, with every request. In HTTP/1, headers are not compressed. Many modern apps use several cookies to store user information, and it is not unusual to see cookie sizes larger than a single TCP packet (~1.5 KB). As a result, sending the cookies with every request has a cost, which results in additional round trips.

That’s why it has been a reasonable recommendation to set up cookieless domains for resources that don’t rely on cookies, for instance images, CSS and other static files. This is the case for HTTP/1. For HTTP/2 we have a different story.

The advantage of serving static objects from the same hostname as the HTML is that it eliminates additional DNS lookups and (potentially) socket connections that delay fetching the static resources.

It is a best practice for HTTP/1 applications to serve static files from a cookieless domain.

** The only time you want to send cookies to the server when requesting an image or a static file is when you want to track the user. This is what ad serving businesses usually do.
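The saving is simple arithmetic. The sketch below assumes every request to the main domain carries the full cookie and requests to a cookieless domain carry none; the numbers in the example are made up.

```python
def cookie_upload_bytes(cookie_bytes: int, total_requests: int,
                        cookieless_requests: int = 0) -> int:
    """Bytes of cookie data uploaded when some requests go to a cookieless domain."""
    if not 0 <= cookieless_requests <= total_requests:
        raise ValueError("cookieless_requests must be between 0 and total_requests")
    return cookie_bytes * (total_requests - cookieless_requests)


if __name__ == "__main__":
    # Hypothetical page: 100 requests, 1500 bytes of cookies, 80 static files.
    print(cookie_upload_bytes(1500, 100))                          # everything on one domain
    print(cookie_upload_bytes(1500, 100, cookieless_requests=80))  # static files moved out
```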
