What happened to turkiye.gov.tr

turkiye.gov.tr is Turkey's e-government gateway to many useful public services, and I hope this article reaches its intended audience.

Recently turkiye.gov.tr launched a new feature for viewing family trees. Great feature and great work! There was a rush of curious minds trying to find out about their ancestors. I guess turkiye.gov.tr wasn't expecting that much traffic, so things went sideways in the most colossal way.

I am not familiar with the architecture of turkiye.gov.tr or the ancestry service, but I have worked with many large-scale systems.

Below you will find 10 facts that, I believe, explain the turkiye.gov.tr outage.

Fact #1: Big systems fail more often than small systems.
Big systems have more external and internal dependencies along with many moving parts. Therefore, big systems should have many defensive mechanisms in place.

Fact #2: Blocked threads are the number one cause of most failures.
Slow applications and hung threads are the most common causes of failures, leading to cascading failures and chain reactions. Blocked/hung threads can happen for several reasons, such as deadlocks, starvation and livelocks. Hung-thread detection policies, timeouts, circuit breakers and bulkheads can prevent these failures.

Fact #3: Integration points are the number one killer in any system.
Every integration point eventually fails, but a failure in an integration point shouldn't take down the whole application. Cascading failures occur when problems in one integration point propagate. Failures in integrated services become your problem, and they become even more serious if you are not prepared for them. As above, defensive programming, timeouts and circuit breakers can prevent serious failures.

Fact #4: Tightly coupled systems fail more often than loosely coupled systems.
Decoupling middleware is a good way to enable loose coupling between integrated systems. This principle is also a best practice for cloud-native applications.

Fact #5: A high-traffic site's resource/connection pools get drained very quickly.
Resource/connection pools have limits. They can run out of free resources rapidly, and application performance then starts degrading.

Fact #6: Unbalanced capacities cause failures and scalability problems for applications.
If capacities are not aligned across the tiers of a system, you have a problem. Capacity and sizing should be planned accordingly.

Fact #7: Never trust code you have no control over or didn't develop, whether it is a third-party library or a remote system built by someone else.
A downstream application running blocking code can take your application down.

Fact #8: Slow applications get more traffic.
When an application is slow, users hit the reload button or F5 repeatedly to reach it, which generates even more traffic.

Fact #9: Fail fast all the time and retry gradually.
Exponential backoff, circuit breakers and timeouts should be embraced. User-friendly error messages or sensible default content should be presented to the user until stability is re-established.

Fact #10: Appreciate your hardware resources and utilize them wisely.
Don't fall for the "CPU and memory are cheap" mantra; it is not true. Long-running CPU cycles can cause contention that slows down your application until it eventually fails. Paged-out or fragmented memory causes slow access times.

What happened reminds me of the dot-com era circa 2000. Yahoo!, AltaVista, Lycos and many other internet giants of the time faced similar problems, and that is when they started developing scalable platforms.


The notorious SSL Handshake

The notorious SSL handshake process happens as follows.

  1. The client issues a secure session request.
  2. The server sends back an X.509 certificate containing the server's public key.
  3. The client authenticates the server's certificate against its list of known CAs (Certificate Authorities). If the issuing CA is not in the list, the user is prompted to accept the certificate.
  4. The client generates a random symmetric key and encrypts it using the server's public key.
  5. The client and server now both have the symmetric key, and the client sends data encrypted with this symmetric key to the server for the rest of the session.

If you'd like to see it in action, open up Chrome's tools by browsing to chrome://net-internals/#events

Then go to a secure URL, something like https://amazon.com . In the events log, browse through the entries and you will see the SSL handshaking process.

Resource hints for performance optimization

Resource hints can help you boost your web site/app performance.

You can read about resource hints in the W3C spec or on scotch.io; below is a brief write-up.

You can also check out browser support for resource hints on caniuse.com

dns-prefetch pre-resolves DNS hostnames for objects on the page. Read more in Optimizing DNS Lookups below. Usage is as follows:

<link rel="dns-prefetch" href="//cdn.example.com" crossorigin>

preconnect hints the browser to initiate early connections, covering the DNS lookup, TCP handshake and TLS negotiation. Usage is as follows:

<link rel="preconnect" href="//cdn.example.com" crossorigin>

prefetch fetches assets for future navigation and places them in the cache. Usage is as follows:

<link rel="prefetch" href="/images/hello.jpg">

prerender renders a page in the background. Usage:

<link rel="prerender" href="/about/index.html">

Bundling and minification

Bundling and minification are good for reducing HTTP requests, which has a positive impact on performance.

Fewer HTTP requests will boost your website's speed.

Today many web applications use several CSS and JS files. Each JS and CSS file requires an HTTP request that goes to the edge or the origin. Even if you are using persistent connections and multiplexing, there is an associated latency cost. To avoid round-trip times (RTTs), JS and CSS files can be bundled into single files and minified. Fewer bytes mean fewer round trips, which means less time spent.

Moreover, you can compress the HTML output and remove whitespace and newlines. This improvement will also reduce response times.

For text files, bundling and minification combined with HTTP compression make a real difference; you will feel it right away. A server compresses objects before they are sent, which can result in up to a 90% reduction in bytes on the wire.

All textual content (HTML, JS, CSS, SVG, XML, JSON, fonts, etc.) can benefit from compression and minification.

Client side caching (Browser caching)

Nothing is faster than serving resources from the client’s machine. You won’t use any network resources.

The fastest request is the one you don’t make. The directive that tells the browser how long to cache an object is called the Time to Live or TTL.

It is not easy to decide the best TTL for a resource, however there are some heuristics.

The client-side caching TTL can be set through the HTTP "Cache-Control" header with its "max-age" directive (in seconds), or through the "Expires" header.
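For example, a response the browser may cache for one day (86400 seconds) could carry headers like these (illustrative values):

HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: max-age=86400
Expires: Thu, 01 Jan 2026 00:00:00 GMT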

Static content like images, JS, CSS and other files can be versioned. Once the version changes, the client makes a request to get the newer version from the server.
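For instance (hypothetical path and version number), a versioned stylesheet reference looks like this:

<link rel="stylesheet" href="/css/site.css?v=3">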


v is the version number of the file. Once it changes, the client goes to the server and requests the changed static file, in our case a CSS file.

If you are using a CDN, it usually embraces client-side caching as well.

Avoid HTTP redirects

SEO purposes aside, intentional redirects are bad for performance.

On many high-traffic sites you can see an HTTP 301 redirect. This is usually done for SEO purposes. HTTP 301 redirects can be cached.

However, a redirect to another domain means a new HTTP connection, which can add a DNS lookup and extra latency.

If it is a redirect within the same domain, you can use rewrite rules to avoid new connections and keep the change transparent to the user.
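As an illustration (nginx syntax, hypothetical paths), an internal rewrite serves the content from the new location on the same connection, with no 301/302 ever sent to the client:

location /old-path/ {
    rewrite ^/old-path/(.*)$ /new-path/$1 last;
}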

On a different note, HTTP 301 and 302 have newer counterparts: 308 (permanent, like 301) and 307 (temporary, like 302).

See this stackoverflow question.

Optimizing DNS Lookups

A DNS lookup must be made before a connection to a host. It is essential for this resolution to happen as fast as possible.

The following can be implemented as best practices:

Limit the number of unique hostnames in your application. Having too many domains in your app increases response times due to DNS lookups. On the other hand, recall domain sharding; you will have to balance the two.

You can use the dns-prefetch browser hint to prefetch DNS resolution for resources.

<link rel="dns-prefetch" href="//ajax.googleapis.com">

In this example, while the initial HTML is being loaded and processed, DNS resolution will take place for this particular host.

You can easily spot these browser hints in the source code of amazon.com and other high-traffic sites. You will also see a meta tag that enables or disables prefetch control, directing the browser on whether to do so.

<meta http-equiv="x-dns-prefetch-control" content="on">
<link rel="dns-prefetch" href="https://s3.images.amazon.com">

With this approach, DNS lookups happen in the background, so once the user needs a resource the browser won't need to do an additional DNS lookup, which reduces latency when the user takes action.

Hopefully these techniques will reduce DNS lookups.

Domain Sharding

Even though this method is considered an anti-pattern or obsolete nowadays (with HTTP/2), it is worth mentioning and still valid for HTTP/1.1 applications. Domain sharding is the most extreme, and possibly the most successful, HTTP/1 optimization.

As I mentioned in the Connection Management post, browsers usually open 6 connections per host/domain. To increase the performance of a web page/app, we need the browser to open more connections so that the assets/objects on the page can be downloaded in parallel. So we shard our application to load resources from multiple domains.

If we want a faster website or application, it is possible to force the browser to open more connections. Instead of serving all resources from the same domain, say www.foobar.com, we can split them over several domains: www1.foobar.com, www2.foobar.com, www3.foobar.com. Each of these domains resolves to the same server, and the web browser will open 6 connections to each (in our example, 18 connections in total). This technique is called domain sharding.

Without domain sharding:

With domain sharding:

With some DNS and application-server tricks you don't have to host the files/assets on a different server; you can serve them from the same one.

The cost here is extra DNS lookups and connection setup for the sharded domains.

It is also worth mentioning that you should have a wildcard (*) certificate covering the shards you are using.

While domain sharding helps performance by providing higher parallelism, there are some drawbacks as well.

First of all, domain sharding introduces extra complexity to our application and code, which has a development cost. There is no perfect number of shards. Each connection to a shard consumes resources (IO, CPU) and races with the others for bandwidth, which can cause poor TCP performance.


Connection Management in HTTP

Connection management is a key topic in HTTP and it affects performance. Opening and maintaining connections impacts the performance of websites and web applications.

HTTP uses TCP for its reliability. There have been different models of connection management throughout the evolution of HTTP.

Initially the connection model was short-lived connections. Prior to HTTP/1.1, for every request a new connection was set up, used and disposed of. As you can imagine, this has a major effect on performance. At the time it was the simplest workable solution for HTTP. Each HTTP request is completed on its own connection; this means a TCP handshake happens before each HTTP request, and the requests are serialized.

Opening each TCP connection is a resource-consuming operation. Several messages (RTTs) must be exchanged between the client and the server. Network latency and bandwidth both affect performance when a request needs to be sent.

The TCP handshake itself is time-consuming, but a TCP connection also adapts to its load, becoming more efficient over a sustained (warm) connection. Short-lived connections do not take advantage of this feature of TCP, and performance degrades from the optimum by constantly transmitting over new, cold connections.

With HTTP/1.1, two new models were introduced: persistent connections and HTTP pipelining.

A picture is worth a thousand words.

The persistent-connection (keep-alive) model keeps connections opened between successive requests, reducing the time needed to open new connections and thus saving resources. HTTP pipelining goes one step further and sends multiple requests without waiting for a response.

The Connection header was introduced with HTTP/1.1, and persistent connections are enabled by default.

A persistent connection is one which remains open for a period of time and can be reused for several requests, saving the need for a new TCP handshake and exploiting TCP's warmed-up performance. The connection will not stay open forever: idle connections are closed after some time.

One drawback of persistent connections is that they consume resources on servers; that is why they must be closed after a period of time.

HTTP pipelining is not used, and has been removed from browsers, due to its complexity.