Overview
When thinking about system design and requests being submit and received by your server, I like to draw a parallel with car manufacturing. I’ve heard analogies using a restaurant with various lines chefs used nicely, but I’ll save that analogy for discussion on the value of microservices within backend design.
Background
A bit of background is required for those not familiar with Computer Science in how data is transferred between systems before I can fully dive into this analogy. In engineering, we use the term “traffic” or more formally, “Network Traffic”. Again, I find myself in a situation where the analytics used to describe the efficacy of your network tie in nicely with the analogy I’ve chosen. Network traffic represents the amount of data moving across a computer network at any given time. What does that mean?
Anytime you go to a website, data is being transferred between a sending device and a receiving device. In most cases, the sending device will be the server serving up the resources to power the site and the receiving device will be the browser which is assembling and executing the data from that server. There are other steps including communication with a Domain Name System (DNS) a Content Delivery Network (CDN) and a handful of other things which enable and speed up the efficiency of receiving the data, but for the purposes of this article, we are focusing on the communication between the server and the browser.
Now, how exactly is that data transferred?
It’s reasonable to assume that all data in a single burst. The browser says, “Hey, I need an HTML file to paint on this DOM” and the Server says, “Here’s your HTML file.” For small data items, this is indeed how it works. But what happens when you have a larger sum of data to transmit?
It’s much more efficient and robust to break up large data transactions into smaller packets and submit each of those smaller packets across the internet. Why? Because what if the receiver is not ready to receive the information - what if there is an error and communication somehow stalls or is broken during receipt?
Breaking this data into smaller packets enables a recipient to acknowledge they are receiving data, and gives the recipient a clue on how many data packets they are going to receive. They can then let the sender know when a packages was successfully received, and (more importantly) can let the sender know when an error occurred and a package needs to be resent.
The requester can then combine each received package into the larger data file as each successful package is received.
The Analogy
With that background, let’s think of each of these packages as cars which the server builds and then sends out along on a single lane road. Between the browser and the server, that’s largely exactly what is occurring. The browser will request packages that it needs, and the server will compose those packages to be sent along the treacherous road that is an HTTP request.
If we imagine a single user on a website requesting that information from the server, the process is simple - but what if we imagine a million users all hitting the same site at the same time? That’s a lot of cars, a lot of packages, which the server has to build and send out to a lot of different browsers all at once. So, how does the server handle that?
Scaling
There are two options to discuss. First, you can make a bigger car factory. If pay to decrease the time that it takes to build those cars by building a super car manufacturing facility, you can quickly process requests and get those cars (or packages) out to every browser that is requesting them. Second, you can duplicate your existing car manufacturing facility and distribute the load of creating those cars across each facility.
These two options are called Vertical and Horizontal scaling methods respectively. With vertical scaling you pay to have a computer with greater processing speeds - something that can really quickly turn out those packages and send them along the road of the internet. With horizontal scaling, you duplicate your existing manufacturing facility a hundred times over (something which is significantly cheaper to do when discussing software rather than actual manufacturing facilities) and you distribute the load of creating those packages across many different computers with smaller computing power.
With tools like Docker and the advent of virtual machines (think of each machine as its own manufacturing facility) the ability to duplicate existing environments and horizontally scale is often a cheap and reliable solution to scale lots of network traffic all at once. But what do those data highways look like?
Intro to System Design with Horizontal Scaling
Moving away from the single lane road of an HTTP request, and diving deeper into the package construction when horizontally scaling - let’s imagine that a Server is a massive city with thousands of requests being made for new cars every millisecond, and thousands of new cars being made every moment.
What’s the first thing you anticipate you might need is… directions to each of the manufacturing plants. A hundred requests for cars come in at roughly the same time - you have 50 manufacturing plants ready to make the cars and send deliver them - step one. How do you ensure you use all of your manufacturing plants efficiently?
Load Balancing
You don’t want to send all of these requests to the same manufacturing facility, as it can’t create two cars at once and it will have to add new requests to the queue. The answer lies in the term “Load Balancing”. A load balancer’s key goal is to receive all of these requests for data and redirect the request to a virtual computer which is currently not making a data package.
Popular load balancers like Nginx, Azure Load Balancer, and AWS Load Balancing and each of those can distribute the load to a manufactucturing facility that is not currently in service.
Now we’ve solved the problem of getting requests to the right manufacturing facility - but we have a new problem. Right now (and I haven’t mentioned this) every car manufacturing facility needs to request the blueprint from the Auto Designer on how to make the car every single time a request is made.
As you can image that might create a potential bottleneck on the Automobile Designer or (in this analogy) the database. So how can we circumvent the need to request the blueprint every single time?
What if we stored the blueprint at the manufacturer? If we had a blueprint of the most commonly selected cars, we could limit the number of calls heading out to our Automobile Designer and we could limit the impact of that bottleneck.
Caching
The general concept of storing commonly requested data in a layer away from the database is called caching your data. Caching occurs to optimize fetches across the entirely of the web, whether that is in commonly requested IP addresses, images when you visit the same website multiple times, or (in this case) commonly requested data items feeding back to a browser.
There’s a limit (or there should be) to the number of blueprints you want to store in a cache. I say there should be, because maintaining full duplications of the database would be highly resource intensive and there is a tradeoff between the number of blueprints stored and the efficiency gains of a cache.
So, how do we decide what to and what not to include in a cache? If you again take the scenario where you have 100 blueprints for different cars (or data packages), your goal is to find the data that will reduce the requests on your Auto Designer database the most.
One manner of doing this is to store the data that was requested soonest. Say you have a cache with the ability to only store three blueprints and requests come in, in the following order.
Camry F-150 Camry Model S Model S F-150 Camry SmartCar
Visually, you can tell storing in a cache the data for a Camry and an F-150 would reduce the blueprint requests to the Designer fairly substantially. If you requested the data that was last fetched, the blueprints stored in your cache would be: SmartCar Camry F-150
There is a greater likelihood that the cars most commonly requested will ensure their presence in your cache, because… they are most commonly requested.
But, what is that SmartCar doing in there? Is it doing any good? The short answer is, it might be. In situations where you have more varietal fetching and where you aren’t able to always predict the most common fetches, doing a time based caching store will capture those trends.
The perfect example of this might be someone searching umbrellas on a rainy day. Searches for umbrellas go up as people realize they need an umbrella. While it may not be the most common request after a sunny week, storing the last requested data in the cache would ensure that after the first request for an umbrella, every individual who realizes they need an umbrella that rainy day will have a better, more timely fetch experience.
The other common way to create a cache is to find the most commonly requested data. In some situations, you know the data that is going to be most commonly fetched (you know, for example, that your logo is going to be loaded every time the page loads). In the above fetches, your cached data would be blueprints for: Camry F-150 Model S
This method ensures that necessary data is always going to be present and will not require a request to the database in order to construct the data package.
There are a handful of other optimizations with caching which will be covered in future articles - but one related topic is the extreme optimizations you can make with static resources which you know will always be fetched every time a page loads (think of that logo again).
Content Delivery Networks (CDN’s) as Car Warehouses
As you can imagine, no matter how quickly you churn out those cars - the further they have to travel on the internet to the end browser, the longer it will take to receive the data. In the US, most highways have a maximum speed limit of 80 mph, imagine the same is true with sending and receiving data across the internet. There’s a (relatively) static time speed of transmission and the further they have to travel, the longer it will take for the browser to receive each package sent by the server. If you open up a website in rural Estonia, if the server your Browser is receiving information from is located all the way in North East Canada, you can expect that website to load substantially slower than if you open up the same site from North East Canada.
The birth of CDN’s came when a savvy group of engineers realized that it is possible to improve the delivery time of those cars by pre-making them and storing them at locations closer to their common requestors. Companies use CDN’s like Cloudflare, Sucuri, Amazon CloudFront to pre-load their large data packages and store them at locations that are geographically closer to their end users.
This means that the not only is the distance the data has to travel along that road substantially shorter but also that the data packages are less liable to be accosted by pirates (you’re less likely to have errors on receipt). You increase efficacy and reduce transmission time.
CDN’s can therefore be thought of as data warehouses spread across the globe to improve data transmission.