Setting Cache-Control headers for common content types in Nginx and Apache

Last updated: June 8th 2020

Introduction

Cache-Control is an HTTP header that holds caching instructions for both requests and responses. It defines how a resource is cached, where it is cached, and its maximum age before it expires. When you visit a website, your browser saves images and other website data in a store called the cache. When you revisit the same website, Cache-Control sets the rules that determine whether resources are loaded from your local cache or whether the browser should send a request to the server for fresh copies.

For a better understanding of how the browser renders pages quickly using the cache, you need to know about browser caching and HTTP headers.

What is browser caching?

Browser caching is the temporary storage of web documents, such as images, media and pages. The intent behind this is to reduce bandwidth usage and make web pages load faster in the browser. When you revisit a web page, there is no need to re-download all of its components, which results in a faster page load. Browsers keep those resources only for a specific period of time, called the time to live (TTL). Once the TTL has expired, the browser has to reach out to the server again and download a fresh copy of the resource.

What are HTTP headers?

HTTP headers are a core part of HTTP requests and responses and carry information about the request or response. Header names are case-insensitive, and each header is a name-value pair, separated by a colon, in clear-text string format. For example, request headers contain information on what resource is being requested, which browser the client is using and what data formats the client will accept, while response headers contain information on whether the request was successfully fulfilled and on the language and format of any resources in the body of the response.
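To make the name-value structure concrete, here is a minimal Python sketch that parses a raw header block with the standard library's email parser, which understands the same basic "Name: value" syntax. The sample header text and the parse_headers helper are illustrative assumptions, not part of any server setup described here.

```python
from email.parser import Parser

# Illustrative header block (assumed sample data, not a live response).
RAW_HEADERS = (
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Cache-Control: public, max-age=2592000\r\n"
    "Server: gws\r\n"
    "\r\n"
)


def parse_headers(raw: str):
    """Parse a raw header block into a mapping with case-insensitive lookup."""
    return Parser().parsestr(raw, headersonly=True)


headers = parse_headers(RAW_HEADERS)

# Header names are case-insensitive, so both lookups return the same value.
print(headers["cache-control"])
print(headers["CACHE-CONTROL"])
```

Both print calls show the same Cache-Control value, because the lookup ignores the case of the header name.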

The cache-control header is broken up into directives. You can see the cache-control header of https://google.com with the following command:

# curl -I https://google.com

You should get the following output:

HTTP/1.1 301 Moved Permanently
Location: https://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 05 Jun 2020 03:03:10 GMT
Expires: Sun, 05 Jul 2020 03:03:10 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 220
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

As you can see, the header name, Cache-Control, is to the left of the colon and its value is to the right. The value can contain one or several comma-separated entries. These entries are called directives, and they dictate who can cache a resource as well as how long that resource can be cached before it must be updated.
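As a rough illustration of how such a value breaks down into directives, the following Python sketch splits a Cache-Control value on commas. The function name parse_cache_control is hypothetical, and the sketch assumes that no directive argument contains a comma inside a quoted string, which a full parser would have to handle.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into a {directive: argument} mapping.

    Directives without an argument (for example "public") map to None.
    Directive names are lowercased because they are case-insensitive.
    Simplification: commas inside quoted arguments are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("public, max-age=2592000")
print(parsed)  # {'public': None, 'max-age': '2592000'}
print(int(parsed["max-age"]) // 86400, "days")  # 30 days
```

Applied to the header from the curl output, the sketch shows that the response may be stored by any cache for 2592000 seconds, which is 30 days.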

The most common Cache-Control directives are detailed below.

Cache-Control: Public

This directive indicates that the response may be stored by any cache, even if the response is normally non-cacheable.

Cache-Control: Private

This directive indicates that the response can only be cached by the browser that is accessing the file. It cannot be cached by an intermediary agent such as a proxy or CDN.

Cache-Control: Max-Age

This directive indicates the maximum amount of time a resource is considered fresh, in other words how many seconds a resource can be served from the cache after it has been downloaded. For example, max-age=3600 means that the returned resource is valid for 3600 seconds, after which the browser has to request a newer version.
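The freshness rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: is_fresh is a hypothetical helper, timestamps are plain seconds, and details a real cache also considers (such as the Age header or clock skew) are ignored.

```python
def is_fresh(stored_at: float, now: float, max_age: int) -> bool:
    """Return True while a cached response is younger than max-age seconds.

    stored_at and now are timestamps in seconds. Simplification: the Age
    header and clock skew between client and server are not considered.
    """
    current_age = now - stored_at
    return current_age < max_age


MAX_AGE = 3600  # the max-age=3600 example from the text

print(is_fresh(stored_at=0, now=600, max_age=MAX_AGE))   # True: 10 minutes old
print(is_fresh(stored_at=0, now=3600, max_age=MAX_AGE))  # False: lifetime used up
print(is_fresh(stored_at=0, now=4000, max_age=MAX_AGE))  # False: expired
```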

You can also use a technique supported by asset build tools such as Webpack or Gulp to force the browser to download a new version of a file. These tools precompile each file and add a hash of its contents to the file name, such as “app-72420c47cc.css”. After the next deployment, a changed file gets a new name, so the browser downloads the new version instead of using the cached one.
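The idea behind hashed file names can be sketched in Python as follows. This is not how Webpack or Gulp implement it internally; the function fingerprinted_name, the choice of SHA-256 and the ten-character digest length are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digits: int = 10) -> str:
    """Insert a short hash of the file content before the file extension.

    Assumption: SHA-256 truncated to `digits` hex characters; real build
    tools use their own hashing schemes.
    """
    digest = hashlib.sha256(content).hexdigest()[:digits]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}-{digest}{path.suffix}"))


old_name = fingerprinted_name("app.css", b"body { color: black; }")
new_name = fingerprinted_name("app.css", b"body { color: navy; }")

print(old_name)              # app-<10 hex characters>.css
print(old_name == new_name)  # False: changed content yields a new name
```

Because the name changes whenever the content changes, the file can safely be cached for a long time under each name.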

Cache-Control: No-Cache

This directive indicates that a browser may cache a response, but must first submit a validation request to the origin server before using it. This directive does not prevent caches from storing your response. It allows the response to be cached, but for every subsequent request for the same resource the client needs to check with the server whether the resource has changed. Only if the resource has not changed does the client serve the stored copy from the cache.

If you apply the technique from the previous section but let browsers cache your html files for a long time, visitors will not get the new links to your css, js, or image files until they force a reload.

It is recommended to use Cache-Control: no-cache for html files, so that the browser validates them with the server before serving them from the cache.

Cache-Control: No-Store

This directive indicates that the response should never be cached. For example, you would not want banking details to be stored in any browser cache. For those kinds of purposes, you can use no-store.

ETag (Entity tag)

The ETag HTTP response header is a cache validator used to determine whether a component in the browser's cache matches the one on the origin server. This helps to improve loading times: if the cached copy is still current, the server can answer with a short 304 Not Modified response and the browser does not need to download the resource again.

The ETag or entity tag is part of HTTP, the protocol for the World Wide Web. It is one of several mechanisms that HTTP provides for web cache validation, and which allows a client to make conditional requests. This allows caches to be more efficient, and saves bandwidth, as a web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control, as a way to help prevent simultaneous updates of a resource from overwriting each other.

An ETag is an opaque identifier assigned by a web server to a specific version of a resource found at a URL. If the resource content at that URL ever changes, a new and different ETag is assigned. Used in this manner ETags are similar to fingerprints, and they can be quickly compared to determine whether two versions of a resource are the same. Comparing ETags only makes sense with respect to one URL—ETags for resources obtained from different URLs may or may not be equal, so no meaning can be inferred from their comparison.
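The validation round trip described above can be sketched in Python. This is a simplified model, not a real server: make_etag and respond are hypothetical helpers, the ETag is assumed to be a truncated SHA-256 hash of the body, and weak validators (the W/ prefix) and the "*" wildcard of If-None-Match are not handled.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload), honouring an If-None-Match value."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # The client's copy is current: send no body at all.
            return 304, headers, b""
    return 200, headers, body


# First visit: full response, the client stores the body and its ETag.
status, headers, payload = respond(b"<html>v1</html>")

# Revisit with unchanged content: the server confirms with 304 and no body.
status_same, _, payload_same = respond(b"<html>v1</html>", headers["ETag"])

# Revisit after the content changed: the ETag differs, so a full 200 is sent.
status_changed, _, _ = respond(b"<html>v2</html>", headers["ETag"])

print(status, status_same, status_changed)  # 200 304 200
```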

The following - mildly daunting - chart may help with deciding what specific cache directives should be added to a resource:

cache_chart.png

Configure Cache-Control Headers for Apache and Nginx Webserver

In this section, we will show you how to set the HTTP Cache-Control header in Apache and Nginx.

Apache

For the Apache web server, you will need to create a .htaccess file inside your website root directory to implement the HTTP Cache-Control header. The Header directive used below requires the mod_headers module to be enabled.

# nano /var/www/html/.htaccess

Add the following contents:

<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
    Header set Cache-Control "max-age=3600, public"
</FilesMatch>

As you can see, we set the Cache-Control header's max-age to 3600 seconds and to public for the listed files.

Nginx

For the Nginx web server, you will need to edit your website virtual host configuration file to implement the HTTP Cache-Control header.

# nano /etc/nginx/sites-enabled/webdock

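You would then add the following contents, which are similar to the Apache directives seen above. Note that they are not strictly equivalent: expires 1d makes Nginx send an Expires header together with Cache-Control: max-age=86400 (one day rather than 3600 seconds), and the add_header line adds the public and no-transform directives: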

location ~* \.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$ {
    expires 1d;
    add_header Cache-Control "public, no-transform";
}

Recommended Settings

We recommend the following settings for all cacheable resources:

  1. Set Cache-Control: max-age or Expires to a minimum of one month and up to one year.
  2. Set the Last-Modified date to the last time the resource was changed.
  3. If you need precise control over when resources are invalidated, we recommend using a URL fingerprinting or versioning technique.
  4. For js, css, and image files, set Cache-Control: public, max-age=31536000 and omit the ETag and Last-Modified headers.
  5. For html files, use Cache-Control: no-cache together with an ETag.
  6. Use Webpack, Gulp or other tools and add unique hash digits to your js, css and image files. (For example, app-67ce7f3483.css). This will force the browser to download a new version of the needed file.
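The recommendations above can be summarized as a small lookup, sketched here in Python. The function recommended_cache_control and the extension sets are hypothetical, and the one-month fallback for other file types is an assumption based on the minimum named in item 1.

```python
from pathlib import PurePosixPath

# Assumed set of fingerprinted asset types (js, css and common image formats).
LONG_LIVED = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".ico"}
HTML = {".html", ".htm"}


def recommended_cache_control(path: str) -> str:
    """Pick a Cache-Control value for a file path, following the list above."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in LONG_LIVED:
        return "public, max-age=31536000"  # one year, for hashed assets
    if suffix in HTML:
        return "no-cache"  # always validate html with the server
    # Assumption: other cacheable files get the one-month minimum from item 1.
    return "public, max-age=2592000"


print(recommended_cache_control("app-67ce7f3483.css"))  # public, max-age=31536000
print(recommended_cache_control("index.html"))          # no-cache
print(recommended_cache_control("manual.pdf"))          # public, max-age=2592000
```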