Posted on 2015-09-25
I was hacking together a JavaScript varnishstat implementation for a customer a few days ago when I noticed something strange. I had put Varnish in front of the agent delivering the stats, but I was only caching the statistics for 1 second.
But the cache hit rate was 100%.
And the stats were updating?
Logically speaking, how can you hit cache 100% of the time and still get fresh content all the time?
Grace mode is a feature Varnish has had since version 2.0 back in 2008. It is a fairly simple mechanic: Add a little bit of extra cache duration to an object. This is the grace period. If a request is made for the object during that grace period, the cached copy can be delivered while the object is being updated.
This reduces the thundering herd problem when a large number of users request recently expired content, and it can drastically improve user experience when updating content is expensive.
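As a rough sketch, assuming Varnish 4 VCL and made-up backend details, the setup for my stats agent could look something like this: a 1 second TTL with a 10 second grace period on top.

    vcl 4.0;

    backend default {
        # Hypothetical address for the agent delivering the stats.
        .host = "127.0.0.1";
        .port = "8080";
    }

    sub vcl_backend_response {
        # Cache the statistics for 1 second, but keep serving the stale copy
        # for up to 10 extra seconds while a fresh one is being fetched.
        set beresp.ttl = 1s;
        set beresp.grace = 10s;
    }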
The big change that happened in Varnish 4 was background fetches.
Varnish uses a very simple thread model (so to speak). Essentially, each session is handled by one thread. In versions prior to Varnish 4, a backend request was always tied to a client request.
If the cache is empty, there isn't much of a reason NOT to do this. Grace mode, however, always complicated things. What PHK did to solve it was, in my opinion, quite brilliant in its simplicity, even if it was a trade-off.
With grace mode, you HAVE the content, you just need to make sure it's updated. In Varnish 2 and 3 it looked something like this: the client requests the object, Varnish finds it in the cache with an expired TTL (but still within grace), fires off a backend request, and the client waits for that request to finish before getting the fresh content.
So ... NO CHANGE. For a single client, you don't have grace mode in earlier Varnish versions.
But enter client number 2 (or 3, 4, 5...): while client number 1 is stuck waiting for the backend to answer, these clients request the same object. Varnish sees that a fetch is already under way and that the cached copy is still within its grace period, so they get the stale copy delivered immediately.
So with Varnish 2 and 3, only the first client will block waiting for new content. This is still an issue, but it does the trick for the majority of use cases.
Background fetches changed all this. It's more complicated in many ways, but from a grace perspective, it massively simplifies everything.
With Varnish 4 you get: the client requests the object, Varnish sees that the TTL has expired but that the object is still within its grace period, delivers the stale copy right away, and kicks off a background fetch to refresh the cache for whoever asks next.
And so forth. Strictly speaking, I suppose this makes grace /less/ magical...
In other words: The first client will also get a cache hit, but Varnish will update the content in the background for you.
It just works.
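If you want to see where this hooks into VCL, the built-in vcl_hit in Varnish 4 boils down to roughly the following (paraphrased from memory, so check the builtin.vcl shipped with your version for the authoritative text):

    sub vcl_hit {
        if (obj.ttl >= 0s) {
            // A pure hit: the object is still fresh, just deliver it.
            return (deliver);
        }
        if (obj.ttl + obj.grace > 0s) {
            // TTL has expired but we are within grace: deliver the stale copy,
            // Varnish triggers the background fetch on its own.
            return (deliver);
        }
        // Neither fresh nor in grace: go to the backend and wait for it.
        return (fetch);
    }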
What is a cache hit?
If I tell you that I have 100% cache hit rate, how much backend traffic would you expect?
We want to keep track of two ratios: the cache hit rate (how many client requests are answered from cache) and the fetch/request ratio (how many backend fetches we make per client request).
For my application, a single user will result in a 100% cache hit rate, but also a fetch/request ratio of 100%. The cache isn't really offloading the backend load significantly until I have multiple users of the app. Mind you, if the application was slow, this would still benefit that one user.
The latter is also interesting from a security point of view. If you find the right type of request, you could end up with more backend fetches than client requests (e.g. due to restarts/retries).
You most likely already have grace. It is turned on by default, with a 10 second grace period. For frequently updated content, this is enough.
Varnish 4 changed some of the VCL and parameters related to grace. The important bits are:
- The grace period for an object is set with beresp.grace in vcl_backend_response (which replaced vcl_fetch).
- The default, used when VCL doesn't set anything, comes from the default_grace parameter (10 seconds out of the box).
- In vcl_hit you can inspect obj.ttl and obj.grace to decide what to do with an object that has outlived its TTL.
If you want to override grace mechanics, you can do so in vcl_recv by setting req.ttl, which defines a maximum TTL to be accepted for an object regardless of its actual TTL. That bit is a bit mysterious.
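A minimal sketch of that approach, assuming a hypothetical requirement that this particular request never accepts anything older than 5 seconds:

    sub vcl_recv {
        # Illustrative cap: do not accept a cached object older than 5 seconds
        # for this request, regardless of the TTL it was stored with.
        set req.ttl = 5s;
    }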
Or you can look at vcl_hit. Here you'll be able to do:
    if (obj.ttl + obj.grace > 0s && obj.ttl <= 0s) {
        // We are in grace mode, but we do have an object
        if (req.http.x-magic-skip-grace-header ~ "yes") {
            return (miss);
        } else {
            return (deliver);
        }
    }
The above example snippet evaluates whether the object has an expired TTL but is still within the grace period. If that is the case, it looks for a client header called "X-Magic-Skip-Grace-Header" and checks if it contains the string "yes". If so, the request is treated as a cache miss; otherwise, the cached object is delivered.