Posted on 2010-02-02
Varnish purges can be tricky. Both the model of purging that Varnish uses and the syntax you need to take advantage of it can be difficult to grasp. It took me about five Varnish Administration Courses until I was happy with how I explained it to the participants, especially because the purge syntax is the most confusing syntax we have in VCL. However, it's not very hard to work with once you understand the magic at work.
There are two ways to throw something out of the cache before the TTL is due in Varnish. You can either find the object you want gone and set its TTL to 0, forcing it to expire, or use Varnish's purge mechanism. Setting the TTL to 0 has its advantages, since you evict the object immediately, but it also means you have to evict one object at a time. This is fairly easy and usually done by having Varnish look for a "PURGE" request and handle it. This is not what I'll talk about today, though. Read http://varnish-cache.org/wiki/VCLExamplePurging for information on forcibly expiring an object.
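For reference, the forced-expiry approach usually looks something along these lines. This is an untested sketch in Varnish 2.0-era VCL, and the "purgers" ACL is just a name I picked for the example:

acl purgers { "127.0.0.1"; }

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        # Force a cache lookup even though PURGE is not GET or HEAD.
        lookup;
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        # Expire the object immediately.
        set obj.ttl = 0s;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache.";
    }
}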
The main reason people purge their cache is to make room for updated content. Some journalist updated an article, and you want the old one - possibly cached for days - gone. In addition, you may not know exactly what to purge, or it might be broader than just one item. An example would be a template used to generate multiple PHP pages. Or all sports articles.
All in all, you do not purge to conserve memory, because you expect the cache to fill up again soon.
If you are to purge all your PHP pages and you have 150 000 objects, you may not want to go looking for them all either. This is the reason some competing cache products are slow at large purges: by looking for all those objects, you might have to hit the disk to fetch cold ones.
In Varnish, we also leave it up to VCL to decide what's unique to an object. That is to say: you can override the cache hash. By default it's the host name or server IP combined with the URL. This is usually what people want, but sometimes you may want to add a cookie into the mix, for instance. The point is, we don't know exactly what people cache on.
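As an illustration, the default vcl_hash in Varnish 2.x looks roughly like the sketch below, and adding a cookie to the cache key is a couple of extra lines. Untested; the comment marks the addition:

sub vcl_hash {
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    # Added: make the cookie part of what identifies an object in the cache.
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
    hash;
}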
In Varnish, you purge by adding a purge to a list. This list can grow large if you add several very specific purges, but we try to reduce the overlap as much as possible. The purge itself can be pretty much anything you can match on in VCL: regular expressions on URLs, host names, even user-agents. You can see the list by typing "purge.list" in the command line interface (the CLI, typically reached over telnet).
Each object in your cache points to the last purge it was tested against. When you hit an object, Varnish checks if there are any new purges on the list, tests the object against them, and then either evicts the object and fetches a new one, or updates the "last tested against" pointer.
Because of this, the 'req' structure you are evaluating is actually that of the next client to access the object, not the client who pulled the object in from the backend. It also means that every single object in your cache that is hit will be tested against all purges to see if it matches. But it's spread out over time. It might sound wasteful, but it means you can add purges in constant time, and not really think about the cost of evaluating them.
It also means that an object which is never hit stays in the cache until it expires, so purging does not free up memory.
Want to purge http://example.com/somedirectory/ and everything beneath that path?
purge req.http.host == example.com && req.url ~ ^/somedirectory/.*$
or:
purge req.url ~ ^/somedirectory/ && req.http.host == example.com
Want to purge all objects with a "Cache-Control: max-age" of 3600?
purge obj.http.Cache-Control ~ max-age=3600
or, to be stricter about what matches:
purge obj.http.Cache-Control ~ max-age ?= ?3600[^0-9]
Notice that all of the variables are in the same "VCL context" as the next client to hit the object, so if you purge on req.http.user-agent, it's fairly random whether the object is really purged, because you (probably) can't predict what user-agent the next visitor to a specific object is using. If you wish to purge based on a parameter sent by the "original" client, you will have to store that parameter in obj.http somewhere, and remove it in vcl_deliver if you don't want to expose it.
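A minimal sketch of that trick, untested, assuming Varnish 2.x syntax where vcl_deliver addresses the outgoing response as resp.http. The X-Orig-User-Agent header name is made up for the example:

sub vcl_fetch {
    # Keep a copy of the fetching client's User-Agent on the cached object,
    # so later purges can match on obj.http.X-Orig-User-Agent.
    set obj.http.X-Orig-User-Agent = req.http.User-Agent;
}

sub vcl_deliver {
    # Hide the stored header from clients; the copy on the object itself stays.
    remove resp.http.X-Orig-User-Agent;
}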
This is where it gets tricky. The canonical example of why is this: purge("req.url == " req.url);
Normal programming thinking would tell you that this matches everything, since the URL is always equal to itself. This is where VCL string concatenation comes into the picture. In reality, you are writing: "add this to the purge list: the string containing "req.url == " followed by the value of the variable req.url".
In other words, if a client accesses http://example.com/foobar and hits the code above, this says: "Add the string containing "req.url == " and "/foobar" to the purge list." The quotation marks are essential!
I find it easier to think of it as preparing a string for the purge command on the CLI. Varnish concatenates strings without any special operator; placing them next to each other joins them.
In the end, this is the rule of thumb: put everything you expect to see literally when you type "purge.list" inside quotation marks, and put the things you want replaced with values from the calling session outside them.
So you actually have three different VCL contexts to worry about:
- The context that originally pulled the object in from a backend (not much you can do here unless you hide things in obj.http).
- The context that will hit the object and thereby test it against the purge. Any variable in this context has to be inside quotation marks.
- The context that triggered the purge. Variables from this context should be outside quotation marks, so they are replaced with their string values before being added to the purge list.
The reason you do not need quotation marks when you enter the purge command on the command line interface is that you don't have the third context. There is no req.url in telnet, since you are not going through VCL at all.
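To make the contrast concrete, sticking with the /foobar example above (my reading of how the two forms line up): on the CLI you would type
purge req.url == /foobar
while in VCL, triggered by a client requesting /foobar, you would write
purge("req.url == " req.url);
Both should end up adding the same "req.url == /foobar" entry to the purge list.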
Some examples. Note that when I say "supplied by the client", I mean the client initiating the purge, typically some smart system you've set up:
Purge object on the current host and URLs matching the regex stored in the X-Purge-Regex header supplied by the client:
purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);
Purge all PHP pages for any example.com domain:
purge("req.http.host ~ example.com$ && req.url ~ ^/.*\.php");
Same, but for the host provided in the X-Purge-HostPHP header:
purge("req.http.host ~ " req.http.X-Purge-HostPHP " && req.url ~ ^/.*\.php");
Purge objects with X-Cache-Channel set to "sport":
purge("obj.http.X-Cache-Channel ~ sport");
Same, but purge the cache channel set in the 'X-Purge-CC' header of the purging request:
purge("obj.http.X-Cache-Channel ~ " req.http.X-Purge-CC);
Purge in vcl_fetch if the backend sent an X-Purge-URL header (a weird thing to do, but a fun example):
sub vcl_fetch {
    (....)
    if (obj.http.X-Purge-URL) {
        purge("req.url ~ " obj.http.X-Purge-URL);
    }
    (...)
}
(PS: I have not actually tested all these examples, but they look correct)