Posted on 2010-07-15
Ok, so you've just started working in a Big Sysadmin Department, and they use Varnish. You may have to deal with it, but you are not going to be working on it on a daily basis. This is some of what you should know about Varnish in case you have to extinguish some fires.
This post is heavily inspired by my visit to Paris last week, and as such, I'd like to say hello to Ludo and everyone else (this is mostly written on my plane ride back home).
Varnish is a reverse proxy. It operates on layer 7, unlike your typical load balancer (though some of them have layer 7 functionality, like basic health checks, they can hardly compare to Varnish). For all intents and purposes, Varnish is the first "web server" that a browser hits.
Varnish has a simple, yet powerful configuration language - the Varnish Configuration Language (VCL) - which allows it to make intelligent decisions on what to cache, and how. It can rewrite URLs, it can perform redirects, and it can direct traffic at different web servers based on both the specific request it is dealing with (for example, if the request is for "http://www.example.com/sports", it might go to a dedicated sports server) and on the state of the web server.
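As a rough sketch of what such routing can look like in Varnish 2.x VCL (the backend names, hosts and ports here are made up for illustration, not taken from any real setup):

```vcl
# Illustrative backends; hosts and ports are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
backend sports {
    .host = "192.0.2.10";
    .port = "8080";
}

sub vcl_recv {
    # Send /sports traffic on www.example.com to a dedicated sports server.
    if (req.http.host ~ "^www\.example\.com$" && req.url ~ "^/sports") {
        set req.backend = sports;
    }
}
```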
By default, Varnish should Just Work. It will only cache content it is certain can be safely cached. That means that it will only cache GET requests, and only content where no cookies are involved. This behavior can - and most likely will - be overridden by your VCL.
In regards to RFC2616 (the HTTP RFC), Varnish is not really what it is talking about when it refers to proxies and caches. Varnish belongs on the origin-server side of the equation, as the same people who control the web server also control the cache. It does not fit entirely, though, so we have made some common-sense approximations to fit Varnish into the RFC.
Varnish has two separate types of configuration. VCL defines policy, and only policy. It can be used to implement features that are otherwise not present (for example the ability to purge content simply by adding "?purge=yes" to the URL, or any other scheme like that).
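A rough VCL sketch of such a "?purge=yes" scheme might look like this (Varnish 2.x syntax; the ACL and the exact regexes are assumptions, and you should restrict who is allowed to purge):

```vcl
# Only allow purging from trusted addresses (illustrative ACL).
acl purgers {
    "localhost";
}

sub vcl_recv {
    if (req.url ~ "\?purge=yes$") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        # Strip the trigger suffix and purge the remaining URL.
        # Note: purge_url() treats its argument as a regular expression.
        purge_url(regsub(req.url, "\?purge=yes$", ""));
        error 200 "Purged";
    }
}
```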
The second type of configuration is program arguments - or parameters. Parameters relate to how Varnish behaves on the machine it is running on. You can define the size of the cache, the user and group to run Varnish under, what ports are used by Varnish, where Varnish will find its VCL, and so forth.
On Debian-based systems (including Ubuntu), you will typically find the parameters defined in /etc/default/varnish. Similarly, you will find them in /etc/sysconfig/varnish on Red Hat.
The VCL is often stored in /etc/varnish/, and typically /etc/varnish/default.vcl - but that is defined by a startup argument.
Additionally, you can change MOST of the parameters while Varnish is running. There is only one requirement for this: Varnish needs to be started with a -T argument, to enable the management interface.
The best way to use the management interface interactively is by telnetting to it. This gives you a nice and simple interface, with a "help" command. You can run the same commands with the "varnishadm" tool, but it will not give you as extensive error messages. The rule of thumb that I use is "telnet for interactive commands, varnishadm for scripting purposes".
Parameters are changed by using "param.set" after reviewing them with "param.show". Changes are applied immediately, but it might take some time before they become visible. A good example of this delay is changing the default TTL, as that only applies to objects entering the cache from that point on.
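An illustrative management-interface session (the port and the values here are assumptions; check the -T argument of your running Varnish for the actual address):

```
$ telnet localhost 6082
help
param.show default_ttl
param.set default_ttl 60
```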
You can also load and use new VCL while Varnish is running. This does not introduce any (known) glitches or slowdowns or delays in Varnish, as it is a matter of switching 4 pointers around - and varnish will still keep the old configurations around while they are still used (for instance if a backend is still in use).
Loading VCL is done with the "vcl.load <name> <file>" command. That will only _load_ and compile the configuration, not actually use it. If your VCL has syntax errors, this is when they will show up. After it has loaded, you can switch to the new configuration with "vcl.use <name>". Alternatively, use my varnish_reload script. Be very careful with "reload" arguments to init scripts, as they may be implemented using a restart.
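A typical sequence on the management interface looks like this (the configuration name "newconf" is just an example):

```
vcl.load newconf /etc/varnish/default.vcl
vcl.use newconf
vcl.list
```

The "vcl.list" at the end shows which configurations are loaded and which one is active.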
Starting Varnish is left as an exercise to the reader.
Run it, look at it, and understand it. This previous blog post gives a quick introduction to varnishstat: http://kly.no/posts/2009_12_08__Varnishstat_for_dummies__.html
One thing you should always check is the uptime of Varnish - in varnishstat. Because Varnish will "never" crash permanently. It can - however - crash repeatedly. This is because Varnish has two processes: a management process and an "everything else" process. If you have a bug, it is likely to take down the "everything else" process, and the management process will notice this and restart it immediately. This is logged to syslog in detail.
On Debian systems (including Ubuntu), /var/log/syslog is more extensive than plain /var/log/messages.
You also want to look out for disk activity. This is the number one killer of Varnish performance. If you are seeing extensive disk activity, it might make sense to reduce the size of the cache so it fits in memory. And tune the caching scheme to make sure there are no duplicate objects (for instance a www.example.com/foo object and a separate example.com/foo object). Or buy more memory. Or make sure the disk is an SSD. (Hint: vmstat 5).
One trick that is commonly applied to avoid disk activity is to put the shmlog on a tmpfs. This is generally not required, but doesn't hurt. The shmlog and the compiled VCL is typically stored at /usr/var/varnish/(hostname)/ or similar.
Do not let the virtual memory usage of Varnish scare you. But do be afraid of the resident memory usage. The virtual number includes the entire memory-mapped cache, while resident memory is what actually occupies RAM. It is not uncommon to see Varnish use 80G of virtual memory but only 28G resident on a machine with 32G.
Lastly you have varnishlog. This will tell you exactly what varnish is doing with each request, but is extremely extensive. Keep in mind that varnishlog can filter. Here are some freebies, and I'll leave it to the reader to expand upon them (or a future blog post):
varnishlog -o RxURL /some/url - List requests with "/some/url" in the URL. Including /some/url, /some/url/blah and /blah/some/url/bla.
varnishlog -b -o - Only list backend traffic.
varnishlog -b -o -i TxURL - List URLS going to a backend.
varnishlog -c -o RxHeader FireFox - List requests from clients with "FireFox" present in any of the HTTP headers supplied.
varnishlog -o TxStatus 500 - List all requests sent back to a client with status code 500.
Let's say your web server misbehaved and your VCL wasn't smart enough to spot it, and you've cached malformed data. The easiest way to clean that up is using the "purge" function through the management interface. It will not free up memory, but it will make sure the content is refreshed the next time someone asks for it. This supports regular expressions similar to what you find in VCL. I wrote a detailed post about purging: http://kly.no/posts/2010_02_02__Varnish_purges__.html
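For example, on the management interface (Varnish 2.x syntax; the URL is illustrative):

```
purge req.url ~ "^/front-page$"
```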
Don't use 32 bit.
Read my post on best practices: http://kly.no/posts/2010_01_26__Varnish_best_practices__.html
Don't overdo it. KISS is king. Standard OS packages, as simple a VCL as you can, as little tuning of parameters as you can. Etc.
Remember: If Varnish is told not to cache based on the response from a web server, it will cache the decision not to cache. This means that if your front page fails temporarily and Varnish is told not to cache it, make sure the TTL is low; otherwise all traffic to the front page will go to a web server until the hitpass object (http://kly.no/posts/2010_01_08__Hitpass_objects_and_Varnish__.html) has expired.
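One way to keep such hitpass objects short-lived is to set a low TTL at the point where you decide not to cache in vcl_fetch (a Varnish 2.1-style sketch; the Set-Cookie condition and the 10-second value are assumptions, not a recommendation):

```vcl
sub vcl_fetch {
    if (beresp.http.Set-Cookie) {
        # Remember the "do not cache" decision for only 10 seconds,
        # so a transient failure does not pin this URL in pass mode.
        set beresp.ttl = 10s;
        pass;
    }
}
```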
Ask for advice early. It's much harder to fix a system that is already deployed and in production than to help you avoid problems in the first place. You can get commercial help from us in the Varnish community (http://www.varnish-cache.org).