Kristian Lyngstøl's Blog

Common Varnish issues

Posted on 2009-05-25

Having worked on Varnish as a developer, stress tester and consultant, I've seen some strange ways of breaking Varnish and/or your general web rig. Here are some common examples.

1. Ignoring /var/log/syslog

Varnish sometimes crashes. However, Varnish is split into two processes. The management process continuously pings the child process, which is the one most likely to die. If it does die, the management process starts it up again within a few seconds, and all is seemingly normal, except for a few dropped connections and an empty cache. This is logged to /var/log/syslog. Because the downtime seems minimal, it's very common not to notice these restarts. If you can't decode the assert() errors, search for the function name in our trac.

(syslog blabber) Panic message: Assert error in WS_Release(), cache_ws.c line 170:#012 (tons of debug information).

In this case, searching for WS_Release is a good start.

Read your syslog. Monitor varnish uptime.
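
A minimal sketch of what "read your syslog" can look like when automated. It assumes panics are logged with the literal text "Panic message", as in the excerpt above, and that the log lives in /var/log/syslog. The function name count_varnish_panics is made up for this example.

```shell
#!/bin/sh
# Count Varnish panic lines in a log file.
# Assumption: panic lines contain the literal text "Panic message",
# as in the syslog excerpt above. The function name is hypothetical.
count_varnish_panics() {
    logfile="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the function from failing.
    grep -c 'Panic message' "$logfile" || true
}
```

Run something like count_varnish_panics /var/log/syslog from cron or your monitoring system and alert when the number grows. If your version of varnishstat exposes an uptime counter, a value that keeps dropping back towards zero tells the same story.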

2. Setting thread_pool_min too low

The default values for thread pools are ridiculously low. Varnish can adjust the number of threads dynamically, but you know your traffic pattern far better than Varnish does. Set thread_pool_min to what you expect your normal traffic to be, remembering that the number is multiplied by thread_pools (default: 2). Meaning: if you expect to handle 500 concurrent connections under normal load, set thread_pool_min to 300, assuming thread_pools is 2. This will leave you with 600 threads waiting to be used.

Varnish has a delay when it creates threads, so if you get a burst of traffic and suddenly need 100 new threads, it's going to take some time to create them, and in the meantime requests will be queued. You don't want that during normal operation, and idle threads are cheap. Having 1200 threads ready for use isn't going to have any significant negative impact. Having 5 threads ready on a 2500 req/s site is.

Starting varnish with something like this makes sense (in addition to your normal parameters):

varnishd -w 200,2000 [...]
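
The arithmetic behind picking thread_pool_min can be sketched in a few lines of shell. The function name and the idea of a headroom percentage are assumptions made for this example. The only facts taken from the text are that the per-pool minimum is multiplied by thread_pools, and that 500 expected connections with 2 pools was given a minimum of 300.

```shell
#!/bin/sh
# Work out a per-pool thread_pool_min from expected concurrency.
# The function name and the headroom percentage are illustrative assumptions.
thread_pool_min_for() {
    expected=$1      # concurrent connections during normal traffic
    pools=$2         # value of thread_pools (default: 2)
    headroom=$3      # extra capacity in percent, e.g. 20
    total=$(( expected * (100 + headroom) / 100 ))
    # Round up so that pools * result is never below the target.
    echo $(( (total + pools - 1) / pools ))
}

thread_pool_min_for 500 2 20   # prints 300, i.e. 600 threads in total
```

The result is the number you would pass as the first value to -w, or as -p thread_pool_min.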

3. Using ESI or similar without adjusting session workspace

ESI is cool, and it requires more session workspace. If you don't increase the value of sess_workspace (default: 16 kB; advised value: 32 kB or more), you are likely to see assert errors in your syslog.

Parameters for varnishd: varnishd -p sess_workspace=32768 [...]

4. Modifying headers in vcl_hit

This is a hidden Easter-egg crash. See bug #310. Modifying headers in vcl_hit will typically work during testing, but it isn't thread safe and will crash on production sites. If you need to modify headers, do it in vcl_deliver.

Bad:

sub vcl_hit {
    set obj.http.X-Cache = "HIT";
}

Good:

sub vcl_deliver {
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    }
}

5. Using non-standard startup scripts

We see this every now and then. It could work just fine, but our init scripts are well tested. Do yourself a favor: compare yours to ours and make sure you didn't forget anything significant (ulimit, for instance).
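
The ulimit point is easy to check by hand. Here is a rough sketch that compares the current open-files limit against a required minimum. The function name and the 131072 threshold are assumptions for this example; take the real number from the init script you are comparing against.

```shell
#!/bin/sh
# Check whether an open-files limit is at least a required value.
# The function name and the 131072 threshold are assumptions for this
# example; use the value from the init script you are comparing against.
nofile_ok() {
    current=$1
    required=$2
    # "ulimit -n" may print "unlimited", which always satisfies the check.
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required" ]
}

if nofile_ok "$(ulimit -n)" 131072; then
    echo "open-files limit looks fine"
else
    echo "open-files limit is $(ulimit -n); raise it before starting varnishd"
fi
```

Run it in the same environment your startup script runs in, since limits set in an interactive shell don't necessarily apply to a daemon started at boot.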

6. Putting Varnish in production without a support agreement

Seeing as I work for Redpill Linpro and specifically with Varnish, I'm not entirely impartial on this point, but I still believe it's a valid point to make.

Every now and then we get calls from people who've put Varnish in production without our help, and are experiencing problems. I don't mind, as it tends to lead to overtime, which, in turn, tends to lead to bonuses. But for you as a user, that means you have an unstable site while we desperately troubleshoot. And it's harder to fix Varnish in production, as testing is hard.

OK, it's possible to use Varnish without our help. You get the same software as everyone else does. But what happens if things break? We (Redpill Linpro) are not just the developers behind Varnish, but also the experts on troubleshooting and performance tuning.

Remember two things:

  • Varnish development depends on support agreements. It's free software, but development isn't free of cost. (And support agreements == priority on bugs)
  • If your site has enough traffic to warrant the use of Varnish, how much is it worth to you to ensure that it doesn't break as easily and that, if it does, help is available?

Get a support agreement.

Addendum

These are just a few common issues. Further points can be made when it comes to tuning, functionality and so forth. Feel free to add your own examples (or questions).