Posted on 2010-01-19
A while ago I wrote VSTS - the Varnish Stress Testing Suite. It's not nearly as fancy as the name makes it sound: it's a simple set of shell scripts that runs on a set of test servers and pounds Varnish under a couple of different scenarios. The idea is simple: detect possible performance issues during the development cycle of Varnish by periodically running the current code through the same well-established tests.
It does the job, but could be far better, especially when it comes to statistics.
The rrdtool-backed Python script is simply insufficient. rrdtool likes to put each data entry into a time slot, and I don't really operate in time slots, so the data ends up both slow and imprecise. This is fine if you want data for every 5 minutes throughout a year, but not if you want data collected anywhere from once to ten times a day. Simply put: I want one data entry to represent one unit on the X-scale.
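To make the "one entry, one unit on the X-scale" idea concrete, here is a minimal sketch. It is not what VSTS does today. It assumes each test run is appended as one CSV row (a label followed by measurements), and the names `append_run` and `load_series` are made up for illustration. The point is only that the X value is the run's position in the file, not a timestamp bucket.

```python
import csv
import io


def append_run(stream, label, values):
    """Append one test run as one row: a label followed by its measurements."""
    csv.writer(stream).writerow([label, *values])


def load_series(stream, column):
    """Return (xs, ys) for one measurement column.

    xs is simply the run index (0, 1, 2, ...), so every run occupies
    exactly one unit on the X axis no matter when it was executed.
    """
    ys = [float(row[column]) for row in csv.reader(stream)]
    xs = list(range(len(ys)))
    return xs, ys


if __name__ == "__main__":
    # Hypothetical data: label, requests per second, average latency.
    buf = io.StringIO()
    append_run(buf, "run-a", [1200.0, 0.8])
    append_run(buf, "run-b", [1150.0, 0.9])
    append_run(buf, "run-c", [1310.0, 0.7])
    buf.seek(0)
    print(load_series(buf, 1))
```

Three runs on one day and none the next would still plot as three evenly spaced points, which is the behaviour rrdtool's fixed time slots make awkward.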
I also want to add "real-time" statistics. The current data is collected after each test, but I want data to be gathered throughout the relevant tests. This should be separate from the other statistics. Hopefully, the system I end up with is flexible enough to allow me to plot all the datasets on the same graph, which should make any variations easy to spot.
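Gathering data throughout a test, rather than only after it, could look roughly like the sketch below. It is only an illustration under assumptions: `run_test` and `read_counters` are hypothetical callables standing in for whatever launches a test and whatever reads the current counters, and `sample_during` is an invented name. The samples come back as their own list, kept apart from the per-run results.

```python
import threading
import time


def sample_during(run_test, read_counters, interval=1.0):
    """Run run_test() in a background thread and sample while it runs.

    Returns a list of (elapsed_seconds, counters) tuples. One sample is
    taken at the start, one roughly every `interval` seconds, and one
    final sample after the test has finished.
    """
    samples = []
    worker = threading.Thread(target=run_test)
    start = time.monotonic()
    worker.start()
    while True:
        samples.append((time.monotonic() - start, read_counters()))
        # Wait up to `interval` seconds, returning early if the test ends.
        worker.join(timeout=interval)
        if not worker.is_alive():
            break
    samples.append((time.monotonic() - start, read_counters()))
    return samples


if __name__ == "__main__":
    # Stand-ins for a real test and a real counter source.
    state = {"requests": 0}

    def fake_test():
        for _ in range(5):
            state["requests"] += 100
            time.sleep(0.02)

    for elapsed, counters in sample_during(fake_test, lambda: dict(state), 0.02):
        print(round(elapsed, 2), counters)
```

Because each sample carries the time elapsed since the test started, several runs of the same test can be overlaid on one graph for comparison.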
Up until now I've only ever dealt with rrdtool when it comes to statistics (unless you count pre-rrdtool mrtg and the like), so I was hoping for some pointers before I start digging through what I suspect is a jungle of similar but different solutions. I'm comfortable working with most languages, and my primary concerns are implementation complexity and flexibility.
When I'm done with this batch of refactoring, I'll do a proper writeup. However, what I'm most excited about is adding OpenSolaris to the target platforms (which should help clean out a few bugs that have been plaguing Solaris users for a while), adding better support for customized tests, and improving the robustness of VSTS.