The Architecture of the Varnish Agent

Posted on 2013-02-15

Designing software architecture is fun.

The Varnish Agent 2 was written as a replacement for the original Varnish Agent. They both share the same purpose: exposing node-specific Varnish features to a management system. They are designed very differently, though.

In this post I'd like to explain some choices that were made, and show you how to write your own code for the Varnish Agent 2. It's really not that hard.

The code can be found at: https://github.com/varnish/vagent2

Why C?

The choice of C as a language was made fairly early. One of the main reasons is that Varnish itself is written in C, as are all the tools for Varnish. This means that by far the best-supported APIs for talking to Varnish are C APIs.

But another reason is that C is a very good language. It has become a common misconception, more or less, that you "never write web apps in C". There are good reasons for this belief: it takes time to set things up in C, C isn't very forgiving, and, perhaps most importantly, people generally suck at C.

In the end, we chose C because it was the right tool for the job.

Requirements

When designing a new system, it's important to know what you're trying to achieve, and perhaps just as important to know what you're /not/ trying to achieve.

The Varnish Agent is designed to:

Manage a single Varnish server.
Remove the need for management front-ends to know the Varnish CLI language.
Expose log data.
Persist configuration changes.
Require "0" configuration of the agent itself.
Ensure that Varnish works on boot, even if there is no management front-end present.
Be expandable without major refactoring.
Be easy to expand.

What we did NOT want was:

Support for running the agent on a different machine than the Varnish server.
Elaborate self-management of the agent (e.g. support for users, and management of them).
Mechanisms that are opaque to a system administrator.
Front-end code mixed with back-end code.
"Sessions".

We've achieved pretty much all of these goals.

The heart of the agent: The module

At the heart of the agent, there is the module. As of this writing, there are 14 modules written. The average module is 211 lines of C code (including copyright and license). The smallest module, the echo module, is 92 lines of code (it is an example plugin with extensive self-documentation). The largest modules, the vlog and vcl modules, are both 387 lines of code.

To make modules useful, I spent most of the initial effort on working out how modules should work. This is how it currently works:

You define a module, say, src/modules/foobar.c.
You write foobar_init(). This function is the only absolutely required part of the module. It is run in the single-threaded stage of the agent.
You either hook into other modules (like the httpd module) or define a start function.
After all plugins are initialized, the start function of each plugin is executed, if present.

That's it.
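To make the two phases concrete, here is a minimal, self-contained sketch of the init-then-start cycle. The toy_core and toy_plugin types, toy_plugin_find() and toy_run_plugins() are hypothetical simplifications written for this illustration, not the agent's real definitions (those live in include/). Real start functions may also spawn threads rather than doing their work inline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the agent's core and plugin
 * structures; the real definitions live in include/. */
struct toy_core;

struct toy_plugin {
        const char *name;
        void (*init)(struct toy_core *core);  /* always present */
        void (*start)(struct toy_core *core); /* optional, set by init */
        struct toy_plugin *next;
};

struct toy_core {
        struct toy_plugin *plugins; /* linked list of all plugins */
        int started;                /* how many start functions ran */
};

static struct toy_plugin *toy_plugin_find(struct toy_core *core, const char *name)
{
        for (struct toy_plugin *p = core->plugins; p != NULL; p = p->next)
                if (strcmp(p->name, name) == 0)
                        return p;
        return NULL;
}

static void foobar_start(struct toy_core *core)
{
        core->started++; /* every plugin has been initialized by now */
}

/* Runs in the single-threaded phase: wire things up, talk to nobody. */
static void foobar_init(struct toy_core *core)
{
        struct toy_plugin *plug = toy_plugin_find(core, "foobar");
        if (plug != NULL)
                plug->start = foobar_start;
}

/* A plugin that only hooks into others and has no start function. */
static void echo_init(struct toy_core *core)
{
        (void)core;
}

/* Phase 1: run every init. Phase 2: run every start an init installed. */
static void toy_run_plugins(struct toy_core *core)
{
        struct toy_plugin *p;

        for (p = core->plugins; p != NULL; p = p->next)
                p->init(core);
        for (p = core->plugins; p != NULL; p = p->next)
                if (p->start != NULL)
                        p->start(core);
}
```

Every init runs before any start, so foobar_start() can rely on the other plugins being set up. That ordering is what lets the init stage stay single-threaded and free of inter-module chatter.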

Since a common task is interoperation between plugins, an IPC mechanism was needed. I threw together a simple message-passing mechanism, inspired by Varnish. This lives in src/ipc.c and include/ipc.h. The only other way to currently talk to other modules is through httpd_register (and logger(), but that's just a macro for ipc_run()).
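The core idea is that a module asks for another module by name once and keeps the integer handle it gets back. It then sends text commands through that handle and receives a status and an answer for each. The sketch below shows that shape under a big simplification. The real src/ipc.c passes messages to other threads over sockets, while this toy version calls a handler function directly. All toy_* names and fake_vadmin() are hypothetical. Unlike in the real agent, a module here must be provided before anyone can register against it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply, modeled on the status/answer pair described later. */
struct toy_ret {
        unsigned status;
        char *answer; /* heap-allocated; the caller must free() it */
};

typedef void (*toy_handler)(const char *cmd, struct toy_ret *ret);

#define TOY_MAX_MODULES 8

static struct {
        const char *name;
        toy_handler handler;
} toy_modules[TOY_MAX_MODULES];
static int toy_nmodules;

/* A module announces itself under a name. */
static void toy_ipc_provide(const char *name, toy_handler h)
{
        if (toy_nmodules < TOY_MAX_MODULES) {
                toy_modules[toy_nmodules].name = name;
                toy_modules[toy_nmodules].handler = h;
                toy_nmodules++;
        }
}

/* Another module looks it up once and keeps the handle (-1 if unknown). */
static int toy_ipc_register(const char *name)
{
        for (int i = 0; i < toy_nmodules; i++)
                if (strcmp(toy_modules[i].name, name) == 0)
                        return i;
        return -1;
}

/* Send a command through a valid handle; status and answer come back in ret. */
static void toy_ipc_run(int handle, struct toy_ret *ret, const char *cmd)
{
        toy_modules[handle].handler(cmd, ret);
}

static char *toy_strdup(const char *s)
{
        char *d = malloc(strlen(s) + 1);
        if (d != NULL)
                strcpy(d, s);
        return d;
}

/* Example provider: a fake "vadmin" that only understands "ping". */
static void fake_vadmin(const char *cmd, struct toy_ret *ret)
{
        if (strcmp(cmd, "ping") == 0) {
                ret->status = 200;
                ret->answer = toy_strdup("PONG"); /* simplified reply text */
        } else {
                ret->status = 101; /* Varnish CLI code for an unknown request */
                ret->answer = toy_strdup("Unknown request");
        }
}
```

The answer is heap-allocated and owned by the caller, which mirrors the rule for ipc_run() described further down.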

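If you want your foobar.c plugin to talk to the Varnish CLI, you go through the vadmin plugin. This is a two-step process: first you register a handle in your init function, then you use that handle once the agent is running. The first step looks like this: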

int handle;

void foobar_init(struct agent_core_t *core)
{
    handle = ipc_register(core, "vadmin");
}

This gives you a handle (a socket under the hood) for talking to the vadmin module. Actually talking to other modules in foobar_init() won't work, since they haven't been started yet.

Proper etiquette, however, is not to use a global variable but to store the handle in your plugin's own structure, which is reachable from core:

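#include <assert.h>
#include <stdlib.h>

struct foobar_priv_t {
        int vadmin;     /* IPC handle for the vadmin plugin */
};

void foobar_init(struct agent_core_t *core)
{
        struct foobar_priv_t *priv = malloc(sizeof(struct foobar_priv_t));
        struct agent_plugin_t *plug;

        assert(priv);
        plug = plugin_find(core,"foobar");
        assert(plug);
        priv->vadmin = ipc_register(core,"vadmin");
        plug->data = (void *)priv;      /* hang our private data off the plugin */
        plug->start = NULL;             /* no start function needed */
}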

In this example, we have a private data structure for the module, which we allocate in the init function. Every plugin already has a generic struct agent_plugin_t allocated for it and hooked onto the core->plugins list. This lets you store plugin-specific data where your own code can find it again, since the core data structure is the one typically passed around.

Note

The Varnish Agent uses a lot of assert()s, much like Varnish itself does. An assert() lets you, the developer, state: "we assume this worked, and if it didn't, we really shouldn't just continue." It's excellent for catching obscure bugs before they actually become obscure. And it's excellent for showing you where you actually need proper error-handling code.
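Here is a short illustration of where each tool fits. It assumes a hypothetical parse_port() helper that is not part of the agent. An assert() guards something that should never fail, while input that can legitimately be wrong gets real error handling.

```c
#include <assert.h>
#include <stdlib.h>

struct foobar_priv_t {
        int vadmin;
};

/* Invariant: if malloc() fails this early there is no sensible way to
 * continue, so stop loudly instead of limping along. */
static struct foobar_priv_t *foobar_priv_new(int vadmin_handle)
{
        struct foobar_priv_t *priv = malloc(sizeof(*priv));
        assert(priv != NULL);
        priv->vadmin = vadmin_handle;
        return priv;
}

/* Hypothetical helper. Bad user input is not a bug, so it gets a return
 * code instead of an assert(). Returns 0 on success, -1 on bad input. */
static int parse_port(const char *s, int *port)
{
        char *end;
        long v = strtol(s, &end, 10);

        if (end == s || *end != '\0' || v < 1 || v > 65535)
                return -1;
        *port = (int)v;
        return 0;
}
```

Keep in mind that the standard assert() is compiled out when NDEBUG is defined. Code that relies on it for safety must therefore not be built that way.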

Let's take a closer look at the generic struct agent_plugin_t:

struct agent_plugin_t {
        const char *name;
        void *data;
        struct ipc_t *ipc;
        struct agent_plugin_t *next;
        pthread_t *(*start)(struct agent_core_t *core, const char *name);
        pthread_t *thread;
};

The name should be obvious. The void *data is left for the plugin to define. It can be ignored if your plugin doesn't need any data at all (what does it do?).

struct ipc_t *ipc is the IPC structure for the plugin. Every plugin has one, which allows you to run ipc_register() against a plugin before that plugin has initialized itself. Otherwise we'd have to worry a lot more about the order in which modules were loaded.

Next is *next. This is simply because the plugins are part of a linked list.

The start() function pointer is used to define a function that will start your plugin. This function can do pretty much anything, but it has to return fairly quickly. If it spawns a thread, it's expected to return the pthread_t * for that thread, as the agent will later wait for it to join. Similarly, the *thread member serves the same purpose.
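The contract for start() can be sketched with plain POSIX threads: return quickly, and hand back a heap-allocated pthread_t that the agent later joins. toy_core, worker(), toy_start() and toy_join() are hypothetical names for this sketch. The worker here stops after a few iterations so the join can finish, whereas a real plugin thread such as vping_run() loops forever.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct agent_core_t. */
struct toy_core {
        int ticks;
};

static void *worker(void *data)
{
        struct toy_core *core = data;

        /* A real plugin would loop forever; this one does a fixed
         * amount of work so that the join below completes. */
        for (int i = 0; i < 3; i++)
                core->ticks++;
        return NULL;
}

/* Same shape as the start pointer: spawn, return at once, hand back
 * the thread (or NULL if nothing was started). */
static pthread_t *toy_start(struct toy_core *core, const char *name)
{
        pthread_t *thread;

        (void)name;
        thread = malloc(sizeof(*thread));
        if (thread == NULL)
                return NULL;
        if (pthread_create(thread, NULL, worker, core) != 0) {
                free(thread);
                return NULL;
        }
        return thread;
}

/* What the agent does later: wait for each thread a start() returned. */
static void toy_join(pthread_t *thread)
{
        if (thread != NULL) {
                pthread_join(*thread, NULL);
                free(thread);
        }
}
```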

Using the IPC

You've got a handle to work with, so let's use it. To do that, let's look at the vping plugin, starting with init and start:

static void *vping_run(void *data);     /* defined below */

static pthread_t *
vping_start(struct agent_core_t *core, const char *name)
{
        (void)name;
        pthread_t *thread = malloc(sizeof (pthread_t));
        assert(thread);
        pthread_create(thread, NULL, vping_run, core);
        return thread;
}

void
vping_init(struct agent_core_t *core)
{
        struct agent_plugin_t *plug;
        struct vping_priv_t *priv = malloc(sizeof(struct vping_priv_t));
        assert(priv);
        plug = plugin_find(core,"vping");
        assert(plug);

        priv->vadmin_sock = ipc_register(core,"vadmin");
        priv->logger = ipc_register(core,"logger");
        plug->data = (void *)priv;
        plug->start = vping_start;
}

vping_init() grabs a handle for the vadmin (Varnish admin interface) plugin and one for the logger. It also assigns vping_start() to the relevant pointer, plug->start.

vping_start() simply spawns a thread that runs vping_run.

static void *vping_run(void *data)
{
        struct agent_core_t *core = (struct agent_core_t *)data;
        struct agent_plugin_t *plug;
        struct vping_priv_t *ping;
        struct ipc_ret_t vret;

        plug = plugin_find(core,"vping");
        ping = (struct vping_priv_t *) plug->data;

        logger(ping->logger, "Health check starting at 30 second intervals");
        while (1) {
                sleep(30);
                ipc_run(ping->vadmin_sock, &vret, "ping");
                if (vret.status != 200)
                        logger(ping->logger, "Ping failed. %d ", vret.status);
                free(vret.answer);

                ipc_run(ping->vadmin_sock, &vret, "status");
                if (vret.status != 200 || strcmp(vret.answer,"Child in state running"))
                        logger(ping->logger, "%d %s", vret.status, vret.answer);
                free(vret.answer);
        }
        return NULL;
}

The vping module was the first module written, before the Varnish admin interface itself became a module. It simply pings Varnish over the admin interface.

This also illustrates how to use the logger: Grab a handle, then use logger(handle,fmt,...), similar to how you'd use printf().

The IPC mechanism returns data through a struct ipc_ret_t (vret in the example above), which carries a status code and an answer. For vadmin, these are exactly what Varnish itself returns.

Warning

ipc_run() dynamically allocates memory for ret->answer. FREE IT.

The logger also returns a vret-like structure, but the logger() macro handles this for you.
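Here is one way a printf-like logger macro can hide the reply structure and free its answer. It assumes a hypothetical toy_ipc_run() that formats its arguments with vsnprintf() and always allocates an answer. This is a sketch of the pattern described above, not the agent's actual logger() definition.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_ret {
        unsigned status;
        char *answer;
};

/* Hypothetical log sink: keeps the last message so it can be inspected. */
static char toy_last_log[256];

/* printf-like "send": format the message, deliver it, and hand back a
 * heap-allocated answer, as the post says ipc_run() does. */
static void toy_ipc_run(int handle, struct toy_ret *ret, const char *fmt, ...)
{
        va_list ap;

        (void)handle;
        va_start(ap, fmt);
        vsnprintf(toy_last_log, sizeof(toy_last_log), fmt, ap);
        va_end(ap);
        ret->status = 200;
        ret->answer = malloc(3);
        if (ret->answer != NULL)
                strcpy(ret->answer, "OK");
}

/* Convenience wrapper: callers never see the reply and never leak it. */
#define toy_logger(handle, ...)                                 \
        do {                                                    \
                struct toy_ret toy_lret_;                       \
                toy_ipc_run((handle), &toy_lret_, __VA_ARGS__); \
                free(toy_lret_.answer);                         \
        } while (0)
```

The do { ... } while (0) wrapper lets the macro behave like a single statement, so it is safe as the body of an unbraced if, the way vping_run() uses logger().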

Hooking up to HTTP!

Hooking up to HTTP is ridiculously easy.

Let's look at the echo plugin, with comments removed:

struct echo_priv_t {
        int logger;
};

static unsigned int echo_reply(struct httpd_request *request, void *data)
{
        struct echo_priv_t *echo = data;
        logger(echo->logger, "Responding to request");
        send_response(request->connection, 200, request->data, request->ndata);
        return 0;
}

void echo_init(struct agent_core_t *core)
{
        struct echo_priv_t *priv = malloc(sizeof(struct echo_priv_t));
        struct agent_plugin_t *plug;
        plug = plugin_find(core,"echo");
        assert(plug);
        priv->logger = ipc_register(core,"logger");
        plug->data = (void *)priv;
        plug->start = NULL;
        httpd_register_url(core, "/echo", M_POST | M_PUT | M_GET, echo_reply, priv);
}

This is the ENTIRE echo plugin. httpd_register_url() is the key here. It registers a URL base (/echo in this case), a set of request methods (POST, PUT and GET in this case; DELETE is also supported), a callback to execute and some optional private data.

The echo_reply function is now executed every time a POST, PUT or GET request is received for URLs starting with /echo.

You can respond with send_response() as demonstrated above, or the shorthands send_response_ok(request->connection, "Things are all OK!"); and send_response_fail(request->connection, "THINGS WENT BAD");.
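The URL registration can be pictured as a table of (prefix, method mask, callback, private data) entries that is searched on every request. This is a hypothetical sketch of that idea. toy_register_url(), toy_dispatch() and the TOY_* method bits are stand-ins, not the agent's httpd module.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical method bits, in the style of M_GET | M_POST above. */
enum { TOY_GET = 1 << 0, TOY_POST = 1 << 1, TOY_PUT = 1 << 2, TOY_DELETE = 1 << 3 };

typedef unsigned (*toy_cb)(const char *url, void *data);

struct toy_route {
        const char *prefix;
        unsigned methods;
        toy_cb cb;
        void *data;
};

#define TOY_MAX_ROUTES 16
static struct toy_route toy_routes[TOY_MAX_ROUTES];
static size_t toy_nroutes;

static int toy_register_url(const char *prefix, unsigned methods, toy_cb cb, void *data)
{
        if (toy_nroutes == TOY_MAX_ROUTES)
                return -1;
        toy_routes[toy_nroutes++] = (struct toy_route){ prefix, methods, cb, data };
        return 0;
}

/* Returns the callback's result, or -1 if no route matched (a real
 * server would answer 404 or 405 here). */
static int toy_dispatch(const char *url, unsigned method)
{
        for (size_t i = 0; i < toy_nroutes; i++) {
                struct toy_route *r = &toy_routes[i];
                if (strncmp(url, r->prefix, strlen(r->prefix)) == 0 &&
                    (r->methods & method))
                        return (int)r->cb(url, r->data);
        }
        return -1;
}

/* Example callback: counts calls through its private data. */
static unsigned toy_echo(const char *url, void *data)
{
        (void)url;
        (*(int *)data)++;
        return 0;
}
```

A plain prefix match like this one also accepts /echoes. A stricter dispatcher would check that the prefix is followed by '/' or by the end of the URL.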

Warning

Currently all HTTP requests are handled in a single thread. This means you really, really shouldn't block.

Still, write your code with thread safety in mind. We might switch to a multi-threaded request handler in the future.
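One way to keep a callback safe for a future multi-threaded handler is to guard any shared state in its private data with a mutex and keep the critical section short. counter_priv_t, counter_hit() and hammer() below are hypothetical. hammer() just simulates several request threads calling the same callback.

```c
#include <pthread.h>

/* Hypothetical private data for a plugin whose callback might one day
 * run on several threads at once. */
struct counter_priv_t {
        pthread_mutex_t lock;
        unsigned long requests;
};

static void counter_priv_init(struct counter_priv_t *priv)
{
        pthread_mutex_init(&priv->lock, NULL);
        priv->requests = 0;
}

/* Shared state is only touched with the lock held. The critical section
 * is tiny, so it never holds up the current single HTTP thread. */
static unsigned long counter_hit(struct counter_priv_t *priv)
{
        unsigned long n;

        pthread_mutex_lock(&priv->lock);
        n = ++priv->requests;
        pthread_mutex_unlock(&priv->lock);
        return n;
}

/* Simulates one request thread hitting the callback repeatedly. */
static void *hammer(void *data)
{
        for (int i = 0; i < 1000; i++)
                counter_hit(data);
        return NULL;
}
```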

Know your HTTP

"REST" interfaces are great, if implemented correctly. A short reminder:

GET requests are idempotent and should not cause side effects. They should be purely informational.
PUT requests are idempotent, but can cause side effects. Example: PUT /start can be run multiple times.
POST requests do not have to be idempotent, and can cause side effects. Example: POST /vcl/ will upload new copies of the VCL.
DELETE requests are idempotent, and can have side effects. Example: DELETE /vcl/foobar.
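The following is a hypothetical in-memory model, not the agent's real state handling. Repeating PUT /start leaves the state unchanged, while every POST /vcl/ creates something new. The toy_state type and the handler names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical model of agent state, only to show which calls are
 * safe to repeat. */
struct toy_state {
        int running;      /* set by PUT /start */
        size_t vcl_count; /* grows with every POST /vcl/ */
};

/* PUT /start: sets a state; doing it twice is the same as doing it once. */
static void put_start(struct toy_state *s)
{
        s->running = 1;
}

/* POST /vcl/: stores a new VCL each time, so repeats pile up. */
static size_t post_vcl(struct toy_state *s)
{
        return ++s->vcl_count; /* id of the newly stored VCL */
}

/* GET: read-only, no side effects. */
static int get_running(const struct toy_state *s)
{
        return s->running;
}
```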

Test your code!

Unused code is broken code. Untested code is also broken code.

Pretty much all functionality is tested. Take a look in tests/.

If your code is to be included in an official release, someone has to write test cases.

I also advise you to add something in html/index.html to test it if that's feasible. It also tends to be quite fun.

Getting started

To get started, grab the code and get crackin'.

I advise you to read include/*.h thoroughly.

Kristian Lyngstøl's Blog