Kristian Lyngstøl's Blog

The Architecture of the Varnish Agent

Posted on 2013-02-15

Designing software architecture is fun.

The Varnish Agent 2 was written as a replacement for the original Varnish Agent. They both share the same purpose: expose node-specific Varnish features to a management system. They are designed very differently, though.

In this post I'd like to explain some choices that were made, and show you how to write your own code for the Varnish Agent 2. It's really not that hard.

The code can be found at: https://github.com/varnish/vagent2

Why C ?

The choice of C as a language was made fairly early. One of the main reasons is that Varnish itself is written in C, as are all the tools for Varnish. This means that by far the best-supported APIs for talking to Varnish are written in C.

But another reason is that C is a very good language. It has become a truism that you "never write web apps in C", more or less. There are good reasons for this: it takes time to set things up in C, C isn't very forgiving, and perhaps most importantly: people generally suck at C.

In the end, we chose C because it was the right tool for the job.

Requirements

When designing a new system, it's important to know what you're trying to achieve, and perhaps just as important to know what you're /not/ trying to achieve.

The Varnish Agent is designed to:

  • Manage a single Varnish server.
  • Remove the need for management frontends to know the Varnish CLI language.
  • Expose log data
  • Persist configuration changes
  • Require "0" configuration of the agent itself
  • Ensure that Varnish works on boot, even if there is no management front-end present.
  • Be expandable without major re-factoring.
  • Be easy to expand

What we did NOT want was:

  • Support for running the agent on a different machine than the Varnish server.
  • Elaborate self-management of the agent (e.g: support for users, and management of them).
  • Mechanisms that are opaque to a system administrator
  • Front-end code mixed with back-end code
  • "Sessions"

We've achieved pretty much all of these goals.

The heart of the agent: The module

At the heart of the agent, there is the module. As of this writing, there are 14 modules written. The average module is 211 lines of C code (including copyright and license). The smallest module, the echo module, is 92 lines of code (the echo plugin is an example plugin with extensive self documentation). The largest modules, the vlog and vcl modules, are both 387 lines of code.

To make modules useful, I spent most of the initial work on carving out how modules should work. This is currently how it works:

  • You define a module, say, src/modules/foobar.c
  • You write foobar_init(). This function is the only absolutely required part of the module. It will be run in the single-threaded stage of the agent.
  • You either hook into other modules (like the httpd-module), or define a start function.
  • After all plugins are initialized, the start function of each plugin is executed, if present.

That's it.

Since a common task is inter-operation between plugins, an IPC mechanism was needed. I threw together a simple message passing mechanism, inspired by Varnish. This lives in src/ipc.c and include/ipc.h. Currently, the only other way to talk to other modules is through httpd_register() (and logger(), but that's just a macro for ipc_run()).

If you want your foobar.c plugin to talk to the Varnish CLI, you go through the vadmin plugin. This is a two-step process:

int handle;

void foobar_init(struct agent_core_t *core)
{
    handle = ipc_register(core, "vadmin");
}

This part of the code gives you a socket to talk to the vadmin module. Actually talking to other modules in foobar_init() is not going to work, since the module isn't started yet.

Proper etiquette, though, is not to use a global variable, but to use the plugin structure for your plugin, present in core:

struct foobar_priv_t {
        int vadmin;
};

void foobar_init(struct agent_core_t *core)
{
        struct foobar_priv_t *priv = malloc(sizeof(struct foobar_priv_t));
        struct agent_plugin_t *plug;
        plug = plugin_find(core,"foobar");
        assert(plug);
        priv->vadmin = ipc_register(core,"vadmin");
        plug->data = (void *)priv;
        plug->start = NULL;
}

In this example, we have a private data structure for the module, which we allocate in the init function. Every plugin has a generic struct agent_plugin_t data structure already allocated for it and hooked onto the core->plugins list. This allows you to store generic data, as the core data structure is the one typically passed around.

Note

The Varnish agent uses a lot of assert()s. This is similar to what Varnish does. It lets you, the developer, state that "we assume this worked, but if it didn't, you really shouldn't just continue". It's excellent for catching obscure bugs before they actually become obscure. And it's excellent for letting you know where you actually need proper error handling.

Let's take a closer look at the generic struct agent_plugin_t:

struct agent_plugin_t {
        const char *name;
        void *data;
        struct ipc_t *ipc;
        struct agent_plugin_t *next;
        pthread_t *(*start)(struct agent_core_t *core, const char *name);
        pthread_t *thread;
};

The name should be obvious. The void *data is left for the plugin to define. It can be ignored if your plugin doesn't need any data at all (what does it do?).

struct ipc_t *ipc is the IPC-structure for the plugin. This tells you that all plugins have an IPC present. This is to allow you to run ipc_register() before a plugin has initialized itself. Otherwise we'd have to worry a lot more about which order modules were loaded.

Next is *next. This is simply because the plugins are part of a linked list.

The start() function pointer is used to define a function that will start your plugin. This function can do pretty much anything, but it has to return fairly fast. If it spawns off a thread, it's expected to return the pthread_t * data structure, as the agent will later wait for it to join. Similarly, *thread is used for the same purpose.

Using the IPC

You've got a handle to work with, let's use it. To do that, let's look at the vping plugin, starting with init and start:

static pthread_t *
vping_start(struct agent_core_t *core, const char *name)
{
        (void)name;
        pthread_t *thread = malloc(sizeof (pthread_t));
        pthread_create(thread,NULL,(*vping_run),core);
        return thread;
}

void
vping_init(struct agent_core_t *core)
{
        struct agent_plugin_t *plug;
        struct vping_priv_t *priv = malloc(sizeof(struct vping_priv_t));
        plug = plugin_find(core,"vping");

        priv->vadmin_sock = ipc_register(core,"vadmin");
        priv->logger = ipc_register(core,"logger");
        plug->data = (void *)priv;
        plug->start = vping_start;
}

vping_init() grabs a handle for the vadmin (Varnish admin interface) plugin, and for the logger. It also assigns vping_start() to the relevant pointer.

vping_start() simply spawns a thread that runs vping_run.

static void *vping_run(void *data)
{
        struct agent_core_t *core = (struct agent_core_t *)data;
        struct agent_plugin_t *plug;
        struct vping_priv_t *ping;
        struct ipc_ret_t vret;

        plug = plugin_find(core,"vping");
        ping = (struct vping_priv_t *) plug->data;

        logger(ping->logger, "Health check starting at 30 second intervals");
        while (1) {
                sleep(30);
                ipc_run(ping->vadmin_sock, &vret, "ping");
                if (vret.status != 200)
                        logger(ping->logger, "Ping failed. %d ", vret.status);
                free(vret.answer);

                ipc_run(ping->vadmin_sock, &vret, "status");
                if (vret.status != 200 || strcmp(vret.answer,"Child in state running"))
                        logger(ping->logger, "%d %s", vret.status, vret.answer);
                free(vret.answer);
        }
        return NULL;
}

The vping module was the first module written, before the Varnish admin interface was itself a module. It simply pings Varnish over the admin interface.

This also illustrates how to use the logger: Grab a handle, then use logger(handle,fmt,...), similar to how you'd use printf().

The IPC mechanism returns data through a vret-structure. For vadmin, this is precisely how Varnish would return it.

Warning

ipc_run() dynamically allocates memory for ret->answer. FREE IT.

The logger also returns a vret-like structure, but the logger() macro handles this for you.

Hooking up to HTTP!

Hooking up to HTTP is ridiculously easy.

Let's look at echo, comments removed:

struct echo_priv_t {
        int logger;
};

static unsigned int echo_reply(struct httpd_request *request, void *data)
{
        struct echo_priv_t *echo = data;
        logger(echo->logger, "Responding to request");
        send_response(request->connection, 200, request->data, request->ndata);
        return 0;
}

void echo_init(struct agent_core_t *core)
{
        struct echo_priv_t *priv = malloc(sizeof(struct echo_priv_t));
        struct agent_plugin_t *plug;
        plug = plugin_find(core,"echo");
        assert(plug);
        priv->logger = ipc_register(core,"logger");
        plug->data = (void *)priv;
        plug->start = NULL;
        httpd_register_url(core, "/echo", M_POST | M_PUT | M_GET, echo_reply, priv);
}

This is the ENTIRE echo plugin. httpd_register_url() is the key here. It registers a URL base, /echo in this case; a set of request methods (POST, PUT and GET in this case; DELETE is also supported); a callback to execute; and some optional private data.

The echo_reply function is now executed every time a POST, PUT or GET request is received for URLs starting with /echo.

You can respond with send_response() as demonstrated above, or the shorthands send_response_ok(request->connection, "Things are all OK!"); and send_response_fail(request->connection, "THINGS WENT BAD");.

Warning

Currently all HTTP requests are handled in a single thread. This means you really, really shouldn't block.

Still, make sure your handler is written with thread safety in mind. We might switch to a multi-threaded request handler in the future.

Know your HTTP

"REST"-interfaces are great, if implemented correctly. A short reminder:

  • GET requests are idempotent and should not cause side effects. They should be purely informational.
  • PUT requests are idempotent, but can cause side effects. Example: PUT /start can be run multiple times.
  • POST requests do not have to be idempotent, and can cause side effects. Example: POST /vcl/ will upload new copies of the VCL.
  • DELETE requests are idempotent, and can have side effects. Example: DELETE /vcl/foobar.

Test your code!

Unused code is broken code. Untested code is also broken code.

Pretty much all functionality is tested. Take a look in tests/.

If your code is to be included in an official release, someone has to write test cases.

I also advise you to add something in html/index.html to test it if that's feasible. It also tends to be quite fun.

Getting started

To get started, grab the code and get crackin'.

I advise you to read include/*.h thoroughly.

Comments

The Varnish Agent 2.1

Posted on 2013-01-31

We just released the Varnish Agent 2.1.

(Nice when you can start a blog post with some copy/paste!)

Two-ish weeks ago we released the first version of the new Varnish Agent, and now I have the pleasure of releasing a slightly more polished variant.

The work I've put in with it the last couple of weeks has gone towards increasing stability, resilience and fault tolerance. Some changes:

For a complete-ish log, see the closed tickets for the 2.1 milestone on github.

This underlines what we seek to achieve with the agent: A rock stable operational service that just works.

If you've got any features you'd like to see in the agent, this is the time to bring them forth!

I've already started working on 2.2 which will include a much more powerful API for the varnishlog data (see docs/LOG-API.rst in the repo), and improved HTTP handling, including authentication.

So head over to the demo, play with it, if you break it, let me know! Try to install the packages and tell me about any part of the installation process that you feel is awkward or not quite right.

Comments

The Varnish Agent

Posted on 2013-01-22

We just released the Varnish Agent 2.0.

The Varnish Agent is an HTTP REST interface to control Varnish. It also provides a proof of concept front-end in HTML/JavaScript. In other words: a fully functional Web UI for Varnish.

We use the agent to interface between our commercial Varnish Administration Console and Varnish. This is the first agent written in C and the first version exposing an HTTP REST interface, so while 2.0 might suggest some maturity, it might be wiser to consider it a tech preview.

/misc/agent-2.0.png

I've written the agent for the last few weeks, and it's been quite fun. This is the first time I've ever written JavaScript, and it was initially just an afterthought that quickly turned into something quite fun.

Some features:

I've had a lot of fun hacking on this and I hope you will have some fun playing with it too!

Comments

Tools of the trade - Job control

Posted on 2012-11-03

About the Tools of the trade series

After over a decade of using GNU/Linux, you pick up a few tricks. They become second nature to you. You don't even think about them when you're using them. They enter your regular tool chest, so to speak.

This blog post is the first in a series of what I hope to be many posts where I introduce basic tools, tricks and techniques to those of you who are less experienced with GNU/Linux. The goal is not to make you an expert on the tools, but to get you started and show you a few use cases.

As I hope to make this a series, please let me know if the style, topic and level of detail is appropriate, or if there are any particular topics that you're interested in.

If you enjoy the series, feel free to subscribe to the RSS, either for the entire blog (http://kly.no/feed.xml) or just these TOTT (tools of the trade) posts (http://kly.no/feedtott.xml). And of course, I'd appreciate it if you helped me spread the word to others who could find these posts interesting, even if they might not be for you.

Now, let's get started with job control and screen, two very simple tools that can make your life easier.

Job control, you say?

How often do you do this:

  • Open service_foo.conf
  • Edit
  • Save and close service_foo.conf
  • Restart the service foo
  • Get a syntax error
  • Reopen service_foo.conf
  • Navigate to the same position you were at
  • Edit
  • Save
  • Try restarting
  • etc etc

It's pretty common.

Or:

$ long_running_command
# Darn, should've started it in the background instead!
CTRL-C
$ long_running_command &

All of these situations can be dealt with using basic job control in your shell. Most proper shells have some form of job control, but since bash is by far the most common shell, we'll talk about how bash handles it.

It's actually very simple. Here's what you need to know:

Action   Effect
CTRL-Z   Stops the currently active job
$ jobs   Lists all jobs and their state
$ fg     Resumes the most recently stopped job in the foreground
$ fg x   Resumes job x, where x can be seen using the jobs command
$ bg     Sends the most recently stopped job to the background, as if you started it with &
$ bg x   Sends job x to the background

A job can be any command that would normally run in the foreground. You can also use %prefix instead of the job number, where the prefix is the command you started. For instance if you run man bash to read up on job control, then stop it, you could resume the job with fg %man.

Stopping a job is not the same as putting it in the background. When you stop a job, it actually stops running. For your editor, this doesn't matter. Here's a simple example where I just have a script output the time:

kristian@luke:~$ ( while sleep 1; do date ; done )
Sat Nov  3 03:05:16 CET 2012
Sat Nov  3 03:05:17 CET 2012
Sat Nov  3 03:05:18 CET 2012
Sat Nov  3 03:05:20 CET 2012
Sat Nov  3 03:05:21 CET 2012
Sat Nov  3 03:05:22 CET 2012
Sat Nov  3 03:05:23 CET 2012
^Z
[2]+  Stopped                 ( while sleep 1; do
    date;
done )
kristian@luke:~$ date
Sat Nov  3 03:05:33 CET 2012
kristian@luke:~$ jobs
[1]-  Stopped                 man bash
[2]+  Stopped                 ( while sleep 1; do
    date;
done )
kristian@luke:~$ fg 2
( while sleep 1; do
    date;
done )
Sat Nov  3 03:05:42 CET 2012
Sat Nov  3 03:05:43 CET 2012
Sat Nov  3 03:05:44 CET 2012
Sat Nov  3 03:05:45 CET 2012
^C

Notice how no time stamps were printed while the command was stopped.

If you wanted that, you would have to put the job in the background. When you do put jobs in the background, their output will generally pop up in your shell, just as if you had started them with & without redirecting output.

There are a few shortcuts to job control too, though I personally don't use them. Take a look at the Job Control chapter in man bash for more.

screen

Using your shell's job control is great for manipulating jobs within a single open shell, but it has many limitations too. It doesn't allow you to stop a job in one shell and open it up again in another (perhaps at a later time, from another machine).

Screen is most famous for allowing you to keep programs running even if you lose your connection.

Screen is a simple wrapper around any command you run. You typically start screen with just screen and end up in a plain shell. You can also start a single command directly, for instance using screen irssi. Under the hood you've now created a screen "server" which is what your applications are connected to, and a screen "client" which is what your terminal is looking at. If you close your terminal, the client will stop, but the server will keep running and the applications inside it will be unaware of the disappearance of the terminal. You can also detach from the server manually by hitting ctrl-a d. All screen-bindings start with ctrl-a. I'll have a little list further down.

Here's a demo:

kristian@luke:~$ cat screen-demo.sh
#!/bin/bash
while sleep 1; do
        date | tee -a screen-demo.log;
done
kristian@luke:~$ screen ./screen-demo.sh

(date printing starts)
^A d (detach)

[detached from 24859.pts-3.luke]
kristian@luke:~$ date
Sat Nov  3 03:24:53 CET 2012
kristian@luke:~$  tail -n 2 -f screen-demo.log
Sat Nov  3 03:24:53 CET 2012
Sat Nov  3 03:24:54 CET 2012
Sat Nov  3 03:24:55 CET 2012
Sat Nov  3 03:24:56 CET 2012
Sat Nov  3 03:24:57 CET 2012
Sat Nov  3 03:24:58 CET 2012
(keeps running)

The basics of screen are:

  • screen starts screen with a regular shell.
  • screen app starts screen running app. The app-argument can include arguments. screen irssi -! will start screen and irssi -!.
  • All screen-commands start with CTRL-a (^A).
  • CTRL-a d (^A d) detaches from screen. This happens automatically if you close the terminal or your ssh connection breaks or similar.
  • screen -r re-attaches to a screen session. If you have multiple screens running you will have to specify which one (it will prompt you to).
  • screen -r -d re-attaches to a screen session that you are still attached to somewhere else. This means that if you ssh to a server at work and open screen but forget to close it, you can take over that screen session when you get home for example.
  • screen -x attaches to a screen session without detaching any other screen clients. A good use case is ssh'ing to a server, starting screen and having your customer do the same with screen -x so he can see exactly what you're doing and even type himself. It's quite cool, so try it out!

Screen can also have multiple 'windows' inside a session. I mostly use "full screen windows" as they are simplest. Try it out while running screen:

  • Hit ^A c to create a new window.
  • Hit ^A n to go to the next window.
  • Hit ^A p to go to the previous window.
  • Hit ^A a to go to the window you were at last.

You can also show multiple windows at the same time (split screen) and jump to specific windows if you have many (e.g: jump from window 1 to 6 without going through window 2, 3, 4 and 5.). Check the screen manual page for more.

Screen has some quirks with regards to scrolling, though, so you may want to check out the man page for that too.

Tip

Ever need to re-configure network stuff over ssh?

Run the commands in screen.

What I often do is something along the lines of: ifdown eth0; sleep 5; ifup eth0; sleep 60 && ifconfig eth0 some-safe-ip for instance. This ensures that the commands run even if the connection drops. It also allows you to regain your old session if you have to reconnect.

Minor tips

  • Use tee file if you both want to write the output to a file (like echo blatti > foo does) and want to see the output at the same time. tee -a will append instead of overwrite, similar to what >> does.
  • Simple bash loops are wonderful for testing. I use variations of while true; do blatti; done, while sleep 1; do blatti; done; and for a in foo bar bat; do echo $a; done frequently.
Comments

Fun with Gawk

Posted on 2012-09-01

A few years ago I saw some AWK code a colleague had written. Up until that point I'd only really used awk for foo | awk '{print $2}' type stuff. I decided to take a closer look at AWK, and liked what I found.

Today I frequently use AWK for rapid prototyping or just massaging some input data beyond what's suitable with sed and cut. There are several reasons I use AWK for this. Mainly because it's quite efficient in a prototyping phase, but also because I find it a very fun and natural language to work with.

With GNU AWK (or just GAWK or gawk), you can even get fairly straightforward networking. It's limited of course, but it works well within those limits.

I've already written a munin node in gawk (see github), but today I got a challenge from a friend (well, more like a ruse?):

<Napta> have you not tried to write a modest caching server in gawk yet ? :D
<Kristian> that's fairly easy?
<Napta> so do it!
<Kristian> ......
<Kristian> I hate you
<Kristian> because now I have to

And 26 minutes later it was working quite well.

#!/usr/bin/gawk -f

function say(content) {
        printf "%s", content |& Service
}

function synthetic(status, response, msg) {
        say("HTTP/1.1 " status " " response "\n");
        say("Connection: close\n");
        say("\n");
        say(msg);
}

function reply(url) {
        say("HTTP/1.1 200 OK\n");
        say("Connection: close");
        say(cache[url] "\n");
}

function get(url) {
        print "GET " url " HTTP/1.1\n" |& Backend
        print "Connection: close\n\n" |& Backend
        
        Backend |& getline
        if ($2 != "200") {
                synthetic($2, "Bad backend", "Bad backend? Got: " $0)
        } else {
                cache[url] = ""
                while ((Backend |& getline c)>0)
                        cache[url] = cache[url] "\n" c
                reply(url)
        }
}

function handle_request() {
        Service |& getline
        url=$2
        request=$1
        if (request != "GET") {
                synthetic(405,"Only support GET","We only like GET");
                return;
        }
        if (cache[url]) {
                reply(url);     
                print "Cache hit: " url "\n";
        } else {
                print "Cache miss: " url "\n";
                get(url);
        }
}
        
BEGIN {
        LINT=1
        port = "8080"   
        backend = "kly.no"
        Service = "/inet/tcp/" port "/0/0"
        Backend = "/inet/tcp/0/" backend "/80"
        do {
                handle_request()
                close(Service)
                close(Backend)
        } while(1)

}

Or download it from /code/script/gawk_cacher

Comments