
.. footer:: Oslo, May 2010

============
Weird C Shit
============
:Author: Kristian Lyngstol
:Contact: kristian@bohemians.org
:Date: Today

.. contents::
   :class: handout

Introduction
============

- C is subtle and full of relics from the past
- Understanding equals equals mastering
- You can now stab someone in the face across TCP/IP/C

Today
=====

- Macros
- Syntactical peculiarities
- Coercing and casting
- Circumventing the safety mechanisms
- Duff's Device

Macros: The Usual Suspects
==========================

- Duplicating side-effects
- Eating semi-colons
- String-conversion of macro arguments
- Variadic macros
- Textbook stuff, yet somehow rarely known until you stumble upon it.

.. topic:: Macros: Duplicating side-effects
   :class: handout
      
        ::

                #define MAX(a,b) (a > b ? a : b)
                int x = MAX(i++,j--);
                // Becomes: i++ > j-- ? i++ : j--;
                // Possible solution (GNU C):

                #define MAX(a,b) ({ \
                                typeof(a) a_ = (a); \
                                typeof(b) b_ = (b); \
                                a_ > b_ ? a_ : b_; \
                        })

        This could also be done as an inline function, but then you still
        have to deal with different types. Perhaps a macro for that too?

        Also note the ({ ... }) syntax, a GNU "statement expression". While
        it should generally be avoided, it allows you to put a block of code
        anywhere an expression is expected. The value of the last statement
        in the block becomes the "value" of the whole expression.

        .. note::
           typeof() and ({ ... }) are GNU extensions (typeof itself was
           only standardized in C23). If portability matters: avoid them.
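
.. container:: handout

   A portable alternative to the statement expression is a static inline
   function (C99), sketched here for int only; max_int and max_demo are
   hypothetical names. Function arguments are evaluated exactly once
   before the call, so the i++ and j-- from the slide each happen once.

   ::

        /* Hypothetical helper: portable, but tied to a single type. */
        static inline int max_int(int a, int b)
        {
                return a > b ? a : b;
        }

        /* Mirrors MAX(i++,j--): each side effect runs exactly once. */
        static int max_demo(int *i, int *j)
        {
                return max_int((*i)++, (*j)--);
        }

   With i == 1 and j == 5, max_demo() returns 5 and leaves i == 2 and
   j == 4, while the first MAX macro would have decremented j twice.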

.. topic:: Macros: Eating semi-colons 1
   :class: handout

        Treating function-like macros as actual functions can often lead to
        rude awakenings. Eating semicolons is one of the easier problems to
        deal with, since your compiler will usually (though not always)
        yell at you.

        Consider the following:

        ::

                #define howlongisthis(s) strlen(s);
                x = howlongisthis(y) + 5;
                (...)
                #define TOUPPER(s) { \
                                if (s<='z' && s>='a') \
                                        s -= 'a'-'A'; \
                          }
                if (foo)
                        TOUPPER(c);
                else
                        puts("hi");

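        The first example actually compiles, which is worse: it expands to
        "x = strlen(y); + 5;", and "+ 5;" is a valid (if useless) statement
        of its own, so the + 5 silently disappears. At best, -Wall gets you
        a "statement with no effect" warning. The second fails to compile:
        the semicolon after the braces ends the if-statement, leaving a
        disconnected else.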

        Solution::
        
                #define TOUPPER(s) do { \
                                if (s<='z' && s>='a') \
                                        s -= 'a'-'A'; \
                          } while(0)
                // No semi-colon ---^

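        For howlongisthis(), simply drop the semicolon from the definition.
        For multi-statement macros, do { ... } while (0) is the most common
        solution: it forms a single statement that needs the caller's
        semicolon to be complete, so the macro itself must not end with one.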
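
.. container:: handout

   A self-contained sketch of both fixes, assuming an ASCII character
   set. shout_if() and length_plus_five() are hypothetical callers; the
   first mirrors the unbraced if/else from the slide.

   ::

        #include <string.h>

        /* No trailing semicolon: the caller supplies it. */
        #define howlongisthis(s) strlen(s)

        /* A single statement, so it is safe in an unbraced if/else. */
        #define TOUPPER(s) do { \
                        if ((s) <= 'z' && (s) >= 'a') \
                                (s) -= 'a' - 'A'; \
                } while (0)

        static char shout_if(int foo, char c)
        {
                if (foo)
                        TOUPPER(c);
                else
                        c = '?';
                return c;
        }

        static size_t length_plus_five(const char *y)
        {
                return howlongisthis(y) + 5;   /* now really adds 5 */
        }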

Macro example
=============

examples/6-macros.c

.. container:: handout

        The original example is: ::

                #include <stdio.h>
                #include <stdarg.h>
                #define NPRINT(n, fmt,...) { \
                        int i; \
                        for(i=0; i<n; i++) \
                                printf("n:%d " fmt, i, __VA_ARGS__); \
                        }

                int main(int x, char **argv) {
                        if (x<2) return 1;
                        x=atoi(argv[1]);
                        if (x>15)    NPRINT(x=15, "More: %s\n", argv[0]);
                        else if(x<1) NPRINT(x=1,"Less than!\n");
                        else         NPRINT(x--, "OK\n");
                        return 0;
                }
        
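        However, this doesn't compile, nor does it work. There are a number
        of issues with it: semicolon eating (the braces plus the caller's
        semicolon leave the else dangling), side-effect duplication (n is
        re-evaluated on every iteration, so x-- runs each time) and an empty
        __VA_ARGS__ (leaving a trailing comma in the printf() call).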

        A working version could look like this: ::

                #include <stdio.h>
                #include <stdlib.h>     /* atoi() */
                /* typeof and ##__VA_ARGS__ are GNU extensions */
                #define NPRINT(n, fmt,...) do { \
                        typeof(n) _n = (n); /* evaluate n once */ \
                        int i; \
                        for(i=0; i<_n; i++) \
                                printf("n:%d " fmt, i, ##__VA_ARGS__); \
                        } while(0)

                int main(int x, char **argv) {
                        if (x<2) return 1;
                        x=atoi(argv[1]);
                        if (x>15)    NPRINT(x=15, "More: %s\n", argv[0]);
                        else if(x<1) NPRINT(x=1,"Less than!\n");
                        else         NPRINT(x--, "OK\n");
                        return 0;
                }
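
.. container:: handout

   If the GNU extensions are off the table, one portable (C99) sketch is
   to fold fmt into the variadic part, so __VA_ARGS__ is never empty, and
   to copy n into a plain int instead of using typeof. The names n_, i_
   and run() are hypothetical; run() returns x so that the single
   evaluation of x-- can be checked.

   ::

        #include <stdio.h>

        #define NPRINT(n, ...) do { \
                        int n_ = (n);   /* n evaluated exactly once */ \
                        for (int i_ = 0; i_ < n_; i_++) { \
                                printf("n:%d ", i_); \
                                printf(__VA_ARGS__); \
                        } \
                } while (0)

        static int run(int x)
        {
                if (x > 15)     NPRINT(x = 15, "More: %d\n", x);
                else if (x < 1) NPRINT(x = 1, "Less than!\n");
                else            NPRINT(x--, "OK\n");
                return x;
        }

   The trade-off is that n is always converted to int, where the typeof
   version kept the caller's type.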

	
Macro arguments
===============

* ``#foo`` is converted to the string literal of the argument.
* ``foo ## bar`` glues two tokens together into one.
* ``__VA_ARGS__`` and ``##__VA_ARGS__`` expand to the same thing, but the
  GNU ``##__VA_ARGS__`` form removes the preceding comma if there are no
  extra arguments.

.. container:: handout

   Variadic functions are well-known and useful (stdarg.h, e.g. printf).
   However, before C99 added variadic macros, a macro could not pass a
   variable number of arguments on to them.

   Passing the arguments on to a variadic function is not the only use.
   Consider the following (braindead) example:

   ::

        enum {
                item_ONE=0,
                item_TWO,
                item_THREE,
                nitem
        };

        struct {
                int index;
                const char * name;
                const int * values;
        } item[nitem];

        /* The parameter is "id", not "name": a parameter called name
         * would also replace the ".name" member below. */
        #define ADDITEM(id,...) do { \
                static const int values_[] = { __VA_ARGS__ }; \
                item[item_ ## id].index = item_ ## id; \
                item[item_ ## id].name = #id; \
                item[item_ ## id].values = values_; \
                } while (0)

        int foo(void) {
                ADDITEM(ONE,5,2,1);
                ADDITEM(TWO,1,2,3);
                ADDITEM(THREE,9,4,2);
                ...
        }
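
.. container:: handout

   A small self-contained sketch of all three operators. STR, CAT, LOGF
   and answer_42 are hypothetical names, and LOGF relies on the GNU
   ``##__VA_ARGS__`` extension (accepted by GCC and Clang).

   ::

        #include <stdio.h>

        /* #x: the argument's spelling becomes a string literal. */
        #define STR(x) #x

        /* a ## b: two tokens are pasted into one identifier. */
        #define CAT(a, b) a ## b

        /* GNU ##__VA_ARGS__: the comma vanishes when no extra arguments
         * are given, so LOGF(buf, "hi") still compiles. buf must be an
         * array, not a pointer, for sizeof to work. */
        #define LOGF(buf, fmt, ...) \
                snprintf(buf, sizeof(buf), "log: " fmt, ##__VA_ARGS__)

        /* Expands to: static int answer_42(void) */
        static int CAT(answer_, 42)(void)
        {
                return 42;
        }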

X-Macros
========

- Make a header file, fruit-list.h, with a list of items: ::

        X(Banana,5)
        X(Grapefruit,20)
        X(Apple,5)

- Then use it: ::
        
        #define X(f, p) \
                printf("One " #f " costs %d\n", p);
        #include "fruit-list.h"
        #undef X

X-Macros 2
==========
::

        #define X(f, p) e_ ## f,
        enum {
                #include "fruit-list.h"
                e_NUM
        };
        #undef X
        struct { char *name; int price; } fruit[e_NUM];

        void init_fruit(void) { /* assignments need a function */
        #define X(f, p) \
                fruit[e_ ## f].name = #f; \
                fruit[e_ ## f].price = p;
        #include "fruit-list.h"
        #undef X
        }
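
.. container:: handout

   A self-contained sketch of the same idea. To fit in one file, the list
   lives in a macro that takes X as a parameter (a common variant) instead
   of in fruit-list.h; FRUIT_LIST, AS_ENUM, AS_ENTRY and total_price are
   hypothetical names. Designated initializers (C99) fill the table at
   compile time instead of in a function.

   ::

        /* The list; each use site passes its own X. */
        #define FRUIT_LIST(X) \
                X(Banana, 5) \
                X(Grapefruit, 20) \
                X(Apple, 5)

        /* One enum constant per item, plus a count. */
        #define AS_ENUM(f, p) e_ ## f,
        enum fruit_id { FRUIT_LIST(AS_ENUM) e_NUM };
        #undef AS_ENUM

        /* A table indexed by those constants. */
        #define AS_ENTRY(f, p) [e_ ## f] = { #f, p },
        static const struct {
                const char *name;
                int price;
        } fruit[e_NUM] = {
                FRUIT_LIST(AS_ENTRY)
        };
        #undef AS_ENTRY

        static int total_price(void)
        {
                int sum = 0;
                for (int i = 0; i < e_NUM; i++)
                        sum += fruit[i].price;
                return sum;
        }

   Adding a fruit is a one-line change to the list; the enum, the count
   and the table all follow automatically.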


The arrow-operator
==================

::

	int n=50;
	while(n --> 0)
		printf("n: %d\n",n);
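
.. container:: handout

   There is no arrow operator, of course: ``n --> 0`` is tokenized as
   ``n-- > 0``, a post-decrement followed by a comparison. The test sees
   the old value and the body sees the decremented one, so the slide
   prints 49 down to 0. A testable sketch, with countdown_sum as a
   hypothetical name:

   ::

        /* Sums the values the loop body sees: n-1, n-2, ..., 0. */
        static int countdown_sum(int n)
        {
                int sum = 0;
                while (n --> 0)         /* i.e. while ((n--) > 0) */
                        sum += n;
                return sum;
        }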

Questioning your code
=====================

::

        #include <stdlib.h>

        int main(int argc, char **argv) {
                char (*fp)(int) = (char (*)())exit;
                argc>1 ? (*fp)(0) : 0;
                // or:
                argc>1 && (*fp)(0);
        }

.. class:: handout

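   The ternary operator, while far more useful inside a larger expression
   such as a printf() argument, can also be used as a statement on its own.
   However, its two result operands must have compatible types: both may be
   void, but a void call such as exit(0) can not be paired with an int such
   as 0. Casting exit to a function returning char sidesteps that, at the
   price of undefined behavior when it is called through the wrong type.

   The && operator is stricter still: both operands must have scalar type,
   so a void operand is never allowed.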
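
.. container:: handout

   A well-defined version of the trick needs no function-pointer cast:
   make both result operands void. In this sketch, fake_exit() is a
   hypothetical stand-in for exit() that merely counts calls, so the
   behavior can be tested without terminating the program.

   ::

        static int exit_calls = 0;

        /* Hypothetical stand-in for exit(): records the call instead. */
        static void fake_exit(int code)
        {
                (void)code;
                exit_calls++;
        }

        static void quit_if(int argc)
        {
                /* Both operands are void: valid C, no cast needed. */
                argc > 1 ? fake_exit(0) : (void)0;
        }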

Recursion
=========

::
        
        main(s){
                gets(&s)&&puts(&s,main());
        }

.. class:: handout

   Recursion is simple and rarely as useful as it seems at first glance.
   A naive recursive quicksort, for instance, recurses as deep as the input
   is long on already-sorted data, which is a good way to run out of stack
   on any dataset of some size.

   And yes, in C main() is "just another function". It can be called
   recursively just like everything else. Of course, in this example, there
   are quite a few other nasty things going on.
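
.. container:: handout

   The one-liner appears to read a line, recurse, and print the line on
   the way back out, which would echo the input in reverse line order. It
   only gets there through gets() (removed in C11), an int used as a
   buffer and a puts() call with an extra argument. The same "read,
   recurse, then print" shape, sketched on a string with a hypothetical
   reverse_into() instead of stdin:

   ::

        /* Copies s into out backwards; returns the end of the output. */
        static char *reverse_into(const char *s, char *out)
        {
                if (*s == '\0')
                        return out;             /* nothing left to read */
                out = reverse_into(s + 1, out); /* recurse first... */
                *out++ = *s;                    /* ...then "print" */
                return out;
        }

   The caller terminates the result, e.g. ``*reverse_into("abc", buf) =
   '\0';`` leaves "cba" in buf.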

Digraphs and trigraphs
======================

- Problems with your keyboard layout? Use a digraph!
- Digraphs work out-of-the-box with GCC; trigraphs require the -trigraphs
  option (or a strict ISO mode such as -std=c99).

::

	%:include <stdio.h>

	int main(int argc, char **argv) <%
		printf("Hello from %s\n",argv<:0:>);
	%>

.. class:: handout

	Before ISO 8859 character sets became common, Norwegian coders used
	a 7-bit national variant of ASCII that re-used several well-known
	ASCII characters. As such, they were taught to use æ for {, å for }
	and a few other hilarities.

	Trigraphs differ from digraphs not only in that they use three
	characters, all starting with ??, but also in that they are *not*
	tokens: they are replaced before tokenization, even inside string
	literals and comments. A ??/ at the end of a // comment, for
	example, becomes a backslash and silently splices the next line
	into the comment.
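
.. container:: handout

   Digraphs, on the other hand, are ordinary tokens, so they work anywhere
   their punctuator would, including preprocessor directives and empty
   brackets. A sketch; SQUARE and sum_squares are hypothetical names.

   ::

        %:define SQUARE(x) ((x) * (x))  /* %: is # */

        /* <: :> are [ ] and <% %> are { } */
        static int sum_squares(const int v<::>, int n)
        <%
                int s = 0;
                for (int i = 0; i < n; i++)
                        s += SQUARE(v<:i:>);
                return s;
        %>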

Old-school function declaration
===============================

::
        
        main(argc, argv)
        int argc;
        char **argv;
        {
                ....
        }

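- Omitted return types and undeclared parameters default to int
  (implicit int was removed in C99, and C23 removed this style of
  definition altogether)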

!!(foo) != (foo)
================

- ``!!`` forces any value to be either 1 or 0.
- Used in optimization hints, e.g. Linux's likely() and unlikely()

::

        #include <stdio.h>

        int main(void) {
                char *str="hello, world\n";
                while(!!*str!=*str)
                        putchar(*(str++));
        }

.. class:: handout

   The above example uses the fact that !!c == c only holds for 0 and 1,
   and the only place in a string you will find either, in practice, is at
   the end. For every other character the loop tests 1 != *str, which is
   true, while for the terminating NUL it tests 0 != 0, which ends the loop.

   This is used extensively in Linux to provide optimization hints for the
   compiler... though not exactly like in this example.
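
.. container:: handout

   In the Linux kernel, likely() and unlikely() wrap GCC's
   __builtin_expect(), which compares its first argument against an exact
   expected value; the !! turns any truthy value, such as 42 or a non-NULL
   pointer, into the 1 that likely() expects. A sketch modeled on those
   macros (count_nonzero is a hypothetical caller, and __builtin_expect is
   GCC/Clang only):

   ::

        #define likely(x)   __builtin_expect(!!(x), 1)
        #define unlikely(x) __builtin_expect(!!(x), 0)

        /* The hint only affects code layout, never the result. */
        static int count_nonzero(const int *v, int n)
        {
                int c = 0;
                for (int i = 0; i < n; i++)
                        if (likely(v[i]))
                                c++;
                return c;
        }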

Bit operators
=============

::

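	void swap(char *a, char *b) {
		/* "*a ^= *b ^= *a ^= *b;" modifies *a twice without a
		 * sequence point: undefined. Spelled out instead: */
		if (a != b) {	/* x ^ x == 0 would zero the value */
			*a ^= *b; *b ^= *a; *a ^= *b;
		}
	}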

	int i = 2; i <<= 2; // i == 8
	i >>= 1; // i == 4

- As usual: Let your compiler deal with it.

.. class:: handout

        Example code: ::

                #include <stdio.h>
                #include <stdint.h>
                typedef uint8_t mytype;

                void print_bitmask(mytype a) {
                        int i;
                        for (i=0;i<sizeof(mytype) * 8;i++) {
                                if(a& (1<<i)) putchar('1');
                                else          putchar('0');
                                if((i+1)%4 ==0) putchar(' ');
                        }
                }

                #define print(s) do { \
                        s; \
                        printf("%10s\t", #s); \
                        printf("a: "); print_bitmask(a); \
                        printf(" b: "); print_bitmask(b); \
                        printf(" a: %d b: %d\n", a,b); \
                        } while (0)

                int main(void) {
                        mytype a = 5;
                        mytype b = 25;
                        
                        print(a=5; b=25);
                        print(a ^= b);	
                        print(b ^= a);	
                        print(a ^= b);	
                        return 0;
                }

        Example output: ::

                a=5; b=25       a: 1010 0000  b: 1001 1000  a: 5 b: 25
                    a ^= b      a: 0011 1000  b: 1001 1000  a: 28 b: 25
                    b ^= a      a: 0011 1000  b: 1010 0000  a: 28 b: 5
                    a ^= b      a: 1001 1000  b: 1010 0000  a: 25 b: 5
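
.. container:: handout

   What "let your compiler deal with it" means in practice: for unsigned
   values, shifts are exactly multiplication and division by powers of
   two, and compilers make that substitution on their own. For negative
   signed values they are not interchangeable: -7 / 2 is -3 (C division
   truncates toward zero), while -7 >> 1 is implementation-defined and
   typically -4. A sketch with hypothetical helper names:

   ::

        /* Unsigned: the shift and the arithmetic agree. */
        static unsigned times8(unsigned x) { return x << 3; }
        static unsigned div4(unsigned x)   { return x >> 2; }

        /* Signed: write the division you mean and let the compiler pick
         * the instructions; x >> 1 would round differently for negative
         * x on common compilers. */
        static int half(int x) { return x / 2; }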
                 

2+2=16 - Coercion!
==================

examples/1-explicit-casting.c

::

	a[2] == 2[a];
	a[i] == i[a]; i[a] == *(a+i);
	(((char *)a)+i) != (((int *)a)+i)

.. class:: handout

   ::
    
        Source:

                #include <stdio.h>
                #define modif(s) do { \
                        s; \
                        printf("%18s\t", #s); \
                        printf("c: %p i: %p, orig: %d",c,i,orig); \
                        getchar(); \
                } while(0)

                int main(int argc, char ** argv)
                {
                        int orig=42;
                        int * i;
                        char * c;
                        printf("&c: %p &i: %p &orig: %p orig: %d\n\n",&c,&i,&orig,orig);
                        modif(c=&orig; i=&orig;);
                        modif(c++; i++;);
                        modif(* c = 'x';);
                        modif(* i=5;);
                        modif(orig=5);
                        return 0;
                }
   
   ::

        Output:

		$ ./1-implicit-casting 
		&c: 0x7fffaa4b2ab8 &i: 0x7fffaa4b2ac0 &orig: 0x7fffaa4b2acc orig: 42

		 c=&orig; i=&orig;	c: 0x7fffaa4b2acc i: 0x7fffaa4b2acc, orig: 42
			 c++; i++;	c: 0x7fffaa4b2acd i: 0x7fffaa4b2ad0, orig: 42
			 *c = 'x';	c: 0x7fffaa4b2acd i: 0x7fffaa4b2ad0, orig: 30762
			     *i=5;	c: 0x7fffaa4b2acd i: 0x7fffaa4b2ad0, orig: 30762
			    orig=5	c: 0x7fffaa4b2acd i: 0x7fffaa4b2ad0, orig: 5
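
.. container:: handout

   Why 2[a] works, and why the two casted pointers on the slide differ:
   a[i] is defined as *(a + i), addition commutes, and pointer arithmetic
   is scaled by the size of the pointed-to type. A sketch with hypothetical
   helper names:

   ::

        #include <stddef.h>

        /* a[i], i[a] and *(a + i) all name the same object. */
        static int same_element(const int *a, int i)
        {
                return &a[i] == &i[a] && &i[a] == a + i;
        }

        /* Adding 1 to an int * moves sizeof(int) bytes. */
        static ptrdiff_t byte_step(const int *a)
        {
                return (const char *)(a + 1) - (const char *)a;
        }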


The sneaky nature of overruns
=============================

- Or: Why you can get away with it for so long
- Or: Where schroedinbugs live

examples/2-casting.c

- Padding is architecture- and compiler-dependent
- As usual: The only safety is the one you make yourself - until you segfault

.. class:: handout

	::

		#include <stdio.h>
		#define modif(s) do { \
			s; \
			printf("%18s\t", #s); \
			printf("x: %p ",x); \
			printf("*((int *)x): %10d, *x: %10d, y: %10d i: %10d\n",*((int *)x),*x,y,i); \
		} while(0)

		#define init(s) \
			s; \
			printf(#s "\n");

		int main(void) {
			init(
			int i=666;
			char y=42;
			char *x;
			);
			printf("&i: %p\n&y: %p\n sizeof(i): %zu\n",&i,&y,sizeof(i));

			modif(x=&i);
			modif(x+=4; *x=5);
			modif(*(int *)x = -5);
			modif(y=42);
			modif(i = -20);
			modif(i = (long) -20);
			modif(*((long *)&i) = -55);
			return 0;
		}
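
.. container:: handout

   Padding is easiest to see in a struct. The sketch below (struct padded
   and padding_after_c are hypothetical) relies only on guarantees the
   standard makes; the actual amount, commonly 3 bytes where int is 4 bytes
   and 4-byte aligned, is up to the implementation. Overrunning c by a few
   bytes lands in that padding, where nobody notices, until it doesn't.

   ::

        #include <stddef.h>

        struct padded {
                char c;
                int i;          /* may start after some padding */
        };

        /* Unused bytes between c and i; implementation-defined. */
        static size_t padding_after_c(void)
        {
                return offsetof(struct padded, i) - sizeof(char);
        }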

Everything evaluates
====================

- (Your mileage may vary for (void))
- Precedence is fun!
- Void doesn't evaluate, but void,1 does >:D

::
	
	while(puts(x+*x*20)&&x[++*x*20]);
	...
	struct animal *dog;
	dog=make_dog();
	if (!!"my cat" > dog)
		printf("Confusing NULL-test?\n");
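
.. container:: handout

   The comma operator is what makes "void,1" work: it evaluates its left
   operand for its side effects, discards the result and yields the right
   operand. A void call on the left is fine, so (f(), 1) is an ordinary
   int. A sketch with hypothetical names:

   ::

        static int hits = 0;

        static void touch(void) { hits++; }     /* returns nothing */

        static int touch_if(int cond)
        {
                /* "cond && touch()" would not compile: && needs a
                 * scalar operand, and void is not one. */
                return cond && (touch(), 1);
        }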

Confusing linking
=================

::

        int main(void) {
                char *buffer = "hello world";
                read(0, buffer, strlen(buffer));
        }
        int read(int i, char *buffer, int len) {
                write(1, buffer, len);
        }

.. class:: handout

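	This only compiles because prototypes are not mandatory (in C89).

	If a function is used before it is declared, the compiler
	implicitly declares it as returning int, without checking the
	arguments, which are passed after the default argument promotions.
	C99 removed implicit declarations, and recent GCC versions reject
	them by default. To see how fragile this is, try the above code
	with read defined as: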

	::	
			
		int read(int i, void * buffer, ssize_t len) {...

	This will likely give you an error: ssize_t has to come from a
	system header, and the POSIX read() declared in <unistd.h> returns
	ssize_t rather than the int your definition (and the implicit
	declaration) claims.
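
.. container:: handout

   The boring, well-defined version declares a prototype before the first
   call, so every call is checked against it. The names echo_buf and demo
   are hypothetical, chosen to stay clear of POSIX read(), and the sketch
   copies into a buffer instead of writing to a file descriptor so that it
   can be tested.

   ::

        #include <string.h>

        /* Prototype first: calls are now checked, not guessed. */
        static size_t echo_buf(char *dst, const char *src, size_t len);

        static size_t demo(char *out)
        {
                const char *buffer = "hello world";
                return echo_buf(out, buffer, strlen(buffer));
        }

        static size_t echo_buf(char *dst, const char *src, size_t len)
        {
                memcpy(dst, src, len);
                dst[len] = '\0';
                return len;
        }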

Hoodwinking the compiler
========================
examples/3-evaluating-void.c ::

        int main(int argc, char **argv) {
                printf("Hello world?\n");
                printf("Not readable!", exit(0));
        }

        cc     3-evaluating-void.c   -o 3-evaluating-void
        3-evaluating-void.c: In function ‘main’:
        3-evaluating-void.c:5: warning: incompatible implicit declaration of built-in function ‘exit’
        3-evaluating-void.c:5: error: invalid use of void expression
        make: *** [3-evaluating-void] Error 1

Hoodwinking the compiler 2
==========================

examples/4-evaluating-void.c ::

        $ cat 4-evaluating-void.c ; make 4-evaluating-void; ./4-evaluating-void 
        #include <stdio.h>
        int exit();
        int main(int argc, char **argv) {
                        printf("Hello world?\n");
                        printf("Not readable!", exit(1));
        }
        cc     4-evaluating-void.c   -o 4-evaluating-void
        4-evaluating-void.c:2: warning: conflicting types for built-in function ‘exit’
        Hello world?

Function pointers!
==================

examples/5-evaluating-void.c ::

        $ cat 5-evaluating-void.c; make 5-evaluating-void; ./5-evaluating-void
        #include <stdio.h>
        #include <stdlib.h>
        int main(int argc, char **argv) {
                        char (*fp)(int) = (char (*)())exit;
                        printf("Hello world?\n");
                        printf("Not readable!", (*fp)(1));
        }
        cc     5-evaluating-void.c   -o 5-evaluating-void
        Hello world?

.. class:: handout

        Hoodwinking like this is quite dangerous.

        In this example, there isn't that much danger, as exit terminates
        the program before it returns. However, if you force an evaluation
        of a function returning void, the behavior is very much undefined.

        On the 64-bit Ubuntu Lucid system I use daily, it will evaluate to
        0, while on other (older) gcc versions, it will simply trigger a
        segmentation fault.

        Tip: compile with -Wall -Werror and deal with all the warnings.

Duff's Device
=============

::

	void send(short *to, short *from, int count)
	{
		register int n=(count+4)/5;
		if (count <= 0)	/* the do-while runs at least once */
			return;
		switch(count%5){
		case 0:	do{	*to++ = *from++;
		case 4:		*to++ = *from++;
		case 3:		*to++ = *from++;
		case 2:		*to++ = *from++;
		case 1:		*to++ = *from++;
			}while(--n>0);
		}
	}

.. class:: handout

	Tom Duff was working at Lucasfilm while trying to optimize a copy
	to a hardware register. The trick was to do larger chunks of work
	between each loop test. To solve the problem of what to do when the
	number of elements to be copied was not divisible by the unroll
	factor, Duff came up with the idea of using a switch statement to
	jump into the middle of a loop.

	The original version of Duff's Device looks like this:

	::
	
		send(to, from, count)
		register short *to, *from;
		register count;
		{
			register n=(count+7)/8;
			switch(count%8){
			case 0:	do{	*to = *from++;
			case 7:		*to = *from++;
			case 6:		*to = *from++;
			case 5:		*to = *from++;
			case 4:		*to = *from++;
			case 3:		*to = *from++;
			case 2:		*to = *from++;
			case 1:		*to = *from++;
				}while(--n>0);
			}
		}

	Note that ``to`` itself is never incremented: Duff was writing to a
	memory-mapped output register that the hardware read after every
	store. Today, to would be declared as a pointer to volatile.

	Today, there is little reason to use the memory-to-memory variant
	on the slide if memcpy is available. Modern compilers unroll simple
	loops themselves, and the division and remainder up front add
	overhead of their own, so the device is likely to be slower than a
	straight loop for smaller data sets.
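
.. container:: handout

   A testable sketch of the memory-to-memory variant, unrolled eight times
   like the original. It assumes a size_t count and returns early for
   count == 0, because the do-while always runs at least once and --n
   would wrap around; duff_copy is a hypothetical name.

   ::

        #include <stddef.h>

        static void duff_copy(short *to, const short *from, size_t count)
        {
                size_t n;

                if (count == 0)
                        return;                 /* do-while would run anyway */
                n = (count + 7) / 8;            /* passes through the loop */
                switch (count % 8) {            /* jump in for the remainder */
                case 0: do {    *to++ = *from++;
                case 7:         *to++ = *from++;
                case 6:         *to++ = *from++;
                case 5:         *to++ = *from++;
                case 4:         *to++ = *from++;
                case 3:         *to++ = *from++;
                case 2:         *to++ = *from++;
                case 1:         *to++ = *from++;
                        } while (--n > 0);
                }
        }

   For count == 13 the switch enters at case 5, copies five elements, and
   one more full pass copies the remaining eight.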

IOCCC
=====

- The International Obfuscated C Code Contest http://www.ioccc.org
- The source of much pain, anguish and mad cackling

Questions?
==========

- while(gets(s)?answer(s):0);

