blog

A Simpler Blog

2019-05-13T00:00:00Z

13 May 2019

Previously I was using Jekyll for this blog. It worked okay, but I disliked having the Ruby dependencies and felt it was a bit too complicated for my intended use-case.

This was my Gemfile.lock prior to the change:

GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.5.0)
      public_suffix (~> 2.0, >= 2.0.2)
    colorator (1.1.0)
    ffi (1.9.17)
    forwardable-extended (2.6.0)
    jekyll (3.3.1)
      addressable (~> 2.4)
      colorator (~> 1.0)
      jekyll-sass-converter (~> 1.0)
      jekyll-watch (~> 1.1)
      kramdown (~> 1.3)
      liquid (~> 3.0)
      mercenary (~> 0.3.3)
      pathutil (~> 0.9)
      rouge (~> 1.7)
      safe_yaml (~> 1.0)
    jekyll-feed (0.9.2)
      jekyll (~> 3.3)
    jekyll-last-modified-at (1.0.1)
      jekyll (~> 3.3)
      posix-spawn (~> 0.3.9)
    jekyll-sass-converter (1.5.0)
      sass (~> 3.4)
    jekyll-watch (1.5.0)
      listen (~> 3.0, < 3.1)
    kramdown (1.13.2)
    liquid (3.0.6)
    listen (3.0.8)
      rb-fsevent (~> 0.9, >= 0.9.4)
      rb-inotify (~> 0.9, >= 0.9.7)
    mercenary (0.3.6)
    pathutil (0.14.0)
      forwardable-extended (~> 2.6)
    posix-spawn (0.3.13)
    public_suffix (2.0.5)
    rb-fsevent (0.9.8)
    rb-inotify (0.9.7)
      ffi (>= 0.5.0)
    rouge (1.11.1)
    safe_yaml (1.0.4)
    sass (3.4.23)

PLATFORMS
  ruby

DEPENDENCIES
  jekyll
  jekyll-feed
  jekyll-last-modified-at

BUNDLED WITH
1.15.2

I replaced all this with the following dependencies:

cmark
make (GNU)
bash
inotify (optional)

All of these are found on any typical Linux installation except cmark, which is a small C library/executable that only requires libc.

GitHub builds your Jekyll pages repository by default. Since I no longer use Jekyll, I now need to run make and build the html pages prior to pushing, but I consider this a minimal extra cost. Further, it allows me to automatically mirror this blog to GitLab Pages, which does not support Jekyll. The setup is very simple: I added a .gitlab-ci.yml file, enabled auto-mirroring in GitLab, and pointed it at the repository on GitHub. Finally, I configured GitLab’s CI to run on each commit.

I considered writing raw html directly and having no markdown to html step but this seemed a bit too far. Minor things such as escaping code blocks and maintaining open/end tags kept me with markdown. My index page currently is raw html but I consider this a one-off (besides updating the index). The styling also is slightly different to the standard post.

I briefly considered using a simple template system, but considering my needs I decided against that and simply concatenate a few html files to build the resulting posts. You can see this in the Makefile (which contains only one essential step):

MD := $(wildcard ./blog/*.md)
TMPL := $(wildcard ./build/*)
HTML := $(MD:.md=.html)

build: $(HTML)

blog/%.html: blog/%.md $(TMPL)
	@echo $<
	@bash -c 'cat ./build/1.html ./build/1.css ./build/2.html <(cmark --unsafe --smart $<) ./build/3.html > $@'

watch:
	@while inotifywait -qq -e move -e modify -e create -e delete --exclude './blog/*.html' ./blog; do \
		make -s; \
	done

.PHONY: watch

Finally, a change I had been meaning to do for a while anyway was to adjust the path of the blog entries behind the /blog path. I’ve kept the existing links as extra html pages which perform a simple http-equiv="refresh" redirect to the new pages so existing linked content will still continue to work.

I’m pretty happy with the result. There is less to go wrong here for me and I’ve kept practically all of the same functionality (besides an auto-generated atom feed) while minimizing dependencies. I’d strongly urge people to take a step back every now and then and see if their current solution isn’t too much for them and if it can’t be replaced. I spent a fair bit of time reading Jekyll documentation, looking for plugins that I don’t need and determining how the templating system worked. At least for my use-case, all I really needed was to generate html from markdown and cat some files together.

Generic Data Structures in C

2017-04-22T00:00:00Z

22 April 2017

See here for a final implementation if you don’t want to read all this.

Generic data structures in C typically have a pretty unfriendly API. They either rely on void pointers and erase type information, or resort to macros to provide a semblance of the templating system found in C++.

This post will look at constructing a macro-based vector in C with a focus on ease of use. We will use modern C11 features and ample compiler extensions to see where we can take this.

A Generic Vector

First, let’s define our vector type. We’ll call it qvec because it’s short and sweet.

#define qvec(T)             \
    struct qvec_##T {       \
        size_t cap, len;    \
        T data[];           \
    }

We take a parameter T which will represent the type that is stored in our vector. This will be templatized at compile-time, similar to how vector is in C++.

The data field is a flexible array member from C99.
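As a minimal sketch of what a flexible array member buys us, independent of the qvec macros: the header and the element storage come from a single allocation. The names int_buf, int_buf_new and int_buf_at are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example type: a length header followed by inline storage. */
struct int_buf {
    size_t len;
    int data[]; /* flexible array member: adds no size of its own */
};

/* Allocate the header plus room for n ints in one block.
   Returns NULL if malloc fails. */
static struct int_buf *int_buf_new(size_t n)
{
    struct int_buf *b = malloc(sizeof(*b) + n * sizeof(b->data[0]));
    if (b != NULL)
        b->len = n;
    return b;
}

/* Bounds-checked access to element i. */
static int *int_buf_at(struct int_buf *b, size_t i)
{
    assert(i < b->len);
    return &b->data[i];
}
```

A single free(b) later releases both the header and the elements.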

Note: We will forgo error checking of malloc and realloc for simplicity.

`new`

The new function should malloc enough memory for some initial members. The size of the required storage will depend on the size of T. A possible implementation could be

#define qvec_new(T, v)                                       \
do {                                                         \
    size_t initial_size = 16;                                \
    v = malloc(sizeof(qvec(T)) + sizeof(T) * initial_size);  \
    v->cap = initial_size;                                   \
    v->len = 0;                                              \
} while (0)

which we can use to initialize a vector of integers as

qvec(int) *v;
qvec_new(int, v);

The flexible array member allows us to get away with a single call to malloc which is a minor nicety. Otherwise, this is a little underwhelming. The separation of declaration and initialization is not ideal.

To make this a bit nicer, we can use statement expressions which allow multiple statements to be evaluated and used as if they were an expression. Our new definition for new would then be

#define qvec_new(T)                                                           \
({                                                                            \
    const size_t initial_size = 16;                                           \
    struct qvec_##T *v = malloc(sizeof(qvec(T)) + sizeof(T) * initial_size);  \
    v->cap = initial_size;                                                    \
    v->len = 0;                                                               \
    v;                                                                        \
})

which gives us the much more natural usage

qvec(int) *v = qvec_new(int);

Standard Functions

Let’s now implement the common vector functions push, pop and at.

`pop`

pop doesn’t require any special knowledge of the type T so this is simply

#define qvec_pop(v) (v->data[--v->len])

`at`

at is slightly more interesting. When working with a C++ vector (or a standard C array), the notation array[x] is an lvalue which can be assigned to. It would be nice if our qvec has this property as well.

First, let’s define the helper function

#define qvec_ref(v, i) (&v->data[i])

This returns a pointer to the element, which yields an lvalue when dereferenced, e.g. *qvec_ref(v, i) = 5.

We can wrap this in another macro to hide this dereference

#define qvec_at(v, i) (*(qvec_ref(v, i)))

`push`

push presents a small problem. If we were to generate a standard implementation

#define qvec_push(v, i)                                 \
({                                                      \
    if (v->len >= v->cap) {                             \
        v->cap *= 2;                                    \
        v = realloc(v, sizeof(?) + v->cap * sizeof(?)); \
    }                                                   \
    v->data[v->len++] = (i);                            \
})

we might be left wondering what to insert into the ? marked locations.

The second ? is less worrying. This should be sizeof(T). We could just pass the type again, but doing it on every push is not ideal. In fact, we don’t need any new information. Recall that the data field of qvec is of type T[]. Taking sizeof of a dereference of this field, sizeof(*v->data), gives us the size of a single T, exactly what we want!

The first ? is more bothersome. We are interested in determining the value of sizeof(qvec(T)). We can’t use the data field here, since the T required here is the actual typename used during initialization. This would be viable if it were possible to generate a type name from an arbitrary variable but unfortunately we cannot do this.

The way to get this size is first to realise that the data member in a qvec doesn’t actually take up any space within the struct, not even for a pointer.

We can confirm this by checking the following

struct {
    char a, b;
    char c[];
} foo;

printf("foo is %zu bytes\n", sizeof(foo));

which will print

foo is 2 bytes

Since this data doesn’t take any space, we can see that the other members (len and cap) have a fixed type and therefore size, regardless of the type of T.

We can separate the type of qvec into

#define qvec_base       \
    struct {            \
        size_t cap, len;\
    }

#define qvec(T)         \
    struct qvec_##T {   \
        qvec_base;      \
        T data[];       \
    }

This now allows us to query the size of the type-independent part of a qvec while retaining access to all the members in the same way.
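To make the layout claim concrete, here is a small sketch using plain named structs rather than the qvec macros (vec_header, vec_int, vec_double and vec_bytes are illustrative names). It checks at compile time that the element storage begins right after the header. This holds on typical platforms for ordinary element types; an over-aligned element type could introduce padding, in which case offsetof would be the safer measure.

```c
#include <assert.h>
#include <stddef.h>

/* Type-independent header, mirroring the cap/len pair of qvec_base. */
struct vec_header {
    size_t cap, len;
};

struct vec_int    { struct vec_header h; int    data[]; };
struct vec_double { struct vec_header h; double data[]; };

/* The elements start right after the header, whatever the element type. */
static_assert(offsetof(struct vec_int, data) == sizeof(struct vec_header),
              "int elements start right after the header");
static_assert(offsetof(struct vec_double, data) == sizeof(struct vec_header),
              "double elements start right after the header");

/* Bytes needed for a vector with room for cap elements of elem_size bytes. */
static size_t vec_bytes(size_t cap, size_t elem_size)
{
    return sizeof(struct vec_header) + cap * elem_size;
}
```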

As an aside, we can define this using less macro-wizardry if we enable the -fplan9-extensions option in GCC as documented here.

struct qvec_base {
    size_t cap, len;
};

#define qvec(T)             \
    struct qvec_##T {       \
        struct qvec_base;   \
        T data[];           \
    }

This allows embedding of existing struct definitions as an anonymous struct.

Now, finally, we can define our push function as:

#define qvec_push(v, i)                                                 \
({                                                                      \
    if (v->len >= v->cap) {                                             \
        v->cap *= 2;                                                    \
        v = realloc(v, sizeof(qvec_base) + v->cap * sizeof(*v->data));  \
    }                                                                   \
    v->data[v->len++] = (i);                                            \
})

`free`

Since we only use a single malloc to initialize the type, this is simply

#define qvec_free(v) free(v)

API so far

Let’s see what this gives us so far

qvec(int) *iv = qvec_new(int);
qvec_push(iv, 5);
qvec_push(iv, 8);
printf("%d\n", qvec_at(iv, 0));
qvec_at(iv, 1) = 5;
qvec_free(iv);

and compare it with similar C++ vector usage

std::vector<int> iv;
iv.push_back(5);
iv.push_back(8);
printf("%d\n", iv[0]);
iv[1] = 5;

Looking okay, but let’s go a bit further.

Extended Functions

Generic Printing

It is fairly common that we want to dump the values of a vector to see what is inside. If we wanted to write this for an integer vector, the following would work

#define qvec_int_print(v)               \
({                                      \
    printf("[");                        \
    for (int i = 0; i < v->len; ++i) {  \
        printf("%d", v->data[i]);       \
        if (i + 1 < v->len)             \
            printf(", ");               \
    }                                   \
    printf("]\n");                      \
})

which can be used as

qvec_int_print(iv); // [5, 5]

This is nice, but since it isn’t generic it has a limited use case. Fortunately for us, C11 brings some new interesting features to the table which we can use.

The C11 _Generic keyword allows rudimentary switching based on the type of its input. Think of it just as a compile-time switch statement on types.

For example, we could construct a macro to print the name of a type

#define type_name(x) _Generic((x), int: "int", float: "float")

printf("This is a %s\n", type_name(5.0f));
printf("This is a %s\n", type_name(5));

which when run would output

This is a float
This is a int

We can use this to generate the appropriate printf format specifier for the passed type.

#define GET_FMT_SPEC(x) _Generic((x), int: "%d", float: "%f", char*: "%s")

and modifying our print function

#define qvec_print(v)                   \
({                                      \
    printf("[");                        \
    for (int i = 0; i < v->len; ++i) {  \
        printf(GET_FMT_SPEC(v->data[i]), v->data[i]);\
        if (i + 1 < v->len)             \
            printf(", ");               \
    }                                   \
    printf("]\n");                      \
 })

This would now work on an integer and float qvec type with no modifications. Of course, we could extend GET_FMT_SPEC with whatever types we need.

You may recall that I mentioned that we could solve an earlier issue regarding our push function if we could generate a type name from a variable. It seems like the _Generic keyword would help us achieve this, and indeed it does in part. The problem is that it is evaluated after preprocessing, so we cannot use its output as part of the preprocessor token concatenation process.

This is an easy mistake to make, since _Generic is seen pretty much solely within macro definitions for obvious reasons. This isn’t required though, the following being perfectly valid code.

int a;
float b;

printf("%s\n", _Generic(a, int: "a is an int", float: "a is a float"));
printf("%s\n", _Generic(b, int: "b is an int", float: "b is a float"));

Initializer Lists

Since C++11, vectors can now be initialized with initializer lists

std::vector<int> v = {4, 5, 2, 3};

This is pretty nice. Let’s add something similar to our new function using C99 variadic macros with a GCC extension which allows an arbitrary name to be given for them.

#define QVEC_ALEN(a) (sizeof(a) / sizeof(*a))

#define qvec_new(T, xs...)                                                    \
({                                                                            \
    const size_t initial_size = 16;                                           \
    const T _xs[] = {xs};                                                     \
    const size_t _cap = QVEC_ALEN(_xs) > initial_size ? QVEC_ALEN(_xs)        \
                                                      : initial_size;         \
    struct qvec_##T *v = malloc(sizeof(qvec(T)) + sizeof(T) * _cap);          \
    v->cap = _cap;                                                            \
    v->len = QVEC_ALEN(_xs);                                                  \
    for (size_t i = 0; i < v->len; ++i)                                       \
        v->data[i] = _xs[i];                                                  \
    v;                                                                        \
})

xs here collects all arguments except the first. We assign these to a temporary array which allows us to work with the values, but also has the effect of typechecking the values.

qvec(int) *v = qvec_new(int, 4, 5, 2, 3);
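The temporary-array trick can be seen in isolation with a sketch like the following. FILL and ALEN are hypothetical helpers, not part of qvec, and this version uses the standard __VA_ARGS__ spelling instead of the named GCC form. The statement expression is still a GNU extension.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ALEN(a) (sizeof(a) / sizeof(*(a)))

/* Hypothetical helper: copy the variadic arguments into dst (capacity cap)
   through a temporary array of type T. Evaluates to the number of values
   copied. The array initializer type-checks every argument against T. */
#define FILL(T, dst, cap, ...)                      \
({                                                  \
    const T _xs[] = { __VA_ARGS__ };                \
    assert(ALEN(_xs) <= (cap));                     \
    for (size_t _i = 0; _i < ALEN(_xs); ++_i)       \
        (dst)[_i] = _xs[_i];                        \
    ALEN(_xs);                                      \
})
```

For example, with int out[8], the call FILL(int, out, 8, 4, 5, 2, 3) copies four values into out and evaluates to 4.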

Complex Objects

Suppose we have the following type

typedef struct {
    char *id;
    bool is_tasty;
} Food;

We might try and utilize C99 struct initializers to perform the following

qvec(Food) *v = qvec_new(Food);
qvec_push(v, { .id = "apple", .is_tasty = true });

This however fails to compile. Under clang, we get the following error

qvec.c:103:34: error: too many arguments provided to function-like macro
      invocation
    qvec_push(v, { "apple", 1 });
                                 ^
qvec.c:42:9: note: macro 'qvec_push' defined here
#define qvec_push(v, i)                                                       \
        ^
qvec.c:103:5: note: cannot use initializer list at the beginning of a macro
      argument
    qvec_push(v, { "apple", 1 });
    ^            ~~~~~~~~~~~~~~~~~~~~
qvec.c:103:5: error: use of undeclared identifier 'qvec_push'
    qvec_push(v, { "apple", 1 });
    ^

The reason this doesn’t work is that the C preprocessor is dumb. It doesn’t know that this is a designated initializer because it doesn’t actually know anything about the C language. Instead, it sees two arguments. The first being { .id = "apple" and the second .is_tasty = true }.

We can get around this by using the previously mentioned variadic macros once again, with a technique similar to the one used in the extended new function.

#define qvec_push(v, xs...)                                             \
({                                                                      \
    const typeof(*v->data) _xs[] = {xs};                                \
    if (v->len + QVEC_ALEN(_xs) >= v->cap) {                            \
        while (v->cap <= v->len + QVEC_ALEN(_xs)) {                     \
            v->cap = 2 * v->cap;                                        \
        }                                                               \
        v = realloc(v, sizeof(qvec_base) + v->cap * sizeof(*v->data));  \
    }                                                                   \
    for (int i = 0; i < QVEC_ALEN(_xs); ++i) {                          \
        v->data[v->len++] = _xs[i];                                     \
    }                                                                   \
    v;                                                                  \
})

The reason variadic macros help here is that all macro arguments are gathered at once and treated as input to an array initializer. Even though the individual arguments are not valid expressions on their own, it doesn’t matter, since the full set of arguments is.
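The capacity growth inside this push can be pulled out into a plain function to make the doubling easier to follow. This is only a sketch: grow_cap is a made-up name, and unlike the loop in the macro (which grows while cap <= len + n) it stops as soon as the capacity is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest capacity reachable from cap by repeated doubling that can hold
   `needed` elements. Overflow of cap is not handled in this sketch. */
static size_t grow_cap(size_t cap, size_t needed)
{
    assert(cap > 0); /* doubling zero would loop forever */
    while (cap < needed)
        cap *= 2;
    return cap;
}
```

Doubling keeps the number of realloc calls logarithmic in the final length, which is why a multi-element push loops rather than growing by exactly the amount needed.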

Another thing to note is the use of the typeof keyword. This allows us to retrieve the type of an expression, which can be used to declare new variables. The most common example of its usage is likely within a type-generic swap macro.

#define swap(x, y)              \
do {                            \
    const typeof(x) _temp = y;  \
    y = x;                      \
    x = _temp;                  \
} while (0)

Extensions, Extensions, Extensions

Our code is already filled with compiler-specific C extensions, so we may as well go overboard.

RAII

One of the better features of C++ is the ability to utilize RAII to run destructors on block exit. This reduces the chance that leaks occur within programs and just makes using complex types much more pleasant.

The cleanup variable attribute is a GCC extension which allows a user-defined cleanup function to automatically run when the value goes out of scope.

This attribute takes one argument: a function that receives a pointer to the variable the attribute is attached to. For a variable of type T *, the cleanup function therefore has the signature void cleanup(T **).
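Before applying this to qvec, here is a minimal sketch of the attribute on a plain int pointer (GCC and Clang only). The names free_int_ptr, cleanup_calls and square_with_scratch are invented for the example, and the counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts cleanup calls so the effect is visible (illustrative only). */
static int cleanup_calls = 0;

/* Cleanup handler for an `int *` variable: it receives a pointer to the
   variable itself, hence `int **`. */
static void free_int_ptr(int **p)
{
    free(*p);
    ++cleanup_calls;
}

/* Uses a heap-allocated scratch int that is released on scope exit. */
static int square_with_scratch(int x)
{
    __attribute__((cleanup(free_int_ptr))) int *scratch = malloc(sizeof(*scratch));
    assert(scratch != NULL);
    *scratch = x * x;
    return *scratch; /* the value is read before free_int_ptr runs */
}
```

Because the handler is typed for exactly this variable, no pointer conversion is involved. The price is one handler per pointer type.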

Using this with our qvec, it may look like

static inline void _qvec_free(void **qvec) { free(*qvec); }

int main(void)
{
    qvec(int) __attribute__ ((cleanup(_qvec_free))) *qv = qvec_new(int);
    // No qvec_free here!
}

This is a little verbose however, so let’s define our own keyword which we can use instead.

#define raii __attribute__ ((cleanup(_qvec_free)))

int main(void)
{
    raii qvec(int) *qv = qvec_new(int);
}

Note that an attribute doesn’t strictly need to be specified after the type definition.

This is nice, but if you had actually compiled the above you would get a type warning.

qvec.c: In function ‘main’:
qvec.c:13:12: warning: passing argument 1 of ‘_qvec_free’ from incompatible pointer type [-Wincompatible-pointer-types]
     struct {                                                                  \
            ^
qvec.c:26:40: note: in expansion of macro ‘qvec_base’
     struct qvec_##T *v = malloc(sizeof(qvec_base) + sizeof(_xs));             \
                                        ^
qvec.c:94:25: note: in expansion of macro ‘qvec_new’
     raii qvec(int) *v = qvec_new(int);
                         ^
qvec.c:88:20: note: expected ‘void **’ but argument is of type ‘struct qvec_int **’
 static inline void _qvec_free(void **qvec) { free(*qvec); }

The compiler complains because we are relying on an implicit conversion from struct qvec_int ** to void **. We know this works in practice however, since every qvec uses a single call to free in order to release its memory.

As far as I’m aware, this requires a pragma at the callsite to disable this locally. This is quite inconvenient, and really loses out any usability that we may have gained from using this. The following will compile without warnings

int main(void)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wincompatible-pointer-types"
    raii qvec(int) *v = qvec_new(int, 5, 4, 3);
#pragma GCC diagnostic pop
}

At this stage though, remembering to just manually free seems like a saner choice.

Type Inference

One of the nice features of C++11 onwards is the revitalization of the auto keyword. This now provides type inference which is very nice in a number of circumstances.

If we look at our vector initialization

qvec(int) *v = qvec_new(int);

we clearly have a bit of redundancy. Unfortunately the C language doesn’t support type inference… as part of the standard at least. An interesting extension is the __auto_type keyword which provides some limited type inference capabilities.

Since the auto keyword is practically useless, let’s just redefine it

Redefining keywords is usually a very bad idea. Although, GCC will allow it.

#define auto __auto_type

auto iv = qvec_new(int);

Yet again, our expectations differ from reality. This will not compile! The reason for this is that previously we were relying on the inline struct definition of qvec(T) that was declared on every initialization. Without this declaration, our new auto keyword cannot find any struct which matches the return type and must fail.

As an example, the following works fine

qvec(int) *a = qvec_new(int);
auto b = qvec_new(int);

because the qvec(int) declared the struct, so the next qvec return type can be deduced correctly. This is simply an inherent limitation of the tools we have. A simple solution would be to declare our structs ahead of time.

qvec(int);

int main(void)
{
    auto a = qvec_new(int); // Ok!
}

But this is one extra line to type for each qvec type required!

Drawbacks

We have a pretty good set of functions associated with our qvec so far. Usability is ok and we have a few of the more desirable features of C++ in our hands within C.

Undoubtedly however, there are some inherent problems that we just can’t solve.

Complex Container Types

We can do the following in C++

std::vector<std::vector<std::vector<int>>> v;

To do this with our qvec the following is required

typedef qvec(int) qvec_int;
typedef qvec(qvec_int) qvec_qvec_int;
qvec(qvec_qvec_int) *v = qvec_new(qvec_qvec_int);

Recall back to our new implementation. We generate a struct with a name qvec_##T where T is the type. Since this is concatenated to make an identifier, the types must be comprised only of characters which can exist within an identifier ([_0-9A-Za-z]). Any types which use other characters, such as functions, pointers and even our own qvec types must have a typedef before we can use them.

As an example, the following

qvec(char**);

expands to the invalid struct declaration

struct qvec_char** {
    size_t cap, len;
    char** data[];
};

Too Much Inlining

Since we are dealing with macros, every call is going to generate the same code at the call site. This isn’t too big a deal with our qvec, since a vector is inherently pretty simple, but if we wanted to use the same techniques to construct a generic hashmap, for example, the code duplication would be much worse.

This is where the generic containers which rely on simply generating the required functions for each type (see khash) definitely have the upper hand.

These approaches however do lose out a bit in terms of the expressiveness of the resulting API (which is our main focus here).
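For comparison, a rough sketch of the instantiation style is shown below. It is written in the spirit of khash but is not its actual API: VEC_DEFINE and the generated names are assumptions made for this example. The macro is expanded once per element type and produces ordinary functions, so call sites contain plain function calls rather than repeated macro bodies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical generator: one struct and one set of real functions per
   element type, instead of a macro body at every call site. */
#define VEC_DEFINE(name, T)                                              \
    struct name { size_t cap, len; T *data; };                           \
    static void name##_init(struct name *v)                              \
    {                                                                    \
        v->cap = 0; v->len = 0; v->data = NULL;                          \
    }                                                                    \
    static void name##_push(struct name *v, T x)                         \
    {                                                                    \
        if (v->len == v->cap) {                                          \
            size_t cap = v->cap ? v->cap * 2 : 16;                       \
            T *p = realloc(v->data, cap * sizeof(T));                    \
            assert(p != NULL); /* sketch: no real error handling */      \
            v->data = p; v->cap = cap;                                   \
        }                                                                \
        v->data[v->len++] = x;                                           \
    }                                                                    \
    static void name##_free(struct name *v)                              \
    {                                                                    \
        free(v->data);                                                   \
    }

/* Instantiate once for int: defines struct vec_int and vec_int_* functions. */
VEC_DEFINE(vec_int, int)
```

Usage then reads vec_int_push(&v, 5), which is wordier than qvec_push(v, 5) but gives a named type that can be passed between functions freely.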

Which Names are Which?

Say we wanted to do the following contrived thing

void print(qvec(int) *v)
{
    qvec_print(v);
}

int main(void)
{
    qvec(int) *v = qvec_new(int, 1, 2, 3);
    print(v);
}

This will spew out a mess of errors about anonymous structs. The reason is that the qvec(int) in the print parameter list declares a new anonymous struct, so the two qvec(int) declarations are completely different structures.

This can be worked around by doing a typedef at the start of your file and using this, but again at the cost of extra work for the programmer.
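A sketch of that workaround, using a cut-down stand-in for the qvec(T) macro so the example is self-contained (sum and make_range are illustrative helpers, not part of the real API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the post's qvec(T) macro. */
#define qvec(T) struct qvec_##T { size_t cap, len; T data[]; }

/* Declare the struct exactly once at file scope and give it a name. */
typedef qvec(int) qvec_int;

/* Every function now refers to the same type through the typedef. */
static int sum(const qvec_int *v)
{
    int total = 0;
    for (size_t i = 0; i < v->len; ++i)
        total += v->data[i];
    return total;
}

/* Build a vector holding 0, 1, ..., n - 1. */
static qvec_int *make_range(size_t n)
{
    qvec_int *v = malloc(sizeof(*v) + n * sizeof(v->data[0]));
    assert(v != NULL); /* sketch: no real error handling */
    v->cap = v->len = n;
    for (size_t i = 0; i < n; ++i)
        v->data[i] = (int)i;
    return v;
}
```

Since qvec_int names a single struct, a pointer returned by make_range can be handed to sum without any complaint from the compiler.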

How about the following example. Will this qvec_new be aware of the type being used within the Foo struct?

struct Foo {
    qvec(int) *values;
};

void foo_init(struct Foo *v)
{
    v->values = qvec_new(int);
}

int main(void)
{
    struct Foo f;
    foo_init(&f);
}

This does in fact work, perhaps to some surprise. Even so, it still highlights a pretty important problem: although the API is nice and appears easy to use, there are a number of naming issues that the user must be aware of, which greatly limits its usefulness as a “just works” type of structure.

A Final Look

#include "qvec.h"

typedef char* string;

typedef struct {
    int x, y;
} Tuple;

int main(void)
{
    qvec(string) *sv = qvec_new(string, "Who", "are", "you?");
    qvec_print(sv);
    qvec_at(sv, 2) = "we?";
    qvec_print(sv);
    qvec_free(sv);

    qvec(int) *iv = qvec_new(int, 1, 2, 3, 4);
    qvec_print(iv);
    printf("%d\n", qvec_pop(iv));
    qvec_free(iv);

    qvec(Tuple) *tv = qvec_new(Tuple, { .x = 0, .y = 1 }, { 4, 2 }, { 5, 4 });
    printf("%d\n", qvec_at(tv, 1).x);
    printf("%d\n", qvec_at(tv, 2).x);
    qvec_free(tv);
}

So would I recommend using this? Probably not. If you were insistent on sticking with C however I think the best compromise would be to generate the specific instantiations (similar to what khash does). This gets rid of most of the problems specified here. Alternatively, if performance and the type-safety isn’t a big deal, then a tried and tested void* implementation would be good too.

At the end of the day though, the pragmatic solution would be to just use C++ if there are no reasons not to and call it a day. Especially if you are considering performing these types of C macro chicanery.

Lwan Api Intro

2017-02-16T00:00:00Z

16 Feb 2017

See this repository for a complete example project.

Lwan is a high performance & scalable web server written in C.

The main page has now been updated so the following is no longer applicable. Skip to the Building section.

An API example is listed on the main project page.

#include "lwan.h"

static lwan_http_status_t
hello_world(lwan_request_t *request,
            lwan_response_t *response, void *data)
{
    static const char message[] = "Hello, World!";

    response->mime_type = "text/plain";
    strbuf_set_static(response->buffer, message, sizeof(message) - 1);

    return HTTP_OK;
}

int
main(void)
{
    const lwan_url_map_t default_map[] = {
        { .prefix = "/", .handler = hello_world },
        { .prefix = NULL }
    };
    lwan_t l;

    lwan_init(&l);

    lwan_set_url_map(&l, default_map);
    lwan_main_loop(&l);

    lwan_shutdown(&l);

    return 0;
}

This looks great, easy to use and no real surprises. Unfortunately, this is incompatible with the current master branch at the time of writing. Further, the documentation on using Lwan as a library is a bit sparse at this time.

This will be a quick-start guide to setting up a sample project and going from there.

Building

First, we’ll create a new project.

mkdir lwan-api-example
cd lwan-api-example

Next, let’s pull lwan into the project and build it. You will need CMake and zlib installed for this.

git clone https://github.com/lpereira/lwan # Use a submodule here if using git!
cd lwan
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

This generates liblwan.a and liblwan.so library files in the build/common directory. We’ll come back to these later; let’s start a simple project.

Creating the project

Let’s go back to our project root and create a new file, hello.c.

#include <lwan.h>

static enum lwan_http_status hello_world(
        struct lwan_request *request,
        struct lwan_response *response,
        void *data)
{
    static const char message[] = "Hello, World!";

    response->mime_type = "text/plain";
    strbuf_set_static(response->buffer, message, sizeof(message) - 1);

    return HTTP_OK;
}

int main(void)
{
    const struct lwan_url_map default_map[] = {
        { .prefix = "/", .handler = hello_world },
        { .prefix = NULL }
    };

    struct lwan l;
    lwan_init(&l);

    lwan_set_url_map(&l, default_map);
    lwan_main_loop(&l);

    lwan_shutdown(&l);
    return 0;
}

Note that the main difference from the older API is the removal of the _t suffixes (likely for POSIX conformance).

Building our project

The include headers are stored in the common directory. We also require the include header generated by CMake which is found in the build directory.

We then can build our project.

gcc -O2 -Ilwan/common -Ilwan/build hello.c lwan/build/common/liblwan.a \
    -o hello -lpthread -ldl -lz

We need to remember to link the pthread, dl and zlib libraries. That should be it, you should now have a binary hello that you can run.

Have fun!

PEG Grammar for Chord Notation

2017-12-17T00:00:00Z

17 Dec 2017

Recently I was looking for a grammar to categorize musical chords but was surprised that there weren’t any decent ones around.

The following is a PEG grammar which categorizes the most common cases for standard Jazz notation. There are a few missing edge cases (e.g. 7sus4).

I’m still unsure whether a full grammar is worthwhile due to the many edge cases. I’ve written a parser previously using parser combinators which was fairly straight-forward.

You can find an online peg parser generator here.

PolyChord          = Chord ('|' Chord)?
Chord              = Chord1 / Chord2 / Chord3 / Chord4

Chord1             = Note Special ChordUpper
Chord2             = Note ThirdSeventh Extended? ChordUpper
Chord3             = Note Third? Sixth ChordUpper
Chord4             = Note Third? Extended? ChordUpper

ChordUpper         = Addition? Alterations? Slash?
ThirdSeventh       = Augmented / Diminished
Augmented          = 'aug' / '+'
Diminished         = 'dim' / 'o' / 'ø'

Special            = '5' / 'sus2' / 'sus4'
Sixth              = '6/9' / '6'

Third              = (MajorThird !Extended) / MinorThird
MajorThird         = 'Maj' / 'M' / 'Δ'
MinorThird         = 'min' / 'm' / '-'

Extended           = ExtendedQuality? ExtendedInterval
ExtendedQuality    = MajorThird / 'dom'
ExtendedInterval   = '7' / '9' / '11' / '13'

Alterations        = Alteration / AlterationList
AlterationList     = '(' Alteration (',' Alteration)* ')'
Alteration         = Accidental AlterationInterval
AlterationInterval = '4' / '5' / '9' / '11' / '13'

Addition           = 'add' ('2' / '4' / '6')

Slash              = '/' Note Accidental?

Note               = [A-G] Accidental*
Accidental         = [b#]
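To give a feel for how these rules map onto code, here is a hand-written recognizer for just the Note and Accidental rules. It is a sketch rather than generated parser output, the function names are made up, and it assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>

/* Accidental = [b#] */
static int is_accidental(char c)
{
    return c == 'b' || c == '#';
}

/* Note = [A-G] Accidental*
   Returns the number of characters consumed from the start of s,
   or 0 if s does not begin with a note. */
static size_t match_note(const char *s)
{
    assert(s != NULL);
    if (s[0] < 'A' || s[0] > 'G')
        return 0;
    size_t n = 1;
    while (is_accidental(s[n])) /* greedy repetition, as in PEG */
        ++n;
    return n;
}
```

For the input C#m7 this consumes C#, leaving m7 for the remaining rules.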

Iterative Replacement of C with Zig

2017-07-19T00:00:00Z

19 Jul 2017

Zig is a programming language in a similar realm as C. Being more modern, it has a number of useful constructs such as sum types, compile-time introspection, improved error handling and no preprocessor!

This post will not describe the language itself (check the project page for that), but will show how it can be used to convert an existing C code-base into Zig. We will look at a simplistic example, but the general strategy remains the same.

The zig compiler that will be used in this post can be found here.

The finished conversion result is at this repository. For each heading, there is a corresponding commit which describes all changes made at each point in the conversion process.

The Project

commit

The project we will be replacing has the following structure:

$ tree .
.
├── compute.c
├── compute.h
├── compute_helper.c
├── compute_helper.h
├── display.c
├── display.h
├── main.c
└── Makefile

The contents of the files are:

main.c

#include "display.h"
#include "compute.h"

int main(void)
{
    display_char(compute('A'));
}

display.c

#include <stdio.h>

void display_char(char c)
{
    printf("%c\n", c);
}

compute.c

#include "compute_helper.h"

char compute(char a)
{
    return compute_helper(a) + 5;
}

compute_helper.c

char compute_helper(char a)
{
    return a + 1;
}

Makefile

SRCS := compute.c compute_helper.c display.c main.c
OBJS := $(SRCS:%.c=build/%.o)

main: $(OBJS)
	gcc -o main $(OBJS)

$(OBJS): build/%.o: %.c | mkdirs
	gcc -std=c99 -c $< -o $@

mkdirs:
	@mkdir -p build

clean:
	rm -rf build main

The .h file contents simply expose their corresponding .c implementations.

$ make
gcc -std=c99 -c compute.c -o build/compute.o
gcc -std=c99 -c compute_helper.c -o build/compute_helper.o
gcc -std=c99 -c display.c -o build/display.o
gcc -std=c99 -c main.c -o build/main.o
gcc -o main build/compute.o build/compute_helper.o build/display.o build/main.o

$ ./main
G
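
To see why the program prints G, it helps to trace the arithmetic. The sketch below puts the two compute functions from above into one file. The check_compute function is a hypothetical addition for illustration, and the numeric values in the comments assume an ASCII character set.

```c
#include <assert.h>

/* Same body as compute_helper.c above. */
char compute_helper(char a)
{
    return a + 1;
}

/* Same body as compute.c above. */
char compute(char a)
{
    return compute_helper(a) + 5;
}

/* Hypothetical self-check, not part of the original project.
 * In ASCII, 'A' is 65, so compute('A') is 65 + 1 + 5 = 71, which is 'G'. */
void check_compute(void)
{
    assert(compute('A') == 'G');
}
```

Every later step of the conversion should keep this result unchanged, which is why each section ends by running ./main again.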

The Zig Build System

commit

The first thing we will change is the Makefile, replacing it with zig's own build system. The build system is written in zig itself, which removes the need to know the oft-arcane idiosyncrasies of Makefiles.

build.zig

const Builder = @import("std").build.Builder;

pub fn build(b: &Builder) {
    const exe = b.addCExecutable("main");
    exe.addCompileFlags([][]const u8 {
        "-std=c99"
    });

    const source_files = [][]const u8 {
        "compute.c",
        "compute_helper.c",
        "display.c",
        "main.c"
    };

    for (source_files) |source| {
        exe.addSourceFile(source);
    }

    exe.setOutputPath("./main");
    b.default_step.dependOn(&exe.step);
}

First, we specify the main executable which we will be building. This constructs an object which represents a build step. For each source file in our project, we simply add the file to the main executable step.

This approach is far more imperative than the declarative approach of a Makefile. In my view, this is a good choice. Makefiles, whilst concise, can become exceedingly opaque and hard to parse, especially as a project grows and extra conditions need to be handled.

$ zig build --verbose
cc -c compute.c -o zig-cache/compute.c.o -std=c99
cc -c compute_helper.c -o zig-cache/compute_helper.c.o -std=c99
cc -c display.c -o zig-cache/display.c.o -std=c99
cc -c main.c -o zig-cache/main.c.o -std=c99
cc zig-cache/compute.c.o zig-cache/compute_helper.c.o zig-cache/display.c.o \
    zig-cache/main.c.o -o main -Wl,-rpath,zig-cache -rdynamic

$ ./main
G

First C Replacement

commit

The first actual source code we will replace is compute.c.

compute.zig

use @cImport(@cInclude("compute_helper.h"));

export fn compute(a: u8) -> u8 {
    compute_helper(a) + 5
}

This snippet demonstrates a few features of zig. First, zig is able to parse C header files directly. No binding interface needed! In this case, the use statement brings all definitions from compute_helper.h into the global namespace, allowing us to call the compute_helper function.

The other important thing to note here is the export specifier on our function. This is important as it tells zig that it should compile this against the C ABI. This means we can call this function from within other C files.

Since our header files are simple, we can continue using them unmodified. Zig also generates C headers automatically, however. We can compare these against the expected definitions to make sure that we implemented the function correctly.

zig-cache/compute.zig.h

#ifndef COMPUTE_2E_ZIG_H
#define COMPUTE_2E_ZIG_H

#include <stdint.h>

#ifdef __cplusplus
#define COMPUTE_2E_ZIG_EXTERN_C extern "C"
#else
#define COMPUTE_2E_ZIG_EXTERN_C
#endif

#if defined(_WIN32)
#define COMPUTE_2E_ZIG_EXPORT COMPUTE_2E_ZIG_EXTERN_C __declspec(dllimport)
#else
#define COMPUTE_2E_ZIG_EXPORT COMPUTE_2E_ZIG_EXTERN_C __attribute__((visibility ("default")))
#endif

COMPUTE_2E_ZIG_EXPORT uint8_t compute(uint8_t a);
COMPUTE_2E_ZIG_EXPORT __attribute__((__noreturn__)) void __zig_panic(const uint8_t * message_ptr, uintptr_t message_len);

#endif

Build System Modification

The second step we need to perform is modifying build.zig to compile both C and zig files and link them together.

build.zig

const Builder = @import("std").build.Builder;

pub fn build(b: &Builder) {
    const exe = b.addCExecutable("main");
    b.addCIncludePath(".");
    exe.addCompileFlags([][]const u8 {
        "-std=c99"
    });

    const source_files = [][]const u8 {
        "compute_helper.c",
        "display.c",
        "main.c"
    };

    for (source_files) |source| {
        exe.addSourceFile(source);
    }

    const zig_source_files = [][]const u8 {
        "compute.zig",
    };

    for (zig_source_files) |source| {
        const object = b.addObject(source, source);
        exe.addObject(object);
    }

    exe.setOutputPath("./main");
    b.default_step.dependOn(&exe.step);
}

This is mostly the same, except we now have a list of zig source files as well. This should be fairly self-explanatory: for each zig source, we create an object build step. This is then added to the exe build step.

Note that we also add the current directory to the C include path. This is important since the @cInclude function used by zig does not read headers from the local directory.

$ zig build --verbose
zig build-obj compute.zig --cache-dir zig-cache --output zig-cache/compute.zig.o \
    --output-h zig-cache/compute.zig.h --name compute.zig -isystem .
cc -c compute_helper.c -o zig-cache/compute_helper.c.o -std=c99 -I zig-cache
cc -c display.c -o zig-cache/display.c.o -std=c99 -I zig-cache
cc -c main.c -o zig-cache/main.c.o -std=c99 -I zig-cache
cc zig-cache/compute.zig.o zig-cache/compute_helper.c.o zig-cache/display.c.o \
    zig-cache/main.c.o -o main -Wl,-rpath,zig-cache -rdynamic

$ ./main
G

Using the Zig Standard Library

commit

The next file we will replace is display.c.

display.zig

const std = @import("std");
const printf = std.io.stdout.printf;

export fn display_char(c: u8)
{
    %%printf("{c}\n", c);
}

Since we want to end up using only zig, we can replace the C printf statement with zig’s own stdlib implementation. Zig does not depend on libc at all. Because this was the only use of libc in our project, we can pass the -nostdlib flag to enforce this during our C compilation.

build.zig

exe.addCompileFlags([][]const u8 {
    "-std=c99",
    "-nostdlib",
});

The only other changes are removing display.c from the C sources, and adding display.zig to the zig sources.

$ zig build --verbose
zig build-obj compute.zig --cache-dir zig-cache --output zig-cache/compute.zig.o \
    --output-h zig-cache/compute.zig.h --name compute.zig -isystem .
zig build-obj display.zig --cache-dir zig-cache --output zig-cache/display.zig.o \
    --output-h zig-cache/display.zig.h --name display.zig -isystem .
cc -c compute_helper.c -o zig-cache/compute_helper.c.o -std=c99 -nostdlib -I zig-cache -I zig-cache
cc -c main.c -o zig-cache/main.c.o -std=c99 -nostdlib -I zig-cache -I zig-cache
cc zig-cache/compute.zig.o zig-cache/display.zig.o zig-cache/compute_helper.c.o \
    zig-cache/main.c.o -o main -Wl,-rpath,zig-cache -rdynamic

$ ./main
G

Removing Header Files

commit

As we get further along in our replacement, we will eventually reach the point where we have zig files which are not used by any other C files. This is great as it means we can remove the header files.

Consider what happens when we convert compute_helper.c.

compute_helper.zig

pub fn compute_helper(a: u8) -> u8
{
    a + 1
}

The only dependency on this is compute.zig. We don’t need to export this using the C ABI and can just mark it pub for visibility. compute.zig can then be changed to import a zig file instead of the C header.

compute.zig

pub use @import("compute_helper.zig");

export fn compute(a: u8) -> u8 {
    compute_helper(a) + 5
}

$ zig build --verbose
zig build-obj compute.zig --cache-dir zig-cache --output zig-cache/compute.zig.o \
    --output-h zig-cache/compute.zig.h --name compute.zig -isystem .
zig build-obj compute_helper.zig --cache-dir zig-cache --output zig-cache/compute_helper.zig.o \
    --output-h zig-cache/compute_helper.zig.h --name compute_helper.zig -isystem .
zig build-obj display.zig --cache-dir zig-cache --output zig-cache/display.zig.o \
    --output-h zig-cache/display.zig.h --name display.zig -isystem .
cc -c main.c -o zig-cache/main.c.o -std=c99 -nostdlib -I zig-cache -I zig-cache -I zig-cache
cc zig-cache/compute.zig.o zig-cache/compute_helper.zig.o zig-cache/display.zig.o \
    zig-cache/main.c.o -o main -Wl,-rpath,zig-cache -rdynamic

$ ./main
G

The Final File

commit

Our project now has only 1 remnant left of C. Let’s remove it all!

main.zig

use @import("display.zig");
use @import("compute.zig");

pub fn main() -> %void {
    display_char(compute('A'));
}

The main things to note here are the use statements for import. Since we were converting a C project, we didn’t initially have any namespacing. Since zig has a proper module system, we would usually strongly prefer assigning our imports to a constant, e.g. const display = @import("display.zig").

Now, we need to edit our build.zig file.

build.zig

const Builder = @import("std").build.Builder;

pub fn build(b: &Builder) {
    const exe = b.addExecutable("main", "main.zig");

    exe.setOutputPath("./main");
    b.default_step.dependOn(&exe.step);
}

Much simpler! Zig can make use of the implicit dependency graph formed between imports. Individual object files do not need to be built for each file explicitly.

$ zig build --verbose
zig build-exe main.zig --cache-dir zig-cache --output main --name main

$ ./main
G

Closing

Zig makes this type of iterative conversion easier than most other languages do. For larger projects there will be unknown difficulties, however. These should ease as the language becomes more stable and refined.

Being able to easily replace C with a modern alternative is a real bonus in terms of safety and ergonomics. See this post by the creator of the language for some short examples of improvements.

If you want to know more about zig as a language, check out the project page.

Big Integers in Zig

2018-05-13T00:00:00Z

13 May 2018

I’ve recently been writing a big-integer library, zig-bn, in the Zig programming language.

The goal is to have reasonable performance from a fairly simple, generic implementation with no assembly routines.

I’ll list a few nice features about Zig which I think suit this sort of library before exploring some preliminary performance comparisons and what in the language encourages the speed.

Transparent Local Allocators

Unlike most languages, the Zig standard library does not have a default allocator implementation. Instead, allocators are specified at runtime, passed as arguments to the parts of the program which require them. I’ve used the same idea with this big integer library.

The nice thing about this is it is very easy to use different allocators on a per-integer level. A practical example may be to use a faster stack-based allocator for small temporaries, which can be bounded by some upper limit.

// Allocate an integer on the heap
var heap_allocator = std.heap.c_allocator;
var a = try BigInt.init(heap_allocator);
defer a.deinit();

// ... and one on the stack
var stack_allocator = std.debug.global_allocator;
var b = try BigInt.init(stack_allocator);
defer b.deinit();

// ... and some in a shared arena with shared deallocation
var arena = ArenaAllocator.init(heap_allocator);
defer arena.deinit();

var c = try BigInt.init(&arena.allocator);
var d = try BigInt.init(&arena.allocator);

This isn’t possible in GMP, which allows specifying custom allocation functions, but these are shared across all objects. Only one set of memory functions can be used per program.
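
For readers coming from C, the idea of a per-object allocator can be sketched as follows. This is not zig-bn’s or GMP’s API. The Allocator, BumpAllocator and BigInt types and all function names are hypothetical, chosen only to show an allocator being passed explicitly to each integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator interface: behaviour travels with the object. */
typedef struct Allocator {
    void *(*alloc)(struct Allocator *self, size_t size);
    void (*release)(struct Allocator *self, void *ptr);
} Allocator;

/* Heap-backed allocator using malloc/free. */
static void *heap_alloc(Allocator *self, size_t size) { (void)self; return malloc(size); }
static void heap_release(Allocator *self, void *ptr) { (void)self; free(ptr); }
Allocator heap_allocator = { heap_alloc, heap_release };

/* Fixed-buffer bump allocator: returns NULL once the buffer is exhausted. */
typedef struct {
    Allocator base; /* first member, so an Allocator* can be cast back */
    unsigned char *buf;
    size_t cap, used;
} BumpAllocator;

static void *bump_alloc(Allocator *self, size_t size)
{
    BumpAllocator *b = (BumpAllocator *)self;
    size_t aligned = (size + 7u) & ~(size_t)7u; /* keep 8-byte alignment */
    if (aligned < size || aligned > b->cap - b->used)
        return NULL;
    void *p = b->buf + b->used;
    b->used += aligned;
    return p;
}

/* Individual frees are no-ops; the whole buffer is dropped at once. */
static void bump_release(Allocator *self, void *ptr) { (void)self; (void)ptr; }

/* buf is assumed to be suitably aligned for the limb type. */
void bump_init(BumpAllocator *b, void *buf, size_t cap)
{
    b->base.alloc = bump_alloc;
    b->base.release = bump_release;
    b->buf = buf;
    b->cap = cap;
    b->used = 0;
}

typedef struct {
    Allocator *allocator; /* remembered so deinit uses the same allocator */
    uint32_t *limbs;
    size_t len;
} BigInt;

/* Returns 0 on success, -1 if the allocator is out of memory. */
int bigint_init(BigInt *x, Allocator *a, size_t len)
{
    assert(len != 0);
    x->limbs = a->alloc(a, len * sizeof(uint32_t));
    if (x->limbs == NULL)
        return -1;
    x->allocator = a;
    x->len = len;
    for (size_t i = 0; i < len; i++)
        x->limbs[i] = 0;
    return 0;
}

void bigint_deinit(BigInt *x)
{
    x->allocator->release(x->allocator, x->limbs);
    x->limbs = NULL;
}
```

Because each BigInt remembers its own allocator, one integer can live on the heap while another lives in a small fixed buffer, which is the property the paragraph above describes.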

Handling OOM

One issue with GMP is that out-of-memory conditions cannot easily be handled. The only feasible in-process way is to override the allocation functions and either use exceptions in C++ or longjmp back to a clean-up function which can attempt to handle this as best as it can.
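
To give an idea of what the longjmp workaround looks like, here is a minimal C sketch. It does not call GMP at all. The alloc_or_jump hook, the max_request limit and run_with_limit are hypothetical stand-ins for an allocation function that is not allowed to return failure to its caller.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf oom_handler;                /* where to resume after a failure */
static const size_t max_request = 64;      /* artificial limit to force failures */

/* Hypothetical allocation hook: it never returns NULL, it jumps instead. */
static void *alloc_or_jump(size_t size)
{
    if (size > max_request)
        longjmp(oom_handler, 1);
    void *p = malloc(size);
    if (p == NULL)
        longjmp(oom_handler, 1);
    return p;
}

/* Returns 0 on success, -1 if an allocation inside failed. */
int run_with_limit(size_t request)
{
    /* volatile: the value must stay well-defined after longjmp */
    void *volatile p = NULL;

    if (setjmp(oom_handler) != 0) {
        /* We arrive here from alloc_or_jump; clean up as best we can. */
        free(p);
        return -1;
    }

    assert(request != 0);
    p = alloc_or_jump(request);
    free(p);
    return 0;
}
```

The control flow is invisible at the call site of alloc_or_jump, which is the hidden error path that explicit error returns avoid.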

Since Zig was designed to handle allocation in a different way to C, we can handle these much more easily. For any operation that could fail (either out-of-memory or some other generic error), we can handle the error or pass it back up the call-stack.

var a = try BigInt.init(failing_allocator);
// maybe got an out-of-memory! if we did, let's pass it back to the caller
try a.set(0x123294781294871290478129478);

The small cost is that every possibly failing function must be handled explicitly (and for zig-bn, that is practically all of them). The provided syntax keeps this boilerplate minimal, and unlike GMP we can at least see where something could go wrong and not have to rely on hidden error control flow.

Compile-time switch functions

Zig provides a fair amount of compile-time support. A particular feature is the ability to pass an arbitrary type var to a function. This gives a duck-typing sort of feature and can provide more fluent interfaces than we otherwise could write.

For example:

pub fn plusOne(x: var) @typeOf(x) {
    const T = @typeOf(x);

    switch (@typeInfo(T)) {
        TypeId.Int => {
            return x + 1;
        },
        TypeId.Float => {
            return x + 1.0;
        },
        else => {
            @compileError("can't handle this type, sorry!");
        },
    }
}

This feature is used to combine set functions into a single function instead of needing a variety of functions for each type as in GMP (mpz_set_ui, mpz_set_si, …).
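
The closest C analogue to this kind of type dispatch is probably C11’s _Generic. The sketch below uses a toy one-limb Num type and hypothetical num_set_u64 and num_set_i64 functions (loosely mirroring the split between mpz_set_ui and mpz_set_si). It is not how zig-bn or GMP is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a big integer: sign plus a single 64-bit limb. */
typedef struct {
    int negative;
    uint64_t magnitude;
} Num;

void num_set_u64(Num *n, uint64_t v)
{
    assert(n != NULL);
    n->negative = 0;
    n->magnitude = v;
}

void num_set_i64(Num *n, int64_t v)
{
    assert(n != NULL);
    n->negative = v < 0;
    /* negate in unsigned arithmetic so INT64_MIN does not overflow */
    n->magnitude = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
}

/* One entry point; the argument type picks the setter at compile time.
 * Types not listed (e.g. double) fail to compile, much like @compileError. */
#define num_set(n, v) _Generic((v),          \
    int: num_set_i64,                        \
    long: num_set_i64,                       \
    long long: num_set_i64,                  \
    unsigned int: num_set_u64,               \
    unsigned long: num_set_u64,              \
    unsigned long long: num_set_u64)((n), (v))
```

Compared with the Zig version, the list of accepted types has to be spelled out by hand, and the dispatch lives in a macro rather than in an ordinary function.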

Performance

Perhaps the most important detail of a big integer library is its raw performance. I’ll walk through the low-level addition routine and look at some techniques we can use to speed it up incrementally.

The benchmarks used here can be found in this repository. We simply compute the 50000’th fibonacci number. This requires addition and subtraction only.

Our initial naive implementation is as follows. It uses 32-bit limbs (so our double-limb is a 64-bit integer) and simply propagates the carry. We force inline the per-limb addition and our debug asserts are compiled out in release mode. Memory allocation is handled in the calling function.

// a + b + *carry, sets carry to overflow bits
fn addLimbWithCarry(a: Limb, b: Limb, carry: &Limb) Limb {
    const result = DoubleLimb(a) + DoubleLimb(b) + DoubleLimb(*carry);
    *carry = @truncate(Limb, result >> Limb.bit_count);
    return @truncate(Limb, result);
}

fn lladd(r: []Limb, a: []const Limb, b: []const Limb) void {
    debug.assert(a.len != 0 and b.len != 0);
    debug.assert(a.len >= b.len);
    debug.assert(r.len >= a.len + 1);

    var i: usize = 0;
    var carry: Limb = 0;

    while (i < b.len) : (i += 1) {
        r[i] = @inlineCall(addLimbWithCarry, a[i], b[i], &carry);
    }

    while (i < a.len) : (i += 1) {
        r[i] = @inlineCall(addLimbWithCarry, a[i], 0, &carry);
    }

    r[i] = carry;
}

The results are as follows:

fib-zig: 0:00.75 real, 0.75 user, 0.00 sys
  debug: 0:06.61 real, 6.60 user, 0.00 sys

For comparison, the GMP run time is:

fib-c:   0:00.17 real, 0.17 user, 0.00 sys

A more comparable generic C implementation (CPython’s built-in integers) is:

fib-py:  0:00.77 real, 0.77 user, 0.00 sys

A bit of work to do against GMP! We aren’t out of the ballpark compared to less heavily optimized libraries, though. We are comparing the debug-mode runtime as well, since I consider it important that debug builds run reasonably quickly for a good development cycle, and not orders of magnitude slower.
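
To make the carry propagation and the shape of the benchmark concrete, here is a rough C sketch that computes Fibonacci numbers with 32-bit limbs. It is not the benchmark code from the repository. It assumes a fixed maximum size (MAX_LIMBS) instead of allocating, uses addition only, and the names fib and add_in_place are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Limb;
typedef uint64_t DoubleLimb;

enum { MAX_LIMBS = 64 }; /* arbitrary fixed capacity for this sketch */

/* a + b + *carry; stores the overflow bits back into *carry. */
static Limb add_limb_with_carry(Limb a, Limb b, Limb *carry)
{
    DoubleLimb result = (DoubleLimb)a + (DoubleLimb)b + (DoubleLimb)*carry;
    *carry = (Limb)(result >> 32);
    return (Limb)result;
}

/* acc += x over len limbs; returns the carry out of the top limb. */
static Limb add_in_place(Limb *acc, const Limb *x, size_t len)
{
    Limb carry = 0;
    for (size_t i = 0; i < len; i++)
        acc[i] = add_limb_with_carry(acc[i], x[i], &carry);
    return carry;
}

/* Writes F(n) into out[0..len), least significant limb first.
 * The caller must choose len large enough for F(n + 1) to fit. */
void fib(unsigned n, Limb *out, size_t len)
{
    assert(len != 0 && len <= MAX_LIMBS);

    Limb a[MAX_LIMBS] = {0}; /* F(0) */
    Limb b[MAX_LIMBS] = {0}; /* F(1) */
    b[0] = 1;

    Limb *cur = a, *next = b;
    for (unsigned i = 0; i < n; i++) {
        /* (cur, next) <- (next, cur + next) */
        Limb carry = add_in_place(cur, next, len);
        assert(carry == 0); /* result must fit in len limbs */
        (void)carry;
        Limb *tmp = cur;
        cur = next;
        next = tmp;
    }

    for (size_t i = 0; i < len; i++)
        out[i] = cur[i];
}
```

For example, F(48) = 4807526976 is the first Fibonacci number that needs a second 32-bit limb, so it is a convenient value for checking that the carry is propagated.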

Leveraging Compiler Addition Builtins

Zig provides a number of LLVM builtins to us. While these shouldn’t usually be required, they can be valuable in certain cases. We’ll be using the @addWithOverflow builtin to perform addition while catching possible overflow.

Our new addition routine is now:

fn lladd(r: []Limb, a: []const Limb, b: []const Limb) void {
    debug.assert(a.len != 0 and b.len != 0);
    debug.assert(a.len >= b.len);
    debug.assert(r.len >= a.len + 1);

    var i: usize = 0;
    var carry: Limb = 0;

    while (i < b.len) : (i += 1) {
        var c: Limb = 0;
        c += Limb(@addWithOverflow(Limb, a[i], b[i], &r[i]));
        c += Limb(@addWithOverflow(Limb, r[i], carry, &r[i]));
        carry = c;
    }

    while (i < a.len) : (i += 1) {
        carry = Limb(@addWithOverflow(Limb, a[i], carry, &r[i]));
    }

    r[i] = carry;
}

The new results:

fib-zig: 0:00.69 real, 0.69 user, 0.00 sys
  debug: 0:06.47 real, 6.42 user, 0.00 sys

A small but noticeable improvement.
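
GCC and Clang expose a similar builtin to C as __builtin_add_overflow, which returns true when the addition wraps. As a point of comparison, the routine above could be sketched in C roughly as follows. The name lladd_overflow and the explicit length parameters are choices made for this sketch, since C has no slices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Limb;

/* r = a + b, where a has a_len limbs, b has b_len limbs (a_len >= b_len),
 * and r has room for a_len + 1 limbs. Least significant limb first. */
void lladd_overflow(Limb *r, const Limb *a, size_t a_len,
                    const Limb *b, size_t b_len)
{
    assert(a_len != 0 && b_len != 0);
    assert(a_len >= b_len);

    size_t i = 0;
    Limb carry = 0;

    for (; i < b_len; i++) {
        Limb sum;
        /* At most one of the two additions can overflow. */
        bool c1 = __builtin_add_overflow(a[i], b[i], &sum);
        bool c2 = __builtin_add_overflow(sum, carry, &r[i]);
        carry = (Limb)c1 + (Limb)c2;
    }

    for (; i < a_len; i++)
        carry = (Limb)__builtin_add_overflow(a[i], carry, &r[i]);

    r[i] = carry;
}
```

The builtin is a compiler extension rather than standard C, so this sketch is only portable to compilers that provide it.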

Improving Debug Performance

Debug mode in Zig performs runtime safety checks, which include array bounds checks and other checks for possible undefined behavior.

For these inner loops this is a lot of overhead. Our assertions are sufficient to cover all the looping cases. We can disable these safety checks on a per-block basis:

fn lladd(r: []Limb, a: []const Limb, b: []const Limb) void {
    @setRuntimeSafety(false);
    ...
}

fib-zig: 0:00.69 real, 0.69 user, 0.00 sys
  debug: 0:03.91 real, 3.90 user, 0.00 sys

That is a lot better.

64-bit limbs (and 128-bit integers)

We have been using 32-bit words this entire time. Our machine word size, however, is 64 bits. Let’s change our limb size only, and rerun our tests.

fib-zig: 0:00.35 real, 0.35 user, 0.00 sys
  debug: 0:01.95 real, 1.95 user, 0.00 sys

Unsurprisingly, this is now twice as fast! When using 64-bit limbs, it is very useful if your compiler supports builtin 128-bit integer types. The reason is that it makes handling overflow in addition, and especially multiplication, much simpler and easier for the compiler to optimize. Otherwise, software workarounds are needed, which can be much less performant.
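
The benefit of a 128-bit type is easiest to see in code. The C sketch below uses the unsigned __int128 extension available in GCC and Clang on 64-bit targets, with hypothetical function names. With it, the carry of an addition and the high half of a multiplication are each just a shift away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Limb;
typedef unsigned __int128 DoubleLimb; /* GCC/Clang extension, not standard C */

/* a + b + *carry; stores the overflow bit back into *carry. */
Limb add_limb_with_carry64(Limb a, Limb b, Limb *carry)
{
    assert(carry != NULL);
    DoubleLimb result = (DoubleLimb)a + (DoubleLimb)b + (DoubleLimb)*carry;
    *carry = (Limb)(result >> 64);
    return (Limb)result;
}

/* Full 64x64 -> 128 bit product: returns the low limb, stores the high limb. */
Limb mul_limb64(Limb a, Limb b, Limb *hi)
{
    assert(hi != NULL);
    DoubleLimb product = (DoubleLimb)a * (DoubleLimb)b;
    *hi = (Limb)(product >> 64);
    return (Limb)product;
}
```

Without a double-width type, the multiplication in particular has to be split into four 32-bit partial products and recombined by hand.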

Implementation Performance Summary

Benchmark code here.

A performance comparison using the following libraries/languages:

Note that the C (GMP) and Go implementations use assembly, while the Rust and CPython implementations are written in Rust and C respectively, which makes them comparable as non-tuned generic implementations.

System Info

Architecture:        x86_64
Model name:          Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz

Compiler Versions

zig:  0.2.0.ef3111be
gcc:  gcc (GCC) 8.1.0
go:   go version go1.10.2 linux/amd64
py:   Python 3.6.5
rust: rustc 1.25.0 (84203cac6 2018-03-25)

Addition/Subtraction Test

Computes the 50,000th fibonacci number.

fib-zig: 0:00.35 real, 0.35 user, 0.00 sys
fib-c:   0:00.17 real, 0.17 user, 0.00 sys
fib-go:  0:00.20 real, 0.20 user, 0.00 sys
fib-py:  0:00.75 real, 0.75 user, 0.00 sys
fib-rs:  0:00.81 real, 0.81 user, 0.00 sys

Multiplication/Addition Test

Computes the 50,000th factorial.

Zig uses naive multiplication only, while all others use asymptotically faster algorithms such as Karatsuba multiplication.
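
For reference, naive (schoolbook) multiplication can be sketched in C as below, using 32-bit limbs and a 64-bit intermediate. This is an illustration of the general technique, not zig-bn’s actual code, and the name llmul_naive is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Limb;
typedef uint64_t DoubleLimb;

/* r = a * b, least significant limb first.
 * r must have room for a_len + b_len limbs and must not alias a or b. */
void llmul_naive(Limb *r, const Limb *a, size_t a_len,
                 const Limb *b, size_t b_len)
{
    assert(a_len != 0 && b_len != 0);

    for (size_t k = 0; k < a_len + b_len; k++)
        r[k] = 0;

    for (size_t i = 0; i < a_len; i++) {
        DoubleLimb carry = 0;
        for (size_t j = 0; j < b_len; j++) {
            /* limb product + existing partial sum + carry always fits in 64 bits */
            DoubleLimb t = (DoubleLimb)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (Limb)t;
            carry = t >> 32;
        }
        r[i + b_len] = (Limb)carry;
    }
}
```

Every limb of one operand is multiplied with every limb of the other, so the cost grows with the product of the two lengths. Karatsuba and similar algorithms reduce that growth for large operands.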

fac-zig: 0:00.54 real, 0.54 user, 0.00 sys
fac-c:   0:00.18 real, 0.18 user, 0.00 sys
fac-go:  0:00.21 real, 0.21 user, 0.00 sys
fac-py:  0:00.50 real, 0.48 user, 0.02 sys
fac-rs:  0:00.53 real, 0.53 user, 0.00 sys

Division Test (single-limb)

Computes the 20,000th factorial then divides it back down to 1.

Rust is most likely much slower since it doesn’t special-case single-limb divisors.
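
A single-limb divisor allows a much simpler loop than general long division. The following C sketch shows the idea with 32-bit limbs. The name lldiv1 is hypothetical and this is not taken from any of the benchmarked libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Limb;
typedef uint64_t DoubleLimb;

/* Divides the n-limb number a (least significant limb first) by the
 * single limb d. Writes n quotient limbs to q and returns the remainder.
 * q may be the same array as a. */
Limb lldiv1(Limb *q, const Limb *a, size_t n, Limb d)
{
    assert(n != 0 && d != 0);

    DoubleLimb rem = 0;
    /* Walk from the most significant limb down, carrying the remainder. */
    for (size_t i = n; i-- > 0;) {
        DoubleLimb cur = (rem << 32) | a[i];
        q[i] = (Limb)(cur / d); /* fits: rem < d, so cur / d < 2^32 */
        rem = cur % d;
    }
    return (Limb)rem;
}
```

Each step is one double-limb division by a single limb, so the whole division is linear in the length of the dividend.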

facdiv-zig: 0:00.99 real, 0.98 user, 0.00 sys
facdiv-c:   0:00.16 real, 0.16 user, 0.00 sys
facdiv-go:  0:00.93 real, 0.93 user, 0.00 sys
facdiv-py:  0:00.99 real, 0.99 user, 0.00 sys
facdiv-rs:  0:05.01 real, 4.98 user, 0.00 sys

Summary

In short, zig-bn has managed to get fairly good performance from a pretty simple implementation. It is twice as fast as other generic libraries for the functions we have optimized, and is likely to be similarly fast using comparable algorithms for multiplication/division.

While I consider these good results for a very simple implementation (<1k loc, excluding tests) it is still lacking vs. GMP. Most notably, the algorithms used are much more advanced and the gap would continue to grow as numbers grew even larger. Hats off to the GMP project, as always.

A good start for a week’s work.