spartakus: Using sparse to have semantic checks for kernel ABI breakages

I have been working on this project I have named spartakus that deals with kernel ABI checks through semantic processing of the kernel source code. I have made the source code available on Github some time back and this does deserve a blog post.

Introduction
spartakus is a tool that can be used to generate checksums for exported kernel symbols through semantic processing of the source code using sparse. These checksums would constitute the basis for kernel ABI checks, with changes in checksums meaning a change in the kABI.

spartakus (which is currently a WIP) is forked from sparse and has been modified to fit the requirements of semantic processing of the kernel source for kernel ABI checks. This adds a new binary ‘check_kabi‘ upon compilation, which can be used during the linux kernel build process to generate the checksums for all the exported symbols. These checksums are stored in Module.symvers file which is generated during the build process if the variable CONFIG_MODVERSIONS is set in the .config file.

What purpose does spartakus serve?
In an earlier post I had spoken a bit on exported symbols constituting the kernel ABI and how its stability can be kept track of through CRC checksums. genksyms has been the tool that had been doing this job of generating the checksums for exported symbols that constitute the kernel ABI so far, but it has several problems/limitations that make the job of developers difficult wrt maintaining the stability of the kernel ABI. Some of these limitations as determined by me are mentioned below:

1.genksyms generates different checksums for structs/unions for declarations which are semantically similar. For example, for the following 2 declarations for the struct ‘list_head’:

struct list_head {
    struct list_head *next, *prev;
};

struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

both declarations are essentially the same and should not result in a change in kABI wrt ‘list_head’. Sparse treats these 2 declarations as semantically the same and different checksums are not generated.

2. For variable declarations with just unsigned/signed specification with no type specified, sparse considers the type as int by default.

Hence,
‘unsigned foo’ is converted to ‘unsigned int foo’ and then processed to get the corresponding checksum. On the other hand, genksyms would generate different checksums for 2 different definitions which are semantically the same.

License
sparse is licensed under the MIT license which is GPL compatible. The added files as mentioned above are under a GPLv2 license since I have used code from genksyms which is licensed under GPLv2.

Source Code
Development on spartakus is in progress and the corresponding source code is hosted on Github.

Lastly, I know there will be at least who will wonder why the name ‘spartakus’. Well I do not totally remember, but it had something to do with sparse and Spartacus. Does sound cool though?

Advertisements

Not so easy way to know about your Kernel ABI

My work in Red Hat is related to the Linux kernel ABI (or kABI). This has been an entirely new and exciting experience for me since I became a Red Hat associate in May last year. In this blog post and some posts subsequent to this I would like to talk about everything I have been learning so far.

First up, ABI or the Application Binary Interface is a low level (binary level) interface exposed by some application. The linux kernel and its accompanying modules also expose (or exports) a binary interface which we refer to as the kernel ABI or kABI. It is important that this exported interface remains stable because modules written for a particular kABI would need to be recompiled if the interface does change in a subsequent kernel version.

The exported binary interface that I keep referring to is nothing except a list of symbols which have been decided to be exported. These symbols could be function names, enums, structs or anything. To find these exported symbols, you would just need to ‘grep’ through the linux source for “EXPORT_SYMBOL”. Following is a section of the grep output when I search for EXPORT_SYMBOL in the source code for kernel-3.14.5-200.fc20 built for Fedora:

sbairagy@dhcp193-120 ~/r/k/linux-3.14> grep -Rn "EXPORT_SYMBOL" .
./kernel/async.c:303:EXPORT_SYMBOL_GPL(async_synchronize_cookie_domain);
./kernel/async.c:316:EXPORT_SYMBOL_GPL(async_synchronize_cookie);
./kernel/gcov/base.c:52:EXPORT_SYMBOL(__gcov_init);
./kernel/gcov/base.c:62:EXPORT_SYMBOL(__gcov_flush);
./kernel/gcov/base.c:68:EXPORT_SYMBOL(__gcov_merge_add);
./kernel/gcov/base.c:74:EXPORT_SYMBOL(__gcov_merge_single);
./kernel/gcov/base.c:80:EXPORT_SYMBOL(__gcov_merge_delta);
./kernel/gcov/base.c:86:EXPORT_SYMBOL(__gcov_merge_ior);
./kernel/tracepoint.c:395:EXPORT_SYMBOL_GPL(tracepoint_probe_register);
./kernel/tracepoint.c:439:EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);

All symbols enclosed in brackets after EXPORT_SYMBOL or EXPORT_SYMBOL_GPL are exported as a part of the binary interface exported by the kernel. When the kernel is built for Fedora, these exported symbols are stored in a file called Module.symvers. You can find this file for the kernel version you are running in /usr/src/kernels/<version>/. A section of the Module.symvers file for the kernel I am using is shown below:

sbairagy@dhcp193-120 ~/r/k/linux-3.14> vim /usr/src/kernels/3.14.8-200.fc20.x86_64/Module.symvers
0x00000000      tveeprom_read   drivers/media/common/tveeprom   EXPORT_SYMBOL
0x00000000      cx2341x_log_status      drivers/media/common/cx2341x    EXPORT_SYMBOL
0x00000000      ttm_bo_acc_size drivers/gpu/drm/ttm/ttm EXPORT_SYMBOL
0x00000000      inet_sendmsg    vmlinux EXPORT_SYMBOL
0x00000000      zs_get_total_size_bytes vmlinux EXPORT_SYMBOL_GPL

The 0th column contain checksums corresponding to the exported symbols which one can find in column 1. You might notice that the checksums for all the symbols are 0x00000000. This is not odd because some variable called CONFIG_MODVERSIONS had not been set during the kernel build process thus disabling generation of checksums for the exported symbols.

These checksums are vital to catching kABI breakages. kABI breakage occurs whenever changes are made to exported symbols. As a simple example, if we have a exported symbol ‘colours’ which refers to an enum ‘colours’ defined as

enum colours {"Red", "Green", "Blue"}

and if we decide to change this enum by adding one more colour (say “Yellow”) to it, then we say that the change in the exported symbol ‘colours’ have resulted in a change in the kernel ABI. Now whenever such changes happen for any symbol, the checksum corresponding to that symbol also change which leads to easy detection of kernel ABI change (or breakage).

How are these checksums generated? Well, there is a tool called genksyms which is integrated with the Linux kernel source and does this job of generating checksums for exported symbols during build process, if CONFIG_MODVERSIONS variable is set. In my subsequent posts I will write about the genksyms tool and how it actually generates the all important checksums. Till then, cheers. 😉