Programming the kernel


It is important to note that kernel programming differs significantly from userspace programming. The kernel is a standalone entity, which cannot use userspace libraries, even libc on Linux or kernel32.dll on Windows. As a result, the usual functions used in userspace (printf, malloc, free, open, read, write, memcpy, strcpy, etc.) can no longer be used.

Consequently, kernel programming is based on a completely new and independent API that is unrelated to the userspace API, whether POSIX, Win32 or ANSI C. An important difference in kernel programming is how memory is accessed and allocated. Because kernel programming is done at a level very close to the physical machine, there are important rules regarding memory management.

First, the kernel works with several types of memory:
– physical memory
– virtual memory in the kernel address space
– virtual memory from the address space of a process
– resident memory – we know for sure that the pages accessed are present in the physical memory

The virtual memory in the address space of a process cannot be considered resident because of the virtual memory mechanisms implemented by the operating system: pages may have been swapped out, or they may not yet be present in physical memory as a result of demand paging.

The memory in the kernel address space may or may not be resident. Both the data and code segments of a module and the kernel stack of a process are resident.

Dynamic memory may or may not be resident, depending on how it is allocated. When working with resident memory, things are simple: memory can be accessed at any time. However, if working with non-resident memory, then it can only be accessed from certain contexts.

Non-resident memory can only be accessed from the process context. Accessing non-resident memory from the interrupt context has unpredictable results, so when the operating system detects such an access it takes drastic measures, halting or resetting the system, to prevent serious corruption.

The virtual memory of a process cannot be accessed directly from the kernel. In general, accessing the address space of a process is strongly discouraged, but there are situations in which a device driver has to do it. The typical case is a device driver that has to access a userspace buffer. In this case, the device driver must use special functions and not access the buffer directly. This is necessary to prevent access to invalid memory areas.
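The idea behind those special functions can be sketched in ordinary userspace C. The example below is only a model, not kernel code: `struct fake_user_space` and `model_copy_from_user` are hypothetical names invented for illustration. It loosely follows the contract of Linux's `copy_from_user()`, which returns the number of bytes that could not be copied (0 on success), but it simplifies matters by rejecting the whole request when any part of the range is invalid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a process address space: only bytes
 * [0, size) of `mem` count as valid "user" memory in this model. */
struct fake_user_space {
    const unsigned char *mem;
    size_t size;
};

/*
 * Userspace model of a checked copy. Returns the number of bytes that
 * could NOT be copied, so 0 means success. Unlike the real kernel
 * function, this model copies nothing if any part of the requested
 * range falls outside the valid region.
 */
size_t model_copy_from_user(void *to, const struct fake_user_space *us,
                            size_t offset, size_t n)
{
    /* Range check written so that offset + n cannot overflow. */
    if (offset > us->size || n > us->size - offset)
        return n;                       /* invalid range: nothing copied */

    memcpy(to, us->mem + offset, n);
    return 0;
}
```

In an actual Linux driver the usual pattern is to treat any nonzero result as a fault, for example `if (copy_from_user(kbuf, ubuf, len)) return -EFAULT;`.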

Another memory-related difference from userspace programming concerns the stack, whose size is fixed and limited. In the Linux kernel, a 4K stack is used by default, and in Windows, a 12K stack is used (the exact size depends on the architecture and the kernel version). For this reason, the allocation of large structures on the stack and the use of recursive calls should be avoided.
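As a rough illustration of keeping large objects off the stack, the following userspace sketch places a 16K scratch buffer on the heap instead of declaring it as a local array. The function name `checksum_with_scratch` and the buffer size are assumptions made for this example; `malloc()` and `free()` stand in for the kernel allocator (`kmalloc()` and `kfree()` on Linux).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE (16 * 1024)   /* larger than a 4K or 12K kernel stack */

/*
 * Sums the bytes of `data`, staging them through a large scratch buffer.
 * The buffer is heap-allocated; a local `unsigned char scratch[SCRATCH_SIZE]`
 * would not fit on a small fixed-size stack.
 * Returns 0 on success or -ENOMEM if the allocation fails.
 */
int checksum_with_scratch(const unsigned char *data, size_t len, uint32_t *out)
{
    unsigned char *scratch = malloc(SCRATCH_SIZE);  /* kmalloc() in a Linux module */
    uint32_t sum = 0;

    if (!scratch)
        return -ENOMEM;

    while (len > 0) {
        size_t chunk = len < SCRATCH_SIZE ? len : SCRATCH_SIZE;

        memcpy(scratch, data, chunk);
        for (size_t i = 0; i < chunk; i++)
            sum += scratch[i];
        data += chunk;
        len -= chunk;
    }

    free(scratch);                                  /* kfree() in a Linux module */
    *out = sum;
    return 0;
}
```

The loop is also iterative rather than recursive, so stack usage stays constant regardless of the input length.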

Regarding the execution mode in the kernel, we distinguish two contexts: process context and interrupt context. We are in the process context when we are running code as a result of a system call or when we are running in the context of a kernel thread. When we are running in an interrupt handler or a deferred action, we are in the interrupt context.

One of the most important features of kernel programming is parallelism. Both Linux and Windows support SMP systems with multiple processors, and both also support kernel preemption. This makes kernel programming more difficult, because access to global variables must be synchronized with either spinlocks or blocking primitives.
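To make the need for synchronization concrete, here is a minimal userspace sketch of a spinlock protecting a global counter. It is built from C11 atomics and POSIX threads, not from kernel primitives; `toy_spin_lock`, `toy_spin_unlock` and `run_counter_demo` are hypothetical names for this example. In the Linux kernel the corresponding calls would be `spin_lock()` and `spin_unlock()` on a `spinlock_t`.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_flag counter_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */
static long shared_counter;                          /* global guarded by counter_lock */

static void toy_spin_lock(void)
{
    /* Busy-wait until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(&counter_lock, memory_order_acquire))
        ;
}

static void toy_spin_unlock(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

static void *worker(void *arg)
{
    long iters = *(long *)arg;

    for (long i = 0; i < iters; i++) {
        toy_spin_lock();
        shared_counter++;            /* critical section */
        toy_spin_unlock();
    }
    return NULL;
}

/* Runs up to 16 threads that each increment the counter `iters` times
 * and returns the final counter value. */
long run_counter_demo(int nthreads, long iters)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    shared_counter = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &iters) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return shared_counter;
}
```

Without the lock, the increments from different threads could interleave and updates would be lost. A spinlock suits short critical sections like this one; longer ones, or ones that may sleep, call for a blocking primitive instead.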

Both Linux and Windows use preemptive kernels. The notion of preemptive multitasking should not be confused with the notion of a preemptive kernel. Preemptive multitasking means that the operating system forcibly interrupts a process running in userspace when its time slice has expired, in order to run another process. A kernel is preemptive if a process running in kernel mode can also be interrupted in order to run another process.

For programming in the Linux kernel, the convention functions use to indicate success is identical to that in UNIX programming: 0 for success, or a value other than 0 for failure (in the kernel, typically a negative error code such as -ENOMEM).
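The return convention can be sketched as follows. The structure and functions (`struct dev_config`, `set_buf_size`, `init_device`) and the size limit are hypothetical, chosen only to show the pattern; the negative `errno` values follow the common Linux kernel idiom.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_BUF_SIZE 4096               /* hypothetical limit for this example */

struct dev_config {                     /* hypothetical device configuration */
    size_t buf_size;
};

/* Returns 0 on success or a negative errno value on failure. */
int set_buf_size(struct dev_config *cfg, long size)
{
    if (!cfg)
        return -EINVAL;
    if (size <= 0 || size > MAX_BUF_SIZE)
        return -EINVAL;

    cfg->buf_size = (size_t)size;
    return 0;
}

/* Callers test for nonzero and pass the error code up unchanged. */
int init_device(struct dev_config *cfg, long size)
{
    int ret = set_buf_size(cfg, size);

    if (ret)
        return ret;

    /* further initialization steps would follow here */
    return 0;
}
```

Passing the error code up unchanged lets the outermost caller see why the operation failed, not just that it did.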