Kernel API changes from 2.0 to 2.2

Richard Gooch



The 2.2 Linux kernel includes many improvements in speed, resource utilisation, robustness and scalability compared to 2.0. Some of these improvements required changes to the kernel API (the programming interface to internal kernel services). If you maintain a 3rd party driver, filesystem or other kernel code, this document may provide a few quick tips to help you port from kernel 2.0 to 2.2.

If you really want to get with the times, then read the porting guide for changing from 2.2 to 2.4 kernels. Note that the 2.4 kernel is in late development, whereas the 2.0 and 2.2 series kernels are production (stable).

Copying data to/from user space

Early in 2.1.x Linus came up with a very clever way to improve the speed of copying to/from user space from within kernel space. Previous kernels required you to check if the buffer passed from user space was valid using the verify_area() function. If the buffer was valid, you could then call memcpy_tofs() to copy data from kernel space to user space. The verify_area() function was slow, because it had to check each page to see if there was a valid mapping.

In 2.1.x (and hence in 2.2.x) the need to verify each page of the user space buffer has been removed, and instead exception handling is used to trap illegal buffers. This avoids race conditions on SMP machines and costly validation checks. The verify_area() function now just checks whether the buffer range is legal, which is a quick operation.

Now, if you want to copy data to user space, a new function is required: copy_to_user(). You would use it something like this:

	if ( copy_to_user (ubuff, kbuff, length) ) return -EFAULT;
where ubuff is the user space buffer, kbuff is the kernel space buffer and length is the number of bytes to copy. If the copy_to_user() function returns a non-zero value, it means that some of the data could not be copied (due to an invalid buffer). In this case, we return -EFAULT to indicate that the buffer was not valid. Similarly, to copy from user space to kernel space:
	if ( copy_from_user (kbuff, ubuff, length) ) return -EFAULT;
Note that these two functions automatically call verify_area() so you no longer need to call it yourself.
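
The return-value convention can be tried outside the kernel with a small user space model. The sketch below is not kernel code: model_copy_to_user(), user_region and mydev_read_model() are hypothetical names, and "user space" is simply an array. Unlike the real function, the model copies either everything or nothing. It only illustrates that a non-zero return (bytes not copied) is turned into -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*  Stand-in for the process address space: only this array is "valid"  */
static char user_region[64];

/*  Is [p, p + n) entirely inside user_region?  */
static int in_user_region (const void *p, unsigned long n)
{
    uintptr_t start = (uintptr_t) user_region;
    uintptr_t addr = (uintptr_t) p;

    if (n > sizeof user_region) return 0;
    return addr >= start && addr - start <= sizeof user_region - n;
}

/*  Model of copy_to_user(): returns the number of bytes NOT copied  */
static unsigned long model_copy_to_user (void *to, const void *from,
					 unsigned long n)
{
    if ( !in_user_region (to, n) ) return n;
    memcpy (to, from, n);
    return 0;
}

/*  Driver-style caller: any shortfall is reported as -EFAULT  */
static long mydev_read_model (char *ubuff, unsigned long length)
{
    static const char kbuff[] = "hello";

    if (length > sizeof kbuff) length = sizeof kbuff;
    if ( model_copy_to_user (ubuff, kbuff, length) ) return -EFAULT;
    return (long) length;
}

/*  Usage: a valid buffer succeeds, a buffer outside the region fails  */
static void example (void)
{
    char outside[8];

    assert (mydev_read_model (user_region, 6) == 6);
    assert (mydev_read_model (outside, 6) == -EFAULT);
}
```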

File operation methods

After kernel 2.1.42 a new directory cache (dcache) layer was added which optimised directory search operations. Typical improvements are a 4 times speedup in find search times. This new layer required changes to the file operations interface. For writers of device drivers, the changes are relatively simple: instead of passing a struct inode * to some of your methods, the kernel now passes a struct dentry *. If your driver needs to reference the inode, the following code will suffice:
	struct inode *inode = dentry->d_inode;
assuming dentry is the variable name of the dentry. Some drivers don't care about the inode, so you can ignore this step. What you must change, however, are the declarations of your method functions. Note that some methods still have the inode and not the dentry passed to them.

Some methods don't even provide the dentry, and only provide struct file *. In this case, you can do the following to extract the dentry:

	struct dentry *dentry = file->f_dentry;
assuming file is the variable name of the file pointer.
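
The two steps can be chained to get from a file pointer to the inode. The sketch below uses cut-down user space stand-ins (the model_ structures and mydev_inode_number() are hypothetical). Only the field names f_dentry, d_inode and i_ino follow the kernel structures, and everything else those structures contain is left out.

```c
#include <assert.h>

/*  Minimal stand-ins holding only the fields used here  */
struct model_inode  { unsigned long i_ino; };
struct model_dentry { struct model_inode *d_inode; };
struct model_file   { struct model_dentry *f_dentry; };

/*  A method that is only handed the file pointer  */
static unsigned long mydev_inode_number (struct model_file *file)
{
    struct model_dentry *dentry = file->f_dentry;
    struct model_inode *inode = dentry->d_inode;

    return inode->i_ino;
}

/*  Usage: build the chain by hand and walk it  */
static void example (void)
{
    struct model_inode inode = { 42 };
    struct model_dentry dentry = { &inode };
    struct model_file file = { &dentry };

    assert (mydev_inode_number (&file) == 42);
}
```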

Below is a list (as of kernel 2.2.1) of the file operations methods:

loff_t llseek (struct file *, loff_t, int);
ssize_t read (struct file *, char *, size_t, loff_t *);
ssize_t write (struct file *, const char *, size_t, loff_t *);
int readdir (struct file *, void *, filldir_t);
unsigned int poll (struct file *, struct poll_table_struct *);
int ioctl (struct inode *, struct file *, unsigned int, unsigned long);
int mmap (struct file *, struct vm_area_struct *);
int open (struct inode *, struct file *);
int flush (struct file *);
int release (struct inode *, struct file *);
int fsync (struct file *, struct dentry *);
int fasync (int, struct file *, int);
int check_media_change (kdev_t dev);
int revalidate (kdev_t dev);
int lock (struct file *, int, struct file_lock *);
You should check the definition of struct file_operations in the file include/linux/fs.h which contains these definitions.

Sometimes the order of these methods is changed. When you declare your struct file_operations structure, you should ensure that you have placed your methods in the correct order. Alternatively, you can protect yourself against simple changes in the ordering by doing something like this:

static struct file_operations mydev_fops = {
	open:    mydev_open,
	release: mydev_close,
	read:    mydev_read,
	write:   mydev_write,
};
This works because the compiler we use is clever, and will put the methods in their correct places and will fill unspecified methods with NULL.
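
The effect is easy to check in user space. The sketch below uses a cut-down stand-in for the structure (struct model_fops and its simplified method signatures are hypothetical). It spells the initialisers in the ISO C99 form (.name = value) rather than the older GCC form (name: value) used above. Both forms place each method by name and leave the rest NULL.

```c
#include <assert.h>
#include <stddef.h>

/*  Cut-down stand-in for struct file_operations  */
struct model_fops {
    int (*llseek) (void);
    int (*read) (void);
    int (*write) (void);
    int (*open) (void);
    int (*release) (void);
};

static int mydev_open (void) { return 0; }
static int mydev_read (void) { return 0; }

/*  Listed in a different order from the declaration  */
static struct model_fops mydev_fops = {
    .open = mydev_open,
    .read = mydev_read,
};

/*  Usage: named members are set, unnamed members are NULL  */
static void example (void)
{
    assert (mydev_fops.open == mydev_open);
    assert (mydev_fops.write == NULL);
}
```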

Another thing to take note of is that Linux 2.2 introduces the pread() and pwrite() system calls. These allow a process to read and write from a specified position in a file. This is similar, but not identical, to using the lseek() system call followed by an ordinary read() or write() system call. In particular, concurrent access to a file (required for asynchronous I/O (AIO) support) requires the pread() and pwrite() system calls. To support these new system calls, a new parameter (the 4th or final parameter) is supplied to the read() and write() methods. This parameter is a pointer to an offset, which may be updated. As a device driver writer, you don't care about file positions, so you could ignore this parameter. For correctness, however, you should prevent the use of the new system calls for your driver, just as you don't support the llseek() method. You can do this by adding the following line at the top of your read() and write() methods:

	if (ppos != &file->f_pos) return -ESPIPE;
assuming that ppos is the variable name of the offset pointer and file is the variable name of the struct file pointer. This code depends on the fact that normal read() and write() system calls will pass the address of file->f_pos as the offset pointer, but the pread() and pwrite() system calls will pass the address of the variable passed in via the system call. Hence it is easy to distinguish between the two cases.

If you do care about file positions (say you have a driver like the MTRR driver which supports incremental reading), then you will need to use and update the value pointed to by ppos to keep track of where in the "file" the process is reading.
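
Both cases can be modelled in user space. In the sketch below, struct model_file, model_loff_t, stream_read() and mydev_read() are hypothetical stand-ins, and memcpy() takes the place of copy_to_user(). The first method rejects any offset pointer other than &file->f_pos. The second honours and advances *ppos.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

typedef long long model_loff_t;		/*  stand-in for loff_t  */
struct model_file { model_loff_t f_pos; };

static const char mydev_data[] = "0123456789";

/*  Driver with no notion of position: refuse pread()-style calls  */
static ssize_t stream_read (struct model_file *file, char *buf,
			    size_t len, model_loff_t *ppos)
{
    (void) buf;
    (void) len;
    if (ppos != &file->f_pos) return -ESPIPE;
    return 0;				/*  nothing to deliver in this model  */
}

/*  Driver supporting incremental reads: use and update *ppos  */
static ssize_t mydev_read (struct model_file *file, char *buf,
			   size_t len, model_loff_t *ppos)
{
    size_t size = sizeof mydev_data - 1;
    size_t pos;

    (void) file;
    if (*ppos < 0 || (size_t) *ppos >= size) return 0;	/*  end of data  */
    pos = (size_t) *ppos;
    if (len > size - pos) len = size - pos;
    memcpy (buf, mydev_data + pos, len);
    *ppos += (model_loff_t) len;
    return (ssize_t) len;
}

/*  Usage: a private offset is rejected, f_pos is accepted  */
static void example (void)
{
    struct model_file file = { 0 };
    model_loff_t other = 4;
    char buf[16];

    assert (stream_read (&file, buf, 4, &other) == -ESPIPE);
    assert (stream_read (&file, buf, 4, &file.f_pos) == 0);
}
```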

For you poor sods maintaining 3rd party filesystems, life is harder, as you have to spend time dealing with the dcache. Rather than talk about what's changed in the VFS interface, read this instead.

Handling Signals

A new signal_pending() function was added to make signal handling easier and more robust for POSIX RT (queued) signals. Where you used to do:
	if (current->signal & ~current->blocked)
you now do:
	if ( signal_pending (current) )
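
For reference, the old 2.0 test is plain bit arithmetic: a signal counts only if it is raised and not blocked. The user space sketch below models just that expression (struct model_task and model_signal_pending() are hypothetical). In 2.2 the kernel's own signal_pending() hides how the signal state is stored, which is the reason to call it rather than test the fields directly.

```c
#include <assert.h>

/*  Only the two 2.0-era fields used by the old test  */
struct model_task {
    unsigned long signal;	/*  raised signals, one bit each  */
    unsigned long blocked;	/*  signals the process has blocked  */
};

/*  The 2.0 expression, wrapped in a function  */
static int model_signal_pending (const struct model_task *p)
{
    return (p->signal & ~p->blocked) != 0;
}

/*  Usage: a raised but blocked signal does not count as pending  */
static void example (void)
{
    struct model_task task = { 0x2, 0x2 };

    assert (!model_signal_pending (&task));
    task.blocked = 0;
    assert (model_signal_pending (&task));
}
```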

I/O Space Mapping

The vremap() function, intended for mapping I/O memory on peripheral devices (e.g. PCI), was renamed to ioremap() to more accurately reflect its true purpose.

I/O Event Multiplexing

The select() and poll() system calls allow a process to multiplex events from multiple file descriptors. In kernel 2.0, drivers supported this with the select() method in the file_operations structure. In kernel 2.2, drivers must provide the poll() method instead, which provides greater flexibility.
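
To see what the driver method has to serve, it may help to look at the user space side. The sketch below is an ordinary POSIX program fragment, not kernel code: wait_readable() is a hypothetical helper that waits for one descriptor to become readable using the poll() system call. A pipe stands in for a device here.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*  Returns 1 if fd is readable within timeout_ms, 0 on timeout,
    -1 on error  */
static int wait_readable (int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll (&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    if (n == 0) return 0;
    return (pfd.revents & POLLIN) ? 1 : 0;
}

/*  Usage: an empty pipe is not readable, a pipe with data is  */
static void example (void)
{
    int fds[2];
    int ok = pipe (fds);

    assert (ok == 0);
    assert (wait_readable (fds[0], 0) == 0);
    ok = (write (fds[1], "x", 1) == 1);
    assert (ok);
    assert (wait_readable (fds[0], 1000) == 1);
    close (fds[0]);
    close (fds[1]);
}
```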

Discarding Initialisation Functions and Data

You are now able to discard functions and data which are no longer needed after kernel initialisation is completed. This means the RAM required to store those functions and data can be freed and used again. This only applies to drivers compiled into the kernel.

To mark a variable for later discarding:

static int mydata __initdata = 0;
To mark a function for later discarding:
__initfunc(void myfunc (void))
{
	/*  Initialisation code  */
}
The __initdata and __initfunc keywords place the code and data into a special "initialisation" section. Ideally, you will put as much code and data into the initialisation section as is possible. Of course, you have to make sure that said code or data is not referenced after initialisation (when the init process starts).

New PCI Support API

A new set of functions for PCI support has been added. The old functions are still available for compatibility, but they will eventually be removed. There is a very brief description in Documentation/pci.txt.

Setting timeouts

Some new timeout functions were added. Where you used to do:
	current->timeout = jiffies + timeout;
	schedule ();
you now do:
	timeout = schedule_timeout (timeout);
Similarly, if you needed to sleep on a wait queue, but needed a timeout, you would have done:
	current->timeout = jiffies + timeout;
	interruptible_sleep_on (&wait);
you now do:
	timeout = interruptible_sleep_on_timeout (&wait, timeout);
Note that these new functions return the amount of time remaining (in jiffies). A non-zero return value means the function returned before the timeout expired.
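
The return convention can be modelled in user space. In the sketch below, model_schedule_timeout() and wait_for_event() are hypothetical: the second argument simply states after how many jiffies the sleeper is woken, and the choice of -ETIMEDOUT for an expired timeout is an illustration only, not something the kernel mandates.

```c
#include <assert.h>
#include <errno.h>

/*  Model of the return value: time remaining, or 0 if the timer ran out  */
static long model_schedule_timeout (long timeout, long woken_after)
{
    if (woken_after >= timeout) return 0;
    return timeout - woken_after;
}

/*  Caller pattern: keep the remainder, treat 0 as "timed out"  */
static int wait_for_event (long timeout, long woken_after)
{
    timeout = model_schedule_timeout (timeout, woken_after);
    if (timeout == 0) return -ETIMEDOUT;
    return 0;
}

/*  Usage: woken after 30 of 100 jiffies leaves 70 remaining  */
static void example (void)
{
    assert (model_schedule_timeout (100, 30) == 70);
    assert (model_schedule_timeout (100, 100) == 0);
}
```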

Backwards-compatibility macros

Below is some code you can include in your driver to make it easier to maintain drivers which have to be compiled for both 2.2.x and 2.0.x. The macros allow a driver written for 2.2.x to compile with 2.0.x kernels. Contributions to this section are invited.
#include <linux/version.h>

#ifndef KERNEL_VERSION
#  define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#endif

#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,1,0))
#  include <linux/mm.h>

static inline unsigned long copy_to_user (void *to, const void *from,
					  unsigned long n)
{
    /*  verify_area() returns 0 if the range is valid  */
    if ( verify_area (VERIFY_WRITE, to, n) ) return n;
    memcpy_tofs (to, from, n);
    return 0;
}

static inline unsigned long copy_from_user (void *to, const void *from,
					    unsigned long n)
{
    if ( verify_area (VERIFY_READ, from, n) ) return n;
    memcpy_fromfs (to, from, n);
    return 0;
}

#  define __initdata
#  define __initfunc(func) func
#else
#  include <asm/uaccess.h>
#endif

#ifndef signal_pending
#  define signal_pending(p) ( (p)->signal & ~(p)->blocked )
#endif
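
The version arithmetic is easy to check by hand: each component occupies its own byte, so version codes compare in release order. The user space sketch below copies the macro under a different name (MODEL_KERNEL_VERSION) and adds a hypothetical helper, needs_2_0_compat(). The 2.1.0 cut-off used there is an assumption made for illustration.

```c
#include <assert.h>

#define MODEL_KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))

/*  Would a kernel with this version code need the 2.0.x fallbacks?
    (assumed cut-off: anything before 2.1.0)  */
static int needs_2_0_compat (long version_code)
{
    return version_code < MODEL_KERNEL_VERSION (2,1,0);
}

/*  Usage: 2.0.36 needs the fallbacks, 2.2.1 does not  */
static void example (void)
{
    assert (MODEL_KERNEL_VERSION (2,0,36) == 131108);
    assert (MODEL_KERNEL_VERSION (2,2,1) == 131585);
    assert (needs_2_0_compat (MODEL_KERNEL_VERSION (2,0,36)));
    assert (!needs_2_0_compat (MODEL_KERNEL_VERSION (2,2,1)));
}
```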

Original: 9-FEB-1999
Richard Gooch (rgooch at atnf dot csiro dot au)