What's This Do?

Programming etc.

Modifying the ext2 filesystem module

A recent programming task required me to make changes to the ext2 filesystem module of the Linux kernel. In the real world, such a task is uncommon for a couple of reasons: messing with such a core kernel module can be dangerous, and filesystem code can be obscure. Moreover, the ext2 filesystem is ubiquitous among Unix-like operating systems, which makes a genuine case for modifying it a rare find.

A recurring subtask when altering the ext2 module was modifying existing data blocks or writing new data to them. If you’re just diving into kernel programming, modifying a mature kernel module will seem a daunting challenge. The code relies heavily on generic kernel calls whose behaviour can sometimes be difficult to deduce[1]. There’s only so much trawling through The Linux Cross Reference that a programmer can take.

It’s difficult to find information on the net that explains how to implement particular behaviour in the module. Fortunately, the ext2 module is very well written (surprise, surprise!) and most functionality can be implemented with a few calls to preexisting functions and some good old-fashioned C. What follows is an excessively commented version of a function that appends data to a regular file, allocating data blocks as it goes. I chose this function because it demonstrates a number of common operations that you might need during your own adventures in Ext2 Land.

/*
 * Appends data to filp, assigning any needed blocks.
 * Returns the number of bytes written,
 * or a negative error code.
 */
static ssize_t write_blocks(struct file *filp, const char *buf,
							size_t len, loff_t *ppos)
{
	/* These are for getting data blocks out.
		temp_bh retrieves a logical data block or
		assigns a new one. bh is for getting
		the block which we can manipulate. */
	struct buffer_head temp_bh, *bh;

	/* Get the inode of the regular file */
	struct inode *inode = filp->f_dentry->d_inode;
	/* Get the superblock for the file system */
	struct super_block *sb = inode->i_sb;

	/* Some tracking values for write operations */
	int err = 0;
	size_t to_write;
	size_t offset;
	ssize_t written = 0;
	long blk, off_blk;
	unsigned blocksize = sb->s_blocksize;

	/* We need to lock the inode before reading i_size,
	   to prevent changes from other filesystem ops */
	mutex_lock_nested(&inode->i_mutex, I_MUTEX_QUOTA);

	/* Get the logical block number at the offset to
		start writing to: round i_size up to whole blocks,
		then step back one block if the last block is only
		partly full, so we carry on writing into it */
	off_blk = (inode->i_size + blocksize - 1)
						>> EXT2_BLOCK_SIZE_BITS(sb);
	if (off_blk * blocksize > inode->i_size)
		off_blk--;

	/* At this point, we have the logical block number
		(off_blk) to start writing. */

	/* Here's where we loop through the blocks and write
		to them. It's important to note here that the
		file may have zero blocks, or not enough blocks to
		fit all the data we want to write. Not to worry,
		it will all get allocated in this loop */
	for (blk = off_blk; written < len; blk++) {
		/* get the offset within the block's data to start
			writing at, and the number of bytes to_write */
		offset = inode->i_size + written - (blk * blocksize);
		to_write = blocksize - offset;

		if (len - written < to_write)
			to_write = len - written;

		/* Get the block we want to write to. In the following
			call, blk is the logical block number we want to
			get, i.e., the index of the block within the file.
			The temp_bh argument will be filled in, but we're
			only interested in one of its members (b_blocknr).
			The final argument is a boolean which, when true,
			tells the function to allocate the block if it
			doesn't exist yet. */
		temp_bh.b_size = blocksize;
		temp_bh.b_state = 0;
		err = ext2_get_block(inode, blk, &temp_bh, 1);
		if (err < 0)
			goto out; /* yes, goto is used a lot in the kernel */

		/* Get the actual block that we can modify. The
			temp_bh variable above contains the real block
			number that we want. sb_getblk() doesn't read the
			block from disk, so if we only overwrite part of
			it we must use sb_bread() to keep the rest intact */
		if (offset || to_write != blocksize)
			bh = sb_bread(sb, temp_bh.b_blocknr);
		else
			bh = sb_getblk(sb, temp_bh.b_blocknr);
		if (!bh) {
			err = -EIO;
			goto out;
		}

		/* We need to lock the buffer, or nasty things will happen. */
		lock_buffer(bh);

		/* Write to the buffer's data, just like in user-land.
			Note: if buf comes from user space, you want to use
			copy_from_user() instead. */
		memcpy(bh->b_data + offset, buf + written, to_write);
		written += to_write;

		/* Flush the CPU data cache for the page (needed on
			some architectures), mark the buffer's contents as
			valid, then mark the buffer and its associated page
			as needing writing to disk. */
		flush_dcache_page(bh->b_page);
		set_buffer_uptodate(bh);
		mark_buffer_dirty(bh);

		/* We can now unlock the buffer */
		unlock_buffer(bh);

		/* This will sync the buffer back to disk. We must sync
			after unlocking, because sync_dirty_buffer() takes
			the buffer lock itself; otherwise we'd deadlock. */
		sync_dirty_buffer(bh);

		/* We're done with this buffer */
		brelse(bh);
	}

out:
	if (written == 0) {
		/* an error occurred before anything was written */
		mutex_unlock(&inode->i_mutex);
		return err;
	}

	/* Some generic cleanup code. Increase the write offset
	   and the file size, set a new inode version and
	   modification/change times, then mark the inode dirty
	   so it gets written back to disk. */
	*ppos += written;
	i_size_write(inode, inode->i_size + written);
	inode->i_version++;
	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
	mark_inode_dirty(inode);
	mutex_unlock(&inode->i_mutex);

	return written;
}

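write_blocks() can be adapted to read data from a file into an arbitrary buffer with a few small changes: start at logical block 0 (or at whichever block holds the starting offset), don't ask ext2_get_block() to allocate blocks, read each block with sb_bread() so its current contents are loaded, and copy the data in bh->b_data to buf.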

Related Stack Overflow Question.

For more information on the ext2 filesystem, see this entry in The Linux Kernel.

[1] See this great comment in inode.c:

clear_buffer_new(bh_result); /* What's this do? */
