A recent programming task required me to make changes to the ext2 filesystem module of the Linux kernel. In the real world, such a task is not very common, for a couple of reasons: messing with such a core kernel module can be dangerous, and filesystem code can be obscure. Moreover, the ext2 filesystem is ubiquitous among Unix-like operating systems, which makes a genuine case for modifying it a rare experience.
A recurring subtask when altering the ext2 module was modifying existing data in data blocks or writing new data to them. If you’re just diving into kernel programming, modifying a mature kernel module will seem a daunting challenge. The code relies heavily on generic kernel calls, the behaviour of which can sometimes be difficult to deduce[1]. There’s only so much trawling through The Linux Cross Reference that a programmer can take.
It’s difficult to find information on the net that explains how to implement particular behaviour in the module. Fortunately, the ext2 module is very well written (surprise surprise!) and most functionality can be implemented with a few calls to preexisting functions and some good old-fashioned C. What follows is an excessively commented version of a function that appends data to a regular file, assigning data blocks as it goes. The function was chosen because it demonstrates a number of common operations that you might find yourself in need of during your own adventures in Ext2 Land.
/*
 * Appends data to filp, assigning any needed blocks.
 * On success, advances *ppos by the number of bytes written.
 * Returns the number of bytes written,
 * or a negative error code.
 */
static ssize_t write_blocks(struct file *filp, const char *buf,
                            size_t len, loff_t *ppos)
{
    /* These are for getting data blocks out.
       temp_bh retrieves a logical data block or
       assigns a new one. bh is for getting
       the block which we can manipulate. */
    struct buffer_head temp_bh, *bh;
    /* Get the inode of the regular file */
    struct inode *inode = filp->f_dentry->d_inode;
    /* Get the superblock for the file system */
    struct super_block *sb = inode->i_sb;
    /* Some tracking values for write operations */
    int err = 0;
    size_t to_write;
    size_t offset;
    ssize_t written = 0;
    long blk;
    unsigned blocksize = sb->s_blocksize;
    long off_blk;

    /* We need to lock the inode here to prevent changes
       from other filesystem ops. The lock is taken before
       i_size is read, so the size cannot change between
       computing the start block and writing to it. */
    mutex_lock_nested(&inode->i_mutex, I_MUTEX_QUOTA);

    /* Get the logical block number at the offset
       to start writing to: round the file size up to a
       whole number of blocks, then step back one block if
       that overshoots the end of the file. */
    off_blk = (inode->i_size + blocksize - 1)
                >> EXT2_BLOCK_SIZE_BITS(sb);
    if (off_blk * blocksize > inode->i_size)
        off_blk--;
    /* At this point, we have the logical block number
       (off_blk) to start writing. */

    /* Here's where we loop through the blocks and write
       to them. It's important to note here that the
       file may have zero blocks, or not enough blocks to
       fit all the data we want to write. Not to worry,
       it will all get allocated in this loop */
    for (blk = off_blk; written < len; blk++) {
        /* Get the offset within the block's data to start
           writing at, and the number of bytes to_write */
        offset = inode->i_size + written - (blk * blocksize);
        to_write = blocksize - offset;
        if (len - written < to_write)
            to_write = len - written;

        /* Get the block we want to write to. In the following
           call, blk is the logical block number we want to
           get, i.e. the index of the block within the file.
           The temp_bh argument will be filled in, but we're
           only interested in one of its members. The final
           argument is a boolean which, when true, tells the
           function to allocate more blocks if there aren't
           enough. */
        temp_bh.b_size = blocksize;
        temp_bh.b_state = 0;
        err = ext2_get_block(inode, blk, &temp_bh, 1);
        if (err < 0)
            goto out; /* yes, goto is used a lot in the kernel */

        /* Get the actual block that we can modify. The
           temp_bh variable above contains the real block
           number that we want. Note: sb_getblk() does not
           read the block from disk; if the existing contents
           of a partially filled block matter and may not be
           cached, sb_bread() is the call that reads them. */
        bh = sb_getblk(sb, temp_bh.b_blocknr);
        if (!bh) {
            err = -EIO;
            goto out;
        }

        /* We need to lock the buffer, or nasty things will happen. */
        lock_buffer(bh);

        /* Write to the buffer's data, just like in user-land.
           Note: If buf comes from user space, you want to use
           copy_from_user() instead. */
        memcpy(bh->b_data + offset, buf + written, to_write);
        written += to_write;

        /* The following calls mark the buffer and its
           associated page as needing writing to disk. */
        flush_dcache_page(bh->b_page);
        set_buffer_uptodate(bh);
        mark_buffer_dirty(bh);

        /* We can now unlock the buffer */
        unlock_buffer(bh);

        /* This will sync the buffer back to disk. We must sync
           after unlocking or the module will deadlock. */
        sync_dirty_buffer(bh);

        /* We're done with this buffer */
        brelse(bh);
    }

out:
    if (written != len) {
        /* An error occurred */
        mutex_unlock(&inode->i_mutex);
        return err;
    }

    /* Some generic cleanup code. Increase the write offset,
       and the file size. Set a new inode version and write
       time. Then mark the inode dirty so it gets synced
       back to disk. */
    *ppos += written;
    inode->i_size += written;
    inode->i_version++;
    inode->i_mtime = inode->i_ctime = CURRENT_TIME;
    mark_inode_dirty(inode);
    mutex_unlock(&inode->i_mutex);
    return written;
}
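The block arithmetic in the loop is the easiest part to get subtly wrong, and it can be checked outside the kernel. The following is a minimal user-space sketch, not kernel code: plan_append() and struct chunk are hypothetical names invented for this illustration. It assumes the same rules as write_blocks(), namely that writing starts in the block containing the current end of the file and that each iteration writes up to the end of one block.

```c
#include <assert.h>
#include <stddef.h>

/* One piece of an append: which logical block it lands in, where in
   that block it starts, and how many bytes it covers. */
struct chunk {
    long blk;
    size_t offset;
    size_t to_write;
};

/* Hypothetical user-space helper (not a kernel function). Splits an
   append of len bytes to a file of file_size bytes into per-block
   chunks, mirroring the loop in write_blocks(). Fills at most max
   entries of out and returns the number of chunks needed. */
static size_t plan_append(size_t file_size, size_t len,
                          size_t blocksize, struct chunk *out, size_t max)
{
    size_t written = 0, n = 0;
    long blk;

    assert(blocksize > 0);

    /* First block to touch: the one containing byte file_size. */
    blk = (long)(file_size / blocksize);

    for (; written < len; blk++) {
        /* Position inside this block, and room left in it. */
        size_t offset = file_size + written - (size_t)blk * blocksize;
        size_t to_write = blocksize - offset;

        if (len - written < to_write)
            to_write = len - written;
        if (n < max) {
            out[n].blk = blk;
            out[n].offset = offset;
            out[n].to_write = to_write;
        }
        n++;
        written += to_write;
    }
    return n;
}
```

For example, appending 1000 bytes to a 1500-byte file with 1024-byte blocks yields two chunks: 548 bytes at offset 476 of block 1, then 452 bytes at offset 0 of block 2.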
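This same function can be used to read data from a file into an arbitrary buffer with a few small changes. Just start at logical block 0 (or whatever offset), and copy the data in bh->b_data to buf. When reading, pass 0 as the last argument to ext2_get_block() so that no new blocks are allocated, and fetch the block with sb_bread() rather than sb_getblk() so that its contents are actually read from disk.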
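To make the shape of the read loop concrete, here is a small user-space model of it. This is only a sketch under loud assumptions: struct toy_file, read_blocks_model() and the 8-byte BLOCKSIZE are all made up for illustration, and an in-memory array stands in for the buffers that the kernel would hand back as bh->b_data. The per-block offset and length arithmetic is the same as in the write loop; the only new step is clamping the request to the file size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCKSIZE 8 /* tiny block size, chosen only for illustration */

/* Hypothetical stand-in for a file: blocks[i] plays the role of
   bh->b_data for logical block i. */
struct toy_file {
    const char (*blocks)[BLOCKSIZE];
    size_t size; /* plays the role of inode->i_size */
};

/* Copies up to len bytes starting at byte pos into buf. Returns the
   number of bytes copied, which is short if the read hits the end
   of the file. */
static size_t read_blocks_model(const struct toy_file *f, char *buf,
                                size_t len, size_t pos)
{
    size_t done = 0;
    size_t blk;

    assert(f != NULL && buf != NULL);

    /* Never read past the end of the file. */
    if (pos >= f->size)
        return 0;
    if (len > f->size - pos)
        len = f->size - pos;

    for (blk = pos / BLOCKSIZE; done < len; blk++) {
        /* Position inside this block, and bytes available in it. */
        size_t offset = pos + done - blk * BLOCKSIZE;
        size_t to_read = BLOCKSIZE - offset;

        if (len - done < to_read)
            to_read = len - done;
        memcpy(buf + done, f->blocks[blk] + offset, to_read);
        done += to_read;
    }
    return done;
}
```

In the kernel version, the memcpy() source would be the data of the buffer returned for each block, and a real implementation would also have to cope with holes, where a logical block has no disk block mapped to it.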
Related Stack Overflow Question.
For more information on the ext2 filesystem, see this entry in The Linux Kernel.
[1] See this great comment in inode.c:
clear_buffer_new(bh_result); /* What's this do? */