CS270 -- Rich's Hints for DIY File System Project Phase 2

Rich Wolski --- Fall, 2023

this page last updated: Thu Sep 7 13:21:00 PDT 2023

Roadmap

In Phase 2 of the project, the goal is to create a working file system I can test exactly as if it were a file system that is already supported by Linux. That is, I should be able to run a test on a "normal" Linux file system (say an ext4 file system) and the same test on a FUSE file system that is using your FUSE daemon, and the program's function should be the same.

This goal is, indeed, achievable. However, the full Linux file system interface is quite extensive. In addition to the POSIX interface, there is a veritable zoo of features that the file system calls implement. We will only be testing a subset of these features. Further, you are free to design your file system in any way you choose. If this is your first up-close encounter with a file system, however, or if you are having trouble understanding how the pieces all fit together, this document will provide one possible roadmap for the project. It represents, more or less, how I implemented it. You need not consider a prescription. Rather, if you don't have a strong feeling about how to proceed, you might consult this text as I know a design and implementation that follows it will result in a working file system.

Requirements and Style

The first thing to understand is that the project is not to create a production-quality implementation. Your file system must be production quality in terms of its robustness (it must not lose data or crash) but the other aspects of a file system (portability, extensibility, etc.) that we'd typically want in the implementation need not be there.

For example, it is fine to design your data structures so that there is only one file system of your type mounted at a time. If you were building this file system for a real OS, you'd need to handle having multiple file systems mounted simultaneously. Feel free to design for the more general case, but it is not necessary.

The other way to look at the requirements for this project is to ask "what must my file system do?" At the end of the quarter, I will ask you to add my ssh public key to your instance and for you to start up your file system using a single mount point. As the root user, I will install several test routines by copying them into your file system through this mount point and I will run the routines. They will both stress test your implementation and record some performance stats.

I will also ask you to demo any cool features or features of which your are particularly proud.

And that's it. The goals (in order of importance are) first to enjoy the process, second, not to have your file system crash or corrupt the storage, and third to make your file system performant.

By way of style, it has been my experience that building this type of system is best accomplished using two basic principles.

Build incrementally and generate tests as you go. Don't move on until all of your tests pass for a particular stage. Don't move on until you have understood and fixed each bug.
Understand that an OS is fundamentally implementing four operations
- discover: find the thing I'm looking for by its name or ID
- allocate: from a pool of available resources allocate one unambiguously
- map: create a data structure that maps one abstraction to one or more resources
- deallocate: unambiguously return a resource to its correct pool when it is no longer in use

If you keep these two basic tenets in mind, I think the project is more straightforward to comprehend.

Design

For this phase, I designed the file system as four layers which I'll go through from the bottom up:

Layer 0: The Disk Layer -- This layer modularizes access to the physical storage medium
layer 1: The Data Structure Layer: This layer builds pools of blocks and inodes on the disk and defines ways to allocate and deallocate each.
layer 2: The Abstraction Layer: This layer implements the Linux file abstractions.
layer 3: The Interface Layer: This layer implements an interface between FUSE and the file abstraction layer.

Software layering is a design principle that can be taken to an extreme. In this case, it can be used fairly faithfully so that each layer makes call only to the layer below it.

Layer 0

The first step is to build an interface to the disk. For debugging purposes, build an in-memory interface that stores and retrieves data from an in-memory buffer, but does so based on logical block number. By doing this in memory it is possible to use the debugger to "see" what is on the disk which makes debugging easier

Then, once you have your file system working with memory buffers, you need only rewrite this layer to use the raw disk rather than an in-memory buffer in block-sized units. Thus, henceforth I will refer to "on disk" as operations that go through layer 0 (which will eventually read and write a disk).

Your test routines should verify that you can access all of the blocks on disk individually and that there is no corruption (e.g. due to a miscalculation resulting in overlap) in the blocks.

Layer 1

There are three kinds of functions to implement at layer 1:

mkfs: creates the superblock, free inode list, and free block list on disk
inode allocate, free, read, and write: routines to get, free and access inodes on disk
buffer allocate, free, read, and write: routines to get, free and access blocks on disk

Your test routines should verify that the free lists look correct (uncorrupted) after bock and inode allocate and free calls. Since multiple inodes will fit into a block, it should also verify that inode reads and writes work correctly.

Layer 2

This layer implements files and directories. For this project, at some level, you'll need to implement the following Linux system calls

mkdir: makes a directory
mknod: makes a file
readdir: reads a directory
unlink: removes a file or directory
open/close: opens/closes a file
read/write: reads/writes a file

You will want to study the man pages on these calls to understand their specific semantics. For example, a write past the end of a file, simply extends the file in Linux (it does not generate an EOF error). You may also wish to implement additional calls like truncate, chown, and chmod, depending on how realistic you'd like your file system to be.

You test codes for Layer 2 should be able to make directories and files. They should follow the correct creation semantics (e.g. a mknod fails if it specifies a path that contains non-existent directories). You should test file reads/writes that use direct blocks in your inodes, indirect blocks, and double indirect blocks. You should also make sure that files get deleted properly and that the free lists look reasonable as blocks and inodes are allocated and released.

Layer 3

The final layer connects Layer 2 to the FUSE interface. You are free to use the code you developed in Phase 1 as a starting point or to start from scratch. Note that FUSE has several different interface facilities. In particular, it passes the path to to each object in each call starting at the root of the mounted file system. Layer 3 can always call a function to convert a path to an inode (this routine is called

namei

is some Unix implementations) in each call. It is also possible to get FUSE to pass back a file info data structure in which you can store your own information (e.g. the inode number) for subsequent calls. You are free to use this facility if you so choose. Using namei each time means that each call will get the true conversion to an inode but it will be slower than it needs to be. You might start with the namei approach and then see if using FUSE to pass back the inode number when it can improves performance.

Also, the debugger is most helpful for development at this layer. There isn't much documentation that explains exactly what comes across the FUSE interface in gory detail. It is instructive to write stubs at layer 3 and to set breakpoints (using the debugger) in the stubs just to see what FUSE was passing into my code.

Testing at this stage involves mounting a small file system and using Linux to test it out. Consider writing test routines that use ascii text since it is easy to use the shell with such tests, and it is also easy to spot corrupted files. While the file system is small (it must be able to fit in memory) all of the "standard" file operations should work when your tests are complete.

At this stage, you should have a working file system that uses FUSE and an in-memory buffer as the disk store. You can pretty much get all of the system calls to work. The only restriction is that the sizes will need to be pretty restrictive. Considering using a small block size and small constants to test everything and then moving to implement stress tests. The larger sizes possible with a real disk may expose some sizing bugs.

Implementing Persistence

Rewrite your Layer 0 to use the Linux file commands on a raw block device in

/dev

. Launch an instance in Eucalyptus, create a volume, and attach it to the instance. The new device can be accessed like a file through the

/dev

entry.

For example, if the attached volume is

/dev/vdb

then

open /dev/vdb
lseek to 4096
read 1024 bytes

will open the raw disk device, move the file pointer to byte 4096, and read 1024 bytes from the raw disk.

Rewrite Layer 0 and rerun your tests with a file system that is at least 2 GB. Then try formatting and mounting a 30GB file system and test.

Testing and more Testing

One point of confusion that can arise with respect to Phase 2 is to know when you are "done" with the project. The accurate answer is "when the clock runs out and the assignment is due" since no file system (including the current Linux-supported file systems) are ever "done."

However, I will be grading your Phase 2 so you might legitimately ask "what do I need to do to get full credit?" The answer to that question is that you need to make sure you implement and test more Linux functionality than I will be able to test during your final presentation period. You won't know what I will test (although you do know it won't take more than about 5 minutes) so you need to make sure that your implementation is as complete as possible. What this means is that you should be writing tests through out your development and then, when you think it is finished, you should write more tests, each of which is designed to exercise some feature of the file system. In short, you need to test your system more exhaustively than I will test it.

The reason that the project is evaluated this way is because operating system developers can never anticipate (or even see, it almost all cases) what users are doing to "test" the functionality of the OS. Thus, OS development teams must doing as extensive testing as they can manage ahead of a release date so that they can expose and fix as many bugs as they can before users encounter them.

In this class, though, it is not reasonable to ask you to implement the full Linux file-system interface. You might ask "what parts of the interface are fair game?"

Here are a few hints regarding features that I will not test and also features that I might test.

I will not test the asynchronous file interfaces.
I will not test memory-mapped files.
I will not test file locking.
I will not test file system encryption.
I will not test remote file access via NFS.
I will not test named pipes.

My tests will use the Linux POSIX interface and I will expect your file system to generate the same results when a test runs on a Linux file system and on your file system. Thus you should consider writing as many (and as varied) tests using the POSIX interface and comparing the tests when run on a native Linux file system to one run on yours.

I will also test your file system using "standard" Linux system utilities and tools. Examples include, but are not limited to, the various language compilers (gcc, g++, gfortran, etc.), git, make, bash, grep, awk, sed, ls, and find. You should consider writing tests that use these utilities to access files on your file system as well. At this point you should also write stress tests that do lots of operations with different sizes and offsets to make sure that your file system doesn't have a latent bug or two.