CS270 -- Rich's Hints for DIY File System Project

Rich Wolski --- Fall, 2014

this page last updated: Wed Sep 23 15:34:22 PDT 2015

Roadmap

Requirements and Style

Implementation

For example, it is fine to design your data structures so that there is only one file system of your type mounted at a time. If you were building this file system for a real OS, you'd need to handle having multiple file systems mounted simultaneously. Feel free to design for the more general case, but it is not necessary.

The other way to look at the requirements for this project is to ask "what must my file system do?" At the end of the quarter, I will ask you to add my ssh public key to your instance and for you to start up your file system using a single mount point. As root, I will install several test routines by copying them into your file system through this mount point and I will run the routines. They will both stress test your implementation and record some performance stats.

I will also ask you to demo any cool features, or features of which you are particularly proud.

And that's it. The goals, in order of importance, are first to enjoy the process, second not to have your file system crash or corrupt the storage, and third to make your file system performant.

By way of style, it has been my experience that building this type of system is best accomplished using two basic principles.

Build incrementally and generate tests as you go. Don't move on until all of your tests pass for a particular stage. Don't move on until you have understood and fixed each bug.
Understand that an OS is fundamentally implementing four operations:
- discover: find the thing I'm looking for by its name or ID
- allocate: from a pool of available resources allocate one unambiguously
- map: create a data structure that maps one abstraction to one or more resources
- deallocate: unambiguously return a resource to its correct pool when it is no longer in use

Design

Layer 0: The Disk Layer -- This layer modularizes access to the physical storage medium
Layer 1: The Data Structure Layer -- This layer builds pools of blocks and inodes on the disk and defines ways to allocate and deallocate each.
Layer 2: The Abstraction Layer -- This layer implements the Linux file abstractions.
Layer 3: The Interface Layer -- This layer implements an interface between FUSE and the file abstraction layer.

Phase 1

Layer 0

In Phase 2, the idea is to rewrite this layer to use the raw disk rather than an in-memory buffer as storage. It is important that the interface to this layer be a block interface (i.e., data is read or written only in full block units). Thus, henceforth I will refer to "on disk" as being through Layer 0, which will eventually read and write a disk, although until Phase 2 it will really be to a memory buffer, a block at a time.

Your test routines should verify that you can access all of the blocks on disk individually and that there is no corruption (e.g. due to a miscalculation resulting in overlap) in the blocks.
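As a rough illustration of such a block interface and its test, here is a minimal sketch. The names (disk_open, disk_read_block, disk_write_block, disk_self_test) and the 4096-byte block size are assumptions made for this example, not part of the project specification. The test stamps every block with a pattern derived from its block number and then rereads all of them, so an offset miscalculation that makes blocks overlap shows up as a mismatch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096u          /* assumed block size; use your own */

/* Hypothetical Layer 0 state: a single in-memory "disk". */
static uint8_t *disk_mem = NULL;
static uint32_t disk_nblocks = 0;

/* Allocate the backing buffer. Returns 0 on success, -1 on failure. */
int disk_open(uint32_t nblocks)
{
    free(disk_mem);
    disk_mem = calloc(nblocks, BLOCK_SIZE);
    disk_nblocks = (disk_mem != NULL) ? nblocks : 0;
    return (disk_mem != NULL) ? 0 : -1;
}

void disk_close(void)
{
    free(disk_mem);
    disk_mem = NULL;
    disk_nblocks = 0;
}

/* Copy exactly one full block out of the disk. */
int disk_read_block(uint32_t bno, void *buf)
{
    assert(buf != NULL);
    if (disk_mem == NULL || bno >= disk_nblocks)
        return -1;
    memcpy(buf, disk_mem + (size_t)bno * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

/* Copy exactly one full block into the disk. */
int disk_write_block(uint32_t bno, const void *buf)
{
    assert(buf != NULL);
    if (disk_mem == NULL || bno >= disk_nblocks)
        return -1;
    memcpy(disk_mem + (size_t)bno * BLOCK_SIZE, buf, BLOCK_SIZE);
    return 0;
}

/* Write a distinct pattern to every block, then verify each one.
 * Returns 0 if every block reads back intact, -1 otherwise. */
int disk_self_test(uint32_t nblocks)
{
    uint8_t buf[BLOCK_SIZE];
    uint32_t b, tag;
    size_t i;

    if (disk_open(nblocks) != 0)
        return -1;
    for (b = 0; b < nblocks; b++) {
        memset(buf, (int)(b & 0xffu), BLOCK_SIZE);
        memcpy(buf, &b, sizeof b);      /* block number in first bytes */
        if (disk_write_block(b, buf) != 0)
            return -1;
    }
    for (b = 0; b < nblocks; b++) {
        if (disk_read_block(b, buf) != 0)
            return -1;
        memcpy(&tag, buf, sizeof tag);
        if (tag != b)
            return -1;
        for (i = sizeof tag; i < BLOCK_SIZE; i++)
            if (buf[i] != (uint8_t)(b & 0xffu))
                return -1;
    }
    if (disk_read_block(nblocks, buf) == 0)  /* out of range must fail */
        return -1;
    disk_close();
    return 0;
}
```

Because callers only ever see whole blocks, swapping the memcpy calls for reads and writes on a real device later should not change anything above this layer.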

Layer 1

make-fs: creates the superblock, free inode list, and free block list on disk
inode allocate, free, read, and write: routines to get, free and access inodes on disk
buffer allocate, free, read, and write: routines to get, free and access blocks on disk
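One possible shape for the free block list is sketched below: the superblock holds the number of the first free block, and each free block stores the number of the next one. This is only one design among several (a bitmap would also work), and every name here (mkfs_free_list, block_alloc, block_free, free_block_count) as well as the tiny 16-block disk array standing in for Layer 0 is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE 512u
#define NBLOCKS  16u
#define NO_BLOCK 0u   /* block 0 is the superblock, so 0 can mean "none" */

static uint8_t disk[NBLOCKS][BLK_SIZE];   /* stand-in for Layer 0 */

struct superblock {
    uint32_t free_head;    /* first block on the free list */
    uint32_t free_count;   /* number of blocks on the free list */
};

static void sb_read(struct superblock *sb)
{
    memcpy(sb, disk[0], sizeof *sb);
}

static void sb_write(const struct superblock *sb)
{
    memcpy(disk[0], sb, sizeof *sb);
}

/* Build the chain 1 -> 2 -> ... -> NBLOCKS-1 -> NO_BLOCK. */
void mkfs_free_list(void)
{
    struct superblock sb;
    uint32_t b, next;

    memset(disk, 0, sizeof disk);
    for (b = 1; b < NBLOCKS; b++) {
        next = (b + 1 < NBLOCKS) ? b + 1 : NO_BLOCK;
        memcpy(disk[b], &next, sizeof next);
    }
    sb.free_head = 1;
    sb.free_count = NBLOCKS - 1;
    sb_write(&sb);
}

/* Pop the head of the free list. Returns NO_BLOCK when the disk is full. */
uint32_t block_alloc(void)
{
    struct superblock sb;
    uint32_t b, next;

    sb_read(&sb);
    if (sb.free_head == NO_BLOCK)
        return NO_BLOCK;
    b = sb.free_head;
    memcpy(&next, disk[b], sizeof next);
    sb.free_head = next;
    sb.free_count--;
    sb_write(&sb);
    memset(disk[b], 0, BLK_SIZE);   /* hand out a zeroed block */
    return b;
}

/* Push a block back on the free list. Does not detect double frees. */
int block_free(uint32_t b)
{
    struct superblock sb;

    if (b == NO_BLOCK || b >= NBLOCKS)
        return -1;
    sb_read(&sb);
    memcpy(disk[b], &sb.free_head, sizeof sb.free_head);
    sb.free_head = b;
    sb.free_count++;
    sb_write(&sb);
    return 0;
}

uint32_t free_block_count(void)
{
    struct superblock sb;

    sb_read(&sb);
    assert(sb.free_count < NBLOCKS);
    return sb.free_count;
}
```

Keeping a count in the superblock makes it cheap for a test to check that the free list "looks reasonable" after a sequence of allocations and frees.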

Layer 2

mkdir: makes a directory
mknod: makes a file
readdir: reads a directory
unlink: removes a file or directory
open/close: opens/closes a file
read/write: reads/writes a file

Your test code for Layer 2 should be able to make directories and files. It should check for the correct creation semantics (e.g., a mknod fails if it specifies a path that contains non-existent directories). You should test file reads/writes that use direct blocks in your inodes, indirect blocks, and double indirect blocks. You should also make sure that files get deleted properly and that the free lists look reasonable as blocks and inodes are allocated and released.
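To aim tests at the direct, indirect, and double indirect cases, it helps to know exactly which byte offsets cross each boundary. The sketch below computes that for an assumed inode layout of 10 direct pointers, one single indirect pointer, and one double indirect pointer, with 4096-byte blocks and 4-byte block numbers. Those numbers and the names (map_offset, struct blk_path) are illustrative assumptions; substitute the values from your own inode design.

```c
#include <stdint.h>

#define BLOCK_SIZE     4096u                 /* assumed block size */
#define N_DIRECT       10u                   /* assumed direct pointers per inode */
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4u)     /* 4-byte block numbers: 1024 */

enum blk_level { LVL_DIRECT, LVL_SINGLE, LVL_DOUBLE, LVL_TOO_BIG };

struct blk_path {
    enum blk_level level;
    uint32_t idx0;   /* direct slot, or index in the first indirect block */
    uint32_t idx1;   /* index in the second-level block (double only) */
};

/* Translate a byte offset within a file into the pointer path that
 * locates the data block holding that byte. */
struct blk_path map_offset(uint64_t offset)
{
    struct blk_path p = { LVL_TOO_BIG, 0, 0 };
    uint64_t lbn = offset / BLOCK_SIZE;      /* logical block number */

    if (lbn < N_DIRECT) {
        p.level = LVL_DIRECT;
        p.idx0 = (uint32_t)lbn;
        return p;
    }
    lbn -= N_DIRECT;
    if (lbn < PTRS_PER_BLOCK) {
        p.level = LVL_SINGLE;
        p.idx0 = (uint32_t)lbn;
        return p;
    }
    lbn -= PTRS_PER_BLOCK;
    if (lbn < (uint64_t)PTRS_PER_BLOCK * PTRS_PER_BLOCK) {
        p.level = LVL_DOUBLE;
        p.idx0 = (uint32_t)(lbn / PTRS_PER_BLOCK);
        p.idx1 = (uint32_t)(lbn % PTRS_PER_BLOCK);
        return p;
    }
    return p;   /* beyond what this inode layout can address */
}
```

With these assumed parameters, the first byte that needs the single indirect block is at offset 10 * 4096, and the first byte that needs the double indirect block is at offset (10 + 1024) * 4096. Reads and writes that straddle those offsets are good test cases.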

Layer 3

FUSE hello world

namei

Also, the debugger is most helpful for development at this layer. There isn't much documentation that explains exactly what comes across the FUSE interface in gory detail. It is instructive to write stubs at Layer 3 and to set breakpoints (using the debugger) in the stubs just to see what FUSE is passing into your code.

Testing at this stage involves mounting a small file system and using Linux to test it out. Consider writing test routines that use ASCII text, since it is easy to use the shell with such tests and it is also easy to spot corrupted files. While the file system is small (it must be able to fit in memory), all of the "standard" file operations should work when your tests are complete.

Phase 1 Complete

Phase 2

/dev

For example, if the attached volume is /dev/vdb, reading 1024 bytes starting at byte offset 4096 takes three steps:

open /dev/vdb
lseek to 4096
read 1024 bytes
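In C, the open, lseek, and read sequence might look like the sketch below. The helper name read_at is hypothetical, and the path is a parameter so the same code can be tried against an ordinary file before pointing it at a device such as /dev/vdb (which normally requires root). The loop handles the case where read returns fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes starting at byte offset `offset` of `path`.
 * Returns the number of bytes read (short only at end of device/file),
 * or -1 on error. */
ssize_t read_at(const char *path, off_t offset, void *buf, size_t len)
{
    size_t done = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    while (done < len) {
        n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                   /* end of device or file */
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

A call such as read_at("/dev/vdb", 4096, buf, 1024) then corresponds to the three steps listed above. A real Layer 0 would more likely open the device once and keep the descriptor, rather than reopening it for every block.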

Rewrite Layer 0 and rerun your tests with a file system that is at least 2 GB.

At this point you should also write stress tests that do lots of operations with different sizes and offsets to make sure that your file system doesn't have a latent bug or two.
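One way to structure such a stress test is to mirror every write into an in-memory "shadow" copy of the file and compare every read against it. The sketch below does this with random sizes and offsets on a single file. The function name stress_file, the 64 KB file size, and the 5000-byte maximum I/O size are assumptions chosen for the example; the path would be a file under your mount point. A fixed seed makes a failing run reproducible.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FILE_MAX (64u * 1024u)   /* assumed test file size */
#define IO_MAX   5000u           /* deliberately not a multiple of a block */

/* Perform `iterations` random reads and writes on `path`, checking every
 * read against an in-memory shadow copy. Returns 0 if all reads matched. */
int stress_file(const char *path, unsigned seed, int iterations)
{
    static uint8_t shadow[FILE_MAX];
    static uint8_t buf[IO_MAX];
    size_t len, off, j;
    int i, rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(shadow, 0, sizeof shadow);
    if (ftruncate(fd, (off_t)FILE_MAX) != 0)   /* file starts as all zeros */
        goto out;
    srand(seed);
    for (i = 0; i < iterations; i++) {
        len = 1 + (size_t)rand() % IO_MAX;
        off = (size_t)rand() % (FILE_MAX - len + 1);
        if (rand() % 2) {
            /* write random bytes and record them in the shadow copy */
            for (j = 0; j < len; j++)
                buf[j] = (uint8_t)rand();
            if (pwrite(fd, buf, len, (off_t)off) != (ssize_t)len)
                goto out;
            memcpy(shadow + off, buf, len);
        } else {
            /* read back and compare with what should be there */
            if (pread(fd, buf, len, (off_t)off) != (ssize_t)len)
                goto out;
            if (memcmp(buf, shadow + off, len) != 0)
                goto out;
        }
    }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

Sizes and offsets that are not block aligned tend to exercise the read-modify-write paths in Layer 2, which is where latent bugs often hide. Growing FILE_MAX past the direct-block range of your inodes extends the same test to indirect blocks.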

And that's it. Phase 2 is done.