call lecture from undergraduate OS class.
this page last updated:
Tue Sep 20 14:31:46 PDT 2016
File System Project
The pedagogical goal for this class project is for you to have the experience
of building a working file system for Linux. Doing so will hopefully acquaint
you with the concepts of system calls, the Linux file abstraction, and device
From an engineering perspective (i.e. what it is you need to do), the project
decomposes into three tasks:
- system call implementation -- implementing the system calls that
can be issued by Linux on files,
- implementing the file abstraction -- building the internal data
structures and procedures necessary to implement files, and
- implementing secondary storage management -- building the parts
of the file system that persist in secondary storage.
Implicitly, there is a fourth task, which is to integrate these three tasks
into a single, working file system.
File System Calls
For the system call components, you will need to use a software
facility called FUSE. FUSE (Filesystem in Userspace) is available for most
Linux systems. It provides a way to intercept file system calls issued by
Linux programs and to redirect the program flow into a daemon running as a
user-level process. That is, when a Linux program (any Linux program)
makes a file system call, FUSE will invoke a routine in a daemon process that
you have written instead of sending that system call to the Linux file system
implementation in the kernel.
Thus, by using FUSE, you can test your file system as if it were any other
file system. More importantly, you can compare the results that your file
system produces to the results produced by the Linux file system itself.
This "A/B" comparison is how we will evaluate your work at the end of the quarter.
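The A/B idea can be tried out long before the final evaluation. What follows is a minimal sketch, not the course test suite: it applies one fixed sequence of file operations to two directories and reports any results that differ. It assumes one directory is the mount point of your FUSE file system and the other lives on the native Linux file system. The names `run_ops` and `ab_compare` are hypothetical, and Python is used only for brevity.

```python
import errno
import os
import tempfile


def run_ops(root):
    """Apply a fixed sequence of file operations under root; record each result."""
    results = []
    path = os.path.join(root, "a.txt")

    # Create a file and write to it.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    results.append(("write", os.write(fd, b"hello world")))

    # Seek back into the file and read part of it.
    os.lseek(fd, 6, os.SEEK_SET)
    results.append(("read", os.read(fd, 5)))
    os.close(fd)

    # Metadata and directory operations.
    results.append(("size", os.stat(path).st_size))
    os.mkdir(os.path.join(root, "d"))
    results.append(("listdir", sorted(os.listdir(root))))

    # Error behavior should match as well, so record the errno name.
    try:
        os.close(os.open(os.path.join(root, "missing"), os.O_RDONLY))
        results.append(("open-missing", "ok"))
    except OSError as e:
        results.append(("open-missing", errno.errorcode.get(e.errno)))
    return results


def ab_compare(root_a, root_b):
    """Return (operation, result_a, result_b) for every operation that differs."""
    res_a, res_b = run_ops(root_a), run_ops(root_b)
    return [(a[0], a[1], b[1]) for a, b in zip(res_a, res_b) if a != b]


if __name__ == "__main__":
    # Two native temporary directories stand in for both sides here; in
    # practice one of them would be the mount point of your file system.
    with tempfile.TemporaryDirectory() as ref, tempfile.TemporaryDirectory() as mine:
        print(ab_compare(ref, mine))
```

An empty list means both sides behaved identically for this sequence. Both directories must be empty before each run, since the sequence creates fixed names.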
File System Abstractions
Here, you have some latitude. As long as you implement the semantics of the
file system calls correctly, the design and implementation of the internals is
up to you. Any internal organization is acceptable; however, it must be in
memory only (i.e. you can't just stick things in a database).
The third part of the exercise is to write the secondary storage management
that persists the file system state across machine reboots. The goal is to be
able to shut down your file system (either through an unmount or a machine
reboot) and have all of the files remain intact and in the same state when
the file system is remounted.
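For a remount to find the files again, the storage itself has to say where things are. A common approach, sketched here under assumptions, is to reserve block 0 as a "superblock" that mkfs writes and mount reads back. The field layout, the magic number, the 4096-byte block size, and every function name below are hypothetical choices for illustration, and an ordinary file stands in for the storage device.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 4096            # assumed block size
MAGIC = 0x46533136           # arbitrary, hypothetical magic number
SUPERBLOCK_FMT = "<IIII"     # magic, block_size, total_blocks, root_inode


def pack_superblock(total_blocks, root_inode):
    """Serialize the superblock fields and pad them to one full block."""
    raw = struct.pack(SUPERBLOCK_FMT, MAGIC, BLOCK_SIZE, total_blocks, root_inode)
    return raw.ljust(BLOCK_SIZE, b"\0")


def unpack_superblock(block):
    """Parse block 0; reject storage that was never formatted."""
    if len(block) < struct.calcsize(SUPERBLOCK_FMT):
        raise ValueError("short read; no file system found")
    magic, block_size, total_blocks, root_inode = struct.unpack_from(
        SUPERBLOCK_FMT, block
    )
    if magic != MAGIC:
        raise ValueError("no file system found; run mkfs first")
    return {
        "block_size": block_size,
        "total_blocks": total_blocks,
        "root_inode": root_inode,
    }


def write_superblock(dev_path, total_blocks, root_inode):
    """Write block 0 and force it to stable storage before returning."""
    fd = os.open(dev_path, os.O_RDWR)
    try:
        os.pwrite(fd, pack_superblock(total_blocks, root_inode), 0)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_superblock(dev_path):
    """Read block 0 back, as a mount operation would."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        return unpack_superblock(os.pread(fd, BLOCK_SIZE, 0))
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dev = os.path.join(tmp, "disk.img")   # a file standing in for a device
        with open(dev, "wb") as f:
            f.truncate(8 * BLOCK_SIZE)
        write_superblock(dev, total_blocks=8, root_inode=1)
        print(read_superblock(dev))
```

The `os.fsync` call is the part that matters for surviving a reboot: without it, the write may still be sitting in a kernel buffer when the machine goes down.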
To help keep this exercise on schedule, we'll do this project in phases, where
each phase receives a project score (more on scoring below).
This decomposition is for your benefit. You may follow whatever development
schedule you and your team decide upon, but you are strongly encouraged to
follow this approach in the absence of a strong alternative.
- Phase 1 -- Implement the basic open/close/read/write/seek and
directory functions using an in-memory emulation of secondary storage. To
complete this phase you must
  - build an emulator for secondary storage that can be loaded in
  memory with your file system. This emulator need not persist beyond the
  lifetime of a mount (i.e. it does not implement persistence).
  - implement mkfs to make a file system in the emulator
  - implement the file abstractions (block management, block maps,
  etc.) using the in-memory emulator
  - integrate the file system implementation with FUSE
At the end of Phase 1, you should have a basic file system that works in
memory only. When you unmount the file system all files are lost. Similarly,
when you mount it, you must "make" a new file system before any operations
- Phase 2 -- Port your file system to use secondary storage. Here
we'll provide access to an on-campus cloud that will let you allocate "raw"
disk block devices. To accomplish Phase 2, you will need to swap out the
in-memory emulator for the raw disk device itself. After this phase, your
file system will be able to survive across mounts (i.e. implement
persistence). It should also be able to support more storage than is
available in memory. That is, you should not implement Phase 2 by simply
writing a dump-and-recover function for the emulator state. The file storage
available will be bigger than RAM.
- Phase 3 -- Complete as many of the FUSE file operations as you can
and optimize the performance. At the end of Phase 2, you may have a working
file system but it is likely to be quite slow if you are careful about
persistence. At the end of Phase 3, you should be able to run regular Linux
commands (e.g. tar, gcc, grep, vi, etc.) in your file system just as on the
Linux file system itself. The expectation for Phase 3 is that it will be
faster than Phase 2, but no less reliable. That is, the Phase 3 version is a
more complete version of the necessary Linux functionality, that may also
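The move from the Phase 1 emulator to the Phase 2 device is easier if both sit behind the same block read/write interface, so that mkfs and the file abstractions never know which one they are using. The sketch below illustrates that idea under assumptions: the class names `MemDisk` and `FileDisk`, the 512-byte block size, and the layout (block 0 holds a free-block bitmap) are all hypothetical, and an ordinary file stands in for a raw disk device.

```python
import errno
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this sketch


class MemDisk:
    """Phase 1: in-memory emulation of secondary storage (lost at unmount)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self._data = bytearray(nblocks * BLOCK_SIZE)

    def read_block(self, n):
        _check(self, n)
        return bytes(self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n, buf):
        _check(self, n, buf)
        self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = buf


class FileDisk:
    """Phase 2: the same interface, backed by a device or file path."""

    def __init__(self, path, nblocks):
        self.nblocks = nblocks
        self._fd = os.open(path, os.O_RDWR)

    def read_block(self, n):
        _check(self, n)
        return os.pread(self._fd, BLOCK_SIZE, n * BLOCK_SIZE)

    def write_block(self, n, buf):
        _check(self, n, buf)
        os.pwrite(self._fd, buf, n * BLOCK_SIZE)

    def close(self):
        os.close(self._fd)


def _check(disk, n, buf=None):
    """Reject out-of-range block numbers and partial-block writes."""
    if not 0 <= n < disk.nblocks:
        raise IndexError("block number out of range")
    if buf is not None and len(buf) != BLOCK_SIZE:
        raise ValueError("writes must be exactly one block")


def mkfs(disk):
    """Make an empty file system: block 0 becomes the free-block bitmap."""
    if disk.nblocks > BLOCK_SIZE * 8:
        raise ValueError("a one-block bitmap cannot track this many blocks")
    bitmap = bytearray(BLOCK_SIZE)
    bitmap[0] = 0b00000001  # block 0 (the bitmap itself) is in use
    disk.write_block(0, bytes(bitmap))


def alloc_block(disk):
    """Find the first free block, mark it used in the bitmap, and return it."""
    bitmap = bytearray(disk.read_block(0))
    for n in range(disk.nblocks):
        byte, bit = divmod(n, 8)
        if not bitmap[byte] & (1 << bit):
            bitmap[byte] |= 1 << bit
            disk.write_block(0, bytes(bitmap))
            return n
    raise OSError(errno.ENOSPC, "no free blocks")


if __name__ == "__main__":
    mem = MemDisk(16)
    mkfs(mem)
    print(alloc_block(mem))

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")
        with open(path, "wb") as f:
            f.truncate(16 * BLOCK_SIZE)
        disk = FileDisk(path, 16)
        mkfs(disk)
        print(alloc_block(disk))
        disk.close()
```

Because `alloc_block` goes through `read_block` and `write_block` only, the allocation state lives on the disk object itself. With `FileDisk`, closing and reopening the same path picks up where the previous mount left off, and nothing requires the whole device to fit in RAM.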
Dates and Grading
The final project is due at the end of the class. Depending on the number of
groups (see below), the exact due date will be one of the following three days:
- November 29, 2016
- December 1, 2016
- December 7, 2016 (the scheduled final period)
You will present your file system to the class as a group so that you can
demonstrate your final project (this activity will take place in lieu of a
final). The format for this evaluation is that we will test your file system
using a test suite of software that is provided to you just before your
presentation time slot. You will then make a short presentation as a group
describing the features of your implementation.
You will work in a team of 5.
All members of the team will receive the same grade. Your
final demonstration will take place during class time or during the final
period. I will assign presentation times randomly to each group for time
slots during those three days. During your demonstration period, which will
be 15 minutes, we will test your file system's functionality and speed, with
functionality being the most important.
This last point bears repeating. A slow file system that works without
failure is worth more than a fast file system
Referencing Existing Work
As mentioned, the goal of the project is to provide an opportunity to build a
file system from scratch -- an opportunity that is hopefully as beneficial as
it is rare professionally. Still, there exists a myriad of open source file
systems that have been developed using FUSE and much or all of the
functionality necessary to succeed with this project is likely to be
available as freely accessible code. Further, it is often easiest to
understand how to implement some of these abstractions through code examples.
For this project, it is acceptable to use existing code as reference material;
however, it is not acceptable to incorporate code snippets or routines from
other projects. That is, you may read code you find that helps illuminate a
particular concept just as you may read a paper or other text; however, you may
not copy (either by hand or through electronic means) code for use in your
project. Further, as part of your code base, you must include a README or
LICENSE file that cites whatever references (text or code) you have
chosen to consult.