This page last updated: Wed Sep 6 11:24:43 PDT 2023
Getting Started
Before you can start the project, you will need Eucalyptus credentials so that
you can access the campus private cloud. To get them, you need to email me
with
- The Name of your Team
- The names of each team member (as they appear in egrades)
- The email addresses of each team member (for an email they check
regularly)
Once I receive such an email, I will respond to all team members with a shared
set of credentials.
File System Project
The pedagogical goal for the overall class project is for you to have the experience
of building a working file system for Linux. Doing so will hopefully acquaint
you with the concepts of system calls, the Linux file abstraction, and device
I/O. The resulting file system will (if it is complete) operate as a complete
functional equivalent to a file system that ships with Linux (e.g. ext4, xfs,
etc.) with possibly lower performance. However, achieving this level of
functionality can require more work than one might anticipate.
Thus, the plan is to complete the overall assignment in two phases.
Phase 1
To get the process started, your first assignment is to create a "fake" file
system that sends me mail when I run a small test program. The result of this
"Phase 1" assignment will be useful to completing the final project.
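A test program of this kind might look roughly like the sketch below. The actual test program is not specified here, so the function name `write_string`, the open flags, and the file mode are all assumptions made for illustration. The sketch only shows the sequence of system calls (open, write, close) that your file system would need to handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` for writing, writes `msg` (without the
 * trailing NUL), and closes the descriptor.
 * Returns 0 on success and -1 on any failure. */
int write_string(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    /* The flags are an assumption; a real test program may use others. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    size_t done = 0;
    while (done < len) {
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t n = write(fd, msg + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

Testing your file system with a small program like this one, against a path under your own mount point, is a reasonable way to check a recipe before submitting it.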
The main software tool we will use to integrate your eventual file system
implementation as a working file system for Linux will be the FUSE file
system utility and, in particular, LibFUSE.
FUSE is a way to interpose call-back functions you write between the Linux
file system calls and the actual file system. That is, you write call-back
functions that FUSE invokes when a program (any program) makes a Linux system
call that accesses the file system.
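To make the idea of interposition concrete, the sketch below models a call-back table in plain C. It deliberately does not use LibFUSE: the names `toy_ops`, `toy_dispatch_write`, and `count_bytes` are hypothetical, and the signatures are simplified. In LibFUSE the analogous table is `struct fuse_operations`, and you should take the exact call-back signatures from the FUSE documentation for the version you install.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-back table: one function pointer per operation. */
struct toy_ops {
    int (*open)(const char *path);
    int (*write)(const char *path, const char *buf, size_t size);
};

/* Stand-in for the framework: routes a "write" request to the registered
 * call-back, or returns -1 when no call-back was registered. */
int toy_dispatch_write(const struct toy_ops *ops, const char *path,
                       const char *buf, size_t size)
{
    assert(path != NULL);
    if (ops == NULL || ops->write == NULL)
        return -1;
    return ops->write(path, buf, size);
}

/* Running total of bytes seen by the example call-back below. */
static size_t total_bytes = 0;

/* Example call-back: accepts every write and reports all bytes as consumed. */
int count_bytes(const char *path, const char *buf, size_t size)
{
    (void)path;
    (void)buf;
    total_bytes += size;
    return (int)size;
}
```

The point of the pattern is that the calling program never names `count_bytes` directly. It only issues the request, and the table decides which of your functions runs.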
Ultimately, by the end of Phase 2, the call-backs you write will implement
as much of the Linux file semantics as you can manage by the end of the
quarter. In Phase 1, however, your goal is to write a set of call-backs that
allow a program I will write to open a file (the name doesn't matter) and then
write a string to the resulting file descriptor. Your call-backs package that
string in an email and send it to my email address. Writing this "fake" file
system (which really only supports a few system calls) is intended to achieve
the following goals:
- an understanding of how to instantiate and configure a Linux VM in a cloud
that you are free to modify using root privileges,
- an understanding of how to install, build, and develop using the FUSE
utility,
- an understanding of how to install the Linux dependencies in addition to
LibFUSE (e.g. sendmail) that are required to complete the assignment, and
- a basic understanding of the concept of "software release" in which your
code and documentation will be tested (without you present) to determine its
functionality.
This last feature is increasingly peculiar to operating systems. Most
software today is designed to operate as an Internet-accessible service that
can be manipulated, debugged, and "repaired" while it is in use. Operating
systems "ship" as software releases because they must install and run on
machines outside of the administrative domain of the developer. Thus, one
must prepare an operating system to be installed and used by an unseen and
unaccountable user, with no recourse once the software is released. To some
extent, we will attempt to have this experience (and to understand its impact
on system software development) in this course.
In terms of preparation, if the terms "system call" and "file system" are not
part of your current technical understanding, then you should review the
following materials:
We will review (but only review) the latter in this class in a lecture. If
you are new to computer science (e.g. you do not have an undergraduate
background in computer science) and these lecture materials seem utterly
foreign, you should consider taking an undergraduate OS class before
attempting this project.
Summarizing Phase 1
In Phase 1, you are to
- create a FUSE program that allows a calling program to open a
fictitious file for writing and to write a string into that file,
- intercept the file system calls using FUSE and run a shell "call out" to
email my email address with that string, and
- create a "recipe" for me to use to build and install your FUSE
program so that I can test it using a simple test program.
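One way the shell "call out" in the list above could be structured is sketched below, assuming the POSIX `popen` interface. The helper name `pipe_to_command` is hypothetical, and the choice of mail command is left to you. The sketch passes the string to the command on its standard input rather than on the command line, which avoids having to quote the string for the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: runs `cmd` through the shell and writes `size` bytes
 * from `buf` to its standard input.
 * Returns 0 if every byte was written and the command exited with status 0,
 * and -1 otherwise. */
int pipe_to_command(const char *cmd, const char *buf, size_t size)
{
    assert(cmd != NULL && buf != NULL);

    FILE *p = popen(cmd, "w");
    if (p == NULL)
        return -1;

    size_t written = fwrite(buf, 1, size, p);

    /* pclose() flushes the stream and waits for the command to finish. */
    int status = pclose(p);

    if (written != size || status != 0)
        return -1;
    return 0;
}
```

An assumed invocation would be `pipe_to_command("sendmail -t", message, length)`, where `message` begins with `To:` and `Subject:` header lines followed by a blank line. Whether sendmail or some other mailer is present on the VM depends on what your recipe installs.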
To accomplish this phase, you will need to familiarize yourself with the FUSE
documentation (such as it is). You will also need to learn about how to
configure Linux in a cloud (the Eucalyptus campus cloud, in our case).
Finally, you will need to spend some time writing and -- most importantly --
testing your instructions.
Grading for Phase 1
To grade your assignment I will read your instructions and follow them
(largely using cut-and-paste). If, at some point, they fail, I will stop and
assign a grade. If, at the end of the recipe, I receive an email, your Phase
1 will receive full credit.
Note that I will make no assumptions regarding what you "mean" in your build
and test instructions. I expect the document to contain the precise commands
that I need to cut-and-paste in order to
- start a VM in Eucalyptus with the ami corresponding to your distribution
of choice
- condition the VM (as the root user) to install any Linux software packages
your build requires
- build your solution
- run your solution
Often, when creating such a recipe, you need to include commands that take (as
parameters) values that are not known until previous commands are executed.
For example, when you run an instance in Eucalyptus, its IP address and DNS
name are assigned (at random) by the cloud. Thus you cannot write down a
command that takes the IP address in a document that I can directly
cut-and-paste from.
You should indicate these parameters appropriately. For example, part of your
recipe will instruct me to start a VM using a specific ami (image identifier).
The ami corresponds to a specific Linux distribution, but the command to
launch it requires that I use a key identifier that belongs to my user in the
cloud. It is fine to indicate this as
aws ec2 run-instances --key-name [mykeyname.key] --image-id ami-b157b02b9b51c65ce --instance-type t2.medium
where the square brackets indicate that I should insert whatever key I plan to
use to make my test.
Using Eucalyptus
To complete both Phase 1 and Phase 2, you will need to mount a file system and
(for Phase 2) read and write a raw disk device.
To mount a file system using FUSE you will need root privileges. To make it
possible to provide you with a Linux environment where you can have root, we
will use the Eucalyptus private cloud here at UCSB.
This tutorial explains the basics. Eucalyptus is essentially a private-cloud
equivalent to Amazon's AWS. For this class, you can start Linux VMs that you
can use for development and to run your file system. You can also create
volumes that are virtualized raw disk partitions to use as secondary storage.
Phase 2
From an engineering perspective (i.e. what it is you need to do), Phase 2 of
the project decomposes into three tasks:
- system call implementation -- implementing the system calls that
can be issued by Linux on files (hopefully by replacing and then adding to the
FUSE call-backs you implemented for Phase 1),
- implementing the file abstraction -- building the internal data
structures and procedures necessary to implement files, and
- implementing secondary storage management -- building the parts
of the file system that persist in secondary storage.
Implicitly, there is a fourth task which is to integrate these three tasks
into a single, working file system.
File System Abstractions
Here, you have some latitude. As long as you implement the semantics of the
file system calls correctly, the design and implementation of the internals is
up to you. Any internal organization is acceptable; however, it must be in
memory only (i.e. you can't just stick things in a database or open a Linux
file in your FUSE call-backs).
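As one example of an internal, in-memory structure, the sketch below tracks free and used blocks with a bitmap. This is only one possible design, not a required one. The names `block_bitmap`, `bitmap_init`, `bitmap_alloc`, and `bitmap_free` are hypothetical, and the first-fit search is chosen for simplicity rather than speed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory free-block map: one bit per block, 1 = in use. */
struct block_bitmap {
    uint8_t *bits;
    uint64_t nblocks;
};

/* Allocates a zeroed bitmap for `nblocks` blocks. Returns 0 or -1. */
int bitmap_init(struct block_bitmap *bm, uint64_t nblocks)
{
    assert(bm != NULL);
    bm->bits = calloc((size_t)((nblocks + 7) / 8), 1);
    bm->nblocks = nblocks;
    return bm->bits != NULL ? 0 : -1;
}

/* First-fit allocation: marks the lowest free block as used and returns its
 * number, or returns -1 when every block is in use. */
int64_t bitmap_alloc(struct block_bitmap *bm)
{
    assert(bm != NULL);
    for (uint64_t i = 0; i < bm->nblocks; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));
        if ((bm->bits[i / 8] & mask) == 0) {
            bm->bits[i / 8] |= mask;
            return (int64_t)i;
        }
    }
    return -1;
}

/* Marks `block` as free again; out-of-range block numbers are ignored. */
void bitmap_free(struct block_bitmap *bm, uint64_t block)
{
    assert(bm != NULL);
    if (block < bm->nblocks)
        bm->bits[block / 8] &= (uint8_t)~(1u << (block % 8));
}
```

A linear scan is slow on a large volume. Remembering where the last search ended, or scanning a word at a time, are common refinements if allocation shows up as a bottleneck.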
Secondary Storage
The third part of the exercise is to write the secondary storage management
that persists the file system state across machine reboots. The goal is to be
able to shut down your file system (either through an unmount or a machine
reboot) and have all of the files remain intact and in the same state when
the file system is remounted.
For this part of the project, you will need to read and write a raw storage
partition in 4K blocks. That is, all accesses to persistent storage must read
or write a complete 4K block.
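A minimal sketch of whole-block I/O, assuming the POSIX `pread` and `pwrite` calls, is shown below. The names `FS_BLOCK_SIZE`, `open_device`, `read_block`, and `write_block` are hypothetical, and the device path is whatever your VM assigns to the attached volume. Every transfer covers exactly one 4K block at a block-aligned offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define FS_BLOCK_SIZE 4096u

/* Opens an existing device (or backing file) for block I/O; -1 on error. */
int open_device(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDWR);
}

/* Reads block number `blockno` into `buf`, which must hold FS_BLOCK_SIZE
 * bytes. Returns 0 on success, -1 on error or if the block is past the end. */
int read_block(int fd, uint64_t blockno, void *buf)
{
    assert(buf != NULL);
    off_t off = (off_t)blockno * (off_t)FS_BLOCK_SIZE;
    size_t done = 0;
    while (done < FS_BLOCK_SIZE) {
        ssize_t n = pread(fd, (char *)buf + done, FS_BLOCK_SIZE - done,
                          off + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Writes FS_BLOCK_SIZE bytes from `buf` to block number `blockno`.
 * Returns 0 on success, -1 on error. */
int write_block(int fd, uint64_t blockno, const void *buf)
{
    assert(buf != NULL);
    off_t off = (off_t)blockno * (off_t)FS_BLOCK_SIZE;
    size_t done = 0;
    while (done < FS_BLOCK_SIZE) {
        ssize_t n = pwrite(fd, (const char *)buf + done, FS_BLOCK_SIZE - done,
                           off + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

Routing every access to persistent storage through two functions like these makes the 4K rule easy to enforce, and it lets you develop against an ordinary file before switching to a raw volume.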
Phase 2 Project Deliverables
- Must Haves -- Implement the basic open/close/read/write/seek and directory
functions using FUSE. To complete this part of the project you must
- implement mkfs to make a file system using a disk block device,
- implement the basic file abstractions (block management, block maps,
directories, etc.), and
- integrate the file system implementation with FUSE
so that you have a basic file system that works for most programs that use
the minimal POSIX file system interface (open/close/read/write/seek).
- Good to Haves -- Complete as many of the FUSE file operations as you can
and optimize the performance. You need to implement a basic working file
system, but it is likely to be quite slow and/or incomplete if you are careful
about persistence. Ideally, you should be able to run any regular Linux
commands (e.g. tar, gcc, grep, vi, etc.) in your file system just as on the
Linux file system itself. To do so, you'll need to make sure that you are
handling issues like access times, permissions, etc.
That is, a successful project implements more of the necessary Linux
functionality than the basic version does, and may also improve performance.
Dates and Grading Procedures
Phase 1 of the project is due
Grading Procedure for Phase 1
You should create a tar file with your code and instructions and email it to
me. Please do not send me links to repositories or other on-line services
(e.g. Google docs). Your instructions (which should be in a text file or a
PDF) should explain exactly how to build and test your solution.
I will execute your instructions and assign a grade using the contents of your
tar file.
Grading Procedure for Phase 2
The final project is due at the end of the class. I will schedule a time slot
to meet with each team during the class period on one of the following two
days:
- Dec. 4, 2023
- Dec. 6, 2023
You will present your file system and demonstrate your final project (this
activity will take place in lieu of a final). The format for this evaluation
is that you will provide me with access to your file system so that I can run
a series of tests on it and ask you questions about how it responds.
For the demonstration, you will use a Eucalyptus
VM and a 30GB volume that you will have formatted and mounted. My tests will
assume that the file system has at least 30GB, and at least 20GB of usable
space. You will
- start a VM in Eucalyptus with the distribution of your choice,
- install your file system code for Phase 2 and any dependencies it requires
- create a file system on a raw volume you allocate using Eucalyptus and
attach to the VM using your version of mkfs,
- mount the file system
- install my PGP public key (accessible from a link at the top of this page)
so that I can use ssh to log into your VM
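As a rough sizing aid for the 30GB volume mentioned above, the sketch below computes how many 4K blocks such a volume holds and how many blocks a free-block bitmap would occupy. It assumes that 30GB means 30 GiB and that free space is tracked with one bit per block. Neither assumption comes from the assignment, and the names `FS_BLOCK_SIZE`, `blocks_in_volume`, and `bitmap_blocks` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FS_BLOCK_SIZE 4096u

static_assert((FS_BLOCK_SIZE & (FS_BLOCK_SIZE - 1)) == 0,
              "block size must be a power of two");

/* Number of whole 4K blocks in a volume of `bytes` bytes. */
uint64_t blocks_in_volume(uint64_t bytes)
{
    return bytes / FS_BLOCK_SIZE;
}

/* Number of 4K blocks needed to store one bit per block, rounded up. */
uint64_t bitmap_blocks(uint64_t nblocks)
{
    uint64_t bits_per_block = (uint64_t)FS_BLOCK_SIZE * 8;
    return (nblocks + bits_per_block - 1) / bits_per_block;
}
```

Under these assumptions a 30 GiB volume holds 7,864,320 blocks, and a one-bit-per-block bitmap for it fits in 240 blocks (under 1 MiB). That leaves ample room for other metadata while still meeting a 20GB usable-space target.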
During your assigned time slot, I will log in to your VM and run a series of
test codes on your file system. Afterwards, you will make a brief presentation
to the class about your file system development experience.
It is best if you work in a team of either 2, 3, or 4. If you wish to work
alone or to form a team larger than 4, please contact me so we can discuss the
feasibility and likelihood of success.
All members of the team will receive the same grade.
I will randomly assign each team (or individual) a presentation time slot on
one of those two days.
Referencing Existing Work
As mentioned, the goal of the project is to provide an opportunity to build a
file system from scratch -- an opportunity that is hopefully as beneficial as
it is rare professionally. Still, there exists a myriad of open source file
systems that have been developed using FUSE and much or all of the
functionality necessary to succeed with this project is likely to be
available as freely accessible code. Further, it is often easiest to
understand how to implement some of these abstractions through code examples.
For this project, it is acceptable to use existing code as reference material;
however, it is not acceptable to incorporate code snippets or routines from
other projects. That is, you may read code you find that helps illuminate a
particular concept just as you may read a paper or other text; however, you may
not copy (either by hand or through electronic means) code for use in your
project. Further, as part of your code base, you must include a README or
LICENSE file that cites whatever references (text or code) you have chosen to
consult.