CS293B -- Cloud Computing Projects, Spring 2020
You are free to define a project of your own or to choose one of several I
have outlined below. You may work as a team of up to three.
The project requirements are
- you must hand in (by email) a written project plan no later
than 11:59 PM on Monday, April 13, 2020. Your project plan must
- list the names and email addresses of all team members
- describe the problem you are planning to solve
- outline your approach to solving it
- describe how you will test whether you have solved it or the
degree to which you have addressed the problem
The plan must be sent to me as a PDF and should be in prose form.
- projects can come in two forms
- application projects that explore some
combination of cloud computing, edge or fog computing, and IoT
"end-to-end." For this style of project, you will need to either find live data
streams to use as "sensor" inputs or "fake" sensor data using replay from
existing data sets. The data (which could be telemetry, images, audio, etc.
in any combination) must then be manipulated and analyzed at the edge and/or
in the cloud.
- technology projects that attempt to improve upon existing
technologies for cloud, edge, and IoT. For a technology project, some
comparison with the "best known" solution or solutions must be part of the
evaluation.
- Each team will meet with me (and any collaborators) during the last full
week of classes, which is June 1 through June 5, 2020. There will
be no lectures during that week. Instead, I will schedule a 15 to 30 minute
period to meet with each team so that I can enjoy and focus on your demo.
My intention is to use the lecture periods on June 1 and June 3 for the demo
time slots but (depending on course enrollment) we may also meet on other days
that week.
- Project materials that must be turned in include
- a project write up that describes the project, explains how it
tracked the project plan, and discusses its evaluation.
- a short slide presentation (maximum of 10 slides) that presents
the project
- all code and data sets that you used
- instructions for building and executing your demo so that your
results are reproducible
I will use the materials and the discussion we have during your demo sessions
to determine the project grade for this course.
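For application projects that "fake" sensor data by replay, the replay loop itself can be very small. The following is a minimal sketch, not a required design: it assumes a CSV data set with a hypothetical `timestamp` column (in seconds) plus hypothetical `sensor_id` and `temp_c` columns, and it emits rows with the same spacing the original sensor had, optionally sped up.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterator

# Hypothetical data set: column names and values are made up for illustration.
SAMPLE = """timestamp,sensor_id,temp_c
0,orchard-1,4.0
60,orchard-1,3.5
180,orchard-1,2.9
"""


def replay(csv_text: str,
           speedup: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Dict[str, str]]:
    """Yield rows in timestamp order, waiting between rows as the sensor did.

    `speedup` divides every gap, so speedup=60.0 replays an hour in a minute.
    `sleep` is injectable so the loop can be exercised without really waiting.
    """
    rows = sorted(csv.DictReader(io.StringIO(csv_text)),
                  key=lambda row: float(row["timestamp"]))
    previous = None
    for row in rows:
        now = float(row["timestamp"])
        if previous is not None:
            sleep((now - previous) / speedup)
        previous = now
        yield row
```

Iterating over `replay(SAMPLE, speedup=60.0)` then behaves like a live sensor from the point of view of whatever edge or cloud code consumes the readings.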
Example Application Projects
Here are several example application projects you might consider. Any
reasonable adaptation would be equally if not more appropriate.
Where's The Bear 2.0: A couple of years ago we developed an edge
computing solution that uses Google's TensorFlow and the Inception v3 model to
automatically classify camera trap data from the Sedgwick Reserve. The
project was called "Where's the Bear?" and the paper is
Elias, Andy Rosales, et al. "Where's the Bear?-Automating Wildlife Image Processing Using IoT and Edge Cloud Systems." Internet-of-Things Design and Implementation (IoTDI), 2017 IEEE/ACM Second International Conference on. IEEE, 2017.
This work can easily be extended in several ways. First, the original system
only identified images with a single species. Identifying images with
multiple species (which are rare but useful) would be a great improvement.
Secondly, the ecologists who operate the camera traps would like to use them
to count (e.g. for population estimates). They would REALLY like counts
separated by age group (e.g. youngsters versus adults). The original system
does no counting. Thirdly (and this is probably hard) the ecologists would
like to know if it is possible to identify individual animals. In addition,
we would like to understand whether there is a relationship (temporal or
spatial) between image capture and environmental conditions (meteorological,
seasonal, drought, etc.). In particular, to what degree is it possible to
predict when an image will be taken of a given species? The authors
and collaborators as well as various data sets are available in the area as
resources for this project. You can find out more here
but you will need a UCSB NetID to access the images.
Nanoclimate forecasting: One big area of interest for IoT and cloud is
agriculture (as evidenced by Microsoft's FarmBeats
project). Estimating meteorological conditions at a fine-grained level is
turning out to be an important capability that IoT for agriculture
can provide. For example, agricultural engineers and scientists
believe that it is possible to use highly localized temperature and humidity
measurements to optimize crop management (e.g. frost prevention, differential
irrigation scheduling, etc.). However, it is often infeasible to instrument
growing areas with densely distributed sensors. Doing so often carries a
large infrastructure cost (both in terms of installation and maintenance) as
well as the potential for interfering with farm operations. Thus, Nanoclimate
sensing and forecasting requires the heavy use of analytics to make inferences
and predictions.
For example, at one orchard in the Central Valley of California, the growers
would like to combine data from a few carefully placed temperature sensors
with the plethora of mesoscale and microclimate meteorological data to make
fine-grained inferences and predictions of temperature and humidity at meter
scale.
Another example project would be to try to determine the specific sets of
data and data-fusion analytics that can infer the temperature in an
arbitrary square meter of the orchard. For example, knowing the temperature
at one location, the prevailing wind, and the solar radiosity, it is possible
to infer the temperature at another location nearby. How accurately
can this inference be made? Where should sensors be placed? What is the
minimum sensor-to-error ratio that is possible?
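One way to start answering the accuracy question is with a deliberately naive baseline. The sketch below uses inverse-distance weighting (IDW), which is a swapped-in textbook interpolation technique and not a method taken from this project: it ignores wind and solar inputs entirely, and the `(x_m, y_m, temp_c)` reading layout is an assumption made for illustration. Leave-one-out error over the existing sensors gives a first number that any data-fusion approach should beat.

```python
import math
from typing import List, Tuple

# Hypothetical reading layout: (x in meters, y in meters, temperature in C).
Reading = Tuple[float, float, float]


def idw_estimate(readings: List[Reading], x: float, y: float,
                 power: float = 2.0) -> float:
    """Estimate the temperature at (x, y) by inverse-distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sensor_x, sensor_y, temp in readings:
        distance = math.hypot(sensor_x - x, sensor_y - y)
        if distance == 0.0:
            return temp  # the point coincides with a sensor
        weight = 1.0 / distance ** power
        weighted_sum += weight * temp
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out_mae(readings: List[Reading], power: float = 2.0) -> float:
    """Mean absolute error when each sensor is predicted from all the others."""
    errors = []
    for i, (sensor_x, sensor_y, temp) in enumerate(readings):
        others = readings[:i] + readings[i + 1:]
        errors.append(abs(idw_estimate(others, sensor_x, sensor_y, power) - temp))
    return sum(errors) / len(errors)
```

Repeating the leave-one-out calculation while removing sensors is one simple way to explore how error grows as the sensor count shrinks.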
Forecasting (predicting a future temperature value) is another important
area that is related to the inference problem. For frost prevention, for
example, an inference is sufficient to allow the system to send an alert when
frost is imminent, but it would be better to predict that it will occur
several hours into the future.
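The difference between alerting on an inference and alerting on a forecast can be made concrete with a very small example. The sketch below fits a least-squares line to recent readings and extrapolates it; this is only an illustrative baseline under the assumption that the recent trend continues, and the function names and the 0 C threshold are hypothetical choices, not part of any existing system.

```python
from typing import List, Tuple


def fit_line(times_h: List[float], temps_c: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope in C per hour, intercept in C)."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(temps_c) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, temps_c))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def forecast(times_h: List[float], temps_c: List[float], horizon_h: float) -> float:
    """Extrapolate the fitted line horizon_h hours past the last reading."""
    slope, intercept = fit_line(times_h, temps_c)
    return intercept + slope * (times_h[-1] + horizon_h)


def frost_expected(times_h: List[float], temps_c: List[float],
                   horizon_h: float, threshold_c: float = 0.0) -> bool:
    """True if the extrapolated temperature is at or below the threshold."""
    return forecast(times_h, temps_c, horizon_h) <= threshold_c
```

A real forecaster would need to account for diurnal cycles and for the uncertainty of the prediction, but even this baseline gives a project something to compare against.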
All of the above are also true for inferring and forecasting humidity at the
Nanoclimate level. Our group has access to an instrumented orchard and
historical sensor data to support this project. In addition,
this paper
describes some early attempts at Nanoclimate temperature inferences using
internal CPU temperatures as explanatory variables.
Both of these examples are intended to stimulate your imagination about what
is possible (although they are both available as projects for this class as
well). What is key, though, is that the solution is an "end-to-end" solution
-- one that addresses a real-world problem using a combination of
infrastructure for cloud/edge/IoT and analytics. To succeed, you must often
develop a novel technology or amalgamate a set of existing technologies in an
entirely new way.
Example Technology Projects
An alternative style of project for this class is one that hypothesizes the
need for a new technology that is generally more useful (under some measure)
than existing approaches. There are currently a raft of incumbent
technologies for cloud/edge/IoT (e.g. MS IoT Hub, AWS Greengrass, etc.)
but their development is nascent and frequently driven by commercial
expediencies rather than computer scientific analysis. Another approach you
could take in this class is to explore alternatives to these incumbents that
are better suited to your understanding of what is necessary.
For example, we have developed a multi-scale, distributed
Functions as a Service (FaaS) infrastructure called CSPOT
(CSPOT: A Serverless Platform of Things).
CSPOT makes several innovations. First, it defines a common, universal
"append-only" storage abstraction for FaaS programs. This abstraction is
simple enough to be implementable at the microcontroller level, yet powerful
enough to function as the main storage abstraction for IoT applications at the
edge and in the cloud. Secondly, it uses an append-only log as its runtime
system so that it leaves behind a record of causal dependency between
computations. Thus, by definition it is possible to recover causal execution
chains in highly scalable deployments. Thirdly, CSPOT functions are very low
latency -- two orders of magnitude faster than comparable AWS or
Microsoft technologies.
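To make the idea of an append-only storage abstraction with event-triggered functions concrete, here is a toy, in-memory sketch. It is not the CSPOT API: the class, method names, and payload format are all hypothetical, and it omits persistence, networking, and concurrency. It only illustrates how a function fired by an append can leave behind a record of which log entry caused it.

```python
from typing import Callable, List, Optional

# Hypothetical handler signature: (log, sequence number, payload).
Handler = Callable[["AppendOnlyLog", int, bytes], None]


class AppendOnlyLog:
    """Toy append-only log: entries are never modified or removed."""

    def __init__(self, name: str, handler: Optional[Handler] = None) -> None:
        self.name = name
        self._entries: List[bytes] = []
        self._handler = handler

    def append(self, payload: bytes) -> int:
        """Append a payload, fire the handler, and return the 1-based seqno."""
        self._entries.append(payload)
        seqno = len(self._entries)
        if self._handler is not None:
            self._handler(self, seqno, payload)
        return seqno

    def get(self, seqno: int) -> bytes:
        if not 1 <= seqno <= len(self._entries):
            raise IndexError(f"{self.name} has no entry {seqno}")
        return self._entries[seqno - 1]

    def latest_seqno(self) -> int:
        return len(self._entries)


alerts = AppendOnlyLog("alerts")


def frost_check(log: AppendOnlyLog, seqno: int, payload: bytes) -> None:
    """Handler: record which reading caused an alert, so the cause is recoverable."""
    if float(payload.decode()) <= 0.0:
        alerts.append(f"{log.name}#{seqno}".encode())


readings = AppendOnlyLog("readings", handler=frost_check)
```

Appending `b"-0.5"` to `readings` leaves an entry in `alerts` that names the exact reading that triggered it, which is the kind of causal trail described above.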
CSPOT is available as open source. There are a number of new
technological advances that it could enable including
- CSPOT-FS: a log structured file system for the CSPOT storage abstraction
- SPOT-FU: distributed transactions for CSPOT
- SPOXOS: Paxos for CSPOT
- OSPOT: a unikernel native implementation of CSPOT
- SPOT-Leash: implementing chain replication as described in this paper by van Renesse and Schneider for CSPOT
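As a reminder of what chain replication involves, the sketch below models the failure-free path only: updates enter at the head, are forwarded node by node, and are acknowledged by the tail, which also serves reads. It is a single-process toy with hypothetical class names, has nothing to do with CSPOT's actual interfaces, and leaves out the failure detection and chain reconfiguration that make the protocol interesting.

```python
from typing import List, Optional


class ChainNode:
    """One replica holding an append-only list of entries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []
        self.successor: Optional["ChainNode"] = None

    def update(self, entry: str) -> int:
        """Apply the entry locally, then forward it; the tail acknowledges."""
        self.log.append(entry)
        if self.successor is None:
            return len(self.log)  # acknowledgment from the tail
        return self.successor.update(entry)


class Chain:
    """Failure-free chain: writes go to the head, reads go to the tail."""

    def __init__(self, names: List[str]) -> None:
        self.nodes = [ChainNode(name) for name in names]
        for node, successor in zip(self.nodes, self.nodes[1:]):
            node.successor = successor
        self.head = self.nodes[0]
        self.tail = self.nodes[-1]

    def write(self, entry: str) -> int:
        return self.head.update(entry)

    def read(self, index: int) -> str:
        return self.tail.log[index]
```

Because the tail only holds entries that every earlier node has already applied, reads served from the tail never return an update that some replica is missing.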
These enhancements all emphasize the use of append-only data structures,
wide-area causal dependency tracking, and
FaaS programming as "universal" concepts in a cloud/edge/IoT setting. While
you would not necessarily need to implement an application "end-to-end" to
demonstrate your work, you would need to be able to make a meaningful
comparison to existing "state of the art" approaches.
Create your own Project
The examples described previously are intended to stimulate your interest. You
are free to choose one of them, to use one of them as a jumping off point for
a different idea, or to come up with something entirely new using your
limitless creativity. By all means, if you have an idea for a project,
contact me so we can discuss it. As long as it is exploring this new
architectural space for distributed applications and it is validated either
by a real-world application or against a state-of-the-art competitor, it is almost
assuredly in scope.
Resources
There are two campus clouds available to you in this class: one that runs
Eucalyptus (which is API compatible
with Amazon AWS) and another that runs OpenStack. In addition, I can probably
arrange access
to HTCondor -- a
high-throughput cloud computing service with many powerful features.
HTCondor is useful if you wish to build something that requires a great deal
of scale, but can also tolerate a great deal of "churn" in resource
availability. If you think your project might fit this model, please get in
touch and we can discuss it.
Finally, if your project will use GPUs, UCSB is just now setting up a Pacific Research Platform node. If you would
like access to any of these infrastructures, please email me. However, if you
request access, plan to use the infrastructure for this class (i.e. don't
ask for access just to have access).
Additionally, you are free to use any other cloud platform (e.g. in the free
tier) to which you can gain access. Unfortunately, we do not have class
credits from the public cloud vendors for this class.