CS290I - Scalable Internet Services and Systems
Thorsten von Eicken - UCSB - Winter 2002
Updated 1/18/2002 with Oracle data types.
Updated 1/22/2002: column increment changed to inc, removed unsigned, clarified id fields.
In this project you get to learn Perl, the details of HTTP and HTML, and SQL. Your task is to write a robot in Perl that searches the web for auction information and stores the data in a SQL database. This information will be used in the next projects to create a web site that displays information about the auctions and eventually allows you to create new auctions and place bids.
All of you will have access to Sun Solaris machines in PSL (see the course web site for details); at a minimum, bugatti.cs.ucsb.edu will be available. The SQL database runs on bugatti and you must store your data there. The database is accessible remotely, however, so you can connect to it from any machine you choose to complete the project.
To use Perl on bugatti, use /usr/local/bin/perl (not /usr/pubsw/bin/perl or /usr/bin/perl) so that you get a slew of HTTP- and HTML-related modules. You may want to put /usr/local/bin in your path before the other directories. Use "perl -V" to check: the paths listed at the end of the output should be in /usr/local/lib.
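A quick way to confirm that the interpreter you are running can see the modules your robot needs is a small check script. This is only a sketch: the module list is an assumption about what a typical robot would load (of these, the handout itself names only DBI), and module_available is a hypothetical helper name.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded by the
# interpreter running this script, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Accept only plain module names, because the name goes into a string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $module; 1" ? 1 : 0;
}

# Assumed module list; adjust it to whatever your robot actually uses.
for my $module (qw(LWP::UserAgent HTML::TokeParser DBI)) {
    printf "%-20s %s\n", $module, module_available($module) ? "ok" : "MISSING";
}

# @INC holds the directories searched for modules, the same list that
# "perl -V" prints at the end of its output.
print "$_\n" for @INC;
```

If a module shows up as MISSING, check first that the script is being run by /usr/local/bin/perl and not by one of the other installations.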
An Oracle database is running on bugatti, and each of you will have your own empty database set up. You will receive mail with your Oracle password. You have all permissions on your database; please refer to the Oracle documentation for more details. Again, the Oracle command line interface is in /usr/local/bin. You will need that, plus the DBI Perl module.
You can log in to your database and create tables using:
    # sqlplus username@cs290i
    Enter password: *****
Note that the "username" is the name of your database.
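From Perl, the same login can be done through DBI. The following is a minimal sketch, not a prescribed setup: it assumes that DBD::Oracle is installed and that it resolves the cs290i identifier from the sqlplus command the same way sqlplus does. The helper names and the ORACLE_USER / ORACLE_PASS environment variables are made up for this example.

```perl
use strict;
use warnings;

# Build the DBI data source name. "cs290i" comes from the sqlplus command;
# that DBD::Oracle accepts it unchanged is an assumption.
sub oracle_dsn {
    my ($service) = @_;
    return "dbi:Oracle:$service";
}

# Hypothetical helper: connect with errors raised as exceptions and with
# autocommit off, so a failed crawl can be rolled back.
sub connect_db {
    my ($user, $password) = @_;
    require DBI;    # loaded lazily so this file compiles without DBI installed
    return DBI->connect(oracle_dsn('cs290i'), $user, $password,
                        { RaiseError => 1, AutoCommit => 0 });
}

# Connect only when credentials are supplied in the (made-up) environment
# variables ORACLE_USER and ORACLE_PASS.
if (defined $ENV{ORACLE_USER} && defined $ENV{ORACLE_PASS}) {
    my $dbh = connect_db($ENV{ORACLE_USER}, $ENV{ORACLE_PASS});
    my ($now) = $dbh->selectrow_array('SELECT sysdate FROM dual');
    print "connected, server time: $now\n";
    $dbh->disconnect;
}
```

With AutoCommit off, remember to call $dbh->commit after inserting data, or the rows will be discarded on disconnect.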
Results are displayed in batches of 25, so the robot must follow the "next" links until the list ends (it has some 2000-3000 items in total). Each item on a results page leads to an item information page from which the robot needs to gather the following information:
In addition, the history of bids for each item should be retrieved by following the (Bid History) link. The list of bids will appear at the bottom of the page in a tabular format.
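The crawl described above (follow the "next" links, then read the bid table at the bottom of a page) can be sketched with LWP::UserAgent and HTML::TokeParser. Treat this as an illustration under assumptions, not as the required design: the helper names are hypothetical, the link is assumed to contain the word "next" in its text, and the bid list is assumed to be the last table on the page. The start URL and the column order of the bid table are not given here, so the sketch leaves both to the caller.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::UserAgent;
use URI;

# Return the absolute URL of the first link whose text contains the word
# "next", or undef when the page has no such link (the last batch).
sub find_next_link {
    my ($html, $base_url) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};
        my $text = $parser->get_trimmed_text('/a');
        next unless defined $href && $text =~ /\bnext\b/i;
        return URI->new_abs($href, $base_url)->as_string;
    }
    return;
}

# Return the rows of the last <table> on the page as a reference to an
# array of arrays of trimmed cell text. Nested tables are not handled.
sub last_table_rows {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return [];
    my (@rows, $row);
    while (my $tag = $parser->get_tag('table', 'tr', 'td', 'th')) {
        my $name = $tag->[0];
        if ($name eq 'table') { @rows = (); undef $row }    # start over
        elsif ($name eq 'tr') { $row = []; push @rows, $row }
        elsif ($row) {
            # Read up to the end of the cell, or up to the next cell or row
            # when the closing tag was omitted.
            push @$row, $parser->get_trimmed_text("/$name", 'td', 'th', 'tr', '/table');
        }
    }
    return [ grep { @$_ } @rows ];    # drop rows without cells
}

# Fetch each batch, pass it to a callback, then follow the "next" link.
# %seen guards against a link that points back to a page already visited.
sub crawl_listing {
    my ($start_url, $handle_page) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my %seen;
    my $url = $start_url;
    while (defined $url && !$seen{$url}++) {
        my $response = $ua->get($url);
        die "GET $url failed: ", $response->status_line, "\n"
            unless $response->is_success;
        my $html = $response->decoded_content;
        $handle_page->($html, $url);
        $url = find_next_link($html, $response->base);
    }
}
```

A caller would pass the listing URL and a callback, for example crawl_listing($start_url, sub { my ($html, $url) = @_; ... }), and apply last_table_rows to the HTML of each bid history page. Check the real pages before relying on either assumption; if the bid list is not the last table, the selection logic needs to change.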
As your robot fetches the information from Yahoo, it needs to talk to Oracle and create a database with the three tables described below. It is OK for you to create the tables manually, but the robot needs to insert the data as it crawls. (We actually recommend that you start your script with a series of "drop table" commands to clear the database, followed by "create table" commands to create the tables afresh; this way you can easily run your script over and over.)
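The recommended drop-then-create startup might look like the sketch below. The column definitions are placeholders invented for this example and must be replaced by the ones from the table descriptions. The functions take an already connected DBI handle and assume it was opened with RaiseError enabled, so that a failing statement dies.

```perl
use strict;
use warnings;

# Hypothetical schema for illustration only; replace it with the column
# definitions from the assignment's table descriptions.
my %schema = (
    users => 'CREATE TABLE users (id NUMBER PRIMARY KEY, name VARCHAR2(64))',
);

# Drop and recreate every table in %schema so the script can be rerun.
sub reset_tables {
    my ($dbh) = @_;
    for my $table (sort keys %schema) {
        # DROP TABLE fails when the table does not exist yet (first run),
        # so an error here is reported as a warning and otherwise ignored.
        eval { $dbh->do("DROP TABLE $table"); 1 }
            or warn "drop $table skipped: $@";
        $dbh->do($schema{$table});
    }
}

# Insert one row using placeholders, so that values scraped from the web
# are never interpolated into the SQL text.
sub insert_user {
    my ($dbh, $id, $name) = @_;
    $dbh->do('INSERT INTO users (id, name) VALUES (?, ?)', undef, $id, $name);
}
```

Placeholders matter here because item titles and user names come from arbitrary web pages and may contain quotes.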
auction table:
images table:
users table:
bidHistory table:
You will need to hand in the Perl source code for your robot as well as a one-page project overview. We expect the robot code to be well commented so that its functioning is self-evident. The overview should describe the overall structure of the robot and explain any tricks discovered while writing and running the robot. We will also verify the results in our database.