CS8, Spring 2017

Lab07:
Processing data from a text file


Goals for this lab

By the time you have completed this lab, you and your lab partner should be able to

Step by Step Instructions

Step 0: Get together with your assigned lab partner.

Carefully choose the first pilot. Let the person who has been the pilot least often be the pilot first this time. This lab's pilot should log in and create a lab07 directory under your cs8 folder. You will work in this account for the rest of the lab. If your partner is more than 5 minutes late, ask the TA to pair you with someone else for this week.

Step 1: Start IDLE, and open a new window for function definitions

Start IDLE, then select "File=>New Window" and add a comment at the top of this new file as usual:

# lab07.py
# Name: your name(s), today's date
# Reading song information from a file

Step 2: Reading in a file

You might be wondering how we get the data that populate our lists. We don't usually type everything into the Python shell every time. Instead, we read the information from a file. There are two steps to reading in a file.

Step 2a: Open the file for reading

The general way to open a file for reading is:

myFile = open(filename,'r') # the 'r' stands for 'read'

For this lab, at the bottom of your file, type the line:

myFile = open("/cs/faculty/mikec/cs8/lab07/songs.txt","r")

This tells Python to open the file of song information for reading.
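Here is a small sketch of opening and closing a file, in case it helps. Since songs.txt only exists on the lab machines, this example first writes a made-up file named sample.txt (a hypothetical name, not part of the lab) into a temporary folder and then opens that file for reading. Calling close() when finished is an addition here; the lab steps do not require it.

```python
import os
import tempfile

# Make a throwaway folder with a made-up file in it (a stand-in for songs.txt).
with tempfile.TemporaryDirectory() as folder:
    sampleName = os.path.join(folder, "sample.txt")   # hypothetical file name
    outFile = open(sampleName, "w")                    # 'w' stands for 'write'
    outFile.write('"Song One" - Artist A\n')
    outFile.close()

    myFile = open(sampleName, "r")   # same pattern as the lab's open statement
    openMode = myFile.mode           # the mode the file was opened with
    myFile.close()                   # release the file once you are done with it
    wasClosed = myFile.closed        # True after close() has been called
```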

Step 2b: Read in the file, one line at a time.

Python has a special way of reading in a file. Remember how for strings and lists we could write a for loop that iterated once for each character in the string or item in the list? Well, we can write a for loop that iterates once for each line in the file. Add the following after the "open" statement that you already wrote:

for line in myFile:
    print(line)
The object named line is a string.

Run the module and see what happens. It should print out the contents of the file. Since each line in the file includes a newline character ('\n') at the end, and because the print function also prints a newline by default, the output is double-spaced.
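To see the hidden newline for yourself, you could print repr(line) instead of line. The sketch below uses io.StringIO with two made-up lines as a stand-in for the real file (an assumption made only so that the example runs anywhere); a for loop over it behaves like a for loop over an open text file.

```python
import io

# Hypothetical stand-in for an open file: two made-up lines, each ending in '\n'.
fakeFile = io.StringIO('"Song One" - Artist A\n"Song Two" - Artist B\n')

linesSeen = []
for line in fakeFile:
    linesSeen.append(line)
    print(repr(line))    # repr makes the trailing '\n' visible
```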

Now remove those statements from your lab07.py file (or insert '#' in front of each one), so they will not execute every time you run the module.

Step 3: Writing a function to read a file and return a list

Write a function that takes a filename and a target string as input parameters and returns a list of the lines in the file that contain the target string. The function uses the filename to open the file, loops through the lines of the file and saves each line that contains the target string as one element of the list. No printing should be done inside the function - it should just return the list of lines that contain the target string. The function must have the following signature:

def getFileMatches(filename, target):

It would be used as follows:

>>> filename = "/cs/faculty/mikec/cs8/lab07/songs.txt"
>>> loveLines = getFileMatches(filename, "Love")
>>> loveLines
['"When A Man Loves A Woman" - Percy Sledge\n', '"Whole Lotta Love" - Led Zeppel
in\n', '"Sunshine Of Your Love" - Cream\n', '"Bye Bye Love" - The Everly Brother
s\n', '"Somebody To Love" - Jefferson Airplane\n', '"Where Did Our Love Go" - Su
premes\n', '"She Loves You" - Beatles\n', '"(Love Is Like A) Heat Wave" - Martha
 & The Vandellas\n', '"Why Do Fools Fall In Love?" - Frankie Lymon & Teenagers\n
', '"I\'d Love To Change The World" - Ten Years After\n', '"Who Do You Love?" -
Bo Diddley\n', '"Stop! In The Name Of Love" - The Supremes\n', '"What\'s Love Go
t To Do With It?" - Tina Turner\n', '"Love Hurts" - Nazarath\n', '"Pride (In The
 name Of Love)" - U2\n', '"I Love Rock \'n\' Roll" - Joan Jett & The Blackhearts
\n', '"Love The One You\'re With" - Stephen Stills\n']

Hints: This is a problem in which you build the solution gradually. What type of problem is this? You are returning a list - think carefully about what the list should start out as (initialization). How do you think you express that a variable is set to a list with no items in it? Then, how do you find out which lines to add to the list based on whether or not the target string is in the line?
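The pattern the hints describe can be sketched on a different, smaller problem. This is not a solution to getFileMatches; the function name getLongWords and its data are made up for illustration. It shows a list that starts out empty and grows inside a loop, and then shows how the in operator tests whether one string appears inside another.

```python
def getLongWords(words, minLength):
    """Return a list of the words that have at least minLength characters."""
    result = []                       # start with a list that has no items
    for word in words:
        if len(word) >= minLength:    # the test that decides what to keep
            result.append(word)       # add this item to the end of the list
    return result                     # no printing inside the function

longWords = getLongWords(["a", "whole", "lotta", "love"], 5)

# The in operator checks for a substring, and it is case-sensitive.
hasLove = "Love" in '"Bye Bye Love" - The Everly Brothers\n'
hasLowerLove = "love" in '"Bye Bye Love" - The Everly Brothers\n'
```

Here longWords is ['whole', 'lotta'], hasLove is True, and hasLowerLove is False. Because in looks for substrings, a target of "Love" also matches a line containing "Loves".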

Step 4: A function to print lines of a file

Switch roles between pilot and navigator

So that you can more neatly demonstrate to the TAs that you have completed your task - and to learn more clearly the difference between a function that returns a result and a function that prints results for the user to see - write a function that takes in a filename and a target string as parameters, calls your getFileMatches function, stores the result into a list, and then prints out each element of the list. For example:

>>> filename = "/cs/faculty/mikec/cs8/lab07/songs.txt"
>>> printFileMatches(filename, "We")
"We Will Rock You" - Queen

"The Weight" - The Band

"Welcome To The Jungle" - Guns N' Roses

"Werewolves of London" - Warren Zevon

"We're An American Band" - Grand Funk Railroad

"We Gotta Get Out Of This Place" - Animals

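The difference between returning and printing can be seen with two tiny functions. Both names below are made up for this illustration and are not part of the lab.

```python
def doubleIt(n):
    return 2 * n      # hands the result back to the caller

def printDouble(n):
    print(2 * n)      # shows the result on the screen, returns nothing

a = doubleIt(4)       # nothing is printed; a now holds 8
b = printDouble(4)    # 8 is printed; b holds None
```

A returned value can be stored and used later, which is why getFileMatches returns its list and leaves the printing to a separate function.
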
Don't worry about printing the extra newline characters that are part of each line. Of course, if that bothers you then you can get rid of the extra lines in more than one way. Let's say a line of the file is stored in a string called line, then any of the following would print just one of the two newline characters:

>>> print(line[:-1])  # prints all except the last character in line
>>> print(line.strip())  # prints line with no leading/trailing white space
>>> print(line, end='')  # replaces the usual '\n' at the end with ''
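As a quick check of the first two options, the sketch below applies them to one sample line and compares the results. It also shows rstrip('\n'), which is not mentioned above but is a standard string method: it removes only trailing newline characters, so it is safe even when the last line of a file has no newline at the end.

```python
line = '"We Will Rock You" - Queen\n'

sliced = line[:-1]            # drops the last character, whatever it is
stripped = line.strip()       # drops white space from both ends
trimmed = line.rstrip('\n')   # drops only newline characters at the end

# A last line with no newline: slicing removes a real character here.
lastLine = '"The Weight" - The Band'
slicedLast = lastLine[:-1]
trimmedLast = lastLine.rstrip('\n')
```

For this sample line, sliced, stripped, and trimmed are all the same string. For lastLine, slicing loses the final 'd', while rstrip('\n') leaves the text unchanged.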

Step 5: Show off your work and get credit for the lab

Get your TA's attention to inspect your work, and to record your lab completion.

Don't leave early though ... see challenge problems below.

Step 5a. ONLY IF YOU RAN OUT OF TIME TO HAVE THE TA INSPECT YOUR WORK

If you must complete this assignment at CSIL, then submit it with the turnin program. You MUST have both your name and your partner's name in the file in order to receive credit. Remember that the original pilot needs to do this step, since that is whose account you have been using in Phelps 3525.

Bring up a terminal window on CSIL, and cd into the original pilot's cs8 directory, and cd again into the lab07 directory. Then type the following:

turnin Lab07@cs8c lab07.py

Respond "yes" when the program asks if you want to turn in (be sure to read the list of files you are turning in), and then wait for the message indicating success.


Evaluation and Grading

Each student must accomplish the following to earn full credit for this lab:

Optional Extra Challenge

If you finish the lab early or would like an extra challenge - do the following in any order you choose.

a. Another way to read in a file is the following:

myFile = open(filename,'r')
for line in myFile:
  values = line.split()
  print(values[0],values[1])
This reads each line of the file and splits it at white space, putting each word, number, or other item into its own spot in the values list. So if each line has two things on it, then the values list has two items in it each time. Each iteration replaces values[0] and values[1] with the items from the next line.
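A few sample strings (made up for this illustration) show what split() returns when it is called with no arguments. Note that the trailing newline disappears along with the other white space.

```python
values = "apple 3\n".split()          # two items, newline removed
spaced = "  apple    3  \n".split()   # extra spaces make no difference
threeItems = "a b c\n".split()        # three words give three items
blankValues = "\n".split()            # a blank line gives an empty list
```

Here values and spaced are both ['apple', '3']. Since a blank line produces an empty list, values[0] would cause an IndexError for such a line.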

Write a function that creates two lists, first and second, and reads the first item of each line into first, and the second item of each line into second. To return both lists, first package them up into a single return value by creating a list to contain both of those lists.

Create and save a new file with two pieces of data per line to test your function.
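One way to make such a test file is from Python itself, by opening a file with 'w' and calling write. This is only a sketch: the file name pairs.txt and its contents are made up, and the file is placed in a temporary folder so that nothing is left behind. You could just as well type the file in an editor and save it in your lab07 directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    testName = os.path.join(folder, "pairs.txt")   # hypothetical file name

    testFile = open(testName, "w")   # 'w' creates the file for writing
    testFile.write("apple 3\n")      # write does not add '\n' for you
    testFile.write("banana 7\n")
    testFile.write("cherry 12\n")
    testFile.close()                 # make sure everything is saved

    # Read the file back to check that it holds two items per line.
    checkFile = open(testName, "r")
    rows = []
    for line in checkFile:
        rows.append(line.split())
    checkFile.close()
```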

b. The data in "/cs/faculty/mikec/cs8/lab07/songs.txt" are arranged so that each line contains a song title, a separator (" - "), and an artist. The split function can be told to use a different delimiter than space, and in this case it can be told to use " - " as the delimiter to get the two important pieces out of the data. For example, here is the result for the first line:

>>> line.split(" - ")
['"Stairway to Heaven"', 'Led Zeppelin\n']
Write and test another function to return two lists (or one list of tuples) for songs and artists, given the name of a file such as songs.txt. Try to clean up the results too, by removing the newline character ('\n') from the end of the artist name, and maybe the quote characters (") around the song title.
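The clean-up steps can be tried on a single line before writing the whole function. This sketch uses the first line of the file as shown above. It assumes the separator " - " appears exactly once in the line; a title or artist that itself contains " - " would produce more than two pieces.

```python
line = '"Stairway to Heaven" - Led Zeppelin\n'

parts = line.split(" - ")      # ['"Stairway to Heaven"', 'Led Zeppelin\n']
title = parts[0].strip('"')    # remove the quote characters from both ends
artist = parts[1].strip()      # remove the newline (and any other white space)

pair = (title, artist)         # one tuple, as in the "list of tuples" option
```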

c. Data are read from a file as strings. To use the data as numerical values, it is necessary to transform the strings to number types. If line is a string that represents an integer, you can use int(line) to find the integer value, or you can use float(line) to find the floating point value. Write a function that will read a file with just one number per line, and return the sum of all the numbers. Create a small data file to test it.
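The conversions can be tried on sample strings first. Both int and float ignore white space around the number, so the newline at the end of a line does not need to be removed. The strings below are made up for illustration.

```python
wholeText = "42\n"
decimalText = "3.5\n"

wholeValue = int(wholeText)         # 42
decimalValue = float(decimalText)   # 3.5
asFloat = float(wholeText)          # 42.0

# int cannot convert text that has a decimal point in it.
try:
    int(decimalText)
    conversionFailed = False
except ValueError:
    conversionFailed = True
```

If the data file might contain decimal values, float is the safer choice, since int raises a ValueError for a string such as "3.5".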


Prepared by Diana Franklin and Michael Costanzo.