CS8, Fall 2010

lab10—Syllable Counting


Introduction

Goals for this exercise

This exercise provides practice in writing functions on Python lists to:

You can either use for loops or recursion—your choice.

This exercise also provides practice with writing functions that use Boolean logic, and with using functions you've already defined to make new functions.

What you'll be doing: a heuristic for syllable counting

Many word processing programs (e.g. Microsoft Word) have tools built in that determine the "reading level" of a piece of text. This is important for choosing reading assignments in K-12 education, and for things like determining whether a ballot initiative can be understood by the average California voter.

One factor in determining reading level is the average number of syllables per word. Because English is such a complex language, the only way to be 100% sure of the number of syllables in each word is to look up each word in a dictionary. However, even if we don't have a dictionary, we can still use a technique called a "heuristic" to get a good approximation of the number of syllables in a word.

A "heuristic" is an approach to solving a problem that might not produce an answer that is 100% correct, but it is an answer that is often "good enough". In Computer Science, when problems are difficult to solve with 100% certainty, we often take the approach of coming up with a heuristic, and then trying to improve on that heuristic.

For syllables, here's a first heuristic:

Taking just the words in the preceding sentence, and treating y as a vowel (in addition to a,e,i,o, and u) we see that this heuristic works for every word in that sentence, except for the words "good" and "indication", which is a 90% success rate for that sentence.

However, we can improve the heuristic if we remove "consecutive vowels" before we count—that is, if we have two or more vowels, in a row, we remove the extra ones. Thus "good" becomes "god", and "indication" becomes "indicatin" before we count the vowels. If we do that, then our heuristic of counting vowels is 100% effective on that sentence!

However, this heuristic is still not perfect. For example, if fails on words such as "like", "there", and "looked".

Thus we introduce two more heuristics: ones to try to remove silent e from words such as "like", "there", "plane", etc., and the "ed" that doesn't contribute an extra syllable to words such as "looked", "jumped", "kissed", and "frowned".

The comments in the sample code discuss these heuristics. Be sure to read the comments carefully. They are there to help you make sense of the functions, and how the heuristics were developed.

As you'll see the addition of the silent e and ed removal heuristics improve our syllable counting, but there are still words on which these techniques fail. (The word "techniques" is one example). If we had more time in the quarter, we'd extend this exercise to challenge you to come up with additional heuristics to make our syllable counting technique even more accurate.

As it is, there are still some challenges for you—applying the techniques of writing recursive functions for lists to this syllable counting problem. So, for this exercise, stick to the heuristics given.

If you want to experiment with additional heuristics, just for fun as a challenge to yourself, then finish the exercise as given first in a file called lab10.py. Then make a separate copy called lab10_v2.py, in which you experiment with your additional heuristics. You can then include both when you submit.

Note that working on additional heuristics for the spelling problem, while not required for this exercise, is an excellent way to study for the exam. Just like with swimming or playing the guitar, the more you do it, the better you will get at it. What we are learning in this class is not primarily a collect of facts, but a new way of thinking, and you need to practice that way of thinking to improve.

Step by Step Instructions

Step 0: Getting Started


Step 0a: Download the starting point code from the link

The starting point code is at this link:

lab10StartingPoint.py

Step 1: Read through the comments, code and test cases in the ENTIRE file first

Before you jump in, just read thorough the ENTIRE file from start to finish. Read all of the comments, all of the code, and all of the test cases. Having a sense of the entire file will really help you as you proceed.

Find the stubs. These are the places where you'll be adding code in Step 2. But don't just jump in and start coding yet.

As you find things you don't understand, ask the TA or the instructor, or make a note to do so during class or office hours.

Only after you've read the entire file through—and you understand the big picture—proceed to step 2.

Step 2: Starting from the top, replace the stubs with correct function bodies

Your task is to replace the stubs (things like return -1 and return "stub") with correct function bodies, so that all the tests pass.

It is best to start at the top, and do the functions in the order they appear in the file.

What NOT do to...

There can be a temptation to make your test cases pass by changing them to the answer your function returns, even if it is wrong, but that is not permitted—in fact, it is a kind of academic dishonesty.

So, don't change any of the test cases, unless you have permission of the TA/Instructor. (This should only be done if we determine that there is an error in the test cases provided.)

When all the test cases pass...

When your file contains all the functions and test cases from the original lab10StartingPoint.py, and all the tests pass, you are finished!

Step 3: Submitting via turnin

Before submitting:


Evaluation and Grading (300 pts)

Grading Rubric:

Function Item Points  
all Naming and submitting your file correctly 20  
all Fixing the header comment in each of the three files 20  
all Code style (variable names, comments, etc,
removing all the @@@ lines)
20  
isVowel() correctness of code—code passes all tests from original file 40  
countVowels() correctness of code—code passes all tests from original file 40  
allVowelsA() correctness of code—code passes all tests from original file 40  
removeSilentE() correctness of code—code passes all tests from original file 40  
removeEdWhenNotASyllable() correctness of code—code passes all tests from original file 40  
countSyllables() correctness of code—code passes all tests from original file 40  
TOTAL   300  

 

Due Date: Fri 12/03 (5pm—no extensions!)

However, you must finish this by 5pm on Friday Dec 3rd—we cannot let this go into finals week, and there is no late submission for this assignment.

So try to get finished before Thanksgiving if you can—or early during deadweek.

But, Friday afternoon, 12/03, submit whatever you have by then, finished or not.
Partial credit is better than a zero!


Copyright 2010, Phillip T. Conrad, CS Dept, UC Santa Barbara. Permission to copy for non-commercial, non-profit, educational purposes granted, provided appropriate credit is given; all other rights reserved.