CSC 226


Software Design and Implementation


Using Lists to handle UPC Codes

Objectives

  • Learn about an important application of computer science, the UPC code
  • Work on the design of a larger problem
  • Use lists in an application problem

UPC Code

The Universal Product Code (UPC-A) is a bar code system, first used in 1974 in Ohio, that was designed to automate checkout at the grocery store. It has since been much more widely adopted, and it now appears on packages in a lot of different stores in the United States. You may even have scanned these codes yourself if you use the self-check lines at Wal-Mart.
(The UPC code you see shown to the right is for a box of tissues made by a US company called Kleenex.)

The UPC code has since been expanded into the EAN-13 (European Article Number) code so that it can be used around the world. As part of this expansion, it was agreed that all scanners which can scan EAN-13 codes would also be able to scan UPC codes. Thus, the UPC-A codes in the US did not need to change.

Compare the UPC-A code for that box of Kleenex to the EAN-13 code -- the bars are all identical.

There are some significant differences between the UPC-A and the EAN-13 codes though:

  1. The UPC-A has 12 human-readable digits, but the EAN code has an extra digit at the very beginning, making it 13 digits. For products made in the US, this extra digit is always a 0, indicated by the zero that starts the number in the EAN-13 code of the example.
  2. The placement of the human readable digits beneath the code also differs. Note in the example to the right, the entire EAN-13 for a box of Kleenex tissues (036000291452) is contained under the bars, with the exception of the leading zero.

The UPC-A Code Decoded

One of the most interesting sites on UPC codes is by an artist named Scott Blake who is really into bar codes. You have got to see it to believe it. Really. Check it out: http://www.barcodeart.com/artwork/index.html

In addition to the art he created with bar codes, he created a diagram decoding the UPC-A Code for a 2 liter bottle of Pepsi that is very informative and explains to some extent what each of the parts means.
UPC diagram

There are a lot of parts to this picture, but you will be working on two specifics, the "Modulo Check Character" and how the bars are constructed.


Modulo Check Character

In the UPC-A system, the modulo check character is a digit that is used to verify that the bar code has scanned correctly. This number is calculated as follows:

  1. Add the digits in the odd-numbered positions, starting with the "Number System Character", the number in the bottom left of the picture above.
    Because there are 12 digits in a UPC label, the sum includes the first, third, fifth, seventh, ninth and eleventh digits from left to right.
    Multiply the resulting sum by three.
  2. Add the digits in the even-numbered positions, starting again from the left to the right. We exclude the "Modulo Check Character", which means that the sum only has the second, fourth, sixth, eighth and tenth digits.
  3. Add these two results together and find the remainder when the total is divided by 10 (i.e., we can use the modulo operator).
  4. If the result is not zero, subtract the result from ten to yield your check digit; otherwise your check digit is zero.
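The four steps above can be sketched in Python as follows. This is only one possible way to write it, not the required solution; the function name compute_check_digit and its string parameter are assumptions made for illustration. The two sample calls use the first eleven digits of the Kleenex and Pepsi codes discussed on this page.

```python
def compute_check_digit(first_eleven):
    """Return the UPC-A modulo check digit for a string of 11 digits.

    Hypothetical helper for illustration; the lab does not prescribe it.
    """
    digits = [int(ch) for ch in first_eleven]
    # Positions are counted from 1, so odd positions sit at even list indexes.
    odd_sum = sum(digits[0::2])    # 1st, 3rd, 5th, 7th, 9th, 11th digits
    even_sum = sum(digits[1::2])   # 2nd, 4th, 6th, 8th, 10th digits
    remainder = (3 * odd_sum + even_sum) % 10
    # A remainder of zero means the check digit is zero; otherwise subtract from ten.
    return 0 if remainder == 0 else 10 - remainder


print(compute_check_digit("03600029145"))  # 2 (Kleenex)
print(compute_check_digit("01200000230"))  # 4 (Pepsi)
```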

Modulo Check Examples

UPC code Kleenex

Consider the Kleenex tissue UPC-A we saw above (the image is shown to the right). The first 11 digits of the code are as follows, where we enlarged and colored the digits that are in even positions for clarity:

0 3 6 0 0 0 2 9 1 4 5
Calculating the Modulo Check goes as follows:
  1. Add the odd-numbered digits: 0+6+0+2+1+5 = 14 and multiply the sum by three: 14 × 3 = 42
  2. Add the even-numbered digits: 3+0+0+9+4 = 16
  3. Add these two results: 42 + 16 = 58, and calculate the remainder when divided by 10: 58 % 10 = 8
  4. 8 is not 0, so subtract eight from ten: 10 − 8 = 2. According to this process, the check digit is thus 2, which is exactly the last number of the UPC code (the modulo check character)!

UPC for Pepsi

We can also see how the modulo check character works for the UPC-A bar code for the 2-liter bottle of Pepsi that Scott Blake used in his diagram. The first eleven digits of the code are:

0 1 2 0 0 0 0 0 2 3 0
  1. Add the odd-numbered digits: 0+2+0+0+2+0 = 4 and multiply the sum by three: 4 × 3 = 12
  2. Add the even-numbered digits: 1+0+0+0+3 = 4
  3. Add these two results: 12 + 4 = 16, and find the remainder when divided by 10: 16 % 10 = 6
  4. 6 is not equal to 0, so subtract six from ten: 10 − 6 = 4. According to this process, the check digit should be 4, which it is!

Again, what is the purpose of this check character? As the name suggests, this process is designed to detect errors and to validate that the UPC code is in fact correct. So, when you are checking out at the grocery store and pass the bar code over the scanner, the computer reads the code and compares the check digit in the UPC with what it should be. If there is a mismatch, a scan error is generated.

The UPC check digit can detect 100% of single-digit read errors (one digit is read incorrectly) and about 89% of transposition errors (such as when two adjacent digits get switched by mistake, 21 -> 12). Cool, huh?
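One way to see where figures like these come from is to count cases by brute force. The sketch below is an illustration only: the function names are hypothetical, and the transposition count assumes that every ordered pair of two different adjacent digits is equally likely. Under that assumption, a swap goes unnoticed exactly when the two digits differ by 5, which gives 80 detected cases out of 90.

```python
from itertools import product


def check_digit(first_eleven):
    """Modulo check digit for a string of 11 digits (hypothetical helper)."""
    digits = [int(ch) for ch in first_eleven]
    remainder = (3 * sum(digits[0::2]) + sum(digits[1::2])) % 10
    return (10 - remainder) % 10


def is_valid(code):
    """True if a 12-digit string ends with the correct check digit."""
    return check_digit(code[:11]) == int(code[11])


def single_digit_detection_rate(code):
    """Fraction of single-digit substitutions in a valid code that fail the check."""
    total = 0
    detected = 0
    for position in range(12):
        for new_digit in "0123456789":
            if new_digit == code[position]:
                continue  # not an error, the digit is unchanged
            corrupted = code[:position] + new_digit + code[position + 1:]
            total += 1
            if not is_valid(corrupted):
                detected += 1
    return detected / total


def adjacent_transposition_detection_rate():
    """Fraction of swaps of two different adjacent digits that change the weighted sum."""
    total = 0
    detected = 0
    for a, b in product(range(10), repeat=2):
        if a == b:
            continue  # swapping equal digits changes nothing
        total += 1
        # Of two adjacent digits, one carries weight 3 and the other weight 1.
        if (3 * a + b) % 10 != (a + 3 * b) % 10:
            detected += 1
    return detected / total


print(single_digit_detection_rate("036000291452"))
print(round(adjacent_transposition_detection_rate(), 2))
```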


What do those bars mean?

The bar codes are basically binary numbers, with black representing a 1 and white representing a 0.

You may have noticed that the middle and ends of a UPC always have a bunch of longer bars. Although we will not worry about their lengths, we do need to note that these bars are special. The end bars (called "guard bars") always consist of the bit pattern 101, and the center bars have the bit pattern 01010 (note that both of these patterns are symmetric).

The UPC bar code is divided into two main areas--the part to the left of center and the part to the right of center.

  • Number System Character + Manufacturer ID Number (to the left of center):

    The digits between the left hand guard bars and the center bars have the following binary pattern, where the human readable number is on top and the bars below. Note that multiple consecutive bars of the same color have no dividing markers, so the collection will appear "thicker." Also note that white bars encode data (binary 0's) just like the black bars (binary 1's).

    Manufacturer ID UPC

  • Item Number + Modulo Check Character (to the right of center):

    The pattern for each digit between the center and right hand guard bars is as follows:

    Item Number UPC

Another way to represent this information is as a look-up table:

Digit   "Left Side"   "Right Side"
0       0001101       1110010
1       0011001       1100110
2       0010011       1101100
3       0111101       1000010
4       0100011       1011100
5       0110001       1001110
6       0101111       1010000
7       0111011       1000100
8       0110111       1001000
9       0001011       1110100

Why are there separate codes for the left and right sides of the center? Bar codes are frequently read upside down, which basically means backwards, so the patterns are designed so that the scanner can identify which bars it is reading. Thus, the computer can determine which patterns are the ones for the manufacturer code because they always begin with a 0 and end with a 1. For the patterns on the item number, the opposite is true. Note that these patterns are not mirror reflections of each other; if you read a pattern on the left backwards, there is never a match on the right!

Hint: There are lots of good ways to structure a look-up table in Python. You can use a list of tuples, multiple lists, or even a dictionary like the one we used in the Bioinformatics lab.
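As one illustration of the hint, here is the look-up table written as a list of tuples, where the list index is the digit. The names UPC_PATTERNS and complement are made up for this sketch. It also checks two properties of the table: each right-side pattern is the left-side pattern with every bit flipped, and no left-side pattern read backwards appears on the right side.

```python
# Index = digit; each tuple is (left-side pattern, right-side pattern).
UPC_PATTERNS = [
    ("0001101", "1110010"),  # 0
    ("0011001", "1100110"),  # 1
    ("0010011", "1101100"),  # 2
    ("0111101", "1000010"),  # 3
    ("0100011", "1011100"),  # 4
    ("0110001", "1001110"),  # 5
    ("0101111", "1010000"),  # 6
    ("0111011", "1000100"),  # 7
    ("0110111", "1001000"),  # 8
    ("0001011", "1110100"),  # 9
]


def complement(bits):
    """Flip every bit in a string of 0s and 1s."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


left_codes = [left for left, right in UPC_PATTERNS]
right_codes = [right for left, right in UPC_PATTERNS]

# Every right-side pattern is the bitwise complement of its left-side pattern.
all_complements = all(complement(left) == right for left, right in UPC_PATTERNS)

# Left-side patterns that, read backwards, would match some right-side pattern.
reversed_left_matches = [left for left in left_codes if left[::-1] in right_codes]

print(all_complements)        # True
print(reversed_left_matches)  # []
```

A dictionary keyed by digit would work just as well; a list is used here only because the digits 0 through 9 already serve as natural indexes.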

Using Files and Strings

Here is a Python program which opens a plain text file, reads it, and reverses the lines (just for fun). It also writes a new file, though you do not need to do this in this lab. Copy both the program and the provided text file, putting them into the same folder:
Run this program to get experience opening and using files.  The poem is fun too. :)


Lab Specifics

On this lab, you may choose to work alone or in pairs. Your choice!!

If you begin in pairs but do not finish in class, decide whether you will finish the lab together. If you do, you must get together outside of class and work with the same partner you had in class. You may NOT work with more than one partner.

If you finish together, please turn in a single copy of the program with a file name of username1-username2-L2.py.

If you choose to finish separately, that is fine, but you will both be required to write a full explanatory comment acknowledging your partner by name and indicating in detail how much you had accomplished together. Include this comment near the top of your program. Then, turn in separate programs named username1-L2.py and username2-L2.py.

Remember that for all labs, in addition to writing the program, you must also do a short lab write-up.


Your task in this lab is to create a program which can verify and generate bar codes.

Minimally, your program should do the following:
  1. Your program should ask the user for a filename and should be able to work with any correctly formatted text file that is located in the same folder as the program. Your program should open the file, which should contain a single line: a series of 12 digits that make up a UPC code. Here are two files you may use for testing: upc-input1.txt, upc-input2.txt

  2. For every line read, your program should check to see if the 12 digits form a valid UPC code using the Modulo Check Character.  Your program should be able to do this check, but if you want to double check your work, just to be certain, you may use http://www.upcdatabase.com/itemform.asp

  3. If the code is valid, the program should display the UPC code on the screen using the turtle library.
    If the code is invalid, it should print an error message using the turtle library.
    Note that each of the lines in a UPC code is either black or white. The right edge of each line exactly meets the left edge of the next line, and each is the same width. Some lines look thicker when lines of the same color follow each other. Note also that guard lines and center lines tend to be longer.  How much longer is not important.
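For the file-handling part of the task, a minimal sketch might look like the following. The function names read_upc_lines and looks_like_upc are hypothetical, and the sketch assumes the file holds one code per line. It only checks the format (exactly 12 decimal digits); validating the check digit and drawing with turtle are separate steps.

```python
def read_upc_lines(filename):
    """Return the non-empty lines of a text file, with surrounding whitespace removed."""
    with open(filename, "r") as upc_file:
        return [line.strip() for line in upc_file if line.strip()]


def looks_like_upc(text):
    """Return True if text is exactly 12 decimal digits (format check only)."""
    return len(text) == 12 and all(ch in "0123456789" for ch in text)
```

A main() function could pass the filename typed by the user to read_upc_lines and then call looks_like_upc on each line before doing the modulo check.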

Some pointers worth repeating

  • The most important part of this lab is the design. If you think about design first, debugging will be easier. For this first part, we highly encourage you to create a flow chart that outlines how your program is going to work.

  • The second most important part of this lab is to break up the program into smaller functions that you implement one by one, using unit tests to check their operation on known inputs and outputs.

    The following are a couple of suggestions for the kinds of functions you may want to use, but these functions are ONLY SUGGESTIONS! You are welcome and encouraged to design the program that makes sense to you! Regardless, however, you should still write unit tests for anything you design.

    • Suppose you created a function called convert that takes as a parameter a string of 12 digits and returns a list with each digit of a UPC-A bar code. (Storing the numbers in a list will greatly facilitate computing the check digit.)

      Taking the tissues UPC-A bar code shown to the right as an example, the input would be the string "036000291452" and the output would be [0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 5, 2]. Thus, you can use the unit test function testit(...) from the text to test that it works correctly as:

      testit( convert("036000291452") == [0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 5, 2] )
    • Suppose you also created a function called something like check_code(...) that checks whether a sequence of digits passed in as input is a valid UPC-A sequence. It can return True if correct and False otherwise.

      If you decided to have this function take a string digit sequence as input, you can use the function testit(...) from the text to test it for both valid and invalid sequences:

      testit( check_code( "036000291452" ) == True)
      testit( check_code( "036000291455" ) == False ) # notice that all that changed was the modulo check character
    • Suppose that you know that a sequence is a valid UPC-A code. You can then have a function that translates the decimal digits into a sequence of binary numbers that are used to draw the bars on the screen.

      Your translate(...) function could take as input a string of digits, such as the sequence 036000 291452 above (there are probably better ways to do this). This function can then generate the binary representation of the code using the look-up table scheme outlined above:


      Section                                     Digits        Binary
      Guard                                                     101
      Number System Character + Manufacturer ID   0 3 6 0 0 0   0001101 0111101 0101111 0001101 0001101 0001101
      Center                                                    01010
      Item Number + Modulo Check Digit            2 9 1 4 5 2   1101100 1110100 1100110 1011100 1001110 1101100
      Guard                                                     101

      If you decided to have this function take a string digit sequence as input and generate a list of binary digits for its representation, you can use the function testit(...) from the text to test it on a known valid sequence:

      #The output of this function is SO LONG that we use a variable to hold it so that
      #the unit test is easy to read.

      correct_binary = [1,0,1, 0,0,0,1,1,0,1, 0,1,1,1,1,0,1, 0,1,0,1,1,1,1, 0,0,0,1,1,0,1, 0,0,0,1,1,0,1, 0,0,0,1,1,0,1, 0,1,0,1,0, 1,1,0,1,1,0,0, 1,1,1,0,1,0,0, 1,1,0,0,1,1,0, 1,0,1,1,1,0,0, 1,0,0,1,1,1,0, 1,1,0,1,1,0,0, 1,0,1]

      testit( translate( "036000291452" ) == correct_binary )

      Once the function works correctly, you can program the graphical display of the UPC code using the turtle library.

  • Your program must have good structure and style:
    1. It must include a main() function.
    2. The highest level of the program (i.e., no indenting) must only contain the following:
      • the header
      • any import statements
      • function definitions
      • a call to the main() function
    3. It must correctly use lists and tuples
    4. It must be designed in a modular fashion, correctly using functions for each task with correct parameter passing and appropriate use of returns.
    5. Use only meaningful variable and function names.
    6. Insert a descriptive docstring for each function you are designing and implementing.
    7. Include a descriptive header as a comment at the top of your source code.

When you have completed your program and have it working to your satisfaction, drop both your source code file, named username1-username2-L2.py, and your Microsoft Word lab write-up, named username1-username2-L2.docx, into the L2 drop box in Moodle.


Copyright 2015 | http://cs.berea.edu/courses/csc226/