Software Design and Implementation

Objectives

• Gain practice in problem solving and functional decomposition

Assignment Specifics

This assignment is to be completed in groups in class.

In this team assignment, we are going to gain practice in breaking a big problem down into doable steps.

Problem Solving

The following problem solving steps have been suggested by George Pólya in a famous book entitled How to Solve It:
1. First, you have to understand the problem. So, in computer science, you read the requirements or even elicit requirements from a client.
2. After you understanding the problem, you then make a plan to tackle the problem. So, in computer science you develop an algorithm to solve the problem.
3. After making the plan, you carry out the plan to create a solution. In computer science, you implement the solution in some programming language.
4. Look back on your solution, you ask if it could be better? Since the answer is usually, "yes" you refine your solution. In computer science, you refine your solution perhaps by refactoring it.
While these are excellent principles to follow, Step (2) seems really far too vague for larger problems. A big part of being a computer scientist is being able to break down a big problem into smaller problems. That is when functional composition comes to the rescue.

Why Functional Decomposition?

Functional Decomposition is the process of taking a complex process and breaking it down into smaller, simpler parts many of which can be implemented in individual functions.

The main idea is that we can make a problem more manageable by breaking it into sub-problems and solving those.

For example, imagine that you want to have a successful career in information technology. Deciding to become the next Ada Lovelace or the next Steve Jobs or the next Bill Gates might be pretty overwhelming.

Let's look again at A12: DNA to develop an algorithm.

Succinctly stated, the stated problem in A12: DNA is to take a string, determine whether or not it is a valid DNA sequence, and if it is, return the corresponding amino acid sequence.

Is this description enough to understand the problem fully? No, so we included additional constraints. Valid DNA has only four possible nucleotides: A, C, G, and T. Each DNA strand pairs with a complementary sequence where T pairs with A (and vice versa) and C pairs with G (and vice versa). A simplified version of the mRNA of a DNA sequence can be seen as the complement strand of the DNA sequence with each occurrence of the nucleotide T replaced with the nucleotide U. Then each mRNA sequence can be translated to a sequence of Amino Acids by breaking the mRNA sequence into groups of three nucleotides and looking up the appropriate amino acid for each full group of three
nucleotides in the mRNA sequence.

This is a huge and possibly overwhelming problem if we don't break it down into smaller sub-problems. There are usually many valid ways to break a problem down. Few people would argue that software development hasn't benefited from an increased focus on code reuse through encapsulation, information hiding, and inheritance, and most modern software utilizes these ideas.

Software  design is a  process which serve to translate software requirements into a description of the software components necessary for the  implementation phase.  The  preliminary software design document (sometimes just called the design document) shows how  the  software  system will be structured  to satisfy the  requirements. As the  primary reference for code development,  it must contain  all the  information  required  by a programmer to write  code. The design is performed  in two stages. The first is a preliminary design document in which the overall system architecture and data architecture is defined. In the detailed design stage much more detail is added.

For the A12: DNA problem, we then broke the problem down into sub-problems in the following way in our design document:

Example: A12:DNA Preliminary Software Design Document

by Dr. Jan Pearce and Dr. Scott Heggen

Purpose: The primary purpose of this software is to take a string, determine whether or not it is a valid DNA sequence, and if it is, return the corresponding amino acid sequence.

We will complete this task with the following functions:
• First, we need to test if the input is valid. We will make a function called is_nucleotide() for this. It will need a sequence as an input. We should not continue if this fails.
• We need to determine the number of times a given nucleotide appears in a given sequence. This will require both a sequence and a nucleotide as inputs and will return an integer count as an output. This will be represented by the `num_times()` function.
• We need to complement this sequence by returning a string in which Ts complement As and Cs complement Gs. We will create complement_strand() for this. It will needs a valid input sequence.
• Next we need to replace each occurrence of the nucleotide T, replace it with the nucleotide U. We will call this mRNA(). It also needs a valid sequence which has already been complemented as an input.
• Next we need to divide the mRNA into sub-strings of length 3, ignoring any "extra DNA" at the far end. We will do this with an input which is a valid mRNA sequence, and we will return the relevant sub-strings in a list. Let's call this function chunk_amino_acid().
• Next, given a valid three character string of mRNA we need to return the corresponding single character that represents the amino acid. We will call this function amino_acid_chunks().
• Some function will need to call all of the other functions which we have described above to do the actual sequencing. We will call this function sequence_gene().
• We will also want to write unit test for all of the above so we can use test driven design as we build out. For this, we will need two functions: testit() to run each unit test and genomics_test_suite() to contain the tests themselves. Hence, genomics_test_suite() will be designed to test the following:
• is_nucleotide(sequence)
• num_times(sequence, nucleotide)
• complement_strand(sequence)
• mRNA(sequence)
• amino_acid_chunks(threecharseq)
• sequence_gene(sequence)
• Finally, we will need a main function main()which can be used to call the function sequence_gene() and  genomics_test_suite().

When we solve computer programming problems, we need to make choices about what to do as well as what order to do things in. It is a creative process, so there are often not strictly right or wrong answers in these choices. Nevertheless, sometimes the primary problem is so big or complex that we don’t really know where to start. Functional decomposition is what we do when we break a problem down into its smallest functional parts to make it easier to tackle and easier to test. One key benefit of functional decomposition is that once you start coding, you are likely working on the simplest component that you can possibly work with for your application. If you are using test driven design, you can simply work through each unit test as you work toward the ultimate solution.

You can see how the example design document we created for A12 helped us to break down the overarching problem into a set of manageable and testable functions.This is the beauty of functional decomposition! Here are the resultant functions and their purposes from A12: DNA:
• testit(did_pass) prints the result of a unit test.
• is_nucleotide(sequence) checks that the string sequence provided is a valid string consisting only of the 4 nucleotides A, C, G, and T and returns True if so, and False otherwise'.
• num_times(sequence, nucleotide) returns count of how many times nucleotide is found in sequence.'
• complement_strand(sequence) returns the string which will be the second strand of the DNA sequence given that Ts complement As, and Cs complement Gs.
• mRNA(sequence) uses the return from complement_strand(sequence) with each occurrence of the nucleotide T replaced with the nucleotide U.
• chunk_amino_acid(sequence) uses the result of mRNA(sequence)and divides it into substrings of length 3, ignoring any "extra DNA" at the far end, returning the relevant substrings in a list.
• amino_acid_chunks(threecharseq) expects a three character string as a parameter and returns the corresponding single character amino acid.
• sequence_gene(sequence) takes a a sequence of nucleotides: A, C, G, and T and returns the corresponding amino acid sequence.
• genomics_test_suite() is designed to test the following:
• is_nucleotide(sequence)
• num_times(sequence, nucleotide)
• complement_strand(sequence)
• mRNA(sequence)
• amino_acid_chunks(threecharseq)
• sequence_gene(sequence)
• main() calls genomics_test_suite() which tests each of the supportive functions. (Note that this differs from our original design-- This is okay! Design documents are designed to help you, not constrain you!)
There are many design choices which are made in the path from the problem to a solution, and there are not always right answers. For example, complement_strand(sequence) could just as easily returned the string which will be the second strand of the DNA sequence given that Ts complement As, and Cs complement Gs with each occurrence of the nucleotide T replaced with the nucleotide U which would have made the function mRNA() unnecessary.

Design documents are a really valuable part of the software development process in industry. Google "importance of software design documents/" You will see that none of the top-level hits are at colleges or universities. They are really useful because they can help you to move more smoothly from your design to your implementation. This is often accomplished by using stubs, mock implementations that take the correct inputs and return something of the correct data type, but which are not typically correct.
• Go back to assignment A12: DNA. See how the above design is ultimately built-out with function "stubs" such as the stub for is_nucleotide(sequence):
```        def is_nucleotide(sequence):             '''checks that the string sequence provided is a valid string             consisting only of the 4 nucleotides A, C, G, and T             Returns True if so, and False otherwise'''             # Add code here             return True # Clearly this should not always return True ```
•  In other words, a stub is a fake implementation that conforms to the interface and so can be used for testing. This is an extremely valuable practice, as it bridges the gap between Step 2: breaking up the problem into sub-problems, and Step 3: implementing the algorithm in a programming language. It allows you to see your design start to transform into meaningful code that can be easily tested.

Your own Design Document using Functional Decomposition

Next, consider these issues, do the following with L2: UPC Codes.

In this teamwork, we want you to work at a high level or a design level, so you MAY NOT write anything which resembles code. Instead, we want you to write a preliminary software design document using only high level abstractions which look like and might become docstrings of the actual code. Hence, in a Word document you will submit a design document written in plain English which describes the planned set of computations, loosely grouped into possible future functions and their docstrings.

Instructions for this Teamwork

• Create a Google Doc for your team to share. In the document, do the following. Use the ideas of functional decomposition to create a design document for L2: UPC Codes formatted as you see in the above example.
1. Include an appropriate title for your design document.
2. Include a byline with all authors' names.
3. Include a brief statement of purpose for the software.
4. In a bulleted list,describe of the each higher order tasks that need to be completed, including the expected inputs and desired outputs. Again, you are NOT writing code; these should be high order tasks written in plain English as shown in the above example.
5. Be sure to describe the test suite you will use to test each task, considering both expected and unexpected cases.
6. Be sure to describe the main function.