FoP: 1.2 Python Programming

Dr. Julian Huber
Management Center Innsbruck

Julian Huber - Fundamentals of Programming

Julian Huber - Fundamentals of Programming

1.2.1 Variables and Data Types

1.2.1.1 Python with Google Colab

🎯 Learning objectives

You will be able to

  • write Python code and Markdown Text in Juypter Notebooks / Google Colab
  • execute Notebooks on their on PC or using Google Colabs
Julian Huber - Fundamentals of Programming

Hello World!

print("Hello, World!")
Julian Huber - Fundamentals of Programming

Python

"Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation."

  • high-level - easy to understand for humans
  • interpreted - we do not have to compile whole programs to run it
  • general-purpose - we will use it for bio informatics, but You could also build web services, games, etc.
  • indentation - white space has meaning

Julian Huber - Fundamentals of Programming
Executing Python Code
  • There are three options to follow this course
    • If you have a Google-Account: use Google Colab in Your browser
    • Running a online-notebook with binder
    • Running Python Jupyter Notebooks on Your PC. Install the free version of Anaconda (follow this installation guide)
Julian Huber - Fundamentals of Programming

How we will work together

  • Before You start, put the the red card on top, this will indicate that You are still working on the challenge
  • ✍️ are simple practical task You should try on Your own
  • 🏆 are more challenging practical task, where You can work in a group
  • 🤓 are optional task, if You want to learn more. Only start them if You finished all non-optional tasks until the next 🏁
  • 🏁 Once, You reach the recap mark, switch the cards. A green card indicates that everything is clear, a yellow card that we should discuss the solution together
  • At any time, if You have a question: Raise Your hand and work with Your class mates
Julian Huber - Fundamentals of Programming

✍️ Copying a Google Colab Notebook

  • You also find the link to the notebooks in Sakai
  • read and try out 1.2.1.1 Python with google Colab
  • The solution to all tasks can be found here

⌛ 15 minutes

Julian Huber - Fundamentals of Programming
🤓 Without a Google Account
  • Click this button
  • Make sure to download script on the end of the lecture or Your changes will be lost
Julian Huber - Fundamentals of Programming
Downloading a Google Colab Notebook
Julian Huber - Fundamentals of Programming
Working locally
  • open Jupyter Lab or VS Code and navigate to the .ipynb file to open it
  • on the local PCs You can start both programs using the Anaconda Navigator program
Julian Huber - Fundamentals of Programming

1.2.1.2 Basic elements of Python code

🎯 Learning objectives

You will be able to

  • store numbers and text in Python variables
  • use proper naming conventions for naming variables
  • to interpret a Python error message and use Google to solve it
Julian Huber - Fundamentals of Programming

Sequential Execution

  • Each line is evaluated after the other
  • lines starting with # are comments that are not executed
# This is a comment
1+2
# > 3

"""
There are also
multi-line comments
"""
Julian Huber - Fundamentals of Programming
Output in Jupyter Notebooks
  • Cells in a Jupyter Notebook are executed individually
  • If the last line of a cell has a return value, it will output this value below, even without the print function
    1+2
    
  • This will output 3
Julian Huber - Fundamentals of Programming

Commands

  • special keywords with instructions
print(1)
Julian Huber - Fundamentals of Programming

Variables

  • store data in the memory and make is accessible by a name
        message = "Hello Python world!"
    
  • variables are stored as long as the runtime is not restarted and can be used in all cells
  • the runtime defines the current working memory, we use
Julian Huber - Fundamentals of Programming
  • A variable is a pair consisting of a name (indicates space in memory),
    to which a value is assigned.
  • The value can be changed by further assignments (is variable).
  • The type of the value (content) of a variable is defined by the data type.
    • e.g., digits, characters, letters, words, lists,...
    • according to this, different amounts of space are reserved in the memory
Julian Huber - Fundamentals of Programming

✍️ Task: Basic elements of Python code

⌛ 15 minutes

Julian Huber - Fundamentals of Programming

1.2.1.3 Data Types

🎯 Learning objectives

You will be able to

  • identify the data types of variables
  • apply methods to strings and variables containing strings
Julian Huber - Fundamentals of Programming

Types

  • Each variable has a type, that defines, how the information is coded in the memory
an_int = 42
type(an_int)
# > <class 'int'>

a_string = 'This is a string.'
type(a_string)
# > <class 'str'>
Strings in Python 3 are Unicode
Julian Huber - Fundamentals of Programming

Julian Huber - Fundamentals of Programming

✍️ Basic elements of Python code

  • read and try out 1.2.1.3 Data Types

  • ✍️ There are two tasks You should try on Your own

  • 🏆 For the other task You can form a group of three

⌛ 25 minutes

Julian Huber - Fundamentals of Programming

1.2.1.4 Calculating with Python

🎯 Learning Objectives

You will be able to

  • calculate mathematical formulas in Python using the math package
  • use the print function to show Your results
Julian Huber - Fundamentals of Programming

Importing Packages

  • Packages are are unit of code that are provided in a central package index
  • Some packages (like the math package) are part of the Python programming language and can be loaded with the import statement
    import math
    math.pi
    print(math.sin(math.radians(30)))
    
Julian Huber - Fundamentals of Programming

Goal of this lecture: Using Python instead of Your Calculator

  • Python can do everything your calculator does
  • Jupyter Notebooks can store results
  • Code makes it easy to reuse your calculations

Julian Huber - Fundamentals of Programming

🏆Case Study

Logistic Growth

  • ... time
  • ... population size
  • ... initial population
  • ... carrying capacity
  • ... growth rate
Julian Huber - Fundamentals of Programming
  • Growth start exponentially
  • slows down as it reaches the carrying capacity
  • can we use Python the replicate this formula?

Julian Huber - Fundamentals of Programming
  • Write a flexible simulation for logistic growth, so can simulate:
    • Seal populations
    • Bacterial growth
    • etc.

1.2.1 Variables and Data Types

⌛ 25 minutes

Julian Huber - Fundamentals of Programming

Hints

# importing the math library
import math

# here the user can define the input
...


# here the calculation is made
...

# here, we print out the results
...
Julian Huber - Fundamentals of Programming

1.2.2 Lists and loops

Julian Huber - Fundamentals of Programming

1.2.2.1 Special Data Types

🎯 Learning Objectives

You will be able to

  • use Python to formulate and evaluate Boolean expressions
  • use None type variables
Julian Huber - Fundamentals of Programming

Difference between Assignment (=) and Comparison (==)

eve_is_here = True                  # Assignment: eve_is_here is set to True
adam_is_here = False                # Assignment
print(adam_is_here == eve_is_here)  # Comparison of the values
#> False
Julian Huber - Fundamentals of Programming

NoneType

reading_frame_start_position = None
type(reading_frame_start_position)
#> NoneType
Julian Huber - Fundamentals of Programming

✍️ Special Data Types

⌛ 10 minutes

Julian Huber - Fundamentals of Programming

1.2.2.2 Defining Lists

🎯 Learning Objectives

You will be able to

  • define and fill lists manually
  • define list containing integer sequences
  • manipulate lists
Julian Huber - Fundamentals of Programming

Lists

nucleotides = ["Guanine","Adenine","Cytosine","Thymine"]

print(nucleotides[0])

#> "Guanine"
Julian Huber - Fundamentals of Programming

DNA & RNA from the perspective of a computer scientist

  • Nature does not use binary (0/1)
  • Nature stores data in long polymers of four nucleotides, usually symbolized by a single letter:
    either A, T, C, or G (for DNA)
  • For the first part of the lecture, we only consider DNA sequences and we can treat them as a list
Julian Huber - Fundamentals of Programming
  • Nature always has a backup:
    • double-stranded DNA stores the complement on the other stand
    • ATGG
    • TACC

Julian Huber - Fundamentals of Programming
  • It's not the most efficient way, but
    we can consider the nucleotide sequence
    a string: sting = "ATGG"
  • a string is a list of single characters ["A","T","G","G"]
  • Why is this not efficient?
    • A character in a string requires
      at least 8 Bit (ASCII-Encoding)
    • As we know we will only have
      to deal with four different states (A,T,C, or G)
    • 2 Bit should be enough
Julian Huber - Fundamentals of Programming
  • like with other encodings the data is not the information of interest
  • pairs of three nucleotides are called a codon
  • the translate to an amino acid or a stopping point

Julian Huber - Fundamentals of Programming
  • How many possible codons are there?
  • with possible bases in a combination of
  • This information is stored in Codon Tables

Julian Huber - Fundamentals of Programming

✍️ Defining Lists

Julian Huber - Fundamentals of Programming

1.2.2.3 Working with Lists

🎯 Learning Objectives

You will be able to

  • write scripts that loops through a list to automate calculations
  • copy parts of lists
  • decide, when to use tuples instead of lists
Julian Huber - Fundamentals of Programming
nucleotides = ["Guanine","Adenine","Cytosine","Thymine"]

for nucleotide in nucleotides:
    print(nucleotide)
#> "Guanine"
#> "Adenine"
#> "Cytosine"
#> "Thymine"
Julian Huber - Fundamentals of Programming

✍️ Working with Lists

Julian Huber - Fundamentals of Programming

1.2.2.4 Applications for Lists

🎯 Learning Objectives

You will be able to

  • use list operations to clean up DNA-sequence data
Julian Huber - Fundamentals of Programming

🏆 Case Study: Use Python to break up a RNA sequence into codons

  • You are given data of the following RNA-Sequence.
    dna_sequence = ["A", "U", "C", "C", "G","A", "G", "C", "U", "E", "G","A", "G", "C", "U", "G", "Z", "G","A", "G", "C", "U","U"]
  • What sequence of amino acids does this RNA sequence encode?
  • Bonus: Find an open reading frame fist
Julian Huber - Fundamentals of Programming

Task

  • As You see, the data might have some errors, it should contain only the nucleotides A, U, T, and G. First clean the data by removing all corrupted items from the list.
  • Next create a loop, that prints out all three letter codons of the remaining sequence and store it in a list:
    ['A', 'U', 'C']
    ['C', 'G', 'A']
    ['G', 'C', 'U']
    ['G', 'A', 'G']
    ['C', 'U', 'G']
    ['G', 'A', 'G']
    ['C', 'U', 'U']
    
Julian Huber - Fundamentals of Programming

1.2.2 Lists and loops

⌛ 10 minutes

Julian Huber - Fundamentals of Programming
Julian Huber - Fundamentals of Programming

1.2.3 Conditional Statements

Julian Huber - Fundamentals of Programming

1.2.3.1 Boolean Expressions

🎯 Learning Objectives

You will be able to

  • write conditional expressions
Julian Huber - Fundamentals of Programming

Boolean Expressions

  • use Boolean algebra and result in a Boolean value
    a = True
    b = True
    c = False
    result = (a and b) or (b or c)
    
  • comparisons of other (non Boolean) data types can also result in a Boolean
    a = ("abc" == "abc")
    b = (5<0)
    
  • note the difference between =and ==
Julian Huber - Fundamentals of Programming

✍️ Boolean Expressions

⌛ 10 minutes

Julian Huber - Fundamentals of Programming

1.2.3.2 Conditionals

🎯 Learning Objectives

You will be able to

  • write conditional expressions to control the execution of Python code
  • decide when to use an if, elif or else statement
Julian Huber - Fundamentals of Programming

Conditionals

  • A key concept of higher programming languages are conditional statements
  • this means that some operations B and C are only made, if the condition A is either True or False
https://upload.wikimedia.org/wikipedia/commons/c/c5/If-Then-Else-diagram.svg
Julian Huber - Fundamentals of Programming

if-clause

  • most languages use some form of if-clause
    if <logical expression>:
        <statement>
    
    •  if True:
           print("This is printed!")
      
    •   if False:
            print("This is not printed!")
      
Julian Huber - Fundamentals of Programming

🧠 if-clause - one-sided

if <logical expression>:
    <statement>
  • logical expression is evaluated
    • TRUE: statement is executed
    • FALSE: statement is ignored
pin_correct = True

if pin_correct:
     print('Pin is correct!') 
Julian Huber - Fundamentals of Programming

🧠 if-else-clause - two-sided

if <logical expression>:
    <statement a>
else <logical expression>:
    <statement b>
  • logical expression is evaluated
    • TRUE: statement a is executed
    • FALSE: statement b is ignored
# Beispiel printed Lösung
a = 4
b = 3
if a>b:
     print("a is larger than b")
else:
     print("b is larger than a")    
Julian Huber - Fundamentals of Programming

✍️ Case Study: Write a program that manages lab access

Write a program the checks the lab clearance of anyone wanting to enter the lab:

  • Make a list of five or more usernames called users_with_clearance.
  • Make another list of five usernames called users_wanting_to_enter. Make sure one or two of the new usernames are also in the users_with_clearance list.
  • Loop through the users_wanting_to_enter list to see if each of them has a lab clearance. Print a message that to greet all the persons. If the person hasn't sent them to a supervisor.
Julian Huber - Fundamentals of Programming

⌛ 45 minutes

1.2.3 What if?

Julian Huber - Fundamentals of Programming
Julian Huber - Fundamentals of Programming

1.2.4 Storing Information and while-loops

Julian Huber - Fundamentals of Programming

1.2.4.1 Dictionaries

🎯 Learning Objectives

You will be able to

  • define dictionaries
  • decide, when to use dictionaries instead of lists
  • work with nested structures of dictionaries and lists
Julian Huber - Fundamentals of Programming

Sometimes a list is not enough

  • Imaging we want to store the information what codons encodes which amino acid
  • we could use lists to do so:
  • list_of_codons = ["UUU","UCU", "UUC" <...>]
  • list_of_amino_acids = ["phenilalanyne","serine", "phenilalanyne" <...>]
  • we could remember the index of the codon we are looking for
  • Drawbacks
    • What if we loose a single item in one list?
Julian Huber - Fundamentals of Programming

Enter Dictionaries

  • A dictionary stores key-value pairs

    dict = {<key> : <value>}
    
    dict = {"UUU" : "phenilalanyne",
            "UCU" : "serine",
            "UUC" : "phenilalanyne",
            [...]  }
    
  • We can get the data using the keys (index)

    print(dict["UUU"])
    >>> "phenilalanyne"
    
Julian Huber - Fundamentals of Programming

Data structures

  • List: each node except the last has exactly one reference to its successor.
  • Tree: each node can have multiple pointers to successors, but each node is only referenced by at most one predecessor node.
  • Graph: Nodes can also be referenced by multiple predecessor nodes.

Julian Huber - Fundamentals of Programming
Hierarchical Dictionaries (Trees)
  • If we would like to store data of different populations, wouldn't it be nice to have some structure?

    dict = {"E. Coli" : 
                {   "Initial population" : 1000,
                    "Carrying capacity" : 100000,
                    "Growth rate" : 0.1
                    "Data" : [{
                        "time" : 0,
                        "population" : 1000
                    },
                    {    "time" : 1,
                        "population" : 1010
                    }]
    
                },
            "Seals" : [...]  
            }
    
Julian Huber - Fundamentals of Programming

✍️ Case Study: Write a program to analyze logistic growth

  • Write a simulation for logistic growth that evaluates the population size for all time steps starting from zero and only ends if the population surpasses a closeness of to the carrying capacity.
Julian Huber - Fundamentals of Programming

Sometimes it is unclear when to terminate the algorithm

  • Solution with for loop

    for t in range(1,100): 
        current_population = <...>
    
  • Solution with while-loop

    t = 0
    
    while epsilon > 1:
        epsilon = carrying_capacity - current_population
        t = t+1 
        current_population = <...>
    
Julian Huber - Fundamentals of Programming
  • Store the results for each time step in a list. The list should contain dictionaries that have the time step, current population size and the population growth since the last time step.

  • Find the time step with the maximum growth in population.

Julian Huber - Fundamentals of Programming

4 Storing and Entering Information

  • read and try out 1.2.4.1 Dictionaries
Julian Huber - Fundamentals of Programming

1.2.4.2 While-Loops

🎯 Learning Objectives

You will be able to

  • use while loops and prevent endless loops by using break statements
Julian Huber - Fundamentals of Programming

Sometimes it is unclear when to terminate the algorithm

distance_to_wall = 15

while distance_to_wall > 1:
    do_step_forward()
Julian Huber - Fundamentals of Programming

✍️ While-Loops

Julian Huber - Fundamentals of Programming
Julian Huber - Fundamentals of Programming

1.2.5 Functions

Julian Huber - Fundamentals of Programming

🎯 Learning Objectives

You will be able to

  • create functions with (optional) parameters and return values
  • Students use docstings to comment on a function
  • Students can restructure code with functions
Julian Huber - Fundamentals of Programming

You probably created a lot of Code during the last lectures

# importing the math library
import math

t = 0
epsilon = 1

# here the user can define the input
population_size = 1
carrying_capacity = 200
growth_rate = 0.1

results = []

print("Started simulation with the following parameters: \n initial population size: {} \n carrying capacity: {} \n growth rate: {}".format(initial_population_size, carrying_capacity, growth_rate))

while population_size + epsilon  < carrying_capacity:

  last_population_size = population_size

  population_size = carrying_capacity / (1+((carrying_capacity - initial_population_size)/initial_population_size) * math.exp(- growth_rate*t))

  growth = population_size - last_population_size

  print("The population after {} time steps is {}".format(t, population_size))

  t = t + 1  

  new_result = {"Time step" : t, "Population size" : population_size, "Growth" : growth}
  results.append(new_result)

  last_growth = 0

for data_point in results:
  growth = data_point["Growth"] 
  is_current_growth_lower_than_before = (last_growth > growth)

  if is_current_growth_lower_than_before:
    print("Found maximum growth of {} at {}!".format(last_data_point["Growth"],last_data_point["Time step"] ))
    break
  last_growth = data_point["Growth"] 
  last_data_point = data_point
Julian Huber - Fundamentals of Programming

What if You

  • have do some changes (e.g., integrate a new growth function)?
  • want to reuse code to evaluate another population?
Julian Huber - Fundamentals of Programming

Structure code with functions to make simulations reusable

Scenario 1 Scenario 2 Scenario ...
1 10 ...
Initial population 1 200 ...
Carrying capacity 200 1000 ...
Growth rate 0.1 0.01 ...
Julian Huber - Fundamentals of Programming

Functional programming

  • What we did so far is called imperative programming: We told Python what to do.

  • functional programming focuses on functions

    • def f(x):
          """this function's name is f. It takes a value x and returns a value y"""
          [...]
          y=x*2
          return y
      
      f(2) # function call
      
  • Functional programming makes code easier to understand and maintain

  • Functions should

    • be shorter than 20 lines of code
    • do one thing
Julian Huber - Fundamentals of Programming
def add_two_numbers(first_number, second_number):
  """Returns the sum of two numbers."""
  new_sum = first_number + second_number
  return new_sum

new_sum = add_two_numbers(first_number = 1,second_number = 2)
print(new_sum)
Julian Huber - Fundamentals of Programming

Anatomy of a Function

def <function_name>(<parameter>):
  """<docstring>"""
  ...
  return <return_value>

<function_name>(<parameter>=<argument>)

Julian Huber - Fundamentals of Programming

✍️ Case Study: Structure code with functions to make simulations reusable

Scenario 1 Scenario 2 Scenario ...
1 10 ...
Initial population 1 200 ...
Carrying capacity 200 1000 ...
Growth rate 0.1 0.01 ...
  • refactor Your code into three functions and use them
    to make simulations for other populations
Julian Huber - Fundamentals of Programming

1.2.5 Functions

  • there are some 🤓 optional tasks, try them anyway but only after You solved anything else!
  • there is a group work at the end 🏆
Julian Huber - Fundamentals of Programming
Julian Huber - Fundamentals of Programming

1.2.6 Algorithms for Bio Informatics

Julian Huber - Fundamentals of Programming

1.2.6.1 Simple Algorithms for Bio Informatics

🎯 Learning Objectives

You will be able to

  • work with biological sequence data in the form of strings
  • design and implement simple algorithms to solve problems in DNA sequence analysis
  • differentiate between global and local approaches to compare sequence data
Julian Huber - Fundamentals of Programming

Genome - Code of Life

  • Similarity / Evolution
  • Malfunction / Function
https://www.genome.gov/genetics-glossary/Codon
Julian Huber - Fundamentals of Programming

Simple Questions

"TAGCTAGCTAGCTTTTAGTTAGCAGCC"
  • How common are the four nucleotides?
  • Finding the reverse complement (other string of the double-helix)
  • Do we find a certain sequence in the genome and where?
Julian Huber - Fundamentals of Programming

✍️ 1.2.6.1 Simple Algorithms for Bio Informatics

⌛ 25 minutes

Julian Huber - Fundamentals of Programming

1.2.6.2 Sequence alignment

🎯 Learning Objectives

You will be able to

  • implement global and local approaches to compare sequence data
Julian Huber - Fundamentals of Programming

Comparing sequence data

  • so far we only looked for perfect matches
  • we cannot expect perfect matches (mutation, reading errors, etc.)
  • we need other ways to compare sequence data
Julian Huber - Fundamentals of Programming

Sequences are similar, if their nucleotide / amino acid patterns match well

  • each row is one sample sequence
  • amino acids are coded in different colors
https://bioinf.comav.upv.es/courses/biotech3/static/multiple/alig_mul_prot.png
Julian Huber - Fundamentals of Programming

conservative means that the resulting protein is still functional
Julian Huber - Fundamentals of Programming

Sequence similarity helps us to find evolutionary relationship

https://www.yourgenome.org/facts/how-do-you-find-out-the-significance-of-a-genome-after-sequencing
Julian Huber - Fundamentals of Programming

Sequence similarity helps us to find stuff in databases

CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG
CCGCCTCGGGAGCGTCCATGGCGGGTTTGAACCTCTAGCCCGGCGCAGTTTGGGCGCCAAGCCATATGAA
AGCATCACCGGCGAATGGCATTGTCTTCCCCAAAACCCGGAGCGGCGGCGTGCTGTCGCGTGCCCAATGA
https://de.wikipedia.org/wiki/BLAST-Algorithmus
Julian Huber - Fundamentals of Programming

Measuring Similarity between Strings

  • Hamming distance
    • How many letters to we have to change?
      TIME MINE
  • Levenshtein [sic!] Distance
    • How many edits required to transform one string to another?
      • Substitution of a character
      • Deletion of a character
      • Insertion of a character
    • WOMAN MAN
Julian Huber - Fundamentals of Programming

Global vs Local Alignment

  • How similar are these sequences globally?
    "TAGCTAGCTAGCTTTTAGTTAGCAGCC"
    "AGCTAGCTAGCTTTTAGTTAGCAGCCT"
    • Hamming : not similar at all
    • Levenshtein : very similar
  • Local alignment
    compare how well a subset aligns at different positions
    "AGCTAGCT"
Julian Huber - Fundamentals of Programming

Could these sequences have the same origin?

seq_1 = "ABCDEF"
seq_2 = "ABDEFG"
  • , probably not
  • , probably
  • -mers Approach
Julian Huber - Fundamentals of Programming
-mers Approach
  • the sequences are split into fragments of length , which are compared
  • corresponds to the word length in BLAST
seq_1 = "ABC"
seq_2 = "ABD"
seq_1 = "ABC"

# 2-mers of seq_1
mer_1_seq_1 = "AB"
mer_2_seq_1 = "BC"

seq_2 = "ABD"
# 2-mers of seq_2
mer_1_seq_1 = "AB"
mer_2_seq_1 = "BD"
Julian Huber - Fundamentals of Programming
counter = 0
for k_mer_1 in seq_1:               # Create a possible k-mers from seq_1
    for k_mer_2 in seq_2:           # Create a possible k-mers from seq_2
        if k_mer_1 == k_mer_2:      # Compare them
            counter = counter + 1   # Count the identical k-mers

seq_1 = "ABC"
seq_2 = "ABD"

"AB" == "AB" # 1/4 of k-mers matches
"AB" == "BD"
"BC" == "AB"
"BC" == "BD"

# H = 1
# L = 1
Julian Huber - Fundamentals of Programming

For two identical sequences

seq_1 = "ABC"
seq_2 = "ABC"

"AB" == "AB" # 1/4 matches
"AB" == "BC"
"BC" == "AB"   
"BC" == "BC" # 2/4 matches

# H = 0
# L = 0
Julian Huber - Fundamentals of Programming

For longers sequences

seq_1 = "ABCD"
seq_2 = "ABAB"

"AB" == "AB" # 1/9 matches
"AB" == "BA" 
"AB" == "AB" # 2/9 matches
"BC" == "AB"   
"BC" == "BA" 
"BC" == "AB" 
"CD" == "AB"
"CD" == "BA"
"CD" == "AB"

# H = 2
# L = 2
Julian Huber - Fundamentals of Programming

1.2.6.3 Analyzing Complexity

  • When several algorithms solve a problem, how do you know which one
    is best?
    • Is it the simplest?
    • The fastest?
    • The smallest?
    • Or something else?
  • One way to judge an algorithm is by its run time. An algorithm’s run time is the amount of time
    import time
    start = time.perf_counter()
    for i in range(1, 6):
        print(i)
    end = time.perf_counter()
    print(end – start)
    
Julian Huber - Fundamentals of Programming

Linear time

  • The same algorithm would take (about) twice as long if we double number of elements
  • e.g., looping through a range
    import time
    start = time.perf_counter()
    for i in range(1, 12):
        print(i)
    end = time.perf_counter()
    print(end-start)
    
Julian Huber - Fundamentals of Programming

Constant Time

  • Finding the value to a key in a dictionary hash a constant time independent of the length of the dictionary. The computer only has to compute the hash-function of the key to find the memory address:
    data =    {"M25925" : "AGCAAAAGCAGGCAAACCATTT..." ,
            "M25926" :  "AGCAAAAGCAGGCAAACCATTTG....",
            "MT058709" :  "CAAACCATTTGAATGGATGTCAA...."}
    
    data[M25926]
    
  • Fining data in a linked list is more expensive, as the computer has to traverse the list to find the right position in the memory
Julian Huber - Fundamentals of Programming

Quadratic Time

  • Nesting two for-loops has quadratic time complexity
  • if the sequences are twice as long the algorithm will take four times as long
    counter = 0
    for k_mer_1 in seq_1:               # Create a possible k-mers from seq_1
        for k_mer_2 in seq_2:           # Create a possible k-mers from seq_1
            if k_mer_1 == k_mer_2:      # Compare them
                counter = counter + 1   # Count the identical k-mers
    
Julian Huber - Fundamentals of Programming

O-Notation

  • gives us the order of magnitude of the time-complexity of a algorithm
  • with larger data set (larger ) this becomes very important

Julian Huber - Fundamentals of Programming

🏆 Case Study: Develop an Algorithm for finding Similarity

  • We have three Sequences from three different Influenza viruses
  • The sequences encode for the same protein but have have some variations
  • Your task is to find the two samples, that are more closely related to each other
Julian Huber - Fundamentals of Programming
M25925 = {"Sample Name" : "M25925", "Sequence" : "AGCAAAAGCAGGCAAACCATTT..." }

M25926 = {"Sample Name" : "M25926", "Sequence" :  "AGCAAAAGCAGGCAAACCATTTG...."}

MT058709 = {"Sample Name" : "MT058709", "Sequence" :  "CAAACCATTTGAATGGATGTCAA...."}
Julian Huber - Fundamentals of Programming

6 Algorithms for Bio Informatics

  • this might be the most challenging task so far, as it requires some creativity
  • pen and paper are useful tool to structure Your ideas
  • this is a good preparation for Your final project

⌛ 60 minutes

Julian Huber - Fundamentals of Programming
Julian Huber - Fundamentals of Programming

1.2.7 Testing and Final Challenge

Julian Huber - Fundamentals of Programming

1.2.7.1 Testing

🎯 Learning Objectives

You will be able to

  • use unit tests, to test the functionality of their functions
  • develop and test their own algorithms and code to find practical solutions for sequence analysis
Julian Huber - Fundamentals of Programming

Functions

def get_formatted_name(first, last):
  """Generate a neatly formatted full name."""
  full_name = first + ' ' + last
  return full_name.title()
  • must fulfill certain requirements
  • grow and change over time
Julian Huber - Fundamentals of Programming

Unittests

import unittest

# We define a class the holds different test for this specific function
class NamesTestCase(unittest.TestCase):
  """Tests for 'name_function.py'."""

# We define on of several test function for the function
  def test_first_last_name(self):
    """Do names like 'Janis Joplin' work?"""
    # we all the function
    formatted_name = get_formatted_name('janis', 'joplin')
    # we check whether we get the expected result
    self.assertEqual(formatted_name, 'Janis Joplin')

unittest.main(argv=[''], verbosity=2, exit=False)
  • provide a automatic test cast for a unit of code
  • you can run them after each change in the function to check whether they still work as expected
Julian Huber - Fundamentals of Programming

✍️ 1.2.7.1 Testing Your Code

7 Reconstruction in Shotgun-Sequencing

⌛ 30 minutes

Julian Huber - Fundamentals of Programming

1.2.7.2 🏆 Final Project: Reconstruction in Shotgun-Sequencing

So far, we assumed to have full length DNA-Sequences available from a database or any other source. In fact, we are still developing the technology yet to read long strands of DNA and directly storing them in a database. Instead, the DNA strand is broken down in shorter sequences for reading the results. However, we have to bring the short sequences in the right order again.

https://www.yourgenome.org/facts/how-do-you-find-out-the-significance-of-a-genome-after-sequencing
Julian Huber - Fundamentals of Programming

Shotgun Sequencing / Assembly

https://www.genome.gov/sites/default/files/media/images/tg/Shotgun-sequencing.jpg
Julian Huber - Fundamentals of Programming

"Shotgun sequencing is a laboratory technique for determining the DNA sequence of an organism’s genome. The method involves randomly breaking up the genome into small DNA fragments that are sequenced individually. A computer program looks for overlaps in the DNA sequences, using them to reassemble the fragments in their correct order to reconstitute the genome." genome.gov

You have a list of fragmented DNA-snippets that have to be ordered in the right order. Your task is to implement an algorithm, that brings the DNA-sequence into thr right order.

Julian Huber - Fundamentals of Programming

Verbal Description of the Algorithm

  • Randomly pick a sequence of five to eight nucleotides form the following original sequence and write them on a piece of paper
    "TAGCTAGCTAGCTTTTAGTTAGCAGCC"
  • write Your name on the other side of the paper
Julian Huber - Fundamentals of Programming

Algorithm assemble_sequence()

  • Inputs: List with segment fragment
  • Output: Assembled sequence
1) draw a random seed sequence
2) while sequences left in the list
    2.1) draw next sequence
    2.2) compare_beginning(overlap)
        if exact match 
            glue_fragments_end()
    2.3) compare_end(overlap)
        if exact match 
            glue_fragments_end()
Julian Huber - Fundamentals of Programming
  • does the algorithm terminate?
    • yes, at one point we cannot find any new matches
    • but, we cannot be sure, that we found the whole original sequence
  • is the algorithm deterministic?
    • it depends on how we select the sequences
    • we can get stuck, if we do it in the wrong order
    • hence, we should draw by random and to it multiple times
Julian Huber - Fundamentals of Programming

Your task

  • You will implement this algorithm in Python
  • The starting point is given in the notebook
  • Your submission is a *.ipynb file showing all the results in the sakai-task
  • the naming convention must be
    • <lastname>.pdf
    • e.g. huber.pdf
  • Deadline: see Sakai
Julian Huber - Fundamentals of Programming

What is given to You

  • Jupyter Notebook with the outline
  • the original DNA-sequence and the list of fragments
  • four functions with name, parameters and docstring
  • four three cases, which must be passed to get the points
Julian Huber - Fundamentals of Programming

Grading

This is the basis for the final grade.

  • Passing the unit tests for the following functions:
    • CompareBeginningsTestCase - 20 %
    • CompareEndingsTestCase- 20%
    • GlueFragmentsTestCase - 20 %
  • Implement the following function without unit test:
    • assemble_sequence() - 10 %
  • Coding style and comments - 20 %
  • Length of final sequence and speed of execution of assemble_sequence() - 10 %
Julian Huber - Fundamentals of Programming

1.2.7 Reconstruction in Shotgun-Sequencing

  • make yourself familiar with the task

⌛ 30 minutes

Julian Huber - Fundamentals of Programming