1.2.1 Variables and Data Types
25 minutes
Hints
# importing the math library
import math
# here the user can define the input
...
# here the calculation is made
...
# here, we print out the results
...
You will be able to
None type variables
Assignment (=) and Comparison (==)
eve_is_here = True # Assignment: eve_is_here is set to True
adam_is_here = False # Assignment
print(adam_is_here == eve_is_here) # Comparison of the values
#> False
reading_frame_start_position = None
type(reading_frame_start_position)
#> NoneType
10 minutes
You will be able to
nucleotides = ["Guanine","Adenine","Cytosine","Thymine"]
print(nucleotides[0])
#> Guanine
)list
ATGG
TACC
string = "ATGG"
["A","T","G","G"]
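The conversion hinted at above (a string turned into a list, and the complementary strand ATGG / TACC) could be sketched as follows. The names `nucleotide_list`, `complement` and `complementary_strand` are illustrative assumptions and do not come from the course material.

```python
# Convert a DNA string into a list of single nucleotides
string = "ATGG"
nucleotide_list = list(string)
print(nucleotide_list)
#> ['A', 'T', 'G', 'G']

# Assumed lookup table: each base is paired with its complementary base
complement = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Build the complementary strand base by base
complementary_strand = ""
for base in nucleotide_list:
    complementary_strand = complementary_strand + complement[base]
print(complementary_strand)
#> TACC
```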
You will be able to
nucleotides = ["Guanine","Adenine","Cytosine","Thymine"]
for nucleotide in nucleotides:
    print(nucleotide)
#> Guanine
#> Adenine
#> Cytosine
#> Thymine
You will be able to
dna_sequence = ["A", "U", "C", "C", "G","A", "G", "C", "U", "E", "G","A", "G", "C", "U", "G", "Z", "G","A", "G", "C", "U","U"]
Valid nucleotides are A, U, C, and G. First clean the data by removing all corrupted items from the list. Then split the cleaned list into groups of three:
['A', 'U', 'C']
['C', 'G', 'A']
['G', 'C', 'U']
['G', 'A', 'G']
['C', 'U', 'G']
['G', 'A', 'G']
['C', 'U', 'U']
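One possible way to approach this exercise is sketched below. It assumes that the valid items are A, U, C and G, which is what the expected output implies; `valid_bases`, `cleaned_sequence` and `codons` are names chosen for illustration.

```python
dna_sequence = ["A", "U", "C", "C", "G", "A", "G", "C", "U", "E", "G", "A",
                "G", "C", "U", "G", "Z", "G", "A", "G", "C", "U", "U"]

# Assumption: only these four letters count as valid items
valid_bases = ["A", "U", "C", "G"]

# Step 1: keep only the valid items
cleaned_sequence = []
for base in dna_sequence:
    if base in valid_bases:
        cleaned_sequence.append(base)

# Step 2: cut the cleaned list into groups of three
codons = []
for start in range(0, len(cleaned_sequence), 3):
    codons.append(cleaned_sequence[start:start + 3])

for codon in codons:
    print(codon)
```

Slicing with `start:start + 3` never raises an error at the end of the list; a trailing group would simply be shorter than three items.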
10 minutes
You will be able to
a = True
b = True
c = False
result = (a and b) or (b or c)
a = ("abc" == "abc")
b = (5<0)
= and ==
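To check the logical expressions above, they can simply be evaluated and printed. This is a small sketch; the names `first_comparison` and `second_comparison` are chosen here only to keep the two examples apart.

```python
a = True
b = True
c = False
# (True and True) or (True or False) evaluates to True
result = (a and b) or (b or c)
print(result)
#> True

# = assigns the outcome of the comparison, == performs the comparison
first_comparison = ("abc" == "abc")
second_comparison = (5 < 0)
print(first_comparison)
#> True
print(second_comparison)
#> False
```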
10 minutes
You will be able to
if, elif or else statement
True or False
if <logical expression>:
    <statement>
if True:
    print("This is printed!")
if False:
    print("This is not printed!")
if <logical expression>:
    <statement>
TRUE: statement is executed
FALSE: statement is ignored
pin_correct = True
if pin_correct:
    print('Pin is correct!')
if <logical expression>:
    <statement a>
else:
    <statement b>
TRUE: statement a is executed
FALSE: statement b is executed
# Example with printed solution
a = 4
b = 3
if a>b:
    print("a is larger than b")
else:
    print("b is larger than a")
Write a program that checks the lab clearance of anyone wanting to enter the lab:
Create a list of usernames called users_with_clearance.
Create a second list called users_wanting_to_enter. Make sure one or two of the new usernames are also in the users_with_clearance list.
Loop through the users_wanting_to_enter list to see if each of them has a lab clearance. Print a message to greet each person who has a clearance. If a person does not have one, send them to a supervisor.
45 minutes
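The lab clearance exercise could be approached along the following lines. This is only a sketch: the usernames and the wording of the messages are invented for illustration, and collecting the messages in a list first is a choice made here so that they can be inspected afterwards.

```python
# Hypothetical usernames, chosen only for illustration
users_with_clearance = ["curie", "franklin", "darwin"]
users_wanting_to_enter = ["franklin", "mendel", "curie"]

messages = []
for user in users_wanting_to_enter:
    if user in users_with_clearance:
        # The user has a clearance and is greeted
        messages.append("Welcome to the lab, " + user + "!")
    else:
        # The user has no clearance and is sent to a supervisor
        messages.append("Sorry " + user + ", please see a supervisor first.")

for message in messages:
    print(message)
```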
You will be able to
list_of_codons = ["UUU","UCU", "UUC" <...>]
list_of_amino_acids = ["phenylalanine","serine", "phenylalanine" <...>]
index of the codon we are looking for
Enter Dictionaries
A dictionary stores key-value pairs
<dictionary> = {<key> : <value>}
codon_table = {"UUU" : "phenylalanine",
               "UCU" : "serine",
               "UUC" : "phenylalanine",
               [...] }
We can get the data using the keys (index)
print(codon_table["UUU"])
#> phenylalanine
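The difference between the two approaches can be sketched with the three codons shown above. With two lists we first have to find the index of the codon; with a dictionary the codon itself is the key. The variable names and the `"unknown codon"` default are illustrative assumptions.

```python
# Approach 1: two parallel lists, linked only by their positions
list_of_codons = ["UUU", "UCU", "UUC"]
list_of_amino_acids = ["phenylalanine", "serine", "phenylalanine"]
position = list_of_codons.index("UCU")
amino_acid_from_lists = list_of_amino_acids[position]

# Approach 2: a dictionary that maps each codon directly to its amino acid
codon_table = {"UUU": "phenylalanine",
               "UCU": "serine",
               "UUC": "phenylalanine"}
amino_acid_from_dict = codon_table["UCU"]

print(amino_acid_from_lists)
print(amino_acid_from_dict)

# .get() returns a default value instead of raising a KeyError
print(codon_table.get("XYZ", "unknown codon"))
```

Using a name such as `codon_table` instead of `dict` keeps Python's built-in `dict` type available.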
If we would like to store data of different populations, wouldn't it be nice to have some structure?
populations = {"E. Coli" :
                   { "Initial population" : 1000,
                     "Carrying capacity" : 100000,
                     "Growth rate" : 0.1,
                     "Data" : [{
                         "time" : 0,
                         "population" : 1000
                         },
                         { "time" : 1,
                           "population" : 1010
                         }]
                   },
               "Seals" : [...]
              }
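Reading from such a nested structure works by chaining keys and indices. The sketch below uses only the "E. Coli" entry with the values shown above; the entry for "Seals" is left out because its content is not given. The names `populations`, `e_coli` and `growth` are assumptions made for this example.

```python
populations = {
    "E. Coli": {
        "Initial population": 1000,
        "Carrying capacity": 100000,
        "Growth rate": 0.1,
        "Data": [
            {"time": 0, "population": 1000},
            {"time": 1, "population": 1010},
        ],
    }
}

# First key selects the species, second key selects the parameter
e_coli = populations["E. Coli"]
print(e_coli["Growth rate"])

# "Data" is a list of dictionaries, so we can loop over it
for data_point in e_coli["Data"]:
    print(data_point["time"], data_point["population"])

# Growth between the two stored time steps
growth = e_coli["Data"][1]["population"] - e_coli["Data"][0]["population"]
print(growth)
```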
Sometimes it is unclear when to terminate the algorithm
Solution with for loop
for t in range(1,100):
    current_population = <...>
Solution with while-loop
t = 0
while epsilon > 1:
    epsilon = carrying_capacity - current_population
    t = t+1
    current_population = <...>
Store the results for each time step in a list. The list should contain dictionaries that have the time step, current population size and the population growth since the last time step.
Find the time step with the maximum growth in population.
4 Storing and Entering Information
You will be able to
Sometimes it is unclear when to terminate the algorithm
distance_to_wall = 15
while distance_to_wall > 1:
    do_step_forward()
You will be able to
You probably created a lot of code during the last lectures
# importing the math library
import math
t = 0
epsilon = 1
# here the user can define the input
initial_population_size = 1
carrying_capacity = 200
growth_rate = 0.1
# the simulation starts with the initial population
population_size = initial_population_size
results = []
print("Started simulation with the following parameters: \n initial population size: {} \n carrying capacity: {} \n growth rate: {}".format(initial_population_size, carrying_capacity, growth_rate))
# simulate until the population is within epsilon of the carrying capacity
while population_size + epsilon < carrying_capacity:
    last_population_size = population_size
    population_size = carrying_capacity / (1+((carrying_capacity - initial_population_size)/initial_population_size) * math.exp(- growth_rate*t))
    growth = population_size - last_population_size
    print("The population after {} time steps is {}".format(t, population_size))
    # store the result for the current time step before moving on
    new_result = {"Time step" : t, "Population size" : population_size, "Growth" : growth}
    results.append(new_result)
    t = t + 1
# the maximum growth is reached right before the growth starts to decline
last_growth = 0
for data_point in results:
    growth = data_point["Growth"]
    is_current_growth_lower_than_before = (last_growth > growth)
    if is_current_growth_lower_than_before:
        print("Found maximum growth of {} at {}!".format(last_data_point["Growth"],last_data_point["Time step"] ))
        break
    last_growth = data_point["Growth"]
    last_data_point = data_point
What if You
 | Scenario 1 | Scenario 2 | Scenario ... |
---|---|---|---|
 | 1 | 10 | ... |
Initial population | 1 | 200 | ... |
Carrying capacity | 200 | 1000 | ... |
Growth rate | 0.1 | 0.01 | ... |
What we did so far is called imperative programming: We told Python what to do.
Functional programming focuses on functions
def f(x):
    """this function's name is f. It takes a value x and returns a value y"""
    [...]
    y=x*2
    return y
f(2) # function call
Functional programming makes code easier to understand and maintain
Functions should
def add_two_numbers(first_number, second_number):
    """Returns the sum of two numbers."""
    new_sum = first_number + second_number
    return new_sum
new_sum = add_two_numbers(first_number = 1,second_number = 2)
print(new_sum)
def <function_name>(<parameter>):
    """<docstring>"""
    ...
    return <return_value>
<function_name>(<parameter>=<argument>)
 | Scenario 1 | Scenario 2 | Scenario ... |
---|---|---|---|
 | 1 | 10 | ... |
Initial population | 1 | 200 | ... |
Carrying capacity | 200 | 1000 | ... |
Growth rate | 0.1 | 0.01 | ... |
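Wrapping the growth formula in a function makes it easy to run several scenarios without copying code. The following is a minimal sketch: the function name `logistic_population`, its parameter names and the chosen time step of 10 are assumptions; the formula and the scenario values are the ones used in this section.

```python
import math

def logistic_population(t, initial_population, carrying_capacity, growth_rate):
    """Returns the population size at time step t under logistic growth."""
    ratio = (carrying_capacity - initial_population) / initial_population
    return carrying_capacity / (1 + ratio * math.exp(-growth_rate * t))

# Scenario values taken from the table; each scenario is one dictionary
scenarios = [
    {"Initial population": 1, "Carrying capacity": 200, "Growth rate": 0.1},
    {"Initial population": 200, "Carrying capacity": 1000, "Growth rate": 0.01},
]

for scenario in scenarios:
    # The time step 10 is an arbitrary choice for this example
    population = logistic_population(
        t=10,
        initial_population=scenario["Initial population"],
        carrying_capacity=scenario["Carrying capacity"],
        growth_rate=scenario["Growth rate"],
    )
    print("Population after 10 time steps: {}".format(population))
```

At `t = 0` the function returns the initial population, and for very large `t` it approaches the carrying capacity, which is a quick way to check that the formula was typed in correctly.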
You will be able to
Genome - Code of Life
Simple Questions
"TAGCTAGCTAGCTTTTAGTTAGCAGCC"
25 minutes
You will be able to
Sequences are similar if their nucleotide / amino acid patterns match well
Sequence similarity helps us to find evolutionary relationships
Sequence similarity helps us to find related sequences in databases
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG
CCGCCTCGGGAGCGTCCATGGCGGGTTTGAACCTCTAGCCCGGCGCAGTTTGGGCGCCAAGCCATATGAA
AGCATCACCGGCGAATGGCATTGTCTTCCCCAAAACCCGGAGCGGCGGCGTGCTGTCGCGTGCCCAATGA
TIME
MINE
WOMAN
MAN
"TAGCTAGCTAGCTTTTAGTTAGCAGCC"
"AGCTAGCTAGCTTTTAGTTAGCAGCCT"
"AGCTAGCT"
Could these sequences have the same origin?
seq_1 = "ABCDEF"
seq_2 = "ABDEFG"
seq_1 = "ABC"
seq_2 = "ABD"
seq_1 = "ABC"
# 2-mers of seq_1
mer_1_seq_1 = "AB"
mer_2_seq_1 = "BC"
seq_2 = "ABD"
# 2-mers of seq_2
mer_1_seq_2 = "AB"
mer_2_seq_2 = "BD"
k = 2 # length of the k-mers
counter = 0
for i in range(len(seq_1) - k + 1): # Create all possible k-mers from seq_1
    k_mer_1 = seq_1[i:i+k]
    for j in range(len(seq_2) - k + 1): # Create all possible k-mers from seq_2
        k_mer_2 = seq_2[j:j+k]
        if k_mer_1 == k_mer_2: # Compare them
            counter = counter + 1 # Count the identical k-mers
seq_1 = "ABC"
seq_2 = "ABD"
"AB" == "AB" # 1/4 of k-mers matches
"AB" == "BD"
"BC" == "AB"
"BC" == "BD"
# H = 1
# L = 1
For two identical sequences
seq_1 = "ABC"
seq_2 = "ABC"
"AB" == "AB" # 1/4 matches
"AB" == "BC"
"BC" == "AB"
"BC" == "BC" # 2/4 matches
# H = 0
# L = 0
For longer sequences
seq_1 = "ABCD"
seq_2 = "ABAB"
"AB" == "AB" # 1/9 matches
"AB" == "BA"
"AB" == "AB" # 2/9 matches
"BC" == "AB"
"BC" == "BA"
"BC" == "AB"
"CD" == "AB"
"CD" == "BA"
"CD" == "AB"
# H = 2
# L = 2
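The three comparisons above can be reproduced with a few small functions. This is a sketch under assumptions: the function names are invented here, and `H` is assumed to stand for the Hamming distance (the number of positions at which two sequences of equal length differ), which agrees with the values shown. `L` is not computed in this sketch.

```python
def get_k_mers(sequence, k):
    """Returns all overlapping substrings of length k."""
    k_mers = []
    for start in range(len(sequence) - k + 1):
        k_mers.append(sequence[start:start + k])
    return k_mers

def count_k_mer_matches(seq_1, seq_2, k):
    """Compares every k-mer of seq_1 with every k-mer of seq_2."""
    counter = 0
    for k_mer_1 in get_k_mers(seq_1, k):
        for k_mer_2 in get_k_mers(seq_2, k):
            if k_mer_1 == k_mer_2:
                counter = counter + 1
    return counter

def hamming_distance(seq_1, seq_2):
    """Counts differing positions; assumes both sequences have equal length."""
    distance = 0
    for symbol_1, symbol_2 in zip(seq_1, seq_2):
        if symbol_1 != symbol_2:
            distance = distance + 1
    return distance

print(count_k_mer_matches("ABC", "ABD", 2))
#> 1
print(count_k_mer_matches("ABC", "ABC", 2))
#> 2
print(count_k_mer_matches("ABCD", "ABAB", 2))
#> 2
print(hamming_distance("ABCD", "ABAB"))
#> 2
```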
import time
start = time.perf_counter()
for i in range(1, 6):
    print(i)
end = time.perf_counter()
print(end - start)
import time
start = time.perf_counter()
for i in range(1, 12):
    print(i)
end = time.perf_counter()
print(end-start)
data = {"M25925" : "AGCAAAAGCAGGCAAACCATTT..." ,
"M25926" : "AGCAAAAGCAGGCAAACCATTTG....",
"MT058709" : "CAAACCATTTGAATGGATGTCAA...."}
data["M25926"]
k = 2 # length of the k-mers
counter = 0
for i in range(len(seq_1) - k + 1): # Create all possible k-mers from seq_1
    k_mer_1 = seq_1[i:i+k]
    for j in range(len(seq_2) - k + 1): # Create all possible k-mers from seq_2
        k_mer_2 = seq_2[j:j+k]
        if k_mer_1 == k_mer_2: # Compare them
            counter = counter + 1 # Count the identical k-mers
M25925 = {"Sample Name" : "M25925", "Sequence" : "AGCAAAAGCAGGCAAACCATTT..." }
M25926 = {"Sample Name" : "M25926", "Sequence" : "AGCAAAAGCAGGCAAACCATTTG...."}
MT058709 = {"Sample Name" : "MT058709", "Sequence" : "CAAACCATTTGAATGGATGTCAA...."}
6 Algorithms for Bioinformatics
60 minutes
You will be able to
def get_formatted_name(first, last):
    """Generate a neatly formatted full name."""
    full_name = first + ' ' + last
    return full_name.title()
import unittest
# We define a class that holds different tests for this specific function
class NamesTestCase(unittest.TestCase):
    """Tests for 'name_function.py'."""
    # We define one of several test functions for the function
    def test_first_last_name(self):
        """Do names like 'Janis Joplin' work?"""
        # we call the function
        formatted_name = get_formatted_name('janis', 'joplin')
        # we check whether we get the expected result
        self.assertEqual(formatted_name, 'Janis Joplin')
unittest.main(argv=[''], verbosity=2, exit=False)
7 Reconstruction in Shotgun-Sequencing
30 minutes
So far, we have assumed that full-length DNA sequences are available from a database or another source. In practice, the technology for reading long strands of DNA in one pass and storing them directly in a database is still being developed. Instead, the DNA strand is broken down into shorter sequences that are read individually. Afterwards, we have to bring the short sequences back into the right order.
"Shotgun sequencing is a laboratory technique for determining the DNA sequence of an organism’s genome. The method involves randomly breaking up the genome into small DNA fragments that are sequenced individually. A computer program looks for overlaps in the DNA sequences, using them to reassemble the fragments in their correct order to reconstitute the genome." genome.gov
You have a list of DNA fragments that have to be put into the right order. Your task is to implement an algorithm that reassembles the DNA sequence from these fragments.
"TAGCTAGCTAGCTTTTAGTTAGCAGCC"
Algorithm assemble_sequence()
1) draw a random seed sequence
2) while sequences left in the list
    2.1) draw next sequence
    2.2) compare_beginning(overlap)
        if exact match
            glue_fragments_end()
    2.3) compare_end(overlap)
        if exact match
            glue_fragments_end()
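The steps above could be turned into Python roughly as follows. This is a sketch under assumptions, not the reference solution: the function signatures are invented here, the gluing is done inline instead of in a separate function, all fragments are assumed to overlap by exactly `overlap` characters, and the fragments are cut from the example sequence only for demonstration.

```python
import random

def compare_beginning(assembled, fragment, overlap):
    """True if the end of the fragment matches the beginning of the assembled sequence."""
    return fragment[-overlap:] == assembled[:overlap]

def compare_end(assembled, fragment, overlap):
    """True if the beginning of the fragment matches the end of the assembled sequence."""
    return assembled[-overlap:] == fragment[:overlap]

def assemble_sequence(fragments, overlap):
    """Glues fragments together using exact overlaps of a fixed length."""
    remaining = list(fragments)
    # 1) draw a random seed sequence
    assembled = remaining.pop(random.randrange(len(remaining)))
    # 2) continue while sequences are left in the list
    while remaining:
        for fragment in remaining:
            if compare_beginning(assembled, fragment, overlap):
                # glue the fragment in front, keeping the overlap only once
                assembled = fragment + assembled[overlap:]
            elif compare_end(assembled, fragment, overlap):
                # glue the fragment at the end, keeping the overlap only once
                assembled = assembled + fragment[overlap:]
            else:
                continue
            remaining.remove(fragment)
            break
        else:
            # no remaining fragment fits, so we stop instead of looping forever
            break
    return assembled

genome = "TAGCTAGCTAGCTTTTAGTTAGCAGCC"
# Illustrative fragments that overlap by four characters
fragments = [genome[0:12], genome[8:20], genome[16:27]]
print(assemble_sequence(fragments, overlap=4))
```

Repeated patterns in a genome can make overlaps ambiguous, so a short overlap may lead to a wrong assembly for other fragment sets; choosing the overlap length is part of the design.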
*.ipynb file showing all the results in the sakai-task
<lastname>.pdf
huber.pdf
What is given to You
This is the basis for the final grade.
CompareBeginningsTestCase - 20 %
CompareEndingsTestCase - 20 %
GlueFragmentsTestCase - 20 %
assemble_sequence() - 10 %
assemble_sequence() - 10 %