Adapt the following search term so that it returns the specific gene for two other the organisms
(GDF8[Gene Name]) AND ("Homo sapiens"[Organism] or "Bos Taurus"[Organism])
Significance of Boolean algebra for computer science
search terms
Storage of individual bits can be realized very simply by storing electrical charge, e.g. by capacitors or flip-flops:
electrical charge present 1
electrical charge not present 0
Conjunction, disjunction and negation can easily be realized by means of switches (transistors, relays) which pass on charge (information) or not.
controls execution of software code
Example conjunction:
Task
given the following patient data
formulate the following equations that result in boolean values
a ... patient is heavier than 80 kg
b ... patient is younger than 30 years
c ... patient is heavier than 80 kg but older than 30 years
fill in the truth table with the formula and the resulting values
age_years
weight_kg
a
b
c
20
60
34
70
25
120
26
90
e.g. a=(weight>80)
age
weight
a=(weight>80)
b=(age<30)
c=(a∧¬b)
20
60
FALSE
TRUE
FALSE
34
70
FALSE
FALSE
FALSE
25
120
TRUE
TRUE
FALSE
26
90
TRUE
TRUE
FALSE
Learning Summary
Now, You can
describe logical statements with Boolean variables.
evaluate the Boolean operations AND, OR, NEGATION and EXCLUSIVE OR.
can combine multiple Boolean operations into meaningful expressions.
1.4 Programming Computers
Learning objectives
You will be able to
describe the task of a compiler
formulate an algorithm using a an UML-Sequence diagram
Software as interface between human and machine
Computer are hardware that purely operate with boolean algebra
The machine program interpreted by the processor is a sequence of zeros and ones (incomprehensible to most humans)
Users need to write, read and understand programs. Programming languages are approaches to develop programs with less effort and in a form that is more comprehensible to humans
"Shotgun sequencing is a laboratory technique for determining the DNA sequence of an organism’s genome. The method involves randomly breaking up the genome into small DNA fragments that are sequenced individually. A computer program looks for overlaps in the DNA sequences, using them to reassemble the fragments in their correct order to reconstitute the genome." genome.gov
You have a list of fragmented DNA-snippets that have to be ordered in the right order. Your task is to implement an algorithm, that brings the DNA-sequence into thr right order.
Randomly pick a sequence of eight nucleotides (read) from the following original sequence and write them on a piece of paper
"TAGCTAGCTAGCTTTTAGTTAGCAGCC"
Building blocks for algorithms
loops (same steps several times)
we to compare all the sequences available
conditional execution (e.g., =if x == True:)
we only add a sequence is we have a match
elementary operations (e.g., and, =)
starting from the seed, we expand the final sequence
sequential execution
the algorithm only makes sense if we keep the sequence
Algorithms
Precise (written in a specified language) finite description of a general procedure
using executable elementary (processing) steps.
Examples
find an open reading fame
Translating a DNA into an mRNA sequence
Addition of two binary numbers
Calculating the number π
Properties: Termination
An algorithm is called terminating if it terminates (stops) (for any allowed input of parameter values) after finitely many steps.
Question: Do the algorithms in the previous slide terminate?
Translating a DNA into an mRNA sequence
yes, the DNA sequence ends and the translation stops
Addition of two binary numbers
yes, there is a result at the end
calculating the number π
no, the calculation can continue up to any precision
Properties: Determinism
establishes "freedom of choice" or "randomness" in execution
Forms
deterministic sequence: unambiguous specification of the sequence of steps to be executed.
deterministic result: unambiguous result for given parameter values
Deterministic algorithm:
Computer makes a copy of a DNA strand
Non-determined "algorithm"
Cell makes a copy of a DNA strand
Building blocks for algorithms
Inputs: Data that goes in should be in a well defined format
Storage: Data that is stored during execution, to make things easier
Outputs: Data that comes out
Building blocks for algorithms
elementary and boolean operations (e.g. +, and, =, etc.)
sequential execution (one processor)
parallel execution (multiple processors)
conditional execution (e.g. if)
loop (same steps several times) (e.g. while, for)
subroutines
recursion: application of the same principle to smaller subproblems
elementary operations
Assignment (of a variable a value) Result = 2
boolean operation True and False (operation) True == True (comparison)
Arithmetic operation 5+2
Sequenz
Steps happen in a certain order
(1) Boil water
(2) Put coffee powder into cup
(3) Fill water into cup
Refinement and subroutines
(2) Pour coffee powder into cup
refined to
(2.1) Open coffee glass
(2.2) Take out spoon of coffee
(2.3) Tilt spoon into cup
(2.4) Close coffee glass
conditional execution / selection / if
if <condition>
then <step>
resp.
if <condition>
then <step a>
otherwise <step b>
Selection nesting
if coffee is available
then open coffee glass
otherwise go to favorite cafe
if favorite cafe open
then order coffee
otherwise
go on to MCI
Loop (as long as/while)
/* next prime */
repeat
Add 1;
test for prime number property
until number is prime;
output number
Algorithm assemble_sequence()
1) draw a random seed sequence
2) while sequences left
2.1) draw next sequence
2.2) compare_beginning(overlap)
if match glue_fragments_end()
2.3) compare_end(overlap)
if match glue_fragments_end()
does the algorithm terminate?
yes, at one point we can not find any matches
but, we cannot be sure, that we found the whole original sequence
is the algorithm deterministic?
it depends on how we select the sequences
we can get stuck, if we do it in the wrong order
hence, we should draw by random and do it multiple times
UML activity diagrams
Graphical description of the flow of an algorithm
Example: Cooking noodles
Start and stop are marked
activities can be kept very rough at first
Branches are provided with conditions (boolean expressions)
Does this algorithm always terminate?
Task: Algorithm assemble_sequence()
Create and UML activity diagram that creates the complement of a DNA sequence
1) draw a random seed sequence
2) while sequences left
2.1) draw next sequence
2.2) compare_beginning(overlap)
if match glue_fragments_end()
2.3) compare_end(overlap)
if match glue_fragments_end()
Task: Algorithm Complement of a DNA sequence
Create and UML activity diagram that creates the complement of a DNA sequence
e.g., TCCACTGTGGCGTTCGGAATATAATCG
verbal description
go through the list
replace each letter by its complement
T➜A
A➜T
C➜G
G➜C
input: list to complement: TCCACTGTGGCGTTCGGAATATAATCG
output: complemented list: AGG...
storage: temporary list:
graph TD
start[Start] --> create_list(Create empty temporary list)
create_list --> start_loop(what is the first/next letter in the original list)
start_loop -->|Next letter == A| A(Add a T to the temporary list)
start_loop -->|Next letter == T| T(Add a A to the temporary list)
start_loop -->|Next letter == G| G(Add a C to the temporary list)
start_loop -->|Next letter == C| C(Add a G to the temporary list)
A --> letters_left(are there any letters left in the original list)
T --> letters_left
G --> letters_left
C --> letters_left
letters_left -->|yes| start_loop
letters_left -->|no| stop_loop(return the temporary list as complement)
stop_loop --> Stop
Create and UML activity diagram on how to find an open reading frame
If we have multiple, it is ok to stop after You found the first one
Open Reading Frames
have the potential to be transcribed into RNA and translated into protein
from a start codon (e.g., ATG), through a subsequent region which usually has a length that is a multiple of 3 nucleotides to a stop codon (e.g., TGA) in the same reading frame.
What are the Inputs and Outputs to our Algorithm?
Inputs:
String (text) of a RNA sequence
Outputs:
found_open_reading_frame: True or False
start_position: number of the first letter of the reading frame (A in the ATG)
stop_position: number of the last letter of the reading frame (A in the TGA)
Verbal Description
set found_open_reading_frame to False
Iterate through the RNA sequence in steps of 1
Look at the next three letters
if the letters are ATG
note the position of A in start_position
Iterate through the next three-letter pairs
if any of them is TGA
note the position of A in stop_position
set found_open_reading_frame to True
terminate algorithm
else go to ne next letter
if at the end of RNA sequence
terminate algorithm
this graphical interpretation of the algorithm is not very precise
anyone could interpret this a litte differently
computers cannot work with this ambiguity
programming languages provide a precise way to describe algorithms
graph TD
start[Start] --> setFound(Found open_reading_frame = False)
setFound --> loopStart{Iterate through RNA sequence}
loopStart -->|Next three letters == ATG| noteStart(Note position of A in start_position)
loopStart -->|Next three letters != ATG| moveNext
loopStart -->|No letters left| stop
moveNext[go to next letter] --> loopStart
noteStart --> loopPairs{Iterate through next three-letter pairs}
loopPairs -->|pair == TGA| noteStop(Note position of A in stop_position)
loopPairs -->|pair is not TGA| continuePairs[go to next pair]
continuePairs --> loopPairs
loopPairs -->|no pair left| loopStart
noteStop --> setTrue(Set found_open_reading_frame to True)
setTrue --> stop