1.1 Computational Basics

WiSe 2023/2024
Julian Huber

Biodata Science: Fundamentals of Programming

Biodata Science: Fundamentals of Programming

1.1 Binary Information Representation in Computers

🎯 Learning objectives

You will be able to

  • describe the difference between data and information.
  • understand how information can be stored in bits.
Biodata Science: Fundamentals of Programming

🤓 Motivation: History of Computation

  • 2500 B.C. Abacus/calculating slide
    (probably of Sumerian origin)
    • Representation of numbers
    • Rules for computation (algorithms)
Timeline of Computer History, https://de.wikipedia.org/wiki/Abakus_(Rechenhilfsmittel)
Biodata Science: Fundamentals of Programming

🤓

  • 1804 Punched cards for controlling looms
  • 1833 Charles Babbage calculating machine with punched cards
Biodata Science: Fundamentals of Programming

🤓K-Adder (1937)

  • 1937 K-Adder
https://www.computerhistory.org/timeline/1937/#169ebbe2ad45559efbc6eb35720eb5ea
Biodata Science: Fundamentals of Programming

🤓

  • 1941 Konrad Zuse builds Z3 as first digital computer with programming language Plankalkül.
  • 1954 TRADIC first computer based on transistors
  • 1977 Commodore as the first personal computer
  • 1991 Tim Berners-Lee puts first website online
  • 2020 over 27 billion networked devices
https://upload.wikimedia.org/wikipedia/commons/4/4c/Z3_Deutsches_Museum.JPG
Biodata Science: Fundamentals of Programming

https://www.computerhistory.org/timeline/1937/#169ebbe2ad45559efbc6eb35720eb5ea
Biodata Science: Fundamentals of Programming

🧠 Half adder

  • Simple electrical calculator e.g. implemented as a K-Adder
  • Adds values of the input and displays them in the
    output via two lamps
  • The information is described in each case by two states
    (keystroke or light on/off)
  1. Input:

    • Button A (left)
    • Button B (right)
  2. Output

    • Lamp left
    • Lamp right
Biodata Science: Fundamentals of Programming

✍️ Task (1/2)

  • Visit the website linked below
  • There you will find a simulation of the half adder
  1. Input:

    • Button A (top left)
    • Button B (bottom left)
  2. Output

    • Lamp left (top right)
    • Lamp right (bottom right)
https://www.falstad.com/circuit/e-halfadd.html
Biodata Science: Fundamentals of Programming

✍️ Task (2/2)

  • Fill in the table for the lamps for all possible combinations of switch positions
    (1 for (on/pressed) or 0 for (off)).
  • What numerical value (result of addition) does each result correspond to?
  • Can more numerical values be displayed via the two lamps?
  • The half adder obviously calculates a result. Would you call it a computer? What distinguishes it from modern computers?

⌛ 7 minutes

Biodata Science: Fundamentals of Programming
Up Button Down Button B Up Lamp Down Lamp Result
0 0
1 0
0 1
1 1
Biodata Science: Fundamentals of Programming
Solution: Completed truth table
Up Button Down Button B Up Lamp Down Lamp Result
0 0 0 0 0
1 0 0 1 1
0 1 0 1 1
1 1 1 0 2
Biodata Science: Fundamentals of Programming
  • In the results table, the case where both lamps are lit does not occur. This numerical value could represent another numerical value (e.g. 3)
    • Up 0 & Down 0 ➡️ 0
    • Up 0 & Down 1 ➡️ 1
    • Up 1 & Down 0 ➡️ 2
    • Up 1 & Down 1 ➡️ 3
  • Modern computers are capable of solving many different tasks. Due to the fixed wiring, the half adder allows only one operation.
https://www.falstad.com/circuit/e-halfadd.html
Biodata Science: Fundamentals of Programming

🧠 Bits (Binary Digits)

  • each lamp hold a Bit of information
  • store value of a Boolean variable
  • two states arbitrarily interpretable:
    • 0/1
    • True/False
    • on/off
    • High voltage / low voltage
  • easy to store
  • easy to transmit
Biodata Science: Fundamentals of Programming
  • Not only numbers, all information is stored binary information
  • Imaging of data carriers under a scanning electron microscope

https://mybroadband.co.za/news/gadgets/130658-what-cds-dvds-blu-rays-look-like-under-a-microscope.html
Biodata Science: Fundamentals of Programming

🧠 Combinatorial power

  • 2n2^n different states can be represented per nn bit.
  • With 1 bit one can represent 21=22^1 = 2 different states:
    • [0, 1]
  • With 3 bits you can represent 22=42^2 = 4 different states:
    • [00, 01, 10, 11]
  • With 3 bits you can represent 23=82^3 = 8 different states:
    • [000, 001, 010, 011, 100, 101, 110, 111]
Biodata Science: Fundamentals of Programming
✍️ How many bits do we need to specify all the states of a traffic light?
  • There are 3 states / 4 with everything off.
  • 1 bit is not enough [0,1]
  • 2 bits are enough [00,01,10,11]

https://www.bertelsmann-stiftung.de/en/themen/aktuelle-meldungen/2021/november/deutsche-blicken-mit-grossen-erwartungen-und-skepsis-auf-die-ampel-koalition
Biodata Science: Fundamentals of Programming

Byte

  • data is usually stored in multiples of 88 Bit

    0101 01011

  • one Byte can store 82=2568^2=256 different states

Biodata Science: Fundamentals of Programming
Biodata Science: Fundamentals of Programming
  • Bits can code four different calculation results (decimal numbers)
    • 0 0 ➡️ 0
    • 0 1 ➡️ 1
    • 1 0 ➡️ 2
    • 1 1 ➡️ 3
https://www.falstad.com/circuit/e-halfadd.html
Biodata Science: Fundamentals of Programming

🎯 Learning Summary

Now, You know

  • that data represents stored information.
  • that information can be stored in a succession of bits 0, 1
Biodata Science: Fundamentals of Programming

1.2 Binary Number System

🎯 Learning objectives

You will be able to

  • represent positive and negative integers in the binary system.
  • perform addition and subtraction in the binary system.
Biodata Science: Fundamentals of Programming

🧠 Decimal System

  • The representation uses 1010 as base
  • The place of the digit indicates to which exponent
  • Representation of the number 47114711 in the decimal system
10310^3 10210^2 10110^1 10010^0 Computation Decimal
44 77 11 11 41.000+7100+110+114 \cdot 1.000 + 7 \cdot 100 + 1 \cdot 10 + 1 \cdot 1 47114711
Biodata Science: Fundamentals of Programming

🧠 Binary System

  • The same representation can be used to represent natural numbers as a sequence of 0 and 1.
  • The base is the number 22
232^3 222^2 212^1 202^0 Computation Decimal
00 00 11 00 08+04+12+010 \cdot 8 + 0 \cdot 4 + 1 \cdot 2 + 0 \cdot 1 22
  • Most Significant Bit (MSB) - digit with the highest exponent. In our examples the left
Biodata Science: Fundamentals of Programming
232^3 222^2 212^1 202^0 Berechnung Dezimalzahl
0 0 0 0 08+04+02+010 \cdot 8 + 0 \cdot 4 + 0 \cdot 2 + 0 \cdot 1 00
0 0 0 1 08+04+02+110 \cdot 8 + 0 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 11
0 0 1 0 08+04+12+010 \cdot 8 + 0 \cdot 4 + 1 \cdot 2 + 0 \cdot 1 22
0 0 1 1 08+04+12+110 \cdot 8 + 0 \cdot 4 + 1 \cdot 2 + 1 \cdot 1 33
0 1 0 0 08+14+02+010 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 0 \cdot 1 44
0 1 0 1 08+14+02+110 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 55

The table contains examples of how natural numbers can be represented in the binary system:

Biodata Science: Fundamentals of Programming

Leibniz 1697

:40%

Biodata Science: Fundamentals of Programming

The lamps of the half adder also represent a binary number.

  • A+C=DA + C = D
    • AA a binary number with one digit: 1 or 0.
    • BB a binary number with one digit: 1 or 0
    • DD a binary number with two digits: 00, 01, 10, 11
Button A Button B Lamp left Lamp right Decimal
0 0 0 0 0
1 0 0 1 1
0 1 0 1 1
1 1 1 0 2
Biodata Science: Fundamentals of Programming
Button A Button B Lamp left: 212^1 Lampe right: 202^0 Zahlenwert
0 0 0 0 021+0200 \cdot 2^1 + 0 \cdot 2^0
1 0 0 1 021+1200 \cdot 2^1 + 1 \cdot 2^0
0 1 0 1 021+1200 \cdot 2^1 + 1 \cdot 2^0
1 1 1 0 121+0201 \cdot 2^1 + 0 \cdot 2^0
Biodata Science: Fundamentals of Programming

🧠 Signed binary number representation

  • A sign bit (SB) can be reserved to also represent negative numbers

    SB 222^2 212^1 202^0 Computation Decimal
    1 1 0 0 (14+02+01(1 \cdot 4 + 0 \cdot 2 + 0 \cdot 1) 4-4
  • We call a positive natural number Integer (int)
  • In natural number unsigned Integer (uint)
https://zitoc.com/signed-number/
Biodata Science: Fundamentals of Programming

✍️ Task

  • What is the smallest and largest integer value as a decimal number that a computer can store in each of the following numbers of bits?
  • given the schema of encoding of the slides before
  • Hint: With a bit reserved for the sign one bit less is available for the binary number
    Number of bits with sign bit smallest integer largest integer
    3 yes
    3 no
    4 yes
    4 no

⌛ 5 minutes

Biodata Science: Fundamentals of Programming

Solution

  • What is the smallest and largest integer value as a decimal number that a computer can store in each of the following memory array?

    Number of bits with sign bit smallest integer largest integer
    33 yes (121+120)=3-(1\cdot 2^1+1\cdot 2^0)= -3 (121+120)=3(1\cdot 2^1+1\cdot 2^0)= 3
    33 no 00 (122+121+120)=7(1 \cdot 2^2 + 1\cdot 2^1+1\cdot 2^0)= 7
    44 yes 7-7 77
    44 no 00 1515
  • Note, that there are positive and negative zero

  • Modern computers work with another format: twos' complement

Biodata Science: Fundamentals of Programming

🧠 Addition in the Binary System

  • works analogous to the decimal system
  • we work with numbers without sign bit
Binary Decimal
1001   9~~9
+ 0011   3~~3
= _ _ _ _ 1212
Biodata Science: Fundamentals of Programming
Binary Decimal
1001 99
+ 0011 33
= ___0 1212
  • Start from right (least significant bit)
  • 1+1=21+1 = 2
  • In decimal system 22 is represented as 10
  • 0 write to last digit (sum bit s)
  • 1 transfer to penultimate digit (carrier bit c)
Biodata Science: Fundamentals of Programming
Binary Decimal
1001 99
+ 0011 33
+ __1_
= __00 1212
  • add next digits and consider carry (1)
  • 0+1+1 = 10
  • Write 0 in penultimate place
  • 1 transfer to penultimate digit
Biodata Science: Fundamentals of Programming
Binary Decimal
1001 99
+ 0011 33
+ _1__
= _100 1212
  • add next digit
  • 0+1+0 = 1
  • In the decimal system 11 is represented as 1
  • write 1 to the second to last digit
  • no carry necessary
Biodata Science: Fundamentals of Programming
Binary Decimal
1001 99
+ 0011 33
= 1100 1212
  • add next digit
  • 1+0 = 1
  • In the decimal system 11 is represented as 1
  • write 1 to the second to last digit
  • no carry over necessary
Biodata Science: Fundamentals of Programming

1.2 Encoding complex information

Biodata Science: Fundamentals of Programming

Encoding Pictures

https://computerscience.chemeketa.edu/cs160Reader/DataRepresentation/ImageRepresentation.html
Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • store pictures as a succession of bits
  • calculate how many bits are needed to store a given amount of information (states).
Biodata Science: Fundamentals of Programming

✍️ Task 1

  • How many different (unique) images can you store in 1 megapixel?
    • a) in a black and white image (binary image)
    • 🤓 b) as a greyscale image with 1 byte per pixel (256 different greyscale values are stored)
  • Hint:
    • There are two types of black and white images
    • Binary image: There are only black or white pixels (example on the left).
    • Grey-scale image: each pixel can be different light/dark.
Biodata Science: Fundamentals of Programming
Solution 1
  • a) binary image

    • 1 million pixels with two states each (black/white):
    • 21.000.0002^{1.000.000}
  • a) 🤓 greyscale image

    • 1 million pixels with 256 states each:
    • 2561.000.000256^{1.000.000}
Biodata Science: Fundamentals of Programming

🤓 8-bit color

  • In early applications (e.g. games) colours were encoded with 8 bits, i.e. 282^{8} colours
  • historically a succession of 8 Bit is called a byte

Biodata Science: Fundamentals of Programming

Application in RGB codes

  • Colour values are specified as a mixture of red, green and blue
  • 8 bits are provided for each colour
  • e.g., R goes from 0 to 255 (282^8 states)

https://www.rapidtables.com/web/color/RGB_Color.html
Biodata Science: Fundamentals of Programming
(11111111 0000000 11111111) (0000000 0000000 00000000)  (11111111 11111111 00000000) 
(11111111 0000000 11111111) (0000000 0000000 00000000)  (11111111 11111111 00000000) 
(11111111 0000000 11111111) (0000000 0000000 00000000)  (11111111 11111111 00000000) 
Biodata Science: Fundamentals of Programming

🎯 Learning Summary

Now, You can

  • interpret a sequence of bits as a picture if You know the color coding and ratio of the picture
  • calculate how many bits You need to store a number of different states
Biodata Science: Fundamentals of Programming

Encoding Numbers

Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • explain why computers cannot store floating point numbers with infinite accuracy
Biodata Science: Fundamentals of Programming

The same Bit-Sequence can store different Information

  • Data: 100010

  • Code

    • Integer in binary system
    • Unsigned integer in binary system
    • Pixel colour in a picture
    • Bases: 00: A, 01: T, 10: G, 11: C
  • Information

    • 125+024+023+022+121+020=32+2=341\cdot 2^{5}+0\cdot 2^{4}+0\cdot 2^{3}+0\cdot 2^{2}+1\cdot 2^{1}+0\cdot 2^{0}=32+ 2 = 34
    • (024+023+022+121+020)=2-(0\cdot 2^{4}+0\cdot 2^{3}+0\cdot 2^{2}+1\cdot 2^{1}+0\cdot 2^{0})=-2
    • ⬛️⬜⬜
      ⬜⬛️⬜
    • GAG
Biodata Science: Fundamentals of Programming

🧠 Integers

Biodata Science: Fundamentals of Programming

Floating Point Numbers

  • With 64 bits, 2642^{64} can represent whole, positive numbers.
  • But how are negative, larger and non-integer numbers represented?
  1. IEEE 754 defines the standard representations for binary floating point numbers in computers

Biodata Science: Fundamentals of Programming

  • General value calculation: x=(1)s1,m2eBx = (-1)^s \cdot 1,m \cdot 2^{e-B}
  • First digit (always 1) is not stored in mantissa (so-called hidden bit).
  • Bias B depends on the length of the exponent representation E: B = 2E-1 - 1
  • Valid characteristics: 0 < e < 2E - 1 (values 0 and 2E - 1 are reserved)
Biodata Science: Fundamentals of Programming
Caution: Floating point numbers are not infinitely accurate!
1.1 + 2.2
#> 3.3000000000000003
Biodata Science: Fundamentals of Programming
✍️ Task
  • Try to represent Your year of birth:

⌛ 3 minutes

https://www.h-schmidt.net/FloatConverter/IEEE754.html
Biodata Science: Fundamentals of Programming
🤓 Value ranges of floating-point numbers

  • There are different names for floating-point numbers:
    float, single, double, real
Biodata Science: Fundamentals of Programming

Encoding Text

Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • decode binary coded text using ASCII tables
  • decide which character encoding is appropriate for a given text
Biodata Science: Fundamentals of Programming

Example: Base Pairs

  • DNA has four different base pairs (bp)
    • G, C, A, T
  • how much bits do we need to store 200200 bp?
    • we need 2n=42^n=4
      • 00 : G
      • 01 : C
      • 10 : A
      • 11 : T
https://www.lecturio.com/concepts/stages-of-transcription/
Biodata Science: Fundamentals of Programming
  • we need 22 bits to encode for 44 possible base pairs on each position
  • hence, we need 400400 bit = 5050 Bytes
  • the text file, we use to store the data is four times larger (216216 Bytes)
Biodata Science: Fundamentals of Programming
https://d-nb.info/1136955135/34
Biodata Science: Fundamentals of Programming

ASCII (American standard code for information interchange)

  • A character set assigns a numerical value to each character, which determines the position of the character within the character set.
  • one 8-bit word per character (the first bit is not used*)

* can be used as a parity bit
Biodata Science: Fundamentals of Programming

Task

  • Decode the following messages using the ASCII character table:
    • 01010011 01000010 01010100
      • SBT
  • ASCII has few options for representing special characters.
  • That is why other character sets have become established
Biodata Science: Fundamentals of Programming

other character encodings

  • ISO-8859 family
    • 15 different 8-bit character sets
    • e.g. ISO 8859-1 for Latin-1, Western European
  • Unicode
    • To provide encoding for every text document of all languages and cultures of the world.
    • The 16-bit Unicode is under continuous development
  • UTF
    • A large part of the text 8-bit character set can be represented.
    • Three encodings for Unicode characters (8-, 16- or 32-bit)
    • The first 128 characters of UTF-8 correspond to the 7-bit ASCII code.
Biodata Science: Fundamentals of Programming
🧠 Decision for character sets
  • ASCII: Space-saving, but no umlauts, special characters, e.g., sequence data: FASTA files)
  • Unicode: Higher memory requirement, but many character sets 😀
Biodata Science: Fundamentals of Programming

The Text-Editor translates the Binary Code in Characters

01000011 01000011 01010100 01010100 01010100 01000011 <...>

UTF-Table

CCTTTC <...>
Biodata Science: Fundamentals of Programming

1.3 Boolean Algebra and Propositional Logic

Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • describe logical statements with Boolean variables
  • evaluate the Boolean operations AND, OR, NEGATION and EXCLUSIVE OR
  • can combine multiple Boolean operations into meaningful expressions
Biodata Science: Fundamentals of Programming

Logic

Epimenides the Cretan said: All Cretans are liars.

If the lamp is not on, it must be off.

Biodata Science: Fundamentals of Programming

🧠 IPO principle

  • Calculator working according to the IPO principle
    • Input: A and B keys
    • Process: Calculation logic / wiring
    • Output: Lamps
  • Process follows a simple logic: Add the two numbers
https://www.computerhistory.org/timeline/1937/#169ebbe2ad45559efbc6eb35720eb5ea
Biodata Science: Fundamentals of Programming

🧠 Logical deduction

  • Description of the state based on the values of the
    Boolean variables XX and YY containing one bit of information.
    • X=TrueX=\text{True}: Adam is in the lecture
    • Y=TrueY=\text{True}: Eve is in the lecture
  • Possible logical inferences?
    • Are Adam and Eve both there?
    • Is at least Adam or Eve there?
    • Is no one in the lecture?
Biodata Science: Fundamentals of Programming
🧠 Which of these are meaningful Boolean variables?
  • presence of a particular student
    • yes, binary
  • size of a storage unit in megabytes
    • no, its a number
  • Name of a pet
    • no, its a string
  • Colour of a pixel in an 8-bit colour image
    • no, its a 8-bit information
  • Colour of a pixel in a 1-bit binary image
    • yes, black and white can be represented in one bit
  • Result of a coin toss
    • yes, two states
  • Gender of a person
    • ...
Biodata Science: Fundamentals of Programming

🧠 Boolean Algebra

  • is used for the formal description of
    statements and logical conclusions based on Boolean values
  • Can map switching logic (current/no current)
  • Variables only take on two values (are bits)
    • x=True, x=1
    • y=False, y=0
Biodata Science: Fundamentals of Programming

🧠 conjunction, AND, \land

  • Are Adam and Eve both there?
  • yxy \land x
  • truth table
xx yy yxy \land x
00 00 00
00 11 00
11 00 00
11 11 11
https://www.ulfkonrad.de/physik/schalter-und-oder
Biodata Science: Fundamentals of Programming

🧠 Disjunction, OR, \lor

  • Is there at least Adam or Eve?
  • yxy \lor x
  • truth table
xx yy yxy \lor x
00 00 00
00 11 11
11 00 11
11 11 11
https://www.ulfkonrad.de/physik/schalter-und-oder
Biodata Science: Fundamentals of Programming

🧠 -, negation, inversion, NOT, ¬\neg

  • Is Adam not there?

  • ¬x\lnot x

  • truth table

xx ¬x\lnot x
00 11
11 00
Biodata Science: Fundamentals of Programming

🧠 Exclusive OR, \oplus, =1

  • Is there exactly one of Adam or Eve here?
  • truth table
xx yy yxy \oplus x
00 00 00
00 11 11
11 00 11
11 11 00
Biodata Science: Fundamentals of Programming

Biodata Science: Fundamentals of Programming

🤓 Application

  • often, we break down more complex questions into boolean expressions:
    if (warning_level > 2 and slope>35):
        high_avalanche_risk = True
    
Biodata Science: Fundamentals of Programming

Evaluations that result in Boolean Variables

  • comparison: 1>2False
  • test for equality: "a" =="a"True
  • test for inequality: "a" !="a"False
Biodata Science: Fundamentals of Programming

🧠 Calculation rules for Boolean variables

  • Basic for operations in computers e.g. addition of numbers.

  • Helpful for search queries in Google and databases.

  • Three operations:

    • \land or ++ (and)
    • \lor or \cdot (or)
    • ¬\neg (negation) (not)
Biodata Science: Fundamentals of Programming

Four axioms/laws of calculation (Huntington, 1904)

  • Commutativity
    AB=BAA \lor B = B \lor A bzw. AB=BAA \land B = B \land A
  • Neutral element
    0A=A0 \lor A = A bzw. 1A=A1 \land A = A
  • Distributivity
    (AB)C=(AC)+(BC)(A \lor B) \land C = (A \land C) + (B \land C)
    (AB)C=(AC)(BC)(A \land B) \lor C = (A \lor C) \land (B \lor C)
  • Complementary element
    A¬A=1A \lor \neg A = 1 bzw. A¬A=0A \land \neg A = 0
Biodata Science: Fundamentals of Programming

https://www.ncbi.nlm.nih.gov/nuccore/advanced
Biodata Science: Fundamentals of Programming

Brackets matter!

Biodata Science: Fundamentals of Programming
("homo sapiens"[Organism] OR "mus musculus"[Organism]) AND OCA1[Gene Name])
  • the OCA gene of either human or mouse
("homo sapiens"[Organism] OR ("mus musculus"[Organism] AND OCA1[Gene Name])
  • any data regarding humans or (OCA gene of mouse)
Biodata Science: Fundamentals of Programming
Organism Gene FASTA File a=a= Organism =="homo sapiens" b=b= Organism =="mus musculus" c=c=Gene =="OCA1" (ab)c(a \lor b) \land c
homo sapiens TP53 ... True False False False
homo sapiens OCA1 ... True False True True
mus musculus OCA1 ... False True True True
mus musculus TP53 ... False True False False
mus musculus GDF8 ... False True False False
bos taurus GDF8 ... False False False False
Biodata Science: Fundamentals of Programming

✍️ Task

  • Look for myostatin (Gene is GDF8) in the NCBI-Nucleotides Database
  • For what organisms do You find entries?
  • Adapt the following search term so that it returns the specific gene for two other the organisms
    • (GDF8[Gene Name]) AND ("Homo sapiens"[Organism] or "Bos Taurus"[Organism])
Biodata Science: Fundamentals of Programming

🧠 Significance of Boolean algebra for computer science

  • search terms
  • Storage of individual bits can be realized very simply by storing electrical charge, e.g. by capacitors or flip-flops:
    • electrical charge present ➡️ 1
    • electrical charge not present ➡️ 0
  • Conjunction, disjunction and negation can easily be realized by means of switches (transistors, relays) which pass on charge (information) or not.
  • controls execution of software code
Biodata Science: Fundamentals of Programming

Example conjunction:

Biodata Science: Fundamentals of Programming

✍️ Task

  • given the following patient data
  • formulate the following equations that result in boolean values
    • aa ... patient is heavier than 80 kg
    • bb ... patient is younger than 30 years
    • cc ... patient is heavier than 80 kg but older than 30 years
  • fill in the truth table with the formula and the resulting values
Biodata Science: Fundamentals of Programming
age_years weight_kg aa bb cc
20 60
34 70
25 120
26 90
  • e.g. a=(weight>80)a = (weight > 80)
Biodata Science: Fundamentals of Programming
age weight a=(weight>80)a = (weight > 80) b=(age<30)b = (age < 30) c=(a¬b)c =(a \land \lnot b )
20 60 FALSE TRUE FALSE
34 70 FALSE FALSE FALSE
25 120 TRUE TRUE FALSE
26 90 TRUE TRUE FALSE
Biodata Science: Fundamentals of Programming

🎯 Learning Summary

Now, You can

  • describe logical statements with Boolean variables.
  • evaluate the Boolean operations AND, OR, NEGATION and EXCLUSIVE OR.
  • can combine multiple Boolean operations into meaningful expressions.
Biodata Science: Fundamentals of Programming

1.4 Programming Computers

Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • describe the task of a compiler
  • formulate an algorithm using a an UML-Sequence diagram
Biodata Science: Fundamentals of Programming

🧠 Software as interface between human and machine

  • Computer are hardware that purely operate with boolean algebra
  • The machine program interpreted by the processor is a sequence of zeros and ones (incomprehensible to most humans)
  • Users need to write, read and understand programs.
    Programming languages are approaches to develop programs with less effort and in a form that is more comprehensible to humans
Biodata Science: Fundamentals of Programming

🧠 Compiler

https://medium.com/young-coder/the-difference-between-compiled-and-interpreted-languages-d54f66aa71f0
Biodata Science: Fundamentals of Programming
We can describe different information
  • Booleans: a = True: stored as 1

  • Integers: b=5=122+021+120b = 5 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0: 101

  • Floats: c=3.14c = 3.14

  • Strings: d="AB"d = \text{"AB"}: 01000001 01000010

  • The computer stores it in binary code

    • Implication: The computer must know, what is stored with what code
    • we say the variable (e.g, a) has a certain data type (e.g., Boolean, Integer, etc.)
Biodata Science: Fundamentals of Programming
  • What the programmer writes
    • "a" == "a" ?
  • What the compiler translates
    0100 0001 == 0100 0001
  • What the computer computes
    0100 0001
    0100 0001
    _________
    0000 0000 == 0
    
  • A compiler is the computer program that translates source codes of a particular programming language into executable machine language.
Biodata Science: Fundamentals of Programming

Examples of Programming Languages

  • General Purpose Languages
    • for almost any application, fast
    • C, C++,C#, Java, Python, ...
  • Scripting Languages
    • controlling what a computer shall do
    • Shell, R, Python, VBA, Matlab, Funktionsplan
  • Languages for Databases
    • Filling and queries on data bases
    • SQL
Biodata Science: Fundamentals of Programming

Algorithms

Biodata Science: Fundamentals of Programming

🎯 Learning objectives

You will be able to

  • name examples of algorithms
  • describe central properties of an algorithm
  • recognize basic elements of an algorithm
Biodata Science: Fundamentals of Programming

✍️ Task: Shotgun Sequencing / Assembly

https://www.genome.gov/sites/default/files/media/images/tg/Shotgun-sequencing.jpg
Biodata Science: Fundamentals of Programming

"Shotgun sequencing is a laboratory technique for determining the DNA sequence of an organism’s genome. The method involves randomly breaking up the genome into small DNA fragments that are sequenced individually. A computer program looks for overlaps in the DNA sequences, using them to reassemble the fragments in their correct order to reconstitute the genome." genome.gov

You have a list of fragmented DNA-snippets that have to be ordered in the right order. Your task is to implement an algorithm, that brings the DNA-sequence into thr right order.

Biodata Science: Fundamentals of Programming
  • Randomly pick a sequence of eight nucleotides (read) from the following original sequence and write them on a piece of paper

"TAGCTAGCTAGCTTTTAGTTAGCAGCC"

Biodata Science: Fundamentals of Programming

Building blocks for algorithms

  • loops (same steps several times)
    we to compare all the sequences available
  • conditional execution (e.g., =if x == True:)
    we only add a sequence is we have a match
  • elementary operations (e.g., and, =)
    starting from the seed, we expand the final sequence
  • sequential execution
    the algorithm only makes sense if we keep the sequence
Biodata Science: Fundamentals of Programming

Algorithms

  • Precise (written in a specified language)
    finite description of a general procedure
    using executable elementary (processing) steps.
  • Examples
    • find an open reading fame
    • Translating a DNA into an mRNA sequence
    • Addition of two binary numbers
    • Calculating the number π\pi
Biodata Science: Fundamentals of Programming

Properties: Termination

An algorithm is called terminating if it terminates (stops) (for any allowed input of parameter values) after finitely many steps.

  • Question: Do the algorithms in the previous slide terminate?
    • Translating a DNA into an mRNA sequence
      • yes, the DNA sequence ends and the translation stops
    • Addition of two binary numbers
      • yes, there is a result at the end
    • calculating the number π\pi
      • no, the calculation can continue up to any precision
Biodata Science: Fundamentals of Programming

🤓 Properties: Determinism

establishes "freedom of choice" or "randomness" in execution

  • Forms
    • deterministic sequence: unambiguous specification of the sequence of steps to be executed.
    • deterministic result: unambiguous result for given parameter values
Biodata Science: Fundamentals of Programming
  • Deterministic algorithm:

    • Computer makes a copy of a DNA strand
  • Non-determined "algorithm"

    • Cell makes a copy of a DNA strand
Biodata Science: Fundamentals of Programming

Building blocks for algorithms

  • Inputs: Data that goes in should be in a well defined format
  • Storage: Data that is stored during execution, to make things easier
  • Outputs: Data that comes out
Biodata Science: Fundamentals of Programming

Building blocks for algorithms

  • elementary and boolean operations (e.g. +, and, =, etc.)
  • sequential execution (one processor)
  • parallel execution (multiple processors)
  • conditional execution (e.g. if)
  • loop (same steps several times) (e.g. while, for)
  • subroutines
  • recursion: application of the same principle to smaller subproblems
Biodata Science: Fundamentals of Programming
🧠 elementary operations
  • Assignment (of a variable a value)
    Result = 2
  • boolean operation
    True and False (operation)
    True == True (comparison)
  • Arithmetic operation
    5+2
Biodata Science: Fundamentals of Programming
🧠 Sequenz
  • Steps happen in a certain order
(1) Boil water
(2) Put coffee powder into cup
(3) Fill water into cup
Biodata Science: Fundamentals of Programming
Refinement and subroutines
(2) Pour coffee powder into cup

refined to

(2.1) Open coffee glass
(2.2) Take out spoon of coffee
(2.3) Tilt spoon into cup
(2.4) Close coffee glass
Biodata Science: Fundamentals of Programming
🧠 conditional execution / selection / if
if <condition>
    then <step>

resp.

if <condition>
    then <step a>
    otherwise <step b>
Biodata Science: Fundamentals of Programming
Selection nesting
if coffee is available
    then open coffee glass
    otherwise go to favorite cafe 
        if favorite cafe open
            then order coffee
        otherwise 
            go on to MCI
Biodata Science: Fundamentals of Programming
🧠 Loop (as long as/while)
/* next prime */
repeat
    Add 1;
    test for prime number property
    until number is prime;
output number
Biodata Science: Fundamentals of Programming
Algorithm assemble_sequence()
1) draw a random seed sequence
2) while sequences left
    2.1) draw next sequence
    2.2) compare_beginning(overlap)
        if match glue_fragments_end()
    2.3) compare_end(overlap)
        if match glue_fragments_end()

  • does the algorithm terminate?
    • yes, at one point we can not find any matches
    • but, we cannot be sure, that we found the whole original sequence
  • is the algorithm deterministic?
    • it depends on how we select the sequences
    • we can get stuck, if we do it in the wrong order
    • hence, we should draw by random and do it multiple times
Biodata Science: Fundamentals of Programming

🧠 UML activity diagrams

Graphical description of the flow of an algorithm

Biodata Science: Fundamentals of Programming

Example: Cooking noodles

  • Start and stop are marked
  • activities can be kept very rough at first
  • Branches are provided with conditions (boolean expressions)
  • Does this algorithm always terminate?
Biodata Science: Fundamentals of Programming

✍️ Task: Algorithm assemble_sequence()

  • Create and UML activity diagram that creates the complement of a DNA sequence
1) draw a random seed sequence
2) while sequences left
    2.1) draw next sequence
    2.2) compare_beginning(overlap)
        if match glue_fragments_end()
    2.3) compare_end(overlap)
        if match glue_fragments_end()

Biodata Science: Fundamentals of Programming

✍️ Task: Algorithm Complement of a DNA sequence

  • Create and UML activity diagram that creates the complement of a DNA sequence

  • e.g., TCCACTGTGGCGTTCGGAATATAATCG

  • verbal description

    • go through the list
    • replace each letter by its complement
    • TA
    • AT
    • CG
    • GC
Biodata Science: Fundamentals of Programming
  • input: list to complement: TCCACTGTGGCGTTCGGAATATAATCG
  • output: complemented list: AGG...
  • storage: temporary list:
Biodata Science: Fundamentals of Programming

Biodata Science: Fundamentals of Programming
graph TD
  start[Start] --> create_list(Create empty temporary list)
  create_list --> start_loop(what is the first/next letter in the original list)
  start_loop -->|Next letter == A| A(Add a T to the temporary list)
  start_loop -->|Next letter == T| T(Add a A to the temporary list)
  start_loop -->|Next letter == G| G(Add a C to the temporary list)
  start_loop -->|Next letter == C| C(Add a G to the temporary list)
  A --> letters_left(are there any letters left in the original list)
  T --> letters_left
  G --> letters_left
  C --> letters_left
  letters_left  -->|yes| start_loop 
  letters_left  -->|no| stop_loop(return the temporary list as complement)
  stop_loop --> Stop
https://mermaid.live/6qJRv7el2OuCMr0aF-N-amufwAlQIeyA
Biodata Science: Fundamentals of Programming

🤓 Bonus Task

  • Create and UML activity diagram on how to find an open reading frame
  • If we have multiple, it is ok to stop after You found the first one

Biodata Science: Fundamentals of Programming
Open Reading Frames
  • have the potential to be transcribed into RNA and translated into protein
  • from a start codon (e.g., ATG), through a subsequent region which usually has a length that is a multiple of 3 nucleotides to a stop codon (e.g., TGA) in the same reading frame.

Biodata Science: Fundamentals of Programming
What are the Inputs and Outputs to our Algorithm?
  • Inputs:
    • String (text) of a RNA sequence
  • Outputs:
    • found_open_reading_frame: True or False
    • start_position: number of the first letter of the reading frame (A in the ATG)
    • stop_position: number of the last letter of the reading frame (A in the TGA)
Biodata Science: Fundamentals of Programming
Verbal Description
  • set found_open_reading_frame to False
  • Iterate through the RNA sequence in steps of 1
    • Look at the next three letters
    • if the letters are ATG
      • note the position of A in start_position
      • Iterate through the next three-letter pairs
      • if any of them is TGA
        • note the position of A in stop_position
        • set found_open_reading_frame to True
        • terminate algorithm
    • else go to ne next letter
  • if at the end of RNA sequence
    • terminate algorithm
Biodata Science: Fundamentals of Programming
  • this graphical interpretation of the algorithm is not very precise
  • anyone could interpret this a litte differently
  • computers cannot work with this ambiguity
  • programming languages provide a precise way to describe algorithms
Biodata Science: Fundamentals of Programming
graph TD
  start[Start] --> setFound(Found open_reading_frame = False)
  setFound --> loopStart{Iterate through RNA sequence}
  loopStart -->|Next three letters == ATG| noteStart(Note position of A in start_position)
  loopStart -->|Next three letters != ATG| moveNext
  loopStart -->|No letters left| stop
  moveNext[go to next letter] --> loopStart
  noteStart --> loopPairs{Iterate through next three-letter pairs}
  loopPairs -->|pair == TGA| noteStop(Note position of A in stop_position)
  loopPairs -->|pair is not TGA| continuePairs[go to next pair]
  continuePairs --> loopPairs 
  loopPairs -->|no pair left| loopStart
  noteStop --> setTrue(Set found_open_reading_frame to True)
  setTrue --> stop
https://mermaid.ink/img/pako:eNqNk9FqwjAUhl_lLFcK-gKCg4Iou5iM6dWsSGhPNdDmZOnp2FDffTmpLXM62E0Iyff_OflzclQZ5agmau-1O8B6llqAmrXnzUrGLYzHj1Ajz6mx-SCOQA7tzqPOjd3vCq8rhCnMdVnjMMovdJSWRC46HZ8YvWYEPnhq9gd4XSYBfW_QZngWXY-K8LTETxYWEUrkoK1hOoVkvTiBJcYIDpZhBo5qw4YsUAEJGNvWv-uWh__yfrh4V_SBsn1HRD1cYsGncAw5wTrJZk_ABFbMW3J7HYHAfe391os2vr5Jx_YljlsvcML1QUVVrEvWJZr1IumiIfdnMuRug_nlZWqxaf0ysmxsgxH5eUEht2JwRVxfCm4PsBSVlwTvJEOu67i1b3CwQoZCmml3p-lCLQJ1TSfzVhwfRo1Uhb7SJg_tfRQmVXzAClM1CdMcC92UnKrUngOqG6bVl83UhIPNSDUuD88xMzp8jEpNCunukcLcMPnn9svEnzNSTts3oo45fwPwvi0D?type=png
Biodata Science: Fundamentals of Programming