2.1 Linear Algebra

Bio Data-Science

2.1.1 Linear Algebra

🎯 Learning objectives

You will be able to

  • perform matrix addition and multiplication
  • write a linear model as a matrix multiplication
  • describe the meaning of a model, prediction, parameters and predictors

How much linear algebra is there in data science?

https://www.quora.com/How-much-linear-algebra-is-there-in-data-science

Linear Equations and Data Science

  • Imagine you are a forensic scientist working in the Central Identification Lab
  • Your job is to help identify human remains believed to be U.S. military personnel reported missing in action during World War II and other conflicts
  • A team of your colleagues recovers skeletal remains consisting of a pelvic bone, several ribs, and a femur from a 1943 military plane crash on Vanuatu
https://www.visionlearning.com/en/library/Math-in-Science/62/Linear-Equations-in-Science/194, https://boneidentification.com/bones/human-femur/

Linear Equations and Data Science

  • When the remains arrive in your lab, you photograph and measure the bones
  • From the shape of the pelvis, you can quickly tell that the remains most likely belong to an adult male
  • You note that the femur is 47.5 cm long. Bone length, especially the length of long bones like the femur, is related to an individual’s overall height
  • This relationship is so strong that you can predict an individual’s height if you know the length of one bone in the leg

Missing Person Data

| Person | Height |
|---|---|
| A. Abrahams | 163 cm |
| B. Boyle | 172 cm |
| C. Cornell | 183 cm |

  • Found femur: 47.5 cm
  • Who does the femur belong to?

Putting Data and Knowledge into Formulas

You plug your measurement into an equation used to estimate the overall height of an adult male based on femur length:

  • $H = 1.88 \frac{\text{cm}}{\text{cm}}(L) + 81.3 \text{ cm}$

    • $H$ ... body height in cm
    • $L$ ... femur length in cm
  • $H = (1.88 \times 47.5) \text{ cm} + 81.3 \text{ cm}$

  • $H = 170.6 \text{ cm}$
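
The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `estimate_height_cm` is made up for illustration, and the two constants are simply the numbers from the formula.

```python
def estimate_height_cm(femur_length_cm: float) -> float:
    """Estimate adult male body height (cm) from femur length (cm)."""
    slope = 1.88       # cm of body height per cm of femur length
    intercept = 81.3   # cm
    return slope * femur_length_cm + intercept


height = estimate_height_cm(47.5)
print(f"Estimated height: {height:.1f} cm")  # Estimated height: 170.6 cm
```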

| Person | Height |
|---|---|
| A. Abrahams | 163 cm |
| B. Boyle | 172 cm |
| C. Cornell | 183 cm |

  • Given $H = 170.6 \text{ cm}$, we have probably found B. Boyle
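
The comparison with the missing-person list could be sketched as follows. The dictionary `missing_persons` and the "closest recorded height" rule are illustrative assumptions; a real identification would of course rely on more than one number.

```python
# Recorded heights in cm, taken from the table above
missing_persons = {"A. Abrahams": 163, "B. Boyle": 172, "C. Cornell": 183}

estimated_height = 1.88 * 47.5 + 81.3  # cm, from the femur formula

# Pick the person whose recorded height is closest to the estimate
best_match = min(
    missing_persons,
    key=lambda name: abs(missing_persons[name] - estimated_height),
)
print(best_match)  # B. Boyle
```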

Where does the formula come from?

$$\hat{f}(x_1) = \beta_0 + \beta_1 \cdot x_1 = 81.3 \text{ cm} + 1.88 \frac{\text{cm}}{\text{cm}} \cdot x_1 = y$$

Someone put knowledge about the world into a formula $f$ (the model)

  • to make a prediction of the height, $\hat{f}$
    • from now on, we will mark all predictions with a hat ($\hat{}$)
  • We need parameters $\vec{\beta}$ that describe the model (knowledge)
    • If we want to talk about several parameters (e.g., $\beta_0$ and $\beta_1$), we put them into a vector ($\vec{\beta}$)
  • We have a predictor $x_1$ (femur length) that we can measure
  • We have a predicted variable $y$, which is the height

We will use matrix notation most of the time

$$f(x_1) = \vec{\beta} \times \vec{x} = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix} \times \begin{bmatrix} 1 \\ x_1 \end{bmatrix} = \begin{bmatrix} 81.3 & 1.88 \end{bmatrix} \times \begin{bmatrix} 1 \\ x_1 \end{bmatrix}$$

$$= 81.3 + 1.88 \cdot x_1$$
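
The same product can be written with numpy. This is a sketch under the assumption that the femur length of 47.5 cm from the earlier slides is used as $x_1$; the variable names are chosen for illustration.

```python
import numpy as np

beta = np.array([81.3, 1.88])  # [intercept beta_0, slope beta_1]
x = np.array([1.0, 47.5])      # [constant 1, femur length x_1 in cm]

# The @ operator multiplies the parameter vector with the predictor vector
prediction = beta @ x
print(round(float(prediction), 1))  # 170.6
```

The leading 1 in `x` is what lets the intercept $\beta_0$ take part in the multiplication.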


✍️Task: Heart rate monitor

  • Someone gathered heart rate (pulse) data of a subject on an indoor bike
  • Create a model to predict the pulse based on the speed the person goes
  • write a linear formula that predicts the pulse ($p$) based on the speed ($s$)
  • estimate the numbers from the graph
  • what is the predictor?
  • what are the parameters?
  • what pulse do You predict for $20 \text{ kph}$?
  • do You think this prediction is valid?

⌛ 10 min

  • $p(s) = 65 + s \cdot \frac{20 \text{ bpm}}{5 \text{ kph}}$

    • $p$ pulse is the predicted variable
    • $s$ speed is the predictor
  • parameters of the model

    • $\beta_0 = 65$ (intercept) and
    • $\beta_1 = \frac{20 \text{ bpm}}{5 \text{ kph}}$ (slope)
  • $145 \text{ bpm}$, but we do not know whether this prediction is valid

  • $p(s) = \vec{\beta} \cdot \vec{x} = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix} \times \begin{bmatrix} 1 \\ s \end{bmatrix} = \begin{bmatrix} 65 & 4 \end{bmatrix} \times \begin{bmatrix} 1 \\ s \end{bmatrix}$
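
As a sketch, the pulse model can be evaluated for several speeds at once by stacking one row $[1, s]$ per speed. The speeds below are hypothetical example values, not measurements from the graph.

```python
import numpy as np

beta = np.array([65.0, 4.0])               # [intercept in bpm, slope in bpm per kph]
speeds = np.array([0.0, 5.0, 10.0, 20.0])  # hypothetical speeds in kph

# One row [1, s] per speed, so X @ beta evaluates the model for all speeds
X = np.column_stack([np.ones_like(speeds), speeds])
pulse = X @ beta
print(pulse)
```

The model happily returns a number for any speed; whether a value such as the one for 20 kph is trustworthy depends on whether such speeds were part of the observed data.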


Matrix Data

  • Among the most common tools in engineering and computer science are rectangular grids of numbers known as matrices
  • Matrices arose originally as a way to describe systems of linear equations
https://news.mit.edu/2013/explained-matrices-1206#:~:text=The numbers in a matrix,of much more complicated calculations.

A Matrix

$$A = \begin{bmatrix} 65 & 154 \\ 73 & 182 \\ 68 & 167 \end{bmatrix}$$

$$A = (a_{i,j}) = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{bmatrix}$$

  • $i$ indicates the row
  • $j$ indicates the column
https://www.utstat.toronto.edu/~brunner/books/LinearModelsInStatistics.pdf
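
In numpy, the matrix $A$ from this slide could be stored as a two-dimensional array. One thing to keep in mind in this sketch: numpy counts rows and columns from 0, so the element $a_{1,2}$ is addressed as `A[0, 1]`.

```python
import numpy as np

A = np.array([[65, 154],
              [73, 182],
              [68, 167]])

print(A.shape)  # (3, 2): 3 rows, 2 columns
print(A[0, 1])  # 154, i.e. a_{1,2}
print(A[2, 0])  # 68, i.e. a_{3,1}
```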

A Vector

has only one row or column

$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

transposed vector

$$\vec{x}' = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}$$

$$(\vec{x}')' = \vec{x}$$
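
A small numpy sketch of transposition, with example values 1, 2, 3 standing in for $x_1, x_2, x_3$. The column vector is stored with shape `(3, 1)`, because transposing a plain one-dimensional numpy array leaves it unchanged.

```python
import numpy as np

x = np.array([[1], [2], [3]])  # column vector, shape (3, 1)
x_t = x.T                      # row vector, shape (1, 3)

print(x_t)                       # [[1 2 3]]
print(np.array_equal(x_t.T, x))  # True: transposing twice gives x back
```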


🧠 Sum of Two Matrices or Two Vectors

$$A + B = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \\ b_{3,1} & b_{3,2} \end{bmatrix} = \begin{bmatrix} a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} \\ a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} \\ a_{3,1} + b_{3,1} & a_{3,2} + b_{3,2} \end{bmatrix}$$
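
With numpy arrays, the `+` operator performs exactly this element-wise addition. The numbers below are arbitrary example values.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[10, 20],
              [30, 40],
              [50, 60]])

# Each entry of the result is a_{i,j} + b_{i,j}
C = A + B
print(C)
```

Both matrices need the same number of rows and columns for this to correspond to the matrix sum defined above.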


🧠 Product of a Scalar and a Matrix

$$c \cdot A = \begin{bmatrix} c \cdot a_{1,1} & c \cdot a_{1,2} \\ c \cdot a_{2,1} & c \cdot a_{2,2} \\ c \cdot a_{3,1} & c \cdot a_{3,2} \end{bmatrix}$$
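
In numpy, multiplying an array by a plain number scales every entry, as in the formula above. A brief sketch with arbitrary example values:

```python
import numpy as np

c = 2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Every entry a_{i,j} is multiplied by the scalar c
scaled = c * A
print(scaled)
```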


🧠 Product of Two Matrices or Two Vectors

$$A \times B = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} \cdot \begin{bmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{bmatrix}$$

$$= \begin{bmatrix} a_{1,1} b_{1,1} + a_{1,2} b_{2,1} & a_{1,1} b_{1,2} + a_{1,2} b_{2,2} \\ a_{2,1} b_{1,1} + a_{2,2} b_{2,1} & a_{2,1} b_{1,2} + a_{2,2} b_{2,2} \end{bmatrix}$$
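
A common pitfall in numpy is that `*` multiplies entry by entry, while the matrix product defined above is written with `@` (or `np.matmul`). The sketch below contrasts the two on arbitrary example values.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

matrix_product = A @ B   # rows of A times columns of B
elementwise = A * B      # a_{i,j} * b_{i,j}, NOT the matrix product

print(matrix_product)    # [[19 22]
                         #  [43 50]]
print(elementwise)       # [[ 5 12]
                         #  [21 32]]
```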


✍️ Solve the following computations

⌛ 5 minutes

$$\begin{bmatrix} 2 & 2 \\ 3 & 4 \\ 5 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$$


$$\begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$


$$\begin{bmatrix} 2 & 2 \\ 3 & 4 \\ 5 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 3 & 4 \\ 5 & 2 \end{bmatrix}$$


$$\begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 + 2 \cdot 0 & 2 \cdot 0 + 2 \cdot 0 \\ 3 \cdot 1 + 4 \cdot 0 & 3 \cdot 0 + 4 \cdot 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 3 & 0 \end{bmatrix}$$
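
Hand calculations like these two can be double-checked with numpy. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

# The sum from the first computation
sum_result = np.array([[2, 2], [3, 4], [5, 1]]) + np.array([[1, 0], [0, 0], [0, 1]])

# The matrix product from the second computation
product_result = np.array([[2, 2], [3, 4]]) @ np.array([[1, 0], [0, 0]])

print(sum_result)
print(product_result)
```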


Application Matrix Multiplication

  • We want to mix a growth medium
    • we know the composition by weight
    • we want to know the caloric energy and the price
  • We want to apply the same calculation to different data

| | Water | Glucose | Vitamins |
|---|---|---|---|
| Sample 1 | 100 g | 10 g | 1 g |
| Sample 2 | 70 g | 20 g | 2 g |
| Sample 3 | 90 g | 10 g | 1 g |
https://bioscience.lonza.com/lonza_bs/AT/en/Primary-and-Stem-Cells/p/000000000000185329/LGM-3-Lymphocyte-Growth-Medium-3

Example Matrix Multiplication

  • We also have the data of the energy density and the price by weight:

| | Water | Glucose | Vitamins |
|---|---|---|---|
| Caloric density | 0 kcal/g | 4 kcal/g | 0 kcal/g |
| Price | 0 €/g | 0.02 €/g | 0.10 €/g |
  • Caloric energy of Sample 1

    • $100 \text{ g} \cdot 0 \text{ kcal/g} + 10 \text{ g} \cdot 4 \text{ kcal/g} + 1 \text{ g} \cdot 0 \text{ kcal/g} = 40 \text{ kcal}$
  • We could write a for-loop!
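
Such a for-loop might look like the sketch below. It uses plain Python lists filled with the numbers from the two tables; the variable names are chosen for illustration.

```python
# Composition by weight in grams: [water, glucose, vitamins]
samples = [
    [100, 10, 1],  # Sample 1
    [70, 20, 2],   # Sample 2
    [90, 10, 1],   # Sample 3
]
caloric_density = [0, 4, 0]  # kcal per gram for water, glucose, vitamins

energies = []
for sample in samples:
    energy = 0
    # Multiply each weight by its caloric density and add everything up
    for grams, kcal_per_gram in zip(sample, caloric_density):
        energy += grams * kcal_per_gram
    energies.append(energy)

print(energies)  # [40, 80, 40]
```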


Example Matrix Multiplication

Caloric energy

$$\begin{bmatrix} 100 \text{ g} & 10 \text{ g} & 1 \text{ g} \\ 70 \text{ g} & 20 \text{ g} & 2 \text{ g} \\ 90 \text{ g} & 10 \text{ g} & 1 \text{ g} \end{bmatrix} \cdot \begin{bmatrix} 0 \text{ kcal/g} \\ 4 \text{ kcal/g} \\ 0 \text{ kcal/g} \end{bmatrix} = \begin{bmatrix} 40 \text{ kcal} \\ 80 \text{ kcal} \\ 40 \text{ kcal} \end{bmatrix}$$
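
In numpy, this matrix-vector product is a single line. A sketch with the same numbers (units are dropped, since numpy arrays hold plain numbers):

```python
import numpy as np

# Rows: samples 1 to 3, columns: water, glucose, vitamins (in g)
composition = np.array([[100, 10, 1],
                        [70, 20, 2],
                        [90, 10, 1]])
caloric_density = np.array([0, 4, 0])  # kcal/g

energy = composition @ caloric_density  # kcal per sample
print(energy)  # [40 80 40]
```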


✍️ Task

  • What is the price for each sample?
  • Could we also write this in one formula?

Price

$$\begin{bmatrix} 100 \text{ g} & 10 \text{ g} & 1 \text{ g} \\ 70 \text{ g} & 20 \text{ g} & 2 \text{ g} \\ 90 \text{ g} & 10 \text{ g} & 1 \text{ g} \end{bmatrix} \cdot \begin{bmatrix} 0.00 \text{ €/g} \\ 0.02 \text{ €/g} \\ 0.10 \text{ €/g} \end{bmatrix} = \begin{bmatrix} ... \\ ... \\ ... \end{bmatrix}$$

⌛ 5 minutes


Price

$$\begin{bmatrix} 100 \text{ g} & 10 \text{ g} & 1 \text{ g} \\ 70 \text{ g} & 20 \text{ g} & 2 \text{ g} \\ 90 \text{ g} & 10 \text{ g} & 1 \text{ g} \end{bmatrix} \cdot \begin{bmatrix} 0.00 \text{ €/g} \\ 0.02 \text{ €/g} \\ 0.10 \text{ €/g} \end{bmatrix} = \begin{bmatrix} 100 \cdot 0 + 10 \cdot 0.02 + 1 \cdot 0.1 \\ 70 \cdot 0 + 20 \cdot 0.02 + 2 \cdot 0.1 \\ 90 \cdot 0 + 10 \cdot 0.02 + 1 \cdot 0.1 \end{bmatrix} \text{€} = \begin{bmatrix} 0.30 \text{ €} \\ 0.60 \text{ €} \\ 0.30 \text{ €} \end{bmatrix}$$


Even more convenient: put both properties into one matrix

$$\begin{bmatrix} 100 \text{ g} & 10 \text{ g} & 1 \text{ g} \\ 70 \text{ g} & 20 \text{ g} & 2 \text{ g} \\ 90 \text{ g} & 10 \text{ g} & 1 \text{ g} \end{bmatrix} \cdot \begin{bmatrix} 0 \text{ kcal/g} & 0.00 \text{ €/g} \\ 4 \text{ kcal/g} & 0.02 \text{ €/g} \\ 0 \text{ kcal/g} & 0.10 \text{ €/g} \end{bmatrix} = \begin{bmatrix} 40 \text{ kcal} & 0.30 \text{ €} \\ 80 \text{ kcal} & 0.60 \text{ €} \\ 40 \text{ kcal} & 0.30 \text{ €} \end{bmatrix}$$
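
A numpy sketch of this combined calculation, with the caloric density and the price as the two columns of a second matrix (the name `properties` is chosen for illustration):

```python
import numpy as np

# Rows: samples 1 to 3, columns: water, glucose, vitamins (in g)
composition = np.array([[100, 10, 1],
                        [70, 20, 2],
                        [90, 10, 1]])

# Rows: water, glucose, vitamins; columns: kcal/g, €/g
properties = np.array([[0, 0.00],
                       [4, 0.02],
                       [0, 0.10]])

# Column 0 of the result is the energy in kcal, column 1 the price in €
result = composition @ properties
print(np.round(result, 2))
```

Adding a further property, for example a second price list, would only mean adding another column to `properties`.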


2.1.2 Python Packages and numpy

🎯 Learning objectives

You will be able to

  • load and install additional packages in Google Colab
  • define and manipulate numpy arrays
  • apply mathematical operations to numpy arrays
  • solve systems of linear equations with Python

What is a Python package

  • A package is a collection of modules with functions. Related modules are usually put in the same package. When a program needs a module from an external package, the package is imported and its modules can then be used.
  • for instance, numpy provides a data structure for matrices
  • (most) Python packages are open source and can be used by anyone
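
Importing and using a package could look like the sketch below. The install command is shown only as a comment because it is notebook syntax rather than Python; to my knowledge numpy already comes preinstalled in Google Colab, so the command mainly illustrates the pattern for other packages.

```python
# In a Google Colab cell, a missing package can be installed with the
# notebook shell command:  !pip install numpy

import numpy as np  # import the package under its conventional alias

print(np.__version__)     # version of the installed package
zeros = np.zeros((2, 3))  # call a function that the package provides
print(zeros.shape)        # (2, 3)
```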

1 Matrix Data and numpy

⌛ 45 minutes


How we will work together

  • Before You start, put the red card on top; this indicates that You are still working on the challenge
  • ✍️ are simple practical tasks You should try on Your own
  • 🏆 are more challenging practical tasks, where You can work in a group
  • 🤓 are optional tasks, if You want to learn more
  • 🏁 Once You reach the recap mark, switch the cards. A green card indicates that everything is clear, a yellow card that we should discuss the solution together
  • At any time, if You have a question: Raise Your hand

🤓 Example Systems of Linear Equations

  • We will use Linear Algebra in Machine Learning and regression models
  • however, there are also other useful applications
  • imagine You want to find the intersection of
    • Equation I: $2y + x = 6$
    • Equation II: $2y + 2 = 3x$
  • for two equations, the intersection point is easy to calculate
https://www.mathe-online.at/lernpfade/Lineare_Gleichungssysteme/?kapitel=2

🤓 Matrix formulation

  • Equation I: $2y + x = 6$ can be written as $y = 3 - 0.5x$

  • Equation II: $2y + 2 = 3x$ can be written as $y = -1 + 1.5x$

  • We can rewrite this system in matrix form

    $$A \times \vec{b} = \vec{c}$$

    $$\begin{bmatrix} 1 & 2 & -6 \\ -3 & 2 & 2 \end{bmatrix} \times \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

  • Computers are efficient in solving large linear equation systems
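
One way to let numpy find the intersection is `np.linalg.solve`. Note the assumption in this sketch: the constants are moved to the right-hand side, so the system reads $x + 2y = 6$ and $-3x + 2y = -2$ instead of the form with a 1 appended to the vector of unknowns.

```python
import numpy as np

# Equation I:    1*x + 2*y =  6
# Equation II:  -3*x + 2*y = -2
A = np.array([[1.0, 2.0],
              [-3.0, 2.0]])
c = np.array([6.0, -2.0])

solution = np.linalg.solve(A, c)  # solves A @ [x, y] = c
x, y = solution
print(x, y)
```

Plugging the result back in, for example with `A @ solution`, should reproduce `c` up to rounding.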

🤓 System of Linear Equations with no intersection

  • Equation I: $3y - 3x = 3$

  • Equation II: $2y + 2 = 2x$

  • there is no solution: both lines have slope 1 ($y = x + 1$ and $y = x - 1$), so they are parallel

    $$\begin{bmatrix} -3 & 3 & -3 \\ -2 & 2 & 2 \end{bmatrix} \times \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

https://www.mathe-online.at/lernpfade/Lineare_Gleichungssysteme/?kapitel=2
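
Whether such a system has a solution can be checked numerically by comparing matrix ranks. This is a sketch of one possible check, not the only one. As an assumption, the constants are moved to the right-hand side, giving $-3x + 3y = 3$ and $-2x + 2y = -2$.

```python
import numpy as np

# Equation I:   -3*x + 3*y =  3
# Equation II:  -2*x + 2*y = -2
A = np.array([[-3.0, 3.0],
              [-2.0, 2.0]])
c = np.array([3.0, -2.0])

rank_A = np.linalg.matrix_rank(A)
rank_augmented = np.linalg.matrix_rank(np.column_stack([A, c]))

# A solution exists only if appending c does not increase the rank
has_solution = rank_A == rank_augmented
print(rank_A, rank_augmented, has_solution)
```

Here the coefficient matrix has rank 1 (its rows are multiples of each other) while the augmented matrix has rank 2, which is the numerical counterpart of two parallel lines.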

(Optional) Case Study: Solving systems of linear equations with Python

  • You want to create a new growth medium on industrial scale.

  • You base the new medium on two existing products (A and B).

  • You want to create 400 kg of the new mixture.

  • Component A costs 18 € per kg. Component B costs 22 € per kg.

  • How much (kg) of A and B do You need, if the new mixture should cost 19.50 € per kg?

  • First, create two formulas by what You know. Then reformulate them as a matrix multiplication and solve them using numpy.

https://bioscience.lonza.com/lonza_bs/AT/en/Primary-and-Stem-Cells/p/000000000000185329/LGM-3-Lymphocyte-Growth-Medium-3
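
One possible solution sketch for the case study, under the assumption that all prices are meant per kilogram. The two formulas are then $m_A + m_B = 400$ and $18 \cdot m_A + 22 \cdot m_B = 19.5 \cdot 400$; the variable names are chosen for illustration.

```python
import numpy as np

total_mass = 400.0   # kg of the new mixture
price_a = 18.0       # € per kg (assumed unit)
price_b = 22.0       # € per kg (assumed unit)
target_price = 19.5  # € per kg (assumed unit)

# Row 1: mass_a + mass_b = total_mass
# Row 2: price_a * mass_a + price_b * mass_b = target_price * total_mass
A = np.array([[1.0, 1.0],
              [price_a, price_b]])
c = np.array([total_mass, target_price * total_mass])

mass_a, mass_b = np.linalg.solve(A, c)
print(f"A: {mass_a:.1f} kg, B: {mass_b:.1f} kg")  # A: 250.0 kg, B: 150.0 kg
```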