2 Data Science

Dr. Julian Huber
Management Center Innsbruck

Julian Huber - Data Science

2.1 Linear Algebra


2.1.1 Linear Algebra

🎯 Learning objectives

You will be able to

  • perform matrix addition and multiplication
  • write a linear model as a matrix multiplication
  • describe the meaning of a model, prediction, parameters and predictors

How much linear algebra is there in data science?


Linear Equations and Data Science

  • Imagine you are a forensic scientist working in the Central Identification Lab
  • Your job is to help identify human remains believed to be U.S. military personnel reported missing in action during World War II and other conflicts
  • A team of your colleagues recovers skeletal remains consisting of a pelvic bone, several ribs, and a femur from a 1943 military plane crash on Vanuatu


Linear Equations and Data Science

  • When the remains arrive in your lab, you photograph and measure the bones
  • From the shape of the pelvis, you can quickly tell that the remains most likely belong to an adult male
  • You note that the femur is cm long. Bone length, especially the length of long bones like the femur, is related to an individual’s overall height
  • This relationship is so strong that you can predict an individual’s height if you know the length of one bone in the leg

Missing Person Data

Person         Height
A. Abrahams    163 cm
B. Boyle       172 cm
C. Cornell     183 cm
  • Found femur cm
  • To whom does the femur belong?

Putting Data and Knowledge into Formulas

You plug your measurement into an equation used to estimate the overall height of an adult male based on femur length:

    • ... body height in cm
    • ... femur length in cm
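As a minimal sketch, the structure of such a femur-to-height formula can be written as a Python function. The coefficients `THETA_0` and `THETA_1` are hypothetical placeholders, not the values of the equation used on this slide; only the form "intercept plus slope times femur length" is taken from the text.

```python
# Sketch of a linear model: predicted height = intercept + slope * femur length.
# THETA_0 and THETA_1 are HYPOTHETICAL placeholder values, not the original formula's coefficients.
THETA_0 = 60.0  # intercept in cm (hypothetical)
THETA_1 = 2.5   # slope: cm of body height per cm of femur length (hypothetical)


def predict_height(femur_cm: float) -> float:
    """Return the predicted body height in cm for a given femur length in cm."""
    return THETA_0 + THETA_1 * femur_cm


print(predict_height(40.0))  # 60 + 2.5 * 40 = 160.0
```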
  • Given the predicted height, is this most likely B. Boyle?

Where does the formula come from?

Someone put knowledge about the world into a formula (a model)

  • to make a prediction of the height
    • from now on, we will mark all predictions with a hat (^)
  • we need parameters that describe the model (knowledge)
    • if we want to talk about many parameters, we collect them in a vector
  • We have a predictor (femur length) that we can measure
  • We have a predicted variable, which is the height

We will use matrix notation most of the time
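The same kind of model in matrix notation: placing a column of ones next to the predictor values lets a single matrix multiplication produce all predictions at once. A sketch assuming numpy; the parameter values and femur lengths are hypothetical.

```python
import numpy as np

# HYPOTHETICAL parameter vector theta = [intercept, slope] (placeholder values)
theta = np.array([60.0, 2.5])

# Design matrix: one row per measured femur (cm); the column of ones multiplies the intercept
femur_lengths = np.array([40.0, 44.0, 48.0])  # hypothetical measurements
X = np.column_stack([np.ones_like(femur_lengths), femur_lengths])

# One matrix multiplication yields all predictions: y_hat = X @ theta
y_hat = X @ theta
print(y_hat)  # [160. 170. 180.]
```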


✍️Task: Heart rate monitor

  • Someone gathered heart rate (pulse) data from a subject on an indoor bike
  • Create a model to predict the pulse based on the speed at which the person rides
  • write a linear formula that predicts the pulse based on the speed
  • estimate the numbers from the graph
  • what is the predictor?
  • what are the parameters?
  • what pulse do You predict for ?
  • do You think this prediction is valid?

⌛ 10 min

    • pulse is the predicted variable
    • speed is the predictor
  • parameters of the model

    • an intercept and
    • a slope
  • the formula yields a predicted pulse, but we do not know whether this prediction is valid
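A sketch of such a pulse model that also reports whether the requested speed lies inside the range covered by the measurements, since a prediction far outside that range is an extrapolation the data cannot confirm. `INTERCEPT`, `SLOPE`, the unit km/h and the measured speed range are all hypothetical placeholders, not values read from the original graph.

```python
# Sketch: linear pulse model plus a simple extrapolation check.
# INTERCEPT, SLOPE and the measured speed range are HYPOTHETICAL placeholders.
INTERCEPT = 70.0  # pulse in beats/min at speed 0 (hypothetical)
SLOPE = 3.0       # additional beats/min per km/h (hypothetical)
MEASURED_MIN, MEASURED_MAX = 10.0, 30.0  # speed range covered by the data in km/h (hypothetical)


def predict_pulse(speed_kmh: float) -> tuple[float, bool]:
    """Return (predicted pulse, whether the speed lies inside the measured range)."""
    pulse = INTERCEPT + SLOPE * speed_kmh
    inside = MEASURED_MIN <= speed_kmh <= MEASURED_MAX
    return pulse, inside


print(predict_pulse(20.0))  # (130.0, True)
print(predict_pulse(60.0))  # (250.0, False): extrapolation, the model may not hold here
```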



Matrix Data

  • Among the most common tools in engineering and computer science are rectangular grids of numbers, known as matrices
  • Matrices arose originally as a way to describe systems of linear equations


A Matrix

  • in the entry $a_{ij}$, $i$ indicates the row
  • and $j$ the column
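In numpy (introduced in Section 2.1.2), a matrix is a 2-D array. A short sketch of the row/column convention with arbitrary example values; note that Python counts rows and columns from 0, whereas the mathematical notation usually starts at 1.

```python
import numpy as np

# A 2x3 matrix: 2 rows, 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)  # (2, 3)
# In math notation, the entry a_23 sits in row 2, column 3.
# Python counts from 0, so the same entry is A[1, 2].
print(A[1, 2])  # 6
```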


A Vector

has only one row or column

The transposed vector $v^T$ turns a column vector into a row vector (and vice versa)
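A sketch of a column vector and its transpose in numpy, with arbitrary example values. The vector is stored as a 2-D array of shape (3, 1) so that the row/column distinction is explicit.

```python
import numpy as np

# Column vector (3x1) and its transpose, a row vector (1x3)
v = np.array([[1], [2], [3]])
v_T = v.T

print(v.shape)    # (3, 1)
print(v_T.shape)  # (1, 3)
print(v_T)        # [[1 2 3]]
```

A plain 1-D numpy array such as `np.array([1, 2, 3])` has shape `(3,)` and no row/column orientation, so `.T` leaves it unchanged.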


🧠 Sum of Two Matrices or Two Vectors
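Two matrices (or vectors) of the same shape are added entry by entry, $(A+B)_{ij} = a_{ij} + b_{ij}$. A small numpy sketch with arbitrary example values:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

# Entry-wise sum: (A + B)[i, j] = A[i, j] + B[i, j]; both must have the same shape
C = A + B
print(C)
# [[11 22]
#  [33 44]]

# The same rule applies to vectors
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))  # [5 7 9]
```

numpy's broadcasting also lets arrays of some different shapes be added (for example, a scalar or a single row added to every row). That is a numpy convenience, not matrix addition in the mathematical sense.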


🧠 Product of a Scalar and a Matrix
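Multiplying a matrix by a scalar $c$ scales every entry, $(cA)_{ij} = c \cdot a_{ij}$. A sketch with arbitrary example values:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# Every entry is multiplied by the scalar: (c * A)[i, j] = c * A[i, j]
B = 3 * A
print(B)
# [[ 3  6]
#  [ 9 12]]
```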


🧠 Product of Two Matrices or Two Vectors
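For the product $C = AB$, the entry $c_{ij} = \sum_k a_{ik} b_{kj}$ combines row $i$ of $A$ with column $j$ of $B$, so the number of columns of $A$ must equal the number of rows of $B$. The sketch below (arbitrary example values) compares numpy's `@` operator with explicit loops:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # shape (3, 2)

# Matrix product: entry (i, j) = sum over k of A[i, k] * B[k, j]
C = A @ B                   # shape (2, 2)

# The same result written out with explicit loops
C_loop = np.zeros((A.shape[0], B.shape[1]), dtype=int)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        for k in range(A.shape[1]):
            C_loop[i, j] += A[i, k] * B[k, j]

print(C)
# [[ 7  8]
#  [16 17]]
print(np.array_equal(C, C_loop))  # True
```

In numpy, `A * B` is the element-wise product, not the matrix product; for these two shapes it would raise an error, while `A @ B` (or `np.matmul(A, B)`) computes the matrix product.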


✍️ Solve the following computations

⌛ 5 minutes


Application Matrix Multiplication

  • We want to mix a growth medium
    • we know the composition by weight
    • we want to know the caloric energy and the price
  • We want to apply the same calculation to different data
            Water    Glucose   Vitamins
Sample 1    100 g    10 g      1 g
Sample 2     70 g    20 g      2 g
Sample 3     90 g    10 g      1 g

Example Matrix Multiplication

  • We also have data on the energy density and the price by weight:

                   Water      Glucose     Vitamins
Caloric density    0 kcal/g   4 kcal/g    0 kcal/g
Price              0 €/g      0.02 €/g    0.10 €/g

  • Caloric energy of Sample 1: 100 g · 0 kcal/g + 10 g · 4 kcal/g + 1 g · 0 kcal/g = 40 kcal

  • We could write a for-loop!
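Such a for-loop could look like this in plain Python, using the composition and caloric-density values from the tables (the variable names are illustrative):

```python
# Composition by weight in g: one row per sample; columns = water, glucose, vitamins
samples = [
    [100, 10, 1],  # Sample 1
    [70, 20, 2],   # Sample 2
    [90, 10, 1],   # Sample 3
]
# Caloric density in kcal/g for water, glucose, vitamins
caloric_density = [0, 4, 0]

energies = []
for sample in samples:
    energy = 0
    # Multiply each ingredient's mass by its caloric density and add up
    for grams, kcal_per_g in zip(sample, caloric_density):
        energy += grams * kcal_per_g
    energies.append(energy)

print(energies)  # [40, 80, 40] (kcal)
```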


Example Matrix Multiplication

Caloric energy
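Written as a matrix-vector product, the composition matrix times the caloric-density vector gives the caloric energy of every sample in one step, replacing a loop over the samples. A sketch with numpy, using the values from the tables:

```python
import numpy as np

# Composition matrix (3 samples x 3 ingredients) in g
M = np.array([[100, 10, 1],
              [70, 20, 2],
              [90, 10, 1]])
# Caloric density vector (3 ingredients) in kcal/g
e = np.array([0, 4, 0])

# Row i of M times e gives the caloric energy of sample i
energy = M @ e
print(energy)  # [40 80 40]
```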


✍️ Task

  • What is the price for each sample?
  • Could we also write this in one formula?

Price

⌛ 5 minutes


Price


even more convenient:
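One way to get caloric energy and price in a single multiplication, which may differ from the slide's exact notation: put one property per column in a matrix $P$ (ingredients × properties) and compute $MP$. The result has one row per sample and one column per property. The values are taken from the tables above.

```python
import numpy as np

# Composition in g: one row per sample; columns = water, glucose, vitamins
M = np.array([[100, 10, 1],
              [70, 20, 2],
              [90, 10, 1]])

# One column per property: caloric density (kcal/g) and price (€/g)
P = np.array([[0, 0.00],   # water
              [4, 0.02],   # glucose
              [0, 0.10]])  # vitamins

result = M @ P                    # shape (3 samples, 2 properties)
print(result[:, 0])               # caloric energy in kcal: [40. 80. 40.]
print(np.round(result[:, 1], 2))  # price in €: [0.3 0.6 0.3]
```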


2.1.2 Python Packages and numpy

🎯 Learning objectives

You will be able to

  • load and install additional packages in Google Colab
  • define and manipulate numpy arrays
  • apply mathematical operations to numpy arrays
  • solve systems of linear equations with Python

What is a Python package

  • a collection of modules containing functions. Related modules are usually grouped in the same package. When a program needs a module from an external package, that package can be imported and its modules used.
  • for instance, numpy provides a data structure for matrices
  • (most) Python packages are open source and can be used by anyone
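A minimal sketch of loading a package. In Google Colab, numpy comes preinstalled; a missing package can be installed from a notebook cell with the `!pip install` shell command, shown here only as a comment because it is not Python syntax.

```python
# In a Colab cell, a missing package could be installed with:  !pip install <package-name>
import numpy as np  # "np" is the conventional short alias for numpy

print(np.__version__)  # version string of the installed numpy
```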


2.1 Matrix Data and numpy

⌛ 45 minutes


How we will work together

  • Before You start, put the red card on top; this indicates that You are still working on the challenge
  • ✍️ are simple practical tasks You should try on Your own
  • 🏆 are more challenging practical tasks, where You can work in a group
  • 🤓 are optional tasks for when You want to learn more
  • 🏁 Once You reach the recap mark, switch the cards. A green card indicates that everything is clear, a yellow card that we should discuss the solution together
  • At any time, if You have a question: Raise Your hand

🤓 Example Systems of Linear Equations

  • We will use Linear Algebra in Machine Learning and regression models
  • however, there are also other useful applications
  • imagine You want to find the intersection of
    • Equation I:
    • Equation II:
  • for two equations, the crossing point is easy to calculate


🤓 Matrix formulation

  • Equation I can be written as

  • Equation II can be written as

  • We can rewrite this system in matrix form

  • Computers are efficient in solving large linear equation systems
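The slide's equations are not recoverable here, so the following sketch uses two hypothetical lines to show the workflow: rewrite each equation as coefficients times $(x, y)$, collect the coefficients in a matrix, and call `np.linalg.solve`.

```python
import numpy as np

# HYPOTHETICAL example (not the equations from the slide):
#   Equation I:  y = 2x + 1   ->  -2x + y = 1
#   Equation II: y = -x + 4   ->    x + y = 4
A = np.array([[-2.0, 1.0],
              [1.0, 1.0]])  # coefficients of x and y
b = np.array([1.0, 4.0])    # right-hand sides

solution = np.linalg.solve(A, b)  # solves A @ [x, y] = b
print(solution)  # [1. 3.] -> intersection at x = 1, y = 3
```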

🤓 System of Linear Equations with no intersection

  • Equation I:

  • Equation II:

  • there is no solution: the two lines are parallel and never intersect
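A sketch with two hypothetical parallel lines (same slope, different intercepts), not the slide's equations: the coefficient matrix is singular, its determinant is zero, and `np.linalg.solve` raises `np.linalg.LinAlgError`.

```python
import numpy as np

# HYPOTHETICAL parallel lines (same slope, different intercepts):
#   Equation I:  y = 2x + 1  ->  -2x + y = 1
#   Equation II: y = 2x + 3  ->  -2x + y = 3
A = np.array([[-2.0, 1.0],
              [-2.0, 1.0]])
b = np.array([1.0, 3.0])

print(np.linalg.det(A))  # (numerically) zero: the coefficient matrix is singular

try:
    np.linalg.solve(A, b)
    has_solution = True
except np.linalg.LinAlgError:
    has_solution = False

print(has_solution)  # False
```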


(Optional) Case Study: Solving systems of linear equations with Python

  • You want to create a new growth medium on industrial scale.

  • You base the new medium on two existing products (A and B).

  • You want to create 400 kg of the new mixture.

  • Component A costs 18 €. Component B costs 22 €.

  • How much (kg) of A and B do You need if the new mixture should cost 19.50 €?

  • First, set up two equations from what You know. Then reformulate them as a matrix multiplication and solve them using numpy.
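A possible solution sketch, assuming that the prices 18 €, 22 € and 19.50 € are per kg (the text does not state the unit): the total mass gives one equation, the total cost the other.

```python
import numpy as np

# ASSUMPTION: all prices are per kg.
# Unknowns: a = kg of component A, b = kg of component B
#   (1) a + b = 400                total mass in kg
#   (2) 18 a + 22 b = 19.5 * 400   total cost in €
coefficients = np.array([[1.0, 1.0],
                         [18.0, 22.0]])
rhs = np.array([400.0, 19.5 * 400])

a, b = np.linalg.solve(coefficients, rhs)
print(f"A: {a:.1f} kg, B: {b:.1f} kg")  # A: 250.0 kg, B: 150.0 kg
```

Check: 250 kg + 150 kg = 400 kg, and 18 · 250 + 22 · 150 = 7800 € = 19.50 € · 400.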
