Linear Algebra Essentials for Machine Learning Developers

Machine learning developers need a strong understanding of linear algebra concepts. Learn about vectors, matrices, and eigenvalues to enhance your AI programming skills.

Are you interested in learning more about AI programming and machine learning? If so, a solid grasp of linear algebra is crucial for understanding the intricacies of model training, optimization, and data transformation. 

This article will cover the most critical linear algebra concepts for machine learning, including vectors, matrices, and eigenvalues. Whether you're a novice or looking to strengthen your foundational knowledge, our goal is to equip you with the tools and intuition to dive deeper into ML development.

What Are Vectors?

Vectors can be defined in different ways depending on the context. For example, in mathematics and physics, a vector is typically defined as an object with both magnitude and direction in space. That is why you will often see mathematical vectors represented by arrows.

However, in computer science, a vector is a list-like structure that can grow or shrink in size. Machine learning developers use vectors to represent data points and enable various mathematical operations. This concept is applied in programming languages like C++, where std::vector is part of the standard template library.
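To make this concrete, here is a minimal sketch in Python using NumPy (an assumption on our part, chosen because it is the de facto standard for ML work) of a single data point represented as a vector:

```python
import numpy as np

# A single data point (say, a house) represented as a feature vector:
# [square meters, number of rooms, age in years]
house = np.array([120.0, 3.0, 25.0])

print(house.shape)   # (3,) -- a 3-dimensional vector
print(house * 2.0)   # element-wise math comes for free: [240.   6.  50.]
```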

What Are Vector Operations?

We can use vectors to perform several operations. Addition and subtraction are easy to grasp, but others, like scalar multiplication and the dot product, have specific mathematical properties and interpretations. Let's explore some of these operations and their properties.

Vector Addition

Vectors are represented as linear objects that project from an origin point in a plane (where the x-axis and y-axis intersect). Imagine that we have a vector $A = [2, 4]$ and another vector $B = [3, 2]$. To add these vectors geometrically, we place the tail of one vector at the tip of the other, combining the magnitudes and directions of both vectors into a single resultant vector.

[Figure: vector addition of A and B producing the resultant vector D]

As we can see in the geometrical representation above, we perform vector addition by adding the corresponding components of vector A and vector B. This produces a new vector, $D = [5, 6]$. Geometrically, this is the tip-to-tail construction: placing vector B at the tip of vector A (or vector A at the tip of vector B) leads to the same endpoint, which is the tip of the resultant vector D. This demonstrates how two forces or movements can combine in space.

Also, it is worth noting that this operation is both commutative and associative: $A + B = B + A$, and $(A + B) + C = A + (B + C)$.
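Component-wise addition and its commutativity are easy to verify in code. Here is a quick NumPy sketch using the vectors A and B from above:

```python
import numpy as np

A = np.array([2, 4])
B = np.array([3, 2])

# Adding corresponding components produces the resultant vector D.
D = A + B
print(D)                              # [5 6]

# Commutativity: A + B equals B + A.
print(np.array_equal(A + B, B + A))   # True
```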

Scalar Multiplication

A scalar is a single real number used to multiply each component of a vector. To perform scalar multiplication, we take a vector and multiply each of its components by the scalar. This stretches or compresses the vector, and multiplying by a negative scalar also reverses the vector's direction, extending it the opposite way.

Scalar multiplication adheres to the distributive property: multiplying a scalar by the sum of two vectors is equivalent to multiplying the scalar by each vector individually and then adding the results, i.e., $c(A + B) = cA + cB$. The operation affects every component uniformly, so the vector keeps the same line of direction while its scale (and, for negative scalars, its orientation along that line) changes.

In the example below, we perform scalar multiplication on vector $B = [2, 2]$ using 2 and -2 as scalars.

[Figures: vector B scaled by 2, and vector B scaled by -2]
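Here is the same scalar multiplication expressed in NumPy, showing the stretch, the direction reversal with a negative scalar, and the distributive property (the second vector used for the distributive check is an arbitrary example of ours):

```python
import numpy as np

B = np.array([2, 2])

print(2 * B)    # [4 4]   -- stretched, same direction
print(-2 * B)   # [-4 -4] -- stretched and flipped to the opposite direction

# Distributive property: c * (A + B) == c * A + c * B
A = np.array([1, 3])   # arbitrary second vector for the check
c = 2
print(np.array_equal(c * (A + B), c * A + c * B))   # True
```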

Dot Product

The dot product, also known as the inner product, is an operation that takes two equal-length vectors and returns a single scalar. Engineers use it to determine the angle between vectors, project one vector onto another, and measure vector similarity. 

If we have vector A and vector B, the dot product is a number obtained by multiplying each component of vector A with its corresponding component in vector B and then summing the results. In two dimensions, the formula is $A \cdot B = A_1B_1 + A_2B_2$. This operation adheres to some algebraic properties: it is commutative, ensuring $A \cdot B = B \cdot A$; it is distributive over vector addition; and it is compatible with scalar multiplication, so $(cA) \cdot B = c(A \cdot B)$.

An interesting aspect of the dot product is its capacity to reveal angular relationships between vectors. For instance, when two vectors are perpendicular (orthogonal), their dot product will always equal 0. Let’s illustrate this with an example. 

Consider two vectors: vector $V = [-4, 4]$ and vector $U = [4, 4]$. Together, both vectors form a 90-degree angle. If we compute the dot product, $V \cdot U = (-4)(4) + (4)(4) = -16 + 16 = 0$, the result is zero, which confirms orthogonality and highlights how dot products can help us identify the geometric and spatial relationships between vectors.

[Figure: orthogonal vectors V and U forming a 90-degree angle]

Alternatively, let's see what happens if we have two vectors pointing in opposite directions. In the example below, vectors V and U form a 180-degree angle. The dot product comes out to -32, which equals the negative of the product of the two vectors' magnitudes (more on vector magnitude later). This confirms mathematically that the vectors point in exactly opposite directions.

[Figure: vectors V and U pointing in opposite directions, forming a 180-degree angle]
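Both dot product calculations are straightforward to reproduce in NumPy. The second pair of vectors below is an assumption on our part, chosen to match the -32 example above (two opposite vectors of equal magnitude):

```python
import numpy as np

# Perpendicular vectors: the dot product is zero.
V = np.array([-4, 4])
U = np.array([4, 4])
print(np.dot(V, U))   # 0

# Opposite vectors (assumed values consistent with the -32 example above).
V2 = np.array([-4, 4])
U2 = np.array([4, -4])
print(np.dot(V2, U2))                             # -32
print(-np.linalg.norm(V2) * np.linalg.norm(U2))   # approximately -32.0
```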

Vector Length

The vector length, also known as the magnitude or norm of a vector, is a measure of the vector's size or length in space. It represents the distance from the vector's starting point (often the origin) to its endpoint. Calculating the vector length is helpful when computing the scalar and vector projection, which is a key concept when changing the basis vector or projecting a vector onto a new set of basis vectors.

To calculate a vector's magnitude or length, we need a coordinate system established through orthogonal axes. Consider the basis vectors $J = [0, 1]$ and $I = [1, 0]$, positioned perpendicularly to each other. Each axis represents a specific dimension, and the coordinates of a point or vector represent its respective location along these dimensions. This framework facilitates operations like vector addition, scalar multiplication, and the dot product, enhancing our capacity to visualize and dissect geometric transformations. 

Let's illustrate this with a new vector $A = [5, 4]$. We can decompose it into $A = 5I + 4J$. Scaling $I$ and $J$ by factors of 5 and 4 yields $5I = [5, 0]$ and $4J = [0, 4]$, respectively; adding them reproduces the original vector A and forms a right-angled triangle, as we can see below.

This geometric construct leads us right into the Pythagorean theorem, which we can use to determine the vector's length: $\|A\| = \sqrt{a^2 + b^2} = \sqrt{5^2 + 4^2} = \sqrt{41}$, where $a$ and $b$ are the components of A along $I$ and $J$.

[Figure: vector A decomposed into 5I and 4J, forming a right-angled triangle]
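A quick NumPy check of the vector length, computed both with the Pythagorean formula and with the built-in norm helper:

```python
import numpy as np

A = np.array([5, 4])

# Pythagorean theorem applied to the components.
length_manual = np.sqrt(A[0]**2 + A[1]**2)

# NumPy's built-in Euclidean norm.
length_builtin = np.linalg.norm(A)

print(length_manual, length_builtin)   # both are sqrt(41), about 6.403
```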

Matrices

We can think of matrices as objects that rotate, stretch, or shear our vector space. Matrices help us perform linear transformations and solve systems of linear equations, a topic we'll delve into later. As a starting point, let's understand how matrices transform space. In linear algebra, a matrix can be seen as a function that takes vectors as inputs and outputs transformed vectors. This transformation can include rotation, where the direction of the vector is changed; scaling, where the size of the vector is increased or decreased; and shearing, where the shape of the vector space is altered but areas (or volumes) are preserved.

To rotate a vector in 2D space by 90 degrees, we can use this rotation matrix:

$$ R(90) \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} cos(90) & -sin(90)\\ sin(90) & cos(90) \end{pmatrix} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix} $$

Any vector that passes through this function will land at a new set of coordinates in a space that has been rotated by 90 degrees. Let's look at a geometric representation that helps us visualize what just happened.

[Figure: the vector [2, 2] rotated 90 degrees to [-2, 2]]
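The same 90-degree rotation can be reproduced in a few lines of NumPy (note that np.cos and np.sin expect angles in radians):

```python
import numpy as np

theta = np.pi / 2   # 90 degrees expressed in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([2, 2])
print(np.round(R @ v, 10))   # [-2.  2.] -- the rotated vector
```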

Now that we grasp how matrices transform space, let's see what kinds of linear operations we can perform using matrices. Matrices support several mathematical operations, including addition, subtraction, scalar multiplication, and matrix multiplication.

Some conditions need to be met before we can perform these operations. For example, for addition, the two matrices must have an equal number of rows and columns, and for matrix multiplication, the number of columns in the first matrix and the number of rows in the second matrix must match. 

When we perform a matrix multiplication operation, we are doing a combination of matrix transformations. Let's say we have a matrix $R$ that rotates any input vector by 90 degrees and another matrix $S$ that shears the input vector. When we multiply these two matrices, we are doing a composition of these two transformations. Remember that this operation is not commutative (i.e., $R * S$ is not equal to $S * R$).
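A small sketch illustrating that non-commutativity: composing the 90-degree rotation R with a shear S gives different results depending on the order. The shear matrix here is an arbitrary example of ours, chosen purely for illustration:

```python
import numpy as np

# A 90-degree rotation R and a horizontal shear S (the shear is an assumed example).
R = np.array([[0, -1],
              [1,  0]])
S = np.array([[1, 1],
              [0, 1]])

# (R @ S) applied to a vector shears first, then rotates; (S @ R) rotates first, then shears.
print(R @ S)
print(S @ R)
print(np.array_equal(R @ S, S @ R))   # False -- matrix multiplication is not commutative
```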

Special Types of Matrices:

  • Square Matrix: A matrix with the same number of rows and columns.
  • Identity Matrix: A square matrix with ones on the diagonal and zeros elsewhere. We will talk more about this matrix in the Gaussian Elimination section of this article.
  • Diagonal Matrix: A matrix where the elements outside the diagonal are all zero.
  • Transpose of a Matrix: The resulting matrix after swapping the rows and columns of the original matrix.

Solving Linear Equations with Gaussian Elimination

To understand how to solve linear equations, we first need to learn about one special type of matrix called the inverse matrix $A^{-1}$. The inverse matrix undoes any transformation performed by the original matrix. Let's return to our previous example and define the inverse of our $R$ function:

$R^{-1} = \begin{pmatrix} cos(90) & sin(90)\\ -sin(90) & cos(90) \end{pmatrix} = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}$

We can pass the vector $A = [2, 2]$ through the rotation function R(A) to get a new vector $RA = [-2, 2]$. If we multiply this output vector $RA$ by the inverse matrix $R^{-1}$, we get back the original vector [2, 2]. Another interesting property of the inverse matrix is that multiplying a matrix by its inverse gives the identity matrix:

$R * R^{-1} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}$
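Both claims are easy to check numerically: applying R and then R^{-1} recovers the original vector, and multiplying R by its inverse yields the identity matrix. A minimal sketch:

```python
import numpy as np

R = np.array([[0, -1],
              [1,  0]])       # rotation by 90 degrees
R_inv = np.array([[0,  1],
                  [-1, 0]])   # its inverse: rotation by -90 degrees

A = np.array([2, 2])
RA = R @ A
print(RA)            # [-2  2]
print(R_inv @ RA)    # [2 2] -- the original vector is recovered

print(R @ R_inv)     # [[1 0], [0 1]] -- the identity matrix
```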

Keep in mind that not all matrices have inverses. To have an inverse, a matrix must be square (the same number of rows and columns) and its determinant must not be zero; when the inverse exists, multiplying the matrix by it in either order yields the identity matrix. We will talk more about the determinant later.

For now, let's explore a method for solving linear equations that can also be used to find the inverse of an invertible matrix. This method is called Gaussian Elimination. It involves performing a series of row operations on the augmented matrix (for linear systems) or on the matrix alongside an identity matrix (for finding inverses) to transform it into its row echelon form or reduced row echelon form.

The process involves three types of row operations:

  1. Swapping two rows,
  2. Multiplying a row by a nonzero scalar,
  3. Adding a scalar multiple of one row to another row.

The goal is to achieve an upper triangular form of the matrix, where all elements below the main diagonal are zeros. Once this form is reached, back-substitution is used to solve for the variables starting from the last row upwards.

Example:

Let's solve the following system of equations:

$$ \begin{align*} x + 2y &= 9 \\ 3x - y &= 8 \end{align*} $$

Now, let's form the augmented matrix:

$\begin{pmatrix} 1 & 2 & | & 9\\ 3 & -1 &| & 8\end{pmatrix}$

Then, we need to ensure the first column's leading element is a 1. In this case, it already is, so let's use it to make all elements below it 0:

$R_2 = R_2 - 3R_1$ $\begin{pmatrix} 1 & 2 & | & 9\\ 0 & -7 &| & -19\end{pmatrix}$

Now we need to make the leading coefficient of the second row a 1:

$R_2​=R_2​/(−7)$ $\begin{pmatrix} 1 & 2 &| & 9\\ 0 & 1 & | & 19/7\end{pmatrix}$

Finally, use the second row to make all elements above the leading 1 in the second column 0:

$R_1​ = R_1​−2R_2​$ $\begin{pmatrix} 1 & 0 & | &\frac{25}{7} \\ 0 & 1 &| & \frac{19}{7}\end{pmatrix}$

This gives us the solutions: $x=\frac{25}{7}​,y= \frac{19}{7}​.$
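The same procedure can be scripted. Below is a minimal sketch of Gaussian elimination with back-substitution (without the pivoting a production solver would need); the function gaussian_elimination is our own helper, not a library routine, and we check it against NumPy's built-in solver:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b via forward elimination and back-substitution (no pivoting)."""
    n = len(b)
    # Build the augmented matrix [A | b] in floating point.
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    # Forward elimination: zero out every entry below the main diagonal.
    for i in range(n):
        for j in range(i + 1, n):
            factor = M[j, i] / M[i, i]
            M[j, :] -= factor * M[i, :]
    # Back-substitution: solve for the variables from the last row upwards.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
    return x

A = np.array([[1, 2],
              [3, -1]])
b = np.array([9, 8])

print(gaussian_elimination(A, b))   # [3.5714... 2.7142...], i.e. [25/7, 19/7]
print(np.linalg.solve(A, b))        # the same solution from NumPy's built-in solver
```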

Now, we can do the same to find the inverse of the matrix. In this case, we’ll put the identity matrix on the right side.

$A|I = \begin{pmatrix} 1 & 2 & | & 1 & 0 \\ 3 & -1 & | & 0 & 1 \end{pmatrix}$

Now, perform row operations to transform $A$ into the identity matrix. Applying these operations to the identity matrix simultaneously will transform $I$ into $A^{-1}$.

$R_2 = R_2​−3R_1​$

$R_2 =​ \frac{R_2}{-7}$

$R_1 = R_1 - 2R_2$

After these steps, the matrix on the left side becomes the identity matrix, and the right side becomes the inverse of $A$.

$I|A^{-1} = \begin{pmatrix} 1 & 0 & | & 1/7 & 2/7 \\ 0 & 1 & | & 3/7 & -1/7 \end{pmatrix}$
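NumPy can confirm the inverse we just computed by hand:

```python
import numpy as np

A = np.array([[1, 2],
              [3, -1]])

A_inv = np.linalg.inv(A)
print(A_inv)   # [[ 0.1428...  0.2857...]  -- matches [[1/7, 2/7],
               #  [ 0.4285... -0.1428...]]            [3/7, -1/7]]

# Multiplying A by its inverse gives back the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```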

Determinants

Understanding the inverse of a matrix leads us to a new concept: the determinant's role in confirming a matrix's invertibility. As mentioned before, not all matrices are invertible, but we can use the determinant to check for invertibility quickly. The reason is simple: the formula for the inverse of a matrix involves dividing by the determinant. If $det(A) = 0$, that division is undefined, so the matrix cannot be inverted.

The determinant also has a profound geometric interpretation. It represents the scaling factor by which the matrix (viewed as a linear transformation) stretches or shrinks the volume or area of objects in space. When the determinant of a matrix is zero, it means that the transformation applied by that matrix collapses the space into a line or point, effectively reducing the area to zero. For 3D transformations, the volume could collapse to a plane or line. This collapse is precisely why a matrix with a zero determinant cannot have an inverse; if a matrix maps a space onto a lower dimension, there is no way to reverse this mapping for every point in the original space because information about the original positioning is lost.
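A short NumPy check of both points: an invertible matrix has a nonzero determinant, while a matrix whose rows are linearly dependent collapses 2D space onto a line, has determinant zero, and has no inverse. The singular matrix below is an assumed example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, -1]])
print(np.linalg.det(A))   # -7.0 (nonzero), so A is invertible

# A matrix with linearly dependent rows squashes 2D space onto a single line.
B = np.array([[1, 1],
              [1, 1]])
print(np.linalg.det(B))   # 0.0, so B has no inverse
# np.linalg.inv(B) would raise LinAlgError("Singular matrix")
```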

Eigenvalues and Eigenvectors

Now that we have covered most of the basic linear algebra concepts needed for machine learning, let's explore eigenvalues and eigenvectors. These are foundational concepts in machine learning algorithms like PCA (Principal Component Analysis), which we use to identify the directions (principal components) that maximize the variance in a dataset.

Eigenvalues and eigenvectors are crucial for dimensionality reduction and data visualization. Imagine you have a transformation that affects all points in space, such as stretching, shrinking, or rotating. There might be certain lines (directions) along which points only get stretched or shrunk but don't change direction. The vectors along these special lines are the eigenvectors, and the factor by which they are stretched or shrunk is the corresponding eigenvalue. Let's make this concrete with an example.

Here's how the linear transformation $A = \begin{pmatrix} 2 & 0\\ 0 & 4 \end{pmatrix}$ acts on two vectors, $v_1 = [2, 0]$ and $v_2 = [2, 2]$.

$Av_1 = \begin{pmatrix} 2 & 0\\ 0 & 4 \end{pmatrix} \begin{pmatrix} 2\\ 0 \end{pmatrix} = \begin{pmatrix} 4\\ 0 \end{pmatrix}$ $Av_2 = \begin{pmatrix} 2 & 0\\ 0 & 4 \end{pmatrix} \begin{pmatrix} 2\\ 2 \end{pmatrix} = \begin{pmatrix} 4\\ 8 \end{pmatrix}$

As we can see, the vector $v_1$ stays on its original span: it doesn't change direction, only its length, being scaled by a factor of 2. The vector $v_2$, in contrast, changes both direction and scale. Therefore, of these two vectors, only $v_1$ is an eigenvector of the transformation $A$, and its corresponding eigenvalue is 2. (The matrix also has a second eigenvector along the y-axis, with eigenvalue 4.)

Let's prove that this statement is true. The mathematical definition states that, given a square matrix $A$, an eigenvector $v$, and its corresponding eigenvalue $λ$, the equation $Av = λv$ must hold.

In other words, the equation says that multiplying matrix $A$ by vector $v$ gives the same result as simply scaling $v$ by a factor of $λ$.

There is one subtlety, however. If a transformation flips a vector 180 degrees (i.e., multiplies it by a negative scalar), the vector still counts as an eigenvector, because it stays on the same line through the origin. The flip is simply captured by a negative eigenvalue, for example $λ = -1$, and the equation $Av = λv$ is still satisfied.

Finding Eigenvalues and Eigenvectors

For the final section of this article, let’s dive into the process of finding eigenvalues and eigenvectors and why the determinant plays a crucial role in this. To find the eigenvectors and eigenvalues of a matrix $A$, we need to solve the equation: $det(A - λI) = 0$, where $A$ is our square matrix, λ is an eigenvalue, and $I$ is the identity matrix of the same dimension as A. Solving this equation gives us the eigenvalues λ, and we can then find the corresponding eigenvectors by solving the equation $(A - λI)v = 0$ for each eigenvalue.

Here’s an example of how to find the eigenvalues and eigenvectors for the given matrix.

$A = \begin{pmatrix} 1 & 0\\ 0 & 2 \end{pmatrix}$

First, we form the characteristic equation $det⁡(A−λI)=0$ where $I$ is the identity matrix.

The matrix $A−λI$ is equal to $\begin{pmatrix} 1 - λ & 0\\ 0 & 2 - λ \end{pmatrix}$.

The determinant of this matrix is $det(A−λI)=(1−λ)(2−λ)−(0⋅0)$.

Simplifying it results in $λ^2−3λ+2=0$.

Now, let's solve this quadratic equation $λ^2−3λ+2=0$.

We can factor this equation as $(λ−1)(λ−2)=0$. Therefore, the eigenvalues would be $λ_1=1, λ_2=2$.

Next, we need to find the eigenvectors for each eigenvalue. For $λ_1=1$, we substitute $λ_1$ into $A−λI$ and solve $(A−λ_1I)v=0$.

The matrix $A -1I$ is equal to $\begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}$

And finally, we can solve the system:

$$ \begin{align*} 0x + 0y &= 0 \\ 0x + 1y &= 0 \end{align*} $$

This gives $y = 0$, while $x$ can be any value. Therefore, an eigenvector corresponding to $λ_1 = 1$ is any vector where $y = 0$. For example:

$v_1 = \begin{pmatrix} 1\\ 0 \end{pmatrix}$

For $λ_2=2$, we have the matrix $A−2I = \begin{pmatrix} -1 & 0\\ 0 & 0 \end{pmatrix}$

Solving the system:

$$ \begin{align*} -1x + 0y &= 0 \\ 0x + 0y &= 0 \end{align*} $$

This gives $x = 0$, while $y$ can be any value. Therefore, an eigenvector corresponding to $λ_2 = 2$ is any vector where $x = 0$. For example:

$v_2 = \begin{pmatrix} 0\\ 1 \end{pmatrix}$
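As a sanity check, NumPy's eigen-solver recovers the same eigenvalues and eigenvectors (np.linalg.eig returns the eigenvectors as the columns of a matrix, normalized to unit length):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 2]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [1. 2.]
print(eigenvectors)   # columns are the eigenvectors: [1, 0] for λ = 1 and [0, 1] for λ = 2

# Verify the defining equation A v = λ v for each eigenpair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```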

Linear Algebra and Machine Learning: Key Takeaways

In the fast-paced world of machine learning, linear algebra is an essential skill that empowers practitioners to simplify vast datasets with techniques like Principal Component Analysis (PCA) and to understand the inner workings of algorithms. These mathematical concepts are crucial because they enable us to compress, manipulate, and interpret data in ways that are fundamental to improving machine learning models. 

Think of linear algebra as the key that unlocks the potential of machine learning, transforming complex equations into a practical toolkit for innovation and discovery in the digital age. Keeping these concepts in your repertoire will be invaluable as you embark on the exciting journey of machine learning, providing you with the foundation needed to excel and innovate.

Frequently Asked Questions

Why is linear algebra important in machine learning?

Linear algebra is crucial in machine learning as it provides the mathematical foundation for understanding algorithms, optimizing models, and transforming data. Key concepts like vectors, matrices, and eigenvalues are essential for developing and fine-tuning AI models.

How are vectors used in machine learning?

In machine learning, vectors represent data points and facilitate various mathematical operations. They are essential for tasks such as data transformation, feature representation, and model optimization.

How are matrices used in machine learning?

Matrices are used in machine learning to perform linear transformations, solve systems of linear equations, and represent data. They are fundamental in operations like rotation, scaling, and shearing of vector spaces, which are vital for model training and data manipulation.

Why are eigenvalues and eigenvectors important in machine learning?

Eigenvalues and eigenvectors are important for dimensionality reduction techniques like Principal Component Analysis (PCA). They help identify the directions that maximize variance in data, making it easier to visualize and interpret complex datasets.

What is Gaussian elimination used for?

Gaussian elimination is a method used to solve systems of linear equations by transforming matrices into row echelon form. This technique is essential for finding solutions to linear equations, which is a common task in machine learning model development and optimization.