Exploring the mathematical foundations of predictive modeling through linear regression and gradient descent.
How does a self-driving car predict exactly where a pedestrian will be in two seconds, or how does Netflix know which movie will be your next obsession? It all starts with a single line and the math of 'getting less wrong' over time.
Quick Check
In the equation $\hat{y} = wx + b$, what happens to the predicted $\hat{y}$ if we increase the bias $b$?
Answer
The entire line shifts upward on the y-axis, increasing every prediction by the same amount.
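A minimal sketch of this effect in Python (the `predict` function and the sample values are illustrative, not from the lesson):

```python
# Sketch: increasing the bias b shifts every prediction up by the same amount.
def predict(x, w, b):
    """Simple linear model: y_hat = w * x + b."""
    return w * x + b

xs = [0.0, 1.0, 2.0, 3.0]
before = [predict(x, w=2.0, b=1.0) for x in xs]  # bias b = 1
after = [predict(x, w=2.0, b=3.0) for x in xs]   # bias raised by 2

# Every prediction rises by exactly 2, the change in b.
print([hi - lo for hi, lo in zip(after, before)])  # [2.0, 2.0, 2.0, 2.0]
```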
Let's calculate MSE for two example data points using the formula $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$:

1. Point A: Actual $y = 5$, Predicted $\hat{y} = 7$. Error = $5 - 7 = -2$. Squared Error = $4$.
2. Point B: Actual $y = 10$, Predicted $\hat{y} = 9$. Error = $10 - 9 = 1$. Squared Error = $1$.
3. Sum of squares = $4 + 1 = 5$.
4. $\text{MSE} = \frac{5}{2} = 2.5$.
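To make the arithmetic concrete, here is a minimal Python sketch of the same calculation, using the example values from the steps above:

```python
# Sketch: MSE for the two points in the worked example.
actual = [5.0, 10.0]
predicted = [7.0, 9.0]

squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
mse = sum(squared_errors) / len(actual)

print(squared_errors)  # [4.0, 1.0]
print(mse)             # 2.5
```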
Quick Check
Why do we square the errors in the MSE formula instead of just adding them up?
Answer
Squaring prevents positive and negative errors from canceling each other out and gives more weight to large outliers.
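A quick sketch of the cancellation problem, with illustrative error values:

```python
# Sketch: two raw errors that cancel out, suggesting a "perfect" model
# even though both predictions miss by 3.
errors = [3.0, -3.0]

mean_error = sum(errors) / len(errors)                     # 0.0 -- looks perfect
mean_squared = sum(e ** 2 for e in errors) / len(errors)   # 9.0 -- reveals the misses

print(mean_error, mean_squared)
```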
Suppose your current weight is $w = 10$, your learning rate is $\alpha = 0.1$, and the gradient (slope) at that point is $5$.

1. The gradient is positive, meaning the 'mountain' goes up as $w$ increases.
2. To go down, we subtract: $w_{\text{new}} = w - \alpha \times \text{gradient} = 10 - 0.1 \times 5$.
3. New weight $= 10 - 0.5 = 9.5$.
4. We repeat this thousands of times until the gradient is nearly zero, meaning we've reached the bottom of the valley.
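A sketch of the update rule in Python. The bowl-shaped cost function $(w - 5)^2$ is an illustrative stand-in, not the lesson's cost; the single step from the example is checked first:

```python
# One step of w_new = w - learning_rate * gradient, matching the example.
w = 10 - 0.1 * 5
print(w)  # 9.5

# Repeating the update on an illustrative cost, cost(w) = (w - 5)**2:
def gradient(w):
    return 2 * (w - 5)  # derivative of (w - 5)**2

w, learning_rate = 10.0, 0.1
for _ in range(1000):
    w -= learning_rate * gradient(w)

print(round(w, 6))  # ~5.0 -- the bottom of the valley, where the gradient is ~0
```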
What does a gradient of zero indicate in the context of a cost function?
If your model's predictions are consistently overshooting the target and the error is getting larger, what should you adjust?
True or False: In simple linear regression, the Mean Squared Error (MSE) can sometimes be a negative number.
Review Tomorrow
In 24 hours, try to write down the MSE formula from memory and explain the 'mountain' analogy for gradient descent to a friend.
Practice Activity
Try to manually calculate one update step for a weight given a starting value of 2.0, a gradient of 0.4, and a learning rate of 0.01.
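Once you've worked it out by hand, a two-line Python sketch can verify your answer:

```python
# Sketch: check your manual update step for the practice activity.
w, gradient, learning_rate = 2.0, 0.4, 0.01
w_new = w - learning_rate * gradient
print(w_new)  # 1.996
```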