Understanding how machines make categorical decisions using probability and decision boundaries.
How does your email filter decide in milliseconds if a message is 'Spam' or 'Important' without actually 'reading' it like a human?
In machine learning, prediction problems fall into two broad camps. Linear Regression predicts a continuous number, like the price of a house or tomorrow's temperature. Many real-world problems, however, are Classification tasks, where the output is a discrete category. Is this tumor malignant or benign? Is this credit card transaction fraudulent or legitimate? While linear regression can technically output any number from negative to positive infinity, classification requires a way to 'squish' those numbers into a range that represents probability: a value between 0 and 1.
Quick Check
If you are building a model to predict the exact number of inches of rainfall tomorrow, are you performing regression or classification?
Answer
Regression, because the output is a continuous numerical value.
Suppose your model calculates a raw 'score' ($z$) for an email being spam. The Sigmoid function, $\sigma(z) = 1 / (1 + e^{-z})$, converts that score into a probability:
1. If the score is $z = 0$, then $\sigma(z) = 0.5$. There is a 50% chance it is spam.
2. If the score is a large positive number, $\sigma(z)$ will be very close to 1 (Spam).
3. If the score is a large negative number, $\sigma(z)$ will be very close to 0 (Not Spam).
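As a minimal sketch (plain Python, no ML library assumed), here is how a sigmoid squashes raw scores into probabilities; the example scores are purely illustrative:

```python
import math

def sigmoid(z):
    """Squash a raw score z into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Illustrative scores: zero, strongly positive, strongly negative
for score in [0, 6, -6]:
    print(f"score={score:+d} -> probability={sigmoid(score):.4f}")
# score=+0 -> probability=0.5000
# score=+6 -> probability=0.9975
# score=-6 -> probability=0.0025
```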
Once we have a probability, we need to make a final decision. We do this by setting a Decision Boundary or threshold. Usually, we set this at $0.5$. If the Sigmoid output is $\geq 0.5$, we classify it as Class 1 (e.g., Spam). If it is $< 0.5$, we classify it as Class 0 (e.g., Not Spam). By changing this threshold, we can make our model more 'cautious' or 'aggressive' depending on the stakes of the decision.
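A quick sketch of applying a decision boundary; the default of 0.5 and the stricter 0.9 threshold below are just illustrative values:

```python
def classify(probability, threshold=0.5):
    """Return 1 (e.g., Spam) if the probability clears the threshold, else 0."""
    return 1 if probability >= threshold else 0

print(classify(0.85))                 # 1 -> Spam under the default 0.5 boundary
print(classify(0.85, threshold=0.9))  # 0 -> a more 'cautious' filter lets it through
```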
Quick Check
If a model outputs a probability of 0.85 and our decision boundary is 0.5, what is the final classification?
Answer
Class 1 (or the 'Positive' class).
How do we know if our classifier is actually good? We use a Confusion Matrix, a table that compares the Actual labels with the Predicted labels. It breaks results into four quadrants:
- True Positives (TP): Predicted Spam, was actually Spam.
- True Negatives (TN): Predicted Not Spam, was actually Not Spam.
- False Positives (FP): Predicted Spam, but was actually Not Spam (Type I Error).
- False Negatives (FN): Predicted Not Spam, but was actually Spam (Type II Error).
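A minimal sketch of tallying the four quadrants from paired actual/predicted labels (1 = Spam, 0 = Not Spam); the short label lists are made up for illustration:

```python
# 1 = Spam (positive class), 0 = Not Spam (negative class)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type I error
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type II error

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```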
Imagine a test for a rare disease.
1. Out of 100 people, 10 have the disease.
2. The model correctly identifies 8 of them (TP = 8), but misses 2 (FN = 2).
3. Of the 90 healthy people, it correctly identifies 85 (TN = 85) but tells 5 they are sick (FP = 5).
4. Total Accuracy = $\frac{TP + TN}{100} = \frac{8 + 85}{100} = 93\%$.
In high-stakes scenarios, accuracy isn't enough.
1. Precision asks: 'Of everything we predicted as positive, how many were right?' Formula: $\text{Precision} = \frac{TP}{TP + FP}$.
2. Recall asks: 'Of all the actual positives, how many did we find?' Formula: $\text{Recall} = \frac{TP}{TP + FN}$.
3. If a self-driving car misses a pedestrian (FN), it's a disaster. In cases like that, we prioritize high Recall over high Precision.
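Using the rare-disease counts from the example above, a short sketch of how accuracy, precision, and recall come apart:

```python
# Counts from the rare-disease example: 100 people, 10 sick
tp, fn, tn, fp = 8, 2, 85, 5

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of everyone flagged as sick, how many really were?
recall    = tp / (tp + fn)   # of everyone who is sick, how many did we catch?

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.93 precision=0.62 recall=0.80
```

Note how a 93% accuracy hides a recall of only 0.80: two sick people were sent home, which is exactly the kind of error a high-stakes screen should minimize.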
Which function is used in Logistic Regression to map values to probabilities?
What does a 'False Positive' represent in a spam filter?
If the input to a Sigmoid function is a very large negative number, the output will be close to 0.5.
Review Tomorrow
In 24 hours, try to sketch the Sigmoid curve and write down the four quadrants of a Confusion Matrix from memory.
Practice Activity
Find a dataset (like the Titanic survival dataset) and identify which features would be the 'inputs' and what the 'binary classes' would be.