Learning how tree-based models make decisions and how combining multiple models improves accuracy.
How does a bank decide in milliseconds whether to approve your credit card? It isn't a mysterious 'black box'—it's often a series of logical 'Yes/No' questions that branch out like a digital tree.
Quick Check
If a dataset is perfectly balanced (50% Yes, 50% No), what is the Entropy value?
Answer
The Entropy is 1.0, the maximum possible value for a binary split: -(0.5·log2(0.5) + 0.5·log2(0.5)) = 1.0, representing maximum uncertainty.
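To see this numerically, here is a minimal Python sketch (NumPy and the `entropy` helper name are illustrative assumptions; the lesson itself prescribes no code) showing that entropy peaks at 1.0 for a 50/50 split and drops to 0 for a pure node:

```python
import numpy as np

def entropy(p_yes: float) -> float:
    """Shannon entropy (in bits) of a binary node with probability p_yes of 'Yes'."""
    probs = np.array([p_yes, 1 - p_yes])
    probs = probs[probs > 0]  # skip zero probabilities to avoid log2(0)
    return float(-np.sum(probs * np.log2(probs)))

# Entropy is highest (1.0) at a perfectly balanced split
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p(Yes)={p:.1f} -> entropy={entropy(p):.3f}")
```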
If we let a tree grow indefinitely, it will eventually create a leaf for every single data point. This is Overfitting: the model 'memorizes' the noise in the training data rather than learning the underlying pattern. An overfit model performs perfectly on training data but fails miserably on new, unseen data. To combat this, we use Pruning—cutting back branches that provide little predictive power—or set a Maximum Depth to stop the tree from becoming too complex. We want a model that generalizes well, balancing the Bias-Variance Tradeoff.
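As a rough illustration (not part of the lesson's prescribed workflow), here is a scikit-learn sketch; the library, the synthetic dataset, and the specific max_depth and ccp_alpha values are assumptions chosen only to show how constraining a tree narrows the gap between training and test accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can grow until it has memorized the training data
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting depth and pruning (cost-complexity pruning via ccp_alpha)
# trades a little training accuracy for better generalization
shallow = DecisionTreeClassifier(max_depth=4, ccp_alpha=0.001,
                                 random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", deep), ("depth-limited + pruned", shallow)]:
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```

Typically the unconstrained tree scores near 1.0 on the training split but noticeably lower on the test split, while the pruned tree closes that gap.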
Quick Check
Does an overfit model have high variance or high bias?
Answer
High variance, because its predictions change drastically with small changes in the training data.
Why use one tree when you can use a forest? Ensemble Methods combine multiple models to create a stronger predictor. There are two main strategies: Bagging and Boosting.
- Bagging (Bootstrap Aggregating): We train multiple trees independently on random subsets of the data and average their results. This is the foundation of Random Forests.
- Boosting: We train trees sequentially. Each new tree focuses specifically on the errors made by the previous trees. This 'boosts' the performance of weak learners into a strong ensemble.
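A minimal sketch of the two strategies, assuming scikit-learn (the library, the synthetic dataset, and the hyperparameters are illustrative choices, not part of the lesson):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many independent trees trained on bootstrap samples, results averaged
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: shallow trees trained one after another, each correcting its predecessors
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

for name, model in [("Random Forest (bagging)", bagging),
                    ("AdaBoost (boosting)", boosting)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```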
Imagine you are diagnosing a rare disease:
1. Random Forest (Bagging): 100 doctors look at different random samples of patient charts. They all vote. The majority wins. This reduces the risk of one 'weird' chart skewing the result.
2. AdaBoost (Boosting): One doctor tries to diagnose. A second doctor looks only at the cases the first doctor got wrong. A third doctor focuses on the mistakes of the first two. They combine their expertise to solve the hardest cases.
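To make the '100 doctors vote' idea concrete, here is a hand-rolled bagging sketch (Python with scikit-learn trees; the synthetic dataset, tree depth, and the number of 'doctors' are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_doctors = 100  # number of independent trees ("doctors")
votes = np.zeros((n_doctors, len(X_test)), dtype=int)

for i in range(n_doctors):
    # Each "doctor" sees a bootstrap sample: charts drawn at random, with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(max_depth=3, random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    votes[i] = tree.predict(X_test)

# The majority vote across all doctors is the ensemble's diagnosis
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged accuracy:", round((majority == y_test).mean(), 3))
```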
In Gradient Boosting, we don't just look at 'wrong' labels; we look at the Residuals (the difference between predicted and actual values).
1. Start with a simple model (like the average of all targets): $F_0(x) = \bar{y}$.
2. Calculate the residual: $r_i = y_i - F_{m-1}(x_i)$.
3. Train a small tree $h_m(x)$ to predict the residual $r_i$, not the target $y_i$.
4. Update the prediction: $F_m(x) = F_{m-1}(x) + \eta \, h_m(x)$, where $\eta$ is the learning rate.
5. Repeat until the residuals are near zero.
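Here is a minimal from-scratch sketch of those five steps (Python with scikit-learn's DecisionTreeRegressor; the toy data, the 0.1 learning rate, and the 100 rounds are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 100

# 1. Start with a simple model: the mean of all targets
prediction = np.full_like(y, y.mean())
trees = []

for _ in range(n_rounds):
    # 2. Residual: what the current ensemble still gets wrong
    residual = y - prediction
    # 3. Fit a small tree to the residual, not to y itself
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    # 4. Nudge the prediction by a fraction of the tree's output (learning rate)
    prediction += learning_rate * tree.predict(X)

# 5. After enough rounds the residuals shrink toward zero
print("final mean absolute residual:", round(np.abs(y - prediction).mean(), 4))
```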
Which metric is used to measure the 'purity' of a node split?
What is the primary goal of a Random Forest?
True or False: Boosting models train multiple trees simultaneously in parallel.
Review Tomorrow
In 24 hours, try to sketch the difference between a Bagging and a Boosting workflow from memory.
Practice Activity
Try building a small decision tree on paper for a 'Should I go outside?' dataset with features like 'Rain' and 'Temperature'.