Covers how to perform calculations on datasets and group results for analysis.
Imagine you are the lead data scientist for a global music app with 50 million users. How do you find the top-earning artist in seconds without manually checking every single song play?
Suppose you have a table called 'Inventory' with a column 'Price'. 1. To find the total value of all stock: `SELECT SUM(Price) FROM Inventory;` 2. To find the number of unique items: `SELECT COUNT(ItemID) FROM Inventory;` 3. To find the most expensive item: `SELECT MAX(Price) FROM Inventory;`
Quick Check
If a column 'Scores' contains the values [10, 20, NULL, 30], what would COUNT(Scores) return?
Answer
3
Raw aggregates are useful, but we often need to see data broken down by category. The GROUP BY clause collapses multiple rows into summary rows based on shared values in specific columns. Think of it like sorting a deck of cards into four piles by suit before counting how many cards are in each pile. When you use GROUP BY, every column in your `SELECT` statement must either be part of an aggregate function or listed in the `GROUP BY` clause. If you try to select a specific 'Name' while grouping by 'Department', the database won't know which name to show for that group!
You have a 'Sales' table with 'Region' and 'Amount'. To find the total sales for every region: 1. Identify the category: 'Region'. 2. Identify the math: `SUM(Amount)`. 3. Write the query: `SELECT Region, SUM(Amount) FROM Sales GROUP BY Region;` This results in one row per unique region.
Quick Check
True or False: You can use GROUP BY on a column even if you aren't using an aggregate function like SUM or COUNT.
Answer
True
A common mistake is trying to filter aggregated data using the `WHERE` clause. However, `WHERE` filters individual rows before they are grouped. To filter the results after the aggregation has occurred, we use the HAVING clause. For instance, if you want to find only the departments where the average salary is greater than $50,000, you cannot use `WHERE AVG(Salary) > 50000` because the average hasn't been calculated yet when `WHERE` runs. The logical order of operations is: 1. FROM (Get data) 2. WHERE (Filter rows) 3. GROUP BY (Categorize) 4. HAVING (Filter groups) 5. SELECT (Display results)
Find departments with more than 10 employees that have a total budget over $100,000: 1. `SELECT DeptID, COUNT(EmpID), SUM(Salary)` 2. `FROM Employees` 3. `GROUP BY DeptID` 4. `HAVING COUNT(EmpID) > 10 AND SUM(Salary) > 100000;` This filters out small or low-budget departments from the final results.
Which SQL clause is used to filter rows before any grouping occurs?
What is the result of if the table has 10 rows and 2 of them are completely empty (NULL)?
In a query using GROUP BY, every non-aggregated column in the SELECT list must also appear in the GROUP BY clause.
Review Tomorrow
In 24 hours, try to explain the difference between WHERE and HAVING to a friend, or write it down from memory.
Practice Activity
Open a sample database and try to find the 'Average grade per subject' for a school database, but only for subjects with more than 5 students.