An advanced look at organizing data to minimize redundancy through the first three normal forms.
Imagine an online store where changing a customer's address requires updating 500 different records manually—and if you miss just one, their package is lost forever. How do we build systems that are physically incapable of making such mistakes?
Before we organize data, we must understand the chaos of Redundancy. When the same piece of information is stored in multiple places, we encounter Data Anomalies. An Update Anomaly occurs when changing data in one row leaves inconsistent data in another. An Insert Anomaly happens when you cannot add new data because you're missing a piece of unrelated information. Finally, a Delete Anomaly occurs when deleting a record unintentionally wipes out the only copy of a different, important fact. Normalization is the formal process of 'cleaning' a database to prevent these three nightmares by ensuring every piece of data is stored in exactly one logical place.
Quick Check
If deleting a student's enrollment record also accidentally deletes the only record of the teacher's office location, what type of anomaly is this?
Answer
A Delete Anomaly.
To reach First Normal Form (1NF), a table must meet two criteria: every cell must contain a single, atomic value, and every record must be unique. This means no 'lists' inside a single column. For example, a 'Phone_Numbers' column cannot contain '555-1234, 555-5678'. We also identify a Primary Key (PK)—a unique identifier for every row. If a single column isn't enough, we use a Composite Key, which is a combination of two or more columns that together create a unique ID. In 1NF, we ensure the table is a flat, predictable grid where .
1. Scenario: A 'Student' table has a column 'Subjects' containing 'Math, Physics'. 2. Problem: This violates atomicity (multi-valued attribute). 3. Solution: Create separate rows for each subject. - Row 1: StudentID 101, Subject: Math - Row 2: StudentID 101, Subject: Physics 4. Result: The table is now in 1NF because every cell holds one value.
A table is in Second Normal Form (2NF) if it is in 1NF and has no Partial Functional Dependencies. This rule only applies when you have a Composite Key. A partial dependency occurs when a non-key attribute (like 'ProfessorName') depends on only part of the composite key (like 'CourseID') rather than the entire key (like 'CourseID' + 'Semester'). If and form the primary key, and is a descriptor, then must require both and to be identified. If only needs , we must move and to a new table.
Quick Check
True or False: A table with a single-column Primary Key that is already in 1NF is automatically in 2NF.
Answer
True
To reach Third Normal Form (3NF), the table must be in 2NF and have no Transitive Dependencies. This means a non-key column cannot depend on another non-key column. Think of it as a chain: if (A determines B) and (B determines C), then is a transitive dependency. In database terms, if 'StudentID' determines 'ZipCode', and 'ZipCode' determines 'City', then 'City' is transitively dependent on 'StudentID'. To fix this, we move the 'middleman' relationship () into its own table. As the saying goes: 'Every attribute must depend on the key, the whole key, and nothing but the key!'
1. Initial Table: [EmployeeID (PK), Job_Class, Hourly_Rate]. 2. Analysis: EmployeeID determines Job_Class. However, Job_Class determines the Hourly_Rate (everyone in 'Class A' gets EmployeeID \rightarrow Job\_Class \rightarrow Hourly\_Rate$. 4. Fix: Split into two tables: - Table 1: [EmployeeID (PK), Job_Class] - Table 2: [Job_Class (PK), Hourly_Rate]
Which normal form is specifically concerned with removing multi-valued attributes (lists) from a cell?
If a table has a composite primary key and a column 'Student_Name', why does this violate 2NF?
A transitive dependency exists when a non-key attribute depends on another non-key attribute.
Review Tomorrow
In 24 hours, try to explain the difference between a partial dependency and a transitive dependency to a peer without looking at your notes.
Practice Activity
Find a spreadsheet you use (like a budget or a grade tracker) and try to decompose it into 3NF tables on paper.