Data Normalisation (OCR A-Level Computer Science): Revision Notes
Data Normalisation
Overview
Data Normalisation is a process used in relational databases to organise data efficiently, minimising redundancy and dependency by structuring data across multiple tables. The goal of normalisation is to reduce data duplication, ensure data integrity, and improve the database's efficiency in handling queries and updates. Normalisation typically progresses through stages known as "Normal Forms," with First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF) being the most commonly used.
What is Normalisation?
Normalisation is a multi-step process that organises data into tables to reduce redundancy and improve data integrity. By following normalisation rules, we ensure that each table focuses on a single topic or entity, with well-defined relationships to other tables.
Stages of Normalisation
First Normal Form (1NF):
Criteria:
- Eliminate repeating groups of data.
- Ensure that each field contains only atomic (indivisible) values.
- Each record must have a unique identifier, known as a primary key.
Example: Consider a table where a student's multiple phone numbers are stored in a single field: To satisfy 1NF, we split each phone number into a separate row:
| Student_ID | Name | Phone_Numbers |
|------------|------------|---------------------|
| 101 | Alice Smith| 123-456, 789-012 |
| Student_ID | Name | Phone_Number |
|------------|------------|--------------|
| 101 | Alice Smith| 123-456 |
| 101 | Alice Smith| 789-012 |
Second Normal Form (2NF):
Criteria:
- Meet all requirements of 1NF.
- Remove partial dependencies (i.e., non-key attributes should depend on the entire primary key).
- Only applies to tables with composite primary keys (i.e., primary keys made up of more than one field).
Example: If we have a table combining "Student" and "Course" details with a composite key (Student_ID and Course_ID), it may look like this: Here, "Name" only depends on "Student_ID," and "Course_Name" only depends on "Course_ID." To satisfy 2NF, we split it into two tables:
| Student_ID | Course_ID | Name | Course_Name |
|------------|-----------|------------|-------------|
| 101 | 201 | Alice Smith| Maths |
**Students Table**
| Student_ID | Name |
|------------|------------|
| 101 | Alice Smith|
**Courses Table**
| Course_ID | Course_Name |
|-----------|-------------|
| 201 | Maths |
Third Normal Form (3NF):
Criteria:
- Meet all requirements of 2NF.
- Remove transitive dependencies, meaning that non-key attributes should not depend on other non-key attributes.
Example: Consider a table where the "Course_Fee" is based on the "Course_Name": To achieve 3NF, move "Course_Fee" to a separate table, as it depends on "Course_Name":
| Course_ID | Course_Name | Course_Fee |
|-----------|-------------|------------|
| 201 | Maths | 500 |
**Courses Table**
| Course_ID | Course_Name |
|-----------|-------------|
| 201 | Maths |
**Fees Table**
| Course_Name | Course_Fee |
|-------------|------------|
| Maths | 500 |
Benefits of Normalisation
- Reduced Redundancy: By eliminating duplicate data, normalisation minimises storage costs and prevents unnecessary data duplication.
- Improved Data Integrity: Normalisation ensures data is consistent and accurate by organising it logically across tables.
- Simplified Maintenance: Updating, inserting, and deleting data is more straightforward and less error-prone in a normalised database.
- Efficient Query Processing: Normalisation allows for optimised queries, as well-organised data can be retrieved more quickly and accurately.
Examples of Applying Normalisation (1NF to 3NF)
Consider the following table in Unnormalised Form (UNF):
| Student_ID | Name | Courses | Instructor |
|------------|------------|-----------------|----------------|
| 101 | Alice Smith| Maths, Biology | Dr. Brown, Dr. Lee |
Step 1: Convert to 1NF
To achieve 1NF, split the repeating groups into separate rows:
| Student_ID | Name | Course | Instructor |
|------------|------------|-----------|------------|
| 101 | Alice Smith| Maths | Dr. Brown |
| 101 | Alice Smith| Biology | Dr. Lee |
Step 2: Convert to 2NF
To achieve 2NF, identify partial dependencies and create separate tables. Here, "Instructor" depends on "Course," not on "Student_ID":
**Students Table**
| Student_ID | Name |
|------------|------------|
| 101 | Alice Smith|
**Courses Table**
| Course | Instructor |
|------------|------------|
| Maths | Dr. Brown |
| Biology | Dr. Lee |
Step 3: Convert to 3NF
To achieve 3NF, ensure no transitive dependencies. If "Instructor" details need a separate table, we could further normalise:
**Instructors Table**
| Instructor | Course |
|------------|------------|
| Dr. Brown | Maths |
| Dr. Lee | Biology |
Note Summary
Common Mistakes
- Over-normalisation: Over-normalising can lead to excessive tables and complex queries, making data retrieval inefficient.
- Ignoring relationships: Not understanding dependencies between data can result in incorrect normalisation, leading to partial or transitive dependencies remaining in higher normal forms.
- Skipping steps: Some students try to jump straight to 3NF without achieving 1NF or 2NF first, which can lead to errors and data inconsistencies.
Key Takeaways
- Data Normalisation organises data to reduce redundancy and improve data integrity.
- 1NF removes repeating groups and ensures atomicity in fields.
- 2NF removes partial dependencies for tables with composite keys.
- 3NF removes transitive dependencies, ensuring no non-key attribute depends on another non-key attribute.
- Normalisation leads to a more efficient, maintainable, and consistent database schema, suitable for complex applications and larger datasets.