Normalization is process of decomposing (splitting) the relations into smaller relations with fewer attributes by minimizing redundancy of data and minimizing insertion, deletion and updationanomalies.
In other words, Normalization may be defined as a step by step reversible process of transforming an unnormalized relation into relations with progressively simpler structures.
Normalization is based on the concept of Normal Forms. The Relation (table) is said to be in particular normal forms if it satisfies a certain sets of constraints.
There are normally five normal forms. Each normal form requires that it should satisfy previous levels of normal forms.
For Example: If a relation is in 3NF then It must be in 2NF as well as 1NF. The first three normal forms i.e. 1NF, 2NF, 3NF were proposed by Dr E.F. Codd. Later on, a stronger definition of 3NF known as Boyce Codd Normal Form (BCNF) was proposed by Boyce and Dr Codd. The 4NF and 5NF normal forms were proposed after BCNF to deal with practical solutions that are very rare because of performance reasons.
Need for Normalization
The data in a database should have the following features characteristics:
§ Minimize Data redundancy i.e. no unnecessarily duplication of data.
§ To make database structure flexible i.e. it should be possible to add new data values and rows without reorganizing the database structure.
§ Data should be consistent throughout the database i.e. it should not suffer from the following anomalies:
1. Insert Anomaly: Due to lack of data i.e. all the data available for insertion such that null values in keys should be avoided. This type of anomaly can seriously damage a database.
2. Update anomaly: It is due to data redundancy i.e. multiple occurrences of same values in a column. This can lead to inefficiency.
3. Deletion anomaly: It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.
§ Complex queries required by the user should be easy to handle.
§ The resulting relations obtained on normalization should possess the properties such as each row must be identified by a unique key, no repeating groups, homogeneous columns, each column is assigned a unique name etc
Advantages Of Normalization
1. More efficient data structure.
2. Avoid redundant fields or columns.
3. More flexible data structure i.e. we should be able to add new rows and data values easily.
4. Better understanding of data.
5. Ensures that distinct table exists when necessary.
6. Easier to maintain data structure i.e. it is easy to perform operations and complex queries can easily handled.
7. Minimize data duplication
Types of Normal Forms
Normalization works through a series of stages called normal forms:
§ First Normal Form
§ Second Normal Form
§ Third Normal Form
§ Fourth Normal Form
§ Fifth Normal Form
First Normal Form (1NF)
The term first normal form (1NF) describes the tabular format in which:
o All the key attributes are defined.
o There are no repeating groups in the table.
o All attributes are dependent on the primary key.
Second Normal Form(2NF)
A table is in 2NF if:
o It is in 1NF and
o It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key. (It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkeyattributes.)
The concept of functional dependency is the basis for first three normal forms. The functional dependencies are the consequence of the interrelationships among attributes of a relation represented by some link or association. An attribute (column) Y of a relation (table) R is said to be functionally dependent upon attribute X of Relation R if and only if for each value of X in R has associated with it only one value of Y in R at any given time. It is represented as X – > Y where X attribute is known as determinant and Y attribute is known as determined.
A Determinant is an attribute in a Relation which can uniquely determined the values of other attribute.
Fully Functional Dependency
It usually applies to tables with composite keys i.e. when the primary key consists of more than one field. An attribute (column) Y in a Relation R is fully functional dependent on attribute X of relation R if it is functionally dependent on X and not functionally dependent on any subset of X. In other words, fully functionally dependence means that when the primary key is composite i.e. made of two or more columns then the other non key attributes of the relation must be identified by the entire key and not by some of the attributes that make up the primary key.
Third Normal Form(3NF)
A table is in 3NF if:
o It is in 2NF and
o It contains no transitive dependencies.
Boyce Codd Normal Form(BCNF)
BCNF requires that the table is 3NF and only determinants are the candidate keys
Fourth Normal Form(4NF)
A table is in 4NF if:
o It is in 3NF and
o It has no multiple sets of multivalueddependencies