Normalizing Data

Depending on the design of relational database, the tables might contain lots of duplicate data or might contain incorrect entries of data in single line that you can’t delete without losing the other entry or it may contain data that shouldn’t exist.

Normalization is the process of effectively organizing & rearranging the data in a database into a more standard normal form that prevents these kind of issues.  Normalizing the data is due to the need for data integrity, flexibility and efficiency.  There are several levels of normalization but you might not need to apply all of them to your database. For RMDBS compliance, you can have your database specialist normalize a flat database into three levels at minimum. Here are all the levels listed.

  • First Normal Form (1NF)

  • Second Normal Form (2NF)

  • Third Normal Form (3NF)

  • Boyce‐Codd Normal Form (BCNF)

  • Fourth Normal Form (4NF)

  • Fifth Normal Form (5NF)

  • Domain/Key Normal Form (DKNF)

First Normal Form (1NF) – In this step we will eliminate repeating groups by creating separate table for each set of related attributes and assign a primary key (PK) to each of those tables.

These are the requirements for 1NF

  1. Each column must have a unique name.

  2. The order of the rows and columns doesn’t matter.

  3. Each column must have a single data type.

  4. No two rows can contain identical values.

  5. Each column must contain a single value.

  6. Columns cannot contain repeating groups.

1st Normal form - Analyticalviews.com

 

 

Second Normal Form (2NF) – Take the data that is only partly dependent on the Primary Key and put it in a seperate table. This means that we need to split up the table columns that are of cardinality “many-to-many relationship”. After splitting the data, foreign keys should be defined and added so that the tables can be related. In this step all the non key attributes are fully functionally dependent on the Primary Key.

The requirements for 2NF are listed below

  1. The database tables should be in 1NF form
  2. All of the non key fields are dependent on all of the key fields.

Second Normal form 2NF - Analyticalviews.com

 

 

Third Normal Form (3NF) – In this form, no non key attributes are transitively dependent on the Primary key. Transitive dependency means that one attribute is dependent on the second attribute, which in turn depends on a third attribute. When you are deleting data in a table, and if the tables are transitively dependent on each other, there will be unwanted loss of information. An easier way of understanding this step is that the data that is not dependent on Primary Key  should be separated out and stored in another table.

The requirements for 3NF are listed below

  1. The database tables should be in 2NF form
  2. Should not contain transitive dependencies

Third Normal form 3NF - Analyticalviews.com

 

 

The final tables after performing the first three normalization forms are now ready for use in a relational database.

Normalized tables - Analyticalviews.com