Wednesday, March 30, 2011

Database Normalization (1-3 NF)

This is a tutorial for those who are confused about the normal forms due to the extreme confusion you find on the web about the subject. If you want to know what normalization is and why to do it, wikipedia has a great article detailing this information:
http://en.wikipedia.org/wiki/Database_normalization

1NF
1NF is arguably the most ambiguous and confusing normal form on the web. The first normal form is just about making multi-valued fields organized into multiple rows. There are two types of multi-valued fields in unnormalized tables:

This-

IdStudentSubject 1Subject 2
1HarryCharmsPotions
2RonCharmsPotions

And this-

IdStudentSubjects
1HarryCharms, Potions
2RonCharms, Potions

Or as is shown in most examples, this-

IdStudentSubject
1HarryCharms
Potions
2RonCharms
Potions

In both cases we are tying to shove in a list of values, that is a one-to-many or many-to-many relationship with the row, into the row itself. For reasons detailed in the wikipedia article, this should be fixed by creating a separate row for each value and repeating the values in the others fields, like so:

IdStudentSubject
1HarryCharms
1HarryPotions
2RonCharms
2RonPotions

The table is now in 1NF were it not for the primary key losing its uniqueness. The id field must be unique in order to identify each distinct student. To solve this we can do one of two things:

Either we create an additional key field for the subjects and make the primary key of the whole table be a composite key of both student and subject ids, thus making the composite key unique-

StudIdStudentSubjIdSubject
1Harry1Charms
1Harry2Potions
2Ron1Charms
2Ron2Potions

Or we could just prepare for the other normal forms and start splitting the tables now as is usually done in examples-

StudIdStudentSubjId
1Harry1
1Harry2
2Ron1
2Ron2

SubjIdSubject
1Charms
2Potions

Notice that we still need to make the foreign key in the students table part of the primary key in order to preserve uniqueness. If you're wondering why we didn't use a weak entity (bridge table / junction table), that's because it's done in 2NF.

If we had more than one multi-valued field, we'd just create a Cartesian product, like so:

StudIdStudentSubjIdSubjectTeacIdTeacher
1Harry1Charms1Filius
1Harry2Potions2Slughorn
1Harry2Potions3Snape
2Ron1Charms1Filius
2Ron2Potions2Slughorn
2Ron2Potions3Snape

Resulting in 3 tables:

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

SubjIdSubject
1Charms
2Potions

TeacIdTeacher
1Filius
2Slughorn
3Snape

Complete example:

IdStudentSubjIdSubjectRoomRoomHallTeacIdTeacher
1Harry1Charms101A1Filius
2Potions202B2Slughorn
3Snape
2Ron1Charms101A1Filius
2Potions202B2Slughorn
3Snape

Turns into:

StudIdStudentSubjIdSubjectRoomRoomHallTeacIdTeacher
1Harry1Charms101A1Filius
1Harry2Potions202B2Slughorn
1Harry2Potions202B3Snape
2Ron1Charms101A1Filius
2Ron2Potions202B2Slughorn
2Ron2Potions202B3Snape

Notice that the room is assumed to be dependant on the subject such that each subject is taught in its own room.

We may opt to leave the table as it is as it quite complex to break down into smaller tables without any guidance. However if we are to break the table down, the room information would be included in with the subject table since we said that the room is dependant on it, yielding the following:

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

TeacIdTeacher
1Filius
2Slughorn
3Snape

2NF
2NF is only applicable on tables with composite keys. If a table does not have a composite key, then it is already in 2NF. To make a table in 2NF, first you make sure it is in 1NF and then you split it into separate tables depending on which part of the composite key the fields depend on. For example in the students' table above, the student name does not depend on the subject id, it depends only on part of the composite key, that is, the student id. So the student name field should go into a separate table which describes the students (together with the primary key of course).

StudIdSubjId
11
12
21
22

StudIdStudent
1Harry
2Ron

SubjIdSubject
1Charms
2Potions

And there is it, the weak entity we are so used to in many to many relationships.

Complete example (following from previous):

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

Turns into:

StudIdSubjIdTeacId
111
122
123
211
222
223

StudIdStudent
1Harry
2Ron

Hence yielding:

StudIdSubjIdTeacId
111
122
123
211
222
223

StudIdStudent
1Harry
2Ron

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

TeacIdTeacher
1Filius
2Slughorn
3Snape

Notice that if in 1NF we did not break down the table, we'd result with the same set of tables by now.

3NF
3NF is the normal form we are used to. All we do is check that every field in a 2NF table depends directly on the primary key. If it doesn't or if it depends on a non-primary key field, you place it in its own table. For example if we had the following table:

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

The RoomHall field is directly dependent on the Room field and not on the SubjId primary key field, so the RoomHall field should go into a table on its own together with the Room field. In fact the room where the subject is thought is not a direct property of the subjects entity but is an entity on its own and hence should be separated into a rooms entity and only referenced by a foreign key.

SubjIdSubjectRoom
1Charms101
2Potions202

RoomRoomHall
101A
202B

Complete example (following from previous):

SubjId Subject Room RoomHall
1 Charms 101 A
2 Potions 202 B

Turns into:

SubjId Subject Room
1 Charms 101
2 Potions 202

Room RoomHall
101 A
202 B

Hence yielding:

StudId SubjId TeacId
1 1 1
1 2 2
1 2 3
2 1 1
2 2 2
2 2 3

StudId Student
1 Harry
2 Ron

SubjId Subject Room
1 Charms 101
2 Potions 202

Room RoomHall
101 A
202 B

TeacId Teacher
1 Filius
2 Slughorn
3 Snape

Links
http://portal.dfpug.de/dFPUG/Dokumente/Partner/Hentzenwerke/Visual%20FoxPro%20Certification%20Exam%20Study%20Guide%20Chapter%2002.pdf

3 comments:

  1. What about normalizing tables where we do not know the information before hand. Example say a server has multiple clients that can vary in number. Each client sends a message to another through server. A db table comprising SRC and dest for message is likely to have repeated fields since clients are dynamic in connection. Meaning message with SRC Id #1can send to Dest id #1,2,3,4 each on a separate row of table. SRC Id #3 can similarly send to fest #15,2,43, etc.

    ReplyDelete
    Replies
    1. It would still be normalized as long as you're using foreign keys. Normalization does not stop every form of repetition as foreign keys would still be repeated. You can see that in the example above, the "StudId | SubjId | TeacId" table has a lot of repetition but it is still normalized because the repetition is in the foreign keys. What's important is that fields in a table which are more closely related to each other than they are to the rest of the fields should be placed in their own table.

      Delete