Microsoft Office Tutorials and References
In Depth Information
Normalization
Database design is the process of determining the content and structure of data in a data-
base in order to support some activity on behalf of a user or group of users. After you have
determined the collection of fields users need to support an activity, you need to deter-
mine the precise tables needed for the collection of fields and then place those fields into
the correct tables. Crucial to good database design is understanding the functional depen-
dencies of all fields; recognizing the anomalies caused by data redundancy, partial depen-
dencies, and transitive dependencies when they exist; and knowing how to eliminate the
anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as your database grows in size.
The process of identifying and eliminating anomalies is called normalization . Using
normalization, you start with a collection of tables, apply sets of rules to eliminate anom-
alies, and produce a new collection of problem-free tables. The sets of rules are called
normal forms . Of special interest for our purposes are the first three normal forms: first
normal form, second normal form, and third normal form. First normal form improves the
design of your tables, second normal form improves the first normal form design, and
third normal form applies even more stringent rules to produce an even better design.
Note that normal forms beyond third normal form exist; these higher normal forms can
improve a database design in some situations but won’t be covered in this section.
First Normal Form
Consider the Employee table shown in Figure A-14. For each employee, the table contains
EmployeeNum, which is the primary key; the employee’s first name, last name, health
plan code and description; and the ID, description, pay grade, and start date of each posi-
tion held by the employee. For example, Barbara Hennessey has held one position, while
the other three employees have held two positions. Because each entry in a table must
contain a single value, the structure shown in Figure A-14 does not meet the requirements
for a table, or relation; therefore, it is called an unnormalized relation . The set of fields
that includes the PositionID, PositionDesc, PayGrade, and StartDate fields, which can
have more than one value, is called a repeating group .
Repeating group of data in an unnormalized Employee table
Figure A-14
repeating group
Employee
EmployeeNum
PositionID
FirstName
LastName
PositionDesc
PayGrade
StartDate
HealthPlan
PlanDesc
2173
2
Barbara
Hennessey
Manager
40
12/14/2008
B
Managed HMO
4519
1
Lee
Noordsy
Director
45
04/23/2010
A
Managed PPO
3
Analyst
30
11/11/2004
8005
3
Pat
Amidon
Analyst
30
06/05/2009
C
Health Savings
4
Clerk
20
07/02/2007
8112
1
Chris
Wandzell
Director
45
12/15/2009
A
Managed PPO
2
Manager
40
10/04/2008
First normal form addresses this repeating-group situation. A table is in first normal
form (1NF) if it does not contain repeating groups. To remove a repeating group and con-
vert to first normal form, you expand the primary key to include the primary key of the
repeating group, forming a composite key. Performing the conversion step produces the
1NF table shown in Figure A-15.
Search JabSto ::




Custom Search