Microsoft Office Tutorials and References
In Depth Information
The first option (to duplicate the category and subcategory value over all rows) is the one
required by the logical data model. Because the description of a category is the same for a
lot of different products and you have, therefore, a high level of data duplication, the
logical model is not optimized. The technical model, on the other hand, is very much optimized
because that model removes redundancy, which leads to a smaller size on disk, less use of
memory, and—in general—to a higher level of performance of the database. Roughly
speaking: the technical model is normalized (redundancy removed), whereas the logical
one is denormalized (redundancy present).
Normalization increases the number of tables needed to store the same information, yet
it reduces the overall size of the database. The need for normalization has been present
in relational databases since their invention, and database analysts are so used to
normalization that they look at denormalized databases with disgust. Nevertheless, denormalized
databases have the great advantage of being much easier to query and understand because
their structure is much more similar to the logical one.
Moreover, with the advent of modern columnar databases, such as PowerPivot, the need
to normalize tables is highly reduced. If a single value is repeated several times in a table,
PowerPivot does not store the value as it is but provides a kind of automatic normalization
by removing the long description and replacing it with numbers that reference the
description. In other words, the process of normalization is carried on internally by the PowerPivot
database even if you load a denormalized table.
Is Normalization a Correct Technique?
Please note that normalization is neither good nor bad. The same applies for its contrary:
denormalization. The need to normalize data comes from the very nature of the database
and follows technical rules that we cannot cover in this small note, yet are very important.
An operational database should be normalized, whereas a data model directly browsable
by the user should be denormalized.
You should avoid the temptation to think that one model is more correct than the other
one. Each one is the right model for the kind of operations that you need to perform on
it. The point here is that, when you load data from an operational database, you are likely
to find normalized data. Because you need to show that information to a user, you need
some techniques to remove normalization and produce a denormalized data model, which
is the correct one for the analysis.