Microsoft Office Tutorials and References
In Depth Information
adding a ProductCategory column to the Products table, the ProductCategory table can be
safely removed from the data model without any loss of expressivity. The same, obviously,
applies to ProductSubcategory.
Tables whose only job is to provide descriptions are amazingly common in the world of relational
databases. So you find a lot of situations where the application of this simple rule drastically
reduces the number of tables needed and simplifies the PowerPivot data model. For simple
lookup tables, the denormalization through SQL queries is by far the easiest tool.
On the other hand, when a table contains more information than a simple description, you are
much more likely to find it difficult to decide whether to import the table and perform the
denormalization in PowerPivot or to perform the denormalization in a query and import the result
of that query. If, for example, the ProductSubcategory table contains more columns, we need
to add all of these columns to the Product table, which might produce a complex Product table
that is difficult to use. These situations need to be handled thoughtfully so that you can make
correct decisions and produce an easy-to-use model.
Finally, complex tables that represent business entities are the pillars of the data models
and so should not be denormalized. If you think, for instance, about the relationship between
SalesOrderDetail and Products, you see that the relationship should not be denormalized at
all because orders and products are different business entities. Both the entities and their
relationship should exist in the PowerPivot database; denormalizing it would produce a
wrong data model.
The rationale behind this is that you might be interested in analyzing products without
relating them to sales. Think about this: if you denormalize the product name in the sales
table, you might want to count the number of products to produce some report. But, if
you see the product name only in the sales table, you can count only sold products, not
products. If a product has never been sold, it does not even appear in the sales table
because no sales row exists for it. If you reach the point where you over-denormalize
the structure, you might create a data model that returns incorrect answers.
As a rule, you denormalize tables when the number of useful columns in the table is very small
(for lookup tables, it is often only one) and you do not denormalize tables when the number of
useful columns exceeds three. For two or three columns, it is up to the data modeler (that is, up
to you) to choose the best denormalization strategy.
Keep in mind that the task to define a data model is complex, and denormalization is just
one of the many hard decisions that need to be made. As a PowerPivot user, you are going to
become a data modeler more of your time, so get ready to make difficult decisions and don’t
be too afraid to take the wrong ones; even seasoned Business Intelligence (BI) experts often
have difficulty when it is time to decide whether to keep an attribute normalized or not.