Data Model Best Practices
One of Excel’s most attractive features is its flexibility. You can construct an intricate system of calculations,
linked cells, and formatted summaries that work together to create your final presentation. But creating a
successful dashboard requires more than just slapping data onto a worksheet. A poorly designed data
model can lead to hours of excess work maintaining and updating your presentation. On the other hand,
an effective data model enables you to easily repeat monthly update processes without damaging your
dashboards or your sanity.
In this section, we discuss some data-modeling best practices that will help you start your dashboard projects on the right foot.
Avoid storing excess data
In Chapter 1, you may have read that the measures used on a dashboard should directly support the purpose of that dashboard. The same concept applies to the back-end data model: you should import only the data necessary to fulfill the purpose of your dashboard or report.
In an effort to have as much data as possible at their fingertips, many Excel users bring every piece of data they can get their hands on into their worksheets. You can spot these people by the 40MB files they send through e-mail. You’ve seen these workbooks: two tabs that contain the presentation, and then six hidden tabs that contain thousands of lines of data (most of which isn’t used). These users essentially build a database inside their workbook.
What’s wrong with utilizing as much data as possible? Well, here are a few issues:
➤ Excess data increases the number of formulas. If you bring in raw data, you have to aggregate that data in Excel, which quickly multiplies the number of formulas you have to build and maintain. Remember: your data model is a vehicle for presenting analyses, not for processing raw data. The data that works best in the presentation layer has already been aggregated and summarized into useful views that can be navigated and fed to dashboard components. It’s far better to import data that’s already as aggregated as possible. For example, if you need to report on Revenue by Region and Month, there’s no need to import sales transactions into your data model. Instead, use an aggregated table consisting of Region, Month, and Sum of Revenue.
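To make the aggregation idea concrete, here is a minimal sketch in plain Python of collapsing raw transactions down to the Region-by-Month grain a dashboard actually needs. The transaction rows and figures are invented for illustration; in practice this rollup would happen in your source system or query before the data ever reaches Excel.

```python
from collections import defaultdict

# Hypothetical raw sales transactions -- in a real workbook these might be
# thousands of rows sitting on a hidden tab. All names and figures are
# illustrative, not from the book.
transactions = [
    {"region": "North", "month": "Jan", "revenue": 1200.0},
    {"region": "North", "month": "Jan", "revenue": 800.0},
    {"region": "South", "month": "Jan", "revenue": 950.0},
    {"region": "North", "month": "Feb", "revenue": 1100.0},
]

# Aggregate to the grain the dashboard needs: one row per (Region, Month).
summary = defaultdict(float)
for t in transactions:
    summary[(t["region"], t["month"])] += t["revenue"]

# The summary table is what belongs in the data model, not the raw rows.
for (region, month), total in sorted(summary.items()):
    print(region, month, total)
```

Four raw rows collapse to three summary rows here; with real transaction volumes the reduction is far more dramatic, which is exactly why the aggregated table is the one to import.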
➤ Excess data degrades the performance of your presentation layer. Because your dashboard is fed by your data model, you have to distribute the model behind the scenes (likely in hidden tabs) along with the dashboard. Besides making the file size unwieldy, including too much data in your data model can actually degrade the performance of your dashboard. Why? When you open an Excel file, the entire file is loaded into memory (RAM) to ensure quick data processing and access. The drawback of this behavior is that Excel needs a great deal of RAM to process even the smallest change in your worksheet. You may have noticed that when you perform an action on a large, formula-intensive dataset, Excel is slow to respond, displaying a Calculating indicator in the status bar. The larger your dataset, the less efficiently Excel crunches it.
➤ Excess data limits the scalability of your data model. Imagine that you’re working in a small
company and you’re using monthly transactions in your data model. Each month holds 80,000