OLAP data is in the form of aggregations. From OLAP, you want to get information such as the following:
The volume of sales for Wal-Mart last month
The average salaries paid to employees this year
A statistical comparison of your own company's details historically, or a comparison against
■ Another model similar to the star schema is the snowflake schema, which is formed when one or more
dimension tables are joined to another dimension table (or tables) instead of a fact table. This results in reference
relationships between dimensions; in other words, they are normalized.
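The snowflaked relationship can be sketched in a few lines of Python. The tables and field names below are hypothetical stand-ins: the fact table references the product dimension by key, and the product dimension in turn references a separate category dimension rather than the fact table.

```python
# Fact table: each sale references the product dimension by key.
fact_sales = [
    {"product_id": 1, "amount": 250.0},
    {"product_id": 2, "amount": 120.0},
]

# The product dimension is normalized: it references a separate
# category dimension (the "snowflaked" relationship).
dim_product = {
    1: {"name": "Laptop", "category_id": 10},
    2: {"name": "Desk", "category_id": 20},
}
dim_category = {
    10: {"name": "Electronics"},
    20: {"name": "Furniture"},
}

def category_of_sale(sale):
    """Resolve a fact row to its category by walking dimension references."""
    product = dim_product[sale["product_id"]]
    return dim_category[product["category_id"]]["name"]

print(category_of_sale(fact_sales[0]))  # Electronics
```

Note that answering "what category was this sale in?" requires a two-step lookup (fact to product, product to category); in a pure star schema the category would be denormalized directly into the product dimension.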
So far, so good! Although the OLAP system is designed around these schemas and structures, it is still a relational
database. It still has all the tables and relationships of an OLTP database, which means that you might encounter
performance issues when querying these OLAP tables. This raises a particular concern for aggregation.
Aggregation is nothing but summing (or otherwise combining) data along a given dimension.
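A minimal sketch of aggregation along a dimension, using hypothetical sales rows where "region" is the dimension and "amount" is the measure being summed:

```python
from collections import defaultdict

# Hypothetical sales rows; "region" plays the role of the dimension
# we aggregate along, and "amount" is the measure being summed.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 50.0},
]

def aggregate(rows, dimension, measure):
    """Sum a measure for each distinct value of the given dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(aggregate(sales, "region", "amount"))  # {'East': 150.0, 'West': 250.0}
```

This is exactly the kind of computation that is expensive when run repeatedly over large relational tables, and that cubes precompute.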
Extract, Transform, and Load
It is the structure of the cubes that solves those performance issues; cubes are very efficient and fast in providing
information. The next question then is how to build these cubes and populate them with data. Needless to say, data
is an essential part of your business and, as we’ve noted, typically exists in an OLTP database. What you need to do is
retrieve this information from the OLTP database, clean it up, and transfer the data (either in its entirety or only what’s
required) to the OLAP cubes. Such a process is known as Extract, Transform, and Load (ETL).
■ ETL tools can extract information not only from OLTP databases, but also from other relational databases,
web services, file systems, and various other data sources.
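The three ETL stages can be sketched end to end. The source rows, field names, and cleanup rules below are hypothetical; a real pipeline would extract from an OLTP database or another data source and load into OLAP cube structures rather than an in-memory list.

```python
def extract():
    """Extract: pull raw rows from the source (an in-memory stand-in here)."""
    return [
        {"customer": " Wal-Mart ", "sales": "1500"},
        {"customer": "Acme", "sales": None},      # dirty row: missing measure
        {"customer": "Contoso", "sales": "900"},
    ]

def transform(rows):
    """Transform: clean the data (trim names, drop rows missing a measure)."""
    cleaned = []
    for row in rows:
        if row["sales"] is None:
            continue
        cleaned.append({"customer": row["customer"].strip(),
                        "sales": float(row["sales"])})
    return cleaned

def load(rows, target):
    """Load: append the cleaned rows into the target store."""
    target.extend(rows)
    return target

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```

Keeping the three stages as separate functions mirrors how ETL tools let you swap sources and targets independently of the cleanup logic.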
You will learn about some ETL tools later in this chapter. We'll start by looking, at a very high level, at how data is
transferred from an OLTP database to an OLAP database using ETL. But you can't just jump right in and convert these
systems. Be warned: ETL requires some preparation, so we'd better discuss that now.
Need for Staging
The ETL process pulls data from various data sources, which can be as simple as a flat text file or as complex as a SQL
Server or Oracle database. Moreover, the data might come from different sources in unknown formats, such as when
an organization has merged with another. Or the scenario could be even worse: not only are the data schemas
different, but the data sources are completely different as well. There might be diverse databases such as SQL Server, Oracle, or
DB2 or, for that matter, even flat files and XML files. And these data sources might be real-time OLTP databases that can't
be accessed directly to retrieve information. Furthermore, the data likely needs to be loaded on a periodic basis, while updates
to the source happen in real time, probably every second. Now imagine that this involves terabytes of data. How much time would it
take to copy the data from one system and load it into another? As you can tell, this is likely to be a very difficult situation.
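One common answer to this problem is incremental staging: rather than copying everything on each run, copy only the rows changed since the last load into a staging area, then transform from the staging copy without touching the live OLTP source again. The sketch below assumes hypothetical field names (`id`, `amount`, `updated_at`) and a simple integer timestamp.

```python
# Hypothetical OLTP rows with a last-modified timestamp.
oltp_rows = [
    {"id": 1, "amount": 100.0, "updated_at": 5},
    {"id": 2, "amount": 200.0, "updated_at": 12},
    {"id": 3, "amount": 300.0, "updated_at": 15},
]

def stage_incremental(source, staging, last_load_time):
    """Copy only rows modified since the previous load into staging."""
    for row in source:
        if row["updated_at"] > last_load_time:
            staging.append(dict(row))  # snapshot; the OLTP source can now be released
    return staging

staging_area = []
stage_incremental(oltp_rows, staging_area, last_load_time=10)
print([r["id"] for r in staging_area])  # [2, 3]
```

Because only the delta since `last_load_time` is copied, the expensive transformation work runs against the staging snapshot instead of the busy production system, which is the motivation for staging discussed above.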