December 2006 - Posts

Integrating Analysis Services 2005 and Reporting Services 2005 with DRILLTHROUGH: Part II of IV

Part II of IV: Design Implications of the New Drillthrough

Part I of this series showed that using drillthrough in SSAS is hit-or-miss.  Drilling through the top cell in the Adventure Works cube returned a rowcount identical to that of the FactResellerSales table underpinning the Reseller Sales measure group.  Another example, from a real-world project, showed that this is not a reliable indicator of drillthrough’s behavior.  In that example, drillthrough returned fewer rows than the fact table contained, even though the sum of dollars was correct.  Microsoft has intentionally changed the behavior of drillthrough to return one row per nonempty detail cell that feeds the cell upon which drillthrough acts.
This is decidedly unsatisfactory if the goal is to show the actual fact rows that contributed to a cell, not a rowset imitated by detail cells.  There are three possible ways to address this change.

  1. Tell the users such detail is no longer available.  This will surely go over like the proverbial lead zeppelin and does not show a servant's heart.
  2. Force drillthrough behavior by using parameterized relational queries.  This is not a satisfactory approach since it adds complexity to report authoring and assumes an available and capable relational data source.
  3. Alter multidimenional design principles to accommodate the new reality.  Use an MDX query with crossjoins or with the DRILLTHROUGH keyword to provide drillthrough.

The third option is the only viable option.  Since drillthrough seems to return one row for each detail cell that comprises the target cell, there must be a way to ensure that each cell contains one and only one fact row.  This suggestion will rankle many.  As noted earlier, the point of using OLAP is to have a single cell for a value, regardless of how many fact rows correspond to that tuple.  Stated another way, it was legitimate to have many fact rows per single detail cell.  But for the “new” drillthrough to work properly, this relationship must be changed to one cube detail cell per one and only one fact row.   If each detail cell has one and only one fact row, then drillthrough has no choice but to return one row per fact row.  This can be codified with the following rules.

  • For drillthrough in SSAS to return fact rows based only on UDM information, each detail cell must have one and only one fact row.
  • For each detail cell to have one and only one fact row, each fact row must be dimensionally unique.  No two rows may have the same combination of surrogate keys.

SSAS has two features to enable this design change. 

  • SSAS permits an optional primary key in the DSV fact table.  This key can be a combination of existing dimensional columns, or it can be a value manufactured by ETL to ensure uniqueness.
  • SSAS permits fact dimensions.  Christian Wade has an excellent explanation of this nifty new feature at http://blogs.conchango.com/christianwade/archive/2005/04/07/1255.aspx. 

This fact table primary key becomes a dimension attribute in the fact dimension, which allows a UDM attribute to enforce the one-to-one relationship between a detail cell and a fact row.  It can be invisible, to be used only when called by a drillthrough query.  Beware though, as it could exact a hefty toll on SSAS.  Imagine having one attribute member for every single row in the fact table!  Such an attribute would be prodigious.  The multidimensional lattice could grow from “huge” to “unimaginably monstrous.”  For those running on 32-bit Windows with the ongoing 3GB RAM limitation, this could be a huge problem even with the slick new dimension memory management in SSAS.  Making the fact dimension a ROLAP dimension may be a good idea, and it may be possible to disable the attribute hierarchy for that attribute as well.

Part I of this series used Adventure Works DW as an example.  This example is carried forward now, since it illustrates how these techniques are implemented in the Reseller Sales measure group.  Inspection reveals the following: 

  • The FactResellerSales table in the AdventureWorksDW SQL database has a unique constraint on the SalesOrderNumber and SalesOrderLineNumber columns.
  • The FactResellerSales table in the DSV reveals a compound primary key on the SalesOrderNumber and SalesOrderLineNumber columns.
  • UDM contains a ROLAP dimension called Reseller Sales Order Details.  The key attribute, called Reseller Sales Order, is a combination of SalesOrderNumber and SalesOrderLineNumber.
  • The Reseller Sales measure group ties to the Reseller Sales Order Details dimension with a fact dimension relationship to the Reseller Sales Order attribute.

The upshot of all this is that each detail cell in the Reseller Sales measure group can have one and only one fact row.

Clearly, SSAS drillthrough creates design ripples all the way back through the DW/BI manufacturing pipeline.  The following guidelines specify practices if drillthrough is desired.

  1. Facts must have primary keys.  ETL and the data pipeline must be able to manufacture meaningful unique identifiers for each fact row.  The preferred unique identifier is a combination of meaningful attributes.  Identity columns are less desirable since these keys need to be semi-permanent; that is, if the fact table is dropped and recreated with identical data, all of the values in the table primary keys should be as they were.  Failure to enforce this will require complete reprocessing of the dimension holding the fact table’s primary key as an attribute, and every measure group using that dimension.
  2. A PK must be designated on the fact table in the SSAS DSV.
  3. A dimension and attribute must be created to pull this key into UDM.  This dimension ideally has real business meaning (like an order number and detail line number that could never be duplicated).  And regardless of its meaning, it could be huge.  Suppose a fact table contains one billion rows.  There will be literally one billion members populating the attribute.  The new fact dimension in SSAS is tailor-made for this role.  The dimension can be populated directly from the fact table without the need to build and populate separate dimension tables in the data source, or carry those objects forward in the DSV.
  4. The storage mode must be carefully considered when building this dimension.  As Christian Wade points out, SSAS now permits MOLAP partition storage with ROLAP dimensions.  If this dimension is not used for anything other than drillthrough imitation, then ROLAP could be an ideal choice.  Dimension processing times will be greatly reduced, and queries that do not directly use the dimension can ignore this dimension.
  5. A decision must be made about how to author drillthrough queries.  The DRILLTHROUGH keyword in MDX will still work.  The alternative is to write a hefty crossjoin MDX query.  The next part of this series will compare these two methods.
  6. Limit drillthrough queries.  This limitation takes two forms.  First, account for drillthrough during design, by designating which measure groups will be use for drillthrough.  Not all measure groups will be used for drillthrough, and in many cases, the data available to SSAS will not have a natural primary key for the fact data.  If drillthrough is not required, then there is no mandate to create a unique fact key.  Second, carefully control where and how drillthrough is executed.  For small cubes, the query response time for any type of drillthrough query is immaterial.  For cubes with huge data (e.g. – a billion fact rows), drillthrough on high-level cube cells would be disastrous, regardless of the method by which the query is answered.  Ensure that MAXROWS is set properly.

Integrating Analysis Services 2005 and Reporting Services 2005 with DRILLTHROUGH: Part I of IV

Part I of IV:  Two Examples Show that the New Drillthrough May Not Always Behave Like the Old Drillthrough

As noted in a previous post, there was usually no compelling reason for a fact table to have a primary key since there was no need for row uniqueness in the fact table.  Since there may be readers who erroneously dispute this, an example is in order.  Suppose a customer buys three identical hammers at a big orange home improvement retailer.  The cashier swipes them one at a time and places them in the bag.  The fact table that records these occurrences has a grain of day, store, cashier, and product, with a measure of quantity and price.  Is it okay to write three rows into the fact table?  Yes.  Could ETL aggregate these identical facts into a single row?  Also, yes.  Does Analysis Services care?   Not one whit, aside from some additional processing time to load three rows instead of one.  But keeping three rows is a convenient way to count the number of scanner swipes, and these three rows will be perfectly identical.

Drillthrough on a cell might return all three identical rows to the client and this occurrence was easily factored into the data warehouse design.  But this does not work for Analysis Services 2005 (“SSAS”) because drillthrough has been rearchitected.  Two examples will show how fundamental these changes are; in fact, the changes are so pervasive that (a) MSAS cubes using DRILLTHROUGH will lose their DRILLTHROUGH upon migration to SSAS, and (b) the “new” DRILLTHROUGH returns a completely different answer in SSAS than it did in MSAS.

The first example provides evidence that drillthrough in SSAS can provide the same answer as the old drillthrough.  It uses the AdventureWorksDW sample database for SQL Server 2005, and the Adventure Works DW sample database for SSAS.  It focuses solely upon the Reseller Sales measure group, which has the following characteristics:

  • The DSV fact table is FactResellerSales.
  • The measures in question are Reseller Sales Amount (source column: SalesAmount) and Reseller Transaction Count (rowcount aggregate function).
  • Reseller Transaction Count is a hidden measure.

The first QA any good Analysis Services architect/developer performs is a rowcount check and top cell summary check.   This first step in the check is to know what the “right” answer is.  In this case, the right answer is 100% of the rows in FactResellerSales and the sum of the SalesAmount column.  The following query provides the control totals:

SELECT

count(*) as ResellerSalesRowcount,

sum(SalesAmount) as ResellerSalesTotalSales

FROM

dbo.FactResellerSales

 

ResellerSalesRowcount = 60,855

ResellerSalesTotalSales = $80,450,596.9823

Any drillthrough operation should return 60,855 rows and total dollars of $80,450,596.98.  The following MDX query executes a DRILLTHROUGH statement on the Adventure Works cube:

DRILLTHROUGH MAXROWS 100000

SELECT

[Measures].[Reseller Transaction Count] ON COLUMNS

FROM

[Adventure Works]

(Note:  MAXROWS is set to 10,000 by default.  Since this fact table has far more than 10,000 rows, MAXROWS is set to 100,000.)

This query also returns 60,855 rows, which is the correct answer.  Do not be fooled into thinking that this one example proves that drillthrough in SSAS will always return the right answer. 

The second example, which uses the aforementioned cash report, shows that drillthrough may not always return the correct answer.  The cash report fact table has 20,473 rows in the fact table.  But a drillthrough query, identical in form to the query used in the first example, sent back only 18,585 rows.  The sum of the dollars returned by the drillthrough set was correct.  So the dollars were right, but the rowcount was wrong. 

After adding a rowcount measure to the measure group and running the query again, the results were plain.  In SSAS, drillthrough does not return every row that went into the cell being drilled.  In fact, it does not even query the data source.  (http://msdn2.microsoft.com/en-us/library/ms345125.aspx) It returns one row for each detail cell in the cube that contributed to the cell upon which drillthrough occurs.  So the 18,585 rows returned by the DRILLTHROUGH statement represent 18,585 distinct combinations of member values for the non-empty detail cells comprising the aggregate cell.  Examination of the rowcount measure showed many fact rows with duplicate dimensionality, but not amounts (48 in one case, 41 in another, etc.)  In all cases the dollar value returned for the drillthrough row was the sum of the fact rows that went into the detail cell represented by the drillthrough row.  In other words, drillthrough reported that there were 48 fact rows that populated that detail cell, but it refused to return the individual rows.  This was not acceptable since the cash report requires one row for each transaction, even if they are dimensionally identical. 

The difference between MSAS and SSAS can be summarized with these two points: 

  • In MSAS, drillthrough queries the MSAS data source with an SQL statement and returns the fact rows underneath the cell upon which drillthrough acts.
  • In SSAS, drillthrough does not query the SSAS data source.  It queries the cube and returns one row for each detail cell underneath the cell upon which drillthrough acts.  An obvious implication of this new behavior is that all information returned by drillthrough must reside solely with SSAS as UDM attributes.

When asked about this behavior, Microsoft’s response was, “This is by design.  You can write all drillthrough queries as MDX now.”  Darren Gosbel wrote a bug report about it.  (http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126175) 

Is the latter part really true?  Can you really write all drillthrough queries as MDX?  Yes, but only if you modify your design to accommodate the “new” drillthrough, because Microsoft changed the rules of the game from MSAS to SSAS. 

The next installment of this series will address the changes required in fact and dimension design to accommodate the new DRILLTHROUGH.

Integrating Analysis Services 2005 and Reporting Services 2005 with DRILLTHROUGH; Introduction

My current client has engaged my employer to implement a financial reporting solution.  The first report we chose was a cash report, which displays details not usually carried in a cube.  In the past, I never worried about uniqueness of fact rows because it was unnecessary.  Having completely identical rows in a fact table was perfectly legitimate, a fact giving indigestion to many traditional data modelers.  When Analysis Services 2000 (“MSAS”) loads duplicate rows into the cube, it simply aggregates the measures into the single designated cell identified by the dimensional tuple formed by the surrogate key combination.  This is one of the things we love about Analysis Services, that it provides (semi) detailed and aggregated information on a moment’s notice.

But sometimes you need to see more detail than the cube provides.  Sometimes you need to see the individual fact rows contributing to a cell because reports require such detail.  The aforementioned cash report is little more than a rote listing of all cash transactions for a particular period, e.g. 2006.  While the dimensions which form the cash report provide some finely grained information, there are many cash transactions with duplicate dimensionality (though different dollar amounts).  These are aggregated into a single detail cell, as expected.  But the report needs those detail fact rows.  In MSAS, drillthrough on a single cell returned the fact rows (with designated dimensional columns) from the cell’s data source.  It did this by forming an SQL statement, pushing it back through the data provider, and passing along the rowset which answered the query.  This handy technique allowed very detailed reporting and analysis without overstressing MSAS, though it carried no performance guarantees since query performance was governed by data source performance, not MSAS itself.

While developing the cash report, I needed fact rows so I turned to the trusty DRILLTHROUGH statement.  To my chagrin, it didn’t return the correct answer.  Specifically, it didn’t return the correct number of rows.  This began a frantic search to understand why drillthrough was not behaving as expected.  I was tipped off by Teo Lachev’s book, which said that drillthrough had been completely rearchitected and that Analysis Services 2005 (“SSAS”) answered all drillthrough queries from the cube.  Uh oh.  That had nasty implications that I’ll discuss later and it has forced me to change a couple of design guidelines.

A Web search yielded a lot of clues, but nothing in one place that allowed me to craft a solution.  I hope to provide a solution here, in multiple parts.  This is the first part in a series to explain how DRILLTHROUGH has changed and how you must change your design techniques to properly utilize it.  I will reference several websites when appropriate, for I stand on the shoulders of giants.  And should I err in this series, I trust my fellow cubers to provide the truth.

Part I:  Two Examples Show that SSS Drillthrough May Not Always Behave Like MSAS Drillthrough

Part II:  Design Implications of the New Drillthrough

Part III:  Query Examples for SSAS Drillthrough

Part IV:  Summary