September 2004 - Posts

Side-by-Side Installation of SQL Server 2000 and Yukon Beta 2


I have been able to successfully install and run Yukon Beta 2 in parallel with SQL Server 2000. I used different names for the server instances. I am able to bring up both instances of SQL Server and Analysis Services at the same time, although my machine gets too slow with the 512MB RAM that I have. 1GB or more RAM should do better. Also, I am not able to use the SQL Server Service Manager to start Yukon. Both instances of the server seems to be pointing to AS 2000. So, just to get around this problem, I need to go to Windows Services and start from there. Apart from that, things look good.

Narayanan Kulasekar
Architect


Dimensions and sorcerer’s brooms


When building a cube in AMO (or the XML equivalent, ASSL), sometimes it feels like you’re copying a dimension every time you turn around. You start with a dimension (that is, a database dimension). Then you don’t just add it to the cube, you create a cube dimension that references it. And then you don’t just use that in a measure group, instead, you create a measure group dimension. It’s like poor Mickey Mouse and his sorcerer’s broom: they keep multiplying.

Well, I finally worked through why there’s a new object at each layer, and it really does help to see what each one adds that’s not there in the more general version.

  • Database dimension: this is where you really define the dimension and create all the attributes.
  • Cube dimension: this adds the fact that a cube can have multiple instances of the same dimension—the classic example is Ship Date, Due Date, Order Date. The point is that you don’t have to create three different dimensions. You define the dimension only once and use it three times, and the cube dimension is where you give them unique names.
  •  Measure Group dimension: this adds the fact that the same dimension can have a different granularity attribute with different fact tables—the classic example is Orders at the Daily grain but Budget at the Monthly grain. You need a place to identify what the “leaf” level of the dimension should be for this particular fact table, and also tell which fact table column it should join to (which further supports the multiple instances of the dimension).
  • Aggregation Design dimension: this allows you to override the global row counts for attributes in the context of one aggregation. Basically, it lets you speak out of both sides of your mouth (not to say “lie”) with different aggregation designs—in one, you might say an attribute has 10 members, and in another you might say the attribute has 10000 members.

So I feel better now. Just thinking through the different versions made it clear that there really weren’t 400 different variations of one dimension. (Although there are more than 4 :-).

Reed


Adding a Where Clause to a Partition


When you create a cube, the designer creates a default partition that references the fact table. If you’re going to add more partitions, each of which points to a different fact table, it’s easy—you just add the partition and select the fact table. But what if you want to have multiple cube partitions coming from a single fact table.

In AS2K, you could add a Slice. The Slice would then create a Where clause on the fact table query (and make queries more efficient by not searching a partition that has no relevant data). In Yukon, you don’t have to create an explicit slice, because the engine automatically figures out which partitions contain which slice(s).

So you need to add an explicit Where clause. But you can’t because when you try to edit the partition, it doesn’t get to Page 2, where you can type in the Where clause because it tells you that the desired fact table is already used.

So how do you change a partition from a table to a query? Simple answer: you don’t. Instead, delete the existing partition and create a new one. The new one lets you pick the fact table (because it’s not used yet) and then you can get to Page 2 where you can enter the Where clause. You can add additional partitions the same way, because it lets you select a fact table if existing partitions use a query against the table.

In a warped kind of way, it all makes sense.

Reed


MDX Subqueries and Multi-Select


Yesterday, I posted about the new MDX subqueries. One place where they’re particularly intuitive is with multi-select In a WHERE clause tuple, you can have only one member for each attribute, so you can’t slice by both US and Canada. But users have wanted a multi-select slicer, so most client tools have taken the approach of creating a calculated member that aggregates the selected items and then uses that one member for the slicer. This sometimes gives the desired result, but has all sorts of problems. For one thing, the Aggregate function won’t work at all on distinct count measures. For another, the slicer caption is usually something dumb like “multi-select”.

With the new subquery structure, the UI can just change the FROM clause to (SELECT {[US],[Canada]},*) on 0 from [Sales]) which has very intuitive consequence of reducing the source space for the report.

- Reed


MDX Slicers and Subqueries


One of the big conceptual differences between MDX and SQL query statements is in the way the WHERE clause works. In SQL, the WHERE clause restricts the source that goes into the query, and thus affects the rows (group by items) as well as the data values. In MDX, the WHERE clause qualifies the tuple for each cell of the query, but doesn’t affect the axis sets. For example, in SQL, if you group by Country and put North America in the WHERE clause, you’ll only see rows for US, Canada, and Mexico. But in MDX, if you put Country.Members on rows and put North America in the WHERE clause, you see all the countries, but the cells for most of the countries of the world are empty.

In Yukon, they didn’t change the meaning of the WHERE clause, but they added a new structure that gives the same effect as the SQL WHERE clause—to filter the source: You can now use a subquery in the FROM clause. So, rather than putting North America in the WHERE clause, if you put (SELECT ([North America],*) on 0 FROM [Sales]) in the FROM clause, you get the SQL-like result that includes only US, Canada, and Mexico.

While there’s still a place for the data-slice behavior of the MDX WHERE clause, the new approach behaves in a way that feels more intuitive to both SQL developers and end users.

Reed


Defining ‘By Account’ Aggregations


In Yukon, you can use semi-additive functions such as Last NonEmpty Child to aggregate a measure. The function then works against the Time dimension. You can also identify a measure’s aggregation as “By Account”, which means that it uses the Account Type attribute for each member to choose which aggregation function to use. When you use the Wizard to build the dimension, you get to map your labels to the 7 pre-defined account types.

But what if you want to use a different aggregation function for a type? For example, what if you want to use “Average of Children” instead of “Last NonEmpty Child”? Or what if you change your mind about the Account Type mapping (and you really meant to spell it as “Asset” rather than as “Assset”)? The definitions of the account types are stored in the <xxx>.database file, which is the only XML definition file you cannot see by choosing View Code in the Solution Explorer. You need to go to Windows Explorer and open the file.

In practice, it’s probably best to try to use the bucket names as meaningfully as possible, but you don’t really have to. Think of the seven pre-defined account types as simply seven slots for aggregate functions. (There are 12 aggregate functions, so there are always 5 you won’t be able to use :-(. ) You could, if you want, assign seven different functions to seven “Account Types” and then map those to your account type names in the database file.

But whether you’re vanilla or chocolate fudge sundae in the way you use the Account Type mappings, the database file is the place to change them.

Reed


Profiling is a Good Thing (Part 2)


In a previous blog I talked about using SQL Server 2005 Profiler to trace events that occur when you process AS2K5 dimensions and partitions. Profiler also provides some useful information about how AS2K5 responds to a query.

I started off using the Standard (default) trace template provided with Profiler. Audit Login events let you know who has accessed your database. Discover Begin events occur when a browser requests database metadata. Query and Progress Report events let you watch the steps AS2K5 goes through to respond to a query. You watch as data is retrieved from each partition and tell when data is retrieved from cache. You can rerun a query and see that it is much less work for AS2K5 to respond

I think SQL Server 2005 Profiler is going to make it much easier to optimize queries.

- Scott


IsAggregateable and Default Members


People ask me a lot what’s new in MDX for Yukon. MDX itself really isn’t all that different, but with the multiple hierarchies in a dimension, there are a lot of implications of MDX that new. One that I was working on with Scott a few days ago had to do with the implication of turning off aggregations for an attribute hierarchy. As soon as you turn off aggregations for the attribute, you suddenly get the default member implied in every single tuple you ever use (where you don’t specifically mention the attribute). So, for example, if you had an attribute with countries and no All member, the first member — say Albania — becomes the default default member. In every expression, unless you specify an explicit country—or reference the entire set of Country.Members — you implicitly get Albania. This is, again, not different from AS2K, but because there are so many attribute hierarchies, it’s easy to get burned by it.

Reed


Profiling is a Good Thing


Well, using SQL Server 2005 Profiler to create a trace of Analysis Services events is a good thing.

I’ve spent a lot of time reading Analysis Services 2000 process logs while trying to optimize the processing of cubes. So I was quite frustrated when I couldn’t find how to turn on the process log in AS2K5. I was looking in the wrong place for the info. SQL Server 2005 Profiler can now be used to create a trace of Analysis Services events.

I started off using the Standard (default) trace template provided with Profiler. Because I wanted to monitor processing events, I used the Events Selection tab of the Trace Properties dialog box to deselect all event categories except Progress Reports, Command Events, and Errors and Warnings. The Events Selection tab also has an Organize Columns button that allows you to rearrange the order of the columns. I found the following columns useful:

  • ObjectPath
  • EventSubclass
  • TextData
  • IntegerData
  • CurrentTime
  • StartTime
  • EndTime
  • Duration
  • CPUTime
  • JobID

In Profiler you can view the trace in real time and have the trace saved to a SQL Server table, Profiler trace file, or XML file. I liked saving a trace to SQL Server because I could then write SQL queries to filter and sort the events. There are events that will give you the SQL that Analysis Services used to populate the dimension or partition and you can see how long each object took to process.

- Scott


RS MDX in B2


I just wanted to follow-up on Stacia’s (very interesting) recent post about Reporting Services and MDX. In Beta 2, the MDX that the wizard generates returns data from only a single level, and then the report aggregates the numbers into totals. There are a couple of problems with this approach, the most serious of which is that it doesn’t work properly with cube calculations. If you have, for example, a calculated member that divides Dollars by Units to get Price, the Beta 2 MDX would retrieve the low-level price and then add it up.

The dev team has assured me that by Beta 3 it will work properly. They will generate good multi-level MDX and the report will be able to grab the cube-generated value at all levels, which will return the right answer.

One other little note if you’re playing with OLAP as a data source in Beta 2. The Report Wizard automatically sorts by the Grouping items, so if you retrieve a Data dimension that includes Month Names, the wizard sorts the month names alphabetically. The workaround is simply to go into the group definition and delete the sort—so the report will sort in the data source order. (You also want to make sure your date dimension is to the left of other dimensions so that the other dimensions don’t mess up the sort order.)

- Reed


DTS Runtime


When you programmatically create a package or a custom object for DTS, you have to think in two different modes: DesignTime and RunTime. DesignTime is for when you’re building the package, for when you’re setting up all the flows and connections. RunTime is for when the package executes. The goal is to make the RunTime flows as fast as possible by getting everything else set up in advance. This all makes sense. There is a tad bit of confusion about the fact that the term RunTime is also used sometimes to refer to the control flow part of a package—to the sequenced execution of tasks, as opposed to the data flow, which is also called the PipeLine. As long as you keep the two uses of RunTime separate, you can clearly call DesignTime methods of RunTime components or properly implement RunTime methods for a custom PipeLine componenent

Reed


Airlift


Last week was a big Airlift event where I was finally able to meet Bob and Larry (the primary players behind the other SQL Server Ascend courses). It was cool to be able to compare notes a little on the joys of banging one’s head against a wall for months to see which one cracks first.

At the Airlift, I did a one-day version of the five-day Yukon BI for Developers course—basically, talking really, really fast to try to fit an impossible amount of content into a day. We even had a surprise visit from some folks from the ActiveViews team to show us a sneak preview. ActiveViews isn’t in Beta 2, so I haven’t looked at it much yet. Because of its historical evolution, ActiveViews requires its own metadata layer—rather than using the Yukon UDM, which is understandable, but unfortunate. The value for the report designers/users is sufficient that some administrative redundancy is probably acceptable.

- Reed