Friday, October 07, 2005 - Posts

AS2000 magics: working without repository

Today I discovered that an AS2000 database can keep working even when the repository holds no metadata about its internal structures (cubes and dimensions).

Here is how it works.

  • You have a "Foodmart 2000" database
  • In the OLAP Data directory, create a "Foodmart Copy" directory
  • In Analysis Manager, create a database named "Foodmart Copy"
  • Stop the Analysis Services service
  • In the OLAP Data directory, copy the content of "Foodmart 2000" into the "Foodmart Copy" directory
  • Start the Analysis Services service
  • From an OLAP client (like Excel) you can browse all the cubes in "Foodmart Copy" as if you were in "Foodmart 2000"
  • In Analysis Manager, "Foodmart Copy" still appears as an empty database (it has no metadata in the repository)
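
The file-level part of these steps (stop the service, copy, start the service) could be scripted roughly as follows. This is only a sketch under assumptions: the service name `MSSQLServerOLAPService` is what I believe AS2000 registers, but check it on your server; the function names and the `service_control` parameter are my own. The "Foodmart Copy" database must already have been created in Analysis Manager.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: Windows service name of Analysis Services 2000; verify locally.
SERVICE_NAME = "MSSQLServerOLAPService"


def run_net(action, service=SERVICE_NAME):
    """Stop or start a Windows service through the `net` command."""
    subprocess.run(["net", action, service], check=True)


def clone_database_files(data_dir, source_db, target_db, service_control=run_net):
    """Copy the files of a processed database over an (empty) target database.

    data_dir is the OLAP Data directory; source_db and target_db are the
    database directory names inside it.
    """
    source = Path(data_dir) / source_db
    target = Path(data_dir) / target_db
    if not source.is_dir():
        raise FileNotFoundError(source)

    service_control("stop")
    try:
        # Copy everything from the source directory into the target directory,
        # creating the target if it does not exist yet.
        shutil.copytree(source, target, dirs_exist_ok=True)
    finally:
        # Restart the service even if the copy fails.
        service_control("start")
    return target
```

You would call it as `clone_database_files(olap_data_dir, "Foodmart 2000", "Foodmart Copy")`, where `olap_data_dir` is the path of the OLAP Data directory on your server. It needs Python 3.8 or later because of `dirs_exist_ok`.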

I know at least one customer who currently uses this "undocumented feature" (can we call it that?) to distribute a processed database so that queries scale across a multi-server architecture. If they change some metadata in the "master" database, for example by adding a level to a dimension or a measure to a cube, the distribution is still a simple XCOPY deployment.

I didn't know that AS2000 is able to query a processed cube even without metadata...

Star schema vs. snowflake with SSAS2005

Recently I discovered that SSAS2005 can load a very large dimension more efficiently if it is designed as a snowflake schema rather than as a single table (star schema). I have to say that I'm a strong supporter of the star schema, but these are the facts.

For a dimension, SSAS2005 sends a SELECT DISTINCT query to the relational data source for each dimension attribute. If you have a product dimension with 2 million rows and a lot of attributes (maybe 30), this takes time and consumes SQL Server resources (CPU and RAM). But when many of these attributes are defined at the category level (imagine a category-product natural hierarchy), then in a snowflake design many of the SELECT DISTINCT queries are sent to the ProductCategories table only, without a join to the much larger Products table.
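
To make the difference concrete, here is a small sketch that builds one SELECT DISTINCT statement per attribute for both designs and counts how many of them touch the big Products table. The attribute names (`ProductName`, `Color`, `CategoryName`, `CategoryManager`) are invented for illustration, and the generated SQL is a simplification, not the exact text SSAS2005 sends.

```python
def distinct_queries(attributes):
    """Build one SELECT DISTINCT statement per attribute.

    attributes maps a (hypothetical) attribute column to the table it is read from.
    """
    return [f"SELECT DISTINCT {column} FROM {table}"
            for column, table in attributes.items()]


def hits_on(table, attributes):
    """Count how many attribute queries read from the given table."""
    return sum(1 for source in attributes.values() if source == table)


# Star schema: every attribute lives in the single, large Products table.
star = {
    "ProductName": "Products",
    "Color": "Products",
    "CategoryName": "Products",
    "CategoryManager": "Products",
}

# Snowflake: category-level attributes move to the small ProductCategories table.
snowflake = {
    "ProductName": "Products",
    "Color": "Products",
    "CategoryName": "ProductCategories",
    "CategoryManager": "ProductCategories",
}

if __name__ == "__main__":
    for name, design in (("star", star), ("snowflake", snowflake)):
        print(name, "queries on Products:", hits_on("Products", design))
        for query in distinct_queries(design):
            print("   ", query)
```

With four attributes the gain is small, but with around 30 attributes, most of them at the category level, most of the queries would scan the small table instead of the 2-million-row one.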

If you only look at the performance of a full cube process, the difference may not be that significant after all. But what if you update the cube incrementally and want to incrementally update the Product dimension too? Many times each day? In that case you might look at this in a very different way!

I'd like to share experiences with anyone who has done similar tests and reached similar conclusions: comments are welcome!