SSIS Snippet: Dimension and Fact load with Surrogate Keys and Early Arriving Facts
Well, what a headline... So many BI buzz words, wow, what a solution this has to be... It's nothing special, not very trivial, but also not very hard to understand... at least I hope so...
From a post in the Microsoft Technical Forums for SSIS (http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=75211) I got the idea to write my second snippet which shows how to create new dimension records from a fact table (so you get facts for a dimension record which doesn't exist, yet) and handles all the surrogate key stuff you need...
So what does this package do? It's based on Marco's post http://sqljunkies.com/WebLog/sqlbi/archive/2005/05/30/15684.aspx about how to work with surrogate keys. This is a nice example for loading dimensions and assigning surrogate keys on the fly.
I took this and placed some transforms around it (see a screenshot here: http://tpagel.gmxhome.de/snippets/keygen.jpg). There is a source and I assign the surrogate keys by doing a "merge join" with the dimension table. The problem is now that the dimension table might have some records missing. The solution is that I split up the source by a "multicast". The one side only does a "sort" to prepare for the "merge join". The other path is aggregated by the application key. I lookup the dimension table to find out the dimension key for these application keys. If that fails I generate a new surrogate key (thanks to Marco's code) and add the record with this new key to the dimension table. At the same time ("multicast" again...) I join this dimension record (with a "union all") with the already existing records from the lookup.
So now I have both, the existing records who got their surrogate key from the dimension table and the new dimension records just created. I sort them by the application key and then I can merge them with the facts using an "inner join".
The result is an updated dimension table with new records (including surrogate keys) from the fact table and the fact table itself with the right surrogate keys assigned.
I hope that this helps people to understand SSIS a little better. I'm quite sure that this is no perfect design but it works...
You can download the package (including the backup of a very simple sample database) here: http://tpagel.gmxhome.de/snippets/keygen.zip. Just add a few records to the "factSource" table, run the package and see what happens to the "dim" and "factDestination" table...
Comments are (as always) very appreciated...
Thomas