<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>Michael's BI corner</title><link>http://sqljunkies.com/WebLog/reckless/default.aspx</link><description>BI blog on Analysis Services, Integration Services and Reporting Services.</description><dc:language>en-US</dc:language><generator>CommunityServer 1.0 (Build: 1.0.1.50214)</generator><item><title>My blog is moving</title><link>http://sqljunkies.com/WebLog/reckless/archive/2007/01/07/26573.aspx</link><pubDate>Sun, 07 Jan 2007 08:40:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:26573</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/26573.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=26573</wfw:commentRss><description>My blog has moved to &lt;a href="http://barrettbi.spaces.live.com"&gt;http://barrettbi.spaces.live.com&lt;/a&gt;.&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=26573" width="1" height="1"&gt;</description></item><item><title>SSIS: A distinct component</title><link>http://sqljunkies.com/WebLog/reckless/archive/2006/12/12/ssis_distinct.aspx</link><pubDate>Tue, 12 Dec 2006 15:03:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:26014</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/26014.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=26014</wfw:commentRss><description>As a follow up to Jamie Thomsons post on "SSIS: A distinct component please" (found &lt;a href="http://blogs.conchango.com/jamiethomson/archive/2006/12/08/SSIS_3A00_-A-distinct-component-please.aspx"&gt;here&lt;/a&gt;) I thought I would share how I have implemented a distinct component using the script component in the SSIS pipeline.
&lt;br&gt;&lt;br&gt;
The script component is - of course - set up to be asynchronous by modifying the synchronous input ID of the output from the script component. It will only be partially-blocking, however, as it does not need to wait for all rows to be passed in before the output is created. A new row is added to the output every time a new key value is encountered in the input. This is contrary to the distinct functionality of the built-in Sort component, where you have to wait for all rows to be passed in, before the distinct values are sent down the output.&lt;br&gt;&lt;br&gt;

The implementation uses a hashtable to keep track of the distinct values, and to reproduce it you can do the following:&lt;br&gt;&lt;br&gt;
1. Create a new SSIS package&lt;br&gt;
2. Create an OLE DB Connection manager that points to the AdventureworksDW database.&lt;br&gt;
3. Create a data flow task in the package&lt;br&gt;
4. In the dataflow, create an OLE DB Source with the source SQL statement:&lt;br&gt;&lt;br&gt;
&lt;i&gt;
SELECT Color&lt;br&gt;
FROM DimProduct&lt;br&gt;
&lt;/i&gt;
&lt;br&gt;
5. Tie the output of the OLE DB Source to the input of a new script component.&lt;br&gt;
6. In the script component under "Input" select the Color attribute&lt;br&gt;
7. Under "Inputs and outputs" select the output and modify the synchronous input ID to something other than the default value (this makes the output behave asynchronously).&lt;br&gt;
8. Create a new output column called Color - remember to adjust the data type to match the data type of the input column.&lt;br&gt;
9. Add the following code to your script.&lt;br&gt;&lt;br&gt;
&lt;pre&gt;
&lt;i&gt;
Imports System&lt;br&gt;
Imports System.Data&lt;br&gt;
Imports System.Math&lt;br&gt;
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper&lt;br&gt;
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper&lt;br&gt;
Imports System.Collections&lt;br&gt;&lt;br&gt;

Public Class ScriptMain&lt;br&gt;
    Inherits UserComponent&lt;br&gt;
    Dim htDistinct As Hashtable = New Hashtable&lt;br&gt;&lt;br&gt;

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)&lt;br&gt;
        If Not htDistinct.ContainsKey(Row.Color) Then&lt;br&gt;
            htDistinct.Add(Row.Color, DBNull.Value)&lt;br&gt;
            Output0Buffer.AddRow()&lt;br&gt;
            Output0Buffer.Color = Row.Color&lt;br&gt;
        End If&lt;br&gt;
    End Sub&lt;br&gt;&lt;br&gt;

End Class&lt;br&gt;&lt;br&gt;
&lt;/i&gt;
&lt;/pre&gt;
10. Optionally, connect the script component to a row count transformation and add a data viewer to the path between the two components.&lt;br&gt;
11. Run the package and watch how only distinct values are output from the script component.&lt;br&gt;&lt;br&gt;

This example illustrates how to make a distinct operation on a single column in the data flow, but it can be easily extended to making a distinct operation on more than one column.&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=26014" width="1" height="1"&gt;</description></item><item><title>AS2005, MDX: Subcubes should set the current context</title><link>http://sqljunkies.com/WebLog/reckless/archive/2006/07/05/22106.aspx</link><pubDate>Wed, 05 Jul 2006 08:27:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:22106</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/22106.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=22106</wfw:commentRss><description>Currently, subcubes/subselects used in MDX queries in Analysis Services 2005 do not set the current query context. This is a huge problem in calculated members that need the current context in order to return correct results. 
&lt;P&gt;For examples and an elaboration, please see:&lt;BR&gt;&lt;A href="http://www.sqljunkies.com/WebLog/reckless/archive/2006/03/08/18601.aspx"&gt;http://www.sqljunkies.com/WebLog/reckless/archive/2006/03/08/18601.aspx&lt;/A&gt;&lt;BR&gt;&lt;A href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=282896&amp;amp;SiteID=1"&gt;http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=282896&amp;amp;SiteID=1&lt;/A&gt;&lt;BR&gt;&lt;A href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=488496&amp;amp;SiteID=1"&gt;http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=488496&amp;amp;SiteID=1&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Excel 2007 makes extensive use of subselects/subcubes for restricting query results, so the problem will appear in this client application with the current behavior of AS 2005.&lt;/P&gt;
&lt;P&gt;If you agree with me, please vote for my suggestion at: &lt;A href="http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=156581"&gt;http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=156581&lt;/A&gt;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=22106" width="1" height="1"&gt;</description></item><item><title>SSIS: Handling large data flows</title><link>http://sqljunkies.com/WebLog/reckless/archive/2006/05/01/ssis_largedataflows.aspx</link><pubDate>Mon, 01 May 2006 17:27:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:20847</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/20847.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=20847</wfw:commentRss><description>Recently I had an experience while developing an SSIS package that I would like to share. The package had a simple control flow containing 5 Execute SQL Tasks, one Send Mail Task and a Data Flow Task. Simple as this sounds, the data flow was really large. Some of the components that it contained were:&lt;br&gt;
&lt;br&gt;
* 3 OLE DB sources&lt;br&gt;
* 4 OLE DB destinations&lt;br&gt;
* Approx. 20 Lookup components&lt;br&gt;
* Approx. 20 Conditional Split components&lt;br&gt;
* Approx. 20 Union All transformations&lt;br&gt;
* Approx. 30 Derived Column transformations&lt;br&gt;
* Approx. 20 Script components&lt;br&gt;
&lt;br&gt;
I should mention that the data flow was used for loading fact data, and the components were (primarily) used to assign surrogate keys and inserting inferred members in dimension tables using the &lt;a href="http://www.microsoft.com/downloads/details.aspx?familyid=705B03F3-1BBF-417F-9E63-92A00A4744E6&amp;displaylang=en"&gt;"Project Real Method"&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
The problem with the package was that when I was developing it, the UI became increasingly slow to work with, when I was editing the data flow. In the end it was almost impossible to work with the data flow, as I had to wait 15-20 seconds for the UI to react every time I clicked the mouse. Also, opening the package took a good 10-15 minutes - and this was also the case when DTEXEC (not DTSDebugHost) had to execute the package. There was simply a 10-15 minute delay before the package started executing!&lt;br&gt;
&lt;br&gt;
Now, what could I do? I needed the functionality in the package, but I could not live with the slow (impossible) editing of the package, and the delay when executing the package was also unacceptable. I came up with a solution that is almost as good as if I had been able to keep the logic in one data flow. I simply "cut the data flow in half", creating two data flows. The first data flow wrote its output to a raw file and the next data flow simply picked up data from this file and continued its execution.&lt;br&gt;
&lt;br&gt;
It's not a perfect solution, but it solved the problem, and if you have a large data flow that for some reason becomes very slow to work with, perhaps you could use this approach as well...&lt;br&gt;
&lt;br&gt;
I think what I have encountered could be described as a bug. I haven't filed it with MS, though.&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=20847" width="1" height="1"&gt;</description></item><item><title>SQL Server 2005 SP1 CTP</title><link>http://sqljunkies.com/WebLog/reckless/archive/2006/03/16/19218.aspx</link><pubDate>Thu, 16 Mar 2006 20:54:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:19218</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/19218.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=19218</wfw:commentRss><description>The SQL Server 2005 SP1 CTP is now available at &lt;A href="http://www.microsoft.com/sql/ctp_sp1.mspx"&gt;http://www.microsoft.com/sql/ctp_sp1.mspx&lt;/A&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=19218" width="1" height="1"&gt;</description></item><item><title>MDX: EXISTING operator and subselects</title><link>http://sqljunkies.com/WebLog/reckless/archive/2006/03/08/18601.aspx</link><pubDate>Wed, 08 Mar 2006 10:47:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:18601</guid><dc:creator>reckless</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/reckless/comments/18601.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/reckless/commentrss.aspx?PostID=18601</wfw:commentRss><description>&lt;P&gt;This is the first post in my blog, which will be focusing on BI-related issues on the SQL Server platform - i.e.: Analysis Services, Integration Services and Reporting Services.&lt;/P&gt;
&lt;P&gt;This blog entry is about the use of the EXISTING operator in MDX calculations (well, really it is about one of the pitfalls of using EXISTING). The operator is new to Analysis Services 2005 and can be used to filter a set by the current coordinate in the cube. This is very useful and can for instance be used for making MDX calculations "multi-select friendly" (for more details on this, see Mosha's blog post here: &lt;a href="http://www.sqljunkies.com/WebLog/mosha/archive/2005/11/18/multiselect_friendly_mdx.aspx"&gt;http://www.sqljunkies.com/WebLog/mosha/archive/2005/11/18/multiselect_friendly_mdx.aspx&lt;/A&gt;). The following MDX query (running on the Adventure Works cube database) illustrates the use of EXISTING.&lt;/P&gt;&lt;PRE&gt;&lt;FONT&gt;WITH MEMBER [Measures].[MyDate] AS&lt;BR&gt;Tail(EXISTING [Date].[Calendar].[Date], 1).item(0).membervalue&lt;BR&gt;SELECT {[Measures].[MyDate]} ON 0&lt;BR&gt;FROM [Adventure Works]&lt;BR&gt;WHERE [Date].[Calendar Year].&amp;[2003]&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;This query will return the last date in calendar year 2003, which of course is december 31 2003. It does this because the WHERE clause slices on calendar year 2003 and the last date to exist with this is december 31 2003.&lt;/P&gt;
&lt;P&gt;What happens if we rewrite the query a bit to use a subselect to filter on calendar year 2003 instead of specifying this in the WHERE clause?&lt;/P&gt;&lt;PRE&gt;&lt;FONT&gt;WITH MEMBER [Measures].[MyDate] AS&lt;BR&gt;Tail(EXISTING [Date].[Calendar].[Date], 1).item(0).membervalue&lt;BR&gt;SELECT {[Measures].[MyDate]} ON 0&lt;BR&gt;FROM &lt;BR&gt;(SELECT [Date].[Calendar Year].&amp;[2003] ON 0&lt;BR&gt;FROM [Adventure Works])&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;Now, I might have been the only one, but I certainly expected this query to return the same date as the first one: December 31 2003. &lt;STRONG&gt;This is not the case&lt;/STRONG&gt;. Our new, rewritten, query will return August 31 2004 (which is the last available date in the date dimension in the Adventure Works cube database). What this really means is that our subselect &lt;STRONG&gt;does not&lt;/STRONG&gt; set the current coordinate in the cube (in Moshas words it "merely does top level Exists with axis and applies visual totals"). This is really important to remember if you ever want to use the EXISTING operator for some fancy MDX calculations, because if your client tools use subselects (which, for example, Excel 12 does extensively I am told), you might run into some unexpected results.&lt;/P&gt;
&lt;P&gt;Now, it is not all bad - if our query for instance sets the current coordinate on the date dimension by using one of the hierarchies from this dimension on an axis, we will get our expected result.&lt;/P&gt;&lt;PRE&gt;&lt;FONT&gt;WITH MEMBER [Measures].[MyDate] AS&lt;BR&gt;Tail(EXISTING [Date].[Calendar].[Date], 1).item(0).membervalue&lt;BR&gt;SELECT {[Measures].[MyDate]} ON 0,&lt;BR&gt;[Date].[Calendar Year].[Calendar Year] ON 1&lt;BR&gt;FROM &lt;BR&gt;(SELECT [Date].[Calendar Year].&amp;[2003] ON 0&lt;BR&gt;FROM [Adventure Works])&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;I&gt;UPDATE 20060610: The problems mentioned above are not limited to the use of the EXISTING operator. They apply in all situations where the current context is needed in an MDX expression in order to return a correct result.&lt;/i&gt;&lt;/p&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=18601" width="1" height="1"&gt;</description></item></channel></rss>