Wednesday, March 30, 2005 - Posts

OK, I have a confession to make. I thought I knew what Unpivot did. Heck, I've shown demos of it for a few years now. Today, Jason wanted to build a pretty cool package for survey results and I realized I didn't understand how unpivot worked anymore. I think I did when I built the demo, but today I don't.

The survey was on SQL BI and had columns such as:

CustomerName, email, Company, Question1_Answer1, Question1_Answer2, Question2_Answer1, Question2_Answer2, ...

Basically all of the questions had 2 answers and there were about 70 questions. (If you filled out one of such surveys, thank you for spending the time!) What Jason wanted was something like this:

CustomerName, email, Question, Answer1, Answer2

I.e. introduce a new column called Question that would contain “Question1, Question2, etc.” one per row and maintain only two columns for all the answers.

Fortunately Wenyang confirmed that I needed to use an Unpivot. I managed to play around with SSIS to get what we wanted because it's so easy to try something. 

But now I'm sitting here trying to understand what's the “Pivot Value“, and what's the “Data Value“? I started trying to understand what Unpivot is by searching the net. MsnSearch said Unpivot is the opposite of Pivot. Great, thanks. Trying to go through ANSI SQL-92's index looking for Unpivot/Pivot/cross tab, and then through several SQL books. Mr. Celko finally confirmed I'm not a smartie. Thanks, buddy. At least I have a head full of hair, grey as they might be.

Finally (and I'm kicking myself why I didn't do it sooner) what really helped was the BOL topic on the Unpivot transformation. The example makes is simple, and it makes sense. What else is there? It breaks down the operation into “Destination Column“ and “Pivot Key Value“, concepts that I understand.

Following the grid in BOL, added two rows - "Destination Column" and "Pivot Key Value" to my scratch paper:

Destination Column A1 A2 A1 A2
Pivot Key Value Q1 Q1 Q2 Q2
Column Name Customer Email Question1_Answer1 Question1_Answer2 Question2_Answer1 Question2_Answer2
Data Records Ashvini as@somewhere.com 1 2 3 4

That was easy.

The only other thing I need is the name of the column where the Destination Column Values would end up. Well, that's "Question".

Given this information, what do the terms in the UI ("Data Column" and "Pivot Value") mean? Well, "Pivot Value" is the "Pivot Key Value" in the grid above. And "Data Column" is the column name that will contain the data, which is the "Destination Column" in the grid above. At the bottom of the Unpivot UI is the single edit box for the name of the new column which will contain the "Pivot Value"s, which in our case is "Question".

So, why are the names in the UI not the same in the docs? Ironically, I had a hand in that. The new names made sense when Bob (the UI author) and I were discussing them because we understood the feature in and out. Having revisited the description, the names make sense to me again. The description that Unpivot is the opposite of Pivot also makes more sense. I haven't picked up Celko's book again, though. Serves him right for being an elitist. <g>

While we'll try to fix this discrepancy, what helped most was not the names but the description in BOL. Last year I got a chance to spend a week at Product Support understanding how they work and got reports of customers saying our BOL is the best in the industry. With all the attention on features we tend to loose focus on how much good information BOL provides. The docs folks have been a part of the team through the beginning of the product cycle and worked their butts off. Thanks to them I can go home with a lighter head knowing I finally understand something that was bugging me for half the day today.

Thank you guys!

Hvala,

Laku Noc

 

with 0 Comments

Dobro jutro,

Kako ste vi?

In recent CTP builds folks are seeing some cells that contains “<Missing Lineage ID>” when using visualizers in a data flow task.

This is a common enough issue that we should address it head on to help folks from wasting their time.

The cause of this error is that due to the optimizations that the data flow engine does internally some data is not available on certain paths even though the designer tells you it is. The “<Missing Lineage ID>” message is the data flow engine way of reporting that the designer asked to see data for a column that's not available.

To explain the workaround for now, lets use a simple layout:

     Source -> Derived Column -> SomeComponent

Lets say the Source is producing two columns: 'FirstName' and 'LastName'. The Derived Column then creates 'Name' out of the incoming columns and the SomeComponent then uses the Name column only.

If you add a visualizer on the path between Derived Column and SomeComponent, 'Name' would show up just fine but 'FirstName' and 'LastName' would have the “<Missing Lineage ID>“ error because they're not used by Destination.

So, currently you can only see columns that're used by the component that's getting the data after the visualizer.

One workaround, then, is to use the Advanced Editor for editing the next component (SomeComponent in this case) and adding all columns to the Input Columns collection of that component's input which is receiving the data from the visualizer. Note that some components' validation might warn that unnecessary columns are added to the component while some other components such as some destinations manage their input columns collection tightly and will not allow such tinkering.

Another workaround is to add a UnionAll between the Derived Column and Destination that automatically syncs all the columns. That, however, has the performance penalty of causing memory copies through the UnionAll.

In this scenario, perhaps the best workaround would be to look at the data in the destination itself.

We hope to address this very soon because visualizers are one source of productivity boosts you get from using SSIS.

zdravo

PS. You might be wondering why I'm sprinkling my posts with bits of Croatian/Serbian. Well, in preparation of my trip to WinDays in Croatia (http://www.microsoft.com/croatia/windays/sadrzaj.aspx), I and my 4 year old son have been trying to learn a bit of the language and practicing with a few of our colleagues from ex-Yu. While my wife also joins in the fun, our 4 month old blows bubbles instead. Slacker.

 

with 1 Comments