OK, I have a confession to make. I thought I knew what Unpivot did. Heck, I've shown demos of it for a few years now. Today, Jason wanted to build a pretty cool package for survey results and I realized I didn't understand how unpivot worked anymore. I think I did when I built the demo, but today I don't.
The survey was on SQL BI and had columns such as:
CustomerName, email, Company, Question1_Answer1, Question1_Answer2, Question2_Answer1, Question2_Answer2, ...
Basically all of the questions had 2 answers and there were about 70 questions. (If you filled out one of such surveys, thank you for spending the time!) What Jason wanted was something like this:
CustomerName, email, Question, Answer1, Answer2
I.e. introduce a new column called Question that would contain “Question1, Question2, etc.” one per row and maintain only two columns for all the answers.
Fortunately Wenyang confirmed that I needed to use an Unpivot. I managed to play around with SSIS to get what we wanted because it's so easy to try something.
But now I'm sitting here trying to understand what's the “Pivot Value“, and what's the “Data Value“? I started trying to understand what Unpivot is by searching the net. MsnSearch said Unpivot is the opposite of Pivot. Great, thanks. Trying to go through ANSI SQL-92's index looking for Unpivot/Pivot/cross tab, and then through several SQL books. Mr. Celko finally confirmed I'm not a smartie. Thanks, buddy. At least I have a head full of hair, grey as they might be.
Finally (and I'm kicking myself why I didn't do it sooner) what really helped was the BOL topic on the Unpivot transformation. The example makes is simple, and it makes sense. What else is there? It breaks down the operation into “Destination Column“ and “Pivot Key Value“, concepts that I understand.
Following the grid in BOL, added two rows - "Destination Column" and "Pivot Key Value" to my scratch paper:
| Destination Column |
|
|
A1 |
A2 |
A1 |
A2 |
| Pivot Key Value |
|
|
Q1 |
Q1 |
Q2 |
Q2 |
| Column Name |
Customer |
Email |
Question1_Answer1 |
Question1_Answer2 |
Question2_Answer1 |
Question2_Answer2 |
| Data Records |
Ashvini |
as@somewhere.com |
1 |
2 |
3 |
4 |
That was easy.
The only other thing I need is the name of the column where the Destination Column Values would end up. Well, that's "Question".
Given this information, what do the terms in the UI ("Data Column" and "Pivot Value") mean? Well, "Pivot Value" is the "Pivot Key Value" in the grid above. And "Data Column" is the column name that will contain the data, which is the "Destination Column" in the grid above. At the bottom of the Unpivot UI is the single edit box for the name of the new column which will contain the "Pivot Value"s, which in our case is "Question".
So, why are the names in the UI not the same in the docs? Ironically, I had a hand in that. The new names made sense when Bob (the UI author) and I were discussing them because we understood the feature in and out. Having revisited the description, the names make sense to me again. The description that Unpivot is the opposite of Pivot also makes more sense. I haven't picked up Celko's book again, though. Serves him right for being an elitist. <g>
While we'll try to fix this discrepancy, what helped most was not the names but the description in BOL. Last year I got a chance to spend a week at Product Support understanding how they work and got reports of customers saying our BOL is the best in the industry. With all the attention on features we tend to loose focus on how much good information BOL provides. The docs folks have been a part of the team through the beginning of the product cycle and worked their butts off. Thanks to them I can go home with a lighter head knowing I finally understand something that was bugging me for half the day today.
Thank you guys!
Hvala,
Laku Noc