April 2005 - Posts

Yesteday, Thierry, a colleague from the Analysis Services team and I arrived to Opatija. The flight was great, it was the first time I tried noise cancelling headphones and they just rock! That and the experience of coming here kept us with about 3 hours of sleep in 36 hours or so.

One thing I love about Croatians is that they're just really passionate about ... well... almost everything from what I can see. They love their country, their hot chocolate, their ties (did you know that's a Croatian contribution to the world?), their food, their sports, their islands, and they absolutely love technology! There're about 200 speakers here for WinDays, and over a thousand attendees. Plus a few hundred support staff. That's pretty neat for a country with a population less than half of New Delhi, where I'm originally from. There're also folks who're taking others into production with SSIS, a newsflash which is always welcomed.

Our driver was pretty passionate about his job of delivering us from Zagreb (the capital) to the supposedly-sunny Opatija. It was raining cats and dogs yesterday, and the driver was doing ~100mph on the highways. I kept hinting by asking Thierry (who had me for an airbag since he was conveniently sitting in the back), ”hey, why don't we drive this fast in Seattle during the rain?” and “what does a speed limit of 80 mean?“ but perhaps should have asked the dude to cut back on the caffeine instead. At one point he said “The rain is good, the police are not on the roads”. well... hello! There's a reason they're not on the roads, dude: it's because they don't want to break their necks chasing maniacs! At one point on a toll checkpoint a policeman asked us to pull over, but that was just because of a spot check to make sure the car wasn't stolen. At another point the car underwent hydroplaning at which point the guy got a hint for a little while. Finally the traffic in Rijeka throttled his passion.

Aside from that excitement, things have been pretty stable. I finally got a chance to focus on my presentations and have just now started working on the demos. Lets see how it goes. The sun is trying to peak out today, and Thierry and I finally found a quiet place to sit and focus that had internet connectivity and reduced levels of tobacco smoke. It was interesting to see a loaded cigarette lighter as a swag in the conference handbag. No wonder it's so hard for Marin (the dev manager of Analysis Services and a Croat) to give up smoking.

that's it for now. Wish me luck for tomorrow.

 

with 1 Comments

Today, my ever helpful colleague Bob provided a language lesson in Serbian which I'm passing along in case you find yourself stressing out in the ex-Yu area.

In a previous post I had written

    Ja bih peva

I thought peva was the same in Russian and Serbian. What I forgot was that the 'e' is not pronounced as 'ee' but rather as the 'e' in bed. So, Bob said it should either be :

    Ja bih pevao

which means "I want to sing", or
 
    Ja bih pivo

which means "I want a beer".

The correct answer depends on how much one has had to drink, I guess. Unless you're Sergei, one of our extremely talented Technical Leads, who once gave us a memorable experience by singing a Russian Opera while standing on a table when we had met a really hard internal deadline. Heh heh heh heh. I think I have that somewhere on my pocket pc! That guy (besides being the youngest employee at Microsoft at one point) is such a great sport besides being just a gem of a guy.

This reminds me of the way folks say the name of my boss. Most folks that don't know Hindi pronounce his name as 'Kmäl' (pronunciation key), which in Hindi means 'magic'. I say 'Kml' which means the flower Lotus.

I didn't know how much one could appreciate being around interesting people with so many varied backgrounds until I joined Microsoft. You just use a different part of the brain that would otherwise likely stay dormant. I now know how to curse really well (at the screen, of course) in Serbian/Croatian and Romanian, say a few words in Spanish (and something that'll get your ass kicked in parts of Mexico), how to say thank you in Chinese (Mandarin), how to tell written Korean from Chinese and Japanese, some quite interesting stories about Scotland thanks to Donald, and how similar some words in Hindi are to Persian. While the grammar of Serbian is hard, I've been intimated by Gaelic for now. All the 'kh's remind me of Arabic that I got a chance to learn for a couple of years growing up, which, BTW, I get to read once in a while thanks to Mohammed. Maybe when I go to Scotland.

One of our colleagues from my previous team of Analysis Services, for example, comes around lunch time and says the word 'lunch' or 'food' in the native language of the person whose office he's passing, inviting them to join the lunch train. Once in a while I do hear English! 

Which brings me to the final part of this rambling post that I'm writing while installing VS & SQL. Aside from all this interesting stuff, we do write software once in a while. Some pretty good one at that. The SQL team has several openings up on the career website. If you're up for a challenge or just want to try your luck out like I did, drop us a resume. We've got a couple of positions in the SSIS team as well... for both dev and test. And if you're really good, positions could be created for you. But, that's another story.

bok!

 

with 0 Comments

Folks have been running into issues recently about package hanging for mysterious reasons. Here's a simple checklist:

1. Check (using either FileMon or the SSIS Perf counter on Buffers Spooled or other disk related perf counters) to see if the disk is busy doing something like spooling buffers.

2. Check to see if readers & writers into your database have caused a deadlock. sp_who and sp_lock are your friends. I usually don't have to go to the DMVs to get any more information.

3. In a rare case (and rarer everyday now), bumping up the EngineThreads property alleviates the issue. If that works, please please please open a bug so that we can make sure it's resolved. We'll need the package and the execution plan.

That's it for now!

ja bih peva

 

with 0 Comments

There're two properties on the Data Flow task that have piqued enough interest to cause this post. Note that while I'm describing numbers below, the SSIS team reserves the right to change them in response to our performance testing. There're no secrets here, most of this information is exposed plainly and some other (like MaxBufferSize) can be retrieved easily as well. This post hopefully puts it together.

First the basics:

DefaultMaxBufferRows: maximum number of rows in a buffer. Defaults to 10,000.

DefaultBufferSize: maximum size of the buffer (in bytes). Defaults to 10MB.

Now, how do these inter-relate? I.e. how many rows will go down a path in a buffer?

Well, first of all that depends on the type of buffer and what columns will be in it. For the purposes of this post, assume that one execution tree has one type of a buffer. This means in my previous post's data flow task, there will be three types of buffers. Moving from one execution tree to another means a new buffer type and that means memcpy, something that users should typically avoid. However, getting an execution tree could also mean the scheduler might allocate a different thread for components on this execution tree, so you might get more CPUs being utilized. More on that when we discuss execution plans.

Back to the buffer sizes. Lets take one execution tree (#1) that goes from Flat file -> Data Convert -> Lookup and ends up going into the Sort. Note that the buffer type is the same even though in my case I'm creating new columns in Data Convert and bringing in new columns from the reference table in the Lookup. Why? Because memcpy just for reformatting memory is bad for performance so we try to do as little of that as possible. That means the underlying buffer that the Flat File produces has some extra columns that are not filled in until downstream components get a chance. There're also some optimizations the pipeline does to reuse columns in the buffer when possible like when a multicast is producing two outputs, both of which have a derived column that produce a column of the same type. So, it's not a trivial exercise to find out what columns are on a buffer but hopefully you can get a ballpark. Lets say this is RowSize in a buffer.

A few other variables should be called out here.

MaxBufferSize (not the DefaultMaxBufferSize). This cannot be changed externally. Currently it's set to 100MB but that might change. The reason for this maximum is to avoid wastage of memory where most of the buffer is empty. You'll get an error if you attempt to increase DefaultMaxBufferSize to more than this.

MinBufferSize. This is the minimum size of the buffer and currently has the granularity of VirtualAlloc. On my machine, for example, this is exposed in the following BufferSizeTuning LogEvent:

Rows in buffer type 13 would cause a buffer size less than allocation minimum, which is 65536 bytes. There will be 1365 rows in buffers of this type.

So, I've got a MinBufferSize of 64K.

Given these, the pipeline initialization code (that creates buffer types and sets metadata about them such as what columns and how many rows) then gets busy to calculate the buffer sizes.

If the size of one buffer (RowSize * DefaultMaxBufferRows) will exceed DefaultMaxBufferSize, the number of rows will be reduced to keep the memory requirements predictable. Note that one row cannot span multiple buffers, so in a rare case there'd be a huge buffer with just one row.

If the size of one buffer will be less than MinBufferSize, the number of rows will be increased to at least match MinBufferSize which is the allocation granularity to maximize memory utilization.

If the size of one buffer is between the Min and Max, the memory for a buffer will be a multiple of the allocation granularity that most closely fits the number of rows requested. This is again to maximize memory utilization. The number of rows will be adjusted a little to fit the allocated space.

Drop me a note if you play around these and have some interesting information to share, or better still, write/present on it. I'm also interested in hearing about what further information you need to gain insights into perf tuning the pipeline.

later!

 

with 0 Comments

I was about to write about data flow buffer size configuration properties but need to get this one out first so that we can refer to it.

The Data Flow task (internally and in the object model also called the 'Pipeline') logs some pretty interesting information that describe the internal scheduler.

To see these messages, one way is:

1. In a package you want to get detailed pipeline logs on, select the 'Logging...' menu option on the control flow.

2. Check the Data Flow task of interest on the tree on the left. Switch over to Details and select the following events:

    BufferSizeTuning; PipelineExecutionTrees; PipelineExecutionPlan; PipelineInitialization

3. Run the package and launch the 'Log Events' window from the Control Flow context menu again.

You'll see a new tool window that'll pop up and it will get filled with some events.

BufferSizeTuning

Example:

Rows in buffer type 3 would cause a buffer size greater than the configured maximum. There will be only 2440 rows in buffers of this type.

Next post will talk about what this means.

PipelineExecutionTrees

Example:

begin execution tree 0
   output "Aggregate Output 1" (779)
   input "Data Conversion Input" (793)
   output "Data Conversion Output" (795)
   input "OLE DB Destination Input" (827)
end execution tree 0
begin execution tree 1
   output "Flat File Source Output" (1087)
   input "Data Conversion Input" (958)
   output "Data Conversion Output" (959)
   input "Lookup Input" (117)
   output "Lookup Output" (118)
   input "Sort Input" (470)
end execution tree 1
begin execution tree 2
   output "Sort Output" (471)
   input "Multicast Input 1" (668)
   output "Multicast Output 2" (672)
   input "OLE DB Destination Input" (643)
   output "Multicast Output 3" (674)
   input "Aggregate Input 1" (776)
end execution tree 2


We've discussed execution trees in the past... they start from a source or a component with asynchronous output and go on until a destination or another asynchronous output. Looks like I have 3 execution trees in my package.

PipelineExecutionPlan

The plan for my package is too large to paste here. This was hinted at in an earlier post when describing EngineThreads. Soon...

PipelineInitialization

Example:

No temporary BLOB data storage locations were provided. The buffer manager will consider the directories in the TEMP and TMP environment variables.

These are typically information messages about the directories and other defaults the pipeline will use during its execution.

Next up, DefaultBufferMaxRows and DefaultBufferSize.

 

with 1 Comments