April 2005 - Posts

Trip report...

Back from Houston. The long way. I didn't take a lot of pictures for various reasons. Mostly cuz I was lazy or put another way, I knew it was going to be a very long ride and every minute I spent doing something other than riding made it even longer.

I picked up the bike, a brand new (sorta) Yamaha FJR 1300 in Houston on Friday. The very first time I took it out for a spin, it rained cats and dogs. Saturday morning, 6AM, I drove out of Houston and had a great ride into Wichita.


Then, Sunday, I rode from Wichita to Burlington, Colorada. That is possibly the most windy section of road I've ever experienced. No big surprise. I lived in Wichita for years, I've learned to admit that now. :) So, I should have expected it, but there's nothing like tooling down the freeway at ~80 MPH (130 KPH) leaning 20 degrees to compensate for the 30 MPH crosswinds. Notice which section of the trip was the shortest? It seemed the longest.

Monday, I got rained on in Denver. Lightening like I haven't heard for years. It just doesn't do that sort of thing in the Northwest. It only sprinkles, rarely do we get a serious thunder storm. And people talk about how much it rains in Seattle. Sheeesh. Ate lunch at a local Black Eye'd Pea. The waitress was very helpful and friendly, but reminded me of Ichabod Crane. The rain passed by the time I finished eating and I was back on the road again. Rode until late Monday night into Evanston, Wyoming. Wyoming had lots of curvies and of most relief, no winds. Odd for Wyoming, but I wasn't complaining.

Tuesday, I hauled my sorry, sore butt out of bed for what I knew would be a very long haul into Seattle. About 900 miles in one day, I made it home at 11pm. 2700 miles in 4 days. Not too shabby.

After just one day of riding, my bike looked pretty hammered. Warning, don't look at this while eating.

A passerby asked
“What'dya hit a seagull or somethin?“ 
“No, just Wyoming mosquitos.”
“How could you see through that?“
“I don't know, it was late and dark, so I couldn't see anything anyways.“

Bon Apetit

Universe.Earth.Travel.Motorcycles.Yamaha.FJR..KirkHaselden

Flying to Houston tomorrow...

I'm flying to Houston tomorrow. Look for pictures of the return trip. :)

Universe.Earth.Travel.Motorcycle.YamahaFJR..KirkHaselden

Validation...

Another stream of thought post...

Some questions keep popping up about validation and I thought I'd try to clarify it a bit.

What is Validation

I've never seen a good definition for what validation in IS is supposed to be or what it's supposed to do anywhere. There is a lot of discussion about what components do when validating, but before launching into a discussion about it, I thought it would be helpful to give a little history and explain a little of the philosophy behind validation. This will likely be better than a straight definition because, hopefully, you'll understand the evolution and reasoning behind it rather than just a static description.

Early Design Iterations

Early on, I remember walking into perhaps the first design review of what was later to become the runtime. Among all the proposals in that review, there was a new method on the task interfaces called Custom or something similar. (Can't remember, after all, it _has_ been 5 years now) Nobody liked the name but everyone liked the concept. The idea was that it would be a method that the runtime would call “sometime” before execution to give the task a chance to “do stuff” before executing. Pretty vague, yes. I wished all ideas and designs would just spring into my mind fully formed on the first day of a project. Just doesn't seem to happen that way. If you know of anyone that does this, let us know. We're hiring. :)

As I recall, Gert Draper, who was our PUM at that time suggested that we call it Validate. Hmmm, has a nice ring to it. Makes sense, because, typically, what folks want to do before executing is make sure that execution will succeed.  Gert asked why we need to pass in variables, and connections etc. Well, I said, somewhat unsure of my answer. After all, this was Gert. :) “Well, because the tasks may need to use them for the work they do during validation.“ Knees knocking, hands shaking, teeth chattering. Gert says, “Oh, OK“. Boom? A new method is born and with that the start of a new concept that needed to be developed. It's interesting how one little old method can spawn multiple discussions and philosophical discussions.

Validation caused a whole lot of confusion and discussion. We talked about “deep validation” vs. “light validation“. That debate raged for months. What type of validation should components do? We talked about what other components should do. At the time, we still didn't have the notion of extensible log providers, connections or enumerators. Later, when those were “componentized“, we only passed variables. The logic went, “Why would anyone need connections for a log provider?“ Hmmmm...

The light validation discussion was very interesting. A lot of talk about how to indicate to the task that the validation was just preliminary vs. pre-execution validation etc. The thought being that you wouldn't want to validate a task strictly if the task was being configured or had property mappings etc. We didn't have the notion of warnings and information events either. Only errors which were inflexible and for purposes of validation, terminating. Out of these discussions grew the notions of Warnings, later came information events. Eventually, we scrapped the whole idea of light validation vs. deep validation. Warnings enabled tasks to give information and yet still say they could function in spite of the issue they were reporting. Slowly we migrated to a point where validation was just validation. No variants. Then we arrived at the definition we have today.

Validation is what a component does to detect any issues that would cause it to fail during execution.

This led to a few other “rules“

  • When a component validates, it should always validate that given it's current property settings, it will succeed during execution.
  • Components should not return when they find the first error. They should continue to validate until finding all errors. This allows for a better picture when the whole error stack of errors is visible.
  • Components should emit warnings for cases where the error is not fatal, but could cause problems. For example, when the send mail task doesn't have a subject. Non-terminative errors.
  • Some others I can't remember right now...

Now the problem is, when do components get validated? Essentially we look at validation in a number of different ways depending on the situation. If you're designing a package, you'd like to know that it has errors during design, not when you go to execute it. So, the designer takes some liberties here and validates components whenever the UI for the component is modified. When a package is opened, the designer validates it as well. These are design time validations that should not be confused with execution time validations. These are done to ensure that the package writer is alerted ASAP to problems with the package.

Execution Validation

Execution validation is perhaps where most folks get tripped up. Execution time validation happens at two key points. When the package is executed and when the runtime executes tasks in the package. In the designer, this can be confusing. Because it's validating all the time and because it appears that the package is running in the same process as the designer, it seems there is no clear distinction between design time and execution time validation. But the designer doesn't run the package in it's own process. It actually runs it out of process (for a number of reasons I won't go into here). So a host process loads the package and executes it. When the designer calls Execute() on the package, the package validates. Everything in the package gets validated from the package down to the containers to the components. This is general validation. Then, when the runtime calls execute on each task, the TaskHost calls validate again. This is component validation.

Early vs. Late Validation

Why? This is the confusing part. Since we want folks to always do strict validation, (remember no light validation), and since packages have the notion of late configuration or dynamic configuration via property expressions and foreach loops etc., and since we make the assumption that you would rather have a task fail validation then to corrupt data or otherwise fail execution in a destructive way, we validate twice. We validate the whole package and we validate each individual task right before execution. In fact, if you were to look at the TaskHost code, you'd see something like this:

ExecResult Execute(parameters)
{
  ExecResult = Task.Validate(Parameters)
  if(FAILED(ExecResult))
    return ExecResult;
  return Task.Execute(Parameters)
}

Now, some tasks aren't going to be ready to execute when the package Execute() method gets called. They may rely on a variable value that gets set by another task further up the package call chain. They may be waiting for a file to be generated or dropped etc. So, if there was no way to validate tasks after Execute() gets called on the package, the package would never run successfully. However, the runtime needs to know when this is the case. There is no way for the runtime to know when to validate early or late. Enter the “DelayValidation” property.

DelayValidation

This property tells the runtime, “Don't validate me until the very last instant”. This property is on all containers and all hosted objects. It's a simple flag. When the early, package level, validation happens, the runtime checks this flag. If set to true, the runtime skips validating that part of the package. Think scope here. If there is a container with multiple children containers with multiple grandchildren containers, and that container has set DelayValidation to true, none of it get's validated early. The whole thing gets skipped.

Late Validation

Later in the package execution, the runtime will call the individual Execute() method for tasks. Then the runtime will call the Validate() method of the task implicitly when it calls the Execute() method. Now, if the task fails validation, you're sure that it would have also failed execution because there is very little chance that anything will change between the time that the runtime calls Validate() and Execute().

Validation is important because it gives early warning of critical issues. It's the way all the nifty little icons pop up in the designer when there's an error. When you move your mouse cursor over the little red x in a task, and the error message shows up in the tooltip? That's the result of validation. Validation keeps your system safer because components check to ensure that the operation will be successful, or at least has a good chance of success before performing potentially invasive or damaging operations. Validation is to packages what compiling is to source code.

Hope this helps.

You can see more about Validation in books online here:

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/dtsref9/html/dc224f4f-b339-4eb6-a008-1b4fe0ea4fd2.htm

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Caching in The SSIS Service

In the stream of thought category...

Integration Services is of all things a platform. While some platforms are pretty useless until you actually build something with them, SDKs and APIs like DirectX etc (Yes, definitions of “platform“ differ) others are extremely functional out of the box, Windows, Office etc.

Integration Services as a Platform

Integration Services is a platform in every sense of the word of the latter type. You can use it out of the box to generate functionality. You can build components that plug into it. You can build them in any COM conformant language or .NET language as well. For lack of better nomenclature, or ignorance thereof, I call this “Building Into”.

You can also fill components with code that gets executed (Script Task, SQL Task, Script Transform). “Building up”

You can build applications with SSIS embedded in the typical object model sense. “Building in”

You can just use SSIS to create packages and put together packages for building a solution. “Building on“

There's a cost associated with this. SSIS needs to keep track of all this “stuff“ that's going on. Where are the components. What kind are they, what are their capabilities and how do those components wish to be instantiated or persisted etc. It's surprising how much code in SSIS is devoted to managing, enumerating and tracking all this information.

Start up time for SSIS

We regularly get complaints that the startup time for packages is really long etc. It's true. In some cases, it's avoidable. We're looking at ways to make this better. In other cases, there are some things you can do to make it better as well.

What's one got to do with the other? Well, all those components need to be enumerated at some point. DTS's answer was to cache the component information in registry. Integration Services has many more component types and also supports both COM and managed components which include Logging Providers, Transforms, Source and Destination Adapters, Tasks, Connection Managers, and ForEach Enumerators. Managed components aren't really registry friendly. It is after all a different paradigm.

Integration Services uses the service to enumerate these components. It's running all the time. (It should be. If you're doing work in SSIS, you should have the service running.), and in case of reboot etc., it refreshes the cache automatically,

So, what happens when we enumerate components?

The runtime does a few things. It checks the registry for the following registry entry.

SOFTWARE\\Microsoft\\MSDTS\\Setup\\DtsPath

This tells IS where SSIS is installed. Typically, that value is something like:

C:\Program Files\Microsoft SQL Server\90\DTS

Yes, yes, I know. DTS is all over the place. Trust me, we would have changed it if we could, but that's a whole 'nuther blog entry.

If you go to the folder specified in that registry entry, you should see some subfolders named Connections, ForEachEnumerators, PipelineComponents, Tasks, and LogProviders. There are some others there as well. That's another blog too.

Now, the runtime looks in those folders for the managed components. If it finds the managed component there, it reflects on it. Interrogates it for things like it's component type, properties, Icon etc. This is how all the Infos collections on the Application object get populated. Or at least, this is part of how it works, for managed components. .Net has no notion of  “component categories“. It's a COM way of marking components as being in a certain class of components that supports certain interfaces. So, we've resorted to dropping the managed components into these folders. The different folders are really just to make it easier to organize more than a technical necessity.

Once all the managed components have been enumerated, the runtime looks for native components. This is a little easier because the search is assisted by “Component categories“. All we have to do is enumerate the components that are DTS components.

The service communicates with the runtime to retrieve this information and stores it for the next time you load a package.

Summary and tips

Bottom line is, there's a lot happening behind the scenes. To make startup better, keep the service running. Use the designer to design packages only. If you want to gauge performance or startup time, use DTExec.exe. There is very little overhead when running DTExec with the service running. If you don't need breakpoints while executing a package in the designer, use the execute without debugging feature found in the debug menu. That launches the package without attaching the debugger etc. If you're debugging in the designer, shut down as many of the windows as you can. Having 4 watches, the memory and output window all open at the same time along with the tools and project windows isn't really necessary while debugging. In fact, the VS environment will remember which windows you have open during execution vs. design time. Only open the windows you need during execution, that has a profound effect on perf.

Hope this helps,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

I guess I asked for it...

How ironic that the post announcing that I'm turning on comments gets spammed 100 times...

http://sqljunkies.com/WebLog/knight_reign/archive/2005/04/18/12055.aspx

This has got to get better...

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Under a cloud for weeks...

For those who care, you might have noticed I've been pretty quiet for the last few weeks.

I had been pretty healthy for quite sometime, then boom! I got the stomach flu, regular flu, then a cold and then bronchitis all within the last four weeks.

All that hacking and weezing, being under a dark hazy cloud all day.

Man, I'll take two or three days with a bad flu over that anytime...

Thanks for caring. :)

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

Out parameters in the Execute SQL Task...

One of the more strident complaints we've been receiving is the fact that the SQL Task didn't support out parameters on stored procedures. We listened.

Our crack developer, Mohammed spent some substantial time and effort on the issue and isolated the problem.

As it turns out, the SQL Task _did_ support out parameters. But, in some situations, the SQL Task had a bug in the portion of the code that retrieved the result into the parameter. Because of the way OLEDB returns multiple result sets, if the code doesn't consume all the results, and the out parameter value is one of the results, the out parameter won't get correctly set. For simple cases, stored procedures with only one result for example, the out parameters worked fine.

The bottom line is, ultimately, you should see this begin to work in the near future!

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Comments back on...

I've noticed that since comments are off, I've been receiving more mail directly. Don't mind the mail. But, likely some of the questions folks ask others have as well. So, I've bitten the bullet and turned comments back on. Oh look! There's some SPAM now inviting me to a nice online gambling casino...

Off to hit the delete button. . . :)

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Another SSIS Article...

Here's another article on SSIS. It's rewarding to see more and more press on Integration Services.

This one is pretty fundamental, but I liked the roundup of new features at the beginning.

http://www.developer.com/db/article.php/3497511

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Forrester Research Write Up

Others will likely post about this, but it's good information and want to share it as well.

Forrester Research has done a short write up about Integration Services.

http://www.microsoft.com/sql/bi/integrate/productinfo/foresterreport.asp

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

 

Package command line configurations...

I've been seeing questions about this and although I've posted about configurations before, here command line configurations are a bit different.

There are some cases where you wish to pass values for properties directly on the command.

Here is how to do it:

DTExec /f PackageName.dtsx /set \Package\ForLoop\Sequence.Variables[VariableName].Value;VariableValue

This package path (bold) points to the variable named “VariableName“ defined on a sequence named “Sequence“ inside a ForLoop container named “ForLoop“.

Containers are delimited by the '\' token. Properties are delimited by a period. And indexes into collections are delimited by brackets [].

Another way to easily figure out a package path is to create an XML configuration to a property and then copy the package path directly from the configuration file. For example, an XML configuration to configure a package with a variable called “test“ at package scope would look something like this:

<Configuration ConfiguredType="Property" Path="\Package.Variables[User::test].Value" ValueType="String"><ConfiguredValue>Default</ConfiguredValue></Configuration>

Cut the “Path“ portion out of the file and paste it into the command line, and you can configure the variable through DTExec.

DTExec /f MyPackage /set \Package.Variables[User::test].Value;“This is the new value of the variable“

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden
 

SQL Server Edition information...

Microsoft has a link that explains the different editions and what will be included in each.

http://www.microsoft.com/sql/2005/productinfo/sql2005features.asp

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Turning comments off...

As this blog grows the surface area for spammers just gets bigger and bigger. I spend substantial time everyweek just culling out the spam from the legitimate feedback, comments and other valid discussions. I really look forward to the pending upgrade I've heard about that will eliminate this problem.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden