February 2005 - Posts

Latest reads...

So while staying in the rustic cabin in the mountains, I picked up the intriguing and informative little booklet tuned to my resting grey matter called:

Around the World in 100 Years, by Jean Fritz

I don't think it's in print anymore but it showed how early western explorers slowly opened up the world through progressively brave and sometimes cataclismic exploration into unknown territory. Basically, each explorer opened the way and simultaneously laid down the challenge for further exploration. It did a great job showing how long standing beliefs and lack of knowledge of unknown territories caused explorers to make assumptions that were sometimes way off base. In fact, some explorers desperately stuck to previous assumptions in spite of all indications to the contrary. For example, Christopher Columbus believed he had found the short cut to China even when all indications were clear that he'd found the Americas. It's a book tuned to 4th and 5th graders, but I still enjoyed it.

There's a remarkable similarity between the progression of early explorers and the progression of a new complex piece of software. I'll tell you about it sometime when I feel like boring you to tears...

Joel on Software, by Joel Spolsky

A former Microsoft employee, some of what Joel writes is quite familiar, especially the chapters discussing best practices etc. What's interesting is that Joel seems to have a grasp of what works and doesn't work at Microsoft and other software companies and has some great insights into managing software teams. As we're in the middle of a review cycle here at Microsoft, I especially appreciated his thoughts on that aspect of Microsoft. :)

http://www.amazon.com/exec/obidos/tg/detail/-/1590593898/ref%3Dnosim/joelonsoftware/002-4709218-9192042

Custom components...

A few folks have started to ask questions about how to write custom components.

There is an excellent tutorial here and it's a good start:

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/dtsref9/html/0f2a3300-7bdb-4d8c-9521-d819d6646720.htm

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

SQL Server Magazine article...

For those who may be interested, here's my latest article.

http://www.windowsitpro.com/SQLServer/Articles/ArticleID/45092/pg/1/1.html

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

System variables, logging and the Execute SQL Task...

Here's something useful you can do with system variables and the SQL Task. Logging in SSIS is more flexible and there are more options for logging destinations and formats then with DTS. However, it may not always be as surgical as you'd like it to be, and you can only create log events that the components support.
There are a few ways around this as follows:
  • You can use a script component or task and log custom log events “on the fly“. This has value in some cases.
  • You can create a custom task in which you can create custom log events that you may actually filter. I think there is someone writing one of these now. :)
  • You can build a pipeline to output to various formats.
  • You can use the SQL Task to log custom data using parameter mapping and variables.

I've created a very simple package run log that illustrates the last option. Granted, the package already logs a similar set information, but it's a good example because it's easy to compare the two approaches. Using the SQL Task, I have created a simple log table that logs the username, packagename, machinename, packageID and starttime for each time a package gets executed.

First, I created a table as follows.

use Playland
CREATE TABLE PackageRun (
  username varchar(50),
  packagename varchar(50),
  machinename varchar(50),
  packageID varchar(50),
  starttime datetime
);

Then in the SQL Task, I have the following insert statement:
INSERT INTO PackageRun VALUES(?,?,?,?,?); 

  /* username    varchar(50),
     packagename varchar(50),
     machinename varchar(50),
     packageID   varchar(50),
     starttime   date */

Here's where the system variables come in handy. The Parameter Mapping tab looks like this:

I've put this SQL Task in the package at package scope to log simple information about the run of the package. Now, whenever I run the package, it enters a log into the PackageRun table as shown below.

Existing log events cover a lot of ground. The logging infrastucture generously handles the general case for logging, but it's a broad brush and doesn't always get as detailed as you'd like. There are cases where there is no way to log specific information. For example, parameters to event handlers, the LocaleID of a specific container and task execution values. This logging mechanism can be used for just about any type of custom logging you'd like at anytime during the execution of the package and is great for one off, surgical logging of custom data that the existing log infrastructure doesn't support. If you ever run into the situation where you wished something got logged, but it doesn't, this method may fill in the gaps.

Let me know if this was useful to you.

Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

I'm baaaaack!

Well, took a quick (1 week) vacation to my favorite ski spot, Utah.

Took the whole family and trashed my legs on the slopes of Brighton.

It was great to get away from all things digital to “hang with the kids” and visit with family in a rustic cabin in the mountains.

But, I had withdrawal and after returning home last night around 2 AM, I spent the whole night catching up on email, blogs, newsgroups and news.

I am a digital slave! :)

K

Scheduling package runs I

This is just an introduction because we've been getting a lot of questions about this. The way you schedule packages to run in Agent has changed a bit. I'll blog about it here a bit. I'm working on a full article and will post later. But for now, here's a quick set of steps to get you started:

In SQL Server 2005 you no longer schedule package execution with DTSRun.exe or even the new command line executable DTExec.exe

Agent has a new feature called subsystems that makes it possible to directly execute packages.

Here are the steps:
1 - In SQL Server Management Studio create a new credential un the security node.
2 - In the proxies node, under SQL Agent, create a new proxy. This is the entity that will execute the package under the credentials specified in step 1.
3 - Now create a new job.
4 - Under the steps, create a new one and select the type of the step to be SQL Server Integration Services Package Execution.
5 - Select the execution settings you want to use in the UI.
6 - Set the rest of the settings in the job like time and date to execute etc.

Your done.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Allan and Darren are at it again...

Just cracked open the latest (Jan/Feb) copy of SQL Server Standard to see the SSIS dynamic duo's new article on what's new in Integration Services. Funny how I skip right by all the other great articles to that SSIS one just sticking out at me. :)

I think Darren and Allan did a great job capturing some of the lesser known but extremely powerful new features. It was especially rewarding to see property expressions described. If you've read some of my other blogs about it, you'll know that I feel this is one of the most powerful features in the control flow. We thought they'd solve some problems for us, but as it turns out, they are extremely powerful and flexible.

If you get a chance, take a look at their article.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Free XML Editor...

Roman's blog mentioned a nifty little free XML editor.

I have XMLSpy, but the license doesn't allow me to use it everywhere I want.

So for lightweight, one off stuff, this one comes in pretty handy. Give it a try.

It's here.

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Managing complexity in packages?

<<Yeah, I know. I'm blogging a lot today. I've had a few backed up since .Text has been tweaky lately.>>

We've had several customers send mail asking what was the best way to approach some common problems, which are basically, “How do I organize my complex packages so that they're easier to manage?”

You've got a few options:

  • Use more containers - That's what they're there for, to reduce complexity. Especially the sequence container.
  • For large dataflows, find logical intervals where you can stage into raw adapters. Enhances restartability.
  • Use more subpackages - This is the ultimate because you can share the packages in different solutions. (Code reuse) Faster design time as well. Easier to debug child packages alone than large packages.
  • Use property expressions - A lot of folks are using script tasks for simple things like catenating strings etc. Use property expressions for these types of simple jobs instead.
  • Use enable/disable options on containers to isolate pieces of a package while debugging.

These are just a few examples of ways you can eliminate complexity in your IS solutions.

Thanks
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Programmatically creating tasks...

Someone had a question on an internal alias about the names of the “stock” tasks and how to know which is a stock task and which isn't. But, first let me back up a bit and give some history.

When we first started writing tasks, and the SSIS runtime was in its infancy, we had a number of problems related to creating tasks.

  • Tasks written in managed code had fully qualified names that were very long and very hard to remember. (They still do)
  • Task names were changing a lot. There was a lot of churn.
  • Our task creation story hadn't stabilized yet.
  • Task versions were changing a lot.
  • We were writing a lot of ad hoc testing code.

Because of these issues, it was quite difficult to create a task programmatically. Plus, the code just looked gnarly.

With fully qualified names, the code looks something like this:

TaskHost execPkghost = (TaskHost)pkg.Executables.Add(“Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask, Microsoft.SqlServer.ScriptTask, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91”);

We wanted some way to shortcut the name. So we came up with stock names.

With stock names, the code to create a task looks something like this:

TaskHost execPkghost = (TaskHost)pkg.Executables.Add("STOCK:ExecuteSQLTask ");

A little easier to read and write, yes? The problem we have now is that there is code out there that uses the stock “monikers”. So, these names are here to stay. However, they aren't available for every task. Only tasks that the Integration Services team writes. They weren't designed to be extensible, but they are still useful. Maybe in a future version, we'll make it possible for everyone to use this convention. But for now, these are the only tasks that support the stock name:

  • SQLTask
  • ScriptTask
  • ExecuteProcessTask
  • ExecutePackageTask
  • PipelineTask
  • FTPTask
  • SendMailTask
  • MSMQTask
  • FileSystemTask
  • BulkInsertTask
  • ActiveXScriptTask
  • XMLTask
  • WmiDataReaderTask
  • WmiEventWatcherTask
  • TransferStoredProceduresTask
  • TransferLoginsTask
  • ExecutePackageTask
  • Exec80PackageTask
  • WebServiceTask
  • TransferObjectsTask
  • TransferDatabaseTask

So, place a “Stock” qualifier before the names you see above like this, “STOCK:WmiDataReaderTask” and you can create the WMI Data Reader task.

Hope this helps.

Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Setting fonts...

There are a number of custom settings for the BI Dev. Studio that effect the look of the designer and how it works. Dev. studio is fairly complex though, and sometimes it's hard to find some settings.

Case in point:

How do you change the font the designer uses in the data flow data viewers.

  • Select Tools | Options from the main menu
  • In the resulting dialog box, select the Environment node
  • Select Fonts and Colors
  • In the drop down under "Show settings for:" there is an option to select "Business Intelligence Data Viewers".
  • You can change the font there. I like "Rage Italic" myself. :)

There are a few other settings in the options dialog that effect the BI designer as well. Take a few minutes when you get a second and explore it.

Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Another SSIS article from our good friend Jamie...

Jamie Thomson (Did I get it right this time? :) has been writing about SSIS again. This time in SQL Server Central. Take a look here

Here's Jamie's blog about it.

Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Integration Services Checkpoints...

A few questions have been cropping up regularly about checkpoint restartability. Here are some fundamentals.

Checkpoints are a feature found at the control flow level only

Checkpoints cannot be set in the dataflow. In fact, checkpoints cannot be set anywhere in the package. Precedence constraints are automatically the checkpoints in a package. This is the granularity for this release. So, anytime you create a new precedence constraint, you're essentially creating a checkpoint as well.

Checkpoints are dataflow agnostic, (but there are ways to use them in dataflow)

This is just another way of saying that checkpoints are a control flow feature. So, why be redundant then, Kirk? Well, it's not always clear to folks and some people are still just getting used to the idea of data flow and control flow being separated. Especially, the press. :)

Once a checkpoint file is used to run a previously failed package, the IS runtime deletes it

When the checkpoint file is used in the package, it will be cleaned up.

IS creates a checkpoint file only when the package fails

This is important. If the package doesn't fail, it won't write out a checkpoint file.

Task failure does not necessarily mean that the package failed

On a related note, tasks may fail in a package and the package still not fail. Why? Well, the rationale goes something like this. If a package were to always fail whenever any task failed, then the package could never recover from a task failure. That's why there is an OnFailure precedence constraint option. To handle failed tasks. So, you must cause the package to fail if you wish for a checkpoint file to be created. There are many ways to make a package fail. See below.

There are three properties that control checkpoint restartability, they are:

  • CheckpointFileName - Name of file where IS saves checkpoint information.
  • CheckpointUsage - Enumerated choice IfExists, Never, and Always. Controls if the package uses the checkpoint file. If Exists will use the file if it is found in the location specified by the CheckpointFileName. If not, it will attempt to run the package as though it never failed previously, ie. from the start to finish. If “Never” is specified, then the package will always run as though there were no checkpoint file. Finally, “Always“ is a curious choice. If “Always“ is specified, the package will refuse to run if it doesn't find a checkpoint file. This is useful in the case where it would be destructive to run an entire package subsequent to a failed partial execution. For example, if you had a package that did a simple load. If the package succeeded in loading some data, but then failed on a subsequent operation, running the entire package again would likely result in duplicated data. So, specifying “Always“ ensures that if the checkpoint file isn't available for some reason, the package will refuse to execute.
  • SaveCheckpoints - The package should save a checkpoint file if it fails.

Ways to make a package fail

  • All containers and tasks have a property called FailPackageOnFailure. Set that to true, and if the task or container fails, it will also fail the package. Use this for critical tasks, like a dataflow task, that if it fails, you're sure that the package won't succeed. Peripheral tasks like send mail or other informative type tasks, may not be a good choice here.
  • All containers and tasks also have a property called FailParentOnFailure. This property if set on a direct descendant of a package will fail the package if it fails, just like the FailPackageOnFailure property will. This property can be used in packages to create cascading failures from leaf (task) containers up to the package.
  • ForceExecutionResult is a property you can set in conjunction with the above named properties to simulate a task or container failure. Set it to Failure and it will appear to the IS runtime that the task has failed. ForceExecutionResult is available on all containers as well.
  • Set the MaxErrorCount (maximum number of errors allowed before failure). When the number of errors hits that threshold, the package will fail. This is useful is cases where you want to execute a package that has known errors, but they are handled within the package or are inconsequential. However, you also know that if you get more than n number of errors, something is very wrong and the package should just terminate. This is the sanity check setting for errors.

Hope this helps a bit.

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

Shredding recordsets in the ForEachADO Enumerator

Ash blogged about this recently. I've been meaning to blog on it as well. We've known it was an issue, but didn't know how much of one until we started hearing things like "You're not seriously considering shipping like this are you?" and "This is really much too much work just to enumerate recordsets" and the ever abiding "This sucks". :)

So, we've taken a look and it appears that there was a way to enable the ADO enumerator to shred the recordsets for you.
This should be available soon. :)

Thanks Matt!!!

Update: you can see here that we've finished this now and there is no longer a need to hack around a solution!


Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Expressions : Part II

This is a blog I've been wanting to do for some time. It addresses a problem users have a lot and one that I've kept in the back of my mind throughout the project. The problem is, how can I move files from one location to another and rename them in some consistent way as I do it. On its face, the problem doesn't seem that difficult, and in practice it's not. But there are a few gotchas that can get in the way. The way I've seen most folks address this is to drop a script task onto the designer and start coding away. That's the DTS way of doing things. There is a more subtle and simpler way to do this sort of thing in Integration Services. It also requires less code. A lot less code. This example, I think, points out some of the quintessential differences between DTS and IS. So, let's take a look.

So, the problem is, I have some files in the C:\SourceFlatFiles folder. I want to do some stuff with them and then I want to copy them to the ProcessedFlatFiles folder when I'm all done. I also want to rename them to make it clear that each file was processed. For this exampl