Runying, our data flow performance guru, just dropped a note with some information she thought would be generally interesting. I'm hoping she can start her own blog so you can directly appreciate her dedication and diligence, but in the meantime I continue to get the limelight.
Adjusting the following properties can have impact on parallelization during execution:
1. MaxConcurrentExecutables
This is a property on the Package. It defines how many tasks can be run concurrently. A value of -1 means the number of processors plus 2. When hyper threading is turned on, it counts the number of logical processors, not physically present ones. In the little testing that I (Ash) have done, I've not noticed significant difference (<5%) in performance between turning on hyperthreading or leaving it off. It'd be interesting to see if you see better results.
2. EngineThreads
This is a property on the Data Flow task. It defines how many worker threads the engine scheduler will create. Note that this is a hint to the engine and does not enforce the total number of threads in the thread pool so that if the scheduler predicts scheduling different components on the same thread might lead to a deadlock, we'd create more threads. As written on the previous thread, assume each source gets one thread, each destination gets another thread, and each component that has an output that's not in sync with the input (sort, agg, merge join, etc.) would get its own thread by the scheduler. Of course certain components (like sort and merge join) might create their own threads to manage their workload.
On topic that I've been meaning to blog about for a long time now is the execution plan that the pipeline uses for scheduling components and how you can read it to understand whether more EngineThreads could be used for improving the performance. Until then, the rule of thumb listed in the previous paragraph would have to be combined with empirical results.
3. MaxConcurrent
This is a property on the ForEachLoop which says how many instances of the loop contents can be run in parallel.