May 2005 - Posts

Thanks to Wenyang for writing all this up! Just posting it on her behalf.

 

 

Summary

InfoPath forms can be saved to XML, these XML Files can later be used in SSIS XMLSource adapter to pull out the data in tables and columns. However, there are some common problems you may meet in these scenarios. This article describes how to work around these potential problems. The issues mentioned in this article is not only specific to InfoPath files, it can also be referenced in other similar situations as well.

 

1        Stripping out multiple spaces

XMLSource does not support multiple namespaces. For example, for a XML saved from InfoPath, it will typically look like

 

   <?xml version="1.0" encoding="UTF-8" ?>

  <?mso-infoPathSolution PIVersion="1.0.0.0" solutionVersion="1.0.0.1" name="urn:schemas-microsoft-com:office:infopath:oob:ExpenseReportDomestic:1033" productVersion="11.0.6357" ?>

  <?mso-application progid="InfoPath.Document"?>

      <exp:expenseReport xmlns:exp="http://schemas.microsoft.com/office/infopath/2003/sample/ExpenseReport" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD" xmlns:xhtml="http://www.w3.org/1999/xhtml" xml:lang="en-us">

      <exp:employee>

      <exp:name>

      <exp:prefix />

      <exp:givenName />

      <exp:middleName />

      <exp:surname />

      <exp:suffix />

 

Note the multiple namespaces specified. When we use this file for XMLSource adapter, and ask the XMLSource to infer a schema for the xml file by pressing “Generate XSD…” button in the UI, an error will pop up saying

 

 “Unable to infer the XSD from the XML file. The XML contains multiple namespaces.”

 

Work around:

Unless the multiple namespaces in the above xml make real differences in your scenario, you can always try to use only one single namespace to work around the problem. There are two options

  1. Manually remove the unnecessary namespaces specified on the XML file.
  2. Use a SSIS XMLTask XSLT to strip out namespaces.

XMLTask supports an operation called XSLT, in which users can specify a XML Source, a XSL style sheet source, and then execute to apply the style sheet on the XML source to get a file in new format. We can use a style sheet to get rid of multiple namespaces from the XML file.

 

I have a sample xsl style sheet which can be used to strip out all namespaces(use this only in case when multiple namespaces don’t make sense to your scenario) 

 

            <?xml version="1.0" encoding="utf-8" ?>

                <xsl:stylesheet version="1.0"         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   

            <xsl:output method="xml" indent="no" />

                <xsl:template match="/|comment()|processing-instruction()">

                <xsl:copy>

                <xsl:apply-templates />

           </xsl:copy>

            </xsl:template>

     

                <xsl:template match="*">            

                <xsl:element name="{local-name()}">

                <xsl:apply-templates select="@*|node()" />

           </xsl:element>

        </xsl:template>

 

                <xsl:template match="@*">

                <xsl:attribute name="{local-name()}">

                <xsl:value-of select="." />

        </xsl:attribute>

        </xsl:template>

 

        </xsl:stylesheet>

 

The way to use this style sheet is: In SSIS designer, add a XMLTask, choose XSLT operation, then point “Source” to your XML source file and “SecondOperand” to your XSL file(“File Connection” as the Type can be used so as to conveniently pointing to files in both cases), set “SaveOperationResult” to true to obtain the XSLT operation result. Then use this result in XMLSource in the downstream dataflow task.

 

      Note: <?mso-infoPath…> will be ignored by XMLSource, so it’s ok to leave it there.

 

2        Wrap up the fields in a single layer XML source

An extreme example of using InfoPath xml source is when the form has only one single level, no hierarchy. In other words, originally the XML file is something like

 

<?xml version="1.0" encoding="UTF-8"?>

<?mso-infoPathSolution solutionVersion="1.0.0.2" productVersion="11.0.6357" PIVersion="1.0.0.0" href="file:///c:\infofile.xsn" name="urn:schemas-microsoft-com:office:infopath:Info1:-myXSD-2005-04-27T19-26-55" ?>

<?mso-application progid="InfoPath.Document"?>

 

<my:myFields xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-04-27T19:26:55" >

          <my:FirstName>Wenyang</my:FirstName>

          <my:LastName>Hu</my:LastName>

          <my:PhoneNumber>425-123-4567</my:PhoneNumber>

</my:myFields>

 

XMLSource will think all fields under the root – “FirstName”/”LastName”/”PhoneNumber” as tables, instead of columns which belong to a same table. That’s not going to work.

 

Work Around

The way to work around this is, after removing the unnecessary namespaces, add a pair of tags to wrap the fields. So change the above to

 

<?xml version="1.0" encoding="UTF-8"?>

<myTable>

<myFields>

            <FirstName>Wenyang</FirstName>

            <LastName>Hu</LastName>

            <PhoneNumber>425-123-4567</PhoneNumber>

</myFields>

</myTable>

 

Then the columns will be retrieved from XMLSource successfully - there will be an output called “myFields” and “FirstName”/”LastName”/”PhoneNumber” will be the output columns of that output, as expected.

 

3        Deal with the fields with missing values

  <?xml version="1.0" encoding="utf-8" ?>

      <expenseReport lang="en-us">

                <employee>

                          <name>

                                              <prefix />

                                              <givenName />

                                              <middleName />

                                              <surname />

                                              <suffix />

                                              <singleName>John</singleName>

                              </name>

                          <address>

                                              <line1>1234 56th AVE</line1>

                                              <line2>Unit B101</line2>

                                              <line3 />

                                              <line4 />

                                              <city>Redmond</city>

                                              <stateProvince>WA</stateProvince>

                                              <postalCode>98034</postalCode>

                                              <country>USA</country>

                              </address>

                         

                          <identificationNumber>1336</identificationNumber>

                                <email>johns@hotmail.com</email>

                  </employee>

      </expenseReport>

 

It’s common that the InfoPath users to leave certain fields in a form blank, like the “prefix” and “givenName” fields in “name” above. Then in XMLSource, if the users ask the XMLSource to infer a XSD, the inferred schema won’t set the data types for those fields, then a bug in XMLDataReader will cause all those fields be treated as tables. That’s very wrong.

 

Work Around

Before getting the fix for the bug, work arounds for this problem include

1)      Give values for all fields.

2)      1) sometime is not acceptable. In that case, ask the XMLSource to infer a XSD file first, then manually edit that XSD file, add “type="xs:string” to that element.

      For example, change    

                  <xs:element minOccurs="0" name="prefix"/> to

                  <xs:element minOccurs="0" name="prefix" type="xs:string>

Will solve this problem.

 

 

with 1 Comments

A question came up on an internal alias about how can one download a file over HTTP. Here's a script that does that, should be trivial to wrap it inside a task if you forsee doing this over and over again.

I can't get the colors to show up on the post, though.

Imports System.Net

      Public Sub Main()

        Dim myWebClient As WebClient

        Dim remoteUri As String

        Dim localFileName As String

        Dim fireAgain As Boolean

 

        Try

            myWebClient = New WebClient()

 

            ' get the context from variables

            remoteUri = CStr(Dts.Variables("RemoteUri").Value)

            localFileName = CStr(Dts.Variables("LocalFileName").Value)

 

            ' tell the user what we're downloading where

            Dts.Events.FireInformation(0, String.Empty, String.Format("Downloading '{0}' from '{1}'", localFileName, remoteUri), String.Empty, 0, fireAgain)

 

            ' do the actual download

            myWebClient.DownloadFile(remoteUri, localFileName)

 

            Dts.TaskResult = Dts.Results.Success

        Catch ex As Exception

            ' post the error message we got back.

            Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0)

            Dts.TaskResult = Dts.Results.Failure

        End Try

    End Sub   

with 3 Comments

Kamal and I are going to be presenting three sessions between us on Integration Services at the TechEd in a few weeks. Of course no SSIS conference would be complete without Donald who will have a couple of surprises for attendees. In addition to the formal talks we've also got some time set aside to chat in the Cabanas.

The titles are (paraphrasing):

- Overview of data warehousing support in SSIS

- Manageability and operations view of SSIS

- Performance and scalability with SSIS

If you're attending either this TechEd or plan to attend other TechEds this year (since they share content from the US based one), let us know what you'd like to see or if there're specific areas you'd like us to cover. The slides are due this Friday, so you can hit the iron while it's hot!

Hope to see quite a few of you there!

ash

 

with 0 Comments

Sudi (from SQL Marketing) wanted more folks to know about the upcoming BI webcasts from MSDN and TechNet. Here's a link and description of the series:

http://www.microsoft.com/events/series/sqlserverbi.mspx

Your business intelligence (BI) applications need to integrate seamlessly into your overall application infrastructure to deliver real competitive advantage. Are you thinking about how to leverage your existing investment in BI and without sacrificing the ability to extend and add cutting-edge functionality? Is developing BI applications and managing them across multiple data platforms and tools becoming a complex and costly venture? Microsoft SQL Server 2005 is around the corner, delivering its enterprise-class BI platform with rich and powerful integration, analysis, and reporting functionality.

Tune in to this series as we show you how to:

  • Take advantage of high-availability features in SQL Server 2005 BI
  • Increase SQL Server 2005 BI application performance using the new scalability features
  • Use the single management shell and new features to easily manage and deploy BI applications
  • Extend the functionality of SQL Server 2005 BI applications using the .NET Framework
  • Develop rich applications on top of SQL Server 2005 BI platform using ADOMD.net, XML and the new management object model
  • Add reporting functionality to any application by easily embedding Report Controls in your applications
  • Use the SQL Server data mining to plug in your own data mining algorithms

Through interactive presentations and end-to-end scenario demonstrations, this series provides you with the knowledge you need to develop and implement easy-to-manage, adaptive BI application architecture. The powerful SQL Server 2005 BI platform will help you keep pace with the evolving needs of your organization.

Bonus: Attend any live webcast in this series to qualify to win a Portable Media Center (official rules) pre-loaded with our best webcasts.

 

with 0 Comments