Example: Loading RSS feeds into an SQL Server 2005 XML column
There's been a brief discussion on the SQL Server 2005 XML newsgroup regarding RSS and the XML type on SQL, encoding and schema. In keeping with my recent series of code-heavy examples, here's the guts of a primvative little VB.NET console app that sucks feeds into a table.
The basic gist of the problem is that the XML type doesn't look at the content to determine encoding unless the source that being converted from is binary, but if an encoding is given in the prolog other than UTF-16 (in most cases), SQL complains that it can't switch the encoding. That's seldom the case with client side code where we are using SqlXml typed parameters. It seems the trick is to parse off the document prolog so that we're just feed the typed column data and not encoding information.
Does this potentially cause problems? I'm not sure. I'm looking for a feeds that offered in encodings other than UTF-8 to try. I'd especially like to get my hands on an explicitly UTF-16 encoded feed to try. The Use Perl feed looks to have stored okay even though its using ISO-8859-1 as its encoding.
Script for creating the table in question:
use scratch
go
create table dbo.rssfeeds
(
pkid tinyint identity(1,1) primary key not null,
doc xml not null,
source varchar(max) collate sql_latin1_general_cp1_ci_as not null,
fetched smalldatetime not null constraint df_rssfeeds_fetched default (getdate())
)
Loading program:
Imports System.Data.Sql
Imports System.Data.SqlClient
Imports System.Data.SqlDbType
Imports System.Text
Imports System.Net
Imports System.Xml
Module Module1
Private Function StoreFeed(ByRef URL As String) As Boolean
Dim client As New Net.WebClient()
Dim buffer As String
buffer = client.DownloadString(URL)
Dim reader As New XmlTextReader(buffer, XmlNodeType.Document, Nothing)
client.Dispose()
reader.MoveToContent()
Dim conn As New SqlConnection("Data Source=localhost;Initial Catalog=scratch;Integrated Security=True")
Dim cmd As New SqlCommand("insert into dbo.rssFeeds(source,doc) values (@source,@xml)", conn)
Dim parm As New SqlParameter("@xml", SqlDbType.Xml, buffer.Length, ParameterDirection.Input)
parm.Value = reader.ReadOuterXml
cmd.Parameters.Add(parm)
parm = New SqlParameter("@source", VarChar, URL.Length, ParameterDirection.Input)
parm.Value = URL
cmd.Parameters.Add(parm)
Try
Using conn
conn.Open()
cmd.ExecuteNonQuery()
End Using
Catch ex As Exception
Throw ex
End Try
End Function
Sub Main()
StoreFeed("http://use.perl.org/index.rss") ' <?xml version="1.0" encoding="ISO-8859-1" ?>
StoreFeed("http://sqljunkies.com/WebLog/ktegels/Rss.aspx") ' no encoding
StoreFeed("http://primates.ximian.com/~miguel/miguel.rss2") ' case in question
Console.WriteLine("Done")
Console.ReadLine()
End Sub
End Module
One final question, too: I'm looking for suggestions as to what the most generally accepted and appropriate XML Schema can be found for RSS 2.0 so that that I can try typing the XML column with it.