XML. It's (going to be) the new RTF. And PPT.
As a database geek, I've faced one problem over and over and over
again: How do I store Office Suite documents in a database in a usable
way? And by usable, I mean in such a way that I can easily query their
contents and reshape them into smaller chunks of knowledge. One of my
favorite examples of this is importing Word documents representing some
knowledge capture and needing to extract out their abstracts. The idea
is the abstract should give users enough information about the document
to decide if reading the whole document is likely to help them solve
the problem at hand. Of course, there are a number of ways of doing this,
but none of them really strikes me as very user friendly. For example,
suppose you use some ASP.NET website to allow users to upload documents
and search them. You could, for example, require the user to enter (or
more likely, copy and paste) the abstract into a textbox and save that
as a field in some database table. Then you could use either matching or
full-text searches on that column (or even the whole text of a
document, if you like) and render a list of results.
As the user of such a system, though, I'd prefer to just submit the
document and let some logic find the abstract and do what it needs to
with it. In other words, make me do as little as possible!
With the combination of Office 2003 and SQL Server 2005, I've cobbled
together a workable solution. In the Word document template, I have a
style called "abstract" that users can apply to a section of the document.
Users can then save documents based on that template as XML, which
they in turn upload. The upload is (of course) to a SQL Server 2005
instance. The document itself is saved to an XML typed column. That
allows me to use XQuery on those documents to pull out the abstract and
use that as I like. I demonstrated some of this at Code Camp II in
Boston last year.
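
Here's a minimal sketch of the same idea in Python rather than XQuery, just
to show the shape of the extraction. It assumes the upload is Word 2003
WordprocessingML and that the style's ID in the file is literally "abstract"
(Word may assign a different ID than the display name, so check a real
file). The function name extract_abstract and the SAMPLE document are made
up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2003 XML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_abstract(xml_text, style_id="abstract"):
    """Return the text of every paragraph whose style ID matches style_id.

    Hypothetical helper: style_id is an assumption about how the template
    names its style inside the saved XML.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    # iter() walks the whole tree, so paragraphs nested in sections are found.
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        if style is None:
            continue
        if style.get(f"{{{W_NS}}}val", "").lower() != style_id.lower():
            continue
        # A paragraph's text is spread across one or more w:r/w:t runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


# Tiny hand-written document, not real Word output.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="abstract"/></w:pPr>
      <w:r><w:t>How to rebuild </w:t></w:r><w:r><w:t>an index.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Body text.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(extract_abstract(SAMPLE))
```

In the database, the equivalent filter would live in an XQuery expression
against the XML column; the Python version is only meant to make the
"find paragraphs by style, then join their text runs" logic concrete.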
Pretty cool, right? Kind of. There's a problem, though.
More precisely, it's that Word 2003 wants to save, by default, to "Word Format" (a newer flavor of RTF, in a sense) instead of XML, and sometimes even the best users forget that they need
to save in XML instead of Word Native format or even older standard RTF
format. Naturally, my uploading page barks that the uploaded file isn't
XML and asks them to re-save. Fine, other than it irritates the user,
duplicates work and so on. But I don't have a good way around that
today.
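
For what it's worth, the kind of check an upload page can do is simple
enough. Here's a rough Python sketch, assuming the upload arrives as raw
bytes and that a Word 2003 XML file has a wordDocument root element in the
WordprocessingML namespace. The name classify_upload is hypothetical, and a
real ASP.NET page would do this with the .NET XML classes instead.

```python
import xml.etree.ElementTree as ET

# Expected root element of a Word 2003 XML document (namespace-qualified).
WORDML_ROOT = "{http://schemas.microsoft.com/office/word/2003/wordml}wordDocument"


def classify_upload(data):
    """Return 'wordml', 'other-xml', or 'not-xml' for the uploaded bytes."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Binary .doc and RTF files are not well-formed XML and land here.
        return "not-xml"
    return "wordml" if root.tag == WORDML_ROOT else "other-xml"


print(classify_upload(b"{\\rtf1\\ansi Hello}"))
print(classify_upload(b"<note>hi</note>"))
```

Separating "not XML at all" from "XML, but not a Word document" at least
lets the page give the user a more specific nag.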
But with Office.Next, it appears I will. Why? John Durant points us at the announcement of XML as the new native format for Word, Excel and FINALLY PowerPoint.
John said: "It's a watershed moment for Office programmability." True, but this goes way beyond programmability, I think. It's potentially just as much of a watershed for Knowledge Management Systems. Sure, we've had this ability with OpenOffice for a while, but I, for one, am glad to see Microsoft -- which really owns the IW productivity suite space -- make this move.
Looks like I'll be shuffling my TechEd 2005 schedule around a bit...