Full Text Search
Hi all,
While I've not been an active blogger on SQLJunkies this year, I have monitored this site and posted feedback from time to time, as it contains some of the best information on and about the SQL Server community! With the recent launch of MSN Spaces, I have decided to move both my personal and professional blogging efforts to http://spaces.msn.com/members/jtkane/. This new blog is an experiment for me, as I'm still finding my *public* voice (other than in the newsgroups). I will be posting my personal observations there as well as writing about SQL Server (7.0, 2000 and 2005) Full Text Search (FTS), drawing on my many years of working with this technology.
Additionally, I have extended my research to cover not only SQL FTS, but also Internet search engines (general and vertical), intranet search (aka Enterprise Search) software, web site search techniques and text mining using both SQL Server 2000 and SQL Server 2005 (Yukon). These are big topics that are worthy of a separate blog covering the structured (database), semi-structured (XML) and unstructured (text) worlds. I firmly believe these separate worlds will come together in the next few years.
Along the way, I will check back here often to see what is happening on this most progressive blogging site. I'd also like to thank Danny Mack and the others who have made this blogging site what it is today. I think that 2005 will be a most interesting year, not only for SQL Full Text Search, but for Microsoft and the SQL Server community!
Have a Happy & Safe Christmas!
John Kane
It's been an interesting and full week, as I've been attending the MS-sponsored Publishers/Authors summit conference. While the details are under NDA, I had a good time seeing other SQL Server authors, and I met several publishers and discussed my book proposal. I also saw some of my friends from SQL Dev and Communities at the after-conference party at the Red Hook Brewery, where a great time was had by all! I too am in the Yukon beta, and when it's OK to post information on this, I'll discuss some of the advantages and enhancements relative to Yukon FTS and all things textual. Speaking of Yukon, Eric Brown of Microsoft SQL Marketing is interviewed in a "Talking To" article in the October 2003 MSDN Magazine that is now online.
Eric Brown's Oct. 2003 MSDN Magazine interview - http://msdn.microsoft.com/msdnmag/issues/03/10/TalkingTo/default.aspx
SQL Server 2000 FTS can behave quite differently depending on which OS platform it is installed on. If you upgrade just the OS from Windows 2000 Server (Win2K) to Windows Server 2003 (Win2003), you can see a significant difference in query results (and for most users, a *better*, more expected difference). The reason is that the OS platform supplies the "word breaker" DLL: infosoft.dll on Win2K and langwrbk.dll on Win2003. The latter is a newer Microsoft-developed wordbreaker that is also included with Windows XP Pro (and used by SQL Server 2000 Developer's Edition on WinXP). Its results are "better", or at least closer to what people would expect, although different is not always better. Note that the workaround for this issue on Win2K is to use Neutral as the "Language for Word Breaker", but then you lose the ability to use the language-specific INFLECTIONAL FTS query keyword, as the Neutral wordbreaker "breaks" the words based upon the white space between words.
I've tested the Neutral wordbreaker with SQL Server 2000 on both Win2K and Win2003, and a search string of "T-SQL" is broken into "T" and "SQL". Note the use of double quotes in the search string, as this indicates a phrase, i.e., a multiple-word search string. However, in this case one of the words is a single letter, and single letters are normally "noise words" in all of the noise word files, so the "T" is ignored and a SQL FTS query will return results for "SQL" alone.
To change this, you can remove "T" (or other single letters) from noise.dat, the Neutral wordbreaker noise word file. On Win2K, you will also need to remove "T" or other single letters from the noise.* files under your \WINNT\System32 directory, including noise.enu (US_English) and noise.eng (UK_English), as well as from the noise word files under your SQL Server default folder of \FTDATA\SQLServer\Config. You must stop the "Microsoft Search" service before you can save your changes to the FTDATA noise word files. When your modifications are completed, you must run a Full Population and then re-test your SQL FTS query.
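To make the "T-SQL" behavior a bit more concrete, here is a rough Python sketch of the two steps involved: breaking the phrase into words, then dropping noise words. This is only a toy model and not how the Microsoft Search service is actually implemented. The function names, the short noise word list, and the rule of splitting on every non-alphanumeric character (so that the hyphen separates "T" from "SQL", as in my test) are all assumptions made for illustration.

```python
import re

# Hypothetical noise list: every single letter plus a few common words.
# The real noise.* files are longer; this is only an illustration.
NOISE_WORDS = set("abcdefghijklmnopqrstuvwxyz") | {"the", "and", "of"}


def break_words(text):
    """Toy word breaker: split on any run of non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def searchable_terms(text, noise_words=NOISE_WORDS):
    """Return the words that survive noise word removal."""
    return [t for t in break_words(text) if t.lower() not in noise_words]


print(break_words("T-SQL"))                            # ['T', 'SQL']
print(searchable_terms("T-SQL"))                       # ['SQL']
# With "t" removed from the noise list, both words are kept.
print(searchable_terms("T-SQL", NOISE_WORDS - {"t"}))  # ['T', 'SQL']
```

The last call mirrors the effect of editing the noise word files: once "T" is no longer treated as noise, both halves of the phrase are available to match against.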
My name is John Kane and I'm an independent consultant/author (or soon to be) as well as an ex-Microsoftie who specializes in SQL Server's Full-Text Search (FTS) components. I have been researching and writing a book about FTS in SQL Server 2000 as well as Yukon, and related topics such as Text Mining with Analysis Services (BI/OLAP) and all things textual. I'll be posting random thoughts on Search in general, SQL Server FTS, Web-based Search Engines, the convergence of structured and unstructured data search, and other search-related news items I come across from time to time in my work.
I too am new at this blogging thing, so forgive me if I make the occasional goof, and I want to thank the SQLJunkies folks for hosting this. You can see many of my postings (I'm one of the ones who reply) in the Fulltext newsgroup at microsoft.public.sqlserver.fulltext, and I'll be presenting at the 2003 PASS Conference in Seattle, WA in November.
Regards,
John