John Kane

SQL FTS Blog

A new SQL Full Text Search Blog...

Hi all,
While I've not been an active blogger on SQLJunkies this year, I have actively monitored this site and posted feedback from time to time, as it contains some of the best information on and about the SQL Server community! With the recent launch of MSN Spaces, I have decided to move both my personal and professional blogging to http://spaces.msn.com/members/jtkane/. This new blog is an experiment for me, as I'm still finding my *public* voice (other than in the newsgroups): I will be posting my personal observations as well as writing about SQL Server (7.0, 2000 and 2005) Full Text Search (FTS), drawing on my many years of working with this technology.

Additionally, I have extended my research to cover not only SQL FTS, but also Internet Search Engines (General and Vertical), Intranet Search (aka Enterprise Search) software, Web Site Search techniques and Text Mining, using both SQL Server 2000 and SQL Server 2005 (Yukon). These are big topics, worthy of a separate blog covering the structured (database), semi-structured (XML) and unstructured (text) worlds, and I firmly believe that these separate worlds will come together in the next few years.

Along the way, I will check back here often to see what is happening on this most progressive blogging site. I'd also like to thank Danny Mack and the others who have made this blogging site what it is today. I think that 2005 will be a most interesting year, not only for SQL Full Text Search, but for Microsoft and the SQL Server community!

Have a Happy & Safe Christmas!
John Kane

 

posted Tuesday, December 21, 2004 6:06 PM by jtkane

Yukon features and articles on TechNet...

Microsoft has released details about the new Data Mining, BI and Data Warehousing features of "Yukon", the next release of SQL Server, in an article entitled 'Overview of Business Intelligence and Data Warehousing in SQL Server Yukon' at
  http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/next/DWSQLSY.asp


Update: more Yukon articles have been released on TechNet at...
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/next/default.asp?frame=true

posted Monday, October 27, 2003 7:05 PM by jtkane

The PDC and related blog sites...

Wow, there is a lot of excitement about the 2003 PDC, for many good reasons! Alas, I'm not attending this year, yet I still feel like I'll be there in spirit, if not in person, thanks to this new (or new to me) blogging effort. Many in the .NET development community, and now the SQL Server community, are blogging. Additionally, Microsoft is encouraging its employees at all levels to blog about their activities, and I think that this is a good thing: more information is much better than little or no information, which can sometimes lead to unproductive speculation...

I've been able to track down many of the official and unofficial 2003 PDC-related blogs (via BlogRolls and cross-linked blogs), from both Microsofties and others who are attending or presenting at the 2003 Professional Developers Conference at the Los Angeles Convention Center, October 26-30, 2003. This is an important conference for Microsoft, as it will be giving its first public look at not only Yukon, but also Whidbey and the new OS platform, Longhorn. I've listed below some of the official and unofficial web sites and blogs for you to monitor in the coming week, if you're like me and unable to attend...

http://msdn.microsoft.com/events/pdc/ - MSDN 2003 PDC Home page.
http://pdcbloggers.net/Default.aspx - PDC Blogger's Home page.
http://weblogs.asp.net/ - PDC Session List
http://weblogs.asp.net/pmarcucci/ - Constantly updated sessions data
http://weblogs.asp.net/RHolloway - Randy Holloway's Weblog on Yukon and the CLR
http://www.netcrucible.com/blog/ - Joshua Allen's Blog
http://radio.weblogs.com/0001011/ - Robert Scoble's (very entertaining ;-) WebLog
http://longhornblogs.com/scobleizer/ - Robert Scoble's Longhorn Weblog...
http://pdcbloggers.net/Question_and_Answer.category  - PDC Blogger's Q&A page

The above is by no means a complete list. If you have a favorite PDC-related blog, feel free to email me and I'll post an updated list, or just post it in your own blog.

Enjoy!

posted Friday, October 24, 2003 10:16 AM by jtkane

Back from L.A. and Blogging...

Well, I'm finally back from my trip to L.A. My adventure in 64-bit vs. 32-bit benchmark testing extended through both last week and this week, and as it was a very busy time for me, I was unable to blog during the trip... More on the benchmarking results in a later post, once the final analysis is completed. Now that I'm back, I thought I'd blog a bit about what's going on in my life... I live in downtown Kirkland, WA, near Lake Washington and the Microsoft campus in Redmond, WA. When I left for L.A. nearly two weeks ago, the leaves had yet to change color, but today, during my morning walk down to the Triple J Cafe for breakfast and the New York Times (NYTimes), I noticed quite a bit of fall leaf color among the trees.

The NYTimes is a paper I've been reading regularly for over 20 years, and while it may not have much about the Pacific Northwest, it does have some interesting articles from time to time. News trivia: Q. Where is the tallest building in the world? A. Taipei, Taiwan. The construction crew has finished building the world's tallest skyscraper, a 1,676-foot-tall, 101-story building called Taipei 101. This new building tops the previous record holder, the Petronas Twin Towers in Kuala Lumpur, Malaysia.

Another interesting NYTimes article, 'Art and Science Meet with Novel Results', reviews current and past science fiction novels by and about real scientists, including the new 'Radiant Cool', which has a 100-page appendix explaining the theory discussed in the book. I recently purchased Neal Stephenson's 'Quicksilver', as I read and enjoyed his 'Cryptonomicon' this past summer. I've only just started it, but I was surprised that it was not mentioned in the NYTimes article.

You'll note that I didn't mention the 'it' word (SQL) above, except in this paragraph, and that was on purpose: blogs (or weblogs) are about more than just a technology that we use every day; they're also about who we are as individuals, as well as our personal opinions and activities. I've spent some time searching and researching blogs and will post more links from time to time to other bloggers, from MS employees to people in our industry (scroll down for a pic of Jim Gray), who have web sites and/or blogs about both their personal and professional lives.

IMHO, those of us who blog at SQLJunkies (2nd reference) should blog not only about SQL Server, but also about our projects, opinions and views (to the extent we feel comfortable with), both those related to our work and those about what we are passionate about, and not just about T-SQL code, product announcements, or general questions that could be better answered in the newsgroups....

Let me know what you think by writing your own blog post about what you did today, or whatever... Why Blog?

 

posted Saturday, October 18, 2003 11:36 AM by jtkane

SQL Server 2000 (64-bit) and what types of queries could benefit from 64-bit processing?

Recently, I've been working on a project that is completely different from SQL Full-Text Search (SQL FTS), challenging and fun too :-). It involves SQL Server 2000 (64-bit) Enterprise Edition and Intel's newly announced Itanium 2 64-bit CPU, codenamed "Madison". Earlier this year Microsoft released both Windows Server 2003 (64-bit) and SQL Server 2000 (64-bit) EE, and with hardware vendors' newly emerging 64-bit server platforms, SQL Server 2000 64-bit processing is set to take off, IMHO.

Obviously, one benefit of 64-bit processing is the increased "flat" or linear memory addressability, up to the terabyte (TB) level of RAM (with a practical limit of 512GB), and therefore no fooling around with PAE, AWE, etc. When comparing SQL Server 2000 (64-bit) running on the new Itanium 2 platforms with its 32-bit counterpart, the 64-bit platform will show performance improvements directly related to this larger memory addressability, including reduced I/O due to larger memory buffer pools. Database applications with workloads larger than 4GB can benefit from the higher memory addressability of the 64-bit platform, because memory-sensitive workloads can keep more of their data in memory.
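To put the 4GB boundary in perspective, here is a small Python calculation (an illustration, not part of the original benchmarking work) of the flat address space reachable with 32-bit and 64-bit pointers, compared with the 512GB practical limit mentioned above. It assumes 1GB = 2^30 bytes and ignores OS reservations such as the user/kernel address split, so treat the numbers as theoretical ceilings.

```python
# Back-of-the-envelope address-space arithmetic for 32-bit vs. 64-bit pointers.
# Assumption: "GB" here means 2**30 bytes; OS-reserved address ranges are ignored.

GIB = 2**30  # bytes in one GB (binary sense)

def addressable_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses reachable with a flat pointer of this width."""
    return 2**pointer_bits

flat_32 = addressable_bytes(32)   # ceiling of a flat 32-bit address space
flat_64 = addressable_bytes(64)   # theoretical ceiling of a flat 64-bit address space
practical_limit = 512 * GIB       # practical limit quoted in the post

print(f"32-bit flat space: {flat_32 // GIB} GB")
print(f"64-bit flat space: {flat_64 // GIB:,} GB")
print(f"512 GB is {practical_limit // flat_32}x the 32-bit ceiling")
```

The first line prints 4 GB, which is why workloads larger than 4GB are the ones that need PAE/AWE on 32-bit systems; the last line shows the 512GB practical limit is 128 times that ceiling.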

While the benefits of 64-bit processing are well known in general terms, I'd like to get your opinion and feedback on specific T-SQL queries that you believe might benefit from both the increased memory and the computational features of the Madison chip. For example, SQL queries that can use in-memory hash joins, sorts and hash aggregates, other queries that can take advantage of parallelism, and large workloads where more memory reduces I/O could all benefit from 64-bit processing (this is not necessarily a complete list). But what specific examples of T-SQL ad hoc queries or user stored procedures would you test?
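As a rough illustration of why hash joins and hash aggregates are memory-sensitive, here is a simplified Python sketch; it is not SQL Server's implementation. The build input is held entirely in an in-memory hash table; when it doesn't fit, a real engine must spill partitions to disk (in SQL Server, to tempdb), which is exactly the I/O a larger 64-bit buffer pool can help avoid. The table and column names are hypothetical.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Simplified in-memory hash join: hash the (ideally smaller) build input,
    then stream the probe input and look up matches by key."""
    table = defaultdict(list)
    for row in build_rows:                       # build phase: whole input held in RAM
        table[row[build_key]].append(row)
    for row in probe_rows:                       # probe phase: one hash lookup per row
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}

def hash_aggregate(rows, group_key, value_key):
    """Simplified hash aggregate: SUM(value) GROUP BY key, one hash bucket per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

# Hypothetical sample data standing in for Customers and Orders tables.
customers = [{"cust_id": 1, "name": "Contoso"}, {"cust_id": 2, "name": "Fabrikam"}]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 250},
    {"order_id": 11, "cust_id": 1, "amount": 100},
    {"order_id": 12, "cust_id": 2, "amount": 75},
]

joined = list(hash_join(customers, orders, "cust_id", "cust_id"))
totals = hash_aggregate(joined, "name", "amount")
print(totals)  # {'Contoso': 350, 'Fabrikam': 75}
```

Both operators need memory proportional to the number of distinct keys (or build rows), which is why they are natural candidates for testing on a 64-bit server.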

Also, what actual, practical SQL queries could benefit from the Itanium 2's 128 floating-point and 128 integer registers? These too could be valid examples of why Microsoft customers would want to upgrade their 32-bit SQL Servers to SQL Server 2000 (64-bit) EE, IMHO.

I invite your feedback, comments and examples of ad hoc queries and stored procedures that might benefit from true 64-bit processing, and I'll pick the best of them to test next week!

posted Monday, September 29, 2003 3:20 PM by jtkane

SQL Server 2000 FTS on Windows 2000 vs. Windows Server 2003...

It's been an interesting and full week, as I've been attending the MS-sponsored Publishers/Authors summit conference all week. While the details are under NDA, I had a good time seeing other SQL Server authors, and I met several publishers and discussed my book proposal. I also saw some of my friends from SQL Dev and Communities at the after-conference party at the Red Hook Brewery, where a great time was had by all! I too am in the Yukon beta, and when it's OK to post information on it, I'll discuss some of the advantages and enhancements relating to Yukon FTS and all things textual. Speaking of Yukon, Eric Brown of Microsoft SQL Marketing is interviewed in a "Talking To" article in the October 2003 issue of MSDN Magazine, which is now online.

Eric Brown's Oct. 2003 MSDN Magazine interview - http://msdn.microsoft.com/msdnmag/issues/03/10/TalkingTo/default.aspx

Relative to SQL Server 2000 FTS, depending upon which OS platform it is installed on, or if you upgrade just the OS platform from Windows 2000 Server (Win2K) to Windows Server 2003 (Win2003), you can see significantly different results (and for most users, *better*, more expected results). The OS platform supplies the "word breaker" DLL: infosoft.dll for Win2K and langwrbk.dll for Win2003. The latter is a new, Microsoft-developed wordbreaker that is also included with Windows XP Pro (and used by SQL Server 2000 Developer Edition on WinXP), and it produces "better" results, or at least what people would expect as better, although different is not always better. Note that the workaround for this issue on Win2K is to use the Neutral "Language for Word Breaker", but then you lose the ability to use the language-specific INFLECTIONAL FTS query keyword (as in FORMSOF(INFLECTIONAL, ...)), because the Neutral wordbreaker "breaks" words based upon the white space between them.

I've tested the Neutral wordbreaker with SQL Server 2000 on both Win2K and Win2003, and a search string of "T-SQL" is broken into "T" and "SQL". Note the use of double quotes in the search string, as this indicates a phrase, i.e., a multiple-word search string. However, in this case one of the words is a single letter, and single letters are normally "noise words" in all of the noise word files, so the "T" is ignored and the SQL FTS query returns results for "SQL" alone. You can remove "T" (or other single letters) from noise.dat, the Neutral wordbreaker noise word file. On Win2K, you will also need to remove "T" (or other single letters) from the noise.* files under your \WINNT\System32 directory, such as noise.enu (US English) and noise.eng (UK English), as well as from the noise word files under your SQL Server default folder, \FTDATA\SQLServer\Config. You must stop the "Microsoft Search" service before you can save your changes to the FTDATA noise word files. When your modifications are completed, restart the "Microsoft Search" service, run a Full Population and then re-test your SQL FTS query.
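The "T-SQL" behavior described above can be approximated with a toy model in Python. This is only a sketch under stated assumptions, not the actual wordbreaker DLL: it assumes the Neutral breaker splits on any non-alphanumeric character (consistent with "T-SQL" becoming "T" and "SQL"), and it uses a small hypothetical noise list in place of the real noise.dat contents.

```python
import re

# Hypothetical stand-in for a noise word file: a few common words plus every
# single letter (real lists live in files such as noise.dat and noise.enu).
DEFAULT_NOISE = {"a", "an", "the", "of"} | set("abcdefghijklmnopqrstuvwxyz")

def neutral_break(text):
    """Toy Neutral wordbreaker (assumption): split on runs of non-alphanumerics,
    so white space, hyphens and punctuation all act as word boundaries."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def indexed_terms(text, noise_words):
    """Words that survive case-insensitive noise-word filtering, i.e. the only
    words a phrase query like "T-SQL" can actually match on."""
    return [tok for tok in neutral_break(text) if tok.lower() not in noise_words]

def remove_noise_entries(noise_words, entries_to_drop):
    """Copy of the noise list without the given entries, mimicking a hand edit
    of the noise word file (e.g. deleting the line "T")."""
    return set(noise_words) - {e.lower() for e in entries_to_drop}

print(neutral_break("T-SQL"))                  # ['T', 'SQL']
print(indexed_terms("T-SQL", DEFAULT_NOISE))   # ['SQL']
edited = remove_noise_entries(DEFAULT_NOISE, ["T"])
print(indexed_terms("T-SQL", edited))          # ['T', 'SQL']
```

In the real product, editing the file alone is not enough: the edited list only affects results after the service restart and Full Population described above, which this model does not simulate.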

posted Friday, September 19, 2003 1:11 PM by jtkane

Full-Text Search (FTS), Text Mining and all things textual...

My name is John Kane, and I'm an independent consultant and (soon to be) author, as well as an ex-Microsoftie who specializes in SQL Server's Full-Text Search (FTS) components. I have been researching and writing a book about FTS in SQL Server 2000 as well as Yukon, and related topics such as Text Mining with Analysis Services (BI/OLAP) and all things textual. I'll be posting random thoughts on Search in general, SQL Server FTS, Web-based Search Engines, the convergence of structured and unstructured data search, and other search-related news items I come across from time to time in my work.

I'm also new at this blogging thing and want to thank the SQLJunkies folks for hosting this, so forgive me if I make the occasional goof. You can see many of my postings (I'm one of the ones who reply) in the Fulltext newsgroup at microsoft.public.sqlserver.fulltext, and I'll be presenting at the 2003 PASS Conference in Seattle, WA in November.

Regards,
John

posted Wednesday, September 17, 2003 9:29 AM by jtkane



