Monday, July 10, 2006 - Posts

Double your performance, don't use unicode in your database

I was listening to a web cast by one of the infamous gurus of SQL and heard again that you should use nvarchar and nchar for your database columns and not varchar and char.

The reasons put forward were that

  1. your client will be in unicode and so if you don't there will always be conversion going on from non-unicode to unicode.
  2. you will be able to support languages such as Chinese etc that require unicode.

Again I was wondering how much should I take head and start using unicode because

  1. I work on central european systems
  2. the space required for unicode is double that of non-unicode

The only compelling reason I see would be if the performance of the conversion on the client out ways the size required to store the double bytes.

So I did a test. Very cruedly created 2 tables, populated them with 10,000 rows and read them from a .net client app and timed them.

What was the outcome, well on my laptop (so no network latency) the time to process the unicode data was twice that of the non-unicode data. That does make sense as SQL has to return twice the amount of data, that means reading twice the number of pages, send twice the amount of data down the stack and use up twice the amount of your valuable cache. 

So it seems that the overhead of processing the double the amount of data far out ways any conversion that goes on.

When ever you design a database you will realise that the size of text based columns will dwarfs the size of your integer key columns, if you double the size of those columns the dwarfing becomes a real David v Goliath. imagine a address table with 2  integer key columns and 5x50 character address fields. thats 250 bytes of text and 8 bytes of integers, with unicode that becomes 500 bytes of text data, thats 16 rows per page. Your 1 million row table now takes up .5 Gb of your cache.

Whilst I know anyone that has used the CLR will be shouting, "but SQLCLR only support unicode", true but it doens't mean you have to use unicode for your columns.

So if you your system isn't likely to need unicode, (if you don't know or are not sure then it probably won't), don't use unicode for your columns. Save your cache, bandwidth and disk space.

Cross posted from http://sqlblogcasts.com/blogs