posted on Tuesday, February 28, 2006 7:16 AM by SimonSabin

Why use anything but varchar(max)

As you might have read I am putting together some standards for data type usage.

When I got to the text stuff varchars are king (generally due to the effort required in an app to trim data), except for small (<4 characters) when I would use char.

My quandry came about what size should a varchar be. With SQL 2005 and the varchar(max) I can't see any benefit not using this everywhere, the result is the same whether you use varchar(10), varchar(2000) or varchar(max). Isn't it.

Ok so with a fixed size datatype SQL will stop you iserting more data than is allowed, but be honest who is relying on SQL to do that validation, not me for one.

and there might be a slight difference in storage, I can't remember how the length or the column is stored but would guess that for < 256 it is 1 byte, < 32,767 its 2 bytes etc.

But using varchar(max) it is definitely future proof, and a lot easier to set standards "always use varchar(max)".

Comments

# re: Why use anything but varchar(max) @ Tuesday, February 28, 2006 2:23 PM

Well here's one reason : http://blogs.conchango.com/stevewright/archive/2006/01/20/2686.aspx

Or more generically, if you have another placeholder external to SQL Server that is going to be interfacing with that column then you have the length of the external placeholder to consider as well.

-Jamie

Jamie Thomson

# re: Why use anything but varchar(max) @ Tuesday, February 28, 2006 2:31 PM

I also cover some reasons NOT to do that in this article:

http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1098157,00.html

Adam Machanic

# re: Why use anything but varchar(max) @ Tuesday, February 28, 2006 5:31 PM

Not to pile on, but a partitioning key may also not utilize the varchar(max) data type.

RickH

# re: Why use anything but varchar(max) @ Sunday, March 05, 2006 10:36 PM

Explicitly defining the data size of a column is indicative of good design. But it is more than academic-

Data limits:
If many columns are defined in this manner, it increases the chance that poor usage will overflow the data row (over 8000 characters). Database design shouldn't allow for this sort of flaw to be easily performed. While stored procs could prevent such a flaw, it's much more concise to have the data columns well defined.

Application usage:
When applications render or use the text data, it is useful to know how much buffer space is required. The application will probably make some assumptions on the display area. The data table, not stored procs or application controls, should dictate the size explicitly.

David Rodecker

# re: Why use anything but varchar(max) @ Monday, March 06, 2006 2:00 AM

Contra-indication : You want to use this column in the key-part of an index. You cannot do that with varchar(max)-columns !

check BOL ;)

Also the way varchar(max) is handled internaly, you will suffer performance wize, compared to varchar(nnn) with an appropriate size.

You'd better start development with restrictions, because to adjust the applications afterward to match your performance-restrictions will be a mighty pain in the ***.

With sql2005 MS did not specify to thug all your data-analysis-excersises away !
Do your datadesign and you will benifit in the long run !

ALZDBA

# re: Why use anything but varchar(max) @ Wednesday, April 26, 2006 7:53 PM

Amongst all of the performance, design and other generally technical reasons why this is a terrible idea is a splash of common sense.

I can drive my car with my feet but that doesn't make it a good idea!

Jack Thompson

# re: Why use anything but varchar(max) @ Thursday, June 01, 2006 1:30 AM

I am struggling to justify this to myself and other team members on our current project. Sure varchar(max) has some limitations for keys, etc. but why should I use varchar(50) instead of varchar(8000)? Obviously if all the varchars in one row exceed 8000ish bytes I get overflow and performance hits. However, if this never happens in production why not give yourself some room? I don't want to agree with your post but I can't see anything against it. :)

Jason

# re: Why use anything but varchar(max) @ Friday, June 09, 2006 4:59 AM

Designing a database is a job. This takes time and is important for the future of the server / application.
Creating object with varchar(max).... only to win some time in the creation of the db is really a bad idea.
You should know what comes in your rows. You should correctly size your table, db. Choose the correct type for the correct column. The max is designed for one purpose. Don't use it for everything...

Consider the time you will loose:
- when you will make connection to other db, system. All the conversion,...
- unpredictable growth of the db / servers
- unwanted results in column
- performance issue
- working with molap cubes would be insane (when making a molap cube, the columns are created with their max value.... oh a cube of 10 gig... There must be a lot of records in it ? Oh no, only 2...)
- designing applications / screen
-....

Don't use shortcut for everything. The time won could be lot less than the time lost to fix all the problems...

Belsteak

# re: Why use anything but varchar(max) @ Tuesday, November 07, 2006 3:57 AM

When creating an ods application with a database behind it. I would agree to the statement you are making. However, when you consider bulk loads and etl processes. One should reconsider the use of varchar max.

2 main reasons.

1. The space alloction in extends and pages of the database are flexible. Thus when adding information to the field using update, your database would have to create a pointer if the new data is longer than the previous inserted. This the database files would become fragmented = lower performance in almost everything, from index to delete, update and inserts.

2. When inserting tens of millions of rows (Or smaller amounts for that sake) the issue described in point 1 is even more but add now the calculation time required to get the proper length of the varchar and evaluate every inserts before it is actually inserted. (We are talking here in miliseconds per field but a calculating explains:

1 column varchar is around 0.1 milisecond.
30 columns = 0.30 miliseconds (And perhaps even some additional overhead but i'll leave this out.)

10 records = 3 miliseconds of proces time wasted

10.000 records = 3 seconds
100.000 records = 30 seconds
1.000.000 records = 300 seconds (5 minutes)
10.000.000 records = 3000 seconds (50 minutes)

Remember this is all wasted time initially. Add the index creation and defragmentation let alone wasted query performance to retrieve the information and I rest my case. :)

Werner de Jong