Saturday, February 19, 2005

No UTF-8 support in SQL Server

OK, I admit that I am a little biased as a Java programmer. But is UTF-8 support too much to ask of a major commercial database? In Storing UTF-8 Data in SQL Server, Microsoft acknowledges that "some applications (especially those that are Web based) must deal with Unicode data that is encoded with the UTF-8 encoding method". But instead of adding UTF-8 support to SQL Server, they patched IIS to perform the UCS-2 to UTF-8 translation, and of course you have to use ASP to benefit from that. As for those who dare to write their web applications in insignificant technologies such as JSP, you deserve to write your own conversion routine. And if your language of choice doesn't support UCS-2, you really should switch to any of "ODBC, OLEDB, COM, Win32 API calls, VB, and C".
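For what it's worth, the "conversion routine" is short in Java, because a Java String is already UTF-16 internally (which coincides with UCS-2 for characters in the Basic Multilingual Plane). Here is a minimal sketch, not a definitive implementation. The class and method names are made up for illustration, StandardCharsets needs Java 7 or later, and the little-endian byte order for nchar/nvarchar data is my assumption about SQL Server's storage rather than something stated in the Microsoft article.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
public class Ucs2Utf8 {

    // Decode UTF-8 bytes (e.g. from a web request) into a Java String.
    public static String fromUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode a Java String as UTF-8 bytes (e.g. for a web response).
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode a Java String as UTF-16LE bytes, with no byte order mark.
    // Assumption: this matches the byte layout used for nchar/nvarchar.
    public static byte[] toUtf16Le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Japanese and Chinese text in one string, written with
        // Unicode escapes so the source file encoding does not matter.
        String text = "\u65E5\u672C\u8A9E and \u4E2D\u6587";
        byte[] utf8 = toUtf8(text);
        byte[] utf16 = toUtf16Le(fromUtf8(utf8));
        System.out.println(utf8.length + " UTF-8 bytes, "
                + utf16.length + " UTF-16LE bytes");
    }
}
```

In practice a JDBC driver normally does this conversion for you when you bind a String to an nvarchar column, so a helper like this only matters when you handle raw bytes yourself.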


asir said...

How about SQL Server 2005?

Web Developer Sydney said...

My exact problem :) And the problem still exists in SQL Server 2005 and even SQL Server 2008 (check the beta documentation of SQL Server 2008 on Microsoft's website). You only have nchar and nvarchar, which store UCS-2 characters...

Anonymous said...

I guess Microsoft assumes that there is no requirement to have multiple languages in the same application. Something like: if your app is in Japanese, why would you want to store Chinese characters? And if you do, that is an edge case and you should do the conversion yourself.

Anonymous said...

I hit your blog while trying an install of SQL Server 2005 with UTF-8 support. I am installing a piece of software called 'Documentum Content Server', which makes use of a database instance. The content server supports 4 different databases, SQL Server 2005 being one of them. UTF-8 is a prerequisite for whichever brand you choose. I stopped when I found that there was no UTF-8 for SQL Server 2005, even though it is nevertheless listed as supported (?). After reviewing the installation guide for the software, I found the following:

"On SQL Server, you can use any collation (SQL Server’s name for code page),
because this only determines the code page of varchar and char types. For new
SQL Server repositories, Content Server uses only nvarchar and nchar types, which
automatically use Unicode."

It is true that at the beginning of the installation, SQL Server prompts you to choose a 'collation', and Unicode or UTF is not mentioned anywhere. But then, it seems that you would have to restrict yourself to those nchar and nvarchar types.
Hope that helps someone.