|
A DBA Primer : Collations
Collation is concerned with how character data is interpreted by SQL Server. Until you run into a problem concerning them, you’re probably
blissfully unaware of their existence. The following error is easy to generate by joining columns of different collations.
Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict
between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal
to operation.
A collation ties together :
1) Code Page
This is a single-byte character set that represents the
alphabet, punctuation & symbols of a language. Data types using a code page are
char, varchar & ntext.
Code page does not apply to double-byte (Unicode) characters.
2) Sort Order for
Unicode data types.
Unicode file types require double byte storage. Data types
are nchar , nvarchar and ntext.
3) Sort order for code
page characters
Sort order itself is affected by :
Case Sensitivity
In comparison and sorting operations, are uppercase and
lowercase characters considered equal? In a Case Sensitive collation 'A' <> 'a',
'Cat' <> 'CAT' etc.
Accent Sensitivity
In comparison and sorting operations, are accented and
unaccented characters considered equal? Does 'a' = '?', 'Bronte' = 'Brontë'?
Kana sensitivity
Japan has 2 types of characters Hiragana and Katakana.
In comparison and sorting operations, are Kana sensitivite
collation treated them as equal?
Width sensitivity
In comparison and sorting operations, is a single byte
character equal to it's double-byte representation?
Collation naming reveals all this information.
For example, the Latin1_General_CI_AS collation is a Latin
code page that is Case Insensitive (CI) and Accent Sensitive (AS).
Exploring Collations
Step 1 : What
Collation is SQL installed under?
SELECT SERVERPROPERTY('COLLATION')
Step 2 : Which collations are available to me?
SELECT Name, Description FROM fn_helpcollations()
Step 3 : Which databases have a different collation to the
server default?
SELECT
NAME
AS DATABASE_NAME
,
DATABASEPROPERTYEX(NAME,'COLLATION') AS DBCOLLATION
,
SERVERPROPERTY('COLLATION') AS SERVERCOLLATION
FROM SYS.DATABASES
WHERE
CONVERT(SYSNAME,DATABASEPROPERTYEX(NAME,'COLLATION')) <>
SERVERPROPERTY('COLLATION')
Step 4 : Show me the collation for each column in my database
SELECT
C.TABLE_CATALOG
AS DATABASE_NAME
,C.TABLE_SCHEMA
,C.TABLE_NAME
,C.COLUMN_NAME
,DATA_TYPE
,SERVERPROPERTY('COLLATION')
AS SERVER_COLLATION
,CONVERT(SYSNAME,DATABASEPROPERTYEX(D.NAME,'COLLATION'))
AS DATABASE_COLLATION
,C.COLLATION_NAME
AS COLUMN_COLLATION
FROM INFORMATION_SCHEMA.COLUMNS C
INNER JOIN SYS.DATABASES D
ON DB_ID(C.TABLE_CATALOG) =
DB_ID(D.NAME)
WHERE DATA_TYPE IN ('VARCHAR'
,'CHAR','NVARCHAR','NCHAR','TEXT','NTEXT')
Step 5 : Show me differences in collation settings on my
server.
This produces two results sets :
1) Databases where collation is different from the server
setting
2) Columns where collation is different from the database
setting
IF EXISTS (SELECT * FROM
TEMPDB.DBO.SYSOBJECTS WHERE ID = OBJECT_ID('tempdb.dbo.#CollationComparison'))
DROP TABLE #CollationComparison
CREATE TABLE #CollationComparison
(Database_Name SYSNAME
,Table_Schema SYSNAME
,Table_Name SYSNAME
,Column_Name SYSNAME
,Server_Collation SYSNAME
,Database_Collation SYSNAME
,Column_Collation SYSNAME)
DECLARE @SQL NVARCHAR(MAX)
DECLARE @dbname NVARCHAR(200)
DECLARE dbcursor CURSOR FOR
select name from sys.databases
OPEN dbcursor
FETCH NEXT FROM dbcursor INTO @dbname
WHILE @@FETCH_STATUS = 0
BEGIN
print @dbname
SET @SQL = 'INSERT INTO
#CollationComparison
(Database_Name
,Table_Schema
,Table_Name
,Column_Name
,Server_Collation
,Database_Collation
,Column_Collation)
SELECT
C.TABLE_CATALOG
AS DATABASE_NAME
,C.TABLE_SCHEMA
,C.TABLE_NAME
,C.COLUMN_NAME
,CONVERT(VARCHAR,SERVERPROPERTY(''COLLATION''))
AS SERVER_COLLATION
,CONVERT(SYSNAME,DATABASEPROPERTYEX(D.NAME,''COLLATION''))
AS DATABASE_COLLATION
,C.COLLATION_NAME
AS COLUMN_COLLATION
FROM [' + @dbname +
'].INFORMATION_SCHEMA.COLUMNS C
INNER JOIN SYS.DATABASES D
ON DB_ID(C.TABLE_CATALOG) =
DB_ID(D.NAME)
WHERE DATA_TYPE IN (''VARCHAR''
,''CHAR'',''NVARCHAR'',''NCHAR'',''TEXT'',''NTEXT'')
'
exec sp_executesql @SQL
print @sql
FETCH NEXT FROM dbcursor INTO @dbname
END
CLOSE dbcursor
DEALLOCATE dbcursor
SELECT DISTINCT
Server_Collation,Database_Collation,Database_Name FROM #CollationComparison
WHERE Server_Collation <> Database_Collation
SELECT DISTINCT * FROM
#CollationComparison WHERE Column_Collation <> Database_Collation
So, you've got collation differences. Now what?
Well you can deal with them at the query level, e.g;
in a WHERE clause ...
SELECT columnlist
FROM table
WHERE searchedcolumn COLLATE
SQL_Latin1_General_CP1_CS_AS = 'Searched Text'
Or in a JOIN
SELECT columnlist
FROM table1
LEFT OUTER JOIN table2
ON table1.textid = table2.textid
COLLATE SQL_Latin1_General_CP1_CI_AI
NB : Using Collate in this way will
prevent the query optimiser from using optimal indexes and creating an efficient
execution plan.
If you can change the database schema, you can ....
Change Collation of a
column -
ALTER TABLE tablename
ALTER COLUMN columnname datatype
COLLATE collationname
e.g,
ALTER TABLE Person.Contact
ALTER COLUMN Lastname
COLLATE Latin1_General_CS_AS
Change Collation of a
database -
There are 2 approaches to this problem
1) Create a second database by scripting the first and
transferring the data. Knowledgebase Article
325335 describes this.
2) IF you are Using Sql 2005 SP2+, you can change your
existing database directly.
Firstly change the collation setting –
ALTER DATABASE [adventureworks] COLLATE
Latin1_General_CS_AS
Then change each column individually using the ALTER TABLE
... ALTER COLUMN... syntax.
Thankfully, there are plenty of scripts around to help
automate this e.g.
here
Change the default
collation of a server
You really don't want to do this (though instructions are
here). Reinstalling Sql Server is less
time consuming in my opinion.
Hopefully this will help you get an insight into Collation
settings in your environment and go some way towards locating the cause of
collation issues.
|