Documentation page on collations and case-sensitivity
Closes #2273
roji committed Apr 28, 2020
1 parent 804349c commit 4f0ae57
Showing 6 changed files with 170 additions and 0 deletions.
@@ -0,0 +1,69 @@
---
title: Collations and case sensitivity - EF Core
author: roji
ms.date: 04/27/2020
ms.assetid: bde4e0ee-fba3-4813-a849-27049323d301
uid: core/miscellaneous/collations-and-case-sensitivity.md
---
# Collations and Case Sensitivity

> [!NOTE]
> The APIs shown on this page are being introduced in EF Core 5.0, which is still in preview.

Text processing in databases can be complex, and requires more user attention than one would suspect. For one thing, databases vary considerably in how they handle text; for example, while some databases are case-sensitive by default (e.g. SQLite, PostgreSQL), others are case-insensitive (SQL Server, MySQL). In addition, because of index usage, case-sensitivity and similar aspects can have a far-reaching impact on query performance: while it may be tempting to use `string.ToLower` to force a case-insensitive comparison in a case-sensitive database, doing so may prevent your application from using indexes. This page details how to configure case sensitivity, or more generally, collations, and how to do so in an efficient way without compromising query performance.

## Introduction to collations

A fundamental concept in text processing is the *collation*, which is a set of rules determining how text values are ordered and compared for equality. For example, while a case-insensitive collation disregards differences between upper- and lower-case letters for the purposes of equality comparison, a case-sensitive collation does not. However, since case-sensitivity is culture-sensitive (e.g. `i` and `I` represent different letters in Turkish), there exist multiple case-insensitive collations, each with its own set of rules. The scope of collations also extends beyond case-sensitivity, to other aspects of character data; in German, for example, it is sometimes (but not always) desirable to treat `ä` and `ae` as identical. Finally, collations also define how text values are *ordered*: while German places `ä` after `a`, Swedish places it at the end of the alphabet.
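The same concepts can be observed in .NET itself, which may help build intuition before looking at database collations. The following is a minimal sketch using only `System.Globalization`; it is an analogy and does not involve EF Core or any database. The class name `CollationConcepts` is hypothetical, and the culture-aware results depend on the globalization data available on the machine, so only the ordinal results are annotated.

```csharp
using System;
using System.Globalization;

public static class CollationConcepts
{
    public static void Main()
    {
        // Ordinal comparison is case-sensitive and culture-agnostic.
        Console.WriteLine(string.Equals("john", "John", StringComparison.Ordinal));           // False

        // Ordinal comparison ignoring case behaves like a simple case-insensitive rule set.
        Console.WriteLine(string.Equals("john", "John", StringComparison.OrdinalIgnoreCase)); // True

        // Culture-aware comparisons apply the rules of a specific culture,
        // much as a database collation applies a specific set of rules.
        var german = CultureInfo.GetCultureInfo("de-DE");
        var swedish = CultureInfo.GetCultureInfo("sv-SE");

        // The sign of each result indicates how "ä" is ordered relative to "z"
        // under that culture's rules; the two cultures may disagree.
        Console.WriteLine(string.Compare("ä", "z", german, CompareOptions.None));
        Console.WriteLine(string.Compare("ä", "z", swedish, CompareOptions.None));

        // Options can relax the comparison further, here ignoring both case
        // and diacritics (roughly "case-insensitive, accent-insensitive").
        Console.WriteLine(string.Compare("a", "Ä", german,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace));
    }
}
```

Database collations are defined and named independently of .NET cultures, so these results should not be assumed to match what a given database does.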

All text operations in a database use a collation - whether explicitly or implicitly - to determine how the operation compares and orders strings. The actual list of available collations and their naming schemes is database-specific; consult [the section below](#database-specific-information) for links to relevant documentation pages of various databases. Fortunately, databases generally allow a default collation to be defined at the database or column level, and allow a collation to be specified explicitly for specific operations in a query.

## Database collation

In most database systems, a default collation is defined at the database level; unless overridden, that collation implicitly applies to all text operations occurring within that database. The database collation is typically set at database creation time (via the `CREATE DATABASE` DDL statement), and if not specified, defaults to some server-level value determined at setup time. For example, the default server-level collation in SQL Server for the "English (United States)" machine locale is `SQL_Latin1_General_CP1_CI_AS`, which is a case-insensitive, accent-sensitive collation. Although database systems usually do permit altering the collation of an existing database, doing so can lead to complications; it is recommended to pick a collation before database creation.

The following code in your model's `OnModelCreating` method configures a SQL Server database to use a case-sensitive collation:

[!code-csharp[Main](../../../samples/core/Miscellaneous/Collations/Program.cs?range=40)]
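To confirm which collation a database actually ended up with, one option is to ask the server directly. The sketch below is SQL Server-specific and assumes the `DATABASEPROPERTYEX` function is available; the `CollationInspector` class and `GetDatabaseCollation` method are hypothetical names introduced for illustration and are not part of EF Core.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public static class CollationInspector
{
    // Returns the collation name of the database the context is connected to.
    // Assumption: the provider is SQL Server; other databases expose this differently.
    public static string GetDatabaseCollation(DbContext context)
    {
        var connection = context.Database.GetDbConnection();
        context.Database.OpenConnection();
        try
        {
            using var command = connection.CreateCommand();
            // DATABASEPROPERTYEX returns sql_variant, so convert it to a string type.
            command.CommandText =
                "SELECT CONVERT(nvarchar(128), DATABASEPROPERTYEX(DB_NAME(), 'Collation'))";
            return command.ExecuteScalar() as string;
        }
        finally
        {
            context.Database.CloseConnection();
        }
    }
}
```

With the sample's `CustomerContext`, calling `CollationInspector.GetDatabaseCollation(context)` after `EnsureCreated()` would be expected to return the collation configured via `UseCollation`, assuming the database was created by EF Core rather than already existing.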

## Column collation

Collations can also be defined on text columns, overriding the database default. This can be useful if certain columns need to be case-insensitive, while the rest of the database needs to be case-sensitive.

The following configures the column for the `Name` property to be case-insensitive in a database that is otherwise configured to be case-sensitive:

[!code-csharp[Main](../../../samples/core/Miscellaneous/Collations/Program.cs?name=OnModelCreating&highlight=6)]

## Explicit collation in a query

In some cases, the same column needs to be queried using different collations by different queries. For example, one query may need to perform a case-sensitive comparison on a column, while another may need to perform a case-insensitive comparison on the same column. This can be accomplished by explicitly specifying a collation within the query itself:

[!code-csharp[Main](../../../samples/core/Miscellaneous/Collations/Program.cs?name=SimpleQueryCollation)]

This generates a `COLLATE` clause in the SQL query, which applies a case-sensitive collation regardless of the collation defined at the column or database level:

```sql
SELECT [c].[Id], [c].[Name]
FROM [Customers] AS [c]
WHERE [c].[Name] COLLATE SQL_Latin1_General_CP1_CS_AS = N'John'
```
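One way to check the SQL for a query like this without running it is `ToQueryString()`, which is also being introduced in EF Core 5.0. The following is a small sketch that assumes the `CustomerContext` and `Customer` types from the sample project; the `QueryInspection` class and `PrintCollateQuery` method are hypothetical names.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

namespace EFCollations
{
    public static class QueryInspection
    {
        public static void PrintCollateQuery(CustomerContext context)
        {
            // Build the query but do not execute it.
            var query = context.Customers
                .Where(c => EF.Functions.Collate(c.Name, "SQL_Latin1_General_CP1_CS_AS") == "John");

            // ToQueryString() returns the SQL that EF Core would send for this query,
            // which makes it easy to confirm that a COLLATE clause is present.
            Console.WriteLine(query.ToQueryString());
        }
    }
}
```

The exact text printed may differ between provider versions, so treat it as a diagnostic aid rather than a stable contract.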

### Explicit collations and indexes

Indexes are one of the most important factors in database performance - a query that runs efficiently with an index can grind to a halt without that index. Indexes implicitly inherit the collation of their column; this means that all queries on the column are automatically eligible to use indexes defined on that column - provided that the query doesn't specify a different collation. Specifying an explicit collation in a query will generally prevent that query from using an index defined on that column, since the collations would no longer match; it is therefore recommended to exercise caution when using this feature. It is generally preferable to define the collation at the column (or database) level, allowing all queries to implicitly use that collation and benefit from any index.

Note that some databases allow the collation to be defined when creating an index (e.g. PostgreSQL, SQLite). This allows multiple indexes to be defined on the same column, speeding up operations with different collations (e.g. both case-sensitive and case-insensitive comparisons). Consult your database provider's documentation for more details.
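The difference between index-friendly and index-hostile queries can be sketched as follows. This assumes the `CustomerContext` and `Customer` types from the sample project and a SQL Server database; the `IndexFriendlyQueries` class and its method names are hypothetical, and whether an index is actually used depends on the database and should be verified from the query plan.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

namespace EFCollations
{
    public static class IndexFriendlyQueries
    {
        // Model configuration (would be called from OnModelCreating):
        // the collation is set on the column, and the index inherits it.
        public static void ConfigureName(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Customer>().Property(c => c.Name)
                .UseCollation("SQL_Latin1_General_CP1_CI_AS");
            modelBuilder.Entity<Customer>().HasIndex(c => c.Name);
        }

        // Uses the column's collation implicitly, so the index is eligible.
        public static List<Customer> FindByName(CustomerContext context, string name)
            => context.Customers.Where(c => c.Name == name).ToList();

        // Wraps the column in a function call, which generally prevents index usage.
        public static List<Customer> FindByNameLowercased(CustomerContext context, string name)
            => context.Customers.Where(c => c.Name.ToLower() == name.ToLower()).ToList();

        // Overrides the column collation, which generally prevents index usage as well.
        public static List<Customer> FindByNameCaseSensitive(CustomerContext context, string name)
            => context.Customers
                .Where(c => EF.Functions.Collate(c.Name, "SQL_Latin1_General_CP1_CS_AS") == name)
                .ToList();
    }
}
```

Of the three query methods, only the first matches the collation of the index, which is why defining the collation on the column is usually the better default.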

> [!WARNING]
> Always inspect the query plans of your queries, and make sure the proper indexes are being used in performance-critical queries executing over large amounts of data. Overriding case-sensitivity in a query via `EF.Functions.Collate` (or by calling `string.ToLower`) can have a very significant impact on your application's performance.

## Translation of built-in .NET string operations

In .NET, string equality is case-sensitive by default: `s1 == s2` performs an ordinal comparison that requires the strings to be identical. Because the default collation of databases varies, and because it is desirable for simple equality to use indexes, EF Core makes no attempt to translate simple equality to a database case-sensitive operation: C# equality is translated directly to SQL equality, which may or may not be case-sensitive, depending on the specific database in use and its collation configuration.

In addition, .NET provides overloads of [`string.Equals`](https://docs.microsoft.com/dotnet/api/system.string.equals#System_String_Equals_System_String_System_StringComparison_) accepting a [`StringComparison`](https://docs.microsoft.com/dotnet/api/system.stringcomparison) enum, which allows specifying case-sensitivity and culture for the comparison. By design, EF Core refrains from translating these overloads to SQL, and attempting to use them will result in an exception. For one thing, EF Core does not know which case-sensitive or case-insensitive collation should be used. More importantly, applying a collation would in most cases prevent index usage, significantly impacting performance for a very basic and commonly-used .NET construct. To force a query to use case-sensitive or case-insensitive comparison, specify a collation explicitly via `EF.Functions.Collate` as [detailed above](#explicit-collation-in-a-query).
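The three cases described in this section can be summarized in a short sketch. It assumes the `CustomerContext` and `Customer` types from the sample project; the `StringComparisonQueries` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

namespace EFCollations
{
    public static class StringComparisonQueries
    {
        // Translated to plain SQL equality; whether it is case-sensitive
        // depends on the collation of the column or database.
        public static List<Customer> SimpleEquality(CustomerContext context)
            => context.Customers.Where(c => c.Name == "John").ToList();

        // Not translated by design: per the text above, executing this query
        // is expected to result in an exception.
        public static List<Customer> UnsupportedOverload(CustomerContext context)
            => context.Customers
                .Where(c => c.Name.Equals("John", StringComparison.OrdinalIgnoreCase))
                .ToList();

        // Explicit alternative: the collation is chosen by the caller,
        // at the possible cost of index usage.
        public static List<Customer> ExplicitCaseInsensitive(CustomerContext context)
            => context.Customers
                .Where(c => EF.Functions.Collate(c.Name, "SQL_Latin1_General_CP1_CI_AS") == "John")
                .ToList();
    }
}
```

The explicit form makes the performance trade-off visible at the call site, which is consistent with the reasoning given for not translating the `StringComparison` overloads.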

## Database-specific information

* [SQL Server documentation on collations](https://docs.microsoft.com/sql/relational-databases/collations/collation-and-unicode-support)
* *MORE NEEDED*
13 changes: 13 additions & 0 deletions entity-framework/core/modeling/entity-properties.md
@@ -124,3 +124,16 @@ A property that would be optional by convention can be configured to be required
[!code-csharp[Main](../../../samples/core/Modeling/FluentAPI/Required.cs?name=Required&highlight=3-5)]

***

## Column collation

> [!NOTE]
> This feature is being introduced in EF Core 5.0, which is still in preview.

A collation can be defined on text columns, determining how they are compared and ordered. For example, the following configures a SQL Server column to be case-insensitive:

[!code-csharp[Main](../../../samples/core/Miscellaneous/Collations/Program.cs?range=42-43)]

If all columns in a database need to use a certain collation, define the collation at the database level instead.

General information about EF Core support for collations can be found in the [collation documentation page](../miscellaneous/collations-and-case-sensitivity.md).
2 changes: 2 additions & 0 deletions entity-framework/toc.yml
@@ -84,6 +84,8 @@
href: core/miscellaneous/configuring-dbcontext.md
- name: Nullable reference types
href: core/miscellaneous/nullable-reference-types.md
- name: Collations and case sensitivity
href: core/miscellaneous/collations-and-case-sensitivity.md
- name: Create a model
items:
- name: Overview
14 changes: 14 additions & 0 deletions samples/core/Miscellaneous/Collations/Collations.csproj
@@ -0,0 +1,14 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
    <RootNamespace>EFCollations</RootNamespace>
    <AssemblyName>EFCollations</AssemblyName>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="5.0.0-preview.3.20181.2" />
    <PackageReference Include="Microsoft.Extensions.Logging.Console" Version="5.0.0-preview.3.20181.2" />
  </ItemGroup>

</Project>
53 changes: 53 additions & 0 deletions samples/core/Miscellaneous/Collations/Program.cs
@@ -0,0 +1,53 @@
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

namespace EFCollations
{
    public class Program
    {
        static void Main(string[] args)
        {
            using (var db = new CustomerContext())
            {
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();
            }

            using (var context = new CustomerContext())
            {
                #region SimpleQueryCollation
                var customers = context.Customers
                    .Where(c => EF.Functions.Collate(c.Name, "SQL_Latin1_General_CP1_CS_AS") == "John")
                    .ToList();
                #endregion
            }
        }
    }

    public class CustomerContext : DbContext
    {
        public DbSet<Customer> Customers { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=EFCollations;Trusted_Connection=True;ConnectRetryCount=0");
        }

        #region OnModelCreating
        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.UseCollation("SQL_Latin1_General_CP1_CS_AS");

            modelBuilder.Entity<Customer>().Property(c => c.Name)
                .UseCollation("SQL_Latin1_General_CP1_CI_AS");
        }
        #endregion
    }

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
}
19 changes: 19 additions & 0 deletions samples/core/Samples.sln
@@ -51,6 +51,8 @@ Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SqlServer", "SqlServer\SqlS
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ValueConversions", "Modeling\ValueConversions\ValueConversions.csproj", "{FE71504E-C32B-4E2F-9830-21ED448DABC4}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Collations", "Miscellaneous\Collations\Collations.csproj", "{62C86664-49F4-4C59-A2EC-1D70D85149D9}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -375,6 +377,22 @@ Global
{FE71504E-C32B-4E2F-9830-21ED448DABC4}.Release|x64.Build.0 = Release|Any CPU
{FE71504E-C32B-4E2F-9830-21ED448DABC4}.Release|x86.ActiveCfg = Release|Any CPU
{FE71504E-C32B-4E2F-9830-21ED448DABC4}.Release|x86.Build.0 = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|Any CPU.Build.0 = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|ARM.ActiveCfg = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|ARM.Build.0 = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|x64.ActiveCfg = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|x64.Build.0 = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|x86.ActiveCfg = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Debug|x86.Build.0 = Debug|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|Any CPU.ActiveCfg = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|Any CPU.Build.0 = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|ARM.ActiveCfg = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|ARM.Build.0 = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|x64.ActiveCfg = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|x64.Build.0 = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|x86.ActiveCfg = Release|Any CPU
{62C86664-49F4-4C59-A2EC-1D70D85149D9}.Release|x86.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -393,6 +411,7 @@ Global
{802E31AD-2F1E-41A1-A662-5929E2626601} = {CA5046EC-C894-4535-8190-A31F75FDEB96}
{63685B9A-1233-4B44-AAC1-8DDD4B16B65D} = {CA5046EC-C894-4535-8190-A31F75FDEB96}
{FE71504E-C32B-4E2F-9830-21ED448DABC4} = {CA5046EC-C894-4535-8190-A31F75FDEB96}
{62C86664-49F4-4C59-A2EC-1D70D85149D9} = {85AFD7F1-6943-40FE-B8EC-AA9DBB42CCA6}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {20C98D35-54EF-46A6-8F3B-1855C1AE4F70}