String Interning in C#
We all know that string objects are immutable in C#, i.e. an existing instance cannot be altered or modified; any operation that appears to change a string actually creates a new instance. Let us take a quick look at the following lines of code:
static void Main(string[] args)
{
    string s1 = "sankarsan";
    string s2 = "sankarsan";
    // ReferenceEquals compares object identity, not string contents
    if (object.ReferenceEquals(s1, s2))
        Console.WriteLine("Both s1 and s2 refer to same object");
    else
        Console.WriteLine("s1 and s2 refer to different object");
}
Since s1 and s2 are each assigned from their own literal, we might expect them to be two different objects, and the output of the program to be "s1 and s2 refer to different object". But that is not the case: the output of the above code is "Both s1 and s2 refer to same object". How can this happen? Let us also take a look at the IL code:
IL_0001: ldstr "sankarsan"
IL_0007: ldstr "sankarsan"
ldstr pushes a reference to a string object for the literal stored in the assembly metadata, and the stloc instruction that follows each ldstr (not shown in the listing above) stores that reference into a local variable on the stack.
Now let us carefully study the MSDN documentation of the ldstr opcode: http://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldstr.aspx
The following lines in MSDN need to be carefully noted:
The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning").
The CLR internally maintains a hashtable-like structure called the intern pool, which contains an entry for each unique literal string, with the string as the key and a reference to the string object as the value. When a string literal is loaded, the CLR checks whether an entry is already present in the intern pool. If it is, the CLR returns a reference to the existing object; otherwise it creates the string object, adds it to the pool and returns the reference. This is string interning. Its basic objective is to reduce memory usage by avoiding duplicate copies of identical strings, which is safe because strings are immutable.
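The lookup-or-add behaviour described above can be sketched with an ordinary dictionary. This is only a toy model: the class name ToyInternPool and its members are hypothetical, and the real intern pool lives inside the runtime and is not implemented this way.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of an intern pool; NOT the CLR's actual implementation.
public sealed class ToyInternPool
{
    // Key: the character sequence. Value: the one canonical string object for it.
    private readonly Dictionary<string, string> pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count
    {
        get { return pool.Count; }
    }

    public string Intern(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");

        string existing;
        if (pool.TryGetValue(value, out existing))
            return existing;        // already pooled: hand back the first object seen

        pool.Add(value, value);     // first occurrence becomes the canonical copy
        return value;
    }
}
```

Passing two distinct string objects with the same characters to Intern returns the same reference both times, and Count stays at one entry for that character sequence.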
But this can have a negative performance impact as well. The additional hashtable lookups are costly, and interned strings stay in memory at least until the app domain is unloaded (the String.Intern documentation notes that they are unlikely to be released before the CLR itself terminates). So they will occupy memory even if they are no longer used.
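Only literals are interned automatically. A string built at run time is a separate object unless we explicitly pay the lookup cost by calling String.Intern. The following minimal sketch shows the difference, assuming a normal JIT-compiled run; the class InternDemo and the helper BuildAtRuntime are made up for illustration, while String.Intern and String.IsInterned are the real framework methods.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: copies the characters into a brand-new string object.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    public static void Main()
    {
        string literal = "sankarsan";              // interned because it is a literal
        string built = BuildAtRuntime(literal);    // same characters, different object

        Console.WriteLine(object.ReferenceEquals(literal, built));   // False

        // String.Intern looks the value up in the pool and returns the pooled reference.
        string pooled = string.Intern(built);
        Console.WriteLine(object.ReferenceEquals(literal, pooled));  // True

        // String.IsInterned only checks the pool; it returns null when there is no entry.
        Console.WriteLine(string.IsInterned(built) != null);         // True
    }
}
```

Whether an explicit call to String.Intern pays off depends on how many duplicates the program creates and how long they live, since every pooled string stays alive for the reasons described above.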
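We can try to turn off string interning by adding the CompilationRelaxations attribute (from System.Runtime.CompilerServices) with the NoStringInterning flag to the assembly:
[assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)]
But this is only a hint: it is up to the CLR whether or not to honor the attribute. If a native image is compiled using Ngen.exe, however, the attribute is taken into account.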
String interning is not specific to the CLR; it is also present in languages such as Java and Python.