Talk:String interning

From Wikipedia, the free encyclopedia

Clarify

.NET languages, Lua and JavaScript string values are immutable and interned.

For C# (and .NET in general; I don't know about the other languages/runtimes), as mentioned earlier in this talk, only string literals are interned automatically. I think that is worth mentioning. The cited MSDN article shows this. — Preceding unsigned comment added by 37.5.252.178 (talk) 09:18, 30 June 2013 (UTC)

question

If you have to do a special operation to get the version in the intern pool, why do strings need to be put in the intern pool as soon as they are created? Couldn't they just be put in the first time an app looks for them there? Plugwash 22:42, 22 January 2006 (UTC)

String interning costs a bit of time, but saves memory in the longer run if there are a lot of equal strings. It's a space-time tradeoff. Wouter Lievens 08:21, 22 June 2006 (UTC)


Maybe the historical roots of intern (Lisp symbols) should be mentioned? Chkno 20:50, 20 November 2006 (UTC)

The example about finding string duplicates should probably be replaced. It's straightforward to find duplicates in O(n log n) time (just sort the strings, then scan the ordered list; duplicates will be adjacent), and you can get O(n) in practice with a reasonable hash function (hash the strings, use hash table to find potential duplicates). 03:15, 6 December 2006 (UTC)

  • I agree, it was not a very good piece of code, so I removed it completely. I don't see such an example helping much anyway. If the code were to be rewritten, it should be written with the STL (e.g. set<string>), because that gives cleaner and shorter code and a much better idea of what is actually done. anoko_moonlight (talk) 15:41, 21 August 2011 (UTC)
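For reference, the two approaches described in this thread (sort and scan, or hash and look up) can be sketched with standard containers roughly as follows. This is only an illustration, not the code that was removed from the article; the function names are hypothetical, and the hash-based version assumes the default std::hash<std::string> distributes the inputs reasonably well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Sort-based approach: after sorting, equal strings are adjacent.
// The vector is taken by value so the caller's data is left untouched.
std::vector<std::string> duplicates_by_sorting(std::vector<std::string> items) {
    std::sort(items.begin(), items.end());
    std::vector<std::string> dups;
    for (std::size_t i = 1; i < items.size(); ++i) {
        // Report each duplicated value once, even if it occurs many times.
        if (items[i] == items[i - 1] &&
            (dups.empty() || dups.back() != items[i])) {
            dups.push_back(items[i]);
        }
    }
    return dups;
}

// Hash-based approach: expected linear number of lookups.
std::unordered_set<std::string> duplicates_by_hashing(
        const std::vector<std::string>& items) {
    std::unordered_set<std::string> seen;
    std::unordered_set<std::string> dups;
    for (const std::string& s : items) {
        // insert() reports whether the value was new; if not, it is a duplicate.
        if (!seen.insert(s).second) {
            dups.insert(s);
        }
    }
    return dups;
}

// Hypothetical usage example.
void duplicates_example() {
    const std::vector<std::string> words{"b", "a", "b", "c", "a", "b"};
    const std::vector<std::string> sorted_dups = duplicates_by_sorting(words);
    assert(sorted_dups.size() == 2);
    assert(sorted_dups[0] == "a" && sorted_dups[1] == "b");
    const std::unordered_set<std::string> hashed_dups = duplicates_by_hashing(words);
    assert(hashed_dups.size() == 2);
    assert(hashed_dups.count("a") == 1 && hashed_dups.count("b") == 1);
}
```

Neither version needs an intern pool, which supports the point above that duplicate finding is not a strong motivating example for interning.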

Removal of content

I have been bold in updating this page, and removed a bit of content from it which I found inaccurate; notably,

One disadvantage of the string intern pool is that since every string object must be checked against the pool at creation, there is some performance penalty even if the intern pool is never used by the application. However, due to the possible benefits, this is generally considered acceptable in the languages in which it is implemented.

In neither Java nor the .NET Framework does the string interning occur automatically. It is up to the programmer to explicitly use the intern pool. String literals, however, are always interned anyway, so

String.Intern("Hello World!") // C#
"Hello World!".intern() // Java

don't make much sense. Louis 10:08, 28 January 2007 (UTC) (Edited to make C#'s invocation correct; it's a class method in the .NET Framework. Louis 10:12, 28 January 2007 (UTC))


this code sucks

At least three memory leaks are introduced in such a small code snippet; this sucks! 1. StringTable has no destructor, so the StringTable::array memory allocated in the constructor is never deleted. 2. The addString method allocates memory for array[i], which is never deleted either. 3. The String class allocates memory for the local text variable, which is not freed after use. —Preceding unsigned comment added by 80.250.171.67 (talk) 14:37, 27 July 2010 (UTC)

implicit (and optional) string interning for string literals in C and C++

Isn't it a kind of "string interning" for at least a subset of string objects in C and C++? --RokerHRO (talk) 21:37, 19 July 2011 (UTC)

Constant literal strings can probably be considered automatically interned in C and C++, since compilers will generally merge duplicates and the string pointer may then be used for comparison, but deduplication is implementation-dependent. However, it is of course not difficult to implement explicit string interning for C or C++. This can be done using a hash table or another specialized structure (creating or interning a new string needs to check whether the string already exists, but afterwards the pointer alone is enough for comparison). This is how Lisp symbols are interned in implementations written in C. For symbols, interning makes a lot of sense, as they may be compared very frequently to other symbols using EQ (basically comparing their pointers or unique integer identities internally) in the implementation of interpreters, compilers, and macros, and for dynamic function argument list processing. Most symbols are interned at read/load time rather than at runtime, although dynamic runtime symbol creation and interning is also possible... 76.10.128.192 (talk) 22:15, 16 June 2013 (UTC)
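As a rough illustration of the explicit approach described above, a hash-table-backed pool in C++ might look like the sketch below. The InternPool class and its member names are hypothetical, and this is not how any particular Lisp implementation does it. It relies on the standard guarantee that pointers to std::unordered_set elements remain valid when the table rehashes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical minimal intern pool; all names here are illustrative.
class InternPool {
public:
    // Returns a pointer to the pool's single copy of `text`, adding it if
    // absent. The lookup costs one hash plus a comparison; afterwards,
    // callers can compare the returned pointers instead of the contents.
    const std::string* intern(const std::string& text) {
        return &*pool_.insert(text).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical usage example.
void intern_pool_example() {
    InternPool pool;
    const std::string* a = pool.intern("hello");
    const std::string* b = pool.intern(std::string("hel") + "lo");
    const std::string* c = pool.intern("world");
    assert(a == b);            // equal contents yield the same pointer
    assert(a != c);            // different contents yield different pointers
    assert(pool.size() == 2);  // only two distinct strings are stored
}
```

The returned pointers are only meaningful while the pool object is alive, and strings are never removed in this sketch; a real implementation would have to decide how (or whether) to reclaim unused entries.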