Ultimately, a generic "contains" operation comes down to a function like this,
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Determines whether the source contains the sequence.
/// </summary>
/// <typeparam name="T">The type of the items in the sequences.</typeparam>
/// <param name="sourceEnumerator">The source enumerator.</param>
/// <param name="sequenceEnumerator">The sequence enumerator.</param>
/// <param name="equalityComparer">An equality comparer.</param>
/// <remarks>
/// An empty sequence will return <c>true</c>.
/// Both enumerators are read forward only, so neither needs to
/// support Reset.
/// </remarks>
/// <returns>
/// <c>true</c> if the source contains the sequence;
/// otherwise <c>false</c>.
/// </returns>
public static bool Contains<T>(
    IEnumerator<T> sourceEnumerator,
    IEnumerator<T> sequenceEnumerator,
    IEqualityComparer<T> equalityComparer)
{
    if (equalityComparer == null)
    {
        equalityComparer = EqualityComparer<T>.Default;
    }

    // Buffer the sequence once so it can be compared at every source offset.
    var sequence = new List<T>();
    while (sequenceEnumerator.MoveNext())
    {
        sequence.Add(sequenceEnumerator.Current);
    }

    if (sequence.Count == 0)
    {
        return true;
    }

    // A sliding window over the source handles matches that begin
    // inside a failed partial match, e.g. "ab" in "aab".
    var window = new Queue<T>(sequence.Count);
    while (sourceEnumerator.MoveNext())
    {
        if (window.Count == sequence.Count)
        {
            window.Dequeue();
        }

        window.Enqueue(sourceEnumerator.Current);
        if (window.Count == sequence.Count &&
            window.SequenceEqual(sequence, equalityComparer))
        {
            return true;
        }
    }

    return false;
}
This can be trivially wrapped in an extension method accepting IEnumerable<T>, like this,
public static bool Contains<T>(
    this IEnumerable<T> source,
    IEnumerable<T> sequence,
    IEqualityComparer<T> equalityComparer = null)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }

    if (sequence == null)
    {
        throw new ArgumentNullException("sequence");
    }

    using (var sequenceEnumerator = sequence.GetEnumerator())
    using (var sourceEnumerator = source.GetEnumerator())
    {
        return Contains(
            sourceEnumerator,
            sequenceEnumerator,
            equalityComparer);
    }
}
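As a rough illustration of the behaviour such an extension should have on non-string sequences, here is a small self-contained sketch. `ContainsSequence` and `SequenceContainsDemo` are hypothetical names, not part of the code above: the method simply brute-forces every start offset, which makes it a slow but easy-to-verify reference to test a faster implementation against.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceContainsDemo
{
    // Hypothetical reference implementation: tries every start offset.
    // Quadratic in the worst case, but simple enough to check by eye.
    public static bool ContainsSequence<T>(
        this IEnumerable<T> source,
        IEnumerable<T> sequence,
        IEqualityComparer<T> comparer = null)
    {
        var items = source.ToList();
        var wanted = sequence.ToList();
        comparer = comparer ?? EqualityComparer<T>.Default;

        // Number of offsets at which the whole sequence still fits.
        var offsets = Math.Max(0, items.Count - wanted.Count + 1);

        return Enumerable.Range(0, offsets).Any(start =>
            items.Skip(start).Take(wanted.Count).SequenceEqual(wanted, comparer));
    }

    public static void Main()
    {
        // Works for any element type, not only char.
        Console.WriteLine(new[] { 1, 2, 3, 4 }.ContainsSequence(new[] { 2, 3 })); // True
        Console.WriteLine(new[] { 1, 2, 3 }.ContainsSequence(new[] { 3, 1 }));    // False

        // A match that starts inside a failed partial match.
        Console.WriteLine("aab".ContainsSequence("ab"));                          // True

        // An empty sequence is contained in anything.
        Console.WriteLine(new int[0].ContainsSequence(new int[0]));               // True
    }
}
```

The "aab"/"ab" case is worth keeping in any test suite for a sequence search, since it only passes when the search can restart from inside a partial match.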
Now, this will work for the ordinal comparison of any sequences, including strings, since string implements IEnumerable<char>,
// Passing the optional comparer explicitly selects the generic extension
// rather than the built-in string.Contains() implementation.
"testable".Contains("est", EqualityComparer<char>.Default)
However, as we know, strings are not generic; they are specialized. There are two key factors at play.
- The "casing" issue, which itself has various language-dependent edge cases.
- The rather involved issue of how a set of "text elements" (letters, numbers, symbols, etc.) is represented by Unicode code points, and which sequences of chars can validly represent a given string. Details are expanded in these answers.
The net effect is the same. Strings that you might assert are linguistically equal can be validly represented by different combinations of chars. What's more, the rules for validity change between cultures.
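Both factors are easy to observe with the framework's own types. The following is a minimal sketch (the class name `TextEquivalenceDemo` is made up for illustration); the culture-sensitive results assume full globalization data is available and may differ when the runtime is in invariant globalization mode, so no output is claimed for them.

```csharp
using System;
using System.Globalization;

public static class TextEquivalenceDemo
{
    public static void Main()
    {
        // "é" as one precomposed code point, and as "e" plus a combining acute accent.
        string composed = "\u00E9";
        string decomposed = "e\u0301";

        // Ordinal comparison looks at the raw chars, so these two differ.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False

        // Each string is a single text element, although the char counts differ.
        Console.WriteLine(composed.Length);                                 // 1
        Console.WriteLine(decomposed.Length);                               // 2
        Console.WriteLine(new StringInfo(composed).LengthInTextElements);   // 1
        Console.WriteLine(new StringInfo(decomposed).LengthInTextElements); // 1

        // A culture-sensitive comparison is expected to treat them as the same text.
        Console.WriteLine(
            string.Compare(composed, decomposed, CultureInfo.InvariantCulture, CompareOptions.None) == 0);

        // Casing depends on the culture: Turkish distinguishes dotted and dotless "i",
        // so upper-casing "i" is not expected to produce "I" here.
        var turkish = new CultureInfo("tr-TR");
        Console.WriteLine("i".ToUpper(turkish) == "I");
    }
}
```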
All this leads to a specialized, string-based "Contains" implementation like this.
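using System;
using System.Collections.Generic;
using System.Globalization;

public static bool Contains(
    this string source,
    string value,
    CultureInfo culture = null,
    CompareOptions options = CompareOptions.None)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }

    if (value == null)
    {
        throw new ArgumentNullException("value");
    }

    var compareInfo = culture == null ?
        CultureInfo.CurrentCulture.CompareInfo :
        culture.CompareInfo;

    // Split both strings into text elements so that the comparison
    // happens per text element rather than per char.
    var sourceElements = GetTextElements(source);
    var valueElements = GetTextElements(value);

    // Try every start offset in the source; an empty value matches at offset 0.
    for (var start = 0; start + valueElements.Count <= sourceElements.Count; start++)
    {
        var matched = 0;
        while (matched < valueElements.Count &&
            compareInfo.Compare(
                sourceElements[start + matched],
                valueElements[matched],
                options) == 0)
        {
            matched++;
        }

        if (matched == valueElements.Count)
        {
            return true;
        }
    }

    return false;
}

// TextElementEnumerator.Current is typed as object, so GetTextElement()
// is used to read each element as a string.
private static List<string> GetTextElements(string text)
{
    var elements = new List<string>();
    var enumerator = StringInfo.GetTextElementEnumerator(text);
    while (enumerator.MoveNext())
    {
        elements.Add(enumerator.GetTextElement());
    }

    return elements;
}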
This function can be used to perform a case-insensitive, culture-specific "contains" that will work whatever the normalization of the strings, e.g.
"testable".Contains("EST", CultureInfo.CurrentCulture, CompareOptions.IgnoreCase)