Tuple vs string as a Dictionary key in C#

Question

I have a cache that I implement using a ConcurrentDictionary, The data that I need to keep depends on 5 parameters. So the Method to get it from the cache is: (I show only 3 parameters here for simplicity)

public CachedData GetCachedData(string param1, string param2, int param3);

I wonder what type of key will be better to use in my ConcurrentDictionary, I can do it like this:

var myCache = new ConcurrentDictionary<string, ChachedData>();
// check for key
bool exists = myCache.ContainsKey(string.Format("{0}_{1}_{2}", parm1, param2, param3);

Or like this:

var myCache = new ConcurrentDictionary<Tuple<string, string, int>, CachedData>();
// check for key
bool exists = myCache.ContainsKey(new Tuple(param1, param2, param3));

I don't use these parameters together any other place, so there is no justification to create a class just to keep them together.

I want to know which approach is a better in terms of performance and maintainability.

If you're talking about maintainability and readability, I'd say you'd still create a class for the parameters/key with a custom comparer. If you use a class, you only have to edit said class, and you won't have to edit two or more places in your code. That's all my opinion though. If I'd have to choose between your two options, I'd go for the tuple as key. It's worse performance-wise, but it's easier to understand/maintain. — RandomStranger, 22 hours ago
My company always states the following: "You'll write your code once, but your code will be read a thousand times." It's all up to you though, I just find that readability is important for code I write :) — RandomStranger, 22 hours ago
I find it odd that you would need 5 different keys for the same dictionary. — o_weisman, 22 hours ago
If you use string concatenation then you'll need to make very sure that you don't duplicate keys. If any of your strings contain _ then they can be immediately ambiguous. It may of course be that this can't happen with the format of your keys but it is something to be aware of and why I would be more inclined towards Tuples than strings. — Chris, 22 hours ago
My Moto is: "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." :) — Shahar, 22 hours ago

InBetween · Answer 1 · 2017-01-30 14:36:10Z

I want to know which approach is a better in terms of performance and maintainability.

As always, you have the tools to figure it out. Code both possible solutions and make them race. The one that wins is the winner, you don't need anyone here to answer this particular question.

About maintenance, the solution that autodocuments itself better and has better scalability should be the winner. In this case, the code is so trivial that autodocumentation isn't that much of an issue. From a scalability point of view, IMHO, the best solution is to use Tuple<T1, T2, ...>:

You get free equality semantics which you don't need to maintain.
Collisions are not possible, something that is not true if you choose the string concatenation solution:
```
var param1 = "Hey_I'm a weird string";
var param2 = "!"
var param3 = 1;
key = "Hey_I'm a weird string_!_1";

var param1 = "Hey";
var param2 = "I'm a weird string_!"
var param3 = 1;
key = "Hey_I'm a weird string_!_1";
```
Yeah, far fetched, but, in theory, entirely possible and your question is precisely about unknown events in the future, so...
And last, but not least, the compiler helps you maintain the code. If, for example, tomorrow you have to add param4 to your key, Tuple<T1, T2, T3, T4> will strongly type your key. Your string concatenation algorithm on the other hand can live on blissfully happy generating keys without param4 and you wont know whats happening until your client calls you up because their software is not working as expected.

+1 All good points, and a sensible discussion given the question, though I'd still be inclined to write my own type if anyone else ever had to use the code. — VisualMelon, 19 hours ago
@VisualMelon Thnks. Yes, I agree completely but there is already an upvoted question addressing that particular solution. I just focused on answering the exact question. I'd also probably implement a private nested class to handle the key even though it does add an additional maintenance cost. — InBetween, 19 hours ago
Aye, I think there is great value in both kinds of answers, certainly no slight intended on my part, your answer is very good as it stands. — VisualMelon, 19 hours ago
This works, but a bit slower than creating a type. Definitely better than a string key. stackoverflow.com/a/41938199/1176983 — Tomer, 17 hours ago

Tim Rutter · Answer 2 · 2017-01-30 20:50:30Z

You could create a class (doesn't matter that its only used here) that overrides GetHashCode and Equals:

Thanks theDmi (and others) for improvements...

public class Key : IEquatable<Key>
{
    public Key(string p1, string p2, int p3)
    {
        Param1 = p1;
        Param2 = p2;
        Param3 = p3;
    }

    public string Param1 {get;}
    public string Param2 {get;}
    public int Param3 {get;}

    public override int GetHashCode()
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = (int) 2166136261;

            hash = (hash * 16777619) ^ Param1?.GetHashCode() ?? 0;
            hash = (hash * 16777619) ^ Param2?.GetHashCode() ?? 0;
            hash = (hash * 16777619) ^ Param3.GetHashCode();
            return hash;
        }
    }

    public override bool Equals(object other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (other.GetType() != GetType()) return false;
        return Equals(other as Key);
    }

    public bool Equals(Key other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Param1,obj.Param1) && string.Equals(Param2, obj.Param2) && Param3 == obj.Param3;
    }
}

If you don't override those, ContainsKey just does a reference equals. That said, the Tuple class does have its own equality functions that would basically do the same as above. Using a bespoke class makes it clear that is what is intended to happen - and is therefore better for maintainability

Note: the class is immutable as dictionary keys need to be to avoid potential bugs with hashcodes changing after the object is added to the dictionary See here

GetHashCode taken from here

Although this answer is good, I'd recommend that Key implements IEquatable<Key> which autodocuments better the type's value semantics of the equalitiy check. — InBetween, 22 hours ago
I don't quite understand how having a redundant type is better for maintainability. Your example has nothing over the tuple. What do you need "full control over comparison" for? You conveniently skipped a quite important thing which is implementation of GetHashCode. It seems like explicitly reimplementing a stub of tuple, just for being verbose. Introducing unnecessary entities into architecture is not a good thing. This approach is fine, but only if the existing standard types would be not sufficient. — luk32, 20 hours ago
Yes, it illustrates why Tuple is a good idea at least from a productivity standpoint. Getting all these details right for your own separate class is a chore. :-) — Jeroen Mostert, 17 hours ago

theDmi · Answer 3 · 2017-01-30 12:35:37Z

Implement a custom key class and make sure it is suitable for such use cases, i.e. implement IEquatable and make the class immutable:

public class CacheKey : IEquatable<CacheKey>
{
    public CacheKey(string param1, string param2, int param3)
    {
        Param1 = param1;
        Param2 = param2;
        Param3 = param3;
    }

    public string Param1 { get; }

    public string Param2 { get; }

    public int Param3 { get; }

    public bool Equals(CacheKey other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Param1, other.Param1) && string.Equals(Param2, other.Param2) && Param3 == other.Param3;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != GetType()) return false;
        return Equals((CacheKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = Param1?.GetHashCode() ?? 0;
            hashCode = (hashCode * 397) ^ (Param2?.GetHashCode() ?? 0);
            hashCode = (hashCode * 397) ^ Param3;
            return hashCode;
        }
    }
}

This is a GetHashCode() implementation how Resharper generates it. It is a good general-purpose implementation. Adapt as required.

Alternatively, use something like Equ (I'm the creator of that library) that automatically generates Equals and GetHashCode implementations. This will make sure that these methods always include all members of the CacheKey class, so the code becomes much easier to maintain. Such an implementation would then simply look like this:

public class CacheKey : MemberwiseEquatable<CacheKey>
{
    public CacheKey(string param1, string param2, int param3)
    {
        Param1 = param1;
        Param2 = param2;
        Param3 = param3;
    }

    public string Param1 { get; }

    public string Param2 { get; }

    public int Param3 { get; }
}

Note: You should obviously use meaningful property names, otherwise introducing a custom class does not provide much benefit over using a Tuple.

If the object is imputable, and exclusivly used as dictioanary keys, shouldn't the hash code be computed at object initialization? — Taemyr, 20 hours ago
It could, but why? What you're suggesting is micro-optimization. You'll have to profile a realistic use case with both versions and see which one is faster if you really want to know the answer to this question. — theDmi, 20 hours ago
I think that the main benefit of writing your own class over using Tuple is that Item1 and Item2 don't mean anything, the same goes for Param1, and Param2. I would suggest including a note that if you are going to the effort of creating a dedicated class for this case, that you should go the full hog and clearly name each parameter, because all I see here are 2 strings and an int. Maybe you can work out what the in is, but how are you to know which string goes where? (P.S. +1 for immutable and viable GetHashCode implementation) — VisualMelon, 19 hours ago
@VisualMelon Thanks for the note, that is of course the idea, I'll clarify the answer. Since the names in the question are meaningless, I did the same here... — theDmi, 19 hours ago
What about IStructuralEquatable which is what I believe Tuple<> is? At least I believe this use case is as at least partially (perhaps wholly) why that interface exists — pinkfloydx33, 18 hours ago

Tomer · Answer 4 · 2017-01-30 14:32:29Z

I wanted to compare the Tuple versus Class versus "id_id_id" approaches described in the other comments. I used this simple code:

public class Key : IEquatable<Key>
{
    public string Param1 { get; set; }
    public string Param2 { get; set; }
    public int Param3 { get; set; }

    public bool Equals(Key other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Param1, other.Param1) && string.Equals(Param2, other.Param2) && Param3 == other.Param3;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Key) obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = (Param1 != null ? Param1.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (Param2 != null ? Param2.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ Param3;
            return hashCode;
        }
    }
}

static class Program
{

    static void TestClass()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var classDictionary = new Dictionary<Key, string>();

        for (var i = 0; i < 10000000; i++)
        {
            classDictionary.Add(new Key { Param1 = i.ToString(), Param2 = i.ToString(), Param3 = i }, i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = classDictionary[new Key { Param1 = i.ToString(), Param2 = i.ToString(), Param3 = i }];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void TestTuple()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var tupleDictionary = new Dictionary<Tuple<string, string, int>, string>();

        for (var i = 0; i < 10000000; i++)
        {
            tupleDictionary.Add(new Tuple<string, string, int>(i.ToString(), i.ToString(), i), i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = tupleDictionary[new Tuple<string, string, int>(i.ToString(), i.ToString(), i)];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void TestFlat()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var tupleDictionary = new Dictionary<string, string>();

        for (var i = 0; i < 10000000; i++)
        {
            tupleDictionary.Add($"{i}_{i}_{i}", i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = tupleDictionary[$"{i}_{i}_{i}"];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void Main()
    {
        TestClass();
        TestTuple();
        TestFlat();
    }
}

Results:

I ran each method 3 times in Release without debugging, each run commenting out the calls to the other methods. I took the average of the 3 runs, but there wasn't much variance anyway.

TestTuple:

initialization: 00:00:14.2512736
Retrieving: 00:00:08.1912167

TestClass:

initialization: 00:00:11.5091160
Retrieving: 00:00:05.5127963

TestFlat:

initialization: 00:00:16.3672901
Retrieving: 00:00:08.6512009

I was surprised to see that the class approach was faster than both the tuple approach and the string approach. In my opinion it's more readable and more future-safe, in the sense that more functionality can be added to the Key class (assuming it's not just a key, it represents something).

Great job comparing. It seems like the best solution is to use a class, but all the given parameters are parts of other classes in the reset of the code... — Shahar, 16 hours ago
Thanks. Even if the parameters are parts of other classes, I see no problem creating a new class. — Tomer, 16 hours ago
I think it's time to arrange all the classes and structs that are used in this code, and then I will create the new class for the key. It will take some time, since this isn't my code. — Shahar, 16 hours ago

LmTinyToon · Answer 5 · 2017-01-30 09:38:35Z

IMHO, I prefer to use in such cases some intermediate structure (in your case it will be Tuple). Such approach creates additional layer between parameters and end-target dictionary. Of course, it will be depend on purposes. Such way for example allows you to create not trivial transition of parameters (for example container may "distort" data).

svick · Answer 6 · 2017-01-30 16:18:22Z

If performance is really important, then the answer is that you shouldn't use either option, because both unnecessarily allocate an object on every access.

Instead, you should use a struct, either a custom one, or ValueTuple from the System.ValueTuple package:

var myCache = new ConcurrentDictionary<ValueTuple<string, string, int>, CachedData>();
bool exists = myCache.ContainsKey(ValueTuple.Create(param1, param2, param3));

C# 7.0 will also contain syntax sugar to make this code easier to write (but you don't need to wait for C# 7.0 to start using ValueTuple without the sugar):

var myCache = new ConcurrentDictionary<(string, string, int), CachedData>();
bool exists = myCache.ContainsKey((param1, param2, param3));

Dmytro Mukalov · Answer 7 · 2017-01-30 10:03:48Z

up vote -2 down vote

Having these parameters used as a complex key is good justification to have them combined into class as for my taste. The Tuple<...> class as is isn't suitable for a dictionary key purposes as its GetHashCode implementation always returns 0. So you'll have to create new class which is either derived from the Tuple or brand new container. From the performance perspective everything depends on the efficiency of GetHashCode and Equals methods implementation.

answered 21 hours ago

Dmytro Mukalov

172

Actually it returns normal hash code - sorry for misleading but implementation of combining the hash code values can be optimized by used cached value for example instead of recalculating them each time - that's the reason to think of custom class implementation. – Dmytro Mukalov 21 hours ago

If you know your answer is wrong, you should either edit it or delete it. I don't think writing a comment is sufficient. – svick 15 hours ago

add a comment |

asked	today
viewed	1573 times
active	today

current community

your communities

more stack exchange communities

Tuple vs string as a Dictionary key in C#

7 Answers 7

Results:

Your Answer

Not the answer you're looking for? Browse other questions tagged c# .net caching equality concurrentdictionary or ask your own question.

Visit Chat

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Tuple vs string as a Dictionary key in C#

7 Answers 7

Results:

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged c# .net caching equality concurrentdictionary or ask your own question.

Visit Chat

Linked

Related

Hot Network Questions