
I'm using Cygwin GCC and run this code:

#include <iostream>
#include <thread>
#include <vector>
using namespace std;

unsigned u = 0;

void foo()
{
    u++;
}

int main()
{
    vector<thread> threads;
    for(int i = 0; i < 1000; i++) {
        threads.push_back (thread (foo));
    }
    for (auto& t : threads) t.join();

    cout << u << endl;
    return 0;
}

Compiled with the line: g++ -Wall -fexceptions -g -std=c++14 -c main.cpp -o main.o.

It prints 1000, which is correct. However, I expected a smaller number, with threads overwriting a previously incremented value. Why does this code not suffer from a race condition?

My test machine has 4 cores, and I put no restrictions on the program that I know of.

The behavior persists when I replace the body of the shared foo with something more complex, e.g.

if (u % 3 == 0) {
    u += 4;
} else {
    u -= 1;
}
Intel CPUs have some amazing internal "shoot down" logic to preserve compatibility with very early x86 CPUs used in SMP systems (like dual Pentium Pro machines). Lots of failure conditions that we are taught are possible almost never actually happen on x86 machines. So say a core goes to write u back to memory. The CPU will actually do amazing things like notice that the memory line for u isn't in the CPU's cache and it will restart the increment operation. This is why going from x86 to other architectures can be an eye opening experience! – David Schwartz 2 days ago
Why does this code NOT suffer from race condition? -- No code suffers from a race condition, until it suffers from a race condition. – PaulMcKenzie 2 days ago
Testing for undefined behavior is never conclusive, and the absence of its guaranteed manifestation is why it is called undefined. A race condition is undefined behavior, and if it manifested itself in a clear and observable way all the time, it wouldn't be called undefined behavior, it would be called something else. – SergeyA 2 days ago
Absence of evidence is not evidence of absence. – Roger Rowland 2 days ago
Just as with undefined behavior, "has a race condition" does not mean "something bad will happen". Not until you demo your program for your most important client... – Pete Becker yesterday

foo() is so short that each thread probably finishes before the next one even gets spawned. If you add a sleep for a random time in foo() before the u++, you may start seeing what you expect.

This indeed changed the output in the expected way. – mafu 2 days ago
I would note that this is in general a rather good strategy for exhibiting race conditions. You should be able to inject a pause between any two operations; if not, there's a race condition. – Matthieu M. yesterday
We had just this issue with C# recently. Code almost never failed usually, but the recent addition of an API call in between introduced enough delay to make it consistently change. – Obsidian Phoenix 17 hours ago
@MatthieuM. Doesn't Microsoft have an automated tool that does exactly that, as a method of both detecting race conditions and making them reliably reproducible? – Mason Wheeler 12 hours ago
@MasonWheeler: I work nigh exclusively on Linux, so... dunno :( – Matthieu M. 12 hours ago

It is important to understand that a race condition does not guarantee the code will run incorrectly, merely that it could do anything, since it is undefined behavior. That includes running as expected.

Particularly on x86 and AMD64 machines, race conditions in some cases rarely cause issues, as many of the instructions are atomic and the coherency guarantees are very high. These guarantees are somewhat reduced on multiprocessor systems, where the lock prefix is needed for many instructions to be atomic.

If, on your machine, the increment is an atomic operation, this will likely run correctly, even though according to the language standard it is undefined behavior.

Specifically, I expect that in this case the code may be compiled to an atomic fetch-and-add instruction (ADD or XADD in x86 assembly), which is indeed atomic on single-processor systems; however, on multiprocessor systems this is not guaranteed to be atomic, and a lock prefix would be required to make it so. If you are running on a multiprocessor system, there will be a window in which threads could interfere and produce incorrect results.

Specifically I compiled your code to assembly using https://godbolt.org/ and foo() compiles to:

foo():
        add     DWORD PTR u[rip], 1
        ret

This means it solely performs an add instruction, which for a single processor will be atomic (though, as mentioned above, not so for a multiprocessor system).

It's important to remember that "running as intended" is a permissible outcome of undefined behavior. – Mark 2 days ago
As you indicated, this instruction is not atomic on an SMP machine (which all modern systems are). Even inc [u] is not atomic. The LOCK prefix is required to make an instruction truly atomic. The OP is simply getting lucky. Recall that even though you're telling the CPU "add 1 to the word at this address", the CPU still has to fetch, increment, store that value and another CPU can do the same thing simultaneously, causing the result to be incorrect. – Jonathon Reinhart 2 days ago
I down-voted, but then I re-read your question and realized that your atomicity statements were assuming a single CPU. If you edit your question to make this more clear (when you say "atomic", be clear that this is only the case on a single CPU), then I will be able to remove my down-vote. – Jonathon Reinhart 2 days ago
Downvoted, I find this claim a bit meh: "Particularly on X86 and AMD64 machines race conditions in some cases rarely cause issues as many of the instructions are atomic and the coherency guarantees are very high." The paragraph should start by making the explicit assumption that you're focusing on a single core. Even so, multi-core architectures are the de facto standard nowadays in consumer devices, so I would consider this a corner case to explain last, rather than first. – Patrick Trentin 2 days ago
Hmm, no. This is wrong. add DWORD PTR u[rip], 1 or any instruction with a memory operand is not atomic on x86. add rax, 1 (or any instruction with a register operand) would be atomic, but when you have a memory operand, the instruction is decomposed into ~3 micro-ops (a load, an add, and a store in this case), which means it is not atomic. I guess you kind of try and weasel out of this by saying it is only atomic on a single-CPU system, but that doesn't make much sense. It's still not atomic, by definition, and with multiple threads, you're still hosed without a lock prefix. – Cody Gray 2 days ago

I think it is not so much a question of whether you put a sleep before or after the u++. Rather, the operation u++ translates to code that is, compared to the overhead of spawning threads that call foo, performed very quickly, so it is unlikely to be interleaved. However, if you "prolong" the operation u++, the race condition becomes much more likely:

void foo()
{
    unsigned i = u;
    for (int s=0;s<10000;s++);
    u = i+1;
}

result: 694

BTW: I also tried

if (u % 2) {
    u += 2;
} else {
    u -= 1;
}

and it gave me 1997 most of the time, but sometimes 1995.

I would expect on any vaguely sane compiler that whole function would be optimized to the same thing. I am surprised it wasn't. Thank you for the interesting result. – Vality 2 days ago
This is exactly correct. Many thousands of instructions need to run before the next thread starts executing the tiny function in question. When you make the execution time in the function closer to the thread creation overhead, you see the impact of the race condition. – Jonathon Reinhart 2 days ago
@Vality: I also expected it to delete the spurious for-loop under O3 optimization. It doesn't? – user21820 yesterday
How could else u -= 1 ever be executed? Even in a parallel environment the value should never fail the % 2 test, should it? – mafu yesterday
from the output, it looks like else u -= 1 is executed once, the first time foo() is called, when u == 0. The remaining 999 times u is odd and u += 2 is executed resulting in u = -1 + 999 * 2 = 1997; i.e. the correct output. A race condition sometimes causes one of the +=2 to be overwritten by a parallel thread and you get 1995. – Luke yesterday

It does suffer from a race condition. Put usleep(1000); before u++; in foo and I see different output (< 1000) each time.

  1. The likely answer to why the race condition didn't manifest for you, though it does exist, is that foo() is so fast, compared to the time it takes to start a thread, that each thread finishes before the next can even start. But...

  2. Even with your original version, the result varies by system: I tried it your way on a (quad-core) Macbook, and in ten runs, I got 1000 three times, 999 six times, and 998 once. So the race is somewhat rare, but clearly present.

  3. You compiled with '-g', which has a way of making bugs disappear. I recompiled your code, still unchanged but without the '-g', and the race became much more pronounced: I got 1000 once, 999 three times, 998 twice, 997 twice, 996 once, and 992 once.

  4. Regarding the suggestion of adding a sleep -- that helps, but (a) a fixed sleep time leaves the threads still skewed by start time (subject to timer resolution), and (b) a random sleep spreads them out when what we want is to pull them closer together. Instead, I'd code them to wait for a start signal, so I can create them all before letting them get to work. With this version (with or without '-g'), I get results all over the place, as low as 974, and no higher than 998:

    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>
    using namespace std;
    
    unsigned u = 0;
    atomic<bool> start{false};  // atomic, so the spin-wait on start is not itself a data race
    
    void foo()
    {
        while (!start) {
            std::this_thread::yield();
        }
        u++;
    }
    
    int main()
    {
        vector<thread> threads;
        for(int i = 0; i < 1000; i++) {
            threads.push_back (thread (foo));
        }
        start = true;
        for (auto& t : threads) t.join();
    
        cout << u << endl;
        return 0;
    }
    
Just a note. The -g flag does not in any way "make bugs disappear." The -g flag on both GNU and Clang compilers simply adds debug symbols to the compiled binary. This allows you to run diagnostic tools like GDB and Memcheck on your programs with some human readable output. For example when Memcheck is run over a program with a memory leak it will not tell you the line number unless the program was built using the -g flag. – MS-DDOS 6 hours ago
