On a multi-core Mac running OS X 10.8, I ran a test to compare the speed of incrementing or decrementing a value with no locking, with atomics, and finally with a mutex lock.
The no-lock case is the baseline; here is how much slower the other two were:
Atomics: 3X slower on the same thread
Mutex: 7X slower on the same thread
Hmmm…
Obviously, going without a lock doesn’t work when synchronization is required. But when an atomic operation is enough, it is much cheaper than a mutex: in this test, atomics cost less than half as much (3X versus 7X the baseline). We used atomics a lot at Bibble instead of mutexes in performance-critical areas. Lockless synchronization using atomics is neat.
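If you want to try a comparison like this yourself, here is a minimal sketch in C++11. It is not the code I ran for the numbers above. The names (time_loop, Counters, Timings, run_benchmark) are made up for illustration, and it only measures the uncontended, single-thread case. Results will vary with compiler, optimization flags and hardware, so don't expect exactly 3X and 7X.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: run `step` `iterations` times and return elapsed nanoseconds.
template <typename Step>
std::int64_t time_loop(std::int64_t iterations, Step step) {
    auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        step();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// The three counters being compared (illustrative names).
struct Counters {
    // volatile so the compiler cannot collapse the baseline loop into one add
    volatile std::int64_t plain = 0;
    std::atomic<std::int64_t> atomic_count{0};
    std::int64_t guarded = 0;   // only touched while holding `mutex`
    std::mutex mutex;
};

struct Timings {
    std::int64_t plain_ns;
    std::int64_t atomic_ns;
    std::int64_t mutex_ns;
};

// Time the same number of increments for each strategy, all on the calling thread.
Timings run_benchmark(Counters& c, std::int64_t iterations) {
    Timings t;
    // Baseline: no synchronization at all.
    t.plain_ns = time_loop(iterations, [&] { c.plain = c.plain + 1; });
    // Atomic read-modify-write (default sequentially consistent ordering).
    t.atomic_ns = time_loop(iterations, [&] { c.atomic_count.fetch_add(1); });
    // Mutex: lock, increment, unlock on every iteration.
    t.mutex_ns = time_loop(iterations, [&] {
        std::lock_guard<std::mutex> lock(c.mutex);
        ++c.guarded;
    });
    return t;
}
```

Call run_benchmark with a Counters instance and a large iteration count (a few million, say), then divide atomic_ns and mutex_ns by plain_ns to get the slowdown relative to the baseline. Keep in mind that with several threads hammering the same counter, both atomics and mutexes will behave differently than in this single-thread measurement.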