Herlihy, Luchangco and Moir's 2003 paper, "Obstruction-Free Synchronization: Double-Ended Queues as an Example", pretty much revolutionized the field and is mandatory reading. Techniques like speculative lock elision (SLE) can eliminate much of the cost of uncontended locks, and threading implementations like NPTL handle uncontended mutexes entirely in userspace.
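- lock-free - guaranteed system-wide progress (at any time, some thread completes its operation in a finite number of steps)
- wait-free - guaranteed per-thread progress (every thread completes its operation in a bounded number of its own steps)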
- Fich, Hendler, and Shavit's 2004 "On the Inherent Weakness of Conditional Synchronization Primitives" shows that CAS and LL/SC cannot provide starvation-free implementations of many common data structures without O(N) space on N threads.
- LWN's 2008-09-30 and 2009-07-08 articles on lockless ring buffers in the Linux kernel
- Bencina, "Some Notes on Lock-Free Algorithms"
- "Practical Lock-Free Algorithms" at Cambridge's Computer Laboratory's Systems Research Group
- Section 5.2, "Synchronization Primitives", in Understanding the Linux Kernel, Third Edition
- "Memory Ordering in Modern Microprocessors" by Paul McKenney
- High Performance Parallel Computing page
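
The difference between the lock-free and wait-free guarantees listed above can be sketched with C11 atomics. This is a minimal illustration, not taken from any of the works cited here; the function names `counter_inc` and `counter_raise_to` are hypothetical. It assumes a target with a native atomic add (on x86, for example, `atomic_fetch_add` typically compiles to a single `LOCK XADD`); on LL/SC architectures the same call may be implemented as a retry loop and is then only lock-free.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: increment and return the new value.
 * Wait-free ONLY under the assumption that the hardware provides a native
 * atomic add: each caller then finishes in a bounded number of steps, no
 * matter what other threads are doing. */
long counter_inc(atomic_long *c)
{
    assert(c);
    return atomic_fetch_add(c, 1) + 1;
}

/* Hypothetical helper: raise *c to at least `candidate` and return the
 * value the counter held once that was true.
 * Lock-free but not wait-free: a failed CAS means some other thread changed
 * *c, so the system as a whole made progress, but this particular thread
 * may be forced to retry an unbounded number of times. */
long counter_raise_to(atomic_long *c, long candidate)
{
    assert(c);
    long seen = atomic_load(c);
    while (seen < candidate) {
        /* On failure, `seen` is reloaded with the current value of *c. */
        if (atomic_compare_exchange_weak(c, &seen, candidate))
            return candidate;
    }
    return seen;
}
```

Neither function takes a lock, so a thread preempted in the middle of either one cannot block the others. The CAS loop is the pattern most lock-free structures are built from, and it is exactly the kind of conditional primitive whose starvation behavior the Fich, Hendler, and Shavit result concerns.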