Search This Blog

Saturday, May 28, 2005

Fast Semaphores - Implementation

The explaination I gave previously about how semaphores work should give you a pretty good idea how we're going to do this. At the heart of this thing is the semaphore count; this counter will be accessed with our atomic functions, so we are thread safe (and fast) accessing it. When this count is positive, it represents the number of threads that can enter the semaphore without waiting. When it's negative, it represents the negative of the number of threads currently waiting. New threads trying to enter the semaphore will have to wait if it's nonpositive (0 or less). Each thread that tries to enter the semaphore (and is willing to wait for it) decrements the counter, to register the fact that they are in line (or in the semaphore; either way).

Threads that need to wait will do so on a slow semaphore, as we can't do without kernel support. This slow semaphore will always be at a count of 0 or less, as threads will only try to grab it when they have to wait; when threads need to be released, this semaphore will be posted, but never increased above 0.

As we did with the slow semaphore, in addition to supporting a method that waits indefinitely for the semaphore to become available, we'll also support a try-wait method, which fails immediately if the semaphore cannot be obtained (the reason the slow semaphore did this was that POSIX semaphores, unlike Windows semaphores, don't support timed waits).

Finally, our fast semaphore supports spinning, in addition to waiting. When you call the constructor, you tell it the number of times it should try to spin before waiting on the slow semaphore - this number should be 0 for single-CPU systems, and something above that for multi-CPU systems (the exact number for a particular application is probably best found by trial and error). When the Wait function is called, the fast semaphore will spin as many times as it can; if it can't get the semaphore while spinning, it will wait on the slow semaphore. The try-wait function does not spin, it simply returns if the semaphore cannot be grabbed.

I think that's all the important explanation, so here's the code:

class CFastSemaphore
{
private:
CSemaphore m_sem; // The slow semaphore that threads will wait on

unsigned int m_nSpinCount; // Constant. The number of times to attempt to grab the semaphore before waiting

long m_nSemCount; // Atomically accessed. Current count semaphore is at. Positive numbers indicate the number of threads can can still claim the semaphore; negative numbers indicate the number of threads waiting on the semaphore.

public:
inline CFastSemaphore(long nInitialCount, unsigned int nSpinCount = 0) : m_sem(0)
{
assert(nInitialCount >= 0);

m_nSpinCount = nSpinCount;

m_nSemCount = nInitialCount;
}

inline ~CFastSemaphore()
{
}

inline bool TryWait()
{
long nSemCount;

do
{
nSemCount = m_nSemCount;

// If there are no free slots available, fail
if (nSemCount <= 0)
return false;
// Try to claim a slot
} while (AtomicCompareExchange(&m_nSemCount, nSemCount - 1, nSemCount) != nSemCount);

// Got it
return true;
}

void Wait()
{
// Spin first, in case we can get it without waiting
unsigned int nSpinCount = m_nSpinCount;
while (nSpinCount-- > 0)
{
if (TryWait())
return;
}

long nSemCount = AtomicDecrement(&m_nSemCount);

if (nSemCount >= 0)
return; // We got it; no need to wait

// We have to wait
m_sem.Wait();
}

inline Post(long nCount = 1)
{
assert(nCount > 0);

long nSemCount = AtomicExchangeAdd(&m_nSemCount, nCount);

// Were there any waiters?
if (nSemCount < 0)
m_sem.Post(min(nCount, -nSemCount)); // Release as many as possible
}
};

No comments: