Deterministic Fault Tolerant Load Balancing
The USA has an unfortunate penchant for granting patents that arguably do not merit patent protection. Some of these are things that are blindingly obvious. Others are just not sufficiently inventive.
Anyway, since I have no funds for patent searches, nor patent applications, and there are some other complications, I've decided to post this to establish prior-art for an algorithm. I don't claim that the algorithm is clever, nor novel, nor even that it violates no existing patents. This posting is simply to ensure that to the extent that someone might be granted a patent on it, they can't, because it has already been published.
The Java connection is that I've done a fair amount of the work required to turn this into a real system in Java.
Suppose you have set of processors,p0 thrupn-1, and each piece of work to be performed by a processor has some numberk associated with it. The problem is to allocate the work roughly equally across the subset of processors that are actually functioning. Further, over a period of time, a series of related pieces of work may arrive with the samek. To the maximum possible extent you want each of the related pieces of work to be handled by the same processor. If a processor fails, you want its work to be distributed across the remaining processors, but still maintaining the property that pieces of work with a given value fork are handled by the same processor. In general we assume that thek values are randomly spread through a large number space.
The motivation for these requirements is that for a givenk the processor may be caching information that improves performance. Or it may be enforcing some invariant, such as in a lock manager where each request for a given lock must go to the same processor, or it clearly won't function.
To achieve this, construct a list of integers of sizen. Elementi containsi if processori is functional, and -1 otherwise.
Calculatek modn, and use the result as an index into the list. If the value contained there is non-negative, then it is the number of the processor to use. If it is -1, remove the element from the list, decrement the value ofn and repeat. Continue until a processor number is found.
This scheme is fault tolerant to a degree, in that the resulting system has a high level of availability.
It also has the property that the failure of a processor only impacts on the allocation of pieces of work that would have been allocated to the failed processor. It does not result in a complete rearrangement of the work allocations. This makes things a lot simpler when dealing with things like distributed lock managers.
The fault tolerance can be improved by an extension of the algorithm that allows a distributed master/slave arrangement, where the master number for a givenk is determined as above, and a slave number is obtained by treating the master as if it were not functioning. Each processor is a master for some subset of thek values, and is a slave for another subset. For any given master, each of the other processors is a slave for a roughly equal portion of the given master's subset of thek values.
There are some boring details that I've not discussed, such as how an entity wanting work to be done determines which processors are functioning, and the stuff related to the exact sequence of steps that must be performed when a processor breaks, or is repaired. I don't believe anyone could patent them because once you start thinking about it, the steps are pretty obvious.

