Divide a number into random parts
I need to divide a number into a collection of random parts. So that the sum of all parts is equal to the number.
E.g.:
public void divideRandomly ( int number, int parts)
{
int[] returnArray = new int[parts];
//do random division calculations
return returnArray;
}
Lets say the number = 20
And parts = 4
I would then like to return an array of ints containing e.g. 3, 11, 5, 1
OR e.g. 4,8,2,6
Does anyone have a clue as to how I could do this?
Cheers
[537 byte] By [
jattena] at [2007-10-1 22:38:35]

First you need to decide what you mean by "random parts". For example, does this mean that over the long term you would select each partition of M into N parts the same number of times? Does the order of the parts matter? Can the parts be zero? I am sure there are other questions that could be asked.
Based on your example, I think something like this would work.M = numberN = partsloop i =1 to N - 1 {R = randomInt between 1 and (M - (N - i)) (inclusive)M = M - Rstore R}store Mreturn all stored values
I think it will do, although, the first parts saved will have a tendency to be larger than the last parts. However, if i randomize the positions at the end, I guess your solution will work just fine. Cheers!
> I think it will do, although, the first parts saved
> will have a tendency to be larger than the last
> parts. However, if i randomize the positions at the
> end, I guess your solution will work just fine.
Actually the distribution is off:
5 and 3 parts:
comb odds
1,1,3 1/3*1/3*1 = 1/9
1,2,2 1/3*1/3*1 = 1/9
1,3,1 1/3*1/3*1 = 1/9
2,1,2 1/3*1/2*1 = 1/6
2,2,1 1/3*1/2*1 = 1/6
3,1,1 1/3*1*1= 1/3
So the chance of getting a combination including a 3 is 5/9 when it's actually in only 1/2 of the possible combinations.
To ensure an even distribution, you could create a list of the possible combinations and pick one at random.
> Can the parts be zero? I am sure there> are other questions that could be asked.Can the parts be negative? Can the parts be fractional? Can the parts be imaginary?
jverda at 2007-7-13 14:40:51 >

The problem of splitting the numbers reminds me of this problem:
http://forums.java.sun.com/thread.jspa?threadID=639252&messageID=3822228#3822228
So if the number of parts is not too large, then you could count the number of possible splits, generate a random number between [0, number of splits) and then count up to that number.
I'm not sure if there is a way to calculate (without counting) the number of possible splits.
Given that all numbers come from the same distribution
following solution works. The idea behind it is similar
to what dubwai suggested, but it does not create the
list. This might be important because the list of
possible combinations can be large.
public static int[] getRandomSumList(int sum, int length, int maxValue) {
int [] list = new int[length];
boolean doneFirstPass = false;
int tempSum = 0;
int i = 0;
while (!doneFirstPass || tempSum!=sum) {
tempSum = tempSum - list[i];
// OBS, any distribution can be used
list[i] = (int)(Math.random()*(maxValue+1));
tempSum = tempSum + list[i];
i++;
if (i==length) {
i = 0;
doneFirstPass = true;
}
}
return list;
}
Be aware that given the wrong parameters this solution
can take a long time to find a list.
parza at 2007-7-13 14:40:51 >

> Be aware that given the wrong parameters this solution
> can take a long time to find a list.
This seems misleading, as the termination condition is based
on randomness. Although, it is possible to predict an average
termination time.
Basically your algorithm starts by generating initial random values
in the range [0, maxValue]. You limit the size of the maximum
generated value, although I'm not sure why. Obviously maxValue
has to be greater than sum / length, otherwise it would be
impossible to generate any combination values that add up to sum.
Then your algorithm asks if the sum of those initial random values
is equal to the target sum. If not, you cycle through the list replacing
the the current position with a new random value and updating the
current sum.
As an example, let's suppose that maxValue is equal to sum, and
that sum = 1000, and length (numParts) is 4.
Assuming the minValue is 1, not 0, then the total number of
permutations of 4 numbers that add up to 1000 is:
165,668,499
However, the total number of permutations is:
1000^4 = 1,000,000,000,000
In terms of probability, the chance of finding a solution on each try is:
165,668,499 / 1,000,000,000,000 = 0.000165668499
Thus, you'll need approximately 6036 loops on average to find a solution for 1,000 split 4 ways.
For 2,000 split 4 ways, number of perms is:
2000^4 = 16,000,000,000,000
The perms of 4 numbers that add up to 2000 is:
1,329,336,999
Probability of finding a solution on each try is:
0.0000830835624375
Average number of tries is: 12036
I'm impressed that the random approach is that fast.
> This seems misleading, as the termination condition is based
> on randomness. Although, it is possible to predict an average
> termination time.
Maybe not misleading, but far from conclusive. In your examples,
I agree that it is surprisingly fast. However, running
getRandomSumList(1, 100, 1); demand on average
2^100 = 1.2*10^30 number of loops.
> Obviously maxValue has to be greater than sum / length, otherwise
> it would be impossible to generate any combination values that
> add up to sum.
Yes, looping forever takes a very long time. :)
> You limit the size of the maximum generated value,
> although I'm not sure why.
The 'maxValue' parameter is just part of defining the
distribution. The most general way would be to define
an interface
public interface DiscreteDistribution {
public int getInstance();
}
and then use this in the function
public static int[] getRandomSumList(int sum, int length, DiscreteDistribution dist) {
...
list[i] = dist.getInstance();
...
}
> Assuming the minValue is 1, not 0, then the total number of
> permutations of 4 numbers that add up to 1000 is:
> 165,668,499
How did you compute this?
parza at 2007-7-13 14:40:51 >

This solution is always fast. I do not know from which distribution
the numbers are, but they are all from the same one.
Let L be the length of the list and S be the sum. Then L-1 splitters
divides the sum S and the distances between these splitters are
the numbers.
public static int[] getRandomSumList(int sum, int length) {
int [] splitters = new int[length-1];
for (int i = 0;i<length-1;i++) {
splitters[i] = (int)(Math.random()*(sum+1));
}
java.util.Arrays.sort(splitters);
int [] list = new int [length];
list[0] = splitters[0];
for (int i = 0;i<length-2;i++) {
list[i+1] = splitters[i+1] - splitters[i];
}
list[length-1] = sum - splitters[length-2];
return list;
}
>
parza at 2007-7-13 14:40:51 >

> Maybe not misleading, but far from conclusive. In your examples,
> I agree that it is surprisingly fast. However, running
> getRandomSumList(1, 100, 1); demand on average
> 2^100 = 1.2*10^30 number of loops.
There are 100 ways to create a sum of 1:
[1, 0, 0, .... ]
[0, 1, 0, .... ]
[0, 0, 1, .... ]
However, there are 2^100 combinations.
Thus, the probability is 100 / 2^100. The average number
of loops would be: 12676506002282294014967032053.76
or 1.26765*10^28
> How did you compute this?
In the link I posted earlier. I modified the code
to count the number of times the display method is called.
For the examples I gave, it took much longer to count than to
use the random search.
However, in the example you gave, the output size is only
100. Thus, counting would be extremely fast since the algorithm
runs in O(n) where n is the size of the output.
The random approach works well when the number of ways to
create the sum is large. The counting approach would work well
when the number of ways to create the sum is small.
I don't know if there's a mathematically fast way to determine the
output size or if it's possible generate a permutation based on an
index value. I'll try to figure it out if I get a chance.
> The random approach works well when the number of ways to
> create the sum is large. The counting approach would work well
> when the number of ways to create the sum is small.
I agree, but the approach I gave above works in both cases.
However, if you run the last method above many times and
take the mean of the values, the mean of the first and last
value is lower. I think this has something to do with the
rounding because if I remove the 1 in (sum+1) and replace
all lists with double values it works fine, but I can not
get it to work with integers.
parza at 2007-7-13 14:40:51 >

> I don't know if there's a mathematically fast way to determine
> the output size
Actually, it's quite easy. If sum = x and numParts = y then
the number of permutations is: nCr(x - 1, y - 1)
(This applies when the minimum value of each part is 1, not 0).
So in my first example, x = 1000, y = 3, then
nCr(999, 3) = 999! / (3! * (999 - 3)!) = 165,668,499
> the approach I gave above works
I'm still looking at it...
If I replace the splitters line with
splitters[i] = (int)(Math.random()*(sum)+0.5);
it becomes a lot better
0.430235, 0.560931, 0.513133, 0.496007, 0.499942,
0.499075, 0.4965,0.513215, 0.561102, 0.42986
but not completely even. Do you know why this is or
how to fix it?
parza at 2007-7-13 14:40:51 >

> but not completely even. Do you know why this is or
> how to fix it?
I think being completely even requires an infinite amount
of computation.
> If I replace the splitters line with
I thought it was ok without the replacement. I'm using this
to test it and it's coming out pretty even:
public static void main(String[] args) {
int[] sum = new int[4];
int[] min = new int[4];
int[] max = new int[4];
int n = 10000000;
for (int i = 0; i < 4; i++)
min[i] = Integer.MAX_VALUE;
int goalSum = 100;
for (int i = 0; i < n; i++) {
int[] list = getRandomSumList(goalSum, sum.length);
for (int j = 0; j < list.length; j++) {
sum[j] += list[j];
if (list[j] < min[j]) min[j] = list[j];
if (list[j] > max[j]) max[j] = list[j];
}
}
System.out.println("The ideal average value is: " + (1.0 * goalSum / sum.length));
System.out.println("averages:");
for (int i = 0; i < sum.length; i++)
System.out.print((1.0 * sum[i] / n) + " ");
System.out.println("\n");
System.out.println("mins:");
for (int i = 0; i < min.length; i++)
System.out.print(min[i] + " ");
System.out.println("\n");
System.out.println("maxes:");
for (int i = 0; i < max.length; i++)
System.out.print(max[i] + " ");
System.out.println();
}
Looking at your algorithm, if you were to split 100 (max) into 4 parts,
then you could first generate 3 random numbers in the range [0, 100].
[a, b, c, max]
then you sort the array because if you don't sort the array then
you'll end up with negative values... even though the sum will
still add up to max.
a + (b - a) + (c - b) + (max - c) = max
To me, it looks like the distribution is good if not optimum.
> a + (b - a) + (c - b) + (max - c) = max
Cool, did not think of this possibility.
I think I have isolated the cause of the problem. Take
a look at this example. The sum should be 1, divided up
into 3 buckets. Then we can create following table
splitters = sorted splitters = value in buckets
0,0 = 0,0 = 0,0,1
0,1 = 0,1 = 0,1,0
1,0 = 0,1 = 0,1,0
1,1 = 1,1 = 1,0,0
Mean values in buckets = 1/4,1/2,1/4.
This can be verified by running the program. Now, if we
let the sum be 1 and we increase the number of buckets,
then the sorted splits will push the 0->1 jump around
the center of the sort. For 10 buckets the most common
sorted split will be 0,0,0,0,0,1,1,1,1,1 and
0,0,0,0,0,0,0,0,0,1 is extremely rare. This effect is
almost gone when the sum is about 100 times the number
of buckets. Conclusion: This method only works when
the number of buckets is a lot less then the sum.
I think I have found a solution that is quite fast for
any number of buckets and any sum. The idea is really
simple!!
public static int[] getRandomSumList(int sum, int length, double evenFactor) {
int [] list = new int [length];
while (sum>0) {
int sumPart = 1 + (int)(Math.random()*sum*evenFactor);
int i = (int)(Math.random()*length);
list[i] = list[i] + sumPart;
sum = sum - sumPart;
}
return list;
}
The evenFactor should be between 0 and 1 and determines
how even the numbers should be.
parza at 2007-7-20 12:20:12 >

> Mean values in buckets = 1/4,1/2,1/4.
Good example. The sorting does skew the output. For the case
of generating 0's and 1's, it looks like a binomial distribution
occurs, however the distribution should be even. In your
example of 2 random digits:
0,0 = 0,0,1
0,1 = 0,1,0
1,0 = 0,1,0
1,1 = 1,0,0
So 001 x 1, 010 x 2, 100 x 1
Now, for 3 random digits:
0,0,0 => 0001
0,0,1 => 0010
0,1,0 => 0010
0,1,1 => 0100
1,0,0 => 0010
1,0,1 => 0100
1,1,0 => 0100
1,1,1 => 1000
0001 x 1, 0010 x 3, 0100 x 3, 1000 x 1
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
etc.
> 0,0,0,0,0,0,0,0,0,1 is extremely rare.
the probability would be 1 / 1024, however, ideally it should be
1 / 10.
> I think I have found a solution that is quite fast for
> any number of buckets and any sum.
That does look good. Choosing a random index location
ensures that the big first number skewing doesn't occur.
The sumPart variable is always >= 1 because there's no
point in generating a 0 value for sumPart.
> The evenFactor should be between 0 and 1 and determines
I suppose if the evenFactor is 0, then there is a very good chance
the numbers will be close to equal.
> it looks like a binomial distribution occurs,
Yes, I was thinking the same, but I could not get
the values consistent with its density function
f(x) = nCr(n,x)*p^x*(1-p)^(n-x).
The more I dig into this problem, the more complicated
it becomes. Since the sum must be be a constant, at
least one stochastic variable must be dependent on at
least one other. Further more, if we want the same mean,
variance, or even the same distribution, every variable
in the sum must be dependent on every other. With this
in mind, let us take a look at my first solution. In
this solution the sum can be built from any distribution
and even if the variables are independent going into
the sum, the variable can not be independent when a
result is found. They become dependent because the
method is discarding many sums, but how is the
distribution effected?
Even if there has been three different solutions
presented in this thread, no one is complete. I would
like to see a solution where I can determine the
distribution of the parts and it should always be fast.
parza at 2007-7-20 12:20:12 >

That sounds awfully complicated to me. It appears to me that your summands aren't limited to being integers, but they are limited to being greater than zero. So why not look at it this way: you want to choose x1 + x2 + ... + xn = 1. This equation defines a bounded manifold of dimension n-1; you want to choose a random point on this manifold.
Now you just need to decide what "choosing a random point on a manifold" means. For example look at the case where n=3, and you have an equilateral triangle whose corners are at (1, 0, 0), (0, 1, 0), and (0, 0, 1). Presumably choosing random points in that triangle means that every region inside the triangle, whose area is 1/M of the triangle's total area, will contain a randomly chosen point 1/M of the time.
Personally I don't see an easy way to do that, not that I have thought about it much. Most likely somebody else already has a solution to it.
The OP never defined what was meant by choosing an integer partition at random. It has been suggested by dubwai that the appropriate measure was to choose uniformly from the finite set of possible partitions. However it was also not specified as to which of the two perfectly logical sets of partitions we might be talking about.
For example if I am splitting 6 into 3 parts, it is easy to see that one of the partitions is {2,2,2} but do I count {1,1,4} {1,4,1} and {4,1,1} as three distinct partitions or do I cannonicalize it to sorted form {4,1,1} and count it as a single partition?
One might assume from the fact that the OPs list was unsorted to start with that the unsorted definition (counting 411, 141, & 114 as 3 different partitions) would be the prefered distribution, but the later comment about shuffling a result to get a final answer makes it unclear which is the prefered distribution. (Actually it makes it perfectly clear that the term "random" was being used in the the colloquial manner where it means some random unknown thing because the concept of an underlying distribution was never considered.)
It is fairly easy to see that none of the yet proposed algorithms actually returns a uniformly distributed choice among either of these two reasonable definitions of the set of distinct partitions.
It is not particularly difficult to come up with an algorithm that selects uniformly among the unordered partitions. To break the number N into K parts, just choose K-1 distinct numbers from 1 to N-1, Think of them as split points of the interval from 1 to N and then read off the lengths of the intervals. The K-1 splits points yield K intervals whose lengths will all at least 1 as long as all the splits are distinct. The discussion about how to efficiently choose K distinct elements from a set of N was a lively topic here a month or so ago.
It should be clear that every distinct choice of split points leads to a single distinct unsorted partition, and thus this algorithm will efficiently generate a uniform choice among the unsorted partitions.
It also leads to a rather simple (though very inefficient) algorithm for selecting uniformly among the sorted partitions as well. Simply run the algorithm and if the result happened to be sorted return it, otherwise run it again. This means that you will be coosing only from among the set of cannonical forms which is the other interesting distribution.
Unfortunately, given that there are K! arrangements of K terms and in general only one of them is in sorted order that means that this algorithm has a runtime that is exponential in K. Works fine for samll K.
The method that has been proposed so far, filling each slot with a random number until you happen to get the correct sum, has little hope of ever solving the harder problem of uniformly generating one of these sorted partitions. It clearly does the wrong thing if the internal distribution function is highly skewed, for example if the internal distribution chooses 1 nearly all the time and only rarely chooses some other number - it will tend to choose final partitions that are nearly all 1's with an occasional larger number. Once you know that the final result is sensitive to the internal distribution used to select candidates, you know that only one single distribution function among the infinitude available could possibly be right.
Furthermore there is no indication that there even is a correct internal distribution that would work. It seems much more likely, that once you have choosen a single number for a single slot, the distribution required to fill yet another slot is different from the first.
First I want to make an adjustment to my second solution.
The set splits should be found so no duplicates are allowed.
This was why the means for the different buckets where not
the same.
> The OP never defined what was meant by choosing an
> integer partition at random.
I agree that the main problem here is that there are so many
possibilities because we lack a definition of the problem.
> It is fairly easy to see that none of the yet proposed
> algorithms actually returns a uniformly distributed
> choice among either of these two reasonable definitions
> of the set of distinct partitions.
You have chosen to look at the partition of a number where
you do not allow the number 0 in the sum. I assumed that 0
should be allowed which is also a reasonable definition. In
this case neither of your proposed algorithms will choose
uniformly from this domain. If you adjust your unordered
solution to allow identical splits it will be identical to
my second suggestion and it should be clear that every
distinct choice of splits leads to a single distinct
unsorted partition allowing empty sets.
> It also leads to a rather simple (though very inefficient)
> algorithm for selecting uniformly among the sorted ...
This idea of changing distributions by ignoring outcomes is
useful in many cases. My first suggestion was based on this
idea.
> The method that has been proposed so far, filling each slot
> with a random number until you happen to get the correct
> sum, has little hope of ever solving the harder problem of
> uniformly generating one of these sorted partitions. ...
Obviously since it was not design to do this. Once again
different assumptions give different conclusions. I assumed
that all the numbers comes from the same distribution and in
this context it is quite flexible because it gives the user
the ability to influence the outgoing distribution of the
numbers. Picking an unsorted partition is special case for
this solution but it can not pick a sorted partition because
the numbers in the sorted set is not from the same
distribution.
I guess that in order to continue we need an conclusive
definition of the problem. Do you have one jatten?
parza at 2007-7-20 12:20:12 >

