Most efficient way to strip nulls from a Double[] array?
I'm trying to optimize performance of some code that needs to accept a Double[] array (or a List<Double>) and fetch the median of the non-null values. I'm using org.apache.commons.math.stat.descriptive.rank.Median to fetch the median after converting to double[]. My question is how I can most efficiently make this conversion?
publicclass MathStatics{
privatestatic Median median =new Median();
publicstatic Double getMedian(Double[] doubles){
int numNonNull = 0;
for (int i = 0; i < doubles.length; i++){
if (doubles[i] !=null) numNonNull++;
}
double[] ds =newdouble[numNonNull];
for (int i = 0; i < doubles.length; i++){
if (doubles[i] !=null) ds[i] = doubles[i];
System.out.println(ds[i]);
}{
}
return median.evaluate(ds);
}
publicstaticvoid main(String[] args){
Double[] test =new Double[]{null,null,-1.1,2.2,5.8,null};
System.out.println(MathStatics.getMedian(test));
}
}
I'm sure that the code I wrote above is clunky and amateurish, so I'd really appreciate some insight into how to make improvements. FWIW, the arrays will typically range in size from ~1-15,000 doubles.
Thanks!
There's no need to loop over the array twice
int numNonNull = 0;
double[] ds = new double[numNonNull];
for (int i = 0; i < doubles.length; i++) {
if (doubles[i] != null) {
numNonNull++;
ds[i] = doubles[i];
System.out.println(ds[i]);
}
}
I'm also keen to see inputs on this one, I often have this same question arising but typically do what you've done above.
I tried using a different config for the loop like this (which used to be faster)
for (int i = 0, n = doubles.length; i < n; i++) {
}
instead of computing the length of the array on each iteration. But I have to say even on a very large Double[] (5E6 elements) I can't see a great difference in performance (it's very very fast!). I guess the JVM is internally using the fast version anyway.
However, changing the line
double[] ds = new double[numNonNull];
to
Double[] ds = new Double[numNonNull];
does in fact reduce the processing time for the array from 130ms on average to 100ms on my machine (again 5E6 elements). The unboxing done by the JVM is costly (converting the Double wrapper to the double primitive every time).
Keen to see other suggestions...
Cheers
Lance
> There's no need to loop over the array twice
> [code]int numNonNull = 0;
> double[] ds = new double[numNonNull];
> for (int i = 0; i < doubles.length; i++) {
> if (doubles != null) {
> numNonNull++;
> ds = doubles;
> System.out.println(ds);
>}
Except that ds will have length zero, so you'll get a OutOfBoundsException every time you use it.
As you're using Doubles rather than doubles, you can add the non-null values to an ArrayList and then convert it to an array at the end.
> Except that ds will have length zero, so you'll get a
> OutOfBoundsException every time you use it.
>
> As you're using Doubles rather than doubles, you can
> add the non-null values to an ArrayList and then
> convert it to an array at the end.
Oh, nice catch. I thing it's gonna be one of those days. :)
By the way (excluding construction of the ArrayList), this takes about 6 times longer than using the Double[] method. The fastest Collection seems to be a Vector<Object> but even that is 2x slower than the Double[].
> However, changing the line
> > double[] ds = new double[numNonNull];
>
> to
> > Double[] ds = new Double[numNonNull];
>
> does in fact reduce the processing time...
Thanks for the replies everyone. LanceJ, could you clarify something for me please? When you say that the Double[] is faster than the double[], how are you then passing the Double[] to the Median class? Is there an efficient way to cast Double[] to double[] that I'm not aware of, or are you suggesting that I make my own median calculation using a Double[]?
Thanks!
May I suggest public static Double getMedian(Double[] doubles) {
double[] ds = new double[doubles.length];
int j = 0;
for (int i = doubles.length-1; i >= 0; i--) {
Double d = doubles[i];
if (d != null) ds[j++] = d.doubleValue();
}
return median.evaluate(ds,0, j);
}
or some variant along these lines
Thanks jsalonen. I just came back to report that my early code contained an error (I was indexing the new array ds[] incorrectly), but you already fixed that error and made the code a lot cleaner and more concise to boot.
about 3-6 time faster is to do the following: instead of copying the array to a new one, just run on your array from both directions and swap null values in the front with non-null values in the end (some sort algorithms work similarly).
Note that it works mainly because order doesn't matter for calculating the median.
public static Double getMedian3(Double[] doubles)
{
int start = 0 ;
int end = doubles.length-1;
while (doubles[end] == null && end > start)
end--;
for (start = 0; start < end; start++)
{
if (doubles[start] == null)
{
doubles[start] = doubles[end];
end--;
while (doubles[end] == null)
end--;
}
}
return median.evaluate(doubles, 0, end+1);
}
Message was edited by:
gris
grisa at 2007-7-12 2:41:42 >
