Problems Implementing Backpropagation

Hi. I'm having a good deal of difficulty implementing the backpropagation learning algorithm for artificial neural networks. Errors seem to be converging toward plus or minus .5 instead of 0. One of my problems is that different sources say to do different things, so if someone could look over my code and give me an answer (preferably quickly), I'd be much obliged.

Backpropogation (yeah, I know it's misspelled...).java:

public class Backpropogation extends TrainingAlgorithm
{
    //Variables are only used within train(double[],double[],String[],String[]), but are initialized
    //once and for all globally for faster runtime

    //numNeurons is the number of neurons in layer n, connectionsPerNeuron is the number of neurons in
    //layer n+1. Because we are going backwards through the array, you might instead think of them as
    //being layers n-1 and n, respectively
    int numNeurons,connectionsPerNeuron;

    //The learning rate for the algorithm is low, causing it to take longer but lowering the risk of
    //missing a minimum
    double trainingRate=.1;

    //values is an array for holding the activation of layer n-1, which is multiplied by delta to
    //determine a neuron's particular responsibility toward delta. error is the error array for a
    //particular layer. delta is an array for layer n, determining how much the weights feeding into
    //a particular neuron need to be changed by (before multiplying by the responsibility of the source
    //neuron for each weight, and before multiplying by the training rate). Finally, in order to
    //backpropagate the error we need to multiply the transpose of the weight matrix between layers n-1
    //and n by the error values of layer n. tmp is a helper in taking the transpose of the weight matrix
    double[] values,error,delta,tmp;

    //The matrix of weights where the first index is the number of neurons in n-1 and the second is the
    //number of neurons in n
    double[][] weights;

    /*
     *Preconditions: The network has been run. error for the output layer has been calculated, out is
     *an array of the output values of the output layer, the two String arrays are file locations so
     *that additional vectors and matrices do not have to be kept in RAM
     *
     *Postcondition: Weights are updated and saved
     */
    public void train(double[] error,double[] out,String[] valueFilenames,String[] weightFilenames)
    {
        //Helper code put in specifically for testing XOR, hence the single error output
        System.out.println("Error: "+error[0]);

        //Let m be the total number of layers of the network including both input and output.
        //There are m-1 saved values, since the values of the output layer were passed as a
        //parameter, and there are m-1 saved weight matrices, since matrices only appear between
        //layers and the network is not recurrent. The first index that can be loaded when starting
        //from the end is the number of files-1 because 0 is used as an index
        for(int i=valueFilenames.length-1;i>-1;i--)
        {
            //Load the weight matrix between layers n-1 and n (the rows represent layer n and the
            //columns represent layer n-1)
            weights=IOHelper.loadMatrix(weightFilenames[i]);

            //Load the output values of layer n-1
            values=IOHelper.loadVector(valueFilenames[i]);

            //The total number of neurons in layer n is equal to the rows of the weight matrix
            numNeurons=weights.length;

            //The total number of neurons in layer n-1 is equal to the columns of the weight matrix
            connectionsPerNeuron=weights[0].length;

            //delta is set to be an empty array with one spot for each neuron in layer n
            delta=new double[numNeurons];

            //Calculate delta for each neuron in layer n based on the error of that neuron*the
            //derivative of the activation function with respect to the output of that neuron.
            //One resource used the learningRate here, which affects what error values will be
            //backpropagated in addition to how the weights feeding the neurons in layer n will
            //be changed. I tried that method. It did not solve my problems
            //POSSIBLE ERROR LOCATION: out[j] is the output of the neuron after the sigmoidal
            //activation function of the neuron is applied. Should it perhaps be the net value of the
            //neuron before? This was not tried experimentally. Beyond that, check the file
            //Sigmoid.java to see if my understanding of what equation I'm supposed to use is correct
            for(int j=0;j<numNeurons;j++)
            {
                delta[j]=error[j]*actFunc.getDerivativeFor(out[j]);
            }

            //We are done with the errors for layer n. We now make a new array for the neurons
            //of layer n-1
            error=new double[connectionsPerNeuron];

            //Calculate the error for each neuron in layer n-1
            for(int j=0;j<connectionsPerNeuron;j++)
            {
                //Build a vector of a column of the weights matrix with respect to a single row.
                //In effect we are taking a single row of the transpose of the matrix of weights
                tmp=new double[numNeurons];
                for(int k=0;k<numNeurons;k++)
                {
                    tmp[k]=weights[k][j];
                }

                //The error of this neuron is equal to each weight that comes from it multiplied
                //by the delta value of the neuron it goes to in layer n. Note: the delta value
                //seems an odd choice. Why not instead use the error value of layer n? There is
                //a discrepancy between resources as to which to use, but most, and the most
                //professional appearing, said to use delta. I tried the error vector instead but
                //it had the same problems as with delta.
                error[j]=DotProduct.dot(tmp,delta);
            }

            //Update the weights for each neuron in layer n. Most sources say to do this step
            //before backpropagating the error, thus backpropagating it through updated weights.
            //That did not resolve my problems. That way probably makes more sense, and I suspect,
            //Lord willing, I'll move this code back above the other loop later.
            for(int j=0;j<numNeurons;j++)
            {
                //The first weight for each neuron is from the bias, and therefore has a value of
                //1, so we don't bother multiplying it with that value
                weights[j][0]+=trainingRate*delta[j];

                //Adjust each weight feeding into the current neuron in layer n (not counting the
                //bias, which we've already adjusted) by the learningRate*delta*the responsibility
                //of the neuron in n-1 for the error.
                for(int k=1;k<connectionsPerNeuron;k++)
                {
                    weights[j][k]+=trainingRate*delta[j]*values[k-1];
                }
            }

            //Save the updated matrix, overwriting the one we loaded
            IOHelper.saveMatrix(weightFilenames[i],weights);

            //We are now ready to move back one layer, and since out=the output values of n and
            //values=the output values of n-1, we can simply set out to be equal to values
            out=values;
        }
    }
}
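
One thing that may be worth checking in the loop above, assuming the layout the comments describe (column 0 of each weight row is the bias weight, and values holds only the real neuron outputs): error is allocated with connectionsPerNeuron entries, so error[0] would belong to the bias input, yet on the next pass error[j] is paired with out[j]. Below is a minimal sketch of an error-propagation step that skips the bias column so that error[j] lines up with values[j]. The class and method names are hypothetical and not part of the posted code.

```java
// Hypothetical helper, not part of the posted classes.
// Assumes weights[k][0] is the bias weight of neuron k in layer n and
// weights[k][j + 1] is the weight from neuron j of layer n-1.
final class LayerBackwardSketch {

    // Returns one error value per real neuron in layer n-1 (no bias entry).
    static double[] backpropagateError(double[][] weights, double[] delta) {
        int sourceNeurons = weights[0].length - 1; // drop the bias column
        double[] error = new double[sourceNeurons];
        for (int j = 0; j < sourceNeurons; j++) {
            double sum = 0.0;
            for (int k = 0; k < weights.length; k++) {
                // Column j + 1 holds the weights leaving source neuron j.
                sum += weights[k][j + 1] * delta[k];
            }
            error[j] = sum;
        }
        return error;
    }

    public static void main(String[] args) {
        double[][] weights = {
            {0.5, 1.0, 2.0},   // bias, w from neuron 0, w from neuron 1
            {0.25, 3.0, 4.0}
        };
        double[] delta = {0.1, 0.2};
        double[] error = backpropagateError(weights, delta);
        for (int j = 0; j < error.length; j++) {
            System.out.println("error[" + j + "] = " + error[j]);
        }
    }
}
```

With this shape the returned array has the same length as values, so the assignment out=values at the end of each pass keeps the indices of error and out aligned. If the saved matrices do not actually carry a bias column, this does not apply.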

Sigmoid.java

public class Sigmoid extends ActivationFunction
{
    private final double E=Math.E;
    private double tmp;

    public boolean isActivated(double value)
    {
        return value>.5;
    }

    public double getActivation(double net)
    {
        return 1/(1+Math.exp(-net));
    }

    public double getDerivativeFor(double out)
    {
        //There seems to be a decent amount of discrepancy on what value to use.
        //Most books say out(1-out), but that is not the derivative of the sigmoid
        //so I think I may be confused
        tmp=Math.exp(-out);
        return tmp/((1+tmp)*(1+tmp)); //commented-out alternative: (1-out)*out
    }
}
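
On the question raised in the comments of getDerivativeFor: for the logistic function s(net) = 1/(1+exp(-net)), the derivative with respect to net is s(net)*(1 - s(net)). So when the argument is already the neuron's output, out*(1-out) is that derivative, while exp(-x)/(1+exp(-x))^2 gives the same number only when x is the net value before activation. Here is a small sketch that checks this numerically. The class SigmoidCheck and its method names are hypothetical, not part of the posted code.

```java
// Hypothetical stand-alone check, not part of the posted classes.
final class SigmoidCheck {

    static double sigmoid(double net) {
        return 1.0 / (1.0 + Math.exp(-net));
    }

    // Derivative written in terms of the pre-activation (net) value.
    static double derivativeFromNet(double net) {
        double e = Math.exp(-net);
        return e / ((1.0 + e) * (1.0 + e));
    }

    // Derivative written in terms of the post-activation output.
    static double derivativeFromOut(double out) {
        return out * (1.0 - out);
    }

    public static void main(String[] args) {
        double net = 0.7;
        double out = sigmoid(net);
        double h = 1e-6;
        // Central finite difference as an independent reference.
        double numeric = (sigmoid(net + h) - sigmoid(net - h)) / (2 * h);

        System.out.println("finite difference at net : " + numeric);
        System.out.println("formula applied to net   : " + derivativeFromNet(net));
        System.out.println("out * (1 - out)          : " + derivativeFromOut(out));
        // Feeding the output into the net-based formula gives a different value.
        System.out.println("net formula applied to out: " + derivativeFromNet(out));
    }
}
```

The first three printed values should agree to several decimal places, and the fourth should not. That suggests either passing the net value to the exponential form or passing the output to out*(1-out), but not mixing the two.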

[12046 byte] By [AlexanderTrostorffFincha] at [2007-10-2 6:58:34]
# 1
This isn't really a Java problem but an AI algorithm issue. You may need a forum more geared to that. By the way, I can barely read the code for the amount of commenting.
David_Waddella at 2007-7-16 20:27:36