Training a classifier
Hey
For a learning classifier, is it common practice to use the same dataset as both the training data and the test data? I am asking because I did a college assignment which involved creating just one dataset, which was used as both the training data and the test data.
If this is a bad approach, then when using separate datasets for the training data and the test data, would I be right in assuming that the training method takes the training dataset as input while the prediction method takes the test data as input?
Thanks
[550 byte] By [oraistea] at [2007-10-2 14:41:22]

There are a number of approaches.
Take K to mean the full data set and k to mean the test set held out from it; whatever is left of K is the training set.
A simple technique is called 'leave one out', where |k| = 1:
for each element k of K
    train = K - {k}
    classify(k, train)
    report success or failure
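The leave-one-out loop above can be sketched in Python roughly as follows. This is only an illustration: the 1-nearest-neighbour `classify` function, the `leave_one_out` helper and the toy data are assumptions made for the example, not something from the original assignment, and any classifier could be swapped in.

```python
# Leave-one-out sketch. The 1-nearest-neighbour classifier and the toy data
# are illustrative assumptions; substitute your own classifier and samples.

def classify(k, train):
    """Predict the label of sample k from its nearest neighbour in train."""
    features, _ = k
    nearest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(features, item[0])),
    )
    return nearest[1]


def leave_one_out(K):
    """Hold out each sample once, train on the rest, return the accuracy."""
    hits = 0
    for i, k in enumerate(K):
        train = K[:i] + K[i + 1:]  # K minus k, without modifying K itself
        if classify(k, train) == k[1]:
            hits += 1  # success: predicted label matches the true label
    return hits / len(K)


# Toy data: each sample is (features, label).
data = [
    ((1.0, 1.1), "a"),
    ((0.9, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((5.0, 5.2), "b"),
    ((5.1, 4.9), "b"),
    ((4.8, 5.0), "b"),
]
accuracy = leave_one_out(data)
print(accuracy)
```

Note that the held-out sample is never part of the training list, so no sample is used to test a model that was trained on it.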
Another way to do it is to generate a set of random training/test partitions from the data set and evaluate on each one. You will find oodles of existing research on this, and there are accepted standard ways of doing it.
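As a rough sketch of the random-partition idea, the snippet below draws several disjoint training/test splits from one data set. The helper name `random_split`, the 30% test fraction, the fixed seed and the stand-in data are all assumptions chosen for the example.

```python
import random


def random_split(data, test_fraction, rng):
    """Shuffle a copy of data and cut it into disjoint (train, test) lists."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(0)  # fixed seed so the partitions are reproducible
data = list(range(10))  # stand-in for real samples
splits = [random_split(data, 0.3, rng) for _ in range(5)]

for train, test in splits:
    # Train on `train`, predict on `test`, then average the scores over splits.
    print(len(train), len(test))
```

Each split would be used the way the question describes: the training method sees only `train` and the prediction method sees only `test`.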
There are also 'standardization' techniques you may need to look into. A common one involves subtracting the mean from each value and dividing by the standard deviation.
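A minimal sketch of that standardization for a single feature is shown below. The function names and sample values are made up for illustration, and the population standard deviation (`statistics.pstdev`) is an assumption; the sample version (`statistics.stdev`) is equally common.

```python
import statistics


def fit_standardizer(values):
    """Return the mean and standard deviation of one feature."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std; an assumed choice
    return mean, std


def standardize(values, mean, std):
    """Subtract the mean and divide by the standard deviation.

    Assumes std is non-zero, i.e. the feature is not constant.
    """
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up feature values
mean, std = fit_standardizer(train)
z = standardize(train, mean, std)
print(mean, std, z)
```

With separate training and test sets, the usual practice is to compute `mean` and `std` from the training data only and reuse those same two numbers when standardizing the test data.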
There is a pretty solid base of knowledge on classification frameworks. You should find oodles to go on.
Best of luck,
Bryan