Training a classifier

Hey

For a learning classifier, is it common practice to use the same dataset as both the training data and the test data? I am asking because I did a college assignment which involved creating just one dataset, which was used as both the training data and the test data.

If this is a bad approach, then when using separate datasets for the training data and the test data, would I be right in assuming that the training method would take the training dataset as input while the prediction method would take the test data as input?

Thanks

By oraistea at 2007-10-2 14:41:22
# 1
Yes, it is a bad approach. Using the same data for both training and testing just proves you can memorize where the data goes.
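To make the separation concrete, here is a minimal holdout sketch in Java. It assumes a hypothetical `Example` type (attribute values plus a class label) and uses a 1-nearest-neighbour rule purely as a stand-in classifier; the names `Holdout`, `NearestNeighbour` and `evaluate` are illustrative, not from any library. The data is shuffled and split once: `train` sees only the training part, `predict` sees only the attributes of each test example, and the test labels are used afterwards just to score the predictions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class Holdout {
    /** Hypothetical labelled example: attribute values plus the true class label. */
    static class Example {
        final double[] attributes;
        final String label;

        Example(double[] attributes, String label) {
            this.attributes = attributes;
            this.label = label;
        }
    }

    /** Stand-in 1-nearest-neighbour classifier; any classifier with train/predict would do. */
    static class NearestNeighbour {
        private List<Example> memory = new ArrayList<>();

        /** Training sees attributes AND labels, but only from the training set. */
        void train(List<Example> trainingSet) {
            memory = new ArrayList<>(trainingSet);
        }

        /** Prediction sees only the attributes of an example it was not trained on. */
        String predict(double[] attributes) {
            String bestLabel = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Example e : memory) {
                double d = 0; // squared Euclidean distance
                for (int i = 0; i < attributes.length; i++) {
                    double diff = e.attributes[i] - attributes[i];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    bestLabel = e.label;
                }
            }
            return bestLabel;
        }
    }

    /** Shuffles a copy of the data, trains on the first trainFraction of it, returns accuracy on the rest. */
    static double evaluate(List<Example> data, double trainFraction, long seed) {
        List<Example> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<Example> training = shuffled.subList(0, cut);
        List<Example> test = shuffled.subList(cut, shuffled.size());

        NearestNeighbour model = new NearestNeighbour();
        model.train(training);

        int correct = 0;
        for (Example e : test) {
            // The test label is only compared with the prediction; it is never passed to predict().
            if (e.label.equals(model.predict(e.attributes))) {
                correct++;
            }
        }
        return (double) correct / test.size();
    }
}
```

So the test data does carry class labels, but only the evaluation code looks at them.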
RadcliffePikea at 2007-7-13 13:10:22 > top of Java-index,Other Topics,Algorithms...
# 2
Thanks. Just a few other questions about this. Should the test data contain a class label, or should it be composed of just the attributes and their corresponding values? And should it be fed directly to the prediction method?
oraistea at 2007-7-13 13:10:22
# 3

There are a number of approaches.

Take K to mean the whole data set and k to mean the test set; each round then trains on K - k.

A simple technique is called 'leave one out', where |k| = 1:

for each element k of K
    training set = K - k   (K itself stays unchanged)
    classify(k, training set)
    report success or failure
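The leave-one-out loop above can be sketched in Java as follows. This is a minimal sketch, assuming a hypothetical `Example` type (attributes plus label) and using a 1-nearest-neighbour rule as a stand-in for `classify`; any classifier could be plugged in instead. Each round builds a fresh training list without the held-out example, so the original data set is never shrunk permanently.

```java
import java.util.ArrayList;
import java.util.List;

class LeaveOneOut {
    /** Hypothetical labelled example: attribute values plus the true class label. */
    static class Example {
        final double[] attributes;
        final String label;

        Example(double[] attributes, String label) {
            this.attributes = attributes;
            this.label = label;
        }
    }

    /** Stand-in classifier: label of the closest training example (squared Euclidean distance). */
    static String classify(double[] query, List<Example> training) {
        String bestLabel = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Example e : training) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = e.attributes[i] - query[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                bestLabel = e.label;
            }
        }
        return bestLabel;
    }

    /** Fraction of examples classified correctly when each one is held out in turn. */
    static double accuracy(List<Example> data) {
        int correct = 0;
        for (int held = 0; held < data.size(); held++) {
            // Training set = every example except the held-out one; `data` itself is untouched.
            List<Example> training = new ArrayList<>(data);
            Example test = training.remove(held);
            if (test.label.equals(classify(test.attributes, training))) {
                correct++;
            }
        }
        return (double) correct / data.size();
    }
}
```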

Another way to do it is to generate a number of random partitions of the data set into training and test sets. You will find oodles of existing research on this, and there are accepted standards for it.

There are also 'standardization' techniques you may need to look into. Common standardization involves subtracting the mean from each value and dividing by the standard deviation.
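The standardization just described, z = (x - mean) / standard deviation per attribute, can be sketched like this. The class name `Standardizer` is hypothetical, and dividing by n (population standard deviation) rather than n - 1 is an assumption; both are in common use. The statistics are estimated from the training rows only, which keeps information from the test set out of training.

```java
class Standardizer {
    private final double[] mean;
    private final double[] std;

    /** Estimates per-attribute mean and population standard deviation from the TRAINING rows only. */
    Standardizer(double[][] trainingRows) {
        int n = trainingRows.length;
        int d = trainingRows[0].length;
        mean = new double[d];
        std = new double[d];
        for (double[] row : trainingRows) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= n;
        }
        for (double[] row : trainingRows) {
            for (int j = 0; j < d; j++) {
                double diff = row[j] - mean[j];
                std[j] += diff * diff;
            }
        }
        for (int j = 0; j < d; j++) {
            std[j] = Math.sqrt(std[j] / n); // divide by n (population SD); n - 1 is also common
        }
    }

    /** Applies z = (x - mean) / std; an attribute that was constant in training maps to 0. */
    double[] transform(double[] row) {
        double[] z = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            z[j] = std[j] == 0.0 ? 0.0 : (row[j] - mean[j]) / std[j];
        }
        return z;
    }
}
```

Construct it from the training rows, then call `transform` on both training and test rows so that both are scaled with the same statistics.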

There is a pretty solid base of knowledge on classification frameworks. You should find oodles to go on.

Best of luck,

Bryan

bjb1440a at 2007-7-13 13:10:22
# 4
Ohh... 'Cross validation' is the term you should look up. Research cross-validation techniques; I think 10-fold is the norm. Just FYI: following a generally accepted cross-validation technique is important, so research and decide carefully before jumping into it.
-Bryan
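For reference, a minimal k-fold cross-validation sketch in Java (call it with k = 10 for 10-fold). It assumes a hypothetical `Example` type and again uses a 1-nearest-neighbour rule as a stand-in classifier. The examples are shuffled once, each one is assigned to exactly one fold, and every fold serves as the test set once while the other k - 1 folds form the training set. It reports accuracy pooled over all folds; averaging per-fold accuracies is a common alternative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KFold {
    /** Hypothetical labelled example: attribute values plus the true class label. */
    static class Example {
        final double[] attributes;
        final String label;

        Example(double[] attributes, String label) {
            this.attributes = attributes;
            this.label = label;
        }
    }

    /** Stand-in classifier: label of the closest training example (squared Euclidean distance). */
    static String classify(double[] query, List<Example> training) {
        String bestLabel = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Example e : training) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = e.attributes[i] - query[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                bestLabel = e.label;
            }
        }
        return bestLabel;
    }

    /** Pooled accuracy over k folds; every example is tested exactly once. */
    static double crossValidate(List<Example> data, int k, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Example> training = new ArrayList<>();
            List<Example> test = new ArrayList<>();
            for (int pos = 0; pos < order.size(); pos++) {
                // Position pos in the shuffled order belongs to fold (pos % k).
                Example e = data.get(order.get(pos));
                if (pos % k == fold) {
                    test.add(e);
                } else {
                    training.add(e);
                }
            }
            for (Example e : test) {
                if (e.label.equals(classify(e.attributes, training))) {
                    correct++;
                }
            }
        }
        return (double) correct / data.size();
    }
}
```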
bjb1440a at 2007-7-13 13:10:22