Hons Project Help
Hey, I am writing a plagiarism detection application in Java for my Honours degree, and I am looking for help on which algorithm I can use to compare two text documents for textual similarities.
I don't really know much about this field of computing (text comparison), so any relevant comments would be appreciated.
Many thanks,
D-Gen
[364 byte] By [D-Gena] at [2007-9-28 2:24:34]

Actually, it's much better to compare bytecode than to compare text files.
Simple formatting differences (whitespace, indentation, comments, local variable names) are normalised away by the compiler.
Therefore somebody who just changes the formatting, adds or alters a few comments, and renames variables will be detected easily.
Even more subtle changes (such as changing a for-loop to a while-loop) may result in the same bytecode.
So all you have to take into account is a possible change in the order of the instructions, or exchanges of primitive types (double/float, int/long).
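To illustrate the point about instruction order, here is a minimal sketch that compares two instruction lists as bags (multisets), so a reordering does not lower the score. It assumes the opcode mnemonics have already been extracted somehow (for example from a disassembly listing); that step is not shown. The class name OpcodeBagSimilarity and the sample mnemonic lists are made up for illustration, and the sketch does not handle the primitive-type exchanges mentioned above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: order-insensitive comparison of two opcode lists.
class OpcodeBagSimilarity {

    // Count how often each mnemonic occurs.
    static Map<String, Integer> counts(List<String> ops) {
        Map<String, Integer> result = new HashMap<>();
        for (String op : ops) {
            result.merge(op, 1, Integer::sum);
        }
        return result;
    }

    // Shared instruction count divided by the length of the longer list.
    // Returns a value between 0.0 (nothing shared) and 1.0 (same bag).
    static double similarity(List<String> a, List<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0; // assumption: empty input counts as "no evidence"
        }
        Map<String, Integer> countsA = counts(a);
        Map<String, Integer> countsB = counts(b);
        int shared = 0;
        for (Map.Entry<String, Integer> entry : countsA.entrySet()) {
            int inB = countsB.getOrDefault(entry.getKey(), 0);
            shared += Math.min(entry.getValue(), inB);
        }
        return (double) shared / Math.max(a.size(), b.size());
    }

    public static void main(String[] args) {
        // Illustrative mnemonic lists, not taken from a real class file.
        List<String> original = Arrays.asList("iload_1", "iload_2", "iadd", "ireturn");
        List<String> reordered = Arrays.asList("iload_2", "iload_1", "iadd", "ireturn");
        System.out.println(similarity(original, reordered));
    }
}
```

A bag comparison like this is deliberately crude: it ignores order completely, so it would also rate two unrelated methods as similar if they happen to use the same instructions.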
Sorry, I wasn't very clear: the program will have to compare a submitted report, not source code. This is for plagiarism within theses, dissertations, etc.
I want the program to compare the original document being checked against other submissions.
An example of this would be comparing one dissertation against those of 30 other students in the class, to see whether any of them were plagiarising.
I hope this helps.
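
For what it's worth, one common starting point for this kind of one-against-many comparison is to break each document into overlapping word n-grams ("shingles") and measure the Jaccard overlap between the two sets. The sketch below is only an assumption about how that could look, not a recommendation from this thread: the class name ShingleSimilarity, the shingle size of 3 and the sample strings are all made up, and the tokenising assumes plain English text (anything other than a-z and 0-9 is treated as a separator).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: word n-gram (shingle) overlap between two texts.
class ShingleSimilarity {

    // Split text into the set of overlapping word n-grams, ignoring case and punctuation.
    static Set<String> shingles(String text, int n) {
        Set<String> result = new HashSet<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
        if (cleaned.isEmpty()) {
            return result;
        }
        String[] words = cleaned.split(" ");
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: two empty sets count as "no evidence"
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    static double similarity(String first, String second, int n) {
        return jaccard(shingles(first, n), shingles(second, n));
    }

    public static void main(String[] args) {
        // Made-up sample texts standing in for the submission and the other students' work.
        String submission = "The results show a clear link between sleep and memory.";
        Map<String, String> others = new LinkedHashMap<>();
        others.put("student01", "Our results show a clear link between sleep and memory recall.");
        others.put("student02", "This report looks at network latency in wireless sensors.");
        for (Map.Entry<String, String> other : others.entrySet()) {
            double score = similarity(submission, other.getValue(), 3);
            System.out.println(other.getKey() + " -> " + score);
        }
    }
}
```

A score near 1.0 means the two texts share most of their word sequences; where to set the threshold for flagging a pair is a judgment call and would need testing on real submissions.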