Hons Project Help

Hey, I am writing a plagiarism detection application in Java for my Honours degree, and am looking for help on which algorithm I can use to compare two text documents for textual similarities.

I don't really know much about this field of computing (text comparison), so any relevant comments would be appreciated.

Many thanks

D-Gen

[364 byte] By [D-Gena] at [2007-9-28 2:24:34]
# 1

Actually, it's much better to compare bytecode than to compare text files.

Simple formatting differences (whitespace, indentation, comments, and in many cases variable names) are largely normalised away by the compiler.

Therefore somebody who just changes the formatting, adds or alters a few comments, and renames variables will be detected easily.

Even more subtle changes (changing a for-loop to a while-loop) may result in the same bytecode.

So all you have to take into account is a possible change in the order of the instructions, or an exchange of primitive types (double/float, int/long).

phohmeyera at 2007-7-7 21:57:03 > top of Java-index,Other Topics,Algorithms...
# 2

Sorry, I wasn't very clear. The program will have to compare submitted reports, not source code. This is for detecting plagiarism in theses, dissertations, etc.

I want the program to compare the original document being checked against the other submissions.

An example would be comparing one dissertation against those of the 30 other students in the class, to see if any were plagiarised.

I hope this helps.

D-Gena at 2007-7-7 21:57:03
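
One common starting point for the one-against-many comparison described above is to break each document into overlapping runs of k words ("shingles") and score each pair by Jaccard similarity of the shingle sets. Nobody in this thread proposed this technique; it is offered only as a minimal sketch. The class name `ShingleSimilarity`, its method names, and the choice of k are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ShingleSimilarity {

    // Split text into lower-case words, treating anything that is not a
    // letter or digit (punctuation, whitespace) as a separator.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // Build the set of all runs of k consecutive words ("shingles").
    static Set<String> shingles(String text, int k) {
        List<String> w = words(text);
        Set<String> set = new HashSet<>();
        for (int i = 0; i + k <= w.size(); i++) {
            set.add(String.join(" ", w.subList(i, i + k)));
        }
        return set;
    }

    // Jaccard similarity |A intersect B| / |A union B|, a value in [0, 1].
    // Two empty sets are scored 0.0 here (an arbitrary choice for the sketch).
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return (double) inter.size() / union;
    }

    // Score one document against every other submission.
    static double[] compareToAll(String doc, List<String> others, int k) {
        Set<String> base = shingles(doc, k);
        double[] scores = new double[others.size()];
        for (int i = 0; i < others.size(); i++) {
            scores[i] = jaccard(base, shingles(others.get(i), k));
        }
        return scores;
    }
}
```

Calling `ShingleSimilarity.compareToAll(dissertation, otherSubmissions, 3)` would return one score per other submission; which score counts as suspicious is a judgment call that would have to be tuned on real reports.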
# 3
The usual procedure is to search the literature to see if it has been done before and if so how. I suppose your post was an attempt to do that, but an Internet search would probably be more practical.
DrClapa at 2007-7-7 21:57:03
# 4
The ResearchIndex cites many articles on the subject: http://citeseer.nj.nec.com/cs
Search for "plagiarism".

Søren
soren_baka at 2007-7-7 21:57:03
# 5
Have you ever used the "diff" command under *nix? Look into that. It could be a good place to start (and source should be available somewhere).
mgbolusma at 2007-7-7 21:57:03
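
The diff suggestion above can be tried out in Java without reading the diff source: diff-style tools are built around finding a longest common subsequence (LCS) between two inputs. The sketch below uses the textbook dynamic-programming LCS over words rather than lines. It is not the algorithm of any particular diff implementation, and the class name `WordLcs`, its methods, and the scoring formula are hypothetical.

```java
import java.util.Locale;

final class WordLcs {

    // Split text into lower-case, whitespace-separated words.
    static String[] tokenize(String text) {
        String t = text.toLowerCase(Locale.ROOT).trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Length of the longest common subsequence of two word sequences,
    // computed with the standard (a.length + 1) x (b.length + 1) table.
    static int lcsLength(String[] a, String[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1].equals(b[j - 1])) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length][b.length];
    }

    // LCS length divided by the length of the longer document, in [0, 1].
    // This normalisation is an assumption made for the sketch.
    static double similarity(String x, String y) {
        String[] a = tokenize(x);
        String[] b = tokenize(y);
        int longer = Math.max(a.length, b.length);
        return longer == 0 ? 0.0 : (double) lcsLength(a, b) / longer;
    }
}
```

The table needs memory proportional to the product of the two word counts, so for dissertation-length documents it would probably be more practical to compare sentence by sentence or line by line, as diff does.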