codepage detection
Hi,
In Java, I need to find out at runtime what codepage a document is using. I already googled for some solutions and found 'cpdetector' (http://cpdetector.sourceforge.net/).
Has anyone used this library? Is it any good? Are there better alternatives or other solutions (using the JDK classes or external libraries)?
Thanks in advance!
Kind regards,
Dirk
By [dirkdaemsa] at [2007-11-27 9:53:36]

Detecting what encoding a piece of text uses is a difficult problem, especially if you can't narrow it down to a small range of possibilities. I've used a library for this before; see this thread: http://forum.java.sun.com/thread.jspa?forumID=31&threadID=5164766. It was good enough for my purposes, but it wasn't great.
What type of files are you working with? Are they webpages, or some other type of document? What hints do you have about their encoding?
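If you want to see how far the plain JDK classes get you before pulling in a library, here is a minimal sketch. It only checks for a byte order mark and then tries a strict UTF-8 decode; anything else falls back to a charset you pick yourself. The class name EncodingGuesser and the method guess are made up for illustration, and StandardCharsets needs Java 7 or later. This is not a real detector: it cannot tell one single-byte codepage (ISO-8859-1, windows-1252, ...) from another.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of cpdetector.
public class EncodingGuesser {

    /**
     * Guesses the charset of the given bytes.
     * Returns UTF-8 or UTF-16 when a BOM is present, UTF-8 when the bytes
     * decode cleanly as UTF-8, and the caller-supplied fallback otherwise.
     */
    public static Charset guess(byte[] data, Charset fallback) {
        // 1. Byte order marks are the only unambiguous hint in a plain text file.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        // Note: FF FE 00 00 would be a UTF-32LE BOM; this sketch ignores UTF-32.
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Strict UTF-8 decode: REPORT makes the decoder throw on invalid
        //    byte sequences instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8; // pure ASCII also lands here
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8: we cannot tell which legacy codepage it is.
            return fallback;
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

You would read the file into a byte array and call something like EncodingGuesser.guess(bytes, Charset.forName("windows-1252")). The UTF-8 check is fairly reliable because text in a legacy codepage with accented characters rarely happens to be valid UTF-8, but the fallback is still just a guess, which is where a statistical detector comes in.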
Hi,
I know, it's a nasty problem. The files whose codepage I need to detect are plain text (txt) files. I don't have any clue about the encoding because they can be generated by different operating systems in different locations.
Thanks for the reply, I'll take a look at your solution.
Kind regards,
Dirk