codepage detection

Hi,

In Java, I need to find out at runtime which codepage (character encoding) a document uses. I've searched for solutions and found 'cpdetector' (http://cpdetector.sourceforge.net/).

Has anyone used this library? Is it any good? Are there better alternatives or other solutions (using the JDK classes or external libraries)?

Thanks in advance!

Kind regards,

Dirk

[394 byte] By [dirkdaemsa] at [2007-11-27 9:53:36]
# 1

Detecting which encoding a piece of text uses is a difficult problem, especially if you can't narrow it down to a small range of possibilities. I've used a library for this before (see this thread: http://forum.java.sun.com/thread.jspa?forumID=31&threadID=5164766 ). It was good enough for my purposes, but it wasn't great.
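If you can narrow things down to a handful of likely encodings, a rough sketch with JDK classes alone might look like the following. It is not what cpdetector or the library in that thread does; it only checks for a byte order mark and then tries each candidate with a strict decoder. The class and method names (CharsetGuesser, fromBom, decodesCleanly, guess) are made up for illustration, and it assumes Java 8 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the JDK or of any detection library.
final class CharsetGuesser {

    /** Returns the charset implied by a leading byte order mark, if there is one. */
    static Optional<Charset> fromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return Optional.of(StandardCharsets.UTF_8);
        if (startsWith(data, 0xFE, 0xFF)) return Optional.of(StandardCharsets.UTF_16BE);
        // Note: FF FE followed by 00 00 could also be a UTF-32LE BOM; ignored in this sketch.
        if (startsWith(data, 0xFF, 0xFE)) return Optional.of(StandardCharsets.UTF_16LE);
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** True if every byte sequence in data is valid in the given charset. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Returns the BOM charset if present, otherwise the first candidate that
     * decodes without errors. Candidate order matters: put strict encodings
     * such as UTF-8 first and "accepts anything" encodings such as
     * ISO-8859-1 last.
     */
    static Optional<Charset> guess(byte[] data, List<Charset> candidates) {
        Optional<Charset> bom = fromBom(data);
        if (bom.isPresent()) return bom;
        for (Charset candidate : candidates) {
            if (decodesCleanly(data, candidate)) return Optional.of(candidate);
        }
        return Optional.empty();
    }
}
```

You would read the file into a byte array and call something like CharsetGuesser.guess(bytes, Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)). Keep in mind that a clean decode only proves the bytes are valid in that encoding, not that it is the right one. Most single-byte codepages accept almost any input, which is why telling them apart needs statistical guessing of the kind the detection libraries attempt.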

What type of files are you working with? Are they webpages, or some other type of document? What hints do you have about their encoding?

hunter9000a at 2007-7-13 0:22:50
# 2

Hi,

I know, it's a nasty problem. The files whose codepage I need to detect are plain text (.txt) files. I have no hints about their encoding, because they can be generated on different operating systems at different locations.

Thanks for the reply, I'll take a look at your solution.

Kind regards,

Dirk

dirkdaemsa at 2007-7-13 0:22:50