NNTP Input stream / File output stream help

Hi, please excuse the lengthy post; I am hoping to include everything I need to say the first time round.

I'm trying to read data (files) from an NNTP stream. There are two important things that I had to consider:

1) The NNTP RFC indicates a character encoding of "ASCII" (or in Java "US-ASCII")

2) Whenever a line starts with a double dot (".."), the client needs to discard one dot.

The files I'm downloading generally have a line length of 128 (except for the header and footer information).

My original battle was that a BufferedReader and FileWriter caused the downloaded file to fail a CRC check. After investigation (comparison to a correctly downloaded file) I found that wherever the correct version had an (int) value of 65533, my downloaded file had an (int) value of 63. I came to the conclusion that this has something to do with the default Charset being "windows-1252" on a Windows XP machine.
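The 65533-versus-63 symptom is consistent with a decode/encode round trip through a charset that cannot represent some byte: the decoder substitutes U+FFFD (65533) and the encoder then writes '?' (63). A minimal sketch of that effect follows; the class and method names are made up for illustration, and US-ASCII and ISO-8859-1 are used only because their behaviour is easy to show, not because they were the charsets actually in play here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decode bytes to a String and encode them back, as a Reader/Writer pair would.
    public static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] data = { 'a', (byte) 0xFD, 'b' }; // 0xFD prints as -3 (signed byte)

        // US-ASCII cannot represent 0xFD: it decodes to U+FFFD, then encodes as '?'.
        System.out.println((int) new String(data, StandardCharsets.US_ASCII).charAt(1)); // 65533
        System.out.println(Arrays.toString(roundTrip(data, StandardCharsets.US_ASCII))); // [97, 63, 98]

        // ISO-8859-1 maps every byte value to exactly one char, so nothing is lost.
        System.out.println(Arrays.toString(roundTrip(data, StandardCharsets.ISO_8859_1))); // [97, -3, 98]
    }
}
```

Copying raw bytes with an InputStream and OutputStream avoids the round trip altogether, which is why the byte-array version does not show the problem.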

I managed to fix this problem by reading to and writing from a byte array directly (as is demonstrated in the code below). The situation I am in now, however, is that a double dot on a new line needs to be replaced with a single dot. (This situation was easy to deal with using a BufferedReader as I could simply read the input in lines and use the String.startsWith() method)

The first choice I'm not sure of consists of 2 options:

1) I fix the "double-dot" issue during the read from the network stream, somehow scanning for the four successive bytes "\r\n.." and replacing them with the three bytes "\r\n." as I write to the file.

2) I read the entire file completely, reopen it and scan for the double dot on a new line (which seems somewhat redundant to me, so I'm leaning towards option 1).

So if I decide to fix this "double-dot" issue during the download, how do I go about detecting a new-line and replacing two dots with a single dot in an efficient manner (keeping in consideration the code below)?
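One way to handle the dots during the download is a small byte-level state machine that only needs to remember whether the next byte starts a line. The sketch below is an illustration under stated assumptions, not a tested NNTP client: `DotUnstuffer` and `copyBody` are hypothetical names, lines are assumed to end in "\r\n", and the body is assumed to end with a line containing only ".". It drops the first dot of any line that begins with a dot, which covers the ".." case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DotUnstuffer {
    /**
     * Copies one multi-line response body from in to out, removing the
     * leading dot from every line that starts with a dot. Returns after the
     * terminating "." line has been consumed, or at end of stream.
     */
    public static void copyBody(InputStream in, OutputStream out) throws IOException {
        boolean lineStart = true; // true when the next byte begins a new line
        int b;
        while ((b = in.read()) != -1) {
            if (lineStart && b == '.') {
                int next = in.read();
                if (next == -1 || next == '\n') {
                    return;       // stream ended, or "." followed by a bare LF
                }
                if (next == '\r') {
                    in.read();    // assumed to be the LF of the terminating ".\r\n"
                    return;
                }
                b = next;         // stuffed dot: skip it, keep the byte after it
            }
            out.write(b);
            lineStart = (b == '\n');
        }
    }
}
```

Because it reads one byte at a time, it is best fed a BufferedInputStream and a BufferedOutputStream, for example `copyBody(new BufferedInputStream(connection.getInputStream()), new BufferedOutputStream(new FileOutputStream(name + ".txt")))`, with the output flushed or closed afterwards. No charset is involved, so every byte other than the removed dots is written exactly as received.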

Another option may be to go back to a BufferedReader, reading a line at a time and making sure that the (int) value of 65533 is not written as 63. This would involve playing around with the Charsets, so I'm not sure what to do from there either.
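If the line-based approach is preferred, one possibility is to decode and encode with ISO-8859-1, which maps each byte value to exactly one char and back, so no byte is turned into '?'. The following is only a sketch under assumptions: `Latin1LineCopy` and `copyBody` are hypothetical names, and the body is assumed to end with a line containing only ".".

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Latin1LineCopy {
    public static void copyBody(InputStream in, OutputStream out) throws IOException {
        // The same single-byte charset on both sides keeps the round trip lossless.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        Writer writer = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.ISO_8859_1));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.equals(".")) {
                break;                    // end-of-body marker
            }
            if (line.startsWith("..")) {
                line = line.substring(1); // drop the extra dot
            }
            writer.write(line);
            writer.write("\r\n");         // readLine() strips the line ending
        }
        writer.flush();
    }
}
```

Two caveats apply. readLine() also treats a lone "\r" or "\n" as a line break, so data containing those bytes on their own would come out changed, and the BufferedReader may read past the terminator, so any bytes that follow on the same connection should be read through the same reader.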

Any input would be appreciated, many thanks.

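import java.io.BufferedInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.Socket;

public static void save(Socket connection, String name) throws IOException
{
    FileOutputStream output;
    BufferedInputStream input;
    try {
        output = new FileOutputStream(name + ".txt");
        input = new BufferedInputStream(connection.getInputStream());
    } catch (FileNotFoundException e) {
        System.err.println("Error creating file.");
        return; // without this, output and input would be used uninitialized
    } catch (SecurityException e) {
        System.err.println("You do not have write access to this file.");
        return;
    }

    byte[] buffer = new byte[128];
    int bytes;
    try {
        // Copy raw bytes; no charset is involved, so nothing gets re-encoded.
        while ((bytes = input.read(buffer)) != -1) {
            output.write(buffer, 0, bytes);
        }
    } finally {
        input.close();
        output.close();
    }
}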

[4007 byte] By [jabalsada] at [2007-11-27 9:50:23]
# 1

> My original battle was that a BufferedReader and
> FileWriter caused the downloaded file to fail a crc
> check. After investigation (comparison to a correctly
> downloaded file) I found that wherever the correct
> version had a (int) value of 65533 my downloaded file
> had a (int) value of 63. I came to the conclusion
> this has something to do with the default Charset
> being "windows-1252" on a Windows XP machine.

I would definitely investigate this further. I suspect your CRC technique was faulty. NNTP itself doesn't contain any binary data. Of course, if you passed the CRC through a BufferedReader it will have been corrupted, and likewise if you computed it over characters instead of bytes.

I would just use BufferedReader.readLine() or maybe DataInputStream.readLine() for this problem.

ejpa at 2007-7-13 0:19:10 > top of Java-index,Java Essentials,New To Java...
# 2

Well, I didn't write my own CRC checker. The articles I'm downloading from NNTP are yEnc-encoded, and I used an official yEnc decoder. Basically, when I do a byte-for-byte comparison of a known-good download and the equivalent download done by my Java program, the data is exactly the same, except that a single (byte) -3 in the correct version is written by my program as (byte) 63.
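A byte-for-byte comparison like the one described can be sketched as below. The class and method names are hypothetical, and the streams are assumed to wrap the two files being compared. Note that InputStream.read() returns values from 0 to 255, so a byte that prints as -3 when held in a Java byte shows up here as 253 (0xFD).

```java
import java.io.IOException;
import java.io.InputStream;

public class FirstMismatch {
    /** Returns the offset of the first differing byte, or -1 if both streams are identical. */
    public static long firstMismatch(InputStream a, InputStream b) throws IOException {
        long offset = 0;
        while (true) {
            int x = a.read();
            int y = b.read();
            if (x != y) {
                return offset; // different byte, or one stream ended early
            }
            if (x == -1) {
                return -1;     // both streams ended together
            }
            offset++;
        }
    }
}
```

Wrapping each FileInputStream in a BufferedInputStream keeps the single-byte reads cheap.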

Of course I could simply replace every -3 byte with 63, but that seems like an ad hoc solution and doesn't really solve the root of the problem.

I have also tried passing a different Charset as a parameter to InputStreamReader, i.e.

Input = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("US-ASCII")));

but this method results in a lot more (different types of) bytes being different from the correct version.


jabalsada at 2007-7-13 0:19:11
# 3

Hmm I just thought of something.

Since the content I'm downloading is encoded files, I figured maybe I should read the byte stream exactly as it comes in and write it out to a file unchanged. I can then write a decoder which ensures that the double dot on a new line is replaced with a single dot.

Any suggestions as to whether this is efficient or not? Other ideas will be appreciated as well. Many thanks :D

I think the problem I had with the bytes being written incorrectly just has to do with the fact that I used a BufferedReader and not an InputStream. Certain bytes weren't recognized as I was reading them into a String or char variable, and they were replaced with the '?' char. I'm still not sure how to work around this, as changing the Charset of the InputStreamReader to "US-ASCII" and a couple of other charsets resulted in something worse.

jabalsada at 2007-7-13 0:19:11