Optimal Java I/O -- how can it be achieved?
I have a dilemma about how best to perform certain I/O operations in Java.
The issue arises when you have an InputStream which generates bytes at unpredictable times and in unpredictable amounts. For instance, the InputStream could come from a network socket. In my current case of interest, it actually comes from a general native Process that I cannot make any assumptions about. In contrast, an InputStream from a file (that you have an exclusive lock on) would NOT fall into this category because all of the bytes that will ever be read are available right now.
For the above type of InputStream, how do you optimally monitor it for data?
Consider the following Java fragment, which you may be tempted to write:
InputStream in;
byte[] readBuffer;
.
.
.
while (true) {
    int bytesRead = in.read(readBuffer);
    if (bytesRead == -1) {
        handleStreamEOF();
        break;
    }
    else {
        handleNewBytes(readBuffer, bytesRead);
    }
}
The first question I have is what exactly happens when you call the read method on the InputStream? The read javadocs clearly state that the thread executing it will block until at least either a single byte becomes available or the stream hits end of file. Fine -- that is perfectly clear: to the Java programmer, I/O in this case is purely synchronous.
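For reference, a self-contained version of that loop might look like the sketch below. The class name ReadLoopDemo and the drain method are made up for illustration; they stand in for handleNewBytes and handleStreamEOF, and a ByteArrayInputStream stands in for the real stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadLoopDemo {
    /** Reads until EOF and returns everything that was read. */
    static byte[] drain(InputStream in, int bufferSize) throws IOException {
        byte[] readBuffer = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        while (true) {
            // Blocks until at least one byte is available, EOF, or an error.
            int bytesRead = in.read(readBuffer);
            if (bytesRead == -1) {
                break; // EOF: the only way read() reports end of stream
            }
            // Only the first bytesRead entries of readBuffer are valid.
            collected.write(readBuffer, 0, bytesRead);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "hello, stream".getBytes(StandardCharsets.US_ASCII));
        // Prints "hello, stream" after several 4-byte reads.
        System.out.println(new String(drain(in, 4), StandardCharsets.US_ASCII));
    }
}
```

Note that read may return fewer bytes than the buffer holds, so the handler must always be told how many bytes are valid.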
What I really want to know is whether a blocked thread is still inefficiently wasting CPU cycles, or whether the OS/JVM are intelligent enough to recognize that if data is not available, then the thread should be put to sleep and only woken when data is available. In other words, is there any asynchronous I/O going on "under the hood"?
I have reason to believe that, at least with JDK 1.3.1 running on Microsoft OSes (NT and 2K), the blocked thread is NOT put to sleep, but actually continues to waste a lot of CPU cycles. This conclusion comes from profiling a couple of applications where I had many many threads all simultaneously doing, say, read I/O. In these cases, I set it up so that there was always at least one thread that had data available to read, and so should be using the CPU to process that data. Instead, my profiler showed that a majority of the CPU time was spent just on the read method. From that, I am guessing that the blocked threads are still wasting tons of CPU time.
If anyone has any experience on other OSes (e.g. Unixes), I would be very interested in what you found.
OK, for now let's assume that I/O blocked threads are in fact wasting CPU cycles. Also, assume that the asynchronous I/O facilities in Java 1.4 are unavailable -- both because 1.4 is still in beta, and also because, as near as I can tell, 1.4 only gives extremely limited asynchronous I/O support (e.g. it is limited to sockets, and not to, say, InputStreams from Processes).
So is there any way of being more CPU efficient? The best method that I have come up with is a pseudo-polling solution, whose code might look something like:
InputStream in;
int pollInterval;
byte[] readBuffer;
.
.
.
while (true) {
    int bytesRead = in.read(readBuffer);
    if (bytesRead == -1) {
        handleStreamEOF();
        break;
    }
    else {
        handleNewBytes(readBuffer, bytesRead);
    }
    // nothing more to read right now: sleep before the next blocking read
    if (in.available() == 0)
        Thread.sleep(pollInterval);
}
This solution sleeps for a fixed period of time if no data is available before re-attempting a blocking read. Of course, it has all the usual defects of a polling I/O solution (e.g. being woken up on a fixed schedule, instead of when data is actually available).
Surprisingly, it is not a true polling solution: you still have to do reads that could indefinitely block. This major defect arises because the -1 value returned by read seems to be the only way to detect stream end of file.
IF YOU KNOW ANOTHER RELIABLE WAY, PLEASE TELL ME!
This is entirely because Sun seems to have lacked the foresight to either supply an isEOF method for InputStream, or to specify in the contract for the available method what exactly happens when you call it on a stream that has reached EOF.
The correct specification would be to say that available returns -1 when the stream has reached EOF. An inferior specification would be to say that it throws an IOException. The actual javadocs say nothing, and, in fact, the implementation of available in InputStream itself is to always return 0 under all circumstances.
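The ambiguity can be seen with a small sketch. The class and method names here are invented for the demonstration; the only library behaviour relied on is that ByteArrayInputStream reports its remaining byte count from available, and that the base InputStream implementation of available returns 0.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableAtEof {
    /** Returns {available() at EOF, read() at EOF} for a fully consumed stream. */
    static int[] probe() throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {42});
        in.read();                      // consume the only byte
        int available = in.available(); // 0: looks the same as "no data yet"
        int read = in.read();           // -1: the only EOF signal
        return new int[] {available, read};
    }

    /** A stream that only implements read() and inherits available(). */
    static int inheritedAvailable() throws IOException {
        InputStream in = new InputStream() {
            @Override
            public int read() {
                return 7; // this stream always has data to give
            }
        };
        return in.available(); // base-class implementation
    }

    public static void main(String[] args) throws IOException {
        int[] r = probe();
        System.out.println(r[0] + " " + r[1]);    // prints "0 -1"
        System.out.println(inheritedAvailable()); // prints "0"
    }
}
```

So a return value of 0 from available can mean "no data yet", "end of stream", or simply "this subclass never overrode the method".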
Had Sun done what I suggest, you could at least write a true polling I/O solution like
InputStream in;
int pollInterval;
byte[] readBuffer;
.
.
.
while (true) {
    // hypothetical contract: available() returns -1 once the stream is at EOF
    switch (in.available()) {
        case -1:
            handleStreamEOF();
            return;
        case 0:
            Thread.sleep(pollInterval);
            break;
        default:
            int bytesRead = in.read(readBuffer);
            handleNewBytes(readBuffer, bytesRead);
    }
}
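Without such a contract, one common workaround is to give the blocking read its own thread and let the consumer poll a queue with a timeout. The sketch below assumes a modern JDK (java.util.concurrent did not exist in 1.3); QueuedReader, its EOF sentinel and the buffer size are all invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class QueuedReader {
    /** Sentinel chunk meaning "the stream has ended" (compare by identity). */
    static final byte[] EOF = new byte[0];

    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();

    /** Starts a daemon thread that does the blocking reads on behalf of the caller. */
    static QueuedReader start(InputStream in) {
        QueuedReader reader = new QueuedReader();
        Thread pump = new Thread(() -> {
            byte[] buffer = new byte[4096];
            try {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    if (n > 0) {
                        reader.chunks.add(Arrays.copyOf(buffer, n));
                    }
                }
            } catch (IOException e) {
                // This sketch treats a read error like EOF; real code should report it.
            } finally {
                reader.chunks.add(EOF);
            }
        }, "stream-pump");
        pump.setDaemon(true);
        pump.start();
        return reader;
    }

    /** Waits at most timeoutMillis; returns a chunk, EOF, or null if nothing arrived. */
    byte[] poll(long timeoutMillis) throws InterruptedException {
        return chunks.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        QueuedReader r = start(new ByteArrayInputStream(
                "abc".getBytes(StandardCharsets.US_ASCII)));
        while (true) {
            byte[] chunk = r.poll(100);
            if (chunk == EOF) break;     // end of stream, detected without blocking
            if (chunk == null) continue; // timed out: free to do other work here
            System.out.println(new String(chunk, StandardCharsets.US_ASCII));
        }
    }
}
```

The consumer never blocks for longer than its timeout and is woken as soon as a chunk arrives, at the cost of one extra thread per stream. The queue here is unbounded, so a slow consumer could accumulate data in memory.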
You guys have any feedback and ideas for me?
[6357 byte] By [bbatman] at [2007-9-26 8:39:39]

Wouldn't catching the IOException be faster than doing the case for -1 anyway?
(I assume that you are referring to my claim that Sun should provide a non-blocking way, such as through the available method, to test whether a stream has reached EOF.)
No -- catching an IOException would NOT be faster.
Exceptions are extremely expensive to generate -- much worse than general Object creation -- because of the stack trace that must be generated each time. Of course, that does not mean that you should be scared of throwing Exceptions in truly -exceptional- circumstances. But it does mean that it is a bad idea to use Exceptions for normal program flow.
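The stack trace part of that cost can be made visible with a small sketch. StackTraceCost and traceLength are invented names, and the code assumes a JVM running with default settings, where constructing an exception records the current call stack.

```java
import java.io.IOException;

class StackTraceCost {
    /** Recurses `depth` extra frames, then creates an exception and returns its trace length. */
    static int traceLength(int depth) {
        if (depth > 0) {
            return traceLength(depth - 1);
        }
        // The Throwable constructor calls fillInStackTrace(), which walks the call stack.
        return new IOException("demo").getStackTrace().length;
    }

    public static void main(String[] args) {
        System.out.println("frames at depth 0:  " + traceLength(0));
        System.out.println("frames at depth 50: " + traceLength(50));
    }
}
```

The deeper the call stack at the point of construction, the more frames have to be recorded, even if nobody ever looks at the trace.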
-shrug-
I was just wondering, because for a 'for' loop, it's faster to have no bounds, and catch the exception, than it is to check the bounds each time around.
Ie .. this:
for (int i = 0; i < LIMIT; i++) {
...
}
Is slower than this:
try {
for (int i = 0; ; i++) {
...
}
} catch (IndexOutOfBoundsException e) {
}
I was wondering whether it was the same for I/O..
Your example is ONLY faster for extremely large loop sizes (see below).
The comparison that you have eliminated is one of the fastest operations in just about any language, whereas generating an Exception is extremely slow. So, for smaller loops, the code that you wrote will actually perform much worse.
Furthermore, if the body of your loop does any actual useful work, the percentage of time saved by eliminating the comparison op is minimal.
If you read any decent book on Java performance tuning, you will find that they universally recommend against the technique you provided.
Here are some further online resources on this:
http://www.webcom.com/~haahr/essays/java-style/exceptions.html
http://www.javaranch.com/ubb/Forum15/HTML/000112.html
(Note: this last URL has an entry by Java authority Peter Haggar; he claims that your loop size needs to be greater than 1 million before eliminating the comparison gives you any benefit!)
Yeah, but for I/O, you're typically dealing with large files .. all you need for a million operations is 1 megabyte of data to be transferred. Also, it's not so much a matter of style in the case of I/O, because you need to catch the exception anyway, in case an error occurs.
"...because you need to catch the exception anyway, in case an error occurs."
That's the point. Exceptions are for exceptional conditions, not for normal control flow. This is a horrible bastardization of Java and its entire throwable feature-set.
Why not do the following, then?...
try {
while (true) {
something(someArray[i]);
i++;
}
} catch (IndexOutOfBoundsException e) {
// nothing to do
}
Well, unfortunately lots of 'exceptional' conditions happen often enough to be called 'normal'... the end of a stream *is* an exceptional condition, the condition that data cannot be read from it.
Whereas a little isEOF() method might be handy for files on the local machine, what happens for every other scenario, where the stream could be coming from anywhere, e.g. a socket?
>Well, unfortunately lots of 'exceptional' conditions happen often enough to be called 'normal'...
Perhaps.
>the end of a stream *is* an exceptional condition, the condition that data cannot be read from it.
NO! EOF is not an --exceptional-- condition. It is actually a totally normal condition that you should expect to happen sooner or later with EVERY stream.
EOF is to streams precisely what hasNext is to Iterators. Would you propose that the hasNext method be eliminated from Iterator and you just rely on Exceptions thrown from the next method to indicate that there is no more data? I would argue that that is a very bad idea. My suggested modifications (an isEOF method, or an expanded contract for available) would offer to streams the equivalent functionality of hasNext which they currently lack.
>Whereas a little isEOF() method might be handy for files on the local machine, what happens for every other scenario, where the stream could be coming from anywhere, e.g. a socket?
The handiness of an isEOF method has absolutely nothing to do with the eventual size of the stream.
If you look at my original posting, the whole point had to do with avoiding blocking your thread, and a non-blocking isEOF would actually be most useful for "streams coming from anywhere, e.g. a socket", and would be less useful for a local file.
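To make the hasNext analogy concrete, here is a minimal sketch of an isEOF built on the standard PushbackInputStream. EofAwareStream and countBytes are invented names. This version is NOT the non-blocking test argued for above: it has to read one byte ahead, so it blocks exactly as read does until either a byte or EOF arrives.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

class EofAwareStream extends PushbackInputStream {
    EofAwareStream(InputStream in) {
        super(in, 1); // room to push back the single look-ahead byte
    }

    /** True once the stream has ended. May block, like read(), until that is known. */
    boolean isEOF() throws IOException {
        int b = read();
        if (b == -1) {
            return true;
        }
        unread(b); // put the byte back so the next read() sees it
        return false;
    }

    /** Iterator-style consumption: test first, then read. */
    static int countBytes(InputStream source) throws IOException {
        EofAwareStream in = new EofAwareStream(source);
        int count = 0;
        while (!in.isEOF()) {
            in.read();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Prints 3
        System.out.println(countBytes(new ByteArrayInputStream(new byte[] {10, 20, 30})));
    }
}
```

It gives the hasNext-style loop shape, but a truly non-blocking test would need support from the stream implementation itself.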
EOF only truly makes sense for files. I guess you could rename it to EOS, but for networking it's still more often an exceptional condition, probably because you usually know how much data you're getting before you start reading it.
I don't know Java deep down, but I don't see the point of reading a file byte by byte. You're talking about optimisations of a few cycles here while still reading the file byte by byte. If you're reading, say, a 1 meg file, there's first FileInputStream, which has a buffer... then there's BufferedInputStream, which also has a buffer, and then there's probably BufferedReader... the layering behind one readByte is quite massive. **** I miss the days of Pascal. If I wanted to read a file I just did
blockread(f, data, 65535);
Then again... I don't miss it. Reading a 1 meg file means dealing with EMS, or something.
Anyway, my point is: why would you want to read the stream byte by byte? Socket handling is no argument for it either, since data is delivered to sockets in packets of more than one byte.
Here's my suggestion: why don't you read a byte array of all available bytes, handle it in a tight loop, and then see if there's another (packet) available?
MikaelsStream s = new MikaelsStream(target);
while (!s.hasTerminated())   // keep going until the stream has ended
{
    byte data[] = s.getBytes();
    int length = s.getBytesLength(); // the array might not be full all the time
    for (int i = 0; i < length; i++)
    {
        System.out.println(data[i]); // or whatever you want to do
    }
}
My stream class would be double-buffered. It has two byte buffers, and while some thread is filling one buffer, the other buffer can be handled. Calling getBytes() swaps the buffers. Filling the stream is done with the methods getInputBytes() and setInputBytesLength().
Hi,
I agree that having an InputStream.available() method whose contract would stipulate that it returns -1 if the end of stream is reached would be useful.
Anyway, in the case you specified, with an InputStream on a file you have an exclusive lock on, there might be a solution because you actually know the length of the file you are reading from. So you can use that to build your own InputStream implementation for streams that have a known length.
You can do something like:
File file = new File("some_file_name");
InputStream inBasicStream = new FileInputStream(file);
InputStream inExtendedStream = new ExtendedInputStream(inBasicStream, file.length());
where the ExtendedInputStream class is something like:
import java.io.IOException;
import java.io.InputStream;

public class ExtendedInputStream extends InputStream {
    private final long lStreamLength;
    private long lReadLength = 0;
    private final InputStream inUnderlyingStream;

    public ExtendedInputStream(InputStream inUnderlyingStream, long lStreamLength) {
        this.inUnderlyingStream = inUnderlyingStream;
        this.lStreamLength = lStreamLength;
    }

    public int read() throws IOException {
        int nReadData = inUnderlyingStream.read();
        if (nReadData != -1) { // count the byte only if one was actually read
            lReadLength++;
        }
        return nReadData;
    }

    public int read(byte b[]) throws IOException {
        int nCurrentReadLength = inUnderlyingStream.read(b);
        if (nCurrentReadLength > 0) { // read returns -1 at end of stream
            lReadLength += nCurrentReadLength;
        }
        return nCurrentReadLength;
    }

    public int read(byte b[], int off, int len) throws IOException {
        int nCurrentReadLength = inUnderlyingStream.read(b, off, len);
        if (nCurrentReadLength > 0) { // read returns -1 at end of stream
            lReadLength += nCurrentReadLength;
        }
        return nCurrentReadLength;
    }

    public long skip(long n) throws IOException {
        long lSkippedLength = inUnderlyingStream.skip(n);
        lReadLength += lSkippedLength;
        return lSkippedLength;
    }

    // Non-standard contract: returns -1 once the known length has been consumed.
    public int available() throws IOException {
        if (lReadLength >= lStreamLength) {
            return -1;
        } else {
            return inUnderlyingStream.available();
        }
    }

    public void close() throws IOException {
        inUnderlyingStream.close();
    }
}
and then you could count on the available method to tell you when the end of the stream is reached.
This thread is awfully old, but I'm curious if there was ever any resolution. It seems to me like it would be a minimal addition to the InputStream API that would be a huge help.
I think the issue of read consuming time while it was blocked was either a bug in 1.3, or the profiling output was misconstrued.
It was perhaps unfortunate that available() continues to return zero even when the socket connection has been closed from the remote end, but it's too late to change now. Adding an isEof() method could be done, but since channels provide a more general solution, I can't see that happening either.
So...of historical interest only.
Sylvia.
> I think the issue of read consuming time while it was
> blocked was either a bug in 1.3, or the profiling
> output was misconstrued.
So now in 1.4.1, calls to InputStream.read() do not consume system resources if they block for input?
> but since channels provide a more general solution, I
> can't see that happening either.
But afaik, channels are currently only available for use with sockets and thus not universally applicable to all I/O scenarios. Perhaps you are thinking that more channel classes will be developed in the future for other I/O needs besides just sockets?
> So now in 1.4.1, calls to InputStream.read() do not
> consume system resources if they block for input?
I would hesitate to answer for MS-Windows, since who knows how they implemented file I/O, but in the general case, a read operation -- from disk or socket -- won't waste CPU. Taking *NIX as the general case :-), and giving a 30,000 foot view: the read(2) syscall attempts to find the data in an OS buffer; if it cannot, then it registers itself as waiting for data and relinquishes control of the CPU. When data becomes available, a hardware interrupt occurs, the OS reads the data into a buffer, and schedules the waiting process to run.
Going to Sylvia's point about misinterpreting profiler output: a profiler does not track actual CPU time consumed; it tracks entry-to and exit-from a procedure, and tracks the time between those two points. That's why you'll see main() listed as consuming 100% of the CPU time. A blocking operation will still accumulate time from entry to exit; if there are multiple threads, in fact, you should see > 100% CPU consumed by read().
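One way to check this directly, rather than through a profiler, is to compare the CPU time charged to a blocked thread with the wall-clock time it spent inside read. The sketch below assumes a modern JDK with java.lang.management and uses a java.nio.channels.Pipe as a stand-in for a slow data source; the class name BlockedReadCpu and the 500 ms wait are arbitrary choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;

class BlockedReadCpu {
    /**
     * Parks a thread in a blocking read() for roughly waitMillis.
     * Returns {cpuNanos, wallNanos} for that thread; cpuNanos is -1 if the
     * JVM cannot report per-thread CPU time.
     */
    static long[] measure(long waitMillis) throws IOException, InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Pipe pipe = Pipe.open();
        long[] result = new long[2];
        try (Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            InputStream in = Channels.newInputStream(source);
            Thread reader = new Thread(() -> {
                long start = System.nanoTime();
                try {
                    in.read(); // blocks until the main thread writes a byte
                } catch (IOException e) {
                    // ignored in this sketch
                }
                result[1] = System.nanoTime() - start;
                result[0] = threads.isCurrentThreadCpuTimeSupported()
                        ? threads.getCurrentThreadCpuTime() : -1;
            });
            reader.start();
            Thread.sleep(waitMillis);                    // reader stays blocked meanwhile
            sink.write(ByteBuffer.wrap(new byte[] {1})); // release the reader
            reader.join();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        long[] r = measure(500);
        System.out.println("CPU ms:  " + (r[0] < 0 ? "n/a" : String.valueOf(r[0] / 1_000_000)));
        System.out.println("wall ms: " + r[1] / 1_000_000);
    }
}
```

If the blocked thread were spinning, the CPU figure would be close to the wall-clock figure; if the OS really parks the thread, it should be a small fraction of it.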
There seems to be a difference in the behavior of BufferedInputStream.read() between WinNT/2K and Linux/Solaris. Let me explain,
On WinNT/2K, BufferedInputStream.read() is more "diligent" than on Linux/Solaris, meaning it returns immediately when data is available again. On Linux/Solaris, once the buffer is empty, read() falls into a long sleep and even causes the data sender side to halt. Try it with a realtime application such as voice communication, and you will see that the latter causes data to be skipped.
Anyone else noticed the same?
Hello!
I am trying to work with the Java Communication API to read and store some 50 Megs of binary data from the serial input.
I have an InputStream is, and I am supposed to read and save it in a file hus.bin. abuf is a 200 byte array.
I am using the following code.
BufferedInputStream in = new BufferedInputStream(is);
FileOutputStream fileo = new FileOutputStream("hus.bin", true);
for (int i = 0; i < abuf.length; i++) {
    int newData = in.read();
    if (newData == -1) {
        break;
    }
    abuf[i] = (byte) newData;
    if (i == abuf.length - 1) {
        fileo.write(abuf);
        i = 0;
    }
}
The problem is that after some time some of the bytes are duplicated or missing.
I have tried different byte lengths but I am facing the same problem. I would appreciate some help or guidance.
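One possible cause, judging from the posted code alone and assuming the assignment was meant to be abuf[i] = (byte)newData: after a full block is written, i is reset to 0, but the for loop's i++ then moves it to 1 before the next read. From the second block on, abuf[0] is never refreshed, so each written block carries one stale byte. A partially filled buffer is also never written when the stream ends. A minimal sketch that avoids both problems by letting read report how many bytes it delivered (BlockCopy and copy are invented names):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BlockCopy {
    /** Copies in to out until EOF and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int blockSize) throws IOException {
        byte[] abuf = new byte[blockSize];
        long total = 0;
        int n;
        while ((n = in.read(abuf)) != -1) {
            out.write(abuf, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1001];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Prints 1001
        System.out.println(copy(new ByteArrayInputStream(data), out, 200));
    }
}
```

In the original setting, the serial InputStream would be passed as in and the FileOutputStream for hus.bin as out.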
Rgds,