Design decision about using a Collection's view
I hope this is the right forum to ask this question since it deals with a design decision.
I'm writing an implementation of a new file format called "NAP" (Network cAPture), which stores captured network packets in a file, similar to the PCAP and SNOOP file formats.
I'm designing the public API for this new file format around the Collections framework's List and Collection interfaces. The format simply consists of Records within the file, and I'm providing a List interface view of those records. Any changes to the list are reflected in the physical file on disk.
I've created 2 new interfaces, IOList and IOCollection, which have exactly the same methods as the standard List and Collection interfaces with 2 exceptions: the letters "IO" are appended to each method's name, and each method throws IOException. Both extend the List and Collection interfaces, so converting from IOList to List is automatic.
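For readers following along, here is a minimal sketch of what such interfaces might look like. The method names addIO(), sizeIO() and getIO() and the InMemoryIOList class are assumptions made for illustration; the real API may differ. The "IO" suffix is needed because Java does not let an overriding method declare checked exceptions that the overridden method does not declare, so add() itself cannot be redeclared to throw IOException.

```java
import java.io.IOException;
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical interfaces: IO-suffixed twins of the standard methods.
interface IOCollection<E> extends Collection<E> {
    boolean addIO(E element) throws IOException;
    int sizeIO() throws IOException;
}

interface IOList<E> extends IOCollection<E>, List<E> {
    E getIO(int index) throws IOException;
}

// In-memory stand-in for a file-backed list, only to show that the types fit together.
class InMemoryIOList<E> extends AbstractList<E> implements IOList<E> {
    private final List<E> records = new ArrayList<E>();

    public boolean addIO(E element) throws IOException { return records.add(element); }
    public int sizeIO() throws IOException { return records.size(); }
    public E getIO(int index) throws IOException { return records.get(index); }

    // Standard List methods: these cannot declare IOException.
    @Override public E get(int index) { return records.get(index); }
    @Override public int size() { return records.size(); }
    @Override public void add(int index, E element) { records.add(index, element); }
}
```

Because IOList extends List, an InMemoryIOList can be assigned to a List variable without a cast, which is the "automatic conversion" described above.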
Here is the problem. If I supply List and Collection interface views of my IOList and IOCollection objects, which internally generate IOExceptions, how do I relay the exceptions to the user when he uses the List and Collection methods, which do not declare any checked exceptions?
My current solution is to throw a special RuntimeIOException, which subclasses RuntimeException. I felt that the user had to be notified of any exceptions thrown internally. Of course, the user can overlook the fact that he needs to catch the RuntimeIOException from the standard List and Collection methods and be unpleasantly surprised, probably at the most inopportune moment possible. At least that is my worry.
Has anyone previously tried something like this? Any advice?
Here is a link to my blog that goes into this issue and has an example of the API:
http://jnetpcap.sourceforge.net/?q=node/9
[1832 byte] By [voytechsa] at [2007-10-2 23:00:16]

> I'm designing the public API for this new file format
> around the Collections framework's List and Collection
> interfaces. The format simply consists of Records
> within the file, and I'm providing a List interface
> view of those records. Any changes to the list are
> reflected in the physical file on disk.
>
> I've created 2 new interfaces, IOList and IOCollection
> (...) Both extend the List and
> Collection interfaces, so converting from IOList to List is automatic.
1) You may want to reconsider extending the well-known interfaces. Of course, if you give up the inheritance, the exception handling problem disappears.
Quoted from your blog:
The reason for this is that although the IOList and IOCollection interfaces are the preferred way of accessing the contained data, since they declaratively throw the needed IOException, because of the immense flexibility and familiarity of the List and Collection interfaces, it's worthwhile allowing these extremely common interfaces as views as well.
The reason stated on your blog justifies using a similar design and similar method names in IOList and IOCollection as in List and Collection, but it is not in itself enough of a reason to extend List and Collection.
Extending List and Collection means that the new classes may be used wherever existing code can already use Lists and Collections.
There is certainly a benefit in that existing code that already manipulates a List or Collection supplied as a method argument would be able to manipulate instances of your new classes almost transparently (exception handling apart).
But I'm not sure that is a primary goal of your API: in-file sorting and in-file searching, for example, might not be a very good idea, though I do see how handy it looks API-wise.
Note that I don't take into account code reuse of the collections implementation classes (e.g. AbstractList, ArrayList,...), since you declare to only extend the collections interfaces.
2) Even if you extend List and Collection, you can still provide specific public methods (e.g. an addIO() alongside the standard List's add(), a getIO() alongside List's get()). This way, code that knows about your IOXxx classes can use the specific methods.
3)
> my IOList and IOCollection objects (...) internally generate
> IOExceptions, how do I relay the exceptions to the
> user when he uses the List and Collection methods
> which do not throw any explicit exceptions.
I also read the part on your blog where you justify propagating the exceptions. There are several sides to it:
* Client code must not believe an add/get operation succeeded if it actually ended in a no-op or, worse, a half-op. So you have to throw an exception, no question.
* "Abstract" client code (which knows only about List and Collection) cannot do anything specific with your IO exceptions: it just needs to get an exception, and doesn't care whether the exception is of any special type (RuntimeIOException or anything else).
* The compiler forces the exception to be a RuntimeException, so you can either:
- use a custom RuntimeIOException class, as you went
- use whatever existing RuntimeException subclass fits the specific problem (e.g. NoSuchElementException seems OK for an end-of-file, IllegalArgumentException if the data to add is bigger than whatever limit you might have, IllegalStateException if the underlying problem is related to file access rights,...)
In both cases, you probably ought to chain the RuntimeException to the root-cause IOException.
* "Specific" client code (that knows about your specific IOList and IOCollection) would be neater using specific methods instead of going through the List and Collection methods (which would require walking up the exception chain to handle the specific exceptions).
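To illustrate the chaining suggested above, here is a minimal sketch, not the actual jNetPcap code. RecordSource and RecordListView are hypothetical names invented for this example; only RuntimeIOException comes from the discussion. On Java 8 and later, java.io.UncheckedIOException serves the same purpose as the custom class.

```java
import java.io.IOException;
import java.util.AbstractList;

// Unchecked wrapper that keeps the original IOException as its cause.
class RuntimeIOException extends RuntimeException {
    RuntimeIOException(IOException cause) {
        super(cause.getMessage(), cause);
    }
}

// Hypothetical low-level access to records stored in a file.
interface RecordSource<E> {
    E read(int index) throws IOException;
    int count() throws IOException;
}

// Read-only List view that converts checked IO failures into RuntimeIOException.
class RecordListView<E> extends AbstractList<E> {
    private final RecordSource<E> source;

    RecordListView(RecordSource<E> source) {
        this.source = source;
    }

    @Override public E get(int index) {
        try {
            return source.read(index);
        } catch (IOException e) {
            throw new RuntimeIOException(e); // chained to the root cause
        }
    }

    @Override public int size() {
        try {
            return source.count();
        } catch (IOException e) {
            throw new RuntimeIOException(e);
        }
    }
}
```

Code that knows only about List just sees a RuntimeException; code that knows about the file can catch RuntimeIOException and inspect getCause() to recover the IOException.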
I slept on this and came to the same conclusion overnight.
I will only expose the IOList and IOCollection interfaces. This way it is very explicit, with no funny business. I can also change the method names and drop the "IO" appended to them, as there would be no conflict with the List and Collection method names. The compiler would not let the user forget about the exception handling.
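A minimal sketch of what that standalone design could look like, assuming the interface keeps the familiar method names. The exact method set and the ArrayBackedIOList class are assumptions for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Standalone interface: it mirrors List's method names but does not extend List,
// so every method is free to declare IOException.
interface IOList<E> {
    boolean add(E element) throws IOException;
    E get(int index) throws IOException;
    int size() throws IOException;
}

// In-memory stand-in for a file-backed implementation (hypothetical).
class ArrayBackedIOList<E> implements IOList<E> {
    private final List<E> records = new ArrayList<E>();

    public boolean add(E element) throws IOException { return records.add(element); }
    public E get(int index) throws IOException { return records.get(index); }
    public int size() throws IOException { return records.size(); }
}
```

Any caller of add(), get() or size() now has to catch IOException or declare it, so the error handling cannot be forgotten by accident.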
It really helps to write about a problem, sleep on it and get some great feedback. Thanks a bunch.
I have to question this whole design. It seems like a poor choice to keep a file open and write changes back in real time: firstly because it's extremely inefficient, and secondly because it greatly increases the chances of corrupting the files (i.e. leaving them in an invalid state). For example, if you sort the List with the Collections API, Collections.sort copies the elements into an array, sorts the array, and then writes every element back into the list one position at a time. Consider what happens on an IO error partway through those writes.
Why not just add load and save methods to your collection? Then the users only have to worry about IO errors at those points, and computation won't be bound to (slow) disk IO.
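A rough sketch of the load/save idea, under stated assumptions: RecordStore is a hypothetical helper, and the length-prefixed record layout is made up for this example and is not the NAP format. It uses try-with-resources, which requires Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all IO happens in load() and save(); in between,
// the caller works on an ordinary in-memory List.
class RecordStore {
    // Reads a record count followed by length-prefixed records.
    static List<byte[]> load(File file) throws IOException {
        List<byte[]> records = new ArrayList<byte[]>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] data = new byte[in.readInt()];
                in.readFully(data);
                records.add(data);
            }
        }
        return records;
    }

    // Writes the records back in the same illustrative layout.
    static void save(List<byte[]> records, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(records.size());
            for (byte[] data : records) {
                out.writeInt(data.length);
                out.write(data);
            }
        }
    }
}
```

Sorting or searching the loaded list can then use the standard Collections methods without any IO in the middle. The trade-off is that everything has to fit in memory, which may not hold for very large capture files.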
That is certainly a very valid observation, and I would agree with you if this were the only API the user had for dealing with NAP files. There are several levels to this API, and this is one of them; which one to use depends on where the user needs to be. The user can use the List interface, which takes an indexed approach to file management, or he can choose the iterative approach by working at a slightly lower level of the API and doing his own random access as needed. The iterative approach buys him very little, since the file format allows very fast indexing even at extremely large file sizes. But that is his option.
The NAP file format is fairly complex compared to PCAP or SNOOP, and a list has everything I can think of for abstracting most of that complexity. I arrived at the list approach because my first attempts boiled down to what the List interface provides. The main thing missing from the List interface was IOException handling, but that is now resolved by using a List-like interface called IOList, which has the same methods as List but throws IOExceptions. There isn't anything more that can be done at this API level that the IOList interface can't do. The user may need to use a lower-level API to recover from errors: for example, when a record fails to be written, he can call the write() method directly or retrieve the back-end buffers and fix them up.
http://jnetpcap.sourceforge.net/docs/jnetcapture-1.0/draft-slytechs-network-nap-00.html
Here is what I'm trying to accomplish with all of the API I'm working on:
0) A least common denominator API for all file formats (SNOOP, PCAP, NAP, etc.). The List interface is not part of this common API, as it's impossible to index non-NAP files.
1) Store a large number of DataRecords (PacketRecords more specifically)
2) Break the file up into manageable chunks or Blocks (512 KB by default)
3) Keep certain META information about the PacketRecords and Blocks
4) Provide semi-indexing of Records. I say semi because indexes are calculated at the block level, not for individual data records
6) Manipulation
6a) Changing record size (increasing/decreasing)
6b) inserting/deleting records
7) Super fast packet counting
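As an illustration of block-level ("semi") indexing and fast counting from the list above, here is a sketch of one way it could work. It is not taken from the NAP draft: BlockIndex is a hypothetical class, and it assumes each block's metadata stores its record count and that every block holds at least one record.

```java
import java.util.Arrays;

// Maps a global record index to the block that contains it,
// using only per-block record counts.
class BlockIndex {
    private final long[] firstRecord; // firstRecord[i] = global index of block i's first record
    private final long total;

    BlockIndex(int[] recordsPerBlock) {
        firstRecord = new long[recordsPerBlock.length];
        long running = 0;
        for (int i = 0; i < recordsPerBlock.length; i++) {
            firstRecord[i] = running;
            running += recordsPerBlock[i];
        }
        total = running;
    }

    // Total record count without touching any individual record.
    long totalRecords() {
        return total;
    }

    // Binary search over the block start positions.
    int blockOf(long recordIndex) {
        if (recordIndex < 0 || recordIndex >= total) {
            throw new IndexOutOfBoundsException("record " + recordIndex);
        }
        int pos = Arrays.binarySearch(firstRecord, recordIndex);
        return pos >= 0 ? pos : -pos - 2; // -pos - 2 is the last block starting before the index
    }

    // Position of the record inside its block; a reader would scan the block from here.
    long offsetInBlock(long recordIndex) {
        return recordIndex - firstRecord[blockOf(recordIndex)];
    }
}
```

With per-block counts of 3, 2 and 4, record 4 falls in block 1 at offset 1, and the total of 9 comes from the block metadata alone.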
Other formats such as PCAP and SNOOP only provide #1 today. Manipulating data there is extremely difficult and basically means applying the changes to the original file while writing the result out to a new file. This doesn't work well for very large files. Plus, it's impossible to index the records, so the user deals with offsets into the file as a percentage instead of actual packet indexes.
The IOList abstraction sits on top of the lowest-level API, where the user can create his or her own records and invoke read() and write() methods on them directly. The list makes things much easier, as it provides ordering of records and caching, and hides intimate details of the format. All this lower-level functionality is accessible to the user directly as well.
For example, reordering records may involve just switching some IDs on the records without having to rewrite large amounts of data, or it may involve copying the records' data.
Most users won't deal with actual BlockRecords. They will be dealing with PacketRecords, which span multiple blocks. The list implementation there is actually quite different from the list implementation for BlockRecords. It uses SoftReferences for all records, so all DataRecords within the BlockRecord are cached. The user works with views of that cached list, which hide the fact that some of the information he asks for is actually retrieved from the PacketRecord and some from other supporting MetaRecords.
I'm designing this to work with 250 GB files (that's the size of my biggest file system), which will have millions of records. So the information retrieved has to be weakly referenced or not referenced at all. I'm using the SoftReference approach to increase performance by caching. I figure if I can make it work with a 250 GB file, it should work with a file of any size, although I'm sure my users will let me know for a fact whether it works or not.
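A minimal sketch of SoftReference-based caching along those lines, not the actual implementation. RecordLoader and SoftRecordCache are hypothetical names. The sketch is not thread-safe, and it never removes map entries whose referents have been cleared; a real implementation would probably pair it with a ReferenceQueue.

```java
import java.io.IOException;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook that reads one record from the file.
interface RecordLoader<R> {
    R load(long index) throws IOException;
}

// Cache whose entries the garbage collector may reclaim when memory runs low.
class SoftRecordCache<R> {
    private final Map<Long, SoftReference<R>> cache = new HashMap<Long, SoftReference<R>>();
    private final RecordLoader<R> loader;

    SoftRecordCache(RecordLoader<R> loader) {
        this.loader = loader;
    }

    R get(long index) throws IOException {
        SoftReference<R> ref = cache.get(index);
        R record = (ref != null) ? ref.get() : null; // null if never loaded or already collected
        if (record == null) {
            record = loader.load(index);             // fall back to the file
            cache.put(index, new SoftReference<R>(record));
        }
        return record;
    }
}
```

Softly reachable objects are guaranteed to be cleared before the JVM throws OutOfMemoryError, so the cache shrinks under memory pressure and the file is re-read only for records that were reclaimed.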
All that said, I'm not 100% sure how well this IOList interface will fill that particular niche. I am prepared to change the API if my initial tests with the reference implementation show weaknesses or, as you point out, it proves to be unreliable or hard to recover from errors. It certainly looks feasible, and my initial testing is producing incredible results (e.g. accessing any record by index within a 2 GB file in 25 ms, and indexing all the BlockRecords within a 2 GB file in 250 ms). Error recovery is much harder to test, but I will work on it.
OK, you've got a lot here and you are clearly thinking a lot on this and it's too late in my day to absorb all of it.
But I think I do understand a little bit more and my points above don't really apply.
I don't think, however, that extending the List interface with doppelganger methods is a good idea. It seems to have very little value to me, and I think you'd be better off just wrapping the IOExceptions as RuntimeExceptions and being done with it, if you stay on the route you are on. Adding parallel methods misses the point of interfaces. Anyone who uses your specific methods will be stuck. You will have to support these methods for eternity, and if someone wants to change their code to use lists from somewhere else, they will have to modify their code. Any developer worth his/her salt won't use your new methods.
Yeah, sry about the info overload. One thought just leads to another.
Hmmm, to be perfectly honest, I'm also bothered by the fact that it's not a clean List interface but some kind of derivative that looks like a Collections list but isn't. I got all excited about the Collections view of the file, but the devil is in the details.
Nothing has been released; the API is still under development. I'm writing all the back-end code and obviously still need to spend a lot of time on the public API part.
I don't think that you should use Collections. Collections are meant for data manipulation in RAM, not in file IO. Collections aren't suited well enough to the file IO that you will be doing. You can look at it this way: JDBC didn't use Collections, and neither should you =P.
Cool. This is exactly the type of advice I was looking for.
The common theme here is that, in reality, a Collections view for file IO is not a good idea, no matter how good it looks in principle.
Please don't think of me as a complete novice, but understand that writing public APIs is a beast of its own in the programming world. I've written several O.S. projects and I learn a lot with each one. In order to really help with writing public APIs, all my code in all my projects now separates the public API and the private implementation into separate modules. This forces me to think about what the user sees and uses vs. how to implement it. And groups like this are very helpful in keeping me in check.
I hope you guys won't mind if I ask you to review my efforts again once I have it all better defined and documented. My user group is vocal, but unfortunately only after the fact, which is not as helpful as feedback before it's all nailed down permanently.