XML parsing - object creation - error handling... design suggestions?

I'm new to OO programming, so I'm seeking some design help. I need to write a process that will parse an XML file and generate POJOs.

The problem is that the structure of the XML file changes quite a lot. Sometimes some attributes are there, sometimes not.

I want to define rules for my POJO creation. For example, say a POJO consists of attributes A and B, where A is required and B is optional. If the XML file contains both A and B, all is fine and the POJO is created. If B is missing, generate a warning (which will be logged). If A is missing, fail object creation (which will also be logged).

Is there such a framework out there already? How would you go about designing this process so it is reusable and easy to maintain? What patterns would you use?

Any help is appreciated.

By manu212a at 2007-11-26 14:46:49
# 1

http://www.springframework.org/

It won't log that certain attributes weren't present, though. Your own code will have to do that, although to be honest I'd question its usefulness. I suppose you could use Spring's AOP framework to achieve something similar, but if you're new to OO you probably don't want to muddy the waters further with AOP just yet :-)

Why do you want to log that the properties weren't there? Since you're putting the configuration together, you'll already know that!

Your "patterns" question can't really be answered; design patterns don't tend to tackle entire projects in one go :-)

georgemca at 2007-7-8 8:34:29 > top of Java-index,Other Topics,Patterns & OO Design...
# 2
http://www.castor.org/ http://jakarta.apache.org/commons/digester/
YoGeea at 2007-7-8 8:34:29
# 3

The reason I want to log all this is that the file comes from another vendor. If something is wrong with the file and it doesn't contain all the information I need, I want to reject it and send it back to the vendor with an explanation of what's wrong.

I guess what I meant by 'pattern' is how you would lay out the classes in the framework. For example, I was thinking I would first have an attribute class that defines whether an attribute is required or optional, then a parser class that iterates over all these attributes and generates the Java object. Any more insights you can offer in this area?
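A minimal sketch of that attribute-class-plus-parser layout, using only the JDK's built-in DOM parser (Java 16+ for records). The names `AttributeRule`, `Requirement`, `Report` and `Item`, and the attributes `A`/`B`, are hypothetical and taken from the example in the question, not from any framework:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RuleBasedParser {

    /** Whether an attribute must be present. */
    enum Requirement { REQUIRED, OPTIONAL }

    /** The "attribute class": a name plus whether it is required. */
    record AttributeRule(String name, Requirement requirement) {}

    /** Collects problems so they can be logged or sent back to the vendor. */
    static final class Report {
        final List<String> warnings = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Hypothetical POJO: A is required, B is optional. */
    record Item(String a, String b) {}

    static final List<AttributeRule> ITEM_RULES = List.of(
            new AttributeRule("A", Requirement.REQUIRED),
            new AttributeRule("B", Requirement.OPTIONAL));

    /** Checks every rule; returns false if any required attribute is missing. */
    static boolean check(Element element, List<AttributeRule> rules, Report report) {
        boolean ok = true;
        for (AttributeRule rule : rules) {
            if (element.hasAttribute(rule.name())) {
                continue;
            }
            String msg = "<" + element.getTagName() + "> is missing attribute " + rule.name();
            if (rule.requirement() == Requirement.REQUIRED) {
                report.errors.add(msg);   // fail object creation
                ok = false;
            } else {
                report.warnings.add(msg); // only warn
            }
        }
        return ok;
    }

    /** The "parser class": returns an Item, or null if a required attribute is missing. */
    static Item parseItem(String xml, Report report) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        if (!check(root, ITEM_RULES, report)) {
            return null;
        }
        // getAttribute returns "" for an absent attribute, so map absence to null
        String b = root.hasAttribute("B") ? root.getAttribute("B") : null;
        return new Item(root.getAttribute("A"), b);
    }

    public static void main(String[] args) throws Exception {
        Report report = new Report();
        System.out.println(parseItem("<item A=\"1\" B=\"2\"/>", report));
        System.out.println(parseItem("<item A=\"1\"/>", report));
        System.out.println(parseItem("<item B=\"2\"/>", report));
        System.out.println("warnings: " + report.warnings);
        System.out.println("errors: " + report.errors);
    }
}
```

A missing required attribute makes `parseItem` return null and records an error; a missing optional one only records a warning. The `Report` lists are what you would log or send back to the vendor. Adding a new POJO means adding a new rule list and a small parse method, rather than touching the checking logic.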

thanks for your help!

manu212a at 2007-7-8 8:34:29
# 4
You should supply your vendor with a DTD or an XML Schema; then all you need to do is validate the XML against the DTD/Schema and tell them if it fails. Finding the error would be their problem.
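For the schema route, the JDK's `javax.xml.validation` API is enough. A sketch, assuming a hypothetical schema in which `item` requires attribute `A` and allows `B`; a custom `ErrorHandler` collects every error instead of stopping at the first one, so the whole list can go back to the vendor in one report:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class SchemaCheck {

    // Hypothetical schema: <item> must carry attribute A; B is optional.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="item">
            <xs:complexType>
              <xs:attribute name="A" type="xs:string" use="required"/>
              <xs:attribute name="B" type="xs:string" use="optional"/>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    /** Validates xml against XSD and returns every problem found (empty list = valid). */
    static List<String> validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        // Record errors and keep going; only malformed XML aborts validation.
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<item A=\"1\" B=\"2\"/>")); // [] (valid)
        System.out.println(validate("<item B=\"2\"/>"));         // reports the missing A
    }
}
```

In a real setup the schema would be loaded from the same file you hand to the vendor, so both sides validate against exactly the same rules.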
dcmintera at 2007-7-8 8:34:29
# 5

I did a lot of work with XML at my last job, and I thought long and hard about how to deal with this issue because what we were doing (generating JAXB classes from schemas) was horrible. Here's what I would recommend.

Create your POJOs first. Then use a tool to generate schemas and the binding code from the classes. I think JAXB 2 can do this, and perhaps Castor. This is much easier than trying to generate Java classes from schemas, and it's also a lot cleaner.

Use XSLT to create mappings from the input schemas to the schemas you have generated from the classes.
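A small sketch of such a mapping, run through the JDK's `javax.xml.transform` API. Both formats are hypothetical: the vendor sends `<rec id="..." desc="..."/>` and the canonical format generated from the POJO is `<item A="..." B="..."/>`. The stylesheet copies an attribute only when the vendor actually supplied it, so a missing required value stays visible to whatever validation runs next:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class VendorMapping {

    // Hypothetical mapping from the vendor's <rec> to our canonical <item>.
    // xsl:if guards mean absent vendor attributes produce absent canonical ones,
    // rather than empty strings that would slip past a "required" check.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/rec">
            <item>
              <xsl:if test="@id">
                <xsl:attribute name="A"><xsl:value-of select="@id"/></xsl:attribute>
              </xsl:if>
              <xsl:if test="@desc">
                <xsl:attribute name="B"><xsl:value-of select="@desc"/></xsl:attribute>
              </xsl:if>
            </item>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Applies the mapping stylesheet to one vendor document and returns canonical XML. */
    static String toCanonical(String vendorXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(vendorXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toCanonical("<rec id=\"1\" desc=\"widget\"/>"));
        System.out.println(toCanonical("<rec id=\"2\"/>"));
    }
}
```

Each new vendor format then needs only a new stylesheet; the Java side and the canonical POJOs stay the same.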

For validation, you can approach it a number of ways. One is to apply a third schema that checks the requirements. Don't update the generated schema (at least not manually), because you will probably need to regenerate it later. A second option is to create the object (POJO, if you like) and apply validation rules to it before you return it from a factory. This is probably the most straightforward but least flexible option. A third is to use a rules engine to verify either of the XML documents or the object itself. This is the most flexible but adds more complications. Which validation scheme you should use depends on how many of these mappings you plan to do.
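The factory option might look something like this sketch. `Order`, its fields, the two rules and `ValidationException` are all hypothetical; each rule is just a function that returns an error message or null, and the factory refuses to hand out an object that breaks any of them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ValidatingFactory {

    /** Hypothetical POJO built from the canonical XML. */
    record Order(String id, Integer quantity, String note) {}

    /** Thrown when an object breaks one or more rules; keeps every violation. */
    static final class ValidationException extends Exception {
        final List<String> violations;

        ValidationException(List<String> violations) {
            super(String.join("; ", violations));
            this.violations = List.copyOf(violations);
        }
    }

    // Each rule returns an error message, or null if the object passes.
    static final List<Function<Order, String>> ORDER_RULES = List.of(
            o -> o.id() == null ? "id is required" : null,
            o -> o.quantity() == null || o.quantity() <= 0 ? "quantity must be positive" : null);

    /** Builds an Order, but only returns it if every rule passes. */
    static Order createOrder(String id, Integer quantity, String note) throws ValidationException {
        Order order = new Order(id, quantity, note);
        List<String> violations = new ArrayList<>();
        for (Function<Order, String> rule : ORDER_RULES) {
            String message = rule.apply(order);
            if (message != null) {
                violations.add(message);
            }
        }
        if (!violations.isEmpty()) {
            throw new ValidationException(violations);
        }
        return order;
    }

    public static void main(String[] args) {
        try {
            System.out.println(createOrder("42", 3, null));
            createOrder(null, 0, "rush");
        } catch (ValidationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running all rules before throwing, rather than failing on the first, gives a complete list of problems for the rejection notice.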

dubwaia at 2007-7-8 8:34:29
# 6

> You should supply your vendor with a DTD or an XML Schema, then all you need to do is validate the XML against the DTD/Schema and tell them that if it fails. Finding the error would be their problem.

This can work if the validation rules are simple. A lot of useful rules cannot be represented in a schema. For example, you can't really specify that an element contains at least one of its children. Well, you can, but it's a really ugly permutation of all allowed combinations. The other problem with schema-based validation is that the error messages are often inscrutable: "Value does not match facet ..." and whatnot. There's also RELAX NG, which I have no experience with, but I have read good things about it.

dubwaia at 2007-7-8 8:34:29
# 7

If you go with what I am suggesting, I know there is a free visual XSL editor out there somewhere; I just can't find it. If you can get someone willing to shell out for a good tool, try out the free Altova demos. It's been a long while since I used their software, but it pretty much rocked when I did, and it's probably better now.

dubwaia at 2007-7-8 8:34:29
# 8

> For example, you can't really specify that an element contains at least one of it's children.

Why can't you just add this to its type definition?

<element name="Foo" type="string" minOccurs="1"/>

> The othe problem with doing schema based validation is that the errors messages are often inscrutable. "Value does not match facet ..." and what not.

I'd make that the vendor's problem :-)

dcmintera at 2007-7-8 8:34:29
# 9

> > For example, you can't really specify that an element contains at least one of it's children.
>
> Why can't you just add this to its type definition?
>
> <element name="Foo" type="string" minOccurs="1"/>

That's not really applicable to the situation I mean. All that says is that Foo must occur. What I mean is a situation like this:

<parent>
  <childA/>
  <childB/>
  <childC/>
</parent>

Any combination of A, B, or C may be specified, but at least one must occur. There's no clean way to specify this relationship in the schema. We ran into this all the time with the standards we were using; the only place it was specified was in the documentation. We often had problems with partners sending invalid data that passed schema validation.
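Since the schema can't say this cleanly, one option is a small Java check that runs after schema validation passes. A sketch using the JDK's DOM API; the element names are the ones from the example above, and `hasAnyOf` is a hypothetical helper name:

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AtLeastOneChild {

    /** True if parent has at least one direct child element whose name is in allowed. */
    static boolean hasAnyOf(Element parent, Set<String> allowed) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            // Skip text/whitespace nodes; only element children count.
            if (n.getNodeType() == Node.ELEMENT_NODE && allowed.contains(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Set<String> abc = Set.of("childA", "childB", "childC");
        System.out.println(hasAnyOf(parse("<parent><childB/></parent>"), abc)); // true
        System.out.println(hasAnyOf(parse("<parent/>"), abc));                  // false
    }
}
```

The rule that previously lived only in the documentation now lives in one named, testable method, and its failure message can be as explicit as you like.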

And DTDs are a waste of time. They are way too simplistic.

> > The othe problem with doing schema based validation is that the errors messages are often inscrutable. "Value does not match facet ..." and what not.
>
> I'd make that the vendor's problem :-)

If that's acceptable, I'm with you. It just depends on what kind of relationship you have with the people supplying the data. Really, they should be validating it before they even send it, so your responsibility for explaining what went wrong should be nearly nil.

dubwaia at 2007-7-8 8:34:29
# 10

Thanks for everyone's input.

dubwai - I really like your solution. I'll start investigating how to map the generated schema to the real schema. This will be the most difficult part, since I haven't used XSLT before.

Do you know of any rules-based validation framework I can use to validate my generated objects? Ideally, I'd like to specify my rules in a config file and have the framework validate objects against those rules.

Thanks for your help.

manu212a at 2007-7-8 8:34:29
# 11

> thanks for everyone's input.
>
> dubwai - i really like your solution. I'll start investigating how to map the generated schema with the real schema. This will be the most difficult part since I haven't used XSLT before.

I would definitely suggest using a visual editor. These let you load two schemas and drag and drop arrows from source to target. You can also use XPath functions to make changes like conversions, merging, and splitting data. Then, if you care, you can look at what the tool is creating and learn more about XSL. It should also be possible to create custom XPath functions, which are useful for custom date formatting or other common tasks.

The tool I used in the past was actually just creating XSL, but each instance was basically equivalent to an XSLT document. Once you create these mappings, running the transformation in Java should be fairly trivial.
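One JDK detail worth knowing when running many files through the same mapping: `TransformerFactory.newTemplates` compiles a stylesheet once into a thread-safe `Templates` object, and each document then gets a fresh `Transformer` (which is not thread-safe). A minimal sketch; `CompiledMapping` is a hypothetical wrapper name and the stylesheet in `main` is a toy:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledMapping {

    private final Templates templates;

    /** Compiles the stylesheet once; the Templates object can be shared across threads. */
    CompiledMapping(String xsl) throws Exception {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Runs the compiled mapping on one document, using a new Transformer per call. */
    String apply(String xml) throws Exception {
        Transformer t = templates.newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Toy mapping: output the id attribute of <rec> as plain text.
        String xsl = "<xsl:stylesheet version=\"1.0\" "
                + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\"><xsl:value-of select=\"/rec/@id\"/></xsl:template>"
                + "</xsl:stylesheet>";
        CompiledMapping mapping = new CompiledMapping(xsl);
        System.out.println(mapping.apply("<rec id=\"7\"/>"));
        System.out.println(mapping.apply("<rec id=\"8\"/>"));
    }
}
```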

> do you know of any rules-based validation framework which i can use to validate my generated objects?

No. Sorry, I can't make any specific recommendations there.

> ideally, i'd like to be able to specify my rules in a config file and have the framework validate objects against those rules.

You could roll your own; if your rules are pretty limited in scope, that might work. You could use a package like Drools (I have no experience with it; I just know of it). Or, one idea that I think might work really well is to use a scripting language like Groovy or Jython. You are really going to have to make this call.
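If you do roll your own, a config-driven version can start very small. This sketch assumes a hypothetical rules file in `java.util.Properties` format with lines like `item.A=required` and `item.B=optional`; any value other than `required` is treated as optional. The class and record names are also hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ConfigRules {

    /** Outcome of checking one element: errors reject it, warnings are just logged. */
    record Result(List<String> errors, List<String> warnings) {
        boolean accepted() {
            return errors.isEmpty();
        }
    }

    private final Properties rules = new Properties();

    /** Loads rules of the assumed form element.attribute=required|optional. */
    ConfigRules(String config) throws IOException {
        rules.load(new StringReader(config));
    }

    /** Checks the attributes of one element, given as a name -> value map. */
    Result check(String element, Map<String, String> attributes) {
        List<String> errors = new ArrayList<>();
        List<String> warnings = new ArrayList<>();
        String prefix = element + ".";
        for (String key : rules.stringPropertyNames()) {
            if (!key.startsWith(prefix)) {
                continue; // rule belongs to a different element
            }
            String attr = key.substring(prefix.length());
            if (attributes.containsKey(attr)) {
                continue;
            }
            if (rules.getProperty(key).trim().equals("required")) {
                errors.add(element + ": missing required attribute " + attr);
            } else {
                warnings.add(element + ": missing optional attribute " + attr);
            }
        }
        return new Result(errors, warnings);
    }

    public static void main(String[] args) throws IOException {
        ConfigRules rules = new ConfigRules("item.A=required\nitem.B=optional\n");
        System.out.println(rules.check("item", Map.of("A", "1")));
        System.out.println(rules.check("item", Map.of("B", "2")));
    }
}
```

Changing which attributes are required then means editing the config file rather than recompiling; anything beyond presence checks (ranges, cross-field rules) is where a real rules engine or a scripting language starts to pay off.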

And I agree with dcminter here: if you can get away with just using schema validation, do it. It will save you gobs of time and trouble.

> thanks for your help.

Glad to do it. Since I left that job, I haven't done any XML stuff. I was considering writing a package to create schemas from classes, but I saw that JAXB 2 does this. I was going to look into how well it worked but got sidetracked. Anyway, it's good to get these things written down before I forget.

Just so you know, we were doing something similar, where we had a set of canonical XML files and a many-input-formats-to-one-canonical setup with XSL mappings. This worked well, although I think we were doing too much logic in XSL; Java's better for logic. The problem came when we moved the data into Java. We had created classes from the canonical schemas and were marshalling with JAXB, but it didn't get us anywhere. We still had to walk the tree of the XML document, only now with hardcoded Java. DOM would be better. Essentially, we still needed to map the data to real business objects, and it was all hardcoded Java with hundreds of null checks all over the place. In short, it's hard to write good Java classes using the W3C schema language.
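One way to walk a DOM into business objects without hundreds of scattered null checks is the JDK's `javax.xml.xpath` API: the absence check lives in one helper, and each field becomes a single path expression. A sketch with a hypothetical `Customer` object and made-up paths, not the setup described above:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathMapper {

    /** Hypothetical business object. */
    record Customer(String name, String city) {}

    private final XPath xpath = XPathFactory.newInstance().newXPath();

    /** Text of the first node matching expr, or null if nothing matches: one null check, in one place. */
    String text(Document doc, String expr) throws Exception {
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    /** Maps a document to a Customer; nested paths replace hand-written tree walking. */
    Customer map(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return new Customer(text(doc, "/order/customer/name"),
                            text(doc, "/order/customer/address/city"));
    }

    public static void main(String[] args) throws Exception {
        XPathMapper mapper = new XPathMapper();
        System.out.println(mapper.map(
                "<order><customer><name>Ann</name><address><city>Oslo</city></address></customer></order>"));
        System.out.println(mapper.map("<order><customer><name>Bob</name></customer></order>"));
    }
}
```

The resulting object can then go through the same validation rules as any other, so a missing field is reported once instead of being guarded against at every step of the walk.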

dubwaia at 2007-7-8 8:34:29