Generating diagrams from code
I was wondering if anyone had any suggestions on ways of approaching the problem of building a tool to semi-automatically generate sequence diagrams from source code.
Will it be a matter of starting at the main method in the code and building a list of all of the methods which may be invoked?
As you can see, I am only at the beginning stages, and any suggestions and guidance would be greatly appreciated.
Thanks in advance.
Ali Atli
[464 byte] By [Mr.Smitha] at [2007-10-2 1:22:58]

I'm not aware of any book that tells you how to automatically generate sequence diagrams.
The best, most practical UML book I know is "UML Distilled" by Martin Fowler. Perhaps those tomes by the Three Amigos will have some answers.
Your biggest problem will be parsing all that Java. Your problem has a great deal in common with compiler design. After all, you're parsing a language for one grammar and writing it out in another. The input grammar in this case is Java; the output "grammar" is UML. How much do you know about compilers?
It's a three-step process:
(1) Read and parse the Java into a parse tree.
(2) Iterate through the Java parse tree and generate the relationships needed for the sequence diagram.
(3) Write out the UML in some format.
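The three steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the parsing step is replaced by a hand-built map (a real tool would fill it from a Java parse tree), and the class names, the sample call graph, and the "A -> B: method()" text notation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: every name and the output notation are illustrative only.
class CallGraphSketch {

    // Step 1 stand-in: caller "Class.method" -> callees in source order.
    // A real tool would build this map from a Java parse tree.
    static Map<String, List<String>> parsedCalls() {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("App.main", Arrays.asList("OrderService.place"));
        calls.put("OrderService.place", Arrays.asList("Inventory.reserve", "Mailer.send"));
        return calls;
    }

    static String owner(String qualified) {
        return qualified.substring(0, qualified.indexOf('.'));
    }

    static String method(String qualified) {
        return qualified.substring(qualified.indexOf('.') + 1);
    }

    // Step 2: depth-first walk from the entry point, recording each call as a message.
    static void walk(String caller, Map<String, List<String>> calls,
                     Set<String> onPath, List<String> out) {
        if (!onPath.add(caller)) {
            return; // caller is already on the current call path: stop to avoid looping forever
        }
        for (String callee : calls.getOrDefault(caller, Collections.<String>emptyList())) {
            out.add(owner(caller) + " -> " + owner(callee) + ": " + method(callee) + "()");
            walk(callee, calls, onPath, out);
        }
        onPath.remove(caller);
    }

    // Step 3: write the messages out, here as plain text, one message per line.
    static String render(String entryPoint) {
        List<String> out = new ArrayList<>();
        walk(entryPoint, parsedCalls(), new HashSet<String>(), out);
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        System.out.println(render("App.main"));
    }
}
```

Note that this lists every method that may be invoked, in source order, and ignores conditionals, loops, and polymorphic dispatch entirely.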
What will the output format be?
How will you "know" the client that kicks off each sequence diagram? If it's a JSP, it won't be part of the Java code you're parsing.
Where is the UML notion of "actor" in Java code? It's not there.
Frankly, I'm not sure it can be done. You won't have enough information to get anything sensible out of it.
Or perhaps I'm the one who's ignorant.
%
Cool, a really good question for a change.
I share Duffymo's view on this; I've used his post to add some more detail on the difficulty of this.
> I'm not aware of any book that tells you how to
> automatically generate sequence diagrams.
I would bet money that they don't exist, since even the most advanced UML CASE tools don't produce anything like worthwhile sequence diagrams.
There are likely to be some pure research papers on some aspects of this but I would expect these to be pretty tightly focused and largely theoretical.
> The best, most practical UML book I know is "UML
> Distilled" by Martin Fowler. Perhaps those tomes by
> the Three Amigos will have some answers.
I don't think this would be much help; IMHO the UML part of this project would be the easy part.
> Your biggest problem will be parsing all that Java.
> Your problem has a great deal in common with
> compiler design. After all, you're parsing a
> language for one grammar and writing it out in
> another. The input grammar in this case is Java; the
> output "grammar" is UML. How much do you know about
> compilers?
>
> It's a three-step process:
I think there are at least 4 steps.
> (1) Read and parse the Java into a parse tree.
JavaCC would probably go a long way towards providing this part, and the following article would be a good starting point. It includes 4 in-depth articles on parsing Java source.
https://javacc.dev.java.net/
http://java.sun.com/developer/technicalArticles/Parser/index.html
> (2) Iterate through the Java parse tree and generate
> the relationships needed for the sequence diagram.
> (3) Write out the UML in some format.
(4) Convert the static source code into the dynamic sequence diagram.
A sequence diagram represents a single path through the code during a single runtime context. Therefore there would have to be a tree of sequence diagrams, branching at each conditional in the source.
One way to approach this may be to sidestep the obvious approach of parsing the source and try an alternative: use (or modify) a code profiling tool to log the required information while running through the required use case.
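As a rough illustration of the runtime idea (not a real profiler), the sketch below wraps objects in `java.lang.reflect.Proxy` instances that log each call before forwarding it. It assumes the collaborators are reached through interfaces, and it takes the caller's name as a hand-supplied label because a proxy cannot see who called it; the interfaces and names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: records the calls that actually happen during one run.
class TraceSketch {

    // Messages in the order the calls were made.
    static final List<String> TRACE = new ArrayList<>();

    interface Inventory { boolean reserve(String item); }
    interface OrderService { String place(String item); }

    // Wraps target so every call made through the interface is logged first.
    // callerLabel is an assumption supplied by hand, not discovered at runtime.
    static <T> T traced(Class<T> type, T target, String callerLabel) {
        InvocationHandler handler = (proxy, method, args) -> {
            TRACE.add(callerLabel + " -> " + type.getSimpleName() + ": " + method.getName() + "()");
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler));
    }

    // Runs one use case and returns the recorded messages, one per line.
    static String run() {
        TRACE.clear();
        Inventory realInventory = item -> true;
        Inventory inventory = traced(Inventory.class, realInventory, "OrderService");
        OrderService realService = item -> inventory.reserve(item) ? "ok" : "out of stock";
        OrderService service = traced(OrderService.class, realService, "Client");
        service.place("book");
        return String.join("\n", TRACE);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the trace comes from one actual run, it shows only the branch that was taken, which matches the single-path reading of a sequence diagram.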
I think that steps 2, 3 and 4 would each be Ph.D. Computer Science research projects.
> What will the output format be?
I think XMI (XML Meta-Data Interchange) format would be a good choice for output.
> How will you "know" the client that kicks off each
> sequence diagram? If it's a JSP, it won't be part of
> the Java code you're parsing.
It might be possible to map all the paths, but it would certainly not be easy, and there would be a serious risk of overloading the model with detail that is not needed to understand it. In practice, how many UML models document the entire program? I've used UML a lot for 6 years (and OMT for 2 years before that) and I don't do it; I capture only the essence of the flow.
> Where is the UML notion of "actor" in Java code?
> It's not there.
I think the normal entry points for the various types of class (Application/Applet/Thread/Session EJB/Entity EJB/StartUp, etc.) could be identified from the interfaces, but the tool would need to be 'trained' or configured to understand each of these individually, and if you think about it there are quite a few. Just about every public interface in Java could be a potential entry point, because of class factories and class loaders.
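A minimal sketch of what such configured rules might look like, assuming the tool inspects compiled classes through reflection rather than source text. Only two rules are shown (a `public static void main(String[])` method and the `Runnable` interface); the sample classes and the labels are hypothetical, and a real tool would need many more rules.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two hand-written rules for spotting entry points.
class EntryPointSketch {

    // Sample classes standing in for the code under analysis.
    static class CliApp { public static void main(String[] args) { } }
    static class Worker implements Runnable { public void run() { } }
    static class Helper { int add(int a, int b) { return a + b; } }

    static List<String> entryPoints(Class<?> c) {
        List<String> found = new ArrayList<>();
        // Rule 1: a conventional application entry point.
        for (Method m : c.getDeclaredMethods()) {
            boolean isMain = m.getName().equals("main")
                    && Modifier.isPublic(m.getModifiers())
                    && Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == void.class
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == String[].class;
            if (isMain) {
                found.add("application: " + c.getSimpleName() + ".main");
            }
        }
        // Rule 2: anything that can be handed to a Thread.
        if (Runnable.class.isAssignableFrom(c)) {
            found.add("thread: " + c.getSimpleName() + ".run");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(entryPoints(CliApp.class));
        System.out.println(entryPoints(Worker.class));
        System.out.println(entryPoints(Helper.class));
    }
}
```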
> Frankly, I'm not sure it can be done. You won't have
> enough information to get anything sensible out of it.
I *think* it could be done, but it would need a team of real Computer Scientists capable of original research and a couple of years just to prove it was possible, and it would almost certainly be only a proof of concept and not a commercially viable product.
> Or perhaps I'm the one who's ignorant.
Nope, I don't think so. I absolutely agree it would be a highly complex project.
I have studied compilers and grammars extensively in the past, including writing real parsers. I've programmed in Java for 6 years; I'm very familiar with UML and I've done original commercial research (including DTV & VOD over IP and Web Architectures before they became fashionable). I would find this a pretty daunting (but very enticing) project; there are several areas of it that would require absolutely original work.
> A sequence diagram represents a single path through the code during a single runtime context.
In UML2, blocks were added for conditionals and loops, so this is no longer true. One of the reasons they were added was to make round tripping easier.
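To illustrate how such blocks let one diagram carry both branches of a conditional, here is a small sketch that models messages and alt/loop blocks as a tree and prints them as text. The `alt`/`else`/`loop`/`end` keywords are loosely borrowed from PlantUML's textual notation, which is an assumption of this example and is not mentioned elsewhere in the thread; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a diagram as a tree of messages and alt/loop blocks.
class FragmentSketch {

    interface Node { void render(String indent, List<String> out); }

    static Node message(String from, String to, String name) {
        return (indent, out) -> out.add(indent + from + " -> " + to + ": " + name + "()");
    }

    // A conditional: both branches live in the same diagram.
    static Node alt(String condition, List<Node> thenBranch, List<Node> elseBranch) {
        return (indent, out) -> {
            out.add(indent + "alt " + condition);
            for (Node n : thenBranch) n.render(indent + "  ", out);
            out.add(indent + "else");
            for (Node n : elseBranch) n.render(indent + "  ", out);
            out.add(indent + "end");
        };
    }

    // A loop: the body is drawn once, with the guard as its label.
    static Node loop(String guard, List<Node> body) {
        return (indent, out) -> {
            out.add(indent + "loop " + guard);
            for (Node n : body) n.render(indent + "  ", out);
            out.add(indent + "end");
        };
    }

    static String render(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) n.render("", out);
        return String.join("\n", out);
    }

    static String demo() {
        return render(Arrays.asList(
                message("Client", "OrderService", "place"),
                alt("item in stock",
                        Arrays.asList(message("OrderService", "Inventory", "reserve")),
                        Arrays.asList(message("OrderService", "Mailer", "notifyOutOfStock")))));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```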
> I *think* it could be done, but it would need a team of real Computer Scientists capable of original
> research and a couple of years just to prove it was possible, and it would almost certainly be only a proof
> of concept and not a commercially viable product.
As previously mentioned, at least one of the top-end commercial UML2/SysML tools (Artisan RTS) already does this.
Not to say it's easy, but it does exist in COTS products, if you want to pay for it.
Pete