generating diagrams from code

I was wondering if anyone had any suggestions on ways of approaching the problem of building a tool to semi-automatically generate sequence diagrams from source code.

Will it be a matter of starting at the main method in the code and building a list of all of the methods which may be invoked?

As you can see I am only at the beginning stages and any suggestions and guidance would be greatly appreciated.

Thanks in advance..

Ali Atli

[464 byte] By [Mr.Smitha] at [2007-10-2 1:22:58]
# 1
Most reverse engineering tools I've used give you class diagrams at best. My recollection is poor, but it might be that TogetherJ can do it for you. Unfortunately, that's a hard problem. I've always done them by hand.
duffymoa at 2007-7-15 18:44:32 > top of Java-index,Other Topics,Patterns & OO Design...
# 2

I'm sorry if you have misunderstood me... I'm trying to develop such a tool and I would greatly appreciate it if someone could point me in the right direction on where I can find information on the best way of going about it... Perhaps a good UML book that describes good practices in reverse engineering to produce sequence diagrams from code (manually or automatically).

Mr.Smitha at 2007-7-15 18:44:32
# 3

I'm not aware of any book that tells you how to automatically generate sequence diagrams.

The best, most practical UML book I know is "UML Distilled" by Martin Fowler. Perhaps those tomes by the Three Amigos will have some answers.

Your biggest problem will be parsing all that Java. Your problem has a great deal in common with compiler design. After all, you're parsing a language for one grammar and writing it out in another. The input grammar in this case is Java; the output "grammar" is UML. How much do you know about compilers?

It's a three-step process:

(1) Read and parse the Java into a parse tree.

(2) Iterate through the Java parse tree and generate the relationships needed for the sequence diagram.

(3) Write out the UML in some format.
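Steps (2) and (3) can be sketched in a few lines if you assume step (1) has already been done. This is only an illustration, not a real tool: the `calls` map stands in for whatever the parser would produce, the class and method names (`CallGraphToSequence`, `OrderService.place`, and so on) are made up, and the PlantUML-style text output is an assumed choice since the output format is still an open question here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of steps (2) and (3): walk a call graph and write sequence-diagram text. */
class CallGraphToSequence {

    /**
     * Depth-first walk from an entry method. Keys and values are "Class.method"
     * strings; the map is a hypothetical stand-in for the output of step (1).
     */
    static String render(Map<String, List<String>> calls, String entry) {
        StringBuilder out = new StringBuilder("@startuml\n");
        walk(calls, entry, new ArrayDeque<>(), out);
        return out.append("@enduml\n").toString();
    }

    private static void walk(Map<String, List<String>> calls, String caller,
                             Deque<String> onStack, StringBuilder out) {
        if (onStack.contains(caller)) {
            return; // recursion guard: do not expand a method already being expanded
        }
        onStack.push(caller);
        for (String callee : calls.getOrDefault(caller, List.of())) {
            // One message arrow per call: callerClass -> calleeClass: method()
            out.append(owner(caller)).append(" -> ").append(owner(callee))
               .append(": ").append(name(callee)).append("()\n");
            walk(calls, callee, onStack, out);
        }
        onStack.pop();
    }

    private static String owner(String qualified) {
        return qualified.substring(0, qualified.indexOf('.'));
    }

    private static String name(String qualified) {
        return qualified.substring(qualified.indexOf('.') + 1);
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("App.main", List.of("OrderService.place"));
        calls.put("OrderService.place", List.of("Inventory.reserve", "Billing.charge"));
        System.out.print(render(calls, "App.main"));
    }
}
```

Note that this treats every call that may happen as one that does happen, in source order. It ignores conditionals, loops, polymorphism and object identity, which is where the real difficulty lies.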

What will the output format be?

How will you "know" the client that kicks off each sequence diagram? If it's a JSP, it won't be part of the Java code you're parsing.

Where is the UML notion of "actor" in Java code? It's not there.

Frankly, I'm not sure it can be done. You won't have enough information to get anything sensible out of it.

Or perhaps I'm the one who's ignorant.

duffymoa at 2007-7-15 18:44:32
# 4
I heard about one book, "Applying UML and Patterns" by Larman. Maybe that book will help you.
phanidharkumara at 2007-7-15 18:44:32
# 5

I'm not sure that you will find a book - tool design is a much smaller market than tool use.

Each swimlane/object extent of a sequence/activity diagram can be extracted from a method, and recursive calls mapped to more deeply embedded lanes.

Whether the result will be useful is anyone's guess - the only round-trip tool I know of is Artisan's Real time studio, and that appears to operate on the assumption that it's simple procedural code that's being reverse engineered. UML diagrams tend to be most useful to people when opinionated - a person has pruned enough clutter that they are showing one thing; something machine generated won't have that clarity.

Pete

pm_kirkhama at 2007-7-15 18:44:32
# 6

Cool, a really good question for a change.

I share Duffymo's view on this; I've used his post to add some more detail on the difficulty of this.

> I'm not aware of any book that tells you how to

> automatically generate sequence diagrams.

I would bet money that they don't exist, since even the most advanced UML CASE tools don't produce anything like worthwhile sequence diagrams.

There are likely to be some pure research papers on some aspects of this but I would expect these to be pretty tightly focused and largely theoretical.

> The best, most practical UML book I know is "UML

> Distilled" by Martin Fowler. Perhaps those tomes by

> the Three Amigos will have some answers.

I don't think this would be much help; IMHO the UML part of this project would be the easy part.

> Your biggest problem will be parsing all that Java.

> Your problem has a great deal in common with

> compiler design. After all, you're parsing a

> language for one grammar and writing it out in

> another. The input grammar in this case is Java; the

> output "grammar" is UML. How much do you know about

> compilers?

>

> It's a three-step process:

I think there are at least 4 steps.

> (1) Read and parse the Java into a parse tree.

JavaCC would probably go a long way towards providing this part, and the second link below would be a good starting point; it includes 4 in-depth articles on parsing Java source.

https://javacc.dev.java.net/

http://java.sun.com/developer/technicalArticles/Parser/index.html

> (2) Iterate through the Java parse tree and generate

> the relationships needed for the sequence diagram.

> (3) Write out the UML in some format.

(4) Convert the static source code into the dynamic sequence diagram.

A sequence diagram represents a single path through the code during a single runtime context. Therefore there should be a tree of sequence diagrams, branching at each conditional in the source.

One way to approach this may be to sidestep the obvious approach of parsing the source and try an alternative: use (or modify) a code profiling tool to log the required information while running through the required use case.
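To give a feel for the runtime approach without a real profiler, here is a small sketch that records calls with `java.lang.reflect.Proxy`. This is a swapped-in technique, not a profiling tool: it only sees calls made through an interface that has been wrapped, and the names `CallRecorder`, `wrap` and `Greeter` are invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Sketch: record which methods are actually called while a use case runs. */
class CallRecorder {

    /** Recorded calls in execution order, as "label.method" strings. */
    final List<String> events = new ArrayList<>();

    /** Wraps target in a dynamic proxy that logs each call before delegating. */
    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> iface, T target, String label) {
        InvocationHandler handler = (proxy, method, args) -> {
            events.add(label + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** Hypothetical collaborator used only for the demonstration. */
    interface Greeter {
        String greet(String name);
    }

    public static void main(String[] args) {
        CallRecorder recorder = new CallRecorder();
        Greeter real = name -> "Hello, " + name;
        Greeter traced = recorder.wrap(Greeter.class, real, "Greeter");
        traced.greet("Ali");
        System.out.println(recorder.events);
    }
}
```

The recorded list is already one concrete path through the code, so it maps directly onto a single sequence diagram for that run. The price is that you only ever see the paths you actually exercise.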

I think that steps 2, 3 and 4 would each be Ph.D. Computer Science research projects.

> What will the output format be?

I think XMI (XML Meta-Data Interchange) format would be a good choice for output.

> How will you "know" the client that kicks off each

> sequence diagram? If it's a JSP, it won't be part of

> the Java code you're parsing.

It might be possible to map all the paths, but it would certainly not be easy, and there would be a serious risk of overloading the model with detail that was not required to get the model. In practice, how many UML models document the entire program? I've used UML a lot for 6 years (and OMT for 2 years before that) and I don't do it; I capture the essence of the flow only.

> Where is the UML notion of "actor" in Java code?

> It's not there.

I think the normal entry points for the various types of class (Application/Applet/Thread/Session EJB/Entity EJB, StartUp, etc.) could be identified from the interfaces, but the tool would need to be 'trained' or configured to understand these individually, and if you think about it there are quite a few. Just about every public interface in Java could be a potential entry point, because of class factories/loaders.
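A minimal sketch of that idea, using reflection on compiled classes. It knows only two rules (a `public static void main(String[])` method and implementing `Runnable`); applets, EJBs and the rest would each need their own rule, which is exactly the configuration burden described above. The class name `EntryPoints` and the `Worker` example are made up.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Sketch: guess candidate entry points of a class from its shape. */
class EntryPoints {

    /** Returns "Class.method" names that look like entry points under two simple rules. */
    static List<String> find(Class<?> type) {
        List<String> found = new ArrayList<>();
        try {
            // Rule 1: the conventional application entry point.
            Method main = type.getDeclaredMethod("main", String[].class);
            int mods = main.getModifiers();
            if (Modifier.isPublic(mods) && Modifier.isStatic(mods)
                    && main.getReturnType() == void.class) {
                found.add(type.getSimpleName() + ".main");
            }
        } catch (NoSuchMethodException e) {
            // No main(String[]) declared on this class; not an application entry point.
        }
        // Rule 2: anything runnable may be started by a thread or an executor.
        if (Runnable.class.isAssignableFrom(type) && !type.isInterface()) {
            found.add(type.getSimpleName() + ".run");
        }
        return found;
    }

    /** Hypothetical class used only for the demonstration. */
    static class Worker implements Runnable {
        public void run() { }
    }

    public static void main(String[] args) {
        System.out.println(find(EntryPoints.class));
        System.out.println(find(Worker.class));
        System.out.println(find(String.class));
    }
}
```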

> Frankly, I'm not sure it can be done. You won't have

> enough information to get anything sensible out of it.

I *think* it could be done, but it would need a team of real Computer Scientists capable of original research and a couple of years just to prove it was possible, and it would almost certainly be only a proof of concept and not a commercially viable product.

> Or perhaps I'm the one who's ignorant.

Nope, I don't think so. I absolutely agree it would be a highly complex project.

I have studied compilers and grammars extensively in the past, including writing real parsers. I've programmed in Java for 6 years, I'm very familiar with UML, and I've done original commercial research (including DTV & VOD over IP and Web Architectures before they became fashionable). I would find this a pretty daunting (but very enticing) project; there are several areas of this that would require absolutely original work.

MartinS.a at 2007-7-15 18:44:32
# 7

I think MartinS's comments are very good.

I liked seeing his recommendation of JavaCC. That will jumpstart the first bit.

Unfortunately, that's the easy part, as he's already noted.

The tree of sequence diagrams sounds right to me, because the automatic algorithm won't necessarily know the right class to kick off the sequence diagram. Maybe something like Google's weighting of goodness (e.g., "this sequence diagram is rated at 98%; this one is at 47%") might be useful.

A fine discussion, MartinS. Thanks.

duffymoa at 2007-7-15 18:44:32
# 8

> A sequence diagram represents a single path through the code during a single runtime context.

In UML2 blocks were added for conditionals and loops, so this is no longer true. One of the reasons they were added was to make round tripping easier.
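As a rough sketch of what that buys a generator: if the extracted sequence is kept as a small tree rather than a flat list, an if/else can be written out as an `alt` block and a loop as a `loop` block in one diagram. The node classes below (`Message`, `Alt`, `Loop`) and the PlantUML-style text output are assumptions made for the example, not anything a particular tool uses.

```java
import java.util.List;

/** Sketch: sequence steps as a tree, so branches and loops survive into the output. */
class Fragments {

    interface Node {
        void render(StringBuilder out, String indent);
    }

    /** A single message arrow between two lifelines. */
    static final class Message implements Node {
        final String from, to, label;
        Message(String from, String to, String label) {
            this.from = from; this.to = to; this.label = label;
        }
        public void render(StringBuilder out, String indent) {
            out.append(indent).append(from).append(" -> ").append(to)
               .append(": ").append(label).append('\n');
        }
    }

    /** An if/else in the source becomes an "alt" block with an optional else part. */
    static final class Alt implements Node {
        final String condition;
        final List<Node> thenSteps, elseSteps;
        Alt(String condition, List<Node> thenSteps, List<Node> elseSteps) {
            this.condition = condition; this.thenSteps = thenSteps; this.elseSteps = elseSteps;
        }
        public void render(StringBuilder out, String indent) {
            out.append(indent).append("alt ").append(condition).append('\n');
            renderAll(thenSteps, out, indent + "  ");
            if (!elseSteps.isEmpty()) {
                out.append(indent).append("else\n");
                renderAll(elseSteps, out, indent + "  ");
            }
            out.append(indent).append("end\n");
        }
    }

    /** A for/while in the source becomes a "loop" block. */
    static final class Loop implements Node {
        final String guard;
        final List<Node> body;
        Loop(String guard, List<Node> body) {
            this.guard = guard; this.body = body;
        }
        public void render(StringBuilder out, String indent) {
            out.append(indent).append("loop ").append(guard).append('\n');
            renderAll(body, out, indent + "  ");
            out.append(indent).append("end\n");
        }
    }

    static void renderAll(List<Node> steps, StringBuilder out, String indent) {
        for (Node step : steps) {
            step.render(out, indent);
        }
    }

    static String render(List<Node> steps) {
        StringBuilder out = new StringBuilder("@startuml\n");
        renderAll(steps, out, "");
        return out.append("@enduml\n").toString();
    }

    public static void main(String[] args) {
        List<Node> steps = List.of(
            new Message("Client", "Cart", "checkout()"),
            new Alt("cart.isEmpty()",
                List.of(new Message("Cart", "Client", "reject()")),
                List.of(new Loop("for each item",
                    List.of(new Message("Cart", "Inventory", "reserve(item)"))))));
        System.out.print(render(steps));
    }
}
```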

> I *think* it could be done, but it would need a team of real Computer Scientists capable of original

> research and a couple of years just to prove it was possible and it would almost certainly be only a proof

> of concept and not a commercially viable product.

As previously mentioned, at least one of the top-end commercial UML2/SysML tools (Artisan RTS) already does this.

Not to say it's easy, but it does exist in COTS products, if you want to pay for it.

Pete

pm_kirkhama at 2007-7-15 18:44:32
# 9
You could hook into the Java debugging API and track the methods being executed along different execution paths...
kelvekara at 2007-7-15 18:44:32
# 10

> > A sequence diagram represents a single path through

> the code during a single runtime context.

>

> In UML2 blocks were added for conditionals and loops,

> so this is no longer true. One of the reasons they

> were added was to make round tripping easier.

Has anyone worked with UML 2? How does it compare to the original from a usefulness perspective? I found only a couple of types of diagrams to be useful in UML.

dubwaia at 2007-7-15 18:44:32
# 11
I understand that there's a lot of interest in UML2 in systems engineering and hard real-time, but I haven't used it myself on a project - most of the work I do is less specified.

Pete
pm_kirkhama at 2007-7-15 18:44:32
# 12
Read replies
mscscfa at 2007-7-15 18:44:32
# 13

I have been looking at the same problem and think it is possible, but it requires some serious work: on-the-fly configuration (think of J2EE applications), package-level references as well as class-level ones, package filtering (do you really want to capture commons.logging hits? No), and turning capture on and off for sequence runs that cover a specific business function.

The scope could expand into hit frequencies for classes and methods, to check that the code has an efficient and logical flow.
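The filtering and hit-counting parts are the easy bits, so here is a rough sketch of them. It assumes the captured calls arrive as fully qualified "package.Class.method" strings; that format, the class name `TraceFilter` and the sample `com.example.shop` names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: drop calls from unwanted packages and count hits for the rest. */
class TraceFilter {

    /**
     * Counts how often each call appears, skipping any call whose name starts
     * with one of the excluded package prefixes. Order of first appearance is kept.
     */
    static Map<String, Integer> countHits(List<String> calls, List<String> excludedPrefixes) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String call : calls) {
            boolean excluded = excludedPrefixes.stream().anyMatch(call::startsWith);
            if (!excluded) {
                hits.merge(call, 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> calls = List.of(
            "com.example.shop.Cart.add",
            "org.apache.commons.logging.Log.debug",
            "com.example.shop.Cart.add",
            "com.example.shop.Billing.charge");
        System.out.println(countHits(calls, List.of("org.apache.commons.logging.")));
    }
}
```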

One application worth looking at is jseq: http://www.edge.co.th/products/jseq/

This is basic in functionality, but a great start on the problem. Pity it's not open source; I'm sure it would take off if it were.

Good luck.

lscovella at 2007-7-15 18:44:32