Graph algorithm question
Hello.
I am trying to find a graph algorithm (by graph I mean nodes interconnected to each other,not the sketch of a function).This algorithm should yield the route that passes from as many nodes of the graph as possible,though it is forbidden to step on a node twice,and then ends back to the first node that it started from,thus forming a cycle.
I am positive that this algorithm must have been researched by now,so in order to avoid thinking it myself I wonder if someone would know to direct me somewhere in the web where I can get it.Since I am a medical student,any terminology help would be also appreciated(as to the name of what I want,keywords to search etc).
look up Hamiltonian Cycle. Massively researched, yes. Trivial to find, no.
Yes,you are right,thank you.A cycle that traverses all the EDGES of a graph, traversing each edge exactly once, is called a Hamiltonian path.
I had heard what I want is similar to a Hamiltonian,but my algorithm has 2 differences:
the 1st, is that I want to pass as many NODES as possible(because in some graphs you just can't pass all of them) and
2nd,I want the path to end where it started,thus forming a cycle.
Do you have any idea where in the net I could find resources on the subject?
Thanks for responding !
> Yes,you are right,thank you.A cycle that traverses
> all the EDGES of a graph, traversing each edge
> exactly once, is called a Hamiltonian path.
No, that's an Eulerian cycle. That's an easy problem.
> I had heard what I want is similar to a
> Hamiltonian,but my algorithm has 2 differences:
> the 1st, is that I want to pass as many NODES as
> possible(because in some graphs you just can't pass
> all of them) and
> 2nd,I want the path to end where it started,thus
> forming a cycle.
Yes, research Hamiltonian cycles. You'll find that this is considered to be an intractable problem, especially for large graphs. As Marlin indicated this problem has been heavily researched, and there are probably different approaches you could take, including but not limited to:
Restating your problem as an easier problem.
Approximating the longest tour.
Find out if there are special cases of the Hamiltonian tour that are easy to solve. (e.g. for a complete graph the problem is trivial).
> Do you have any idea where in the net I could find> resources on the subject?Like marlin314 already said: it's massively researched. So I think your favorite search-engine will come up with a lot of information.Good luck.
Let me toss out another suggestion. I can not begin to count the number of times that someone posts a question where they are working on a problem, P, have worked it down to the place that if there was some easy way to do X, they could solve their problem P.
Instead of posting their problem P, they ask how do I do X, and the answer unfortunately turns out to be that X is a computationally hard problem and would require a time beyond the projected heat death of the universe to accomplish.
The fact that X is difficult or impossible does not mean that there is no hope for problem P. Given that you admit to being a medical student, how about letting us domain experts (algorithms design, that is) have a look at the problem you are trying to actually solve and see if we can't give some helpful advice.
Just pointing out that it is not always the case that "simplifying the problem" for us always actually simplifies the problem. Tell us what you are trying to do.
marlin:
> Just pointing out that it is not always the case that
> "simplifying the problem" for us always actually
> simplifies the problem. Tell us what you are trying
> to do.
You're absolutely right there,this thing with the P and X has happened to me lots of times.
So, this is what I'm trying to do:
I am coding a program in VB that depicts a graph.If there are no cycles in the graph(eg if it's like a tree) my algorithm assumes that each node is charged and repels other nodes electrostatically so it deploys the graph progrssively,node by node.
However if there are cycles,their nodes have to be put on the screen simultaneously as a block cycle.
Now,if more than one cycles merge together,the problem becomes complex,and I am trying to find the longest cycle possible in that group of cyclic nodes to put it on the screen as a block.That's it.
I have been coding this program for 3 years now,I hope to make it a PhD,it's about pathology thinking.
I looked up Hamiltonian cycle and found that I messed up,this IS like what I'm trying to find.It's called longest cycle problem.
I did research a bit the thing on the net,but I got lost in senses like toughness of a graph,cyclability,connectivity etc
The group of nodes I am trying to find the longest cycle,has these properties:
* it is formed of cycles that have at least one common edge between them, and
* each node has at least 3 edges
Also,I have to admit I don't know the terms "trivial" and "intractable" that you previously used.
Are you trying to tell me that my problem isn't yet solved?I have done some thoughts on my own,which involve pairs of nodes in the graph,that,if removed they separate the graph in 2 or more pieces that don;t connect to each other.I know these pairs are somehow crucial to this algorithm but the problem was too omplex for me to keep thinking on it.
Anyway,i'l keep researching it now that I know it's official name,longest cycle,but just in case you happen to hve something for me,please notify me by email: dimvasiadis@hotmail.com
Thanks again for your answers!
> Also,I have to admit I don't know the terms "trivial"
> and "intractable" that you previously used.
> Are you trying to tell me that my problem isn't yet
> solved?
Trivial means that the solution requires little or no effort. This would be the case if you had a complete graph (i.e. there is an edge between every pair of nodes), because any permutation of the nodes would be a Hamiltonian cycle.
Intractable means that it is possible to find a solution, but the sun would burn out before we found it. That's the case for the general Hamiltonian cycle problem. For example, here's an algorithm that solves your problem:
For every possible permutation of the vertices
Test if the permutation is a legal cycle
If it is, problem solved, if not, try the next permutation.
For large graphs, this will take forever to compute as there are n-factorial possible permutations. IIRC, 50! is roughly the number of molecules in the universe (one googol). Computers are pretty fast, but they won't be counting to one googol anytime soon.
Now, if your graph is small this brute force approach may work... how many nodes do you expect the graph to have?
Also, it appears your graphs may have some regular structure, this could be exploited to find a faster algorithm.
Yes,this is my approach also,though I thought maybe someone had a smarter one.
I backtrack the path each time a node (other than the current node) is left with only one possible exit and I store as result in a collection each time i get to the final node.
Then I scan all the results to find the longest one.
My graph is made up of cycles that have common edges (imagine various polygons merged together).
I don't expect more than 100 nodes in such a complex though.
My graph is an expert system,each node is a medical condition and each edge is an etiological cause.
e.g. [Death] is a node and [Banging your head on the computer] is another node and they are linked etiologically.There are thousands of nodes in my databases,but I expext only hundreds to form cyclic blocks as mentioned above.
Hi,From what i can gather you need a route from the start destination to the start destination again passing through as many nodes as possible. Please check for customized versions of Dijkstra's Algorithm,Regards,
Hey thanks!
I checked it out,it seems this algorithm finds the SHORTEST path,but if you set the weights of the edges to be negative,and if you set [end node=start node] then you get the LONGEST CYCLE.
However,in some sites it says that the weights cannot be negative(I hope it means in order to fnd the shortest path) and that the variation for the longest path works only on acyclic graphs.Mine is made up of cycles though.I didn't find an exact longest path problem answered,just implications,but I'll keep looking.
Thanks Again,this might be it!
I checked personally the negative weights version of the Drijka algorithm to yield the longest path and unfortunately I figured it doesn't work,not even on a simple cyclic graph.
Imagine a triangle ABC.We will use the Drijka algorithm to find the longest path from A to C ,which of course we know a priori that is A-B-C.We mark each edge with weight -1.
The algorithm starts at A.Seeing that both options have a weight of -1 ,it randomly chooses to go directly to C.C point gets in the Visited nodes array,and that's it.The algorithm variation calculated wrongly that the longest path is A-C
The reason that it doesn't work from my point of view is that the Drijka algorithm,always chooses the minimum edge to follow.There is no chance that t should follow a largest option in value,which would later be proved correct.There is no such case when you're searching for the shortest path.However,when you're searching for the longest path,even what seems to be long at first,it might prove shortest in the end.The Drijka algorithm can't see that.
So I guess I'm back where I started...
Thanks anyway,I learned somthing useful studying that algorithm.
You could modify Dijkstra, not by using negative numbers, but by making it keep searching. Use a hashtable of visited nodes to avoid infinite loops.
Keep a value 1 for the weight of the edges.
Create a copy of your start node and use it as a destination node.
This is my idea:
/*
Create a list where each entry contain:
a node,
the number of edges traveled to reach that node,
the path traveled to reach that node,
a completed flag indicating if the longest path to that node has been found
Populate your list with:
the start node,
the value 0 (for edges traveled),
and an empty path
a false completed flag
while not all nodes in list are completed
{
for each node A in your list
{
set the flag f to false
for each node B that can be reached directly from A
{
set f to true
if B is not in the path of A
{
if B is not in the list
{
insert B to the list
set the path of B to be the path of A + the edge AB.
set the value of B to the value of A +1
}
else if the value of B is less than the value of A+1
{
set the path of B to be the path of A + the edge AB.
set the value of B to the value of A +1
}
}
}
if f is false,
mark A as completed.
if A is the destination node
You are finished!
}
}
*/
What do you think?
OK. Before you get too geeky here, try a brute force approach. if your graph is small or has limited connectivity, you may not need to get fancy. Its surprising what a computer can do without breaking a sweat. In any case I whipped up a solution as shown below. It is untested and will most likely have bugs (and does not check for null conditions). But as you can see the code is pretty basic and should get you from point A to point B (pun inteneded).
public List<Integer> getLargestCycle( Map<Integer,List><Integer>>graph )
{
List<Integer> longestCycle = new Vector<Integer>();
List<Integer> path = new Vector<Integer>();
for ( Integer startingPoint : graph.keySet() )
{
path.add(startingPoint);
List<Integer> aCycle = getLargestCycle( graph, path );
if ( aCycle.size() > longestCycle.size() )
longestCycle = aCycle;
}
return longestCycle;
};
protected List<Integer> getLargestCycle( Map<Integer,List><Integer>>graph,
List<Integer> currentPath )
{
List<Integer> longestCycle = new Vector<Integer>();
Integer currentPoint = currentPath.get( currentPath.size()-1 );
Integer startingPoint = currentPath.get( 0 );
for ( Integer nextPoint : graph.get( currentPoint ) )
{
List<Integer> aPath = new Vector( currentPath );
aPath.add( nextPoint );
if ( nextPoint.equals( startingPoint ) )
{
if ( longestCycle.size() == 0 )
longestCycle = aPath;
}
else if ( currentPath.contains( nextPoint ) )
{
// not a cycle here.
}
else
{
List<Integer> aCycle = getLargestCycle( graph, aPath );
if ( aCycle.size() > longestCycle.size() )
longestCycle = aCycle;
}
}
return longestCycle;
}
WOW.Thanks.That idea might have hit the spot..
I was testing your algorithm.
However,it is not clear to me, when do you mark a node as completed?
I think you meant to put the [set f = true] inside another condition.Perhaps you meant to set it along the condition that changes the path of a node?
Also,imagine this.
let's say that all of A's edges (B,C and D) are visited,so A is marked complete I guess?
Then,afterwards,due to a cycle that comes over from D,the path of B is increased.How is the path to A going to be increased,since we've marked it complete?(let's suppose that A is not the initial node and it dosn't have a copy node)
Anyway I'll look more into it and get back to yoy by tomorrow morning.
Another issue:are we sure that this algorithm is faster than scanning all the possible paths?(using a collection that tracks the path made already and the options in each step and using backstepping in dead ends while erasing the opion that lead to the dead end).
I didn't have ore than half an hour to check it today,but I'll surely put more into it this evening.It actually seems that it should be working though.
THANKS!!!
Wait.My previous reply was for Ragnvald_id2's algorithm.
Miteke,I didn't have time to review your algorithm,but I'll do it tonight.I expext my largest graph to be measured in hundreds of nodes,that is for the next months anyway.However I'm not quite familiar with your syntax(I only know VB).I got Ragnvald_id2's syntax,but could you elucidate what this means?
List<Integer> path = new Vector<Integer>();
in VB you declare an array like this
Dim Array() as Integer
And then you set the bounds of the array like this:
Redim Array(1 to 5,0 to 1)
and what's
public List<Integer> getLargestCycle( Map<Integer,List><Integer>>graph )
is it declaring a sub named getLargestCycle public ,and then declaring it's parameters I guess?
> WOW.Thanks.That idea might have hit the spot..
> I was testing your algorithm.
> However,it is not clear to me, when do you mark a
> node as completed?
> <snip>
We are not marking anything. We know where we have visitied because each itereation passes a list of previously visited nodes in order of visit.
The currentPath list contains all the nodes visited for an iteration (in affect, that is how they are marked). When the path reaches a point where there are no exits an empty list is returned (meaning no cycle). If the path reaches a point where the only exits double back an empty list is returned. If the path reaches a point where the exit coubles back to the starting point you have a cycle and the cycle is recorded. And, of course, the #1 possibility is that an exit reaches a new point and it is added to the path and the checking continues. But all but the last option terminates a search for a given path.
> Another issue:are we sure that this algorithm is
> faster than scanning all the possible paths?
>
Fast? Who said anything about fast. In fact I'm pretty certain that this is about the S*L*O*W*E*S*T algorith one could come up with. That's why I called it a brute force algorithm. But in many cases brute force is better because it keeps your algorithm simple. Clean, intuitive code in MOST situations is better than efficient code. That's why most folks don't use machine code.
Second, it DOES scan all possible paths.
To speed it up you could:
1) Find out what the maximum size of your cycles are and don't search beyond this length.
2) Find a way of avoiding recalculating cycles (for example the cycle ABCDEF is the same cycle as DEFABC).
3) Take advantage of some domain knowledge and start the seach from a specific point instead of trying from all points.
> Wait.My previous reply was for Ragnvald_id2's
> algorithm.
> Miteke,I didn't have time to review your
> algorithm,but I'll do it tonight.I expext my largest
> graph to be measured in hundreds of nodes,that is for
> the next months anyway.However I'm not quite familiar
> with your syntax(I only know VB).
>
I'm surprised. You do know that this is a Java forum? The syntax is, of course, Java.
> I got Ragnvald_id2's
> syntax,but could you elucidate what this means?
List<Integer> path = new Vector<Integer>();
You need to read the Java tutorial on collections. Basically a list is an ordered collection of items (in this case Integers since that is what is inside the <>). So path is nothing but an ordered collection of integers. When first created it has no elements in the list. Each time you call path.add it adds an element tot he end of the list. You do not need to manage the list, the machine will do that for you. You can get the first element by calling path.get(0), the last element by calling path.get( path.size()-1), and for that matter the nth element by calling path.get(n); A lot like an array except it is dynamic and very easy to manipulate (like that add method I used to grow the list).
The graph is a map from integers to a list of integers. So for each node in your graph you will map it to the list of nodes it is connected to.
So to construct your graph you would construct a list of edges for each node and add it to the map as in the following...
graph.put( 1, list containing 2, 3, 4, 5, and 6 );
graph.put( 2, list containing 3, 7, 9 );
etc.
>
> and what's
> public List<Integer> getLargestCycle(
> Map<Integer,List><Integer>>graph )
> is it declaring a sub named getLargestCycle public
> ,and then declaring it's parameters I guess?
>
Exactly. It is a method that returns a list of nodes (your cyclic path), and takes as input the graph.
Anyway,without having looked at miteke's idea yet,I'm posting my algorithm just for the fun of it (about getting geeky..)
My algorithm is proceeding randomly into the graph,until it reaches a dead end or a situation where a node is left out for sure.If it does,it backtracks.If it finds the end node(eg the start node again),it measures the path's length and keeps the path in memory,if it is the biggest found yet.When the algorithm passes a node(except the first node)it temporarily deletes all of its edges that don't participate in the path,to simplify the graph and CHECK if this deletion leaves out a node.
I use a Collection to track my steps(or Array if you prefer,it's the same thing almost).Let's name that collection Steps.Each item of that collection is another collection,that holds inside the possible options from this step on.
E.g. if A is the starting node,and is connected to B,C and D
the Steps Colection should like
Steps:
Item 1:B,C,D
Now we choose the first option,which is B and proceed.The new options to go are whatever nodes B is connected to,apart from A which we already have stepped onto.
However,there is an exception here,since A is the final node ,we may step there.This is the only node in which we can step twice,thus concluding the cycle.
Let's suppose that B is connected to A,E and F
The Steps Colletion should now look like this
Steps:
Item 1:B,C,D
Item 2:A,E,F
Now we choose to proceed to A
Since this is not the 1st step,we temporarily kill all of the other B's edges that don't bolong to the path,in order to simplify our graph,sinve it is impossible to step on those edges.
Thus,when we proceed to A,we tmp erase these edges: BE,BF
Now,we check E and F whose edges have been erased,to see if any of these nodes i left with ONLY ONE EDGE.If so,then it is clear that this node cannot be visited,so our path may not be optimal.Thus we transfer the current state to a collection for later use,and BACKTRACK.This is the first of two situations that causes a backtrack.
When backtracking,we reconstruct the deleted edges and scratch the first option in the options list in our current step.
Let's say that the node E has only 2 edges left,one to B and one to G.Thus,moving from B to anywhere but E,leaves E out.
So we backtrack.
The Steps Colletion should now look like this
Steps:
Item 1:B,C,D
Item 2:E,F
We proceed to E.
The Steps Colletion should now look like this
Steps:
Item 1:B,C,D
Item 2:E,F
Item 3:G
We proceed to G
The Steps Colletion should now look like this
Steps:
Item 1:B,C,D
Item 2:E,F
Item 3:G
Item 4:D,H
We proceed to D
Steps:
Item 1:B,C,D
Item 2:E,F
Item 3:G
Item 4:D,H
Item 5:A
And finally we meet our first "legal" cycle,A-B-E-G-D-A
Now,we backtrack deleting the D option from step 4,since we followed that one:
Steps:
Item 1:B,C,D
Item 2:E,F
Item 3:G
Item 4:H
and so on,until
a)a Hamiltonian cycle is found
or b)we deplete all of our options,in which case we run through the known non- optimal paths that we have stored aside and then we keep the cycle with the greater length.
Oh,now I saw your answer..god,while I'm replying you answered again,I didn't catch it sorry.
So, thanks for clarifying.Yes I'm aware that this is a Java forum,but my problem is an algorithmic one so it doesn't matter I guess.I googled "graph algorithms forum" and you guys came up about first I think.
So,to sum up,miteke I'll check your algorithm tonight.
Ragnvald_id2,if you come around please check my pre-pre-previous post about questioning the marking of complete on the nodes on your algorithm.
Thanks again.
Ragnvald,I checked your algorithm with anticipation,and figured my self that if a node's path shoud be increased in size,then all of its adjacent nodes that
don't participate in the path should be marked uncomplete,in order to be rechecked so they inherit the increased path.
E.g. if A links to B,C and D and A is marked Complete,and afterwards the path to C is increased,then A should marked Uncomplete so it inherits the longest
path.
Anyway,even with this addition,a few moments later I discovered a MAJOR glitch in your idea :((( that I couldn't fix.
Consider a simple triangle,ABC again.
Let's say we use your algorithm to find the longest path to B,which is of course ACB.
The algorithm starts from A
It marks B and C with paths AB and AC respectively,adds them to the list,then marks A as complete.
So we have:
List:A,B,C
A. Path:- Length:0 Status:Complete
B. Path:AB Length:1 Status:Uncomplete
C. Path:AC Length:1 Status:Uncomplete
Then,it starts processing the list again,and proceeds to B,since A is already checked and so marked Complete(I guess that's when you intended to mark a node
as Complete?)
From B,it sees that it can increase the path to C,so it erases C's current path and replaces it with ABC.
So we have:
List:A,B,C
A. Path:- Length:0 Status:Complete
B. Path:AB Length:1 Status:Complete
C. Path:ABC Length:2 Status:Uncomplete
Now,only C is left to check.
Well now comes the glitch,SINCE THE PATH AC THAT WOULD LEAD US TO FIND THE CORRECT ANSWER ACB, IS KILLED.
From C,the computer sees that B IS in the path Of C,so it doesn't even bother to do anything with B.
There now,you have your glitch in a mere triangle.
Bogus.However I was almost certain that your algorithm was correct when I had a first look at it.
If I misunderstood something please clarify.Before you go on solving this bug for the triangle graph in this example,imagine a larger graph with lots of
cycles,and the case that the glitch may appear after a large link depth (link=edge),doing a big cycle and then reappearing..
Miteke I also looked at your algorithm,and from what I can make out from the syntax,your algorithm checks all the possible cycles as you said.Well,that's my
current approach too,just with backstepping added.I thank you for getting into all the trouble posting that code.
An idea I had researched but never been able to exploit,was that of articulation nodes (I later found out that that's what they're called).An articulation
node is a node that interferes in the connection of 2 or more parts of the graph.That is,if you remove the articulation node,the graph is separated in 2 or
more parts that are not connected to each other.In an easier way to put it,imagine that my graph is made of connected parts of cycles.So the 2 points in
which a cycle touches the graph,these are a pair of articulation nodes.If you remove them,that cycle doesn't touch the graph.Ok ,so:It is easily
understandable that when the path reaches an articulation node that is the point of connection of the cycle A to the graph,it is obligatory that the path
should "get in" the cycle A and "get out" from the other articulation node, or the cycle A will not be included in the longest path.However,there is no way
of telling IF the cycle A should be included in the path,because if it is,it may actually decrease the size of the path,since another option could be better.
