Graph Shortest Path Solution by SQL

In graph theory,"Graph Shortest Path Problem" of finding a path between two nodes of a graph in a way that the sum of the weights/distance of its constituent edges is minimized.

Here I am trying to solve "Graph Shortest Path" problem by SQL and will try to find shortest path from 'A' to 'D' nodes. I have created "flight" table to store graph values and inserted some demo data as

SQL> create table flights
  2  (
  3    id             number,
  4    from_point     varchar2(10),
  5    to_point       varchar2(10),
  6    distance      number
  7  );
Table created.

SQL> insert into flights
  2  select 0 id, NULL d1, 'A' d2, 0 d from dual union
  3  select 1 id,'A' d1, 'B' d2, 05 d from dual union
  4  select 2 id,'A' d1, 'C' d2, 10 d from dual union
  5  select 3 id,'B' d1, 'C' d2, 15 d from dual union
  6  select 4 id,'C' d1, 'D' d2, 20 d from dual union
  7  select 5 id,'B' d1, 'D' d2, 25 d from dual union
  8  select 6 id,'A' d1, 'D' d2, 35 d from dual union
  9  select 6 id,'B' d1, 'E' d2, 40 d from dual union
 10  select 6 id,'C' d1, 'E' d2, 45 d from dual;
9 rows created.

Solution:
To solve the "Graph Shortest Path" problem to find shortest path from 'A' to 'D' nodes, I have used hierarchical query "connect by prior". To make it easy to understand I am breaking the solution in two paths.

First Step of the solution is quite simple and simply use "connect by prior" clause, and is finding every possible path starting from 'A'. Here we are simply making an arithmetic expression for DISTANCE by using "SYS_CONNECT_BY_PATH"

SQL> column distance format a10;
SQL> column path format a10
SQL> set lines 200
SQL> select
  2    connect_by_root to_point from_point,
  3    to_point,
  4    ltrim(sys_connect_by_path(to_point,'->'),'->') path,
  5    ltrim(sys_connect_by_path(distance,'+'),'+') distance,
  6    level stop_counts
  7  from flights
  8    connect by prior to_point=from_point
  9    start with to_point='A';

FROM_POINT TO_POINT   PATH       DISTANCE   STOP_COUNTS
---------- ---------- ---------- ---------- -----------
A          A          A          0                    1
A          B          A->B       0+5                  2
A          C          A->B->C    0+5+15               3
A          D          A->B->C->D 0+5+15+20            4
A          E          A->B->C->E 0+5+15+45            4
A          D          A->B->D    0+5+25               3
A          E          A->B->E    0+5+40               3
A          C          A->C       0+10                 2
A          D          A->C->D    0+10+20              3
A          E          A->C->E    0+10+45              3
A          D          A->D       0+35                 2


Second Step: Now the final step is to simply filter out the records from above result-set where to_point is equal to 'D'. Also we need to compute the arithmetic express of "DISTANCE". To calculate the distance you can simply create an Oracle function which solve this expression or use "dbms_xmlgen.getxml" to dynamically calculate it in query, which I have used in this example.

SQL> column distance format a10
SQL> column path format a10
SQL> set lines 200
SQL> select * from
  2  (
  3    select
  4      rank() over (order by total_distance,stop_counts) rn,
  5      from_point, to_point, stop_counts, path, distance, total_distance
  6    from
  7    (
  8      select
  9        connect_by_root to_point from_point,
 10        to_point,
 11        ltrim(sys_connect_by_path(to_point,'->'),'->') path,
 12        ltrim(sys_connect_by_path(distance,'+'),'+') distance,
 13        level stop_counts,
 14        to_number(
 15          extractvalue(
 16            xmltype(
 17              dbms_xmlgen.getxml(
 18                'select ' || ltrim(sys_connect_by_path(distance,'+'),'+') || ' as c from dual'
 19              )
 20            ),'/ROWSET/ROW/C'
 21          )
 22        ) total_distance
 23      from flights
 24        connect by prior to_point=from_point
 25        start with to_point='A'
 26    ) where to_point='D'
 27  ) where rn = 1;

        RN FROM_POINT TO_POINT   STOP_COUNTS PATH       DISTANCE   TOTAL_DISTANCE
---------- ---------- ---------- ----------- ---------- ---------- --------------
         1 A          D                    3 A->B->D    0+5+25                 30
         1 A          D                    3 A->C->D    0+10+20                30


Note: You can create following function to calculate the expression of distance and avoid the use of "dbms_xmlgen.getxml".

SQL> create function cal_distance(str varchar2) return number
  2  is
  3    val number;
  4  begin
  5    execute immediate 'select ' || str || ' from dual' into val;
  6    return val;
  7  end;
  8  /
Function created.

SQL> select * from
  2  (
  3    select
  4      rank() over (order by total_distance,stop_counts) rn,
  5      from_point, to_point, stop_counts, path, distance, total_distance
  6    from
  7    (
  8      select
  9        connect_by_root to_point from_point,
 10        to_point,
 11        ltrim(sys_connect_by_path(to_point,'->'),'->') path,
 12        ltrim(sys_connect_by_path(distance,'+'),'+') distance,
 13        level stop_counts,
 14        cal_distance(ltrim(sys_connect_by_path(distance,'+'),'+')) total_distance
 15      from flights
 16        connect by prior to_point=from_point
 17        start with to_point='A'
 18    ) where to_point='D'
 19  ) where rn = 1;

        RN FROM_POINT TO_POINT   STOP_COUNTS PATH       DISTANCE   TOTAL_DISTANCE
---------- ---------- ---------- ----------- ---------- ---------- --------------
         1 A          D                    3 A->B->D    0+5+25                 30
         1 A          D                    3 A->C->D    0+10+20                30


More SQL Puzzles:
- SQL Puzzle - Transpose Rows and Shift Values among columns
- Diamond Shaped Star Pattern by SQL
- Sorting Versions stored in Varchar2 Column
- SQL Puzzle - Grouping Deals
- SQL Puzzle - Consecutive Wins
- SQL Puzzle - Issue Tracker
- SQL Interview Question Answers

2 comments:

  1. From my reading of the SQL it suggests that the paths are directional thus all possible paths are not tested unless there are two rows per nodal pair. That is, it seems to only be tested for a subset of "graph shortest path problem".

    ReplyDelete
  2. Calculating Graph Shortest Path by SQL as referenced here is shown as a fixed very complex programming solution. The solution I was demonstrating uses SQL's inherent Lowest Common Ancestor (LCA) processing to inherently process concurrent multipath processing. It naturally handles any complexity automatically and breaks the processing into multiple LCAs as necessary to inherently perform the most efficient and accurate processing as described in my above slideshare article. Why this new LCA version of this inherent technology works as described above is shown below in a previous article of mine. It precisely demonstrates how and why this dynamic SQL inherent LCA processing works to always produce the most meaningful result with no programming required. I discovered this natural LCA in SQL processing occurring naturally and determined how and why this occurs inherently.

    ReplyDelete