In this article we are going to tackle one of the problems in the graph theory department. The problem solution will be realized in two variants: Python and PostgreSQL (plain PL/pgSQL). Python version is aimed at visualizing the problem in detail and getting the grasp of the idea in a clear and accessible way. PostgreSQL solution makes use of recursive functions and recursive CTEs (common table expressions).
The code can be found in the git repository.
Literature, resources
- Leonhard Euler
- Pierre-Henry Fleury - I'd appreciate pointing me to a good resource. Some discussion can be found on stackexchange.com.
- Fleury's algorithm (Wikipedia)
- Encyclopedia of Mathematics
- Lecture by David Eppstein (University of California)
- Neelam Yadav / geeksforgeeks.org
Visualization
Although the problem (and the terms) might seem a bit overwhelming, the idea is simple: we want to visit all the vertices without ever traversing any of the edges more than once.
A very well known example: draw an envelope without lifting a pencil.
Thus, this article discusses the algorithm using this analogy. "An envelope with a little ribbon".
Fleury's algorithm
Fleury's algorithm isn't quite efficient and there are other algorithms. However, only Fleury's algorithm is covered here.
This Wikipedia article (in Polish) provides a generic pseudocode for a solution using a stack data structure. The algorithm modifies the graph, therefore that article also discusses an abstract data structure that would implement a copy constructor allowing for a copy of the original graph.
The solution presented in this article operates directly on the original copy of the graph: the edges are being removed during the traversal. This can be easily modified, though.
Rules:
The article Eulerian path and circuit for undirected graph (geeksforgeeks.org) already discusses the theory in a clear and visual way. Here we're listing only the essential rules.
Eulerian cycle exists when:
- all non-zero degree vertices are connected (isolated nodes are not considered)
- all vertices have even degree.
Eulerian path exists when:
- all non-zero degree vertices are connected (isolated nodes are not considered)
- the graph has 0 or 2 odd degree vertices and other vertices have even degree.
Python implementation
Python version is based on the excellent guide by Neelam Yadav / geeksforgeeks.org with tweaks, optimizations and added visualization.
class Graph:
def __init__(self, edges):
self.G = {}
[self.add_edge(*edge) for edge in edges]
self.G = {k: list(set(v)) for k, v in self.G.items()}
def add_edge(self, src, dst, idxs=None):
u_idx, v_idx = (idxs or [None, None])
self.G.setdefault(src, [])
gs = self.G[src]
gs.insert(u_idx, dst) if u_idx is not None else gs.append(dst)
self.G.setdefault(dst, [])
ds = self.G[dst]
ds.insert(v_idx, src) if v_idx is not None else ds.append(src)
def remove_edge(self, v, u):
u_idx = self.G[v].index(u)
self.G[v].pop(u_idx)
v_idx = self.G[u].index(v)
self.G[u].pop(v_idx)
return [u_idx, v_idx]
def get_odd_degree_vertices(self):
return {s: len(d) for s, d in self.G.items() if len(d) % 2 != 0}
def get_pathlen(self, src, limit=None, visited=None):
"""Recursive"""
visited = (visited or {})
visited[src] = 1
if limit is not None and len(visited) > limit:
return len(visited)
for adjacent in self.G[src]:
if adjacent not in visited:
self.get_pathlen(adjacent, limit, visited)
return len(visited)
def is_edge_valid(self, v, u):
if len(self.G[v]) == 1:
return True
vlen = self.get_pathlen(v)
[u_idx, v_idx] = self.remove_edge(v, u)
ulen = self.get_pathlen(u, vlen + 1)
self.add_edge(v, u, [u_idx, v_idx])
return vlen <= ulen
def traverse(self, v, traversed_paths):
"""Recursive"""
for u in self.G[v]:
if self.is_edge_valid(v, u):
path = [v, u]
back = [u, v]
assert path not in paths, "Does not traverse twice"
assert back not in paths, "Does not traverse twice"
paths.append(path)
self.remove_edge(v, u)
self.traverse(u, traversed_paths)
def find_eulerian_path(self):
odd_vertices = self.get_odd_degree_vertices()
n_odd = len(odd_vertices)
assert n_odd in [0, 2], "0 or 2 odd vertices"
nodes = odd_vertices if n_odd else self.G
assert nodes, "Has somewhere to start"
v = sorted(nodes.items(), key=lambda p: -p[1])[0][0]
traversed_paths = []
self.traverse(v, traversed_paths)
return traversed_paths
Analysis
- start
The algorithm starts with thefind_eulerian_path()
method. The code verifies the number of vertices having odd degree. If the graph satisfies the requirements, a maximum odd degree vertex is chosen as a start node (v
). If there are no vertices having odd degree, a node of highest degree is selected. - traversal
The algorithm then traverses the graph recursively (traverse()
). Each connected neighbor (u
) is considered a valid next edge if more vertices can be visited fromu
than fromv
. - choosing next edge
The choice of which edge to follow is based on the number of vertices reachable fromv
(current vertex) andu
(any neighbor). Before calculating the degree ofu
the currently considered edgev-u
is removed so that the algorithm doesn't go into a cycle. The edge is then added back again. The path length calculation (get_pathlen
) is optimized - it stops as soon as it finds that more nodes can be visited fromu
than fromv
. -
graph modification (in-place)
remove_edge()
- removes an edge and returns indices of the respective neighborsadd_edge()
- adds an edge, allows for inserting the previously removed neighbors at their exact original offsets in the list
PL/pgSQL (postgres implementation)
The solution is loosely based on the Python version.
The database has two schemas: store and graph.
CREATE SCHEMA IF NOT EXISTS store; -- data
CREATE SCHEMA IF NOT EXISTS graph; -- api
Implementation may use arrays or hstore
. By default a recursive variant with hstore
is used, so the hstore
extension has to be created:
CREATE EXTENSION IF NOT EXISTS hstore;
Schema: store
The graph is stored in a single table. During graph traversal the edges will only by marked as removed.
CREATE TABLE IF NOT EXISTS store.graph (
edge_id SERIAL PRIMARY KEY,
src TEXT NOT NULL,
dst TEXT NOT NULL,
is_removed BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE UNIQUE INDEX graph_uidx ON store.graph(src, dst);
Removed nodes could also be stored in an array or an hstore.
Graph: add_edge()
The graph is undirected, thus the add_edge()
function inserts two records: src-dst
and dst-src
. If the edge already exists, its is_removed
flag will be cleared. The function does not really need to return anything and can be PERFORM
ed. The return type is left for "debugging"/tracing purposes.
CREATE OR REPLACE FUNCTION graph.add_edge(_src TEXT, _dst TEXT)
RETURNS SETOF store.graph
AS $$
DECLARE
R store.graph;
BEGIN
-- RAISE NOTICE 'Adding edge: (% <-> %)', _src, _dst;
INSERT INTO store.graph(src, dst) VALUES(_src, _dst)
ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
RETURNING * INTO R;
RETURN NEXT R;
INSERT INTO store.graph(src, dst) VALUES(_dst, _src)
ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
RETURNING * INTO R;
RETURN NEXT R;
END
$$
LANGUAGE PLPGSQL;
Graph: remove_edge()
Edge removal can be realized by setting the is_removed
field to TRUE
. The function needs to remove both edges: src-dst
and dst-src
.
CREATE OR REPLACE FUNCTION graph.remove_edge(_src TEXT, _dst TEXT)
RETURNS VOID
AS $$
BEGIN
-- RAISE NOTICE 'Marking edge removed: (% <-> %)', _src, _dst;
UPDATE store.graph SET is_removed = TRUE
WHERE (
src = _src AND dst = _dst
OR
src = _dst AND dst = _src
);
END
$$
LANGUAGE PLPGSQL;
Data: storing the graph
At this point the graph.add_edge()
function can already be used to fill in the graph edges.
-- envelope
SELECT * FROM graph.add_edge('A', 'B');
SELECT * FROM graph.add_edge('A', 'C');
SELECT * FROM graph.add_edge('B', 'A');
SELECT * FROM graph.add_edge('B', 'C');
SELECT * FROM graph.add_edge('B', 'D');
SELECT * FROM graph.add_edge('B', 'E');
SELECT * FROM graph.add_edge('C', 'A');
SELECT * FROM graph.add_edge('C', 'B');
SELECT * FROM graph.add_edge('C', 'D');
SELECT * FROM graph.add_edge('C', 'E');
SELECT * FROM graph.add_edge('D', 'B');
SELECT * FROM graph.add_edge('D', 'C');
SELECT * FROM graph.add_edge('D', 'E');
SELECT * FROM graph.add_edge('E', 'B');
SELECT * FROM graph.add_edge('E', 'C');
SELECT * FROM graph.add_edge('E', 'D');
-- ribbon
SELECT * FROM graph.add_edge('E', 'F');
Data:
SELECT G.src, array_agg(G.dst ORDER BY G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
src | array_agg
-----+-----------
A | {B,C}
B | {A,C,D,E}
C | {A,B,D,E}
D | {B,C,E}
E | {B,C,D,F}
F | {E}
(6 rows)
Graph: assert_eulerian()
The following function simply counts the vertices that have odd degree. It raises an exception when it finds a different number of odd degree vertices than 0 or 2. The first SELECT
statement (CTE) returns the vertices with odd degree, the second statement counts the records returned by the CTE and checks if there are 0 or 2 of them. The result of the comparison is placed in is_eulerian_
boolean variable. If is_eulerian_
is FALSE
, an error is raised.
The function could return a boolean but its use here is limited to asserting the graph satisfies the requirements. Hence, RETURNS VOID
.
CREATE OR REPLACE FUNCTION graph.assert_eulerian()
RETURNS VOID
AS $$
DECLARE
is_eulerian_ BOOLEAN;
BEGIN
WITH stmt_get_odd AS (
SELECT COUNT(*)
FROM store.graph
WHERE NOT is_removed
GROUP BY src
HAVING array_length(array_agg(dst), 1) % 2 != 0
)
SELECT COUNT(*) = ANY(ARRAY[0, 2]) INTO is_eulerian_ FROM stmt_get_odd;
IF NOT is_eulerian_ THEN
RAISE EXCEPTION 'Not Eulerian';
END IF;
END
$$
LANGUAGE PLPGSQL;
Test: Eulerian graph
The "envelope with a ribbon" satisfies the requirements: 0 or 2 vertices of odd degree.
SELECT G.src, array_agg(G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
src | array_agg
-----+-----------
A | {B,C}
B | {A,C,D,E}
C | {A,B,D,E}
D | {B,C,E}
E | {B,C,D,F}
F | {E}
(6 rows)
Check:
SELECT graph.assert_eulerian();
assert_eulerian
-----------------
(1 row)
Test: Non-Eulerian graph
When the graph does not satisfy the requirements, an exception is raised.
SELECT G.src, array_agg(G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
src | array_agg
-----+-------------
A | {B,C,F}
B | {A,C,D,E,F}
C | {A,B,D,E,F}
D | {B,C,E,F}
E | {B,C,D,F}
F | {E,D,C,A,B}
(6 rows)
Check:
SELECT graph.assert_eulerian();
ERROR: Not Eulerian
CONTEXT: PL/pgSQL function graph.assert_eulerian() line 15 at RAISE
Graph: get_pathlen()
, variants
This function is crucial and probably the hardest to get right. It can be realized in several ways: as a recursive CTE (no cursors, based on and array or hstore
) or as a recursive function (with cursors or an array). Three different implementations are discussed here.
ARRAY
based, recursive CTE variant:
CREATE OR REPLACE FUNCTION graph.get_pathlen(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
pathlen_ INTEGER = 1;
BEGIN
WITH RECURSIVE my_traversal(dst, visited)
AS (
SELECT dst,
ARRAY[src]::TEXT[]
FROM store.graph
WHERE NOT is_removed AND src = _src
UNION
SELECT G.dst,
array_append(T.visited, G.src)
FROM store.graph G
JOIN my_traversal T ON G.src = T.dst
WHERE NOT G.is_removed AND NOT G.src=ANY(T.visited)
)
SELECT COUNT(DISTINCT(dst))
FROM my_traversal
INTO pathlen_;
RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;
The recursive traversal stores visited nodes in an array, the recursive term of the CTE excludes the visited nodes. The order of the records returned by this CTE may not be assumed, so the final result is the number of distinct elements.
Test:
SELECT graph.get_pathlen('E');
get_pathlen
-------------
6
hstore
based, recursive CTE variant:
Hash based version is a counterpart of the Python implementation above, but there's a catch. Although the function does basically the same, the result for one node is different due to the nature of recursive CTE. In case of Python the algorithm iterates the already known list of adjacent vertices, while in a recursive CTE there are two SELECT
statements.
WITH RECURSIVE cte_stmt AS (
SELECT statement -- non-recursive term
UNION [ALL]
SELECT statement -- recursive term
)
SELECT * FROM cte_stmt;
In Python the recursive call happens already after the if v not in visited
condition, while in a recursive CTE the condition is a part of the recursive term and gets evaluated once the recursive term has already started its execution.
a) tracking source vertex (v
) in hstore will result in node F
being one off
b) tracking v-u
- same as a)
c) tracking destination vertex (u
) in hstore will result in node E
being one off
These solutions work for the "envelope with a ribbon" case, however rigorous testing should follow.
The implementation can be analyzed and traced using the versions with a dedicated type.
The default variant is described later. However, this one deserves analysis. It looks correct and it works, but without thorough testing one cannot be sure. The code repository includes the hstore
recursive CTEs in the fixme_1.sql
and fixme_2.sql
files.
The function in variant "a)":
CREATE OR REPLACE FUNCTION graph.get_pathlen_hstore(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
pathlen_ INTEGER = 1;
BEGIN
WITH RECURSIVE my_traversal(lvl, dst, visited, row_num)
AS (
SELECT 0 AS lvl,
dst,
hstore(src, '1'),
0::bigint AS row_num
FROM store.graph
WHERE NOT is_removed AND src = _src
UNION
SELECT T.lvl + 1 AS lvl,
G.dst,
visited || (G.src || '=>1')::hstore,
ROW_NUMBER() OVER (PARTITION BY lvl) AS row_num
FROM store.graph G
JOIN my_traversal T ON G.src = T.dst
WHERE NOT G.is_removed AND
NOT exist(T.visited, G.src)
)
SELECT array_length(akeys(visited), 1)
FROM my_traversal
ORDER BY lvl DESC, row_num DESC LIMIT 1
INTO pathlen_;
RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;
The CTE returns the following:
lvl | dst | visited | row_num
-----+-----+--------------------------------------------------+---------
2 | A | "B"=>"1", "D"=>"1", "E"=>"1" | 1
2 | A | "B"=>"1", "C"=>"1", "E"=>"1" | 2
-- [...]
3 | A | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 1
3 | A | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 2
-- [...]
3 | E | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 37
3 | E | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 38
4 | A | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 1
4 | B | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 2
-- [...]
4 | E | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 21
4 | E | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 22
The final statement counts the keys in the hash (visited
) as returned in the last record returned by the recursive CTE.
Hence: ORDER BY lvl DESC, row_num DESC LIMIT 1
.
hstore base, recursive function variant (the default):
For the sake of clarity, the implementation is split into two separate functions:
- the entry function
- the recursive function (using
IN
andINOUT
parameters, although the result might also be explicitely returned ashstore
type)
The entry function:
CREATE OR REPLACE FUNCTION graph.get_pathlen_recursive(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
visited_ hstore = hstore(_src, '1');
BEGIN
SELECT * FROM graph._get_pathlen_recursive(_src, visited_) INTO visited_;
RETURN array_length(akeys(visited_), 1);
END
$$
LANGUAGE PLPGSQL;
The recursive algorithm:
CREATE OR REPLACE FUNCTION graph._get_pathlen_recursive(
IN _src TEXT,
INOUT _visited hstore)
RETURNS hstore
AS $$
DECLARE
u_ TEXT;
BEGIN
SELECT COALESCE(_visited, '') || hstore(_src, '1') INTO _visited;
FOR u_ IN SELECT dst FROM store.graph
WHERE NOT is_removed AND src = _src
LOOP
IF NOT exist(_visited, u_) THEN
SELECT graph._get_pathlen_recursive(u_, _visited) || _visited
INTO _visited;
END IF;
END LOOP;
END
$$
LANGUAGE PLPGSQL;
This approach yields correct results and using hstore
allows for optimum performance.
Test:
SELECT graph.get_pathlen_recursive('E');
get_pathlen_recursive
-----------------------
6
(1 row)
SELECT graph.get_pathlen_recursive('F');
get_pathlen_recursive
-----------------------
6
Graph: is_edge_valid()
This is an exact SQL counterpart of the Python implementation.
CREATE OR REPLACE FUNCTION graph.is_edge_valid(_src TEXT, _dst TEXT)
RETURNS BOOLEAN
AS $$
DECLARE
vn_ INTEGER = 0;
un_ INTEGER = 0;
BEGIN
SELECT count(*) FROM store.graph
WHERE NOT is_removed AND src = _src
GROUP BY src
INTO vn_;
IF vn_ = 1 THEN
RETURN TRUE;
END IF;
SELECT graph.get_pathlen_recursive(_src) INTO vn_;
PERFORM graph.remove_edge(_src, _dst);
SELECT graph.get_pathlen_recursive(_dst) INTO un_;
PERFORM graph.add_edge(_src, _dst);
RETURN vn_ <= un_;
END
$$
LANGUAGE PLPGSQL;
Graph: traverse()
This function traverses the graph in a recursive fashion. For each adjacent vertex the edge v-u
is checked. If the edge was not already traversed and is valid (more nodes reachable from u
), the path is added to the solution, the edge v-u
is removed and the traversal follows the edges starting at vertex u
. The collected paths are returned as an array of text elements.
CREATE OR REPLACE FUNCTION graph.traverse(
_v TEXT,
_paths TEXT[])
RETURNS TEXT[]
AS $$
DECLARE
u_ TEXT;
is_valid_ BOOLEAN = FALSE;
edge_src_dst_ TEXT;
edge_dst_src_ TEXT;
BEGIN
FOR u_ IN SELECT dst FROM store.graph
WHERE NOT is_removed AND src = _v
LOOP
SELECT * FROM graph.is_edge_valid(_v, u_) INTO is_valid_;
IF is_valid_ THEN
edge_src_dst_ = (_v || '-' || u_);
edge_dst_src_ = (u_ || '-' || _v);
IF (edge_src_dst_ = ANY(_paths) OR edge_dst_src_ = ANY(_paths))
THEN
RETURN _paths;
END IF;
SELECT * FROM array_append(_paths, _v || '-' || u_) INTO _paths;
PERFORM graph.remove_edge(_v, u_);
SELECT * FROM graph.traverse(u_, _paths) INTO _paths;
END IF;
END LOOP;
RETURN _paths;
END
$$
LANGUAGE PLPGSQL;
Graph: find_eulerian_path()
The algorithm starts with this function. An odd degree vertex is selected as a starting point. If such a vertex is found, the recursive traversal starts. The results are returned as a set of text records.
CREATE OR REPLACE FUNCTION find_eulerian_path()
RETURNS SETOF TEXT
AS $$
DECLARE
v_ TEXT;
path_ TEXT[] = ARRAY[]::TEXT[];
BEGIN
SELECT src,
array_length(array_agg(dst), 1) AS degree
FROM store.graph
WHERE NOT is_removed
GROUP BY src HAVING array_length(array_agg(dst), 1) % 2 != 0
ORDER BY degree DESC
LIMIT 1
INTO v_;
IF FOUND THEN
-- RAISE NOTICE 'Start vertex: %', v_;
SELECT * FROM graph.traverse(v_, path_) INTO path_;
END IF;
RETURN QUERY SELECT UNNEST(path_);
END
$$
LANGUAGE PLPGSQL;
Test:
SELECT * FROM find_eulerian_path();
NOTICE: Start vertex: D
find_eulerian_path
--------------------
D-B
B-A
A-C
C-B
B-E
E-C
C-D
D-E
E-F
(9 rows)
EXPLAIN ANALYZE SELECT * FROM find_eulerian_path();
NOTICE: Start vertex: D
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Function Scan on find_eulerian_path (cost=0.25..10.25 rows=1000 width=32) (actual time=7.406..7.407 rows=9 loops=1)
Planning Time: 0.011 ms
Execution Time: 7.413 ms
(3 rows)