2022-01-07 / Bartłomiej Kurek
Graphs: Eulerian Path, Fleury's algorithm

In this article we are going to tackle one of the problems in the graph theory department. The problem solution will be realized in two variants: Python and PostgreSQL (plain PL/pgSQL). Python version is aimed at visualizing the problem in detail and getting the grasp of the idea in a clear and accessible way. PostgreSQL solution makes use of recursive functions and recursive CTEs (common table expressions).

The code can be found in the git repository.

Literature, resources
Visualization
Fleury's algorithm
Python implementation
- Analysis
PL/pgSQL (postgres implementation)

Literature, resources

Leonhard Euler
Pierre-Henry Fleury - I'd appreciate pointing me to a good resource. Some discussion can be found on stackexchange.com.
Fleury's algorithm (Wikipedia)
Encyclopedia of Mathematics
Lecture by David Eppstein (University of California)
Neelam Yadav / geeksforgeeks.org

Visualization

Although the problem (and the terms) might seem a bit overwhelming, the idea is simple: we want to visit all the vertices without ever traversing any of the edges more than once.
A very well known example: draw an envelope without lifting a pencil.
Thus, this article discusses the algorithm using this analogy. "An envelope with a little ribbon".

Images available here

Fleury's algorithm

Fleury's algorithm isn't quite efficient and there are other algorithms. However, only Fleury's algorithm is covered here.

This Wikipedia article (in Polish) provides a generic pseudocode for a solution using a stack data structure. The algorithm modifies the graph, therefore that article also discusses an abstract data structure that would implement a copy constructor allowing for a copy of the original graph.
The solution presented in this article operates directly on the original copy of the graph: the edges are being removed during the traversal. This can be easily modified, though.

Rules:

The article Eulerian path and circuit for undirected graph (geeksforgeeks.org) already discusses the theory in a clear and visual way. Here we're listing only the essential rules.

Eulerian cycle exists when:

all non-zero degree vertices are connected (isolated nodes are not considered)
all vertices have even degree.

Eulerian path exists when:

all non-zero degree vertices are connected (isolated nodes are not considered)
the graph has 0 or 2 odd degree vertices and other vertices have even degree.

Python implementation

Python version is based on the excellent guide by Neelam Yadav / geeksforgeeks.org with tweaks, optimizations and added visualization.

class Graph:
    def __init__(self, edges):
        self.G = {}
        [self.add_edge(*edge) for edge in edges]
        self.G = {k: list(set(v)) for k, v in self.G.items()}

    def add_edge(self, src, dst, idxs=None):
        u_idx, v_idx = (idxs or [None, None])
        self.G.setdefault(src, [])
        gs = self.G[src]
        gs.insert(u_idx, dst) if u_idx is not None else gs.append(dst)

        self.G.setdefault(dst, [])
        ds = self.G[dst]
        ds.insert(v_idx, src) if v_idx is not None else ds.append(src)

    def remove_edge(self, v, u):
        u_idx = self.G[v].index(u)
        self.G[v].pop(u_idx)

        v_idx = self.G[u].index(v)
        self.G[u].pop(v_idx)

        return [u_idx, v_idx]

    def get_odd_degree_vertices(self):
        return {s: len(d) for s, d in self.G.items() if len(d) % 2 != 0}

    def get_pathlen(self, src, limit=None, visited=None):
        """Recursive"""
        visited = (visited or {})
        visited[src] = 1
        if limit is not None and len(visited) > limit:
            return len(visited)
        for adjacent in self.G[src]:
            if adjacent not in visited:
                self.get_pathlen(adjacent, limit, visited)
        return len(visited)

    def is_edge_valid(self, v, u):
        if len(self.G[v]) == 1:
            return True

        vlen = self.get_pathlen(v)
        [u_idx, v_idx] = self.remove_edge(v, u)
        ulen = self.get_pathlen(u, vlen + 1)
        self.add_edge(v, u, [u_idx, v_idx])

        return vlen <= ulen

    def traverse(self, v, traversed_paths):
        """Recursive"""
        for u in self.G[v]:
            if self.is_edge_valid(v, u):
                path = [v, u]
                back = [u, v]
                assert path not in paths, "Does not traverse twice"
                assert back not in paths, "Does not traverse twice"
                paths.append(path)

                self.remove_edge(v, u)
                self.traverse(u, traversed_paths)

   def find_eulerian_path(self):
        odd_vertices = self.get_odd_degree_vertices()
        n_odd = len(odd_vertices)
        assert n_odd in [0, 2], "0 or 2 odd vertices"
        nodes = odd_vertices if n_odd else self.G
        assert nodes, "Has somewhere to start"
        v = sorted(nodes.items(), key=lambda p: -p[1])[0][0]
        traversed_paths = []
        self.traverse(v, traversed_paths)
        return traversed_paths

Analysis

start
The algorithm starts with the find_eulerian_path() method. The code verifies the number of vertices having odd degree. If the graph satisfies the requirements, a maximum odd degree vertex is chosen as a start node (v). If there are no vertices having odd degree, a node of highest degree is selected.
traversal
The algorithm then traverses the graph recursively (traverse()). Each connected neighbor (u) is considered a valid next edge if more vertices can be visited from u than from v.
choosing next edge
The choice of which edge to follow is based on the number of vertices reachable from v (current vertex) and u (any neighbor). Before calculating the degree of u the currently considered edge v-u is removed so that the algorithm doesn't go into a cycle. The edge is then added back again. The path length calculation (get_pathlen) is optimized - it stops as soon as it finds that more nodes can be visited from u than from v.
graph modification (in-place)
- remove_edge() - removes an edge and returns indices of the respective neighbors
- add_edge() - adds an edge, allows for inserting the previously removed neighbors at their exact original offsets in the list

PL/pgSQL (postgres implementation)

The solution is loosely based on the Python version.
The database has two schemas: store and graph.

CREATE SCHEMA IF NOT EXISTS store; -- data
CREATE SCHEMA IF NOT EXISTS graph; -- api

Implementation may use arrays or hstore. By default a recursive variant with hstore is used, so the hstore extension has to be created:

CREATE EXTENSION IF NOT EXISTS hstore;

Schema: store

The graph is stored in a single table. During graph traversal the edges will only by marked as removed.

CREATE TABLE IF NOT EXISTS store.graph (
       edge_id SERIAL PRIMARY KEY,
       src TEXT NOT NULL,
       dst TEXT NOT NULL,
       is_removed BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE UNIQUE INDEX graph_uidx ON store.graph(src, dst);

Removed nodes could also be stored in an array or an hstore.

Graph: `add_edge()`

The graph is undirected, thus the add_edge() function inserts two records: src-dst and dst-src. If the edge already exists, its is_removed flag will be cleared. The function does not really need to return anything and can be PERFORMed. The return type is left for "debugging"/tracing purposes.

CREATE OR REPLACE FUNCTION graph.add_edge(_src TEXT, _dst TEXT)
RETURNS SETOF store.graph
AS $$
DECLARE
   R store.graph;
BEGIN
    -- RAISE NOTICE 'Adding edge: (% <-> %)', _src, _dst;

    INSERT INTO store.graph(src, dst) VALUES(_src, _dst)
           ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
           RETURNING * INTO R;

    RETURN NEXT R;

    INSERT INTO store.graph(src, dst) VALUES(_dst, _src)
           ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
           RETURNING * INTO R;

    RETURN NEXT R;
END
$$
LANGUAGE PLPGSQL;

Graph: `remove_edge()`

Edge removal can be realized by setting the is_removed field to TRUE. The function needs to remove both edges: src-dst and dst-src.

CREATE OR REPLACE FUNCTION graph.remove_edge(_src TEXT, _dst TEXT)
RETURNS VOID
AS $$
BEGIN
    -- RAISE NOTICE 'Marking edge removed: (% <-> %)', _src, _dst;

    UPDATE store.graph SET is_removed = TRUE
           WHERE (
                 src = _src AND dst = _dst
                 OR
                 src = _dst AND dst = _src
           );
END
$$
LANGUAGE PLPGSQL;

Data: storing the graph

At this point the graph.add_edge() function can already be used to fill in the graph edges.

-- envelope
SELECT * FROM graph.add_edge('A', 'B');
SELECT * FROM graph.add_edge('A', 'C');
SELECT * FROM graph.add_edge('B', 'A');
SELECT * FROM graph.add_edge('B', 'C');
SELECT * FROM graph.add_edge('B', 'D');
SELECT * FROM graph.add_edge('B', 'E');
SELECT * FROM graph.add_edge('C', 'A');
SELECT * FROM graph.add_edge('C', 'B');
SELECT * FROM graph.add_edge('C', 'D');
SELECT * FROM graph.add_edge('C', 'E');
SELECT * FROM graph.add_edge('D', 'B');
SELECT * FROM graph.add_edge('D', 'C');
SELECT * FROM graph.add_edge('D', 'E');
SELECT * FROM graph.add_edge('E', 'B');
SELECT * FROM graph.add_edge('E', 'C');
SELECT * FROM graph.add_edge('E', 'D');

-- ribbon
SELECT * FROM graph.add_edge('E', 'F');

Data:

SELECT G.src, array_agg(G.dst ORDER BY G.dst) FROM store.graph G
       WHERE NOT G.is_removed
       GROUP BY G.src ORDER BY G.src;

 src | array_agg
-----+-----------
 A   | {B,C}
 B   | {A,C,D,E}
 C   | {A,B,D,E}
 D   | {B,C,E}
 E   | {B,C,D,F}
 F   | {E}
(6 rows)

Graph: `assert_eulerian()`

The following function simply counts the vertices that have odd degree. It raises an exception when it finds a different number of odd degree vertices than 0 or 2. The first SELECT statement (CTE) returns the vertices with odd degree, the second statement counts the records returned by the CTE and checks if there are 0 or 2 of them. The result of the comparison is placed in is_eulerian_ boolean variable. If is_eulerian_ is FALSE, an error is raised.

The function could return a boolean but its use here is limited to asserting the graph satisfies the requirements. Hence, RETURNS VOID.

CREATE OR REPLACE FUNCTION graph.assert_eulerian()
RETURNS VOID
AS $$
DECLARE
    is_eulerian_ BOOLEAN;
BEGIN
    WITH stmt_get_odd AS (
         SELECT COUNT(*)
            FROM store.graph
            WHERE NOT is_removed
            GROUP BY src
            HAVING array_length(array_agg(dst), 1) % 2 != 0
    )
    SELECT COUNT(*) = ANY(ARRAY[0, 2]) INTO is_eulerian_ FROM stmt_get_odd;

    IF NOT is_eulerian_ THEN
       RAISE EXCEPTION 'Not Eulerian';
    END IF;
END
$$
LANGUAGE PLPGSQL;

Test: Eulerian graph

The "envelope with a ribbon" satisfies the requirements: 0 or 2 vertices of odd degree.

SELECT G.src, array_agg(G.dst) FROM store.graph G
       WHERE NOT G.is_removed
       GROUP BY G.src ORDER BY G.src;

 src | array_agg
-----+-----------
 A   | {B,C}
 B   | {A,C,D,E}
 C   | {A,B,D,E}
 D   | {B,C,E}
 E   | {B,C,D,F}
 F   | {E}
(6 rows)

Check:

SELECT graph.assert_eulerian();

 assert_eulerian
-----------------

(1 row)

Test: Non-Eulerian graph

When the graph does not satisfy the requirements, an exception is raised.

SELECT G.src, array_agg(G.dst) FROM store.graph G
       WHERE NOT G.is_removed
       GROUP BY G.src ORDER BY G.src;

 src |  array_agg
-----+-------------
 A   | {B,C,F}
 B   | {A,C,D,E,F}
 C   | {A,B,D,E,F}
 D   | {B,C,E,F}
 E   | {B,C,D,F}
 F   | {E,D,C,A,B}
(6 rows)

Check:

SELECT graph.assert_eulerian();

ERROR:  Not Eulerian
CONTEXT:  PL/pgSQL function graph.assert_eulerian() line 15 at RAISE

Graph: `get_pathlen()`, variants

This function is crucial and probably the hardest to get right. It can be realized in several ways: as a recursive CTE (no cursors, based on and array or hstore) or as a recursive function (with cursors or an array). Three different implementations are discussed here.

ARRAY based, recursive CTE variant:

CREATE OR REPLACE FUNCTION graph.get_pathlen(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
    pathlen_ INTEGER = 1;
BEGIN
    WITH RECURSIVE my_traversal(dst, visited)
    AS (
       SELECT dst,
              ARRAY[src]::TEXT[]
              FROM store.graph
              WHERE NOT is_removed AND src = _src
       UNION
       SELECT G.dst,
              array_append(T.visited, G.src)
              FROM store.graph G
              JOIN my_traversal T ON G.src = T.dst
              WHERE NOT G.is_removed AND NOT G.src=ANY(T.visited)
    )
    SELECT COUNT(DISTINCT(dst))
           FROM my_traversal
           INTO pathlen_;

    RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;

The recursive traversal stores visited nodes in an array, the recursive term of the CTE excludes the visited nodes. The order of the records returned by this CTE may not be assumed, so the final result is the number of distinct elements.

Test:

SELECT graph.get_pathlen('E');

 get_pathlen
-------------
           6

hstore based, recursive CTE variant:

Hash based version is a counterpart of the Python implementation above, but there's a catch. Although the function does basically the same, the result for one node is different due to the nature of recursive CTE. In case of Python the algorithm iterates the already known list of adjacent vertices, while in a recursive CTE there are two SELECT statements.

WITH RECURSIVE cte_stmt AS (
    SELECT statement -- non-recursive term
    UNION [ALL]
    SELECT statement -- recursive term
)
SELECT * FROM cte_stmt;

In Python the recursive call happens already after the if v not in visited condition, while in a recursive CTE the condition is a part of the recursive term and gets evaluated once the recursive term has already started its execution.
a) tracking source vertex (v) in hstore will result in node F being one off
b) tracking v-u - same as a)
c) tracking destination vertex (u) in hstore will result in node E being one off

These solutions work for the "envelope with a ribbon" case, however rigorous testing should follow.
The implementation can be analyzed and traced using the versions with a dedicated type.
The default variant is described later. However, this one deserves analysis. It looks correct and it works, but without thorough testing one cannot be sure. The code repository includes the hstore recursive CTEs in the fixme_1.sql and fixme_2.sql files.

The function in variant "a)":

CREATE OR REPLACE FUNCTION graph.get_pathlen_hstore(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
    pathlen_ INTEGER = 1;
BEGIN
    WITH RECURSIVE my_traversal(lvl, dst, visited, row_num)
    AS (
       SELECT 0 AS lvl,
              dst,
              hstore(src, '1'),
              0::bigint AS row_num
              FROM store.graph
              WHERE NOT is_removed AND src = _src
       UNION
       SELECT T.lvl + 1 AS lvl,
              G.dst,
              visited || (G.src || '=>1')::hstore,
              ROW_NUMBER() OVER (PARTITION BY lvl) AS row_num
              FROM store.graph G
              JOIN my_traversal T ON G.src = T.dst
              WHERE NOT G.is_removed AND
                    NOT exist(T.visited, G.src)
    )
    SELECT array_length(akeys(visited), 1)
           FROM my_traversal
           ORDER BY lvl DESC, row_num DESC LIMIT 1
           INTO pathlen_;

    RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;

The CTE returns the following:

 lvl | dst |                     visited                      | row_num
-----+-----+--------------------------------------------------+---------
   2 | A   | "B"=>"1", "D"=>"1", "E"=>"1"                     |       1
   2 | A   | "B"=>"1", "C"=>"1", "E"=>"1"                     |       2
   -- [...]
   3 | A   | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1"           |       1
   3 | A   | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1"           |       2
   -- [...]
   3 | E   | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1"           |      37
   3 | E   | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1"           |      38
   4 | A   | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" |       1
   4 | B   | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" |       2
   -- [...]
   4 | E   | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" |      21
   4 | E   | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" |      22

The final statement counts the keys in the hash (visited) as returned in the last record returned by the recursive CTE.
Hence: ORDER BY lvl DESC, row_num DESC LIMIT 1.

hstore base, recursive function variant (the default):
For the sake of clarity, the implementation is split into two separate functions:

the entry function
the recursive function (using IN and INOUT parameters, although the result might also be explicitely returned as hstore type)

The entry function:

CREATE OR REPLACE FUNCTION graph.get_pathlen_recursive(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
    visited_ hstore = hstore(_src, '1');
BEGIN
    SELECT * FROM graph._get_pathlen_recursive(_src, visited_) INTO visited_;
    RETURN array_length(akeys(visited_), 1);
END
$$
LANGUAGE PLPGSQL;

The recursive algorithm:

CREATE OR REPLACE FUNCTION graph._get_pathlen_recursive(
       IN _src TEXT,
       INOUT _visited hstore)
RETURNS hstore
AS $$
DECLARE
    u_ TEXT;
BEGIN
    SELECT COALESCE(_visited, '') || hstore(_src, '1') INTO _visited;

    FOR u_ IN SELECT dst FROM store.graph
                         WHERE NOT is_removed AND src = _src
    LOOP
           IF NOT exist(_visited, u_) THEN
              SELECT graph._get_pathlen_recursive(u_, _visited) || _visited
                     INTO _visited;
           END IF;
    END LOOP;
END
$$
LANGUAGE PLPGSQL;

This approach yields correct results and using hstore allows for optimum performance.

Test:

SELECT graph.get_pathlen_recursive('E');

 get_pathlen_recursive
-----------------------
                     6
(1 row)

SELECT graph.get_pathlen_recursive('F');

 get_pathlen_recursive
-----------------------
                     6

Graph: `is_edge_valid()`

This is an exact SQL counterpart of the Python implementation.

CREATE OR REPLACE FUNCTION graph.is_edge_valid(_src TEXT, _dst TEXT)
RETURNS BOOLEAN
AS $$
DECLARE
    vn_ INTEGER = 0;
    un_ INTEGER = 0;
BEGIN
   SELECT count(*) FROM store.graph
          WHERE NOT is_removed AND src = _src
          GROUP BY src
          INTO vn_;

   IF vn_ = 1 THEN
      RETURN TRUE;
   END IF;

   SELECT graph.get_pathlen_recursive(_src) INTO vn_;
   PERFORM graph.remove_edge(_src, _dst);

   SELECT graph.get_pathlen_recursive(_dst) INTO un_;
   PERFORM graph.add_edge(_src, _dst);

   RETURN vn_ <= un_;
END
$$
LANGUAGE PLPGSQL;

Graph: `traverse()`

This function traverses the graph in a recursive fashion. For each adjacent vertex the edge v-u is checked. If the edge was not already traversed and is valid (more nodes reachable from u), the path is added to the solution, the edge v-u is removed and the traversal follows the edges starting at vertex u. The collected paths are returned as an array of text elements.

CREATE OR REPLACE FUNCTION graph.traverse(
       _v TEXT,
       _paths TEXT[])
RETURNS TEXT[]
AS $$
DECLARE
    u_ TEXT;
    is_valid_ BOOLEAN = FALSE;
    edge_src_dst_ TEXT;
    edge_dst_src_ TEXT;
BEGIN
    FOR u_ IN SELECT dst FROM store.graph
                     WHERE NOT is_removed AND src = _v
    LOOP
        SELECT * FROM graph.is_edge_valid(_v, u_) INTO is_valid_;
        IF is_valid_ THEN
           edge_src_dst_ = (_v || '-' || u_);
           edge_dst_src_ = (u_ || '-' || _v);

           IF (edge_src_dst_ = ANY(_paths) OR edge_dst_src_ = ANY(_paths))
           THEN
               RETURN _paths;
           END IF;

           SELECT * FROM array_append(_paths, _v || '-' || u_) INTO _paths;

           PERFORM graph.remove_edge(_v, u_);

           SELECT * FROM graph.traverse(u_, _paths) INTO _paths;
        END IF;
    END LOOP;

    RETURN _paths;
END
$$
LANGUAGE PLPGSQL;

Graph: `find_eulerian_path()`

The algorithm starts with this function. An odd degree vertex is selected as a starting point. If such a vertex is found, the recursive traversal starts. The results are returned as a set of text records.

CREATE OR REPLACE FUNCTION find_eulerian_path()
RETURNS SETOF TEXT
AS $$
DECLARE
    v_ TEXT;
    path_ TEXT[] = ARRAY[]::TEXT[];
BEGIN
    SELECT src,
           array_length(array_agg(dst), 1) AS degree
        FROM store.graph
        WHERE NOT is_removed
        GROUP BY src HAVING array_length(array_agg(dst), 1) % 2 != 0
        ORDER BY degree DESC
        LIMIT 1
        INTO v_;

    IF FOUND THEN
        -- RAISE NOTICE 'Start vertex: %', v_;
        SELECT * FROM graph.traverse(v_, path_) INTO path_;
    END IF;

    RETURN QUERY SELECT UNNEST(path_);
END
$$
LANGUAGE PLPGSQL;