Given a graph and a source vertex, the breadth-first search (BFS) algorithm finds all nodes reachable from the source vertex by searching / traversing the graph in a breadth-first manner.
graph_bfs( vertex_table, vertex_id, edge_table, edge_args, source_vertex, out_table, max_distance, directed, grouping_cols )
Arguments
TEXT. Name of the table containing the vertex data for the graph. Must contain the column specified in the 'vertex_id' parameter below.
TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex ids. The vertex ids can be of type INTEGER or BIGINT with no duplicates. They do not need to be contiguous.
TEXT. Name of the table containing the edge data. The edge table must contain columns for source vertex and destination vertex. Column naming convention is described below in the 'edge_args' parameter. In addition to vertex columns, if grouping is used then the columns specified in the 'grouping_cols' parameter must be present.
TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". The following parameters are supported for this string argument:
INTEGER or BIGINT. The source vertex id for the algorithm to start. This vertex id must exist in the 'vertex_id' column of 'vertex_table'.
TEXT. Name of the table to store the result of BFS. It contains a row for every vertex that is reachable from the source_vertex. In the presence of grouping columns, only those edges are used for which there are no NULL values in any grouping column. The output table will have the following columns (in addition to the grouping columns):
A summary table named <out_table>_summary is also created. This is an internal table that keeps a record of the input parameters.
INT, default = NULL. Maximum distance to traverse from the source vertex. When this value is null, traverses until reaches leaf node. E.g., if set to 1 will return only adjacent vertices, if set to 7 will return vertices up to a maximum distance of 7 vertices away.
BOOLEAN, default = FALSE. If TRUE the graph will be treated as directed, else it will be treated as an undirected graph.
DROP TABLE IF EXISTS vertex, edge; CREATE TABLE vertex( id INTEGER ); CREATE TABLE edge( src INTEGER, dest INTEGER ); INSERT INTO vertex VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11) ; INSERT INTO edge VALUES (0, 5), (1, 0), (1, 3), (2, 6), (3, 4), (3, 5), (4, 2), (8, 9), (9, 10), (9, 11), (10, 8);
DROP TABLE IF EXISTS out, out_summary; SELECT madlib.graph_bfs( 'vertex', -- Vertex table NULL, -- Vertix id column (NULL means use default naming) 'edge', -- Edge table NULL, -- Edge arguments (NULL means use default naming) 3, -- Source vertex for BFS 'out'); -- Output table of nodes reachable from source_vertex -- Default values used for the other arguments SELECT * FROM out ORDER BY dist,id;
id | dist | parent ----+------+-------- 3 | 0 | 3 1 | 1 | 3 4 | 1 | 3 5 | 1 | 3 0 | 2 | 1 2 | 2 | 4 6 | 3 | 2 (7 rows)
SELECT * FROM out_summary;
vertex_table | vertex_id | edge_table | edge_args | source_vertex | out_table | max_distance | directed | grouping_cols --------------+-----------+------------+-----------+---------------+-----------+--------------+----------+--------------- vertex | NULL | edge | NULL | 3 | out | | | NULL (1 row)
DROP TABLE IF EXISTS out_max, out_max_summary; SELECT madlib.graph_bfs( 'vertex', -- Vertex table NULL, -- Vertix id column (NULL means use default naming) 'edge', -- Edge table NULL, -- Edge arguments (NULL means use default naming) 3, -- Source vertex for BFS 'out_max', -- Output table of nodes reachable from source_vertex 2); -- Maximum distance to traverse from source_vertex -- Default values used for the other arguments SELECT * FROM out_max ORDER BY dist,id;
id | dist | parent ----+------+-------- 3 | 0 | 3 1 | 1 | 3 4 | 1 | 3 5 | 1 | 3 0 | 2 | 1 2 | 2 | 4 (6 rows)
DROP TABLE IF EXISTS vertex_alt, edge_alt; CREATE TABLE vertex_alt AS SELECT id AS v_id FROM vertex; CREATE TABLE edge_alt AS SELECT src AS n1, dest AS n2 FROM edge;
DROP TABLE IF EXISTS out_alt, out_alt_summary; SELECT madlib.graph_bfs( 'vertex_alt', -- Vertex table 'v_id', -- Vertex id column (NULL means use default naming) 'edge_alt', -- Edge table 'src=n1, dest=n2', -- Edge arguments (NULL means use default naming) 8, -- Source vertex for BFS 'out_alt'); -- Output table of nodes reachable from source_vertex SELECT * FROM out_alt ORDER BY v_id;
v_id | dist | parent ------+------+-------- 8 | 0 | 8 9 | 1 | 8 10 | 1 | 8 11 | 2 | 9
DROP TABLE IF EXISTS out_alt_dir, out_alt_dir_summary; SELECT madlib.graph_bfs( 'vertex_alt', -- Vertex table 'v_id', -- Vertex id column (NULL means use default naming) 'edge_alt', -- Edge table 'src=n1, dest=n2', -- Edge arguments (NULL means use default naming) 8, -- Source vertex for BFS 'out_alt_dir', -- Output table of nodes reachable from source_vertex NULL, -- Maximum distance to traverse from source_vertex TRUE); -- Flag for specifying directed graph SELECT * FROM out_alt_dir ORDER BY v_id;
v_id | dist | parent ------+------+-------- 8 | 0 | 8 9 | 1 | 8 10 | 2 | 9 11 | 2 | 9 (4 rows)Notice that, with the graph being treated as directed, the parent of v_id=10 is now vertex 9 and not 8 as in the undirected case.
DROP TABLE IF EXISTS edge_gr; CREATE TABLE edge_gr( g1 INTEGER, g2 TEXT, src INTEGER, dest INTEGER ); INSERT INTO edge_gr VALUES (100, 'a', 0, 5), (100, 'a', 1, 0), (100, 'a', 1, 3), (100, 'a', 2, 6), (100, 'a', 3, 4), (100, 'a', 3, 5), (100, 'a', 4, 2), (100, 'a', 8, 9), (100, 'a', 9, 10), (100, 'a', 9, 11), (100, 'a', 10, 8), (202, 'c', 8, 9), (202, 'c', 9, 10), (202, 'c', 9, 11), (202, 'c', 10, 8) ;
DROP TABLE IF EXISTS out_gr, out_gr_summary; SELECT madlib.graph_bfs( 'vertex', -- Vertex table NULL, -- Vertex id column (NULL means use default naming) 'edge_gr', -- Edge table NULL, -- Edge arguments (NULL means use default naming) 8, -- Source vertex for BFS 'out_gr', -- Output table of nodes reachable from source_vertex NULL, -- Maximum distance to traverse from source_vertex NULL, -- Flag for specifying directed graph 'g1,g2' -- Grouping columns ); SELECT * FROM out_gr ORDER BY g1,g2,dist,id;
g1 | g2 | id | dist | parent -----+----+----+------+-------- 100 | a | 8 | 0 | 8 100 | a | 9 | 1 | 8 100 | a | 10 | 1 | 8 100 | a | 11 | 2 | 9 202 | c | 8 | 0 | 8 202 | c | 9 | 1 | 8 202 | c | 10 | 1 | 8 202 | c | 11 | 2 | 9 (8 rows)If source_vertex is not present in a group, then that group will not appear in the output table.
DROP TABLE IF EXISTS out_gr, out_gr_summary; SELECT madlib.graph_bfs( 'vertex', -- Vertex table NULL, -- Vertex id column (NULL means use default naming) 'edge_gr', -- Edge table NULL, -- Edge arguments (NULL means use default naming) 3, -- Source vertex for BFS 'out_gr', -- Output table of nodes reachable from source_vertex NULL, -- Maximum distance to traverse from source_vertex NULL, -- Flag for specifying directed graph 'g1,g2' -- Grouping columns ); SELECT * FROM out_gr ORDER BY g1,g2,dist,id;
g1 | g2 | id | dist | parent -----+----+----+------+-------- 100 | a | 3 | 0 | 3 100 | a | 1 | 1 | 3 100 | a | 4 | 1 | 3 100 | a | 5 | 1 | 3 100 | a | 0 | 2 | 1 100 | a | 2 | 2 | 4 100 | a | 6 | 3 | 2 (7 rows)
[1] Breadth-first Search algorithm. https://en.wikipedia.org/wiki/Breadth-first_search