Given a directed graph, a weakly connected component (WCC) is a subgraph of the original graph where all vertices are connected to each other by some path, ignoring the direction of edges. In case of an undirected graph, a weakly connected component is also a strongly connected component. This module also includes a number of helper functions that operate on the WCC output.
weakly_connected_components( vertex_table, vertex_id, edge_table, edge_args, out_table, grouping_cols )
Arguments
TEXT. Name of the table containing the vertex data for the graph. Must contain the column specified in the 'vertex_id' parameter below.
TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex ids. The vertex ids are of type INTEGER with no duplicates. They do not need to be contiguous.
TEXT. Name of the table containing the edge data. The edge table must contain columns for source vertex and destination vertex.
TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". The following parameters are supported for this string argument:
TEXT. Name of the table to store the component ID associated with each vertex. It will contain a row for every vertex from 'vertex_table' with the following columns:
A summary table named <out_table>_summary is also created. This is an internal table that keeps a record of some of the input parameters and is used by the weakly connected component helper functions.
The largest connected component retrieval function finds the largest weakly connected component(s) in a graph. If weakly connected components was run with grouping, the largest connected components are computed for each group.
graph_wcc_largest_cpt( wcc_table, largest_cpt_table )
Arguments
TEXT. Name of the table that contains the output of weakly connected components.
This function creates a histogram of the number of vertices per connected component.
graph_wcc_histogram( wcc_table, histogram_table )
Arguments
TEXT. Name of the table that contains the output of weakly connected components.
TEXT. Name of the output table that contains the number of vertices per component. A row is created for every comoponent in every group if grouping_cols was specified when running weakly connected components. The output table has the following columns:
This function determines if two vertices belong to the same component.
graph_wcc_vertex_check( wcc_table, vertex_pair, pair_table )
Arguments
TEXT. Name of the table that contains the output of weakly connected components.
TEXT. A pair of vertex IDs separated by a comma.
TEXT. Name of the output table that specifies if the two vertices in vertex_pair belong to the same component. If wcc_table was generated using grouping_cols, all the components in all groups are considered. The output table has the following columns:
This function finds all the vertices that can be reached from a given vertex via weakly connected paths.
graph_wcc_reachable_vertices( wcc_table, src, reachable_vertices_table )
Arguments
TEXT. Name of the table that contains the output of weakly connected components.
TEXT. The vertex ID from which all reachable vertices have to be found.
TEXT. Name of the output table that contains the list of vertices that are reachable from the src vertex. The output table has the following columns:
This function finds the total number of components in the input graph.
graph_wcc_num_cpts( wcc_table, count_table )
Arguments
TEXT. Name of the table that contains the output of weakly connected components.
TEXT. Name of the output table that contains the total number of components per group in the graph, if there are any grouping_cols in wcc_table. The output table has the following columns:
DROP TABLE IF EXISTS vertex, edge; CREATE TABLE vertex( id INTEGER ); CREATE TABLE edge( src INTEGER, dest INTEGER, user_id INTEGER ); INSERT INTO vertex VALUES (0), (1), (2), (3), (4), (5), (6), (10), (11), (12), (13), (14), (15), (16); INSERT INTO edge VALUES (0, 1, 1), (0, 2, 1), (1, 2, 1), (1, 3, 1), (2, 3, 1), (2, 5, 1), (2, 6, 1), (3, 0, 1), (5, 6, 1), (6, 3, 1), (10, 11, 2), (10, 12, 2), (11, 12, 2), (11, 13, 2), (12, 13, 2), (13, 10, 2), (15, 16, 2), (15, 14, 2);
DROP TABLE IF EXISTS wcc_out, wcc_out_summary; SELECT madlib.weakly_connected_components( 'vertex', -- Vertex table 'id', -- Vertix id column 'edge', -- Edge table 'src=src, dest=dest', -- Comma delimted string of edge arguments 'wcc_out'); -- Output table of weakly connected components SELECT * FROM wcc_out ORDER BY component_id, id;
id | component_id ----+-------------- 0 | 0 1 | 0 2 | 0 3 | 0 5 | 0 6 | 0 4 | 4 10 | 10 11 | 10 12 | 10 13 | 10 14 | 14 15 | 14 16 | 14 (14 rows)
DROP TABLE IF EXISTS wcc_out, wcc_out_summary; SELECT madlib.weakly_connected_components( 'vertex', -- Vertex table 'id', -- Vertix id column 'edge', -- Edge table 'src=src, dest=dest', -- Comma delimted string of edge arguments 'wcc_out', -- Output table of weakly connected components 'user_id'); -- Grouping column name SELECT * FROM wcc_out ORDER BY user_id, component_id, id;
id | component_id | user_id ----+--------------+--------- 0 | 0 | 1 1 | 0 | 1 2 | 0 | 1 3 | 0 | 1 5 | 0 | 1 6 | 0 | 1 10 | 10 | 2 11 | 10 | 2 12 | 10 | 2 13 | 10 | 2 14 | 14 | 2 15 | 14 | 2 16 | 14 | 2 (13 rows)Note that vertex 4 is not identified as a separate component above. This is because there is no entry in the edge table for vertex 4 indicating which group it belongs to (though you could do that if you wanted to).
DROP TABLE IF EXISTS largest_cpt_table; SELECT madlib.graph_wcc_largest_cpt( 'wcc_out', -- WCC output table 'largest_cpt_table'); -- output table containing largest component ID SELECT * FROM largest_cpt_table ORDER BY component_id;
user_id | component_id | num_vertices ---------+--------------+-------------- 1 | 0 | 6 2 | 10 | 4 (2 rows)
DROP TABLE IF EXISTS histogram_table; SELECT madlib.graph_wcc_histogram( 'wcc_out', -- WCC output table 'histogram_table'); -- output table containing the histogram of vertices SELECT * FROM histogram_table ORDER BY component_id;
user_id | component_id | num_vertices ---------+--------------+-------------- 1 | 0 | 6 2 | 10 | 4 2 | 14 | 3 (3 rows)
DROP TABLE IF EXISTS vc_table; SELECT madlib.graph_wcc_vertex_check( 'wcc_out', -- WCC output table '14,15', -- Pair of vertex IDs 'vc_table'); -- output table containing components that contain the two vertices SELECT * FROM vc_table ORDER BY component_id;
user_id | component_id ---------+-------------- 2 | 14 (1 row)
DROP TABLE IF EXISTS reach_table; SELECT madlib.graph_wcc_reachable_vertices( 'wcc_out', -- WCC output table '0', -- source vertex 'reach_table'); -- output table containing all vertices reachable from source vertex SELECT * FROM reach_table ORDER BY component_id, dest;
user_id | component_id | dest ---------+--------------+------ 1 | 0 | 1 1 | 0 | 2 1 | 0 | 3 1 | 0 | 5 1 | 0 | 6 (5 rows)
DROP TABLE IF EXISTS count_table; SELECT madlib.graph_wcc_num_cpts( 'wcc_out', -- WCC output table 'count_table'); -- output table containing number of components per group SELECT * FROM count_table;
user_id | num_components ------—+-------------— 1 | 1 2 | 2 (2 rows)