Here is a list of all modules:

▼Data Types and Transformations
►Arrays and Matrices	Mathematical operations for arrays and matrices
Array Operations	Provides fast array operations supporting other MADlib modules
Matrix Operations	Provides fast matrix operations supporting other MADlib modules
►Matrix Factorization	Linear algebra methods that factorize a matrix into a product of matrices
Low-Rank Matrix Factorization	Performs low-rank matrix factorization for an incomplete matrix
Singular Value Decomposition	Performs factorization of dense and sparse matrices
Norms and Distance Functions	Provides utility functions for basic linear algebra operations
Sparse Vectors	Implements a sparse vector data type that provides compressed storage of vectors that may have many duplicate elements
Encoding Categorical Variables	Functions to encode categorical variables to prepare data for input into predictive algorithms
Path	A function to perform complex pattern matching across rows and extract useful information about the matches
Pivot	Pivoting and data summarization tools for preparing data for modeling operations
Sessionize	Session reconstruction of data consisting of a time-stamped sequence of events
Stemming	Provides Porter stemmer operations supporting other MADlib modules
▼Deep Learning	A collection of modules for deep learning
►Model Preparation	Prepare models and data for deep learning
Preprocess Data	Prepare training data for use by deep learning modules
Define Model Architectures	Function to load model architectures and weights into a table
Define Custom Functions	Function to load serialized Python objects into a table
Train Single Model	Fit, evaluate and predict for one model
►Train Multiple Models	Train multiple deep learning models at the same time for model architecture search and hyperparameter selection
Define Model Configurations	Generate configurations for model architecture search and hyperparameter tuning
Train Model Configurations	Explore network architectures and hyperparameters by training many models at a time
AutoML	Functions to run automated machine learning (autoML) methods for model architecture search and hyperparameter tuning
►Utilities for Deep Learning	Utilities specific to deep learning workflows
Show GPU Configuration	Utility function to report number and type of GPUs in the database cluster
▼Graph	Graph algorithms and measures associated with graphs
All Pairs Shortest Path	Finds the shortest paths between every vertex pair in a given graph
Breadth-First Search	Finds the nodes reachable from a given source vertex using a breadth-first approach
HITS	Finds the HITS scores (authority and hub) of all vertices in a directed graph
►Measures	A collection of metrics computed on a graph
Average Path Length	Computes the average shortest-path length of a graph
Closeness	Computes the closeness centrality value of each node in the graph
Graph Diameter	Computes the diameter of a graph
In-Out Degree	Computes the degrees for each vertex
PageRank	Finds the PageRank of all vertices in a directed graph
Single Source Shortest Path	Finds the shortest path from a single source vertex to every other vertex in a given graph
Weakly Connected Components	Finds all weakly connected components of a graph
▼Model Selection	Functions for model selection and model evaluation
Cross Validation	Estimates the fit of a predictive model given a data set and specifications for the training, prediction, and error estimation functions
Prediction Metrics	Provides various prediction accuracy metrics
Train-Test Split	A method for splitting a data set into separate training and testing sets
▼Sampling	A collection of methods for sampling from a population
Balanced Sampling	A method to independently sample classes to produce a balanced data set. This is commonly used when classes are imbalanced, to ensure that subclasses are adequately represented in the sample
Stratified Sampling	A method for independently sampling subpopulations (strata)
▼Statistics	A collection of probability and statistics modules
►Descriptive Statistics	Methods to compute descriptive statistics of a dataset
►Cardinality Estimators	Methods to estimate the number of unique values contained in data
CountMin (Cormode-Muthukrishnan)	Implements Cormode-Muthukrishnan CountMin sketches on integer values as a user-defined aggregate
FM (Flajolet-Martin)	Implements Flajolet-Martin's distinct count estimation as a user-defined aggregate
MFV (Most Frequent Values)	Implements the most frequent values variant of the CountMin sketch as a user-defined aggregate
Covariance and Correlation	Generates a covariance or Pearson correlation matrix for pairs of numeric columns in a table
Summary	Calculates general descriptive statistics for any data table
►Inferential Statistics	Methods to compute inferential statistics of a dataset
Hypothesis Tests	Provides functions to perform statistical hypothesis tests
Probability Functions	Provides cumulative distribution, density/mass, and quantile functions for a wide range of probability distributions
▼Supervised Learning	Methods to perform a variety of supervised learning tasks
Conditional Random Field	Constructs a Conditional Random Fields (CRF) model for labeling sequential data
k-Nearest Neighbors	Finds the \(k\) nearest data points to a given data point and outputs the majority vote of the output classes for classification, or the average of the target values for regression
Neural Network	Solves classification and regression problems with several fully connected layers and non-linear activation functions
►Regression Models	A collection of methods for modeling conditional expectation of a response variable
Clustered Variance	Calculates clustered variance for linear, logistic, and multinomial logistic regression models, and Cox proportional hazards models
Cox-Proportional Hazards Regression	Models the relationship between one or more independent predictor variables and the amount of time before an event occurs
Elastic Net Regularization	Generates a regularized regression model for variable selection in linear and logistic regression problems, combining the L1 and L2 penalties of the lasso and ridge methods
Generalized Linear Models	Estimates a generalized linear model (GLM). A GLM is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value
Linear Regression	Also called Ordinary Least Squares Regression, models the linear relationship between a dependent variable and one or more independent variables
Logistic Regression	Models the relationship between one or more predictor variables and a binary categorical dependent variable by predicting the probability of the dependent variable using a logistic function
Marginal Effects	Calculates marginal effects for the coefficients in regression problems
Multinomial Regression	Models the conditional distribution of a multinomial response variable using a linear combination of predictors
Ordinal Regression	Models data with an ordinal response variable
Robust Variance	Calculates Huber-White variance estimates for linear, logistic, and multinomial regression models, and for Cox proportional hazards models
Support Vector Machines	Solves classification and regression problems by separating data with a hyperplane or other nonlinear decision boundary
►Tree Methods	A collection of recursive partitioning (tree) methods
Decision Tree	Decision trees are tree-based supervised learning methods that can be used for classification and regression
Random Forest	Random forest is an ensemble learning method for classification and regression that constructs a multitude of decision trees at training time, then outputs the mean (regression) or mode (classification) of the predictions produced by the individual trees
▼Time Series Analysis	A collection of methods to analyze time series data
ARIMA	Generates a model with autoregressive, moving average, and integrated components for a time series dataset
▼Unsupervised Learning	A collection of methods for unsupervised learning tasks
►Association Rules	Methods used to discover patterns in transactional datasets
Apriori Algorithm	Computes association rules for a given set of data
►Clustering	Methods for clustering data
k-Means Clustering	Partitions a set of observations into clusters by finding centroids that minimize the sum of observations' distances from their closest centroid
►Dimensionality Reduction	Methods for reducing the number of variables in a dataset to obtain a set of principal variables
Principal Component Analysis	Produces a model that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components
Principal Component Projection	Projects a higher dimensional data point to a lower dimensional subspace spanned by principal components learned through the PCA training procedure
►Topic Modelling	A collection of methods to uncover abstract topics in a document corpus
Latent Dirichlet Allocation	Generates a Latent Dirichlet Allocation predictive model for a collection of documents
▼Utilities
Columns to Vector	Create a new table with all feature columns inserted into a single column as an array
Database Functions	Provides a collection of user-defined functions for performing common tasks in the database
►Linear Solvers	Methods that implement solutions for systems of consistent linear equations
Dense Linear Systems	Implements solution methods for large dense linear systems. Currently restricted to problems that fit in memory
Sparse Linear Systems	Implements solution methods for linear systems with sparse matrix input. Currently restricted to problems that fit in memory
Mini-Batch Preprocessor	Utility that prepares input data for use by models that support mini-batch as an optimization option
PMML Export	Implements the PMML XML standard to describe and exchange models produced by data mining and machine learning algorithms
Term Frequency	Provides a collection of functions for performing common tasks related to text analytics
Vector to Columns	Converts a feature array in a single column of an output table into multiple columns
▼Early Stage Development
Conjugate Gradient	Finds the solution to the linear system \( \boldsymbol Ax = \boldsymbol b \), where \(A\) is a symmetric, positive-definite matrix and \(x\) and \( \boldsymbol b \) are vectors
DBSCAN	Partitions a set of observations into clusters of arbitrary shape based on the density of nearby neighbors
Naive Bayes Classification	Constructs a classification model from a dataset where each attribute independently contributes to the probability that a data point belongs to a category
Random Sampling	Provides utility functions for sampling operations
XGBoost	This module allows you to use SQL to build gradient boosted tree models designed in XGBoost [1]
▼Deprecated Modules
Create Indicator Variables	Provides utility functions helpful for data preparation before modeling
Multinomial Logistic Regression	Also called softmax regression, models the relationship between one or more independent variables and a categorical dependent variable