Modules

Here is a list of all modules:

▼Data Types and Transformations | |

►Arrays and Matrices | Mathematical operations for arrays and matrices |

Encoding Categorical Variables | Functions to encode categorical variables to prepare data for input into predictive algorithms |

Path | A function to perform complex pattern matching across rows and extract useful information about the matches |

Pivot | Pivoting and data summarization tools for preparing data for modeling operations |

Sessionize | Session reconstruction of data consisting of a time-stamped sequence of events |

Stemming | Provides Porter stemmer operations supporting other MADlib modules |

▼Deep Learning | A collection of modules for deep learning |

►Model Preparation | Prepare models and data for deep learning |

Train Single Model | Fit, evaluate and predict for one model |

►Train Multiple Models | Train multiple deep learning models at the same time for model architecture search and hyperparameter selection |

►Utilities for Deep Learning | Utilities specific to deep learning workflows |

▼Graph | Graph algorithms and measures associated with graphs |

All Pairs Shortest Path | Finds the shortest paths between every vertex pair in a given graph |

Breadth-First Search | Finds the nodes reachable from a given source vertex using a breadth-first approach |

HITS | Find the HITS scores (authority and hub) of all vertices in a directed graph |

►Measures | A collection of metrics computed on a graph |

PageRank | Find the PageRank of all vertices in a directed graph |

Single Source Shortest Path | Finds the shortest path from a single source vertex to every other vertex in a given graph |

Weakly Connected Components | Find all weakly connected components of a graph |

▼Model Selection | Functions for model selection and model evaluation |

Cross Validation | Estimates the fit of a predictive model given a data set and specifications for the training, prediction, and error estimation functions |

Prediction Metrics | Provides various prediction accuracy metrics |

Train-Test Split | A method for splitting a data set into separate training and testing sets |

▼Sampling | A collection of methods for sampling from a population |

Balanced Sampling | A method to independently sample classes to produce a balanced data set. This is commonly used when classes are imbalanced, to ensure that subclasses are adequately represented in the sample |

Stratified Sampling | A method for independently sampling subpopulations (strata) |

▼Statistics | A collection of probability and statistics modules |

►Descriptive Statistics | Methods to compute descriptive statistics of a dataset |

►Inferential Statistics | Methods to compute inferential statistics of a dataset |

Probability Functions | Provides cumulative distribution, density/mass, and quantile functions for a wide range of probability distributions |

▼Supervised Learning | Methods to perform a variety of supervised learning tasks |

Conditional Random Field | Constructs a Conditional Random Fields (CRF) model for labeling sequential data |

k-Nearest Neighbors | Finds the \(k\) nearest data points to a given data point and outputs the majority vote of their classes for classification, or the average of their target values for regression |

Neural Network | Solves classification and regression problems with several fully connected layers and non-linear activation functions |

►Regression Models | A collection of methods for modeling conditional expectation of a response variable |

Support Vector Machines | Solves classification and regression problems by separating data with a hyperplane or other nonlinear decision boundary |

►Tree Methods | A collection of recursive partitioning (tree) methods |

▼Time Series Analysis | A collection of methods to analyze time series data |

ARIMA | Generates a model with autoregressive, moving average, and integrated components for a time series dataset |

▼Unsupervised Learning | A collection of methods for unsupervised learning tasks |

►Association Rules | Methods used to discover patterns in transactional datasets |

►Clustering | Methods for clustering data |

►Dimensionality Reduction | Methods for reducing the number of variables in a dataset to obtain a set of principal variables |

►Topic Modelling | A collection of methods to uncover abstract topics in a document corpus |

▼Utilities | |

Columns to Vector | Create a new table with all feature columns inserted into a single column as an array |

Database Functions | Provides a collection of user-defined functions for performing common tasks in the database |

►Linear Solvers | Methods that implement solutions for systems of consistent linear equations |

Mini-Batch Preprocessor | Utility that prepares input data for use by models that support mini-batch as an optimization option |

PMML Export | Implements the PMML XML standard to describe and exchange models produced by data mining and machine learning algorithms |

Term Frequency | Provides a collection of functions for performing common tasks related to text analytics |

Vector to Columns | Converts a feature array in a single column of an output table into multiple columns |

▼Early Stage Development | |

Conjugate Gradient | Finds the solution to the linear system \( A \boldsymbol x = \boldsymbol b \), where \( A \) is a symmetric, positive-definite matrix and \( \boldsymbol x \) and \( \boldsymbol b \) are vectors |

DBSCAN | Partitions a set of observations into clusters of arbitrary shape based on the density of nearby neighbors |

Naive Bayes Classification | Constructs a classification model from a dataset where each attribute independently contributes to the probability that a data point belongs to a category |

Random Sampling | Provides utility functions for sampling operations |

XGBoost | Allows you to use SQL to build gradient boosted tree models as implemented in XGBoost [1] |

▼Deprecated Modules | |

Create Indicator Variables | Provides utility functions helpful for data preparation before modeling |

Multinomial Logistic Regression | Also called softmax regression, models the relationship between one or more independent variables and a categorical dependent variable |