2.1.0
User Documentation for Apache MADlib
Deep Learning

Detailed Description

There are three main steps in order to run deep learning workloads with MADlib:

  1. Preparation, which includes data preprocessing and model definition. Data preprocessing is required to format training data for use by frameworks like Keras and TensorFlow that support mini-batching as an optimization option. Model definition involves describing model architectures (and optionally custom functions) and loading them into tables.
  2. Model training, either one model at a time or multiple models in parallel. In the latter case, you need to define the configurations for the models you want to train; this can be done manually or automatically using autoML methods.
  3. Evaluation and inference, using the trained models.
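The following is a minimal pure-Python sketch of the mini-batching idea in the preparation step. It assumes the input rows are (feature list, class label) pairs. It is not MADlib's implementation or API: `pack_minibatches`, `one_hot`, `buffer_size` and `normalizing_const` are hypothetical names chosen for this illustration. The sketch groups rows into buffers of a fixed maximum size, divides features by a constant, and one-hot encodes labels, so that each buffer can be handed to a framework as one mini-batch.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: encode a class label as a one-hot vector
# over a fixed, shared class order.
def one_hot(label: str, classes: Sequence[str]) -> List[int]:
    return [1 if c == label else 0 for c in classes]

# Hypothetical helper: pack (features, label) rows into mini-batch buffers.
def pack_minibatches(
    rows: Sequence[Tuple[List[float], str]],
    buffer_size: int,
    normalizing_const: float = 1.0,
) -> List[Tuple[List[List[float]], List[List[int]]]]:
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    # Derive the class order from all rows so every buffer uses the same encoding.
    classes = sorted({label for _, label in rows})
    buffers = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        # Scale each feature value by the normalizing constant.
        x = [[v / normalizing_const for v in feats] for feats, _ in chunk]
        # One-hot encode each label in the chunk.
        y = [one_hot(label, classes) for _, label in chunk]
        buffers.append((x, y))
    return buffers
```

For example, three rows packed with `buffer_size=2` produce two buffers: one with two rows and one with the remaining row. The last buffer can be smaller than `buffer_size`.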

This flowchart shows the workflow in more detail:

[Figure: flowchart of the MADlib deep learning workflow]

Modules

 Model Preparation
 Prepare models and data for deep learning.
 
 Train Single Model
 Fit, evaluate and predict for one model.
 
 Train Multiple Models
 Train multiple deep learning models at the same time for model architecture search and hyperparameter selection.
 
 Utilities for Deep Learning
 Utilities specific to deep learning workflows.
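The Train Multiple Models module trains a set of model configurations at the same time. When those configurations are defined manually, a common approach is a grid that covers every combination of model architecture, compile parameters and fit parameters. The Python sketch below enumerates such a grid. It is illustrative only: `build_model_configs`, the `config_id` key and the parameter strings are hypothetical and are not part of MADlib's API. AutoML methods are the automated alternative; they search the configuration space instead of listing every combination up front.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical helper: build one configuration per combination of
# architecture id, compile parameters and fit parameters (a full grid).
def build_model_configs(
    model_ids: Sequence[int],
    compile_params: Sequence[str],
    fit_params: Sequence[str],
) -> List[Dict[str, object]]:
    configs = []
    # itertools.product varies the last sequence fastest.
    for config_id, (model_id, cp, fp) in enumerate(
        product(model_ids, compile_params, fit_params), start=1
    ):
        configs.append({
            "config_id": config_id,   # hypothetical sequential key
            "model_id": model_id,     # which stored architecture to use
            "compile_params": cp,     # e.g. optimizer settings
            "fit_params": fp,         # e.g. batch size
        })
    return configs
```

The grid grows multiplicatively. For example, 3 architectures, 4 compile-parameter settings and 5 fit-parameter settings already give 3 × 4 × 5 = 60 configurations. This growth is one reason to train the models in parallel or to use an automated search.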