Apache MADlib: Big Data Machine Learning in SQL

  • Open source, commercially friendly Apache license
  • For PostgreSQL, Greenplum Database, and Apache HAWQ (incubating)
  • Powerful machine learning, graph, statistics and analytics for data scientists


MADlib 1.12 Release (GA)

On Aug 29, 2017, MADlib completed its first release as an Apache Software Foundation Top Level Project.

New features include: All Pairs Shortest Path, Weakly Connected Components, Breadth First Search, Multiple Graph Measures, Stratified Sampling, Train-test split, Multilayer Perceptron, and various updates for Apache Top Level Project.

Improvements:

  • Decision tree and random forest - Allow expressions in feature list, Allow array input for features, Filter NULL dependent values in OOB, Add option to treat NULL as category.

  • Summary - Allow user to determine the number of columns per run, Improve computation time by ~35%.

  • Sketch - Promote cardinality estimators to top level module from early stage.

You are invited to download the 1.12 release and review the release notes.

 

MADlib Graduates to Apache Top Level Project

On July 19, 2017, the ASF board established Apache MADlib as a Top Level Project, which was approved by unanimous vote of the directors present. Please see the associated press release from the ASF.

MADlib entered incubation in the fall of 2015 and made five releases as an incubating project. Along the way, the MADlib community has worked hard to ensure that the project is being developed according to the principles of The Apache Way. We will continue to do so in the future as a TLP, to the best of our ability.

Thank you to all who have contributed to the project so far. We look forward to more innovation in machine learning in the future as a TLP!

 

MADlib 1.11 Release (GA)

On May 16, 2017, MADlib completed its fifth release as an Apache Software Foundation incubator project.

New features include: PageRank for graph analytics, grouping support for single source shortest path, array and sparse vector output for pivot, and various updates for Apache Top Level Project readiness.

You are invited to download the 1.11 release and review the release notes.

 

MADlib 1.10 Release (GA)

On Mar. 10, 2017, MADlib completed its fourth release as an Apache Software Foundation incubator project.

New features include: single source shortest path for graph analytics, all-new encoding of categorical variables, and K-nearest neighbors.

You are invited to download the 1.10 release and review the release notes.

 

MADlib User Survey Results

In October 2016, we ran a survey asking MADlib users about a wide range of topics pertaining to this open source project, including desired new features. Thank you to all who responded.

You are welcome to view the survey results and make any comments or suggestions on the user mailing list.

 

MADlib 1.9.1 Release (GA)

On Sept. 19, 2016, MADlib completed its third release as an Apache Software Foundation incubator project.

New features include: 1-class SVM for novelty detection, class weights for SVM, prediction metrics, sessionization, pivoting, overlapping patterns in the path function, and support for PostgreSQL 9.5 and 9.6.

You are invited to download the 1.9.1 release and review the release notes.

 

MADlib 1.9 Release (GA)

On April 6, 2016, MADlib completed its second release as an Apache Software Foundation incubator project: general availability of MADlib 1.9.

New features include: path functions, support vector machines including non-linear kernels, matrix operations (phase 2), covariance matrix, proportion of variance for PCA, stemmer function, and support for Apache HAWQ 2.0 (incubating).

You are invited to download the 1.9 release and review the release notes.

 

MADlib 1.9 alpha Release

On March 11, 2016, MADlib completed its first release as an Apache Software Foundation incubator project.

The purpose of the release was to clear all potential IP issues in the code base and make it legally ready to be adopted by the community. In addition, we want to share the new features that have been developed, in order to give the community a good sense of the upcoming 1.9 release.

You are invited to download the 1.9 alpha release and review the release notes. This is a source code only release.

The MADlib 1.9 release will be coming out shortly, based closely on the 1.9 alpha with a few "last mile" updates and additions.

 

MADlib Moves to ASF

On Sept. 15, 2015, MADlib became an Apache Software Foundation incubator project.

Together with Apache HAWQ (incubating), the MADlib open source project has transitioned its development and governance models to be in accordance with “The Apache Way.”

The Apache Software Foundation is a widely recognized place for like-minded developers to collaborate on software in open and productive ways. The MADlib community views it as the ideal venue to continue developing MADlib technology in innovative directions. Please refer to the ASF incubator proposal for more details.

We invite anyone to come and collaborate on the codebase. Both software contributions and non-code contributions (documentation, events, community management, etc.) are valued.

We enthusiastically look forward to working together with all future contributors to MADlib in order to advance the state of the art in scale-out data science tools.