title: Overview of LightGBM in SynapseML
description: Learn about LightGBM in SynapseML.
ms.topic: overview
ms.author: scottpolly
author: s-polly
ms.reviewer: ruxu
reviewer: ruixinxu
ms.date: 09/29/2025

Overview of LightGBM in SynapseML

LightGBM is an open-source, distributed, high-performance gradient boosting framework (the technique is also known as GBDT, GBRT, GBM, or MART). It provides high-quality, GPU-enabled decision tree algorithms for ranking, classification, and many other machine learning tasks. LightGBM is part of Microsoft's DMTK project.

Advantages of LightGBM

  • Composability: LightGBM models can be incorporated into existing SparkML pipelines and used for batch, streaming, and serving workloads.
  • Performance: On the Higgs dataset, LightGBM on Spark is 10-30% faster than SparkML and achieves a 15% increase in AUC. Parallel experiments have also verified that, in specific settings, LightGBM can achieve a linear speed-up when training on multiple machines.
  • Functionality: LightGBM offers a wide array of tunable parameters for customizing a decision tree system. LightGBM on Spark also supports additional problem types, such as quantile regression.
  • Cross-platform: LightGBM on Spark is available in Spark, PySpark, and SparklyR.
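
To make the composability and quantile regression points more concrete, here is a minimal sketch that places a LightGBMRegressor inside a SparkML Pipeline. It assumes the `synapseml` and `pyspark` packages are installed and a Spark session is running. The helper name `build_pipeline`, the column names, and the hyperparameter values are all hypothetical and chosen only for illustration.

```python
# Hypothetical settings for a quantile regression model. The values are
# illustrative only, not tuned recommendations.
QUANTILE_PARAMS = {
    "objective": "quantile",   # LightGBM's quantile regression objective
    "alpha": 0.9,              # target quantile (here, the 90th percentile)
    "numIterations": 100,
    "learningRate": 0.1,
    "featuresCol": "features",
    "labelCol": "label",
}


def build_pipeline(input_cols, params=QUANTILE_PARAMS):
    """Return an unfitted SparkML Pipeline: VectorAssembler -> LightGBMRegressor."""
    # Imported lazily so the settings above can be inspected without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from synapse.ml.lightgbm import LightGBMRegressor

    # Combine the raw numeric columns into the single vector column
    # that the regressor reads its features from.
    assembler = VectorAssembler(
        inputCols=list(input_cols), outputCol=params["featuresCol"]
    )
    regressor = LightGBMRegressor(**params)
    return Pipeline(stages=[assembler, regressor])
```

With a training DataFrame `train_df` (hypothetical) that contains the listed feature columns and a `label` column, `build_pipeline(["sqft", "bedrooms"]).fit(train_df)` would return a fitted pipeline model whose `transform` method adds a prediction column, like any other SparkML pipeline.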

LightGBM Usage

  • LightGBMClassifier: Used for building classification models. For example, to predict whether a company will go bankrupt, we could build a binary classification model with LightGBMClassifier.
  • LightGBMRegressor: Used for building regression models. For example, to predict housing prices, we could build a regression model with LightGBMRegressor.
  • LightGBMRanker: Used for building ranking models. For example, to predict the relevance of website search results, we could build a ranking model with LightGBMRanker.
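
The mapping from task to estimator in the list above can be sketched in PySpark as follows. This assumes the three classes are imported from the `synapse.ml.lightgbm` module of an installed `synapseml` package. The `make_estimator` helper is hypothetical and is not part of SynapseML.

```python
# Maps each task in the list above to its SynapseML estimator class name.
ESTIMATOR_NAMES = {
    "classification": "LightGBMClassifier",
    "regression": "LightGBMRegressor",
    "ranking": "LightGBMRanker",
}


def make_estimator(task, **params):
    """Hypothetical helper: build the LightGBM estimator for a task name."""
    if task not in ESTIMATOR_NAMES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ESTIMATOR_NAMES)}"
        )
    # Imported lazily: constructing an estimator needs SynapseML and Spark.
    import synapse.ml.lightgbm as lightgbm

    estimator_class = getattr(lightgbm, ESTIMATOR_NAMES[task])
    return estimator_class(**params)
```

For instance, `make_estimator("classification", objective="binary", labelCol="label")` would create a classifier suited to the bankruptcy example, and `make_estimator("ranking", groupCol="query_id")` would create a ranker that groups rows by query. The column names `label` and `query_id` are assumptions. In each case, calling `fit` on a training DataFrame returns a model that can then `transform` new data.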

Related content