| title | Use ai.classify with PySpark |
|---|---|
| description | Learn how to categorize input text according to custom labels by using the ai.classify function with PySpark. |
| ms.reviewer | vimeland |
| ms.topic | how-to |
| ms.date | 11/13/2025 |
| ms.search.form | AI functions |
The ai.classify function uses generative AI to categorize input text according to custom labels you choose, with a single line of code.
Note
- This article covers using ai.classify with PySpark. To use ai.classify with pandas, see this article.
- See other AI functions in this overview article.
- Learn how to customize the configuration of AI functions.
The ai.classify function is available for Spark DataFrames. You must specify the name of an existing input column as a parameter, along with a list of classification labels.
The function returns a new DataFrame with labels that match each row of input text, stored in an output column.
```python
df.ai.classify(labels=["category1", "category2", "category3"], input_col="text", output_col="classification")
```

| Name | Description |
|---|---|
| labels (Required) | An array of strings that represents the set of classification labels to match to text values in the input column. |
| input_col (Required) | A string that contains the name of an existing column with input text values to classify according to the custom labels. |
| output_col (Optional) | A string that contains the name of a new column where you want to store a classification label for each input text row. If you don't set this parameter, a default name is generated for the output column. |
| error_col (Optional) | A string that contains the name of a new column. The new column stores any OpenAI errors that result from processing each row of input text. If you don't set this parameter, a default name is generated for the error column. If there are no errors for a row of input, the value in this column is null. |
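The parameter rules above can be checked before a call is made. The following is a minimal sketch, not part of the AI functions library: `build_classify_kwargs` is a hypothetical helper, and the uniqueness and name-collision checks are conservative assumptions chosen for illustration rather than documented requirements. It runs in plain Python, so a list of column names stands in for `df.columns`.

```python
from typing import Optional, Sequence


def build_classify_kwargs(
    columns: Sequence[str],
    labels: Sequence[str],
    input_col: str,
    output_col: Optional[str] = None,
    error_col: Optional[str] = None,
) -> dict:
    """Hypothetical helper: check arguments, then return kwargs for df.ai.classify."""
    if not labels:
        raise ValueError("labels must contain at least one label")
    if any(not isinstance(label, str) or not label.strip() for label in labels):
        raise ValueError("every label must be a non-empty string")
    if len(set(labels)) != len(labels):
        # Assumption for this sketch: duplicate labels are treated as a mistake.
        raise ValueError("labels must be unique")
    if input_col not in columns:
        raise ValueError(f"input column {input_col!r} not found in {list(columns)}")

    kwargs = {"labels": list(labels), "input_col": input_col}
    # Optional parameters are omitted when unset so default names can be generated.
    for name, value in (("output_col", output_col), ("error_col", error_col)):
        if value is not None:
            if value in columns:
                raise ValueError(f"{name} {value!r} would collide with an existing column")
            kwargs[name] = value
    return kwargs


kwargs = build_classify_kwargs(
    columns=["descriptions"],  # in a notebook this would be df.columns
    labels=["kitchen", "bedroom", "garage", "other"],
    input_col="descriptions",
    output_col="categories",
)
print(kwargs)
```

In a notebook, the resulting dictionary could be unpacked as `df.ai.classify(**kwargs)`.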
The function returns a Spark DataFrame with a new column that contains the classification label matching each input text row. If a text value can't be classified, the corresponding label is null.
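Because unclassified rows come back with a null label, it can help to tally the results after a call. The sketch below is one possible approach under stated assumptions: `summarize_labels` is a hypothetical helper, the sample rows are made up, and the column names `categories` and `errors` are illustrative. Plain dictionaries stand in for rows collected from the returned DataFrame (for example, with `row.asDict()`), so the code runs without a Spark session.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def summarize_labels(
    rows: Iterable[Mapping[str, Optional[str]]],
    output_col: str,
    error_col: Optional[str] = None,
) -> dict:
    """Hypothetical helper: tally labels, unclassified rows, and rows with errors."""
    counts: Counter = Counter()
    unclassified = 0
    errors = 0
    for row in rows:
        if error_col is not None and row.get(error_col) is not None:
            errors += 1  # a non-null error value means processing failed for this row
            continue
        label = row.get(output_col)
        if label is None:
            unclassified += 1  # null label: the text couldn't be classified
        else:
            counts[label] += 1
    return {"counts": dict(counts), "unclassified": unclassified, "errors": errors}


# Made-up rows standing in for [r.asDict() for r in result_df.collect()].
sample_rows = [
    {"descriptions": "duvet", "categories": "bedroom", "errors": None},
    {"descriptions": "measuring cups", "categories": "kitchen", "errors": None},
    {"descriptions": "???", "categories": None, "errors": None},
    {"descriptions": "timeout", "categories": None, "errors": "illustrative error text"},
]
summary = summarize_labels(sample_rows, output_col="categories", error_col="errors")
print(summary)
```

Collecting rows to the driver is only reasonable for small results; for larger DataFrames, an equivalent aggregation in Spark would be the better fit.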
```python
# This code uses AI. Always review output for mistakes.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])

categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)
```

This example code cell provides the following output:
:::image type="content" source="../../media/ai-functions/classify-example-output.png" alt-text="Screenshot of a data frame with 'descriptions' and 'categories' columns. The 'categories' column lists each description’s category name." lightbox="../../media/ai-functions/classify-example-output.png":::
The ai.classify function supports file-based multimodal input. You can classify images, PDFs, and text files by setting input_col_type="path". For more information about supported file types and setup, see Use multimodal input with AI functions.
```python
# This code uses AI. Always review output for mistakes.

results = custom_df.ai.classify(
    labels=["Master", "PhD", "Bachelor", "Other"],
    input_col="file_path",
    input_col_type="path",
    output_col="highest_degree",
)
display(results)
```
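The multimodal example assumes a DataFrame that already has a `file_path` column. One way to assemble the rows for such a column is sketched here. `collect_path_rows` is a hypothetical helper, the extension list is an assumption made for illustration, and the path format that Fabric expects is covered in the multimodal input article, not here. The demo uses a temporary folder with empty placeholder files so that it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed extensions for illustration; check the multimodal article for the supported list.
ASSUMED_EXTENSIONS = (".png", ".jpg", ".pdf", ".txt")


def collect_path_rows(
    folder: str, extensions: Iterable[str] = ASSUMED_EXTENSIONS
) -> List[Tuple[str]]:
    """Hypothetical helper: return one-element tuples of file paths under folder."""
    allowed = {ext.lower() for ext in extensions}
    return [
        (str(path),)
        for path in sorted(Path(folder).rglob("*"))
        if path.is_file() and path.suffix.lower() in allowed
    ]


# Demo with empty placeholder files; the .docx file is skipped by the filter.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("resume_a.pdf", "resume_b.PNG", "notes.docx"):
        (Path(tmp) / name).write_bytes(b"")
    rows = collect_path_rows(tmp)
    print([Path(p).name for (p,) in rows])
```

In a notebook, `spark.createDataFrame(rows, ["file_path"])` would then produce a DataFrame with a `file_path` column to pass as `input_col` together with `input_col_type="path"`.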
- Detect sentiment with ai.analyze_sentiment.
- Generate vector embeddings with ai.embed.
- Extract entities with ai.extract.
- Fix grammar with ai.fix_grammar.
- Answer custom user prompts with ai.generate_response.
- Calculate similarity with ai.similarity.
- Summarize text with ai.summarize.
- Translate text with ai.translate.
- Learn more about the full set of AI functions.
- Customize the configuration of AI functions.
- Did we miss a feature you need? Suggest it on the Fabric Ideas forum.