Commit dc1d5fd

Merge pull request #256145 from v-akarnase/hdinsight-aks-spark ("Hdinsight aks spark")
2 parents: a09c20b + f3f5f7a

7 files changed: 71 additions & 46 deletions

articles/hdinsight-aks/monitor-with-prometheus-grafana.md

Lines changed: 7 additions & 3 deletions
@@ -3,7 +3,7 @@ title: Monitoring with Azure Managed Prometheus and Grafana
 description: Learn how to use monitor With Azure Managed Prometheus and Grafana
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
 # Monitoring with Azure Managed Prometheus and Grafana
@@ -23,8 +23,8 @@ This article covers the details of enabling the monitoring feature in HDInsight
 * An Azure Managed Prometheus workspace. You can think of this workspace as a unique Azure Monitor logs environment with its own data repository, data sources, and solutions. For the instructions, see [Create a Azure Managed Prometheus workspace](../azure-monitor/essentials/azure-monitor-workspace-manage.md).
 * Azure Managed Grafana workspace. For the instructions, see [Create a Azure Managed Grafana workspace](../managed-grafana/quickstart-managed-grafana-portal.md).
 * An [HDInsight on AKS cluster](./quickstart-create-cluster.md). Currently, you can use Azure Managed Prometheus with the following HDInsight on AKS cluster types:
-  * Apache Spark
-  * Apache Flink
+  * Apache Spark
+  * Apache Flink®
   * Trino
 
 For the instructions on how to create an HDInsight on AKS cluster, see [Get started with Azure HDInsight on AKS](./overview.md).
@@ -164,3 +164,7 @@ You can use the Grafana dashboard to view the service and system. Trino cluster
 1. View the metric as per selection.
 
 :::image type="content" source="./media/monitor-with-prometheus-grafana/view-output.png" alt-text="Screenshot showing how to view the output." border="true" lightbox="./media/monitor-with-prometheus-grafana/view-output.png":::
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](./trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).

articles/hdinsight-aks/spark/azure-hdinsight-spark-on-aks-delta-lake.md

Lines changed: 12 additions & 8 deletions
@@ -1,22 +1,22 @@
 ---
-title: How to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
-description: Learn how to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
+title: How to use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster.
+description: Learn how to use Delta Lake scenario in Azure HDInsight on AKS with Apache Spark cluster.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# Use Delta Lake scenario in Azure HDInsight on AKS Spark cluster (Preview)
+# Use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster (Preview)
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-[Azure HDInsight on AKS](../overview.md) is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
+[Azure HDInsight on AKS](../overview.md) is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster.
 
 ## Prerequisite
 
-1. Create an [Azure HDInsight on AKS Spark cluster](./create-spark-cluster.md)
+1. Create an [Apache Spark™ cluster in Azure HDInsight on AKS](./create-spark-cluster.md)
 
-:::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png" alt-text="Screenshot showing spark cluster creation." lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png":::
+   :::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png" alt-text="Screenshot showing spark cluster creation." lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png":::
 
 1. Run Delta Lake scenario in Jupyter Notebook. Create a Jupyter notebook and select "Spark" while creating a notebook, since the following example is in Scala.
@@ -33,7 +33,7 @@ ms.date: 08/29/2023
 
 ### Provide require configurations for the delta lake
 
-Delta Lake Spark Compatibility matrix - [Delta Lake](https://docs.delta.io/latest/releases.html), change Delta Lake version based on Spark Version.
+Delta Lake with Apache Spark Compatibility matrix - [Delta Lake](https://docs.delta.io/latest/releases.html), change Delta Lake version based on Apache Spark Version.
 ```
 %%configure -f
 { "conf": {"spark.jars.packages": "io.delta:delta-core_2.12:1.0.1,net.andreinc:mockneat:0.4.8",
@@ -216,3 +216,7 @@ dfTxLog.select(col("add")("path").alias("file_path")).withColumn("version",subst
 
 :::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/data-after-each-data-load.png" alt-text="Screenshot KPI data after each data load." border="true" lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/data-after-each-data-load.png":::
 
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
+

articles/hdinsight-aks/spark/configuration-management.md

Lines changed: 10 additions & 6 deletions
@@ -1,17 +1,17 @@
 ---
-title: Configuration management in HDInsight on AKS Spark
-description: Learn how to perform Configuration management in HDInsight on AKS Spark
+title: Configuration management in HDInsight on AKS with Apache Spark
+description: Learn how to perform Configuration management in HDInsight on AKS with Apache Spark™ cluster
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/19/2023
 ---
-# Configuration management in HDInsight on AKS Spark
+# Configuration management in HDInsight on AKS with Apache Spark™ cluster
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-Azure HDInsight on AKS is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use configuration management in Azure HDInsight on AKS Spark cluster.
+Azure HDInsight on AKS is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use configuration management in Azure HDInsight on AKS with Apache Spark cluster.
 
-Configuration management is used to add specific configurations into the spark cluster.
+Configuration management is used to add specific configurations into the Apache Spark cluster.
 
 When user updates a configuration in the management portal the corresponding service is restarted in rolling manner.
@@ -62,5 +62,9 @@ When user updates a configuration in the management portal the corresponding ser
 > Selecting **Save** will restart the clusters.
 > It is advisable not to have any active jobs while making configuration changes, since restarting the cluster may impact the active jobs.
 
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
+
 ## Next steps
 * [Library management in Spark](./library-management.md)

articles/hdinsight-aks/spark/connect-to-one-lake-storage.md

Lines changed: 7 additions & 3 deletions
@@ -3,7 +3,7 @@ title: Connect to OneLake Storage
 description: Learn how to connect to OneLake storage
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
 # Connect to OneLake Storage
@@ -12,7 +12,7 @@ ms.date: 08/29/2023
 
 This tutorial shows how to connect to OneLake with a Jupyter notebook from an Azure HDInsight on AKS cluster.
 
-1. Create an HDInsight on AKS Spark cluster. Follow these instructions: Set up clusters in HDInsight on AKS.
+1. Create an HDInsight on AKS cluster with Apache Spark™. Follow these instructions: Set up clusters in HDInsight on AKS.
 1. While providing cluster information, remember your Cluster login Username and Password, as you need them later to access the cluster.
 1. Create a user assigned managed identity (UAMI): Create for Azure HDInsight on AKS - UAMI and choose it as the identity in the **Storage** screen.

@@ -26,7 +26,7 @@ This tutorial shows how to connect to OneLake with a Jupyter notebook from an Az
 1. In the Azure portal, look for your cluster and select the notebook.
 
    :::image type="content" source="./media/connect-to-one-lake-storage/overview-page.png" alt-text="Screenshot showing cluster overview page." lightbox="./media/connect-to-one-lake-storage/overview-page.png":::
 
-1. Create a new Spark Notebook.
+1. Create a new Notebook and select type as **pyspark**.
 1. Copy the workspace and Lakehouse names into your notebook and build your OneLake URL for your Lakehouse. Now you can read any file from this file path.
 ```
 fp = 'abfss://' + 'Workspace Name' + '@onelake.dfs.fabric.microsoft.com/' + 'Lakehouse Name' + '/Files/'
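The `fp` concatenation shown in the diff builds an `abfss://` URL from the workspace and Lakehouse names. A small helper makes the pattern explicit; a minimal sketch, where `MyWorkspace` and `MyLakehouse` are placeholder names, not values from this commit:

```python
def onelake_path(workspace: str, lakehouse: str, subpath: str = "Files/") -> str:
    """Build the abfss:// URL for a Fabric Lakehouse, as in the fp line above."""
    return f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}/{subpath}"

# Placeholder names; substitute your own workspace and Lakehouse.
fp = onelake_path("MyWorkspace", "MyLakehouse")
print(fp)  # abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse/Files/
```

Any file under the Lakehouse is then addressable as `fp + "filename"`, which is what the later read and write steps rely on.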
@@ -38,3 +38,7 @@ This tutorial shows how to connect to OneLake with a Jupyter notebook from an Az
 `writecsvdf = df.write.format("csv").save(fp + "out.csv")`
 
 1. Test that your data was successfully written by checking in your Lakehouse or by reading your newly loaded file.
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
Lines changed: 15 additions & 11 deletions
@@ -1,39 +1,43 @@
 ---
-title: What is Apache Spark in HDInsight on AKS? (Preview)
-description: An introduction to Apache Spark in HDInsight on AKS
+title: What is Apache Spark in HDInsight on AKS? (Preview)
+description: An introduction to Apache Spark in HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: conceptual
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# What is Apache Spark in HDInsight on AKS? (Preview)
+# What is Apache Spark in HDInsight on AKS? (Preview)
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
+Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
 
-Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications, such as Hadoop, which shares data through Hadoop distributed file system (HDFS). Spark allows integration with the Scala and Python programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations.
+Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications, such as Hadoop, which shares data through Hadoop distributed file system (HDFS). Apache Spark allows integration with the Scala and Python programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations.
 
 :::image type="content" source="./media/spark-overview/spark-overview.png" alt-text="Diagram showing Spark overview in HDInsight on AKS.":::
 
 
-## HDInsight Spark in AKS
+## Apache Spark cluster with HDInsight on AKS
 Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises.
 
-Apache Spark in Azure HDInsight is the managed spark service in Microsoft Azure. With Apache Spark on AKS in Azure HDInsight, you can store and process your data all within Azure. Spark clusters in HDInsight are compatible with or [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md), allows you to apply Spark processing on your existing data stores.
+Apache Spark in Azure HDInsight on AKS is the managed spark service in Microsoft Azure. With Apache Spark in Azure HDInsight on AKS, you can store and process your data all within Azure. Spark clusters in HDInsight are compatible with or [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md), allows you to apply Spark processing on your existing data stores.
 
 The Apache Spark framework for HDInsight on AKS enables fast data analytics and cluster computing using in-memory processing. Jupyter Notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
 
-Spark on AKS in HDInsight composed of multiple components as pods.
+Apache Spark on AKS in HDInsight composed of multiple components as pods.
 
 ## Cluster Controllers
 
 Cluster controllers are responsible for installing and managing respective service. Various controllers are installed and managed in a Spark cluster.
 
-## Spark service components
+## Apache Spark service components
 
 **Zookeeper service:** A three node Zookeeper cluster, serves as distributed coordinator or High Availability storage for other services.
 
 **Yarn service:** Hadoop Yarn cluster, Spark jobs would be scheduled in the cluster as Yarn applications.
 
-**Client Interfaces:** HDInsight on AKS Spark provides various client interfaces. Livy Server, Jupyter Notebook, Spark History Server, provides Spark services to HDInsight on AKS users.
+**Client Interfaces:** Apache Spark clusters in HDInsight on AKS, provides various client interfaces. Livy Server, Jupyter Notebook, Spark History Server, provides Spark services to HDInsight on AKS users.
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
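The overview above names Livy Server as one of the client interfaces. For a concrete sense of what programmatic submission through Livy looks like, here is a sketch that only builds the request payload (the field names follow the Apache Livy REST API's `POST /batches` endpoint; the jar path, class name, and arguments are hypothetical, and no request is actually sent):

```python
import json

def livy_batch_payload(app_file: str, main_class: str, args=None) -> str:
    """Build the JSON body for a Livy POST /batches submission."""
    payload = {
        "file": app_file,         # application jar or .py file (placeholder path)
        "className": main_class,  # entry-point class for a Scala/Java job
        "args": args or [],       # command-line arguments passed to the job
    }
    return json.dumps(payload)

# Hypothetical job: an HTTP client would POST this body to the cluster's
# Livy endpoint with the cluster login credentials.
body = livy_batch_payload("abfs://jobs/app.jar", "com.example.Main", ["--input", "data/"])
print(body)
```

Jupyter and Zeppelin ride on the same Livy sessions under the hood, which is why all three interfaces expose the same Spark runtime.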

articles/hdinsight-aks/spark/submit-manage-jobs.md

Lines changed: 11 additions & 9 deletions
@@ -1,12 +1,12 @@
 ---
-title: How to submit and manage jobs on a Spark cluster in Azure HDInsight on AKS
-description: Learn how to submit and manage jobs on a Spark cluster in HDInsight on AKS
+title: How to submit and manage jobs on an Apache Spark cluster in Azure HDInsight on AKS
+description: Learn how to submit and manage jobs on an Apache Spark cluster in HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# Submit and manage jobs on a Spark cluster in HDInsight on AKS
+# Submit and manage jobs on an Apache Spark cluster in HDInsight on AKS
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 

@@ -19,13 +19,13 @@ Once the cluster is created, user can use various interfaces to submit and manag
 ## Using Jupyter
 
 ### Prerequisites
-An Apache Spark cluster on HDInsight on AKS. For more information, see [Create an Apache Spark cluster](./create-spark-cluster.md).
+An Apache Spark cluster on HDInsight on AKS. For more information, see [Create an Apache Spark cluster](./create-spark-cluster.md).
 
 Jupyter Notebook is an interactive notebook environment that supports various programming languages.
 
 ### Create a Jupyter Notebook
 
-1. Navigate to the Spark cluster page and open the **Overview** tab. Click on Jupyter, it asks you to authenticate and open the Jupyter web page.
+1. Navigate to the Apache Spark cluster page and open the **Overview** tab. Click on Jupyter, it asks you to authenticate and open the Jupyter web page.
 
    :::image type="content" source="./media/submit-manage-jobs/select-jupyter-notebook.png" alt-text="Screenshot of how to select Jupyter notebook." border="true" lightbox="./media/submit-manage-jobs/select-jupyter-notebook.png":::
 
@@ -106,13 +106,13 @@ Jupyter Notebook is an interactive notebook environment that supports various pr
 
 ## Using Apache Zeppelin notebooks
 
-HDInsight on AKS Spark clusters include [Apache Zeppelin notebooks](https://zeppelin.apache.org/). Use the notebooks to run Apache Spark jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight on AKS cluster.
+Apache Spark clusters in HDInsight on AKS include [Apache Zeppelin notebooks](https://zeppelin.apache.org/). Use the notebooks to run Apache Spark jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight on AKS cluster.
 ### Prerequisites
 An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an Apache Spark cluster](./create-spark-cluster.md).
 
 #### Launch an Apache Zeppelin notebook
 
-1. Navigate to the Spark cluster Overview page and select Zeppelin notebook from Cluster dashboards. It prompts to authenticate and open the Zeppelin page.
+1. Navigate to the Apache Spark cluster Overview page and select Zeppelin notebook from Cluster dashboards. It prompts to authenticate and open the Zeppelin page.
 
   :::image type="content" source="./media/submit-manage-jobs/select-zeppelin.png" alt-text="Screenshot of how to select Zeppelin." lightbox="./media/submit-manage-jobs/select-zeppelin.png":::
 
@@ -227,7 +227,7 @@ An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an
 
 :::image type="content" source="./media/submit-manage-jobs/run-spark-submit-job.png" alt-text="Screenshot showing how to run Spark submit job." lightbox="./media/submit-manage-jobs/view-vim-file.png":::
 
-## Monitor queries on a Spark cluster in HDInsight on AKS
+## Monitor queries on an Apache Spark cluster in HDInsight on AKS
 
 #### Spark History UI
 
@@ -264,4 +264,6 @@ An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an
 
 :::image type="content" source="./media/submit-manage-jobs/view-logs.png" alt-text="View Logs." lightbox="./media/submit-manage-jobs/view-logs.png":::
 
+## Reference
 
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).

articles/hdinsight-aks/spark/use-hive-metastore.md

Lines changed: 9 additions & 6 deletions
@@ -1,12 +1,12 @@
 ---
-title: How to use Hive metastore in Spark
-description: Learn how to use Hive metastore in Spark
+title: How to use Hive metastore in Apache Spark
+description: Learn how to use Hive metastore in Apache Spark
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# How to use Hive metastore in Spark
+# How to use Hive metastore with Apache Spark™ cluster
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
@@ -16,7 +16,7 @@ Azure HDInsight on AKS supports custom meta stores, which are recommended for pr
 
 1. Create Azure SQL database
 1. Create a key vault for storing the credentials
-1. Configure Metastore while you create a HDInsight Spark cluster
+1. Configure Metastore while you create a HDInsight on AKS cluster with Apache Spark™
 1. Operate on External Metastore (Shows databases and do a select limit 1).
 
 While you create the cluster, HDInsight service needs to connect to the external metastore and verify your credentials.
@@ -68,7 +68,7 @@ While you create the cluster, HDInsight service needs to connect to the external
 
 :::image type="content" source="./media/use-hive-metastore/basic-tab.png" alt-text="Screenshot showing the basic tab." lightbox="./media/use-hive-metastore/basic-tab.png":::
 
-1. The rest of the details are to be filled in as per the cluster creation rules for [HDInsight on AKS Spark cluster](./create-spark-cluster.md).
+1. The rest of the details are to be filled in as per the cluster creation rules for [Apache Spark cluster in HDInsight on AKS](./create-spark-cluster.md).
 
 1. Click on **Review and Create.**
 
@@ -97,5 +97,8 @@ While you create the cluster, HDInsight service needs to connect to the external
 `>> spark.sql("select * from sampleTable").show()`
 
 :::image type="content" source="./media/use-hive-metastore/read-table.png" alt-text="Screenshot showing how to read table." lightbox="./media/use-hive-metastore/read-table.png":::
+
+## Reference
 
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
 
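The use-hive-metastore article diffed above wires an external metastore through the portal and Key Vault. For context on what that wiring amounts to at the Spark level, here is a sketch of the configuration such a setup typically carries (these are the standard Hive metastore JDBC keys passed via the `spark.hadoop.*` prefix; the server, database, and user names are placeholders, and on HDInsight on AKS the service populates them for you rather than expecting hand-written conf):

```python
# Placeholder values; in HDInsight on AKS these come from the portal's
# metastore settings and the Key Vault-stored credential, not hard-coding.
sql_server = "contoso-sql.database.windows.net"
database = "hivemetastore"
user = "hiveadmin"

hive_metastore_conf = {
    # Standard Hive metastore JDBC settings, surfaced to Spark as hadoop conf.
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        f"jdbc:sqlserver://{sql_server};database={database}",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
}

for key, value in hive_metastore_conf.items():
    print(f"{key}={value}")
```

With the metastore resolvable, the `spark.sql("show databases")` and `select * from sampleTable` calls in the article read table definitions from the shared external store instead of a cluster-local one.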