Commit dc1d5fd

Merge pull request #256145 from v-akarnase/hdinsight-aks-spark ("Hdinsight aks spark")
2 parents: a09c20b + f3f5f7a

7 files changed: 71 additions & 46 deletions

articles/hdinsight-aks/monitor-with-prometheus-grafana.md

Lines changed: 7 additions & 3 deletions
@@ -3,7 +3,7 @@ title: Monitoring with Azure Managed Prometheus and Grafana
 description: Learn how to use monitor With Azure Managed Prometheus and Grafana
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
 # Monitoring with Azure Managed Prometheus and Grafana
@@ -23,8 +23,8 @@ This article covers the details of enabling the monitoring feature in HDInsight
 * An Azure Managed Prometheus workspace. You can think of this workspace as a unique Azure Monitor logs environment with its own data repository, data sources, and solutions. For the instructions, see [Create a Azure Managed Prometheus workspace](../azure-monitor/essentials/azure-monitor-workspace-manage.md).
 * Azure Managed Grafana workspace. For the instructions, see [Create a Azure Managed Grafana workspace](../managed-grafana/quickstart-managed-grafana-portal.md).
 * An [HDInsight on AKS cluster](./quickstart-create-cluster.md). Currently, you can use Azure Managed Prometheus with the following HDInsight on AKS cluster types:
-  * Apache Spark
-  * Apache Flink
+  * Apache Spark
+  * Apache Flink®
   * Trino
 
 For the instructions on how to create an HDInsight on AKS cluster, see [Get started with Azure HDInsight on AKS](./overview.md).
@@ -164,3 +164,7 @@ You can use the Grafana dashboard to view the service and system. Trino cluster
 1. View the metric as per selection.
 
 :::image type="content" source="./media/monitor-with-prometheus-grafana/view-output.png" alt-text="Screenshot showing how to view the output." border="true" lightbox="./media/monitor-with-prometheus-grafana/view-output.png":::
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](./trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).

articles/hdinsight-aks/spark/azure-hdinsight-spark-on-aks-delta-lake.md

Lines changed: 12 additions & 8 deletions
@@ -1,22 +1,22 @@
 ---
-title: How to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
-description: Learn how to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
+title: How to use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster.
+description: Learn how to use Delta Lake scenario in Azure HDInsight on AKS with Apache Spark cluster.
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# Use Delta Lake scenario in Azure HDInsight on AKS Spark cluster (Preview)
+# Use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster (Preview)
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-[Azure HDInsight on AKS](../overview.md) is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use Delta Lake scenario in Azure HDInsight on AKS Spark cluster.
+[Azure HDInsight on AKS](../overview.md) is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use Delta Lake in Azure HDInsight on AKS with Apache Spark cluster.
 
 ## Prerequisite
 
-1. Create an [Azure HDInsight on AKS Spark cluster](./create-spark-cluster.md)
+1. Create an [Apache Spark™ cluster in Azure HDInsight on AKS](./create-spark-cluster.md)
 
-:::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png" alt-text="Screenshot showing spark cluster creation." lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png":::
+   :::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png" alt-text="Screenshot showing spark cluster creation." lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/create-spark-cluster.png":::
 
 1. Run Delta Lake scenario in Jupyter Notebook. Create a Jupyter notebook and select "Spark" while creating a notebook, since the following example is in Scala.
@@ -33,7 +33,7 @@ ms.date: 08/29/2023
 
 ### Provide require configurations for the delta lake
 
-Delta Lake Spark Compatibility matrix - [Delta Lake](https://docs.delta.io/latest/releases.html), change Delta Lake version based on Spark Version.
+Delta Lake with Apache Spark Compatibility matrix - [Delta Lake](https://docs.delta.io/latest/releases.html), change Delta Lake version based on Apache Spark Version.
 ```
 %%configure -f
 { "conf": {"spark.jars.packages": "io.delta:delta-core_2.12:1.0.1,net.andreinc:mockneat:0.4.8",
@@ -216,3 +216,7 @@ dfTxLog.select(col("add")("path").alias("file_path")).withColumn("version",subst
 
 :::image type="content" source="./media/azure-hdinsight-spark-on-aks-delta-lake/data-after-each-data-load.png" alt-text="Screenshot KPI data after each data load." border="true" lightbox="./media/azure-hdinsight-spark-on-aks-delta-lake/data-after-each-data-load.png":::
 
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
+

articles/hdinsight-aks/spark/configuration-management.md

Lines changed: 10 additions & 6 deletions
@@ -1,17 +1,17 @@
 ---
-title: Configuration management in HDInsight on AKS Spark
-description: Learn how to perform Configuration management in HDInsight on AKS Spark
+title: Configuration management in HDInsight on AKS with Apache Spark
+description: Learn how to perform Configuration management in HDInsight on AKS with Apache Spark™ cluster
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/19/2023
 ---
-# Configuration management in HDInsight on AKS Spark
+# Configuration management in HDInsight on AKS with Apache Spark™ cluster
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-Azure HDInsight on AKS is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use configuration management in Azure HDInsight on AKS Spark cluster.
+Azure HDInsight on AKS is a managed cloud-based service for big data analytics that helps organizations process large amounts data. This tutorial shows how to use configuration management in Azure HDInsight on AKS with Apache Spark cluster.
 
-Configuration management is used to add specific configurations into the spark cluster.
+Configuration management is used to add specific configurations into the Apache Spark cluster.
 
 When user updates a configuration in the management portal the corresponding service is restarted in rolling manner.
@@ -62,5 +62,9 @@ When user updates a configuration in the management portal the corresponding ser
 > Selecting **Save** will restart the clusters.
 > It is advisable not to have any active jobs while making configuration changes, since restarting the cluster may impact the active jobs.
 
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
+
 ## Next steps
 * [Library management in Spark](./library-management.md)

articles/hdinsight-aks/spark/connect-to-one-lake-storage.md

Lines changed: 7 additions & 3 deletions
@@ -3,7 +3,7 @@ title: Connect to OneLake Storage
 description: Learn how to connect to OneLake storage
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
 # Connect to OneLake Storage
@@ -12,7 +12,7 @@ ms.date: 08/29/2023
 
 This tutorial shows how to connect to OneLake with a Jupyter notebook from an Azure HDInsight on AKS cluster.
 
-1. Create an HDInsight on AKS Spark cluster. Follow these instructions: Set up clusters in HDInsight on AKS.
+1. Create an HDInsight on AKS cluster with Apache Spark™. Follow these instructions: Set up clusters in HDInsight on AKS.
 1. While providing cluster information, remember your Cluster login Username and Password, as you need them later to access the cluster.
 1. Create a user assigned managed identity (UAMI): Create for Azure HDInsight on AKS - UAMI and choose it as the identity in the **Storage** screen.

@@ -26,7 +26,7 @@ This tutorial shows how to connect to OneLake with a Jupyter notebook from an Az
 1. In the Azure portal, look for your cluster and select the notebook.
 
    :::image type="content" source="./media/connect-to-one-lake-storage/overview-page.png" alt-text="Screenshot showing cluster overview page." lightbox="./media/connect-to-one-lake-storage/overview-page.png":::
 
-1. Create a new Spark Notebook.
+1. Create a new Notebook and select type as **pyspark**.
 1. Copy the workspace and Lakehouse names into your notebook and build your OneLake URL for your Lakehouse. Now you can read any file from this file path.
 ```
 fp = 'abfss://' + 'Workspace Name' + '@onelake.dfs.fabric.microsoft.com/' + 'Lakehouse Name' + '/Files/'
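The `fp` concatenation shown in the diff builds an `abfss://` URL from the workspace and Lakehouse names. A small helper makes the pattern explicit; a minimal sketch, where `MyWorkspace` and `MyLakehouse` are placeholder names, not values from this commit:

```python
def onelake_path(workspace: str, lakehouse: str, subpath: str = "Files/") -> str:
    """Build the abfss:// URL for a Fabric Lakehouse, as in the fp line above."""
    return f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}/{subpath}"

# Placeholder names; substitute your own workspace and Lakehouse.
fp = onelake_path("MyWorkspace", "MyLakehouse")
print(fp)  # abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse/Files/
```

Any file under the Lakehouse is then addressable as `fp + "filename"`, which is what the later read and write steps rely on.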
@@ -38,3 +38,7 @@ This tutorial shows how to connect to OneLake with a Jupyter notebook from an Az
 `writecsvdf = df.write.format("csv").save(fp + "out.csv")`
 
 1. Test that your data was successfully written by checking in your Lakehouse or by reading your newly loaded file.
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
Lines changed: 15 additions & 11 deletions
@@ -1,39 +1,43 @@
 ---
-title: What is Apache Spark in HDInsight on AKS? (Preview)
-description: An introduction to Apache Spark in HDInsight on AKS
+title: What is Apache Spark in HDInsight on AKS? (Preview)
+description: An introduction to Apache Spark in HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: conceptual
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# What is Apache Spark in HDInsight on AKS? (Preview)
+# What is Apache Spark in HDInsight on AKS? (Preview)
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
-Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
+Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications.
 
-Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications, such as Hadoop, which shares data through Hadoop distributed file system (HDFS). Spark allows integration with the Scala and Python programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations.
+Apache Spark provides primitives for in-memory cluster computing. A Spark job can load and cache data into memory and query it repeatedly. In-memory computing is faster than disk-based applications, such as Hadoop, which shares data through Hadoop distributed file system (HDFS). Apache Spark allows integration with the Scala and Python programming languages to let you manipulate distributed data sets like local collections. There's no need to structure everything as map and reduce operations.
 
 :::image type="content" source="./media/spark-overview/spark-overview.png" alt-text="Diagram showing Spark overview in HDInsight on AKS.":::
 
 
-## HDInsight Spark in AKS
+## Apache Spark cluster with HDInsight on AKS
 Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises.
 
-Apache Spark in Azure HDInsight is the managed spark service in Microsoft Azure. With Apache Spark on AKS in Azure HDInsight, you can store and process your data all within Azure. Spark clusters in HDInsight are compatible with or [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md), allows you to apply Spark processing on your existing data stores.
+Apache Spark in Azure HDInsight on AKS is the managed spark service in Microsoft Azure. With Apache Spark in Azure HDInsight on AKS, you can store and process your data all within Azure. Spark clusters in HDInsight are compatible with or [Azure Data Lake Storage Gen2](../../storage/blobs/data-lake-storage-introduction.md), allows you to apply Spark processing on your existing data stores.
 
 The Apache Spark framework for HDInsight on AKS enables fast data analytics and cluster computing using in-memory processing. Jupyter Notebook lets you interact with your data, combine code with markdown text, and do simple visualizations.
 
-Spark on AKS in HDInsight composed of multiple components as pods.
+Apache Spark on AKS in HDInsight composed of multiple components as pods.
 
 ## Cluster Controllers
 
 Cluster controllers are responsible for installing and managing respective service. Various controllers are installed and managed in a Spark cluster.
 
-## Spark service components
+## Apache Spark service components
 
 **Zookeeper service:** A three node Zookeeper cluster, serves as distributed coordinator or High Availability storage for other services.
 
 **Yarn service:** Hadoop Yarn cluster, Spark jobs would be scheduled in the cluster as Yarn applications.
 
-**Client Interfaces:** HDInsight on AKS Spark provides various client interfaces. Livy Server, Jupyter Notebook, Spark History Server, provides Spark services to HDInsight on AKS users.
+**Client Interfaces:** Apache Spark clusters in HDInsight on AKS, provides various client interfaces. Livy Server, Jupyter Notebook, Spark History Server, provides Spark services to HDInsight on AKS users.
+
+## Reference
+
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
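The overview above names Livy Server as one of the client interfaces. For a concrete sense of what programmatic submission through Livy looks like, here is a sketch that only builds the request payload (the field names follow the Apache Livy REST API's `POST /batches` endpoint; the jar path, class name, and arguments are hypothetical, and no request is actually sent):

```python
import json

def livy_batch_payload(app_file: str, main_class: str, args=None) -> str:
    """Build the JSON body for a Livy POST /batches submission."""
    payload = {
        "file": app_file,         # application jar or .py file (placeholder path)
        "className": main_class,  # entry-point class for a Scala/Java job
        "args": args or [],       # command-line arguments passed to the job
    }
    return json.dumps(payload)

# Hypothetical job: an HTTP client would POST this body to the cluster's
# Livy endpoint with the cluster login credentials.
body = livy_batch_payload("abfs://jobs/app.jar", "com.example.Main", ["--input", "data/"])
print(body)
```

Jupyter and Zeppelin ride on the same Livy sessions under the hood, which is why all three interfaces expose the same Spark runtime.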

articles/hdinsight-aks/spark/submit-manage-jobs.md

Lines changed: 11 additions & 9 deletions
@@ -1,12 +1,12 @@
 ---
-title: How to submit and manage jobs on a Spark cluster in Azure HDInsight on AKS
-description: Learn how to submit and manage jobs on a Spark cluster in HDInsight on AKS
+title: How to submit and manage jobs on an Apache Spark cluster in Azure HDInsight on AKS
+description: Learn how to submit and manage jobs on an Apache Spark cluster in HDInsight on AKS
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# Submit and manage jobs on a Spark cluster in HDInsight on AKS
+# Submit and manage jobs on an Apache Spark cluster in HDInsight on AKS
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 

@@ -19,13 +19,13 @@ Once the cluster is created, user can use various interfaces to submit and manag
 ## Using Jupyter
 
 ### Prerequisites
-An Apache Spark cluster on HDInsight on AKS. For more information, see [Create an Apache Spark cluster](./create-spark-cluster.md).
+An Apache Spark cluster on HDInsight on AKS. For more information, see [Create an Apache Spark cluster](./create-spark-cluster.md).
 
 Jupyter Notebook is an interactive notebook environment that supports various programming languages.
 
 ### Create a Jupyter Notebook
 
-1. Navigate to the Spark cluster page and open the **Overview** tab. Click on Jupyter, it asks you to authenticate and open the Jupyter web page.
+1. Navigate to the Apache Spark cluster page and open the **Overview** tab. Click on Jupyter, it asks you to authenticate and open the Jupyter web page.
 
    :::image type="content" source="./media/submit-manage-jobs/select-jupyter-notebook.png" alt-text="Screenshot of how to select Jupyter notebook." border="true" lightbox="./media/submit-manage-jobs/select-jupyter-notebook.png":::
 
@@ -106,13 +106,13 @@ Jupyter Notebook is an interactive notebook environment that supports various pr
 
 ## Using Apache Zeppelin notebooks
 
-HDInsight on AKS Spark clusters include [Apache Zeppelin notebooks](https://zeppelin.apache.org/). Use the notebooks to run Apache Spark jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight on AKS cluster.
+Apache Spark clusters in HDInsight on AKS include [Apache Zeppelin notebooks](https://zeppelin.apache.org/). Use the notebooks to run Apache Spark jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight on AKS cluster.
 ### Prerequisites
 An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an Apache Spark cluster](./create-spark-cluster.md).
 
 #### Launch an Apache Zeppelin notebook
 
-1. Navigate to the Spark cluster Overview page and select Zeppelin notebook from Cluster dashboards. It prompts to authenticate and open the Zeppelin page.
+1. Navigate to the Apache Spark cluster Overview page and select Zeppelin notebook from Cluster dashboards. It prompts to authenticate and open the Zeppelin page.
 
   :::image type="content" source="./media/submit-manage-jobs/select-zeppelin.png" alt-text="Screenshot of how to select Zeppelin." lightbox="./media/submit-manage-jobs/select-zeppelin.png":::
 
@@ -227,7 +227,7 @@ An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an
 
 :::image type="content" source="./media/submit-manage-jobs/run-spark-submit-job.png" alt-text="Screenshot showing how to run Spark submit job." lightbox="./media/submit-manage-jobs/view-vim-file.png":::
 
-## Monitor queries on a Spark cluster in HDInsight on AKS
+## Monitor queries on an Apache Spark cluster in HDInsight on AKS
 
 #### Spark History UI
 
@@ -264,4 +264,6 @@ An Apache Spark cluster on HDInsight on AKS. For instructions, see [Create an
 
 :::image type="content" source="./media/submit-manage-jobs/view-logs.png" alt-text="View Logs." lightbox="./media/submit-manage-jobs/view-logs.png":::
 
+## Reference
 
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).

articles/hdinsight-aks/spark/use-hive-metastore.md

Lines changed: 9 additions & 6 deletions
@@ -1,12 +1,12 @@
 ---
-title: How to use Hive metastore in Spark
-description: Learn how to use Hive metastore in Spark
+title: How to use Hive metastore in Apache Spark
+description: Learn how to use Hive metastore in Apache Spark
 ms.service: hdinsight-aks
 ms.topic: how-to
-ms.date: 08/29/2023
+ms.date: 10/27/2023
 ---
 
-# How to use Hive metastore in Spark
+# How to use Hive metastore with Apache Spark™ cluster
 
 [!INCLUDE [feature-in-preview](../includes/feature-in-preview.md)]
 
@@ -16,7 +16,7 @@ Azure HDInsight on AKS supports custom meta stores, which are recommended for pr
 
 1. Create Azure SQL database
 1. Create a key vault for storing the credentials
-1. Configure Metastore while you create a HDInsight Spark cluster
+1. Configure Metastore while you create a HDInsight on AKS cluster with Apache Spark™
 1. Operate on External Metastore (Shows databases and do a select limit 1).
 
 While you create the cluster, HDInsight service needs to connect to the external metastore and verify your credentials.
@@ -68,7 +68,7 @@ While you create the cluster, HDInsight service needs to connect to the external
 
 :::image type="content" source="./media/use-hive-metastore/basic-tab.png" alt-text="Screenshot showing the basic tab." lightbox="./media/use-hive-metastore/basic-tab.png":::
 
-1. The rest of the details are to be filled in as per the cluster creation rules for [HDInsight on AKS Spark cluster](./create-spark-cluster.md).
+1. The rest of the details are to be filled in as per the cluster creation rules for [Apache Spark cluster in HDInsight on AKS](./create-spark-cluster.md).
 
 1. Click on **Review and Create.**
 
@@ -97,5 +97,8 @@ While you create the cluster, HDInsight service needs to connect to the external
 `>> spark.sql("select * from sampleTable").show()`
 
 :::image type="content" source="./media/use-hive-metastore/read-table.png" alt-text="Screenshot showing how to read table." lightbox="./media/use-hive-metastore/read-table.png":::
+
+## Reference
 
+* Apache, Apache Spark, Spark, and associated open source project names are [trademarks](../trademarks.md) of the [Apache Software Foundation](https://www.apache.org/) (ASF).
 
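The use-hive-metastore article diffed above wires an external metastore through the portal and Key Vault. For context on what that wiring amounts to at the Spark level, here is a sketch of the configuration such a setup typically carries (these are the standard Hive metastore JDBC keys passed via the `spark.hadoop.*` prefix; the server, database, and user names are placeholders, and on HDInsight on AKS the service populates them for you rather than expecting hand-written conf):

```python
# Placeholder values; in HDInsight on AKS these come from the portal's
# metastore settings and the Key Vault-stored credential, not hard-coding.
sql_server = "contoso-sql.database.windows.net"
database = "hivemetastore"
user = "hiveadmin"

hive_metastore_conf = {
    # Standard Hive metastore JDBC settings, surfaced to Spark as hadoop conf.
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        f"jdbc:sqlserver://{sql_server};database={database}",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
}

for key, value in hive_metastore_conf.items():
    print(f"{key}={value}")
```

With the metastore resolvable, the `spark.sql("show databases")` and `select * from sampleTable` calls in the article read table definitions from the shared external store instead of a cluster-local one.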