articles/synapse-analytics/spark/apache-spark-concepts.md

A serverless Apache Spark pool is created in the Azure portal.
As there's no dollar or resource cost associated with creating Spark pools, any number can be created with any number of different configurations. Permissions can also be applied to Spark pools so that users have access only to some pools and not others.

A best practice is to create smaller Spark pools for development and debugging, and then larger ones for running production workloads.
To learn how to create a Spark pool and see all of its properties, see [Get started with Spark pools in Azure Synapse Analytics](../quickstart-create-apache-spark-pool-portal.md).
## Spark instances

Spark instances are created when you connect to a Spark pool, create a session, and run a job. As multiple users can have access to a single Spark pool, a new Spark instance is created for each user that connects.
When you submit a second job, if there's capacity in the pool and the existing Spark instance also has capacity, the existing instance processes the job. Otherwise, if capacity is available at the pool level, a new Spark instance is created.

Billing for the instances starts when the Azure virtual machine starts. Billing for the Spark pool instances stops when pool instances change to terminating. For more information on how Azure VMs are started and deallocated, see [States and billing status of Azure Virtual Machines](/azure/virtual-machines/states-billing).
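The placement rules above can be sketched as a small decision function. This is a hypothetical illustration of the described behavior, not the actual Synapse scheduler; the function name, parameters, and simplified node counting are all assumptions for the sketch.

```python
# Hypothetical sketch of the capacity rules described above; not a real
# Synapse API. Capacity is modeled as simple free-node counts.
def place_job(requested_nodes: int, pool_free: int,
              instance_free: int, source: str) -> str:
    """Decide where a newly submitted job runs."""
    if requested_nodes <= instance_free:
        # Pool and existing instance both have room.
        return "existing instance"
    if requested_nodes <= pool_free:
        # Capacity remains at the pool level, so a new instance is created.
        return "new instance"
    # No capacity anywhere: notebook jobs are rejected, batch jobs are queued.
    return "rejected" if source == "notebook" else "queued"
```

For example, `place_job(10, 10, 10, "notebook")` returns `"existing instance"`, while `place_job(11, 10, 10, "batch")` returns `"queued"`.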
## Examples
- You create a Spark pool called SP1; it has a fixed cluster size of 20 medium nodes.
- You submit a notebook job, J1, that uses 10 nodes. A Spark instance, SI1, is created to process the job.
- You now submit another job, J2, that uses 10 nodes. Because there's still capacity in the pool and in the instance, J2 is processed by SI1.
- If J2 had asked for 11 nodes, there wouldn't have been capacity in SP1 or SI1. In this case, if J2 comes from a notebook, then the job is rejected; if J2 comes from a batch job, it's queued.
- Billing starts at the submission of notebook job J1.
- The Spark pool is instantiated with 20 medium nodes, each with 8 vCores, and typically takes ~3 minutes to start. 20 x 8 = 160 vCores.
- Depending on the exact Spark pool start-up time, idle timeout, and the runtime of the two notebook jobs, the pool is likely to run for between 18 and 20 minutes (Spark pool instantiation time + notebook job runtime + idle timeout).
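The vCore arithmetic in the example above can be worked through directly. This is illustrative only; real billing is per minute and vCore pricing varies by region.

```python
# Worked version of the example's arithmetic (illustrative only).
nodes = 20                 # fixed cluster size of SP1
vcores_per_node = 8        # medium nodes
total_vcores = nodes * vcores_per_node     # 160 vCores

runtime_hours = 20 / 60    # pool runs for roughly 20 minutes
vcore_hours = total_vcores * runtime_hours # ~53.3 vCore hours for this run
print(total_vcores, round(vcore_hours, 1))
```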
- Depending on the exact Spark pool start-up time, the idle timeout, and the runtime of the first and third notebook jobs, the SI1 pool is likely to run for between 18 and 20 minutes (Spark pool instantiation time + notebook job runtime + idle timeout).
- Another Spark pool, SI2, is instantiated with 20 medium nodes, each with 8 vCores, and typically takes ~3 minutes to start. 20 x 8 = 160 vCores.
- Depending on the exact Spark pool start-up time, the idle timeout, and the runtime of the first notebook job, the SI2 pool is likely to run for between 18 and 20 minutes (Spark pool instantiation time + notebook job runtime + idle timeout).
- Assuming the two pools run for 20 minutes each, 160 x 0.3 x 2 = 96 vCore hours.
- Note: vCore hours are billed per minute and vCore pricing varies by Azure region. For more information, see [Azure Synapse Pricing](https://azure.microsoft.com/pricing/details/synapse-analytics/#pricing).
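The two-pool estimate above can be checked with a quick calculation. Note the 0.3-hour figure is the article's rounding of the ~18-20 minute window; actual billing is per minute.

```python
# The two-pool estimate, using the article's rounding of ~20 min to 0.3 hours.
total_vcores = 20 * 8      # 160 vCores per pool
hours_per_pool = 0.3       # approximate billed runtime of each pool
pools = 2                  # SI1 and SI2

vcore_hours = total_vcores * hours_per_pool * pools
print(vcore_hours)         # 96.0 vCore hours
```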
## Quotas and resource constraints in Apache Spark for Azure Synapse
### Spark pool level

When you define a Spark pool, you're effectively defining a quota per user for that pool. If you run multiple notebooks, jobs, or a mix of the two, it's possible to exhaust the pool quota. If you do, an error message is generated:
```console
Failed to start session: Your Spark job requested xx vCores.
```
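The pool-level quota check that produces this error can be sketched as follows. This is a hypothetical pre-admission helper written for illustration; the function name, parameters, and error text format are assumptions, not a Synapse API.

```python
# Hypothetical sketch of a pool-level quota check; not a real Synapse API.
def check_pool_quota(requested_vcores: int, in_use_vcores: int,
                     pool_quota_vcores: int) -> int:
    """Admit a session if the pool quota can cover it, else fail."""
    available = pool_quota_vcores - in_use_vcores
    if requested_vcores > available:
        # Mirrors the error shown above when the per-user quota is exhausted.
        raise RuntimeError(
            f"Failed to start session: Your Spark job requested "
            f"{requested_vcores} vCores."
        )
    return available - requested_vcores  # vCores left after admission
```

For example, with a 160-vCore pool quota and 100 vCores in use, a 40-vCore session is admitted, while a 61-vCore session fails.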