| title | Tutorial: Get started analyze data with dedicated SQL pools |
|---|---|
| description | In this tutorial, use the NYC Taxi sample data to explore SQL pool's analytic capabilities. |
| author | azaricstefan |
| ms.author | stefanazaric |
| ms.date | 12/11/2024 |
| ms.service | azure-synapse-analytics |
| ms.subservice | sql |
| ms.topic | tutorial |
| ms.custom | engagement-fy23 |
In this tutorial, use the NYC Taxi data to explore a dedicated SQL pool's capabilities.
[!div class="checklist"]
- [Deploy a dedicated SQL pool]
- [Load data into the pool]
- [Explore the data you've loaded]
- This tutorial assumes you've completed the steps in the rest of the quickstarts. Specifically it uses the 'contosodatalake' resource created in the Create a Synapse Workspace quickstart.
- In Synapse Studio, on the left-side pane, select Manage > SQL pools under Analytics pools.
- Select New.
- For Dedicated SQL pool name select
SQLPOOL1. - For Performance level choose DW100C.
- Select Review + create > Create. Your dedicated SQL pool will be ready in a few minutes.
Your dedicated SQL pool is associated with a SQL database that's also called SQLPOOL1.
- Navigate to Data > Workspace.
- You should see a database named SQLPOOL1. If you don't see it, select Refresh.
A dedicated SQL pool consumes billable resources as long as it's active. You can pause the pool later to reduce costs.
Note
When creating a new dedicated SQL pool (formerly SQL DW) in your workspace, the dedicated SQL pool provisioning page will open. Provisioning will take place on the logical SQL server.
-
In Synapse Studio, navigate to the Develop hub, select the + button to add new resource, then create new SQL script.
-
Select the pool
SQLPOOL1(pool created in STEP 1 of this tutorial) in Connect to drop down list above the script. -
Enter the following code:
IF NOT EXISTS (SELECT * FROM sys.objects O JOIN sys.schemas S ON O.schema_id = S.schema_id WHERE O.NAME = 'NYCTaxiTripSmall' AND O.TYPE = 'U' AND S.NAME = 'dbo') CREATE TABLE dbo.NYCTaxiTripSmall ( [VendorID] bigint, [store_and_fwd_flag] nvarchar(1) NULL, [RatecodeID] float NULL, [PULocationID] bigint NULL, [DOLocationID] bigint NULL, [passenger_count] float NULL, [trip_distance] float NULL, [fare_amount] float NULL, [extra] float NULL, [mta_tax] float NULL, [tip_amount] float NULL, [tolls_amount] float NULL, [ehail_fee] float NULL, [improvement_surcharge] float NULL, [total_amount] float NULL, [payment_type] float NULL, [trip_type] float NULL, [congestion_surcharge] float NULL ) WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX -- HEAP ) GO COPY INTO dbo.NYCTaxiTripSmall (VendorID 1, store_and_fwd_flag 4, RatecodeID 5, PULocationID 6 , DOLocationID 7, passenger_count 8,trip_distance 9, fare_amount 10, extra 11, mta_tax 12, tip_amount 13, tolls_amount 14, ehail_fee 15, improvement_surcharge 16, total_amount 17, payment_type 18, trip_type 19, congestion_surcharge 20 ) FROM 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet' WITH ( FILE_TYPE = 'PARQUET' ,MAXERRORS = 0 ,IDENTITY_INSERT = 'OFF' ,AUTO_CREATE_TABLE ='ON' )
[!TIP] If you get an error that reads
Login failed for user '<token-identified principal>', you need to set your Entra Id admin.- In the Azure portal, search for your synapse workspace.
- Under Settings select Microsoft Entra ID.
- Select Set admin and set a Microsoft Entra ID admin.
-
Select the Run button to execute the script.
-
This script finishes in less than 60 seconds. It loads 2 million rows of NYC Taxi data into a table called
dbo.NYCTaxiTripSmall.
-
In Synapse Studio, go to the Data hub.
-
Go to SQLPOOL1 > Tables. (If you don't see it in the menu, refresh the page.)
-
Right-click the dbo.NYCTaxiTripSmall table and select New SQL Script > Select TOP 100 Rows.
-
Wait while a new SQL script is created and runs.
-
At the top of the SQL script Connect to is automatically set to the SQL pool called SQLPOOL1.
-
Replace the text of the SQL script with this code and run it.
SELECT passenger_count as PassengerCount, SUM(trip_distance) as SumTripDistance_miles, AVG(trip_distance) as AvgTripDistance_miles INTO dbo.PassengerCountStats FROM dbo.NYCTaxiTripSmall WHERE trip_distance > 0 AND passenger_count > 0 GROUP BY passenger_count; SELECT * FROM dbo.PassengerCountStats ORDER BY PassengerCount;
This query creates a table
dbo.PassengerCountStatswith aggregate data from thetrip_distancefield, then queries the new table. The data shows how the total trip distances and average trip distance relate to the number of passengers. -
In the SQL script result window, change the View to Chart to see a visualization of the results as a line chart. Change Category column to
PassengerCount.
Pause your dedicated SQL Pool to reduce costs.
- Navigate to Manage in your synapse workspace.
- Select SQL pools.
- Hover over SQLPOOL1 and select the Pause button.
- Confirm to pause.
[!div class="nextstepaction"] Analyze data in an Azure Storage account