
Project 06 — Pull Data From a REST API

Call a live weather API from ADF, receive JSON responses for three cities, save them to ADLS, and automate the pipeline to run every morning at 6 AM — no files, no uploads, pure API-driven ingestion.

75–90 min · March 2026
Series
Azure DE — Zero to Advanced
Project
06 of 25
Level
Beginner+
Time
75–90 min
Builds on: Projects 01–05 — same resource group, same storage account, same ADF
What you will build

A pipeline that calls a live REST API, receives a JSON response for each of FreshMart's three store cities, and saves the data to ADLS automatically every morning at 6 AM.

🏢 Real World Problem

FreshMart's category managers noticed something interesting. On rainy days, sales of umbrellas, instant noodles, and hot beverages spike. On hot summer days, ice cream, cold drinks, and fruits sell faster.

The data team wants to combine sales data with weather data to help stores plan inventory better. If heavy rain is forecast for Delhi tomorrow, ST001 should stock more umbrellas tonight.

But FreshMart has no weather data. They need to pull it from somewhere.

Open-Meteo is a free, public weather API. Every day it provides temperature, rainfall, wind speed, and weather codes for any city in the world. No registration. No credit card. No API key.

FreshMart's 3 store cities
New Delhi    → lat=28.6139, lon=77.2090
Mumbai       → lat=19.0760, lon=72.8777
Bangalore    → lat=12.9716, lon=77.5946

We will build a pipeline that:

  1. Calls the Open-Meteo API for each city
  2. Receives weather data as JSON
  3. Saves each city's weather to ADLS
  4. Runs automatically every morning at 6 AM
What we are building

Open-Meteo API → https://api.open-meteo.com

ADF calls API 3 times (one per city):

Call 1: ?latitude=28.6139&longitude=77.2090 → Delhi weather JSON

Call 2: ?latitude=19.0760&longitude=72.8777 → Mumbai weather JSON

Call 3: ?latitude=12.9716&longitude=77.5946 → Bangalore weather JSON

Each JSON saved to ADLS:

raw/weather/date=2024-01-15/

├── weather_delhi_20240115.json

├── weather_mumbai_20240115.json

└── weather_bangalore_20240115.json

Pipeline: pl_ingest_weather_api · Trigger: every morning at 6 AM IST · 3 cities, 3 API calls, 3 JSON files
Step-by-Step Plan

PART 1 — Test the API in browser   (Step 1, 10 min)
PART 2 — Create Linked Service     (Step 2, 10 min)
PART 3 — Create Datasets           (Steps 3–4, 15 min)
PART 4 — Build the Pipeline        (Steps 5–13, 40 min)
PART 5 — Add 6 AM Trigger          (Steps 14–15, 10 min)

🧠 Concepts You Must Understand First

What is a REST API?

In Project 04 we used HTTP to download a file — like downloading a CSV from a website. The URL pointed directly to a file.

A REST API is different. You do not download a file. Instead you ask a question and get a structured answer.

FILE DOWNLOAD (Project 04)
  • URL points to a file
  • Response IS the file
  • Like downloading a PDF
  • Same response every time
  • GET /data/csv/cities.csv
  • ← returns the cities.csv file
REST API (Project 06)
  • URL is a question/request
  • Response is structured data
  • Like asking Google a question
  • Response changes based on what you ask
  • GET /v1/forecast?latitude=28.61
  • ← returns Delhi's temperature data

Think of a REST API like a waiter at a restaurant:

  • You (ADF) place an order (HTTP request) with specific details (parameters in the URL)
  • The waiter (API) goes to the kitchen (server) and comes back with exactly what you asked for (JSON response)
  • Different orders give different responses

What is JSON?

When you call a REST API, the response comes back as JSON — JavaScript Object Notation. It is the universal language APIs use to send data.

Example JSON response from Open-Meteo
{
  "city": "New Delhi",
  "latitude": 28.6139,
  "longitude": 77.2090,
  "daily": {
    "time": ["2024-01-15", "2024-01-16"],
    "temperature_2m_max": [22.5, 24.1],
    "precipitation_sum": [0.0, 2.3],
    "weathercode": [1, 61]
  }
}
Understanding JSON structure
{  }         → object (a thing with properties)
[ ]          → array (a list of items)
"key": value → a property with its value
"text"       → text value (always in double quotes)
22.5         → number value (no quotes)
true/false   → boolean value

In the example above:
  "city"      → text property    → "New Delhi"
  "latitude"  → number property  → 28.6139
  "daily"     → object           → contains arrays of values
  "time"      → array of dates   → ["2024-01-15", "2024-01-16", ...]
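The JSON rules above map directly onto Python's `json` module. A quick illustration (not part of the ADF pipeline) parsing the example response:

```python
import json

# The example Open-Meteo style response from above, as raw text
raw = """
{
  "city": "New Delhi",
  "latitude": 28.6139,
  "longitude": 77.2090,
  "daily": {
    "time": ["2024-01-15", "2024-01-16"],
    "temperature_2m_max": [22.5, 24.1],
    "precipitation_sum": [0.0, 2.3],
    "weathercode": [1, 61]
  }
}
"""

data = json.loads(raw)                  # the outer { } object becomes a dict
print(data["city"])                     # text property   -> New Delhi
print(data["latitude"])                 # number property -> 28.6139
print(data["daily"]["time"][0])         # nested object, then array -> 2024-01-15
print(data["daily"]["weathercode"][1])  # second item in the array  -> 61
```

Objects become dicts, arrays become lists, and you drill in key by key — the same navigation you will later do with dot notation in ADF expressions.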

What URL Do We Call?

The Open-Meteo API URL for Delhi looks like this:

Delhi API URL — broken down
https://api.open-meteo.com/v1/forecast
  ?latitude=28.6139
  &longitude=77.2090
  &daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode
  &timezone=Asia/Kolkata
  &forecast_days=1

https://api.open-meteo.com   → base URL (goes in the Linked Service)
/v1/forecast                  → the specific API endpoint
?                             → starts the query parameters
latitude=28.6139              → first parameter (Delhi's latitude)
&longitude=77.2090            → second parameter (Delhi's longitude)
&daily=temperature_2m_max,... → which weather fields we want
&timezone=Asia/Kolkata        → time zone for the dates
&forecast_days=1              → only give me today's forecast
💡 Base URL vs Relative URL
ADF splits the URL into two parts. The base URL (just the domain) lives in the Linked Service. The relative URL (the path and query parameters) goes in the dataset. This lets you reuse one linked service for many different API endpoints on the same server.
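If it helps to see the split concretely, here is a minimal Python sketch of the join ADF performs — plain string concatenation of the two halves, using this project's URLs:

```python
# Base URL: lives in the Linked Service (just the domain)
base_url = "https://api.open-meteo.com"

# Relative URL: lives in the dataset (endpoint + query parameters)
relative_url = "/v1/forecast?latitude=28.6139&longitude=77.2090&forecast_days=1"

# ADF joins the two halves into the full request URL
full_url = base_url + relative_url
print(full_url)
```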

Object Arrays in ForEach — The Key New Concept

In Projects 02 and 03, the array contained simple strings:

Projects 02–03 — simple string array
["ST001", "ST002", "ST003"]

This time each item is an object with three properties — name, lat, and lon:

Project 06 — array of objects
[{"name":"delhi","lat":"28.6139","lon":"77.2090"}, ...]

This is because for each city we need three pieces of information — not just one. We pack all three into one object so the ForEach loop gets everything in one shot.

Inside the ForEach you use dot notation to read individual properties:

@{item()}

{"name":"delhi","lat":"28.6139","lon":"77.2090"} — the whole object

@{item().name}

"delhi" — just the city name

@{item().lat}

"28.6139" — just the latitude

@{item().lon}

"77.2090" — just the longitude
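The same pattern in Python, as a sanity check: ADF's `item().lat` corresponds to the dict lookup `item["lat"]`, and the loop below builds the per-city query strings the ForEach will build (a sketch, not something you run inside ADF):

```python
# The cities array from the pipeline parameter, as Python objects
cities = [
    {"name": "delhi",     "lat": "28.6139", "lon": "77.2090"},
    {"name": "mumbai",    "lat": "19.0760", "lon": "72.8777"},
    {"name": "bangalore", "lat": "12.9716", "lon": "77.5946"},
]

urls = []
for item in cities:                     # each iteration sees one whole object
    # ADF's item().lat / item().lon <-> Python's item["lat"] / item["lon"]
    urls.append(f"/v1/forecast?latitude={item['lat']}&longitude={item['lon']}")

print(urls[0])   # /v1/forecast?latitude=28.6139&longitude=77.2090
```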

PART 1 — Test the API in Browser

Step 1 — Open the API URL and See the Response

Before building anything, always test your API in the browser first.

Open your browser and paste this URL exactly:

https://api.open-meteo.com/v1/forecast?latitude=28.6139&longitude=77.2090&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1

Your browser will show a JSON response similar to this:

Expected response — Delhi weather
{
  "latitude": 28.625,
  "longitude": 77.1875,
  "timezone": "Asia/Kolkata",
  "timezone_abbreviation": "IST",
  "elevation": 216.0,
  "daily_units": {
    "time": "iso8601",
    "temperature_2m_max": "°C",
    "temperature_2m_min": "°C",
    "precipitation_sum": "mm",
    "weathercode": "wmo code"
  },
  "daily": {
    "time": ["2024-01-15"],
    "temperature_2m_max": [22.3],
    "temperature_2m_min": [9.1],
    "precipitation_sum": [0.0],
    "weathercode": [1]
  }
}
📸SCREENSHOT

Browser showing the raw JSON response from Open-Meteo API — JSON text visible in the browser window

✅ If you see JSON data — the API is working
The browser is doing the same thing ADF will do: sending an HTTP GET request to the URL and receiving a JSON response.
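For the curious, the same GET-plus-JSON round trip can be sketched in Python with only the standard library. `fetch_weather` is a hypothetical helper, not part of the pipeline, and needs internet access, so the call is left commented out:

```python
import json
import urllib.parse
import urllib.request

# Build the query string; urlencode also handles URL encoding, e.g. the
# "/" in Asia/Kolkata and the commas in the daily list
params = urllib.parse.urlencode({
    "latitude": "28.6139",
    "longitude": "77.2090",
    "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode",
    "timezone": "Asia/Kolkata",
    "forecast_days": "1",
})
url = "https://api.open-meteo.com/v1/forecast?" + params

def fetch_weather(url: str) -> dict:
    """Send an HTTP GET and decode the JSON body, like the browser does."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

# weather = fetch_weather(url)   # uncomment when online
print(url)
```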

Now test for Mumbai:

https://api.open-meteo.com/v1/forecast?latitude=19.0760&longitude=72.8777&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1
📸SCREENSHOT

Browser showing Mumbai weather JSON response — different temperature values than Delhi, confirming the API returns different data per location

What does weathercode mean? Open-Meteo uses standard WMO weather codes:

0   → Clear sky
1   → Mainly clear
2   → Partly cloudy
3   → Overcast
51  → Light drizzle
61  → Slight rain
71  → Slight snowfall
95  → Thunderstorm
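If you later want readable weather descriptions, a small lookup table covers the codes above. This is a hypothetical helper for downstream use, not part of the pipeline, and only a subset of the full WMO table:

```python
# Subset of the WMO weather codes used by Open-Meteo
WMO_CODES = {
    0: "Clear sky",
    1: "Mainly clear",
    2: "Partly cloudy",
    3: "Overcast",
    51: "Light drizzle",
    61: "Slight rain",
    71: "Slight snowfall",
    95: "Thunderstorm",
}

def describe(code: int) -> str:
    """Translate a numeric weathercode into readable text."""
    return WMO_CODES.get(code, f"Unknown code {code}")

print(describe(61))   # Slight rain
```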
PART 2 — Create Linked Service

Step 2 — Create HTTP Linked Service for Open-Meteo

In Project 04 we created ls_http_public_data pointing to https://people.sc.fsu.edu. Open-Meteo is a different server with a different base URL — we need a new linked service for it.

In ADF Studio → Manage → Linked services → "+ New"

Search "HTTP" → select HTTP → Continue

📸SCREENSHOT

New linked service — HTTP selected in the search results

Fill in:

Name:                    ls_http_openmeteo
Description:             Connection to Open-Meteo free weather API
Connect via:             AutoResolveIntegrationRuntime
Base URL:                https://api.open-meteo.com
Authentication type:     Anonymous
📸SCREENSHOT

HTTP linked service form — name ls_http_openmeteo, Base URL https://api.open-meteo.com, Anonymous authentication

Click "Test connection"

📸SCREENSHOT

Green 'Connection successful' message for ls_http_openmeteo

Click "Create"

📸SCREENSHOT

Linked services list — ls_http_openmeteo visible alongside previous linked services

PART 3 — Create Datasets

Step 3 — Source Dataset (HTTP → JSON)

In Project 04 we used DelimitedText (CSV) as the format for our HTTP dataset. This time the API returns JSON, so we pick JSON as the format.

Author → Datasets → "+" → "New dataset"

Search "HTTP" → select HTTP → Continue

For format — select "JSON" → Continue

📸SCREENSHOT

Format selection step — JSON selected, not DelimitedText

Name:            ds_src_http_weather_json
Linked service:  ls_http_openmeteo
Relative URL:    (leave empty — will be dynamic)
Request method:  GET

Click "OK"

Click "Parameters" tab → "+ New"

Name:    relative_url
Type:    String
📸SCREENSHOT

Dataset Parameters tab — relative_url parameter added

Click "Connection" tab

Relative URL → "Add dynamic content" → click relative_url under Parameters:

@dataset().relative_url
📸SCREENSHOT

Connection tab — Relative URL showing @dataset().relative_url

Click 💾 Save

Step 4 — Sink Dataset (ADLS → JSON)

Datasets → "+" → "New dataset"

Search "Azure Data Lake Storage Gen2" → select → Continue

For format — select "JSON" → Continue

📸SCREENSHOT

Format selection — JSON selected for ADLS sink dataset

Name:            ds_sink_adls_weather_json
Linked service:  ls_adls_freshmart
File path:       (leave all empty)
Import schema:   None

Click "OK"

Click "Parameters" tab → add TWO parameters:

Parameter 1:
  Name:    date_folder
  Type:    String

Parameter 2:
  Name:    file_name
  Type:    String
📸SCREENSHOT

Sink dataset Parameters tab — date_folder and file_name parameters both added

Click "Connection" tab:

Container: raw

Directory → "Add dynamic content":

weather/@{dataset().date_folder}

File → "Add dynamic content":

@dataset().file_name
📸SCREENSHOT

Sink dataset Connection tab — Container: raw, Directory: weather/@{dataset().date_folder}, File: @dataset().file_name

Click 💾 Save

PART 4 — Build the Pipeline

Step 5 — Create New Pipeline

Author → "+" next to Pipelines → "New pipeline"

Name:        pl_ingest_weather_api
Description: Calls Open-Meteo API for each city and saves JSON to ADLS raw/weather/
📸SCREENSHOT

Blank pipeline canvas — name pl_ingest_weather_api in Properties panel

Step 6 — Add Parameters

Click blank canvas → "Parameters" tab → add parameters:

Parameter 1:
  Name:    run_date
  Type:    String
  Default: 2024-01-15

Parameter 2:
  Name:    cities
  Type:    Array
  Default: [
    {"name":"delhi","lat":"28.6139","lon":"77.2090"},
    {"name":"mumbai","lat":"19.0760","lon":"72.8777"},
    {"name":"bangalore","lat":"12.9716","lon":"77.5946"}
  ]
📸SCREENSHOT

Pipeline Parameters tab — run_date and cities parameters visible

💡 Why is each city an object instead of a plain string?
In Projects 02 and 03 we only needed one value per item (the store ID). Here we need three values per city — name, lat, and lon. Packing them into an object {"name":"delhi","lat":"28.6139","lon":"77.2090"} means the ForEach gets everything it needs in a single item, and we read each value with dot notation: item().name, item().lat, item().lon.

Step 7 — Add Variable + Set Variable Activity

Click blank canvas → "Variables" tab → "+ New"

Name:    run_date_folder
Type:    String
📸SCREENSHOT

Variables tab — run_date_folder variable added

From left panel → "General" → drag "Set variable" onto canvas

Click it → bottom panel:

General tab:
  Name:  set_run_date_folder

Variables tab:
  Name:   run_date_folder
  Value:  (click "Add dynamic content")

In the dynamic content editor, type:

date=@{pipeline().parameters.run_date}

Result: date=2024-01-15

📸SCREENSHOT

Set variable activity — value showing date=@{pipeline().parameters.run_date}

Step 8 — Add ForEach Activity

From left panel → "Iteration & conditionals" → drag "ForEach" onto canvas

Connect: drag green arrow from set_run_date_folderForEach

📸SCREENSHOT

Canvas — set_run_date_folder connected to ForEach with green arrow

Click ForEach → bottom panel:

General tab:
  Name:  ForEach_cities

Settings tab:
  Sequential:   ☐ Unchecked  (run all 3 cities in parallel)
  Batch count:  3
  Items:        (click "Add dynamic content")

Items value → click cities under Parameters:

@pipeline().parameters.cities
📸SCREENSHOT

ForEach Settings tab — Items showing @pipeline().parameters.cities, batch count 3, Sequential unchecked

Step 9 — Add Copy Activity Inside ForEach

Click the "+" button INSIDE the ForEach box

📸SCREENSHOT

ForEach activity — '+' button inside highlighted

You are on the inner canvas. Drag "Copy data" from the left panel onto the inner canvas.

General tab:
  Name:  copy_weather_json_to_adls
📸SCREENSHOT

Copy data activity on ForEach inner canvas — named copy_weather_json_to_adls

Step 10 — Configure Source

Click "Source" tab

Source dataset:  ds_src_http_weather_json

Dataset property relative_url appears → click "Add dynamic content"

Build the full API query URL for each city:

relative_url — builds a different URL for each city
/v1/forecast?latitude=@{item().lat}&longitude=@{item().lon}&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1
📸SCREENSHOT

Dynamic content editor — the full API query URL expression with @{item().lat} and @{item().lon}

Click "OK"

What does this build for each city?

When item() = {name:'delhi', lat:'28.6139', lon:'77.2090'}
/v1/forecast
  ?latitude=28.6139
  &longitude=77.2090
  &daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode
  &timezone=Asia%2FKolkata
  &forecast_days=1

ADF joins this with the base URL from the linked service:
→ https://api.open-meteo.com/v1/forecast?latitude=28.6139&longitude=77.2090...

That is exactly the Delhi weather URL we tested in the browser! ✅

Next iteration (Mumbai): @{item().lat} → 19.0760, @{item().lon} → 72.8777
→ https://api.open-meteo.com/v1/forecast?latitude=19.0760&longitude=72.8777...
📸SCREENSHOT

Source tab complete — relative_url dataset property showing the full API query expression

Step 11 — Configure Sink

Click "Sink" tab

Sink dataset:  ds_sink_adls_weather_json

Dataset property date_folder → "Add dynamic content" → click run_date_folder under Variables:

@variables('run_date_folder')

Result: date=2024-01-15

Dataset property file_name → "Add dynamic content":

file_name — builds a unique file name for each city
weather_@{item().name}_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.json

What file name does this build?

weather_          → literal "weather_"
@{item().name}    → city name → "delhi"
_                 → literal "_"
@{formatDateTime(...,'yyyyMMdd')} → "20240115"
.json             → literal ".json"

Delhi:     weather_delhi_20240115.json     ✅
Mumbai:    weather_mumbai_20240115.json    ✅
Bangalore: weather_bangalore_20240115.json ✅
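As a cross-check, the same file names can be reproduced in Python: ADF's formatDateTime(..., 'yyyyMMdd') corresponds to strftime('%Y%m%d'). A sketch, assuming the default run_date:

```python
from datetime import datetime

run_date = "2024-01-15"                 # the pipeline parameter
stamp = datetime.strptime(run_date, "%Y-%m-%d").strftime("%Y%m%d")

# One file name per city, mirroring the ADF file_name expression
file_names = [f"weather_{name}_{stamp}.json"
              for name in ("delhi", "mumbai", "bangalore")]
print(file_names[0])   # weather_delhi_20240115.json
```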
📸SCREENSHOT

Sink tab — date_folder showing @variables('run_date_folder'), file_name showing weather_@{item().name}_ expression

Go back to the main canvas using the back arrow.

📸SCREENSHOT

Main canvas — set_run_date_folder → ForEach_cities, two activities connected with green arrow

Step 12 — Validate and Debug

Click "Validate"

📸SCREENSHOT

Validation successful — no errors

Click "Debug"

Parameters dialog:

run_date:  2024-01-15
cities:    [{"name":"delhi","lat":"28.6139","lon":"77.2090"},{"name":"mumbai","lat":"19.0760","lon":"72.8777"},{"name":"bangalore","lat":"12.9716","lon":"77.5946"}]
📸SCREENSHOT

Debug dialog — run_date and cities array pre-filled with defaults from the pipeline parameters

Click "OK"

Watch the pipeline:

  1. set_run_date_folder → green ✅
  2. ForEach_cities → 3 iterations run in parallel
📸SCREENSHOT

Pipeline running — set_run_date_folder green, ForEach running with 3 cities in progress

📸SCREENSHOT

Pipeline complete — both activities showing green checkmarks

Click the 👓 glasses icon on the ForEach run:

📸SCREENSHOT

ForEach iteration details — 3 rows for delhi, mumbai, bangalore — all Succeeded

Step 13 — Verify JSON Files in ADLS

Azure Portal → Storage → stfreshmartdev → Containers → raw

You should now see a new weather folder alongside sales and external:

raw/
├── sales/      ← Projects 01–03
├── external/   ← Project 04
└── weather/    ← NEW — Project 06
    └── date=2024-01-15/
        ├── weather_delhi_20240115.json
        ├── weather_mumbai_20240115.json
        └── weather_bangalore_20240115.json
📸SCREENSHOT

raw container — three folders visible: sales, external, weather

📸SCREENSHOT

raw/weather/date=2024-01-15/ — three JSON files for Delhi, Mumbai, Bangalore

Click on weather_delhi_20240115.json → click "Edit"

You should see the raw JSON from the API saved exactly as-is:

weather_delhi_20240115.json — content saved in ADLS
{
  "latitude": 28.625,
  "longitude": 77.1875,
  "daily": {
    "time": ["2024-01-15"],
    "temperature_2m_max": [22.3],
    "temperature_2m_min": [9.1],
    "precipitation_sum": [0.0],
    "weathercode": [1]
  }
}
📸SCREENSHOT

weather_delhi_20240115.json file open in Azure Portal — raw JSON content visible

📸SCREENSHOT

weather_mumbai_20240115.json — different temperature values confirming it's a separate API call

✅ 3 API calls, 3 JSON responses, 3 separate files
The pipeline called the Open-Meteo API three times with different lat/lon values and saved three separate JSON files to ADLS — one per city.
PART 5 — Add the 6 AM Trigger

Step 14 — Create Schedule Trigger

On the pipeline canvas → "Add trigger" → "New/Edit" → "+ New"

Name:        trigger_weather_6am
Description: Pulls weather data every morning at 6 AM IST before store opening
Type:        Schedule
Start date:  today
Time zone:   India Standard Time
Repeat:      Every 1 Day
At time:     06:00
📸SCREENSHOT

New trigger form — trigger_weather_6am, Schedule type, 06:00 IST, daily repeat

Click "OK"

Trigger Run Parameters dialog:

For run_date:

@formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')

The date the trigger fires — e.g. 2024-01-15

For cities:

[{"name":"delhi","lat":"28.6139","lon":"77.2090"},{"name":"mumbai","lat":"19.0760","lon":"72.8777"},{"name":"bangalore","lat":"12.9716","lon":"77.5946"}]
📸SCREENSHOT

Trigger Run Parameters — run_date showing formatDateTime expression, cities showing the JSON array

💡 Why 6 AM instead of midnight?
The sales pipeline runs at midnight copying yesterday's data. Weather data is more useful when it is fresh — today's forecast pulled in the morning before stores open. By 7 AM, the data team has both yesterday's sales AND today's weather in ADLS ready for analysis.

Click "OK"

Step 15 — Publish

Click "Publish all"

Items being published:

pl_ingest_weather_api
ds_src_http_weather_json
ds_sink_adls_weather_json
ls_http_openmeteo
trigger_weather_6am
📸SCREENSHOT

Publish panel — all 5 new items listed

Click "Publish"

📸SCREENSHOT

Successfully published notification

Check the trigger is active: Monitor → Trigger runs

📸SCREENSHOT

Monitor Trigger runs — trigger_weather_6am showing Active, next run time showing tomorrow 6:00 AM

What You Built — Summary

Before Project 06
  • FreshMart had no weather data
  • No way to link sales patterns to weather
  • Analysts had to check a weather website manually
After Project 06
  • Weather API called at 6 AM every morning
  • 3 cities, 3 JSON files, landed in ADLS with date folders
  • Data team can JOIN sales data with weather data
  • Zero manual work — completely automated

The ADLS raw layer now looks like this after Projects 01–06:

raw/
├── sales/                          ← store transaction data (Projects 01–03)
│   └── date=YYYY-MM-DD/
│       └── store_STXXX_sales_YYYYMMDD.csv
│
├── external/                       ← files downloaded from internet (Project 04)
│   ├── cities/
│   └── grades/
│
└── weather/                        ← REST API data (Project 06)
    └── date=YYYY-MM-DD/
        ├── weather_delhi_YYYYMMDD.json
        ├── weather_mumbai_YYYYMMDD.json
        └── weather_bangalore_YYYYMMDD.json

Everything You Created

Type            Name                        What it does
Linked Service  ls_http_openmeteo           HTTP connection to https://api.open-meteo.com, Anonymous auth
Dataset         ds_src_http_weather_json    Source HTTP JSON dataset with relative_url parameter
Dataset         ds_sink_adls_weather_json   Sink ADLS JSON dataset with date_folder and file_name parameters
Pipeline        pl_ingest_weather_api       Set Variable → ForEach (3 cities) → Copy JSON to ADLS
Parameter       run_date (String)           Date to process — controls folder and file names
Parameter       cities (Array)              Array of objects with name, lat, lon for each city
Variable        run_date_folder (String)    Computed folder name like date=2024-01-15
Trigger         trigger_weather_6am         Fires every morning at 6 AM IST — passes today as run_date

Key Expressions in This Project

Read the "name" property from the current ForEach item
@{item().name}

"delhi", "mumbai", or "bangalore"

Read the "lat" property from the current ForEach item
@{item().lat}

"28.6139", "19.0760", or "12.9716"

Read the "lon" property from the current ForEach item
@{item().lon}

"77.2090", "72.8777", or "77.5946"

Format run_date for use in the file name
@formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')

"20240115"

Pass today's date from the trigger
@formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')

"2024-01-15"

🧠 Key Concepts to Remember

Concept                  What It Is                                                 Why It Matters
REST API                 A service you query with HTTP to get structured data       Most real-world data sources are APIs, not files
JSON                     The format APIs use to return data                         You need to read and store JSON in data engineering
Query parameters         Values after ? in a URL that customize the API response    Control what data the API returns per city
Object in array          {"name":"delhi","lat":"28.61"} — an item with properties   Group related values together in ForEach items
@{item().property}       Read one property from the current ForEach object          Access name, lat, lon separately from one item
Dot notation             item().name reads the "name" field of the object           Standard way to navigate JSON/objects in ADF
JSON dataset format      ADF dataset type for JSON files                            Use this instead of DelimitedText for API responses
Base URL + relative URL  Split between linked service and dataset                   Base URL is the server, relative URL is the endpoint

⚠️ Common Mistakes in This Project

Putting query parameters in the base URL

Fix: The base URL must be only the domain: https://api.open-meteo.com. Everything after it, including /v1/forecast?..., goes in the relative URL dataset property.

Using @{item()} instead of @{item().lat} for object arrays

Fix: @{item()} inserts the whole object ({"name":"delhi","lat":"28.61","lon":"77.20"}), so the URL comes out malformed. Use @{item().lat} to get just the lat value.

Wrong format for cities array in parameter

Fix: Must be valid JSON — property names in double quotes, no trailing commas. Wrong: [{name:'delhi'}]. Right: [{"name":"delhi","lat":"28.6139","lon":"77.2090"}].

Forgetting %2F in timezone URL parameter

Fix: Asia/Kolkata has a slash that must be URL-encoded as %2F. Wrong: &timezone=Asia/Kolkata. Right: &timezone=Asia%2FKolkata. Symptom: API returns an error about invalid timezone.
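If you ever assemble such a URL by hand in code, Python's urllib.parse.quote does the encoding; safe="" forces the slash to be encoded too:

```python
from urllib.parse import quote

# Encode the timezone value for use in a query string
tz = quote("Asia/Kolkata", safe="")
print(tz)   # Asia%2FKolkata
```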

Selecting the wrong format (DelimitedText instead of JSON) for a dataset

Symptom: ADF tries to parse the JSON response as CSV, so the output is empty or garbled. Fix: delete the dataset, recreate it, and select the JSON format.

🚀 What's Coming in Project 07

We now have data coming from three sources — store sales CSVs, public file downloads, and a weather REST API. All raw data is landing in ADLS.

But all of it is raw and messy: inconsistent column names, null values, negative amounts, dates stored as text. In Project 07 we connect Azure Databricks for the first time and write a Python notebook to clean, validate, and transform the raw sales data from Bronze into a clean Silver layer. Output is Parquet — the industry standard format for analytics.

First time writing actual Python/PySpark code — every concept explained from zero.

🎯 Key Takeaways

  • REST APIs return structured JSON data — you query them with a URL instead of downloading a file
  • The cities array uses objects {"name":"delhi","lat":"28.61","lon":"77.20"} so one ForEach item carries all three values
  • Use @{item().property} to read individual fields from an object — dot notation works the same as in JavaScript
  • Base URL lives in the Linked Service, the relative URL and query parameters go in the dataset
  • Always test your API in the browser before building the ADF pipeline
  • The trigger uses @formatDateTime(trigger().scheduledTime,'yyyy-MM-dd') to pass today's date automatically