Project 06 — Pull Data From a REST API
Call a live weather API from ADF, receive JSON responses for three cities, save them to ADLS, and automate the pipeline to run every morning at 6 AM — no files, no uploads, pure API-driven ingestion.
Azure DE — Zero to Advanced
06 of 25
Beginner+
75–90 min
A pipeline that calls a live REST API, receives a JSON response for each of FreshMart's three store cities, and saves the data to ADLS automatically every morning at 6 AM.
🏢 Real World Problem
FreshMart's category managers noticed something interesting. On rainy days, sales of umbrellas, instant noodles, and hot beverages spike. On hot summer days, ice cream, cold drinks, and fruits sell faster.
The data team wants to combine sales data with weather data to help stores plan inventory better. If heavy rain is forecast for Delhi tomorrow, ST001 should stock more umbrellas tonight.
But FreshMart has no weather data. They need to pull it from somewhere.
Open-Meteo is a free, public weather API. Every day it provides temperature, rainfall, wind speed, and weather codes for any city in the world. No registration. No credit card. No API key.
New Delhi → lat=28.6139, lon=77.2090
Mumbai → lat=19.0760, lon=72.8777
Bangalore → lat=12.9716, lon=77.5946
We will build a pipeline that:
- Calls the Open-Meteo API for each city
- Receives weather data as JSON
- Saves each city's weather to ADLS
- Runs automatically every morning at 6 AM
Open-Meteo API → https://api.open-meteo.com
ADF calls API 3 times (one per city):
Call 1: ?latitude=28.6139&longitude=77.2090 → Delhi weather JSON
Call 2: ?latitude=19.0760&longitude=72.8777 → Mumbai weather JSON
Call 3: ?latitude=12.9716&longitude=77.5946 → Bangalore weather JSON
Each JSON saved to ADLS:
raw/weather/date=2024-01-15/
├── weather_delhi_20240115.json
├── weather_mumbai_20240115.json
└── weather_bangalore_20240115.json
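Before touching ADF, it can help to see the whole job in a few lines of plain Python. This is only an illustrative sketch (the URLs and target paths mirror what the pipeline will produce; nothing here is ADF code, and no request is actually sent):

```python
# Sketch of what the pipeline automates each morning (illustrative only).
BASE = "https://api.open-meteo.com"
DAILY = "temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode"

cities = [
    {"name": "delhi", "lat": "28.6139", "lon": "77.2090"},
    {"name": "mumbai", "lat": "19.0760", "lon": "72.8777"},
    {"name": "bangalore", "lat": "12.9716", "lon": "77.5946"},
]
run_date = "2024-01-15"

for c in cities:
    # One GET request per city -- in ADF this becomes ForEach + Copy activity
    url = (f"{BASE}/v1/forecast?latitude={c['lat']}&longitude={c['lon']}"
           f"&daily={DAILY}&timezone=Asia%2FKolkata&forecast_days=1")
    # Where the JSON response lands in ADLS
    path = (f"raw/weather/date={run_date}/"
            f"weather_{c['name']}_{run_date.replace('-', '')}.json")
    print(url)
    print("  ->", path)
```

Three requests, three JSON files, one date folder: that is the entire pipeline in miniature.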
🧠 Concepts You Must Understand First
What is a REST API?
In Project 04 we used HTTP to download a file — like downloading a CSV from a website. The URL pointed directly to a file.
A REST API is different. You do not download a file. Instead you ask a question and get a structured answer.
File download (Project 04):
- URL points to a file
- Response IS the file
- Like downloading a PDF
- Same response every time
- GET /data/csv/cities.csv
- ← returns the cities.csv file
REST API:
- URL is a question/request
- Response is structured data
- Like asking Google a question
- Response changes based on what you ask
- GET /v1/forecast?latitude=28.61
- ← returns Delhi's temperature data
Think of a REST API like a waiter at a restaurant:
- You (ADF) place an order (HTTP request) with specific details (parameters in the URL)
- The waiter (API) goes to the kitchen (server) and comes back with exactly what you asked for (JSON response)
- Different orders give different responses
What is JSON?
When you call a REST API, the response comes back as JSON — JavaScript Object Notation. It is the universal language APIs use to send data.
{
"city": "New Delhi",
"latitude": 28.6139,
"longitude": 77.2090,
"daily": {
"time": ["2024-01-15", "2024-01-16"],
"temperature_2m_max": [22.5, 24.1],
"precipitation_sum": [0.0, 2.3],
"weathercode": [1, 61]
}
}
{ } → object (a thing with properties)
[ ] → array (a list of items)
"key": value → a property with its value
"text" → text value (always in double quotes)
22.5 → number value (no quotes)
true/false → boolean value
In the example above:
"city" → text property → "New Delhi"
"latitude" → number property → 28.6139
"daily" → object → contains arrays of values
"time" → array of dates → ["2024-01-15", "2024-01-16", ...]What URL Do We Call?
The Open-Meteo API URL for Delhi looks like this:
https://api.open-meteo.com/v1/forecast
?latitude=28.6139
&longitude=77.2090
&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode
&timezone=Asia/Kolkata
&forecast_days=1
https://api.open-meteo.com → base URL (goes in the Linked Service)
/v1/forecast → the specific API endpoint
? → starts the query parameters
latitude=28.6139 → first parameter (Delhi's latitude)
&longitude=77.2090 → second parameter (Delhi's longitude)
&daily=temperature_2m_max,... → which weather fields we want
&timezone=Asia/Kolkata → time zone for the dates
&forecast_days=1 → only give me today's forecast
Object Arrays in ForEach — The Key New Concept
In Projects 02 and 03, the array contained simple strings:
["ST001", "ST002", "ST003"]This time each item is an object with three properties — name, lat, and lon:
[{"name":"delhi","lat":"28.6139","lon":"77.2090"}, ...]This is because for each city we need three pieces of information — not just one. We pack all three into one object so the ForEach loop gets everything in one shot.
Inside the ForEach you use dot notation to read individual properties:
@{item()} → {"name":"delhi","lat":"28.6139","lon":"77.2090"} — the whole object
@{item().name} → "delhi" — just the city name
@{item().lat} → "28.6139" — just the latitude
@{item().lon} → "77.2090" — just the longitude
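If it helps, here is the same idea in plain Python, where each ForEach item behaves like a dictionary and dot notation becomes key lookup (a sketch for intuition only; ADF expressions are not Python):

```python
# One ForEach item: an object carrying all three values for a city
item = {"name": "delhi", "lat": "28.6139", "lon": "77.2090"}

print(item)          # whole object -- like @{item()}
print(item["name"])  # "delhi"      -- like @{item().name}
print(item["lat"])   # "28.6139"    -- like @{item().lat}
print(item["lon"])   # "77.2090"    -- like @{item().lon}
```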
Step 1 — Open the API URL and See the Response
Before building anything, always test your API in the browser first.
Open your browser and paste this URL exactly:
https://api.open-meteo.com/v1/forecast?latitude=28.6139&longitude=77.2090&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1
Your browser will show a JSON response similar to this:
{
"latitude": 28.625,
"longitude": 77.1875,
"timezone": "Asia/Kolkata",
"timezone_abbreviation": "IST",
"elevation": 216.0,
"daily_units": {
"time": "iso8601",
"temperature_2m_max": "°C",
"temperature_2m_min": "°C",
"precipitation_sum": "mm",
"weathercode": "wmo code"
},
"daily": {
"time": ["2024-01-15"],
"temperature_2m_max": [22.3],
"temperature_2m_min": [9.1],
"precipitation_sum": [0.0],
"weathercode": [1]
}
}
Browser showing the raw JSON response from Open-Meteo API — JSON text visible in the browser window
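To convince yourself the response really is structured data and not just text, you can paste it into a short Python snippet and pull out individual values with the standard json module (values here are from the sample response above):

```python
import json

# The (abridged) response shown above, as raw text
response_text = """
{
  "latitude": 28.625,
  "timezone": "Asia/Kolkata",
  "daily": {
    "time": ["2024-01-15"],
    "temperature_2m_max": [22.3],
    "temperature_2m_min": [9.1],
    "precipitation_sum": [0.0],
    "weathercode": [1]
  }
}
"""

data = json.loads(response_text)   # JSON text -> nested Python dicts/lists
daily = data["daily"]              # the "daily" object
print(daily["time"][0])                 # 2024-01-15
print(daily["temperature_2m_max"][0])   # 22.3
print(daily["weathercode"][0])          # 1
```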
Now test for Mumbai:
https://api.open-meteo.com/v1/forecast?latitude=19.0760&longitude=72.8777&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1
Browser showing Mumbai weather JSON response — different temperature values than Delhi, confirming the API returns different data per location
What does weathercode mean? Open-Meteo uses standard WMO weather codes:
0 → Clear sky
1 → Mainly clear
2 → Partly cloudy
3 → Overcast
51 → Light drizzle
61 → Slight rain
71 → Slight snowfall
95 → Thunderstorm
Step 2 — Create HTTP Linked Service for Open-Meteo
In Project 04 we created ls_http_public_data pointing to https://people.sc.fsu.edu. Open-Meteo is a different server with a different base URL — we need a new linked service for it.
In ADF Studio → Manage → Linked services → "+ New"
Search "HTTP" → select HTTP → Continue
New linked service — HTTP selected in the search results
Fill in:
Name: ls_http_openmeteo
Description: Connection to Open-Meteo free weather API
Connect via: AutoResolveIntegrationRuntime
Base URL: https://api.open-meteo.com
Authentication type: Anonymous
HTTP linked service form — name ls_http_openmeteo, Base URL https://api.open-meteo.com, Anonymous authentication
Click "Test connection"
Green 'Connection successful' message for ls_http_openmeteo
Click "Create"
Linked services list — ls_http_openmeteo visible alongside previous linked services
Step 3 — Source Dataset (HTTP → JSON)
In Project 04 we used DelimitedText (CSV) as the format for our HTTP dataset. This time the API returns JSON, so we pick JSON as the format.
Author → Datasets → "+" → "New dataset"
Search "HTTP" → select HTTP → Continue
For format — select "JSON" → Continue
Format selection step — JSON selected, not DelimitedText
Name: ds_src_http_weather_json
Linked service: ls_http_openmeteo
Relative URL: (leave empty — will be dynamic)
Request method: GET
Click "OK"
Click "Parameters" tab → "+ New"
Name: relative_url
Type: String
Dataset Parameters tab — relative_url parameter added
Click "Connection" tab
Relative URL → "Add dynamic content" → click relative_url under Parameters:
@dataset().relative_url
Connection tab — Relative URL showing @dataset().relative_url
Click 💾 Save
Step 4 — Sink Dataset (ADLS → JSON)
Datasets → "+" → "New dataset"
Search "Azure Data Lake Storage Gen2" → select → Continue
For format — select "JSON" → Continue
Format selection — JSON selected for ADLS sink dataset
Name: ds_sink_adls_weather_json
Linked service: ls_adls_freshmart
File path: (leave all empty)
Import schema: None
Click "OK"
Click "Parameters" tab → add TWO parameters:
Parameter 1:
Name: date_folder
Type: String
Parameter 2:
Name: file_name
Type: String
Sink dataset Parameters tab — date_folder and file_name parameters both added
Click "Connection" tab:
Container: raw
Directory → "Add dynamic content":
weather/@{dataset().date_folder}
File → "Add dynamic content":
@dataset().file_name
Sink dataset Connection tab — Container: raw, Directory: weather/@{dataset().date_folder}, File: @dataset().file_name
Click 💾 Save
Step 5 — Create New Pipeline
Author → "+" next to Pipelines → "New pipeline"
Name: pl_ingest_weather_api
Description: Calls Open-Meteo API for each city and saves JSON to ADLS raw/weather/
Blank pipeline canvas — name pl_ingest_weather_api in Properties panel
Step 6 — Add Parameters
Click blank canvas → "Parameters" tab → add parameters:
Parameter 1:
Name: run_date
Type: String
Default: 2024-01-15
Parameter 2:
Name: cities
Type: Array
Default: [
{"name":"delhi","lat":"28.6139","lon":"77.2090"},
{"name":"mumbai","lat":"19.0760","lon":"72.8777"},
{"name":"bangalore","lat":"12.9716","lon":"77.5946"}
]
Pipeline Parameters tab — run_date and cities parameters visible
{"name":"delhi","lat":"28.6139","lon":"77.2090"} means the ForEach gets everything it needs in a single item, and we read each value with dot notation:item().name, item().lat, item().lon.Step 7 — Add Variable + Set Variable Activity
Click blank canvas → "Variables" tab → "+ New"
Name: run_date_folder
Type: String
Variables tab — run_date_folder variable added
From left panel → "General" → drag "Set variable" onto canvas
Click it → bottom panel:
General tab:
Name: set_run_date_folder
Variables tab:
Name: run_date_folder
Value: (click "Add dynamic content")
In the dynamic content editor, type:
date=@{pipeline().parameters.run_date} → date=2024-01-15
Set variable activity — value showing date=@{pipeline().parameters.run_date}
Step 8 — Add ForEach Activity
From left panel → "Iteration & conditionals" → drag "ForEach" onto canvas
Connect: drag green arrow from set_run_date_folder → ForEach
Canvas — set_run_date_folder connected to ForEach with green arrow
Click ForEach → bottom panel:
General tab:
Name: ForEach_cities
Settings tab:
Sequential: ☐ Unchecked (run all 3 cities in parallel)
Batch count: 3
Items: (click "Add dynamic content")
Items value → click cities under Parameters:
@pipeline().parameters.citiesForEach Settings tab — Items showing @pipeline().parameters.cities, batch count 3, Sequential unchecked
Step 9 — Add Copy Activity Inside ForEach
Click the "+" button INSIDE the ForEach box
ForEach activity — '+' button inside highlighted
You are on the inner canvas. Drag "Copy data" from the left panel onto the inner canvas.
General tab:
Name: copy_weather_json_to_adls
Copy data activity on ForEach inner canvas — named copy_weather_json_to_adls
Step 10 — Configure Source
Click "Source" tab
Source dataset: ds_src_http_weather_json
Dataset property relative_url appears → click "Add dynamic content"
Build the full API query URL for each city:
/v1/forecast?latitude=@{item().lat}&longitude=@{item().lon}&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode&timezone=Asia%2FKolkata&forecast_days=1
Dynamic content editor — the full API query URL expression with @{item().lat} and @{item().lon}
Click "OK"
What does this build for each city?
/v1/forecast
?latitude=28.6139
&longitude=77.2090
&daily=temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode
&timezone=Asia%2FKolkata
&forecast_days=1
ADF joins this with the base URL from the linked service:
→ https://api.open-meteo.com/v1/forecast?latitude=28.6139&longitude=77.2090...
That is exactly the Delhi weather URL we tested in the browser! ✅
Next iteration (Mumbai): @{item().lat} → 19.0760, @{item().lon} → 72.8777
→ https://api.open-meteo.com/v1/forecast?latitude=19.0760&longitude=72.8777...
Source tab complete — relative_url dataset property showing the full API query expression
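The same URL assembly can be sketched in Python with urllib.parse.urlencode, which also takes care of percent-encoding the slash in Asia/Kolkata as %2F for you (the helper name is made up; this mirrors, not replaces, the ADF expression):

```python
from urllib.parse import urlencode

def build_relative_url(item: dict) -> str:
    """Hypothetical Python mirror of the ADF relative_url expression."""
    params = {
        "latitude": item["lat"],
        "longitude": item["lon"],
        "daily": "temperature_2m_max,temperature_2m_min,"
                 "precipitation_sum,weathercode",
        "timezone": "Asia/Kolkata",  # urlencode turns the / into %2F
        "forecast_days": 1,
    }
    # safe="," keeps the commas in the daily field list literal
    return "/v1/forecast?" + urlencode(params, safe=",")

delhi = {"name": "delhi", "lat": "28.6139", "lon": "77.2090"}
print(build_relative_url(delhi))
```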
Step 11 — Configure Sink
Click "Sink" tab
Sink dataset: ds_sink_adls_weather_json
Dataset property date_folder → "Add dynamic content" → click run_date_folder under Variables:
@variables('run_date_folder') → date=2024-01-15
Dataset property file_name → "Add dynamic content":
weather_@{item().name}_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.json
What file name does this build?
weather_ → literal "weather_"
@{item().name} → city name → "delhi"
_ → literal "_"
@{formatDateTime(...,'yyyyMMdd')} → "20240115"
.json → literal ".json"
Delhi: weather_delhi_20240115.json ✅
Mumbai: weather_mumbai_20240115.json ✅
Bangalore: weather_bangalore_20240115.json ✅
Sink tab — date_folder showing @variables('run_date_folder'), file_name showing weather_@{item().name}_ expression
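In Python terms, the file-name expression behaves roughly like this (a sketch; the helper function name is made up):

```python
from datetime import datetime

def build_file_name(run_date: str, city_name: str) -> str:
    """Hypothetical mirror of weather_@{item().name}_@{formatDateTime(...)}.json"""
    # formatDateTime(run_date, 'yyyyMMdd') -> reformat 2024-01-15 as 20240115
    yyyymmdd = datetime.strptime(run_date, "%Y-%m-%d").strftime("%Y%m%d")
    return f"weather_{city_name}_{yyyymmdd}.json"

print(build_file_name("2024-01-15", "delhi"))   # weather_delhi_20240115.json
print(build_file_name("2024-01-15", "mumbai"))  # weather_mumbai_20240115.json
```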
Go back to the main canvas using the back arrow.
Main canvas — set_run_date_folder → ForEach_cities, two activities connected with green arrow
Step 12 — Validate and Debug
Click "Validate"
Validation successful — no errors
Click "Debug"
Parameters dialog:
run_date: 2024-01-15
cities: [{"name":"delhi","lat":"28.6139","lon":"77.2090"},{"name":"mumbai","lat":"19.0760","lon":"72.8777"},{"name":"bangalore","lat":"12.9716","lon":"77.5946"}]
Debug dialog — run_date and cities array pre-filled with defaults from the pipeline parameters
Click "OK"
Watch the pipeline:
set_run_date_folder → green ✅
ForEach_cities → 3 iterations run in parallel
Pipeline running — set_run_date_folder green, ForEach running with 3 cities in progress
Pipeline complete — both activities showing green checkmarks
Click the 👓 glasses icon on the ForEach run:
ForEach iteration details — 3 rows for delhi, mumbai, bangalore — all Succeeded
Step 13 — Verify JSON Files in ADLS
Azure Portal → Storage → stfreshmartdev → Containers → raw
You should now see a new weather folder alongside sales and external:
raw/
├── sales/ ← Projects 01–03
├── external/ ← Project 04
└── weather/ ← NEW — Project 06
└── date=2024-01-15/
├── weather_delhi_20240115.json
├── weather_mumbai_20240115.json
└── weather_bangalore_20240115.json
raw container — three folders visible: sales, external, weather
raw/weather/date=2024-01-15/ — three JSON files for Delhi, Mumbai, Bangalore
Click on weather_delhi_20240115.json → click "Edit"
You should see the raw JSON from the API saved exactly as-is:
{
"latitude": 28.625,
"longitude": 77.1875,
"daily": {
"time": ["2024-01-15"],
"temperature_2m_max": [22.3],
"temperature_2m_min": [9.1],
"precipitation_sum": [0.0],
"weathercode": [1]
}
}
weather_delhi_20240115.json file open in Azure Portal — raw JSON content visible
weather_mumbai_20240115.json — different temperature values confirming it's a separate API call
Step 14 — Create Schedule Trigger
On the pipeline canvas → "Add trigger" → "New/Edit" → "+ New"
Name: trigger_weather_6am
Description: Pulls weather data every morning at 6 AM IST before store opening
Type: Schedule
Start date: today
Time zone: India Standard Time
Repeat: Every 1 Day
At time: 06:00
New trigger form — trigger_weather_6am, Schedule type, 06:00 IST, daily repeat
Click "OK"
Trigger Run Parameters dialog:
For run_date:
@formatDateTime(trigger().scheduledTime,'yyyy-MM-dd') → the date the trigger fires — e.g. 2024-01-15
For cities:
[{"name":"delhi","lat":"28.6139","lon":"77.2090"},{"name":"mumbai","lat":"19.0760","lon":"72.8777"},{"name":"bangalore","lat":"12.9716","lon":"77.5946"}]Trigger Run Parameters — run_date showing formatDateTime expression, cities showing the JSON array
Click "OK"
Step 15 — Publish
Click "Publish all"
Items being published:
pl_ingest_weather_api
ds_src_http_weather_json
ds_sink_adls_weather_json
ls_http_openmeteo
trigger_weather_6am
Publish panel — all 5 new items listed
Click "Publish"
Successfully published notification
Check the trigger is active: Monitor → Trigger runs
Monitor Trigger runs — trigger_weather_6am showing Active, next run time showing tomorrow 6:00 AM
What You Built — Summary
- FreshMart had no weather data
- No way to link sales patterns to weather
- Analyst would visit a weather website manually
- Weather API called at 6 AM every morning
- 3 cities, 3 JSON files, landed in ADLS with date folders
- Data team can JOIN sales data with weather data
- Zero manual work — completely automated
The ADLS raw layer now looks like this after Projects 01–06:
raw/
├── sales/ ← store transaction data (Projects 01–03)
│ └── date=YYYY-MM-DD/
│ └── store_STXXX_sales_YYYYMMDD.csv
│
├── external/ ← files downloaded from internet (Project 04)
│ ├── cities/
│ └── grades/
│
└── weather/ ← REST API data (Project 06)
└── date=YYYY-MM-DD/
├── weather_delhi_YYYYMMDD.json
├── weather_mumbai_YYYYMMDD.json
└── weather_bangalore_YYYYMMDD.json
Everything You Created
| Type | Name | What it does |
|---|---|---|
| Linked Service | ls_http_openmeteo | HTTP connection to https://api.open-meteo.com, Anonymous auth |
| Dataset | ds_src_http_weather_json | Source HTTP JSON dataset with relative_url parameter |
| Dataset | ds_sink_adls_weather_json | Sink ADLS JSON dataset with date_folder and file_name parameters |
| Pipeline | pl_ingest_weather_api | Set Variable → ForEach (3 cities) → Copy JSON to ADLS |
| Parameter | run_date (String) | Date to process — controls folder and file names |
| Parameter | cities (Array) | Array of objects with name, lat, lon for each city |
| Variable | run_date_folder (String) | Computed folder name like date=2024-01-15 |
| Trigger | trigger_weather_6am | Fires every morning at 6 AM IST — passes today as run_date |
Key Expressions in This Project
@{item().name} → "delhi", "mumbai", or "bangalore"
@{item().lat} → "28.6139", "19.0760", or "12.9716"
@{item().lon} → "77.2090", "72.8777", or "77.5946"
@formatDateTime(pipeline().parameters.run_date,'yyyyMMdd') → "20240115"
@formatDateTime(trigger().scheduledTime,'yyyy-MM-dd') → "2024-01-15"
🧠 Key Concepts to Remember
| Concept | What It Is | Why It Matters |
|---|---|---|
| REST API | A service you query with HTTP to get structured data | Most real-world data sources are APIs, not files |
| JSON | The format APIs use to return data | You need to read and store JSON in data engineering |
| Query parameters | Values after ? in a URL that customize the API response | Control what data the API returns per city |
| Object in array | {"name":"delhi","lat":"28.61"} — an item with properties | Group related values together in ForEach items |
| @{item().property} | Read one property from the current ForEach object | Access name, lat, lon separately from one item |
| Dot notation | item().name reads the "name" field of the object | Standard way to navigate JSON/objects in ADF |
| JSON dataset format | ADF dataset type for JSON files | Use this instead of DelimitedText for API responses |
| Base URL + relative URL | Split between linked service and dataset | Base URL is the server, relative URL is the endpoint |
⚠️ Common Mistakes in This Project
Mistake: putting the full URL in the linked service Base URL. Fix: the Base URL must be only the domain: https://api.open-meteo.com — everything after it, including /v1/forecast?..., goes in the relative_url dataset property.
Mistake: using @{item()} where a single property is needed. @{item()} gives the whole object: {"name":"delhi","lat":"28.61","lon":"77.20"}. Use @{item().lat} to get just the latitude. Symptom: the URL becomes malformed.
Mistake: invalid JSON in the cities parameter default. Fix: it must be valid JSON — property names in double quotes, no trailing commas. Wrong: [{name:'delhi'}]. Right: [{"name":"delhi","lat":"28.6139","lon":"77.2090"}].
Mistake: leaving the slash in the timezone unencoded. Fix: Asia/Kolkata has a slash that must be URL-encoded as %2F. Wrong: &timezone=Asia/Kolkata. Right: &timezone=Asia%2FKolkata. Symptom: the API returns an error about an invalid timezone.
Mistake: creating the source dataset with DelimitedText instead of JSON. Symptom: ADF tries to parse the JSON response as CSV — output is empty or garbled. Fix: delete the dataset, recreate it, and select the JSON format.
We now have data coming from three sources — store sales CSVs, public file downloads, and a weather REST API. All raw data is landing in ADLS.
But all of it is raw and messy: inconsistent column names, null values, negative amounts, dates stored as text. In Project 07 we connect Azure Databricks for the first time and write a Python notebook to clean, validate, and transform the raw sales data from Bronze into a clean Silver layer. Output is Parquet — the industry standard format for analytics.
First time writing actual Python/PySpark code — every concept explained from zero.
🎯 Key Takeaways
- ✓REST APIs return structured JSON data — you query them with a URL instead of downloading a file
- ✓The cities array uses objects {"name":"delhi","lat":"28.61","lon":"77.20"} so one ForEach item carries all three values
- ✓Use @{item().property} to read individual fields from an object — dot notation works the same as in JavaScript
- ✓Base URL lives in the Linked Service, the relative URL and query parameters go in the dataset
- ✓Always test your API in the browser before building the ADF pipeline
- ✓The trigger uses @formatDateTime(trigger().scheduledTime,'yyyy-MM-dd') to pass today's date automatically