Project 03 – Parameterized Pipeline with Run Date
Build a fully automated pipeline where you pass a date at runtime and ADF constructs the correct file names and folder paths automatically. Add a scheduled trigger and the pipeline runs every night at midnight with zero human involvement.
A pipeline that takes a date as input, builds the correct file names and folder paths automatically, and copies all 10 store files into date-partitioned ADLS folders – triggered automatically every night at midnight.
Real World Problem
Let's be honest about what Projects 01 and 02 did and did not solve:
What they solved:
- Moving one file to the cloud (Project 01)
- Moving multiple files with ForEach (Project 02)

What they did not:
- File names are the same every day
- Someone still presses Debug manually
- Miss a day and that data is gone
- No way to tell which file belongs to which day
Here is what FreshMart's IT team actually needs:
"Every night at 11:30 PM the billing system exports files automatically. The file names include the date – like store_ST001_sales_20240115.csv for January 15th. We need a pipeline to run automatically at midnight, pick up that night's files, and copy them to ADLS – without anyone pressing a button."
This is how every production data pipeline in the real world works.
Concepts You Must Understand First
Why Pass run_date as a Parameter?
This is the most important design decision in this project. Here is what happens without a parameter:
- Pipeline fails on Monday
- You rerun it on Tuesday
- It processes Tuesday's data again
- Monday's data is lost forever
- You cannot reprocess historical dates

And here is what happens with a run_date parameter:
- Pipeline fails on Monday
- You rerun with run_date = "2024-01-15"
- It correctly reprocesses Monday's data
- No data lost. Full control.
- Backfill any past date anytime
The run_date parameter is how you achieve it.

Important ADF Limitation – Parameter Defaults Cannot Be Expressions
@{formatDateTime(utcNow(), 'yyyy-MM-dd')} → ADF treats this as plain text, not an expression, and you get an error. The fix is simple: use a plain static date as the default value, and let the trigger pass the real dynamic date at runtime.
2024-01-15 → a plain static date works perfectly as a default. It is used during Debug; the trigger passes the real date when it fires.
How Do Dynamic Expressions Work with Dates?
ADF has built-in date formatting functions. These are the ones we use in this project:
@pipeline().parameters.run_date → "2024-01-15" (exactly what you passed in)
@formatDateTime(pipeline().parameters.run_date, 'yyyyMMdd') → "20240115" (no dashes – for file names)
@formatDateTime(pipeline().parameters.run_date, 'yyyy-MM-dd') → "2024-01-15" (with dashes – for folder names)
@formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd') → the night the trigger fired, e.g. "2024-01-16"
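Outside ADF, the same two format patterns map to Python's strftime codes. A quick sketch, for illustration only (Python is not part of the pipeline):

```python
from datetime import date

# Stand-in for @pipeline().parameters.run_date
run_date = date(2024, 1, 15)

# ADF 'yyyyMMdd'   ->  strftime '%Y%m%d'   (no dashes, for file names)
file_part = run_date.strftime("%Y%m%d")

# ADF 'yyyy-MM-dd' ->  strftime '%Y-%m-%d' (with dashes, for folder names)
folder_part = run_date.strftime("%Y-%m-%d")

print(file_part)    # 20240115
print(folder_part)  # 2024-01-15
```

Note that ADF's patterns are .NET-style: 'MM' is months and 'mm' would be minutes, so mixing them up silently produces wrong names.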
The @{ } inside a text string is called string interpolation. ADF evaluates what is inside the braces and inserts the result:
store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv → produces store_ST001_sales_20240115.csv (when item() = ST001 and run_date = 2024-01-15)
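If you think of @{ } as an f-string, the behavior becomes familiar. A rough Python analogue (illustrative only; the real evaluation happens inside ADF):

```python
from datetime import datetime

run_date = "2024-01-15"  # pipeline parameter arrives as yyyy-MM-dd text
item = "ST001"           # the current ForEach item

# Equivalent of formatDateTime(pipeline().parameters.run_date, 'yyyyMMdd')
yyyymmdd = datetime.strptime(run_date, "%Y-%m-%d").strftime("%Y%m%d")

# The f-string plays the role of ADF's @{ } string interpolation
file_name = f"store_{item}_sales_{yyyymmdd}.csv"
print(file_name)  # store_ST001_sales_20240115.csv
```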
Hive-Style Partitioning – The Folder Structure We Are Building
Good data engineers organize ADLS by date. This makes it easy to find any day's data and lets analytics tools skip folders they do not need β dramatically faster and cheaper to query.
raw/sales/
├── date=2024-01-15/
│   ├── store_ST001_sales_20240115.csv
│   └── ... (10 files)
├── date=2024-01-16/
│   └── ... (10 files)
└── date=2024-01-17/
    └── ... (10 files)
The date=YYYY-MM-DD folder naming convention is the industry standard, called Hive-style partitioning. Databricks, Synapse, and Athena all understand it natively.
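The payoff can be sketched in a few lines: a reader that only opens the one partition folder it needs. This is a local-filesystem stand-in for what Databricks or Synapse do against ADLS (the function name is ours, not an Azure API):

```python
from pathlib import Path

def read_partition(root: Path, run_date: str) -> list[Path]:
    """Return the CSVs for one date only -- other partitions are never scanned.

    Assumes the Hive-style layout built in this project:
    root/date=YYYY-MM-DD/*.csv
    """
    return sorted((root / f"date={run_date}").glob("*.csv"))

# A query for Jan 15 touches one folder, no matter how many dates exist:
# read_partition(Path("raw/sales"), "2024-01-15")
```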
What is a Trigger?
Schedule trigger – runs on a fixed schedule: every day at midnight, every hour, every Monday. The most common type, and what we use in this project.
Tumbling window trigger – like a schedule trigger but with built-in backfill. If the pipeline was down for 3 days, it automatically queues 3 missed runs.
Storage event trigger – fires when a file arrives in ADLS. "As soon as a new file lands, start the pipeline." We use this in Project 05.
Step 1 – Create Date-Based CSV Files
This time the file names include the date: store_ST001_sales_20240115.csv. We create files for two dates (January 15 and 16) so we can test backfill – running the pipeline for different dates without changing anything.
On your Desktop create a folder called freshmart_dated_files. Inside it create two subfolders: 20240115 and 20240116.
Desktop folder 'freshmart_dated_files' – showing two subfolders: 20240115 and 20240116
Inside 20240115, create all 10 files. Here are the first two as templates. Follow the same pattern for stores ST003–ST010.
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD1001,ST001,Basmati Rice 5kg,Grocery,12,299.00,2024-01-15
ORD1002,ST001,Samsung TV 43inch,Electronics,2,32000.00,2024-01-15
ORD1003,ST001,Amul Butter 500g,Dairy,25,240.00,2024-01-15
ORD1004,ST001,Colgate Toothpaste,Personal Care,30,89.00,2024-01-15
ORD1005,ST001,Nike Running Shoes,Apparel,5,4500.00,2024-01-15

order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD2001,ST002,Sunflower Oil 1L,Grocery,18,145.00,2024-01-15
ORD2002,ST002,iPhone 14,Electronics,1,75000.00,2024-01-15
ORD2003,ST002,Amul Milk 1L,Dairy,40,62.00,2024-01-15
ORD2004,ST002,Dove Soap 100g,Personal Care,50,65.00,2024-01-15
ORD2005,ST002,Levis Jeans,Apparel,8,2999.00,2024-01-15
Create stores ST003–ST010 with the same column structure, using their store IDs and order_date = 2024-01-15. Then for the 20240116 folder, duplicate all 10 files, changing only the date in the file name, the order IDs, and the order_date column to 2024-01-16.
Inside the 20240115 folder – all 10 store CSV files with dates in their names
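Hand-writing 20 CSVs is tedious, so here is an optional Python sketch that generates them with placeholder product rows (the rows are illustrative; swap in the real templates above if you want exact data):

```python
import csv
from pathlib import Path

HEADER = ["order_id", "store_id", "product_name", "category",
          "quantity", "unit_price", "order_date"]

def make_files(root: Path, yyyymmdd: str, order_date: str) -> None:
    """Write 10 store files for one date into root/<yyyymmdd>/."""
    folder = root / yyyymmdd
    folder.mkdir(parents=True, exist_ok=True)
    for n in range(1, 11):
        store = f"ST{n:03d}"
        with (folder / f"store_{store}_sales_{yyyymmdd}.csv").open(
                "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(HEADER)
            for i in range(1, 6):  # 5 placeholder orders per store
                w.writerow([f"ORD{n}{i:03d}", store, f"Product {i}",
                            "Grocery", 10, 99.00, order_date])

root = Path.home() / "Desktop" / "freshmart_dated_files"
make_files(root, "20240115", "2024-01-15")
make_files(root, "20240116", "2024-01-16")
```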
Step 2 – Upload Files to Landing Container
Go to Azure Portal → stfreshmartdev → Containers → landing → click the store_sales folder.
Click "+ Add Directory" → name it exactly: date=2024-01-15
The date= prefix is the Hive partition convention. Keep it exactly like this in both landing and raw containers so the folder structure mirrors across both sides.
Add Directory dialog – 'date=2024-01-15' typed in
Click into date=2024-01-15 → "Upload" → select all 10 files from your 20240115 local folder → "Upload".
landing/store_sales/date=2024-01-15/ – all 10 dated CSV files uploaded
Go back to store_sales → create another directory: date=2024-01-16 → upload all 10 files from your 20240116 local folder.
landing/store_sales/ – showing two date folders: date=2024-01-15 and date=2024-01-16
The source and sink datasets we build next need two parameters each – one for the date folder, one for the file name. Both values will be passed from the pipeline at runtime.
Step 3 – Create Source Dataset With Two Parameters
In ADF Studio → Author → Datasets → "+" → "New dataset" → "Azure Blob Storage" → "Continue" → "DelimitedText" → "Continue". Name it ds_src_blob_dated_store_sales.
Click "OK" → click the "Parameters" tab → "+ New" → add BOTH parameters: run_date_folder (String) and file_name (String).
Dataset Parameters tab – both run_date_folder and file_name parameters listed
Click the "Connection" tab. Set the three path fields, starting with Container: landing.
For the Directory field: click "Add dynamic content" → in the editor, type the full expression: store_sales/@{dataset().run_date_folder} → click "OK".
For the File field: click "Add dynamic content" → under Parameters → click file_name → click "OK".
Connection tab fully configured – container 'landing', directory with dynamic expression, file with @dataset().file_name
Click 💾 Save.
Step 4 – Create Sink Dataset With Two Parameters
Click "+" next to Datasets → "Azure Data Lake Storage Gen2" → "DelimitedText". Name it ds_sink_adls_dated_sales.
Click "OK" → Parameters tab → add the same two parameters: run_date_folder (String) and file_name (String).
Sink dataset Parameters tab – run_date_folder and file_name parameters added
Click the Connection tab and set: Container: raw, Directory: sales/@{dataset().run_date_folder}, File: @{dataset().file_name}.
Sink dataset Connection tab – raw/sales/@{dataset().run_date_folder} for directory, @dataset().file_name for file
Click 💾 Save.
Step 5 – Create New Pipeline
In ADF Studio → Author → "+" next to Pipelines → "New pipeline". Name it pl_copy_store_sales_by_date in the Properties panel.
New blank pipeline canvas – name 'pl_copy_store_sales_by_date' in Properties panel
Step 6 – Add the run_date Parameter
Click on empty canvas → Parameters tab at the bottom → "+ New". Name: run_date, Type: String, Default value: 2024-01-15.
The default is 2024-01-15 – not @{formatDateTime(...)}. The trigger will pass the real dynamic date at runtime; the static default is just for when you manually Debug.
run_date parameter – default value showing plain '2024-01-15' with no expression syntax
Step 7 – Add the store_ids Array Parameter
Still in the Parameters tab → "+ New". Name: store_ids, Type: Array, Default value: ["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"].
Notice: in Project 02 the array stored full file names like store_ST001_sales.csv. Now we store just the store ID, like ST001. The pipeline builds the full file name using run_date. This means the array never needs to change – even as dates change every night.
Pipeline Parameters tab – both run_date (String) and store_ids (Array) parameters visible
Step 8 – Add a Pipeline Variable
We need a variable to hold the computed folder name date=2024-01-15. Computing it once in a variable means we can use it in multiple places without repeating the expression.
Click empty canvas → Variables tab → "+ New". Name: run_date_folder, Type: String.
Variables tab – run_date_folder variable of type String added
Step 9 – Add a Set Variable Activity
This activity runs first. It takes run_date (e.g. 2024-01-15) and stores date=2024-01-15 in the variable. Every other activity then reads this variable instead of re-computing it.
Left panel → expand "General" → drag "Set variable" onto the canvas.
Set variable activity placed on the main canvas
Click the Set variable activity → configure:
General Tab: Name it set_run_date_folder.
Variables Tab (inside the activity)
Click the Variables tab in the bottom properties panel (this is the activity's configuration, not the pipeline's Variables tab). Select run_date_folder as the variable name.
Click "Add dynamic content" for the Value field → type this expression in the editor:
date=@{pipeline().parameters.run_date} → produces date=2024-01-15 (when run_date is 2024-01-15)
This works because run_date already comes in as yyyy-MM-dd format; we just prepend date= to it. Simple and clean.
Set variable activity Variables tab – name 'run_date_folder', value showing date=@{pipeline().parameters.run_date}
Step 10 – Add ForEach and Connect It to Set Variable
Left panel → "Iteration & conditionals" → drag "ForEach" onto the canvas.
Now connect the two activities: hover over set_run_date_folder → drag the green arrow on its right edge → drop it onto the ForEach. This forces Set Variable to finish before ForEach starts.
Canvas – set_run_date_folder connected to ForEach_store_ids with a green arrow showing the execution order
Click the ForEach activity → configure:
General Tab: Name it ForEach_store_ids.
Settings Tab: Sequential unchecked, Batch count 4, Items: @pipeline().parameters.store_ids (via "Add dynamic content").
ForEach Settings tab – Sequential off, Batch count 4, Items showing @pipeline().parameters.store_ids
Step 11 – Add Copy Activity Inside ForEach
Click the "+" button inside the ForEach box → from the inner canvas left panel → drag "Copy data".
Copy data activity placed inside the ForEach inner canvas
Step 12 – Configure Source With Date Expressions
Click the Copy activity → bottom panel:
General Tab: Name it copy_dated_store_file.
Source Tab
Select ds_src_blob_dated_store_sales. Two Dataset properties fields appear.
Source tab – ds_src_blob_dated_store_sales selected, Dataset properties showing run_date_folder and file_name fields
For run_date_folder: click "Add dynamic content" → under Variables → click run_date_folder.
@variables('run_date_folder') → produces date=2024-01-15
Dynamic content editor – @variables('run_date_folder') expression with run_date_folder visible under Variables section
For file_name: click "Add dynamic content" → type this expression in the editor:
store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv → produces store_ST001_sales_20240115.csv (when item() = ST001 and run_date = 2024-01-15)
How the file name expression builds the value:
- store_ → "store_" (literal text)
- @{item()} → "ST001" (the current store ID from the ForEach loop)
- _sales_ → "_sales_" (literal text)
- @{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')} → "20240115" (date without dashes)
- .csv → ".csv" (literal text)

Source tab fully configured – run_date_folder showing the @variables expression, file_name showing the full dynamic file name expression
Step 13 – Configure Sink
Click the Sink tab → select ds_sink_adls_dated_sales. Two Dataset properties appear; fill them with the exact same expressions as the source.
Sink tab fully configured – same expressions as source, writing to raw/sales/date=2024-01-15/
Click the back arrow to return to the main pipeline canvas.
Main canvas – set_run_date_folder → ForEach_store_ids connected in sequence
Step 14 – Validate and Debug (Run for Jan 15)
Click "Validate" → should show no errors. Then click "Debug".
Validation successful – no errors found
The parameter dialog appears with the defaults pre-filled. Leave run_date as 2024-01-15 and click "OK".
Debug dialog – run_date = 2024-01-15, store_ids array pre-filled
Watch the canvas: Set Variable completes first (green), then ForEach starts and runs 10 iterations, 4 at a time.
Pipeline running – set_run_date_folder green, ForEach running with progress indicator
All completed – both activities showing green checkmarks
Verify in ADLS: Azure Portal → stfreshmartdev → Containers → raw → sales.
raw/sales/date=2024-01-15/ – all 10 dated files visible with correct names and timestamps
Step 15 – Test Backfill (Run for Jan 16 Without Changing Anything)
This is where parameters prove their value. Click "Debug" again and change only run_date:
Debug dialog – run_date changed to 2024-01-16, everything else the same
Click "OK". Check ADLS: you now have two date partitions.
raw/sales/ – showing BOTH date=2024-01-15 and date=2024-01-16 folders side by side
This is backfill. If a pipeline fails on any day, rerun it with that date; it fills the missing data without touching any other day's folder.
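The same idea scales to a multi-day outage: loop over the gap and rerun once per date. A small sketch of generating the run_date values (each one would go into the Debug dialog, or into a scripted rerun via the Azure CLI or SDK):

```python
from datetime import date, timedelta

def backfill_dates(start: date, end: date):
    """Yield every run_date string (yyyy-MM-dd) from start to end, inclusive."""
    d = start
    while d <= end:
        yield d.strftime("%Y-%m-%d")
        d += timedelta(days=1)

# Pipeline was down Jan 15-17 -> three reruns, each filling one partition:
for rd in backfill_dates(date(2024, 1, 15), date(2024, 1, 17)):
    print(rd)  # pass as the pipeline's run_date parameter
```

Because each run writes only into its own date=YYYY-MM-DD folder, reruns are idempotent: running the same date twice just overwrites that one partition.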
Step 16 – Create the Schedule Trigger
Go back to the main pipeline canvas → click "Add trigger" in the top toolbar → "New/Edit" → "+ New".
Top toolbar – 'Add trigger' button highlighted, dropdown showing '+ New'
The New trigger panel opens. Fill in: Name: trigger_daily_midnight, Type: Schedule, Recurrence: every 1 day starting at 00:00, Time zone: India Standard Time.
New trigger panel – name, type Schedule, recurrence set to daily at 00:00 IST filled in
Click "OK".
Step 17 – Set What the Trigger Passes to the Pipeline
After clicking OK, a "Trigger Run Parameters" dialog appears. This is where you tell the trigger what to send as run_date and store_ids each night.
Trigger Run Parameters dialog – run_date and store_ids fields to fill
For run_date: Click "Add dynamic content" and type this expression:
@{formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd')} → produces "2024-01-16" (the date the trigger was scheduled to fire)
Why trigger().scheduledTime and not utcNow()?
trigger().scheduledTime is the time ADF scheduled this trigger to fire – always exactly midnight on the right date. utcNow() is the actual clock time when the pipeline runs, which could be 12:00:03 AM – and in UTC that might be a different date than your local time. Always use trigger().scheduledTime in trigger parameters.
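You can demonstrate the off-by-one-day hazard with Python's zoneinfo: midnight IST is still the previous evening in UTC, so a UTC clock read at trigger time lands on the wrong date.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The trigger is scheduled for midnight IST on Jan 16
scheduled = datetime(2024, 1, 16, 0, 0, tzinfo=ZoneInfo("Asia/Kolkata"))

# What a UTC clock (like utcNow()) sees at that same instant: 18:30 the day before
same_instant_utc = scheduled.astimezone(ZoneInfo("UTC"))

print(scheduled.strftime("%Y-%m-%d"))         # 2024-01-16  <- the run you want
print(same_instant_utc.strftime("%Y-%m-%d"))  # 2024-01-15  <- off by one day
```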
Trigger scheduled for 2024-01-16 00:00 IST
→ trigger().scheduledTime = 2024-01-16T00:00:00
→ formatDateTime result = "2024-01-16" ✅ always correct

Trigger Run Parameters – run_date showing the @{formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')} expression
For store_ids: type the array directly: ["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"]
Trigger Run Parameters fully filled – both run_date expression and store_ids array
Click "OK".
Step 18 – Publish Everything
Click "Publish all". The panel shows all 4 new items → click "Publish".
Publish panel – showing pipeline, 2 datasets, and trigger all listed
Successfully published – notification in top right corner
Step 19 – Manually Trigger a Run Right Now
You do not need to wait until midnight to test the trigger. On the pipeline canvas → "Add trigger" → "Trigger now".
'Trigger now' option in the Add trigger dropdown
In the Run Parameters dialog, enter a date you have files for:
Trigger now dialog – run_date and store_ids filled in
Click "OK" → go to Monitor → Pipeline runs to watch it execute.
Monitor → Pipeline runs – pl_copy_store_sales_by_date showing In Progress
Pipeline run completed – status Succeeded, run_date visible in parameters, duration shown
Step 20 – View the Trigger in Monitor
Click Monitor → Trigger runs in the left submenu.
Monitor → Trigger runs – trigger_daily_midnight listed with its next scheduled run time and Active status
The trigger is now live. Every night at midnight IST it fires automatically, passes today's date as run_date, copies all 10 store files into raw/sales/date=YYYY-MM-DD/, and nobody needs to press anything.
Before and After
Before (Projects 01–02):
- Ran only when you pressed Debug
- File names were static – same every day
- No way to reprocess a past date
- No date organization in ADLS

After (Project 03):
- Triggers automatically every night at midnight
- File names built from the run_date parameter
- Backfill any past date anytime
- ADLS organized into date=YYYY-MM-DD/ partitions
All Expressions Used in This Project
| Expression | Where Used |
|---|---|
| 2024-01-15 | run_date parameter default (plain static – no expression allowed here) |
| date=@{pipeline().parameters.run_date} | Set Variable activity – builds the folder name |
| @pipeline().parameters.store_ids | ForEach Items – the list to loop through |
| @variables('run_date_folder') | Dataset property – passes the folder to the dataset |
| store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv | Dataset property – builds the full file name |
| store_sales/@{dataset().run_date_folder} | Source dataset Directory field |
| sales/@{dataset().run_date_folder} | Sink dataset Directory field |
| @{formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')} | Trigger parameter – passes the correct date nightly |
What Was Added in Project 03
| Item | Name | What It Does |
|---|---|---|
| Dataset | ds_src_blob_dated_store_sales | Source with 2 parameters: run_date_folder + file_name |
| Dataset | ds_sink_adls_dated_sales | Sink with 2 parameters: run_date_folder + file_name |
| Pipeline | pl_copy_store_sales_by_date | Set Variable → ForEach → Copy, driven by run_date |
| Parameter | run_date (String) | Date to process – controls file names and folder |
| Parameter | store_ids (Array) | List of store IDs to loop through |
| Variable | run_date_folder (String) | Computed folder name like date=2024-01-15 |
| Activity | set_run_date_folder | Set Variable – builds the date= folder name |
| Activity | ForEach_store_ids | Loops through store IDs |
| Activity | copy_dated_store_file | Copies one store file per iteration |
| Trigger | trigger_daily_midnight | Fires every night at midnight, passes today as run_date |
Key Concepts Reference
| Concept | What It Is | Why It Matters |
|---|---|---|
| run_date parameter | Date passed into the pipeline from outside | Enables backfill, reprocessing, and idempotency |
| Idempotency | Running the same date twice gives the same result | Production pipelines must be safe to rerun |
| formatDateTime() | ADF function that formats a date into a string | Builds file names and folder paths from dates |
| String interpolation | Embedding @{expressions} inside a text string | Build dynamic strings like file names |
| Set Variable activity | Computes and stores a value during the pipeline run | Avoids repeating the same expression everywhere |
| @variables('name') | Reads a variable value you set earlier | Use one computed value in multiple places |
| trigger().scheduledTime | The time the trigger was scheduled to fire | Safe, predictable way to get the date for a run |
| Hive-style partitioning | Folder naming like date=YYYY-MM-DD | Industry standard – analytics tools scan only what they need |
| Schedule trigger | Runs a pipeline on a fixed schedule | Automates nightly runs with zero human involvement |
| Backfill | Running the pipeline for a past date | Fix failed runs without affecting other dates |
Common Mistakes
Using an expression as a parameter default value
Fix: Parameter defaults must be plain static text – write 2024-01-15, not @{formatDateTime(...)}
Using utcNow() in trigger parameters instead of trigger().scheduledTime
Fix: scheduledTime is always the correct scheduled date. utcNow() can be a different date due to timezone offset.
Wrong date format in formatDateTime
Fix: Use 'yyyyMMdd' (no dashes) for file names. Use 'yyyy-MM-dd' (with dashes) for folder names and run_date.
Not connecting Set Variable → ForEach with an arrow
Fix: Without the arrow they run in parallel. ForEach starts before the variable is set – folder name is empty.
Trigger created but never fires – forgot to publish
Fix: Always Publish all after adding or changing a trigger. Triggers only activate after publishing.
So far we have only worked with files you manually uploaded to Blob Storage. In the real world, data often lives on public internet URLs β government portals, supplier servers, weather APIs, open datasets.
In Project 04 you will build a pipeline that downloads a CSV file directly from a public HTTPS URL – no manual upload needed. ADF fetches the file from the internet and drops it straight into ADLS. Same FreshMart scenario. Zero manual work.
🎯 Key Takeaways
- ✅ Pipeline parameter defaults must be plain static text – expressions like @{formatDateTime(...)} are not allowed there
- ✅ run_date as a parameter enables idempotency – rerun any past date safely without affecting other dates
- ✅ Set Variable activity runs before ForEach – always connect them with an arrow to enforce the order
- ✅ @variables('run_date_folder') reads the computed folder name – one computation, used everywhere
- ✅ String interpolation: store_@{item()}_sales_@{formatDateTime(run_date,'yyyyMMdd')}.csv builds file names at runtime
- ✅ trigger().scheduledTime is the safe way to get the date in trigger parameters – not utcNow()
- ✅ Hive-style partitioning (date=YYYY-MM-DD) is the industry standard – analytics tools understand it natively
- ✅ After publishing the trigger, use "Trigger now" to test immediately without waiting for midnight