
Project 03 β€” Parameterized Pipeline with Run Date

Build a fully automated pipeline where you pass a date at runtime and ADF constructs the correct file names and folder paths automatically. Add a scheduled trigger and the pipeline runs every night at midnight with zero human involvement.

75–90 min · March 2026
Series: Azure Data Engineering β€” Zero to Advanced
Project: 03 of 25
Level: Beginner
Builds on: Project 01 + 02 β€” same resources
Time: 75–90 minutes
What you will build

A pipeline that takes a date as input, builds the correct file names and folder paths automatically, and copies all 10 store files into date-partitioned ADLS folders β€” triggered automatically every night at midnight.

Real World Problem

Let's be honest about what Projects 01 and 02 did and did not solve:

βœ… Already Solved
  • Moving one file to the cloud
  • Moving multiple files with ForEach
❌ Still Not Solved
  • Files are named the same every day
  • Someone still presses Debug manually
  • Miss a day and that data is gone
  • No way to tell which file belongs to which day

Here is what FreshMart's IT team actually needs:

"Every night at 11:30 PM the billing system exports files automatically. The file names include the date β€” like store_ST001_sales_20240115.csv for January 15th. We need a pipeline to run automatically at midnight, pick up that night's files, and copy them to ADLS β€” without anyone pressing a button."

This is how every production data pipeline in the real world works:

11:30 PM · Billing system exports files with today's date in the name
12:00 AM · ADF pipeline triggers automatically
12:00 AM · Pipeline reads today's date, builds the correct file names
12:01 AM · All 10 store files copied to ADLS
12:05 AM · Data team wakes up to fresh data. Nobody pressed anything.

Concepts You Must Understand First

Why Pass run_date as a Parameter?

The most important design decision in this project. Here is the problem with not using a parameter:

❌ Hardcode today's date inside pipeline
  • Pipeline fails on Monday
  • You rerun it on Tuesday
  • It processes Tuesday's data again
  • Monday's data is lost forever
  • Cannot reprocess historical dates
βœ… Pass run_date as a parameter
  • Pipeline fails on Monday
  • You rerun with run_date = "2024-01-15"
  • It correctly reprocesses Monday's data
  • No data lost. Full control.
  • Backfill any past date anytime
πŸ’‘ Idempotency β€” The Professional Standard
When a pipeline can be run multiple times for the same date and always produce the same correct result β€” that is called idempotency. Every production data pipeline you build from here should be idempotent. The run_date parameter is how you achieve it.
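The idea can be sketched outside ADF with a hypothetical local copy job (the function and paths are invented for illustration, not part of ADF): because the target partition is derived from `run_date`, rerunning the same date overwrites the same folder and produces the same result.

```python
from pathlib import Path

def copy_store_files(run_date: str, store_ids: list[str],
                     src: Path, dst: Path) -> list[str]:
    """Copy one day's store files into a date partition.

    Rerunning for the same run_date overwrites the same partition,
    so the outcome is identical however many times it runs. That
    property is idempotency.
    """
    compact = run_date.replace("-", "")        # 2024-01-15 -> 20240115
    partition = dst / f"date={run_date}"       # Hive-style folder
    partition.mkdir(parents=True, exist_ok=True)
    written = []
    for store_id in store_ids:
        name = f"store_{store_id}_sales_{compact}.csv"
        (partition / name).write_bytes((src / f"date={run_date}" / name).read_bytes())
        written.append(name)
    return sorted(written)
```

Running it twice for `2024-01-15` leaves exactly the same files in `date=2024-01-15/`; no duplicates, no drift.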

Important ADF Limitation β€” Parameter Defaults Cannot Be Expressions

⚠️ Read This Before You Build
This catches many people off guard. Pipeline parameter default values in ADF must be plain static text — ADF does not evaluate expressions there. This will NOT work as a default:
@{formatDateTime(utcNow(), 'yyyy-MM-dd')} ← ADF treats this as plain text, not an expression. You get an error.

The fix is simple: use a plain static date as the default value, and let the trigger pass the real dynamic date at runtime.

2024-01-15 ← plain static date. Works perfectly as a default.

Used during Debug. The trigger passes the real date when it fires.

How Do Dynamic Expressions Work with Dates?

ADF has built-in date formatting functions. These are the ones we use in this project:

Read the run_date parameter
@pipeline().parameters.run_date

β†’ "2024-01-15" (exactly what you passed in)

Date formatted for file names
@formatDateTime(pipeline().parameters.run_date, 'yyyyMMdd')

β†’ "20240115" (no dashes β€” for file names)

Date formatted for folder names
@formatDateTime(pipeline().parameters.run_date, 'yyyy-MM-dd')

β†’ "2024-01-15" (with dashes β€” for folder names)

Used inside trigger parameters
@formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd')

β†’ The night the trigger fired β€” e.g. "2024-01-16"

The @{ } inside a text string is called string interpolation. ADF evaluates what is inside the braces and inserts the result:

How a file name gets built
store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv

β†’ produces: store_ST001_sales_20240115.csv (when item()=ST001, run_date=2024-01-15)
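As a mental model, the same interpolation can be written in Python. This is a sketch of the logic, not ADF itself; `build_file_name` is a made-up helper name.

```python
from datetime import date

def build_file_name(store_id: str, run_date: str) -> str:
    # Mirrors the ADF expression:
    # store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
    compact = date.fromisoformat(run_date).strftime("%Y%m%d")  # like ADF 'yyyyMMdd'
    return f"store_{store_id}_sales_{compact}.csv"

print(build_file_name("ST001", "2024-01-15"))  # store_ST001_sales_20240115.csv
```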

Hive-Style Partitioning β€” The Folder Structure We Are Building

Good data engineers organize ADLS by date. This makes it easy to find any day's data and lets analytics tools skip folders they do not need β€” dramatically faster and cheaper to query.

raw/sales/
├── date=2024-01-15/
│   ├── store_ST001_sales_20240115.csv
│   └── ... (10 files)
├── date=2024-01-16/
│   └── ... (10 files)
└── date=2024-01-17/
    └── ... (10 files)

The date=YYYY-MM-DD folder naming convention is the industry standard β€” called Hive-style partitioning. Databricks, Synapse, and Athena all understand it natively.
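To see why this speeds up queries, here is a toy Python sketch of partition pruning over Hive-style paths (the function names are illustrative; real engines do this internally):

```python
def partition_path(container: str, run_date: str) -> str:
    # The Hive convention: key=value in the folder name
    return f"{container}/sales/date={run_date}/"

def prune(partitions: list[str], wanted_date: str) -> list[str]:
    # An engine that understands Hive partitioning reads only the
    # folders whose name matches the filter, skipping all the rest
    return [p for p in partitions if f"date={wanted_date}" in p]

days = ["2024-01-15", "2024-01-16", "2024-01-17"]
parts = [partition_path("raw", d) for d in days]
print(prune(parts, "2024-01-16"))  # ['raw/sales/date=2024-01-16/']
```

A query for one day touches one folder out of three; with a year of data, one folder out of 365.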

What is a Trigger?

Schedule Trigger

Runs on a fixed schedule β€” every day at midnight, every hour, every Monday. Most common type. What we use in this project.

Tumbling Window Trigger

Like a schedule trigger but with built-in backfill. If the pipeline was down for 3 days, it automatically queues 3 missed runs.

Event Trigger

Fires when a file arrives in ADLS. "As soon as a new file lands, start the pipeline." We use this in Project 05.

What we are building

Source: landing/store_sales/date=2024-01-15/
  store_ST001_sales_20240115.csv
  store_ST002_sales_20240115.csv
  ... 8 more

    → run_date parameter →

Sink: raw/sales/date=2024-01-15/
  store_ST001_sales_20240115.csv
  store_ST002_sales_20240115.csv
  ... 8 more

Pipeline: pl_copy_store_sales_by_date · Trigger: every night at midnight · Passes today as run_date automatically
PHASE 1 β€” PREPARE DATA

Step 1 β€” Create Date-Based CSV Files

This time the file names include the date: store_ST001_sales_20240115.csv. We create files for two dates (January 15 and 16) so we can test backfill β€” running the pipeline for different dates without changing anything.

On your Desktop create a folder called freshmart_dated_files. Inside it create two subfolders: 20240115 and 20240116.

πŸ“ΈSCREENSHOT

Desktop folder 'freshmart_dated_files' β€” showing two subfolders: 20240115 and 20240116

Inside 20240115 β€” create all 10 files. Here are the first two as templates. Follow the same pattern for stores ST003–ST010.

store_ST001_sales_20240115.csv
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD1001,ST001,Basmati Rice 5kg,Grocery,12,299.00,2024-01-15
ORD1002,ST001,Samsung TV 43inch,Electronics,2,32000.00,2024-01-15
ORD1003,ST001,Amul Butter 500g,Dairy,25,240.00,2024-01-15
ORD1004,ST001,Colgate Toothpaste,Personal Care,30,89.00,2024-01-15
ORD1005,ST001,Nike Running Shoes,Apparel,5,4500.00,2024-01-15
store_ST002_sales_20240115.csv
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD2001,ST002,Sunflower Oil 1L,Grocery,18,145.00,2024-01-15
ORD2002,ST002,iPhone 14,Electronics,1,75000.00,2024-01-15
ORD2003,ST002,Amul Milk 1L,Dairy,40,62.00,2024-01-15
ORD2004,ST002,Dove Soap 100g,Personal Care,50,65.00,2024-01-15
ORD2005,ST002,Levis Jeans,Apparel,8,2999.00,2024-01-15

Create stores ST003–ST010 with the same column structure, using their store IDs and order_date = 2024-01-15. Then for the 20240116 folder, duplicate all 10 files changing only the date in the file name, order IDs, and order_date column to 2024-01-16.
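Creating 20 files by hand is tedious. If you prefer, a short Python helper can generate the skeletons; this is an optional convenience, and the single sample row is a placeholder to replace with realistic data as described above.

```python
import csv
from pathlib import Path

HEADER = ["order_id", "store_id", "product_name", "category",
          "quantity", "unit_price", "order_date"]

def generate_day(root: Path, run_date: str) -> list[Path]:
    """Create 10 skeleton store files for one date under root/yyyymmdd/."""
    compact = run_date.replace("-", "")
    folder = root / compact                    # e.g. freshmart_dated_files/20240115
    folder.mkdir(parents=True, exist_ok=True)
    files = []
    for n in range(1, 11):
        sid = f"ST{n:03d}"                     # ST001 .. ST010
        path = folder / f"store_{sid}_sales_{compact}.csv"
        with path.open("w", newline="") as f:
            w = csv.writer(f)
            w.writerow(HEADER)
            # Placeholder row; replace with your own sample data
            w.writerow([f"ORD{n}001", sid, "Basmati Rice 5kg", "Grocery",
                        12, 299.00, run_date])
        files.append(path)
    return files

# generate_day(Path.home() / "Desktop" / "freshmart_dated_files", "2024-01-15")
# generate_day(Path.home() / "Desktop" / "freshmart_dated_files", "2024-01-16")
```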

πŸ“ΈSCREENSHOT

Inside the 20240115 folder β€” all 10 store CSV files with dates in their names

Step 2 β€” Upload Files to Landing Container

Go to Azure Portal β†’ stfreshmartdev β†’ Containers β†’ landing β†’ click the store_sales folder.

Click "+ Add Directory" β†’ name it exactly: date=2024-01-15

🎯 Why This Exact Name?
The date= prefix is the Hive partition convention. Keep it exactly like this in both landing and raw containers so the folder structure mirrors across both sides.
πŸ“ΈSCREENSHOT

Add Directory dialog β€” 'date=2024-01-15' typed in

Click into date=2024-01-15 β†’ "Upload" β†’ select all 10 files from your 20240115 local folder β†’ "Upload".

πŸ“ΈSCREENSHOT

landing/store_sales/date=2024-01-15/ β€” all 10 dated CSV files uploaded

Go back to store_sales β†’ create another directory: date=2024-01-16 β†’ upload all 10 files from your 20240116 local folder.

πŸ“ΈSCREENSHOT

landing/store_sales/ β€” showing two date folders: date=2024-01-15 and date=2024-01-16

PHASE 2 β€” CREATE PARAMETERIZED DATASETS

These datasets need two parameters each β€” one for the date folder, one for the file name. Both values will be passed from the pipeline at runtime.

Step 3 β€” Create Source Dataset With Two Parameters

In ADF Studio β†’ Author β†’ Datasets β†’ "+" β†’ "New dataset" β†’ "Azure Blob Storage" β†’ "Continue" β†’ "DelimitedText" β†’ "Continue".

Name: ds_src_blob_dated_store_sales
Linked service: ls_blob_freshmart_landing
File path: leave ALL fields empty
First row as header: ✅ Yes
Import schema: None
Click "OK" β†’ click the "Parameters" tab β†’ "+ New" β€” add BOTH parameters:

Parameter 1
  Name: run_date_folder
  Type: String
Parameter 2
  Name: file_name
  Type: String
TypeString
πŸ“ΈSCREENSHOT

Dataset Parameters tab β€” both run_date_folder and file_name parameters listed

Click the "Connection" tab. Set the three path fields:

Container: landing ← type directly
Directory: store_sales/@{dataset().run_date_folder} ← Add dynamic content
File: @dataset().file_name ← Add dynamic content

For the Directory field: click "Add dynamic content" β†’ in the editor, type the full expression: store_sales/@{dataset().run_date_folder} β†’ click "OK".

For the File field: click "Add dynamic content" β†’ under Parameters β†’ click file_name β†’ click "OK".

πŸ“ΈSCREENSHOT

Connection tab fully configured β€” container 'landing', directory with dynamic expression, file with @dataset().file_name

Click πŸ’Ύ Save.

Step 4 β€” Create Sink Dataset With Two Parameters

Click "+" next to Datasets β†’ "Azure Data Lake Storage Gen2" β†’ "DelimitedText".

Name: ds_sink_adls_dated_sales
Linked service: ls_adls_freshmart

Click "OK" β†’ Parameters tab β†’ add the same two parameters: run_date_folder (String) and file_name (String).

πŸ“ΈSCREENSHOT

Sink dataset Parameters tab β€” run_date_folder and file_name parameters added

Click Connection tab:

Container: raw
Directory: sales/@{dataset().run_date_folder}
File: @dataset().file_name
πŸ“ΈSCREENSHOT

Sink dataset Connection tab β€” raw/sales/@{dataset().run_date_folder} for directory, @dataset().file_name for file

Click πŸ’Ύ Save.

PHASE 3 β€” BUILD THE PIPELINE

Step 5 β€” Create New Pipeline

In ADF Studio β†’ Author β†’ "+" next to Pipelines β†’ "New pipeline".

Name: pl_copy_store_sales_by_date
Description: Copies all store sales files for a given run_date into a date partition in ADLS
πŸ“ΈSCREENSHOT

New blank pipeline canvas β€” name 'pl_copy_store_sales_by_date' in Properties panel

Step 6 β€” Add the run_date Parameter

Click on empty canvas β†’ Parameters tab at the bottom β†’ "+ New".

Name: run_date
Type: String
Default: 2024-01-15 ← plain static date, no expression syntax!
⚠️ Static Default Only
Remember: parameter default values must be plain text. Write 2024-01-15 β€” not @{formatDateTime(...)}. The trigger will pass the real dynamic date at runtime. The static default is just for when you manually Debug.
πŸ“ΈSCREENSHOT

run_date parameter β€” default value showing plain '2024-01-15' with no expression syntax

Step 7 β€” Add the store_ids Array Parameter

Still in the Parameters tab β†’ "+ New".

Name: store_ids
Type: Array
Default: ["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"]

Notice: in Project 02 the array stored full file names like store_ST001_sales.csv. Now we store just the store ID like ST001. The pipeline builds the full file name using run_date. This means the array never needs to change β€” even as dates change every night.

πŸ“ΈSCREENSHOT

Pipeline Parameters tab β€” both run_date (String) and store_ids (Array) parameters visible

Step 8 β€” Add a Pipeline Variable

We need a variable to hold the computed folder name date=2024-01-15. Computing it once in a variable means we can use it in multiple places without repeating the expression.

Click empty canvas β†’ Variables tab β†’ "+ New".

Name: run_date_folder
Type: String
Default: leave empty
πŸ“ΈSCREENSHOT

Variables tab β€” run_date_folder variable of type String added

Step 9 β€” Add a Set Variable Activity

This activity runs first. It takes run_date (e.g. 2024-01-15) and stores date=2024-01-15 in the variable. Every other activity then reads this variable instead of re-computing it.

Left panel β†’ expand "General" β†’ drag "Set variable" onto the canvas.

πŸ“ΈSCREENSHOT

Set variable activity placed on the main canvas

Click the Set variable activity β†’ configure:

General Tab

Name: set_run_date_folder
Description: Builds the date= folder name from the run_date parameter

Variables Tab (inside the activity)

Click the Variables tab in the bottom properties panel (this is the activity configuration, not the pipeline variables tab).

Name: run_date_folder ← select from dropdown
Value: date=@{pipeline().parameters.run_date} ← Add dynamic content

Click "Add dynamic content" for the Value field β†’ type this expression in the editor:

Expression for Set Variable value
date=@{pipeline().parameters.run_date}

β†’ produces: date=2024-01-15 (when run_date is 2024-01-15)

This works because run_date already comes in as yyyy-MM-dd format β€” we just prepend date= to it. Simple and clean.

πŸ“ΈSCREENSHOT

Set variable activity Variables tab β€” name 'run_date_folder', value showing date=@{pipeline().parameters.run_date}

Step 10 β€” Add ForEach and Connect it to Set Variable

Left panel β†’ "Iteration & conditionals" β†’ drag "ForEach" onto the canvas.

Now connect the two activities: hover over set_run_date_folder β†’ drag the green arrow on its right edge β†’ drop it onto the ForEach. This forces Set Variable to finish before ForEach starts.

⚠️ Connection Is Required
If you do not connect them with an arrow, both activities run at the same time (in parallel). The ForEach would start before the variable is set β€” and the folder name would be empty.
πŸ“ΈSCREENSHOT

Canvas β€” set_run_date_folder connected to ForEach_store_ids with a green arrow showing the execution order

Click the ForEach activity β†’ configure:

General Tab

Name: ForEach_store_ids

Settings Tab

Sequential: ☐ Unchecked
Batch count: 4
Items: @pipeline().parameters.store_ids
πŸ“ΈSCREENSHOT

ForEach Settings tab β€” Sequential off, Batch count 4, Items showing @pipeline().parameters.store_ids

Step 11 β€” Add Copy Activity Inside ForEach

Click the "+" button inside the ForEach box β†’ from the inner canvas left panel β†’ drag "Copy data".

πŸ“ΈSCREENSHOT

Copy data activity placed inside the ForEach inner canvas

Step 12 β€” Configure Source With Date Expressions

Click the Copy activity β†’ bottom panel:

General Tab

Name: copy_dated_store_file

Source Tab

Select ds_src_blob_dated_store_sales. Two Dataset properties fields appear.

πŸ“ΈSCREENSHOT

Source tab β€” ds_src_blob_dated_store_sales selected, Dataset properties showing run_date_folder and file_name fields

For run_date_folder: Click "Add dynamic content" β†’ under Variables β†’ click run_date_folder.

run_date_folder dataset property
@variables('run_date_folder')

β†’ produces: date=2024-01-15

πŸ“ΈSCREENSHOT

Dynamic content editor β€” @variables('run_date_folder') expression with run_date_folder visible under Variables section

For file_name: Click "Add dynamic content" β†’ type this expression in the editor:

file_name dataset property β€” breaking it down
store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv

β†’ produces: store_ST001_sales_20240115.csv (when item()=ST001, run_date=2024-01-15)

How the file name expression builds the value:

store_ → "store_" (literal text)
@{item()} → "ST001" (the current store ID from the ForEach loop)
_sales_ → "_sales_" (literal text)
@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')} → "20240115" (date without dashes)
.csv → ".csv" (literal text)
πŸ“ΈSCREENSHOT

Source tab fully configured β€” run_date_folder showing @variables expression, file_name showing the full dynamic file name expression

Step 13 β€” Configure Sink

Click Sink tab β†’ select ds_sink_adls_dated_sales. Two Dataset properties appear β€” fill them with the exact same expressions as the source:

run_date_folder: @variables('run_date_folder')
file_name: store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
πŸ“ΈSCREENSHOT

Sink tab fully configured β€” same expressions as source, writing to raw/sales/date=2024-01-15/

Click the back arrow to return to the main pipeline canvas.

πŸ“ΈSCREENSHOT

Main canvas β€” set_run_date_folder β†’ ForEach_store_ids connected in sequence

Step 14 β€” Validate and Debug (Run for Jan 15)

Click "Validate" β†’ should show no errors. Then click "Debug".

πŸ“ΈSCREENSHOT

Validation successful β€” no errors found

The parameter dialog appears with the defaults pre-filled. Leave run_date as 2024-01-15 and click "OK".

πŸ“ΈSCREENSHOT

Debug dialog β€” run_date = 2024-01-15, store_ids array pre-filled

Watch the canvas β€” Set Variable completes first (green), then ForEach starts and runs 10 iterations 4 at a time.

πŸ“ΈSCREENSHOT

Pipeline running β€” set_run_date_folder green, ForEach running with progress indicator

πŸ“ΈSCREENSHOT

All completed β€” both activities showing green checkmarks

Verify in ADLS: Azure Portal β†’ stfreshmartdev β†’ Containers β†’ raw β†’ sales.

πŸ“ΈSCREENSHOT

raw/sales/date=2024-01-15/ β€” all 10 dated files visible with correct names and timestamps

Step 15 β€” Test Backfill (Run for Jan 16 Without Changing Anything)

This is where parameters prove their value. Click "Debug" again β†’ change only run_date:

run_date: 2024-01-16 ← changed
store_ids: leave the same
πŸ“ΈSCREENSHOT

Debug dialog β€” run_date changed to 2024-01-16, everything else the same

Click "OK". Check ADLS β€” you now have two date partitions:

πŸ“ΈSCREENSHOT

raw/sales/ β€” showing BOTH date=2024-01-15 and date=2024-01-16 folders side by side

This is backfill. If a pipeline fails on any day, rerun it with that date β€” it fills the missing data without touching any other day's folder.
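If several days were missed, the run_date values to replay are just a date range. A small sketch (a hypothetical helper for planning reruns, not part of ADF):

```python
from datetime import date, timedelta

def dates_to_backfill(start: str, end: str) -> list[str]:
    """Every date from start to end inclusive, ready to pass as run_date."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    out = []
    while d <= stop:
        out.append(d.isoformat())
        d += timedelta(days=1)
    return out

print(dates_to_backfill("2024-01-15", "2024-01-17"))
# ['2024-01-15', '2024-01-16', '2024-01-17']
```

Each value in that list is one Debug (or Trigger now) run; because the pipeline is idempotent, replaying a date that already succeeded is harmless.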

PHASE 4 β€” ADD SCHEDULED TRIGGER

Step 16 β€” Create the Schedule Trigger

Go back to the main pipeline canvas β†’ click "Add trigger" in the top toolbar β†’ "New/Edit" β†’ "+ New".

πŸ“ΈSCREENSHOT

Top toolbar β€” 'Add trigger' button highlighted, dropdown showing '+ New'

The New trigger panel opens. Fill in:

Name: trigger_daily_midnight
Type: Schedule
Start date: today's date
Time zone: India Standard Time
Repeat every: 1 Day
At: 00:00 (midnight)
Activated: ✅ Yes
πŸ“ΈSCREENSHOT

New trigger panel β€” name, type Schedule, recurrence set to daily at 00:00 IST filled in

Click "OK".

Step 17 β€” Set What the Trigger Passes to the Pipeline

After clicking OK, a "Trigger Run Parameters" dialog appears. This is where you tell the trigger what to send as run_date and store_ids each night.

πŸ“ΈSCREENSHOT

Trigger Run Parameters dialog β€” run_date and store_ids fields to fill

For run_date: Click "Add dynamic content" and type this expression:

run_date trigger parameter value
@{formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd')}

β†’ produces: 2024-01-16 (the date the trigger was scheduled to fire)

Why trigger().scheduledTime and not utcNow()?

trigger().scheduledTime is the time ADF scheduled this trigger to fire — always exactly the scheduled midnight, never a few seconds late. utcNow() is the actual clock time when the pipeline runs, which could be 12:00:03 AM — and because it is in UTC, it can fall on a different date than your local time. Always use trigger().scheduledTime in trigger parameters. (One caveat: ADF system times are expressed in UTC, so if your trigger uses a non-UTC time zone such as IST, you may also need convertTimeZone(trigger().scheduledTime, 'UTC', 'India Standard Time') before formatting so the date matches your local calendar.)

Trigger scheduled for 2024-01-16 00:00 IST
β†’ trigger().scheduledTime = 2024-01-16T00:00:00
β†’ formatDateTime result = "2024-01-16" βœ… always correct
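The date mismatch is plain timezone arithmetic. This Python sketch models the scenario described above, using the fact that midnight IST is 18:30 UTC of the previous day:

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # India Standard Time, UTC+5:30

# The trigger is scheduled for midnight IST on Jan 16
scheduled = datetime(2024, 1, 16, 0, 0, tzinfo=IST)

# utcNow() is the actual clock time when the run starts, in UTC
# (here, a few seconds after midnight IST)
utc_now = (scheduled + timedelta(seconds=3)).astimezone(timezone.utc)

scheduled_date = scheduled.strftime("%Y-%m-%d")  # '2024-01-16', the right date
utcnow_date = utc_now.strftime("%Y-%m-%d")       # '2024-01-15', the previous day in UTC!
```

Formatting the scheduled local midnight always yields the intended date; formatting the UTC wall clock yields the previous calendar day for any time zone east of UTC.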

πŸ“ΈSCREENSHOT

Trigger Run Parameters — run_date showing the @{formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')} expression

For store_ids: Type the array directly:

["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"]
πŸ“ΈSCREENSHOT

Trigger Run Parameters fully filled β€” both run_date expression and store_ids array

Click "OK".

Step 18 β€” Publish Everything

Click "Publish all". The panel shows all 4 new items β€” click "Publish".

πŸ“ΈSCREENSHOT

Publish panel β€” showing pipeline, 2 datasets, and trigger all listed

πŸ“ΈSCREENSHOT

Successfully published β€” notification in top right corner

Step 19 β€” Manually Trigger a Run Right Now

You do not need to wait until midnight to test the trigger. On the pipeline canvas β†’ "Add trigger" β†’ "Trigger now".

πŸ“ΈSCREENSHOT

'Trigger now' option in the Add trigger dropdown

In the Run Parameters dialog, enter a date you have files for:

run_date: 2024-01-15
store_ids: ["ST001",...,"ST010"]
πŸ“ΈSCREENSHOT

Trigger now dialog β€” run_date and store_ids filled in

Click "OK" β†’ go to Monitor β†’ Pipeline runs to watch it execute.

πŸ“ΈSCREENSHOT

Monitor β†’ Pipeline runs β€” pl_copy_store_sales_by_date showing In Progress

πŸ“ΈSCREENSHOT

Pipeline run completed β€” status Succeeded, run_date visible in parameters, duration shown

Step 20 β€” View the Trigger in Monitor

Click Monitor β†’ Trigger runs in the left submenu.

πŸ“ΈSCREENSHOT

Monitor β†’ Trigger runs β€” trigger_daily_midnight listed with its next scheduled run time and Active status

The trigger is now live. Every night at midnight IST it fires automatically, passes today's date as run_date, copies all 10 store files into raw/sales/date=YYYY-MM-DD/, and nobody needs to press anything.

Before and After

Before This Project
  • Ran only when you pressed Debug
  • File names were static β€” same every day
  • No way to reprocess a past date
  • No date organization in ADLS
After This Project
  • Triggers automatically every night at midnight
  • File names built from run_date parameter
  • Backfill any past date anytime
  • ADLS organized into date=YYYY-MM-DD/ partitions

All Expressions Used in This Project

Expression → Where used
2024-01-15 → run_date parameter default (plain static; no expression allowed here)
date=@{pipeline().parameters.run_date} → Set Variable activity; builds the folder name
@pipeline().parameters.store_ids → ForEach Items; the list to loop through
@variables('run_date_folder') → dataset property; passes the folder to the dataset
store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv → dataset property; builds the full file name
store_sales/@{dataset().run_date_folder} → source dataset Directory field
sales/@{dataset().run_date_folder} → sink dataset Directory field
@{formatDateTime(trigger().scheduledTime,'yyyy-MM-dd')} → trigger parameter; passes the correct date nightly

What Was Added in Project 03

  • Dataset · ds_src_blob_dated_store_sales · source with 2 parameters: run_date_folder + file_name
  • Dataset · ds_sink_adls_dated_sales · sink with 2 parameters: run_date_folder + file_name
  • Pipeline · pl_copy_store_sales_by_date · Set Variable → ForEach → Copy, driven by run_date
  • Parameter · run_date (String) · the date to process; controls file names and folder
  • Parameter · store_ids (Array) · list of store IDs to loop through
  • Variable · run_date_folder (String) · computed folder name like date=2024-01-15
  • Activity · set_run_date_folder · Set Variable; builds the date= folder name
  • Activity · ForEach_store_ids · loops through store IDs
  • Activity · copy_dated_store_file · copies one store file per iteration
  • Trigger · trigger_daily_midnight · fires every night at midnight, passes today as run_date

Key Concepts Reference

  • run_date parameter: a date passed into the pipeline from outside. Enables backfill, reprocessing, and idempotency.
  • Idempotency: running the same date twice gives the same result. Production pipelines must be safe to rerun.
  • formatDateTime(): ADF function that formats a date into a string. Builds file names and folder paths from dates.
  • String interpolation: embedding @{expressions} inside a text string. Builds dynamic strings like file names.
  • Set Variable activity: computes and stores a value during the pipeline run. Avoids repeating the same expression everywhere.
  • @variables('name'): reads a variable value you set earlier. One computed value, used in multiple places.
  • trigger().scheduledTime: the time the trigger was scheduled to fire. A safe, predictable way to get the date for a run.
  • Hive-style partitioning: folder naming like date=YYYY-MM-DD. The industry standard; analytics tools scan only what they need.
  • Schedule trigger: runs a pipeline on a fixed schedule. Automates nightly runs with zero human involvement.
  • Backfill: running the pipeline for a past date. Fixes failed runs without affecting other dates.

Common Mistakes

⚠️

Using an expression as a parameter default value

Fix: Parameter defaults must be plain static text β€” write 2024-01-15, not @{formatDateTime(...)}

⚠️

Using utcNow() in trigger parameters instead of trigger().scheduledTime

Fix: scheduledTime is always the correct scheduled date. utcNow() can be a different date due to timezone offset.

⚠️

Wrong date format in formatDateTime

Fix: Use 'yyyyMMdd' (no dashes) for file names. Use 'yyyy-MM-dd' (with dashes) for folder names and run_date.

⚠️

Not connecting Set Variable β†’ ForEach with an arrow

Fix: Without the arrow they run in parallel. ForEach starts before the variable is set β€” folder name is empty.

⚠️

Trigger created but never fires β€” forgot to publish

Fix: Always Publish all after adding or changing a trigger. Triggers only activate after publishing.

What is coming in Project 04

So far we have only worked with files you manually uploaded to Blob Storage. In the real world, data often lives on public internet URLs β€” government portals, supplier servers, weather APIs, open datasets.

In Project 04 you will build a pipeline that downloads a CSV file directly from a public HTTPS URL β€” no manual upload needed. ADF fetches the file from the internet and drops it straight into ADLS. Same FreshMart scenario. Zero manual work.

🎯 Key Takeaways

  • βœ“Pipeline parameter defaults must be plain static text β€” expressions like @{formatDateTime(...)} are not allowed there
  • βœ“run_date as a parameter enables idempotency β€” rerun any past date safely without affecting other dates
  • βœ“Set Variable activity runs before ForEach β€” always connect them with an arrow to enforce the order
  • βœ“@variables('run_date_folder') reads the computed folder name β€” one computation, used everywhere
  • βœ“String interpolation: store_@{item()}_sales_@{formatDateTime(run_date,'yyyyMMdd')}.csv builds file names at runtime
  • βœ“trigger().scheduledTime is the safe way to get the date in trigger parameters β€” not utcNow()
  • βœ“Hive-style partitioning (date=YYYY-MM-DD) is the industry standard β€” analytics tools understand it natively
  • βœ“After publishing the trigger, use "Trigger now" to test immediately without waiting for midnight