
Project 02 — Copy Multiple CSV Files Using ForEach Loop

Stop creating one Copy activity per file. Use the ForEach activity to loop through a list of files and copy all of them in a single pipeline run. Add a new store tomorrow — just update the array, no pipeline changes needed.

Published: March 2026
Series: Azure Data Engineering — Zero to Advanced
Project: 02 of 25
Level: Beginner
Builds on: Project 01 — same resources, same ADF
Time: 60–90 minutes
What you will build

A pipeline that loops through 10 store CSV files and copies all of them to ADLS in a single run — using one ForEach activity instead of 10 Copy activities.

Real World Problem

In Project 01 we solved the first problem — we moved a single CSV file from a laptop to Azure automatically. But here is where the real world gets complicated.

FreshMart does not have 1 store. They have 10 stores. Every night, each store manager exports their own sales file:

File                  | Store | City
store_ST001_sales.csv | ST001 | New Delhi
store_ST002_sales.csv | ST002 | Mumbai
store_ST003_sales.csv | ST003 | Bangalore
store_ST004_sales.csv | ST004 | Chennai
store_ST005_sales.csv | ST005 | Hyderabad
store_ST006_sales.csv | ST006 | Pune
store_ST007_sales.csv | ST007 | Kolkata
store_ST008_sales.csv | ST008 | Ahmedabad
store_ST009_sales.csv | ST009 | Jaipur
store_ST010_sales.csv | ST010 | Chandigarh

Imagine solving this the wrong way — creating 10 separate Copy activities, one per store file:

❌ WRONG APPROACH

Copy Activity 1 → store_ST001

Copy Activity 2 → store_ST002

Copy Activity 3 → store_ST003

... 7 more activities ...

Copy Activity 10 → store_ST010

New store opens → manually add an activity

File renamed → manually update each activity

10 separate failure points

✅ RIGHT APPROACH

ForEach Activity

└── For each file in list

Copy Activity

(runs once per file)

New store opens → just add to the array

Tomorrow: 50 stores → same pipeline

1 activity, 1 failure point

Concepts You Must Understand First

What is a ForEach Activity?

A ForEach activity is a loop inside ADF. It takes a list of items and runs one or more activities for each item in that list.

Real analogy: You have 10 packages to deliver. Instead of making 10 separate trips, you load all packages into one truck and deliver them one by one on a single route. The ForEach activity is the truck route.

ForEach Activity

├── Items: ["ST001", "ST002", ... "ST010"]

│ ↑ this is the list it loops through

└── Activities inside the loop:

Copy Activity (runs once per item)

ForEach can run its iterations in one of two modes:

Sequential: copies file 1, then file 2, then file 3. Safer, but slower.

Parallel: copies multiple files at the same time. Faster. We use this mode with a batch count of 4.
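ADF handles the scheduling for you, but the idea is easy to simulate in plain Python (this is an illustration, not ADF code): a sequential loop versus a worker pool capped at 4, mirroring Sequential OFF with batch count 4.

```python
from concurrent.futures import ThreadPoolExecutor

# The 10 FreshMart store files, as in the table above
files = [f"store_ST{i:03d}_sales.csv" for i in range(1, 11)]

def copy_file(name):
    # Stand-in for one Copy activity run
    return f"copied {name}"

# Sequential mode: one file at a time, in order
sequential_results = [copy_file(f) for f in files]

# Parallel mode with batch count 4: at most 4 iterations in flight at once;
# as soon as one finishes, the next file starts
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(copy_file, files))

# Both modes process every file; only the concurrency differs
assert sorted(sequential_results) == sorted(parallel_results)
```

Either way, all 10 files are copied; parallel mode simply finishes sooner because up to 4 copies overlap.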

What is a Pipeline Parameter?

A parameter is a value you pass into a pipeline from outside — before it starts running. In Project 02 we use an Array parameter called store_files to pass the list of file names.

PARAMETER
  • Set FROM OUTSIDE the pipeline
  • Set before the pipeline starts — cannot change during a run
  • Example: store_files = ["ST001.csv", "ST002.csv", ...] passed when triggering
💡 Note
Variables are a separate concept — they are set inside a pipeline during a run and can change as activities execute. You will use variables properly in Project 03 where they are genuinely needed. For Project 02, a parameter is all you need.

What is a Dynamic Expression?

In Project 01, our dataset had a hardcoded file name: daily_sales.csv — fixed, never changes. In Project 02, the file name changes on every loop iteration. This is where dynamic expressions come in.

The @ prefix tells ADF: this is not a static value, evaluate it at runtime. When the expression is embedded inside a longer string, wrap it as @{ } so ADF knows where the expression ends.

Common dynamic expressions:

@item() | Current item in a ForEach loop (what we use in this project)
@pipeline().parameters.store_files | Value of the store_files array parameter
@utcNow() | Current UTC date and time
@formatDateTime(utcNow(),'yyyy-MM-dd') | Today's date, formatted
What we are building

Blob (landing/store_sales/)
    store_ST001_sales.csv
    store_ST002_sales.csv
    store_ST003_sales.csv
    store_ST004_sales.csv
    store_ST005_sales.csv
    ... 5 more
        │
        ▼  ForEach
ADLS (raw/sales/)
    store_ST001_sales.csv
    store_ST002_sales.csv
    store_ST003_sales.csv
    store_ST004_sales.csv
    store_ST005_sales.csv
    ... 5 more

One pipeline. One ForEach. Ten files copied.

PHASE 1 — PREPARE SAMPLE DATA

Step 1 — Create 10 Store CSV Files

Each file represents one store's daily sales. Open Notepad (Windows) or TextEdit (Mac) and create each file below. Save them all in a folder on your Desktop called freshmart_store_files.

store_ST001_sales.csv
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD1001,ST001,Basmati Rice 5kg,Grocery,12,299.00,2024-01-15
ORD1002,ST001,Samsung TV 43inch,Electronics,2,32000.00,2024-01-15
ORD1003,ST001,Amul Butter 500g,Dairy,25,240.00,2024-01-15
ORD1004,ST001,Colgate Toothpaste,Personal Care,30,89.00,2024-01-15
ORD1005,ST001,Nike Running Shoes,Apparel,5,4500.00,2024-01-15
store_ST002_sales.csv
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD2001,ST002,Sunflower Oil 1L,Grocery,18,145.00,2024-01-15
ORD2002,ST002,iPhone 14,Electronics,1,75000.00,2024-01-15
ORD2003,ST002,Amul Milk 1L,Dairy,40,62.00,2024-01-15
ORD2004,ST002,Dove Soap 100g,Personal Care,50,65.00,2024-01-15
ORD2005,ST002,Levis Jeans,Apparel,8,2999.00,2024-01-15

Create files 3–10 with the same structure, using store IDs ST003 through ST010 and product rows of your choice for each city. Name each one following the same pattern, store_ST003_sales.csv through store_ST010_sales.csv, so all 10 files match.
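Typing ten files by hand is tedious. If you have Python installed, a short script can generate all 10 in one go. The product rows below are placeholders (store 1's sample data reused for every store); swap in each store's real products if you want distinct data.

```python
import csv
import os

# Store IDs ST001..ST010 for the 10 FreshMart stores
store_ids = [f"ST{i:03d}" for i in range(1, 11)]

# Placeholder product rows (store 1's sample reused for every store)
products = [
    ("Basmati Rice 5kg", "Grocery", 12, 299.00),
    ("Samsung TV 43inch", "Electronics", 2, 32000.00),
    ("Amul Butter 500g", "Dairy", 25, 240.00),
    ("Colgate Toothpaste", "Personal Care", 30, 89.00),
    ("Nike Running Shoes", "Apparel", 5, 4500.00),
]

out_dir = "freshmart_store_files"
os.makedirs(out_dir, exist_ok=True)

for s, store_id in enumerate(store_ids, start=1):
    path = os.path.join(out_dir, f"store_{store_id}_sales.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "store_id", "product_name", "category",
                         "quantity", "unit_price", "order_date"])
        for p, (name, category, qty, price) in enumerate(products, start=1):
            # Order IDs follow the ORD1001 / ORD2001 pattern from the samples
            writer.writerow([f"ORD{s}{p:03d}", store_id, name, category,
                             qty, f"{price:.2f}", "2024-01-15"])
```

Run it from your Desktop and it creates the freshmart_store_files folder with all 10 CSVs ready to upload.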

📸SCREENSHOT

Desktop folder 'freshmart_store_files' — showing all 10 CSV files listed

Step 2 — Upload All 10 Files to Landing Container

Go to Azure Portal → Storage accounts → stfreshmartdev → Containers → landing.

Click "+ Add Directory" → name it store_sales.

📸SCREENSHOT

Add Directory dialog — 'store_sales' entered as directory name

Click into the store_sales directory → click "Upload" → hold Ctrl and select all 10 CSV files at once → click "Upload".

📸SCREENSHOT

Upload dialog — all 10 files selected, showing their names in the file list before uploading

📸SCREENSHOT

landing/store_sales/ directory after upload — all 10 CSV files visible with file sizes

PHASE 2 — CREATE PARAMETERIZED DATASETS

In Project 01, our datasets had a hardcoded file path: File: daily_sales.csv. That worked for one file. Now the file name needs to change on each loop iteration. We do this by adding parameters to the dataset — a placeholder instead of a fixed value.

💡 Think of it like a form letter
"Dear [NAME], your order [ORDER_ID] has been shipped." — instead of writing 1000 separate letters, you write one template and fill in the parameters for each person. A parameterized dataset is the same idea.

Step 3 — Create Parameterized Source Dataset

In ADF Studio → Author → Datasets → "+" → "New dataset" → "Azure Blob Storage" → "Continue" → "DelimitedText" → "Continue".

Name: ds_src_blob_store_sales
Linked service: ls_blob_freshmart_landing
File path: leave ALL THREE fields empty for now
First row as header: ✅ Yes
Import schema: None
📸SCREENSHOT

Dataset form — name filled, linked service selected, file path fields left empty

Click "OK". You are now in the dataset editor. Click the "Parameters" tab at the bottom.

📸SCREENSHOT

Dataset editor — Parameters tab highlighted at the bottom

Click "+ New":

Name: file_name
Type: String
Default: leave empty
📸SCREENSHOT

Parameters tab — new parameter 'file_name' of type String added

Now click the "Connection" tab → click inside the "File" field → click "Add dynamic content" (blue link below the field).

📸SCREENSHOT

Connection tab — 'Add dynamic content' blue link visible below the File field

The dynamic content editor opens. Under "Parameters" on the right → click file_name. The expression becomes:

@dataset().file_name
📸SCREENSHOT

Dynamic content editor — @dataset().file_name expression in the box, file_name parameter visible in the right panel

Click "OK". Set the full path:

Container: landing
Directory: store_sales
File: @dataset().file_name ← dynamic
📸SCREENSHOT

Connection tab fully configured — container 'landing', directory 'store_sales', file showing the dynamic expression

Click 💾 Save.
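Everything you just configured in the UI is stored as JSON; you can inspect it via the {} (Code) button at the top right of the dataset editor. A trimmed sketch of roughly what it looks like (property order and extra defaults may differ in your instance):

```json
{
  "name": "ds_src_blob_store_sales",
  "properties": {
    "linkedServiceName": {
      "referenceName": "ls_blob_freshmart_landing",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "file_name": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": { "value": "@dataset().file_name", "type": "Expression" },
        "folderPath": "store_sales",
        "container": "landing"
      },
      "firstRowAsHeader": true
    }
  }
}
```

Note how the File field is not a string but an Expression object: that is what "Add dynamic content" produced.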

Step 4 — Create Parameterized Sink Dataset

Click "+" next to Datasets → "New dataset" → "Azure Data Lake Storage Gen2" → "Continue" → "DelimitedText" → "Continue".

Name: ds_sink_adls_store_sales
Linked service: ls_adls_freshmart
File path: leave ALL THREE fields empty

Click "OK" → Parameters tab → "+ New" → file_name, type String.

📸SCREENSHOT

Sink dataset Parameters tab — file_name parameter added

Click the Connection tab → File field → "Add dynamic content" → click file_name → the expression becomes @dataset().file_name → "OK".

Container: raw
Directory: sales
File: @dataset().file_name
📸SCREENSHOT

Sink dataset Connection tab — raw/sales/@dataset().file_name

Click 💾 Save.

PHASE 3 — BUILD THE PIPELINE

Step 5 — Create New Pipeline

In ADF Studio → Author → "+" next to Pipelines → "New pipeline".

Name: pl_copy_all_store_sales
Description: Loops through all store CSV files and copies each one to ADLS raw/sales/
📸SCREENSHOT

New blank pipeline canvas — name 'pl_copy_all_store_sales' in the Properties panel on the right

Step 6 — Add Pipeline Parameter

Click on the empty canvas background (not on any activity) → at the bottom → click the "Parameters" tab.

📸SCREENSHOT

Pipeline canvas — empty background clicked, Parameters tab visible at bottom

Click "+ New":

Name: store_files
Type: Array
Default: ["store_ST001_sales.csv","store_ST002_sales.csv","store_ST003_sales.csv","store_ST004_sales.csv","store_ST005_sales.csv","store_ST006_sales.csv","store_ST007_sales.csv","store_ST008_sales.csv","store_ST009_sales.csv","store_ST010_sales.csv"]
⚠️ Array Format Must Be Exact JSON
The default value must be a valid JSON array: strings in double quotes, separated by commas, wrapped in square brackets. Copy the value above exactly — no trailing commas, no single quotes.
📸SCREENSHOT

Pipeline Parameters tab — store_files parameter with Array type and the full JSON array as default value
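Behind the UI, this parameter lands in the pipeline's JSON definition (visible via the {} Code view). Roughly, with activities omitted:

```json
{
  "name": "pl_copy_all_store_sales",
  "properties": {
    "parameters": {
      "store_files": {
        "type": "array",
        "defaultValue": [
          "store_ST001_sales.csv",
          "store_ST002_sales.csv",
          "store_ST003_sales.csv",
          "store_ST004_sales.csv",
          "store_ST005_sales.csv",
          "store_ST006_sales.csv",
          "store_ST007_sales.csv",
          "store_ST008_sales.csv",
          "store_ST009_sales.csv",
          "store_ST010_sales.csv"
        ]
      }
    }
  }
}
```

If your array is malformed (single quotes, trailing comma), this is where validation will complain.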

Step 7 — Add ForEach Activity

In the left activities panel → expand "Iteration & conditionals" → drag "ForEach" onto the canvas.

📸SCREENSHOT

Left activities panel — 'Iteration & conditionals' section expanded, ForEach being dragged to canvas

📸SCREENSHOT

ForEach activity placed on the canvas — a larger box with 'ForEach' label and a '+' icon in the centre

Click on the ForEach activity → configure the bottom panel:

General Tab

Name: ForEach_store_files
Description: Loops through each store CSV file name
📸SCREENSHOT

ForEach General tab — name and description filled in

Settings Tab

Sequential: ☐ unchecked ← we want parallel
Batch count: 4
Items: @pipeline().parameters.store_files ← add via dynamic content

For the Items field → click "Add dynamic content" → under "Parameters" → click store_files. The expression becomes @pipeline().parameters.store_files.

📸SCREENSHOT

Dynamic content editor — @pipeline().parameters.store_files expression, store_files parameter highlighted on the right

📸SCREENSHOT

ForEach Settings tab — Sequential unchecked, Batch count 4, Items showing @pipeline().parameters.store_files

Sequential OFF — run multiple iterations at the same time (parallel). Faster but uses more resources.

Batch count 4 — at most 4 iterations run at the same time; as soon as one finishes, the next file starts. Prevents overloading the system.
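In the pipeline's JSON (the {} Code view), these settings map to isSequential, batchCount, and items. A rough sketch; the Copy activity you add in Step 8 will appear under activities:

```json
{
  "name": "ForEach_store_files",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 4,
    "items": {
      "value": "@pipeline().parameters.store_files",
      "type": "Expression"
    },
    "activities": []
  }
}
```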

Step 8 — Add Copy Activity INSIDE the ForEach

⚠️ Most Common Mistake in This Step
Do NOT drag a Copy activity from the left panel onto the main canvas. You must click the "+" that is inside the ForEach box. This is the mistake beginners make most often.

Click the "+ (Add activity)" button that is inside the ForEach box on the canvas.

📸SCREENSHOT

ForEach activity box — showing the '+' button inside the box (not on the main canvas)

You are now inside the loop — a new blank canvas area opens labeled with the ForEach name.

📸SCREENSHOT

ForEach inner canvas — a new blank canvas area showing you are inside the loop

From the left panel → drag a "Copy data" activity onto this inner canvas.

📸SCREENSHOT

Copy data activity placed inside the ForEach inner canvas

Step 9 — Configure Source with @item()

Click the Copy activity inside the ForEach → configure:

General Tab

Name: copy_store_file
Description: Copies the current store file from landing to ADLS raw/sales/

Source Tab

Select ds_src_blob_store_sales. A Dataset properties section appears. Click inside the file_name value field → "Add dynamic content".

📸SCREENSHOT

Source tab — ds_src_blob_store_sales selected, Dataset properties section showing file_name field with 'Add dynamic content' link

Under "ForEach iterator" on the right → click "Item". Expression becomes:

@item()
📸SCREENSHOT

Dynamic content editor — @item() expression, 'Item' option highlighted under ForEach iterator section

Click "OK".

What does @item() mean?

@item() only works inside a ForEach loop. It returns the current item being processed.

Iteration 1: @item() = "store_ST001_sales.csv"

Iteration 2: @item() = "store_ST002_sales.csv"

Iteration 3: @item() = "store_ST003_sales.csv" ... and so on

📸SCREENSHOT

Source tab complete — file_name Dataset property showing @item() value

Step 10 — Configure Sink with @item()

Click the Sink tab → select ds_sink_adls_store_sales → in the file_name Dataset property → "Add dynamic content" → click "Item" → the expression becomes @item() → "OK".

📸SCREENSHOT

Sink tab — ds_sink_adls_store_sales selected, file_name Dataset property showing @item()

Both source and sink now use @item() — the same file name is used for reading and writing:

Source: reads from landing/store_sales/store_ST001_sales.csv
Sink: writes to raw/sales/store_ST001_sales.csv

Same file name — different container and folder. Clean.
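In JSON terms, the Copy activity passes @item() into each dataset's file_name parameter. A trimmed sketch (source and sink details abbreviated; your Code view will show more defaults):

```json
{
  "name": "copy_store_file",
  "type": "Copy",
  "inputs": [{
    "referenceName": "ds_src_blob_store_sales",
    "type": "DatasetReference",
    "parameters": {
      "file_name": { "value": "@item()", "type": "Expression" }
    }
  }],
  "outputs": [{
    "referenceName": "ds_sink_adls_store_sales",
    "type": "DatasetReference",
    "parameters": {
      "file_name": { "value": "@item()", "type": "Expression" }
    }
  }],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "DelimitedTextSink" }
  }
}
```

One activity definition, ten runs: the only thing that changes per iteration is the value @item() resolves to.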

Step 11 — Validate and Return to Main Canvas

Click the back arrow at the top left of the inner canvas to return to the main pipeline canvas.

📸SCREENSHOT

Back arrow at top left — returning to main pipeline canvas from ForEach inner canvas

📸SCREENSHOT

Main pipeline canvas — ForEach_store_files activity showing '1 activity' label inside it

Click "Validate" in the top toolbar.

📸SCREENSHOT

Validation successful message — 'Your pipeline has been validated. No errors were found.'

If you see errors:

Dataset property file_name is not set → Copy activity → Source or Sink tab → add @item() to the file_name property

Items expression is required → ForEach activity → Settings tab → Items field → add @pipeline().parameters.store_files

Step 12 — Debug

Click "Debug". A dialog appears asking for parameter values — the default array should be pre-filled. If not, paste it:

["store_ST001_sales.csv","store_ST002_sales.csv","store_ST003_sales.csv","store_ST004_sales.csv","store_ST005_sales.csv","store_ST006_sales.csv","store_ST007_sales.csv","store_ST008_sales.csv","store_ST009_sales.csv","store_ST010_sales.csv"]
📸SCREENSHOT

Debug parameter dialog — store_files parameter with the JSON array value pre-filled

Click "OK". Watch the ForEach run. Click the 👓 glasses icon next to the ForEach in the Output tab to see individual iterations.

📸SCREENSHOT

ForEach run details — showing all 10 iterations, each with status, file name, and duration

📸SCREENSHOT

All 10 iterations completed — every row showing green checkmark and duration

Step 13 — Verify All 10 Files in ADLS

Go to Azure Portal → stfreshmartdev → Containers → raw → sales.

📸SCREENSHOT

raw/sales/ directory — showing all 10 store CSV files listed with file sizes and timestamps

Click on any file → "Edit" to preview the data.

📸SCREENSHOT

One store file open in preview — showing the 5 rows of data for that store

Step 14 — Publish

Click "Publish all" → the panel shows 3 new items (1 pipeline and 2 datasets) → click "Publish".

📸SCREENSHOT

Successfully published message — 3 new items published

💡 Bonus — Test With Only 3 Files

Click Debug → change the store_files value to just 3 files. The pipeline runs only 3 iterations. This is how professionals test with a subset before running the full load.

["store_ST001_sales.csv","store_ST002_sales.csv","store_ST003_sales.csv"]
📸SCREENSHOT

Debug dialog — store_files with only 3 files in the array for a quick test run

📸SCREENSHOT

ForEach showing only 3 iterations — faster test run completed

What Was Added in Project 02

Item | Name | What It Does
Dataset | ds_src_blob_store_sales | Parameterized source; file name is dynamic
Dataset | ds_sink_adls_store_sales | Parameterized sink; file name is dynamic
Pipeline | pl_copy_all_store_sales | Contains ForEach + Copy
Parameter | store_files (Array) | List of file names to process
Activity | ForEach_store_files | Loops through the file list
Activity | copy_store_file | Copies one file per iteration (inside ForEach)

Key Concepts Reference

Concept | What It Is | When You Use It
ForEach Activity | Loops through a list of items | When you have multiple files/tables to process
Array Parameter | A list of values passed to a pipeline | When the list of files may change
@item() | Current item in a ForEach loop | Inside ForEach, to get the current loop value
@pipeline().parameters.X | Reads a pipeline parameter | Anywhere in the pipeline
@dataset().X | Passes a value to a dataset parameter | In Copy activity Source/Sink dataset properties
Dynamic expression | @{ } syntax, evaluated at runtime | Whenever a value needs to be dynamic, not fixed
Batch count | How many ForEach iterations run in parallel | Balance speed vs resource usage
Parameterized dataset | Dataset whose file path uses parameters | When the same dataset serves many different files

Common Mistakes

⚠️

Placing Copy activity OUTSIDE the ForEach

Fix: Delete it, click the "+" INSIDE the ForEach box, re-add it

⚠️

Forgetting to set Dataset properties in Source or Sink

Fix: Copy activity → Source tab → Dataset properties → set file_name to @item()

⚠️

Wrong Array format in parameter default

Fix: Must be: ["file1.csv","file2.csv"] — double quotes, square brackets, no trailing comma

⚠️

Sequential = ON with large file lists

Fix: Turn Sequential OFF and set Batch count to 4 or 5

What is coming in Project 03

Right now you pass the file list as a hardcoded default array parameter. What if the file name includes today's date?

store_ST001_sales_20240115.csv

store_ST001_sales_20240116.csv ← tomorrow

store_ST001_sales_20240117.csv ← day after

In Project 03 you will learn Parameterized Pipelines with Variables — pass run_date at trigger time, use a Set Variable activity to build the folder path, and ADF constructs the correct file names automatically every night.

🎯 Key Takeaways

  • ForEach loops through an array — one Copy activity handles all files instead of duplicating activities per file
  • @item() returns the current loop value — it only works inside a ForEach activity
  • Parameterized datasets use @dataset().file_name so the same dataset works for any file
  • The store_files Array parameter holds the list of files — passed from outside the pipeline before it runs
  • Sequential OFF + Batch count 4 runs 4 files simultaneously — much faster than sequential for large lists
  • Pass a smaller test array at Debug time to validate the pipeline on 3 files before running all 10
  • Adding a new store? Just add the filename to the array parameter — the pipeline never needs to change
  • Variables are different from parameters — you will use them properly in Project 03 where they are genuinely needed