Project 02 — Copy Multiple CSV Files Using ForEach Loop
Stop creating one Copy activity per file. Use the ForEach activity to loop through a list of files and copy all of them in a single pipeline run. Add a new store tomorrow — just update the array, no pipeline changes needed.
A pipeline that loops through 10 store CSV files and copies all of them to ADLS in a single run — using one ForEach activity instead of 10 Copy activities.
Real World Problem
In Project 01 we solved the first problem — we moved a single CSV file from a laptop to Azure automatically. But here is where the real world gets complicated.
FreshMart does not have 1 store. They have 10 stores. Every night, each store manager exports their own sales file:
| File | Store | City |
|---|---|---|
| store_ST001_sales.csv | ST001 | New Delhi |
| store_ST002_sales.csv | ST002 | Mumbai |
| store_ST003_sales.csv | ST003 | Bangalore |
| store_ST004_sales.csv | ST004 | Chennai |
| store_ST005_sales.csv | ST005 | Hyderabad |
| store_ST006_sales.csv | ST006 | Pune |
| store_ST007_sales.csv | ST007 | Kolkata |
| store_ST008_sales.csv | ST008 | Ahmedabad |
| store_ST009_sales.csv | ST009 | Jaipur |
| store_ST010_sales.csv | ST010 | Chandigarh |
Imagine solving this the wrong way — creating 10 separate Copy activities, one per store file:
Copy Activity 1 → store_ST001
Copy Activity 2 → store_ST002
Copy Activity 3 → store_ST003
... 7 more activities ...
Copy Activity 10 → store_ST010
New store opens → manually add an activity
File renamed → manually update each activity
10 separate failure points
ForEach Activity
└── For each file in list
Copy Activity
(runs once per file)
New store opens → just add to the array
Tomorrow: 50 stores → same pipeline
1 activity, 1 failure point
Concepts You Must Understand First
What is a ForEach Activity?
A ForEach activity is a loop inside ADF. It takes a list of items and runs one or more activities for each item in that list.
Real analogy: You have 10 packages to deliver. Instead of making 10 separate trips, you load all packages into one truck and deliver them one by one on a single route. The ForEach activity is the truck route.
ForEach Activity
├── Items: ["ST001", "ST002", ... "ST010"]
│ ↑ this is the list it loops through
└── Activities inside the loop:
Copy Activity (runs once per item)
Sequential mode — copies file 1, then file 2, then file 3. Safer, but slower.
Parallel mode — copies multiple files at the same time. Faster. We use this with a batch count of 4.
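The loop logic can be sketched in plain Python — a conceptual analogy of what ForEach does, not how ADF executes internally:

```python
# Conceptual analogy of the ForEach activity (not actual ADF execution code).
store_files = [f"store_ST{n:03d}_sales.csv" for n in range(1, 11)]

def copy_file(file_name):
    """Stand-in for the Copy activity — runs once per item."""
    return f"landing/store_sales/{file_name} -> raw/sales/{file_name}"

# Sequential mode: one file at a time, in order.
results = [copy_file(f) for f in store_files]
print(results[0])
```

The list is the "Items" input; the function body is whatever activities sit inside the loop.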
What is a Pipeline Parameter?
A parameter is a value you pass into a pipeline from outside — before it starts running. In Project 02 we use an Array parameter called store_files to pass the list of file names.
- Set from OUTSIDE the pipeline
- Set before the pipeline starts — cannot change during a run
- Example: store_files = ["ST001.csv", "ST002.csv", ...] passed when triggering
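However you trigger the pipeline (portal, scheduled trigger, or REST API), the parameter travels as a JSON object whose keys match the parameter names — a hedged sketch of the payload shape, not a specific API call:

```python
import json

# Hypothetical trigger payload: keys must match the pipeline's parameter names.
trigger_body = {
    "store_files": [
        "store_ST001_sales.csv",
        "store_ST002_sales.csv",
        "store_ST003_sales.csv",
    ]
}
payload = json.dumps(trigger_body)
print(payload)
```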
What is a Dynamic Expression?
In Project 01, our dataset had a hardcoded file name: daily_sales.csv — fixed, never changes. In Project 02, the file name changes on every loop iteration. This is where dynamic expressions come in.
The @{ } syntax tells ADF: this is not a static value — evaluate this expression right now.
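It works roughly like string interpolation in a programming language — the placeholder is replaced with a concrete value at runtime. An analogy in Python, not ADF's actual evaluator:

```python
# Analogy: ADF's @{...} is resolved at runtime, like an f-string in Python.
file_name = "store_ST001_sales.csv"

static_path = "landing/store_sales/daily_sales.csv"   # Project 01: hardcoded
dynamic_path = f"landing/store_sales/{file_name}"     # Project 02: resolved per iteration

print(dynamic_path)
```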
One pipeline. One ForEach. Ten files copied.
Step 1 — Create 10 Store CSV Files
Each file represents one store's daily sales. Open Notepad (Windows) or TextEdit (Mac) and create each file below. Save them all in a folder on your Desktop called freshmart_store_files.
store_ST001_sales.csv:

    order_id,store_id,product_name,category,quantity,unit_price,order_date
    ORD1001,ST001,Basmati Rice 5kg,Grocery,12,299.00,2024-01-15
    ORD1002,ST001,Samsung TV 43inch,Electronics,2,32000.00,2024-01-15
    ORD1003,ST001,Amul Butter 500g,Dairy,25,240.00,2024-01-15
    ORD1004,ST001,Colgate Toothpaste,Personal Care,30,89.00,2024-01-15
    ORD1005,ST001,Nike Running Shoes,Apparel,5,4500.00,2024-01-15

store_ST002_sales.csv:

    order_id,store_id,product_name,category,quantity,unit_price,order_date
    ORD2001,ST002,Sunflower Oil 1L,Grocery,18,145.00,2024-01-15
    ORD2002,ST002,iPhone 14,Electronics,1,75000.00,2024-01-15
    ORD2003,ST002,Amul Milk 1L,Dairy,40,62.00,2024-01-15
    ORD2004,ST002,Dove Soap 100g,Personal Care,50,65.00,2024-01-15
    ORD2005,ST002,Levis Jeans,Apparel,8,2999.00,2024-01-15
Create files 3–10 with the same structure, using store IDs ST003 through ST010 and sample product rows for each city. Name each file after its store ID: store_ST003_sales.csv through store_ST010_sales.csv.
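If you would rather not type 10 files by hand, a small script can generate them all. The product rows below are placeholders (an assumption of mine, not the tutorial's sample data) — swap in realistic products per city if you like:

```python
import csv
import os

HEADER = ["order_id", "store_id", "product_name", "category",
          "quantity", "unit_price", "order_date"]

out_dir = "freshmart_store_files"
os.makedirs(out_dir, exist_ok=True)

for n in range(1, 11):
    store_id = f"ST{n:03d}"
    path = os.path.join(out_dir, f"store_{store_id}_sales.csv")
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(HEADER)
        # Placeholder rows — 5 orders per store, same shape as the samples above.
        for i in range(1, 6):
            w.writerow([f"ORD{n}00{i}", store_id, f"Sample Product {i}",
                        "Grocery", 10, 99.00, "2024-01-15"])

print(sorted(os.listdir(out_dir))[0])
```

Run it from your Desktop and it creates the freshmart_store_files folder with all 10 CSVs.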
Desktop folder 'freshmart_store_files' — showing all 10 CSV files listed
Step 2 — Upload All 10 Files to Landing Container
Go to Azure Portal → Storage accounts → stfreshmartdev → Containers → landing.
Click "+ Add Directory" → name it store_sales.
Add Directory dialog — 'store_sales' entered as directory name
Click into the store_sales directory → click "Upload" → hold Ctrl and select all 10 CSV files at once → click "Upload".
Upload dialog — all 10 files selected, showing their names in the file list before uploading
landing/store_sales/ directory after upload — all 10 CSV files visible with file sizes
In Project 01, our datasets had a hardcoded file path: File: daily_sales.csv. That worked for one file. Now the file name needs to change on each loop iteration. We do this by adding parameters to the dataset — a placeholder instead of a fixed value.
Step 3 — Create Parameterized Source Dataset
In ADF Studio → Author → Datasets → "+" → "New dataset" → "Azure Blob Storage" → "Continue" → "DelimitedText" → "Continue". Name it ds_src_blob_store_sales, select your Blob Storage linked service, and leave the file path fields empty for now.
Dataset form — name filled, linked service selected, file path fields left empty
Click "OK". You are now in the dataset editor. Click the "Parameters" tab at the bottom.
Dataset editor — Parameters tab highlighted at the bottom
Click "+ New" and add a parameter named file_name, type String:
Parameters tab — new parameter 'file_name' of type String added
Now click the "Connection" tab → click inside the "File" field → click "Add dynamic content" (blue link below the field).
Connection tab — 'Add dynamic content' blue link visible below the File field
The dynamic content editor opens. Under "Parameters" on the right → click file_name. The expression becomes @dataset().file_name.
Dynamic content editor — @dataset().file_name expression in the box, file_name parameter visible in the right panel
Click "OK". Set the full path: container landing, directory store_sales, file @dataset().file_name.
Connection tab fully configured — container 'landing', directory 'store_sales', file showing the dynamic expression
Click 💾 Save.
Step 4 — Create Parameterized Sink Dataset
Click "+" next to Datasets → "New dataset" → "Azure Data Lake Storage Gen2" → "Continue" → "DelimitedText" → "Continue". Name it ds_sink_adls_store_sales and select your ADLS linked service.
Click "OK" → Parameters tab → "+ New" → file_name, type String.
Sink dataset Parameters tab — file_name parameter added
Click Connection tab → File field → "Add dynamic content"→ click file_name → expression: @dataset().file_name → "OK".
Sink dataset Connection tab — raw/sales/@dataset().file_name
Click 💾 Save.
Step 5 — Create New Pipeline
In ADF Studio → Author → "+" next to Pipelines → "New pipeline".
New blank pipeline canvas — name 'pl_copy_all_store_sales' in the Properties panel on the right
Step 6 — Add Pipeline Parameter
Click on the empty canvas background (not on any activity) → at the bottom → click the "Parameters" tab.
Pipeline canvas — empty background clicked, Parameters tab visible at bottom
Click "+ New" and add a parameter named store_files, type Array. Paste the full list of all 10 file names as the default value:

    ["store_ST001_sales.csv","store_ST002_sales.csv","store_ST003_sales.csv","store_ST004_sales.csv","store_ST005_sales.csv","store_ST006_sales.csv","store_ST007_sales.csv","store_ST008_sales.csv","store_ST009_sales.csv","store_ST010_sales.csv"]
Pipeline Parameters tab — store_files parameter with Array type and the full JSON array as default value
Step 7 — Add ForEach Activity
In the left activities panel → expand "Iteration & conditionals" → drag "ForEach" onto the canvas.
Left activities panel — 'Iteration & conditionals' section expanded, ForEach being dragged to canvas
ForEach activity placed on the canvas — a larger box with 'ForEach' label and a '+' icon in the centre
Click on the ForEach activity → configure the bottom panel:
General Tab
ForEach General tab — name and description filled in
Settings Tab
For the Items field → click "Add dynamic content" → under "Parameters" → click store_files. The expression becomes @pipeline().parameters.store_files.
Dynamic content editor — @pipeline().parameters.store_files expression, store_files parameter highlighted on the right
ForEach Settings tab — Sequential unchecked, Batch count 4, Items showing @pipeline().parameters.store_files
Sequential OFF — run multiple iterations at the same time (parallel). Faster but uses more resources.
Batch count 4 — run maximum 4 iterations simultaneously: files 1,2,3,4 → then 5,6,7,8 → then 9,10. Prevents overloading the system.
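The grouping above is a simplified model — in practice ADF starts a new iteration as soon as a slot frees up, rather than waiting for a whole batch to finish. The chunked view can still be sketched in Python:

```python
# Simplified model of Batch count = 4: at most four iterations in flight at once.
files = [f"store_ST{n:03d}_sales.csv" for n in range(1, 11)]
batch_count = 4

# Chunk the list into groups of up to batch_count items.
batches = [files[i:i + batch_count] for i in range(0, len(files), batch_count)]
print([len(b) for b in batches])  # [4, 4, 2]
```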
Step 8 — Add Copy Activity INSIDE the ForEach
Click the "+ (Add activity)" button that is inside the ForEach box on the canvas.
ForEach activity box — showing the '+' button inside the box (not on the main canvas)
You are now inside the loop — a new blank canvas area opens labeled with the ForEach name.
ForEach inner canvas — a new blank canvas area showing you are inside the loop
From the left panel → drag a "Copy data" activity onto this inner canvas.
Copy data activity placed inside the ForEach inner canvas
Step 9 — Configure Source with @item()
Click the Copy activity inside the ForEach → configure:
General Tab
Source Tab
Select ds_src_blob_store_sales. A Dataset properties section appears. Click inside the file_name value field → "Add dynamic content".
Source tab — ds_src_blob_store_sales selected, Dataset properties section showing file_name field with 'Add dynamic content' link
Under "ForEach iterator" on the right → click "Item". Expression becomes:
Dynamic content editor — @item() expression, 'Item' option highlighted under ForEach iterator section
Click "OK".
What does @item() mean?
@item() only works inside a ForEach loop. It returns the current item being processed.
Iteration 1: @item() = "store_ST001_sales.csv"
Iteration 2: @item() = "store_ST002_sales.csv"
Iteration 3: @item() = "store_ST003_sales.csv" ... and so on
Source tab complete — file_name Dataset property showing @item() value
Step 10 — Configure Sink with @item()
Click Sink tab → select ds_sink_adls_store_sales → in the file_name Dataset property → "Add dynamic content" → click "Item" → expression: @item() → "OK".
Sink tab — ds_sink_adls_store_sales selected, file_name Dataset property showing @item()
Both source and sink now use @item() — the same file name is used for reading and writing:
Same file name — different container and folder. Clean.
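Per iteration, the single @item() value resolves into two different full paths — sketched here in Python:

```python
# How one iteration resolves: @item() feeds the file_name parameter of
# both datasets, so source and sink differ only in container/folder.
item = "store_ST003_sales.csv"   # value of @item() on iteration 3

source_path = f"landing/store_sales/{item}"  # ds_src_blob_store_sales
sink_path = f"raw/sales/{item}"              # ds_sink_adls_store_sales

print(source_path, "->", sink_path)
```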
Step 11 — Validate and Return to Main Canvas
Click the back arrow at the top left of the inner canvas to return to the main pipeline canvas.
Back arrow at top left — returning to main pipeline canvas from ForEach inner canvas
Main pipeline canvas — ForEach_store_files activity showing '1 activity' label inside it
Click "Validate" in the top toolbar.
Validation successful message — 'Your pipeline has been validated. No errors were found.'
If you see errors:
Dataset property file_name is not set → Copy activity → Source or Sink tab → add @item() to the file_name property
Items expression is required → ForEach activity → Settings tab → Items field → add @pipeline().parameters.store_files
Step 12 — Debug
Click "Debug". A dialog appears asking for parameter values — the default array should be pre-filled. If not, paste it:
Debug parameter dialog — store_files parameter with the JSON array value pre-filled
Click "OK". Watch the ForEach run. Click the 👓 glasses icon next to the ForEach in the Output tab to see individual iterations.
ForEach run details — showing all 10 iterations, each with status, file name, and duration
All 10 iterations completed — every row showing green checkmark and duration
Step 13 — Verify All 10 Files in ADLS
Go to Azure Portal → stfreshmartdev → Containers → raw → sales.
raw/sales/ directory — showing all 10 store CSV files listed with file sizes and timestamps
Click on any file → "Edit" to preview the data.
One store file open in preview — showing the 5 rows of data for that store
Step 14 — Publish
Click "Publish all" → the panel lists 3 new items (1 pipeline, 2 datasets) → click "Publish".
Successfully published message — 3 new items published
Bonus — Test With a Smaller Array
Click Debug → change the store_files value to just 3 files. The pipeline runs only 3 iterations. This is how professionals test with a subset before running the full load.
Debug dialog — store_files with only 3 files in the array for a quick test run
ForEach showing only 3 iterations — faster test run completed
What Was Added in Project 02
| Item | Name | What It Does |
|---|---|---|
| Dataset | ds_src_blob_store_sales | Parameterized source — file name is dynamic |
| Dataset | ds_sink_adls_store_sales | Parameterized sink — file name is dynamic |
| Pipeline | pl_copy_all_store_sales | Contains ForEach + Copy |
| Parameter | store_files (Array) | List of filenames to process |
| Activity | ForEach_store_files | Loops through the file list |
| Activity | copy_store_file | Copies one file per iteration — inside ForEach |
Key Concepts Reference
| Concept | What It Is | When You Use It |
|---|---|---|
| ForEach Activity | Loops through a list of items | When you have multiple files/tables to process |
| Array Parameter | A list of values passed to a pipeline | When the list of files may change |
| @item() | Current item in a ForEach loop | Inside ForEach — to get the current loop value |
| @pipeline().parameters.X | Read a pipeline parameter | Anywhere in the pipeline |
| @dataset().X | Pass a value to a dataset parameter | In Copy activity Source/Sink dataset properties |
| Dynamic expression | @{ } syntax — evaluated at runtime | Whenever a value needs to be dynamic, not fixed |
| Batch count | How many ForEach iterations run in parallel | Balance speed vs resource usage |
| Parameterized dataset | Dataset where the file path uses parameters | When same dataset is used for many different files |
Common Mistakes
Placing Copy activity OUTSIDE the ForEach
Fix: Delete it, click the "+" INSIDE the ForEach box, re-add it
Forgetting to set Dataset properties in Source or Sink
Fix: Copy activity → Source tab → Dataset properties → set file_name to @item()
Wrong Array format in parameter default
Fix: Must be: ["file1.csv","file2.csv"] — double quotes, square brackets, no trailing comma
Sequential = ON with large file lists
Fix: Turn Sequential OFF and set Batch count to 4 or 5
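For the array-format mistake, you can sanity-check the value with any JSON parser before pasting it into the parameter default — ADF's Array parameter expects valid JSON:

```python
import json

good = '["store_ST001_sales.csv","store_ST002_sales.csv"]'
assert json.loads(good) == ["store_ST001_sales.csv", "store_ST002_sales.csv"]

# Formats ADF (and any JSON parser) rejects:
invalid = []
for bad in ["['a.csv','b.csv']",       # single quotes
            '["a.csv","b.csv",]']:     # trailing comma
    try:
        json.loads(bad)
    except json.JSONDecodeError:
        invalid.append(bad)

print(len(invalid))  # 2 — both rejected
```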
What Comes Next
Right now the file list is hardcoded as the parameter's default array. What if the file name includes today's date?
store_ST001_sales_20240115.csv
store_ST001_sales_20240116.csv ← tomorrow
store_ST001_sales_20240117.csv ← day after
In Project 03 you will learn Parameterized Pipelines with Variables — pass run_date at trigger time, use a Set Variable activity to build the folder path, and ADF constructs the correct file names automatically every night.
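The idea can be previewed in Python — building date-stamped names from a run_date value (the naming pattern here is my assumption, following the examples above):

```python
# Sketch: derive the nightly file list from a run_date instead of
# hardcoding the names in the parameter default.
run_date = "20240115"   # would arrive as a pipeline parameter at trigger time

store_files = [f"store_ST{n:03d}_sales_{run_date}.csv" for n in range(1, 11)]
print(store_files[0])
```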
🎯 Key Takeaways
- ✓ ForEach loops through an array — one Copy activity handles all files instead of duplicating activities per file
- ✓ @item() returns the current loop value — it only works inside a ForEach activity
- ✓ Parameterized datasets use @dataset().file_name so the same dataset works for any file
- ✓ The store_files Array parameter holds the list of files — passed from outside the pipeline before it runs
- ✓ Sequential OFF + Batch count 4 runs 4 files simultaneously — much faster than sequential for large lists
- ✓ Pass a smaller test array at Debug time to validate the pipeline on 3 files before running all 10
- ✓ Adding a new store? Just add the filename to the array parameter — the pipeline never needs to change
- ✓ Variables are different from parameters — you will use them properly in Project 03 where they are genuinely needed