Project 05: Organize Files Automatically With Date Stamps
Stop overwriting files silently. Build a pipeline that checks whether a file exists before copying, date-stamps the output, cleans the landing zone automatically, and logs what was missing - a complete production file-management workflow.
Azure DE - Zero to Advanced
05 of 25
Beginner+
90–120 min
Real-World Problem
FreshMart's data lake is growing. After Projects 01–04, files are landing in ADLS, but new problems are showing up.
Problem 1: Files get overwritten silently
Every morning, cities.csv is downloaded and saved to raw/external/cities/cities.csv. But what happened to yesterday's file? Overwritten. Gone. No history. Three months later the data team asks what external price data looked like in January, and the answer is: we don't know.
Problem 2: Nobody knows if the source file actually arrived
The pipeline runs at midnight. What if the supplier server was down? ADF will try to copy a file that doesn't exist and fail with a cryptic error. Better behavior: check first and skip gracefully if the file is missing - don't crash the whole pipeline.
Problem 3: The landing zone is filling up with processed files
Files land in landing/store_sales/ and stay there forever after being copied to raw/. The landing zone is a staging area, not permanent storage. Processed files should be deleted after a successful copy.
The solution - a production-grade file-management pattern:
Step 1: Check whether the file exists in the landing zone (Get Metadata + If Condition)
Step 2: If the file exists → copy it to ADLS with a date stamp in the name
Step 3: After a successful copy → delete the file from the landing zone
Step 4: If the file does not exist → log a warning and continue gracefully
Concepts You Must Understand First
What is the Get Metadata Activity?
Get Metadata reads information about a file or folder - without reading the file contents. Think of it like checking a package label before opening it.
Get Metadata can return:
exists - true or false
size - file size in bytes
lastModified - when the file was last changed
itemName - the file name
itemType - "File" or "Folder"
childItems - list of files inside a folder
What is the If Condition Activity?
ADF's decision maker. It evaluates a TRUE/FALSE expression and runs different activities for each outcome.
IF (file exists AND size > 0)
THEN → Copy the file + Delete the original
ELSE → Log a warning message, skip this file
In ADF:
If Condition Activity
├── Expression: @and(activity('get_metadata_store_file').output.exists, greater(...size, 0))
├── True branch: Copy Activity → Delete Activity
└── False branch: Set Variable Activity (log "file not found")
What is the Delete Activity?
Deletes a file or folder from a storage location. After successfully copying a file from landing → ADLS, we use Delete to remove the original.
String Interpolation for Date-Stamped File Names
Original: store_ST001_sales.csv
Stamped: store_ST001_sales_20240115.csv
Expression:
@{replace(item(), '.csv', '')}_@{formatDateTime(pipeline().parameters.run_date, 'yyyyMMdd')}.csv
Breaking it down:
@{replace(item(), '.csv', '')} - removes .csv → "store_ST001_sales"
_ - literal underscore
@{formatDateTime(..., 'yyyyMMdd')} - formats the date as "20240115"
.csv - adds .csv back
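As a sanity check, the same transformation can be mirrored in plain Python. This is a local simulation only - ADF evaluates its own expression language at runtime, and the function name here is ours:

```python
from datetime import date

def stamped_name(file_name: str, run_date: date) -> str:
    # Mirrors: @{replace(item(), '.csv', '')}_@{formatDateTime(..., 'yyyyMMdd')}.csv
    base = file_name.replace(".csv", "")   # replace(item(), '.csv', '')
    stamp = run_date.strftime("%Y%m%d")    # formatDateTime(..., 'yyyyMMdd')
    return f"{base}_{stamp}.csv"

print(stamped_name("store_ST001_sales.csv", date(2024, 1, 15)))
# → store_ST001_sales_20240115.csv
```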
Result: store_ST001_sales_20240115.csv
What We Are Building - Visualized
PIPELINE: pl_file_management_daily
FOR EACH store file:
  │
  ├──▶ GET METADATA
  │      "Does store_ST001_sales_20240115.csv exist in landing?"
  │
  ├──▶ IF CONDITION
  │      "output.exists = true AND size > 0?"
  │
  ├──▶ TRUE BRANCH:
  │      │
  │      ├──▶ COPY ACTIVITY
  │      │      landing/store_sales/date=2024-01-15/store_ST001_sales_20240115.csv
  │      │      → raw/sales/date=2024-01-15/store_ST001_sales_20240115.csv
  │      │
  │      └──▶ DELETE ACTIVITY
  │             Delete: landing/store_sales/date=2024-01-15/store_ST001_sales_20240115.csv
  │
  └──▶ FALSE BRANCH:
         SET VARIABLE: missing_files += "ST001 missing for 2024-01-15 | "
         (pipeline continues - does not crash)
Step by Step Overview
PHASE 1 - Prepare (10 min)
Step 1: Upload test files to landing zone
PHASE 2 - Create New Datasets (15 min)
Step 2: Confirm existing source dataset (reuse from Project 03)
Step 3: Create Delete activity dataset
PHASE 3 - Build the Pipeline (60 min)
Step 4: Create pipeline with parameters and variables
Step 5: Add Set Variable - build run_date_folder
Step 6: Add ForEach activity
Step 7: Inside ForEach - add Get Metadata activity
Step 8: Inside ForEach - add If Condition activity
Step 9: Inside True branch - add Copy activity
Step 10: Inside True branch - add Delete activity
Step 11: Inside False branch - add Set Variable (log missing files)
Step 12: Add final summary log activity to main canvas
Step 13: Validate
Step 14: Debug - file exists scenario
Step 15: Debug - all files present scenario
Step 16: Publish
Phase 1 - Prepare
Step 1 - Upload Test Files to Landing Zone
We need files in the landing zone to test both scenarios - exists AND missing.
Azure Portal → Storage → stfreshmartdev → Containers → landing → store_sales/
Click "+ Add Directory"
Directory name: date=2024-01-15
(Screenshot: Add Directory dialog - date=2024-01-15 entered)
Click into date=2024-01-15 → click "Upload"
Upload only 5 of the 10 store files (ST001–ST005). This lets us test the missing scenario for ST006–ST010.
store_ST001_sales_20240115.csv
store_ST002_sales_20240115.csv
store_ST003_sales_20240115.csv
store_ST004_sales_20240115.csv
store_ST005_sales_20240115.csv
(Screenshot: Upload dialog - 5 files selected, ready to upload)
(Screenshot: landing/store_sales/date=2024-01-15/ showing exactly 5 files, ST001 through ST005)
Phase 2 - Create New Datasets
Step 2 - Reuse Existing Source Dataset
The Get Metadata activity will reuse ds_src_blob_dated_store_sales from Project 03 - it already has run_date_folder and file_name parameters. No new source dataset needed.
(Screenshot: Author → Datasets - ds_src_blob_dated_store_sales already exists from Project 03)
Step 3 - Create Delete Activity Dataset
In ADF Studio → Author → Datasets → "+" → "New dataset"
Search "Azure Blob Storage" → select → "Continue"
Select "DelimitedText" → "Continue"
Name: ds_delete_blob_landing
Linked service: ls_blob_freshmart_landing
File path: (leave all empty)
First row as header: ✅ Yes
Import schema: None
Click "OK" → "Parameters" tab → add TWO parameters:
Parameter 1:
Name: run_date_folder
Type: String
Parameter 2:
Name: file_name
Type: String
(Screenshot: ds_delete_blob_landing Parameters tab - run_date_folder and file_name parameters)
Click the "Connection" tab:
Container: landing
Directory → "Add dynamic content": store_sales/@{dataset().run_date_folder}
File → "Add dynamic content": @dataset().file_name
(Screenshot: ds_delete_blob_landing Connection tab - landing/store_sales/@{dataset().run_date_folder}/@dataset().file_name)
Click Save
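If you open the dataset's JSON view (the code button in ADF Studio), the result should look roughly like the sketch below. This is a hand-written approximation for orientation - the exact property layout ADF emits can vary slightly between Studio versions:

```json
{
  "name": "ds_delete_blob_landing",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "ls_blob_freshmart_landing",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "run_date_folder": { "type": "string" },
      "file_name": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "landing",
        "folderPath": {
          "value": "store_sales/@{dataset().run_date_folder}",
          "type": "Expression"
        },
        "fileName": {
          "value": "@dataset().file_name",
          "type": "Expression"
        }
      },
      "firstRowAsHeader": true
    }
  }
}
```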
Phase 3 - Build the Pipeline
Step 4 - Create Pipeline With Parameters and Variables
In ADF Studio → Author → "+" → "New pipeline"
Name: pl_file_management_daily
Description: Checks file existence, copies with date stamp, deletes from landing zone
Click the empty canvas → "Parameters" tab → add TWO parameters:
Parameter 1:
Name: run_date
Type: String
Default: 2024-01-15
Parameter 2:
Name: store_ids
Type: Array
Default: ["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"]
(Screenshot: Pipeline Parameters tab - run_date and store_ids parameters)
Click "Variables" tab β add THREE variables:
Variable 1:
Name: run_date_folder
Type: String
Variable 2:
Name: missing_files
Type: String
Variable 3:
Name: final_log
Type: StringADF does not allow a Set Variable activity to read and write the same variable in one step β it calls this a "self-reference" and throws an error.
missing_files is written to inside the ForEach (appending each missing store).final_log is written to after the ForEach (reads missing_files and builds the summary).Two different variables = no self-reference = no error.
Variables tab β three variables: run_date_folder, missing_files, final_log
Step 5 - Add Set Variable: Build run_date_folder
From the left panel → "General" → drag "Set variable" onto the canvas
Click it → bottom panel:
General tab:
Name: set_run_date_folder
Description: Formats run_date into a Hive-partition folder name
Click the "Variables" tab:
Name: run_date_folder
Value (Add dynamic content): date=@{pipeline().parameters.run_date}
(Screenshot: Set variable activity - name 'set_run_date_folder', value showing date=@{pipeline().parameters.run_date})
Step 6 - Add ForEach Activity
From the left panel → "Iteration & conditionals" → drag "ForEach" onto the canvas
Connect: hover over set_run_date_folder → drag the green arrow → connect to ForEach
(Screenshot: canvas - set_run_date_folder connected to ForEach with a green success arrow)
Click the ForEach → bottom panel:
General tab:
Name: ForEach_stores
Description: Loops through each store ID to check and copy files
Settings tab:
Sequential: ✅ Checked - IMPORTANT
Items: @pipeline().parameters.store_ids
(Screenshot: ForEach Settings tab - Sequential CHECKED, Items showing @pipeline().parameters.store_ids)
Why Sequential? Variables in ADF are shared across the pipeline. If two iterations run simultaneously and both try to update missing_files at the same time, one write overwrites the other and entries get lost. This is a race condition. Sequential solves it - each iteration waits its turn before writing.
Parallel risk: iterations 4 and 6 both find missing files simultaneously.
Both try to write to missing_files → one write is lost ❌
Sequential: iteration 4 runs → writes → finishes;
iteration 6 then runs → writes → finishes.
All updates preserved ✅
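The lost-update problem can be reproduced deterministically in a few lines of Python. This is a local simulation of ADF's read-modify-write Set Variable behavior, not ADF itself:

```python
# Parallel: both iterations read missing_files BEFORE either writes.
missing_files = ""
read_4 = missing_files                  # iteration 4 reads ""
read_6 = missing_files                  # iteration 6 reads "" at the same time
missing_files = read_4 + "ST004 | "     # iteration 4 writes
missing_files = read_6 + "ST006 | "     # iteration 6 overwrites -> ST004 entry lost
print(missing_files)                    # ST006 |

# Sequential: each iteration reads the latest value before writing.
missing_files = ""
for store_id in ["ST004", "ST006"]:
    missing_files = missing_files + f"{store_id} | "
print(missing_files)                    # ST004 | ST006 |
```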
Step 7 - Inside ForEach: Add Get Metadata Activity
Click the "+" button inside the ForEach box to enter the inner canvas
(Screenshot: ForEach box - '+' button inside, about to enter the inner canvas)
From the left panel → "General" → drag "Get Metadata" onto the inner canvas
(Screenshot: Get Metadata activity placed on the ForEach inner canvas)
Click Get Metadata → bottom panel:
General tab:
Name: get_metadata_store_file
Description: Checks if today's store file exists in the landing zone
Click the "Dataset" tab:
Dataset: ds_src_blob_dated_store_sales
run_date_folder → Add dynamic content: @variables('run_date_folder')
file_name → Add dynamic content: store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
(Screenshot: dynamic content editor - full dated file-name expression with @item() and formatDateTime)
Click the "Field list" tab → click "+ New" twice:
Field 1: exists
Field 2: size
(Screenshot: Field list tab - 'exists' and 'size' added as fields to retrieve)
Why both? exists tells you whether the file is there. size tells you whether it is empty (0 bytes). In production you check both - a 0-byte file exists but contains no data.
Step 8 - Inside ForEach: Add If Condition Activity
From the left panel → "Iteration & conditionals" → drag "If Condition" onto the inner canvas
Connect the green arrow from get_metadata_store_file → connect to If Condition
(Screenshot: inner canvas - get_metadata_store_file connected to If Condition with a green arrow)
Click If Condition → bottom panel:
General tab:
Name: if_file_exists
Description: Checks if the file exists AND has content (size > 0)
Click the "Activities" tab → Expression field → "Add dynamic content":
@and(activity('get_metadata_store_file').output.exists, greater(activity('get_metadata_store_file').output.size, 0))
(Screenshot: dynamic content editor - the full @and() expression with exists and greater() checks)
Breaking it down:
@and( ... , ... )
→ returns true only if BOTH conditions are true
activity('get_metadata_store_file').output.exists
→ true if the file exists, false if not
→ activity('name') reads the output of a previous activity by name
greater(activity('get_metadata_store_file').output.size, 0)
→ true if the file size is greater than 0 (not empty)
Combined:
→ true if the file EXISTS and is NOT EMPTY ✅
→ false if the file is missing OR is 0 bytes ❌
(Screenshot: If Condition Activities tab - expression showing the @and() check with exists and size)
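The same truth table can be checked with a tiny Python analogue of the expression, fed with dictionaries shaped like Get Metadata output (local simulation; the function name is ours):

```python
def if_file_exists(metadata: dict) -> bool:
    # Mirrors: @and(activity(...).output.exists, greater(activity(...).output.size, 0))
    return metadata.get("exists", False) and metadata.get("size", 0) > 0

print(if_file_exists({"exists": True,  "size": 1842}))  # True  -> TRUE branch
print(if_file_exists({"exists": True,  "size": 0}))     # False -> 0-byte file
print(if_file_exists({"exists": False}))                # False -> missing file
```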
Step 9 - Inside True Branch: Add Copy Activity
Click the pencil icon next to "True" to enter the True branch canvas
(Screenshot: If Condition Activities tab - True and False sections, pencil icon next to True highlighted)
Drag "Copy data" onto the True branch canvas
(Screenshot: Copy data activity placed on the True branch canvas)
General tab:
Name: copy_to_adls_with_datestamp
Description: Copies the store file from landing to ADLS raw/sales/ with a date stamp
Source tab:
Source dataset: ds_src_blob_dated_store_sales
run_date_folder → @variables('run_date_folder')
file_name → store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
(Screenshot: Source tab - both dataset properties filled, file_name showing the dated expression)
Sink tab:
Sink dataset: ds_sink_adls_dated_sales
run_date_folder → @variables('run_date_folder')
file_name → same dated expression as the Source: store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
(Screenshot: Sink tab - both dataset properties filled, matching the source)
Step 10 - Inside True Branch: Add Delete Activity
Still on the True branch canvas → from the left panel → "General" → drag "Delete"
Connect the green arrow from copy_to_adls_with_datestamp → connect to Delete
(Screenshot: True branch canvas - Copy activity connected to Delete activity with a green success arrow)
General tab:
Name: delete_from_landing
Description: Removes the processed file from the landing zone after a successful copy
Click the "Dataset" tab:
Dataset: ds_delete_blob_landing
run_date_folder → @variables('run_date_folder')
file_name → the dated file-name expression: store_@{item()}_sales_@{formatDateTime(pipeline().parameters.run_date,'yyyyMMdd')}.csv
(Screenshot: Delete activity Dataset tab - ds_delete_blob_landing selected, both properties filled)
Click the "Logging settings" tab (optional but recommended):
Enable logging: ✅ Yes
Linked service: ls_adls_freshmart
Log folder path: logs/delete_activity/
(Screenshot: Delete activity Logging tab - enabled, log path set to logs/delete_activity/)
(Screenshot: complete True branch canvas - Copy → Delete connected with a green arrow)
Step 11 - Inside False Branch: Add Set Variable
Click the back arrow → return to the If Condition Activities tab → click the pencil next to "False"
(Screenshot: If Condition - False section pencil icon highlighted)
Drag "Set variable" onto the False branch canvas
(Screenshot: Set variable activity placed on the False branch canvas)
General tab:
Name: log_missing_file
Description: Appends the missing store ID to the missing_files variable for monitoring
Variables tab:
Name: missing_files
Value (Add dynamic content): @{variables('missing_files')}@{item()} missing for @{pipeline().parameters.run_date} | 
After ST006 and ST007 are missing: "ST006 missing for 2024-01-15 | ST007 missing for 2024-01-15 | "
(Screenshot: Set variable - Variables tab with the append expression for missing_files)
Breaking it down:
@{variables('missing_files')} - the current value (whatever was already logged)
@{item()} - the current store ID, e.g. "ST006"
" missing for " - literal text
@{pipeline().parameters.run_date} - "2024-01-15"
" | " - separator between entries
Each iteration APPENDS - previous entries are preserved ✅
(Screenshot: complete False branch canvas - log_missing_file Set variable activity alone)
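The append behavior across a full sequential run can be simulated locally. This pure-Python sketch reproduces the log you will see later in the debug output (assuming ST006–ST010 are the missing stores):

```python
# Local simulation of the False-branch append across a sequential ForEach run.
run_date = "2024-01-15"
missing_files = ""  # pipeline variable starts empty

for store_id in ["ST006", "ST007", "ST008", "ST009", "ST010"]:
    # Mirrors: @{variables('missing_files')}@{item()} missing for @{...run_date} | 
    missing_files = f"{missing_files}{store_id} missing for {run_date} | "

final_log = f"Pipeline run complete. Missing files: {missing_files}"
print(final_log)
```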
Step 12 - Add Final Summary Log to Main Canvas
Click the back arrow until you are on the main pipeline canvas.
From the left panel → drag one more "Set variable" onto the main canvas (outside the ForEach)
Connect the green arrow from ForEach_stores → connect to this new Set variable
General tab:
Name: output_missing_files_log
Description: Final log of all missing files for this pipeline run
Variables tab:
Name: final_log (write to final_log, NOT missing_files)
Value: Pipeline run complete. Missing files: @{variables('missing_files')}
Why final_log and not missing_files? ADF throws a "self-reference" error if a Set Variable activity reads and writes the same variable. missing_files is appended to inside the ForEach; here we read it and write the result to final_log - two different variables, no error.
(Screenshot: output_missing_files_log - Name shows 'final_log', value reads from missing_files)
(Screenshot: main canvas - complete pipeline: set_run_date_folder → ForEach_stores → output_missing_files_log)
MAIN CANVAS:
[set_run_date_folder] ──▶ [ForEach_stores] ──▶ [output_missing_files_log]
INSIDE ForEach_stores:
[get_metadata_store_file] ──▶ [if_file_exists]
        │
        ├── TRUE:  [copy_to_adls_with_datestamp] ──▶ [delete_from_landing]
        │
        └── FALSE: [log_missing_file]
Step 13 - Validate
Click "Validate" in the top toolbar
Validation successful β no errors found
"Activity not found" β The If Condition expression uses
activity('get_metadata_store_file') β the name in quotes must exactly match the activity's General tab name. Case-sensitive."Variable is read-only inside parallel ForEach" β ForEach Sequential must be CHECKED ON.
"Delete activity dataset not configured" β True branch β Delete β Dataset tab β confirm
ds_delete_blob_landing is selected with both properties filled.Step 14 β Debug: File Exists Scenario
Click "Debug"
run_date: 2024-01-15
store_ids: ["ST001","ST002","ST003","ST004","ST005","ST006","ST007","ST008","ST009","ST010"]Debug parameter dialog β run_date 2024-01-15, full store_ids array
Click "OK" β pipeline runs sequentially through all 10 stores.
ST001βST005 go through TRUE branch. ST006βST010 go through FALSE branch.
Pipeline running β set_run_date_folder green, ForEach running with sequential progress
Pipeline completed β all activities green
Click π glasses icon on ForEach in Output tab:
ForEach iteration list β 10 rows, showing each store ID, which branch ran (TRUE/FALSE), and duration
Verify in ADLS:
raw/sales/date=2024-01-15/
βββ store_ST001_sales_20240115.csv β
βββ store_ST002_sales_20240115.csv β
βββ store_ST003_sales_20240115.csv β
βββ store_ST004_sales_20240115.csv β
βββ store_ST005_sales_20240115.csv β
Only 5 files β because only 5 existed in landing. Correct behavior.raw/sales/date=2024-01-15/ β exactly 5 files, matching the stores uploaded
Verify landing zone is cleaned:
landing/store_sales/date=2024-01-15/
(empty β all 5 processed files deleted) β
landing/store_sales/date=2024-01-15/ β empty folder, files deleted after copying
Check the missing files log β ADF Monitor β pipeline run β click output_missing_files_log β Output:
Pipeline run complete. Missing files: ST006 missing for 2024-01-15 | ST007 missing for 2024-01-15 | ST008 missing for 2024-01-15 | ST009 missing for 2024-01-15 | ST010 missing for 2024-01-15 |output_missing_files_log activity output β showing the missing stores listed in final_log value
Step 15 - Debug: All Files Present Scenario
Upload the remaining 5 store files (ST006–ST010) to landing/store_sales/date=2024-01-15/
(Screenshot: landing/store_sales/date=2024-01-15/ - all 10 files now uploaded)
Run Debug again with the same parameters. This time all 10 stores go through the TRUE branch.
(Screenshot: ForEach iteration list - all 10 rows showing the TRUE branch ran, all green)
(Screenshot: raw/sales/date=2024-01-15/ - all 10 files present)
(Screenshot: landing/store_sales/date=2024-01-15/ - empty, all files deleted)
(Screenshot: output_missing_files_log output - 'Pipeline run complete. Missing files: ' with an empty list, meaning none missing)
Step 16 - Publish
Click "Publish all"
Publishing:
pl_file_management_daily (new)
ds_delete_blob_landing (new)
(Screenshot: Publish panel - listing the new pipeline and dataset)
(Screenshot: successfully published notification)
What You Built - Summary
BEFORE:
Pipelines copied blindly - crashed if a file was missing
Files accumulated in the landing zone forever
No way to know which files were missing on any given day
No history - files overwritten daily
AFTER:
Pipeline checks file existence BEFORE attempting to copy
Landing zone cleaned automatically after each successful copy
Missing files logged - you know exactly what did not arrive
Files are date-stamped - full history preserved in ADLS
Pipeline never crashes on missing files - it handles them gracefully
New ADF Activities Learned
| Activity | Purpose | Key Output |
|---|---|---|
| Get Metadata | Read file/folder properties | .output.exists, .output.size, .output.lastModified |
| If Condition | Branch based on true/false | Runs True OR False activities |
| Delete | Remove a file from storage | File is permanently removed |
| Set Variable (append) | Build a running log | Concatenates text across iterations |
New Expressions Learned
| Expression | What It Does |
|---|---|
| @and(condition1, condition2) | True only when BOTH conditions are true |
| activity('name').output.fieldname | Read the output of a previous activity |
| greater(value, number) | True if value is greater than number |
| @{replace(string, 'find', 'replace')} | Replace text within a string |
| @{variables('name')} followed by new text | Append text to a variable (string concatenation) |
Key Concepts to Remember
| Concept | What It Is | Why It Matters |
|---|---|---|
| Get Metadata | Reads file properties without reading the file | Check existence before copying → prevent crashes |
| If Condition | Branches pipeline based on true/false | Handle missing files gracefully |
| Delete Activity | Permanently removes file from storage | Clean landing zone after successful processing |
| True branch | Activities that run when condition is true | The happy path |
| False branch | Activities that run when condition is false | The error handling path |
| Sequential ForEach | One iteration at a time | Required when writing to shared variables |
| Race condition | Two iterations updating same variable at once | Why parallel ForEach breaks variable updates |
| String concatenation | Appending text to a variable each iteration | Build running logs across loop iterations |
| activity('name').output | Read another activity's result | Core pattern for connecting activity results |
| @and() | Both conditions must be true | Safer than just checking exists alone |
Common Mistakes in This Project
| Mistake | Fix |
|---|---|
| Activity name in expression does not match actual name | The expression uses activity('get_metadata_store_file'); if the activity is named GetMetadata1 the expression fails. Names are case-sensitive. |
| ForEach set to Parallel when writing to a variable | Set ForEach Sequential = ON whenever activities inside write to pipeline variables. Parallel causes race conditions. |
| Delete activity runs even when Copy failed | Make sure Delete is connected to Copy with a success arrow (green), not an always arrow (grey). Click the arrow to verify it shows "On success". |
| Get Metadata field list not configured | Get Metadata → Field list tab → you must explicitly add "exists" as a field. Without this, output.exists returns null. |
| Activities placed in wrong branch | Click the Activities tab on the If Condition → verify Copy is in the True branch and log_missing_file is in the False branch. |
| Self-reference error on missing_files variable | Use a second variable (final_log) for the summary. missing_files is written inside ForEach, final_log reads it after. Never read and write the same variable in one Set Variable activity. |
Tier 1 Complete - What You Have Built So Far
PROJECT 01: Copy a single file - ADF basics, linked services, datasets
PROJECT 02: Copy multiple files with ForEach - loops, arrays, parallel execution
PROJECT 03: Date-parameterized pipeline + trigger - parameters, dynamic expressions, scheduling
PROJECT 04: Download from a public HTTPS URL - HTTP linked service, internet data sources
PROJECT 05: File management with validation - Get Metadata, If Condition, Delete, error handling
Key Takeaways
- ✅ Always use Get Metadata to check file existence before copying - prevents cryptic pipeline crashes
- ✅ If Condition branches your pipeline into a happy path (True) and an error path (False)
- ✅ Connect Delete to Copy with a success arrow - if Copy fails, Delete must NOT run
- ✅ Sequential ForEach is required when activities inside write to shared pipeline variables
- ✅ Use a separate variable for summaries - ADF blocks self-reference (read + write of the same variable in one step)
- ✅ Date-stamping files in ADLS preserves history - overwriting destroys it
- ✅ The landing zone is temporary - clean it after processing so it stays lean
What's Coming in Project 06 - Tier 2 Begins
So far, all data sources have been files - CSVs sitting somewhere waiting to be copied. In the real world, a huge portion of data comes from REST APIs - services you query with an HTTP request that return structured JSON.
In Project 06, FreshMart integrates with a live public REST API:
- Call a real API endpoint and receive a JSON response
- Extract data and land it in ADLS as a clean CSV
- Handle REST API pagination - when results come in pages
- API key authentication
- Parse nested JSON structures