Project 01 — Copy a CSV File to Azure Data Lake
Build your first Azure data pipeline from scratch. Copy a CSV file from a landing zone into ADLS Gen2 using Azure Data Factory — the foundational pattern behind every data engineering pipeline in Azure.
What you will build: a pipeline that automatically copies a CSV file from your local computer into Azure cloud storage.
Real World Problem
The Company: FreshMart — a grocery chain with 10 stores across India.
Every day, each store manager exports a file called daily_sales.csv from their billing software and saves it on their computer. That file just sits there. The data team at FreshMart HQ has zero visibility into what is happening across stores. They cannot answer basic questions like:
- Which store sold the most today?
- Which product is running out of stock?
- Is store revenue growing or shrinking week over week?
The root problem: Data is trapped on individual laptops. There is no central place to store it, no way to query it, no way to analyse it.
What we are going to do: Take that CSV file sitting on a laptop → upload it to Azure cloud storage automatically using a pipeline. This is the very first step of every data engineering project in the world — get the data off the source and into a central location.
Simple. One file. One destination. One pipeline. But this exact pattern is the foundation of every data engineering project you will ever build.
Concepts You Must Understand First
Before touching Azure, read this section completely — it will make every step feel logical instead of confusing.
What is Azure?
Azure is Microsoft's cloud platform. Instead of buying physical servers and hard drives — you rent them from Microsoft and pay only for what you use.
Think of it like electricity. You do not build a power plant to get electricity at home. You plug into the grid and pay a monthly bill. Azure is the same — you plug into Microsoft's infrastructure and pay for what you consume. For data engineering, Azure gives you cloud storage, compute, pipelines, and dashboards.
What is Azure Data Factory (ADF)?
ADF is a data pipeline orchestration tool. It answers the question: how do I move data from A to B automatically?
Think of ADF like a courier service. You tell it: pick up from this location (source), deliver to that location (destination), run every day at midnight, and alert me if delivery fails. That is exactly what ADF does with data.
ADF has four building blocks:
- Linked Service — the connection details to reach a data source; like saving a contact in your phone (name + number).
- Dataset — points to the specific file or table you want; like telling the courier: pick up the red box.
- Activity — one single action (copy, transform, run code); like one step in the delivery process.
- Pipeline — a container that holds one or more activities; the complete delivery workflow from start to finish.
In this project we will create 2 Linked Services, 2 Datasets, 1 Activity, and 1 Pipeline.
What is ADLS Gen2?
ADLS stands for Azure Data Lake Storage Generation 2. It is cloud storage optimized for data analytics workloads — think of it as a massive intelligent hard drive in the cloud that can store any file type, never runs out of space, and is directly connected to every Azure analytics service.
Step by Step Overview
1. Create an Azure free account
2. Create a Resource Group
3. Create the CSV file on your computer
4. Create ADLS Gen2 storage account
5. Create container and folder structure
6. Create Azure Data Factory
7. Open ADF Studio
8. Create Linked Service — Blob (source)
9. Create Linked Service — ADLS Gen2 (destination)
10. Create Dataset — source CSV
11. Create Dataset — destination ADLS
12. Create Pipeline + Copy Activity
13. Configure Source and Sink
14. Debug (test run)
15. Publish
16. Verify file in ADLS
Step 1 — Create an Azure Account
Go to https://azure.microsoft.com/en-in/free and click "Start free".
You will get $200 free credit valid for 30 days, 12 months of popular services free, and always-free services including limited ADF and storage. You will need a Microsoft account, a phone number for verification, and a credit card for identity verification only — you will NOT be charged for free tier usage.
Azure free account signup page — showing the 'Start free' button and the $200 credit offer
After signing up → go to https://portal.azure.com. You will land on the Azure Portal homepage — your control centre for everything Azure.
Azure Portal homepage after first login — showing the top search bar, left sidebar, and the main dashboard area
What you are looking at
- Top search bar — search for any Azure service by name; use this constantly
- Left sidebar — your recently used services
- Main area — your dashboard; empty now, fills up as you create resources
- Top right — your account name, subscription, notifications
Step 2 — Create a Resource Group
A Resource Group is a logical folder that holds all Azure resources belonging to one project. Instead of having ADF, ADLS, and Databricks scattered randomly — you put them all in one Resource Group called rg-freshmart-dev.
Why use a Resource Group?
- See all costs in one place: how much is this entire project costing me?
- Delete everything at once: done with the project? Delete the group and everything inside is gone.
- Set permissions once: give a teammate access to the group and they get access to everything inside.
In the Azure Portal search bar → type "Resource groups" → click it.
Azure Portal — typing 'Resource groups' in the search bar, showing the suggestion dropdown
Click "+ Create" (top left button).
Resource groups page — showing the '+ Create' button at top left
Fill in the form:
- Subscription: your free-trial subscription
- Resource group name: rg-freshmart-dev
- Region: East US 2
Naming convention: rg = resource group, freshmart = project name, dev = environment. Professional teams always follow naming conventions — get in the habit now; all 25 projects will use this same pattern.
Resource group creation form — all three fields filled in exactly as shown above
Click "Review + Create" → then "Create". Wait 5 seconds — you will see a notification: "Resource group created".
Resource group successfully created — showing the overview page with name 'rg-freshmart-dev' and region 'East US 2'
Step 3 — Create the Sample CSV File
This represents the daily_sales.csv that FreshMart store managers export from their billing software every day. Open Notepad (Windows) or TextEdit (Mac) and paste exactly this:
```csv
order_id,store_id,product_name,category,quantity,unit_price,order_date
ORD001,ST001,Basmati Rice 5kg,Grocery,3,299.00,2024-01-15
ORD002,ST001,Sunflower Oil 1L,Grocery,5,145.00,2024-01-15
ORD003,ST001,Samsung TV 43inch,Electronics,1,32000.00,2024-01-15
ORD004,ST002,Amul Butter 500g,Dairy,8,240.00,2024-01-15
ORD005,ST002,Basmati Rice 5kg,Grocery,2,299.00,2024-01-15
ORD006,ST002,Colgate Toothpaste,Personal Care,10,89.00,2024-01-15
ORD007,ST003,Lays Chips Family Pack,Snacks,15,99.00,2024-01-15
ORD008,ST003,Sony Headphones,Electronics,2,4500.00,2024-01-15
ORD009,ST003,Amul Milk 1L,Dairy,20,62.00,2024-01-15
ORD010,ST001,Dove Soap 100g,Personal Care,6,65.00,2024-01-15
```
Save the file as daily_sales.csv somewhere easy to find — your Desktop is fine.
Notepad with the CSV content pasted in — showing the file before saving
Save As dialog — showing filename 'daily_sales.csv' being saved to Desktop
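If you prefer scripting to Notepad, here is a small optional Python sketch that writes the same file. The output filename and location are your choice — this version writes to the current working directory.

```python
# Optional alternative to Notepad: generate daily_sales.csv with Python.
# Writes the same 10 sample FreshMart rows used throughout this project.
import csv

HEADER = ["order_id", "store_id", "product_name", "category",
          "quantity", "unit_price", "order_date"]

ROWS = [
    ["ORD001", "ST001", "Basmati Rice 5kg", "Grocery", 3, "299.00", "2024-01-15"],
    ["ORD002", "ST001", "Sunflower Oil 1L", "Grocery", 5, "145.00", "2024-01-15"],
    ["ORD003", "ST001", "Samsung TV 43inch", "Electronics", 1, "32000.00", "2024-01-15"],
    ["ORD004", "ST002", "Amul Butter 500g", "Dairy", 8, "240.00", "2024-01-15"],
    ["ORD005", "ST002", "Basmati Rice 5kg", "Grocery", 2, "299.00", "2024-01-15"],
    ["ORD006", "ST002", "Colgate Toothpaste", "Personal Care", 10, "89.00", "2024-01-15"],
    ["ORD007", "ST003", "Lays Chips Family Pack", "Snacks", 15, "99.00", "2024-01-15"],
    ["ORD008", "ST003", "Sony Headphones", "Electronics", 2, "4500.00", "2024-01-15"],
    ["ORD009", "ST003", "Amul Milk 1L", "Dairy", 20, "62.00", "2024-01-15"],
    ["ORD010", "ST001", "Dove Soap 100g", "Personal Care", 6, "65.00", "2024-01-15"],
]

# newline="" prevents the csv module from writing blank lines on Windows
with open("daily_sales.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(HEADER)
    writer.writerows(ROWS)
```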
Step 4 — Create ADLS Gen2 Storage Account
In the Azure Portal search bar → type "Storage accounts" → click it → click "+ Create".
Storage accounts page — showing the '+ Create' button at top left
Basics Tab
Fill in: Subscription (your free subscription), Resource group rg-freshmart-dev, Storage account name stfreshmartdev, Region East US 2. Leave the remaining options at their defaults.
Storage account creation — Basics tab completely filled in with all values as shown above
Advanced Tab — The Most Important Checkbox
Click the "Advanced" tab → find the section called "Data Lake Storage Gen2" → enable Hierarchical namespace.
Advanced tab — showing the 'Hierarchical namespace' checkbox being checked under the Data Lake Storage Gen2 section
Leave all other tabs as default. Click "Review" → "Create". Deployment takes about 30–60 seconds.
Deployment complete — showing 'Your deployment is complete' with the resource name stfreshmartdev
Click "Go to resource".
Storage account overview page — highlighting the 'Azure Data Lake Storage Gen2' label and the name stfreshmartdev
Step 5 — Create Container and Folder Structure
A container is like a top-level folder inside a storage account. We will create one container called raw — this is where all raw, unprocessed data lands. In later projects we will also have processed and curated containers following the Medallion Architecture.
On the storage account page → left sidebar → click "Containers" → "+ Container".
New container dialog — name 'raw' entered, Private selected
Container 'raw' now visible in the containers list
Click on the "raw" container → click "+ Add Directory" → name it sales.
raw container showing the 'sales' directory created inside it
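For reference, the fully qualified path of the file we are about to land here, written in ADLS Gen2's abfss:// URI scheme (the form later used by tools such as Databricks and Synapse):

```text
abfss://<container>@<storage-account>.dfs.core.windows.net/<directory>/<file>
abfss://raw@stfreshmartdev.dfs.core.windows.net/sales/daily_sales.csv
```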
Tip: as FreshMart grows, we will add sibling directories like raw/products/, raw/customers/, and raw/inventory/. Starting with a clean hierarchy now saves massive headaches later.
Step 6 — Create Azure Data Factory
In the Azure Portal search bar → type "Data factories" → click it → click "+ Create".
Data Factory creation form — all fields filled in as shown above
Click the "Git configuration" tab → check "Configure Git later". Git integration is important for production but adds complexity for beginners — we will cover it in a later project.
Git configuration tab — 'Configure Git later' checkbox checked
Click "Review + create" → "Create". Deployment takes 1–2 minutes.
ADF deployment complete — showing 'adf-freshmart-dev' successfully created
Step 7 — Open ADF Studio
Click "Go to resource" → on the ADF overview page → click "Launch studio". ADF Studio opens in a new browser tab — this is where you will spend 90% of your time.
ADF overview page — showing the 'Launch studio' button in the centre
ADF Studio homepage — label each section: (1) Left sidebar icons, (2) Main canvas area, (3) Top toolbar
ADF Studio Layout
The left sidebar icons are the ones you will use constantly: Home, Author (pencil icon — build pipelines and datasets), Monitor (bar chart icon — view run history), and Manage (toolbox icon — linked services and triggers).
Step 8 — Create Linked Service for Source (Blob Storage)
ADF lives in the cloud and cannot directly reach into your laptop. The solution: we first upload the CSV to a landing container in the same storage account, then ADF copies it from there to the raw/sales/ destination.
Think of it like a courier — the courier cannot teleport to your home. You drop the package at a collection point, then the courier picks it up and delivers it.
Upload the CSV to a landing container
Go to Azure Portal → Storage account stfreshmartdev → Containers → "+ Container".
Creating 'landing' container — name and private access level set
Click on the "landing" container → click "Upload" → select your daily_sales.csv from your Desktop → click "Upload".
landing container after upload — daily_sales.csv visible with file size and last modified date
Create the Linked Service
Go back to ADF Studio → click "Manage" (toolbox icon in left sidebar) → click "Linked services" → click "+ New".
Linked services page — empty list and '+ New' button visible
In the search box → type "Azure Blob" → select "Azure Blob Storage" → click "Continue".
New linked service panel — 'Azure Blob Storage' selected in search results
Click "Test connection" at the bottom. You should see a green ✅ "Connection successful".
Green 'Connection successful' message at the bottom of the linked service form
Click "Create".
Linked services list — ls_blob_freshmart_landing now visible
Step 9 — Create Linked Service for Destination (ADLS Gen2)
Still in Manage → Linked services → click "+ New" → search "Azure Data Lake Storage Gen2" → select it → "Continue".
New linked service — 'Azure Data Lake Storage Gen2' selected
Name it ls_adls_freshmart and point it at the same storage account, stfreshmartdev. We still create two separate linked services because one represents Blob Storage behaviour (the landing container) and one represents ADLS Gen2 behaviour (the raw container, with hierarchical namespace). In real projects these would typically be completely different accounts.
Click "Test connection" → ✅ green → click "Create".
Linked services list — now showing both: ls_blob_freshmart_landing and ls_adls_freshmart
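Everything you just clicked together is stored by ADF as JSON (visible via the code view or an ARM template export). A rough sketch of the blob linked service — the connection string is a placeholder and exact property names vary by authentication method, so treat this as illustrative:

```json
{
  "name": "ls_blob_freshmart_landing",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=stfreshmartdev;AccountKey=<key>"
    }
  }
}
```

The ADLS Gen2 linked service looks similar but uses the type "AzureBlobFS" and a URL of the form https://stfreshmartdev.dfs.core.windows.net.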
Step 10 — Create Source Dataset
A Dataset tells ADF which specific file to work with. The Linked Service is how to connect to the storage — the Dataset is what file specifically to read or write.
Click "Author" (pencil icon in left sidebar) → click "+" next to "Datasets" → "New dataset".
Author tab — showing Datasets section with the '+' button highlighted
Search "Azure Blob Storage" → select it → "Continue". Select format: "DelimitedText" (this means CSV) → "Continue".
Format selection — DelimitedText/CSV selected
Dataset properties form — all fields filled in exactly as shown
Click "OK" → then click the "Preview data" tab at the bottom. You should see all 10 rows of FreshMart sales data — this confirms ADF can read your file.
Dataset preview — showing all 10 rows of CSV data in a clean table format
Click 💾 Save (or Ctrl+S).
Dataset saved — ds_src_blob_daily_sales visible in the Datasets list on the left
Step 11 — Create Destination Dataset
Click "+" next to "Datasets" → "New dataset" → search "Azure Data Lake Storage Gen2" → select it → "Continue". Select format: "DelimitedText" → "Continue".
Destination dataset form — all fields showing raw/sales/daily_sales.csv path
Click "OK" → 💾 Save.
Datasets list — now showing both ds_src_blob_daily_sales and ds_sink_adls_raw_sales
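As with linked services, each dataset is JSON under the hood. A simplified sketch of the source dataset — the sink dataset mirrors it with an ADLS Gen2 location pointing at raw/sales/ — with property names to be treated as approximate:

```json
{
  "name": "ds_src_blob_daily_sales",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "ls_blob_freshmart_landing",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "landing",
        "fileName": "daily_sales.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```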
Step 12 — Create the Pipeline
Click "+" next to "Pipelines" → "New pipeline". A blank canvas opens.
Author tab — Pipelines section with '+' button highlighted
In the Properties panel on the right, set the name to pl_copy_daily_sales_csv and add a short description.
Pipeline canvas — Properties panel on right showing the name and description filled in
Empty pipeline canvas — labelling each area: top toolbar, left activities panel, centre canvas, bottom properties panel
Step 13 — Add and Configure Copy Activity
The Copy Activity does one thing: read data from a source dataset and write it to a sink (destination) dataset. Source = where data comes FROM. Sink = where data goes TO. (Sink is standard data engineering terminology — like a kitchen sink where water flows into.)
In the left activities panel → expand "Move & transform" → drag "Copy data" onto the canvas.
Dragging 'Copy data' activity from the left panel onto the canvas
Copy data activity placed on the canvas — a blue box with 'Copy data' label
Click the Copy activity box to select it. The bottom panel shows configuration tabs. Set the following in each tab:
General Tab
General tab — activity name and description filled in
Source Tab
Source tab — ds_src_blob_daily_sales selected as source dataset
Sink Tab
Sink tab — ds_sink_adls_raw_sales selected as sink dataset
Mapping Tab
Click "Mapping" → click "Import schemas". ADF will automatically detect all columns and map them 1:1.
Mapping tab — all columns auto-mapped with arrows between source and destination
Click 💾 Save.
Saved pipeline — pl_copy_daily_sales_csv visible in the Pipelines list on the left
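The whole pipeline, too, is just JSON behind the canvas. A simplified sketch — the activity name copy_daily_sales is illustrative (yours is whatever you typed in the General tab), and exact properties may differ slightly by ADF version:

```json
{
  "name": "pl_copy_daily_sales_csv",
  "properties": {
    "activities": [
      {
        "name": "copy_daily_sales",
        "type": "Copy",
        "inputs": [
          { "referenceName": "ds_src_blob_daily_sales", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "ds_sink_adls_raw_sales", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "DelimitedTextSink" }
        }
      }
    ]
  }
}
```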
Step 14 — Debug (Test Run)
Before scheduling or publishing anything, always run a Debug first. Debug runs the pipeline immediately using your current draft — no effect on production.
Click "Debug" in the top toolbar. No parameters for this pipeline — click "OK".
Top toolbar — Debug button highlighted with cursor pointing to it
Watch the canvas. The Copy activity will show:
Pipeline running — Copy activity showing yellow/spinning status
Pipeline succeeded — Copy activity showing green checkmark
At the bottom → "Output" tab → click the 👓 glasses icon next to the run to see details.
Copy activity run details — showing files read: 1, files written: 1, data read and written amounts
Step 15 — Publish
In ADF, everything you build exists as a draft until published. Debug runs work on drafts. But triggers and scheduled runs only use published pipelines.
Click "Publish all" in the top toolbar. A panel shows everything that will be published — all 5 items we created. Click "Publish".
Publish panel — showing all 5 items: pipeline, 2 datasets, 2 linked services
'Successfully published' notification in the top right corner
Step 16 — Verify File in ADLS
Go to Azure Portal → Storage accounts → stfreshmartdev → Containers → raw → sales.
sales folder contents — showing daily_sales.csv file with file size and last modified timestamp
Click on daily_sales.csv → click "Edit" to preview its contents.
File preview in Azure Portal — showing all 10 rows of FreshMart sales data confirming the copy was successful
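With the data finally in a central location, the data team can start answering the questions from the problem statement. A minimal local sketch — it assumes a copy of daily_sales.csv on disk; in later projects this kind of aggregation moves into Databricks or Synapse:

```python
# Sketch: total revenue per store from one day's daily_sales.csv.
# Answers "which store sold the most today?" from the problem statement.
import csv
from collections import defaultdict

def revenue_by_store(csv_path):
    """Return {store_id: total revenue} for one day's sales file."""
    totals = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["store_id"]] += int(row["quantity"]) * float(row["unit_price"])
    return dict(totals)

# Example usage (once daily_sales.csv is present):
#   totals = revenue_by_store("daily_sales.csv")
#   best = max(totals, key=totals.get)  # store with the highest revenue
```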
Step 17 — Check the Monitor Tab
Go back to ADF Studio → click "Monitor" (bar chart icon in the left sidebar). In production, this is where you check every morning that all pipelines ran successfully overnight.
Monitor tab — showing the pipeline run for pl_copy_daily_sales_csv with status 'Succeeded', duration, and timestamp
Click on the pipeline run to see full details.
Pipeline run detail — showing the copy activity, its duration, rows copied, and data volume
Resources Created — Summary
| Resource | Name | Purpose |
|---|---|---|
| Resource Group | rg-freshmart-dev | Container for all project resources |
| Storage Account | stfreshmartdev | Holds all data (landing + raw) |
| Container | landing | Staging area for uploaded files |
| Container | raw | Destination — Bronze layer |
| Data Factory | adf-freshmart-dev | Pipeline orchestration |
| Linked Service | ls_blob_freshmart_landing | Connection to landing container |
| Linked Service | ls_adls_freshmart | Connection to ADLS Gen2 raw container |
| Dataset | ds_src_blob_daily_sales | Points to landing/daily_sales.csv |
| Dataset | ds_sink_adls_raw_sales | Points to raw/sales/daily_sales.csv |
| Pipeline | pl_copy_daily_sales_csv | Copies the file end-to-end |
Key Concepts Reference
| Concept | What It Is | Analogy |
|---|---|---|
| Resource Group | Logical folder for Azure resources | Project folder on your computer |
| ADLS Gen2 | Cloud data lake with real folder hierarchy | Massive intelligent hard drive |
| Hierarchical Namespace | What makes ADLS Gen2 different from Blob | Real folders vs simulated ones |
| ADF | Visual pipeline orchestration tool | Automated courier service |
| Linked Service | Saved connection to a data source | Contact saved in your phone |
| Dataset | Pointer to a specific file or table | Address written on a package |
| Copy Activity | Single action that copies data | The delivery truck |
| Pipeline | Container of one or more activities | The full delivery workflow |
| Source | Where data comes FROM | Pickup location |
| Sink | Where data goes TO | Delivery destination |
| Debug | Test run on a draft pipeline | Proofreading before sending |
| Publish | Make pipeline live and schedulable | Clicking Send |
| Monitor | View all pipeline run logs | Delivery tracking dashboard |
Common Mistakes
Forgetting to enable Hierarchical Namespace
Fix: Delete the storage account and recreate it — cannot be enabled after creation
Not testing connection before saving Linked Service
Fix: Always click "Test connection" and wait for the green tick before saving
Forgetting to Publish after building
Fix: Always click "Publish all" after making changes — triggers use published version only
Wrong file path in dataset
Fix: Double-check container name, directory name, and filename in the dataset settings
What's Next
Right now our pipeline copies one specific file. But FreshMart has 10 stores — that means 10 files: store_ST001_sales.csv through store_ST010_sales.csv.
In Project 02, you will learn to use the ForEach activity to loop through all 10 files and copy them in one pipeline run — instead of creating 10 separate Copy activities. Same resources. Same storage account. Same ADF. Just smarter.
🎯 Key Takeaways
- ✓ADLS Gen2 requires the Hierarchical Namespace checkbox to be enabled at creation time — this cannot be changed later
- ✓ADF cannot reach your local laptop — upload files to a landing container first, then ADF copies from there
- ✓Always test connections on Linked Services before saving — catch errors early
- ✓Debug runs use your draft pipeline — Publish to make the pipeline available to triggers and schedules
- ✓Source = where data comes FROM. Sink = where data goes TO. These are standard data engineering terms
- ✓The Monitor tab is your daily health check — every pipeline run is logged with status, duration, and row counts
- ✓Resource Groups let you see all project costs together and delete everything with one click when done
Resume bullets this project series builds toward:
- Built end-to-end Medallion Architecture batch pipeline on Azure: ADLS Gen2 → Databricks PySpark → Synapse Analytics
- Implemented data quality framework validating 5,000 daily records — removing nulls, duplicates, and invalid values in the Silver layer
- Orchestrated multi-step Azure Data Factory pipeline with chained Databricks Notebook activities on a daily schedule trigger
Coming up later in the series:
- Medallion Architecture: Bronze, Silver, Gold — the pattern behind every modern data lake.
- Incremental loading: how to process only new data — watermarks, upserts, and change data capture.
- Data quality checks: what to check, when to check it, and what to do when checks fail.