Intermediate · +150 XP

Azure Data Factory (ADF)

ADF is the orchestration backbone of every Azure data pipeline. It moves data from 90+ sources, triggers Databricks notebooks, handles failures, and automates everything on a schedule — all without writing infrastructure code.

16 min read · March 2026

What ADF is — and what it is not

Azure Data Factory is not a processing engine. It does not transform data. It does not run Python or SQL. What it does is move data and orchestrate other services that do those things. Think of it as the project manager of your data pipeline — it tells everything else what to do, in what order, and handles what happens when something goes wrong.

In a typical Azure pipeline, ADF handles three things: connecting to source systems and copying raw data into your data lake, triggering Databricks notebooks to run transformations in the right sequence, and scheduling all of this to happen automatically on a daily, hourly, or event-driven basis.

📌 Real World Example
A retail company has sales data sitting in an on-premises SQL Server. Every night at 2am, ADF connects to that SQL Server, extracts the previous day's transactions, copies them to ADLS Gen2 Bronze, then triggers a Databricks notebook to clean and process them. ADF manages the schedule, monitors for failures, and sends alerts if anything breaks. The data engineer wrote the notebook — ADF handles everything around it.

The six core concepts

Pipeline

The top-level container. A logical grouping of activities that together perform a task. "Ingest Sales Data" is one pipeline. You can have dozens in one ADF instance.

Activity

A single step inside a pipeline. Three types: data movement (Copy Data), transformation (run Databricks, run SQL), and control flow (If Condition, ForEach, Wait).

Dataset

A named reference to data — a specific table, file, or folder. Points to a Linked Service. For example: "the sales table in this SQL Server" or "the /bronze/sales/ folder in ADLS Gen2".
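
In JSON, a dataset is a small object that names the data and points at its Linked Service. A simplified sketch in the same style as the examples further down (names and paths are illustrative):

dataset_adls_sales.json
```json
// ADF Dataset — a named pointer to the /bronze/sales/ folder in ADLS Gen2
// (illustrative names; the linked service is defined separately)
{
  "name": "DS_ADLS_Bronze_Sales",
  "type": "DelimitedText",
  "linkedServiceName": { "referenceName": "LS_ADLS_Bronze", "type": "LinkedServiceReference" },
  "typeProperties": {
    "location": {
      "type": "AzureBlobFSLocation",
      "fileSystem": "bronze",
      "folderPath": "sales"
    }
  }
}
```

The dataset holds the "what and where"; the connection details live entirely in the Linked Service, which is why one Linked Service can back many datasets.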

Linked Service

The connection definition. Holds the connection string (or Key Vault reference) for a data source. Create one per external system and reuse it everywhere.

Trigger

What starts a pipeline run. Three types: Schedule (fixed time), Storage Event (a new file arrives), Tumbling Window (time-windowed runs, especially historical backfills).

Integration Runtime

The compute ADF uses to run activities. Azure IR handles cloud-to-cloud. Self-hosted IR runs on-premises to connect ADF to on-prem databases.

Linked Services — always use Key Vault

Every data source ADF connects to needs a Linked Service. The most important rule: never store credentials directly in the Linked Service. Always reference Azure Key Vault. This is a hard requirement in every production environment, and interviewers specifically ask about this.

linked_service_adls.json
json
// ADF Linked Service — how ADF connects to ADLS Gen2
// Configure once in ADF UI, then reuse across all pipelines
{
  "name": "LS_ADLS_Bronze",
  "type": "AzureBlobFS",  // ADF's type name for an ADLS Gen2 linked service
  "typeProperties": {
    "url": "https://yourstorageaccount.dfs.core.windows.net",
    "accountKey": {
      "type": "AzureKeyVaultSecret",
      "store": { "referenceName": "LS_KeyVault", "type": "LinkedServiceReference" },
      "secretName": "adls-storage-key"
    }
  }
}
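
The LS_KeyVault reference above is itself a Linked Service: a minimal one pointing at your vault. A sketch with a placeholder vault URL:

linked_service_keyvault.json
```json
// ADF Linked Service for Azure Key Vault — referenced by other
// linked services that fetch their secrets from the vault
{
  "name": "LS_KeyVault",
  "type": "AzureKeyVault",
  "typeProperties": {
    "baseUrl": "https://yourvault.vault.azure.net/"
  }
}
```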

A real pipeline — Copy + Databricks in sequence

Here is what a typical ADF pipeline looks like. Two activities chained together: a Copy activity that moves data from SQL Server to Bronze, then a Databricks Notebook activity that runs once the copy succeeds. The dependsOn field is how you control execution order — the second activity only runs if the first one succeeded.

pipeline_ingest_sales.json
json
// ADF Pipeline — Copy Activity + Databricks Notebook chained together
{
  "name": "PL_Ingest_Sales_Daily",
  "activities": [
    {
      "name": "Copy_Sales_To_Bronze",
      "type": "Copy",
      "inputs":  [{ "referenceName": "DS_SQLServer_Sales", "type": "DatasetReference" }],
      "outputs": [{ "referenceName": "DS_ADLS_Bronze_Sales", "type": "DatasetReference" }],
      "typeProperties": {
        "source": {
          "type": "SqlServerSource",
          "sqlReaderQuery": {
            "value": "SELECT * FROM dbo.Sales WHERE LoadDate = '@{pipeline().parameters.runDate}'",
            "type": "Expression"
          }
        },
        "sink": { "type": "DelimitedTextSink" }
      }
    },
    {
      "name": "Run_Databricks_Transform",
      "type": "DatabricksNotebook",
      "dependsOn": [{ "activity": "Copy_Sales_To_Bronze", "dependencyConditions": ["Succeeded"] }],
      "typeProperties": {
        "notebookPath": "/Shared/bronze_to_silver",
        "baseParameters": { "run_date": "@pipeline().parameters.runDate" }
      }
    }
  ],
  "parameters": {
    // Parameter defaults must be literal strings; ADF does not evaluate
    // expressions in defaultValue. The trigger passes the real date, e.g.
    // "@formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd')".
    "runDate": { "type": "string", "defaultValue": "" }
  }
}
🎯 Pro Tip
You rarely write this JSON by hand. ADF has a drag-and-drop visual designer in the Azure Portal. The JSON is what ADF stores behind the scenes — useful to know when you need to commit pipelines to Git for version control.

Trigger types and when to use each

🕐
Schedule Trigger
Daily/hourly batch pipelines

Runs at a fixed time on a recurring schedule. Most common type. Example: every day at 2am UTC.
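
As a sketch, a daily 2am UTC schedule trigger attached to the pipeline from this lesson might look like this (simplified; startTime and names are illustrative):

trigger_daily_2am.json
```json
// ADF Schedule Trigger — fires PL_Ingest_Sales_Daily every day at 2am UTC
{
  "name": "TR_Daily_2am",
  "type": "ScheduleTrigger",
  "typeProperties": {
    "recurrence": {
      "frequency": "Day",
      "interval": 1,
      "startTime": "2026-03-01T02:00:00Z",
      "timeZone": "UTC"
    }
  },
  "pipelines": [
    {
      "pipelineReference": { "referenceName": "PL_Ingest_Sales_Daily", "type": "PipelineReference" },
      "parameters": { "runDate": "@formatDateTime(trigger().scheduledTime, 'yyyy-MM-dd')" }
    }
  ]
}
```

Note how the trigger, not the pipeline, supplies the runDate parameter.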

📁
Storage Event Trigger
Event-driven ingestion

Fires when a file lands in ADLS Gen2. A partner drops a CSV — the pipeline triggers automatically.
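
A simplified sketch of a storage event trigger (the scope resource ID and blob paths are placeholders you would replace):

trigger_new_file.json
```json
// ADF Storage Event Trigger — fires when a .csv lands in the bronze container
{
  "name": "TR_New_Sales_File",
  "type": "BlobEventsTrigger",
  "typeProperties": {
    "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/yourstorageaccount",
    "events": ["Microsoft.Storage.BlobCreated"],
    "blobPathBeginsWith": "/bronze/blobs/sales/",
    "blobPathEndsWith": ".csv"
  }
}
```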

🪟
Tumbling Window
Backfilling historical data

Runs for non-overlapping time windows. Perfect for reprocessing 90 days of history one day at a time, in parallel.
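
A simplified sketch of a tumbling window trigger that backfills one day per window, with several windows running in parallel (dates and names are illustrative):

trigger_backfill.json
```json
// ADF Tumbling Window Trigger — one run per 24-hour window, starting in
// the past to reprocess history; up to 10 windows run concurrently
{
  "name": "TR_Backfill_Sales",
  "type": "TumblingWindowTrigger",
  "typeProperties": {
    "frequency": "Hour",
    "interval": 24,
    "startTime": "2025-12-01T00:00:00Z",
    "maxConcurrency": 10,
    "retryPolicy": { "count": 2, "intervalInSeconds": 120 }
  },
  "pipeline": {
    "pipelineReference": { "referenceName": "PL_Ingest_Sales_Daily", "type": "PipelineReference" },
    "parameters": { "runDate": "@formatDateTime(trigger().outputs.windowStartTime, 'yyyy-MM-dd')" }
  }
}
```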

⚠️ Important
Always set up failure alerts on production pipelines. ADF can send emails or Teams notifications when a pipeline fails. A pipeline failing silently for three days without anyone noticing is a real, career-damaging incident.

🎯 Key Takeaways

  • ADF is an orchestration tool — it moves data and triggers other services, it does not transform data itself
  • Six core concepts: Pipelines (containers), Activities (steps), Datasets (data references), Linked Services (connections), Triggers (starters), Integration Runtimes (compute)
  • Always store credentials in Azure Key Vault — never hardcode them directly in Linked Services
  • dependsOn controls execution order — the next activity only runs when the previous one succeeds
  • Three trigger types: Schedule (time-based), Storage Event (file arrives), Tumbling Window (backfills)
  • The Monitor tab in ADF is your first stop when a pipeline run fails in production
  • ADF integrates natively with Databricks — triggering notebooks with parameters is the most common pipeline pattern
📄 Resume Bullet Points
Copy these directly to your resume — tailored from this lesson

Built Azure Data Factory pipelines orchestrating Databricks notebook activities with dependency chaining and retry logic

Configured ADF Schedule and Storage Event triggers to automate daily batch pipeline execution at 2am UTC

Implemented ADF pipeline monitoring and alerting using Azure Monitor — achieving 99.5% pipeline success rate

