Azure Key Vault
Azure Key Vault is a managed secrets store. Store passwords, connection strings, API keys, and certificates in one place — and let your pipelines retrieve them at runtime using Managed Identity, with zero credentials in your code.
Why Key Vault Exists
Secrets in code are the most common security mistake in data engineering. A connection string hardcoded in a Databricks notebook gets committed to Git. A storage account key in an ADF config gets shared in a Teams message. These mistakes happen constantly — and once a secret is in Git history, it must be treated as compromised even after deletion, because the history still contains it.
Key Vault solves this by being the single place where secrets live. Your code never contains the actual value — it contains a reference to Key Vault. At runtime, the pipeline authenticates using Managed Identity, retrieves the secret, and uses it. The value never touches your code or logs.
Key Vault with Azure Databricks
Databricks has first-class Key Vault integration through secret scopes. Create the scope once, then use dbutils.secrets.get() anywhere in your notebooks. Secret values are automatically redacted in all notebook output.
# Use Key Vault secrets in Databricks notebooks
# First: create a secret scope backed by Key Vault (one-time setup via the Databricks CLI,
# authenticated with an Azure AD token; the vault's resource ID and DNS name are required)
# databricks secrets create-scope --scope kv-scope --scope-backend-type AZURE_KEYVAULT \
#   --resource-id <key-vault-resource-id> --dns-name <key-vault-dns-name>
# Then in any notebook — secret values are NEVER shown in output
storage_key = dbutils.secrets.get(scope="kv-scope", key="adls-storage-key")
db_password = dbutils.secrets.get(scope="kv-scope", key="sql-db-password")
api_key = dbutils.secrets.get(scope="kv-scope", key="resend-api-key")
# Configure Spark to use the secret — actual value never visible
spark.conf.set(
    "fs.azure.account.key.yourstorage.dfs.core.windows.net",
    storage_key
)
# Read from ADLS Gen2 — fully authenticated via Key Vault
df = spark.read.parquet(
    "abfss://silver@yourstorage.dfs.core.windows.net/sales/"
)
df.show()
# If you accidentally print(storage_key), it shows [REDACTED]
# Databricks masks all secret values in notebook output automatically

Key Vault with Azure Data Factory
Create a Key Vault Linked Service in ADF first, then reference it from all other Linked Services instead of entering passwords directly. Your ADF pipeline JSON contains zero credentials — safe to commit to Git and share with the team.
// ADF Linked Service referencing Key Vault — zero credentials in config
// In ADF Studio: Manage > Linked Services > New > Azure SQL Database
// Select Key Vault for authentication instead of typing a password
{
  "name": "LS_AzureSQL_Production",
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "yourserver.database.windows.net",
    "database": "SalesDB",
    "encrypt": true,
    "authenticationType": "SQL",
    "userName": "pipeline_user",
    "password": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "LS_KeyVault",
        "type": "LinkedServiceReference"
      },
      "secretName": "sql-db-password"
    }
  }
}
// This ADF pipeline JSON has zero credentials
// Safe to commit to Git — the password lives only in Key Vault
// Rotate the password in Key Vault — ADF picks it up automatically

Key Vault from Python — Managed Identity
When running Python code in Azure — Functions, Container Apps, VMs — use DefaultAzureCredential. It automatically uses the Managed Identity when running in Azure, and your local az login when developing locally. No credentials in code at all.
# Access Key Vault from Python using Managed Identity
# Works in Azure Functions, Container Apps, VMs — no credentials needed
# pip install azure-keyvault-secrets azure-identity
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
# DefaultAzureCredential uses Managed Identity in Azure
# and your local az login account for local development
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://your-vault.vault.azure.net/",
    credential=credential
)
# Get secrets
db_password = client.get_secret("sql-db-password").value
api_key = client.get_secret("resend-api-key").value
# Set a secret (needs Key Vault Secrets Officer role)
client.set_secret("new-secret-name", "secret-value")
# List all secret names (values are not returned in list)
for secret_props in client.list_properties_of_secrets():
    print(secret_props.name)

RBAC Roles — Who Gets What Access
| Role | Permission | Assign To |
|---|---|---|
| Key Vault Secrets User | Read secret values | Databricks clusters, ADF pipelines, Azure Functions |
| Key Vault Secrets Officer | Read, write, delete secrets | DevOps pipelines, admin scripts |
| Key Vault Administrator | Full control including policies | Security team only |
| Key Vault Reader | View metadata, not values | Auditors, monitoring tools |
Best Practices
Separate Key Vaults for dev, staging, and production. Dev secrets should never be accessible from production services.
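One way to enforce that separation in code is to derive the vault URL from the deployment environment instead of hardcoding it. A minimal sketch — the environment variable name and vault names here are assumptions, not anything mandated by Azure:

```python
import os

# Hypothetical vault names — one Key Vault per environment
VAULT_URLS = {
    "dev": "https://myproject-kv-dev.vault.azure.net/",
    "staging": "https://myproject-kv-stg.vault.azure.net/",
    "prod": "https://myproject-kv-prod.vault.azure.net/",
}

def vault_url_for(env: str) -> str:
    """Return the Key Vault URL for the given environment, failing loudly on typos."""
    try:
        return VAULT_URLS[env]
    except KeyError:
        raise ValueError(f"Unknown environment {env!r}; expected one of {sorted(VAULT_URLS)}")

# The pipeline reads its environment from config — a prod URL never appears in dev code
current_env = os.environ.get("DEPLOY_ENV", "dev")
vault_url = vault_url_for(current_env)
```

Because the mapping fails on unknown environment names, a misconfigured deployment stops immediately rather than silently reading the wrong vault.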
Set expiry dates on secrets. Key Vault can notify you before expiry via near-expiry events. Rotate database passwords every 90 days.
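In azure-keyvault-secrets, `set_secret` accepts an `expires_on` datetime, so a rotation script can stamp each new secret with its 90-day deadline. A sketch under that assumption — the vault URL is a placeholder, and the Azure call is kept in its own function so the expiry logic stands alone:

```python
from datetime import datetime, timedelta, timezone

ROTATION_DAYS = 90  # rotate database passwords every 90 days

def rotation_expiry(now=None):
    """Expiry timestamp for a freshly rotated secret: ROTATION_DAYS from now, UTC."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(days=ROTATION_DAYS)

def store_rotated_secret(name, value):
    # Imports kept local so the expiry helper runs without the Azure SDK installed
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://your-vault.vault.azure.net/",  # placeholder vault
        credential=DefaultAzureCredential(),
    )
    # expires_on becomes the secret's expiry in Key Vault, which drives
    # the near-expiry notifications mentioned above
    client.set_secret(name, value, expires_on=rotation_expiry())
```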
Every secret access is logged to Azure Monitor. If credentials are compromised, you can see exactly what was accessed and when.
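To act on those logs, route the vault's diagnostic settings to a Log Analytics workspace and query them with KQL. A sketch that builds such a query in Python — the table and column names (`AzureDiagnostics`, `CallerIPAddress`, `ResultSignature`) follow the common Key Vault diagnostic schema but should be verified against your own workspace:

```python
def secret_access_query(operation="SecretGet", lookback_hours=24):
    """Build a KQL query for recent Key Vault secret accesses."""
    return (
        "AzureDiagnostics\n"
        '| where ResourceProvider == "MICROSOFT.KEYVAULT"\n'
        f'| where OperationName == "{operation}"\n'
        f"| where TimeGenerated > ago({lookback_hours}h)\n"
        "| project TimeGenerated, OperationName, CallerIPAddress, ResultSignature\n"
        "| order by TimeGenerated desc"
    )

print(secret_access_query())
```

The query can be pasted into Log Analytics directly, or run programmatically (for example with the azure-monitor-query package) from an incident-response script.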
Connection strings and storage account keys in code get committed to Git. Always use Key Vault references or Managed Identity.
Setting It Up — Step by Step
Azure Portal > Create a resource > Key Vault. Choose the same region as your other resources. Enable RBAC authorization (not access policies).
Key Vault > Secrets > Generate/Import. Add each secret: adls-storage-key, sql-db-password, resend-api-key. Name them clearly.
For each service that needs secrets: find its Managed Identity, go to Key Vault > Access control (IAM) > Add role assignment > Key Vault Secrets User.
In Databricks: Settings > Developer > Secret Scopes > Create. Link to your Key Vault resource ID and DNS name.
In notebooks: replace hardcoded keys with dbutils.secrets.get(). In ADF: update Linked Services to reference Key Vault secrets.
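The role-assignment step above can also be scripted. A hedged sketch using azure-mgmt-authorization — the subscription ID, principal ID, and vault scope are placeholders, and the built-in role GUID for Key Vault Secrets User should be double-checked against the Azure built-in roles list before use:

```python
import uuid

# GUID commonly documented for the built-in "Key Vault Secrets User" role
# (assumption — verify against the Azure built-in roles reference)
SECRETS_USER_ROLE = "4633458b-17de-408a-b874-0445c86b69e6"

def role_definition_id(subscription_id, role_guid=SECRETS_USER_ROLE):
    """Full resource ID of a built-in role definition within a subscription."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Authorization/roleDefinitions/{role_guid}"
    )

def grant_secrets_user(vault_scope, principal_id, subscription_id):
    # Local imports so the helper above runs without the Azure SDK installed
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient

    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
    client.role_assignments.create(
        scope=vault_scope,  # the Key Vault's full resource ID
        role_assignment_name=str(uuid.uuid4()),  # assignments are keyed by a new GUID
        parameters={
            "properties": {
                "role_definition_id": role_definition_id(subscription_id),
                "principal_id": principal_id,  # the service's Managed Identity object ID
            }
        },
    )
```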
🎯 Key Takeaways
- ✓ Never put secrets in code, config files, or environment variables — use Key Vault
- ✓ Managed Identity authenticates your Azure services to Key Vault with zero credentials in code
- ✓ Databricks secret scopes backed by Key Vault redact values automatically in notebook output
- ✓ ADF Linked Services referencing Key Vault make your pipeline JSON safe to commit to Git
- ✓ Use separate Key Vaults for dev, staging, and production
- ✓ Enable audit logging — every secret access is tracked for security review
Discussion
Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.