
GCP IAM for Data Engineers — Access Control Without the Confusion

February 28, 2026 · 5 min read · by Asil

GCP IAM (Identity and Access Management) confuses most newcomers because it looks similar to AWS IAM and Azure RBAC but works differently. Understanding how to grant the right access to the right resources is essential for building secure data pipelines on GCP.

The three concepts you need

Member: who is getting access. Can be a Google account, service account, Google group, or domain. (Google's documentation now calls this a principal.)

Role: what access is being granted. A role is a collection of permissions. Predefined roles (e.g., roles/bigquery.dataEditor) bundle common permissions. Custom roles let you create exactly the permission set you need.

Binding: the connection between a member and a role on a specific resource. Example: grant service account X the role roles/bigquery.dataEditor on dataset Y.
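A binding is easiest to see from the gcloud CLI. A minimal sketch, assuming a placeholder project my-project and a placeholder service account etl-runner:

```shell
# member (the who) + role (the what) on a resource = a binding.
# Here the resource is the project itself.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:etl-runner@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```

The same add-iam-policy-binding pattern exists for finer-grained resources (subscriptions, datasets, buckets), which is how you keep grants scoped tightly.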

Service accounts for pipelines

Service accounts are the GCP equivalent of AWS IAM roles or Azure Service Principals. Your Dataflow jobs, Composer DAGs, and GCE instances use service accounts to authenticate to other GCP services.

Best practice: create one service account per workload with only the permissions it needs. A Dataflow job that reads from Pub/Sub and writes to BigQuery needs:

- roles/pubsub.subscriber on the subscription

- roles/bigquery.dataEditor on the target dataset

- roles/bigquery.jobUser on the project

Nothing more. Principle of least privilege.
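Putting that into commands, a sketch with hypothetical names (my-project, a subscription orders-sub, a dataset analytics):

```shell
# One service account per workload
gcloud iam service-accounts create dataflow-orders \
  --display-name="Dataflow orders pipeline"

SA="dataflow-orders@my-project.iam.gserviceaccount.com"

# Read from the Pub/Sub subscription only
gcloud pubsub subscriptions add-iam-policy-binding orders-sub \
  --member="serviceAccount:${SA}" \
  --role="roles/pubsub.subscriber"

# Write to the target BigQuery dataset only (dataset-level grant via bq)
bq add-iam-policy-binding \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.dataEditor" \
  my-project:analytics

# Run BigQuery jobs in the project
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.jobUser"
```

Note the scoping: the subscriber role is granted on one subscription and dataEditor on one dataset, not on the whole project.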

Common data engineering role bindings

BigQuery analyst (read only): roles/bigquery.dataViewer + roles/bigquery.jobUser

Dataflow pipeline runner: roles/dataflow.developer + roles/bigquery.dataEditor + roles/storage.objectViewer

Composer DAG runner: roles/composer.worker + service-specific roles for each GCP service the DAGs call

GCS pipeline: roles/storage.objectCreator (write) + roles/storage.objectViewer (read)
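For human analysts, the cleanest pattern is to bind roles to a Google group rather than to individuals. A sketch, with a hypothetical group data-analysts@example.com:

```shell
# Read-only BigQuery access for the whole analyst group
gcloud projects add-iam-policy-binding my-project \
  --member="group:data-analysts@example.com" \
  --role="roles/bigquery.dataViewer"

# jobUser so they can actually run queries
gcloud projects add-iam-policy-binding my-project \
  --member="group:data-analysts@example.com" \
  --role="roles/bigquery.jobUser"
```

Onboarding then becomes a group-membership change instead of an IAM change.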

Workload Identity — the modern approach

Workload Identity lets GKE workloads (including Composer, which runs on GKE) use service accounts without downloading key files. The workload authenticates using its Kubernetes service account, which is mapped to a GCP service account. (Dataflow is keyless by a different route: its workers simply run as the service account you attach to the job.)

This eliminates the biggest security risk in GCP pipelines: service account JSON key files getting committed to Git or exposed in container images. Enable Workload Identity on all new GKE-based data engineering projects, and use the related Workload Identity Federation for workloads running outside Google Cloud (CI systems, other clouds) so they never need exported keys either.
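Wiring a GKE workload to a service account takes two steps: an IAM binding that lets the Kubernetes service account (KSA) impersonate the GCP service account (GSA), plus an annotation on the KSA. A sketch with hypothetical names (namespace pipelines, KSA etl-ksa):

```shell
GSA="dataflow-orders@my-project.iam.gserviceaccount.com"

# 1. Allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding "${GSA}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[pipelines/etl-ksa]"

# 2. Annotate the KSA so GKE knows which GSA it maps to
kubectl annotate serviceaccount etl-ksa \
  --namespace pipelines \
  iam.gke.io/gcp-service-account="${GSA}"
```

Pods using etl-ksa then authenticate as the GSA automatically; no JSON key ever exists to leak.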