
Cloud Pub/Sub

Cloud Pub/Sub is GCP's real-time messaging service. It decouples event producers from consumers and scales to millions of messages per second, making it the streaming foundation of most GCP data pipelines.

12 min read · March 2026

What Pub/Sub does and why it matters

Cloud Pub/Sub is GCP's equivalent of Azure Event Hubs and Amazon Kinesis. Its core job is decoupling — separating the services that produce events (your web app, mobile app, IoT device) from the services that consume and process them (Dataflow, BigQuery). Producers publish messages to a topic. Consumers subscribe to that topic and receive messages.

This decoupling means if your Dataflow processing job goes down for maintenance, messages queue up in Pub/Sub and are delivered when it comes back. No events are lost. Producers and consumers can scale independently.
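The decoupling and fan-out semantics can be sketched with a toy in-memory version. This is not the real Pub/Sub client, just a stdlib illustration of the model: each subscription buffers its own copy of every message, so a consumer that was down can drain its backlog when it returns.

```python
from collections import defaultdict, deque

class ToyTopic:
    """Minimal in-memory stand-in for a Pub/Sub topic with fan-out."""
    def __init__(self):
        self.subscriptions = defaultdict(deque)  # subscription name -> buffered messages

    def create_subscription(self, name):
        self.subscriptions[name]  # each subscription gets its own independent buffer

    def publish(self, message):
        # Every subscription receives its own copy of the message
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, name):
        # A consumer drains its buffer whenever it comes back online
        queue = self.subscriptions[name]
        return [queue.popleft() for _ in range(len(queue))]

topic = ToyTopic()
topic.create_subscription('dataflow-sub')
topic.create_subscription('bigquery-sub')

topic.publish({'order_id': 'ORD-1'})  # the producer doesn't know who consumes
topic.publish({'order_id': 'ORD-2'})

# Both subscriptions see both messages, independently
print(topic.pull('dataflow-sub'))  # [{'order_id': 'ORD-1'}, {'order_id': 'ORD-2'}]
print(topic.pull('bigquery-sub'))  # [{'order_id': 'ORD-1'}, {'order_id': 'ORD-2'}]
```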

📤
Topics

Where producers publish messages. Think of a topic as a named channel. Your app publishes sale events to the sales-events topic.

📥
Subscriptions

Where consumers receive messages. A topic can have multiple subscriptions — Dataflow and BigQuery can both subscribe to the same topic.

✅
Acknowledgements

Messages must be acknowledged after processing. Unacknowledged messages are redelivered. This guarantees at-least-once delivery.
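The ack-or-redeliver loop can be sketched in plain Python. Again a toy model, not the real client: a handler that crashes before acking puts the message back on the queue, which is exactly why at-least-once delivery can produce duplicates.

```python
from collections import deque

class ToySubscription:
    """Toy model of ack/redelivery: unacked messages go back on the queue."""
    def __init__(self, messages):
        self.queue = deque(messages)

    def deliver(self, handler):
        msg = self.queue.popleft()
        try:
            handler(msg)             # only a successful handler run counts as an ack
        except Exception:
            self.queue.append(msg)   # no ack -> redelivered later

sub = ToySubscription(['m1', 'm2'])
attempts = []

def flaky_handler(msg):
    attempts.append(msg)
    if msg == 'm1' and attempts.count('m1') == 1:
        raise RuntimeError('crash before ack')

while sub.queue:
    sub.deliver(flaky_handler)

print(attempts)  # ['m1', 'm2', 'm1'] — m1 was redelivered after the failure
```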

Publishing events to Pub/Sub

pubsub_publisher.py
python
# Pub/Sub Publisher — send events to a topic
from google.cloud import pubsub_v1
import json
from datetime import datetime, timezone

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('your-project', 'sales-events')

def publish_sale_event(order_id: str, customer_id: str, amount: float, region: str):
    event = {
        'order_id':   order_id,
        'customer_id': customer_id,
        'amount':     amount,
        'region':     region,
        'timestamp':  datetime.now(timezone.utc).isoformat(),
    }
    }
    # Messages must be bytes
    data = json.dumps(event).encode('utf-8')
    
    # Add attributes for server-side filtering
    future = publisher.publish(
        topic_path,
        data=data,
        region=region,           # Attribute for subscription filtering
        event_type='sale'
    )
    print(f"Published message ID: {future.result()}")

# Publish 100 sale events
for i in range(100):
    publish_sale_event(f'ORD-{i}', f'CUST-{i % 20}', 99.99 * i, 'US-EAST')

Subscribing and processing messages

pubsub_subscriber.py
python
# Pub/Sub Subscriber — pull and process messages
from google.cloud import pubsub_v1
import json

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('your-project', 'sales-events-sub')

def process_message(message: pubsub_v1.subscriber.message.Message):
    data = json.loads(message.data.decode('utf-8'))
    print(f"Processing order {data['order_id']} — ${data['amount']} from {data['region']}")
    
    # Your processing logic here
    # save_to_database(data)
    
    # IMPORTANT: Always acknowledge the message after processing
    # Unacknowledged messages are redelivered
    message.ack()

# Pull messages with streaming pull (most efficient)
streaming_pull_future = subscriber.subscribe(
    subscription_path,
    callback=process_message
)

print(f"Listening for messages on {subscription_path}")
with subscriber:
    try:
        streaming_pull_future.result(timeout=60)
    except Exception:
        streaming_pull_future.cancel()   # Trigger the shutdown
        streaming_pull_future.result()   # Block until the shutdown completes
⚠️ Important
Always acknowledge messages after successful processing. If your code throws an error before calling message.ack(), Pub/Sub will redeliver the message. This is intentional — it guarantees no message is lost — but it means your processing logic must handle duplicate messages gracefully (idempotency).
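One common way to make a handler idempotent is to record processed IDs and skip duplicates. A minimal sketch, assuming `order_id` uniquely identifies an event; the in-memory set here is a stand-in, and in production you would use a durable store (a database unique constraint or upsert, or a Redis set):

```python
processed_ids = set()  # stand-in for a durable store, not for production use

def handle_sale(event: dict) -> bool:
    """Idempotent handler: a redelivered duplicate becomes a no-op.
    Returns True if the event was processed, False if it was a duplicate."""
    if event['order_id'] in processed_ids:
        return False  # duplicate delivery — safely ignore
    # ... real work here, e.g. save_to_database(event) ...
    processed_ids.add(event['order_id'])
    return True

print(handle_sale({'order_id': 'ORD-1', 'amount': 99.99}))  # True
print(handle_sale({'order_id': 'ORD-1', 'amount': 99.99}))  # False (duplicate)
```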

🎯 Key Takeaways

  • Pub/Sub decouples event producers from consumers — equivalent to Azure Event Hubs and Amazon Kinesis
  • Topics are where producers publish. Subscriptions are where consumers receive. One topic can have many subscriptions
  • Always acknowledge messages after processing — unacknowledged messages are redelivered automatically
  • The most common pattern: app publishes to Pub/Sub → Dataflow reads from Pub/Sub → writes to BigQuery
  • Pub/Sub guarantees at-least-once delivery — design your processing logic to handle duplicate messages safely
  • Messages are retained for 7 days by default — useful for replaying events if a consumer goes down