Pub/Sub vs Kafka vs Kinesis — Choosing a Streaming Ingestion Layer
Every real-time data pipeline needs a message broker: a system that ingests high-velocity events and makes them available to downstream consumers. Google Pub/Sub, Apache Kafka, and Amazon Kinesis all do this job, but with different tradeoffs.
Google Pub/Sub
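Pub/Sub is Google's fully managed, global messaging service. Producers publish to topics, and each subscription attached to a topic receives its own copy of the messages. Subscribers receive messages via push (Pub/Sub calls your HTTPS endpoint) or pull (your code requests messages).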
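Key characteristic: at-least-once delivery by default. Messages may be delivered more than once, so your consumer must handle duplicates. For exactly-once processing, use Dataflow with a Pub/Sub source, which deduplicates messages for you. Pub/Sub also offers an exactly-once delivery option on pull subscriptions.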
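To make "handle duplicates" concrete, here is a minimal sketch of an idempotent consumer. It uses no Pub/Sub client library. The (message_id, payload) tuple shape, the function name process_at_least_once, and the in-memory seen_ids set are all assumptions made for illustration. A real consumer would more likely keep seen IDs in a durable store with an expiry.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical message shape: (message_id, payload).
Message = Tuple[str, dict]


def process_at_least_once(
    messages: Iterable[Message],
    handler: Callable[[dict], None],
    seen_ids: Set[str],
) -> int:
    """Run handler once per unique message_id; return the number of duplicates skipped."""
    skipped = 0
    for message_id, payload in messages:
        if message_id in seen_ids:
            skipped += 1          # redelivery: already handled, ignore it
            continue
        handler(payload)
        seen_ids.add(message_id)  # record only after the handler succeeded
    return skipped


# Simulated stream in which the broker redelivers message "m1".
stream = [("m1", {"amount": 10}), ("m2", {"amount": 5}), ("m1", {"amount": 10})]
totals = {"amount": 0}


def add_amount(payload: dict) -> None:
    totals["amount"] += payload["amount"]


seen: Set[str] = set()
duplicates = process_at_least_once(stream, add_amount, seen)
print(totals["amount"], duplicates)  # 15 1
```

Because the ID is recorded only after the handler returns, a crash between the two steps still produces one repeat, so making the handler's own write idempotent (for example, an upsert keyed by message ID) is the safer design.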
Strengths: no infrastructure to manage, global availability, native integration with other GCP services (such as Dataflow, BigQuery, and Cloud Storage), and automatic scaling to millions of messages per second.
Apache Kafka
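Kafka is the industry standard for high-throughput event streaming. It stores messages in a durable, append-only log and retains them for a configurable period (commonly days to weeks), whether or not they have been consumed. Pub/Sub and Kinesis retain messages too, but within limits set by the service. Multiple consumer groups read the same topic independently, each tracking its own offsets.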
This retention and replayability is Kafka's killer feature. A new analytics system can replay 30 days of events from Kafka without a separate archive, provided the topic's retention is set to at least 30 days.
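The replay idea can be illustrated with a toy model. This is not Kafka client code: ToyLog and its methods are hypothetical names for an in-memory stand-in for a single partition, shown only to make the "retained log plus per-group offsets" mechanism visible.

```python
from typing import Dict, List


class ToyLog:
    """In-memory stand-in for one topic partition: an append-only list plus per-group offsets."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def poll(self, group: str, max_records: int = 100) -> List[str]:
        # In this toy model a group that has never read starts at offset 0.
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)  # advance only this group's offset
        return batch

    def seek(self, group: str, offset: int) -> None:
        self._offsets[group] = offset  # rewind (or skip ahead) for one group


log = ToyLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

billing = log.poll("billing")       # reads all three events
analytics = log.poll("analytics")   # a group added later still sees all three
log.seek("billing", 1)              # replay from offset 1 without affecting analytics
replayed = log.poll("billing")
print(billing, analytics, replayed)
```

Reading does not delete anything, which is why a second group or a rewound offset sees the same records. In Kafka itself, a brand-new consumer group starts from the earliest retained record only when auto.offset.reset is set to earliest; the default is latest.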
On GCP: use Confluent Cloud (managed Kafka) or Google Cloud Managed Service for Apache Kafka. Self-managing Kafka is operationally complex, so prefer a managed service unless you have a specific reason not to.
Amazon Kinesis
Kinesis is AWS's managed streaming service. Kinesis Data Streams retains records for 24 hours by default, extendable up to 365 days. Amazon Data Firehose (formerly Kinesis Data Firehose) automatically delivers streams to destinations such as S3, Redshift, or OpenSearch without writing consumer code.
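Kinesis shards are the unit of throughput: each shard supports up to 1 MB/s of writes and 2 MB/s of reads. In provisioned mode you choose the shard count upfront and reshard as traffic changes; on-demand mode adjusts capacity for you. Kafka is similar in that you choose the partition count yourself, whereas Pub/Sub scales automatically.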
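The per-shard limits translate directly into a sizing calculation. The sketch below uses only the two figures quoted above. The function name shards_needed and the example traffic numbers are hypothetical, and it ignores the per-shard records-per-second limit and any headroom you would add in practice.

```python
import math

WRITE_MB_PER_SHARD = 1.0  # from the text: 1 MB/s write per shard
READ_MB_PER_SHARD = 2.0   # from the text: 2 MB/s read per shard


def shards_needed(write_mb_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies both the write and the read limit."""
    by_write = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_read = math.ceil(read_mb_s / READ_MB_PER_SHARD)
    return max(1, by_write, by_read)  # a stream always has at least one shard


print(shards_needed(4.5, 9.0))   # 5: write needs 5, read needs 5
print(shards_needed(2.0, 12.0))  # 6: read-bound, since 12 / 2 = 6
```

Whichever direction needs more shards sets the count, so a stream with several heavy readers can be read-bound even when writes are modest.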
For AWS-native stacks: Firehose is usually the easiest way to get streaming data into S3 for downstream batch processing.
How to choose
GCP stack: Pub/Sub is the default. Add Kafka only if you need long-term message retention and replay beyond what Pub/Sub's retention settings cover.
AWS stack: Kinesis for AWS-native integration. Amazon MSK (managed Kafka) when you need Kafka-style replay or want to stay portable across clouds.
Multi-cloud or cloud-neutral: Kafka. Confluent Cloud runs on AWS, Azure, and GCP with the same API.
For the job market: Kafka knowledge is the most transferable. Pub/Sub and Kinesis skills are cloud-specific.
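The platform rules above can be written down as a small lookup, which is sometimes handy as a starting point for an architecture checklist. This is only a sketch of the article's rules of thumb: the function name choose_broker, its parameters, and the returned labels are hypothetical, and real decisions involve more inputs (cost, team skills, existing tooling).

```python
def choose_broker(cloud: str, needs_kafka_replay: bool = False) -> str:
    """Encode the rules of thumb: cloud-native default, Kafka for replay or portability."""
    cloud = cloud.strip().lower()
    if cloud == "gcp":
        return "Managed Kafka on GCP" if needs_kafka_replay else "Pub/Sub"
    if cloud == "aws":
        return "Amazon MSK" if needs_kafka_replay else "Kinesis"
    # Anything else is treated as multi-cloud or cloud-neutral.
    return "Kafka (e.g. Confluent Cloud)"


print(choose_broker("GCP"))                           # Pub/Sub
print(choose_broker("aws", needs_kafka_replay=True))  # Amazon MSK
print(choose_broker("multi-cloud"))                   # Kafka (e.g. Confluent Cloud)
```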