
Amazon Kinesis Firehose Explained — The Easiest Way to Stream Data into S3

February 26, 2026 · 5 min read · by Asil

Kinesis Data Firehose (renamed Amazon Data Firehose in 2024) is one of the simplest ways to ingest streaming data in AWS. Kinesis Data Streams requires you to write consumer code; Firehose instead delivers streaming data automatically to destinations such as S3, Redshift, OpenSearch Service, Splunk, or HTTP endpoints, with no consumer code needed.

What Firehose does

You point an application (the producer) at a Firehose delivery stream. Firehose buffers the incoming records until either a size threshold or a time threshold is reached, whichever comes first, optionally transforms them with a Lambda function, and then delivers each batch to your destination.
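A toy model of the "whichever threshold comes first" rule makes the buffering behavior concrete. This is a sketch under simple assumptions (in-memory records, an injectable clock), not how Firehose is implemented internally; `SizeOrTimeBuffer` and its parameters are hypothetical names standing in for the size and interval buffering hints you configure on the stream (for S3 the defaults are 5 MB and 300 seconds).

```python
import time
from typing import Callable, List, Optional


class SizeOrTimeBuffer:
    """Toy model of size-or-time buffering: flush on size OR age, whichever comes first."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes        # stands in for the size buffering hint
        self.max_seconds = max_seconds    # stands in for the interval buffering hint
        self.clock = clock                # injectable so the behavior is testable
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None  # arrival time of the oldest buffered record

    def add(self, record: bytes) -> Optional[List[bytes]]:
        """Buffer one record and return a batch if either threshold is now reached."""
        if self._opened_at is None:
            self._opened_at = self.clock()
        self._records.append(record)
        self._size += len(record)
        return self.flush_if_due()

    def flush_if_due(self) -> Optional[List[bytes]]:
        """Return and reset the batch if it is big enough or old enough; otherwise None."""
        if self._opened_at is None:
            return None
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._opened_at >= self.max_seconds
        if not (too_big or too_old):
            return None
        batch = self._records
        self._records, self._size, self._opened_at = [], 0, None
        return batch


if __name__ == "__main__":
    fake_now = [0.0]
    buf = SizeOrTimeBuffer(max_bytes=10, max_seconds=60, clock=lambda: fake_now[0])
    print(buf.add(b"hello"))   # None: 5 bytes after 0 s
    print(buf.add(b"world"))   # [b'hello', b'world']: size threshold reached
    buf.add(b"x")
    fake_now[0] = 61.0
    print(buf.flush_if_due())  # [b'x']: interval elapsed
```

In this model `flush_if_due` would also be called on a timer, because a quiet stream still flushes once the interval elapses; that is why low-traffic streams tend to produce many small S3 objects.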

For S3 delivery, Firehose by default writes objects under date-based key prefixes of the form YYYY/MM/dd/HH/ (using the UTC arrival time), turning your stream into organized S3 files that Athena or Glue can query.
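A minimal sketch of how an arrival timestamp maps to that default key prefix, assuming an optional custom prefix string and a timezone-aware datetime. `default_s3_prefix` is a hypothetical helper, not part of any AWS SDK; because the hour bucket is taken in UTC, the sketch converts to UTC before formatting.

```python
from datetime import datetime, timedelta, timezone


def default_s3_prefix(custom_prefix: str, arrival: datetime) -> str:
    """Return '<custom_prefix>YYYY/MM/dd/HH/' for a record's arrival time (hour bucket in UTC)."""
    if arrival.tzinfo is None:
        # A naive datetime would be interpreted in local time, which is ambiguous here.
        raise ValueError("arrival must be timezone-aware")
    utc = arrival.astimezone(timezone.utc)
    return f"{custom_prefix}{utc:%Y/%m/%d/%H}/"


if __name__ == "__main__":
    # 23:15 at UTC-5 is 04:15 UTC on the next day, so the object lands under hour 04.
    arrival = datetime(2026, 2, 26, 23, 15, tzinfo=timezone(timedelta(hours=-5)))
    print(default_s3_prefix("clickstream/", arrival))  # clickstream/2026/02/27/04/
```

Note that these keys are not Hive-style (year=2026/month=02/...). If you want that layout for Athena partitioning, Firehose custom prefixes with timestamp expressions such as `!{timestamp:yyyy}` can produce it; the sketch covers only the default layout.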

Firehose vs Kinesis Data Streams

Kinesis Data Streams: you control consumers. Records are retained for 24 hours by default, configurable up to 365 days. Multiple consumers read (and can replay) at their own pace. You write the consumer application.

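Kinesis Firehose: no consumer code and automatic delivery to a destination, but no replay. Firehose holds records only until they are delivered (retrying for up to 24 hours if the destination is unavailable); after that, the only copy is at the destination. Simpler to set up, but less flexible.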

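Choose Firehose when you want to get streaming data into S3 quickly without building consumer infrastructure. Choose Data Streams when multiple downstream systems need to consume the same stream independently. The two also combine: a Firehose stream can use a Kinesis data stream as its source, giving you a zero-code path to S3 alongside your other consumers.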

Data transformation with Lambda

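Firehose can invoke a Lambda function on batches of records before delivery. This is where you add custom logic such as schema validation (dropping malformed records) or PII redaction (masking credit card numbers before they reach S3). Converting JSON to a columnar format like Parquet, which often cuts S3 storage and Athena scan costs substantially, does not require Lambda: Firehose has built-in record format conversion (JSON to Parquet or ORC) driven by a table schema in the AWS Glue Data Catalog.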

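Firehose buffers input for the Lambda separately from delivery (1 MB per invocation by default for most destinations, configurable up to 3 MB). The function must return every record it received, each with its original recordId, a result of Ok, Dropped, or ProcessingFailed, and the base64-encoded output data. Firehose handles delivery from there.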
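A minimal sketch of such a transformation Lambda, assuming JSON input records and a hypothetical schema in which `event_type` and `user_id` are required. The event and response shapes (a `records` list whose entries carry `recordId` and base64 `data`, and results of `Ok`, `Dropped`, or `ProcessingFailed`) follow the Firehose data transformation contract; the card-number regex is a deliberately simple illustration, not production-grade PII detection.

```python
import base64
import json
import re

# Hypothetical schema: records missing these fields are dropped.
REQUIRED_FIELDS = ("event_type", "user_id")

# 13 to 16 digits, optionally separated by single spaces or hyphens.
# Deliberately simple; real PII detection needs more than a regex.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card_numbers(text: str) -> str:
    """Mask all but the last four digits of anything that looks like a card number."""
    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_RE.sub(_mask, text)


def _result(record: dict, result: str, data: str) -> dict:
    return {"recordId": record["recordId"], "result": result, "data": data}


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:  # covers bad base64, invalid UTF-8, and invalid JSON
            # ProcessingFailed: Firehose treats the record as failed and, for S3,
            # writes it to the error output location instead of the main prefix.
            output.append(_result(record, "ProcessingFailed", record["data"]))
            continue
        if not isinstance(payload, dict) or not all(f in payload for f in REQUIRED_FIELDS):
            # Dropped: intentionally discarded; Firehose counts it as processed.
            output.append(_result(record, "Dropped", record["data"]))
            continue
        cleaned = {k: mask_card_numbers(v) if isinstance(v, str) else v
                   for k, v in payload.items()}
        # Trailing newline keeps the S3 output as JSON Lines for Glue/Athena.
        encoded = base64.b64encode((json.dumps(cleaned) + "\n").encode("utf-8")).decode("ascii")
        output.append(_result(record, "Ok", encoded))
    # Every input recordId must appear exactly once in the response.
    return {"records": output}
```

Choosing `ProcessingFailed` for unparseable input and `Dropped` for schema violations is a design choice: failed records stay inspectable in S3, while dropped ones are discarded on purpose.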

Real-world use case

Application servers emit clickstream events → Kinesis Firehose → S3 (partitioned by date/hour) → AWS Glue Data Catalog → Athena queries by analysts.
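On the producer side, an application server might send its clickstream events with the AWS SDK for Python (boto3) and the Firehose `PutRecordBatch` API. A minimal sketch, assuming JSON events; the helper names are hypothetical, and the boto3 client is passed in rather than created so the batching logic can be exercised without AWS access.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500           # PutRecordBatch: at most 500 records per request
MAX_BYTES_PER_CALL = 4_000_000       # conservative reading of the 4 MB per-request limit
MAX_BYTES_PER_RECORD = 1000 * 1024   # 1,000 KiB per record (before base64 encoding)


def to_record(event: dict) -> bytes:
    # By default Firehose writes records back to back with no delimiter,
    # so append a newline to keep the S3 objects in JSON Lines form.
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def batches(events: Iterable[dict]) -> Iterator[List[bytes]]:
    """Group encoded events into chunks that respect the per-request limits."""
    batch: List[bytes] = []
    size = 0
    for event in events:
        data = to_record(event)
        if len(data) > MAX_BYTES_PER_RECORD:
            raise ValueError(f"record of {len(data)} bytes exceeds the per-record limit")
        if batch and (len(batch) == MAX_RECORDS_PER_CALL
                      or size + len(data) > MAX_BYTES_PER_CALL):
            yield batch
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield batch


def send(client, stream_name: str, events: Iterable[dict]) -> int:
    """Send events via put_record_batch and return the number of records that failed."""
    failed = 0
    for batch in batches(events):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        # Partial failures do not raise; they are reported in FailedPutCount
        # and in per-record RequestResponses entries that carry an ErrorCode.
        failed += response["FailedPutCount"]
    return failed
```

Usage would look like `send(boto3.client("firehose"), "clickstream-to-s3", events)`, where the stream name is a placeholder. A real producer should also retry the entries whose `RequestResponses` item carries an `ErrorCode`, since a successful call can still contain failed records.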

This pattern requires zero consumer code, makes data queryable in S3 within minutes (bounded by the buffer interval), and typically costs far less than running and operating a Kafka cluster. For AWS-native event collection into S3, Firehose is a sensible default first choice.
