Python · SQL · Web Dev · Java · AI/ML tracks launching soon — your one platform for all of IT
Back to Blog
AWSStorage

Amazon Redshift Best Practices — Distribution Keys, Sort Keys, and Vacuum

February 25, 2026 7 min read✍️ by Asil

A poorly configured Redshift cluster can be 100x slower than a well-configured one for the same query. The three most impactful configuration decisions — distribution keys, sort keys, and vacuum strategy — are what separate a performant Redshift cluster from an expensive slow one.

Distribution keys — how data splits across nodes

Redshift distributes table rows across compute nodes. The distribution key determines which node each row goes to.

KEY distribution: rows with the same distribution key value go to the same node. Use this on join columns — if orders and customers both distribute by customer_id, joins between them require no network transfer.

EVEN distribution: rows distributed round-robin. Good for tables with no dominant join pattern.

ALL distribution: full copy on every node. Use for small dimension tables (under a few million rows) that join frequently with large fact tables.

Sort keys — how data is ordered on disk

Sort keys determine the physical order of rows within each data block. Queries with range filters (WHERE event_date BETWEEN x AND y) skip entire blocks that fall outside the range.

Single sort key: one column. Use when queries consistently filter on one column (usually a date).

Compound sort key: multiple columns ordered left to right. WHERE date = x AND region = y benefits from a compound sort key on (date, region). The leftmost column is most important.

Interleaved sort key: equal weight to all key columns. Better for varied query patterns but slower on VACUUM. Use sparingly.

VACUUM and ANALYZE

When rows are deleted or updated in Redshift, they are not removed immediately — they are marked as deleted and new versions are written. Over time this creates table bloat.

VACUUM reclaims disk space and re-sorts unsorted rows. Run VACUUM on high-churn tables weekly or after large loads.

ANALYZE updates table statistics used by the query planner. Run after significant data changes. Without current statistics, Redshift may choose inefficient join order or scan strategy.

Both VACUUM and ANALYZE can run automatically — enable auto VACUUM and auto ANALYZE in cluster settings.

WLM — Workload Management

WLM controls how queries are queued and prioritized. By default all queries share one queue. This means an analyst running a 10-minute report blocks a 1-second dashboard query.

Configure WLM with at least two queues: a fast queue for short queries with a short timeout, and a slow queue for long-running ETL jobs. Route queries automatically by user group or query group label.

Ready to apply this?

Learn Amazon Redshift