FoundationsIntermediate+150 XP

PostgreSQL for Data Engineers

The most important database skill you can learn. From absolute zero — install PostgreSQL, understand tables, write real queries, and build skills that every data engineering job requires.

90–120 min March 2026

SQL for Data Engineers Python for Data Engineers

Level: Absolute Beginner

Requires: Nothing — start from zero

Time: 90–120 minutes

Used in: Almost every DE job

What you will learn

Install PostgreSQL, create a real database, write SELECT / INSERT / UPDATE / DELETE queries, understand JOINs, and learn the patterns data engineers use every single day.

What is PostgreSQL?

PostgreSQL (often called "Postgres") is a relational database — a system that stores data in organised tables with rows and columns, like a very powerful, programmable spreadsheet.

It has been in development since 1986, it is completely free and open source, and it is trusted by companies like Apple, Instagram, Spotify, and thousands of others to store billions of rows of data.

🆓

Free Forever

Open source — no licencing cost, ever. Used by startups and Fortune 500 companies equally.

💪

Extremely Powerful

Handles everything from 1,000 rows to billions of rows. Used in production at massive scale.

🏢

Industry Standard

If a job says "SQL experience required" — PostgreSQL knowledge covers it completely.

PostgreSQL vs Other Databases

Database	Type	Free?	When used in DE
PostgreSQL	Relational	✅ Yes	Source systems, staging, metadata stores
MySQL	Relational	✅ Yes	Web app backends, OLTP source systems
SQL Server	Relational	❌ Paid	Enterprise source systems (common in Azure)
BigQuery	Cloud DW	💳 Pay per query	GCP analytics warehouse
Snowflake	Cloud DW	💳 Pay per use	Multi-cloud analytics warehouse
MongoDB	Document	✅ Free tier	Semi-structured / JSON data

💡 Note

As a data engineer you will constantly pull data from PostgreSQL (source systems) and sometimes use it as a staging or metadata store. Understanding it deeply is non-negotiable.

PHASE 1 — INSTALL POSTGRESQL

Step 1 — Download and Install PostgreSQL

Go to postgresql.org/download → choose your operating system → download the installer.

🪟 Windows

1.Run the .exe installer
2.Click Next through all steps
3.Set a password for the postgres user — remember this!
4.Keep default port: 5432
5.pgAdmin will be installed automatically

🍎 Mac

1.Download the .dmg installer
2.Drag to Applications
3.Or use Homebrew: brew install postgresql@16
4.Start with: brew services start postgresql@16
5.pgAdmin: download separately from pgadmin.org

⚠️ Important

Remember the password you set for the postgres user during installation. You will need it every time you connect. If you forget it, reinstalling is the easiest fix.

Step 2 — Open pgAdmin (Your Visual Interface)

pgAdmin is the free visual tool that comes with PostgreSQL. It lets you browse databases, run queries, and see your data without typing everything in a terminal.

Open pgAdmin from your Start Menu / Applications → it opens in your browser → enter the password you set during installation.

pgAdmin left panel structure

Servers

└── PostgreSQL 16

└── Databases

└── postgres ← default database, always exists

└── Schemas → public → Tables

PHASE 2 — UNDERSTAND THE BASICS

Core Concepts You Must Know

Term	What It Means	Example
Database	A container that holds all your tables and data	freshmart_db
Table	Like a spreadsheet — rows and columns of data	sales, customers, products
Row	One single record in a table	One sale transaction
Column	A field/attribute every row has	order_id, amount, date
Schema	A folder inside a database to organise tables	raw, silver, gold
Query	A question you ask the database in SQL	SELECT * FROM sales
Primary Key	A column that uniquely identifies each row	order_id
Foreign Key	A column that links to a primary key in another table	store_id → stores.store_id
Index	A speed-up structure so queries find rows faster	Index on order_date
NULL	A missing or unknown value — not zero, not empty string	A sale with no discount

Data Types — What Type Does Each Column Store?

Every column in a table must have a data type. This tells PostgreSQL what kind of data to expect and how to store it efficiently.

Type	Use For	Example
INTEGER	Whole numbers	1, 42, 1000
BIGINT	Very large whole numbers	Row counts, IDs in large tables
NUMERIC(p,s)	Exact decimals (money)	NUMERIC(10,2) → 29999.99
VARCHAR(n)	Text with max length	VARCHAR(100) for names
TEXT	Unlimited length text	Descriptions, notes
BOOLEAN	True / False	is_active, is_deleted
DATE	Date only	2024-01-15
TIMESTAMP	Date + Time	2024-01-15 09:30:00
JSONB	JSON data (binary, fast)	API responses, flexible fields

PHASE 3 — CREATE YOUR FIRST DATABASE

Step 3 — Create a Database

In pgAdmin → right-click Databases → Create → Database. Or open the Query Tool and run:

SQL — Create Database

CREATE DATABASE freshmart_db;

Now connect to it — in pgAdmin click on freshmart_db in the left panel, then open the Query Tool (Tools → Query Tool).

Step 4 — Create Your First Table

We will recreate the FreshCart scenario in PostgreSQL. Let us create a stores table and a sales table.

SQL — Create stores table

CREATE TABLE stores (
  store_id    VARCHAR(10)  PRIMARY KEY,
  store_name  VARCHAR(100) NOT NULL,
  city        VARCHAR(100) NOT NULL,
  state       VARCHAR(50)  NOT NULL,
  opened_date DATE
);

What each part means:

store_id VARCHAR(10) PRIMARY KEY→ Text column, max 10 chars, must be unique — identifies each store

store_name VARCHAR(100) NOT NULL→ Text column, max 100 chars, cannot be empty

city VARCHAR(100) NOT NULL→ City name — required

opened_date DATE→ Date only — no time. NULL allowed (we did not say NOT NULL)

SQL — Create sales table

CREATE TABLE sales (
  order_id      VARCHAR(20)    PRIMARY KEY,
  store_id      VARCHAR(10)    NOT NULL REFERENCES stores(store_id),
  product_name  VARCHAR(200)   NOT NULL,
  category      VARCHAR(100),
  quantity      INTEGER        NOT NULL DEFAULT 1,
  unit_price    NUMERIC(10,2)  NOT NULL,
  total_amount  NUMERIC(10,2)  GENERATED ALWAYS AS (quantity * unit_price) STORED,
  order_date    DATE           NOT NULL,
  created_at    TIMESTAMP      DEFAULT NOW()
);

🎯 Pro Tip

REFERENCES stores(store_id) is a Foreign Key — it means everystore_id in the sales table MUST exist in the stores table first. PostgreSQL will reject any sale for a store that does not exist. This is called referential integrity.

Step 5 — Insert Data

Tables are empty after creation. Let us add some FreshCart data.

SQL — Insert stores

INSERT INTO stores (store_id, store_name, city, state, opened_date)
VALUES
  ('ST001', 'FreshCart New York',   'New York',  'Delhi',     '2020-01-15'),
  ('ST002', 'FreshCart New York',      'New York',     'Maharashtra','2020-03-10'),
  ('ST003', 'FreshCart Seattle',   'Seattle',  'Karnataka', '2020-06-01'),
  ('ST004', 'FreshCart Chicago',     'Chicago',    'Tamil Nadu','2021-01-20'),
  ('ST005', 'FreshCart Austin',   'Austin',  'Telangana', '2021-04-05');

SQL — Insert sales

INSERT INTO sales (order_id, store_id, product_name, category, quantity, unit_price, order_date)
VALUES
  ('ORD1001', 'ST001', 'Basmati Rice 5kg',    'Grocery',      12,  299.00, '2024-01-15'),
  ('ORD1002', 'ST001', 'Samsung TV 43inch',   'Electronics',   2, 32000.00,'2024-01-15'),
  ('ORD1003', 'ST001', 'Amul Butter 500g',    'Dairy',        25,  240.00, '2024-01-15'),
  ('ORD1004', 'ST002', 'Sunflower Oil 1L',    'Grocery',      18,  145.00, '2024-01-15'),
  ('ORD1005', 'ST002', 'iPhone 14',           'Electronics',   1, 75000.00,'2024-01-15'),
  ('ORD1006', 'ST002', 'Amul Milk 1L',        'Dairy',        40,   62.00, '2024-01-15'),
  ('ORD1007', 'ST003', 'Nike Running Shoes',  'Apparel',       5,  4500.00,'2024-01-16'),
  ('ORD1008', 'ST003', 'Colgate Toothpaste',  'Personal Care',30,   89.00, '2024-01-16'),
  ('ORD1009', 'ST004', 'Levis Jeans',         'Apparel',       8,  2999.00,'2024-01-16'),
  ('ORD1010', 'ST005', 'Dove Soap 100g',      'Personal Care',50,   65.00, '2024-01-16');

PHASE 4 — QUERY YOUR DATA

Step 6 — SELECT Queries (Reading Data)

SELECT is the most important SQL command. It retrieves data from your tables.

Basic SELECT patterns

-- Get every row and every column
SELECT * FROM sales;

-- Get only specific columns
SELECT order_id, product_name, total_amount FROM sales;

-- Filter rows with WHERE
SELECT * FROM sales WHERE category = 'Electronics';

-- Multiple conditions
SELECT * FROM sales WHERE category = 'Grocery' AND unit_price > 100;

-- Sort results
SELECT * FROM sales ORDER BY total_amount DESC;

-- Limit number of results
SELECT * FROM sales ORDER BY order_date DESC LIMIT 5;

-- Filter by date
SELECT * FROM sales WHERE order_date = '2024-01-15';

-- Filter by date range
SELECT * FROM sales WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

🎯 Pro Tip

-- is a comment in SQL. Anything after -- on a line is ignored by PostgreSQL. Always comment your queries — your future self will thank you.

Step 7 — Aggregate Functions (Summarising Data)

Aggregate functions calculate a single value from many rows. These are what data engineers use to build summary reports.

Aggregate functions

-- Count all rows
SELECT COUNT(*) FROM sales;

-- Sum of all sales
SELECT SUM(total_amount) FROM sales;

-- Average order value
SELECT ROUND(AVG(total_amount), 2) AS avg_order_value FROM sales;

-- Highest and lowest sale
SELECT MAX(total_amount) AS highest, MIN(total_amount) AS lowest FROM sales;

-- Total sales per category  ← this is GROUP BY
SELECT
  category,
  COUNT(*)              AS order_count,
  SUM(total_amount)     AS total_revenue,
  ROUND(AVG(unit_price),2) AS avg_price
FROM sales
GROUP BY category
ORDER BY total_revenue DESC;

Expected result of the last query:

categoryorder_counttotal_revenueavg_price

Electronics3139000.0046333.33

Apparel226492.003749.50

Dairy25960.00151.00

Step 8 — HAVING (Filter After GROUP BY)

WHERE filters rows before grouping. HAVING filters after grouping. Use HAVING when you want to filter based on an aggregate result.

WHERE vs HAVING

-- WHERE filters individual rows before grouping
SELECT category, SUM(total_amount) AS revenue
FROM sales
WHERE order_date >= '2024-01-01'    -- ← filters rows first
GROUP BY category;

-- HAVING filters groups after aggregation
SELECT category, SUM(total_amount) AS revenue
FROM sales
GROUP BY category
HAVING SUM(total_amount) > 5000;    -- ← only show categories with revenue > 5000

Step 9 — JOINs (Combining Tables)

Real data is almost never in a single table. JOINs combine rows from two or more tables based on a related column. This is one of the most important skills in data engineering.

JOIN Type	Returns	Real World Analogy
INNER JOIN	Only rows that match in BOTH tables	Customers who placed an order
LEFT JOIN	All left rows + matching right rows	All customers, with orders if they have any
RIGHT JOIN	All right rows + matching left rows	All orders, with customer details if available
FULL JOIN	All rows from both tables, matched where possible	Every customer and every order

JOIN examples

-- INNER JOIN — sales with their store name
SELECT
  s.order_id,
  s.product_name,
  s.total_amount,
  st.store_name,
  st.city
FROM sales s
INNER JOIN stores st ON s.store_id = st.store_id;

-- Total revenue per city
SELECT
  st.city,
  COUNT(s.order_id)       AS total_orders,
  SUM(s.total_amount)     AS total_revenue
FROM stores st
LEFT JOIN sales s ON st.store_id = s.store_id
GROUP BY st.city
ORDER BY total_revenue DESC NULLS LAST;

💡 Note

s and st are aliases — short names for tables used inside a query. FROM sales s means "call the sales table 's' in this query". Aliases make long queries much easier to read.

Step 10 — UPDATE and DELETE

UPDATE and DELETE

-- UPDATE a single row
UPDATE stores
SET city = 'San Francisco'
WHERE store_id = 'ST003';

-- UPDATE multiple rows
UPDATE sales
SET category = 'Food & Grocery'
WHERE category = 'Grocery';

-- DELETE a specific row
DELETE FROM sales WHERE order_id = 'ORD1001';

-- DELETE with a condition
DELETE FROM sales WHERE order_date < '2023-01-01';

⚠️ Important

Always use WHERE with UPDATE and DELETE. Running DELETE FROM saleswithout a WHERE clause deletes every single row in the table instantly — with no undo. Always double-check your WHERE condition before running.

PHASE 5 — DATA ENGINEERING PATTERNS

Patterns Data Engineers Use Every Day

These are the SQL patterns you will actually write in a real DE job — not just SELECT * FROM table.

1. UPSERT — Insert or Update

In pipelines you often need to insert new records but update existing ones. This is called an upsert.

UPSERT with ON CONFLICT

INSERT INTO stores (store_id, store_name, city, state)
VALUES ('ST001', 'FreshCart New York Updated', 'New York', 'Delhi')
ON CONFLICT (store_id)
DO UPDATE SET
  store_name = EXCLUDED.store_name,
  city       = EXCLUDED.city;
-- If ST001 exists → UPDATE it. If not → INSERT it.

2. CTEs — Clean Up Complex Queries

A CTE (Common Table Expression) lets you break a complex query into named steps — like variables in a query.

CTE example

WITH daily_totals AS (
  SELECT
    order_date,
    SUM(total_amount) AS daily_revenue
  FROM sales
  GROUP BY order_date
),
high_revenue_days AS (
  SELECT * FROM daily_totals
  WHERE daily_revenue > 50000
)
SELECT * FROM high_revenue_days ORDER BY daily_revenue DESC;

3. Window Functions — Rank and Compare Rows

Window functions calculate a value for each row based on a group of related rows — without collapsing rows like GROUP BY does.

Window function examples

-- Rank products by revenue within each category
SELECT
  product_name,
  category,
  total_amount,
  RANK() OVER (PARTITION BY category ORDER BY total_amount DESC) AS rank_in_category
FROM sales;

-- Running total of revenue by date
SELECT
  order_date,
  total_amount,
  SUM(total_amount) OVER (ORDER BY order_date) AS running_total
FROM sales
ORDER BY order_date;

4. CREATE TABLE AS — Build Summary Tables

In pipelines you often materialise a query result into a new table — this is the Silver → Gold pattern.

Create summary table

-- Create a Gold-layer summary table
CREATE TABLE gold_sales_by_store_category AS
SELECT
  st.store_id,
  st.city,
  s.category,
  COUNT(s.order_id)           AS total_orders,
  SUM(s.total_amount)         AS total_revenue,
  ROUND(AVG(s.unit_price), 2) AS avg_unit_price,
  MIN(s.order_date)           AS first_sale_date,
  MAX(s.order_date)           AS last_sale_date
FROM sales s
INNER JOIN stores st ON s.store_id = st.store_id
GROUP BY st.store_id, st.city, s.category;

📌 Real World Example

This is exactly what the Silver → Gold layer looks like in a real Medallion Architecture — you read from cleaned data (Silver) and write a business-ready aggregated table (Gold) that analysts and dashboards query directly.

Quick Reference — Most Used Commands

DDL — Structure

CREATE TABLE

ALTER TABLE

DROP TABLE

TRUNCATE TABLE

DML — Data

SELECT

INSERT INTO

UPDATE ... SET

DELETE FROM

Aggregates

COUNT()

SUM()

AVG()

MAX() / MIN()

Clauses

WHERE

GROUP BY

HAVING

ORDER BY

LIMIT

🎯 Key Takeaways

✓PostgreSQL is a free, open-source relational database — the most important database skill for any data engineer
✓Every table must have defined columns with data types — use VARCHAR for text, NUMERIC for money, DATE/TIMESTAMP for time
✓SELECT, INSERT, UPDATE, DELETE are the four core operations — always use WHERE with UPDATE and DELETE
✓GROUP BY + aggregate functions (SUM, COUNT, AVG) is how you build summaries — the foundation of every report
✓JOINs combine data from multiple tables — INNER JOIN for matches only, LEFT JOIN to keep all rows from the left table
✓CTEs make complex queries readable by breaking them into named steps — use them in every non-trivial query
✓Window functions (RANK, SUM OVER) calculate per-row values across groups without collapsing rows like GROUP BY
✓UPSERT (ON CONFLICT DO UPDATE) is the key pattern for idempotent pipeline loads — safe to run multiple times
✓CREATE TABLE AS SELECT is how you materialise Gold-layer summary tables — the end result of a pipeline

What to learn next

Python for Data Engineers

Foundations · 20 min · +150 XP

SQL for Data Engineers

Python for Data Engineers

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub