SQL — Module 18Beginner

SQL Data Types

What types exist, which to choose for each use case, how types affect storage and performance, and how type mismatches cause silent bugs in calculations and comparisons

12–16 min April 2026

Section 4 · Writing & Changing Data

Writing & Changing Data · 5 modulesModule 18

Data Creating Adding Updating Deleting

// Part 01

Why Data Types Matter More Than You Think

Every column in a SQL table has a data type. This is not just a technical detail — the type you choose for a column determines how the database stores it, how fast queries on it run, what operations are valid on it, and whether calculations produce correct results or silently wrong ones.

Three real consequences of wrong type choices that happen in production:

Financial rounding errors

Storing money as FLOAT instead of DECIMAL causes floating-point rounding errors. ₹10.50 stored as a FLOAT might be retrieved as ₹10.499999999. Multiplied across millions of transactions, these fractions cause accounting discrepancies that compliance teams spend days tracking down.

Silent comparison failures

Storing a phone number as INTEGER drops leading zeros — 09876543210 becomes 9876543210. Storing order IDs as VARCHAR when they should be INTEGER makes WHERE order_id = 1007 compare a number to a string, which sometimes works (MySQL coerces) and sometimes fails (PostgreSQL errors).

Index inefficiency

Using VARCHAR(5000) for a column that always holds 2-character country codes wastes storage and makes indexes larger than necessary. Using TEXT when VARCHAR(100) is appropriate prevents certain index types. Choosing the tightest correct type keeps indexes small and queries fast.

In FreshCart's schema, every type choice was deliberate. unit_price is DECIMAL(10,2) — not FLOAT — because money must be exact. customer_id is INTEGER — not VARCHAR — because IDs are numbers used in arithmetic and joins. order_date is DATE — not TIMESTAMP — because FreshCart only needs day precision. This module explains every choice like that.

// Part 02

The Five Type Families

SQL data types fall into five broad families. Every specific type belongs to one of these families, and understanding the families gives you a mental model for choosing types even when you encounter an unfamiliar database.

Numeric

INTEGER, BIGINT, DECIMAL, FLOAT

Numbers — counts, prices, quantities, IDs, percentages

Text

VARCHAR, CHAR, TEXT

Strings — names, emails, descriptions, codes

Date & Time

DATE, TIME, TIMESTAMP, INTERVAL

Dates and times — order dates, birth dates, timestamps

Boolean

BOOLEAN, TINYINT(1)

True/false flags — in_stock, is_active, is_deleted

Binary & Other

BYTEA, JSON, JSONB, UUID, ARRAY

Files, structured data, unique identifiers, lists

// Part 03

Numeric Types — Integers, Decimals, and Floats

Numeric types are the most consequential choice in database design. The difference between INTEGER, DECIMAL, and FLOAT is not just precision — it determines whether your financial calculations are correct.

Integer types — whole numbers

SMALLINTINT2

Storage

2 bytes

Range / Size

-32,768 to 32,767

Use for

Small counters, ratings (1-5), age, small counts

Never use for

IDs, prices, anything that might exceed 32,767

INTEGERINT, INT4

Storage

4 bytes

Range / Size

-2.1 billion to 2.1 billion

Use for

Primary keys, foreign keys, quantities, most counts

Never use for

User counts above 2 billion (use BIGINT), money

BIGINTINT8

Storage

8 bytes

Range / Size

±9.2 quintillion

Use for

High-volume PKs, timestamps as epoch ms, large counts

Never use for

Overkill for most IDs — use INTEGER until you need BIGINT

Loading FreshCart DB…
-- See the column types in the orders table
SELECT
  name       AS column_name,
  type       AS data_type,
  [notnull]  AS not_null,
  dflt_value AS default_value
FROM pragma_table_info('orders');Ctrl + Enter to run
Loading FreshCart database in your browser…

DECIMAL / NUMERIC — exact decimal arithmetic

DECIMAL(p, s)NUMERIC(p, s)

Storage

Variable — depends on precision

Range / Size

Up to 131,072 digits before decimal, 16,383 after

Use for

Money, prices, financial calculations, tax rates, percentages

Never use for

Scientific measurements needing huge range (use FLOAT)

DECIMAL(p, s) takes two parameters: p (precision) is the total number of significant digits, and s (scale) is the number of digits after the decimal point. DECIMAL(10, 2) stores up to 10 total digits with exactly 2 after the decimal — perfect for prices up to ₹99,999,999.99.

DECIMAL precision and scale examples

DECIMAL(10, 2)   -- max: 99,999,999.99  — prices, salaries
DECIMAL(5, 2)    -- max: 999.99          — percentages, tax rates
DECIMAL(15, 4)   -- max: 99,999,999,999.9999 — high-precision financial
DECIMAL(3, 0)    -- max: 999             — integer stored as decimal

-- FreshCart uses:
unit_price  DECIMAL(10, 2)  -- up to ₹99,999,999.99
cost_price  DECIMAL(10, 2)  -- same
salary      DECIMAL(10, 2)  -- monthly salary
line_total  DECIMAL(10, 2)  -- order item total

Loading FreshCart DB…
-- DECIMAL arithmetic is exact — no floating-point error
SELECT
  0.1 + 0.2                         AS float_result,
  CAST(0.1 AS DECIMAL(5,1))
  + CAST(0.2 AS DECIMAL(5,1))       AS decimal_result,
  -- In most databases: float gives 0.30000000000000004
  -- DECIMAL gives exactly 0.3
  unit_price * 1.18                 AS gst_price,
  ROUND(unit_price * 1.18, 2)       AS gst_rounded
FROM products
LIMIT 5;Ctrl + Enter to run
Loading FreshCart database in your browser…

FLOAT / REAL / DOUBLE — approximate decimal arithmetic

FLOAT / DOUBLEFLOAT8, DOUBLE PRECISION, REAL

Storage

4 bytes (REAL) or 8 bytes (FLOAT/DOUBLE)

Range / Size

±1.7 × 10^308 (enormous range)

Use for

Scientific measurements, ML model weights, statistics

Never use for

Money, prices, financial calculations — use DECIMAL instead

FLOAT stores numbers in binary floating-point format — the same format used by computers for all floating-point arithmetic. It can represent an enormous range of values but cannot represent most decimal fractions exactly. The decimal 0.1 stored as a FLOAT is actually stored as the closest binary approximation: 0.1000000000000000055511151231257827021181583404541015625. Calculations compound these tiny errors into visible discrepancies.

Why FLOAT is wrong for money — always

-- FLOAT rounding error demonstration
-- Store ₹10.50 as FLOAT and multiply:
SELECT 10.50::FLOAT * 1000000  -- might give 10499999.999... instead of 10500000

-- In a payment system processing 1 million ₹10.50 transactions:
-- FLOAT total: ₹10,499,999.99 (₹0.01 short — regulatory violation)
-- DECIMAL total: ₹10,500,000.00 (exactly correct)

-- The rule is absolute: NEVER store money as FLOAT.
-- Always use DECIMAL(precision, 2) for any monetary value.

// Part 04

Text Types — VARCHAR, CHAR, and TEXT

Text types store strings of characters. The choice between them affects storage efficiency, performance, and what operations are available.

VARCHAR(n)CHARACTER VARYING(n)

Storage

Actual length + 1-4 bytes overhead

Range / Size

Up to n characters (max 65,535 in MySQL, 10,485,760 in PostgreSQL)

Use for

Names, emails, codes, URLs, any variable-length text with a known max

Never use for

Text longer than ~10,000 chars (use TEXT), fixed-length codes (use CHAR)

CHAR(n)CHARACTER(n)

Storage

Always exactly n bytes (space-padded)

Range / Size

Fixed n characters

Use for

Fixed-length codes: country codes (CHAR(2)), currency codes (CHAR(3)), postal codes

Never use for

Variable-length data — wastes space with padding

TEXTCLOB (Oracle/MySQL)

Storage

Actual length + small overhead

Range / Size

Unlimited (PostgreSQL), 65,535 bytes (MySQL TEXT), 4GB (MySQL LONGTEXT)

Use for

Long descriptions, article content, JSON strings, logs

Never use for

Short fields where VARCHAR(n) communicates the expected max length

VARCHAR vs TEXT — practical guidance

In PostgreSQL, VARCHAR and TEXT have identical performance — TEXT is not slower than VARCHAR. The difference is documentation: VARCHAR(100) tells any developer that this column should never exceed 100 characters, which acts as a soft constraint and communicates intent. TEXT has no upper bound. Use VARCHAR(n) when you have a reasonable maximum, TEXT when the length is genuinely unbounded (product descriptions, user reviews, log entries).

In MySQL, TEXT types are stored differently from VARCHAR — they cannot be fully indexed without a prefix and have different row-format implications. For MySQL, prefer VARCHAR(255) or VARCHAR(1000) over TEXT whenever a reasonable maximum exists.

Loading FreshCart DB…
-- Explore column types in customers and products
SELECT 'customers' AS table_name, name AS column_name, type AS data_type
FROM pragma_table_info('customers')
UNION ALL
SELECT 'products', name, type
FROM pragma_table_info('products')
ORDER BY table_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

String storage and collation

Every text column has a collation — a set of rules for comparing and sorting strings. Collation controls case sensitivity (is 'A' = 'a'?), accent sensitivity (is 'é' = 'e'?), and sort order. In PostgreSQL the default collation is determined at database creation. In MySQL the default is often utf8mb4_general_ci (case-insensitive). Mismatched collations between joined tables can cause join failures or incorrect results — a subtle bug that is hard to diagnose.

Choosing VARCHAR length — FreshCart column design rationale

-- Why each column has the length it has:
first_name  VARCHAR(100)  -- longest Indian name + margin
last_name   VARCHAR(100)  -- same
email       VARCHAR(255)  -- RFC 5321 maximum email length
phone       VARCHAR(20)   -- country code + number + separators
city        VARCHAR(100)  -- longest city name with margin
zip_code     VARCHAR(10)   -- 6-digit Indian zip_code + future
product_name VARCHAR(200) -- long descriptive product names
store_name  VARCHAR(200)  -- full store name with location

-- The principle: VARCHAR(n) where n is the realistic maximum
-- Add ~20-30% headroom for edge cases
-- Do not use VARCHAR(255) for everything — it hides intent

// Part 05

Date and Time Types — DATE, TIME, TIMESTAMP

Date and time types are among the most misused in SQL. Choosing the wrong one causes subtle bugs that only appear when edge cases hit — daylight saving time transitions, year boundaries, timezone changes.

DATE(standard across databases)

Storage

4 bytes

Range / Size

4713 BC to 5874897 AD (PostgreSQL)

Use for

Birth dates, order dates, hire dates, any calendar day without time

Never use for

Anything that needs time of day — use TIMESTAMP

TIMETIME WITHOUT TIME ZONE

Storage

8 bytes

Range / Size

00:00:00 to 24:00:00

Use for

Store opening hours, scheduled times without a date

Never use for

Most business uses — TIME alone without a date is rarely useful

TIMESTAMPDATETIME (MySQL)

Storage

8 bytes

Range / Size

4713 BC to 294276 AD (PostgreSQL)

Use for

Created_at, updated_at, event timestamps, log entries — any moment in time

Never use for

When only the date matters — use DATE to avoid timezone confusion

TIMESTAMPTZTIMESTAMP WITH TIME ZONE

Storage

8 bytes

Range / Size

Same as TIMESTAMP

Use for

All production timestamps where users are in multiple timezones

Never use for

Single-timezone applications where TIMESTAMP is simpler

DATE vs TIMESTAMP — the most important choice

Use DATE when only the calendar day matters and time is irrelevant. Order dates, birth dates, hire dates, expiry dates — these are pure dates. Using TIMESTAMP for these adds unnecessary complexity and opens timezone pitfalls.

Use TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ) for any event that happens at a specific moment — when a user logged in, when a payment was made, when a notification was sent. TIMESTAMPTZ stores the moment in UTC and converts to local time on display. Without time zone, timestamps from different cities are ambiguous — 14:30:00 in Seattle and 14:30:00 in New York are completely different moments.

Loading FreshCart DB…
-- Working with dates in FreshCart
SELECT
  order_id,
  order_date,
  delivery_date,
  -- Extract parts of a date
  CAST(strftime('%Y', order_date) AS INTEGER)                              AS order_year,
  CAST(strftime('%m', order_date) AS INTEGER)                              AS order_month,
  CAST(strftime('%w', order_date) AS INTEGER)                              AS day_of_week,
  -- Date arithmetic (julianday gives fractional days; cast to integer for whole days)
  CAST(julianday(delivery_date) - julianday(order_date) AS INTEGER)        AS days_to_deliver,
  -- Current date
  date('now')                                                              AS today,
  CAST(julianday('now') - julianday(order_date) AS INTEGER)                AS days_since_order
FROM orders
WHERE delivery_date IS NOT NULL
ORDER BY order_date DESC
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Date formatting and extraction (SQLite syntax)
SELECT
  order_id,
  order_date,
  strftime('%d/%m/%Y', order_date)                        AS formatted_date,
  strftime('%Y-%m', order_date)                           AS month_label,
  date(order_date, 'start of month')                      AS month_start,
  CAST(strftime('%m', order_date) AS INTEGER)             AS order_month,
  CAST(strftime('%Y', order_date) AS INTEGER)             AS order_year,
  CAST(strftime('%j', order_date) AS INTEGER)             AS day_of_year
FROM orders
ORDER BY order_date DESC
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

INTERVAL — storing durations

INTERVAL stores a duration rather than a point in time. It is used in date arithmetic for adding or subtracting time periods.

INTERVAL in date arithmetic

-- Add one month to a date
order_date + INTERVAL '1 month'

-- Subtract 30 days
CURRENT_DATE - INTERVAL '30 days'

-- Add 2 hours 30 minutes to a timestamp
created_at + INTERVAL '2 hours 30 minutes'

-- Find orders in the last 7 days
WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'

-- MySQL equivalent:
WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)

// Part 06

Boolean Type — TRUE, FALSE, and NULL

Boolean columns store true/false values. In PostgreSQL, BOOLEAN is a first-class type. In MySQL, it is stored as TINYINT(1) — 1 for true, 0 for false. Both support the same logical operations.

BOOLEANBOOL, TINYINT(1) in MySQL

Storage

1 byte

Range / Size

TRUE, FALSE, or NULL

Use for

Flags: in_stock, is_active, is_deleted, is_verified, is_paid

Never use for

Multi-state fields — use VARCHAR or ENUM for status columns

Loading FreshCart DB…
-- Boolean operations in SQL
SELECT
  product_name,
  in_stock,
  CASE WHEN in_stock THEN 'Available' ELSE 'Unavailable' END  AS availability,
  CAST(in_stock AS INTEGER)                                   AS stock_as_int
FROM products
ORDER BY in_stock DESC, product_name;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- Count boolean values — how many in stock vs out of stock?
SELECT
  COUNT(*)                                   AS total_products,
  SUM(CASE WHEN in_stock THEN 1 ELSE 0 END)  AS in_stock_count,
  SUM(CASE WHEN NOT in_stock THEN 1 ELSE 0 END) AS out_of_stock_count,
  -- Alternative: cast boolean to integer
  SUM(CAST(in_stock AS INTEGER))             AS in_stock_alt
FROM products;Ctrl + Enter to run
Loading FreshCart database in your browser…

💡 Note

Boolean columns should be used for true binary flags — columns that are definitively either true or false with no other states. For anything with multiple possible states (order_status, loyalty_tier, payment_method), use VARCHAR with a CHECK constraint or an ENUM type. A boolean is_delivered column becomes wrong the moment you need to distinguish between "delivered," "partially delivered," and "delivery failed."

// Part 07

CAST and Type Conversion

Explicit type conversion is done with the CAST function or the PostgreSQL shorthand :: operator. This is essential when you need to convert between types in calculations, comparisons, or output formatting.

CAST syntax

-- Standard SQL syntax (works everywhere):
CAST(expression AS target_type)

-- PostgreSQL shorthand (:: operator):
expression::target_type

-- Examples:
CAST('2024-01-15' AS DATE)
CAST(unit_price AS INTEGER)
CAST(customer_id AS VARCHAR)
CAST(total_amount AS DECIMAL(10,2))

-- PostgreSQL shorthand:
'2024-01-15'::DATE
unit_price::INTEGER
customer_id::VARCHAR
3.14159::NUMERIC(5,2)  -- rounds to 3.14

Common CAST use cases

Loading FreshCart DB…
-- CAST for safe division (force decimal result)
SELECT
  product_name,
  unit_price,
  cost_price,
  -- Integer division (wrong in databases with integer columns):
  -- (unit_price - cost_price) / unit_price
  -- Decimal division (always correct):
  ROUND(
    CAST(unit_price - cost_price AS DECIMAL) / unit_price * 100
  , 1)  AS margin_pct
FROM products
ORDER BY margin_pct DESC
LIMIT 6;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- CAST for string formatting and concatenation
SELECT
  order_id,
  customer_id,
  order_date,
  total_amount,
  -- Concatenate mixed types — must CAST to text first
  'Order #' || CAST(order_id AS VARCHAR)
  || ' — ₹' || CAST(total_amount AS VARCHAR)   AS order_summary,
  -- Format date as string (DuckDB syntax)
  strftime(order_date, '%d/%m/%Y')              AS formatted_date
FROM orders
ORDER BY order_date DESC
LIMIT 6;Ctrl + Enter to run
Loading FreshCart database in your browser…

Loading FreshCart DB…
-- CAST for date extraction
SELECT
  order_date,
  CAST(strftime('%Y', order_date) AS INTEGER)                                    AS year,
  CAST(strftime('%m', order_date) AS INTEGER)                                    AS month,
  (CAST(strftime('%m', order_date) AS INTEGER) - 1) / 3 + 1                     AS quarter,
  -- Cast date parts to text for grouping labels
  strftime('%Y', order_date)
  || '-Q'
  || CAST((CAST(strftime('%m', order_date) AS INTEGER) - 1) / 3 + 1 AS TEXT)    AS quarter_label
FROM orders
ORDER BY order_date DESC
LIMIT 8;Ctrl + Enter to run
Loading FreshCart database in your browser…

Implicit vs explicit conversion

Some databases silently convert between compatible types — this is called implicit conversion. MySQL is more permissive about implicit conversion than PostgreSQL. Relying on implicit conversion creates code that works on one database but fails on another. Always use explicit CAST when you need a type conversion — it makes the intent clear and ensures portability.

Implicit vs explicit conversion

-- MySQL: implicit conversion works (but avoid relying on it)
WHERE order_id = '1007'   -- MySQL converts '1007' to integer silently
WHERE total_amount = '850' -- MySQL converts '850' to decimal

-- PostgreSQL: explicit conversion required
WHERE order_id = 1007          -- correct — integer literal
WHERE order_id = CAST('1007' AS INTEGER)  -- explicit cast
WHERE total_amount = 850.00    -- correct — decimal literal

-- Always write queries that are explicit about types:
-- Use integer literals for integer columns
-- Use decimal literals for decimal columns
-- Use quoted ISO dates for date columns

// Part 08

Special Types — JSON, UUID, and Arrays

Modern databases support types beyond the basic five families. These are used in specific scenarios at production companies.

JSON and JSONB — storing structured documents

PostgreSQL supports JSON (text-stored JSON) and JSONB (binary-stored JSON, indexable and faster to query). JSONB is almost always preferred. MySQL supports a JSON type since version 5.7.

JSONB in PostgreSQL — real use cases

-- Store product attributes with varying structure
-- Electronics have 'wattage', food has 'expiry_days' — no fixed schema
CREATE TABLE products (
  product_id   INTEGER PRIMARY KEY,
  product_name VARCHAR(200),
  attributes   JSONB    -- flexible key-value storage
);

-- Insert
INSERT INTO products VALUES (
  1, 'Smart TV 55"',
  '{"wattage": 120, "resolution": "4K", "ports": ["HDMI", "USB"]}'
);

-- Query JSONB
SELECT product_name,
  attributes->>'resolution'          AS resolution,
  attributes->'ports'                AS ports_array,
  attributes->'ports'->>0           AS first_port
FROM products
WHERE attributes->>'resolution' = '4K';

-- Index for fast JSONB queries
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

Stripe uses JSONB to store payment metadata — each payment instrument (UPI, card, netbanking) has a completely different structure of additional fields. JSONB lets them store all of it in one column without creating dozens of nullable columns. Uber Eats uses JSONB for restaurant menu data — each item has different option structures.

UUID — universally unique identifiers

UUID as primary key

-- UUID: 128-bit random identifier, globally unique
-- Format: 550e8400-e29b-41d4-a716-446655440000

CREATE TABLE events (
  event_id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  event_type VARCHAR(100),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- When to use UUID vs INTEGER primary key:
-- UUID: distributed systems, external-facing IDs, merged databases
-- INTEGER: internal tables, simple applications, better join performance

-- FreshCart uses INTEGER PKs — single database, internal IDs
-- Stripe uses UUID for payment IDs shared with merchants — must be globally unique

Arrays — storing multiple values in one column

PostgreSQL ARRAY type

-- Store a list of tags in one column (PostgreSQL only)
CREATE TABLE products (
  product_id  INTEGER PRIMARY KEY,
  tags        TEXT[]   -- array of text values
);

INSERT INTO products VALUES (1, ARRAY['organic', 'dairy-free', 'vegan']);

-- Query array columns
SELECT * FROM products WHERE 'organic' = ANY(tags);
SELECT * FROM products WHERE tags @> ARRAY['organic', 'vegan'];

-- When arrays are appropriate:
-- Small, fixed-type lists that are always fetched with the parent row
-- Tags, categories, phone numbers for one contact

-- When NOT to use arrays:
-- When you need to query individual elements frequently — normalise instead
-- When elements have their own attributes — they deserve a separate table

// Part 09

Choosing the Right Type — The Decision Framework

When designing a column or reviewing a schema, run through these questions in order.

Is it a number?

Is it always a whole number? → INTEGER or BIGINT. Does it have decimal places? Is it money? → DECIMAL(p,s). Is it a scientific measurement? → FLOAT. Is it a boolean flag? → BOOLEAN.

Is it text?

Do you know the maximum length? → VARCHAR(n). Is the length truly unbounded (descriptions, articles)? → TEXT. Is it always exactly the same length (country code, currency code)? → CHAR(n).

Is it a date or time?

Date only, no time? → DATE. Exact moment in time, single timezone? → TIMESTAMP. Exact moment, multiple timezones? → TIMESTAMPTZ. Duration/interval? → INTERVAL.

Is it an identifier?

Internal ID, single database? → INTEGER with AUTO_INCREMENT or SERIAL. Distributed system or external-facing? → UUID. Fixed-format code (PAN, GST)? → CHAR(n) or VARCHAR(n) with CHECK constraint.

Is it structured data with variable schema?

Attributes that differ per row? → JSONB (PostgreSQL) or JSON (MySQL). Small fixed list? → ARRAY (PostgreSQL) or separate table. Always separate table when elements need their own attributes or are queried independently.

FreshCart schema — every type explained

Column	Type chosen	Why this type
customer_id	INTEGER	Whole number, internal ID, will not exceed 2 billion customers
email	VARCHAR(255)	RFC 5321 max email length is 254 chars — 255 gives one byte headroom
phone	VARCHAR(20)	Must store leading zeros and country codes — NOT an integer
joined_date	DATE	Only the calendar day matters — no time component needed
loyalty_tier	VARCHAR(20)	Short text with a finite set of values — CHECK constraint enforces allowed values
unit_price	DECIMAL(10,2)	Money — must be exact. 10 total digits, 2 decimal places (paise precision)
in_stock	BOOLEAN	Binary flag — either in stock or not. No other states needed
order_date	DATE	Calendar day of the order — no time needed
delivery_date	DATE	Calendar day of delivery — nullable (NULL = not yet delivered)
total_amount	DECIMAL(10,2)	Money — same reasoning as unit_price
salary	DECIMAL(10,2)	Money — monthly salary in rupees with paise precision
monthly_target	DECIMAL(12,2)	Larger amounts (store targets) — wider precision than individual prices

// Part 10

Type Mismatches — Silent Bugs in Production

Type mismatches between columns and values in WHERE conditions, JOINs, and calculations are a major source of production bugs. They often do not cause errors — they cause silent wrong results or performance issues.

Phone number stored as INTEGER — leading zero lost

The phone number type mistake

-- WRONG: phone stored as INTEGER
CREATE TABLE customers (phone BIGINT);
INSERT INTO customers VALUES (09876543210);  -- stored as 9876543210
-- Leading zero is gone. Can never be recovered.

-- RIGHT: phone stored as VARCHAR
CREATE TABLE customers (phone VARCHAR(20));
INSERT INTO customers VALUES ('09876543210');  -- stored correctly
INSERT INTO customers VALUES ('+91-9876543210');  -- also works

-- The rule: store phone numbers, account numbers, PAN, SSN,
-- and any "number" that has leading zeros or non-numeric characters
-- as VARCHAR, never as INTEGER

Date stored as VARCHAR — sorting breaks

The VARCHAR date mistake

-- WRONG: date stored as VARCHAR
-- '2024-01-15' sorts correctly (ISO format)
-- But '15/01/2024' sorts wrong — '15' comes before '9' alphabetically
SELECT * FROM orders ORDER BY order_date;
-- '01/01/2024', '15/01/2024', '2/01/2024', '20/01/2024' — WRONG ORDER

-- RIGHT: always store dates as DATE type
-- Database handles sorting, arithmetic, extraction correctly
-- CURRENT_DATE - order_date works, VARCHAR - VARCHAR does not

-- The rule: any calendar date should be a DATE column, never VARCHAR

JOIN on mismatched types — implicit cast or index miss

JOIN type mismatch

-- WRONG: joining INTEGER customer_id to VARCHAR customer_id
-- customers.customer_id = INTEGER
-- orders.customer_id = VARCHAR  (design error)

SELECT * FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
-- PostgreSQL: ERROR — cannot compare integer and varchar
-- MySQL: implicit cast, but cannot use index on either column
-- Result: full table scan, extremely slow on large tables

-- RIGHT: join columns must have the same type
-- Always make FK columns the same type as the PK they reference
-- customer_id INTEGER in customers → customer_id INTEGER in orders

Loading FreshCart DB…
-- Verify FreshCart's type consistency in joins
-- customer_id should be the same type in customers and orders
SELECT
  c.name AS column_name,
  c.type AS customers_type,
  o.type AS orders_type,
  CASE WHEN c.type = o.type THEN 'MATCH' ELSE 'MISMATCH' END AS type_status
FROM pragma_table_info('customers') c
JOIN pragma_table_info('orders') o ON c.name = o.name
WHERE c.name = 'customer_id';Ctrl + Enter to run
Loading FreshCart database in your browser…

// Part 11

What This Looks Like at Work

You are a backend engineer at a Seattle fintech startup. A senior engineer asks you to review the schema for a new loan application table before it goes to production. You find three type-related issues that would have caused production bugs.

10:00 AM

Schema review request arrives

The schema dump arrives in Slack. You review it column by column against the type decision framework.

The schema with issues (adapted example)

-- Problematic schema (before your review):
CREATE TABLE loan_applications (
  application_id  VARCHAR(20),      -- Issue 1
  applicant_name  VARCHAR(50),
  ssn_last4      CHAR(10),
  annual_income   FLOAT,            -- Issue 2
  application_date VARCHAR(10),     -- Issue 3
  phone_number    BIGINT,           -- Issue 4
  credit_score    INTEGER,
  loan_amount     FLOAT,            -- Issue 5
  is_approved     INTEGER           -- Issue 6
);

10:20 AM

You identify six issues

You document each issue with the business impact and the correct fix.

Your corrected schema with rationale

CREATE TABLE loan_applications (
  -- Issue 1: application_id should be SERIAL or UUID
  -- VARCHAR(20) requires manual ID generation and risks duplicates
  application_id   SERIAL PRIMARY KEY,

  -- applicant_name: VARCHAR(50) too short for some Indian names with titles
  applicant_name   VARCHAR(200) NOT NULL,

  -- PAN: CHAR(10) is correct — PAN is always exactly 10 chars (ABCDE1234F)
  ssn_last4       CHAR(10) UNIQUE NOT NULL,

  -- Issue 2: annual_income as FLOAT causes rounding errors in loan calculations
  -- ₹5,00,000 stored as FLOAT might compute to ₹4,99,999.999...
  annual_income    DECIMAL(12, 2) NOT NULL,

  -- Issue 3: application_date as VARCHAR loses date arithmetic and sorting
  -- '15/01/2024' cannot be subtracted from CURRENT_DATE
  application_date DATE NOT NULL DEFAULT CURRENT_DATE,

  -- Issue 4: phone_number as BIGINT loses leading zeros and +91 prefix
  phone_number     VARCHAR(20) NOT NULL,

  credit_score     SMALLINT NOT NULL CHECK (credit_score BETWEEN 300 AND 900),

  -- Issue 5: loan_amount as FLOAT — same as income, must be exact
  loan_amount      DECIMAL(14, 2) NOT NULL,

  -- Issue 6: is_approved as INTEGER instead of BOOLEAN
  -- Also: 3-state approval (pending/approved/rejected) needs VARCHAR
  approval_status  VARCHAR(20) NOT NULL DEFAULT 'Pending'
                   CHECK (approval_status IN ('Pending','Approved','Rejected')),

  created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

11:00 AM

Review approved, schema fixed before production

The senior engineer reviews your comments and approves all six changes. The FLOAT-to-DECIMAL change alone prevents a potential regulatory issue — RBI audit reports require exact loan amounts, and FLOAT rounding in amortisation calculations would have caused discrepancies in compliance reports.

🎯 Pro Tip

Schema reviews for type correctness are one of the highest-value activities in backend engineering. A wrong type chosen at table creation is extremely costly to change in production — ALTER TABLE on a column type on a table with millions of rows locks the table, requires extensive data migration, and risks data loss. Getting it right before the first INSERT is worth hours of careful review.

// Part 12

Interview Prep — 5 Questions With Complete Answers

Q: What is the difference between DECIMAL and FLOAT and which should you use for money?

DECIMAL (also called NUMERIC) stores numbers as exact decimal digits — DECIMAL(10,2) stores exactly 10 significant digits with exactly 2 after the decimal point. Arithmetic on DECIMAL values produces exact results. 0.1 + 0.2 in DECIMAL is exactly 0.3. FLOAT stores numbers in binary floating-point format — the IEEE 754 standard used by all modern CPUs. Binary floating-point cannot represent most decimal fractions exactly. 0.1 stored as FLOAT is actually 0.1000000000000000055511151231... — a tiny error that compounds through arithmetic.

For money, FLOAT is never acceptable. Financial calculations require exact decimal arithmetic — ₹10.50 × 1,000,000 must be exactly ₹10,500,000.00, not ₹10,499,999.99 due to floating-point approximation. In a payment system processing millions of transactions, FLOAT errors accumulate into accounting discrepancies that violate regulatory requirements. The RBI and SEBI require exact financial records. FLOAT violates this requirement at scale.

Always use DECIMAL for monetary values, prices, tax rates, and any financial calculation. The typical choices: DECIMAL(10,2) for amounts in rupees (up to ₹99,999,999.99), DECIMAL(5,4) for tax rates and percentages stored as decimals (0.1800 for 18% GST), DECIMAL(15,2) for very large institutional amounts. Use FLOAT only for scientific measurements, machine learning weights, and statistical values where exact decimal representation is neither possible nor required — the measurement itself has inherent imprecision that exceeds the floating-point error.

Q: When should you use VARCHAR vs TEXT for a text column?

The functional difference between VARCHAR and TEXT varies by database. In PostgreSQL, VARCHAR and TEXT have identical storage and performance — there is no technical advantage to TEXT over VARCHAR or vice versa. The choice is entirely about documentation and constraints: VARCHAR(n) enforces a maximum length limit and communicates to any developer that the field should not exceed n characters. TEXT has no upper bound and makes no statement about expected length.

In MySQL, the distinction is more significant. VARCHAR(n) is stored inline with the row for values up to 65,535 bytes. TEXT types (TEXT, MEDIUMTEXT, LONGTEXT) are stored off-row as a reference, have different indexing limitations (cannot be fully indexed without a prefix), and have different performance characteristics for large values. For MySQL, prefer VARCHAR(n) for all fields where a reasonable maximum exists, and TEXT only for genuinely long content.

Professional guidance: use VARCHAR(n) whenever you can define a meaningful maximum — email VARCHAR(255), first_name VARCHAR(100), city VARCHAR(100), product_name VARCHAR(200). The n value should be based on the realistic maximum for the domain, not defaulted to 255 for everything. Use TEXT for genuinely unbounded fields: product descriptions, user reviews, blog content, log messages, any field where the author might reasonably write hundreds or thousands of words. The VARCHAR(n) limit serves as a soft validation and a schema communication tool — it tells the next developer what to expect from this column.

Q: Why should phone numbers be stored as VARCHAR rather than INTEGER?

Phone numbers are not numeric values in the mathematical sense — they are identifiers that happen to use digits. Storing them as INTEGER or BIGINT causes three problems. First, leading zeros are lost. Indian mobile numbers start with 0 (when dialled as local calls) or with a country code — 09876543210 stored as BIGINT becomes 9876543210, irreversibly losing the leading zero. Stored data can never be recovered to the original format.

Second, non-numeric characters cannot be stored. International format phone numbers include country codes (+91), separators (-), and sometimes extensions (x204). BIGINT accepts none of these — any format beyond pure digits requires VARCHAR. Third, no arithmetic is ever performed on phone numbers — you never add two phone numbers together or calculate the average phone number. The entire reason to choose a numeric type is to enable arithmetic and range operations. Phone numbers need neither.

The correct type is VARCHAR(20) — long enough for any international format (+1-800-555-0199 is 15 chars, with headroom). This applies to any "number" that is really an identifier: PAN card (ABCDE1234F — mixed alphanumeric), SSN number (12 digits, no arithmetic), GST number, IFSC code, postal codes (some have letters — UK postcodes SW1A 1AA). The rule: if you would never multiply it, average it, or compare it with greater-than, store it as VARCHAR.

Q: What is the difference between TIMESTAMP and TIMESTAMPTZ and which should you use?

TIMESTAMP (TIMESTAMP WITHOUT TIME ZONE) stores a date and time value with no timezone information attached. The value is stored and retrieved exactly as entered — 2024-01-15 14:30:00 is stored and returned as-is. The database makes no attempt to convert it. If two users in different timezones insert the "same" timestamp, they might both insert 14:30:00 but mean completely different moments in absolute time (IST vs UTC vs PST).

TIMESTAMPTZ (TIMESTAMP WITH TIME ZONE) stores a specific moment in absolute time. The value is always stored internally as UTC. When you insert a timestamp with a timezone (or the database's configured timezone is used), it converts to UTC for storage. When you retrieve it, PostgreSQL converts from UTC to the current session's timezone. 2024-01-15 14:30:00+05:30 (IST) and 2024-01-15 09:00:00+00:00 (UTC) refer to the same moment and store identically.

Use TIMESTAMPTZ for all production application timestamps — created_at, updated_at, logged_at, processed_at. Any event that "happened at a specific moment" needs TIMESTAMPTZ to be unambiguous across timezones. Use TIMESTAMP only when the time has no timezone meaning — a scheduled broadcast at 20:00 every evening regardless of timezone, a business hours definition, a recurring calendar event. Most Indian startups operate in a single timezone (IST) but use TIMESTAMPTZ anyway because it is the safer default and costs nothing extra. If you later add international users or move servers between regions, TIMESTAMP values become ambiguous while TIMESTAMPTZ remains correct.

Q: What is CAST and when do you need it?

CAST converts a value from one data type to another. The standard SQL syntax is CAST(expression AS target_type). PostgreSQL also supports the :: shorthand: expression::target_type. The conversion is explicit — you are telling the database exactly what type conversion you want, rather than relying on implicit conversion behaviour that may differ between databases.

The most important use cases for CAST: forcing decimal division when columns are integers — CAST(numerator AS DECIMAL) / denominator prevents integer truncation where 7/2 would return 3 instead of 3.5. Converting dates to strings for display — TO_CHAR or CAST(date_col AS VARCHAR) for concatenation with strings. Converting strings to dates when inserting or comparing with user input. Narrowing a type for a specific calculation — CAST(total_amount AS INTEGER) to drop paise precision when only rupees matter.

When not to use CAST: do not use CAST to work around a schema design error. If you are constantly casting a phone_number column from BIGINT to VARCHAR, fix the schema — change the column to VARCHAR. CAST is for legitimate temporary type coercion in calculations and output formatting, not a permanent workaround for wrong column types. Also avoid CAST on the column side of WHERE conditions because it prevents index usage — CAST(order_id AS VARCHAR) = '1007' disables the index on order_id. Rewrite the value side instead: order_id = CAST('1007' AS INTEGER) or simply order_id = 1007.

// Part 13

Errors You Will Hit — And Exactly Why They Happen

ERROR: operator does not exist: integer = character varying — WHERE customer_id = '1007'

Cause: Type mismatch in a WHERE condition. customer_id is an INTEGER column but you compared it to a string value '1007' (with quotes). PostgreSQL does not implicitly convert between integer and text. MySQL would silently coerce the string to an integer and succeed — which is why queries written for MySQL sometimes fail on PostgreSQL.

Fix: Remove the quotes to use the correct integer literal: WHERE customer_id = 1007. If the value comes from application code that always produces strings, cast explicitly: WHERE customer_id = CAST('1007' AS INTEGER). For the long term, ensure your application passes integer values for integer columns — use typed parameters in prepared statements, not raw string interpolation.

Financial totals are consistently off by small fractions — SUM returns 10499999.99 instead of 10500000.00

Cause: Money is stored as FLOAT or DOUBLE instead of DECIMAL. Floating-point arithmetic cannot represent most decimal fractions exactly. Small errors (like 10.50 being stored as 10.499999...) compound across millions of rows. The SUM of one million ₹10.50 values stored as FLOAT may be ₹10,499,999.99 instead of ₹10,500,000.00 — a ₹0.01 discrepancy that causes compliance failures.

Fix: Change the column type from FLOAT to DECIMAL(precision, 2): ALTER TABLE payments ALTER COLUMN amount TYPE DECIMAL(12,2). Migrate existing data: UPDATE payments SET amount = ROUND(amount::DECIMAL, 2). Audit all financial columns across the schema and convert every FLOAT used for money. Going forward, make DECIMAL mandatory in code review for any column that stores a monetary value — reject any schema change that uses FLOAT for money.

Date sorting is wrong — '2/01/2024' sorts before '15/01/2024' in ORDER BY

Cause: Dates are stored as VARCHAR instead of DATE. VARCHAR sorts lexicographically — character by character. '2' comes before '15' alphabetically because '2' > '1'. The correct chronological order (Jan 2 before Jan 15) matches numeric order, not alphabetical. Any non-ISO date format stored as VARCHAR will sort incorrectly.

Fix: Change the column to DATE type: ALTER TABLE orders ALTER COLUMN order_date TYPE DATE USING TO_DATE(order_date, 'DD/MM/YYYY'). The USING clause converts existing VARCHAR values to DATE during the migration. Going forward, always insert dates as ISO 8601 strings ('2024-01-15') which the database parses correctly into DATE. If input arrives in DD/MM/YYYY format from users, convert at the application layer before inserting.

VARCHAR column truncates long values silently — 'Rajasthani Papad Masala Fryums Mixed' becomes 'Rajasthani Papad Masala F'

Cause: The VARCHAR(n) limit is too short for some values being inserted. The database is truncating values to fit the declared length — or in strict mode, rejecting the insert entirely. VARCHAR(20) for a product name column that needs to hold 35-character names will silently truncate in MySQL's non-strict mode, causing data loss that is invisible until a product search returns wrong names.

Fix: Increase the VARCHAR length: ALTER TABLE products ALTER COLUMN product_name TYPE VARCHAR(500). Review all VARCHAR lengths in your schema against the actual data: SELECT MAX(LENGTH(product_name)) FROM products — this shows the current maximum and helps you set a correct limit with headroom. Enable MySQL strict mode (STRICT_TRANS_TABLES) to turn silent truncation into an error so data loss is immediately visible.

Timezone confusion — timestamps show wrong time after server migration

Cause: Timestamps are stored as TIMESTAMP WITHOUT TIME ZONE. When the database server was in one timezone (Asia/Kolkata), values inserted at 14:30 IST were stored as 14:30 with no timezone. After migrating the server to UTC, the same values are displayed as 14:30 UTC — 5.5 hours ahead of the original time. All historical timestamps are now wrong relative to their actual creation time.

Fix: Going forward, use TIMESTAMPTZ for all event timestamps: ALTER TABLE events ALTER COLUMN created_at TYPE TIMESTAMPTZ USING created_at AT TIME ZONE 'Asia/Kolkata'. This converts the stored 14:30 to 09:00 UTC (the actual UTC equivalent of 14:30 IST), which displays as 14:30 when the session timezone is set to IST and as 09:00 when set to UTC — always representing the correct moment. For existing data, determine what timezone the values were originally entered in and migrate accordingly.

Try It Yourself

Write a query that shows all columns in the FreshCart 'products' table — their name, data type, whether they allow NULL, and their default value. Then write a second query that finds any product where unit_price cast to INTEGER differs from unit_price, indicating sub-rupee pricing.

🎯 Key Takeaways

✓Data types are not just technical details — wrong choices cause silent bugs (FLOAT money rounding), data loss (INTEGER phone numbers), and schema migrations that lock production tables for hours.
✓Never store money as FLOAT or DOUBLE. Always use DECIMAL(p,2). Floating-point cannot represent decimal fractions exactly, and errors compound across millions of transactions into regulatory violations.
✓INTEGER for whole-number counts and IDs. BIGINT when values might exceed 2.1 billion. DECIMAL for any number with decimal precision. FLOAT only for scientific measurements with inherent imprecision.
✓VARCHAR(n) for variable-length text with a known maximum. CHAR(n) for fixed-length codes (country codes, currency codes, PAN). TEXT for unbounded content (descriptions, articles).
✓Phone numbers, PAN, SSN, IFSC, postal codes — store as VARCHAR, never INTEGER. They have leading zeros, non-numeric characters, and require no arithmetic.
✓DATE for calendar days with no time component. TIMESTAMP for specific moments in a single timezone. TIMESTAMPTZ for specific moments across multiple timezones — use this for all production created_at and updated_at columns.
✓CAST(expression AS type) or expression::type (PostgreSQL) converts between types. Put CAST on the literal side of WHERE conditions, not the column side — CAST on a column prevents index usage.
✓BOOLEAN for binary flags (is_active, in_stock). For multi-state columns (order_status, approval_status), use VARCHAR with a CHECK constraint — not BOOLEAN.
✓Join columns must be the same type. Joining INTEGER customer_id to VARCHAR customer_id either throws an error (PostgreSQL) or silently disables indexes (MySQL), causing full table scans.
✓query information_schema.columns to inspect column types in any database. Always run a schema type audit before migrations and review type choices in code review before tables are created.

What comes next

In Module 18, you learn CREATE TABLE — how to define a table from scratch, set column types, add constraints, and design schemas that enforce data quality at the database level.

Module 18 → CREATE TABLE

Discussion

Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.

Continue with GitHub