CAST and Type Conversion
Converting between data types — CAST, :: shorthand, implicit vs explicit casting, TRY_CAST for safe conversion, and every coercion rule that prevents silent errors in production queries
// Part 01
Why Type Conversion Exists
SQL is a strongly typed language — every column and every expression has a declared data type, and most operations require compatible types. When types do not match, the database either performs an implicit cast (silently converts for you — sometimes incorrectly) or raises a type error and returns nothing.
Type conversion becomes unavoidable in real data work. CSV imports store everything as text — dates and numbers arrive as VARCHAR and must be cast before arithmetic. UNION ALL requires matching column types across queries — a DATE in one query must become TEXT if the other query has TEXT in that position. Concatenation requires text — a numeric customer_id must be cast to TEXT before joining with a string prefix. Understanding when and how to cast explicitly is the difference between queries that work reliably and queries that fail silently or crash unpredictably.
// Part 02
Three Syntaxes for Casting
// Part 03
Common Cast Conversions — A Practical Map
'42'::INTEGER or CAST('42' AS INTEGER)Fails if string contains non-numeric characters or decimals. '42.5'::INTEGER fails — cast to NUMERIC first, then INTEGER.
'42.50'::NUMERIC or CAST('42.50' AS NUMERIC)Accepts integers and decimals as strings. Fails on non-numeric strings like 'abc' or '₹42'.
'2024-01-15'::DATE or CAST('2024-01-15' AS DATE)Requires ISO format (YYYY-MM-DD). '15-01-2024' or '01/15/2024' may fail — database-dependent.
42::TEXT or CAST(42 AS TEXT)Always succeeds — every integer has a valid text representation. Essential for string concatenation.
3.14::TEXT or CAST(3.14 AS TEXT)Produces the decimal string representation. May include scientific notation for very large/small values.
order_date::TEXT or CAST(order_date AS TEXT)Produces ISO format: '2024-01-15'. For custom formats, use strftime() or TO_CHAR() instead.
42::NUMERIC or CAST(42 AS NUMERIC)Always succeeds. Essential before division to prevent integer arithmetic. 7::NUMERIC / 2 = 3.5, not 3.
3.99::INTEGER or CAST(3.99 AS INTEGER)Truncates toward zero — does not round. 3.99 → 3, not 4. Use ROUND() first if rounding is needed.
TRUE::INTEGER or CAST(is_active AS INTEGER)TRUE → 1, FALSE → 0. Useful for counting boolean flags: SUM(is_active::INTEGER).
// Part 04
Implicit Casting — When the Database Casts for You
Implicit casting happens automatically when the database converts a type without being asked. Sometimes this is convenient. Sometimes it causes silent bugs that are hard to diagnose — a comparison that "works" but filters the wrong rows, or a calculation that produces a different type than expected.
// Part 05
Casting in Practice — The Most Common Patterns
Numeric to text — for concatenation
Text to numeric — for arithmetic on imported data
Integer to numeric — for safe division
Date to text — for display formatting
Boolean to integer — for counting flags
// Part 06
TRY_CAST — Safe Conversion Without Crashes
Standard CAST fails with an error if the value cannot be converted — '₹120' cannot be cast to INTEGER and the entire query fails. TRY_CAST returns NULL instead of raising an error when conversion fails. This is essential for data quality checks and cleaning imports where not every value is guaranteed to be convertible.
// Part 07
Numeric Type Precision — NUMERIC vs FLOAT
SQL has two families of numeric types and choosing between them matters for financial calculations. FLOAT/DOUBLE stores values in binary floating-point — fast but imprecise for exact decimal values. NUMERIC/DECIMAL stores exact decimal values — slightly slower but essential for money.
// Part 08
Type Coercion in UNION — Matching Column Types
UNION and UNION ALL require matching column types across all queries. When types differ, the database either performs implicit coercion (if compatible) or raises a type error. Explicit CAST in UNION is best practice — it makes the intended type clear and prevents database-specific coercion surprises.
// Part 09
Casting Dates — Parsing, Formatting, and Extraction
Date casting is one of the highest-frequency type conversions in real work — parsing date strings from imports, extracting integer parts for arithmetic, and formatting for display. Each direction has its own considerations.
// Part 10
What This Looks Like at Work
You are a data engineer at Amazon. A partner has delivered a seller performance CSV. Every column is stored as TEXT — amounts have currency symbols, dates are in DD/MM/YYYY format, booleans are stored as 'Y'/'N', and some numeric fields have commas as thousand separators. You must write a cleaning and casting query before inserting into the target table.
🎯 Pro Tip
Always run a TRY_CAST audit query before any bulk INSERT from external data. COUNT how many rows have NULL TRY_CAST results — those are rows that will fail CAST during INSERT. If the bad-row count is low, quarantine them. If it is high, there is a systemic format problem that needs upstream investigation before the load proceeds.
// Part 11
Interview Prep — 5 Questions With Complete Answers
All three convert a value from one data type to another, but with different syntax and portability. CAST(expression AS type) is the SQL standard syntax — it works across PostgreSQL, MySQL, SQL Server, DuckDB, and virtually every other SQL database. It is the most portable choice for code that needs to run on multiple databases.
The :: shorthand (expression::type) is PostgreSQL and DuckDB syntax — it is more concise and reads naturally in analytical queries. '2024-01-15'::DATE is cleaner to type and read than CAST('2024-01-15' AS DATE). It is the preferred style in PostgreSQL-specific code. CONVERT has two incompatible variants: MySQL uses CONVERT(expression, type) and SQL Server uses CONVERT(type, expression) — the argument order is reversed between the two databases. CONVERT is not supported in PostgreSQL or DuckDB.
The recommendation: use CAST() for any SQL code that may run on multiple database engines (shared scripts, portable ETL code). Use :: in PostgreSQL and DuckDB-specific code where readability matters. Never use CONVERT in code that needs to work across MySQL and SQL Server without careful attention to the argument order reversal.
Implicit casting is automatic type conversion performed by the database without an explicit CAST instruction. When you write WHERE order_date = '2024-01-15', the database implicitly converts the text literal '2024-01-15' to a DATE for comparison. When you add an integer to a numeric column, the integer is implicitly promoted to numeric. These conversions are usually correct and convenient.
The risks are subtle but real. First, performance: if implicit casting is applied to an indexed column (WHERE CAST(id AS TEXT) = '42'), the index cannot be used and the query performs a full table scan. The rule is to never apply any function or cast to the indexed column side — cast the constant instead. Second, silent semantic errors: implicit casting between text and numeric may succeed with unexpected results in some databases (MySQL converts '5abc' to 5, losing the 'abc' silently), while the same cast raises an error in others (PostgreSQL). Third, portability: a query that relies on implicit casting may behave differently across database engines, especially for edge cases like empty string to numeric conversion or locale-specific date parsing.
Best practice: be explicit about type conversions in all non-trivial cases. Reserve implicit casting for the truly safe cases — integer to numeric promotion in arithmetic, string literal to date in WHERE clauses with clearly formatted ISO dates. For joins between columns of different types, casting on the constant side (not the column side) preserves index usage: WHERE date_column = '2024-01-15'::DATE is better than WHERE date_column::TEXT = '2024-01-15'.
TRY_CAST(expression AS type) attempts type conversion and returns NULL when conversion fails, instead of raising an error. Standard CAST terminates the entire query with an error if any single row has an unconvertible value. TRY_CAST allows the query to continue, producing NULL for problematic rows while returning converted values for valid rows.
The primary use case is data quality assessment on imported data. After loading a CSV into a staging table where every column is TEXT, TRY_CAST identifies which rows have invalid values in each column: SELECT COUNT(*) FROM staging WHERE TRY_CAST(revenue_col AS NUMERIC) IS NULL gives the count of rows with non-numeric revenue values. This pre-flight check prevents a bulk INSERT from crashing midway through a large load.
TRY_CAST is native to DuckDB and SQL Server. PostgreSQL does not have TRY_CAST — the equivalent pattern uses a CASE + regular expression check (CASE WHEN col ~ '^-?[0-9]+(\.[0-9]+)?$' THEN col::NUMERIC ELSE NULL END) or a custom exception-handling function written in PL/pgSQL. In data pipelines, TRY_CAST enables a quarantine pattern: rows where TRY_CAST returns NULL are inserted into an error table for manual review, while rows where TRY_CAST succeeds proceed to the target table. This pattern is essential for production ETL pipelines where bad data must not block good data.
FLOAT (and DOUBLE) store values in binary floating-point format — a base-2 representation. Most decimal fractions (like 0.1 and 0.2) cannot be represented exactly in binary, just as 1/3 cannot be represented exactly in decimal. This causes accumulation of tiny rounding errors across arithmetic operations. 0.1::FLOAT + 0.2::FLOAT may return 0.30000000000000004, not 0.3. For a single transaction this is irrelevant, but accumulated across thousands of transactions in a financial report, these errors compound into visible discrepancies.
NUMERIC (also called DECIMAL) stores values in exact base-10 representation with a specified precision and scale. 0.1::NUMERIC + 0.2::NUMERIC = exactly 0.3. No floating-point error is introduced. NUMERIC(10, 2) stores up to 10 total digits with exactly 2 decimal places — exactly the semantics needed for currency values.
For financial columns in a production database, always declare them as NUMERIC(precision, scale) — never FLOAT or DOUBLE. For analytical queries on existing FLOAT columns, cast to NUMERIC before summing or averaging: SUM(amount::NUMERIC). FLOAT is appropriate for scientific and statistical data where approximate values are acceptable and where the operations (averages, standard deviations) are inherently approximate anyway. The performance difference between NUMERIC and FLOAT is small on modern hardware and never justifies using FLOAT for money.
UNION requires all component queries to return the same number of columns with compatible types in corresponding positions. When types differ, the database either performs implicit coercion (if the types are compatible) or raises a type error. Rather than relying on implicit coercion (which varies by database), the professional approach is to cast all non-matching columns explicitly in every component query.
The rule for choosing the cast direction: cast to the most general type that can represent all values. If one query returns INTEGER and another returns NUMERIC, cast the INTEGER to NUMERIC — NUMERIC can represent all INTEGER values without loss. If one query returns DATE and another returns TEXT, cast the DATE to TEXT — TEXT can represent any DATE value. Avoid casting from a more general to a less general type (TEXT to INTEGER) in UNION columns — this will fail for rows where the text is not a valid integer.
The most common UNION type issue: combining different tables with the same column name but different types. customer_id as INTEGER in customers and customer_id as VARCHAR in a legacy import table cannot be directly unioned — cast one to match the other. Use CAST(customer_id AS TEXT) on the INTEGER side or CAST(legacy_id AS INTEGER) on the TEXT side depending on which type you want in the combined result. Always add explicit column aliases on the first query in the UNION — these determine the column names in the final result. Test UNION type compatibility by running each component query independently first and comparing the output column types.
// Part 12
Errors You Will Hit — And Exactly Why They Happen
🎯 Key Takeaways
- ✓Three cast syntaxes: CAST(expr AS type) is SQL standard and portable; expr::type is PostgreSQL/DuckDB shorthand; CONVERT() is MySQL/SQL Server only with incompatible argument order between the two.
- ✓Safe conversions (always succeed): INTEGER → TEXT, NUMERIC → TEXT, DATE → TEXT, INTEGER → NUMERIC, BOOLEAN → INTEGER. Risky conversions (can fail): TEXT → INTEGER, TEXT → NUMERIC, TEXT → DATE.
- ✓TRY_CAST returns NULL on failure instead of raising an error. Use it for data quality audits on imported data before bulk INSERT. PostgreSQL lacks TRY_CAST — use regex validation with CASE instead.
- ✓CAST to INTEGER truncates toward zero — does not round. 3.99::INTEGER = 3. Always ROUND() first if rounding is intended: ROUND(3.99, 0)::INTEGER = 4.
- ✓Use NUMERIC not FLOAT for financial values. FLOAT has binary precision issues: 0.1 + 0.2 ≠ 0.3 in floating-point. NUMERIC stores exact decimal values. Cast money columns to NUMERIC before aggregating.
- ✓Implicit casting on an indexed column prevents index usage. Always cast the constant, not the column: WHERE date_col = '2024-01-15'::DATE not WHERE date_col::TEXT = '2024-01-15'.
- ✓In UNION queries, cast non-matching column types explicitly. Cast to the more general type — TEXT can hold anything, NUMERIC can hold any number. Column names come from the first query.
- ✓Date string parsing: ISO format (YYYY-MM-DD) always works with ::DATE. Non-ISO formats need strptime(str, format)::DATE in DuckDB or TO_DATE(str, format) in PostgreSQL.
- ✓For concatenation, numeric columns must be cast to TEXT first: customer_id::TEXT || '-' || name. The || operator requires both operands to be text in PostgreSQL and DuckDB.
- ✓The TRY_CAST quarantine pattern: INSERT valid rows (TRY_CAST IS NOT NULL) to target, INSERT invalid rows (TRY_CAST IS NULL) to error table for review. This prevents bad data from blocking good data in bulk loads.
What comes next
In Module 45, you learn Views — virtual tables defined by a query, stored in the database schema, and queryable like real tables. Covers creation, updating, security, and when views outperform repeated CTEs.
Module 45 → ViewsDiscussion
0Have a better approach? Found something outdated? Share it — your knowledge helps everyone learning here.