select distinct count sql is the SQL aggregate pattern using COUNT(DISTINCT column) to return the number of unique non-NULL values in a column or expression. It eliminates duplicate rows before counting, enabling cardinality analysis and deduplication auditing across database engines.
-- Basic single-column distinct count
SELECT COUNT(DISTINCT column_name) FROM table_name WHERE condition;
Syntax verified against PostgreSQL 16, MySQL 8.0, SQL Server 2022.
select distinct count sql Syntax Reference
-- Multi-column distinct count (SQL standard using concatenation or tuple)
SELECT COUNT(DISTINCT col1 || '-' || col2) FROM table_name;
-- Modern engines (PostgreSQL, SQLite 3.33+, MySQL 8.0+):
SELECT COUNT(DISTINCT (col1, col2)) FROM table_name;
-- Equivalent using derived table (works on all engines)
SELECT COUNT(*) FROM (
SELECT DISTINCT col1, col2 FROM table_name WHERE condition
) AS dt;
-- Count per group with DISTINCT
SELECT group_col, COUNT(DISTINCT value_col) FROM table_name GROUP BY group_col;
select distinct count sql Rapid Reference Cheat Sheet
| Action | SQL Clause | Engine Context | Key Flag / Syntax | Impact / Result |
|---|---|---|---|---|
| Count unique values in one column | COUNT(DISTINCT col) |
All SQL engines | DISTINCT inside COUNT() |
Returns cardinality of the column (NULLs excluded) |
| Count unique combinations of two columns | COUNT(DISTINCT (a, b)) |
PostgreSQL, MySQL 8.0+, SQLite 3.33+ | Row-constructor syntax | Counts distinct unordered tuples |
| Count unique combinations (fallback) | SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t) AS dt |
All SQL engines | Derived table with DISTINCT |
Universally portable; slight overhead from subquery |
| Count distinct values per group | SELECT g, COUNT(DISTINCT c) FROM t GROUP BY g |
All SQL engines | GROUP BY + COUNT(DISTINCT) |
Cardinality per partition |
Count distinct with NULL sensitivity |
COUNT(DISTINCT COALESCE(col, 'NULL_PLACEHOLDER')) |
All SQL engines | COALESCE to replace NULL |
Counts NULL as a distinct value instead of ignoring it |
| Count distinct across multiple tables | SELECT COUNT(DISTINCT t1.col) FROM t1 JOIN t2 ON t1.id = t2.id |
All SQL engines | Join + COUNT(DISTINCT) |
Cardinality after join, potentially large intermediate result set |
| Performance-optimized distinct count (approximate) | SELECT COUNT(DISTINCT col) FROM table SAMPLE (1 PERCENT) |
PostgreSQL, Snowflake, BigQuery | SAMPLE / TABLESAMPLE |
Reduces I/O; result is statistical estimate |
Advanced Implementation & Parameters
Behavior Across Major Engines
COUNT(DISTINCT column) is part of the SQL-92 standard, but engine-specific quirks affect its execution:
- PostgreSQL — Uses a hash-based aggregation (HashAggregate) when work_mem is sufficient; falls back to disk-based sort.
COUNT(DISTINCT (a, b))is supported natively as row-constructor. - MySQL 8.0+ — Supports
COUNT(DISTINCT col)andCOUNT(DISTINCT col1, col2)(comma-separated list, not tuple syntax). The comma form counts distinct combinations of two expressions. - SQL Server 2022 — Only accepts a single expression inside
COUNT(DISTINCT). Multi-column distinct count must use the derived-table pattern.COUNT(DISTINCT col1 + col2)is valid but semantically dangerous (concatenation vs. tuple). - SQLite 3.33+ — Supports
COUNT(DISTINCT (a, b))row-constructor like PostgreSQL. - BigQuery —
COUNT(DISTINCT)is exact up to approximately 1e12 cardinality; beyond that, useAPPROX_COUNT_DISTINCTfor performance. - Snowflake —
COUNT(DISTINCT)always returns an exact count;APPROX_COUNT_DISTINCTavailable for large datasets.
Performance Characteristics
COUNT(DISTINCT) forces a sort or hash of the distinct values before the count. For large cardinalities, this can consume significant memory and I/O. Query planners typically show a HashAggregate or Unique followed by Aggregate node. The derived-table pattern SELECT COUNT(*) FROM (SELECT DISTINCT ...) produces the same plan on most engines but may be easier to tune with index hints.
-- PostgreSQL EXPLAIN output for COUNT(DISTINCT)
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;
-- Typical plan node: HashAggregate (cost=... rows=... width=...)
-- -> Seq Scan on cm_production
-- Index-only scan can improve performance if a covering index exists
CREATE INDEX idx_program_name ON cm_production (program_name);
Handling NULLs
COUNT(DISTINCT col) automatically excludes NULL values. To count NULL as a distinct value, use COALESCE to replace it with a sentinel:
SELECT COUNT(DISTINCT COALESCE(region, 'UNKNOWN')) FROM item WHERE region IS NULL;
Error Resolution & Troubleshooting
| Error Code / Condition | Root Cause | Remediation Command / Action |
|---|---|---|
ERROR: function count(text, text) does not exist (PostgreSQL) |
Passing multiple columns as separate arguments to COUNT(DISTINCT col1, col2) |
Use row-constructor: COUNT(DISTINCT (col1, col2)) |
Incorrect syntax near ',' (SQL Server) |
SQL Server does not accept multiple expressions inside DISTINCT |
Use derived table: SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t) AS dt |
Out of memory / HashAggregate: spilled to disk |
Cardinality too high for work_mem (PostgreSQL) or sort_buffer_size (MySQL) |
Increase work_mem (PostgreSQL) or sort_buffer_size (MySQL), or use approximate count on very large datasets |
Distinct count includes NULL unexpectedly |
Using COUNT(DISTINCT COALESCE(col, value)) without understanding that NULL replacement becomes a distinct value |
Remove COALESCE if NULLs should be ignored; confirm business logic |
Query returns 0 instead of expected count |
All values in column are NULL, or WHERE condition eliminates all rows |
Check with SELECT COUNT(*) FROM table WHERE condition first; then verify column contains non-NULL data |
ELSE branch in CASE expression causing distinct-over-partition error |
Using COUNT(DISTINCT CASE WHEN ... THEN col END) incorrectly scopes the distinct set |
SELECT COUNT(DISTINCT CASE WHEN condition THEN col END) FROM t; works but counts NULL for ELSE rows; use COUNT(DISTINCT col) FILTER (WHERE condition) on PostgreSQL 9.4+ |
Diagnostic Query Patterns
-- Check for NULLs in the target column
SELECT COUNT(*) AS total, COUNT(col) AS non_null, COUNT(*) - COUNT(col) AS null_count FROM table;
-- Preview distinct values before counting (limit for large cardinalities)
SELECT DISTINCT col FROM table ORDER BY col LIMIT 20;
-- Compare distinct count to total count for duplication ratio
SELECT COUNT(col) AS total, COUNT(DISTINCT col) AS unique,
ROUND(COUNT(col)::numeric / NULLIF(COUNT(DISTINCT col), 0), 2) AS duplication_ratio
FROM table;
Production-Grade Implementation
Indexing Strategy
To minimize latency in COUNT(DISTINCT column) execution, create a covering index that the query planner can use for an index-only scan:
-- PostgreSQL: index-only scan on a single column
CREATE INDEX idx_covering ON cm_production (program_name);
-- For multi-column distinct count with derived table
CREATE INDEX idx_covering_multi ON cm_production (program_type, program_name);
-- Check index usage
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;
-- Look for "Index Only Scan" or "Bitmap Index Scan"
Approximate Distinct Count for Truly Massive Data
When exact cardinality is not required and latency is critical, use engine-specific approximate functions:
| Engine | Function | Error Margin | Use Case |
|---|---|---|---|
| PostgreSQL | hll extension via hyperloglog |
~1-2% | Unique visitors, IP cardinality |
| BigQuery | APPROX_COUNT_DISTINCT(col) |
~2% | Billions of rows |
| Snowflake | APPROX_COUNT_DISTINCT(col) |
~1% | Large-scale analytics |
| MySQL | No built-in approximate distinct; use TABLESAMPLE estimate |
Variable | Quick cardinality checks |
Materialized Distinct Counts
For dashboards or repeated queries, cache the distinct count in a materialized view or summary table:
-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW mv_program_unique_count AS
SELECT program_type, push_number, COUNT(DISTINCT program_name) AS unique_programs
FROM cm_production
GROUP BY program_type, push_number;
-- Refresh on schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_program_unique_count;
Security & Least Privilege
COUNT(DISTINCT col)requiresSELECTon the target column(s). No write privileges needed.- For derived: table patterns, the subquery inherits the same permission scope.
- Avoid exposing exact distinct counts on sensitive columns (PII, customer IDs) in public-facing APIs; use approximate counts or bucketed ranges instead.
- In PostgreSQL,
COUNT(DISTINCT)on a column with row-level security (RLS) reflects only rows visible per policy.
Anti-Patterns to Avoid
- Using DISTINCT outside COUNT unnecessarily —
SELECT DISTINCT col FROM tableretrieves all rows; useCOUNT(DISTINCT col)instead ofSELECT COUNT(*) FROM (SELECT DISTINCT col FROM table) AS dtplan-wise it’s the same, but directCOUNT(DISTINCT)is clearer. - Counting distinct on an unbounded text column — Avoid
COUNT(DISTINCT long_text_col)without limiting rows; it can consume exessive memory. ApplyWHEREfilters or use a hash. - Adding unnecessary concatenation —
COUNT(DISTINCT col1 + col2)(SQL Server) treats the addition as a numeric expression, not a tuple, and can lose distinctness if concatenation produces non-unique results.
Frequently Asked Questions
What is the difference between COUNT(DISTINCT column) and COUNT(column) in SQL?
Answer: COUNT(DISTINCT column) returns the number of unique non-null values; COUNT(column) returns the total number of non-null rows.
COUNT(DISTINCT col) ignores duplicates and NULLs, while COUNT(col) counts every non-NULL row including duplicates.
-- Example on 'users' table with two NULL emails and one duplicate.
SELECT COUNT(DISTINCT email) AS unique_emails, COUNT(email) AS total_emails FROM users;
When should I use COUNT(DISTINCT) instead of GROUP BY with COUNT?
Answer: Use COUNT(DISTINCT col) for a single aggregated unique count; use GROUP BY when you need counts per category or additional columns.
COUNT(DISTINCT) is a scalar aggregate returning one row. GROUP BY with COUNT returns one row per group. Example:
-- COUNT(DISTINCT)
SELECT COUNT(DISTINCT department_id) FROM employees;
-- GROUP BY equivalent (same result but less efficient)
SELECT COUNT(*) FROM (SELECT DISTINCT department_id FROM employees) sub;
How do I fix “Syntax error: DISTINCT is not allowed with COUNT(*)” in SQL Server?
Answer: Replace COUNT(DISTINCT *) with COUNT(DISTINCT column) or use a subquery with DISTINCT inside a COUNT(*).
SQL Server and most databases disallow COUNT(DISTINCT *). Fix:
-- Incorrect
SELECT COUNT(DISTINCT *) FROM orders;
-- Correct
SELECT COUNT(DISTINCT order_id) FROM orders;
-- Alternative (counts distinct rows, excludes NULLs)
SELECT COUNT(*) FROM (SELECT DISTINCT * FROM orders) sub;
Does SELECT COUNT(DISTINCT column) work consistently across AWS RDS, Azure SQL, and Google Cloud SQL?
Answer: Yes, COUNT(DISTINCT) is ANSI SQL standard and supported on all three clouds, though NULL handling and performance vary slightly.
On AWS RDS (MySQL, PostgreSQL, SQL Server), Azure SQL Database, and Cloud SQL (MySQL, PostgreSQL), COUNT(DISTINCT col) ignores NULLs identically. For large datasets, test vendor-specific optimizations:
-- Example works on all
SELECT COUNT(DISTINCT client_id) FROM billing WHERE status='active';
What is the fastest way to count distinct values in a large PostgreSQL table?
Answer: Use the hll extension (HyperLogLog) for approximate distinct counts, or ensure an index on the column for exact COUNT(DISTINCT).
Exact:
-- With B-tree index on column
CREATE INDEX idx_col ON large_table(col);
SELECT COUNT(DISTINCT col) FROM large_table;
Approximate (fast, low memory):
CREATE EXTENSION hll;
SELECT hll_cardinality(hll_add_agg(hll_hash_text(col))) AS approx_distinct FROM large_table;

Command Line Expert & Software Engineer
Welcome! I’m Thomas Heinrich, a software engineer and system administrator with a deep passion for the Command Line Interface (CLI). With years of experience navigating the terminal, building backend architectures, and automating server deployments, I created this space to share practical, real-world terminal knowledge.
Whether you are a beginner taking your first steps in a Linux environment or a seasoned DevOps engineer looking to optimize your deployment scripts, you will find actionable solutions here. My goal is to help you ditch the mouse, speed up your workflow, and harness the full power of the command line.