Skip to main content
Database Administration & Troubleshooting

Select Distinct Count SQL — Complete CLI Reference, Syntax

select distinct count sql is the SQL aggregate pattern using COUNT(DISTINCT column) to return the number of unique non-NULL values in a column or expression. It eliminates duplicate rows before counting, enabling cardinality analysis and deduplication auditing across database engines.

-- Basic single-column distinct count
SELECT COUNT(DISTINCT column_name) FROM table_name WHERE condition;

Syntax verified against PostgreSQL 16, MySQL 8.0, SQL Server 2022.

select distinct count sql Syntax Reference

-- Multi-column distinct count (SQL standard using concatenation or tuple)
SELECT COUNT(DISTINCT col1 || '-' || col2) FROM table_name;
-- Modern engines (PostgreSQL, SQLite 3.33+, MySQL 8.0+):
SELECT COUNT(DISTINCT (col1, col2)) FROM table_name;

-- Equivalent using derived table (works on all engines)
SELECT COUNT(*) FROM (
    SELECT DISTINCT col1, col2 FROM table_name WHERE condition
) AS dt;

-- Count per group with DISTINCT
SELECT group_col, COUNT(DISTINCT value_col) FROM table_name GROUP BY group_col;

select distinct count sql Rapid Reference Cheat Sheet

Action SQL Clause Engine Context Key Flag / Syntax Impact / Result
Count unique values in one column COUNT(DISTINCT col) All SQL engines DISTINCT inside COUNT() Returns cardinality of the column (NULLs excluded)
Count unique combinations of two columns COUNT(DISTINCT (a, b)) PostgreSQL, MySQL 8.0+, SQLite 3.33+ Row-constructor syntax Counts distinct unordered tuples
Count unique combinations (fallback) SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t) AS dt All SQL engines Derived table with DISTINCT Universally portable; slight overhead from subquery
Count distinct values per group SELECT g, COUNT(DISTINCT c) FROM t GROUP BY g All SQL engines GROUP BY + COUNT(DISTINCT) Cardinality per partition
Count distinct with NULL sensitivity COUNT(DISTINCT COALESCE(col, 'NULL_PLACEHOLDER')) All SQL engines COALESCE to replace NULL Counts NULL as a distinct value instead of ignoring it
Count distinct across multiple tables SELECT COUNT(DISTINCT t1.col) FROM t1 JOIN t2 ON t1.id = t2.id All SQL engines Join + COUNT(DISTINCT) Cardinality after join, potentially large intermediate result set
Performance-optimized distinct count (approximate) SELECT COUNT(DISTINCT col) FROM table SAMPLE (1 PERCENT) PostgreSQL, Snowflake, BigQuery SAMPLE / TABLESAMPLE Reduces I/O; result is statistical estimate
See also  Concat In SQL: Syntax, Flags, Examples & Troubleshooting Guide

Advanced Implementation & Parameters

Behavior Across Major Engines

COUNT(DISTINCT column) is part of the SQL-92 standard, but engine-specific quirks affect its execution:

  • PostgreSQL — Uses a hash-based aggregation (HashAggregate) when work_mem is sufficient; falls back to disk-based sort. COUNT(DISTINCT (a, b)) is supported natively as row-constructor.
  • MySQL 8.0+ — Supports COUNT(DISTINCT col) and COUNT(DISTINCT col1, col2) (comma-separated list, not tuple syntax). The comma form counts distinct combinations of two expressions.
  • SQL Server 2022 — Only accepts a single expression inside COUNT(DISTINCT). Multi-column distinct count must use the derived-table pattern. COUNT(DISTINCT col1 + col2) is valid but semantically dangerous (concatenation vs. tuple).
  • SQLite 3.33+ — Supports COUNT(DISTINCT (a, b)) row-constructor like PostgreSQL.
  • BigQueryCOUNT(DISTINCT) is exact up to approximately 1e12 cardinality; beyond that, use APPROX_COUNT_DISTINCT for performance.
  • SnowflakeCOUNT(DISTINCT) always returns an exact count; APPROX_COUNT_DISTINCT available for large datasets.

Performance Characteristics

COUNT(DISTINCT) forces a sort or hash of the distinct values before the count. For large cardinalities, this can consume significant memory and I/O. Query planners typically show a HashAggregate or Unique followed by Aggregate node. The derived-table pattern SELECT COUNT(*) FROM (SELECT DISTINCT ...) produces the same plan on most engines but may be easier to tune with index hints.

-- PostgreSQL EXPLAIN output for COUNT(DISTINCT)
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;

-- Typical plan node: HashAggregate (cost=... rows=... width=...)
--   -> Seq Scan on cm_production
-- Index-only scan can improve performance if a covering index exists
CREATE INDEX idx_program_name ON cm_production (program_name);

Handling NULLs

COUNT(DISTINCT col) automatically excludes NULL values. To count NULL as a distinct value, use COALESCE to replace it with a sentinel:

SELECT COUNT(DISTINCT COALESCE(region, 'UNKNOWN')) FROM item WHERE region IS NULL;

Error Resolution & Troubleshooting

Error Code / Condition Root Cause Remediation Command / Action
ERROR: function count(text, text) does not exist (PostgreSQL) Passing multiple columns as separate arguments to COUNT(DISTINCT col1, col2) Use row-constructor: COUNT(DISTINCT (col1, col2))
Incorrect syntax near ',' (SQL Server) SQL Server does not accept multiple expressions inside DISTINCT Use derived table: SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t) AS dt
Out of memory / HashAggregate: spilled to disk Cardinality too high for work_mem (PostgreSQL) or sort_buffer_size (MySQL) Increase work_mem (PostgreSQL) or sort_buffer_size (MySQL), or use approximate count on very large datasets
Distinct count includes NULL unexpectedly Using COUNT(DISTINCT COALESCE(col, value)) without understanding that NULL replacement becomes a distinct value Remove COALESCE if NULLs should be ignored; confirm business logic
Query returns 0 instead of expected count All values in column are NULL, or WHERE condition eliminates all rows Check with SELECT COUNT(*) FROM table WHERE condition first; then verify column contains non-NULL data
ELSE branch in CASE expression causing distinct-over-partition error Using COUNT(DISTINCT CASE WHEN ... THEN col END) incorrectly scopes the distinct set SELECT COUNT(DISTINCT CASE WHEN condition THEN col END) FROM t; works but counts NULL for ELSE rows; use COUNT(DISTINCT col) FILTER (WHERE condition) on PostgreSQL 9.4+
See also  SQL Server DATEADD Function: Syntax, Parameters, and Examples

Diagnostic Query Patterns

-- Check for NULLs in the target column
SELECT COUNT(*) AS total, COUNT(col) AS non_null, COUNT(*) - COUNT(col) AS null_count FROM table;

-- Preview distinct values before counting (limit for large cardinalities)
SELECT DISTINCT col FROM table ORDER BY col LIMIT 20;

-- Compare distinct count to total count for duplication ratio
SELECT COUNT(col) AS total, COUNT(DISTINCT col) AS unique, 
       ROUND(COUNT(col)::numeric / NULLIF(COUNT(DISTINCT col), 0), 2) AS duplication_ratio
FROM table;

Production-Grade Implementation

Indexing Strategy

To minimize latency in COUNT(DISTINCT column) execution, create a covering index that the query planner can use for an index-only scan:

-- PostgreSQL: index-only scan on a single column
CREATE INDEX idx_covering ON cm_production (program_name);

-- For multi-column distinct count with derived table
CREATE INDEX idx_covering_multi ON cm_production (program_type, program_name);

-- Check index usage
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;
-- Look for "Index Only Scan" or "Bitmap Index Scan"

Approximate Distinct Count for Truly Massive Data

When exact cardinality is not required and latency is critical, use engine-specific approximate functions:

Engine Function Error Margin Use Case
PostgreSQL hll extension via hyperloglog ~1-2% Unique visitors, IP cardinality
BigQuery APPROX_COUNT_DISTINCT(col) ~2% Billions of rows
Snowflake APPROX_COUNT_DISTINCT(col) ~1% Large-scale analytics
MySQL No built-in approximate distinct; use TABLESAMPLE estimate Variable Quick cardinality checks

Materialized Distinct Counts

For dashboards or repeated queries, cache the distinct count in a materialized view or summary table:

-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW mv_program_unique_count AS
SELECT program_type, push_number, COUNT(DISTINCT program_name) AS unique_programs
FROM cm_production
GROUP BY program_type, push_number;

-- Refresh on schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_program_unique_count;

Security & Least Privilege

  • COUNT(DISTINCT col) requires SELECT on the target column(s). No write privileges needed.
  • For derived: table patterns, the subquery inherits the same permission scope.
  • Avoid exposing exact distinct counts on sensitive columns (PII, customer IDs) in public-facing APIs; use approximate counts or bucketed ranges instead.
  • In PostgreSQL, COUNT(DISTINCT) on a column with row-level security (RLS) reflects only rows visible per policy.
See also  Union In SQL: Syntax, Examples, Flags & Production Guide

Anti-Patterns to Avoid

  • Using DISTINCT outside COUNT unnecessarilySELECT DISTINCT col FROM table retrieves all rows; use COUNT(DISTINCT col) instead of SELECT COUNT(*) FROM (SELECT DISTINCT col FROM table) AS dt plan-wise it’s the same, but direct COUNT(DISTINCT) is clearer.
  • Counting distinct on an unbounded text column — Avoid COUNT(DISTINCT long_text_col) without limiting rows; it can consume exessive memory. Apply WHERE filters or use a hash.
  • Adding unnecessary concatenationCOUNT(DISTINCT col1 + col2) (SQL Server) treats the addition as a numeric expression, not a tuple, and can lose distinctness if concatenation produces non-unique results.

Frequently Asked Questions

What is the difference between COUNT(DISTINCT column) and COUNT(column) in SQL?

Answer: COUNT(DISTINCT column) returns the number of unique non-null values; COUNT(column) returns the total number of non-null rows.

COUNT(DISTINCT col) ignores duplicates and NULLs, while COUNT(col) counts every non-NULL row including duplicates.

-- Example on 'users' table with two NULL emails and one duplicate.
SELECT COUNT(DISTINCT email) AS unique_emails, COUNT(email) AS total_emails FROM users;

When should I use COUNT(DISTINCT) instead of GROUP BY with COUNT?

Answer: Use COUNT(DISTINCT col) for a single aggregated unique count; use GROUP BY when you need counts per category or additional columns.

COUNT(DISTINCT) is a scalar aggregate returning one row. GROUP BY with COUNT returns one row per group. Example:

-- COUNT(DISTINCT)
SELECT COUNT(DISTINCT department_id) FROM employees;
-- GROUP BY equivalent (same result but less efficient)
SELECT COUNT(*) FROM (SELECT DISTINCT department_id FROM employees) sub;

How do I fix “Syntax error: DISTINCT is not allowed with COUNT(*)” in SQL Server?

Answer: Replace COUNT(DISTINCT *) with COUNT(DISTINCT column) or use a subquery with DISTINCT inside a COUNT(*).

SQL Server and most databases disallow COUNT(DISTINCT *). Fix:

-- Incorrect
SELECT COUNT(DISTINCT *) FROM orders;
-- Correct
SELECT COUNT(DISTINCT order_id) FROM orders;
-- Alternative (counts distinct rows, excludes NULLs)
SELECT COUNT(*) FROM (SELECT DISTINCT * FROM orders) sub;

Does SELECT COUNT(DISTINCT column) work consistently across AWS RDS, Azure SQL, and Google Cloud SQL?

Answer: Yes, COUNT(DISTINCT) is ANSI SQL standard and supported on all three clouds, though NULL handling and performance vary slightly.

On AWS RDS (MySQL, PostgreSQL, SQL Server), Azure SQL Database, and Cloud SQL (MySQL, PostgreSQL), COUNT(DISTINCT col) ignores NULLs identically. For large datasets, test vendor-specific optimizations:

-- Example works on all
SELECT COUNT(DISTINCT client_id) FROM billing WHERE status='active';

What is the fastest way to count distinct values in a large PostgreSQL table?

Answer: Use the hll extension (HyperLogLog) for approximate distinct counts, or ensure an index on the column for exact COUNT(DISTINCT).

Exact:

-- With B-tree index on column
CREATE INDEX idx_col ON large_table(col);
SELECT COUNT(DISTINCT col) FROM large_table;

Approximate (fast, low memory):

CREATE EXTENSION hll;
SELECT hll_cardinality(hll_add_agg(hll_hash_text(col))) AS approx_distinct FROM large_table;