Select Distinct Count SQL — Complete CLI Reference, Syntax

select distinct count sql is the SQL aggregate pattern using COUNT(DISTINCT column) to return the number of unique non-NULL values in a column or expression. It eliminates duplicate rows before counting, enabling cardinality analysis and deduplication auditing across database engines.

-- Basic single-column distinct count
SELECT COUNT(DISTINCT column_name) FROM table_name WHERE condition;

Syntax verified against PostgreSQL 16, MySQL 8.0, SQL Server 2022.

Table of Contents

select distinct count sql Syntax Reference

-- Multi-column distinct count (SQL standard using concatenation or tuple)
SELECT COUNT(DISTINCT col1 || '-' || col2) FROM table_name;
-- Modern engines (PostgreSQL, SQLite 3.33+, MySQL 8.0+):
SELECT COUNT(DISTINCT (col1, col2)) FROM table_name;

-- Equivalent using derived table (works on all engines)
SELECT COUNT(*) FROM (
    SELECT DISTINCT col1, col2 FROM table_name WHERE condition
) AS dt;

-- Count per group with DISTINCT
SELECT group_col, COUNT(DISTINCT value_col) FROM table_name GROUP BY group_col;

select distinct count sql Rapid Reference Cheat Sheet

Action	SQL Clause	Engine Context	Key Flag / Syntax	Impact / Result
Count unique values in one column	`COUNT(DISTINCT col)`	All SQL engines	`DISTINCT` inside `COUNT()`	Returns cardinality of the column (NULLs excluded)
Count unique combinations of two columns	`COUNT(DISTINCT (a, b))`	PostgreSQL, MySQL 8.0+, SQLite 3.33+	Row-constructor syntax	Counts distinct unordered tuples
Count unique combinations (fallback)	`SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM t) AS dt`	All SQL engines	Derived table with `DISTINCT`	Universally portable; slight overhead from subquery
Count distinct values per group	`SELECT g, COUNT(DISTINCT c) FROM t GROUP BY g`	All SQL engines	`GROUP BY` + `COUNT(DISTINCT)`	Cardinality per partition
Count distinct with `NULL` sensitivity	`COUNT(DISTINCT COALESCE(col, 'NULL_PLACEHOLDER'))`	All SQL engines	`COALESCE` to replace NULL	Counts NULL as a distinct value instead of ignoring it
Count distinct across multiple tables	`SELECT COUNT(DISTINCT t1.col) FROM t1 JOIN t2 ON t1.id = t2.id`	All SQL engines	Join + `COUNT(DISTINCT)`	Cardinality after join, potentially large intermediate result set
Performance-optimized distinct count (approximate)	`SELECT COUNT(DISTINCT col) FROM table SAMPLE (1 PERCENT)`	PostgreSQL, Snowflake, BigQuery	`SAMPLE` / `TABLESAMPLE`	Reduces I/O; result is statistical estimate

Advanced Implementation & Parameters

Behavior Across Major Engines

COUNT(DISTINCT column) is part of the SQL-92 standard, but engine-specific quirks affect its execution:

PostgreSQL — Uses a hash-based aggregation (HashAggregate) when work_mem is sufficient; falls back to disk-based sort. COUNT(DISTINCT (a, b)) is supported natively as row-constructor.
MySQL 8.0+ — Supports COUNT(DISTINCT col) and COUNT(DISTINCT col1, col2) (comma-separated list, not tuple syntax). The comma form counts distinct combinations of two expressions.
SQL Server 2022 — Only accepts a single expression inside COUNT(DISTINCT). Multi-column distinct count must use the derived-table pattern. COUNT(DISTINCT col1 + col2) is valid but semantically dangerous (concatenation vs. tuple).
SQLite 3.33+ — Supports COUNT(DISTINCT (a, b)) row-constructor like PostgreSQL.
BigQuery — COUNT(DISTINCT) is exact up to approximately 1e12 cardinality; beyond that, use APPROX_COUNT_DISTINCT for performance.
Snowflake — COUNT(DISTINCT) always returns an exact count; APPROX_COUNT_DISTINCT available for large datasets.

Performance Characteristics

COUNT(DISTINCT) forces a sort or hash of the distinct values before the count. For large cardinalities, this can consume significant memory and I/O. Query planners typically show a HashAggregate or Unique followed by Aggregate node. The derived-table pattern SELECT COUNT(*) FROM (SELECT DISTINCT ...) produces the same plan on most engines but may be easier to tune with index hints.

-- PostgreSQL EXPLAIN output for COUNT(DISTINCT)
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;

-- Typical plan node: HashAggregate (cost=... rows=... width=...)
--   -> Seq Scan on cm_production
-- Index-only scan can improve performance if a covering index exists
CREATE INDEX idx_program_name ON cm_production (program_name);

Handling NULLs

COUNT(DISTINCT col) automatically excludes NULL values. To count NULL as a distinct value, use COALESCE to replace it with a sentinel:

SELECT COUNT(DISTINCT COALESCE(region, 'UNKNOWN')) FROM item WHERE region IS NULL;

Error Resolution & Troubleshooting

Error Code / Condition	Root Cause	Remediation Command / Action
`ERROR: function count(text, text) does not exist` (PostgreSQL)	Passing multiple columns as separate arguments to `COUNT(DISTINCT col1, col2)`	Use row-constructor: `COUNT(DISTINCT (col1, col2))`
`Incorrect syntax near ','` (SQL Server)	SQL Server does not accept multiple expressions inside `DISTINCT`	Use derived table: `SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t) AS dt`
`Out of memory / HashAggregate: spilled to disk`	Cardinality too high for `work_mem` (PostgreSQL) or `sort_buffer_size` (MySQL)	Increase `work_mem` (PostgreSQL) or `sort_buffer_size` (MySQL), or use approximate count on very large datasets
`Distinct count includes NULL unexpectedly`	Using `COUNT(DISTINCT COALESCE(col, value))` without understanding that NULL replacement becomes a distinct value	Remove `COALESCE` if NULLs should be ignored; confirm business logic
`Query returns 0 instead of expected count`	All values in column are NULL, or `WHERE` condition eliminates all rows	Check with `SELECT COUNT(*) FROM table WHERE condition` first; then verify column contains non-NULL data
`ELSE` branch in CASE expression causing distinct-over-partition error	Using `COUNT(DISTINCT CASE WHEN ... THEN col END)` incorrectly scopes the distinct set	`SELECT COUNT(DISTINCT CASE WHEN condition THEN col END) FROM t;` works but counts NULL for ELSE rows; use `COUNT(DISTINCT col) FILTER (WHERE condition)` on PostgreSQL 9.4+

Diagnostic Query Patterns

-- Check for NULLs in the target column
SELECT COUNT(*) AS total, COUNT(col) AS non_null, COUNT(*) - COUNT(col) AS null_count FROM table;

-- Preview distinct values before counting (limit for large cardinalities)
SELECT DISTINCT col FROM table ORDER BY col LIMIT 20;

-- Compare distinct count to total count for duplication ratio
SELECT COUNT(col) AS total, COUNT(DISTINCT col) AS unique, 
       ROUND(COUNT(col)::numeric / NULLIF(COUNT(DISTINCT col), 0), 2) AS duplication_ratio
FROM table;

Production-Grade Implementation

Indexing Strategy

To minimize latency in COUNT(DISTINCT column) execution, create a covering index that the query planner can use for an index-only scan:

-- PostgreSQL: index-only scan on a single column
CREATE INDEX idx_covering ON cm_production (program_name);

-- For multi-column distinct count with derived table
CREATE INDEX idx_covering_multi ON cm_production (program_type, program_name);

-- Check index usage
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(DISTINCT program_name) FROM cm_production;
-- Look for "Index Only Scan" or "Bitmap Index Scan"

Approximate Distinct Count for Truly Massive Data

When exact cardinality is not required and latency is critical, use engine-specific approximate functions:

Engine	Function	Error Margin	Use Case
PostgreSQL	`hll` extension via `hyperloglog`	~1-2%	Unique visitors, IP cardinality
BigQuery	`APPROX_COUNT_DISTINCT(col)`	~2%	Billions of rows
Snowflake	`APPROX_COUNT_DISTINCT(col)`	~1%	Large-scale analytics
MySQL	No built-in approximate distinct; use `TABLESAMPLE` estimate	Variable	Quick cardinality checks

Materialized Distinct Counts

For dashboards or repeated queries, cache the distinct count in a materialized view or summary table:

-- PostgreSQL materialized view
CREATE MATERIALIZED VIEW mv_program_unique_count AS
SELECT program_type, push_number, COUNT(DISTINCT program_name) AS unique_programs
FROM cm_production
GROUP BY program_type, push_number;

-- Refresh on schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_program_unique_count;

Security & Least Privilege

COUNT(DISTINCT col) requires SELECT on the target column(s). No write privileges needed.
For derived: table patterns, the subquery inherits the same permission scope.
Avoid exposing exact distinct counts on sensitive columns (PII, customer IDs) in public-facing APIs; use approximate counts or bucketed ranges instead.
In PostgreSQL, COUNT(DISTINCT) on a column with row-level security (RLS) reflects only rows visible per policy.

Anti-Patterns to Avoid

Using DISTINCT outside COUNT unnecessarily — SELECT DISTINCT col FROM table retrieves all rows; use COUNT(DISTINCT col) instead of SELECT COUNT(*) FROM (SELECT DISTINCT col FROM table) AS dt plan-wise it’s the same, but direct COUNT(DISTINCT) is clearer.
Counting distinct on an unbounded text column — Avoid COUNT(DISTINCT long_text_col) without limiting rows; it can consume exessive memory. Apply WHERE filters or use a hash.
Adding unnecessary concatenation — COUNT(DISTINCT col1 + col2) (SQL Server) treats the addition as a numeric expression, not a tuple, and can lose distinctness if concatenation produces non-unique results.

Frequently Asked Questions

What is the difference between COUNT(DISTINCT column) and COUNT(column) in SQL?

Answer: COUNT(DISTINCT column) returns the number of unique non-null values; COUNT(column) returns the total number of non-null rows.

COUNT(DISTINCT col) ignores duplicates and NULLs, while COUNT(col) counts every non-NULL row including duplicates.

-- Example on 'users' table with two NULL emails and one duplicate.
SELECT COUNT(DISTINCT email) AS unique_emails, COUNT(email) AS total_emails FROM users;

When should I use COUNT(DISTINCT) instead of GROUP BY with COUNT?

Answer: Use COUNT(DISTINCT col) for a single aggregated unique count; use GROUP BY when you need counts per category or additional columns.

COUNT(DISTINCT) is a scalar aggregate returning one row. GROUP BY with COUNT returns one row per group. Example:

-- COUNT(DISTINCT)
SELECT COUNT(DISTINCT department_id) FROM employees;
-- GROUP BY equivalent (same result but less efficient)
SELECT COUNT(*) FROM (SELECT DISTINCT department_id FROM employees) sub;

How do I fix “Syntax error: DISTINCT is not allowed with COUNT(*)” in SQL Server?

Answer: Replace COUNT(DISTINCT *) with COUNT(DISTINCT column) or use a subquery with DISTINCT inside a COUNT(*).

SQL Server and most databases disallow COUNT(DISTINCT *). Fix:

-- Incorrect
SELECT COUNT(DISTINCT *) FROM orders;
-- Correct
SELECT COUNT(DISTINCT order_id) FROM orders;
-- Alternative (counts distinct rows, excludes NULLs)
SELECT COUNT(*) FROM (SELECT DISTINCT * FROM orders) sub;

Does SELECT COUNT(DISTINCT column) work consistently across AWS RDS, Azure SQL, and Google Cloud SQL?

Answer: Yes, COUNT(DISTINCT) is ANSI SQL standard and supported on all three clouds, though NULL handling and performance vary slightly.

On AWS RDS (MySQL, PostgreSQL, SQL Server), Azure SQL Database, and Cloud SQL (MySQL, PostgreSQL), COUNT(DISTINCT col) ignores NULLs identically. For large datasets, test vendor-specific optimizations:

-- Example works on all
SELECT COUNT(DISTINCT client_id) FROM billing WHERE status='active';

What is the fastest way to count distinct values in a large PostgreSQL table?

Answer: Use the hll extension (HyperLogLog) for approximate distinct counts, or ensure an index on the column for exact COUNT(DISTINCT).

Exact:

-- With B-tree index on column
CREATE INDEX idx_col ON large_table(col);
SELECT COUNT(DISTINCT col) FROM large_table;

Approximate (fast, low memory):

CREATE EXTENSION hll;
SELECT hll_cardinality(hll_add_agg(hll_hash_text(col))) AS approx_distinct FROM large_table;

Thomas Heinrich

Command Line Expert & Software Engineer

Welcome! I’m Thomas Heinrich, a software engineer and system administrator with a deep passion for the Command Line Interface (CLI). With years of experience navigating the terminal, building backend architectures, and automating server deployments, I created this space to share practical, real-world terminal knowledge.

Whether you are a beginner taking your first steps in a Linux environment or a seasoned DevOps engineer looking to optimize your deployment scripts, you will find actionable solutions here. My goal is to help you ditch the mouse, speed up your workflow, and harness the full power of the command line.