Data comparison in SQL is a different discipline from comparing SQL scripts or schemas. The question here is not "do these two migration files match?" but rather: "do these two tables contain the same rows?" After a data migration, do the source and destination hold identical records? After a nightly ETL, does the data warehouse match the OLTP system? After a replica failover, has any row drifted? These questions require query-level techniques — set operators, JOINs, hash fingerprints — not text diff tools. For the complementary topic of diffing SQL scripts and schemas, see the guide to SQL code and schema comparison. For the broader problem of reconciling data sets across environments, this article provides the SQL-native building blocks.

ISO/IEC 9075 (the SQL standard) defines set operators — EXCEPT, INTERSECT, UNION — that make row-level comparison in SQL a first-class language feature. The challenge is that engine support varies, NULL semantics are treacherous, and naive approaches collapse under production data volumes. This guide covers every practical pattern, from a three-line EXCEPT query to SHA-256 hash pipelines for bulletproof row validation, with dialect-tagged runnable examples for SQL Server, PostgreSQL, MySQL, and Oracle.

Why Data Comparison in SQL Matters

When Teams Need Data Comparison in SQL Migration Source vs destination row counts, sums, key sets identical? EXCEPT / hash ETL Reconciliation Control totals match? Late-arriving rows? Duplicates introduced? COUNT/SUM + MERGE Replica Drift Read replica in sync with primary after network partition? FULL OUTER JOIN Audit & Compliance Point-in-time snapshot vs current state. What changed, when? snapshot tables
The four primary drivers for data comparison in SQL. Each requires a different query pattern.

Every team that moves data between systems eventually faces the same question: did it arrive correctly? The four canonical scenarios are:

  • Migration validation: After migrating from MySQL to PostgreSQL, or from an on-prem database to a cloud warehouse, verify that every row transferred correctly. Row counts are necessary but not sufficient — a bulk INSERT can silently truncate VARCHAR columns or coerce NULL values. You need row-level comparison in SQL to be certain.
  • ETL reconciliation: Nightly loads from OLTP to data warehouse should produce predictable aggregate deltas. When COUNT(*) or SUM(revenue) drifts unexpectedly, the ETL pipeline introduced duplicates, dropped rows, or applied incorrect transformations. The data set reconciliation pattern starts with control totals and drills down with set operators.
  • Replica drift detection: After a network partition or a failover event, a read replica may have diverged from the primary. A FULL OUTER JOIN comparison — or a hash comparison on sampled rows — quickly confirms whether the replica is safe to promote or needs a full resync.
  • Audit and compliance: Regulated environments must prove that data has not changed unexpectedly between audits. Snapshot tables combined with point-in-time comparison queries create an immutable audit trail of exactly what changed and when.

This guide focuses entirely on comparing row data using SQL queries. If your goal is to diff the SQL scripts, stored procedures, or schema definitions themselves, that is a separate problem covered in the SQL code compare guide. And if you need to compare individual database tables row by row with a visual interface, dedicated tools exist for that too — but understanding the underlying SQL patterns makes you independent of any specific tool.

Set Operators: EXCEPT, INTERSECT, MINUS

SQL Set Operators: EXCEPT · INTERSECT · UNION A B EXCEPT Rows in A, not in B A B INTERSECT Rows in both A and B A B UNION All distinct rows from A and B
EXCEPT returns rows exclusive to the left table. INTERSECT returns the common rows. UNION returns all distinct rows from both.

Set operators are the cleanest SQL-native tool for data comparison in SQL. They operate on full rows, not individual columns, and they handle NULLs correctly for set membership — two NULLs in the same column position are treated as equal for the purposes of EXCEPT and INTERSECT (unlike the standard equality operator where NULL = NULL is UNKNOWN).

EXCEPT: rows in A but not in B

-- SQL Server / PostgreSQL
-- Find rows in production that are missing from the archive
SELECT id, customer_id, amount, status
FROM   orders_production

EXCEPT

SELECT id, customer_id, amount, status
FROM   orders_archive;
-- Returns every row from orders_production
-- with no matching row (on all selected columns) in orders_archive.
-- Oracle: use MINUS (Oracle 21c+ also accepts EXCEPT as a synonym)
-- MINUS is the historical Oracle operator; EXCEPT was added as a synonym in 21c
SELECT id, customer_id, amount, status
FROM   orders_production

MINUS

SELECT id, customer_id, amount, status
FROM   orders_archive;
-- MySQL 8.0.31+ (native EXCEPT added in Oct 2022)
SELECT id, customer_id, amount, status
FROM   orders_production

EXCEPT

SELECT id, customer_id, amount, status
FROM   orders_archive;
-- Pre-8.0.31 and MariaDB: see Section 4 for workarounds.

INTERSECT: rows present in both tables

-- SQL Server / PostgreSQL / MySQL 8.0.31+ / Oracle (INTERSECT is universal)
-- Find records confirmed in both the source and destination after migration
SELECT id, email, created_at
FROM   users_source

INTERSECT

SELECT id, email, created_at
FROM   users_destination;
-- Only rows identical on ALL selected columns appear in the result.

Bidirectional diff: find all discrepancies

A single EXCEPT only shows rows in A missing from B. To find all differences in both directions — rows missing from B and rows missing from A — combine two EXCEPTs with UNION ALL:

-- SQL Server / PostgreSQL
-- Complete bidirectional diff
SELECT 'only_in_source' AS diff_direction,
       id, customer_id, amount, status
FROM   orders_source
EXCEPT
SELECT 'only_in_source', id, customer_id, amount, status
FROM   orders_destination

UNION ALL

SELECT 'only_in_destination' AS diff_direction,
       id, customer_id, amount, status
FROM   orders_destination
EXCEPT
SELECT 'only_in_destination', id, customer_id, amount, status
FROM   orders_source

ORDER BY diff_direction, id;

Critical gotchas

  • EXCEPT removes duplicates by default. EXCEPT is equivalent to EXCEPT DISTINCT — duplicate rows in the left table are deduplicated before comparison. EXCEPT ALL preserves duplicates and is supported by PostgreSQL only; SQL Server does not implement EXCEPT ALL. If duplicate row counts matter (e.g., verifying no duplicates were introduced), use EXCEPT ALL in PostgreSQL or use the UNION ALL + GROUP BY pattern in other engines.
  • Column count and types must match. Both SELECT lists must have the same number of columns and compatible data types. Cast explicitly when comparing across environments where type coercion rules differ.
  • NULL handling is safe here. Unlike =, set operators treat two NULLs in the same column position as equal. A row with a NULL in column 3 in table A will match a row with a NULL in column 3 in table B, so nulls do not generate false positives in EXCEPT results.
  • Oracle MINUS = EXCEPT. Oracle's historical operator is MINUS. Oracle 21c added EXCEPT as a synonym. Both are EXCEPT DISTINCT semantics.

The Microsoft Learn reference for EXCEPT and INTERSECT documents SQL Server's implementation in detail, including ordering rules and limitations with FOR XML clauses.

JOIN-Based Data Comparison

FULL OUTER JOIN Row-Level Diff orders_source (A) id=1 amt=100 status=paid id=2 amt=250 status=paid id=3 amt=80 status=pending id=4 amt=320 status=paid id=5 amt=175 status=paid orders_dest (B) id=1 amt=100 status=paid id=2 amt=250 status=paid — missing — id=4 amt=320 status=paid id=6 amt=90 status=refund FULL OUTER Only in A (id=3) Only in B (id=6) Matched rows FULL OUTER JOIN on id reveals id=3 missing from dest and id=6 extra in dest
A FULL OUTER JOIN surfaces rows present in one table but absent in the other. Blue = only in source; red = only in destination.

JOIN-based comparison gives you more control than set operators: you can compare specific columns, show matched vs. unmatched rows side by side, and include rows where values differ between tables. This is the pattern to use when you want to see how rows differ, not just which rows differ.

FULL OUTER JOIN: find all unmatched rows

-- SQL Server / PostgreSQL / Oracle
-- Finds rows present in only one of the two tables
SELECT
    COALESCE(a.id, b.id)         AS id,
    a.customer_id                AS src_customer,
    b.customer_id                AS dst_customer,
    a.amount                     AS src_amount,
    b.amount                     AS dst_amount,
    CASE
        WHEN a.id IS NULL THEN 'only_in_destination'
        WHEN b.id IS NULL THEN 'only_in_source'
        ELSE                    'matched'
    END                          AS diff_status
FROM   orders_source  a
FULL OUTER JOIN orders_destination b ON a.id = b.id
WHERE  a.id IS NULL OR b.id IS NULL  -- remove to see all rows
ORDER  BY id;

Extend to value-level diffs

To catch rows that exist in both tables but have different column values (e.g., a field was updated in one environment but not the other):

-- SQL Server / PostgreSQL
-- Rows that exist in both but have column-level differences
SELECT
    a.id,
    a.amount    AS src_amount,
    b.amount    AS dst_amount,
    a.status    AS src_status,
    b.status    AS dst_status
FROM   orders_source      a
JOIN   orders_destination b ON a.id = b.id
WHERE  a.amount  <> b.amount
    OR a.status  <> b.status
    -- NULL-safe comparisons where columns are nullable:
    OR (a.notes IS NULL) <> (b.notes IS NULL)
    OR (a.notes IS NOT NULL AND a.notes <> b.notes)
ORDER  BY a.id;

LEFT JOIN WHERE NULL: rows in A but not B (index-friendly)

-- All engines including MySQL pre-8.0.31
-- Equivalent to: SELECT ... FROM a EXCEPT SELECT ... FROM b
-- but faster on indexed join columns
SELECT a.id, a.customer_id, a.amount
FROM   orders_source      a
LEFT JOIN orders_destination b ON a.id = b.id
WHERE  b.id IS NULL;  -- no matching row found in destination

This pattern is typically faster than EXCEPT on indexed columns because the optimizer can use an index scan + hash join rather than materializing both result sets for set subtraction. Use it when the join column is indexed in both tables.

MySQL Workarounds: No Native EXCEPT

MySQL: Choose the Right Comparison Pattern MySQL version? 8.0.31+ Native EXCEPT ✓ < 8.0.31 / MariaDB NULLs in subquery? Yes NOT EXISTS (NULL-safe) No NULLs LEFT JOIN WHERE NULL (fastest on index) UNION ALL + GROUP BY HAVING (dupe-aware) need dupes AVOID NOT IN when subquery may contain NULLs — returns zero rows
Choosing the right MySQL comparison pattern. NOT IN is a trap when NULLs are possible in the subquery.

MySQL added native EXCEPT support in 8.0.31 (released October 2022). If you are on an older MySQL version, or on MariaDB (which diverged from MySQL before 8.0.31 and does not support native EXCEPT as of this writing), you need workarounds. The right choice depends on your data and performance requirements.

Option 1: NOT EXISTS (recommended when NULLs are possible)

-- MySQL pre-8.0.31 / MariaDB
-- Find rows in table_a not present in table_b
-- NULL-safe: works correctly even if id can be NULL
SELECT a.id, a.customer_id, a.amount
FROM   orders_a a
WHERE NOT EXISTS (
    SELECT 1
    FROM   orders_b b
    WHERE  b.id          = a.id
      AND  b.customer_id = a.customer_id
      AND  b.amount      = a.amount
);

Option 2: NOT IN (only when subquery column is guaranteed non-NULL)

-- MySQL pre-8.0.31 / MariaDB
-- WARNING: if b.id can ever be NULL, this returns ZERO rows.
-- x NOT IN (1, 2, NULL) is always UNKNOWN (never TRUE) in SQL three-valued logic.
-- Only use this when the join column is NOT NULL and has a NOT NULL constraint.
SELECT id, customer_id, amount
FROM   orders_a
WHERE  id NOT IN (SELECT id FROM orders_b);
-- Safe only if orders_b.id is defined NOT NULL.

Option 3: LEFT JOIN WHERE NULL (fastest on indexed join columns)

-- MySQL pre-8.0.31 / MariaDB
SELECT a.id, a.customer_id, a.amount
FROM   orders_a      a
LEFT JOIN orders_b   b ON a.id = b.id
WHERE  b.id IS NULL;
-- If the join uses only a single key column (id), this is often the fastest
-- option because the optimizer can use an index scan on b.id.

Option 4: UNION ALL + GROUP BY HAVING (dupe-aware EXCEPT emulation)

-- MySQL pre-8.0.31 / MariaDB
-- Emulates EXCEPT ALL — preserves duplicate row counts
-- Each row gets a source tag; rows appearing only once (in one table) are the diff
SELECT id, customer_id, amount, MIN(src) AS found_in
FROM (
    SELECT id, customer_id, amount, 'A' AS src FROM orders_a
    UNION ALL
    SELECT id, customer_id, amount, 'B' AS src FROM orders_b
) combined
GROUP BY id, customer_id, amount
HAVING COUNT(DISTINCT src) = 1  -- appears in only one table
ORDER BY id;

The UNION ALL + GROUP BY HAVING pattern is the most flexible: it works in all MySQL versions, handles duplicates, and shows which table each unmatched row came from. For large tables, ensure id, customer_id, and amount are covered by a composite index or the GROUP BY will cause a full table scan. See the guide to MySQL data comparison tools for GUI-based alternatives that generate these queries automatically.

Hash and Checksum Row Validation

Hash-Based Row Validation Pipeline Row Data id=42 name='Alice' amt=NULL CONCAT_WS '42|Alice|' NULL becomes '' separator '|' needed SHA-256 HASHBYTES (T-SQL) md5(row::text) (PG) MD5(CONCAT_WS) (MySQL) Digest a3f2... 64 hex chars Concatenation collision without separator CONCAT('ab','c') = CONCAT('a','bc') = 'abc' — use CONCAT_WS('|',...) to prevent false matches
Hash validation pipeline: concatenate columns with a separator, hash the result, compare the fixed-length digest. A separator is mandatory to prevent collision between ('ab','c') and ('a','bc').

Set operators compare rows column-by-column, which is accurate but expensive on wide tables with many columns. Hash-based comparison condenses an entire row into a fixed-length digest. You compare digests instead of raw columns — far fewer bytes to transmit and compare, especially useful for cross-database validation.

SQL Server: HASHBYTES (the correct choice for integrity)

-- SQL Server
-- CHECKSUM() is fast but collision-prone — Microsoft explicitly warns
-- against using it for data integrity validation.
-- Use HASHBYTES('SHA2_256', ...) for trustworthy row fingerprints.

SELECT
    id,
    HASHBYTES(
        'SHA2_256',
        CONCAT_WS('|',
            CAST(id           AS NVARCHAR(50)),
            CAST(customer_id  AS NVARCHAR(50)),
            CAST(amount       AS NVARCHAR(50)),
            COALESCE(status, ''),
            COALESCE(notes,  '')
        )
    ) AS row_hash
FROM orders_source;

Then join the two hash result sets on the primary key and compare digests:

-- SQL Server: find rows where any column value differs
WITH src_hash AS (
    SELECT id,
           HASHBYTES('SHA2_256',
               CONCAT_WS('|',
                   CAST(id AS NVARCHAR(50)),
                   CAST(customer_id AS NVARCHAR(50)),
                   CAST(amount AS NVARCHAR(50)),
                   COALESCE(status,''), COALESCE(notes,'')
               )
           ) AS row_hash
    FROM orders_source
),
dst_hash AS (
    SELECT id,
           HASHBYTES('SHA2_256',
               CONCAT_WS('|',
                   CAST(id AS NVARCHAR(50)),
                   CAST(customer_id AS NVARCHAR(50)),
                   CAST(amount AS NVARCHAR(50)),
                   COALESCE(status,''), COALESCE(notes,'')
               )
           ) AS row_hash
    FROM orders_destination
)
SELECT s.id, 'hash_mismatch' AS diff_type
FROM   src_hash s
JOIN   dst_hash d ON s.id = d.id
WHERE  s.row_hash <> d.row_hash;  -- values differ somewhere in the row

PostgreSQL: md5(row::text)

-- PostgreSQL
-- Cast the entire row to text and MD5 it.
-- Simpler but less portable; column order matters.
SELECT id, md5(CAST(t AS TEXT)) AS row_hash
FROM   orders_source t;

-- More explicit and portable (SHA-256 requires pgcrypto extension):
SELECT
    id,
    md5(
        CONCAT_WS('|',
            CAST(id AS TEXT),
            CAST(customer_id AS TEXT),
            CAST(amount AS TEXT),
            COALESCE(status,''),
            COALESCE(notes,'')
        )
    ) AS row_hash
FROM orders_source;

MySQL: MD5(CONCAT_WS(...))

-- MySQL / MariaDB
-- MD5 returns a 32-char hex string.
-- CONCAT_WS skips NULLs but still needs a separator to avoid collisions.
SELECT
    id,
    MD5(CONCAT_WS('|',
        id,
        customer_id,
        CAST(amount AS CHAR),
        IFNULL(status,''),
        IFNULL(notes,'')
    )) AS row_hash
FROM orders_source;

Why the separator is mandatory

Without a separator, CONCAT('ab','c') and CONCAT('a','bc') both produce 'abc'. Row ('ab', 'c') and row ('a', 'bc') would produce identical hashes despite being different data — a silent false negative. Using CONCAT_WS('|', ...) produces 'ab|c' vs 'a|bc', which hash differently. Choose a separator character that cannot appear in your data, or use a multi-character sentinel like '||'.

Use COALESCE(column, '') or the engine-specific NULL-replacement function (ISNULL, IFNULL, NVL) for nullable columns. Without it, a NULL in any position makes the entire concatenation NULL, and all rows with any NULL field would hash identically.

Comparing Data Across Databases

When the tables you need to compare live in different databases — even different servers or different database engines — each major engine provides a native federation mechanism.

SQL Server: Linked Servers

-- SQL Server: query a remote server using four-part name notation
-- server.database.schema.table
SELECT
    local_orders.id,
    remote_orders.id AS remote_id
FROM   dbo.orders                                  AS local_orders
FULL OUTER JOIN
       [prod-server].myapp.dbo.orders              AS remote_orders
ON local_orders.id = remote_orders.id
WHERE  local_orders.id IS NULL
    OR remote_orders.id IS NULL;
-- Linked Server must be configured: EXEC sp_addlinkedserver ...

PostgreSQL: Foreign Data Wrappers

-- PostgreSQL: postgres_fdw for cross-server comparison
-- One-time setup:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER remote_prod
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'prod-db.internal', dbname 'myapp', port '5432');

CREATE USER MAPPING FOR current_user
    SERVER remote_prod
    OPTIONS (user 'reader', password 'secret');

CREATE FOREIGN TABLE orders_remote (
    id          BIGINT,
    customer_id BIGINT,
    amount      NUMERIC(12,2),
    status      VARCHAR(50)
) SERVER remote_prod OPTIONS (schema_name 'public', table_name 'orders');

-- Now compare as if it were a local table:
SELECT id FROM orders_local
EXCEPT
SELECT id FROM orders_remote;

Oracle: Database Links

-- Oracle: query a remote Oracle database via a database link
CREATE DATABASE LINK prod_link
    CONNECT TO reader IDENTIFIED BY secret
    USING 'PROD_TNS_ALIAS';

-- Use @link_name to reference the remote table
SELECT id, amount FROM orders_local
MINUS
SELECT id, amount FROM orders@prod_link
ORDER BY 1;

MySQL: Federated Tables

-- MySQL: FEDERATED storage engine (must be enabled in my.cnf)
CREATE TABLE orders_remote (
    id          INT NOT NULL,
    customer_id INT,
    amount      DECIMAL(12,2),
    status      VARCHAR(50),
    PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://reader:secret@prod-host:3306/myapp/orders';

-- Query and compare as a local table
SELECT a.id FROM orders_local a
LEFT JOIN orders_remote b ON a.id = b.id
WHERE b.id IS NULL;

The offline fallback: export, compare

When cross-engine comparison is needed, or when network policy prevents direct connections, export both sides as CSV or JSON and compare the exported files. This is also the safest approach for compliance-sensitive data — no credentials or data leave the controlled environment over a live database link.

-- PostgreSQL: export to CSV
\COPY (SELECT id, customer_id, amount, status FROM orders ORDER BY id)
TO '/tmp/orders_prod.csv' CSV HEADER;

-- MySQL: export to CSV
SELECT id, customer_id, amount, status
FROM orders
ORDER BY id
INTO OUTFILE '/tmp/orders_staging.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

Once you have both CSV files, use the diff command or import them into any tabular diff tool for visual comparison. For JSON exports, the guide to comparing JSON objects online covers the available tooling. If you need to compare the extracted primary key lists before doing a full row diff, the list comparison guide walks through that narrower problem efficiently.

ETL Reconciliation Patterns

ETL reconciliation is systematic — you verify counts first, then sums, then hashes, and only drill to row-level diffs when aggregates disagree. This layered approach keeps validation fast on large tables.

Control totals: the first gate

-- Run on both source and destination, compare the results manually
-- or programmatically in your ETL orchestration tool
SELECT
    COUNT(*)                  AS total_rows,
    COUNT(DISTINCT id)        AS distinct_ids,
    SUM(amount)               AS total_amount,
    MIN(created_at)           AS earliest_record,
    MAX(created_at)           AS latest_record,
    MAX(updated_at)           AS last_update
FROM orders
WHERE created_at >= '2026-01-01'
  AND created_at <  '2026-02-01';
-- If all six aggregates match between source and destination,
-- row-level content is almost certainly identical.

MERGE statement for INSERT/UPDATE/DELETE deltas

When the goal is not just to detect differences but to surface them as actionable INSERT / UPDATE / DELETE operations:

-- SQL Server / Oracle
-- Surface the delta between source and destination
MERGE INTO orders_destination AS dst
USING orders_source            AS src
ON (dst.id = src.id)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, customer_id, amount, status)
    VALUES (src.id, src.customer_id, src.amount, src.status)
WHEN MATCHED AND (
        dst.amount <> src.amount
     OR dst.status <> src.status
) THEN
    UPDATE SET
        dst.amount = src.amount,
        dst.status = src.status
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
-- Optional: log what happened
OUTPUT $action AS merge_action, inserted.id, deleted.id;

Snapshot tables for point-in-time comparison

For audit and compliance use cases, store periodic snapshots of critical tables and query the snapshots for point-in-time comparison:

-- SQL Server / PostgreSQL
-- Create a snapshot at a specific point in time
INSERT INTO orders_snapshot (snapshot_date, id, customer_id, amount, status)
SELECT CURRENT_DATE, id, customer_id, amount, status
FROM orders;

-- Later: compare current state against the snapshot from 30 days ago
SELECT
    COALESCE(current.id, snap.id) AS id,
    snap.amount                   AS amount_30d_ago,
    current.amount                AS amount_now,
    CASE
        WHEN snap.id    IS NULL THEN 'new_record'
        WHEN current.id IS NULL THEN 'deleted'
        WHEN snap.amount <> current.amount THEN 'amount_changed'
        ELSE 'unchanged'
    END AS change_type
FROM orders current
FULL OUTER JOIN orders_snapshot snap
    ON current.id = snap.id
    AND snap.snapshot_date = CURRENT_DATE - INTERVAL '30 days'
WHERE snap.id IS NULL
   OR current.id IS NULL
   OR snap.amount <> current.amount
ORDER BY id;

Performance and Large Tables

Data comparison in SQL on large tables requires deliberate query design. A naive EXCEPT on two 100-million-row tables will materialize both result sets, hash them, and perform a set subtraction — potentially spilling to disk and running for hours.

Index-friendly patterns

  • JOIN on indexed primary key first. LEFT JOIN WHERE NULL with the join on a clustered primary key is usually faster than EXCEPT because the optimizer can apply a merge join on sorted data without materializing both sets.
  • NOT EXISTS beats NOT IN on indexed columns. NOT EXISTS uses a correlated semi-join that short-circuits on first match. NOT IN (subquery) must evaluate the full subquery result.
  • Partition pruning. If the table is date-partitioned, add a partition key filter to both sides of the comparison. A WHERE clause on the partition key prevents a full table scan even when the join covers all other columns.

Sampling for initial validation

-- SQL Server: TABLESAMPLE for a fast approximate check
SELECT id, amount, status
FROM orders_source TABLESAMPLE (1 PERCENT)
EXCEPT
SELECT id, amount, status
FROM orders_destination TABLESAMPLE (1 PERCENT);
-- Not statistically guaranteed but catches gross migration failures quickly.

-- PostgreSQL: TABLESAMPLE BERNOULLI for random sampling
SELECT id, amount, status
FROM orders_source TABLESAMPLE BERNOULLI(1)
EXCEPT
SELECT id, amount, status
FROM orders_destination TABLESAMPLE BERNOULLI(1);

Chunked comparison for very large tables

-- Process in ID ranges to avoid a single massive query
-- Run this in a loop from your ETL tool, incrementing the range each iteration
SELECT id FROM orders_source
WHERE id BETWEEN 1 AND 100000
EXCEPT
SELECT id FROM orders_destination
WHERE id BETWEEN 1 AND 100000;
-- Repeat for 100001-200000, etc.

NOLOCK pitfalls

In SQL Server, WITH (NOLOCK) is sometimes added to comparison queries to avoid blocking. This is dangerous for data validation: NOLOCK allows dirty reads, which means your comparison query may see uncommitted data that will later be rolled back. A comparison using NOLOCK can report false differences that disappear on the next run. Use READ COMMITTED SNAPSHOT ISOLATION (RCSI) at the database level instead, which provides consistent reads without blocking.

Data Comparison Tool Matrix

The right tool depends on whether you need a one-off query, an automated pipeline, or a visual report. The dedicated guide to database comparison software covers GUI tools in depth. Here is how the approaches stack up for pure data comparison:

Approach Best For Cost Cross-DB Output
EXCEPT / MINUS Ad-hoc row diff, same engine Free (built-in) Same engine only Result set
FULL OUTER JOIN Column-level diffs, custom logic Free (built-in) Via linked server / FDW Result set
Hash / HASHBYTES Wide tables, high-confidence validation Free (built-in) Yes (hash then compare) Mismatch IDs
MERGE statement ETL delta detection + apply Free (built-in) Via linked server / FDW Delta operations
Redgate SQL Data Compare SQL Server / Oracle, GUI, CI/CD Paid SQL Server & Oracle Report + sync script
dbForge Data Compare MySQL, SQL Server, PostgreSQL, Oracle GUI Paid (free trial) Yes (multi-engine) Report + sync script
ApexSQL Data Diff SQL Server, GUI, scheduling Paid SQL Server Report + sync script
Diff Checker (browser extension) Visually diff CSV / JSON query exports Free Any (via export) Visual side-by-side diff

Best Practices for Data Comparison in SQL

1. Start with aggregates, not row-level queries

Run COUNT(*), SUM on key financial columns, and MAX(updated_at) first. If these match between source and destination, you have strong evidence of consistency. Row-level data comparison in SQL is expensive — only run it when aggregates diverge.

2. Never use NOT IN when the subquery may contain NULLs

This is the single most common source of silent incorrect results in SQL comparison queries. x NOT IN (1, 2, NULL) is always UNKNOWN in SQL three-valued logic, so no rows are ever returned. Use NOT EXISTS instead — it is NULL-safe and typically faster on indexed columns.

3. Use UNION ALL + GROUP BY HAVING for dupe-aware comparison

EXCEPT DISTINCT silently deduplicates. If the source has three copies of a row and the destination has two, EXCEPT reports no difference. For ETL validation where duplicate row counts matter, use UNION ALL + GROUP BY HAVING COUNT(DISTINCT src) = 1.

4. Standardize collation before comparing string columns

Case-insensitive collations on one server and case-sensitive on another mean 'Alice' and 'alice' are equal in one environment and different in the other. Normalize string columns with explicit COLLATE or LOWER() before any string-based comparison, or accept that case differences will show up as false positives.

5. Use the checksum + spot-check pattern for very large tables

Compute a row hash for every row, then compare hashes. When hashes disagree, retrieve the actual rows for those IDs and inspect the column-level diff. This two-phase approach — hash sweep then spot-check — avoids transmitting all column data for all rows while still catching any difference.

6. Record reconciliation results to an audit table

Do not just run comparison queries and discard the output. INSERT the results — timestamp, comparison type, rows_in_source, rows_in_destination, discrepancy_count — into a reconciliation log table. This creates a historical record that is invaluable for proving data integrity to auditors and for diagnosing recurring drift patterns.

7. Beware of type coercion differences across engines

A DECIMAL(12,2) in MySQL and a NUMERIC(12,2) in PostgreSQL store the same values, but comparing them via an exported CSV introduces floating-point formatting differences (100.00 vs 100). Cast explicitly when comparing across engines, and use string-format normalization before hashing cross-engine row data.

Frequently Asked Questions

How do I compare data in two SQL tables?

The cleanest approach is EXCEPT (SQL Server, PostgreSQL, MySQL 8.0.31+) or MINUS (Oracle): SELECT cols FROM table_a EXCEPT SELECT cols FROM table_b returns rows present in A but not in B. Combine two EXCEPT queries with reversed order via UNION ALL for a bidirectional diff. When you also need column-level value differences for rows that exist in both tables, use a FULL OUTER JOIN on the primary key with a CASE expression to classify each row as matched, only_in_source, or only_in_destination.

What is the difference between EXCEPT and INTERSECT?

EXCEPT returns rows that appear in the first query but not in the second — a set subtraction. INTERSECT returns only rows that appear in both queries — a set intersection. EXCEPT is the right operator for finding missing or extra rows after a migration; INTERSECT confirms which rows are identical on both sides. Both default to DISTINCT semantics (duplicates collapsed); PostgreSQL additionally supports EXCEPT ALL and INTERSECT ALL, which preserve duplicate row counts.

How do I compare data in MySQL before 8.0.31?

MySQL only added native EXCEPT in 8.0.31 (October 2022). On earlier MySQL versions and on MariaDB, three patterns cover every case. NOT EXISTS is safest because it handles NULLs correctly. LEFT JOIN ... WHERE b.id IS NULL is typically fastest on indexed primary keys. UNION ALL + GROUP BY HAVING COUNT is the most flexible — it preserves duplicate counts and shows which table each unmatched row came from. Avoid NOT IN whenever the subquery column can contain NULL, because a single NULL silently returns zero rows under SQL three-valued logic. The full workarounds are in Section 4.

Is CHECKSUM safe for comparing data in SQL Server?

No. Microsoft Learn explicitly documents that CHECKSUM() is non-cryptographic and has a high collision rate — two different rows can produce the same checksum, so a CHECKSUM-based comparison can report identical when the data has actually changed. CHECKSUM is acceptable for fast change-detection hints, never for validating data integrity after a migration or replication event. Use HASHBYTES('SHA2_256', CONCAT_WS('|', ...)) instead: SHA-256 collisions are computationally infeasible on real-world data, and CONCAT_WS with an explicit separator prevents the ('ab','c') vs ('a','bc') concatenation collision.

How do I compare data across two different databases in SQL?

Each major engine ships a native federation mechanism. SQL Server uses Linked Servers and four-part names (server.database.schema.table). PostgreSQL uses Foreign Data Wrappers — postgres_fdw for another Postgres, or dblink, mysql_fdw, oracle_fdw for cross-engine federation. Oracle uses Database Links and the @link_name syntax. MySQL uses the FEDERATED storage engine. When live cross-server connections are not permitted by network or compliance policy, the offline fallback is to export both sides as CSV or JSON and diff the exported files.

Visually diff exported query results in your browser

After running EXCEPT or hash comparison queries, export the result sets as CSV or JSON and paste them into Diff Checker for a side-by-side visual diff. Monaco Editor, three diff algorithms (Smart / Line / Character with ignore-whitespace option), and local processing — no data leaves your machine.

Add to Chrome — It's Free