NULL IS NULL, except when it isn't.

Illustration of a (possible) SQLite bug by especkman@gmail.com

Code & Data: notebook_and_database.zip

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook-test.db
In [2]:
%%sql
SELECT sqlite_version();
 * sqlite:///chinook-test.db
Done.
Out[2]:
sqlite_version()
3.33.0
  • Problem observed with 3.33.0 on MacOS, Sqlite3 installed using Homebrew.
    • May also have been present in 3.32.3
  • Can't reproduce using a cloud service (Dataquest) with 3.21.0 installed.
In [3]:
%%sql
SELECT NULL
 * sqlite:///chinook-test.db
Done.
Out[3]:
NULL
None

I believe the SQL library juypyter is using is translating NULL into the pythonic equivalent None when displaying results

The Unexpected Result

In [4]:
%%sql
WITH 
    tracks_purchased AS
    (
        SELECT *
        FROM invoice_line as il
        INNER JOIN track tr on il.track_id = tr.track_id
    ), 
    candidates AS
    (
        SELECT invoice_id, album_id
        FROM tracks_purchased
        GROUP BY invoice_id,album_id
    ),
    single_album_purchases AS
    (
        SELECT
            invoice_id,
            album_id
        FROM tracks_purchased
        GROUP BY invoice_id
        HAVING count(DISTINCT(album_id)) = 1
    ),
    whole_album_purchases AS
    (
        SELECT 
            s.invoice_id, 
            (
                (
                    SELECT track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                    EXCEPT
                    SELECT tr.track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                ) IS NULL
                AND
                (
                    SELECT track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                    EXCEPT    
                    SELECT tp.track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                )IS NULL

            ) AS whole_album
        FROM single_album_purchases s
    )

SELECT
    w.whole_album,
    CASE
        WHEN w.whole_album IS NULL THEN "Nope"
        WHEN w.whole_album IS TRUE then "Yes"
        ELSE "No"
    END is_album,
    w.whole_album IS NULL,
    i.invoice_id,
    i.total
FROM invoice i
LEFT JOIN whole_album_purchases w ON i.invoice_id = w.invoice_id
LIMIT 10;
 * sqlite:///chinook-test.db
Done.
Out[4]:
whole_album is_album w.whole_album IS NULL invoice_id total
1 Yes 0 1 15.84
None Yes 0 2 9.9
None Yes 0 3 1.98
None Yes 0 4 7.92
1 Yes 0 5 16.83
None Yes 0 6 1.98
None Yes 0 7 10.89
None Yes 0 8 9.9
None Yes 0 9 8.91
None Yes 0 10 1.98

The whole_album column contains a mix of True & False values from the right table in a LEFT JOIN, as well as NULL values where no rows existed matching the join condition.

The is_album column uses CASE statements to translate the contents of the whole_album column. It was originally intended to have just "Yes" and "No" values, but a third case was added because NULL's were being Traslated as YES, rather than falling through to the ELSE catchall.

w.whole_album IS NULL further demonstrates that the DB engine does not evaluate the NULLs from the query as NULL.

Further confirmation of unexpected result

In [5]:
%%sql
WITH 
    tracks_purchased AS
    (
        SELECT *
        FROM invoice_line as il
        INNER JOIN track tr on il.track_id = tr.track_id
    ), 
    candidates AS
    (
        SELECT invoice_id, album_id
        FROM tracks_purchased
        GROUP BY invoice_id,album_id
    ),
    single_album_purchases AS
    (
        SELECT
            invoice_id,
            album_id
        FROM tracks_purchased
        GROUP BY invoice_id
        HAVING count(DISTINCT(album_id)) = 1
    ),
    whole_album_purchases AS
    (
        SELECT 
            s.invoice_id, 
            (
                (
                    SELECT track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                    EXCEPT
                    SELECT tr.track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                ) IS NULL
                AND
                (
                    SELECT track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                    EXCEPT    
                    SELECT tp.track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                )IS NULL

            ) AS whole_album
        FROM single_album_purchases s
    )

SELECT
    w.whole_album,
    CASE
        WHEN w.whole_album IS NULL THEN "Nope"
        WHEN w.whole_album IS TRUE then "Yes"
        ELSE "No"
    END is_album,
    w.whole_album IS NULL,
    i.invoice_id,
    i.total
FROM invoice i
LEFT JOIN whole_album_purchases w ON i.invoice_id = w.invoice_id
WHERE w.whole_album IS NULL;
 * sqlite:///chinook-test.db
Done.
Out[5]:
whole_album is_album w.whole_album IS NULL invoice_id total

This query replicates the previous query and adds a WHERE condition that is expected to select only rows where the whole_album column is NULL. Instead it produces an empty result.

(Unsuccesful) attempts to replicate problem behavior with simpler queries

In [6]:
%%sql
WITH
    tracks_purchased AS
    (
        SELECT *
        FROM invoice_line as il
        INNER JOIN track tr on il.track_id = tr.track_id
    ),
    candidates AS
    (
        SELECT invoice_id, album_id
        FROM tracks_purchased
        GROUP BY invoice_id,album_id
    ),
    single_album_purchases AS
    (
        SELECT
            invoice_id,
            album_id
        FROM tracks_purchased
        GROUP BY invoice_id
        HAVING count(DISTINCT(album_id)) = 1
    )
    
SELECT t.album_id, s.album_id
FROM track t
LEFT JOIN single_album_purchases s ON s.album_id = t.album_id
WHERE s.album_id IS NULL
LIMIT 10;
 * sqlite:///chinook-test.db
Done.
Out[6]:
album_id album_id_1
2 None
3 None
3 None
3 None
11 None
11 None
11 None
11 None
11 None
11 None

The above is a simpler case. The NULLs resulting from a LEFT JOIN are properly evaluated as NULL.

The following queries are further attempts to provoke the problem behavior. Instead they work as one would expect.

In [7]:
%%sql
SELECT
    tr.track_id, 
    il.invoice_id
FROM track tr
LEFT JOIN invoice_line il ON il.track_id = tr.track_id
WHERE il.invoice_id IS NULL
LIMIT 10;
 * sqlite:///chinook-test.db
Done.
Out[7]:
track_id invoice_id
99 None
101 None
104 None
106 None
107 None
111 None
112 None
113 None
114 None
115 None
In [8]:
%%sql
WITH 
    tracks_purchased AS
    (
        SELECT *
        FROM invoice_line as il
        INNER JOIN track tr on il.track_id = tr.track_id
    ),
    candidates AS
    (
        SELECT invoice_id, album_id
        FROM tracks_purchased
        GROUP BY invoice_id,album_id
    ),
    single_album_purchases AS
    (
        SELECT
            invoice_id,
            album_id
        FROM tracks_purchased
        GROUP BY invoice_id
        HAVING count(DISTINCT(album_id)) = 1
    ),
    whole_album_purchases AS
    (
        SELECT 
            s.invoice_id
            /*, 
            (
                (
                    SELECT track_id
                    FROM invoice_line il
                    WHERE il.invoice_id = s.invoice_id
                    EXCEPT
                    SELECT tr.track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                ) IS NULL
                AND
                (
                    SELECT track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                    EXCEPT    
                    SELECT track_id
                    FROM invoice_line il
                    WHERE il.invoice_id = s.invoice_id
                )IS NULL

            ) AS whole_album */
        FROM single_album_purchases s
    )

SELECT
/*    w.whole_album,
    w.whole_album IS NULL,
    CASE
        WHEN w.whole_album IS NULL THEN "Nope"
        WHEN w.whole_album IS TRUE then "Yes"
        ELSE "No"
    END is_album,
*/
    i.invoice_id,
    w.invoice_id,
    i.total
FROM invoice i
LEFT JOIN whole_album_purchases w ON i.invoice_id = w.invoice_id
WHERE w.invoice_id IS NULL
LIMIT 10;
 * sqlite:///chinook-test.db
Done.
Out[8]:
invoice_id invoice_id_1 total
2 None 9.9
3 None 1.98
4 None 7.92
6 None 1.98
7 None 10.89
8 None 9.9
9 None 8.91
10 None 1.98
11 None 10.89
12 None 3.96
In [9]:
%%sql
WITH 
    tracks_purchased AS
    (
        SELECT *, 
        CASE
            WHEN RANDOM() > 0 THEN NULL
            ELSE 0
        END this_is_null
        FROM invoice_line as il
        INNER JOIN track tr on il.track_id = tr.track_id
    ), 
    candidates AS
    (
        SELECT invoice_id, album_id, this_is_null
        FROM tracks_purchased
        GROUP BY invoice_id, album_id
    ),
    single_album_purchases AS
    (
        SELECT
            invoice_id,
            album_id,
            this_is_null
        FROM tracks_purchased
        GROUP BY invoice_id
        HAVING count(DISTINCT(album_id)) = 1
    ),
    whole_album_purchases AS
    (
        SELECT 
            s.invoice_id,
            CASE
                WHEN s.this_is_null IS NULL THEN NULL
                ELSE True
            END this_is_null,
            (
                (
                    SELECT track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                    EXCEPT
                    SELECT tr.track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                ) IS NULL
                AND
                (
                    SELECT track_id
                    FROM track tr
                    WHERE tr.album_id = (SELECT c.album_id FROM candidates c WHERE c.invoice_id=s.invoice_id)
                    EXCEPT    
                    SELECT tp.track_id
                    FROM tracks_purchased tp
                    WHERE tp.invoice_id = s.invoice_id
                )IS NULL

            ) AS whole_album
        FROM single_album_purchases s
    )

SELECT
    w.whole_album,
    w.whole_album IS NULL,
    w.this_is_null,
    w.this_is_null IS NULL,
    CASE
        WHEN w.whole_album IS NULL THEN "Nope"
        WHEN w.whole_album IS TRUE then "Yes"
        ELSE "No"
    END is_album,
    i.invoice_id,
    i.total
FROM invoice i
LEFT JOIN whole_album_purchases w ON i.invoice_id = w.invoice_id
LIMIT 10;
 * sqlite:///chinook-test.db
Done.
Out[9]:
whole_album w.whole_album IS NULL this_is_null w.this_is_null IS NULL is_album invoice_id total
1 0 None 1 Yes 1 15.84
None 0 None 1 Yes 2 9.9
None 0 None 1 Yes 3 1.98
None 0 None 1 Yes 4 7.92
1 0 1 0 Yes 5 16.83
None 0 None 1 Yes 6 1.98
None 0 None 1 Yes 7 10.89
None 0 None 1 Yes 8 9.9
None 0 None 1 Yes 9 8.91
None 0 None 1 Yes 10 1.98