Tony Hoare invented the null reference in 1965, and he considers it his "billion-dollar mistake" for the amount of trouble it has caused. Even today, SQL's null
value is the cause of several common mistakes.
Let’s go over the most egregious.
Equals Null
These two queries return the exact same result on a users
table with many rows:
select * from users where deleted_at = null;
-- result: 0 rows
select * from users where deleted_at != null;
-- result: 0 rows
How can that be? It's because null
represents an "unknown" type. This means it doesn't make sense to compare null
to anything else with normal conditional operators. Null
isn't even equal to itself:
select null > 0;
-- result: null
select null < 0;
-- result: null
select null = 0;
-- result: null
select null = null;
-- result: null
The right way to compare values with null
is with the is
, and is not
operators:
select * from users
where deleted_at is null;
-- result: all non-deleted users
select * from users
where deleted_at is not null;
-- result: all deleted users
If you want to check if two columns are different, you can use is distinct from
:
select * from users
where has_address is distinct from has_photo
-- result: users with an address or
-- photo, but not both
Not in Null
One handy way to filter rows is with a subselect. For example, if you wanted the users
who did not have any packages
, you could write a query like this:
select * from users
where id not in (select user_id from packages)
But if one of the rows in packages
has a null user_id
, this query will return no results! To understand why this happens we need to factor the query like the SQL compiler does. Here's a simpler example:
select * from users
where id not in (1, 2, null)
Which translates to:
select * from users
where id != 1 and id != 2 and id != null
As we now know, id != null
is an unknown value, null
. Using and
on any value withnull
becomes null
, so all of the other and
conditions fall away. No rows match the resulting query since null
is not equal to true
.
If the condition is inverted, the query works fine. This time we'll look for users withpackages.
select * from users
where id in (select user_id from packages)
Which we can simplify for the example:
select * from users
where id in (1, 2, null)
This query translates to:
select * from users
where id = 1 or id = 2 or id = null
Since the where
clause is a list of or
conditions, it doesn't matter that one of them is null
. That condition is simply ignored because non-true values do not change the evaluation of the rest of the clause.
Sorting Nulls
When it comes to sorting, nulls are considered the largest possible value. This can lead to some frustrating queries when trying to sort values descending, since all the nulls will be on top.
This query is meant to show the users ranked by their points, but it's putting users without any points first!
select name, points
from users
order by 2 desc;
-- null points sort above
-- any number of points!
There are two ways to deal with this. The easiest way is to get rid of those nulls in the output or the comparison using coalesce
:
-- treat nulls as 0 in output
select name, coalesce(points, 0)
from users
order by 2 desc;
-- keep nulls in output, but sort as 0
select name, points
from users
order by coalesce(points, 0) desc;
And if your database supports it, you can instead tell it where to put nulls when sorting with nulls first
or nulls last
:
select name, coalesce(points, 0)
from users
order by 2 desc nulls last;
Of course, nulls
can also be used to prevent errors. One great use of nulls
is in dealing with divide by zero errors.
Divide by Zero
Divide by zero errors are especially painful. Queries that ran fine yesterday all of a sudden are failing with divide by zero errors. One common solution is to check if the denominator is 0 before dividing with a case
statement:
select case when num_users = 0 then 0
else total_sales/num_users end;
The case
statement approach is verbose and duplicates the denominator. That's OK if the denominator is simple, but if it's an expression, you're likely to get more bugs if you change the query later.
Here we can use null
to our advantage. Use nullif
on the denominator to make the denominator null
instead of zero. Rather than failing, the query will return null
on days where num_users = 0
.
select total_sales/nullif(num_users, 0);
If you prefer the result to be 0 or anything else instead of null
, use coalesce on the previous query:
select coalesce(total_sales/nullif(num_users, 0), 0);
-- nulls results become 0
Conclusion
Tony Hoare may regret his mistake, but at least it’s easy to work around the issues that null
presents. Now go forth with your new knowledge and keep null
from nullifying your future query results!
No comments:
Post a Comment