SQL

Set operators – Union, intersect and minus

You’ve probably combined two queries with “union”, but have you looked at the different options for how to combine queries? The set operators are more combining result sets than they are combining queries even if we often think about it is combined queries.

Thy need to return the same number of columns of the same datatypes. The lengths of the columns can differ but not the datatypes.

Let’s start with a couple of tables and data so you have something to test with.

create table t1 (ix number, name varchar2(25));
create table t2 (ix number, name varchar2(25));

insert into t1 values(1, 'Charles');
insert into t1 values(2, 'John');
insert into t1 values(3, 'Steve');

insert into t2 values(1, 'Ben');
insert into t2 values(2, 'John');
insert into t2 values(3, 'Ari');

Let’s start with the version of a union that I think of as the default even if it has more keywords. “Union all” just concatenates two result sets making it the most basic form of a union.

select ix, name from t1
union all
select ix, name from t2
IX NAME
1  Charles
2  John
3  Steve
1  Ben
2  John
3  Ari

As you can see it is just the output from the first query followed by the output of the last. No sorting, no eliminating duplicate rows (#2 is a dup).

The next variation of union is of course doing exactly that, lets remove the extra #2.

select ix, name from t1
union
select ix, name from t2
IX NAME
1 Ben
1 Charles
2 John
3 Ari
3 Steve

Now it is sorted left to right and the extra #2 that was identical to the first is removed. The order of the rows is undefined even if the data tends to be sorted left to right.

If we were to do a select on ly on column IX, we’d end up with just three rows as the values in that column is repeating. Both tables has the value 1, 2, and 3.

Those two variants are pretty common in everyones SQL. Let’s look at the next that many people seem to forget about.

select ix, name from t1
minus
select ix, name from t2
IX NAME
1  Charles
3  Steve

As you can see it is the rows from the first query where rows that are identical in the second query are removed. This could of course be written with “where not exists” or “where not in”, but when the minus set operator works it is usually easier to read and describes the intent better.

On to the variant that I think few even know exists, intersect is similar to a join. In fact it can be written as one.

select ix, name from t1
intersect
select ix, name from t2
IX NAME
2  John

This combines both result sets and keeps only those where there is an identical row in both. It is pretty much the opposite of the minus operator. Minus removed the #2 while intersect returns only #2

It could be written as a join where you join on every column returned in the select. For a small query like the example here it is quick to write the join. But consider a situation with 25 columns, writing a join with 25 columns in the join condition is not that fun to write.

Now let’s look at one way to control the order in which the set operator is evaluated. This query will of course return three rows all with a 1.

select 1 from dual
union all
select 1 from dual
union all 
select 1 from dual

To make it a little bit more interesting, let us change a union all to union. Try to figure out the number of rows returned before reading on.

select 1 from dual
union all
select 1 from dual
union 
select 1 from dual

The result is 1 row. You may have thought there would be 2, one from the first that is not having duplicates removed and then one for the last two. But that is not how it is done, it does the onion all on the first two and that result is then used with the union to the last so identical rows are removed and thus only one row with a 1 is returned.

Next question. What happens if the query starts with a union and ends with a union all, do you get the same result? No you do not and you’re right I would not have asked if you did. In that case the first two queries has a row removed due to union removing duplicates. That results is then used in a union all with the last query resulting in two rows each containing a 1.

How about controlling the order in which queries are evaluated and set operations performed? Take a look at this, if you have not seen it before then the parenthesis are probably odd looking.

select 1 from dual
union
(
select 1 from dual
union all 
select 1 from dual
)

Without the parens it is the same query as variant discussed above that began with a union and ended with a union all. Thus with the parens removed it returns two rows – the union of the first two queries are concatenated with the result of the last – but the parens changes the order of evaluation just like in a where-statement. Here the selects in the parenthesis is evaluated. That is first the result of the second and third query is concatenated and that is the combined with the first and duplicates are removed. As a result the whole query returns just one row.

That is it. There is a lot of power in the set operators and you can do much more with them than first meets the eye.

Leave a Comment

Your email address will not be published. Required fields are marked *

*