This was posted as an answer to the following question:
I also wanted to write this slightly more detailed explanation.
Say we have a table that stores start- and end-dates for some entities. That is to say: There can be many start- and end-date-pairs for each entity.
For the purposes of this exercise we would ultimately like to condense overlapping start- and end-date-pairs into one row for each set of overlapping date-ranges; non-overlapping date-ranges should not be condensed. In other words, the final goal is to have one and only one row for each set of however many overlapping start- and end-dates exist, for each entity in the table. You can probably think of several applications for this kind of query.
For a solution to this problem we turn to our good friend: so-called "recursive SQL"; which is actually more of an iterative process.
WITH RECURSIVE t1_rec ( id, "begin", "end", n ) AS ( SELECT id, "begin", "end", n FROM ( SELECT id, "begin", "end", CASE WHEN LEAD("begin") OVER ( PARTITION BY id ORDER BY "begin") <= ("end" + interval '2' day) THEN 1 ELSE 0 END AS cl, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY "begin") AS n FROM mytable ) s WHERE s.cl = 1 UNION ALL SELECT p1.id, p1."begin", p1."end", a.n FROM t1_rec a JOIN mytable p1 ON p1.id = a.id AND p1."begin" > a."begin" AND (a."begin", a."end" + interval '2' day) OVERLAPS (p1."begin", p1."end") ) SELECT t1.id, min(t1."begin"), max(t1."end") FROM t1_rec t1 LEFT JOIN t1_rec t2 ON t1.id = t2.id AND t2."end" = t1."end" AND t2.n < t1.n WHERE t2.n IS NULL GROUP BY t1.id, t1.n ORDER BY t1.id, t1.n;
To begin the query we start at the end: We specify what fields will be contained in our final resultset:
Now for the real beginning: We start with the those
begin dates in the table that have a subsequent (temporally)
begin date. The
LEAD function helps with this task. If you want to see more detail about how this part works you can add this field into the
SELECT clause of the (sub)query with the
-- for debugging LEAD(beg) OVER ( PARTITION BY entity_id ORDER BY beg),
Next we build up our resultset row by row using
UNION ALL. You can think of this step like a for loop iterating over your table. The second and third join-conditions are specific to this particular question, and ensure that we append 1) subsequent ranges that 2) have a gap of up to one day between one entity's end-date and the next entity's start- date.
The final step is to reduce the resultset to only the rows that represent entire overlapping date-ranges. What happens in the iterative process is that any overlapping ranges beyond an initial one generate a subset of overlapping ranges. This might be useful in some cases, but not here; so we remove them using an aggregate operation. If self-exclusion joins give you the heebie-jeebies use this instead:
-- correlated subquery alternative to self-exclusion join NOT EXISTS ( SELECT 1 FROM t1_rec t2 WHERE t2.entity_id = t1.entity_id AND t2.fin = t1.fin AND t2.n < t1.n)
Comments are now closed for this post.