materialized-stages
I like that the Lambda Architecture emphasizes retaining the input data unchanged. I think the discipline of modeling data transformation as a series of materialized stages from an original input has a lot of merit. This is one of the things that makes large MapReduce workflows tractable, as it enables you to debug each stage independently.
(#)
For postgres, I have found that creating temporary tables
- Timing
- lets you reason more effectively about the timing of various pieces of a large flow. It's unclear how long each of your subqueries or CTEs took. You can run explain analyze to see that, so maybe not a strong point.
- Debuggability
- lets you easily debug each step after-the-fact (as Jay mentioned). This is more important the longer it takes to run... if it takes an hour to get to a certain stage in the processing, you don't want to have to redo that hour of work each time you try something.
- (copy 1) not doing the tables as temporary is good in case the session fails... then you can come back in and find the previous tables.
- Recovery
- lets you recover a process that doesn't work fully. Decide how far back you want to go and re-materialize
- (copy 2) not doing the tables as temporary is good in case the session fails... then you can come back in and find the previous tables.