optimal-query-format

page 183: "A vital advantage of the Lambda Architecture is that it allows you to tailor the serving layer for the queries it serves to optimize efficiency."

File dumps, typically in a format like CSV, are regularly uploaded to EDH, where they are then unpacked, transformed into optimal query format, and tucked away in HDFS, where various EDH components can use them.
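A minimal sketch of what "transformed into optimal query format" can mean, assuming a hypothetical CSV dump with columns `user_id,country,amount` and queries that filter by one known field. Here the "optimal" format is just an in-memory index keyed by that field, with numeric columns parsed once at load time; a real EDH pipeline would more likely write a columnar file format into HDFS, which this sketch does not attempt.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV dump; column names are assumptions for illustration.
RAW_DUMP = """user_id,country,amount
1,DE,10.5
2,US,3.0
3,DE,7.25
"""

def to_query_format(csv_text, key, numeric=("amount",)):
    """Parse a CSV dump and regroup rows by `key`, so a lookup on that
    field is one dict access instead of a scan over every row."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in numeric:
            row[col] = float(row[col])  # convert once, at load time
        index[row[key]].append(row)
    return dict(index)

by_country = to_query_format(RAW_DUMP, "country")
print(sum(r["amount"] for r in by_country["DE"]))  # 17.75
```

The cost of parsing and grouping is paid once per dump, which is the point of the transformation: every later query reuses it.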

Location 13,634: Viewed like this, the role of caches, indexes, and materialized views is simple: they shift the boundary between the read path and the write path.
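To make the read/write boundary concrete, here is a small sketch answering the same hypothetical query ("how many events has this user produced?") two ways. The class names and the query are invented for illustration; the only point is where the work happens.

```python
from collections import Counter

class ReadHeavy:
    """All work on the read path: each read scans the raw event log."""
    def __init__(self):
        self.events = []

    def write(self, user):
        self.events.append(user)  # cheap write

    def count(self, user):
        return sum(1 for u in self.events if u == user)  # O(n) read

class Materialized:
    """Work moved to the write path: each write also updates a precomputed view."""
    def __init__(self):
        self.events = []
        self.view = Counter()

    def write(self, user):
        self.events.append(user)
        self.view[user] += 1  # extra work at write time

    def count(self, user):
        return self.view[user]  # O(1) read; Counter returns 0 for unseen keys

read_heavy, materialized = ReadHeavy(), Materialized()
for user in ["alice", "bob", "alice"]:
    read_heavy.write(user)
    materialized.write(user)
print(read_heavy.count("alice"), materialized.count("alice"))  # 2 2
```

Both give the same answers; the materialized version has moved the boundary so that writes do more and reads do less.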

Location 13,672: In terms of our model of write path and read path, actively pushing state changes all the way to client devices means extending the write path all the way to the end user.
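An in-process sketch of "extending the write path to the end user": the hypothetical `Server` pushes every write to subscribed clients, so a client's read is served from its own local copy. A real system would push over a network channel such as WebSockets; that transport is deliberately left out here.

```python
class Server:
    def __init__(self):
        self.state = {}
        self.subscribers = []

    def subscribe(self, client):
        self.subscribers.append(client)
        client.apply_snapshot(dict(self.state))  # initial state for the new client

    def write(self, key, value):
        self.state[key] = value
        for client in self.subscribers:  # the write path continues to each client
            client.on_change(key, value)

class Client:
    def __init__(self):
        self.local = {}

    def apply_snapshot(self, snapshot):
        self.local = snapshot

    def on_change(self, key, value):
        self.local[key] = value

    def read(self, key):
        return self.local.get(key)  # read path is purely local

server = Server()
phone = Client()
server.write("status", "draft")
server.subscribe(phone)
server.write("status", "published")
print(phone.read("status"))  # published
```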

"Sometimes it's helpful to wrap a view around a table. The view definition can include derived data calculations. Then applications and interfaces can access views for a consistent implementation of derived data."
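A minimal sketch of that pattern using SQLite through Python's standard `sqlite3` module. The `order_line` table and the derived `line_total` column are hypothetical; the view holds the single definition of the calculation, so every reader gets the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line (id INTEGER PRIMARY KEY, qty INTEGER, unit_price REAL);

-- The derived value is defined once, in the view, not in each application.
CREATE VIEW order_line_v AS
    SELECT id, qty, unit_price, qty * unit_price AS line_total
    FROM order_line;
""")
conn.executemany(
    "INSERT INTO order_line (qty, unit_price) VALUES (?, ?)",
    [(2, 4.5), (3, 1.0)],
)
rows = conn.execute("SELECT id, line_total FROM order_line_v ORDER BY id").fetchall()
print(rows)  # [(1, 9.0), (2, 3.0)]
```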

page 185: Since there are no random writes in the serving layer, it can be optimized for the read path to get high performance.
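A sketch of why the absence of random writes helps. Assume the batch layer hands over a complete set of key/value pairs. The serving view can then be sorted once into a contiguous, immutable layout and queried with binary search. `BatchView` is a hypothetical name, and keys are assumed unique. Updating means building a new view and swapping it in, never editing in place.

```python
import bisect

class BatchView:
    """Immutable, read-optimized view built in one batch.
    No random writes: to update, build a new BatchView and swap it in."""

    def __init__(self, pairs):
        items = sorted(pairs)  # sort once, at build time
        self._keys = tuple(k for k, _ in items)
        self._values = tuple(v for _, v in items)

    def get(self, key, default=None):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def range(self, lo, hi):
        """Values for keys with lo <= key < hi, read as one contiguous slice."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return list(self._values[i:j])

view = BatchView([("b", 2), ("a", 1), ("d", 4)])
print(view.get("a"), view.get("c"), view.range("a", "c"))  # 1 None [1, 2]
```

Because nothing is ever inserted in the middle, the layout needs no free space, locking, or rebalancing, which is where the read-side efficiency comes from.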

Where decompression is I/O- or network-bound, it makes sense to keep the compressed data as compact as possible. That said, there are cases where decompression is compute-bound, and compression schemes like Snappy play a useful role in lowering the overhead.
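A worked cost model for that trade-off, as a sketch. Read time is taken as transfer time for the compressed bytes plus decompression time for the raw bytes, with no overlap between the two. The two codec profiles are hypothetical. "compact" stands in for a high-ratio codec and "fast" for a Snappy-like one. Their numbers are invented to show the crossover and are not benchmarks of any real codec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float            # raw size / compressed size
    decompress_mb_s: float  # decompression throughput, MB of output per second

def read_seconds(raw_mb, codec, bandwidth_mb_s):
    """Fetch the compressed bytes, then decompress them (no overlap assumed)."""
    transfer = (raw_mb / codec.ratio) / bandwidth_mb_s
    decompress = raw_mb / codec.decompress_mb_s
    return transfer + decompress

# Illustrative numbers only, not measurements.
COMPACT = Codec("compact", ratio=4.0, decompress_mb_s=300.0)
FAST = Codec("fast", ratio=2.0, decompress_mb_s=1500.0)

def best_codec(raw_mb, bandwidth_mb_s, codecs=(COMPACT, FAST)):
    return min(codecs, key=lambda c: read_seconds(raw_mb, c, bandwidth_mb_s))

print(best_codec(1000, bandwidth_mb_s=50).name)    # compact
print(best_codec(1000, bandwidth_mb_s=2000).name)  # fast
```

At 50 MB/s the compact profile takes about 5 + 3.33 = 8.33 s against 10 + 0.67 = 10.67 s, so the link dominates and compactness wins. At 2000 MB/s it is about 3.46 s against 0.92 s, so decompression dominates and the cheaper codec wins.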
