https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf
paper-dremel#nested-data-model1A nested data model underlies most of structured data processing at Google paper-dremel#nested-data-model1
paper-dremel#in-situ1Unlike traditional databases, it is capable of operating on in situ nested data. In situ refers to the ability to access data "in place", e.g., in a distributed file system paper-dremel#in-situ1
paper-dremel#not-map-reduce-replacement1Dremel is not intended as a replacement for MR and is often used in conjunction with it to analyze outputs of MR pipelines or rapidly prototype larger computations. paper-dremel#not-map-reduce-replacement1
paper-dremel#serving-tree1its architecture borrows the concept of a serving tree used in distributed search engines [11]. Just like a web search request, a query gets pushed down the tree and is rewritten at each step. The result of the query is assembled by aggregating the replies received from lower levels of the tree paper-dremel#serving-tree1
paper-dremel#execution-trees1We show how execution trees used in web search systems can be applied to database processing, and explain their benefits for answering aggregation queries efficiently paper-dremel#execution-trees1
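The serving-tree idea in the two notes above can be sketched as follows. This is a minimal illustration, not Dremel's implementation: the `Leaf` and `Inner` classes, the in-memory rows, and the single hard-coded query shape (a count grouped by one key) are all assumptions made for the example, and query rewriting at each level is reduced to passing the key down the tree.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    """Hypothetical leaf server holding one partition of the rows."""
    rows: List[dict]

    def execute(self, key: str) -> Counter:
        # Partial aggregate over local data: COUNT(*) GROUP BY key.
        return Counter(row[key] for row in self.rows)


@dataclass
class Inner:
    """Hypothetical intermediate (or root) server with child servers."""
    children: list

    def execute(self, key: str) -> Counter:
        # Push the query down, then merge the partial counts from below.
        total = Counter()
        for child in self.children:
            total.update(child.execute(key))
        return total


tree = Inner([
    Inner([
        Leaf([{"country": "us"}, {"country": "de"}]),
        Leaf([{"country": "us"}]),
    ]),
    Inner([
        Leaf([{"country": "de"}, {"country": "fr"}]),
    ]),
])

result = tree.execute("country")
print(dict(result))
```

Counts merge by simple addition, which is why this kind of aggregation fits a tree: each level only needs the partial results from the level below, never the raw rows.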
paper-dremel#replication-to-achieve-fast-response-times1GFS uses replication to preserve the data despite faulty hardware and achieve fast response times in presence of stragglers paper-dremel#replication-to-achieve-fast-response-times1
paper-dremel#high-performance-storage-layer-for-in-situ1A high performance storage layer is critical for in situ data management paper-dremel#high-performance-storage-layer-for-in-situ1
paper-dremel#common-storage-layer1 2The above scenario requires interoperation between the query processor and other data management tools. The first ingredient for that is a common storage layer. paper-dremel#common-storage-layer1 2
paper-dremel#shared-storage-format1The second ingredient for building interoperable data management components is a shared storage format. Columnar storage proved successful for flat relational data but making it work for Google required adapting it to a nested data model paper-dremel#shared-storage-format1
paper-dremel#format-to-improve-efficiency1our goal is to store all values of a given field consecutively to improve retrieval efficiency. In this section, we address the following challenges: lossless representation of record structure in a columnar format (Section 4.1), fast encoding (Section 4.2), and efficient record assembly (Section 4.3) paper-dremel#format-to-improve-efficiency1
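To make "store all values of a given field consecutively" concrete, here is a deliberately simplified sketch for flat records only. It is not the nested encoding the paper develops in Section 4; instead it tags each value with its record index so that missing fields cost nothing and the original records can be rebuilt. The function names `to_columns` and `to_records` are hypothetical.

```python
from collections import defaultdict


def to_columns(records):
    """Stripe flat records into per-field columns of (record_index, value)."""
    columns = defaultdict(list)
    for i, rec in enumerate(records):
        for name, value in rec.items():
            columns[name].append((i, value))
    return dict(columns)


def to_records(columns, n):
    """Reassemble n records from the columns produced by to_columns."""
    records = [{} for _ in range(n)]
    for name, entries in columns.items():
        for i, value in entries:
            records[i][name] = value
    return records


records = [
    {"url": "a", "lang": "en"},
    {"url": "b"},
    {"lang": "de", "size": 3},
]
columns = to_columns(records)
print(columns["lang"])  # only the values of one field are touched
```

A query that reads only `lang` scans one short list rather than every record, which is the retrieval benefit the note describes. Handling repeated and optional nested fields losslessly needs more structural information than a record index, and that is the problem the paper's columnar format addresses.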
paper-dremel#sparseMany datasets used at Google are sparse; it is not uncommon to have a schema with thousands of fields, only a hundred of which are used in a given record. paper-dremel#sparse
paper-dremel#approximate-results-with-one-pass-algos1 2Some Dremel queries, such as top-k and count-distinct, return approximate results using known one-pass algorithms paper-dremel#approximate-results-with-one-pass-algos1 2
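As an illustration of what a one-pass approximate count-distinct can look like, the sketch below uses the k-minimum-values (KMV) estimator. The note does not say which algorithms Dremel uses, so KMV is an assumption chosen for brevity, and the function name `approx_count_distinct` and the default `k=256` are hypothetical.

```python
import hashlib
import heapq


def approx_count_distinct(items, k=256):
    """Estimate the number of distinct items in one pass using KMV.

    Keeps only the k smallest hash values seen, so memory is O(k)
    regardless of input size.
    """
    heap = []        # max-heap (via negation) of the k smallest hashes
    members = set()  # the same hashes, for duplicate detection
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the current largest of the k smallest hashes.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    # The k-th smallest of n uniform values sits near k / n.
    return int((k - 1) / -heap[0])


print(approx_count_distinct(i % 10000 for i in range(30000)))
```

With `k=256` the relative standard error is roughly 6%, so larger `k` trades memory for accuracy. Two such sketches can also be merged by taking the k smallest hashes of their union, a property that suits tree-structured aggregation.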
paper-dremel#mr-benefit-from-columnar-storage1MR can benefit from columnar storage just like a DBMS. paper-dremel#mr-benefit-from-columnar-storage1
paper-dremel#structural-normalization1Our columnar representation of nested data builds on ideas that date back several decades: separation of structure from content and transposed representation. paper-dremel#structural-normalization1