paper-dremel

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf

paper-dremel#nested-data-model1A nested data model underlies most of structured data processing at Google paper-dremel#nested-data-model1
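To make the nested model concrete, here is a minimal Python sketch of a record that mixes required, optional, and repeated fields. The `Document`/`Links`/`Name`/`Language` types and the sample record are loosely modeled on the paper's running example. They are illustrative assumptions, not Dremel's schema language or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical nested schema: "required" = plain field, "optional" = Optional[...],
# "repeated" = List[...]. Nesting is expressed by embedding other record types.

@dataclass
class Language:
    code: str                      # required
    country: Optional[str] = None  # optional

@dataclass
class Name:
    language: List[Language] = field(default_factory=list)  # repeated
    url: Optional[str] = None                               # optional

@dataclass
class Links:
    backward: List[int] = field(default_factory=list)  # repeated
    forward: List[int] = field(default_factory=list)   # repeated

@dataclass
class Document:
    doc_id: int                                     # required
    links: Optional[Links] = None                   # optional
    name: List[Name] = field(default_factory=list)  # repeated

r1 = Document(
    doc_id=10,
    links=Links(forward=[20, 40, 60]),
    name=[
        Name(language=[Language("en-us", "us"), Language("en")], url="http://A"),
        Name(url="http://B"),
        Name(language=[Language("en-gb", "gb")]),
    ],
)

# Walk the nested structure: every Name.Language.Code value in record order.
codes = [lang.code for n in r1.name for lang in n.language]
print(codes)  # ['en-us', 'en', 'en-gb']
```

Note how a single record can hold zero, one, or many values of the same field at different depths; that variability is what a columnar layout for nested data has to capture.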

paper-dremel#in-situ1Unlike traditional databases, it is capable of operating on in situ nested data. In situ refers to the ability to access data "in place", e.g., in a distributed file system paper-dremel#in-situ1

paper-dremel#not-map-reduce-replacement1Dremel is not intended as a replacement for MR and is often used in conjunction with it to analyze outputs of MR pipelines or rapidly prototype larger computations. paper-dremel#not-map-reduce-replacement1

paper-dremel#serving-tree1Its architecture borrows the concept of a serving tree used in distributed search engines [11]. Just like a web search request, a query gets pushed down the tree and is rewritten at each step. The result of the query is assembled by aggregating the replies received from lower levels of the tree paper-dremel#serving-tree1
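A minimal sketch of the serving-tree idea, assuming a toy tree whose leaves hold tablets of numbers. The root rewrites `AVG(x)` into partial `SUM(x)` and `COUNT(x)` aggregates, intermediate nodes push the query down and merge their children's partials, and leaves scan local data. The `Node`/`Partial` classes and the AVG rewrite are hypothetical choices for illustration, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Partial:
    """Partial aggregate that can be merged exactly at any level of the tree."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)

@dataclass
class Node:
    tablet: List[float] = field(default_factory=list)  # data held by a leaf server
    children: List["Node"] = field(default_factory=list)

    def execute(self) -> Partial:
        if not self.children:
            # Leaf server: scan the local tablet.
            return Partial(len(self.tablet), float(sum(self.tablet)))
        # Intermediate server: forward the rewritten query and merge the replies.
        result = Partial()
        for child in self.children:
            result = result.merge(child.execute())
        return result

def avg(root: Node) -> float:
    # Root server: AVG(x) is answered as SUM(x) / COUNT(x) over the merged partials.
    p = root.execute()
    return p.total / p.count

root = Node(children=[
    Node(children=[Node(tablet=[1, 2, 3]), Node(tablet=[4, 5])]),
    Node(children=[Node(tablet=[6])]),
])
print(avg(root))  # 3.5
```

Shipping (sum, count) pairs up the tree instead of per-subtree averages is what keeps the merge exact: averaging averages would weight a one-row tablet the same as a million-row one.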

paper-dremel#execution-trees1We show how execution trees used in web search systems can be applied to database processing, and explain their benefits for answering aggregation queries efficiently paper-dremel#execution-trees1

paper-dremel#replication-to-achieve-fast-response-times1GFS uses replication to preserve the data despite faulty hardware and achieve fast response times in the presence of stragglers paper-dremel#replication-to-achieve-fast-response-times1
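One common way to turn replication into lower tail latency is to send the same read to several replicas and use whichever answers first. The sketch below shows that pattern with Python's `concurrent.futures`; the replica functions, their names, and their delays are simulated assumptions, and this is a generic illustration rather than a description of GFS internals.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, List

def read_first(replicas: List[Callable[[], bytes]]) -> bytes:
    """Issue the same read to every replica and return the first reply."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(read) for read in replicas]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        # If the fastest replica failed, .result() re-raises its exception.
        return next(iter(done)).result()
    finally:
        # Don't block on stragglers (cancel_futures requires Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)

def make_replica(name: str, delay: float) -> Callable[[], bytes]:
    """Hypothetical replica that answers with its name after `delay` seconds."""
    def read() -> bytes:
        time.sleep(delay)
        return name.encode()
    return read

replicas = [make_replica("straggler", 0.5), make_replica("fast", 0.01)]
print(read_first(replicas))  # b'fast'
```

The trade-off is extra load: every hedged read costs up to one request per replica, so real systems often delay the duplicate request until the first one looks slow.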

paper-dremel#high-performance-storage-layer-for-in-situ1A high performance storage layer is critical for in situ data management paper-dremel#high-performance-storage-layer-for-in-situ1

paper-dremel#common-storage-layer1 2The above scenario requires interoperation between the query processor and other data management tools. The first ingredient for that is a common storage layer. paper-dremel#common-storage-layer1 2

paper-dremel#shared-storage-format1The second ingredient for building interoperable data management components is a shared storage format. Columnar storage proved successful for flat relational data but making it work for Google required adapting it to a nested data model paper-dremel#shared-storage-format1

paper-dremel#format-to-improve-efficiency1Our goal is to store all values of a given field consecutively to improve retrieval efficiency. In this section, we address the following challenges: lossless representation of record structure in a columnar format (Section 4.1), fast encoding (Section 4.2), and efficient record assembly (Section 4.3) paper-dremel#format-to-improve-efficiency1
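Dremel makes the columnar format lossless by storing, next to each value, a repetition level (at which repeated field in the path the value repeated) and a definition level (how many optional or repeated fields in the path are actually present). The sketch below is a simplified recursive striping routine under stated assumptions: the dict-based schema encoding, the `stripe` function, and the sample record (modeled on the paper's Document example) are illustrative, not the paper's algorithm or any library's API.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Hypothetical schema encoding: field name -> (mode, sub-schema); sub-schema is None for a leaf.
Schema = Dict[str, Tuple[str, Optional["Schema"]]]
Column = List[Tuple[Any, int, int]]  # (value, repetition level, definition level)

def leaf_paths(schema: Optional[Schema], prefix: str) -> Iterator[str]:
    """Yield the dotted path of every leaf column under `prefix`."""
    if schema is None:
        yield prefix
        return
    for name, (_, sub) in schema.items():
        yield from leaf_paths(sub, f"{prefix}.{name}" if prefix else name)

def stripe(schema: Schema, record: Dict[str, Any]) -> Dict[str, Column]:
    columns: Dict[str, Column] = {p: [] for p in leaf_paths(schema, "")}
    _write(schema, record, "", 0, 0, 0, columns)
    return columns

def _write(schema: Schema, msg: Dict[str, Any], prefix: str, r: int, d: int,
           rep_depth: int, columns: Dict[str, Column]) -> None:
    for name, (mode, sub) in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        raw = msg.get(name)
        items = (raw or []) if mode == "repeated" else ([] if raw is None else [raw])
        if not items:
            # Field absent: a NULL in every column below it, recording how much of the path was defined.
            for leaf in leaf_paths(sub, path):
                columns[leaf].append((None, r, d))
            continue
        depth = rep_depth + (1 if mode == "repeated" else 0)
        for i, item in enumerate(items):
            # First occurrence inherits the caller's level; later ones repeat at this field's depth.
            child_r = r if i == 0 else depth
            child_d = d + (0 if mode == "required" else 1)
            if sub is None:
                columns[path].append((item, child_r, child_d))
            else:
                _write(sub, item, path, child_r, child_d, depth, columns)

DOCUMENT: Schema = {
    "DocId": ("required", None),
    "Links": ("optional", {"Backward": ("repeated", None), "Forward": ("repeated", None)}),
    "Name": ("repeated", {
        "Language": ("repeated", {"Code": ("required", None), "Country": ("optional", None)}),
        "Url": ("optional", None),
    }),
}

r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}

cols = stripe(DOCUMENT, r1)
print(cols["Name.Language.Code"])  # [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
```

Each column can then be stored and scanned on its own; the (r, d) pairs are what let a reader reassemble the original nesting, including the fact that the second `Name` had no `Language` at all.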

paper-dremel#sparseMany datasets used at Google are sparse; it is not uncommon to have a schema with thousands of fields, only a hundred of which are used in a given record. paper-dremel#sparse

paper-dremel#approximate-results-with-one-pass-algos1 2Some Dremel queries, such as top-k and count-distinct, return approximate results using known one-pass algorithms paper-dremel#approximate-results-with-one-pass-algos1 2
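This highlight does not say which one-pass algorithms are used. As one well-known example for approximate top-k (not necessarily the one Dremel uses), the Misra-Gries summary below makes a single pass with at most k-1 counters: every item occurring more than n/k times is guaranteed to survive, and each stored count underestimates the true count by at most n/k. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Single pass over the stream with at most k-1 counters (k >= 2)."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all of them and evict any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

def approx_top(stream: Iterable[Hashable], k: int, m: int) -> List[Tuple[Hashable, int]]:
    """Up to m candidates with their (under)estimated counts, largest first."""
    summary = misra_gries(stream, k)
    return sorted(summary.items(), key=lambda kv: kv[1], reverse=True)[:m]

stream = list("aaaaabbbcd")
print(approx_top(stream, k=3, m=2))    # [('a', 3), ('b', 1)]
print(Counter(stream).most_common(2))  # [('a', 5), ('b', 3)]
```

The ranking of the heavy hitters survives, but the counts are lower bounds (here off by 2, within the n/k = 10/3 bound); exact counts for the surviving candidates would need a second pass.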

paper-dremel#mr-benefit-from-columnar-storage1MR can benefit from columnar storage just like a DBMS. paper-dremel#mr-benefit-from-columnar-storage1

paper-dremel#structural-normalization1Our columnar representation of nested data builds on ideas that date back several decades: separation of structure from content and transposed representation. paper-dremel#structural-normalization1

Referring Pages

data-architecture-glossary paper-google-bigquery