paper-google-filesystem

http://static.googleusercontent.com/media/research.google.com/en/us/archive/gfs-sosp2003.pdf

"our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points."

"We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system."

"most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. A variety of data share these characteristics. Some may constitute large repositories that data analysis programs scan through. Some may be data streams continuously generated by running applications. Some may be archival data. Some may be intermediate results produced on one machine and processed on another, whether simultaneously or later in time. Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal."

"Fourth, co-designing the applications and the file system API benefits the overall system by increasing our flexibility. For example, we have relaxed GFS's consistency model to vastly simplify the file system without imposing an onerous burden on the applications."

"Small writes at arbitrary positions in a file are supported but do not have to be efficient."

"Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client's append. It is useful for implementing multi-way merge results and producer-consumer queues that many clients can simultaneously append to without additional locking. We have found these types of files to be invaluable in building large distributed applications."
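To make the producer-consumer idea concrete, here is a minimal single-process sketch of record-append semantics. It is not GFS code: `AppendOnlyFile`, `record_append`, and `run_producers` are hypothetical names, and a thread lock inside the file object stands in for whatever serialization the real system performs. The point it illustrates is that the file, not the caller, picks the offset, so producers need no locking of their own.

```python
import threading


class AppendOnlyFile:
    """Toy stand-in for a file that supports record append (an assumption, not GFS)."""

    def __init__(self):
        self._data = bytearray()
        # Serialization lives inside the file object, so callers take no locks.
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the record as one contiguous unit and return the chosen offset."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_producers(n_producers: int = 4, records_each: int = 50):
    """Start several producer threads that append concurrently to one file."""
    f = AppendOnlyFile()
    written = []  # (offset, record) pairs; list.append is thread-safe in CPython

    def producer(pid: int) -> None:
        for i in range(records_each):
            record = f"p{pid}-r{i};".encode()
            written.append((f.record_append(record), record))

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(n_producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return f, written
```

Every record can be read back intact at the offset returned to its producer, regardless of how the threads interleave. The sketch ignores replication and retries entirely.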

Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers.

In fact, the client typically asks for multiple chunks in the same request and the master can also include the information for chunks immediately following those requested. This extra information sidesteps several future client-master interactions at practically no extra cost.
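The effect of that prefetching on master traffic can be sketched as follows. This is an illustration under stated assumptions, not the GFS protocol: `MetadataMaster`, `CachingClient`, the `prefetch` parameter, and the chunkserver names are all hypothetical. The 64 MB chunk size is the figure given in the paper.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper


class MetadataMaster:
    """Toy master: answers chunk lookups and piggybacks the following chunks."""

    def __init__(self, prefetch: int = 2):
        self.prefetch = prefetch  # hypothetical knob: extra chunks returned per lookup
        self.requests = 0         # counts client-master interactions

    def lookup(self, filename: str, chunk_indexes: list) -> dict:
        self.requests += 1
        wanted = set(chunk_indexes)
        for idx in chunk_indexes:
            wanted.update(range(idx + 1, idx + 1 + self.prefetch))
        # Each entry maps chunk index -> (made-up chunk handle, made-up replica list).
        return {
            idx: (f"{filename}#{idx}", ["cs-a", "cs-b", "cs-c"])
            for idx in sorted(wanted)
        }


class CachingClient:
    """Toy client: caches chunk locations so later reads skip the master."""

    def __init__(self, master: MetadataMaster):
        self.master = master
        self.cache = {}  # (filename, chunk index) -> (handle, replicas)

    def locate(self, filename: str, byte_offset: int):
        idx = byte_offset // CHUNK_SIZE  # translate byte offset to chunk index
        key = (filename, idx)
        if key not in self.cache:
            for i, info in self.master.lookup(filename, [idx]).items():
                self.cache[(filename, i)] = info
        return self.cache[key]
```

With `prefetch=2`, a sequential scan over the first three chunks of a file costs one master request instead of three; the data itself would then be fetched from the returned chunkservers, which the sketch does not model.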

Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge.

However, hot spots did develop when GFS was first used by a batch-queue system: an executable was written to GFS as a single-chunk file and then started on hundreds of machines at the same time. The few chunkservers storing this executable were overloaded by hundreds of simultaneous requests. We fixed this problem by storing such executables with a higher replication factor and by making the batch-queue system stagger application start times.

Since the operation log is critical, we must store it reliably and not make changes visible to clients until metadata changes are made persistent. Otherwise, we effectively lose the whole file system or recent client operations even if the chunks themselves survive. Therefore, we replicate it on multiple remote machines and respond to a client operation only after flushing the corresponding log record to disk both locally and remotely.
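The ordering described here (log locally and remotely first, then make the change visible and reply) can be sketched as below. All names are hypothetical (`LogReplica`, `LoggingMaster`, `create_file`), in-memory lists stand in for flushed log files, and the sketch does not attempt recovery or cleanup when a replica fails partway through.

```python
class LogReplica:
    """Toy log replica; a list stands in for a log file flushed to disk."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.records = []

    def append_and_flush(self, record) -> None:
        if not self.healthy:
            raise OSError(f"{self.name} unavailable")
        self.records.append(record)


class LoggingMaster:
    """Toy master that logs a mutation everywhere before exposing it."""

    def __init__(self, local: LogReplica, remotes: list):
        self.local = local
        self.remotes = remotes
        self.namespace = {}  # metadata that clients are allowed to see

    def create_file(self, path: str) -> str:
        record = ("create", path)
        # 1. Persist the log record locally and on every remote replica.
        for replica in [self.local, *self.remotes]:
            replica.append_and_flush(record)  # raises before anything is visible
        # 2. Only now apply the change to the visible metadata.
        self.namespace[path] = []
        # 3. Only now respond to the client.
        return "ok"
```

If any replica raises, `create_file` never reaches step 2, so the client sees neither the new file nor a success reply. A replica that already accepted the record would still hold it, which a real system would have to reconcile.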
