http://queue.acm.org/detail.cfm?id=1563874
Limited primarily by the speed at which data could be fetched from disk: a little over 15 minutes for one pass through the data at a typical 90-megabyte-per-second sustained read speed,9 shamefully underutilizing the CPU the whole time.
(Subsequent tests revealed that the database was using three to four times as much storage as would be necessary to store each field as a 32-bit integer. This sort of data "inflation" is typical of a traditional RDBMS and shouldn't necessarily be seen as a problem, especially to the extent that it is part of a strategy to improve performance. After all, disk space is relatively cheap.)
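The two observations above combine into a simple back-of-envelope model. The sketch below uses hypothetical numbers (the row and field counts are assumptions, not figures from the article) to show how minimal 32-bit storage, the observed 3-4x RDBMS inflation, and the 90 MB/s sustained read rate determine the time for one full pass over the data:

```python
def seconds_per_pass(dataset_bytes, mb_per_sec=90):
    """Back-of-envelope: time for one sequential pass at a sustained read rate."""
    return dataset_bytes / (mb_per_sec * 1e6)

# Hypothetical dataset: 10 billion rows of 2 fields, each storable as a
# 32-bit (4-byte) integer. These counts are illustrative assumptions.
rows, fields = 10_000_000_000, 2
minimal_bytes = rows * fields * 4        # 80 GB with no storage overhead
inflated_bytes = minimal_bytes * 3.5     # mid-range of the observed 3-4x inflation

print(seconds_per_pass(minimal_bytes) / 60)   # ~14.8 minutes at 90 MB/s
print(seconds_per_pass(inflated_bytes) / 60)  # ~52 minutes for the inflated copy
```

The point of the arithmetic: at disk-bound scan rates, a 3-4x storage inflation translates directly into a 3-4x longer pass over the data, so "disk space is cheap" is only half the story once every query must read the inflated bytes.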
A data warehouse has been classically defined as "a copy of transaction data specifically structured for query and analysis."4
Big data changes the answers to these questions, as traditional techniques such as RDBMS-based dimensional modeling and cube-based OLAP (online analytical processing) turn out to be either too slow or too limited to support asking the really interesting questions about warehoused data.
The prevailing database model today, however, is the relational database, and this model explicitly ignores the ordering of rows in tables.
To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model for one that recognizes the concept of inherent ordering of data down to the implementation level.
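A minimal sketch of what "ordering down to the implementation level" buys (this is an illustration, not the article's implementation): if records are kept physically sorted by, say, timestamp, an order-dependent range query becomes one binary search to find the start, followed by a purely sequential scan:

```python
import bisect

# Records stored physically sorted by timestamp (the key assumption).
# Column layout and the example values are hypothetical.
timestamps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
values     = [10, 20, 30, 40, 50, 60, 70, 80, 90]

def range_sum(t_lo, t_hi):
    """Sum values with t_lo <= timestamp < t_hi: one 'seek' (binary search),
    then a strictly sequential read of the matching run of rows."""
    lo = bisect.bisect_left(timestamps, t_lo)
    hi = bisect.bisect_left(timestamps, t_hi)
    return sum(values[lo:hi])

print(range_sum(5, 17))  # 30 + 40 + 50 + 60 = 180
```

A relational engine that ignores row order may answer the same query with scattered index lookups; the order-aware layout turns it into the access pattern that disks and memory hierarchies reward, a single sequential run.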
A further point that's widely underappreciated: in modern systems, as demonstrated in the figure, random access to memory is typically slower than sequential access to disk. Note that random reads from disk are more than 150,000 times slower than sequential access; SSD improves on this ratio by less than one order of magnitude. In a very real sense, all of the modern forms of storage improve only in degree, not in their essential nature, upon that most venerable and sequential of storage media: the tape.
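The memory-level version of this effect can be demonstrated with a tiny experiment: sum the same array once in storage order and once in a shuffled order. The shuffled walk defeats hardware prefetching and cache locality, so it is typically slower even though both loops touch identical data. (In CPython, interpreter overhead dilutes the gap considerably; a C or Rust version of the same loop shows it far more starkly. The code is an illustrative micro-benchmark, and results vary by machine.)

```python
import random
import time

n = 1_000_000
data = list(range(n))
order = list(range(n))
random.shuffle(order)  # same indices, randomized visit order

t0 = time.perf_counter()
seq_total = sum(data[i] for i in range(n))   # sequential access pattern
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
rnd_total = sum(data[i] for i in order)      # random access pattern
t_rnd = time.perf_counter() - t0

assert seq_total == rnd_total  # same answer, very different access pattern
print(f"sequential: {t_seq:.3f}s  random: {t_rnd:.3f}s")
```

The same principle is what makes the disk numbers so lopsided: every layer of the hierarchy, from tape to DRAM, is at heart a sequential device dressed up with a random-access interface.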