Saturday, December 28, 2013

Column oriented DBMS (Columnar storage)

http://en.wikipedia.org/wiki/Column-oriented_DBMS
https://blog.twitter.com/2013/dremel-made-simple-with-parquet
https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

There are several advantages to columnar formats:

  • Organizing by column allows for better compression, as the data within a column is more homogeneous. The space savings are very noticeable at the scale of a Hadoop cluster.
  • I/O will be reduced as we can efficiently scan only a subset of the columns while reading the data. Better compression also reduces the bandwidth required to read the input.
  • As each column stores data of a single type, we can use encodings better suited to the pipelines of modern processors, making instruction branching more predictable.
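To make the first two points concrete, here is a minimal Python sketch. It assumes a tiny hypothetical table (the `rows` data and every helper name are made up for illustration) and uses plain run-length encoding as a toy stand-in for the encodings a real columnar format would use. Transposing the rows into columns places similar values next to each other, so run-length encoding produces fewer runs. A query that needs only one column also never touches the others.

```python
from itertools import groupby

# Hypothetical sample table stored row by row (row-oriented layout).
rows = [
    {"user_id": 1, "country": "US", "clicks": 3},
    {"user_id": 2, "country": "US", "clicks": 0},
    {"user_id": 3, "country": "US", "clicks": 0},
    {"user_id": 4, "country": "FR", "clicks": 7},
    {"user_id": 5, "country": "FR", "clicks": 0},
]

def to_columns(rows):
    """Transpose row records into one list per column (columnar layout)."""
    names = list(rows[0].keys())
    return {name: [row[name] for row in rows] for name in names}

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]

def project(columns, wanted):
    """Return only the requested columns; the rest are never read."""
    return {name: columns[name] for name in wanted}

columns = to_columns(rows)

# Row-major serialization interleaves unrelated values, so no runs form.
row_major = [value for row in rows for value in row.values()]
row_runs = len(run_length_encode(row_major))

# Column-major serialization groups like values, so runs do form.
column_runs = sum(len(run_length_encode(col)) for col in columns.values())
print(row_runs, column_runs)  # 15 11

encoded_country = run_length_encode(columns["country"])
print(encoded_country)  # [('US', 3), ('FR', 2)]

# A query that only needs 'clicks' scans one column instead of whole rows.
subset = project(columns, ["clicks"])
print(sum(subset["clicks"]))  # 10
```

On this toy data the gain is small (15 runs versus 11). With millions of rows, long runs of repeated values within one column are common, and that is where the compression advantage described above becomes noticeable.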




