- New Data: All data entering the system is dispatched to both the batch layer and the speed layer for processing.
- Batch layer: This layer has two functions: (i) managing the master dataset, an immutable, append-only set of raw data, and (ii) pre-computing arbitrary query functions over it, called batch views. Hadoop is typically used here: the master dataset is stored in HDFS, and the batch views are computed with MapReduce.
- Serving layer: This layer indexes the batch views so that they can be queried ad hoc with low latency. The serving layer is usually implemented with technologies such as Apache HBase or ElephantDB. The Apache Drill project provides the capability to execute full ANSI SQL 2003 queries against batch views.
- Speed layer: This layer compensates for the high latency of updates to the serving layer, which arises because batch views are only refreshed when a batch computation completes. Using fast, incremental algorithms, the speed layer deals with recent data only. Storm is often used to implement this layer.
- Queries: Last but not least, any incoming query can be answered by merging results from batch views and real-time views.
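To make the data flow concrete, here is a toy, single-process sketch of the five steps above, counting page views per URL. It is only an illustration under stated assumptions: the class `LambdaSketch` and its methods are hypothetical names, and plain Python objects stand in for HDFS/MapReduce (batch layer), HBase or ElephantDB (serving layer) and Storm (speed layer). No real API of those systems is used.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaSketch:
    """Hypothetical in-memory model of the lambda architecture.

    Counts page views per URL. Real deployments would use HDFS/MapReduce,
    a serving store such as HBase, and Storm instead of these fields.
    """
    master_dataset: list = field(default_factory=list)       # immutable, append-only raw events
    batch_view: Counter = field(default_factory=Counter)     # precomputed by the batch layer
    realtime_view: Counter = field(default_factory=Counter)  # maintained by the speed layer

    def ingest(self, url: str) -> None:
        # New data is dispatched to both the batch layer and the speed layer.
        self.master_dataset.append(url)  # batch layer: append only, never update in place
        self.realtime_view[url] += 1     # speed layer: fast incremental update

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over the whole master dataset.
        self.batch_view = Counter(self.master_dataset)
        # In this single-threaded sketch the batch run covers every event
        # ingested so far, so the speed layer's partial results can be discarded.
        self.realtime_view.clear()

    def query(self, url: str) -> int:
        # Queries merge the batch view with the real-time view.
        return self.batch_view[url] + self.realtime_view[url]


lam = LambdaSketch()
for u in ["/a", "/b", "/a"]:
    lam.ingest(u)
print(lam.query("/a"))  # 2, answered entirely by the speed layer
lam.run_batch()
lam.ingest("/a")
print(lam.query("/a"))  # 3, batch view (2) merged with real-time view (1)
```

The key design point the sketch tries to show is that the batch view is always recomputed from the immutable master dataset rather than patched, while the speed layer only has to cover the window since the last batch run.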
Monday, June 02, 2014
lambda architecture
http://www.drdobbs.com/database/applying-the-big-data-lambda-architectur/240162604
http://www.slideshare.net/nathanmarz/runaway-complexity-in-big-data-and-a-plan-to-stop-it