實時數據分析Real-time data analysis frameworks (or stream system)

最近的工作中涉及要設計一個系統可以實時的監控系統的狀態,比如hadoop任務的執行情況,服務器的健康等。這個系統需要實時的處理對象產生的信息,併發送給用戶。

這個系統顯然需要具備如下特性:

  1. 可靠性
  2. 大數據處理
  3. 實時性

顯然這將是一個基於Hadoop上的項目,目前可供參考的有

Kafka: Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn’s activity stream processing pipeline. Nice talk

S4: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

Hedwig: Hedwig is a publish-subscribe system designed to carry large amounts of data across the internet in a guaranteed-delivery fashion from those who produce it (publishers) to those who are interested in it (subscribers).

Storm: Storm is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that we consider it to be a fundamental new primitive for data processing. Introduction slide

Flume: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS.

Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.

隨着項目的跟進,我會繼續更新。


發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章