Kafka Documentation (8)----0.10.1 Use Cases

Here is a description of a few of the popular use cases for Apache Kafka™. For an overview of a number of these areas in action, see this blog post.

Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc.). In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault-tolerance, which makes it a good solution for large-scale message processing applications.

In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
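To make the broker-replacement idea concrete, here is a minimal Java sketch of Kafka used as a work queue. The broker address, the "orders" topic, and the "order-workers" group id are illustrative assumptions, not anything fixed by the documentation.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QueueSketch {
    public static void main(String[] args) {
        // Producer side: fully decoupled from whoever consumes the message.
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(pp)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }

        // Consumer side: all consumers sharing a group.id split the topic's
        // partitions between them, which gives queue semantics.
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "order-workers");
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(Arrays.asList("orders"));
            // The first poll may return nothing while the group rebalances;
            // a real worker polls in a loop. 0.10.x uses poll(long timeoutMs).
            for (ConsumerRecord<String, String> r : consumer.poll(1000))
                System.out.printf("%s -> %s%n", r.key(), r.value());
        }
    }
}
```

Because the consumers share a group id, Kafka assigns each partition to exactly one of them, while replication behind the topic provides the durability guarantee mentioned above.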


Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

Activity tracking is often very high volume as many activity messages are generated for each user page view.
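A small sketch of the "one topic per activity type" layout described above; the topic names, user-id key, and JSON payloads are hypothetical, and the producer is assumed to be configured as in the Messaging sketch.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ActivityTracker {
    private final Producer<String, String> producer;

    public ActivityTracker(Properties producerProps) {
        this.producer = new KafkaProducer<>(producerProps);
    }

    /**
     * One central topic per activity type; keying by user id keeps each
     * user's events in order within a partition.
     */
    public void track(String activityTopic, String userId, String eventJson) {
        producer.send(new ProducerRecord<>(activityTopic, userId, eventJson));
    }
}

// e.g. tracker.track("page-views", "user-123", "{\"url\":\"/home\"}");
//      tracker.track("searches",   "user-123", "{\"query\":\"kafka\"}");
```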


Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
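One way an application node might push its stats into such a centralized feed, as a sketch; the "metrics" topic, the JSON field names, and the ten-second interval are all assumptions.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MetricsReporter {
    private final AtomicLong requestCount = new AtomicLong();

    public void start(Properties producerProps, String host) {
        Producer<String, String> producer = new KafkaProducer<>(producerProps);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Every 10 seconds, publish a counter snapshot keyed by host; a
        // downstream consumer aggregates the per-host feeds into one view.
        scheduler.scheduleAtFixedRate(() -> {
            String snapshot = String.format("{\"host\":\"%s\",\"requests\":%d}",
                                            host, requestCount.getAndSet(0));
            producer.send(new ProducerRecord<>("metrics", host, snapshot));
        }, 10, 10, TimeUnit.SECONDS);
    }

    public void onRequest() { requestCount.incrementAndGet(); }
}
```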


Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.
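On the consuming side, an aggregator is just another consumer of the event stream rather than a file shipper. A sketch, with the "app-logs" topic and group id as placeholders and stdout standing in for a real sink such as HDFS:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "log-archiver");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("app-logs")); // all services publish log events here
            while (true) {
                // Each record is one structured log event, not a chunk of a
                // file; write it wherever the archive lives.
                for (ConsumerRecord<String, String> r : consumer.poll(500))
                    System.out.printf("[%s-%d@%d] %s%n",
                                      r.topic(), r.partition(), r.offset(), r.value());
            }
        }
    }
}
```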


Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.
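A minimal Kafka Streams sketch (0.10.x API, hence KStreamBuilder) of one stage of the article pipeline above: read "articles", normalize, and publish to a new topic. The application id, the "cleansed-articles" topic name, and the whitespace normalization are stand-ins for whatever real cleansing a pipeline would do.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class ArticleCleanser {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-cleanser");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> articles = builder.stream("articles");
        // One pipeline stage: normalize the raw crawled content and publish
        // it to a new topic that the recommendation stage consumes.
        articles.mapValues(body -> body.trim().replaceAll("\\s+", " "))
                .to("cleansed-articles");

        new KafkaStreams(builder, props).start();
    }
}
```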


Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.
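In this style, each state change is appended as a record and current state is rebuilt by replaying the log from the beginning. A sketch under assumptions: a hypothetical single-partition "account-events" topic whose values are signed amounts, replayed to reconstruct balances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class BalanceRebuilder {
    /** Rebuild current balances by replaying every event from offset 0. */
    public static Map<String, Long> rebuild(Properties consumerProps) {
        Map<String, Long> balances = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            TopicPartition tp = new TopicPartition("account-events", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToBeginning(Collections.singletonList(tp)); // start of history
            ConsumerRecords<String, String> batch;
            // Stop at the first empty poll -- fine for a sketch; a real
            // rebuild would compare its position against the end offset.
            while (!(batch = consumer.poll(1000)).isEmpty()) {
                for (ConsumerRecord<String, String> r : batch)
                    balances.merge(r.key(), Long.parseLong(r.value()), Long::sum);
            }
        }
        return balances;
    }
}
```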


Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to Apache BookKeeper project.
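A sketch of the pattern, assuming a "node-state" topic created with cleanup.policy=compact and a producer configured as in the Messaging sketch: each node publishes its latest state keyed by shard, and compaction retains at least the newest record per key for a recovering node to replay.

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StateCheckpointer {
    private final Producer<String, String> producer;

    public StateCheckpointer(Producer<String, String> producer) {
        this.producer = producer;
    }

    /**
     * Publish the latest state for a key; with a compacted topic, Kafka
     * keeps at least the most recent record per key indefinitely.
     */
    public void checkpoint(String shardId, String serializedState) {
        producer.send(new ProducerRecord<>("node-state", shardId, serializedState));
    }

    /** A null value is a tombstone: compaction eventually drops the key entirely. */
    public void delete(String shardId) {
        producer.send(new ProducerRecord<>("node-state", shardId, null));
    }
}
```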

