


  • I want to analyze user behavior (pageviews) so that I can design better ad placements.

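  • I want to collect statistics on users' search keywords and work out what is currently trending. This is a fun one: economics has a "long-skirt theory", which says that when long skirts sell well the economy is in a slump, because women can no longer afford all those stockings.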

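  • For some data, storing it in a database feels wasteful, but if I write it straight to disk I worry that working with it later will be inefficient.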


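First we need to understand what a messaging system is. The Kafka website defines Kafka as "A distributed publish-subscribe messaging system". Put more precisely, Kafka is a system for publishing messages and subscribing to them. The publish-subscribe concept matters because it is the natural starting point for explaining Kafka's design.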
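To make the publish-subscribe idea concrete, here is a minimal in-memory sketch. It is not Kafka's API: the `ToyMessageBus` class, the topic names, and the inbox lists are all made up for illustration. Publishers only know a topic name, and every subscriber of that topic receives each message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyMessageBus:
    """In-memory publish-subscribe hub (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic name -> callbacks of every subscriber to that topic
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = ToyMessageBus()
ad_team_inbox: List[str] = []      # wants pageviews only
trend_team_inbox: List[str] = []   # wants pageviews and searches

bus.subscribe("pageviews", ad_team_inbox.append)
bus.subscribe("pageviews", trend_team_inbox.append)
bus.subscribe("searches", trend_team_inbox.append)

bus.publish("pageviews", "user=42 page=/home")
bus.publish("searches", "long skirt")
```

The publisher never learns who the subscribers are, which is the decoupling the term describes. One simplification to keep in mind: this toy pushes messages to callbacks, whereas Kafka consumers pull messages from the broker.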






  • Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages.
  • High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second.
  • Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Support for parallel data load into Hadoop.
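The partitioning and per-partition ordering points above can be sketched with a toy model. The assumptions are loud ones: `ToyPartitionedTopic` is a made-up class, each partition is a plain Python list rather than files on disk, and the CRC32-based key hashing is just a stand-in, not Kafka's actual partitioner. The idea it shows is that a partition is an append-only log, a message is addressed by its offset, and messages with the same key stay in order because they land in the same partition.

```python
import zlib
from typing import List, Tuple


class ToyPartitionedTopic:
    """A topic split into append-only partitions (hypothetical, in memory only)."""

    def __init__(self, num_partitions: int) -> None:
        self._partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        return zlib.crc32(key.encode("utf-8")) % len(self._partitions)

    def append(self, key: str, message: str) -> Tuple[int, int]:
        """Append to the partition chosen by key; return (partition, offset)."""
        partition = self.partition_for(key)
        log = self._partitions[partition]
        log.append(message)  # appending at the end does not depend on log size
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> List[str]:
        """Read sequentially from an offset, as a consumer would."""
        return self._partitions[partition][offset:offset + max_messages]


topic = ToyPartitionedTopic(num_partitions=3)
placed = [topic.append("user-42", f"click-{i}") for i in range(3)]
partition = placed[0][0]
print(topic.read(partition, 0))
```

Because ordering is only promised inside one partition, separate consumers can each read a different partition in parallel without coordinating on every message.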


  • LinkedIn - Apache Kafka is used at LinkedIn for activity stream data and operational metrics. This powers various products like LinkedIn Newsfeed, LinkedIn Today in addition to our offline analytics systems like Hadoop.
  • Mate1.com Inc. - Apache Kafka is used at Mate1 as our main event bus that powers our news and activity feeds, automated review systems, and will soon power real time notifications and log distribution.
  • Tagged - Apache Kafka drives our new pub sub system which delivers real-time events for users in our latest game - Deckadence. It will soon be used in a host of new use cases including group chat and back end stats and log collection.
  • Boundary - Apache Kafka aggregates high-flow message streams into a unified distributed pubsub service, brokering the data for other internal systems as part of Boundary's real-time network analytics infrastructure.
  • Wooga - We use Kafka to aggregate and process tracking data from all our Facebook games (which are hosted at various providers) in a central location.
  • AddThis - Apache Kafka is used at AddThis to collect events generated by our data network and broker that data to our analytics clusters and real-time web analytics platform.
  • Urban Airship - At Urban Airship we use Kafka to buffer incoming data points from mobile devices for processing by our analytics infrastructure.
  • Metamarkets - We use Kafka to collect realtime event data from clients, as well as our own internal service metrics, that feed our interactive analytics dashboards.
  • SocialTwist - We use Kafka internally as part of our reliable email queueing system.
  • Countandra - We use a hierarchical distributed counting engine that uses Kafka as its primary fast interface and for routing events for cascading counting.
  • FlyHajj.com - We use Kafka to collect all metrics and events generated by the users of the website.



  • Server-1 is the broker, which is simply the Kafka server, since both the producer and the consumer connect to it. The broker's main job is storage.

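  • Server-2 is the ZooKeeper server. You can look up what ZooKeeper does in detail on its official site; for now, picture it as maintaining a table that records each node's IP, port, and similar information (as later posts will cover, it also stores Kafka-related information).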

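  • Server-3, Server-4, and Server-5 have one thing in common: each is configured with a zkClient. More precisely, each must be given the ZooKeeper address before it runs. The reason is simple: the connections among them are all handed out through ZooKeeper.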

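  • As for Server-1 and Server-2, they can sit on the same machine or on separate ones, and ZooKeeper can also run as a cluster, so that the failure of a single machine does not take everything down.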


1. Start the ZooKeeper server.

2. Start the Kafka server.

3. When a producer has data to send, it first finds the broker through ZooKeeper and then stores the data on that broker.

4. When a consumer wants to consume data, it first finds the corresponding broker through ZooKeeper and then consumes from it.
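The four steps can be walked through with a small simulation. Everything here is a stand-in rather than real Kafka or ZooKeeper code: `ToyRegistry` plays the table that ZooKeeper keeps, `ToyBroker` plays the Kafka server, `NETWORK` fakes the network with a dictionary, and the broker id, the address `10.0.0.1`, and the port `9092` are placeholder values.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class ToyRegistry:
    """Stand-in for ZooKeeper: a table mapping broker ids to (host, port)."""

    def __init__(self) -> None:
        self._brokers: Dict[str, Address] = {}

    def register(self, broker_id: str, address: Address) -> None:
        self._brokers[broker_id] = address

    def find_broker(self) -> Address:
        # A real lookup would pick a broker per topic; the toy returns the first entry.
        if not self._brokers:
            raise LookupError("no broker registered")
        return next(iter(self._brokers.values()))


class ToyBroker:
    """Stand-in for a Kafka server: it only stores messages in arrival order."""

    def __init__(self, address: Address) -> None:
        self.address = address
        self.log: List[str] = []


# Simulated network: address -> the broker reachable at that address.
NETWORK: Dict[Address, ToyBroker] = {}


def start_broker(broker_id: str, address: Address, registry: ToyRegistry) -> ToyBroker:
    broker = ToyBroker(address)
    NETWORK[address] = broker
    registry.register(broker_id, address)  # the broker announces itself on startup
    return broker


def produce(registry: ToyRegistry, message: str) -> None:
    address = registry.find_broker()       # step 3: look the broker up first
    NETWORK[address].log.append(message)   # then store the data on it


def consume(registry: ToyRegistry, offset: int) -> List[str]:
    address = registry.find_broker()       # step 4: same lookup
    return NETWORK[address].log[offset:]   # then read from the requested offset


registry = ToyRegistry()                                  # step 1: start the registry
start_broker("broker-0", ("10.0.0.1", 9092), registry)    # step 2: start the broker
produce(registry, "pageview:/home")
produce(registry, "search:kafka")
print(consume(registry, 0))
```

Neither `produce` nor `consume` is given the broker's address directly; both only hold a reference to the registry, which mirrors why the ZooKeeper address has to be configured on those machines before they run.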

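That wraps up this first look at Kafka. Next I will write about how to set up a Kafka environment. Finally, many thanks to @rockybean, a real expert, for the guidance and help.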

