https://www.jianshu.com/p/4bf007885116
Recommended introductory reading

Summary

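1. The Segment concept:
A partition is divided into Segments of equal size (in bytes) that hold differing numbers of messages.
Each Segment consists of an index file (.index) and a data file (.log).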
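
As a rough illustration of how Segments split up a partition's offset range, the sketch below maps an offset to the data file that holds it. It relies on the Kafka convention that a segment file is named after the first offset it contains, zero-padded to 20 digits. The class name `SegmentLookup` and the sample base offsets are hypothetical, and this is not how the broker itself is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

final class SegmentLookup {
    // base offset of each segment -> name of its data file
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset) {
        segments.put(baseOffset, String.format("%020d.log", baseOffset));
    }

    // The segment holding `offset` is the one with the largest base offset <= offset.
    String fileFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        if (entry == null) {
            throw new IllegalArgumentException("offset below first segment: " + offset);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        SegmentLookup log = new SegmentLookup();
        log.addSegment(0L);
        log.addSegment(368_769L);   // hypothetical base offsets
        log.addSegment(737_337L);
        System.out.println(log.fileFor(500_000L));
    }
}
```

The sorted map stands in for the directory listing of a partition: finding a message starts with a floor lookup on the base offsets, after which the index file narrows the search inside the chosen segment.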

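2. Storage mechanism (the interview question "why is Kafka so fast?"):
When the Broker receives data, it first places it in the operating system's cache (pagecache).
The pagecache uses as much free memory as is available.
sendfile is used to cut down, as far as possible, on redundant buffering between the operating system and the application.
Data is written sequentially, and sequential writes can reach roughly 600 MB/s.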
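
To make the sendfile point concrete, here is a minimal sketch of the Java call that exposes it: `FileChannel.transferTo`, which the JVM can map to sendfile on Linux. The class name `ZeroCopySend` and the temporary file are assumptions for illustration only; the broker's own code is more involved.

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySend {
    // Streams a whole file to `target` without copying it through a user-space buffer.
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".log");   // hypothetical data file
        Files.write(tmp, "hello kafka\n".getBytes(StandardCharsets.UTF_8));
        long sent = send(tmp, Channels.newChannel(System.out));
        System.out.println("bytes sent: " + sent);
        Files.delete(tmp);
    }
}
```

In a real broker the target would be a socket channel; standard output is used here only so the sketch runs on its own.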

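3. How do Consumers balance load? (rebalance)
1) Determine the starting partition number for the Consumer.
2) Compute the number of partitions the Consumer should consume.
3) Take the hashCode of the starting partition number modulo the number of partitions.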
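
The steps above are terse. One plausible reading of steps 1 and 2 is a contiguous, range-style split of partitions across the consumers in a group, sketched below. The class and method names are hypothetical, the hashing in step 3 is not modeled, and the assignor actually configured for a consumer may behave differently.

```java
final class RangeAssignmentSketch {
    // Returns {startPartition, partitionCount} for the consumer at `consumerIndex`
    // (its position in the sorted member list) when `numPartitions` partitions are
    // split into contiguous ranges across `numConsumers` consumers.
    static int[] rangeFor(int consumerIndex, int numConsumers, int numPartitions) {
        int base = numPartitions / numConsumers;    // every consumer gets at least this many
        int extra = numPartitions % numConsumers;   // the first `extra` consumers get one more
        int start = base * consumerIndex + Math.min(consumerIndex, extra);
        int count = base + (consumerIndex < extra ? 1 : 0);
        return new int[] {start, count};
    }

    public static void main(String[] args) {
        // 7 partitions shared by 3 consumers
        for (int i = 0; i < 3; i++) {
            int[] r = rangeFor(i, 3, 7);
            System.out.println("consumer " + i + ": start=" + r[0] + ", count=" + r[1]);
        }
    }
}
```

With 7 partitions and 3 consumers this yields the ranges 0-2, 3-4, and 5-6, so every partition has exactly one owner and the counts differ by at most one.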

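4. Data distribution strategy?
By default Kafka uses its own partitioner (DefaultPartitioner).
A custom partitioner can also be supplied by implementing the Partitioner interface and its partition method.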
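
The core of key-based partitioning fits in a few lines. The sketch below is plain Java with no Kafka dependency: it uses `Object.hashCode()` as a stand-in hash, whereas Kafka's built-in partitioner hashes the serialized key bytes. The class and method names are hypothetical.

```java
final class KeyHashPartitioning {
    // Picks a partition for a key so that equal keys always land on the same partition.
    // Stand-in hash: Object.hashCode(); this is an assumption, not Kafka's own hash.
    static int partitionFor(Object key, int numPartitions) {
        int positive = key.hashCode() & 0x7fffffff;   // clear the sign bit so the result is non-negative
        return positive % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (int key = 1; key <= 5; key++) {
            System.out.println("key " + key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

A custom partitioner would place logic of this kind inside its partition method and be registered through the partitioner.class producer setting.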

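5. How does Kafka avoid losing data?
After receiving data, Kafka stores it according to the replica count specified when the topic was created.
In other words, the replication mechanism is what keeps the data safe.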

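6. Uses of Kafka:
① As a message queue in traditional business systems: its high throughput and distributed design make it easy to handle large volumes of business traffic.
② Real-time analysis of user-behavior logs in internet companies, for example real-time statistics on page views, searches, and other actions. It can be combined with a real-time processing framework for live monitoring, or the data can be loaded into Hadoop or an offline data warehouse for processing.
③ As a durable log service for external distributed systems, mainly relying on data replication between nodes, file storage, and log compaction.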

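------
Other application scenarios:
	① Internal enterprise metrics
		Metrics with strict timeliness requirements, such as alerting metrics, must be computed and reported promptly whenever the data changes.
	② Telecom operators
		Monitoring the remaining quota in a user's plan, such as mobile data, voice minutes, and SMS.
	③ E-commerce
		Applications with very high throughput and frequently changing data, such as e-commerce sites, must use real-time computation to capture user preferences.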

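7. Kafka components:
① At the storage layer each partition is an append-only log file. New messages are appended directly to the tail of the log, and each message's position in the log is called its offset.
② Each Message has the following three attributes:
1° offset (type: long): the sequence number of the message within a partition. The offset can be regarded as the id of the Message in that partition.
2° MessageSize (type: int): the size of the message.
3° data: the actual content of the message.
③ More partitions can accommodate more consumers in a group, which raises the capacity for concurrent consumption.
④ In short: add topics to separate lines of business, and add partitions when the data volume is large (replica count <= number of broker nodes).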
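
The append-only model in ① and ② can be pictured with a small in-memory stand-in. The sketch below is not Kafka's storage code; the class `PartitionLogSketch` and its methods are hypothetical and only show how an offset falls out of appending to the tail.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PartitionLogSketch {
    private final List<byte[]> messages = new ArrayList<>();

    // Appends to the tail and returns the offset assigned to the new message.
    long append(byte[] data) {
        messages.add(data.clone());
        return messages.size() - 1L;
    }

    // The message content (data) stored at `offset`.
    byte[] read(long offset) {
        return messages.get((int) offset).clone();
    }

    // The MessageSize attribute: length of the stored content in bytes.
    int messageSize(long offset) {
        return messages.get((int) offset).length;
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        long first = log.append("hello".getBytes(StandardCharsets.UTF_8));
        long second = log.append("kafka!".getBytes(StandardCharsets.UTF_8));
        System.out.println("offsets: " + first + ", " + second);
        System.out.println("size at offset 1: " + log.messageSize(second));
    }
}
```

Because messages are only ever added at the tail, an offset never changes once assigned, which is what lets consumers track their position with a single number per partition.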

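8. Real-time stream processing frameworks such as Storm and Spark Streaming run on their own engines and read from Kafka through consumer integrations; they are not built on Kafka Streams.
To implement real-time processing by hand, without such a framework, use the Kafka Streams library directly: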

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>1.0.2</version>
</dependency>

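9. What are the ways to maintain the offsets consumed by subscribers?
① ZooKeeper; command-line option: --zookeeper
② The Kafka cluster itself; command-line option: --bootstrap-server
Topic name: __consumer_offsets; default: 50 partitions; replica count: 1 (as set by offsets.topic.replication.factor in the sample server.properties)
To make the __consumer_offsets topic highly available (fault tolerant), raise its replica count in server.properties:
offsets.topic.replication.factor=3
(default.replication.factor=3 only changes the default for automatically created topics.)
③ Maintain offsets manually (typically stored in Redis)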
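
For option ③, the bookkeeping itself is small. The sketch below uses an in-memory map as a stand-in for Redis so that it runs without any external service. The class name, the key layout `group:topic:partition`, and the convention of storing "last processed offset + 1" are all assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ManualOffsetStore {
    // Stand-in for Redis: key "group:topic:partition" -> next offset to read
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + ":" + topic + ":" + partition;
    }

    // Records the position to resume from, i.e. the last processed offset plus one.
    void save(String group, String topic, int partition, long lastProcessedOffset) {
        store.put(key(group, topic, partition), lastProcessedOffset + 1);
    }

    // Where to resume; 0 when nothing has been saved yet (start from the beginning).
    long resumeFrom(String group, String topic, int partition) {
        return store.getOrDefault(key(group, topic, partition), 0L);
    }

    public static void main(String[] args) {
        ManualOffsetStore offsets = new ManualOffsetStore();
        System.out.println(offsets.resumeFrom("test-consumer-group", "test", 0));
        offsets.save("test-consumer-group", "test", 0, 41L);
        System.out.println(offsets.resumeFrom("test-consumer-group", "test", 0));
    }
}
```

On startup a consumer would pass the stored value to `KafkaConsumer.seek` for each assigned partition, and call `save` after each batch has been processed.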

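10. A few notes:
① Each consumer started with kafka-console-consumer.sh runs as a separate process.
② A hand-written consumer can use a parameter to choose between consuming from the beginning and resuming from the last position. This requires passing a flag (through main's args[]).
③ Each consumer process started with kafka-console-consumer.sh gets a default consumer group, named like console-consumer-64328.
④ For viewing consumer group information, see: 4_筆記\查看消費者組.png
⑤ PageCache, SendFile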

First, place two configuration files in the resources directory.

producer.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.producer.ProducerConfig for more details

############################# Producer Basics #############################


# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
# host name and port of the Kafka brokers
bootstrap.servers=hadoop:9092

# specify the compression codec for all data generated: none, gzip, snappy, lz4
compression.type=none

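# wait for acknowledgement from all in-sync replicas (followers sync the message from the leader partition, then report back to the leader)
acks=all

# number of times a failed send is retried (0 disables retries)
retries=0

# batch size in bytes
batch.size=16384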

# name of the partitioner class for partitioning events; default partition spreads data randomly
# partitioner.class=com.l000phone.partition.MyPartition

# the maximum amount of time the client will wait for the response of a request
#request.timeout.ms=

# how long `KafkaProducer.send` and `KafkaProducer.partitionsFor` will block for
#max.block.ms=

# the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together
# send delay (linger time) in milliseconds
linger.ms=1

# the maximum size of a request in bytes
#max.request.size=

# the default batch size in bytes when batching multiple records sent to a partition
#batch.size=

# the total bytes of memory the producer can use to buffer records waiting to be sent to the server
# size in bytes of the send buffer memory
buffer.memory=33554432

# serializer class for the message key
key.serializer=org.apache.kafka.common.serialization.IntegerSerializer

# serializer class for the message value
value.serializer=org.apache.kafka.common.serialization.StringSerializer

consumer.properties

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
# 
#    http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details

# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
# address of the Kafka brokers
bootstrap.servers=hadoop:9092
# consumer group id
group.id=test-consumer-group
# What to do when there is no initial offset in Kafka or if the current
# offset does not exist any more on the server: latest, earliest, none
#auto.offset.reset=
# whether to commit offsets automatically
enable.auto.commit=true
# interval in milliseconds between automatic offset commits
auto.commit.interval.ms=500
# deserializer class for the key
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
# deserializer class for the value
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

Producer:

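import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Properties;

public class MyMsgProducerDemo {
    public static void main(String[] args) {
        Producer<Integer, String> producer = null;

        try {
            // ① Load the settings from producer.properties in the resources directory into a Properties instance
            Properties properties = new Properties();
            properties.load(MyMsgProducerDemo.class.getClassLoader().getResourceAsStream("producer.properties"));

            // ② Create the KafkaProducer
            producer = new KafkaProducer<>(properties);

            // ③ Publish several messages in a loop
            for (int i = 1; i <= 10; i++) {
                // a) Build the message
                final ProducerRecord<Integer, String> record = new ProducerRecord<>("test", i, i + "\t→ Hello old classmate, how have you been?");

                // b) Send the message
                producer.send(record, new Callback() {
                    /**
                     * Invoked once the send of the current message has completed.
                     *
                     * @param metadata  where the record was stored
                     * @param exception the failure, or null if the send succeeded
                     */
                    @Override
                    public void onCompletion(RecordMetadata metadata, Exception exception) {
                        if (exception != null) {
                            // The send failed, so the metadata carries no valid position
                            exception.printStackTrace();
                            return;
                        }
                        System.out.printf("Topic: %s, value: %s, partition: %d, offset: %d%n",
                                metadata.topic(), record.value(), metadata.partition(), metadata.offset());
                    }
                });
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // ④ Release resources (close() waits for pending sends to complete)
            if (producer != null) {
                producer.close();
            }
        }
    }
}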

Consumer:


import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

public class MyMsgConsumerDemo {
    public static void main(String[] args) {

        KafkaConsumer<Integer, String> consumer = null;
        try {
            // ① Load the settings from consumer.properties in the resources directory into a Properties instance
            Properties properties = new Properties();
            properties.load(MyMsgConsumerDemo.class.getClassLoader().getResourceAsStream("consumer.properties"));

            // ② Create the KafkaConsumer
            consumer = new KafkaConsumer<>(properties);

            // ③ Subscribe to the topics
            final Consumer<Integer, String> finalConsumer = consumer;
            consumer.subscribe(Arrays.asList("test", "test2"), new ConsumerRebalanceListener() {

                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Nothing to do when partitions are revoked
                }

                /**
                 * Start consuming each assigned partition from its beginning.
                 * @param partitions the partitions just assigned to this consumer
                 */
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    finalConsumer.seekToBeginning(partitions);
                }
            });

            // ④ Receive messages in a loop
            while (true) {
                // Fetch the next batch, waiting up to 1000 ms
                ConsumerRecords<Integer, String> records = consumer.poll(1000);

                // ⑤ Process the fetched records
                for (ConsumerRecord<Integer, String> record : records) {
                    String topic = record.topic();
                    int partition = record.partition();
                    long offset = record.offset();
                    String value = record.value();
                    Integer key = record.key();
                    System.out.printf("Message details:%ntopic→%s, partition→%d, offset→%d, value→%s, key→%d%n%n",
                            topic, partition, offset, value, key
                    );
                }

                // Exit once a poll returns no messages
                if (records.isEmpty()) {
                    break;
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // ⑥ Release resources
            if (consumer != null) {
                consumer.close();
            }
        }
    }
}

