Druid Single-Machine Testing and Data Loading Methods

http://druid.io/docs/0.10.1/tutorials/quickstart.html

(1)Getting started

Download and install Druid:

curl -O http://static.druid.io/artifacts/releases/druid-0.10.1-bin.tar.gz
tar -xzf druid-0.10.1-bin.tar.gz
cd druid-0.10.1

Main directories:

  • LICENSE - the license files.
  • bin/ - scripts useful for this quickstart.
  • conf/* - template configurations for a clustered setup.
  • conf-quickstart/* - configurations for this quickstart.
  • extensions/* - all Druid extensions.
  • hadoop-dependencies/* - Druid Hadoop dependencies.
  • lib/* - all included software packages for core Druid.
  • quickstart/* - files useful for this quickstart.

(2)Start up Zookeeper

Start ZooKeeper:

curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz

tar -xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
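
As an optional sanity check, you can ask the ZooKeeper server for its status:

./bin/zkServer.sh status

For this single-node setup it should report Mode: standalone.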

(3)Start up Druid services

Start Druid. Once Zookeeper is running, return to the druid-0.10.1 directory and run:

 bin/init

This creates directories such as log and var. Next, run each of the following processes in its own terminal window:

java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
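
If you would rather not juggle five terminal windows, a small helper script can launch all five services in the background. This is a convenience sketch, not part of the distribution; the script name and log file paths are choices made here:

#!/bin/bash
# start-quickstart.sh - launch the five Druid quickstart services in the background.
# Run from the druid-0.10.1 directory; bin/init must have been run first so log/ exists.
for svc in historical broker coordinator overlord middleManager; do
  java `cat conf-quickstart/druid/$svc/jvm.config | xargs` \
    -cp "conf-quickstart/druid/_common:conf-quickstart/druid/$svc:lib/*" \
    io.druid.cli.Main server $svc > log/$svc.log 2>&1 &
  echo "started $svc (pid $!)"
done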

Press CTRL-C to shut a service down when needed (not needed here).

To restart with a clean slate, delete the var directory and run bin/init again.
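
In other words, a full reset from the druid-0.10.1 directory looks like this (stop all five services first):

rm -rf var
bin/init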

Ingest data

From the druid-0.10.1 directory, run:

 curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task

This returns something like:
{"task":"index_hadoop_wikiticker_2017-11-26T12:57:40.055Z"}

Ingestion task console: http://localhost:8090/console.html

Coordinator console: http://localhost:8081/#/
(4)Query data

Run:

curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/?pretty

This returns:

[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "edits" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "edits" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "edits" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "edits" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "edits" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "edits" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "edits" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "edits" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:Requests for page protection"
  }, {
    "edits" : 16,
    "page" : "Utente:Giulio Mainardi/Sandbox"
  }, {
    "edits" : 16,
    "page" : "Wikipedia:Administrator intervention against vandalism"
  }, {
    "edits" : 15,
    "page" : "Anthony Martial"
  }, {
    "edits" : 13,
    "page" : "Template talk:Connected contributor"
  }, {
    "edits" : 12,
    "page" : "Chronologie de la Lorraine"
  }, {
    "edits" : 12,
    "page" : "Wikipedia:Files for deletion/2015 September 12"
  }, {
    "edits" : 12,
    "page" : "Гомосексуальный образ жизни"
  }, {
    "edits" : 11,
    "page" : "Constructive vote of no confidence"
  }, {
    "edits" : 11,
    "page" : "Homo naledi"
  }, {
    "edits" : 11,
    "page" : "Kim Davis (county clerk)"
  }, {
    "edits" : 11,
    "page" : "Vorlage:Revert-Statistik"
  }, {
    "edits" : 11,
    "page" : "Конституция Японской империи"
  }, {
    "edits" : 10,
    "page" : "The Naked Brothers Band (TV series)"
  }, {
    "edits" : 10,
    "page" : "User talk:Buster40004"
  }, {
    "edits" : 10,
    "page" : "User:Valmir144/sandbox"
  } ]
} ]
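
For reference, quickstart/wikiticker-top-pages.json is a topN query roughly along these lines (quoted from memory of the 0.10.1 quickstart, so treat the details as approximate):

{
  "queryType" : "topN",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    { "type" : "longSum", "name" : "edits", "fieldName" : "count" }
  ]
}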


================================

Data Loading Methods

Loading Data

http://druid.io/docs/0.10.1/tutorials/ingestion.html
Two forms: streaming (real-time) and file-based (batch).
【1】HDFS files
http://druid.io/docs/0.10.1/ingestion/batch-ingestion.html
【2】Kafka, Storm, Spark Streaming
Use the Tranquility client: http://druid.io/docs/0.10.1/ingestion/stream-ingestion.html#stream-push

File loading quick start

File-based
【1】Loading files from local disk: http://druid.io/docs/0.10.1/tutorials/tutorial-batch.html
【2】Streams-based (push data over HTTP): http://druid.io/docs/0.10.1/tutorials/tutorial-streams.html

【3】Kafka-based tutorial:http://druid.io/docs/0.10.1/tutorials/tutorial-kafka.html

Example 1: Loading files from local disk

Loading from Files: load your own batch data
【1】Download and start Druid in standalone mode as described above:
http://druid.io/docs/0.10.1/tutorials/quickstart.html
【2】Write an ingestion spec
Use quickstart/wikiticker-index.json from the distribution as a reference.
Key points:
(1) Name the dataset: the dataSource field in the dataSchema
(2) Point to the data: the paths field in the inputSpec; separate multiple files with commas
(3) Identify the timestamp: the column field in the timestampSpec
(4) Identify the dimensions: the dimensions field in the dimensionsSpec
(5) Identify the metrics: the metricsSpec
(6) Set the time ranges: the intervals field in the granularitySpec
If the data has no natural timestamp, you can tag every row with a fixed timestamp such as "2000-01-01T00:00:00.000Z".
Supported file formats are TSV, CSV, and JSON; nested JSON is not supported.
The JSON data looks like this (contents of pageviews.json):
{"time": "2015-09-01T00:00:00Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
{"time": "2015-09-01T01:00:00Z", "url": "/", "user": "bob", "latencyMs": 11}
{"time": "2015-09-01T01:30:00Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
Make sure each record sits on a single line, with no embedded newline characters.
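If jq is installed, a quick way to check this is to parse the file record by record; any malformed line will produce a parse error:

jq -c . pageviews.json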
Then write the ingestion spec JSON, my-index-task.json, with fragments along these lines:
"dataSource": "pageviews"
"inputSpec": {
  "type": "static",
  "paths": "pageviews.json"
}
"timestampSpec": {
  "format": "auto",
  "column": "time"
}
"dimensionsSpec": {
  "dimensions": ["url", "user"]
}
"metricsSpec": [
  {"name": "views", "type": "count"},
  {"name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs"}
]
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "none",
  "intervals": ["2015-09-01/2015-09-02"]
}
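
Put together, a complete my-index-task.json would look roughly like the following. This is a sketch modeled on quickstart/wikiticker-index.json (an index_hadoop task, which in this quickstart setup also reads local files); the tuningConfig here is a minimal assumption, not something the original spells out:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "pageviews",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "format": "auto", "column": "time" },
          "dimensionsSpec": { "dimensions": ["url", "user"] }
        }
      },
      "metricsSpec": [
        { "name": "views", "type": "count" },
        { "name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": ["2015-09-01/2015-09-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "pageviews.json" }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": { "type": "hashed", "targetPartitionSize": 5000000 },
      "jobProperties": {}
    }
  }
}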
【3】Make sure the indexing task can read the contents of pageviews.json:
(1) For local execution (no Hadoop connection configured), place pageviews.json in the Druid root directory.
(2) If connecting to Hadoop, change the paths field in the inputSpec to point at the HDFS location.
【4】Submit the task:
curl -X 'POST' -H 'Content-Type:application/json' -d @my-index-task.json OVERLORD_IP:8090/druid/indexer/v1/task
For local execution, use:
curl -X 'POST' -H 'Content-Type:application/json' -d @my-index-task.json localhost:8090/druid/indexer/v1/task
Track the indexing progress at the overlord console: http://OVERLORD_IP:8090/console.html
【5】Query the data
The data becomes available a minute or two after ingestion completes; check via the Coordinator console: http://localhost:8081/#/
【6】For how to query, see:
http://druid.io/docs/0.10.1/querying/querying.html
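
As a concrete example, a simple timeseries query against the pageviews dataSource ingested above could look like this (a sketch; pageviews-query.json is a file name chosen here for illustration):

{
  "queryType": "timeseries",
  "dataSource": "pageviews",
  "granularity": "day",
  "intervals": ["2015-09-01/2015-09-02"],
  "aggregations": [
    { "type": "longSum", "name": "views", "fieldName": "views" },
    { "type": "doubleSum", "name": "latencyMs", "fieldName": "latencyMs" }
  ]
}

Submit it to the broker the same way as before:

curl -L -H'Content-Type: application/json' -XPOST --data-binary @pageviews-query.json http://localhost:8082/druid/v2/?pretty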

Example 2: Consuming data from Kafka

Tutorial: Load from Kafka
【1】Download and start Kafka
curl -O http://www.us.apache.org/dist/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
tar -xzf kafka_2.11-0.9.0.0.tgz
cd kafka_2.11-0.9.0.0
Start the Kafka broker:
./bin/kafka-server-start.sh config/server.properties
Create a Kafka topic named metrics:
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic metrics
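
You can confirm the topic was created by listing topics:

./bin/kafka-topics.sh --list --zookeeper localhost:2181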
【2】Send sample data
In the Druid directory, generate test data with bin/generate-example-metrics.
Start a Kafka console producer:
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic metrics
Paste the generated data into the producer's terminal.
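
Alternatively, assuming the druid-0.10.1 and kafka_2.11-0.9.0.0 directories sit side by side, you can pipe the generated data straight into the producer instead of pasting it:

cd druid-0.10.1
bin/generate-example-metrics | ../kafka_2.11-0.9.0.0/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic metrics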
【3】Query the data