Yandex Big Data Essentials Week 1: Scaling Distributed File System

GFS Key Components

Component failures are the norm
Even space utilisation
Write-once-read-many access

GFS and Hadoop Distributed File System

GFS consists of three main parts: Application, Master, and ChunkServer.
HDFS consists of three main parts: Application, NameNode, and DataNode.

How to read a file from HDFS

The HDFS client runs inside the client JVM on the client node.

Read flow

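1. The HDFS client opens the file on the distributed file system.
2. The block locations of the file are fetched from the NameNode.
3. The HDFS client passes the block locations to FSDataInputStream.
4. FSDataInputStream reads one block from the DataNode that holds it.
5. FSDataInputStream then reads the next block from the DataNode that holds it.
6. The client closes FSDataInputStream.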
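
The read flow above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the classes `DataNode`, `NameNode`, and `InputStream`, and the helpers `get_block_locations` and `open_file`, are hypothetical names made up for this sketch and are not the Hadoop API. The point it shows is that the NameNode serves only metadata, while the block contents come directly from the DataNodes.

```python
# Toy in-memory model of the HDFS read path. None of these classes are Hadoop APIs.

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block id -> bytes stored on this node

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    def __init__(self):
        self.files = {}           # path -> ordered list of (block id, DataNode)

    def get_block_locations(self, path):
        return list(self.files[path])


class InputStream:
    """Plays the role of FSDataInputStream: reads the blocks one after another."""
    def __init__(self, locations):
        self.locations = locations
        self.closed = False

    def read_all(self):
        if self.closed:
            raise ValueError("stream is closed")
        # Every block is fetched directly from the DataNode that stores it.
        return b"".join(dn.read_block(bid) for bid, dn in self.locations)

    def close(self):
        self.closed = True


def open_file(namenode, path):
    # Open the file: ask the NameNode for the block locations
    # and hand them to the stream.
    return InputStream(namenode.get_block_locations(path))


# A two-block file spread over two DataNodes.
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"
nn = NameNode()
nn.files["/demo.txt"] = [("blk_1", dn1), ("blk_2", dn2)]

stream = open_file(nn, "/demo.txt")
data = stream.read_all()      # read each block from its DataNode, in order
stream.close()                # close the stream when done
print(data)
```

In this sketch the NameNode never touches file contents, which mirrors why a single metadata server can coordinate reads for many clients.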

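Write flow

The HDFS client runs inside the client JVM, and the client JVM runs on the client node.
Steps for writing a file:
1. The HDFS client creates the file through DistributedFileSystem.
2. DistributedFileSystem creates the file on the NameNode.
3. The HDFS client sends write packets to the DataNodes through FSDataOutputStream.
4. The DataNodes that will hold the replicas (three with the default replication factor) form a pipeline of DataNodes, and each packet is written to every replica along that pipeline.
5. The DataNodes send ack packets back to FSDataOutputStream.
6. The client closes the stream.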
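
As a rough illustration of the pipeline and ack steps, here is a minimal sketch, assuming a chain of three in-memory nodes. The names `DataNode`, `build_pipeline`, `OutputStream`, and the `packet_size` parameter are hypothetical and chosen for this example only. Real HDFS sends packets and acks asynchronously, while this toy version does it synchronously.

```python
# Toy model of the HDFS write pipeline. Not Hadoop code.

class DataNode:
    def __init__(self, name):
        self.name = name
        self.packets = []         # packets stored on this node, in arrival order
        self.next = None          # next DataNode in the pipeline, if any

    def write_packet(self, packet):
        self.packets.append(packet)
        # Forward the packet downstream; the return value travels back
        # up the chain and stands in for the ack packet.
        return self.next.write_packet(packet) if self.next else True


def build_pipeline(datanodes):
    # Link each DataNode to the one after it and return the head of the chain.
    for current, following in zip(datanodes, datanodes[1:]):
        current.next = following
    return datanodes[0]


class OutputStream:
    """Plays the role of FSDataOutputStream: splits data into packets."""
    def __init__(self, head, packet_size):
        self.head = head
        self.packet_size = packet_size
        self.acked = 0            # number of packets acknowledged so far

    def write(self, data):
        for start in range(0, len(data), self.packet_size):
            packet = data[start:start + self.packet_size]
            if not self.head.write_packet(packet):
                raise IOError("pipeline did not acknowledge the packet")
            self.acked += 1


nodes = [DataNode(f"dn{i}") for i in range(1, 4)]   # three replicas
out = OutputStream(build_pipeline(nodes), packet_size=4)
out.write(b"0123456789")
replicas = [b"".join(node.packets) for node in nodes]
print(out.acked, replicas)
```

The client only talks to the first node in the chain, so its outgoing bandwidth is spent once per packet rather than once per replica.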

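In a DFS, you can "append" to a file, but you cannot "modify" a file in the middle. Why?
The core property of a DFS, write once, read many times, describes a data storage strategy. Once information has been written it cannot be modified, because a modification would require changing the underlying storage structure: a file is stored as a sequence of blocks replicated on several nodes, so a change in the middle would have to be applied consistently to every replica. If you need to change a file in a distributed file system such as HDFS, you can write a new copy of the data under the same file name. The old file is discarded later, when HDFS cleans up its data.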

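HDFS applications need a write-once-read-many access model for files. Once a file has been created, written, and closed, it does not need to change. This assumption simplifies data consistency and makes high-throughput data access possible. MapReduce applications and web crawlers fit this model very well. When this description was written, support for appending to files was still only planned as a future extension of the model.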
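
The difference between appending and modifying can be made concrete with a small sketch. It assumes a file made of fixed-size blocks where only the last block may grow. `AppendOnlyFile`, `InPlaceWriteError`, `write_at`, and `block_size` are hypothetical names for this illustration, not HDFS identifiers.

```python
class InPlaceWriteError(Exception):
    """Raised when something tries to change bytes that are already written."""


class AppendOnlyFile:
    """Toy file made of fixed-size blocks; only the tail can grow."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.blocks = []          # every block except the last one is full

    def append(self, data):
        while data:
            if not self.blocks or len(self.blocks[-1]) == self.block_size:
                self.blocks.append(b"")          # start a new block
            room = self.block_size - len(self.blocks[-1])
            self.blocks[-1] += data[:room]       # only the last block changes
            data = data[room:]

    def write_at(self, offset, data):
        # Rewriting the middle would touch blocks that are already complete
        # (and, in a real system, already replicated), so it is refused.
        raise InPlaceWriteError("in-place modification is not supported")

    def read(self):
        return b"".join(self.blocks)


f = AppendOnlyFile(block_size=4)
f.append(b"abcdef")
first_block = f.blocks[0]
f.append(b"gh")               # fills the tail block, earlier block untouched

try:
    f.write_at(2, b"XX")
    modified = True
except InPlaceWriteError:
    modified = False

print(f.blocks, modified)
```

Appending only ever touches the last block, so completed blocks never need to be coordinated again, whereas a write in the middle would.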
