1. MapFile Overview
A MapFile is a sorted SequenceFile made up of two parts: data and index. The data file holds all of the key/value records in ascending key order; the index serves as the data file's index, recording a sampled subset of the keys (by default every 128th record, controlled by io.map.index.interval) together with each indexed record's byte offset in the data file. When a MapFile is accessed, the index file is first loaded into memory; a search over the in-memory index quickly narrows the lookup to the small region of the data file containing the requested record. Compared with a plain SequenceFile, MapFile retrieval is therefore much more efficient; the trade-off is the extra memory consumed by the index data.
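The lookup mechanism described above can be sketched in plain Java. This is a simplified illustration, not the real Hadoop classes: keys are plain Strings, the "data file" is an in-memory list, and the index samples every 128th key just as MapFile does by default.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of MapFile's sparse-index lookup (not the real Hadoop classes). */
public class SparseIndexSketch {
    static final int INTERVAL = 128;              // MapFile's default io.map.index.interval

    // "data": all records in sorted key order; "index": every 128th key plus its position
    static List<String> data = new ArrayList<>();
    static List<String> indexKeys = new ArrayList<>();
    static List<Integer> indexPositions = new ArrayList<>();

    static void load(List<String> sortedKeys) {
        for (int i = 0; i < sortedKeys.size(); i++) {
            data.add(sortedKeys.get(i));
            if (i % INTERVAL == 0) {              // sample a key into the index
                indexKeys.add(sortedKeys.get(i));
                indexPositions.add(i);
            }
        }
    }

    /** Binary-search the small index, then scan at most INTERVAL data records. */
    static int lookup(String key) {
        int lo = 0, hi = indexKeys.size() - 1, start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (indexKeys.get(mid).compareTo(key) <= 0) {
                start = indexPositions.get(mid);  // last indexed key <= target
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        for (int i = start; i < Math.min(start + INTERVAL, data.size()); i++) {
            if (data.get(i).equals(key)) return i;
        }
        return -1;                                // key not present
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add(String.format("key%04d", i)); // already sorted
        load(keys);
        System.out.println(lookup("key0500"));    // found at position 500
        System.out.println(lookup("nosuch"));     // -1
    }
}
```

The point of the sketch is the cost model: the binary search touches only the small sampled index, and the final scan covers at most one sampling interval of the data, which is why MapFile lookups beat scanning a whole SequenceFile.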
2. MapFile Write Operations
2.1 Write steps
- Create a Configuration
- Get a FileSystem
- Set the file output path
- Create a MapFile.Writer with MapFile.Writer()
- Call MapFile.Writer.append to append records
- Close the stream
2.2 Write implementation
Example code:
package Mapfile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

import java.net.URI;

public class MapFileWriter {

    private static Configuration configuration = new Configuration();
    private static String HDFS_PATH = "hdfs://master002:9000";

    public static void main(String[] args) throws Exception {
        // Run as the "hadoop" user so HDFS permission checks pass
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fileSystem = FileSystem.get(URI.create(HDFS_PATH), configuration);

        // A MapFile is actually a directory containing "data" and "index" files
        Path outputPath = new Path("MyMapFile.map");

        Text key = new Text("mymapkey");
        Text value = new Text("mymapvalue");

        // This constructor is deprecated in Hadoop 2.x+ but still works;
        // keys must be appended in ascending order
        MapFile.Writer writer = new MapFile.Writer(
                configuration, fileSystem, outputPath.toString(), Text.class, Text.class);
        writer.append(key, value);
        IOUtils.closeStream(writer);
    }
}
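One detail worth calling out about append(): MapFile.Writer requires keys to arrive in sorted (non-decreasing) order and throws an IOException with a "key out of order" message otherwise. A minimal plain-Java sketch of that check follows; the real class compares WritableComparables via a comparator, not Strings, so this is only an illustration of the rule.

```java
import java.io.IOException;

/** Plain-Java sketch of MapFile.Writer's key-ordering check (not the real class). */
public class OrderedWriterSketch {
    private String lastKey;                       // last key appended; null before first append

    public void append(String key, String value) throws IOException {
        // A smaller key than the previous one would break the sorted data file,
        // so it is rejected; equal keys (duplicates) are allowed
        if (lastKey != null && key.compareTo(lastKey) < 0) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        // (a real writer would now append the record to the data file
        //  and periodically add an entry to the index)
    }

    public static void main(String[] args) throws IOException {
        OrderedWriterSketch w = new OrderedWriterSketch();
        w.append("apple", "1");
        w.append("banana", "2");
        try {
            w.append("apple", "3");               // out of order: "apple" < "banana"
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In practice this means you cannot build a MapFile from unsorted input directly; the records must be sorted first (for example by a MapReduce job, whose reducer input is already key-sorted).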
Result screenshot:
3. MapFile Read Operations
3.1 Read steps
- Create a Configuration
- Get a FileSystem
- Set the file input path
- Create a MapFile.Reader with MapFile.Reader()
- Get the key and value classes from the reader
- Read the records
- Close the stream
3.2 Read implementation
Example code:
package Mapfile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;

import java.net.URI;

public class MapFileReader {

    private static Configuration configuration = new Configuration();
    private static String HDFS_PATH = "hdfs://master002:9000";

    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fileSystem = FileSystem.get(URI.create(HDFS_PATH), configuration);
        Path inputPath = new Path("MyMapFile.map");

        // This constructor is deprecated in Hadoop 2.x+ but still works
        MapFile.Reader reader = new MapFile.Reader(fileSystem, inputPath.toString(), configuration);

        // Instantiate reusable key/value objects of the types recorded in the file
        WritableComparable key =
                (WritableComparable) ReflectionUtils.newInstance(reader.getKeyClass(), configuration);
        Writable value =
                (Writable) ReflectionUtils.newInstance(reader.getValueClass(), configuration);

        // next() fills key and value, returning false at end of file
        while (reader.next(key, value)) {
            System.out.println(key);
            System.out.println(value);
        }
        IOUtils.closeStream(reader);
    }
}
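The loop above reads sequentially with next(). MapFile.Reader also supports random access via get(key, value), which consults the in-memory index instead of scanning from the beginning. The semantics can be sketched in plain Java (again not the real class; a simple binary search over sorted keys stands in for the index-then-scan lookup):

```java
import java.util.Arrays;

/** Plain-Java sketch of MapFile.Reader.get()-style random access (not the real class). */
public class RandomAccessSketch {
    // Sorted keys and their parallel values, standing in for the MapFile's data file
    static String[] keys = {"apple", "banana", "cherry", "date"};
    static String[] values = {"red", "yellow", "dark red", "brown"};

    /** Return the value for key, or null if the key is absent. */
    static String get(String key) {
        int i = Arrays.binarySearch(keys, key);   // works because keys are sorted
        return i >= 0 ? values[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(get("cherry"));        // dark red
        System.out.println(get("fig"));           // null
    }
}
```

Like the sketch, the real get() signals a missing key (it returns null), so callers should always check the result before using the value object.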
Result screenshot:
4. Summary
That wraps up HDFS for the time being. I'm not sure how efficiently I'll be able to keep blogging after this, since my advisor has sent me off to learn k8s...