1. MapFile Overview
A MapFile is a sorted SequenceFile made up of two parts: data and index. The index acts as the file's data index, recording the keys of records at fixed intervals (every 128th record by default) together with each indexed record's byte offset in the data file. When a MapFile is accessed, the index file is first loaded into memory, and the index mapping is used to jump quickly to the position of the target record. Compared with a SequenceFile, a MapFile is therefore faster to search; the drawback is that it consumes some memory to hold the index data.
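The lookup path described above can be sketched in plain Java. This is a simplified model of the index mechanism, not Hadoop's actual implementation; the keys and offsets are made up for illustration:

```java
import java.util.TreeMap;

public class MapFileIndexSketch {
    public static void main(String[] args) {
        // Simplified in-memory index: every Nth record's key mapped to
        // that record's byte offset in the data file.
        TreeMap<String, Long> index = new TreeMap<>();
        index.put("key000", 0L);
        index.put("key128", 4096L);
        index.put("key256", 8192L);

        // To locate "key200": jump to the offset of the greatest indexed
        // key <= "key200", then scan the data file forward from there.
        long startOffset = index.floorEntry("key200").getValue();
        System.out.println(startOffset); // prints 4096
    }
}
```

Because only a sparse sample of keys is kept in memory, the index stays small while still limiting each lookup to a short scan of the data file.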
2. MapFile Write Operation
2.1 Implementation steps
- Set up a Configuration
- Get a FileSystem
- Set the file output path
- Create a MapFile.Writer with MapFile.Writer()
- Call MapFile.Writer.append to append key/value records
- Close the stream
2.2 Implementation code
Example code:
package Mapfile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

import java.net.URI;

public class MapFileWriter {
    private static Configuration configuration = new Configuration();
    private static String HDFS_PATH = "hdfs://master002:9000";

    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fileSystem = FileSystem.get(URI.create(HDFS_PATH), configuration);
        Path outputPath = new Path("MyMapFile.map");

        Text key = new Text();
        key.set("mymapkey");
        Text value = new Text();
        value.set("mymapvalue");

        // Create the writer; note that keys must be appended in sorted order.
        MapFile.Writer writer = new MapFile.Writer(configuration, fileSystem,
                outputPath.toString(), Text.class, Text.class);
        writer.append(key, value);
        IOUtils.closeStream(writer);
    }
}
Result screenshot: after running, MyMapFile.map appears on HDFS as a directory containing two files, data and index.
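The constructor used above, which takes a FileSystem, is deprecated in Hadoop 2.x. If you are on Hadoop 2.x or later, the option-based constructor is the non-deprecated alternative; a minimal sketch (assuming the same configuration and path as above, with org.apache.hadoop.io.SequenceFile also imported):

```java
// Option-based MapFile.Writer (Hadoop 2.x+): the FileSystem is resolved
// from the path and configuration instead of being passed explicitly.
MapFile.Writer writer = new MapFile.Writer(configuration,
        new Path("MyMapFile.map"),
        MapFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(Text.class));
```

The rest of the program (append and close) is unchanged.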
3. MapFile Read Operation
3.1 Implementation steps
- Set up a Configuration
- Get a FileSystem
- Set the file input path
- Create a MapFile.Reader with MapFile.Reader()
- Get the key and value classes from the reader
- Read the records
- Close the stream
3.2 Implementation code
Example code:
package Mapfile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.ReflectionUtils;

import java.net.URI;

public class MapFileReader {
    private static Configuration configuration = new Configuration();
    private static String HDFS_PATH = "hdfs://master002:9000";

    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fileSystem = FileSystem.get(URI.create(HDFS_PATH), configuration);
        Path inputPath = new Path("MyMapFile.map");

        MapFile.Reader reader = new MapFile.Reader(fileSystem, inputPath.toString(), configuration);

        // Instantiate reusable key/value objects of the classes stored in the file.
        WritableComparable key = (WritableComparable) ReflectionUtils.newInstance(reader.getKeyClass(), configuration);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), configuration);

        // Iterate over all records in key order.
        while (reader.next(key, value)) {
            System.out.println(key);
            System.out.println(value);
        }
        IOUtils.closeStream(reader);
    }
}
Result screenshot:
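Besides sequential iteration with next(), MapFile.Reader also supports the random access by key that the in-memory index makes possible, via get(). A minimal sketch, assuming the same cluster address and the file written above:

```java
package Mapfile;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

import java.net.URI;

public class MapFileGet {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        FileSystem fileSystem = FileSystem.get(URI.create("hdfs://master002:9000"), configuration);
        MapFile.Reader reader = new MapFile.Reader(fileSystem, "MyMapFile.map", configuration);

        // get() consults the in-memory index, seeks into the data file,
        // and fills in the value object; it returns null if the key is absent.
        Text value = new Text();
        if (reader.get(new Text("mymapkey"), value) != null) {
            System.out.println("found: " + value);
        }
        IOUtils.closeStream(reader);
    }
}
```

This is the operation where MapFile pays off over SequenceFile: a plain SequenceFile would have to be scanned from the beginning to find one key.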
4. Summary
That wraps up HDFS for now. I'm not sure whether I'll still be able to blog this efficiently going forward; my advisor has sent me off to learn k8s...