https://blog.csdn.net/zhangshk_/article/details/83690790

HDFS Operations
1. Shell Commands

The table below lists the commonly used hdfs dfs options:

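Option	Usage	Description
-ls	-ls <path>	List the directory at the given path
-lsr	-lsr <path>	List the directory tree recursively (deprecated; use -ls -R)
-du	-du <path>	Show the size of each file in the directory
-dus	-dus <path>	Show the total size of the directory (deprecated; use -du -s)
-count	-count [-q] <path>	Count directories, files and bytes
-mv	-mv <src path> <dst path>	Move
-cp	-cp <src path> <dst path>	Copy
-rm	-rm [-skipTrash] <path>	Delete a file or empty directory
-rmr	-rmr [-skipTrash] <path>	Delete recursively (deprecated; use -rm -r)
-put	-put <one or more Linux files> <HDFS path>	Upload files
-copyFromLocal	-copyFromLocal <one or more Linux files> <HDFS path>	Copy from the local file system
-moveFromLocal	-moveFromLocal <one or more Linux files> <HDFS path>	Move from the local file system (the local copy is deleted)
-getmerge	-getmerge <src path> <Linux path>	Merge the files under src into one local file
-cat	-cat <HDFS path>	Print file contents
-text	-text <HDFS path>	Print file contents as text (also decodes compressed files and SequenceFiles)
-copyToLocal	-copyToLocal [-ignoreCrc] [-crc] [HDFS src path] [Linux dst path]	Copy to the local file system
-moveToLocal	-moveToLocal [-crc] <HDFS src path> <Linux dst path>	Move to the local file system (recent Hadoop releases only print "Not implemented yet")
-mkdir	-mkdir <HDFS path>	Create an empty directory
-setrep	-setrep [-R] [-w] <replicas> <path>	Change the replication factor
-touchz	-touchz <file path>	Create an empty file
-stat	-stat [format] <path>	Show file statistics
-tail	-tail [-f] <file>	Show the last kilobyte of a file (-f keeps following new data)
-chmod	-chmod [-R] <mode> [path]	Change permissions
-chown	-chown [-R] [OWNER][:[GROUP]] <path>	Change owner
-chgrp	-chgrp [-R] <GROUP> <path>	Change group
-help	-help [option]	Help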

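Note: "path" in the table can mean either an HDFS path or a Linux path. Where it could be ambiguous, the table says "Linux path" or "HDFS path" explicitly; otherwise it means an HDFS path.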
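The -getmerge row is the easiest one to misread: it concatenates the files under a source directory into a single local file, which is handy for collecting a job's part-r-* outputs. The sketch below reproduces that idea on the local disk with plain java.nio.file so it runs without a cluster. The class LocalGetMerge is hypothetical, and concatenating in file-name order is an assumption of this sketch, not a documented guarantee of -getmerge.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-disk sketch of what "hdfs dfs -getmerge <src> <linux path>" does
class LocalGetMerge {
    // Concatenate every regular file directly under srcDir into dst, in file-name order (assumed)
    static void merge(Path srcDir, Path dst) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(srcDir)) {
            parts = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the merged file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("result");
        Files.writeString(dir.resolve("part-r-00000"), "hello\t3\n");
        Files.writeString(dir.resolve("part-r-00001"), "world\t1\n");
        Path merged = Files.createTempFile("merged", ".txt");
        merge(dir, merged);
        System.out.print(Files.readString(merged)); // the two lines, in part-file order
    }
}
```

On a cluster, the command form from the table would be something like hdfs dfs -getmerge /test ./merged.txt.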

Below are examples of the most commonly used options.

-ls: list the contents of an HDFS directory.

hdfs dfs -ls /

-mkdir: create a directory in HDFS.

hdfs dfs -mkdir /test

-put: upload a local file to the given HDFS directory; info is the local file path.

hdfs dfs -put info /test

-get: download a file from HDFS to the local file system.

hdfs dfs -get /test/info ./

-rm: delete a file from HDFS, then check the directory again.

hdfs dfs -rm /test/info

hdfs dfs -ls /test

-moveFromLocal: move (cut) a local file into HDFS; info is the local file path.

hdfs dfs -moveFromLocal info /test

-cat: print a file's contents.

hdfs dfs -cat /test/info

-appendToFile: append data to the end of an HDFS file; info is the local file whose contents are appended.

hdfs dfs -appendToFile info /test/info

hdfs dfs -cat /test/info

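-chmod: change a file's permissions, for example making /test/info readable and writable by everyone, then check the result with -ls.

hdfs dfs -chmod 666 /test/info

hdfs dfs -ls /test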

-cp: copy files within HDFS.

Copy /test/info into /tmp; /tmp must already exist in HDFS:

hdfs dfs -cp /test/info /tmp/

hdfs dfs -ls /tmp

-mv: move files within HDFS.

Move /test/info into /user; /user must be created first:

hdfs dfs -mv /test/info /user/

hdfs dfs -ls /test

hdfs dfs -ls /user

-df: show the capacity and free space of the file system (-h prints human-readable sizes).

hdfs dfs -df -h /

-du: show the size of each file and subdirectory under a directory.

hdfs dfs -du /user

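-count: count the directories, files and bytes under a path. The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME.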

hdfs dfs -count /user
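To make the four -count columns concrete, here is a local-disk sketch of the same bookkeeping using java.nio.file, so it needs no cluster. As with HDFS, the directory passed in is itself counted in DIR_COUNT. The class LocalCount is hypothetical and only approximates the HDFS command on an ordinary file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local equivalent of the DIR_COUNT / FILE_COUNT / CONTENT_SIZE columns of -count
class LocalCount {
    // Returns {dirCount, fileCount, contentSize}; the root itself counts as a directory
    static long[] count(Path root) throws IOException {
        long dirs = 0, files = 0, bytes = 0;
        List<Path> all;
        try (Stream<Path> walk = Files.walk(root)) {
            all = walk.collect(Collectors.toList());
        }
        for (Path p : all) {
            if (Files.isDirectory(p)) {
                dirs++;
            } else if (Files.isRegularFile(p)) {
                files++;
                bytes += Files.size(p);
            }
        }
        return new long[] {dirs, files, bytes};
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        long[] c = count(root);
        // Same column order as hdfs dfs -count: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
        System.out.printf("%d %d %d %s%n", c[0], c[1], c[2], root);
    }
}
```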

2. Java API Programming

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class Test {

    private FileSystem fs;
    private URI uri;
    Configuration cf;
    //private static String rec="hdfs://localhost:9000/test";
    private static String rec = "hdfs://localhost:9000/";

    // Connect to the HDFS instance identified by the given URI
    Test(String resource) {
        cf = new Configuration();
        try {
            uri = new URI(resource);
            fs = FileSystem.newInstance(uri, cf);
        } catch (URISyntaxException | IOException e) {
            e.printStackTrace();
        }
    }

    // Create a directory (and any missing parent directories)
    public void createDir(String src) {
        try {
            fs.mkdirs(new Path(src));
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // Print the contents of the HDFS file <rec>/test
    public void readFile() {
        try {
            InputStream input = fs.open(new Path(rec + "/test"));
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            // copyBytes(in, out, conf) closes both streams when it finishes
            IOUtils.copyBytes(input, output, cf);
            System.out.print(output.toString());
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // List the entries directly under an HDFS directory
    public void listAll(String src) {
        try {
            FileStatus[] status = fs.listStatus(new Path(src));
            for (FileStatus s : status) {
                System.out.println(s.getPath().toString());
            }
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // Upload a local file or directory to HDFS
    public void copyFromLocalDir(String src, String dst) {
        try {
            fs.copyFromLocalFile(new Path(src), new Path(dst));
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // Download one HDFS file or directory to the local file system
    public void copyToLocalDir(String src, String dst) {
        try {
            if (fs.exists(new Path(src))) {
                fs.copyToLocalFile(new Path(src), new Path(dst));
            } else {
                System.out.println("File does not exist!");
            }
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // Download every entry directly under an HDFS directory
    public void copyAllToLocalDir(String src, String dst) {
        try {
            if (fs.exists(new Path(src))) {
                for (FileStatus s : fs.listStatus(new Path(src))) {
                    fs.copyToLocalFile(s.getPath(), new Path(dst));
                }
            } else {
                System.out.println("File does not exist!");
            }
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    // Delete an HDFS file or directory (recursively)
    public void deleteFile(String src) {
        try {
            Path path = new Path(src);
            if (fs.exists(path)) {
                fs.delete(path, true); // the one-argument delete(Path) is deprecated
            } else {
                System.out.println("File does not exist!");
            }
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        Test ts = new Test(rec);
        System.out.println("All Files");
        ts.listAll(rec);
        ts.copyFromLocalDir("/home/wangzun/文檔/hadoopts.pdf", rec + "/test");
        ts.listAll(rec + "/test");
        ts.copyAllToLocalDir(rec + "/test", "/home/wangzun/hadooptest/");
        //ts.deleteFile(rec + "/test/test");
        //ts.listAll(rec + "/test");
        //ts.copyToLocalDir(rec + "/test/大數據技術原理與應用.pdf", "/home/wangzun/hadooptest/"); // adjust this path
        //ts.readFile();
        // newInstance() returns a private FileSystem instance, so close it explicitly
        if (ts.fs != null) {
            ts.fs.close();
        }
    }
}
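main builds HDFS paths by string concatenation, and because rec already ends with "/", rec + "/test" yields hdfs://localhost:9000//test; correctness then depends on the double slash being tidied up later. A sketch of joining the pieces with java.net.URI instead (an alternative, not the post's approach). The helper join is hypothetical and assumes the child path contains no characters that are illegal in a URI, such as spaces.

```java
import java.net.URI;

// Hypothetical helper: join a base HDFS URI and a child path without doubling the '/'
class HdfsUriJoin {
    static String join(String base, String child) {
        // A trailing '/' on the base makes resolve() keep the base's last path segment
        String b = base.endsWith("/") ? base : base + "/";
        // Drop a leading '/' so the child resolves relative to the base, not to the root
        String c = child.startsWith("/") ? child.substring(1) : child;
        return URI.create(b).resolve(c).toString();
    }

    public static void main(String[] args) {
        String rec = "hdfs://localhost:9000/";
        System.out.println(rec + "/test");               // hdfs://localhost:9000//test
        System.out.println(join(rec, "test"));           // hdfs://localhost:9000/test
        System.out.println(join(rec + "test", "a.pdf")); // hdfs://localhost:9000/test/a.pdf
    }
}
```

Inside Hadoop code itself, the Path(Path parent, String child) constructor, as in new Path(new Path(rec), "test"), serves the same purpose.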

HBase Operations
1. Shell Commands

In the examples below, the text after the "hbase(main):002:0>" prompt is the command you type.

Table structure operations

Create a table

Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
Create a table named User with a single column family, info:

hbase(main):002:0> create 'User','info'

List all tables

hbase(main):003:0> list

Show a table's details

hbase(main):004:0> describe 'User'

or

hbase(main):025:0> desc 'User'

Alter a table

Delete a column family

hbase(main):002:0> alter 'User', 'delete' => 'info'

Table data operations

Insert data

Syntax: put <table>,<rowkey>,<family:column>,<value>

hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'

hbase(main):006:0> put 'User', 'row2', 'info:age', '18'

hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'

Get a record by row key

Syntax: get <table>,<rowkey>,[<family:column>,....]

hbase(main):008:0> get 'User', 'row2'

hbase(main):028:0> get 'User', 'row3', 'info:sex'

hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}

Scan records

Syntax: scan <table>, {COLUMNS => [ <family:column>,...], LIMIT => num}
Scan all records

hbase(main):009:0> scan 'User'

Scan the first 2 rows

hbase(main):037:0> scan 'User', {LIMIT => 2}

Range queries

hbase(main):011:0> scan 'User', {STARTROW => 'row2'}

hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}

hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}

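TIMERANGE, FILTER and other advanced options can also be added.

STARTROW and ENDROW must be uppercase, otherwise the shell reports an error. A range includes the STARTROW row but not the ENDROW row.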
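HBase keeps rows sorted by row key, which is what makes STARTROW/ENDROW ranges and LIMIT cheap. The sketch below models that with a TreeMap over the three rows inserted above. It assumes ASCII row keys, for which Java String ordering matches HBase's byte-by-byte ordering, and it ignores special cases such as STARTROW equal to ENDROW; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of scan 'User', {STARTROW => ..., ENDROW => ..., LIMIT => ...}
class ScanRangeSketch {
    // startRow/endRow may be null (unbounded); limit <= 0 means no limit
    static List<String> scan(NavigableMap<String, String> table, String startRow, String endRow, int limit) {
        List<String> keys = new ArrayList<>();
        if (startRow != null && endRow != null && startRow.compareTo(endRow) > 0) {
            return keys; // empty range
        }
        NavigableMap<String, String> range = table;
        if (startRow != null) {
            range = range.tailMap(startRow, true); // STARTROW is inclusive
        }
        if (endRow != null) {
            range = range.headMap(endRow, false);  // ENDROW is exclusive
        }
        for (String key : range.keySet()) {
            if (limit > 0 && keys.size() == limit) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> user = new TreeMap<>();
        user.put("row1", "info:name=xiaoming");
        user.put("row2", "info:age=18");
        user.put("row3", "info:sex=man");
        System.out.println(scan(user, null, null, 2));     // [row1, row2]
        System.out.println(scan(user, "row2", null, 0));   // [row2, row3]
        System.out.println(scan(user, "row2", "row3", 0)); // [row2]
    }
}
```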

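Count rows

Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}

INTERVAL: print the running count and the current row key every INTERVAL rows (default 1000). CACHE: the number of rows fetched per request (default 10); raising it can speed up the count.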

hbase(main):020:0> count 'User'

Delete

Delete a single cell (one column of one row)

hbase(main):008:0> delete 'User', 'row1', 'info:age'

Delete an entire row

hbase(main):014:0> deleteall 'User', 'row2'

Delete all data in the table

hbase(main):016:0> truncate 'User'
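The three delete commands work at different granularity: delete removes one cell (row plus family:column), deleteall removes every cell of a row, and truncate empties the whole table (the shell does this by disabling, dropping and re-creating it). The toy nested-map model below shows the difference. It is hypothetical and ignores the timestamps, versions and delete markers HBase actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: row key -> (family:qualifier -> value), both sorted like HBase
class DeleteSketch {
    final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // delete 'User', row, column: remove one cell; a row with no cells left no longer exists
    void delete(String row, String column) {
        Map<String, String> cells = rows.get(row);
        if (cells == null) {
            return; // nothing to delete
        }
        cells.remove(column);
        if (cells.isEmpty()) {
            rows.remove(row);
        }
    }

    // deleteall 'User', row: remove every cell in the row
    void deleteAll(String row) {
        rows.remove(row);
    }

    // truncate 'User': remove all data but keep the table
    void truncate() {
        rows.clear();
    }

    public static void main(String[] args) {
        DeleteSketch user = new DeleteSketch();
        user.put("row1", "info:name", "xiaoming");
        user.put("row2", "info:age", "18");
        user.put("row3", "info:sex", "man");
        user.delete("row1", "info:age"); // row1 has no info:age, so nothing changes
        user.deleteAll("row2");
        System.out.println(user.rows);    // {row1={info:name=xiaoming}, row3={info:sex=man}}
        user.truncate();
        System.out.println(user.rows);    // {}
    }
}
```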

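Table management

Disable a table (describe still works on a disabled table, but reads such as scan fail until it is enabled again)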

hbase(main):014:0> disable 'User'

hbase(main):015:0> describe 'User'

hbase(main):016:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}

Enable a table

hbase(main):017:0> enable 'User'

hbase(main):018:0> describe 'User'

hbase(main):019:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}

Check whether a table exists (table names are case-sensitive and must be quoted)

hbase(main):022:0> exists 'User'

hbase(main):023:0> exists 'user'

hbase(main):024:0> exists user

This fails with: NameError: undefined local variable or method `user' for main (without quotes, the JRuby-based shell treats user as a variable name)

Drop a table

A table must be disabled before it can be dropped:

hbase(main):030:0> drop 'User'

This fails with: ERROR: Table TEST.USER is enabled. Disable it first.

hbase(main):031:0> disable 'User'

hbase(main):033:0> drop 'User'

Check the result

hbase(main):034:0> list

2. Java API Programming

Add the jars in the lib folder of the HBase installation to the project.

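import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseConn {

    static Configuration conf = null;
    static Connection conn = null;

    // Create one shared connection when the class is loaded
    static {
        conf = HBaseConfiguration.create();
        // The client locates the cluster through ZooKeeper
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.master", "127.0.0.1:60000");
        try {
            conn = ConnectionFactory.createConnection(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }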

    /**
     * Create a table
     * @throws Exception
     */
    public static void testCreateTable() throws Exception {
        // Admin handles table-level (DDL) operations
        try (Admin admin = conn.getAdmin()) {
            // Table descriptor
            TableName tableName = TableName.valueOf("user2");
            HTableDescriptor descriptor = new HTableDescriptor(tableName);
            // Column family descriptors, added to the table
            HColumnDescriptor info1 = new HColumnDescriptor("info1");
            descriptor.addFamily(info1);
            HColumnDescriptor info2 = new HColumnDescriptor("info2");
            descriptor.addFamily(info2);
            // Create the table
            admin.createTable(descriptor);
        }
    }

    // Drop a table; it must be disabled first
    public static void testDeleteTable() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Admin admin = conn.getAdmin()) {
            admin.disableTable(t1);
            admin.deleteTable(t1);
        }
    }

    /**
     * Insert a single row (a put to an existing cell also updates it)
     * @throws Exception
     */
    public static void testPut() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            // rowkey
            Put put = new Put(Bytes.toBytes("1234"));
            // column family, column, value
            put.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("gender"), Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("info2"), Bytes.toBytes("name"), Bytes.toBytes("wangwu"));
            table.put(put);
        }
    }

    /**
     * Insert multiple rows in one call using a list of Puts
     * @throws Exception
     */
    public static void testPut2() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            List<Put> putList = new ArrayList<>();
            for (int i = 20; i <= 30; i++) {
                // rowkey
                Put put = new Put(Bytes.toBytes("jbm_" + i));
                // column family, column, value
                put.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("age"),
                        Bytes.toBytes(Integer.toString(i)));
                put.addColumn(Bytes.toBytes("info1"), Bytes.toBytes("name"),
                        Bytes.toBytes("lucy" + i));
                putList.add(put);
            }
            table.put(putList);
        }
    }

    /**
     * Update data: a put to an existing cell writes a newer version, which reads return
     * @throws Exception
     */
    public static void testUpdate() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Put put = new Put(Bytes.toBytes("1234"));
            put.addColumn(Bytes.toBytes("info2"), Bytes.toBytes("name"), Bytes.toBytes("tom"));
            table.put(put);
        }
    }

    /**
     * Delete data: remove the whole row 1234
     * @throws Exception
     */
    public static void testDeleteData() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Delete delete = new Delete(Bytes.toBytes("1234"));
            table.delete(delete);
        }
    }

    /**
     * Single-row query
     * @throws Exception
     */
    public static void testGetSingle() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            // rowkey
            Get get = new Get(Bytes.toBytes("jbm_20"));
            Result result = table.get(get);
            // column family, column name
            byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
            byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
            System.out.println(Bytes.toString(name));
            System.out.println(Bytes.toString(age));
        }
    }

    /**
     * Multi-row query: scan a range of row keys
     * @throws Exception
     */
    public static void testGetMany() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Scan scan = new Scan();
            // Rows are sorted lexicographically by rowkey, so start/stop rows work like paging.
            // The start row is inclusive, the stop row exclusive (jbm_30 is not returned).
            // In HBase 2.x these setters are deprecated in favor of withStartRow/withStopRow.
            scan.setStartRow(Bytes.toBytes("jbm_20"));
            scan.setStopRow(Bytes.toBytes("jbm_30"));
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result result : resultScanner) {
                    // Each Result holds the cells of one rowkey
                    // column family, column name
                    byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
                    byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
                    System.out.println(Bytes.toString(name) + "," + Bytes.toString(age));
                }
            }
        }
    }

    /**
     * Full-table scan with a column value filter
     * @throws Exception
     */
    public static void testFilter() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Scan scan = new Scan();
            // Column value filter: keep rows whose info1:name equals "lucy25"
            // (CompareOp is deprecated in HBase 2.x in favor of CompareOperator)
            SingleColumnValueFilter columnValueFilter = new SingleColumnValueFilter(
                    Bytes.toBytes("info1"), Bytes.toBytes("name"), CompareOp.EQUAL,
                    Bytes.toBytes("lucy25"));
            // By default, rows without info1:name would pass the filter; skip them
            columnValueFilter.setFilterIfMissing(true);
            // Set the filter
            scan.setFilter(columnValueFilter);
            // Get the result set
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result result : resultScanner) {
                    byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
                    byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
                    System.out.println(Bytes.toString(name) + "," + Bytes.toString(age));
                }
            }
        }
    }

    /**
     * Full-table scan with a rowkey filter
     * @throws Exception
     */
    public static void testRowkeyFilter() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Scan scan = new Scan();
            // Rowkey filter: match row keys starting with "jbm"
            RowFilter filter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^jbm"));
            // Set the filter
            scan.setFilter(filter);
            // Get the result set
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result result : resultScanner) {
                    byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
                    byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
                    System.out.println(Bytes.toString(name) + "," + Bytes.toString(age));
                }
            }
        }
    }

    /**
     * Full-table scan with a column-name prefix filter
     * @throws Exception
     */
    public static void testColumnPrefixFilter() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Scan scan = new Scan();
            // Keep only columns whose qualifier starts with "na" (the column name, not the value),
            // so info1:age is filtered out and age below is always null
            ColumnPrefixFilter filter = new ColumnPrefixFilter(Bytes.toBytes("na"));
            // Set the filter
            scan.setFilter(filter);
            // Get the result set
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result result : resultScanner) {
                    byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
                    byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
                    if (name != null) {
                        System.out.print(Bytes.toString(name) + " ");
                    }
                    if (age != null) {
                        // Convert to text; printing the byte[] itself would show an object id
                        System.out.print(Bytes.toString(age));
                    }
                    System.out.println();
                }
            }
        }
    }

    /**
     * Full-table scan with a filter list
     * @throws Exception
     */
    public static void testFilterList() throws Exception {
        TableName t1 = TableName.valueOf("user2");
        try (Table table = conn.getTable(t1)) {
            Scan scan = new Scan();
            // Filter list: MUST_PASS_ALL (and), MUST_PASS_ONE (or)
            FilterList filterList = new FilterList(Operator.MUST_PASS_ALL);
            // Rowkey filter
            RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator("^jbm"));
            // Column value filter: age greater than "25" (compared as bytes, not as numbers)
            SingleColumnValueFilter columnValueFilter = new SingleColumnValueFilter(
                    Bytes.toBytes("info1"), Bytes.toBytes("age"),
                    CompareOp.GREATER, Bytes.toBytes("25"));
            // Skip rows that have no info1:age at all
            columnValueFilter.setFilterIfMissing(true);
            filterList.addFilter(columnValueFilter);
            filterList.addFilter(rowFilter);
            // Set the filter
            scan.setFilter(filterList);
            // Get the result set
            try (ResultScanner resultScanner = table.getScanner(scan)) {
                for (Result result : resultScanner) {
                    byte[] name = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("name"));
                    byte[] age = result.getValue(Bytes.toBytes("info1"), Bytes.toBytes("age"));
                    if (name != null) {
                        System.out.print(Bytes.toString(name) + " ");
                    }
                    if (age != null) {
                        System.out.print(Bytes.toString(age) + " ");
                    }
                    System.out.println();
                }
            }
        }
    }

    public static void main(String[] arg) {
        System.setProperty("hadoop.home.dir", "/home/slave3/hadoop/hadoop-3.1.1");
        try {
            // Uncomment the operations to run
            // testCreateTable();
            // testDeleteTable();
            // testPut();
            // testPut2();
            // testUpdate();
            // testDeleteData();
            // testGetSingle();
            // testGetMany();
            // testFilter();
            // testRowkeyFilter();
            // testColumnPrefixFilter();
            // testFilterList();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
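Both SingleColumnValueFilter examples compare raw bytes: passing Bytes.toBytes("25") makes the filter compare values lexicographically, byte by byte, not numerically. That happens to work for the ages 20 to 30 written by testPut2 because they all have two digits, but it would let "9" through and reject "100". The pure-Java sketch below writes out the unsigned byte-wise comparison by hand, mirroring the ordering HBase uses for byte arrays, so it runs without HBase; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of unsigned, lexicographic byte comparison (the ordering HBase uses for byte[])
class ByteOrderSketch {
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Would a cell holding value pass CompareOp.GREATER against threshold?
    static boolean greater(String value, String threshold) {
        return compare(value.getBytes(StandardCharsets.UTF_8),
                threshold.getBytes(StandardCharsets.UTF_8)) > 0;
    }

    public static void main(String[] args) {
        System.out.println(greater("30", "25"));  // true
        System.out.println(greater("9", "25"));   // true: '9' sorts after '2'
        System.out.println(greater("100", "25")); // false: '1' sorts before '2'
    }
}
```

If numeric comparison matters, store values whose byte order matches numeric order, for example fixed-width zero-padded strings, or Bytes.toBytes(int) (4-byte big-endian) for non-negative integers.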

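MapReduce Operations

Add all the jars under the share folder of the Hadoop installation to the project.

1. WordCount

Count how many times each word appears in the input files.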

MapTask.java

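import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * The four type parameters:
 * KEYIN    type of the input key: the byte offset of the line
 * VALUEIN  type of the input value: one line of text
 * KEYOUT   type of the output key
 * VALUEOUT type of the output value
 *
 * Serialization:
 * Java serialization stores the full class name and type information with every
 * object, which is inefficient, so Hadoop uses its own Writable types:
 * Long -> LongWritable, Integer -> IntWritable, String -> Text,
 * float -> FloatWritable, double -> DoubleWritable, null -> NullWritable
 *
 * Map phase: split each line into words and emit (word, 1) for each one.
 *
 * @author hasee
 */
public class MapTask extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split on runs of whitespace and skip empty tokens
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}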

ReduceTask.java

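import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Reduce phase: receives each word with all of its counts, e.g. hello ---> {1 1 1 1 1},
 * and writes their sum.
 *
 * @author hasee
 */
public class ReduceTask extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}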

Mycombiner.java

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Combiner: pre-aggregates each map task's output locally, so less data has to be
 * written and shuffled to the reducers.
 *
 * @author hasee
 */
public class Mycombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
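A combiner runs on each map task's output before the shuffle, and Hadoop may apply it zero, one or several times, so the reducer must produce the same answer either way. Summing the incoming values, as ReduceTask and Mycombiner do, satisfies this; merely counting them does not. The plain-Java sketch below replays one word through that path; the two map-task inputs are made up for illustration and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical replay of one key ("hello") through map -> combine -> reduce, without Hadoop
class CombinerSketch {
    // Sums the values, like ReduceTask and Mycombiner
    static int sumReduce(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Counts the values instead of summing them
    static int countReduce(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Map task A emitted ("hello", 1) three times, map task B twice
        List<Integer> fromA = Arrays.asList(1, 1, 1);
        List<Integer> fromB = Arrays.asList(1, 1);
        // The combiner collapses each map task's values into one partial sum: [3, 2]
        List<Integer> combined = Arrays.asList(sumReduce(fromA), sumReduce(fromB));
        System.out.println(sumReduce(combined));   // 5, same as without a combiner
        System.out.println(countReduce(combined)); // 2, wrong once a combiner has run
    }
}
```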

Driver.java

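import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Test in local mode with a small data set first; once it works, switch to
 * cluster mode and submit the job to the cluster.
 */
public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // System.setProperty("HADOOP_USER_NAME", "hasee");
        // fs.defaultFS defaults to file:/// (the local file system) and
        // mapreduce.framework.name defaults to local, so this runs locally
        Job job = Job.getInstance(conf);
        job.setMapperClass(MapTask.class);
        job.setReducerClass(ReduceTask.class);
        job.setJarByClass(Driver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the combiner
        job.setCombinerClass(Mycombiner.class);
        // Input and output directories
        FileInputFormat.setInputPaths(job, new Path("/home/slave3/Downloads/test"));
        FileOutputFormat.setOutputPath(job, new Path("/home/slave3/Downloads/result"));
        // The job fails if the output directory already exists, so delete it first
        File output = new File("/home/slave3/Downloads/result");
        if (output.exists()) {
            FileUtils.deleteDirectory(output);
        }
        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}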
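FileOutputFormat refuses to start a job whose output directory already exists, so the output directory has to be removed before each rerun; the code in this post uses Apache Commons IO's FileUtils.deleteDirectory for that. If that library is not on the classpath, a JDK-only equivalent for a local directory can be sketched as below (the class name is hypothetical). When the output lives on HDFS in cluster mode, FileSystem.delete(path, true) plays the same role.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical JDK-only replacement for FileUtils.deleteDirectory on a local output directory
class DeleteOutputDir {
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent, so each directory is empty when deleted
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory shaped like a job output
        Path output = Files.createTempDirectory("result");
        Files.writeString(output.resolve("part-r-00000"), "hello\t3\n");
        Files.writeString(output.resolve("_SUCCESS"), "");
        deleteRecursively(output);
        System.out.println(Files.exists(output)); // false
    }
}
```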

2. Counting how often each word appears in each of several files

Step 1: count each word per file, producing lines such as hello-a.txt 3.

CreateIndexOne.java

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Counts how many times each word occurs in each of several input files.
 */
public class CreateIndexOne {

    public static class MapTask extends Mapper<LongWritable, Text, Text, IntWritable> {

        String pathname = null;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Get the name of the file this input split comes from
            FileSplit fileSplit = (FileSplit) context.getInputSplit();
            pathname = fileSplit.getPath().getName();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word-fileName, 1) for each word
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(new Text(word + "-" + pathname), new IntWritable(1));
                }
            }
        }
    }

    public static class ReduceTask extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                // Sum rather than count, so the result stays correct if a combiner is added
                count += value.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // System.setProperty("HADOOP_USER_NAME", "hasee");
        // fs.defaultFS defaults to file:/// (the local file system) and
        // mapreduce.framework.name defaults to local, so this runs locally
        Job job = Job.getInstance(conf);
        job.setMapperClass(MapTask.class);
        job.setReducerClass(ReduceTask.class);
        job.setJarByClass(CreateIndexOne.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output directories
        // Create several .txt files in this directory and put some words in each
        FileInputFormat.setInputPaths(job, new Path("/home/slave3/Downloads/test2"));
        FileOutputFormat.setOutputPath(job, new Path("/home/slave3/Downloads/result2"));
        // The job fails if the output directory already exists, so delete it first
        File file = new File("/home/slave3/Downloads/result2");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
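What step 1 computes can be replayed in memory without Hadoop: the mapper's key is word-fileName, and because the shuffle sorts keys, the output comes out ordered by that key. In the sketch below a TreeMap plays the part of shuffle plus reduce; the file names and contents are made-up test data, whitespace splitting mirrors the mapper, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory replay of CreateIndexOne: (word-fileName) -> occurrences
class IndexStepOne {
    static TreeMap<String, Integer> run(Map<String, String> files) {
        TreeMap<String, Integer> counts = new TreeMap<>(); // sorted by key, like the shuffle
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : file.getValue().split("\\s+")) {
                if (!word.isEmpty()) {
                    // map emits (word-fileName, 1); merge() sums them like the reducer
                    counts.merge(word + "-" + file.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "hello world hello");
        files.put("b.txt", "hello");
        // One tab-separated line per key, like the default text output
        run(files).forEach((key, count) -> System.out.println(key + "\t" + count));
        // hello-a.txt	2
        // hello-b.txt	1
        // world-a.txt	1
    }
}
```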

Step 2: take the output of step 1 and combine each word's per-file counts into one line, e.g. hello a.txt 3,b.txt 2.

CreateIndexTwo.java

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CreateIndexTwo {

    public static class MapTask extends Mapper<LongWritable, Text, Text, Text> {

        Text outKey = new Text();
        Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Step-1 line: "word-fileName<TAB>count"; assumes words contain no '-'
            String[] split = value.toString().split("-", 2);
            if (split.length < 2) {
                return; // skip malformed lines
            }
            String word = split[0];
            String nameNum = split[1]; // "fileName<TAB>count"
            outKey.set(word);
            outValue.set(nameNum);
            context.write(outKey, outValue);
        }
    }

    public static class ReduceTask extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Join all "fileName<TAB>count" entries for this word with commas
            StringBuilder builder = new StringBuilder();
            boolean flag = true;
            for (Text text : values) {
                if (flag) {
                    builder.append(text.toString());
                    flag = false;
                } else {
                    builder.append(",");
                    builder.append(text.toString());
                }
            }
            // Write once per word, after all of its values have been joined
            context.write(key, new Text(builder.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS defaults to file:/// (the local file system) and
        // mapreduce.framework.name defaults to local, so this runs locally
        Job job = Job.getInstance(conf);
        job.setMapperClass(MapTask.class);
        job.setReducerClass(ReduceTask.class);
        job.setJarByClass(CreateIndexTwo.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input (the output of step 1) and output directories
        FileInputFormat.setInputPaths(job, new Path("/home/slave3/Downloads/result2"));
        FileOutputFormat.setOutputPath(job, new Path("/home/slave3/Downloads/result3"));
        // The job fails if the output directory already exists, so delete it first
        File file = new File("/home/slave3/Downloads/result3");
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
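Step 2 regroups step 1's lines by word. The sketch below replays that regrouping in memory. It splits each line at the first '-', so it shares the assumption that words contain no '-', and it keeps file entries in input order, whereas a real reducer receives its values in no guaranteed order. Class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory replay of CreateIndexTwo
class IndexStepTwo {
    // Input lines look like "hello-a.txt\t2"; output maps word -> "a.txt\t2,b.txt\t1"
    static Map<String, String> run(List<String> stepOneLines) {
        Map<String, StringBuilder> grouped = new TreeMap<>(); // sorted by word, like the shuffle
        for (String line : stepOneLines) {
            String[] parts = line.split("-", 2); // [word, "fileName\tcount"]
            if (parts.length < 2) {
                continue; // skip malformed lines
            }
            StringBuilder sb = grouped.computeIfAbsent(parts[0], w -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(parts[1]);
        }
        Map<String, String> index = new TreeMap<>();
        grouped.forEach((word, sb) -> index.put(word, sb.toString()));
        return index;
    }

    public static void main(String[] args) {
        List<String> stepOne = List.of("hello-a.txt\t2", "hello-b.txt\t1", "world-a.txt\t1");
        run(stepOne).forEach((word, files) -> System.out.println(word + "\t" + files));
        // hello	a.txt	2,b.txt	1
        // world	a.txt	1
    }
}
```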
