Reading and Writing a MySQL Database with Hadoop

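When Hadoop first appeared, relational-database researchers questioned and criticized it, arguing that MapReduce lacked the structured data storage and processing capabilities of a relational database. In response, the Hadoop community and researchers put in a great deal of work, and since version 0.19 Hadoop has supported MapReduce access to relational databases such as MySQL, PostgreSQL, and Oracle. Hadoop reads from a relational database mainly through the DBInputFormat class, which lives in the package org.apache.hadoop.mapred.lib.db (old API); the Job-based examples in this lesson use its counterpart in org.apache.hadoop.mapreduce.lib.db.

In this lesson we use MySQL as the example for learning how MapReduce reads and writes database data.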

[Reading data]

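DBInputFormat lets a Hadoop application talk to a database through the JDBC interface supplied by the database vendor, and it reads records with standard SQL. Before using DBInputFormat, keep two things in mind.

First, the JDBC driver must be copied to the $HADOOP_HOME/lib/ directory on every node of the cluster before DBInputFormat is used.

Second, when MapReduce accesses a relational database, the MapReduce program queries and reads from it heavily and frequently, which greatly increases the load on the database. DBInputFormat is therefore suited only to reading small amounts of data, not to processing a data warehouse.

Tip: to process a data warehouse, use the database's dump tool to export the data to be analysed as text, upload it to HDFS, and process it there.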

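Now let's look at the internal structure of DBInputFormat. The class contains the following three nested classes.

1. protected class DBRecordReader implements RecordReader<LongWritable, T>: reads tuples from a database table one record at a time.

2. public static class NullDBWritable implements DBWritable, Writable: mainly serves as a placeholder implementation of the DBWritable interface. DBWritable requires two methods, write and readFields. As the names suggest, one writes a record and the other reads all of its fields. Their prototypes are:

public void write(PreparedStatement statement) throws SQLException;

public void readFields(ResultSet resultSet) throws SQLException;

3. protected static class DBInputSplit implements InputSplit: describes the range of input tuples through two properties, start and end. start is the index of the first record and end is the index of the last record.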
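The start and end properties are easier to picture with numbers. The following is a minimal sketch in plain Java, not Hadoop source code: SplitSketch, computeSplits, and pagedQuery are hypothetical names, end is treated as an exclusive bound, and the LIMIT/OFFSET form assumes MySQL-style paging. It shows how a row count might be divided into ranges and how each range could become one paged query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a Hadoop class: cuts a table into row ranges. */
final class SplitSketch {

    /** Returns {start, end} pairs; start is inclusive, end is exclusive (assumption). */
    static List<long[]> computeSplits(long rowCount, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        long chunkSize = rowCount / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // The last split absorbs the remainder so that no row is dropped.
            long end = (i == numSplits - 1) ? rowCount : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    /** Builds the paged query that one map task would run for its range. */
    static String pagedQuery(String baseQuery, long start, long end) {
        return baseQuery + " LIMIT " + (end - start) + " OFFSET " + start;
    }

    public static void main(String[] args) {
        for (long[] split : computeSplits(10, 3)) {
            System.out.println(pagedQuery("SELECT uid, email, name FROM user", split[0], split[1]));
        }
    }
}
```

With 10 rows and 3 splits the ranges are [0, 3), [3, 6), and [6, 10). Every range becomes a separate query against the same database, which is one way to see why this approach puts noticeable load on the server as the table grows.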

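The steps for reading database records with DBInputFormat are as follows.

Step 1: configure the JDBC driver, the data source, and the user name and password for the database. The method is:

DBConfiguration.configureDB(Configuration conf, String driverClass, String dbUrl, String userName, String passwd)

For MySQL the JDBC driver class is "com.mysql.jdbc.Driver" and the data source is, for example, "jdbc:mysql://localhost/testDB", where testDB is the database to access. userName is usually "root", and passwd is the password of your database.

Step 2: point the job at a MySQL table with setInput, whose parameters are:

DBInputFormat.setInput(Job job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)

The parameters are easy to follow: inputClass is a class that implements DBWritable, tableName is the table name, conditions is the query condition (the WHERE clause), orderBy is the sort order, and fieldNames lists the columns. Together they amount to an SQL statement split into its parts. An overload takes complete SQL statements instead, where inputQuery selects the records and inputCountQuery returns their count:

DBInputFormat.setInput(Job job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)

Step 3: write the MapReduce program, including the Mapper class, the Reducer class, and the input and output formats, then call job.waitForCompletion(true).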
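To make the "SQL statement split into parts" idea concrete, here is a small sketch in plain Java of how a table name, a condition, a sort order, and a column list could be assembled into one SELECT statement. SelectQuerySketch and buildSelect are hypothetical names used only for illustration, and the exact text that Hadoop generates internally may differ.

```java
/** Hypothetical helper, not part of Hadoop: maps the setInput pieces onto SQL. */
final class SelectQuerySketch {

    static String buildSelect(String tableName, String conditions,
                              String orderBy, String... fieldNames) {
        StringBuilder sql = new StringBuilder("SELECT ");
        // No column list means "all columns" in this sketch.
        sql.append(fieldNames.length == 0 ? "*" : String.join(", ", fieldNames));
        sql.append(" FROM ").append(tableName);
        // A null or empty condition simply leaves out the WHERE clause.
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        // A null or empty sort order leaves out the ORDER BY clause.
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSelect("user", null, null, "uid", "email", "name"));
        System.out.println(buildSelect("user", "uid > 50", "uid", "uid", "email", "name"));
    }
}
```

Passing null for conditions and orderBy, as the sample job later in this lesson does, corresponds to a plain SELECT over the listed columns with no filtering and no ordering.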

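Let's see how MapReduce reads data through a sample program. Assume the MySQL database contains a table named user with the columns "uid", "email", and "name".

Step 1: implement the DBWritable and Writable interfaces. The code is as follows: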

package com.dajiangtai.hadoop.advance;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class UserRecord implements Writable, DBWritable {

    int uid;
    String email;
    String name;

    // Hadoop creates instances by reflection, so a no-argument constructor is required.
    public UserRecord() {
    }

    // Used by the write example, which builds a record from an email and a name.
    public UserRecord(String email, String name) {
        this.email = email;
        this.name = name;
    }

    // Read the required columns from the database.
    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        this.uid = resultSet.getInt(1);
        this.email = resultSet.getString(2);
        this.name = resultSet.getString(3);
    }

    // Write the record to the database.
    @Override
    public void write(PreparedStatement statement) throws SQLException {
        statement.setInt(1, this.uid);
        statement.setString(2, this.email);
        statement.setString(3, this.name);
    }

    // Read the serialized record.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.uid = in.readInt();
        this.email = in.readUTF();
        this.name = in.readUTF();
    }

    // Serialize the record.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(uid);
        out.writeUTF(email);
        out.writeUTF(name);
    }

    @Override
    public String toString() {
        return this.uid + " " + this.email + " " + this.name;
    }
}
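The DataInput/DataOutput pair of methods is what lets Hadoop move a record between map and reduce tasks. The sketch below exercises the same field order with only the JDK, so it can be run without a cluster. PlainUserRecord and UserRecordRoundTrip are hypothetical stand-ins that mirror the record's three fields; they do not implement the Hadoop interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in used to check that write and readFields agree on field order. */
final class UserRecordRoundTrip {

    static final class PlainUserRecord {
        int uid;
        String email;
        String name;

        void write(DataOutput out) throws IOException {
            out.writeInt(uid);
            out.writeUTF(email); // writeUTF throws NullPointerException for a null string
            out.writeUTF(name);
        }

        void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order they were written.
            uid = in.readInt();
            email = in.readUTF();
            name = in.readUTF();
        }
    }

    /** Serializes the record to bytes and reads it back into a fresh instance. */
    static PlainUserRecord roundTrip(PlainUserRecord original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                original.write(out);
            }
            PlainUserRecord copy = new PlainUserRecord();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PlainUserRecord record = new PlainUserRecord();
        record.uid = 54;
        record.email = "user@example.com"; // made-up sample value
        record.name = "sample";
        PlainUserRecord copy = roundTrip(record);
        System.out.println(copy.uid + " " + copy.email + " " + copy.name);
    }
}
```

One practical consequence: because writeUTF rejects null, a nullable database column would need a guard (for example, substituting an empty string) before a record with such a column is serialized.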

Step 2: implement the Mapper and Reducer classes. They are declared static, so they are written as nested classes, presumably inside the driver class ConnMysql from Step 3.

public static class ConnMysqlMapper extends Mapper<LongWritable, UserRecord, Text, Text> {
    @Override
    public void map(LongWritable key, UserRecord value, Context context)
            throws IOException, InterruptedException {
        // Emit the columns read from MySQL: uid as the key, "name email" as the value.
        context.write(new Text(String.valueOf(value.uid)), new Text(value.name + " " + value.email));
    }
}

public static class ConnMysqlReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Write every value for this key to HDFS.
        for (Text value : values) {
            context.write(key, value);
        }
    }
}

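Step 3: implement the main function.

package com.dajiangtai.hadoop.advance;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * MapReduce job that connects to a MySQL database and reads data.
 *
 * @author 小講
 */
public class ConnMysql {

    // ConnMysqlMapper and ConnMysqlReducer from Step 2 belong here as nested classes (with their own imports).

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Output path; delete it recursively if it already exists.
        Path output = new Path("hdfs://single.hadoop.dajiangtai.com:9000/advance/mysql/out");
        FileSystem fs = FileSystem.get(URI.create(output.toString()), conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }

        // Put the MySQL JDBC driver on the task classpath.
        DistributedCache.addFileToClassPath(new Path("hdfs://single.hadoop.dajiangtai.com:9000/advance/jar/mysql-connector-java-5.1.14.jar"), conf);

        // MySQL settings. The five arguments are: the Configuration object, the JDBC driver class,
        // the database URL, the user name, and the password.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://hadoop.dajiangtai.com:3306/djtdb_www", "username", "password");

        Job job = new Job(conf, "test mysql connection"); // create the job
        job.setJarByClass(ConnMysql.class);               // main class
        job.setMapperClass(ConnMysqlMapper.class);        // Mapper
        job.setReducerClass(ConnMysqlReducer.class);      // Reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(DBInputFormat.class);     // read the input from the database
        FileOutputFormat.setOutputPath(job, output);

        // Column names.
        String[] fields = { "uid", "email", "name" };

        // The six arguments are: 1. the Job; 2. a Class<? extends DBWritable>; 3. the table name;
        // 4. the WHERE condition; 5. the ORDER BY clause; 6. the column names.
        DBInputFormat.setInput(job, UserRecord.class, "user", null, null, fields);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}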

Step 4: run the job with the following command.

[hadoop@single-hadoop-dajiangtai-com djt]$ hadoop jar ConnMysql.jar com.dajiangtai.hadoop.advance.ConnMysql

Step 5: view the result as follows.

[hadoop@single-hadoop-dajiangtai-com djt]$ hadoop fs -text /advance/mysql/out/

54 無情 [email protected]

55 冷血 [email protected]

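Tip: using MapReduce to work with a MySQL database is fairly common in practice, for example to migrate data from MySQL into HDFS. There is also a dedicated tool that is well suited to moving data from MySQL or Oracle into HDFS: Sqoop. If you have this kind of requirement, Sqoop is the recommended choice.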

[Writing data]

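The result of a data-processing job is usually not very large, so it can be reasonable for Hadoop to write it directly into a database. Hadoop provides database interfaces for sending MapReduce results straight to MySQL, Oracle, and other databases. The main classes are:

1. DBOutputFormat: provides the interface for writing to a database.

2. DBRecordWriter: provides the interface for writing individual records to the database.

3. DBConfiguration: provides the interface for database configuration and for creating connections.

The following example shows how MapReduce writes data to a database. Assume the MySQL database contains a table named test with the columns "uid", "email", and "name".

Step 1: define UserRecord, implementing the DBWritable and Writable interfaces, as in the reading example. The code is not repeated here. Note that the reducer below creates records with new UserRecord(email, name), so the class needs that two-argument constructor in addition to a no-argument one.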

Step 2: implement the Mapper and Reducer classes, as shown below.

public static class ConnMysqlMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line from HDFS is expected to hold an email and a name separated by whitespace.
        String[] fields = value.toString().split("\\s");
        String email = fields[0];
        String name = fields[1];
        context.write(new Text(email), new Text(name));
    }
}

public static class ConnMysqlReducer extends Reducer<Text, Text, UserRecord, UserRecord> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The incoming key/value pairs are the fields to store in the database.
        // First argument of write: a UserRecord built from the key (email) and the value (name);
        // this is the row that gets written to the database.
        // Second argument of write: unused, because the first already carries the whole row, so it is null.
        for (Text value : values) {
            context.write(new UserRecord(key.toString(), value.toString()), null);
        }
    }
}
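The mapper above indexes fields[0] and fields[1] directly, so a blank or one-column line would fail with an ArrayIndexOutOfBoundsException, and two consecutive spaces would yield an empty name because the pattern "\\s" matches a single whitespace character. The following sketch shows one possible defensive parser. LineParserSketch and parse are hypothetical names, and the rule of skipping malformed lines is an assumption rather than something the original job does.

```java
import java.util.Optional;

/** Hypothetical helper: parses "email name" lines and rejects malformed ones. */
final class LineParserSketch {

    /** Returns {email, name}, or Optional.empty() when the line has fewer than two fields. */
    static Optional<String[]> parse(String line) {
        // "\\s+" treats a run of spaces or tabs as one separator.
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || parts[0].isEmpty()) {
            return Optional.empty();
        }
        // Any extra columns after the second are ignored in this sketch.
        return Optional.of(new String[] {parts[0], parts[1]});
    }

    public static void main(String[] args) {
        String[] lines = {"user@example.com   sample", "onlyonefield", "   "};
        for (String line : lines) {
            Optional<String[]> parsed = parse(line);
            if (parsed.isPresent()) {
                System.out.println(parsed.get()[0] + " -> " + parsed.get()[1]);
            } else {
                System.out.println("skipped malformed line");
            }
        }
    }
}
```

Inside a real mapper, the empty case would typically lead to an early return instead of a context.write call, so that one bad line does not fail the whole task.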

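Step 3: implement the main function, as shown below.

package com.dajiangtai.hadoop.advance;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Writes the result data of a MapReduce job into MySQL.
 *
 * @author 小講
 */
public class WriteDataToMysql {

    // ConnMysqlMapper and ConnMysqlReducer from Step 2 belong here as nested classes (with their own imports).

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();

        // Configure the JDBC driver, the data source, and the database user name and password.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://hadoop.dajiangtai.com:3306/djtdb_www", "username", "password");

        Job job = new Job(conf, "test mysql connection"); // create the job
        job.setJarByClass(WriteDataToMysql.class);        // main class
        job.setMapperClass(ConnMysqlMapper.class);        // Mapper
        job.setReducerClass(ConnMysqlReducer.class);      // Reducer

        // The mapper emits Text/Text; the reducer emits UserRecord keys for DBOutputFormat.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(UserRecord.class);
        job.setOutputValueClass(UserRecord.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(DBOutputFormat.class);   // write the output to the database

        // Input path.
        FileInputFormat.addInputPath(job, new Path("hdfs://single.hadoop.dajiangtai.com:9000/advance/mysql/data/data.txt"));

        // Output goes to table test, columns uid, email, name.
        DBOutputFormat.setOutput(job, "test", "uid", "email", "name");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}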
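The setOutput call names a table and its columns, and each record then fills the placeholders of a prepared statement through write(PreparedStatement). As a rough illustration, the sketch below builds the kind of parameterized INSERT statement that such a table/column list implies. InsertQuerySketch and buildInsert are hypothetical names, and the exact statement text that DBOutputFormat generates may differ.

```java
/** Hypothetical helper, not part of Hadoop: derives an INSERT from table and column names. */
final class InsertQuerySketch {

    static String buildInsert(String tableName, String... fieldNames) {
        if (fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(tableName);
        sql.append(" (").append(String.join(", ", fieldNames)).append(") VALUES (");
        // One "?" placeholder per column, in the same order as the column list.
        for (int i = 0; i < fieldNames.length; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildInsert("test", "uid", "email", "name"));
    }
}
```

The placeholder order is why UserRecord.write(PreparedStatement) must set index 1 to uid, index 2 to email, and index 3 to name: the indexes follow the column order passed to setOutput.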

Step 4: run the job with the following command.

[hadoop@single-hadoop-dajiangtai-com djt]$ hadoop jar WriteDataToMysql.jar com.dajiangtai.hadoop.advance.WriteDataToMysql

Step 5: check the records in the MySQL database.

1[email protected] yangjun

2[email protected] yangjun

3[email protected] binquan
