Parallel Large-Matrix Multiplication Based on MapReduce

Original

2020-06-23 21:40

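Parallel large-matrix multiplication is one of the earlier basic algorithms implemented on the MapReduce programming model. It was first proposed by Google to handle the large number of matrix multiplications involved in PageRank. In this post we walk through parallel large-matrix multiplication based on MapReduce.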

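Suppose we have two matrices M and N, where the number of columns of M equals the number of rows of N, and write their product as P = M · N. Let Mij denote the element in row i, column j of M, and Njk the element in row j, column k of N. Each element of P is then given by:

$$P_{ik} = \sum_{j} M_{ij} N_{jk}$$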

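That is, Pik is obtained by multiplying the elements of row i of M with the corresponding elements of column k of N and adding up the products. From the formula above, the position of Pik is determined by (i, k), so we can use (i, k) as the key of the Reduce output and Pik as its value. To compute Pik we need Mij and Njk. For Mij we need four attributes: the matrix it belongs to (M), its row index i, its column index j, and its value. For Njk we likewise need the matrix it belongs to (N), its row index j, its column index k, and its value. These attributes of Mij and Njk are all produced by the Mapper class.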
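As a point of reference, the definition of Pik can be computed directly on a single machine with three nested loops. The sketch below is not part of the MapReduce program; the class name `SequentialMultiply` and the sample matrices are made up for illustration, and the arrays are 0-based while the text uses 1-based indices.

```java
import java.util.Arrays;

// Single-machine baseline for P = M . N (illustrative only, hypothetical class name).
class SequentialMultiply {
    // m has shape rows x inner, n has shape inner x cols.
    static int[][] multiply(int[][] m, int[][] n) {
        int rows = m.length;
        int inner = n.length;
        int cols = n[0].length;
        int[][] p = new int[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                int sum = 0;
                for (int j = 0; j < inner; j++) {
                    sum += m[i][j] * n[j][k];   // Mij * Njk
                }
                p[i][k] = sum;                  // Pik
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};   // hypothetical sample data
        int[][] n = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(m, n)));
        // prints [[19, 22], [43, 50]]
    }
}
```

The MapReduce version described next computes the same sums, but spreads the work for different (i, k) pairs across reducers.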

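Map function: for each element Mij of matrix M, emit a series of key-value pairs <(i,k),(M,j,Mij)>, where k = 1, 2, ..., up to the number of columns of N. For each element Njk of matrix N, emit a series of key-value pairs <(i,k),(N,j,Njk)>, where i = 1, 2, ..., up to the number of rows of M.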
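The emission rule can be sketched in plain Java without any Hadoop dependency. This is only an illustration: the class `MapSketch`, its method names, and the choice to render each pair as a "key<TAB>value" string are assumptions made here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the Map step (hypothetical helper, no Hadoop required).
class MapSketch {
    // For element Mij, emit <(i,k),(M,j,Mij)> for k = 1..columnN.
    static List<String> mapM(int i, int j, int mij, int columnN) {
        List<String> out = new ArrayList<>();
        for (int k = 1; k <= columnN; k++) {
            out.add(i + "," + k + "\t" + "M," + j + "," + mij);
        }
        return out;
    }

    // For element Njk, emit <(i,k),(N,j,Njk)> for i = 1..rowM.
    static List<String> mapN(int j, int k, int njk, int rowM) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= rowM; i++) {
            out.add(i + "," + k + "\t" + "N," + j + "," + njk);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical element M(1,2) = 7 with N having 2 columns
        System.out.println(mapM(1, 2, 7, 2));
    }
}
```

Note that every element of M is copied once per column of N, and every element of N once per row of M, so the intermediate data is much larger than the input.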

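Reduce function: each key (i, k) arrives with many values of the form (M,j,Mij) and (N,j,Njk). Multiply the Mij and Njk that share the same j, then add up the products over all j to obtain Pik.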
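The Reduce step for a single key can be sketched in plain Java as well. The class `ReduceSketch` is hypothetical, and it pairs values through hash maps keyed by j, which is a swapped-in alternative to fixed-size arrays that does not need to know the number of columns of M in advance.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the Reduce step for one key (i,k); hypothetical helper class.
class ReduceSketch {
    // Each value is "M,j,Mij" or "N,j,Njk"; returns Pik.
    static int reduce(List<String> values) {
        Map<Integer, Integer> m = new HashMap<>();
        Map<Integer, Integer> n = new HashMap<>();
        for (String value : values) {
            String[] parts = value.split(",");
            int j = Integer.parseInt(parts[1]);
            int v = Integer.parseInt(parts[2]);
            if (parts[0].equals("M")) {
                m.put(j, v);
            } else if (parts[0].equals("N")) {
                n.put(j, v);
            }
        }
        int sum = 0;
        for (Map.Entry<Integer, Integer> e : m.entrySet()) {
            Integer njk = n.get(e.getKey());
            if (njk != null) {
                sum += e.getValue() * njk;   // same j on both sides: Mij * Njk
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical values for one key: row (1, 2) of M and column (5, 7) of N
        System.out.println(reduce(Arrays.asList("M,1,1", "M,2,2", "N,1,5", "N,2,7")));
        // prints 19
    }
}
```

A j that appears on only one side contributes nothing, which matches treating a missing element as zero.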

Let us work through a concrete pair of matrices.

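We store matrix M in the file M.txt, one element per line, in the format "row,column<TAB>value": the element's row index, a comma, its column index, a tab character, and then the element's value. The contents of M.txt are as follows.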

We store matrix N in N.txt in the same format. The contents of N.txt are as follows.
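To make the line format concrete, here is a small parsing sketch. It assumes the row and column are separated by a comma and the column and value by a tab, which is what the Mapper's `split(",")` and `split("\t")` calls imply; the class name `ElementParser` and the sample line are hypothetical.

```java
// Illustrative parser for one line of M.txt or N.txt (hypothetical helper class).
class ElementParser {
    // Parses "row,column<TAB>value" into {row, column, value}.
    static int[] parse(String line) {
        String[] rowAndRest = line.split(",");
        String[] columnAndValue = rowAndRest[1].split("\t");
        int row = Integer.parseInt(rowAndRest[0].trim());
        int column = Integer.parseInt(columnAndValue[0].trim());
        int value = Integer.parseInt(columnAndValue[1].trim());
        return new int[] {row, column, value};
    }

    public static void main(String[] args) {
        int[] e = parse("2,3\t7");   // made-up line: row 2, column 3, value 7
        System.out.println(e[0] + " " + e[1] + " " + e[2]);
        // prints 2 3 7
    }
}
```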

Map output: after the map function runs, we get a series of key-value pairs of the form <(i,k),(M,j,Mij)> and <(i,k),(N,j,Njk)>, as follows.

Reduce output: for each key (i,k), the values are multiplied pairwise by matching j and the products are added up, as follows.

So the final product matrix is P = [2,5,11].
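The whole map, shuffle, and reduce flow can also be traced in memory without a cluster. The sketch below is only an illustration under stated assumptions: the class `LocalMatrixJob` is hypothetical, a `TreeMap` stands in for Hadoop's shuffle, and the 2×2 matrices are made-up sample data, not the matrices from M.txt and N.txt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory walk-through of map -> shuffle -> reduce (hypothetical class, no Hadoop).
class LocalMatrixJob {
    // Groups a value under its key, standing in for the shuffle phase.
    static void emit(Map<String, List<String>> shuffled, String key, String value) {
        shuffled.computeIfAbsent(key, x -> new ArrayList<>()).add(value);
    }

    // Arrays are 0-based; keys and j use the 1-based indices from the text.
    static Map<String, Integer> run(int[][] m, int[][] n) {
        int rowM = m.length;
        int columnM = n.length;
        int columnN = n[0].length;
        Map<String, List<String>> shuffled = new TreeMap<>();

        // Map phase: each Mij goes to every key (i,k), each Njk to every key (i,k)
        for (int i = 1; i <= rowM; i++)
            for (int j = 1; j <= columnM; j++)
                for (int k = 1; k <= columnN; k++)
                    emit(shuffled, i + "," + k, "M," + j + "," + m[i - 1][j - 1]);
        for (int j = 1; j <= columnM; j++)
            for (int k = 1; k <= columnN; k++)
                for (int i = 1; i <= rowM; i++)
                    emit(shuffled, i + "," + k, "N," + j + "," + n[j - 1][k - 1]);

        // Reduce phase: for each key, pair values by j, multiply, and sum
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : shuffled.entrySet()) {
            int[] mRow = new int[columnM + 1];   // index 0 unused
            int[] nCol = new int[columnM + 1];
            for (String value : entry.getValue()) {
                String[] parts = value.split(",");
                int j = Integer.parseInt(parts[1]);
                int v = Integer.parseInt(parts[2]);
                if (parts[0].equals("M")) mRow[j] = v;
                else nCol[j] = v;
            }
            int sum = 0;
            for (int j = 1; j <= columnM; j++) sum += mRow[j] * nCol[j];
            result.put(entry.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2}, {3, 4}};   // hypothetical sample data
        int[][] n = {{5, 6}, {7, 8}};
        System.out.println(run(m, n));
        // prints {1,1=19, 1,2=22, 2,1=43, 2,2=50}
    }
}
```

Each output entry corresponds to one line that the real job would write: the key "i,k" followed by the value Pik.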

The MapReduce program for parallel large-matrix multiplication is as follows:

package Matrix;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

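/**
 * Driver for the matrix multiplication job.
 * Arguments: rowM, columnM, columnN, inputPath, outputPath
 * (inputPath must cover both M.txt and N.txt, for example their parent directory).
 * @author liuchen
 */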

public class MatrixMain {
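	public static void main(String[] args) throws Exception {
		// Expect exactly five arguments: rowM columnM columnN inputPath outputPath
		if (args.length != 5) {
			System.err.println("Usage: MatrixMain <rowM> <columnM> <columnN> <inputPath> <outputPath>");
			System.exit(2);
		}
		
		// Job configuration
		Configuration conf = new Configuration();
		
		// Matrix dimensions, shared with every mapper and reducer through the configuration
		conf.set("rowM", args[0]);
		conf.set("columnM", args[1]);
		conf.set("columnN", args[2]);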
		
		//create Job
		Job job = Job.getInstance(conf);
		
		//the entry of job
		job.setJarByClass(MatrixMain.class);
		
		//the mapper of job
		job.setMapperClass(MatrixMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Text.class);
		
		//the reducer of job
		job.setReducerClass(MatrixReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		
		//input and output
		TextInputFormat.setInputPaths(job, new Path(args[3]));
		TextOutputFormat.setOutputPath(job, new Path(args[4]));
		
		// Submit the job, wait for it to finish, and report failure through the exit code
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}

package Matrix;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Matrix multiplication Mapper
 * @author liuchen
 *
 */

public class MatrixMapper extends Mapper<Object, Text , Text , Text> {
	
	private int columnN = 0;   // number of columns of N
	private int rowM = 0;      // number of rows of M
	
	private Text map_key = new Text();
	private Text map_value = new Text();
	
	/**
	 * Read the matrix dimensions from the job configuration before any map() call.
	 */
	@Override
	protected void setup(Context context) throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnN = Integer.parseInt(conf.get("columnN"));
		rowM = Integer.parseInt(conf.get("rowM"));
	}

	@Override
	protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
		// The input file name tells us which matrix this line belongs to
		FileSplit fileSplit = (FileSplit) context.getInputSplit();
		String fileName = fileSplit.getPath().getName();
		if (fileName.startsWith("M")) {    // element of M, line format "i,j<TAB>Mij"
			String[] arr1 = value.toString().split(",");
			int i = Integer.parseInt(arr1[0]);
			String[] arr2 = arr1[1].split("\t");
			int j = Integer.parseInt(arr2[0]);
			int Mij = Integer.parseInt(arr2[1]);
			for(int k = 1;k <= columnN;k++){
				map_key.set(i + "," + k);
				map_value.set("M," + j + "," + Mij);
				context.write(map_key, map_value);
			}
			
		}
		else if (fileName.startsWith("N")) {   // element of N, line format "j,k<TAB>Njk"
			String[] arr1 = value.toString().split(",");
			int j = Integer.parseInt(arr1[0]);
			String[] arr2 = arr1[1].split("\t");
			int k = Integer.parseInt(arr2[0]);
			int Njk = Integer.parseInt(arr2[1]);
			
			for(int i = 1;i<= rowM;i++){
				map_key.set(i + "," + k);
				map_value.set("N," + j +"," + Njk);
				context.write(map_key, map_value);
			}	
		}
	}

}

package Matrix;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixReducer extends Reducer<Text, Text, Text, Text>{
	private int columnM = 0;   // number of columns of M (equal to the number of rows of N)
	
	@Override
	protected void setup(Context context) throws IOException, InterruptedException {
		Configuration conf = context.getConfiguration();
		columnM = Integer.parseInt(conf.get("columnM"));
	}
	
	@Override
	protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
		// M[j] holds Mij and N[j] holds Njk for this key (i,k).
		// Index 0 is unused because j is 1-based; entries that never arrive stay 0.
		int[] M = new int[columnM + 1];
		int[] N = new int[columnM + 1];
		for (Text value : values) {
			// Each value has the form "M,j,Mij" or "N,j,Njk"
			String[] arr1 = value.toString().split(",");
			if (arr1[0].equals("M")) {
				M[Integer.parseInt(arr1[1])] = Integer.parseInt(arr1[2]);
			}
			else if (arr1[0].equals("N")) {
				N[Integer.parseInt(arr1[1])] = Integer.parseInt(arr1[2]);
			}
		}
		
		// Pik = sum over j of Mij * Njk
		int sum = 0;
		for (int j = 1; j <= columnM; j++) {
			sum += M[j] * N[j];
		}
		context.write(key, new Text(Integer.toString(sum)));
	}

}

