Approaches to monitoring an HDFS directory for new files

  1. Monitor an HDFS directory and have Spark process each new file as it arrives, using Spark Streaming's textFileStream.
    Per the Spark documentation, textFileStream only picks up files that appear in the directory atomically (e.g., files moved or renamed into it), so it should fire after an upload completes rather than mid-transfer; still, it is worth verifying this against your own ingestion path. A minimal sketch follows.
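A minimal sketch of the textFileStream approach in Java; the local[2] master, the 10-second batch interval, and the hdfs://namenode:8020/upload path are placeholder assumptions:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class UploadDirStream {
    public static void main(String[] args) throws InterruptedException {
        // local[2] and the 10-second batch interval are illustrative choices
        SparkConf conf = new SparkConf().setAppName("hdfs-dir-monitor").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Emits the text of any new file that appears under /upload;
        // files must appear atomically (e.g., be moved into the directory)
        JavaDStream<String> lines = jssc.textFileStream("hdfs://namenode:8020/upload");
        lines.foreachRDD(rdd -> System.out.println("new records in this batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}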

  2. Monitor via the HDFS API: HDFS exposes an inotify-style event stream (HdfsAdmin.getInotifyEventStream()) that reports namespace changes such as file creation, close, append, rename, and delete. The class below tails that stream and collects the paths of newly uploaded files.

package com.zx.dao;
 
import com.zx.utils.PropertiesUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;
 
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Properties;
 
// Monitors file events in HDFS through the inotify interface.
// Note: reading the inotify event stream requires HDFS superuser privileges.
public class MonitorHdfs extends Thread {

    private ArrayList<String> fileList = new ArrayList<String>();

    // Without this override, start() would do nothing: the event loop
    // has to run on the thread itself.
    @Override
    public void run() {
        try {
            getFileStatus();
        } catch (IOException | InterruptedException | MissingEventsException e) {
            e.printStackTrace();
        }
    }

    public void getFileStatus() throws IOException, InterruptedException, MissingEventsException {
        // Read the NameNode URI (e.g., hdfs://namenode:8020) from configuration
        Properties properties = PropertiesUtils.getProperties("spark-conf.properties");
        String hdfsPath = (String) properties.get("hdfsPath");
        HdfsAdmin admin = new HdfsAdmin( URI.create(hdfsPath), new Configuration() );
        // The inotify stream delivers namespace events from the NameNode in batches
        DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
        while( true ) {
            // take() blocks until the next batch of events is available
            EventBatch events = eventStream.take();
            for( Event event : events.getEvents() ) {
                System.out.println("=======================================================");
                System.out.println( "event type = " + event.getEventType() );
                switch( event.getEventType() ) {
                    case CREATE:
                        Event.CreateEvent createEvent = (Event.CreateEvent) event;
                        System.out.println( "  path = " + createEvent.getPath() );
                        String filePath = createEvent.getPath();
                        if(filePath.contains("/upload/")){
                            // "hdfs dfs -put" writes to <name>._COPYING_ and renames
                            // it on completion; strip the suffix to record the final path
                            if(filePath.endsWith("._COPYING_")){
                                filePath = filePath.substring(0, filePath.length() - "._COPYING_".length());
                            }
                            this.fileList.add(filePath);
                        }
                        for(String str:fileList){
                            System.out.println(str);
                        }
 
                        break;
                    case CLOSE:
                        Event.CloseEvent closeEvent = (Event.CloseEvent) event;
                        System.out.println( "  path = " + closeEvent.getPath() );
                        break;
                    case APPEND:
                        Event.AppendEvent appendEvent = (Event.AppendEvent) event;
                        System.out.println( "  path = " + appendEvent.getPath() );
                        break;
                    case RENAME:
                        Event.RenameEvent renameEvent = (Event.RenameEvent) event;
                        System.out.println( "  srcPath = " + renameEvent.getSrcPath() );
                        System.out.println( "  dstPath = " + renameEvent.getDstPath() );
                        break;
                    case METADATA:
                        Event.MetadataUpdateEvent metadataUpdateEvent = (Event.MetadataUpdateEvent) event;
                        System.out.println( "  path = " + metadataUpdateEvent.getPath() );
                        break;
                    case UNLINK:
                        Event.UnlinkEvent unlinkEvent = (Event.UnlinkEvent) event;
                        System.out.println( "  path = " + unlinkEvent.getPath() );
                        break;
                    default:
                        break;
                }
                System.out.println("=======================================================");
            }
        }
    }
 
    public ArrayList<String> getFileList() {
        return fileList;
    }
 
    public void setFileList(ArrayList<String> fileList) {
        this.fileList = fileList;
    }
 
    public void clearFileList(){
        this.fileList.clear();
    }
}
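A hypothetical driver showing one way to use the class: start the monitor on a background thread, then periodically drain the collected paths. The class name and polling interval are illustrative; also note that fileList is not thread-safe, so a concurrent collection would be safer in production.

public class MonitorMain {
    public static void main(String[] args) throws InterruptedException {
        MonitorHdfs monitor = new MonitorHdfs();
        monitor.setDaemon(true);
        monitor.start();               // runs getFileStatus() in the background

        while (true) {
            Thread.sleep(30 * 1000L);  // illustrative 30-second polling interval
            for (String path : monitor.getFileList()) {
                System.out.println("ready for processing: " + path);
            }
            monitor.clearFileList();
        }
    }
}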
  3. Oozie Coordinator
    The Oozie Coordinator Engine triggers workflows based on time or on the availability of data. Typical use cases (from the Oozie wiki linked below):

    - You want to run your workflow once a day at 2PM (similar to a cron job).
    - You want to run your workflow every hour, and you also want to wait for specific data feeds to be available on HDFS.
    - You want to run a workflow that depends on other workflows.

    Benefits:

    - Easily define all your requirements for triggering your workflow in an XML file.
    - Avoid running multiple crontabs to trigger your workflows.
    - Avoid writing custom scripts that poll HDFS to check for input data and trigger workflows.
    - On Yahoo's grid, Oozie is provided as a service by the Grid Operations Team, so no software installation is needed to start using it.

    A simple coordinator job: if you want to trigger workflows based on time or availability of data, the Oozie Coordinator Engine is the tool to use.
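A minimal coordinator sketch for the second use case above (an hourly run gated on an HDFS data feed). The app name, paths, dates, and the _SUCCESS done-flag are placeholder assumptions, written against the uri:oozie:coordinator:0.4 schema:

<coordinator-app name="upload-trigger" frequency="${coord:hours(1)}"
                 start="2019-01-01T00:00Z" end="2019-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <!-- One directory per hour; the _SUCCESS done-flag marks it complete -->
        <dataset name="upload" frequency="${coord:hours(1)}"
                 initial-instance="2019-01-01T00:00Z" timezone="UTC">
            <uri-template>hdfs://namenode:8020/upload/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <!-- The action is held until the current hour's dataset instance exists -->
        <data-in name="input" dataset="upload">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>hdfs://namenode:8020/user/oozie/apps/my-workflow</app-path>
        </workflow>
    </action>
</coordinator-app>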

https://github.com/YahooArchive/oozie/wiki/Oozie-Coord-Use-Cases
