-
监控 HDFS 的一个目录,若有新文件,Spark 就开始处理该文件;可以使用 Spark Streaming 的 textFileStream 来监控该目录。
注意:这究竟是对文件实时传输过程的监控,还是对文件上传完成之后的监控,需要自行测试确认。
hdfs api调用监控
package com.zx.dao;
import com.zx.utils.PropertiesUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Properties;
//监控hdfs系统的文件状态
// Watches the HDFS inotify event stream and collects paths of files newly created
// under /upload/, so a downstream consumer (e.g. a Spark job) can process them.
public class MonitorHdfs extends Thread {

    // Temp-file suffix HDFS clients append while `hdfs dfs -put` is in flight;
    // the file is renamed to its final name once the copy completes.
    private static final String COPYING_SUFFIX = "._COPYING_";

    // Collected upload paths. NOTE(review): accessed by this monitoring thread and
    // by external callers via the getters without synchronization — confirm callers
    // tolerate this, or guard access, before relying on it under concurrency.
    private ArrayList<String> fileList = new ArrayList<String>();

    /**
     * Entry point when this object is started as a Thread.
     * FIX: previously there was no run() override, so Thread.start() executed
     * Thread's default no-op run() and the monitoring loop never ran.
     */
    @Override
    public void run() {
        try {
            getFileStatus();
        } catch (InterruptedException e) {
            // Re-assert the interrupt flag so the owner of this thread can observe it.
            Thread.currentThread().interrupt();
        } catch (IOException | MissingEventsException e) {
            e.printStackTrace();
        }
    }

    /**
     * Blocks forever consuming the NameNode inotify event stream, printing every
     * event. CREATE events whose path contains "/upload/" are recorded in
     * {@link #fileList}; the transient {@code ._COPYING_} suffix is stripped so
     * the final destination path is stored.
     *
     * @throws IOException            on HDFS communication failure
     * @throws InterruptedException   if interrupted while waiting for an event batch
     * @throws MissingEventsException if the NameNode dropped inotify events
     */
    public void getFileStatus() throws IOException, InterruptedException, MissingEventsException {
        Properties properties = PropertiesUtils.getProperties("spark-conf.properties");
        String hdfsPath = (String) properties.get("hdfsPath");
        HdfsAdmin admin = new HdfsAdmin(URI.create(hdfsPath), new Configuration());
        DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
        while (true) {
            // take() blocks until at least one event batch is available.
            EventBatch events = eventStream.take();
            for (Event event : events.getEvents()) {
                System.out.println("=======================================================");
                System.out.println( "event type = " + event.getEventType() );
                switch (event.getEventType()) {
                    case CREATE:
                        Event.CreateEvent createEvent = (Event.CreateEvent) event;
                        System.out.println( " path = " + createEvent.getPath() );
                        String filePath = createEvent.getPath();
                        if (filePath.contains("/upload/")) {
                            // FIX: the original used contains() + substring(length-10),
                            // which would truncate a path that merely embeds the marker
                            // mid-string. The marker is a suffix — test it as one.
                            if (filePath.endsWith(COPYING_SUFFIX)) {
                                filePath = filePath.substring(0, filePath.length() - COPYING_SUFFIX.length());
                            }
                            // FIX: avoid duplicate entries when the same final path is
                            // observed more than once (temp-file CREATE + re-upload).
                            if (!this.fileList.contains(filePath)) {
                                this.fileList.add(filePath);
                            }
                        }
                        for (String str : fileList) {
                            System.out.println(str);
                        }
                        break;
                    case CLOSE:
                        Event.CloseEvent closeEvent = (Event.CloseEvent) event;
                        System.out.println( " path = " + closeEvent.getPath() );
                        break;
                    case APPEND:
                        Event.AppendEvent appendEvent = (Event.AppendEvent) event;
                        System.out.println( " path = " + appendEvent.getPath() );
                        break;
                    case RENAME:
                        Event.RenameEvent renameEvent = (Event.RenameEvent) event;
                        System.out.println( " srcPath = " + renameEvent.getSrcPath() );
                        System.out.println( " dstPath = " + renameEvent.getDstPath() );
                        break;
                    case METADATA:
                        Event.MetadataUpdateEvent metadataUpdateEvent = (Event.MetadataUpdateEvent) event;
                        System.out.println( " path = " + metadataUpdateEvent.getPath() );
                        break;
                    case UNLINK:
                        Event.UnlinkEvent unlinkEvent = (Event.UnlinkEvent) event;
                        System.out.println( " path = " + unlinkEvent.getPath() );
                        break;
                    default:
                        break;
                }
                System.out.println("=======================================================");
            }
        }
    }

    /** @return the live list of collected upload paths (not a defensive copy). */
    public ArrayList<String> getFileList() {
        return fileList;
    }

    /** Replaces the backing list; retained for caller compatibility. */
    public void setFileList(ArrayList<String> fileList) {
        this.fileList = fileList;
    }

    /** Discards all collected paths, e.g. after a consumer has processed them. */
    public void clearFileList() {
        this.fileList.clear();
    }
}
- Oozie Coordinator
Coordinator Engine Use Cases
Here are some typical use cases for the Oozie Coordinator Engine.
You want to run your workflow once a day at 2PM (similar to a CRON).
You want to run your workflow every hour and you also want to wait for specific data feeds to be available on HDFS.
You want to run a workflow that depends on other workflows.
Benefits
Easily define all your requirements for triggering your workflow in an XML file
Avoid running multiple crontabs to trigger your workflows.
Avoid writing custom scripts that poll HDFS to check for input data and trigger workflows.
Oozie is provided as a service by the Grid Operations Team. You do not need to install software to start using Oozie on the Grid.
A Simple Coordinator Job
If you want to trigger workflows based on time or availability of data, then you should use the Oozie Coordinator Engine.
https://github.com/YahooArchive/oozie/wiki/Oozie-Coord-Use-Cases