Background
We use the merge interface provided by the Apache parquet-mr project to merge Parquet files on HDFS. This reduces the number of small files on HDFS and, in turn, the NameNode memory consumed by file metadata.
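The merge flow can be sketched as follows. This is a hedged sketch, not the exact production code: ParquetMerger, mergedPath, and smallFiles are illustrative names of ours, the ParquetFileWriter constructor signature matches older parquet-mr releases and may differ in newer ones, and all input files must share the same schema.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.schema.MessageType;

// Sketch: merge several small Parquet files (same schema) into one output file.
public class ParquetMerger {
  public static void merge(Configuration conf, MessageType schema,
                           Path mergedPath, List<Path> smallFiles) throws IOException {
    ParquetFileWriter writer = new ParquetFileWriter(conf, schema, mergedPath);
    writer.start();                         // writes the Parquet magic header
    for (Path p : smallFiles) {
      writer.appendFile(conf, p);           // copies each input's row groups verbatim
    }
    writer.end(Collections.<String, String>emptyMap()); // writes footer, closes output
  }
}
```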
Problem
After the Parquet merge operator went live in the field environment and had run for a while, "too many open files" errors appeared in the logs. Counting the process's open file handles with lsof -p <pid> | wc -l showed the number approaching the configured system maximum of 65535.
Troubleshooting
Inspecting the source of org.apache.parquet.hadoop.ParquetFileWriter (and the appendTo method it triggers on ParquetFileReader):
// ParquetFileWriter
public void appendFile(Configuration conf, Path file) throws IOException {
  // the reader is never assigned to a variable, so nothing can ever close it
  ParquetFileReader.open(conf, file).appendTo(this);
}

// ParquetFileReader
public void appendTo(ParquetFileWriter writer) throws IOException {
  writer.appendRowGroups(f, blocks, true); // f is the reader's SeekableInputStream
}

// ParquetFileWriter; invoked once per row group via appendRowGroups
public void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup,
                           boolean dropColumns) throws IOException {
  startBlock(rowGroup.getRowCount());

  Map<String, ColumnChunkMetaData> columnsToCopy =
      new HashMap<String, ColumnChunkMetaData>();
  for (ColumnChunkMetaData chunk : rowGroup.getColumns()) {
    columnsToCopy.put(chunk.getPath().toDotString(), chunk);
  }

  List<ColumnChunkMetaData> columnsInOrder =
      new ArrayList<ColumnChunkMetaData>();

  for (ColumnDescriptor descriptor : schema.getColumns()) {
    String path = ColumnPath.get(descriptor.getPath()).toDotString();
    ColumnChunkMetaData chunk = columnsToCopy.remove(path);
    if (chunk != null) {
      columnsInOrder.add(chunk);
    } else {
      throw new IllegalArgumentException(String.format(
          "Missing column '%s', cannot copy row group: %s", path, rowGroup));
    }
  }

  // complain if some columns would be dropped and that's not okay
  if (!dropColumns && !columnsToCopy.isEmpty()) {
    throw new IllegalArgumentException(String.format(
        "Columns cannot be copied (missing from target schema): %s",
        Strings.join(columnsToCopy.keySet(), ", ")));
  }

  // copy the data for all chunks
  long start = -1;
  long length = 0;
  long blockCompressedSize = 0;
  for (int i = 0; i < columnsInOrder.size(); i += 1) {
    ColumnChunkMetaData chunk = columnsInOrder.get(i);

    // get this chunk's start position in the new file
    long newChunkStart = out.getPos() + length;

    // add this chunk to be copied with any previous chunks
    if (start < 0) {
      // no previous chunk included, start at this chunk's starting pos
      start = chunk.getStartingPos();
    }
    length += chunk.getTotalSize();

    if ((i + 1) == columnsInOrder.size() ||
        columnsInOrder.get(i + 1).getStartingPos() != (start + length)) {
      // not contiguous. do the copy now.
      copy(from, out, start, length);
      // reset to start at the next column chunk
      start = -1;
      length = 0;
    }

    currentBlock.addColumn(ColumnChunkMetaData.get(
        chunk.getPath(),
        chunk.getType(),
        chunk.getCodec(),
        chunk.getEncodingStats(),
        chunk.getEncodings(),
        chunk.getStatistics(),
        newChunkStart,
        newChunkStart,
        chunk.getValueCount(),
        chunk.getTotalSize(),
        chunk.getTotalUncompressedSize()));

    blockCompressedSize += chunk.getTotalSize();
  }

  currentBlock.setTotalByteSize(blockCompressedSize);
  endBlock();
}

public void end(Map<String, String> extraMetaData) throws IOException {
  state = state.end();
  LOG.debug("{}: end", out.getPos());
  ParquetMetadata footer = new ParquetMetadata(
      new FileMetaData(schema, extraMetaData, Version.FULL_VERSION), blocks);
  serializeFooter(footer, out);
  out.close(); // closes only the writer's own output stream, not the readers' inputs
}
The source above shows that the ParquetFileReader built by ParquetFileReader.open(conf, file) is never assigned to a variable, so its f field (a SeekableInputStream) can never be closed. Every appendFile call therefore leaks one input stream, and the accumulated handles eventually trigger the "too many open files" error.
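The root cause generalizes: in Java, an unreferenced Closeable holds its file descriptor until the object happens to be garbage-collected, which is neither prompt nor guaranteed, whereas try-with-resources closes it deterministically. A minimal, self-contained sketch of the anti-pattern and the fix (TrackedStream is our stand-in for the reader's SeekableInputStream):

```java
import java.io.ByteArrayInputStream;

public class StreamLeakDemo {
    // An in-memory stream that records whether close() was ever called.
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackedStream() { super(new byte[] {1, 2, 3}); }
        @Override public void close() { closed = true; }
    }

    // Anti-pattern (like appendFile): the stream is drained but never closed.
    static boolean leakyRead() {
        TrackedStream s = new TrackedStream();
        while (s.read() != -1) { /* drain */ }
        return s.closed; // false: the handle is still open
    }

    // Fix: hold the reference and close it via try-with-resources.
    static boolean safeRead() {
        TrackedStream s = new TrackedStream();
        try (TrackedStream in = s) {
            while (in.read() != -1) { /* drain */ }
        }
        return s.closed; // true: closed deterministically
    }

    public static void main(String[] args) {
        System.out.println("leaky closed: " + leakyRead()); // prints false
        System.out.println("safe closed: " + safeRead());   // prints true
    }
}
```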
Solution
Option 1: modify org.apache.parquet.hadoop.ParquetFileWriter.appendFile so that it holds a reference to the SeekableInputStream and closes that stream as the method's final statement.
Option 2: subclass org.apache.parquet.hadoop.ParquetFileWriter, override appendFile, and use the subclass in the operator code.
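Option 2 can be sketched as below. The class name ClosingParquetFileWriter is ours, and the constructor signature matches older parquet-mr releases; the key point is that ParquetFileReader implements Closeable, so try-with-resources guarantees its underlying SeekableInputStream is released even if appendTo throws.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.schema.MessageType;

// Sketch of Option 2: same append behavior, but the reader is held and closed.
public class ClosingParquetFileWriter extends ParquetFileWriter {

  public ClosingParquetFileWriter(Configuration conf, MessageType schema, Path file)
      throws IOException {
    super(conf, schema, file);
  }

  @Override
  public void appendFile(Configuration conf, Path file) throws IOException {
    // Hold the reader in a variable so its stream can be closed deterministically.
    try (ParquetFileReader reader = ParquetFileReader.open(conf, file)) {
      reader.appendTo(this);
    }
  }
}
```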
Option 1 requires patching the parquet-hadoop source and rebuilding the jar, which violates the open-closed principle, so the project adopted Option 2.
Conclusion
Verification in the test environment confirmed that once the IO stream is closed explicitly, the corresponding file handle is released shortly afterwards and the open-handle count no longer climbs.
If you found this useful, follow the official account to show your support~