Parsing QQ and Enterprise QQ message exports (MHT format, large files supported) into HTML with Java, including the embedded images

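The link to the source code is at the end of this article.

For particular reasons I had to switch messaging tools, so I needed to back up my old chat history in a form that can still be browsed and searched.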

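It turns out that QQ can export messages as an MHT file. Internally, this format simply writes the HTML, the CSS, and the images (base64-encoded) into a single file according to fixed rules, so parsing it is just a matter of following those rules.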
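To make the base64 part concrete, here is a minimal sketch of turning the text body of one image part back into bytes. The class and method names (MhtPartDecoder, decode, writeImage) are hypothetical and not taken from the project; only java.util.Base64 and java.nio.file are real JDK APIs. The project's own generateImage may work differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical helper (not from the project): decodes one base64 body part of an MHT file. */
final class MhtPartDecoder {

    /**
     * Decodes the base64 text of one part. The MIME decoder is used because
     * the body is wrapped over many lines, and this decoder ignores line breaks.
     */
    static byte[] decode(String base64Body) {
        return Base64.getMimeDecoder().decode(base64Body);
    }

    /** Writes the decoded bytes to targetDir/fileName and returns the written path. */
    static Path writeImage(String base64Body, Path targetDir, String fileName) throws IOException {
        Files.createDirectories(targetDir);
        return Files.write(targetDir.resolve(fileName), decode(base64Body));
    }
}
```

The text collected between two boundary lines can be passed to decode as is, newlines included.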

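When the file is large, pagination has to be considered; otherwise the generated HTML file becomes huge. The largest single HTML file I got after parsing (an export of all messages) reached 500 MB, which was very awkward to browse, so I added pagination.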

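First, how it works in practice:

1. Put the program in the folder that contains the MHT file.

The program automatically finds the MHT files in the current folder, converts them, and saves the output to a folder with the same name as each MHT file.

2. Double-click run.bat to run it; pagination is optional.

Note: if the MHT file is smaller than 100 MB, the program does not paginate even when pagination is selected.

3. Preview the result.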
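Steps 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the class MhtJobPlanner and its methods are hypothetical, and treating "100 MB" as 100 * 1024 * 1024 bytes is a guess about how the threshold is measured.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch (not from the project) of file discovery and the paging decision. */
final class MhtJobPlanner {

    /** Assumed threshold: files below this size are never paginated. */
    static final long PAGING_THRESHOLD_BYTES = 100L * 1024 * 1024;

    /** Returns the .mht files directly inside dir (empty list if dir cannot be listed). */
    static List<File> findMhtFiles(File dir) {
        List<File> result = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return result;
        }
        for (File f : children) {
            if (f.isFile() && f.getName().toLowerCase(Locale.ROOT).endsWith(".mht")) {
                result.add(f);
            }
        }
        return result;
    }

    /** Output folder next to the MHT file, named after it without the ".mht" extension. */
    static File outputDirFor(File mht) {
        String name = mht.getName();
        return new File(mht.getParentFile(), name.substring(0, name.length() - ".mht".length()));
    }

    /** Paginate only if the user asked for it and the file reaches the threshold. */
    static boolean shouldPaginate(long fileSizeBytes, boolean userWantsPaging) {
        return userWantsPaging && fileSizeBytes >= PAGING_THRESHOLD_BYTES;
    }
}
```

A caller would loop over findMhtFiles(new File(".")) and pick the single-file or the paginated conversion based on shouldPaginate(file.length(), choice).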

Selected code walkthrough:

1. Generating a single-file HTML

    /**
     * Creates a single-file HTML page from the MHT export.
     * @param inputFile      path of the source .mht file
     * @param outputFilePath directory that receives the generated files
     */
    public static void readAndCreateFile(String inputFile, String outputFilePath) {
        // Needs java.io.BufferedReader, FileInputStream, InputStreamReader, IOException, File
        // and java.nio.charset.StandardCharsets.
        String htmlFileName = parseHtmlFileName(inputFile);

        // try-with-resources closes the reader and the underlying stream, even on failure
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(inputFile), StandardCharsets.UTF_8),
                5 * 1024 * 1024)) {

            boolean isCreatedHtml = false, isHtmlContent = false;
            String line;
            StringBuilder sb = null;
            String resName = null;
            boolean isGetResName = false;
            StringBuilder resSb = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                // Phase 1: collect everything from the opening to the closing html tag
                if (!isCreatedHtml) {
                    if (isHtmlStartTag(line)) {
                        isHtmlContent = true;
                        sb = new StringBuilder(line).append("\n");
                    } else if (isHtmlContent) {
                        sb.append(line).append("\n");
                        if (isHtmlEndTag(line)) {
                            createHtmlFile(outputFilePath, htmlFileName, sb.toString(), 0, true);
                            sb.setLength(0);
                            isCreatedHtml = true;
                        }
                    }
                }

                // Phase 2: once the HTML has been written, parse the resource (image) parts
                if (isCreatedHtml) {
                    if (!isGetResName) {
                        // Wait for the header line that carries the resource name
                        resName = parseResourceName(line);
                        if (resName != null) {
                            isGetResName = true;
                            resSb.setLength(0);
                        }
                    } else if (line.length() > 0) {
                        if (line.contains("------=_NextPart_")) {
                            // A boundary line ends the current part: decode and save it
                            isGetResName = false;
                            generateImage(resSb.toString(), (outputFilePath + File.separator + htmlFileName), resName);
                        } else {
                            resSb.append(line).append("\n");
                        }
                    }
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException
            e.printStackTrace();
        }
    }
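The method above relies on parseResourceName, which is not shown in this article. As a rough idea of what such a helper could look like, here is a sketch that assumes each image part carries a Content-Location header whose value is the resource name. That header layout is an assumption about the export format, and the class ResourceHeaderSketch is hypothetical; the project's real implementation may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not from the project) of the part-header helpers. */
final class ResourceHeaderSketch {

    /** Assumed header form: "Content-Location:" followed by the resource name. */
    private static final Pattern CONTENT_LOCATION =
            Pattern.compile("^Content-Location:\\s*(\\S+)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns the resource name if the line is a Content-Location header, otherwise null. */
    static String parseResourceName(String line) {
        Matcher m = CONTENT_LOCATION.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** True if the line is a part boundary; the marker is the one used in the article's code. */
    static boolean isPartBoundary(String line) {
        return line.contains("------=_NextPart_");
    }
}
```

Returning null for every non-matching line is what lets the loop above skip the other header lines until the resource name shows up.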

2. Generating paginated HTML

    /**
     * Creates paginated HTML pages from the MHT export.
     * @param inputFile      path of the source .mht file
     * @param outputFilePath directory that receives the generated files
     */
    public static void readAndCreateMultFile(String inputFile, String outputFilePath) {
        String htmlFileName = parseHtmlFileName(inputFile);

        // try-with-resources closes the reader and the underlying stream, even on failure
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(inputFile), StandardCharsets.UTF_8),
                5 * 1024 * 1024)) {

            boolean isCreatedHtml = false, isTableContent = false;
            String line;
            StringBuilder sb = null;
            int trLine = 1;   // lines read inside the table
            int htmlNo = 1;   // number of the page currently being collected
            String resName = null;
            boolean isGetResName = false;
            StringBuilder resSb = new StringBuilder();
            while ((line = reader.readLine()) != null) {
                // Phase 1: split the table content into pages
                if (!isCreatedHtml) {
                    Matcher startMatcher = tableStartPattern.matcher(line);
                    if (startMatcher.find()) {
                        isTableContent = true;
                        // Keep only what follows the opening table tag
                        sb = new StringBuilder(line.substring(startMatcher.end())).append("\n");
                    } else if (isTableContent) {
                        trLine++;
                        Matcher endMatcher = tableEndPattern.matcher(line);
                        if (endMatcher.find()) {
                            // Last page: keep only what precedes the closing table tag
                            sb.append(line.substring(0, endMatcher.start()));
                            createHtmlFile(outputFilePath, htmlFileName, sb.toString(), htmlNo, true);
                            isCreatedHtml = true;
                        } else {
                            sb.append(line).append("\n");
                            // Flush one page every 1000 lines
                            if (trLine % 1000 == 0) {
                                createHtmlFile(outputFilePath, htmlFileName, sb.toString(), htmlNo, false);
                                htmlNo++;
                                sb.setLength(0);
                            }
                        }
                    }
                }

                // Phase 2: once the last page has been written, parse the resource (image) parts
                if (isCreatedHtml) {
                    if (!isGetResName) {
                        // Wait for the header line that carries the resource name
                        resName = parseResourceName(line);
                        if (resName != null) {
                            isGetResName = true;
                            resSb.setLength(0);
                        }
                    } else if (line.length() > 0) {
                        if (line.contains("------=_NextPart_")) {
                            // A boundary line ends the current part: decode and save it
                            isGetResName = false;
                            generateImage(resSb.toString(), (outputFilePath + File.separator + htmlFileName), resName);
                        } else {
                            resSb.append(line).append("\n");
                        }
                    }
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException
            e.printStackTrace();
        }
    }
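Because pages are written while the file is still being streamed, each page only knows its own number and whether it is the last one. That is enough to build Previous/Next links. The sketch below shows one way to do it; the class PageNavSketch, the file naming scheme (baseName_N.html), and the markup are all assumptions for illustration, and the project's createHtmlFile may name and link pages differently.

```java
/** Hypothetical sketch (not from the project) of Previous/Next navigation for streamed pages. */
final class PageNavSketch {

    /** Assumed naming scheme: chat_1.html, chat_2.html, ... */
    static String pageFileName(String baseName, int pageNo) {
        return baseName + "_" + pageNo + ".html";
    }

    /**
     * Builds the navigation bar for one page. Only the page number and the
     * "is last page" flag are needed, so the total page count never has to be known.
     */
    static String navHtml(String baseName, int pageNo, boolean isLastPage) {
        StringBuilder nav = new StringBuilder("<div class=\"nav\">");
        if (pageNo > 1) {
            nav.append("<a href=\"").append(pageFileName(baseName, pageNo - 1))
               .append("\">Previous</a> ");
        }
        if (!isLastPage) {
            nav.append("<a href=\"").append(pageFileName(baseName, pageNo + 1))
               .append("\">Next</a>");
        }
        return nav.append("</div>").toString();
    }
}
```

Numbered page links would need the total page count, which is only known after the whole table has been read.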

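Notes:

1. For performance reasons, the paginated output only has Previous and Next links. I found that sufficient and did not implement page numbers. Readers can extend this; the rough idea is to read the HTML part of the MHT (mainly the tr rows inside the table), put the rows into a list one by one, and then paginate from that list.

2. After converting, please keep your original MHT file. The program has been tested and a fair amount of the data has been verified, but there is no guarantee that a bug or some unknown factor will not cause data loss during conversion. So, once more: keep the original MHT file!

3. This was written in a short time and the code has not been optimized; some parts are duplicated. Feel free to refactor.

4. I have tested files of roughly 6 GB and 10 GB; larger files have not been tested yet.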
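The extension idea from note 1 (collect the rows into a list, then paginate from the list) might look like the sketch below. RowPager and paginate are hypothetical names, not part of the project. Keeping every row in memory is the price of knowing the page count, so this is probably only practical for exports far smaller than the multi-gigabyte files mentioned in note 4.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not from the project) of list-based pagination. */
final class RowPager {

    /** Splits rows into consecutive pages of at most pageSize elements each. */
    static <T> List<List<T>> paginate(List<T> rows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += pageSize) {
            int to = Math.min(from + pageSize, rows.size());
            // Copy the slice so each page is independent of the source list
            pages.add(new ArrayList<>(rows.subList(from, to)));
        }
        return pages;
    }
}
```

With the pages in hand, pages.size() gives the total needed to render numbered links on every page.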

Follow-up:

If I find some spare time, I may improve this by indexing the generated HTML with Lucene so that messages can be searched.

Source code:

https://github.com/itriders/mht2html
