Analyzing and Fixing XML Parse Errors Caused by a BOM

Original

微寒Super

2020-02-20 17:07

Error message: org.dom4j.DocumentException: Error on line 1 of document : Content is not allowed in prolog.

Nested exception: Content is not allowed in prolog.
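
To see the failure in isolation, here is a minimal sketch that feeds the same XML text to a parser with and without a leading BOM character (U+FEFF). It uses the JDK's built-in DOM parser rather than dom4j so that it runs without extra dependencies; that substitution, and the class and method names, are assumptions made for illustration. The plain text should be accepted and the BOM-prefixed text rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
public class BomPrologDemo {

    /** Returns true if the JDK's built-in DOM parser accepts the given XML text. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A StringReader hands the parser characters, so a leading U+FEFF
            // reaches it as content placed before the XML declaration.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Malformed input, including content in the prolog, ends up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        System.out.println("without BOM: " + parses(xml));
        System.out.println("with BOM:    " + parses("\uFEFF" + xml));
    }
}
```

The parser may also print its own fatal-error line to standard error for the rejected input.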

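XML encoding error:
The comparison, made with Beyond Compare 4, shows the failing XML on the left and a working XML file on the right. The two differ at the very start of the file: the failing one begins with a BOM.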
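
When no comparison tool is at hand, the same check can be done by printing the first few bytes of the file in hex. The sketch below is one way to do that; the class name, the method name and the choice of eight bytes are assumptions for illustration. A UTF-8 file with a BOM starts with EF BB BF, while a file without one usually starts directly with 3C, the "<" character.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original article.
public class FirstBytes {

    /** Formats up to the first n bytes of data as upper-case hex separated by spaces. */
    public static String hexPrefix(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so that negative byte values print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FirstBytes path/to/file.xml
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hexPrefix(data, 8));
    }
}
```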

Solutions:
1. In the Notepad++ editor, convert the file from UTF-8 (with BOM) encoding to UTF-8 without BOM, then save it with Save As.

2. For an xmlString received from a web service, clean up the XML string with the following method:

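    /**
     * Strips any illegal prefix (such as a BOM) that precedes the XML declaration.
     * @param xmlStr the raw XML string
     * @return the string starting at "<?xml", or "" if the input is null
     *         or contains no XML declaration
     */
    public String checkXMLStr(String xmlStr) {
        if (xmlStr == null) {
            return "";
        }
        int index = xmlStr.indexOf("<?xml");
        if (index > 0) {
            // Drop everything before the XML declaration
            return xmlStr.substring(index);
        } else if (index == -1) {
            // No XML declaration found: treat the input as invalid
            return "";
        }
        return xmlStr;
    }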
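
Because an XML declaration is optional, the method above also returns an empty string for well-formed documents that simply omit `<?xml`. Where that is too strict, a narrower alternative is to remove only a leading U+FEFF character, which is how a UTF-8 BOM appears once the bytes have been decoded as UTF-8. The following is a minimal sketch of that idea; the class and method names are hypothetical.

```java
// Hypothetical helper class, not part of the original article.
public class BomStrings {

    /** The BOM as it appears in a decoded Java string. */
    private static final char BOM = '\uFEFF';

    /**
     * Returns xml without a single leading U+FEFF character.
     * Null and BOM-free input are returned unchanged.
     */
    public static String stripLeadingBom(String xml) {
        if (xml != null && !xml.isEmpty() && xml.charAt(0) == BOM) {
            return xml.substring(1);
        }
        return xml;
    }

    public static void main(String[] args) {
        String cleaned = stripLeadingBom("\uFEFF<root/>");
        System.out.println(cleaned.length()); // "<root/>" has 7 characters
    }
}
```

This variant does not help when the BOM bytes were decoded with the wrong charset and turned into other characters; searching for `<?xml` as above covers that case.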

3. To make the program more robust, check for a BOM when reading the file and, if one is present, drop it when building the string:

    /**
     * Checks whether a byte array starts with the UTF-8 BOM.
     * A UTF-8 file may begin with the 3-byte sequence "EF BB BF" (the BOM, Byte Order Mark).
     * @param bytes the file content; may be null
     * @return true if the array starts with EF BB BF
     */
    private static boolean CheckBOM(byte[] bytes) {
        // Use >= 3 so that a file consisting only of the BOM is detected as well
        return bytes != null
                && bytes.length >= 3
                && (bytes[0] & 0xff) == 0xef
                && (bytes[1] & 0xff) == 0xbb
                && (bytes[2] & 0xff) == 0xbf;
    }
    /**
     * Reads a file into a string, decoding it as UTF-8 and skipping a leading BOM.
     * Needs: import java.nio.charset.StandardCharsets;
     * @param filePath path of the file to read
     * @return the file content, or "" if the file could not be read
     */
    public String getXMLFileText(String filePath) {
        byte[] bt = fileToByteArray(filePath);
        if (bt == null) {
            // fileToByteArray returns null when the file is missing or unreadable
            return "";
        }
        // Skip the 3 BOM bytes if the content starts with them
        int offset = CheckBOM(bt) ? 3 : 0;
        return new String(bt, offset, bt.length - offset, StandardCharsets.UTF_8);
    }
    /**
     * Reads a file into a byte[] array.
     * Needs: import java.io.BufferedInputStream, java.io.ByteArrayOutputStream,
     * java.io.File, java.io.FileInputStream, java.io.IOException, java.io.InputStream;
     * @param filePath path of the file to read
     * @return the file content, or null if the file is missing, is a directory,
     *         or cannot be read
     */
    public byte[] fileToByteArray(String filePath) {
        // Normalize Windows path separators
        File file = new File(filePath.replace('\\', '/'));
        if (!file.exists() || file.isDirectory()) {
            return null;
        }
        // try-with-resources closes the streams even when reading fails
        try (InputStream in = new BufferedInputStream(new FileInputStream(file));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] temp = new byte[1024 * 1024]; // read up to 1 MB at a time
            int size;
            while ((size = in.read(temp)) != -1) {
                out.write(temp, 0, size);
            }
            return out.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
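
When the XML is handed to a parser as a stream rather than as a string, the whole file does not have to be loaded into memory first. One common alternative, sketched here under the assumption that only the UTF-8 BOM needs handling, is to wrap the stream in a `java.io.PushbackInputStream`, peek at the first three bytes, and push them back if they are not a BOM. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper class, not part of the original article.
public class BomSkippingStream {

    /**
     * Wraps in so that a leading UTF-8 BOM (EF BB BF), if present, is consumed.
     * Any other leading bytes are pushed back and will be read normally.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xff) == 0xEF
                && (head[1] & 0xff) == 0xBB
                && (head[2] & 0xff) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: return the bytes to the stream unchanged
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```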