Dumping a String to YAML: non-printable Unicode characters are not supported

When we use YAML to store a Map<String,String> and convert it with the dump method, any value in the map that contains a non-printable character such as \u0002 or \b comes out as !!binary "5oKj6ICF77yM55S377yMMjTlsoHjgILlm6DovabnpbjkvKTlhaXpmaLvvIzmn6XkvZPlj5HnjrDlt6bkvqfpoqfpqqjpoqflvJPpqqjmipjvvIznnLblkajmt6TmlpHvvIzlvKDlj6Plj5fpmZDvvIzkuIrllIfpurvmnKjjgILooYxDVOS4iee7tOmcnue7hCgw77yONzUgbW3oloTlsYLmiavmj4/vvIxNUFLlhqDnirbpnaLpnJ7nu4Qp77yM5Y+R546w5bem5L6n5LiK6aKM56qm5YmN44CB5aSW5L6n5aOB44CBSGFsAmxlcuawlOaIv+S4iuWjgeWSjOWkluS+p+WjgeOAgeectuS4i+elnue7j+euoeWkluS+p+WjgeOAgeW3puS+p+ectuWkluWjgeOAgemip+W8k+Wkmgrlj5HpqqjotKjkuI3ov57nu63lj4rpqqjnoo7niYfvvIzlt6bkvqfkuIrpooznqqbjgIHlt6bnnLblpJbkvqflo4Hnoo7pqqjniYflkJHnqqblhoXjgIHnnLblhoXlh7npmbfvvIzlt6bot5/lpJbnm7TogoznlaXlj5fljovjgILpvLvpqqjlh7npmbfjgILlj4zkvqfnnLznkIPjgIHop4bnpZ7nu4/mnKrop4HmmI7mmL7lvILluLjjgILlvbHlg4/ljbDosaHvvJrlt6bkvqfpnaLpg6jlpJrlj5HpqqjmipjjgILlkI7ooYzmiYvmnK/mlbTlpI3jgII="

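On inspection, this turned out to be the original String encoded as Base64 (an encoding, not encryption). That is rather annoying, so the only option was to read the source code.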

DumperOptions options = new DumperOptions();
options.setDefaultFlowStyle(FlowStyle.FLOW);
Constructor constructor = new Constructor(Map.class);
Yaml yaml = new Yaml(constructor, new Representer(), options);
yaml.setBeanAccess(BeanAccess.FIELD);
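This is the initialization code, and there is nothing wrong with it. The problem lies in Representer, which extends SafeRepresenter, and SafeRepresenter contains this inner class: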
protected class RepresentString implements Represent {
        public Node representData(Object data) {
            Tag tag = Tag.STR;
            Character style = null;
            String value = data.toString();
            if (StreamReader.NON_PRINTABLE.matcher(value).find()) {
                tag = Tag.BINARY;
                char[] binary;
                try {
                    binary = Base64Coder.encode(value.getBytes("UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    throw new YAMLException(e);
                }
                value = String.valueOf(binary);
                style = '|';
            }
            // if no other scalar style is explicitly set, use literal style for
            // multiline scalars
            if (defaultScalarStyle == null && MULTILINE_PATTERN.matcher(value).find()) {
                style = '|';
            }
            return representScalar(tag, value, style);
        }
    }

Here, StreamReader.NON_PRINTABLE is defined as:

public final static Pattern NON_PRINTABLE = Pattern
            .compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");
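To see which strings trip this check, the pattern can be copied into a small standalone class. This is only a sketch: the class name NonPrintableCheck and the helper hasNonPrintable are made up for illustration, and the pattern is copied from the definition above rather than referenced from the library, so the example runs without any YAML dependency.

```java
import java.util.regex.Pattern;

public class NonPrintableCheck {
    // Copied verbatim from the NON_PRINTABLE definition quoted above.
    static final Pattern NON_PRINTABLE = Pattern
            .compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");

    // Hypothetical helper: true if the dump step would switch this value to !!binary.
    static boolean hasNonPrintable(String value) {
        return NON_PRINTABLE.matcher(value).find();
    }

    public static void main(String[] args) {
        // Tab, newline and ordinary text (including CJK) are inside the allowed ranges.
        System.out.println(hasNonPrintable("plain text\twith tab\n")); // false
        System.out.println(hasNonPrintable("患者,男"));                 // false
        // Control characters such as \u0002 and \b are outside the allowed ranges.
        System.out.println(hasNonPrintable("Hal\u0002ler"));           // true
        System.out.println(hasNonPrintable("back\bspace"));            // true
    }
}
```

A single control character anywhere in the value is enough to make the whole value fall into the Base64 branch.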

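That explains it: during dump, every string is checked for non-printable characters, and if any are found the value is written as Base64 instead. Once the cause is known, there are several ways to handle it. One is to strip the non-printable characters before dumping. Another is to Base64-encode the value yourself first and then, when deserializing back into a Java object, handle it specially and decode the Base64 back into the original string.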
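Both workarounds can be sketched with the JDK alone. The class and method names below (YamlStringWorkarounds, stripNonPrintable, stripValues, encode, decode) are hypothetical, and the pattern is again copied from the definition quoted earlier. The actual dump and load calls are left out, so this only shows the pre- and post-processing steps, not a complete integration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class YamlStringWorkarounds {
    // Copied from the NON_PRINTABLE definition quoted earlier in this post.
    static final Pattern NON_PRINTABLE = Pattern
            .compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");

    // Option 1: drop the offending characters so the value stays a plain string.
    // This is lossy: the removed characters cannot be recovered after loading.
    static String stripNonPrintable(String value) {
        return NON_PRINTABLE.matcher(value).replaceAll("");
    }

    // Apply option 1 to every value of the map before it is dumped.
    static Map<String, String> stripValues(Map<String, String> input) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : input.entrySet()) {
            cleaned.put(e.getKey(), stripNonPrintable(e.getValue()));
        }
        return cleaned;
    }

    // Option 2: encode the value yourself before dumping. Base64 output is
    // plain ASCII, so the check above never fires on it.
    static String encode(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    // After loading, decode the value back into the original string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "Hal\u0002ler";
        System.out.println(stripNonPrintable(raw));          // Haller
        System.out.println(decode(encode(raw)).equals(raw)); // true
    }
}
```

Stripping is the simpler choice when the control characters are just noise in the data. The encode/decode pair preserves the value exactly, at the cost of the reader having to know which fields were encoded.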
