String to YAML: the problem with non-printable Unicode characters

When we use YAML to store a Map<String,String> and convert it with the dump method, and the map contains non-printable characters such as \u0002 or \b, the output turns into this:

!!binary "5oKj6ICF77yM55S377yMMjTlsoHjgILlm6DovabnpbjkvKTlhaXpmaLvvIzmn6XkvZPlj5HnjrDlt6bkvqfpoqfpqqjpoqflvJPpqqjmipjvvIznnLblkajmt6TmlpHvvIzlvKDlj6Plj5fpmZDvvIzkuIrllIfpurvmnKjjgILooYxDVOS4iee7tOmcnue7hCgw77yONzUgbW3oloTlsYLmiavmj4/vvIxNUFLlhqDnirbpnaLpnJ7nu4Qp77yM5Y+R546w5bem5L6n5LiK6aKM56qm5YmN44CB5aSW5L6n5aOB44CBSGFsAmxlcuawlOaIv+S4iuWjgeWSjOWkluS+p+WjgeOAgeectuS4i+elnue7j+euoeWkluS+p+WjgeOAgeW3puS+p+ectuWkluWjgeOAgemip+W8k+Wkmgrlj5HpqqjotKjkuI3ov57nu63lj4rpqqjnoo7niYfvvIzlt6bkvqfkuIrpooznqqbjgIHlt6bnnLblpJbkvqflo4Hnoo7pqqjniYflkJHnqqblhoXjgIHnnLblhoXlh7npmbfvvIzlt6bot5/lpJbnm7TogoznlaXlj5fljovjgILpvLvpqqjlh7npmbfjgILlj4zkvqfnnLznkIPjgIHop4bnpZ7nu4/mnKrop4HmmI7mmL7lvILluLjjgILlvbHlg4/ljbDosaHvvJrlt6bkvqfpnaLpg6jlpJrlj5HpqqjmipjjgILlkI7ooYzmiYvmnK/mlbTlpI3jgII="

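On inspection, this turned out to be a base64-encoded (not encrypted) string, which is rather annoying, so the only option left was to read the code.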
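The !!binary payload can be decoded back to confirm what it holds. A minimal sketch using the JDK's java.util.Base64; the class name DecodeBinaryPayload and the helper showControls are hypothetical, and the sample payload SGFsAmxlcg== is an illustrative value (it encodes "Hal\u0002ler"), not the output quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeBinaryPayload {

    // Decode the base64 text that follows a !!binary tag back into a string.
    // The payload holds UTF-8 bytes, so decode them as UTF-8.
    static String decode(String base64) {
        // The MIME decoder ignores line breaks, in case the payload was wrapped
        byte[] bytes = Base64.getMimeDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Make control characters visible, e.g. char 0x02 becomes <U+0002>
    static String showControls(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isISOControl(c)) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative payload, not copied from the dump output above
        String payload = "SGFsAmxlcg==";
        System.out.println(showControls(decode(payload))); // prints Hal<U+0002>ler
    }
}
```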

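DumperOptions options = new DumperOptions();
// emit mappings and sequences in flow style, e.g. {key: value}
options.setDefaultFlowStyle(DumperOptions.FlowStyle.FLOW);
// load the YAML back into a Map
Constructor constructor = new Constructor(Map.class);
Yaml yaml = new Yaml(constructor, new Representer(), options);
// access fields directly instead of through getters/setters
yaml.setBeanAccess(BeanAccess.FIELD);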
This initialization code is fine. The problem lies in SafeRepresenter, the superclass of Representer, specifically in its inner class RepresentString:
protected class RepresentString implements Represent {
    public Node representData(Object data) {
        Tag tag = Tag.STR;
        Character style = null;
        String value = data.toString();
        // any character outside the printable set switches the tag to !!binary
        if (StreamReader.NON_PRINTABLE.matcher(value).find()) {
            tag = Tag.BINARY;
            char[] binary;
            try {
                // the UTF-8 bytes of the whole string are base64-encoded
                binary = Base64Coder.encode(value.getBytes("UTF-8"));
            } catch (UnsupportedEncodingException e) {
                throw new YAMLException(e);
            }
            value = String.valueOf(binary);
            style = '|';
        }
        // if no other scalar style is explicitly set, use literal style for
        // multiline scalars
        if (defaultScalarStyle == null && MULTILINE_PATTERN.matcher(value).find()) {
            style = '|';
        }
        return representScalar(tag, value, style);
    }
}

where StreamReader.NON_PRINTABLE is defined as:

public final static Pattern NON_PRINTABLE = Pattern
            .compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");
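The check and its base64 fallback can be reproduced outside SnakeYAML to see which strings trigger it. A minimal sketch, assuming the NON_PRINTABLE regex quoted above; java.util.Base64 stands in for SnakeYAML's internal Base64Coder, and the class NonPrintableCheck and method describe are hypothetical. describe only returns a tag label plus the value; it does not emit real YAML.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

public class NonPrintableCheck {

    // Same character class as StreamReader.NON_PRINTABLE: anything other than
    // tab, LF, CR, printable ASCII, NEL, U+00A0..U+D7FF and U+E000..U+FFFD
    static final Pattern NON_PRINTABLE =
            Pattern.compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");

    static boolean hasNonPrintable(String value) {
        return NON_PRINTABLE.matcher(value).find();
    }

    // Mimics the decision in RepresentString: a plain !!str if everything is
    // printable, otherwise !!binary with the base64 of the UTF-8 bytes
    static String describe(String value) {
        if (!hasNonPrintable(value)) {
            return "!!str " + value;
        }
        String b64 = Base64.getEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
        return "!!binary " + b64;
    }

    public static void main(String[] args) {
        System.out.println(describe("Haller"));       // !!str Haller
        System.out.println(describe("Hal\u0002ler")); // !!binary SGFsAmxlcg==
        System.out.println(describe("back\bspace"));  // also !!binary: backspace is U+0008
    }
}
```

Note that tab, newline and ordinary CJK text pass the check; only characters outside the listed ranges (such as U+0002 or the backspace U+0008) cause the switch to !!binary.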

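That explains it: during dump, every String is checked for non-printable characters, and if any are found the value is represented as base64 under the !!binary tag. Once the cause is known there are several ways to handle it. One is to strip the non-printable characters before converting. Another is to base64-encode the values yourself first and, when deserializing back into Java objects, handle them specially by decoding the base64 back into the original strings.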
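Both workarounds are short to sketch. A minimal version, assuming a Map<String,String> with non-null values and printable keys: stripNonPrintable reuses the NON_PRINTABLE regex and is lossy, while encodeValues/decodeValues base64-encode every value before dump and decode it after load. All names here are hypothetical helpers, not SnakeYAML API, and the yaml.dump/yaml.load calls are left out so the example runs with only the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class YamlStringWorkarounds {

    // Same character class SnakeYAML uses to decide a string is non-printable
    private static final Pattern NON_PRINTABLE =
            Pattern.compile("[^\t\n\r\u0020-\u007E\u0085\u00A0-\uD7FF\uE000-\uFFFD]");

    // Option 1: drop non-printable characters before dumping (lossy)
    static String stripNonPrintable(String value) {
        return NON_PRINTABLE.matcher(value).replaceAll("");
    }

    // Option 2a: base64-encode every value before dumping. The result is
    // printable ASCII, so it is written as an ordinary string.
    // Assumes non-null values.
    static Map<String, String> encodeValues(Map<String, String> map) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.put(e.getKey(), Base64.getEncoder()
                    .encodeToString(e.getValue().getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    // Option 2b: decode the values again after loading the YAML into a Map
    static Map<String, String> decodeValues(Map<String, String> map) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.put(e.getKey(), new String(
                    Base64.getDecoder().decode(e.getValue()), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("note", "Hal\u0002ler");

        System.out.println(stripNonPrintable(original.get("note"))); // Haller
        Map<String, String> encoded = encodeValues(original);
        System.out.println(encoded);                                  // {note=SGFsAmxlcg==}
        System.out.println(decodeValues(encoded).equals(original));   // true
    }
}
```

Encoding every value, rather than only those that contain non-printable characters, keeps decoding unambiguous: after loading there is no need to guess which values were encoded. The trade-off is that the stored YAML is no longer human-readable.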
