Using Regular Expressions to Extract URLs and Other Information from a String

1. Overview

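Background: While syncing JD.com product information recently, I ran into a problem: the synced product details could not be edited in the rich-text editor, and forcing an edit caused the images to stop displaying. On investigation, it turned out that the images in the details are specified as background images in CSS.

Solution: After several attempts, I settled on a custom HTML tag template: extract the image URL and height from each background-image:url declaration in the CSS (the width is fixed at 750 in the template), then substitute them into the template.

Technology: Java, regular expressions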
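
The extraction step relies on capturing groups: each pair of parentheses in the pattern marks a field that can be read back with Matcher.group(n). The following is a minimal sketch on a single made-up CSS rule; the class name CaptureGroupDemo, the helper extract, and the example.com URL are assumptions for illustration and do not come from the original project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Group 1 captures the text inside url(...); group 2 captures the digits after "height:".
    private static final Pattern RULE =
            Pattern.compile("url\\(([^)]+)\\).*?height:(\\d+)px");

    // Returns {url, height} for the first match, or null when the rule has no match.
    public static String[] extract(String rule) {
        Matcher m = RULE.matcher(rule);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical rule, shaped like the ones in the product details
        String rule = ".demo{width:750px; background-image:url(http://example.com/a.jpg); height:300px}";
        String[] fields = extract(rule);
        System.out.println("url = " + fields[0]);
        System.out.println("height = " + fields[1]);
    }
}
```

Group 0 (matcher.group() with no argument) is always the whole matched text, which is why the numbered fields start at 1.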

2. Code

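import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
        StringBuilder stringBuilder = new StringBuilder();
        // Product details (raw HTML and CSS as synced from JD.com)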
        String goodsDesc = "<div cssurl='//sku-market-gw.jd.com/css/pc/100002519219.css?t=1581586700014'></div><div id='zbViewModulesH'  value='4797'></div><input id='zbViewModulesHeight' type='hidden' value='4797'/><div skudesign=\\\"100010\\\"></div><div class=\\\"ssd-module-wrap\\\" >\\n    <div  id=\\\"ssd-vc-goods\\\"  class=\\\"ssd-module ssd-module-goods M15541052686741\\\" data-id=\\\"M15541052686741\\\">\\n    <ul class=\\\"ssd-goods-4\\\">\\n                        <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t26977/50/1495537803/456791/ca60d3de/5be4e374Nf8e94aa9.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 進口威化餅乾 零食禮盒 零食大禮包 潘多拉禮盒684g\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 進口威化餅乾 零食禮盒 零食大禮包 潘多拉禮盒684g\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t23938/203/1285847551/405421/27964aa9/5b57e55eN969c2d3f.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 餅乾 咔芙爾焦糖威化餅乾73.5g\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 餅乾 咔芙爾焦糖威化餅乾73.5g\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div 
class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t28057/267/707899178/312718/1054f7be/5bfbda66Ne622ae83.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 乳酪夾心威化餅乾160g\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 乳酪夾心威化餅乾160g\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t24409/100/1278216587/342196/7f15ac48/5b580b36Nb9007958.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 巧克力夾心威化餅乾125g\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 巧克力夾心威化餅乾125g\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/26596/26/9557/317836/5c7f4fedE8e6d5730/940a4d2112e62fc3.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 巧克力咔咔脆組合裝320g(160g*2盒)\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 巧克力咔咔脆組合裝320g(160g*2盒)\\n                            </p>\\n           
             </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/16950/37/10436/362577/5c8741b5E238f9c4a/ad91f31e0b26302c.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 泡泡糖味80g/盒\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 泡泡糖味80g/盒\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t20488/87/2361646474/244765/b67e1c77/5b503ba8N075a3501.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 牛奶味160g\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n                            <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 牛奶味160g\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                    <li>\\n                <a >\\n                    <div class=\\\"ssd-good-item\\\">\\n                        <div class=\\\"ssd-good-img\\\">\\n                            <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/12175/32/10619/337857/5c8741e3E45420cc9/b3dab30dd73a7d8a.jpg\\\" alt=\\\"印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 草莓味80g/盒\\\"/>\\n                        </div>\\n                        <div class=\\\"ssd-good-info\\\">\\n   
                         <p class=\\\"ssd-good-name\\\">\\n                                印尼進口 Tango威化餅乾 休閒零食 咔咔脆威化餅乾 草莓味80g/盒\\n                            </p>\\n                        </div>\\n                    </div>\\n                </a>\\n            </li>\\n                </ul>\\n</div><div class=\\\"ssd-module M15518471203811 animate-M15518471203811\\\" data-id=\\\"M15518471203811\\\">\\n        \\n</div>\\n<div class=\\\"ssd-module M15518471298134 animate-M15518471298134\\\" data-id=\\\"M15518471298134\\\">\\n        \\n</div>\\n<div class=\\\"ssd-module M15518471291853 animate-M15518471291853\\\" data-id=\\\"M15518471291853\\\">\\n        \\n</div>\\n<div class=\\\"ssd-module M15518471283932 animate-M15518471283932\\\" data-id=\\\"M15518471283932\\\">\\n        \\n</div>\\n\\n</div>\\n<!-- 2019-07-01 10:02:50 --> \\n<style>.ssd-module-wrap{position:relative;margin:0 auto;width:750px;text-align:left;background-color:#fff}.ssd-module-wrap .ssd-module,.ssd-module-wrap .ssd-module-heading{width:750px;position:relative;overflow:hidden}.ssd-module-wrap .ssd-module{background-repeat:no-repeat;background-position:left top;background-size:100% 100%}.ssd-module-wrap .ssd-module-heading{background-repeat:no-repeat;background-position:left center;background-size:100% 100%}.ssd-module-wrap .ssd-module-heading .ssd-module-heading-layout{display:inline-block}.ssd-module-wrap .ssd-module-heading .ssd-widget-heading-ch{float:left;display:inline-block;margin:0 6px 0 15px;height:100%}.ssd-module-wrap .ssd-module-heading .ssd-widget-heading-en{float:left;display:inline-block;margin:0 15px 0 6px;height:100%}.ssd-module-wrap .ssd-widget-pic,.ssd-module-wrap .ssd-widget-text,.ssd-module-wrap .ssd-widget-line,.ssd-module-wrap .ssd-widget-rectangle,.ssd-module-wrap .ssd-widget-circle,.ssd-module-wrap .ssd-widget-triangle,.ssd-module-wrap .ssd-widget-table{position:absolute;overflow:hidden}.ssd-module-wrap 
.ssd-widget-rectangle{box-sizing:border-box;-moz-box-sizing:border-box;-webkit-box-sizing:border-box}.ssd-module-wrap .ssd-widget-table table{width:100%;height:100%}.ssd-module-wrap .ssd-widget-table td{position:relative;white-space:pre-line;word-break:break-all}.ssd-module-wrap .ssd-widget-pic img{display:block;width:100%;height:100%}.ssd-module-wrap .ssd-widget-text{line-height:1.5;word-break:break-all}.ssd-module-wrap .ssd-widget-text span{display:block;overflow:hidden;width:100%;height:100%;padding:0;margin:0;word-break:break-all;word-wrap:break-word;white-space:normal}.ssd-module-wrap .ssd-widget-link{position:absolute;left:0;top:0;width:100%;height:100%;background:transparent;z-index:100}.ssd-module-wrap .ssd-cell-text{position:absolute;top:0;left:0;right:0;width:100%;height:100%;overflow:auto}.ssd-module-wrap .M15541052686741{width:750px; height:492px}\\n.ssd-module-wrap  .M15541052686741  ul {\\n  padding: 5px;\\n  line-height: 1.15;\\n  background: #F3F4F7;\\n  overflow: hidden;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  li {\\n  list-style-type: none;\\n  padding: 5px;\\n  float: left;\\n  -moz-box-sizing: border-box;\\n  box-sizing: border-box;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-goods-1 li {\\n  width: 100%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-goods-2 li {\\n  width: 50%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-goods-3 li {\\n  width: 33.33%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-goods-4 li {\\n  width: 25%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  a {\\n  display: block;\\n  overflow: hidden;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-good-item {\\n  background-color: #fff;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-good-img {\\n  position: relative;\\n  padding-top: 100%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-good-img img {\\n  position: absolute;\\n  top: 0;\\n  left: 0;\\n  width: 100%;\\n  height: 100%;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-good-info {\\n  
padding: 10px;\\n  margin: 0;\\n}\\n\\n.ssd-module-wrap  .M15541052686741  .ssd-good-name {\\n  margin: 0;\\n  height: 36px;\\n  line-height: 18px;\\n  font-size: 14px;\\n  color: #333333;\\n  display: -webkit-box;\\n  overflow: hidden;\\n  text-overflow: ellipsis;\\n  -webkit-line-clamp: 2;\\n  -webkit-box-orient: vertical;  \\n}.ssd-module-wrap .M15518471203811{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg); height:1083px}\\n.ssd-module-wrap .M15518471298134{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg); height:786px}\\n.ssd-module-wrap .M15518471291853{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg); height:1416px}\\n.ssd-module-wrap .M15518471283932{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg); height:1020px}\\n</style>";
        // Product detail template
        String goodsDescTemplate = "<p><img src=%s data-width=750 data-height=%s /></p>";

        // Regular expression that extracts the image URL and the height value;
        // each field to extract is wrapped in a capturing group (...).
        // [^)]+ stops at the first ')', so the URL group cannot run past the end of url(...).
        Pattern pattern = Pattern.compile("background-image:url\\((https?://[^)]+)\\).*height:(\\d+)");

        // After studying the source string, first split it on the "px}" that ends each sized rule
        String[] split = goodsDesc.split("px}");
        for (String s : split) {
            if (s.contains("background-image:url")) {    // skip segments that have no background image
                Matcher matcher = pattern.matcher(s);   // create a matcher for this segment
                while (matcher.find()) { // search for the next match and check whether one was found
                    System.out.println("Matched string: " + matcher.group());
                    System.out.println("Extracted image URL: " + matcher.group(1));
                    System.out.println("Extracted height value: " + matcher.group(2));
                    stringBuilder.append(String.format(goodsDescTemplate, matcher.group(1), matcher.group(2)));
                }
            }
        }

        System.out.println("Concatenated string: " + stringBuilder);
    }

3. Printed Output

Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg); height:1083
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg
Extracted height value: 1083
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg); height:786
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg
Extracted height value: 786
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg); height:1416
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg
Extracted height value: 1416
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg); height:1020
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg
Extracted height value: 1020
Concatenated string: <p><img src=http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg data-width=750 data-height=1083 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg data-width=750 data-height=786 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg data-width=750 data-height=1416 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg data-width=750 data-height=1020 /></p>
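
As a possible variant of the code in section 2, the split step can be folded into the pattern itself by forbidding the match from crossing a closing brace, so that one pass over the whole stylesheet finds every rule. This is only a sketch under assumptions: the class name BackgroundImageExtractor, the method toImgTags, and the example.com URLs are hypothetical, and the pattern assumes that height appears after background-image inside the same rule, as it does in the JD.com data above. A rule containing line-height after the background image would also match, so the pattern may need tightening for other stylesheets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackgroundImageExtractor {
    // [^}]*? keeps the match inside one CSS rule: it cannot cross the closing brace.
    private static final Pattern RULE = Pattern.compile(
            "background-image:\\s*url\\((?<url>https?://[^)]+)\\)[^}]*?height:\\s*(?<height>\\d+)px");

    private static final String TEMPLATE = "<p><img src=%s data-width=750 data-height=%s /></p>";

    // Builds one <img> paragraph per rule that has both a background image and a height.
    public static String toImgTags(String css) {
        StringBuilder out = new StringBuilder();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            out.append(String.format(TEMPLATE, m.group("url"), m.group("height")));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet: the middle rule has no background image and is skipped.
        String css = ".a{width:750px; background-image:url(http://example.com/1.jpg); height:10px}"
                + ".b{width:750px; height:5px}"
                + ".c{background-image:url(http://example.com/2.jpg); height:20px}";
        System.out.println(toImgTags(css));
    }
}
```

Named groups such as (?<url>...) behave like numbered groups but are read back by name, which keeps the code readable if the pattern later gains more fields.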

 
