Technical Exploration of Document Content Structuring at Baidu Wenku

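{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Overview:","attrs":{}},{"type":"text","text":" This article outlines how Baidu Wenku has transcoded and rendered documents of various types over time. The early fixed-layout data gave PC users a good reading experience across document types; as business needs evolved, the reading experience on mobile needed to improve. In converting fixed-layout data to flow data, simple content structuring was enough to reflow PDF data on mobile. Parsing the underlying OOXML data and structuring the content in finer detail produced good reflow results for Word documents on mobile. Extracting structured metadata from chart images, where none existed before, opens up further possibilities for users to interact with documents.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"Full text: 3,724 characters; estimated reading time: 9 minutes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"I. How Baidu Wenku Renders Documents of Different Types","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wenku hosts billions of documents in more than ten common office formats, including Word, PPT, Excel, and PDF. Its core foundational services are document transcoding and rendering.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To unify transcoding and rendering across these formats without depending on each format's native application, we settled, after a technical survey, on the following approach: convert any document to PDF, parse the openly specified PDF data format, process the result into Wenku's own document format, and then lay it out and render it on PC and mobile.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PC rendering uses xreader fixed-layout data derived from the PDF. In fixed-layout data, every element (text, image) carries its coordinates, its width and height, and other descriptive information. Each text fragment, image, and vector element is displayed at a fixed position on the page according to its coordinates. Fixed-layout data is therefore well suited to rendering documents at their original proportions on PC, and it reproduces the original layout faithfully.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","te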
xt":"Mobile screens are generally small. If fixed-layout data is simply scaled down proportionally, the text and formulas on the page become too small to read comfortably, as shown in Figure 1. Users can zoom in, but that clearly adds to their interaction cost.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1c/1c45ceee6bfb8ee760a5b4ad766874af.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1. Fixed-layout data scaled down proportionally on mobile","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A better approach is to convert the fixed-layout data into flow data and reflow it for each mobile screen size. Whereas every element in fixed-layout data carries page coordinates, flow data has none. Instead it carries structural information such as sections, columns, paragraphs, formulas, and tables. This structure ties the basic text and images together into structured document content that can be reflowed adaptively for any screen size.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"II. Technical Exploration of Document Content Structuring","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.1 
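Retype Flow Data (Based on xreader Fixed-Layout Data)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wenku's early “fixed-layout to flow” approach traverses every element in the xreader fixed-layout data and extracts its coordinates x, y and its width and height w, h. Elements with similar y values are treated as belonging to the same line; within such a line, adjacent text elements are joined, and adjacent text and images are connected, based on x and w. This yields the line structures for the whole page. Adjacent lines are then joined into paragraphs based on each line's y and h. A paragraph is judged to end based on signals such as the current line's x+w falling short of the page width, the line ending with certain punctuation marks, and the next line's x indicating a first-line indent.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is the general idea behind “fixed-layout to flow”. When the page structure is more complex, as in papers and journal articles with many multi-column layouts, text wrapped around figures, tables, footnotes, and endnotes, a range-detection preprocessing step is also needed: the page is analyzed and cut into multiple range structures, and the general procedure is then applied within each range to get good results.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach extracts structural information such as paragraphs and lines from fixed-layout data, which helps flow layout. However, some cases show that this information is not always accurate: paragraphs get broken by forced line breaks, inline images end up in the wrong position, and so on. The approach is also weak at extracting complex structures such as formulas, charts, and tables.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A PDF document contains only element coordinates relative to the page and lacks content structure. Office documents such as Word files, by contrast, do contain structural information in the source; it is simply lost when Word is transcoded to PDF. Extracting structural information from Word documents, which make up a large share of Wenku, and improving their flow layout on mobile therefore became an important milestone goal.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.2 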
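BDJson Flow Data (Based on OOXML Data)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Microsoft Office has a long history, and Word exists in many versions. Simplifying, there are two formats: the doc binary compound document format and the docx OOXML format. The doc binary format is complex and proprietary to Microsoft, so it is costly to parse and transcode. To keep things simple, doc files are first converted to docx; the core of the approach is then to parse the docx format, transcode it, and produce flow data in BDJson format.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OOXML is an open standard based on zip + XML. Plain text and its character and paragraph properties are easy to read and parse, and the format natively carries structural information such as sections, paragraphs, and tables, which helps flow layout. Given the layout requirements at hand, and anticipating future online Word editing scenarios, the design parses documents precisely at the semantic level, extracts content and properties, and builds Office data structures.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data structures such as sections and paragraphs follow the OOXML standard and can be assembled directly from the data parsed out of Document.xml. For headers, footers, footnotes, and endnotes, Document.xml stores only references and basic information; the actual content must be read from other XML files, assembled according to the reference mapping, and inserted at the right position in the body.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some data structures need adaptation because Office and HTML model them differently. Bullets and numbering are a common example: Word supports 9 levels, each with its own character properties, paragraph properties, tab settings, picture bullets, and so on, all of which must be mapped onto HTML's simple ol and ul structures. Merged table cells are another: row spans, column spans, and the hiding of merged-away cells differ considerably between Office and HTML, so the whole table has to be traversed and the spans computed before transcoding.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, data structures relevant to online editing are also extracted and transcoded. For example, the several formula formats Word supports (field formulas, MathType formulas, and OMath formulas) are all transcoded to LaTeX. This not only simplifies later editing but also lets formulas match the font and size of the body text, giving a more uniform overall layout.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"tex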
t","text":"With these techniques in place, the structural information in Word documents is extracted with high fidelity, and the existing transcoding and rendering pipeline is streamlined, as shown in Figure 2. With this structural information, Word documents can be reflowed adaptively on mobile, which greatly improves how they display, as shown in Figure 3.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0e631c6f51c595504816491cbd5a0371.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 2. Document transcoding and rendering (fixed-layout and flow)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/40/40eef22ece2453fc6efb6e7fdeae2da6.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 3. Flow layout on mobile and LaTeX rendering of formulas","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.3 
Extracting Structured Data from Chart Images (or PDF Data)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Certain kinds of PDF documents, such as papers, journals, and financial research reports, often contain charts. These charts usually appear as unstructured PDF data, images, or background images. Extracting the chart information and importing the underlying data into Excel, so that users can edit it, inspect it, and generate new charts, has considerable product value.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Existing tools generally require the user to manually screenshot the range containing the chart, pick the axis origin by hand, enter the axis scale values, trace the chart outline, and carry out other tedious steps, and the extracted data is still often inaccurate.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The approach to extracting structured metadata from chart images or unstructured PDF data can be reduced to two main modules: range detection and metadata extraction.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3.1 Range Detection","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take a PDF document as an example. First, traverse all elements on the page and draw bounding boxes around text spans, images, and so on. Raw spans are merged with their neighbors by y and x into larger fragments, which are then aggregated into lines. Each line region is checked for validity based on information such as the amount of text and its position, and some lines can be discarded.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, search the remaining space for blank regions to use as candidate ranges. Read the page setup to determine the content area of the page. Traversing from top to bottom, first identify ranges where a full row is blank. Then go line by line: if there is free space at the two ends of a line, add a range at each end. Test the current line against the existing ranges; where they intersect, remove the intersecting part, which can split an existing range into several new ones. This produces the set of candidate ranges, shown in purple in Figure 4.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1f/1ff7690dd68f8688c49043cd1edc40af.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 4. Candidate range set","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traverse the candidate range set and merge and regroup adjacent ranges according to their position, width, and height, yielding a new set of ranges, as shown in Figure 5.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e5/e5945d82f15b61a9807a41e58f44b3d2.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 5. 
Candidate range set (after merging)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Filter the ranges (by rectangle size, position, the text lines before and after, the amount of text found by OCR, and similar information) and trim the white margins from their edges. This gives the final valid ranges, as shown in Figure 6.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/45/45afedaf2ad4b5b0e52c5cd81e639b76.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 6. Candidate range set (after filtering)","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3.2 Metadata Extraction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The range set produced by the range detection module feeds the next step, metadata extraction. Not every range contains a chart; some hold a plain image, a flowchart, and so on.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, using the range information, crop the corresponding image from the current page and analyze it to make a preliminary decision on whether it is a chart and, if so, a preliminary classification of its type, such as bar chart or pie chart, as shown in Figure 7.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7a82fc0efe4cb4c547ef5e5bfdcaa129.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 7. 
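Charts presented as images","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take a bar chart as an example. After preprocessing with pixel analysis and an edge-detection operator, candidate lines for the x-axis and y-axis are identified and then screened by length, position, and similar information to obtain the final x and y axes, which form the coordinate system. The tick marks on the axes are then scanned. There is considerable noise at this stage and the error can be large, so pixel comparison and cross-validation are used to fill in the tick marks on each axis.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A complete and correct coordinate system is important for the chart metadata extraction that follows. Based on the coordinate system, the image is cut into multiple subRanges, and OCR is run on the small image in each subRange to get its text. The text is assembled into the chart's data series and individual data points, and after a series of corrections and regrouping, the metadata for the whole chart is obtained, as shown in Figure 8.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/02/025cd21703fe5994caa949a3307e8ec6.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 8. 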
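Metadata extracted from a chart image","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"III. Future Directions for Document Content Structuring","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the business grows, the question of how to give users better document rendering and interaction, beyond whole-page display, places higher demands on transcoding and rendering technology. The foundation for all of this is extracting fine-grained document elements and further recognizing and extracting the structure of document content.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Hiring:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Baidu Wenku R&D: the team is dedicated to building an industry-leading knowledge-sharing platform for interactive online documents, audio, and more. Over the past ten years it has gathered more than 900 million high-value documents, nearly 400,000 verified authors, and 20,000 professional institutions, and it has become a leading document and knowledge service platform in China. Baidu Wenku holds to the goal of “letting everyone improve themselves on an equal footing” and works to share knowledge wherever it is needed.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We are hiring iOS & 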
Android engineers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"To apply, follow the Baidu Geek Talk (百度Geek說) official account and tap the referral entry in its menu bar.","attrs":{}}]}]}
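Section 2.1 describes grouping elements into lines by similar y values and then joining lines into paragraphs. The Python sketch below illustrates that idea under stated assumptions. The `Element` and `Line` types, the tolerance `y_tol`, the `indent` and `slack` thresholds, the `END_PUNCT` set, and the exact way the three end-of-paragraph signals are combined are all hypothetical choices made for illustration; the article does not specify them, and this is not Wenku's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    x: float
    y: float
    w: float
    h: float
    text: str


@dataclass
class Line:
    x: float
    y: float
    w: float
    h: float
    text: str


# Hypothetical set of marks that may close a paragraph.
END_PUNCT = tuple("。!?.!?")


def group_lines(elements: List[Element], y_tol: float = 2.0) -> List[Line]:
    """Group elements with similar y into lines, then order each line by x."""
    rows: List[List[Element]] = []
    for el in sorted(elements, key=lambda e: (e.y, e.x)):
        if rows and abs(rows[-1][0].y - el.y) <= y_tol:
            rows[-1].append(el)  # close enough in y: same line
        else:
            rows.append([el])    # otherwise start a new line
    lines = []
    for row in rows:
        row.sort(key=lambda e: e.x)
        left = row[0].x
        right = max(e.x + e.w for e in row)
        lines.append(Line(left, row[0].y, right - left,
                          max(e.h for e in row),
                          "".join(e.text for e in row)))
    return lines


def group_paragraphs(lines: List[Line], page_left: float, page_right: float,
                     indent: float = 10.0, slack: float = 5.0) -> List[str]:
    """Join consecutive lines into paragraphs.

    Assumed rule: a paragraph ends at the last line, when the next line is
    indented, or when the current line is short AND ends with punctuation.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    for i, line in enumerate(lines):
        current.append(line.text)
        nxt: Optional[Line] = lines[i + 1] if i + 1 < len(lines) else None
        short = line.x + line.w < page_right - slack
        ends_sentence = line.text.endswith(END_PUNCT)
        next_indented = nxt is not None and nxt.x - page_left >= indent
        if nxt is None or next_indented or (short and ends_sentence):
            paragraphs.append("".join(current))
            current = []
    return paragraphs
```

A real pipeline would also use each line's y and h to detect vertical gaps between paragraphs, and would first run the range detection described in Section 2.1 on multi-column pages; both are left out here to keep the sketch short.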