Solving the problem of poor data circulation: can secure multi-party computation really deliver?

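{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data has replaced “oil” as the most valuable resource in today's world, yet enterprises, governments and other organizations are unable to realize its full value. The main reason is that data does not circulate smoothly. According to the Privacy-Preserving Computation Technology Research Report (2020) published by the China Academy of Information and Communications Technology (CAICT), there are three main causes: "},{"type":"text","marks":[{"type":"strong"}],"text":"the pervasiveness of “data silos”, increasingly strict global data-compliance regulation, and frequent privacy breaches"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In practice, data users need to bring together data from multiple parties and build models for data mining, while data owners, worried about security and confidentiality, are reluctant to share. As a result, companies and institutions struggle to obtain each other's data for joint analysis or modeling, and the value of the data goes unrealized. This is what poor data circulation looks like in reality, and it has become a major constraint on the development of the big data industry."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"A new way to tackle the problem"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Privacy-preserving computation is regarded as an effective way to solve the problem of poor data circulation. What is it? The Privacy-Preserving Computation Technology Research Report (2020) explains that privacy-preserving computation is not a single technology. It is an interdisciplinary technical system that combines artificial intelligence, cryptography, data science and many other fields to keep "},{"type":"text","marks":[{"type":"strong"}],"text":"data “usable but invisible”"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are currently five main privacy-preserving computation technologies: secure multi-party computation, federated learning, confidential computing, differential privacy and homomorphic encryption. Of these, secure multi-party computation has matured rapidly in recent years. It supports a growing number of application scenarios and is attracting considerable attention."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Secure multi-party computation"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Any discussion of secure multi-party computation has to begin with a famous puzzle, the millionaires' problem:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Two millionaires meet on the street and want to find out which of them is richer, but for privacy reasons neither wants the other to know how much he is worth. Without the help of a third party, how can they determine who has more?"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The problem was posed and solved in 1982 by Andrew Chi-Chih Yao (Turing Award laureate and dean of the Institute for Interdisciplinary Information Sciences at Tsinghua University). He also proved mathematically that, in principle, any computation that can be performed on plaintext data can be performed directly on ciphertext and yield exactly the same result. This established the theoretical framework of secure multi-party computation (MPC)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Put simply, secure multi-party computation addresses “collaborative computation among a group of mutually distrustful parties, while protecting their private information and without a trusted third party”."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huang Bin, vice president of Huakong Tsingjiao, believes secure multi-party computation can make data “usable but invisible” and so allow it to circulate. After Yao and others proposed MPC in the 1980s, however, it remained largely a subject of academic research for many years. At the time, computing under MPC took roughly 100,000 to 1,000,000 times as long as computing on unencrypted data, which was unacceptable in engineering practice. In recent years, better protocols and greater computing power have cut that overhead from several hundred thousand times to less than 100 times."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huang said the performance of secure multi-party computation can now broadly meet the requirements of large-scale commercial use. Some vendors in the industry have chosen a point-to-point computing approach that does not separate data nodes from compute nodes. In effect this is a two-party architecture, and it scales poorly."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“With that design, only A and B can compute together. If C wants to join, it can't. We separated data functions from compute functions from the start, so adding a new data node does not affect the architecture. Our compute nodes can also scale out, just like Hadoop and Spark,” he said."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"A data industry veteran"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A seasoned IT professional, Huang Bin has worked in the data field throughout his career and has a deep understanding of data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/71305806492209bfe38f0dad1ff18183.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Huang Bin, vice president of Huakong Tsingjiao"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He joined Huawei after graduating in 2000 and developed network-related software systems. Toward the end of his time at Huawei, his job was to optimize networks using data collected from them. In other words, he used data to drive network configuration and keep the network balanced. In 2018 he left Huawei for Alibaba, where he worked on Industrial Brain and City Brain projects. The core of the Industrial Brain work was likewise to collect data from machines and equipment and use it to control that equipment or adjust production plans."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After years of working with data, he concluded that “without data, you are making decisions on gut feeling.” Even when data exists, companies have data silos, and departmental boundaries make them hard to break down. In Huang's view, data silos arise in two situations. “Either the other departments simply don't want to work with you, or they are willing but don't know how, because of compliance requirements. For example, bank data isn't allowed to leave the bank.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In other cases, the business can be done and the data can flow, but the methods are primitive. For data exchange between government agencies, for instance, one method works like this: “Party A burns the data onto a disc and sends it to Party B through something like a secure channel. Party B receives the disc and signs for it.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By chance, Huang came to know Huakong Tsingjiao and learned that it uses secure multi-party computation to solve the problem of poor data circulation. Huakong Tsingjiao was founded in June 2018. Zhang Xudong is its CEO, and Xu Wei, a tenured associate professor at Tsinghua University's Institute for Interdisciplinary Information Sciences, is its chief scientist."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He joined Huakong Tsingjiao in April 2020 as vice president, responsible for product R&D and engineering."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Challenges in deploying secure multi-party computation"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Secure multi-party computation aims to solve the security and confidentiality problems of data circulation and sharing. Without a trusted third party, MPC relies on techniques such as homomorphic encryption, garbled circuits, oblivious transfer and secret sharing to keep each participant's inputs private and to guarantee that the computed result is correct."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since the 1980s, secure multi-party computation has gone through a theoretical research stage, a laboratory stage and an early application stage. It is now in a stage of large-scale development."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huang said that MPC theory and MPC engineering practice are both still advancing. A cryptographic algorithm, for example, may have been proven mathematically long ago, yet much work remains before it can be deployed in practice."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"How can high throughput and low latency be achieved at the same time?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One major difficulty of secure multi-party computation lies in engineering. Huang noted that the main engineering challenge is delivering high throughput and low latency at the same time. The system must handle queries, statistics and model training over large data volumes, and it must also serve real-time applications such as facial feature matching. “Architecturally, this requires a lot of optimization of computation, transmission, serialization and so on, while keeping the data secure.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Taking MPC from theory to deployment involves two stages. The first goes from theory to a research prototype in the lab. The second goes from the lab prototype to real-world applications."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“The second stage is harder,” he admitted, “because it takes many more people. In real deployments you also have to think about commercialization, spreading the technology and educating the market. On top of that, there are many engineering difficulties. In the lab, you build a prototype, run it on 100 records, or even just get 1+1=2 working, and you're done. Now we have to compute over hundreds of millions of records and test them together with customers. Adding up 100 numbers is easy; scaling that to hundreds of millions of records is hard.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once you process hundreds of millions of records, you have to do it the big-data way. The amount of machine resources to schedule is of a completely different order, and complexity rises sharply."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Solving this engineering problem means “doing MPC data processing well at the scale of hundreds of millions of records.” That takes two things. The first is algorithm optimization, for example splitting mixed plaintext and ciphertext computation into separate stages during the MapReduce phase. The second is good scheduling: coordinating reading, encryption, transmission, computation, decryption and storage so that these stages connect seamlessly, with security and fault tolerance, and computing resources are fully used."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"How is scheduling done?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With small data volumes, scheduling is not a problem. In practice, however, a single data exchange they handle can involve hundreds of millions of records. One example is Party A and Party B each holding several hundred million records and computing their intersection."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the data volume becomes very large, scheduling becomes a challenge. “As the workload grows, you add machines, but beyond a certain point the scheduler may not be able to keep up. So at that point you still have to optimize the scheduling itself.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Huang pointed out that there are several bottlenecks. The biggest is the disk, and the second is the network. Reading data from disk, transferring it over the network and computing on it also have to be coordinated. “These three actions are critical: reading data, transferring data and computing on data.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ideal for the scheduler, he said, is that “as soon as one side finishes computing, the data has already arrived on the other side, with no idle time in between. The bottleneck then becomes disk reads, because reading and writing disk is the slowest, the network is next, and computation is the fastest. Ideally, just as a compute node finishes one batch, the next batch is already waiting for it. That way the system is never idle.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Huang's view, business demand is the key to the successful adoption of secure multi-party computation. Without business drivers, the technology will end up as “a moon in the water, a flower in the mirror”, an illusion. In fact, MPC has already been deployed in fields such as finance and government services."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take multi-lender borrowing as an example. Multinational banks operate worldwide, with business in Asia, Europe, North America and elsewhere, and these regions impose strict data security and compliance requirements. Suppose someone brings a bill of lading to the counter of a multinational bank's Hong Kong branch and pledges it to obtain a loan. A few days later, he might go to Europe and do the same thing with the same bill of lading. The bill could therefore be pledged more than once. To prevent this, the bank needs some way to check. Traditionally this might be done by phone, which is very inefficient, and secure multi-party computation can be used instead."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Final thoughts"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the DT (data technology) era, the conflict between using data and protecting privacy is becoming increasingly acute, and resolving it will be a long-term challenge. In a sense, secure multi-party computation, with both theoretical and practical value, offers an important technical path toward a solution."}]}
]}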
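The article describes MPC as letting mutually distrustful parties compute a joint result without revealing their inputs, and it names secret sharing as one of the building blocks. The following is a minimal Python sketch of additive secret sharing over a prime field. It assumes three hypothetical parties who want the total of their private values. The modulus, function names and inputs are illustrative assumptions, not Huakong Tsingjiao's protocol. The sketch also assumes honest-but-curious parties and secure point-to-point channels, and it does not implement either.

```python
import secrets

# Toy field modulus (assumption): the Mersenne prime 2**61 - 1, large enough
# that the demo sums never wrap around.
PRIME = 2**61 - 1


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo PRIME.

    Any n-1 of the shares are uniformly random and reveal nothing about the secret.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares by adding them modulo PRIME."""
    return sum(shares) % PRIME


def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private values.

    Each party splits its value and sends share j to party j. Every party then
    adds up the shares it holds locally; only those partial sums are revealed.
    """
    n = len(private_values)
    # all_shares[i][j] is the share of party i's value that party j holds.
    all_shares = [share(v, n) for v in private_values]
    # Party j works only on shares, never on another party's raw input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    incomes = [120, 340, 75]  # hypothetical private inputs of parties A, B and C
    print(secure_sum(incomes))  # 535
```

Addition is used here because it can be carried out share by share. Comparisons such as the millionaires' problem need more elaborate protocols, such as garbled circuits, so this sketch illustrates the principle rather than a full MPC engine.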
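Huang describes the ideal scheduler as one where the next batch arrives just as the compute node finishes the current one, so that disk reads become the only bottleneck. This is a classic pipelining pattern. Below is a minimal Python sketch of it. Its three hypothetical stages (reading, transferring, computing) are simulated with `time.sleep` and connected by bounded queues, so the stages overlap instead of running one after another. The stage latencies and names are assumptions for illustration. Nothing is encrypted, and this is not a model of Huakong Tsingjiao's actual scheduler.

```python
import queue
import threading
import time

SENTINEL = None  # marks the end of the batch stream


def read_batches(n_batches: int, out_q: queue.Queue) -> None:
    """Stage 1 (slowest): simulate reading fixed-size batches from disk."""
    for i in range(n_batches):
        time.sleep(0.03)  # hypothetical disk latency
        out_q.put(list(range(i * 4, i * 4 + 4)))
    out_q.put(SENTINEL)


def transfer(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Stage 2: simulate sending each batch over the network."""
    while (batch := in_q.get()) is not SENTINEL:
        time.sleep(0.02)  # hypothetical network latency
        out_q.put(batch)
    out_q.put(SENTINEL)


def compute(in_q: queue.Queue, results: list) -> None:
    """Stage 3 (fastest): compute on each batch as soon as it arrives."""
    while (batch := in_q.get()) is not SENTINEL:
        time.sleep(0.01)  # hypothetical compute time
        results.append(sum(batch))


def run_pipeline(n_batches: int) -> tuple[list, float]:
    """Run the three stages concurrently and return (results, elapsed seconds)."""
    # Small bounded queues keep only a few batches in flight between stages,
    # bounding memory while still letting the stages overlap.
    q1: queue.Queue = queue.Queue(maxsize=2)
    q2: queue.Queue = queue.Queue(maxsize=2)
    results: list = []
    threads = [
        threading.Thread(target=read_batches, args=(n_batches, q1)),
        threading.Thread(target=transfer, args=(q1, q2)),
        threading.Thread(target=compute, args=(q2, results)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_pipeline(10)
    # Running the stages back to back would take about 10 * (0.03 + 0.02 + 0.01) = 0.6 s.
    # With overlap, the total approaches 10 * 0.03 s, the time spent in the disk stage.
    print(len(results), f"{elapsed:.2f}s")
```

The design keeps one thread per stage and lets the bounded queues provide back-pressure. A slow disk stage therefore sets the overall pace, while the network and compute stages wait only briefly for each batch.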
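Two tasks in the article come down to computing an intersection without disclosing the underlying records. One is Party A and Party B each holding large record sets and computing their overlap. The other is two bank branches checking whether the same bill of lading has been pledged twice. A common building block for this is private set intersection (PSI). Below is a toy Python sketch of Diffie-Hellman-style PSI, which relies on the commutativity $(h^a)^b = (h^b)^a \bmod p$.

The article does not say which protocol Huakong Tsingjiao uses. The modulus, the SHA-256 hash-to-group mapping and the bill-of-lading IDs below are illustrative assumptions, and the parameters are far too weak for real use.

```python
import hashlib
import secrets
from collections.abc import Iterable

# Toy modulus (assumption): the Mersenne prime 2**127 - 1. It is chosen for
# readability, not security; real systems use vetted elliptic-curve groups.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero residue mod P via SHA-256 (toy construction)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values: Iterable[int], key: int) -> list[int]:
    """Raise each group element to a party's secret exponent."""
    return [pow(v, key, P) for v in values]


def psi(items_a: list[str], items_b: list[str]) -> set[str]:
    """Return the items of A that also appear in B, as learned by party A.

    Both parties are simulated in one process; only blinded values would
    cross the network, never raw items.
    """
    key_a = secrets.randbelow(P - 2) + 1  # party A's secret exponent
    key_b = secrets.randbelow(P - 2) + 1  # party B's secret exponent

    # A -> B: A's hashed items blinded with key_a.
    a_blinded = blind((hash_to_group(x) for x in items_a), key_a)
    # B -> A: A's values re-blinded with key_b (order preserved) ...
    a_double = blind(a_blinded, key_b)
    # ... plus B's own hashed items blinded with key_b.
    b_blinded = blind((hash_to_group(y) for y in items_b), key_b)
    # A locally: blind B's values with key_a; equal items now give equal H^(ab).
    b_double = set(blind(b_blinded, key_a))
    return {x for x, v in zip(items_a, a_double) if v in b_double}


if __name__ == "__main__":
    hk_branch = ["BL-1001", "BL-1002", "BL-1003"]  # hypothetical pledged bills, Hong Kong
    eu_branch = ["BL-2001", "BL-1002"]  # hypothetical pledged bills, Europe
    print(psi(hk_branch, eu_branch))  # {'BL-1002'}
```

In this semi-honest variant, party A learns the intersection and each side learns the other's set size. Running it over hundreds of millions of records, as the article describes, would also require the batching and scheduling work discussed above.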