Without Access to the Code for Google DeepMind's Protein AI, This Lab Wrote Its Own Model

Google subsidiary DeepMind solved a fundamental problem in biological research but did not share its solution right away. So a team at the University of Washington set out to re-create it.

For biologists who study protein structure, the recent history of their field can be divided into two eras: before CASP14 (https://predictioncenter.org/casp14/), the 14th round of the biennial Critical Assessment of protein Structure Prediction, and after it.

For decades before that, scientists had chipped away, year after year, at the problem of predicting a protein's structure from the sequence of amino acids it contains. By the time CASP14 wrapped up in December 2020, researchers at DeepMind had cracked it.

DeepMind, a research company focused on deep learning (a branch of artificial intelligence), had previously made headlines for building an AI system that beat the world champion at Go. Its success in protein structure prediction, achieved with a neural network called AlphaFold2, marked the first time it had built a model that solved a real scientific problem.

Helping scientists figure out what proteins look like can advance research into the inner workings of cells and reveal ways to block the action of specific proteins, which in turn supports drug discovery. On July 15, the journal Nature published an unedited manuscript (https://www.nature.com/articles/s41586-021-03828-1) detailing how DeepMind's model works, and DeepMind publicly shared its code.

But in the seven months after that CASP, another team picked up the baton. In June, a month before DeepMind's manuscript appeared, a team led by David Baker, director of the Institute for Protein Design at the University of Washington, released their own protein structure prediction model (https://www.infoq.cn/article/OEcfh1vwtqEjJtNUGzSR).

For a month, this model, called RoseTTAFold, was the most successful protein prediction algorithm that other scientists could actually use. Although it did not reach the same peak performance as AlphaFold2, the team built a tool that lets researchers submit an amino acid sequence and get a prediction back without writing any code, putting the model within reach of even the least computer-savvy scientists.

A month later, on the same day Nature released DeepMind's unedited manuscript, the journal Science published the Baker lab's paper describing RoseTTAFold.

RoseTTAFold and AlphaFold2 are both complex, multilayered neural networks. Given a protein's amino acid sequence, they output a predicted 3D structure. Their designs share some interesting similarities, such as a "multitrack" architecture that lets them analyze different aspects of a protein's structure in separate streams.

These similarities are no coincidence. The University of Washington team designed RoseTTAFold using ideas from the DeepMind team's 15-minute talk at CASP, in which DeepMind outlined the innovations behind AlphaFold2. But Baker's team was also spurred on by the uncertainty that followed that brief talk: at the time, the DeepMind team gave no indication of when it would make this unprecedented technology available to scientists.

Some researchers worried that a private company might break with standard academic practice and not share its code with the wider community. "Everyone was stunned, there was a flood of press coverage, and then it was basically radio silence," Baker says. "You're in this strange position where there's been a major advance in your field, but you can't build on it."

Baker and Minkyung Baek, a postdoc in his lab, saw an opportunity. They might not have the code DeepMind's team had used to solve the protein structure problem, but they knew it could be done. They also knew which approach DeepMind had used to do it.

"Even then, David was saying, 'This is an existence proof. DeepMind has shown that these methods work,'" says John Moult, a professor at the Institute for Bioscience and Biotechnology Research at the University of Maryland, College Park, and an organizer of CASP. "That was enough for him."

Not knowing when, or whether, the DeepMind team would make its tool available to the structural biologists who wanted to use it, Baker and Baek decided to try building their own version.

Determining the three-dimensional structure of proteins is crucial to understanding the inner workings of cells, says Janet Thornton, director emeritus of the European Bioinformatics Institute. "DNA encodes all the information, but it doesn't actually do anything," she says. "All the work is done by proteins." Scientists use a variety of experimental techniques to try to determine protein structures, but sometimes the data simply aren't enough to give a clear answer.

Computer models that predict a protein's structure from its unique amino acid sequence can help researchers work out what such ambiguous data actually mean. For the past 27 years, CASP has given scientists a systematic way to evaluate how well their algorithms perform.

"We've been moving forward, but quite slowly," Thornton says. Of AlphaFold2, however, she says, "The improvement it brings is very significant; it's actually bigger than the progress we've accumulated over many years. So in that sense, it's a huge leap forward."

The Baker lab's own model had achieved the second-best performance at CASP14, which gave the team a solid starting point for reproducing DeepMind's approach. They systematically compared DeepMind team members' description of AlphaFold2 with their own method, and once they had identified DeepMind's most important advances, they set about building them, one by one, into a new model.

One key innovation they adopted was the idea of a multitrack network. Most neural network models process and analyze data along a single "track" (a path through the network) made up of a series of layers of simulated "neurons," each of which transforms the output of the previous layer and passes it on to the next. It's a bit like the game of telephone, in which each player hears a word from the previous player and whispers it to the next, except that in a neural network the information is gradually rearranged into a more useful form rather than gradually distorted, as it is in the game.

In AlphaFold2, DeepMind split different aspects of protein structure information into two separate tracks that feed some information back to each other, like two simultaneous games of telephone in which neighboring players in the two lines pass some information back and forth. With RoseTTAFold, Baker and Baek found that three tracks worked best.

"When you draw something complicated, you don't do it all at once," Baek says. "You start with a very rough sketch, then gradually add pieces and fill in details. Protein structure prediction is a bit like that process."

To see how RoseTTAFold performed in the real world, Baker and Baek reached out to structural biologists who were stuck on protein structure problems they couldn't solve. One evening at 7 pm, David Agard, a professor of biochemistry and biophysics at the University of California, San Francisco, sent them the amino acid sequence of a protein produced by bacteria infected with a particular virus. The predicted structure reached him at 1 am.

Within six hours, RoseTTAFold had solved a problem that had stumped Agard for two years. "We could actually see how it evolved from a combination of two bacterial enzymes, probably millions of years ago," Agard says. With that bottleneck behind them, Agard and his lab can now move on to studying how the protein works.

Although RoseTTAFold did not match AlphaFold2's level of performance, Baker and Baek knew it was time to release their tool to the world. "It was clearly still very useful, because these people were solving a lot of biological problems that had been open for a long time," Baker says. "We decided at that point, 'OK, it would be a good thing for the scientific community to know about this and be able to use it.'" On June 15, they released a tool that lets anyone easily run their model, along with a preprint (https://www.biorxiv.org/content/10.1101/2021.06.14.448402v1) of their forthcoming scientific paper.

Meanwhile, according to John Jumper, who leads the AlphaFold project at DeepMind, an in-depth scientific paper detailing the system was already under review at Nature, though Baker's team did not know it. DeepMind had submitted its manuscript to Nature on May 11.

At the time, the scientific community knew little about DeepMind's timeline. That changed three days after Baker's preprint appeared. On June 18, DeepMind CEO Demis Hassabis wrote on Twitter that the team had been working flat out to finish its full methods paper (then under review) and the accompanying open-source code, and to give the scientific community broad, free access to AlphaFold. "More very soon!" he added.

On July 15, the same day Baker's RoseTTAFold paper was published, Nature released DeepMind's unedited but peer-reviewed AlphaFold2 manuscript (https://www.nature.com/articles/s41586-021-03819-2). At the same time, DeepMind made the AlphaFold2 code freely available on GitHub (https://github.com/deepmind/alphafold). A week later, the team released a massive database (https://www.alphafold.ebi.ac.uk/) containing 350,000 protein structures predicted with their method. The revolutionary protein prediction tool, along with a huge trove of its predictions, had finally reached the scientific community.

According to Jumper, there was nothing unusual about why DeepMind's paper and code came out more than seven months after the CASP presentation. "We just weren't ready that day to open-source it or to put out this paper with all the details," he says. After submitting the paper in May, the team was working through peer review, and Jumper says they tried to get it published as quickly as possible. "Honestly, we've been trying to move as fast as we can," he says.

The DeepMind team's manuscript was published through Nature's Accelerated Article Preview process, which the journal has often used for Covid-19 papers. In a statement to WIRED, a Nature spokesperson wrote that the process is intended "to serve our authors and readers by making particularly noteworthy and time-sensitive peer-reviewed research available as quickly as possible."

Jumper and Pushmeet Kohli, head of DeepMind's science team, weighed in on whether Baker's paper had affected the timing of their Nature publication. "From our perspective, we submitted this paper in May, so in a sense its publication timing was already out of our hands," Kohli says.

But CASP organizer Moult thinks the University of Washington team's work may have helped DeepMind's scientists persuade their parent company to release their research freely, and sooner. "I know them; they're excellent scientists, and I think they would want to be as open as possible," Moult says. "There must be some internal tension, because it's a commercial enterprise, and ultimately it has to make money somehow." DeepMind's parent company, Alphabet, is the world's fourth most valuable company by market capitalization.

Hassabis argues that releasing AlphaFold2 benefits both the scientific community and Alphabet. "This is all open science; we're giving it to all of humanity with no strings attached: the system, the code, and the database are all public," he told WIRED in an interview. Asked whether the company had discussed keeping the code private for commercial reasons, he said: "That's a great question, and it comes down to how we deliver value. Value can be delivered in many different ways, right? The commercial route is obviously one way, but reputation is also an important route."

Baker was quick to praise the DeepMind team's paper and their decision to release the code without reservation. In a sense, he says, RoseTTAFold was a hedge against the possibility that DeepMind would not act in the spirit of scientific collaboration. "If they hadn't been so enlightened and had decided not to release the code, at least the world would still have had a starting point," he says.

Still, he believes that if DeepMind had released its information earlier, his team could have pushed AlphaFold2 further, or adapted it to the problem of designing artificial proteins, the main focus of Baker's lab. "There's no question that if, say, in early December, right after CASP, they had said, 'Here's our code, this is how we did it,' we would have gotten much further," Baker says.

For some practical applications of protein structure prediction, time can be critical. For example, knowing the three-dimensional structure of proteins that are essential to a pathogen's survival can help scientists develop drugs against it. Such applications could even help fight a pandemic: in August 2020, for example, DeepMind used a version of AlphaFold2 to predict the structures of several SARS-CoV-2 proteins.

Baker believes academia and industry will need to share more and more information. Problems in AI take a great deal of time and resources to solve, and companies like DeepMind have access to people and computing power that university labs can only dream of. "It's almost certain that industry is going to continue to make a lot of major advances, and I think that trend is only going to accelerate," Baker says. "These companies will face a lot of internal pressure over whether to make these advances public, as DeepMind did, or to try to commercialize them."

Original article:

https://www.wired.com/story/without-code-for-deepminds-protein-ai-this-lab-wrote-its-own/