# Google Releases RLDS, an Ecosystem to Generate, Share, and Use Datasets in Reinforcement Learning

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"大多數"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning","title":null,"type":null},"content":[{"type":"text","text":"強化學習"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"和"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Sequential_decision_making","title":null,"type":null},"content":[{"type":"text","text":"序列決策算法"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"都需要智能體與環境的大量交互生成訓練數據,以獲得最佳性能。這種方法效率很低,尤其是在很難做到這種交互的情況下,比如用真實的機器人來收集數據,或者和人類專家進行交互。要緩解這個問題,可以重用外部的知識源,比如 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/deepmind-research\/tree\/master\/rl_unplugged#atari-dataset","title":null,"type":null},"content":[{"type":"text","text":"RL Unplugged Atari 數據集"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",其中包括玩 Atari 遊戲的合成智能體的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"但是,由於這些數據集非常少,而且序列決策生成數據的任務和方式多種多樣(例如,專家數據或噪聲演示,人類或合成交互,等等),因此,整個社區要用一組很少的、具有代表性的數據集進行工作,就不太現實,甚至不可取。另外,有些數據集被髮行成僅適合特定算法的形式,因此研究者不能重用這些數據集。比如,某些數據集並沒有包含與環境的交互序列,但卻提供了一組讓我們無法重構其時間關係的隨機交互,其他數據集則會以稍有差異的方式發行,從而導致細微的誤差,非常難以識別。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"基於此,我們提出了"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/abs\/2111.02767","title":null,"type":null},"content":[{"type":"text","text":"強化學習數據集"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(Reinforcement Learning 
Datasets,RLDS),併發布了一套用於記錄、重放、操作、註釋和共享數據的"},{"type":"link","attrs":{"href":"http:\/\/github.com\/google-research\/rlds","title":null,"type":null},"content":[{"type":"text","text":"工具"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",用於序列決策制定,其中包括"},{"type":"link","attrs":{"href":"https:\/\/ai.googleblog.com\/2020\/08\/tackling-open-challenges-in-offline.html","title":null,"type":null},"content":[{"type":"text","text":"離線強化學習"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"、"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Apprenticeship_learning","title":null,"type":null},"content":[{"type":"text","text":"學徒學習"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(Apprenticeship learning)或"},{"type":"link","attrs":{"href":"https:\/\/ai.googleblog.com\/2020\/09\/imitation-learning-in-low-data-regime.html","title":null,"type":null},"content":[{"type":"text","text":"模仿學習"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(imitation learning)。RLDS 可以方便地共享數據集,而不會損失任何信息(比如,保持交互的序列,而非隨機化),而且獨立於底層原始格式,從而允許用戶在更廣泛的任務上對新的算法進行快速測試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"另外,RLDS 提供了收集由合成智能體("},{"type":"link","attrs":{"href":"http:\/\/github.com\/deepmind\/envlogger","title":null,"type":null},"content":[{"type":"text","text":"EnvLogger"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")或人類("},{"type":"link","attrs":{"href":"http:\/\/github.com\/google-research\/rlds-creator","title":null,"type":null},"content":[{"type":"text","text":"RLDS Creator"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")生成的數據的工具,以及對收集到的數據進行檢查與處理的工具。最後,通過與 "},{"type":"link","attrs":{"href":"https:\/\/www.tensorflow.org\/datasets","title":null,"type":null},"content":[{"type":"text","text":"TensorFlow Dataset"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(TFDS)集成,有助於加強與研究界共享強化學習數據集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"通過 
## Dataset Structure

Algorithms in RL, offline RL, or imitation learning may consume data in very different formats, and when the format of a dataset is unclear, bugs caused by misinterpreting the underlying data are easy to introduce.

RLDS makes the data format explicit by defining the contents and meaning of each field of a dataset, and it provides tools to re-align and transform the data to fit the format required by any algorithm implementation. To define the data format, RLDS takes advantage of the inherently standard structure of RL datasets: sequences (episodes) of [interactions (steps) between an agent and an environment](https://commons.wikimedia.org/wiki/File:Reinforcement_learning_diagram.svg#/media/File:Reinforcement_learning_diagram.svg), where the agent can be a rule-based or automated controller, a formal planner, a human, an animal, or a combination of these.

Each step contains the current observation, the action applied to that observation, the reward obtained as a result of applying the action, and the [discount](https://en.wikipedia.org/wiki/Q-learning#Discount_factor) obtained together with the reward. Steps also include extra information indicating whether the step is the first or last of its episode, and whether the observation corresponds to a terminal state. Each step and episode can additionally carry custom metadata for storing environment-related or model-related data.
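To make this layout concrete, the following minimal sketch (our illustration, not code shipped with RLDS) builds a single dummy episode as a dictionary holding episode-level metadata plus a nested `tf.data.Dataset` of steps, using the step fields described above; the `episode_id` metadata field is hypothetical:

```python
import tensorflow as tf

def make_dummy_episode(num_steps: int = 3):
    """Builds one illustrative episode with RLDS-style step fields."""
    steps = tf.data.Dataset.from_tensor_slices({
        'observation': tf.random.uniform((num_steps, 2)),  # current observation
        'action': tf.random.uniform((num_steps, 1)),       # action applied to it
        'reward': tf.zeros(num_steps),                     # reward for that action
        'discount': tf.ones(num_steps),                    # discount paired with the reward
        'is_first': tf.constant([True] + [False] * (num_steps - 1)),
        'is_last': tf.constant([False] * (num_steps - 1) + [True]),
        'is_terminal': tf.constant([False] * (num_steps - 1) + [True]),
    })
    # Episode-level custom metadata; 'episode_id' is an illustrative field.
    return {'episode_id': tf.constant(0, dtype=tf.int64), 'steps': steps}

for step in make_dummy_episode()['steps']:
    print(step['is_first'].numpy(), float(step['reward']))
```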
"},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/envlogger","title":null,"type":null},"content":[{"type":"text","text":"EnvLogger"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",這是一個軟件庫,以"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Open_format","title":null,"type":null},"content":[{"type":"text","text":"開放文檔格式"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"記錄智能體與環境的交互。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"EnvLogger 是一種環境包裝器,可以將智能體與環境的交互記錄下來,並將它們存儲在一個較長的時間內。雖然 EnvLogger 無縫地集成在 RLDS 生態系統中,但是我們將其設計爲可作爲一個獨立的庫使用,以提高模塊化程度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"與大多數機器學習環境一樣,爲強化學習收集人類數據是一個既費時又費力的過程。解決這個問題的常見方法是使用衆包,它要求用戶能夠輕鬆地訪問可能難以擴展到大量參與者的環境。在 RLDS 生態系統中,我們發行了一個基於 Web 的工具,名爲 "},{"type":"link","attrs":{"href":"http:\/\/github.com\/google-research\/rlds-creator","title":null,"type":null},"content":[{"type":"text","text":"RLDS Creator"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",該工具可以通過瀏覽器爲任何人類可控制的環境提供一個通用接口。用戶可以與環境進行交互,例如,在網上玩 Atari 遊戲,交互會被記錄和存儲,以便以後可以通過 RLDS 加載回來,用於分析或訓練智能體。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"共享數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"數據集的製作往往很煩瑣,與更廣泛的研究社區共享,不僅可以重現之前的實驗,還可以加快研究速度,因爲它更容易在一系列場景中運行和驗證新算法。爲此,RLDS 與 "},{"type":"link","attrs":{"href":"https:\/\/www.tensorflow.org\/datasets","title":null,"type":null},"content":[{"type":"text","text":"TensorFlow Datasets"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(TFDS)集成,後者是一個現有的機器學習社區內共享數據集的庫。一旦數據集成爲 TFDS 的一部分,它就會被索引到全球 TFDS 目錄中,這樣,所有研究人員都可以通過使用 tfds.load(name_of_dataset) 來訪問,並且可以將數據以 TensorFlow 或 "},{"type":"link","attrs":{"href":"https:\/\/numpy.org\/","title":null,"type":null},"content":[{"type":"text","text":"Numpy"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 格式加載。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"TFDS 獨立於原始數據集的底層格式,所以,任何具有 RLDS 兼容格式的現有數據集都可以用於 RLDS,即使它最初不是用 EnvLogger 或 RLDS Creator 生成的。另外,使用 TFDS,用戶對自己的數據擁有所有權和完全控制權,並且所有的數據集都包含了一個引用給數據集作者。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"使用數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"研究人員可以使用這些數據集對各種機器學習算法進行分析、可視化或訓練,就像上面提到的那樣,這些算法可能會以不同的格式使用數據,而不是以不同的格式存儲數據。例如,一些算法,如 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/acme\/tree\/master\/acme\/agents\/tf\/r2d2","title":null,"type":null},"content":[{"type":"text","text":"R2D2"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 或 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/acme\/tree\/master\/acme\/agents\/tf\/r2d3","title":null,"type":null},"content":[{"type":"text","text":"R2D3"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",使用完整的情節;而另一些算法,如 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/acme\/tree\/master\/acme\/agents\/jax\/bc","title":null,"type":null},"content":[{"type":"text","text":"Behavioral Cloning"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"(行爲克隆)或 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/acme\/tree\/master\/acme\/agents\/jax\/value_dice","title":null,"type":null},"content":[{"type":"text","text":"ValueDice"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",則使用成批的隨機步驟。爲了實現這一點,RLDS 提供了一個強化學習場景的轉換庫。由於強化學習數據集的嵌套結構,所以這些轉換都經過了優化,包括了自動批處理,從而加速了其中一些操作。使用這些優化的轉換,RLDS 用戶有充分的靈活性,可以輕鬆實現一些高級功能,而且開發的管道可以在 RLDS 數據集上重複使用。轉換的示例包含了對選定的步驟字段(或子字段)的全數據集的統計,或關於情節邊界的靈活批處理。你可以在這個"},{"type":"link","attrs":{"href":"https:\/\/github.com\/google-research\/rlds\/blob\/main\/rlds\/examples\/rlds_tutorial.ipynb","title":null,"type":null},"content":[{"type":"text","text":"教程"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"中探索現有的轉換,並在這個 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/google-research\/rlds\/blob\/main\/rlds\/examples\/rlds_examples.ipynb","title":null,"type":null},"content":[{"type":"text","text":"Colab"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 中看到更復雜的真實示例。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"可用數據集"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"目前,"},{"type":"link","attrs":{"href":"https:\/\/www.tensorflow.org\/datasets\/catalog\/overview","title":null,"type":null},"content":[{"type":"text","text":"TFDS"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 中有以下數據集(與 RLDS 兼容):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"帶有 "},{"type":"link","attrs":{"href":"https:\/\/gym.openai.com\/envs\/#mujoco","title":null,"type":null},"content":[{"type":"text","text":"Mujoco"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 和 "},{"type":"link","attrs":{"href":"https:\/\/sites.google.com\/corp\/view\/deeprl-dexterous-manipulation","title":null,"type":null},"content":[{"type":"text","text":"Adroit"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 任務的 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/rail-berkeley\/d4rl","title":null,"type":null},"content":[{"type":"text","text":"D4RL"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 的子集;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/deepmind-research\/tree\/master\/rl_unplugged","title":null,"type":null},"content":[{"type":"text","text":"RLUnplugged DMLab"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"、"},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/deepmind-research\/tree\/master\/rl_unplugged#atari-dataset","title":null,"type":null},"content":[{"type":"text","text":"Atari"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 和 "},{"type":"link","attrs":{"href":"https:\/\/github.com\/deepmind\/deepmind-research\/tree\/master\/rl_unplugged#realworld-rl-dataset","title":null,"type":null},"content":[{"type":"text","text":"Real World RL"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 數據集;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"用 RLDS 工具生成的三個 
"},{"type":"link","attrs":{"href":"https:\/\/robosuite.ai\/","title":null,"type":null},"content":[{"type":"text","text":"Robosuite"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 數據集。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我們的團隊致力於在不久的將來迅速擴大這個清單,並且歡迎外界爲 RLDS 和 TFDS 貢獻新的數據集。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"結語"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"RLDS 生態系統不僅可以提高強化學習與序列決策問題研究的可重現性,還可以方便地進行數據的共享和重用。我們期望 RLDS 所提供的特性能夠推動一種趨勢,即發行結構化的強化學習數據集,包含所有的信息,並涵蓋更廣泛的智能體和任務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"作者介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Sabela Ramos,Google AI 軟件工程師。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Léonard Hussenot,谷歌研究室,大腦團隊學生研究員。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"原文鏈接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"https:\/\/ai.googleblog.com\/2021\/12\/rlds-ecosystem-to-generate-share-and.html"}]}]}