In the Age of AI, How Do You 'Play' Music the Hardcore Way?

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CNCC2021 (the China National Computer Congress), the annual flagship event of China's computing community, opens in Shenzhen on December 16-18. InfoQ Geek Media is an official strategic media partner of CNCC2021. As part of that partnership, InfoQ 大咖說 and CCF have jointly launched the interview series 技術風雲 | 對話 CNCC (Tech Trends | Dialogue with CNCC)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Broadcast as live conversations, the series takes a broad view of the development of computing and invites leading scholars from CNCC2021 and technology leaders from industry to discuss frontier topics such as AI, digital transformation, Computing+, cloud computing, open source, and chips. It offers in-depth academic, technical, and industrial perspectives, aims to spread and stimulate discussion of innovative computing technologies, and helps IT practitioners broaden their horizons and keep up with the times."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As artificial intelligence advances rapidly, it is injecting new vitality into more and more fields. Music is an indispensable part of multimedia, and its importance to people goes without saying. Notably, AI research in music began many years ago and has already produced many mature applications."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On November 10, episode 7 of 技術風雲 | 對話 CNCC, the 大咖說 series co-produced by InfoQ and CCF, went live. For this episode we invited Xu Tan, a Senior Researcher at Microsoft Research Asia and a speaker at CNCC 2021, to talk about how to get hardcore about music in the age of AI. "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The following is excerpted from "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/video\/hO5VDtlVFSqyKvupZvDN?utm_source=home_video&utm_medium=article","title":null,"type":null},"content":[{"type":"text","text":"that day's session"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", edited by InfoQ without changing the original meaning:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Hello, Xu Tan. We are delighted to have this chance to talk with you. Could you start by saying hello to everyone?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Hello everyone. I am Xu Tan, a Senior Researcher at Microsoft Research Asia. My research covers machine learning, natural language processing, speech, and music, with a particular focus on content creation for text, speech, and music. In AI music, we have done research on both music understanding and music generation. I believe AI can bring new productivity and creativity to music and help people who work in it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: As a speaker at the CNCC2021 Computational Art Forum, could you share the story between you and CNCC? What led you to take part?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"CNCC, the China National Computer Congress, is a highly influential conference organized by the China Computer Federation (CCF). This year's theme is 'Computing Empowers and Accelerates Digital Transformation', and the conference explores how computing and AI can speed up the digital transformation of every industry. In recent years, computer music and AI music have developed rapidly and drawn wide attention from academia and industry, and there is broad agreement that computing and AI can empower the digital transformation of the music industry. My team at Microsoft Research Asia has carried out a series of studies on AI music, so we are both witnesses to and participants in this field. Through CNCC2021, I hope to discuss with everyone where AI stands in music, and especially in music creation: the current state, the industry opportunities, the open problems, and the directions ahead. I also hope to attract more like-minded people to join and move the field forward."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Why is music a good fit for artificial intelligence?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Combining AI with music sounds fascinating. Where do the two mainly meet? Could you give us an overview?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Before answering, I want to address a doubt that most people probably have: music is an art, so how does it connect with fields as logical and engineering-oriented as AI or computing?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"To answer that, we have to start with the nature of music. There is music theory behind music, and people have been exploring it for thousands of years. Early scientists, mathematicians, and music theorists in ancient Greece and Rome studied why a piece of music sounds harmonious. Pythagoras, for example, found that musical intervals correspond to simple ratios of string lengths, and that sounds are pleasing only when their frequencies combine in certain ways. Plato also proposed that the orbits studied in astronomy are closely related to the harmony of musical intervals. These examples show that there is rigorous logic behind music."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"When we produce a piece of music, the most important thing is that the different sounds fit together harmoniously. That harmony comes from how people hear it and from the resonance it evokes, and resonance is, in physical terms, the resonance of sound waves. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Two notes sound harmonious together because their resonances, that is, the ratios of their harmonics, are well matched."},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" This shows that deep mathematical relationships lie behind music, and that music is well suited to processing by computers and AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Music also has a rigorous structural framework. Novels and films need an introduction, development, turn, and conclusion to drive the plot, and music is the same. In music, it is chords that drive the emotional development; chord progressions push the mood forward, as in the familiar 4-5-3-6-2-5-1 progression or the Canon progression. To sound good, music needs other ingredients too, such as orchestration rules and arrangement techniques. This also explains why art can be combined with computing and AI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Work in the direction of computer music goes back centuries. In the Classical era of the eighteenth century, for example, Mozart is said to have tried an interesting musical experiment: he cut a piece of music into many fragments, one bar each, rolled dice, and picked whichever fragment the dice indicated to assemble a piece. The resulting music sounded quite good, and in computer music circles this is regarded as the origin of computer music."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As for where AI and music mainly meet, I will look at it from two angles."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"First, from the way music data is represented. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Roughly speaking, music takes two forms. One is symbolic: the scores and lyrics we commonly see. The other is sound: a score is sung by a singer, played by a player, or rendered by music software into audio that reaches the ear, and this is the form in which most people encounter music. Combining AI with music means applying AI techniques to these two modalities. For example, we can use natural language processing techniques such as language understanding and language generation to help with music understanding and generation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Second, from the tasks that music involves. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We can split music tasks roughly in two. In one case we already have music and need to process, understand, retrieve, convert, or edit it. In the other we have no music and need to create it, which includes melody writing, lyric and song writing, accompaniment and arrangement, timbre synthesis, mixing, singing voice synthesis, and so on. These map neatly onto two common problems in AI: data understanding and data generation."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"AI + music: applications and technical challenges"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: You have done a range of AI music research yourself, including music understanding, lyric and melody writing, accompaniment generation, and singing voice synthesis. How hard is each, and what path do you take to implement it? Which techniques and methods are mainly involved?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"My team and I have done a series of projects on AI music generation, especially for pop music, covering lyric and melody writing, accompaniment and arrangement, singing voice synthesis, and more. While working through this pipeline we found that good music generation depends on music understanding: you need a solid grasp of rhythm, harmony, musical form, emotion, and style."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Our research therefore runs along two lines, understanding and generation. On the generation side, producing commercial music is a very long process that involves many links in the technology chain. Take the most basic step, writing lyrics and melody. Going from lyrics to melody is, in AI terms, a typical sequence-to-sequence learning task: the input is a lyric sequence and the output is a melody sequence. Very few open datasets are available, and paired lyric-melody data is especially scarce. Our answer is pre-training: we use large-scale unpaired data to train the model and then generate melodies from lyrics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As the research went deeper, we found that lyrics and melody differ from ordinary sequence-to-sequence tasks. In common tasks such as speech recognition, speech synthesis, and machine translation, the input and output correspond closely in meaning; each recognized word maps to a specific segment of the speech, with nothing dropped and nothing added. But one line of lyrics can fit many melodies, and one melody can be matched by many lyrics. The coupling is very weak. If we keep using conventional sequence-to-sequence learning here, we will inevitably need a large amount of training data, precisely because the correspondence is so weak."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"So we need a different approach: find the information through which lyrics and melody are genuinely coupled, use it as a bridge, and split the task into two stages, from lyrics to the bridge and from the bridge to the melody. In one of our recent projects we use prior musical knowledge to define this bridge. First we extract an intermediate template from the lyrics, which can carry rhythm information or information about where sentences end; then, in the second stage, we generate the final melody from the template. The advantage is that the second-stage model can be trained by extracting templates from melodies themselves, which is a self-supervised process. The first stage, from lyrics to template, can be defined by hand-written rules or learned with some auxiliary supervised training. This resolves the problem mentioned earlier, where weak coupling demands a lot of data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In practice this does generate better results than earlier methods, and it achieves good melody generation with almost no paired lyric-to-melody data. Of course, lyric-to-melody generation still has a long way to go, for example in making the melody match the mood, emotion, or theme of the lyrics, and these need continued exploration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Take singing voice synthesis as another example. It synthesizes sound from a score and lyrics, and it is very similar to speech synthesis. The biggest difference is that in speech, pitch and duration are fairly stable and predictable: a male voice sits at roughly 100 to 200 Hz and a female voice at roughly 200 to 300 Hz. Singing is different. Each octave doubles the frequency, so the pitch range is very wide, which makes modeling much harder. Singing also swings between very fast and very slow: in rap each note is very short, while in slow songs a single note can be held for a long time. These properties make waveform generation very unstable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"We designed a series of projects to address these challenges in singing voice synthesis. One of them is 
HiFiSinger, which targets high-fidelity audio at a 48 kHz sampling rate and can convey rich detail in the singing voice. With improved generative adversarial networks and systematic design choices, we were able to synthesize singing of fairly good quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Which general methods do these examples rely on? AI generation tasks draw on two kinds of techniques. One kind is general: natural language generation, sequence-to-sequence learning, controllable sequence generation, speech synthesis, the vocoders used in audio generation, common generative models, and so on, along with general learning paradigms such as semi-supervised learning, self-supervised learning, and low-resource machine learning. The other kind is problem-specific and has to be solved case by case. Doing well on these tasks is very challenging: the system has to be stable and robust, the audio quality has to be good, and the music needs a complete structure and emotional progression. In singing voice synthesis, many vocal techniques, such as vibrato and glissando, or adding Peking opera and operatic styles to the singing, are very hard to model. These are the research questions we are pursuing next, and they deserve attention from the whole industry."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: You have just described your team's current AI music research and its challenges. Could you describe how AI music is being applied overall today?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Computer music has always had application scenarios; it is only because AI has become popular in recent years that people talk about 'AI music'. It used to be called computer music or music information retrieval, and the typical applications help us organize, manage, retrieve, and recommend music. There is also music style classification, searching for songs by lyrics, and so on. AI music has many uses in music teaching too, for example analyzing a performance to check whether the player stays on the beat of the score and handles transitions correctly, which helps people learn an instrument. There are applications in music generation as well, such as background music for short videos or AI-assisted composition. In my view, current technology cannot yet have AI 
generate a complete, product-grade piece of music without any human involvement. As the technology develops, that may become possible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: You have open-sourced the AI music research project Muzic. Could you tell us about it? What problems can Muzic solve, and what are its technical innovations?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Muzic is a Microsoft Research Asia research project on AI music understanding and generation. Its aim is to use machine learning and AI to better support both. The project covers many understanding and generation tasks, such as music classification, lyric recognition from music, singing voice synthesis, and accompaniment and arrangement, and tasks such as separation, recognition, and retrieval will follow. AI music involves a great many problems. By releasing the source code and documentation of some of our existing research to everyone in the community, we hope practitioners can build on these frameworks and tools, do further research more easily, and advance the field together. Everyone is welcome to follow and use Muzic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Open-source repository: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/microsoft\/muzic","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"https:\/\/github.com\/microsoft\/muzic"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: A viewer asks in the comments: unlike tasks such as chess or language understanding, music generation has no clear objective. Is there a solution for tasks like this?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Chess and similar tasks have a very strong rule system and a clear reward: did you win, and how many points did you score. Music is very subjective, and different people apply different standards; in the same song, one listener hears joy and another hears sorrow. We can look at this from two angles. First, since the feedback signal is unclear, we need to break the problem down when applying AI and ask which step of music making AI can actually help with. Second, AI-generated music still largely has to be evaluated by people, which means there is always a human in the loop. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"AI 
technology is only a tool"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"; it cannot run as an autonomous system that takes control, or act on subjective opinions of its own. People must retain full control throughout the process, and in the end people make the decisions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: A viewer asks in the comments: what works and applications exist for AI-assisted music creation?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"There are not many finished products for assisted creation, but many demos fit this description. In many music production and editing tools, for instance, once you finish writing a melody, AI can immediately continue it with another passage that follows your overall idea and flows naturally. Or you write a tune and an AI tool immediately adds an accompaniment that fits the whole piece. You can accept the AI-generated music as is, or edit it and remove whatever does not work. Either way it saves a lot of cost, and these are typical examples of AI-assisted music creation. There are other applications too, such as interactive ones where I play a phrase and the AI answers with one, a bit like a piano duel with the AI."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"AI is only a tool that empowers human music creation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: When will AI 
music be able to do without human involvement entirely?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"This can be answered from two angles. One is whether we really want AI music that does not need people at all. If AI truly replaced humans and could create excellent music without any intervention, where would that leave people? The other is technical, and from that angle I think it is still far off. And as for the final form of AI music, we do not want it to evolve into AI completely replacing humans; that would not be an appropriate outcome."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"What is the better way? People use AI during creation to spark inspiration and help them make better music. AI empowers human music creation. A person can listen to only so much music in a lifetime, but an AI model may have been trained on millions or even tens of millions of pieces. It has seen music at scale across genres, styles, and arrangements, so it can supply a wealth of relevant material and offer guidance or reference points during the creative process."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Some say that music produced by AI has no soul and is bad for the development of music. What do you think of this view?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"This raises a question we need to ask ourselves: does AI music need a soul at all? Animals have souls; do machines need one? That is a philosophical question. If machines had souls one day and replaced humans, what then? That might no longer be artificial intelligence but intelligence beyond the human."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On the other hand, if it has no soul, is that bad for the development of music today? I don't think so. AI is always just a tool that serves people. Compared with AI, the human advantage is having a soul; AI's advantage is its comprehensive data and knowledge, but in the end it still relies on people's lived experience. Music creation is often about more than the music itself; it is also about what the creator has lived through, whether sorrow or joy. A creator used to need a month or more to write a song, and with AI that efficiency can improve. This kind of combination is what I personally hope to see. In short, people keep control of the most essential, soulful, and inspired parts, and AI does the rest."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: Where is AI in music headed? What trends can we foresee?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"As I understand and judge it, AI technology will certainly make great progress in the long run. As a technology practitioner, I hope its development will bring real, tangible help to music."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On music understanding, I believe we will be able to build a fairly large-scale music understanding model. Unlike today's models, it would work directly on audio for classification, separation, transcription, and so on. Much of what holds the field back today is that audio tasks first have to be converted into symbols, and that technology still faces many challenges. For example, we can transcribe audio into piano notes reasonably well, but for other instruments the errors are still fairly large. If audio processing technology truly matures, we will be able to understand this content directly from the music, which would greatly advance AI music."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On music generation, the problem we need to solve now is controllable generation. As I said, AI should serve people, so people need to be able to direct what the model generates, whether a particular musical structure or a particular song form: tell the AI what to generate and it generates that. If this becomes truly possible, I believe the applications will be enormous. Beyond that, I am personally looking forward to AI that can choose timbres or do mixing automatically."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"General content creation in the age of AI"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: What are the strengths and limitations of AI in content creation? How far is it from what people can do?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"I am very interested in AI content creation and have done a lot of research in this direction. Text generation, speech generation, and music generation all fall under AI content creation in the broad sense. Its strengths are those of AI itself: it can distill underlying patterns from large amounts of data and generate content by fitting those patterns."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In AI we often hear two terms, perception and cognition. Image and speech generation are mostly perceptual, and today's AI can generate very realistic faces and excellent speech. Cognitive tasks, where AI has to understand language, still have some way to go. OpenAI's GPT-3, for example, is a very large model and generates good-looking results; at first glance the content seems fine, but read closely it is what we would call specious, plausible on the surface yet wrong. Examined at the cognitive level, there is still a gap from what a person would write. That gap is where our industry and its practitioners need to work, with more effort on data, algorithms, modeling, and the choice of overall technical route, before text or music generation can deliver truly good results."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"InfoQ: On December 16-18 this year you will speak at the CNCC2021 
Computational Art Forum, with a talk titled 'Deep Learning-Based Pop Music Composition'. Could you give us a preview of one or two highlights?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"Xu Tan: "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Many leading experts are taking part in the CNCC2021 Computational Art Forum, from both the technology world and the music and art world. We will discuss how computing and AI combine with art, including music, painting and fine art, and other art forms. The program is very rich."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"My talk focuses on deep learning in pop music creation and presents my team's latest AI music research. For example, a music sequence is very long, much longer than a typical text or paragraph, and it has many repeated structures: a verse, a chorus, then another verse or another chorus. In other words, it has repetition with long-range dependencies. We have worked on long-sequence modeling and controllable music generation to address this, and I will share our progress. I hope you will join us. Thank you."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"If you would like to learn more about 'Deep Learning-Based Pop Music Composition', keep an eye on 
"},{"type":"link","attrs":{"href":"https:\/\/mobile.ccf.org.cn\/web\/html15\/index.html?globalId=m8271748750546083841617255458379&type=1","title":null,"type":null},"content":[{"type":"text","text":"CNCC2021"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", which takes place in Shenzhen on December 16-18 and where Xu Tan will present his talk."}]}]}