Understanding Twitch Emotes in Sentiment Analysis

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years, people on social media platforms have "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/1903.09623.pdf","title":null,"type":null},"content":[{"type":"text","text":"increasingly used"}]},{"type":"text","text":" emoji, emoticons, emotes, GIFs and other non-textual forms of expression. This makes it increasingly hard for data scientists to study sociological patterns at a global scale, even though some global sociological trends can still be found in what people say in public."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-natural-language-processing\/","title":null,"type":null},"content":[{"type":"text","text":"natural language processing"}]},{"type":"text","text":" (NLP) has been a very powerful sentiment analysis tool over the past decade, it cannot keep up with the fast-changing, cross-lingual vocabulary and abbreviations of the internet, and it is equally helpless when faced with the image-centric posts on social networks such as Facebook and Twitter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the only hyperscale resources this kind of research can really rely on are these few large social media platforms, AI has to keep pace with them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In July 2021, a paper proposed a "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/reaction-gifs-offer-a-new-key-to-emotion-recognition-in-nlp\/","title":null,"type":null},"content":[{"type":"text","text":"new method"}]},{"type":"text","text":" that uses a database of 30,000 tweets to classify and predict the emotion a post provokes, based on the “reaction GIFs” (see the image below) that users post in reply. The paper found that such image-based reactions are in many respects easy to measure, because most of them are free of the "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/new-ai-detects-sarcasm-in-social-media\/","title":null,"type":null},"content":[{"type":"text","text":"weak spot"}]},{"type":"text","text":" of sentiment analysis: sarcasm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/49\/03\/49caf31938e553e3a3343829dbbbae03.gif","alt":null,"title":"The researchers treat the animated reaction GIFs people use as “reductive indicators” and analyze their use in a paper published in 2021.","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the first half of 2021, a research team led by Boston University trained a "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2101.06535.pdf","title":null,"type":null},"content":[{"type":"text","text":"machine learning model"}]},{"type":"text","text":" to predict which memes were likely to become popular on Twitter. In August 2021, UK researchers compiled a large Twitter sentiment dataset covering seven languages by comparing trends in the use of emoticons (pictures composed of digits, letters and punctuation marks) and emoji (pictographs of faces, objects and symbols) on social media."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Twitch Emotes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now, researchers in the US have developed a machine learning training method that can better understand, classify and measure the constantly evolving pseudo-vocabulary of "},{"type":"text","marks":[{"type":"italic"}],"text":"emotes"},{"type":"text","text":" on Twitch, a game live-streaming platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Emotes are neologisms used on Twitch to express emotions, moods or in-jokes. Since they are newly coined by definition, the hard part for a machine learning system is not cataloguing the endless stream of new emotes, which may well fall out of fashion faster than they can be catalogued. The challenge is rather to give machines a better grasp of the structure behind these emotes, and to build systems that recognize them as “temporary” words or compound phrases whose sentiment has to be inferred entirely from context."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/ml8ygptwlcsq.i.optimole.com\/fMKjlhs-3HyC7GEo\/w:740\/h:575\/q:auto\/https:\/\/www.unite.ai\/wp-content\/uploads\/2021\/11\/emotes-graph.jpg","alt":null,"title":"Emotes related to FeelsGoodMan; a simple change of suffix gives a completely different meaning.","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The image above comes from a paper published by three researchers at a San Francisco social media analytics company, titled "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2108.08411.pdf","title":null,"type":null},"content":[{"type":"text","text":"FeelsGoodMan: Inferring Semantics of Twitch Neologisms"}]},{"type":"text","text":"."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Changing Meaning After Going Viral"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although these emotes are novel for a moment and mostly short-lived, old emote material is frequently dug up and recycled on Twitch, which misleads even well-trained sentiment analysis frameworks. Tracing how the meaning of an emote shifts as it evolves often shows that the sentiment or intent it conveys today is the complete opposite of what it was created for."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, the researchers note that the Pepe the Frog meme, after its "},{"type":"link","attrs":{"href":"https:\/\/www.pbs.org\/independentlens\/documentaries\/feels-good-man\/","title":null,"type":null},"content":[{"type":"text","text":"appropriation"}]},{"type":"text","text":" by the far right, has almost entirely lost that political meaning on Twitch."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pepe and his catchphrase “Feels Good Man” first appeared in a 2005 comic by the American illustrator Matt Furie, and went on to become a signature far-right meme during the 2010s. In 2017 Vox wrote that, although Furie had disowned the association, the meaning acquired through far-right appropriation had stuck. The San Francisco researchers behind the paper disagree:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the early 2010s, the cartoon frog created by Furie was adopted for propaganda by the right on various online forums such as 4chan (an anonymous forum). Since then, Furie has campaigned to reclaim the meaning of Pepe, and on Twitch a large number of "},{"type":"link","attrs":{"href":"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/14797585.2019.1713443","title":null,"type":null},"content":[{"type":"text","text":"non-hateful"}]},{"type":"text","text":", positive frog emotes have become mainstream, so that FeelsGoodMan and its counterpart FeelsBadMan are used in a way that is closer to their literal meaning."}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Downstream Trouble"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A meme whose common meaning shifts again after it has gone viral can easily derail NLP research projects. By then the emote has already been labeled with tags such as “hate” or “nationalism (US)” and bundled into long-lived open source repositories. Later NLP projects that use this data may never check whether it is still accurate, some because they have no means of auditing the data, others because they do not realize that an audit is needed at all."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The consequence of such stale labels is obvious: a “political classification” algorithm trained on a Twitch emote dataset from 2017 would, thanks to the heavy use of Pepe emotes, find a pronounced far-right leaning on Twitch. Twitch may of course really be "},{"type":"link","attrs":{"href":"https:\/\/www.nytimes.com\/2021\/04\/27\/technology\/twitch-livestream-extremists.html","title":null,"type":null},"content":[{"type":"text","text":"full of far-right streamers"}]},{"type":"text","text":", but frog emotes cannot be used to prove it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The political meaning of the Pepe meme appears to have been unceremoniously discarded by the 140 million users of Twitch ("},{"type":"link","attrs":{"href":"https:\/\/backlinko.com\/twitch-users#user-demographics","title":null,"type":null},"content":[{"type":"text","text":"41% of whom are under 24"}]},{"type":"text","text":"). Without any coordination, they efficiently took Pepe back from those who had appropriated him for political ends and redefined him on their own terms."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Method and Data"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The researchers found that labeled Twitch emote datasets are “virtually non-existent”, even though "},{"type":"link","attrs":{"href":"https:\/\/dl.acm.org\/doi\/10.1145\/3365523","title":null,"type":null},"content":[{"type":"text","text":"earlier research"}]},{"type":"text","text":" reported a total of "},{"type":"link","attrs":{"href":"https:\/\/dl.acm.org\/doi\/10.1145\/3365523","title":null,"type":null},"content":[{"type":"text","text":"eight million Twitch emotes"}]},{"type":"text","text":", 400,000 of which appeared within a single week."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A 2017 "},{"type":"link","attrs":{"href":"https:\/\/aclanthology.org\/W17-4402\/","title":null,"type":null},"content":[{"type":"text","text":"study"}]},{"type":"text","text":" on predicting popular emotes on Twitch scored only 0.39, even after restricting the prediction to the top 30 emotes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To tackle this problem, the San Francisco researchers applied a new approach to older data, splitting it 80\/20 into training and test sets and applying "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-bayes-theorem\/","title":null,"type":null},"content":[{"type":"text","text":"Naive Bayes"}]},{"type":"text","text":" (NB), "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/generative-vs-discriminative-machine-learning-models\/","title":null,"type":null},"content":[{"type":"text","text":"Random Forest"}]},{"type":"text","text":" (RF), "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-are-support-vector-machines\/","title":null,"type":null},"content":[{"type":"text","text":"Support Vector Machine"}]},{"type":"text","text":" (SVM, with a linear kernel) and "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/supervised-vs-unsupervised-learning\/","title":null,"type":null},"content":[{"type":"text","text":"Logistic Regression"}]},{"type":"text","text":", all “traditional” machine learning algorithms that had not previously been used on Twitch data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These algorithms outperformed the baseline of the earlier research by 63.8%, and the LOOVE framework (short for Learning Out Of Vocabulary Emotions) that the researchers built on top of them recognizes out-of-vocabulary terms and adds these new definitions to existing models."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/ml8ygptwlcsq.i.optimole.com\/fMKjlhs--olRzhhu\/w:404\/h:493\/q:auto\/https:\/\/www.unite.ai\/wp-content\/uploads\/2021\/11\/LOOVE-framework.jpg","alt":null,"title":"Architecture of the LOOVE (Learning Out Of Vocabulary Emotions) framework developed by the researchers","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LOOVE relies on unsupervised training of word embeddings, and periodic retraining and fine-tuning remove the need for a labeled dataset. Given the number of emotes and the speed at which they evolve, keeping a labeled dataset up to date in real time is unrealistic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the project, the researchers "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/1301.3781.pdf","title":null,"type":null},"content":[{"type":"text","text":"trained"}]},{"type":"text","text":" an emote “pseudo-dictionary” on an unlabeled Twitch dataset; in the process the model produced embeddings for 444,714 words, emotes and emoji."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, they extended the "},{"type":"link","attrs":{"href":"https:\/\/dblp.uni-trier.de\/db\/conf\/icwsm\/icwsm2014.html#HuttoG14","title":null,"type":null},"content":[{"type":"text","text":"VADER lexicon"}]},{"type":"text","text":" with an "},{"type":"link","attrs":{"href":"https:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0144296","title":null,"type":null},"content":[{"type":"text","text":"emoji and emoticon lexicon"}]},{"type":"text","text":". Besides the aforementioned EC dataset, they used three publicly available datasets sampled from Twitter, Rotten Tomatoes and Yelp (a review site) for "},{"type":"link","attrs":{"href":"https:\/\/aclanthology.org\/W18-6215\/","title":null,"type":null},"content":[{"type":"text","text":"ternary"}]},{"type":"text","text":" sentiment classification."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the project used more than one method and dataset, the results vary, but the best-performing baseline in the project beat the earlier research by 7.36 percentage points."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The researchers believe the lasting value of the project lies in the further development of the LOOVE framework, which builds on "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-k-nearest-neighbors\/","title":null,"type":null},"content":[{"type":"text","text":"K-Nearest Neighbors"}]},{"type":"text","text":" (KNN) and word-to-vector (W2V) embeddings trained on more than 331 million Twitch chat messages."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The authors conclude that the driving force behind the framework is a pseudo-dictionary of emotes that can be used to predict the sentiment of unknown emotes. With this pseudo-dictionary they created a sentiment table covering 22,507 emotes, arguably the first emote interpretation at this scale."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original English article"},{"type":"text","text":": "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/understanding-twitch-emotes-in-sentiment-analysis\/","title":null,"type":null},"content":[{"type":"text","text":"Understanding Twitch Emotes in Sentiment Analysis"}]}]}]}
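The pseudo-dictionary idea described in the article, inferring the sentiment of an unknown emote from labeled words that sit close to it in embedding space, can be illustrated with a minimal sketch. This is not the authors' LOOVE implementation: the two-dimensional vectors, the lexicon scores, the function names `infer_sentiment` and `build_pseudo_dictionary`, the use of cosine similarity, and the plain averaging of neighbor scores are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_sentiment(token, embeddings, lexicon, k=3):
    """Average the lexicon scores of the k labeled tokens nearest to `token`.

    Assumption: neighbors are ranked by cosine similarity and weighted equally.
    """
    target = embeddings[token]
    scored = [
        (cosine(target, embeddings[word]), score)
        for word, score in lexicon.items()
        if word in embeddings and word != token
    ]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    if not nearest:
        raise ValueError("no labeled neighbors available")
    return sum(score for _, score in nearest) / len(nearest)


def build_pseudo_dictionary(embeddings, lexicon, k=3):
    """Infer a score for every embedded token that the lexicon does not cover."""
    return {
        token: infer_sentiment(token, embeddings, lexicon, k)
        for token in embeddings
        if token not in lexicon
    }


# Toy 2-d vectors standing in for trained embeddings (invented for illustration).
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.8, 0.2),
    "nice": (0.85, 0.15),
    "bad": (-0.9, 0.1),
    "awful": (-0.8, 0.2),
    "sad": (-0.85, 0.15),
    "FeelsGoodMan": (0.88, 0.12),
    "FeelsBadMan": (-0.87, 0.13),
}

# Hypothetical sentiment scores; not taken from any real lexicon.
LEXICON = {
    "good": 2.0, "great": 3.0, "nice": 2.0,
    "bad": -2.0, "awful": -3.0, "sad": -2.0,
}

if __name__ == "__main__":
    for emote, score in build_pseudo_dictionary(EMBEDDINGS, LEXICON).items():
        print(f"{emote}: {score:+.2f}")
```

Because the scores come from whatever neighbors the current embeddings produce, retraining the embeddings on fresh chat data and rebuilding the table would let an emote's inferred sentiment follow its changing usage without any manual relabeling, which is the property the article attributes to the unsupervised approach.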