Sentiment Analysis of Twitch Emotes

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years, people on social media have "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/1903.09623.pdf","title":null,"type":null},"content":[{"type":"text","text":"increasingly turned to"}]},{"type":"text","text":" emoji, emoticons, emotes, GIFs and other non-textual forms of expression. This makes it harder for data scientists to study sociological patterns on a global scale, although global trends can still be traced in what people say publicly."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-natural-language-processing\/","title":null,"type":null},"content":[{"type":"text","text":"natural language processing"}]},{"type":"text","text":" (NLP) has been a powerful tool for sentiment analysis over the past decade, it cannot keep pace with fast-evolving, cross-lingual internet slang and abbreviations. It is also helpless against image-centric posts on sites such as Facebook and Twitter."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These few large social media platforms are the only truly hyperscale resource this kind of research can draw on, so AI has to keep up with them."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In July this year, a paper proposed a "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/reaction-gifs-offer-a-new-key-to-emotion-recognition-in-nlp\/","title":null,"type":null},"content":[{"type":"text","text":"new method"}]},{"type":"text","text":" that draws on a dataset of 30,000 tweets to classify and predict the emotions a post provokes. The method is based on the “reaction GIFs” that users post in reply (see the figure below). The paper found that these image-based reactions are easy to measure in many respects, because most of them avoid one of sentiment analysis's "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/new-ai-detects-sarcasm-in-social-media\/","title":null,"type":null},"content":[{"type":"text","text":"weak spots"}]},{"type":"text","text":": sarcasm."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/49\/03\/49caf31938e553e3a3343829dbbbae03.gif","alt":null,"title":"The researchers treat the animated reaction GIFs people post as “reductive indicators” of emotion, and they analyze how these GIFs are used in their 2021 paper.","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the first half of 2021, a research team led by Boston University trained a "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2101.06535.pdf","title":null,"type":null},"content":[{"type":"text","text":"machine learning model"}]},{"type":"text","text":" to predict which memes are likely to go viral on Twitter. In August 2021, UK researchers compiled a large Twitter sentiment dataset covering seven languages. They built it by comparing how often people on social media use emoticons (faces built from numbers, letters and punctuation) and how often they use emoji (pictographs of faces, objects and symbols)."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Twitch Emotes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now, US researchers have developed a machine learning training method that can better understand, classify and measure the ever-evolving pseudo-vocabulary of "},{"type":"text","marks":[{"type":"italic"}],"text":"emotes"},{"type":"text","text":" on Twitch, a live-streaming platform best known for gaming."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Emotes are coined words that Twitch users employ to express emotions, feelings or niche in-jokes. By definition, emotes are new coinages. The challenge for a machine learning system is therefore not to catalog the endless stream of new emotes, which would probably fall out of fashion faster than they could be catalogued. Instead, machines need to better understand the structure behind emotes. Systems also need to recognize emotes as “temporary” words or compound phrases whose sentiment can only be judged from context."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/ml8ygptwlcsq.i.optimole.com\/fMKjlhs-3HyC7GEo\/w:740\/h:575\/q:auto\/https:\/\/www.unite.ai\/wp-content\/uploads\/2021\/11\/emotes-graph.jpg","alt":null,"title":"Emotes related to FeelsGoodMan: changing only the suffix can completely change the meaning.","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above comes from the paper “"},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2108.08411.pdf","title":null,"type":null},"content":[{"type":"text","text":"FeelsGoodMan: Inferring Semantics of Twitch Neologisms"}]},{"type":"text","text":"”. Its three authors are researchers at a social media analytics company in San Francisco."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Reinvention After Going Viral"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Emotes are novel and mostly short-lived, yet Twitch users often dig up and recycle old emote material. This recycling can mislead even well-trained sentiment analysis frameworks. When researchers trace how an emote's meaning shifts over time, they often find that the sentiment or intent it now conveys is completely different from what it meant when it was first created."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, the researchers note that the Pepe the Frog meme behind FeelsGoodMan has almost entirely shed its political connotations on Twitch, despite its notorious "},{"type":"link","attrs":{"href":"https:\/\/www.pbs.org\/independentlens\/documentaries\/feels-good-man\/","title":null,"type":null},"content":[{"type":"text","text":"misappropriation"}]},{"type":"text","text":" by the far right."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The frog and its catchphrase “Feels Good Man” first appeared in a 2005 comic by American illustrator Matt Furie. In the 2010s it became a signature meme of the far right. Vox reported in 2017 that the meaning acquired through right-wing appropriation had stuck, even though Furie had distanced himself from that use. The San Francisco researchers behind this paper disagree:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the early 2010s, right-wing users of online forums such as 4chan, an anonymous imageboard, adopted Furie's cartoon frog for propaganda. Furie has since worked to reclaim Pepe's original meaning. On Twitch, a large number of "},{"type":"link","attrs":{"href":"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/14797585.2019.1713443","title":null,"type":null},"content":[{"type":"text","text":"non-hateful"}]},{"type":"text","text":", positive frog emotes have become mainstream, which has pushed FeelsGoodMan and its counterpart FeelsBadMan toward their literal meanings."}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Downstream Problems"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shifts like this, where a meme's common meaning changes after it goes viral, often set back NLP research projects. By the time the meaning changes, the emotes may already carry labels such as “hateful” or “nationalist (US)”, and those labels may already sit in long-lived open-source repositories. Later NLP projects that reuse this data may not check whether the labels are still correct. Some lack the means to audit the data, and others do not realize that an audit is needed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The consequences of such outdated labels are clear. Suppose a “political classification” algorithm were trained on a 2017 Twitch emote dataset. Because the FeelsBadMan emote is used so heavily, the algorithm would find a pronounced far-right leaning on Twitch. Twitch may well be "},{"type":"link","attrs":{"href":"https:\/\/www.nytimes.com\/2021\/04\/27\/technology\/twitch-livestream-extremists.html","title":null,"type":null},"content":[{"type":"text","text":"full of far-right streamers"}]},{"type":"text","text":", but you cannot prove it by counting frog faces."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Twitch's 140 million users ("},{"type":"link","attrs":{"href":"https:\/\/backlinko.com\/twitch-users#user-demographics","title":null,"type":null},"content":[{"type":"text","text":"41% of them aged 24 or under"}]},{"type":"text","text":") seem to have unceremoniously discarded the political meaning of the sad-frog meme. Collectively and efficiently, they reclaimed Pepe from the political actors who had appropriated it and redefined the frog on their own terms."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Method and Data"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The researchers found that labeled datasets of Twitch emotes are “practically non-existent”. This is despite "},{"type":"link","attrs":{"href":"https:\/\/dl.acm.org\/doi\/10.1145\/3365523","title":null,"type":null},"content":[{"type":"text","text":"previous research"}]},{"type":"text","text":" that reports using a total of "},{"type":"link","attrs":{"href":"https:\/\/dl.acm.org\/doi\/10.1145\/3365523","title":null,"type":null},"content":[{"type":"text","text":"8 million Twitch emotes"}]},{"type":"text","text":", 400,000 of them created within the same week."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A 2017 "},{"type":"link","attrs":{"href":"https:\/\/aclanthology.org\/W17-4402\/","title":null,"type":null},"content":[{"type":"text","text":"study"}]},{"type":"text","text":" that predicted popular emotes on Twitch limited its predictions to the top 30 emotes and still scored only 0.39."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To tackle this problem, the San Francisco researchers applied new methods to the existing data. They split the data 80\/20 into training and test sets and used “traditional” machine learning algorithms that had not previously been applied to Twitch data: "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-bayes-theorem\/","title":null,"type":null},"content":[{"type":"text","text":"Naive Bayes"}]},{"type":"text","text":" (NB), "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/generative-vs-discriminative-machine-learning-models\/","title":null,"type":null},"content":[{"type":"text","text":"Random Forest"}]},{"type":"text","text":" (RF), "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-are-support-vector-machines\/","title":null,"type":null},"content":[{"type":"text","text":"Support Vector Machine"}]},{"type":"text","text":" (SVM, with a linear kernel) and "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/supervised-vs-unsupervised-learning\/","title":null,"type":null},"content":[{"type":"text","text":"Logistic Regression"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach outperformed the previous study's baseline by 63.8%. On top of it, the researchers built the LOOVE (Learning Out Of Vocabulary Emotions) framework, which can recognize new vocabulary and add the new definitions to existing models."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/ml8ygptwlcsq.i.optimole.com\/fMKjlhs--olRzhhu\/w:404\/h:493\/q:auto\/https:\/\/www.unite.ai\/wp-content\/uploads\/2021\/11\/LOOVE-framework.jpg","alt":null,"title":"Architecture of the LOOVE (Learning Out Of Vocabulary Emotions) framework developed by the researchers.","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LOOVE builds on word embeddings trained without supervision. It relies on periodic retraining and fine-tuning rather than on labeled datasets. Given how many emotes there are and how quickly they evolve, keeping a labeled dataset up to date in real time would be highly impractical."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the project, the researchers "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/1301.3781.pdf","title":null,"type":null},"content":[{"type":"text","text":"trained"}]},{"type":"text","text":" an emote “pseudo-dictionary” on an unlabeled Twitch dataset. During training, the model produced embeddings for 444,714 words, emotes and emoji."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They also extended the "},{"type":"link","attrs":{"href":"https:\/\/dblp.uni-trier.de\/db\/conf\/icwsm\/icwsm2014.html#HuttoG14","title":null,"type":null},"content":[{"type":"text","text":"VADER lexicon"}]},{"type":"text","text":" with "},{"type":"link","attrs":{"href":"https:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0144296","title":null,"type":null},"content":[{"type":"text","text":"emoji and emoticon vocabulary"}]},{"type":"text","text":". For "},{"type":"link","attrs":{"href":"https:\/\/aclanthology.org\/W18-6215\/","title":null,"type":null},"content":[{"type":"text","text":"three-class"}]},{"type":"text","text":" sentiment classification, they used the EC dataset together with three publicly available datasets sampled from Twitter, Rotten Tomatoes and Yelp (a review site for restaurants and local businesses)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the project used several methods and datasets, its results vary. Its best-performing baseline, however, exceeded that of the previous study by 7.36 percentage points."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The researchers see the project's lasting value in the continued development of the LOOVE framework. The framework uses "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/what-is-k-nearest-neighbors\/","title":null,"type":null},"content":[{"type":"text","text":"k-nearest neighbors"}]},{"type":"text","text":" (KNN) together with word2vec (W2V) embeddings trained on more than 331 million Twitch chat messages."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The authors conclude that the framework is driven by an emote pseudo-dictionary that can predict the sentiment of unseen emotes. Using this pseudo-dictionary, they built a sentiment table covering 22,507 emotes, which they describe as the first interpretation of emotes at this scale."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article"},{"type":"text","text":": "},{"type":"link","attrs":{"href":"https:\/\/www.unite.ai\/understanding-twitch-emotes-in-sentiment-analysis\/","title":null,"type":null},"content":[{"type":"text","text":"Understanding Twitch Emotes in Sentiment Analysis"}]}]}
]}