CS224N Lecture 2


2019-02-27 13:26

Problems with this discrete representation

A basic problem: almost all NLP research has used atomic symbols to represent words.

So, the fundamental thing to note is that for sorta just about all NLP, apart from both modern deep learning and a little bit of neural network NLP that got done in the 1980s, it's all used atomic symbols like hotel, conference, walk.

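This gives us a vector representation for each word. The dimensionality (length) of the vector depends on your task: it might be 20K (speech recognition), 50K (machine translation systems), 500K, or even 13M (the Google 1T web crawl corpus, whose vocabulary is 13 million words).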
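A minimal sketch of what such a vector might look like, assuming the representation meant here is the usual one-hot encoding (one dimension per vocabulary word, with a single 1 at that word's index). The three-word vocabulary and the names `vocab`, `word_to_index`, and `one_hot` are hypothetical, chosen only for illustration; a real vocabulary would have the 20K to 13M entries mentioned above.

```python
# Toy vocabulary (an assumption for illustration); real systems use
# vocabularies of 20K to 13M words, so the vectors are that long.
vocab = ["hotel", "conference", "walk"]

# Map each atomic symbol to its position in the vocabulary.
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a list of length len(vocab) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec


hotel_vec = one_hot("hotel")
walk_vec = one_hot("walk")

# The vector length equals the vocabulary size.
print(len(hotel_vec) == len(vocab))
print(hotel_vec, walk_vec)
```

Under this assumption, the vector length grows one-for-one with the vocabulary, which is where the 20K to 13M figures come from.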

And so, why are these vectors problematic?
