CS224N Lecture 2

Original

2019-02-27 13:26

Problems with this discrete representation

A fundamental point: almost all NLP research has used atomic symbols to represent words.

So, the fundamental thing to note is that for sorta (roughly) just about all NLP, apart from (except for) both modern deep learning and a little bit of neural network NLP that got done in the 1980s, it's all used atomic symbols like hotel, conference, walk.

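This gives us a vector representation for each word. In vector-space terms, each word is a one-hot vector: a single 1 and zeros everywhere else. The dimension (length) of such a vector equals the vocabulary size, which depends on your task. It might be 20K (speech recognition), 50K (a machine translation system), 500K, or even 13M (the Google 1T corpus, built from a web crawl; the vocabulary in that is 13 million words).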
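To make the one-hot idea concrete, here is a minimal sketch in plain Python. The tiny vocabulary is made up for illustration (it reuses hotel, conference and walk from the quote above, plus a hypothetical extra word, motel), and the helper names `build_vocab`, `one_hot` and `dot` are likewise hypothetical, not from the lecture or any library. It shows that each atomic symbol becomes an index, that the vector length equals the vocabulary size, and one property of the representation: two different one-hot vectors always have a dot product of 0.

```python
def build_vocab(words):
    """Map each distinct word (an atomic symbol) to an integer index."""
    return {w: i for i, w in enumerate(sorted(set(words)))}


def one_hot(word, vocab):
    """Return a |V|-dimensional list with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary; sorted order gives
# conference=0, hotel=1, motel=2, walk=3.
vocab = build_vocab(["hotel", "conference", "walk", "motel"])
hotel = one_hot("hotel", vocab)
motel = one_hot("motel", vocab)

print(len(vocab))        # 4: the vector dimension is the vocabulary size
print(hotel)             # [0, 1, 0, 0]
print(dot(hotel, motel)) # 0: distinct one-hot vectors never overlap
```

With a 13M-word vocabulary each such vector would have 13 million entries, almost all zeros, so in practice one would usually store just the word's index rather than the full vector.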

So why are these vectors problematic?
