Word Embeddings
Word Representation
- 1-hot representation: the inner product of any two distinct one-hot vectors is \(0\), so it captures no notion of similarity between words
- Featurized representation: word embedding
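A minimal NumPy sketch contrasting the two representations (the vocabulary size, word indices, and the random embedding matrix are made-up stand-ins for a learned \(E\)):

```python
import numpy as np

# Hypothetical 10,000-word vocabulary; "man" and "woman" at arbitrary indices.
vocab_size = 10_000
man, woman = 5391, 9853

# 1-hot vectors: a single 1 at the word's index, 0 elsewhere.
o_man = np.zeros(vocab_size); o_man[man] = 1.0
o_woman = np.zeros(vocab_size); o_woman[woman] = 1.0
print(o_man @ o_woman)  # 0.0 -- distinct one-hot vectors are orthogonal

# Featurized representation: each word maps to a dense 300-D vector,
# stored as one row of an embedding matrix E (random stand-in here).
E = np.random.randn(vocab_size, 300) * 0.01
e_man = E[man]      # equivalent to E.T @ o_man
e_woman = E[woman]
print(e_man @ e_woman)  # generally nonzero; similar words get similar vectors
```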
Visualizing word embeddings
t-SNE algorithm: \(300 \mathrm D \to 2 \mathrm D\)
concepts that feel like they should be related end up close together in the 2-D plot
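A quick sketch of this visualization using scikit-learn's t-SNE (the embedding matrix here is random stand-in data; in practice you'd pass real learned embeddings and label each point with its word):

```python
import numpy as np
from sklearn.manifold import TSNE

# E: (vocab_size, 300) embedding matrix; random placeholder for a learned one.
E = np.random.randn(1000, 300)

# Non-linear map from 300-D down to 2-D for plotting.
E_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(E)
# Scatter-plot E_2d and annotate each point with its word;
# with real embeddings, related words cluster together.
```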
Using word embeddings
Named entity recognition example
the labeled training set for the new task is typically much smaller, so embeddings learned from a large unlabeled corpus can be transferred to it
Transfer learning and word embeddings
- Learn word embeddings from a large text corpus (\(1 - 100\mathrm B\) words), or download pre-trained embeddings online.
- Transfer the embeddings to a new task with a smaller training set (say, \(100\mathrm k\) words).
- Optional: continue to fine-tune the word embeddings with new data (a minimal sketch follows below).
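A minimal Keras sketch of these three steps, assuming a made-up 10k-word vocabulary and a random matrix standing in for downloaded pre-trained embeddings; the model architecture and layer sizes are illustrative, not from the source:

```python
import numpy as np
import tensorflow as tf

# Step 1: pre-trained 300-D embeddings (random stand-in; in practice
# loaded from a file of embeddings trained on a 1-100B word corpus).
vocab_size, emb_dim = 10_000, 300
pretrained_E = np.random.randn(vocab_size, emb_dim).astype("float32")

# Step 2: transfer -- initialize an embedding layer with the pre-trained
# matrix, frozen so the small labeled set doesn't overwrite it.
embedding = tf.keras.layers.Embedding(
    vocab_size, emb_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained_E),
    trainable=False,  # Step 3 (optional): set True to fine-tune on new data
)

# Example downstream model, e.g. per-token tagging as in named entity
# recognition; trained only on the small task-specific dataset.
model = tf.keras.Sequential([
    embedding,
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # entity / not-entity
])
```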
Properties of Word Embeddings
Analogies
\(\text{Man} \to \text{Woman} \text{ as } \text{King} \to \,?\)
\(e_{\text{man}} - e_{\text{woman}} \approx \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \approx e_{\text{king}} - e_{\text{queen}}\)
\(e_? \approx e_\text{king} - e_\text{man} + e_\text{woman} \approx e_{\text{queen}}\)
find the word \(w\) that maximizes the similarity: \(w^* = \arg\max_w \text{sim}(e_w, e_\text{king} - e_\text{man} + e_\text{woman})\)
- Cosine similarity\[\text{sim}(u, v) = \frac{u^{T}v}{||u||_2 ||v||_2} \]
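A small brute-force sketch of this analogy search, assuming `word_to_vec` is a dict mapping each vocabulary word to its embedding vector (the helper names are hypothetical):

```python
import numpy as np

def cosine_sim(u, v):
    # sim(u, v) = u^T v / (||u||_2 ||v||_2)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, word_to_vec):
    """Solve a : b :: c : ?, i.e. find w maximizing sim(e_w, e_c - e_a + e_b)."""
    target = word_to_vec[c] - word_to_vec[a] + word_to_vec[b]
    best_w, best_s = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (a, b, c):
            continue  # exclude the query words themselves
        s = cosine_sim(e_w, target)
        if s > best_s:
            best_w, best_s = w, s
    return best_w

# Usage (given good embeddings):
# analogy("man", "woman", "king", word_to_vec)  # -> "queen"
```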