Sequence Model - Natural Language Processing & Word Embeddings

Word Embeddings

Word Representation

  • 1-hot representation: the inner product of any two distinct one-hot vectors is \(0\), so it captures no similarity between words
  • Featurized representation: word embedding
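
A minimal sketch of the contrast between the two representations. The toy vocabulary and the four feature values per word (gender, royal, age, food) are made up for illustration and are not learned embeddings.

```python
def one_hot(index, vocab_size):
    """Return a one-hot vector (as a list) with 1.0 at position `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary, for illustration only.
vocab = ["man", "woman", "king", "queen", "apple", "orange"]
index = {word: i for i, word in enumerate(vocab)}

o_apple = one_hot(index["apple"], len(vocab))
o_orange = one_hot(index["orange"], len(vocab))
o_king = one_hot(index["king"], len(vocab))

# Made-up featurized vectors: (gender, royal, age, food).
e = {
    "apple": [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
    "king": [-0.95, 0.93, 0.70, 0.02],
}

# One-hot: every pair of distinct words looks equally unrelated.
one_hot_apple_orange = dot(o_apple, o_orange)
one_hot_apple_king = dot(o_apple, o_king)

# Featurized: apple is much closer to orange than to king.
feat_apple_orange = dot(e["apple"], e["orange"])
feat_apple_king = dot(e["apple"], e["king"])
```

Both one-hot products are exactly zero, while the featurized product for apple and orange is much larger than the one for apple and king.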

Visualizing word embeddings


t-SNE algorithm: \(300 \mathrm D \to 2 \mathrm D\)

Embeddings learn to place concepts that feel like they should be more related closer together.

Using word embeddings

Named entity recognition example


The labeled training set for a task such as named entity recognition may be much smaller, so word embeddings learned from a large text corpus allow you to carry out transfer learning.

Transfer learning and word embeddings

  • Learn word embeddings from a large text corpus. (\(1 - 100\mathrm B\) words)

    (or download pre-trained embedding online.)

  • Transfer embedding to new task with smaller training set.

    (say, 100k words)

  • Optional: Continue to finetune word embeddings with new data
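
The transfer step above can be sketched roughly as follows. This is only an illustration: `build_embedding_table` is a hypothetical helper, the two-dimensional vectors are made up, and the random initialization for words missing from the pre-trained set is one common choice, not something the notes prescribe. Fine-tuning (the optional step) is left out.

```python
import random


def build_embedding_table(task_vocab, pretrained, dim, seed=0):
    """Build an embedding table for a new task.

    Words found in `pretrained` reuse their pre-trained vector;
    all other words get a small random vector (an assumed fallback).
    """
    rng = random.Random(seed)
    table = {}
    for word in task_vocab:
        if word in pretrained:
            # Copy so that later fine-tuning does not mutate the source.
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


# Made-up pre-trained vectors (a real set would have e.g. 300 dimensions).
pretrained = {
    "orange": [0.1, 0.9],
    "durian": [0.2, 0.8],
    "farmer": [0.8, 0.2],
}

# Vocabulary of the smaller task; "zorblat" is a hypothetical unseen word.
task_vocab = ["orange", "durian", "farmer", "zorblat"]

table = build_embedding_table(task_vocab, pretrained, dim=2)
```

Whether the copied vectors are then frozen or fine-tuned would depend on how much task data is available.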

Properties of Word Embeddings


\(\text{Man} \to \text{Woman} \text{ as } \text{King} \to \text{?}\)

\(e_{\text{man}} - e_{\text{woman}} \approx \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \end{bmatrix} \approx e_{\text{king}} - e_{\text{queen}}\)

\(e_? \approx e_\text{king} - e_\text{man} + e_\text{woman} \approx e_{\text{queen}}\)

Find the word \(w\) that maximizes the similarity: \(\arg\max_w \text{sim}(e_w, e_\text{king} - e_\text{man} + e_\text{woman})\)

  • Cosine similarity

    \[\text{sim}(u, v) = \frac{u^{T}v}{||u||_2 ||v||_2} \]
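
A small sketch of the analogy search using the cosine similarity above. The four-dimensional vectors are made-up values (gender, royal, age, food), and `complete_analogy` is a hypothetical helper name. Excluding the three input words from the candidates is an assumption commonly made in practice, and all vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(u, v):
    """sim(u, v) = u^T v / (||u||_2 * ||v||_2), for nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def complete_analogy(a, b, c, embeddings):
    """Answer 'a -> b as c -> ?' by maximizing sim(e_w, e_c - e_a + e_b)."""
    target = [
        ec - ea + eb
        for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    best_word, best_sim = None, -float("inf")
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue  # assumption: do not return one of the input words
        sim = cosine_similarity(vec, target)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word


# Made-up embeddings: (gender, royal, age, food).
embeddings = {
    "man": [-1.00, 0.01, 0.03, 0.09],
    "woman": [1.00, 0.02, 0.02, 0.01],
    "king": [-0.95, 0.93, 0.70, 0.02],
    "queen": [0.97, 0.95, 0.69, 0.01],
    "apple": [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}

answer = complete_analogy("man", "woman", "king", embeddings)
```

With these toy values the search selects "queen", since \(e_\text{king} - e_\text{man} + e_\text{woman}\) points almost exactly in the direction of \(e_\text{queen}\).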

Embedding Matrix

