Review Your NLP Knowledge
1. Abbreviations in NLP:
- LSTM: Long Short-Term Memory.
- BERT: Bidirectional Encoder Representations from Transformers.
- POS: Part of Speech.
- DTM: Document Term Matrix.
- NER: Named Entity Recognition.
- NLG: Natural Language Generation.
- NLU: Natural Language Understanding.
- TF-IDF: Term Frequency–Inverse Document Frequency.
- re: Regular expression (also the name of Python's built-in regex module).
- LDA: Latent Dirichlet Allocation.
- LSI: Latent Semantic Indexing.
- NMF: Non-Negative Matrix Factorization.
- NLTK: Natural Language Toolkit
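Of the abbreviations above, TF-IDF is the one that names a formula, so a small worked example may help. The sketch below assumes the plain textbook variant, weight = (count of term in document / document length) × log(N / number of documents containing the term); libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. The function name `tf_idf` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: whitespace tokenization on lowercased text is good enough here.
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the dog sat down", "the cat saw the dog"]
for doc, scores in zip(corpus, tf_idf(corpus)):
    print(doc, {term: round(w, 3) for term, w in scores.items()})
```

A term that occurs in every document (here "the") gets a weight of zero, while a term unique to one document (here "down") gets the largest inverse-document-frequency factor.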
2. Some Common Steps for NLP Problems:
- Sentence Segmentation: break the text apart into separate sentences
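- Tokenization: split a sentence into words (tokens)
- Stemming: reduce a word to its stem by stripping affixes, for example thinking → think
- Lemmatizing: reduce a word to its dictionary form (lemma), for example worse → bad
- POS tagging: predict the part of speech of each token
- Identifying stop words: very common words such as “and” and “the”, which are often filtered out
- Named entity recognition: detect spans of text that refer to real-world entities such as people, organizations, and places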
- Text classification
- Chunking
- Coreference resolution
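The first few steps in this list can be sketched with nothing but the standard library. This is a deliberately naive illustration, not what NLTK or spaCy do internally: the sentence splitter only looks at end punctuation, the stop-word list is a tiny hand-picked set, and `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# A tiny, hand-picked stop-word list for illustration only; real libraries
# such as NLTK or spaCy ship much larger lists.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and extract runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "The cat is thinking. Dogs and cats played in the garden!"
for sentence in split_sentences(text):
    tokens = remove_stop_words(tokenize(sentence))
    print([naive_stem(t) for t in tokens])
# ['cat', 'think']
# ['dog', 'cat', 'play', 'garden']
```

Lemmatizing, POS tagging, and named entity recognition need dictionaries or trained models, so in practice they are left to a library rather than hand-written rules like these.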
3. Applications of NLP in The Real World:
- Personal assistant applications
- Fighting spam
- Chatbots
- Managing advertisements
- Sentiment analysis
- Text classification
- Text summarization
- Toxicity Classification
- Named entity recognition
- Part of speech tagging
- Language model building
- Machine translation
- Spell checking
- Speech recognition
- Character recognition
4. Python Libraries for NLP:
- NLTK
- spaCy
- Gensim: a Python library focused on topic modelling
- Pattern
- Stanford CoreNLP
- Polyglot
- TextBlob
- re: Python's built-in module for regular expressions
- WordCloud
- AllenNLP: an open-source NLP research library built on PyTorch
5. A Few Terms in NLP:
- Stop words
- Punctuation
- Word embedding
- Word segmentation
- Text summarization
- Regular expression
- Morphological segmentation
- Named entity recognition
- Corpus: A collection of texts
- Document-Term Matrix
- n-gram: a contiguous sequence of n tokens (words or characters) taken from a text
- Latent Dirichlet Allocation: a technique for topic modelling.
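Two of the terms above, the n-gram and the document-term matrix, are easy to show in a few lines. The sketch below is a minimal illustration, assuming whitespace tokenization and raw counts; the helper names `ngrams` and `document_term_matrix` and the toy corpus are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(corpus):
    """Build a count matrix: one row per document, one column per vocabulary term."""
    tokenized = [doc.lower().split() for doc in corpus]
    # Sorted vocabulary gives a stable column order.
    vocabulary = sorted({term for doc in tokenized for term in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the dog sat down", "the cat saw the dog"]

print(ngrams(corpus[0].split(), 2))  # [('the', 'cat'), ('cat', 'sat')]

vocabulary, matrix = document_term_matrix(corpus)
print(vocabulary)  # ['cat', 'dog', 'down', 'sat', 'saw', 'the']
for row in matrix:
    print(row)
# [1, 0, 0, 1, 0, 1]
# [0, 1, 1, 1, 0, 1]
# [1, 1, 0, 0, 1, 2]
```

A matrix like this is the usual input to topic-modelling techniques such as Latent Dirichlet Allocation, which is why the two terms tend to appear together.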
6. Word Embedding Models and Libraries:
- Word2vec
- GloVe
- fastText
- Gensim
7. Some Useful Links for Learning NLP:
- NLP Course Beginner to Advanced
- natural-language-processing-fundamentals-in-python
- Stanford’s CS224n: Natural Language Processing with Deep Learning
- Top 10 Books on NLP and Text Analysis
8. NLP Engineer Interview Questions:
- natural language processing interview questions
- nlp-interview-questions
- 30-questions-test-data-scientist-natural-language-processing
9. Great Tutorials for NLTK & spaCy:
- https://pythonspot.com/category/nltk/
- Python for NLP: Tokenization, Stemming, and Lemmatization with SpaCy Library
- https://course.spacy.io/
10. Some Great Topics on Kaggle:
It’s a good idea to read the following topics, because they cover almost all the issues relevant to this competition: