NLP Deep Dive: Complete Course on Tokenization, Stop Words, Stemming and Lemmatization
🕒 2025-04-23
What will you learn?
- Get all the NLP courses with video tutorials (start to end with projects)
- Tokenization in Natural Language Processing
- Stop Words in NLP
- Stemming and Lemmatization
- What is Natural Language Processing?
- Why Natural Language Processing?
Get all the NLP courses with video tutorials (start to end with projects)
I will be discussing several topics related to Natural Language Processing:
- Overview of the course
- Introduction to Natural Language Processing (NLP)
- Tokenization
- Stop-words
- Stemming and Lemmatization
- Bag of Words model
- TF-IDF model
- Project 1: Movie Recommendation System using TF-IDF
- Embedding layers: Word embedding technique
- Word2Vec: Word embedding technique
- GloVe (Global Vectors for Word Representation): Word embedding technique
- Recurrent Neural Network (RNN) | Simple Neural Network vs RNN vs CNN (convolutional neural network)
- Long short-term memory (LSTM) and Gated recurrent unit (GRU)
- Project 2: Next word prediction using LSTM
- Bidirectional LSTM
- Sequence to Sequence (Seq2seq) model | Encoder and Decoder architecture
- Transformer in NLP
- BERT in NLP
- Project 3: Question Answering using BERT
These are the topics I will be focusing on. I have created an entire playlist for this course, so you can watch everything there. Studying all of these concepts through practical examples and demonstrations will be a lot of fun.
Tokenization in Natural Language Processing
import nltk
# Download the tokenizer models used by sent_tokenize and word_tokenize
nltk.download('punkt')
paragraph="""Natural language processing (NLP) is an artificial intelligence (AI) technique that lets users communicate with intelligent computers using a natural language, like English. Natural language processing is essential when we want an intelligent system, such as a robot, to follow our instructions, when we want to hear a conclusion from a dialogue-based clinical expert system, and so on. The field of NLP is concerned with teaching computers to execute meaningful tasks using the natural languages that humans use. An NLP system's input and output can be speech and written text."""
# Tokenizing sentences
sentences = nltk.sent_tokenize(paragraph)
print(sentences)
print(len(sentences))
# Tokenizing words
words = nltk.word_tokenize(paragraph)
print(words)
print(len(words))
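As a quick side note, the word tokenizer also splits punctuation and contractions into separate tokens. Here is a minimal sketch on a short made-up sentence (the sentence is just an assumption for illustration):
# Punctuation and contractions become their own tokens
sample = "NLP is fun, isn't it?"
print(nltk.word_tokenize(sample))
# Expect tokens roughly like: ['NLP', 'is', 'fun', ',', 'is', "n't", 'it', '?']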
Stop words in Natural Language Processing
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words('english')
# Removing stopwords from the given tokens
stop_words = set(stopwords.words('english'))
lst = [word for word in ("Welcome", "to", "my", "channel", "IG", "Tech", "Team") if word not in stop_words]
print(lst)
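One thing to keep in mind: the stopword list is all lowercase, so the membership check is case-sensitive. A common pattern (shown below as a minimal sketch) is to lowercase each token before checking it:
# Lowercasing before the stopword check so that capitalized words like "To" or "My" are also removed
tokens = ["Welcome", "To", "My", "channel", "IG", "Tech", "Team"]
lst = [word for word in tokens if word.lower() not in stop_words]
print(lst)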
paragraph="""Natural language processing (NLP) is an artificial intelligence (AI) technology that uses a natural language, for instance English, to interact with intelligent systems. Natural language processing is essential when we want an intelligent system, such as a robot, to follow our instructions, when we want to hear a conclusion from a dialogue-based clinical expert system, and so on. The field of NLP is concerned with teaching computers to execute meaningful tasks using the natural languages that humans use. An NLP system's input and output can be speech and written text."""
# Tokenizing words
words = nltk.word_tokenize(paragraph)
# Removing stopwords from the paragraph above
lst = [word for word in words if word not in stop_words]
print(lst)
# Assignment
# --> Besides English, which other languages can be passed to stopwords.words() to get their stopword lists?
# Solution:
print(stopwords.fileids())
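The stopwords corpus ships lists for many languages besides English. As a small sketch, any of the file IDs printed above, for example 'spanish', can be passed in the same way:
# Stopwords for another language, e.g. Spanish
spanish_stopwords = stopwords.words('spanish')
print(spanish_stopwords[:10])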
Stemming and Lemmatization in Natural Language Processing
import nltk
from nltk.corpus import stopwords
# Applying stemming to the given words
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words=["go","goes","google","googling","goal","goals"]
words = [stemmer.stem(word) for word in words]
print(words)
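To see more clearly what the Porter stemmer does, here is a small sketch on a few common inflected forms (my own example words); note that the stems are not always valid dictionary words:
# Stemming common inflections; stems are truncated forms, not necessarily real words
more_words = ["running", "runs", "easily", "studies", "studying"]
print([stemmer.stem(word) for word in more_words])
# Expect stems roughly like: ['run', 'run', 'easili', 'studi', 'studi']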
# Applying lemmatization to the given words
from nltk.stem import WordNetLemmatizer
# The WordNet lemmatizer needs the 'wordnet' corpus
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
words=["go","goes","google","googling","goal","goals"]
words = [lemmatizer.lemmatize(word) for word in words]
print(words)
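By default, WordNetLemmatizer treats every word as a noun, which is why verb forms like "goes" may come back unchanged. Passing an explicit part-of-speech tag changes that; a minimal sketch:
# Lemmatizing with an explicit part-of-speech tag ('v' = verb)
print(lemmatizer.lemmatize("goes", pos="v"))   # 'go'
print(lemmatizer.lemmatize("going", pos="v"))  # 'go'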
#paragraph
paragraph="""Natural language processing (NLP) is an artificial intelligence (AI) technique for collaborating with intelligent systems that uses a natural language, for instance, English. Natural language processing is essential when we want an intelligent system, such as a robot, to follow our instructions, when we want to hear a conclusion from a dialogue-based clinical expert system, and so on. The field of NLP is concerned with teaching computers to execute meaningful tasks using the natural languages that humans use. An NLP system's input and output can be speech and written text."""
# Apply tokenization and then stemming
sentences = nltk.sent_tokenize(paragraph)
stop_words = set(stopwords.words('english'))
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)
print(sentences)
# Apply tokenization and then Lemmatization
sentences = nltk.sent_tokenize(paragraph)
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)
print(sentences)
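To summarize the difference between the two, here is a quick side-by-side sketch on a single word: the stemmer chops off suffixes and may produce a non-word, while the lemmatizer maps the word to a valid dictionary form.
word = "studies"
print(stemmer.stem(word))          # 'studi' -- a truncated stem, not a dictionary word
print(lemmatizer.lemmatize(word))  # 'study' -- a valid dictionary form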
Conclusion:
This concludes the NLP deep dive post. In this post, we learned about tokenization, stop words, stemming, and lemmatization. Make sure you check out the complete playlist. It includes three projects: a Movie Recommendation System, Next Word Prediction, and Question Answering.
I hope this post is helpful to you. Don't hesitate to ask in the comment section if you have any questions. I will get back to you as soon as possible. Thanks.