Machines do not process language the same way humans do.
A word can mean different things depending on where it appears.
"The bank approved my loan."
"I sat near the river bank."
In the first sentence, bank means a financial institution.
In the second sentence, bank means the side of a river.
That is the central difficulty of language processing. Language is not just a list of words. It is meaning, context, order, tone, and relationships between words.
The model has to use nearby words to decide which meaning of bank is active.
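As a rough illustration of "using nearby words", the sketch below guesses a sense of bank by counting which hand-picked cue words appear in the sentence. The cue lists and the name guess_sense are made up for this example. Real models learn these associations from data instead of relying on hand-written lists.

```python
import re

# Hypothetical cue words for each sense of "bank" (hand-picked for illustration).
SENSE_CUES = {
    "financial institution": {"loan", "account", "approved", "deposit", "money"},
    "side of a river": {"river", "water", "sat", "shore", "fishing"},
}

def guess_sense(sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & words))

print(guess_sense("The bank approved my loan."))   # financial institution
print(guess_sense("I sat near the river bank."))   # side of a river
```

A sentence with no cue words falls back to the first sense in the dictionary, which is one reason this approach does not scale beyond a demo.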
Before embeddings or Transformers make sense, it helps to understand the broader field they come from.
This is part 1 of a three-part introduction. The next two parts cover the move from counting words to embeddings, and then from contextual embeddings to Transformers.
What NLP is trying to do
NLP stands for Natural Language Processing.
It is the area of AI focused on helping computers work with human language.
NLP includes tasks like:
text classification
translation
summarization
question answering
chatbots
semantic search
speech-to-text
information extraction
sentiment analysis
A simple NLP system takes raw text, processes it, and produces a useful output.
Example:
Input:
"I love this product."
Output:
Sentiment = positive
Another example:
Input:
"Summarize this article."
Output:
A short summary written in natural language.
NLP is the broad field. Inside it, two important ideas are NLU and NLG.
NLU: understanding language
NLU means Natural Language Understanding.
It is the part of NLP focused on understanding what text means.
Example:
"Book me a flight to Paris next Friday."
An NLU system might extract:
intent: book a flight
destination: Paris
date: next Friday
The system is not only reading the words. It is trying to identify the user's goal and the important details inside the sentence.
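To make the extraction step concrete, here is a minimal sketch that pulls the same three fields out of the example sentence with regular expressions. The function name parse_flight_request and the patterns are assumptions for illustration. A production NLU system would typically use trained intent and entity models in place of hand-written rules.

```python
import re

def parse_flight_request(text):
    """Extract a rough intent, destination, and date from a booking request."""
    result = {"intent": None, "destination": None, "date": None}

    # Intent: a keyword check stands in for a trained intent classifier.
    if re.search(r"\bbook\b.*\bflight\b", text, re.IGNORECASE):
        result["intent"] = "book a flight"

    # Destination: the capitalized word(s) that follow "to".
    dest = re.search(r"\bto ([A-Z][a-z]+(?: [A-Z][a-z]+)*)", text)
    if dest:
        result["destination"] = dest.group(1)

    # Date: a small, fixed set of relative date phrases.
    date = re.search(
        r"\b((?:next|this) (?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day|today|tomorrow)\b",
        text,
        re.IGNORECASE,
    )
    if date:
        result["date"] = date.group(1)
    return result

print(parse_flight_request("Book me a flight to Paris next Friday."))
# {'intent': 'book a flight', 'destination': 'Paris', 'date': 'next Friday'}
```

Rules like these break quickly on rephrasings such as "I need to get to Paris", which is why real systems learn from examples.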
NLU is used for tasks like:
intent detection
named entity recognition
sentiment analysis
topic classification
text classification
question understanding
A chatbot needs NLU to understand what the user is asking.
A search engine needs NLU to understand what a query means.
A customer support system needs NLU to determine whether the user is asking for a refund, reporting a bug, or complaining about a product.
NLG: generating language
NLG means Natural Language Generation.
It is the part of NLP focused on producing text.
If NLU is about reading, NLG is about writing.
Examples of NLG-heavy tasks include:
writing an answer
summarizing a document
generating an email
creating chatbot replies
translating text
explaining data in natural language
Translation uses both sides: the system has to interpret the source text, then generate text in another language.
If a support system understands that a customer is upset about a late delivery, NLG helps generate a response like:
"I'm sorry your delivery was late. I can help check the status or request a refund."
A simple way to remember the relationship:
NLP = the whole field
NLU = understanding language
NLG = generating language
Modern LLM apps often use all three ideas at once. The system understands the user's request, retrieves or processes information, then generates a useful answer.
"Book me a flight to Paris next Friday."
-> NLU: intent, destination, date, sentiment, entities
-> NLG: a natural-language response or summary
NLP is the full field. NLU is the reading side. NLG is the writing side.
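The reading side and the writing side can be sketched as two small functions chained together. This is a toy version under stated assumptions: understand uses keyword matching where a real system would use a trained model, generate uses fixed templates where an LLM would write free text, and the fallback reply is a placeholder.

```python
import re

# Reply templates keyed by detected intent. The "late_delivery" text is the
# example reply from this article; the "unknown" reply is a placeholder.
RESPONSES = {
    "late_delivery": "I'm sorry your delivery was late. "
                     "I can help check the status or request a refund.",
    "unknown": "Could you tell me a bit more about the problem?",
}

def understand(message):
    """Toy NLU step: map a message to an intent using keyword matching."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    if words & {"late", "delayed"}:
        return {"intent": "late_delivery"}
    return {"intent": "unknown"}

def generate(understanding):
    """Toy NLG step: pick the reply template for the detected intent."""
    return RESPONSES[understanding["intent"]]

def reply(message):
    """Full pipeline: understand the message, then generate a response."""
    return generate(understand(message))

print(reply("My package is late and nobody has told me why."))
```

Keeping the two steps separate makes it possible to swap either side for a stronger component without touching the other.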
Tokenization
Tokenization means breaking text into smaller pieces.
Those pieces are called tokens.
Tokens can be:
words
subwords
characters
punctuation
Example:
"Apple opened a new office in Paris."
Tokenized:
["Apple", "opened", "a", "new", "office", "in", "Paris", "."]
Modern language models often use subword tokenization.
One tokenizer might split a word like this:
"unbelievable"
-> ["un", "believ", "able"]
The exact split depends on the tokenizer.
Tokenization matters because models do not process raw text directly. Text has to be split into smaller units first.
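One way to see how a subword split can arise is a greedy longest-match tokenizer over a fixed vocabulary. The sketch below is a simplified illustration: the vocabulary is invented so that the example word splits cleanly, and real tokenizers learn their vocabularies from large corpora and often mark word-internal pieces with special prefixes.

```python
# Hypothetical subword vocabulary, chosen so the example word splits cleanly.
VOCAB = {"un", "believ", "able", "break", "ing"}

def split_into_subwords(word, vocab):
    """Greedy longest-match split; unmatched characters become single tokens."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary entry matched: fall back to one character.
            pieces.append(word[start])
            start += 1
    return pieces

print(split_into_subwords("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(split_into_subwords("unbreakable", VOCAB))   # ['un', 'break', 'able']
```

The second call shows why subwords help with unfamiliar words: "unbreakable" is not in the vocabulary, but its pieces are.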
The flow looks like this:
Text
-> tokens
-> token IDs
-> embeddings
-> model
Common tools for tokenization:
NLTK
spaCy
Hugging Face Tokenizers
Transformers tokenizers
Tokenization sounds small, but it affects cost, context length, search quality, and how the model handles unfamiliar words.
After tokenization, each token can be mapped to an ID and then to an embedding vector.
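The text -> tokens -> token IDs -> embeddings flow can be walked through in a few lines. This is a minimal sketch, not how any particular library does it: the regex tokenizer, the on-the-fly vocabulary, and the random 4-dimensional vectors are all stand-ins. A trained model ships with a fixed vocabulary and learned embedding values.

```python
import random
import re

def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Apple opened a new office in Paris.")

# Toy vocabulary: each distinct token gets the next free integer ID.
vocab = {}
for token in tokens:
    vocab.setdefault(token, len(vocab))
token_ids = [vocab[token] for token in tokens]

# Toy embedding table: one random 4-dimensional vector per vocabulary entry.
# A trained model would learn these values instead of sampling them.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(4)] for _ in vocab]
embeddings = [embedding_table[i] for i in token_ids]

print(tokens)     # ['Apple', 'opened', 'a', 'new', 'office', 'in', 'Paris', '.']
print(token_ids)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(len(embeddings), len(embeddings[0]))  # 8 4
```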
Named entity recognition
Named Entity Recognition, usually called NER, means identifying important named things in text.
Entities can include:
people
companies
locations
dates
money amounts
organizations
products
Example:
"Apple opened a new office in Paris on Monday."
NER output:
Apple -> Organization
Paris -> Location
Monday -> Date
Another example:
"Elon Musk bought Twitter for $44 billion in 2022."
NER output:
Elon Musk -> Person
Twitter -> Organization
$44 billion -> Money
2022 -> Date
NER is useful because it turns messy text into structured information.
This sentence:
"Meeting with Sarah in London next Friday."
can become:
person: Sarah
location: London
date: next Friday
Common tools for NER:
spaCy
NLTK
Stanford NLP
Hugging Face Transformers
Flair
NER is one reason NLP became useful in real software. Once text becomes structured, applications can search it, route it, validate it, and store it.
NER turns unstructured text into fields an application can store, filter, or route.
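To show the shape of NER output without depending on a trained model, the sketch below uses lookup lists and two regular expressions. Everything here is an assumption for illustration: the lists only cover the names used in this article, and the function name find_entities is invented. Libraries such as spaCy rely on statistical models that generalize to names they have not seen.

```python
import re

# Hypothetical lookup lists (gazetteers) covering only this article's examples.
GAZETTEER = {
    "Organization": ["Apple", "Twitter"],
    "Location": ["Paris", "London"],
    "Person": ["Elon Musk", "Sarah"],
}
# Hand-written patterns for money amounts and a few date formats.
PATTERNS = {
    "Money": r"\$\d+(?:\.\d+)?(?: (?:million|billion))?",
    "Date": r"\b(?:(?:next )?(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day|(?:19|20)\d{2})\b",
}

def find_entities(text):
    """Return (entity text, label) pairs in the order they appear in the text."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((m.start(), m.group(), label))
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((m.start(), m.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]

print(find_entities("Elon Musk bought Twitter for $44 billion in 2022."))
# [('Elon Musk', 'Person'), ('Twitter', 'Organization'), ('$44 billion', 'Money'), ('2022', 'Date')]
```

A lookup list cannot tell Apple the company from apple the fruit at the start of a sentence, which is the same context problem as bank.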
Sentiment analysis
Sentiment analysis means detecting the emotional tone or opinion in text.
The basic labels are usually:
positive
negative
neutral
Examples:
"I love this app. It is fast and easy to use."
-> positive
"This app keeps crashing after the update."
-> negative
"The app was updated yesterday."
-> neutral
Sometimes sentiment can be more detailed:
happy
angry
sad
frustrated
excited
confused
Sentiment analysis is used for:
product reviews
customer support tickets
surveys
social media monitoring
feedback analysis
Common tools for sentiment analysis:
scikit-learn
NLTK
TextBlob
spaCy
Hugging Face Transformers
Sentiment analysis is a good beginner example because the input and output are easy to understand. The hard part is that real language is subtle.
Example:
"Great, the app crashed again."
The word great looks positive, but the sentence is negative. That is why context matters.
The simple version predicts positive, negative, or neutral. Real systems often need more nuance.
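A word-counting classifier is enough to reproduce the three basic labels, and also enough to show where it fails. The word lists and the name lexicon_sentiment below are assumptions made for this sketch. Real lexicons are much larger, and trained models look at whole sentences instead of isolated words.

```python
import re

# Hypothetical word lists; real sentiment lexicons are far larger and often weighted.
POSITIVE = {"love", "fast", "easy", "great"}
NEGATIVE = {"crashing", "crashed", "slow", "broken"}

def lexicon_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this app. It is fast and easy to use."))  # positive
print(lexicon_sentiment("This app keeps crashing after the update."))     # negative
print(lexicon_sentiment("The app was updated yesterday."))                # neutral
print(lexicon_sentiment("Great, the app crashed again."))                 # neutral
```

The last line is the sarcastic example: "great" and "crashed" cancel out, so the counter says neutral even though a human reads the sentence as negative.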
Why this matters before embeddings
All of these tasks share the same underlying problem:
How do we turn language into something a machine can process?
Early NLP systems often handled this by counting words.
Modern systems try to represent meaning numerically.
That is the path from traditional NLP to embeddings and Transformers:
raw text
-> tokens
-> numbers
-> meaning representations
-> useful output
The important idea is simple:
NLP is the field.
NLU reads and interprets language.
NLG writes language.
Tokenization, NER, and sentiment analysis are common tasks inside that field.
Once that map is clear, the next question is natural:
How did older NLP systems represent text before embeddings?
That is where word counts, Bag of Words, and TF-IDF come in.
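As a small preview of the counting idea, here is a sketch that turns the two bank sentences from the start of this article into word counts. The helper name word_counts is invented for the example, and this is plain counting only, not a full Bag of Words or TF-IDF implementation.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

counts = word_counts("The bank approved my loan. I sat near the river bank.")
print(counts["bank"])   # 2
print(counts["river"])  # 1
```

The count for bank is 2, with no trace of the fact that the two occurrences mean different things. That gap is what embeddings try to close.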