NLP Lab Program 2 - BAI601

 

Experiment 2:

Demonstrate N-gram modeling to analyze and establish the probability distribution across sentences, and explore the use of unigrams, bigrams, and trigrams in diverse English sentences to illustrate how varying the n-gram order affects the calculated probabilities.


What is an N-gram?

An N-gram is a sequence of 'n' words that appear next to each other in a sentence or a piece of text. The value of 'n' tells you how many words are in that sequence. For example, if we have the sentence:

"I love programming"

  • Unigram (1-gram): Each word is considered individually. So, the unigrams for this sentence would be:
    • I
    • love
    • programming
  • Bigram (2-gram): A bigram looks at pairs of consecutive words. For the same sentence, the bigrams would be:
    • I love
    • love programming
  • Trigram (3-gram): A trigram looks at triplets of consecutive words. For the sentence above, the trigram would be:
    • I love programming

As you can see, the number of words in the sequence increases as you move from unigrams to bigrams to trigrams.
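The extraction described above can be sketched in a few lines of Python. This is a minimal sketch assuming simple whitespace tokenization; the function name `extract_ngrams` is illustrative, not from a library.

```python
def extract_ngrams(sentence, n):
    """Return the list of n-grams (as tuples of words) in a sentence."""
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "I love programming"
print(extract_ngrams(sentence, 1))  # unigrams: [('I',), ('love',), ('programming',)]
print(extract_ngrams(sentence, 2))  # bigrams:  [('I', 'love'), ('love', 'programming')]
print(extract_ngrams(sentence, 3))  # trigram:  [('I', 'love', 'programming')]
```

Note how a sentence of k words yields k unigrams, k-1 bigrams, and k-2 trigrams, which is why the three-word sentence has exactly one trigram.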

Why is 'n' important?

The number 'n' in N-grams tells us how many words we look at together. For instance:

  • Unigrams focus on individual words.
  • Bigrams help us understand how two words come together to form meaning.
  • Trigrams take a closer look at how three words work together to convey meaning.

Increasing 'n' allows us to capture more context. The more words we include, the better we can understand the relationships and meanings behind them.

Examples of N-grams in Action

  1. Unigrams (1-grams):
    • Definition: Unigrams are individual words in a sentence.
    • Example: If the sentence is “I enjoy reading books,” the unigrams are:
      • I
      • enjoy
      • reading
      • books
  2. Bigrams (2-grams):
    • Definition: Bigrams are pairs of consecutive words in a sentence.
    • Example: In the same sentence, the bigrams would be:
      • I enjoy
      • enjoy reading
      • reading books

Bigrams help us understand the relationship between two consecutive words. For example, “enjoy reading” makes more sense together than the words “enjoy” or “reading” separately.

  3. Trigrams (3-grams):
    • Definition: Trigrams are triplets of consecutive words in a sentence.
    • Example: For the sentence "I enjoy reading books," the trigrams would be:
      • I enjoy reading
      • enjoy reading books

Trigrams give us even more context: a triplet like "enjoy reading books" conveys more meaning than looking at "reading" or "books" on their own.
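Counting how often each n-gram occurs is the first step toward spotting common phrases like "reading books". Below is a minimal sketch using Python's `collections.Counter` over a tiny toy corpus (the corpus sentences are made up for illustration).

```python
from collections import Counter

# A tiny made-up corpus for demonstration.
corpus = [
    "I enjoy reading books",
    "she is reading books now",
    "they enjoy reading novels",
]

def count_ngrams(sentences, n):
    """Count every n-gram across a list of sentences."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts

print(count_ngrams(corpus, 2).most_common(3))
# ('reading', 'books') appears twice, marking it as a frequent bigram
```

Frequent n-grams like this are exactly the word patterns the applications below rely on.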

Why Are N-grams Important?

N-grams are crucial for understanding how words are connected in sentences and for making sense of language. In simple terms, they help machines learn patterns in text, making it possible for computers to:

  • Predict the next word: In tasks like autocomplete or chatbots, N-grams help computers predict what word is most likely to come next.
  • Improve search engines: When searching for information, understanding common phrases (bigrams, trigrams) can help improve the results returned.
  • Language translation: N-grams are used in translating text from one language to another, because they help the machine understand the relationship between different words in both languages.
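The "predict the next word" task above can be sketched with bigram counts alone. This is a toy illustration (the corpus and the helper name `predict_next` are made up here), not a production autocomplete system.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus for demonstration.
corpus = [
    "I love programming",
    "I love reading",
    "I love programming in python",
]

# Map each word to a Counter of the words observed right after it.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        following[w1][w2] += 1

def predict_next(word):
    """Return the most frequent word seen after `word`, or None if unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

print(predict_next("love"))  # 'programming' follows 'love' twice, 'reading' once
```

This is essentially what a bigram-based autocomplete does: pick the continuation with the highest observed count for the current word.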

Applications of N-grams

  1. Speech Recognition: N-grams are used to recognize words in spoken language, such as voice assistants like Siri or Alexa. They predict which words are likely to follow each other.
  2. Text Prediction: In predictive text systems, like texting on smartphones or typing in word processors, N-grams help suggest the next word based on the previous ones.
  3. Sentiment Analysis: When analyzing the sentiment (positive or negative) of a sentence, N-grams can help detect patterns of words that typically indicate sentiment, like “love programming” (positive sentiment) or “hate bugs” (negative sentiment).
  4. Machine Translation: When translating one language to another, N-grams help machines understand word patterns and improve the accuracy of translations.

How Does Increasing the Value of 'n' Affect the Model?

  • Unigrams (n=1): Only the individual words are considered. While simple, this doesn’t capture relationships between words. For instance, “I love programming” would be seen as three separate words.
  • Bigrams (n=2): This captures some context, such as how “love programming” appears together frequently. It helps in understanding common word pairings.
  • Trigrams (n=3): This gives a deeper level of understanding by capturing patterns of three words, such as "machine learning model" or "data science approach". This helps in predicting sequences of words that make sense together.
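The effect of 'n' on calculated probabilities can be demonstrated directly. Under maximum-likelihood estimation, a unigram model scores a sentence as the product of individual word probabilities, while a bigram model uses P(w2 | w1) = count(w1 w2) / count(w1). The sketch below trains both on a tiny made-up corpus, using `<s>` and `</s>` as sentence-boundary markers (a common convention, assumed here).

```python
from collections import Counter

# A tiny made-up training corpus.
corpus = ["I love programming", "I love reading", "you love programming"]
sentences = [["<s>"] + s.split() + ["</s>"] for s in corpus]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((w1, w2) for s in sentences for w1, w2 in zip(s, s[1:]))
total = sum(unigrams.values())

def unigram_prob(sentence):
    """P(sentence) as a product of independent word probabilities."""
    p = 1.0
    for w in sentence.split():
        p *= unigrams[w] / total
    return p

def bigram_prob(sentence):
    """P(sentence) as a product of conditional bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(unigram_prob("I love programming"))  # ~0.0036
print(bigram_prob("I love programming"))   # ~0.4444
```

The bigram model assigns a much higher probability to this sentence because it rewards word pairs that actually occur together in the training data, whereas the unigram model treats every word as independent. This is the impact of increasing 'n' that the experiment sets out to illustrate; note that higher orders also suffer more from unseen n-grams (zero counts), which is why smoothing is used in practice.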




 
