Experiment 5
Given the following short movie reviews, each labelled with a genre, either comedy or action:
● fun, couple, love, love (comedy)
● fast, furious, shoot (action)
● couple, fly, fast, fun, fun (comedy)
● furious, shoot, shoot, fun (action)
● fly, fast, shoot, love (action)
and a new document D: fast, couple, shoot, fly.
Compute the most likely class for D. Assume a Naive Bayes classifier and use add-1 smoothing for the likelihoods.
Aim:
To classify the new document D (which contains the words "fast", "couple", "shoot", and "fly") using a Naive Bayes classifier, we calculate the posterior score of each class (comedy or action) and compare the results, using add-1 smoothing for the likelihoods.
Procedure:
1) Open Anaconda Navigator.
2) Click on Launch under Jupyter Notebook.
3) Once Jupyter Notebook opens in the browser, create a new notebook by selecting New -> Python 3.
4) Install the necessary libraries (e.g., nltk, spacy).
5) After completing your analysis, save your work: click File > Save and Checkpoint or use the keyboard shortcut Ctrl + S.
6) Export the notebook (optional): to share your notebook or convert it into another format (such as PDF or HTML), click File > Download as and select the format you wish to export to (e.g., PDF, HTML, Markdown).
7) Shut down Jupyter Notebook: close the Jupyter Notebook tab in your browser, or press Ctrl + C in the command line to stop the server.
Theory:
Step 1: Organize the Data
We have two classes, comedy and action, and the following labelled movie reviews:
Comedy class:
- fun, couple, love, love (comedy)
- couple, fly, fast, fun, fun (comedy)
Action class:
- fast, furious, shoot (action)
- furious, shoot, shoot, fun (action)
- fly, fast, shoot, love (action)
The document D consists of the words "fast", "couple", "shoot", and "fly".
Step 2: Calculate Class Prior Probabilities
The class prior probabilities are based on the number of documents in each class.
- Total number of documents = 5
- Number of comedy documents = 2
- Number of action documents = 3
The prior probabilities are:
P(comedy) = 2/5
P(action) = 3/5
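The prior computation above can be sketched in a few lines of Python; `fractions.Fraction` keeps the values exact rather than as floats (the list of labels mirrors the five training reviews in order):

```python
from fractions import Fraction

# Class labels of the five training reviews, in order
labels = ["comedy", "action", "comedy", "action", "action"]

# Prior for each class = (documents in the class) / (total documents)
priors = {c: Fraction(labels.count(c), len(labels)) for c in set(labels)}

print(priors["comedy"], priors["action"])  # 2/5 3/5
```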
Step 3: Compute the Word Likelihoods with Add-1 Smoothing
For each class, we calculate the likelihood of each word given the class. We apply add-1 smoothing (also known as Laplace smoothing) to account for words that do not appear in a class's training data.
Vocabulary:
We take the union of the words that appear across all reviews:
- Words in comedy reviews: "fun", "couple", "love", "fly", "fast"
- Words in action reviews: "fast", "furious", "shoot", "fun", "fly", "love"
Thus, the total vocabulary size V is 7 distinct words: ["fun", "couple", "love", "fly", "fast", "shoot", "furious"].
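The vocabulary union can be checked with a short sketch (the tokenized reviews below are simply the training data hand-copied into lists):

```python
# The five training reviews, tokenized and grouped by class
comedy_docs = [
    ["fun", "couple", "love", "love"],
    ["couple", "fly", "fast", "fun", "fun"],
]
action_docs = [
    ["fast", "furious", "shoot"],
    ["furious", "shoot", "shoot", "fun"],
    ["fly", "fast", "shoot", "love"],
]

# Vocabulary = union of the distinct words across all reviews
vocab = {w for doc in comedy_docs + action_docs for w in doc}
print(sorted(vocab))
print(len(vocab))  # 7
```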
Calculate Likelihoods for Comedy Class:
The raw word counts in the comedy documents are:
- "fun": appears 3 times.
- "couple": appears 2 times.
- "love": appears 2 times.
- "fly": appears 1 time.
- "fast": appears 1 time.
- "shoot": appears 0 times.
- "furious": appears 0 times.
Total words in comedy documents = 3 + 2 + 2 + 1 + 1 = 9 words.
With add-1 smoothing, each likelihood is P(w | comedy) = (count(w) + 1) / (9 + V) = (count(w) + 1) / 16.
Calculate Likelihoods for Action Class:
The raw word counts in the action documents are:
- "fun": appears 1 time.
- "couple": appears 0 times.
- "love": appears 1 time.
- "fly": appears 1 time.
- "fast": appears 2 times.
- "shoot": appears 4 times.
- "furious": appears 2 times.
Total words in action documents = 1 + 0 + 1 + 1 + 2 + 4 + 2 = 11 words.
With add-1 smoothing, each likelihood is P(w | action) = (count(w) + 1) / (11 + V) = (count(w) + 1) / 18.
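Putting the pieces together, the whole classification can be carried out with a minimal Python sketch; exact fractions make it easy to check the hand computation (the nested-list layout of the training data is an assumption of this sketch, not part of the exercise):

```python
from collections import Counter
from fractions import Fraction

# Training reviews, tokenized and grouped by class
classes = {
    "comedy": [["fun", "couple", "love", "love"],
               ["couple", "fly", "fast", "fun", "fun"]],
    "action": [["fast", "furious", "shoot"],
               ["furious", "shoot", "shoot", "fun"],
               ["fly", "fast", "shoot", "love"]],
}

vocab = {w for docs in classes.values() for doc in docs for w in doc}
V = len(vocab)                                            # 7 distinct words
total_docs = sum(len(docs) for docs in classes.values())  # 5 reviews

D = ["fast", "couple", "shoot", "fly"]                    # the new document

posteriors = {}
for c, docs in classes.items():
    counts = Counter(w for doc in docs for w in doc)
    n_words = sum(counts.values())               # 9 for comedy, 11 for action
    score = Fraction(len(docs), total_docs)      # class prior: 2/5 or 3/5
    for w in D:
        # Add-1 (Laplace) smoothed likelihood of w given the class
        score *= Fraction(counts[w] + 1, n_words + V)
    posteriors[c] = score

print(posteriors["comedy"])                      # 3/40960  (~7.3e-5)
print(posteriors["action"])                      # 1/5832   (~1.7e-4)
print(max(posteriors, key=posteriors.get))       # action
```

Because Naive Bayes only compares the classes, the scores are left unnormalised; dividing each by their sum would give the true posterior probabilities.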
Step 4: Compute the Posteriors
With V = 7, the smoothed denominators are 9 + 7 = 16 for comedy and 11 + 7 = 18 for action. For D = "fast, couple, shoot, fly":
P(comedy) · P(D | comedy) = 2/5 × 2/16 × 3/16 × 1/16 × 2/16 ≈ 7.3 × 10^-5
P(action) · P(D | action) = 3/5 × 3/18 × 1/18 × 5/18 × 2/18 ≈ 1.7 × 10^-4
Result:
Since the action score is higher, the Naive Bayes classifier with add-1 smoothing classifies the new document "fast, couple, shoot, fly" as action. The classification is based on the prior probabilities and the smoothed likelihoods derived from the training data.