Program 5
Design and implement a deep learning network for classification of textual documents.
Step 1:
· TensorFlow/Keras → to build and train the deep learning model.
· IMDB dataset → built-in dataset of 50,000 movie reviews (positive/negative).
· pad_sequences → ensures all reviews are the same length.
· Matplotlib → for plotting graphs of training/validation performance.
Step 2:
· VOCAB_SIZE → keep only the 10,000 most common words.
· MAX_LEN → each review is cut or padded to 256 words.
· EMBEDDING_DIM → each word becomes a 128-dimensional vector.
· BATCH_SIZE → train on 64 reviews at a time.
· EPOCHS → train the model for up to 12 passes through the dataset.
· SEED → fixes the random seed so that runs are reproducible.
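The settings above can be written as plain constants. This is a minimal sketch: the numeric values come from the list, but the seed value is not given in the text, so 42 is only a placeholder.

```python
import random

# Values taken from the Step 2 list.
VOCAB_SIZE = 10_000     # keep only the 10,000 most frequent words
MAX_LEN = 256           # every review is cut or padded to 256 words
EMBEDDING_DIM = 128     # length of each word vector
BATCH_SIZE = 64         # reviews per training batch
EPOCHS = 12             # upper bound on passes through the training data

# Assumption: the seed value is not stated in the notes; 42 is a placeholder.
SEED = 42

# Seeds Python's own generator only; NumPy and TensorFlow are seeded separately.
random.seed(SEED)
```

EPOCHS is an upper bound because early stopping (Step 7) may end training sooner.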
Step 3:
· Loads training and test data. Each review is already converted into word indices.
· pad_sequences → ensures all reviews are the same length (256).
  · Short reviews → padded with zeros.
  · Long reviews → cut at 256.
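To make the padding rule concrete, here is a small pure-Python sketch of what pad_sequences does to a single review. It assumes the Keras default behaviour (zeros added at the front, long reviews keep their last words); the notes do not say which side is used. The function name pad_or_truncate is made up for this example.

```python
def pad_or_truncate(review, max_len=256, pad_value=0):
    """Return a copy of `review` that is exactly `max_len` items long.

    Assumption: mimics the Keras defaults padding="pre" and truncating="pre",
    so zeros go in front and a long review keeps its last `max_len` items.
    """
    if len(review) >= max_len:
        # Drop items from the front, keep the tail.
        return list(review[len(review) - max_len:])
    # Fill the front with pad_value until the length is max_len.
    return [pad_value] * (max_len - len(review)) + list(review)


short_review = [11, 22, 33]           # made-up word indices
long_review = list(range(1, 301))     # 300 made-up word indices

padded = pad_or_truncate(short_review)
cut = pad_or_truncate(long_review)
```

Both results have length 256: padded starts with 253 zeros followed by 11, 22, 33, and cut holds the indices 45 through 300.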
Step 4:
· Splits training data into:
  · Train set (80%)
  · Validation set (20%) → checks performance during training.
· Test set is used only at the end.
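The 80/20 split can be sketched with index lists. This is an illustration only: the notes do not say whether the data is shuffled before splitting, so the seeded shuffle and the helper name train_val_split are assumptions. The count of 25,000 is the size of the Keras IMDB training split.

```python
import random


def train_val_split(n_samples, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) index lists after a seeded shuffle.

    Assumption: shuffling before the split and the seed value are
    illustrative choices, not taken from the notes.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    # First n_val shuffled indices form the validation set, the rest train.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = train_val_split(25_000)
```

With 25,000 training reviews this gives 20,000 for training and 5,000 for validation, and no review appears in both sets.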
Step 5:
· Embedding → converts word indices into dense word vectors.
· SpatialDropout1D → randomly drops whole embedding channels (the same vector dimensions at every word position) during training, which reduces overfitting.
· Bidirectional LSTM → reads the review both forward and backward, capturing meaning from context on either side of a word.
· Pooling layers → reduce the sequence to a fixed-size vector:
  · GlobalMaxPooling1D → picks the strongest value of each feature.
  · GlobalAveragePooling1D → averages each feature over the sequence.
· Dense(64, relu) → hidden layer that learns patterns from the pooled features.
· Dropout(0.4) → reduces overfitting by randomly turning off 40% of the neurons during training.
· Dense(1, sigmoid) → outputs the probability that the review is positive (versus negative).
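The layer list above could be assembled roughly as follows. This is a sketch under assumptions, not the original program: the number of LSTM units (64), the SpatialDropout1D rate (0.2), and the choice to concatenate the two pooled vectors are not stated in the notes. The function name build_model is also made up here.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, EMBEDDING_DIM = 10_000, 256, 128


def build_model(lstm_units=64, spatial_dropout=0.2):
    # Assumption: lstm_units and spatial_dropout are illustrative values.
    inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)(inputs)
    x = layers.SpatialDropout1D(spatial_dropout)(x)
    # return_sequences=True keeps one output per word for the pooling layers.
    x = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    # Assumption: the max-pooled and average-pooled vectors are concatenated.
    pooled = layers.concatenate([
        layers.GlobalMaxPooling1D()(x),
        layers.GlobalAveragePooling1D()(x),
    ])
    x = layers.Dense(64, activation="relu")(pooled)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
```

Calling model.summary() prints the layer shapes; the final output is one probability per review.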
Step 6:
· Adam optimizer → adapts the step size for each weight automatically.
· Binary crossentropy → the standard loss for 2-class problems.
· Metrics:
  · Accuracy → % of correct predictions.
  · AUC → measures how well the model separates the two classes.
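The loss and the two metrics can be worked out by hand on a few predictions. The sketch below uses textbook definitions in plain Python; it is not how Keras computes them internally (Keras estimates AUC differently), and the labels and probabilities are made up.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]             # made-up labels (1 = positive review)
probs = [0.9, 0.2, 0.4, 0.6]      # made-up model outputs

loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
area = auc(labels, probs)
```

For these four predictions the accuracy is 0.5 (two of four are on the right side of 0.5) and the AUC is 0.75 (three of the four positive/negative pairs are ranked correctly). AUC does not depend on the 0.5 threshold, which is why it is reported alongside accuracy.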
Step 7:
· EarlyStopping → stops training when validation performance stops improving (avoids overfitting).
· ModelCheckpoint → saves the best model automatically.
· ReduceLROnPlateau → lowers the learning rate when the monitored metric stops improving.
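The three callbacks might be set up like this. The monitored metric (val_loss), the patience values, the reduction factor, and the file name best_model.keras are all assumptions chosen for illustration; the notes do not give them.

```python
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
)

# Assumption: every argument value below is illustrative, not from the notes.
callbacks = [
    # Stop after 3 epochs without a better val_loss and restore the best weights.
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    # Overwrite the saved file only when val_loss improves.
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
    # Halve the learning rate after 2 epochs without a better val_loss.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-5),
]
```

The list is passed to training through the callbacks argument of model.fit.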
Step 8:
· Model learns patterns from training reviews.
· Validation set checks progress each epoch.
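A training call with a validation set looks roughly like the sketch below. To keep it quick and self-contained it uses random stand-in data and a deliberately tiny model in place of the IMDB reviews and the network from Step 5; the sizes, epochs, and batch size here are placeholders, not the values from Step 2.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Assumption: random word indices and labels stand in for the real reviews.
rng = np.random.default_rng(0)
x_train = rng.integers(1, 1000, size=(64, 20))
y_train = rng.integers(0, 2, size=(64,))
x_val = rng.integers(1, 1000, size=(16, 20))
y_val = rng.integers(0, 2, size=(16,))

# Tiny stand-in model so the example runs in seconds.
model = tf.keras.Sequential([
    layers.Embedding(1000, 8),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)

# Weights are updated on the training data; the validation data is only
# evaluated at the end of each epoch.
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=2,
    batch_size=16,
    verbose=0,
)
```

history.history holds one value per epoch for each metric, for example history.history["val_loss"], which is what the training/validation plots are drawn from.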
Step 9: