Data Science

Text Mining & NLP Final Project

Applied NLP and text mining techniques to extract insights and patterns from unstructured text data.

Problem

Unstructured text data is abundant but underutilized. The challenge was to apply systematic NLP techniques to uncover meaningful patterns in raw text corpora and to classify the documents they contain.

Solution

Implemented a complete text mining pipeline covering preprocessing (tokenization, stopword removal, stemming), feature extraction (TF-IDF, word embeddings), and model training for classification and topic discovery.
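The preprocessing and TF-IDF steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code. The stopword list and the toy corpus are hypothetical. The suffix-stripping `simple_stem` helper is also hypothetical and only stands in for a real stemmer such as NLTK's `PorterStemmer`.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK's English list is far larger).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "it"}

# Hypothetical crude suffix stripper, standing in for a real stemmer such as PorterStemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on runs of letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [simple_stem(t) for t in tokens if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its raw count times the unsmoothed idf log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus.
corpus = [
    "The movie was great and the acting was great",
    "The plot of the movie was dull",
    "Great acting lifts a dull plot",
]
processed = [preprocess(doc) for doc in corpus]
weights = tfidf(processed)
print(processed[0])  # ['movie', 'great', 'act', 'great']
```

Because this idf is unsmoothed, a term that appears in every document receives a weight of zero. scikit-learn's `TfidfVectorizer` instead uses a smoothed idf by default (`smooth_idf=True`) and L2-normalizes each row, so its values will differ from this sketch.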

Impact

Demonstrated an end-to-end NLP workflow, from raw text ingestion to actionable insights, that applies to real-world use cases such as sentiment analysis, document classification, and content recommendation.

Technologies Used

Python, NLTK, Scikit-learn, Pandas, Gensim, Jupyter

About This Project

A graduate-level applied NLP project covering the full text mining pipeline: data collection, preprocessing, tokenization, feature extraction, topic modeling, and classification. Built as the capstone for an Applied Text Mining course at the University of San Diego.
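The project summary does not say which classifier was trained. As one common baseline for the classification stage, a multinomial Naive Bayes model with add-one (Laplace) smoothing can be sketched in plain Python. The `TextNaiveBayes` class, the `tokenize` helper, and the toy sentiment data are hypothetical and for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters (no stopword removal or stemming)."""
    return re.findall(r"[a-z]+", text.lower())


class TextNaiveBayes:
    """Hypothetical multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "TextNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> Counter of word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, text: str, label: str) -> float:
        """Unnormalized log P(label) plus the sum of log P(word | label) over known words."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for word in tokenize(text):
            if word in self.vocab:  # skip words never seen during training
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        """Return the label with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy training data.
train_texts = [
    "great acting and a great plot",
    "wonderful film great fun",
    "dull plot and bad acting",
    "boring and bad film",
]
train_labels = ["pos", "pos", "neg", "neg"]
model = TextNaiveBayes().fit(train_texts, train_labels)
print(model.predict("great fun"))  # pos
```

Scores are summed in log space so that multiplying many small word probabilities does not underflow. In a scikit-learn workflow, the equivalent model is `MultinomialNB` (smoothing `alpha=1.0` by default), typically fed by `CountVectorizer` or `TfidfVectorizer` features.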