Go Sports Go Team!

This project involved building a custom web crawler to collect and structure team-specific NFL news articles. I developed a scraping pipeline that navigates archived news pages, extracts article URLs, retrieves the full content of each article, and stores the results in JSONL format (one JSON record per line) for downstream analysis.
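The link-extraction and storage steps could look roughly like the minimal Python sketch below. It is not the project's actual code. It assumes that article links on an archive page contain a "/news/" path segment and that each stored record has url, title, and body fields. ArticleLinkParser, extract_article_urls, and to_jsonl_line are hypothetical names chosen for illustration, and only the standard library is used.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collects href values from <a> tags that look like article links.

    Hypothetical sketch: the "/news/" path filter is an assumed rule; a real
    archive page would need a filter matched to its own URL scheme.
    """

    def __init__(self, base_url, path_marker="/news/"):
        super().__init__()
        self.base_url = base_url
        self.path_marker = path_marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.path_marker in href:
            # Resolve relative links against the archive page's own URL.
            self.links.append(urljoin(self.base_url, href))


def extract_article_urls(html, base_url):
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    # Keep page order while dropping links repeated on the same page.
    return list(dict.fromkeys(parser.links))


def to_jsonl_line(url, title, body):
    # One JSON object per line; json.dumps escapes newlines inside the body,
    # so each record always occupies exactly one line of the file.
    return json.dumps({"url": url, "title": title, "body": body}, ensure_ascii=False)


if __name__ == "__main__":
    page = '<a href="/news/bears-win">Win</a><a href="/tickets">Tickets</a>'
    print(extract_article_urls(page, "https://example.com/team/archive"))
    # → ['https://example.com/news/bears-win']
```

JSONL suits this kind of pipeline because records can be appended one at a time and read back line by line without loading the whole collection into memory.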

The crawler includes logic for handling broken links, preventing duplicate articles, storing results incrementally, and traversing pages through a queue, so that a single failed request or an interrupted run does not derail the collection. In total, the pipeline successfully ingested approximately 17,000 unique team news articles.
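The four pieces of crawler logic described above (broken-link handling, duplicate prevention, incremental storage, and queue-based traversal) can fit together as in the following Python sketch. It is an illustration under stated assumptions, not the original implementation. fetch_article is a hypothetical callable that returns a dict with the article's fields plus an optional "related" list of further URLs, and raises URLError/HTTPError or ValueError when a link is broken; load_seen_urls and crawl are likewise hypothetical names.

```python
import json
import os
from collections import deque
from urllib.error import HTTPError, URLError


def load_seen_urls(path):
    """Rebuild the duplicate-prevention set from an existing JSONL file."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    seen.add(json.loads(line)["url"])
                except (json.JSONDecodeError, KeyError):
                    # A crash mid-write can leave a truncated last line; skip it.
                    continue
    return seen


def crawl(start_urls, fetch_article, out_path):
    """Breadth-first crawl with dedup, broken-link skipping, and incremental writes.

    Hypothetical sketch: fetch_article(url) is an assumed callable, not a real API.
    Returns (number of articles saved in this run, list of broken URLs).
    """
    seen = load_seen_urls(out_path)
    # dict.fromkeys drops duplicate start URLs while keeping their order.
    queue = deque(dict.fromkeys(u for u in start_urls if u not in seen))
    queued = set(queue)
    saved, broken = 0, []
    with open(out_path, "a", encoding="utf-8") as out:
        while queue:
            url = queue.popleft()
            try:
                record = fetch_article(url)
            except (HTTPError, URLError, ValueError):
                broken.append(url)  # Record the failure and keep crawling.
                continue
            seen.add(url)
            out.write(json.dumps({**record, "url": url}, ensure_ascii=False) + "\n")
            out.flush()  # Persist each article immediately so a crash loses little.
            saved += 1
            for nxt in record.get("related", []):
                if nxt not in seen and nxt not in queued:
                    queued.add(nxt)
                    queue.append(nxt)
    return saved, broken
```

Tracking queued URLs separately from saved ones keeps a link from being enqueued twice before it is fetched, and because the JSONL output doubles as a checkpoint, rerunning the crawler skips everything already stored.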

This project deepened my understanding of web scraping architecture, fault-tolerant data collection, structured document storage, and the preparation of unstructured text data for future NLP and sentiment analysis workflows.