First Project: Yelp Restaurant Review Scraper + NLP
- Gihani Dissanayake
- Aug 28, 2020
- 1 min read
I've been cooking a lot during the pandemic, but it's also nice to try some new restaurants these days. The challenge is deciding which one to order from, since there are so many great restaurants out there. So I thought it would be interesting to build a Yelp scraper and do some NLP analysis (such as sentiment analysis, TF-IDF, and topic modeling) on a restaurant I'd be interested in trying. It would be an opportunity to hear what people really think, especially as many restaurants have changed their menus, service formats, etc. (though I'm only interested in curbside or to-go options).


Above is the work I've done so far. It scrapes the Yelp reviews for a restaurant called Fonda San Miguel, converts the 20 reviews from the first page into a pandas DataFrame, and then starts using the NLTK library in Python to tokenize the text and remove stop words.
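For anyone curious what the tokenizing and stop word step looks like, here is a minimal sketch of the idea. It is not my actual notebook code: the two reviews are made-up examples rather than scraped Yelp text, and the regex tokenizer and the small hand-written `STOP_WORDS` set are stand-ins for NLTK's tokenizer and its English stop word list, so that the snippet runs with only the Python standard library.

```python
import re

# Hypothetical sample reviews standing in for scraped text (not real Yelp data).
reviews = [
    "The enchiladas were amazing and the service was friendly!",
    "Curbside pickup was quick, but the salsa was not very spicy.",
]

# Small hand-written stop word set, used here as a stand-in for
# NLTK's much longer English stop word list.
STOP_WORDS = {
    "the", "and", "was", "were", "but", "not", "very",
    "a", "an", "is", "it", "to", "of",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token not in stop_words]


# Tokenize each review, then filter out the stop words.
cleaned = [remove_stop_words(tokenize(review)) for review in reviews]

for tokens in cleaned:
    print(tokens)
```

One thing this toy version makes visible: dropping "not" turns "not very spicy" into just "spicy", so the stop word list probably needs some care before any sentiment analysis.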
I wanted to share my work thus far, even though I have many future steps planned, especially in the NLP space. This was my first time building a scraper from scratch, so I'm looking forward to trying it on different URLs within Yelp later, and hopefully making it more versatile. I know there is a Yelp API, but I didn't investigate it much for a variety of reasons.
Once this project is completed (or at least much closer to a usable tool), I plan on documenting it on my GitHub, but for now it remains on my local machine.
