M.S. Data Science @ Columbia University
Email: tsaiester@gmail.com
For this project, the student team (Aditya Agrawal, Ester Tsai, Kelly Park, Sukanya Krishna) was given many large datasets of Amazon product reviews, which contain details such as the review title, review text body, star rating, number of “helpful” votes, and the time the review was posted.
Verified = product was bought on Amazon
Unverified = product was not bought on Amazon, but might have been bought from a third-party seller
We sought to answer the question: How well can we predict whether a review comes from a verified purchase or not?
The team used the US Amazon Customer Reviews datasets from Amazon archives. The reviews span the years 2014 to 2015.
The collection of reviews is organized into sub-datasets, one per product category. It contains datasets for 46 different product categories, and each dataset has 15 features per review.
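Before modeling, it helps to know how many reviews in a category file are verified, since that share sets the accuracy a trivial "always predict verified" baseline would reach. The sketch below is a minimal illustration, not the team's actual pipeline. It assumes each category file is tab-separated with a header row, and that the 15 column names follow the public Amazon Customer Reviews layout, with `verified_purchase` holding `Y` or `N`. The function `verified_share`, the helper `make_row`, and the sample rows are hypothetical and invented for illustration.

```python
import csv
import io

# Assumed column names (public Amazon Customer Reviews TSV layout);
# the write-up itself only states that each dataset has 15 features.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]


def verified_share(tsv_file):
    """Return (number of reviews, fraction marked verified) for one file."""
    # QUOTE_NONE: treat quote characters in review text as ordinary text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    verified = 0
    for row in reader:
        total += 1
        if row["verified_purchase"] == "Y":  # assumed encoding: "Y" / "N"
            verified += 1
    return total, (verified / total if total else 0.0)


def make_row(review_id, verified):
    """Build one toy TSV row; every other field is left empty."""
    values = dict.fromkeys(COLUMNS, "")
    values.update(review_id=review_id, verified_purchase=verified)
    return "\t".join(values[name] for name in COLUMNS)


# Toy data invented for illustration only; these are not real reviews.
sample = "\n".join([
    "\t".join(COLUMNS),
    make_row("R1", "Y"),
    make_row("R2", "N"),
    make_row("R3", "Y"),
    make_row("R4", "Y"),
])

total, share = verified_share(io.StringIO(sample))
print(f"{total} reviews, {share:.0%} verified")
```

For a real category file, the same function could be called on an open file handle, for example `verified_share(open(path, encoding="utf-8", newline=""))`, where `path` points to one of the category datasets.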
EXAMPLE UNVERIFIED REVIEW:
What the Features Mean: