Ester Tsai


M.S. Data Science @ Columbia University
B.S. Data Science @ UC San Diego

Email: tsaiester@gmail.com

View My LinkedIn Profile

View My GitHub Profile

Amazon Fake Review Machine Learning Classification

Project Overview

For this project, the student team (Aditya Agrawal, Ester Tsai, Kelly Park, Sukanya Krishna) was given many large datasets of Amazon product reviews. Each review includes details such as the title, the text body, the star rating, the number of “helpful” votes, and the time it was posted.

Verified = the product was bought on Amazon

Unverified = the product was not bought on Amazon, but may have been bought from a third-party seller

We sought to answer the question: How well can we predict whether a review comes from a verified purchase or not?

Dataset Description

The team used the US Amazon Customer Reviews datasets from the Amazon archives. The reviews span the years 2014 to 2015.

The collection is organized into sub-datasets, one per product category. It covers 46 product categories, and each dataset contains 15 features per review.
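As a rough sketch of how one category file could be read and turned into a labeled dataset (this is not the team's actual code): the example below assumes the files are tab-separated, that the 15 columns carry the names listed in `COLUMNS`, and that verified purchases are marked `Y` in a `verified_purchase` column. None of these details are stated on this page, and the two sample rows are made up for illustration.

```python
import csv
import io

# Assumed names for the 15 per-review features; the page does not list them.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]


def load_reviews(tsv_text):
    """Parse tab-separated review rows and attach a binary target label."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        # Target: 1 = verified purchase, 0 = unverified (assumed "Y"/"N" coding).
        row["label"] = 1 if row["verified_purchase"] == "Y" else 0
        row["star_rating"] = int(row["star_rating"])
        rows.append(row)
    return rows


# Two made-up rows, only to show the expected shape of the input.
sample_rows = [
    ["US", "1", "R1", "P1", "10", "Phone Case", "Wireless",
     "5", "2", "3", "N", "Y", "Great", "Fits well", "2015-01-05"],
    ["US", "2", "R2", "P1", "10", "Phone Case", "Wireless",
     "1", "0", "1", "N", "N", "Bad", "Broke fast", "2014-11-20"],
]
sample_tsv = "\n".join("\t".join(r) for r in [COLUMNS] + sample_rows)

reviews = load_reviews(sample_tsv)
print([(r["review_id"], r["label"]) for r in reviews])
```

With the label in place, the question above becomes a standard binary classification problem over the remaining review features.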

Analysis

KNN Classification

Random Forest Classification + Bigrams
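For readers unfamiliar with bigrams, here is a minimal sketch of how bigram counts can be extracted from review text before being passed to a classifier. The tokenization rule and the helper names `bigrams` and `bigram_counts` are assumptions made for illustration; the team's own feature pipeline and model settings are not shown on this page.

```python
import re
from collections import Counter


def bigrams(text):
    """Return adjacent word pairs from lowercased text."""
    # Simple tokenizer (an assumption): runs of letters, digits, or apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def bigram_counts(text):
    """Count how often each bigram occurs in one review."""
    return Counter(bigrams(text))


print(bigrams("Great phone, great price"))
print(bigram_counts("very good very good").most_common(1))
```

Counts like these, collected over a fixed vocabulary of bigrams, form one numeric feature vector per review, which is the kind of input a random forest classifier can be trained on.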

EXAMPLE UNVERIFIED REVIEW:

What the Features Mean:

Jupyter Notebook Demo

Streamlit Demo

Results

Final Poster Presentation