The abstract states the study's objective: classifying Amazon product reviews as negative, neutral, or positive in sentiment.
The background section describes Amazon as a comprehensive online retailer and web service provider. Amazon offers over 12 million items, a figure that grows to more than 350 million products when Marketplace sellers are included. Because manually classifying such a volume of customer reviews is impractical, the section proposes machine learning as a more efficient alternative.
The dataset used in the study is titled Amazon Reviews for Sentiment Analysis and is available on Kaggle.
It contains 12 columns and 4,915 rows. The columns are:
index: The row number.
reviewerName: The name of the user who submitted the review.
overall: The overall product rating.
reviewText: The text of the review.
reviewTime: The time at which the review was submitted.
day_diff: The number of days since the review was posted.
helpful_yes: The count of users who found the review useful.
helpful_no: The count of users who did not find the review useful.
total_vote: The total number of votes the review received.
score_pos_neg_diff: A score indicating the difference between positive and negative assessments.
score_average_rating: The average rating score.
wilson_lower_bound: The lower bound of the Wilson score confidence interval, which is a statistical measure used to sort products based on their positive and negative ratings.
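The Wilson lower bound described above can be sketched in a few lines. This is a minimal illustration, not the dataset author's code: it assumes the column is derived from the helpful_yes and helpful_no counts and uses a 95% confidence level, neither of which the source confirms. The function name and the confidence parameter are chosen here for illustration.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(helpful_yes: int, helpful_no: int,
                       confidence: float = 0.95) -> float:
    """Lower bound of the Wilson score interval for the share of positive votes.

    Assumption: positive votes are helpful_yes and negative votes are
    helpful_no. The source does not state which counts feed the column.
    """
    n = helpful_yes + helpful_no
    if n == 0:
        # No votes at all: nothing to estimate, so rank the item last.
        return 0.0

    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = helpful_yes / n  # observed share of positive votes

    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


if __name__ == "__main__":
    # A single positive vote versus many mostly positive votes.
    print(round(wilson_lower_bound(1, 0), 3))
    print(round(wilson_lower_bound(100, 10), 3))
```

With a single positive vote the bound is only about 0.21, while 100 positive and 10 negative votes score well above 0.8. Sorting by this value therefore favors items whose positive share is backed by many votes over items with a perfect share from very few.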