Chocolate Bars Data Analysis
Contents
About Dataset
This dataset contains expert ratings of 1700 unique chocolate bars, along with information on their regional origin, cocoa percentage, the variety of chocolate bean used, and the areas where the beans were grown. The dataset is available on Kaggle.
The cacao rating system is as follows:
- 4.0 - 5.0 = Outstanding
- 3.5 - 3.9 = Highly Recommended
- 3.0 - 3.49 = Recommended
- 2.0 - 2.9 = Disappointing
- 1.0 - 1.9 = Unpleasant
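The rating bands listed here could be expressed as a small lookup function. This is only a sketch: the function name `rating_category` is hypothetical, and because the published bands leave small gaps (for example between 3.9 and 4.0), it assumes a rating belongs to the highest band whose lower bound it reaches.

```python
def rating_category(rating: float) -> str:
    """Map a numeric rating to its label using each band's lower bound.

    Assumption: values that fall in the gaps between the published bands
    (e.g. 3.95) are assigned to the band whose lower bound they meet.
    """
    if not 1.0 <= rating <= 5.0:
        raise ValueError(f"rating out of range: {rating}")

    # Bands ordered from highest to lowest lower bound.
    bands = [
        (4.0, "Outstanding"),
        (3.5, "Highly Recommended"),
        (3.0, "Recommended"),
        (2.0, "Disappointing"),
        (1.0, "Unpleasant"),
    ]
    for lower, label in bands:
        if rating >= lower:
            return label
    raise AssertionError("unreachable: rating was range-checked above")


print(rating_category(3.75))
```

Checking lower bounds only keeps the mapping total over the 1.0 to 5.0 range, so no rating is left unlabeled.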
Methodology 🧰
I used Excel to explore the data and check for issues or missing values, and Tableau for EDA and dashboarding.
Data Cleaning 🧹
The dataset came as a CSV file, which I imported into Excel to check for errors and missing values. I followed these steps:
- Removed the extra columns()
- Checked for blank cells and found none. (The columns second_taste, third_taste, and fourth_taste have some blank cells, but these are an exception: I assumed a blank there means the bar has no second, third, or fourth taste.)
- The processed dataset now has 29 columns and 2224 rows, describing different chocolate bars with their qualities, origins, and ratings.
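The blank-cell check was done in Excel, but the same idea can be sketched in Python for anyone who prefers a script. This is an illustration, not the method used here: the helper name `count_blank_cells` is hypothetical, the `rating` and `first_taste` column names are assumptions, and the sample rows are made up rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_blank_cells(csv_file) -> Counter:
    """Count empty or whitespace-only cells per column of a CSV file object."""
    blanks = Counter()
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row; not tied to a column.
                continue
            if value is None or not value.strip():
                blanks[column] += 1
    return blanks


# Made-up rows for illustration only; not taken from the real dataset.
sample = io.StringIO(
    "rating,first_taste,second_taste,third_taste\n"
    "3.75,cocoa,fruity,\n"
    "3.00,nutty,,\n"
)
print(count_blank_cells(sample))
```

With a real file, the same function could be called on `open("path/to/file.csv", newline="")`. Columns such as second_taste that report blanks would then be reviewed by hand, as in the exception noted above.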
Findings & Insights ✨
Dashboard Link:
The insights covered are:
- Highly rated chocolate brands
- Highly recommended chocolate brands
- Regions producing highly rated chocolate brands
- First Taste in Chocolate bars
- Percentage composition of the number of ingredients, lecithin, sugar, and vanilla in highly rated chocolate bars