Data Analysis & Visualization of 77 Breakfast Cereals
Contents
Introduction
Cereals are consumed in most parts of the world as a breakfast. Before oats and flakes became popular barley & corn grits were mainly consumed as a breakfast meal. Gradually several companies came out with their own version of breakfast cereals and soon the customers realized the need to make a healthy choice. In this project I will be doing an analysis on the cereal data in order to find answers to questions such as:
- What all nutrients are we getting in a bowl of cereal ?
- Which cereals hold maximum calorific value ?
- Which cereals offer maximum fiber ?
- Which cereals are kept at top, middle & lower shelf ?
- Is there a relationship that exists between fiber & calories ?
- Is there a relationship between fats & calories ?
Objective
To do a Exploratory data analysis on Excel & making a dashboard presenting different cereals with their composition and their placements in the shelf.
Data
The data has been downloaded from this Source. The dataset is in the form of csv file which contains nutritional information of 77 different kinds of cereals manufactured by 7 different companies namely American Home Food Products, General Mills, Kelloggs, Nabisco, Post, Quaker Oats and Ralston Purina.
Methodologies
I will be using Spreadsheet software Microsoft Excel & the in-built tools to apply various techniques to clean data(if required), explore data and visualize it.
Data Importing
The dataset is a multivariate dataset containing 77 rows & 16 columns. The mfr column represents different manufacturers(A = American Home Food Products; G = General Mills; K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina) & types(C=Cold, H=Hot).
Data Cleaning
Before doing any modifications in the data, it’s important to make a copy of it to ensure data safety and restoration if anything goes wrong in the process. I will do below operations on the duplicate sheet. 1. Dropping certain columns which I will not be using for analysis. 2. Checking if there are any blank cells. (As per the information provided by the source, -1 values represent not available so I will be converting them to 0)
Data Exploration
Data has measurements in different units. Except Sodium & Potassium, all values were in grams. Vitamins represent mineral & vitamin content as per the FDA recommendation in percentage. Type of cereal is nominal variable & shelf is ordinal variable. I have converted the measurements of sodium & potassium from milligrams to grams for data integrity. Now for data exploration phase, I will plot graphs & charts to show composition of different cereals.
Basic Exploration
Cereals placement on shelf along with their fiber content
-Majority of cereals(96%) in this dataset are of cold type. -More than 50% of the cereals by the store are kept on the top shelf which constitute cereals having maximum fiber content per serving. -Cereals manufactured by Kelloggs & Nabisco hold most positions in the Top 10 highly rated cereals.
Bivariate Analysis
- The correlation between calories & Fats is a weak positive correlation.
- The correlation between calories & Fibers is a weak negative correlation.
- We can say that more calorie content in cereals doesn’t equate to more fats or more fibers.
Data Communication
- More than 90 % Cereals are of cold type.
- The store has a done a good placement of cereals on the shelves. The cereals having maximum fiber content are kept on top shelf which could help buyer make a conscious choice.
- As per the Recommended Daily Intake, dietary fibers are a categorized under good nutrients to take more for better health. However there are some cereals having zero fiber content in the store kept on the last shelf.
- Diets high in calories from added sugars can make it difficult to meet daily recommended levels of important nutrients while staying within calorie limits. Hence the best choice here would be cereals having zero sugar content.
- No relationship found between calories and fibers implies that amount of calories in cereals available in the store doesn’t affect the fiber content. Similarly no relationship between calories and fats was found.
- Most of the highly rated cereals are among the ones having zero sugar content. This means people prefer cereals without added sugar.