Data Analysis & Visualization of 77 Breakfast Cereals

Contents


Introduction

Cereals are consumed in most parts of the world as a breakfast. Before oats and flakes became popular barley & corn grits were mainly consumed as a breakfast meal. Gradually several companies came out with their own version of breakfast cereals and soon the customers realized the need to make a healthy choice. In this project I will be doing an analysis on the cereal data in order to find answers to questions such as:

Objective

To do a Exploratory data analysis on Excel & making a dashboard presenting different cereals with their composition and their placements in the shelf.

Data

The data has been downloaded from this Source. The dataset is in the form of csv file which contains nutritional information of 77 different kinds of cereals manufactured by 7 different companies namely American Home Food Products, General Mills, Kelloggs, Nabisco, Post, Quaker Oats and Ralston Purina.

Methodologies

I will be using Spreadsheet software Microsoft Excel & the in-built tools to apply various techniques to clean data(if required), explore data and visualize it.

Data Importing

The dataset is a multivariate dataset containing 77 rows & 16 columns. The mfr column represents different manufacturers(A = American Home Food Products; G = General Mills; K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina) & types(C=Cold, H=Hot).

Data Cleaning

Before doing any modifications in the data, it’s important to make a copy of it to ensure data safety and restoration if anything goes wrong in the process. I will do below operations on the duplicate sheet. 1. Dropping certain columns which I will not be using for analysis. 2. Checking if there are any blank cells. (As per the information provided by the source, -1 values represent not available so I will be converting them to 0)

Data Exploration

Data has measurements in different units. Except Sodium & Potassium, all values were in grams. Vitamins represent mineral & vitamin content as per the FDA recommendation in percentage. Type of cereal is nominal variable & shelf is ordinal variable. I have converted the measurements of sodium & potassium from milligrams to grams for data integrity. Now for data exploration phase, I will plot graphs & charts to show composition of different cereals.

Step by step process from Cleaning to Exploration.

Basic Exploration

Types of cereals in the dataset.

Shows cereals placed at different shelves differ in fiber content.

Cereals placement on shelf along with their fiber content

-Majority of cereals(96%) in this dataset are of cold type. -More than 50% of the cereals by the store are kept on the top shelf which constitute cereals having maximum fiber content per serving. -Cereals manufactured by Kelloggs & Nabisco hold most positions in the Top 10 highly rated cereals.

Bivariate Analysis

Scatter plot between fats and calories on left; fibers & calories on the right.
  1. The correlation between calories & Fats is a weak positive correlation.
  1. The correlation between calories & Fibers is a weak negative correlation.
  1. We can say that more calorie content in cereals doesn’t equate to more fats or more fibers.

Data Communication

Glimpse of Excel Dashboard that shows important information about different cereals.