18. Project 1 - Exploratory Data Analysis
In this project you will perform an exploratory data analysis (EDA) using visualizations and correlations. You may choose one of the datasets in the class shared data folder, or search for a dataset that interests you! Kaggle is a good place to start, as it often has relatively clean and easy-to-use datasets, but feel free to explore other places. There is a lot of data out there!
In this project you will:
Choose and download a dataset
Get summary statistics for key variables
Create visuals to help understand your data
Use correlation to measure relationships between key variables
Summarise how EDA helped (or not!) in understanding your dataset
Import Python libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
18.1. Data
Action: Import your data into Colaboratory.
Action: Determine the types of data you are dealing with. Marks (0.5)
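As a rough sketch of these two steps, the example below reads a small CSV into a DataFrame and lists the type of each column. The CSV text and its column names are made up for illustration; with your own dataset you would pass the path of your uploaded file to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for an uploaded file. With a real dataset you would
# pass a file path instead, e.g. pd.read_csv("your_file.csv").
csv_text = """species,height_cm,weight_kg,tagged
oak,310.5,120.0,True
pine,420.0,98.5,False
birch,275.2,60.1,True
"""
df = pd.read_csv(io.StringIO(csv_text))

# First few rows: a quick check that the data loaded as expected
print(df.head())

# One type per column: numbers, text, booleans, dates, ...
print(df.dtypes)

# Names of the numeric columns only
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Columns that pandas reads as text are usually categorical variables, while numeric columns may be continuous or discrete, so it is worth checking each one against what the variable actually measures.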
18.2. Summary statistics
Action: Estimate the summary statistics of some of the key variables, and describe what you find. Marks (1)
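One possible way to approach this, shown on a made-up toy table (the column names and values are hypothetical): `describe()` gives the count, mean, standard deviation, and quartiles of a numeric column, and `value_counts()` gives the frequency of each category.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the calls
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "group": ["a", "b", "a", "b", "a"],
})

# Count, mean, std, min, quartiles, and max for a numeric variable
summary = df["height_cm"].describe()
print(summary)

# The median is the 50% row of describe(); it can also be computed directly
print(df["height_cm"].median())

# For a categorical variable, count how often each value occurs
group_counts = df["group"].value_counts()
print(group_counts)
```

Comparing the mean with the median is a quick check for skew: if they differ a lot, a few extreme values may be pulling the mean.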
18.3. Visualize the data
Action: Visualize the distribution of values for some key variables. Marks (2)
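A histogram is one common choice for showing the distribution of a numeric variable. The sketch below uses made-up values and a hypothetical column name; the number of bins is an arbitrary choice that you would adjust for your own data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, used only to illustrate the plot
df = pd.DataFrame(
    {"height_cm": [150, 155, 160, 160, 165, 170, 170, 170, 180, 195]}
)

fig, ax = plt.subplots()

# Five bins is an arbitrary choice; try a few values with your own data
ax.hist(df["height_cm"], bins=5, edgecolor="black")

# Label the axes, including units, and give the plot a title
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Count")
ax.set_title("Distribution of height")

# In Colab the figure is displayed when the cell finishes running
```

For a categorical variable, a bar chart of `value_counts()` plays the same role.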
Q1: Explain your choice of plots using the five visualization components: Marks (2.5)
Data component – what kinds of data are you dealing with?
Graphical component – what kinds of plot can you use?
Label component – what should be on the plot axes?
Esthetic component – what should your plot say, and how best to do this?
Ethical component – is the graph misleading? What is left out?
18.4. Correlations
Action: Use correlation to estimate the relationship between some of the key variables. Try exploring for interesting relationships using heatmaps. Marks (1)
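A minimal sketch of a correlation heatmap, assuming a small made-up table (the variables and values are hypothetical). It keeps only the numeric columns, computes pairwise Pearson correlations (the pandas default), and draws them with seaborn.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only to illustrate the calls
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 79],
    "hours_slept": [9, 8, 8, 6, 5],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

fig, ax = plt.subplots()

# Fix the colour scale to [-1, 1] so colours are comparable between plots,
# and print each coefficient inside its cell
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation matrix")

# In Colab the figure is displayed when the cell finishes running
```

The diagonal is always 1, because every variable is perfectly correlated with itself, so the interesting cells are the off-diagonal ones.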
Q2: Choose one or two correlations and describe what the magnitude and direction of each correlation suggest about the relationship between the two variables. Marks (2)
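To make "magnitude" and "direction" concrete, the sketch below computes a single Pearson correlation on made-up data (hypothetical variable names and values), once with pandas and once by hand. The sign of r gives the direction, and its absolute value, between 0 and 1, gives the strength of the linear relationship.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the calculation
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "hours_slept": [9, 8, 8, 6, 5],
})
x = df["hours_studied"]
y = df["hours_slept"]

# Pearson correlation coefficient as computed by pandas
r = x.corr(y)

# The same value by hand: the summed products of the deviations from the
# means, divided by the square root of the product of the summed squares
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / ((dx**2).sum() * (dy**2).sum()) ** 0.5

direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f}, direction: {direction}, magnitude: {abs(r):.2f}")
```

Here r is negative and close to -1, so in this toy table more hours studied go with fewer hours slept. A strong correlation describes how two variables move together; on its own it does not show that one causes the other.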
18.5. Discussion
Q3: Did this exploratory data analysis help you better understand your chosen dataset? If so, how? Are there still parts that don’t make sense? Marks (1)
The aim of this question is not to test whether you know everything about the dataset, but to reflect on how EDA might have helped (or not!).