25. Project 2 - Making Predictions

In this project you will build a model to make predictions, building on your exploratory data analysis (EDA) skills. You may reuse the datasets from Project 1 or move to another dataset.

In this project you will:

  1. Develop an understanding of the dataset

  2. Do exploratory data analysis and visualization

  3. Do some data preprocessing

  4. Build a predictive model

  5. Measure the performance of your model

  6. Summarize and interpret your results

Action: Import Python libraries.

25.1. Data Understanding

Action: Import your data into Colaboratory.
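A minimal sketch of loading a CSV into a pandas DataFrame. The column names and values are hypothetical, and an in-memory buffer stands in for your own file path or URL so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; in Colab you would normally pass a file path
# or URL to pd.read_csv instead of this in-memory buffer.
csv_text = """age,income,owns_home
34,52000,yes
45,61000,no
29,,yes
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

The empty `income` field in the last row is read as a missing value (`NaN`).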

Action: Determine what types of data you are dealing with and handle missing data (if there is any!). Marks: 0.5
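One possible way to inspect data types and handle missing values is sketched below. The DataFrame is hypothetical, and median imputation plus row dropping is only one of several reasonable strategies; the right choice depends on your dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing numeric value and one missing category
df = pd.DataFrame({
    "age": [34, 45, 29, 51],
    "income": [52000.0, 61000.0, np.nan, 58000.0],
    "owns_home": ["yes", "no", "yes", None],
})

# Data type of each column
print(df.dtypes)

# Number of missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Assumed strategy: fill numeric gaps with the column median,
# and drop rows where the categorical value is missing.
df_clean = df.copy()
df_clean["income"] = df_clean["income"].fillna(df_clean["income"].median())
df_clean = df_clean.dropna(subset=["owns_home"])
```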

Action: Estimate the summary statistics of some of the key variables. Marks: 0.5
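As a rough illustration, `DataFrame.describe()` reports the count, mean, standard deviation, and quartiles of numeric columns in one call. The columns used here are hypothetical.

```python
import pandas as pd

# Hypothetical key variables
df = pd.DataFrame({
    "age": [34, 45, 29, 51],
    "income": [52000, 61000, 47000, 58000],
})

# Count, mean, std, min, quartiles, and max for each selected column
summary = df[["age", "income"]].describe()
print(summary)
```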

25.2. Data Exploration and Visualization

Action: Visualize (1) the distribution of values for some key variables, and (2) the relationships between key variables. Remember to add text that walks a reader through what you found. Marks: 2
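The sketch below shows one way to draw both kinds of plot with matplotlib: a histogram for a single variable's distribution and a scatter plot for the relationship between two variables. The data and column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "age": [23, 29, 34, 38, 45, 51, 56, 62],
    "income": [31000, 42000, 52000, 55000, 61000, 58000, 64000, 60000],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# (1) Distribution of a single variable
ax_hist.hist(df["income"], bins=5)
ax_hist.set_xlabel("income")
ax_hist.set_ylabel("count")
ax_hist.set_title("Distribution of income")

# (2) Relationship between two variables
ax_scatter.scatter(df["age"], df["income"])
ax_scatter.set_xlabel("age")
ax_scatter.set_ylabel("income")
ax_scatter.set_title("Income vs. age")

fig.tight_layout()
plt.show()
```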

Action: Use correlation to estimate the relationship between some of the key variables. Remember to add text that helps a reader interpret the correlations. Marks: 1
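A minimal sketch of a correlation matrix, assuming all columns are numeric. `DataFrame.corr()` computes Pearson correlations by default; the variables and values are hypothetical.

```python
import pandas as pd

# Hypothetical numeric variables
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_gaming": [9, 7, 8, 5, 4, 2],
    "exam_score": [52, 55, 61, 64, 70, 75],
})

# Pairwise Pearson correlations, each between -1 and 1
corr = df.corr()
print(corr.round(2))
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one. Correlation alone does not show that one variable causes the other.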

25.3. Data preprocessing

Action: Do you need to apply any preprocessing steps? For example, you might convert a binary variable to 1/0, or use one-hot encoding to convert categorical variables. Apply at least one preprocessing step, and explain why you used it. Marks: 2
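Both steps mentioned above can be sketched with pandas. The column names and category labels are hypothetical. Dropping the first dummy column is an optional choice that avoids perfectly collinear columns in linear models.

```python
import pandas as pd

# Hypothetical data with one binary and one multi-category variable
df = pd.DataFrame({
    "owns_home": ["yes", "no", "yes", "no"],
    "region": ["north", "south", "east", "north"],
    "income": [52000, 61000, 47000, 58000],
})

# Convert the binary variable to 1/0
df["owns_home"] = df["owns_home"].map({"yes": 1, "no": 0})

# One-hot encode the categorical variable; the first category ("east")
# is dropped and becomes the reference level.
df = pd.get_dummies(df, columns=["region"], drop_first=True, dtype=int)
print(df)
```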

Action: Split your data into training and testing datasets. Marks: 1
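A minimal sketch of a train/test split with scikit-learn. The data are synthetic, and the 80/20 split and `random_state=42` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features (100 rows, 3 columns) and target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Hold out 20% of the rows for testing; fixing random_state makes
# the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```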

Action: (optional) Scale any numeric variables. If you have no binary or categorical variables that need transforming, scaling will count towards your marks for your preprocessing step.
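If you do scale, one common approach is standardization with scikit-learn's `StandardScaler`, sketched below on synthetic data. The scaler is fitted on the training data only and then applied to the testing data, so that no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features with a non-zero mean and non-unit spread
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics to transform the testing data
X_test_scaled = scaler.transform(X_test)
```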

25.4. Build a model

Action: Use your training dataset to build a model with the goal of predicting a target variable. Marks: 2
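As one illustration, the sketch below fits a linear regression to synthetic data. It assumes a numeric target; for a categorical target, a classifier such as `LogisticRegression` would be a more natural choice. The true coefficients (3 and -2) are chosen only to make the example easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends linearly on two features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model on the training data only
model = LinearRegression()
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)
```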

25.5. Measure performance

Action: Use your testing dataset to estimate the performance of your model. Add text describing what kind of measure you used. Marks: 2
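For a regression model, two commonly used measures are R² and root mean squared error (RMSE), sketched below on synthetic data. For a classification model, measures such as accuracy or a confusion matrix would apply instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out testing data
y_pred = model.predict(X_test)

# R^2: proportion of variance in the target explained by the model
r2 = r2_score(y_test, y_pred)
# RMSE: typical size of a prediction error, in the target's own units
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```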

25.6. Discussion and interpretation

Q1:

What have you learnt about the ability to model and predict your variable of interest? Marks: 1

Which variables are responsible for the predictive ability of your model, and what does your model suggest about the relationships these variables have with your target variable (i.e., consider the magnitude and sign of each effect)? Marks: 2
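For a linear model, the fitted coefficients give the sign and magnitude of each effect. The sketch below labels them with feature names. The data are synthetic and the feature names are hypothetical. Coefficient magnitudes are only directly comparable when the features are on similar scales.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with hypothetical feature names
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "hours_studied": rng.normal(size=200),
    "hours_gaming": rng.normal(size=200),
})
y = 3.0 * X["hours_studied"] - 2.0 * X["hours_gaming"] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)

# Pair each coefficient with its feature name, largest magnitude first
coefs = pd.Series(model.coef_, index=X.columns)
coefs = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(coefs)
```

A positive coefficient means the prediction rises as that variable increases with the others held fixed, and a negative coefficient means it falls.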

How well did these relationships generalize to the withheld sample (i.e., the testing data)? Marks: 1
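One simple way to judge generalization is to compare the same score on the training and testing data, as sketched below with synthetic data and a linear regression. A test score far below the training score suggests overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a known linear relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# For regressors, .score() returns R^2
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```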