
29. Project 3 - Answering Questions

In this project you will use what you have learned in the class to answer a question. Imagine you are part of a team and have been assigned a question to answer. Build a notebook that you could share with your team and that shows what you found.

In this project you will:

  1. Introduce your question of interest

  2. Make sure your reader knows enough about the data

  3. Wrangle and preprocess your data so that a teammate can reproduce your work

  4. Build and test a model that can provide insight into your question

  5. Interpret your model results

  6. Answer your question

Note: Marks will be awarded for clarity, organization, and succinctness. That is, point out only the important parts, in a well-organized, easy-to-follow manner. (Marks: 3)

29.1. Ask a question

Q1: Lay out your question of interest. Remember to state your question as clearly and simply as you can (Marks: 1), and state what your ideal outcome would be (Marks: 1).

Q2: Identify and describe the data sources you’ll use (Marks: 1). Make sure to discuss at least one of the following: data accuracy, reliability, validity, or sample selection. (Marks: 1)

Q3: Lay out what kind of ML problem you are facing and what kind of model you’ll use to answer it (e.g., is it unsupervised or supervised learning, and is it classification or regression?). Make sure to say why. (Marks: 1)

29.2. Data understanding, exploration, and visualization

Action: Above you gave an overview of the dataset(s) you will use; here, make sure the reader understands the important details of the data. E.g., show a figure or descriptive statistic and explain why the reader should know about it, i.e., how it will help them understand your analysis. (Marks: 2)
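As an illustration only, a first look at a dataset might resemble the sketch below. The DataFrame, its column names (`hours_studied`, `passed`), and the injected missing values are hypothetical stand-ins for your own data; the pandas calls (`describe`, `isna`, `value_counts`) are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your project dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, size=100),
    "passed": rng.integers(0, 2, size=100),
})
# Inject a few missing values so the missing-data check has something to report.
df.loc[[3, 17, 42], "hours_studied"] = np.nan

summary = df.describe()  # count, mean, std, and quartiles per numeric column
missing = df.isna().sum()  # number of missing values per column
class_balance = df["passed"].value_counts(normalize=True)  # share of each class

print(summary)
print(missing)
print(class_balance)
```

Whichever statistics you show, follow them with a sentence on why they matter, e.g., missing values motivate an imputation step, and class imbalance affects which performance metric is appropriate.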

29.3. Data wrangling and preprocessing

Action: Lay out all your data wrangling and preprocessing steps so that a reader will understand why you took each step and would be able to reproduce your steps. (Marks: 3)
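One way to keep preprocessing reproducible is to express it as a scikit-learn pipeline, so every step is named and can be rerun in order. The sketch below is a minimal example under assumptions: the toy data and column names (`hours_studied`, `program`, `passed`) are hypothetical, and median imputation, standardization, and one-hot encoding are just example choices, not required steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; replace with your own dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, size=100),
    "program": rng.choice(["arts", "science"], size=100),
    "passed": rng.integers(0, 2, size=100),
})
df.loc[[3, 17, 42], "hours_studied"] = np.nan

X = df[["hours_studied", "program"]]
y = df["passed"]

# Split before fitting any preprocessing so the test set stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["hours_studied"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["program"]),
])

# Fit on the training data only, then apply the same transformation to the test data.
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)

print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the transformations on the training split only avoids leaking information from the test set into the model, and a fixed `random_state` lets a teammate reproduce the same split.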

29.4. Build and test a model

Action: Use your training dataset to build a model with the goal of addressing your question of interest. (Marks: 2)

Q4: Measure the performance of your model, and describe how well your model generalizes to new data. (Marks: 2)
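A common way to estimate generalization is to combine cross-validation on the training set with a single final check on a held-out test set. The sketch below assumes a classification problem and uses synthetic data from `make_classification`; the choice of logistic regression and of accuracy as the metric are illustrative assumptions, not requirements of the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data standing in for your prepared features and labels.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then score once on the held-out test set.
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
```

A large gap between training and test accuracy suggests overfitting, while similar cross-validation and test scores suggest the estimate of generalization is reasonably stable.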

29.5. Interpret your model

Q5: Interpret your model results. E.g., which features contributed to your predictions? If possible, determine the sign and magnitude of each effect. (Marks: 2)
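For a linear model, the coefficients give the sign and magnitude of each feature's effect directly; permutation importance is a model-agnostic alternative that measures how much the test score drops when a feature is shuffled. The sketch below is a minimal example on synthetic regression data; the feature names are hypothetical placeholders.

```python
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: only 2 of the 4 features actually influence the target.
X, y = make_regression(
    n_samples=300, n_features=4, n_informative=2, noise=5.0, random_state=0
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Permutation importance: mean drop in R^2 on the test set when a feature is shuffled.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

for name, coef, imp in zip(feature_names, model.coef_, result.importances_mean):
    # The coefficient is the change in the prediction per unit change in the feature.
    print(f"{name}: coefficient = {coef:+.2f}, permutation importance = {imp:.3f}")
```

Coefficients are comparable across features only when the features are on similar scales, so standardizing beforehand (or reporting units alongside each coefficient) makes the magnitudes easier to interpret.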

29.6. Answer your question

Q6: Use your analysis above to answer your question of interest. Did you achieve your desired outcome, and what might the next steps be? Remember to write as though you are writing to teammates working on the same or a similar problem. (Marks: 3)

Note: It is OK if your analysis doesn’t provide a strong answer; you can point out where it failed. At the very least, you can cross the approach you took off the list of possible ways to tackle your question. That is, you still made progress!