28. A/B Testing

Here we look at how to collect and analyze data to determine the difference between two groups. The idea is that randomly assigning individuals to two groups yields comparable groups. If we then measure how each group responds to a treatment (e.g., being given game version A vs. game version B), we can attribute differences in the outcome to the treatment rather than to pre-existing differences between the groups.
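To make the "comparable groups" idea concrete, here is a minimal sketch on simulated users. The `prior_activity` column and the group labels are made up for illustration and are not part of the game dataset. It randomly assigns users to two versions and checks that a pre-existing characteristic ends up with nearly the same average in both groups.

```python
import numpy as np
import pandas as pd

# Simulated users with a pre-existing trait (hypothetical column, not in the real data)
rng = np.random.default_rng(0)
n_users = 10_000
users = pd.DataFrame({
    "userid": np.arange(n_users),
    "prior_activity": rng.poisson(lam=20, size=n_users),
})

# Random assignment: each user is equally likely to get either version
users["version"] = rng.choice(["gate_30", "gate_40"], size=n_users)

# With random assignment the group averages of the trait should be close
group_means = users.groupby("version")["prior_activity"].mean()
print(group_means)
```

Because assignment does not depend on anything about the user, any later difference in retention between the groups is hard to explain by who happened to be in each group.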

We’ll take a look at data collected to test how effective different versions of a game are at retaining users.

#load packages
import pandas as pd
import sklearn as sk
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

Load the data

#load data
df_cats = pd.read_csv("/content/cookie_cats.csv")

#take a look
df_cats.head()

	userid	version	sum_gamerounds	retention_1	retention_7
0	116	gate_30	3	False	False
1	337	gate_30	38	True	False
2	377	gate_40	165	True	False
3	483	gate_40	1	False	False
4	488	gate_40	179	True	True

28.1. Describe the data

How many in each group?

?.value_counts()

What proportion of users returned after 7 days?

#gate placed at level 30
df_cats[?=='gate_30'].retention_7.sum() / len(df_cats[df_cats['version']=='gate_30'])

#gate placed at level 40
?
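One possible way to get both group sizes and retention rates in a single pass is `groupby`. This is a sketch on a tiny made-up table that reuses the column names from the dataset, not the values from the real data. Since `retention_7` is boolean, its mean is the proportion of `True` values.

```python
import pandas as pd

# Tiny made-up table with the same column names as the game data
toy = pd.DataFrame({
    "version": ["gate_30"] * 4 + ["gate_40"] * 4,
    "retention_7": [True, False, False, True, False, False, True, False],
})

# Number of users in each group
group_sizes = toy["version"].value_counts()

# Proportion retained in each group (mean of a boolean column)
retention_rate = toy.groupby("version")["retention_7"].mean()

print(group_sizes)
print(retention_rate)
```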

28.2. Visualize the data

#plot the differences between the versions
?

28.3. Wrangle the data

Convert the binary target and binary input variable to 0/1

from sklearn.preprocessing import OrdinalEncoder

#get the columns names of features you'd like to turn into 0/1
bin_names = ['retention_7','version']

#create the OrdinalEncoder
my_ordinal = ?()

#fit and transform the data
df_cats[bin_names] = my_ordinal.?(?)

#take a look
df_cats

Check which version is assigned to which value

my_ordinal.categories_
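As an illustration of how `categories_` maps labels to numbers, here is a small sketch on made-up rows. The `toy` and `encoder` names are hypothetical. `OrdinalEncoder` sorts the categories of each column, and the position of a category in `categories_` is the number it is encoded as.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows reusing the dataset's column names
toy = pd.DataFrame({
    "retention_7": [False, True, True, False],
    "version": ["gate_40", "gate_30", "gate_40", "gate_30"],
})
cols = ["retention_7", "version"]

# Fit the encoder and keep the encoded values in a separate frame
encoder = OrdinalEncoder()
encoded = pd.DataFrame(encoder.fit_transform(toy[cols]), columns=cols)

# Index 0 of each array is encoded as 0.0, index 1 as 1.0
for name, cats in zip(cols, encoder.categories_):
    print(name, list(cats))
print(encoded)
```

In this toy case `gate_30` becomes 0 and `gate_40` becomes 1, so a coefficient on `version` would describe the change when moving from `gate_30` to `gate_40`.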

Split your data into training and testing

#split these data into training and testing datasets
df_train, df_test = train_test_split(df_cats, test_size=0.20, stratify=df_cats[['retention_7']])
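The effect of `stratify` can be checked directly. The sketch below uses a made-up table with 20% retained users; `random_state=0` is added only to make the toy example reproducible. It shows that both splits keep the same retention proportion.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 20 retained users out of 100
toy = pd.DataFrame({
    "userid": range(100),
    "retention_7": [1] * 20 + [0] * 80,
})

# Stratifying on the target keeps its proportion equal in both splits
toy_train, toy_test = train_test_split(
    toy, test_size=0.20, stratify=toy["retention_7"], random_state=0
)

print(len(toy_train), toy_train["retention_7"].mean())
print(len(toy_test), toy_test["retention_7"].mean())
```

Without stratification, a rare outcome such as 7-day retention could end up over- or under-represented in the test set by chance.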

28.4. Build a model

Can we predict which game version does better?

import statsmodels.api as sm #for running regression!
import statsmodels.formula.api as smf

#1. Build the model
linear_reg_model = ?(formula='retention_7 ~ version ', data=?)

#2. Use the data to fit the model (i.e., find the best intercept and slope parameters)
linear_reg_results = linear_reg_model.?

#3. take a look at the summary
?

Make predictions to get the probability of retention. Note that the coefficients in the summary table are on the logit scale, whereas the predictions below are on the probability scale.

y_pred_prob = linear_reg_results.predict(df_train)
y_pred_prob.hist()
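The relationship between the two scales can be sketched on simulated data. This assumes the model is a logistic regression fit with `smf.logit`, which the logit-scale note suggests but the notebook leaves blank. The data, the true retention rates, and the `toy_*` names are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.special import expit

# Simulated data: version 0 retains 25% of users, version 1 retains 15% (made-up rates)
rng = np.random.default_rng(1)
n = 2000
toy = pd.DataFrame({"version": rng.integers(0, 2, size=n)})
true_prob = np.where(toy["version"] == 1, 0.15, 0.25)
toy["retention_7"] = rng.binomial(1, true_prob)

# Fit a logistic regression; disp=0 silences the optimizer output
toy_results = smf.logit(formula="retention_7 ~ version", data=toy).fit(disp=0)

# Coefficients live on the logit scale
intercept = toy_results.params["Intercept"]
slope = toy_results.params["version"]

# predict() returns probabilities, i.e. expit applied to the linear predictor
pred = toy_results.predict(pd.DataFrame({"version": [0, 1]}))
print(pred.values)
print(expit(intercept), expit(intercept + slope))
```

The two printed lines should agree, which is why a histogram of predictions from a model with one binary input shows only two distinct values.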

Run the model again, but this time add in the total number of game rounds each user played in the first 2 weeks (sum_gamerounds).

import statsmodels.api as sm #for running regression!
import statsmodels.formula.api as smf

#1. Build the model
linear_reg_model = ?(formula='retention_7 ~ version + sum_gamerounds  ', data=?)

#2. Use the data to fit the model (i.e., find the best intercept and slope parameters)
linear_reg_results = linear_reg_model.?()

#3. take a look at the summary
?

Calculate the difference in predicted probability

y_pred_prob = linear_reg_results.predict(df_train)
y_pred_prob.hist()

Check that the pattern you found generalizes to the withheld dataset (i.e., are you overfitting?).

from sklearn.metrics import confusion_matrix

#predict on testing data
y_pred_prob = ?

#convert probs to 0/1
y_pred = (y_pred_prob > 0.5).astype(int)

#create a confusion matrix
cm_logit = confusion_matrix(df_test.retention_7, ?)

#visualize the confusion matrix
sns.heatmap(cm_logit, annot=True)
plt.xlabel('Predicted label')
plt.ylabel('True label')

Measure the accuracy, precision, and recall of the model on the test dataset

from sklearn.metrics import accuracy_score, precision_score, recall_score

model_acc = accuracy_score(df_test.retention_7, ?)
model_prec = precision_score(?)
model_rec = recall_score(?)

print(f"accuracy: {model_acc}" )
print(f"?: {model_prec}" )
print(f"?: {?}" )
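To see how the three metrics relate to the confusion matrix, here is a small sketch with hand-made labels (`y_true` and `y_hat` are invented, not model output). It computes each metric from the four cells and compares the result with the scikit-learn functions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-made labels: 6 users not retained, 4 retained
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

# Metrics computed by hand from the four cells
acc_manual = (tp + tn) / cm.sum()
prec_manual = tp / (tp + fp)  # of those predicted retained, how many were
rec_manual = tp / (tp + fn)   # of those truly retained, how many were found

print(acc_manual, accuracy_score(y_true, y_hat))
print(prec_manual, precision_score(y_true, y_hat))
print(rec_manual, recall_score(y_true, y_hat))
```

When few users are retained, accuracy alone can look good even if the model rarely predicts retention, so precision and recall are worth reading alongside it.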

28.5. Bonus

How does the model think retention will change when we vary version and sum_gamerounds?

#1. Create a dataframe
df_question = pd.DataFrame({'version':[?,?],
                            'sum_gamerounds':?})
                            
#2. Use the model to make predictions
question_pred =  ?(df_question)

#3. add a column to the df_question
df_question['predicted_retention'] = question_pred

#4. plot the predictions
?

Try to match those predictions based on your knowledge of the linear formula (y=a+bx)

import scipy.special #import the submodule explicitly so scipy.special.expit is available

#the following function can be used to convert numbers on the logit scale back into the probability scale
scipy.special.expit(0)

#i.e., on the logit scale 0 is equivalent to 0.5 probability

#what was the intercept and slope of the line your model estimated? 
intercept = ?
b_version = ?
b_sumGame = ?

#probability of retention for version 0
scipy.special.expit(intercept + b_version * 0 + b_sumGame*100)

#probability of retention for version 1
scipy.special.expit(intercept + b_version * 1 + b_sumGame*100)
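The hand calculation above can be wrapped in a small helper. The coefficient values in this sketch are placeholders chosen for illustration, not estimates from the fitted model, and `predicted_retention` is a hypothetical function name. Substitute the intercept and slopes from your own summary table.

```python
from scipy.special import expit

# Placeholder coefficients on the logit scale (NOT the fitted values)
intercept = -2.0
b_version = -0.1
b_sum_game = 0.01


def predicted_retention(version, sum_gamerounds):
    """Apply y = a + b1*x1 + b2*x2 on the logit scale, then convert to a probability."""
    logit = intercept + b_version * version + b_sum_game * sum_gamerounds
    return expit(logit)


# Same number of game rounds, different versions
p_v0 = predicted_retention(version=0, sum_gamerounds=100)
p_v1 = predicted_retention(version=1, sum_gamerounds=100)

# Difference in predicted probability attributable to the version
print(p_v0, p_v1, p_v1 - p_v0)
```

Holding `sum_gamerounds` fixed isolates the version effect. Because the conversion from logit to probability is nonlinear, the size of that difference depends on which number of game rounds you pick.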

28.6. Further reading

If you would like the notebook without missing code, check out the full code version.

Practical exercises in data science - PEDS
