

27. Explainability vs Causality

Here we will look at the difference between understanding how an ML model makes its predictions (explainability) and understanding what actually causes the outcome (causality).

To do so we will look at a silly example where we know that the patterns picked up by the model are not causal.

27.1. Waffle houses and divorce rates

import pandas as pd
import sklearn as sk
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

Load the data

#load data
df_waffles = pd.read_csv("/content/waffles.csv")

#take a look
df_waffles.head()

Visualize the data

#sort the dataframe
pd_df = df_waffles.sort_values(['Divorce']).reset_index(drop=True)

#plot by state
sns.barplot(data=pd_df, x="Loc",y="Divorce")
plt.xticks(rotation=90)

27.2. Do waffle houses cause divorce?

#correlation
?
#scatter plot
sns.?(data=?, x="WaffleHouses", y="Divorce" )
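
One possible way to fill in the cell above (a sketch; the WaffleHouses and Divorce columns come from the data loaded earlier):

#correlation between the number of waffle houses and the divorce rate
print(df_waffles["WaffleHouses"].corr(df_waffles["Divorce"]))

#scatter plot of the same relationship
sns.scatterplot(data=df_waffles, x="WaffleHouses", y="Divorce")
plt.show()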

Data wrangling

#split these data into training and testing datasets
df_train, df_test = train_test_split(df_waffles, test_size=0.20, random_state=14)

27.3. Build a model

Can we predict divorce rates based on:

  1. Population

  2. Marriage rates (more marriages, more divorces)

  3. Median age at marriage

  4. Number of waffle houses

Build a linear regression predicting Divorce using WaffleHouses.

import statsmodels.api as sm #for running regression!
import statsmodels.formula.api as smf

#1. Build the model
?

#2. Use the data to fit the model (i.e., find the best intercept and slope parameters)
?

#Look summary
?
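
One possible solution sketch for the three cells above (waffle_model and waffle_results are illustrative names; the formula uses the Divorce and WaffleHouses columns from the training data):

#1. Build the model: Divorce as a function of WaffleHouses
waffle_model = smf.ols(formula="Divorce ~ WaffleHouses", data=df_train)

#2. Use the data to fit the model (find the best intercept and slope)
waffle_results = waffle_model.fit()

#3. Look at the summary
print(waffle_results.summary())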

27.4. Fit the model again, this time adding the South variable

#Build the model
?

#Use the data to fit the model (i.e., find the best intercept and slope parameters)
?

#summary
?
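
A possible solution sketch (again with illustrative variable names); compare the WaffleHouses coefficient with the one from the previous model:

#Build the model, this time with South added as a predictor
waffle_south_model = smf.ols(formula="Divorce ~ WaffleHouses + South", data=df_train)

#Fit the model
waffle_south_results = waffle_south_model.fit()

#Summary
print(waffle_south_results.summary())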

27.4.1. Bonus

Try running the model with alternative combinations of variables (see the sketch below). How does the model's estimate of the effect of WaffleHouses on divorce change?
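
For example, one might add marriage rate and median age at marriage as predictors; the column names Marriage and MedianAgeMarriage below are assumptions about this CSV, so check df_waffles.columns first:

#check the actual column names before writing formulas
print(df_waffles.columns)

#an example alternative model (Marriage and MedianAgeMarriage are assumed column names)
alt_model = smf.ols(formula="Divorce ~ WaffleHouses + Marriage + MedianAgeMarriage", data=df_train)
alt_results = alt_model.fit()
print(alt_results.params)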

27.5. Statistical confounds

Statistical confounds make it hard to determine whether the patterns we find in ML model results are causal. We need to be careful to keep explanations of how a model makes its predictions separate from claims about what actually causes the outcome.

In the case of the waffle houses and divorce rates, there are simply more waffle houses in southern states, and southern states also have higher divorce rates: South –> WaffleHouses and South –> Divorce. The South is a confound, so waffle houses and divorce rates are correlated even though neither causes the other.

sns.boxplot(data=df_waffles, x="South", y="WaffleHouses")
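
A quick numerical check of the same pattern (a sketch using only columns already in df_waffles):

#average number of waffle houses and divorce rate in southern vs non-southern states
df_waffles.groupby("South")[["WaffleHouses", "Divorce"]].mean()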

27.6. Let’s see what feature importance suggests

from sklearn.linear_model import LinearRegression
from sklearn.inspection import permutation_importance

#split data into predictors (X) and target (y)
X = df_waffles.drop(['Divorce','Location','Loc'],axis=1)
y = df_waffles['Divorce']

#split these data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#fit linear regression
LR1 = LinearRegression()
LR1.fit(X_train, y_train)

#model interpretation
rel_impo = permutation_importance(LR1, X_test, y_test,n_repeats=30,random_state=0)
pd.DataFrame({"feature":X_test.columns,"importance":rel_impo.importances_mean, "sd":rel_impo.importances_std})

27.7. Let’s see what feature selection suggests

from sklearn.model_selection import KFold
from sklearn.feature_selection import RFECV

#split data into predictors (X) and target (y)
X = df_waffles.drop(['Divorce','Location','Loc'], axis=1)
y = df_waffles['Divorce']

#split these data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#build a linear regression (full model)
LR1 = LinearRegression()

#fit linear regression
LR1.fit(X_train, y_train)
#min number of variables/features
min_features_to_select = 1

#build the feature selection algorithm
rfecv = RFECV(estimator=LR1, step=1, cv=3,scoring='neg_mean_squared_error', min_features_to_select=min_features_to_select)

#fit the algorithm to the data
rfecv.fit(X_train, y_train)
print("Optimal number of features : %d" % rfecv.n_features_)

# Plot number of features VS. cross-validation scores
# (rfecv.grid_scores_ was removed in scikit-learn 1.2; cv_results_ holds the per-subset scores)
cv_scores = rfecv.cv_results_["mean_test_score"]
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross-validation score (negative mean squared error)")
plt.plot(range(min_features_to_select,
               len(cv_scores) + min_features_to_select),
         cv_scores)
plt.show()
#boolean mask of which features were selected
rfecv.support_

#keep only the selected features
X_train_reduced = X_train.iloc[:,rfecv.support_]

X_train_reduced.head(3)
#get the slopes!
rfecv.estimator_.coef_
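
To see which features were kept and their slopes side by side, the names and coefficients can be paired (a sketch based on the objects fitted above):

#pair the selected feature names with the coefficients of the refit model
pd.DataFrame({"feature": X_train.columns[rfecv.support_],
              "coef": rfecv.estimator_.coef_})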

27.8. Bonus

Redo the exercise above, this time using a more black-box approach, e.g., a random forest! A sketch of one way to start is below.
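
A minimal sketch reusing the train/test split and permutation importance from above (the hyperparameters here are arbitrary defaults, not tuned values):

from sklearn.ensemble import RandomForestRegressor

#fit a random forest on the same training data
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

#permutation importance on the held-out test set, as before
rf_impo = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=0)
pd.DataFrame({"feature": X_test.columns,
              "importance": rf_impo.importances_mean,
              "sd": rf_impo.importances_std})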

27.9. Further reading

If you would like the notebook without the missing code, check out the full code version.