Modeling for Actionable Insights with XGBoost - What Can You Do about Your Predictions?

Introduction

Let's talk modeling for actionable insights! Building a predictive model is only the first step: your end user or customer won't know what to do with an AUC or RMSE score. But if you can tell them WHO is at risk, WHY, and WHAT they can do about it, that's actionable and can even be translated into dollar amounts! And we're going to do it with XGBoost on a C5.0 data set entitled "Customer Churn".

Code

Modeling for Actionable Insights - Customer Churn
In [516]:
from IPython.display import Image
Image(filename='double logos.png')
Out[516]:

We'll use a data set called "Customer Churn". As the name implies, it contains customer information and usage records from a phone company, including whether each customer churned. It covers day, evening, night, and international usage, international and voicemail plans, and customer service calls, which we'll use to understand and predict patterns of churn.

You can find the data set on many GitHub repos, on C5.0, and at http://amunategui.github.io/customer_churn.csv

In [517]:
import warnings
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
warnings.filterwarnings("ignore")
In [518]:
#churn_df = pd.read_csv('http://amunategui.github.io/customer_churn.csv')
churn_df = pd.read_csv('customer_churn.csv')
churn_df.head()
Out[518]:
   State  Account Length  Area Code     Phone  Int'l Plan  VMail Plan  VMail Message  Day Mins  Day Calls  Day Charge  ...  Eve Calls  Eve Charge  Night Mins  Night Calls  Night Charge  Intl Mins  Intl Calls  Intl Charge  CustServ Calls  Churn?
0     KS             128        415  382-4657          no         yes             25     265.1        110       45.07  ...         99       16.78       244.7           91         11.01       10.0           3         2.70               1  False.
1     OH             107        415  371-7191          no         yes             26     161.6        123       27.47  ...        103       16.62       254.4          103         11.45       13.7           3         3.70               1  False.
2     NJ             137        415  358-1921          no          no              0     243.4        114       41.38  ...        110       10.30       162.6          104          7.32       12.2           5         3.29               0  False.
3     OH              84        408  375-9999         yes          no              0     299.4         71       50.90  ...         88        5.26       196.9           89          8.86        6.6           7         1.78               2  False.
4     OK              75        415  330-6626         yes          no              0     166.7        113       28.34  ...        122       12.61       186.9          121          8.41       10.1           3         2.73               3  False.

5 rows × 21 columns

Feature Engineering

In [519]:
# Binarize area codes
churn_df['Area Code'] = churn_df['Area Code'].apply(str)
pd.get_dummies(churn_df['Area Code']).head()
Out[519]:
   408  415  510
0    0    1    0
1    0    1    0
2    0    1    0
3    1    0    0
4    0    1    0
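
As a minimal sketch of what `pd.get_dummies` is doing here, the snippet below one-hot encodes a tiny, made-up area code column. The toy values and the `area` prefix are assumptions for illustration and are not part of the notebook; a prefix is simply one way to keep the new column names from looking like plain numbers.

```python
import pandas as pd

# Toy stand-in for churn_df['Area Code'] (values are illustrative only)
toy = pd.DataFrame({'Area Code': [415, 415, 408, 510]})

# Cast to string so the codes are treated as categories, not quantities
toy['Area Code'] = toy['Area Code'].astype(str)

# One column per distinct code; exactly one of them is "on" in each row
dummies = pd.get_dummies(toy['Area Code'], prefix='area', dtype=int)
print(dummies)
```

Each row sums to 1 across the new columns, which is a quick sanity check that the encoding behaved as expected.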

More Feature Engineering - Transform true/false yes/no text into numerics

In [524]:
churn_df['State'].value_counts()[0:10]
Out[524]:
WV    106
MN     84
NY     83
AL     80
OH     78
WI     78
OR     78
WY     77
VA     77
CT     74
Name: State, dtype: int64
In [522]:
# convert the outcome and the yes/no plan flags to 1/0
churn_df['Churn?'] = np.where(churn_df['Churn?'] == 'True.', 1, 0)
churn_df["Int'l Plan"] = np.where(churn_df["Int'l Plan"] == 'yes', 1, 0)
churn_df['VMail Plan'] = np.where(churn_df['VMail Plan'] == 'yes', 1, 0)
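
Because the comparison strings must match the raw labels exactly (note the trailing period in `'True.'`), it can help to confirm the label exists before converting. Here is a small sketch on made-up rows; `binarize` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the raw labels seen in churn_df.head()
toy = pd.DataFrame({'Churn?': ['False.', 'True.', 'False.'],
                    "Int'l Plan": ['no', 'yes', 'no']})

def binarize(series, positive_label):
    """Return 1 where the value equals positive_label, else 0.

    Raises if positive_label never occurs, which usually signals a typo
    such as 'True' instead of 'True.'.
    """
    seen = sorted(series.unique())
    if positive_label not in seen:
        raise ValueError(f'{positive_label!r} not found in {seen}')
    return np.where(series == positive_label, 1, 0)

toy['Churn?'] = binarize(toy['Churn?'], 'True.')
toy["Int'l Plan"] = binarize(toy["Int'l Plan"], 'yes')
print(toy)
```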
 
In [523]:
# dummify states
pd.get_dummies(churn_df['State']).head()
Out[523]:
   AK  AL  AR  AZ  CA  CO  CT  DC  DE  FL  ...  SD  TN  TX  UT  VA  VT  WA  WI  WV  WY
0   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
4   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0

5 rows × 51 columns

In [525]:
# binarize categorical columns
churn_df = pd.concat([churn_df, pd.get_dummies(churn_df['State'])], axis=1)
churn_df = pd.concat([churn_df, pd.get_dummies(churn_df['Area Code'])], axis=1)

churn_df.head()
Out[525]:
   State  Account Length  Area Code     Phone  Int'l Plan  VMail Plan  VMail Message  Day Mins  Day Calls  Day Charge  ...  UT  VA  VT  WA  WI  WV  WY  408  415  510
0     KS             128        415  382-4657           0           1             25     265.1        110       45.07  ...   0   0   0   0   0   0   0    0    1    0
1     OH             107        415  371-7191           0           1             26     161.6        123       27.47  ...   0   0   0   0   0   0   0    0    1    0
2     NJ             137        415  358-1921           0           0              0     243.4        114       41.38  ...   0   0   0   0   0   0   0    0    1    0
3     OH              84        408  375-9999           1           0              0     299.4         71       50.90  ...   0   0   0   0   0   0   0    1    0    0
4     OK              75        415  330-6626           1           0              0     166.7        113       28.34  ...   0   0   0   0   0   0   0    0    1    0

5 rows × 75 columns

In [527]:
# # check for nulls in data and impute if necessary
# for feat in list(churn_df):
#     n_missing = churn_df[feat].isnull().sum()
#     if n_missing > 0:
#         print(feat, n_missing)
#         # churn_df.loc[churn_df[feat].isnull(), feat] = 0
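
If you do want to run that null check, a compact pandas version might look like the sketch below. The toy frame and the choice to fill gaps with 0 are assumptions for illustration; the notebook itself leaves this step commented out.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (illustrative data only)
toy = pd.DataFrame({'Day Mins': [265.1, np.nan, 243.4],
                    'Day Calls': [110, 123, 114]})

# Count missing values per column and keep only the columns that have any
null_counts = toy.isnull().sum()
null_counts = null_counts[null_counts > 0]
print(null_counts)

# One simple imputation choice: replace missing values with 0
toy_filled = toy.fillna(0)
```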
    
In [529]:
list(churn_df)
Out[529]:
['State',
 'Account Length',
 'Area Code',
 'Phone',
 "Int'l Plan",
 'VMail Plan',
 'VMail Message',
 'Day Mins',
 'Day Calls',
 'Day Charge',
 'Eve Mins',
 'Eve Calls',
 'Eve Charge',
 'Night Mins',
 'Night Calls',
 'Night Charge',
 'Intl Mins',
 'Intl Calls',
 'Intl Charge',
 'CustServ Calls',
 'Churn?',
 'AK',
 'AL',
 'AR',
 'AZ',
 'CA',
 'CO',
 'CT',
 'DC',
 'DE',
 'FL',
 'GA',
 'HI',
 'IA',
 'ID',
 'IL',
 'IN',
 'KS',
 'KY',
 'LA',
 'MA',
 'MD',
 'ME',
 'MI',
 'MN',
 'MO',
 'MS',
 'MT',
 'NC',
 'ND',
 'NE',
 'NH',
 'NJ',
 'NM',
 'NV',
 'NY',
 'OH',
 'OK',
 'OR',
 'PA',
 'RI',
 'SC',
 'SD',
 'TN',
 'TX',
 'UT',
 'VA',
 'VT',
 'WA',
 'WI',
 'WV',
 'WY',
 '408',
 '415',
 '510']
In [491]:
features = [feat for feat in list(churn_df) if feat not in ['State', 'Churn?', 'Phone', 'Area Code']]
In [492]:
outcome = 'Churn?'
In [493]:
# run a simple XGBoost classification model and check its AUC
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(churn_df,
                                                    churn_df[outcome],
                                                    test_size=0.3,
                                                    random_state=42)

import xgboost as xgb
xgb_params = {
    'max_depth': 3,
    'eta': 0.05,
    'silent': 0,  # newer XGBoost releases use 'verbosity' instead
    'eval_metric': 'auc',
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'seed': 0
}

dtrain = xgb.DMatrix(X_train[features], y_train, feature_names=features)
dtest = xgb.DMatrix(X_test[features], y_test, feature_names=features)
evals = [(dtrain, 'train'), (dtest, 'eval')]
xgb_model = xgb.train(params=xgb_params,
                      dtrain=dtrain,
                      num_boost_round=2000,
                      verbose_eval=50,
                      early_stopping_rounds=500,
                      evals=evals,
                      #feval = f1_score_cust,
                      maximize=True)
 
# plot the important features  
fig, ax = plt.subplots(figsize=(6,9))
xgb.plot_importance(xgb_model,  height=0.8, ax=ax)
plt.show()
[0]	train-auc:0.827398	eval-auc:0.832005
Multiple eval metrics have been passed: 'eval-auc' will be used for early stopping.

Will train until eval-auc hasn't improved in 500 rounds.
[50]	train-auc:0.919653	eval-auc:0.927977
[100]	train-auc:0.936531	eval-auc:0.933436
[150]	train-auc:0.951331	eval-auc:0.936508
[200]	train-auc:0.963419	eval-auc:0.937251
[250]	train-auc:0.972945	eval-auc:0.937422
[300]	train-auc:0.980271	eval-auc:0.937104
[350]	train-auc:0.98542	eval-auc:0.93712
[400]	train-auc:0.988963	eval-auc:0.935659
[450]	train-auc:0.992415	eval-auc:0.937316
[500]	train-auc:0.994326	eval-auc:0.936394
[550]	train-auc:0.996219	eval-auc:0.936043
[600]	train-auc:0.997177	eval-auc:0.935863
[650]	train-auc:0.997906	eval-auc:0.934688
[700]	train-auc:0.998623	eval-auc:0.933448
Stopping. Best iteration:
[217]	train-auc:0.968177	eval-auc:0.938687
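
The log above reflects an early-stopping rule: training continues until the evaluation metric has gone `early_stopping_rounds` rounds without improving, and the best round is remembered. The pure-Python sketch below imitates that rule on a short, made-up list of scores. `best_round_with_patience` is a hypothetical helper written for illustration, not XGBoost's actual implementation.

```python
def best_round_with_patience(scores, patience):
    """Scan per-round scores (higher is better) and stop once `patience`
    rounds have passed since the best score was last improved.

    Returns (best_round, stop_round); stop_round is the last round evaluated.
    """
    best_round, best_score = 0, float('-inf')
    stop_round = 0
    for rnd, score in enumerate(scores):
        stop_round = rnd
        if score > best_score:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            break
    return best_round, stop_round

# Made-up AUC values: improvement stalls after round 2
toy_scores = [0.83, 0.90, 0.93, 0.92, 0.93, 0.91, 0.90]
best, stopped = best_round_with_patience(toy_scores, patience=3)
print(best, stopped)
```

With a patience of 500, as in the run above, the gap between the best round and the stopping point is large, which is why the training AUC keeps climbing long after the evaluation AUC has peaked.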

In [494]:
# get a DataFrame version of the model's feature importances
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
xgb_fea_imp.head(10)
Out[494]:
           feature  importance
0         Day Mins         567
4         Eve Mins         506
13      Night Mins         344
5      Night Calls         320
9        Intl Mins         316
16  Account Length         275
14       Day Calls         252
19       Eve Calls         250
1   CustServ Calls         240
2       Int'l Plan         168
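
`get_fscore()` returns raw counts of how often each feature is used to split, so the numbers are easier to compare once expressed as shares. The sketch below does that for the ten values printed above. Note that the shares are relative to these ten features only, since the full importance dictionary is not shown here.

```python
import pandas as pd

# Split counts copied from the top-10 table above
top10 = {'Day Mins': 567, 'Eve Mins': 506, 'Night Mins': 344,
         'Night Calls': 320, 'Intl Mins': 316, 'Account Length': 275,
         'Day Calls': 252, 'Eve Calls': 250, 'CustServ Calls': 240,
         "Int'l Plan": 168}

imp = (pd.DataFrame(list(top10.items()), columns=['feature', 'importance'])
         .sort_values('importance', ascending=False)
         .reset_index(drop=True))

# Share of the top-10 total (not of all features in the model)
imp['share'] = imp['importance'] / imp['importance'].sum()
print(imp)
```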

Creating top/bottom percentiles to determine under/over use

In [530]:
churn_df['Day Mins'].quantile(0.25)
Out[530]:
143.7
In [531]:
churn_df['Day Mins'].quantile(0.75)
Out[531]:
216.4
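
With the 25th and 75th percentile values in hand, each customer's usage can be labeled as under, typical, or over. Here is a minimal sketch using the two `Day Mins` cutoffs printed above; `usage_band` and its labels are hypothetical names chosen for illustration.

```python
def usage_band(value, low_cutoff, high_cutoff):
    """Label a value relative to the 25th/75th percentile cutoffs."""
    if value < low_cutoff:
        return 'under'
    if value > high_cutoff:
        return 'over'
    return 'typical'

# Cutoffs from churn_df['Day Mins'].quantile(0.25) and .quantile(0.75)
DAY_MINS_Q25, DAY_MINS_Q75 = 143.7, 216.4

# Day Mins of the first two rows shown by churn_df.head()
print(usage_band(265.1, DAY_MINS_Q25, DAY_MINS_Q75))  # over
print(usage_band(161.6, DAY_MINS_Q25, DAY_MINS_Q75))  # typical
```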
In [532]:
pred_churn = xgb_model.predict(dtest)
plt.plot(sorted(pred_churn))
plt.grid()
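
Once each test row has a predicted churn probability, the percentile checks can be collected into a table instead of being printed line by line. The sketch below is one possible alternative on made-up data. The toy values, the reuse of a 0.8 probability cutoff, and the layout of `flags` are assumptions for illustration, not the notebook's method.

```python
import pandas as pd

# Toy test set with predicted churn probabilities (illustrative values)
toy = pd.DataFrame({
    'Day Mins':       [100.0, 150.0, 200.0, 250.0, 300.0],
    'CustServ Calls': [0, 1, 1, 2, 5],
    'Will_Churn':     [0.10, 0.20, 0.30, 0.85, 0.95],
})
feats = ['Day Mins', 'CustServ Calls']

# Quartile cutoffs per feature, computed once over the whole test set
q25 = toy[feats].quantile(0.25)
q75 = toy[feats].quantile(0.75)

# Keep only high-probability churners, then compare against the cutoffs
high_risk = toy.loc[toy['Will_Churn'] > 0.8, feats]
below = high_risk.lt(q25, axis=1)
above = high_risk.gt(q75, axis=1)

# One record per (row, feature) that falls outside the middle 50%
records = []
for idx in high_risk.index:
    for feat in feats:
        if below.loc[idx, feat]:
            records.append((idx, feat, 'below 25th percentile'))
        elif above.loc[idx, feat]:
            records.append((idx, feat, 'above 75th percentile'))

flags = pd.DataFrame(records, columns=['row', 'feature', 'flag'])
print(flags)
```

A table like `flags` is easier to filter, count, or hand to someone else than a long stream of printed lines.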
In [498]:
# get all numerical features
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numeric_features = list(X_test.select_dtypes(include=numerics))
features_to_ignore = ['Account Length', 'Area Code', 'Churn?', 'Will_Churn']
numeric_features = [nf for nf in numeric_features if nf not in features_to_ignore]

# compute the quartile cutoffs once instead of on every row
q25 = X_test[numeric_features].quantile(0.25)
q75 = X_test[numeric_features].quantile(0.75)

row_counter = 0
X_test['Will_Churn'] = pred_churn
new_df = []
for index, row in X_test.iterrows():
    # only consider high-probability churners
    if row['Will_Churn'] > 0.8:
        row_counter += 1
        new_df.append(row[list(churn_df)])
        for feat in numeric_features:
            if row[feat] < q25[feat]:
                print('(ID:', row_counter, ')', feat,  ' is < than 25 percentile')
            if row[feat] > q75[feat]:
                print('(ID:', row_counter, ')', feat,  ' is > than 75 percentile')


new_df[0]
(ID: 1 ) Day Mins  is < than 25 percentile
(ID: 1 ) Day Calls  is < than 25 percentile
(ID: 1 ) Day Charge  is < than 25 percentile
(ID: 1 ) Eve Mins  is < than 25 percentile
(ID: 1 ) Eve Calls  is > than 75 percentile
(ID: 1 ) Eve Charge  is < than 25 percentile
(ID: 1 ) Night Mins  is < than 25 percentile
(ID: 1 ) Night Charge  is < than 25 percentile
(ID: 1 ) Intl Mins  is < than 25 percentile
(ID: 1 ) Intl Calls  is < than 25 percentile
(ID: 1 ) Intl Charge  is < than 25 percentile
(ID: 1 ) CustServ Calls  is > than 75 percentile
(ID: 2 ) Day Mins  is > than 75 percentile
(ID: 2 ) Day Calls  is > than 75 percentile
(ID: 2 ) Day Charge  is > than 75 percentile
(ID: 2 ) Eve Calls  is < than 25 percentile
(ID: 2 ) Night Calls  is > than 75 percentile
(ID: 2 ) Intl Mins  is < than 25 percentile
(ID: 2 ) Intl Charge  is < than 25 percentile
(ID: 3 ) Day Mins  is < than 25 percentile
(ID: 3 ) Day Calls  is > than 75 percentile
(ID: 3 ) Day Charge  is < than 25 percentile
(ID: 3 ) Eve Mins  is < than 25 percentile
(ID: 3 ) Eve Charge  is < than 25 percentile
(ID: 3 ) CustServ Calls  is > than 75 percentile
(ID: 4 ) Day Mins  is > than 75 percentile
(ID: 4 ) Day Calls  is > than 75 percentile
(ID: 4 ) Day Charge  is > than 75 percentile
(ID: 4 ) Night Mins  is > than 75 percentile
(ID: 4 ) Night Calls  is > than 75 percentile
(ID: 4 ) Night Charge  is > than 75 percentile
(ID: 4 ) CustServ Calls  is > than 75 percentile
(ID: 5 ) Int'l Plan  is > than 75 percentile
(ID: 5 ) Eve Mins  is > than 75 percentile
(ID: 5 ) Eve Calls  is > than 75 percentile
(ID: 5 ) Eve Charge  is > than 75 percentile
(ID: 5 ) Night Calls  is > than 75 percentile
(ID: 5 ) Intl Calls  is < than 25 percentile
(ID: 5 ) CustServ Calls  is < than 25 percentile
(ID: 6 ) Day Mins  is > than 75 percentile
(ID: 6 ) Day Calls  is < than 25 percentile
(ID: 6 ) Day Charge  is > than 75 percentile
(ID: 6 ) Eve Mins  is > than 75 percentile
(ID: 6 ) Eve Calls  is < than 25 percentile
(ID: 6 ) Eve Charge  is > than 75 percentile
(ID: 6 ) Intl Calls  is < than 25 percentile
(ID: 7 ) Int'l Plan  is > than 75 percentile
(ID: 7 ) Day Mins  is < than 25 percentile
(ID: 7 ) Day Charge  is < than 25 percentile
(ID: 7 ) Eve Mins  is < than 25 percentile
(ID: 7 ) Eve Charge  is < than 25 percentile
(ID: 7 ) Night Mins  is > than 75 percentile
(ID: 7 ) Night Calls  is > than 75 percentile
(ID: 7 ) Night Charge  is > than 75 percentile
(ID: 7 ) Intl Mins  is < than 25 percentile
(ID: 7 ) Intl Charge  is < than 25 percentile
(ID: 7 ) CustServ Calls  is > than 75 percentile
(ID: 8 ) Day Mins  is > than 75 percentile
(ID: 8 ) Day Charge  is > than 75 percentile
(ID: 8 ) Night Mins  is < than 25 percentile
(ID: 8 ) Night Calls  is > than 75 percentile
(ID: 8 ) Night Charge  is < than 25 percentile
(ID: 8 ) Intl Mins  is < than 25 percentile
(ID: 8 ) Intl Calls  is > than 75 percentile
(ID: 8 ) Intl Charge  is < than 25 percentile
(ID: 8 ) CustServ Calls  is < than 25 percentile
(ID: 9 ) Day Mins  is > than 75 percentile
(ID: 9 ) Day Charge  is > than 75 percentile
(ID: 9 ) Eve Mins  is > than 75 percentile
(ID: 9 ) Eve Calls  is > than 75 percentile
(ID: 9 ) Eve Charge  is > than 75 percentile
(ID: 9 ) Night Mins  is > than 75 percentile
(ID: 9 ) Night Charge  is > than 75 percentile
(ID: 9 ) Intl Mins  is < than 25 percentile
(ID: 9 ) Intl Charge  is < than 25 percentile
(ID: 10 ) Day Mins  is > than 75 percentile
(ID: 10 ) Day Charge  is > than 75 percentile
(ID: 10 ) Eve Mins  is > than 75 percentile
(ID: 10 ) Eve Calls  is < than 25 percentile
(ID: 10 ) Eve Charge  is > than 75 percentile
(ID: 10 ) Intl Calls  is < than 25 percentile
(ID: 11 ) Int'l Plan  is > than 75 percentile
(ID: 11 ) Day Mins  is > than 75 percentile
(ID: 11 ) Day Charge  is > than 75 percentile
(ID: 11 ) Eve Calls  is < than 25 percentile
(ID: 11 ) Night Calls  is > than 75 percentile
(ID: 11 ) Intl Mins  is > than 75 percentile
(ID: 11 ) Intl Charge  is > than 75 percentile
(ID: 11 ) CustServ Calls  is < than 25 percentile
(ID: 12 ) Int'l Plan  is > than 75 percentile
(ID: 12 ) Day Mins  is > than 75 percentile
(ID: 12 ) Day Charge  is > than 75 percentile
(ID: 12 ) Eve Mins  is > than 75 percentile
(ID: 12 ) Eve Calls  is < than 25 percentile
(ID: 12 ) Eve Charge  is > than 75 percentile
(ID: 12 ) Night Mins  is > than 75 percentile
(ID: 12 ) Night Calls  is < than 25 percentile
(ID: 12 ) Night Charge  is > than 75 percentile
(ID: 12 ) Intl Mins  is < than 25 percentile
(ID: 12 ) Intl Calls  is < than 25 percentile
(ID: 12 ) Intl Charge  is < than 25 percentile
(ID: 12 ) CustServ Calls  is > than 75 percentile
(ID: 13 ) Int'l Plan  is > than 75 percentile
(ID: 13 ) Day Mins  is > than 75 percentile
(ID: 13 ) Day Calls  is < than 25 percentile
(ID: 13 ) Day Charge  is > than 75 percentile
(ID: 13 ) Eve Calls  is < than 25 percentile
(ID: 13 ) Intl Mins  is > than 75 percentile
(ID: 13 ) Intl Calls  is < than 25 percentile
(ID: 13 ) Intl Charge  is > than 75 percentile
(ID: 14 ) Int'l Plan  is > than 75 percentile
(ID: 14 ) Day Calls  is < than 25 percentile
(ID: 14 ) Eve Mins  is < than 25 percentile
(ID: 14 ) Eve Calls  is < than 25 percentile
(ID: 14 ) Eve Charge  is < than 25 percentile
(ID: 14 ) Night Mins  is > than 75 percentile
(ID: 14 ) Night Charge  is > than 75 percentile
(ID: 14 ) Intl Mins  is > than 75 percentile
(ID: 14 ) Intl Charge  is > than 75 percentile
(ID: 14 ) CustServ Calls  is < than 25 percentile
(ID: 15 ) Day Mins  is > than 75 percentile
(ID: 15 ) Day Charge  is > than 75 percentile
(ID: 15 ) Night Mins  is > than 75 percentile
(ID: 15 ) Night Calls  is < than 25 percentile
(ID: 15 ) Night Charge  is > than 75 percentile
(ID: 15 ) CustServ Calls  is < than 25 percentile
(ID: 16 ) Day Mins  is > than 75 percentile
(ID: 16 ) Day Calls  is > than 75 percentile
(ID: 16 ) Day Charge  is > than 75 percentile
(ID: 16 ) Intl Calls  is > than 75 percentile
(ID: 17 ) Day Mins  is > than 75 percentile
(ID: 17 ) Day Charge  is > than 75 percentile
(ID: 17 ) Eve Mins  is > than 75 percentile
(ID: 17 ) Eve Charge  is > than 75 percentile
(ID: 17 ) Intl Mins  is > than 75 percentile
(ID: 17 ) Intl Calls  is < than 25 percentile
(ID: 17 ) Intl Charge  is > than 75 percentile
(ID: 18 ) Day Mins  is < than 25 percentile
(ID: 18 ) Day Charge  is < than 25 percentile
(ID: 18 ) Eve Mins  is < than 25 percentile
(ID: 18 ) Eve Charge  is < than 25 percentile
(ID: 18 ) Night Calls  is > than 75 percentile
(ID: 18 ) Intl Mins  is > than 75 percentile
(ID: 18 ) Intl Charge  is > than 75 percentile
(ID: 18 ) CustServ Calls  is > than 75 percentile
(ID: 19 ) Day Mins  is < than 25 percentile
(ID: 19 ) Day Calls  is > than 75 percentile
(ID: 19 ) Day Charge  is < than 25 percentile
(ID: 19 ) Eve Mins  is < than 25 percentile
(ID: 19 ) Eve Calls  is < than 25 percentile
(ID: 19 ) Eve Charge  is < than 25 percentile
(ID: 19 ) Night Calls  is > than 75 percentile
(ID: 19 ) Intl Calls  is < than 25 percentile
(ID: 19 ) CustServ Calls  is > than 75 percentile
(ID: 20 ) Day Mins  is > than 75 percentile
(ID: 20 ) Day Calls  is > than 75 percentile
(ID: 20 ) Day Charge  is > than 75 percentile
(ID: 20 ) Eve Mins  is > than 75 percentile
(ID: 20 ) Eve Charge  is > than 75 percentile
(ID: 20 ) Night Mins  is > than 75 percentile
(ID: 20 ) Night Charge  is > than 75 percentile
(ID: 20 ) Intl Mins  is > than 75 percentile
(ID: 20 ) Intl Calls  is > than 75 percentile
(ID: 20 ) Intl Charge  is > than 75 percentile
(ID: 21 ) Day Mins  is > than 75 percentile
(ID: 21 ) Day Calls  is > than 75 percentile
(ID: 21 ) Day Charge  is > than 75 percentile
(ID: 21 ) Eve Mins  is > than 75 percentile
(ID: 21 ) Eve Charge  is > than 75 percentile
(ID: 21 ) Night Calls  is < than 25 percentile
(ID: 22 ) Day Mins  is > than 75 percentile
(ID: 22 ) Day Charge  is > than 75 percentile
(ID: 22 ) Eve Mins  is > than 75 percentile
(ID: 22 ) Eve Charge  is > than 75 percentile
(ID: 22 ) Night Mins  is > than 75 percentile
(ID: 22 ) Night Charge  is > than 75 percentile
(ID: 22 ) Intl Mins  is > than 75 percentile
(ID: 22 ) Intl Calls  is > than 75 percentile
(ID: 22 ) Intl Charge  is > than 75 percentile
(ID: 22 ) CustServ Calls  is < than 25 percentile
(ID: 23 ) Day Mins  is > than 75 percentile
(ID: 23 ) Day Calls  is < than 25 percentile
(ID: 23 ) Day Charge  is > than 75 percentile
(ID: 23 ) Eve Calls  is > than 75 percentile
(ID: 23 ) Night Mins  is > than 75 percentile
(ID: 23 ) Night Charge  is > than 75 percentile
(ID: 24 ) Int'l Plan  is > than 75 percentile
(ID: 24 ) Night Mins  is > than 75 percentile
(ID: 24 ) Night Calls  is > than 75 percentile
(ID: 24 ) Night Charge  is > than 75 percentile
(ID: 24 ) Intl Mins  is > than 75 percentile
(ID: 24 ) Intl Calls  is > than 75 percentile
(ID: 24 ) Intl Charge  is > than 75 percentile
(ID: 25 ) Day Mins  is > than 75 percentile
(ID: 25 ) Day Charge  is > than 75 percentile
(ID: 25 ) Eve Mins  is > than 75 percentile
(ID: 25 ) Eve Calls  is > than 75 percentile
(ID: 25 ) Eve Charge  is > than 75 percentile
(ID: 25 ) Night Calls  is < than 25 percentile
(ID: 25 ) Intl Calls  is < than 25 percentile
(ID: 25 ) CustServ Calls  is > than 75 percentile
(ID: 26 ) Int'l Plan  is > than 75 percentile
(ID: 26 ) VMail Message  is > than 75 percentile
(ID: 26 ) Day Calls  is > than 75 percentile
(ID: 26 ) Eve Calls  is > than 75 percentile
(ID: 26 ) Night Mins  is < than 25 percentile
(ID: 26 ) Night Charge  is < than 25 percentile
(ID: 26 ) Intl Mins  is > than 75 percentile
(ID: 26 ) Intl Calls  is > than 75 percentile
(ID: 26 ) Intl Charge  is > than 75 percentile
(ID: 27 ) Day Mins  is > than 75 percentile
(ID: 27 ) Day Calls  is < than 25 percentile
(ID: 27 ) Day Charge  is > than 75 percentile
(ID: 27 ) Eve Calls  is < than 25 percentile
(ID: 27 ) Night Calls  is > than 75 percentile
(ID: 27 ) Intl Mins  is > than 75 percentile
(ID: 27 ) Intl Calls  is > than 75 percentile
(ID: 27 ) Intl Charge  is > than 75 percentile
(ID: 28 ) Int'l Plan  is > than 75 percentile
(ID: 28 ) Day Calls  is > than 75 percentile
(ID: 28 ) Eve Calls  is > than 75 percentile
(ID: 28 ) Night Mins  is < than 25 percentile
(ID: 28 ) Night Calls  is < than 25 percentile
(ID: 28 ) Night Charge  is < than 25 percentile
(ID: 28 ) CustServ Calls  is > than 75 percentile
(ID: 29 ) Day Mins  is > than 75 percentile
(ID: 29 ) Day Calls  is > than 75 percentile
(ID: 29 ) Day Charge  is > than 75 percentile
(ID: 29 ) Eve Calls  is > than 75 percentile
(ID: 29 ) Night Calls  is > than 75 percentile
(ID: 29 ) Intl Calls  is < than 25 percentile
(ID: 30 ) Day Mins  is > than 75 percentile
(ID: 30 ) Day Charge  is > than 75 percentile
(ID: 30 ) Night Calls  is < than 25 percentile
(ID: 30 ) CustServ Calls  is < than 25 percentile
(ID: 31 ) Int'l Plan  is > than 75 percentile
(ID: 31 ) Day Mins  is > than 75 percentile
(ID: 31 ) Day Charge  is > than 75 percentile
(ID: 31 ) Eve Mins  is > than 75 percentile
(ID: 31 ) Eve Charge  is > than 75 percentile
(ID: 31 ) Intl Calls  is < than 25 percentile
(ID: 32 ) Int'l Plan  is > than 75 percentile
(ID: 32 ) Day Mins  is > than 75 percentile
(ID: 32 ) Day Calls  is > than 75 percentile
(ID: 32 ) Day Charge  is > than 75 percentile
(ID: 32 ) Eve Mins  is > than 75 percentile
(ID: 32 ) Eve Charge  is > than 75 percentile
(ID: 32 ) Night Mins  is < than 25 percentile
(ID: 32 ) Night Calls  is < than 25 percentile
(ID: 32 ) Night Charge  is < than 25 percentile
(ID: 32 ) Intl Calls  is < than 25 percentile
(ID: 33 ) Day Mins  is > than 75 percentile
(ID: 33 ) Day Charge  is > than 75 percentile
(ID: 33 ) Eve Mins  is > than 75 percentile
(ID: 33 ) Eve Calls  is < than 25 percentile
(ID: 33 ) Eve Charge  is > than 75 percentile
(ID: 33 ) Night Mins  is > than 75 percentile
(ID: 33 ) Night Calls  is > than 75 percentile
(ID: 33 ) Night Charge  is > than 75 percentile
(ID: 33 ) Intl Mins  is > than 75 percentile
(ID: 33 ) Intl Calls  is > than 75 percentile
(ID: 33 ) Intl Charge  is > than 75 percentile
(ID: 34 ) Day Mins  is > than 75 percentile
(ID: 34 ) Day Calls  is > than 75 percentile
(ID: 34 ) Day Charge  is > than 75 percentile
(ID: 34 ) Eve Mins  is > than 75 percentile
(ID: 34 ) Eve Calls  is > than 75 percentile
(ID: 34 ) Eve Charge  is > than 75 percentile
(ID: 34 ) Night Mins  is > than 75 percentile
(ID: 34 ) Night Charge  is > than 75 percentile
(ID: 35 ) Int'l Plan  is > than 75 percentile
(ID: 35 ) VMail Message  is > than 75 percentile
(ID: 35 ) Eve Calls  is < than 25 percentile
(ID: 35 ) Night Calls  is < than 25 percentile
(ID: 35 ) Intl Calls  is < than 25 percentile
(ID: 35 ) CustServ Calls  is > than 75 percentile
(ID: 36 ) Day Mins  is < than 25 percentile
(ID: 36 ) Day Calls  is > than 75 percentile
(ID: 36 ) Day Charge  is < than 25 percentile
(ID: 36 ) Night Mins  is < than 25 percentile
(ID: 36 ) Night Calls  is > than 75 percentile
(ID: 36 ) Night Charge  is < than 25 percentile
(ID: 36 ) CustServ Calls  is > than 75 percentile
(ID: 37 ) Day Mins  is > than 75 percentile
(ID: 37 ) Day Charge  is > than 75 percentile
(ID: 37 ) Eve Mins  is > than 75 percentile
(ID: 37 ) Eve Charge  is > than 75 percentile
(ID: 37 ) Night Mins  is > than 75 percentile
(ID: 37 ) Night Calls  is > than 75 percentile
(ID: 37 ) Night Charge  is > than 75 percentile
(ID: 37 ) Intl Mins  is < than 25 percentile
(ID: 37 ) Intl Charge  is < than 25 percentile
(ID: 38 ) Int'l Plan  is > than 75 percentile
(ID: 38 ) VMail Message  is > than 75 percentile
(ID: 38 ) Day Calls  is < than 25 percentile
(ID: 38 ) Eve Calls  is > than 75 percentile
(ID: 38 ) Intl Mins  is > than 75 percentile
(ID: 38 ) Intl Charge  is > than 75 percentile
(ID: 38 ) CustServ Calls  is < than 25 percentile
(ID: 39 ) Day Mins  is < than 25 percentile
(ID: 39 ) Day Charge  is < than 25 percentile
(ID: 39 ) Night Mins  is > than 75 percentile
(ID: 39 ) Night Charge  is > than 75 percentile
(ID: 39 ) Intl Calls  is < than 25 percentile
(ID: 39 ) CustServ Calls  is > than 75 percentile
(ID: 40 ) Day Mins  is > than 75 percentile
(ID: 40 ) Day Charge  is > than 75 percentile
(ID: 40 ) Eve Mins  is > than 75 percentile
(ID: 40 ) Eve Charge  is > than 75 percentile
(ID: 40 ) Night Mins  is > than 75 percentile
(ID: 40 ) Night Charge  is > than 75 percentile
(ID: 41 ) Day Mins  is > than 75 percentile
(ID: 41 ) Day Charge  is > than 75 percentile
(ID: 41 ) Eve Mins  is > than 75 percentile
(ID: 41 ) Eve Calls  is > than 75 percentile
(ID: 41 ) Eve Charge  is > than 75 percentile
(ID: 41 ) Night Mins  is > than 75 percentile
(ID: 41 ) Night Calls  is > than 75 percentile
(ID: 41 ) Night Charge  is > than 75 percentile
(ID: 42 ) Day Mins  is > than 75 percentile
(ID: 42 ) Day Charge  is > than 75 percentile
(ID: 42 ) Eve Mins  is > than 75 percentile
(ID: 42 ) Eve Calls  is > than 75 percentile
(ID: 42 ) Eve Charge  is > than 75 percentile
(ID: 42 ) Night Calls  is > than 75 percentile
(ID: 42 ) Intl Mins  is > than 75 percentile
(ID: 42 ) Intl Charge  is > than 75 percentile
(ID: 42 ) CustServ Calls  is > than 75 percentile
(ID: 43 ) Int'l Plan  is > than 75 percentile
(ID: 43 ) Day Calls  is > than 75 percentile
(ID: 43 ) Eve Calls  is < than 25 percentile
(ID: 43 ) Intl Mins  is > than 75 percentile
(ID: 43 ) Intl Calls  is < than 25 percentile
(ID: 43 ) Intl Charge  is > than 75 percentile
(ID: 43 ) CustServ Calls  is < than 25 percentile
(ID: 44 ) Int'l Plan  is > than 75 percentile
(ID: 44 ) Day Calls  is < than 25 percentile
(ID: 44 ) Eve Calls  is < than 25 percentile
(ID: 44 ) Night Mins  is < than 25 percentile
(ID: 44 ) Night Charge  is < than 25 percentile
(ID: 44 ) Intl Mins  is > than 75 percentile
(ID: 44 ) Intl Charge  is > than 75 percentile
(ID: 44 ) CustServ Calls  is > than 75 percentile
(ID: 45 ) Day Mins  is > than 75 percentile
(ID: 45 ) Day Calls  is < than 25 percentile
(ID: 45 ) Day Charge  is > than 75 percentile
(ID: 45 ) Night Mins  is > than 75 percentile
(ID: 45 ) Night Calls  is > than 75 percentile
(ID: 45 ) Night Charge  is > than 75 percentile
(ID: 45 ) Intl Mins  is > than 75 percentile
(ID: 45 ) Intl Calls  is > than 75 percentile
(ID: 45 ) Intl Charge  is > than 75 percentile
(ID: 46 ) Day Mins  is > than 75 percentile
(ID: 46 ) Day Calls  is < than 25 percentile
(ID: 46 ) Day Charge  is > than 75 percentile
(ID: 46 ) Eve Mins  is > than 75 percentile
(ID: 46 ) Eve Calls  is < than 25 percentile
(ID: 46 ) Eve Charge  is > than 75 percentile
(ID: 46 ) Night Mins  is < than 25 percentile
(ID: 46 ) Night Calls  is < than 25 percentile
(ID: 46 ) Night Charge  is < than 25 percentile
(ID: 46 ) Intl Mins  is < than 25 percentile
(ID: 46 ) Intl Charge  is < than 25 percentile
(ID: 47 ) Int'l Plan  is > than 75 percentile
(ID: 47 ) Eve Mins  is > than 75 percentile
(ID: 47 ) Eve Charge  is > than 75 percentile
(ID: 47 ) Night Mins  is < than 25 percentile
(ID: 47 ) Night Charge  is < than 25 percentile
(ID: 47 ) Intl Calls  is < than 25 percentile
(ID: 48 ) Day Mins  is > than 75 percentile
(ID: 48 ) Day Charge  is > than 75 percentile
(ID: 48 ) Eve Calls  is < than 25 percentile
(ID: 48 ) Night Mins  is > than 75 percentile
(ID: 48 ) Night Charge  is > than 75 percentile
(ID: 48 ) CustServ Calls  is > than 75 percentile
(ID: 49 ) Day Mins  is > than 75 percentile
(ID: 49 ) Day Calls  is > than 75 percentile
(ID: 49 ) Day Charge  is > than 75 percentile
(ID: 49 ) Eve Mins  is > than 75 percentile
(ID: 49 ) Eve Calls  is > than 75 percentile
(ID: 49 ) Eve Charge  is > than 75 percentile
(ID: 49 ) Intl Mins  is > than 75 percentile
(ID: 49 ) Intl Calls  is < than 25 percentile
(ID: 49 ) Intl Charge  is > than 75 percentile
(ID: 50 ) Day Mins  is > than 75 percentile
(ID: 50 ) Day Calls  is > than 75 percentile
(ID: 50 ) Day Charge  is > than 75 percentile
(ID: 50 ) Eve Mins  is > than 75 percentile
(ID: 50 ) Eve Charge  is > than 75 percentile
(ID: 50 ) Night Calls  is > than 75 percentile
(ID: 51 ) Int'l Plan  is > than 75 percentile
(ID: 51 ) Day Calls  is < than 25 percentile
(ID: 51 ) Eve Calls  is > than 75 percentile
(ID: 51 ) Night Calls  is < than 25 percentile
(ID: 51 ) Intl Calls  is < than 25 percentile
(ID: 51 ) CustServ Calls  is > than 75 percentile
(ID: 52 ) Day Mins  is > than 75 percentile
(ID: 52 ) Day Charge  is > than 75 percentile
(ID: 52 ) Eve Mins  is > than 75 percentile
(ID: 52 ) Eve Charge  is > than 75 percentile
(ID: 53 ) Int'l Plan  is > than 75 percentile
(ID: 53 ) Day Mins  is < than 25 percentile
(ID: 53 ) Day Calls  is < than 25 percentile
(ID: 53 ) Day Charge  is < than 25 percentile
(ID: 53 ) Eve Mins  is < than 25 percentile
(ID: 53 ) Eve Charge  is < than 25 percentile
(ID: 53 ) Night Calls  is > than 75 percentile
(ID: 53 ) Intl Calls  is < than 25 percentile
(ID: 54 ) Day Mins  is > than 75 percentile
(ID: 54 ) Day Calls  is > than 75 percentile
(ID: 54 ) Day Charge  is > than 75 percentile
(ID: 54 ) Intl Mins  is > than 75 percentile
(ID: 54 ) Intl Charge  is > than 75 percentile
(ID: 55 ) Day Mins  is > than 75 percentile
(ID: 55 ) Day Charge  is > than 75 percentile
(ID: 55 ) Eve Calls  is < than 25 percentile
(ID: 55 ) Night Calls  is < than 25 percentile
(ID: 55 ) Intl Calls  is < than 25 percentile
(ID: 55 ) CustServ Calls  is < than 25 percentile
(ID: 56 ) Day Mins  is < than 25 percentile
(ID: 56 ) Day Charge  is < than 25 percentile
(ID: 56 ) Eve Calls  is < than 25 percentile
(ID: 56 ) Night Mins  is < than 25 percentile
(ID: 56 ) Night Calls  is < than 25 percentile
(ID: 56 ) Night Charge  is < than 25 percentile
(ID: 56 ) Intl Calls  is < than 25 percentile
(ID: 56 ) CustServ Calls  is > than 75 percentile
(ID: 57 ) Day Mins  is < than 25 percentile
(ID: 57 ) Day Charge  is < than 25 percentile
(ID: 57 ) Intl Mins  is < than 25 percentile
(ID: 57 ) Intl Charge  is < than 25 percentile
(ID: 57 ) CustServ Calls  is > than 75 percentile
(ID: 58 ) Day Mins  is > than 75 percentile
(ID: 58 ) Day Charge  is > than 75 percentile
(ID: 58 ) Eve Mins  is > than 75 percentile
(ID: 58 ) Eve Charge  is > than 75 percentile
(ID: 58 ) Night Mins  is > than 75 percentile
(ID: 58 ) Night Calls  is > than 75 percentile
(ID: 58 ) Night Charge  is > than 75 percentile
(ID: 58 ) Intl Mins  is < than 25 percentile
(ID: 58 ) Intl Charge  is < than 25 percentile
(ID: 59 ) Int'l Plan  is > than 75 percentile
(ID: 59 ) Eve Calls  is < than 25 percentile
(ID: 59 ) Night Mins  is < than 25 percentile
(ID: 59 ) Night Charge  is < than 25 percentile
(ID: 59 ) Intl Mins  is > than 75 percentile
(ID: 59 ) Intl Charge  is > than 75 percentile
(ID: 59 ) CustServ Calls  is < than 25 percentile
(ID: 60 ) Day Mins  is > than 75 percentile
(ID: 60 ) Day Calls  is > than 75 percentile
(ID: 60 ) Day Charge  is > than 75 percentile
(ID: 60 ) Eve Mins  is > than 75 percentile
(ID: 60 ) Eve Calls  is > than 75 percentile
(ID: 60 ) Eve Charge  is > than 75 percentile
(ID: 60 ) Intl Calls  is < than 25 percentile
(ID: 60 ) CustServ Calls  is > than 75 percentile
(ID: 61 ) Int'l Plan  is > than 75 percentile
(ID: 61 ) Eve Mins  is > than 75 percentile
(ID: 61 ) Eve Charge  is > than 75 percentile
(ID: 61 ) Night Mins  is > than 75 percentile
(ID: 61 ) Night Calls  is > than 75 percentile
(ID: 61 ) Night Charge  is > than 75 percentile
(ID: 61 ) Intl Mins  is > than 75 percentile
(ID: 61 ) Intl Calls  is < than 25 percentile
(ID: 61 ) Intl Charge  is > than 75 percentile
(ID: 62 ) Int'l Plan  is > than 75 percentile
(ID: 62 ) Day Mins  is > than 75 percentile
(ID: 62 ) Day Calls  is < than 25 percentile
(ID: 62 ) Day Charge  is > than 75 percentile
(ID: 62 ) Eve Mins  is > than 75 percentile
(ID: 62 ) Eve Charge  is > than 75 percentile
(ID: 62 ) Night Mins  is < than 25 percentile
(ID: 62 ) Night Charge  is < than 25 percentile
(ID: 62 ) Intl Calls  is < than 25 percentile
(ID: 62 ) CustServ Calls  is > than 75 percentile
(ID: 63 ) Int'l Plan  is > than 75 percentile
(ID: 63 ) Day Mins  is > than 75 percentile
(ID: 63 ) Day Calls  is > than 75 percentile
(ID: 63 ) Day Charge  is > than 75 percentile
(ID: 63 ) Night Calls  is < than 25 percentile
(ID: 63 ) Intl Calls  is < than 25 percentile
(ID: 64 ) Day Mins  is < than 25 percentile
(ID: 64 ) Day Charge  is < than 25 percentile
(ID: 64 ) Eve Mins  is < than 25 percentile
(ID: 64 ) Eve Calls  is < than 25 percentile
(ID: 64 ) Eve Charge  is < than 25 percentile
(ID: 64 ) Night Mins  is > than 75 percentile
(ID: 64 ) Night Calls  is < than 25 percentile
(ID: 64 ) Night Charge  is > than 75 percentile
(ID: 64 ) Intl Mins  is < than 25 percentile
(ID: 64 ) Intl Charge  is < than 25 percentile
(ID: 64 ) CustServ Calls  is > than 75 percentile
(ID: 65 ) Day Mins  is > than 75 percentile
(ID: 65 ) Day Charge  is > than 75 percentile
(ID: 65 ) Intl Mins  is > than 75 percentile
(ID: 65 ) Intl Charge  is > than 75 percentile
(ID: 66 ) Day Mins  is > than 75 percentile
(ID: 66 ) Day Calls  is < than 25 percentile
(ID: 66 ) Day Charge  is > than 75 percentile
(ID: 66 ) Night Mins  is > than 75 percentile
(ID: 66 ) Night Charge  is > than 75 percentile
(ID: 66 ) Intl Calls  is > than 75 percentile
(ID: 67 ) Day Mins  is > than 75 percentile
(ID: 67 ) Day Calls  is < than 25 percentile
(ID: 67 ) Day Charge  is > than 75 percentile
(ID: 67 ) Eve Calls  is < than 25 percentile
(ID: 67 ) Night Mins  is > than 75 percentile
(ID: 67 ) Night Charge  is > than 75 percentile
(ID: 67 ) CustServ Calls  is > than 75 percentile
(ID: 68 ) Day Calls  is > than 75 percentile
(ID: 68 ) Eve Calls  is > than 75 percentile
(ID: 68 ) Night Calls  is < than 25 percentile
(ID: 68 ) CustServ Calls  is > than 75 percentile
(ID: 69 ) Day Mins  is > than 75 percentile
(ID: 69 ) Day Calls  is < than 25 percentile
(ID: 69 ) Day Charge  is > than 75 percentile
(ID: 69 ) Eve Mins  is > than 75 percentile
(ID: 69 ) Eve Charge  is > than 75 percentile
(ID: 69 ) Night Mins  is > than 75 percentile
(ID: 69 ) Night Charge  is > than 75 percentile
(ID: 70 ) Int'l Plan  is > than 75 percentile
(ID: 70 ) Day Mins  is > than 75 percentile
(ID: 70 ) Day Charge  is > than 75 percentile
(ID: 70 ) Eve Calls  is < than 25 percentile
(ID: 70 ) Night Mins  is > than 75 percentile
(ID: 70 ) Night Charge  is > than 75 percentile
(ID: 70 ) Intl Mins  is > than 75 percentile
(ID: 70 ) Intl Charge  is > than 75 percentile
(ID: 71 ) Day Mins  is < than 25 percentile
(ID: 71 ) Day Calls  is < than 25 percentile
(ID: 71 ) Day Charge  is < than 25 percentile
(ID: 71 ) Eve Calls  is < than 25 percentile
(ID: 71 ) Night Calls  is < than 25 percentile
(ID: 71 ) Intl Mins  is < than 25 percentile
(ID: 71 ) Intl Charge  is < than 25 percentile
(ID: 71 ) CustServ Calls  is > than 75 percentile
(ID: 72 ) Int'l Plan  is > than 75 percentile
(ID: 72 ) Day Mins  is > than 75 percentile
(ID: 72 ) Day Calls  is > than 75 percentile
(ID: 72 ) Day Charge  is > than 75 percentile
(ID: 72 ) Eve Mins  is > than 75 percentile
(ID: 72 ) Eve Charge  is > than 75 percentile
(ID: 72 ) Night Mins  is > than 75 percentile
(ID: 72 ) Night Charge  is > than 75 percentile
(ID: 72 ) Intl Calls  is > than 75 percentile
(ID: 73 ) Day Mins  is < than 25 percentile
(ID: 73 ) Day Calls  is > than 75 percentile
(ID: 73 ) Day Charge  is < than 25 percentile
(ID: 73 ) Eve Calls  is < than 25 percentile
(ID: 73 ) Night Mins  is < than 25 percentile
(ID: 73 ) Night Charge  is < than 25 percentile
(ID: 73 ) Intl Mins  is > than 75 percentile
(ID: 73 ) Intl Calls  is < than 25 percentile
(ID: 73 ) Intl Charge  is > than 75 percentile
(ID: 73 ) CustServ Calls  is > than 75 percentile
(ID: 74 ) Int'l Plan  is > than 75 percentile
(ID: 74 ) Eve Mins  is < than 25 percentile
(ID: 74 ) Eve Charge  is < than 25 percentile
(ID: 74 ) Night Calls  is > than 75 percentile
(ID: 74 ) Intl Mins  is < than 25 percentile
(ID: 74 ) Intl Calls  is < than 25 percentile
(ID: 74 ) Intl Charge  is < than 25 percentile
(ID: 74 ) CustServ Calls  is < than 25 percentile
(ID: 75 ) Int'l Plan  is > than 75 percentile
(ID: 75 ) Day Mins  is > than 75 percentile
(ID: 75 ) Day Charge  is > than 75 percentile
(ID: 75 ) Eve Mins  is > than 75 percentile
(ID: 75 ) Eve Calls  is < than 25 percentile
(ID: 75 ) Eve Charge  is > than 75 percentile
(ID: 75 ) Intl Mins  is < than 25 percentile
(ID: 75 ) Intl Charge  is < than 25 percentile
(ID: 75 ) CustServ Calls  is < than 25 percentile
(ID: 76 ) Day Calls  is < than 25 percentile
(ID: 76 ) Eve Calls  is > than 75 percentile
(ID: 76 ) Intl Mins  is < than 25 percentile
(ID: 76 ) Intl Calls  is < than 25 percentile
(ID: 76 ) Intl Charge  is < than 25 percentile
(ID: 76 ) CustServ Calls  is > than 75 percentile
Out[498]:
State                   SD
Account Length          98
Area Code              415
Phone             392-2555
Int'l Plan               0
VMail Plan               0
VMail Message            0
Day Mins                 0
Day Calls                0
Day Charge               0
Eve Mins             159.6
Eve Calls              130
Eve Charge           13.57
Night Mins           167.1
Night Calls             88
Night Charge          7.52
Intl Mins              6.8
Intl Calls               1
Intl Charge           1.84
CustServ Calls           4
Churn?                   1
AK                       0
AL                       0
AR                       0
AZ                       0
CA                       0
CO                       0
CT                       0
DC                       0
DE                       0
                    ...   
MO                       0
MS                       0
MT                       0
NC                       0
ND                       0
NE                       0
NH                       0
NJ                       0
NM                       0
NV                       0
NY                       0
OH                       0
OK                       0
OR                       0
PA                       0
RI                       0
SC                       0
SD                       1
TN                       0
TX                       0
UT                       0
VA                       0
VT                       0
WA                       0
WI                       0
WV                       0
WY                       0
408                      0
415                      1
510                      0
Name: 1345, Length: 75, dtype: object
In [581]:
# get all known not to churn
not_churn = X_train[X_train['Churn?']==False].copy()

find_closet_df = []

# add row to find insights
find_closet_df.append(new_df[0])

for index, row in not_churn.iterrows():
    find_closet_df.append(row[list(churn_df)])
    
find_closet_df = pd.DataFrame(find_closet_df)
find_closet_df['ID'] = [idx for idx in range(1,len(find_closet_df)+1)]
find_closet_df.head()
Out[581]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge ... UT VA VT WA WI WV WY 408 415 510
1345 SD 98 415 392-2555 0 0 0 0.0 0 0.00 ... 0 0 0 0 0 0 0 0 1 0
2016 RI 80 510 332-8764 0 0 0 202.4 118 34.41 ... 0 0 0 0 0 0 0 0 0 1
1362 WV 63 510 329-7102 0 0 0 132.9 122 22.59 ... 0 0 0 0 0 1 0 0 0 1
2670 WY 116 510 392-2733 0 1 12 221.0 108 37.57 ... 0 0 0 0 0 0 1 0 0 1
1846 NH 120 510 395-2579 0 1 43 177.9 117 30.24 ... 0 0 0 0 0 0 0 0 0 1

5 rows × 75 columns
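
The comparison frame above is built by appending rows one at a time. As a minimal sketch of the same idea, assuming the high-risk customer is available as a one-row DataFrame, the frame can also be assembled with a single `pd.concat` call. The names `train`, `high_risk` and `compare_df` and the toy values are hypothetical stand-ins, not the notebook's variables.

```python
import pandas as pd

# Hypothetical stand-in for the training data (not the notebook's X_train).
train = pd.DataFrame(
    {"Day Mins": [202.4, 132.9, 50.0], "Churn?": [False, False, True]},
    index=[2016, 1362, 77],
)

# Hypothetical one-row frame holding the high-risk customer.
high_risk = pd.DataFrame({"Day Mins": [0.0], "Churn?": [True]}, index=[1345])

# Keep only the customers known not to churn.
not_churn = train[~train["Churn?"]]

# High-risk row first, then every non-churner, in one call.
compare_df = pd.concat([high_risk, not_churn])

# Sequential ID, mirroring the notebook's ID column.
compare_df["ID"] = range(1, len(compare_df) + 1)
print(compare_df)
```

Keeping the high-risk row in first position matters here, because later cells rely on it being row 0 of the frame.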

Find Closest Clusters to the Embedded Churn Risk

In [582]:
from sklearn.cluster import KMeans 
num_clusters = 20
kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(find_closet_df[features])
labels = kmeans.labels_
find_closet_df['clusters'] = labels
find_closet_df.head()
Out[582]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge ... VA VT WA WI WV WY 408 415 510 clusters
1345 SD 98 415 392-2555 0 0 0 0.0 0 0.00 ... 0 0 0 0 0 0 0 1 0 6
2016 RI 80 510 332-8764 0 0 0 202.4 118 34.41 ... 0 0 0 0 0 0 0 0 1 6
1362 WV 63 510 329-7102 0 0 0 132.9 122 22.59 ... 0 0 0 0 1 0 0 0 1 6
2670 WY 116 510 392-2733 0 1 12 221.0 108 37.57 ... 0 0 0 0 0 1 0 0 1 6
1846 NH 120 510 395-2579 0 1 43 177.9 117 30.24 ... 0 0 0 0 0 0 0 0 1 6

5 rows × 76 columns
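
The following cells filter on cluster 6 because that is the label the high-risk row received in this run. A minimal sketch, on synthetic data, of reading that label from the first row instead of typing it in by hand (the `toy` frame, its values and the two-cluster setting are assumptions made for illustration only):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in data: two well-separated groups of 20 customers each.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))]),
    columns=["Day Mins", "Eve Mins"],
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)
toy["clusters"] = kmeans.labels_

# The customer of interest sits in the first row, as in the notebook,
# so its label can be looked up rather than hardcoded.
target_cluster = toy["clusters"].iloc[0]
peers = toy[toy["clusters"] == target_cluster]
print(target_cluster, len(peers))
```

Looking the label up this way keeps the later filtering correct if the number of clusters or the random seed changes.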

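We compare the row with a high probability of churn against the non-churners

Cluster 6, which contains row 0 with the high probability of churn, holds 142 rows: the high-risk customer plus 141 non-churners that resemble it. The high-risk customer has zero day minutes, while the non-churners listed in the same cluster show regular daytime usage, thus we recommend offering day-time credits to this customer.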

In [584]:
find_closet_df[find_closet_df['clusters']==6][features]
Out[584]:
Account Length Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge Eve Mins Eve Calls Eve Charge ... UT VA VT WA WI WV WY 408 415 510
1345 98 0 0 0 0.0 0 0.00 159.6 130 13.57 ... 0 0 0 0 0 0 0 0 1 0
2016 80 0 0 0 202.4 118 34.41 260.2 67 22.12 ... 0 0 0 0 0 0 0 0 0 1
1362 63 0 0 0 132.9 122 22.59 67.0 62 5.70 ... 0 0 0 0 0 1 0 0 0 1
2670 116 0 1 12 221.0 108 37.57 151.0 118 12.84 ... 0 0 0 0 0 0 1 0 0 1
1846 120 0 1 43 177.9 117 30.24 175.1 70 14.88 ... 0 0 0 0 0 0 0 0 0 1
2071 132 0 0 0 181.1 121 30.79 314.4 109 26.72 ... 0 0 0 0 0 0 0 0 0 1
3018 105 0 0 0 156.5 102 26.61 140.2 134 11.92 ... 0 0 0 0 0 0 0 0 1 0
3269 117 1 0 0 198.4 121 33.73 249.5 104 21.21 ... 0 0 0 0 0 1 0 0 0 1
2584 64 0 0 0 216.9 78 36.87 211.0 115 17.94 ... 0 0 0 0 0 0 0 0 0 1
2561 143 0 1 33 141.4 130 24.04 186.4 114 15.84 ... 0 0 0 0 0 0 0 0 0 1
456 60 0 0 0 98.2 88 16.69 180.5 69 15.34 ... 0 0 0 0 0 0 0 0 1 0
462 144 0 1 18 106.4 109 18.09 108.1 113 9.19 ... 0 0 0 0 0 0 0 0 1 0
2474 80 0 1 22 196.4 115 33.39 150.3 109 12.78 ... 0 0 0 0 0 0 0 0 0 1
926 143 0 0 0 209.1 127 35.55 106.1 80 9.02 ... 0 0 0 0 0 0 0 0 1 0
1055 161 0 0 0 178.1 109 30.28 146.5 86 12.45 ... 0 0 0 0 0 0 0 0 1 0
2177 109 0 0 0 193.6 58 32.91 148.7 115 12.64 ... 0 0 0 0 0 0 0 0 1 0
381 97 0 0 0 151.6 107 25.77 155.4 96 13.21 ... 0 0 0 0 0 0 0 0 1 0
3108 98 0 1 30 110.3 71 18.75 182.4 108 15.50 ... 0 0 0 0 0 0 0 1 0 0
2328 122 0 0 0 168.3 96 28.61 87.6 91 7.45 ... 0 0 0 0 0 0 0 0 1 0
2031 130 0 0 0 139.1 72 23.65 246.0 112 20.91 ... 0 0 0 0 0 0 0 0 0 1
929 24 0 0 0 241.9 104 41.12 145.2 112 12.34 ... 0 0 0 0 0 0 0 0 1 0
1270 74 0 0 0 162.7 102 27.66 292.0 105 24.82 ... 0 0 0 0 0 0 0 0 1 0
1743 35 0 0 0 260.8 87 44.34 258.1 78 21.94 ... 0 0 0 0 0 0 0 0 1 0
1010 105 0 0 0 246.4 83 41.89 256.2 101 21.78 ... 0 0 0 0 0 0 0 0 0 1
2855 95 0 0 0 149.2 96 25.36 260.7 116 22.16 ... 0 0 0 0 0 0 0 0 1 0
1730 161 0 0 0 107.5 121 18.28 256.4 46 21.79 ... 0 0 0 0 0 0 0 1 0 0
662 63 0 0 0 211.2 80 35.90 237.7 93 20.20 ... 0 0 0 0 0 0 0 0 1 0
3021 57 0 0 0 85.9 92 14.60 193.9 127 16.48 ... 0 0 0 0 0 0 0 0 1 0
1726 50 0 0 0 131.7 108 22.39 216.5 103 18.40 ... 0 0 0 0 0 1 0 0 0 1
2807 52 0 0 0 217.0 104 36.89 152.3 83 12.95 ... 0 0 0 0 0 0 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018 153 0 1 22 167.7 104 28.51 246.8 91 20.98 ... 0 0 0 0 0 0 0 0 1 0
1814 72 0 0 0 198.4 147 33.73 216.9 121 18.44 ... 0 0 0 0 0 0 0 0 1 0
3131 107 0 0 0 189.7 76 32.25 156.1 65 13.27 ... 0 0 1 0 0 0 0 0 0 1
560 80 0 0 0 160.6 103 27.30 237.0 109 20.15 ... 0 0 1 0 0 0 0 0 1 0
707 84 0 1 42 165.3 97 28.10 223.5 118 19.00 ... 0 0 0 0 0 0 0 0 0 1
1444 79 0 0 0 222.3 99 37.79 146.2 82 12.43 ... 0 0 0 0 0 1 0 0 1 0
424 112 0 1 16 221.6 110 37.67 130.2 123 11.07 ... 0 0 0 0 0 0 0 0 1 0
1093 210 0 0 0 104.6 121 17.78 149.5 71 12.71 ... 0 1 0 0 0 0 0 1 0 0
647 88 0 0 0 192.0 91 32.64 127.6 127 10.85 ... 0 0 0 0 0 0 0 0 1 0
72 147 0 0 0 248.6 83 42.26 148.9 85 12.66 ... 0 0 0 0 0 0 0 0 0 1
1477 112 0 0 0 168.6 102 28.66 298.0 117 25.33 ... 0 0 0 0 0 0 0 0 1 0
1723 80 0 0 0 149.8 123 25.47 276.3 75 23.49 ... 0 0 0 0 0 0 0 0 0 1
1351 13 0 0 0 58.4 121 9.93 262.2 64 22.29 ... 0 0 0 0 0 0 0 0 1 0
1005 12 1 0 0 216.7 117 36.84 116.5 126 9.90 ... 0 0 0 0 0 0 0 0 0 1
620 163 0 0 0 191.3 89 32.52 193.9 87 16.48 ... 0 0 0 0 0 0 0 0 1 0
2364 54 0 1 33 161.8 73 27.51 273.0 58 23.21 ... 0 0 0 0 0 0 0 0 1 0
553 61 1 0 0 78.2 103 13.29 195.9 149 16.65 ... 1 0 0 0 0 0 0 0 0 1
1737 134 0 0 0 141.7 95 24.09 205.6 101 17.48 ... 0 0 0 0 0 0 0 0 1 0
998 59 0 0 0 179.4 80 30.50 232.5 99 19.76 ... 0 0 0 0 0 0 0 0 0 1
2013 92 0 0 0 196.5 82 33.41 190.0 89 16.15 ... 0 0 0 0 0 0 0 0 1 0
1446 111 0 1 28 128.8 104 21.90 157.3 52 13.37 ... 0 0 0 0 0 0 0 0 1 0
3049 147 0 0 0 130.6 83 22.20 208.1 144 17.69 ... 0 0 0 0 0 0 0 0 1 0
3315 149 0 1 18 148.5 106 25.25 114.5 106 9.73 ... 0 0 0 0 0 0 0 0 1 0
589 117 0 1 14 80.2 81 13.63 219.0 103 18.62 ... 0 0 0 0 1 0 0 1 0 0
2417 120 0 0 0 98.2 99 16.69 186.7 85 15.87 ... 0 0 0 0 0 0 0 1 0 0
2709 193 0 1 31 71.2 58 12.10 124.7 105 10.60 ... 0 0 0 0 0 0 0 0 1 0
1628 131 0 0 0 110.9 74 18.85 115.6 90 9.83 ... 0 0 0 0 0 0 1 0 0 1
49 97 0 1 24 133.2 135 22.64 217.2 58 18.46 ... 0 0 0 0 0 0 1 0 1 0
1072 164 0 1 25 219.1 88 37.25 151.5 99 12.88 ... 0 0 0 0 0 0 0 1 0 0
3074 113 0 0 0 72.5 88 12.33 204.0 112 17.34 ... 0 0 0 0 0 1 0 0 0 1

142 rows × 71 columns
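
One way to read a table like this without scanning it by eye is to summarize the peers and check where the high-risk row falls outside their range. This is only a sketch: the two columns and five rows are copied from the first rows of the output above, and the names `cluster_df`, `summary` and `outside_peer_range` are hypothetical.

```python
import pandas as pd

# First five rows of cluster 6 from the output above (two columns only);
# row 1345 is the high-risk customer, the rest are non-churners.
cluster_df = pd.DataFrame(
    {
        "Day Mins": [0.0, 202.4, 132.9, 221.0, 177.9],
        "Eve Mins": [159.6, 260.2, 67.0, 151.0, 175.1],
    },
    index=[1345, 2016, 1362, 2670, 1846],
)

target = cluster_df.iloc[0]   # high-risk customer
peers = cluster_df.iloc[1:]   # non-churning neighbors

# One row per feature: the target's value next to the peers' summary.
summary = pd.DataFrame(
    {
        "target": target,
        "peer_median": peers.median(),
        "peer_min": peers.min(),
        "peer_max": peers.max(),
    }
)

# Flag features where the target falls outside anything seen among peers.
summary["outside_peer_range"] = (
    (summary["target"] < summary["peer_min"])
    | (summary["target"] > summary["peer_max"])
)
print(summary)
```

On these five rows only Day Mins is flagged, which is the kind of gap that motivates a day-time offer; the full cluster would need the same check before acting on it.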

In [588]:
find_closet_df.head()
Out[588]:
State Account Length Area Code Phone Int'l Plan VMail Plan VMail Message Day Mins Day Calls Day Charge ... VA VT WA WI WV WY 408 415 510 clusters
1345 SD 98 415 392-2555 0 0 0 0.0 0 0.00 ... 0 0 0 0 0 0 0 1 0 6
2016 RI 80 510 332-8764 0 0 0 202.4 118 34.41 ... 0 0 0 0 0 0 0 0 1 6
1362 WV 63 510 329-7102 0 0 0 132.9 122 22.59 ... 0 0 0 0 1 0 0 0 1 6
2670 WY 116 510 392-2733 0 1 12 221.0 108 37.57 ... 0 0 0 0 0 1 0 0 1 6
1846 NH 120 510 395-2579 0 1 43 177.9 117 30.24 ... 0 0 0 0 0 0 0 0 1 6

5 rows × 76 columns

In [ ]:
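def risk_compare(cluster_df, cluster_number, var1, var2):
    # work on the frame passed in rather than on the global find_closet_df
    mydat = cluster_df[cluster_df['clusters'] == cluster_number]
    mydat = mydat[[var1, var2, 'clusters']].copy()
    # differentiate the high-risk churn customer (assumed to be the first row)
    # by giving it its own hue label
    mydat.iat[0, 2] = 0

    # pass x and y by keyword, as recent seaborn releases require, and set
    # the diamond marker through lmplot's `markers` argument
    sns.lmplot(x=var1, y=var2, data=mydat,
               fit_reg=False, hue="clusters",
               markers="D", scatter_kws={"s": 100})

    plt.xlabel(var1)
    plt.ylabel(var2)
    plt.show()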
    
In [593]:
risk_compare(find_closet_df.copy(), 6, 'Night Mins', 'Night Calls')
 
In [591]:
    
risk_compare(find_closet_df.copy(), 6, 'Day Mins', 'Eve Mins')

Show Notes

(pardon typos and formatting -
these are the notes I use to make the videos)

Let's talk modeling for actionable insights! Building a predictive model is only the first step, as your end user or customer won't know what to do with an AUC or RMSE score. But if you can tell them WHO is at risk, WHY, and WHAT they can do about it, that's actionable and can even be translated into dollar amounts! And we're going to do it with XGBoost on a C5.0 dataset entitled Customer Churn.

MORE:
Blog or code: http://www.viralml.com/video-content.html?fm=yt&v=XfPND5wA7Vw
Signup for my newsletter and more: http://www.viralml.com
Connect on Twitter: https://twitter.com/amunategui

My books on Amazon:
The Little Book of Fundamental Indicators: Hands-On Market Analysis with Python: Find Your Market Bearings with Python, Jupyter Notebooks, and Freely Available Data: https://amzn.to/2DERG3d
Monetizing Machine Learning: Quickly Turn Python ML Ideas into Web Applications on the Serverless Cloud: https://amzn.to/2PV3GCV
Grow Your Web Brand, Visibility & Traffic Organically: 5 Years of amunategui.github.Io and the Lessons I Learned from Growing My Online Community from the Ground Up
Fringe Tactics - Finding Motivation in Unusual Places: Alternative Ways of Coaxing Motivation Using Raw Inspiration, Fear, and In-Your-Face Logic: https://amzn.to/2DYWQas
Create Income Streams with Online Classes: Design Classes That Generate Long-Term Revenue: https://amzn.to/2VToEHK
Defense Against The Dark Digital Attacks: How to Protect Your Identity and Workflow in 2019: https://amzn.to/2Jw1AYS

Transcript

Hello friends, let's talk modeling for actionable insight! What do I mean by that? Well, building a predictive model is only the first step, as your end user or customer won't know what to do with an AUC or RMSE score. But if you can tell them who is at risk, why, and what they can do about it, that's actionable and can even be translated into dollar amounts!

And we're going to do it with XGBoost on a dataset called Customer Churn. Welcome to ViralML. My name is Manuel Amunategui, and I am the author of Monetizing ML, which shows how to extend your machine learning models to the web so everybody can enjoy them, and even looks at ways to monetize them through paywalls. I also have a free class, all on YouTube: start with the first video, then work your way down. So sign up for my newsletter, connect, and subscribe.

So, back to actionable insight. If you can tell your customer how to prevent someone from dropping out of your service, and it costs them $1,000 to acquire that person, you can put a dollar amount on the model. That's a language that those in charge who write the checks understand - your employer and customer will love you. We're going to use a data set from....