Introduction
When I am using group by(),pivot_table() to analyze the data and do data manipulation methods like a heat map, pandas introduce me to a brandnew function called crosstab(). The crosstab function is used to calculate the crosstabulation, which helps me to understand the relationship between variables easily. In this article, we discuss how to use crosstab() and why you should use it instead of other similar functions. They are widely used in polling, business intelligence, engineering, and scientific research. It provides a basic picture of the relationship between two variables and can aid in the discovery of interrelationship between them.
In Data Analysis, Pandas has many functions to analyze the data, group the data, and Summarize it. A crosstab (also called a contingency table or crosstabulation) is a table that displays the frequency distribution of one variable in rows and another in columns.
OverView
 Pandas Crosstab Parameters
 Pandas Crosstab Basics
 Pandas Crosstab with aggregation functions
 Pandas Crosstab Vs Pivot table Vs Groupby functions
 Pandas Crosstab using plot
 Conclusion
Pandas Crosstab Parameters
Pandas Crosstab syntax is,
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
The table represents the parameters in the crosstab function:
Name  Description  Type 
Index  Values to group by in rows  Array/List/Series 
Columns  Values to group by in columns  Array/Series 
Values  The array of Values to aggregate based on factors.  Array 
rownames  If passed, the number of row arrays must match.  Sequential 
colnames  If passed, the number of columns arrays must match.  Sequential 
aggfunc  If specified, values must also be specified.  Function 
margins  Include row/column margins (subtotals).  Bool 
margins_name  When margins are True, the name of the row/column that will contain the totals is used.  Str 
dropna  Columns with all NaN values should not be included.  Boolean 
normalize  Normalize by dividing all values by their sum.  bool, {‘all’, ‘index’, ‘columns’}, or {0,1} 
Pandas CrossTab Basics
Learning the Pandas Crosstab method by implementing it on Credit Card Customer Dataset
import pandas as pd
df = pd.read_csv(r"./BUGGY_PROGRAMMER/DATASET/BankChurners.csv")
df.head()
crosstab() always returns a data frame, as shown below. The data frame is a crosstabulation of two variables: Education_level and Gender. Cross tabulation simply means displaying one variable’s groups as indexes and the other variable’s groups as columns.
Implementation of Crosstab function using Python
crosstab_table = pd.crosstab(df['Education_Level'],df['Gender'])
print(crostab_table)
Here, We are implementing crosstab function to get relationship pattern between Income_Category and Gender Variables
"""Implementing Crosstab function using Python"""
crosstab_table2 =pd.crosstab(df['Income_Category'],df['Gender'])
print(crosstab_table2)
table3 = pd.crosstab(df['Gender'],df['Income_Category'])
print(table3)
Pandas Crosstab function Sum
Here sum function is used to add values of crosstab
sum_of_crosstab = pd.crosstab(df['Gender'],df['Income_Category']).sum()
print(sum_of_crosstab)
Pandas Cross tab Normalize
 If ‘all’ or True is specified, the overall data will be normalized.
 If ‘index’ is given, it will normalize each row.
 If ‘columns’ are supplied, it will normalize each column.
 If margins are set to True, the margin values will be normalized as well.
Implementing Normalize using Python
"""Normalizing the overall data values"""
normalize_all = pd.crosstab(df['Gender'],df['Income_Category'],normalize='all')
print(normalize_all)
Here, We are normalizing the overall data values(rows and columns) of two variables Income_Category and Gender with percentages
"""Normalizing the row / index values"""
normalize_row = pd.crosstab(df['Gender'],df['Income_Category'],normalize='index')
print(normalize_row)
"""Normalizing the columns values"""
normalize_columns = pd.crosstab(df['Gender'],df['Income_Category'],normalize='columns')
print(normalize_columns)
normalize_true = pd.crosstab(df['Gender'],df['Income_Category'],normalize=True)
print(normalize_true)
custom_pd.crosstab(index=df['Card_Category'],
columns=df['Attrition_Flag'],
margins=True,
margins_name='Total Percentage',
normalize=True)
Pandas Crosstab Count
Pandas Crosstab function we can perform count operation as shown below, where the count is the total number of times a specific thing. To perform count we have the set margin to be True and margin_name is to be all, which default one, or else we can change the column name to be like Total_count.
crosstab_count = pd.crosstab(index =
df['Gender'],columns=df['Income_Category'],margins=True,margins_name="Total_count")
print(crosstab_Count)
Pandas Crosstab Multiple
Multiple_indexes = pd.crosstab(index =[df['Gender'],df['Card_Category']],columns =df['Income_Category'])
print(Multiple_indexes)
Pandas Crosstab Multiple Columns
multi_col = pd.crosstab(index =df['Gender'],columns =[df['Income_Category'],df['Card_Category']])
print(multi_col)
Pandas Crosstab with aggregation function
import numpy as np
pd.crosstab(
df.Card_Category,
df.Attrition_Flag,
values=df.Months_on_book,
aggfunc=np.mean
)
pd.crosstab(
df.Card_Category,
df.Attrition_Flag,
values=df.Months_on_book,
aggfunc=np.median,
margins_name ="Median",margins = True
)
Pandas Crosstab Vs Pivot table Vs Groupby functions
We compare the differences between the three functions of crosstab(),pivot_table(), and groupby() in syntax and return of their results.
Group by Function in Pandas
df.groupby(['Card_Category', 'Attrition_Flag'])['Months_on_book']\
.agg([np.mean]).reset_index()
Pivot_table Function in Pandas
df.pivot_table(values='Months_on_book',
index='Card_Category',
columns='Attrition_Flag',
aggfunc=np.mean)
The pivot table expects your input data to already be a DataFrame, while you pass a data frame to the pivot table and specify the index, columns, and values bypassing the column names as strings. You don’t need to pass a dataFrame into crosstab because you just pass arraylike objects for index, columns, and values. Crosstab has the normalized parameter to perform percentage by dividing all values by their sum where pivot table doesn’t have.
groupby()  pivot_table()  crosstab() 
Input data should be in Dataframe  Input data should be in Dataframe  You can pass arraylike objects 
Passing Columns name in String format  Passing Columns in string format  No need to pass in string format 
Pandas Crosstab using plot
How to prepare plots for crosstab function in pandas as shown below
Implementing the line, bar, pie charts, and heat map using python
pd.crosstab(df['Income_Category'],df['Gender']).plot(kind='line',figsize=(12,5))
pd.crosstab(df['Income_Category'],df['Gender']).plot(kind='bar',figsize=(12,5))
pd.crosstab(df['Income_Category'],df['Gender']).plot(kind='pie',subplots =True,figsize=(12,12))
import seaborn as sns
cross_tab = pd.crosstab(
df.Card_Category,
df.Attrition_Flag,
values=df.Months_on_book,
aggfunc=np.mean
)
sns.heatmap(cross_tab, cmap='rocket_r', annot=True, fmt='g')
Also, read more about Data Visualization – Bokeh Vs Plotly
Conclusion
In this article, We understand the crosstab function utilization by implementing with credit card customer dataset, and with help of parameters in the function, we can count the values, normalize them with percentages and we perform different types of aggregate functions like mean, median e.t.c., how to plot the different types of the plots for crosstab to have better analysis compared to pivot table and group by functions in pandas. Feel free to ask questions or comment your thoughts on the topic, if any need math and details further.
Thanks for reading the article.

Ranga Seshahttps://buggyprogrammer.com/author/ranga/

Ranga Seshahttps://buggyprogrammer.com/author/ranga/

Ranga Seshahttps://buggyprogrammer.com/author/ranga/