Seaborn for Data Analysis

By James Olayinka

Aug 21

Seaborn offers a high-level interface for creating visually appealing and intuitive plots. It provides a consistent API and integrates seamlessly with Pandas, making it an excellent choice for working with structured datasets. With Seaborn, you can easily generate complex visualizations, uncover patterns and relationships, and reveal insights that might be hidden in the data.

Data visualization is a powerful tool for understanding and communicating insights from data. Seaborn, a Python data visualization library built on top of Matplotlib, offers a wide range of statistical graphics and visualizations. One of the standout features of Seaborn is its ability to effortlessly produce aesthetically pleasing plots with minimal code. It offers a range of color palettes, plot styles, and themes that allow you to customize the appearance of your visualizations, making them more engaging and impactful. Seaborn also provides built-in support for statistical estimation and visualization, making it easier to analyze relationships and distributions within your data.

In this tutorial, I will dive into the world of Seaborn, exploring its various functionalities and demonstrating how you can leverage its capabilities to create impactful visualizations. From basic plots to advanced statistical graphics, I will guide you through the process of using Seaborn to enhance your data visualization workflow.

In this article, I will cover the following topics:

  • Installing Seaborn Library
  • Loading the Titanic Dataset
  • Basic Plotting with Seaborn
  • Categorical Data Visualization with Seaborn
  • Advanced Data Visualization with Seaborn
  • Styling and Customization in Seaborn
  • Statistical Estimation and Visualization with Seaborn
  • Working with Time Series Data in Seaborn
  • Interactive Visualizations with Seaborn
  • Seaborn Integration with Pandas and Matplotlib

Let’s get started…

Installing Seaborn Library

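Before you begin, make sure Seaborn is installed in the Python environment used by your IDE (integrated development environment) or Jupyter Notebook. You can install it with pip from the command line. Inside a Jupyter Notebook cell, run %pip install seaborn instead.

Note: pip is the standard package installer for Python. Its name is a recursive acronym for "pip installs packages".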

pip install seaborn

Loading the Titanic Dataset

I will use the popular Titanic dataset in this tutorial. It contains information about the passengers who were aboard the Titanic when it sank, including their age, fare, passenger class, port of embarkation, and whether they survived.

Once you have downloaded the dataset as titanic.csv, you can load it into a Pandas DataFrame like this:

import pandas as pd

# Load the Titanic dataset
df = pd.read_csv('titanic.csv')
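Before plotting, it is worth checking which columns have missing values. Seaborn generally drops rows that are missing a value in the columns being plotted, so a chart can quietly use fewer rows than the DataFrame holds. In the commonly used Kaggle version of the file, Age, Cabin, and Embarked contain missing values. The sketch below uses a few made-up rows in place of titanic.csv, and the helper name missing_summary is hypothetical; with the real data you would call missing_summary(df).

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up rows that mimic a few Titanic columns (not real passenger data)
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
    "Embarked": ["S", "C", None, "S"],
})

print(missing_summary(sample))
```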

Basic Plotting with Seaborn

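  • Line Plots: Line plots are useful for visualizing trends across a continuous variable. When several rows share the same x value (many passengers share an age), sns.lineplot plots the mean of y at each x and, by default, shades a 95% confidence interval around it.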
import seaborn as sns

# Line plot of age versus fare

sns.lineplot(data=df, x='Age', y='Fare')

[Chart: line plot of Fare against Age]

  • Bar Plots: Bar plots are ideal for comparing categories. sns.countplot draws one bar per category, with the bar height showing how many rows fall into that category.
# Bar plot of passenger class counts

sns.countplot(data=df, x='Pclass')

[Chart: count plot of passengers by Pclass]

  • Scatter Plots: Scatter plots display the relationship between two numerical variables. Here, hue='Survived' colors each point by the passenger's survival status.

# Scatter plot of age versus fare

sns.scatterplot(data=df, x='Age', y='Fare', hue='Survived')

[Chart: scatter plot of Fare against Age, colored by Survived]

  • Histograms: Histograms visualize the distribution of a numerical variable.

# Histogram of age

sns.histplot(data=df, x='Age', bins=10)

[Chart: histogram of Age with 10 bins]

  • Kernel Density Estimation (KDE) Plots: KDE plots provide a smooth estimate of the distribution of a numerical variable.

# KDE plot of fare

# fill=True shades the area under the curve (it replaces the deprecated shade=True)
sns.kdeplot(data=df, x='Fare', fill=True)

[Chart: filled KDE plot of Fare]
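To see what a KDE plot estimates, here is a minimal NumPy sketch of a Gaussian kernel density estimate: one normal curve is centered on each observation, and the curves are averaged. The function name gaussian_kde_by_hand and the use of Scott's rule for the bandwidth are assumptions for illustration. Seaborn's kdeplot also uses Gaussian kernels and lets you scale its bandwidth with bw_adjust, but its exact bandwidth calculation may differ from this sketch.

```python
import numpy as np


def gaussian_kde_by_hand(data, grid, bandwidth=None):
    """Average of normal curves, one centered on each observation."""
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data)]  # drop missing values, as a plot would
    if bandwidth is None:
        # Scott's rule of thumb for one dimension: sigma * n ** (-1/5)
        bandwidth = data.std(ddof=1) * len(data) ** (-1 / 5)
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


# Synthetic, right-skewed "fare-like" values (not the real Titanic column)
rng = np.random.default_rng(0)
fares = rng.lognormal(mean=3.0, sigma=0.8, size=500)

grid = np.linspace(fares.min() - 50, fares.max() + 50, 2000)
density = gaussian_kde_by_hand(fares, grid)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A larger bandwidth gives a smoother curve that can hide detail; a smaller one follows the data more closely but looks noisier.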

Categorical Data Visualization with Seaborn

Categorical data visualization techniques are useful for analyzing and comparing variables with discrete values. In this section, we will explore various categorical plots using Seaborn with the Titanic dataset. We will cover categorical plots, box plots, violin plots, swarm plots, and count plots to gain insights into the relationships and distributions within the dataset.

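  • Categorical Plots: sns.catplot is a figure-level function that provides one interface to Seaborn's categorical plots. The kind parameter selects the plot type, such as 'count', 'box', 'violin', 'swarm', or 'bar'.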

import seaborn as sns

# Categorical plot of passenger class

sns.catplot(data=df, x='Pclass', kind='count')

[Chart: count plot of passengers by Pclass]
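Because catplot is figure-level, it can also split the same plot into separate panels with its col (or row) parameter. A minimal sketch on a few made-up rows; with the real data you would pass df instead of sample (the Kaggle Titanic file includes a Sex column).

```python
import pandas as pd
import seaborn as sns

# Made-up rows using Titanic-style column names (illustration only)
sample = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male", "male", "female", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
})

# One count plot per value of Sex, with bars split by survival status
g = sns.catplot(data=sample, x="Pclass", hue="Survived", col="Sex", kind="count")
g.set_axis_labels("Passenger class", "Number of passengers")
```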

  • Box Plots: Box plots display the distribution of a continuous variable across different categories.
# Box plot of fare grouped by passenger class

sns.boxplot(data=df, x='Pclass', y='Fare')

[Chart: box plots of Fare for each Pclass]

  • Violin Plots: Violin plots combine a box plot and KDE plot to show the distribution of a variable across categories.

# Violin plot of age grouped by survival status

sns.violinplot(data=df, x='Survived', y='Age')

[Chart: violin plots of Age for each Survived value]

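  • Swarm Plots: Swarm plots display individual data points along a categorical axis, shifted so they do not overlap, to show their distribution. With many points, Seaborn may warn that some points cannot be placed; sns.stripplot is an alternative for larger datasets.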
# Swarm plot of age grouped by passenger class

sns.swarmplot(data=df, x='Pclass', y='Age', hue='Survived')

[Chart: swarm plot of Age by Pclass, colored by Survived]

  • Count Plots: Count plots visualize the frequency of each category in a categorical variable.


# Count plot of embarked locations

sns.countplot(data=df, x='Embarked', hue='Survived')

[Chart: count plot of Embarked, split by Survived]

Advanced Data Visualization with Seaborn

  • Pair Plots: Pair plots allow us to visualize pairwise relationships between variables in a dataset.

import seaborn as sns

# Pair plot of selected numerical variables

sns.pairplot(data=df, vars=['Age', 'Fare', 'SibSp', 'Parch'], hue='Survived')

[Chart: pair plot of Age, Fare, SibSp, and Parch, colored by Survived]

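  • Heatmaps: Heatmaps represent the values of a matrix as colors. Here the matrix holds the pairwise correlations between the numerical columns.

# Correlation heatmap of numerical variables
# numeric_only=True skips text columns such as Name and Sex;
# pandas 2.0 and later raise an error if they are included

corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')

[Chart: annotated correlation heatmap of the numerical columns]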
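A correlation matrix is symmetric, so half of the heatmap repeats the other half. A common refinement, sketched below on synthetic numeric columns that merely borrow the Titanic column names, is to hide the upper triangle with the mask argument of sns.heatmap (cells where the mask is True are not drawn).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric columns standing in for Age, Fare, SibSp and Parch
rng = np.random.default_rng(1)
numeric = pd.DataFrame(rng.normal(size=(200, 4)), columns=["Age", "Fare", "SibSp", "Parch"])

corr = numeric.corr()

# True on and above the diagonal; those cells are hidden in the heatmap
mask = np.triu(np.ones_like(corr, dtype=bool))

ax = sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
```

Fixing vmin=-1 and vmax=1 keeps the color scale comparable across different correlation heatmaps.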

  • Joint Plots: Joint plots display the relationship between two numerical variables with scatter plots and histograms.

# Joint plot of age and fare

sns.jointplot(data=df, x='Age', y='Fare', kind='scatter')

[Chart: joint plot of Age and Fare with marginal histograms]

  • FacetGrids: FacetGrids allow us to create multiple plots based on unique values of a categorical variable.
# FacetGrid of age and fare based on passenger class

g = sns.FacetGrid(data=df, col='Pclass')
g.map(sns.scatterplot, 'Age', 'Fare')

[Chart: scatter plots of Fare against Age, one panel per Pclass]
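For scatter and line plots, Seaborn's figure-level relplot function builds the FacetGrid for you, so a faceted scatter plot can be written as a single call. A sketch on synthetic values that only borrow the Titanic column names:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (illustration only); each class appears 20 times
rng = np.random.default_rng(2)
sample = pd.DataFrame({
    "Age": rng.uniform(1, 80, 60),
    "Fare": rng.uniform(5, 200, 60),
    "Pclass": np.tile([1, 2, 3], 20),
})

# kind="scatter" is the default; col creates one panel per passenger class
g = sns.relplot(data=sample, x="Age", y="Fare", col="Pclass", kind="scatter")
```

Using FacetGrid directly, as above, is more flexible when you want to map a custom function onto each panel.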

Styling and Customization in Seaborn

You can use Seaborn's styling and customization options to improve the appearance of your plots. Let's look at how to apply color palettes, plot styles, and themes, add annotations and text, and customize axis labels and titles.

  • Color Palettes: Seaborn provides a range of color palettes to customize the colors of your plots. sns.set_palette sets the default palette for all subsequent plots, while the palette parameter of a plotting function overrides it for that plot only.
import seaborn as sns

# Set a custom color palette
sns.set_palette("Set2")

# Use a specific color palette in a plot
sns.scatterplot(data=df, x='Age', y='Fare', hue='Survived', palette='Dark2')

[Chart: scatter plot of Fare against Age, colored by Survived with the Dark2 palette]
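sns.color_palette returns a palette as a list of RGB tuples, and the palette parameter also accepts a dictionary that pins each hue level to a specific color. A sketch on synthetic data; the colors chosen for 0 and 1 are arbitrary.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# First three colors of the "Set2" palette, as (r, g, b) tuples in the 0-1 range
set2 = sns.color_palette("Set2", 3)
print(set2.as_hex())

# Synthetic data with a 0/1 Survived column (illustration only)
rng = np.random.default_rng(3)
sample = pd.DataFrame({
    "Age": rng.uniform(1, 80, 50),
    "Fare": rng.uniform(5, 200, 50),
    "Survived": np.tile([0, 1], 25),
})

# Map each hue level to an explicit color instead of a named palette
ax = sns.scatterplot(data=sample, x="Age", y="Fare", hue="Survived",
                     palette={0: "gray", 1: "seagreen"})
```

A dictionary keeps the same color attached to the same category across plots, even if some plots do not contain every category.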

  • Plot Styles: Seaborn allows you to change the overall look of your plots with its built-in styles: darkgrid, whitegrid, dark, white, and ticks. You can set a style with the set_style function.

# Set the plot style

sns.set_style("whitegrid")

# Plots created after this call use the new style

sns.scatterplot(data=df, x='Age', y='Fare')

[Chart: scatter plot of Fare against Age in the whitegrid style]
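Note that sns.set_style changes the style for every plot created afterwards. To apply a style to just one plot, sns.axes_style can be used as a context manager, as in this sketch:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

before = mpl.rcParams["axes.facecolor"]

# Only figures created inside the with-block use the "darkgrid" style
with sns.axes_style("darkgrid"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

# Outside the block, the previous settings are restored
after = mpl.rcParams["axes.facecolor"]
print(before == after)
```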

  • Themes: sns.set_theme sets several aesthetic parameters in one call, including the style, the context (which scales fonts and lines), and the color palette. Because it also resets the palette to the default unless you pass one, call it before any set_palette call you want to keep.

# Set the theme
sns.set_theme(style="darkgrid")

# Plots created after this call use the theme

sns.scatterplot(data=df, x='Age', y='Fare')

[Chart: scatter plot of Fare against Age in the darkgrid theme]

  • Annotations and Text: You can add text and annotations to your plots with Matplotlib's text and annotate functions. Here's an example:

import matplotlib.pyplot as plt
import seaborn as sns

# Adding a text annotation to a plot

sns.scatterplot(data=df, x='Age', y='Fare')
plt.text(30, 100, 'Annotation', fontsize=12, ha='center')

# Adding a labeled arrow that points at a specific location
# (plt.figure() starts a new figure so the two examples do not overlap)

plt.figure()
sns.scatterplot(data=df, x='Age', y='Fare')
plt.annotate('Label', xy=(30, 100), xytext=(40, 120),
             arrowprops=dict(facecolor='black', arrowstyle='->'))

[Chart: scatter plots of Fare against Age, one with a text annotation and one with an arrow label]

  • Axis Labels and Titles: You can customize the axis labels and plot title with Matplotlib's xlabel, ylabel, and title functions.

import matplotlib.pyplot as plt
import seaborn as sns

# Adding axis labels and a plot title

sns.scatterplot(data=df, x='Age', y='Fare')
plt.xlabel('Age')
plt.ylabel('Fare')
plt.title('Age vs Fare')

[Chart: scatter plot of Fare against Age with axis labels and the title "Age vs Fare"]
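Seaborn's axes-level functions, such as scatterplot, return the Matplotlib Axes they drew on, so the labels and title can also be set on that object directly. This is handy when a figure contains several subplots. A sketch on synthetic values:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (illustration only)
rng = np.random.default_rng(4)
sample = pd.DataFrame({"Age": rng.uniform(1, 80, 30), "Fare": rng.uniform(5, 200, 30)})

# scatterplot returns the Axes it drew on
ax = sns.scatterplot(data=sample, x="Age", y="Fare")

# Set both axis labels and the title in one call
ax.set(xlabel="Age (years)", ylabel="Fare", title="Age vs Fare")
```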

Statistical Estimation and Visualization with Seaborn

Seaborn offers a range of powerful functions for statistical estimation and visualization, allowing you to gain insights and understand the relationships within the Titanic dataset. Let's explore some of these functions:

  • Regression Plots: sns.regplot draws a scatter plot and fits a linear regression model to the data, showing the fitted line with a 95% confidence band by default.

import seaborn as sns

# Regression plot of age and fare

sns.regplot(data=df, x='Age', y='Fare')

[Chart: regression plot of Fare against Age with a confidence band]

  • Residual Plots: Residual plots help to assess the goodness of fit for regression models by examining the residuals (the differences between observed and predicted values).

# Residual plot of age and fare

sns.residplot(data=df, x='Age', y='Fare')

[Chart: residual plot for a linear fit of Fare on Age]
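To make the idea concrete: for a straight-line fit y ≈ a + b·x, each residual is the observed y minus the fitted value. The sketch below computes residuals with NumPy's least-squares polyfit on synthetic data. It mirrors what residplot shows for a linear fit, but it is not Seaborn's own code.

```python
import numpy as np

# Synthetic data: a linear trend plus random noise
rng = np.random.default_rng(5)
x = rng.uniform(0, 80, 100)
y = 10 + 0.5 * x + rng.normal(scale=5, size=100)

# Least-squares straight line; polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# Residual = observed value minus fitted value
residuals = y - fitted

# For a least-squares fit with an intercept, residuals average to about zero
print(residuals.mean())
```

If the residuals show a clear curve or a funnel shape when plotted against x, a straight line is probably not a good model for the data.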

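  • Distribution Fitting: Seaborn's distribution functions do not fit parametric distributions themselves, but you can draw a density-scaled histogram (with an optional KDE curve) and overlay a distribution fitted with SciPy to check, for example, how well a normal curve describes the data.

# Histogram of age with a fitted normal curve

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# stat='density' puts the histogram on the same scale as the normal PDF
sns.histplot(data=df, x='Age', stat='density', kde=True)

# Normal curve using the sample mean and standard deviation of Age
x = np.linspace(df['Age'].min(), df['Age'].max(), 200)
plt.plot(x, stats.norm.pdf(x, loc=df['Age'].mean(), scale=df['Age'].std()), color='red')

[Chart: density histogram of Age with a KDE curve and a fitted normal curve in red]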

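  • Confidence Intervals: Functions such as lineplot, barplot, and pointplot draw a 95% confidence interval around each estimate by default, computed by bootstrapping. The errorbar parameter controls this; for example, errorbar='sd' shows the standard deviation instead.

# Confidence interval plot of age and fare
# errorbar=('ci', 95) replaces the older ci=95 argument (seaborn 0.12+)

sns.lineplot(data=df, x='Age', y='Fare', errorbar=('ci', 95))

[Chart: line plot of mean Fare against Age with a 95% confidence band]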
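Bootstrapping works by repeatedly resampling the data with replacement, recomputing the estimate on each resample, and taking percentiles of the results. Below is a minimal NumPy sketch of a 95% percentile-bootstrap interval for a mean. The function name bootstrap_ci is hypothetical; 1,000 resamples matches Seaborn's default n_boot, but this is an illustration of the technique, not Seaborn's internal code.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample, drawn with replacement from the original data
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    boot_means = samples.mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high


# Synthetic fares (illustration only)
rng = np.random.default_rng(6)
fares = rng.exponential(scale=30, size=200)

low, high = bootstrap_ci(fares)
print(f"mean={fares.mean():.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Because the interval comes from random resampling, it can shift slightly between runs unless the random seed is fixed.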

  • Comparing Estimates Across Groups: Seaborn does not run significance tests, but plotting a point estimate for each group shows how large the differences between groups are. For formal tests, use a library such as SciPy.

import numpy as np

# Median fare by passenger class, split by survival status
# errorbar=None replaces the older ci=None argument (seaborn 0.12+)

sns.pointplot(data=df, x='Pclass', y='Fare', hue='Survived', errorbar=None, estimator=np.median)

Working with Time Series Data in Seaborn

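Seaborn provides useful tools for visualizing time series data, allowing you to analyze trends, seasonality, and other temporal patterns. The Titanic dataset has no date column, so the examples below assume a hypothetical DataFrame with Date, Year, Month, and Passenger_Count columns. Let's explore some of these functions: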

  • Time Series Line Plots: Seaborn allows you to create line plots to visualize the changes in a variable over time.
import seaborn as sns

# Time series line plot of passenger count over time

sns.lineplot(data=df, x='Date', y='Passenger_Count')

  • Seasonal Plots: To look for seasonal patterns, you can plot a variable by month and draw one line per year with the hue parameter.
# Seasonal plot of passenger count

sns.lineplot(data=df, x='Month', y='Passenger_Count', hue='Year')

  • Time Series Heatmaps: Heatmaps can reveal patterns across two time dimensions, such as year and month.
# Time series heatmap of passenger count
# Assumes one row per Year-Month pair; pivot raises an error on duplicates

sns.heatmap(data=df.pivot(index='Year', columns='Month', values='Passenger_Count'), cmap='YlGnBu')
  • Rolling Windows and Moving Averages: Pandas provides rolling windows and moving averages, which smooth out short-term fluctuations in time series data. You can then plot the smoothed series with Seaborn.
# Rolling average plot of passenger count

rolling_avg = df['Passenger_Count'].rolling(window=7).mean()

sns.lineplot(data=rolling_avg)

Note: The Titanic dataset has no date column, so the time series examples above will not run on it as-is. To try them, you can create a dummy date column or use a dataset that contains dates, and then apply the same operations.

Interactive Visualizations with Seaborn

Seaborn itself produces static Matplotlib figures, but you can combine it with other tools to make those figures interactive, allowing for deeper exploration of the Titanic dataset. Let's explore some of these options:

  • Interactive Plots with Widgets: Seaborn can be combined with interactive widgets from libraries like ipywidgets to create interactive plots with customizable parameters.
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display

# Redraw the scatter plot whenever a dropdown value changes
def scatterplot(x, y):
    sns.scatterplot(data=df, x=x, y=y)
    plt.show()

# Offer only numeric columns, which make sense on scatter plot axes
numeric_columns = list(df.select_dtypes('number').columns)
x_dropdown = widgets.Dropdown(options=numeric_columns, value='Age')
y_dropdown = widgets.Dropdown(options=numeric_columns, value='Fare')

# Display the interactive scatter plot
interactive_plot = widgets.interactive(scatterplot, x=x_dropdown, y=y_dropdown)
display(interactive_plot)
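
  • Plot Interactivity and Tooltips: Seaborn does not add tooltips itself, but because its plots are Matplotlib figures, the third-party mplcursors library can show information about a point when you hover over it. This requires an interactive backend such as %matplotlib widget; tooltips do not appear in static inline images.
import matplotlib.pyplot as plt
import mplcursors
import seaborn as sns

# Create a scatter plot

sns.scatterplot(data=df, x='Age', y='Fare', hue='Survived', palette='Dark2', alpha=0.7)
plt.title('Age vs Fare')

# Show a tooltip with the point's coordinates when hovering over it

mplcursors.cursor(hover=True)

[Chart: scatter plot of Fare against Age, colored by Survived, with hover tooltips]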

  • Seaborn with Jupyter Notebooks: In Jupyter, the %matplotlib widget magic (provided by the ipympl package, which must be installed) makes Matplotlib figures, including Seaborn plots, interactive, with zoom and pan controls.
import seaborn as sns
import matplotlib.pyplot as plt

# Enable interactive plots in Jupyter Notebook (requires ipympl)
%matplotlib widget

# Create a line plot

sns.lineplot(data=df, x='Age', y='Fare')
plt.xlabel('Age')
plt.ylabel('Fare')

  • Exporting and Sharing Visualizations: Seaborn plots are Matplotlib figures, so you can export them with plt.savefig in static formats such as PNG, SVG, or PDF. Matplotlib cannot save a figure as interactive HTML; to let others interact with a plot, share the notebook so they can run it themselves.
import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatter plot
sns.scatterplot(data=df, x='Age', y='Fare')

# Save the plot as a high-resolution PNG image
plt.savefig('scatter_plot.png', dpi=300, bbox_inches='tight')

Note: This saves the scatter plot as a PNG file in the current working directory.
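A sketch of exporting one figure in two formats. It writes to in-memory buffers so nothing lands on disk; in practice you would pass a file name such as 'age_vs_fare.png' instead. The dpi and bbox_inches='tight' settings are common choices, not requirements, and the data is synthetic.

```python
import io

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (illustration only)
rng = np.random.default_rng(8)
sample = pd.DataFrame({"Age": rng.uniform(1, 80, 40), "Fare": rng.uniform(5, 200, 40)})

ax = sns.scatterplot(data=sample, x="Age", y="Fare")
fig = ax.get_figure()

# Raster image: dpi sets the resolution; bbox_inches="tight" trims extra whitespace
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150, bbox_inches="tight")

# Vector image: scales without blurring, useful for reports and web pages
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
```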

Seaborn Integration with Pandas and Matplotlib

Seaborn seamlessly integrates with Pandas and Matplotlib, allowing you to combine the functionality of these libraries to enhance your data analysis and visualization with the Titanic dataset. Let's explore some ways to leverage this integration:

  • Combining Seaborn with Pandas: Seaborn works well with Pandas, enabling you to easily access and manipulate data from the Titanic dataset before visualizing it.

import seaborn as sns
import pandas as pd

# Load the Titanic dataset into a Pandas DataFrame
titanic_df = pd.read_csv('titanic.csv')

# Use Seaborn to create a scatter plot using Pandas DataFrame columns
sns.scatterplot(data=titanic_df, x='Age', y='Fare')

[Chart: scatter plot of Fare against Age]
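A common pattern is to summarize the data with Pandas first and then plot the summary. The sketch below computes the survival rate per passenger class with groupby and draws it with sns.barplot. The rows are made up for illustration; with the real data you would use titanic_df.

```python
import pandas as pd
import seaborn as sns

# Made-up rows with Titanic-style columns (illustration only)
sample = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Survived is 0/1, so its mean per class is the survival rate
rates = (
    sample.groupby("Pclass", as_index=False)["Survived"]
    .mean()
    .rename(columns={"Survived": "Survival_Rate"})
)

ax = sns.barplot(data=rates, x="Pclass", y="Survival_Rate")
ax.set(ylabel="Survival rate")
```

Calling sns.barplot(data=sample, x="Pclass", y="Survived") on the raw rows would compute the same means and add error bars; summarizing first is useful when you want to inspect or reuse the numbers.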

  • Overlaying Seaborn Plots with Matplotlib: Seaborn plots can be overlaid with additional Matplotlib elements to further customize and enhance the visualizations.
import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatter plot with Seaborn
sns.scatterplot(data=titanic_df, x='Age', y='Fare', hue='Survived')

# Overlay a Matplotlib line plot
plt.plot([0, 80], [50, 500], color='red', linewidth=2, linestyle='--')

# Add Matplotlib annotations
plt.text(25, 200, 'Threshold', fontsize=12, color='red')

[Chart: scatter plot of Fare against Age with a dashed red line labeled "Threshold"]

  • Seaborn Styling with Matplotlib Functions: You can use Matplotlib functions to customize Seaborn plots, for example to modify the axis labels, titles, or legends.

import seaborn as sns
import matplotlib.pyplot as plt

# Create a bar plot with Seaborn
sns.barplot(data=titanic_df, x='Embarked', y='Fare', hue='Survived')

# Customize the Matplotlib axes labels
plt.xlabel('Embarked')
plt.ylabel('Fare')

# Add a Matplotlib legend
plt.legend(title='Survived', loc='upper right')

[Chart: bar plot of mean Fare by Embarked, split by Survived]

Conclusion

Seaborn is a versatile library for data visualization, offering a wide range of plots and functions. In this guide, we explored various visualization techniques provided by Seaborn using the Titanic dataset. By leveraging Seaborn's capabilities, you can effectively explore, analyze, and communicate insights from your own datasets.

Keep in mind that this is just a brief introduction to Seaborn, and there are many more advanced features and functions available that can help you with more complex data analysis tasks. To learn more about Seaborn, check out the official documentation and other resources available online.

If you want to get started with data analytics and are looking to improve your skills, you can check out our Learning Track.


© 2025 Resagratia (a brand of Resa Data Solutions Ltd). All Rights Reserved.