Visualization Techniques: Histograms, Boxplots & Scatterplots Explained
Data visualization is a key part of Exploratory Data Analysis (EDA).
It helps you see patterns, identify outliers, check distributions, and understand relationships.
This guide explains three essential charts every data science student should know (histograms, boxplots, and scatterplots) in simple words, with examples and code.
Why Are Visualization Techniques Important?
Because they help you:
Understand data distribution
Detect outliers
Analyze trends
Discover relationships
Make better decisions
Explain findings clearly
Histograms
A Histogram shows how numerical values are distributed.
It divides data into “bins” (ranges) and counts how many values fall in each bin.
Think of it as a bar chart for numerical data.
When to Use a Histogram?
To check the shape of the distribution (normal, skewed, uniform)
To identify outliers
To see how often values occur
To compare data ranges
Example datasets:
Student marks, salaries, ages, sales numbers.
Example
Dataset:
[10, 12, 13, 20, 25, 25, 26, 30, 35, 40]
A histogram of this data answers questions such as:
- How many values fall in 10–20 (including 10, excluding 20)?
- How many fall in 20–30 (including 20, excluding 30)?
- How many are 30 or above?
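To make the counting step concrete, here is a minimal sketch in plain Python (no plotting library) that splits the example dataset into three equal-width bins between its minimum and maximum, which is how an integer bin count is usually turned into bin ranges. The helper names `equal_width_edges` and `bin_counts` are hypothetical, and the edge rule (each bin includes its left edge; the last bin also includes the maximum) is an assumption that follows NumPy's `histogram` convention.

```python
def equal_width_edges(values, k):
    """Return k + 1 evenly spaced bin edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def bin_counts(values, edges):
    """Count values per bin; bins are [left, right) except the last, which is [left, right]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            left, right = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    return counts


data = [10, 12, 13, 20, 25, 25, 26, 30, 35, 40]
edges = equal_width_edges(data, 3)   # 10, 20, 30, 40
counts = bin_counts(data, edges)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:g}-{right:g}: {n}")
```

For this dataset the counts come out as 3, 4 and 3: these are the bar heights a three-bin histogram of the data would draw.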
Python Example (Histogram)
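import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be a pandas DataFrame with a numeric "Age" column
sns.histplot(df["Age"], bins=10, kde=True)  # 10 equal-width bins; kde=True overlays a smoothed density curve
plt.title("Age Distribution")
plt.show()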
Boxplots (Box-and-Whisker Plots)
A Boxplot shows:
Median
Quartiles (Q1, Q3)
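Whiskers (by default they extend to the most extreme values within 1.5 × IQR of the box, where IQR = Q3 - Q1, so they show the minimum and maximum only when there are no outliers)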
Outliers
It gives a quick summary of distribution and variation.
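The numbers behind a boxplot can be computed directly. The sketch below uses Python's standard `statistics.quantiles` with `method="inclusive"` (linear interpolation between data points, which should match the default quantile method in NumPy and pandas) together with the common 1.5 × IQR rule, which is also the default whisker rule in Matplotlib and Seaborn boxplots. The salary figures are hypothetical, and the helper name `box_summary` is made up for illustration.

```python
import statistics


def box_summary(values, whis=1.5):
    """Quartiles, whisker ends, and outliers as a typical boxplot draws them."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme data points still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual outlier point
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical salaries in thousands
salaries = [30, 32, 35, 36, 38, 40, 41, 45, 48, 120]
print(box_summary(salaries))
```

Here Q1 = 35.25, median = 39, Q3 = 44 and IQR = 8.75, so the upper fence is 44 + 1.5 × 8.75 = 57.125: the upper whisker stops at 48 and 120 is shown as an outlier.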
When to Use a Boxplot?
To detect outliers
To compare multiple groups
To understand the spread of the data
To check skewness
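A boxplot hints at skew through where the median sits inside the box. One way to put a number on that is the quartile (Bowley) skewness, (Q3 + Q1 - 2 × median) / (Q3 - Q1): it is 0 when the median is centered in the box, positive when the upper part of the box is longer (right skew), and negative for left skew. This measure is a supplement, not something the charts compute for you; the function name and the data below are hypothetical.

```python
import statistics


def quartile_skewness(values):
    """Bowley skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1), always between -1 and 1."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    if q3 == q1:
        return 0.0  # the middle half has no spread, so treat skew as zero
    return (q3 + q1 - 2 * median) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [30, 32, 35, 36, 38, 40, 41, 45, 48, 120]  # hypothetical salaries
left_skewed = [1, 2, 9, 10, 11]
for name, values in [("symmetric", symmetric), ("right", right_skewed), ("left", left_skewed)]:
    print(name, round(quartile_skewness(values), 3))
```

Because it only uses quartiles, this number does not depend on how extreme the tail values are (changing 120 to 1000 would not move it), so it complements the outlier check rather than replacing it.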
Example use cases:
Salary comparison across departments, marks comparison across classes.
Python Example (Boxplot)
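import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be a pandas DataFrame with a numeric "Salary" column
sns.boxplot(x=df["Salary"])  # horizontal boxplot; points beyond the whiskers are drawn as outliers
plt.title("Salary Distribution")
plt.show()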
Scatterplots
A Scatterplot shows the relationship between two numerical variables.
Each point represents one data row.
When to Use a Scatterplot?
To check correlation (positive, negative, none)
To detect clusters
To find trends or patterns
To spot outliers
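A scatterplot shows correlation visually; Pearson's correlation coefficient r summarizes the straight-line part of it as one number between -1 (perfect negative) and +1 (perfect positive), with values near 0 meaning no linear relationship. The sketch below computes r from its definition in plain Python; the study-hours and marks values are hypothetical. (Python 3.10+ also provides `statistics.correlation`, and pandas offers `Series.corr`.)

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations.

    The 1/n factors in covariance and standard deviation cancel, so raw sums are used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / spread


study_hours = [1, 2, 3, 4, 5]    # hypothetical
marks = [50, 55, 65, 70, 80]     # hypothetical
print(round(pearson_r(study_hours, marks), 3))
```

For this data r is about 0.99, a strong positive relationship. Keep in mind that r only captures straight-line trends, so a curved pattern in the scatterplot can give a misleadingly small r.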
Example:
Height vs Weight, Age vs Salary, Advertising Spend vs Sales.
Python Example (Scatterplot)
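import matplotlib.pyplot as plt
# df is assumed to be a pandas DataFrame with numeric "Age" and "Salary" columns
plt.scatter(df["Age"], df["Salary"])  # one point per row
plt.xlabel("Age")
plt.ylabel("Salary")
plt.title("Age vs Salary")
plt.show()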
Or using Seaborn:
sns.scatterplot(x="Age", y="Salary", data=df)
Real-World Example
Example Dataset
50 employees → Age, Salary, Department
Use Histograms
Check distribution of Age and Salary.
Use Boxplots
Find outliers in Salary.
Use Scatterplots
See whether Salary tends to rise or fall with Age (a scatterplot shows association, not cause and effect).
This helps HR make decisions on hiring, promotions, and salary structure.
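The Department column is what makes group comparisons possible: a grouped boxplot draws one box per department. As a minimal sketch of the numbers behind those boxes, the code below groups a few hypothetical employee rows by department and reports each group's size and median salary. The rows and the `median_salary_by_department` name are made up for illustration and stand in for the 50-employee table.

```python
import statistics
from collections import defaultdict

# Hypothetical (age, salary, department) rows
employees = [
    (25, 40000, "Sales"),
    (32, 52000, "Sales"),
    (41, 61000, "Sales"),
    (28, 58000, "Engineering"),
    (35, 75000, "Engineering"),
    (45, 90000, "Engineering"),
    (30, 45000, "HR"),
    (50, 60000, "HR"),
]


def median_salary_by_department(rows):
    """Group salaries by department and return {department: (count, median salary)}."""
    groups = defaultdict(list)
    for _age, salary, department in rows:
        groups[department].append(salary)
    return {dept: (len(s), statistics.median(s)) for dept, s in groups.items()}


for dept, (n, med) in sorted(median_salary_by_department(employees).items()):
    print(f"{dept}: {n} employees, median salary {med}")
```

In pandas the same grouping is usually written as `df.groupby("Department")["Salary"].median()`, and `sns.boxplot(x="Department", y="Salary", data=df)` draws the matching grouped boxplot.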
Full Python Code Example (All Three Visualizations)
import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be a pandas DataFrame with numeric "Marks" and "Study_Hours" columns
# Histogram
sns.histplot(df["Marks"], kde=True)
plt.title("Marks Distribution")
plt.show()
# Boxplot
sns.boxplot(x=df["Marks"])
plt.title("Marks Boxplot")
plt.show()
# Scatterplot
sns.scatterplot(x="Study_Hours", y="Marks", data=df)
plt.title("Study Hours vs Marks")
plt.show()