Visualization Techniques: Histograms, Boxplots & Scatterplots Explained

Data Visualization is a key part of Exploratory Data Analysis (EDA).
It helps you see patterns, identify outliers, check distributions, and understand relationships.

This guide explains Histograms, Boxplots, and Scatterplots, three essential charts for any Data Science student, in simple words, with examples and code.

Why Are Visualization Techniques Important?

Because they help you:

  • Understand data distribution

  • Detect outliers

  • Analyze trends

  • Discover relationships

  • Make better decisions

  • Explain findings clearly

Histograms

A Histogram shows how numerical values are distributed.
It divides data into “bins” (ranges) and counts how many values fall in each bin.

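Think of it as a bar chart for numerical data: each bar covers a range of values rather than a single category, so the bars sit side by side without gaps.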

When to Use a Histogram?

  • To check distribution (normal, skewed, uniform)

  • To identify outliers

  • To understand frequency of values

  • To see the range that the values cover

Example datasets:
Student marks, salaries, ages, sales numbers.

Example

Dataset:
[10, 12, 13, 20, 25, 25, 26, 30, 35, 40]

A histogram answers:

  • How many values fall between 10 and 20 (20 not included)? Here, 3.
  • How many fall between 20 and 30 (30 not included)? Here, 4.
  • How many are 30 or above? Here, 3.
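To make the counting concrete, here is a minimal sketch that bins the dataset above by hand using only the standard library. The bin edges (10, 20, 30, 40) and the helper name bin_counts are assumptions chosen for this illustration; plotting libraries pick their own edges unless you pass them explicitly.

```python
# Dataset from the example above.
values = [10, 12, 13, 20, 25, 25, 26, 30, 35, 40]

# Assumed bin edges: [10, 20), [20, 30), [30, 40].
edges = [10, 20, 30, 40]

def bin_counts(data, edges):
    """Count values per bin; the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

counts = bin_counts(values, edges)

# Print a rough text histogram: one '#' per value in the bin.
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi}: {'#' * c} ({c})")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. This is why 20 is counted in the second bin and 40 in the third.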

Python Example (Histogram)

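import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with a numeric "Age" column.
# kde=True overlays a smoothed density curve on the bars.
sns.histplot(df["Age"], bins=10, kde=True)
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()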

Boxplots (Box-and-Whisker Plots)

A Boxplot shows:

  • Median

  • Quartiles (Q1, Q3)

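  • Whiskers (by default, the smallest and largest values within 1.5 × IQR of the box, where IQR = Q3 − Q1; these are the minimum and maximum when there are no outliers)

  • Outliers (points beyond the whiskers)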

It gives a quick summary of distribution and variation.
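The numbers behind a boxplot can be computed by hand. The sketch below uses only the standard library and applies the common 1.5 × IQR rule to flag outliers. The salary values are made up for illustration, and the "inclusive" quantile method is an assumption chosen because it interpolates linearly between data points; other tools may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical salaries in thousands; made up for illustration only.
salaries = [30, 32, 35, 36, 38, 40, 41, 43, 45, 95]

# Quartiles with linear interpolation between data points.
q1, median, q3 = quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# Fences: points beyond them are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [s for s in salaries if lower_fence <= s <= upper_fence]
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

# Whiskers stop at the most extreme values that are still inside the fences.
whiskers = (min(inside), max(inside))

print("Q1, median, Q3:", q1, median, q3)
print("whiskers:", whiskers)
print("outliers:", outliers)
```

In this made-up data the value 95 lies beyond the upper fence, so it would appear as a separate point and the upper whisker would stop at 45 rather than at the true maximum.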

When to Use a Boxplot?

  • To detect outliers

  • To compare multiple groups

  • To understand spread of data

  • To check skewness

Example use cases:
Salary comparison across departments, marks comparison across classes.

Python Example (Boxplot)

# Assumes the imports shown earlier and a DataFrame df with a numeric "Salary" column.
sns.boxplot(x=df["Salary"])
plt.title("Salary Distribution")
plt.show()

Scatterplots

A Scatterplot shows the relationship between two numerical variables.

Each point represents one data row.

When to Use a Scatterplot?

  • To check correlation (positive, negative, none)

  • To detect clusters

  • To find trends or patterns

  • To spot outliers

Example:
Height vs Weight, Age vs Salary, Advertising Spend vs Sales.
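A scatterplot shows correlation visually; the Pearson correlation coefficient r puts a number on it (close to +1 for a strong positive linear relationship, close to -1 for a strong negative one, near 0 for no linear relationship). The sketch below computes r by hand with the standard library. The function name pearson_r and the age and salary values are assumptions made up for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(ss_x * ss_y)

# Hypothetical ages and salaries (in thousands); made up for illustration.
ages = [25, 30, 35, 40, 45, 50]
salaries = [40, 48, 52, 61, 66, 75]

r = pearson_r(ages, salaries)
print(f"r = {r:.2f}")
```

A value of r near +1 here matches what the plot would show: points rising from lower left to upper right. Note that r only measures linear association, so it is worth looking at the scatterplot as well.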

Python Example (Scatterplot)

# Assumes the imports shown earlier and a DataFrame df with numeric "Age" and "Salary" columns.
plt.scatter(df["Age"], df["Salary"])
plt.xlabel("Age")
plt.ylabel("Salary")
plt.title("Age vs Salary")
plt.show()

Or using Seaborn:

sns.scatterplot(x="Age", y="Salary", data=df)
plt.show()

Real-World Example

Example Dataset

50 employees → Age, Salary, Department

Use Histograms

Check distribution of Age and Salary.

Use Boxplots

Find outliers in Salary, overall or within each Department.

Use Scatterplots

See whether Salary tends to rise with Age. A scatterplot shows association, not proof that one causes the other.

Findings like these can inform HR decisions on hiring, promotions, and salary structure.
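As a rough sketch of the group comparison that side-by-side boxplots would show, the code below groups salaries by department and prints the minimum, median, and maximum for each. It uses only the standard library. The nine rows and the department names are hypothetical stand-ins for the 50-employee dataset, which is not given here.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows (age, salary in thousands, department); not real data.
employees = [
    (25, 40, "Sales"), (31, 46, "Sales"), (44, 58, "Sales"),
    (28, 55, "Engineering"), (35, 70, "Engineering"), (50, 92, "Engineering"),
    (30, 42, "Support"), (41, 47, "Support"), (38, 45, "Support"),
]

# Collect the salaries of each department into its own list.
salaries_by_dept = defaultdict(list)
for age, salary, dept in employees:
    salaries_by_dept[dept].append(salary)

# Per-department (minimum, median, maximum).
summary = {d: (min(s), median(s), max(s)) for d, s in salaries_by_dept.items()}
for dept, (lowest, middle, highest) in summary.items():
    print(f"{dept}: min={lowest}, median={middle}, max={highest}")
```

With a real DataFrame, the same comparison could be drawn as one boxplot per department by passing the department column as x and the salary column as y to sns.boxplot.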

Full Python Code Example (All Three Visualizations)

import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame with numeric "Marks" and "Study_Hours" columns.

# Histogram
sns.histplot(df["Marks"], kde=True)
plt.title("Marks Distribution")
plt.show()

# Boxplot
sns.boxplot(x=df["Marks"])
plt.title("Marks Boxplot")
plt.show()

# Scatterplot
sns.scatterplot(x="Study_Hours", y="Marks", data=df)
plt.title("Study Hours vs Marks")
plt.show()