Understanding Data Types & Structures in EDA: A Simple Guide for Beginners
Exploratory Data Analysis (EDA) is the first step in understanding a dataset, and it guides the cleaning, visualization, and modeling that follow.
One of the most important parts of EDA is understanding data types and data structures.
Why?
Because the type of data decides:
- What cleaning methods you use
- What visualizations you choose
- What statistical methods apply
- What machine learning algorithms work best
What Are Data Types?
Data types tell us what kind of value is stored in each column of a dataset.
Example dataset:
| Name | Age | Salary | Married | Joining_Date |
|---|---|---|---|---|
| Asha | 25 | 50000 | Yes | 2020-03-15 |
Each column has a different data type.
Types of Data
We divide data into two big categories, with Boolean and date/time values handled as special cases later on:
- Numerical (Numbers)
- Categorical (Labels/Text)
Let’s explain each.
Numerical Data (Numbers)
Numerical data is quantitative: it represents counts or measurements.
Integer Numbers (int)
Whole numbers
Examples: Age = 25, Stock = 100, Students = 45
Float Numbers (float)
Decimal numbers
Examples: Salary = 50000.50, Rating = 4.3
What EDA techniques can you use?
- Mean, median, mode
- Variance, standard deviation
- Histograms
- Box plots
- Correlation analysis
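As a rough illustration of the numeric summaries listed above, here is a minimal sketch. The values in the small DataFrame are made up for this example and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    "Age": [25, 30, 35, 40],
    "Salary": [50000.0, 60000.0, 70000.0, 80000.0],
})

mean_age = df["Age"].mean()            # arithmetic mean
median_age = df["Age"].median()        # middle value
std_age = df["Age"].std()              # sample standard deviation (ddof=1)
corr = df["Age"].corr(df["Salary"])    # Pearson correlation between two columns

print(mean_age, median_age, std_age, corr)
```

Histograms and box plots would normally follow with a plotting library; they are left out here to keep the sketch dependency-free beyond pandas.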
Categorical Data (Labels/Text)
Categorical data represents categories, not numbers.
Nominal (No order)
Examples:
Gender → Male, Female
City → Delhi, Mumbai
Ordinal (Has order)
Examples:
Education → High School < Graduate < Postgraduate
Rating → Low < Medium < High
What EDA techniques can you use?
- Bar charts
- Count plots
- Pie charts
- Frequency tables
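A frequency table and an ordered category can be sketched in pandas roughly as follows. The sample values are hypothetical, and the category order Low < Medium < High is taken from the Rating example above.

```python
import pandas as pd

# Nominal data: a frequency table is just a count per label
cities = pd.Series(["Delhi", "Mumbai", "Delhi", "Delhi"])  # made-up values
city_counts = cities.value_counts()
print(city_counts)

# Ordinal data: tell pandas the order so sorting and min/max make sense
rating_type = pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)
ratings = pd.Series(["Low", "High", "Medium", "Low"]).astype(rating_type)

sorted_ratings = ratings.sort_values().tolist()
print(sorted_ratings)
print(ratings.min(), ratings.max())
```

Without the explicit order, the ratings would sort alphabetically (High, Low, Medium), which is rarely what you want for ordinal data.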
Boolean Data (True/False)
Examples:
Married: Yes/No
Purchased: 1/0
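Boolean columns are treated as categorical in EDA. Note that pandas reads Yes/No as text (object) and 1/0 as integers unless you convert them.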
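If you want true Boolean columns, one possible conversion looks like the sketch below. The column names come from the examples above; the values are hypothetical, and the mapping assumes there are no missing or unexpected labels.

```python
import pandas as pd

# Hypothetical values for the two example columns
df = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Purchased": [1, 0, 1],
})

# Map text labels to True/False (assumes only "Yes" and "No" occur)
df["Married"] = df["Married"].map({"Yes": True, "No": False}).astype(bool)

# Interpret 1/0 as True/False
df["Purchased"] = df["Purchased"].astype(bool)

print(df.dtypes)
print(df["Married"].value_counts())
```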
Date/Time Data (datetime)
Examples:
2020-03-15
2023-11-01 15:30
You can extract:
- Year
- Month
- Day
- Weekday
- Hour
Useful for time-series analysis and trend detection.
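The parts listed above can be pulled out with the pandas `.dt` accessor. This is a minimal sketch: the first date is from the example above, but a time of 09:00 has been added to it (an assumption) so that both strings share one format.

```python
import pandas as pd

# Both strings use the same "YYYY-MM-DD HH:MM" layout; the 09:00 is made up
raw = pd.Series(["2020-03-15 09:00", "2023-11-01 15:30"])
dates = pd.to_datetime(raw)

parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "day": dates.dt.day,
    "weekday": dates.dt.day_name(),  # e.g. "Sunday"
    "hour": dates.dt.hour,
})
print(parts)
```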
Data Structures in EDA (Python + Pandas)
In Python-based data science, we mostly work with pandas data structures.
Series (1-Dimensional)
A column of data.
Example:
import pandas as pd
s = pd.Series([10, 20, 30])
DataFrame (2-Dimensional)
A table with rows and columns.
Example:
df = pd.DataFrame({
    "Name": ["Asha", "Rohan"],
    "Age": [25, 30],
})
How to Check Data Types in Pandas (Very Important for EDA)
df.dtypes
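Example output (for a DataFrame holding the example dataset above, assuming Salary is stored as a decimal number and Joining_Date has already been parsed as a date):
Name                    object
Age                      int64
Salary                 float64
Married                 object
Joining_Date    datetime64[ns]
dtype: object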
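Columns do not always arrive with the types you want: dates usually load as plain text, for instance. The sketch below rebuilds the single example row from the table above and converts two columns explicitly. Treating Salary as a float is an assumption made here to allow decimal salaries.

```python
import pandas as pd

# The example row from the table above
df = pd.DataFrame({
    "Name": ["Asha"],
    "Age": [25],
    "Salary": [50000],
    "Married": ["Yes"],
    "Joining_Date": ["2020-03-15"],
})

# Convert whole-number salaries to floats (assumed choice, allows decimals)
df["Salary"] = df["Salary"].astype("float64")

# Parse the date strings into real datetime values
df["Joining_Date"] = pd.to_datetime(df["Joining_Date"])

print(df.dtypes)
```

The exact dtype names printed can vary slightly between pandas versions, so check the output on your own installation.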
Why Understanding Data Types Is Important
Because data types affect every step of your workflow:
Cleaning
- Missing values in numeric → fill with mean/median
- Missing values in categorical → fill with mode
- Dates → parse into a consistent datetime format
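A minimal sketch of type-aware filling of missing values, using made-up data. Median and mode are just the simple defaults named above, not the only reasonable choices.

```python
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "Age": [25.0, None, 35.0],
    "City": ["Delhi", None, "Delhi"],
})

# Numeric column: fill with the median of the known values
df["Age"] = df["Age"].fillna(df["Age"].median())

# Categorical column: fill with the most frequent label (the mode)
df["City"] = df["City"].fillna(df["City"].mode()[0])

print(df)
```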
Visualization
- Numeric → histograms, scatter plots
- Categorical → bar charts, pie charts
- Dates → line charts over time
Feature Engineering
- Convert dates → Year, Month
- Encode categories → One-Hot Encoding
- Scale numerical features
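One-hot encoding and scaling can be sketched with pandas alone, as below. The data is hypothetical, and standardization (subtract the mean, divide by the standard deviation) is only one of several common scaling methods.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi"],
    "Age": [20, 30, 40],
})

# One-hot encode the categorical column: one new column per city
encoded = pd.get_dummies(df, columns=["City"])

# Standardize the numeric column to mean 0 and (sample) standard deviation 1
encoded["Age_scaled"] = (encoded["Age"] - encoded["Age"].mean()) / encoded["Age"].std()

print(encoded)
```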
Machine Learning
- Most algorithms require numeric inputs
- Categorical values need encoding
- Date values must be transformed into numeric features
Real-World Example
Dataset for predicting house prices:
| Size_sqft | Location | Bedrooms | Built_Year | Price |
|---|---|---|---|---|
| 1200 | Delhi | 2 | 2015 | 65 Lakh |
Identify Data Types:
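- Size_sqft → Numerical
- Location → Categorical (nominal)
- Bedrooms → Numerical (integer count)
- Built_Year → Numerical year, which can also be treated as a date
- Price → Numerical (target), once "65 Lakh" is converted to a number (1 lakh = 100,000)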
How EDA Uses This:
- Histogram of Size_sqft
- Bar chart of Location counts
- Scatter plot: Size_sqft vs Price
- Line chart: average Price by Built_Year
Understanding types helps you choose the right EDA techniques.
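In the table above, Price is written as "65 Lakh", which pandas would load as text rather than a number. Here is one possible way to convert it, assuming every price follows the same "<number> Lakh" pattern (1 lakh = 100,000). The new column name `Price_num` is introduced here only for illustration.

```python
import pandas as pd

# The example row from the house price table above
houses = pd.DataFrame({
    "Size_sqft": [1200],
    "Location": ["Delhi"],
    "Bedrooms": [2],
    "Built_Year": [2015],
    "Price": ["65 Lakh"],
})

# Strip the unit, convert to float, then scale lakh to plain units
houses["Price_num"] = (
    houses["Price"].str.replace(" Lakh", "", regex=False).astype(float) * 100_000
)

print(houses[["Price", "Price_num"]])
```

Real datasets often mix units (for example Lakh and Crore), so a production version would need to handle each unit explicitly.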
Python Example: Full EDA Data Type Check
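import pandas as pd

# "data.csv" is a placeholder path; replace it with your own file
df = pd.read_csv("data.csv")

# Check data types
print(df.dtypes)

# Summary of numerical columns (count, mean, std, min, quartiles, max)
print(df.describe())

# Summary of categorical (text) columns (count, unique, top, freq)
print(df.select_dtypes(include="object").describe())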