Understanding Data Types & Structures in EDA: A Simple Guide for Beginners

Exploratory Data Analysis (EDA) is the first step in understanding a dataset, before you clean it or build models on it.
One of the most important parts of EDA is understanding data types and data structures.

Why?
Because the type of data decides:

  • What cleaning methods you use

  • What visualizations you choose

  • What statistical methods apply

  • What machine learning algorithms work best

What Are Data Types?

A data type tells us what kind of value is stored in each column of a dataset.

Example dataset:

Name   Age   Salary   Married   Joining_Date
Asha   25    50000    Yes       2020-03-15

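Each column holds a different kind of value: text (Name), a whole number (Age), an amount (Salary), a yes/no answer (Married), and a date (Joining_Date).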

Types of Data

We divide data into two big categories:

  • Numerical (Numbers)
  • Categorical (Labels/Text)

Let’s look at each, plus two special cases: Boolean and Date/Time data.

Numerical Data (Numbers)

Numerical data is quantitative — it represents counts or measurements.

Integer Numbers (int)

Whole numbers
Examples: Age = 25, Stock = 100, Students = 45

Float Numbers (float)

Decimal numbers
Examples: Salary = 50000.50, Rating = 4.3

What EDA techniques can you use?

  • Mean, median, mode

  • Variance, standard deviation

  • Histograms

  • Box plots

  • Correlation analysis
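By default pandas stores whole numbers as int64 and decimals as float64. Here is a minimal sketch with made-up values that shows this inference, one common surprise (a single missing value turns a whole-number column into float64), and the summary statistics listed above. Histograms and box plots need a plotting library such as matplotlib, so they are left out of this sketch.

```python
import pandas as pd

# Made-up values for illustration
age = pd.Series([23, 25, 27, 31, 45])                              # whole numbers
salary = pd.Series([42000.0, 50000.5, 50000.5, 61000.0, 98000.0])  # decimals
age_with_gap = pd.Series([23, None, 27])                           # one value missing

# pandas infers the type from the values
print(age.dtype, salary.dtype)  # int64 float64
print(age_with_gap.dtype)       # float64: NaN does not fit in NumPy int64

# Central tendency and spread
print("mean:  ", salary.mean())
print("median:", salary.median())
print("mode:  ", salary.mode().tolist())  # mode() can return several values
print("var:   ", salary.var())            # sample variance (ddof=1)
print("std:   ", salary.std())            # sample standard deviation

# Pearson correlation between two numeric columns, from -1 to 1
print("corr(age, salary):", age.corr(salary))
```

Here the one large salary (98000) pulls the mean (60200.2) well above the median (50000.5). A gap like that hints at skew, which a histogram or box plot would make visible.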

Categorical Data (Labels/Text)

Categorical data represents groups or labels rather than measured quantities.

Nominal (No order)

Examples:
Gender → Male, Female
City → Delhi, Mumbai

Ordinal (Has order)

Examples:
Education → High School < Graduate < Postgraduate
Rating → Low < Medium < High
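In pandas, both kinds can be stored with the category dtype. For ordinal data you pass the order explicitly with `ordered=True`, so that sorting, comparisons and min/max follow that order instead of alphabetical order. A minimal sketch using the example values above:

```python
import pandas as pd

# Nominal: no natural order, a plain category is enough
city = pd.Series(["Delhi", "Mumbai", "Delhi"], dtype="category")

# Ordinal: declare the order explicitly with ordered=True
education = pd.Series(
    pd.Categorical(
        ["Graduate", "High School", "Postgraduate", "Graduate"],
        categories=["High School", "Graduate", "Postgraduate"],
        ordered=True,
    )
)

print(city.cat.categories.tolist())         # ['Delhi', 'Mumbai']
print(education.sort_values().tolist())     # sorted by level, not alphabetically
print(education.min(), education.max())     # High School Postgraduate
print((education > "High School").tolist())  # [True, False, True, True]
```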

What EDA techniques can you use?

  • Bar charts

  • Count plots

  • Pie charts

  • Frequency tables
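All four techniques start from the same numbers: how often each category appears. A minimal sketch with made-up city values; `value_counts()` builds the frequency table, and those counts are what a bar chart, count plot or pie chart draws.

```python
import pandas as pd

# Made-up column for illustration
city = pd.Series(["Delhi", "Mumbai", "Delhi", "Pune", "Delhi", "Mumbai"], name="City")

counts = city.value_counts()                # frequency table, most common first
shares = city.value_counts(normalize=True)  # relative frequencies, sum to 1

print(counts)
print(shares.round(2))

# counts.plot.bar() or counts.plot.pie() would draw these (requires matplotlib)
```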

Boolean Data (True/False)

Examples:

  • Married: Yes/No

  • Purchased: 1/0

Usually treated as categorical in EDA: a column with exactly two categories.
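Yes/No text and 1/0 flags can both be converted to real booleans. A minimal sketch with made-up values; it assumes the text column contains exactly "Yes" and "No", because `map()` turns any other value (for example "yes") into a missing value.

```python
import pandas as pd

married = pd.Series(["Yes", "No", "Yes", "Yes"])
purchased = pd.Series([1, 0, 0, 1])

# Text flags -> booleans (values missing from the dict would become NaN)
married_bool = married.map({"Yes": True, "No": False})
# 0/1 flags -> booleans
purchased_bool = purchased.astype(bool)

print(married_bool.value_counts())  # counts per category, like any categorical column
print(married_bool.mean())          # share of True values: 0.75
```

The mean of a boolean column is the proportion of True values, which is often the single most useful summary of a yes/no column.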

Date/Time Data (datetime)

Examples:

  • 2020-03-15

  • 2023-11-01 15:30

You can extract:

  • Year

  • Month

  • Day

  • Weekday

  • Hour

Useful for time-series analysis and trend detection.
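In pandas you first convert text to datetime with `pd.to_datetime`, then pull out the parts through the `.dt` accessor. A minimal sketch; the two timestamps echo the examples above, with midnight added to the first so that both strings share one format.

```python
import pandas as pd

# Timestamps stored as text, as they often arrive from a CSV file
raw = pd.Series(["2020-03-15 00:00", "2023-11-01 15:30"])
dates = pd.to_datetime(raw)  # text -> datetime64

parts = pd.DataFrame({
    "Year": dates.dt.year,
    "Month": dates.dt.month,
    "Day": dates.dt.day,
    "Weekday": dates.dt.day_name(),  # e.g. "Sunday"
    "Hour": dates.dt.hour,
})
print(parts)
```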

Data Structures in EDA (Python + Pandas)

In data science with Python, we mostly work with two pandas data structures: Series and DataFrame.

Series (1-Dimensional)

A single column of data, with an index that labels each value.

Example:

import pandas as pd
s = pd.Series([10, 20, 30])

DataFrame (2-Dimensional)

A table with rows and columns, where each column is a Series.

Example:

df = pd.DataFrame({
    "Name": ["Asha", "Rohan"],
    "Age": [25, 30]
})
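The two structures are linked: selecting one column of a DataFrame gives back a Series, and both share the same row index. A short sketch that rebuilds the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Rohan"],
    "Age": [25, 30],
})

age = df["Age"]       # one column name -> a Series
subset = df[["Age"]]  # a list of column names -> still a DataFrame

print(isinstance(age, pd.Series))        # True
print(isinstance(subset, pd.DataFrame))  # True
print(age.index.tolist())                # [0, 1]: row labels shared with df
print(df.shape)                          # (2, 2): rows, columns
```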

How to Check Data Types in Pandas (Very Important for EDA)

df.dtypes

Example Output:

Name            object
Age             int64
Salary          float64
Married         object
Joining_Date    datetime64[ns]
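That output assumes Joining_Date has already been converted: `pd.read_csv` reads dates as plain text unless you pass `parse_dates=[...]` or convert the column with `pd.to_datetime`. A minimal sketch that rebuilds the example row in memory instead of reading a file. Text columns show as object here; newer pandas versions may show a dedicated string type instead.

```python
import pandas as pd

# One-row version of the example table; the date starts out as text
df = pd.DataFrame({
    "Name": ["Asha"],
    "Age": [25],
    "Salary": [50000.0],
    "Married": ["Yes"],
    "Joining_Date": ["2020-03-15"],
})
print(df.dtypes)  # Joining_Date is still text here

# Convert the text column into a real datetime column
df["Joining_Date"] = pd.to_datetime(df["Joining_Date"])
print(df.dtypes)  # Joining_Date is now a datetime64 type
```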

Why Understanding Data Types Is Important

Because data types affect every step of your workflow:

Cleaning

  • Missing numeric values → fill with the mean or median

  • Missing categorical values → fill with the mode (most frequent value)

  • Dates stored as text → convert to a datetime type
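A minimal sketch of these three cleaning rules on a small made-up table. The median is used for the numeric column because it is less affected by extreme values than the mean.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, None, 31, 40],                      # numeric, one gap
    "City": ["Delhi", "Mumbai", None, "Delhi"],     # categorical, one gap
    "Joining_Date": ["2020-03-15", "2021-07-01", "2022-01-10", None],  # dates as text
})

# Numeric column: fill gaps with the median
df["Age"] = df["Age"].fillna(df["Age"].median())

# Categorical column: fill gaps with the most frequent value
df["City"] = df["City"].fillna(df["City"].mode()[0])

# Date column: parse text into datetime; missing entries become NaT
df["Joining_Date"] = pd.to_datetime(df["Joining_Date"])

print(df)
print(df.dtypes)
```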

Visualization

  • Numeric → histograms, scatter plots

  • Categorical → bar charts, pie charts

  • Dates → line charts over time
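These rules can be expressed as a check on each column's dtype. Below is a sketch with a hypothetical helper, `suggest_chart` (not part of pandas), built on the real `pandas.api.types` checks. It only returns a suggestion, so no plotting library is needed.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def suggest_chart(col: pd.Series) -> str:
    """Hypothetical helper: map a column's dtype to a starting chart type."""
    # Check bool first: pandas also counts bool as numeric
    if is_bool_dtype(col):
        return "bar chart"
    if is_datetime64_any_dtype(col):
        return "line chart over time"
    if is_numeric_dtype(col):
        return "histogram"
    return "bar chart"  # text / categorical columns


df = pd.DataFrame({
    "Age": [25, 30, 41],
    "City": ["Delhi", "Mumbai", "Delhi"],
    "Married": [True, False, True],
    "Joining_Date": pd.to_datetime(["2020-03-15", "2021-07-01", "2022-01-10"]),
})
for name in df.columns:
    print(name, "->", suggest_chart(df[name]))
```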

Feature Engineering

  • Convert dates → Year, Month

  • Encode categories → One-Hot Encoding

  • Scale numerical features
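A minimal sketch of these three steps on made-up data: `.dt` extracts date parts, `pd.get_dummies` performs one-hot encoding, and standardization (subtract the mean, divide by the standard deviation) is one common way to scale numbers. Min-max scaling is another.

```python
import pandas as pd

df = pd.DataFrame({
    "Joining_Date": pd.to_datetime(["2020-03-15", "2021-07-01", "2023-11-01"]),
    "City": ["Delhi", "Mumbai", "Delhi"],
    "Salary": [40000.0, 50000.0, 60000.0],
})

# Dates -> numeric parts
df["Join_Year"] = df["Joining_Date"].dt.year
df["Join_Month"] = df["Joining_Date"].dt.month

# Categories -> one-hot columns (City_Delhi, City_Mumbai as 0/1)
df = pd.get_dummies(df, columns=["City"], dtype=int)

# Numbers -> standardized scale (mean 0, standard deviation 1)
df["Salary_scaled"] = (df["Salary"] - df["Salary"].mean()) / df["Salary"].std()

print(df.drop(columns="Joining_Date"))
```

scikit-learn's `StandardScaler` does the same job but divides by the population standard deviation, so its values differ slightly from pandas' `.std()`, which uses the sample version.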

Machine Learning

  • Most algorithms require numeric inputs

  • Categorical values need encoding (e.g. one-hot)

  • Date values must be turned into numeric features (e.g. year, month)

Real-World Example

Dataset for predicting house prices:

Size_sqft   Location   Bedrooms   Built_Year   Price
1200        Delhi      2          2015         65 Lakh

Identify Data Types:

  • Size_sqft → Numerical

  • Location → Categorical

  • Bedrooms → Numerical

  • Built_Year → Date/Numerical

  • Price → Numerical (target)
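One catch in this dataset: a price written as "65 Lakh" is text, so pandas would load the Price column as text, not as a number. A minimal sketch with made-up rows, assuming every price uses the "Lakh" unit (1 lakh = 100,000 rupees):

```python
import pandas as pd

# Made-up rows; Price arrives as text such as "65 Lakh"
houses = pd.DataFrame({
    "Size_sqft": [1200, 950, 1800],
    "Location": ["Delhi", "Mumbai", "Delhi"],
    "Bedrooms": [2, 2, 3],
    "Built_Year": [2015, 2010, 2019],
    "Price": ["65 Lakh", "80 Lakh", "92.5 Lakh"],
})

# Strip the unit, convert to a number, and express the price in rupees
houses["Price"] = (
    houses["Price"].str.replace(" Lakh", "", regex=False).astype(float) * 100_000
)

# Mark Location as categorical explicitly
houses["Location"] = houses["Location"].astype("category")

print(houses.dtypes)
```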

How EDA Uses This:

  • Plot histogram of Size_sqft

  • Bar chart for Location

  • Scatter plot: Size vs Price

  • Line chart: Price over years
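Each plot listed above rests on a simple numeric summary that you can also inspect directly. A sketch on made-up rows; the plotting calls are left as comments because they need matplotlib.

```python
import pandas as pd

# Made-up rows in the shape of the house-price dataset
houses = pd.DataFrame({
    "Size_sqft": [800, 1200, 1500, 2000],
    "Location": ["Pune", "Delhi", "Delhi", "Mumbai"],
    "Built_Year": [2012, 2015, 2015, 2019],
    "Price": [4_000_000.0, 6_500_000.0, 7_500_000.0, 11_000_000.0],
})

# Histogram of Size_sqft: distribution summary
print(houses["Size_sqft"].describe())
# houses["Size_sqft"].plot.hist()

# Bar chart for Location: counts per city
print(houses["Location"].value_counts())
# houses["Location"].value_counts().plot.bar()

# Scatter plot Size vs Price: correlation measures the linear trend
print(houses["Size_sqft"].corr(houses["Price"]))
# houses.plot.scatter(x="Size_sqft", y="Price")

# Line chart of Price over years: median price per build year
price_by_year = houses.groupby("Built_Year")["Price"].median()
print(price_by_year)
# price_by_year.plot.line()
```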

Understanding types helps you choose the right EDA techniques.

Python Example: Full EDA Data Type Check

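import pandas as pd

# Load the data (add parse_dates=["Joining_Date"] to read that column as datetime)
df = pd.read_csv("data.csv")

# Check data types
print(df.dtypes)

# Column names, dtypes and non-null counts in one view
df.info()

# Missing values per column
print(df.isna().sum())

# Summary of numerical columns
print(df.describe())

# Summary of categorical (text) columns; skipped if there are none
cat_cols = df.select_dtypes(include=["object", "category"])
if not cat_cols.empty:
    print(cat_cols.describe())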