Python Basics for Data Science

Python is one of the most widely used programming languages for Data Science. It is powerful, easy to learn, and supported by thousands of libraries for:

  • Data cleaning

  • Data analysis

  • Data visualization

  • Machine learning

  • Artificial intelligence

This guide explains, in simple words and with examples, the Python basics every Data Science student should learn.

Why Python for Data Science?

Because Python is:

  • Easy to read → English-like syntax

  • Rich in libraries → Pandas, NumPy, Matplotlib

  • Flexible → Works for ML, AI, automation

  • Fast to prototype → Perfect for experiments

  • Widely used → Industry standard tool

Python Syntax Basics

Python uses simple, clean syntax.

Print statement

print("Hello Data Science!")

Comments

# This is a comment

Python ignores everything from the # to the end of the line, unless the # appears inside a string.

Variables in Python

Variables store data.

name = "Alice"
age = 21
score = 95.6

Python infers the type from the value you assign, so there is no need to declare it explicitly.
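To see which type Python picked, the built-in type() function can be called on each variable. A minimal sketch, reusing the three variables above:

```python
name = "Alice"
age = 21
score = 95.6

# type() reports the type Python inferred from each value
print(type(name))   # <class 'str'>
print(type(age))    # <class 'int'>
print(type(score))  # <class 'float'>

# A variable can later be rebound to a value of a different type
age = "twenty-one"
print(type(age))    # <class 'str'>
```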

Data Types (Most Important for Data Science)

Numeric Types

x = 10        # int
y = 10.5      # float

String

city = "New York"

Boolean

is_active = True

None (represents missing values)

value = None
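Missing values usually have to be skipped before doing any arithmetic. The sketch below is one simple way to do that in plain Python; the scores list is made-up sample data, not part of the original example.

```python
scores = [88, None, 92, None, 75]  # hypothetical data with gaps

# Keep only the entries that are not None ("is not None" is the idiomatic check)
valid_scores = [s for s in scores if s is not None]

average = sum(valid_scores) / len(valid_scores)
print(valid_scores)  # [88, 92, 75]
print(average)       # 85.0
```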

List

numbers = [10, 20, 30]

Dictionary

student = {"name": "Asha", "age": 20}
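Lists are read by position and dictionaries by key. A short sketch of the most common operations, using the list and dictionary above (the "grade" and "city" keys are hypothetical additions for illustration):

```python
numbers = [10, 20, 30]
student = {"name": "Asha", "age": 20}

print(numbers[0])    # 10 (indexing starts at 0)
print(numbers[-1])   # 30 (negative indexes count from the end)
numbers.append(40)   # add an item to the end of the list
print(len(numbers))  # 4

print(student["name"])      # Asha
student["grade"] = "A"      # add a new key (hypothetical field)
print(student.get("city"))  # None, because the key does not exist
```

Using get() instead of square brackets avoids a KeyError when a key might be absent.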

Conditional Statements (if/elif/else)

Used to make decisions.

score = 85

if score > 90:
    print("Excellent")
elif score > 75:
    print("Good")
else:
    print("Needs Improvement")

Loops (for & while)

For Loop

for i in range(5):   # range(5) yields 0, 1, 2, 3, 4
    print(i)

While Loop

count = 1
while count <= 5:
    print(count)
    count += 1
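In practice, loops often run over a collection of values rather than a range of numbers. A small sketch, assuming a made-up list of scores:

```python
scores = [90, 85, 88]  # hypothetical sample values

# A for loop visits each item of the list in order
total = 0
for s in scores:
    total += s
print(total / len(scores))  # the average score

# enumerate() yields the position together with the item
for i, s in enumerate(scores):
    print(i, s)
```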

Functions

Functions make code reusable.

def add(a, b):
    return a + b

print(add(3, 5))
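The same idea carries over to small data tasks. The two helper functions below, average and grade, are hypothetical names chosen for this sketch; they show a docstring and a default argument.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def grade(score, pass_mark=50):
    """Return "Pass" or "Fail"; pass_mark is used when no value is given."""
    return "Pass" if score >= pass_mark else "Fail"


print(average([90, 85, 88]))
print(grade(72))                # Pass
print(grade(72, pass_mark=80))  # Fail
```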

Importing Libraries

In Data Science, libraries do the heavy work.

NumPy

For numerical computations.

import numpy as np

arr = np.array([1, 2, 3])
print(arr)

Pandas

For data cleaning & analysis.

import pandas as pd

df = pd.DataFrame({"Name": ["Ajay", "Riya"], "Marks": [85, 90]})
print(df)

Matplotlib

For data visualization.

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [5, 7, 9])
plt.show()

Working with Pandas

Creating DataFrames

import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Sara", "Ali"],
    "Score": [90, 85, 88]
})

Reading CSV Files

df = pd.read_csv("data.csv")
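If no data.csv file is at hand, the same call can be tried on text held in memory. This is only a sketch: the CSV content below is invented to stand in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file "data.csv"
csv_text = """Name,Score
John,90
Sara,85
Ali,88
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 2): three rows, two columns
```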

Viewing Data

df.head()     # first 5 rows
df.info()     # column names, data types, and non-null counts
df.describe() # summary statistics for numeric columns

Selecting Columns

df["Score"]

Filtering Rows

df[df["Score"] > 85]
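Selection and filtering can be combined. The sketch below reuses the small DataFrame from "Creating DataFrames" and shows a few common patterns; the variable names subset, mid_range, and top_names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Sara", "Ali"],
    "Score": [90, 85, 88]
})

# Select several columns by passing a list of names
subset = df[["Name", "Score"]]

# Combine conditions with & (and) or | (or); each needs its own parentheses
mid_range = df[(df["Score"] > 85) & (df["Score"] < 90)]
print(mid_range)

# .loc filters rows and picks a column in one step
top_names = df.loc[df["Score"] > 85, "Name"]
print(top_names.tolist())  # ['John', 'Ali']
```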

NumPy Basics

Creating arrays

import numpy as np

a = np.array([1, 2, 3])

Mathematical operations

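a * 2        # multiply every element by 2 -> [2, 4, 6]
a + 10       # add 10 to every element -> [11, 12, 13]
np.mean(a)   # average -> 2.0
np.std(a)    # population standard deviation (about 0.816)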

Pandas and many ML libraries, such as Scikit-Learn, are built on top of NumPy.
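The operations above work element by element, which is different from how a plain Python list behaves. A minimal comparison (the array b is an extra value introduced for this sketch):

```python
import numpy as np

values = [1, 2, 3]
a = np.array(values)

print(values * 2)  # [1, 2, 3, 1, 2, 3]  (the list is repeated)
print(a * 2)       # [2 4 6]             (each element is doubled)

# Arrays of the same shape combine element by element
b = np.array([10, 20, 30])
print(a + b)       # [11 22 33]
```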

Data Visualization Basics

Charts help you spot trends, outliers, and patterns that are hard to see in raw numbers.

Line Plot

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [3, 6, 9])
plt.title("Line Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

Bar Chart

plt.bar(["A","B","C"], [10,20,15])
plt.show()

Python for Machine Learning

With Scikit-Learn:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit([[1],[2],[3]], [2,4,6])

print(model.predict([[4]]))

The model learns the pattern y = 2x from the three training points, so it predicts a value close to 8 for the new input 4.
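To check what the model learned, its fitted slope and intercept can be inspected through the coef_ and intercept_ attributes. A short sketch using the same three training points (the names X and y are chosen here for readability):

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3]]  # one feature per row
y = [2, 4, 6]        # targets follow y = 2x

model = LinearRegression()
model.fit(X, y)

# The fitted line is y = coef_ * x + intercept_
print(model.coef_[0])           # close to 2.0
print(model.intercept_)         # close to 0.0
print(model.predict([[4]])[0])  # close to 8.0
```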

Python Project Example for Beginners

Problem: Predict student performance

You can practice:

  1. Load CSV using Pandas

  2. Clean data (remove missing values)

  3. Visualize scores

  4. Build a simple ML model

  5. Predict exam results
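The steps above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the column names hours_studied and exam_score, the sample values, and the choice of linear regression. Step 3 uses a numeric summary instead of a chart so the sketch runs without a display; a plt.bar or plt.plot call would fit in the same place.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for a real CSV file (one score is missing)
csv_text = """hours_studied,exam_score
1,52
2,58
3,
4,71
5,75
6,83
"""

# 1. Load CSV using Pandas
df = pd.read_csv(io.StringIO(csv_text))

# 2. Clean data: drop rows that contain missing values
clean = df.dropna()

# 3. Summarize scores (a chart could be drawn here instead)
print(clean["exam_score"].describe())

# 4. Build a simple ML model: exam score as a function of hours studied
model = LinearRegression()
model.fit(clean[["hours_studied"]], clean["exam_score"])

# 5. Predict the exam result for a student who studied 7 hours
new_student = pd.DataFrame({"hours_studied": [7]})
predicted = model.predict(new_student)[0]
print(round(predicted, 1))
```

With real data, the same outline applies, but the cleaning step usually needs more care than a single dropna() call.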

This makes a good first data science project because it touches every step of a typical workflow.