Python Basics for Data Science
Python is one of the most widely used programming languages for Data Science. It is powerful, easy to learn, and supported by a large ecosystem of libraries for:
Data cleaning
Data analysis
Data visualization
Machine learning
Artificial intelligence
This guide explains the Python basics every Data Science student must learn, in simple words, with examples.
Why Python for Data Science?
Because Python is:
Easy to read → English-like syntax
Rich in libraries → Pandas, NumPy, Matplotlib
Flexible → Works for ML, AI, automation
Fast to prototype → Perfect for experiments
Widely used → Industry standard tool
Python Syntax Basics
Python uses simple, clean syntax.
Print statement
print("Hello Data Science!")
Comments
# This is a comment
Python ignores everything from the # to the end of the line.
Variables in Python
Variables store data.
name = "Alice"
age = 21
score = 95.6
Python infers the type from the assigned value, so there is no need to declare it explicitly.
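To check which type Python picked, the built-in type() function can be called on any variable. A minimal sketch reusing the variables above; it also shows that a name can later be rebound to a value of a different type:

```python
# Inspect the types Python assigned to each variable.
name = "Alice"
age = 21
score = 95.6

print(type(name))   # <class 'str'>
print(type(age))    # <class 'int'>
print(type(score))  # <class 'float'>

# A name can be rebound to a value of another type at any time.
age = "twenty-one"
print(type(age))    # <class 'str'>
```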
Data Types (Most Important for Data Science)
Numeric Types
x = 10 # int
y = 10.5 # float
String
city = "New York"
Boolean
is_active = True
None (represents missing values)
value = None
List
numbers = [10, 20, 30]
Dictionary
student = {"name": "Asha", "age": 20}
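Lists and dictionaries are the two containers used most often, so it helps to see how values are read from and added to them. The sketch below reuses the list and dictionary above; the "city" value and the "grade" key are illustrative and not part of the original example:

```python
numbers = [10, 20, 30]
student = {"name": "Asha", "age": 20}

first = numbers[0]            # lists are indexed from 0
numbers.append(40)            # add an item to the end of the list
total = sum(numbers)          # 10 + 20 + 30 + 40

student["city"] = "Pune"      # add a new key (illustrative value)
age = student["age"]          # look up a value by its key
grade = student.get("grade")  # .get() returns None when the key is missing

print(first, total, age, grade)  # 10 100 20 None
```

Using .get() instead of square brackets avoids a KeyError when a key may be absent, which ties in with None as the marker for a missing value.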
Conditional Statements (if/elif/else)
Used to make decisions.
score = 85
if score > 90:
    print("Excellent")
elif score > 75:
    print("Good")
else:
    print("Needs Improvement")
Loops (for & while)
For Loop
for i in range(5):
    print(i)
While Loop
count = 1
while count <= 5:
    print(count)
    count += 1
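In data work, a for loop usually runs over the items of a collection rather than over a range of numbers. A small sketch, with illustrative scores, that computes an average by hand:

```python
scores = [90, 85, 88]  # illustrative values

total = 0
for s in scores:       # s takes each value in the list in turn
    total += s

average = total / len(scores)
print(total)    # 263
print(average)
```

Libraries such as NumPy and Pandas do this kind of calculation in a single call, but the loop shows what happens underneath.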
Functions
Functions make code reusable.
def add(a, b):
    return a + b
print(add(3, 5)) # 8
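Functions become more useful once they wrap a decision that is needed in several places. As a sketch, the if/elif/else thresholds from the conditional example can be moved into a function (the name grade and the sample scores are illustrative) and applied to a list with a loop:

```python
def grade(score):
    """Return a label using the same thresholds as the if/elif/else example."""
    if score > 90:
        return "Excellent"
    elif score > 75:
        return "Good"
    return "Needs Improvement"

scores = [95, 85, 60]  # illustrative values
labels = []
for s in scores:
    labels.append(grade(s))

print(labels)  # ['Excellent', 'Good', 'Needs Improvement']
```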
Importing Libraries
In Data Science, libraries do the heavy work.
NumPy
For numerical computations.
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
Pandas
For data cleaning & analysis.
import pandas as pd
df = pd.DataFrame({"Name": ["Ajay", "Riya"], "Marks": [85, 90]})
print(df)
Matplotlib
For data visualization.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [5, 7, 9])
plt.show()
Working with Pandas
Creating DataFrames
import pandas as pd
df = pd.DataFrame({
    "Name": ["John", "Sara", "Ali"],
    "Score": [90, 85, 88]
})
Reading CSV Files
df = pd.read_csv("data.csv")
Viewing Data
df.head() # first 5 rows
df.info() # column types and non-null counts
df.describe() # summary statistics for numeric columns
Selecting Columns
df["Score"]
Filtering Rows
df[df["Score"] > 85]
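Filters can also be combined. The sketch below rebuilds the small DataFrame from this section and shows two common patterns: joining conditions with & and picking rows and a column together with .loc. The variable names mid and top_names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Sara", "Ali"],
    "Score": [90, 85, 88],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mid = df[(df["Score"] > 85) & (df["Score"] < 90)]

# .loc selects rows by condition and a column by name in one step.
top_names = df.loc[df["Score"] > 85, "Name"]

print(mid)
print(top_names.tolist())  # ['John', 'Ali']
```

Python's plain and / or keywords do not work element-wise on columns, which is why & and | are used here.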
NumPy Basics
Creating arrays
import numpy as np
a = np.array([1, 2, 3])
Mathematical operations
a * 2
a + 10
np.mean(a)
np.std(a)
NumPy is used internally by Pandas and ML libraries.
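The operations above apply to every element at once, with no loop. A short sketch showing the results for the same array; it also notes that np.std computes the population standard deviation by default, while passing ddof=1 gives the sample version:

```python
import numpy as np

a = np.array([1, 2, 3])

doubled = a * 2    # multiplies each element
shifted = a + 10   # adds 10 to each element

print(doubled)     # [2 4 6]
print(shifted)     # [11 12 13]
print(np.mean(a))  # 2.0

pop_std = np.std(a)             # population standard deviation (ddof=0)
sample_std = np.std(a, ddof=1)  # sample standard deviation
print(pop_std, sample_std)
```

The difference matters when comparing results with Pandas, whose std() method uses the sample version by default.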
Data Visualization Basics
Charts make patterns, trends, and outliers in data easier to see.
Line Plot
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [3, 6, 9])
plt.title("Line Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
Bar Chart
plt.bar(["A","B","C"], [10,20,15])
plt.show()
Python for Machine Learning
With Scikit-Learn:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit([[1],[2],[3]], [2,4,6])
print(model.predict([[4]]))
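The model learns the relationship from the training pairs (here y = 2x) and uses it to predict the output for new data, so the result for 4 is approximately 8.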
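To see what the model learned, its fitted slope and intercept can be read from the coef_ and intercept_ attributes. A minimal sketch using the same training data as above; the variable names X, y, slope, intercept and prediction are illustrative:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3]]  # one feature per row
y = [2, 4, 6]        # target values

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # change in y for each unit of x
intercept = model.intercept_  # predicted y when x is 0
prediction = model.predict([[4]])[0]

print(slope, intercept, prediction)
```

The slope comes out very close to 2 and the intercept very close to 0; tiny floating-point differences are normal.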
Python Project Example for Beginners
Problem: Predict student performance
You can practice:
Load CSV using Pandas
Clean data (remove missing values)
Visualize scores
Build a simple ML model
Predict exam results
This makes a good first data science project.
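The steps above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the data is made up, the column names hours_studied and exam_score are hypothetical, and the CSV is read from a string so the example runs without a file. The plotting step is left out to keep the sketch short:

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data standing in for a real CSV file (hypothetical column names).
csv_text = """hours_studied,exam_score
1,52
2,58
3,
4,71
5,75
"""

# 1. Load the CSV using Pandas.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Clean the data: drop rows that contain missing values.
clean = df.dropna()

# 3. Build a simple model: exam score as a function of hours studied.
model = LinearRegression()
model.fit(clean[["hours_studied"]], clean["exam_score"])

# 4. Predict the exam result for a new student who studied 6 hours.
new_student = pd.DataFrame({"hours_studied": [6]})
predicted = model.predict(new_student)[0]

print(len(df), len(clean))  # 5 4
print(round(predicted, 1))
```

With a real dataset, pd.read_csv would take the file path instead, and dropping rows is only one of several ways to handle missing values.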