ML - Step 2: Python for Data Analysis with NumPy & Pandas
In this step, we’ll explore how to use NumPy for numerical computations and Pandas for working with tabular datasets. They are two of the most widely used Python libraries for data preprocessing in Machine Learning.
1️⃣ Install Required Libraries
Before using NumPy and Pandas, install them with pip:
pip install numpy pandas
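To confirm that both packages installed correctly, you can import them and print their versions. This is a minimal check. The version numbers you see will depend on your environment.

```python
# Verify that NumPy and pandas can be imported after installation
import numpy as np
import pandas as pd

# Both packages expose their version string as __version__
print("NumPy version:", np.__version__)
print("pandas version:", pd.__version__)
```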
2️⃣ Using NumPy for Arrays
NumPy lets you create arrays and run fast calculations on every element without writing explicit loops:
import numpy as np
# Create a simple array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)
# Reductions (mean) and element-wise math (** 2) need no explicit loop
print("Mean:", arr.mean())
print("Squared:", arr ** 2)
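NumPy arrays can also be two-dimensional, which is how a dataset of samples (rows) and features (columns) is usually stored for ML. The sketch below uses a small made-up matrix `X` to show the array shape, per-column means with `axis=0`, broadcasting, and boolean masking. The values are illustrative only.

```python
import numpy as np

# A 3x2 matrix: 3 samples (rows), 2 features (columns); values are made up
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

print("Shape:", X.shape)  # (3, 2)

# axis=0 collapses the rows, giving one mean per column (per feature)
col_means = X.mean(axis=0)
print("Column means:", col_means)

# Broadcasting: the length-2 array of means is subtracted from every row,
# which centers each feature around zero
X_centered = X - col_means
print("Centered:\n", X_centered)

# Boolean masking: select the elements greater than 2 (returned as a flat array)
print("Values > 2:", X[X > 2])
```

Centering features this way is a common preprocessing step, and broadcasting lets you write it as a single subtraction instead of a loop over rows.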
3️⃣ Using Pandas for DataFrames
Pandas provides the DataFrame, a table with labeled columns, for working with tabular data such as CSV files:
import pandas as pd
# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88]
}
df = pd.DataFrame(data)
print(df)
print("Average Score:", df["Score"].mean())
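Real datasets usually arrive as files, so the sketch below loads the same three rows from CSV text with `pd.read_csv`, then filters rows, adds a derived column, and sorts. It wraps the text in `io.StringIO` so the example runs on its own. With a file on disk you would pass its path instead (the file name `students.csv` in the comment is hypothetical), and the 100-point maximum used for `ScorePct` is an assumption for illustration.

```python
import io
import pandas as pd

# CSV text standing in for a real file; with a file use pd.read_csv("students.csv")
csv_text = """Name,Age,Score
Alice,25,85
Bob,30,90
Charlie,35,88
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect structure: the inferred type of each column and the row count
print(df.dtypes)
print("Rows:", len(df))

# Boolean filtering: keep only rows where Score is at least 88
high_scorers = df[df["Score"] >= 88]
print(high_scorers)

# Derived column: score as a fraction of an assumed 100-point maximum
df["ScorePct"] = df["Score"] / 100

# Sort by Score, highest first
print(df.sort_values("Score", ascending=False))
```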
✅ Summary
In this tutorial, you learned how to use NumPy arrays for numerical calculations and Pandas DataFrames for handling tabular datasets. These two libraries form the backbone of most ML projects that involve data cleaning and analysis.