ML - Step 2: Python for Data Analysis with NumPy & Pandas

In this step, we'll explore how to use NumPy for numerical computation and Pandas for working with tabular datasets. Both are among the most widely used Python libraries for data preprocessing in machine learning.

1️⃣ Install Required Libraries

Before using NumPy and Pandas, install them with pip:

pip install numpy pandas
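
To confirm the installation worked, a quick check like the following can be run. It only imports both libraries and prints their versions; the exact version numbers depend on your environment:

```python
import numpy as np
import pandas as pd

# Print the installed versions; the values depend on your environment
print("NumPy version:", np.__version__)
print("Pandas version:", pd.__version__)
```

If either import fails with a ModuleNotFoundError, pip most likely installed the packages into a different Python environment than the one running the script.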

2️⃣ Using NumPy for Arrays

NumPy arrays support fast, element-wise operations without explicit loops:

import numpy as np

# Create a simple array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

# Aggregate over the whole array
print("Mean:", arr.mean())      # Mean: 3.0
# Element-wise operation: each value is squared
print("Squared:", arr ** 2)     # Squared: [ 1  4  9 16 25]
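
The same ideas extend to 2D arrays, which is how feature matrices are usually represented. As a minimal sketch of a common preprocessing step, the example below standardizes each column to zero mean and unit variance. The array X and its values are made up for illustration:

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows) x 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# axis=0 computes one statistic per column (per feature)
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# Broadcasting applies the per-column values to every row
X_scaled = (X - col_mean) / col_std

print("Column means:", col_mean)
print("Scaled column means:", X_scaled.mean(axis=0))

# Boolean mask: keep rows whose first feature is greater than 1
print("Filtered rows:\n", X[X[:, 0] > 1])
```

After scaling, each column has a mean of approximately 0 and a standard deviation of approximately 1, so features measured on different scales become comparable.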

3️⃣ Using Pandas for DataFrames

Pandas stores tabular data (such as the contents of a CSV file) in a DataFrame, a table with labeled columns:

import pandas as pd

# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88]
}
df = pd.DataFrame(data)

print(df)
print("Average Score:", df["Score"].mean())
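
In practice the data usually comes from a file and needs some cleaning first. The sketch below assumes a small CSV with one missing score; the CSV text is invented for illustration and is read from an in-memory string so the example runs without a file on disk. Filling with the column mean is only one possible strategy for missing values:

```python
import io
import pandas as pd

# Hypothetical CSV content; Bob's score is intentionally missing
csv_text = """Name,Age,Score
Alice,25,85
Bob,30,
Charlie,35,88
"""

# pd.read_csv accepts a file path or a file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column
print(df_csv.isna().sum())

# Fill the missing score with the mean of the available scores
df_clean = df_csv.copy()
df_clean["Score"] = df_clean["Score"].fillna(df_clean["Score"].mean())

# Select rows with a boolean condition
older = df_clean[df_clean["Age"] > 26]
print(older)

# Convert the numeric columns to a NumPy array for further computation
features = df_clean[["Age", "Score"]].to_numpy()
print(features.shape)
```

With a real file, the io.StringIO(...) argument would be replaced by the path to the CSV. The final to_numpy() call shows how the two libraries connect: Pandas handles loading and cleaning, and NumPy receives the resulting numeric array.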

✅ Summary

In this tutorial, you learned how to use NumPy arrays for numerical calculations and Pandas DataFrames for handling tabular datasets. Both come up in most ML projects, where data cleaning and analysis usually come before model training.