Python for Excel: Automate Spreadsheets with Python

Learn how to read, write, analyse, and automate Excel spreadsheets with Python. Covers openpyxl for Excel files and pandas for data analysis, with practical copy-paste examples.

11 min readUpdated March 29, 2026

Overview section

If you spend hours every week doing the same thing in Excel — copying data between sheets, applying formulas, reformatting reports, merging files — Python can do it for you in seconds.

This guide covers two libraries:

Library	Best For	Think of It As
openpyxl	Reading and writing `.xlsx` files directly (formatting, cell-by-cell control)	A robot that clicks through Excel for you
pandas (3.x)	Analysing, filtering, transforming, and summarising data	Excel on steroids — pivot tables, VLOOKUP, and more in one line

You do not need to be a Python expert. If you have read our Python for Beginners guide, you have everything you need.

Setup section

Install the Libraries section

bash

pip install openpyxl pandas xlsxwriter

openpyxl — read and write .xlsx files with full formatting control
pandas — data analysis (uses openpyxl under the hood for Excel I/O). pandas 3.0 brought significant performance improvements with Copy-on-Write semantics enabled by default. For very large datasets, polars is still a faster alternative, but the gap has narrowed
xlsxwriter — alternative Excel writer with advanced formatting (optional)

Info

pandas 3.0 string dtype change: pandas 3.0 changed how string columns are inferred. Strings now default to the str dtype instead of object. This means methods like df["column"].str.contains(...) work more reliably out of the box. If you are following older tutorials or upgrading existing scripts, you may notice different dtype outputs when calling df.dtypes — this is expected and all the string operations shown in this guide remain the same.

Create a Sample File section

Save this as sample_data.xlsx to follow along (or use any Excel file you already have):

Name	Department	Salary	Start Date	Status
Alice Johnson	Engineering	95000	2022-03-15	Active
Bob Smith	Marketing	72000	2021-07-01	Active
Carol White	Engineering	105000	2020-01-10	Active
Dave Brown	Sales	68000	2023-06-20	Inactive
Eve Davis	Marketing	78000	2022-11-05	Active
Frank Miller	Sales	71000	2021-09-14	Active
Grace Lee	Engineering	112000	2019-04-22	Active
Henry Wilson	Marketing	65000	2024-01-08	Active

Part 1: openpyxl — Direct Excel Control section

Reading an Excel File section

python

from openpyxl import load_workbook

# Open the file
wb = load_workbook("sample_data.xlsx")
ws = wb.active  # Get the active sheet

# Print every row
for row in ws.iter_rows(min_row=2, values_only=True):  # min_row=2 skips the header
    name, dept, salary, start_date, status = row
    print(f"{name} | {dept} | ${salary:,}")

Reading Specific Cells section

python

from openpyxl import load_workbook

wb = load_workbook("sample_data.xlsx")
ws = wb.active

# Read a specific cell
print(ws["A2"].value)  # "Alice Johnson"
print(ws["C2"].value)  # 95000

# Read a range
for row in ws["A2":"C5"]:
    for cell in row:
        print(cell.value, end=" | ")
    print()

Writing a New Excel File section

python

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Employees"

# Write headers
headers = ["Name", "Department", "Salary"]
ws.append(headers)

# Write data rows
data = [
    ["Alice Johnson", "Engineering", 95000],
    ["Bob Smith", "Marketing", 72000],
    ["Carol White", "Engineering", 105000],
]

for row in data:
    ws.append(row)

# Save
wb.save("new_report.xlsx")
print("File saved.")

Formatting Cells section

python

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers

wb = Workbook()
ws = wb.active
ws.title = "Formatted Report"

# Header row
headers = ["Name", "Department", "Salary", "Status"]
ws.append(headers)

# Style the header
header_font = Font(bold=True, color="FFFFFF", size=12)
header_fill = PatternFill(start_color="BE1E2D", end_color="BE1E2D", fill_type="solid")

for cell in ws[1]:
    cell.font = header_font
    cell.fill = header_fill
    cell.alignment = Alignment(horizontal="center")

# Add data
data = [
    ["Alice Johnson", "Engineering", 95000, "Active"],
    ["Bob Smith", "Marketing", 72000, "Active"],
    ["Dave Brown", "Sales", 68000, "Inactive"],
]

for row in data:
    ws.append(row)

# Format salary column as currency
for row in ws.iter_rows(min_row=2, min_col=3, max_col=3):
    for cell in row:
        cell.number_format = '$#,##0'

# Auto-adjust column widths
for col in ws.columns:
    max_length = max(len(str(cell.value or "")) for cell in col)
    ws.column_dimensions[col[0].column_letter].width = max_length + 4

# Conditional formatting — highlight "Inactive" in red
from openpyxl.styles import PatternFill
red_fill = PatternFill(start_color="FFE0E0", end_color="FFE0E0", fill_type="solid")
for row in ws.iter_rows(min_row=2):
    if row[3].value == "Inactive":
        for cell in row:
            cell.fill = red_fill

wb.save("formatted_report.xlsx")
print("Formatted report saved.")

Adding Formulas section

python

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Headers and data
ws.append(["Product", "Price", "Quantity", "Total"])
ws.append(["Widget A", 10.50, 100, "=B2*C2"])
ws.append(["Widget B", 25.00, 50, "=B3*C3"])
ws.append(["Widget C", 5.75, 200, "=B4*C4"])

# Summary row
ws.append([])
ws.append(["", "", "Grand Total:", "=SUM(D2:D4)"])

wb.save("formulas.xlsx")

Info

Formulas are saved as text in openpyxl. They are evaluated when you open the file in Excel, not by Python. If you need Python to calculate values, do the maths in Python and write the result.

Working with Multiple Sheets section

python

from openpyxl import Workbook

wb = Workbook()

# Rename default sheet
ws1 = wb.active
ws1.title = "Summary"

# Create additional sheets
ws2 = wb.create_sheet("Q1 Data")
ws3 = wb.create_sheet("Q2 Data")

# Write to each sheet
ws1.append(["Quarter", "Revenue"])
ws1.append(["Q1", "=SUM('Q1 Data'!B2:B100)"])
ws1.append(["Q2", "=SUM('Q2 Data'!B2:B100)"])

ws2.append(["Month", "Revenue"])
ws2.append(["January", 50000])
ws2.append(["February", 55000])
ws2.append(["March", 60000])

ws3.append(["Month", "Revenue"])
ws3.append(["April", 62000])
ws3.append(["May", 58000])
ws3.append(["June", 67000])

wb.save("multi_sheet.xlsx")

Part 2: pandas — Data Analysis Power section

pandas is where Python truly replaces Excel. It handles the tasks you would normally do with VLOOKUP, pivot tables, and complex formulas — but faster and reproducibly.

Reading an Excel File into pandas section

python

import pandas as pd

# Read the file — pandas creates a "DataFrame" (a table)
df = pd.read_excel("sample_data.xlsx")

# Preview the data
print(df.head())       # First 5 rows
print(df.shape)        # (rows, columns) — e.g. (8, 5)
print(df.columns)      # Column names
print(df.dtypes)       # Data types of each column
print(df.describe())   # Summary statistics for numeric columns

Filtering Data (Like Excel's Filter)section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Filter: only active employees
active = df[df["Status"] == "Active"]
print(active)

# Filter: Engineering department with salary > 100k
senior_eng = df[(df["Department"] == "Engineering") & (df["Salary"] > 100000)]
print(senior_eng)

# Filter: anyone in Marketing OR Sales
mkt_sales = df[df["Department"].isin(["Marketing", "Sales"])]
print(mkt_sales)

# Filter: names containing "son"
has_son = df[df["Name"].str.contains("son", case=False)]
print(has_son)

Sorting section

python

# Sort by salary (highest first)
sorted_df = df.sort_values("Salary", ascending=False)
print(sorted_df)

# Sort by department, then by salary within each department
sorted_df = df.sort_values(["Department", "Salary"], ascending=[True, False])
print(sorted_df)

Adding and Modifying Columns section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Add a new column — annual bonus (10% of salary)
df["Bonus"] = df["Salary"] * 0.10

# Add a column based on a condition
df["Pay Grade"] = df["Salary"].apply(lambda s: "Senior" if s >= 90000 else "Standard")

# Rename a column
df = df.rename(columns={"Start Date": "Hire Date"})

# Format a column
df["Salary Formatted"] = df["Salary"].apply(lambda s: f"${s:,.0f}")

print(df[["Name", "Salary Formatted", "Bonus", "Pay Grade"]])

Group By — Like Pivot Tables section

This is the pandas equivalent of Excel's pivot tables and SUMIFS.

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Average salary by department
dept_avg = df.groupby("Department")["Salary"].mean()
print(dept_avg)
# Department
# Engineering    104000.0
# Marketing       71666.7
# Sales           69500.0

# Multiple aggregations
summary = df.groupby("Department").agg(
    Headcount=("Name", "count"),
    Avg_Salary=("Salary", "mean"),
    Max_Salary=("Salary", "max"),
    Min_Salary=("Salary", "min"),
    Total_Payroll=("Salary", "sum"),
).round(0)

print(summary)

VLOOKUP Equivalent — Merging DataFrames section

python

import pandas as pd

# Main employee data
employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Dept_ID": [101, 102, 101],
})

# Department lookup table
departments = pd.DataFrame({
    "Dept_ID": [101, 102, 103],
    "Dept_Name": ["Engineering", "Marketing", "Sales"],
    "Manager": ["Jane", "Tom", "Sarah"],
})

# VLOOKUP: merge on Dept_ID
result = employees.merge(departments, on="Dept_ID", how="left")
print(result)
#     Name  Dept_ID    Dept_Name Manager
# 0  Alice      101  Engineering    Jane
# 1    Bob      102    Marketing     Tom
# 2  Carol      101  Engineering    Jane

Combining Multiple Excel Files section

python

import pandas as pd
import glob

# Read all .xlsx files in a folder and combine them
files = glob.glob("reports/*.xlsx")

all_data = []
for file in files:
    df = pd.read_excel(file)
    df["Source File"] = file  # Track which file each row came from
    all_data.append(df)

combined = pd.concat(all_data, ignore_index=True)
print(f"Combined {len(files)} files — {len(combined)} total rows")

# Save the combined data
combined.to_excel("combined_report.xlsx", index=False)

Tip

This is one of the most common time-savers. If you receive 12 monthly reports as separate Excel files and need to combine them, this script replaces an hour of manual copy-paste.

Removing Duplicates section

python

# Remove exact duplicate rows
df_clean = df.drop_duplicates()

# Remove duplicates based on specific columns
df_clean = df.drop_duplicates(subset=["Name", "Department"])

# Keep the last occurrence instead of the first
df_clean = df.drop_duplicates(subset=["Name"], keep="last")

print(f"Removed {len(df) - len(df_clean)} duplicates")

Handling Missing Data section

python

# Find missing values
print(df.isnull().sum())  # Count of missing values per column

# Fill missing values
df["Department"] = df["Department"].fillna("Unknown")
df["Salary"] = df["Salary"].fillna(df["Salary"].median())

# Drop rows with any missing values
df_clean = df.dropna()

# Drop rows where a specific column is missing
df_clean = df.dropna(subset=["Email"])

Writing Back to Excel section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Basic write
df.to_excel("output.xlsx", index=False)

# Write multiple sheets to one file
with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as writer:
    # Active employees on one sheet
    active = df[df["Status"] == "Active"]
    active.to_excel(writer, sheet_name="Active", index=False)

    # Inactive on another
    inactive = df[df["Status"] == "Inactive"]
    inactive.to_excel(writer, sheet_name="Inactive", index=False)

    # Summary on a third
    summary = df.groupby("Department").agg(
        Count=("Name", "count"),
        Avg_Salary=("Salary", "mean"),
    ).round(0)
    summary.to_excel(writer, sheet_name="Summary")

print("Report saved with 3 sheets.")

Part 3: Real-World Recipes section

Recipe 1: Monthly Report Formatter section

Takes raw data and produces a formatted Excel report with a summary sheet.

python

import pandas as pd

def format_monthly_report(input_file, output_file):
    """Read raw data and create a formatted monthly report."""
    df = pd.read_excel(input_file)

    with pd.ExcelWriter(output_file, engine="xlsxwriter") as writer:
        # Raw data sheet
        df.to_excel(writer, sheet_name="Data", index=False)

        # Summary sheet
        summary = df.groupby("Department").agg(
            Employees=("Name", "count"),
            Total_Payroll=("Salary", "sum"),
            Average_Salary=("Salary", "mean"),
        ).round(0)
        summary.to_excel(writer, sheet_name="Summary")

        # Format the summary sheet
        workbook = writer.book
        worksheet = writer.sheets["Summary"]

        money_fmt = workbook.add_format({"num_format": "$#,##0"})
        worksheet.set_column("B:D", 18, money_fmt)

        header_fmt = workbook.add_format({
            "bold": True,
            "bg_color": "#BE1E2D",
            "font_color": "#FFFFFF",
            "border": 1,
        })
        for col_num, value in enumerate(["Department"] + list(summary.columns)):
            worksheet.write(0, col_num, value, header_fmt)

    print(f"Report saved to {output_file}")

format_monthly_report("sample_data.xlsx", "monthly_report.xlsx")

Recipe 2: Find Differences Between Two Spreadsheets section

python

import pandas as pd

def compare_spreadsheets(file_old, file_new, key_column="Name"):
    """Compare two spreadsheets and show what changed."""
    old = pd.read_excel(file_old)
    new = pd.read_excel(file_new)

    # New rows (in new but not in old)
    added = new[~new[key_column].isin(old[key_column])]
    print(f"\n--- Added ({len(added)} rows) ---")
    print(added)

    # Removed rows (in old but not in new)
    removed = old[~old[key_column].isin(new[key_column])]
    print(f"\n--- Removed ({len(removed)} rows) ---")
    print(removed)

    # Changed rows (same key, different values)
    common_keys = set(old[key_column]) & set(new[key_column])
    old_common = old[old[key_column].isin(common_keys)].set_index(key_column).sort_index()
    new_common = new[new[key_column].isin(common_keys)].set_index(key_column).sort_index()

    changes = old_common.compare(new_common)
    if not changes.empty:
        print(f"\n--- Changed ({len(changes)} rows) ---")
        print(changes)
    else:
        print("\nNo changes in existing rows.")

compare_spreadsheets("last_month.xlsx", "this_month.xlsx")

Recipe 3: Split One File into Multiple Files by Column section

python

import pandas as pd
import os

def split_by_column(input_file, column, output_folder="split_output"):
    """Split an Excel file into separate files based on a column value."""
    df = pd.read_excel(input_file)
    os.makedirs(output_folder, exist_ok=True)

    for value in df[column].unique():
        subset = df[df[column] == value]
        safe_name = str(value).replace("/", "-").replace(" ", "_")
        output_path = os.path.join(output_folder, f"{safe_name}.xlsx")
        subset.to_excel(output_path, index=False)
        print(f"  {output_path} — {len(subset)} rows")

    print(f"\nSplit into {df[column].nunique()} files.")

split_by_column("sample_data.xlsx", "Department")
# Creates: split_output/Engineering.xlsx, split_output/Marketing.xlsx, split_output/Sales.xlsx

Recipe 4: Email-Ready Summary from Excel Data section

python

import pandas as pd

def generate_email_summary(input_file):
    """Generate a plain-text summary suitable for pasting into an email."""
    df = pd.read_excel(input_file)

    total_employees = len(df)
    active = len(df[df["Status"] == "Active"])
    total_payroll = df["Salary"].sum()
    avg_salary = df["Salary"].mean()

    dept_summary = df.groupby("Department").agg(
        Count=("Name", "count"),
        Avg_Salary=("Salary", "mean"),
    ).round(0)

    summary = f"""
EMPLOYEE REPORT SUMMARY
=======================
Total Employees: {total_employees}
Active: {active}
Total Payroll: ${total_payroll:,.0f}
Average Salary: ${avg_salary:,.0f}

BY DEPARTMENT:
{dept_summary.to_string()}
"""
    print(summary)
    return summary

generate_email_summary("sample_data.xlsx")

Common Tasks Cheat Sheet section

Excel Task	pandas Equivalent
Open a file	`pd.read_excel("file.xlsx")`
Save a file	`df.to_excel("file.xlsx", index=False)`
Filter rows	`df[df["Column"] == "value"]`
Sort	`df.sort_values("Column")`
Remove duplicates	`df.drop_duplicates()`
VLOOKUP	`df.merge(other_df, on="key")`
Pivot table	`df.groupby("Column").agg(...)`
SUMIF	`df[df["Col"] == "X"]["Amount"].sum()`
COUNTIF	`len(df[df["Col"] == "X"])`
CONCATENATE	`df["Full"] = df["First"] + " " + df["Last"]`
IF formula	`df["New"] = df["Col"].apply(lambda x: "A" if x > 50 else "B")`
Fill blanks	`df["Col"].fillna("default")`
Add new column	`df["New"] = df["Col"] * 1.1`
Read specific sheet	`pd.read_excel("file.xlsx", sheet_name="Sheet2")`
Read all sheets	`pd.read_excel("file.xlsx", sheet_name=None)` (returns dict)

Tip

Save this cheat sheet — it covers 90% of what most people need when moving from Excel to Python. When in doubt, search "pandas equivalent of [Excel function]" and you will find it.

Next Steps section

Start with one repetitive Excel task you do weekly and automate it
Use pandas for data analysis and openpyxl when you need precise cell formatting
Explore our Python for Beginners guide if any code in this article was unclear
Look into scheduling your Python scripts with Task Scheduler (Windows) or cron (Mac/Linux) to run them automatically

PythonExcelAutomationPandasopenpyxl

Coding & AutomationPython for Beginners: A Complete Getting Started GuideLearn Python from scratch. This beginner-friendly guide covers installation, core concepts, data types, control flow, functions, file handling, and your first real project — all with practical examples.Read Coding & AutomationPowerShell: Export All Office 365 Mailbox Sizes to CSVA script to export every mailbox size, item count, and archive size in your Microsoft 365 tenant to a CSV file for capacity planning and reporting.Read Coding & AutomationPowerShell: Automate AD & M365 User ProvisioningA comprehensive PowerShell script to create new users in Active Directory and assign Microsoft 365 licences, mailboxes, and group memberships in one step.Read

Was this article helpful?

Coding & Automation

Python for Excel: Automate Spreadsheets with Python

Learn how to read, write, analyse, and automate Excel spreadsheets with Python. Covers openpyxl for Excel files and pandas for data analysis, with practical copy-paste examples.

11 min readUpdated March 29, 2026

Overview section

If you spend hours every week doing the same thing in Excel — copying data between sheets, applying formulas, reformatting reports, merging files — Python can do it for you in seconds.

This guide covers two libraries:

Library	Best For	Think of It As
openpyxl	Reading and writing `.xlsx` files directly (formatting, cell-by-cell control)	A robot that clicks through Excel for you
pandas (3.x)	Analysing, filtering, transforming, and summarising data	Excel on steroids — pivot tables, VLOOKUP, and more in one line

You do not need to be a Python expert. If you have read our Python for Beginners guide, you have everything you need.

Setup section

Install the Libraries section

bash

pip install openpyxl pandas xlsxwriter

openpyxl — read and write .xlsx files with full formatting control
pandas — data analysis (uses openpyxl under the hood for Excel I/O). pandas 3.0 brought significant performance improvements with Copy-on-Write semantics enabled by default. For very large datasets, polars is still a faster alternative, but the gap has narrowed
xlsxwriter — alternative Excel writer with advanced formatting (optional)

Info

Create a Sample File section

Save this as sample_data.xlsx to follow along (or use any Excel file you already have):

Name	Department	Salary	Start Date	Status
Alice Johnson	Engineering	95000	2022-03-15	Active
Bob Smith	Marketing	72000	2021-07-01	Active
Carol White	Engineering	105000	2020-01-10	Active
Dave Brown	Sales	68000	2023-06-20	Inactive
Eve Davis	Marketing	78000	2022-11-05	Active
Frank Miller	Sales	71000	2021-09-14	Active
Grace Lee	Engineering	112000	2019-04-22	Active
Henry Wilson	Marketing	65000	2024-01-08	Active

Part 1: openpyxl — Direct Excel Control section

Reading an Excel File section

python

from openpyxl import load_workbook

# Open the file
wb = load_workbook("sample_data.xlsx")
ws = wb.active  # Get the active sheet

# Print every row
for row in ws.iter_rows(min_row=2, values_only=True):  # min_row=2 skips the header
    name, dept, salary, start_date, status = row
    print(f"{name} | {dept} | ${salary:,}")

Reading Specific Cells section

python

from openpyxl import load_workbook

wb = load_workbook("sample_data.xlsx")
ws = wb.active

# Read a specific cell
print(ws["A2"].value)  # "Alice Johnson"
print(ws["C2"].value)  # 95000

# Read a range
for row in ws["A2":"C5"]:
    for cell in row:
        print(cell.value, end=" | ")
    print()

Writing a New Excel File section

python

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Employees"

# Write headers
headers = ["Name", "Department", "Salary"]
ws.append(headers)

# Write data rows
data = [
    ["Alice Johnson", "Engineering", 95000],
    ["Bob Smith", "Marketing", 72000],
    ["Carol White", "Engineering", 105000],
]

for row in data:
    ws.append(row)

# Save
wb.save("new_report.xlsx")
print("File saved.")

Formatting Cells section

python

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side, numbers

wb = Workbook()
ws = wb.active
ws.title = "Formatted Report"

# Header row
headers = ["Name", "Department", "Salary", "Status"]
ws.append(headers)

# Style the header
header_font = Font(bold=True, color="FFFFFF", size=12)
header_fill = PatternFill(start_color="BE1E2D", end_color="BE1E2D", fill_type="solid")

for cell in ws[1]:
    cell.font = header_font
    cell.fill = header_fill
    cell.alignment = Alignment(horizontal="center")

# Add data
data = [
    ["Alice Johnson", "Engineering", 95000, "Active"],
    ["Bob Smith", "Marketing", 72000, "Active"],
    ["Dave Brown", "Sales", 68000, "Inactive"],
]

for row in data:
    ws.append(row)

# Format salary column as currency
for row in ws.iter_rows(min_row=2, min_col=3, max_col=3):
    for cell in row:
        cell.number_format = '$#,##0'

# Auto-adjust column widths
for col in ws.columns:
    max_length = max(len(str(cell.value or "")) for cell in col)
    ws.column_dimensions[col[0].column_letter].width = max_length + 4

# Conditional formatting — highlight "Inactive" in red
from openpyxl.styles import PatternFill
red_fill = PatternFill(start_color="FFE0E0", end_color="FFE0E0", fill_type="solid")
for row in ws.iter_rows(min_row=2):
    if row[3].value == "Inactive":
        for cell in row:
            cell.fill = red_fill

wb.save("formatted_report.xlsx")
print("Formatted report saved.")

Adding Formulas section

python

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Headers and data
ws.append(["Product", "Price", "Quantity", "Total"])
ws.append(["Widget A", 10.50, 100, "=B2*C2"])
ws.append(["Widget B", 25.00, 50, "=B3*C3"])
ws.append(["Widget C", 5.75, 200, "=B4*C4"])

# Summary row
ws.append([])
ws.append(["", "", "Grand Total:", "=SUM(D2:D4)"])

wb.save("formulas.xlsx")

Info

Formulas are saved as text in openpyxl. They are evaluated when you open the file in Excel, not by Python. If you need Python to calculate values, do the maths in Python and write the result.

Working with Multiple Sheets section

python

from openpyxl import Workbook

wb = Workbook()

# Rename default sheet
ws1 = wb.active
ws1.title = "Summary"

# Create additional sheets
ws2 = wb.create_sheet("Q1 Data")
ws3 = wb.create_sheet("Q2 Data")

# Write to each sheet
ws1.append(["Quarter", "Revenue"])
ws1.append(["Q1", "=SUM('Q1 Data'!B2:B100)"])
ws1.append(["Q2", "=SUM('Q2 Data'!B2:B100)"])

ws2.append(["Month", "Revenue"])
ws2.append(["January", 50000])
ws2.append(["February", 55000])
ws2.append(["March", 60000])

ws3.append(["Month", "Revenue"])
ws3.append(["April", 62000])
ws3.append(["May", 58000])
ws3.append(["June", 67000])

wb.save("multi_sheet.xlsx")

Part 2: pandas — Data Analysis Power section

pandas is where Python truly replaces Excel. It handles the tasks you would normally do with VLOOKUP, pivot tables, and complex formulas — but faster and reproducibly.

Reading an Excel File into pandas section

python

import pandas as pd

# Read the file — pandas creates a "DataFrame" (a table)
df = pd.read_excel("sample_data.xlsx")

# Preview the data
print(df.head())       # First 5 rows
print(df.shape)        # (rows, columns) — e.g. (8, 5)
print(df.columns)      # Column names
print(df.dtypes)       # Data types of each column
print(df.describe())   # Summary statistics for numeric columns

Filtering Data (Like Excel's Filter)section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Filter: only active employees
active = df[df["Status"] == "Active"]
print(active)

# Filter: Engineering department with salary > 100k
senior_eng = df[(df["Department"] == "Engineering") & (df["Salary"] > 100000)]
print(senior_eng)

# Filter: anyone in Marketing OR Sales
mkt_sales = df[df["Department"].isin(["Marketing", "Sales"])]
print(mkt_sales)

# Filter: names containing "son"
has_son = df[df["Name"].str.contains("son", case=False)]
print(has_son)

Sorting section

python

# Sort by salary (highest first)
sorted_df = df.sort_values("Salary", ascending=False)
print(sorted_df)

# Sort by department, then by salary within each department
sorted_df = df.sort_values(["Department", "Salary"], ascending=[True, False])
print(sorted_df)

Adding and Modifying Columns section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Add a new column — annual bonus (10% of salary)
df["Bonus"] = df["Salary"] * 0.10

# Add a column based on a condition
df["Pay Grade"] = df["Salary"].apply(lambda s: "Senior" if s >= 90000 else "Standard")

# Rename a column
df = df.rename(columns={"Start Date": "Hire Date"})

# Format a column
df["Salary Formatted"] = df["Salary"].apply(lambda s: f"${s:,.0f}")

print(df[["Name", "Salary Formatted", "Bonus", "Pay Grade"]])

Group By — Like Pivot Tables section

This is the pandas equivalent of Excel's pivot tables and SUMIFS.

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Average salary by department
dept_avg = df.groupby("Department")["Salary"].mean()
print(dept_avg)
# Department
# Engineering    104000.0
# Marketing       71666.7
# Sales           69500.0

# Multiple aggregations
summary = df.groupby("Department").agg(
    Headcount=("Name", "count"),
    Avg_Salary=("Salary", "mean"),
    Max_Salary=("Salary", "max"),
    Min_Salary=("Salary", "min"),
    Total_Payroll=("Salary", "sum"),
).round(0)

print(summary)

VLOOKUP Equivalent — Merging DataFrames section

python

import pandas as pd

# Main employee data
employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Dept_ID": [101, 102, 101],
})

# Department lookup table
departments = pd.DataFrame({
    "Dept_ID": [101, 102, 103],
    "Dept_Name": ["Engineering", "Marketing", "Sales"],
    "Manager": ["Jane", "Tom", "Sarah"],
})

# VLOOKUP: merge on Dept_ID
result = employees.merge(departments, on="Dept_ID", how="left")
print(result)
#     Name  Dept_ID    Dept_Name Manager
# 0  Alice      101  Engineering    Jane
# 1    Bob      102    Marketing     Tom
# 2  Carol      101  Engineering    Jane

Combining Multiple Excel Files section

python

import pandas as pd
import glob

# Read all .xlsx files in a folder and combine them
files = glob.glob("reports/*.xlsx")

all_data = []
for file in files:
    df = pd.read_excel(file)
    df["Source File"] = file  # Track which file each row came from
    all_data.append(df)

combined = pd.concat(all_data, ignore_index=True)
print(f"Combined {len(files)} files — {len(combined)} total rows")

# Save the combined data
combined.to_excel("combined_report.xlsx", index=False)

Tip

This is one of the most common time-savers. If you receive 12 monthly reports as separate Excel files and need to combine them, this script replaces an hour of manual copy-paste.

Removing Duplicates section

python

# Remove exact duplicate rows
df_clean = df.drop_duplicates()

# Remove duplicates based on specific columns
df_clean = df.drop_duplicates(subset=["Name", "Department"])

# Keep the last occurrence instead of the first
df_clean = df.drop_duplicates(subset=["Name"], keep="last")

print(f"Removed {len(df) - len(df_clean)} duplicates")

Handling Missing Data section

python

# Find missing values
print(df.isnull().sum())  # Count of missing values per column

# Fill missing values
df["Department"] = df["Department"].fillna("Unknown")
df["Salary"] = df["Salary"].fillna(df["Salary"].median())

# Drop rows with any missing values
df_clean = df.dropna()

# Drop rows where a specific column is missing
df_clean = df.dropna(subset=["Email"])

Writing Back to Excel section

python

import pandas as pd

df = pd.read_excel("sample_data.xlsx")

# Basic write
df.to_excel("output.xlsx", index=False)

# Write multiple sheets to one file
with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as writer:
    # Active employees on one sheet
    active = df[df["Status"] == "Active"]
    active.to_excel(writer, sheet_name="Active", index=False)

    # Inactive on another
    inactive = df[df["Status"] == "Inactive"]
    inactive.to_excel(writer, sheet_name="Inactive", index=False)

    # Summary on a third
    summary = df.groupby("Department").agg(
        Count=("Name", "count"),
        Avg_Salary=("Salary", "mean"),
    ).round(0)
    summary.to_excel(writer, sheet_name="Summary")

print("Report saved with 3 sheets.")

Part 3: Real-World Recipes section

Recipe 1: Monthly Report Formatter section

Takes raw data and produces a formatted Excel report with a summary sheet.

python

import pandas as pd

def format_monthly_report(input_file, output_file):
    """Read raw data and create a formatted monthly report."""
    df = pd.read_excel(input_file)

    with pd.ExcelWriter(output_file, engine="xlsxwriter") as writer:
        # Raw data sheet
        df.to_excel(writer, sheet_name="Data", index=False)

        # Summary sheet
        summary = df.groupby("Department").agg(
            Employees=("Name", "count"),
            Total_Payroll=("Salary", "sum"),
            Average_Salary=("Salary", "mean"),
        ).round(0)
        summary.to_excel(writer, sheet_name="Summary")

        # Format the summary sheet
        workbook = writer.book
        worksheet = writer.sheets["Summary"]

        money_fmt = workbook.add_format({"num_format": "$#,##0"})
        worksheet.set_column("B:D", 18, money_fmt)

        header_fmt = workbook.add_format({
            "bold": True,
            "bg_color": "#BE1E2D",
            "font_color": "#FFFFFF",
            "border": 1,
        })
        for col_num, value in enumerate(["Department"] + list(summary.columns)):
            worksheet.write(0, col_num, value, header_fmt)

    print(f"Report saved to {output_file}")

format_monthly_report("sample_data.xlsx", "monthly_report.xlsx")

Recipe 2: Find Differences Between Two Spreadsheets section

python

import pandas as pd

def compare_spreadsheets(file_old, file_new, key_column="Name"):
    """Compare two spreadsheets and show what changed."""
    old = pd.read_excel(file_old)
    new = pd.read_excel(file_new)

    # New rows (in new but not in old)
    added = new[~new[key_column].isin(old[key_column])]
    print(f"\n--- Added ({len(added)} rows) ---")
    print(added)

    # Removed rows (in old but not in new)
    removed = old[~old[key_column].isin(new[key_column])]
    print(f"\n--- Removed ({len(removed)} rows) ---")
    print(removed)

    # Changed rows (same key, different values)
    common_keys = set(old[key_column]) & set(new[key_column])
    old_common = old[old[key_column].isin(common_keys)].set_index(key_column).sort_index()
    new_common = new[new[key_column].isin(common_keys)].set_index(key_column).sort_index()

    changes = old_common.compare(new_common)
    if not changes.empty:
        print(f"\n--- Changed ({len(changes)} rows) ---")
        print(changes)
    else:
        print("\nNo changes in existing rows.")

compare_spreadsheets("last_month.xlsx", "this_month.xlsx")

Recipe 3: Split One File into Multiple Files by Column section

python

import pandas as pd
import os

def split_by_column(input_file, column, output_folder="split_output"):
    """Split an Excel file into separate files based on a column value."""
    df = pd.read_excel(input_file)
    os.makedirs(output_folder, exist_ok=True)

    for value in df[column].unique():
        subset = df[df[column] == value]
        safe_name = str(value).replace("/", "-").replace(" ", "_")
        output_path = os.path.join(output_folder, f"{safe_name}.xlsx")
        subset.to_excel(output_path, index=False)
        print(f"  {output_path} — {len(subset)} rows")

    print(f"\nSplit into {df[column].nunique()} files.")

split_by_column("sample_data.xlsx", "Department")
# Creates: split_output/Engineering.xlsx, split_output/Marketing.xlsx, split_output/Sales.xlsx

Recipe 4: Email-Ready Summary from Excel Data section

python

import pandas as pd

def generate_email_summary(input_file):
    """Generate a plain-text summary suitable for pasting into an email."""
    df = pd.read_excel(input_file)

    total_employees = len(df)
    active = len(df[df["Status"] == "Active"])
    total_payroll = df["Salary"].sum()
    avg_salary = df["Salary"].mean()

    dept_summary = df.groupby("Department").agg(
        Count=("Name", "count"),
        Avg_Salary=("Salary", "mean"),
    ).round(0)

    summary = f"""
EMPLOYEE REPORT SUMMARY
=======================
Total Employees: {total_employees}
Active: {active}
Total Payroll: ${total_payroll:,.0f}
Average Salary: ${avg_salary:,.0f}

BY DEPARTMENT:
{dept_summary.to_string()}
"""
    print(summary)
    return summary

generate_email_summary("sample_data.xlsx")

Common Tasks Cheat Sheet section

Excel Task	pandas Equivalent
Open a file	`pd.read_excel("file.xlsx")`
Save a file	`df.to_excel("file.xlsx", index=False)`
Filter rows	`df[df["Column"] == "value"]`
Sort	`df.sort_values("Column")`
Remove duplicates	`df.drop_duplicates()`
VLOOKUP	`df.merge(other_df, on="key")`
Pivot table	`df.groupby("Column").agg(...)`
SUMIF	`df[df["Col"] == "X"]["Amount"].sum()`
COUNTIF	`len(df[df["Col"] == "X"])`
CONCATENATE	`df["Full"] = df["First"] + " " + df["Last"]`
IF formula	`df["New"] = df["Col"].apply(lambda x: "A" if x > 50 else "B")`
Fill blanks	`df["Col"].fillna("default")`
Add new column	`df["New"] = df["Col"] * 1.1`
Read specific sheet	`pd.read_excel("file.xlsx", sheet_name="Sheet2")`
Read all sheets	`pd.read_excel("file.xlsx", sheet_name=None)` (returns dict)

Tip

Save this cheat sheet — it covers 90% of what most people need when moving from Excel to Python. When in doubt, search "pandas equivalent of [Excel function]" and you will find it.

Next Steps section

Start with one repetitive Excel task you do weekly and automate it
Use pandas for data analysis and openpyxl when you need precise cell formatting
Explore our Python for Beginners guide if any code in this article was unclear
Look into scheduling your Python scripts with Task Scheduler (Windows) or cron (Mac/Linux) to run them automatically

PythonExcelAutomationPandasopenpyxl

Was this article helpful?

Related Articles

Related Articles