Installation and Configuration

System Requirements

Basic Requirements

text

┌─────────────────────────────────────────────────────────────┐
│                 Pandas System Requirements                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Python version: Python 3.9+                                │
│                                                             │
│  Operating systems:                                         │
│  ✅ Windows 10/11                                           │
│  ✅ macOS 10.14+                                            │
│  ✅ Linux (Ubuntu, CentOS, etc.)                            │
│                                                             │
│  Memory: 4 GB+ recommended                                  │
│  Disk: at least 500 MB free                                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Dependencies

Dependency	Purpose
NumPy	Foundation for numerical computation
python-dateutil	Date parsing and handling
pytz	Time zone support
tzdata	IANA time zone database
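
A small sketch using the standard library's `importlib.metadata` shows which of these dependencies are installed and at what version. The distribution names in `REQUIRED` come from the table above. The exact required set varies between pandas releases, so treat the list as an assumption and compare it with the release notes for your version.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency table (assumed list;
# the required set can change between pandas releases)
REQUIRED = ("numpy", "python-dateutil", "pytz", "tzdata")


def dependency_versions(names=REQUIRED):
    """Return {distribution name: installed version, or None if missing}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in dependency_versions().items():
        print(f"{name:16} {ver or 'NOT INSTALLED'}")
```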

Installation Methods

Option 1: Install with pip

bash

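# Basic installation
pip install pandas

# Install a specific version
pip install pandas==2.0.0

# Install all optional dependencies (quote the extras so shells such as zsh do not expand the brackets)
pip install "pandas[all]"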

Option 2: Install with conda

bash

# With Anaconda/Miniconda
conda install pandas

# Install a specific version
conda install pandas=2.0

# Install from the conda-forge channel
conda install -c conda-forge pandas

Option 3: Install from source

bash

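# Clone the repository
git clone https://github.com/pandas-dev/pandas.git
cd pandas

# Build and install an editable development version
# (requires a C compiler and the pandas build dependencies; see the pandas contributing guide)
python -m pip install -ve .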

Virtual Environment Setup

Using venv

bash

# Create the virtual environment
python -m venv pandas_env

# Activate it
# Windows
pandas_env\Scripts\activate
# macOS/Linux
source pandas_env/bin/activate

# Install pandas
pip install pandas
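
You can check from inside Python whether the running interpreter belongs to the environment you just created. The sketch below relies on two facts. In a venv, `sys.prefix` differs from `sys.base_prefix`. An activated conda environment sets the `CONDA_PREFIX` variable. The function name `active_environment` is made up for this example.

```python
import os
import sys


def active_environment():
    """Describe which kind of environment the current interpreter runs in."""
    # venv/virtualenv: sys.prefix points into the env, base_prefix to the base Python
    if sys.prefix != sys.base_prefix:
        return f"venv: {sys.prefix}"
    # conda sets CONDA_PREFIX when an environment (including base) is activated
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        return f"conda: {conda_prefix}"
    return f"system/base interpreter: {sys.prefix}"


print(active_environment())
print(sys.executable)  # full path of the python binary actually running
```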

Using conda

bash

# Create a new environment
conda create -n pandas_env python=3.11

# Activate the environment
conda activate pandas_env

# Install pandas
conda install pandas

Using Poetry

bash

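# Create a project
poetry new my_pandas_project
cd my_pandas_project

# Add pandas as a dependency
poetry add pandas

# Enter the virtual environment
poetry shell          # Poetry 1.x; Poetry 2.x needs the poetry-plugin-shell plugin for this
# Works in both versions: run commands inside the environment
poetry run python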

Verifying the Installation

Check the installation

python

import pandas as pd

# Print the installed version
print(pd.__version__)

# Print pandas, Python, OS, and dependency versions
pd.show_versions()
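
When a project needs a minimum pandas version, comparing version strings as text gives wrong answers (`"10.0" < "9.0"` is true). The minimal sketch below compares numeric tuples instead, and its helper names are hypothetical. For full PEP 440 handling, including pre-releases and local versions, the third-party `packaging.version.Version` class is the more robust choice.

```python
import re


def version_tuple(text):
    """Extract the leading (major, minor, patch) numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch component counts as 0
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(installed, minimum):
    """True if the installed version string is >= the minimum version tuple."""
    return version_tuple(installed) >= tuple(minimum)
```

Usage: `meets_minimum(pd.__version__, (2, 0))` is true on pandas 2.0 or later. Because only the leading numbers are read, a dev build such as `3.0.0.dev0` compares equal to `3.0.0`.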

Test basic functionality

python

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
})

print(df)

# Basic operations
print(df.head())
print(df.describe())
df.info()  # info() prints its summary itself and returns None
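
The prints above have to be checked by eye. The small sketch below turns the same kind of checks into assertions, so a broken installation fails immediately. The function name `smoke_test` is hypothetical.

```python
import pandas as pd


def smoke_test():
    """Run a few basic operations and fail loudly if a result is unexpected."""
    df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
    assert df.shape == (3, 2)
    assert df["A"].sum() == 6
    assert df["A"].mean() == 2.0
    # Boolean filtering combined with label-based column selection
    assert df.loc[df["A"] > 1, "B"].tolist() == ["b", "c"]
    # describe() covers numeric columns only by default, so it summarises "A"
    assert df.describe().loc["count", "A"] == 3
    return df


if __name__ == "__main__":
    smoke_test()
    print("pandas", pd.__version__, "basic checks passed")
```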

Development Tool Setup

Jupyter Notebook

bash

# Install Jupyter
pip install jupyter

# Install JupyterLab
pip install jupyterlab

# Launch
jupyter notebook
# or
jupyter lab

VS Code Configuration

json

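// settings.json (on Windows the interpreter is ./pandas_env/Scripts/python.exe)
// Formatting with Black requires the Black Formatter extension (ms-python.black-formatter)
{
    "python.defaultInterpreterPath": "./pandas_env/bin/python",
    "python.analysis.typeCheckingMode": "basic",
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true
    }
}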

PyCharm Configuration

Open Settings → Project → Python Interpreter
Add the virtual environment as the project interpreter
Install the pandas package
Configure the code style (use Black)

Common Extension Libraries

Data Visualization

bash

# Matplotlib
pip install matplotlib

# Seaborn
pip install seaborn

# Plotly
pip install plotly

Data Processing Extras

bash

# OpenPyXL - Excel (.xlsx) support
pip install openpyxl

# SQLAlchemy - database support
pip install sqlalchemy

# PyArrow - fast I/O (Parquet, Feather) and Arrow-backed data types
pip install pyarrow

Performance

bash

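# NumExpr - speeds up large numerical expressions (also used by eval and query)
pip install numexpr

# Bottleneck - speeds up NaN-aware reductions such as sum and mean
pip install bottleneck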
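
`importlib.util.find_spec` can confirm which of these optional libraries the current environment can import, without actually importing them. `pd.show_versions()` reports the same information in more detail. The sketch below just returns a dictionary you can check in scripts. The module list mirrors the `pip install` commands above and uses import names, which in general can differ from pip names.

```python
from importlib.util import find_spec

# Import names of the optional libraries installed above, with their role
OPTIONAL = {
    "matplotlib": "plotting",
    "seaborn": "statistical plots",
    "plotly": "interactive plots",
    "openpyxl": "Excel .xlsx I/O",
    "sqlalchemy": "SQL databases",
    "pyarrow": "Parquet/Arrow I/O",
    "numexpr": "faster numerical expressions",
    "bottleneck": "faster NaN-aware reductions",
}


def optional_status(modules=OPTIONAL):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in optional_status().items():
        print(f"{name:12} {'installed' if ok else 'missing'}  ({OPTIONAL[name]})")
```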

Configuration Options

Display options

python

import pandas as pd

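# Set display options
pd.set_option('display.max_rows', 100)        # maximum rows shown
pd.set_option('display.max_columns', 50)      # maximum columns shown
pd.set_option('display.width', 1000)          # display width in characters
pd.set_option('display.precision', 2)         # decimal places
pd.set_option('display.float_format', '{:.2f}'.format)  # float formatting callable

# Show each display.* option with its description, default, and current value
pd.describe_option('display')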
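
Options set with `pd.set_option` stay in effect for the rest of the session. For a one-off change, `pd.option_context` restores the previous values when the `with` block ends, and `pd.reset_option` returns an option to its default. Here is a short sketch with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"x": range(200)})

# Temporarily change options; the previous values are restored on exit
with pd.option_context("display.max_rows", 10, "display.precision", 3):
    print(df)                                  # truncated view
    print(pd.get_option("display.max_rows"))   # 10

print(pd.get_option("display.max_rows"))       # the value from before the with block

# Undo a permanent set_option() call by restoring the default
pd.set_option("display.max_rows", 100)
pd.reset_option("display.max_rows")
```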

Computation options

python

import pandas as pd

# Computation-related options
pd.set_option('mode.chained_assignment', 'warn')  # warn on chained assignment (pandas 2.x)
pd.set_option('compute.use_numexpr', True)        # use NumExpr when it is installed
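
The `mode.chained_assignment` option concerns code that indexes twice and then assigns, such as `df[mask]["col"] = value`. The second indexing step works on a temporary object, so the write does not reach `df`. The sketch below uses a made-up DataFrame to show the single-step `.loc` form that avoids the problem:

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30], "flag": [0, 0, 0]})

# Chained assignment (commented out): df[mask] builds a temporary object,
# so pandas emits a warning and df itself is not updated.
# df[df["price"] > 15]["flag"] = 1

# Single .loc call: select rows and column in one step, modifying df itself
df.loc[df["price"] > 15, "flag"] = 1
print(df["flag"].tolist())  # [0, 1, 1]
```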

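Persisting options across sessions

python

# pandas has no configuration file; options only live in the running session.
# To apply them in every IPython/Jupyter session, put set_option() calls in an
# IPython startup script, e.g. ~/.ipython/profile_default/startup/00-pandas.py:
import pandas as pd

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)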

Troubleshooting

Issue 1: Slow downloads

bash

# Use a mirror in mainland China (Tsinghua University)
pip install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple

# Or set the mirror permanently
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

Issue 2: Dependency conflicts

bash

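# Isolate the project in a fresh virtual environment
python -m venv clean_env
source clean_env/bin/activate   # Windows: clean_env\Scripts\activate
pip install pandas

# Or check installed packages for incompatible requirements (built into pip)
pip check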

Issue 3: Reading or writing Excel files fails

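bash

# Install the engines pandas uses for Excel files
# openpyxl: .xlsx read/write; xlrd: legacy .xls, read-only
# (xlwt is unmaintained and no longer supported as of pandas 2.0)
pip install openpyxl xlrd

python

import pandas as pd

# Read an .xlsx file with the openpyxl engine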
df = pd.read_excel('file.xlsx', engine='openpyxl')
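
Which engine `read_excel` needs depends on the file format. The lookup table below is a sketch based on the engines pandas documents for `read_excel` (`openpyxl`, `xlrd`, `pyxlsb`, `odf`). The helper name `excel_engine_for` is hypothetical. Newer pandas releases also offer a `calamine` engine, which is not included here.

```python
from pathlib import Path

# File extension -> (read_excel engine, pip package that provides it)
EXCEL_ENGINES = {
    ".xlsx": ("openpyxl", "openpyxl"),
    ".xlsm": ("openpyxl", "openpyxl"),
    ".xls": ("xlrd", "xlrd"),
    ".xlsb": ("pyxlsb", "pyxlsb"),
    ".ods": ("odf", "odfpy"),
}


def excel_engine_for(path):
    """Return (engine name, pip package) for a spreadsheet file path."""
    suffix = Path(path).suffix.lower()
    try:
        return EXCEL_ENGINES[suffix]
    except KeyError:
        raise ValueError(f"not an Excel/ODS file: {path}") from None
```

Usage: `engine, package = excel_engine_for("file.xlsx")` tells you to `pip install openpyxl` and pass `engine="openpyxl"` to `pd.read_excel`.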

Issue 4: Running out of memory

python

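import pandas as pd

# Read in chunks; process() stands for your own per-chunk function (hypothetical)
with pd.read_csv('large.csv', chunksize=10000) as reader:
    for chunk in reader:
        process(chunk)

# Specify smaller dtypes up front to save memory
dtypes = {'col1': 'int32', 'col2': 'category'}
df = pd.read_csv('data.csv', dtype=dtypes)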
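
Chunked reading only saves memory if each chunk is reduced before the next one is read. The sketch below aggregates each chunk and combines the partial results. An in-memory CSV with made-up values stands in for `large.csv`.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a large file on disk (made-up values)
csv_text = "city,sales\nA,10\nB,20\nA,5\nC,7\nB,3\n"

totals = None
# chunksize=2 yields DataFrames of at most 2 rows each
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        partial = chunk.groupby("city")["sales"].sum()
        # Merge per-chunk sums; fill_value=0 covers cities absent from one side
        totals = partial if totals is None else totals.add(partial, fill_value=0)

print(totals.sort_index())
```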

Issue 5: Chinese characters not rendering in plots

python

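import matplotlib.pyplot as plt

# Use a font with Chinese glyphs. Matplotlib picks the first installed font in this
# list: SimHei is common on Windows, Arial Unicode MS on macOS.
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False  # use an ASCII hyphen for minus signs; many CJK fonts lack the Unicode minus glyph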
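
The Chinese fonts available differ from machine to machine. The sketch below asks Matplotlib's font manager for the installed fonts and picks the first match from a candidate list. The candidate font names and both helper names are assumptions for illustration, so substitute fonts that exist on your system.

```python
# Assumed candidate fonts: Windows, macOS, and Linux (Noto) names
CANDIDATES = ("SimHei", "Microsoft YaHei", "PingFang SC", "Arial Unicode MS", "Noto Sans CJK SC")


def pick_cjk_font(candidates=CANDIDATES, available=None):
    """Return the first candidate font that is installed, or None."""
    if available is None:
        # Names of every font Matplotlib has indexed on this machine
        from matplotlib import font_manager
        available = {f.name for f in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in available:
            return name
    return None


def apply_cjk_font(candidates=CANDIDATES):
    """Put the first installed candidate at the front of the sans-serif list."""
    import matplotlib.pyplot as plt
    font = pick_cjk_font(candidates)
    if font is not None:
        plt.rcParams["font.sans-serif"] = [font] + plt.rcParams["font.sans-serif"]
        plt.rcParams["axes.unicode_minus"] = False
    return font
```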

Performance Tuning

Memory optimization

python

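# df is an existing DataFrame with 'col' and 'category' columns
# Use smaller dtypes (check the value range first: int32 holds about ±2.1 billion)
df['col'] = df['col'].astype('int32')  # instead of int64

# Use the category dtype for columns with few distinct values
df['category'] = df['category'].astype('category')

# Inspect memory usage
df.info(memory_usage='deep')
df.memory_usage(deep=True)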
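
This minimal sketch uses made-up data to measure how much these conversions save. It calls `pd.to_numeric(..., downcast="integer")`, which picks the smallest integer dtype that can hold every value, rather than a fixed `astype('int32')`:

```python
import pandas as pd

df = pd.DataFrame({
    "count": [1, 2, 3] * 1000,                            # small ints, int64 by default
    "city": ["Beijing", "Shanghai", "Shenzhen"] * 1000,   # heavily repeated strings
})

before = df.memory_usage(deep=True).sum()

# Smallest integer dtype that fits every value (int8 for 1..3)
df["count"] = pd.to_numeric(df["count"], downcast="integer")
# Few distinct values: store each string once plus small integer codes
df["city"] = df["city"].astype("category")

after = df.memory_usage(deep=True).sum()
print(df.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```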

Parallel processing

python

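# Install swifter first (shell command, not Python):
#   pip install swifter

import swifter  # registers the .swifter accessor on Series and DataFrame

# df is an existing DataFrame; swifter chooses between a vectorized call,
# Dask-based parallel processing, or a plain pandas apply
df['new_col'] = df['col'].swifter.apply(lambda x: x * 2)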
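
swifter is a third-party package. For simple element-wise arithmetic like `x * 2`, plain vectorized pandas is often enough and needs no extra dependency. The sketch below compares `apply` with the vectorized form on made-up data. Note that this is vectorization, which is a different technique from parallel processing.

```python
import pandas as pd

df = pd.DataFrame({"col": range(100_000)})

# Row by row: calls the Python lambda once per element
applied = df["col"].apply(lambda x: x * 2)

# Vectorized: a single array-level operation over the whole column
vectorized = df["col"] * 2

assert applied.equals(vectorized)  # same values and dtype
df["new_col"] = vectorized
```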

Development Environment Checklist

text

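✅ Check the Python version
   python --version

✅ Check the pip version
   pip --version

✅ Confirm the virtual environment is active
   which python  # macOS/Linux
   where python  # Windows

✅ Check the pandas installation
   python -c "import pandas; print(pandas.__version__)"

✅ Check dependencies
   pip list | grep numpy      # macOS/Linux
   pip list | findstr numpy   # Windows

✅ Check Jupyter
   jupyter --version

✅ Test basic functionality
   Run the code from "Test basic functionality" above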

Next Steps

With the environment configured, move on to Series Basics and start learning pandas' core data structures.