Installation and Configuration #

System Requirements #

Supported Python Versions #

Scikit-learn version    Python version
1.5+                    3.9 - 3.12
1.3 - 1.4               3.8 - 3.12
1.0 - 1.2               3.8 - 3.11
0.24                    3.7 - 3.10
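
As a quick sanity check against the table above, the running interpreter can be compared with a minimum version. This is a minimal sketch: the helper name `python_is_supported` is hypothetical, and the default floor of 3.9 is simply the lower bound the table lists for the newest releases.

```python
import sys


def python_is_supported(minimum=(3, 9), current=None):
    """Return True if the interpreter version is at least `minimum`."""
    if current is None:
        # Compare only (major, minor); the patch level does not matter here.
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


major, minor = sys.version_info[:2]
print(f"Python {major}.{minor} meets the minimum: {python_is_supported()}")
```

Passing a different `minimum`, for example `(3, 8)`, checks against one of the older rows instead.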

Hardware Requirements #

Requirement    Minimum      Recommended
Memory         4 GB         8 GB+
CPU            Dual-core    Quad-core+
Storage        1 GB         5 GB+
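
The CPU and storage rows of the table above can be checked from Python with the standard library alone. The sketch below assumes the minimum values from the table (two cores, 1 GB of free disk space); the function name `check_hardware` is hypothetical. Memory is left out because the standard library has no portable way to query it.

```python
import os
import shutil


def check_hardware(min_cores=2, min_free_gb=1.0, path="."):
    """Compare CPU count and free disk space with the given minimums."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cores": cores,
        "free_gb": round(free_gb, 1),
        "cores_ok": cores >= min_cores,
        "disk_ok": free_gb >= min_free_gb,
    }


print(check_hardware())
```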

Installation Methods #

Method 1: Install with pip (recommended) #

bash
pip install scikit-learn

To install a specific version:

bash
pip install scikit-learn==1.4.0

Method 2: Install with conda #

bash
conda install scikit-learn

Method 3: Install from source #

bash
git clone https://github.com/scikit-learn/scikit-learn.git
cd scikit-learn
pip install -e .

Dependency Management #

Core Dependencies #

text
scikit-learn
├── numpy >= 1.17.3
├── scipy >= 1.5.0
├── joblib >= 1.1.1
├── threadpoolctl >= 2.0.0
└── matplotlib (optional, for visualization)
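
To see whether an existing environment satisfies these minimums, the installed versions can be read with `importlib.metadata` from the standard library. The following is a rough sketch: the minimum versions are copied from the tree above, the helper names `as_tuple` and `check_dependencies` are hypothetical, and the version comparison is deliberately simplified (it looks only at the leading integers of each component).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency tree above.
MINIMUMS = {
    "numpy": "1.17.3",
    "scipy": "1.5.0",
    "joblib": "1.1.1",
    "threadpoolctl": "2.0.0",
}


def as_tuple(ver):
    """Turn '1.24.3' or '1.5.0rc1' into a tuple of at least three integers."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(match.group()))
    # Pad so that '2.0' compares equal to '2.0.0'.
    return tuple(parts + [0] * max(0, 3 - len(parts)))


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (...)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


for package, status in check_dependencies().items():
    print(f"{package}: {status}")
```

For anything beyond a quick check, the `packaging` library's `Version` class handles pre-releases and other edge cases more faithfully than this tuple comparison.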

Install the Full Data Science Stack #

bash
pip install numpy scipy matplotlib pandas scikit-learn

Or with Anaconda:

bash
conda install numpy scipy matplotlib pandas scikit-learn

Virtual Environment Setup #

Using venv #

bash
python -m venv sklearn-env
source sklearn-env/bin/activate  # Linux/macOS
sklearn-env\Scripts\activate     # Windows
pip install scikit-learn
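
After activation it can be useful to confirm that Python really is running from the new environment. A minimal sketch, assuming an environment created with `venv` as above: inside such an environment `sys.prefix` differs from `sys.base_prefix`. The helper name `in_virtualenv` is hypothetical, and conda environments are generally not detected by this check.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```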

Using conda #

bash
conda create -n sklearn-env python=3.10
conda activate sklearn-env
conda install scikit-learn

Using pipenv #

bash
pipenv install scikit-learn
pipenv shell

Verifying the Installation #

Check the Version #

python
import sklearn
print(sklearn.__version__)

Run a Quick Test #

python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Check the Configuration #

python
import sklearn
sklearn.show_versions()  # prints the report itself and returns None

Example output:

text
System:
    python: 3.10.12
executable: /usr/bin/python3
   machine: Linux-5.15.0-x86_64

Python dependencies:
          sklearn: 1.4.0
              pip: 23.2.1
       setuptools: 68.0.0
            numpy: 1.24.3
            scipy: 1.11.1
           Cython: None
           pandas: 2.0.3
       matplotlib: 3.7.2
           joblib: 1.3.1
    threadpoolctl: 3.2.0

Development Tool Setup #

Jupyter Notebook #

bash
pip install jupyter
jupyter notebook

JupyterLab #

bash
pip install jupyterlab
jupyter lab

VS Code Setup #

Install these extensions:

  • Python
  • Pylance
  • Jupyter

Then configure settings.json:

json
{
    "python.defaultInterpreterPath": "${workspaceFolder}/sklearn-env/bin/python",
    "python.analysis.typeCheckingMode": "basic"
}

PyCharm Setup #

  1. Open Settings → Project → Python Interpreter
  2. Add the virtual environment
  3. Install scikit-learn

Common Installation Problems #

Problem 1: numpy version conflict #

bash
pip install --upgrade numpy
pip install --upgrade scikit-learn

Problem 2: Build errors #

bash
pip install --upgrade pip setuptools wheel
pip install scikit-learn

Problem 3: Permission errors #

bash
pip install --user scikit-learn

Problem 4: Network problems #

Use a mirror in mainland China:

bash
pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple

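Performance Tuning #

Using OpenBLAS #

The numpy wheels on PyPI typically bundle OpenBLAS. The second command below prints which BLAS library numpy is linked against.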

bash
pip install scikit-learn
python -c "import numpy; numpy.show_config()"

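Set the Number of Threads #

python
import os
# Set these before importing numpy or sklearn; the libraries read them at load time.
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'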
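
These variables are generally read once, when the numerical libraries are first loaded, so setting them after `import numpy` may have no effect. The sketch below wraps the assignment in a hypothetical helper, `limit_threads`, that also reports whether numpy was already imported. Including `OPENBLAS_NUM_THREADS` is an addition here, on the assumption that numpy may be linked against OpenBLAS.

```python
import os
import sys

THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_threads(n):
    """Set the thread-count variables; return False if numpy was already imported."""
    too_late = "numpy" in sys.modules
    for var in THREAD_VARS:
        os.environ[var] = str(n)  # environment values must be strings
    return not too_late


if not limit_threads(4):
    print("numpy is already imported; the new limits may not take effect")
```

To change limits after the libraries are loaded, the `threadpoolctl` package listed among the core dependencies provides a `threadpool_limits` context manager.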

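Parallelism with joblib #

python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# n_jobs=-1 uses all available CPU cores
model = RandomForestClassifier(n_jobs=-1)
scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)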
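
The meaning of `n_jobs` is easy to misread, especially for negative values. The sketch below approximates the rule given in the joblib documentation: `-1` means all cores, and values below `-1` mean `n_cpus + 1 + n_jobs` workers. The function `resolve_n_jobs` is a hypothetical illustration, not joblib's implementation; joblib ships its own `effective_n_jobs` for real use.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Approximate how many workers a given n_jobs value requests."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # default: run sequentially
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all cores, -2 -> all but one, and so on (never below 1)
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


for value in (None, 1, -1, -2):
    print(value, "->", resolve_n_jobs(value, n_cpus=8))
```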

Development Environment Best Practices #

Project Structure #

text
my-ml-project/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── notebooks/
│   └── exploration.ipynb
├── src/
│   ├── __init__.py
│   ├── data.py
│   ├── features.py
│   └── models.py
├── tests/
│   └── test_models.py
├── requirements.txt
├── setup.py
└── README.md
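
A layout like this can be created by hand or with a few lines of `pathlib`. The sketch below is one possible way to do it: the helper name `scaffold` is hypothetical, files are created empty, and the notebook is skipped because an empty file is not a valid `.ipynb`. The demonstration writes into a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks", "src", "tests",
]
FILES = [
    "src/__init__.py", "src/data.py", "src/features.py", "src/models.py",
    "tests/test_models.py", "requirements.txt", "setup.py", "README.md",
]


def scaffold(root):
    """Create the directory tree and empty placeholder files under `root`."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)  # leaves existing content untouched
    return root


with tempfile.TemporaryDirectory() as tmp:
    project = scaffold(Path(tmp) / "my-ml-project")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project).as_posix())
```

Calling `scaffold("my-ml-project")` directly would create the tree in the current working directory instead.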

requirements.txt #

text
numpy>=1.24.0
scipy>=1.10.0
scikit-learn>=1.4.0
pandas>=2.0.0
matplotlib>=3.7.0
jupyter>=1.0.0

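Using Configuration Settings #

python
from sklearn import set_config

set_config(display='diagram')          # show estimators as HTML diagrams in notebooks
set_config(transform_output='pandas')  # transformers return DataFrames (scikit-learn 1.2+, requires pandas)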

Docker Environment #

Dockerfile #

dockerfile
FROM python:3.10-slim

WORKDIR /app

RUN pip install --no-cache-dir \
    numpy \
    scipy \
    scikit-learn \
    pandas \
    matplotlib \
    jupyter

EXPOSE 8888

CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]

Build and Run #

bash
docker build -t sklearn-env .
docker run -p 8888:8888 sklearn-env

Next Steps #

With the environment set up, continue to Core Concepts to learn the basic design principles of Scikit-learn.

Last updated: 2026-04-04