Installation and configuration #

System requirements #

Supported Python versions #

Scikit-learn version	Python version
1.7+	3.10+
1.5 - 1.6	3.9 - 3.13
1.4	3.9 - 3.12
1.1 - 1.3	3.8 - 3.12
1.0	3.7 - 3.10
0.24	3.6 - 3.9
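
The table lookup can be scripted as a quick pre-install check. The sketch below compares an interpreter version against a supported range; the function name `python_in_range` and the example bounds are assumptions for illustration, so substitute the bounds from the table row that matches your release.

```python
import sys


def python_in_range(version, minimum, maximum):
    """Return True if the (major, minor) part of `version` lies in [minimum, maximum]."""
    major_minor = tuple(version[:2])
    return tuple(minimum) <= major_minor <= tuple(maximum)


# Example bounds only (assumed); take the real ones from the table row for your release.
ok = python_in_range(sys.version_info, (3, 9), (3, 12))
print(f"Python {sys.version_info.major}.{sys.version_info.minor} in range: {ok}")
```

Only the major and minor numbers are compared, so any patch release of a supported minor version passes.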

Hardware requirements #

Requirement	Minimum	Recommended
Memory	4 GB	8 GB+
CPU	Dual-core	Quad-core or better
Storage	1 GB	5 GB+

Installation methods #

Method 1: Install with pip (recommended) #

bash

pip install scikit-learn

To install a specific version:

bash

pip install scikit-learn==1.4.0

Method 2: Install with conda #

bash

conda install scikit-learn

Method 3: Install from source #

bash

git clone https://github.com/scikit-learn/scikit-learn.git
cd scikit-learn
# Compiles C/C++/Cython extensions, so a working compiler toolchain is required
pip install -e .

Dependency management #

Core dependencies #

text

scikit-learn
├── numpy >= 1.17.3
├── scipy >= 1.5.0
├── joblib >= 1.1.1
├── threadpoolctl >= 2.0.0
└── matplotlib (optional, for visualization)
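
To see whether an existing environment already satisfies these minimums, the installed versions can be read from package metadata. This is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helpers written for this example, and the parser deliberately ignores pre-release suffixes (a full PEP 440 comparison would need a dedicated library such as `packaging`).

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums copied from the dependency tree above.
MINIMUMS = {
    "numpy": "1.17.3",
    "scipy": "1.5.0",
    "joblib": "1.1.1",
    "threadpoolctl": "2.0.0",
}

for name, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    print(f"{name} {installed}: {status}")
```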

Installing the full data science stack #

bash

pip install numpy scipy matplotlib pandas scikit-learn

Or with Anaconda:

bash

conda install numpy scipy matplotlib pandas scikit-learn

Virtual environment setup #

Using venv #

bash

python -m venv sklearn-env
source sklearn-env/bin/activate  # Linux/macOS
sklearn-env\Scripts\activate     # Windows
pip install scikit-learn
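
After activation it can be useful to confirm that Python is really running from the new environment. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtual_environment` is an assumption for this example. The check applies to venv-style environments; conda environments are not detected this way.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```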

Using conda #

bash

conda create -n sklearn-env python=3.10
conda activate sklearn-env
conda install scikit-learn

Using pipenv #

bash

pipenv install scikit-learn
pipenv shell

Verifying the installation #

Check the version #

python

import sklearn
print(sklearn.__version__)

Run a quick test #

python

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Check the configuration #

python

import sklearn
sklearn.show_versions()  # prints the report itself and returns None

Example output:

text

System:
    python: 3.10.12
executable: /usr/bin/python3
   machine: Linux-5.15.0-x86_64

Python dependencies:
          sklearn: 1.4.0
              pip: 23.2.1
       setuptools: 68.0.0
            numpy: 1.24.3
            scipy: 1.11.1
           Cython: None
           pandas: 2.0.3
       matplotlib: 3.7.2
           joblib: 1.3.1
    threadpoolctl: 3.2.0

Development tool setup #

Jupyter Notebook #

bash

pip install jupyter
jupyter notebook

JupyterLab #

bash

pip install jupyterlab
jupyter lab

VS Code configuration #

Install the extensions:

Python
Pylance
Jupyter

Configure settings.json:

json

{
    "python.defaultInterpreterPath": "${workspaceFolder}/sklearn-env/bin/python",
    "python.analysis.typeCheckingMode": "basic"
}
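
The interpreter path in these settings follows the Linux/macOS venv layout; on Windows a venv places the interpreter under `Scripts\python.exe` instead. As a rough sketch, the settings can be generated for the current platform. The helper name `interpreter_path` is an assumption for this example, and the environment folder is assumed to be `sklearn-env` in the workspace root.

```python
import json
import os


def interpreter_path(env_dir, windows=(os.name == "nt")):
    """Return the interpreter location inside a venv directory."""
    if windows:
        return f"{env_dir}/Scripts/python.exe"
    return f"{env_dir}/bin/python"


settings = {
    "python.defaultInterpreterPath": interpreter_path("${workspaceFolder}/sklearn-env"),
    "python.analysis.typeCheckingMode": "basic",
}
print(json.dumps(settings, indent=4))
```

The printed JSON can be pasted into `.vscode/settings.json`.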

PyCharm configuration #

Open Settings → Project → Python Interpreter
Add the virtual environment
Install scikit-learn

Common installation problems #

Problem 1: numpy version conflict #

bash

pip install --upgrade numpy
pip install --upgrade scikit-learn

Problem 2: Build errors #

bash

pip install --upgrade pip setuptools wheel
pip install scikit-learn

Problem 3: Permission errors #

bash

pip install --user scikit-learn

Problem 4: Network problems #

Use a package mirror inside China (here, the Tsinghua University mirror):

bash

pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple

Performance tuning #

Using OpenBLAS #

bash

pip install scikit-learn
python -c "import numpy; numpy.show_config()"

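Setting the thread count #

python

import os

# Set these before importing numpy or sklearn; the native libraries read them at load time
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'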
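
A slightly more defensive variant leaves any value the user has already exported untouched and derives the default from the machine's core count. This is only a sketch: `limit_native_threads` is a hypothetical helper, and `OPENBLAS_NUM_THREADS` is added on the assumption that the BLAS in use is OpenBLAS.

```python
import os

THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(count, environ=os.environ):
    """Set thread-count variables unless they are already defined."""
    for name in THREAD_VARS:
        environ.setdefault(name, str(count))
    return {name: environ[name] for name in THREAD_VARS}


# Call this before importing numpy, scipy or sklearn; the native
# libraries typically read these variables once, when first loaded.
applied = limit_native_threads(min(4, os.cpu_count() or 1))
print(applied)
```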

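Parallelism with joblib #

python

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Load a small dataset so the example runs on its own
X, y = load_iris(return_X_y=True)

# n_jobs=-1 uses all available CPU cores
model = RandomForestClassifier(n_jobs=-1)
scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)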

Development environment best practices #

Project structure #

text

my-ml-project/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── notebooks/
│   └── exploration.ipynb
├── src/
│   ├── __init__.py
│   ├── data.py
│   ├── features.py
│   └── models.py
├── tests/
│   └── test_models.py
├── requirements.txt
├── setup.py
└── README.md
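
The layout above can be created with a few lines of `pathlib`. The sketch below is one possible way to do it; `create_scaffold` is a hypothetical helper, the files are created empty, and the notebook is left out because an empty file is not a valid `.ipynb`. The demo writes into a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

DIRECTORIES = ["data/raw", "data/processed", "data/external", "notebooks", "src", "tests"]
FILES = [
    "src/__init__.py",
    "src/data.py",
    "src/features.py",
    "src/models.py",
    "tests/test_models.py",
    "requirements.txt",
    "setup.py",
    "README.md",
]


def create_scaffold(root):
    """Create the directory tree and empty placeholder files under `root`."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        (root / file).touch(exist_ok=True)
    return root


# Demo in a throwaway directory; pass a real path to create an actual project.
with tempfile.TemporaryDirectory() as tmp:
    project = create_scaffold(Path(tmp) / "my-ml-project")
    print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*")))
```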

requirements.txt #

text

numpy>=1.24.0
scipy>=1.10.0
scikit-learn>=1.4.0
pandas>=2.0.0
matplotlib>=3.7.0
jupyter>=1.0.0
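
The `>=` bounds above allow newer releases to be installed later. When an exact, reproducible environment is wanted, the currently installed versions can be recorded as `==` pins. The following is a minimal sketch using the standard library; `pinned_lines` is a hypothetical helper, and packages that are not installed are simply skipped.

```python
from importlib import metadata

PACKAGES = ["numpy", "scipy", "scikit-learn", "pandas", "matplotlib", "jupyter"]


def pinned_lines(names, lookup=metadata.version):
    """Return 'name==version' for each installed package, skipping missing ones."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            continue
    return lines


print("\n".join(pinned_lines(PACKAGES)))
```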

Configuring global options #

python

from sklearn import set_config

set_config(display='diagram')
set_config(transform_output='pandas')
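
`set_config` changes the setting for the whole process. When an option should apply only to part of the code, scikit-learn also provides `config_context`, which restores the previous value on exit. A small sketch, assuming scikit-learn is installed; the variable names are only illustrative:

```python
from sklearn import config_context, get_config, set_config

set_config(display="diagram")          # process-wide default

with config_context(display="text"):
    inside = get_config()["display"]   # temporary value, valid only in this block

after = get_config()["display"]        # previous value is restored on exit
print(inside, after)
```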

Docker environment #

Dockerfile #

dockerfile

FROM python:3.10-slim

WORKDIR /app

RUN pip install --no-cache-dir \
    numpy \
    scipy \
    scikit-learn \
    pandas \
    matplotlib \
    jupyter

EXPOSE 8888

CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]

Build and run #

bash

docker build -t sklearn-env .
docker run -p 8888:8888 sklearn-env

Next steps #

With the environment set up, continue to Core Concepts to learn the basic design principles behind Scikit-learn.