Installation and Configuration
System Requirements
Supported Python Versions
| scikit-learn version | Python version |
|---|---|
| 1.7+ | 3.10 or newer |
| 1.4 - 1.6 | 3.9 or newer |
| 1.1 - 1.3 | 3.8 - 3.12 |
| 1.0 | 3.7 - 3.10 |
| 0.24 | 3.6 or newer |
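A quick way to confirm that the current interpreter satisfies a minimum from the table is to compare `sys.version_info` against it. The sketch below is illustrative only: `python_meets_minimum` is a hypothetical helper, and the default of 3.9 is simply the minimum the table lists for scikit-learn 1.5.

```python
import sys


def python_meets_minimum(minimum=(3, 9), current=None):
    """Return True if `current` (default: the running interpreter) is >= `minimum`.

    Hypothetical helper; both arguments are (major, minor) tuples.
    """
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


print("Interpreter:", ".".join(map(str, sys.version_info[:3])))
print("Meets the 3.9 minimum:", python_meets_minimum())
```

Pass a different `minimum` to check against another row of the table.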
Hardware Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| Memory | 4 GB | 8 GB+ |
| CPU | Dual-core | Quad-core+ |
| Storage | 1 GB | 5 GB+ |
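Installation Methods
Method 1: Install with pip (recommended)
```bash
pip install scikit-learn
```
To install a specific version:
```bash
pip install scikit-learn==1.4.0
```
Method 2: Install with conda
```bash
conda install scikit-learn
```
Method 3: Install from source
Building from source compiles extension modules, so a working C/C++ compiler is required.
```bash
git clone https://github.com/scikit-learn/scikit-learn.git
cd scikit-learn
pip install -e .
```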
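Dependency Management
Core Dependencies
Minimum versions vary by scikit-learn release; newer releases require newer NumPy and SciPy versions than the ones shown here.
```text
scikit-learn
├── numpy >= 1.17.3
├── scipy >= 1.5.0
├── joblib >= 1.1.1
├── threadpoolctl >= 2.0.0
└── matplotlib (optional, for visualization)
```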
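To see which versions of these dependencies are actually installed, the standard library's `importlib.metadata` can be queried directly. This is a minimal sketch, not part of scikit-learn: `installed_versions` and `CORE_DEPENDENCIES` are hypothetical names chosen for illustration.

```python
from importlib import metadata

# Distribution names taken from the dependency tree above
CORE_DEPENDENCIES = ("numpy", "scipy", "joblib", "threadpoolctl")


def installed_versions(packages=CORE_DEPENDENCIES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the package is absent from the active environment, which is usually the first thing to rule out when an import fails.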
Installing the Full Data Science Stack
```bash
pip install numpy scipy matplotlib pandas scikit-learn
```
Or with Anaconda:
```bash
conda install numpy scipy matplotlib pandas scikit-learn
```
Virtual Environment Setup
Using venv
```bash
python -m venv sklearn-env
# Linux/macOS
source sklearn-env/bin/activate
# Windows (run this instead of the line above)
sklearn-env\Scripts\activate
pip install scikit-learn
```
Using conda
```bash
conda create -n sklearn-env python=3.10
conda activate sklearn-env
conda install scikit-learn
```
Using pipenv
```bash
pipenv install scikit-learn
pipenv shell
```
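After activating an environment, it can be worth confirming that Python really runs from it. The sketch below relies on two assumptions: venv-style environments make `sys.prefix` differ from `sys.base_prefix`, and an activated conda environment exports `CONDA_DEFAULT_ENV`. Both function names are hypothetical.

```python
import os
import sys


def in_virtual_environment():
    """Return True when running inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def active_conda_env():
    """Return the active conda environment name, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtual_environment())
print("Conda environment:", active_conda_env())
```

If the interpreter path does not point into `sklearn-env`, the environment was probably not activated in the current shell.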
Verifying the Installation
Checking the Version
```python
import sklearn

print(sklearn.__version__)
```
Running a Test
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset and hold out 20% of the samples for testing
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a random forest and report accuracy on the held-out samples
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
```
Checking the Configuration
```python
import sklearn

# show_versions() prints its report itself and returns None,
# so wrapping it in print() would add a stray "None" line
sklearn.show_versions()
```
Example output:
```text
System:
    python: 3.10.12
    executable: /usr/bin/python3
    machine: Linux-5.15.0-x86_64
Python dependencies:
    sklearn: 1.4.0
    pip: 23.2.1
    setuptools: 68.0.0
    numpy: 1.24.3
    scipy: 1.11.1
    Cython: None
    pandas: 2.0.3
    matplotlib: 3.7.2
    joblib: 1.3.1
    threadpoolctl: 3.2.0
```
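Development Tool Setup
Jupyter Notebook
```bash
pip install jupyter
jupyter notebook
```
JupyterLab
```bash
pip install jupyterlab
jupyter lab
```
VS Code Setup
Install these extensions:
- Python
- Pylance
- Jupyter
Configure settings.json (the interpreter path below is for Linux/macOS; on Windows the venv interpreter is at sklearn-env\Scripts\python.exe):
```json
{
    "python.defaultInterpreterPath": "${workspaceFolder}/sklearn-env/bin/python",
    "python.analysis.typeCheckingMode": "basic"
}
```
PyCharm Setup
- Open Settings → Project → Python Interpreter
- Add the virtual environment
- Install scikit-learn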
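Common Installation Problems
Problem 1: NumPy version conflict
```bash
pip install --upgrade numpy
pip install --upgrade scikit-learn
```
Problem 2: Build errors
```bash
pip install --upgrade pip setuptools wheel
pip install scikit-learn
```
Problem 3: Permission errors
```bash
pip install --user scikit-learn
```
Problem 4: Network problems
Use a PyPI mirror, for example the Tsinghua University mirror:
```bash
pip install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple
```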
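Performance Tuning
Using OpenBLAS
NumPy wheels from PyPI typically bundle OpenBLAS. `numpy.show_config()` reports which BLAS/LAPACK library NumPy was built against:
```bash
pip install scikit-learn
python -c "import numpy; numpy.show_config()"
```
Setting the Number of Threads
```python
import os

# Set these before importing numpy or sklearn: the native thread pools
# read the variables when the libraries are first loaded
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['MKL_NUM_THREADS'] = '4'
```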
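The same idea can be wrapped in a small helper that validates the thread count and also covers OpenBLAS. This is only a sketch: `limit_native_threads` and `THREAD_ENV_VARS` are hypothetical names, and including `OPENBLAS_NUM_THREADS` assumes an OpenBLAS-backed NumPy build.

```python
import os

# Environment variables read by OpenMP, MKL, and OpenBLAS thread pools
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n_threads):
    """Set the thread-count variables; call before importing numpy or sklearn."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


limit_native_threads(4)
# Import numpy / sklearn only after this point so the limits take effect.
```

Setting the variables in the shell before starting Python has the same effect and avoids any dependence on import order.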
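Parallelism with joblib
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# n_jobs=-1 uses all available CPU cores
model = RandomForestClassifier(n_jobs=-1)
scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)
```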
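Negative `n_jobs` values follow joblib's documented convention: -1 means all CPUs, and values below -1 mean `n_cpus + 1 + n_jobs`. The function below is a hypothetical re-implementation of that rule for illustration, not joblib's own code.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs value into a worker count (illustrative sketch)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1)
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
```

Using `n_jobs=-2` is a common way to leave one core free for the rest of the system.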
Development Environment Best Practices
Project Structure
```text
my-ml-project/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── notebooks/
│   └── exploration.ipynb
├── src/
│   ├── __init__.py
│   ├── data.py
│   ├── features.py
│   └── models.py
├── tests/
│   └── test_models.py
├── requirements.txt
├── setup.py
└── README.md
```
requirements.txt
```text
numpy>=1.24.0
scipy>=1.10.0
scikit-learn>=1.4.0
pandas>=2.0.0
matplotlib>=3.7.0
jupyter>=1.0.0
```
Global Configuration
```python
from sklearn import set_config

# Render estimators as HTML diagrams in notebooks
set_config(display='diagram')
# Make transformers return DataFrames (scikit-learn 1.2+, requires pandas)
set_config(transform_output='pandas')
```
Docker Environment
Dockerfile
```dockerfile
FROM python:3.10-slim
WORKDIR /app
RUN pip install --no-cache-dir \
    numpy \
    scipy \
    scikit-learn \
    pandas \
    matplotlib \
    jupyter
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--allow-root"]
```
Build and Run
```bash
docker build -t sklearn-env .
docker run -p 8888:8888 sklearn-env
```
Next Steps
With the environment set up, continue to Core Concepts to learn the basic design principles of scikit-learn.
Last updated: 2026-04-04