How do you install NVIDIA drivers on Ubuntu Server?

2025-10-09 by dongnan

Before you start

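nvidia-driver-xxx-server: the core GPU driver package.
nvidia-utils-xxx-server: driver utilities (command-line tools such as nvidia-smi).
nvidia-cuda-toolkit: the CUDA toolkit (development tools needed for GPU programming, such as the nvcc compiler).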

Why “server”?

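The “server” packages are the driver branches Ubuntu provides for servers and compute workloads, and they are intended for headless machines (no monitor or desktop).
The regular packages (without “server”) target desktops and bring in GUI support libraries such as OpenGL. If you need a strictly minimal install with no X11 components at all, Ubuntu also provides nvidia-headless-xxx-server packages.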

How to choose a driver version

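New GPUs (RTX 40 series, or recent data-center GPUs)
- 580 or 570 is recommended (the newest branches; they support the latest GPUs and CUDA 12.x, and 580 also supports CUDA 13).
Mid-generation GPUs (RTX 30 series / A100 / L40, etc.)
- 550 or 535 is recommended (stable; compatible with CUDA 12.x/11.x).
Older GPUs (GTX 10 series / RTX 20 series / Tesla V100 and similar)
- 470 is recommended (CUDA 11.x; the best support for older cards).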
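The table above can be encoded as a small lookup when scripting installs across several machines. This is a sketch only: the tier names and the `gpu_tier`/`recommend` helpers are hypothetical, the mapping just mirrors the author's recommendations (it is not an official NVIDIA support matrix), and models the table does not cover return an empty list. On the machine itself, Ubuntu's `ubuntu-drivers devices` command reports the driver Ubuntu recommends.

```python
from __future__ import annotations

import re

# Driver branches suggested in the table above, per tier (the author's advice,
# not an official NVIDIA compatibility matrix).
RECOMMENDED = {
    "new": ["580", "570"],
    "mid": ["550", "535"],
    "old": ["470"],
}


def gpu_tier(name: str) -> str | None:
    """Map a GPU name to the tiers above; return None for models the table doesn't cover."""
    upper = name.upper()
    # Data-center cards named in the table, matched as whole model numbers.
    if re.search(r"\bV100\b", upper):
        return "old"
    if re.search(r"\b(A100|L40S?)\b", upper):
        return "mid"
    # Consumer cards: "RTX 3070" -> family "RTX", series 30.
    m = re.search(r"\b(GTX|RTX)\s*(\d{2})\d{2}", upper)
    if m is None:
        return None
    family, series = m.group(1), int(m.group(2))
    if family == "RTX" and series >= 40:
        return "new"
    if family == "RTX" and series == 30:
        return "mid"
    if (family == "RTX" and series == 20) or (family == "GTX" and series == 10):
        return "old"
    return None


def recommend(name: str) -> list[str]:
    """Return the suggested driver branches for a GPU name (empty if unknown)."""
    tier = gpu_tier(name)
    return RECOMMENDED[tier] if tier else []


if __name__ == "__main__":
    print(recommend("NVIDIA GeForce RTX 3070"))
```

For the RTX 3070 used in this article, the lookup lands in the middle tier, which matches the 535 branch installed below.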

Environment

OS: Ubuntu Server 22.04.5 LTS
GPU: GeForce RTX 3070

Steps

Identify the GPU model

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
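When this check is scripted, the lspci output can be filtered down to the GPU functions. A sketch only: `nvidia_gpus` is a hypothetical helper, and the regex assumes the line format shown above (many data-center cards report `3D controller` instead of `VGA compatible controller`).

```python
import re
import shutil
import subprocess

# Matches GPU functions only: consumer cards report "VGA compatible controller",
# many data-center cards report "3D controller".
GPU_LINE = re.compile(
    r"^(?P<addr>\S+)\s+(?:VGA compatible|3D) controller:\s+NVIDIA Corporation\s+(?P<desc>.*)$"
)


def nvidia_gpus(lspci_output: str) -> list[tuple[str, str]]:
    """Return (PCI address, model name) for each NVIDIA GPU in `lspci` output."""
    gpus = []
    for line in lspci_output.splitlines():
        m = GPU_LINE.match(line.strip())
        if m is None:
            continue  # e.g. the card's HDMI "Audio device" function
        desc = m.group("desc")
        # The marketing name is usually in brackets: "GA104 [GeForce RTX 3070] (rev a1)".
        bracketed = re.search(r"\[(.+?)\]", desc)
        gpus.append((m.group("addr"), bracketed.group(1) if bracketed else desc))
    return gpus


if __name__ == "__main__" and shutil.which("lspci"):
    out = subprocess.run(["lspci"], capture_output=True, text=True).stdout
    print(nvidia_gpus(out))
```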

Install the driver

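Install the server driver packages. The RTX 3070 is an RTX 30 series card, so this uses the 535 branch recommended above. Run as root, or prefix each command with sudo: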

apt update
apt install -y nvidia-driver-535-server nvidia-utils-535-server
reboot
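The three commands above can be wrapped in a small script for repeat installs. This is a minimal sketch: `install_plan` and `run_plan` are hypothetical helpers, and they default to a dry run that only prints the commands, because `apt install` and `reboot` change the system.

```python
from __future__ import annotations

import os
import shlex
import subprocess


def install_plan(branch: str, server: bool = True, use_sudo: bool | None = None) -> list[list[str]]:
    """Build the apt/reboot commands used in this section for one driver branch."""
    if use_sudo is None:
        # Add sudo only when the script is not already running as root.
        use_sudo = os.geteuid() != 0
    suffix = "-server" if server else ""
    packages = [f"nvidia-driver-{branch}{suffix}", f"nvidia-utils-{branch}{suffix}"]
    prefix = ["sudo"] if use_sudo else []
    return [
        prefix + ["apt", "update"],
        prefix + ["apt", "install", "-y", *packages],
        prefix + ["reboot"],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run, stopping at the first failure."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(install_plan("535"))  # dry run: prints the commands, changes nothing
```

Pass `dry_run=False` only on the target machine; the last command reboots it.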

Verify that the driver loaded

nvidia-smi

Sample output:

Thu Oct  9 06:35:01 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070        Off | 00000000:01:00.0 Off |                  N/A |
| 30%   44C    P0              41W / 220W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
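To check the result from a script (for example right after the reboot), the version banner can be parsed. A minimal sketch: the helper names are hypothetical, and the regex assumes the banner layout shown above, which may differ between driver releases.

```python
import re
import shutil
import subprocess

# Pull the versions out of the nvidia-smi banner line, e.g.
# "| NVIDIA-SMI 535.261.03   Driver Version: 535.261.03   CUDA Version: 12.2 |"
VERSION_LINE = re.compile(r"Driver Version:\s*(?P<driver>\S+)\s+CUDA Version:\s*(?P<cuda>\S+)")


def driver_info(smi_output: str) -> dict[str, str]:
    """Return {'driver': ..., 'cuda': ...}; raise if the banner is missing."""
    m = VERSION_LINE.search(smi_output)
    if m is None:
        raise ValueError("no driver version found; is the NVIDIA kernel module loaded?")
    return {"driver": m.group("driver"), "cuda": m.group("cuda")}


def driver_branch(smi_output: str) -> str:
    """Major branch of the running driver, e.g. '535' for 535.261.03."""
    return driver_info(smi_output)["driver"].split(".")[0]


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    try:
        print(driver_info(result.stdout))
    except ValueError as err:
        print(f"nvidia-smi ran but reported no driver: {err}")
```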

Install CUDA (optional)

apt install -y nvidia-cuda-toolkit

Verify the CUDA installation:

nvcc --version

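Sample output. Ubuntu 22.04's nvidia-cuda-toolkit package ships CUDA 11.5; the “CUDA Version: 12.2” shown by nvidia-smi is the highest CUDA version the driver supports, not the version of the installed toolkit: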

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
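Since the toolkit release (11.5) and the CUDA version reported by the driver (12.2) differ, a quick sanity check is that the toolkit is not newer than what the driver reports. This is a simplified heuristic, not NVIDIA's full compatibility rules, and the helper names are hypothetical.

```python
import re


def nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract (major, minor) from `nvcc --version`, e.g. 'release 11.5' -> (11, 5)."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("not nvcc --version output")
    return int(m.group(1)), int(m.group(2))


def toolkit_within_driver(nvcc_output: str, driver_cuda: str) -> bool:
    """Heuristic: the toolkit release should not exceed nvidia-smi's 'CUDA Version'."""
    major, minor = (int(x) for x in driver_cuda.split(".")[:2])
    return nvcc_release(nvcc_output) <= (major, minor)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.5, V11.5.119"
    print(nvcc_release(sample), toolkit_within_driver(sample, "12.2"))
```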

Testing

The test runs a local MiroThinker model with demo.py, first on the CPU and then on the GPU, and times each run.

CPU

time python demo.py 
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████████| 3/3 [00:00<00:00,  5.67it/s]
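Explain in one short sentence what artificial intelligence is. Artificial intelligence is a computer system that simulates human intelligence and can perform tasks that require human intelligence, such as learning, reasoning, problem solving, perception, language understanding, and so on.
(remaining output omitted)...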

real    1m22.955s
user    5m26.199s
sys 0m6.533s

GPU

time python3 demo.py 
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████████| 3/3 [00:00<00:00,  4.05it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
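Explain in one short sentence what artificial intelligence is. Artificial intelligence is a computer system that simulates human intelligence and can perform tasks that require human intelligence, such as learning, reasoning, problem solving, perception, and language understanding.
(remaining output omitted)...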

real    0m23.337s
user    0m19.810s
sys 0m2.356s
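The speedup can be computed from the `real` (wall-clock) lines above. Both runs include model loading, so this is an end-to-end figure rather than pure inference speed. A sketch: `real_seconds` is a hypothetical helper that assumes bash's `XmY.YYYs` format for `time`.

```python
import re


def real_seconds(time_output: str) -> float:
    """Parse the 'real' line printed by bash's `time`, e.g. 'real    1m22.955s' -> 82.955."""
    m = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if m is None:
        raise ValueError("no 'real' line found")
    return int(m.group(1)) * 60 + float(m.group(2))


# The two timings measured above.
cpu = "real    1m22.955s\nuser    5m26.199s\nsys 0m6.533s"
gpu = "real    0m23.337s\nuser    0m19.810s\nsys 0m2.356s"
speedup = real_seconds(cpu) / real_seconds(gpu)
print(f"wall-clock speedup: {speedup:.2f}x")
```

With the numbers above this prints `wall-clock speedup: 3.55x`.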

Other

How can you tell whether the demo is running on the CPU or the GPU?

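Check with PyTorch (this shows whether PyTorch can see a GPU, not where the demo's model weights were actually placed)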

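import torch

print(torch.cuda.is_available())           # True means a usable GPU is present
if torch.cuda.is_available():              # current_device() raises if CUDA is unavailable
    print(torch.cuda.current_device())     # index of the GPU currently in use
    print(torch.cuda.get_device_name(0))   # GPU model name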
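The check above only tells you whether a GPU is visible. The warning in the GPU run ("Some parameters are on the meta device because they were offloaded to the cpu") suggests the model was loaded with a `device_map` and partly offloaded. Assuming demo.py loads the model through transformers with `device_map` set (for example `"auto"`), the model's `hf_device_map` attribute records where each module landed; the sketch below summarizes it. The helper names and the example map are hypothetical. For a model loaded without `device_map`, `next(model.parameters()).device` is a simpler check.

```python
from collections import Counter


def summarize_device_map(device_map: dict) -> Counter:
    """Count modules per device in a transformers/accelerate `hf_device_map`.

    Integer values are GPU indices; strings such as "cpu" or "disk" mean offloaded.
    """
    return Counter(f"cuda:{d}" if isinstance(d, int) else str(d) for d in device_map.values())


def fully_on_gpu(device_map: dict) -> bool:
    """True only if no module was placed on the CPU or disk."""
    return all(isinstance(d, int) or str(d).startswith("cuda") for d in device_map.values())


# Hypothetical map for a model too large for one 8 GB card (module names are made up).
example = {"model.embed_tokens": 0, "model.layers.0": 0, "model.layers.39": "cpu", "lm_head": "cpu"}
print(summarize_device_map(example))
print(fully_on_gpu(example))
```

In demo.py this would be called as `summarize_device_map(model.hf_device_map)` after loading.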

Judge by speed

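CPU → inference is very slow, especially for a relatively large model such as MiroThinker-14B; a response can take tens of seconds or even minutes.
GPU → noticeably faster; results usually come back within seconds.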

nvidia-smi

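While the program is running, open another terminal and run nvidia-smi:

If the model is running on the GPU, GPU memory will be in use (the output lists the Python process and how much memory it uses).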

Thu Oct  9 08:10:41 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070        Off | 00000000:01:00.0 Off |                  N/A |
| 31%   44C    P2              71W / 220W |   6987MiB /  8192MiB |     64%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      6633      C   python3                                    6982MiB |
+---------------------------------------------------------------------------------------+
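The same per-process information is available in machine-readable form through nvidia-smi's query mode, which is easier to consume from a script than the table above. A sketch: `parse_compute_apps` is a hypothetical helper, while `--query-compute-apps` and `--format=csv,noheader,nounits` are standard nvidia-smi options.

```python
import csv
import io
import shutil
import subprocess

# "nounits" drops the " MiB" suffix from used_memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list[dict]:
    """Turn nvidia-smi's CSV rows into dicts; empty output means no GPU processes."""
    apps = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank lines or error messages
        pid, name, mem = (field.strip() for field in row)
        apps.append({
            "pid": int(pid),
            "name": name,
            # Some setups may report "[N/A]" instead of a number.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return apps


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    out = subprocess.run(QUERY, capture_output=True, text=True).stdout
    for app in parse_compute_apps(out):
        print(app)
```

For the run shown above, the query would be expected to yield one row for the python3 process (PID 6633) using about 6982 MiB.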

References

ChatGPT