Downloading Models with huggingface-cli
2024-06-03
Hugging Face is one of the most popular platforms in natural language processing today, hosting a huge number of pretrained models. This article shows how to install and use huggingface-cli to download models, with concrete examples.
Installing huggingface-cli #
Installation environment:
- OS: Ubuntu 24.04
- Installing user: a regular system user (non-root, unprivileged)
- Python version: 3.11 (note: Ubuntu 24.04 ships with Python 3.12; Python 3.11 must be installed separately)
Installation steps:
- Switch to the regular user who will perform the installation:
su - <theuser>
- Make sure pip for Python 3.11 is installed:
python3.11 -m ensurepip --upgrade --user
python3.11 -m pip install --user --upgrade pip --index-url=https://mirrors.aliyun.com/pypi/simple
- Make sure pipx is installed:
python3.11 -m pip install --user pipx --index-url=https://mirrors.aliyun.com/pypi/simple
The pip and pipx installed in the two steps above live under ~/.local, specifically in ~/.local/bin and ~/.local/lib/python3.11/site-packages.
- Install huggingface-cli with pipx:
pipx install 'huggingface_hub[cli]' --index-url=https://mirrors.aliyun.com/pypi/simple
The advantage of installing with pipx is that huggingface-cli gets its own isolated virtual environment under ~/.local/share/pipx/venvs/huggingface-hub.
- Verify the installation
After installation, a symlink is automatically created from ~/.local/bin/huggingface-cli to ~/.local/share/pipx/venvs/huggingface-hub/bin/huggingface-cli.
huggingface-cli --help
usage: huggingface-cli <command> [<args>]

positional arguments:
  {download,upload,repo-files,env,login,whoami,logout,repo,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag}
                        huggingface-cli command helpers
    download            Download files from the Hub
    upload              Upload a file or a folder to a repo on the Hub
    repo-files          Manage files in a repo on the Hub
    env                 Print information about the environment.
    login               Log in using a token from huggingface.co/settings/tokens
    whoami              Find out which huggingface.co account you are logged in as.
    logout              Log out
    repo                {create} Commands to interact with your huggingface.co repos.
    lfs-enable-largefiles
                        Configure your repository to enable upload of files > 5GB.
    scan-cache          Scan cache directory.
    delete-cache        Delete revisions from the cache directory.
    tag                 (create, list, delete) tags for a repo in the hub

options:
  -h, --help            show this help message and exit
Configuring a mirror for huggingface.co #
For various reasons, servers in mainland China may not be able to reach huggingface.co directly. You can configure a mirror site instead, for example hf-mirror.com.
hf-mirror.com mirrors the huggingface.co domain. It is a public-interest project dedicated to helping AI developers in China download models and datasets quickly and reliably.
Edit ~/.profile and append at the end:
export HF_ENDPOINT=https://hf-mirror.com
Run source ~/.profile once to make the variable take effect in the current shell, or log out and back in (or switch to the user again with su - <theuser>).
In the current shell, verify that the HF_ENDPOINT environment variable is set:
echo $HF_ENDPOINT
https://hf-mirror.com
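HF_ENDPOINT can also be set for a single command instead of globally in ~/.profile; when it is unset, the client falls back to the official endpoint. A sketch of both behaviors:

```shell
# One-off override, without editing ~/.profile:
#   HF_ENDPOINT=https://hf-mirror.com huggingface-cli download <repo>
# The default when HF_ENDPOINT is unset, expressed as a shell fallback:
endpoint="${HF_ENDPOINT:-https://huggingface.co}"
echo "$endpoint"
```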
Downloading models #
Qwen/Qwen2-7B-Instruct #
huggingface-cli download Qwen/Qwen2-7B-Instruct
By default the model is downloaded into the ~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/ directory.
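If the default location under ~/.cache is inconvenient (for example, on a small system disk), the cache can be relocated with environment variables before downloading. The variable names below are the ones used by recent huggingface_hub releases:

```shell
# Relocate the entire ~/.cache/huggingface tree:
export HF_HOME="$HOME/hf-home"
# ...or relocate only the hub model cache:
export HF_HUB_CACHE="$HOME/hf-cache"
```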
Note that what the ~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/ directory stores are the model's cache files:
~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/
├── blobs
│   ├── 0b749a4446d7cda007d5e7bd9f908849d08d89867192d4c039dc167e9ab5a02e
│   ├── 0eb3c536657dcd12626e09eca4b6198c0cbcde1e
│   ├── 20024bfe7c83998e9aeaf98a0cd6a2ce6306c2f0
│   ├── 26d9919262ccd063fcdfd926763fe9025ef1e3073767aaa8c83a375d7c5140c4
│   ├── 33ea6c72ebb92a237fa2bdf26c5ff16592efcdae
│   ├── 35a3713c113c1548c4769b6c562b2624a2d4c2b2
│   ├── 428f5926b8d79604d999b20b3bea98f5d4225a21
│   ├── 4783fe10ac3adce15ac8f358ef5462739852c569
│   ├── 4bf699d11c6478a4b70fc2adfb405429de22525f
│   ├── a6344aac8c09253b3b630fb776ae94478aa0275b
│   ├── ae681edbd9486961e27d023f91d97b15562da7a9
│   ├── cc375d92d7061b465042e9a1d507cb99598fb97a
│   ├── da724bb7d3c3512eb371aa6caa5bcc08d78bda84f94e00ae9a9b2124e3e9c62f
│   └── f5bb99fdadcac55c2c176497ec99f088a1764e78ed986fa4a0d45d12426ef0fa
├── refs
│   └── main
└── snapshots
    └── 41c66b0be1c3081f13defc6bdf946c2ef240d6a6
        ├── LICENSE -> ../../blobs/cc375d92d7061b465042e9a1d507cb99598fb97a
        ├── README.md -> ../../blobs/35a3713c113c1548c4769b6c562b2624a2d4c2b2
        ├── config.json -> ../../blobs/ae681edbd9486961e27d023f91d97b15562da7a9
        ├── generation_config.json -> ../../blobs/0eb3c536657dcd12626e09eca4b6198c0cbcde1e
        ├── merges.txt -> ../../blobs/20024bfe7c83998e9aeaf98a0cd6a2ce6306c2f0
        ├── model-00001-of-00004.safetensors -> ../../blobs/26d9919262ccd063fcdfd926763fe9025ef1e3073767aaa8c83a375d7c5140c4
        ├── model-00002-of-00004.safetensors -> ../../blobs/f5bb99fdadcac55c2c176497ec99f088a1764e78ed986fa4a0d45d12426ef0fa
        ├── model-00003-of-00004.safetensors -> ../../blobs/0b749a4446d7cda007d5e7bd9f908849d08d89867192d4c039dc167e9ab5a02e
        ├── model-00004-of-00004.safetensors -> ../../blobs/da724bb7d3c3512eb371aa6caa5bcc08d78bda84f94e00ae9a9b2124e3e9c62f
        ├── model.safetensors.index.json -> ../../blobs/4bf699d11c6478a4b70fc2adfb405429de22525f
        ├── tokenizer.json -> ../../blobs/33ea6c72ebb92a237fa2bdf26c5ff16592efcdae
        ├── tokenizer_config.json -> ../../blobs/428f5926b8d79604d999b20b3bea98f5d4225a21
        └── vocab.json -> ../../blobs/4783fe10ac3adce15ac8f358ef5462739852c569
For more about this cache directory, see the official huggingface_hub documentation: Manage huggingface_hub cache-system.
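The layout above can be reproduced in miniature: blobs/ stores each file once under a content hash, and every snapshot is just a directory of relative symlinks into it, so multiple revisions can share identical files without duplication. A toy sketch (no huggingface-cli involved):

```shell
# Build a minimal blobs/snapshots layout and read a file through the symlink.
tmp=$(mktemp -d)
mkdir -p "$tmp/blobs" "$tmp/snapshots/rev1"
printf 'hello' > "$tmp/blobs/abc123"            # one content-addressed blob
ln -s ../../blobs/abc123 "$tmp/snapshots/rev1/config.json"
cat "$tmp/snapshots/rev1/config.json"           # reads the blob via the symlink
```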
You can also create a dedicated local directory for models:
mkdir ~/models
Then use huggingface-cli download with the --local-dir option to save a model into this local ~/models directory:
huggingface-cli download Qwen/Qwen2-7B-Instruct --local-dir ~/models/models--Qwen--Qwen2-7B-Instruct
Since the model has already been downloaded into the cache, the command above populates the local directory directly from the cache instead of downloading again. The resulting local directory structure:
~/models/models--Qwen--Qwen2-7B-Instruct/
├── LICENSE
├── README.md
├── config.json
├── generation_config.json
├── merges.txt
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
└── vocab.json
Alibaba-NLP/gte-Qwen2-7B-instruct #
huggingface-cli download Alibaba-NLP/gte-Qwen2-7B-instruct
BAAI/bge-large-zh-v1.5 #
huggingface-cli download BAAI/bge-large-zh-v1.5
BAAI/bge-large-en-v1.5 #
huggingface-cli download BAAI/bge-large-en-v1.5
shibing624/text2vec-base-chinese #
huggingface-cli download shibing624/text2vec-base-chinese
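The per-model commands above can be batched in a simple loop. The sketch below is a dry run that only prints each command; remove the echo to actually download:

```shell
# Dry run: print one download command per repository.
for repo in BAAI/bge-large-zh-v1.5 BAAI/bge-large-en-v1.5 shibing624/text2vec-base-chinese; do
  echo huggingface-cli download "$repo"
done
```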