Downloading Models with huggingface-cli
2024-06-03
Hugging Face is one of the most popular platforms in natural language processing today, hosting a huge number of pretrained models. This article shows how to install and use huggingface-cli to download models, with concrete examples.
Installing huggingface-cli #
Installation environment:
- OS: Ubuntu 24.04
- Installing user: a regular system user (non-root, unprivileged)
- Python version: 3.11 (note: Ubuntu 24.04 ships with Python 3.12; Python 3.11 must be installed separately)
Installation steps:
- Switch to the regular user who will perform the installation:
su - <theuser>
- Make sure pip for Python 3.11 is installed:
python3.11 -m ensurepip --upgrade --user
python3.11 -m pip install --user --upgrade pip --index-url=https://mirrors.aliyun.com/pypi/simple
- Make sure pipx is installed:
python3.11 -m pip install --user pipx --index-url=https://mirrors.aliyun.com/pypi/simple
The pip and pipx installed in the two steps above live under ~/.local, specifically in ~/.local/bin and ~/.local/lib/python3.11/site-packages.
- Install huggingface-cli with pipx:
pipx install 'huggingface_hub[cli]' --index-url=https://mirrors.aliyun.com/pypi/simple
The advantage of installing with pipx is that huggingface-cli gets its own isolated virtual environment under ~/.local/share/pipx/venvs/huggingface-hub.
- Verify the installation
After installation, a symlink is automatically created from ~/.local/bin/huggingface-cli to ~/.local/share/pipx/venvs/huggingface-hub/bin/huggingface-cli.
huggingface-cli --help
usage: huggingface-cli <command> [<args>]

positional arguments:
  {download,upload,repo-files,env,login,whoami,logout,repo,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag}
                        huggingface-cli command helpers
    download            Download files from the Hub
    upload              Upload a file or a folder to a repo on the Hub
    repo-files          Manage files in a repo on the Hub
    env                 Print information about the environment.
    login               Log in using a token from huggingface.co/settings/tokens
    whoami              Find out which huggingface.co account you are logged in as.
    logout              Log out
    repo                {create} Commands to interact with your huggingface.co repos.
    lfs-enable-largefiles
                        Configure your repository to enable upload of files > 5GB.
    scan-cache          Scan cache directory.
    delete-cache        Delete revisions from the cache directory.
    tag                 (create, list, delete) tags for a repo in the hub

options:
  -h, --help            show this help message and exit
Configuring a mirror for huggingface.co #
For various reasons, servers in mainland China may not be able to reach huggingface.co directly. You can configure a mirror site instead, for example hf-mirror.com.
hf-mirror.com mirrors the huggingface.co domain. It is a public-interest project dedicated to helping AI developers in China download models and datasets quickly and reliably.
Edit ~/.profile and append at the end:
export HF_ENDPOINT=https://hf-mirror.com
Run source ~/.profile once to make the variable take effect in the current shell, or log out and back in (or switch to the user again with su - <theuser>).
In the current shell, verify that the HF_ENDPOINT environment variable is set:
echo $HF_ENDPOINT
https://hf-mirror.com
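HF_ENDPOINT can also be set for a single command instead of globally in ~/.profile; when it is unset, the client falls back to the official endpoint. A sketch of both behaviors:

```shell
# One-off override, without editing ~/.profile:
#   HF_ENDPOINT=https://hf-mirror.com huggingface-cli download <repo>
# The default when HF_ENDPOINT is unset, expressed as a shell fallback:
endpoint="${HF_ENDPOINT:-https://huggingface.co}"
echo "$endpoint"
```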
Downloading models #
Qwen/Qwen2-7B-Instruct #
huggingface-cli download Qwen/Qwen2-7B-Instruct
By default the model is downloaded into the ~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/ directory.
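If the default location under ~/.cache is inconvenient (for example, on a small system disk), the cache can be relocated with environment variables before downloading. The variable names below are the ones used by recent huggingface_hub releases:

```shell
# Relocate the entire ~/.cache/huggingface tree:
export HF_HOME="$HOME/hf-home"
# ...or relocate only the hub model cache:
export HF_HUB_CACHE="$HOME/hf-cache"
```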
Note that what the ~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/ directory stores are the model's cache files:
~/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/
├── blobs
│   ├── 0b749a4446d7cda007d5e7bd9f908849d08d89867192d4c039dc167e9ab5a02e
│   ├── 0eb3c536657dcd12626e09eca4b6198c0cbcde1e
│   ├── 20024bfe7c83998e9aeaf98a0cd6a2ce6306c2f0
│   ├── 26d9919262ccd063fcdfd926763fe9025ef1e3073767aaa8c83a375d7c5140c4
│   ├── 33ea6c72ebb92a237fa2bdf26c5ff16592efcdae
│   ├── 35a3713c113c1548c4769b6c562b2624a2d4c2b2
│   ├── 428f5926b8d79604d999b20b3bea98f5d4225a21
│   ├── 4783fe10ac3adce15ac8f358ef5462739852c569
│   ├── 4bf699d11c6478a4b70fc2adfb405429de22525f
│   ├── a6344aac8c09253b3b630fb776ae94478aa0275b
│   ├── ae681edbd9486961e27d023f91d97b15562da7a9
│   ├── cc375d92d7061b465042e9a1d507cb99598fb97a
│   ├── da724bb7d3c3512eb371aa6caa5bcc08d78bda84f94e00ae9a9b2124e3e9c62f
│   └── f5bb99fdadcac55c2c176497ec99f088a1764e78ed986fa4a0d45d12426ef0fa
├── refs
│   └── main
└── snapshots
    └── 41c66b0be1c3081f13defc6bdf946c2ef240d6a6
        ├── LICENSE -> ../../blobs/cc375d92d7061b465042e9a1d507cb99598fb97a
        ├── README.md -> ../../blobs/35a3713c113c1548c4769b6c562b2624a2d4c2b2
        ├── config.json -> ../../blobs/ae681edbd9486961e27d023f91d97b15562da7a9
        ├── generation_config.json -> ../../blobs/0eb3c536657dcd12626e09eca4b6198c0cbcde1e
        ├── merges.txt -> ../../blobs/20024bfe7c83998e9aeaf98a0cd6a2ce6306c2f0
        ├── model-00001-of-00004.safetensors -> ../../blobs/26d9919262ccd063fcdfd926763fe9025ef1e3073767aaa8c83a375d7c5140c4
        ├── model-00002-of-00004.safetensors -> ../../blobs/f5bb99fdadcac55c2c176497ec99f088a1764e78ed986fa4a0d45d12426ef0fa
        ├── model-00003-of-00004.safetensors -> ../../blobs/0b749a4446d7cda007d5e7bd9f908849d08d89867192d4c039dc167e9ab5a02e
        ├── model-00004-of-00004.safetensors -> ../../blobs/da724bb7d3c3512eb371aa6caa5bcc08d78bda84f94e00ae9a9b2124e3e9c62f
        ├── model.safetensors.index.json -> ../../blobs/4bf699d11c6478a4b70fc2adfb405429de22525f
        ├── tokenizer.json -> ../../blobs/33ea6c72ebb92a237fa2bdf26c5ff16592efcdae
        ├── tokenizer_config.json -> ../../blobs/428f5926b8d79604d999b20b3bea98f5d4225a21
        └── vocab.json -> ../../blobs/4783fe10ac3adce15ac8f358ef5462739852c569
For more about this cache directory, see the official huggingface_hub documentation: Manage huggingface_hub cache-system.
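The layout above can be reproduced in miniature: blobs/ stores each file once under a content hash, and every snapshot is just a directory of relative symlinks into it, so multiple revisions can share identical files without duplication. A toy sketch (no huggingface-cli involved):

```shell
# Build a minimal blobs/snapshots layout and read a file through the symlink.
tmp=$(mktemp -d)
mkdir -p "$tmp/blobs" "$tmp/snapshots/rev1"
printf 'hello' > "$tmp/blobs/abc123"            # one content-addressed blob
ln -s ../../blobs/abc123 "$tmp/snapshots/rev1/config.json"
cat "$tmp/snapshots/rev1/config.json"           # reads the blob via the symlink
```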
You can also create a dedicated local directory for models:
mkdir ~/models
Then use huggingface-cli download with the --local-dir option to save a model into this local ~/models directory:
huggingface-cli download Qwen/Qwen2-7B-Instruct --local-dir ~/models/models--Qwen--Qwen2-7B-Instruct
Since the model has already been downloaded into the cache, the command above populates the local directory directly from the cache instead of downloading again. The resulting local directory structure:
~/models/models--Qwen--Qwen2-7B-Instruct/
├── LICENSE
├── README.md
├── config.json
├── generation_config.json
├── merges.txt
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
└── vocab.json
Alibaba-NLP/gte-Qwen2-7B-instruct #
huggingface-cli download Alibaba-NLP/gte-Qwen2-7B-instruct
BAAI/bge-large-zh-v1.5 #
huggingface-cli download BAAI/bge-large-zh-v1.5
BAAI/bge-large-en-v1.5 #
huggingface-cli download BAAI/bge-large-en-v1.5
shibing624/text2vec-base-chinese #
huggingface-cli download shibing624/text2vec-base-chinese
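The per-model commands above can be batched in a simple loop. The sketch below is a dry run that only prints each command; remove the echo to actually download:

```shell
# Dry run: print one download command per repository.
for repo in BAAI/bge-large-zh-v1.5 BAAI/bge-large-en-v1.5 shibing624/text2vec-base-chinese; do
  echo huggingface-cli download "$repo"
done
```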