
Detailed installation steps for Xinference, a model platform more powerful and full-featured than Ollama

liuian 2025-03-29 19:27

Xinference is a full-featured platform for running and managing large language models (LLMs), embedding models, and multimodal models. Its main features are:

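Multi-model support: runs a range of open-source LLMs (such as LLaMA, Falcon, and ChatGLM) as well as embedding and multimodal models.
Distributed deployment: can be deployed on a single machine, across several machines, or on a cluster, for high availability and scalability.
Ease of use: provides a simple command-line interface (CLI) and a Web UI for managing and using models.
Built-in optimization: supports quantized model formats such as GGML and GPTQ, which reduce memory use and can speed up inference.
OpenAI API compatibility: exposes an OpenAI-compatible API, so existing applications can be pointed at Xinference with few changes.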

Deployment steps

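Install a Python environment (conda recommended)
Xinference is a Python project, so you need a Python environment first. Using conda to manage it is strongly recommended, because an isolated environment avoids dependency conflicts.
Install Miniconda or Anaconda:
Miniconda: https://docs.conda.io/en/latest/miniconda.html
Anaconda: https://www.anaconda.com/products/distribution
Download the installer for macOS (Apple Silicon) and follow the prompts. When the installation finishes, open a terminal and run conda --version; if a version number is printed, the installation succeeded.
Create a conda environment: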
conda create -n xinference python=3.10 # Python 3.10 is recommended
conda activate xinference
Install Xinference
There are two ways to install it:
Option 1: install with pip (recommended)
pip install "xinference[all]" # installs all dependencies, including the Web UI and the acceleration libraries
If your connection to PyPI is slow, use a mirror in China:
pip install "xinference[all]" -i https://pypi.tuna.tsinghua.edu.cn/simple
Option 2: install from source (for developers)
git clone https://github.com/xorbitsai/inference.git
cd inference
pip install -e ".[all]"
If your connection to PyPI is slow, use a mirror in China:
pip install -e ".[all]" -i https://pypi.tuna.tsinghua.edu.cn/simple
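Before starting the service, it can help to confirm that the package really landed in the active conda environment. The following is a minimal sketch using only the Python standard library; the helper names installed_version and describe_environment are made up for this illustration and are not part of Xinference.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(package: str = "xinference") -> str:
    """Summarize the interpreter version and whether `package` is installed."""
    python = ".".join(str(part) for part in sys.version_info[:3])
    version = installed_version(package)
    status = f"{package} {version}" if version else f"{package} is not installed"
    return f"Python {python}; {status}"


if __name__ == "__main__":
    # Run this inside the activated "xinference" conda environment.
    print(describe_environment())
```

If the summary reports that xinference is not installed, the most likely cause is that pip ran in a different environment than the one currently activated.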
Start the Xinference service
Start in local single-machine mode:
xinference-local
This starts a local Xinference service listening on the default port 9997. Open http://localhost:9997 in a browser to see the Web UI.
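To check from a script that the service is up, one option is a plain TCP connection test against the default port. This is only a sketch: is_port_open is a hypothetical helper, and a successful connection shows that something is listening on the port, not that it is Xinference.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9997, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Service at localhost:9997 is {state}")
```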
Deploy and use a model
The Xinference Web UI provides a graphical interface for deploying and managing models. You can also use the command-line tool.
Using the Web UI:

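Open a browser and go to http://localhost:9997.
Click the "Launch Model" button.
Select the model you want to deploy (for example, chatglm3-6b).
Fill in the model parameters (for example, model path and quantization). If the model is not available locally, Xinference downloads it automatically.
Click the "Launch" button and wait for the model to finish loading.
Once the model is loaded, you can interact with it on the "Chat with Model" page.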

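Using the command line:

Launch a model:
Taking chatglm3-6b as an example, a built-in model does not need a model path. Newer Xinference releases may also require a --model-engine option; run xinference launch --help to check.
xinference launch --model-name chatglm3 --model-format pytorch --model-size-in-billions 6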
If you need to specify the model path:
xinference launch --model-name chatglm3 --model-format pytorch --model-size-in-billions 6 --model-path /path/to/your/chatglm3-6b
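If you launch models from a script, the two command lines above can be assembled programmatically. The sketch below only uses the flags that appear in this tutorial; build_launch_command is a hypothetical helper, and it prints the command rather than running it.

```python
import shlex


def build_launch_command(model_name, model_format, size_in_billions, model_path=None):
    """Build the argv list for `xinference launch` using the flags from this tutorial."""
    argv = [
        "xinference", "launch",
        "--model-name", model_name,
        "--model-format", model_format,
        "--model-size-in-billions", str(size_in_billions),
    ]
    if model_path is not None:
        # Only pass --model-path when pointing at local weights.
        argv += ["--model-path", model_path]
    return argv


if __name__ == "__main__":
    argv = build_launch_command("chatglm3", "pytorch", 6)
    # Print a copy-pasteable command; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Keeping the arguments as a list avoids shell quoting problems when the model path contains spaces.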
List the running models:
xinference list
Interact with the model (using curl or Python):
Get the model's endpoint and model_uid:
$ xinference list
+----------------------------------+------------------------------------------------------------------+------------+
| model_uid                        | endpoint                                                         | model_name |
+----------------------------------+------------------------------------------------------------------+------------+
| 82e9895b6e474cb9b39987c47ab27439 | http://localhost:9997/v1/models/82e9895b6e474cb9b39987c47ab27439 | chatglm3   |
+----------------------------------+------------------------------------------------------------------+------------+
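When the model_uid is needed in a script, the table printed by xinference list can be parsed as text. This is a rough sketch that assumes the output has the bordered layout shown in this tutorial; the column names and layout may differ between Xinference versions, and parse_ascii_table is a hypothetical helper.

```python
def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, blank lines and the shell prompt
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample copied from the tutorial's `xinference list` output.
SAMPLE = """
+----------------------------------+------------------------------------------------------------------+------------+
| model_uid                        | endpoint                                                         | model_name |
+----------------------------------+------------------------------------------------------------------+------------+
| 82e9895b6e474cb9b39987c47ab27439 | http://localhost:9997/v1/models/82e9895b6e474cb9b39987c47ab27439 | chatglm3   |
+----------------------------------+------------------------------------------------------------------+------------+
"""

models = parse_ascii_table(SAMPLE)
print(models[0]["model_uid"])  # prints 82e9895b6e474cb9b39987c47ab27439
```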
Using curl (the chat endpoint expects a "messages" array; set "model" to your own model_uid):
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{ "model": "82e9895b6e474cb9b39987c47ab27439", "messages": [{"role": "user", "content": "你好"}] }' \
  http://localhost:9997/v1/chat/completions
Using Python (OpenAI client):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # replace with your Xinference endpoint
    api_key="EMPTY",  # Xinference does not need an API key
)

completion = client.chat.completions.create(
    model="82e9895b6e474cb9b39987c47ab27439",  # replace with your model_uid
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好!"},
    ],
)
print(completion.choices[0].message)
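If you would rather not install the openai package, the same chat request can be sketched with the standard library alone. The helper names below are hypothetical, the model_uid is the sample value from this tutorial, and the response parsing assumes the OpenAI-style chat completion layout (choices, message, content).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model_uid: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model_uid,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:9997/v1", "82e9895b6e474cb9b39987c47ab27439", "你好"
    )
    print(req.get_method(), req.full_url)
    # send_chat_request(req) would perform the call against a running server.
```

Separating request construction from sending makes the payload easy to inspect before any network traffic happens.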

Notes

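Model download: the first time you deploy a model, Xinference downloads the model files automatically. Make sure you have a stable network connection and enough disk space.
Hardware requirements: running LLMs places demands on hardware, GPU memory in particular. If your GPU memory is insufficient, try a quantized model (for example, GPTQ format) or a smaller model.
Model path: when launching from the command line, if the model is not in Xinference's built-in model list, set --model-path to your local model path.
Port conflicts: if the default port 9997 is already in use, start the service on a different host and port with the --host and --port options.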
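For the port conflict case, a small script can test whether 9997 is free and suggest an alternative before you start the service. This is a sketch with hypothetical helper names; another process could still claim the port between the check and the actual start.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound, i.e. nothing else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(preferred: int = 9997, host: str = "127.0.0.1") -> int:
    """Return `preferred` if it is free, otherwise a free port chosen by the OS."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 asks the OS for any free port
        return probe.getsockname()[1]


if __name__ == "__main__":
    port = pick_port()
    # Print the start command with the chosen port.
    print(f"xinference-local --host 127.0.0.1 --port {port}")
```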

Advanced usage

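Distributed deployment: Xinference supports distributed deployment; see the official documentation for details: https://inference.readthedocs.io/en/latest/guides/distributed_deployment.html
Custom models: you can deploy models you trained yourself; see the official documentation for the procedure: https://inference.readthedocs.io/en/latest/guides/register_custom_model.html
Model acceleration: Xinference supports several acceleration options, such as the GGML and GPTQ quantized formats; choose the one that fits your hardware and model.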
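To judge whether a quantized format is worth it on your hardware, a back-of-the-envelope estimate of weight memory helps: parameter count times bits per parameter, divided by 8. The sketch below covers the weights only and ignores activations, the KV cache, and framework overhead, so real usage will be higher; weight_memory_gib is a hypothetical helper.

```python
def weight_memory_gib(params_in_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = params_in_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # A 6B-parameter model (the chatglm3-6b size used in this tutorial).
    for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
        print(f"6B model, {label}: ~{weight_memory_gib(6, bits):.1f} GiB")
```

For a 6B model this gives roughly 11.2 GiB at fp16, 5.6 GiB at int8, and 2.8 GiB at 4 bits, which is why quantized or smaller models are the usual fallback when GPU memory is tight.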

I hope this tutorial helps you deploy Xinference on your machine. If you run into problems during deployment, feel free to ask.
