1. MindIE Serving
1.1 Environment Setup
Image download link
Parameter notes:
- `--device` mounts the NPU cards into the container; the example below mounts 8 cards
- Remember to change the image name on the second-to-last line of the command
```shell
docker run -itd --privileged --name=mindie --net=host \
    --shm-size 500g \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device /dev/devmm_svm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /usr/local/sbin:/usr/local/sbin \
    -v /etc/hccn.conf:/etc/hccn.conf \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-800I-A2-py311-openeuler24.03-lts \
    bash

# Enter the container
docker exec -it mindie bash
```
1.2 MindIE Basic Configuration
```shell
cd /usr/local/Ascend/mindie/latest/mindie-service/
vim conf/config.json

# Parameters that need to be changed:
#   httpsEnabled:    false
#   modelName:       like vLLM's modelName, used to identify the model; any value is fine
#   modelWeightPath: local path to the model weights
#   npuDeviceIds:    NPU cards to use, default [[0,1,2,3]]; can be left unchanged for now
#   worldSize:       number of cards used, must match npuDeviceIds; default 4
```
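If the same edits have to be repeated on several machines, they can be scripted instead of made in vim. Below is a minimal sketch (not an official MindIE tool) that patches the fields listed above using only Python's json module; the model name and weight path passed in are placeholders, and the key search is recursive because the nesting of config.json can differ between MindIE versions.

```python
import json

CONFIG = "/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json"

def patch_config(path, model_name, weight_path):
    """Load MindIE's config.json, apply the edits listed above, write it back."""
    with open(path) as f:
        cfg = json.load(f)

    def set_key(node, key, value):
        # Recursively set every occurrence of `key`, at any depth.
        if isinstance(node, dict):
            if key in node:
                node[key] = value
            for v in node.values():
                set_key(v, key, value)
        elif isinstance(node, list):
            for v in node:
                set_key(v, key, value)

    set_key(cfg, "httpsEnabled", False)
    set_key(cfg, "modelName", model_name)
    set_key(cfg, "modelWeightPath", weight_path)
    with open(path, "w") as f:
        json.dump(cfg, f, indent=4, ensure_ascii=False)

# patch_config(CONFIG, "my-model", "/data/models/my-model")  # placeholder values
```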
```shell
# Start MindIE
cd /usr/local/Ascend/mindie/latest/mindie-service
./bin/mindieservice_daemon
```

Test the service:
```shell
curl -H "Accept: application/json" -H "Content-type: application/json" \
  -X POST -d '{
    "prompt": "你是谁",
    "max_tokens": 200,
    "repetition_penalty": 1.03,
    "presence_penalty": 1.2,
    "frequency_penalty": 1.2,
    "temperature": 0.5,
    "top_k": 10,
    "top_p": 0.95,
    "stream": false,
    "ignore_eos": false
  }' http://127.0.0.1:1025/generate
```
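The same request can be sent from Python with only the standard library. A sketch assuming the default address used above; the actual call is left commented out so the snippet does not require a running server:

```python
import json
import urllib.request

# Same sampling parameters as the curl example above.
payload = {
    "prompt": "你是谁",
    "max_tokens": 200,
    "repetition_penalty": 1.03,
    "presence_penalty": 1.2,
    "frequency_penalty": 1.2,
    "temperature": 0.5,
    "top_k": 10,
    "top_p": 0.95,
    "stream": False,
    "ignore_eos": False,
}

def generate(endpoint="http://127.0.0.1:1025/generate"):
    """POST the payload above to MindIE's /generate endpoint and decode the reply."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# print(generate())  # requires a running MindIE instance
```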
2. Benchmark
2.1 Configuration
```shell
# Fix the config file permission issue
chmod 640 /usr/local/lib/python3.11/site-packages/mindieclient/python/config/config.json
# Print logs to the terminal
export MINDIE_LOG_TO_STDOUT="benchmark:1; client:1"
```
2.2 Datasets
2.2.1 Synthetic Data (Performance Testing)
- You can specify the number of input and output tokens.
- The generated data is the letter `A` followed by a space; e.g. with an input token count of 5, the prompt is "A A A A A".
- Generation is not cut short by the end-of-sequence token, so you can set output lengths that match your real workload and measure performance more faithfully.
- Create the config file anywhere: `vim synthetic_config.json`
Example:
```json
{
    "Input": {
        "Method": "uniform",
        "Params": {"MinValue": 1, "MaxValue": 200}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 100, "Var": 200, "MinValue": 1, "MaxValue": 100}
    },
    "RequestCount": 100
}
```
| Parameter | Meaning | Range |
| --- | --- | --- |
| Input | Input configuration | - |
| Output | Output configuration | - |
| RequestCount | Number of requests, i.e. the sample count | [1, 1048576] |
| Method | Sampling method | "uniform", "gaussian", or "zipf" |
| Params | Parameters for the chosen sampling method | See Table 2 for details |
| "MinValue" in "Input" | Minimum token count | [1, 1048576] |
| "MaxValue" in "Input" | Maximum token count | [1, 1048576] |
| "MinValue" in "Output" | Minimum token count | [1, 1048576] |
| "MaxValue" in "Output" | Maximum token count | [1, 1048576] |
| "Mean" in "gaussian" | Mean of the Gaussian distribution | [-3.0 × 10^38, 3.0 × 10^38] |
| "Var" in "gaussian" | Variance of the Gaussian distribution | [0, 3.0 × 10^38] |
| "Alpha" in "zipf" | Alpha coefficient of the Zipf distribution | (1.0, 10.0] |

Note: 1048576 = 2^20 = 1M.
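To make the sampling semantics concrete, here is a rough Python sketch (not the benchmark tool's actual implementation) of how a synthetic dataset could be drawn from such a config, including the `A A A …` prompt construction described above; the zipf branch is omitted for brevity.

```python
import random

def sample_length(spec):
    """Draw one token count from a Method/Params entry of synthetic_config.json."""
    method, p = spec["Method"], spec["Params"]
    if method == "uniform":
        return random.randint(p["MinValue"], p["MaxValue"])
    if method == "gaussian":
        # "Var" is a variance, so the standard deviation is its square root;
        # clamp the draw into [MinValue, MaxValue] as the table requires.
        n = round(random.gauss(p["Mean"], p["Var"] ** 0.5))
        return max(p["MinValue"], min(p["MaxValue"], n))
    raise NotImplementedError(f"method {method!r} not sketched here")

def make_dataset(config):
    """Return (prompt, target_output_tokens) pairs; prompts are 'A' repeated."""
    pairs = []
    for _ in range(config["RequestCount"]):
        n_in = sample_length(config["Input"])
        n_out = sample_length(config["Output"])
        pairs.append((" ".join(["A"] * n_in), n_out))
    return pairs

# The example config from above.
config = {
    "Input": {"Method": "uniform", "Params": {"MinValue": 1, "MaxValue": 200}},
    "Output": {"Method": "gaussian",
               "Params": {"Mean": 100, "Var": 200, "MinValue": 1, "MaxValue": 100}},
    "RequestCount": 100,
}
dataset = make_dataset(config)
```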
Benchmark command: pass the synthetic data config path via `SyntheticConfigPath`:
```shell
benchmark \
    --DatasetType "synthetic" \
    --ModelName llama_7b \
    --ModelPath "/{model weight path}/llama_7b" \
    --TestType vllm_client \
    --Http https://{ipAddress}:{port} \
    --ManagementHttp https://{managementIpAddress}:{managementPort} \
    --Concurrency 128 \
    --MaxOutputLen 20 \
    --TaskKind stream \
    --Tokenizer True \
    --SyntheticConfigPath /{config file path}/synthetic_config.json
```
Test environment: 910B, 4 cards, DeepSeek-R1-Qwen-32B
```
# Config:
{
    "Input": {
        "Method": "uniform",
        "Params": {"MinValue": 500, "MaxValue": 1000}
    },
    "Output": {
        "Method": "gaussian",
        "Params": {"Mean": 100, "Var": 200, "MinValue": 1, "MaxValue": 100}
    },
    "RequestCount": 2000
}

# Output
+---------------------+----------------+-----------------+----------------+---------------+----------------+----------------+-----------------+------+
| Metric              | average        | max             | min            | P75           | P90            | SLO_P90        | P99             | N    |
+---------------------+----------------+-----------------+----------------+---------------+----------------+----------------+-----------------+------+
| FirstTokenTime      | 762.4963 ms    | 9074.5516 ms    | 118.6991 ms    | 567.9139 ms   | 1415.2258 ms   | 1415.2258 ms   | 8181.3668 ms    | 2000 |
| DecodeTime          | 170.4084 ms    | 8435.523 ms     | 0.0157 ms      | 246.6154 ms   | 444.639 ms     | 185.2339 ms    | 767.7023 ms     | 2000 |
| LastDecodeTime      | 262.8068 ms    | 928.5161 ms     | 20.0195 ms     | 423.1779 ms   | 617.8357 ms    | 617.8357 ms    | 911.0484 ms     | 2000 |
| MaxDecodeTime       | 1279.4515 ms   | 8435.523 ms     | 75.1517 ms     | 907.9784 ms   | 2699.2586 ms   | 2699.2586 ms   | 7724.6093 ms    | 2000 |
| GenerateTime        | 16700.8135 ms  | 20314.2266 ms   | 4299.7291 ms   | 18216.5194 ms | 18719.7726 ms  | 18719.7726 ms  | 20156.1953 ms   | 2000 |
| InputTokens         | 753.29         | 999             | 500            | 877.0         | 951.0          | 951.0          | 995.01          | 2000 |
| GeneratedTokens     | 94.5235        | 100             | 55             | 100.0         | 100.0          | 100.0          | 100.0           | 2000 |
| GeneratedTokenSpeed | 5.8279 token/s | 21.2943 token/s | 4.2824 token/s | 5.677 token/s | 6.0752 token/s | 6.0752 token/s | 14.8526 token/s | 2000 |
| GeneratedCharacters | 189.047        | 200             | 110            | 200.0         | 200.0          | 200.0          | 200.0           | 2000 |
| Tokenizer           | 4.7329 ms      | 33.6573 ms      | 1.3688 ms      | 5.6392 ms     | 6.5679 ms      | 6.5679 ms      | 8.5305 ms       | 2000 |
| Detokenizer         | 1.0233 ms      | 1.2207 ms       | 0.6008 ms      | 1.081 ms      | 1.086 ms       | 1.086 ms       | 1.0986 ms       | 2000 |
| CharactersPerToken  | 2.0            | /               | /              | /             | /              | /              | /               | 2000 |
| PostProcessingTime  | 0 ms           | 0 ms            | 0 ms           | 0 ms          | 0 ms           | 0 ms           | 0 ms            | 2000 |
| ForwardTime         | 0 ms           | 0 ms            | 0 ms           | 0 ms          | 0 ms           | 0 ms           | 0 ms            | 2000 |
+---------------------+----------------+-----------------+----------------+---------------+----------------+----------------+-----------------+------+

[2025-04-29 19:12:10.619+08:00] [13134] [281473497909536] [benchmark] [INFO] [output.py:121] The BenchMark test common metric result is:
+------------------------+---------------------+
| Common Metric          | Value               |
+------------------------+---------------------+
| CurrentTime            | 2025-04-29 19:12:10 |
| TimeElapsed            | 263.6115 s          |
| DataSource             | None                |
| Failed                 | 0( 0.0% )           |
| Returned               | 2000( 100.0% )      |
| Total                  | 2000[ 100.0% ]      |
| Concurrency            | 128                 |
| ModelName              | llm                 |
| lpct                   | 1.0122 ms           |
| Throughput             | 7.5869 req/s        |
| GenerateSpeed          | 717.1425 token/s    |
| GenerateSpeedPerClient | 5.6027 token/s      |
| accuracy               | /                   |
+------------------------+---------------------+
```
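The summary metrics above are consistent with one another; a quick arithmetic cross-check (all input values copied from the tables above):

```python
# Values reported by the benchmark run above.
returned = 2000          # Returned requests
elapsed = 263.6115       # TimeElapsed, seconds
concurrency = 128        # Concurrency
avg_tokens = 94.5235     # average GeneratedTokens per request

throughput = returned / elapsed            # requests per second
generate_speed = throughput * avg_tokens   # aggregate token/s across all clients
per_client = generate_speed / concurrency  # token/s per concurrent client

print(f"{throughput:.4f} req/s, {generate_speed:.4f} token/s, {per_client:.4f} token/s/client")
```

This reproduces the reported Throughput (7.5869 req/s), GenerateSpeed (717.1425 token/s), and GenerateSpeedPerClient (5.6027 token/s) up to rounding.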
