✌️Polygraphy-Cheatsheet
Install Polygraphy
python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com
API
Build an engine from an ONNX model, run inference, and check the output
"""
This script builds and runs a TensorRT engine with FP16 precision enabled
starting from an ONNX identity model.
"""
import numpy as np
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, SaveEngine, TrtRunner
def main():
    # We can compose multiple lazy loaders together to get the desired conversion.
    # In this case, we want ONNX -> TensorRT Network -> TensorRT engine (w/ fp16).
    #
    # NOTE: `build_engine` is a *callable* that returns an engine, not the engine itself.
    #   To get the engine directly, you can use the immediately evaluated functional API.
    #   See examples/api/06_immediate_eval_api for details.
    build_engine = EngineFromNetwork(
        NetworkFromOnnxPath("identity.onnx"), config=CreateConfig(fp16=True)
    )  # Note that config is an optional argument.

    # To reuse the engine elsewhere, we can serialize and save it to a file.
    # The `SaveEngine` lazy loader will return the TensorRT engine when called,
    # which allows us to chain it together with other loaders.
    build_engine = SaveEngine(build_engine, path="identity.engine")

    # Once our loader is ready, inference is simply a matter of constructing a runner,
    # activating it with a context manager (i.e. `with TrtRunner(...)`) and calling `infer()`.
    #
    # NOTE: You can use the activate() function instead of a context manager, but you will need to make sure to
    #   deactivate() to avoid a memory leak. For that reason, a context manager is the safer option.
    with TrtRunner(build_engine) as runner:
        inp_data = np.ones(shape=(1, 1, 2, 2), dtype=np.float32)

        # NOTE: The runner owns the output buffers and is free to reuse them between `infer()` calls.
        #   Thus, if you want to store results from multiple inferences, you should use `copy.deepcopy()`.
        outputs = runner.infer(feed_dict={"x": inp_data})

        assert np.array_equal(outputs["y"], inp_data)  # It's an identity model!

        print("Inference succeeded!")
if __name__ == "__main__":
    main()
Load an engine directly, run inference, and check the output
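A minimal sketch of deserializing a previously saved engine with the lazy loaders (the file name identity.engine and input/output names match the example above):
import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

# Deserialize the engine that was saved by SaveEngine above.
load_engine = EngineFromBytes(BytesFromPath("identity.engine"))

with TrtRunner(load_engine) as runner:
    inp_data = np.ones(shape=(1, 1, 2, 2), dtype=np.float32)
    outputs = runner.infer(feed_dict={"x": inp_data})
    assert np.array_equal(outputs["y"], inp_data)  # Still an identity model.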
Quickly view the model structure
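The quickest way is the inspect subcommand; a sketch, assuming a local model.onnx (detail flags such as --show layers, or --mode in older releases, vary by version):
polygraphy inspect model model.onnx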
Compare the outputs of multiple inference backends
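A sketch using the Comparator API with default tolerances (model.onnx is a placeholder):
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

# Run the same model under ONNX Runtime and TensorRT, then compare all outputs.
runners = [
    OnnxrtRunner(SessionFromOnnx("model.onnx")),
    TrtRunner(EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))),
]
run_results = Comparator.run(runners)
assert bool(Comparator.compare_accuracy(run_results))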
Validate the outputs of a single inference backend
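A sketch; Comparator.validate checks the collected outputs for NaN/Inf values (model.onnx is a placeholder):
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.comparator import Comparator

run_results = Comparator.run([OnnxrtRunner(SessionFromOnnx("model.onnx"))])
assert Comparator.validate(run_results)  # Fails if any output contains NaN or Inf.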
Interoperate between the TensorRT API and Polygraphy
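A sketch using func.extend to drop down to the raw TensorRT network created by a Polygraphy loader (the output rename is only an illustration):
from polygraphy import func
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath

# The decorated function receives the loader's return values (builder, network, parser)
# and can modify the network in place with the plain TensorRT API.
@func.extend(NetworkFromOnnxPath("model.onnx"))
def load_network(builder, network, parser):
    network.get_output(0).name = "renamed_output"

# The extended function can be used anywhere a network loader is expected.
build_engine = EngineFromNetwork(load_network)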
INT8 calibration in TensorRT
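A sketch of API-side calibration; the data generator, input name "x", and cache file name are assumptions:
import numpy as np
from polygraphy.backend.trt import Calibrator, CreateConfig, EngineFromNetwork, NetworkFromOnnxPath

# Hypothetical calibration data: yield feed_dicts keyed by input tensor name.
def calib_data():
    for _ in range(4):
        yield {"x": np.ones(shape=(1, 1, 2, 2), dtype=np.float32)}

# The cache file avoids re-running calibration on subsequent builds.
calibrator = Calibrator(data_loader=calib_data(), cache="identity-calib.cache")
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("identity.onnx"),
    config=CreateConfig(int8=True, calibrator=calibrator),
)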
Build a network with the TensorRT API
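A sketch that builds a trivial identity network from scratch (the layer choice and shapes are arbitrary):
import tensorrt as trt
from polygraphy import func
from polygraphy.backend.trt import CreateNetwork, EngineFromNetwork

# CreateNetwork yields a fresh (builder, network) pair; func.extend lets us populate it.
@func.extend(CreateNetwork())
def create_network(builder, network):
    inp = network.add_input("input", dtype=trt.float32, shape=(1, 2, 2))
    out = network.add_identity(inp).get_output(0)
    network.mark_output(out)

build_engine = EngineFromNetwork(create_network)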
Immediately evaluated functional API
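A sketch using the lowercase, immediately evaluated variants of the loaders (available in recent Polygraphy versions):
from polygraphy.backend.trt import create_config, engine_from_network, network_from_onnx_path, save_engine

# Each call returns the object itself rather than a lazy loader.
engine = engine_from_network(
    network_from_onnx_path("identity.onnx"), config=create_config(fp16=True)
)
save_engine(engine, path="identity.engine")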
Dynamic shapes
Setting The Stage
Performance Considerations
A Possible Solution
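A sketch of building with an optimization profile; the input name X, the shape ranges, and the file dynamic_identity.onnx are assumptions, and separate profiles can be added for different shape regimes:
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, Profile

profiles = [
    Profile().add("X", min=(1, 2, 1, 1), opt=(1, 2, 3, 3), max=(1, 2, 5, 5)),
]
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("dynamic_identity.onnx"),
    config=CreateConfig(profiles=profiles),
)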
Save input data & process run results
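A sketch: save the data from a CLI run, then post-process it in Python (file names are placeholders):
polygraphy run model.onnx --onnxrt --save-inputs inputs.json --save-outputs outputs.json

from polygraphy.json import load_json

# Polygraphy's JSON utilities restore the original Python objects:
# a list of feed_dicts for the inputs, and per-runner results for the outputs.
inputs = load_json("inputs.json")
outputs = load_json("outputs.json")
for feed_dict in inputs:
    for name, array in feed_dict.items():
        print(name, array.shape)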
CLI
INT8 calibration in TensorRT
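A sketch; --data-loader-script points at a Python file defining a load_data() generator that yields feed_dicts (file names are placeholders):
polygraphy convert model.onnx --int8 \
    --data-loader-script ./data_loader.py \
    --calibration-cache calib.cache \
    -o model.engine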
Build a deterministic engine in TensorRT
Running The Example
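A sketch: record the tactics TensorRT selects during one build, then replay them so later builds are reproducible (file names are placeholders):
polygraphy convert model.onnx --save-tactics replay.json -o model.engine
polygraphy convert model.onnx --load-tactics replay.json -o model.engine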
Dynamic shapes
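A sketch of the CLI equivalent (the input name X and the shapes are assumptions):
polygraphy run dynamic_identity.onnx --trt \
    --trt-min-shapes X:[1,2,1,1] --trt-opt-shapes X:[1,2,3,3] --trt-max-shapes X:[1,2,5,5]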
Convert ONNX to FP16 & analyze accuracy loss
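A sketch; the --fp-to-fp16 conversion relies on onnxmltools, and the tolerances are placeholders:
# Convert weights/ops to FP16:
polygraphy convert model.onnx --fp-to-fp16 -o model_fp16.onnx
# Compare the FP16 model against the original FP32 outputs:
polygraphy run model.onnx --onnxrt --save-outputs fp32_outputs.json
polygraphy run model_fp16.onnx --onnxrt --load-outputs fp32_outputs.json --atol 1e-3 --rtol 1e-3
# Check whether the conversion introduced NaN/Inf values:
polygraphy run model_fp16.onnx --onnxrt --validate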
Debug TensorRT engine-building tactics
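A sketch of the debug build workflow against golden ONNX Runtime outputs (flag spellings may differ slightly between Polygraphy versions):
# Save golden outputs first:
polygraphy run model.onnx --onnxrt --save-outputs golden.json
# Rebuild the engine repeatedly, checking each build against the golden outputs and
# sorting the recorded tactic replays into good/bad:
polygraphy debug build model.onnx --fp16 --save-tactics replay.json \
    --artifacts-dir replays --artifacts replay.json --until=10 \
    --check polygraphy run polygraphy_debug.engine --trt --load-outputs golden.json
# Diff the collected replays to find the suspect tactic:
polygraphy inspect diff-tactics --dir replays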
Reduce failing ONNX models
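A sketch; debug reduce bisects the model and re-runs the --check command on each candidate, which it writes to polygraphy_debug.onnx by default:
polygraphy debug reduce model.onnx -o reduced.onnx \
    --check polygraphy run polygraphy_debug.onnx --trt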

Inspect a TensorRT network
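A sketch; --display-as=trt shows the ONNX model as the TensorRT network it would be parsed into:
polygraphy inspect model model.onnx --display-as=trt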
Inspect a TensorRT engine
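A sketch; inspecting serialized engines requires a reasonably recent Polygraphy/TensorRT:
polygraphy inspect model model.engine --model-type engine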
Inspect an ONNX model
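A sketch; the --show flag (newer versions; --mode in older ones) controls how much detail is printed:
polygraphy inspect model model.onnx --show layers attrs weights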
Inspect model output data
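A sketch; outputs.json is a file produced with --save-outputs:
polygraphy inspect data outputs.json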
Inspect model input data
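A sketch; inputs.json is a file produced with --save-inputs, and --show-values prints the actual tensors:
polygraphy inspect data inputs.json --show-values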
Inspect a tactic (replay) file
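A sketch; replay.json is a tactic replay produced with --save-tactics:
polygraphy inspect tactics replay.json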
Check TensorRT support for ONNX operators
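A sketch; this reports which parts of the model TensorRT can parse and which operators are unsupported:
polygraphy inspect capability model.onnx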

Comparison between different inference frameworks
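A sketch of the one-line CLI comparison (tolerances are placeholders):
polygraphy run model.onnx --trt --onnxrt --atol 1e-4 --rtol 1e-4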