Preparation
This section describes how to convert ONNX models into OpenVINO IR format, quantize for CPU/NPU, and validate compatibility.
OpenVINO IR is required for:
- Optimal CPU performance
- iGPU/NPU execution
- INT8 hardware acceleration
1. Export Model to ONNX
PyTorch example:
import torch

# "model" is your trained torch.nn.Module; the input shape below is only an
# example, so replace it with the shape your model actually expects.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
)
TensorFlow:
python -m tf2onnx.convert --saved-model ./saved --output model.onnx
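Before converting, it can be worth a quick structural sanity check of the exported file. A minimal sketch using the onnx Python package:
import onnx

# Load the exported graph and run ONNX's built-in structural checks.
onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)
print("opset:", onnx_model.opset_import[0].version)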
2. Convert ONNX → OpenVINO IR
Use Model Optimizer:
mo --input_model model.onnx --output_dir ./ir --compress_to_fp16
Produces:
- model.xml (network topology)
- model.bin (weights)
FP16 compression halves the model size and is recommended for:
- iGPU
- NPU
The CPU plugin also accepts FP16 IR (weights are executed in FP32 internally), so FP16 is a safe default there as well.
For FP32:
mo --input_model model.onnx --data_type FP32
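If you prefer to stay in Python, the same conversion is available through the mo Python API shipped with the openvino-dev package. A minimal sketch:
import os
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

os.makedirs("ir", exist_ok=True)
ov_model = convert_model("model.onnx")   # in-memory openvino Model
serialize(ov_model, "ir/model.xml")      # writes model.xml + model.bin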
3. Quantization (INT8)
Use POT (Post-Training Optimization Tool):
from openvino.tools.pot import DataLoader, IEEngine, load_model, save_model
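A minimal sketch of the DefaultQuantization flow built on this API. The NumpyCalibrationLoader class and the ./calib folder of preprocessed .npy tensors are illustrative assumptions, not part of POT:
import os
import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, load_model, save_model,
                                create_pipeline, compress_model_weights)

class NumpyCalibrationLoader(DataLoader):
    # Hypothetical loader: reads pre-saved, already-preprocessed .npy tensors.
    def __init__(self, folder):
        self._folder = folder
        self._files = sorted(f for f in os.listdir(folder) if f.endswith(".npy"))

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        # (data, annotation); labels are not needed for DefaultQuantization.
        return np.load(os.path.join(self._folder, self._files[index])), None

model_config = {"model_name": "model", "model": "model.xml", "weights": "model.bin"}
engine_config = {"device": "CPU"}
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "ANY", "preset": "performance",
                          "stat_subset_size": 300}}]

model = load_model(model_config=model_config)
engine = IEEngine(config=engine_config, data_loader=NumpyCalibrationLoader("./calib"))
pipeline = create_pipeline(algorithms, engine)

quantized_model = pipeline.run(model)
compress_model_weights(quantized_model)   # optional: smaller .bin on disk
save_model(quantized_model, save_path="./int8", model_name="model_int8")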
CLI version (driven by a JSON configuration that describes the model, dataset, and quantization algorithm):
pot -c config.json --output-dir output/
Benefits:
- Faster CPU execution
- Required for optimal NPU performance
4. Validate the IR Model
from openvino.runtime import Core
ie = Core()
model = ie.read_model("model.xml")
compiled = ie.compile_model(model, "CPU")
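To confirm inference actually runs end to end, push a dummy tensor through the compiled model. A sketch; the [1, 3, 224, 224] FP32 input is an example, use your model's real shape:
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model("model.xml", "CPU")

# Placeholder input; match shape and dtype to your model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)

result = compiled.create_infer_request().infer({0: dummy})
print(result[compiled.output(0)].shape)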
5. Check Input/Output Shapes
for inp in model.inputs:
    print(inp.get_any_name(), inp.partial_shape, inp.element_type)
for out in model.outputs:
    print(out.get_any_name(), out.partial_shape, out.element_type)
Ensure:
- NCHW or NHWC matches your preprocessing
- Dynamic dimensions (shown as ? or -1) are present where your pipeline needs them
6. Layout Conversion
If your model expects NHWC input but your pipeline supplies NCHW tensors, Model Optimizer can insert the transpose during conversion:
mo --input_model model.onnx --layout "input(nhwc->nchw)"
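Alternatively, the same transpose can be added at load time without touching the IR, using the runtime preprocessing API. A sketch for the scenario above:
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Core, Layout

core = Core()
model = core.read_model("model.xml")

ppp = PrePostProcessor(model)
ppp.input().tensor().set_layout(Layout("NCHW"))   # what the pipeline provides
ppp.input().model().set_layout(Layout("NHWC"))    # what the network expects
model = ppp.build()                               # transpose is now part of the model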
7. Batch Size
Static batch:
mo --input_model model.onnx --batch 4
Dynamic batch:
mo --input_model model.onnx --input_shape "[-1,3,224,224]"
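The batch dimension can also be made dynamic after conversion by reshaping the model at load time. A sketch; the 3x224x224 image shape is an example:
from openvino.runtime import Core, PartialShape

core = Core()
model = core.read_model("model.xml")
# -1 marks the batch dimension as dynamic; the remaining dims are an example shape.
model.reshape(PartialShape([-1, 3, 224, 224]))
compiled = core.compile_model(model, "CPU")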
8. Best Practices
- Prefer FP16 for GPU/NPU
- Prefer INT8 for CPU (if the accuracy drop is acceptable)
- Always validate the model with a real input
- Use dynamic input shapes for video analytics pipelines