Preparation
This section describes how to convert ONNX models into OpenVINO IR format, quantize for CPU/NPU, and validate compatibility.
OpenVINO IR is required for:
- Optimal CPU performance
- iGPU/NPU execution
- INT8 hardware acceleration
1. Export Model to ONNX
PyTorch example:
import torch

# "model" is your trained torch.nn.Module; the input shape below is only an
# example, so replace it with the shape your model actually expects.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
)
TensorFlow:
python -m tf2onnx.convert --saved-model ./saved --output model.onnx
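Before converting, it can be worth a quick structural sanity check of the exported file. A minimal sketch using the onnx Python package:
import onnx

# Load the exported graph and run ONNX's built-in structural checks.
onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)
print("opset:", onnx_model.opset_import[0].version)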
2. Convert ONNX → OpenVINO IR
Use Model Optimizer:
mo --input_model model.onnx --output_dir ./ir --compress_to_fp16
Produces:
- model.xml (network topology)
- model.bin (weights)
FP16 compression halves the model size and is recommended for:
- iGPU
- NPU
The CPU plugin also accepts FP16 IR (weights are executed in FP32 internally), so FP16 is a safe default there as well.
For FP32:
mo --input_model model.onnx --data_type FP32
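If you prefer to stay in Python, the same conversion is available through the mo Python API shipped with the openvino-dev package. A minimal sketch:
import os
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

os.makedirs("ir", exist_ok=True)
ov_model = convert_model("model.onnx")   # in-memory openvino Model
serialize(ov_model, "ir/model.xml")      # writes model.xml + model.bin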
3. Quantization (INT8)
Use POT (Post-Training Optimization Tool):
from openvino.tools.pot import DataLoader, IEEngine, load_model, save_model
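A minimal sketch of the DefaultQuantization flow built on this API. The NumpyCalibrationLoader class and the ./calib folder of preprocessed .npy tensors are illustrative assumptions, not part of POT:
import os
import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, load_model, save_model,
                                create_pipeline, compress_model_weights)

class NumpyCalibrationLoader(DataLoader):
    # Hypothetical loader: reads pre-saved, already-preprocessed .npy tensors.
    def __init__(self, folder):
        self._folder = folder
        self._files = sorted(f for f in os.listdir(folder) if f.endswith(".npy"))

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        # (data, annotation); labels are not needed for DefaultQuantization.
        return np.load(os.path.join(self._folder, self._files[index])), None

model_config = {"model_name": "model", "model": "model.xml", "weights": "model.bin"}
engine_config = {"device": "CPU"}
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "ANY", "preset": "performance",
                          "stat_subset_size": 300}}]

model = load_model(model_config=model_config)
engine = IEEngine(config=engine_config, data_loader=NumpyCalibrationLoader("./calib"))
pipeline = create_pipeline(algorithms, engine)

quantized_model = pipeline.run(model)
compress_model_weights(quantized_model)   # optional: smaller .bin on disk
save_model(quantized_model, save_path="./int8", model_name="model_int8")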
CLI version (driven by a JSON configuration that describes the model, dataset, and quantization algorithm):
pot -c config.json --output-dir output/
Benefits:
- Faster CPU execution
- Required for optimal NPU performance
4. Validate the IR Model
from openvino.runtime import Core
ie = Core()
model = ie.read_model("model.xml")
compiled = ie.compile_model(model, "CPU")
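To confirm inference actually runs end to end, push a dummy tensor through the compiled model. A sketch; the [1, 3, 224, 224] FP32 input is an example, use your model's real shape:
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model("model.xml", "CPU")

# Placeholder input; match shape and dtype to your model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)

result = compiled.create_infer_request().infer({0: dummy})
print(result[compiled.output(0)].shape)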
5. Check Input/Output Shapes
for inp in model.inputs:
    print(inp.get_any_name(), inp.partial_shape, inp.element_type)
for out in model.outputs:
    print(out.get_any_name(), out.partial_shape, out.element_type)
Ensure:
- NCHW or NHWC matches your preprocessing
- Dynamic dimensions (shown as ? or -1) are present where your pipeline needs them
6. Layout Conversion
If your model expects NHWC input but your pipeline supplies NCHW tensors, Model Optimizer can insert the transpose during conversion:
mo --input_model model.onnx --layout "input(nhwc->nchw)"
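Alternatively, the same transpose can be added at load time without touching the IR, using the runtime preprocessing API. A sketch for the scenario above:
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Core, Layout

core = Core()
model = core.read_model("model.xml")

ppp = PrePostProcessor(model)
ppp.input().tensor().set_layout(Layout("NCHW"))   # what the pipeline provides
ppp.input().model().set_layout(Layout("NHWC"))    # what the network expects
model = ppp.build()                               # transpose is now part of the model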
7. Batch Size
Static batch:
mo --input_model model.onnx --batch 4
Dynamic batch:
mo --input_model model.onnx --input_shape "[-1,3,224,224]"
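The batch dimension can also be made dynamic after conversion by reshaping the model at load time. A sketch; the 3x224x224 image shape is an example:
from openvino.runtime import Core, PartialShape

core = Core()
model = core.read_model("model.xml")
# -1 marks the batch dimension as dynamic; the remaining dims are an example shape.
model.reshape(PartialShape([-1, 3, 224, 224]))
compiled = core.compile_model(model, "CPU")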
8. Best Practices
- Prefer FP16 for GPU/NPU
- Prefer INT8 for CPU (if the accuracy drop is acceptable)
- Always validate the model with a real input
- Use dynamic input shapes for video analytics pipelines