Running ONNX models with KServe on macOS, especially on ARM64 (Apple silicon), can be tricky when you don't have the GPU support that runtimes like NVIDIA Triton are built around. I created a simple demo to make it work: kserve-onnx-predictor-demo.
It uses a custom predictor written in Python with ONNX Runtime, which runs on the CPU, making it a good fit for macOS. It deploys to a Kind cluster with KServe. No GPU required, just a clean way to serve ONNX models.
Curious? The repo has all the details: code, manifests, and steps. Check it out at github.com/hbelmiro/kserve-onnx-predictor-demo and give it a spin!