Single-split boundary analysis for ONNX graphs. The tool enumerates all topological boundaries and ranks them by communication volume, compute balance, and (optionally) a simple link/latency model.
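The ranking idea can be sketched as follows. The scoring function here (a weighted sum of normalized cut bytes and FLOP imbalance) is a hypothetical stand-in for the tool's actual metrics in `metrics.py`, and the `Boundary` dataclass is illustrative only:

```python
from dataclasses import dataclass

@dataclass
class Boundary:
    """One topological cut after node index `index` (simplified model)."""
    index: int
    comm_bytes: int     # bytes of live tensors crossing the cut
    flops_left: float   # cumulative FLOPs before the cut
    flops_right: float  # cumulative FLOPs after the cut

def rank_boundaries(boundaries, alpha=0.5):
    """Rank cuts by a weighted sum of normalized communication volume
    and compute imbalance; lower score is better."""
    max_comm = max(b.comm_bytes for b in boundaries) or 1
    def score(b):
        total = b.flops_left + b.flops_right
        imbalance = abs(b.flops_left - b.flops_right) / total  # 0 = balanced
        return alpha * (b.comm_bytes / max_comm) + (1 - alpha) * imbalance
    return sorted(boundaries, key=score)

candidates = [
    Boundary(0, comm_bytes=4096, flops_left=10.0, flops_right=90.0),
    Boundary(1, comm_bytes=1024, flops_left=55.0, flops_right=45.0),
    Boundary(2, comm_bytes=8192, flops_left=50.0, flops_right=50.0),
]
best = rank_boundaries(candidates)[0]  # the small-cut, well-balanced boundary
```

`alpha` trades communication volume against compute balance; a link/latency model would replace the normalized byte count with an estimated transfer time.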
- `onnx_splitpoint_tool/` – Python package (core code)
  - `api.py` – public API / compatibility re-exports
  - `cli.py` – CLI implementation
  - `gui_app.py` – Tkinter GUI
  - `onnx_utils.py` – parsing + graph utilities
  - `metrics.py` – cost/FLOPs/memory estimation + ranking
  - `pruning.py` – skip-/block-aware pruning
  - `split_export.py` – split export + runner skeleton + context diagrams
- `analyse_and_split.py` – thin wrapper entrypoint (kept for backwards compatibility)
- `analyse_and_split_gui.py` – thin wrapper entrypoint (kept for backwards compatibility)
- `download_model_zoo_examples.py` – helper to download ONNX model zoo examples
- `model_zoo_manifest.json` – curated model list for the downloader
```bash
pip install onnx numpy matplotlib pillow
# optional (validation/runner):
pip install onnxruntime
```

The GUI can optionally run a Hailo feasibility check (parse-only) during ranking. This is useful for automatically pruning split candidates whose partitions cannot be translated by the Hailo toolchain.
- Install the Hailo Dataflow Compiler (DFC) Python wheel into the same Python environment as this tool.
- In the GUI enable Hailo check and set Backend = local (or keep auto).
The Hailo DFC wheel is Linux-only. Recommended setup:
- Install WSL2 + an Ubuntu distro.
- Create a WSL virtualenv (default expected path: `~/hailo_dfc_venv`) and install the Hailo DFC wheel there. A helper script is included:
  `./scripts/setup_hailo_dfc_wsl.sh /path/to/hailo_dataflow_compiler-*.whl`
- Run the GUI on Windows, enable Hailo check, set Backend = wsl (or auto), and point it to:
  - WSL venv: `~/hailo_dfc_venv/bin/activate`
  - WSL distro: optional (leave empty to use the default WSL distro)
Notes:
- The tool calls `wsl.exe` and runs a helper script inside WSL; you do not need to activate the venv manually.
- If your model files are on `C:`/`D:`, WSL can access them via `/mnt/c/...` and `/mnt/d/...`.
- A small helper for interactive shells is included: `source ./scripts/activate_hailo_dfc_venv.sh`
Practical tips:
- The GUI provides a Test backend button in the Hailo section to verify that the selected backend (local or WSL) can import the Hailo SDK.
- Hailo parse-only results are cached across runs (keyed by sub-model hash) to avoid re-running DFC translation during ranking. You can clear the cache from the GUI (Clear cache) or delete `~/.onnx_splitpoint_tool/hailo_parse_cache.json`.
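A minimal sketch of such a hash-keyed JSON cache; the function names and cache schema here are illustrative, not the tool's actual implementation:

```python
import hashlib
import json
from pathlib import Path

def model_hash(model_bytes: bytes) -> str:
    """Stable cache key for a serialized sub-model
    (e.g. the bytes from an onnx ModelProto.SerializeToString())."""
    return hashlib.sha256(model_bytes).hexdigest()

def cached_parse_result(model_bytes: bytes, run_parse, cache_path: Path):
    """Return a cached parse verdict for this sub-model, invoking the
    (expensive) `run_parse` callback only on a cache miss."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = model_hash(model_bytes)
    if key not in cache:
        cache[key] = run_parse(model_bytes)  # e.g. True/False feasibility
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_text(json.dumps(cache))
    return cache[key]
```

Hashing the serialized bytes means any change to the sub-model (graph, weights, opset) invalidates its entry automatically.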
Run the GUI:

```bash
python analyse_and_split_gui.py
```

Run the CLI analysis:

```bash
python analyse_and_split.py path/to/model.onnx --topk 10
```

- Export split-context diagrams (full + cut-flow) with configurable context hops (0..3).
- Suggest split boundaries based on activation-communication + compute balance.
- Skip-/Block-aware candidate pruning (heuristic: avoids splitting inside long skip/residual blocks).
- Link-model plugin for latency/energy with optional constraints (bandwidth/latency/energy).
- Pareto export + clean System / Workload separation (`system_config.json` + `workload_profile.json`).
- Peak activation memory (approximate, from value spans):
  - per boundary: live-set bytes (same basis as `Comm(b)`)
  - per partition: `peak_left[b] = max_{i<=b} live(i)`, `peak_right[b] = max_{i>=b} live(i)`
  - optional constraints: max activation memory left/right (fits SRAM/VRAM)
- Split directly from the GUI (export part1/part2 ONNX).
- Strict boundary option (rejects splits where part2 needs additional intermediate activations beyond the cut tensors; original graph inputs are allowed).
- Optional onnxruntime validation (`full(x) ~= part2(part1(x))`).
- Runner skeleton generator (`run_split_onnxruntime.py`):
  - generic ORT benchmark runner for: full / part1 / part2 / composed
  - supports input feeds via NPZ (`--inputs-npz`) and saving generated inputs (`--save-inputs-npz`)
  - supports dumping standalone split interfaces as NPZ (`--dump-interface {right,left,min,either}`) with metadata
  - supports CPU/CUDA/TensorRT (engine cache + fast-build preset) and writes `validation_report.json`
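The per-partition peak formulas above are cheap to compute: one forward pass gives the prefix maxima and one backward pass the suffix maxima. A minimal sketch (illustrative, not the tool's code):

```python
def partition_peaks(live_bytes):
    """Given the live-set size live(i) at each boundary i, compute
    peak_left[b] = max_{i<=b} live(i) and peak_right[b] = max_{i>=b} live(i)
    in a single forward and a single backward pass (O(n))."""
    n = len(live_bytes)
    peak_left, peak_right = [0] * n, [0] * n
    running = 0
    for i in range(n):                 # prefix maxima -> left partition
        running = max(running, live_bytes[i])
        peak_left[i] = running
    running = 0
    for i in range(n - 1, -1, -1):     # suffix maxima -> right partition
        running = max(running, live_bytes[i])
        peak_right[i] = running
    return peak_left, peak_right

peak_left, peak_right = partition_peaks([4, 16, 8, 32, 2])
# peak_left  == [4, 16, 16, 32, 32]
# peak_right == [32, 32, 32, 32, 2]
```

A cut at boundary `b` then fits the targets iff `peak_left[b]` and `peak_right[b]` are below the respective SRAM/VRAM limits.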
Example (dump interface for Part 2 / “right” side):

```bash
./run_split_onnxruntime.sh --provider tensorrt --dump-interface right --dump-interface-out results/interface
# outputs: results/interface_right.npz (and metadata in __meta__)
```

Example (LLM-style shapes via overrides):

```bash
./run_split_onnxruntime.sh --provider cuda --shape-override "input_ids=1x128 attention_mask=1x128"
```

Example (reproducible inputs to NPZ):

```bash
./run_split_onnxruntime.sh --provider cpu --seed 0 --save-inputs-npz results/inputs_full.npz
./run_split_onnxruntime.sh --provider cpu --inputs-npz results/inputs_full.npz
```

- Benchmark set generator (GUI button: “Benchmark set…”)
  - exports one subfolder per selected split (models + runner)
  - writes `benchmark_suite.py` to run all cases and aggregate results/plots
  - also exports paper assets into the benchmark folder root:
    - `split_candidates.tex`
    - plots (`analysis_*.pdf` / `analysis_*.svg`)
    - `system_config.json`, `workload_profile.json`
    - `pareto_export.csv`, `candidate_pruning.json`
- If Graphviz `dot` is installed, `.dot` files are rendered to SVG/PDF automatically; otherwise a matplotlib fallback diagram is created.
- Windows: do not run `.bat` files with Python. Double-click them, or run them from PowerShell directly.
Large models exported with ONNX external data are supported.
When exporting splits/benchmark sets, the tool tries to make the output folder usable by:
- creating a hardlink to the referenced `*.data` file (fast, no extra disk use; requires the same filesystem),
- or falling back to a symlink/copy,
- and if neither is possible, rewriting the ONNX external-data `location` to an absolute path (works locally, but is not portable).
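The first two fallback steps can be sketched as follows (a simplified stand-in for the tool's export logic; the final `location`-rewrite fallback is omitted):

```python
import os
import shutil
from pathlib import Path

def link_external_data(src: Path, dst: Path) -> str:
    """Make the referenced *.data file available next to the exported model,
    preferring cheap links over copies. Returns the strategy that succeeded."""
    try:
        os.link(src, dst)        # hardlink: no extra disk use, same filesystem only
        return "hardlink"
    except OSError:
        pass
    try:
        os.symlink(src, dst)     # symlink: may require privileges on Windows
        return "symlink"
    except OSError:
        pass
    shutil.copy2(src, dst)       # last resort: full copy
    return "copy"
```

`os.link` raises `OSError` when source and destination live on different filesystems, which is exactly the case where the fallback chain continues.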
The GUI writes logs to both:
- `~/.onnx_splitpoint_tool/gui.log`
- `./gui.log` (current working directory)