Mixed-vendor GPU inference cluster manager with speculative decoding
python machine-learning deep-learning metal cuda p2p homelab rocm gpu-cluster llama-cpp gpu-inference local-llm llm-inference ollama speculative-decoding distributed-inference gpu-pooling self-hosted-ai openai-compatible mixed-gpu
Updated Apr 11, 2026 - Python