Blogs

| Date | Post | Summary |
| --- | --- | --- |
| 2026-06-20 | Co-locating Prefill and Decode on One GPU: Green Contexts for Higher Throughput | CUDA Green Contexts, SM partitioning, prefill/decode overlap, and OpenInfer benchmark results. |
| 2026-06-13 | OpenInfer 0.1.0: Writing a Production-Grade Inference Engine in Rust | Rust runtime story, RTX 5090 serving benchmarks, prefix-cache TTFT, and pegaflow KV offload. |