Blogs
| Date | Post | Summary |
|---|---|---|
| 2026-06-20 | Co-locating Prefill and Decode on One GPU: Green Contexts for Higher Throughput | CUDA Green Contexts, SM partitioning, prefill/decode overlap, and OpenInfer benchmark results. |
| 2026-06-13 | OpenInfer 0.1.0: Writing a Production-Grade Inference Engine in Rust | Rust runtime story, RTX 5090 serving benchmarks, prefix-cache TTFT, and pegaflow KV offload. |