Hands-On LLM Serving and Optimization
Product details
- ISBN 9798341621497
- Dimensions: 178 x 232 mm
- Publication Date: 26 May 2026
- Publisher: O'Reilly Media
- Publication City/Country: US
- Product Form: Paperback
Large language models (LLMs) are rapidly becoming the backbone of AI-driven applications. Without proper optimization, however, LLMs can be expensive to run, slow to serve, and prone to performance bottlenecks. As demand for real-time AI applications grows, Hands-On LLM Serving and Optimization offers a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.
In this hands-on book, authors Chi Wang and Peiheng Hu take a real-world approach backed by practical examples and code, and assemble essential strategies for designing robust infrastructures that are equal to the demands of modern AI applications. Whether you're building high-performance AI systems or looking to enhance your knowledge of LLM optimization, this indispensable book will serve as a pillar of your success.
- Learn the key principles for designing a model-serving system tailored to popular business scenarios
- Understand the common challenges of hosting LLMs at scale while minimizing costs
- Pick up practical techniques for optimizing LLM serving performance
- Build a model-serving system that meets specific business requirements
- Improve LLM serving throughput and reduce latency
- Host LLMs in a cost-effective manner, balancing performance and resource efficiency
