Hands-On LLM Serving and Optimization

Name: Hands-On LLM Serving and Optimization
Brand: O'Reilly Media
SKU: 9798341621497
Price: 76.99 EUR
Availability: InStock

Chi Wang | Peiheng Hu


In stock with our UK publisher. 14-28 days

Delivery/Collection within 10-20 working days

14-day return policy

LLM, LLM Serving, LLM Optimization, vLLM, Triton, Model Serving

Product details

ISBN 9798341621497
Dimensions: 178 x 232mm
Publication Date: 26 May 2026
Publisher: O'Reilly Media
Publication City/Country: US
Product Form: Paperback

Large language models (LLMs) are rapidly becoming the backbone of AI-driven applications. Without proper optimization, however, LLMs can be expensive to run, slow to serve, and prone to performance bottlenecks. As demand for real-time AI applications grows, Hands-On LLM Serving and Optimization offers a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.

In this hands-on book, authors Chi Wang and Peiheng Hu take a real-world approach backed by practical examples and code, and assemble essential strategies for designing robust infrastructures that are equal to the demands of modern AI applications. Whether you're building high-performance AI systems or looking to enhance your knowledge of LLM optimization, this indispensable book will serve as a pillar of your success.

Learn the key principles for designing a model-serving system tailored to popular business scenarios
Understand the common challenges of hosting LLMs at scale while minimizing costs
Pick up practical techniques for optimizing LLM serving performance
Build a model-serving system that meets specific business requirements
Improve LLM serving throughput and reduce latency
Host LLMs in a cost-effective manner, balancing performance and resource efficiency

Chi Wang is a director of engineering at Salesforce's Einstein AI group, with over 18 years of experience in artificial intelligence and distributed systems. He leads the development of large-scale AI platforms that enable model training, inference, and optimization for hundreds of internal teams and power AI capabilities used by millions of Salesforce customers. At Salesforce, Chi oversees multiple engineering teams focused on model inference and optimization, and data science platforms. His work spans building multi-tenant AI infrastructure, scaling distributed compute systems, and improving the performance and cost-efficiency of large language model workloads in production. Chi is the lead inventor on 12 patents across areas including model serving and optimization, data access control, and large-scale system design. He is also a passionate technical writer, focused on making complex AI systems practical and accessible for engineers.

Peiheng Hu is an accomplished machine learning engineer with over 10 years of industry experience and expertise in building large-scale AI systems. He currently works at NVIDIA, where he focuses on cutting-edge distributed LLM inference, pushing the boundaries of high-performance inference engines on the latest NVIDIA GPUs. He holds a master of science in computational science and engineering from Harvard University and a bachelor of science in industrial engineering operations research from Georgia Institute of Technology. Previously, Peiheng served as a principal member of technical staff at Salesforce, where he led the development of the company's only unified serving platform, handling thousands of per-tenant models and LLM optimizations for Agentforce that saved millions in AI infrastructure expenses. Prior to that, he was a senior ML engineer at Microsoft Azure, where he architected distributed ML processing solutions for cloud security detection and analytics, handling billions of transactions per hour.
