SEATTLE: AWS has launched EC2 Capacity Blocks for ML, a new way for customers to access GPU compute capacity for generative AI workloads. EC2 Capacity Blocks allow customers to reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters for one to 14 days, at a future start date up to eight weeks in advance. This gives customers the flexibility to run a broad range of ML workloads and pay only for the GPU time they need. EC2 Capacity Blocks are ideal for training and fine-tuning ML models, short experimentation runs, and handling temporary surges in inference demand.
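In practice, a reservation is made through the EC2 API in two steps: search for an available offering that matches your instance type, count, duration, and date window, then purchase it. The sketch below is a minimal, hypothetical boto3 example, assuming the `describe_capacity_block_offerings` and `purchase_capacity_block` EC2 calls; the instance type, count, and dates are illustrative values, not part of the announcement.

```python
# Hypothetical sketch of reserving an EC2 Capacity Block with boto3.
# capacity_block_hours() encodes the 1-14 day window from the announcement;
# reserve_capacity_block() shows the two API calls but requires AWS
# credentials, so it is illustrative only.
from datetime import datetime, timedelta, timezone


def capacity_block_hours(days: int) -> int:
    """Convert a reservation length in days (1-14) to hours."""
    if not 1 <= days <= 14:
        raise ValueError("Capacity Blocks run for 1 to 14 days")
    return days * 24


def reserve_capacity_block(days: int, instance_count: int = 64) -> None:
    """Find the cheapest matching offering and purchase it (needs credentials)."""
    import boto3  # not stdlib: pip install boto3

    ec2 = boto3.client("ec2")
    start = datetime.now(timezone.utc) + timedelta(weeks=2)  # illustrative
    offerings = ec2.describe_capacity_block_offerings(
        InstanceType="p5.48xlarge",
        InstanceCount=instance_count,
        StartDateRange=start,
        EndDateRange=start + timedelta(weeks=6),  # within the 8-week horizon
        CapacityDurationHours=capacity_block_hours(days),
    )["CapacityBlockOfferings"]
    cheapest = min(offerings, key=lambda o: float(o["UpfrontFee"]))
    ec2.purchase_capacity_block(
        CapacityBlockOfferingId=cheapest["CapacityBlockOfferingId"],
        InstancePlatform="Linux/UNIX",
    )
```

Because Capacity Blocks are priced and sold as whole offerings, the purchase step takes only the offering ID returned by the search, not the individual parameters again.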
Benefits of EC2 Capacity Blocks for ML
- Predictable access to GPU compute capacity: EC2 Capacity Blocks allow customers to reserve GPU compute capacity in advance, so they can plan for their ML workload deployments with certainty.
- Flexible deployment options: EC2 Capacity Blocks can be deployed in EC2 UltraClusters, interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking that delivers low-latency, high-throughput connectivity, enabling customers to scale up to hundreds of GPUs.
- Cost-effective: Customers only pay for the amount of GPU time they reserve.
Use Cases for EC2 Capacity Blocks for ML
- Training and fine-tuning ML models: EC2 Capacity Blocks are ideal for training and fine-tuning ML models, which can require significant GPU compute resources.
- Short experimentation runs: EC2 Capacity Blocks can be used for short experimentation runs, such as trying out new ideas or testing different model parameters.
- Handling temporary future surges in inference demand: EC2 Capacity Blocks can be used to handle temporary future surges in inference demand, such as during product launches.
Customer Quotes
- Amplify Partners: “We believe that predictable and timely access to GPU compute capacity is fundamental to enabling founders to not only quickly bring their ideas to life but also continue to iterate on their vision and deliver increasing value to their customers.”
- Canva: “We’re excited to see AWS launching EC2 Capacity Blocks with support for P5 instances. We can now get predictable access to up to 512 NVIDIA H100 GPUs in low-latency EC2 UltraClusters to train even larger models than before.”
- Leonardo.Ai: “We are delighted with the launch of EC2 Capacity Blocks. It enables us to elastically access GPU capacity for training and experimenting while preserving the option for us to switch to different EC2 instances that might better meet our compute requirements.”
- OctoML: “EC2 Capacity Blocks enables us to predictably spin up different sizes of GPU clusters that match our customers’ planned scale-ups, while offering potential cost savings as compared to long-term capacity commits or deploying on-prem.”
Conclusion
EC2 Capacity Blocks for ML provide a new way for customers to access GPU compute capacity for generative AI workloads. With EC2 Capacity Blocks, customers can reserve GPU compute capacity in advance, deploy it in flexible ways, and only pay for the amount of GPU time they need.