Issue #43: June 23rd, 2023 to June 30th, 2023
[Read the browser version right here]
Welcome to Issue #43 of AWS Graviton Weekly, covering everything that happened in the past week related to AWS Silicon: from June 23rd, 2023 to June 30th, 2023.
In this issue, you will find:
- An application deep-dive into the AWS Graviton3E-based Amazon EC2 Hpc7g instance
- Amazon SageMaker Neo now supports the compilation of PyTorch and TensorFlow models for Inferentia 2 and Trainium 1 instances
- How Snowflake is achieving its Sustainability goal using Graviton
- How GoDaddy is using Graviton instances to build a highly scalable hosting platform
- A deep dive into PyTorch 2.0 on Graviton
- Oracle Announces Oracle Database for Arm Architectures in the Cloud & On-Premises
- A very interesting tutorial about how to optimize & deploy BERT on AWS Inferentia2
Before the regular share, I wanted to give a shoutout to my good friend Cristian Măgherușan-Stanciu (the creator of AutoSpotting and EBS Optimizer) who is sharing a lot of FinOps tips in his newsletter and on LinkedIn.
You can subscribe here.
The last one? $7k of yearly savings for 10-15 minutes of work
Back to business.
Starting today, you can choose Inferentia 2 and Trainium 1 as additional targets to compile your PyTorch and TensorFlow models for Amazon SageMaker Neo, a capability of Amazon SageMaker that enables customers to optimize machine learning (ML) models for inference on SageMaker to achieve faster inference without any loss in accuracy. Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances deliver high performance at the lowest cost for generative artificial intelligence (AI) models, including large language models (LLMs) and vision transformers. AWS Trainium is a machine learning (ML) accelerator that AWS purpose-built for deep learning training of 100B+ parameter models.
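As a rough sketch, the announcement above maps to a Neo `CreateCompilationJob` request along the following lines. The bucket paths, role ARN, job name, and input shape are placeholders, and the `ml_inf2` target identifier is my assumption about the API's naming; check the SageMaker API reference for the exact values in your region.

```python
# Sketch of a SageMaker Neo compilation job targeting Inferentia 2.
# All names, ARNs, and S3 paths below are hypothetical placeholders.
compilation_job = {
    "CompilationJobName": "bert-neo-inf2-demo",           # hypothetical job name
    "RoleArn": "arn:aws:iam::123456789012:role/NeoRole",  # placeholder role
    "InputConfig": {
        "S3Uri": "s3://my-bucket/model/model.tar.gz",     # placeholder model path
        "DataInputConfig": '{"input_ids": [1, 128]}',     # example input shape
        "Framework": "PYTORCH",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://my-bucket/compiled/",   # placeholder output path
        "TargetDevice": "ml_inf2",                        # assumed Inferentia 2 target
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},
}

# In a real account you would submit it with boto3, e.g.:
#   import boto3
#   boto3.client("sagemaker").create_compilation_job(**compilation_job)
print(compilation_job["OutputConfig"]["TargetDevice"])
```

Swapping `TargetDevice` is the only change needed to retarget the same model at Trainium instead.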
ARTICLES AND TUTORIALS
Application deep-dive into the AWS Graviton3E-based Amazon EC2 Hpc7g instance, by Neil Ashton (Principal Computational Engineering Specialist at Amazon Web Services (AWS)), Karthik Raman (Principal Application Engineer, HPC at AWS), Dnyanesh Digraskar (Senior Partner Solutions Architect for HPC at AWS), Heidi Poxon (Principal and lead for the Performance Engineering and Technical Strategy team in HPC Engineering at AWS), Jun Tang (Software Development Engineer at Annapurna Labs), Rama Malladi (Solution Architect, HPC and Research at AWS), and Stephen Sachs (Principal HPC Application Engineer at AWS)
In this post we introduced the new Hpc7g instance that joins the Amazon EC2 HPC instances family. We ran several popular HPC applications and showed that it offers up to 70% better performance and almost 3x better price-performance compared to previous generation Graviton instances.
Graviton3E instances also consume up to 60% less energy for the same work than comparable Amazon EC2 instances, which makes them a more sustainable choice for HPC.
In this post, we discuss how Snowflake reduced its carbon emissions footprint and improved performance efficiency by transitioning virtual warehouses to AWS Graviton-based instance types.
Since the 5G mobile core network serves mission-critical services such as voice calls and data streaming, you must ensure the disaster resiliency of the service and have the capability for prompt disaster recovery of the affected network components. More specifically, for better isolation from faults and disasters, it is reasonable to build a DR 5G network in the cloud rather than in legacy CSP data centers. In addition, if this DR 5G network is mainly used for a limited time period (during service recovery, or to absorb a spike in traffic), it fits well with the cloud's pay-as-you-go model. AWS can help CSP customers by providing not only an environment for building this DR virtual data center, but also various automation tools and scaling capabilities for the network, as demonstrated with the GitHub repo sample in this post. Using this fast scale-out capability along with the right type and size of instance, such as Graviton instances on AWS, maximizes the cost and energy savings of building a DR 5G network for CSP customers.
Historically, the art of creating and running complex game servers locked developers into a single CPU architecture, typically Intel/AMD. Our developers tell us it’s hard to introduce different CPU architectures once game servers are built for a given processor.
In this article, we’ll show you how to build an Unreal Engine game with full support for the AWS Graviton processor. Plus, we’ll show you how to meet your performance requirements at a 42% lower cost than comparable current generation x86-based instances. Let’s dive in.
In this end-to-end tutorial, you will learn how to optimize and deploy BERT on AWS Inferentia2. We will reduce latency to 4 ms for BERT-base with a sequence length of 128.
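Latency claims like the 4 ms figure above come from benchmarking the compiled model. A minimal sketch of such a harness, with a trivial `lambda` standing in for the real Neuron-compiled BERT forward pass (not reproduced here):

```python
import statistics
import time

def measure_latency_ms(fn, warmup=10, iters=100):
    """Return p50/p99 latency in milliseconds for a callable."""
    for _ in range(warmup):          # warm caches and trigger lazy initialization
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
    }

# Stand-in workload; the tutorial would call the compiled model here.
result = measure_latency_ms(lambda: sum(range(1000)))
print(result)
```

Reporting p99 rather than the mean matters for inference endpoints, since tail latency is what user-facing SLAs are written against.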
While GoDaddy Websites + Marketing was built on one of the fastest hosting platforms on the planet, we wanted to improve latency, availability, and reliability by leveraging AWS technologies. This article details how we rebuilt and rearchitected our hosting platform for Websites + Marketing from the ground up using AWS technologies.
In this post, we covered the top five recommendations on how to optimize cost when running ElastiCache for Redis workloads using native ElastiCache features. We talked about lowering ElastiCache costs by utilizing Graviton nodes and reserved instances, avoiding over-provisioning and scaling your clusters per business needs with auto-scaling, achieving up to 4.8 times the capacity at half the cost using data tiering nodes, and enhancing throughput and lowering resource utilization with I/O multiplexing by upgrading to ElastiCache for Redis 7. Contact your AWS account team to get assistance on how to take advantage of these cost-optimization options in your use cases.
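To make the data-tiering claim concrete, here is a back-of-the-envelope sketch using the multipliers quoted above (4.8x capacity, half the cost); the baseline capacity and cost figures are made up purely for illustration:

```python
# Illustrative comparison of in-memory vs. data-tiering ElastiCache nodes.
# Baseline numbers are hypothetical; only the 4.8x / 0.5x multipliers come
# from the article's claim.
baseline_capacity_gb = 100.0    # hypothetical in-memory cluster capacity
baseline_cost = 1.0             # normalized hourly cost

tiered_capacity_gb = baseline_capacity_gb * 4.8   # 4.8x the capacity
tiered_cost = baseline_cost * 0.5                 # at half the cost

baseline_cost_per_gb = baseline_cost / baseline_capacity_gb
tiered_cost_per_gb = tiered_cost / tiered_capacity_gb

# Cost per GB drops to roughly 10% of the in-memory baseline.
print(round(tiered_cost_per_gb / baseline_cost_per_gb, 3))
```

Whatever the absolute prices, the ratio is what matters: 0.5 / 4.8 puts the per-GB cost of tiered storage at about a tenth of keeping everything in memory.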
EPAM was tasked by Maestro Cloud Control with migrating its Maestro hybrid cloud management platform to AWS Graviton within a pre-existing enterprise infrastructure. The aim of the project was to reduce Maestro's ongoing R&D cost and improve its performance. This was achieved by improving processing performance by 10 percent while using fewer resources to reach that higher level of performance.
Overprovisioning is the top reason why teams see their cloud bills constantly growing. But choosing the best instances from the hundreds of options AWS offers is a tough call. Luckily, automation is here to help and slash your EKS costs in 15 minutes. Read this case study to learn more.
SLIDES, VIDEO, AND AUDIO
Since the release of PyTorch in 2017, hardware accelerators have gotten faster, and Arm-based server processors were introduced to the cloud. PyTorch 2.0 improvements take advantage of the growing landscape of new compute capabilities, reducing framework overhead and adding support for the Arm Compute Library. In this session, we will cover the cost reduction and performance gains of Graviton and how to get started with AWS EC2 Graviton and Amazon SageMaker.
From many major instance families in Amazon EC2 to managed services such as AWS Lambda, Amazon Aurora, and Amazon EKS, AWS Graviton-based architecture is being used by tens of thousands of customers to get significant price-performance benefits for a wide variety of workloads on AWS. AWS Graviton3 processors provide up to 25 percent better performance over AWS Graviton2 processors, which already provided significant price-performance benefits. This session dives deep into the AWS Graviton2, Graviton3 and Graviton3E processors including suitable workloads and considerations for adoption, and it features an AWS customer speaking about their processor adoption experience.
Running containerized apps on EKS? Is your app written in Python, Java, or Ruby? Come learn how to reduce your compute costs, increase application resiliency, and future-proof your architecture with Graviton and Intel processors. You'll learn how your app will benefit from Graviton, Intel, and AMD processors, and how to benchmark it. Moreover, we will demonstrate how to gradually deploy processor-agnostic applications. We will present a real-world, Python-based workload deployed in EKS, powered by Karpenter. We will load the system with 10,000 transactions per second, on hundreds of cores, and show the cost benefits of Graviton and CPU diversification.
From the ARM Ecosystem