AWS Graviton Weekly # 42


Issue # 42: June 16th, 2023 to June 23rd, 2023

AWS Silicon Innovation Day 2023 Edition

[Read the browser version right here]

Hey Reader.

Welcome to Issue # 42 of AWS Graviton Weekly, which will be focused on sharing everything that happened in the past week related to AWS Silicon: from June 16th, 2023 to June 23rd, 2023.

First, a quick apology: I'm sending this email almost 8 hours after the regular schedule because I wanted to share more details about everything the team from AWS shared.

It wasn't easy, because there were more than 6 hours of video I needed to analyze and summarize, but it was worth it.

And a quick note outside of the event: Arm just launched its learning platform, with very good content on it. Make sure to check out its Learning Paths.

Enjoy.


AWS Silicon Innovation Day took place on June 21st, 2023.

So, this issue will be a little different: I will concentrate on sharing everything I learned from the event yesterday, and in the final part of the email you will find some news and resources related to AWS Silicon that were shared outside of the event.

First, I want to highlight the key announcements from the live event on YouTube.

That way, you will find the exact moments in a single place:

Introduction

  • [5:55] The event starts with Art Baudo and Martin Yip as co-hosts
  • [7:19] Sean and Christopher, founders of Dough Joy Donuts
  • [8:52] The conversation between David Brown (Vice President, Amazon EC2) and Ruba Borno (VP, Worldwide Channels and Alliances, AWS) about how customers and partners can take advantage of the innovation behind AWS Silicon. One of the coolest things shared by Dave here?
It's hard to believe that in just 10 years, we have gone from a mere concept to already having deployed over 20 million chips across AWS

And one of my favorite things shared by Ruba here? The power of Nitro Enclaves [16:12]

These capabilities begin at the foundation level with the AWS Nitro System and extend further. Partners can also choose to use AWS Nitro Enclaves, which uses the same Nitro hypervisor technology to reduce the attack surface area for highly sensitive information. Nitro Enclaves uses cryptographic attestation, so you can be sure that only authorized code is running, all at no extra charge beyond the EC2 instances.
Partners like Anjuna, working to create a high-trust environment where data is always encrypted, have built solutions to help our customers further protect and securely process sensitive data. With the support of Anjuna's Confidential Computing platform, customers can easily embrace AWS Nitro Enclaves and create confidential clouds in a matter of minutes by lifting and shifting their existing applications without any need for code changes.
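If you want to experiment with Nitro Enclaves yourself, the only launch-time requirement is an enclave-enabled parent instance. Here is a minimal sketch using boto3 (my own illustration, not from the event; the AMI ID and key pair name are placeholders, and building and running the actual enclave image on the instance is done afterwards with the nitro-cli tooling):

```python
# A minimal sketch (not from the event): launching an enclave-enabled EC2
# parent instance with boto3. The AMI ID and key pair name are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.xlarge",          # an enclave-capable Nitro instance type
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # placeholder key pair
    EnclaveOptions={"Enabled": True},  # the flag that enables Nitro Enclaves
)

print(response["Instances"][0]["InstanceId"])
```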

Another set of interesting stats shared by Dave [18:38]:

Furthermore, we work closely with the ISV community to ensure that our customer workloads will be supported on Graviton.
Today we are in our third generation of Graviton processors and customers have been loving them. With Graviton-based Amazon EC2 instances, over 40,000 customers see great benefits like up to 40 percent better price performance, up to 20 percent lower costs, and up to 60 percent less energy used for the same workload.

How AWS Silicon is built

  • [34:26] The fascinating conversation between Gary Szilagyi (VP at AWS, Annapurna Labs) and Nafea Bshara (VP, Distinguished Engineer, Annapurna Labs)
  • [39:26] How AWS Silicon is actually built. Nafea shared a very interesting metaphor comparing a chip to a house:
If you are gonna build a chip, it's like building a house: there are 50 billion bricks to put together, and you have to follow the architecture blueprint for the house while still running the structural beams, the plumbing, the wires, the ventilation ducts, and putting in the windows and the doors to communicate with the external world. Except this time each brick is a 5-nanometer brick, a transistor not wider than 5 nanometers; the house itself is 80 levels high; and the wires, all the ducting and electrical wire you have to run inside the house, add up to 35 km or 25 miles, while the footprint of the house is not bigger than an inch by an inch.
Once you do that, you still need to meet the volume demand: because of the efficiency we need, we have to build these chips in a few months, we need to make millions of them per year, and each one should not cost more than tens of dollars or a few hundred dollars. And Gary, one more thing about these chips: after you build them and deploy them, you need to deliver electricity to power them.
Think about this: a typical house in the US, in the best case, gets 100 Amperes of electrical current from the utility company. Well, the chips that we build today, the size of a quarter, actually need 500 Amperes. So imagine: they need 5x what the house needs, and not only do they need 5x, they need those 500 Amps in a few nanoseconds.
So you really need to build not just the chip; while building the chip into the system, you need a pretty wide access road for all this electrical current, and inside these chips you need many, many electrical wires, the 35 km of wire equivalent, and all these wires need to be low resistance so they don't heat up and don't have any open circuits.
So, designing and testing this level of complexity is what we focus on.

Simply fascinating.

How Annapurna Labs develops the AI chips

  • [45:39] Gary explains the process of building the AI chips
  • [47:33] Gary explains the challenge of keeping up with the fast pace of innovation in Machine Learning
  • [49:08] Back to developing high-performance CPUs like Graviton and the hard challenges behind it
  • [50:34] The advantages of developing silicon in-house for AWS
  • [52:35] The importance of sustainability in the cloud

Insights from Analysts

The AWS Nitro Card Discussion

  • [1:01:21] Anthony Liguori (VP/Distinguished Engineer at AWS) and Ali Saidi (Sr Principal Engineer at Annapurna Labs) discussed the foundations of AWS Nitro a decade ago and how the seamless integration between hardware and software plays a key part here
  • [1:04:29] How Annapurna Labs actually builds and designs chips end-to-end
  • [1:06:09] What it looks like to develop hardware at Annapurna Labs
  • [1:08:35] Ali explains how Annapurna Labs actually uses the current generation of Graviton and Nitro chips on EC2 to develop the next generation
  • [1:10:00] Changes in the Nitro card across generations
  • [1:12:47] How to run simulations for certain high-performance numbers in the Nitro card
  • [1:17:41] Anthony highlights features he loves, like the Scalable Reliable Datagram (SRD) protocol, Elastic Fabric Adapter (EFA), encryption, and more
  • [1:20:00] The importance of efficiency and sustainability

The Voice of the Customer

Insights from Analysts

  • [2:05:17] Another analyst perspective. This time from Elias Khnaser (Chief of Research at EK Media Group) and Raj Pai (VP of EC2 Product Management at AWS) talking about Silicon Innovation
  • [2:07:08] Why AWS is building its own Arm-based processors
  • [2:11:30] The Graviton "perks" for AWS customers
  • [2:14:01] Is it difficult to migrate your workloads to Graviton?
  • [2:16:51] What's the average time to actually migrate to Graviton?

It's time for Generative AI chips stuff

  • [2:32:46] Chetan Kapoor (Director of Product Management, EC2 Core at AWS) and Gadi Hutt (Senior Director of Business Development at Annapurna Labs) discussed the innovation behind AWS Inferentia and AWS Trainium chips
  • [2:34:36] Building an FPGA Service as a Business from Scratch
  • [2:36:18] Why it was important to start building silicon for accelerating Machine Learning
  • [2:38:31] How Inf1 is being used by external customers
  • [2:39:56] How Alexa and Alexa Voice are powered by AWS Inferentia
  • [2:41:12] Why AWS Trainium was the next challenge for the team
  • [2:42:16] The main differences between AWS Trainium and AWS Inferentia chips
  • [2:44:20] Let's take a look at the AWS Trainium chip
  • [2:45:07] Inferentia and Trainium chips compared side by side
  • [2:46:30] What it means to deploy a supercomputer at scale in AWS data centers
  • [2:48:59] AWS is building a new cluster called "Trainium One" with more than 30,000 chips on it
  • [2:49:32] The next big challenge? Generative AI inference: Inf2
  • [2:50:42] The EC2 Inf2 instance server itself: you can run a model with 175 billion parameters on it, thanks to nearly 10 TB/s of memory bandwidth (WTF???)
  • [2:52:53] Let's talk about the software side, where open-source LLMs play a key role: the AWS Neuron SDK (see the sketch after this list)
  • [2:57:07] Sairam Menon (Software Engineering Manager, AI Product Line Owner at Johnson & Johnson Technology) explains how the company is using AI today
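Here is the sketch mentioned above: a minimal illustration of my own (not from the event) of what the Neuron SDK workflow looks like with torch-neuronx, compiling a toy PyTorch model for Inferentia2 or Trainium. The model and input shapes are made up purely for the example.

```python
# A minimal sketch of the AWS Neuron SDK workflow with torch-neuronx
# on an Inf2 or Trn1 instance; the toy model and shapes are illustrative only.
import torch
import torch_neuronx

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# Ahead-of-time compile (trace) the model for the NeuronCores.
neuron_model = torch_neuronx.trace(model, example_input)

# Run inference on the Neuron device and save the compiled artifact.
print(neuron_model(example_input).shape)
torch.jit.save(neuron_model, "model_neuron.pt")
```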

Confidential Computing on AWS

  • [3:07:03] Art Baudo, William Yap (Principal Product Manager, AWS), and Arvind Raghu (WW Business Development & GTM Strategy Lead, EC2 Core, Confidential Computing) discussed the state of Confidential Computing at AWS
  • [3:10:52] What makes Confidential Computing unique from the AWS perspective?
  • [3:12:54] What is the AWS Nitro system exactly?
  • [3:30:49] How Skyflow is using Confidential Computing on AWS with Amruta Moktali (Chief Product Officer at Skyflow)

The second part of the event is currently only available on Twitch, so let's continue with it.

Deep Dive on AWS Graviton

  • [00:05:23] Deep Dive on Graviton with Stephanie Shyu (Global Head of Graviton GTM Strategy and Business Development at Amazon Web Services), Martin Yip, and Ali Saidi
  • [00:06:23] Why AWS decided to build its custom chips based on Arm
  • [00:08:09] How Epic Games and Snap are taking advantage of Graviton processors
  • [00:08:58] How AWS is helping customers save up to 40% in cloud costs thanks to Graviton
  • [00:10:56] How Graviton is helping customers with their Sustainability goals
  • [00:12:43] How Snowflake and Pinterest are taking advantage of it
  • [00:14:05] More than 40,000 AWS customers are using Graviton today

A very interesting slide was shared here

Deep Dive on AWS Trainium and AWS Inferentia

  • [00:34:57] How to start working with AWS ML Silicon with Shruti Koparkar (Product Marketing and GTM leader on EC2 for ML accelerators at AWS) and Matthew McClean (Senior Manager, Solution Architect at Annapurna Labs)
  • [00:36:24] What is Generative AI?
  • [00:37:56] Foundational Models of Generative AI
  • [00:39:24] Fine-tuning vs Pre-Training a model
  • [00:39:54] Compute-Intensive Pre-Training of a Model
  • [00:41:08] Other capabilities in terms of scale and price-performance of AWS Trainium and Inferentia: Trn1 instances
  • [00:42:52] Support for different data types and why it matters
  • [00:44:39] Model Deployment
  • [00:45:44] Stable Diffusion 1.5 Demo on AWS Inferentia2
  • [00:50:22] Similarities between Inference and Training
  • [00:55:02] UX matters for ML development at AWS
  • [00:56:40] Hugging Face and AWS Collaboration, with Jeff Boudier from the Product team at Hugging Face

Deep Dive on High-Performance Computing

High-Performance Networking

Silicon Innovation and the Modern Network

AWS Nitro SSDs and Amazon RDS

Using AWS Silicon for Cost Optimization


NEWS


ARTICLES AND TUTORIALS

Optimized PyTorch 2.0 Inference with AWS Graviton processors, by Sunita Nadampalli (Software Development Manager at Amazon) and Ankith Gunapal (AI Partner Engineer at Meta (PyTorch))

For PyTorch 2.0, the Graviton3-based C7g instance is the most cost-effective compute-optimized Amazon EC2 instance for inference. These instances are available on SageMaker and Amazon EC2.
The AWS Graviton Technical Guide provides a list of optimized libraries and best practices that will help you achieve cost benefits with Graviton instances across different workloads.
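The blog post has the full details, but the basic flow on a Graviton instance is very simple. Here is a minimal sketch of my own (a toy model, not the benchmark from the article), assuming a recent PyTorch 2.x build installed on a c7g instance:

```python
# A minimal sketch of PyTorch 2.0 inference on a Graviton-based instance
# (for example c7g); the toy model is just a stand-in for a real workload.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

# torch.compile is the PyTorch 2.0 entry point; on aarch64 it can take
# advantage of the oneDNN + Arm Compute Library optimizations the article describes.
compiled_model = torch.compile(model)

with torch.no_grad():
    x = torch.rand(32, 512)
    print(compiled_model(x).shape)  # torch.Size([32, 10])
```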

BTW: it's worth mentioning that this was a collaboration between AWS, Meta, Arm, and Intel. Here are some of the people who participated in this joint work:



EVENTS


Graviton Essentials - Virtual Developer Day (Wednesday, July 12, 2023 | 9:00 AM - 5:00 PM PDT) Live Virtual & Interactive


From the Arm Ecosystem


Marcos Ortiz

I'm a Data Engineer by day at Riot Games (via X-Team) and by night, I curate the latest news, product announcements, and resources about AWS Silicon (Graviton, AWS Nitro, Inferentia, and Trainium).
