The major cloud providers are launching fabless semiconductor business units to drive AI ecosystems, from the cloud to the edge and back. Less than half a year ago, I expressed my expectation that Google could take over global AI device endpoints with its Edge TPU (Tensor Processing Unit) chips, forming a logical end-to-end semiconductor architecture for ML training and inference (1). Now AWS (through its Annapurna Labs acquisition) has announced its own custom ML inferencing chip, AWS Inferentia (2). Coming after the Graviton chip for Arm workloads, this second device reinforces AWS' evolution into a full-fledged fabless semiconductor company.
The AWS chip strategy differs from Google’s end-to-end chipset strategy. Inferentia chips will run in cloud datacenters, where AWS claims customers can achieve cost efficiencies around inferencing (which, according to AWS, eats up around 90% of the compute cost of ML, with only 10% consumed by training). Not all ML inferencing workloads are alike, and your cost-reduction mileage may vary, depending on your tolerance for probabilities, the depth of your convolutions, etc. The strategy with Inferentia is to draw ML designers and operators into using the full suite of AWS enablers, especially SageMaker Neo, which optimizes the trained model regardless of the framework used. For edge devices (where Inferentia will not run), legacy Arm- or x86-based processing platforms will still have to run local inferencing.
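To make the Neo workflow concrete, here is a minimal sketch of what targeting Inferentia through SageMaker Neo's compilation API could look like. This is an illustrative assumption, not AWS reference code: the bucket paths, role ARN, job name, and input shape are all placeholders, and the `ml_inf1` target assumes an Inferentia-backed instance family.

```python
def build_neo_compilation_request(job_name, role_arn, model_s3_uri,
                                  output_s3_uri, framework="TENSORFLOW"):
    """Assemble a request payload for a SageMaker Neo compilation job.

    Neo takes a trained model artifact (from any supported framework),
    compiles it for a target device, and writes the result back to S3.
    """
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Name and shape of the input tensor the trained model expects
            # (assumed values for illustration).
            "DataInputConfig": '{"input": [1, 224, 224, 3]}',
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            # Hypothetical target: an Inferentia-backed instance family.
            "TargetDevice": "ml_inf1",
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


request = build_neo_compilation_request(
    "demo-neo-job",
    "arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    "s3://my-bucket/model.tar.gz",                    # placeholder artifact
    "s3://my-bucket/compiled/",
)
# In a real environment, the job would be submitted with the boto3 SDK
# (requires AWS credentials):
# boto3.client("sagemaker").create_compilation_job(**request)
```

The point of the design is framework neutrality: the same compilation request shape works whether the artifact came from TensorFlow, MXNet, or PyTorch, which is exactly the pull into the AWS toolchain described above.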
What this means for tech providers:
• Inferentia will not directly impact edge device inferencing cost – only indirectly, through use of AWS’ toolset (SageMaker Neo, Elastic Inference) and through calls to EC2 instances.
• ML practitioners will not be able to buy parts directly; instead, they will leverage Inferentia-based systems, deployed either in the cloud or on-premises.
• It will cost Amazon less to provide me the Alexa experiences that manage my smart home thermostat, feed my dogs, and sprinkle my lawn. It could cost providers who leverage the AWS ecosystem and tools less to provision ML-enabled services, and could incentivize further democratization of AI.
Whether or not there will be an “Inferentia Lite” or “Inferentia Edge” that will power my next gen Echo device (reducing latency and staying operational when the network is down) is a roadmap question. Just thinking and extrapolating out loud…