Amazon has slimmed down the Alexa client into something that will fit onto an ARM Cortex-M processor with only 1MB of RAM. That’s good for talking white goods, but it also means that most of Alexa will continue to reside in the Amazon cloud while Google’s Assistant heads out from the parental home.
From the moment the trigger word is spoken, a modern VPA (Virtual Personal Assistant) streams audio into the cloud so the speech can be recognised and a response formulated. That processing costs money, in semiconductors and in the electricity to run them, which must (somehow) be recovered. When Alexa was launched Amazon thought users would shop by voice, but they don’t, and if most Alexa users were subscribing to Prime Music then Amazon wouldn’t have launched a free (advertising-supported) music service. So one has to ask how Amazon plans to cover the cost of all that additional voice recognition.
Google faces the same problem, but has mitigated it by shifting the burden of voice recognition onto the end customer. When activated on an Android smartphone, the Google Assistant processes the speech on the user’s hardware, using the user’s electricity. The Google cloud receives only the text, which it processes and stores for customer segmentation. By doing the recognition at the edge, Google has shifted the cost of that processing; the company might claim that it’s doing so to reduce latency, but cutting costs seems like a very happy coincidence.
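The difference between the two architectures can be caricatured in a few lines of Python. This is a toy sketch, not any real Alexa or Assistant API: the function names (`edge_recognise`, `cloud_upload` and the two `*_asr_path` helpers) are all illustrative stand-ins, and the point is simply how much data leaves the device in each case.

```python
def edge_recognise(audio_bytes: bytes) -> str:
    """Stand-in for an on-device speech recogniser (hypothetical)."""
    return "play some music"  # pretend the local model decoded this utterance


def cloud_upload(payload: bytes) -> int:
    """Stand-in for a network send; returns the bytes that left the device."""
    return len(payload)


def cloud_asr_path(audio_bytes: bytes) -> int:
    # Alexa-style: stream the whole utterance to the cloud for recognition.
    return cloud_upload(audio_bytes)


def edge_asr_path(audio_bytes: bytes) -> int:
    # Assistant-on-Android style: recognise locally, upload only the text.
    text = edge_recognise(audio_bytes)
    return cloud_upload(text.encode("utf-8"))


# Two seconds of 16 kHz, 16-bit mono audio is 64,000 bytes; the decoded
# text of the same utterance is 15 bytes.
utterance = bytes(2 * 16000 * 2)
print(cloud_asr_path(utterance))  # 64000
print(edge_asr_path(utterance))   # 15
```

In the cloud path the provider must run the recogniser (and pay for the silicon and electricity) for every utterance; in the edge path it receives a short string and only has to parse intent, which is the cost shift described above.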
It seemed logical that Google would add a similar level of processing to its smart speaker products, and that Amazon would follow suit. The capital cost to the consumer would rise slightly, but that seems like a fair contribution given the utility of voice recognition. However, Amazon has moved in entirely the opposite direction, reducing the processing power at the edge and saying very firmly that Alexa’s voice recognition will remain in the cloud for the foreseeable future.
Given the price sensitivity of the voice market, Google will be obliged to follow suit. This will discourage moving more intelligence to the edge, for consumer applications at least, placing more emphasis (and spending) on cloud-based processing and network systems. That’s a long-term play for control of the ecosystem, with the expectation that revenue can be made down the line. However, it also reduces the cost of incorporating voice controls into products of all kinds, something we’d certainly recommend (see Market Trends: Voice-Based Interfacing Will Be Essential for Consumer Product Success for details).