Amazon has slimmed down the Alexa client into something that will fit onto an ARM Cortex-M processor with only 1MB of RAM. That’s good for talking white goods, but it also means that most of Alexa will continue to reside in the Amazon cloud while Google’s Assistant heads out from the parental home.
From the moment the trigger word is spoken, a modern VPA (Virtual Personal Assistant) is streaming audio into the cloud so that the speech can be recognised and a response formulated. That processing costs money, in semiconductors and the electricity to run them, and that cost must (somehow) be recovered. When Alexa was launched Amazon thought users would shop by voice, but they don’t, and if most Alexa users were subscribing to Prime Music then Amazon wouldn’t have launched a free (advertising-supported) music service. So one has to ask how Amazon is planning to cover the cost of all that additional voice recognition.
Google faces the same problem, but has mitigated it by shifting the burden of voice recognition onto the end customer. When activated on an Android smartphone, the Google Assistant processes the speech on the user’s hardware, using the user’s electricity. The Google Cloud just gets the text, which it processes and stores for customer segmentation. By doing the recognition at the edge Google has shifted the cost of that processing. The company might claim that it’s doing so to reduce latency, but cutting costs seems like a very happy coincidence.
It seemed logical that Google would add a similar level of processing to its smart speaker products, and that Amazon would follow suit. The capital cost to the consumer would rise slightly, but that seems like a fair contribution given the utility of recognising voices. However, Amazon has moved in entirely the opposite direction, reducing the processing power at the edge and saying very firmly that Alexa’s voice recognition will remain in the cloud for the foreseeable future.
Given the price sensitivity in the voice market, Google will be obliged to follow suit. This will discourage more intelligence at the edge, for consumer applications at least, putting more emphasis (and spending) on cloud-based processing and network systems. That’s a long-term play for control of the ecosystem, with the expectation that revenue can be made down the line. However, it also reduces the cost of incorporating voice controls into products of all kinds, something we’d certainly recommend (see “Market Trends: Voice-Based Interfacing Will Be Essential for Consumer Product Success” for details).