Alexa is always listening, but it does not record continuously. It sends nothing to cloud servers until it hears you say the wake word (Alexa, Echo, or Computer). But listening for wake words is harder than you might think.
Echo hardware is not all that smart. Without the Internet, any request or question you ask will fail, because your commands are sent to the cloud for interpretation and decisions. Amazon does not want to record every conversation you have in front of a smart speaker, only the commands you give to that speaker. For this reason, the company uses a wake word to get the speaker's attention. To do this, Amazon uses a combination of finely tuned microphones, a short memory buffer, and neural network training.
Tuned microphones locate your voice
Voice assistant speakers, such as the Echo and Echo Dot, typically have multiple built-in microphones. The Echo Dot, for example, has seven. This array gives the devices several capabilities, ranging from hearing commands from a distance to separating background noise from voices.
The latter is particularly useful for detecting wake words. With its multiple microphones, the Echo can work out where you are relative to where it sits and listen in that direction while ignoring the rest of the room.
You can see this in action every time you use the wake word. Stand beside an Echo or an Echo Dot and say the wake word. Notice that the ring lights up in dark blue, then a lighter blue segment circles around and "points" toward you. Now move several steps to the side and say the wake word again. Notice that the light blue segment follows you.
By knowing where you are, the device can focus on you better and tune out noise from elsewhere.
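The article does not say how the Echo works out direction. One common approach with microphone arrays is to compare when the same sound reaches different microphones. The sketch below assumes that technique and just two microphones; the function name `best_lag` and the sample data are hypothetical, and nothing here is confirmed to be Amazon's method.

```python
def best_lag(mic_a, mic_b, max_lag):
    """Return the shift (in samples) at which mic_b best lines up with mic_a.

    A positive result means the sound reached mic_a first.
    """
    def score(lag):
        # Cross-correlation at this lag: large when the two signals overlap.
        return sum(
            mic_a[i] * mic_b[i + lag]
            for i in range(len(mic_a))
            if 0 <= i + lag < len(mic_b)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


# Hypothetical data: the same short pulse, heard two samples later by mic B.
pulse = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

lag = best_lag(pulse, delayed, max_lag=4)
print(lag)  # 2: the speaker is closer to microphone A
```

With more microphones, the same pairwise comparison can narrow the direction further, which is consistent with the light ring "pointing" at the speaker.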
Short memory keeps the speaker from listening too much
Echo devices have plenty of storage space, but they do not use much of it. According to Rohit Prasad, Amazon vice president and head scientist for Alexa Artificial Intelligence, an Echo can physically store only a few seconds of audio.
By limiting this capacity, Amazon not only gives you more privacy (it is one less place where your voice is stored) but also prevents the Echo from listening to entire conversations, which restricts it to listening for the wake word.
Imagine that you have a three-second tape and a tape recorder. Suppose that when the tape reaches the end, it loops back to the beginning, again and again. If you started recording a conversation, anything you said four seconds ago would already be erased and recorded over. That is what an Amazon Echo does.
It records continuously but erases what it has recorded just as quickly. This short attention span means that all it can hear is the word "Alexa" and not much more. Three seconds, however, is enough for that word to be recorded, examined, and acted on properly.
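The looping tape described above is what programmers call a ring buffer. Here is a minimal sketch of the idea; the class name `RollingAudioBuffer` and the sample rates are assumptions made for illustration, not details of the Echo's firmware.

```python
from collections import deque

SAMPLE_RATE = 16_000   # samples per second (assumed value)
BUFFER_SECONDS = 3     # the "three-second tape" from the analogy


class RollingAudioBuffer:
    """Keeps only the most recent few seconds of audio samples."""

    def __init__(self, seconds=BUFFER_SECONDS, sample_rate=SAMPLE_RATE):
        # A deque with maxlen drops its oldest item whenever a new one
        # would overflow it, like a tape recording over itself.
        self._samples = deque(maxlen=seconds * sample_rate)

    def write(self, chunk):
        self._samples.extend(chunk)

    def snapshot(self):
        return list(self._samples)


# Tiny demo rate so the result is easy to read: 3 s x 4 samples/s = 12 slots.
buf = RollingAudioBuffer(seconds=3, sample_rate=4)
buf.write(range(16))     # four seconds of "audio": samples 0 through 15
print(buf.snapshot())    # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

The first second of samples (0 through 3) is gone, which matches the point that anything said four seconds ago has already been recorded over.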
Neural net training helps with pattern matching
Finally, Amazon relies on neural network training to learn how to match patterns. Much like other forms of machine learning, Amazon trains its algorithms by feeding them instance after instance of the word Alexa (or Computer, or Echo, depending on the wake word the company is training).
The idea is to cover every inflection and accent, but also the context. Amazon wants your Echo to recognize the difference between when you are speaking to it, when you are speaking about it, and even when you are talking to a person named Alexa. Directional microphones also help to achieve this goal.
With each word the Echo hears, the audio passes through layers of algorithms. Each layer is designed to eliminate false positives by looking for sound or context cues. If a layer's check passes, the word moves on to the next one. Finally, when the local device decides that it has heard the wake word, it starts recording and streaming the audio to Amazon's cloud servers. Amazon uses four algorithms: one for each wake word (Alexa, Computer, Echo) and one for Alexa Guard, which treats specific sounds, such as breaking glass, as a wake word.
But even after a match, Amazon continues to perform more complex checks. Have you noticed that when someone says the word Alexa in a TV show or advertisement, it usually draws no response from your Echo? This is because Amazon also runs a cloud check.
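The layered checking described above can be pictured as a cascade in which the first failed layer rejects the sound. The sketch below is only an illustration: the three layers, their feature names (`energy`, `acoustic_score`, `context_score`), and the thresholds are all hypothetical, since the article does not say what each real layer tests.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, float]
Stage = Callable[[Candidate], bool]


# Hypothetical layers, ordered from cheapest to most demanding.
def loud_enough(c: Candidate) -> bool:
    return c["energy"] >= 0.2


def sounds_like_wake_word(c: Candidate) -> bool:
    return c["acoustic_score"] >= 0.8


def context_fits(c: Candidate) -> bool:
    return c["context_score"] >= 0.5


STAGES: List[Stage] = [loud_enough, sounds_like_wake_word, context_fits]


def is_wake_word(candidate: Candidate, stages: List[Stage] = STAGES) -> bool:
    # all() stops at the first failing layer, so later (costlier) layers
    # never run on a sound that an earlier layer has already rejected.
    return all(stage(candidate) for stage in stages)


spoken_to_device = {"energy": 0.6, "acoustic_score": 0.93, "context_score": 0.7}
similar_word = {"energy": 0.6, "acoustic_score": 0.55, "context_score": 0.7}

print(is_wake_word(spoken_to_device))  # True
print(is_wake_word(similar_word))      # False
```

Ordering the layers so that inexpensive checks come first keeps the always-on part of the work small, which suits a device with limited local resources.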
Cloud checks exclude some false positives
When companies make commercials featuring Alexa, they can submit the audio to Amazon. The company runs that audio through pattern-matching algorithms similar to the ones used to identify the wake word. Once that exact instance is fully cataloged, it is added to a database.
As part of the cloud connection process, your Echo sends information about the wake word it heard, and Amazon checks it against that database. Whenever it finds a match, Amazon tells your Echo to ignore the wake word, shut off, and delete all the audio it recorded.
In addition, Amazon looks for instances of the wake word spoken simultaneously. Not every company submits audio to Amazon, so it has come up with a backup solution. After checking for a database match, the company compares the wake word's fingerprint against all other instances arriving at the same time. It is unlikely that two people saying Alexa at the same moment would sound exactly alike. Therefore, if there is a match, Amazon knows it is a commercial or TV show and ignores the request.
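The two cloud checks (a database of known commercials, then a comparison of requests arriving together) can be sketched as follows. Everything named here is an assumption for illustration: `fingerprint` uses an exact SHA-256 hash as a stand-in, whereas a real acoustic fingerprint would have to tolerate room noise, and the device names and audio bytes are invented.

```python
import hashlib
from collections import Counter
from typing import Dict, Set


def fingerprint(audio: bytes) -> str:
    # Stand-in only: an exact hash matches identical bytes and nothing else.
    return hashlib.sha256(audio).hexdigest()


# Check 1: audio that advertisers submitted ahead of time (hypothetical entry).
KNOWN_AD_FINGERPRINTS = {fingerprint(b"submitted-tv-ad-clip")}


def devices_to_ignore(requests: Dict[str, bytes]) -> Set[str]:
    """requests maps a device name to the wake word audio it just sent,
    for all devices reporting in the same short time window."""
    prints = {device: fingerprint(audio) for device, audio in requests.items()}
    counts = Counter(prints.values())
    return {
        device
        for device, fp in prints.items()
        # Check 1: known commercial. Check 2: identical audio from several
        # devices at once, which points to a broadcast rather than a person.
        if fp in KNOWN_AD_FINGERPRINTS or counts[fp] > 1
    }


window = {
    "kitchen": b"submitted-tv-ad-clip",
    "den": b"unsubmitted-broadcast",
    "attic": b"unsubmitted-broadcast",
    "office": b"a person actually asking",
}
print(sorted(devices_to_ignore(window)))  # ['attic', 'den', 'kitchen']
```

Only the "office" device, whose audio is unique and not in the database, would go on to have its request handled.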
Despite all the checks, false positives still occur. You can listen to what your Echo has recorded at Amazon's Privacy Hub, and you will probably find at least one false positive in the bunch. But the technology is constantly improving, and ultimately Amazon would like it to work without any wake word at all.