Amazon admits storing Alexa recordings until users take action
Amazon’s Alexa voice service is storing all user voice recordings and transcripts on its servers by default – and people who want to delete those recordings must do so manually.
The findings come after US Senator Chris Coons wrote to Amazon in May this year about the company’s data privacy and security practices.
The letter details how Amazon’s privacy protections may not be as strong as the company claims.
Coons wanted to investigate key privacy issues – are transcripts deleted when audio recordings are deleted? Or are transcripts stored indefinitely on Amazon’s servers without giving users the option to delete them?
“The inability to delete a transcript of an audio recording renders the option to delete the recording largely inconsequential and puts users’ privacy at risk,” Coons wrote.
In 2018, Amazon reassured Coons that it built user privacy by design into its Alexa product. However, the reality seems starkly different as it tries to balance privacy with its learning systems.
Coons requested information about the transcripts, including how long they are stored for, if there are any transcripts that can’t be deleted, how Amazon uses these transcripts, and if Amazon anonymises user identity and other information.
He also asked questions surrounding audio recognition and recording after voice commands. Those questions included: how long Alexa waits before it stops recording; if Amazon records audio without the ‘wake word’; and how Alexa stops recording after commands that don’t repeat the wake word.
Coons also asked further questions, which we have not included here for brevity.
On June 28, Amazon responded to Coons’ questions. The company explained that Alexa and Echo devices use ‘keyword spotting’ to detect when a user wants to interact with Alexa.
The company also admits that users must manually delete recordings (including transcripts) by going to amazon.com/alexaprivacy.
If users don’t delete their recordings, they are giving Amazon access to all interactions and customer data. That data is used to train Alexa.
“In addition to using the transcripts to improve Alexa and the customer experience, we use the transcripts to provide transparency to our customer about what Alexa thought it heard and what Alexa provided as a response. Our Alexa’s Voice History feature allows customers to play the actual audio that was streamed to the cloud, review the text transcript of what Alexa thought the customer said, and review Alexa’s response. This helps customers to understand how Alexa works,” Amazon’s letter says.
“Providing customers with the transcript also allows customers to understand and inspect exactly what Alexa is, and is not, recording.”
“We already delete those transcripts from all of Alexa’s primary storage systems, and we have an ongoing effort to ensure those transcripts do not remain in any of Alexa’s other storage systems.”
But take note: an ‘ongoing effort’ isn’t a guarantee that transcripts are fully deleted.
As for how Alexa records without the ‘wake word’, Amazon explains that this setting is called Follow-Up Mode.
“Alexa will end the stream immediately once our automatic speech recognition system determines the customer has stopped speaking to Alexa. A blue light illuminates on the Echo device to indicate when audio is being streamed to the cloud, and customers can also enable an audible tone that plays when their Echo device begins and ends streaming audio to the cloud.”
Coons appreciated Amazon’s response, but he still has privacy concerns.
“Amazon’s response leaves open the possibility that transcripts of user voice interactions with Alexa are not deleted from all of Amazon’s servers, even after a user has deleted a recording of his or her voice.”
“What’s more, the extent to which this data is shared with third parties, and how those third parties use and control that information, is still unclear.”
If one thing’s clear from the exchange, it’s that Amazon has much more to answer for when it comes to balancing user privacy with machine learning systems like Alexa.