Simon Willison's Weblog: microphone-ads-conspiracy

Calm Down—Your Phone Isn’t Listening to Your Conversations. It’s Just Tracking Everything You Type, Every App You Use, Every Website You Visit, and Everywhere You Go in the Physical World

2025-04-26T18:22:51+00:00

Perfect headline on this piece by Jonathan Zeller for McSweeney’s.

Via limbero on Hacker News

Tags: microphone-ads-conspiracy

Another rant about companies not spying on you through your phone's microphone to serve you ads

2025-04-26T02:07:00+00:00

Last September I posted a series of long ranty comments on Lobste.rs about the latest instance of the immortal conspiracy theory (here it goes again) about apps spying on you through your microphone to serve you targeted ads.

On the basis that it's always a great idea to backfill content on your blog, I just extracted my best comments from that thread and turned them into this full post here, back-dated to September 2nd which is when I wrote the comments.

My rant was in response to the story In Leak, Facebook Partner Brags About Listening to Your Phone’s Microphone to Serve Ads for Stuff You Mention. Here's how it starts:

Which is more likely?

1. All of the conspiracy theories are real! The industry managed to keep the evidence from us for decades, but finally a marketing agency of a local newspaper chain has blown the lid off the whole thing, in a bunch of blog posts and PDFs and on a podcast.

2. Everyone believed that their phone was listening to them even when it wasn’t. The marketing agency of a local newspaper chain were the first group to be caught taking advantage of that widespread paranoia and using it to try to dupe people into spending money with them, despite the tech not actually working like that.

My money continues to be on number 2.

You can read the rest here. Or skip straight to why I think this matters so much:

Privacy is important. People who are sufficiently engaged need to be able to understand exactly what’s going on, so they can e.g. campaign for legislators to rein in the most egregious abuses.

I think it’s harmful letting people continue to believe things about privacy that are not true, when we should instead be helping them understand the things that are true.

Tags: blogging, privacy, microphone-ads-conspiracy

I still don't think companies serve you ads based on spying through your microphone

2025-01-02T23:43:31+00:00

One of my weirder hobbies is trying to convince people that the idea that companies are listening to you through your phone's microphone and serving you targeted ads is a conspiracy theory that isn't true. I wrote about this previously: Facebook don’t spy on you through your microphone.

(Convincing people of this is basically impossible. It doesn't matter how good your argument is: if someone has ever seen an ad that relates to a previous voice conversation, they are likely convinced and there's nothing you can do to talk them out of it. Gimlet Media did a great podcast episode about how impossible this is back in 2017.)

This is about to get even harder thanks to this proposed settlement: Siri “unintentionally” recorded private convos; Apple agrees to pay $95M (Ars Technica).

Apple are spending $95m (nine hours of profit), agreeing to settle while "denying wrongdoing".
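As a rough sanity check on that "nine hours" figure, here is a back-of-envelope sketch. The annual profit value is an assumption (approximately Apple's reported net income for fiscal 2024, about $93.7 billion) rather than a number taken from the settlement coverage, and the function and constant names are purely illustrative.

```python
# Back-of-envelope check: how many hours of profit is a $95M settlement?
# ASSUMPTION: annual net income of roughly $93.7 billion (approximate
# fiscal 2024 figure); swap in whichever number you trust.
ASSUMED_ANNUAL_PROFIT_USD = 93.7e9
SETTLEMENT_USD = 95e6
HOURS_PER_YEAR = 365 * 24


def hours_of_profit(settlement: float, annual_profit: float) -> float:
    """Return how many hours of average profit the settlement represents."""
    profit_per_hour = annual_profit / HOURS_PER_YEAR
    return settlement / profit_per_hour


if __name__ == "__main__":
    hours = hours_of_profit(SETTLEMENT_USD, ASSUMED_ANNUAL_PROFIT_USD)
    print(f"About {hours:.1f} hours of profit")
```

Under that assumed profit figure the result lands just under nine hours, which is consistent with the rounded number quoted above.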

What actually happened is it turns out Apple were capturing snippets of audio surrounding the "Hey Siri" wake word, sending those back to their servers and occasionally using them for QA, without informing users that they were doing this. This is bad.

The Reuters 2021 story Apple must face Siri voice assistant privacy lawsuit -U.S. judge reported that:

One Siri user said his private discussions with his doctor about a "brand name surgical treatment" caused him to receive targeted ads for that treatment, while two others said their discussions about Air Jordan sneakers, Pit Viper sunglasses and "Olive Garden" caused them to receive ads for those products.

The claim from that story was then repeated in the 2025 Reuters story about the settlement.

The Ars Technica story reframes that like this:

The only clue that users seemingly had of Siri's alleged spying was eerily accurate targeted ads that appeared after they had just been talking about specific items like Air Jordans or brands like Olive Garden, Reuters noted.

Crucially, this was never proven in court. And if Apple settle the case it never will be.

Let’s think this through. For the accusation to be true, Apple would need to be recording those wake word audio snippets and transmitting them back to their servers for additional processing (likely true). They would then need to be feeding those snippets, in almost real time, into a system which forwards them on to advertising partners, who in turn feed that information into targeting networks so that the next time you view an ad on your phone the information is available to help select the relevant ad.

That is so far-fetched. Why would Apple do that? It would undermine both their brand and reputation as a privacy-first company and the large amounts of product design and engineering work they’ve put into preventing apps from doing exactly this kind of thing, by enforcing permission-based capabilities and ensuring a “microphone active” icon is available at all times when an app is listening in.

I really don't think this is happening - in particular for Siri wake words!

I've argued these points before, but I'll do it again here for good measure.

You don't notice the hundreds of times a day you say something and don't see a relevant advert a short time later. You see thousands of ads a day. Can you remember what any of them are?
The tiny fraction of times where you see an ad that's relevant to something you've just said (hence breaking through your filter that prevents you from seeing most ads at all) stick in your head.
Human beings are pattern matching machines with a huge bias towards personal anecdotes. If we've seen direct evidence of something ourselves, good luck talking us out of it!

I think the truth of the matter here is much more pedestrian: the quality of ad targeting that's possible just through apps sharing data on your regular actions within those apps is shockingly high... combined with the fact that it turns out just knowing "male, 40s, NYC" is often more than enough - we're all pretty basic!

I fully expect that this Apple story will be used as "proof" by conspiracy theorists effectively forever.

Tags: apple, conspiracy, privacy, misinformation, microphone-ads-conspiracy, digital-literacy

In Leak, Facebook Partner Brags About Listening to Your Phone’s Microphone to Serve Ads for Stuff You Mention

2024-09-02T23:56:44+00:00

(I've repurposed some of my comments on Lobsters into this commentary on this article. See also I still don’t think companies serve you ads based on spying through your microphone.)

Which is more likely?

1. All of the conspiracy theories are real! The industry managed to keep the evidence from us for decades, but finally a marketing agency of a local newspaper chain has blown the lid off the whole thing, in a bunch of blog posts and PDFs and on a podcast.
2. Everyone believed that their phone was listening to them even when it wasn’t. The marketing agency of a local newspaper chain were the first group to be caught taking advantage of that widespread paranoia and using it to try to dupe people into spending money with them, despite the tech not actually working like that.

My money continues to be on number 2.

Here’s their pitch deck. My “this is a scam” sense is vibrating like crazy reading it: CMG Pitch Deck on Voice-Data Advertising 'Active Listening'.

It does not read to me like the deck of a company that has actually shipped their own app that tracks audio and uses it for even the most basic version of ad targeting.

They give the game away on the last two slides:

Prep work:

Create buyer personas by uploading past consumer data into the platform

Identify top performing keywords relative to your products and services by analyzing keyword data and past ad campaigns

Ensure tracking is set up via a tracking pixel placed on your site or landing page

Now that preparation is done:

Active listening begins in your target geo and buyer behavior is detected across 470+ data sources […]

Our technology analyzes over 1.9 trillion behaviors daily and collects opt-in customer behavior data from hundreds of popular websites that offer top display, video platforms, social applications, and mobile marketplaces that allow laser-focused media buying.

Sources include: Google, LinkedIn, Facebook, Amazon and many more

That’s not describing anything ground-breaking or different. That’s how every ad targeting platform works: you upload a bunch of “past consumer data”, identify top keywords and set up a tracking pixel.

I think active listening is the term that the team came up with for “something that sounds fancy but really just means the way ad targeting platforms work already”. Then they got over-excited about the new metaphor and added those first couple of slides that talk about “voice data”, without really understanding how the tech works or what kind of a shitstorm that could kick off when people who DID understand technology started paying attention to their marketing.

TechDirt's story Cox Media Group Brags It Spies On Users With Device Microphones To Sell Targeted Ads, But It’s Not Clear They Actually Can included a quote with a clarification from Cox Media Group:

CMG businesses do not listen to any conversations or have access to anything beyond a third-party aggregated, anonymized and fully encrypted data set that can be used for ad placement. We regret any confusion and we are committed to ensuring our marketing is clear and transparent.

Why I don't buy the argument that it's OK for people to believe this

I've seen variants of this argument before: phones do creepy things to target ads, but it’s not exactly “listen through your microphone” - but there’s no harm in people believing that if it helps them understand that there’s creepy stuff going on generally.

I don’t buy that. Privacy is important. People who are sufficiently engaged need to be able to understand exactly what’s going on, so they can e.g. campaign for legislators to rein in the most egregious abuses.

I think it’s harmful letting people continue to believe things about privacy that are not true, when we should instead be helping them understand the things that are true.

This discussion thread is full of technically minded, engaged people who still believe an inaccurate version of what their devices are doing. Those are the people that need to have an accurate understanding, because those are the people that can help explain it to others and can hopefully drive meaningful change.

This is such a damaging conspiracy theory.

It’s causing some people to stop trusting their most important piece of personal technology: their phone.
We risk people ignoring REAL threats because they’ve already decided to tolerate made-up ones.
If people believe this and see society doing nothing about it, that’s horrible. That leads to a cynical “nothing can be fixed, I guess we will just let bad people get away with it” attitude. People need to believe that humanity can prevent this kind of abuse from happening.

The fact that nobody has successfully produced an experiment showing that this is happening is one of the main reasons I don’t believe it to be happening.

It’s like James Randi’s One Million Dollar Paranormal Challenge - the very fact that nobody has been able to demonstrate it is enough for me not to believe in it.

Tags: conspiracy, facebook, privacy, microphone-ads-conspiracy

The AI trust crisis

2023-12-14T16:14:11+00:00

Dropbox added some new AI features. In the past couple of days these have attracted a firestorm of criticism. Benj Edwards rounds it up in Dropbox spooks users with new AI features that send data to OpenAI when used.

The key issue here is that people are worried that their private files on Dropbox are being passed to OpenAI to use as training data for their models - a claim that is strenuously denied by Dropbox.

As far as I can tell, Dropbox built some sensible features - summarize on demand, "chat with your data" via Retrieval Augmented Generation - and did a moderately OK job of communicating how they work... but when it comes to data privacy and AI, a "moderately OK job" is a failing grade. Especially if you hold as much of people's private data as Dropbox does!

Two details in particular seem really important. Dropbox have an AI principles document which includes this:

Customer trust and the privacy of their data are our foundation. We will not use customer data to train AI models without consent.

They also have a checkbox in their settings that looks like this:

Update: Some time between me publishing this article and four hours later, that link stopped working.

I took that screenshot on my own account. It's toggled "on" - but I never turned it on myself.

Does that mean I'm marked as "consenting" to having my data used to train AI models?

I don't think so: I think this is a combination of confusing wording and the eternal vagueness of what the term "consent" means in a world where everyone agrees to the terms and conditions of everything without reading them.

But a LOT of people have come to the conclusion that this means their private data - which they pay Dropbox to protect - is now being funneled into the OpenAI training abyss.

People don't believe OpenAI

Here's copy from that Dropbox preference box, talking about their "third-party partners" - in this case OpenAI:

Your data is never used to train their internal models, and is deleted from third-party servers within 30 days.

It's increasingly clear to me that people simply don't believe OpenAI when they're told that data won't be used for training.

What's really going on here, then, is something deeper: AI is facing a crisis of trust.

I quipped on Twitter:

"OpenAI are training on every piece of data they see, even when they say they aren't" is the new "Facebook are showing you ads based on overhearing everything you say through your phone's microphone"

Here's what I meant by that.

Facebook don't spy on you through your microphone

Have you heard the one about Facebook spying on you through your phone's microphone and showing you ads based on what you're talking about?

This theory has been floating around for years. From a technical perspective it should be easy to disprove:

Mobile phone operating systems don't allow apps to invisibly access the microphone.
Privacy researchers can audit communications between devices and Facebook to confirm if this is happening.
Running high quality voice recognition like this at scale is extremely expensive. A few years ago I had a conversation with a friend who works on server-based machine learning at Apple, and they found the entire idea laughable.

The non-technical reasons are even stronger:

Facebook say they aren't doing this. The risk to their reputation if they are caught in a lie is astronomical.
As with many conspiracy theories, too many people would have to be "in the loop" and not blow the whistle.
Facebook don't need to do this: there are much, much cheaper and more effective ways to target ads at you than spying through your microphone. These methods have been working incredibly well for years.
Facebook gets to show us thousands of ads a year. 99% of those don't correlate in the slightest to anything we have said out loud. If you keep rolling the dice long enough, eventually a coincidence will strike.
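That last dice-rolling point can be made concrete with a small sketch. The numbers are illustrative assumptions rather than measurements: it takes the "99% don't correlate" figure at face value (so each ad has a 1% chance of matching something you recently said) and treats every ad as an independent roll.

```python
def chance_of_coincidence(ads_seen: int, match_probability: float = 0.01) -> float:
    """Probability that at least one of `ads_seen` ads matches by pure chance.

    ASSUMPTION: each ad independently matches a recent conversation with
    probability `match_probability` (1% here, an illustrative value).
    """
    return 1 - (1 - match_probability) ** ads_seen


if __name__ == "__main__":
    for ads in (10, 100, 1000):
        print(ads, "ads ->", round(chance_of_coincidence(ads), 3))
```

Under these assumptions the chance of at least one spooky match passes 50% after roughly 70 ads, and is close to certain by the time you have seen a thousand.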

Here's the thing though: none of these arguments matter.

If you've ever experienced Facebook showing you an ad for something that you were talking about out loud moments earlier, you've already dismissed everything I just said. You have personally experienced anecdotal evidence which overrides all of my arguments here.

Here's a Reply All podcast episode from November 2017 that explores this issue: 109 Is Facebook Spying on You? Their conclusion: Facebook are not spying through your microphone. But if someone already believes that, there is no argument that can possibly convince them otherwise.

I've experienced this effect myself - over the past few years I've tried talking people out of this, as part of my own personal fascination with how sticky this conspiracy theory is.

The key issue here is the same as the OpenAI training issue: people don't believe these companies when they say that they aren't doing something.

One interesting difference here is that in the Facebook example people have personal evidence that makes them believe they understand what's going on.

With AI we have almost the complete opposite: AI models are weird black boxes, built in secret and with no way of understanding what the training data was or how it influences the model.

As with so much in AI, people are left with nothing more than "vibes" to go on. And the vibes are bad.

This really matters

Trust is really important. Companies lying about what they do with your privacy is a very serious allegation.

A society where big companies tell blatant lies about how they are handling our data - and get away with it without consequences - is a very unhealthy society.

A key role of government is to prevent this from happening. If OpenAI are training on data that they said they wouldn't train on, or if Facebook are spying on us through our phone's microphones, they should be hauled in front of regulators and/or sued into the ground.

If we believe that they are doing this without consequence, and have been getting away with it for years, our intolerance for corporate misbehavior becomes a victim as well. We risk letting companies get away with real misconduct because we incorrectly believed in conspiracy theories.

Privacy is important, and very easily misunderstood. People both overestimate and underestimate what companies are doing, and what's possible. This isn't helped by the fact that AI technology means the scope of what's possible is changing at a rate that's hard to appreciate even if you're deeply aware of the space.

If we want to protect our privacy, we need to understand what's going on. More importantly, we need to be able to trust companies to honestly and clearly explain what they are doing with our data.

On a personal level we risk losing out on useful tools. How many people cancelled their Dropbox accounts in the last 48 hours? How many more turned off that AI toggle, ruling out ever evaluating if those features were useful for them or not?

What can we do about it?

There is something that the big AI labs could be doing to help here: tell us how you are training!

The fundamental question here is about training data: what are OpenAI using to train their models?

And the answer is: we have no idea! The entire process could not be more opaque.

Given that, is it any wonder that when OpenAI say "we don't train on data submitted via our API" people have trouble believing them?

The situation with ChatGPT itself is even messier. OpenAI say that they DO use ChatGPT interactions to improve their models - even those from paying customers, with the exception of the "call us" priced ChatGPT Enterprise.

If I paste a private document into ChatGPT to ask for a summary, will snippets of that document be leaked to future users after the next model update? Without more details on HOW they are using ChatGPT to improve their models I can't come close to answering that question.

Clear explanations of how this stuff works could go a long way to improving the trust relationship OpenAI have with their users, and the world at large.

Maybe take a leaf from large scale platform companies. They publish public post-mortem incident reports on outages, to regain trust with their customers through transparency about exactly what happened and the steps they are taking to prevent it from happening again. Dan Luu has collected a great list of examples.

An opportunity for local models

One consistent theme I've seen in conversations about this issue is that people are much more comfortable trusting their data to local models that run on their own devices than models hosted in the cloud.

The good news is that local models are consistently both increasing in quality and shrinking in size.

I figured out how to run Mixtral-8x7b-Instruct on my laptop last night - the first local model I've tried which really does seem to be equivalent in quality to ChatGPT 3.5.

Microsoft's Phi-2 is a fascinating new model in that it's only 2.7 billion parameters (most useful local models start at 7 billion) but claims state-of-the-art performance against some of those larger models. And it looks like they trained it for around $35,000.
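To give a rough sense of why parameter count matters for running models on your own device, here is a small sketch that estimates the memory needed for the weights alone. The bytes-per-parameter values are conventional assumptions (2 bytes for 16-bit floats, 0.5 bytes for 4-bit quantization), and real-world usage will be higher once activations and context are included.

```python
def weight_memory_gb(parameters: float, bytes_per_parameter: float) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    # ASSUMPTION: illustrative precisions, not what any specific release ships.
    for name, params in (("2.7B model", 2.7e9), ("7B model", 7e9)):
        fp16 = weight_memory_gb(params, 2)      # 16-bit floats
        int4 = weight_memory_gb(params, 0.5)    # 4-bit quantized
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.2f} GB at 4-bit")
```

By this estimate a 2.7 billion parameter model needs well under half the weight memory of a 7 billion parameter one at the same precision, which is a large part of why smaller models are attractive for laptops and phones.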

While I'm excited about the potential of local models, I'd hate to see us lose out on the power and convenience of the larger hosted models over privacy concerns which turn out to be incorrect.

The intersection of AI and privacy is a critical issue. We need to be able to have the highest quality conversations about it, with maximum transparency and understanding of what's actually going on.

This is hard already, and it's made even harder if we straight up disbelieve anything that companies tell us. Those companies need to earn our trust. How can we help them understand how to do that?

Tags: trust, dropbox, ai, openai, local-llms, llms, training-data, microphone-ads-conspiracy, digital-literacy