Voice AI privacy: what actually happens to your meeting data

Voice AI privacy comes down to three things: who hears the audio, where the transcript lives, and how long it sticks around. Here's the honest map.

TL;DR

When voice AI joins your meeting, three things travel: the raw audio, the live transcript, and the structured outputs (decisions, action items, follow-ups). Privacy risk lives in how each one is handled. The right tool deletes audio fast, keeps the transcript in a region you control, doesn't train on your data by default, and lets the host pause the bot mid-call. If a vendor can't show you that on a single page, walk.

The honest answer in one paragraph

Voice AI privacy is not a checkbox. It's a chain of custody for three different artifacts. The audio gets captured, transcribed, and (in most reputable tools) thrown away within minutes. The transcript sits in a database somewhere, usually for as long as your retention setting says. The structured outputs (notes, decisions, action items) flow into your stack: Slack, Notion, Linear, Gmail. Every link in that chain is a place where things can go right or wrong, and most of the questions teams worry about map cleanly to one of those links.

If you only remember one thing from this piece: ask the vendor where each of the three artifacts lives, who can see them, and when they go away. If the answers are vague, that is the answer.

Where the audio actually goes

The path looks roughly the same across vendors. A meeting bot joins your Zoom, Google Meet, or Teams call as a participant. It captures the audio stream the same way any other attendee would. That audio is sent to a speech-to-text model: sometimes one the vendor hosts itself, sometimes a partner's API (Deepgram, AssemblyAI, or OpenAI's Whisper API).

Two questions matter here.

Is the audio stored, and for how long?

Reputable voice AI tools discard raw audio within minutes once the transcript is written. Some keep it for a few hours so users can replay clips. A few keep it for the full retention window of the meeting. The best vendors let you choose, and default to the shortest setting.

Why raw audio deserves stricter handling than the transcript: a transcript is plain text, which is easy to redact, search, and govern. Audio carries biometric information. A voiceprint can identify a person in ways a paragraph of text cannot. Audio that hangs around for 30 days is a different privacy posture than a transcript that hangs around for 30 days.

Where is it processed, geographically?

If your team or your customers are in the EU, UK, or any region with strict cross-border data rules, the answer needs to be specific. "Our infrastructure is on AWS" is not specific. "Audio is processed in eu-central-1 and never leaves the region" is specific.

Where the transcript lives

Once the audio is gone, the transcript becomes the primary artifact. This is the part most teams underweight. A transcript of a strategy meeting can be more sensitive than the audio itself, because it's searchable, copyable, and quotable.

Three things to confirm:

  • Storage region. Same question as audio. Where, exactly. EU customers should expect EU storage as a default, not an upgrade.
  • Retention window. Default and configurable. Some teams want 7 days. Others need 7 years for regulatory reasons. The tool should let you set both a global default and per-meeting overrides.
  • Access controls. Who on your team can read which transcript: workspace admins, the meeting host, or invited participants only. The good answer is "any of the above, configurable per meeting."

The bad answer here is silence: a tool that stores transcripts forever, surfaces them to anyone in the workspace, and offers no controls. That's how a sales call ends up surfacing in a search by someone who shouldn't see it.
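To make the retention and access rules above concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Transcript` fields, the 30-day default, and the function names are illustrations, not any vendor's actual data model. It shows a per-meeting retention override falling back to a global default, and an access check that keeps transcripts closed unless someone deliberately shares them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical workspace-wide default; pick whatever your policy requires.
DEFAULT_RETENTION_DAYS = 30


@dataclass
class Transcript:
    meeting_id: str
    host: str
    created_at: datetime
    participants: set = field(default_factory=set)
    retention_days: Optional[int] = None  # per-meeting override, if set
    workspace_visible: bool = False       # sharing is opt-in, not the default


def expires_at(t: Transcript) -> datetime:
    """A per-meeting override wins; otherwise use the global default."""
    days = t.retention_days if t.retention_days is not None else DEFAULT_RETENTION_DAYS
    return t.created_at + timedelta(days=days)


def can_read(t: Transcript, user: str, is_admin: bool = False) -> bool:
    """Admins, the host, and invited participants can read; others only if shared."""
    return is_admin or user == t.host or user in t.participants or t.workspace_visible


call = Transcript(
    meeting_id="sales-001",
    host="ana",
    created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
    participants={"ben"},
    retention_days=7,
)
print(expires_at(call))         # 2026-01-08 00:00:00+00:00
print(can_read(call, "carol"))  # False
```

The design choice worth copying is the default: `workspace_visible` starts as `False`, so a transcript only becomes searchable by the whole workspace when someone chooses that.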

Training: the question that keeps getting dodged

If you can't tell from the homepage whether your meetings are training the model, that's the answer.

The single most important privacy question for voice AI right now is whether your meetings are used to train future models. The answer should be plainly stated, in writing, in the data processing agreement.

The healthy default in 2026 is no-training-on-customer-data, with the option to opt in if you want to contribute. Some consumer-tier tools flip this: training on by default, opt-out buried in settings. That is a red flag for anything past a free trial.

Even when training is off, ask about human review: do people inside the vendor ever read your transcripts to evaluate model quality? The honest answer is sometimes yes, on small samples, with strict access controls. The dishonest answer is "no humans ever see your data," which is rarely true and usually worth probing.

Consent: the part the tool can't solve for you

This one isn't really about the AI. It's about the law and the room.

Most US states are one-party-consent: as long as one person on the call consents to the recording, you're generally fine. But California, Florida, Illinois, Massachusetts, Pennsylvania, Washington, and a handful of others require all parties to consent. The EU's GDPR and the UK GDPR work differently: a recording is personal data, so you need a lawful basis (in practice, usually consent) and participants must be told clearly before any recording or transcription begins.

What this means in practice:

  1. The bot should be visible. A named participant with an avatar, not a silent listener.
  2. The host should announce it at the top of every external call. "We're using a voice AI to take notes, you'll see it in the participant list. Anyone want it off?"
  3. The tool should make it easy to pause for off-the-record sections, and keep a clear audit trail of when it was recording and when it wasn't.

If a tool brags about being "invisible" or "stealth," that is a privacy posture you do not want. It also probably breaks the law somewhere your team operates.
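The "strictest location wins" logic behind those rules can be sketched in a few lines of Python. This is an illustration under loud assumptions: the state set contains only the six states named above (the "handful of others" is deliberately missing), the location codes are made up for the example, and none of it replaces a legal check for your own jurisdictions.

```python
from typing import Optional

# Only the states named in this article; the real list is longer.
ALL_PARTY_STATES = {"CA", "FL", "IL", "MA", "PA", "WA"}

# Regions where participants must be told before recording starts.
TELL_FIRST_REGIONS = {"EU", "UK"}


def strictest_rule(locations: list) -> str:
    """Return the rule to follow for a call, given each participant's location.

    An unknown location (None) is treated as strict, on the assumption that
    over-asking for consent is cheaper than under-asking.
    """
    for loc in locations:
        if loc is None or loc in ALL_PARTY_STATES or loc in TELL_FIRST_REGIONS:
            return "all_party"
    return "one_party"


print(strictest_rule(["TX", "NY"]))  # one_party
print(strictest_rule(["TX", "CA"]))  # all_party
print(strictest_rule(["TX", None]))  # all_party
```

In practice most teams skip the lookup entirely and treat every external call as all-party, which is what the announce-at-the-top habit amounts to.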

The compliance shopping list

If you're evaluating voice AI for a team that has any kind of compliance review, here's the list to ask for, in priority order:

  • Signed DPA. A real Data Processing Agreement that names the sub-processors, lists the data categories, and commits to breach notification windows.
  • SOC 2 Type II. Type I is a snapshot. Type II is the audited-over-time version, which is what enterprise security teams ask for.
  • GDPR posture. EU storage, an EU-based DPO or representative, and a public sub-processor list.
  • HIPAA, if relevant. Most voice AI tools are not HIPAA-eligible. If you're in healthcare, this narrows the list fast.
  • SSO and SCIM. Not strictly privacy, but it's how you actually offboard people who shouldn't have transcript access anymore.
  • Audit logs. Who viewed which transcript, who exported it, when. If you can't see that, you can't investigate when something walks out the door.

None of this is exotic. It's the same list any well-run SaaS buyer is already using. Voice AI is just newer, so vendors sometimes assume teams won't ask. They will.
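For a sense of what a usable audit log looks like, here is a small Python sketch. The `AuditEvent` shape and the `access_history` helper are hypothetical, not any vendor's export format. The point is that "who viewed or exported this transcript since last week" should be a one-line question.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    action: str          # e.g. "view" or "export"
    transcript_id: str
    at: datetime


def access_history(events: list, transcript_id: str, since: datetime) -> list:
    """Every event on one transcript at or after `since`, oldest first."""
    matches = [e for e in events if e.transcript_id == transcript_id and e.at >= since]
    return sorted(matches, key=lambda e: e.at)


log = [
    AuditEvent("ben", "export", "t-42", datetime(2026, 3, 3, tzinfo=timezone.utc)),
    AuditEvent("ana", "view", "t-42", datetime(2026, 3, 2, tzinfo=timezone.utc)),
    AuditEvent("ana", "view", "t-7", datetime(2026, 3, 2, tzinfo=timezone.utc)),
]
for event in access_history(log, "t-42", since=datetime(2026, 3, 1, tzinfo=timezone.utc)):
    print(event.actor, event.action, event.at.date())
```

If a vendor's log can't answer that query, or only answers it through a support ticket, treat the audit-log line on the checklist as unmet.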

The user-facing controls that matter most

Beyond the legal layer, the day-to-day experience is what actually keeps your team safe. Three controls are non-negotiable:

Per-meeting opt-out

The host should be able to keep the bot out of a meeting in one click, every time. Not buried in settings, not vendor-toggled. Front of the calendar invite or front of the meeting room.

Mid-meeting pause

Half the value of voice AI is in candid conversations. The other half is being able to say "let's go off the record for two minutes" without anyone wondering whether the bot kept listening. A visible pause-and-resume control, with a clear UI signal that it's paused, makes that conversation easy.
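A pause control is only trustworthy if it leaves a record. The Python sketch below is one hypothetical way to model it: the class and method names are invented for illustration. Every start, pause, and resume is appended to an audit trail, and the trail can later answer "was the bot recording at this moment?"

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RecordingSession:
    recording: bool = False
    # Each entry is (event, actor, timestamp), appended in chronological order.
    audit_trail: list = field(default_factory=list)

    def _set(self, recording: bool, event: str, actor: str, at: Optional[datetime]) -> None:
        self.recording = recording
        self.audit_trail.append((event, actor, at or datetime.now(timezone.utc)))

    def start(self, actor: str, at: Optional[datetime] = None) -> None:
        self._set(True, "start", actor, at)

    def pause(self, actor: str, at: Optional[datetime] = None) -> None:
        self._set(False, "pause", actor, at)

    def resume(self, actor: str, at: Optional[datetime] = None) -> None:
        self._set(True, "resume", actor, at)

    def was_recording_at(self, moment: datetime) -> bool:
        """Replay the trail up to `moment` and report the state at that time."""
        state = False
        for event, _actor, at in self.audit_trail:
            if at > moment:
                break
            state = event in ("start", "resume")
        return state


t0 = datetime(2026, 1, 1, 10, 0, tzinfo=timezone.utc)
session = RecordingSession()
session.start("host", at=t0)
session.pause("host", at=t0 + timedelta(minutes=10))
session.resume("host", at=t0 + timedelta(minutes=12))
print(session.was_recording_at(t0 + timedelta(minutes=11)))  # False
```

The audio pipeline would check `recording` before accepting any frames; the trail is what lets you prove afterwards that the two off-the-record minutes really were off the record.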

Per-transcript deletion

If a meeting shouldn't have been recorded, deleting it should be one click and final. No "we'll get to it on the retention sweep." Hard delete, audited, today.
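"Hard delete, audited" can be sketched as follows. This is a toy Python model with an in-memory dict standing in for the transcript store; the function name and record fields are hypothetical. It removes the content immediately and logs who deleted what and when, without keeping any of the transcript text in the log.

```python
from datetime import datetime, timezone
from typing import Optional


def hard_delete(
    transcripts: dict,
    audit_log: list,
    transcript_id: str,
    actor: str,
    now: Optional[datetime] = None,
) -> None:
    """Delete a transcript right away and record the deletion.

    Raises KeyError if the transcript does not exist, so a delete that
    did not happen is never logged as done.
    """
    del transcripts[transcript_id]
    audit_log.append(
        {
            "action": "hard_delete",
            "transcript_id": transcript_id,
            "actor": actor,
            "at": now or datetime.now(timezone.utc),
        }
    )


store = {"t-42": "full transcript text"}
log = []
hard_delete(store, log, "t-42", actor="ana")
print("t-42" in store)   # False
print(log[0]["action"])  # hard_delete
```

A real system has more places to clean up than one dictionary: backups, search indexes, and any notes already pushed to other tools. That is why the question to ask a vendor is when the transcript is gone and from where.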

What we built into relly

relly is a voice AI participant for live team meetings, and we treat the privacy chain as a product surface, not an afterthought. Audio is processed and discarded within minutes. Transcripts are stored in your chosen region with a configurable retention window. Customer meetings are not used to train models, ever, and that's in writing. The bot joins as a visible participant and the host has one-click pause and per-meeting opt-out. We're SOC 2 Type II, with a signed DPA available to anyone who asks.

None of that is unique to us. The point is: it should be table stakes, and you should expect it from any voice AI you let into a real meeting. (See our security page for the full list, and the privacy policy for the legal version.)

The five-question gut check

If you're evaluating a voice AI tool right now, ask these five questions and write down the answers. If any answer is fuzzy, that's a real signal:

  1. How long do you keep the raw audio, and where is it processed?
  2. Where is the transcript stored, and how do I set retention?
  3. Are my meetings used to train your models? Show me the DPA clause.
  4. How does the tool announce itself in a meeting, and how do I pause it?
  5. If I delete a transcript, when is it actually gone, and from where?

You don't need a security team to ask these. You need a few minutes and the willingness to push past the marketing page. The answers will sort the serious vendors from the ones hoping you don't ask.

Common questions

What actually happens to my meeting audio when I use voice AI?

Audio is captured by the meeting bot, streamed to a transcription model, then usually discarded once the transcript is written. The transcript is stored in the vendor's database and surfaced back to your team. Reputable voice AI tools delete raw audio within minutes to hours and let you set retention windows on the transcript.

Is voice AI training on my meetings?

It depends on the vendor and the plan. Most enterprise tools commit in writing that customer meetings are not used to train models. Free or consumer plans sometimes train on your meetings by default and leave it to you to opt out. Always check the data processing agreement, not the marketing page.

Do meeting participants know voice AI is recording?

They should. A well-designed voice AI joins as a visible participant with a name and avatar, and the host should announce it at the start of the call. All-party-consent states like California and Florida legally require everyone on the call to agree before recording starts, and in the EU and UK participants must be told up front.

What's the safest way to use voice AI for sensitive meetings?

Pick a vendor with a signed DPA, SOC 2 Type II, regional data residency, no-training-by-default, short audio retention, and per-meeting controls so the host can pause the bot for off-the-record sections. Then announce the bot at the top of every call.

Want voice AI you can actually defend in a security review?

relly is built so your security team can sign off in a single conversation. Early access is open with 50% off your first year, no card needed until launch.
