Speakwise

Free Speaker Diarization: See Who Said What in Your Audio

Automatically separate and label each speaker (Speaker 1, Speaker 2…) and see who said what - free, in your browser, no app to download.

TL;DR - Speaker diarization answers “who spoke when”: it splits a recording into segments and labels each speaker. This free tool runs in your browser - upload an audio file and get a speaker-labeled transcript in minutes (up to 32 speakers, 90+ languages, no install). Try a file below, or see a real sample.

How it works

  1. Upload audio

    Drag in an mp3, m4a or wav of your meeting, interview or call.

  2. We diarize it

    Each distinct voice is detected and labeled - who said what.

  3. Edit & export

    Rename speakers, copy the transcript, or download TXT / SRT.
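
For readers curious what an SRT export of a speaker-labeled transcript can look like, here is a minimal Python sketch. The segment tuples, the timings, and the function names (`srt_timestamp`, `to_srt`) are illustrative assumptions, not this tool's actual code; only the cue layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) follows the common SRT convention.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_sec, end_sec, speaker, text) tuples."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hypothetical timings, chosen only for illustration.
segments = [
    (0.0, 3.2, "Speaker 1", "How do you choose your team? Based on what?"),
    (3.2, 9.0, "Speaker 2", "Well, um, I suppose it tends to be gut feel."),
]
srt_text = to_srt(segments)
print(srt_text)
```

Putting the speaker label at the start of each cue keeps the file readable in any subtitle player, since SRT itself has no dedicated speaker field.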

Sample output

A real run on a 2-speaker interview clip - speaker-labeled automatically.

Speaker 1: How do you choose your team? Based on what?

Speaker 2: Well, um, I suppose honestly that it tends to be gut feel more than anything else. So when I interview somebody, my interview question's always the same.

Speaker 1: What do you ask?

Speaker 2: I said, “Tell me the story of your life and the decisions that you made along the way and why you made them. And also tell me about some of the most difficult problems you worked on and how you solved them.” The people that really solved the problem, they know exactly how they solved it - they know the little details. And the people that pretended to solve the problem can maybe go one level, and then they get stuck.

Who it's for

Anyone who needs to know who said what in a recording: meeting and interview transcripts, sales and support calls, podcasts, focus groups, and qualitative research - all multi-speaker audio where a plain transcript isn't enough.

Frequently asked questions

What is speaker diarization?

Speaker diarization is the process of automatically working out “who spoke when” in audio: it partitions a recording by speaker and labels each segment. It is a separate task from transcribing what was said.

Speaker diarization vs transcription - what’s the difference?

Transcription captures what was said; diarization captures who said it. This tool does both: a transcript with speaker labels.
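
One way to picture how the two outputs combine: transcription yields timed words, diarization yields timed speaker turns, and each word takes the label of the turn it overlaps most. The Python sketch below is a simplified illustration with made-up timings; the data shapes and function names are assumptions, not a description of this tool's internals.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Give each (start, end, word) the speaker whose turn overlaps it most."""
    labeled = []
    for w_start, w_end, word in words:
        best = max(turns, key=lambda t: overlap(w_start, w_end, t[0], t[1]))
        labeled.append((best[2], word))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one transcript line."""
    lines = []
    for speaker, word in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(word)
        else:
            lines.append((speaker, [word]))
    return [f"{speaker}: {' '.join(ws)}" for speaker, ws in lines]


# Hypothetical inputs: timed words (transcription) and speaker turns (diarization).
words = [
    (0.0, 0.4, "What"), (0.4, 0.6, "do"), (0.6, 0.8, "you"), (0.8, 1.2, "ask?"),
    (1.5, 1.8, "Tell"), (1.8, 2.0, "me"), (2.0, 2.2, "the"), (2.2, 2.7, "story."),
]
turns = [(0.0, 1.3, "Speaker 1"), (1.3, 3.0, "Speaker 2")]

transcript = group_by_speaker(label_words(words, turns))
for line in transcript:
    print(line)
```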

How does speaker diarization work?

It splits audio into short frames, turns each into a voice “embedding,” clusters similar voices together, and assigns each cluster a speaker label - then aligns those labels to the transcript.
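
The clustering step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the 2-D vectors stand in for real voice embeddings (which come from a neural network and have many more dimensions), the greedy cosine-similarity clustering and its `threshold` are a simple stand-in for whatever clustering a production diarizer uses, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_frames(embeddings, threshold=0.8):
    """Greedy clustering: join the most similar cluster, or start a new one.

    Returns one label per frame, numbered in order of first appearance.
    """
    centroids = []  # running mean embedding per cluster
    counts = []     # number of frames assigned to each cluster
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            counts[k] += 1
            # Update the running mean of the matched cluster.
            centroids[k] = [c + (e - c) / counts[k] for c, e in zip(centroids[k], emb)]
        else:
            centroids.append(list(emb))
            counts.append(1)
            k = len(centroids) - 1
        labels.append(f"Speaker {k + 1}")
    return labels


def merge_frames(labels, frame_sec=0.5):
    """Collapse consecutive same-label frames into (start, end, speaker) turns."""
    turns = []
    for i, label in enumerate(labels):
        start = i * frame_sec
        if turns and turns[-1][2] == label:
            turns[-1] = (turns[-1][0], start + frame_sec, label)
        else:
            turns.append((start, start + frame_sec, label))
    return turns


# Six toy frames: two voices pointing in clearly different directions.
frames = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0), (0.95, 0.15)]
labels = cluster_frames(frames)
turns = merge_frames(labels)
print(turns)
```

The resulting turns are what then gets aligned to the transcript. Note that the speaker count is not fixed in advance here: a new cluster opens whenever a frame is not similar enough to any existing one.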

Is this tool really free?

Yes. Upload audio and get a speaker-labeled transcript free in your browser; sign in with Google to run it. No app install.

How many speakers can it detect?

Up to 32, labeled automatically (Speaker 1, Speaker 2…). You can rename the labels afterward.

Can it tell who said what in a meeting or interview?

Yes - that’s the main use case: meetings, interviews, calls, podcasts and focus groups.

What audio formats and length are supported?

mp3, m4a, wav, aac and ogg, up to 70 MB (about 1 hour of audio) per file.
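
How much audio fits in 70 MB depends on the file's bitrate, so "about 1 hour" is best read as an estimate for typical compressed audio. A back-of-the-envelope sketch in Python, assuming 1 MB = 1,000,000 bytes and a constant bitrate (the function name and the example bitrates are illustrative assumptions, not limits published by the tool):

```python
def minutes_for_size(size_mb: float, bitrate_kbps: float) -> float:
    """Approximate duration in minutes for a file size at a constant bitrate."""
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (bitrate_kbps * 1000)
    return seconds / 60


mp3_128 = minutes_for_size(70, 128)     # compressed audio at 128 kbps
mp3_160 = minutes_for_size(70, 160)     # compressed audio at 160 kbps
wav_cd = minutes_for_size(70, 1411.2)   # uncompressed 16-bit, 44.1 kHz stereo PCM

print(f"128 kbps: {mp3_128:.0f} min")
print(f"160 kbps: {mp3_160:.0f} min")
print(f"CD-quality wav: {wav_cd:.1f} min")
```

Under these assumptions a compressed file lands near the one-hour mark, while an uncompressed wav of the same size holds only a few minutes, so converting long wav recordings to mp3 or m4a before uploading is likely to help.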

How accurate is it with accents or overlapping speech?

Strong on clear multi-speaker audio across 90+ languages; heavy crosstalk and overlap are the hardest case for any diarizer.

Do I need to install anything?

No. It’s browser-based; there’s nothing to download.

Is my audio private?

Your audio file is auto-deleted within 24 hours and your transcript after 30 days; neither is used for AI training.

Can I edit or rename the speakers?

Yes - rename labels, correct text, and export (TXT and SRT) right in the browser.

What’s the difference vs open-source (pyannote / WhisperX)?

Those need Python and setup; this gives you the same kind of “who said what” output in your browser, with no code.

Speakwise - AI Note Taker

Never miss a key detail

  • Smart summaries and action items.
  • Draft a follow-up email from any recording.
  • Record up to 4 hours, no limits.
  • Auto-sync to Notion.
  • Record offline, even with no signal.
Download Speakwise on the App Store