A practical tool for researchers working with audio data
This Colab-ready notebook is designed for qualitative researchers, doctoral students, and anyone working with spoken audio from interviews, classrooms, or focus groups. It uses WhisperX and pyannote.audio to produce clean, speaker-attributed transcripts with word-level timestamps.
You can:
- Upload audio files (.wav, .mp3, .ogg)
- Automatically generate CSV, TXT, JSON, and VTT outputs
- Assign pseudonyms using a simple CSV file
- Run everything directly in Google Colab, with no installation required
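To illustrate what the VTT output involves, here is a minimal sketch of turning speaker-attributed segments into WebVTT text. It is not the notebook's own code. The segment shape (a list of dicts with `start`, `end`, `speaker`, and `text` keys) and the function names `format_timestamp` and `segments_to_vtt` are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: list[dict]) -> str:
    """Build WebVTT text from segments.

    Each segment is assumed (for this sketch) to be a dict with
    'start', 'end', 'speaker', and 'text' keys.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        # A WebVTT voice span labels the cue with the speaker.
        lines.append(f"<v {seg['speaker']}>{seg['text'].strip()}")
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


# Hypothetical segments, for illustration only.
example_segments = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": " Hello there. "},
    {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_01", "text": "Hi."},
]
print(segments_to_vtt(example_segments))
```

The same list of segments could be written to CSV or JSON with Python's standard `csv` and `json` modules.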
Example Use Cases
- Batch process teacher interviews for a dissertation project
- Identify speaker turns in professional development sessions
- Prepare anonymized transcripts for NVivo, Dedoose, or Atlas.ti
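For the speaker-turn use case, one common step is to collapse consecutive segments from the same speaker into a single turn. The sketch below shows that idea only. The segment keys (`start`, `end`, `speaker`, `text`) and the helper name `merge_speaker_turns` are assumptions for illustration, not the notebook's actual implementation.

```python
def merge_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments by the same speaker into one turn.

    Segments are assumed to be in chronological order.
    """
    turns: list[dict] = []
    for seg in segments:
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] = f"{turns[-1]['text']} {text}".strip()
        else:
            # Speaker changed: start a new turn.
            turns.append(
                {
                    "speaker": seg["speaker"],
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": text,
                }
            )
    return turns


# Hypothetical segments, for illustration only.
example_segments = [
    {"start": 0.0, "end": 1.5, "speaker": "SPEAKER_00", "text": "So tell me"},
    {"start": 1.5, "end": 3.0, "speaker": "SPEAKER_00", "text": "about your class."},
    {"start": 3.2, "end": 5.0, "speaker": "SPEAKER_01", "text": "Sure."},
]
for turn in merge_speaker_turns(example_segments):
    print(turn["speaker"], turn["start"], turn["end"], turn["text"])
```

Working with turns rather than raw segments tends to give rows that are easier to code in qualitative analysis software.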
What You’ll Need
- An audio file (or try our sample)
- A Hugging Face token for diarization (instructions)
- (Optional) A pseudonyms.csv file to anonymize names and locations
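As a rough sketch of how a pseudonyms file can be applied, the example below reads a two-column CSV and replaces each original name in a transcript string. The column names `original` and `pseudonym`, the helper names, and the sample names are all assumptions for this illustration. Check the notebook for the layout it actually expects.

```python
import csv
import io
import re


def load_pseudonyms(csv_file) -> dict[str, str]:
    """Read a CSV with (assumed) 'original' and 'pseudonym' columns."""
    reader = csv.DictReader(csv_file)
    return {
        row["original"].strip(): row["pseudonym"].strip()
        for row in reader
        if row["original"].strip()
    }


def apply_pseudonyms(text: str, mapping: dict[str, str]) -> str:
    """Replace whole-word occurrences of each original name."""
    if not mapping:
        return text
    # Longest names first, so "Maria Lopez" wins over "Maria".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# An in-memory stand-in for pseudonyms.csv (hypothetical contents).
sample_csv = io.StringIO(
    "original,pseudonym\n"
    "Maria Lopez,Teacher A\n"
    "Maria,Teacher A\n"
    "Lincoln High,School 1\n"
)
mapping = load_pseudonyms(sample_csv)
print(apply_pseudonyms("Maria Lopez teaches at Lincoln High.", mapping))
# prints: Teacher A teaches at School 1.
```

Matching here is case-sensitive and limited to whole words, so a name like "Mariana" is left untouched. Automatic replacement can miss nicknames or misspellings, so a manual read-through is still worthwhile.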
How to Use the Notebook
👉 Launch in Google Colab
Once open, you’ll:
- Install dependencies (automated)
- Paste your Hugging Face token
- Upload your audio and, optionally, a pseudonyms file
- Get diarized transcripts in multiple formats
You can download the output files at the end of the notebook.
Sample Files (Optional but Helpful)
Sample files are included so you can do a test run and learn how the pipeline works before using your own data.
Reminder on Privacy
No data is stored externally; everything runs inside your Colab session. That said, always anonymize identifiable information before sharing transcripts or outputs.
This notebook is part of a growing set of tools for qualitative researchers who want to use NLP to analyze language, discourse, and meaning. Check back soon for topic modeling, semantic search, and more.