About Turing:
Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises turn AI from proof of concept into proprietary intelligence, with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.
Role Overview:
We are looking for an Applied Speech AI Engineer / Audio AI Specialist to help build and scale a large multilingual conversational audio data platform. This role sits at the intersection of speech AI, audio quality engineering, ASR systems, diarization, and data operations.
The ideal candidate should have strong practical experience working with:
- Speech-to-text (ASR) systems
- Multi-speaker conversational audio
- Speaker diarization
- Multilingual speech pipelines
- Audio quality analysis
- Real-world noisy audio datasets
This person will work closely with product, engineering, operations, and data teams to design and improve scalable audio data collection and transcription workflows.
Key Responsibilities:
- Define audio quality standards for conversational recordings, including SNR, clipping, noise, echo, reverb, and overlap.
- Build automated audio quality checks for large-scale remote audio collection.
- Evaluate ASR systems such as Whisper, Deepgram, AssemblyAI, Speechmatics, and GPT-4o transcription.
- Improve transcription accuracy across languages, accents, noisy environments, and multi-speaker conversations.
- Design diarization pipelines for speaker segmentation, attribution, overlap handling, and transcript alignment.
- Develop scalable workflows for audio ingestion, transcription, human correction, QA, and delivery.
- Define quality metrics for ASR, diarization, and datasets, including WER, CER, confidence scores, and speaker accuracy.
- Support creation of benchmark datasets and multilingual conversational speech corpora.
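For candidates less familiar with the ASR metrics named above, here is a minimal sketch of how WER and CER are commonly computed: the word-level (or character-level) edit distance between a reference transcript and an ASR hypothesis, divided by the reference length. The function names are illustrative assumptions and do not describe Turing's internal tooling; text normalization (casing, punctuation) is deliberately left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.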
Required Skills:
- 3–8 years of experience in speech/audio AI systems.
- Strong understanding of ASR systems, speaker diarization, conversational audio processing, and multilingual speech systems.
- Experience working with noisy real-world audio data.
- Strong knowledge of SNR, clipping, codecs, sample rates, and audio preprocessing techniques.
- Hands-on experience with Python and audio processing libraries.
- Familiarity with conversational speech datasets and evaluation metrics.
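As an illustration of the audio quality concepts listed above, the following is a rough sketch of two simple checks: the fraction of clipped samples, and a crude SNR estimate from a speech-active segment and a noise-only segment. It assumes samples are floats normalized to [-1.0, 1.0] and that the two segments have already been separated (for example by a voice activity detector); the function names and the 0.999 threshold are assumptions for illustration, not a prescribed standard.

```python
import math


def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples whose magnitude reaches the (assumed) full-scale threshold."""
    if not samples:
        raise ValueError("samples must not be empty")
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def mean_power(samples):
    """Mean squared amplitude of a segment."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Crude SNR in dB: power of a speech-active segment over power of a noise-only segment."""
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        return math.inf   # no measurable noise floor
    if p_speech == 0:
        return -math.inf  # silent "speech" segment
    return 10 * math.log10(p_speech / p_noise)
```

A production check would typically work on framed audio and combine these with other signals such as echo, reverb, and overlap detection.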
Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.
Offer Details:
- Commitment required: at least 4 hours per day and a minimum of 40 hours per week, with 4 hours of overlap with PST.
- Engagement type: Contractor assignment/freelancer (no medical/paid leave).
- Duration of contract: 12 weeks.
Evaluation Process:
- Shortlisted candidates will be reviewed internally by our team and contacted for onboarding.