About Turing
Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing helps customers in two ways: working with the world's leading AI labs to advance frontier model capabilities and leveraging that work to build real-world AI systems that help businesses solve complex problems and unlock new opportunities.
We are seeking detail-oriented Python developers to join Turing's AI Evaluation team as AI Evaluation Specialists.
In this role, you will design and create evaluation tasks that measure the capabilities of advanced AI systems on real-world software engineering challenges. You will author structured tasks, reference solutions, verification criteria, and domain-specific guidance used to evaluate how effectively AI models perform Python-based development work.
Your contributions will directly influence the benchmarking and improvement of next-generation AI systems used by leading AI organizations.
What does the day-to-day look like?
- Design realistic Python-focused software engineering evaluation tasks for AI agents
- Write clear, precise instructions that define expected outputs, constraints, and success criteria
- Create reference solutions that successfully solve evaluation tasks and satisfy validation requirements
- Develop human-readable verifier descriptions that document expected behaviors and evaluation checks
- Author domain-specific knowledge files that teach AI systems relevant workflows, conventions, and best practices without revealing solutions
- Review tasks for clarity, consistency, edge cases, and evaluation quality
- Analyze AI-generated outputs and identify common failure patterns
- Collaborate with researchers and evaluation teams to improve benchmark quality and coverage
- Follow established task-authoring standards and quality guidelines
Requirements
- Bachelor's degree or higher in Computer Science, Software Engineering, Information Technology, or a related field
- 3–5 years of hands-on Python development experience
- Strong proficiency in Python and core software engineering concepts
- Experience developing applications, automation scripts, APIs, data-processing workflows, or backend systems using Python
- Strong understanding of data structures, algorithms, debugging, testing, and software development best practices
- Excellent written English with the ability to write clear, precise, and unambiguous technical instructions
- Ability to think critically about how AI systems interpret instructions and solve technical problems
- Comfortable working with structured formats such as JSON, Markdown, YAML, DOCX, and XLSX
Nice to have:
- Experience with LLM evaluation, prompt engineering, or AI benchmarking
- Experience creating technical assessments, coding challenges, or educational content
- Experience with Docker, containers, or cloud-based development environments
Perks of Freelancing With Turing
- Work on cutting-edge AI projects with leading AI research organizations
- Flexible remote work opportunities
- Opportunity to influence the evaluation of next-generation AI systems
- Collaborate with a global network of highly skilled professionals
Offer details:
- Commitment required: 40 hours per week, with 4 hours of overlap with PST
- Engagement type: Contractor assignment/freelancer (no medical/paid leave)
- Duration of contract: 2 months (expected start date is next week)
Evaluation Process
- One round of technical interview or an automated live coding challenge