AI Evaluation Engineer (Python / Java / Web)

Full-time

Remote

Skills

Java

JavaScript

Overview

About Turing

Turing is one of the world's fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.

Turing helps customers in two ways: by working with the world's leading AI labs to advance frontier model capabilities, and by leveraging that work to build real-world AI systems that help businesses solve complex problems and unlock new opportunities.

Role Overview

We are seeking detail-oriented software professionals with expertise in Java and Web Development to join Turing's AI Evaluation team as AI Evaluation Specialists.

In this role, you will design and create evaluation tasks that measure the capabilities of advanced AI systems on real-world software engineering challenges. You will author structured tasks, reference solutions, verification criteria, and domain-specific guidance used to evaluate how effectively AI models perform development-related work.

Your contributions will directly influence the benchmarking and improvement of next-generation AI systems used by leading AI organizations.

What does the day-to-day look like?

Design realistic software engineering evaluation tasks focused on Java and Web Development
Write clear, precise instructions that define expected outputs, constraints, and success criteria for AI agents
Create reference solutions that successfully solve evaluation tasks and satisfy validation requirements
Develop human-readable verifier descriptions that document expected behaviors and evaluation checks
Author domain-specific knowledge files that teach AI systems relevant workflows, conventions, and best practices without revealing solutions
Review tasks for clarity, consistency, edge cases, and evaluation quality
Analyze AI-generated outputs and identify common failure patterns
Collaborate with researchers and evaluation teams to improve benchmark quality and coverage
Follow established task-authoring standards and quality guidelines

Requirements

Bachelor's degree or higher in Computer Science, Software Engineering, Information Technology, or a related field
3–5 years of hands-on software development experience
Strong proficiency in Java and core software engineering concepts
Experience building or maintaining web applications (Frontend, Backend, or Full Stack)
Strong understanding of object-oriented programming, APIs, web technologies, and software development best practices
Excellent written English with the ability to write clear, precise, and unambiguous technical instructions
Ability to think critically about how AI systems interpret instructions and solve technical problems
Comfortable working with structured formats such as JSON, Markdown, YAML, DOCX, and XLSX

Nice to have:

Experience with LLM evaluation, prompt engineering, or AI benchmarking
Experience creating technical assessments, coding challenges, or educational content
Experience with Docker, containers, or cloud-based development environments