About Turing:
Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and frontier knowledge; and leveraging that work to build real-world AI systems that address mission-critical priorities for companies.
About the Role:
Annotators are the core builders of SkillsBench. You will design and write AI evaluation tasks: structured challenges given to large language model (LLM) agents running inside automated environments. Each task you create tests whether an AI agent performs significantly better with domain-specific knowledge skills than without them. Your tasks feed directly into Turing's commercial AI evaluation pipeline, which is used by clients.
What You Will Do:
Write clear, unambiguous task instructions that define exactly what an AI agent must produce, where to save it, and what rules to follow
Create reference solutions that demonstrate the correct approach and pass all automated checks
Write human-readable verifier descriptions listing every check the automated test suite will run
Author domain-specific skill files that teach an AI agent the conventions, workflows, and edge cases relevant to the task, without leaking expected answers
Ensure the no-skills variant of each task is identical to the with-skills variant except for the absence of skill files
Work within the task structure (instruction, environment, solution, tests) and follow Turing's task quality standards
Required:
Education: Bachelor's degree or higher in a relevant technical or domain-specific field (Computer Science, Engineering, Finance, Data Science, Linguistics, etc.)
Experience: 1–3 years in a domain where you have hands-on practical expertise (software development, financial analysis, document processing, data science, etc.)
Must Have:
Strong written English; ability to write precise, unambiguous instructions
Genuine hands-on expertise in at least one of the SkillsBench domains (coding, finance, document generation, audio/ML, etc.)
Ability to think from an AI agent's perspective — what would a model get wrong without guidance?
Comfort reading and producing structured file outputs (JSON, DOCX, XLSX, Markdown)
Nice to Have:
Prior experience with LLM evaluation, prompt engineering, or AI benchmark design
Familiarity with Python scripting
Experience with Docker or containerised environments
Domains:
Power Systems & Control
Cybersecurity
Network & System Engineering
Offer Details:
Annotator - STEM