
Annotator - STEM

Full-time
Remote
Skills
Python
Cyber Security
Overview

About Turing:

Turing is one of the world’s fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems.

Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and frontier knowledge; and leveraging that work to build real-world AI systems that address mission-critical priorities for companies.


About the Role:

Annotators are the core builders of SkillsBench. You will design and write AI evaluation tasks: structured challenges given to large language model (LLM) agents running inside automated environments. Each task you create tests whether an AI agent performs significantly better when given domain-specific knowledge (skills) than when working without it. Your tasks feed directly into Turing's commercial AI evaluation pipeline, which is used by clients.


What You Will Do:

Write clear, unambiguous task instructions that define exactly what an AI agent must produce, where to save it, and what rules to follow 

Create reference solutions that demonstrate the correct approach and pass all automated checks 

Write human-readable verifier descriptions listing every check the automated test suite will run 

Author domain-specific skill files that teach an AI agent the conventions, workflows, and edge cases relevant to the task, without leaking expected answers 

Ensure the no-skills variant of each task is identical to the with-skills variant except for the absence of skill files 

Work within the task structure (instruction, environment, solution, tests) and follow Turing's task quality standards 
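
The requirement that the no-skills variant match the with-skills variant apart from the skill files can be checked mechanically. The following is a minimal sketch in Python, not Turing's actual tooling. It assumes a hypothetical layout in which each variant is a directory and skill files live under a subdirectory named `skills`; the function names are also illustrative. It compares file hashes across the two variants and reports any path that differs:

```python
import hashlib
from pathlib import Path


def file_digests(root: Path, skip_dir: str = "skills") -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest.

    Files under any directory named `skip_dir` (an assumed name for
    the skill-file folder) are ignored.
    """
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or skip_dir in rel.parts:
            continue
        digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def variant_differences(with_skills: Path, no_skills: Path,
                        skip_dir: str = "skills") -> list[str]:
    """Return relative paths that are missing from one variant or whose
    contents differ, ignoring skill files."""
    a = file_digests(with_skills, skip_dir)
    b = file_digests(no_skills, skip_dir)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `variant_differences` would indicate that, under these assumptions, the two variants differ only in their skill files.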


Required: 

Bachelor's degree or higher in a relevant technical or domain-specific field (Computer Science, Engineering, Finance, Data Science, Linguistics, etc.) 

Experience: 1–3 years in a domain where you have hands-on practical expertise (software development, financial analysis, document processing, data science, etc.) 


Must Have:  

Strong written English; ability to write precise, unambiguous instructions 

Genuine hands-on expertise in at least one of the SkillsBench domains (coding, finance, document generation, audio/ML, etc.) 

Ability to think from an AI agent's perspective — what would a model get wrong without guidance? 

Comfort reading and producing structured file outputs (JSON, DOCX, XLSX, Markdown)  


Nice to Have:  

Prior experience with LLM evaluation, prompt engineering, or AI benchmark design 

Familiarity with Python scripting 

Experience with Docker or containerised environments 

 

Domains:

Power Systems & Control 

Cybersecurity 

Network & System Engineering 


Offer Details:

  • Commitment required: 40 hours per week, with a 4-hour overlap with PST
  • Engagement type: Contractor assignment (no medical/paid leave)
  • Duration of contract: 2 months [expected start date is next week]
  • Location: India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Mexico