About Turing:
Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and frontier knowledge; and leveraging that work to build real-world AI systems that address mission-critical priorities for companies.
Role Overview:
We are seeking a highly analytical and creative AI Trainer to support the development of a new benchmark evaluating how Large Language Models (LLMs) internalize and apply unfamiliar rule systems. This role focuses on training, evaluating, and refining model performance using both real-world and custom-designed board and card game mechanics.
The ideal candidate has strong experience in logic, game design analysis, instructional reasoning, and data annotation, as well as the ability to translate complex rule sets into structured tasks and evaluation criteria.
What the day-to-day looks like:
1. Task & Game System Development
- Interpret and analyze rules from recently released board/card games (3–6 months old).
- Collaborate with the research team to design synthetic, novel game systems featuring unique mechanics, win conditions, and interactions.
- Create clear, human-style instructional materials that simulate how people learn new games.
- Build tasks based on:
  - Real modern games.
  - Custom, invented game systems.
2. Model Training & Evaluation
- Develop prompts that require LLMs to:
  - Predict valid moves under dynamic rule systems.
  - Simulate turns and multi-step game sequences.
  - Reason about outcomes, scoring, and strategic implications.
- Test model comprehension and adaptability to previously unseen rule sets.
- Identify failure patterns, edge cases, and misunderstandings of explicit versus inferred rules.
3. Rubric Creation
- Create automated scoring rubrics to evaluate:
  - Logical coherence of model-generated moves.
  - Rule adherence and internal consistency.
  - Correctness of predicted game outcomes.
- Document evaluation logic to ensure reproducibility.
4. Quality Assurance & Iteration
- Review and refine task instructions to ensure clarity and minimal ambiguity.
- Validate that benchmarks properly test in-context learning rather than memorization.
- Work closely with engineering and research teams to iterate on dataset quality and task difficulty.
Requirements:
- Strong analytical background in logic, rules-based systems, or game theory.
- Experience with game design, puzzle design, or mechanics analysis (professional or hobbyist).
- Excellent writing skills with the ability to explain complex systems clearly.
- Familiarity with LLM behavior, prompt design, or ML evaluation frameworks.
Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.
Offer Details:
- Commitments Required: Minimum 40 hours per week, with a 4-hour overlap with PST
- Engagement type: Contractor assignment (no medical/paid leave)
- Duration of contract: 5 weeks (expected start date is next week)
- Location: India, Pakistan, Nigeria, Kenya, Egypt, Ghana, Bangladesh, Turkey, Mexico
Evaluation Process (approximately 90 mins):
- Offline Technical Assessment