About Turing:
Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing helps customers in two ways: working with the world’s leading AI labs to advance frontier model capabilities in thinking, reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and frontier knowledge; and leveraging that work to build real-world AI systems that solve mission-critical priorities for companies.
Job Overview:
We are seeking professionals with strong analytical and mathematical reasoning skills to create high-quality reasoning datasets for Large Language Model (LLM) training. This role focuses on designing and authoring complex, well-defined mathematical systems — including entirely new or fictional constructs — that require models to learn, reason, and prove properties from scratch.
Ideal candidates possess a deep understanding of mathematical logic and formal systems (such as group theory, algebraic structures, or discrete mathematics) and can design tasks that assess whether models can apply newly defined axioms to perform calculations, derive properties, and construct valid proofs.
Key Responsibilities:
- Design and define new mathematical objects and operations, similar to those in abstract algebra or group theory, but potentially fictional or modified (e.g., “Chromatic Numbers” where numbers have color attributes and custom addition rules).
- Author multi-part reasoning tasks that evaluate an LLM’s ability to interpret definitions, perform calculations, and prove properties within these newly defined systems.
- Develop problem types that include:
- Applying new mathematical rules to compute or simplify expressions.
- Proving or disproving structural properties (e.g., associativity, closure, commutativity).
- Performing symbolic manipulation within multi-layered or hybrid mathematical systems.
- Ensure every task includes:
- Deterministic answer(s) — solutions that can be verified against the defined system’s rules.
- Fully checked reasoning trace — step-by-step logical reasoning consistent with the provided definitions.
- A rubric that distinguishes between rule misuse, algebraic error, and incomplete proof, ensuring rigorous and consistent evaluation.
- Anticipate and handle corner cases such as color collisions, non-associative triples, undefined identities, or boundary-rule ambiguities.
- Collaborate with reviewers and LLM engineers to refine definitions and task phrasing, ensuring clarity, coherence, and reproducibility.
- Maintain quality and consistency standards across all tasks, emphasizing precision, completeness, and logical correctness.
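The property checks named above (closure, commutativity, associativity) and the corner cases (such as non-associative triples) can often be verified deterministically by brute force over a small domain. The sketch below illustrates this with an invented "Chromatic Numbers" system; the value rule, color rule, domain size, and all names are assumptions made for illustration, not part of the role or any Turing specification:

```python
from itertools import product

# Hypothetical "Chromatic Numbers": each element pairs a value mod 5
# with a color. Both rules below are invented for illustration only.
VALUES = range(5)
COLORS = ("R", "G", "B")
DOMAIN = [(v, c) for v in VALUES for c in COLORS]

def add(x, y):
    """Values add mod 5; the result takes the color of the
    larger-valued operand (ties go to the left operand)."""
    (v1, c1), (v2, c2) = x, y
    return ((v1 + v2) % 5, c1 if v1 >= v2 else c2)

# Brute-force property checks over the whole (small) domain -- the
# kind of deterministic verification a task rubric can rely on.
is_closed = all(add(x, y) in DOMAIN for x, y in product(DOMAIN, repeat=2))
is_commutative = all(add(x, y) == add(y, x) for x, y in product(DOMAIN, repeat=2))
is_associative = all(
    add(add(x, y), z) == add(x, add(y, z))
    for x, y, z in product(DOMAIN, repeat=3)
)

# prints closed=True, commutative=False, associative=False
print(f"closed={is_closed}, commutative={is_commutative}, associative={is_associative}")

# Surface a concrete non-associative triple to quote in a sample solution:
for x, y, z in product(DOMAIN, repeat=3):
    if add(add(x, y), z) != add(x, add(y, z)):
        print("non-associative triple:", x, y, z)
        break
```

Note that commutativity fails only on "color collisions" (equal values with different colors), which is exactly the kind of corner case a task author would need to anticipate and document.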
Qualifications:
- 2+ years of experience in a technical, mathematical, or analytical role (e.g., mathematics, computer science, logic, or data science).
- Strong background in abstract reasoning and familiarity with formal mathematical systems (group theory, modular arithmetic, symbolic logic, etc.).
- Proven ability to design structured reasoning problems that test both procedural understanding and conceptual application.
- Excellent written communication skills for creating clear, rigorous, and logically sound task descriptions and solutions.
- Demonstrated attention to detail and logical completeness, ensuring definitions and proofs remain internally consistent.
- Experience working with or evaluating Large Language Models (LLMs) is a plus.
- Background in Mathematics, Theoretical Computer Science, or a related STEM field preferred.
Deliverables:
- A structured collection of reasoning tasks covering:
- Definition and application of new mathematical systems.
- Proof and property reasoning.
- Complex expression evaluation and symbolic generalization.
- Each task must contain:
- A formal context block (system definition, axioms, and examples).
- A prompt query (calculation, proof, or reasoning challenge).
- An expected output (step-by-step reasoning trace leading to a deterministic final result).
- A rubric for scoring correctness, consistency, and completeness.
- Drafts must anticipate corner cases and demonstrate clear handling of exceptions or ambiguous scenarios.
- Final tasks must be self-contained, verifiable, and aligned with model reasoning evaluation goals.
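As an illustration of how the four required components might fit together in one self-contained, machine-verifiable record, here is a toy example; the field names, the system definition, and the rubric wording are all assumptions for illustration, not a prescribed schema:

```python
# Hypothetical task record -- field names and content are illustrative,
# not a prescribed schema.
task = {
    "context": (
        "Define S = {0,1,2,3,4} with operation a XOP b = (a + b + 1) mod 5. "
        "Example: 3 XOP 4 = 3."
    ),
    "prompt": "Prove or disprove: (S, XOP) has an identity element.",
    "expected_output": {
        "reasoning_trace": [
            "Seek e with a XOP e = a for all a, i.e. (a + e + 1) mod 5 = a.",
            "This forces (e + 1) mod 5 = 0, so e = 4.",
            "Check the other side: 4 XOP a = (4 + a + 1) mod 5 = a. Identity confirmed.",
        ],
        "final_answer": "Yes; the identity element is 4.",
    },
    "rubric": {
        "rule_misuse": "Applied the operation without the +1 offset.",
        "algebraic_error": "Incorrect modular arithmetic in the derivation.",
        "incomplete_proof": "Checked only one side of the identity law.",
    },
}

# The deterministic answer can be machine-verified against the definition:
def op(a, b):
    return (a + b + 1) % 5

assert all(op(a, 4) == a and op(4, a) == a for a in range(5))
```

Keeping the definition, prompt, trace, and rubric in one record makes each task self-contained and lets the final answer be checked programmatically against the stated rules.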
Offer Details:
- Commitment required: minimum 40 hours per week, with at least 4 hours of overlap with PST.
- Engagement type: Contractor assignment
- Duration of contract: 5 weeks
Evaluation Process:
- Shortlisted candidates will be sent a Job Interest Form.
- After the profile review, an assessment will be shared, which must be completed within 48 hours.
- Based on the assessment outcome, shortlisted candidates will be contacted to discuss pre-onboarding requirements.