Senior Software Engineer – LLM Evaluation & Repository Validation

Part-time
Remote
Skills
Python
JavaScript
Java
Go
Rust
C
C++
C#
Ruby
Overview

Job Title: Senior Software Engineer – LLM Evaluation & Repository Validation


About the project: We are building LLM evaluation and training datasets that train LLMs to work on realistic software engineering problems. One of our approaches in this project is to build verifiable SWE tasks from public repository histories, using a synthetic process with a human in the loop, while expanding dataset coverage to different types of tasks in terms of programming language, difficulty level, and so on.


About the Role: We are looking for experienced software engineers (tech lead level) who are familiar with high-quality public GitHub repositories and can contribute to this project. You should have experience working with well-maintained, widely used repositories with 500+ stars. This role involves hands-on software engineering work, including development environment automation, issue triaging, and evaluating test coverage and quality.


Why Join Us?
Turing is one of the world’s fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems. You’ll be at the forefront of evaluating how LLMs interact with real code, influencing the future of AI-assisted software development. This is a unique opportunity to blend practical software engineering with AI research.


What does the day-to-day look like?

  • Analyze and triage GitHub issues across trending open-source libraries.
  • Set up and configure code repositories, including Dockerization and environment setup.
  • Evaluate unit test coverage and quality.
  • Modify and run codebases locally to assess LLM performance in bug-fixing scenarios.
  • Collaborate with researchers to design and identify repositories and issues that are challenging for LLMs.
  • Take opportunities to lead a team of junior engineers collaborating on projects.

Required Skills:

  • Strong experience with at least one of the following languages: Python, JavaScript, Java, Go, Rust, C/C++, C#, or Ruby. 
  • Proficiency with Git, Docker, and basic software pipeline setup.
  • Ability to understand and navigate complex codebases.
  • Comfortable running, modifying, and testing real-world projects locally.
  • Experience contributing to or evaluating open-source projects is a plus.

Nice to Have:

  • Previous participation in LLM research or evaluation projects.
  • Experience building or testing developer tools or automation agents.

Perks of Freelancing With Turing:

  • Work in a fully remote environment.
  • Opportunity to work on cutting-edge AI projects with leading LLM companies.

Offer details:

  • Commitment required: 20 hours per week, with some overlap with PST working hours
  • Employment type: Contractor assignment (no medical/paid leave)
  • Duration of contract: 3 months, with an expected start date of next week
Trusted by AI leaders, enterprises, and more
Anthropic
Dell
Disney
Nvidia
Pepsi
Reddit
Rivian
Snowflake
© 2026 Turing Enterprises, Inc.
