Agentic Coding Evaluation Specialist

Part-time
Remote
Skills: Python, JavaScript, Java, Go, C++

Overview

As an Agentic Coding Evaluation Specialist, you will work with real-world codebases to design challenging coding prompts, execute them across AI models, and evaluate outputs through structured, side-by-side comparisons. Your input will directly contribute to improving AI systems for developer workflows.

  • Rate: 75 USD/hr
  • Commitment: 20, 30, or 40 hours per week (weekdays only)
  • Employment Type: Contract
  • Duration: 1 week (Expected start date: April 17, 2026)
  • Location: North America, Western Europe

Key Responsibilities

  • Analyze and navigate large open-source codebases (e.g., React, Pydantic, Pandas)
  • Create complex, single-turn coding prompts grounded in real repositories
  • Execute prompts across multiple AI model instances
  • Perform side-by-side (SxS) evaluations of model outputs
  • Evaluate model performance across dimensions such as instruction following, code quality, tool usage, testing and validation, and communication clarity
  • Provide clear, structured justifications for evaluation decisions

Required Qualifications

  • 2+ years of professional experience in software development
  • Bachelor’s degree in Computer Science or a related field
  • Strong proficiency in at least one of the following: Python, JavaScript, TypeScript, Java, Go, or C++
  • Solid understanding of data structures and algorithms, debugging and testing practices, and software design and code quality principles

© 2026 Turing Enterprises, Inc.