As an Agentic Coding Evaluation Specialist, you will work with real-world codebases to design challenging coding prompts, execute them across AI models, and evaluate outputs through structured, side-by-side comparisons. Your input will directly contribute to improving AI systems for developer workflows.
- Rate: 75 USD/hr
- Commitment: 20, 30, or 40 hours per week (weekdays only)
- Employment Type: Contract
- Duration: 1 week (Expected start date: April 17, 2026)
- Location: North America or Western Europe
Key Responsibilities
- Analyze and navigate large open-source codebases (e.g., React, Pydantic, Pandas)
- Create complex, single-turn coding prompts grounded in real repositories
- Execute prompts across multiple AI model instances
- Perform side-by-side (SxS) evaluations of model outputs
- Evaluate model performance across dimensions such as instruction following, code quality, tool usage, testing & validation, and communication clarity
- Provide clear, structured justifications for evaluation decisions
Required Qualifications
- 2+ years of professional experience in software development
- Bachelor’s degree in Computer Science or a related field
- Strong proficiency in at least one of the following: Python / JavaScript / TypeScript / Java / Go / C++
- Solid understanding of data structures & algorithms, debugging and testing practices, and software design and code quality principles