As an Agentic Coding Evaluation Specialist, you will work with real-world codebases to design challenging coding prompts, execute them across AI models, and evaluate outputs through structured, side-by-side comparisons. Your input will directly contribute to improving AI systems for developer workflows.
- Rate: 75 USD/hr
- Commitment: 20, 30, or 40 hours per week (weekdays only)
- Employment Type: Contract
- Duration: 1 week (Expected start date: April 17, 2026)
- Location: North America or Western Europe
Key Responsibilities
- Analyze and navigate large open-source codebases (e.g., React, Pydantic, Pandas)
- Create complex, single-turn coding prompts grounded in real repositories
- Execute prompts across multiple AI model instances
- Perform side-by-side (SxS) evaluations of model outputs
- Evaluate model performance across dimensions such as instruction following, code quality, tool usage, testing & validation, and communication clarity
- Provide clear, structured justifications for evaluation decisions
Required Qualifications
- 2+ years of professional experience in software development
- Bachelor’s degree in Computer Science or a related field
- Strong proficiency in at least one of the following: Python / JavaScript / TypeScript / Java / Go / C++
- Solid understanding of data structures & algorithms, debugging and testing practices, and software design and code quality principles