Agentic Tasker (Frontier STEM)

Full-time

Remote

Skills

Python

Data Science

PyTorch

Machine Learning

Overview

About the Role

This role offers the opportunity to work directly with researchers at a top-8 frontier lab. The core objective is to enhance the reasoning and problem-solving capabilities of a target frontier model in STEM topics by designing, validating, and analyzing challenging benchmark tasks.

Key Responsibilities

Task Design and Development: Design challenging, real-world data science problems that serve as the foundation for Colab Bench tasks.
Content Generation: Integrate the problems into an agentic development environment, preparing all necessary components in Python, including:
- Detailed Instructions and an overview of the required task.
- A Golden solution that follows the instructions.
- The necessary Environment, including datasets, Python libraries, and metadata.
- A Test notebook containing unit tests that solutions must pass.
Evaluation and Analysis: Evaluate cross-model performance on the tasks.
Headroom Identification: Identify tasks where the target model fails to pass all tests, specifically failures that can be classified as logical reasoning failures.
Loss Extraction: Analyze the agent’s steps (Agent Trajectory) to observe and extract core capability loss patterns from the model.
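
To make the task components above concrete, here is a minimal sketch of how instructions, a golden solution, an environment, and unit tests might fit together. Everything in it is an assumption made for illustration: the toy problem, the in-memory dataset, and the names INSTRUCTIONS, DATASET, golden_mean_by_group, make_test_case, and passes_all_tests are hypothetical and do not come from any actual benchmark or lab tooling. It uses only the Python standard library.

```python
import unittest
from statistics import mean

# Hypothetical task instructions (illustrative only).
INSTRUCTIONS = "Implement a function that returns the mean 'value' for each 'group'."

# Hypothetical environment: a tiny in-memory dataset standing in for real data files.
DATASET = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]


def golden_mean_by_group(rows):
    """Golden solution: mean of 'value' per 'group'."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["group"], []).append(row["value"])
    return {group: mean(values) for group, values in grouped.items()}


def make_test_case(solution):
    """Build a TestCase class that checks any candidate solution."""

    class TaskTests(unittest.TestCase):
        def test_group_means(self):
            self.assertEqual(solution(DATASET), {"a": 2.0, "b": 10.0})

        def test_empty_input(self):
            self.assertEqual(solution([]), {})

    return TaskTests


def passes_all_tests(solution):
    """Run the task's unit tests against a candidate; True only if all pass."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(make_test_case(solution))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("golden solution passes:", passes_all_tests(golden_mean_by_group))
```

In this sketch, a candidate solution that fails passes_all_tests would be a candidate for headroom analysis. Deciding whether the failure is a logical reasoning failure would still require reading the agent trajectory by hand.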

Qualifications and Recruitment

Expertise Focus: Applicants must have strong expertise in data science, ML, finance, and coding, with a deep background in frontier STEM.
Target Candidates: We are actively recruiting PhD students from top schools in the US and highly skilled GitHub contributors. We will also consider a small cohort in India.

Offer Details

Rate: ~$30/hour.
Commitment: Minimum 30 hours per week (on weekdays).
Employment Type: Contractor (no medical/paid leave).
Duration: 3 months (expected start date: next week).
Locations: India.
