SWE-bench

Overview

Code Comprehension tasks measure a model’s ability to interpret existing RTL and testbench artifacts: whether it can match them to their specifications and reason accurately about what the code does.

Task Categories

Specification ↔ RTL Correspondence – map between natural language specifications and RTL modules.
Test Plan ↔ Testbench Correspondence – map between test plans and testbench implementations.
Technical Q&A on RTL or Testbenches – answer detailed questions about provided RTL or testbench code.

Formats

These tasks are provided in Non-Agentic format only.

Evaluation Criteria

BLEU scores for correspondence tasks.
LLM-based judging for Q&A.
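
To make the BLEU criterion concrete, here is a minimal sketch of a sentence-level BLEU score in Python. It is illustrative only: the description above does not state the tokenization, n-gram order, smoothing, or number of references used, so whitespace tokenization, uniform weights over 1- to 4-grams, a single reference, and no smoothing are all assumptions. The function names `ngrams` and `bleu` are hypothetical and not part of any benchmark harness.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU of one candidate against one reference.

    Assumptions (not taken from the benchmark description): whitespace
    tokenization, uniform weights over 1..max_n grams, no smoothing.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            # Without smoothing, one empty precision zeroes the score.
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

Under these assumptions, a candidate identical to the reference scores 1.0, a candidate sharing no words with the reference scores 0.0, and a candidate that is a correct but truncated prefix of the reference is reduced only by the brevity penalty. Because BLEU rewards surface n-gram overlap rather than meaning, a correct answer phrased differently can still score low.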