Recommended Seminar Readings #
This page contains a list of recommended reading materials, including papers, technical reports, blog posts, and other online materials within the scope of ML4SE/AI4SE. They are cherry-picked from the literature using the following criteria: (1) relevance to the latest trends in the field; (2) the benefit of reading and discussing them; (3) the instructor’s own research taste.
The readings are grouped by topic. Each seminar can focus on one topic and cover one or more of the readings listed under it. Feel free to propose other topics and reading materials for the seminars you lead.
Coding Agents #
- Agentless: Demystifying LLM-Based Software Engineering Agents (FSE'25)
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents (ICLR'25)
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (NeurIPS'24)
- AutoCodeRover: Autonomous Program Improvement (ISSTA'24)
Issue Resolution Benchmarks (SWE- Series) #
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks? ('25)
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? (ICLR'25)
- SWE-smith: Scaling Data for Software Engineering Agents (NeurIPS-DB'25)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR'24)
Coding Agent Usage Benchmarks #
- The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering ('26)
  - This is the dataset driving the MSR'26 Mining Challenge (check out the accepted papers in the track)
- SWE-chat: Coding Agent Interactions From Real Users in the Wild ('26)
Instruction Fine-Tuning #
- Magicoder: Empowering Code Generation with OSS-Instruct (ICML'24)
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct (ICLR'24)
- OctoPack: Instruction Tuning Code Large Language Models (ICLR'24)
Training with Human/Execution Feedback #
Parameter-Efficient Fine-Tuning #
Analysis of Code LLMs #
- What Do They Capture? A Structural Analysis of Pre-Trained Language Models for Source Code (ICSE'22)
Test-Time Scaling #
Repository-Level Context #
- RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (ICLR'24)
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS-DB'23)
Code LLM Technical Reports #
- Qwen3-Coder-Next Technical Report ('26)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (ACL'25)
- CodeGemma: Open Code Models Based on Gemma ('24)
- StarCoder2 and The Stack v2: The Next Generation ('24)
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ('24)
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation (EMNLP'23)