Bio
I am an Assistant Professor in the Cheriton School of Computer Science at the University of Waterloo. My research interest is improving developers' productivity during software development, testing, and maintenance. Specific topics include execution-guided machine learning models for testing and verification, learning to evolve code and comments, and frameworks for executable comments and specifications.
I obtained my Ph.D. in 2023 and M.Sc. in 2020 from The University of Texas at Austin, advised by Milos Gligoric. I received my B.Sc. from the University of Science and Technology of China (School of the Gifted Young) in 2017.
Teaching
CS846 Advanced Topics in Software Engineering: Machine Learning for Software Engineering: Spring 2026, Fall 2024
CS446/CS646/ECE452 Software Design and Architecture: Winter 2026, Winter 2025, Winter 2024
Publications
- TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar.
  In Annual Meeting of the Association for Computational Linguistics (ACL'26), to appear. 2026. San Diego, USA.
- Energy-Efficient Software Development: A Multi-dimensional Empirical Analysis of Stack Overflow.
  In International Conference on Software Engineering (ICSE'26). 2026.
- When AI Coding Assistants Leak Training Data: A Study of LLM Memorization in Code Generation.
  In International Conference on AI-powered Software @ FSE (AIWare'26). 2026.
- Learning Multi-step Reasoning via Persistent Latent State Propagation.
  In Workshop on Latent & Implicit Thinking---Going Beyond CoT Reasoning @ ICLR (LIT'26). 2026.
- World of Logs: A Dataset of Logs from Online Documents.
  In International Conference on Mining Software Repositories, Data and Tool Showcase Track (MSR'26 DataTool). 2026.
- NL in the Middle: Code Translation with LLMs and Intermediate Representations.
  In International Conference on Collaborative Advances in Software and Computing (CASCON'25). November 2025. Toronto, Canada.
- Learning to Edit Interactive Machine Learning Notebooks.
  In International Conference on the Foundations of Software Engineering, Ideas, Visions and Reflections Track (FSE'25 IVR). June 2025. Trondheim, Norway.
- A Tool for Generating Exceptional Behavior Tests With Large Language Models.
  In International Conference on the Foundations of Software Engineering, Demonstrations Track (FSE'25 Demo). June 2025. Trondheim, Norway.
- Mix-of-Language-Experts Architecture for Multilingual Programming.
  In International Workshop on Large Language Models for Code (LLM4Code'25). April 2025. Ottawa, Canada.
- CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories.
  In International Conference on Mining Software Repositories, Data and Tool Showcase Track (MSR'25 DataTool). April 2025. Ottawa, Canada.
- exLong: Generating Exceptional Behavior Tests with Large Language Models.
  In International Conference on Software Engineering (ICSE'25). April 2025. Ottawa, Canada.
- InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation.
  In International Conference on Software Engineering (ICSE'25). April 2025. Ottawa, Canada.
- What Inputs Drive Effective LLM-based Unit Test Generation?.
  IEEE Software, Special Issue on AIware in the FM Era (IEEE Softw.'25 AIware). 2025.
- Detecting DTC Requirement-Implementation Inconsistencies Using LLMs: An Experience Report.
  IEEE Software (IEEE Softw.'25). 2025.
- Efficient Incremental Code Coverage Analysis for Regression Test Suites.
  In International Conference on Automated Software Engineering (ASE'24), 1882-1894. October 2024. Sacramento, USA.
- Multilingual Code Co-evolution using Large Language Models.
  In Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'23), 695-707. December 2023. San Francisco, USA.
- Machine Learning for Executable Code in Software Testing and Verification.
  PhD Thesis, The University of Texas at Austin. August 2023. Austin, USA. This dissertation won a Margarida Jacome Dissertation Award.
- Extracting Inline Tests from Unit Tests.
  In International Symposium on Software Testing and Analysis (ISSTA'23), 1458-1470. July 2023. Seattle, USA.
- More Precise Regression Test Selection via Reasoning about Semantics-Modifying Changes.
  In International Symposium on Software Testing and Analysis (ISSTA'23), 664-676. July 2023. Seattle, USA. This paper won an ACM Distinguished Paper Award.
- Learning Deep Semantics for Test Completion.
  In International Conference on Software Engineering (ICSE'23), 2111-2123. May 2023. Melbourne, Australia.
- pytest-inline: An Inline Testing Tool for Python.
  In International Conference on Software Engineering, Tool Demonstrations Track (ICSE'23 Demo), 161-164. May 2023. Melbourne, Australia.
- Inline Tests.
  In International Conference on Automated Software Engineering (ASE'22), 57:1-13. October 2022. Oakland Center, USA.
- CoditT5: Pretraining for Source Code and Natural Language Editing.
  In International Conference on Automated Software Engineering (ASE'22), 22:1-12. October 2022. Oakland Center, USA.
- Impact of Evaluation Methodologies on Code Summarization.
  In Annual Meeting of the Association for Computational Linguistics (ACL'22), 4936-4960. May 2022. Dublin, Ireland.
- Roosterize: Suggesting Lemma Names for Coq Verification Projects using Deep Learning.
  In International Conference on Software Engineering, Tool Demonstrations Track (ICSE'21 Demo), 21-24. May 2021. Madrid, Spain.
- Leveraging Class Hierarchy for Code Comprehension.
  In Workshop on Computer Assisted Programming (CAP'20). December 2020. Vancouver, Canada.
- Unifying Execution of Imperative Generators and Declarative Specifications.
  In Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA'20), 217:1-217:26. November 2020. Chicago, USA.
- On the Naturalness of Hardware Descriptions.
  In Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'20), 530-542. November 2020. Sacramento, USA.
- Debugging the Performance of Maven's Test Isolation: Experience Report.
  In International Symposium on Software Testing and Analysis (ISSTA'20), 249-259. July 2020. Los Angeles, USA.
- Learning to Update Natural Language Comments Based on Code Changes.
  In Annual Meeting of the Association for Computational Linguistics (ACL'20), 1853-1868. July 2020. Seattle, USA.
- Deep Generation of Coq Lemma Names using Elaborated Terms.
  In International Joint Conference on Automated Reasoning (IJCAR'20), 97-118. June 2020. Paris, France.
- Learning to Format Coq Code using Language Models.
  In The Coq Workshop (Coq'20). June 2020. Paris, France.
- Design, Implementation, and Application of GPU-based Java Bytecode Interpreters.
  In Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA'19), 177:1-177:28. October 2019. Athens, Greece.
- A Framework for Writing Trigger-Action Todo Comments in Executable Format.
  In Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'19), 385-396. August 2019. Tallinn, Estonia. This paper won an ACM SIGSOFT Distinguished Paper Award.
- Natural Language Processing and Program Analysis for Supporting Todo Comments as Software Evolves.
  In Workshop on Natural Language Processing for Software Engineering (NL4SE'18), 775-778. February 2018. New Orleans, USA.
Service
2027: PC member of ICSE, FSE.
2026: PC member of ICSE, ASE, ISSTA.
2025: PC member of ISSTA, LLM4Code. Reviewer for ICLR, ACL Rolling Review February (ACL'25).
2024: PC member of ASE, ISSTA, LLM4Code, ASE-SRC. Reviewer for ACL Rolling Review February (ACL'24), June (EMNLP'24), and December (ACL'25); Emergency Area Chair of ARR December.
2021: PC member of AAAI, NLP4Prog, AIST.
Journal reviewing: TOSEM, TSE, EMSE, TOPLAS, TACL, EAAI, IST, JSS.
- 2022–2023: Co-organizer of Joint UT-Cornell Software Engineering Seminar.
- 2018–2022: Co-organizer of NLP+Programming Reading Group at UT Austin.
- 2022: Committee of Graduate and Industry Networking (GAIN) at UT Austin.
- 2022: Mentor of ECE Partner Program at UT Austin.