Advanced Topics in Software Engineering: Machine Learning for Software Engineering - CS846 Sec 001, Fall 2024 (Term 1249) #

About #

CS846-ML4SE is a graduate-level seminar course on the application of machine learning and natural language processing to software engineering. We will mainly cover three topics: (1) language modeling for code, (2) mining software repositories (MSR) with program analysis, and (3) automating software engineering tasks with the aforementioned techniques.

Class will be held in person on Mondays from 9:00 to 11:50am in DC 2585. Attendance at all classes is mandatory.

The course will consist of a mix of paper discussions and lectures. During the first half of each class, we will discuss 1-2 papers from the reading list; each paper discussion will be led by two students. During the second half, I will give a lecture, usually in the form of coding demos, on the topic of the week.

The course also includes a project. Students should work in teams of 2-4 (larger teams will be held to higher expectations). Each team is expected to conduct a research project in the area of ML4SE and complete a short-paper-level report by the end of the term. For examples of what research projects at this level should look like, please refer to SE conferences’ tool/data tracks (e.g., in MSR'24), mining challenges (e.g., in MSR'24), student research competitions (e.g., in ICSE'24), or new idea tracks (e.g., in ICSE'24).

Contact #

Instructor: Pengyu Nie - pynie@uwaterloo.ca (office hours by appointment)

We will use the Teams chat group for course discussions and announcements. Project submissions will be made by email.

Course Schedule #

The syllabus is tentative and subject to change. The reading list will be posted soon.

| Date | Topic | Reading |
|---|---|---|
| Sep 09 | Introduction to class; overview of ML4SE research | None |
| Sep 16 | Language modeling and n-gram models | Capturing Structural Locality in Non-parametric Language Models |
| Sep 23 | Sequence-to-sequence models and transformers | |
| Sep 30 | Large language models for code | |
| Oct 07 | Software datasets (GitHub, StackOverflow, etc.) | |
| Oct 14 | Reading week - no class | None |
| Oct 21 | Build systems essentials and parsing | |
| Oct 28 | Static analysis | |
| Nov 04 | Dynamic analysis | |
| Nov 11 | Task: program comprehension | |
| Nov 18 | Task: code completion and generation | |
| Nov 25 | Task: code translation | |
| Dec 02 | Task: bug detection and fixing | |

Assessment #

All deliverables are due at 11:59pm Eastern Time on the respective day. Late submissions will be graded only on a case-by-case basis.

| Task | Due Date | Weight |
|---|---|---|
| Attendance | - | 20% |
| Paper discussion lead | - | 20% |
| Project: team formation | Sep 25 (Wed) | - |
| Project: proposal | Oct 11 (Fri) | 10% |
| Project: progress report | Nov 01 (Fri) | 20% |
| Project: final report | Dec 05 (Thu) | 30% |

Reading List #

Paper discussion signup sheet

Sep 16: language modeling and n-gram models #

Sep 23: sequence-to-sequence models and transformers #

Sep 30: large language models for code #

Acknowledgements #

Administrative Notes #

Generative AI #

Generative artificial intelligence (GenAI) trained using large language models (LLMs) or other methods to produce text, images, music, or code, like ChatGPT, DALL-E, or GitHub Copilot, may be used for assignments in this class with proper documentation, citation, and acknowledgement. Recommendations for how to cite GenAI in student work at the University of Waterloo may be found through the Library. Please be aware that generative AI is known to falsify references to other work and may fabricate facts and inaccurately express ideas. GenAI generates content based on the input of other human authors and may therefore contain inaccuracies or reflect biases.
In addition, you should be aware that the legal/copyright status of generative AI inputs and outputs is unclear. Exercise caution when using large portions of content from AI sources, especially images. More information is available from the Copyright Advisory Committee.
You are accountable for the content and accuracy of all work you submit in this class, including any supported by generative AI.

Territorial Acknowledgement #

The University of Waterloo acknowledges that much of our work takes place on the traditional territory of the Neutral, Anishinaabeg and Haudenosaunee peoples. Our main campus is situated on the Haldimand Tract, the land granted to the Six Nations that includes six miles on each side of the Grand River. Our active work toward reconciliation takes place across our campuses through research, learning, teaching, and community building, and is centralized within the Office of Indigenous Relations.

Inclusive Teaching-Learning Spaces #

The University of Waterloo values the diverse and intersectional identities of its students, faculty, and staff. The University regards equity and diversity as an integral part of academic excellence and is committed to accessibility for all. We consider our classrooms, online learning, and community spaces to be places where we all will be treated with respect, dignity, and consideration. We welcome individuals of all ages, backgrounds, beliefs, ethnicities, genders, gender identities, gender expressions, national origins, religious affiliations, sexual orientations, ability, and other visible and nonvisible differences. We are all expected to contribute to a respectful, welcoming, and inclusive teaching-learning environment. Any member of the campus community who has experienced discrimination at the University is encouraged to seek guidance from the Office of Equity, Diversity, Inclusion & Anti-racism (EDI-R) via email at equity@uwaterloo.ca. The Sexual Violence Prevention & Response Office (SVPRO) supports students at UWaterloo who have experienced, or have been impacted by, sexual violence and gender-based violence. This includes those who experienced harm and those who are supporting others who experienced harm. SVPRO can be contacted at svpro@uwaterloo.ca.

Religious & Spiritual Observances #

The University of Waterloo has a duty to accommodate religious and spiritual observances under the Ontario Human Rights Code. Please inform the instructor at the beginning of term if special accommodation needs to be made for religious observances that are not otherwise accounted for in the scheduling of classes and assignments. Consult with your instructor(s) within two weeks of the announcement of the due date for which accommodation is being sought.

Respectful Communication and Pronouns #

Communications with Instructor(s) and TAs should be through recommended channels for the course (e.g., email, LEARN, Piazza, Teams, etc.). Please use your UW email address. Include an academic signature with your full name, program, and student ID. We encourage you to include your pronouns to facilitate respectful communication (e.g., he/him; she/her; they/them). You can update your chosen/preferred name at WatIAM. You can update your pronouns in Quest.

Mental Health and Wellbeing Resources #

If you are facing challenges impacting one or more courses, contact your academic advisor, Associate Chair Undergraduate, or the Director of your academic program. Mental health is a serious issue for everyone and can affect your ability to do your best work. We encourage you to seek out mental health and wellbeing support when needed. The Faculty of Engineering Wellness Program has programming and resources for undergraduate students. For counselling (individual or group) reach out to Campus Wellness and Counselling Services. Counselling Services is an inclusive, non-judgmental, and confidential space for anyone to seek support. They offer confidential counselling for a variety of areas including anxiety, stress management, depression, grief, substance use, sexuality, relationship issues, and much more.

Intellectual Property #

Be aware that this course contains the intellectual property of the instructor, TA, and/or the University of Waterloo. Intellectual property includes items such as:

  • Lecture content, spoken and written (and any audio/video recording thereof).
  • Lecture handouts, presentations, and other materials prepared for the course (e.g., PowerPoint slides).
  • Questions or solution sets from various types of assessments (e.g., assignments, quizzes, tests, final exams); and
  • Work protected by copyright (e.g., any work authored by the instructor or TA or used by the instructor or TA with permission of the copyright owner).

Course materials and the intellectual property contained therein are used to enhance a student’s educational experience. However, sharing this intellectual property without the intellectual property owner’s permission is a violation of intellectual property rights. For this reason, it is necessary to ask the instructor, TA and/or the University of Waterloo for permission before uploading and sharing the intellectual property of others online (e.g., to an online repository).
Permission from an instructor, TA or the University is also necessary before sharing the intellectual property of others from completed courses with students taking the same/similar courses in subsequent terms/years. In many cases, instructors might be happy to allow distribution of certain materials. However, doing so without express permission is considered a violation of intellectual property rights and academic integrity.
Please alert the instructor if you become aware of intellectual property belonging to others (past or present) circulating, either through the student body or online.

Continuity Plan - Fair Contingencies for Unforeseen Circumstances (e.g., resurgence of Covid) #

In the event of emergencies or highly unusual circumstances, the instructor will collaborate with the Department/Faculty to find reasonable and fair solutions that respect rights and workloads of students, staff, and faculty. This may include modifying content delivery, course topics and/or assessments and/or weight and/or deadlines with due and fair notice to students. Substantial changes after the first week of classes require the approval of the Associate Dean, Undergraduate Studies.

Declaring absences (undergraduate students and/or courses only) #

Regardless of the process used to declare an absence, students are responsible for reaching out to their instructors as soon as possible. The course instructor will determine how missed course components are accommodated. Self-declared absences (for COVID-19 and short-term absences up to 2 days) must be submitted through Quest. Documentation for absences that require it (e.g., Verification of Illness Form, bereavement, etc.) is to be uploaded by completing the form on the VIF System. The UW Verification of Illness form, completed by a health professional, is the only acceptable documentation for an absence due to illness. Do not send documentation to your advisor, course instructor, teaching assistant, or lab coordinator. Submission through the VIF System, once approved, will notify your instructors of your absence.

Rescheduling Co-op Interviews #

Follow the co-op process for rescheduling co-op interviews that conflict with graded assessments (e.g., midterms, tests, and final exams). Attendance at co-operative work-term employment interviews is not considered to be a valid reason to miss a test.

Policies #

Academic integrity #

In order to maintain a culture of academic integrity, members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. [Check the Office of Academic Integrity for more information.]

Grievance #

A student who believes that a decision affecting some aspect of their university life has been unfair or unreasonable may have grounds for initiating a grievance. Read Policy 70, Student Petitions and Grievances, Section 4. When in doubt, please be certain to contact the department’s administrative assistant who will provide further assistance.

Discipline #

A student is expected to know what constitutes academic integrity to avoid committing an academic offence, and to take responsibility for their actions. [Check the Office of Academic Integrity for more information.] A student who is unsure whether an action constitutes an offence, or who needs help in learning how to avoid offences (e.g., plagiarism, cheating) or about “rules” for group work/collaboration should seek guidance from the course instructor, academic advisor, or the undergraduate associate dean. For information on categories of offences and types of penalties, students should refer to Policy 71, Student Discipline. For typical penalties, check Guidelines for the Assessment of Penalties.

Appeals #

A decision made or penalty imposed under Policy 70, Student Petitions and Grievances (other than a petition) or Policy 71, Student Discipline may be appealed if there is a ground. A student who believes they have a ground for an appeal should refer to Policy 72, Student Appeals.

Note for students with disabilities #

AccessAbility Services, located in Needles Hall, Room 1401, collaborates with all academic departments to arrange appropriate accommodations for students with disabilities without compromising the academic integrity of the curriculum. If you require academic accommodations to lessen the impact of your disability, please register with AccessAbility Services at the beginning of each academic term.

Turnitin.com: #

Text matching software (Turnitin®) may be used to screen assignments in this course. Turnitin® is used to verify that all materials and sources in assignments are documented. Students’ submissions are stored on a U.S. server; therefore, students must be given an alternative (e.g., scaffolded assignment or annotated bibliography) if they are concerned about their privacy and/or security. Students will be given due notice, in the first week of the term and/or at the time assignment details are provided, about arrangements and alternatives for the use of Turnitin in this course.
It is the responsibility of the student to notify the instructor, in the first week of term or at the time assignment details are provided, if they wish to submit an alternate assignment.