Maths and Assessment in the Age of AI
The commoditization of AI—and its rapidly improving ability to solve, explain, and generate mathematical problems—is increasingly getting in the way of our teaching and, even more significantly, disrupting our usual assessment methods.
For instance, many courses have seen a surge in near-perfect, overly elaborate solutions whose quality is rarely reflected in final exams. Discussion and active participation seem to be declining, replaced by a reliance on LLMs as convenient—albeit impersonal—tutor replacements. This is concerning, as research and experience consistently show that collaboration and productive struggle are key ingredients of learning.
While this is surely an overgeneralization, these issues have become so apparent that they are forcing us to question how we teach and assess. After numerous discussions and requests from staff, our Programme Committee decided to sit down with students and colleagues looking for some clarity. Clearly, we could not solve this in one meeting, but the open dialogue was invaluable; it allowed us to generate new ideas and begin understanding our collective goals and concerns. This blog post summarizes my impressions from that meeting.
# Rethinking Ourselves
To engage in this discussion, we must clarify two fundamental questions: What is our primary objective, and what should students learn from our degree program? These questions must be addressed before we can even consider how to promote and measure that learning.
While our Teaching and Exam regulations provide an institutional answer, our personal ones may not be as obvious as they seem. From my perspective, the goal of a mathematics degree isn’t just to teach mathematics. Rather, our primary aim is to develop analytical and critical thinking, tolerance for frustration, and collaboration skills. In this sense, mathematical knowledge is the means, not the end.
This framing may raise eyebrows; after all, not everyone shares this point of view. That is precisely why these open discussions are so valuable: they help us find our common ground on what we are actually trying to achieve. If we aren’t asking the same questions, how can we expect to agree on the answers?
In the meantime, the challenge remains: AI has found its way into our classrooms, so how should we respond?
The debate over whether to allow, restrict, or ban AI in education is far from settled. Current research suggests that the technology may have a detrimental effect on learning, but these findings are preliminary. We need more time to understand the impacts and develop the necessary pedagogical adjustments.
This raises a question: if top mathematicians are using these tools, why shouldn't our students?
To me, the answer is not as simple as it may seem. Two concerns stand out. First, accessibility. Tools like Google Aletheia, Google DeepThink, and ChatGPT Pro (which some of our students are already using!) are either restricted or prohibitively expensive, hidden behind 200€/month plans. This is anything but democratizing mathematics education.
Second, the role of expertise. Established mathematicians can critically assess the validity of outputs, spot hallucinations, and steer the conversation in useful directions. This is rarely possible when one is still learning the subject. If we allow students to take these shortcuts too soon, we short-circuit the very skills we are meant to cultivate: original thought, perseverance, and tolerance for frustration—which, to me, form the foundation for working through difficult problems.
# New Thoughts, Old Problems
To gain this expertise, then, how much time should students spend wrestling with exercises? This is a question I had not fully considered. I am sure some students engage with the material for days: flipping through notes, debating with peers, and testing ideas until they crack the problem. They approach it much like many of us did before answers were a Google search away.
But many will not. With AI, they can obtain a hint (or, more often, a verbose, half-correct solution) in seconds. It is convenient and non-judgmental, but it completely bypasses the struggle and the time required for ideas to properly sink in.
Someone suggested setting explicit expectations: tell students to spend, say, 4–8 hours per exercise. I am wary of prescribing a hard number, as everyone works at a different pace, but the principle is quite interesting. Learning takes time; you need to sleep on a problem and allow your brain to digest it subconsciously. By telling students when to pause, when to seek help, and how long to persist before giving up, we can set better boundaries. Currently, the time between getting stuck and turning to an LLM is often a matter of minutes, far too short for proper learning.
Interestingly, data from conversations and a recent internal survey suggest that newer cohorts are becoming more aware of LLM limitations. Having been exposed to these tools early, many have seen firsthand how they can mislead or go off the rails. Students recognize that relying on AI can lead to a shallower understanding, yet the pressure to optimize for grades remains a strong incentive to cut corners.
But is any of this new? The concerns regarding participation, grade fixation, and a shrinking tolerance for frustration are long-standing issues. However, having a free, tireless, judgment-free personal tutor exponentially accelerates these problems. The explosion of LLMs has simply made the challenge of motivating our students more visible.
Perhaps that is a good thing, an opportunity for even the most skeptical among us to see the problem clearly and begin seeking structural solutions. Some have responded by removing or ungrading exercises, but is that the answer? If other courses maintain graded homework, students will simply shift their focus, and engagement will inevitably drop. We see this happening already in our courses.
# Ideas in Progress
During our meeting, several interesting ideas were proposed. I share them here as a starting point for others facing similar challenges:
- Replace graded homework with peer-review sessions or optional TA feedback. This requires ensuring students engage meaningfully with the process. If implemented well, peer review can be a lively and insightful activity; providing extra credit for participation can offer extrinsic motivation.
- Encourage live presentations. Giving students points for presenting their solutions at the board during tutorials has proven successful in courses that have trialed it.
- Implement post-tutorial quizzes. Using an exercise from a previous tutorial for a graded quiz in class has proven quite effective and well received in the class that implemented it.
- Self-grading. To incentivize participation, allow students to self-evaluate. If their self-evaluations consistently align with their final exam grades, they earn a bonus point. As far as I know, this has not yet been attempted in our courses.
The common thread here is the use of low-stakes, high-value activities that prioritize engagement over grade-chasing. For further inspiration, this post on the Grading for Growth blog offers other interesting ideas.
Many students also praised the WhatsApp groups they share with their TAs for each tutorial group. While we must be careful not to overwork our teaching assistants, experience shows that students often resolve questions among themselves before the TA even needs to step in.
# What’s the Aim, Again?
Ultimately, this discussion brought us back to a crucial question: Who are we teaching for?
Are we trying only to push our top students to excel? Do we want to keep the majority engaged without losing those who need more support? Do we accept that some will always tune out? While we share a general understanding, we are not yet necessarily on the same page. That is precisely why these conversations are worth the time. They may not provide immediate answers, but they can lay the groundwork for a much-needed new way forward.
In the end, a tension lingers between maximizing our pedagogical impact and optimizing for the practical needs of our students. Both staff and students are under pressure to save time, and often, the best solutions are also the most time-consuming. Perhaps that is where our next dialogue should begin.
P.S. I recently read another Grading for Growth piece on how AI can save time when preparing courses. While I am not sure I agree with all the examples, it was worth the read.

