To be perfectly honest: How AI and technology could impact test cheating
For as long as there has been testing, there has been cheating.
This stretches as far back as the 6th century (CE) during the time of the Sui Dynasty in China when testing was employed for entry into the Chinese Imperial Civil Service.
Candidates for the Civil Service exams were forbidden from bringing books into exams with them—or any written material for that matter.
According to the book China’s Examination Hell by Ichisada Miyazaki, this was sometimes enforced by soldiers who would search candidates thoroughly before the exams in the hope of obtaining a reward of three pieces of silver if they were successful in finding contraband. Soldiers often went so far as to cut open dumplings brought in as food to examine their bean-jam fillings.
Although cutting open people’s food is thankfully uncommon in testing today (!), the risks and issues are not dissimilar. The means of coping with test cheating, however, are vastly different, and in this piece I’ll share some thoughts—influenced by what I learned at the EATP (Association of Test Publishers) Security Summit and Conference in London in October—on how AI and other technology both helps and hinders cheating in online tests and exams.
Why does test cheating matter?
Even if they no longer help select Imperial civil servants, assessments are still essential in helping people qualify for and attain educational and work-life opportunities. Tests and exams remain the most effective way to measure skills and grant credentials, while quizzes and tests are also used in educational homework and for corporate competence checking.
Though cheating is seemingly endemic in testing, the vast majority of test-takers are honest. But it’s not so much the scale of cheating as its knock-on severity that causes serious concern: every time someone passes a test by cheating, that test is devalued for every other test-taker.
Where quizzes or tests are used to measure learning or to supplement homework, cheating makes it harder to measure the effectiveness of learning and reduces how much is actually learned. Worse still, where exams are used to grant a qualification, that qualification is devalued if not everyone earned it fairly. Indeed, society itself is potentially put at risk when you consider qualifications for roles involving health and safety. The bottom line is that when tests and exams are used to select for educational or work-life chances, anyone who gains an opportunity unfairly deprives someone else who might deserve it more.
With such high stakes, preventing cheating (or “test fraud” as it’s sometimes called) is a serious business.
If there are any doubts about this, consider the fact that some countries are prepared to take radical measures to counteract cheating such as turning off the Internet during important educational exams. Every serious testing organization puts significant time, effort, and resources into making it hard to cheat in their exams.
The test cheating arms race
As with other types of fraud, there is a kind of arms race between those who set tests and those who try to cheat in them.
For example, a decade or two ago there was a lot of concern about plagiarism in university essay exams. Students copied text from published materials into their essays, and it was hard to spot. Automated plagiarism checkers were then developed that could detect copied text. But because universities still expected students to write essays, and some students struggled to do so on their own, contract cheating services emerged in response to the plagiarism checkers—companies (often run by criminals) offering to write an original essay on a given subject for money.
Contract cheating still exists but the latest trend in this area is AI-driven text generation or AI-assisted writing.
There are a slew of tools that can generate a plausible piece of written content from some prompt text. Finding them is easy. Just pop “AI text generator” into your search engine of choice to get an idea of the growing array of options out there. It is difficult to say how these technologies will impact test taking over the long term, but it’s easy to see how they might be perceived by under-pressure learners, such as this student quoted in the Cheat Sheet (an excellent newsletter on test cheating):
“When people are children, they imagine that a machine can do their homework. And I just happened to stumble upon that machine.”
Meanwhile in the world of IT and professional certification, there is a content theft challenge.
Certification organizations spend significant effort writing psychometrically sound questions to check competence. But these questions get copied, either using screen capture programs or hidden cameras. Here, too, the arms race evolves in real time: lockdown browsers prevent screen capture and proctors seek to detect hidden cameras, but those unscrupulous enough tend to find ways to evade both.
Test content being exposed leads to three issues:
1. Candidates find or buy it and learn the answers to questions, not the full subject.
2. Criminal gangs of proxy testers harvest all the questions in the exam, then approach candidates on LinkedIn or other social media and offer to take the exam remotely for them. Very often these proxies aren’t experts in the subject area; they just know all the stolen questions and can parrot the answers.
3. It erodes the value of IP for the test organization.
Stopping content theft
Questions being stolen and exposed in this way presents a problem in all kinds of quizzes, tests, and exams from educational tests to corporate learning and compliance exams to certification and admissions tests.
People try hard to stop content theft, but it’s hard to protect access to digital content—just look at the music industry. Scrambling question selection or choice order helps by making exams different for each test taker.
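To make the scrambling idea concrete, here is a minimal Python sketch (the question bank, field names, and candidate IDs are invented for illustration): the question subset, question order, and choice order are all derived deterministically from a candidate ID, so each test taker sees a different form but the same candidate always gets the same one.

```python
import random

# Hypothetical question bank for illustration only.
BANK = [
    {"prompt": "2 + 2 = ?", "options": ["4", "3", "5", "22"]},
    {"prompt": "Capital of France?", "options": ["Paris", "Lyon", "Nice", "Lille"]},
    {"prompt": "H2O is commonly called?", "options": ["water", "salt", "air", "oil"]},
]

def build_form(candidate_id: str, num_questions: int = 2):
    """Build a per-candidate exam form with scrambled question and choice order."""
    rng = random.Random(candidate_id)              # deterministic per candidate
    questions = rng.sample(BANK, k=num_questions)  # random subset, random order
    form = []
    for q in questions:
        options = q["options"][:]                  # copy so the bank is untouched
        rng.shuffle(options)                       # scramble the choice order
        form.append({"prompt": q["prompt"], "options": options})
    return form
```

Seeding the generator with the candidate ID means the form can be regenerated later for scoring or review without storing every permutation.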
One approach is to use questions that make pre-knowledge redundant, by requiring that test takers demonstrate a skill. For example, questions can ask the test taker to do something on a virtual computer lab, present answers orally, upload a file containing some work they have completed, or be observed doing a practical task. Learnosity has a rich set of question/item types that can help with this.
Another good approach is to have a huge bank of questions so that it’s very hard to learn them all. Ideally, it becomes easier to gain the skill being tested than to learn the answers to every question. In such cases, it almost doesn’t matter if questions get stolen, as there are too many for theft to have much impact. But since question creation is time-consuming, a technological approach is needed to generate the items.
This is relatively easy to do in areas like accountancy, science, or maths, where questions can have numeric or financial parameters that can easily be varied. For example, Learnosity’s Dynamic Content capability allows you to parameterize questions and fill a spreadsheet with tens or hundreds of variants of a question very easily. Such an approach can also work in other topics with a bit more work (e.g. selecting different scenarios or parameters in compliance or language questions).
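As a rough sketch of how such parameterization works (this is not Learnosity’s Dynamic Content API, just an illustrative stand-in with made-up parameter ranges), a single template plus randomized parameters can yield hundreds of distinct items, each with its own answer key:

```python
import random

# One template; the bracketed fields are the parameters to vary.
TEMPLATE = ("A loan of ${principal} accrues simple interest at {rate}% per year. "
            "How much interest is owed after {years} years?")

def make_variant(rng: random.Random):
    """Generate one variant of the parameterized question plus its answer key."""
    principal = rng.randrange(1000, 10001, 500)   # $1,000 to $10,000 in $500 steps
    rate = rng.choice([2, 3, 4, 5, 6])            # interest rate in percent
    years = rng.randint(1, 5)
    answer = principal * rate / 100 * years       # simple interest formula
    return {
        "stem": TEMPLATE.format(principal=principal, rate=rate, years=years),
        "answer": answer,
    }

rng = random.Random(42)
variants = [make_variant(rng) for _ in range(100)]  # hundreds of items, one template
```

With a few hundred variants per template, a stolen copy of any single item tells a cheat very little about the item the next candidate will see.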
There are also efforts being made to use AI to generate large numbers of questions.
Duolingo shared at the recent EATP conference that they have made good progress in using AI to generate large numbers of questions for the Duolingo English Test. The questions are still subject to human review, and the company pilots them before using them in anger, but AI does a lot of the work. Duolingo is ahead of most testing organizations here, but using AI to generate thousands of psychometrically valid questions could well be the future of battling content theft.
What of the future?
As AI techniques develop, machine learning is becoming highly effective at classifying things into categories. For example, AI aids radiographers and other medical professionals in diagnosing disease by identifying patterns or possibilities that humans cannot always spot.
It seems likely that AI could therefore be good at identifying test fraud.
Conventional statistical analysis (“data forensics”) is already used to examine test results and identify likely cheating—for example, by identifying clusters of test takers who give the same wrong answers, or test takers who answer difficult questions very quickly, thereby implying they might already know the answers.
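A toy version of such data forensics is easy to sketch in Python (the response log, answer key, hard-item set, and thresholds below are all invented for illustration): one check flags candidates who answer hard items correctly in suspiciously few seconds, and another groups candidates who share identical wrong answers.

```python
from collections import defaultdict

# Toy response log: (candidate, item_id, answer, seconds_taken).
RESPONSES = [
    ("amy", "q1", "B", 42.0), ("amy", "q2", "D", 55.0),
    ("ben", "q1", "B", 3.1),  ("ben", "q2", "D", 2.8),
    ("cat", "q1", "C", 40.0), ("cat", "q2", "A", 60.0),
    ("dan", "q1", "C", 38.0), ("dan", "q2", "A", 51.0),
]
ANSWER_KEY = {"q1": "B", "q2": "D"}
HARD_ITEMS = {"q1", "q2"}  # items most candidates find difficult

def fast_correct_on_hard(responses, threshold=5.0):
    """Flag candidates who get hard items right suspiciously quickly."""
    flagged = set()
    for cand, item, ans, secs in responses:
        if item in HARD_ITEMS and ans == ANSWER_KEY[item] and secs < threshold:
            flagged.add(cand)
    return flagged

def shared_wrong_answer_pairs(responses, min_shared=2):
    """Find pairs of candidates who share `min_shared` or more identical wrong answers."""
    wrong = defaultdict(set)  # (item, wrong answer) -> candidates who gave it
    for cand, item, ans, _ in responses:
        if ans != ANSWER_KEY[item]:
            wrong[(item, ans)].add(cand)
    pair_counts = defaultdict(int)
    for cands in wrong.values():
        for a in cands:
            for b in cands:
                if a < b:
                    pair_counts[(a, b)] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_shared}
```

In this toy data, "ben" answers both hard items correctly in under five seconds, while "cat" and "dan" share the same wrong answers on both items—exactly the patterns real data forensics teams look for, albeit with far more rigorous statistics.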
It also seems likely that AI techniques will be developed that are very accurate at detecting cheating, though they may not be able to prove it or explain themselves.
This may pose an ethical problem: people get accused of cheating by a machine, and though the accusation may be accurate most of the time, it will sometimes be wrong. Do we trust the machine and invalidate the test result? Or do we insist that computers are there to serve humankind, not to judge it?
Regulation and laws will also have a part to play, with Europe planning strong restrictions [Note: PDF link] on how AI is used to make decisions around education or recruitment. At least for now, testing organizations are going to give any possible AI decision a human review, due in part to a fear of inherent bias or discrimination in the machine.
But it may well be that, in time, AI will paradoxically arm both sides of the cheater and cheater-prevention arms race. On the one hand, test takers may use AI to help them answer questions and do better at exams; while on the other hand, testing organizations will use AI to prevent and detect cheating. This raises a further concern: when an arms race pits one artificial intelligence against another, is it possible that it’s people who get left behind?
As the old saying (or curse, as some would have it) goes, “May you live in interesting times.” Indeed.