Duncan Smeed, an educator, has enough to worry about with traditional plagiarism and notes that, “of course, the situation is further complicated by the ready availability of vast resources over the Internet.” He points to a group that is evaluating an electronic system from iParadigms, which claims it can detect papers copied off the Internet.
Several comments. First, I don’t know about the situation in the UK, but in the United States cheating is endemic. A professor I once had held up a newspaper story reporting a poll in which 60 percent of American college students said they had cheated. “What about the other 40 percent?” he asked rhetorically, and then quickly answered his own question: “They’re liars,” which elicited knowing laughter from the class.
Although I’m not too proud of it now, I have to admit that when I was in college I helped numerous people who had more money than brains by “helping” them write term papers in ways that I’m sure their professors would have thought crossed the line into outright writing the papers for them. iParadigms claims that, “Students themselves report that unchecked cheating and plagiarism by others undermines their own efforts and educational enthusiasm,” but most of the good students I knew were more cynical about the overall lack of academic rigor and didn’t feel that much guilt in helping people bend the rules in their classes. (To put it another way, what does it matter if I help someone get a B when they’re going to get a C just by showing up and breathing?)
Anyway, leaving my bit of true confessions behind, the problem with the iParadigms approach is that as the number of published works on the Internet keeps expanding, the sort of brute force comparison iParadigms is doing is going to become less and less useful.
Consider a college freshman writing his first paper on Shakespeare’s Macbeth. How many tens of thousands of articles and papers about Macbeth are going to be available on the Internet? Sure, if somebody is dumb enough to just cut and paste wholesale, you’re going to catch them. But most of the people I knew who were chronic cheaters were far more sophisticated than that, and were perfectly capable of taking a paper written by someone else and modifying and rearranging it to more closely mimic how they would have written the paper if they could be bothered.
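To make the cut-and-paste versus rework distinction concrete, here is a minimal sketch of one common way to compare texts: break each into overlapping three-word runs ("shingles") and measure how many they share. This is only an illustration, not a description of how iParadigms actually works, which I don't know; the function names and the sample sentences are made up for the example.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word runs in a text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a, b, k=3):
    """Jaccard similarity of the two shingle sets: shared runs / all runs."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Made-up sample sentences, purely for illustration.
source = "macbeth is driven to murder by ambition and by the prophecy of the witches"
copied = source  # wholesale cut and paste
reworked = "ambition and the witches' prophecy push macbeth toward killing the king"

print("copied:  ", overlap(source, copied))
print("reworked:", overlap(source, reworked))
```

The verbatim copy shares every three-word run with its source, while the reworked sentence, which says much the same thing, shares almost none. A cheater willing to rearrange and rephrase is playing exactly this game.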
Using something like statistical sampling to test for originality is a great idea when you’ve got a relatively small number of documents to deal with, but when you start comparing a very small body of work, such as a single paper, with a huge document base of 1.4 billion documents and rising on the Internet (using Google as a measure for the moment), the risk of a false positive will likely be unacceptably high. Imagine what it will be in 5 or 10 years, when we could very well be measuring the number of discrete documents indexed by search engines in the tens of billions.
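A back-of-the-envelope sketch shows why the size of the document base matters so much. Suppose any single unrelated document has some tiny chance p of spuriously matching an honest paper; then the chance of at least one spurious match among N documents is 1 - (1 - p)^N. The one-in-a-billion value of p below is purely an assumption picked for illustration, not a measured figure for any real system, and the calculation treats documents as independent, which real web pages are not.

```python
import math


def chance_of_spurious_match(per_doc_rate, corpus_size):
    """Probability of at least one false match: 1 - (1 - p)^N.

    Computed with log1p/expm1 so very small p and very large N stay accurate.
    """
    return -math.expm1(corpus_size * math.log1p(-per_doc_rate))


# ASSUMPTION: a one-in-a-billion chance per document, chosen only for illustration.
ASSUMED_RATE = 1e-9

for corpus_size in (1_000_000, 1_400_000_000, 20_000_000_000):
    risk = chance_of_spurious_match(ASSUMED_RATE, corpus_size)
    print(f"{corpus_size:>14,} documents: {risk:.1%} chance of a false positive")
```

Under that assumed rate, a million-document collection almost never produces a false hit, but at 1.4 billion documents an honest paper is more likely than not to be flagged, and at tens of billions it is all but certain. The exact numbers depend entirely on the assumed p; the shape of the curve is the point.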