There was a post on the front page just a few hours ago asking people not to use AI for writing [0]. I copied the content and pasted it into multiple "AI detection" tools. The scores ranged from 0% up to 80%. This is not gonna cut it. As someone who has used LLMs to "improve" my writing, I can say that after a while, no matter the prompt, you will find the exact same patterns. "Here's the kicker" or "here is the most disturbing part": those expressions and many more come up no matter how you engineer the prompt. But here's the kicker, real people also use these expressions, just at a lower rate.
Detection is not what is going to solve the problem. We need to go back and reevaluate why we are asking students to write in the first place. And how we can still achieve the goal of teaching even when these modern tools are one click away.
[0]: https://news.ycombinator.com/item?id=45722069
My two cents about this after working with some teachers: this is a cat and mouse game and you're wasting your time trying to catch students writing essays on their own time.
It is better to pivot and not care about the actual content of the essay, but instead seek alternate strategies to encourage learning - such as an oral presentation or a quiz on the knowledge. In the laziest case, only accept hand-written output - because even if it was generated, at least they retained some knowledge by copying it.
Do teachers prefer grading papers or something? This always seemed like the obvious answer and there is no shortage of complaints. There is something making papers "sticky" that I do not understand. Education needs to be agile enough to change its assessment methods. It's getting to the point where we can't just blame LLMs anymore. Figure out how to assess learning outcomes instead of just insisting on methods that you assumed should work.
Oral exams and quizzes are hard for reasons unrelated to understanding the subject matter. Language barriers, public speaking anxiety, exam stress, etc. All things that students should hopefully learn how to overcome, but that's a lot to ask a teacher to deal with in addition to teaching history or whatever. With a paper, a student can choose their own working environment, choose a day and time when they are best able to focus, have a constructive discussion with the teacher if they're having trouble midway through the work, and spread their effort (if they want to) across more than an hour-long test or 5-minute oral exam. In an imaginary world where they couldn't cheat, a paper gives the teacher the best chance of evaluating whether a student understands the material.
I don't think you're wrong necessarily, but there are good reasons that teachers like papers other than "we've always used them".
Because, assuming it's done properly w/o cheating, it's a great learning tool. It's sometimes easy to forget that certain tasks are the way they are because they're supposed to teach. We don't structure teaching and learning around what the least painful thing is.
How wide is the gap between “least painful thing” and “most effective thing”?
I think the most realistic way is to do a flipped classroom, where, from middle school onward, children are expected to be independent learners. Class time should be spent on application of skills and evaluation.
If computer usage hampers a child's socialization with the group he's learning with, maybe the simplest and most meaningful solution would be to prevent children enrolled in language comprehension classes from having access to computers at home, particularly at the core language and reasoning stages of development.
I suspect AI text detection has actually become easier, as chatbots today have been heavily finetuned towards a more distinctive style.
For example “delve” and the em-dash are both a result of the finetuning dataset, not the base LLM.
You are forgetting the human mind accounting for this and adding "write this like a kinda dumb high school student". I just did a little test between a Copilot essay and the same prompt with "write this like a kinda dumb high school student" and it reads like an essay I would have written.
In the brave world of the future you too will be able to get a C- with very little effort!
That's where the humanizers come in. These are solutions that take LLM-generated text and make it sound human-written to avoid detection.
The principle of training them is quite simple. Take an LLM and reward it for revising text so that it doesn't get detected. Reinforcement learning takes care of the rest for you.
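To make the reward idea concrete, here is a toy sketch. It is not how any real humanizer is built: the detector, the word list, and the rewrite actions are all hypothetical, and a simple epsilon-greedy bandit stands in for the reinforcement learning you would run on an actual LLM. The reward is just how much a rewrite lowers the detector's score.

```python
import random

# Hypothetical "tell" words; a real detector is far more complex.
TELLS = {"delve", "kicker", "tapestry"}

def detector_score(text: str) -> float:
    """Toy detector: fraction of words that are stock tell words."""
    words = [w.lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in TELLS for w in words) / len(words)

# Hypothetical rewrite actions (old word, new word) the humanizer can try.
ACTIONS = [
    ("delve", "dig"),
    ("kicker", "catch"),
    ("the", "a"),       # irrelevant to the toy detector
    ("very", "quite"),  # irrelevant to the toy detector
]

def apply_action(text: str, action: tuple) -> str:
    """Replace every occurrence of the action's old word with its new word."""
    old, new = action
    return " ".join(new if w.lower() == old else w for w in text.split())

def train(text: str, steps: int = 200, epsilon: float = 0.2, seed: int = 0) -> list:
    """Epsilon-greedy bandit: learn which rewrites lower the detector score."""
    rng = random.Random(seed)
    values = [0.0] * len(ACTIONS)
    counts = [0] * len(ACTIONS)
    base = detector_score(text)
    for _ in range(steps):
        if rng.random() < epsilon:
            i = rng.randrange(len(ACTIONS))  # explore a random action
        else:
            i = max(range(len(ACTIONS)), key=values.__getitem__)  # exploit
        # Reward: how much this rewrite reduces the detection score.
        reward = base - detector_score(apply_action(text, ACTIONS[i]))
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]  # running mean
    return values

def humanize(text: str, values: list) -> str:
    """Apply every action whose learned value is positive."""
    for action, value in zip(ACTIONS, values):
        if value > 0:
            text = apply_action(text, action)
    return text

sample = "Let us delve into the very heart of it and here is the kicker for you"
learned = train(sample)
rewritten = humanize(sample, learned)
print(detector_score(sample), detector_score(rewritten))
```

The rewrites that touch tell words pick up a positive value and the irrelevant ones stay at zero, which is the whole trick in miniature: the humanizer only needs access to the detector's score, not its internals.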
While it’s interesting work, so far my experience is that AI isn’t good enough (or most people aren’t good enough with AI) for detection to really be a concern, at least in “research” or any writing over a few sentences.
If you think about the 2x2 of “Good” vs “By AI”, you only really care about the case when something is good work that an AI did, and then only when catching cheaters, as opposed to deriving some utility.
If it’s bad, who cares if it’s AI or not? Most AI output is pretty obvious thoughtless slop, and most people who use it aren’t paying attention to masking that. So I guess what I’m saying is that for most cases one could just set a quality bar and see if the work passes.
I think maybe a difference AI brings is that in many cases people don’t really know how to understand or judge the quality of what they are reading, or are too lazy to, so have substituted as proxies for quality the same structural cues that AI now uses. So if you’re used to saying “it’s well formatted, lots of bulleted lists, no spelling mistakes, good use of adjectives, must be good”, now you have to actually read it and think about it to know.
I personally would value a spam filter that filters out AI-generated content.
Wow. Never heard of Pangram until now. Quote:
Pangram maintains near-perfect accuracy across long and medium length texts. It achieves very low error rates even on shorter passages and ‘stubs.’
I'm extremely skeptical of these claims. Especially when we're dealing with careful prompting to adjust tone/style.
Even if it were near perfect, that would still not be enough, given the negative impact of false positive detections on students.
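A quick back-of-the-envelope calculation shows why. The numbers below are made up for illustration (they are not Pangram's or anyone else's published rates); the point is only that a small false positive rate still flags real students once you multiply by class size.

```python
def flag_breakdown(essays: int, ai_share: float, tpr: float, fpr: float) -> dict:
    """Expected outcomes when a detector is run on a batch of essays.

    essays   -- total number of essays checked
    ai_share -- fraction of essays actually written by AI (assumed)
    tpr      -- true positive rate: AI essays correctly flagged (assumed)
    fpr      -- false positive rate: honest essays wrongly flagged (assumed)
    """
    ai_essays = essays * ai_share
    honest_essays = essays - ai_essays
    true_flags = ai_essays * tpr
    false_flags = honest_essays * fpr
    total_flags = true_flags + false_flags
    return {
        "false_flags": false_flags,
        "true_flags": true_flags,
        # Of all flagged essays, what share were really AI-written?
        "precision": true_flags / total_flags if total_flags else 0.0,
    }

# Illustrative assumptions: 1000 essays, 10% AI-written, 99% TPR, 1% FPR.
result = flag_breakdown(essays=1000, ai_share=0.10, tpr=0.99, fpr=0.01)
print(result)
```

With those assumed numbers, about 9 honest students out of 900 get flagged, and roughly 1 in 12 accusations lands on someone who did nothing wrong. If fewer students actually use AI, the share of wrongful accusations gets worse, not better.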
Mmmm yes, I probably will never be able to find it again but someone recently tested a lot of these out and found you could bypass them easily by changing a few words around.