- From: Moses Ma <moses.ma@futurelabconsulting.com>
- Date: Fri, 20 Feb 2026 10:22:38 -0800
- To: public-credentials@w3.org
- Message-ID: <27b2d352-6ec0-44e6-9110-92693f223f05@futurelabconsulting.com>
Okay, it’s been a week, and here are the results of last week’s
mini-experiment.
*25% of readers misidentified the LLM output as human-generated.*
A was human generated and B was LLM generated.
[Forms response chart for “Pick one please”: 16 responses.]
That’s a surprising but not shocking number. What will be interesting is
how this number moves over the next 12 months as new frontier models
are released. If you’ve spent any time with frontier models lately, you
already know this percentage is going to rise. Not because humans are
getting worse, but because the models are getting better. What happened
with AI video and AI coding will eventually happen to online group
collaboration.
Anyway, I’d like to repeat this experiment every month for the next year
or so with this group. Same format. Same blind vote. I suspect this
group uses cutting-edge frontier models; I’ll capture that detail in
the next iteration, and I’ll also correlate the data with the releases
of new frontier models.
Let’s track the curve and learn together!
A few clarifications:
• I wasn’t trying to trick anyone.
• My reply was written extemporaneously with no editing.
• The LLM response was posted exactly as generated on the first
pass, with zero editing. No prompt optimization. No human “polishing.”
Just raw output.
Also, now that the experiment is over, here is my suggestion for how we
should manage the situation of "AI slopification" in collaboration…
Some background on my post: last year, I led the development of a
national AI roadmap for the Ministry of Industry of Thailand. An IGO
ran my writing through an AI detector, reported that it exceeded the
compliance threshold, and requested that I "humanize" the output. I
explained how so-called “humanizers” work — essentially by injecting
less likely words, which weakens the writing (a toy sketch of the idea
follows below). I also noted the irony of a report about AI being
questioned over the use of AI in its own generation: shouldn't we be
practicing what we preach?
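
Before going on with the story, here is a minimal sketch of that
humanizer trick in Python, assuming a toy, hand-written probability
table. Every word, every score, and the function name "humanize" are
invented for illustration; real humanizers use a language model rather
than a lookup table, but the principle is the same.

# Toy sketch of the "humanizer" trick: swap predictable words for
# lower-probability alternatives so a perplexity-based detector
# scores the text as "more human."
# NOTE: all words and probabilities below are invented assumptions.
alternatives = {
    "important": [("important", 0.30), ("weighty", 0.02)],
    "clear":     [("clear", 0.25), ("pellucid", 0.01)],
    "improve":   [("improve", 0.28), ("ameliorate", 0.03)],
}

def humanize(text: str) -> str:
    """Replace each known word with its least likely alternative."""
    out = []
    for word in text.split():
        options = alternatives.get(word)
        if options:
            # The lowest-probability option is less predictable, so it
            # raises perplexity and looks "more human" to a detector,
            # but it usually reads worse.
            word = min(options, key=lambda pair: pair[1])[0]
        out.append(word)
    return " ".join(out)

print(humanize("it is important to improve clear writing"))
# -> "it is weighty to ameliorate pellucid writing"

The sketch makes the design flaw obvious: the rewrite becomes less
detectable precisely by becoming worse writing.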
And then I asked a simple question:
What is better for the customer—higher-quality writing, or
compliance with a rule you don’t fully understand because the
technology is so new? That should be the litmus test here. Not
authorship theater or purity tests. Simply: what produces the
better product? What improves the process and the end result?
I added: if an AI helps create a new antibiotic for MDROs
(multidrug-resistant organisms), do we demand that the Nobel Prize be
stripped because it’s not fair to use AI? Of course not.
Here's my point: if AI materially improves clarity, structure, rigor, or
speed—and you choose not to use it—are you protecting integrity, or just
degrading output for the sake of red tape?
We’re entering a phase where the line between “human” and
“machine-assisted” becomes less meaningful than the quality of thinking
expressed. The bar isn’t “was it written in a way that used up the
budgeted man-hours, i.e., did you cheat with AI?” The bar is: /did it
advance understanding? was it effective in creating a breakthrough?/
My actual report included many innovations that were unique and not
regurgitated by an AI. I pointed that out, and they agreed.
Anyway, that’s my position. As a group, let’s optimize for better
work—not nostalgia for the way things used to be. Those days aren’t
coming back, people!
What is happening is essentially the onset of a digital ice age, and we
need to adapt faster, not question whether the climate is really changing.
Moses
--
*Moses Ma | Managing Partner*
Learn more at futurelabconsulting.com
On 2/14/26 6:33 PM, Moses Ma wrote:
> Hi everyone,
>
> Daniel is right, the results so far are very interesting. I won't
> spoil it by saying any more.
> Please participate in the experiment -
> https://forms.gle/42mWD8HAouAhM9kVA
>
> Moses
>
>> On Sat, Feb 14, 2026 at 1:19 PM Moses Ma
>> <moses.ma@futurelabconsulting.com> wrote:
>>
>> Hi all,
>>
>> Just as an experiment, I’m providing two responses: one
>> written organically, the other generated by an LLM. Please
>> vote on which you think is human-generated. Doing this allows
>> me to explore the nature of human- versus AI-generated content.
>>
>> Vote here: https://forms.gle/42mWD8HAouAhM9kVA
>>
>> I'll reply with the survey results in about a week.
>>
>> Moses
>>
>> ---
>>
>> Version A:
>>
>> I am also worried about the slopification of not only this
>> forum, but the entire practice of strategic collaboration.
>> First, I recently wrote something where there was concern
>> that my work was AI generated simply because I used em
>> dashes—I tend to use them a lot, as it offers the reading
>> equivalent of a thoughtful pause. I had to use an AI detector
>> on an extensive article I blogged ten years ago to show that
>> my natural writing style triggered the detector, when it was
>> simply, well, good writing. (The article I blogged received
>> over half a million page views.) I subsequently discovered
>> that most “humanizers” simply inject less likely words to
>> appear human-generated, reducing the quality of the writing.
>>
>> The reality is that AI tools are now part of the cognitive
>> environment. Clearly, the key issue isn’t “used an LLM” vs
>> “didn’t use an LLM.” It’s authorship and responsibility. If
>> someone uses a model to structure thoughts, tighten prose,
>> and check logic — but the ideas, judgments, and commitments
>> are theirs — that’s still meaningful participation. If the
>> model is doing the thinking, and the human is just "phoning
>> it in" by forwarding output, then yes, we’re drifting toward
>> a lot more than just model collapse.
>>
>> The other issue is where we’re heading. Just as AI generated
>> coding and video have crossed their critical thresholds of
>> realism, collaborative ideation will also do so within a few
>> years. As LLMs have slowly transitioned from “stochastic
>> plagiarism” to the use of reasoning models, they will at some
>> point cross a similar threshold, the "uncanny valley" of
>> cognitive realism. Especially when multi-agent systems start
>> emulating what we do here. At that point, our internal slop
>> detectors—even set to high sensitivity so anything we read
>> could “feel” AI generated—will not be able to tell the
>> difference.
>>
>> Version B:
>>
>> Thanks for raising this. I think you're pointing at something
>> real and worth discussing openly.
>>
>> The distinction you draw between using an LLM as a tool and
>> outsourcing your thinking to one is important, and I'd argue
>> it's actually sharper than it might first appear.
>> Proofreading preserves the structure of your reasoning while
>> cleaning up expression. Generation replaces the reasoning
>> itself. Those are fundamentally different activities, even if
>> the output can sometimes look similar on the surface.
>>
>> Where I think this gets genuinely difficult is that the line
>> isn't always clean. Someone might use an LLM to help organize
>> scattered thoughts into a coherent draft, then revise it
>> substantially. That's somewhere in the middle, and I'd
>> hesitate to call it illegitimate. The question probably isn't
>> whether an LLM touched the text at all, but whether the
>> person behind it actually wrestled with the problem, made
>> judgment calls, and can defend what they wrote if pressed.
>>
>> That said, I think the practical concern you're raising
>> stands regardless of where we draw the line. When responses
>> on a list like this start reading like they were produced by
>> someone who spent 30 seconds prompting rather than 30 minutes
>> thinking, it does erode trust. You start reading differently.
>> You skim more. You engage less. And that's corrosive to
>> exactly the kind of deliberation this group exists for.
>>
>> I don't know what the right intervention is. Norms are
>> probably more useful than rules here. Something like: if you
>> wouldn't be comfortable explaining and defending every claim
>> in your message during a live conversation, maybe reconsider
>> sending it. That's not a perfect filter, but it at least
>> recenters the expectation that contributions reflect genuine
>> engagement rather than generated fluency.
>>
>> On 2/13/26 3:41 AM, Filip Kolarik wrote:
>>> Dear VCWG,
>>> I want to raise a concern that’s been bothering me lately.
>>> It feels like this mailing list is being flooded by
>>> LLM-generated responses.
>>>
>>> Whether or not that’s intentional, meaningful work depends
>>> on people engaging directly with arguments and tradeoffs,
>>> and when contributions read like synthesized summaries
>>> rather than considered positions, the discussion loses
>>> clarity and momentum.
>>>
>>> I’m not arguing against using tools; I use LLMs to proofread
>>> my own writing. But there is a difference between
>>> proofreading text you wrote and letting an LLM generate the
>>> entire response. If normalized, we risk damaging the
>>> effectiveness of the group and turning this mailing list
>>> into a swamp to be ignored.
>>>
>>> Best regards,
>>> Filip
>>> https://www.linkedin.com/in/filipkolarik/
>>
>
Received on Friday, 20 February 2026 18:22:51 UTC