Re: Advancing Distributed Moderation

OpenAI has documented a way in which you can use GPT-4 for content moderation: https://openai.com/blog/using-gpt-4-for-content-moderation

I expect there to be a lot more work done in this field of research.

James


On Saturday, 25 November 2023 at 20:00, Adam Sobieski <adamsobieski@hotmail.com> wrote:


> Emelia Smith,
> 

> 

> Thank you for the feedback. I am new to the SWICG and am glad to see that there is some interest here in uses of AI/ML/CV/LLMs for equipping moderators and empowering end-users.
> 

> Best regards,
> Adam
> 

> 

> P.S.: I would enjoy learning more about the FIRES proposal.
> 

> 

> From: Emelia Smith <emelia@brandedcode.com>
> Sent: Friday, November 24, 2023 12:24 PM
> To: Adam Sobieski <adamsobieski@hotmail.com>
> Cc: public-swicg@w3.org <public-swicg@w3.org>
> Subject: Re: Advancing Distributed Moderation
> 

> Hi all,
> 

> Some of what Adam speaks of here is what I'm working on with my FIRES proposal: essentially a service for storing and sharing moderation advisories and recommendations. It looks past the current status quo of denylists and blocklists, still allowing full federation restrictions but also enabling more granular ones. It also supports multiple entity types: domains/instances, hashtags, URLs, users, etc.
> 

> When I worked at State in 2012/2013, we did structured opinions, where you'd have a topic and several words associated with that topic, and ultimately it didn't go mainstream. So I'm skeptical of the "made me feel" proposal here, as it's additional work for the person using the service. Perhaps you could instead extrapolate this from emoji reactions, which are already popular on the fediverse?
> 

> Attempts at classifying content could help moderation, but they could also harm it by reducing the "human touch", which creates problems like the way Instagram's automated moderation often harms marginalised communities. So I do suggest caution on this path.
> 

> Additionally, this could result in algorithmic timelines, which currently aren't present on the Fediverse. From what I've seen, there's a general resistance to algorithmic timelines, even though they can help people who check social media less frequently than others.
> 

> You are right that AI/machine vision/LLMs could be beneficial for instance moderators by preventing exposure to harm, but we also need to be critical of and investigate the training of these models and their human impact. (There was a recent headline about OpenAI paying human moderators minimal amounts to classify media as CSAM or not, which obviously has a human cost.) Additionally, we need to enquire into the biases such algorithms may have and how they may adversely affect marginalised social groups.
> 

> Whilst tools can assist in moderation, ultimately I believe we need to have humans making the final decision.
> 

> (If anyone would like to peer-review FIRES, I can send you an early draft, but I'm currently reworking a lot of it.)
> 

> Yours,
> Emelia Smith
> 

> 

> > On 24. Nov 2023, at 14:30, Adam Sobieski <adamsobieski@hotmail.com> wrote:
> 

> > Social Web Incubator Community Group,
> > 

> > 

> > Introduction
> > ============
> > 

> > Hello. I would like to share some ideas to better inform and equip distributed social media administrators and moderators.
> > 

> > 

> > Multimedia-content-scanning Rules and Service Providers
> > =======================================================
> > 

> > Software tools for moderators can be envisioned which:
> > 

> > 1.  Scan multimedia content on servers,
> > 2.  Provide moderators with time-critical information, updates, alerts, and alarms,
> > 3.  Provide moderators with real-time natural-language reports, data visualizations, and analytics.
> > 

> > 

> > 

> > Software tools for moderators could be updated in a manner resembling real-time "antivirus" data updates. Instead of there being one or a few such "antivirus" data providers, moderators could choose to subscribe to individual channels of data from multiple providers. As envisioned, each provider would serve updates across a number of described channels. Data sent in these channels, e.g., multimedia-content-scanning rules, could be merged on platform servers and subsequently utilized by moderators' software tools. Some data providers' channels might be free to use while others would require paid subscriptions.
> > 
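A minimal sketch of the channel-merging step described above, assuming hypothetical ScanRule and channel structures (none of these names come from an existing system):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanRule:
    rule_id: str
    provider: str
    pattern: str   # e.g., a media hash or signature to match against content
    action: str    # e.g., "alert", "flag", "report"

def merge_channels(channels):
    """Merge rule updates from several subscribed provider channels.

    A later update for the same (provider, rule_id) pair supersedes an
    earlier one, mirroring how antivirus-style definition updates
    replace older definitions."""
    merged = {}
    for channel in channels:
        for rule in channel:
            merged[(rule.provider, rule.rule_id)] = rule
    return list(merged.values())
```

The merged rule set is what moderators' software tools would then evaluate against multimedia content on the platform's servers.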

> > 

> > As envisioned, moderators would be able to manually create, modify, and delete multimedia-content-scanning rules to customize their software tools.
> > 

> > 

> > Moderators' actions and decisions could be collected, aggregated, processed, and utilized for purposes including distributing helpful real-time hints to other moderators across platforms. Moderators would be able to choose which service providers, if any, to share these data with. These data would not identify end-users or moderators.
> > 

> > 

> > The aforementioned "antivirus" data for informing and equipping decentralized social media moderators could additionally enhance and enable other technologies and systems, e.g., content-distribution algorithms and recommender systems.
> > 

> > 

> > With recent advancements to AI:
> > 

> > 1.  Systems like NeMo Guardrails and Guardrails AI can enhance the capabilities of moderators' software tools,
> >     

> > 2.  Systems like GPT-4V and LLaVA can process images occurring in social-media contexts in new ways.
> > 

> > 

> > 

> > Enabling More Granular Reactions to Content
> > ===========================================
> > 

> > Beyond liking content or not, end-users could react to content items by attaching text-keyword metadata. End-users could click a button next to content to enter open-ended text into a field, e.g., "this content made me feel _____". Another button could add subsequent reactions (second, third, or fourth) for more complex, and potentially mixed, reactions. Alternatively, these reactions could be comma-delimited and converted to reaction tags as typed. End-users' reactions would tend to be drawn from a folksonomic vocabulary; accordingly, incremental search and recommendation features could reduce typing.
> > 
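As one illustration, the comma-delimited entry mode could be handled by a small normalizer; this is a sketch under assumed conventions (lowercasing, de-duplication), not a specification:

```python
def parse_reaction_tags(raw):
    """Split a comma-delimited reaction string into normalized tags.

    Empty entries and duplicates are dropped; entry order is kept so
    the first-typed reaction stays first."""
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

Normalized tags like these would also be easier to aggregate anonymously into the audience-reaction data described below.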

> > 

> > Anonymized usage data from end-users could be collected and sent to service providers. Envisioned data include anonymized status messages and anonymized reactions to content items. With these data, new usage trends, audience reaction data, natural-language reports, data visualizations, and analytics could be made available to moderators and administrators.
> > 

> > 

> > With unfolding advancements to AI, granular reactions to content could be explained and predicted. This could enhance tools for moderators, such as by automatically displaying warnings atop some content for end-users.
> > 

> > 

> > End-user-specified Preferences
> > ==============================
> > 

> > Some moderators might want to allow their end-users to express content-related preferences, i.e., to directly or indirectly create, modify, and delete multimedia-content-scanning rules.
> > 

> > 

> > End-users' preferences can be complex; there would be some complex concepts or categories for machine-learning-based approaches to learn through the processing of items and end-users' responses to them. Consider, for instance, the following pseudocode for two rules, which end-users might express through a number of techniques, including natural language: "if an item is predicted to make me sad or angry, unless it is for activism or charity, I want a warning atop it."
> > 

> > 

> > Rule 1:
> > 

> > 

> > if(prediction(me, item, 'sad'))
> > {
> >     if(!(metadata(item, 'activism') || metadata(item, 'charity')))
> >     {
> >         return warning.content(item, 'sad');
> >     }
> > }
> > return warning.none;
> > 

> > 

> > Rule 2:
> > 

> > 

> > if(prediction(me, item, 'angry'))
> > {
> >     if(!(metadata(item, 'activism') || metadata(item, 'charity')))
> >     {
> >         return warning.content(item, 'angry');
> >     }
> > }
> > return warning.none;
> > 
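For illustration, the two pseudocode rules above can be folded into one runnable sketch; the prediction and metadata predicates here are hypothetical stand-ins for an ML predictor and an item-metadata lookup:

```python
def prediction(user, item, feeling):
    # Hypothetical stand-in for an ML model predicting a user's reaction.
    return feeling in item.get("predicted_feelings", {}).get(user, set())

def metadata(item, tag):
    # Hypothetical stand-in for an item-metadata lookup.
    return tag in item.get("tags", set())

def warnings_for(user, item, feelings=("sad", "angry")):
    """Apply both rules and aggregate the resulting warnings."""
    if metadata(item, "activism") or metadata(item, "charity"):
        return []  # the "unless" clause exempts activism and charity content
    return [f for f in feelings if prediction(user, item, f)]
```

Because both rules share the same exemption clause, folding them together also shows how multiple warnings for one item can be aggregated before display.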

> > 

> > Note that multiple warnings could be aggregated and displayed atop content items.
> > 

> > 

> > AI systems, e.g., large language models, could translate end-users' natural-language expressions of content-related preferences into multimedia-content-scanning rules.
> > 
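As a sketch of that translation step, one could prompt an LLM to emit rules in the pseudocode grammar used above; the prompt template below is purely illustrative and not tied to any particular model or API:

```python
def rule_prompt(preference_text):
    """Build a hypothetical LLM prompt asking for a scanning rule.

    The rule grammar (prediction/metadata/warning) follows the
    pseudocode shown earlier in this thread."""
    return (
        "Translate the user's preference into a content-scanning rule "
        "using only prediction(me, item, feeling), metadata(item, tag), "
        "warning.content(item, feeling), and warning.none.\n\n"
        f"Preference: {preference_text}\n"
        "Rule:"
    )
```

Constraining the model's output to a small rule grammar like this would also make the generated rules auditable by moderators before activation.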

> > 

> > In theory, end-users' preferences could be collected, aggregated, and processed. These data would not identify end-users.
> > 

> > 

> > Conclusion
> > ==========
> > 

> > Existing and new standards and recommendations can enable the indicated technology scenarios for better informing and equipping distributed social media administrators and moderators.
> > 

> > 

> > It would be great if, after some amount of delay, the kinds of data indicated above could be made available to the scientific community and to the public.
> > 

> > 

> > Thank you. Any comments, questions, or thoughts on these ideas?
> > 

> > Best regards,
> > Adam Sobieski
> > 

> > 

> > P.S.: See also:
> > 

> > -   https://github.com/swicg/general/issues/34
> > -   https://github.com/swicg/general/issues/7
> > -   https://github.com/w3c/activitypub/issues/231
> > -   https://github.com/w3c/activitypub/issues/232

Received on Saturday, 25 November 2023 20:06:32 UTC