Re: Man-machine Dialogues Grounded in Public-sector Meeting Transcripts from Adam Sobieski on 2024-07-05 (public-civics@w3.org from July 2024)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Fri, 5 Jul 2024 16:22:20 +0000
To: Mike Gifford <mike.gifford@civicactions.com>, "public-civics@w3.org" <public-civics@w3.org>
Message-ID: <PH8P223MB0675C6FB536DD710934533DBC5DF2@PH8P223MB0675.NAMP223.PROD.OUTLOOK.COM>
Mike,
All,

Thank you for the hyperlinks and the excellent points about government transparency and accountability.

These technology topics are also applicable to this Community Group's previous discussions about equipping and amplifying local journalism organizations. Local journalists and newspapers, while facing an existential crisis, have been expected to attend local governments' meetings and/or to scour through these meetings' minutes.

I wonder whether there exist any business models for these types of civic-technology search engines. Perhaps government watchdog organizations and journalism organizations would purchase accounts to search these data and would upgrade their accounts for search-automation capabilities? If there are not any viable business models, perhaps these (AI-enhanced) civic-technology search engines could be a result of philanthropy... perhaps an industry-wide philanthropy resembling how C-SPAN was created and provided for the public by the cable television industry. Otherwise, a CKAN/DKAN extension could save the day.

With respect to exploring existing and new formats for representing city, county, state, and federal governments' meetings' transcripts (e.g., TTML3, WebVTT, Markdown for Meetings, MeetingsML, etc.), please see the postscript for some technical discussion.


Best regards,
Adam

[1] https://github.com/w3c/ttml3/

[2] https://w3c.github.io/ttml-webvtt-mapping/

[3] https://w3c.github.io/webvtt/


P.S.:

In addition to speakers' names, we might want to provide AI systems, e.g., LLMs, with more information about speakers to enable fuller natural-language questions and dialogue. For instance, speakers' roles or positions could be useful.

For WebVTT this might resemble:

00:22.000 --> 00:27.000
<v Alice Smith (Senator)>Hello world.

or

00:22.000 --> 00:27.000
<v Alice Smith {position:Senator}>Hello world.

(I do not know whether WebVTT permits parentheses or curly brackets in voice tags.)
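
A minimal sketch of how a consumer might read the first form, assuming the parenthesized "Name (Role)" layout turns out to be acceptable inside a voice tag. The convention itself and the function name parse_voice are assumptions made for illustration, not part of WebVTT; voice tags carrying classes (e.g., <v.loud ...>) are not handled.

```python
import re

# Matches a voice tag such as "<v Alice Smith (Senator)>".
VOICE_TAG = re.compile(r"<v\s+([^>]+)>")
# Assumed convention: an optional "(Role)" suffix inside the annotation.
NAME_ROLE = re.compile(r"^(?P<name>.*?)\s*\((?P<role>[^()]*)\)\s*$")


def parse_voice(cue_text):
    """Return (name, role, text) for a cue payload; role is None if absent."""
    match = VOICE_TAG.match(cue_text)
    if match is None:
        # No voice tag at the start of the payload.
        return None, None, cue_text
    annotation = match.group(1).strip()
    text = cue_text[match.end():].replace("</v>", "").strip()
    split = NAME_ROLE.match(annotation)
    if split is None:
        # Plain "<v Name>" with no role suffix.
        return annotation, None, text
    return split.group("name"), split.group("role").strip(), text


print(parse_voice("<v Alice Smith (Senator)>Hello world."))
```

The name and role could then be passed to an LLM alongside the cue text as structured context.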

For TTML3, this might resemble:

<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <head>
    <ttm:agent xml:id="smith" type="person">
      <ttm:name type="family">Smith</ttm:name>
      <ttm:name type="given">Alice</ttm:name>
      <ttm:name type="full">Alice Smith</ttm:name>
      <ttm:position>Senator</ttm:position> <!-- not a TTML3 metadata element -->
    </ttm:agent>
  </head>
  <body>
    <div>
      ...
      <p begin="00:00:22.000" end="00:00:27.000" ttm:agent="smith">Hello world.</p>
      ...
    </div>
  </body>
</tt>
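
A rough sketch of how such a document could be read back, resolving each paragraph's ttm:agent reference to the speaker's name and position. It uses only Python's standard xml.etree.ElementTree. The ttm:position element is the hypothetical one proposed above, the function name transcript_rows is invented for illustration, and the sample writes clock times with an hours field.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTM = "http://www.w3.org/ns/ttml#metadata"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <head>
    <ttm:agent xml:id="smith" type="person">
      <ttm:name type="full">Alice Smith</ttm:name>
      <ttm:position>Senator</ttm:position>
    </ttm:agent>
  </head>
  <body>
    <div>
      <p begin="00:00:22.000" end="00:00:27.000" ttm:agent="smith">Hello world.</p>
    </div>
  </body>
</tt>"""


def transcript_rows(ttml_text):
    """Return one dict per <p>/agent pair with speaker, position, and text."""
    root = ET.fromstring(ttml_text)
    agents = {}
    for agent in root.iter(f"{{{TTM}}}agent"):
        names = {n.get("type"): n.text for n in agent.findall(f"{{{TTM}}}name")}
        position = agent.find(f"{{{TTM}}}position")  # hypothetical element
        agents[agent.get(XML_ID)] = {
            "speaker": names.get("full"),
            "position": position.text if position is not None else None,
        }
    rows = []
    for p in root.iter(f"{{{TT}}}p"):
        text = "".join(p.itertext()).strip()
        # ttm:agent may list several agent ids separated by whitespace.
        for agent_id in (p.get(f"{{{TTM}}}agent") or "").split():
            info = agents.get(agent_id, {"speaker": None, "position": None})
            rows.append({**info, "text": text})
    return rows


print(transcript_rows(SAMPLE))
```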

________________________________
From: Mike Gifford <mike.gifford@civicactions.com>
Sent: Thursday, July 4, 2024 11:57 AM
To: public-civics@w3.org <public-civics@w3.org>; Adam Sobieski <adamsobieski@hotmail.com>
Subject: Re: Man-machine Dialogues Grounded in Public-sector Meeting Transcripts

Hi Adam & other CivicTech folks,

It would be interesting to be able to Talk to Your Government on other fronts:
https://ai.objectives.institute/talk-to-the-city


As for CKAN/DKAN, at the moment they are primarily data repositories, so the transcripts, multimedia, and documents could all be stored and shared in them. How they are processed is another matter.

I’m pretty sure both have an API that would allow the data to be queried.

I don’t know of any instance where an open data portal is being used to organize public meetings. I could see it being useful for holding politicians accountable.

Well-constructed Sitemap.xml files that are registered with search engines would probably already alert them to new content.
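
As a small illustration of the sitemap idea, the sketch below builds a sitemap whose lastmod values reflect when each transcript was published. The element names and namespace follow the sitemaps.org protocol; the URLs, dates, and the function name build_sitemap are made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs of strings."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical transcript pages on a placeholder domain.
print(build_sitemap([
    ("https://city.example/meetings/2024-07-02/transcript", "2024-07-03"),
    ("https://city.example/meetings/2024-06-18/transcript", "2024-06-19"),
]))
```

A CMS could regenerate this file whenever minutes or transcripts are published, so crawlers can see which pages changed.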

The Welsh government is doing some pretty neat open government work:
https://www.gov.wales/uk-open-government-national-action-plan-2022-2024-welsh-government-commitments-html


I’ve seen a lot more transparency there than in the Government of Canada. Both would use either a Record of Proceedings or a Hansard.

I do see potential in this, but I’m not sure that politicians would. There is a range of ideas that they should be supporting from organizations like:
https://github.com/opennorth

https://github.com/sunlightlabs

https://github.com/mysociety


We will see though. This year could change a lot with democracies around the world. Do we want to move to an uncertain participatory future, or dream of a glorious hazy past?

Mike

Mike Gifford, Senior Strategist, CivicActions
President of the Board, Digital Services Coalition
Drupal Core Accessibility Maintainer, IAAP CPWA Certified
https://civicactions.com  |  https://accessibility.civicactions.com

http://twitter.com/mgifford |  http://linkedin.com/in/mgifford



On July 3, 2024 at 6:26:23 PM, Adam Sobieski (adamsobieski@hotmail.com) wrote:

Civic Technology Community Group,

Also, in addition to enabling citizens and journalists to be able to manually engage in man-machine dialogues about one or more public-sector meetings' transcripts, search-automation scenarios are possible. Users' natural-language queries, questions, and "dialogical procedures" could be stored and applied to public-sector meetings' minutes and transcripts data as they arrived. Users would, then, be able to receive alerts, e.g., by email, as updates pertinent to their stored searches occurred.
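
To make the stored-search scenario concrete, here is a deliberately simplified sketch. It substitutes plain keyword matching for the natural-language queries and "dialogical procedures" described above, and it only returns who should be alerted rather than sending email. The class and function names, subscriber addresses, and transcript identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredSearch:
    subscriber: str   # where an alert would be sent (hypothetical address)
    keywords: tuple   # every keyword must appear, case-insensitively


def alerts_for(transcript_id, transcript_text, searches):
    """Return (subscriber, transcript_id) pairs for each matching stored search."""
    lowered = transcript_text.lower()
    return [
        (search.subscriber, transcript_id)
        for search in searches
        if all(keyword.lower() in lowered for keyword in search.keywords)
    ]


searches = [
    StoredSearch("reporter@newsroom.example", ("zoning", "variance")),
    StoredSearch("analyst@watchdog.example", ("budget",)),
]
# Run the stored searches against a newly arrived transcript.
print(alerts_for("council-2024-07-02", "The board approved the Zoning variance.", searches))
```

In a fuller system, the keyword test would be replaced by a call to a source-grounded dialogue model, with the matching text spans included in the alert.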

Mike Gifford: what is the nature and status of public-sector CMS tools, CKAN and DKAN tools, and extensions for these tools, with respect to meetings' minutes and transcripts?

Should city, county, state, and federal governments' and organizations' CMS tools ping search engines, for instance, when new minutes and transcripts are made available online and/or should these data be automatically routed to pertinent CKAN/DKAN data services?

The aforementioned Google Research MISeD data (https://github.com/google-research-datasets/MISeD) appears to utilize a JSONL-based data format for representing 225 meetings across three domains: 134 Product Meetings (AMI), 58 Academic Meetings (ICSI), and 33 public Parliamentary Committee Meetings sourced from the Welsh Parliament and the Parliament of Canada.
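
For readers unfamiliar with JSONL, each line of such a file is a standalone JSON object, so records can be processed one at a time. The sketch below tallies records by a field. The field name "domain" and the sample records are assumptions for illustration only and are not taken from the MISeD schema.

```python
import json
from collections import Counter


def count_by_field(jsonl_lines, field):
    """Count JSONL records by the value of one field, skipping blank lines."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record.get(field, "unknown")] += 1
    return counts


# Made-up records; with a real file, pass the open file object instead.
sample_lines = [
    '{"domain": "product", "meeting": "m1"}',
    '',
    '{"domain": "parliamentary", "meeting": "m2"}',
    '{"domain": "product", "meeting": "m3"}',
]
print(count_by_field(sample_lines, "domain"))
```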

Would a new format or standard for meetings' minutes and transcripts be useful, e.g., MeetingsML, something expanding upon markdown, or something expanding upon timed text (perhaps enabling streaming minutes and transcripts for meetings)?

Thank you. Any thoughts on these or any other, related topics?


Best regards,
Adam

P.S.:

[1] https://www.congress.gov/house-hearing-transcripts/118th-congress

[2] https://www.congress.gov/senate-hearing-transcripts/118th-congress


________________________________
From: Adam Sobieski <adamsobieski@hotmail.com>
Sent: Wednesday, June 26, 2024 2:40 PM
To: public-civics@w3.org <public-civics@w3.org>
Subject: Man-machine Dialogues Grounded in Public-sector Meeting Transcripts

Civic Technology Community Group,

Hello. I'd like to share a hyperlink to some exciting AI R&D:

https://research.google/blog/efficient-data-generation-for-source-grounded-information-seeking-dialogs-a-use-case-for-meeting-transcripts/


"Meeting recordings have helped people worldwide catch missed meetings, focus instead of taking notes during calls, and review information. But recordings can also take a lot of time to review. One solution to enable efficient navigation of recordings would be an agent that supports natural language conversations with meeting recordings, so that users can catch up on meetings they have missed. This could manifest as a source-grounded information-seeking dialog task where the agent would allow users to efficiently navigate the given knowledge source and extract information of interest. In this conversational setting, a user would interact with an agent over multiple rounds of queries and responses regarding a source text. The input to the agent model would include the source text, dialog history, and the current user query, and its output should be a response to the query and a set of attributions (text spans from the source document that support the response)."

These technologies will benefit society across sectors – meetings are ubiquitous – and will have civic-technology applications.

In the not-too-distant future, citizens and journalists will be able to ask questions and to engage in dialogues about public-sector meetings, both individual meetings and collections of such meetings.


Best regards,
Adam Sobieski
Received on Friday, 5 July 2024 16:22:27 UTC