Re: Ideas for the Use Case of Public-sector Meeting Transcripts

Silvia,

Hello, and thank you for the link about the WebVTT metadata type. Its cue payloads can be about as expressive as JSON, with some important caveats about blank lines: a payload cannot contain one, because a blank line ends the cue.
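
To make that caveat concrete, here is a minimal sketch in Python. It assumes JSON payloads, and the helper names are my own hypothetical ones, not part of any WebVTT API. Each payload is kept on a single line so that it cannot contain the blank line that would terminate its cue:

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def metadata_cue(start: float, end: float, payload: dict) -> str:
    """Serialize one JSON payload as a single-line WebVTT metadata cue."""
    # Without indent, json.dumps emits one line: newlines inside strings are
    # escaped as \n, so the payload cannot contain a cue-ending blank line.
    text = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    # A line containing "-->" would be read as new cue timings. In JSON, ">"
    # can only occur inside strings, so escaping it keeps the JSON equivalent.
    text = text.replace(">", "\\u003e")
    return f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


def metadata_track(cues) -> str:
    """Build a WebVTT file from (start, end, payload) tuples."""
    # Cues are separated by exactly one blank line.
    return "WEBVTT\n\n" + "\n".join(metadata_cue(*cue) for cue in cues)
```

For example, metadata_track([(22.0, 27.0, {"agent": "smith"})]) would produce a file with one cue whose payload line is {"agent":"smith"}.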

Next steps for the use case of meetings' minutes and transcripts appear to involve developing extensible, general-purpose schemas, including schemas for the WebVTT metadata type.

Also, the WebVTT metadata type would, in theory, be more readily compatible with LLMs than extended TTML, enabling some interesting scenarios such as Q&A and dialogue about (public-sector) meetings that draw on their minutes and transcripts [1][2].


Best regards,
Adam

[1] Golany, Lotem, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, and Ido Dagan. "Efficient data generation for source-grounded information-seeking dialogs: A use case for meeting transcripts." arXiv:2405.01121 (2024). https://arxiv.org/abs/2405.01121

[2] MISeD dataset, Google Research Datasets. https://github.com/google-research-datasets/MISeD


________________________________
From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Sent: Saturday, July 6, 2024 1:56 AM
To: Adam Sobieski <adamsobieski@hotmail.com>
Cc: public-tt@w3.org <public-tt@w3.org>
Subject: Re: Ideas for the Use Case of Public-sector Meeting Transcripts

Hi Adam,

You might consider using WebVTT for that purpose - the "metadata" type already allows you to formulate your custom timed markup:
https://www.w3.org/TR/webvtt1/#introduction-metadata


Kind Regards,
Silvia.


On Sat, Jul 6, 2024 at 1:18 AM Adam Sobieski <adamsobieski@hotmail.com> wrote:
Timed Text Working Group,

Hello. I am pleased to share, for purposes of discussion, some ideas for extending TTML for use cases including public-sector meetings' minutes and transcripts.

As shown in the following markup example, seven main ideas are broached:


  1.  Files could be attached to meetings' minutes and transcripts, e.g., presenters' slideshow slides.
  2.  These files could be described with metadata.
  3.  Agents could have one or more roles or positions described in their metadata.
  4.  Minutes and transcripts could have generator agents and/or software tools.
     *   Beyond "person", "character", "group", "organization", and "other", might software tools be a type of agent?
  5.  Inline time-based hyperlinks could be placed in minutes to signal when files were attached to meetings' minutes and transcripts.
  6.  These time-based hyperlinks could be attributed to agents or software tools.
  7.  These time-based hyperlinks could be displayed to end-users watching accompanying videos of meetings, so that they can download the attached files.

Here is a markup sketch. The new parts, showcasing the above ideas, are expressed using an XML extension namespace (the "ext" prefix).


<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ext="..."
    xml:base="...">

  <head>
    <metadata xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
      <ttm:title>...</ttm:title>
      <ttm:desc>...</ttm:desc>
      <ext:generator ttm:agent="brown" />
    </metadata>

    <ext:attachment xml:id="budget-2024-1"
                    ext:mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
                    ext:src="attachments/budget-2024.xlsx" />
    <ext:attachment xml:id="budget-2024-2"
                    ext:mime="application/xml"
                    ext:src="attachments/budget-2024.xbrl" />
    <ext:attachment xml:id="panelist-presentation-1"
                    ext:mime="application/vnd.openxmlformats-officedocument.presentationml.presentation"
                    ext:src="attachments/panelist-presentation-1.pptx">
      <metadata>
        <ttm:title>Slideshow Presentation</ttm:title>
        <ext:generator ttm:agent="jackson" />
      </metadata>
    </ext:attachment>

    <ttm:agent xml:id="smith" type="person">
      <ttm:name type="family">Smith</ttm:name>
      <ttm:name type="given">Alice</ttm:name>
      <ttm:name type="full">Alice Smith</ttm:name>
      <ext:position>Senator</ext:position>
      <ext:position>Co-chair</ext:position>
    </ttm:agent>
    <ttm:agent xml:id="jones" type="person">
      <ttm:name type="family">Jones</ttm:name>
      <ttm:name type="given">Bob</ttm:name>
      <ttm:name type="full">Bob Jones</ttm:name>
      <ext:position>Senator</ext:position>
      <ext:position>Co-chair</ext:position>
    </ttm:agent>
    <ttm:agent xml:id="brown" type="person">
      <ttm:name type="family">Brown</ttm:name>
      <ttm:name type="given">Charles</ttm:name>
      <ttm:name type="full">Charles Brown</ttm:name>
      <ext:position>Secretary</ext:position>
    </ttm:agent>
    <ttm:agent xml:id="jackson" type="person">
      <ttm:name type="family">Jackson</ttm:name>
      <ttm:name type="given">David</ttm:name>
      <ttm:name type="full">David Jackson</ttm:name>
      <ext:position>Guest</ext:position>
      <ext:position>Panelist</ext:position>
    </ttm:agent>
  </head>
  <body>
    <div>
      ...
      <p begin="00:00:22.000" end="00:00:27.000" ttm:agent="smith">
        Without objection, the annual budget is entered into the minutes.
      </p>
      <ext:a begin="00:00:27.000" duration="00:00:10.000" ext:xref="budget-2024-1" ttm:agent="brown" />
      <ext:a begin="00:00:27.000" duration="00:00:10.000" ext:xref="budget-2024-2" ttm:agent="brown" />
      ...
      <p begin="00:01:23.000" end="00:01:28.000" ttm:agent="smith">
        Without objection, the panelist's slides are entered into the minutes.
      </p>
      <ext:a begin="00:01:28.000" duration="00:00:10.000" ext:xref="panelist-presentation-1" ttm:agent="brown" />
      ...
    </div>
  </body>
</tt>
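
To illustrate ideas 5 through 7, here is a minimal Python sketch of how a player might read such markup and decide which attachment links to display at a given playback time. It works on a trimmed copy of the sketch and assumes a hypothetical namespace URI, urn:example:ttml-ext, for the ext prefix (left as "..." above), clock times written as hh:mm:ss, and that duration is measured from begin. The function names are illustrative only:

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTM = "http://www.w3.org/ns/ttml#metadata"
XML = "http://www.w3.org/XML/1998/namespace"
EXT = "urn:example:ttml-ext"  # hypothetical URI; the sketch leaves xmlns:ext="..."

# Trimmed copy of the markup sketch: two attachments and their time-based links.
SAMPLE = f"""<tt xmlns="{TT}" xmlns:ttm="{TTM}" xmlns:ext="{EXT}">
  <head>
    <ext:attachment xml:id="budget-2024-1"
        ext:mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        ext:src="attachments/budget-2024.xlsx" />
    <ext:attachment xml:id="panelist-presentation-1"
        ext:mime="application/vnd.openxmlformats-officedocument.presentationml.presentation"
        ext:src="attachments/panelist-presentation-1.pptx" />
  </head>
  <body><div>
    <ext:a begin="00:00:27.000" duration="00:00:10.000" ext:xref="budget-2024-1" ttm:agent="brown" />
    <ext:a begin="00:01:28.000" duration="00:00:10.000" ext:xref="panelist-presentation-1" ttm:agent="brown" />
  </div></body>
</tt>"""


def clock_to_seconds(value: str) -> float:
    """Convert an hh:mm:ss(.fff) clock time to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def attachment_links(document: str):
    """Return (start, end, src, mime, agent) for each ext:a link, in document order."""
    root = ET.fromstring(document)
    # Index attachments by xml:id so that ext:xref values can be resolved.
    attachments = {
        att.get(f"{{{XML}}}id"): att for att in root.iter(f"{{{EXT}}}attachment")
    }
    links = []
    for link in root.iter(f"{{{EXT}}}a"):
        target = attachments[link.get(f"{{{EXT}}}xref")]
        start = clock_to_seconds(link.get("begin"))
        end = start + clock_to_seconds(link.get("duration"))
        links.append((start, end, target.get(f"{{{EXT}}}src"),
                      target.get(f"{{{EXT}}}mime"), link.get(f"{{{TTM}}}agent")))
    return links


def active_links(links, t: float):
    """Links that a player might show as download buttons at playback time t."""
    return [link for link in links if link[0] <= t < link[1]]
```

A player could call active_links(links, t) on each time update and render one download button per result; a fuller implementation would also need to resolve ext:src against xml:base, which this sketch ignores.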

Any thoughts on these ideas and the markup sketch? Any other ideas towards utilizing and/or extending timed text, e.g., TTML, for the use case of representing (public-sector) meetings' minutes and transcripts? Thank you.


Best regards,
Adam Sobieski

P.S. It appears that I should have emailed this mailing list rather than opening a GitHub issue. Apologies for the multiple copies of this content on this mailing list.

Received on Sunday, 7 July 2024 20:22:35 UTC