[webrtc-charter] Representing Meetings' Transcripts and Minutes (#84)

AdamSobieski has just created a new issue for https://github.com/w3c/webrtc-charter:

== Representing Meetings' Transcripts and Minutes ==
## Introduction

Hello. I would like to propose that exploring representations of meetings' transcripts and minutes be in scope for the WebRTC Working Group. This work item would involve designing a representation suitable for the transcripts and minutes of both in-person and virtual (WebRTC-based) meetings. Meetings' transcripts and minutes could be produced by human or AI agents.

Interestingly, in addition to timed speech transcription content, meetings' transcripts and minutes could include attachments, timed hyperlinks, slideshow presentation events, and other metadata.

Please find some preliminary and rough-draft sketches below, showcasing these features, utilizing a WebVTT-metadata-inspired format with an extensible and in-progress JSON schema.

## Speech Transcription

As envisioned, meetings' transcripts would be composed mostly of timed, transcribed speech content.

Participants' speech needn't be transcribed to plain text. Alternatives include SSML, HTML, and LaTeX.

```
00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent" : {
    "@type": "person",
    "fullName" : "Alice Smith",
    "position" : ["Mathematics Instructor"]
  },
  "data": [{
    "@type": "data",
    "mimeType" : "text/latex",
    "value": "See, here, that the value is $x^{2}$."
  }]
}
```

```
00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent" : {
    "@type": "person",
    "fullName" : "Alice Smith",
    "position" : ["Senator", "Co-chair", "Attendee"]
  },
  "data": [{
    "@type": "data",
    "mimeType" : "text/plain",
    "value": "Without objection, the presenter's slides are entered into the minutes."
  }]
}
```
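As a rough illustration of how a consumer might read such cues, here is a minimal Python sketch. It assumes each cue is a timing line in the short `mm:ss.ttt` form used above, followed by a payload that is strict JSON (no trailing commas). The helper name `parse_cue` and the keys of the returned dictionary are hypothetical and not part of the proposed format.

```python
import json
import re

# Matches the short "mm:ss.ttt --> mm:ss.ttt" timing lines used in the sketches above.
# Assumption: hours are omitted, as in every example in this issue.
CUE_TIMING = re.compile(r"^(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2})\.(\d{3})$")


def parse_cue(block: str) -> dict:
    """Split one cue into start/end times (in seconds) and its decoded JSON payload."""
    timing_line, _, payload = block.strip().partition("\n")
    match = CUE_TIMING.match(timing_line.strip())
    if match is None:
        raise ValueError(f"unrecognized cue timing: {timing_line!r}")
    m1, s1, ms1, m2, s2, ms2 = (int(group) for group in match.groups())
    return {
        "start": m1 * 60 + s1 + ms1 / 1000,
        "end": m2 * 60 + s2 + ms2 / 1000,
        "payload": json.loads(payload),  # raises on malformed JSON
    }


cue = parse_cue("""00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent": {
    "@type": "person",
    "fullName": "Alice Smith",
    "position": ["Mathematics Instructor"]
  },
  "data": [{
    "@type": "data",
    "mimeType": "text/latex",
    "value": "See, here, that the value is $x^{2}$."
  }]
}""")

print(cue["start"], cue["end"], cue["payload"]["agent"]["fullName"])
```

A full implementation would also need to handle the `WEBVTT` file header, cue identifiers, and timestamps that include hours.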

## Attachments

Files could be attached to meetings' minutes and transcripts.

```
00:05.000 --> 00:05.000
{
  "@type": "attachment",
  "agent" : {
    "@type": "person",
    "fullName" : "Charles Brown",
    "position" : ["Secretary", "Attendee"]
  },
  "data": [{
    "@type": "link",
    "mimeType" : "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "href" : "files/panelist-presentation-1.pptx",
    "metadata" : {
      "author" : {
         "@type": "person",
         "fullName" : "David Jackson"
      },
      "title" : "A Slideshow Presentation"
    }
  },
  {
    "@type": "link",
    "mimeType" : "application/pdf",
    "href" : "files/panelist-presentation-1.pdf",
    "metadata" : {
      "author" : {
         "@type": "person",
         "fullName" : "David Jackson"
      },
      "title" : "A Slideshow Presentation"
    }
  }]
}
```

## Timed Hyperlinks

Timed hyperlinks could be entered into minutes, then viewed and followed by meetings' audiences.

```
00:05.000 --> 00:35.000
{
  "@type": "hyperlink",
  "agent" : {
    "@type": "person",
    "fullName" : "Charles Brown",
    "position" : ["Secretary", "Attendee"]
  },
  "data": [{
    "@type": "link",
    "mimeType" : "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "href" : "files/panelist-presentation-1.pptx",
    "metadata" : {
      "author" : {
         "@type": "person",
         "fullName" : "David Jackson"
      },
      "title" : "A Slideshow Presentation"
    }
  },
  {
    "@type": "link",
    "mimeType" : "application/pdf",
    "href" : "files/panelist-presentation-1.pdf",
    "metadata" : {
      "author" : {
         "@type": "person",
         "fullName" : "David Jackson"
      },
      "title" : "A Slideshow Presentation"
    }
  }]
}
```

## Slideshow Presentation Events

As presenters present their slideshows during meetings, events could be generated whenever they advance or change slides. These events could be entered into meetings' transcripts and minutes, and could be consumed by both audiences and multimodal AI systems (see the artificial-intelligence use case below).

WebVTT-based timed thumbnails could be useful for providing images of slideshow presentations' slides. These images could also be accompanied by hyperlinks to individual slides (e.g., `"files/slideshow.pptx#3"`).
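To make the idea more concrete, here is a minimal Python sketch of how presentation software might emit a cue when a presenter changes slides. Everything specific in it is an assumption made for illustration: the `"slideChange"` type name, the helper names `format_timestamp` and `slide_change_cue`, and the choice to reuse the `link` entry shape from the attachment sketches with a `#<slide number>` fragment.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render seconds as the short "mm:ss.ttt" form used in the sketches above."""
    total_ms = round(seconds * 1000)
    minutes, remainder = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def slide_change_cue(start: float, end: float, presenter: str,
                     deck_href: str, slide_number: int) -> str:
    """Build one cue covering the interval during which a slide is shown."""
    payload = {
        "@type": "slideChange",  # hypothetical type name, not defined in this proposal
        "agent": {"@type": "person", "fullName": presenter},
        "data": [{
            "@type": "link",
            # Fragment identifies the slide, as in "files/slideshow.pptx#3".
            "href": f"{deck_href}#{slide_number}",
        }],
    }
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{timing}\n{json.dumps(payload, indent=2)}"


slide_cue = slide_change_cue(35.0, 50.0, "David Jackson", "files/slideshow.pptx", 3)
print(slide_cue)
```

Meetings longer than 99 minutes would need timestamps with an hours component, which this sketch does not produce.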

## Other Metadata

Metadata about meetings (e.g., their secretaries, scribes, or transcribing agents, their lists of attendees, their venues, their enabling software tools) could also be placed into meetings' transcripts and minutes.

```
00:00.000 --> 00:00.000
{
  "@type" : "metadata",
  "agent" : {
    "@type": "person",
    "fullName" : "Charles Brown",
    "position" : ["Secretary", "Attendee"]
  },
  "property": "transcribingAgent",
  "data": [{
    "@type": "person",
    "fullName" : "Charles Brown"
  }]
}
```

## Use Case: Artificial Intelligence

A new and important use case for meetings' transcripts and minutes is artificial-intelligence systems, e.g., multimodal large language models (MLLMs), consuming these data. AI systems will be able to answer questions about and engage in dialogues about meetings (see, for example: [1]).

[1] Golany, Lotem, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, and Ido Dagan. "Efficient data generation for source-grounded information-seeking dialogs: A use case for meeting transcripts." (2024). [[arXiv](https://arxiv.org/abs/2405.01121)] [[GitHub](https://github.com/google-research-datasets/MISeD)]
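One way an AI system might consume these data is to flatten the speech entries into speaker-attributed text lines before placing them in a model's context. The Python sketch below assumes the cues have already been decoded into dictionaries with `start` (seconds) and `payload` keys; that in-memory shape and the function name `to_prompt_lines` are hypothetical, and non-speech entries are simply skipped.

```python
def to_prompt_lines(cues: list[dict]) -> list[str]:
    """Turn decoded cues into "[time] speaker: text" lines, in chronological order."""
    lines = []
    for cue in sorted(cues, key=lambda c: c["start"]):
        payload = cue["payload"]
        if payload.get("@type") != "speech":
            continue  # attachments, hyperlinks, and metadata are ignored in this sketch
        speaker = payload["agent"]["fullName"]
        for item in payload.get("data", []):
            if "value" in item:
                lines.append(f"[{cue['start']:.1f}s] {speaker}: {item['value']}")
    return lines


# Sample input in the assumed in-memory shape, based on the examples above.
cues = [
    {"start": 5.0, "payload": {"@type": "attachment", "agent": {"fullName": "Charles Brown"}, "data": []}},
    {"start": 0.0, "payload": {
        "@type": "speech",
        "agent": {"@type": "person", "fullName": "Alice Smith"},
        "data": [{
            "@type": "data",
            "mimeType": "text/plain",
            "value": "Without objection, the presenter's slides are entered into the minutes.",
        }],
    }},
]

for line in to_prompt_lines(cues):
    print(line)
```

A real pipeline would likely also convert non-plain-text payloads (SSML, HTML, LaTeX) to a form the model handles well, and could include attachments and slide events as additional context.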

## Conclusion

Thank you for considering this proposal for a work item for the WebRTC Working Group charter.

Please view or discuss this issue at https://github.com/w3c/webrtc-charter/issues/84 using your GitHub account



Received on Thursday, 11 July 2024 02:22:03 UTC