Creating detailed markup for transcripts on AudioObjects from Michael Fienen on 2019-10-02 (public-schemaorg@w3.org from October 2019)

From: Michael Fienen <fienen@gmail.com>
Date: Wed, 2 Oct 2019 10:23:23 -0500
To: public-schemaorg@w3.org
Message-ID: <CAO_==X1NyF4gudpy6AXWd4bqr3R2H5Kr+SbBJsYD1GUX+er12w@mail.gmail.com>

So, I've been doing research into improving transcript accessibility for
audio files (in particular, podcasts). Unfortunately, there's very little
advice out there on "proper" markup for transcript accessibility, and the
W3C's base recommendation is little more than a series of <p> tags with
names in <strong>. Certainly, this can make it very screen reader friendly,
but it definitely doesn't make it easy to transform or convey any meaning
to machine readers.

I turned to schema.org thinking maybe there was some sort of vocabulary
that might be applicable here. As it turns out, AudioObjects do support a
transcript property, but similar to other recommendations, the support for
actual information in the transcript is nonexistent. It just takes a text
blob. So you have an option, but it's not exactly the best it could be, IMO.
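
For reference, here is a minimal sketch of what the current property allows, built as JSON-LD in Python. The episode name, URL, and transcript wording are placeholders made up for illustration; only the AudioObject type and its transcript property come from schema.org.

```python
import json

# Placeholder transcript: speaker names are only a convention inside the string,
# nothing in the vocabulary marks them up as speakers.
transcript_text = (
    "Host: Welcome to the show.\n"
    "Guest: Thanks for having me."
)

# AudioObject with the transcript supplied as one plain text value.
# The name and contentUrl values are hypothetical.
audio_object = {
    "@context": "https://schema.org",
    "@type": "AudioObject",
    "name": "Example episode",
    "contentUrl": "https://example.com/episode-1.mp3",
    "transcript": transcript_text,
}

json_ld = json.dumps(audio_object, indent=2)
print(json_ld)
```

Who spoke, and when, is only recoverable by parsing the string, which is the limitation described above.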

I'm wondering if anyone has worked on, or is interested in collaborating on,
extending the spec for transcripts to include stuff like speaker
identification, timecoding, etc. My hope is to create something detailed
for users who generate transcripts for their content so that it can easily
be moved around, potentially be granted better meaning by search crawlers,
etc.
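
As a rough illustration of the kind of detail meant here, the sketch below models a transcript as a list of segments with a speaker and timecodes. The Segment class and every field name in it are hypothetical; none of them are existing schema.org terms, and this is only one possible shape, not a proposal for the actual property names.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    """One stretch of speech. All field names are hypothetical, not schema.org terms."""
    speaker: str
    start_seconds: float
    end_seconds: float
    text: str


def format_timecode(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping any fractional part."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_plain_text(segments: list[Segment]) -> str:
    """Flatten segments into the single text value the transcript property accepts today."""
    return "\n".join(f"{s.speaker}: {s.text}" for s in segments)


segments = [
    Segment("Host", 0.0, 4.5, "Welcome to the show."),
    Segment("Guest", 4.5, 7.0, "Thanks for having me."),
]

# Structured form: speaker and timing survive as separate fields.
structured = json.dumps([asdict(s) for s in segments], indent=2)

# Flattened form: compatible with the current text-only property.
flat = to_plain_text(segments)

for s in segments:
    print(f"[{format_timecode(s.start_seconds)}] {s.speaker}: {s.text}")
```

The structured form can always be flattened into the text blob, but not the other way around without guessing, which is why carrying the detail in the markup itself seems worthwhile.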

Does that sound reasonable, or am I not seeing a more obvious, better
solution to the underlying question?

Michael Fienen

Received on Wednesday, 2 October 2019 15:23:57 UTC