Re: 608 samples from Christian Vogler on 2014-04-08 (public-texttracks@w3.org from April 2014)

From: Christian Vogler <christian.vogler@gallaudet.edu>
Date: Tue, 8 Apr 2014 12:55:38 -0400
To: Philip Jägenstedt <philipj@opera.com>
Cc: "public-texttracks@w3.org" <public-texttracks@w3.org>, Ken Harrenstien <klh@google.com>, Jean-Baptiste Kempf <jb@videolan.org>
Message-ID: <CAHVQVp2m6Pqi4hsYV+71DCuQZ3mZNGD_djT9SeaSEgPNHqxsjA@mail.gmail.com>
In CEA-608 they are defined in tables 5-10 (at least in the older revision
608a, which I got for cheap).

Hi == 0x12 with low between 0x20 and 0x3f are Spanish, miscellaneous, and
French special characters; Hi == 0x13 are Portuguese, German, and Danish
characters.

A full list also seems to be here:
http://en.wikipedia.org/wiki/EIA-608#Characters

In practice - for the US CEA 608 to WebVTT conversion - only 0x12 20 to
0x12 2f are relevant. The mapping for these to Unicode code points is
encapsulated in this table that I use, indexed starting from 0x20 through
0x2f:

/* CC non-Latin1 code mappings */
static const uint16_t specialchar[] = {
  174 /* ® */, 176 /* ° */, 189 /* ½ */, 191 /* ¿ */,
  0x2122 /* ™ */, 162 /* ¢ */, 163 /* £ */, 0x266A /* ♪ */,
  224 /* à */, TRANSP_SPACE, 232 /* è */, 226 /* â */,
  234 /* ê */, 238 /* î */, 244 /* ô */, 251 /* û */
};

TRANSP_SPACE is a non-breaking space, and doesn't correspond to any
Unicode code point.
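
As a rough illustration of how such a table might be used when emitting
WebVTT text, here is a minimal sketch, not the actual decoder code. The
names special_to_unicode, utf8_encode, SPECIAL_HI and SPECIAL_BASE are
hypothetical, and defining TRANSP_SPACE as U+00A0 is only a stand-in
chosen for this example. The hi byte and base offset follow the range
given above and are kept as named constants, since other references may
number this set differently. Parity bits are assumed to be stripped
already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: U+00A0 is used as a stand-in for the transparent space. */
#define TRANSP_SPACE 0x00A0

/* Range as stated in the message; adjust if your reference differs. */
#define SPECIAL_HI   0x12
#define SPECIAL_BASE 0x20

static const uint16_t specialchar[16] = {
    174, 176, 189, 191, 0x2122, 162, 163, 0x266A,
    224, TRANSP_SPACE, 232, 226, 234, 238, 244, 251
};

/* Return the Unicode code point for a byte pair, or 0 if the pair
 * falls outside the 16-entry table. */
static uint16_t special_to_unicode(uint8_t hi, uint8_t lo)
{
    if (hi != SPECIAL_HI || lo < SPECIAL_BASE || lo > SPECIAL_BASE + 15)
        return 0;
    return specialchar[lo - SPECIAL_BASE];
}

/* Encode a BMP code point as UTF-8 into out (at least 3 bytes).
 * Returns the number of bytes written (1 to 3). */
static int utf8_encode(uint16_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

A caller would pass each two-byte pair to special_to_unicode and, when
the result is nonzero, append the bytes produced by utf8_encode to the
cue text.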


For muxing, my understanding is that the CPC authoring tools do that kind
of thing.

Christian


On Apr 8, 2014 7:57 AM, "Philip Jägenstedt" <philipj@opera.com> wrote:

> Thanks jb, that looks useful as a reference. Would you happen to know
> where the extended characters "THAT COME FROM HI BYTE=0x12 AND LOW
> BETWEEN 0x20 AND 0x3F" and "THAT COME FROM HI BYTE=0x13 AND LOW
> BETWEEN 0x20 AND 0x3F" are actually defined? Also, do you know of a
> tool to mux arbitrary 608 data with an existing caption-less file for
> testing?
>
> Philip
>
> On Tue, Apr 8, 2014 at 12:27 PM, Jean-Baptiste Kempf <jb@videolan.org>
> wrote:
> > VLC has a 608 decoder that should support all of 608, including roll-up
> > captions (unlike Xine), and has been tested with actual NTSC streams.
> > http://git.videolan.org/?p=vlc.git;a=blob;f=modules/codec/cc.c;hb=HEAD
> >
> > On 01 Apr, Christian Vogler wrote :
> >> There are also at least two open-source projects that have CEA-608
> >> caption decoders - Xine (for the subset that is used on DVDs in the
> >> libspucc/ source folder) and CCExtractor. Xine doesn't support roll-up
> >> captions, since they never appear on DVDs, but it handles pretty much
> >> everything else, including a file that Giovanni Galvez threw at me for
> >> testing a couple years ago.
> >>
> >> Christian
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Mar 31, 2014 at 2:39 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> >>
> >> > Thank you Ken!
> >> >
> >> > I remember now that SCC was one of the standalone 608 formats you
> >> > mentioned at FOMS. The raw essence is exactly what I'm interested in,
> >> > so that sounds very promising.
> >> >
> >> > I've asked to order a copy of "The Closed Captioning Handbook" for my
> >> > office, it looks very relevant to what I do.
> >> >
> >> > Philip
> >> >
> >> > On Tue, Apr 1, 2014 at 1:10 AM, Ken Harrenstien <klh@google.com> wrote:
> >> > > Giovanni Galvez is still there and still super helpful.
> >> > > Their software is widely used in the industry for format
> >> > > conversion.
> >> > >
> >> > > We host one of their demo videos at
> >> > > http://www.youtube.com/watch?v=BbqPe-IceP4
> >> > >
> >> > > and I'm sure Giovanni can send you the corresponding SCC
> >> > > files for that or any other demo video they have.  The reason
> >> > > I suggest SCC is that this format contains the raw 608 essence
> >> > > that we care about; in fact, this is YouTube's preferred upload
> >> > > format for movie/TV content.  If you want to know how to extract
> >> > > those bytes from a video file, then you have a much harder task given
> >> > > the multitude of video containers and formats.
> >> > >
> >> > > And yes, the CFR link, while terse, does contain pretty much all
> >> > > of the important bits.  The CEA documents are mostly about
> >> > > XDS data, which has nothing to do with captions.  For
> >> > > purposes of WebVTT conversion a much better place to start
> >> > > learning about 608 is the "Closed Captioning Handbook" by Gary
> >> > > Robson, which should still be available on Amazon.  I like it
> >> > > because it's very readable and has so much other interesting context.
> >> > >
> >> > > On the other hand, if you plan to implement some kind of
> >> > > cable set-top box, then yes, you'll need the CEA documents
> >> > > plus several other specs.
> >> > >
> >> > > --Ken
> >> > >
> >> > >
> >> > > On Mon, Mar 31, 2014 at 8:58 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> >> > >>
> >> > >> Do you mean http://www.cpcweb.com/webcasts/webcast_samples.htm ?
> >> > >>
> >> > >> What I'm looking for is the actual video file that contains the 608
> >> > >> data, preferably with some clue about how to extract it as well :)
> >> > >>
> >> > >> Philip
> >> > >>
> >> > >> On Mon, Mar 31, 2014 at 10:16 PM, Christian Vogler
> >> > >> <christian.vogler@gallaudet.edu> wrote:
> >> > >> > Gio Galvez at CPC did a video like that. His company was bought
> >> > >> > out, but it might still be possible to get access. Should I ask?
> >> > >> >
> >> > >> > Sent from my mobile phone.  Please excuse any touchscreen-induced
> >> > >> > weirdness.
> >> > >> >
> >> > >> > On Mar 31, 2014 9:53 AM, "Philip Jägenstedt" <philipj@opera.com> wrote:
> >> > >> >>
> >> > >> >> Hi all,
> >> > >> >>
> >> > >> >> Does anyone have access to 608 caption data and recommendations
> >> > >> >> for software that is known to render it correctly? I'd like to
> >> > >> >> understand the 608 model at the lowest level, but it's hard
> >> > >> >> without examples. I'm guessing that people who have worked on
> >> > >> >> 608 to WebVTT already have sample files and scripts to process
> >> > >> >> them, so anything like that would be appreciated.
> >> > >> >>
> >> > >> >> Also, the spec is incredibly brief, is there really nothing
> >> > >> >> better than this?
> >> > >> >>
> >> > >> >> http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/47cfr15.119.pdf
> >> > >> >>
> >> > >> >> Philip
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Christian Vogler, PhD
> >> Director, Technology Access Program
> >> Department of Communication Studies
> >> SLCC 1116
> >> Gallaudet University
> >> http://tap.gallaudet.edu/
> >> VP: 202-250-2795
> >
> > --
> > With my kindest regards,
> >
> > --
> > Jean-Baptiste Kempf
> > http://www.jbkempf.com/ - +33 672 704 734
> > Sent from my Electronic Device
>
Received on Tuesday, 8 April 2014 16:56:03 UTC