- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sat, 21 Apr 2012 16:10:46 +1000
- To: Glenn Maynard <glenn@zewt.org>
- Cc: David Singer <singer@apple.com>, public-texttracks@w3.org
On Sat, Apr 21, 2012 at 2:39 PM, Glenn Maynard <glenn@zewt.org> wrote:
> On Fri, Apr 20, 2012 at 10:59 PM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>>
>> > On Fri, Apr 20, 2012 at 12:01 AM, David Singer <singer@apple.com> wrote:
>> >> Yes, I think we want some readability. And maybe assuming all values
>> >> could be on one line is bad.
>> >
>> > (I agree with the first, but the second isn't a problem: any JSON
>> > document
>> > can be stored on one line.)
>>
>> I believe David wanted to make sure values don't *have* to be on one
>> line, which your proposal seemed to suggest?
>
>
> I mean: it's perfectly safe to assume that all values *can* be stored on a
> single line with JSON; the second (assuming all values *can* be on one line)
> is safe. Newlines within the string are encoded with \n (per JSON), and
> there's no inherent limit to how long a line can be.
>
> But it's the first that's the problem--it's ugly and hard to read with a
> text editor, even though it would work.
>
>> > Key: value
>> >  value
>> >   value2
>> >
>> > would decode to "value\nvalue\n value".
>>
>> Incidentally, multi-line values in HTTP headers also require white space
>> at the start.
>
>
> Right, but it's one or more of any whitespace, so you can't tell how much
> whitespace was there to begin with. If it's exactly one space (like
> patches), it doesn't have that problem.
>
>> I am not sure we can, though, without changing the parsing of WebVTT.
>
>
> We could use a simple escape mechanism: if a header value line begins
> with a backslash, remove it. So, you get the following:
>
> Key: |
> line 1
> \.
> \
> \\
> .
>
> In effect, "\." at the start of a line represents a period, "\" on a line by
> itself represents a blank line, and "\\" at the start of a line is a single
> backslash (escaping the escape character), but defined as a single trivial
> rule instead of a list of escape sequences.
>
> This means--unless I'm missing a case--that *any* valid block of UTF-8 text
> will round-trip. We don't strictly need that now (the only use case so far
> for multiline comments is CSS), but it seems like a useful property to have
> going forward.
>
> Also, these are infrequent enough that they wouldn't uglify source very
> much.
>
> (One tangential detail: the final newline before the terminating "." line
> should not be included in the resulting header data, or else it would be
> impossible to encode a string that doesn't end with a newline.)

OK, it seems we've just created ourselves a new name-value format
spec. As much as I wanted to avoid this (sigh), I think this is the
simplest we've had thus far.

Something like:

1. Name-value pairs of header metadata are given with a name-string
separated from the value by a colon. No control characters or
separators are allowed in the name. No white space is allowed
between the name and the colon (?).

2. If the value is a single "|" character, the value is multi-line:
it starts on the next line and ends with a line that contains only a
single dot. The newline just before the dot-line is not part of
the value.

A quick-and-dirty ABNF could be:

metadata-header = field-name ":" field-value
field-name = token
field-value = ("|" *TEXT CRLF "." CRLF) | (*TEXT without CRLF)

Sound roughly right?

Cheers,
Silvia.
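For illustration only, here is a rough Python sketch of how the rules discussed in this message might be implemented: the "|" multi-line marker, the terminating "." line, and the strip-one-leading-backslash escape. The function names (decode_value, encode_value, parse_header) are hypothetical, and stripping the whitespace after the colon is an assumption, since the thread does not settle that detail. It is not a proposed normative parser.

```python
def decode_value(lines):
    """Decode the lines that follow a 'Key: |' header line.

    Reads up to the terminating "." line. A single leading backslash is
    removed from any line that has one. Joining with "\n" leaves out the
    newline just before the dot-line, as suggested in the thread.
    """
    out = []
    for line in lines:
        if line == ".":
            return "\n".join(out)
        if line.startswith("\\"):
            line = line[1:]
        out.append(line)
    raise ValueError("unterminated multi-line value")


def encode_value(text):
    """Encode text as value lines, ending with the "." terminator line."""
    out = []
    for line in text.split("\n"):
        # Escape lines that would otherwise be misread: a lone ".",
        # a blank line, or a line that already starts with a backslash.
        if line == "." or line == "" or line.startswith("\\"):
            line = "\\" + line
        out.append(line)
    out.append(".")
    return out


def parse_header(lines):
    """Parse one metadata header; lines[0] is the 'name: value' line."""
    name, sep, rest = lines[0].partition(":")
    # Tentative rule from the message: no white space before the colon.
    if not sep or not name or name != name.rstrip():
        raise ValueError("malformed header line")
    value = rest.strip()  # assumption: surrounding whitespace is ignored
    if value == "|":
        return name, decode_value(lines[1:])
    return name, value


# The example block from the quoted message:
example = ["Key: |", "line 1", "\\.", "\\", "\\\\", "."]
name, value = parse_header(example)
# value is "line 1", ".", "" and "\" joined by newlines
```

Because the escape is a single rule (drop one leading backslash), the encoder only has to prefix the three kinds of line that would otherwise be ambiguous, and decode_value(encode_value(text)) should return the original text, including text that ends with a newline.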
Received on Saturday, 21 April 2012 06:11:45 UTC