
Re: pfps-04 (why the thread is germane to pfps-04)

From: Frank Manola <fmanola@mitre.org>
Date: Tue, 29 Jul 2003 12:25:26 -0400
Message-ID: <3F269FF6.27187643@mitre.org>
To: Graham Klyne <GK-lists@ninebynine.org>
CC: pat hayes <phayes@ihmc.us>, Martin Duerst <duerst@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org

Graham--

I understand your point (and I looked at what CHARMOD had to say about
octets before I replied).  My point about context, though, was that
we're using the term "octet" in the context of a discussion that uses a
set of different terms ("octet" being only one of them;  "character",
"text", "string" being others) in order to make specific distinctions
between the concepts those terms represent.  We may get a clearer
definition of "octet" from somewhere else, but that definition may not
draw the same distinctions vis-à-vis the other terms we're using that
CHARMOD (which I thought we considered normative) draws.

--Frank


Graham Klyne wrote:
> 
> Frank,
> 
> You make a fair point, except that CHARMOD does not seem to offer a clear
> definition.
> 
> The best I could find is:
> 
> [[
> 3.1.6 Units of storage
> 
> Computer storage and communication rely on units of physical storage and
> information interchange, such as bits and bytes (also known as octets, as
> nowadays the word bytes is generally considered to mean 8-bit bytes). A
> frequent error in specifications and implementations is the equating of
> characters with units of physical storage. The mapping between characters
> and such units of storage is actually quite complex, and is discussed in
> the next section, 3.2 Digital Encoding of Characters.
> 
> [S] [I] Specifications and software MUST NOT assume a one-to-one
> relationship between characters and units of physical storage.
> ]]
> -- http://www.w3.org/TR/charmod/#sec-Storage
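The following small Python sketch illustrates the CHARMOD point. It is not taken from CHARMOD. It encodes four characters under three standard encodings and counts the octets each character occupies. It assumes Python's built-in codecs. The "-le" variants are used so that no byte-order mark is added to the count.

```python
# Four characters: 'a', 'é' (U+00E9), '€' (U+20AC), and the musical G clef
# (U+1D11E), which lies outside the Basic Multilingual Plane.
text = "a\u00e9\u20ac\U0001D11E"
print("characters:", len(text))

# The "-le" codecs are used so that no byte-order mark is prepended.
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    per_char = [len(ch.encode(encoding)) for ch in text]
    total = len(text.encode(encoding))
    print(f"{encoding:10} octets per character: {per_char}, total: {total}")
```

In UTF-8 the per-character count runs from 1 to 4 octets. In UTF-16 it is 2 or 4. Only UTF-32 is uniform, at 4 octets per character. So no single octet count can stand in for a character count.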
> 
> Anyway, I didn't mean to get into a protracted debate here, just wanted to
> try and reduce the number of imponderables.
> 
> #g
> --
> 
> At 11:21 29/07/03 -0400, Frank Manola wrote:
> >It seems to me that, given the context of this discussion, we might try
> >to stick to the terminology (and distinctions made between the terms) in
> >CHARMOD if we can.  At least that's a single document...
> >
> >--Frank
> >
> >Graham Klyne wrote:
> > >
> > > At 00:46 29/07/03 -0500, pat hayes wrote:
> > >
> > > >>Are 'binary octets' different from 'octets'?
> > > >
> > > >I have absolutely no idea. :-)
> > >
> > > Noticing that we're bandying around this term 'octets', apparently without
> > > understanding what they are, I thought I'd dig out some definitions...
> > >
> > > I see an octet as a sequence of 8 bits, where a bit is one of {0,1}.  Octet
> > > instances are often described by a number in the range 0..255, using the
> > > usual correspondence between binary numbers and bits, subject to agreeing
> > > whether the most significant or the least significant bit comes first.  In
> > > either case, the relationship is 1:1.
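Here is a minimal Python sketch of that octet/number correspondence. It assumes an octet is modeled as a tuple of eight 0/1 values. The names `octet_to_int` and `int_to_octet` are hypothetical and chosen only for illustration. The sketch checks that both bit orders give a 1:1 mapping onto 0..255. The two orders differ only in which number a given bit pattern denotes.

```python
from itertools import product

def octet_to_int(bits, msb_first=True):
    """Interpret a tuple of 8 bits as an integer in 0..255."""
    if len(bits) != 8 or any(b not in (0, 1) for b in bits):
        raise ValueError("an octet is exactly 8 bits, each 0 or 1")
    ordered = bits if msb_first else tuple(reversed(bits))
    value = 0
    for b in ordered:
        value = (value << 1) | b  # shift in one bit at a time
    return value

def int_to_octet(value, msb_first=True):
    """Inverse of octet_to_int for values in 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("octet values lie in 0..255")
    bits = tuple((value >> shift) & 1 for shift in range(7, -1, -1))
    return bits if msb_first else tuple(reversed(bits))

# Under either convention, the 256 bit patterns map onto exactly 0..255.
for msb_first in (True, False):
    images = {octet_to_int(bits, msb_first) for bits in product((0, 1), repeat=8)}
    assert images == set(range(256))

pattern = (0, 0, 0, 0, 0, 0, 0, 1)
print(octet_to_int(pattern))                   # most significant first: 1
print(octet_to_int(pattern, msb_first=False))  # least significant first: 128
```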
> > >
> > > The UTF-8 spec avoids the bit-ordering issue by simply describing bits as
> > > "high order" or "low order", which establishes a single direct relationship
> > > between the individual bits and the numbers 0..255.
> > >
> > > [[
> > > In UTF-8, characters are encoded using sequences of 1 to 6 octets. The only
> > > octet of a "sequence" of one has the higher-order bit set to 0, the
> > > remaining 7 bits being used to encode the character value. In a sequence of
> > > n octets, n>1, the initial octet has the n higher-order bits set to 1,
> > > followed by a bit set to 0. The remaining bit(s) of that octet contain bits
> > > from the value of the character to be encoded. The following octet(s) all
> > > have the higher-order bit set to 1 and the following bit set to 0, leaving
> > > 6 bits in each to contain bits from the character to be encoded.
> > > ]]
> > > -- http://www.rfc-editor.org/rfc/rfc2279.txt
> > >
> > > The UTF-8 spec generally presents octet values as hexadecimal numerals.
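The RFC 2279 description quoted above translates fairly directly into code. The Python sketch below is one possible reading of it, not code from the RFC. The name `utf8_encode_rfc2279` is hypothetical. Like RFC 2279, it accepts character values up to 0x7FFFFFFF, which take six octets. The later RFC 3629 restricts UTF-8 to at most four octets (values up to U+10FFFF). Output octets are printed as hexadecimal numerals, following the spec's convention.

```python
# Inclusive upper bound on the character value for sequence lengths 1..6.
LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]

def utf8_encode_rfc2279(value):
    """Encode one character value as a list of octets (ints 0..255)."""
    if not 0 <= value <= LIMITS[-1]:
        raise ValueError("value out of range for RFC 2279 UTF-8")
    n = next(i + 1 for i, limit in enumerate(LIMITS) if value <= limit)
    if n == 1:
        # A "sequence" of one: high-order bit 0, 7 bits of character value.
        return [value]
    # Following octets: high-order bits '10', then 6 bits of the value each,
    # taken from the low-order end of the value.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    # Initial octet: n high-order bits set to 1, then a 0 bit, then the
    # remaining high-order bits of the value (which fit below that 0 bit).
    lead_marker = (0xFF << (8 - n)) & 0xFF
    return [lead_marker | value] + tail[::-1]

for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    octets = utf8_encode_rfc2279(cp)
    # Up to U+10FFFF (surrogates excluded) this matches Python's UTF-8 codec.
    assert bytes(octets) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}:", " ".join(f"{o:02X}" for o in octets))

# A six-octet sequence, allowed by RFC 2279 but not by RFC 3629.
print("U+7FFFFFFF:", " ".join(f"{o:02X}" for o in utf8_encode_rfc2279(0x7FFFFFFF)))
```

This should print `41`, `C3 A9`, `E2 82 AC`, `F0 9F 98 80`, and finally `FD BF BF BF BF BF`.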
> > >
> > > Dan Connolly offers a slightly different form of definition:
> > > [[
> > > octet
> > >      an element of the set {0, 1, 2, ..., 255}
> > > ]]
> > > http://www.w3.org/MarkUp/html-spec/charset-harmful.html
> > >
> > > Some others:
> > >
> > > [[
> > > octet: A byte of eight binary digits usually operated upon as an entity.
> > > ]]
> > > -- http://glossary.its.bldrdoc.gov/fs-1037/dir-025/_3631.htm
> > > -- http://www.atis.org/tg2k/_octet.html
> > >
> > > [[
> > > Definition for: octet
> > >
> > > Eight bits. Octet is sometimes used instead of the term byte to avoid
> > > confusion, because not all computer systems use bytes that are eight
> > > bits long.
> > > ]]
> > > -- http://www.computeruser.com/resources/dictionary/definition.html?lookup=3442
> > >
> > > Googling for "octet definition" turns up plenty more.
> > >
> > > Looking for definitions of "binary octet" doesn't show up anything
> > > especially useful, but the pattern of its use suggests one of two things:
> > > (a) octet values represented as 8 bits (as opposed to, say, a number)
> > > (b) octets used to encode binary data (as opposed to textual data).
> > >
> > > Anyway, returning to the original question (Are 'binary octets' different
> > > from 'octets'?), I think the answer is:  not for any meaningful purpose as
> > > far as RDF is concerned.
> > >
> > > #g
> > >
> > > -------------------
> > > Graham Klyne
> > > <GK@NineByNine.org>
> > > PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
> >
> >--
> >Frank Manola                   The MITRE Corporation
> >202 Burlington Road, MS A345   Bedford, MA 01730-1420
> >mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
> 
> -------------------
> Graham Klyne
> <GK@NineByNine.org>
> PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
Received on Tuesday, 29 July 2003 12:25:44 GMT
