
Re: defn "string" across XML infoset/query/schema, I18N specs

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Tue, 16 Jan 2001 19:31:41 +0000
Message-Id: <B0011189155@euvig1.dtc.lon.ime.reuters.com>
To: Dan Connolly <connolly@w3.org>
Cc: w3c-i18n-ig@w3.org, www-rdf-comments@w3.org, www-xml-infoset-comments@w3.org, www-xml-query-comments@w3.org, www-xml-schema-comments@w3.org


Is this the kind of definition you had in mind?


On 03/01/2001 17:41:54 Dan Connolly wrote:
> We were just discussing the infoset spec, and the
> lack of a definition of the term "string" there.
> In my head, a string is a finite sequence of
> unicode (UCS) characters. I suggested we say
> that in the infoset spec. It occurred
> to me that we should be consistent with the I18N
> character model, and that there should
> be some words that we can cite/steal...
> I don't see any clear mathematical specification
> of the term "string" in the spec.
> 4.3 String Identity Matching
> http://www.w3.org/TR/charmod/#IdentityMatching
> http://www.w3.org/TR/1999/WD-charmod-19990225#IdentityMatching
> Some text that looks relevant, though sort of garbled, is:
>    "Level 2: Indexing based on abstract codepoints
>              UCS codepoints should be chosen, in accordance
>              with Production [2] of [XML 1.0], the SGML
>              declaration of [HTML 4.0], and the character model
>              of [RFC 2070]. This is the highest level of
>              abstraction that ensures interoperability. To avoid
>              problems with duplicates, it is assumed that the
>              data is normalized according to Section 3.2. "
>    -- http://www.w3.org/TR/1999/WD-charmod-19990225#Indexing
> By "string" I mean a finite sequence of those things... the abstract
> things... it should be clear that these are characters, not
> (necessarily) identical to the integer codepoints to which they
> correspond.
> I wonder if a formal model would clarify. I started working on one a
> while back:
>    http://www.w3.org/Architecture/theory/Character.lsl
>    Mon, 15 Jan 1996 19:34:44 GMT
> but I haven't integrated it into my somewhat more recent, but still out
> of date stuff:
>    http://www.w3.org/XML/9711theory/XMLElement.
>    http://www.w3.org/XML/9711theory/XMLElement.lsl
>    http://www.w3.org/XML/9711theory/XMLElement.html
>    $Id: XMLElement.lsl,v 1.9 2000/01/17 21:33:41 connolly Exp $
> Meanwhile, the term "Character" is grounded in the web at:
>    http://www.w3.org/XML/2000/12/infoset-20001211#Character
> but in the parts of that RDF schema where one would expect
> to find #String, one finds just:
>    http://www.w3.org/2000/01/rdf-schema#Literal
> which is not constrained to be a sequence of characters;
> RDF literals can include markup etc.
> Hmm... I suspect the Query data model spec has a specification
> for character and string, but I haven't looked. So let's look...
> http://www.w3.org/TR/query-datamodel/
> http://www.w3.org/TR/2000/WD-query-datamodel-20000511/
> ah... it takes its definition of string from the schema spec...
> of course, I should have thought of that...
> Ah yes, this text will do nicely:
> [[[
> 3.2.1 string
>         [Definition:]  The string datatype represents character
>         strings in XML. The value space of string is the set of
>         finite-length sequences of characters (as defined in [XML
>         1.0 Recommendation (Second Edition)]) that match the
>         Char production from [XML 1.0 Recommendation
>         (Second Edition)]. A character is an atomic unit of
>         communication; it is not further specified except to note
>         that every character has a corresponding Universal Code
>         Set code point ([ISO 10646], [Unicode] and [Unicode3]),
>         which is an integer.
>              NOTE: As noted in Order, the fact
>              that this specification does not specify an
>              order-relation for string does not preclude
>              other applications from treating strings as
>              being ordered.
> ]]]
> http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#string
> Hm... I'm surprised by the restrictive clause
> "that match the Char production..."; do we really
> mean to exclude strings including the 0th character
> or the 1st character (ala CTRL-A) from XML strings?
> I guess so. Well, I learn something new every day.
> So the term "string" in the infoset spec refers to an
> item in the value space of the string datatype.
> Er... of course, the dependency should go the other way:
> the schema spec should import its definition of "string"
> from the character model spec, either directly, or
> indirectly, thru the infoset spec. The infoset spec
> should import its definition from the character model spec.
> Hmm... I'm not sure if scheduling that dependency is
> manageable, but that's how it *should* work, in theory.
> Hmm... the term string seems to have a home in the web...
> no, those hyperlinked "StringValue" terms refer to
> section 3.8 Values
> http://www.w3.org/TR/2000/WD-query-datamodel-20000511/#valueNode
> [Hmm... it would be great to "Webize" the notation used
> in the query data model spec
> http://www.w3.org/DesignIssues/Webize.html
> I suspect the result would be what we're after for the Semantic Web...
>    http://www.w3.org/DesignIssues/Semantic.html
>    http://www.w3.org/DesignIssues/Logic.html
>    http://www.w3.org/2000/01/sw/
> But I should send that request in a separate message...
> ]
> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/
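
The schema definition quoted above restricts the value space of string to
finite sequences of characters matching the Char production of XML 1.0
(Second Edition), each character having a corresponding integer code point.
A minimal sketch of that check follows. The function names is_xml_char,
is_xml_string and code_points are illustrative assumptions and do not come
from any of the cited specs; the ranges are those of production [2], Char.

```python
# Sketch only: helper names are hypothetical, not taken from any W3C spec.
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml_string(s: str) -> bool:
    """Return True if every character of s matches Char (the empty string qualifies)."""
    return all(is_xml_char(ch) for ch in s)


def code_points(s: str) -> list[int]:
    """Map each character of s to its corresponding integer code point."""
    return [ord(ch) for ch in s]


if __name__ == "__main__":
    for sample in ["plain text\n", "\x00", "ctrl\x01a"]:
        print(repr(sample), is_xml_string(sample), code_points(sample))
```

Under this reading, a string containing U+0000 or U+0001 (CTRL-A) falls
outside the value space, which is the restriction remarked on in the quoted
message. The sketch keeps the character sequence and the list of code points
as separate things, in line with the point that characters are not
necessarily identical to the integers they correspond to.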

Received on Tuesday, 16 January 2001 14:34:38 UTC
