FW: Even more simplified datatyping proposal

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com


------ Forwarded Message
From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 21 Feb 2002 09:01:18 +0200
To: ext Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Cc: Brian McBride <brian_mcbride@hp.com>, Pat Hayes <phayes@ai.uwf.edu>, ext
Jos De_Roo <jos.deroo.jd@belgium.agfa.com>
Subject: Re: Even more simplified datatyping proposal

On 2002-02-21 0:32, "ext Graham Klyne" <Graham.Klyne@MIMEsweeper.com> wrote:

> At 06:47 PM 2/20/02 +0200, you wrote:
>> On 2002-02-20 16:57, "ext Graham Klyne" <Graham.Klyne@MIMEsweeper.com>
>> wrote:
>> 
>>> Patrick,
>>> 
>>> I don't entirely trust my judgement at the moment (which is coded speak for
>>> I want to hear other comments before fully committing myself), but if there
>>> is no requirement to apply RDF scheme statements to RDF and/or RDFS
>>> vocabulary to indicate the designers intentions, then I think the thumbs
>>> are quite safe.
>>> 
>>> BUT, I think there may be a problem with the union type approach:
>>> 
>>> I'm going to take an example from ENUM, which is the IETF protocol and DDDS
>>> application I mentioned elsewhere.  The raw input to enum is a telephone
>>> number digit string, and this is mapped to a DNS domain name with the
>>> digits in reverse order:
>>> 
>>>  +441235123456  -->  6.5.4.3.2.1.5.3.2.1.4.4.e164.arpa
>>> 
>>> So, we have a datatype:  the value space is space of dotted DNS names, and
>>> the lexical space is the space of "+"-prefixed telephone numbers.  Both are
>>> character strings.
>>> 
>>> Let's call this datatype ietf:enum.
>>> 
>>> The intended use might be something like this:
>>> 
>>>   ex:dddskey ex:keyvalue _:x .
>>>   _:x ietf:enum "+441235123456"
>>> 
>>> or
>>> 
>>>   ex:dddskey ex:keyvalue "+441235123456" .
>>>   ex:keyvalue rdfs:range ietf:enum .
>>> 
>>> But what is the meaning of this:
>>> 
>>>   ex:dddskey ex:keyvalue "6.5.4.3.2.1.5.3.2.1.4.4.e164.arpa" .
>>>   ex:keyvalue rdfs:range ietf:enum .
>> 
>> It is an error (see below).
>> 
>>> Because the value space contains strings, they can appear as literals, as
>>> well as the intended lexical space strings.  A simple union type cannot be
>>> disambiguated in this case.
>> 
>> RDF does not provide representation for values. Period. It may provide
>> denotation of a single value by a unique bNode, but it *never* provides
>> for explicit representation of the value in the graph, ever.
> 
> But denoted values *can* be strings.  There is nothing to prevent syntactic
> elements of the RDF graph being in the universe of discourse (indeed,
> that's how the Herbrand lemma proof works).
> 
> I don't see why you can say that what I wrote above is an error.  The RDF
> N-triples syntax is OK.  And the denotation of the literal is a member of
> the class extension of I(ietf:enum) as I understand you are proposing it to
> be.

It's an error in the context of datatyping.

Or alternately, you could say that you have a non-cannonical lexical
space where for some members (the actual value strings) the mapping
function from lexical space to value space is the equality operator,
but that doesn't seem right for the ietf:enum example.

If you have a literal in a datatyping idiom/context, it is a lexical
form. That is imposed by the datatyping interpretation.

If you want to use some other non-datatyping idiom to express
string representable values otherwise, fine, e.g.

   _:1 ietf:enum "+441235123456" .
   _:1 actualValueRepresentation "6.5.4.3.2.1.5.3.2.1.4.4.e164.arpa" .

but the latter literal is not a lexical form. It's just a literal.
Saying

   _:1 ietf:enum "6.5.4.3.2.1.5.3.2.1.4.4.e164.arpa" .

is an error because "6.5.4.3.2.1.5.3.2.1.4.4.e164.arpa" is not
a member of the lexical space of ietf:enum and the datatyping
property says that it should be.

>> If it is a literal, it is treated as a lexical form. It's that simple.
> 
> If it is a literal, it *is* a lexical form, surely?

Only if literals are untidy. A actual lexical form is a member of
a datatype lexical space, and thus denotes a specific value.

A literal is just a literal, but in the context of a datatype
interpretation it is taken to *represent* a lexical form (be
string equal with a lexical form) and in the interpretation denote
a specific value.

The distinction is critical, and one that I've tried a few times
to communicate, but folks don't seem to get it. Pat certainly doesn't.
(He doesn't want to consider anything outside the MT semantics, which
 is a pity, because that's where datatype interpretation actually
happens...)

Graham, I'm CCing the others in this thread as well, rather than
maintaining two separate ones -- I had simply forgotten to include
you in the original CCline earlier...

> But not being a literal doesn't mean it cannot be a lexical form -- the
> universe of discourse can be anything we choose to pick upon.  Including
> lexical strings.
> 
>> If the above is not a valid lexical form, then it is an error.
>> 
>> The fact that values are also representable, theoretically, as literals
>> is irrelevant. Literals in a datatyping context are *always* lexical
>> forms (or an error).
> 
> Yes.
> 
>> OK?
> 
> Not quite, because (if I understand correctly) you're proposing to have the
> value space of a datatype be the union of its lexical and value spaces,

No. Let me clarify.

I am proposing that an "RDF Datatype Class" (i.e. rdfs:Datatype) contain
as its members the members of a datatype's value space and lexical space.

The datatype is not the same thing as the RDF Datatype Class. The latter
is the basis for datatyping in RDF and how one interprets a
literal-in-context to obtain a value. The former is external theory
adopted mostly from XML Schema.

An rdfs:Datatype is a particular treatment of a 'datatype' as defined
by XML Schema and the Foundational MT.

You could say that an RDF Datatype Class contains all the members
of both spaces of a datatype, irrespective of the internal space
partitions, but we can re-partition them as needed using range
intersection with either rdfs:Literal or rdfs:Resource and the
graph syntax makes interpretation unambiguous and consistent.

The reason why the RDF Datatype Class has to have both lexical forms
and values as its members is to allow a consistent interpretation
for both the inline and datatype triple idioms, since the property
value can be either a literal or a bNode (a lexical form or a value).

I appreciate that it is a somewhat radical view, at first, but
if you look through the commented examples I sent last night, I
think (hope) it will become clear that it's really quite simple.

(it just seems complicated because it's so different than how we've
 been looking at datatype URIs -- with such strict and fine precision
 of which actual space they denote, so the union idea at first may
 feel sloppy or fuzzy, but in fact, it's just the right amount of
 flexibility, I think  ;-)

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com


------ End of Forwarded Message

Received on Thursday, 21 February 2002 04:04:49 UTC