Re: DATATYPING: second draft

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Fri, 14 Dec 2001 10:19:03 -0800
Message-ID: <3C1A4297.EB5ABC6@db.stanford.edu>
To: Brian McBride <bwm@hplb.hpl.hp.com>

```
Brian McBride wrote:
>
> Sergey,
>
> At 12:57 10/12/2001 -0800, Sergey Melnik wrote:
> >I restructured and extended the datatyping document to include the most
> >recent contributions by Graham, DanC, PatrickS and Frank:
>
> Good work Sergey.  I'm finding it increasingly hard to distinguish between
> the different proposals, a fact I find encouraging in that I hope it means
> we are moving to a synthesis of ideas.

Thanks... I hope you are right!

> sans chapeau:
>
> Let me see if I've got this straight.
>
> In the P proposal, two nodes in a graph might be labeled "10".  An
> interpretation of one node could be the integer (10) (borrowing Jeremy's
> notation) and the other might be the string "10".
>
> In the PL proposal, two nodes in a graph might be labeled "10" and the
> interpretation of both nodes must be the same string "10". However, an
> application may use other information, possibly represented in the graph,
> possibly not, to 'interpret' that string as an integer, or not.

Right.

> In the B idiom of the S proposal, two nodes in the graph might be labeled
> "10" and an interpretation of one node might be the string "10" which is a
> member of the lexical space of xsd:integer and of the other might be the
> string "10" which is a member of the lexical space of xsd:string.

Correction: the interpretation of both nodes is the *same* literal value
(which may be an element of both the lexical space of xsd:integer and
the lexical space of xsd:string). I believe that S-B is equivalent to
PL.

> Given
>
>    foo bar "10" .
>
> we need more information to figure out which "10" is meant, e.g. a range
> constraint.
>
> ok so far?

We need more information to figure out from what angle to look at this
same "10".

> Dan has raised an issue, rephrased by Pat as how many triples result from
> merging:
>
>    foo bar "10" .
>
> and
>
>    foo bar "10" .
>
> Peter Patel-Schneider, echoed by Jeremy, has raised the issue of
> compatibility with xml:schema.  I personally feel there is an elegance to
> Jeremy's proposal which has xml schema dealing with syntax and lexical
> issues, and rdf schema dealing with data model issues.
>
> A while back, Jan suggested putting the integer in the graph rather than
> the numeral.  Hmm, I can't remember why that did not fly at the time.

We'd still have to use a literal or a resource to denote an integer, I
guess...

> Patrick has raised the issue of things getting rather messy if the lexical
> representation of a value becomes separated from the mapping which defines
> its interpretation.
>
> And this led me to wondering about Patrick's suggestion that a literal is
> in fact a pair.
>
> So what if  ...
>                   we extended n-triples so that literals MAY have an
> associated datatype.
>
>       <foo> <prop> xsd:integer:"10" .
>
> If we don't know the datatype, we leave it blank:
>
>      <foo> <prop> _:"10" .
>
> We can do inference:
>
>    <foo> <prop> _:"10" .
>    <prop> <rdfs:rangeLex> <xsd:integer.lex> .
>
> entails
>
>    <foo> <prop> xsd:integer:"10" .
>
> and also:
>
>    <prop> <rdfs:rangeLex> <xsd:integer.lex> .
>
> entails
>
>    <prop> <rdfs:range> <xsd:integer.val> .
>
> We can do simplification:
>
>    <foo> <prop> xsd:integer:"10" .
>
> entails
>
>    <foo> <prop> _:"10" .
>
> Some WG, to leave aside charter issues, could decide that the triple to
> output for:
>
>    <rdf:Description>
>      <prop xsd:type="xsd:integer">10</prop>
>    </rdf:Description>
>
> is
>
>    _:bnode <prop> xsd:integer:"10" .
>
> What happens if we merge
>
>    <foo> <prop> _:"10" .
>
> and
>
>    <foo> <prop> _:"10" .
>
> Icky bit here: by rule we get:
>
>    <foo> <prop> _:"10" .
>
> If we merge
>
>    <foo> <prop> _:"10" .
>    <prop> <rdfs:rangeLex> <xsd:string>
>
> and
>
>    <foo> <prop> _:"10" .
>    <prop> <rdfs:rangeLex> <xsd:int>
>
> the merging rule is, first run datatype inference to populate blank data
> types then merge, which gives a merge of
>
>    <foo> <prop> xsd:string:"10" .
>
> and
>
>    <foo> <prop> xsd:int:"10" .
>
> which will give the right result.  Why do I have the feeling I didn't get
> away with that bit.

Assuming monotonicity, it does not matter whether you first do
inferencing and then merge (and do inferencing again), or first merge
all the data and then do inferencing on it. So, after merging we get:

<foo> <prop> _:"10" .
<foo> <prop> _:"10" .
<prop> <rdfs:rangeLex> <xsd:string.lex>
<prop> <rdfs:rangeLex> <xsd:int.lex>

where _ is treated in a bNode-like fashion, so we still have two
distinct triples. Then, according to the rules you suggested, we can derive

<foo> <prop> xsd:string:"10" .
<foo> <prop> xsd:int:"10" .

which is fine. However, we also derive

<prop> <rdfs:range> <xsd:int.val> .
<prop> <rdfs:range> <xsd:string.val> .

Since the value spaces of int and string are disjoint, we get
CEXT(I(prop)) = empty set, and there is no valid interpretation of the
merged data ;( It looks like your second rule causes problems, which,
however, could be avoided by using something like rdfs:rangeVal instead
of rdfs:range that has a union (not intersection) semantics in the model
theory. So we'd get

<prop> <rdfs:rangeVal> <xsd:int.val> .
<prop> <rdfs:rangeVal> <xsd:string.val> .

And, if rdfs:rangeLex also has union semantics, basically, we are
stating that the range of prop is a union type, similar to union types
defined in XSD. Moreover, speaking in abstract terms, the datatype
mapping of this union type would have the property that some elements of
the lexical space are mapped to multiple elements of the value space. In
other words, if complex_type is the name of the type defined by merging
the data above, we'd have the situation that complex_type:"10" is
interpreted as both an integer and a string. That is,

<foo> <prop> complex_type:"10"

entails both

<foo> <prop> xsd:int:"10"
<foo> <prop> xsd:string:"10"

So far, this still looks fine given the desired result you wanted to
obtain above. However, this is not much different from the S-B approach.
There you start with

<foo> <prop> "10"
<prop> <rdfs:range> <xsd:int.val>

and

<foo> <prop> "10"
<prop> <rdfs:range> <xsd:string.val>

After merging, you get

<foo> <prop> "10"
<prop> <rdfs:range> <xsd:int.val>
<prop> <rdfs:range> <xsd:string.val>

Here, the literal "10" is interpreted, as before, as some literal value
that is an element of both the lex space of int and the lex space of
string. The
schema information gives a hint that both lex spaces are to be
considered by the application...

> No matter, completing the job:
>
> Merging
>
>    <foo> <prop> _:"10" .
>    <prop> <rdfs:rangeLex> <xsd:integer>
>
> and
>
>    <foo> <prop> _:"12" .
>    <prop> <rdfs:rangeLex> <eg:oct>
>
> gives
>
>    <foo> <prop> xsd:integer:"10" .
>
> or
>
>    <foo> <prop> eg:oct:"12" .
>
> to a processor that 'understands' the xsd:integer and eg:oct datatypes.

This is not all, though. From the merged data

<foo> <prop> _:"10" .
<prop> <rdfs:rangeLex> <xsd:integer.lex>
<foo> <prop> _:"12" .
<prop> <rdfs:rangeLex> <eg:oct.lex>

you'd infer all of

<foo> <prop> xsd:integer:"10" .
<foo> <prop> xsd:integer:"12" .
<foo> <prop> eg:oct:"12" .
<foo> <prop> eg:oct:"10" .

which includes integers 8, 10, and 12, and is not what you want...

Sergey

>
> Finally,
>
>    <foo> <prop> xsd:integer:"10" .
>
> if and only if
>
>    <foo> <prop> _:v .
>    _:v   <xsd:integer.map> "10" .
>
> Brian
```
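
Sergey's point about merging the xsd:string and xsd:int graphs can be checked with a small sketch. This is only an illustration under stated assumptions, not part of either proposal: triples are modelled as Python tuples, the `Lit` type and its `slot` field (which keeps the two `_` placeholders distinct, bNode-style) are hypothetical, and the disjointness table is hard-coded for int and string only.

```python
from itertools import combinations
from typing import NamedTuple, Optional

RANGE_LEX = "rdfs:rangeLex"
RANGE = "rdfs:range"

class Lit(NamedTuple):
    lex: str
    datatype: Optional[str] = None  # None plays the role of the blank "_" datatype
    slot: str = ""                  # hypothetical label keeping "_" placeholders distinct

def infer(graph):
    """Close a set of triples under the two rules proposed in the message."""
    closed = set(graph)
    for subj, pred, obj in graph:
        if pred != RANGE_LEX:
            continue
        name = obj[:-len(".lex")] if obj.endswith(".lex") else obj
        # Rule 2: a rangeLex statement entails an rdfs:range statement.
        closed.add((subj, RANGE, name + ".val"))
        # Rule 1: fill in the blank datatype of literals used with this property.
        for s, p, o in graph:
            if p == subj and isinstance(o, Lit) and o.datatype is None:
                closed.add((s, p, Lit(o.lex, name)))
    return closed

g1 = {("foo", "prop", Lit("10", slot="g1")), ("prop", RANGE_LEX, "xsd:string.lex")}
g2 = {("foo", "prop", Lit("10", slot="g2")), ("prop", RANGE_LEX, "xsd:int.lex")}

# Monotonicity: the order of merging and inferencing does not matter.
merge_then_infer = infer(g1 | g2)
infer_then_merge = infer(infer(g1) | infer(g2))

typed = {o for _, _, o in merge_then_infer if isinstance(o, Lit) and o.datatype}
ranges = {o for s, p, o in merge_then_infer if s == "prop" and p == RANGE}

# Assumption: the only disjointness fact known is int vs. string value spaces.
DISJOINT = {frozenset({"xsd:int.val", "xsd:string.val"})}

def range_satisfiable(ranges):
    """rdfs:range is conjunctive, so any disjoint pair empties the range."""
    return not any(frozenset(pair) in DISJOINT for pair in combinations(ranges, 2))

print(merge_then_infer == infer_then_merge)
print(sorted(typed))
print(range_satisfiable(ranges))
```

In this sketch both orders yield the same triples, both typed literals are derived, and the two derived rdfs:range statements conflict. That conflict is the problem that union semantics (rdfs:rangeVal) is meant to avoid.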
Received on Friday, 14 December 2001 13:03:54 UTC
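
The over-generation in the last example of the message (xsd:integer merged with eg:oct) can be reproduced in a few lines. This is a hedged sketch: the `PARSERS` table is hypothetical, and it assumes that eg:oct reads its lexical forms as base-8 numerals, which the thread implies but does not define.

```python
from itertools import product

# Hypothetical lexical-to-value mappings for the two datatypes in the example.
PARSERS = {
    "xsd:integer": lambda lex: int(lex, 10),
    "eg:oct": lambda lex: int(lex, 8),  # assumption: eg:oct means base-8 numerals
}

untyped_literals = ["10", "12"]         # objects of <foo> <prop> _:"..." after merging
range_lex = ["xsd:integer", "eg:oct"]   # both rangeLex statements apply to prop

# The datatype-inference rule pairs every blank-typed literal with every rangeLex.
derived = sorted(product(range_lex, untyped_literals))
values = {PARSERS[datatype](lex) for datatype, lex in derived}

# What the two source graphs meant before merging: decimal "10" and octal "12".
intended = {PARSERS["xsd:integer"]("10"), PARSERS["eg:oct"]("12")}

print(derived)
print(sorted(values))
print(sorted(intended))
```

Each source graph on its own denotes only the integer 10, while the merged graph also yields 8 and 12. Those are the extra integers the message objects to.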
