Re: Ntriples++ (was Re: Banishing "bNode") from Dave Beckett on 2001-10-16 (w3c-rdfcore-wg@w3.org from October 2001)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Tue, 16 Oct 2001 06:16:08 +0100
To: Pat Hayes <phayes@ai.uwf.edu>
cc: w3c-rdfcore-wg@w3.org
Message-ID: <25655.1003209368@tatooine.ilrt.bris.ac.uk>

>>>Pat Hayes said:
<snip/>

> OK, done. To reiterate, I'm NOT here requesting that these changes be 
> made now. If we decide to only use very simple datatyping in RDF then 
> there is no need to make these changes. (I would argue, on a 
> different thread, that we in fact not decide to stick with very 
> simple datatyping, for upward compatibility reasons; but that is 
> another discussion.)
> 
<snip/>

I've started another thread to discuss the model theory changes, not
the syntax ones.

> >So if this literal labelled nodes, literal subject stuff is a
> >requirement, I propose to make the following changes to the last
> >public WD http://www.w3.org/TR/2001/WD-rdf-testcases-20010912/#ntriples
> >
> >   amending:
> >     subject ::= uriref | nodeID | nodeID":"literal
> >     object  ::= uriref | nodeID | nodeID":"literal | literal
> 
> I think this is too drastic for now, lets just leave Ntriple 
> extensions, I'd suggest, but have it in reserve in case we decide to 
> allow subtle datatyping.

So you are *not* proposing changing it!

If this is a consequence of a model theory change, it should be
proposed after such a change.  And I'd rather have words something
like "The MT has changed in section X.Y and this requires the
following from the graph encoding: 1. blah 2. blah" and then we can
work out what to do with syntax.

 
> BTW, there really are two options.
> 
> 1. add nodeIds as an option to literal objects (only), ie just your 
> second change above. This would be required if we were to allow 
> complex datatyping (ie where the same literal can have more than one 
> datatype, and where the datatype of a particular literal occurrence 
> can be deduced from its context of use.)

Ditto for MT changes with respect to datatyping.
 
> 2. (also) allow literals as subjects, ie first line above (though why 
> did you make it non-optional?) This isn't required, but it would add 
> useful expressivity and (I now think) cause no real harm to anyone.

non-optional: I always try to make the minimum change required; but
I'm still unclear what the change is and why it is needed, hence the
new thread I'm starting to ask that about the MT, not the N-Triples
syntax.


> But I'd suggest leaving all this open for discussion and just doing 
> the following for now, which is really all I was asking for :-). 
> Thanks.
> 
> >   deleting;
> >     bNode
> >
> >   adding:
> >     nodeID  ::=  '_:' name
> >
> >Where the bare literal object is used when the same literal is never
> >used as a statement subject.


Done in current draft at 
  http://www.w3.org/2001/08/rdf-test/#ntriples
  Last modified Mon, 15 Oct 2001 14:35:36 GMT
 
> Even if not, we might need to use the nodeID: form. There might, for 
> example, be some other assertion about that particular literal that 
> entailed that it was typed as an integer, even if it only occurred as 
> an object. For example
> 
> aaa integerpropertyeg "05" .
> integerpropertyeg rdfs:range xsd:integer .
> bbb stringpropertyeg "05" .
> stringpropertyeg rdfs:range xsd:string .
> ccc foodle "05" .
>
> Is that third 05 the integer, the string, or neither of them? If we 
> don't have literal-tidy graphs, there's no way to settle the 
> question. I'd like to be able to write
> 
> aaa integerpropertyeg _:node1:"05" .
> integerpropertyeg rdfs:range xsd:integer .
> bbb stringpropertyeg "05" .
> stringpropertyeg rdfs:range xsd:string .
> ccc foodle  _:node1:"05" .
> 
> which insists that the first and last triples have the same object node

For tidiness and consistency, I think all literal-labelled nodes
should use the same format, i.e. _:node1:"05", if this were going ahead.
The bare "05" could be used as subject or object but would refer to the
same occurrence.
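
A minimal sketch of how a parser might tell the term forms under
discussion apart (uriref, nodeID with an optional literal label, bare
literal). The lexical rules used here for name and literal are
simplifying assumptions rather than the draft's productions, and
parse_term is a hypothetical helper, not part of any existing tool:

```python
import re

# Assumed lexical forms (simplifications, not the draft's grammar):
#   uriref  = <...>
#   name    = a letter followed by letters or digits
#   literal = a double-quoted string with no embedded quotes
TERM = re.compile(
    r'<(?P<uri>[^>]*)>'
    r'|_:(?P<id>[A-Za-z][A-Za-z0-9]*)(?::"(?P<labelled>[^"]*)")?'
    r'|"(?P<bare>[^"]*)"'
)


def parse_term(text):
    """Return (kind, identifier, literal) for one subject or object term."""
    m = TERM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a recognised term: {text!r}")
    if m.group("uri") is not None:
        return ("uriref", m.group("uri"), None)
    if m.group("id") is not None:
        # Covers both plain _:name and the proposed _:name:"literal" form.
        return ("node", m.group("id"), m.group("labelled"))
    return ("literal", None, m.group("bare"))


print(parse_term('_:node1:"05"'))
print(parse_term('"05"'))
```

Treating _:node1 and _:node1:"05" as the same kind of term, differing
only in whether a label is attached, keeps existing files parseable
without a separate code path.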


> 
> >The above changes also mean all
> >existing N-Triples files remain legal.
> 
> Good point. That ought to be true, obviously.

Yes; I'll try strongly to keep this.


> The obvious convention, it occurs to me, is that an Ntriples document 
> describes a graph which is as UNtidy as possible, given that nodes 
> with the same uriref or nodeID *must* be identified in the graph. So 
> one Ntriples-to-graph algorithm would be: treat everything except 
> urirefs as being on a separate node, then use the nodeIDs to identify 
> nodes with the same ID. The only point of adding nodeIDs is then to 
> force two nodes to be identified in the graph: it's a kind of 
> node-stitching indicator. (That's why they aren't needed for urirefs, 
> since you could just use the uriref itself to do the job.) That would 
> make the first example above have three distinct nodes with the same 
> literal label, but the second example would only have two. (If you 
> were to also add _:node1: to the middle literal, it would be only 
> one, but that graph would be datatype-inconsistent.)


I was wondering if there is going to be an N-Triples to graph
algorithm in the MT, or somewhere...?
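
The convention Pat describes above could be sketched roughly as follows.
This is only an illustration under stated assumptions: terms are
whitespace-separated, literals contain no spaces, anything that is not
a literal or a nodeID is treated as a uriref-like name, and build_graph
and count_label are hypothetical helpers.

```python
import itertools
import re

# Assumed term shapes: _:name or _:name:"literal", and bare "literal".
NODE = re.compile(r'_:(?P<id>[A-Za-z][A-Za-z0-9]*)(?::(?P<lit>"[^"]*"))?')
BARE = re.compile(r'"[^"]*"')


def build_graph(text):
    """Build the most untidy graph: only names and nodeIDs merge nodes."""
    nodes = {}                 # node key -> label (None if unlabelled)
    triples = []
    fresh = itertools.count()  # supplies keys for bare-literal nodes

    def node_for(token):
        m = NODE.fullmatch(token)
        if m:
            key = ("id", m.group("id"))      # same nodeID, same node
            if m.group("lit") is not None:
                nodes[key] = m.group("lit")
            else:
                nodes.setdefault(key, None)
            return key
        if BARE.fullmatch(token):
            key = ("fresh", next(fresh))     # every bare literal is a new node
            nodes[key] = token
            return key
        key = ("name", token)                # same uriref, same node
        nodes[key] = token
        return key

    for line in text.strip().splitlines():
        s, p, o, _dot = line.split()
        triples.append((node_for(s), p, node_for(o)))
    return nodes, triples


def count_label(nodes, label):
    """Count the nodes carrying a given label."""
    return sum(1 for value in nodes.values() if value == label)


FIRST = '''
aaa integerpropertyeg "05" .
integerpropertyeg rdfs:range xsd:integer .
bbb stringpropertyeg "05" .
stringpropertyeg rdfs:range xsd:string .
ccc foodle "05" .
'''
SECOND = FIRST.replace('aaa integerpropertyeg "05"',
                       'aaa integerpropertyeg _:node1:"05"') \
              .replace('ccc foodle "05"', 'ccc foodle _:node1:"05"')

print(count_label(build_graph(FIRST)[0], '"05"'))   # 3
print(count_label(build_graph(SECOND)[0], '"05"'))  # 2
```

Under these assumptions, writing the last object as plain _:node1
instead of _:node1:"05" yields the same nodes and triples, since the
label is already attached to that nodeID by the first line.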

> 
> BTW, you would get the same RDF graph if you had said
> 
> aaa integerpropertyeg _:node1:"05" .
> integerpropertyeg rdfs:range xsd:integer .
> bbb stringpropertyeg "05" .
> stringpropertyeg rdfs:range xsd:string .
> ccc foodle  _:node1  .
> 
> which illustrates why 'bNode' would be particularly unfortunate in this case.

Hmmm, for clarity I'd still rather the last line was _:node1:"05" to
indicate to developers that it was the same thing.


<snip/>

> >Sure; I just want a clear and good reason for allowing literals as
> >subjects, labelling nodes with literals so that once this is done, we
> >can change the N-Triples software to deal with them.  This means
> >additional complexity in comparing graphs, as far as I can tell.
> 
> Yes, some. But I think that this kind of labelling, or something like 
> it, will be inevitable if we have complex datatyping. If two 
> different occurrences of the same literal are going to have different 
> meanings, then they can't be at the same node. We have to have *some* 
> way to distinguish things said about one meaning from things said 
> about the other. And this gives us fewer nodes to compare than doing 
> it Ron Daniel's way :-)

The above paragraph is what I really wanted to read at the start of this
thread: something like "the MT requires this functionality because of
these use cases".


> PS. BTW, if anyone feels that this particular style of labelling is 
> ugly or has some bad properties, then I have no objection to doing it 
> some other way. We could maybe write the label after the literal , 
> for example, then we wouldn't need the extra semicolon (?)


I'd rather reserve that space for other literal junk such as (hypothetically):

  "<em>foo</em>"(xmlns=<http://www.w3.org/TR/xhtml1>)

  "chat"(xml:lang="fr")

These are just suggestions of what could be done as a consequence of
deciding what an RDF/XML 1.0 literal is :)

Dave
Received on Tuesday, 16 October 2001 01:16:11 UTC