Re: RDFON: a new RDF serialization from Tim Berners-Lee on 2007-07-31 (semantic-web@w3.org from July 2007)

From: Tim Berners-Lee <timbl@w3.org>
Date: Tue, 31 Jul 2007 13:46:57 -0400
To: Garret Wilson <garret@globalmentor.com>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <64C173BF-202B-401D-B85B-EBE441FACD50@w3.org>
On 2007-07-26, at 19:53, Garret Wilson wrote:
>
> [..]

> Although I've used RDF since the early days, I seem to have skipped  
> a lot of the advancements made in serialization as my focus was  
> distracted over the past few years.

A shame :)

> Sure, it appears that the differences between RDFON and N3 are
> negligible from a technical point of view, so I don't want to argue  
> that one is better or worse than the other. I would like to point  
> out a few merits of RDFON to think about.

The cost, though, of diverting the efforts of programmers like
yourself away from supporting the common formats is high.  I'm not
saying that one should never fork off a new format, or there would
not have been an N3.  But now the need has been met and we need a
common language not only for code but for people on IRC and in
tutorials and in presentations. One can pretty much expect technical
sem web folks to grok a slide in N3 nowadays.

>
>    * RDF is built upon an assumption of propositional logic and
>      category theory that many programmers aren't used to. Sure, it's
>      the same philosophy underneath (going back to Wittgenstein,
>      Russell, Frege, and even Aristotle), but many programmers think in
>      different terms. N3 thinks, "I'm making an assertion regarding a
>      particular subject, predicate, and object, so I therefore list the
>      three parts of the proposition being asserted." 95% (I would
>      guess) of today's programmers would think, "I'm assigning a value
>      to the property of an object."

No, I think the model for this information is data, not program.
And in fact the JSON code is not program, it is a JS object.
  JSON's { x: "3", y: 5 } is not assignment but a data structure.
N3 very much matches this. It omits the ":", yes, which actually
saves a lot of time.
It also has the comma, which is very naturally English-like and also
saves a lot of code.
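
(Illustration, not part of the original message.) The point that a
JSON object is a data structure rather than a series of assignments can
be sketched in Python: each key/value pair reads as one
subject-predicate-object statement, and a list of values plays the role
of N3's comma for repeated objects. The function name to_triples and the
subject labels are hypothetical, chosen only for this sketch.

```python
def to_triples(subject, obj):
    """Flatten a JSON-like dict into (subject, predicate, object) triples.

    A list value yields one triple per element, much as the N3 comma
    repeats the object for the same subject and predicate.
    """
    triples = []
    for predicate, value in obj.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, predicate, v))
    return triples


# The object from the message, attached to a hypothetical subject.
triples = to_triples(":thing", {"x": "3", "y": 5})

# Repeated objects, roughly what N3 writes as  :Alice :child :Bob, :Carol .
children = to_triples(":Alice", {"child": ["Bob", "Carol"]})
```

Nothing is assigned to anything here; the dictionary is simply read as a
set of statements about a subject.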


> So while N3 has "eg:childCount 2;",
>      a procedural programmer sees something wrong here---he or she will
>      breathe a sigh of relief when he/she sees that extra delimiter
>      character separating the property from its value:
>      "eg.childCount:2". This is a teeny, tiny issue, but I think the
>      difference in mindset between "setting object properties" versus
>      "asserting propositions" is a significant one, even though they
>      both mean the same thing.

Well, I'd like you to make a point of using N3 for a month and see
whether you still feel that way.

By the way, in the original language, there were those delimiters,
but they were optional and people never used them. You could write
Alice -- child -> Bob and Alice has child Bob, but now we say
Alice child Bob.

>    * Along the same lines, when a common programmer looks at
>      "<urn:uuid:ca00db92-0f7f-434b-b254-8a6afcf57907> a
>      <http://xmlns.com/foaf/0.1/Person>" he/she won't know what the
>      heck is going on. When he/she looks at
>      "foaf.Person(<urn:uuid:ca00db92-0f7f-434b-b254-8a6afcf57907>)", on
>      the other hand, he/she will think, "Ah, an instance of the class
>      foaf.Person is being instantiated, and is being initialized with
>      the URI <urn:uuid:ca00db92-0f7f-434b-b254-8a6afcf57907>, which
>      must be its ID." Of course, that's not *exactly* what's going on,
>      but at its core it's saying the same thing---and we've made it
>      easier for the common programmer to grasp what's going on.
>      (Frankly, there are a lot more common programmers out there than
>      there are propositional logicians.)


I don't think programmers need to see object creation. And object
creation is in fact a misleading metaphor, suggesting that the number
of slots is fixed in advance.  So while it is useful to help people
along with a metaphor, it also piles up issues for the future, making
it more difficult, I think, for people to unhook from the closed world
assumption and realize that any document can say anything about
anything.


>    * So let's talk about literals, now. No one take offense, but I
>      personally have always thought that "value"^^eg:datatype is just
>      plain ugly. And it shows that RDF never quite knew what to do with
>      literals, especially not typed literals. If everything is a
>      resource and everything can be described by a three-part
>      proposition, what is eg:datatype? Is it a property of the literal
>      "value"?

I think a datatype -- or a unit --  is best modeled as  a property  
relating the data value to the bare number.

	:bed :length  [ si:meters 2].

or using path notation

	:bed :length 2^si:meters.

This has all kinds of nice properties, like si:hz owl:inverseOf
si:seconds.
So when the RDF folks wanted an NTriple notation for a datatype
built-in to a value, I suggested ^^ as analogous to ^.


> But literals can't have values.

That's a bug IMHO in RDF/XML.  I understand it isn't true in SPARQL.   
It isn't true in full N3.

> And surely the literal
>      "value" cannot have multiple conflicting datatypes. And how is an
>      xsd:string different from a plain literal? In RDF 2.0, I would
>      like to move literals away from their strange quasi-resource
>      status so that *everything* is a resource, For Real. An integer
>      such as 123 is a resource, and it might be *defined* by the
>      sequence of characters "123",


I agree that thinking of an integer as a Resource is fine, in that  
123 is a Thing, like everyThing else.

That does not mean we should symbols and literal values in the language.
I think it is fine to have 123 (note no quotes) as a literal in N3,
which it is.
I think it is fine to say that that sequence of characters in the
language identifies the number 123, which is a member of the class
of Integers, much as a URI identifies another resource.  I think in
fact it is also fine to make URIs and say they also represent the
number 123, e.g.

  	   	ex:bicycleWheels = 2.

(= is owl:sameAs in N3)

I don't, however, think it works to have rdf:about as a single
property (or even XML attribute) relating
123 to the string "123".   For example, suppose we want to model
octal numbers and decimal numbers.
I much prefer to concentrate on the number 123 as an Integer, and
have separate properties decimal and octal
relating it to different strings, than to imagine separate classes of
Decimal Integer and Octal Integer.
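
(Illustration, not part of the original message.) A minimal Python
sketch of this idea, assuming two hypothetical functions named decimal
and octal that stand in for the two properties: there is one number, and
each property relates it to a different string. No separate "decimal
integer" or "octal integer" class is needed.

```python
def decimal(n: int) -> str:
    """The string that represents n in base 10."""
    return format(n, "d")


def octal(n: int) -> str:
    """The string that represents n in base 8."""
    return format(n, "o")


n = 123
as_decimal = decimal(n)  # base-10 digits of the number
as_octal = octal(n)      # base-8 digits of the same number

# Reading either string back with the matching base gives the same number.
same_number = int(as_decimal, 10) == int(as_octal, 8) == n
```

The number itself carries no base; the base belongs to the relationship
between the number and a particular string.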


> but that's different than saying
>      that "123" is a resource with a data type of xsd:integer (which is
>      not true). I'd like to see something like <xsd:integer
>      rdf:literalAbout="123"/>, where "123" plays an analogous role to
>      rdf:about. This is the same thing as saying <rdf:Description
>      rdf:type="xsd:integer" rdf:literalAbout="123"/>. Suddenly we get
>      typed literals (distinct from strings!) that can take properties
>      and appear in lists with no problem.

The fact that literals can't appear in lists is clearly a bug.  So I  
think we should have an RDF 1.2 to fix that ASAP.

> And (finally) going back to
>      RDFON, we see that eg:datatype("value") is really just
>      instantiating an eg:datatype class with a lexical identifier
>      instead of a URI identifier.

If you look at that as an object initialization function, then that  
maps to a binary predicate which is my model above. I prefer very  
much to have a datatype-specific one such as dt:decimal.

One more note on datatypes.  The term in the RDF abstract language
which N3 writes as 123 and NTriples writes as 123^^xsd:integer, and
which I model as [ xsd:integer 2] or 2^xsd:decimal, is in practice
typically stored in RDF stores as some object like

	 {termType: 'literal',  value: "123",  dt_URI: "http://...integer", lang: null }

This is a term in the language. It isn't the resource 123.  Many
literal terms can identify the same Integer. 0123 is one, for
example, and 00123 another, not to mention the decimal 123.0 and the
octal etc etc.
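
(Illustration, not part of the original message.) The distinction
between a literal term and the Integer it identifies can be sketched in
Python. The class LiteralTerm, the function denoted_integer, and the
short label "xsd:integer" used in place of a full datatype URI are all
assumptions made for this sketch, not the layout of any particular RDF
store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LiteralTerm:
    """A term in the language: a lexical string plus datatype and language."""
    value: str
    datatype: str = "xsd:integer"  # placeholder label, not a full URI
    lang: Optional[str] = None


def denoted_integer(term: LiteralTerm) -> int:
    """Return the Integer that an integer-typed literal term identifies."""
    if term.datatype != "xsd:integer":
        raise ValueError("this sketch only interprets integer-typed terms")
    return int(term.value, 10)


# Three different terms (different lexical strings)...
terms = [LiteralTerm("123"), LiteralTerm("0123"), LiteralTerm("00123")]

# ...that all identify the same number.
numbers = {denoted_integer(t) for t in terms}
```

The terms compare as distinct objects, while the set of numbers they
identify has a single member.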

It is tempting in built-in functions to add a primitive for accessing
the datatype from the term.   But then you can't access the datatype
of an Integer.   At the raw RDF graph level, then you could write

	{   ?x  rdf:datatype xsd:Integer } =>  { ?x rdfs:label  "Should be an Integer." }.

but once a system has any notion of the trivial inferences around  
datatypes, then that becomes an inappropriate question: ?x might have  
been calculated as the sum of two numbers, and many terms could  
represent it.  So when a datatype is a relationship between a typed  
value and an untyped string, then a typed value does not "have" a  
unique datatype.

((You could say

	{   ?x  xsd:Integer ?y } =>  { ?x rdfs:label  "Should be an Integer." }.

meaning "If there is a ?y such that ?y is the representation of ?x as
an integer then  label ?x as an integer" ))

> It makes sense to common programmers
>      using RDF 1.x, and it points to where I want to go with RDF 2.0,
>      in which everything is a resource, For Real, and a typed literal
>      is not just a kludge on top of a kludge.
>    * Common programmers expect a comma in a list such as
>      ( "Smith"      "Van Buren" ). Perhaps propositional logicians do not.

I agree. ((Well, anyone who has used S-expressions in any form has
come to accept Lots of Irritating Silly Parens, as some put it, and
would be driven crazy by a bunch of commas. :)  Note the comma is
used in N3 for repeated objects, where I find it very natural.  So
one idea was to use commas to express lack of ordering, i.e. make a set
a comma-separated list, and an ordered set space-separated.  This
made the amount of look-ahead in the language too great.
))

  These syntactic decisions involve many compromises about using
different communities' metaphors in the language design, and
subjective notions of cleanliness.  Some of the design decisions made
in the N3 design are noted, with the alternatives and some arguments for
and against, in

		http://www.w3.org/DesignIssues/N3Alternatives


The idea of properties linking a string and the string interpreted,
for language and datatypes, is described in the 1998 article (maybe
dated in parts)
DesignIssues/InterpretationProperties.html -- see specifically
DesignIssues/InterpretationProperties.html#Interpreta1
which was, I think, added in 2001.

There may have been propositional logicians involved in the design,
but a core value was expressing data as in data formats, tab-separated
text, etc.

Tim
Received on Tuesday, 31 July 2007 17:47:20 UTC