Re: Subjects as Literals from Pat Hayes on 2010-07-01 (semantic-web@w3.org from July 2010)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 1 Jul 2010 13:56:18 -0500
To: Peter Ansell <ansell.peter@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>
Message-Id: <431628B8-CEC7-4DE9-8C46-B677B026BD60@ihmc.us>

On Jul 1, 2010, at 5:21 AM, Peter Ansell wrote:

> On 1 July 2010 13:14, Pat Hayes <phayes@ihmc.us> wrote:
>>
>> On Jun 30, 2010, at 8:14 PM, Ross Singer wrote:
>>
>>> I suppose my questions here would be:
>>>
>>> 1) What's the use case of a literal as subject statement (besides
>>> being an academic exercise)?
>>
>> A few off the top of my head.
>>
>> 1. Titles of books, music and other works might have properties such as
>> the date they were registered, who owns them, etc.
>> 2. Dates may have significant properties such as being the day that
>> someone was shot or when war broke out.
>> 3. Dates represented as character strings in some known date format other
>> than XSD can be asserted to be the same as a 'real' date by writing
>> things like
>>
>> "01-02-1481" sameDateAs "01022010"^^xsd:date .
>> "01-02-1481" isDateIn :MuslimCalendar .
>>
>> I am sure that you can think of many more. In general, allowing strings
>> as subjects opens the door to a wide range of uses of RDF to 'attach'
>> information to pieces of text. Another example which occurs to me: this
>> piece of text is the French translation of that piece of text, expressed
>> as a single RDF triple with two literals.
>
> If you are working with datasets where you just need to know explicit
> facts, and not who said anything about the facts, this may be useful.

Well, that is a good sketch of the way that RDF is intended to be
used. It doesn't have any very advanced machinery for keeping track of
who said what about what facts. It just records the facts. (I know it
has reification....)


> You will run into issues if you accidentally use the string
> "01-02-1481" or you import another set of triples that gave that
> string a different meaning, like the barcode of a computer for example
> and the implication that the date was the barcode of a computer.

Why would you? A given character string may indeed have several
properties and interpretations. The word "chat" has one meaning in
English and a different one in French, but it's the same string of four
letters in both cases.

> If
> you are going to work at the web level then you will get a new set of
> issues surrounding what literals should actually be merged.

Merged? You can merge two literals only if they are the exact same  
literal, in which case they are already 'merged' in the RDF graph model.

> At least
> currently there is one level of protection between documents in regard
> to literals as subjects, ie, unmergeable blank nodes.
>
> If you assume that there will never be any overlap between literals,
> then you could be safe with having a single anonymous triple stating
> the equivalence, but you would only be able to hope that the string
> was never used in a different context or you would be in for some
> trouble trying to explain to your customers why the software
> necessarily translated a string the wrong way because it only required
> one RDF triple instead of 3 or 4.

This all seems like a non-issue. The basic RDF graph model has no  
'contexts' in it, and it clearly says that each literal node is  
unique. So if an RDF graph has two triples in it both containing the  
literal "23"^^xsd:number, then these are the exact same literal in  
both triples. They come, as it were, pre-merged.
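
A minimal sketch of the "pre-merged" point, in Python. The set-of-tuples graph and the Literal class are illustrative assumptions, not anything defined by the RDF specs; the only thing the sketch relies on is that literals are compared by value.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical literal, compared by value rather than object identity."""
    lexical: str
    datatype: Optional[str] = None


# Assumption: a graph is a set of (subject, predicate, object) tuples.
g1 = {(":a", ":p", Literal("23", "xsd:number"))}
g2 = {(":b", ":q", Literal("23", "xsd:number"))}

# Merging two graphs is plain set union; the triples stay distinct.
merged = g1 | g2


def nodes(graph):
    """Collect the distinct subject and object nodes of a graph."""
    return {t[0] for t in graph} | {t[2] for t in graph}


# Both triples point at one and the same literal node.
literal_nodes = {n for n in nodes(merged) if isinstance(n, Literal)}
```

Here the merged graph still has two triples but only one literal node, which is all that "merging literals" amounts to in this model.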

>
> Perhaps only merging if they have a datatype and are not just a string
> that doesn't have any other implications? RDF is still based on graphs
> unless something major is changing. Literals would need to become
> common across an entire datastore wherever they are used

They already are common in this way. In fact, they are globally common.

> if they are
> no longer known to just be graph leaves and could in fact have attached
> leaves.
>
> Does RDF imply in any way currently that an RDF processing system
> should merge a triple containing a literal with other triples that
> contain that literal?

Not sure what you mean. The triples don't get merged, but the literal  
nodes do.

> I kind of presumed that it didn't have this
> implication, and that it was safe to use the same literal in different
> places without having any adverse effects.

What adverse effects?

> If Literals are able to be
> subjects, then it would be necessary for every RDF processor to merge
> any and all triples that contain a literal, *whether it is in the
> subject or object position*, and have all of the implications that
> gives.

That is what the current RDF model requires. I see no reason why  
allowing them in the subject as well as the object position will have  
any drastic consequences.

> I think it is quite wise to require some sort of reference as a
> way of identifying literals, even if it is a blank node. At least then
> you can mix knowledge without having necessary conflicts between
> totally different RDF triples. Even if it appeared as a side effect of
> the RDF/XML serialisation, it may be useful generally.
>
>> 4. It has been noted that one can map datatyping into RDF itself by
>> treating the datatypes as properties, and there are several use cases for
>> this. The natural way to do it involves having literals as subject, since
>> the datatype map goes from the string to the value:
>>
>> "23" xsd:number "23"^^xsd:number .
>
> Would this imply that wherever the string "23" was used in any RDF
> triples you have access to, it would necessarily mean 23 (the number)?

No, it would only mean that if you apply the xsd:number property to  
it, the result would be a number.
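
A rough Python sketch of the datatype-as-property reading. The function xsd_number is a hypothetical stand-in for the datatype map of the "xsd:number" used in this thread, and int() is only an assumed approximation of its lexical-to-value mapping.

```python
# Hypothetical datatype map: takes a lexical string, returns a value.
def xsd_number(lexical: str) -> int:
    return int(lexical)


plain = "23"                # the two-character string "2" followed by "3"
value = xsd_number(plain)   # the number, obtained only by applying the map

# The triple  "23" xsd:number "23"^^xsd:number .  relates string to value;
# it says nothing about other uses of the string "23".
triple = (plain, "xsd:number", value)
```

The string keeps its identity as a string; the number only appears on the other end of the property.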

> If I needed to accept that I could never use "23" to mean the letters
> 2 and 3 put together for any reason, perhaps as a hex-encoding or
> trademarked symbol, because there would be no way of isolating it from
> the statement that it necessarily was a number and that it was the sum
> of 11 and 12, rather than "23" which was "2" and "3" put together and
> not equal to any number.
>
> What if they needed to also know that 22 was a number, etc., ad
> infinitum? The cases that require entire number lines to be present in
> an RDF database, for the case to make non-trivial sense, seem to be
> harming the case rather than furthering it in my opinion.
>
>> 5. Also, allowing this "purely academically" has the notable advantage of
>> simplifying RDF(S) inferencing, including making the forward-chaining
>> rules simpler. Right now, there is a strange oddity involving blank node
>> instantiations. One can say things like 'the number of my children is
>> prime' by using a blank node:
>>
>> :PatHayes hasNumberOfKids _:x .
>> _:x :a :PrimeNumber .
>>
>> But this legal RDF can't be instantiated in the obvious way:
>>
>> :PatHayes hasNumberOfKids "3"^^xsd:number .
>> "3"^^xsd:number :a :PrimeNumber .   XXXX
>>
>> This trips up RDFS reasoners, which can often produce inferences by a
>> kind of sneaky use-a-bnode-instead maneuver even when the obvious
>> conclusion cannot be stated because of the restriction. (There are a few
>> examples in the RDF semantics document.) Removing the restriction would
>> enable reasoners to work more efficiently with a smaller set of rules. (I
>> gather that at least some of the RDFS rule engines out there already do
>> this, internally.)
>
> The reasoner could have some number theory knowledge embedded to imply
> the nature of the typed literal.

No, that is not the point. It is not a matter of arithmetic, it has to  
do with blank node instantiation.
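
To make the instantiation point concrete, here is a small Python sketch. The helper names (instantiate, literal_subjects) and the tuple encoding are assumptions made for illustration; real RDFS reasoners are organised differently.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical typed literal."""
    lexical: str
    datatype: str


def instantiate(graph, mapping):
    """Replace blank nodes in subject and object position per mapping."""
    def sub(term):
        return mapping.get(term, term)
    return {(sub(s), p, sub(o)) for (s, p, o) in graph}


def literal_subjects(graph):
    """Triples that the current no-literal-subjects rule would reject."""
    return {t for t in graph if isinstance(t[0], Literal)}


# The legal graph with a blank node, as in the quoted example.
pattern = {
    (":PatHayes", "hasNumberOfKids", "_:x"),
    ("_:x", ":a", ":PrimeNumber"),
}

three = Literal("3", "xsd:number")
instance = instantiate(pattern, {"_:x": three})
```

The pattern itself has no literal subjects, but its obvious instance does, so the instance cannot be written down under the current restriction. No arithmetic is involved anywhere.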

>
> All of the use cases so far are completely factual, and could be
> derived at any time using an algorithm or a query structure that
> merged the literals. If there was any opinion relating to the
> statement that 3 was in category X, then you might need to state it,
> but it wouldn't be a general need and you could cope with whatever
> (URI/blank node) hacks were necessary to get 3 to be in the category
> without changing the way the RDF language is specified.
>
>>> 2) Does literal as subject make sense in "linked data" (I ask mainly
>>> from a "follow your nose" perspective) if blank nodes are considered
>>> controversial?
>>
>> Seems to me that from the linked data POV, anything that can be an object
>> should also be useable as a subject. Of course, that does allow for the
>> view that both of them should only ever be IRIs, I guess.
>
> I have not come to that conclusion myself. I come at Linked Data from
> the same perspective as the traditional linked web. In the traditional
> web it is of great importance that you know that "document A links to
> document B" as opposed to just knowing that "a link exists between
> either document A and document B or document B and document A"

Um, yes, obviously. But what has this to do with being an RDF subject?

> If we
> assume that all links in RDF are necessarily bidirectional

What? Nobody AFAIK has suggested anything remotely like this. I'm not
sure what it even means.

> , with the
> appropriate change to the predicate, then it makes it unusually
> difficult to represent that directional knowledge (using two or three
> triples), even if it makes it easier to work with numbers and dates in
> a reasoner. You may be enabling one case at the cost of another case.

You have completely lost me. Surely, the directionality of the *link*  
has to do with following HTTP connections in URIs, no? Or am I  
completely misunderstanding you?

Pat Hayes


>
> Having said all that, if you really need Subjects to be Literals so it
> can make things possible that aren't currently possible then it may be
> worth the trouble to change the language specification. If you just
> want to make a few cases easier to deal with in your nicely boundaried
> Reasoning-enabled RDF datastore, and you don't actually need everyone
> on the web to change for a case to work then you could just extend the
> language and make it work for yourself I think considering that it
> adds a feature while it may be taking away a current implicit feature
> (link directionality).
>
> It may also help the literals-as-subjects case if N3 were standardised
> as an RDF++. Then the community could focus on the N3 assumptions of
> symmetry of RDF statements and its focus on numbers and logic rather
> than the (possibly directionally) link oriented RDF/XML format that
> needs to be able to represent incomplete data across many different
> locations and disciplines.
>
> Cheers,
>
> Peter
>
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 1 July 2010 18:57:22 UTC