Re: Subjects as Literals from Peter Ansell on 2010-07-01 (semantic-web@w3.org from July 2010)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 2 Jul 2010 08:03:47 +1000
To: Pat Hayes <phayes@ihmc.us>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <AANLkTilbtW66HYgThpXaq551Ip0o1vZUYL_O7wBR_-Jj@mail.gmail.com>
On 2 July 2010 04:56, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Jul 1, 2010, at 5:21 AM, Peter Ansell wrote:
>
>> On 1 July 2010 13:14, Pat Hayes <phayes@ihmc.us> wrote:
>>>
>>> On Jun 30, 2010, at 8:14 PM, Ross Singer wrote:
>>>
>>>> I suppose my questions here would be:
>>>>
>>>> 1) What's the use case of a literal as subject statement (besides
>>>> being an academic exercise)?
>>>
>>> A few off the top of my head.
>>>
>>> 1. Titles of books, music and other works might have properties such as
>>> the date they were registered, who owns them, etc.
>>> 2. Dates may have significant properties such as being the day that
>>> someone was shot or when war broke out.
>>> 3. Dates represented as character strings in some known date format other
>>> than XSD can be asserted to be the same as a 'real' date by writing
>>> things like
>>>
>>> "01-02-1481" sameDateAs "01022010"^^xsd:date .
>>> "01-02-1481" isDateIn :MuslimCalendar .
>>>
>>> I am sure that you can think of many more. In general, allowing strings
>>> as subjects opens the door to a wide range of uses of RDF to 'attach'
>>> information to pieces of text. Another example which occurs to me: this
>>> piece of text is the French translation of that piece of text, expressed
>>> as a single RDF triple with two literals.
>>
>> If you are working with datasets where you just need to know explicit
>> facts, and not who said anything about the facts, this may be useful.
>
> Well, that is a good sketch of the way that RDF is intended to be used. It
> doesn't have any very advanced machinery for keeping track of who said what
> about what facts. It just records the facts. (I know it has reification....)

It does depend on how it interprets facts though...

>> You will run into issues if you accidentally reuse the string
>> "01-02-1481", or if you import another set of triples that gave that
>> string a different meaning, for example the barcode of a computer,
>> which would imply that the date was the barcode of a computer.
>
> Why would you? A given character string may indeed have several properties
> and interpretations. The word "chat" has one meaning in English and a
> different one in French, but it's the same string of four letters in both
> cases.

What is your interpretation of the third triple in the sequence below?
Does someoneElse have an opinion about cats (French semantic meaning) or
chatting (English semantic meaning), or should the model imply that
cats and chatting are equivalent? To me it does not seem logical to
choose one interpretation arbitrarily over the other. This is the same
issue as letting Literals become Subjects, as the key motivator for my
argument is that an instance of a Literal in one triple should not
affect other triples where it appears (currently just as the Object).

<me> <likesTo> "chat" (Literal1) .
<frenchFriend> <likes> "chat" (Literal2) .
<someoneElse> <hasOpinionAbout> "chat" (Literal3) .

You seem to be saying that the instance of "chat" (Literal3) in the
third triple should inherit the information that frenchFriend likes
"chat" (noun) (Literal2) and also that me likesTo "chat" (verb)
(Literal1). If Literals are merged between triples, an RDF processor
will have to accept both interpretations of triple 3 concurrently,
whereas I would just accept all three triples as three separate
graphs without logical confusion, because the concept of a shared
Literal wasn't in my interpretation of RDF.
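
To make the two readings concrete, here is a minimal sketch in plain
Python rather than any real RDF library. The class names Literal and
OccurrenceLiteral and the helper subjects_sharing_object are
hypothetical names invented for this illustration. The first model
compares literals by value, so the three occurrences of "chat" are one
node; the second tags each occurrence, which is the reading I had
assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Literal compared by value only: equal strings are the same node."""
    value: str


@dataclass(frozen=True)
class OccurrenceLiteral:
    """Hypothetical variant: every occurrence carries its own identity."""
    value: str
    occurrence: int


def subjects_sharing_object(triples, obj):
    """Return the sorted subjects of all triples whose object equals obj."""
    return sorted(s for s, _p, o in triples if o == obj)


# Reading 1: literals are shared, so all three triples meet at one node.
shared = [
    ("me", "likesTo", Literal("chat")),
    ("frenchFriend", "likes", Literal("chat")),
    ("someoneElse", "hasOpinionAbout", Literal("chat")),
]
shared_nodes = {o for _s, _p, o in shared}

# Reading 2: each occurrence is distinct, so the triples stay unconnected.
separate = [
    ("me", "likesTo", OccurrenceLiteral("chat", 1)),
    ("frenchFriend", "likes", OccurrenceLiteral("chat", 2)),
    ("someoneElse", "hasOpinionAbout", OccurrenceLiteral("chat", 3)),
]
separate_nodes = {o for _s, _p, o in separate}

print(len(shared_nodes), subjects_sharing_object(shared, Literal("chat")))
print(len(separate_nodes),
      subjects_sharing_object(separate, OccurrenceLiteral("chat", 3)))
```

Under the first reading, a query for everything connected to "chat"
returns all three subjects; under the second, Literal3 is connected
only to someoneElse.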

Even if you go back to the bnode model for the first two triples, as
shown below, all of the triples are still merged in some way, and you
can't semantically distinguish bnode1 from bnode2 to decide which
version of "chat" is the correct interpretation for the opinion
statement. As in the case above, if you aren't sharing literals between
triples, then Literal3 is semantically separated from both of the other
literals, and there is no confusion about what each of the statements
means.

<me> <likesTo> _:bnode1 .
_:bnode1 sameas "chat" (Literal1) .
<frenchFriend> <likes> _:bnode2 .
_:bnode2 sameas "chat" (Literal2) .
<someoneElse> <hasOpinionAbout> "chat" (Literal3) .
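
As a rough sketch of why the bnode indirection does not help when
literals are shared, the five triples can be written as plain Python
tuples. The tuple encoding, the ("literal", ...) tagging and the
resolve helper are assumptions made for this illustration, and
"sameas" is just the informal property used above, not owl:sameAs.

```python
# Literals are tagged tuples compared by value; bnodes are plain labels.
CHAT = ("literal", "chat")

triples = [
    ("me", "likesTo", "_:bnode1"),
    ("_:bnode1", "sameas", CHAT),
    ("frenchFriend", "likes", "_:bnode2"),
    ("_:bnode2", "sameas", ("literal", "chat")),
    ("someoneElse", "hasOpinionAbout", ("literal", "chat")),
]


def resolve(node, graph):
    """Return what node is declared 'sameas', or node itself if nothing."""
    for s, p, o in graph:
        if s == node and p == "sameas":
            return o
    return node


# Map each person to the node their statement finally points at.
resolved = {
    s: resolve(o, triples)
    for s, p, o in triples
    if p != "sameas"
}

# The two bnodes are distinct labels, but both resolve to the same literal.
print(resolved)
print(len(set(resolved.values())))
```

The bnodes keep the first two statements apart syntactically, but once
each is tied to a value-compared literal, all three people end up
pointing at the same node.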

The fact that RDF has language annotations should have no effect on
this argument, as it is incidental. The argument applies to any
string literals that don't have types, and one could find examples of
semantic inconsistencies that an algorithm could never resolve
consistently.

>> If
>> you are going to work at the web level then you will get a new set of
>> issues surrounding what literals should actually be merged.
>
> Merged? You can merge two literals only if they are the exact same literal,
> in which case they are already 'merged' in the RDF graph model.

Sorry for any confusion. I was under the impression that literals were
not merged, so the triples using the same literal were not actually
related in a single conceptual graph. I don't agree that it is the
right way, given the issues that it brings, but if that is the way the
specification says it works then we have to stick with it. If RDF is
really just supposed to be a general data description model like JSON
then we wouldn't have to worry about the semantic conflict between
shared Literals anyway.

That changes the entire conversation if RDF was *always* designed so that
Literals are merged based on co-occurrence. If it is already that
way then some people have been misinterpreting it. It wouldn't
actually require a material change to the specification for Literals
to become Subjects as well as Objects if they are already used for
chaining Triples together.

Why not do it today so the false interpretation (non-Literal-merging)
of the RDF specification doesn't keep filtering through and people can
start fixing their legacy RDF software? Then any actual RDF users
could start telling customers that they need to be very careful about
what words they type into their software as the database may develop
errors if they use the same word twice in different contexts. It may
be very upsetting for users to later figure out that they suddenly had
opinions about cats after a French friend joined the community because
they liked chatting in the past.

Cheers,

Peter
Received on Thursday, 1 July 2010 22:04:22 UTC