Re: Subjects as Literals from Peter Ansell on 2010-07-01 (semantic-web@w3.org from July 2010)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Thu, 1 Jul 2010 20:21:26 +1000
To: Pat Hayes <phayes@ihmc.us>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <AANLkTilQOwTUx24RoIEuGtpg-nbDzs5bI4ZQETSb9jxa@mail.gmail.com>
On 1 July 2010 13:14, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Jun 30, 2010, at 8:14 PM, Ross Singer wrote:
>
>> I suppose my questions here would be:
>>
>> 1) What's the use case of a literal as subject statement (besides
>> being an academic exercise)?
>
> A few off the top of my head.
>
> 1. Titles of books, music and other works might have properties such as the
> date they were registered, who owns them, etc..
> 2. Dates may have significant properties such as being the day that someone
> was shot or when war broke out.
> 3. Dates represented as character strings in some known date format other
> than XSD can be asserted to be the same as a 'real' date by writing things
> like
>
> "01-02-1481" sameDateAs "01022010"^^xsd:date .
> "01-02-1481" isDateIn :MuslimCalendar .
>
> I am sure that you can think of many more. In general, allowing strings as
> subjects opens the door to a wide range of uses of RDF to 'attach'
>  information to pieces of text. Another example which occurs to me: this
> piece of text is the French translation of that piece of text, expressed as
> a single RDF triple with two literals.

If you are working with datasets where you only need to know explicit
facts, and not who asserted them, this may be useful. You will run
into issues, though, if you accidentally reuse the string "01-02-1481"
or import another set of triples that gives that string a different
meaning, such as the barcode of a computer: the merged data would then
imply that the date is the barcode of a computer. If you are going to
work at the web level then you will get a new set of issues
surrounding which literals should actually be merged. At least
currently there is one level of protection between documents with
regard to literals as subjects, i.e., unmergeable blank nodes.
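
A minimal sketch of that collision, in plain Python rather than any
real RDF library. The Literal type, the isBarcodeOf predicate and
:Computer42 are names invented for illustration; triples are just
tuples and a merge is a set union.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal: equal lexical forms compare equal.
Literal = namedtuple("Literal", ["lexical"])

# Document A treats the string as a date.
doc_a = {(Literal("01-02-1481"), "isDateIn", ":MuslimCalendar")}

# Document B (assumed, not from the thread) treats the same string as a barcode.
doc_b = {(Literal("01-02-1481"), "isBarcodeOf", ":Computer42")}


def merge(*graphs):
    """Merge graphs by set union; identical literals become one node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair attached to a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(doc_a, doc_b)
facts = describe(merged, Literal("01-02-1481"))
# After the merge the one literal node carries both the date fact
# and the barcode fact, with nothing left to tell the sources apart.
```

Nothing in the merged set records which document each fact came from,
which is the situation described above.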

If you assume that there will never be any overlap between literals,
then you could be safe with having a single anonymous triple stating
the equivalence, but you would only be able to hope that the string
was never used in a different context. Otherwise you would be in for
some trouble trying to explain to your customers why the software
translated a string the wrong way just because that approach required
one RDF triple instead of 3 or 4.

Perhaps literals should only be merged if they have a datatype, and
not when they are plain strings with no other implications? RDF is
still based on graphs unless something major is changing. Literals
would need to become common across an entire datastore wherever they
are used if they are no longer known to be just graph leaves and could
in fact have leaves attached to them.

Does RDF imply in any way currently that an RDF processing system
should merge a triple containing a literal with other triples that
contain that literal? I kind of presumed that it didn't have this
implication, and that it was safe to use the same literal in different
places without having any adverse effects. If Literals are able to be
subjects, then it would be necessary for every RDF processor to merge
any and all triples that contain a literal, *whether it is in the
subject or object position*, and have all of the implications that
gives. I think it is quite wise to require some sort of reference as a
way of identifying literals, even if it is a blank node. At least then
you can mix knowledge without necessarily having conflicts between
totally different RDF triples. Even if it appeared as a side effect of
the RDF/XML serialisation, it may be useful generally.
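
A rough sketch of the blank node indirection, again in plain Python
with made-up names (load_document, hasLexicalForm, isBarcodeOf and
:Computer42 are all assumptions for illustration). Each document's
blank nodes are renamed on load, so two documents that describe the
same string never end up sharing a subject.

```python
import itertools

_doc_ids = itertools.count()


def load_document(triples):
    """Rename blank nodes (labels starting with '_:') so they are unique per document."""
    doc_id = next(_doc_ids)

    def rename(term):
        if term.startswith("_:"):
            return f"_:doc{doc_id}_{term[2:]}"
        return term

    return {(rename(s), p, rename(o)) for s, p, o in triples}


# Both documents use the label _:d, but for unrelated things.
doc_a = load_document([
    ("_:d", "hasLexicalForm", '"01-02-1481"'),
    ("_:d", "isDateIn", ":MuslimCalendar"),
])
doc_b = load_document([
    ("_:d", "hasLexicalForm", '"01-02-1481"'),
    ("_:d", "isBarcodeOf", ":Computer42"),
])

merged = doc_a | doc_b

# Group predicates by subject to see what each node claims.
claims = {}
for s, p, o in merged:
    claims.setdefault(s, set()).add(p)
```

The literal still appears in both documents, but only in the object
position, so the date claim and the barcode claim stay on separate
nodes after the merge.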

> 4. It has been noted that one can map datatyping into RDF itself by treating
> the datatypes as properties, and there are several use cases for this. The
> natural way to do it involves having literals as subject, since the dataype
> map goes from the string to the value:
>
> "23" xsd:number "23"^^xsd:number .

Would this imply that wherever the string "23" was used in any RDF
triples you have access to, it would necessarily mean 23 (the number)?
I would then need to accept that I could never use "23" to mean the
characters 2 and 3 put together for any reason, perhaps as a
hex-encoding or trademarked symbol, because there would be no way of
isolating it from the statement that it was necessarily a number, the
sum of 11 and 12, rather than the string "23", which is "2" and "3"
put together and not equal to any number.

What if they also needed to know that 22 was a number, etc., ad
infinitum? Use cases that only make non-trivial sense once an entire
number line is present in an RDF database seem to harm the argument
rather than further it, in my opinion.

> 5. Also, allowing this "purely academically" has the notable advantage of
> simplifying RDF(S) inferencing, including making the forward-chaining rules
> simpler. Right now, there is a strange oddity involving blank node
> instantiations. One can say things like 'the number of my children is prime"
> by using an blank node:
>
> :PatHayes hasNumberOfKids _:x .
> _:x :a :PrimeNumber .
>
> But this legal RDF can't be instantiated in the obvious way:
>
> :PatHayes hasNumberOfKids "3"^^xsd:number .
> "3"^^xsd:number :a "PrimeNumber .   XXXX
>
> This trips up RDFS reasoners, which can often produce inferences by a kind
> of sneaky use-a-bnode-instead maneuver even when the obvious conclusion
> cannot be stated because of the restriction. (There are a few examples in
> the RDF semantics document.) Removing the restriction would enable reasoners
> to work more efficiently with a smaller set of rules. (I gather that at
> least some of the RDFS rule engines out there already do this, internally.)

The reasoner could have some number theory knowledge embedded to imply
the nature of the typed literal.
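
As a hedged sketch of what I mean, assuming typed literals are held as
(lexical form, datatype) pairs and reusing the xsd:number notation
from the example above (the function names and :Someone are invented
for illustration):

```python
from math import isqrt


def is_prime(n):
    """Trial division; enough for a sketch."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def value_of(literal):
    """Map a (lexical, datatype) pair to a Python value, or None if unknown."""
    lexical, datatype = literal
    if datatype == "xsd:number":  # datatype name as written in the thread
        return int(lexical)
    return None


def subjects_with_prime(triples, predicate):
    """Find subjects whose object for `predicate` is a prime-valued literal."""
    result = set()
    for s, p, o in triples:
        if p == predicate and isinstance(o, tuple):
            value = value_of(o)
            if value is not None and is_prime(value):
                result.add(s)
    return result


triples = [
    (":PatHayes", "hasNumberOfKids", ("3", "xsd:number")),
    (":Someone", "hasNumberOfKids", ("4", "xsd:number")),
]
prime_parents = subjects_with_prime(triples, "hasNumberOfKids")
```

The primality of 3 is computed from the literal's value when the query
runs, so no triple with a literal subject ever has to be stored.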

All of the use cases so far are completely factual, and could be
derived at any time using an algorithm or a query structure that
merged the literals. If there were any opinion attached to the
statement that 3 was in category X, then you might need to state it,
but it wouldn't be a general need and you could cope with whatever
(URI/blank node) hacks were necessary to get 3 to be in the category
without changing the way the RDF language is specified.

>> 2) Does literal as subject make sense in "linked data" (I ask mainly
>> from a "follow your nose" perspective) if blank nodes are considered
>> controversial?
>
> Seems to me that from the linked data POV, anything that can be an object
> should also be useable as a subject. Of course, that does allow for the view
> that both of them should only ever be IRIs, I guess.

I have not come to that conclusion myself. I come at Linked Data from
the same perspective as the traditional linked web. In the traditional
web it is of great importance that you know that "document A links to
document B" as opposed to just knowing that "a link exists between
either document A and document B or document B and document A". If we
assume that all links in RDF are necessarily bidirectional, with the
appropriate change to the predicate, then it becomes unusually
difficult to represent that directional knowledge (it takes two or
three triples), even if it becomes easier to work with numbers and
dates in a reasoner. You may be enabling one case at the cost of
another case.

Having said all that, if you really need Subjects to be Literals so
that things become possible that aren't currently possible, then it
may be worth the trouble to change the language specification. If you
just want to make a few cases easier to deal with in your nicely
bounded Reasoning-enabled RDF datastore, and you don't actually need
everyone on the web to change for a case to work, then I think you
could just extend the language and make it work for yourself,
considering that the change adds a feature while possibly taking away
a current implicit feature (link directionality).

It may also help the literals-as-subjects case if N3 were standardised
as an RDF++. Then the community could focus on the N3 assumptions of
symmetry of RDF statements and its focus on numbers and logic, rather
than on the (possibly directional) link-oriented RDF/XML format that
needs to be able to represent incomplete data across many different
locations and disciplines.

Cheers,

Peter
Received on Thursday, 1 July 2010 10:21:54 UTC