Re: my laziness with literals

On Oct 12, 2007, at 10:25 AM, Garret Wilson wrote:

>
> Dan,
>
> Dan Brickley wrote:
>>
>> Garret Wilson wrote:
>>> But if we're going to produce semantically rich data that can be
>>> machine-processed, we need to store things as they are, with
>>> appropriate indication of type.
>>
>> I'm not convinced of this. RDF/XML's syntax for datatyping is  
>> pretty heavyweight, and there are many RDF vocabularies that
>> pre-date RDFCore (i.e. created between 1997-2003).
>
> I was making a normative assertion---saying the way things *should*  
> be going forward.
>
> I agree completely with your comments regarding RDF/XML typed  
> literal syntax---but that's a problem with RDF/XML. If RDF/XML made  
> typed literals as easy to use as plain literals, would you agree  
> with me when I say that we *should* use appropriate types in the  
> future rather than making plain literals our first choice?
>
>>
>> It would be good to have a notation in RDFS/OWL (maybe OWL1.1  
>> could do it) to indicate that some plain-literal-valued property  
>> takes string values that can be cast to some specified datatype.
>
> OMG. Of course it would be useful, but it's ludicrous because of  
> what it says about how hard it is to use RDF/XML datatypes. I'm  
> wondering whether to laugh or to cry (not because what you say is  
> laughable---but because of the conditions that make your suggestion  
> useful).

I wonder if you guys could clarify what you have in mind as the  
problems with RDF/XML datatypes?  The reason I ask is that what many  
people find troublesome is the requirement to explicitly specify a  
type with every literal.  If that's what you're referring to, that's  
not an artifact of RDF/XML, it's part of the way RDF itself defines  
datatyped literals (and there's a daunting amount of email in the RDF  
Core archives concerning the tradeoffs involved in that  
requirement).   What we used to call "long-distance datatyping" would  
be very convenient;  but it's not as easy as it looks in a SW  
environment.
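
A minimal sketch of the per-literal typing requirement described above, using only the Python standard library. The `Literal` class and its `to_ntriples` method are illustrative names, not part of any RDF toolkit; the point is only that the datatype travels with each literal, so a plain "123" and an xsd:integer "123" are different terms.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Illustrative model of an RDF literal (hypothetical class, not a library API)."""
    lexical: str
    datatype: Optional[str] = None  # None models a plain (untyped) literal

    def to_ntriples(self) -> str:
        # Escape backslashes and double quotes, then append the datatype URI if present.
        escaped = self.lexical.replace("\\", "\\\\").replace('"', '\\"')
        if self.datatype is None:
            return f'"{escaped}"'
        return f'"{escaped}"^^<{self.datatype}>'


plain = Literal("123")
typed = Literal("123", XSD + "integer")

print(plain.to_ntriples())
print(typed.to_ntriples())
print(plain == typed)
```

Nothing outside the literal (for example, a declaration on the property) supplies the type here, which is why the type has to be repeated on every typed literal.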

>
>>
>> RDF has special handling for URIs. Almost always people are  
>> interested in the thing the URI is identifying, not in the URI  
>> string itself.
>
> I'm not sure what you're saying. Are you saying that any time a  
> processor sees a plain literal starting with "http://example.com/",  
> it should assume that the type is URI because people never want to  
> identify the URI string itself? If we have to rely on the context,  
> aren't we back in plain XML land?
>
> If people want a string of "http://example.com/", they should use a  
> string. If people want a URI of <http://example.com/>, they should  
> use a typed literal of xsd:anyURI type. If they want a resource  
> identified by the URI <http://example.com/>, they should use a  
> resource with that URI. Isn't that the perfect world scenario?  
> That's what I was pushing for---a perfect world. :)
>
>
>>
>>> * If you're going to store a language, use something like  
>>> info:lang/en/US.
>>> * If you're going to store a Java class, use something like
>>> info:java/com/example/package#Class.
>>
>> There is a java: URI scheme. This is used for example in ARQ for  
>> dynamic code loading. I don't see a case for using info: instead.
>
> There might have been *plans* for a Java URI scheme back when you
> suggested it over eight years ago
> (<http://lists.xml.org/archives/xml-dev/199903/msg00165.html>), but I
> don't think it was ever standardized, and the link you cited
> (<http://www.w3.org/Addressing/schemes.html>) no longer references
> such a scheme. If such a scheme has been standardized, by all means
> let me know. Otherwise, I'm going with info:java/ .
>
>>
>>> * If you're going to store an Internet media type, use something  
>>> like info:media/text/plain.
>>
>> Or dc:format?
>
> dc:format is a property. I'm talking about resource types. The  
> whole point of RDF is that we can tell the types of resources  
> without knowing what predicate is being used.
>
>> It's good to agree on ways of doing these things, but your choices  
>> seem a little arbitrary,
>
> There is no "java:" URI scheme, so there is no alternative to  
> info:java/. dc:format is a property, not a resource type, so saying
> that is an alternative is comparing apples to oranges. So I don't  
> know of any alternatives to my choices---if there were choices, I  
> would have used them. By all means, I'm interested in knowing other  
> choices.
>
>> and not yet widely used.
>
> ...because it's easier to stick things in plain literals.
>
>>
>> Can we take "Be liberal in what you accept, and conservative in  
>> what you send." (see http://www.postel.org/postel.html ) as a  
>> shared goal here?
>
> In a semantic context?! No, no, no!
>
> "Be liberal in what you accept, and conservative in what you send"  
> is useful in certain circumstances when interpreting syntax and  
> protocols. But in a semantic context, it's horrible---I don't want  
> to send you a string "123" and have you use it as the integer 123  
> just because you noticed I used digits in the string! Similarly, I  
> don't want to send you the string "www.something" and have you try  
> to look up a web page just because you noticed there was a "www" in  
> there somewhere. Will the strength of our semantic exchange rely on  
> how good our heuristic algorithms are? The whole point of a  
> semantic framework is that we identify the types of things we're  
> using! Otherwise, we could just stick everything in XML and tell  
> people to guess about types based upon context.
>
> Anyway, just a statement from experience trying to encapsulate best  
> practice---didn't realize this would be controversial.

But best practice for what?  I think there are two slightly different  
issues involved here.  The idea of developing an entirely open  
datatype facility in RDF was to allow people to exactly identify the  
datatype of the literals they are publishing, without having to do  
(or even worry about the possible need for) any conversion to some  
canonical set of datatypes defined by RDF.  So if I'm publishing  
(integer) ages using literals obtained from a Java program, and you  
are publishing (integer) ages using literals obtained from an SQL  
database, we might want to use separate datatypes (even if we're  
talking about values of the same RDF property), in order to identify  
exactly the datatype the literal is associated with at its source.   
That, to me, is *more* semantics about the literal, not less  
semantics.  But it does place the burden of dealing with any  
differences between the semantics of the source and target datatypes  
(here, possible slight differences between "integers") on the  
consumer.  Anyway, in this context I think at least a form of "Be  
liberal in what you accept, and conservative in what you send"  
continues to be good advice, if we can understand "conservative" as  
meaning "provide the receiver with as much relevant metadata (in this  
case, the datatype) about what you're sending as you can".  Notice  
that there's no rule that says that a receiver can't ignore the  
datatype part of the literal and interpret it according to some  
locally-defined scheme;  but the information is there if the receiver  
wants it.  (We might also want to add to the end of the rule the old  
boxing adage "protect yourself at all times"!)
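
As a rough sketch of what that mediation might look like on the receiving side, assuming two made-up datatype URIs for the Java and SQL sources (`JAVA_INT` and `SQL_INTEGER` are hypothetical, as are the function names), a consumer could keep a table of converters and fall back to the raw lexical form when it does not recognise the datatype:

```python
from typing import Callable, Dict, Optional, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical datatype URIs, invented for this example only.
JAVA_INT = "http://example.org/datatypes#javaInt"
SQL_INTEGER = "http://example.org/datatypes#sqlInteger"


def parse_java_int(lexical: str) -> int:
    """Parse an integer and enforce the 32-bit range of a Java int."""
    value = int(lexical)
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError(f"out of range for a Java int: {lexical}")
    return value


# The consumer decides how each source datatype maps onto its local types.
CONVERTERS: Dict[str, Callable[[str], int]] = {
    XSD_INTEGER: int,
    JAVA_INT: parse_java_int,
    SQL_INTEGER: int,
}


def interpret(lexical: str, datatype: Optional[str] = None) -> Union[int, str]:
    """Use the datatype when it is known; otherwise keep the string and do not guess."""
    converter = CONVERTERS.get(datatype) if datatype is not None else None
    if converter is None:
        return lexical
    return converter(lexical)


print(interpret("42", JAVA_INT))
print(interpret("42", SQL_INTEGER))
print(interpret("42"))
```

The sender states the source datatype exactly; the receiver accepts anything, converts what it understands, and leaves the rest as strings rather than inferring a type from the digits.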

Garret, what you seem to be suggesting is more of a shared set of
types that everyone would agree to use, and where the mediation with
local types would already have taken place.  More of this would be a
good thing, provided it doesn't lock out other data ("be liberal in
what you accept...").  And it can be done on top of what's already
there.

--Frank

>
> Best,
>
> Garret
>

Received on Friday, 12 October 2007 17:33:34 UTC