Re: long-range datatyping and rdfa/microdata

On 2011-06-10, at 13:07, Dan Brickley wrote:

> On 10 June 2011 13:58, Steve Harris <steve.harris@garlik.com> wrote:
> 
>>> Well foaf:age does not currently have a datatype set, for the
>>> verbosity reasons you give. Perhaps datatype annotations can be added
>>> into RDFa 1.1 profiles?
>> 
>> Honestly, I'm not convinced that the datatyping matters in this kind of situation. If someone says their age is "39"^^xsd:string, so what? The FOAF world (as will be any schema where the data is so widely used) is full of nonsense like foaf:age "twelve", foaf:age "79b8c9c56b0b879941b3cd424b1af2bc", foaf:age "三十九"@zh and so on. The fact that some of them aren't tagged as xsd:integers doesn't even register on the scale of the kind of nastiness you have to cope with.
> 
> Yes, I was never expecting instance publishers (of RDFa or any similar
> markup) to add that stuff.
> 
> Hopefully quality issues will be partially addressed when people
> actually start using this sort of data for apps that people care about
> (SEO or whatever). Right now, there is almost no incentive to get
> things right.

Perhaps, but many SEO attempts are misguided and/or largely based on superstition: bogus keywords in <head>, huge URLs stuffed with nonsense, deliberate typos to catch search engine users who can't spell, and so on.

Sadly, at various points in time, these things have actually helped SEO, so they persist.

>> If you're trying to calculate the average age of people interested in frogs, then you just have to ignore anything that doesn't cast cleanly to an integer.
> 
> So you don't see any value in knowing in advance which properties are
> intended to be castable to integers?

It depends. In the case of FOAF or GoodRelations, for example, not really. Your app needs loads of domain knowledge to make sense of the kinds of things you find anyway, so just knowing that the thing on the RHS of a foaf:age predicate should probably be an integer is a pretty minor advantage in comparison.

Other things that you might learn from the schema (e.g. that foaf:mbox_sha1sum is inverse functional) will land you in hot water pretty quickly, if you actually believe them. I don't see the data getting cleaner over time.

Knowing the intended datatype would be more useful if you're writing domain-neutral tools, which don't have embedded knowledge of the schema, but I have trouble believing in a future where something of the complexity of GR could be usefully tackled with a completely domain-neutral tool. In more specialised areas, where there aren't specific tools, I can see it being an advantage.

In many cases a regular expression describing the allowed format would be just as useful.
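
As a rough illustration of the regular-expression approach, combined with the "ignore anything that doesn't cast cleanly" rule from earlier in the thread, here is a minimal Python sketch. The pattern (one to three ASCII digits), the function names, and the sample list are all assumptions made up for the example; nothing here comes from the FOAF spec.

```python
import re

# Hypothetical format rule for foaf:age lexical forms: 1-3 ASCII digits.
# [0-9] is used instead of \d because \d also matches non-ASCII digits
# in Python 3 string patterns.
AGE_PATTERN = re.compile(r"[0-9]{1,3}")


def clean_ages(literals):
    """Return the values whose lexical form matches AGE_PATTERN, as ints.

    Surrounding whitespace is tolerated; everything else is dropped.
    """
    ages = []
    for lexical in literals:
        candidate = lexical.strip()
        if AGE_PATTERN.fullmatch(candidate):
            ages.append(int(candidate))
    return ages


def average_age(literals):
    """Average of the usable ages, or None if nothing survives the filter."""
    ages = clean_ages(literals)
    return sum(ages) / len(ages) if ages else None


# Sample values in the spirit of the ones quoted above (datatypes and
# language tags are ignored, only the lexical form is inspected).
samples = ["39", "twelve", "79b8c9c56b0b879941b3cd424b1af2bc", "三十九", " 41 "]
print(clean_ages(samples))
print(average_age(samples))
```

Note that the filter never looks at the declared datatype, which is the point: the lexical form decides whether a value is usable.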

As a side note, "三十九"@zh is an integer string, so arguably conforms to the spec :) I think it's 3 10 9, the canonical representation for 39 in traditional Chinese, but I could well be wrong.
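
For anyone who wants to check the per-character values, a small Python sketch using the standard unicodedata module follows. The helper name lexical_int is made up for the example, and the sketch only looks up each character's Unicode numeric value; it makes no attempt to parse Chinese numerals in general.

```python
import unicodedata

literal = "三十九"

# Numeric value Unicode assigns to each character, one at a time.
parts = [int(unicodedata.numeric(ch)) for ch in literal]
print(parts)


def lexical_int(s):
    """Hypothetical helper: int(s) if Python accepts it as a decimal string, else None."""
    try:
        return int(s)
    except ValueError:
        return None


# These characters are numeric but not decimal digits, so a plain cast fails.
print(lexical_int(literal))
print(lexical_int("39"))
```

So a naive integer cast would throw this value away along with "twelve", even though a human reader would accept it.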

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Friday, 10 June 2011 12:35:11 UTC