Re: Schema.org property cardinality and use of plural (WAS Re: SoftwareApplication proposal for schema.org)

tl;dr: just wanting to +1 another good email in this thread:

On Wed, Feb 29, 2012 at 00:23, Martin Hepp
<martin.hepp@ebusiness-unibw.org> wrote:
> Hi Adrian, all:
>
> I really, really think it is a bad idea to model multiple values for the same property by using delimiters inside a single element. This would basically re-introduce all the problems that delimiter-based data exchange (e.g. CSV files) had and that markup-based representations (XML, HTML,...) had overcome.

Agreed for HTML. I think XML (especially on the web) has so many of
its own problems that I'm unsure about whether it is better or worse
than CSV when considering all issues/problems.

As another example, whenever we encountered such a compound property
in other formats that were being considered for adoption in
microformats, we "singularized" the names of the properties explicitly
to solve this problem. I've written up a bit about this here (also
noting what you pointed out about CSV):

http://microformats.org/wiki/cardinality#adopting_compound_properties_from_other_formats


> So I would strongly suggest to use this pattern:
>
> <div itemscope="" itemtype="http://schema.org/SoftwareApplication">
>  <p itemprop="operatingSystems">OSX 10.6</p>
>  <p itemprop="operatingSystems">Windows 7</p>
> ...
> </div>

It's a good example.
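
To make the difference concrete, here is a minimal sketch (Python,
standard library only) of what a consumer has to do in each case. It
is not a real microdata parser: it assumes flat markup with no nested
elements inside the property, and the names ItempropCollector and
collect, as well as the "Windows 7, Service Pack 1" value, are made up
for illustration.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the text of every element carrying a given itemprop.

    Hypothetical helper: handles flat elements only, no nesting.
    """

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.values = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # Start a new value whenever an element has the wanted itemprop.
        if dict(attrs).get("itemprop") == self.prop:
            self._capturing = True
            self.values.append("")

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data

    def handle_endtag(self, tag):
        # Flat markup assumed: any end tag closes the current value.
        self._capturing = False


def collect(html, prop):
    parser = ItempropCollector(prop)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


repeated = """
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <p itemprop="operatingSystems">OSX 10.6</p>
  <p itemprop="operatingSystems">Windows 7</p>
</div>
"""

# Invented example: one of the two intended values contains a comma.
delimited = """
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <p itemprop="operatingSystems">OSX 10.6, Windows 7, Service Pack 1</p>
</div>
"""

# One element per value: the markup already did the splitting.
print(collect(repeated, "operatingSystems"))

# One element, many values: the consumer has to guess the delimiter,
# and here the guess yields three values where two were intended.
print(collect(delimited, "operatingSystems")[0].split(", "))
```

With repeated elements the values come out as-is. With the delimited
form, every consumer needs its own splitting (and eventually quoting
and escaping) rules, which is the CSV problem all over again.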


> or, if you cannot reuse the visible content for some reasons
> <div itemscope="" itemtype="http://schema.org/SoftwareApplication">
>  <p>OSX 10.6, Windows 7</p>
>  <meta itemprop="operatingSystems" content="OSX 10.6">
>  <meta itemprop="operatingSystems" content="Windows 7">
> ...
> </div>
>
> Note that, afaik, meta can live without a closing </meta> tag in HTML5.

This is a reasonable HTML5 work-around, and worthy of being
documented, yet as with any invisible (meta)data, it should be
strongly discouraged for all the usual data deterioration reasons:
invisible data tends to drift out of sync with the visible content.
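
As a rough illustration of that deterioration, here is a small sketch
(Python, standard library only) that flags meta values which no
longer appear in the page's visible text. The names MetaDriftChecker
and stale_values are hypothetical, the substring check is a
deliberately naive assumption, and the "OSX 10.7" edit is an invented
example of someone updating the visible text but forgetting the meta
element.

```python
from html.parser import HTMLParser


class MetaDriftChecker(HTMLParser):
    """Gather meta itemprop values and the visible text of a fragment.

    Hypothetical helper for illustration, not a microdata parser.
    """

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.meta_values = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("itemprop") == self.prop:
            self.meta_values.append(attributes.get("content") or "")

    def handle_data(self, data):
        self.visible_text.append(data)


def stale_values(html, prop):
    """Return meta values that do not occur in the visible text."""
    checker = MetaDriftChecker(prop)
    checker.feed(html)
    checker.close()
    # Collapse whitespace so line breaks in the markup do not matter.
    text = " ".join("".join(checker.visible_text).split())
    return [value for value in checker.meta_values if value not in text]


# Visible text was edited to 10.7; the invisible meta was left behind.
drifted = """
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <p>OSX 10.7, Windows 7</p>
  <meta itemprop="operatingSystems" content="OSX 10.6">
  <meta itemprop="operatingSystems" content="Windows 7">
</div>
"""

print(stale_values(drifted, "operatingSystems"))
```

Nothing forces anyone to run a check like this, which is the point:
when the data is the visible content, there is nothing to keep in
sync in the first place.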

>
> Martin

Thanks for writing these up Martin,

Tantek

>
> On Feb 29, 2012, at 9:07 AM, Adrian Giurca wrote:
>
>> Hello Henry,
>> I would say space will be the token separator :) (a bad result). But in this context, the main problem is not how to extract triples but what content creators really do.
>> I am confident that a non-trivial schema processor (extractor) will do more than simple DOM parsing.
>>
>> -Adrian Giurca
>> On 2/28/2012 7:56 PM, Henry Andrews wrote:
>>> With the caveat that I'm new here and probably don't know what I'm talking about, this plural/list usage does not look like a good idea, as it requires anyone who wants to make use of the data to understand that it needs to parse and split on the comma.  Which is easy enough in this example but can become very complex in terms of quoting and escaping, at which point people are likely to write things improperly quoted/escaped making the data worthless.  It's much much easier to say that all formatting/parsing should be handled by the actual markup syntax (in this case HTML) and values are treated as-is.
>>>
>>> I guess this would make for more verbose HTML markup as you'd need to wrap each OS in a <span itemprop="operatingSystem"></span>, but I think it's much more clean.
>>>
>>> thanks,
>>> -henry
>>>
>>> From: Adrian Giurca <giurca@tu-cottbus.de>
>>> Subject: Re: Schema.org property cardinality and use of plural (WAS Re: SoftwareApplication proposal for schema.org)
>>>
>>> When Text is expected I would say that both string and distinct markup should be allowed. As such, the below may work too:
>>> <div itemscope="" itemtype="http://schema.org/SoftwareApplication">
>>>   <p itemprop="operatingSystems">OSX 10.6, Windows 7</p>
>>> ...
>>> </div>
>>>
>>> and a potential Schema processor should be advised. Of course, this can be solved much better by introducing cardinalities on Schema.org.
>>> Introducing cardinalities will not put any pressure on possible existent Schema.org consumers.
>>> However, one should be advised that object oriented software design has a long tradition on using plural to introduce collections of things.
>>>
>>> -Adrian  Giurca
>>>
>>
>> --
>> - Adrian
>
> --------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail:  hepp@ebusiness-unibw.org
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>         http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
> * Project Main Page: http://purl.org/goodrelations/



-- 
http://tantek.com/ - I made an HTML5 tutorial! http://tantek.com/html5

Received on Sunday, 4 March 2012 04:16:00 UTC