Re: Schema.org property cardinality and use of plural (WAS Re: SoftwareApplication proposal for schema.org)

tl;dr: making all properties optional and plural at a
syntactical/schema level is more future-proof.

slightly longer:

I saw the wiki page: http://www.w3.org/wiki/WebSchemas/Singularity but
I have such fundamental disagreements with its content that I'm
following up in email instead (wiki follow-up is my preference by
default).

1. bad idea to put cardinality into formats/schemas.
2. better to make each property plural at a format/schema level.
3. move any notion of "singular" semantics up to the per-application
level (use the first occurrence/value of a property if necessary).

Our experience with microformats has shown all of this, and thus we are
moving forward with 1-3 above in microformats-2 (explicitly making all
properties both optional and plural at a syntax level, permitting
applications to apply any needed singular semantics to the first
instance of a property).

See point (2.) here:

http://microformats.org/wiki/microformats-2#Summary
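
A minimal sketch of points 1-3 in Python, assuming a parsed item is a
plain dict that maps each property name to a list of values. The `item`
shape and the `all_values` / `first_value` helpers are illustrative
assumptions, not something defined by any microformats spec:

```python
from typing import Any, Optional

# Hypothetical parsed item: every property is a list, even when only
# one value was published, and any property may be absent entirely.
item = {
    "name": ["Alice Example"],
    "url": ["http://example.com/", "http://example.org/alice"],
}


def all_values(item: dict, prop: str) -> list:
    """Format-level view: a property is always a (possibly empty) list."""
    return list(item.get(prop, []))


def first_value(item: dict, prop: str, default: Optional[Any] = None) -> Any:
    """Application-level "singular" view: use the first occurrence, if any."""
    values = item.get(prop, [])
    return values[0] if values else default
```

An application that only wants one URL calls `first_value(item, "url")`;
one that wants them all calls `all_values(item, "url")`. Neither needs
the format to say which is "correct".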

Related: making properties optional has also been a hard lesson
learned, with repeated examples, from hCard to hAtom - which to be
fair took their notions of "required" properties from vCard and Atom,
though a mistake propagated is still a mistake.

In short: people will omit properties when publishing, and often think
of ways that it makes sense to do so - beyond what the original
vocabulary designers had in mind. Better to define what an omission
means at a per-vocabulary / per-application level than to make it
something syntactical.
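
As a rough illustration of handling omissions in the application rather
than in the syntax, here is a Python sketch. The `display_name` function
and its fallback order are hypothetical choices made by one imagined
application, and items are assumed to be dicts of property name to list
of values:

```python
def display_name(item: dict) -> str:
    """Pick something to show for a person, whatever was published.

    The fallback order (name, then nickname, then url) is this
    application's own choice, not something the format dictates.
    """
    for prop in ("name", "nickname", "url"):
        values = item.get(prop, [])
        if values:
            return values[0]
    # Nothing usable was published; the item is still accepted.
    return "(unnamed)"
```

An item with no "name" is not an error here; the application simply
decides what a missing name means for its own purposes.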

More inline:

On Thu, Mar 1, 2012 at 10:01 AM, Dan Brickley <danbri@danbri.org> wrote:
> On 24 February 2012 21:25, Will Norris <will@willnorris.com> wrote:
>> I had the same question when I first started looking at this.  There is a
>> certain simplicity in not requiring microdata vocabularies to define
>> cardinality of properties, and leaves the door open to interesting use cases
>> that may not have been initially imagined.
>
> Yes. Well there are two things here: do we define a cardinality

Experience with microformats has shown that attempts to define
cardinality in formats for publishing on the web create an unnecessary
point of failure/fragility.

So, no, don't bother.

> do we also bake that cardinality into the property name with an
> English plural 's' (or its absence)?

Even worse idea.

Simple answer (using hRecipe as a real-world, designed and implemented
source of examples):

http://microformats.org/wiki/hrecipe

1. use singular forms of English nouns, even for (expected)
multivalued properties. E.g. "ingredient" is the property name even
though pretty much all recipes have multiple ingredients (and thus
instances of that property)

2. plural forms of English nouns should only be used when the plural
says something specific about each instance of the property (e.g. that
it is an amount), not to imply that the property is or may be
multivalued. E.g. "instructions" is the property name because a single
property value itself likely contains multiple human-readable
instructions, and in practice recipes have a single instance of this
property. Related: the "calories" extension property (which is an
amount).
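
The two naming rules above can be sketched in Python as follows. The
flat `pairs` list is a hypothetical stand-in for what a parser might
emit, the recipe values are made up, and `group_properties` is an
illustrative helper:

```python
from collections import defaultdict

# Hypothetical flat output of a parser: one (property, value) pair per
# occurrence in the markup, in document order.
pairs = [
    ("ingredient", "2 cups flour"),
    ("ingredient", "1 cup water"),
    ("ingredient", "1 tsp salt"),
    ("instructions", "Mix everything. Knead. Bake for 40 minutes."),
    ("calories", "120"),
]


def group_properties(pairs):
    """Collect repeated occurrences under one singular property name."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


recipe = group_properties(pairs)
```

The singular name "ingredient" ends up holding three values, while the
plural-looking "instructions" and "calories" each hold one, so the
spelling of a name says nothing about how many values it carries.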


> I'm in favour of defining cardinality when it makes sense to do so

It never makes sense to do so at a format level.


> (and in the full expectation people will ignore or mess up whatever we
> try to impose).

Exactly why. Instead define application processing of such cases.


> But I don't think the experiment of using plural 's'
> markers has worked well.

Yes, using plural forms to automatically indicate anything, syntactic
or semantic, was and is a mistake. Let's stop propagating it
(and shame on whoever thought that experiment was a good idea :P)


> http://schema.org/Person has 'spouse' rather than 'spouses'. Are we
> really to assume the property can have at most one singular value?
> What about re-marriage, or societies (the Web having global reach)
> where multiple spouses are common?

Great example.

Cultural differences easily lead to disagreements about cardinality.
Avoid this problem by leaving out cardinality.


> I'd much rather see cardinality expressed schematically
> than through spelling,

Both are bad.

> since changing the expected spelling has impact
> on a *lot* of instance data.

Separate issue: unnecessary renaming in general is bad, and I'd advise
anyone who makes decisions on schema property names to consider
re-using existing property names, perhaps singularized as necessary
(as we've done with microformats), rather than inventing new names.
The latter is rampant throughout schema.org: lots of unnecessary NIH,
even/especially where we already had format convergence on the web
before schema.org, e.g. Person vs. vCard/hCard, Event vs.
iCalendar/hCalendar etc.


>> I think the same would apply to cardinality.  We provide guidance on
>> expected cardinality of properties, but always do the best we can with
>> whatever we get.
>
> Yes. With FOAF we declared some properties as having 'at most one
> proper value', or implying that
> there can be at most one entity with any given value. Sites got it
> wrong all the time, but at least the
> declaration helped track down some data problems.

And before that vCard (up through v3) made the same mistake of
declaring many properties to have at most one value.

Much of this was addressed in vCard4, where previously singular
properties were made plural.

In short, vocabulary designers get cardinality wrong all the time, so
you might as well give up trying. Seriously, y'all are not that smart.
None of us are. ;)

It's easier (and more future-proof) to simply allow every property to
be plural, and then define any perceived singular semantics at a
higher application level (which is where any notion of singular vs
plural actually matters, if at all). If the requirement changes,
changing the application is much easier than changing the format.
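
One way to picture this, as a Python sketch: the application keeps its
own list of properties it expects to be singular and reports surprises
without rejecting anything. `check_expectations`, the `person` data, and
the warning text are all hypothetical:

```python
def check_expectations(item: dict, expect_single: set) -> list:
    """Return warnings for properties with more values than this app expects.

    Nothing is rejected: the data stays valid at the format level, and the
    application can still fall back to the first value.
    """
    warnings = []
    for prop in sorted(expect_single):
        count = len(item.get(prop, []))
        if count > 1:
            warnings.append(f"{prop}: expected at most 1 value, got {count}")
    return warnings


person = {"name": ["Alice Example"], "spouse": ["Bob", "Carol"]}

# Today this application treats "spouse" as singular...
strict = check_expectations(person, {"name", "spouse"})
# ...and relaxing that later is a one-line change here, with no change
# to the format or to any published data.
relaxed = check_expectations(person, {"name"})
```

The expectation lives in one argument of one application, so revising
it touches neither the vocabulary nor existing pages.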

> If we have to choose between the JSON being a bit weird, or the
> HTML-based markups being a bit weird, I would go for the former. JSON
> feeds are relatively invisible, whereas the HTML source has a wider
> and more varied audience.

Agreed. HTML impacts more authors, and thus takes design precedence over JSON.

>> This same problem occurred with PortableContacts when you compare the XML
>> and JSON
>> representations: http://portablecontacts.net/draft-schema.html#anchor5.  For
>> what it's worth, PoCo used plural naming where properties were expected to
>> be multi-valued.

Which was also a mistake.

> Yup, it's hard designing a schema to work nicely in two quite
> different syntaxes.

Yes it is hard, but not impossible.

We've taken a shot at doing so for HTML and JSON in microformats 2.0 [1].

Comments appreciated (though perhaps better redirected to microformats-new[2]).

>
> cheers,
>
> Dan
>


Thanks,

Tantek

[1] http://microformats.org/wiki/microformats-2
[2] http://microformats.org/mailman/listinfo/microformats-new/

Received on Friday, 2 March 2012 00:43:56 UTC