Re: Proposal: Looking inside tables

Hi Martin,

Sorry for the delayed reply. I just came back from vacation... Thank you
for your detailed and deep feedback. I would like to try to address some of
the concerns you have with this proposal.

In general, I think there are many valid use cases for schema-level mark-up
of table contents (e.g., many open data scenarios), and I see value in
having a unified model that doesn't separate between the HTML case and
other tables (CSV, etc.). I agree with you that this is a generic problem
that should not be tied to a specific vocabulary or syntax. In fact, there
is nothing in this proposal that is specific to schema.org.

On specific issues (your numbering):

1. Confusing developers with schema-level rules: I agree that this
mechanism requires additional support from the mark-up processors that want
to handle it, but the implementation should be fairly straightforward, and
the semantics of the transformation would be clearly specified by a spec
that remains to be written (not by proprietary inference rules).

2. Too simplistic for advanced patterns: Your example of GeoCoordinates is
supported by the nesting of properties. As described in the "Nested
descriptions" section, you can write (e.g., in JSON):

{
  "@context": {
    "@vocab": "http://schema.org/",
    "t2": "http://my.domain.org/cities.csv#"
  },
  "@type": "SetOf/City",
  "@id": "http://my.domain.org/city/{t2:col:city-code}",
  "name": "{t2:col:city-name}",
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": "{t2:col:city-lat}",
    "longitude": "{t2:col:city-lon}"
  }
}

I think this is actually one of the strengths of mapping from the vocab
schema to the table columns vs. the other way around.
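
To make the transformation concrete, here is a minimal Python sketch of how
a processor might expand such a template, one description per table row. It
assumes the {prefix:col:column-name} placeholder syntax from the example
above; the expand function and the sample rows are hypothetical, and the
sketch does not interpret "SetOf/City" or the @context, which the spec would
need to define.

```python
import re

# Placeholder syntax taken from the example above: {prefix:col:column-name}.
PLACEHOLDER = re.compile(r"\{(\w+):col:([\w-]+)\}")


def expand(template, row):
    """Return a copy of template with column placeholders replaced by row values."""
    if isinstance(template, dict):
        return {key: expand(value, row) for key, value in template.items()}
    if isinstance(template, list):
        return [expand(item, row) for item in template]
    if isinstance(template, str):
        # group(2) is the column name; the prefix (group 1) is ignored here.
        return PLACEHOLDER.sub(lambda m: str(row[m.group(2)]), template)
    return template


template = {
    "@type": "SetOf/City",  # passed through unchanged by this sketch
    "@id": "http://my.domain.org/city/{t2:col:city-code}",
    "name": "{t2:col:city-name}",
    "geo": {
        "@type": "GeoCoordinates",
        "latitude": "{t2:col:city-lat}",
        "longitude": "{t2:col:city-lon}",
    },
}

# Hypothetical rows standing in for cities.csv (e.g. as read by csv.DictReader).
rows = [
    {"city-code": "PAR", "city-name": "Paris",
     "city-lat": "48.8566", "city-lon": "2.3522"},
    {"city-code": "BER", "city-name": "Berlin",
     "city-lat": "52.5200", "city-lon": "13.4050"},
]

descriptions = [expand(template, row) for row in rows]
```

With these rows, descriptions[0]["@id"] is http://my.domain.org/city/PAR,
and each description carries its own nested geo node.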

3. On the proposal being complicated: I think there is a trade-off to strike
between expressiveness and simplicity. This is the simplest formulation we
could come up with that supports modeling the contents of reasonably
complex tables. My sense is that the only viable simpler option is the one
you proposed (which we also considered in earlier iterations), but it is
less expressive because it does not support nesting / creation of
intermediary nodes.
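
For comparison, a rough Python sketch of what a flat, column-first mapping
could produce, assuming each column maps to exactly one property of the
row's entity (in the spirit of typeOfEntries / mapsToProperty). The names
and the sample row are hypothetical, and this is my reading of the simpler
option, not a worked-out proposal:

```python
# Hypothetical flat mapping: one entity type for the table, and one
# property per column. There is no way to introduce an intermediate node.
type_of_entries = "http://schema.org/City"
maps_to_property = {
    "city-name": "name",
    "city-lat": "latitude",
    "city-lon": "longitude",
}


def describe_row(row):
    """Build one flat entity description from a table row."""
    entity = {"@type": type_of_entries}
    for column, prop in maps_to_property.items():
        entity[prop] = row[column]
    return entity


# Hypothetical sample row.
row = {"city-name": "Paris", "city-lat": "48.8566", "city-lon": "2.3522"}
flat = describe_row(row)
```

Here latitude and longitude land directly on the City; there is no place to
say that they belong to an intermediate GeoCoordinates node.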

4. Putting this extension in schema.org: As I mentioned above, I don't
think this proposal is limited to schema.org. Schema.org is a good place to
start the discussion, since it has a direct impact on the behavior of
search engines.

Best regards,
-Omar


On Mon, Sep 2, 2013 at 4:18 PM, Guha <guha@google.com> wrote:

> Martin,
>
>   There are two distinct things to represent --- the structure of the
> table itself & the content represented by the table. One approach would be
> to treat the former as the primary set of objects and map them to the latter
> (it is the latter that we are really after). The second would be to focus on
> the latter, which is what Omar's proposal does. The first strategy (the one
> you propose) is a good technique too. Would you care to make a concrete
> proposal along these lines?
>
> guha
>
>
> On Wed, Aug 21, 2013 at 10:06 AM, Martin Hepp <
> martin.hepp@ebusiness-unibw.org> wrote:
>
>> Thanks, Guha! In that case - and if you decide to add/extend the support
>> for modeling meta-data with schema.org, then it should IMO be done in a
>> conceptually clean manner, i.e. by adding respective properties and types
>> as in my "from the top of my head" example instead of overloading existing
>> properties that are defined for instance data. This would effectively boil
>> down to adding the properties
>>
>>     typeOfEntries - URL of schema.org type
>>     column - columns of the table (optional, since directly available
>> from the HTML tree)
>>
>> to http://schema.org/Table and to create a new type
>>
>>     http://schema.org/TableColumn ( or Dimension or something more
>> generic)
>>
>> with the property
>>
>>     mapsToProperty - URL of schema.org property
>>
>> for the basic case (without modeling links between tables, defining
>> columns as primary keys or foreign keys, etc.)
>>
>> Martin
>> On Aug 20, 2013, at 8:07 PM, Guha wrote:
>>
>> > Martin,
>> >
>> >  Thank you for your feedback. It is always good to see a spirited
>> > discussion.
>> >
>> >  For better or worse, it turns out that marking each cell in a table is
>> > not always an option. So ...
>> >
>> > guha
>> >
>> >
>> > On Tue, Aug 20, 2013 at 2:52 AM, Martin Hepp <
>> > martin.hepp@ebusiness-unibw.org> wrote:
>> > Hi all,
>> > just from the top of my head - if you really want to add such a
>> > meta-data extension, why don't you create a type
>> >
>> >     http://schema.org/Table
>> >
>> > with the properties
>> >
>> >     typeOfEntries - URL of schema.org type
>> >     column - columns of the table (optional, since directly available
>> >     from the HTML tree)
>> >
>> > and a second type
>> >
>> >     http://schema.org/TableColumn ( or Dimension or something more
>> >     generic)
>> >
>> > with the property
>> >
>> >     mapsToProperty - URL of schema.org property
>> >
>> > That should do the trick without mixing apples (tables) and oranges
>> > (table entries).
>> >
>> >
>> > But in general, I strictly oppose the proposed extension, since it adds
>> > an unnecessary intermediate level. Putting in the proper markup directly in
>> > the body of the table is
>> >
>> > 1. cleaner,
>> > 2. requires no special processing, and
>> > 3. works in RDFa and Microdata without proprietary inferences.
>> >
>> > <table>
>> >   <thead>
>> >     <tr>
>> >       <th>Image</th>
>> >       <th>Name</th>
>> >       <th>Year</th>
>> >       <th>Technique</th>
>> >       <th>Dimensions</th>
>> >       <th>Gallery</th>
>> >     </tr>
>> >   </thead>
>> > <tbody>
>> >    <tr itemscope itemtype="http://schema.org/Painting">
>> >         <td><a itemprop="image" href="...URL">Image</a></td>
>> >         <td itemprop="name">The Cry</td>
>> >         ...
>> >    </tr>
>> > ...</tbody>
>> > </table>
>> >
>> >
>> > My strongest argument, however, is that most tables today are generated
>> > with loops from database content, so the HTML / template will contain the
>> > data markup in only one place anyway. So there is really no reason to add
>> > such an awkward, proprietary mechanism that mixes syntax and vocabulary.
>> >
>> >
>> > Martin
>> >
>> >
>> >
>> > On Aug 14, 2013, at 2:47 PM, Dan Brickley wrote:
>> >
>> > > Re-fwd'ing this as Omar's mail didn't get distributed for some reason.
>> > >
>> > > Somewhat related, see also schema.org position paper to W3C Open Data
>> > > on the Web workshop recently,
>> > > http://www.w3.org/2013/04/odw/odw13_submission_53.pdf via
>> > > http://www.w3.org/2013/04/odw/papers
>> > >
>> > > Dan
>> > >
>> > > ---------- Forwarded message ----------
>> > > From: Omar Benjelloun (عمر بنجلون) <benjello@google.com>
>> > > Date: 13 August 2013 22:04
>> > > Subject: Proposal: Looking inside tables
>> > > To: public-vocabs@w3.org
>> > > Cc: Ramanathan Guha <guha@google.com>, Dan Brickley <
>> > > danbri@google.com>
>> > >
>> > >
>> > > Hi,
>> > >
>> > > Many useful datasets on the Web take the form of tables. The goal of
>> > > this proposal is to provide a simple, schema.org-friendly way to "look
>> > > inside" these tables, and map their contents into triples.
>> > >
>> > > This is an early draft proposal developed at Google. We're seeking
>> > > feedback from the community.
>> > >
>> > > The proposal is attached to this e-mail, and will be uploaded to the
>> > > WebSchemas/SchemaDotOrgProposals page shortly.
>> > >
>> > > Thanks,
>> > > -Omar
>> > > <Lookinginsidetables.html>
>> >
>> > --------------------------------------------------------
>> > martin hepp
>> > e-business & web science research group
>> > universitaet der bundeswehr muenchen
>> >
>> > e-mail:  hepp@ebusiness-unibw.org
>> > phone:   +49-(0)89-6004-4217
>> > fax:     +49-(0)89-6004-4620
>> > www:     http://www.unibw.de/ebusiness/ (group)
>> >          http://www.heppnetz.de/ (personal)
>> > skype:   mfhepp
>> > twitter: mfhepp
>> >
>> > Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> > =================================================================
>> > * Project Main Page: http://purl.org/goodrelations/
>> >
>> >
>> >
>> >
>> >
>>
>


-- 
Omar Benjelloun | benjello@google.com | (415) 845-8516

Received on Wednesday, 4 September 2013 22:08:07 UTC