Re: Discussion topic: How to improve understanding and application of schema.org from Niels on 2018-06-16 (public-schemaorg@w3.org from June 2018)

From: Niels <nielsl@xs4all.nl>
Date: Sat, 16 Jun 2018 12:07:22 +0200
To: public-schemaorg@w3.org
Message-ID: <ED47CAC8-C085-484B-A60E-A9336BA60890@xs4all.nl>
I think I am one of those users you are referring to.

I started using Schema.org two weeks ago, because I am only just getting into web development. I immediately got frustrated with it, because it is hard to implement and doesn't always fit my intent. Further, I found it leaves too much freedom to search engines, leading people to follow the examples of a certain search engine that came up with its own values for certain properties, rather than following schema.org and coercing search engines to follow the standard. The web is owned by neither Google nor Bing.

I am going to focus on the use of the schema vocab on the world wide web. I think if we smooth that use case out, it will give good insights for other use cases as well.

I think the most confusing part of the vocab for its use on websites is that it lacks an easy way to define properties for the general topic of the website.

A bit of philosophy:
A website is defined by its pages, and on those pages we might find images, articles, videos or information about an event. The page is the physical document you are looking at. The image is actually here, and so is the video, and so is the information about an event. But the very thing binding it all together is not here: the website. There are only pages. The page can describe itself and anything it contains, but the vocab is missing a way to describe the elephant in the room: this page must be seen in the context of a website with multiple pages.
One such page might describe what the organisation is about, another may describe how to contact them, yet another which events they organise, and yet another hosts the articles they write.
And from this you can already see that the website isn't the root of it all either; the website is part of an organisation that owns the site to publicise information or interact with its audience.

It's like we are describing a room, and anything in it, but forget to come up with ways to describe the house this room is part of, and the owner who owns this house.
Eventually you are going to find yourself trying to describe the owner when you should be describing a picture of this owner which is standing on his nightstand.
The information about the events, about the articles and about how to contact them does not define the organisation.
Likewise, an image on someone's nightstand doesn't define the person depicted; it is only a single frame from a single perspective on someone. A snapshot. It did not capture the entire person. If you are going to describe the image, you are describing the contour of a face. Such properties have nothing to do with a person; they say something about the picture.

So let's address the missing vocab to define anything above that webpage. Define the website as a whole, and as a parent of that the organisation as a whole.

The actual root is the global society we live in. Organisations are a child of that, and websites are children of organisations.

So much for the philosophy.



Furthermore, there are insufficient guidelines for values. Sometimes a datetime is the expected value, in which case ISO 8601 provides a very clear definition. Often a value is just text, without ANY guidelines on what the text should describe. I am talking about guidelines for the use of text, not rules. Text is quite deliberately a very free form of expression, but when we define text in the context of a certain property, there should be a little guidance on what information the text is expected to include or exclude.
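
To illustrate why a datetime value is unambiguous where free text is not, here is a minimal Python sketch. The date and the UTC+02:00 offset are made-up example values, not taken from any real markup.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event start: 16 June 2018, 12:07:22 in a UTC+02:00 zone.
start = datetime(2018, 6, 16, 12, 7, 22, tzinfo=timezone(timedelta(hours=2)))

# ISO 8601 gives exactly one agreed way to write this moment down.
iso_text = start.isoformat()
print(iso_text)  # 2018-06-16T12:07:22+02:00

# Any consumer can parse the text back into the same moment.
parsed = datetime.fromisoformat(iso_text)
assert parsed == start
```

A plain text property has no comparable round trip: nothing tells the publisher or the consumer what the string is supposed to contain.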

More importantly, DO standardize values where a need for such surfaces. This is happening right now, as I brought forward on GitHub, with contactPointType. This property accepts text, but Google wanted to display it in their search results. Therefore Google came up with their own "accepted values", which focused only on businesses and left out, for example, an information hotline, because such services are not interesting to Google (they hardly ever advertise). In practice, this means people will follow Google's guidelines on which values are acceptable, and they will shift the way they define their own contactPoint to the closest definition Google finds acceptable.
Schema.org should step up and standardise some values, such as the ones for contactPointType. We cannot accept search engines pushing people into their limited vocab. This is even something to get angry about: Google is disrupting what Schema.org was intended for, even though they funded it. The solution is not to lash out, but to standardise, forcing them to accept all values we see fit.

About examples:
First and foremost, we must come up with examples of entire websites, not just snippets alone. Show people the big picture: how pages are related to each other, and to the elements on those pages, how the head of the page is used, etc.
In fact, while setting up such examples I think you will quickly stumble over issues the schema.org vocab has, which make it difficult to use on websites.

Come up with your own validator.
Google has a validator which only gives some vague warnings or errors, but doesn't advise what to do and doesn't show where the problem was found. Yandex does a bit better on both, but still doesn't clearly show where in the document the error was found.
Set up a validator which clearly indicates which scope applies where, which part of the document has no scope set, and which scope is contained by another scope. I am thinking of just showing the HTML of the page and marking itemprops with font colors and itemscopes with background colors, to show how those itemprops are contained within a scope.
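
A rough sketch of the reporting part of such a validator, in Python with only the standard library. It is not an existing tool: the class name, the sample page and the message wording are all made up for illustration, and it prints a plain-text report with line numbers instead of the coloured HTML view described above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ScopeReporter(HTMLParser):
    """Reports which itemscope each itemprop falls under (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag name, did this tag open a scope?)
        self.scopes = []     # stack of itemtype values for the open scopes
        self.report = []     # list of (line number, message)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line, _ = self.getpos()
        # An itemprop belongs to the scope that is open *before* this element.
        if "itemprop" in attrs:
            name = attrs["itemprop"]
            if self.scopes:
                msg = f"itemprop '{name}' belongs to scope {self.scopes[-1]}"
            else:
                msg = f"itemprop '{name}' has NO enclosing itemscope"
            self.report.append((line, msg))
        if tag in VOID_TAGS:
            return
        opens_scope = "itemscope" in attrs
        if opens_scope:
            itemtype = attrs.get("itemtype") or "(untyped)"
            parent = self.scopes[-1] if self.scopes else "(top level)"
            self.report.append((line, f"scope {itemtype} opens inside {parent}"))
            self.scopes.append(itemtype)
        self.open_tags.append((tag, opens_scope))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in [t for t, _ in self.open_tags]:
            return
        # Pop up to and including the matching tag, closing scopes on the way.
        while self.open_tags:
            name, opened_scope = self.open_tags.pop()
            if opened_scope:
                self.scopes.pop()
            if name == tag:
                break


# Made-up sample page: the last paragraph sits outside every scope.
SAMPLE = """<body itemscope itemtype="https://schema.org/WebPage">
<h1 itemprop="name">Events</h1>
<div itemprop="about" itemscope itemtype="https://schema.org/Event">
<span itemprop="name">Annual meeting</span>
</div>
</body>
<p itemprop="telephone">555-0100</p>"""

reporter = ScopeReporter()
reporter.feed(SAMPLE)
for line, message in reporter.report:
    print(f"line {line}: {message}")
```

The same stack of open scopes could drive the colouring: each entry tells you which background colour an element inherits, and an empty stack marks the unscoped parts of the document.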


In my eyes, data is only interesting in terms of relations. How does one webpage relate to another? How do elements on the page relate to each other? How do elements on the page relate to the overall site?
For the schema.org vocab two things broadly set this structure: the nomenclature of the types, and the child-parent relationships in which these types are placed.
With respect to websites, I immediately miss clear guidelines, and I even had to conclude the vocab itself is not ready to mark up entire sites.

The first issue I came across is that by default a web page is assumed to be itemscope WebPage. But nowhere does schema.org recommend following this structure on websites. For all the vocab cares, I can define my entire page as itemscope Organization and be done with it. A document is not an organisation, and always contains much more than information about an organisation. Such use should be invalid. By allowing this implementation, there is not much standardization left for crawlers and search engines to work with, and the vocab loses most of its potential use.

Continuing onwards, I find insufficient ways to mark up how a web page relates to the website it is part of. I encountered a case where I was working on a webpage which offers an overview of events. This webpage belongs to the site of an NGO which also writes articles and advises policy makers. The website was much broader than these events, yet there was no clear place to define that this page is about that NGO. With Schema.org's vocab I could mark up that this page is about events, and I could mark up each event with the detail that this NGO is its organizer. What is missing is a block of data describing how these events are a key part of something a certain NGO does. They define this NGO. But, also important, there are other things, such as those articles, which are equally a key aspect of what this NGO does.
Lastly, there is no way to state that this page is part of the official website of the events organizer. The events might be posted on several websites, but this is the go-to site for these events, and the vocab offers me no way to state that this NGO owns this entire site and uses it as its main website for anything it does.

You are asking about cross-domain vocab. I would argue that we already have issues with cross-page vocab within the same domain, and we need to solve those first.

The head of a page can only have one itemscope, but it is often the place to add business contact information (part of the organisation scope) next to data which defines the document which is this webpage, and next to the title of the page, which can be seen as a name for the thing described in the body.
The head of the page contains a mix of itemtypes, but there is no easy way to scope them.

And finally, the vocab for webpage navigation elements. These would be the perfect place to define how this page relates to other webpages on the same domain and sometimes even cross-domain. Instead we are left with a broken type WebPageElement, which is somehow not a child of WebPage (it clearly should be if we are going to use it at all), and does not have many children either. (About this I also wrote an issue report on GitHub.) It would seem logical to me to have an itemscope WebPage with a property webPageElement in it, and give that webPageElement itself an itemscope of SiteNavigationElement. A webpage element should be a property of WebPage, not a type of its own.
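
The structure proposed above can be sketched as nested data, here written in Python and printed as JSON. Note that webPageElement as a property is the proposal made in this message and, as far as this sketch assumes, not an existing schema.org term; the names and the example.org URL are placeholders.

```python
import json

# NOTE: "webPageElement" is the *proposed* property, an assumption of this
# sketch and not an existing schema.org term. All values are placeholders.
page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Events",
    "webPageElement": {
        # The navigation block gets its own scope, nested inside the page.
        "@type": "SiteNavigationElement",
        "name": "Main menu",
        "url": "https://example.org/articles",
    },
}

print(json.dumps(page, indent=2))
```

The point of the nesting is that the navigation element is stated as something the page has, rather than as a free-standing thing next to it.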


Final words:
I am looking for a more hierarchical way to structure an entire site, and to show how different pages hang together to define the subtopics of this site, and how each page relates to a main activity of the NGO that owns the site.
The vocab needs to be clear, and its examples need to be clear, about how an organisation or business should mark up their entire site, rather than only focussing on snippets. The root itemscope is often the domain name itself. Schema can help me define how events are related to a certain NGO, and how these events can be seen as a follow-up on articles on this website, and how events can lead to advice to policy makers uploaded on yet another page of this website.
These kinds of complex relationships are what I want to define, and although schema.org comes close, it cannot yet get the job done.

Apologies for the long reply which somewhat shifts off-topic. Please pick from it any insight of use to you.

Kind regards,
Niels Lancel

P.S.:
I almost forgot to mention, the website schema.org is a confusing mess. Just finding this mailing list was a challenge, navigating between obsolete Google docs and such. There is also a w3.org page, which seems to have no practical use at this time. The documentation is a list of accumulated documents in no particular order or structure...

I can go on with this but I hope I have been able to provide a new perspective by now.



On June 16, 2018 7:27:38 AM GMT+02:00, Michael Andrews <nextcontent01@gmail.com> wrote:
>Hello,
>
>
>I’m inviting discussion about how to improve understanding and
>application
>of schema.org more widely.  This thread is a fork of a long comment I
>made
>earlier, which some felt deserved a dedicated thread.  I won’t repeat
>all
>the points I made before, but you can read them here:
>https://lists.w3.org/Archives/Public/public-schemaorg/2018Jun/0102.html
>
>
>Question 1: Can we improve general understanding of how the vocabulary
>works, especially for those who are not active on a weekly basis
>shaping
>its decisions?  Sometimes people find certain terminology confusing,
>and
>not self-describing.  Coverage of different entity types can vary, with
>some detailed and well-documented, and others not detailed or
>well-documented.  Can the current terminology be improved or
>rationalized
>in a non-disruptive way?  Should nomenclature used in creating terms or
>definitions be standardized, or defined by a common dictionary of
>definitions?  Can the documentation be improved to reduce ambiguities,
>provide better guidance in the absence of examples, provide best
>practices
>for quality, and help new users understand how pieces fit together to
>support novel applications?
>
>
>Question 2: Can we improve cross-domain application of schema.org, so
>that
>different types of entities can be compared?  Much of schema.org’s
>development has focused on the needs of sector- or domain-specific data
>users.  But many potential applications (voice interaction, learning,
>games) can take advantage of schema.org to compare shared properties of
>different kinds of entities, such as the speed of a machine versus an
>animal.
>To do that requires that properties be comparable across different
>entity
>types, which is sometimes difficult to take advantage of when
>properties
>are closely tied to specific entity types.    How can entity and
>property
>coverage or usage be improved to benefit general and comparative
>information description and application?
>
>
>I welcome your feedback on these questions.  Please feel free to challenge
>any
>assumptions I’ve asserted that you feel aren't accurate.
Received on Saturday, 16 June 2018 18:01:25 UTC