Re: How are you currently generating Schema.org syntax? from Sebastian Samaruga on 2018-07-13 (public-lod@w3.org from July 2018)

From: Sebastian Samaruga <ssamarug@gmail.com>
Date: Thu, 12 Jul 2018 22:53:58 -0300
To: Dan Brickley <danbri@google.com>
Cc: Joe Duarte <songofapollo@gmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>, public-rww <public-rww@w3.org>, public-lod <public-lod@w3.org>
Message-ID: <CAOLUXBti96xqFYsxYDU+Sy+Q=n+PKhgfriLdiYii8MT1pZAsMw@mail.gmail.com>
Hi, I'm a newcomer to the list and I don't quite understand all the
concepts in general. I'm sorry if I'm misunderstanding the principles, but
does an 'opposite' approach like this make sense?

Standard annotation / metadata format mechanism for activated content types
augmenting content with its (learned) knowledge, accessible only with an
(inferred) content identifier.

Augmentation (activation): type / instance identity merge (discover type /
identity equivalence of different occurrences of the same entity),
attributes / links discovery (augment known schema about some entity kind
and populate inferred fields), context roles (extract role from entity
mention, entity from role occurrence, in a given context).

Standard activation alignment: content / annotations (IO / commands /
verbs).

Content types: text (structured / unstructured), images, serialization
formats, backends (schema annotations), domain flows (behavior
annotations). Integration (dimensional annotations / metamodels).

Content: identifiers, metamodel aggregation. CRUD. API (application /
service).

Annotations: retrievable / editable metadata relative to (inferred) content
identifiers (content index aligned repository). API (application / service).

Client: Augmented content browser. Metamodel driven assisted content
browser (sessions: purposes / goals metadata driven helper, keep browsing
session items relations with each other / wizard like interface obtained
from metadata). API (application / service).

Although all of this functionality is currently being addressed by
embedding 'microformats' into content, the approach here is that of
a kind of 'protocol' for inferring / retrieving metadata identifiers via
the following APIs (for example): Index (resolve metadata), Naming (resolve
content), Registry (content / types / metadata bindings).
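
To make the three roles above a bit more concrete, here is a minimal
in-memory sketch in Python. Everything in it (the class name, the method
names, the example identifiers and URLs) is an assumption made up for
illustration; no such API is specified anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class BindingStore:
    """Hypothetical in-memory stand-in for the Index / Naming / Registry roles."""

    content: dict = field(default_factory=dict)   # content id -> content location
    metadata: dict = field(default_factory=dict)  # content id -> metadata location
    types: dict = field(default_factory=dict)     # content id -> content type

    def bind(self, content_id, location, content_type, metadata_location):
        # Registry role: record the content / type / metadata bindings.
        self.content[content_id] = location
        self.types[content_id] = content_type
        self.metadata[content_id] = metadata_location

    def resolve_content(self, content_id):
        # Naming role: content identifier -> where the content lives.
        return self.content[content_id]

    def resolve_metadata(self, content_id):
        # Index role: content identifier -> where its metadata lives.
        return self.metadata[content_id]


store = BindingStore()
store.bind(
    "urn:example:picture-1",                    # made-up identifier
    "http://example.org/hollydays_picture.png",
    "image/png",
    "http://example.org/hollydays_picture.rdf",
)
```

A real version would presumably sit behind HTTP and infer the identifier
rather than take it as given; the sketch only shows how the three lookups
relate to each other.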

So, a 'semantic' lookup for 'hollydays_picture.png' in an HTTP resolvable
context could yield a retrievable 'hollydays_picture.rdf' somewhere and
then, perhaps, an RDF vocabulary for describing image regions could be used
to describe the objects (e.g. faces) in the visualization of that file in
a browser.
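
As a rough illustration of that lookup, the simplest possible convention
would be a 'sidecar' file that shares the content's URL but swaps the
extension. This is only one assumed convention (the function name and the
example.org URL are made up); a real resolver might consult an index
instead.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def metadata_url(content_url: str, suffix: str = ".rdf") -> str:
    """Derive a hypothetical sidecar metadata URL by swapping the file extension."""
    parts = urlsplit(content_url)
    # Replace only the extension of the last path segment; keep host and query.
    new_path = str(PurePosixPath(parts.path).with_suffix(suffix))
    return urlunsplit(parts._replace(path=new_path))


print(metadata_url("http://example.org/photos/hollydays_picture.png"))
```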

Standardization will be needed for such an approach of 'pointing' external
metadata into a document. Addressing mechanisms exist for most markup
languages (XPointer, XPath, XLink), and that would leverage existing XML DOM
capabilities, plus XSL / XSLT declarative languages for ease of merging and
transforming metadata via templates.
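
A small sketch of what 'pointing' external metadata into a document might
look like, using the limited XPath subset in Python's standard
xml.etree.ElementTree. The sample document, the 'about' attribute and the
annotation tuple format are all assumptions for illustration, not an
existing standard.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<article>
  <p id="p1">Ada Lovelace wrote about the Analytical Engine.</p>
  <p id="p2">An unrelated paragraph.</p>
</article>"""

# External annotations kept apart from the content:
# (XPath-style address, attribute name, attribute value)
ANNOTATIONS = [
    (".//p[@id='p1']", "about", "https://en.wikipedia.org/wiki/Ada_Lovelace"),
]


def apply_annotations(xml_text, annotations):
    """Merge externally stored annotations into a parsed copy of the document."""
    root = ET.fromstring(xml_text)
    for path, attribute, value in annotations:
        # Every element matched by the address receives the annotation.
        for element in root.findall(path):
            element.set(attribute, value)
    return root


annotated = apply_annotations(DOCUMENT, ANNOTATIONS)
print(ET.tostring(annotated, encoding="unicode"))
```

The original markup is left untouched; the merge happens at read time,
which is the part an XSLT template could do declaratively.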

I'm also not an expert in the field, but doesn't the SoLiD approach address
some of these issues? Seems like efforts in other lists like rww / lod have
nothing to do with semantic annotations, so I include them in this thread.
Regards,

Sebastian.
http://exampledotorg.blogspot.com


On Thu, Jul 12, 2018, 9:42 PM Dan Brickley <danbri@google.com> wrote:

>
>
> On Thu, 12 Jul 2018 at 16:47, Joe Duarte <songofapollo@gmail.com> wrote:
>
>> Hi all,
>>
>> This is a question I've had for a long time. I'm not aware of any
>> software that can automatically generate Schema.org syntax for content like
>> an article, event, product, etc. I'm speaking of body content, not the head.
>>
>> For example, if I write an article that mentions some moderately famous
>> scientist, I want to insert the sameAs syntax with a link to his or her
>> Wikipedia page or ORCID page to let search engines know that I'm talking
>> about this particular person. Hopefully that would strengthen the article's
>> SEO or whatever and lead to more readers.
>>
>> I have to do that and any other kind of Schema.org markup manually. I'd
>> really like to go wheels up with it and mark up just about everything in an
>> article, any mention of a city, country, scientific paper, person, car, all
>> of it. But it would be a lot of work as I understand the situation
>> currently.
>>
>> So how are you doing it? Are there any major publishers that thoroughly
>> mark up their articles? Have they released any open source tools? (Sorry if
>> I missed a thread.)
>>
>> It seems like automated, thorough markup would require very powerful
>> software, like IBM Watson or other machine learning tools. Am I correct in
>> assuming that you're all doing it manually? The WP plugins I saw seemed to
>> only do the head page-level metadata, not the thorough embedded markup.
>>
>> Schema.org has been developed to satisfy various criteria or goals. It
>> occurs to me that one design goal could be *ease of automation*. I'm not
>> sure what that would look like – I'll have to think about it some more.
>>
>
> The initial central use case for Schema.org, and still pretty core, was the
> idea that sites very often *already* have highly structured data in
> databases of various kinds. And sites already have mechanisms (templates
> etc.) that turn database records into user-facing HTML. Schema.org simply
> allowed more of that original structure to be exposed. In practice, there
> are often sites that want to expose Schema.org descriptions but don't quite
> have the right fields. For example around fact checking and our ClaimReview
> markup, many fact checking organizations already have more or less what's
> needed but too often in an understructured form. So you'll sometimes see
> initiatives (often case specific, e.g. http://sharethefacts.org/) that
> try to make things easier for publishers in that situation.
>
> Dan
>
>
>> Cheers,
>>
>> JD
>>
>
Received on Friday, 13 July 2018 01:55:13 UTC