W3C home > Mailing lists > Public > public-vocabs@w3.org > September 2014

Re: Is the HTML content necessary if using JSON LD for semantic data negotiation?

From: Stéphane Corlosquet <scorlosquet@gmail.com>
Date: Thu, 4 Sep 2014 22:03:57 -0400
Message-ID: <CAGR+nnHtoUrU58mc0nqUQr9+jDwrMeuPnJGpcSY=jcC+2VQ3Cw@mail.gmail.com>
To: Chilly_Bang <chilly_bang@yahoo.de>
Cc: W3C Vocabularies <public-vocabs@w3.org>

Disclaimer: I don't speak for any search engine, my opinion is merely based
on what I've gathered from their docs and common sense.

There is no doubt that Google (and, to similar extents, other search engines)
employs complex algorithms to detect and penalize any attempt at gaming the
system. They've learned a lot from the meta-keywords cheating techniques that
were once very common. Even if one managed to find a way around their
algorithms with JSON-LD, it would cost the publisher a lot in terms of ranking
and reputation once discovered, and it's not worth it IMO. It's easy for large
search engines with enough computing power to compare the content of the
JSON-LD snippet with the rest of the visible content on the page and measure
the divergence.


On Thu, Sep 4, 2014 at 4:51 PM, Chilly_Bang <chilly_bang@yahoo.de> wrote:

> Hi to all!
>
> Especially after watching this video about getting events into the knowledge
> graph by using Schema.org / JSON-LD, http://goo.gl/zBFftH, I got these
> thoughts and questions again:
>
> Google repeatedly mentions its embrace of JSON-LD as the developer-friendly
> way of semantic data negotiation. I guess the reason behind this is to make
> the implementation of Schema.org data easier / broader. The questions I
> nervously ask are:
>
> • whether the HTML content is in general still necessary when using JSON-LD?
>

I guess you're talking about JSON-LD being served as pure JSON via HTTP content
negotiation vs. JSON-LD being a JSON island in a script element inside the
HTML. As far as I know, Google only supports the latter in web pages and email
messages.
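For concreteness, the "JSON island" approach looks like the sketch below: a
JSON-LD document wrapped in a script element inside an ordinary HTML page
(event details are made up for illustration; the @context and types are
Schema.org terms):

```html
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MusicEvent",
  "name": "Example Concert",
  "startDate": "2014-09-20T19:30",
  "location": {
    "@type": "Place",
    "name": "Example Hall"
  }
}
</script>
```

The type="application/ld+json" attribute is what tells consumers to parse the
block as JSON-LD rather than execute it as JavaScript; the rest of the page's
visible HTML is unaffected by it.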


> • Whether / how does the algorithm check the correlation between the semantic
> data provided via JSON-LD and the plain content of the same web document? I
> assume that using both inline Schema.org markup and JSON-LD simultaneously in
> the same web document isn't a good practice at all... right?
>

Are you talking about JSON-LD embedded in HTML, or pure JSON-LD as a JSON
document?


> • If this correlation isn't checked, how does the algorithm decide about
> misuse / spammy usage of the content provided in a JSON-LD snippet?
> • Even if this correlation is checked, but the meanings provided by the
> content and the JSON-LD are more or less different, which "meaning" will
> finally rank: the one provided by JSON-LD, or the one the algorithm derived
> from analyzing the plain content?
>

These are very search engine specific questions. I doubt you'll get a
direct answer to those :)


>
> I remember the tons of auto-populated websites from years ago and see
> something similar coming: tons of empty websites containing only JSON-LD
> snippets. Providing semantic data as inline markup gives added value to the
> content, but at the same time it builds a barrier against spammy misuse of
> inline markup, because its implementation happens in a quasi-manual way.
> Making the implementation easier (easier meaning, in this context, the
> possibility of *simple* script-based data querying, negotiation and
> publication) could backfire with thousands of auto-populated stub sites
> inside the SERPs.
>

That's the advantage of syntaxes like microdata and RDFa, which I think are
more transparent, forcing the publisher to be more "honest": the data being
extracted is what's displayed to the end user, following the Don't Repeat
Yourself principle. Inline markup (microdata/RDFa) also works better with
larger chunks of content such as job postings, news articles or comments,
which would otherwise need to be duplicated in JSON-LD, adding to the size of
the payload. This isn't as much of an issue for things like MusicEvents
though, since the size of the data chunks is smaller.
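To illustrate the Don't Repeat Yourself point, here is a minimal microdata
sketch of the same kind of event (details made up for illustration): the
itemprop attributes annotate the very text the visitor already sees, so the
extracted data and the visible content cannot silently diverge:

```html
<div itemscope itemtype="http://schema.org/MusicEvent">
  <h2 itemprop="name">Example Concert</h2>
  <time itemprop="startDate" datetime="2014-09-20T19:30">
    September 20, 2014, 7:30pm
  </time>
  <div itemprop="location" itemscope itemtype="http://schema.org/Place">
    <span itemprop="name">Example Hall</span>
  </div>
</div>
```

A JSON-LD version of the same event would have to repeat the name, date and
venue in a separate script block, on top of the HTML that displays them.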

Steph.
Received on Friday, 5 September 2014 02:04:24 UTC
