RE: How to avoid that collections "break" relationships (ISSUE-41)

On 13 Jun 2014 at 23:49, Gregg Kellogg wrote:
> On Jun 13, 2014, at 8:03 AM, Markus Lanthaler wrote:
>> Could you please elaborate a bit? Are you worried that simple crawlers
>> and other clients won't be able to reconstruct the complete resource
>> description and its relationships to other resources?
> 
> Not clear exactly what SEO scrapers do or don't do, but my guess is that
> they restrict

:-) Looks like you think a lot about search engine optimization these
days :-)


> themselves to schema.org properties. Even then, there's not always an
> obvious property to

Of course I don't know what they are doing but if I were to write one, I
would follow every link - even those I don't understand - because they may
lead me to data I do understand.


> connect to classes defined in schema.org. For example, it's natural to want
> to link things such as questions from, say, a Person to the set of questions
> asked of her. Question has an about property, which can link back to a Thing,
> but there is no forward relationship. In JSON-LD (or RDFa), I could create
> this relationship using @reverse or @rev, but there is no natural way to do
> that in Microdata. If I create my own property to reference questions, does
> just the fact that I've linked to another resource cause an SEO scraper to
> dereference it?

In most cases, yes. AFAICT, these crawlers follow all links in HTML pages
regardless of whether they are marked up or not (or with which property).
Extracting structured data from the crawled pages is a separate process
which doesn't have anything to do with the crawling process. Consequently,
JSON-LD embedded in HTML probably doesn't affect the crawling process at
all. In other words, if the link isn't in the HTML, it won't be seen by the
"dumb" crawler. AFAIK, Web APIs aren't crawled at all yet.

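To make the point about "dumb" crawlers concrete, here is a minimal sketch
in Python of what I assume such a link extractor does. The class name, the
function name, the sample page and the "questions" property are all made up
for illustration; I don't know how any real search engine implements this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, whether it is marked up or not."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # The itemprop/rel attributes are deliberately ignored here:
            # the crawler follows the link no matter which property is used.
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return all hyperlinks found in the HTML, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: a plain link, a Microdata link using a made-up
# "questions" property, and a link that only exists in embedded JSON-LD.
PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <a href="/people/alice/about">About Alice</a>
  <a itemprop="questions" href="/people/alice/questions">Questions</a>
  <script type="application/ld+json">
    {"@id": "/people/alice", "hasCollection": "/people/alice/friends"}
  </script>
</div>
"""

found = extract_links(PAGE)
```

Under these assumptions both HTML links end up in `found`, including the one
with the unknown property, while the collection that is only referenced from
the embedded JSON-LD is never discovered.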

>> Does the hasCollection/collection forward link help in that regard in
>> your opinion?
> 
> hasCollection suffers from the same ambiguity; it makes perfect sense to
> provide this, and a Hydra client will be able to use it to find appropriate
> collections (vs. having to query for collections having a subject referencing
> the interesting entity). But, for SEO, is there some reason to believe that a
> scraper will follow this to find related information?

See above. If the link is in the HTML, it will be found. If not, it won't.
All of this will obviously change as soon as search engines start to crawl
Web APIs directly, which might be sooner than we all think. At least I'm
bullish :-)


> Similarly, if I split the definition of an entity between the main entity
> resource and the collections having that entity as a subject, will this
> result in a search engine marrying those properties together? From a linked
> data perspective, sure, but for SEO we're sort of shooting in the dark.

Does it matter? What queries can't be answered if it isn't merged? Or for
which queries would a page rank lower if the data isn't merged? If the data
isn't on a specific page, that page also shouldn't show up in the search
results. Users would certainly be surprised if they clicked on that result
and then reached a page which doesn't contain the "promised" information.


>> What do you think makes many-to-many relationships special?
> 
> Many-to-many relationships are interesting because you don't use
> CREATE/DELETE operations to cause two entities to become related. For
> example, in the Friend case, if I want to friend Markus, performing a
> CREATE doesn't make sense as the Markus entity presumably already
> exists. This is where I think LINK/UNLINK become interesting (and that
> seems to be exactly what they're intended for). You might be able to do
> this with PATCH on the collection manifesting these relationships, but
> again the semantics would be unclear. If I patch the member property of a
> collection, do I replace its values, or add a new value? How would I
> remove a specific value from a property? LINK/UNLINK make this fairly

We would need to define a new media type for PATCH which describes this
explicitly. We can't use application/ld+json or text/turtle for this. Andy
is working on an RDF patch format [1].
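
To illustrate why the media type has to spell this out, here is a rough
Python sketch of the two readings of the same PATCH body. The function
names and the sample data are hypothetical; neither function is what
application/ld+json or text/turtle defines, which is exactly the problem.

```python
def patch_replace(resource, patch):
    """Reading 1: every property in the patch overwrites the old values."""
    updated = dict(resource)
    updated.update(patch)
    return updated


def patch_append(resource, patch):
    """Reading 2: the values in the patch are added to the existing ones."""
    updated = {key: list(values) for key, values in resource.items()}
    for key, values in patch.items():
        merged = updated.setdefault(key, [])
        for value in values:
            if value not in merged:
                merged.append(value)
    return updated


# Hypothetical collection and patch body, reduced to plain dictionaries.
collection = {"member": ["/people/alice", "/people/bob"]}
patch = {"member": ["/people/markus"]}

replaced = patch_replace(collection, patch)
appended = patch_append(collection, patch)
```

The same body leaves one member under the first reading and three under the
second, and neither reading offers a way to remove a single value.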

> clear (although, IMO, the operation should be performed on the entities,
> not the collection).
>
> In contrast, many-to-one relationships always result in a new entity being
> created, updated or deleted. For example, I create a new instance of a Game
> by performing a create on the collection (in my view), but manipulate it or
> delete it by operating on the game directly.

That's not always true. Think of foaf:knows, for example. I can't delete the
persons Alice knows, as that would remove those persons completely.
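
A small Python sketch of the difference, with the data modelled as a set of
triples. The identifiers and the two helper functions are assumptions made
up for this example; they are not part of Hydra or of any LINK/UNLINK
proposal.

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical data: two persons and one foaf:knows relationship.
graph = {
    ("/people/alice", NAME, "Alice"),
    ("/people/bob", NAME, "Bob"),
    ("/people/alice", KNOWS, "/people/bob"),
}


def unlink(triples, subject, predicate, obj):
    """Remove a single relationship; both entities stay untouched."""
    return triples - {(subject, predicate, obj)}


def delete_entity(triples, entity):
    """Remove an entity completely, including every statement mentioning it."""
    return {t for t in triples if t[0] != entity and t[2] != entity}


after_unlink = unlink(graph, "/people/alice", KNOWS, "/people/bob")
after_delete = delete_entity(graph, "/people/bob")
```

After the unlink Bob still exists and only the relationship is gone; after
the delete nothing about Bob is left, which is not what Alice wanted.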


[1] http://afs.github.io/rdf-patch/


--
Markus Lanthaler
@markuslanthaler

Received on Saturday, 14 June 2014 21:07:17 UTC