Hang on, I think attaching semantics to schemas, rather than data, solves 100% of the problems with both semantics and schemas.
-
Marco 🦝 :verified: :grumpycat: replied to Jenniferplusplus
@jenniferplusplus I remember an article that proposed keeping the semantics external, like a sub-schema, but I can't find it. It was proposed by people from the Web of Things community. I found the same discussion in the area of geolocation.
I think that examples will show the benefits. In the era of AI it could be really useful if your data factory can gather semantic information on the data by following a link.
I think you should start a repo and buy a domain
-
Marco 🦝 :verified: :grumpycat: replied to Marco 🦝 :verified: :grumpycat:
@jenniferplusplus and maybe people developing OData could have some ideas on how to implement something like the semantic schema extension. They already work with linking to additional information.
-
Jenniferplusplus replied to Marco 🦝 :verified: :grumpycat:
@m2vh my concern is mostly in the realm of getting semantic data nerds to get out of my way and stop making everything harder than it needs to be. I also hate the idea of making data more legible to LLMs.
So. This is a battle for someone else to fight.
-
You might find this interesting:
GitHub - common-workflow-language/schema_salad: Semantic Annotations for Linked Avro Data
Basically, everything defined in the schema has a corresponding semantic node; documents are written in YAML but have a corresponding RDF representation; and there is robust support for including fields outside the core vocabulary in an unambiguous way.
-
@tetron This appears to be a project to define schemas for linked data documents? And that is, again, backwards. I want to attach (but not embed) vocabularies to schemas. Mostly so that I stop having to deal with it. It can be entirely the problem of the people who want it, instead of them making it my problem.
-
@jenniferplusplus
I think you want something like a JSON-LD context, which describes how JSON fields map to semantic nodes without necessarily specifying a schema, but even then it is hard to avoid asserting schema-like details such as whether a field takes a single value or an array of values. But ultimately it is a problem for the schema design, because common anti-patterns like reusing the same field name to mean different things in different contexts make it challenging to assign semantics.
-
@tetron No, I extremely don't want that. I want the people who do want that to stop forcing it on me. I promise I know about json-ld, and I hate it.
-
Jenniferplusplus replied to Jenniferplusplus
@tetron I want to give my JSON schema and human-readable documentation to the people who want that. And I want them to go off on their own and devise their own method to attach semantic meaning to things, one that doesn't burden me with solving this problem that I don't have and don't care about.
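For illustration only, here is a minimal sketch of what "attach but not embed" could look like. Everything in it is an assumption, not an existing tool or standard: the schema, the `ANNOTATIONS` sidecar, and both helper functions are hypothetical. The sidecar is keyed by JSON Pointers (RFC 6901) into the schema, so the schema itself carries no semantic markup, and the people who want the vocabulary own the sidecar.

```python
# Hypothetical JSON Schema published by the maintainer; no semantic markup inside.
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "content": {"type": "string"},
        "published": {"type": "string", "format": "date-time"},
    },
    "required": ["id"],
}

# Hypothetical sidecar, maintained separately by whoever wants the semantics.
# Keys are JSON Pointers into SCHEMA; values are illustrative vocabulary IRIs.
ANNOTATIONS = {
    "/properties/content": "https://www.w3.org/ns/activitystreams#content",
    "/properties/published": "https://www.w3.org/ns/activitystreams#published",
}


def resolve_pointer(document, pointer):
    """Follow an RFC 6901 JSON Pointer through nested dicts and lists."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Undo the two RFC 6901 escapes, in the order the RFC requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


def dangling_annotations(schema, annotations):
    """Return the pointers in the sidecar that no longer match the schema."""
    missing = []
    for pointer in annotations:
        try:
            resolve_pointer(schema, pointer)
        except (KeyError, IndexError, ValueError, TypeError):
            missing.append(pointer)
    return missing
```

With this shape, the only check that involves the schema is `dangling_annotations(SCHEMA, ANNOTATIONS)`, and it can run in the sidecar owner's own tooling whenever the schema changes. Whether plain pointers are precise enough for linked-data purposes is a separate question that this sketch does not settle.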
-
@jenniferplusplus
I'm not very familiar with the ActivityPub spec but this is about AP, isn't it?
-
@tetron That is certainly the largest and most immediate contributor, yes.
But it's a concern almost any time that almost any W3C standard or working group is involved with something that needs to operate at high QPS.
-
@jenniferplusplus
So the irony is that linked data semantic web stuff is totally designed for annotating external resources the way you want, but only if the resource itself has a linked data mapping (i.e. there's a way to refer to individual elements in the document), and schema documents written with JSON Schema don't. Which is why the schemas need to be linked data themselves. Cue the endless screaming.
-
@tetron That's not really ironic, so much as tangential. I get the benefits in a reference context. But at best it's useless in a processing context. To the extent that it displaces techniques that enable processing, it's actually a detriment.
-
@jenniferplusplus
So I'm writing from the perspective of the particular thing I linked earlier but I just want to mention a couple of things it has:
a) code generators for a bunch of languages including C#, which use the schema to write the data structures and parsing/validation for you, which is very fast and there's no lunacy like having to transit through an RDF triple store
b) knowing which fields are identifiers or references to other things has some nice properties for validation
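As a rough illustration of point b), and not of schema_salad's actual API, here is a sketch of the kind of check that becomes possible once a schema declares which field is the identifier and which fields are references. The field names in `ID_FIELD` and `REFERENCE_FIELDS` and the sample documents are assumptions made up for the example.

```python
# Assumed to come from the schema: which field identifies a document,
# and which fields point at other documents. Both are hypothetical here.
ID_FIELD = "id"
REFERENCE_FIELDS = {"inReplyTo", "attributedTo"}


def unresolved_references(documents):
    """Return (document id, field, target) for every reference that
    does not point at a document present in the same collection."""
    known_ids = {doc[ID_FIELD] for doc in documents if ID_FIELD in doc}
    problems = []
    for doc in documents:
        for field in sorted(REFERENCE_FIELDS):
            target = doc.get(field)
            if target is not None and target not in known_ids:
                problems.append((doc.get(ID_FIELD), field, target))
    return problems


DOCS = [
    {"id": "https://example.com/users/alice"},
    {"id": "https://example.com/notes/1",
     "attributedTo": "https://example.com/users/alice"},
    {"id": "https://example.com/notes/2",
     "attributedTo": "https://example.com/users/alice",
     "inReplyTo": "https://example.com/notes/404"},
]
```

Calling `unresolved_references(DOCS)` flags only the reply whose `inReplyTo` target is missing from the collection. A plain string field could not be checked this way, because nothing would say it is meant to be a link.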
-
@tetron That would be helpful if there was a defined schema, or if it was even possible to define a schema. But with activitypub, that's not actually possible.
-
@jenniferplusplus
So if we're talking about https://www.w3.org/TR/activitystreams-vocabulary/
there is a machine readable formal model under there, it is just defined in OWL. I don't offhand know of tools that take in OWL and give you data models in more practical languages but that doesn't mean they don't exist. For ActivityStreams specifically it doesn't look like it would be all that hard.
At this rate I'm going to talk myself into writing a proof of concept, which is dangerous.
-
@tetron I'm pretty sure both the OWL and the context are broken and contradict the spec. The spec also mandates that fields with a single value must serialize as a value rather than an array, which creates an enormous number of problems.
I'm pretty sure that combines to make it impossible. But if you can somehow turn it into a proper schema, you'd be advancing fediverse development by years.
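To make the single-value problem concrete, here is a small sketch of the normalization every consumer ends up writing when the same property can arrive as a bare value or as an array. The helper names and the sample notes are hypothetical, and the `to` field is used only as an example.

```python
def as_list(value):
    """Normalize a property that may be absent, a single value, or an array."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def recipients(activity):
    """Always return the recipients as a list, whatever shape was sent."""
    return as_list(activity.get("to"))


# The same property, serialized two different ways by the sender.
NOTE_ONE = {"type": "Note", "to": "https://example.com/users/alice"}
NOTE_MANY = {
    "type": "Note",
    "to": ["https://example.com/users/alice", "https://example.com/users/bob"],
}
```

Both `recipients(NOTE_ONE)` and `recipients(NOTE_MANY)` return a list, but only because the caller knew to normalize. A schema or generated type has to model every such field as "value or array of values", which is part of why code generation gets awkward.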
-
@jenniferplusplus
this is incomplete and broken but if it was more complete and less broken and could be used to generate C# code, would it be useful?
very incomplete conversion of activitystreams2.owl to schema salad representation - gist:2714b63983063548af4705a7cf9defa2
-
@tetron Potentially. It at least gets to the next hurdle, which is that adding things to the @context is what passes for an extension mechanism in AP. And the state of those vocabularies is even worse. Several of the most common ones don't even exist.
-
Jenniferplusplus replied to Jenniferplusplus
@tetron But, let's assume this can turn into JSON schemas, accounting for extensions. And those schemas are suitable to do validation and code generation. Then it would become a primarily social problem of convincing maintainers to switch to using schema-defined implementations.
-
d@nny "disc@" mc² replied to Jenniferplusplus
@jenniferplusplus @tetron cc @aud who i think is working on something related but possibly a different angle