Hang on, I think attaching semantics to schemas, rather than data, solves 100% of the problems with both semantics and schemas.

Jenniferplusplus

Am I wrong? This feels like it's something.

Jenniferplusplus

This feels like there's so much to it, while simultaneously being very narrow and easy to accomplish. Or at least to implement. Specifying it is likely harder.

Still. Surely I'm not the first person to have this thought. Is this a thing already? If not, what am I missing?

Jenniferplusplus

Ok, I am not the first
https://www.w3.org/TR/sawsdl/

A quick skim of this feels like it's upside down. It embeds the semantic definitions into the schema. I would want them to be external to the schema, in the same way the schema is external to the data. But it's the kind of thing I mean. From 15 years ago. Did this go nowhere because it's XML and everyone abandoned XML to get away from the semantic data nerds? Or something else?

Marco 🦝 :verified: :grumpycat:

@jenniferplusplus found this https://www.mdpi.com/2076-3417/11/24/11978

As well as some discussions in the json repo

Marco 🦝 :verified: :grumpycat:

@jenniferplusplus https://github.com/json-schema-org/json-schema-vocabularies/issues/13

Jenniferplusplus

@m2vh yeah, that *kind* of thing. I found SAWSDL, and these SAWSDL-for-json proposals unsurprisingly have the same problem. Embedding semantic annotations into other things makes them harder to work with. I imagine it was easier to ignore that when the structure was XML, because XML is designed to be self describing. But it would work better if the definition is outside the subject, the way schemas operate.

So you get
Data <- syntactic structure (schema) <- semantic meaning (vocabulary)
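
A minimal sketch of that layering, under stated assumptions: the vocabulary document format, the `describes` and `terms` keys, the helper names, and the example.com URLs are all hypothetical and not part of any existing spec. The only borrowed conventions are JSON Schema for the middle layer and JSON Pointer (RFC 6901) for addressing schema nodes from outside. For brevity, `annotate` only handles top-level properties.

```python
# Layer 1: the data. It carries no semantic markup at all.
data = {"name": "Ada", "homepage": "https://example.com/ada"}

# Layer 2: the schema. Purely syntactic; it knows nothing about vocabularies.
schema = {
    "$id": "https://example.com/schemas/person.json",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "homepage": {"type": "string"},
    },
}

# Layer 3: the vocabulary (hypothetical format). It points *at* the schema
# from outside, using JSON Pointers as keys.
vocabulary = {
    "describes": "https://example.com/schemas/person.json",
    "terms": {
        "/properties/name": "https://schema.org/name",
        "/properties/homepage": "https://schema.org/url",
    },
}


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for raw in pointer.split("/")[1:]:
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def annotate(data, schema, vocabulary):
    """Pair each top-level field with the term attached to its schema node."""
    if vocabulary["describes"] != schema["$id"]:
        raise ValueError("vocabulary does not describe this schema")
    annotated = {}
    for pointer, term in vocabulary["terms"].items():
        resolve_pointer(schema, pointer)  # KeyError if the vocabulary is stale
        field = pointer.rsplit("/", 1)[-1]
        if field in data:
            annotated[term] = data[field]
    return annotated


print(annotate(data, schema, vocabulary))
```

The point of the arrangement is that neither the data nor the schema has to change for the vocabulary to exist; whoever wants the semantics maintains the third document.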

Jenniferplusplus

@m2vh otherwise, you get the same dynamic as always: system developers can't even imagine why or how you would even possess data whose semantic meaning you don't already know. (Fair, btw.) And the demands made by semantic data nerds seem intrusive and burdensome. (Because they are.)

Marco 🦝 :verified: :grumpycat:

@jenniferplusplus I remember an article where it was proposed to put the semantics external, like a sub-schema. But I can't find it. It was proposed by the Web of Things people. I found the same discussion in the area of geolocation.

I think examples will show the benefits. In the era of AI it could be really useful if your data factory can gather semantic information on the data by following a link.

I think you should start a repo and buy a domain

Marco 🦝 :verified: :grumpycat:

@jenniferplusplus and maybe the people developing OData could have some ideas on how to implement something like the semantic schema extension. They already work with linking to additional information.

Jenniferplusplus

@m2vh my concern is mostly in the realm of getting semantic data nerds to get out of my way and stop making everything harder than it needs to be. I also hate the idea of making data more legible to LLMs.

So. This is a battle for someone else to fight.

Peter Amstutz

@jenniferplusplus

You might find this interesting:

GitHub - common-workflow-language/schema_salad: Semantic Annotations for Linked Avro Data

Basically everything defined in the schema has a corresponding semantic node, documents are written in YAML but have a corresponding RDF representation, and there is robust support for including fields outside the core vocabulary in an unambiguous way

Jenniferplusplus

@tetron This appears to be a project to define schemas for linked data documents? And that is, again, backwards. I want to attach (but not embed) vocabularies to schemas. Mostly so that I stop having to deal with it. It can be entirely the problem of the people who want it, instead of them making it my problem.

Peter Amstutz

@jenniferplusplus
I think you want something like a json-ld context, which describes how json fields map to semantic nodes without necessarily specifying a schema, but even then it is hard to avoid asserting schema-like details such as whether a field takes a single value or an array of values.

But ultimately it is a problem for the schema design, because common anti patterns like reusing the same field name to mean different things in different contexts make it challenging to assign semantics.
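
To make both points concrete, here is a hand-rolled illustration, not a JSON-LD processor. `@id` and `@container` are real JSON-LD keywords; the document, the example.com IRI, and the helper names are made up for the example. It assumes the simplest kind of context, a flat map from field name to term.

```python
# A flat JSON-LD-style context: one entry per field *name*.
context = {
    "name": {"@id": "https://schema.org/name"},
    # "@container" is where a schema-like detail leaks in: it states that
    # this field holds a collection rather than a single value.
    "tags": {"@id": "https://example.com/vocab#tag", "@container": "@set"},
}

# "name" is reused: once for a person, once for an attached file.
doc = {
    "name": "Ada",
    "tags": ["math"],
    "attachment": {"name": "notes.txt"},
}


def expects_array(key, context):
    """True when the context declares the field as a collection."""
    return context[key].get("@container") in ("@set", "@list")


def terms_by_name(node, context, path=""):
    """List (path, term) for every key the flat context knows, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key in context:
                found.append((child, context[key]["@id"]))
            found.extend(terms_by_name(value, context, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(terms_by_name(item, context, f"{path}/{index}"))
    return found


for path, term in terms_by_name(doc, context):
    print(path, "->", term)
```

Both `/name` and `/attachment/name` come out with the same term, because a name-keyed map has no way to tell the two uses apart.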

Jenniferplusplus

@tetron No, I extremely don't want that. I want the people who do want that to stop forcing it on me. I promise I know about json-ld, and I hate it.

Jenniferplusplus

@tetron I want to give my json schema and human readable documentation to the people who want that. And I want them to go off on their own and devise their own method to attach semantic meaning to things that doesn't burden me with solving this problem that I don't have and don't care about.

Peter Amstutz

@jenniferplusplus
I'm not very familiar with the ActivityPub spec but this is about AP isn't it?

Jenniferplusplus

@tetron That is certainly the largest and most immediate contributor, yes.

But it's a concern almost any time that almost any W3C standard or working group is involved with something that needs to operate at high QPS.

Peter Amstutz

@jenniferplusplus
So the irony is that linked data semantic web stuff is totally designed for annotating external resources the way you want, but only if the resource itself has a linked data mapping (i.e. there's a way to refer to individual elements in the document), and schema documents written with json schema don't. Which is why the schemas need to be linked data themselves. Cue the endless screaming.

Jenniferplusplus

@tetron That's not really ironic, so much as tangential. I get the benefits in a reference context. But at best it's useless in a processing context. To the extent that it displaces techniques that enable processing, it's actually a detriment.

Peter Amstutz

@jenniferplusplus
So I'm writing from the perspective of the particular thing I linked earlier but I just want to mention a couple of things it has:

a) code generators for a bunch of languages including C#, which use the schema to write the data structures and parsing/validation for you, which is very fast and there's no lunacy like having to transit through an rdf triple store

b) knowing which fields are identifiers or references to other things has some nice properties for validation
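
A rough sketch of what (a) and (b) can look like in practice. This is hand-written for illustration and is not the output of schema_salad or any other generator; the `Step` record, its fields, and `load_steps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    id: str           # identifier field
    run: str          # plain string field
    after: List[str]  # references to the ids of other steps

    @classmethod
    def from_dict(cls, raw):
        """Parse and type-check one record directly from a dict."""
        for field in ("id", "run"):
            if not isinstance(raw.get(field), str):
                raise ValueError(f"{field!r} must be a string")
        after = raw.get("after", [])
        if not isinstance(after, list) or not all(isinstance(a, str) for a in after):
            raise ValueError("'after' must be a list of strings")
        return cls(id=raw["id"], run=raw["run"], after=after)


def load_steps(raw_steps):
    """Parse all records, then check identifiers and references."""
    steps = [Step.from_dict(raw) for raw in raw_steps]
    known = set()
    for step in steps:
        if step.id in known:
            raise ValueError(f"duplicate id {step.id!r}")
        known.add(step.id)
    # Because the loader knows 'after' holds references, it can reject
    # dangling ones up front instead of failing later at use time.
    for step in steps:
        missing = [ref for ref in step.after if ref not in known]
        if missing:
            raise ValueError(f"{step.id!r} refers to unknown ids: {missing}")
    return steps


steps = load_steps([
    {"id": "fetch", "run": "curl"},
    {"id": "parse", "run": "jq", "after": ["fetch"]},
])
print([step.id for step in steps])
```

Everything here is plain in-memory parsing; the semantic information only decides which extra checks the loader performs.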