Hang on, I think attaching semantics to schemas, rather than data, solves 100% of the problems with both semantics and schemas.
-
Jenniferplusplus replied to Jenniferplusplus
@tetron But, let's assume this can turn into JSON schemas, accounting for extensions. And those schemas are suitable to do validation and code generation. Then it would become a primarily social problem of convincing maintainers to switch to using schema-defined implementations.
-
d@nny "disc@" mc² replied to Jenniferplusplus
@jenniferplusplus @tetron cc @aud who i think is working on something related but possibly a different angle
-
infinite love ⴳ replied to Jenniferplusplus
@jenniferplusplus @tetron this is ironically what json-ld contexts were *supposed* to do -- you can "upgrade" any arbitrary json into json-ld by providing your own context, even if it wasn't explicitly declared by the document producer. but this requires you to "guess" what the producer meant by any given term, instead of the producer telling you explicitly what they meant. and your "guess" might not match someone else's "guess".
anyway, i don't see why this can't be layered on top of a schema.
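to make the "guessing" concrete, here's a made-up sketch. given plain json like

```json
{
  "name": "A note",
  "content": "Hello"
}
```

a consumer can "upgrade" it to json-ld by supplying their own context mapping each term to an IRI, e.g.:

```json
{
  "@context": {
    "name": "https://www.w3.org/ns/activitystreams#name",
    "content": "https://www.w3.org/ns/activitystreams#content"
  },
  "name": "A note",
  "content": "Hello"
}
```

nothing stops a different consumer from mapping `name` to some other vocabulary's IRI, which is exactly the mismatched-guess problem.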
-
@jenniferplusplus @tetron it's just that usually, the semantic data nerds will insist that the semantics are required while the schema is optional. it feels like the counterargument here is that the schema should be ~required instead, while the semantics should be optional.
-
@jenniferplusplus @tetron or maybe in an ideal world you could package both together. this is something i've been trying out -- have the context document include not just an intended context mapping, but also schema/ontology information. see https://w3id.org/fep/1985.jsonld for example, which defines a context mapping for 4 terms, but then also separately contains a graph for those term definitions. for example, `orderType` will tell you its domain, range, min/max cardinality (i.e. required/functional).
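roughly, the shape of such a document looks like the following (an illustrative sketch only, not the actual contents of 1985.jsonld; `ex:` is a placeholder prefix):

```json
{
  "@context": {
    "orderType": { "@id": "ex:orderType", "@type": "@id" }
  },
  "@graph": [
    {
      "@id": "ex:orderType",
      "rdfs:domain": "ex:SomeSubjectType",
      "rdfs:range": "ex:SomeValueType",
      "owl:minCardinality": 1,
      "owl:maxCardinality": 1
    }
  ]
}
```

i.e. one file serving both as a context mapping and as a small ontology describing the terms it maps.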
-
Jenniferplusplus replied to infinite love ⴳ
@trwnh @tetron meaning is necessary to do anything meaningful with a document, sure. But the meaning is implicit in the context. We're all out here building AP social networking services, passing each other social messages. We know what these things mean. But without a schema, doing that processing is slow, expensive, and error prone. We gain nothing by defining these messages semantically, and lose a lot from the lack of structure.
-
@jenniferplusplus @trwnh
So the problem the semantic stuff is trying to solve is how to have an extensible standard without causing chaos: e.g., if two different implementations each decide to add a field called "grilledCheese", but use it to mean different things with different structure, the semantic markup lets you tell them apart.
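For instance (illustrative payloads, not from any real implementation), both of these documents use the key "grilledCheese", but the contexts map it to different IRIs with different structure, so a consumer can tell them apart:

```json
{
  "@context": { "grilledCheese": "https://impl-a.example/ns#grilledCheese" },
  "grilledCheese": { "bread": "sourdough", "cheese": "gruyere" }
}
```

```json
{
  "@context": { "grilledCheese": "https://impl-b.example/ns#grilledCheese" },
  "grilledCheese": "extra crispy"
}
```

-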
@jenniferplusplus @trwnh
But your application probably only cares about or understands a subset of all the terms in use, and it makes sense to use a schema to rigorously validate the things you support and ignore the rest.
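As a sketch of what that could look like in JSON Schema (illustrative, not any implementation's actual schema), you validate the handful of terms you support strictly and let everything else pass through:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["type", "id"],
  "properties": {
    "type": { "const": "Note" },
    "id": { "type": "string", "format": "uri" },
    "content": { "type": "string" }
  },
  "additionalProperties": true
}
```

-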
@[email protected] @[email protected] @[email protected] So, like, definitely I am actively working on doing an AP implementation in Rust, and the fact that the schema is so broad is definitely difficult.
I have some advantages in that I am implementing a specific case rather than thinking about the problem in broader, non-AP-specific terms. For instance: the additional schema definition added by `@context` is something I'm able to parse and deserialize into native types, to an extent, but it is not something I necessarily have to care about unless I choose to. If I don't know the data is there, I... don't really have to do anything about it if I don't want to.
As an example, Mastodon seems to add blur hashes and positional information to image objects. Misskey adds a `_misskey_summary` field to notes. These are defined in the `@context` section of the payload. In the implementation I'm working on, things that are part of the incoming payload but aren't part of the AP spec are left in an `_extra` HashMap that exists on the object (rather than in a dedicated field, which I'm reserving for things that are defined in AP, such as `id`, `name`, etc). The idea is that someone using these structures (myself or others) might care about that data and do something with it... assuming they know it's there and part of the payload, of course. But if you don't know it's there, well... not much you can do with it at compile time, really.
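A minimal sketch of that pattern with serde (names here are illustrative, not the actual crate's):

```rust
use std::collections::HashMap;

use serde::{Deserialize, Serialize};
use serde_json::Value;

// Known AP properties get typed fields; anything else (e.g. Misskey's
// `_misskey_summary`, Mastodon's blur hashes) falls through into `_extra`.
#[derive(Serialize, Deserialize)]
struct APObject {
    #[serde(skip_serializing_if = "Option::is_none")]
    id: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    name: Option<String>,
    // `flatten` collects every key not matched by a named field above.
    #[serde(flatten)]
    _extra: HashMap<String, Value>,
}
```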
About the arrays, and having things collapse to a single string when only one element of their structure is populated: I'm handling that via specific serialization functions. Basically, everything is deserialized into the AP type that is specified in the original schema, regardless of whether it arrives as a simple string or a more complex struct (links in particular often are just a simple string). At serialization time, I check how many of my fields are populated... and if it's only one, I spit it out as a string.
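Something like this sketch, assuming serde (the `Link`/`href` names are hypothetical, not the actual implementation):

```rust
use serde::ser::SerializeStruct;
use serde::{Deserialize, Serialize, Serializer};

// Accept either a bare string or a full object on input...
#[derive(Deserialize)]
#[serde(untagged)]
enum LinkWire {
    Plain(String),
    Full { href: Option<String>, name: Option<String> },
}

// ...deserialize both forms into the same typed struct...
#[derive(Deserialize)]
#[serde(from = "LinkWire")]
struct Link {
    href: Option<String>,
    name: Option<String>,
}

impl From<LinkWire> for Link {
    fn from(w: LinkWire) -> Self {
        match w {
            LinkWire::Plain(href) => Link { href: Some(href), name: None },
            LinkWire::Full { href, name } => Link { href, name },
        }
    }
}

// ...and collapse back to the bare-string form on output when only
// one field is populated.
impl Serialize for Link {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
        match (&self.href, &self.name) {
            (Some(href), None) => s.serialize_str(href),
            _ => {
                let mut st = s.serialize_struct("Link", 2)?;
                st.serialize_field("href", &self.href)?;
                st.serialize_field("name", &self.name)?;
                st.end()
            }
        }
    }
}
```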
Similarly for arrays: if I am expecting a payload of a single item but receive an array of items instead, I deserialize the payload into the `__array` element (which is an array of `APObject`s) of my AP Object type. Basically, my AP object implementation can be either a real AP object or a simple container for an array of AP Objects.
This came about because I noticed, when working with real data, that the AP spec doesn't talk much about arrays, but they're everywhere in real payloads. I think I need to generalize this functionality (currently it only works on `Object`s when it should work on anything and everything that inherits from it).
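One way to generalize it, sketched under the same serde assumption (again, hypothetical names):

```rust
use serde::{Deserialize, Serialize};

// Accepts either a single value or an array of values for any type,
// since real AP payloads use both forms interchangeably.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum OneOrMany<T> {
    One(T),
    Many(Vec<T>),
}
```

Any field typed as `OneOrMany<APObject>` would then round-trip both shapes, for `Object`s and everything that inherits from them alike.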
Basically: because I'm working on a specific example of this type of problem, I'm free to make decisions that wouldn't necessarily work for every type of problem... and also, because I'm working with real data, I have to deviate (in a sense) from the written spec to handle real payloads. Thankfully, there's no shortage of data...
-
@tetron @jenniferplusplus this assumes you are working purely in one problem space and never cross any boundaries. for example, if your schema is roughly "activitystreams plus some extensions", then you won't know what to do with something that isn't as2. here, the mime type is doing a lot of the semantic work for you. if you want to ensure that certain extensions are understood, you end up basically needing to define a new mime type. but the problem is you can embed documents in other documents
-
@tetron @jenniferplusplus so the mime type actually changes for only *part of the document* instead of the entire document. i think this is something a lot of people are not prepared to encounter, and generally don't know how to deal with, except by making assumptions based on popular usage. for example, the `publicKey` property is not part of as2. it's from the old deprecated security/v1. if doing ld, you expect some CryptographicKey object(s) inside it, but a "plain json" might use a string!
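concretely (placeholder values), the same property can arrive in either of these shapes, and a consumer has to be prepared for both:

```json
{
  "publicKey": {
    "id": "https://example.com/actor#main-key",
    "owner": "https://example.com/actor",
    "publicKeyPem": "-----BEGIN PUBLIC KEY-----\n..."
  }
}
```

```json
{
  "publicKey": "https://example.com/actor#main-key"
}
```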
-
@tetron @jenniferplusplus this basically means that not only does your schema have to account for extensions (especially "required extensions" in the case of how fedi uses the `publicKey` property), you also have to be clear about semantics at some level. either that is done via the mime type, via the context declaration, or perhaps via some schema that indirectly embeds or references something equivalent to a context (as is being proposed at the top of this thread).