Literal Dev Diary - July 5th, 2020 ■ javamonn

This week, I began re-implementing Literal's data model to use the W3C Web Annotation specification.

Web Annotations from a product perspective

To be clear, my intention is to have Literal support the specification in a first-class way, i.e. the core GraphQL API always supports at least the data models as specified, and may evolve into a superset of that functionality as necessary. From a product perspective, supporting the Web Annotation spec in a first-class way offers clear advantages. As a textual annotation management system, Literal's desired functionality falls squarely within the scope of the specification, so it can take advantage of a standardized data model that is explicitly designed for the problem space.

Philosophically, Literal is a bridge, not a silo. Literal enables the creation, management, and organization of textual annotations and the individual should retain ownership over the annotation data itself. Supporting the prevailing standard for annotation data will enable this ownership, as individuals will be able to export data from Literal in an established and documented format, and import Web Annotation compliant data from other systems ^[1].

Web Annotations from an implementation perspective

The initial version of the GraphQL schema derived from the spec can be seen here. Currently I'm at a stage where all data models illustrated in the spec are captured within the schema. Next, I'll work on implementing custom Mutation resolvers for the core data models.

It's early days as far as my overall familiarity with the spec goes, but some of my thoughts from this week are as follows:

Models have some degree of normalization. For example, several core types (e.g. Resource, Agent) can be represented either as a globally unique IRI or inlined directly into their referencing context. I think this makes sense when thinking about data as it exists outside of a DBMS, as references to denormalized data are difficult to resolve. In the context of Literal, this normalization results in the need for custom mutation resolvers to propagate data changes atomically. Additionally, in the GraphQL schema, where fields are technically union types of ExternalIRI | InlinedResource, I've opted to always represent them as the latter; the consumer can always request just the IRI field of the InlinedResource node if that's all they need.
The core data model spec intermixes concerns around transport of the data. The primary example of this is the fields on the AnnotationCollection and AnnotationPage classes that support pagination. This strikes me as odd, especially considering that there's a separate, REST-centric specification (the Web Annotation Protocol) that would be a better place to capture or model these concerns. Instead, JSON-LD for transport is baked into the data model spec itself. In the context of Literal, and GraphQL APIs more broadly, there are competing, established pagination specifications. For Literal, I'm planning to follow the fields and classes outlined in the Web Annotation spec in order to remain compliant, but never create more than one AnnotationPage for a given AnnotationCollection, and instead use the Relay connections spec to paginate the items field within an AnnotationPage.
The spec's modeling of SpecificResource is neat and solves several problems with referencing and interfacing with externally hosted resources. Specifically, the HttpRequestState class stood out to me, as it attempts to solve a particular problem I frequently ran into with Literal PDF Reader, i.e. how to encode the parameters required for accessing an external resource again in the future.
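To make the first two points more concrete, here is a rough TypeScript sketch, not Literal's actual schema or resolver code. The first half shows a value that may arrive as either a bare IRI or an inlined object being coerced to the inlined form. The second half shows a Relay-style connection being built over the items of a single page. The names toInlined and paginateItems, the field shapes, and the offset-based cursor are all assumptions made for illustration, and the connection is simplified to forward pagination only.

```typescript
// Hypothetical shapes for illustration only; not Literal's real types.
type ExternalIRI = string;

interface InlinedResource {
  id: string; // the IRI that identifies the resource
  type?: string;
  value?: string;
}

// A reference may be a bare IRI or an inlined object.
type ResourceRef = ExternalIRI | InlinedResource;

// Always hand back the inlined form; a consumer that only needs
// the IRI can read `id` and ignore the rest.
function toInlined(ref: ResourceRef): InlinedResource {
  return typeof ref === "string" ? { id: ref } : ref;
}

// Simplified, forward-only Relay-style connection types.
interface Edge<T> {
  cursor: string;
  node: T;
}

interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Paginate the items of a single page. The cursor here is just the
// item's offset as a string (an assumption; real cursors are opaque).
function paginateItems<T>(
  items: T[],
  first: number,
  after?: string
): Connection<T> {
  const start = after === undefined ? 0 : Number(after) + 1;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ cursor: String(start + i), node }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + slice.length < items.length,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}
```

A caller would request the first slice with paginateItems(items, 10) and pass the returned endCursor as the after argument to fetch the next slice, so the AnnotationCollection itself never needs a second AnnotationPage.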

Going Forward

I intend to keep moving forward on implementing the Web Annotation spec. My progress (and the length of these weekly updates) will slow down for the next couple of weeks, as I'm in the middle of a cross-state move.