JSON-LD Streaming

JSON-LD is a format for expressing linked data in JSON. It has filled a need for a JSON representation of RDF semantic web data. One core use case that the current JSON-LD specification does not address is streaming. This document describes a proposal for a streaming JSON-LD format.

A streaming JSON-LD format is useful for a number of reasons:

It allows for the streaming of JSON-LD data from a source to a sink without having to buffer the entire document in memory or introduce complex client processing semantics.
It guarantees that the context comes before the data and does not need to be repeated for each data item. This also means that the document or stream can be processed in a single pass, without buffering the entire document in memory.
The format can be extended to include continuation tokens, which a client syncing data can use to continue from where the last sync left off.

Existing JSON-LD Streaming Specification

A JSON-LD Streaming specification already exists as a W3C Working Group Note. It focuses on the ordering of properties within a single JSON-LD resource representation. It does not address the challenge of streaming multiple JSON-LD resources in a single document with a common context.

JSON-LD

Currently, when a server needs to send a number of resources in response to a request, it either sends a single JSON-LD document with an @graph key that contains all the resources being described or, alternatively, serialises each resource as a separate JSON-LD document within an array.

The main problem with this approach, and with the current streaming JSON-LD specification, is that the context is repeated for each resource. It must then be parsed and processed for each resource, even though there is often very little difference between one JSON-LD object and the next.
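
To make the two existing patterns concrete, the sketch below builds both shapes as plain Python data. The second resource (`ns3:marge`) and the trimmed-down context are illustrative assumptions, not part of the original example; the point is only to show where the context ends up in each pattern.

```python
import json

# Trimmed-down context, reusing two prefixes from the example in this document.
context = {
    "ns3": "http://data.mimiro.io/people/",
    "ns4": "http://data.mimiro.io/schema/person/",
}

# Illustrative resources; "ns3:marge" is a made-up second entry.
resources = [
    {"@id": "ns3:homer", "ns4:fullname": "Homer Simpson"},
    {"@id": "ns3:marge", "ns4:fullname": "Marge Simpson"},
]

# Pattern 1: a single document with all resources under "@graph".
graph_document = {"@context": context, "@graph": resources}

# Pattern 2: an array of separate documents, each repeating the context.
document_array = [{"@context": context, **resource} for resource in resources]

print(json.dumps(graph_document, indent=2))
print(json.dumps(document_array, indent=2))
```

In the second pattern the same `@context` appears once per resource, which is the repetition described above.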

Streaming JSON-LD

This streaming JSON-LD proposal specifies that the first JSON object in the array is the context that applies to all subsequent resource descriptions. The JSON objects that follow are the resource descriptions. Any of them is free to override the context mappings defined in the initial context.

For example:

[

{
    "@context": {
        "ns0": "http://data.mimiro.io/core/dataset/",
        "ns1": "http://data.mimiro.io/core/",
        "ns2": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "ns3": "http://data.mimiro.io/people/",
        "ns4": "http://data.mimiro.io/schema/person/",
        "ns5": "http://data.mimiro.io/companies/",
        "ns6": "http://data.mimiro.io/schema/",
        "ns7": "http://data.mimiro.io/schema/person/worksfor/",
        "core": "http://data.mimiro.io/uda/core/"
    }
},

{
    "@id": "ns3:homer",
    "core:recorded": 1672299810499868928,
    "core:deleted": false,
    "ns2:type": { "@id" : "ns6:Person" },
    "ns4:worksfor": [ { "@id": "ns5:mimiro"} ],
    "ns4:fullname": [ "Homer Simpson" ]
},

... more resources ...

{
    "ns2:type": "core:continuation",
    "core:token": "AAgAAAACAAAAAAAAAAo="
}

]

A further observation is that the key ordering described in the W3C note gives a small gain, but holding a single resource description in memory is not typically a challenge.
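
A consumer of the format described in this section might look like the following minimal sketch. It uses only the Python standard library and decodes one array element at a time, so only a single resource description is held in memory. The function names (`iter_array_items`, `read_stream`, `expand`) are hypothetical, the context is assumed to be a plain prefix map as in the example, and the prefix expansion is a simplification rather than full JSON-LD processing.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_array_items(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield each object of a top-level JSON array as soon as it is complete.

    Assumes every array element is a JSON object, as in this proposal.
    A malformed element is never yielded; the sketch simply keeps buffering.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break  # need more input
            if not started:
                if buffer[0] != "[":
                    raise ValueError("expected a JSON array")
                buffer = buffer[1:]
                started = True
                continue
            if buffer[0] == ",":
                buffer = buffer[1:]
                continue
            if buffer[0] == "]":
                return  # end of the array
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element is incomplete; wait for the next chunk
            yield item
            buffer = buffer[end:]


def read_stream(chunks: Iterable[str]) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Yield (effective_context, resource) pairs from a streaming document."""
    items = iter_array_items(chunks)
    first = next(items, None)
    if first is None or "@context" not in first:
        raise ValueError("the first object must carry the shared @context")
    shared = first["@context"]
    for resource in items:
        local = resource.get("@context")
        # A resource may override mappings defined in the initial context.
        context = {**shared, **local} if isinstance(local, dict) else shared
        yield context, resource


def expand(context: Dict[str, Any], term: str) -> str:
    """Expand a 'prefix:local' name using the context (simplified)."""
    prefix, sep, local = term.partition(":")
    base = context.get(prefix)
    return base + local if sep and isinstance(base, str) else term


# Usage: the second object arrives split across two chunks.
chunks = [
    '[{"@context": {"ns3": "http://data.mimiro.io/people/"}},',
    ' {"@id": "ns3:ho',
    'mer"}]',
]
for context, resource in read_stream(chunks):
    print(expand(context, resource["@id"]))
```

Because the context is guaranteed to arrive first, the reader never has to look ahead or make a second pass.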

Data Sync Extensions

One of the key use cases for the streaming representation is when a server is exposing a stream of changes to resources in an underlying graph. In this case, the server can provide a continuation token that can be used to resume the stream from where it left off. This is useful for clients that want to sync data from a server. The continuation token is an RDF resource serialised as a JSON-LD object.
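
On the server side, emitting such a stream could be sketched as below. The function names are hypothetical, the continuation object copies the shape used in the example earlier in this document, and the token is treated as an opaque string supplied by the caller. How a real server derives the token is not specified here.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def continuation(token: str) -> Dict[str, Any]:
    """Build a continuation object in the shape used by the example."""
    return {"ns2:type": "core:continuation", "core:token": token}


def serialise_stream(
    context: Dict[str, Any],
    resources: Iterable[Dict[str, Any]],
    token: str,
) -> Iterator[str]:
    """Yield the stream piece by piece: context first, token last."""
    yield "[\n" + json.dumps({"@context": context})
    for resource in resources:
        # Each resource is written as soon as it is available.
        yield ",\n" + json.dumps(resource)
    yield ",\n" + json.dumps(continuation(token))
    yield "\n]\n"


# Usage with values taken from the example in this document.
context = {
    "ns2": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ns3": "http://data.mimiro.io/people/",
    "core": "http://data.mimiro.io/uda/core/",
}
resources = [{"@id": "ns3:homer", "core:deleted": False}]
for piece in serialise_stream(context, resources, "AAgAAAACAAAAAAAAAAo="):
    print(piece, end="")
```

Since `resources` can be any iterable, including a generator over a change log, the server does not need to hold the full result set in memory either.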

The rationale for embedding continuation tokens in the stream, as opposed to sending them as HTTP response headers, is primarily that streams can be long-lived and a server may wish to provide many continuation tokens over the lifetime of the stream. If it cannot do this, servers are in effect forced to return bounded collections of items, sized so that the amount of replay needed after an interruption is 'acceptable' to a client.
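
The benefit of receiving several tokens within one stream can be illustrated with a small client-side sketch. It assumes the stream elements after the context object are passed in one at a time, that continuation objects use the same prefixes and shape as the example in this document, and that applying a resource twice is harmless. The names `sync` and `is_continuation` and the token values are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def is_continuation(obj: Dict[str, Any]) -> bool:
    """Recognise a continuation object by the type used in the example."""
    return obj.get("ns2:type") == "core:continuation"


def sync(
    objects: Iterable[Dict[str, Any]],
    apply: Callable[[Dict[str, Any]], None],
    last_token: Optional[str] = None,
) -> Optional[str]:
    """Apply resources and return the most recent continuation token seen."""
    try:
        for obj in objects:
            if is_continuation(obj):
                last_token = obj["core:token"]
            else:
                apply(obj)
    except ConnectionError:
        # The stream dropped; only items after last_token need replaying.
        pass
    return last_token


def interrupted_stream():
    """Simulated long-lived stream that fails after its first token."""
    yield {"@id": "ns3:homer"}
    yield {"ns2:type": "core:continuation", "core:token": "token-1"}
    yield {"@id": "ns3:marge"}
    raise ConnectionError("stream dropped")


applied = []
token = sync(interrupted_stream(), applied.append)
print(token, len(applied))
```

Here the client can resume from `token-1` and only the single resource received after that token is replayed, instead of the whole stream.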

The Universal Data API now includes the above JSON-LD streaming format and also describes how to expose continuation tokens in the stream.