
Releases May 2023

Latest datahub release 1.4.0

Datahub 1.4.0 introduces two new API features: pageable queries and adhoc queries.

Adhoc Queries

To make it easier to gain insight into your data, Datahub now allows users to post adhoc query scripts to the /query endpoint. Datahub treats the request body as a script instead of query parameters if the request's Content-Type is set to application/x-javascript-query. The posted script must be base64 encoded.

Query scripts posted this way must contain a JavaScript function with the signature function do_query(). Datahub executes this function immediately in its transforms engine and keeps the HTTP request open for the duration of the execution.

do_query scripts can return results using WriteQueryResult(jsonObject). Note that WriteQueryResult does not flush until do_query completes.

Also note that do_query is currently not supported in datahub-tslib.
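To illustrate the buffering note above, here is a minimal local sketch. It does not use the real Datahub engine: runQuery and the WriteQueryResult stub are hypothetical stand-ins that only mimic the described behaviour, and calling WriteQueryResult more than once is an assumption this post does not confirm.

```javascript
// Hypothetical stand-in for the Datahub runtime; NOT the real engine.
function runQuery(doQuery) {
  const buffered = [];
  // Stub: in Datahub, WriteQueryResult is provided by the transforms engine.
  globalThis.WriteQueryResult = (obj) => buffered.push(obj);
  doQuery();
  // Results leave the buffer only after do_query has returned.
  return JSON.stringify(buffered);
}

function do_query() {
  WriteQueryResult({ step: 1 });
  WriteQueryResult({ step: 2 });
}

const responseBody = runQuery(do_query);
console.log(responseBody);
```

In this stand-in nothing reaches the caller while do_query is still running, which is why a long-running script keeps the HTTP request open without producing partial output.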

Example usage of do_query

Let's assume we have a dataset people, and we want to count how many changes there are in this dataset.

First, we define a do_query function in a file and name the file query.js:

function do_query() {
  let count = 0;
  while (true) {
    const entities = GetDatasetChanges("people", count, 10000).Entities;
    const hits = entities.length;
    if (hits == 0) {
      break;
    }
    count = count + hits;
  }
  const result = { "changes count": count };
  WriteQueryResult(result);
}

Now we can post the query to Datahub. The query has to be posted as a base64-encoded string, just like regular transform scripts.

q=$(cat query.js | base64 -w0)
curl -XPOST \
    -H "Content-Type: application/x-javascript-query" \
    -d '{"query": "'"$q"'"}' \
    http://datahub-hostname/query

Alternatively, the latest version of the datahub-cli can be used to send the query.

mim query --file query.js

When the request completes, the response should show the aggregation result: [{"changes count": 5000000}]
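The same request can also be assembled in code. The following is a sketch assuming Node.js 18 or later; buildQueryRequest is a hypothetical helper name, while the header, the {"query": ...} body and the URL are taken from the curl example above.

```javascript
// Sketch only: builds the same request as the curl example above.
const script = `function do_query() {
  WriteQueryResult({ ok: true });
}`;

// Hypothetical helper: base64-encodes the script and wraps it in the JSON body.
function buildQueryRequest(scriptSource) {
  const encoded = Buffer.from(scriptSource, "utf8").toString("base64");
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-javascript-query" },
    body: JSON.stringify({ query: encoded }),
  };
}

const request = buildQueryRequest(script);

// To actually send it (not executed here):
// fetch("http://datahub-hostname/query", request)
//   .then((res) => res.text())
//   .then(console.log);
```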

You can read more about adhoc queries in the Datahub documentation.

Pageable Queries

Queries in Datahub can be outgoing queries, which follow a relationship from a starting entity to other entities, or inverse queries, which find all other entities that point back to a starting entity via a relationship.

The outgoing type is usually quick and efficient, because the maximum number of query results is limited by what a single starting entity can point to in its refs mapping.

Inverse queries, on the other hand, can have very large result sets. Imagine querying a demographics database with a city as the starting entity, asking for all people pointing back to that city via the "hometown" relationship. That could return millions of results.

Especially for cases where many results are possible, Datahub now offers pageable queries. Pageable queries work just as efficiently on queries with small result sets, so they can be used for all query needs.

Web API

To use pageable queries in mim or directly in Datahub's Web API, send queries as before. When a limit is provided as a query parameter, Datahub now not only limits the returned result, but may also return a third element in the query result array.

If a third element is returned, its value is a list of continuation tokens for the initiated query. To fetch the next page of query results, send a new query with only the continuation tokens as the continuations query parameter, plus a limit. The continuation tokens contain information about the starting entity, via, inverse and datasets parameters, so these need not be repeated when fetching the next page of a query.
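The paging flow described above can be sketched as a small client loop. This is an illustration under assumptions: sendQuery is a hypothetical function that performs one query and returns the parsed result array, firstParams stands for whatever parameters the first query normally uses, and treating a missing or empty third element as "last page" is a guess at the end-of-results signal. The sketch is kept synchronous for brevity; an HTTP-based sendQuery would normally be asynchronous.

```javascript
// Collects every page of a query by following continuation tokens.
// sendQuery(params) is a hypothetical callback returning the query result array.
function fetchAllPages(sendQuery, firstParams, limit) {
  const pages = [];
  let response = sendQuery({ ...firstParams, limit });
  pages.push(response);

  // Assumption: no third element (or an empty token list) means no more pages.
  while (Array.isArray(response[2]) && response[2].length > 0) {
    // Follow-up requests carry only the continuation tokens and a limit.
    response = sendQuery({ continuations: response[2], limit });
    pages.push(response);
  }
  return pages;
}
```

Because the tokens already encode the starting entity, via, inverse and datasets, the loop never needs to resend firstParams after the first request.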

In Transforms

The Query function in transform scripts remains unchanged for compatibility. In addition, transforms can now use a new function: PagedQuery.

The signature is function PagedQuery({StartURIs, Via, Inverse, Datasets}, limit, callback), with callback as the new parameter. callback must be a JavaScript function defined in the transform script. It should accept one parameter, an array of query results. The callback function can return a boolean value: true indicates that PagedQuery shall continue calling the callback, and false tells PagedQuery to stop.

An example using PagedQuery, logging all query results:

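function transform_entities(entities) {
  // Loop variables are declared with const to avoid implicit globals.
  for (const e of entities) {
    // Called once per result page; logs every item on the page.
    const cb = function (resultPage) {
      for (const item of resultPage) {
        Log(item);
      }
      // Returning true asks PagedQuery to continue with the next page.
      return true;
    };

    PagedQuery(
      { StartURIs: ["ns3:person-1"], Via: "*", Inverse: false, Datasets: [] },
      10,
      cb
    );
  }
}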
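The callback's return value can also be used to stop early. The sketch below collects at most a fixed number of results and then returns false. To keep it runnable outside Datahub, makeFakePagedQuery is a hypothetical in-memory stand-in for PagedQuery; that it stops only on an explicit false is an assumption, and collectUpTo is an illustrative helper name.

```javascript
// Hypothetical stand-in for Datahub's PagedQuery, serving pages from an array.
function makeFakePagedQuery(allResults) {
  return function PagedQuery(query, limit, callback) {
    for (let i = 0; i < allResults.length; i += limit) {
      const keepGoing = callback(allResults.slice(i, i + limit));
      // Assumption: only an explicit false stops the paging.
      if (keepGoing === false) return;
    }
  };
}

// Collects at most maxItems results, then tells PagedQuery to stop.
function collectUpTo(pagedQuery, query, pageSize, maxItems) {
  const collected = [];
  pagedQuery(query, pageSize, (page) => {
    for (const item of page) {
      if (collected.length >= maxItems) break;
      collected.push(item);
    }
    // true: fetch the next page; false: enough results collected.
    return collected.length < maxItems;
  });
  return collected;
}

const fakePagedQuery = makeFakePagedQuery([1, 2, 3, 4, 5, 6, 7]);
const firstFive = collectUpTo(
  fakePagedQuery,
  { StartURIs: ["ns3:person-1"], Via: "*", Inverse: true, Datasets: [] },
  2,
  5
);
console.log(firstFive); // [ 1, 2, 3, 4, 5 ]
```

Inside a real transform, the same callback would be passed directly to PagedQuery instead of the stand-in.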

Read more about PagedQuery in the Datahub documentation.

Latest datahub-cli release v0.16.0

The latest datahub-cli version added bugfixes and support for new datahub features.


Get Datahub version 1.4.0 on Github and on Dockerhub.

Get Datahub-CLI version v0.16.0 on Github.