Releases May 2023¶
Latest datahub release 1.4.0¶
Datahub 1.4.0 introduces two new API features: pageable queries and adhoc queries.
Adhoc Queries¶
As an additional insight-giving feature, Datahub now allows users to post adhoc query scripts to the /query endpoint.
Datahub recognizes that you are posting a script instead of query parameters if you set the request's Content-Type to application/x-javascript-query. The posted script must be base64 encoded.
Query scripts posted this way must contain a JavaScript function with the signature function do_query(). Datahub immediately executes this function in its transforms engine and keeps the HTTP request open for the duration of execution.
do_query scripts can return results using WriteQueryResult(jsonObject).
Note that WriteQueryResult does not flush until do_query completes.
Also note that do_query is currently not supported in datahub-tslib.
Example usage of do_query¶
Let's assume we have a dataset people, and we want to count how many changes there are in this dataset.
First, we define a do_query function in a file named query.js:
function do_query() {
  let count = 0;
  while (true) {
    // Request up to 10000 changes, passing the running count as the second argument.
    const entities = GetDatasetChanges("people", count, 10000).Entities;
    const hits = entities.length;
    // Stop as soon as a batch comes back empty.
    if (hits === 0) {
      break;
    }
    count = count + hits;
  }
  const result = { "changes count": count };
  WriteQueryResult(result);
}
Now we can post the query to the datahub. The query has to be posted as a base64-encoded string, just like regular transform scripts.
q=$(base64 -w0 < query.js)
curl -XPOST \
  -H "Content-Type: application/x-javascript-query" \
  -d '{"query": "'"$q"'"}' \
  http://datahub-hostname/query
Alternatively, the latest version of the datahub-cli can be used to send the query.
When the request completes, the response should show the aggregation result: [{"changes count": 5000000}]
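For callers that prefer JavaScript over the shell, the same request can be sketched in Node.js. This is only an illustration: buildAdhocQueryRequest is a hypothetical helper, not part of Datahub or datahub-cli, and it simply mirrors the Content-Type header and the JSON body of the curl example.

```javascript
// Sketch: build the HTTP request for an adhoc query (Node.js).
// buildAdhocQueryRequest is a hypothetical helper, not part of Datahub.
function buildAdhocQueryRequest(script) {
  // The script must be base64 encoded before it is posted.
  const encoded = Buffer.from(script, "utf8").toString("base64");
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-javascript-query" },
    // Same JSON body shape as in the curl example.
    body: JSON.stringify({ query: encoded }),
  };
}

const script = 'function do_query() { WriteQueryResult({ "ok": true }); }';
const request = buildAdhocQueryRequest(script);
// With Node 18 or later, the request could be sent like this:
// fetch("http://datahub-hostname/query", request).then((res) => res.json());
```

Keeping the request construction separate from the network call makes it possible to check the encoding without a running Datahub instance.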
You can read more about adhoc queries in the Datahub documentation.
Pageable Queries¶
Queries in Datahub can be outgoing queries, which follow a relationship from a starting entity to other entities, or inverse queries, which find all other entities that point back to a starting entity via a relationship.
The outgoing type is usually quick and efficient, because the maximum number of query results is limited by what a single starting entity can point to in its refs mapping.
Inverse queries on the other hand can potentially have very big result sets. Imagine querying in a demographics database with any city as starting entity, asking for all people pointing back to that city via the "hometown" relationship. That could return millions of results.
Especially for cases where many results are possible, Datahub now offers pageable queries. Pageable queries work just as efficiently on queries with small result sets, so they can be used for all query needs.
Web API¶
To use pageable queries in mim or directly in Datahub's Web API, queries are used as before. But when a limit is provided as a query parameter, Datahub will now not only limit the returned result, but may also return a third element in the query result array.
If a third element is returned, its value is a list of continuation tokens for the initiated query. To fetch the next page of query results, send a new query with only the continuation tokens as the continuations query parameter, plus a limit. The continuation tokens contain information about the starting entity, via, inverse and datasets parameters, so these need not be repeated when fetching the next page of a query.
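The paging flow described above can be sketched as a small loop. This is a minimal sketch under stated assumptions: sendQuery is a hypothetical function supplied by the caller that sends one query to Datahub and resolves to the parsed query result array, and the loop assumes that a missing or empty third element means there are no further pages. Only the continuations and limit names are taken from the text.

```javascript
// Sketch: follow continuation tokens until no more pages are offered.
// sendQuery is a hypothetical caller-supplied function; it sends one query
// to Datahub and resolves to the parsed query result array.
async function collectAllPages(sendQuery, initialQuery, limit) {
  const pages = [];
  let result = await sendQuery({ ...initialQuery, limit });
  pages.push(result);
  // A third element, when present, is the list of continuation tokens.
  // Assumption: a missing or empty list means the query is exhausted.
  while (result.length > 2 && result[2].length > 0) {
    // Only the tokens and a limit are needed to fetch the next page.
    result = await sendQuery({ continuations: result[2], limit });
    pages.push(result);
  }
  return pages;
}
```

Passing the transport in as a function keeps the loop independent of how the query is actually sent, whether through mim or a direct HTTP call.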
In Transforms¶
The Query function in transform scripts remains unchanged for compatibility. In addition, transforms can now use a new function: PagedQuery.
The signature is function PagedQuery({StartURIs, Via, Inverse, Datasets}, limit, callback), with callback as the new parameter.
callback must be a JavaScript function defined in the transform script. It should accept one parameter, an array of query results. The callback function can return a boolean value: true indicates that PagedQuery shall continue calling the callback, and false tells PagedQuery to stop.
An example using PagedQuery, logging all query results:
function transform_entities(entities) {
  for (const e of entities) {
    // Called once per page of results; returning true asks for the next page.
    const cb = function (resultPage) {
      for (const item of resultPage) {
        Log(item);
      }
      return true;
    };
    PagedQuery(
      { StartURIs: ["ns3:person-1"], Via: "*", Inverse: false, Datasets: [] },
      10,
      cb
    );
  }
}
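The callback's return value can also be used to stop paging early. The following is a minimal sketch, not taken from the Datahub documentation: makeLimitedCollector is a hypothetical helper that gathers results until a maximum is reached and then returns false. PagedQuery and Log are assumed to be provided by the transform engine, so transform_entities below only runs inside Datahub.

```javascript
// Sketch: a callback factory that stops paging once enough results are seen.
// makeLimitedCollector is a hypothetical helper, not part of Datahub.
function makeLimitedCollector(maxResults) {
  const collected = [];
  const callback = function (resultPage) {
    for (const item of resultPage) {
      if (collected.length >= maxResults) {
        break;
      }
      collected.push(item);
    }
    // true: keep paging; false: tell PagedQuery to stop.
    return collected.length < maxResults;
  };
  return { collected, callback };
}

// Inside a transform script, where Datahub provides PagedQuery and Log:
function transform_entities(entities) {
  const collector = makeLimitedCollector(25);
  PagedQuery(
    { StartURIs: ["ns3:person-1"], Via: "*", Inverse: true, Datasets: [] },
    10,
    collector.callback
  );
  Log(collector.collected.length);
}
```

Stopping early like this is mainly useful for inverse queries, where the full result set may be far larger than what the transform needs.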
Read more about PagedQuery in the Datahub documentation.
Latest datahub-cli release v0.16.0¶
The latest datahub-cli version adds bug fixes and support for the new Datahub features.
Get Datahub version 1.4.0 on GitHub and on Docker Hub.
Get Datahub-CLI version v0.16.0 on GitHub.