Data transformation scripts in TypeScript

Get Started

Setup

To get started, we need an npm project to hold our TypeScript files and to transpile them to Datahub-compatible JavaScript. The following commands create a new project folder, add a bare-bones npm config file, and install the datahub-tslib library.

mkdir myproject && cd myproject
echo "{}">package.json
npm install mimiro-io/datahub-tslib --save-dev
Info

To install a specific version, use this syntax:

npm install "mimiro-io/datahub-tslib#0.0.1" --save-dev

Please also refer to the README on GitHub for up-to-date setup information.

Writing the first transformation script

Every TypeScript transformation script needs two things:

  1. an import statement, to include Datahub types and functions
  2. an exported function called transform_entities, with an array of Entity as both input parameter and return value.

This minimal example returns all incoming entities unmodified, and logs the entity batch size to the datahub logging output:

import * as dh from "datahub-tslib/datahub";

export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
    dh.Log(entities.length);
    return entities;
}

Info

It is also possible to import single types and functions into the current module context. Here is the above example with single imports:

import { type Entity, Log } from "datahub-tslib/datahub"

export function transform_entities(entities: Entity[]): Entity[] {
    Log(entities.length);
    return entities;
}

Now all that is left is to transpile our new transform script to Datahub-compatible JavaScript. Save the script as mytransform.ts and run the tt transpiler:

npx tt mytransform.ts

Note

tt prints the transpilation result to stdout. To store it, redirect the output into a file:

npx tt mytransform.ts > mytransform.js

Next steps

Try modifying entities. This example checks, for every entity, whether the http://example.com/ex/name property starts with "Old". If so, the entity is marked as deleted in place using SetDeleted.

import * as dh from "datahub-tslib/datahub";

export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
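    // Look up the prefix registered for the namespace expansion.
    const prefix = dh.GetNamespacePrefix("http://example.com/ex/");

    entities.forEach((e: dh.Entity) => {
        const name = dh.GetProperty(e, prefix, "name");
        // Only string values can start with "Old"; the typeof check also excludes null.
        if (typeof name === "string" && name.substring(0, 3) === "Old") {
            dh.SetDeleted(e, true);
        }
    });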
    return entities;
}

You can also filter out entities, build and return completely new entity arrays (using NewEntity), or write to multiple datasets using NewTransaction and ExecuteTransaction. Refer to the Datahub documentation on GitHub for a description of all available functions, and see below for a list of the typed function signatures.
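As a rough illustration of filtering, the sketch below drops entities whose name starts with "Old" instead of marking them as deleted. So that it can run outside Datahub, it declares local stand-ins for Entity, GetNamespacePrefix and GetProperty. The Entity shape, the "ns1" prefix and the prefix:name key format are assumptions made for this sketch only; in a real transform they come from datahub-tslib and may look different.

```typescript
// --- Stand-ins so the sketch runs outside Datahub (hypothetical shapes). ---
// In a real transform, remove this section and use:
//   import * as dh from "datahub-tslib/datahub";
type PropertyValue = string | number | boolean;

interface Entity {
    id: string;
    deleted: boolean;
    props: Record<string, PropertyValue>;
}

// Assumed mapping from namespace expansion to prefix.
const namespaces: Record<string, string> = { "http://example.com/ex/": "ns1" };

function GetNamespacePrefix(urlExpansion: string): string {
    const prefix: string | undefined = namespaces[urlExpansion];
    return prefix === undefined ? "" : prefix;
}

function GetProperty(entity: Entity, prefix: string, name: string): PropertyValue | null {
    const value: PropertyValue | undefined = entity.props[prefix + ":" + name];
    return value === undefined ? null : value;
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
    const prefix = GetNamespacePrefix("http://example.com/ex/");

    // Return a new array that leaves out every entity whose name starts with "Old".
    return entities.filter((e: Entity) => {
        const name = GetProperty(e, prefix, "name");
        const isOld = typeof name === "string" && name.substring(0, 3) === "Old";
        return !isOld;
    });
}
```

Entities without a name property are kept here, because a missing value is not a string and therefore never matches the "Old" check.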

Supported functions

Datahub injects a number of helper functions into the global scope of transformation scripts. Most of them handle access to and manipulation of entities. It is preferable to use the provided helper functions over direct manipulation of JavaScript entity objects, because the functions ensure correct encoding for storage.
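To give an idea of what working through the helpers looks like, the following sketch renames one property, removes another and sets a third, without touching the entity object directly in the transform itself. The stand-in implementations at the top only exist so the sketch runs outside Datahub. The Entity shape, the "ns1" prefix and the property names title, internalNote and checked are hypothetical.

```typescript
// --- Stand-ins so the sketch runs outside Datahub (hypothetical shapes). ---
// In a real transform, remove this section and use:
//   import * as dh from "datahub-tslib/datahub";
type PropertyValue = string | number | boolean;

interface Entity {
    id: string;
    props: Record<string, PropertyValue>;
}

// Assumed key format for this sketch: "<prefix>:<name>".
function key(prefix: string, name: string): string {
    return prefix + ":" + name;
}

function GetProperty(entity: Entity, prefix: string, name: string): PropertyValue | null {
    const value: PropertyValue | undefined = entity.props[key(prefix, name)];
    return value === undefined ? null : value;
}

function SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
    entity.props[key(prefix, name)] = value;
}

function RemoveProperty(entity: Entity, prefix: string, name: string): void {
    delete entity.props[key(prefix, name)];
}

function RenameProperty(entity: Entity, originalPrefix: string, originalName: string,
                        newPrefix: string, newName: string): void {
    const value = GetProperty(entity, originalPrefix, originalName);
    if (value === null) {
        return; // nothing to rename
    }
    RemoveProperty(entity, originalPrefix, originalName);
    SetProperty(entity, newPrefix, newName, value);
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
    const prefix = "ns1"; // hypothetical; normally obtained via GetNamespacePrefix

    entities.forEach((e: Entity) => {
        RenameProperty(e, prefix, "title", prefix, "name"); // ns1:title becomes ns1:name
        RemoveProperty(e, prefix, "internalNote");          // drop a property entirely
        SetProperty(e, prefix, "checked", true);            // add or overwrite a property
    });
    return entities;
}
```

The transform body only uses calls from the typed function list, so it should carry over to a real script once the stand-ins are replaced by the import.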

List of built-in functions

The following functions are typed versions of Datahub's JavaScript built-ins.

  • Log(t: any, level?: string): void
  • NewEntity(): Entity
  • AsEntity(obj: any): Entity|null
  • GetId(entity: Entity): string|null
  • SetId(entity: Entity, id: string): void
  • SetDeleted(entity: Entity, value: boolean): void
  • GetDeleted(entity: Entity): boolean|null
  • PrefixField(prefix: string, field: string): string
  • AssertNamespacePrefix(urlExpansion: string): string
  • GetNamespacePrefix(urlExpansion: string): string
  • SetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, value: PropertyValue): void
  • GetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, defaultValue?: PropertyValue): PropertyValue|null
  • GetReference(entity: Entity, refNamespacePrefix: string, refName: string, defaultValue?: ReferenceValue): ReferenceValue|null
  • AddReference(entity: Entity, nsPrefix: string, refName: string, refValue: ReferenceValue): void
  • RenameProperty(entity: Entity, originalPrefix: string, originalName: string, newPrefix: string, newName: string): void
  • RemoveProperty(entity: Entity, prefix: string, name: string): void
  • UUID(): string
  • ExecuteTransaction(txn: Transaction): Error
  • NewTransaction(): Transaction
  • Timing(name: string, end: boolean): void
  • ToString(object: any): string
  • FindById(entityId: string, datasets?: string[]): Entity|null
  • Query(startingEntities: string[], predicate: string, inverse: boolean, datasets?: string[]): QueryResult[]
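Several of these signatures can be combined to build a completely new entity array. The sketch below creates one summary entity per incoming entity, using NewEntity, SetId, GetProperty with a default value, SetProperty and AddReference. All function bodies at the top are local stand-ins so the sketch runs outside Datahub. Their behavior is an assumption, in particular that AssertNamespacePrefix registers an unknown expansion and returns its prefix, and that GetProperty returns defaultValue when the property is missing. The Entity and ReferenceValue shapes, the namespace http://example.com/person/ and the names displayName and derivedFrom are hypothetical.

```typescript
// --- Stand-ins so the sketch runs outside Datahub (hypothetical shapes). ---
// In a real transform, remove this section and use:
//   import * as dh from "datahub-tslib/datahub";
type PropertyValue = string | number | boolean;
type ReferenceValue = string | string[];

interface Entity {
    id: string;
    deleted: boolean;
    props: Record<string, PropertyValue>;
    refs: Record<string, ReferenceValue>;
}

const prefixes: Record<string, string> = {};

// Assumed behavior: return the prefix for an expansion, registering it if unknown.
function AssertNamespacePrefix(urlExpansion: string): string {
    let prefix: string | undefined = prefixes[urlExpansion];
    if (prefix === undefined) {
        prefix = "ns" + (Object.keys(prefixes).length + 1);
        prefixes[urlExpansion] = prefix;
    }
    return prefix;
}

function NewEntity(): Entity {
    return { id: "", deleted: false, props: {}, refs: {} };
}

function GetId(entity: Entity): string | null {
    return entity.id === "" ? null : entity.id;
}

function SetId(entity: Entity, id: string): void {
    entity.id = id;
}

function GetProperty(entity: Entity, prefix: string, name: string,
                     defaultValue?: PropertyValue): PropertyValue | null {
    const value: PropertyValue | undefined = entity.props[prefix + ":" + name];
    if (value !== undefined) {
        return value;
    }
    return defaultValue === undefined ? null : defaultValue;
}

function SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
    entity.props[prefix + ":" + name] = value;
}

function AddReference(entity: Entity, nsPrefix: string, refName: string,
                      refValue: ReferenceValue): void {
    entity.refs[nsPrefix + ":" + refName] = refValue;
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
    const person = AssertNamespacePrefix("http://example.com/person/");
    const result: Entity[] = [];

    for (const source of entities) {
        const sourceId = GetId(source);
        if (sourceId === null) {
            continue; // skip entities without an id
        }
        const target = NewEntity();
        SetId(target, sourceId + "-summary");

        // Fall back to "unknown" when the source has no name property.
        const name = GetProperty(source, person, "name", "unknown");
        if (name !== null) {
            SetProperty(target, person, "displayName", name);
        }
        // Link the new entity back to the entity it was derived from.
        AddReference(target, person, "derivedFrom", sourceId);
        result.push(target);
    }
    return result;
}
```

Returning a fresh array leaves the incoming entities untouched, which can be easier to reason about than mutating them in place.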