Data transformation scripts in TypeScript
Get Started
Setup
To start, we need an npm project that holds our TypeScript files and transpiles them into Datahub-compatible JavaScript. The following commands create a new project folder, add a bare-bones npm config file, and install the datahub-tslib library.
```shell
mkdir myproject && cd myproject
echo "{}" > package.json
npm install mimiro-io/datahub-tslib --save-dev
```
Info
To install a specific version, append a git tag or commit to the package name, using npm's `mimiro-io/datahub-tslib#<tag>` syntax.
Please also refer to the README on GitHub for up-to-date setup information.
Writing the first transformation script
Every TypeScript transformation script needs two things:
- an import statement that brings in the Datahub types and functions
- an exported function named `transform_entities` that takes an array of `Entity` as its input parameter and returns an array of `Entity`
This minimal example returns all incoming entities unmodified and logs the batch size to the Datahub log output:
```typescript
import * as dh from "datahub-tslib/datahub";

export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
    dh.Log(entities.length);
    return entities;
}
```
Info
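It is also possible to import individual types and functions into the current module. With single imports, the example above starts with `import { Entity, Log } from "datahub-tslib/datahub";` and then uses `Entity` and `Log` without the `dh.` prefix.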
All that is left now is to transpile the new transform script into Datahub-compatible JavaScript. Save the script as mytransform.ts and run the tt transpiler on it.
Note
tt prints the transpilation result to stdout, so to store it, redirect the output into a file.
Next steps
Next, try modifying entities. This example checks, for every entity, whether the http://example.com/ex/name property starts with "Old". If it does, the entity is marked as deleted using SetDeleted.
```typescript
import * as dh from "datahub-tslib/datahub";

export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
    // Look up the prefix for the http://example.com/ex/ namespace
    const prefix = dh.GetNamespacePrefix("http://example.com/ex/");
    entities.forEach((e: dh.Entity) => {
        const name = dh.GetProperty(e, prefix, "name");
        // The typeof check also rules out null; mark "Old..." entities as deleted
        if (typeof name === "string" && name.substring(0, 3) === "Old") {
            dh.SetDeleted(e, true);
        }
    });
    return entities;
}
```
You can also filter out entities, build and return completely new entity arrays (using NewEntity), or write to multiple datasets using NewTransaction and ExecuteTransaction. Refer to the Datahub documentation on GitHub for a description of all available functions; the typed function signatures are listed below.
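A minimal sketch of the first two patterns, filtering and building a new entity array, follows. So that it runs outside Datahub, the `dh` object here is a hypothetical in-memory stand-in that mimics the documented signatures of `GetNamespacePrefix`, `GetProperty`, `SetProperty`, `NewEntity`, `SetId` and `GetId`. It stores properties under assumed `"prefix:name"` keys and returns a fixed prefix, which is not how Datahub actually encodes or resolves them. The `-summary` ID scheme is likewise made up for illustration. In a real script, drop the stand-in and use `import * as dh from "datahub-tslib/datahub";` instead.

```typescript
// Hypothetical in-memory stand-ins for a few Datahub helpers, so this sketch
// runs outside Datahub. A real script uses:
//   import * as dh from "datahub-tslib/datahub";
type PropertyValue = string | number | boolean | null;

interface Entity {
    id: string | null;
    props: Record<string, PropertyValue>; // assumed "prefix:name" keys
}

const dh = {
    NewEntity(): Entity {
        return { id: null, props: {} };
    },
    SetId(entity: Entity, id: string): void {
        entity.id = id;
    },
    GetId(entity: Entity): string | null {
        return entity.id;
    },
    GetNamespacePrefix(urlExpansion: string): string {
        // Assumption: a fixed mapping; Datahub resolves prefixes itself
        return urlExpansion === "http://example.com/ex/" ? "ex" : "ns0";
    },
    GetProperty(entity: Entity, prefix: string, name: string, defaultValue?: PropertyValue): PropertyValue | null {
        const value = entity.props[prefix + ":" + name];
        if (value !== undefined) return value;
        return defaultValue === undefined ? null : defaultValue;
    },
    SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
        entity.props[prefix + ":" + name] = value;
    },
};

export function transform_entities(entities: Entity[]): Entity[] {
    const prefix = dh.GetNamespacePrefix("http://example.com/ex/");
    // Filter: keep only entities that have a string "name" property
    const named = entities.filter((e) => typeof dh.GetProperty(e, prefix, "name") === "string");
    // Build a new array: one fresh entity per kept input entity
    return named.map((e) => {
        const out = dh.NewEntity();
        // Hypothetical ID scheme derived from the source entity's ID
        dh.SetId(out, (dh.GetId(e) ?? "unknown") + "-summary");
        dh.SetProperty(out, prefix, "name", dh.GetProperty(e, prefix, "name"));
        return out;
    });
}
```

Because the stand-in is named `dh`, the body of `transform_entities` reads the same as it would against the real library; only the types and helper implementations would change.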
Supported functions
Datahub injects a number of helper functions into the global scope of transformation scripts. Most of them handle access to and manipulation of entities. Prefer these helpers over direct manipulation of JavaScript entity objects, because they ensure that values are encoded correctly for storage.
List of built-in functions
The following functions are typed versions of Datahub's JavaScript built-ins:
```typescript
Log(t: any, level?: string): void
NewEntity(): Entity
AsEntity(obj: any): Entity | null
GetId(entity: Entity): string | null
SetId(entity: Entity, id: string): void
SetDeleted(entity: Entity, value: boolean): void
GetDeleted(entity: Entity): boolean | null
PrefixField(prefix: string, field: string): string
AssertNamespacePrefix(urlExpansion: string): string
GetNamespacePrefix(urlExpansion: string): string
SetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, value: PropertyValue): void
GetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, defaultValue?: PropertyValue): PropertyValue | null
GetReference(entity: Entity, refNamespacePrefix: string, refName: string, defaultValue?: ReferenceValue): ReferenceValue | null
AddReference(entity: Entity, nsPrefix: string, refName: string, refValue: ReferenceValue): void
RenameProperty(entity: Entity, originalPrefix: string, originalName: string, newPrefix: string, newName: string): void
RemoveProperty(entity: Entity, prefix: string, name: string): void
UUID(): string
ExecuteTransaction(txn: Transaction): Error
NewTransaction(): Transaction
Timing(name: string, end: boolean): void
ToString(object: any): string
FindById(entityId: string, datasets?: string[]): Entity | null
Query(startingEntities: string[], predicate: string, inverse: boolean, datasets?: string[]): QueryResult[]
```
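As an illustration of how a few of these signatures fit together, the sketch below renames one property, removes another, and fills in a fallback through the optional `defaultValue` parameter of `GetProperty`. The helper implementations and the `fullname`, `internalNote` and `country` properties are hypothetical: the stand-ins only mimic the documented signatures, using an assumed `"prefix:name"` key format, so the example can run without Datahub. The prefix `"ex"` is hard-coded where a real script would call `GetNamespacePrefix`.

```typescript
// Hypothetical in-memory stand-ins that follow the documented signatures of
// GetProperty, SetProperty, RenameProperty and RemoveProperty. Properties are
// stored under assumed "prefix:name" keys, not Datahub's real encoding.
type PropertyValue = string | number | boolean | null;

interface Entity {
    props: Record<string, PropertyValue>;
}

const key = (prefix: string, name: string): string => prefix + ":" + name;

const dh = {
    GetProperty(entity: Entity, prefix: string, name: string, defaultValue?: PropertyValue): PropertyValue | null {
        const value = entity.props[key(prefix, name)];
        if (value !== undefined) return value;
        return defaultValue === undefined ? null : defaultValue;
    },
    SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
        entity.props[key(prefix, name)] = value;
    },
    RenameProperty(entity: Entity, originalPrefix: string, originalName: string, newPrefix: string, newName: string): void {
        const from = key(originalPrefix, originalName);
        if (from in entity.props) {
            entity.props[key(newPrefix, newName)] = entity.props[from];
            delete entity.props[from];
        }
    },
    RemoveProperty(entity: Entity, prefix: string, name: string): void {
        delete entity.props[key(prefix, name)];
    },
};

export function transform_entities(entities: Entity[]): Entity[] {
    // In Datahub this would come from dh.GetNamespacePrefix("http://example.com/ex/")
    const ex = "ex";
    for (const e of entities) {
        dh.RenameProperty(e, ex, "fullname", ex, "name"); // hypothetical property names
        dh.RemoveProperty(e, ex, "internalNote");
        // Keep an existing country, or fall back to "unknown" via defaultValue
        dh.SetProperty(e, ex, "country", dh.GetProperty(e, ex, "country", "unknown"));
    }
    return entities;
}
```

Mutating the entities in place and returning the same array mirrors the SetDeleted example above.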