Data transformation scripts in TypeScript
Get Started
Setup
To start, we need an npm project to hold our TypeScript files and to transpile them to datahub-compatible JavaScript. The following commands create a new project folder, add a bare-bones npm config file, and install the datahub-tslib library.
mkdir myproject && cd myproject
echo "{}">package.json
npm install mimiro-io/datahub-tslib --save-dev
Info
To install a specific version, use this syntax:
Please also refer to the README on GitHub for up-to-date setup information.
Writing the first transformation script
Every TypeScript transformation script needs two things:
- an import statement, to include datahub types and functions
- an exported function called transform_entities, with an array of Entity as both input parameter and return value.
This minimal example returns all incoming entities unmodified, and logs the entity batch size to the datahub logging output:
import * as dh from "datahub-tslib/datahub";
export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
  // Log the batch size to the datahub logging output.
  dh.Log(entities.length);
  return entities;
}
Info
It is also possible to import individual types and functions into the current module context. The above example with single imports:
Now all that is left is to transpile our new transform script to datahub-compatible JavaScript. Save the script as mytransform.ts and run the tt transpiler.
Note
tt prints the transpilation result to stdout. To store it, pipe the output into a file.
Next steps
Try modifying entities. This example checks, for every entity, whether the http://example.com/ex/name property starts with "Old". If so, the entity is marked as deleted using SetDeleted.
import * as dh from "datahub-tslib/datahub";
export function transform_entities(entities: dh.Entity[]): dh.Entity[] {
  // Look up the prefix for the example namespace once per batch.
  const prefix = dh.GetNamespacePrefix("http://example.com/ex/");
  entities.forEach((e: dh.Entity) => {
    const name = dh.GetProperty(e, prefix, "name");
    // Mark entities whose name starts with "Old" as deleted.
    if (typeof name === "string" && name.substring(0, 3) === "Old") {
      dh.SetDeleted(e, true);
    }
  });
  return entities;
}
You can also filter out entities, build and return completely new entity arrays (using NewEntity), or write to multiple datasets using NewTransaction and ExecuteTransaction. Refer to the datahub documentation on GitHub for a description of all available functions, and see below for a list of the typed function signatures.
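A minimal sketch of the filtering and NewEntity variants, under stated assumptions: the Entity shape, the ns0 prefix, and the bodies of the helper functions are simplified stand-ins so that the snippet runs on its own. They are not datahub's implementation. In a real script, drop the stand-ins and import the types and functions from datahub-tslib/datahub instead.

```typescript
// --- Hypothetical stand-ins, only here so the sketch runs without datahub. ---
type PropertyValue = string | number | boolean;
interface Entity {
  ID: string;
  Properties: { [key: string]: PropertyValue };
}
// Assumed prefix table; datahub keeps its own namespace registry.
const prefixes: { [url: string]: string } = { "http://example.com/ex/": "ns0" };
function GetNamespacePrefix(urlExpansion: string): string {
  return prefixes[urlExpansion] || "";
}
function NewEntity(): Entity {
  return { ID: "", Properties: {} };
}
function GetId(entity: Entity): string | null {
  return entity.ID === "" ? null : entity.ID;
}
function SetId(entity: Entity, id: string): void {
  entity.ID = id;
}
function GetProperty(entity: Entity, prefix: string, name: string): PropertyValue | null {
  const value = entity.Properties[prefix + ":" + name];
  return value === undefined ? null : value;
}
function SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
  entity.Properties[prefix + ":" + name] = value;
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
  const prefix = GetNamespacePrefix("http://example.com/ex/");
  const result: Entity[] = [];
  entities.forEach((e: Entity) => {
    const id = GetId(e);
    const name = GetProperty(e, prefix, "name");
    // Filter: skip entities without an id or a string name, and those named "Old...".
    if (id === null || typeof name !== "string" || name.substring(0, 3) === "Old") {
      return;
    }
    // Build a new entity that carries over only the id and the name.
    const copy = NewEntity();
    SetId(copy, id);
    SetProperty(copy, prefix, "name", name);
    result.push(copy);
  });
  return result;
}
```

Because the function returns a new array, entities that were skipped never reach the target dataset, and the incoming entity objects stay unmodified.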
Supported functions
Datahub injects a number of helper functions into the global scope of transformation scripts. Most functions handle access to and manipulation of entities. It is preferable to use the provided helper functions over direct manipulation of JavaScript entity objects, because the functions ensure correct encoding for storage.
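To illustrate the call pattern, here is a small sketch that cleans up properties through the helpers rather than touching the entity object directly. The property names (name, fullName, internalNote, status) are made up for the example. The Entity shape, the ns0 prefix and the function bodies are simplified local stand-ins, assumed only so that the snippet is runnable; the real helpers are the ones that take care of the storage encoding.

```typescript
// --- Hypothetical stand-ins, only here so the sketch runs without datahub. ---
type PropertyValue = string | number | boolean;
interface Entity {
  ID: string;
  Properties: { [key: string]: PropertyValue };
}
function GetNamespacePrefix(urlExpansion: string): string {
  // Assumed mapping; datahub keeps its own namespace registry.
  return urlExpansion === "http://example.com/ex/" ? "ns0" : "";
}
function GetProperty(entity: Entity, prefix: string, name: string,
                     defaultValue?: PropertyValue): PropertyValue | null {
  const value = entity.Properties[prefix + ":" + name];
  if (value !== undefined) {
    return value;
  }
  return defaultValue === undefined ? null : defaultValue;
}
function SetProperty(entity: Entity, prefix: string, name: string, value: PropertyValue): void {
  entity.Properties[prefix + ":" + name] = value;
}
function RemoveProperty(entity: Entity, prefix: string, name: string): void {
  delete entity.Properties[prefix + ":" + name];
}
function RenameProperty(entity: Entity, originalPrefix: string, originalName: string,
                        newPrefix: string, newName: string): void {
  const oldKey = originalPrefix + ":" + originalName;
  const value = entity.Properties[oldKey];
  if (value === undefined) {
    return; // nothing to rename in this stand-in
  }
  entity.Properties[newPrefix + ":" + newName] = value;
  delete entity.Properties[oldKey];
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
  const prefix = GetNamespacePrefix("http://example.com/ex/");
  entities.forEach((e: Entity) => {
    // Move "name" to "fullName" and drop a property that should not be passed on.
    RenameProperty(e, prefix, "name", prefix, "fullName");
    RemoveProperty(e, prefix, "internalNote");
    // Read with a fallback value, then write the result back.
    const status = GetProperty(e, prefix, "status", "unknown");
    if (status !== null) {
      SetProperty(e, prefix, "status", status);
    }
  });
  return entities;
}
```

The transform never builds a property key by hand; the prefix and the property name are always passed separately to a helper.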
List of built-in functions
The following functions are typed versions of Datahub's JavaScript built-ins.
Log(t: any, level?: string): void
NewEntity(): Entity
AsEntity(obj: any): Entity|null
GetId(entity: Entity): string|null
SetId(entity: Entity, id: string): void
SetDeleted(entity: Entity, value: boolean): void
GetDeleted(entity: Entity): boolean|null
PrefixField(prefix: string, field: string): string
AssertNamespacePrefix(urlExpansion: string): string
GetNamespacePrefix(urlExpansion: string): string
SetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, value: PropertyValue): void
GetProperty(entity: Entity, propertyNamespacePrefix: string, propertyName: string, defaultValue?: PropertyValue): PropertyValue|null
GetReference(entity: Entity, refNamespacePrefix: string, refName: string, defaultValue?: ReferenceValue): ReferenceValue|null
AddReference(entity: Entity, nsPrefix: string, refName: string, refValue: ReferenceValue): void
RenameProperty(entity: Entity, originalPrefix: string, originalName: string, newPrefix: string, newName: string): void
RemoveProperty(entity: Entity, prefix: string, name: string): void
UUID(): string
ExecuteTransaction(txn: Transaction): Error
NewTransaction(): Transaction
Timing(name: string, end: boolean): void
ToString(object: any): string
FindById(entityId: string, datasets?: string[]): Entity|null
Query(startingEntities: string[], predicate: string, inverse: boolean, datasets?: string[]): QueryResult[]
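As a rough sketch of how NewTransaction and ExecuteTransaction from the list can fit together when writing to several datasets, the following example routes deleted and non-deleted entities to two datasets. Several things here are assumptions rather than facts taken from the signatures: the dataset names are hypothetical, the DatasetEntities field on Transaction is assumed, a falsy return value from ExecuteTransaction is assumed to mean success, and all function bodies are in-memory stand-ins so that the snippet runs without datahub.

```typescript
// --- Hypothetical stand-ins, only here so the sketch runs without datahub. ---
interface Entity {
  ID: string;
  IsDeleted: boolean;
}
// Assumed shape: a map from dataset name to the entities to write there.
interface Transaction {
  DatasetEntities: { [dataset: string]: Entity[] };
}
// In-memory replacement for datahub's dataset storage.
const store: { [dataset: string]: Entity[] } = {};
function NewTransaction(): Transaction {
  return { DatasetEntities: {} };
}
// The typed signature returns Error; this stand-in returns null on success (assumption).
function ExecuteTransaction(txn: Transaction): Error | null {
  Object.keys(txn.DatasetEntities).forEach((dataset: string) => {
    store[dataset] = (store[dataset] || []).concat(txn.DatasetEntities[dataset]);
  });
  return null;
}
function GetDeleted(entity: Entity): boolean | null {
  return entity.IsDeleted;
}
// --- End of stand-ins. ---

export function transform_entities(entities: Entity[]): Entity[] {
  const txn = NewTransaction();
  // Collect the entities for each target dataset in one transaction.
  txn.DatasetEntities["things-active"] = entities.filter((e: Entity) => GetDeleted(e) !== true);
  txn.DatasetEntities["things-deleted"] = entities.filter((e: Entity) => GetDeleted(e) === true);
  const err = ExecuteTransaction(txn);
  if (err) {
    throw err; // surface a failed write instead of continuing silently
  }
  // The incoming batch is still returned for the regular job sink.
  return entities;
}
```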