TriplyETL: Overview
TriplyETL allows you to create linked data knowledge graphs. It does this by connecting one or more data sources to a pipeline that extracts, transforms, and loads their data as linked data.
If you are using TriplyETL for the first time, go to the Getting started page.
Approach
TriplyETL uses the following unique approach to Extract, Transform, and Load (ETL) data:
- Step 1 Extract extracts data records from one or more data sources.
- A generic Record is loaded from the Source Systems. The representation of the Record is independent of the source system used.
- Step 2 Transform cleans, combines, and extends data in the Record representation.
- Step 3 Assert uses data from the Record to generate linked data assertions.
- The Internal Store holds linked data that is generated for each Record.
- Step 4 Enrich improves or extends linked data in the Internal Store.
- Step 5 Validate ensures that linked data in the Internal Store is correct.
- Step 6 Publish makes linked data available in a Triple Store for others to use.
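The following sketch shows what such a pipeline can look like in code. It is a minimal illustration, not a definitive implementation: the module paths, the `fromJson`, `validate`, and `toTriplyDb` functions follow the TriplyETL documentation but may differ per version, and the prefix, dataset, and file names are hypothetical.

```ts
import { Etl, Source, declarePrefix, fromJson, toTriplyDb } from '@triplyetl/etl/generic'
import { iri, pairs } from '@triplyetl/etl/ratt'
import { validate } from '@triplyetl/etl/shacl'
import { a, foaf } from '@triplyetl/etl/vocab'

// Hypothetical prefix declaration, reused in the assertions below.
const prefix = {
  id: declarePrefix('https://example.org/id/'),
}

export default async function (): Promise<Etl> {
  const etl = new Etl()
  etl.use(
    // Step 1 Extract: load Records from an inline JSON source.
    fromJson([{ id: 'nl', name: 'The Netherlands' }]),
    // Step 3 Assert: generate linked data in the Internal Store.
    pairs(iri(prefix.id, 'id'),
      [a, foaf.Organization],
      [foaf.name, 'name'],
    ),
    // Step 5 Validate: check the Internal Store against a SHACL data model.
    validate(Source.file('static/model.trig')),
    // Step 6 Publish: upload the linked data to a TriplyDB dataset.
    toTriplyDb({ dataset: 'example' }),
  )
  return etl
}
```

Steps 2 (Transform) and 4 (Enrich) are omitted here; they slot into the same `etl.use()` sequence.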
In addition, the following components can be used throughout these steps:
- Declarations allow you to declare constants in one place for reuse throughout the rest of your TriplyETL configuration.
- Debug tools allow you to gain insight into a TriplyETL pipeline for the purposes of development and maintenance.
- Control structures can be used to make parts of the TriplyETL configuration conditional or repeated (loops).
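The following sketch shows how these components combine, assuming the `declarePrefix`, `logRecord`, and `when` functions and the module paths shown in the TriplyETL documentation; the prefix and keys are hypothetical.

```ts
import { logRecord } from '@triplyetl/etl/debug'
import { Etl, declarePrefix, fromJson, when } from '@triplyetl/etl/generic'
import { iri, triple } from '@triplyetl/etl/ratt'
import { rdfs } from '@triplyetl/etl/vocab'

// Declaration: declare the IRI prefix once and reuse it throughout.
const id = declarePrefix('https://example.org/id/')

export default async function (): Promise<Etl> {
  const etl = new Etl()
  etl.use(
    fromJson([{ id: '123', label: 'Example' }, { id: '456' }]),
    // Debug tool: print the current Record to standard output.
    logRecord(),
    // Control structure: only run the assertion when the 'label' key is present.
    when('label',
      triple(iri(id, 'id'), rdfs.label, 'label'),
    ),
  )
  return etl
}
```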
Why TriplyETL?
TriplyETL has the following core features that set it apart from other data pipeline products:
- Backend-agnostic: TriplyETL supports any data source through its large and growing set of source connectors.
- Multi-paradigm: TriplyETL supports all major paradigms for transforming and asserting linked data: SPARQL Update, JSON-LD Algorithms (TBA), SHACL Rules, RML (TBA), and RDF All The Things (RATT). You can also write your own transformations in TypeScript (see the sketch after this list).
- Scalable: TriplyETL processes data in a stream of self-contained records. This allows TriplyETL pipelines to run in parallel, ensuring a high pipeline throughput.
- Standards-compliant: TriplyETL implements the latest versions of the linked data standards and best practices: RDF 1.1, SHACL, XML Schema Datatypes 1.1, IETF RFC3987 (IRIs), IETF RFC5646 (Language Tags).
- High-quality: The output of TriplyETL pipelines is automatically validated against the specified data model.
- Production-grade: TriplyETL pipelines can run in the four DTAP environments that are common in production systems (Development, Testing, Acceptance, Production).
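As an illustration of the multi-paradigm point above, the following sketch writes a small transformation in plain TypeScript. It assumes the `custom.change` transformation described in the TriplyETL documentation; the key name and source data are hypothetical.

```ts
import { Etl, fromJson } from '@triplyetl/etl/generic'
import { custom } from '@triplyetl/etl/ratt'

export default async function (): Promise<Etl> {
  const etl = new Etl()
  etl.use(
    fromJson([{ name: '  The Netherlands  ' }]),
    // Custom transformation in TypeScript: trim whitespace from the
    // 'name' key of every Record before it is asserted.
    custom.change({
      key: 'name',
      type: 'string',
      change: value => value.trim(),
    }),
  )
  return etl
}
```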