This repository contains the architectural configuration to produce and publish a Linked Data Event Stream (LDES) containing a feed of changes for a given (configurable) controlled vocabulary, such as the ones managed by the EU Publications Office.
The vocabulary changes are modelled using the W3C Activity Streams 2 vocabulary.
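For example, a modification to a vocabulary concept could be represented as an AS2 `Update` activity (illustrative only; the object IRI and exact serialization below are hypothetical, not the pipeline's actual output):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Update",
  "object": "http://publications.europa.eu/resource/authority/country/EXAMPLE",
  "published": "2024-01-01T00:00:00Z"
}
```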
The data processing workflow is built as an RDF-Connect pipeline that performs several data transformation steps, which include:
- Raw vocabulary fetching over HTTP
- Change detection and semantic labeling with Activity Streams 2
- Fragmentation based on temporal constraints
- Ingestion into a target data store system
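The change-detection step can be sketched as a diff between two vocabulary versions, with each change labeled using an Activity Streams 2 activity type (`as:Create`, `as:Update`, `as:Delete`). This is a minimal illustration with a made-up data model (plain label maps), not the pipeline's actual RDF-based implementation:

```python
# Sketch of change detection between two vocabulary versions.
# Concept IRIs and labels here are illustrative, not the real data model.

def detect_changes(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Label each changed concept with an Activity Streams 2 activity type."""
    changes = []
    for iri in new.keys() - old.keys():
        changes.append(("as:Create", iri))      # concept added in the new version
    for iri in old.keys() - new.keys():
        changes.append(("as:Delete", iri))      # concept removed from the new version
    for iri in old.keys() & new.keys():
        if old[iri] != new[iri]:
            changes.append(("as:Update", iri))  # concept present in both, but modified
    return sorted(changes)

old = {"ex:AT": "Austria", "ex:YU": "Yugoslavia"}
new = {"ex:AT": "Republic of Austria", "ex:BE": "Belgium"}
print(detect_changes(old, new))
```

In the real pipeline this comparison happens at the RDF level, but the labeling principle is the same: each detected difference becomes an AS2 activity in the event stream.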
The publishing is done via an instance of the ldes-server, which reads from the same data store that the RDF-Connect pipeline writes to.
TODO: Diagram and description of pipeline components.
To run the pipeline locally, you need to make sure all the required components are up and running. These include:
- A Redis or MongoDB instance (see /datastore for more information)
- An instance of the ldes-server (see /ldes-server for more information)
- Optionally, a Varnish instance for caching (see /varnish for more information)
Next, you need to configure all the environment variables in the conf.env file according to your local setup.
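As an illustration, such a file could look as follows (the variable names below are hypothetical; conf.env itself is the authoritative list):

```
# Illustrative values only; see conf.env for the actual variable names
VOCABULARY_URL=http://publications.europa.eu/resource/authority/country
MONGODB_URI=mongodb://localhost:27017
LDES_SERVER_URL=http://localhost:8080
```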
Finally, you can start an execution loop of the pipeline, which will fetch all versions of a given vocabulary (see run.sh), with:
./run.sh

This pipeline and the necessary data storage and interface components are containerized using Docker and can be executed altogether using docker-compose as follows:

$ docker-compose up --build

The conf.env file contains the main configuration variables to be set.