DIM and RIM: Data Ingestion Manager and RDF Indexing Manager for Smart City

The source code of RIM and DIM is available on GitHub.


Data Ingestion Manager

Data Ingestion Manager allows the creation of Open Data records and the setup and management of the ingestion process. The ingestion process starts by collecting raw Open Data and ends with the generation of RDF triples according to the adopted domain ontology model. The creation of an Open Data record and its properties allows the insertion and editing of an Open Data in the repository. Each Open Data is described by a set of properties to be set, such as: Name, Category, Resource, Source, Format, Type (real-time or static) and more (see Section 2.2 for the full list).
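As a purely illustrative sketch (the class and field names below are hypothetical, not the actual DIM data model), such a record and its properties could be represented in Java as follows:

/**
 * Minimal sketch of an Open Data record descriptor, covering the
 * properties listed above. Names are illustrative only.
 */
public class OpenDataRecord {
    public enum DataType { REAL_TIME, STATIC }

    private final String name;      // human-readable name of the data set
    private final String category;  // e.g. "mobility", "environment"
    private final String resource;  // identifier of the resource inside the source
    private final String source;    // URL or path where the raw data is stored
    private final String format;    // e.g. "CSV", "JSON", "SHP"
    private final DataType type;    // real-time (periodically updated) or static

    public OpenDataRecord(String name, String category, String resource,
                          String source, String format, DataType type) {
        this.name = name;
        this.category = category;
        this.resource = resource;
        this.source = source;
        this.format = format;
        this.type = type;
    }

    public String getSource() { return source; }
    public DataType getType() { return type; }
}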
The setup and management of the ingestion process allows selecting the tasks to execute, both at creation time and during the life of the data for update purposes. The following tasks are available and can be executed singularly or concatenated (a sketch of such a task chain is given after this list):
·       Ingestion (I) of the data instances performs the raw data retrieval from the source where the Open Data is stored.
·       Quality Improvement (QI) enriches the Open Data, for instance by adding links to external Linked Open Data (LOD) or by refining possible inconsistencies.
·       Triples Generation (T) performs the generation of RDF triples by mapping static and dynamic data on the basis of the domain ontology model.
·       Validation (V) of the Open Data detects possible inconsistencies, incompleteness, incorrect relationships, etc.
·       Reconciliation (R) tries to solve the lack of coherence among indexed entities that refer to the same concept but come from different data sets.
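As anticipated above, a minimal sketch of how the selected tasks could be concatenated is the following; the IngestionTask interface and the pipeline class are hypothetical, introduced only to illustrate the concatenation, and reuse the OpenDataRecord sketch given earlier:

import java.util.List;

/** Hypothetical task interface: each task operates on one Open Data record. */
interface IngestionTask {
    void run(OpenDataRecord record) throws Exception;
}

/** Illustrative concatenation of the selected tasks (I, QI, T, V, R). */
public class IngestionPipeline {
    private final List<IngestionTask> tasks;

    public IngestionPipeline(List<IngestionTask> tasks) {
        this.tasks = tasks;
    }

    /** Runs the selected tasks in order, stopping at the first failure. */
    public void execute(OpenDataRecord record) throws Exception {
        for (IngestionTask task : tasks) {
            task.run(record);
        }
    }
}

A full chain would then be composed from five (again hypothetical) task classes implementing IngestionTask, one per task above, while an update run may select only a subset of them.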


Manual: Data Ingestion Manager, DIM, Km4City
The Data Ingestion Manager is a tool for data processing: multi-source acquisition (multi-format, multi-protocol, static and real-time data), enrichment, extension, conversion, augmentation, integration, equalization, rationalization, quality improvement, etc. See: data management processes in ETL, Java, Perl, etc.; management of the activities to be performed for each data set.

 

RDF Indexing Manager

For the RDF index generation, the RDF Index Manager produces a script according to the index descriptor and the target RDF store. The script is structured in the following steps: (i) setup of the script, (ii) initialization of the RDF store, (iii) bulk uploading of the triples into the store, (iv) RDF store finalization, (v) creation of eventual additional indexes, such as textual and geographical indexes, which require additional database commands, and (vi) update of the index building status. In most cases, rebuilding the RDF store by indexing is time consuming and may imply manually editing long scripts that are error prone. In order to solve this kind of problem, a lifecycle methodology and our RIM tool for RDF KB store versioning are proposed. The results have shown time savings of up to 95%, depending on the number of triples, files and cases to be indexed.
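As an illustration of steps (i)-(vi), the fragment below sketches how such a script could be assembled as an ordered list of commands, assuming for illustration a Virtuoso target and its standard bulk loader functions (ld_dir, rdf_loader_run, checkpoint); the class, method and file names are hypothetical and do not correspond to the actual RIM internals.

import java.util.ArrayList;
import java.util.List;

/** Illustrative assembly of the index-building script, steps (i)-(vi). */
public class IndexScriptBuilder {

    public List<String> build(String triplesDir, String graphUri) {
        List<String> script = new ArrayList<>();
        // (i) setup of the script
        script.add("#!/bin/bash");
        // (ii) initialization of the RDF store (start from an empty graph)
        script.add("isql 1111 dba dba exec=\"SPARQL CLEAR GRAPH <" + graphUri + ">;\"");
        // (iii) bulk uploading of triples into the store
        script.add("isql 1111 dba dba exec=\"ld_dir('" + triplesDir + "', '*.n3', '" + graphUri + "');\"");
        script.add("isql 1111 dba dba exec=\"rdf_loader_run();\"");
        // (iv) RDF store finalization (persist the loaded data)
        script.add("isql 1111 dba dba exec=\"checkpoint;\"");
        // (v) eventual additional indexes, e.g. a full-text index on literals
        script.add("isql 1111 dba dba exec=\"DB.DBA.RDF_OBJ_FT_RULE_ADD(null, null, 'All');\"");
        // (vi) update of the index building status (hypothetical status file)
        script.add("echo DONE > build_status.txt");
        return script;
    }
}

Generating the whole command sequence from the index descriptor, rather than typing it by hand, is what removes the error-prone manual editing mentioned above.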

RDF Index Manager: user manual
Graph databases are being adopted in many different applications: smart city, smart cloud, smart education, etc. In most cases, these applications imply the creation of ontologies and the integration of a large set of knowledge to build a knowledge base as an RDF KB store, with ontologies, static data, historical data and real-time data. Most RDF stores are endowed with inferential engines that materialize some knowledge as triples during indexing or querying. In these cases, the deletion of concepts may imply the removal and change of many triples, especially if the triples model the ontological part of the knowledge base or are referred to by many other concepts. For these solutions, a graph database versioning feature is not provided at the level of the RDF store tool, and it is quite complex and time consuming to address it with a black-box approach. In most cases, indexing is a time-consuming process, and rebuilding the KB may imply manually editing long scripts that are error prone. Therefore, in order to solve these kinds of problems, a lifecycle methodology and a tool supporting the versioning of indexes for RDF KB stores are proposed here. The proposed solution has been developed on the basis of a number of knowledge-oriented projects such as Sii-Mobility (smart city), RESOLUTE (smart city risk assessment) and ICARO (smart cloud). Results are reported in terms of time saving and reliability.
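One possible reading of the versioning idea, sketched under assumptions of our own (the directory layout, class and method names below are hypothetical and are not the RIM implementation), is to build each new index in its own versioned location and to switch the active version only after a successful build, keeping the previous version for rollback:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of index versioning: each rebuild lands in its own
 * directory and a "current" symbolic link marks the active version, so a
 * failed rebuild never touches the version in use and the previous one
 * remains available for rollback.
 */
public class IndexVersioner {
    private final Path root;  // e.g. /data/rdf-index (illustrative layout)

    public IndexVersioner(Path root) {
        this.root = root;
    }

    /** Creates a fresh directory where the next index version is built. */
    public Path newVersionDir(String version) throws IOException {
        return Files.createDirectories(root.resolve("v" + version));
    }

    /** Switches the "current" link to a successfully built version. */
    public void activate(String version) throws IOException {
        Path link = root.resolve("current");
        Files.deleteIfExists(link);
        Files.createSymbolicLink(link, root.resolve("v" + version));
    }
}

With such a layout, an inconsistent rebuild can simply be discarded, while reverting to the previous version amounts to re-activating the older directory.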


Supported Methodology



Figure 1.   RDF Index Building Monitor