Metadata Registry

The content of the metadata registry conforms to an application profile of DCAT that combines notable (mostly metadata) vocabularies such as VoID, Dublin Core Metadata Terms, LIME and FOAF.

The Model

The following discussion of the model assumes these prefix declarations:


@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix mdr: <http://semanticturkey.uniroma2.it/ns/mdr#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
      
The namespace of the metadata registry ontology was http://semanticturkey.uniroma2.it/ns/mdreg# prior to Semantic Turkey 8.0. An update routine will take care of applying the new namespace to existing catalogs when upgrading from a previous version.
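The update routine itself is internal to Semantic Turkey. As a minimal sketch of the renaming it performs, assuming that migrating a catalog amounts to rewriting every IRI in the old namespace into the new one, the following Java class (MdrNamespaceMigration, a hypothetical name) operates on individual IRI strings:

```java
// Hypothetical sketch: rewrite IRIs from the pre-8.0 metadata registry namespace
// to the current one. This is NOT the actual Semantic Turkey update routine,
// only an illustration of the renaming it applies to a catalog.
final class MdrNamespaceMigration {
    static final String OLD_NS = "http://semanticturkey.uniroma2.it/ns/mdreg#";
    static final String NEW_NS = "http://semanticturkey.uniroma2.it/ns/mdr#";

    private MdrNamespaceMigration() {
    }

    /** Returns the IRI with the old namespace replaced by the new one, or the IRI unchanged. */
    static String migrate(String iri) {
        if (iri.startsWith(OLD_NS)) {
            return NEW_NS + iri.substring(OLD_NS.length());
        }
        return iri;
    }
}
```

Since IRIs already in the new namespace do not start with the old one, applying the rewrite twice is harmless.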

The metadata registry should contain exactly one dcat:Catalog, as follows:


:78e4d620-8314-4a4e-81ad-63e6a24d1584 a dcat:Catalog .
      

The addition of a dataset to the catalog requires a dcat:CatalogRecord, which in turn is related via foaf:primaryTopic to a void:Dataset denoting the current, possibly changing dataset.


:78e4d620-8314-4a4e-81ad-63e6a24d1584 a dcat:Catalog;
  dcat:record :8316629b-94e8-467e-b7ca-cce87ea185a5 .

:8316629b-94e8-467e-b7ca-cce87ea185a5 a dcat:CatalogRecord;
  dcterms:issued "2019-02-11T16:04:47.369+01:00"^^xsd:dateTime;
  foaf:primaryTopic <http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> .

<http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> a void:Dataset;
    …
      

Many datasets, such as AGROVOC in the example above, change over time, and discrete snapshots of them are taken as different versions. Consequently, we should be able to distinguish between a dataset in general, irrespective of any specific version, and its specific (immutable) versions.

The dcat:CatalogRecord is unique for each dataset in general, and it groups the different versions of that dataset. The foaf:primaryTopic holds the current description of the dataset, which may change over time, as it is bound to the data currently published at the dataset namespace. This description can be consulted when it is not important to reference the dataset as it was at a specific point in time. Additionally, the record can be associated through foaf:topic with descriptions of specific, immutable versions of the dataset.
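To make the relation between the record, the current description and the immutable versions more concrete, here is a minimal in-memory sketch in Java. The class CatalogRecordSketch and the idea of identifying versions by a label are assumptions made for illustration; the registry itself stores this information as RDF (foaf:primaryTopic and foaf:topic).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a dcat:CatalogRecord: the primary topic is the
// current (possibly changing) dataset description, while the version map plays the
// role of foaf:topic, pointing to descriptions of specific, immutable versions.
final class CatalogRecordSketch {
    private final String primaryTopic;
    private final Map<String, String> versionTopics = new HashMap<>();

    CatalogRecordSketch(String primaryTopic) {
        this.primaryTopic = primaryTopic;
    }

    /** Associates the description of a specific version (hypothetical version label). */
    void addVersion(String versionLabel, String versionDataset) {
        versionTopics.put(versionLabel, versionDataset);
    }

    /**
     * Returns the description to consult: the current one when no version is
     * requested, otherwise the matching version (or null if the version is unknown).
     */
    String resolve(String versionLabel) {
        if (versionLabel == null) {
            return primaryTopic;
        }
        return versionTopics.get(versionLabel);
    }

    /** Read-only view of the known versions. */
    Map<String, String> versions() {
        return Collections.unmodifiableMap(versionTopics);
    }
}
```

Keeping the current description separate from the versions means that clients which do not care about a specific point in time never need to know which versions exist.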

The description of a dataset may contain various metadata, but it should contain at least the namespace (void:uriSpace), the knowledge model (dcterms:conformsTo) and other metadata describing the different ways to access the data. While the dereferenceability of HTTP URIs is a "reasonable assumption" according to the VoID specification, the metadata registry uses the property mdr:dereferenciationSystem to represent this fact explicitly (mdr:standardDereferenciation) or the absence of this access mechanism (mdr:noDereferenciation). Additionally, the metadata should indicate a SPARQL endpoint (void:sparqlEndpoint), if any.
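A client of the registry might interpret mdr:dereferenciationSystem as in the following Java sketch. The enum DereferenciationSystem is hypothetical, the negative value is assumed to be named noDereferenciation in the mdr namespace, and the fallback for a missing statement (attempt an HTTP lookup anyway, following VoID's reasonable assumption) is a design choice of the sketch rather than documented registry behavior.

```java
// Hypothetical client-side interpretation of mdr:dereferenciationSystem.
// STANDARD and NONE mirror mdr:standardDereferenciation and the assumed
// mdr:noDereferenciation; UNKNOWN stands for a missing statement.
enum DereferenciationSystem {
    STANDARD, NONE, UNKNOWN;

    static final String MDR_NS = "http://semanticturkey.uniroma2.it/ns/mdr#";

    /** Maps the object of mdr:dereferenciationSystem (null if absent) to a constant. */
    static DereferenciationSystem fromIri(String iri) {
        if ((MDR_NS + "standardDereferenciation").equals(iri)) {
            return STANDARD;
        }
        if ((MDR_NS + "noDereferenciation").equals(iri)) {
            return NONE;
        }
        return UNKNOWN;
    }

    /**
     * Whether a client should attempt to resolve the dataset IRIs over HTTP.
     * Without explicit metadata, fall back on VoID's "reasonable assumption".
     */
    boolean shouldTryHttpLookup() {
        return this != NONE;
    }
}
```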


<http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> a void:Dataset ;
  dcterms:conformsTo <http://www.w3.org/2004/02/skos/core>;
  void:uriSpace "http://aims.fao.org/aos/agrovoc/" ;
  mdr:dereferenciationSystem mdr:standardDereferenciation ;
  void:sparqlEndpoint <http://agrovoc.uniroma2.it/sparql> .

The namespace is used to find the (unique) dataset that defines URIs with a given namespace. This is useful when one has a URI and wants to identify the dataset it belongs to. The knowledge model is required to differentiate, for example, between an ontology and a thesaurus, thus allowing the selection of the appropriate strategy to browse the semantic content (e.g. class/property hierarchy vs concept hierarchy). The SPARQL endpoint of a dataset may provide a richer alternative to HTTP resolution as an access mechanism.
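The lookup of the dataset that owns a given URI can be sketched as a prefix match against the registered URI spaces. The following Java class (UriSpaceIndex, hypothetical) keeps the URI spaces in memory; preferring the longest matching URI space when several match is an assumption of the sketch, since the registry is expected to hold a unique dataset per namespace.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index from void:uriSpace to dataset IRI, answering the question
// "which dataset does this IRI belong to?".
final class UriSpaceIndex {
    private final Map<String, String> datasetByUriSpace = new HashMap<>();

    void register(String uriSpace, String datasetIri) {
        datasetByUriSpace.put(uriSpace, datasetIri);
    }

    /**
     * Returns the dataset whose URI space is a prefix of the given IRI, or null.
     * If several URI spaces match (nested namespaces), the longest one wins.
     */
    String findDataset(String iri) {
        String bestSpace = null;
        for (String space : datasetByUriSpace.keySet()) {
            if (iri.startsWith(space) && (bestSpace == null || space.length() > bestSpace.length())) {
                bestSpace = space;
            }
        }
        return bestSpace == null ? null : datasetByUriSpace.get(bestSpace);
    }
}
```

For instance, after registering "http://aims.fao.org/aos/agrovoc/" for the AGROVOC dataset, any concept IRI under that namespace is attributed to AGROVOC, while an unrelated IRI yields null.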

A SPARQL endpoint may be further described to provide metadata about it. At present, the model offers the property mdr:sparqlEndpointLimitation, whose value is a resource describing a limitation of the given endpoint. The only value defined so far is mdr:noAggregation, which is defined operationally as the inability of the endpoint to support aggregation to the extent required by the services of Semantic Turkey (e.g. the resource view).
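As an illustration of how a service might react to mdr:noAggregation, the following Java sketch builds a query that counts the values of each property of a resource: when aggregation is supported it uses COUNT and GROUP BY, otherwise it retrieves the raw property/value pairs so that the client can count them itself. The class AggregationAwareQueries and the query shapes are hypothetical and are not taken from the resource view implementation.

```java
import java.util.Set;

// Hypothetical sketch: pick an aggregate SPARQL query or a plain one (to be counted
// client-side), depending on whether the endpoint is described with
// mdr:sparqlEndpointLimitation mdr:noAggregation.
final class AggregationAwareQueries {
    static final String NO_AGGREGATION = "http://semanticturkey.uniroma2.it/ns/mdr#noAggregation";

    private AggregationAwareQueries() {
    }

    /** Builds a query counting the values of each property of the given resource. */
    static String propertyCountQuery(String resourceIri, Set<String> endpointLimitations) {
        if (endpointLimitations.contains(NO_AGGREGATION)) {
            // No aggregation: fetch raw pairs and let the client do the counting.
            return "SELECT ?p ?o WHERE { <" + resourceIri + "> ?p ?o }";
        }
        return "SELECT ?p (COUNT(?o) AS ?n) WHERE { <" + resourceIri + "> ?p ?o } GROUP BY ?p";
    }
}
```

The fallback trades a single compact result for a potentially large one, which is the usual price of moving aggregation to the client.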

A dataset may define subsets (void:subset), when parts of the dataset deserve specific metadata. As an example, a dataset may define various lexicalization sets as subsets.


<http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> a void:Dataset ;
  void:subset :3f73648f-d92b-4b9a-b763-56c871cd5456_it_lexicalization_set .

:3f73648f-d92b-4b9a-b763-56c871cd5456_it_lexicalization_set a lime:LexicalizationSet;
  dcterms:language <http://id.loc.gov/vocabulary/iso639-1/it>, <http://lexvo.org/id/iso639-3/ita>;
  lime:avgNumOfLexicalizations 0.7686822;
  lime:language "it"^^xsd:language ;
  lime:lexicalizationModel <http://www.w3.org/2008/05/skos-xl>;
  lime:lexicalizations 32443;
  lime:percentage 0.59477323;
  lime:referenceDataset <http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc>;
  lime:references 25103 .

The description of the lexicalization set reports the natural language and the lexicalization model (in the example, SKOS-XL) that it conforms to. This information is important to instruct a dataset matching system on how to consume the available lexical material. In addition to this general metadata, a number of metrics can be used to evaluate the coverage and expressiveness of the lexicalization set, and thus its relevance as a source of evidence in a matching scenario. We refer to the LIME specification and to the section on the formal definition of properties for further information on the metrics.
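The two ratio metrics in the example can be computed from counts, following their LIME definitions: lime:avgNumOfLexicalizations divides the number of lexicalizations by the number of entities in the reference dataset, and lime:percentage divides the number of lexicalized entities (lime:references) by the same quantity. The Java helper below (LexicalizationSetMetrics, hypothetical) also inverts a metric to recover the entity count it implies, which is handy for sanity-checking published figures.

```java
// Hypothetical helper for two LIME lexicalization-set metrics. Both are normalized
// by the number of entities in the reference dataset (e.g. its SKOS concepts).
final class LexicalizationSetMetrics {
    private LexicalizationSetMetrics() {
    }

    /** lime:avgNumOfLexicalizations = lexicalizations / entities in the reference dataset. */
    static double avgNumOfLexicalizations(long lexicalizations, long referenceEntities) {
        return (double) lexicalizations / referenceEntities;
    }

    /** lime:percentage = lexicalized entities (lime:references) / entities in the reference dataset. */
    static double percentage(long references, long referenceEntities) {
        return (double) references / referenceEntities;
    }

    /** Inverts a metric to estimate the number of reference entities it was computed against. */
    static long impliedReferenceEntities(long numerator, double metric) {
        return Math.round(numerator / metric);
    }
}
```

Applied to the Italian lexicalization set above, inverting avgNumOfLexicalizations (with 32443 lexicalizations) and percentage (with 25103 references) yields the same number of reference entities, as one would expect.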

Other potential subsets of a dataset are linksets (void:Linkset), which describe the links connecting the dataset to other datasets.


<http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> a void:Dataset;
  void:subset :agrovoc_dbpedia .

:agrovoc_dbpedia a void:Linkset ;
  void:linkPredicate skos:exactMatch ;
  void:objectsTarget <http://dbpedia.org/void/Dataset> ;
  void:subjectsTarget <http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc> ;
  void:triples 11057 .
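A matching system could use linkset descriptions to discover existing links between two datasets. The following Java sketch (LinksetIndex, hypothetical) keeps the relevant VoID properties of each linkset in memory and selects the linksets whose subjects and objects targets match a given pair of datasets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory view of void:Linkset descriptions, used to find the
// existing links from one dataset to another (e.g. from AGROVOC to DBpedia).
final class LinksetIndex {

    /** The VoID properties of a linkset that matter for this sketch. */
    static final class Linkset {
        final String iri;
        final String subjectsTarget; // void:subjectsTarget
        final String objectsTarget;  // void:objectsTarget
        final String linkPredicate;  // void:linkPredicate
        final long triples;          // void:triples

        Linkset(String iri, String subjectsTarget, String objectsTarget,
                String linkPredicate, long triples) {
            this.iri = iri;
            this.subjectsTarget = subjectsTarget;
            this.objectsTarget = objectsTarget;
            this.linkPredicate = linkPredicate;
            this.triples = triples;
        }
    }

    private final List<Linkset> linksets = new ArrayList<>();

    void add(Linkset linkset) {
        linksets.add(linkset);
    }

    /** Returns the linksets whose subjects come from source and whose objects come from target. */
    List<Linkset> between(String source, String target) {
        List<Linkset> result = new ArrayList<>();
        for (Linkset linkset : linksets) {
            if (linkset.subjectsTarget.equals(source) && linkset.objectsTarget.equals(target)) {
                result.add(linkset);
            }
        }
        return result;
    }
}
```

VoID linksets are directional: the example above lists AGROVOC as the subjects target, so a search in the opposite direction finds nothing unless a reverse linkset is described as well.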

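The description of a language resource such as Open Multilingual Wordnet has some peculiarities that are not found in other generic datasets. In particular, it contains subsets of three kinds: a concept set (ontolex:ConceptSet), lexicons (lime:Lexicon) and conceptualization sets (lime:ConceptualizationSet), as described below.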

The description of the concept set should provide the number of synsets (lexical concepts):

<http://art.uniroma2.it/pmki/omw/pwn30-conceptset> a ontolex:ConceptSet;
  lime:concepts 117659 . 
      
As with the lexicalization set, the description of a lexicon states the natural language in which the lexicon is expressed, along with the number of words (lexical entries):

<http://art.uniroma2.it/pmki/omw/Princeton_WordNet-en-lexicon> a lime:Lexicon;
  dcterms:language <http://id.loc.gov/vocabulary/iso639-1/en>, <http://lexvo.org/id/iso639-3/eng>;
  lime:language "en" ;
  lime:lexicalEntries 156584 .
      

Finally, the conceptualization set tells which lexicon and concept set it refers to, and provides useful metrics about the conceptualization itself:


:Princeton_WordNet-en-lexicon_pwn30-conceptset_conceptualization_set a lime:ConceptualizationSet;
  lime:avgAmbiguity 1.322;
  lime:avgSynonymy 1.76;
  lime:concepts 117659;
  lime:conceptualDataset <http://art.uniroma2.it/pmki/omw/pwn30-conceptset>;
  lime:conceptualizations 206978;
  lime:lexicalEntries 156584;
  lime:lexiconDataset <http://art.uniroma2.it/pmki/omw/Princeton_WordNet-en-lexicon> .
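The two averages in the conceptualization set can be recomputed from the counts it reports. Assuming the LIME readings of lime:avgAmbiguity (conceptualizations per lexical entry) and lime:avgSynonymy (conceptualizations per concept), the following Java sketch (ConceptualizationSetMetrics, a hypothetical helper) reproduces the figures above: 206978 / 156584 rounds to 1.322 and 206978 / 117659 rounds to 1.76.

```java
// Hypothetical helper for the LIME conceptualization-set metrics:
// avgAmbiguity is the average number of concepts evoked by a lexical entry,
// avgSynonymy the average number of lexical entries evoking a concept.
final class ConceptualizationSetMetrics {
    private ConceptualizationSetMetrics() {
    }

    /** lime:avgAmbiguity = lime:conceptualizations / lime:lexicalEntries. */
    static double avgAmbiguity(long conceptualizations, long lexicalEntries) {
        return (double) conceptualizations / lexicalEntries;
    }

    /** lime:avgSynonymy = lime:conceptualizations / lime:concepts. */
    static double avgSynonymy(long conceptualizations, long concepts) {
        return (double) conceptualizations / concepts;
    }
}
```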
      

Note that certain elements described here as subsets (e.g. lexicalization sets or conceptualization sets) can also be described as root datasets when they are distributed separately.

Example of Metadata

The Metadata Persistence

The metadata registry managing a dataset catalog inside Semantic Turkey aggregates metadata from two sources:

Application Programming Interface

The metadata registry can be used at three different levels: as a library in any Java application, as a managed dependency within Semantic Turkey, or as a web service consumed by external applications.

The metadata registry has been factored into a dedicated subsystem since Semantic Turkey 8.0. This subsystem is further decomposed into core, bindings and services modules.

Metadata Registry as a library

Since Semantic Turkey 8.0, the core module of the metadata registry can be used as a library in any Java application.

The first step is to add a dependency on the needed artifact (where ${st.version} is a variable containing the version of Semantic Turkey).


<dependency>
	<groupId>it.uniroma2.art.semanticturkey</groupId>
	<artifactId>st-metadata-registry-core</artifactId>
	<version>${st.version}</version>
</dependency>
      

The following Java program shows how to instantiate and initialize the metadata registry, add a dataset (record) to the catalog, and then shut down the registry.

	
import java.io.File;
import java.io.IOException;

import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
import org.eclipse.rdf4j.repository.RepositoryException;
import org.eclipse.rdf4j.rio.RDFParseException;

import it.uniroma2.art.maple.orchestration.MediationFramework;
import it.uniroma2.art.semanticturkey.mdr.core.MetadataRegistryBackend;
import it.uniroma2.art.semanticturkey.mdr.core.MetadataRegistryCreationException;
import it.uniroma2.art.semanticturkey.mdr.core.MetadataRegistryIntializationException;
import it.uniroma2.art.semanticturkey.mdr.core.MetadataRegistryWritingException;
import it.uniroma2.art.semanticturkey.mdr.core.impl.MetadataRegistryBackendImpl;

public class Main3 {
	public static void main(String[] args) throws RDFParseException, RepositoryException, IOException,
			MetadataRegistryCreationException, MetadataRegistryIntializationException,
			IllegalArgumentException, MetadataRegistryWritingException {
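		// Base directory: the catalog is persisted in metadataRegistry/catalog.ttl under it
		File baseDir = new File(".");
		// A MAPLE mediation framework is hard to obtain outside an OSGi container
		MediationFramework mediationFramework = null;
		MetadataRegistryBackend mdr = new MetadataRegistryBackendImpl(baseDir, mediationFramework);
		mdr.initialize();
		try {

			ValueFactory vf = SimpleValueFactory.getInstance();

			// Add AGROVOC: dataset IRI, URI space, title, dereferenceability and SPARQL endpoint
			mdr.addDataset(vf.createIRI("http://aims.fao.org/aos/agrovoc/void.ttl#Agrovoc"),
					"http://aims.fao.org/aos/agrovoc/", "AGROVOC", true,
					vf.createIRI("http://agrovoc.uniroma2.it/sparql"));
		} finally {
			// Shut down the registry even if adding the dataset failed
			mdr.destroy();
		}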
	}
}
      

The program above persists the content of the catalog in the file metadataRegistry/catalog.ttl within the current working directory. The variable mediationFramework should contain a reference to a MAPLE mediation framework. Unfortunately, it is difficult to create such an object outside an OSGi container: therefore, it is set to null in this example program. Without a link to a mediation framework, the metadata registry loses some capabilities related to information discovery and analysis.

The class MetadataRegistryBackendImpl contains several protected methods that can be overridden to customize this implementation of the registry.

Metadata Registry as a managed dependency within Semantic Turkey

Semantic Turkey manages a single instance of the metadata registry (i.e. a singleton), which is published in the OSGi service registry to make it generally available within the system. The general metadata registry described previously is bound to Semantic Turkey by i) placing the persistence file at a specific position within the Semantic Turkey data directory and ii) using the metadata stored in each project.

First of all, it is important to verify (in the context definition file) that the metadata registry has been imported into the Spring context associated with the bundle of interest.


<beans:beans xmlns="http://www.springframework.org/schema/security"
	xmlns:beans="http://www.springframework.org/schema/beans" xmlns:osgi="http://www.springframework.org/schema/osgi"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
	xsi:schemaLocation="http://www.springframework.org/schema/beans
	http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
	http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd
	http://www.springframework.org/schema/security
	http://www.springframework.org/schema/security/spring-security-3.1.xsd
	http://www.springframework.org/schema/osgi http://www.springframework.org/schema/osgi/spring-osgi.xsd">


	<!-- Retrieves the MetadataRegistry -->
	<osgi:reference id="metadataRegistryBackend"
		interface="it.uniroma2.art.semanticturkey.mdr.bindings.STMetadataRegistryBackend" />

</beans:beans>
      

At this point, the metadata registry can be injected through Spring's inversion-of-control mechanism, by declaring a dependency of type it.uniroma2.art.semanticturkey.mdr.bindings.STMetadataRegistryBackend:


/**
* This service class allows the management of the metadata about remote datasets.
*/
@STService
public class MetadataRegistry extends STServiceAdapter {

  private STMetadataRegistryBackend metadataRegistryBackend;

  @Autowired
  public void setMetadataRegistry(STMetadataRegistryBackend metadataRegistryBackend) {
    this.metadataRegistryBackend = metadataRegistryBackend;
  }

  ...

}
Prior to Semantic Turkey 8.0, the metadata registry was part of the core-framework, and it had to be injected using the type it.uniroma2.art.semanticturkey.resources.MetadataRegistryBackend.

Metadata Registry as a web service

External applications may interact with the metadata registry through the REST API defined by these service classes:

For instructions on how to interact with these services, please refer to the documentation on the Web API of Semantic Turkey.

Prior to Semantic Turkey 8.0, the metadata registry services were also part of the core-services bundle.

Related Software