URI Generator
The Purpose
In some scenarios, users may want newly created resources to have their URIs automatically assigned (instead of being manually specified). The URIs can be generated by the system according to different policies, which can be set for each different project.
Semantic Turkey (ST) supports this functionality via the URI Generator extension point (it.uniroma2.art.semanticturkey.plugin.extpts.URIGenerator). The extension point allows arbitrary URI generation mechanisms to be plugged into the system and used to return URIs for newly created resources.
ST comes already bundled with a few URI generators, which can be configured for multiple needs.
The Model
ST provides a notion of role in order to address and distinguish classes of resources, such as RDFS/OWL classes, properties, SKOS concepts, etc. Roles are used, for example, when assigning icons to resources and when establishing the layout of the resource view.
An xRole is an extensible interpretation of roles, allowing for further roles to be defined in specific contexts of the platform. For instance, reified notes in SKOS do not even have a related class, but they could be addressed as specific entities in some contexts.
In the context of automatic URI generation, resource URIs can be generated based on their xRole and on a map holding additional properties. Some xRoles are defined by Semantic Turkey itself, but they can also be defined by extensions (provided that the extended role name is sufficiently distinctive to avoid clashes). Whoever defines an extended role also establishes which arguments must be provided (i.e. mandatory parameters) or can be supplied (i.e. optional parameters).
Although a user should always conform to the contract associated with an xRole when providing values for its associated parameters, an actual URI generator should be designed to be robust with respect to missing arguments or unknown xRoles. The last requirement is particularly important, since it allows independent invention of xRoles, without negative consequences on already developed URI generators.
Here follows a list of xRoles, together with their known associated parameters:
- concept (for skos:Concepts)
  - label (optional): the accompanying preferred label of the skos:Concept
  - schemes (optional): the concept schemes to which the concept is being attached at the moment of its creation, encoded as a Turtle collection
- conceptScheme (for skos:ConceptSchemes)
  - label (optional): the accompanying preferred label of the skos:ConceptScheme
- xLabel (for skosxl:Labels)
  - lexicalForm: the lexical form of the skosxl:Label
  - lexicalizedResource: the resource to which the skosxl:Label will be attached
  - lexicalizationProperty: the property used for attaching the label
- skosCollection (for reified skos:Collections)
  - label (optional): the accompanying preferred label of the skos:Collection
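The robustness requirement discussed in this section (tolerating missing arguments and unknown xRoles) can be illustrated with a minimal sketch. It does not use the actual URIGenerator interface of Semantic Turkey: the class name, the method signature, the string-valued argument map and the "r_" fallback prefix are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: not the actual URIGenerator interface of Semantic Turkey. */
final class RobustGeneratorSketch {

    /** Builds a local name from an xRole and its (possibly incomplete) arguments. */
    static String localName(String xRole, Map<String, String> args) {
        // First 8 hex characters of a random UUID, used as a random suffix
        String id = UUID.randomUUID().toString().substring(0, 8);
        switch (xRole) {
            case "concept":
                // "label" is optional for this xRole: degrade gracefully when it is absent
                String label = args.get("label");
                return label == null
                        ? "c_" + id
                        : "c_" + label.trim().replaceAll("\\s+", "_") + "_" + id;
            case "xLabel":
                return "xl_" + id;
            default:
                // Unknown xRole (e.g. invented by an extension): use a generic fallback
                return "r_" + id;
        }
    }

    public static void main(String[] args) {
        System.out.println(localName("concept", Map.of("label", "hot dog")));
        System.out.println(localName("concept", Map.of()));
        System.out.println(localName("someFutureRole", Map.of("anything", "x")));
    }
}
```

The point of the default branch is that a generator written before a new xRole was invented still returns a usable name instead of failing.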
Choosing and Configuring a URI Generator for your project
Each project stores information about the URI Generator associated to it. The choice of the appropriate URI Generator and its configuration are normally performed at project creation; it is however possible to also change these settings afterwards.
During project creation
A specific URI Generator is associated to a project during its creation. In the panel "Optional settings", it is possible to choose a URI Generator in place of the default one (NativeTemplateBasedURIGeneratorFactory). It is also possible to configure the chosen URI Generator by choosing a configuration other than the default one (NativeTemplateBasedURIGeneratorConfiguration, in the case of the default generator). It may be necessary to manually fill in the chosen configuration type, in case it has mandatory parameters.
Low-level project metadata
It is not advisable to change the URI Generator after a project has been created: altering a populated project could result in identifiers conforming to different schemes, thus defeating the purpose of the URI Generator in the first place. For this reason, there is no UI for changing or reconfiguring the selected URI Generator after project creation. Provided that the user understands the implications, it is possible to change the URI Generator by altering the metadata of a closed project.
In the project file (project.info), there are two properties:
- plugins.mandatory.urigen.factoryID: the ID of the factory to use
- plugins.mandatory.urigen.configType: the type of configuration used
urigen.config contains the serialization of the configuration object to use for instantiating the extension point.
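As an illustration of the two properties above, the following sketch reads them from text in Java properties syntax. That project.info uses this syntax is an assumption of the sketch, as are the class name and the sample configType value (this page does not state whether the value is a simple or a fully qualified class name).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: assumes project.info follows Java properties syntax. */
final class ProjectInfoSketch {

    /** Parses key=value text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Illustrative content only; the configType value below is a placeholder
        String sample = String.join("\n",
                "plugins.mandatory.urigen.factoryID="
                        + "it.uniroma2.art.semanticturkey.plugin.impls.urigen.NativeTemplateBasedURIGeneratorFactory",
                "plugins.mandatory.urigen.configType=NativeTemplateBasedURIGeneratorConfiguration");
        Properties props = parse(sample);
        System.out.println(props.getProperty("plugins.mandatory.urigen.factoryID"));
        System.out.println(props.getProperty("plugins.mandatory.urigen.configType"));
    }
}
```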
URI Generator Implementations
Native Template-Based URI Generator
The Native Template-Based URI Generator (whose factory ID is it.uniroma2.art.semanticturkey.plugin.impls.urigen.NativeTemplateBasedURIGeneratorFactory) constructs the URI for a resource by instantiating a pattern chosen based on the xRole of the resource. The patterns can be constructed out of the parameters associated with the corresponding xRole, as well as some functions (e.g. to produce random values).
This URI generator has only one type of configuration (NativeTemplateBasedURIGeneratorConfiguration), where parameters named after xRoles hold the corresponding template. The configuration class has default templates for the following xRoles: concept, xLabel and xNote, as well as a fallback template for other cases.
It is also possible to add new parameters to this configuration class, which can be used to indicate the template for additional extended roles (e.g. introduced and defined by a Semantic Turkey extension).
A template is instantiated with the information associated with the resource being created to produce its local name, which is then appended to the default namespace of the dataset. The template can thus contain any character that can legally appear in a URI local name. In addition, it is possible to use placeholders for the value of a parameter associated with the extended role or for the invocation of a function, e.g. to generate a random string.
A placeholder can be written either as ${...} or $${...}, where the dots should be replaced with a parameter name or function, as discussed later. The two constructions differ in that the former implies the sanitization of the placeholder value (i.e. substituting sequences of spaces with underscores and percent-encoding other illegal characters). The latter does not perform that sanitization, and it can be used to insert already sanitized content (note that percent encoding is not an idempotent operation).
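The sanitization step, and the reason why it must not be applied twice, can be sketched as follows. This is not the code used by Semantic Turkey: the class name is hypothetical, and the set of characters left unencoded (ASCII letters, digits, "-", ".", "_" and "~") is an assumption, since this page does not spell out which characters count as illegal.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of placeholder sanitization; the safe character set is an assumption. */
final class SanitizeSketch {

    /** Replaces runs of spaces with "_" and percent-encodes every byte outside the assumed safe set. */
    static String sanitize(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.replaceAll(" +", "_").getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (safe) {
                out.append(c);
            } else {
                // Encode the raw UTF-8 byte as %XX
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("hot  dog"));        // hot_dog
        System.out.println(sanitize("50%"));             // 50%25
        System.out.println(sanitize(sanitize("50%")));   // 50%2525 (encoding twice changes the value)
    }
}
```

The last line shows why already sanitized content should go through $${...}: encoding it a second time would turn each "%" into "%25".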
To use a parameter inside a template, it is sufficient to use its name as a placeholder: for instance, c_${label} deterministically generates a URI for a concept based on its label. The placeholder is then substituted with a value depending on the type of its associated argument (e.g. a literal in the case of the label parameter for the concept xRole). The following lists the substitutions for each parameter type:
- IRI: the local name
- Literal: the label (unquoted, without language tag)
- BNode: the bnode identifier
- otherwise, the output of Object.toString()
A placeholder can also refer to a property of the parameter value, as in the following examples:
- c_${label.getLanguage}_${label} or c_${label.language}_${label} (the latter works because the reference to the property language is turned into getLanguage).
- ${lexicalizedResource.getLocalName} or ${lexicalizedResource.localName} (same as above).
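One way a property reference such as language could be turned into a call to getLanguage is plain Java reflection. The sketch below only illustrates that idea: whether the actual implementation uses reflection is not stated here, and the Label class and the resolve method are hypothetical.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch of resolving "x.language" as "x.getLanguage()". */
final class PropertyAccessSketch {

    /** Hypothetical stand-in for an RDF literal; not a class of Semantic Turkey. */
    public static final class Label {
        private final String text;
        private final String language;

        public Label(String text, String language) {
            this.text = text;
            this.language = language;
        }

        public String getLanguage() {
            return language;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Tries a no-argument method called name first, then the getter derived from name. */
    static Object resolve(Object target, String name) {
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (String candidate : new String[] { name, getter }) {
            try {
                Method m = target.getClass().getMethod(candidate);
                return m.invoke(target);
            } catch (NoSuchMethodException e) {
                // Not found under this name: try the next candidate
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("No such property: " + name);
    }

    public static void main(String[] args) {
        Label label = new Label("hot dog", "en");
        System.out.println(resolve(label, "getLanguage")); // en
        System.out.println(resolve(label, "language"));    // en
    }
}
```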
Note that, due to a limitation in the current state of the implementation, templates should invoke the function rand() to introduce some randomness into the generation process. This randomness is necessary because, after a name is created for a new resource, its uniqueness is checked against the current dataset and, if the name is already in use, a new one is generated. Without the randomness introduced by the rand() function, the current implementation risks entering an infinite loop when a deterministic template produces an already existing identifier (and it will keep producing it in subsequent iterations, precisely because of its deterministic nature).
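The generate-and-check cycle described above can be sketched as a small loop. The class and method names are hypothetical, and the attempt cap is an addition of this sketch to make the failure mode visible; it is not a feature attributed to the current implementation.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of the uniqueness check performed on generated names. */
final class UniqueNameSketch {

    /** Draws candidates until one is unused; gives up after maxAttempts (a guard added by this sketch). */
    static String firstUnused(Supplier<String> candidates, Set<String> existing, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String name = candidates.get();
            if (!existing.contains(name)) {
                return name;
            }
        }
        throw new IllegalStateException("No unused name after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("c_dog");
        try {
            // A deterministic template always yields the same, already used, name
            firstUnused(() -> "c_dog", existing, 100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the cap, the deterministic supplier in main would never leave the loop, which is the situation the rand() function is meant to prevent.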
The rand() function has a single optional parameter, to which one of the following values can be assigned:
- DATETIMEMS: uses the current time in milliseconds for generating the ID
- UUID: generates a random UUID
- TRUNCUUID4: generates a random UUID and then truncates it to its first 4 characters
- TRUNCUUID8: generates a random UUID and then truncates it to its first 8 characters (the first section of the UUID, before the hyphen)
- TRUNCUUID12: generates a random UUID and then truncates it to its first 12 characters (including the hyphen)
If this argument is not provided explicitly between the round brackets of rand(), it is looked up in the project property uriRndCodeGenerator.
If that property is not found in the project, then a default is assumed (TRUNCUUID8).
For example, c_${rand(TRUNCUUID4)} will generate resource local names such as c_47d3.
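The behaviour of the rand() modes listed above can be approximated with the standard java.util.UUID class. This is a sketch of the described behaviour, not the code of Semantic Turkey; the class name and the choice of passing the mode as a string are assumptions.

```java
import java.util.UUID;

/** Hypothetical sketch of the identifier modes accepted by rand(). */
final class RandSketch {

    /** Returns an identifier for the given mode name. */
    static String rand(String mode) {
        String uuid = UUID.randomUUID().toString(); // 36 characters, hyphens included
        switch (mode) {
            case "DATETIMEMS":
                return Long.toString(System.currentTimeMillis());
            case "UUID":
                return uuid;
            case "TRUNCUUID4":
                return uuid.substring(0, 4);
            case "TRUNCUUID8":
                return uuid.substring(0, 8);  // first section, before the hyphen
            case "TRUNCUUID12":
                return uuid.substring(0, 12); // includes the hyphen at index 8
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println("c_" + rand("TRUNCUUID4"));
        System.out.println("c_" + rand("TRUNCUUID12"));
    }
}
```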
The valuemapping in the #generateURI method can be used to define new placeholders, by associating them with values computed outside of the regexp. For instance, in the case of SKOS-XL labels, one might want to add a lang placeholder filled with the language of the literal form of the xLabel.
For example, xl_${lang}_${rand()} will generate SKOS-XL labels such as xl_en_4f56ed21.
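The idea of feeding extra placeholders through a map of values can be sketched with a small regular-expression based expander. It is a simplified illustration with hypothetical names: it handles only the ${...} form, performs no sanitization, and treats rand() as just another key whose value has been computed beforehand.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of template expansion driven by a map of placeholder values. */
final class TemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its mapped value; unknown names are left untouched. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps "$" and "\" in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("lang", "en", "rand()", "4f56ed21");
        System.out.println(expand("xl_${lang}_${rand()}", values)); // xl_en_4f56ed21
    }
}
```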
CODA URI Generator
The CODA URI Generator (whose factory ID is it.uniroma2.art.semanticturkey.plugin.impls.urigen.CODAURIGenerator) wraps any CODA converter conforming to the contract http://art.uniroma2.it/coda/contracts/randIdGen. In a certain sense, it is complementary to the class STSpecificRandomIDGenerator, which wraps a URI Generator as a converter. In fact, the former should never mention the latter, otherwise the application would crash because of an infinite loop.
This URI generator has two configuration classes: "CODA-based templated URI generator" (CODATemplateBasedURIGeneratorConfiguration) and "CODA-based any converter URI generator" (CODAAnyURIGeneratorConfiguration). The former delegates to a CODA converter having the same semantics as the native URI Generator, while the latter is parameterized by the URI of the converter to use (in this case, the actual arguments are passed to the converter as global parameters in the CODAContext).
URI Generators and CODA converters
Semantic Turkey URI Generators are tightly bound to CODA Converters, as they play a similar role in these two interconnected platforms. The picture below illustrates this relationship from a structural viewpoint:
Both NativeTemplateBasedURIGenerator and CODAURIGenerator use CODA converters, to different extents: the former is bound to a specific converter (i.e., templateBasedRandomIdGenerator), while the latter enables the use of any converter that implements the specific contract coda:randIdGen (i.e., the analogue of ST URI Generators in CODA). Actually, the former tries to avoid most of the cost associated with the instantiation of the CODA framework, by using the converter as if it were a plain Java class with stubbed implementations of the required dependencies from CODA.
The NativeTemplateBasedURIGenerator is implemented in such a way that the configuration parameters of the URI generator (e.g. the templates) are forwarded to the underlying converter.
Conversely, the CODA runtime embedded in Semantic Turkey is configured so that the contract coda:randIdGen is always resolved with the converter implemented by the class STSpecificRandomIDGenerator, which in turn delegates to the URI Generator of the current project. To clarify, when the CODA runtime needs to execute a http://art.uniroma2.it/coda/contracts/randIdGen converter, it always uses a converter (provided by Semantic Turkey itself) that uses the URI Generator of the current project (which, in turn, is implemented using a coda:randIdGen).
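The two directions of delegation can be pictured as a pair of adapters. The sketch below is purely structural: the interfaces and their method signatures are hypothetical and do not reproduce the actual Semantic Turkey or CODA APIs.

```java
import java.util.Map;

/** Hypothetical sketch of the mutual wrapping between URI generators and CODA converters. */
final class DelegationSketch {

    /** Hypothetical stand-in for a Semantic Turkey URI generator. */
    interface UriGenerator {
        String generate(String xRole, Map<String, String> args);
    }

    /** Hypothetical stand-in for a CODA converter honoring coda:randIdGen. */
    interface RandIdConverter {
        String produce(String xRole, Map<String, String> args);
    }

    /** Plays the role of CODAURIGenerator: a URI generator backed by a converter. */
    static UriGenerator asGenerator(RandIdConverter converter) {
        return converter::produce;
    }

    /** Plays the role of STSpecificRandomIDGenerator: a converter backed by a URI generator. */
    static RandIdConverter asConverter(UriGenerator generator) {
        return generator::generate;
    }

    public static void main(String[] args) {
        // A concrete converter at the bottom of the chain terminates the delegation
        RandIdConverter concrete = (xRole, a) -> xRole + "_fixed";
        RandIdConverter wrapped = asConverter(asGenerator(concrete));
        System.out.println(wrapped.produce("concept", Map.of())); // concept_fixed
    }
}
```

If each wrapper were configured to delegate to the other, with no concrete implementation at the bottom of the chain, every call would bounce between the two forever. This is the loop that arises when the CODA URI Generator is pointed at STSpecificRandomIDGenerator.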