Standards and guide to reading this wiki
Must (Not), Should (Not), May (Not)
The words must, must not, should, should not, may and may not are used as defined in RFC 2119.
WIP
This wiki is a work in progress. It will later be distilled into a specification document for the protocols governing metadata exchange between participating institutions and downstream systems.
`niosX` is an adaptable backend capable of serving any application that manages highly connected data modelled using the W3C Web Annotation Data Model. We overlay a semantic layer of `NiosxConcept`s on the annotation data model to capture the structure of an application in any given context.
The annotations capture the different ways in which users can interact with the base object, and the `NiosxConcept`s impose a context-specific structure on top. Work on this project started as a discovery and interpretation tool for Milli, a consortium of archives, so let's use Milli as an example to understand the capabilities of `niosX`.
The base archival object is modelled as a `MilliEntity`, which extends `NiosxEntity` with the fields mandated by the ISAD(G) guidelines. This base object is enriched by annotations added by users of the platform. `niosX` is conceptualised as a data graph whose nodes are `MilliEntity` instances (archival objects) and annotations. The edges of this graph are `NiosxConcept`s, which add a semantic layer to the entities. In this framework, a comment is a simple annotation with a `TextualBody`, connected to the `MilliEntity` by a `Comment` edge (a type of `NiosxConcept`). Similarly, `ScopeContent`, `Tagging` and `Copyright` are all types of `NiosxConcept` that form the edges connecting the object to its annotations. Users can even create a `UserDefinedConcept` to extend the available semantic vocabulary; a `GenericConcept` is a plain-old annotation for cases where a new concept is not needed and the available concepts are too specific.
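To make the graph picture concrete, here is a minimal Python sketch of entities and annotations as nodes joined by concept-typed edges. The class layout, the `DataGraph` helper, the `Edge` fields and the two ISAD(G)-style fields on `MilliEntity` are hypothetical illustrations, not the actual `niosX` contracts.

```python
from dataclasses import dataclass, field
from enum import Enum


class NiosxConcept(Enum):
    # Hypothetical subset of the concept vocabulary named in this wiki.
    COMMENT = "Comment"
    SCOPE_CONTENT = "ScopeContent"
    TAGGING = "Tagging"
    COPYRIGHT = "Copyright"
    GENERIC = "GenericConcept"


@dataclass
class NiosxEntity:
    id: str


@dataclass
class MilliEntity(NiosxEntity):
    # Two ISAD(G)-style fields, chosen only for illustration.
    title: str = ""
    reference_code: str = ""


@dataclass
class Annotation:
    # Loosely follows the W3C model: an annotation carries a body.
    id: str
    body: dict


@dataclass
class Edge:
    entity_id: str         # the annotated object
    annotation_id: str     # the annotation node
    concept: NiosxConcept  # semantic label carried by the edge


@dataclass
class DataGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node) -> None:
        self.nodes[node.id] = node

    def annotate(self, entity_id: str, annotation: Annotation,
                 concept: NiosxConcept) -> None:
        # The annotation becomes a node, linked to the entity by a typed edge.
        self.add_node(annotation)
        self.edges.append(Edge(entity_id, annotation.id, concept))

    def annotations_of(self, entity_id: str, concept: NiosxConcept) -> list:
        # All annotation nodes reached from the entity via edges of this concept.
        return [self.nodes[e.annotation_id] for e in self.edges
                if e.entity_id == entity_id and e.concept is concept]


# Usage: a comment is a TextualBody annotation attached via a Comment edge.
graph = DataGraph()
graph.add_node(MilliEntity(id="ent-1", title="Letters, 1901-1910"))
graph.annotate(
    "ent-1",
    Annotation(id="ann-1", body={"type": "TextualBody", "value": "Water-damaged."}),
    NiosxConcept.COMMENT,
)
```

Keeping the concept on the edge rather than on the annotation means the same annotation shape can serve as a comment, a tag or a scope note, with only the edge label changing.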
APIs for other domains can easily be constructed by providing contracts for a base object (a subtype of `NiosxEntity`) and a semantic layer of `NiosxContext`(s).
The milli-discovery API provider sits at the edge of the Milli ecosystem. It is responsible for extracting metadata from the participating institutions and exposing it through a GraphQL API to the downstream components that consume it. Users of the application can annotate these objects in different ways.
Figure 1 above shows some of the building blocks of `niosX`. We begin with a metadata object (EAD3, RSS, etc.) and convert it into a `NiosxEntity` instance. The ingestion process may also extract other information from the metadata object, create annotations from it and connect them to the object using the appropriate `NiosxConcept`(s). After ingestion, users interact with the application and add more annotations.
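The ingestion step described above can be sketched as follows, assuming a heavily simplified, namespace-free fragment shaped loosely like EAD3 (real EAD3 documents declare a namespace and contain many more elements). The `ingest` function and the dictionary shapes it returns are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment loosely shaped like EAD3.
SAMPLE = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>GB-0001-ABC</unitid>
      <unittitle>Papers of a shipping company</unittitle>
    </did>
    <scopecontent><p>Ledgers and correspondence, 1890-1920.</p></scopecontent>
  </archdesc>
</ead>"""


def ingest(xml_text: str) -> tuple[dict, list[dict]]:
    """Convert a metadata object into an entity plus derived annotations."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    # Core descriptive fields become the base entity.
    entity = {
        "id": archdesc.findtext("did/unitid"),
        "title": archdesc.findtext("did/unittitle"),
    }
    annotations = []
    # Extra information found during ingestion becomes an annotation,
    # linked to the entity through a ScopeContent concept.
    scope = archdesc.findtext("scopecontent/p")
    if scope:
        annotations.append({
            "concept": "ScopeContent",
            "target": entity["id"],
            "body": {"type": "TextualBody", "value": scope.strip()},
        })
    return entity, annotations


entity, annotations = ingest(SAMPLE)
```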
A list of participating institutions needs to be maintained. Users with appropriate authorization must be able to add institutions to this list and remove them from it.
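One way to enforce this requirement is to check the caller's roles on every change to the list. This is a minimal in-memory sketch; `InstitutionRegistry`, `PermissionDenied` and the `admin` role name are assumptions, and a real deployment would persist the list and rely on the platform's own authentication.

```python
class PermissionDenied(Exception):
    """Raised when a user without an authorized role edits the list."""


class InstitutionRegistry:
    def __init__(self, authorized_roles=frozenset({"admin"})):
        self._institutions: dict[str, str] = {}  # code -> display name
        self._authorized_roles = frozenset(authorized_roles)

    def _require_authorization(self, user_roles) -> None:
        # At least one of the user's roles must be an authorized role.
        if not self._authorized_roles & set(user_roles):
            raise PermissionDenied("user has no role allowed to edit institutions")

    def add(self, user_roles, code: str, name: str) -> None:
        self._require_authorization(user_roles)
        self._institutions[code] = name

    def remove(self, user_roles, code: str) -> None:
        self._require_authorization(user_roles)
        self._institutions.pop(code, None)

    def codes(self) -> list:
        # Reading the list needs no special role in this sketch.
        return sorted(self._institutions)
```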
Harvesting metadata from the participating institutions is handled by milli-discovery. The plausible strategies for harvesting are:
The design should incorporate a mechanism for the community to contribute plugin-like components that provide crosswalks from other standards to the milli-discovery specification. The main challenge for such an implementation is data-type conversion, which could be handled by limiting the data types to a small fixed set and defining type-conversion macros for them. With these constraints in place, a community-contributed crosswalk would be a simple mapping from source keys to target keys, provided in a text-based data-exchange format such as JSON.
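A crosswalk under these constraints could look like the following: a fixed table of conversion functions standing in for the type-conversion macros, and a JSON document mapping source keys to target keys and types. The Dublin Core-style source keys, the target keys, the `;` list separator and the four type names are hypothetical.

```python
import json
from datetime import date

# A deliberately small, fixed set of target types, each with one
# conversion "macro" that turns a source string into that type.
CONVERTERS = {
    "string": lambda v: v.strip(),
    "integer": lambda v: int(v),
    "date": lambda v: date.fromisoformat(v),
    "list": lambda v: [p.strip() for p in v.split(";") if p.strip()],
}

# A community-contributed crosswalk: plain JSON, source key -> target key + type.
CROSSWALK_JSON = """{
  "dc:title":   {"target": "title",       "type": "string"},
  "dc:date":    {"target": "date",        "type": "date"},
  "dc:subject": {"target": "subjects",    "type": "list"},
  "extent":     {"target": "extentItems", "type": "integer"}
}"""


def apply_crosswalk(record: dict, crosswalk: dict) -> dict:
    """Rename and convert the mapped keys; unmapped source keys are dropped."""
    out = {}
    for source_key, rule in crosswalk.items():
        if source_key in record:
            convert = CONVERTERS[rule["type"]]
            out[rule["target"]] = convert(record[source_key])
    return out
```

Because the contributor only supplies the JSON, a crosswalk for a new standard needs no code, and an unknown `type` fails fast with a `KeyError` instead of silently passing bad data through.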
A data validation utility should give users the information they need to correct any validation errors. A utility that fixes superficial validation errors, such as mismatched source-to-target key mappings, may be provided.
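As an illustration, the sketch below reports each error with a hint and offers a superficial fix that renames keys differing from a schema key only in case or separators. The three-key `SCHEMA` and the normalisation rule are assumptions made for the example.

```python
import re

# Hypothetical target schema: required key -> expected Python type.
SCHEMA = {"identifier": str, "title": str, "level": str}


def _norm(key: str) -> str:
    # "Unit_Title", "unit-title" and "unitTitle" all normalise to "unittitle".
    return re.sub(r"[^a-z0-9]", "", key.lower())


def validate(record: dict) -> list[str]:
    """Return human-readable messages; an empty list means the record is valid."""
    errors = []
    for key, expected in SCHEMA.items():
        if key not in record:
            near = [k for k in record if _norm(k) == _norm(key)]
            msg = f"missing required key '{key}'"
            if near:
                msg += f" (found '{near[0]}'; did you mean '{key}'?)"
            errors.append(msg)
        elif not isinstance(record[key], expected):
            errors.append(f"'{key}' should be {expected.__name__}, "
                          f"got {type(record[key]).__name__}")
    return errors


def fix_keys(record: dict) -> dict:
    """Rename keys that differ from a schema key only in case or separators."""
    lookup = {_norm(k): k for k in SCHEMA}
    return {lookup.get(_norm(k), k): v for k, v in record.items()}
```

The fixer only touches key names; type errors are left for the user, since guessing a conversion could corrupt data.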
Participating institutions are expected to provide metadata to milli-discovery in the format described in this section. At a later stage, tools may be provided to enable crosswalks between this specification and other standards such as Dublin Core, DPLA MAP, etc.
The specification must be structured as an aggregation of classes (object types). This makes the specification future-proof, because adding or removing fields is less likely to break it.
The specification is defined as an abstract data model. However, it borrows from the design principles of the GraphQL specification (itself an abstract query language) and can be implemented with any data-exchange format, such as JSON.
Harvesting metadata, which is essentially a POST to the API endpoint, is modelled as a mutation. However, metadata will not always arrive as well-formed JSON conforming to the GraphQL schema. For such cases, a utility must be provided that ingests CSV (with the header providing the key-mapping information, for instance), XML or other formats and calls the mutation endpoint internally.
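A CSV utility of this kind could turn each row into an input object and wrap the rows in a standard GraphQL request body (`query` plus `variables`). This sketch assumes the header row simply names the target keys; the `Harvest` mutation, the `EntityInput` type and whatever endpoint is passed to `post_payload` are hypothetical.

```python
import csv
import io
import json
import urllib.request

# Hypothetical mutation; the real schema's names may differ.
HARVEST_MUTATION = """
mutation Harvest($entities: [EntityInput!]!) {
  harvest(entities: $entities) { id }
}
"""


def csv_to_payload(csv_text: str) -> dict:
    # The header row names the target keys, so each CSV row
    # becomes one input object without further mapping.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {"query": HARVEST_MUTATION, "variables": {"entities": rows}}


def post_payload(endpoint: str, payload: dict) -> dict:
    # Sends the mutation as a JSON POST, the usual GraphQL-over-HTTP shape.
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating payload construction from the HTTP call keeps the conversion testable offline; XML or other formats would only need their own `*_to_payload` function.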
Metadata will be available for consumption at the query endpoint of the schema.
This guide provides the following information:
Three types of data transformation can cover the vast majority of source-to-target transformations needed while ingesting data:
`FullName` mapped to `name: {firstName, lastName}`
Most transformations needed to reconcile source data with the format expected for ingestion can be handled by one of these or by a combination of them.
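The `FullName` example, where one source field becomes one nested target field with two parts, could be implemented as below. Splitting on whitespace and treating the last token as the last name is a naive assumption that will not suit every naming convention; `split_full_name` is a hypothetical helper.

```python
def split_full_name(record: dict, source_key: str = "FullName") -> dict:
    """Map FullName to name: {firstName, lastName}, keeping other keys."""
    out = {k: v for k, v in record.items() if k != source_key}
    parts = record.get(source_key, "").split()
    # Naive assumption: the last token is the last name and everything
    # before it is the first name; a single token becomes the last name.
    out["name"] = {
        "firstName": " ".join(parts[:-1]),
        "lastName": parts[-1] if parts else "",
    }
    return out


print(split_full_name({"id": 1, "FullName": "Mary Ann Evans"}))
```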