Datacube Description

The RDF Data Cube vocabulary (qb:) describes the dimensional aspects of a dataset.

The presumption that the dataset is available in RDF can be relaxed: a dataset may not currently be available in RDF or via SPARQL endpoints, yet still be accessible through web services or a REST API that is parameterised by the described dimensions.

In qb: a dataset may have multiple dimensions, and these dimensions may be bound to skos:Concepts, which in turn may have narrower terms, providing a hierarchical dimension. 

Describing a parameterised URL 

In SIRF, we can bind these vocabulary terms to URL templates. This requires a convention, because the values substituted into a URL template are generally short codes rather than the URIs of the terms.

For example, consider the URL 

http://nrtstest.ead.ae/ka/rest/services/NRT_AD_AirQuality/FeatureServer/1

This accesses data about an indicator named "1", in this case CO, within a family of indicators called "NRT_AD_AirQuality". So a client processing the URL template will need to be able to locate both these tokens, either by browsing the definitions or by looking up details when presented with a non-ambiguous URI for the concept.

The convention we will use is to describe variables in a URI template as RDF property chains using well-known prefixes (e.g. dim: and skos:). The object these property chains relate to is a skos:Concept matching a declared qb:dimension of the dataset. This skos:Concept may be known in advance, or discovered by browsing the skos:ConceptScheme declared by the qb:codeList property of the indicated dimension.

SKOS allows Concepts to have multiple notations. The tricky thing is that the datatype of a notation determines its code space, so if multiple notations may be present it will be necessary to specify the datatype.

e.g.

<http://demo.sirf.net/def/vocab/mdgindicators/7.2>
        a                       skos:Concept , rdfs:Resource , <http://demo.sirf.net/def/vocab/mdgindicators/Indicator> ;
        rdfs:label              "7.2 CO2 emissions, total, per capita and per $1 GDP (PPP)"@en ;
 ...
        skos:notation           "749"^^<http://demo.sirf.net/def/vocab/mdgindicators/unstats_seriescode> , 
                                "CO2"^^<http://demo.sirf.net/def/vocab/mdgindicators/shortlabel> ,
                                 "7.2"^^<http://demo.sirf.net/def/vocab/mdgindicators/Indicator> ;

The rdfs:label of the dimension is used to indicate the type of object required, and then a property chain is provided that the client will need to resolve. Property chains will need to match sub-properties (specialised types) as well. It is still to be determined whether the SIRF DNA can list all supertypes of a property, so that a client does not need to interpret this.

Thus

{dim:indicator skos:broader skos:notation^^codetype}

will match a dimension of the dataset with rdfs:label "indicator". Note that the property lu:shortid is a subproperty of skos:notation, and will match these templates.

"In statistical data sets it is common for values to be encoded using some (possibly hierarchical) code list and it can be useful to be able to easily identify the overall code list in some more structured form. To cater for this a component can also be optionally annotated with a qb:codeList to indicate a set of skos:Concepts which may be used as codes. The qb:codeList value may be a skos:ConceptScheme, skos:Collection or qb:HierarchicalCodeList. In such a case the rdfs:range of the component might be left as simply skos:Concept but a useful design pattern is to also define an rdfs:Class whose members are all the skos:Concepts within a particular scheme. In that way the rdfs:range can be made more specific which enables generic RDF tools to perform appropriate range checking."

Thus we might write a URL template as 

http://nrtstest.ead.ae/ka/rest/services/{dim:indicator skos:broader skos:notation}/FeatureServer/{dim:indicator skos:notation}/...
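
As a rough illustration of how a client might resolve such a template, the following Python sketch follows each property chain through a small in-memory dictionary. The dictionary, the function names and the use of prefixed names as keys are assumptions made for this example; a real client would fetch the SKOS properties by dereferencing URIs or querying an endpoint.

```python
import re

# Hypothetical in-memory stand-in for the SKOS graph (illustration only).
CONCEPTS = {
    "uleadi:AQParameter_CO": {
        "skos:notation": "1",
        "skos:broader": "uleadi:NRT_AD_AirQuality",
    },
    "uleadi:NRT_AD_AirQuality": {
        "skos:notation": "NRT_AD_AirQuality",
    },
}


def resolve_chain(start, chain):
    """Follow each property in turn, starting from the given concept."""
    node = start
    for prop in chain:
        node = CONCEPTS[node][prop]
    return node


def expand(template, bindings):
    """Replace each {dim:name prop prop ...} variable with its resolved value."""
    def substitute(match):
        dimension, *chain = match.group(1).split()
        return resolve_chain(bindings[dimension], chain)
    return re.sub(r"\{([^{}]+)\}", substitute, template)


TEMPLATE = (
    "http://nrtstest.ead.ae/ka/rest/services/"
    "{dim:indicator skos:broader skos:notation}"
    "/FeatureServer/{dim:indicator skos:notation}"
)

# Bind the "indicator" dimension to a known low-level concept.
url = expand(TEMPLATE, {"dim:indicator": "uleadi:AQParameter_CO"})
print(url)
```

With the CO concept bound to the indicator dimension, the sketch reproduces the example service URL given earlier.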

If a concept may have skos:notations in multiple code spaces, the required notation can be specified by its datatype, using a CURIE (prefix:id) or the full URI form. Prefixes are interpreted using the namespaces of the defined dimensions, so they will be available to a client accessing the datanetwork. In general it will be sufficient to match the last token of the datatype against the skos:notations provided.
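
The "last token" matching rule could look something like the Python sketch below. The notation values and datatype URIs are taken from the mdgindicators/7.2 example; the function names and the "mdg:" prefix are hypothetical and only serve the illustration.

```python
# Notations from the mdgindicators/7.2 example, as (value, datatype URI) pairs.
NOTATIONS = [
    ("749", "http://demo.sirf.net/def/vocab/mdgindicators/unstats_seriescode"),
    ("CO2", "http://demo.sirf.net/def/vocab/mdgindicators/shortlabel"),
    ("7.2", "http://demo.sirf.net/def/vocab/mdgindicators/Indicator"),
]


def last_token(name):
    """Return the final segment of a URI or CURIE, split on '/', '#' or ':'."""
    for separator in "/#:":
        name = name.rsplit(separator, 1)[-1]
    return name


def pick_notation(notations, codetype):
    """Return the single notation whose datatype ends in the requested token."""
    wanted = last_token(codetype)
    matches = [value for value, datatype in notations
               if last_token(datatype) == wanted]
    if len(matches) != 1:
        raise LookupError(
            f"expected one notation of type {codetype!r}, found {len(matches)}"
        )
    return matches[0]


# "mdg:" is a hypothetical prefix; only the token after the colon is compared.
print(pick_notation(NOTATIONS, "mdg:shortlabel"))
```

Raising an error when zero or several notations match is a design choice of this sketch: a silent guess would put the wrong code into the service URL.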

Describing a Hierarchical Dimension

In the case of a structured hierarchy of terms, a few challenges arise. 

A key issue is the need to be able to browse up and down the hierarchy and detect when you have the right type of term (the right level of detail) to be used in the access method for the datacube.

This is achieved by assigning a data type to each term, and allocating this type to the rdfs:range property of the dimension. 

In this example, low-level terms are of type uleadi:Indicator:


# bind the reporting area to the SDMX conceptual model
ul:eadIndicator  a rdf:Property, qb:DimensionProperty;
rdfs:label "indicator"@en;
#    rdfs:subPropertyOf sdmx-dimension:refArea;
rdfs:range uleadi:Indicator;
qb:codeList ul:eadindicators;
qb:concept ul:Indicator .


ul:eadindicators a skos:ConceptScheme ;
rdfs:comment "Environmental Indicators reported by Environment Abu Dhabi via test services"@en ;
rdfs:label "EAD NRT Indicators"@en;
.

uleadi:AQParameter_CO a skos:Concept, uleadi:Indicator ;
skos:inScheme  ul:eadindicators ;
skos:prefLabel "AQParameter_CO" ;
skos:notation "1" ;
skos:broader uleadi:NRT_AD_AirQuality ;
.

uleadi:IndicatorGroup a rdfs:Class ;
rdfs:label "A thematic grouping of indicators" ;
.

uleadi:Indicator a rdfs:Class ;
rdfs:label "A specific indicator" ;
.


uleadi:NRT_AD_AirQuality a skos:Concept,  uleadi:IndicatorGroup ;
skos:inScheme  ul:eadindicators ;
skos:prefLabel "Air Quality"@en ;
skos:notation "NRT_AD_AirQuality" ;
skos:narrower uleadi:AQParameter_CO ;
.

Note that it is possible to browse the skos:ConceptScheme from the top until an object matching the rdfs:range is obtained, in this case a uleadi:Indicator. This allows us to be specific about which terms in a hierarchical vocabulary may be used. If any term may be used, a range of skos:Concept is sufficient.
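
A top-down browse of this kind might be sketched in Python as follows. The dictionaries are a hand-copied, in-memory version of the example scheme; treating uleadi:NRT_AD_AirQuality as a top concept is an assumption, since the example does not declare top concepts explicitly.

```python
# Hypothetical in-memory copy of the eadindicators example (illustration only).
TYPES = {
    "uleadi:NRT_AD_AirQuality": {"skos:Concept", "uleadi:IndicatorGroup"},
    "uleadi:AQParameter_CO": {"skos:Concept", "uleadi:Indicator"},
}
NARROWER = {
    "uleadi:NRT_AD_AirQuality": ["uleadi:AQParameter_CO"],
}
# Assumed top concept; the example scheme does not state one.
TOP_CONCEPTS = ["uleadi:NRT_AD_AirQuality"]


def terms_in_range(top_concepts, range_type):
    """Walk down skos:narrower, collecting the first terms typed with range_type."""
    found, stack, seen = [], list(top_concepts), set()
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        if range_type in TYPES.get(term, set()):
            found.append(term)  # right level of detail: stop descending here
        else:
            stack.extend(NARROWER.get(term, []))
    return found


print(terms_in_range(TOP_CONCEPTS, "uleadi:Indicator"))
```

When the range is simply skos:Concept, the walk stops at the top concepts themselves, which mirrors the case where any term may be used.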

Interacting with a Hierarchical Dimension

The SIRF DNA will return a graph containing all the qb: definitions, but not the SKOS terms. It will be necessary to access the SKOS content via either URI dereferencing or using a known SPARQL or SISSVoc endpoint and formulating an appropriate query.

We have three possible modes of interacting with the terms:

1) we have a low level term of the right type already, and we need to "walk" the hierarchy upwards to get details of broader terms, to fill in the parameters specified in the URL template.

2) we want to browse from the top down, until we find a term of the right type to fit into the template

3) we have a term of the wrong type, such as from a search of the taxonomy, and  we need to walk up or down until we find a term of the right type.
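
The third mode, starting from a term of the wrong type, can be illustrated with a breadth-first search over skos:broader and skos:narrower links. This Python sketch again uses a hypothetical in-memory copy of the example terms; a real client would obtain the links from the vocabulary service.

```python
from collections import deque

# Hypothetical in-memory copy of the example hierarchy (illustration only).
TYPES = {
    "uleadi:NRT_AD_AirQuality": {"skos:Concept", "uleadi:IndicatorGroup"},
    "uleadi:AQParameter_CO": {"skos:Concept", "uleadi:Indicator"},
}
BROADER = {"uleadi:AQParameter_CO": ["uleadi:NRT_AD_AirQuality"]}
NARROWER = {"uleadi:NRT_AD_AirQuality": ["uleadi:AQParameter_CO"]}


def nearest_of_type(start, wanted_type):
    """Search outwards along broader/narrower links for a term of wanted_type."""
    queue, seen = deque([start]), {start}
    while queue:
        term = queue.popleft()
        if wanted_type in TYPES.get(term, set()):
            return term
        for neighbour in BROADER.get(term, []) + NARROWER.get(term, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None  # no reachable term has the requested type


# Starting from a group (wrong type), walk until an indicator is found.
print(nearest_of_type("uleadi:NRT_AD_AirQuality", "uleadi:Indicator"))
```

The same function covers the first mode as a special case: a term that already has the right type is returned immediately.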

Traversing narrower to broader

The SKOS vocabulary can be accessed using the SISSVoc API.  

Views have been assigned to all identifiers where the containing scheme is declared to be a sirf:dimanesion_voc (see http://demo.sirf.net/def/vocab/eadindicators/NRT_AD_AirQuality?_view=alternates&_format=html to see the list of views).

Thus, to get the broader term using SISSVoc, simply use the skos:broader  view:

http://demo.sirf.net/def/vocab/eadindicators/AQParameter_CO?_view=skos:broader

=>

/sissvoc/dnskos/concept/broader?uri=http://demo.sirf.net/def/vocab/eadindicators/AQParameter_CO

returns 

http://demo.sirf.net/def/vocab/eadindicators/NRT_AD_AirQuality

If necessary we can get all the terms up to the top of the tree in one query using:

http://demo.sirf.net/def/vocab/eadindicators/NRT_AD_AirQuality?_view=skos:allbroader&_format=html

=> 
/sissvoc/dnskos/concept/broaderTransitive?uri=http://demo.sirf.net/def/vocab/eadindicators/AQParameter_CO
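
A client only needs to build the view URLs; the redirect to the SISSVoc path is handled by the server. A minimal Python sketch of the URL construction, assuming nothing beyond the _view and _format parameters shown above (the function name is hypothetical):

```python
from urllib.parse import urlencode


def view_url(concept_uri, view, fmt=None):
    """Append the _view (and optional _format) query parameters to a concept URI."""
    params = {"_view": view}
    if fmt:
        params["_format"] = fmt
    # Keep ':' unescaped so the result matches the URLs shown in the text.
    return concept_uri + "?" + urlencode(params, safe=":")


CO = "http://demo.sirf.net/def/vocab/eadindicators/AQParameter_CO"

print(view_url(CO, "skos:broader"))
print(view_url(CO, "skos:allbroader", fmt="html"))
```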

Browsing a hierarchical parameter

We have extended this to support multiple concept schemes and browsing of hierarchical terms.

This can be accessed via the LID view:

http://demo.sirf.net/def/vocab/eadindicators?_view=skos:topconcepts&_format=html

Each term can then be accessed to get its narrower terms.

That is, we have added a scheme filter to the "list all concepts" request, so that it lists only the concepts in a target scheme:

http://unsdi-dev.csiro.au/sissvoc/dnskos/concept

=>

http://unsdi-dev.csiro.au/sissvoc/dnskos/concept?scheme=http://demo.sirf.net/datasets/uneplive/eadindicators

and similarly to get the top concepts for a scheme:

http://unsdi-dev.csiro.au/sissvoc/dnskos/topconcept?scheme=http://demo.sirf.net/datasets/uneplive/eadindicators

Searching

TODO - add the same scheme scope to the search functions.

Use Case Example

Let us consider that we start with an application that defines a domain of interest. It therefore knows, as a starting point, a geography and a subject domain. For simplicity we will assume a standard temporal dimension (reporting over calendar years). The geography and subject are specific terms in two different vocabularies (i.e. values on the axes of these dimensions). We may have a broad term or a narrow term, and may want either to navigate to specific detail or to compare specific detail with related values at a more detailed, equivalent or more generalised level.

For example, the subject matter may be Millennium Development Goal (MDG) Indicators and the geography is Abu Dhabi.

We will address the geography dimension first, as we are using SIRF as a foundation and its API is designed for this (though we could use a different API to support different logic in future).

Abu Dhabi is an entity within the SIset http://demo.sirf.net/siset/uae-admin

=> http://demo.sirf.net/siset/uae-admin/AZ

Selecting a specific indicator

MDG is a broad concept, and a SKOS vocabulary is available at: http://demo.sirf.net/def/vocab/mgdindicators

We can get a browse list of concepts at: http://demo.sirf.net/def/vocab/mgdindicators?_view=skos:topconcepts

For any term in the vocabulary (i.e. what the user has chosen from the list) we can search for datasets that use the SIset and have this subject.

Let us select "7.6 Proportion of terrestrial and marine areas protected" i.e.

 http://demo.sirf.net/def/vocab/mgdindicators/7.6

We can then search for datasets in the current geography dimension (i.e. SIset) matching this subject:

http://unsdi-dev.csiro.au/sissvoc/dna/datanetwork?uri=http://demo.sirf.net/siset/uae-admin&subject=http://demo.sirf.net/def/vocab/mgdindicators/7.6

which returns

http://demo.sirf.net/datasets/uneplive/uae-mdg 

(which needs synthetic data attached to it)

  • This exploits the SKOS information model to find data under any related broader term, starting from the specific term we know about, provided it is a term in a registered dimension and that dimension has therefore been loaded into the datanetwork.

Finding related geography terms

We can find the next level using the "related" API

http://demo.sirf.net/siset/uae-admin/AZ?_view=related

and look for skos:broader or geo:sfWithin relationships 

User selection of comparable geography

The user now selects an option.

This identifies a containing geography (UAE, a country), potentially in a new SIset (e.g. UNGEGN)

Finding all equivalent terms for selected geography

We can then get all the equivalent terms for UAE from other vocabularies - using the "listlookups" view.

For each SIset to which we have a mapping, we can then look for data using that SIset as a dimension.

Finding relevant data

So we can now look for datasets that use the "next level" geography and the selected indicator.

For each term we can query the datanetwork API (we may want a PID rule here to include the subject search dimension):

http://sirf-data.csiro.au/sissvoc/dna/datanetwork?uri=http://id.sirf.net/siset/UN/UNSTATS-countries&subject=http://purl.org/linked-data/sdmx/2009/subject%233.3.5
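
Note that the '#' in the subject URI has to be escaped as %23 in the query above, otherwise everything after it would be treated as a URL fragment and never reach the server. The following Python sketch builds such a query; the function name is hypothetical, and only the uri and subject parameters shown in the example are assumed.

```python
from urllib.parse import urlencode

DNA_ENDPOINT = "http://sirf-data.csiro.au/sissvoc/dna/datanetwork"


def datanetwork_query(siset_uri, subject_uri, endpoint=DNA_ENDPOINT):
    """Build a datanetwork search URL for one SIset and one subject term."""
    # ':' and '/' stay readable; '#' is percent-encoded as %23 by urlencode.
    query = urlencode({"uri": siset_uri, "subject": subject_uri}, safe=":/")
    return f"{endpoint}?{query}"


url = datanetwork_query(
    "http://id.sirf.net/siset/UN/UNSTATS-countries",
    "http://purl.org/linked-data/sdmx/2009/subject#3.3.5",
)
print(url)
```

Calling the function once per equivalent SIset term gives the list of queries to run for the selected indicator.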
