SMC - devdocs

This is the technical part of the documentation, explaining the behind-the-scenes workings of the SMC module. It is maintained in the clarin-svn. See userdocs for more general information.

SMC consists of:

crosswalk service
the main service translating between indexes - TODO
concept-based query expansion
TODO
smc-xsl

set of xslt-stylesheets (governed by a build file) for pre- and post-processing the data

see under {smc-root}/src/xsl and {smc-root}/build.xml

SMC Browser

a web application for exploring the CMD data domain, consisting of two modules: smc-stats and smc-graph

smc-stats
a module of the SMC Browser providing human-readable statistical summaries of the CMD data domain
smc-graph
a module of the SMC Browser providing an advanced, interactive, graph-based user interface for exploring the CMD data domain

smc-xsl

separate auto-docs under {smc-root}/docs/xsltdocs/ governed by the build-file {smc-root}/build_init.xml (included by main build)

Configuration under build_init.props

The build has two main targets:

init-data
fetches all data from source registries into {data.dir} and applies a series of transformation steps to produce internal data representations
gen-out
generates a build of the smc-browser, i.e. it copies all scripts to $out.dir and generates the statistics and graphs from the preprocessed data into $out.data.dir
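
As a rough illustration of how the two targets relate, the following Python sketch wraps the ant calls. It assumes that ant is on the PATH, that {smc-root}/build.xml is the main build file, and that init-data has to run before gen-out because gen-out works on the preprocessed data. The function names are hypothetical and not part of SMC.

```python
import subprocess
from pathlib import Path

# The two main targets named in the text above, in the assumed order.
TARGETS = ("init-data", "gen-out")


def ant_command(target, smc_root):
    """Build the ant command line for one of the documented targets."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    # Assumption: build.xml directly under {smc-root} is the main build file.
    build_file = Path(smc_root) / "build.xml"
    return ["ant", "-f", str(build_file), target]


def run_build(smc_root):
    """Run init-data, then gen-out, stopping at the first failing target."""
    for target in TARGETS:
        subprocess.run(ant_command(target, smc_root), check=True)
```

Calling run_build("/path/to/smc") would fetch and preprocess the data and then produce the smc-browser build in one go; the path is a placeholder.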

SMC Browser

SMC Browser is implemented on top of the wonderful JavaScript library d3. The main code is in {smc-root}/src/web/js/smc-graph.js. It needs refactoring: the graph functionality should be separated into a widget.

Interaction with MDrepository

SMC primarily works on the data from the Component Registry, the Data Category Registries and the Relation Registry, all of which is schema-level information. However, it is also interesting to know how this model data translates into instance data. Therefore, a link between SMC Browser and MDRepo is being built.

The complete initialization sequence from scratch to get SMC + mdrepo running:

  1. install cr-xq

    https://github.com/vronk/SADE

  2. setup mdrepo project

    TODO: How?
    
  3. upload the data

    mdrepo/build.xml: ant upload
    or
    mdrepo/upload_all.sh
    

    requires write permissions for $data.path and $cache.path (_indexes)

  4. generate collection mappings

    mdrepo/cmd?operation=gen-mappings
    -> cmdcheck:collection-to-mapping($config, $x-context)
    in cr-xq/modules/cmd/cmd-check.xqm
    

    Generates a representation of every collection in the application configuration, based on the directory structure (basically one collection per provider).

    Stores the resulting mappings file in the project collection, combining mappings-manual.xml with the generated info into

    mappings.xml
    
  5. get the smc-module with the smc-data:

    ant init-data
    
  6. install smc-browser
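
The idea behind step 4 (one collection per provider directory, merged with the manually maintained entries) can be sketched in Python as follows. This is not the cmdcheck:collection-to-mapping implementation, which lives in cr-xq as XQuery; the element name "collection", the attribute "key" and both function names are hypothetical, since the real structure of mappings.xml is not described here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def provider_collections(data_dir):
    """Return the names of all direct subdirectories, one per provider."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


def build_mappings(manual_root, providers):
    """Combine manual entries with one generated entry per provider.

    manual_root stands in for the parsed mappings-manual.xml; the
    generated <collection key="..."/> elements are a made-up layout.
    """
    merged = ET.Element("mappings")
    # Keep the hand-written entries first, unchanged.
    merged.extend(list(manual_root))
    # Append one generated entry per provider directory.
    for name in providers:
        ET.SubElement(merged, "collection", {"key": name})
    return merged
```

A caller would parse mappings-manual.xml with ET.parse, pass its root element together with provider_collections(data_path), and write the merged result out as mappings.xml.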

smc-data (schema)

stats+graph for instance data

precondition: smc-data in place

ay-xml

http://clarin.aac.ac.at/exist9/apps/cr-xq/mdrepo/resource?x-context=University_of_the_Basque_Country&action=ay-xml-view

generate the indexes (map dcrIndex -> cmdIndex)
combining the structural information

also runs struct (ay-xml) if not present. -> mdrepo/smc?operation=gen-mappings

gen-graph

http://localhost:8680/exist/apps/cr-xq/mdrepo/smc/smc.xql?operation=gen-graph
http://clarin.aac.ac.at/exist9/apps/cr-xq/mdrepo/smc/smc.xql?operation=gen-graph

= <Termsets>{structures}</Termsets> -> _structure.xml (enriched with cmd-terms-nested.xml, dcr-terms.xml) -> terms2graph -> _structure-graph.xml

move the resulting json into the smc-browser: -> smc-browser/_structure-graph.json
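
The operations above are plain HTTP requests with an operation (or action) parameter, so they can be scripted. The following Python sketch builds the request URLs from the pieces quoted in this section. The helper names are hypothetical, and whether an operation returns its result in the response or only writes it on the server is not stated here, so trigger() simply returns whatever the server sends back.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def mdrepo_url(base_url, endpoint, params):
    """Join the project base URL, an endpoint and its query parameters."""
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}?{query}"


def trigger(url, timeout=600):
    """Send a GET request and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Base URL of the local instance quoted above.
BASE = "http://localhost:8680/exist/apps/cr-xq/mdrepo"

# gen-graph on the local instance.
GEN_GRAPH = mdrepo_url(BASE, "smc/smc.xql", {"operation": "gen-graph"})

# ay-xml view for one provider on the public instance.
AY_XML = mdrepo_url(
    "http://clarin.aac.ac.at/exist9/apps/cr-xq/mdrepo",
    "resource",
    {"x-context": "University_of_the_Basque_Country", "action": "ay-xml-view"},
)
```

With a running instance, trigger(GEN_GRAPH) would start the graph generation; the generous timeout is an assumption, made because such operations may run for a while.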