smc-report: teiHeader

This report gives an overview of the CMD profiles modelling the teiHeader (as of 2013-09-02).

The links point either to a static exported [SVG] graph, to a rasterized [PNG] version of it, or live to the [SMC] browser. (With live links, the graph sometimes does not get rendered right away. In such a case, just change any of the options to refresh the graph pane.) Links pointing to the definition of a given term, either in the Component Registry or in ISOcat, are marked with [DEF].

First teiHeader profile 2010

A first attempt was made as early as 2010. It modelled the recommended teiHeader, encoding the fileDesc and profileDesc components and leaving out encodingDesc and revisionDesc. The leaf elements were bound to the most prominent data categories, so the profile mixes Dublin Core and ISOcat categories.

examples/teiHeader1-profile.png

In this graph, the node size is set relative to the usage of the given term within the whole CMD (option node-size=usage).

View teiHeader-2010 [PNG], teiHeader-2010 [DEF] in Component Registry or teiHeader-2010 [SMC] in SMC browser.

DTA teiHeader profile 2012

The large research project Deutsches Textarchiv (DTA), which is digitizing a host of historical German texts from the period 1650 to 1900, also uses TEI to encode the material and consequently the teiHeader to hold the metadata. Part of the project is to integrate the data and metadata with the CLARIN infrastructure, which means that CMD records need to be generated for the resources. For this, the team created a completely new profile (as yet private) that closely models the version of the teiHeader used in the project (DTA teiHeader). As to why another teiHeader-based profile was created instead of reusing the existing one: according to a personal note by Axel Herold, a member of the project team and author of the profile, the profile was custom-made for this particular project, and it seemed undesirable to create a generalised TEI header profile.

Though this profile seems to cover many aspects the same way the first profile does, they are completely disconnected, even using different data categories. (The only shared data category is size.)
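The overlap check behind this observation can be illustrated with a minimal sketch. It assumes each profile has already been reduced to a flat list of data category names. The function name and all category names except size are hypothetical placeholders, not the actual contents of either profile.

```python
def shared_categories(profile_a, profile_b):
    """Return the data categories referenced by both profiles, sorted."""
    return sorted(set(profile_a) & set(profile_b))


# Hypothetical placeholder lists; only "size" is taken from the report.
teiheader_2010 = ["title", "creator", "language", "size"]
teiheader_dta = ["dta-title", "dta-author", "dta-language", "size"]

print(shared_categories(teiheader_2010, teiheader_dta))
```

With real data, the two lists would be extracted from the profile definitions rather than written by hand.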

The profile is still private, but there are already instances of it in the joint CLARIN metadata domain (857 records). It can also be seen in the SMC Browser (teiHeader-DTA [SMC]), and its XML Schema can be downloaded from the Component Registry (teiHeader-DTA [XSD]).

examples/teiHeader-DTA_c857.png

In this graph, the node size is relative to the number of text nodes in the individual fields of the (857) instances currently published.
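How such per-field counts might be gathered can be sketched as follows. This is an assumption about the general approach, not the SMC browser's actual implementation: it counts, per element name, how many elements across a set of records carry non-empty text. The function name and the two sample records are hypothetical, and a real computation would likely key on the full component path rather than the bare element name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_text_nodes(xml_documents):
    """Count elements with non-empty text, grouped by element name."""
    counts = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            # Whitespace-only text is treated as empty.
            if elem.text and elem.text.strip():
                counts[elem.tag] += 1
    return counts


# Hypothetical, heavily simplified records for illustration only.
records = [
    "<record><title>A</title><size>10</size></record>",
    "<record><title>B</title><size> </size></record>",
]
counts = count_text_nodes(records)
print(counts)
```

The resulting counts could then be scaled to node sizes in the graph.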

DBNL

Another large-scale project, Nederlab, which started in 2013 in the Netherlands, aims at processing historic Dutch newspaper articles into a platform for search and analysis. Within this project, the metadata is also encoded in a teiHeader, and the data is to be integrated within CLARIN. Here, too, another set of CMD profiles was created, this time reusing existing components. As seen in the figure, the components fileDesc and profileDesc were reused, while the components encodingDesc and revisionDesc, left out in the original profile, were added (DBNL new components [SMC]).

The profiles are private and no records have been published yet, but there is a preview of a 2013-06 version of the profiles in a dedicated version of the SMC Browser: DBNL_Tekst [SMC]. (These profiles will also be integrated in the main version of the SMC Browser.)

examples/teiHeader-DBNL.png

This graph shows the reuse of components from the first teiHeader (2010) by the new DBNL profiles (teiHeader-DBNL [SMC]).

TEIDocumentDescription

There is another profile with 'TEI' in its name, the TEIDocumentDescription. However, according to its author, Thomas Eckart, it is rather an experimental one, and it does not seem to follow the structure of the teiHeader.

See TEIDocumentDescription [DEF] in the Component Registry or TEIDocumentDescription [SMC] in the SMC Browser.