corpus_shell is a modular framework: a service-oriented architecture for a distributed and heterogeneous virtual landscape. Its core functionality is encapsulated in self-contained components that expose well-defined interfaces based on established standards. The principal idea behind the architecture is to decouple the modules that serve data from the user-interface components. To this end, the system must meet a number of basic requirements: dynamic configuration of data sources, dynamic configuration of the front-end layout, and support for different protocols and data formats.

corpus_shell is built on and around standards and protocols endorsed by and used in the CLARIN infrastructure:  

One basic requirement for a modicum of interoperability is to identify resources by a PID/URI and to make them available via HTTP. Another is to offer metadata about the resources for harvesting via OAI-PMH. The backbone of the system is the SRU/CQL search protocol, which is also used by CLARIN's Federated Content Search working group. It is applied both on the input side, to access external data sources, and on the output side, to offer access to the data provided by the system.
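To make the two protocols more concrete, the following is a minimal client-side sketch using only the Python standard library. It builds an SRU 1.2 `searchRetrieve` request carrying a CQL query, builds an OAI-PMH `ListRecords` harvesting request, and reads the hit count from an SRU response. The base URLs and helper names are hypothetical illustrations, not part of corpus_shell; the request parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `verb`, `metadataPrefix`) and the SRU 1.2 response namespace come from the respective standards.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.2 responses (srw:searchRetrieveResponse).
SRU_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   start_record: int = 1, version: str = "1.2") -> str:
    """Build an SRU searchRetrieve request URL carrying a CQL query.

    base_url is a hypothetical endpoint; any SRU server would do.
    """
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,          # the CQL expression, URL-encoded below
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


def oai_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for metadata harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def number_of_records(response_xml: str) -> int:
    """Read srw:numberOfRecords from an SRU 1.2 searchRetrieve response."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRU_NS)
    if node is None or node.text is None:
        raise ValueError("response has no numberOfRecords element")
    return int(node.text)


if __name__ == "__main__":
    print(sru_search_url("https://example.org/sru", 'title = "dog"'))
    print(oai_list_records_url("https://example.org/oai"))
```

Because both sides of the system speak the same protocol, the same request shape can be sent to an external SRU source or to the system's own SRU endpoint.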

The heterogeneous nature of the data and the need to interact with external applications require a system that supports different formats. While the system is intended to be as generic as possible, its extensible design makes it easy to add support for new data formats.
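One common way to get this kind of extensibility is a registry of format handlers, where adding a format means registering one function without touching the dispatch code. The sketch below assumes that pattern; it is not taken from the corpus_shell source, and the names `register_format`, `serialize`, and the flat `Record` type are hypothetical.

```python
import json
from typing import Callable, Dict
from xml.sax.saxutils import escape

# Hypothetical record model: a flat mapping of field names to string values.
Record = Dict[str, str]
Serializer = Callable[[Record], str]

# Registry mapping a format name to the function that produces it.
_SERIALIZERS: Dict[str, Serializer] = {}


def register_format(name: str) -> Callable[[Serializer], Serializer]:
    """Decorator that registers a serializer under the given format name."""
    def decorator(func: Serializer) -> Serializer:
        _SERIALIZERS[name] = func
        return func
    return decorator


def serialize(record: Record, fmt: str) -> str:
    """Render a record in the requested format, or fail for unknown formats."""
    try:
        serializer = _SERIALIZERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return serializer(record)


@register_format("json")
def to_json(record: Record) -> str:
    return json.dumps(record, sort_keys=True)


@register_format("xml")
def to_xml(record: Record) -> str:
    # Assumes field names are valid XML element names; values are escaped.
    fields = "".join(f"<{k}>{escape(v)}</{k}>" for k, v in record.items())
    return f"<record>{fields}</record>"
```

Supporting a further format would then only require another decorated function; `serialize` and its callers stay unchanged.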

Link: The framework is open source; the source code is available on GitHub.

Contact: Matej Ďurčo (