FAQ

Archiving

How can I save my valuable research material for the future?

You should definitely turn to one of the established repositories for research data; they are designed for exactly that purpose.

In Austria, there are already several institutional repositories available. Which one to use depends on the type of resource, how it is to be used and also on your affiliation.

Do you accept data from anybody?

As part of the CLARIN-AT infrastructure, CCV/LRP (CLARIN Centre Vienna / Language Resources Portal) is intended to be a service for Austrian researchers.

Do you accept any data?

As its name suggests, the Language Resources Portal specializes in language resources. These include digital texts (including images of texts), lexicographic resources, and semantic resources such as vocabularies and thesauri. If in doubt, simply contact us.

What data formats do you accept?

You are strongly encouraged to provide the resources in standard formats acknowledged by the respective international research communities. We will support you in converting the data if this is necessary and feasible. The preferred encoding for textual data in our repository is TEI/XML (Text Encoding Initiative) with metadata in CMDI (Component Metadata Infrastructure). For an overview of recommended standard formats, have a look at the CLARIN standards recommendations.
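
Before submitting, it can help to check that a TEI file is at least well-formed XML with a TEI root element. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `looks_like_tei` is hypothetical, and the check only covers well-formedness and a `TEI` root in the TEI namespace (`http://www.tei-c.org/ns/1.0`). It does not validate the file against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def looks_like_tei(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML whose root is <TEI> in the TEI namespace.

    This is only a quick sanity check, not validation against a TEI schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == f"{{{TEI_NS}}}TEI"

sample = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text/></TEI>'
print(looks_like_tei(sample))
```

A file that passes this check may still need work during curation; full schema validation is a separate step.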

What is the actual depositing/archiving procedure?

During the submission of digital language resources to the repository, the data undergo a curation process to ensure quality and consistency. We help you meet the requirements for sustainable archiving: the data have to be provided with metadata in standard formats accepted/adopted by the respective communities, persistent identifiers (PIDs) have to be assigned, IPR issues have to be resolved, and clear statements regarding licensing and possible uses of the resources have to be made.

The depositor is also required to sign a deposition agreement acknowledging that (s)he is the holder of the rights to the data and is entitled to grant the rights contained in the licence.

Once the data has been deposited in the repository, it is assigned a PID for stable reference.
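
The curation requirements above can be pictured as a checklist that has to be complete before a resource is accepted. The following Python sketch is illustrative only: the `Submission` fields and the `missing_requirements` function are hypothetical and do not describe the repository's actual software.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Submission:
    # Hypothetical fields mirroring the requirements described in this FAQ.
    metadata_format: Optional[str] = None   # e.g. "CMDI"
    ipr_resolved: bool = False              # intellectual property rights cleared
    licence: Optional[str] = None           # e.g. "CC-BY-NC-SA-3.0-AT"
    agreement_signed: bool = False          # deposition agreement signed by the depositor

def missing_requirements(sub: Submission) -> List[str]:
    """List the requirements that still block deposition (an empty list means ready)."""
    missing = []
    if not sub.metadata_format:
        missing.append("metadata in a standard format")
    if not sub.ipr_resolved:
        missing.append("resolved IPR issues")
    if not sub.licence:
        missing.append("licence statement")
    if not sub.agreement_signed:
        missing.append("signed deposition agreement")
    return missing

draft = Submission(metadata_format="CMDI", licence="CC-BY-NC-SA-3.0-AT")
print(missing_requirements(draft))  # ['resolved IPR issues', 'signed deposition agreement']
```

The PID is deliberately not part of this checklist, since the repository assigns it once the data has been deposited.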

How can the archived data be cited? What is a "PID"?

In accordance with CLARIN’s technical defaults, CCV/LRP uses the Handle System to assign unique and persistent identifiers to the digital objects. This way, every resource has a uniquely identifiable URL that will always point to the same data, wherever it might physically move in the future. The handle is especially meant for citing the resources in publications.
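
A handle consists of a prefix (the naming authority) and a suffix separated by a slash, and it can be turned into a resolvable URL via the global Handle.Net proxy at `https://hdl.handle.net/`. The Python sketch below builds such a URL and a simple citation string. The handle `12345/example-resource` and the function names are hypothetical placeholders, not real CCV/LRP identifiers, and the citation layout is just one possible style.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/"

def handle_to_url(handle: str) -> str:
    """Turn a handle of the form 'prefix/suffix' into a resolvable proxy URL."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    # Percent-encode characters that are not allowed in URLs, keeping the slash.
    return HANDLE_PROXY + quote(handle, safe="/")

def cite(authors: str, year: int, title: str, handle: str) -> str:
    """Build a simple citation string that ends with the persistent URL."""
    return f"{authors} ({year}). {title}. {handle_to_url(handle)}"

print(handle_to_url("12345/example-resource"))
# https://hdl.handle.net/12345/example-resource
```

Citing the proxy URL rather than the current server address is what keeps the reference stable if the data moves.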

What if I want/need to update the archived data?

Every change to the resources and metadata is stored as a new version, and the PID always points to the latest version. However, if the changes are substantial and/or both versions should remain equally available, a new object with a new PID should be created. This new object carries a link to the preceding version, which retains its own PID.
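
The versioning policy can be modelled as follows: minor changes add a new version behind the same PID, while substantial changes create a new object with its own PID that links back to its predecessor. This Python sketch is a simplified, hypothetical model (class and method names are illustrative, and the `new_pid` argument stands in for whatever PID service the repository uses); it is not the repository's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ArchivedObject:
    pid: str
    versions: List[str] = field(default_factory=list)  # content snapshots, oldest first
    predecessor_pid: Optional[str] = None               # set when this object supersedes another

    def latest(self) -> str:
        """The PID always resolves to the most recent version."""
        return self.versions[-1]

    def update(self, content: str) -> None:
        """Minor change: store a new version under the same PID."""
        self.versions.append(content)

    def branch(self, content: str, new_pid: str) -> "ArchivedObject":
        """Substantial change: create a new object with a new PID linking back here.

        The current object keeps its PID and remains available.
        """
        return ArchivedObject(pid=new_pid, versions=[content], predecessor_pid=self.pid)

corpus = ArchivedObject("12345/corpus-v1", ["v1"])
corpus.update("v1.1")
corpus2 = corpus.branch("v2", "12345/corpus-v2")
print(corpus.latest(), corpus2.predecessor_pid)  # v1.1 12345/corpus-v1
```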

How safe is my data in your repository?

The repository runs on hardware/servers maintained by the Computing Centre of the Austrian Academy of Sciences (ARZ), which provides solid organisational and technical backing.

The data in the repository is backed up regularly: a backup copy is stored every day on another server on-site, and once a week an additional copy is stored off-site. The LRP regularly checks the integrity of the copies. In case of corruption, the affected backup data set is replaced by a new one. We keep at least three copies at all times, one of them off-site.
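
The FAQ does not say how the integrity checks are performed; a common approach is to compare a cryptographic checksum of each copy against a reference value. The Python sketch below assumes SHA-256 checksums and an in-memory list of copies (the `Copy` type and its `offsite` flag are hypothetical). It reports which copies need replacing and whether the "at least three copies, one of them off-site" rule still holds.

```python
import hashlib
from dataclasses import dataclass
from typing import List

@dataclass
class Copy:
    location: str
    data: bytes
    offsite: bool

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def corrupted_copies(copies: List[Copy], expected: str) -> List[str]:
    """Return the locations whose checksum differs from the reference checksum."""
    return [c.location for c in copies if sha256(c.data) != expected]

def policy_ok(copies: List[Copy], expected: str) -> bool:
    """At least three intact copies, at least one of them off-site."""
    intact = [c for c in copies if sha256(c.data) == expected]
    return len(intact) >= 3 and any(c.offsite for c in intact)

original = b"corpus data"
reference = sha256(original)
copies = [
    Copy("primary", original, offsite=False),
    Copy("backup-onsite", original, offsite=False),
    Copy("backup-offsite", b"corrupted", offsite=True),
]
print(corrupted_copies(copies, reference))  # ['backup-offsite']
print(policy_ok(copies, reference))         # False
```

In this example the off-site copy fails the check, so it would be replaced to restore three intact copies.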

What if I want to withdraw the resources in the future? Can I delete the data?

Yes, if need be. However, we need to keep at least a record that the data existed, so the administrative metadata is retained, indicating that the data itself has been removed.
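
Withdrawing a resource in this way is often described as leaving a "tombstone": the data itself is deleted, while the administrative metadata remains as a record that the resource existed. A minimal Python sketch, with hypothetical field and function names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    pid: str
    admin_metadata: dict       # e.g. title, depositor, deposit date
    data: Optional[bytes]      # None once the data has been removed
    withdrawn: bool = False

def withdraw(record: Record, reason: str) -> Record:
    """Remove the data but keep the administrative metadata, noting the removal."""
    meta = dict(record.admin_metadata, withdrawn_reason=reason)
    return Record(pid=record.pid, admin_metadata=meta, data=None, withdrawn=True)

old = Record("12345/old-corpus", {"title": "Old corpus"}, b"...")
tombstone = withdraw(old, "requested by depositor")
print(tombstone.data, tombstone.admin_metadata)
```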

Do I need to pay to deposit the resources?

No. The repository is run as part of the research infrastructure as a service to the community.

I don't want to / cannot make the data publicly available. Would you still archive it for me?

In line with the advocacy of the research infrastructures and the general trend towards Open Access, we strongly encourage data producers to be as open as possible: publicly available data has a better chance of being picked up by colleagues, which is good for your reputation and citation index. Public funding agencies increasingly require researchers to publish not only the results of their research but also the research data.

However, we are aware that Open Access is not possible in all cases. IPR or ethical issues, as well as ongoing research projects, may require more restrictive access modes. We will help you select the right licence for your needs. If really necessary, we also offer the possibility of simply archiving the data, without any public access whatsoever.

Search, Resource Availability

How can the archived data be found?

The resources are published on the CCV/LRP website and can be browsed via the LRP web interface.

Additionally, the metadata about the resources is also collected by the research infrastructure CLARIN and made available via its dissemination channels, primarily the Virtual Language Observatory.
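
CLARIN's metadata harvesting is based on the OAI-PMH protocol. The Python sketch below only builds a `ListRecords` request URL; the endpoint `https://repository.example.org/oai` is a placeholder rather than CCV/LRP's real endpoint, and the `metadataPrefix` value `cmdi` is an assumption about how a repository exposes its CMDI records.

```python
from typing import Optional
from urllib.parse import urlencode

def list_records_url(endpoint: str, metadata_prefix: str = "cmdi",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, a resumptionToken is an exclusive argument: a follow-up request
    carries only the verb and the token, not the metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{endpoint}?{urlencode(params)}"

print(list_records_url("https://repository.example.org/oai"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=cmdi
```

A harvester repeats the request with each returned resumption token until the full record list has been collected.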

Can I do anything with the resources? What are the regulations regarding access?

In general, the Terms of Use apply to the use of the resources and services provided by the CLARIN Centre Vienna. In addition, resource-specific licences apply, as stated in the description of each resource.

Do I need to pay to get to the resources?

No. All the resources we offer are available free of charge.

Do I need to register/log in to get to the resources?

It depends. There are three basic modes of access: public, academic and restricted.

Public resources are accessible without any further restrictions. You still need to abide by the corresponding licence (by default CC-BY-NC-SA-3.0-AT) if you want to use the resources further (e.g. for research). Academic use means that you have to be affiliated with an academic institution (e.g. as a member of a university). This is checked primarily via the so-called Federated (or Shibboleth) Login (see below). If you cannot log in via Shibboleth but are nevertheless an academic or have academic reasons for accessing the resource, please contact us.

Some of the resources are only available on the basis of a special agreement. This is indicated by the "restricted" access mode, which usually means that you have to fill in a registration form and accept a special licence. In the worst case, the resource is not available online at all; you then need to contact us to find out whether and how you can get access to it.
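
The three access modes can be summarised as a simple decision function. This Python sketch is a simplification: `federated_login` stands for a successful Shibboleth login and `agreement_accepted` for a completed registration form plus acceptance of the special licence. Neither name comes from the portal itself, and manual arrangements (contacting us) as well as resources that are not online at all are not modelled.

```python
from enum import Enum

class Access(Enum):
    PUBLIC = "public"
    ACADEMIC = "academic"
    RESTRICTED = "restricted"

def can_access(mode: Access, federated_login: bool = False,
               agreement_accepted: bool = False) -> bool:
    """Decide whether a user may get to a resource with the given access mode."""
    if mode is Access.PUBLIC:
        # No login needed; the resource licence still governs further use.
        return True
    if mode is Access.ACADEMIC:
        # A federated login is taken as evidence of academic affiliation.
        return federated_login
    # Restricted: registration form and special licence are required.
    return agreement_accepted

print(can_access(Access.ACADEMIC, federated_login=True))  # True
```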

What is this Federated (or Shibboleth) Login?

Shibboleth, AAI (Authentication and Authorisation Infrastructure) and SSO (Single Sign-On) all refer to an architecture in which service providers rely on identity providers to authenticate users. That is, when users want to use a service that requires authentication (like the Language Resources Portal), they are redirected to their home institution (e.g. their university), where they log in with their institutional credentials. If the login succeeds, the home institution informs the service provider that the user is entitled to use the service. In short, you can log in to different services with your institutional account, without having to register for each one.
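
The redirect-and-assert flow described above can be illustrated with a toy model in Python. This is not a SAML or Shibboleth implementation: the `IdentityProvider` and `ServiceProvider` classes, the `browser_login` helper, the institution names and the dictionary standing in for a signed assertion are all hypothetical. The point it shows is that credentials are checked only by the home institution, and the service only sees the resulting assertion.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class IdentityProvider:
    """The user's home institution; only it checks the password."""
    institution: str
    accounts: Dict[str, str]  # username -> password (toy credential store)

    def authenticate(self, username: str, password: str) -> Optional[dict]:
        if self.accounts.get(username) == password:
            # Stand-in for the signed assertion sent back to the service provider.
            return {"user": f"{username}@{self.institution}"}
        return None

@dataclass
class ServiceProvider:
    """A service (e.g. a portal) that trusts the federation's identity providers."""
    trusted_idps: Dict[str, IdentityProvider]

    def accept(self, issuer: str, assertion: Optional[dict]) -> bool:
        """Grant access only to assertions issued by a trusted identity provider."""
        return assertion is not None and issuer in self.trusted_idps

def browser_login(sp: ServiceProvider, institution: str,
                  username: str, password: str) -> bool:
    """Simulate the browser: redirect to the home institution, log in there,
    then carry the resulting assertion back to the service."""
    idp = sp.trusted_idps.get(institution)
    if idp is None:
        return False  # institution is not part of the federation
    assertion = idp.authenticate(username, password)  # credentials go only to the IdP
    return sp.accept(institution, assertion)

uni = IdentityProvider("univ.example", {"alice": "s3cret"})
portal = ServiceProvider({"univ.example": uni})
print(browser_login(portal, "univ.example", "alice", "s3cret"))  # True
```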

This is similar to the OpenID initiative known from the "commercial" world (logging in to web pages with your Google or Facebook account).

Given that this "Identity Federation" is established by academic institutions, it is implicitly assumed that if a user can log in via Shibboleth, (s)he is a member of the academic community.