Charly Moerth, Stephan Procházka
Vienna 2017

Contents

1 Introduction

2 Basic dictionary structure

3 Character encoding

4 Dictionary entries

4.1 Lemmas and MWUs

4.2 Identifiers (IDs)

4.3 Illustrative sentences

4.4 Proverbs

5 Word forms

5.1 Morphosyntactic annotations

5.2 Variants

6 Translation equivalents

6.1 Translating lemmas and multi-word units

6.2 Selection restrictions

6.3 Translating examples

6.4 Definitions

6.5 Names

6.6 Synonyms

7 Grammatical information

7.1 Word class information

7.2 Roots

7.3 Grammatical gender

7.4 Voice

7.5 Derived verb classes

7.6 Plural nouns

7.7 Collective nouns

7.8 Elatives

7.9 Count nouns

7.10 Invariable nouns and adjectives

7.11 Participles

7.12 Pragmatics

7.13 Functional constraints

8 Language identifiers

9 Etymologies

10 Advanced issues

10.1 One entry or two entries?

10.1.1 Feminine derivations

10.1.2 Homonyms

10.1.3 Nominals ending in -i

10.2 Diptosy

10.3 Arguments

10.4 Constructions

10.5 Constructions vs. sample sentences

10.6 Register

10.7 Semantic classifications

11 Creating usage examples

12 Sources and responsibilities

12.1 ... of forms

12.2 ... of examples

12.3 ... of senses

13 References



5.0

1 Introduction

The examples in the following guidelines are taken from dictionaries that are being produced as part of the VICAV programme: A Machine-readable Dictionary of Egyptian Arabic, A Machine-readable Dictionary of Rabat Arabic, A Digital Dictionary of Tunis Arabic, A Digital Dictionary of Damascus Arabic and A Machine-readable Dictionary of Modern Standard Arabic.

2 Basic dictionary structure

The VICAV dictionaries are encoded according to the Guidelines of the Text Encoding Initiative (P5). They are conceptualised as a specific type of text and are therefore encoded in a text element. Each dictionary starts with a teiHeader element, which contains the dictionary's metadata.

The lexicographic data are placed in typed div elements. Thus, our TEI dictionaries basically look like this:

<TEI version="5.0">
   <teiHeader>
      ...
   </teiHeader>
   <text>
      <body>
         <div type="entries">
            <entry>...</entry>
            <entry>...</entry>
            <entry>...</entry>
            ...
            ...
            ...
         </div>
      </body>
   </text>
</TEI>
                        
A Machine-readable Dictionary of Egyptian Arabic

The body of the VICAV dictionaries can contain not only simple entries but also examples, which are encoded in cit/quote constructs. Example sentences are kept outside the entries so that they can be reused in different parts of the dictionary (see below: Illustrative sentences and Creating usage examples).

<body>
   <div type="entries">
      <entry>...</entry>
      <entry>...</entry>
      <entry>...</entry>
      ...
      ...
      ...
   </div>
   <div type="examples">
      <cit type="example">...</cit>
      <cit type="example">...</cit>
      <cit type="example">...</cit>
      ...
      ...
      ...
   </div>
</body>
                        
A Machine-readable Dictionary of Egyptian Arabic
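Because entries refer to shared examples only by ID, a processing tool has to look the examples up in the separate examples div. The following Python sketch shows one way to do this with the standard library's xml.etree.ElementTree. It assumes the standard TEI namespace and the ptr/@target pointers shown in the Elatives section; the entry ID tislam_001 and the function name resolve_examples are hypothetical, and this is not part of the VICAV toolchain.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: the entry ID "tislam_001" is invented for illustration;
# the example itself is taken from the section on translating examples.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="entries">
    <entry xml:id="tislam_001">
      <sense><ptr type="example" target="tislam_ideeki__001"/></sense>
    </entry>
  </div>
  <div type="examples">
    <cit xml:id="tislam_ideeki__001" type="example">
      <quote xml:lang="ar-arz-x-cairo-vicavTrans">tislam idēki! - ʔaḷḷāh yisallimak.</quote>
    </cit>
  </div>
</body>"""


def resolve_examples(body: ET.Element) -> dict[str, list[str]]:
    """Map each entry ID to the example sentences its ptr elements point to."""
    # Index the shared examples once, by their xml:id.
    examples = {
        cit.get(XML_ID): cit
        for cit in body.iterfind("tei:div[@type='examples']/tei:cit[@type='example']", TEI)
    }
    resolved = {}
    for entry in body.iterfind("tei:div[@type='entries']/tei:entry", TEI):
        sentences = []
        for ptr in entry.iterfind(".//tei:ptr[@type='example']", TEI):
            # Accept both bare IDs (as in these guidelines) and "#id" references.
            cit = examples.get(ptr.get("target", "").lstrip("#"))
            if cit is not None:
                sentences.append(cit.findtext("tei:quote", namespaces=TEI))
        resolved[entry.get(XML_ID)] = sentences
    return resolved
```

Calling resolve_examples(ET.fromstring(SAMPLE)) maps tislam_001 to the one example sentence it points to. Since every example is stored only once, any number of entries can point to the same cit.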

3 Character encoding

Character encoding is based on Unicode (UTF-8).

4 Dictionary entries

There are three types of entries: lemmas, multi-word units (MWUs) and examples.

4.1 Lemmas and MWUs

Lemmas and MWUs basically have the same structure.

<entry xml:id="kitaab_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">kitāb</orth>
   </form>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic
<entry xml:id="fi_0">
   <form type="multiWordUnit">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">fi baḥr</orth>
   </form>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

4.2 Identifiers (IDs)

As can be seen in the examples above, each entry is assigned a unique identifier. In the VICAV dictionaries, IDs are restricted to ASCII characters (letters, digits and the underscore, as in kitaab_001). Usually, IDs are generated in VLE, the editing tool, by pressing Ctrl + I. If VLE generates an ID that already exists in the database, the entry cannot be saved. In such cases, the user has to modify the ID manually, e.g. by increasing the number at the end of the ID.
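The collision rule described above can be sketched in Python as follows. This is not VLE's implementation. The function names are hypothetical, and so are the transliteration rules in ascii_stem (long vowels doubled, blanks and hyphens turned into underscores, all other non-ASCII material dropped). They were chosen so that kitāb yields the stem of kitaab_001, and actual VICAV IDs do not always follow them (compare mashhur_001 or fi_0).

```python
import unicodedata

COMBINING_MACRON = "\u0304"


def ascii_stem(lemma: str) -> str:
    """Reduce a transcribed lemma to ASCII letters and underscores.

    Hypothetical convention: a vowel followed by a macron (a long vowel) is
    doubled, blanks and hyphens become underscores, and all other non-ASCII
    material (ʔ, ʕ, dots below, carons, ...) is dropped.
    """
    out: list[str] = []
    # NFD splits e.g. "ā" into "a" + U+0304 and "ḥ" into "h" + U+0323.
    for ch in unicodedata.normalize("NFD", lemma.lower()):
        if ch == COMBINING_MACRON and out:
            out.append(out[-1])  # ā -> aa, ē -> ee, ...
        elif ch.isascii() and ch.isalpha():
            out.append(ch)
        elif ch in " -":
            out.append("_")
    return "".join(out)


def next_free_id(lemma: str, existing: set[str]) -> str:
    """Return stem_NNN, increasing NNN until the ID is not yet taken."""
    stem = ascii_stem(lemma)
    n = 1
    while f"{stem}_{n:03d}" in existing:
        n += 1
    return f"{stem}_{n:03d}"
```

With an empty set of existing IDs, next_free_id("kitāb", set()) gives kitaab_001; if kitaab_001 is already taken, the counter is simply increased, which is what the manual fix in VLE amounts to.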

4.3 Illustrative sentences

Examples are encoded using a cit/quote construct.

<cit xml:id="yibqa_ustaz_001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">ḥayibʔa ustāz in šāʔ allāh.</quote>
   ...
</cit>
                        
A Machine-readable Dictionary of Egyptian Arabic

Ideally, examples should consist of complete sentences. They should be concise but may contain several sentences. In dialogues, the individual turns are separated by a dash.

...
<quote xml:lang="ar-arz-x-cairo-vicavTrans">tislam idēki. - ʔaḷḷāh yisallimak.</quote>
...
A Machine-readable Dictionary of Egyptian Arabic

4.4 Proverbs

Proverbs are a subtype of example.

<cit xml:id="il_cagala_min_ish_shitaan_001" type="example" subtype="proverb">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">il-ʕagala min iš-šiṭān.</quote>
   <cit type="translation" xml:lang="en">
      <quote>Haste makes waste.</quote>
   </cit>
</cit>
                        
A Machine-readable Dictionary of Egyptian Arabic

5 Word forms

There are two types of form elements: the lemma (type="lemma") and inflected word forms (type="inflected"). Nominals are usually furnished with their plural forms, verbs with the third person singular of the present tense.

<entry xml:id="balad_0">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">balad</orth>
   </form>
   ...
   <form type="inflected" ana="#n_pl">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">bilād</orth>
   </form>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

5.1 Morphosyntactic annotations

The different morphological forms are identified by labels in the ana (“analysis”) attribute. Frequent labels include:

Value Meaning
#adj_f feminine form of an adjective
#adj_pl plural of an adjective
#n_constructState construct state of a noun
#n_pl plural of a noun
#n_unit nomen unitatis
#v_pres_sg_p3 3rd person singular present tense
#v_vn verbal noun

An example with a verb:

...
<form type="inflected" ana="#v_pres_sg_p3">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">yǝnšor</orth>
</form>
...
A Digital Dictionary of Damascus Arabic

The construct state (status constructus) of a noun can be recorded like this:

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">mara</orth>
</form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">mrʔ</gram>
</gramGrp>
<form type="inflected" ana="#n_constructState">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">mirāt</orth>
</form>
...
A Machine-readable Dictionary of Egyptian Arabic
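Because the ana labels are pointers, tools can turn them into readable descriptions. The following Python sketch uses xml.etree.ElementTree to list the inflected forms of an entry with the meanings from the table above. It assumes the standard TEI namespace; the function name inflected_forms is hypothetical, and the sample wraps the construct-state example in an entry without an ID.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# The ana labels listed in the table above.
ANA_LABELS = {
    "#adj_f": "feminine form of an adjective",
    "#adj_pl": "plural of an adjective",
    "#n_constructState": "construct state of a noun",
    "#n_pl": "plural of a noun",
    "#n_unit": "nomen unitatis",
    "#v_pres_sg_p3": "3rd person singular present tense",
    "#v_vn": "verbal noun",
}

# The construct-state example from above, wrapped in an entry (hypothetical, no xml:id).
SAMPLE = """<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form type="lemma"><orth xml:lang="ar-arz-x-cairo-vicavTrans">mara</orth></form>
  <gramGrp><gram type="pos">noun</gram></gramGrp>
  <form type="inflected" ana="#n_constructState">
    <orth xml:lang="ar-arz-x-cairo-vicavTrans">mirāt</orth>
  </form>
</entry>"""


def inflected_forms(entry: ET.Element) -> list[tuple[str, str]]:
    """Return (meaning, orth) pairs for the inflected forms of an entry."""
    pairs = []
    for form in entry.findall("tei:form[@type='inflected']", TEI):
        # @ana may hold several space-separated pointers; describe each one,
        # falling back to the raw label if it is not in the table.
        labels = form.get("ana", "").split()
        meaning = "; ".join(ANA_LABELS.get(label, label) for label in labels)
        pairs.append((meaning, form.findtext("tei:orth", namespaces=TEI)))
    return pairs
```

Unknown labels (for instance project-specific ones not listed in the table) are passed through unchanged rather than rejected.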

5.2 Variants

Only headwords may have variants. These are encoded as typed forms nested in the top-level form of the entry.

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">tžawwaz</orth>
   <form type="variant">
      <orth xml:lang="ar-apc-x-damascus-vicavTrans">dzawwaž</orth>
   </form>
</form>
...
A Digital Dictionary of Damascus Arabic

This is the only position in which a form element may carry the attribute type="variant". All other variants are simply listed without being classified. The following example lists two competing morphological forms.

...
<form type="inflected" ana="#v_pres_sg_p3">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">yǝṣal</orth>
</form>
<form type="inflected" ana="#v_pres_sg_p3">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">yūṣal</orth>
</form>
...
A Digital Dictionary of Damascus Arabic

The next example shows alternative plural forms.

...
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">kǝnaz</orth>
</form>
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">kanzāt</orth>
</form>
...
A Digital Dictionary of Damascus Arabic

Variants can be assigned usage labels indicating e.g. a particular register. The more frequent variant should precede less frequent ones.
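The rule that type="variant" may only occur inside the lemma form lends itself to an automatic check. Below is a minimal Python sketch using xml.etree.ElementTree; the function name misplaced_variants is hypothetical, and the BAD sample is a deliberately wrong, hypothetical encoding of the yǝṣal/yūṣal example above. The same constraint could also be expressed in the project's schema, e.g. as a Schematron rule.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
FORM = f"{{{TEI_NS}}}form"

# Well-formed: the variant is nested in the lemma form.
GOOD = f"""<entry xmlns="{TEI_NS}">
  <form type="lemma"><orth>tžawwaz</orth>
    <form type="variant"><orth>dzawwaž</orth></form>
  </form>
</entry>"""

# Hypothetical mis-encoding: a competing present-tense form typed as a variant.
BAD = f"""<entry xmlns="{TEI_NS}">
  <form type="inflected" ana="#v_pres_sg_p3"><orth>yǝṣal</orth>
    <form type="variant"><orth>yūṣal</orth></form>
  </form>
</entry>"""


def misplaced_variants(root: ET.Element) -> list[ET.Element]:
    """Return form[@type='variant'] elements not directly inside form[@type='lemma']."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    misplaced = []
    for form in root.iter(FORM):
        if form.get("type") != "variant":
            continue
        parent = parent_of.get(form)
        if parent is None or parent.tag != FORM or parent.get("type") != "lemma":
            misplaced.append(form)
    return misplaced
```

The GOOD sample yields an empty list, while the BAD sample yields the yūṣal form, which should instead be encoded as a second, untyped inflected form.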

6 Translation equivalents

6.1 Translating lemmas and multi-word units

Translations of lemmas and MWUs are given in sense elements.

<entry xml:id="bard_0">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">bard</orth>
   </form>
   ...
   <sense>
      <cit type="translation" xml:lang="en"><quote>coldness</quote></cit>
      <cit type="translation" xml:lang="de"><quote>Kälte</quote></cit>
   </sense>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

Semantically unrelated homophones or items with clearly differing semantics have to be documented with several sense elements. In the following example, the Egyptian lemma balad is represented with two senses.

...
<sense>
   <cit type="translation" xml:lang="en"><quote>country</quote></cit>
   <cit type="translation" xml:lang="en"><quote>land</quote></cit>
   ...
</sense>
<sense>
   <cit type="translation" xml:lang="en"><quote>city</quote></cit>
   <cit type="translation" xml:lang="en"><quote>town</quote></cit>
   ...
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

Another example is rās:

...
<sense>
   <cit type="translation" xml:lang="de"><quote>Kopf</quote></cit>
</sense>
<sense>
   <cit type="translation" xml:lang="de"><quote>Anfang</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

6.2 Selection restrictions

Ambiguous translations are often made explicit by additional information narrowing down the semantic scope of the particular item.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">kibīr</orth>
</form>
<sense>
   <cit type="translation" xml:lang="en"><quote>big</quote></cit>
</sense>
<sense>
   <cit type="translation" xml:lang="en">
      <quote>old <seg type="hint">of persons</seg></quote>
   </cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

6.3 Translating examples

Translation equivalents of examples are indicated in cit/quote constructions.

<cit xml:id="tislam_ideeki__001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">tislam idēki! - ʔaḷḷāh yisallimak.</quote>
   <cit type="translation" xml:lang="en"><quote>Thank you! - Not at all.</quote></cit>
</cit>
                        
A Machine-readable Dictionary of Egyptian Arabic

Sometimes it is necessary to add literal translations. These are encoded in an analogous manner, as a cit element of type literalTranslation.

<cit xml:id="tislam_ideeki__001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">tislam idēki! - ʔaḷḷāh yisallimak.</quote>
   <cit type="translation" xml:lang="en"><quote>Thank you! - Not at all.</quote></cit>
   <cit type="literalTranslation" xml:lang="en">
      <quote>May your hands be healthy! - May God keep you healthy.</quote>
   </cit>
</cit>
                        
A Machine-readable Dictionary of Egyptian Arabic
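Since free and literal translations differ only in the type of the nested cit, a tool can separate them with a single pass over the example's children. A minimal Python sketch with xml.etree.ElementTree, assuming the standard TEI namespace; the function name example_translations is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The example from above, including its literal translation.
SAMPLE = """<cit xmlns="http://www.tei-c.org/ns/1.0" xml:id="tislam_ideeki__001" type="example">
  <quote xml:lang="ar-arz-x-cairo-vicavTrans">tislam idēki! - ʔaḷḷāh yisallimak.</quote>
  <cit type="translation" xml:lang="en"><quote>Thank you! - Not at all.</quote></cit>
  <cit type="literalTranslation" xml:lang="en">
    <quote>May your hands be healthy! - May God keep you healthy.</quote>
  </cit>
</cit>"""


def example_translations(example: ET.Element) -> dict[str, list[tuple[str, str]]]:
    """Group the cits nested directly in an example by type: {type: [(lang, text), ...]}."""
    grouped: dict[str, list[tuple[str, str]]] = {}
    for cit in example.findall("tei:cit", TEI):
        text = cit.findtext("tei:quote", namespaces=TEI)
        grouped.setdefault(cit.get("type"), []).append((cit.get(XML_LANG), text))
    return grouped
```

Keeping the language code with each text allows the same function to handle examples that are translated into several target languages.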

6.4 Definitions

When the translation of a term is uncommon or not easily understood in the target language, it is common practice to explain the item instead of, or in addition to, translating it. Such explanations can be understood as same-language ‘translations’. In TEI, the def (“definition”) element is used to encode this part of a dictionary entry.

...
<sense>
   <def xml:lang="en">a sweet dessert made of semolina, butter, sugar and rosewater</def>
   <def xml:lang="de">Süßigkeit aus Grieß, Butter, Zucker und Rosenwasser</def>
   <cit type="translation" xml:lang="en"><quote>Basbusa</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

Very often lexical items are particular to the culture of the source language and do not have adequate equivalents in a target language. In such cases, it is important not to enter definitions or explanations in the cit/quote element. Wherever possible, we have tried to furnish translations (very often transliterations) even though they might not be very common in the target language. Explanations have to go into the def element.

The above example shows such a case. In principle, def can be used to encode any information related to ‘meaning’ that does not qualify as a translation in the narrower sense. In the following example, the def element simply explains what the place name refers to.

...
<sense>
   <def xml:lang="en">the southernmost of Egypt’s western oases</def>
   <cit type="translation" xml:lang="en"><quote>Kharga</quote></cit>
   ...
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

6.5 Names

How to write/transliterate place names and person names is an age-old problem. When several graphematic variants exist, we attempt to choose the most common one.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">šubra</orth>
</form>
...
<sense>
   <def xml:lang="en">a residential area in Cairo</def>
   <cit type="translation" xml:lang="en"><quote>Shubra</quote></cit>
   <cit type="translation" xml:lang="de"><quote>Schubra</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

6.6 Synonyms

Synonyms are encoded as pointers to other entries in the dictionary. They always have to be encoded inside sense elements.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔaṣl-an</orth>
</form>
...
<sense>
   <xr type="syn">
      <ref xml:lang="ar-arz-x-cairo-vicavTrans">fi l-ʔaṣl</ref>
   </xr>
   <cit type="translation" xml:lang="en"><quote>initially</quote></cit>
   <cit type="translation" xml:lang="en"><quote>originally</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic
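Since the ref inside xr[@type='syn'] carries only the headword text, a tool has to match it against the headwords of other entries. The following Python sketch does this by exact string comparison; the matching strategy, the function name resolve_synonyms and the entry IDs asl_an_001 and fi_l_asl_001 are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entry IDs; headwords and the synonym pointer follow the example above.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="entries">
  <entry xml:id="asl_an_001">
    <form type="lemma"><orth>ʔaṣl-an</orth></form>
    <sense><xr type="syn"><ref>fi l-ʔaṣl</ref></xr></sense>
  </entry>
  <entry xml:id="fi_l_asl_001">
    <form type="multiWordUnit"><orth>fi l-ʔaṣl</orth></form>
  </entry>
</div>"""


def resolve_synonyms(div: ET.Element) -> dict[str, list[Optional[str]]]:
    """Map each entry ID to the IDs of the entries its synonym refs name (None if unmatched)."""
    entries = div.findall("tei:entry", TEI)
    # The headword is taken from the orth of the entry's first form (lemma or MWU).
    id_by_headword = {
        e.findtext("tei:form/tei:orth", namespaces=TEI): e.get(XML_ID) for e in entries
    }
    return {
        e.get(XML_ID): [
            id_by_headword.get(ref.text)
            for ref in e.findall("tei:sense/tei:xr[@type='syn']/tei:ref", TEI)
        ]
        for e in entries
    }
```

A None in the result flags a synonym whose headword has no entry of its own, which is a useful check when dictionaries are edited by several people.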

7 Grammatical information

The gramGrp element can accommodate a wide range of grammatical information, such as the word class (pos: part of speech), the consonantal root and/or the verb class.

<entry xml:id="badal_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">badal</orth>
   </form>
   <gramGrp>
      <gram type="pos">verb</gram>
      <gram type="root">bdl</gram>
      <gram type="derivedVerbClass">III</gram>
   </gramGrp>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

The gramGrp element can appear in two places. When the information refers to the lemma as a whole, it is placed after the form[@type=lemma] element. In many cases, however, the grammatical information refers only to particular senses; it is then placed inside the sense element as its first child. For examples of the second case, see the sections Arguments and Constructions.
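For processing, the two positions translate into two paths: entry/gramGrp for lemma-level and entry/sense/gramGrp for sense-level information. A minimal Python sketch with xml.etree.ElementTree; the function names are hypothetical, and the sample is based on the dawwaṛ example from the Arguments section, with a lemma-level gramGrp added for illustration. Results are lists of pairs rather than dictionaries, so repeated types (such as two root elements in different scripts) are kept.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Based on the dawwaṛ example (Arguments section); the top-level gramGrp is added for illustration.
SAMPLE = """<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form type="lemma"><orth>dawwaṛ</orth></form>
  <gramGrp><gram type="pos">verb</gram></gramGrp>
  <sense>
    <gramGrp><gram type="arguments">ʕala</gram></gramGrp>
    <cit type="translation" xml:lang="en"><quote>to look for, to search</quote></cit>
  </sense>
</entry>"""


def grams(parent: ET.Element) -> list[tuple[str, str]]:
    """(type, value) pairs of the gramGrp directly inside parent."""
    return [(g.get("type"), g.text) for g in parent.findall("tei:gramGrp/tei:gram", TEI)]


def grammar_by_level(entry: ET.Element):
    """Split grammatical information into lemma-level and per-sense parts."""
    lemma_level = grams(entry)
    sense_level = [grams(sense) for sense in entry.findall("tei:sense", TEI)]
    return lemma_level, sense_level
```

Because findall with a relative path only looks at direct children, the sense-level gramGrp never leaks into the lemma-level result.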

7.1 Word class information

The most common POS labels are listed in the following table. Most of them are self-explanatory.

Label Explanation
adjective Adjective
noun Noun
ordinal Ordinal number
particle Particle
pluralNoun A plural noun that has an entry of its own. This does not necessarily mean that the singular does not exist, but that the plural displays semantic particularities.
verb Verb

The labels ideally correspond to ISOcat concepts. However, there are exceptions:

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">ʕarabi</orth>
</form>
<gramGrp>
   <gram type="pos">glottonym</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">ʕrb</gram>
</gramGrp>

...
A Digital Dictionary of Damascus Arabic

7.2 Roots

Roots are indicated in accordance with etymology. The root of the Arabic equivalent of ‘to yawn’ is encoded like this:

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔittāwib</orth>
</form>
<gramGrp>
   <gram type="pos">verb</gram>
   <gram type="derivedVerbClass">I-t</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">ṯʔb</gram>
</gramGrp>
...
A Machine-readable Dictionary of Egyptian Arabic

Loanwords of the shape CāC(a) are invariably assigned the root CʔC.

Word Root
bāṣ bʔṣ
ḍāma ḍʔm
kār kʔr

Other loans are reduced to their consonantal skeleton.
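The two loanword rules above can be sketched in Python as follows. The function name loan_root and the vowel inventory are assumptions. The guidelines do not say whether geminate consonants are reduced to a single consonant in the skeleton; this sketch keeps them as written.

```python
import re
import unicodedata

# Assumed vowel inventory of the VICAV transcriptions (short, long, reduced).
VOWELS = "aeiouāēīōūǝᵊ"
# CāC or CāCa: consonant, long ā, consonant, optional final -a.
CAC_PATTERN = re.compile(rf"^([^{VOWELS}])ā([^{VOWELS}])a?$")


def loan_root(word: str) -> str:
    """Hypothetical helper applying the two loanword rules described above."""
    # Precomposed characters (NFC) so that "ā" and "ḍ" count as single code points.
    word = unicodedata.normalize("NFC", word)
    match = CAC_PATTERN.match(word)
    if match:
        # Rule 1: CāC(a) loans are assigned the root CʔC.
        return f"{match.group(1)}ʔ{match.group(2)}"
    # Rule 2: otherwise keep only the consonantal skeleton (drop vowel letters).
    return "".join(ch for ch in word if ch not in VOWELS)
```

Applied to the words in the table above, the first rule reproduces the listed roots: bāṣ gives bʔṣ, ḍāma gives ḍʔm and kār gives kʔr.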

In multi-word units, the roots of the individual items are separated by blanks.

...
<form type="multiWordUnit">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">rās ǝž-žabal</orth>
</form>
<gramGrp>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">rʔs ǧbl</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic

The following table contains a list of special cases.

Word Root
mayy, mayye, māyya, mā ... mwh
sana ‘year’ sn
istanna ʔny
kam km
qaddēš, ʔaddēš, ... qdr
ʔayn, wēn, fēn, ... ʔyn
ʔēmta, ʔimta, ... mty
šī, šuwayy ... šyʔ
walla ‘or’ w ʔly
ʔillā ʔly

Prepositions are dealt with in the following manner:

Word Root
bi b
li l
ʔilā ʔly
ʕalā ʕly
maʕa
fy

7.3 Grammatical gender

Gender is indicated only for feminine common and proper nouns that are not morphologically marked as feminine.

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">šamᵊs</orth>
</form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="gender">feminine</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">šms</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic
...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">zēnab</orth>
</form>
<gramGrp>
   <gram type="pos">properNoun</gram>
   <gram type="gender">feminine</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">zynb</gram>
</gramGrp>
...
A Machine-readable Dictionary of Egyptian Arabic

7.4 Voice

When the passive voice of a verb displays special developments, it can be treated as a lemma in its own right.

...
<form type="lemma">
   <orth xml:lang="ar-x-DMG">šufiya</orth>
   <orth xml:lang="ar">شفي</orth>
</form>
<gramGrp>
   <gram type="pos">verb</gram>
   <gram type="derivedVerbClass">I</gram>
   <gram type="voice">passive</gram>
   <gram type="root" xml:lang="ar">شفي</gram>
   <gram type="root" xml:lang="ar-x-DMG">šfy</gram>
</gramGrp>
...
A Machine-readable Dictionary of Modern Standard Arabic

7.5 Derived verb classes

In the VICAV dictionaries, we apply a mixed system of labels, mainly using those traditionally employed in Arabic linguistics. For cases not covered by this system, we use labels modelled on Woidich 2006 (Das Kairenisch-Arabische).

Traditional   Woidich    Example
I                        katab
II                       darris
III                      zākir
IV                       ʔalqa
              t-I        ʔitkatab
V                        ʔitdarris
VI                       ʔitdāra
VIw                      tlūṣiq
VII                      ʔinṣaṛaf
VIII                     ʔintaẓaṛ
IX                       ʔiḥmaṛṛ
X             ista-I     ʔistaxdim
              ista-II    ʔistirayyaḥ
              ista-III   ʔistabārik

Quadriliteral verbs are assigned the values Iq (=CaCCaC) and IIq (=taCaCCaC).

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">tbahdal</orth>
</form>
<gramGrp>
   <gram type="pos">verb</gram>
   <gram type="derivedVerbClass">IIq</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">bhdl</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic

7.6 Plural nouns

Nouns that are used only in the plural are encoded with the POS label pluralNoun.

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">ḥǝlwīyāt</orth>
</form>
<gramGrp>
   <gram type="pos">pluralNoun</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">ḥlw</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic

7.7 Collective nouns

Collective nouns are usually recorded together with their singulative (nomen unitatis) and plural forms.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">baṣal</orth>
</form>
<gramGrp>
   <gram type="pos">collectiveNoun</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">bṣl</gram>
</gramGrp>
<form type="inflected" ana="#n_unit">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">baṣala</orth>
</form>
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">baṣalāt</orth>
</form>
...
A Machine-readable Dictionary of Egyptian Arabic

If a collective noun has no singulative, this is recorded as follows:

...
<form type="lemma">
   <orth xml:lang="ar-aeb-x-tunis-vicav">sfinnārya</orth>
</form>
<gramGrp>
   <gram type="pos">collectiveNoun</gram>
   <gram type="usg">has no unit noun</gram>
   <colloc type="countNoun" xml:lang="ar-aeb-x-tunis-vicav">kaʕba</colloc>
   <gram type="root" xml:lang="ar-aeb-x-tunis-vicav">sfnry</gram>
</gramGrp>
<sense>
   <cit type="translation" xml:lang="en"><quote>carrots</quote></cit>
   ...
</sense>
...
A Digital Dictionary of Tunis Arabic

7.8 Elatives

Elatives should be registered under the respective positive forms.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">kuwayyis</orth>
</form>
<form type="inflected" ana="#adj_sg_f">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">kuwayyisa</orth>
</form>
<form type="inflected" ana="#adj_pl">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">kuwayyisīn</orth>
</form>
<form type="inflected" ana="#adj_elative">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔaḥsan</orth>
</form>
...
A Machine-readable Dictionary of Egyptian Arabic

In some cases it may make sense to treat a particular elative as a lexeme in its own right.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔaḥsan</orth>
</form>
<gramGrp>
   <gram type="pos">elative</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">ḥsn</gram>
</gramGrp>
<sense>
   <cit type="translation" xml:lang="en"><quote>better</quote></cit>
   <ptr type="example" target="izzayyi_abuuk__001"/>
   <ptr type="example" target="ahsan_min_balaash_001"/>
   <ptr type="example" target="bukra_tib_a_ahsan_001"/>
   <ptr type="example" target="ahsan_haaga_taaxud_taaks_001"/>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

7.9 Count nouns

Many languages have special words that are placed between a numeral and the counted noun. This word class, also referred to as classifiers, exists in spoken Arabic varieties as well, although it is not as pervasive there as in many eastern Indo-European languages or in the languages of East Asia (Chinese, Korean, Japanese).

If a noun is combined with a count noun when it is counted with numerals, this is indicated in a colloc element of type countNoun.

...
<form type="lemma">
   <orth xml:lang="ar-aeb-x-tunis-vicav">brīk</orth>
   <bibl>Ritt-Benmimoun 2014</bibl>
   <bibl>Singer 1984, p. 67</bibl>
</form>
<gramGrp>
   <gram type="pos">collectiveNoun</gram>
   <gram type="root" xml:lang="ar-aeb-x-tunis-vicav">brk</gram>
   <colloc type="countNoun" xml:lang="ar-aeb-x-tunis-vicav">kaʕba</colloc>
</gramGrp>
...
A Digital Dictionary of Tunis Arabic

An analogous example is this:

...
<form type="lemma">
   <orth xml:lang="ar-aeb-x-tunis-vicav">bṣal</orth>
   <bibl>Ritt-Benmimoun 2014</bibl>
</form>
<gramGrp>
   <gram type="pos">collectiveNoun</gram>
   <gram type="root" xml:lang="ar-aeb-x-tunis-vicav">bṣl</gram>
   <colloc type="countNoun" xml:lang="ar-aeb-x-tunis-vicav">ṛāṣ</colloc>
</gramGrp>
...
A Digital Dictionary of Tunis Arabic

7.10 Invariable nouns and adjectives

Many Arabic dialects have nominals that do not display feminine or plural forms. These are identified by a gram element of type morphType with the value invariable.

...
<gramGrp>
   <gram type="pos">adjective</gram>
   <gram type="morphType">invariable</gram>
</gramGrp>
...

7.11 Participles

Adjectives and nouns that are participles derived from verbs are furnished with an additional gram element of type morph, whose value specifies the kind of participle.

<entry xml:id="mashhur_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">mašhūr</orth>
   </form>
   <gramGrp>
      <gram type="pos">adjective</gram>
      <gram type="morph">passiveParticiple</gram>
      <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">šhr</gram>
   </gramGrp>
   ...
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic
...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">mǝtʕāwen</orth>
</form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="morph">activeParticiple</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">ʕwn</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic

7.12 Pragmatics

In many cases, it is necessary to provide information about the contexts and situations in which lexical items are used. Such information is typically placed in sense elements, encoded as usg elements of type prag.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔafandim</orth>
</form>
<sense>
   <usg type="prag" xml:lang="en">respectful form of address to a man or a woman</usg>
   <usg type="prag" xml:lang="de">höfliche Anrede an einen Mann oder eine Frau</usg>
   ...
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

If you are not sure whether to use a usg or a def element, ask yourself whether the information can be phrased as “used as ...” or “used in ...”. In the ʔafandim example above, the usage label could be reformulated as “used as a respectful form of address ...”.

Another example:

...
<sense>
   <usg type="prag" xml:lang="en">to attract the evil eye by talking about something</usg>
   <cit type="translation" xml:lang="en"><quote>to jinx</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

The following code snippet illustrates a fairly complex case. The noun has two mutually exclusive morphological forms (absolute and construct state), each tied to one sense. The grammatical information in both senses is corroborated by examples.

...
<form type="lemma">
   <orth xml:lang="ar-arz-x-cairo-vicavTrans">maṛa</orth>
</form>
...
<sense>
   <usg type="prag" xml:lang="en">often derogatory</usg>
   <usg xml:lang="en">absolute state only</usg>
   <cit type="translation" xml:lang="en"><quote>woman</quote></cit>
   <ptr type="example" target="inti_ya_mara_tacaali_001"/>
</sense>
<sense>
   <usg xml:lang="en">construct state only</usg>
   <cit type="translation" xml:lang="en"><quote>wife</quote></cit>
   <ptr type="example" target="miraatu_ga_lha_walad_001"/>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

Note that the usg element can also be used in sample sentences.

...
<cit xml:id="sharraftuuna_allaah_yisharraf_mi_daarak_001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">šaṛṛaftūna. - ʔaḷḷāh yišaṛṛaf miʔdāṛak.</quote>
   <usg type="prag" xml:lang="en">This phrase is used upon leaving.</usg>
   <cit type="literalTranslation" xml:lang="en">
      <quote>You have honoured us (i.e. by visiting us). - May God honour your worth.</quote>
   </cit>
</cit>
...
A Machine-readable Dictionary of Egyptian Arabic

7.13 Functional constraints

Usage labels, encoded as gram elements of type usg, can also be used to indicate functional constraints.

...
<form>
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">la-ḥāl-</orth>
</form>
<gramGrp>
   <gram type="usg">only with pronominal suffixes</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">l ḥwl</gram>
</gramGrp>
...
A Digital Dictionary of Damascus Arabic

8 Language identifiers

The values of the xml:lang attributes comply with Best Current Practice 47 (BCP 47), which in turn refers to and aggregates a number of ISO standards (ISO 639-1, ISO 639-2, ISO 639-3, ISO 15924, ISO 3166). The labels reflect a hybrid system that indicates both the linguistic variety and the writing system.

Value Explanation Used in dictionary
de German
en English
ar-aeb-x-tunis-vicav Tunis Arabic, VICAV transcription aeb_eng_001__v001
ar-arz-x-cairo-vicavTrans Cairo Arabic, VICAV transcription arz_eng_006
ar-arz-x-cairo-arabic Cairo Arabic, Arabic script arz_eng_006
ar-apc-x-damascus-vicavTrans Damascus Arabic, VICAV transcription apc_eng_002
ar-ary-x-sale-vicavTrans Sale Arabic, VICAV transcription ary_s_rabat_eng_002
ar-ary-x-vicavTrans Moroccan Arabic, VICAV transcription ary_eng_001
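Because the city and the transcription system are carried in the private-use part of these tags (after -x-), tools may need to take the tags apart. The following Python sketch handles only tags of the shape used in the table above (language, optional three-letter extended language subtag, optional private-use subtags); it is a simplification, not a full BCP 47 parser, and the names LangTag and parse_lang are hypothetical.

```python
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    extlang: Optional[str]
    private_use: list[str]


def parse_lang(tag: str) -> LangTag:
    """Split a tag like 'ar-arz-x-cairo-vicavTrans' into its main parts.

    Simplified: scripts, regions, variants etc. of the full BCP 47 grammar
    are not handled.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]  # the singleton "x" is case-insensitive
    if "x" in lowered:
        cut = lowered.index("x")
        public, private = subtags[:cut], subtags[cut + 1:]
    else:
        public, private = subtags, []
    language = public[0]
    # A second three-letter alphabetic subtag is read as an extended language subtag (e.g. arz).
    extlang = public[1] if len(public) > 1 and len(public[1]) == 3 and public[1].isalpha() else None
    return LangTag(language, extlang, private)
```

For example, parse_lang("ar-arz-x-cairo-vicavTrans") separates the variety (arz) from the private-use subtags cairo and vicavTrans, while a plain tag such as en has neither.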

9 Etymologies

Etymologies are encoded by means of the etym element. According to our schema, these also have to be top-level elements which are placed after the gramGrp element.

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔaṣanṣēr</orth>
</form>
<gramGrp>...</gramGrp>
<etym>loanword <lang>French</lang> <mentioned>ascenseur</mentioned></etym>
...
A Digital Dictionary of Damascus Arabic

10 Advanced issues

10.1 One entry or two entries?

The system is designed to keep hierarchies flat.

10.1.1 Feminine derivations

Feminine forms of substantives are encoded as separate entries.

<entry xml:id="gaar_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">gāṛ</orth>
   </form>
   ...
   <sense>
      <cit type="translation" xml:lang="en"><quote>neighbour</quote></cit>
   </sense>
</entry>
<entry xml:id="gaara_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">gāṛa</orth>
   </form>
   ...
   <sense>
      <cit type="translation" xml:lang="en"><quote>neighbour (female)</quote></cit>
   </sense>
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

10.1.2 Homonyms

Homonyms with diverging morphological forms are also encoded as separate entries.

<entry xml:id="beet_000">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">bēt</orth>
   </form>
   <gramGrp>
      <gram type="pos">noun</gram>
      <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">byt</gram>
   </gramGrp>
   <form type="inflected" ana="#n_pl">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">biyūt</orth>
   </form>
   <sense>
      <cit type="translation" xml:lang="en"><quote>house, home</quote></cit>
   </sense>
</entry>
<entry xml:id="beet_001">
   <form type="lemma">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">bēt</orth>
   </form>
   ...
   <form type="inflected" ana="#n_pl">
      <orth xml:lang="ar-arz-x-cairo-vicavTrans">ʔabyāt</orth>
   </form>
   <sense>
      <cit type="translation" xml:lang="en"><quote>verse (in poem)</quote></cit>
   </sense>
</entry>
                        
A Machine-readable Dictionary of Egyptian Arabic

10.1.3 Nominals ending in -i

In Arabic, there is often no clear boundary between nouns and adjectives. The issue is particularly tricky with nominals derived by means of the nisba suffix. In the VICAV dictionaries, these are assigned to four categories: adjectives, masculine nouns, feminine nouns and the special case of glottonyms. By adjectives, we understand nominals that are mostly used attributively and usually display masculine and feminine forms:

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">sūri</orth>
</form>
<gramGrp>
   <gram type="pos">adjective</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">swr</gram>
</gramGrp>
<form type="inflected" ana="#n_f">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">sūriyye</orth>
</form>
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">sūriyyīn</orth>
</form>
<sense>
   <cit type="translation" xml:lang="en"><quote>Syrian</quote></cit>
   ...
</sense>
...
A Digital Dictionary of Damascus Arabic

Cases such as the following are categorised as nouns:

<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">šāmi</orth>
</form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">šʔm</gram>
</gramGrp>
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">šwām</orth>
</form>
<sense>
   <cit type="translation" xml:lang="en"><quote>Damascene</quote></cit>
   <cit type="translation" xml:lang="de"><quote>Damaszener</quote></cit>
   <cit type="translation" xml:lang="es"><quote>damasceno, de Damasco</quote></cit>
</sense>
...
A Digital Dictionary of Damascus Arabic

The respective feminine form looks like this:

<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">šāmiyye</orth>
</form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">šʔm</gram>
</gramGrp>
<form type="inflected" ana="#n_pl">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">šāmiyyāt</orth>
</form>
<sense>
   <cit type="translation" xml:lang="en"><quote>Damascene woman</quote></cit>
   <cit type="translation" xml:lang="de"><quote>Damaszenerin</quote></cit>
   <cit type="translation" xml:lang="es"><quote>damascena, de Damasco</quote></cit>
</sense>
...
A Digital Dictionary of Damascus Arabic

Ethnonyms, demonyms and similar proper nouns are also placed in separate entries:

...
<form type="lemma">
   <orth xml:lang="ar-apc-x-damascus-vicavTrans">ʕarabi</orth>
</form>
<gramGrp>
   <gram type="pos">glottonym</gram>
   <gram type="root" xml:lang="ar-apc-x-damascus-vicavTrans">ʕrb</gram>
</gramGrp>
<sense>
   <usg type="dom">linguistics</usg>
   <cit type="translation" xml:lang="en"><quote>Arabic</quote></cit>
</sense>
...
A Digital Dictionary of Damascus Arabic

10.2 Diptosy

This phenomenon concerns only the MSA dictionary. Diptosy is indicated either on the lemma or on plural forms. In the first case, the information is stored in the gramGrp element.

...
<form type="lemma">
   <orth xml:lang="ar-x-DMG">ˀazraq</orth>
   <orth xml:lang="ar">ازرق</orth>
</form>
<gramGrp>
   <gram type="pos">adjective</gram>
   <gram type="morphType">diptotic</gram>
   ...
</gramGrp>
...
A Machine-readable Dictionary of Modern Standard Arabic

For plural forms, we use the special value n_diptPl in the ana attribute of the inflected form.

...
<form type="inflected" ana="#n_diptPl"><orth xml:lang="ar-x-DMG">rasāˀil</orth><orth xml:lang="ar">رسائل</orth></form>
...
A Machine-readable Dictionary of Modern Standard Arabic
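Both ways of marking diptosy can be checked in one pass. The Python sketch below is a minimal illustration under these assumptions: the two snippets above are wrapped in hypothetical entry elements and parsed without the TEI namespace, the second entry contains only the plural form, and the helper names dmg and diptotic_forms are invented for this example.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Abridged MSA snippets from above; the <entry> wrappers are assumptions.
LEMMA_ENTRY = """<entry>
  <form type="lemma"><orth xml:lang="ar-x-DMG">ˀazraq</orth><orth xml:lang="ar">ازرق</orth></form>
  <gramGrp><gram type="pos">adjective</gram><gram type="morphType">diptotic</gram></gramGrp>
</entry>"""
PLURAL_ENTRY = """<entry>
  <form type="inflected" ana="#n_diptPl"><orth xml:lang="ar-x-DMG">rasāˀil</orth><orth xml:lang="ar">رسائل</orth></form>
</entry>"""


def dmg(form):
    """Return the DMG transcription of a form element, or None."""
    for orth in form.findall("orth"):
        if orth.get(XML_LANG) == "ar-x-DMG":
            return orth.text
    return None


def diptotic_forms(entry):
    """List (kind, transcription) pairs for every form marked as diptotic."""
    found = []
    # Case 1: diptosy of the lemma is stated in the entry-level gramGrp.
    grams = entry.findall("gramGrp/gram[@type='morphType']")
    if any(g.text == "diptotic" for g in grams):
        lemma = entry.find("form[@type='lemma']")
        if lemma is not None:
            found.append(("lemma", dmg(lemma)))
    # Case 2: diptotic plurals carry n_diptPl in their ana attribute
    # (ana holds a whitespace-separated list of pointers).
    for form in entry.findall("form[@type='inflected']"):
        if "#n_diptPl" in form.get("ana", "").split():
            found.append(("plural", dmg(form)))
    return found


print(diptotic_forms(ET.fromstring(LEMMA_ENTRY)))   # [('lemma', 'ˀazraq')]
print(diptotic_forms(ET.fromstring(PLURAL_ENTRY)))  # [('plural', 'rasāˀil')]
```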

10.3 Arguments (Go to contents)

The relation between a lexical item and its dependents is also encoded using the gramGrp element. In contrast to the cases dealt with above, this gramGrp element is not placed at the top level of the entry but inside sense elements.

...
<form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">dawwaṛ</orth></form>
<sense>
   <gramGrp><gram type="arguments" xml:lang="ar-apc-x-damascus-vicavTrans">ʕala</gram></gramGrp>
   <cit type="translation" xml:lang="en"><quote>to look for, to search</quote></cit>
</sense>
...
A Digital Dictionary of Damascus Arabic
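Because argument information sits inside sense rather than in the entry-level gramGrp, it has to be read sense by sense. A minimal Python sketch, assuming the snippet above is wrapped in a hypothetical entry element and parsed without the TEI namespace; sense_arguments is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

ENTRY = """<entry>
  <form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">dawwaṛ</orth></form>
  <sense>
    <gramGrp><gram type="arguments" xml:lang="ar-apc-x-damascus-vicavTrans">ʕala</gram></gramGrp>
    <cit type="translation" xml:lang="en"><quote>to look for, to search</quote></cit>
  </sense>
</entry>"""


def sense_arguments(entry):
    """Return, per sense, its argument markers and its English translations.

    Only gramGrp elements that are direct children of <sense> are read, so
    entry-level information (part of speech, root, ...) is not picked up.
    """
    result = []
    for sense in entry.findall("sense"):
        args = [g.text for g in sense.findall("gramGrp/gram[@type='arguments']")]
        english = [
            cit.findtext("quote")
            for cit in sense.findall("cit[@type='translation']")
            if cit.get(XML_LANG) == "en"
        ]
        result.append({"arguments": args, "en": english})
    return result


print(sense_arguments(ET.fromstring(ENTRY)))
# [{'arguments': ['ʕala'], 'en': ['to look for, to search']}]
```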

10.4 Constructions (Go to contents)

Multi-word units that constitute independent dictionary entries have to be distinguished from constructions that correlate with particular senses. The latter are encoded in form elements with type="construction" placed inside the respective sense element. Consider the following example:

...
<sense>
   <form type="construction"><orth xml:lang="ar-arz-x-cairo-vicavTrans">baʔa + li- +
         <seg type="constrPart">pronSuffix | noun</seg> +
         <seg type="constrPart">timeExpression</seg></orth></form>
   ...
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

Information regarding the translation of the particular construction is represented as in other senses:

...
<sense>
   <form type="construction"><orth xml:lang="ar-arz-x-cairo-vicavTrans">baʔa + li- +
         <seg type="constrPart">pronSuffix | noun</seg> +
         <seg type="constrPart">timeExpression</seg></orth></form>
   <cit type="translation" xml:lang="en"><quote>since, for</quote></cit>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic
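Since slots are marked with seg type="constrPart" and alternative fillers are separated by |, a construction can be split mechanically into fixed parts and slots. The Python sketch below assumes, as in the example above, that fixed parts are separated by + in the text of orth and that parsing is namespace-free; parse_construction and the use of tuples for slots are illustrative choices, not part of the guidelines.

```python
import xml.etree.ElementTree as ET

SENSE = """<sense>
  <form type="construction"><orth xml:lang="ar-arz-x-cairo-vicavTrans">baʔa + li- +
      <seg type="constrPart">pronSuffix | noun</seg> +
      <seg type="constrPart">timeExpression</seg></orth></form>
  <cit type="translation" xml:lang="en"><quote>since, for</quote></cit>
</sense>"""


def parse_construction(orth):
    """Split a construction into fixed strings and slots.

    Fixed material is returned as str; each constrPart slot is returned as a
    tuple of its alternative categories.
    """
    parts = []

    def add_fixed(text):
        # Fixed parts are separated by "+"; whitespace-only pieces are dropped.
        for piece in (text or "").split("+"):
            if piece.strip():
                parts.append(piece.strip())

    add_fixed(orth.text)
    for child in orth:
        if child.tag == "seg" and child.get("type") == "constrPart":
            parts.append(tuple(alt.strip() for alt in (child.text or "").split("|")))
        add_fixed(child.tail)
    return parts


sense = ET.fromstring(SENSE)
pattern = parse_construction(sense.find("form[@type='construction']/orth"))
print(pattern)  # ['baʔa', 'li-', ('pronSuffix', 'noun'), ('timeExpression',)]
print(sense.findtext("cit[@type='translation']/quote"))  # since, for
```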

One sense can have several instantiations of a construction.

...
<form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔalᵊb</orth></form>
...
<sense>
   <cit type="translation" xml:lang="en"><quote>heart</quote></cit>
   ...
</sense>
<sense>
   <form type="construction"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔalb +
         <seg type="constrPart">pronSuffix</seg> + ṭayyeb
      </orth></form>
   <form type="construction"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔlūb +
         <seg type="constrPart">pronSuffix</seg> + ṭayybīn
      </orth></form>
   <cit type="translation" xml:lang="en"><quote>to be kind-hearted</quote></cit>
   ...
</sense>
...
A Digital Dictionary of Damascus Arabic
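A sense may therefore contain several construction forms, and any tool reading it should collect all of them rather than only the first. The Python sketch below illustrates this with an abridged copy of the ʔalb entry, parsed without the TEI namespace; render and constructions_by_sense are hypothetical helpers, and the angle brackets in the output merely mark slots for display.

```python
import xml.etree.ElementTree as ET

# Abridged ʔalb entry from above (further senses omitted).
ENTRY = """<entry>
  <form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔalᵊb</orth></form>
  <sense><cit type="translation" xml:lang="en"><quote>heart</quote></cit></sense>
  <sense>
    <form type="construction"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔalb +
      <seg type="constrPart">pronSuffix</seg> + ṭayyeb</orth></form>
    <form type="construction"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʔlūb +
      <seg type="constrPart">pronSuffix</seg> + ṭayybīn</orth></form>
    <cit type="translation" xml:lang="en"><quote>to be kind-hearted</quote></cit>
  </sense>
</entry>"""


def render(orth):
    """Flatten a construction, marking each constrPart slot as <category>."""
    text = orth.text or ""
    for child in orth:
        text += "<" + (child.text or "") + ">" + (child.tail or "")
    return " ".join(text.split())  # normalise line breaks and indentation


def constructions_by_sense(entry):
    """Return (sense number, construction patterns, translations) triples."""
    rows = []
    for number, sense in enumerate(entry.findall("sense"), start=1):
        patterns = [render(o) for o in sense.findall("form[@type='construction']/orth")]
        translations = [q.text for q in sense.findall("cit[@type='translation']/quote")]
        rows.append((number, patterns, translations))
    return rows


for row in constructions_by_sense(ET.fromstring(ENTRY)):
    print(row)
# (1, [], ['heart'])
# (2, ['ʔalb + <pronSuffix> + ṭayyeb', 'ʔlūb + <pronSuffix> + ṭayybīn'], ['to be kind-hearted'])
```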

An analogous case is the following:

...
   <form type="lemma"><orth xml:lang="ar-arz-x-cairo-vicavTrans">ʕand</orth></form>
   ...
   <sense><cit type="translation" xml:lang="en"><quote>with, by, next to</quote></cit></sense>
   <sense>
      <form type="construction"><orth xml:lang="ar-arz-x-cairo-vicavTrans">ʕand +
            <seg type="constrPart">pronSuffix</seg></orth></form>
      <cit type="translation" xml:lang="en"><quote>to have</quote></cit>
   </sense>
...
A Machine-readable Dictionary of Egyptian Arabic

10.5 Constructions vs. sample sentences (Go to contents)

It is not always easy to decide whether to put information into a construction or a sample sentence. By construction, we understand strings of words with variable elements. They can be conceived as patterns with particular slots holding variables; sample sentences are then particular instantiations of such a pattern. In the following entry, the slot after ʕla can be filled by a pronominal suffix or a noun:

<entry xml:id="tbaarek_001">
   <form type="lemma"><orth xml:lang="ar-ary-x-sale-vicavTrans">tbāṛek</orth><orth xml:lang="ar-ary-x-sale-vicavTrans">تبارك</orth></form>
   <gramGrp><gram type="root" xml:lang="ar-ary-x-sale-vicavTrans">brk</gram></gramGrp>
   <sense>
      <form type="construction"><orth xml:lang="ar-ary-x-sale-vicavTrans">tbāṛek + aḷḷāh + ʕla +
            <seg type="constrPart">pronSuffix | noun</seg></orth></form>
      <cit type="translation" xml:lang="en"><quote>how nice is ...! how wonderful ...!</quote></cit>
      <cit type="translation" xml:lang="de"><quote>-</quote></cit>
   </sense>
</entry>
                        
A Machine-readable Dictionary of Rabat Arabic

10.6 Register (Go to contents)

Information concerning the level of formality is tagged with a usg element bearing a type="reg" attribute. It may occur in three positions within an entry:

Information qualifies ...   Tag is placed in ...   Scope
lemma                       gramGrp                The information refers to the lexical item as a whole.
other word forms            form                   The information refers to a particular form only.
sense                       sense                  The information refers to a particular sense only.

Some words are used mainly on formal occasions.

...
<form type="lemma"><orth xml:lang="ar-arz-x-cairo-arabic">ثقافة</orth></form>
<gramGrp>
   <gram type="pos">noun</gram>
   <gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">ṯqf</gram>
   <usg type="reg">formal</usg>
</gramGrp>
...
A Machine-readable Dictionary of Egyptian Arabic

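Many lexical items have several alternative forms; many nouns, for instance, have several plural forms. This apparent overabundance can often be explained by varying degrees of formality, as with the following two participle forms: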

...
<form type="inflected" ana="#v_pp_m"><usg type="reg">informal</usg><orth xml:lang="ar-arz-x-cairo-vicavTrans">mittihim</orth></form>
<form type="inflected" ana="#v_pp_m"><usg type="reg">formal</usg><orth xml:lang="ar-arz-x-cairo-vicavTrans">muttaham</orth></form>
...
A Machine-readable Dictionary of Egyptian Arabic
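Because the three positions differ in scope, a consumer of the data has to inspect all three. The Python sketch below is a minimal illustration, assuming namespace-free parsing and abridged copies of the two snippets above wrapped in hypothetical entry elements (the second contains only the two participle forms); register_labels is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENTRIES = [
    """<entry>
  <form type="lemma"><orth xml:lang="ar-arz-x-cairo-arabic">ثقافة</orth></form>
  <gramGrp><gram type="pos">noun</gram><gram type="root" xml:lang="ar-arz-x-cairo-vicavTrans">ṯqf</gram><usg type="reg">formal</usg></gramGrp>
</entry>""",
    """<entry>
  <form type="inflected" ana="#v_pp_m"><usg type="reg">informal</usg><orth xml:lang="ar-arz-x-cairo-vicavTrans">mittihim</orth></form>
  <form type="inflected" ana="#v_pp_m"><usg type="reg">formal</usg><orth xml:lang="ar-arz-x-cairo-vicavTrans">muttaham</orth></form>
</entry>""",
]


def register_labels(entry):
    """Collect usg type="reg" labels together with the scope they apply to."""
    labels = []
    lemma = entry.findtext("form[@type='lemma']/orth")
    # Entry-level gramGrp: the label qualifies the lexical item as a whole.
    for usg in entry.findall("gramGrp/usg[@type='reg']"):
        labels.append(("lexical item", usg.text, lemma))
    # form: the label qualifies that particular form only.
    for form in entry.findall("form"):
        for usg in form.findall("usg[@type='reg']"):
            labels.append(("form", usg.text, form.findtext("orth")))
    # sense: the label qualifies that particular sense only.
    for number, sense in enumerate(entry.findall("sense"), start=1):
        for usg in sense.findall("usg[@type='reg']"):
            labels.append(("sense", usg.text, f"sense {number}"))
    return labels


for snippet in ENTRIES:
    print(register_labels(ET.fromstring(snippet)))
```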

10.7 Semantic classifications (Go to contents)

Basic semantic classifications are stored in domain labels (usg elements with type="dom"), which are placed inside sense elements, right at their beginning. A sense can be assigned multiple such labels.

...
<form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">fūl</orth></form>
...
<sense>
   <usg type="dom">food</usg>
   <usg type="dom">plants</usg>
   <cit type="translation" xml:lang="en"><quote>beans</quote></cit>
</sense>
...
A Digital Dictionary of Damascus Arabic
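Because domain labels are plain usg elements, they lend themselves to simple indexing, and their position at the start of sense can be checked automatically. A minimal Python sketch, assuming namespace-free parsing and abridged copies of the fūl and ʕarabi entries from this document inside a hypothetical div; domain_index and domains_first are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<div>
  <entry>
    <form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">fūl</orth></form>
    <sense><usg type="dom">food</usg><usg type="dom">plants</usg><cit type="translation" xml:lang="en"><quote>beans</quote></cit></sense>
  </entry>
  <entry>
    <form type="lemma"><orth xml:lang="ar-apc-x-damascus-vicavTrans">ʕarabi</orth></form>
    <sense><usg type="dom">linguistics</usg><cit type="translation" xml:lang="en"><quote>Arabic</quote></cit></sense>
  </entry>
</div>"""


def domains_first(sense):
    """Check the convention that all domain labels precede any other child of sense."""
    seen_other = False
    for child in sense:
        is_domain = child.tag == "usg" and child.get("type") == "dom"
        if is_domain and seen_other:
            return False
        if not is_domain:
            seen_other = True
    return True


def domain_index(container):
    """Map each domain label to the sorted lemmas that have a sense labelled with it."""
    index = defaultdict(set)
    for entry in container.iter("entry"):
        lemma = entry.findtext("form[@type='lemma']/orth")
        for sense in entry.findall("sense"):
            for usg in sense.findall("usg[@type='dom']"):
                index[usg.text].add(lemma)
    return {domain: sorted(lemmas) for domain, lemmas in index.items()}


root = ET.fromstring(SAMPLE)
print(domain_index(root))  # {'food': ['fūl'], 'plants': ['fūl'], 'linguistics': ['ʕarabi']}
print(all(domains_first(s) for s in root.iter("sense")))  # True
```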

11 Creating usage examples (Go to contents)

As we have seen before, examples are separate records; they are not encoded together with the lemmas. Examples are always linked to particular senses and are referenced through ptr ("pointer") elements placed at the end of the respective sense element.

...
<sense>
   <cit type="translation" xml:lang="en"><quote>to be, to become</quote></cit>
   ...
   <ptr type="example" target="yibqa_ustaz_001"/>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

The example referenced in the above ptr element looks like this:

<cit xml:id="yibqa_ustaz_001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">ḥayibʔa ustāz in šāʔ allāh.</quote>
   <cit type="translation" xml:lang="en"><quote>He will become a professor (hopefully).</quote></cit>
</cit>
                        
A Machine-readable Dictionary of Egyptian Arabic
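Since examples are separate records, displaying a sense together with its examples means following each ptr to the cit whose xml:id matches its target. The Python sketch below indexes example records by xml:id and resolves the pointers. It assumes namespace-free parsing, and it strips an optional leading # from target so that it works both with bare IDs, as in the example above, and with #-prefixed references; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

SENSE = """<sense>
  <cit type="translation" xml:lang="en"><quote>to be, to become</quote></cit>
  <ptr type="example" target="yibqa_ustaz_001"/>
</sense>"""
EXAMPLES = ["""<cit xml:id="yibqa_ustaz_001" type="example">
  <quote xml:lang="ar-arz-x-cairo-vicavTrans">ḥayibʔa ustāz in šāʔ allāh.</quote>
  <cit type="translation" xml:lang="en"><quote>He will become a professor (hopefully).</quote></cit>
</cit>"""]


def index_examples(records):
    """Map the xml:id of each example cit to the element."""
    return {cit.get(XML_ID): cit for cit in records if cit.get("type") == "example"}


def examples_for_sense(sense, examples):
    """Resolve a sense's example pointers to (example, translation) pairs."""
    resolved = []
    for ptr in sense.findall("ptr[@type='example']"):
        target = ptr.get("target", "").lstrip("#")
        cit = examples.get(target)
        if cit is None:
            raise KeyError(f"dangling example pointer: {target}")
        resolved.append((cit.findtext("quote"), cit.findtext("cit[@type='translation']/quote")))
    return resolved


examples = index_examples(ET.fromstring(x) for x in EXAMPLES)
print(examples_for_sense(ET.fromstring(SENSE), examples))
# [('ḥayibʔa ustāz in šāʔ allāh.', 'He will become a professor (hopefully).')]
```

Raising on a dangling pointer, rather than silently skipping it, makes broken links visible during checks.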

To add such a link, follow these steps:

1. Go to the example entry. The focus has to be in the editor.
2. Copy the ID to the clipboard ... by pressing F11. Make sure that this key is defined in your list of key assignments.
3. Go to the entry into which you want to insert the link.
4. Move to the insert position in the appropriate sense element; ptr should be the last element of the sense.
5. Insert the pointer ... by pressing Ctrl + V.
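When many links have to be added, the manual steps above can be approximated by a script. The Python sketch below is not part of the VICAV editing environment: link_example and its zero-based sense_index parameter are hypothetical, parsing is namespace-free, and the target is written as a bare ID to match the ptr example above. Like step 4, it appends ptr as the last child of the chosen sense and skips links that already exist.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <sense><cit type="translation" xml:lang="en"><quote>to be, to become</quote></cit></sense>
</entry>"""


def link_example(entry, sense_index, example_id):
    """Append <ptr type="example"> as the last child of the chosen sense.

    Returns the new ptr, or the existing one if the link is already present.
    """
    sense = entry.findall("sense")[sense_index]
    for ptr in sense.findall("ptr[@type='example']"):
        if ptr.get("target") == example_id:
            return ptr  # avoid duplicate links
    return ET.SubElement(sense, "ptr", {"type": "example", "target": example_id})


entry = ET.fromstring(ENTRY)
link_example(entry, 0, "yibqa_ustaz_001")
print(ET.tostring(entry, encoding="unicode"))
```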

12 Sources and responsibilities (Go to contents)

Some of our dictionaries contain bibliographic references concerning the source of particular entry components. This type of information is typically encoded in bibl elements.

...
<bibl><author>Ritt-Benmimoun</author><date>2012/2013</date></bibl>
...
<bibl><author>Singer</author><date>1958</date><biblScope unit="page">56</biblScope></bibl>
...
A Digital Dictionary of Tunis Arabic

During the production stage, abbreviated versions are permissible.

...
<bibl>Singer 1958, p.56</bibl>
...
A Digital Dictionary of Tunis Arabic

Ideally, the sources should be resolved in the header.
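The abbreviated form can be derived from the structured one. The Python sketch below assumes that the pattern "Author date, p.page" seen in the examples above is the intended abbreviation and that parsing is namespace-free; abbreviate is a hypothetical helper, and bibl elements that already hold plain text are passed through unchanged.

```python
import xml.etree.ElementTree as ET

FULL = [
    "<bibl><author>Ritt-Benmimoun</author><date>2012/2013</date></bibl>",
    '<bibl><author>Singer</author><date>1958</date><biblScope unit="page">56</biblScope></bibl>',
]
SHORT = "<bibl>Singer 1958, p.56</bibl>"


def abbreviate(bibl):
    """Render a bibl element in the abbreviated 'Author date, p.page' style."""
    author = bibl.findtext("author")
    if author is None:
        # Already abbreviated: keep the plain text content as is.
        return "".join(bibl.itertext()).strip()
    text = author
    date = bibl.findtext("date")
    if date:
        text += " " + date
    page = bibl.findtext("biblScope[@unit='page']")
    if page:
        text += ", p." + page
    return text


for snippet in FULL + [SHORT]:
    print(abbreviate(ET.fromstring(snippet)))
# Ritt-Benmimoun 2012/2013
# Singer 1958, p.56
# Singer 1958, p.56
```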

12.1 ... of forms (Go to contents)

bibl may be embedded in form elements. A form element can contain several bibl elements.

...
<form type="lemma">
   <orth xml:lang="ar-aeb-x-tunis-vicav">markaz</orth>
   <bibl>Ritt-Benmimoun 2014</bibl>
   <bibl>Singer 1958, p.34</bibl>
</form>
...
A Digital Dictionary of Tunis Arabic

12.2 ... of examples (Go to contents)

Furthermore, the bibl element can be placed inside the cit elements that are used to encode usage examples.

<cit xml:id="limcit_cineeh_001" type="example">
   <quote xml:lang="ar-arz-x-cairo-vicavTrans">limʕit ʕinēh.</quote>
   ...
   <bibl>4/82</bibl></cit>
                        
A Machine-readable Dictionary of Egyptian Arabic

12.3 ... of senses (Go to contents)

The third option is senses. As bibl cannot be placed directly in sense, it has to be wrapped in an xr element:

...
<sense>
   <cit type="translation" xml:lang="en"><quote>to call to prayer</quote></cit>
   <cit type="translation" xml:lang="de"><quote>zum Gebet rufen</quote></cit>
   <xr><bibl>8/10</bibl></xr>
   <xr><bibl>19/205</bibl></xr>
</sense>
...
A Machine-readable Dictionary of Egyptian Arabic

By convention, all bibl elements are placed at the end of the containing element.
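Because bibl may appear in three positions, and inside sense only wrapped in xr, a script gathering sources has to look in all three places. The Python sketch below collects sources per position and checks the convention that they come last in their containing element. It assumes namespace-free parsing and abridged copies of the snippets above inside a hypothetical div; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<div>
  <form type="lemma"><orth xml:lang="ar-aeb-x-tunis-vicav">markaz</orth><bibl>Ritt-Benmimoun 2014</bibl><bibl>Singer 1958, p.34</bibl></form>
  <cit xml:id="limcit_cineeh_001" type="example"><quote xml:lang="ar-arz-x-cairo-vicavTrans">limʕit ʕinēh.</quote><bibl>4/82</bibl></cit>
  <sense><cit type="translation" xml:lang="en"><quote>to call to prayer</quote></cit><xr><bibl>8/10</bibl></xr><xr><bibl>19/205</bibl></xr></sense>
</div>"""


def is_source(child):
    """A source is a bibl, or an xr wrapping a bibl (the form used inside sense)."""
    return child.tag == "bibl" or (child.tag == "xr" and child.find("bibl") is not None)


def sources(container):
    """Collect (position, reference) pairs from forms, examples and senses."""
    found = []
    for form in container.iter("form"):
        found += [("form", b.text) for b in form.findall("bibl")]
    for cit in container.iter("cit"):
        if cit.get("type") == "example":
            found += [("example", b.text) for b in cit.findall("bibl")]
    for sense in container.iter("sense"):
        found += [("sense", b.text) for b in sense.findall("xr/bibl")]
    return found


def sources_last(elem):
    """Check that no non-source child follows the first source child."""
    flags = [is_source(child) for child in elem]
    if True not in flags:
        return True
    return all(flags[flags.index(True):])


div = ET.fromstring(SAMPLE)
print(sources(div))
print(all(sources_last(e) for e in div))  # True
```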

13 References (Go to contents)

The Guidelines of the Text Encoding Initiative are currently available in eight languages. Go