ARCADE    
Sentence track - Format of results
 

Format of results

Results will be SGML-encoded according to one of the mechanisms provided by the Corpus Encoding Standard (CES) developed in the Multext project (section 5.3.4.2). The CES provides a simple means to point to SGML elements in other SGML documents by referring to their IDs, using the xtargets attribute on a <link> element.

The sentences to be associated have ID attributes by which they can be referenced in the alignment file (see corpus format). Here is a simple example: 
 

     DOC1: <s id="s1">Première phrase</s> 

     DOC2: <s id="s1">First sentence</s> 

     ALIGNMENT: <link xtargets="s1 ; s1">  

Note that, by arbitrary convention, the references to the French document are always given on the left.
  
The IDrefs of the elements to be aligned are given in the xtargets attribute on the <link> element. A semicolon separates the IDref(s) from each document being linked. Alignments involving several sentences on one side (one-to-many, many-to-one or many-to-many) are specified by listing the IDs from the same document, separated by spaces:
 

          <link xtargets="s1 ; s1 s2"> 
          <link xtargets="s3 s4 s5 ; s3 s4"> 

N-to-zero alignments can also be indicated, by leaving one side of the semicolon empty:
 

          <link xtargets="s1 ; "> 
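As an illustration of the conventions above, here is a minimal sketch in Python of how an xtargets value could be split into its two lists of IDs. It is not part of the CES or of the ARCADE specification; the function name parse_xtargets is hypothetical.

```python
def parse_xtargets(value):
    """Split an xtargets value into (left_ids, right_ids).

    The semicolon separates the two documents; IDs from the same
    document are separated by spaces. An empty side yields an empty list.
    """
    left, sep, right = value.partition(";")
    if not sep:
        raise ValueError(f"missing ';' separator in xtargets value: {value!r}")
    # str.split() with no argument ignores leading, trailing and repeated spaces
    return left.split(), right.split()


print(parse_xtargets("s1 ; s1 s2"))        # (['s1'], ['s1', 's2'])
print(parse_xtargets("s3 s4 s5 ; s3 s4"))  # (['s3', 's4', 's5'], ['s3', 's4'])
print(parse_xtargets("s1 ; "))             # (['s1'], [])
```

Under this sketch, an N-to-zero alignment simply appears as an empty list on one side.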
 

The complete scheme proposed in the CES is a little more complicated (<link> elements are embedded into link groups, themselves constituting an SGML document). However, for the purpose of ARCADE, we can simplify the scheme. The files of results will simply consist of a series of <link> lines, each corresponding to an alignment.

A different file of alignments will be provided for each pair of documents in the corpus. A file naming convention will be distributed later.
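A system producing results in this simplified scheme might serialize its alignments along the following lines. This is only a sketch in Python: the names format_link and format_results, and the representation of an alignment as a pair of ID lists, are assumptions made for the illustration. No file name is used, since the naming convention is not yet defined.

```python
def format_link(french_ids, english_ids):
    """Return one <link> line; French IDs go on the left of the semicolon."""
    left = " ".join(french_ids)
    right = " ".join(english_ids)
    return f'<link xtargets="{left} ; {right}">'


def format_results(alignments):
    """Return the content of a results file: one <link> line per alignment."""
    return "".join(format_link(fr, en) + "\n" for fr, en in alignments)


# Hypothetical alignments, each a (French IDs, English IDs) pair
alignments = [
    (["s1"], ["s1", "s2"]),
    (["s3", "s4", "s5"], ["s3", "s4"]),
    (["s6"], []),  # N-to-zero: nothing on the English side
]
print(format_results(alignments), end="")
```

The resulting string can then be written to whichever file the naming convention prescribes for the document pair.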

 

Example

Original texts:  
 
 
French:  <S id="S1"> Alignement le plus courant (1-1) </S>
English: <S id="S1"> The most common type of alignment (1-1) </S>

French:  <S id="S2"> Maintenant un autre cas, deux phrases pour une. </S>
English: <S id="S2"> Now a different case. </S>
         <S id="S3"> Two sentences are aligned to one. </S>

French:  <S id="S3"> Différentes configurations sont possibles. </S>
         <S id="S4"> Le nombre de phrases peut être très variable. </S>
         <S id="S5"> Cela dépend beaucoup des habitudes du traducteur. </S>
English: <S id="S4"> Various configurations are possible, with different numbers of sentences involved, depending on the translator's habits. </S>

French:  <S id="S6"> Parfois, les alignements sont assez complexes, par exemple de 2 pour 3, ou 3 pour 4. </S>
         <S id="S7"> Heureusement, ça n'est pas très fréquent. </S>
English: <S id="S5"> Not very frequently, alignments can be rather complex. </S>
         <S id="S6"> For example, they can be 2 to 3, or 3 to 4. </S>

French:  <S id="S8"> Dans d'autres cas, la traduction manque complètement. </S>
English: (no corresponding sentence)

French:  (no corresponding sentence)
English: <S id="S7"> Of course, the translators can make mistakes. </S>

Encoding of alignments:
 
<LINK XTARGETS="S1;S1"> 
<LINK XTARGETS="S2;S2 S3"> 
<LINK XTARGETS="S3 S4 S5;S4"> 
<LINK XTARGETS="S6 S7;S5 S6"> 
<LINK XTARGETS="S8;"> 
<LINK XTARGETS=";S7">
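To make the example concrete, the encoding above could be read back and summarized roughly as follows. This is a sketch in Python under stated assumptions, not an official ARCADE tool: the names LINK_RE and read_links are hypothetical, and the regular expression only handles the simplified one-<link>-per-line form, matching element and attribute names case-insensitively.

```python
import re
from collections import Counter

# Matches the simplified <link xtargets="..."> form in upper or lower case
LINK_RE = re.compile(r'<link\s+xtargets\s*=\s*"([^"]*)"\s*>', re.IGNORECASE)

EXAMPLE = """\
<LINK XTARGETS="S1;S1">
<LINK XTARGETS="S2;S2 S3">
<LINK XTARGETS="S3 S4 S5;S4">
<LINK XTARGETS="S6 S7;S5 S6">
<LINK XTARGETS="S8;">
<LINK XTARGETS=";S7">
"""


def read_links(text):
    """Return a list of (french_ids, english_ids) pairs found in text."""
    links = []
    for match in LINK_RE.finditer(text):
        # French IDs are left of the semicolon, English IDs right of it
        left, _, right = match.group(1).partition(";")
        links.append((left.split(), right.split()))
    return links


links = read_links(EXAMPLE)

# Alignment type as "number of French sentences - number of English sentences"
types = Counter(f"{len(fr)}-{len(en)}" for fr, en in links)

for fr, en in links:
    print(f"{len(fr)}-{len(en)}: {' '.join(fr) or '-'} <-> {' '.join(en) or '-'}")
```

On this example each alignment type (1-1, 1-2, 3-1, 2-2, 1-0 and 0-1) occurs exactly once, and every sentence ID of both texts is referenced by exactly one <link>.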