ARCADE
Sentence track - Format of results
Results will be SGML-encoded
according to one of the mechanisms provided by the Corpus
Encoding Standard (CES) developed in the Multext project (section
5.3.4.2). The CES provides a simple means to point to SGML elements
in other SGML documents by referring to their IDs, using the xtargets
attribute on a <link> element.
The sentences to be associated
have ID attributes by which they can be referenced in the alignment file
(see corpus format). Here is a simple example:

DOC1: <s id="s1">Première phrase</s>
DOC2: <s id="s1">First sentence</s>
ALIGNMENT: <link xtargets="s1 ; s1">

Note that we will arbitrarily
decide that the references to the French document are on the left.
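Alignments involving several sentences on one side list all the IDs concerned, for instance one French sentence aligned with two English sentences:
<link xtargets="s1 ; s1 s2">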
N-to-zero alignments can
also be indicated:
<link xtargets="s1 ; ">
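
As an illustration of the three cases above (one-to-one, one-to-many, N-to-zero), here is a minimal Python sketch that builds a <link> line from two lists of sentence IDs. The function name `format_link` and its parameters are hypothetical and are not part of the CES or of ARCADE; the only convention taken from the text is that French IDs go to the left of the semicolon.

```python
def format_link(french_ids, english_ids):
    """Build one <link> line from two lists of sentence IDs.

    Hypothetical helper: French IDs are written to the left of the ';',
    English IDs to the right. An empty list gives an N-to-zero alignment.
    """
    left = " ".join(french_ids)
    right = " ".join(english_ids)
    return f'<link xtargets="{left} ; {right}">'


if __name__ == "__main__":
    print(format_link(["s1"], ["s1"]))        # one-to-one
    print(format_link(["s1"], ["s1", "s2"]))  # one-to-two
    print(format_link(["s1"], []))            # one-to-zero
```

Passing an empty list for one side leaves that side of the semicolon blank, which matches the N-to-zero form shown above.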
The complete scheme proposed in the CES is a little more complicated (<link> elements are embedded into link groups, themselves constituting an SGML document). However, for the purposes of ARCADE, we can simplify the scheme. The results files will simply consist of a series of <link> lines, each corresponding to one alignment. A separate alignment file will be provided for each pair of documents in the corpus. A file naming convention will be distributed later.
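
To make the simplified format concrete, the following Python sketch reads a results file made of <link> lines and returns the aligned ID lists. It assumes exactly the layout shown in the examples (one <link> element per line, the two sides separated by a semicolon); the names `parse_link` and `parse_results` are hypothetical, and a full CES document with link groups would need a real SGML parser instead.

```python
import re

# Matches a single <link xtargets="..."> element as shown in the examples.
LINK_RE = re.compile(r'<link\s+xtargets\s*=\s*"([^"]*)"\s*>', re.IGNORECASE)


def parse_link(line):
    """Return (french_ids, english_ids) for one <link> line."""
    match = LINK_RE.search(line)
    if match is None:
        raise ValueError(f"not a <link> line: {line!r}")
    left, sep, right = match.group(1).partition(";")
    if not sep:
        raise ValueError(f"missing ';' separator in: {line!r}")
    # split() with no argument drops surrounding whitespace, so an empty
    # side (N-to-zero alignment) becomes an empty list.
    return left.split(), right.split()


def parse_results(text):
    """Parse every non-blank line of a results file."""
    return [parse_link(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = '<link xtargets="s1 ; s1 s2">\n<link xtargets="s2 ; ">\n'
    for french_ids, english_ids in parse_results(sample):
        print(french_ids, english_ids)
```

Rejecting lines without a semicolon, rather than guessing which side is empty, keeps the French-left convention explicit.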