SENSEVAL II System Descriptions


Page by Michael Oakes, University of Sunderland, July 2nd 2000

Last revision: October 30th 2002 Diana McCarthy


SYSTEMS GROUPED BY TASK

Czech All Words

JHU-Czech: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Dutch All Words

English All Words

ehu-dlist-all: Agirre & Martinez

Sussex-sel: Carroll & McCarthy

Sussex-sel-ospd: Carroll & McCarthy

Sussex-sel-ospd-ana: Carroll, McCarthy & Preiss

University of California: Chao

LIA-Sinequa_AllWords: Crestan, El-Beze & de Loupy

UMD-UST: Diab & Resnik

UNED-AW-T: Fernandez-Amoros

UNED-AW-U: Fernandez-Amoros

usm_english_tagger, usm_english_tagger2, usm_english_tagger3: Guo

IIT1, IIT2, IIT3: Haynes

ANTWERP: Hoste

DIMAP: Litkowski

irst-eng-all: Magnini

SMUaw: Mihalcea

University of Sheffield: Preiss

UMD-SST: Resnik, Stevens & Cabezas

Estonian All Words

Semyhe: Vider

JHU-Estonian: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Basque Lexical Sample

ehu-dlist-all: Agirre & Martinez

ehu-dlist-best: Agirre & Martinez

UMD-SST: Resnik, Stevens & Cabezas

JHU-Basque: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Chinese Lexical Sample

Danish Lexical Sample

English Lexical Sample

ehu-dlist-all: Agirre & Martinez

ehu-dlist-best: Agirre & Martinez

SUSS2: Canning, Oakes & Tait

LIA-Sinequa_Lexsample: Crestan, El-Beze & de Loupy

upenn-VB: Dang

TALP: Escudero

UNED-LS-U: Fernandez-Amoros

UNED-LS-T: Fernandez-Amoros

IIT1, IIT2: Haynes

DIMAP: Litkowski

irst-eng-sample: Magnini

CS224N: Manning

SMUls: Mihalcea

Univ._Alicante_System: Montoyo

Duluth1: Pedersen

Duluth2: Pedersen

Duluth3: Pedersen

Duluth4: Pedersen

Duluth5: Pedersen

DuluthA: Pedersen

DuluthB: Pedersen

DuluthC: Pedersen

UMD-SST: Resnik, Stevens & Cabezas

Kunlp: Seo, Lee & Rim

WASPS-Workbench: Tugwell

JHU-English: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Italian Lexical Sample

irst-ita-sample: Magnini

JHU-Italian: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Japanese Lexical Sample

Kyoto: Aramaki

Stanford-Titech 1: Baldwin

Stanford-Titech 2: Baldwin

ATR: Kumano

CRL1: Murata

CRL2: Murata

CRL3: Murata

CRL4: Murata

Ibaraki: Shinnou

CRL-NYU: Uchimoto

Titech1: Yagi

Titech2: Yagi

NAIST: Yamamoto

Anonym1

Anonym2

Anonym3

Korean Lexical Sample

Kunlp-Korean: Seo, Lee & Rim

Spanish Lexical Sample

CS224N: Manning

Univ._Alicante_System: Montoyo

Duluth6: Pedersen

Duluth7: Pedersen

Duluth8: Pedersen

Duluth9: Pedersen

Duluth10: Pedersen

DuluthX: Pedersen

DuluthY: Pedersen

DuluthZ: Pedersen

UMD-SST: Resnik, Stevens & Cabezas

JHU-Spanish: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski

Swedish Lexical Sample

Prolog Word Experts: Lager & Zinovjeva

Linköping University: Ahrenberg, Merkel & Andersson

Språkdata/Machine-Learning: Kokkinakis

Språkdata/Common-Features: Kokkinakis

UMD-SST: Resnik, Stevens & Cabezas

JHU-Swedish: Yarowsky, Cucerzan, Florian, Schafer, & Wicentowski


SYSTEM DESCRIPTIONS (in alphabetical order of first author).


1. System name: ehu-dlist-all

2. Your contact details

name: Eneko Agirre & David Martinez

email: {eneko,jibmaird}@si.ehu.es

organisation: University of the Basque Country

3. Task/s: English lexical, English all-words, Basque lexical

4. Did you use any training data provided in an automatic training procedure? Yes

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This supervised system trains on the provided training data. It extracts a basic feature set:

i) local features for English: bigrams and trigrams around the target word, consisting of lemmas, word forms or parts of speech, plus a bag of lemmas constructed from the content words in a +/- 4 word window around the target.

ii) local features for Basque: since Basque is an agglutinative language, part of the syntactic information is carried by the inflectional suffixes. We have therefore used unigrams, bigrams and trigrams of word forms, lemmas and parts of speech, including declension case and number information.

iii) global features: a bag of lemmas with the content words included in the whole context provided for the target word.

The system is based on Yarowsky's decision lists. It sorts the features according to their log-likelihood value and chooses the sense of the feature with the highest value. Features occurring only once were pruned.

In the case of the English all-words task, SemCor 1.6 was used for training, via an automatically produced WordNet 1.6-1.7 mapping. Adjectives and adverbs were not treated in this case.

Tags P and U have not been used. There is no special treatment for multiword detection.
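As a rough illustration of the decision-list training and tagging described above (a minimal sketch, not the authors' code: the feature format, smoothing constant and pruning bookkeeping are assumptions):

    import math
    from collections import defaultdict

    def train_decision_list(instances, smoothing=0.1):
        """Build (score, feature, sense) rules from (features, sense) pairs."""
        counts = defaultdict(lambda: defaultdict(int))  # feature -> sense -> count
        for features, sense in instances:
            for f in features:
                counts[f][sense] += 1
        rules = []
        for f, senses in counts.items():
            total = sum(senses.values())
            if total <= 1:            # prune features occurring only once
                continue
            for sense, c in senses.items():
                # smoothed log-likelihood ratio of this sense vs. the rest
                score = math.log((c + smoothing) / (total - c + smoothing))
                rules.append((score, f, sense))
        rules.sort(reverse=True)      # strongest evidence first
        return rules

    def classify(features, rules, default=None):
        """Return the sense of the highest-scoring matching feature."""
        fs = set(features)
        return next((sense for score, f, sense in rules if f in fs), default)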

7. keywords: Supervised learning, decision lists, agglutinative languages.

8. URL containing additional information (optional):



1. System name: ehu-dlist-best

2. Your contact details

name: Eneko Agirre & David Martinez

email: {eneko,jibmaird}@si.ehu.es

organisation: University of the Basque Country

3. Task/s: English lexical, Basque lexical

4. Did you use any training data provided in an automatic training procedure? Yes

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This supervised system is a variation of ehu-dlist-all. We used 10-fold cross-validation on the training data to select the features with precision higher than 0.85. Tagging was then done with those features only, achieving a precision close to 0.85 at the cost of coverage.
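A minimal sketch of this feature-selection step (the bookkeeping format, per-feature correct/applied counts summed over the 10 folds, is an assumption):

    def select_features(fold_counts, threshold=0.85):
        """Keep features whose cross-validated precision exceeds threshold.

        fold_counts maps feature -> (correct, applied), summed over folds.
        """
        return {f for f, (correct, applied) in fold_counts.items()
                if applied > 0 and correct / applied > threshold}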

7. keywords: Supervised learning, decision lists, agglutinative languages, feature selection.

8. URL containing additional information (optional):



1. System name: Linköping University

2. Your contact details

name: Lars Ahrenberg, Magnus Merkel, Mikael Andersson

email: {lah,magme,miand}@ida.liu.se

organisation: Dept of Computer and Information Science, Linköping University

3. Task/s: Swedish Lexical Sample Task

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description: Our system is based on supervised learning of contextual features for each given word, where the features include word forms, bigrams, lemmas and parts-of-speech in specific positions, and bigrams and lemmas for a full context. In disambiguation, these features are combined according to a voting scheme.

In the learning phase we consider features that have a total frequency above a set threshold (3), and calculate for each of these:

1) the relative frequency of each sense for the feature,
2) a probability measure, using a Student's t distribution to test the hypothesis that the observed sense distribution for a feature is significantly different from the overall distribution of senses in the training data.

To decide the sense of a given instance we use the following algorithm (a code sketch follows the list):

1) if no significant feature exists, the most common sense is selected,
2) else, if for all significant features only one sense has relative frequency 1.0, that sense is selected,
3) else consider the t-values for all features:

a) for each feature, keep t-values that are above the 95% significance level, single out the sense with the highest t-value, and give that feature's vote to this sense,
b) order the senses by vote:
i) pick the sense with the most votes,
ii) if two are tied, pick the sense with the most second places,
iii) if there is still a tie,
- merge subsenses into main senses,
- pick the most frequent main sense,
- if still tied, choose the most frequent original subsense.
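A simplified sketch of this voting scheme (the subsense merging of step 3.b.iii is omitted, and the input format, mapping each significant feature to {sense: (relative frequency, t-value)}, is an assumption):

    from collections import Counter

    def choose_sense(sig_features, most_common_sense):
        """Pick a sense following the voting scheme listed above."""
        if not sig_features:                              # step 1
            return most_common_sense
        unanimous = {s for senses in sig_features.values()
                     for s, (rf, t) in senses.items() if rf == 1.0}
        if len(unanimous) == 1:                           # step 2
            return unanimous.pop()
        votes, seconds = Counter(), Counter()
        for senses in sig_features.values():              # step 3a
            ranked = sorted(senses, key=lambda s: senses[s][1], reverse=True)
            votes[ranked[0]] += 1
            if len(ranked) > 1:
                seconds[ranked[1]] += 1
        top = max(votes.values())
        tied = [s for s, v in votes.items() if v == top]
        if len(tied) == 1:                                # step 3b.i
            return tied[0]
        return max(tied, key=lambda s: seconds[s])        # step 3b.ii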

7. keywords: supervised learning, voting

8. URL containing additional information (optional):



1. System name: Kyoto

2. Your contact details

name: Eiji Aramaki

email: aramaki@pine.kuee.kyoto-u.ac.jp

organisation: Kyoto University

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system selects the most similar Japanese expression in the TM, by a bottom-up, shared-memory based matching algorithm.

7. keywords: bottom-up matching, shared-memory

8. URL containing additional information (optional):



1. System name: Stanford-Titech1

2. Your contact details

name: Timothy BALDWIN

email: tim@cl.cs.titech.ac.jp

organisation: Stanford and Titech

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system selects the appropriate translation record based on character-bigram string similarity, as follows:

1. For each input and translation record, calculate the string similarity by way of Dice's coefficient over character bigrams
2. For each pair of inputs, calculate the string similarity by way of Dice's coefficient over character bigrams
3. For each input and translation record combination, calculate the maximum "linked similarity" via each other input, as the product of the input-input and input-translation record similarities
4. For each input, return the translation record for which the sum of the input-translation record string similarity and maximum linked similarity is the greatest.

No language resources were used.
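As a hedged sketch of steps 1-4 (treating inputs and translation records as plain strings; not the author's implementation):

    from collections import Counter

    def char_bigrams(s):
        """Multiset of character bigrams of a string."""
        return [s[i:i + 2] for i in range(len(s) - 1)]

    def dice(a, b):
        """Dice's coefficient over character bigrams: 2|A&B| / (|A|+|B|)."""
        ca, cb = Counter(char_bigrams(a)), Counter(char_bigrams(b))
        total = sum(ca.values()) + sum(cb.values())
        return 2.0 * sum((ca & cb).values()) / total if total else 0.0

    def best_record(inp, other_inputs, records):
        """Direct similarity plus the best 'linked' similarity, which routes
        through another input (the product of the input-input and
        input-record similarities), as in steps 3-4 above."""
        def score(rec):
            linked = max((dice(inp, o) * dice(o, rec) for o in other_inputs),
                         default=0.0)
            return dice(inp, rec) + linked
        return max(records, key=score)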

7. keywords: character bigram, Dice's coefficient, input-input similarity

8. URL containing additional information (optional):



1. System name: Stanford-Titech2

2. Your contact details

name: Timothy BALDWIN

email: tim@cl.cs.titech.ac.jp

organisation: Stanford and Titech

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system selects the appropriate translation record based on case-frame similarity, as follows:

1. Parse each input and translation record, and generate a "case frame" for each
2. For each input, determine the translation record which has the most similar case frame, working outwards from the target word through each case slot/predicate in order of proximity of dependence; in the case of a matching case slot, evaluate the quality of the match of the filler using a thesaurus
3. Return the translation record for which the most case slot matches are produced, breaking ties according to the overall quality of match

The Goi-Taikei thesaurus, along with segmentation/tagging and parsing tools, was employed.

7. keywords: case frame, conceptual matching, parsing

8. URL containing additional information (optional):



1. System name: SUSS2

2. Your contact details

name: Yvonne Canning, Michael Oakes, John Tait

email: Yvonne.Canning@sunderland.ac.uk, Michael.Oakes@sunderland.ac.uk, John.Tait@sunderland.ac.uk

organisation: The University of Sunderland

3. Task/s: English All Words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description: We searched for collocations within a window of one word only. For instance, if the word to be disambiguated was "art" and the sentence was "the pop art collection is owned by ...", we would have three sequences: 1. pop art, 2. pop art collection, 3. art collection. We would return the sense of the longest of these sequences found in the sense index. If none of these sequences were found, we would consider the word "art" in isolation. If there was just one sense in the sense index for "art", that sense would be returned; if there was no sense, then "sense unknown" would be returned. If there was more than one sense in the sense index, then either a) the first sense encountered would be returned, or b) lexical chaining (see Jeremy Ellman, "Using Roget's Thesaurus to Determine the Similarity of Texts", Ph.D. Thesis, University of Sunderland, 2000) would be used to disambiguate nouns, verbs and adjectives. In this case, the sense tag selected would be the one (of those tags which could apply to "art") most frequently assigned to the other words with the same part of speech in the same section of text.
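A minimal sketch of this window-1 collocation lookup (the sense-index format, a mapping from phrase to a list of sense tags, is an assumption; variant (b), lexical chaining, is omitted):

    def disambiguate(prev, word, nxt, sense_index):
        """Return a sense tag for `word` via longest-collocation lookup."""
        # Try the three sequences, longest first.
        for phrase in (f"{prev} {word} {nxt}", f"{prev} {word}", f"{word} {nxt}"):
            if phrase in sense_index:
                return sense_index[phrase][0]
        senses = sense_index.get(word, [])
        if not senses:
            return "sense unknown"
        return senses[0]   # one sense, or variant (a): first sense encountered

    # e.g. disambiguate("pop", "art", "collection", {"pop art": ["pop_art.sense1"]})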

7. keywords: collocations, lexical chains.

8. URL containing additional information (optional):



1. System name: Sussex-sel

2. Your contact details

name: John Carroll, Diana McCarthy

email: {johnca,dianam}@cogs.susx.ac.uk

organisation: The University of Sussex

3. Task/s: English All Words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? Yes

6. Description: Sussex-sel was applied to the all-words task, but processed only the plain text, ignoring the supplied bracketings. The system parses the text to identify subject/verb and verb/direct object relationships, and the nouns and verbs involved are disambiguated using class-based selectional preferences acquired from unsupervised training.

The selectional preferences are acquired for subject and direct object slots. For each slot, the verb and argument head training data is obtained from grammatical relations automatically extracted from parses of the BNC produced by a shallow parser. These are then used to populate the verb and noun WordNet hypernym hierarchies with frequencies. From these frequencies, probability distributions over sets of noun classes which partition the terminal noun senses are obtained using the minimum description length principle. These are conditioned on verb classes, so only the noun data which has occurred in the relationship specified by the slot with a member of the verb class is used. For selectional preference acquisition, the members of a verb class include direct members and hyponyms which have 10 senses or fewer and have occurred in the BNC with a frequency of 20 or more. Bayes' rule is used to obtain probabilities for the verb classes given any noun class.

7. keywords: selectional preferences, grammatical relations

8. URL containing additional information (optional):



1. System name: Sussex-sel-ospd

2. Your contact details

name: John Carroll, Diana McCarthy

email: {johnca,dianam}@cogs.susx.ac.uk

organisation: The University of Sussex

3. Task/s: English All Words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? Y

6. Description: Sussex-sel-ospd was applied to the all-words task, but processed only the plain text, ignoring the supplied bracketings. The system parses the text to identify subject/verb and verb/direct object relationships, and the nouns and verbs involved are disambiguated using class-based selectional preferences acquired from unsupervised training. This system is the same as our Sussex-sel, but we also employ Yarowsky's one sense per discourse (OSPD) heuristic, provided that there is no conflicting evidence for a target instance within the discourse.

The selectional preferences are acquired for subject and direct object slots. For each slot, the verb and argument head training data is obtained from grammatical relations automatically extracted from parses of the BNC produced by a shallow parser. These are then used to populate the verb and noun WordNet hypernym hierarchies with frequencies. From these frequencies, probability distributions over sets of noun classes which partition the terminal noun senses are obtained using the minimum description length principle. These are conditioned on verb classes, so only the noun data which has occurred in the relationship specified by the slot with a member of the verb class is used. For selectional preference acquisition, the members of a verb class include direct members and hyponyms which have 10 senses or fewer and have occurred in the BNC with a frequency of 20 or more. Bayes' rule is used to obtain probabilities for the verb classes given any noun class.

7. keywords: selectional preferences, grammatical relations, one sense per discourse

8. URL containing additional information (optional):



1. System name: Sussex-sel-ospd-ana

2. Your contact details

name: John Carroll, Diana McCarthy, Judita Preiss

email: {johnca,dianam}@cogs.susx.ac.uk jp233@hermes.cam.ac.uk

organisation: The University of Sussex and the University of Sheffield

3. Task/s: English All Words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? Yes

6. Description: Sussex-sel-ospd-ana parses the plain text to identify subject/verb and verb/direct object relationships, and the nouns and verbs involved are disambiguated using class-based selectional preferences acquired from unsupervised training. We employ Yarowsky's one sense per discourse (OSPD) heuristic provided that there is no conflicting evidence for a target instance within the discourse. This system is the same as our Sussex-sel-ospd, but additionally, anaphor resolution (ANA) is used during sense tagging so that evidence is taken across anaphoric links, for example disambiguating a verb which occurs with a pronoun as subject with reference to the antecedent of the pronoun.

The selectional preferences are acquired for subject and direct object slots. For each slot, the verb and argument head training data is obtained from grammatical relations automatically extracted from parses of the BNC. These are then used to populate the verb and noun WordNet hypernym hierarchies with frequencies. From these frequencies, probability distributions over sets of noun classes which partition the terminal noun senses are obtained using the minimum description length principle. These are conditioned on verb classes, so only the noun data which has occurred in the relationship specified by the slot with a member of the verb class is used. For selectional preference acquisition, the members of a verb class include direct members and hyponyms which have 10 senses or fewer and have occurred in the BNC with a frequency of 20 or more. Bayes' rule is used to obtain probabilities for the verb classes given any noun class.

7. keywords: selectional preferences, grammatical relations, one sense per discourse, anaphor resolution

8. URL containing additional information (optional):



1. System name: none at the moment

2. Your contact details

name: Gerald Chao

email: gerald@cs.ucla.edu

organisation: Dept. of Computer Science, University of California Los Angeles

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description: This probabilistic WSD system uses synonym permutations to form n-grams, then queries AltaVista for word counts as the basis for establishing the probabilities. These parameters are then smoothed with training data from SemCor, weighted by both semantic distance and density. The parameter smoothing is performed by recasting the WordNet hierarchy as Bayesian networks, in a process called Semantic Backoff. The sentential structure is then used to construct another Bayesian network, quantified via the parameters established earlier. WSD is performed by Maximum A Posteriori estimation, using the join tree inference algorithm. Lastly, the one-sense-per-discourse heuristic is applied to test its effectiveness.

7. keywords: Bayesian networks, semantic distance and density, parameter smoothing, MAP

8. URL containing additional information (optional):



1. System name: upenn-VB

2. Your contact details

name: Hoa Trang Dang

email: htd@linc.cis.upenn.edu

organisation: University of Pennsylvania

3. Task/s: English lexical sample (verbs only, unofficial)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: VB is a supervised word sense disambiguation system that uses a maximum entropy framework to combine linguistic contextual features from corpus instances of each verb to be tagged. The features for a verb w are:

(1) The word w, the part of speech of w, and whether or not the sentence containing w is passive
(2) Whether there is a sentential complement, subject, direct object, or indirect object
(3) The words (if any) in the positions of subject, direct object, indirect object, particle, prepositional complement (and its object)
(4) WordNet synsets and hypernyms for the nouns appearing in the positions in (3)
(5) A Named Entity tag (PERSON, ORGANIZATION, LOCATION) for proper nouns appearing in (3)
(6) Words at positions -2, -1, +1, +2, relative to w
(7) All keywords that occur anywhere in the context supplied for w. Keywords were chosen to minimize the entropy of the probability of a sense given the keyword, estimated as the ML probability in the training data.
The system computes the probability of each sense for a test instance based on the maximum entropy model, filters out senses using the satellites, and outputs the senses that have probability within a factor of .80 of the highest-probability sense.
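This final output step can be sketched as follows (a minimal illustration; the satellite filtering is omitted):

    def answer_senses(sense_probs, factor=0.80):
        """Return senses whose model probability is within `factor` of the best."""
        best = max(sense_probs.values())
        return sorted((s for s, p in sense_probs.items() if p >= factor * best),
                      key=lambda s: -sense_probs[s])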

7. keywords: maximum entropy model, verb arguments, selectional restrictions, collocations

8. URL containing additional information (optional):




1. System name: LIA-Sinequa_Lexsample

2. Your contact details
name: Eric Crestan
email: crestan@sinequa.com
organisation: Laboratoire Informatique d'Avignon; Sinequa

name: Marc El-Beze
email: marc.elbeze@lia.univ-avignon.fr
organisation: Laboratoire Informatique d'Avignon

name: Claude de Loupy
email: loupy@sinequa.com
organisation: Sinequa

3. Task/s: English Lexical sample

4. Did you use any training data provided in an automatic training procedure? Yes

6. Description (250 words max): The approach used in the Senseval-2 campaign is based on a multi-level view of the context. Corpus-trained Semantic Classification Trees (SCTs) are employed for short-range context acquisition (a 3 to 7 word window). In order to cope with the lack of training corpus, we have introduced rough semantic features, namely WordNet coarse Semantic Classes (SCs), into the question set. This multi-level view of the context dramatically improves the coverage of SCTs on various productions. In this way, at each step of SCT building, a choice may be made between specific questions ("Is there lemma l at position p?") and more general ones ("Does lemma l found at position p belong to SC s?"). For example, while disambiguating the word "post" in the sentence "Yeltsin offered Rutskoi the post of vice president", the set of possible questions for the word at position 7 is: (president, <noun.act>, <noun.person>).

From previous experiments, we noticed that for some words, or even some synsets, the information useful for disambiguation is present in variable window sizes around the term to be disambiguated. In order to select the appropriate window size automatically, we have designed a mixed approach combining the SCTs with a long-range similarity measure (as in document retrieval) in some particular cases. The similarities computed with the cosine measure between the test sentence and the union of examples for each sense are used to decide which synset is the best among those produced using SCTs with window size k=3, 5 or 7. Applying this process to SENSEVAL-1, we achieved 85.7% average precision on nouns and 72.8% average precision on verbs.
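The long-range arbitration step can be sketched as follows (a bag-of-words cosine between the test sentence and the pooled training examples of each sense; the data structures are assumptions, not the authors' code):

    import math
    from collections import Counter

    def cosine(u, v):
        """Cosine similarity between two word-count vectors."""
        dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
        norm = math.sqrt(sum(c * c for c in u.values())) * \
               math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm else 0.0

    def best_sense(test_tokens, examples_by_sense):
        """Compare the test sentence with the union of examples per sense."""
        test = Counter(test_tokens)
        pooled = {s: Counter(w for ex in exs for w in ex)
                  for s, exs in examples_by_sense.items()}
        return max(pooled, key=lambda s: cosine(test, pooled[s]))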

7. keywords: Semantic Classification Tree, Semantic Classes, Cosine Similarity Measure

8. URL containing additional information:




1. System name: LIA-Sinequa_AllWords

2. Your contact details

name: Eric Crestan
email: eric.crestan@lia.univ-avignon.fr;crestan@sinequa.com
organisation: Laboratoire Informatique d'Avignon; Sinequa

name: Marc El-Beze
email: marc.elbeze@lia.univ-avignon.fr
organisation: Laboratoire Informatique d'Avignon

name: Claude de Loupy
email: loupy@sinequa.com
organisation: Sinequa

3. Task/s (e.g. English all words): English all words

4. Did you use any training data provided in an automatic training procedure?

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description:

Our approach for the all-words disambiguation task is based on statistical models. Since no training corpus was provided, we had to revise our approach, considering a method that needs less training data. It can be broken up into two distinct phases. The first phase consists of identifying the coarse Semantic Classes (SCs) related to each word in the test. For this purpose, we have applied the Viterbi algorithm, using a trisem model, in order to select the most likely path in the SC graph built for each sentence. The second phase consists of assigning to each word its most likely sense given the SC determined in the first step. In order to train the models, we took advantage of SemCor (release 1.6). Moreover, we chose to apply a special treatment to words fulfilling two constraints:

i) to be among the most frequent words to be disambiguated in the all-words task, and
ii) to be among the words to be disambiguated in the lexical sample task.

For 2 nouns and 4 verbs, we used the same approach as defined for the lexical sample task (i.e. Semantic Classification Trees). Previous experiments [Loupy, 2000] using SemCor have shown good results for word SC assignment (about 90%). For this reason, we could expect to achieve results at least as good as those obtained with an approach based on a unisem model. However, the absence of a mapping between WordNet 1.6 and 1.7 senses may have a negative effect on the final results.

7. keywords:
Trisem Model, Semantic Classes, Semantic Classification Tree

8. URL containing additional information:



1. System name: UMD-UST (several versions)

2. Your contact details

name: Mona Diab, Philip Resnik

email: {mdiab,resnik}@umiacs.umd.edu

organisation: University of Maryland, College Park, Linguistics Department & UMIACS, USA

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? Y

6. Description: UST is an unsupervised system for word sense tagging. It exploits the translator's choice of words in a foreign language to disambiguate the senses of an ambiguous word in the target language. It requires a sentence-aligned parallel corpus and a sense inventory for the language to be sense tagged. It produces both sides of the corpus tagged with sense IDs from the sense inventory. The system assumes that the parallel corpus is token aligned. It then clusters all the words that map to the same orthographic form on the foreign side and uses a similarity measure on the clustered words and their corresponding senses. The assumption is that if words cluster together based on their translations, then the relevant senses will get higher weights given the appropriate similarity measure.

7. keywords: parallel corpora, token alignments, WordNet, information-theoretic similarity measure

8. URL containing additional information (optional):




1. System name: TALP

2. Your contact details

name: Gerard Escudero

email: escudero@lsi.upc.es

organisation: TALP Research Center - Technical University of Catalonia

3. Task/s: English Lexical Sample Task

4. Did you use any training data provided in an automatic training procedure? Y

6. Description:

The TALP system can be defined as hierarchical LazyBoosting. It works like Yarowsky's hierarchical decision lists, but uses LazyBoosting instead of decision lists. LazyBoosting belongs to the family of boosting algorithms. The main idea of boosting algorithms is to combine many simple and moderately accurate hypotheses (weak classifiers) into a single, highly accurate classifier. More specifically, LazyBoosting is a simple modification of the AdaBoost.MH algorithm which consists of reducing the feature space that is explored whenever a weak classifier is learnt. That is, a small set of attributes is randomly selected and the best weak rule among them is chosen.
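One LazyBoosting round can be sketched as follows (fit_weak_rule is a hypothetical callable that scores one attribute under the current example weights; this is not the TALP code):

    import random

    def lazy_boosting_round(attributes, fit_weak_rule, sample_size=10):
        """Explore only a random subset of the feature space, as LazyBoosting
        does, instead of scoring every attribute as plain AdaBoost.MH would.
        fit_weak_rule(attribute) -> (score, rule)."""
        pool = random.sample(list(attributes), min(sample_size, len(attributes)))
        return max((fit_weak_rule(a) for a in pool), key=lambda sr: sr[0])[1]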

The information used to represent examples is:

- Local information: using part-of-speech and lemmas of words placed in a 7-word window around the target word.

- Topic information: treating open-class words as a bag of words. These open-class words are placed in a 21-word window around the target word.

- Semantic domain information: using a semantic hierarchy linked to WordNet 1.6. The domain weights depend on the words of the context, the number of senses of these words, and their distribution in the context.

Multiwords have been preprocessed separately in a previous task.

Examples labeled with "U" or "P" have not been considered in the training process, so no test example is tagged with one of these labels.

7. keywords: hierarchical LazyBoosting, semantic domain attributes, multiword preprocessing, AdaBoost.MH

8. URL containing additional information:



1. System name: UNED-AW-T

2. Your contact details

name: David Fernandez-Amoros

email: david@lsi.uned.es

organisation: UNED University (Spain)

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : No

6. Description: We tokenized and lemmatized the contexts, stripped out the stop words, and detected person names and numbers. We also detected multiword terms present in WordNet.

Using the cntlist file, we established a filter, discarding in the first heuristic those senses that do not account for more than 10% of occurrences in the WordNet files.

We have built a relevance matrix between words using the text of 3,200 English books from the Gutenberg Project. This matrix is sensitive to the distances between words in the corpus.

With the aid of the matrix we have enriched the sense descriptions, and we have also used it to filter the context of the words to be disambiguated, taking into account the part of speech involved.

As back-off strategies, we used the same heuristic without the frequency filter and, for the few words left, the first sense.

7. keywords: dictionary definitions, relevance matrix

8. URL containing additional information (optional):



1. System name: UNED-AW-U

2. Your contact details

name: David Fernandez-Amoros

email: david@lsi.uned.es

organisation: UNED University (Spain)

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : No

6. Description: We tokenized and lemmatized the contexts, stripped out the stop words, and detected person names and numbers. We also detected multiword terms present in WordNet.

Using the cntlist file, we established a filter, discarding in the first heuristic those senses that do not account for more than 10% of occurrences in the WordNet files.

We have built a relevance matrix between words using the text of 3,200 English books from the Gutenberg Project. This matrix is sensitive to the distances between words in the corpus.

With the aid of the matrix we have enriched the sense descriptions, and we have also used it to filter the context of the words to be disambiguated, taking into account the part of speech involved.

As back-off strategies, we used the same heuristic without the frequency filter and, for the few words left, the first sense.

7. keywords: dictionary definitions, relevance matrix

8. URL containing additional information (optional):



1. System name: UNED-LS-U

2. Your contact details

name: David Fernandez-Amoros

email: david@lsi.uned.es

organisation: UNED University (Spain)

3. Task/s: English lexical-sample

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : Yes

6. Description: We tokenized and lemmatized the contexts, stripped out the stop words, and detected person names and numbers. We also detected multiword terms present in WordNet.

Using the cntlist file, we established a filter, discarding in the first heuristic those senses that do not account for more than 10% of occurrences in the WordNet files.

We have built a relevance matrix between words using the text of 3,200 English books from the Gutenberg Project. This matrix is sensitive to the distances between words in the corpus.

With the aid of the matrix we have enriched the sense descriptions, adding the information of the first five hyponyms where possible, and we have also used the matrix to filter the context of the words to be disambiguated.

As back-off strategies, we used the same heuristic without the frequency filter and, for the few words left, the first sense.

7. keywords: dictionary definitions, relevance matrix

8. URL containing additional information (optional):



1. System name: UNED-LS-T

2. Your contact details

name: David Fernandez-Amoros

email: david@lsi.uned.es

organisation: UNED University (Spain)

3. Task/s: English lexical-sample

4. Did you use any training data provided in an automatic training procedure? Yes

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We tokenized and lemmatized the contexts, stripped out the stop words, and detected person names and numbers. We also detected multiword terms present in WordNet.

Using the cntlist file, we established a filter, discarding in the first heuristic those senses that do not account for more than 10% of occurrences in the WordNet files.

We have built a relevance matrix between words using the text of 3,200 English books from the Gutenberg Project. This matrix is sensitive to the distances between words in the corpus.

With the aid of the matrix we have enriched the sense descriptions, adding the information of the first five hyponyms where possible, as well as the training information. We have also used the matrix to filter the context of the words to be disambiguated.

As back-off strategies, we used the same heuristic without the frequency filter and, for the few words left, the first sense.

7. keywords: dictionary definitions, relevance matrix

8. URL containing additional information (optional):



1. System name: usm_english_tagger, usm_english_tagger2, usm_english_tagger3

2. Your contact details

name: Chengming Guo

email: cmguo@cs.usm.my

organisation: Universiti Sains Malaysia

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : N

6. Description: Our systems are descriptive-semantic-primitive-based, general-domain systems that do not require training or supervision.

There are three components to the systems: a machine tool level machine-tractable dictionary (MTD), a semantic distance matrix of the primitives, and a semantic tagger that uses a simple summation algorithm.

In tackling the SENSEVAL-2 English all-words task, the first 7 words of the definition text of the WordNet dictionary were first disambiguated using information from the MTD. The disambiguated version of the WordNet dictionary was then used in handling the test data.

In the sense-tagging of the test sentences, all non-heads (words that did not require tagging) were removed from each sentence, leaving only the heads (words to be sense-tagged). The lists of heads were then cut into chunks of three for the actual tagging process, with each chunk of three successive heads tagged in seconds.

The three systems differ in the number of primitives used in the MTD and in the semantic distance matrix: either a little over 4,000 or fewer than 500.

7. keywords: descriptive-semantic-primitives; machine-tractable-dictionary; sense-tagging; semantic-disambiguation.

8. URL containing additional information (optional):



1. System name: IIT1, IIT2, IIT3

2. Your contact details

name: Woody Haynes

email:skhii@mindspring.com

organisation: Illinois Institute of Technology

3. Task/s: English Lexical Sample (IIT1, IIT2), English All Word (IIT1, IIT2, IIT3)

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description: For a target word, accumulate all WordNet 1.7 examples (stuff in quotes) that are related to the word's plausible synsets and any of their related synsets. (Related synsets include all ancestors for parent relations, immediate children for child relations, and transitive closure within a relation for all other relations.)

Since each example should have one of its synset words or collocations in it, it can be viewed as a mini-corpus of tagged lexical-sample instances. So, consider how well each example matches the target context. Align the context target word to the synset word in the example.

For each example word (working out from the synset word), find the closest word in the target context that is related under WordNet relations. (Two words are considered related if they have a common ancestor under a parent/child relation or have common elements in the transitive closure under another relation.)

Score the match quality of each example. The primary components of the score are (1) lexical proximity of matches of open class words, (2) exact match or POS match for closed class words (3) position shifts of matched words, (4) ordering of matched words, and (5) lexical proximity of the example synset to the candidate target word sense.

IIT2 reduces the effect of mismatches distant from the target word over IIT1. IIT3 (All Word only) restricts senses of context words to the "best" sense for words to the left of the target word before beginning the example match. It uses the IIT1 scoring methodology.

7. keywords: WordNet examples, untrained, pattern matching

8. URL containing additional information:



1. System name: ANTWERP

2. Your contact details

name: Veronique Hoste

email: hoste@uia.ua.ac.be

organisation: CNTS Language Technology Group, University of Antwerp

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description: We apply a machine learning approach to the automatic disambiguation of word senses in the English all-words task. Based on the SemCor corpus, semantic word experts are trained for the multi-sense words. The word experts combine different types of learning algorithms, viz. memory-based learning (TiMBL) and rule induction (Ripper), which take as input different knowledge sources:

- The input for the memory-based learner is a feature vector consisting of the target word and lemma and three words to its right and left, along with a more fine-grained part-of-speech.

- A second memory-based learner is fed with a co-occurrence vector consisting of possible disambiguating keywords (above a predefined threshold) from one sentence to the right and one to the left of the focus word. This vector further contains the sense words available in the WordNet definitions.

- The rule induction method takes as input both context information and all possible keywords within a context of three sentences. Both memory-based learners are cross-validated to determine the optimal parameter settings for each word expert. Majority voting and weighted voting are performed over these combined classifier outputs and the WordNet most frequent sense. The architecture of the fairly large number of word experts (ca. 2,000) also makes it easy to parallelise the training process. For the classification of a given test item, it is first checked whether a word expert is available. If so, the algorithm that performed best on the training set is applied with its optimal parameter settings to classify the item. If not, the most frequent WordNet sense is returned.

7. keywords: memory-based learning, rule induction, classifier combination, word experts

8. URL containing additional information (optional):



1. System name: Språkdata/Common-Features

2. Your contact details

name: Dimitrios Kokkinakis

email: Dimitrios.Kokkinakis@svenska.gu.se

organisation:Språkdata, Göteborg University

3. Task/s: Swedish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The method is basically an overlap-based "Lesk" variant using information both from the corpus and the provided dictionary.

7. keywords:

8. URL containing additional information (optional):



1. System name: Språkdata/Machine-Learning

2. Your contact details

name: Dimitrios Kokkinakis

email: Dimitrios.Kokkinakis@svenska.gu.se

organisation:Språkdata, Göteborg University

3. Task/s: Swedish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: A machine learning approach for the automatic disambiguation of word senses in the Swedish lexical sample task. Memory-based learning (TiMBL) was used. The input to the learner was a feature vector consisting of 100 features. The training data was taken i) from the syntactic examples in the dictionary and ii) from the training corpus.

The features were the lemma of the head word; three tokens to its right and left; their semantic tags, taken from the Swedish SIMPLE lexicon or from named-entity recognition; a number of content words to the left and right of the head word; and the most frequent semantic tags of the whole available context. We chose the IB1 algorithm with the weighted overlap metric and gain ratio weighting, the parameters that gave the best results in a similar exercise for Swedish conducted in the past.

7. keywords: memory-based learning, feature vector

8. URL containing additional information (optional):




1. System name: ATR

2. Your contact details

name: Tadashi Kumano

email: tkumano@slt.atr.co.jp

organisation: ATR

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system selects the most similar TM entry based on the cosine similarity between context vectors, which were constructed from semantic features and syntactic relations of neighboring words of the target word.

A Japanese morphological analyzer (JUMAN) and parser (KNP) were employed.

7. keywords: context vector, cosine similarity, semantic feature, syntactic relation

8. URL containing additional information (optional):



1. System name: Prolog Word Experts (PWEs -- or "Peewees")

2. Your contact details

name: Torbjörn Lager, Natalia Zinovjeva

email: {torbjorn,natalia}@stp.ling.uu.se

organisation: Dept of Linguistics, Uppsala University

3. Task/s: Swedish lexical sample

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The 'knowledge' available to a Prolog Word Expert is a sequence of transformation rules. Such sequences are learned from sense-tagged data using transformation-based learning. The rules are just syntactic sugar for formulas of first order predicate logic, and the assignment of a tag to a particular word W follows deductively from the set of formulas corresponding to a rule sequence plus a description (also a set of formulas) of the local context of W. The system as such consists of a PWE compiler which translates PWE specifications into Horn clause formulas (similar to a DCG compiler). The rest is just a matter of performing Prolog-style constructive proofs. An online demo of a word expert for the word "interest" is available.

7. keywords: word experts, word sense disambiguation as deduction, supervised learning, transformation-based learning

8. URL containing additional information (optional):



1. System name: DIMAP Disambiguation System

2. Your contact details

name: Ken Litkowski

email: ken@clres.com

organisation: CL Research

3. Task/s English all words and English lexical sample

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description: The CL Research disambiguation system is part of the DIMAP dictionary software, which has been designed to use any full dictionary as the basis for disambiguation. Senseval-2 results were generated using WordNet, but also using the New Oxford Dictionary of English (NODE). The disambiguation functionality exploits whatever information is made available by the lexical database. The basic design is similar to CL Research's system for Senseval-1; however, many of the disambiguation routines could not be reimplemented by the submission date. For Senseval-2, the implemented routines included special routines for examining multiword units and examining contextual clues (specific words, Lesk-style use of definition content words, and subject-matter analyses); syntactic constraints have not yet been employed. The official submission used only information available from WordNet. Subsequently, NODE was used on the Senseval-2 training data (which had not otherwise been used). NODE definitions were then automatically mapped into WordNet, so that results could be compared with the use of WordNet on the training data. Despite using two entirely different sense inventories, with one going through a further stage of imperfect mapping, results were quite comparable (at about 0.30 precision). With a system design that facilitates analysis of the contribution of different types of information, further implementation (using Senseval-1 data) will allow some useful assessments of the importance of various kinds of lexical information.

7. keywords: dictionary definitions, sense inventory mapping, context assessment, multiword units

8. URL containing additional information:



1. System name: irst-eng-all

2. Your contact details

name: Bernardo Magnini

email: magnini@irst.itc.it

organisation: ITC-irst

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : Yes

6. Description: The system uses "semantic domains" (e.g. Medicine, Sport, Architecture) associated with WordNet synsets. The algorithm follows two steps: first a domain is chosen for the word (among those allowed by the word's senses in WordNet); then a sense is selected from among those belonging to the preferred domain. The first step (i.e. word domain disambiguation) considers, for each word, a text window of about 100 words. A score is computed which takes into account the domains of the words within the window as well as their distance from the target word. The second step (sense selection) implements a most-frequent-sense algorithm.
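The first step can be sketched as follows (the decay weighting and the domains_of lookup, which would return the domains licensed by a word's senses, are assumptions, not the system's actual scoring):

    from collections import defaultdict

    def pick_domain(window, target_pos, domains_of, decay=0.9):
        """Score each domain over a text window, down-weighting words by
        their distance from the target, and return the best domain."""
        scores = defaultdict(float)
        for i, w in enumerate(window):
            weight = decay ** abs(i - target_pos)
            for d in domains_of(w):
                scores[d] += weight
        return max(scores, key=scores.get) if scores else None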

7. keywords: semantic domains, domain driven disambiguation.

8. URL containing additional information (optional):



1. System name: irst-eng-sample

2. Your contact details

name: Bernardo Magnini

email: magnini@irst.itc.it

organisation: ITC-irst

3. Task/s: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system uses "semantic domains" (e.g. Medicine, Sport, Architecture) associated with WordNet synsets. For each word in the test data, a "domain vector" is built considering a text window around the target word. The resulting domain vector is then compared with the domain vectors previously acquired for each word sense from the training data, and the most similar one is selected.

7. keywords: semantic domains, domain driven disambiguation.

8. URL containing additional information (optional):



1. System name: irst-ita-sample

2. Your contact details

name: Bernardo Magnini

email: magnini@irst.itc.it

organisation: ITC-irst

3. Task/s: Italian Lexical Sample

4. Did you use any training data provided in an automatic training procedure? No

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? : Yes

6. Description: The system tries to exploit the idea of "Domain Driven Disambiguation". It is based on the annotation of WordNet synsets with semantic domains (e.g. Medicine, Sport, Architecture). The algorithm follows two steps: first a domain is chosen for the word (among those allowed by the word's senses in WordNet); then a sense is selected from among those belonging to the preferred domain. The first step (i.e. word domain disambiguation) is based on a similarity function among semantic domains, which was trained on an English corpus. The second step (sense selection) implements a most-frequent-sense algorithm.

7. keywords: semantic domains, domain driven disambiguation, similarity.

8. URL containing additional information (optional):



1. System name: CS224N (Stanford)

2. Your contact details

name: Christopher Manning

email: manning@cs.stanford.edu

organisation: Stanford University

3. Task/s: English lexical sample, Spanish lexical sample

4. Did you use any training data provided in an automatic training procedure? Yes

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The system used entirely supervised WSD methods, based solely on the provided training data. There was a collection of first-level word sense classifiers, mainly using naive Bayes methods, but also including vector space, n-gram, and kNN classifiers, and implementing a range of windowing, distance weighting, and smoothing techniques. These were combined by a second-level classifier, which could variously use simple voting, weighted voting, or a loglinear model. The choice of combination method, the parameters of weighted voting, and the features and weights of the loglinear model were chosen by cross-validation on the training data. The first-level and combined classifiers simply reported a single most likely sense choice for each test word.
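The simple- and weighted-voting combination can be sketched as follows (a minimal illustration; the loglinear combiner is omitted):

    from collections import Counter

    def combine(first_level_choices, weights=None):
        """Second-level combination: one (optionally weighted) vote per
        first-level classifier; returns the winning sense."""
        tally = Counter()
        for i, sense in enumerate(first_level_choices):
            tally[sense] += weights[i] if weights else 1.0
        return tally.most_common(1)[0][0]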

7. keywords: supervised WSD, naive Bayes, classifier combination

8. URL containing additional information (optional):



1. System name: SMUaw

2. Your contact details

name: Rada Mihalcea

email: rada@seas.smu.edu

organisation: SMU, Dallas, TX

3. Task/s: English, all words.

4. Did you use any training data provided in an automatic training procedure? Y (SemCor)

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The semantic disambiguation of a word is performed based on its relations with the preceding and succeeding words. A very large corpus of word-word relations, and the corresponding senses, has been created off-line, using:

1) Examples from WordNet 1.7, in which the words from the corresponding synsets are disambiguated.
2) SemCor, mapped so that it points to senses from WordNet 1.7.
3) A large additional set of sense-tagged word-word pairs, generated from the pairs created at steps 1 and 2, based on a set of heuristics.

The recall of this algorithm is not 100%, and therefore we have applied a cache-like methodology to propagate senses of disambiguated words to the still ambiguous words found in the immediate context.

For the few words which are still ambiguous at this point, we assign the most frequent sense from WordNet.

(An initial, simpler version of this algorithm is described in "An Iterative Approach to Word Sense Disambiguation", Mihalcea & Moldovan, in Proceedings of Flairs-2000)

7. keywords: sense tagged word-word pairs, dictionary examples, generated corpus

8. URL containing additional information (optional):



1. System name: SMUls

2. Your contact details

name: Rada Mihalcea

email: rada@seas.smu.edu

organisation: SMU, Dallas, TX

3. Task/s: English lexical sample

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: There are four main steps in the algorithm we have used during the lexical sample task:

1) The data is preprocessed: SGML tags are eliminated, the text is tokenized, part of speech tagged and Named Entities are identified.

2) Compound concepts are identified: we determine the maximum sequence of words which forms a compound word defined in WordNet. The training and test data are split based on the word to be tagged. For example, the context examples containing the verb "dress down" are separated from the examples containing only "dress". Monosemous words are eliminated at this step, as are words which can be tagged as proper nouns (if they are tagged as such by the part of speech tagger and if they have a role identified by the Named Entity recognizer).

3) We automatically extract "patterns" (using a set of heuristics) for each ambiguous word, based on WordNet examples (with the synset words disambiguated), SemCor examples and the training examples provided. The patterns are validated on the training data, and we keep only those which are 100% accurate. The patterns are then applied on the test data. Only a few instances can be disambiguated this way, but with high confidence: previous experiments have shown that high accuracy is obtained with this procedure.

4) This is the main step of the algorithm and it disambiguates all ambiguous instances which have not been previously disambiguated. We use an instance based learning algorithm and a large pool of features that are actively selected. The learner is trained on the training data provided and then applied on the test instances.

7. keywords: instance based learning, dictionary examples, sense tagged corpus.

8. URL containing additional information (optional):



1. System name: Univ._Alicante_System

2. Your contact details

name: Andrés Montoyo and Armando Suarez

email: montoyo@dlsi.ua.es

organisation: Universidad de Alicante

3. Task/s: English Lexical Sample Task, Spanish Lexical Sample Task

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The Univ._Alicante_System used two different methods for the lexical sample tasks:

1) Knowledge-driven method. The first method resolves the lexical ambiguity of nouns in the test data; although it relies on the semantic relations (hypernymy and hyponymy) and the hierarchic organization of WordNet, it does not require any sort of training process, hand-coding of lexical entries, or hand-tagging of texts.

2) Corpus-driven Method. Second method disambiguates all ambiguous verbs and adjectives instances in the test data. This algorithm implements a supervised learning method (Maximum Entropy Probability Models) consisting of the estimation of functions for classifying word senses by learning on a training data provided and then applied on the test instances.

7. keywords: Taxonomy_WordNet, Untrained_nouns, hybrid model

8. URL containing additional information (optional):

Top


1. System name: CRL1

2. Your contact details

name: Masaki Murata

email: murata@crl.go.jp

organisation: Communications Research Laboratory

3. Task/s: Japanese lexical sample (Dictionary)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description (250 words max): We used a machine learning technique to construct the WSD system. The features used in the model are the outputs of morphological and syntactic analysis. The learning algorithm is a support vector machine.
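As a stand-in for such a setup (scikit-learn purely for illustration; the CRL implementation is not described beyond the sketch above):

    # A linear SVM over sparse feature dictionaries, e.g. the outputs
    # of morphological and syntactic analysis encoded as string features.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    model = make_pipeline(DictVectorizer(), LinearSVC())
    # model.fit(train_feature_dicts, train_senses)
    # predictions = model.predict(test_feature_dicts)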

7. keywords: support vector machine, morphological analysis, syntactic analysis

8. URL containing additional information (optional):

Top


1. System name: CRL2

2. Your contact details

name: Masaki Murata

email: murata@crl.go.jp

organisation: Communications Research Laboratory

3. Task/s: Japanese lexical sample (Dictionary)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We used a machine learning technique to construct the WSD system. The features used in the model are the outputs of morphological and syntactic analysis. We used simple Bayes for learning.

7. keywords: simple Bayes, morphological analysis, syntactic analysis

8. URL containing additional information (optional):

Top


1. System name: CRL3

2. Your contact details

name: Masaki Murata

email: murata@crl.go.jp

organisation: Communications Research Laboratory

3. Task/s: Japanese lexical sample (Dictionary)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We used a machine learning technique to construct the WSD system. The features used in the model are the outputs of morphological and syntactic analysis. We used a hybrid model of a support vector machine and simple Bayes for learning.

7. keywords: support vector machine, simple Bayes, hybrid model

8. URL containing additional information (optional):

Top



1. System name: CRL4

2. Your contact details

name: Masaki Murata

email: murata@crl.go.jp

organisation: Communications Research Laboratory

3. Task/s: Japanese lexical sample (Dictionary)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We used a machine learning technique to construct the WSD system. The features used in the model are the outputs of morphological and syntactic analysis. We used a hybrid model of two kinds of support vector machines and two kinds of simple Bayes for learning.

7. keywords: support vector machine, simple Bayes, hybrid of 4 models

8. URL containing additional information (optional):

Top


1. System name: duluth1

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where three Naive Bayesian classifiers are induced from sense-tagged training examples. A weighted vote is taken among these to assign senses to test examples. No information from WordNet is utilized by this system.

Each Naive Bayesian classifier is based on a different set of features that are identified in a filtering step prior to learning.

The first feature set is based on bigrams (two word sequences) that meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.
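The log-likelihood criterion in 2) can be computed from a 2x2 contingency table of bigram counts; a Dunning-style G^2 sketch (illustrative, not the Duluth code):

    import math

    def g2(n11, n12, n21, n22):
        """Log-likelihood ratio for a 2x2 bigram contingency table:
        n11 = count(w1 w2), n12 = count(w1 ~w2),
        n21 = count(~w1 w2), n22 = count(~w1 ~w2)."""
        n = n11 + n12 + n21 + n22
        total = 0.0
        for obs, row, col in ((n11, n11 + n12, n11 + n21),
                              (n12, n11 + n12, n12 + n22),
                              (n21, n21 + n22, n11 + n21),
                              (n22, n21 + n22, n12 + n22)):
            expected = row * col / n
            if obs > 0:
                total += obs * math.log(obs / expected)
        return 2.0 * total

    # keep a bigram if g2(...) >= 6.635, the chi-square cutoff for p=.01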

The second feature set is based on unigrams (one word sequences) that meet the following criteria:

1) occur 5 or more times and
2) are not found on the stop-list.

The third feature set is based on bigrams that may include one intervening word (which is ignored) and that meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 2.706 (i.e., p=.1) and
3) are not made up of stop-listed words and
4) include the word to be disambiguated.

A Naive Bayesian classifier is learned from each feature set. When presented with a test example, each classifier assigns a probability to each possible sense. These probabilities are summed, and the sense with the largest value is assigned.
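A minimal sketch of this summing step, assuming scikit-learn-style classifiers with predict_proba (e.g. BernoulliNB), all fit on the same label set:

    def ensemble_predict(classifiers, test_views):
        """classifiers[i] was fit on feature view i; because all were
        fit on the same labels, their classes_ arrays align."""
        summed = sum(clf.predict_proba(view)
                     for clf, view in zip(classifiers, test_views))
        return classifiers[0].classes_[summed.argmax(axis=1)]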

This is loosely based on the NAACL-00 paper "A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation" by Ted Pedersen.

7. keywords: supervised learning, Naive Bayesian classifier, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth2

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision tree is induced from sense-tagged training examples and then used to assign senses to the test examples. No information from WordNet is utilized by this system.

This system uses a filter to perform feature identification prior to learning. All bigrams (two word sequences) that meet the following criteria form a set of candidate features:

1) occur more than 2 times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.

The training examples are converted into feature vectors, where each feature represents whether a candidate feature occurs in the context of a specific training example.

The feature vectors are the input to the J48 learning algorithm, the Weka implementation of the C4.5 decision tree learner. The parameter settings for pruning are C=0.25 (a confidence threshold) and M=2 (the number of training examples that must be covered by each leaf in the tree).

The decision tree learner is "bagged". The training examples are sampled ten times (with replacement) and a decision tree is learned for each sample. Each test example is assigned a sense based on a vote taken from among the learned trees.
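A rough equivalent of this setup, with scikit-learn standing in for Weka's J48 (the pruning parameters correspond only loosely):

    # Ten bootstrap samples of the training data, one decision tree per
    # sample; test predictions are a vote among the ten trees.
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    bagged = BaggingClassifier(
        DecisionTreeClassifier(min_samples_leaf=2),  # roughly J48's M=2
        n_estimators=10,   # ten sampled training sets
        bootstrap=True)    # sampling with replacement
    # bagged.fit(X_train, y_train); answers = bagged.predict(X_test)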

This is based on the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen. The use of bagging and a stop list is new for Senseval.

7. keywords: supervised learning, decision tree of bigrams, bagging

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth3

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system is identical to duluth1, except that rather than learning three Naive Bayesian classifiers from three different feature sets, it learns three bagged decision trees from the three feature sets. When presented with a test example, each decision tree outputs probabilities for each possible sense. These probabilities are summed and the sense with the maximum value is assigned to the test example. No information from WordNet is utilized by this system.

Note that a Naive Bayesian classifier has no "internal" feature selection mechanism, and accepts all features provided by the filtering step. The decision tree learner performs its own feature selection based on the gain ratio, which measures how well a feature partitions the training examples into senses.

The bagging process for each decision tree is as described in duluth2, and the features used as the basis for each decision tree are the same as in duluth1.

7. keywords: supervised learning, bagged decision trees, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth4

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a Naive Bayesian classifier is learned from sense-tagged training examples. No information from WordNet is utilized by this system.

The Naive Bayesian classifier is based on a set of features consisting of unigrams (one word sequences) identified in a filtering step prior to learning. These features must meet the following criteria:

1) occur 5 or more times and
2) are not found on the stop-list.

Such unigrams form the feature set. The training examples are converted into feature vectors, where each feature represents whether or not a unigram occurs in the context of a specific training example. These feature vectors are used to estimate the parameters of the Naive Bayesian classifier.

When presented with a test example, the Naive Bayesian classifier will output the probability associated with each sense. The sense with the highest probability is assigned to the test example.

This system implements a standard benchmark, the Naive Bayesian classifier based on a bag of words feature set.
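A sketch of this benchmark in scikit-learn terms (the stop list and cutoff are approximations; min_df counts documents rather than raw occurrences):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import BernoulliNB

    # Binary unigram ("bag of words") features that pass a stop list
    # and a frequency cutoff, fed to a Naive Bayesian classifier.
    vec = CountVectorizer(binary=True, stop_words="english", min_df=5)
    # X = vec.fit_transform(train_contexts)
    # clf = BernoulliNB().fit(X, train_senses)
    # answers = clf.predict(vec.transform(test_contexts))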

7. keywords: supervised learning, Naive Bayesian classifier, bag of words

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth5

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision tree is induced from sense-tagged training examples. This system is identical to duluth2, except that it relies on a different feature set. No information from WordNet is utilized by this system.

This system uses a filter to perform feature identification prior to learning. Two different kinds of bigrams are identified as candidate features. The first is a consecutive two word sequence that meets the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.

The second is a non-consecutive two word sequence, where there may be zero or one intervening word that is ignored. Such bigrams must meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 2.706 (i.e., p=.1) and
3) are not made up of stop-listed words and
4) include the word to be disambiguated.

The process of converting the training examples into feature vectors, bagging the decision tree, and making sense assignments is identical to duluth2.

This is loosely based on the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen.

7. keywords: supervised learning, decision tree of bigrams, bagging

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth6

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where three Naive Bayesian classifiers are induced from sense-tagged training examples. A weighted vote is taken among these to assign senses to test examples. No information from WordNet is utilized by this system.

Each Naive Bayesian classifier is based on a different set of features that are identified in a filtering step prior to learning.

The first feature set is based on bigrams (two word sequences) that meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.

The second feature set is based on unigrams (one word sequences) that meet the following criteria:

1) occur 5 or more times and
2) are not found on the stop-list.

The third feature set is based on bigrams that may include one intervening word (which is ignored) and that meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 2.706 (i.e., p=.1) and
3) are not made up of stop-listed words and
4) include the word to be disambiguated.

A Naive Bayesian classifier is learned from each feature set. When presented with a test example, each classifier assigns a probability to each possible sense. These probabilities are summed, and the sense with the largest value is assigned.

This is loosely based on the NAACL-00 paper "A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation" by Ted Pedersen.

This is the same approach as taken in duluth1 for English. The only difference is in the stop list.

7. keywords: supervised learning, Naive Bayesian classifier, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth7

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision tree is induced from sense-tagged training examples and then used to assign senses to the test examples. No information from WordNet is utilized by this system.

This system uses a filter to perform feature identification prior to learning. All bigrams (two word sequences) that meet the following criteria form a set of candidate features:

1) occur more than 2 times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.

The training examples are converted into feature vectors, where each feature represents whether a candidate feature occurs in the context of a specific training example.

The feature vectors are the input to the J48 learning algorithm, the Weka implementation of the C4.5 decision tree learner. The parameter settings for pruning are C=0.25 (a confidence threshold) and M=2 (the number of training examples that must be covered by each leaf in the tree).

The decision tree learner is "bagged". The training examples are sampled ten times (with replacement) and a decision tree is learned for each sample. Each test example is assigned a sense based on a vote taken from among the learned trees.

This is based on the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen. The use of bagging and a stop list is new for Senseval.

This is the same approach as taken in duluth2 for English. The only difference is in the stop list.

7. keywords: supervised learning, decision tree of bigrams, bagging

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth8

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system is identical to duluth6, except that rather than learning three Naive Bayesian classifiers from three different feature sets, it learns three bagged decision trees from the three feature sets. When presented with a test example, each decision tree outputs probabilities for each possible sense. These probabilities are summed and the sense with the maximum value is assigned to the test example. No information from WordNet is utilized by this system.

Note that a Naive Bayesian classifier has no "internal" feature selection mechanism, and accepts all features provided by the filtering step. The decision tree learner performs its own feature selection based on the gain ratio, which measures how well a feature partitions the training examples into senses.

This is the same approach as taken in duluth3 for English. The only difference is in the stop list.

7. keywords: supervised learning, bagged decision trees, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth9

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a Naive Bayesian classifier is learned from sense-tagged training examples. No information from WordNet is utilized by this system.

The Naive Bayesian classifier is based on a set of features consisting of unigrams (one word sequences) identified in a filtering step prior to learning. These features must meet the following criteria:

1) occur 2 or more times and
2) are not found on the stop-list.

Such unigrams form the feature set. The training examples are converted into feature vectors, where each feature represents whether or not a unigram occurs in the context of a specific training example. These feature vectors are used to estimate the parameters of the Naive Bayesian classifier.

When presented with a test example, the Naive Bayesian classifier will output the probability associated with each sense. The sense with the highest probability is assigned to the test example.

This system implements a standard benchmark, the Naive Bayesian classifier based on a bag of words feature set.

This is the same approach as taken in duluth4 for English. The only differences are in the stop list and the value of the frequency cut off.

7. keywords: supervised learning, Naive Bayesian classifier, bag of words

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluth10

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision tree is induced from sense-tagged training examples. This system is identical to duluth7, except that it relies on a different feature set. No information from WordNet is utilized by this system.

This system uses a filter to perform feature identification prior to learning. Two different kinds of bigrams are identified as candidate features. The first is a consecutive two word sequence that meets the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 6.635 (i.e., p=.01) and
3) are not made up of stop-listed words.

The second is a non-consecutive two word sequence, where there may be zero or one intervening word that is ignored. Such bigrams must meet the following criteria:

1) occur 2 or more times and
2) have a log-likelihood ratio >= 2.706 (i.e., p=.1) and
3) are not made up of stop-listed words and
4) include the word to be disambiguated.

This is the same approach as taken in duluth5 for English. The only difference is in the stop list.

This is loosely based on the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen.

7. keywords: supervised learning, decision tree of bigrams, bagging

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluthA

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where three different classifiers are induced from sense-tagged training examples. Each classifier is based on the same feature set. A weighted vote is taken among these classifiers to assign senses to test examples. No information from WordNet is utilized by this system.

The three classifiers are a bagged J48 decision tree, a Naive Bayesian classifier, and a nearest neighbor classifier (IBk).

This system uses a filter to perform feature identification prior to learning. All non-consecutive bigrams (which may include zero, one, or two intervening words, all of which are ignored) that meet the following criteria form a set of candidate features:

1) occur more than 2 times and
2) have a log-likelihood ratio >= 10.827 (i.e., p=.001) and
3) are not made up of stop-listed words.

The training examples are converted into feature vectors, where each feature represents whether a candidate feature occurs in the context of a specific training example.

The feature vectors are the input to the J48 learning algorithm, the IBk nearest neighbor learner (where the number of neighbors k=1), and a Naive Bayesian classifier. When presented with a test example, each classifier outputs a probability for each possible sense. These are summed and the sense with the maximum probability is assigned to a test example.
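A condensed sketch of this three-way ensemble (scikit-learn stand-ins for J48, Naive Bayes and IBk; illustrative only):

    from sklearn.ensemble import BaggingClassifier
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    learners = [BaggingClassifier(DecisionTreeClassifier(), n_estimators=10),
                BernoulliNB(),
                KNeighborsClassifier(n_neighbors=1)]  # IBk with k=1

    def fit_predict(X_train, y_train, X_test):
        """Sum the per-sense probabilities of all three learners and
        return the arg-max sense for each test example."""
        for clf in learners:
            clf.fit(X_train, y_train)
        summed = sum(clf.predict_proba(X_test) for clf in learners)
        return learners[0].classes_[summed.argmax(axis=1)]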

7. keywords: supervised learning, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluthB

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision stump is induced from sense-tagged training examples. This system is identical to duluth5, except that it relies on a different learning algorithm. Rather than learning a bagged decision tree (as duluth5 does), this system simply learns a decision stump, a one-node decision tree.

The features used are the same as in duluth5. This system provides a baseline that can be used to compare the benefits of learning an entire decision tree (duluth5) versus identifying a single-node tree (duluthB).

This system is motivated by the relative success of decision stumps as reported in the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen.

7. keywords: supervised learning, decision stumps

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluthC

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: English Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where the results of the systems duluth1, duluth2, duluth3, duluth4, duluth5, duluthA, and duluthB are combined into an ensemble. Each of those systems outputs probabilities for each sense when presented with a test example, so all of these are summed together and the sense with the maximum probability is assigned to a test example.

7. keywords: supervised learning, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluthX

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where three different classifiers are induced from sense-tagged training examples. Each classifier is based on the same feature set. A weighted vote is taken among these classifiers to assign senses to test examples. No information from WordNet is utilized by this system.

The three classifiers are a bagged J48 decision tree, a Naive Bayesian classifier, and a nearest neighbor classifier (IBk).

This system uses a filter to perform feature identification prior to learning. All non-consecutive bigrams (which may include zero, one, or two intervening words, all of which are ignored) that meet the following criteria form a set of candidate features:

1) occur more than 2 times and
2) have a log-likelihood ratio >= 0.00 and
3) are not made up of stop-listed words.

The training examples are converted into feature vectors, where each feature represents whether a candidate feature occurs in the context of a specific training example.

The feature vectors are the input to the J48 learning algorithm, the IBk nearest neighbor learner (where the number of neighbors k=1), and a Naive Bayesian classifier. When presented with a test example, each classifier outputs a probability for each possible sense. These are summed and the sense with the maximum probability is assigned to a test example.

This is the same approach as taken in duluthA for English. The only differences are in the stop list and in the setting of the significance value for the log-likelihood ratio.

7. keywords: supervised learning, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top


1. System name: duluthY

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where a decision stump is induced from sense-tagged training examples. This system is identical to duluth10, except that it relies on a different learning algorithm. Rather than learning a bagged decision tree (as duluth10 does), this system simply learns a decision stump, a one-node decision tree.

The features used are the same as in duluth10. This system provides a baseline that can be used to compare the benefits of learning an entire decision tree (duluth10) versus identifying a single-node tree (duluthY).

This is the same approach as taken in duluthB for English. The only difference is in the stop list.

This system is motivated by the relative success of decision stumps as reported in the NAACL-01 paper "A Decision Tree of Bigrams is an Accurate Predictor of Word Sense" by Ted Pedersen.

7. keywords: supervised learning, decision stumps

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : Click here

Top


1. System name: duluthZ

2. Your contact details

name: Ted Pedersen

email: tpederse@d.umn.edu

organisation: University of Minnesota Duluth

3. Task: Spanish Lexical Sample

4. Did you use any training data provided in an automatic training procedure? YES

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system takes a supervised learning approach to word sense disambiguation, where the results of the systems duluth6, duluth7, duluth8, duluth9, duluth10, duluthX, and duluthY are combined into an ensemble. Each of those systems outputs probabilities for each sense when presented with a test example, so all of these are summed together and the sense with the maximum probability is assigned to a test example.

This is the same approach as taken in duluthC for English.

7. keywords: supervised learning, ensemble

8. URL containing additional information: Complete source code and documentation for this system will be available by the end of August 2001 at : this site

Top



1. System name:

2. Your contact details

name: Judita Preiss

email: Judita.Preiss@cl.cam.ac.uk

organisation: University of Sheffield

3. Task/s: English all words

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? No training data was provided.

6. Description (250 words max): Our Word Sense Disambiguation (WSD) system was entered for the English all words task in Senseval 2001. The system makes use of the WordNet 1.7 hierarchy and an anaphora resolution algorithm to assign precisely one sense to each noun in the text. (The algorithm can only assign a sense to nouns that occur in WordNet.)

A distance function based on the WordNet noun hierarchy forms the core of the WSD algorithm. We assign a weight to every node in WordNet; this is proportional to the number of descendants of the node. By combining appropriate weights, we can derive the "distance" between any two noun senses. For a chosen path of senses (this consists of one sense for each noun in the text) we sum all the pairwise distances, giving us an "energy" value for the path. This energy is minimized using simulated annealing. (See Preiss 2001 for a more detailed explanation.)
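A minimal sketch of the annealing step (`dist` abstracts the WordNet-based distance; the schedule parameters are placeholders, not those of the actual system):

    import math, random

    def anneal(senses_per_noun, dist, steps=10000, t0=1.0, cooling=0.999):
        """Pick one sense per noun so as to (approximately) minimize the
        summed pairwise distances ('energy') via simulated annealing."""
        path = [random.choice(s) for s in senses_per_noun]
        def energy(p):
            return sum(dist(a, b) for i, a in enumerate(p) for b in p[i + 1:])
        e, t = energy(path), t0
        for _ in range(steps):
            i = random.randrange(len(path))
            old = path[i]
            path[i] = random.choice(senses_per_noun[i])  # propose a change
            e_new = energy(path)
            if e_new > e and random.random() >= math.exp((e - e_new) / t):
                path[i] = old        # reject the uphill move
            else:
                e = e_new            # accept
            t *= cooling
        return path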

This method is combined with an anaphora resolution algorithm (Kennedy and Boguraev 1996). Resolving pronouns and replacing them in the text with their antecedents leads to the repetition of some words that are likely to be a "topic stamp" (Boguraev et al. 1998). The senses of the repeated words are tied together, making any change of sense more noticeable (in the resulting energy value). This leads to more accurate disambiguation of topic words, which in turn increases the disambiguation accuracy for other nouns.

7. keywords: WSD, conceptual distance, anaphora resolution

8. URL containing additional information: Click here

Top


1. System name: UMD-SST

2. Your contact details

name: Philip Resnik, Jessica Stevens, Clara I. Cabezas

email: {resnik,clarac,stevenjc}@umiacs.umd.edu

organization: University of Maryland, College Park, Linguistics Department & UMIACS, USA

3. Task/s: English lexical sample (official), Spanish lexical sample (official), Swedish lexical sample (official), Basque lexical sample (unofficial)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: SST is a supervised word sense tagger. This tagger uses support vector machine (SVM) learning to build classifiers from the training data. The IDF-weighted feature vector contains both wide co-occurrence and local collocation features. Wide co-occurrences include all of the words in each instance whereas local collocation uses a window consisting of the 3 tokens immediately before and after the word to be tagged (i.e. features include left_wd3, left_wd2, left_wd1, right_wd1, right_wd2, right_wd3). Training and test data were automatically cleaned and tokenized before being submitted to training and classification. Classification produces a confidence score for every sense; our answers include just the highest scoring sense for each instance.
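A sketch of the feature extraction just described (the exact tokenization, cleaning and IDF weighting of SST are not reproduced; the feature-name scheme is invented here):

    def extract_features(tokens, target_index):
        """Wide co-occurrences plus position-marked collocations from a
        window of 3 tokens either side of the target."""
        feats = set(tokens)                       # wide co-occurrence
        for offset in (1, 2, 3):                  # local collocations
            if target_index - offset >= 0:
                feats.add("left_wd%d=%s" % (offset, tokens[target_index - offset]))
            if target_index + offset < len(tokens):
                feats.add("right_wd%d=%s" % (offset, tokens[target_index + offset]))
        return feats

    # Such feature sets can then be IDF-weighted and passed to an SVM,
    # e.g. via TfidfVectorizer(analyzer=lambda doc: doc) and LinearSVC.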

7. keywords: Support vector machine, SVM, collocations.

8. URL containing additional information (optional):

Top


1. System name: kunlp

2. Your contact details

name: Hee-Cheol Seo, Sang-Zoo Lee, Hae-Chang Rim.

email: {hcseo, zoo, rim}@nlp.korea.ac.kr

organisation: Korea University

3. Task/s: English lexical

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description: We used the classification information model where local contexts, topical contexts and bigram contexts are used as features in order to decide the sense of a polysemous word in a context.

*. Feature

A local context consists of the following features for all words within a window.
Word_Position : a word and its position
Word_POS : a word and its part-of-speech.
POS_position : the part-of-speech and position of a word.
The window size of +-3 words was empirically chosen.

A topical context consists of the following features for all open-class words within a window.
Word : an open-class word
The window size of +-1 sentences was empirically chosen.

A bigram context consists of the following features for all word pairs within a window.
Word+Word : i-th word and j-th word (i!=j)
Word+POS : i-th word and j-th part-of-speech (i!=j)
POS+Word : i-th part-of-speech and j-th word (i!=j)
The window size of +-2 words was empirically chosen.
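As an illustration of the local-context templates above (a sketch; the feature encodings are invented here):

    def local_features(tagged, i, window=3):
        """tagged: list of (word, pos) pairs; i: index of the target.
        Emits Word_Position, Word_POS and POS_Position features."""
        feats = []
        for j in range(max(0, i - window), min(len(tagged), i + window + 1)):
            if j == i:
                continue
            w, p = tagged[j]
            feats.append("word@%+d=%s" % (j - i, w))   # Word_Position
            feats.append("word/pos=%s/%s" % (w, p))    # Word_POS
            feats.append("pos@%+d=%s" % (j - i, p))    # POS_Position
        return feats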

*. Procedure of sense disambiguation

(1) Filter out senses using the satellite features.
(2) Disambiguate word sense using the classification information model.

*. Classification information model (CIM)
CIM disambiguates word senses by considering the discrimination score (DS) of features.
The DS of a feature is the sum of the relevance scores between the feature and each sense.
The relevance score between a feature and a sense is proportional to the conditional probability of the feature given the sense.

The equations that we used are in the following web site.
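One plausible reading of these definitions as code (the exact equations are on the web site referenced below; the counts and the lack of smoothing here are assumptions):

    def relevance(count_feature_and_sense, count_sense):
        """rel(f, s), taken as proportional to P(f | s)."""
        return count_feature_and_sense / count_sense if count_sense else 0.0

    def discrimination_score(feature, senses, counts):
        """DS(f): sum of rel(f, s) over all senses s.  `counts` maps
        (feature, sense) pairs and senses to training-data counts."""
        return sum(relevance(counts.get((feature, s), 0), counts.get(s, 0))
                   for s in senses)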

7. keywords: Classification Information model, local context, topical context, bigram context

8. URL containing additional information (optional): Click here

Top


1. System name: kunlp-korean

2. Your contact details

name: Hee-Cheol Seo, Sang-Zoo Lee, Hae-Chang Rim.

email: {hcseo, zoo, holee, rim}@nlp.korea.ac.kr

organisation: Korea University

3. Task/s: Korean lexical

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)?

6. Description: We used the classification information model where local contexts, topical contexts and bigram contexts are used as features in order to decide the sense of a polysemous word in a context.

*. Feature
-> An "eojeol" in Korean is a spacing unit and consists of one or more morphemes.

A local context consists of the following features for all words within a window.
Morpheme_Position : a morpheme and its position.
Morpheme_POS : a morpheme and its part-of-speech.
POS_Position : the part-of-speech and position of a morpheme
Eojeol_Position : an "eojeol" and its position.
The window size of +3/-2 words was empirically chosen.

A topical context consists of the following features for all open-class morphemes within a window.
Morpheme : an open-class morpheme
The window size of all sentences was empirically chosen.

A bigram context consists of the following features for all word pairs within a window.
Morpheme+Morpheme : i-th morpheme and j-th morpheme (i!=j)
Morpheme+POS : i-th morpheme and j-th part-of-speech (i!=j)
POS+POS : i-th part-of-speech and j-th part-of-speech (i!=j)
Eojeol+Eojeol : i-th "eojeol" and j-th "eojeol" (i!=j)
The window size of +3/-2 words was empirically chosen.

*. Classification information model (CIM)
CIM disambiguates word senses by considering the discrimination score (DS) of features.
The DS of a feature is the sum of the relevance scores between the feature and each sense.
The relevance score between a feature and a sense is proportional to the conditional probability of the feature given the sense.

The equations that we used are in the following web site.

7. keywords: Classification Information model, local context, topical context, bigram context

8. URL containing additional information: Click here

Top


1. System name: Ibaraki

2. Your contact details

name: Hiroyuki Shinnou

email: shinnou@dse.ibaraki.ac.jp

organisation: Ibaraki University

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We manually constructed training data from newspaper articles, with 170 instances for each entry word. Features were collected in a 7-word window around the target word, and a decision list method was used for learning.
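A minimal decision-list sketch in the style of Yarowsky (the smoothed log-likelihood scoring here is an illustrative assumption, not the Ibaraki implementation):

    import math
    from collections import Counter

    def learn_decision_list(instances, alpha=0.1):
        """instances: (feature_set, sense) pairs; returns rules sorted
        by a smoothed log-likelihood score, best first."""
        counts = Counter((f, s) for feats, s in instances for f in feats)
        senses = {s for _, s in instances}
        rules = []
        for (f, s), n in counts.items():
            others = sum(counts[(f, s2)] for s2 in senses if s2 != s)
            rules.append((math.log((n + alpha) / (others + alpha)), f, s))
        return sorted(rules, reverse=True)

    def classify(rules, feats, default):
        for score, f, s in rules:   # first matching rule decides
            if f in feats:
                return s
        return default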

7. keywords: training data, 7-word window, decision list

8. URL containing additional information (optional):

Top


1. System name: WASPS-Workbench

2. Your contact details

name: David Tugwell

email: David.Tugwell@itri.brighton.ac.uk

organisation: ITRI, University of Brighton

3. Task/s: English lexical sample

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N (though we downloaded the training data to find out which words were in the evaluation set).

6. Description: The WASPS-Workbench is a browser-based tool which integrates lexicography and automatic WSD for the benefit of both parties. The user enters a word to be analyzed and the Workbench calculates a "Word-Sketch": a page of statistically significant collocation patterns for that word (currently based on the BNC). On the basis of these patterns, the user then draws up a sense inventory and assigns senses to particular patterns. These assignments are then used as seeds for a bootstrapping algorithm (a la Yarowsky, 1995) which disambiguates the whole corpus. The result for the lexicographer is a number of "Sense-Sketches", showing significant patterns for the individual senses of the word, while for automatic WSD we have a decision list of clues for sense disambiguation, consisting of grammatical relation patterns, words-in-context, and n-grams.

For the SENSEVAL task, we had to assign senses from a fixed inventory (here WordNet). The disadvantage of this is that we are often forced to make difficult, if not impossible, decisions in distinguishing between senses. Also, as we do not use the training data, we have no knowledge of the relative frequencies of the different senses.

The task of assigning senses to patterns was shared between the entrant and one paid assistant who had never used the system before. For the assistant, the length of the interaction with the system ranged from 3 to 25 mins, with an average under 15 mins.

Due to time constraints we only managed to submit results for nouns within the deadline.

7. keywords: lexicographic input, decision list

8. URL containing additional information (optional):

Top


1. System name: CRL-NYU

2. Your contact details

name: Kiyotaka Uchimoto

email: uchimoto@crl.go.jp

organisation: CRL and NYU

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: Given an input sentence, our system returns an entry number in the translation memory (TM) or a translation of the head word (or phrase). Our system returns a TM entry when the similarity between the Japanese example of a TM entry and the input sentence is over a threshold. The similarity is calculated from the agreement between the Japanese examples in the TM and the input sentence.

Otherwise, our system returns a translation as follows:

1. TM entries are classified according to the English head word, after being reinforced by a bilingual corpus.
2. For each cluster, similar examples to Japanese and English examples of TM in the cluster are collected from monolingual corpora (newspaper articles). Similar examples are defined as sentences which share the head word and several words around it.
3. Given a sentence, our system outputs the head word in the closest cluster to the given sentence. The closest cluster is selected based on machine learning systems such as SVM. The features used in the SENSEVAL2 formal run were mainly as follows:

1. Several words on the left and on the right of the head word in a given sentence, and their POS assigned by the morphological analyzer JUMAN.
2. N-grams including the head word in a given sentence.
3. English head word in the cluster where the longest string in a given sentence is found, and the length.
4. Frequencies and existence of Japanese content words in a given sentence and those of translations of the content words in each cluster.

7. keywords: SVM, ME, decision list, simple Bayes

8. URL containing additional information (optional):

Top


1. System name: semyhe

2. Your contact details

name: Kadri Vider

email: kadriv@ut.ee

organisation: University of Tartu

3. Task/s Estonian all words

4. Did you use any training data provided in an automatic training procedure? Yes

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: Our system is based on Eneko Agirre and German Rigau's paper "Word Sense Disambiguation Using Conceptual Density": "The method relies on the use of the wide-coverage noun-taxonomy of WordNet and the notion of conceptual distance among concepts, captured by a Conceptual Density formula developed for this purpose." Our idea was to test this method not only on Estonian nouns, but on verbs and phrases as well. The system was created and developed by Kaarel Kaljurand.

7. keywords: wordnet-based, conceptual density

8. URL containing additional information: Click here - in Estonian, sorry!

Top


1. System name: Anonym1

2. Your contact details

name: XXX

email: XXX

organisation: XXX

3. Task/s: Japanese lexical sample

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description: A commercial MT system was used as-is. The MT system employs a translation dictionary and standard Japanese and English grammars, and performs morpho-syntactic and semantic analyses of the source (Japanese) sentences.

7. keywords: machine translation system, translation dictionary, semantic analysis

8. URL containing additional information (optional):

Top


1. System name: Anonym2

2. Your contact details

name: XXX

email: XXX

organisation: XXX

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description (250 words max): A commercial MT system was used as-is. The MT system is based on an interlingua approach.

7. keywords: machine translation system, interlingua

8. URL containing additional information (optional):

Top


1. System name: Anonym3

2. Your contact details

name: XXX

email: XXX

organisation: XXX

3. Task/s: Japanese lexical sample (Translation Memory)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: Each sentence (TM entries for learning, an input sentence for testing) is morphologically analyzed and converted into a semantic tag sequence, and a maximum entropy method was used for learning.

7. keywords: morphological analysis, semantic tag, maximum entropy

8. URL containing additional information (optional):

Top


1. System name: Titech1

2. Your contact details

name: Yutaka Yagi

email: yutaka@cl.cs.titech.ac.jp

organisation: Tokyo Institute of Technology

3. Task/s: Japanese dictionary-based task (lexical sample)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We learned decision lists from the training data. The features used in decision lists are content words and part-of-speech tags in a window.

In the Iwanami Kokugo Jiten (a Japanese dictionary), the sense inventory of this task, word sense descriptions contain example sentences. We also used the content words in these example sentences as features of the decision lists.

7. keywords: decision list, example sentences in sense description

8. URL containing additional information (optional):

Top


1. System name: Titech2

2. Your contact details

name: Yutaka Yagi

email: yutaka@cl.cs.titech.ac.jp

organisation: Tokyo Institute of Technology

3. Task/s: Japanese dictionary-based task (lexical sample)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: This system is almost the same as `Titech1'. The only difference is that morphological information in the evaluation data was corrected automatically: we learned transformation rules, as in Brill's POS tagger, to correct the morphological information.

7. keywords: decision list, correcting morphological information, transformation rules

8. URL containing additional information (optional):

Top


1. System name: JHU-Basque, JHU-Spanish and JHU-Swedish

2. Your contact details

name: David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer and Richard Wicentowski

email: {yarowsky,silviu,rflorian,cschafer,richardw}@cs.jhu.edu

organisation: Computer Science Department and Center for Language and Speech Processing, Johns Hopkins University

3. Task/s: Spanish lexical choice, Swedish lexical choice, Basque lexical choice

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The JHU SENSEVAL-2 system for the lexical sample tasks consists of 6 diverse supervised learning subsystems integrated via classifier combination. The subsystems included decision lists (Yarowsky, 2000), transformation-based error-driven learning (Brill, 1995; Florian and Ngai, 2001), cosine-based vector models, decision stumps and two feature-enhanced naive Bayes systems (one trained on words, one trained on lemmas). For every subsystem the features included not only bag-of-words in a fixed context window and contextual n-grams, but also a rich variety of syntactic features including subjects, objects and objects of prepositions of verbs and several modification relationships for nouns and adjectives. For Spanish, Swedish and Basque these relationships were approximated using heuristic patterns. Additional features included parts-of-speech and lemmas in all syntactic positions, extracted using JHU-developed algorithms based on minimally supervised learning (including Yarowsky and Wicentowski, 2000; Cucerzan and Yarowsky, 2000). The output of each subsystem was merged by a classifier combination algorithm using weighted and thresholded voting and score combination.
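A skeletal illustration of the voting step (the weights, threshold and per-subsystem output format are assumptions; the actual JHU combiner also performs score combination):

    from collections import defaultdict

    def combine(subsystem_outputs, weights, threshold=0.0):
        """subsystem_outputs: one (sense, confidence) pair per
        subsystem; weights: one weight per subsystem."""
        votes = defaultdict(float)
        for (sense, conf), w in zip(subsystem_outputs, weights):
            if conf >= threshold:          # thresholded voting
                votes[sense] += w * conf   # weighted voting
        return max(votes, key=votes.get) if votes else None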

7. keywords: classifier combination, word sense disambiguation, bayes similarity, cosine similarity, decision lists, transformation based learning

8. URL containing additional information (optional):

Top


1. System name: JHU-Czech

2. Your contact details

name: David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer and Richard Wicentowski

email: {yarowsky,silviu,rflorian,cschafer,richardw}@cs.jhu.edu

organisation: Computer Science Department and Center for Language and Speech Processing, Johns Hopkins University

3. Task/s: Czech all words

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: Because of the importance of morphological analysis in a highly inflected language such as Czech, a part-of-speech tagger and lemmatizer kindly provided by Jan Hajic of Charles University was first applied to the data. Consistent with the spirit of evaluating sense disambiguation rather than morphology, the JHU system focused on those words where more than one sense was possible for a root word (e.g. the -1 and -2 suffixes in the Czech inventory). In these cases, the fine-grained output of the Czech lemmatizer was ignored (in both training and test) and a generic lexical sample sense classifier was trained on the training data (see JHU_Swedish for further details of this). Whenever insufficient numbers of minority tagged examples were available for training a word-specific classifier, the majority sense for the POS-level lemma was returned. Likewise, if only one possible sense tag was observed for any POS-level lemma analysis, then this unambiguous sense tag was also returned.

7. keywords: word sense disambiguation, morphological analysis, part-of-speech tagging, highly inflected languages

8. URL containing additional information (optional):

Top


1. System name: JHU_English

2. Your contact details

name: David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer and Richard Wicentowski

email: {yarowsky,silviu,rflorian,cschafer,richardw}@cs.jhu.edu

organisation: Computer Science Department and Center for Language and Speech Processing, Johns Hopkins University

3. Task/s: English Lexical choice

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: The JHU SENSEVAL-2 system for the lexical sample tasks consists of 6 diverse supervised learning subsystems integrated via classifier combination. The subsystems included decision lists (Yarowsky, 2000), transformation-based error-driven learning (Brill, 1995; Florian and Ngai, 2001), cosine-based vector models, decision stumps and two feature-enhanced naive Bayes systems (one trained on words, one trained on lemmas). For every subsystem the features included not only bag-of-words in a fixed context window and contextual n-grams, but also a rich variety of syntactic features including subjects, objects and objects of prepositions of verbs and several modification relationships for nouns and adjectives. These relationships were approximated using heuristic patterns over base noun phrase bracketed sentences (Florian and Ngai, 2001). Additional features included parts-of-speech and lemmas in all syntactic positions, extracted using a Brill-style POS tagger and morphological analysis based on Yarowsky and Wicentowski (2000). The output of each subsystem was merged by a classifier combination algorithm using weighted and thresholded voting and score combination.

The JHU-English system differed slightly from our other lexical sample systems in its treatment of phrasal senses. Because phrasal compounds such as verb-particle pairs were explicitly marked, both in training and test data, this additional information was used as follows: If a phrasal compound was marked in the data, then only compound senses (e.g. verb-particle) were considered. Likewise, if phrasal compounds were not marked in the data, the compound senses were excluded from consideration.

7. keywords: classifier combination, word sense disambiguation, bayes classifiers, cosine similarity, decision lists, transformation based learning

8. URL containing additional information (optional):

Top


1. System name: JHU_Estonian

2. Your contact details

name: David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer and Richard Wicentowski

email: {yarowsky,silviu,rflorian,cschafer,richardw}@cs.jhu.edu

organisation: Computer Science Department and Center for Language and Speech Processing, Johns Hopkins University

3. Task/s: Estonian all words

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: Because of the importance of morphological analysis in a highly inflected language such as Estonian, a lemmatizer based on Yarowsky and Wicentowski (2000) was first applied to all words in the training data (and, at evaluation time, the test data). For each lemma, the P(sense|lemma) distribution was measured on the training data. For all lemmas exhibiting only one sense in the training data, this sense was returned. Likewise, if there was insufficient data for word-specific training (the sum of the minority sense examples for the word in the training data was below a threshold), the majority sense in training was returned for all instances of that lemma. In the remaining cases, where a lemma had more than one sense in training and sufficient minority examples to be adequately modeled, the generic JHU lexical sample sense classifier was trained and applied (see JHU_Swedish for further details).
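The routing logic can be sketched as follows (the threshold and the per-word classifier are abstracted; `classify_in_context` stands in for the generic JHU lexical sample classifier, with context omitted):

    def route(lemma, sense_counts, min_minority, classify_in_context):
        """sense_counts: a collections.Counter of the training senses
        observed for this lemma."""
        if len(sense_counts) == 1:
            return next(iter(sense_counts))      # only one sense seen
        majority, majority_n = sense_counts.most_common(1)[0]
        if sum(sense_counts.values()) - majority_n < min_minority:
            return majority                      # too sparse: majority sense
        return classify_in_context(lemma)        # word-specific classifier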

7. keywords: word sense disambiguation, morphological analysis, classifier combination

8. URL containing additional information (optional):

Top


1. System name: JHU_Italian

2. Your contact details

name: David Yarowsky, Silviu Cucerzan, Radu Florian, Charles Schafer and Richard Wicentowski

email: {yarowsky,silviu,rflorian,cschafer,richardw}@cs.jhu.edu

organisation: Computer Science Department and Center for Language and Speech Processing, Johns Hopkins University

3. Task/s: Italian lexical sample

4. Did you use any training data provided in an automatic training procedure? N

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? N

6. Description: Because no training data was provided for Italian, the JHU_Italian system was unsupervised, exploiting hierarchical cluster models induced from the Italian WordNet. Because several relationship types (e.g. hypernymy) are represented in the Italian WordNet, each relationship type was arbitrarily assigned a generic weight indicating the rough semantic similarity implied by that relationship (e.g. synonymy received 0.5, while meronymy received 10). A weighted graph over all the relationships was then created, and for each word w to be disambiguated, the graph was examined to determine the words v with a direct relationship to w (one link away in the graph).

From an unannotated Italian web-mined corpus (approx. 6M words), we selected all the sentences containing these words v. Each such context was treated as representative of the corresponding sense of w, with the appropriate relationship weight, and a Bayes similarity-based model was used to run an adaptive clustering algorithm in a k-means fashion. The cluster assignment of each test sample after the convergence of the clustering algorithm was output as its classification.
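
A minimal sketch of the seed-harvesting stage described above, with assumed relation weights and a hypothetical flat view of the WordNet graph (the Bayes-similarity clustering itself is omitted):

```python
# Relation weights per the description above; the hypernym value is assumed.
REL_WEIGHT = {"synonym": 0.5, "hypernym": 2.0, "meronym": 10.0}

def sense_neighbours(wordnet_edges, w):
    """wordnet_edges: iterable of (word, relation, neighbour, sense) tuples,
    a hypothetical flat view of the Italian WordNet graph.
    Returns neighbour v -> (sense of w it supports, relation weight)."""
    out = {}
    for word, rel, v, sense in wordnet_edges:
        if word == w and rel in REL_WEIGHT:
            out[v] = (sense, REL_WEIGHT[rel])
    return out

def harvest_contexts(corpus_sentences, neighbours):
    """Label each corpus sentence containing some neighbour v as a weighted
    exemplar of the corresponding sense of w; these seed the clustering."""
    exemplars = []
    for sent in corpus_sentences:
        tokens = set(sent.split())
        for v, (sense, weight) in neighbours.items():
            if v in tokens:
                exemplars.append((sent, sense, weight))
    return exemplars
```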

Because no sense-tagged training data was used, it was not possible to know the lexical priors or even the majority sense for each word. Thus baseline performance is significantly lower than for the other lexical-sample tasks.

7. keywords:unsupervised word sense disambiguation, bayes similarity, adaptive clustering, unsupervised learning

8. URL containing additional information (optional):

Top


1. System name: NAIST

2. Your contact details

name: Kaoru Yamamoto

email: kaoru-ya@is.aist-nara.ac.jp

organisation: Nara Institute of Science and Technology

3. Task/s: Japanese lexical sample (Dictionary)

4. Did you use any training data provided in an automatic training procedure? Y

5. (if the answer to (4) is no) did you use any training data provided in any way (eg as a test set for debugging)? :

6. Description: We used SVM, PCA and ICA for learning. The features used in the model are the outputs of morphological and syntactic analysis. We also used Mainichi Shimbun newspaper articles from 1994 as additional training data.
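
As an illustration only, a pipeline of this shape could be written with scikit-learn; this is a sketch under assumed component sizes, not the authors' implementation.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction import DictVectorizer
from sklearn.decomposition import PCA, FastICA
from sklearn.svm import SVC

# Features would be outputs of morphological/syntactic analysis, encoded
# as dicts, e.g. {"pos-1": "particle", "head": "taberu"} (hypothetical).
pipeline = Pipeline([
    ("vec", DictVectorizer(sparse=False)),  # one-hot encode symbolic features
    ("pca", PCA(n_components=100)),         # dimensionality reduction (assumed size)
    # ("ica", FastICA(n_components=100)),   # ICA variant could be swapped in
    ("svm", SVC(kernel="linear")),          # sense classifier
])
# pipeline.fit(train_features, train_sense_ids)
# predictions = pipeline.predict(test_features)
```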

7. keywords: SVM, PCA, ICA

8. URL containing additional information (optional):

Top


TASK DESCRIPTIONS (in alphabetical order of Tasks).


1. Task name: Basque Lexical Sample

2. Your contact details

name: Eneko Agirre

email: eneko@si.ehu.es

organisation: University of the Basque Country

3. Description:

The Basque task consists of lexical samples for 40 words, with approximately 75 + 15*senses + 6*multiword_terms samples per word; for example, a word with 4 senses and 2 multiword terms yields roughly 75 + 60 + 12 = 147 samples. Each sample comprises 5 sentences centered on the target word, taken from a newspaper corpus, and, if there is interest, the whole documents will be made available via the internet. The lexicon used is the reference dictionary Euskal Hiztegia, in TEI-SGML format. The senses are hierarchically organized, and the definition, synonyms and examples are provided for each sense, among other lexicographical data. Due to the complex structure of the dictionary, a flat list of word senses and multiword terms is also provided.

8. URL containing additional information (optional):

Top


1. Task name: Dutch All Words

2. Your contact details

name: Antal van den Bosch

email: antalb@kub.nl

organisation: Tilburg University

3. Description:

The corpus is a concatenation of the texts of 102 illustrated children's books for the age range of 4 to 12. Each word in these texts was manually annotated with its appropriate sense by six annotators, each of whom processed a different part of the data. The sense inventory is roughly based on a Dutch children's dictionary.

Sense tags are non-hierarchical. Each tag is realised as a mnemonic description of the specific meaning the word has in the sentence. It is usually composed of the word's lemma and a sense circumscription of one or two words, often using a related term (drogen_nat, "dry_wet") or a reference to the grammatical category (fiets_N, fietsen_V). When a word is not ambiguous, its sense is tagged with "=".
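
A minimal sketch of how such tags might be decomposed, assuming a single underscore separates lemma from circumscription (the actual tag grammar may be richer):

```python
def parse_sense_tag(word, tag):
    """Split a sense tag into lemma and circumscription.

    '=' marks an unambiguous word; otherwise the tag is the lemma plus
    an underscore-joined circumscription, e.g. 'drogen_nat', 'fiets_N'."""
    if tag == "=":
        return word, None
    lemma, _, circumscription = tag.partition("_")
    return lemma, circumscription or None

print(parse_sense_tag("droogde", "drogen_nat"))  # ('drogen', 'nat')
print(parse_sense_tag("fiets", "fiets_N"))       # ('fiets', 'N')
```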

The dataset also contains senses that span multiple words. The meanings of these multi-word expressions cannot be broken down into the meanings of the individual words in the expressions. Multi-word expressions cover idiomatic expressions (in de steek laten, "to leave in the lurch"; aan de hand zijn, "to be the matter"), sayings and proverbs (Boontje komt om zijn loontje, roughly "you reap what you sow") and strong collocations (derde wereld, "third world"; klavertje vier, "four-leaf clover").

Some basic statistics for the complete corpus:

# tokens: 152,758
# types: 10,263
# sentences: 12,287
# words per sentence (average): 12.4
# unambiguous words: 9,095
# words that occur once: 4,949
# sense tags: 9,319
# word/sense combinations occurring once: 6,702
% of ambiguous tokens in corpus: 54

For SENSEVAL-2, the dataset was divided into two parts. The training set consisted of 76 books (about 115,000 words); the test set consisted of the remaining 26 books (about 38,000 words).

8. URL containing additional information (optional):

Top


1. Task name: English Lexical Sample

2. Your contact details

name: Adam Kilgarriff

email: adam@itri.bton.ac.uk

organisation:ITRI University of Brighton

3. Description:

The lexicon used was WordNet 1.7; instances were mostly from the British National Corpus, with some from the Wall Street Journal. The approach was very similar to the SENSEVAL-1 task (see the Kilgarriff and Rosenzweig paper in Computers and the Humanities 34 (1-2), SENSEVAL Special Issue, 2000). There were 29 nouns and 15 adjectives, with between 70 and 455 instances per word (total instances: 7,567, divided 2:1 between training data and test data). Inter-tagger agreement was 85.5%.

(verb data is not covered here as that was prepared by Martha Palmer and colleagues at UPenn)

8. URL containing additional information (optional): Click here

(or main SENSEVAL page for results)

Top


1. Task name: Japanese Lexical Sample (translation task)

2. Your contact details

name: Sadao Kurohashi

email: kuro@kc.t.u-tokyo.ac.jp

organisation: University of Tokyo

3. Description: In this task, word sense is defined according to translation distinctions (i.e. if the head word is translated differently in a given context, it is treated as constituting a different sense), and word sense disambiguation involves selecting the appropriate English word/phrase/sentence equivalent for a Japanese word.

All participants are supplied with a translation memory (TM). The TM contains, for each Japanese head word, a list of typical Japanese expressions (phrases/sentences) involving that head word, together with an English translation for each. Each pair is treated as a distinct sense and has a unique "sense ID".

In the evaluation, 40 words (20 nouns and 20 verbs) are selected from the TM as target words, and 30 instances of each target word are provided, making for a total of 1,200 instances. The test documents are annotated with morphological information (word segmentation, POS tag, reading and base form, all automatically annotated) for all words. For each target word, participants must submit the ID of the TM sense best approximating its usage. Submitted senses are judged correct if they are contained in the gold-standard sense ID set. Alternatively, submissions can take the form of actual target word translations, or translations of phrases or sentences including each target word; in this case, translation experts judge whether the supplied translation is appropriate. Since participants can return translations of the test sentences as answers, several existing MT systems entered the contest.
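
For the sense-ID route, scoring reduces to membership in the gold-standard set; here is a minimal sketch under assumed data structures (the free-translation route, judged manually by experts, is not modeled):

```python
def score_sense_submissions(submissions, gold):
    """submissions: instance ID -> submitted sense ID.
    gold:           instance ID -> set of acceptable sense IDs.
    A submission is correct if it lies in the gold sense ID set."""
    correct = sum(1 for i, s in submissions.items() if s in gold.get(i, set()))
    return correct / len(submissions) if submissions else 0.0
```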

8. URL containing additional information (optional):

Top


1. Task name: Japanese Lexical Sample (dictionary-based task)

2. Your contact details

name: Kiyoaki Shirai

email: kshirai@jaist.ac.jp

organisation: Japan Advanced Institute of Science and Technology

3. Description: Word senses are defined according to the Iwanami Kokugo Jiten, a Japanese dictionary published by Iwanami Shoten.

The Iwanami Kokugo Jiten is distributed to all participants. For each sense in the dictionary, a corresponding sense ID and morphological information (word segmentation, POS tag, base form and reading, all manually post-edited) are supplied. A corpus annotated with sense IDs is also distributed as training data; it is made up of 3,000 newspaper articles extracted from the 1994 Mainichi Shimbun and consists of 888,000 words, of which 149,556 are manually annotated for sense. The text is also annotated with morphological information (word segmentation, POS tag, base form and reading, all manually post-edited) for all words. Furthermore, each article is assigned a UDC (Universal Decimal Classification) code representing its text class.

For evaluation, we distribute test documents with marked target words. Participants are required to assign one or more sense IDs to each target word, optionally with associated probabilities. Test documents take the form of newspaper articles annotated with UDC codes. The text is also annotated with morphological information (word segmentation, POS tag, reading and base form, all automatically annotated) for all words. Note that the morphological information in the training data is post-edited but that in the evaluation data is not, so participants may choose to ignore the morphological information in the evaluation data.

There are 100 target words: 50 nouns and 50 verbs. 100 instances of each target word are provided, making for a total of 10,000 instances for evaluation.

8. URL containing additional information (optional):

Top


1. Task name: Spanish Lexical Sample

2. Your contact details

name: German Rigau Claramunt

email: g.rigau@lsi.upc.es

organisation: TALP Research Center (UPC), Barcelona.

3. Description: The task for Spanish is a 'lexical sample' for 40 words (18 nouns, 13 verbs and 9 adjectives). Each item chosen belongs to only one of the syntactic categories, and the sentences have been chosen to illustrate it. Each corpus sample is a single sentence, taken from one of two corpora: 'El Periódico' (a Spanish newspaper) and Lexesp-III (DGICYT APC 99-0105; a collection of texts from different thematic areas).

The lexicon provided has been created specifically for the task. It consists of a definition for each sense, linked to the Spanish version of EuroWordNet and, thus, to the English WordNet 1.5; the syntactic category and, sometimes, examples and synonyms are also provided. Neither proper nouns nor multiwords have been considered. We can also provide the complete mapping between the WordNet 1.5 and 1.6 versions (see this site).

7. keywords: semantic domains, domain driven disambiguation.

8. URL containing additional information (optional): Click here

Top


1. Task name: Swedish Lexical Sample

2. Your contact details

name: Dimitrios Kokkinakis

email: Dimitrios.Kokkinakis@svenska.gu.se

organisation: Språkdata, Göteborg University

3. Description: The lexical sample task in SENSEVAL-2 for Swedish consisted of 40 lemmas (145 senses, 304 sub-senses):

20 nouns
15 verbs
5 adjectives

The lexicon was generated on 2001-04-24 from GLDB/SDB (Click here).
8,718 annotated instances were provided as training material and 1,527 unannotated instances were provided for testing. The underlying corpus, from which all instances were gathered and annotated, is the Stockholm-Umeå Corpus (SUC) version 1.0: http://www.ling.su.se/DaLi/Projects/SUC/Index.html

8. URL containing additional information (optional): Kokkinakis D., Jarborg J. and Cederholm Y. (2001), Swedish SENSEVAL: A Developers' Perspective. Proceedings of the NODALIDA (Nordiska Datalingvistikdagarna) Conference, Uppsala, Sweden. Click here

Top