Language Technology at LTH

Computer Science | Faculty of Engineering, LTH

Semantic parsing: PropBank–NomBank frames

Prerequisites

  • A Java virtual machine, version 1.5 or above.
  • 1.4 GB of disk space.
  • 4 GB of RAM for the full parser, 2 GB for the simplified version.
  • Optional: your favourite tokenizer and POS tagger.
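
Before unpacking the package, it can help to confirm that the installed JVM meets the version requirement. The following is a minimal sketch, not part of the package: the helper name java_version_ok is hypothetical, and it assumes that `java -version` prints a quoted version string on stderr, as common JVMs do.

```sh
# Hypothetical helper: succeed if a Java version string is 1.5 or newer.
# Handles both the old scheme ("1.5.0_22") and the newer one ("17.0.2").
java_version_ok() {
    major=${1%%.*}            # text before the first dot
    rest=${1#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    case $major in ''|*[!0-9]*) return 1 ;; esac
    if [ "$major" -gt 1 ]; then return 0; fi
    [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 5 ]
}

# Usage: extract the quoted version from `java -version` (printed on stderr).
if command -v java >/dev/null 2>&1; then
    v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_version_ok "$v"; then
        echo "Java $v is sufficient"
    else
        echo "Java $v is too old or was not recognized"
    fi
else
    echo "java not found on PATH"
fi
```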

Installation

Download the program here. Unzip the package.

Usage

Enter the lth_srl directory.

First, you need an input file in the CoNLL-2008 format. The package includes a script, scripts/preprocess.sh, that tokenizes the text, adds part-of-speech tags, and finds lemmas. For instance, you can download this text file and apply the preprocessing script:

sh scripts/preprocess.sh < test.txt > test.tokens

If you prefer to use your own tokenizer or part-of-speech tagger, you must produce the CoNLL-2008 input file yourself. In that case, make sure to fill in the lemma column, at least for the predicates you are interested in.
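
When the input file is prepared by hand, a quick check for missing lemmas can catch problems before parsing. The sketch below assumes tab-separated columns with the lemma in column 3, as in the standard CoNLL-2008 layout (ID, FORM, LEMMA, ...), and `_` as the placeholder for an unset value. The function name missing_lemmas is hypothetical, and the exact columns the parser reads are best confirmed against the output of scripts/preprocess.sh.

```sh
# Hypothetical helper: list tokens whose lemma column (3) is empty or "_".
# Blank lines separate sentences and are skipped.
missing_lemmas() {
    awk -F '\t' 'NF > 0 && ($3 == "" || $3 == "_") {
        printf "line %d: %s\n", NR, $2
    }' "$1"
}

# Usage on a file you prepared yourself (the file name is only an example):
#   missing_lemmas my_input.conll
```

An empty result means every token has a lemma; otherwise each reported line shows the word form that still needs one.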

To run the full syntactic–semantic analyzer, use the script scripts/run.sh:

sh scripts/run.sh < test.tokens > test.output

You might need to increase the heap size declared in run.sh if you use a 64-bit machine.
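
A minimal sketch of raising the heap limit follows. It assumes that run.sh passes the standard JVM option `-Xmx` on its java command line; this page does not show the script's contents, so inspect it first. The helper name set_heap, the value 6g, and the output file name run_bigheap.sh are all illustrative.

```sh
# Hypothetical helper: print a copy of a script with every -Xmx<size>
# option replaced by the requested size. The input file is not modified.
set_heap() {
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}/-Xmx$2/g" "$1"
}

# Usage: inspect the current setting, then write and run an edited copy.
#   grep -n -e '-Xmx' scripts/run.sh
#   set_heap scripts/run.sh 6g > scripts/run_bigheap.sh
#   sh scripts/run_bigheap.sh < test.tokens > test.output
```

Writing to a copy rather than editing in place keeps the original script available if the new setting does not suit your machine.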

The script scripts/run.sh runs the full system: the second-order dependency parser, linguistic constraints, semantic reranking, and syntactic–semantic integration. To save time and memory, the system can also be run in simpler configurations by using one of the following scripts:

  • scripts/run_constraints.sh runs a simplified system: the second-order dependency parser with linguistic constraints, but no reranking and no syntactic–semantic integration.
  • scripts/run_greedy.sh is simpler still: the second-order dependency parser with no linguistic constraints, no reranking, and no syntactic–semantic integration.
  • scripts/run_constraints_fast.sh and scripts/run_greedy_fast.sh use a first-order parser, which runs in O(n^3) time for a sentence of n tokens, instead of the second-order parser, which runs in O(n^4).
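
The choice between these configurations can be wrapped in a small selector. This is only a sketch: the mode names (full, constraints, greedy, and so on) and the function srl_script are made up for illustration, and only the script paths come from the package. The usage line also assumes that the simplified scripts read standard input and write standard output in the same way as scripts/run.sh.

```sh
# Hypothetical helper: map an illustrative mode name to a package script.
srl_script() {
    case $1 in
        full)             echo scripts/run.sh ;;
        constraints)      echo scripts/run_constraints.sh ;;
        greedy)           echo scripts/run_greedy.sh ;;
        constraints-fast) echo scripts/run_constraints_fast.sh ;;
        greedy-fast)      echo scripts/run_greedy_fast.sh ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage, from inside the lth_srl directory:
#   sh "$(srl_script greedy-fast)" < test.tokens > test.output
```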

The output of all these scripts is in the CoNLL-2008 format.
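
To inspect the result, the predicate column can be pulled out with awk. The sketch assumes the standard CoNLL-2008 column order, tab-separated, with the word form in column 2 and the predicate sense in column 11 (`_` when the token is not a predicate). The function name list_predicates is hypothetical.

```sh
# Hypothetical helper: print "form<TAB>predicate" for every token that
# carries a predicate label in column 11.
list_predicates() {
    awk -F '\t' 'NF >= 11 && $11 != "_" { print $2 "\t" $11 }' "$1"
}

# Usage:
#   list_predicates test.output
```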
