Semantic parsing: PropBank–NomBank frames
- A Java virtual machine, version 1.5 or above.
- 1.4 GB of disk space.
- 4 GB of RAM for the full parser, 2 GB for the simplified version.
- Optional: your favourite tokenizer and POS tagger.
Download the program here. Unzip the package.
First, you need an input file in the CoNLL-2008 format. The package
includes a script, scripts/preprocess.sh, that tokenizes the text,
adds part-of-speech tags, and finds lemmas. For instance, you can
download this text file and apply the preprocessing script:
sh scripts/preprocess.sh < test.txt > test.tokens
If you prefer to use your own tokenizer or part-of-speech tagger, you have to produce the CoNLL-2008 input file yourself. In this case, don't forget to fill in the lemma column, at least for the predicates you are interested in.
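If you build the input file yourself, a quick check for missing lemmas can save a failed run. The sketch below is only an illustration: missing_lemmas is a hypothetical helper, not part of the package, and it assumes whitespace-separated columns with the lemma in the third column and "_" as the placeholder for an empty value, as in the CoNLL-2008 shared task data.

```sh
# Hypothetical helper (not part of the package): list tokens whose lemma
# column is still the "_" placeholder.
# Assumption: columns are whitespace-separated and start with ID, FORM, LEMMA.
missing_lemmas() {
    awk 'NF >= 3 && $3 == "_" { print FILENAME ": line " NR ": " $2 }' "$1"
}
```

Calling missing_lemmas test.tokens prints one line per token without a lemma and prints nothing if every lemma is set.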
To run the full syntactic–semantic analyzer, use the script scripts/run.sh:
sh scripts/run.sh < test.tokens > test.output
You might need to increase the heap size declared in
run.sh if you use a 64-bit machine.
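To find the heap setting before editing it, you can search the script for the JVM option. This is a sketch under an assumption: it supposes that run.sh sets the heap through the standard java option -Xmx, which this page does not state. show_heap_setting is a hypothetical helper name.

```sh
# Hypothetical helper: print the numbered lines of a launcher script that
# contain the JVM maximum-heap option.
# Assumption: the script passes the standard -Xmx option to java.
show_heap_setting() {
    grep -n -e '-Xmx' "$1"
}
```

Run show_heap_setting scripts/run.sh, then edit the value on the reported line (for example, from -Xmx2g to -Xmx4g) in a text editor.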
scripts/run.sh runs the full system: the second-order
dependency parser, linguistic constraints, semantic reranking, and
syntactic–semantic integration. To save time and memory, the
system can also be run in simpler configurations by using one of
the following scripts:
scripts/run_constraints.sh runs a simplified system: the second-order dependency parser and linguistic constraints, but no reranking and no syntactic–semantic integration.
scripts/run_greedy.sh is even simpler: the second-order dependency parser, but no constraints, no reranking, and no syntactic–semantic integration.
scripts/run_greedy_fast.sh uses a first-order parser (which runs in O(n^3)) instead of the second-order parser (O(n^4)).
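If you switch between configurations often, a small wrapper can map a short name to the corresponding script. The function below is a sketch only: parser_script and the mode names full, constraints, greedy, and fast are made up for this example. The script paths are the ones listed above.

```sh
# Hypothetical wrapper: translate a short mode name into a script path.
# The mode names are invented for this example; only the paths come from
# the package.
parser_script() {
    case "$1" in
        full)        echo scripts/run.sh ;;
        constraints) echo scripts/run_constraints.sh ;;
        greedy)      echo scripts/run_greedy.sh ;;
        fast)        echo scripts/run_greedy_fast.sh ;;
        *)           echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

For example, sh "$(parser_script greedy)" < test.tokens > test.output runs the greedy configuration on the preprocessed file.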
The output of all these scripts is in the CoNLL-2008 format.
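The output can be inspected with standard tools. The two helpers below are a sketch, not part of the package. count_sentences relies only on sentences being separated by blank lines. list_predicates additionally assumes the CoNLL-2008 column order, with the word form in column 2 and the predicate in column 11, and "_" for tokens that are not predicates.

```sh
# Hypothetical helpers for looking at a CoNLL-2008 file.

# Count sentences: awk paragraph mode (RS = "") treats each block of
# lines separated by blank lines as one record.
count_sentences() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# Print "form predicate" for every token marked as a predicate.
# Assumption: form is column 2, predicate is column 11, "_" means none.
list_predicates() {
    awk 'NF >= 11 && $11 != "_" { print $2, $11 }' "$1"
}
```

For example, count_sentences test.output prints the number of analyzed sentences, and list_predicates test.output prints each predicate word with its frame label.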