External Resources

This page links to resources that I have found useful in the course of my research. Hopefully you will find them useful too. I mainly code in Java, so the programming APIs listed here are typically for that language.

If you know of any other resources you would like to see posted here, please get in touch!

WordNet: A large semantic thesaurus.
Download here.
Java API for WordNet Searching.

Simple Wikipedia: a large, collaboratively edited, simplified English encyclopedia.
Download XML Source for EN Wikipedia
Download XML Source for Simple Wikipedia
Download for other language Wikipedias
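For anyone wondering what to do with the XML once it is downloaded, here is a minimal sketch of streaming page titles out of a dump with the JDK's StAX parser. It assumes the usual MediaWiki export layout, where each <page> element contains a <title>. The class name DumpTitles and the inline sample fragment are made up for illustration; a real dump would be read from a (decompressed) file instead of a string.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the text of every <title> element in a dump.
final class DumpTitles {

    static List<String> titles(Reader xml) throws XMLStreamException {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            // Stream through the document so the whole dump never sits in memory.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up fragment in the shape of a MediaWiki export, not real dump data.
        String sample = "<mediawiki><page><title>April</title>"
                + "<revision><text>April is the fourth month.</text></revision>"
                + "</page></mediawiki>";
        System.out.println(titles(new StringReader(sample)));
    }
}
```

The same loop can be extended to pick up the <text> element of each page, which holds the raw wiki markup.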

A nice tool for parsing Wikipedia Markup Language into HTML
The HTML can then be converted to raw text using:
The HTML Editor Kit
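
As a rough illustration of the HTML-to-text step, the sketch below assumes that "The HTML Editor Kit" refers to Swing's javax.swing.text.html.HTMLEditorKit, and uses its ParserCallback together with ParserDelegator to keep only the text nodes. The class name HtmlToText and the choice to join text fragments with a single space are my own assumptions, not part of any linked tool.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: strips markup from an HTML string using Swing's parser.
final class HtmlToText {

    static String extract(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The callback is invoked once per text node; tags are simply ignored.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };

        try (Reader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore any charset declared in the HTML.
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

The Swing parser is lenient and only understands fairly old HTML, so this is best suited to the simple markup a wiki-to-HTML converter produces.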

The PWKP Dataset: Aligned complex-simple sentences from Wikipedia/Simple Wikipedia.

The SUBTLEX dataset: A frequency dictionary created from film subtitles.

Aligned Wikipedia Articles: Aligned at both the sentence and document level.

