Tuesday, March 13, 2012

NLP and Installing NLTK

As part of my quest to bring my tech skills up to date, I've signed up for a free course on Natural Language Processing (NLP) run on the Coursera website. It's based on a Stanford University course for CS majors, and if the first week is anything to go by, the quality is high and the content interesting... An important part of the course is the weekly programming assignment, for which we can choose either Java or Python. Given that I already know Java, I've decided to have a go at Python, and so it is that I'm bringing my tech skills up to date by learning a language that was invented last century :)

One of the libraries available for Python is the Natural Language Toolkit (NLTK), which includes various tools to simplify NLP programming. Installing it on Ubuntu was, as always, less than automatic. In theory, you install the library, start a Python console and type "import nltk", and it should load with no errors shown. But...
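The "import nltk" test can also be run as a small script rather than typed into a console, which is handy when you are retrying an install over and over. This is a minimal sketch written for Python 3; `check_module` is a hypothetical helper name, and checking `yaml` (the module name PyYAML installs under) alongside `nltk` is my own addition.

```python
import importlib
import importlib.util


def check_module(name):
    """Return (True, version) if `name` imports cleanly, else (False, reason)."""
    # find_spec returns None when the module is not on the import path at all.
    if importlib.util.find_spec(name) is None:
        return False, "module not found"
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # a half-broken install can raise more than ImportError
        return False, "%s: %s" % (type(exc).__name__, exc)
    # Not every package defines __version__, so fall back to a placeholder.
    return True, getattr(module, "__version__", "unknown version")


if __name__ == "__main__":
    for name in ("yaml", "nltk"):
        ok, detail = check_module(name)
        print("%-5s %s (%s)" % (name, "OK" if ok else "FAILED", detail))
```

A failure is reported as a one-line exception type and message, and the loop carries on to check the next module instead of stopping at the first traceback.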

Things I tried and failed:
  1. Following http://nltk.github.com/install.html: easy_install, pip, and various dependency errors.
  2. sudo apt-get install libyaml-dev
  3. sudo apt-get install python-yaml
  4. Uninstall and reinstall numpy and nltk with pip. No dependency errors, but it still didn't work.
  5. sudo apt-get install python-nltk - apparently installed correctly, but then didn't work.
  6. sudo apt-get remove python-nltk
  7. Repeat several times
  8. sudo pip uninstall nltk
  9. sudo pip uninstall numpy
  10. sudo apt-get remove python-nltk
At this point god knows what crap was installed on my system, and the bloody thing still didn't work. Googling turned up this page: http://www.nltk.org/download, and finally I got the thing installed:
  1. Download PyYAML: http://pyyaml.org/download/pyyaml/PyYAML-3.09.tar.gz
  2. Unpack it into a temporary directory (tar xvf ...)
  3. From the unpacked directory, run: sudo python setup.py install
  4. Download NLTK: http://nltk.googlecode.com/files/nltk-2.0.1rc1.tar.gz
  5. Repeat steps 2 and 3.
Success!
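The manual steps that finally worked can also be scripted. Below is a minimal sketch, assuming the tarballs have already been downloaded (steps 1 and 4); `install_from_tarball` and its `dry_run` flag are hypothetical names of my own, not part of NLTK or PyYAML, and the code uses only Python 3's standard `tarfile` and `subprocess` modules.

```python
import os
import subprocess
import sys
import tarfile
import tempfile


def install_from_tarball(tarball_path, dry_run=True):
    """Unpack a source tarball and run `setup.py install` from its top-level folder.

    With dry_run=True (the default) nothing is installed: the archive is
    unpacked and the command and directory that would be used are returned.
    """
    workdir = tempfile.mkdtemp(prefix="src-install-")

    # Step 2: unpack into a temporary directory (the `tar xvf ...` step).
    with tarfile.open(tarball_path) as tar:  # mode "r" auto-detects gzip
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons offer extraction filters that reject unsafe paths.
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)

    # Source tarballs normally unpack into a single folder, e.g. PyYAML-3.09/.
    entries = os.listdir(workdir)
    if len(entries) != 1:
        raise ValueError("expected one top-level folder, found %r" % entries)
    source_dir = os.path.join(workdir, entries[0])

    # Step 3: `python setup.py install`, using the interpreter running this script.
    # The post ran it under sudo; privilege handling is left to the caller.
    command = [sys.executable, "setup.py", "install"]
    if not dry_run:
        subprocess.check_call(command, cwd=source_dir)
    return command, source_dir
```

For example, `install_from_tarball("PyYAML-3.09.tar.gz")` only unpacks the archive and reports the command; passing `dry_run=False` (with sufficient privileges, as the `sudo` above provided) actually installs it. Step 5 is then the same call on the NLTK tarball. Recent versions of setuptools deprecate running `setup.py install` directly, so on a modern system `pip install` pointed at the unpacked folder is the usual replacement.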

