Tuesday, March 13, 2012

Installing NLTK corpora (NLP)

How to install the stopwords corpus:

1. mkdir /home/jim/nltk_data (obviously change the directory to match your own user!)
2. Fire up a Python console
3. >>>import nltk
4. >>>nltk.download()
5. Downloader> d stopwords

(d for download)
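
The same steps can also be scripted rather than typed into the interactive downloader. This is only a minimal sketch, assuming NLTK is already installed, the machine has network access, and you're on a modern Python 3; the helper names nltk_data_dir and fetch_stopwords are made up for illustration.

```python
import os


def nltk_data_dir(home=None):
    """Return the nltk_data directory under the given home directory."""
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, "nltk_data")


def fetch_stopwords(home=None):
    """Download the stopwords corpus and return the English word list."""
    # Imported here so this file still loads when NLTK isn't installed yet.
    import nltk
    from nltk.corpus import stopwords

    target = nltk_data_dir(home)
    os.makedirs(target, exist_ok=True)  # equivalent of step 1 (mkdir)
    if target not in nltk.data.path:
        nltk.data.path.append(target)  # make sure NLTK searches this directory
    nltk.download("stopwords", download_dir=target)  # equivalent of "d stopwords"
    return stopwords.words("english")


if __name__ == "__main__":
    words = fetch_stopwords()
    print(len(words), "English stopwords installed")
```

Passing download_dir is optional; without it NLTK picks its own default location, which may or may not be the directory created in step 1.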

NLP and Installing NLTK

As part of my quest to bring my tech skills up to date I've signed up for a free course on Natural Language Processing (NLP) run on the Coursera website - it's based on a Stanford University CS Major course, and if the first week is anything to go by the quality is high, and the content interesting... An important part of the course is the weekly programming assignment, and here we have a choice of Java or Python as the language used. Given I already know Java I've decided to have a go at Python, and so it is I'm bringing my tech skills up to date by learning a language that was invented last century :)

One of the libraries Python offers is the Natural Language Toolkit (NLTK), which includes various tools to simplify NLP programming. Installing this library on Ubuntu was, as always, less than automatic. In theory, you install the library, open a Python console and type "import nltk" - this should work with no errors shown. But...

Things I tried and failed:
  1. http://nltk.github.com/install.html - easy_install, pip, and various dependency errors.
  2. sudo apt-get install libyaml-dev
  3. sudo apt-get install python-yaml
  4. Uninstall and reinstall the pip libraries (?) - numpy and nltk. No dependency errors, but still didn't work.
  5. sudo apt-get install python-nltk - installed apparently correctly, but then didn't work.
  6. sudo apt-get remove python-nltk
  7. Repeat various times
  8. sudo pip uninstall nltk
  9. sudo pip uninstall numpy
  10. sudo apt-get remove python-nltk
At this point god knows what crap is installed on my system, but the bloody thing doesn't work. Googling revealed this page: http://www.nltk.org/download - and finally I got the thing installed:
  1. Download PyYAML: http://pyyaml.org/download/pyyaml/PyYAML-3.09.tar.gz
  2. Unzip into a temp directory (tar xvf ...)
  3. sudo python setup.py install
  4. Download NLTK: http://nltk.googlecode.com/files/nltk-2.0.1rc1.tar.gz
  5. Repeat steps 2 and 3.
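
To confirm the install actually took, the "import nltk" check can be scripted. A minimal sketch, assuming PyYAML is importable under its usual module name yaml; the helper name can_import is made up for illustration.

```python
import importlib


def can_import(module_name):
    """Return True if the named module imports without error."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # PyYAML first, since NLTK 2.x was installed on top of it.
    for name in ("yaml", "nltk"):
        status = "ok" if can_import(name) else "MISSING"
        print(name, status)
```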

Saturday, March 03, 2012

Ruby on Rails, Ubuntu style

I've been going through the Ruby on Rails tutorial on my work laptop, but it's hardly ideal: it's not my machine, it's often at work, and unlike my home computer it's a Windows box. So I had the bright idea of installing Oracle VirtualBox and Ubuntu (like at home), and moving the sample app I'm building onto a Dropbox shared folder.

1. Install VirtualBox - painless, and very similar to VMware
2. Install Ubuntu - basically painless, although the CD I was using was 10.10, and it required 2 further updates to bring it up to the latest version. Not sure why Ubuntu couldn't have just skipped to the latest without having to install 11.04 first - I didn't bother to check out the system update options, so this may just have been me being lazy.
3. Install Rails. And here's where things started to go wrong...

Installing Rails on Windows was easy: there's a ready made installer that covers everything, double click, next-next-next and you're done. There's even a cool video that runs you through setting up an account on Engine Yard, Github and so on. It's basically foolproof. On Ubuntu, however, and as with nearly everything else you install on it, you need to do a bit more work.

So, here's a simple how-to for a clean, new Ubuntu 11.10 install.

1. sudo apt-get install ruby1.9.1-full

2. export GEM_HOME=/home/jim/.gem (Here you need to set GEM_HOME to a directory that your user has write permissions for; I used the standard gem directory)

3. gem install rails

4. gem install bundler

EDIT==: after trying to follow these instructions on a different Ubuntu machine, I needed to add the $GEM_HOME/bin directory to my PATH (in bashrc), because for whatever reason the install didn't leave a link to bundle in the /usr/local/bin/ directory... Given the amount of crap I installed and uninstalled when trying this the first time, I'm not overly surprised. == END_EDIT

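Unfortunately, the basic bundle script currently defaults to ruby1.8, so you need to modify the shebang (the first line, starting with #!) of both scripts to point at the 1.9 interpreter instead: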

5. sudo vi /usr/local/bin/bundle (modify the shebang)

6. sudo vi /usr/local/bin/rails (modify the shebang)
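
If you'd rather not edit the two files by hand, the shebang change can be sketched in Python. This is only an illustration: the helper names are made up, and the interpreter path /usr/bin/ruby1.9.1 used in the example call is an assumption, so check where your ruby actually lives first.

```python
def replace_shebang(script_text, interpreter):
    """Return script_text with its first-line shebang pointing at interpreter."""
    lines = script_text.split("\n")
    # Only touch the first line, and only if it really is a shebang.
    if lines and lines[0].startswith("#!"):
        lines[0] = "#!" + interpreter
    return "\n".join(lines)


def fix_script(path, interpreter):
    """Rewrite the shebang of the file at path in place; True if it changed."""
    with open(path) as handle:
        original = handle.read()
    updated = replace_shebang(original, interpreter)
    if updated != original:
        with open(path, "w") as handle:
            handle.write(updated)
    return updated != original


if __name__ == "__main__":
    # Assumed interpreter path; needs to run with sudo to write to /usr/local/bin.
    for script in ("/usr/local/bin/bundle", "/usr/local/bin/rails"):
        changed = fix_script(script, "/usr/bin/ruby1.9.1")
        print(script, "updated" if changed else "unchanged")
```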

If you're about to make a clean, shiny new Rails project you're probably done at this point, but as I wanted to run my sample app I had to do a couple more things to get it running.

I tried to install and fire up my sample app - developed as you'll recall on Windows. While I'd been following the tutorial fairly closely, I'd been forced to make a couple of changes to get the console to work correctly under cmd.exe. Unfortunately, these changes weren't directly compatible with Ubuntu, and I had to go through Gemfile and Gemfile.lock deleting the win32 dependencies. Further, there were a series of other dependencies that failed due to missing libraries:

1. nokogiri dependencies:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev
sudo apt-get install libpq-dev

2. spork dependencies:
sudo apt-get install nodejs

3. Sqlite3:
sudo apt-get install sqlite3
sudo apt-get install libsqlite3-dev

... finally, all was well:

rails s - the app comes up with no problems
bundle exec spork and bundle exec rspec spec - 40 something tests run perfectly, and 0 failures :)
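
For what it's worth, the Gemfile cleanup mentioned earlier (deleting the win32 dependencies) can be roughly sketched in Python. The helper name strip_windows_gems and the marker strings are made up for illustration, and the sample Gemfile lines are hypothetical rather than the ones from my app.

```python
def strip_windows_gems(gemfile_text, markers=("win32", "mingw")):
    """Drop Gemfile lines that mention a Windows-only marker."""
    kept = [
        line
        for line in gemfile_text.splitlines()
        if not any(marker in line.lower() for marker in markers)
    ]
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    # Hypothetical Gemfile contents, for illustration only.
    sample = "gem 'rails'\ngem 'win32console'\ngem 'sqlite3'\n"
    print(strip_windows_gems(sample), end="")
```

This only handles the Gemfile itself; it's probably safer to let bundle install regenerate Gemfile.lock afterwards than to filter the lock file the same way.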