Writing a crawler program for answers.com

Posted by admin on October 5th, 2008 filed in JAVA, crawler

This is a small crawler program I wrote for www.answers.com. I was actually trying to write a GUI-based GRE mentor program, and for that I had already written another simple program to fetch the full GRE word list. With the word list in hand, the next step was to extract the data the GRE mentor needs, which is why I wrote this crawler.

Download: Answers.com crawler

The link above downloads the program; execution starts from GRE.java. The program extracts only selected data from answers.com: the Thesaurus, Dictionary, Idioms, Antonyms and Synonyms sections are the fields I need from that site. Note that the program will not run if your network requires proxy authentication.

While writing crawlers for GRE words, I noticed that most sites use a link pattern of the form “http://sitename/[word]”. The same holds for answers.com, where the pattern is “http://www.answers.com/[word]”, so I have a link for every word. I prepared a word list; the program reads one word at a time from that list and appends it to “http://www.answers.com/”.

Ex: “http://www.answers.com/time”
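The URL-building step described above can be sketched roughly as follows. This is not the code from the download; it is a minimal illustration, and the class name WordUrlBuilder and the methods buildUrl and buildUrls are hypothetical names chosen for this example. It also assumes Java 10 or later (for the URLEncoder overload that takes a Charset) and adds percent-encoding for words containing spaces, which the original program may or may not do.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a word list into answers.com links.
public class WordUrlBuilder {

    // Base address taken from the pattern "http://www.answers.com/[word]".
    static final String BASE_URL = "http://www.answers.com/";

    // Append one word to the base URL.
    static String buildUrl(String word) {
        String trimmed = word.trim();
        // URLEncoder produces form encoding, where a space becomes '+'.
        // A literal '+' is already encoded as %2B, so every remaining '+'
        // stands for a space and can be swapped for %20 in a URL path.
        String encoded = URLEncoder.encode(trimmed, StandardCharsets.UTF_8)
                                   .replace("+", "%20");
        return BASE_URL + encoded;
    }

    // Build one URL per non-blank entry in the word list.
    static List<String> buildUrls(List<String> words) {
        List<String> urls = new ArrayList<>();
        for (String word : words) {
            if (word == null || word.trim().isEmpty()) {
                continue; // skip empty lines in the word list
            }
            urls.add(buildUrl(word));
        }
        return urls;
    }

    public static void main(String[] args) {
        // A tiny stand-in for the real GRE word list.
        List<String> words = Arrays.asList("time", "abate", "zeal");
        for (String url : buildUrls(words)) {
            System.out.println(url);
        }
    }
}
```

In a real run the words would come from the prepared word list file instead of a hard-coded list, and each resulting URL would then be fetched and parsed.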

I am now working on adding images for the words, so that I can remember each word more easily. For this I plan to use www.images.google.com.
