Sunday, January 13, 2008

speling

A client at work keeps insisting that we improve the search results on one of my current projects, and one of the things they want is the nifty Google "spell check"... you know, the thing that asks if you meant to search for something else when you misspell a word?  I once read an article on how it works: behind the scenes, it checks whether searching for a similar word would return significantly more results.  The difficulty is in finding those "other words".  I found two ways around this.  The first is that Google actually makes the spell check available through a REST interface, and you don't even need an API key to use it.  Here is some code that will make it work:


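import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Post {

    public static void main(String[] args) throws Exception {
        // XML request body; the text to check goes inside the <text> element.
        String data = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><spellrequest textalreadyclipped=\"0\" ignoredups=\"0\" ignoredigits=\"1\" ignoreallcaps=\"1\"><text>I lik pzza</text></spellrequest>";
        URL url = new URL("https://www.google.com/tbproxy/spell?lang=en&hl=en");
        URLConnection connection = url.openConnection();
        connection.setDoOutput(true); // sending a body makes this a POST

        // Write the request as UTF-8 so the bytes match the XML declaration.
        try (Writer writer = new OutputStreamWriter(connection.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(data);
        }

        // Print the response line by line (assuming it comes back as UTF-8 too).
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}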

My worry with using Google is that they might decide they don't like me querying that service every time a user does a search, and it's very likely against their terms of service.  But lo, dear reader, I found another method of generating those "other words".  It's in this great article by Peter Norvig, who happens to be a director of research at Google.  So thanks, Peter, I never could have done it without you.
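
For the curious, here is a rough sketch of that idea in Java, as I understand it from Norvig's article: generate every string that is one edit away from the search term (a deleted letter, two swapped letters, a replaced letter, or an inserted letter), then suggest the candidate that would return far more results than the original. The class name Edits, the suggest method, the resultCounts map, and the factor threshold are all names I made up for illustration; in a real project the counts would come from your own search index, and this is not Norvig's actual code.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class Edits {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // Every string one edit away from word: deletes, transposes, replaces, inserts.
    static Set<String> edits1(String word) {
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                result.add(left + right.substring(1)); // delete one letter
            }
            if (right.length() > 1) {
                // swap two adjacent letters
                result.add(left + right.charAt(1) + right.charAt(0) + right.substring(2));
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!right.isEmpty()) {
                    result.add(left + c + right.substring(1)); // replace one letter
                }
                result.add(left + c + right); // insert one letter
            }
        }
        return result;
    }

    // Hypothetical helper: resultCounts stands in for "how many hits does this
    // term get in my search index". Suggests the most popular candidate, but only
    // if it has more than factor times as many results as the original word.
    static Optional<String> suggest(String word, Map<String, Integer> resultCounts, int factor) {
        int own = resultCounts.getOrDefault(word, 0);
        String best = null;
        int bestCount = 0;
        for (String candidate : edits1(word)) {
            if (candidate.equals(word)) {
                continue; // replacing a letter with itself gives the word back
            }
            int count = resultCounts.getOrDefault(candidate, 0);
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        boolean muchBetter = best != null && bestCount > (long) own * factor;
        return muchBetter ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>(); // made-up numbers
        counts.put("pizza", 1200);
        counts.put("pzza", 3);
        System.out.println(suggest("pzza", counts, 10).orElse("(no suggestion)"));
    }
}
```

With those made-up counts, a search for "pzza" gets "pizza" offered as the "did you mean" suggestion, while a search for "pizza" gets none. A word of length n produces 54n + 25 edits before duplicates are removed, so looking each one up in an in-memory map is cheap.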
