Friday, June 05, 2009

Google squared

Matt Hurst published results today from his own experiments with Google^2. It's interesting to compare Google^2 to SEAL, which has more limited goals (it doesn't try to find attributes). From my trials, I don't think either one is clearly superior, but SEAL seems to do quite well by comparison.

Matt's first query, "small boats", gives mediocre results for Google^2, but SEAL gives

which is quite decent (modulo kayak and kayaks being the same).  "Sailing boats" gives spottier results in SEAL:

SEAL's result for "netbooks" is great, except "linux" creeps in somewhere in the top 30 results...

"Scottish regions" gives mostly cities in SEAL (like in Google^2), to the extent that I can recognize what it gives.  Matt's next queries are:

Test 5: ‘british overseas territories’ – SEAL gives a very reasonable list, as does Google^2, according to Matt.

Test 6: ‘plants native to the pacific northwest’ – Matt rates Google^2 as "very poor {gardening, botany, Hardcover, Your account, Add your first tag, Share your own customer images, paperback}". SEAL starts with "Trees, Plant nursery, Washington" and then goes downhill... but it at least knows that it didn't find much and tells you that the list has low confidence.

Test 7: ‘novels of o’brian’ – excellent results from Google^2, and mediocre low-confidence results from SEAL.

Test 8: ‘jedi masters’ – Matt's comment for Google^2 is "oops {Obi-Wan kenobi, Star Wars: The Clone Wars, Mace Windu, Star Wars Episode VI:return of the jedi, Jedi Temple, Kit Fisto, Younglings}". SEAL gives

# Found Items
1 Mace Windu
2 Obi-Wan Kenobi
3 Luke Skywalker
5 Qui-Gon Jinn
7 Shaak Ti
8 Jedi Council
9 Plo Koon
10 Saesee Tiin

Nice, except for #8, but why is Mace Windu first? Still, this is pretty good.

Test 9: ‘movies of brad pitt’ – Google^2 gets "not bad, a reasonable list of movies with Director and Language fields. Note that it doesn’t attempt to normalize ‘USA’ and ‘United States’ suggesting that the system uses very superficial representations internally."  SEAL gets poor results, with a low-confidence warning.

Test 10: ‘progressive rock bands’ – Google^2 gets a "fail" from Matt.  SEAL gets

# Found Items
2 Pink Floyd
3 King Crimson
6 Jethro Tull
7 Gentle Giant
8 Dream Theater
10 Van Der Graaf Generator
13 The Beatles
14 Porcupine Tree
15 Procol Harum
17 Frank Zappa
18 Led Zeppelin
20 Art rock
21 The Moody Blues
22 Deep Purple
24 Tangerine Dream
25 The Mars Volta
26 Punk rock
27 Progressive metal
28 Classical music
29 The Wall
30 Symphonic rock



Richie said...

If you just slightly rephrase the queries to SEAL for test 7 (to "O'Brian's novels") and for test 9 (to "Brad Pitt movies"), you'll get better results.

Test 9: Brad Pitt movies

1 Fight Club
2 Troy
3 Snatch
4 Legends of the Fall
5 Seven
6 Meet Joe Black
7 Babel
8 Seven Years in Tibet
9 Burn After Reading
10 Spy Game
11 The Curious Case of Benjamin Button
12 Sleepers
13 True Romance
14 Interview with the Vampire

William Cohen said...

Ah, but re-phrasing queries is a whole 'nother research topic...
