Friday, June 29, 2007

Research Publicity

Last year my student Vitor Carvalho got the clever idea of using machine learning methods to detect when an email is mistakenly sent to the wrong person (an "email leak"). A few months after our publication of some results on this, the idea's getting a little publicity:

Search-term Polarity

Via Farber's "Interesting People" list: Lauren Weinstein has a nice discussion of a well-known bug or feature of Google's ranking method - namely, if you search for "Jew" the top-ranked page is, well, sort of uncomplimentary to us Members of the Tribe. Google's analysis is that "Jew" is used more in an "anti-Semitic context".

Lauren's comments here are interesting, and raise some nice questions. If my search term indicates I'm an anti-Semite, should I get a page that's ranked highly by other anti-Semites, or one that's ranked highly by the (hopefully) larger general community? What if my search term indicates I'm a creationist? A disbeliever in global warming? Arguably there's a continuum between Google-bombing (i.e., manipulation of search results by a small group) and just exploiting linguistic regularities of a subcommunity to give better search results.

Wednesday, June 20, 2007

Dangers of eating lunch at home

Durn, I missed CMU's "Dynamic Balance Festival" and a chance to try out an extreme pogo stick.

Tuesday, June 19, 2007

AT&T Boing-Boinged

My old employer AT&T makes the most-read blog, Boing Boing.

But even at $10/month, AT&T DSL should be avoided like the plague. These are the scumbags who illegally wiretapped the entire Internet for the NSA, who broke net-neutrality to find "copyright infringements", and who inspired NBC to call for a law requiring all ISPs to do the same (imagine -- a law forbidding network neutrality!). Seriously: the only day I wouldn't piss on AT&T is if they were on fire.
This isn't politics, btw - it's technology.

Monday, June 18, 2007

The Shape of Things to Come?

The book The Long Tail is about the way the market for "soft" goods (like music) has changed with the internet. The last chapter discussed some interesting new technologies, including 3D printers - with speculation that they will ultimately force the same sort of changes in the market for real, physical goods. Right now 3D printers are expensive and slow, but they are already making an impact in certain niches--for instance, a talk at CMU a few years ago made a great case for using them to print 3D models of proteins. (Believe it or not, holding one for a minute or two beats any CHIME visualization.) A good friend of mine recently blogged a visit to Desktop Factory, which makes 3D printers, and posted a bunch of interesting images. Look at the bones, man!

Friday, June 15, 2007

Symbol grounding and relations

I've been reading a number of Peter Turney's papers and I've lately been catching up on his blog, which has a number of really interesting posts related to his work - and remarkably related to the things I've been pondering over the last few months.

For instance: machine learning spent years learning how to recognize classes of objects by their attributes; a popular topic now is collective classification, or recognizing the class of an object (in part) by considering how that object is related to other objects. Are attributes only a "convenient fiction" - a useful abstraction that ultimately must be defined in terms of relations? Is an apple intrinsically red, or is "redness" something that describes the interaction of the apple and the sensory system of the observer? Likewise, is every attribute properly a description of an object and some sensory system or measurement instrument?
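
To make the attributes-versus-relations contrast concrete, here is a toy sketch of one simple flavor of collective classification: each object starts with a score computed from its attributes alone, and that score is then repeatedly mixed with the average score of the objects it is related to. The function name, the 50/50 mixing weight, and the data are all made up for illustration; this is not a specific published algorithm.

```python
def collective_classify(local_scores, edges, weight=0.5, rounds=10):
    """Toy relational smoothing (an illustrative assumption, not a published method).

    local_scores: dict mapping node -> score in [0, 1] from attributes alone
    edges: iterable of (node, node) pairs, treated as undirected relations
    weight: how much the neighbors count relative to the node's own attributes
    """
    neighbors = {node: set() for node in local_scores}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    scores = dict(local_scores)
    for _ in range(rounds):
        updated = {}
        for node, local in local_scores.items():
            if neighbors[node]:
                # Average of the neighbors' scores from the previous round.
                relational = sum(scores[m] for m in neighbors[node]) / len(neighbors[node])
                updated[node] = (1 - weight) * local + weight * relational
            else:
                updated[node] = local
        scores = updated

    # A node is labeled positive when its final mixed score reaches 0.5.
    labels = {node: score >= 0.5 for node, score in scores.items()}
    return labels, scores


# Hypothetical data: "c" looks slightly negative on its attributes alone,
# but it is related to two confidently positive objects.
attribute_scores = {"a": 0.9, "b": 0.8, "c": 0.45, "d": 0.1, "e": 0.55}
relations = [("a", "c"), ("b", "c"), ("d", "e")]
labels, scores = collective_classify(attribute_scores, relations)
```

In this made-up example "c" would be classified negative from its attributes alone, but its relations pull it over to the positive side, while "e" goes the other way because of its link to "d".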

This seems like a rather strange and abstract question, but it's intimately connected with the "symbol grounding problem", the subject of another of Peter's posts, which in turn is connected with my long-standing interest in data integration - a very practical real-world problem. To combine data from two heterogeneous knowledge bases is, properly speaking, impossible to do automatically: they are different formal systems, and there is no surefire way to translate between them. There is no common ground. By the same token, communication between two people is also impossible. How do we know that what I mean by "red" is the same as what you mean?

The solution to the problem, for human communication, appears to be that language is grounded - in part by common perceptual systems. (Goleman's book Social Intelligence is a nice description of some of the ingenious mechanisms that have evolved for establishing this common grounding.) The effect is that I don't really know that what I mean by "fear" is the same as what you mean; and in fact, it may not be the same. But if we're both neurologically typical it is almost certainly highly similar to what you mean.

Back to attributes and relations - I'm not sure, but I think the apparent primacy of relations starts to emerge when you start thinking about these issues. Everything is ultimately defined in terms of relationships with other things, which are finally grounded in our own perceptions--those few things that we don't need words to explain or understand.

Wednesday, June 06, 2007

Is it real or is it demoware?

From Lynn Monson. These guys have so been watching Minority Report.

Google Streetview

There's lots and lots of talk about Google Streetview, including quite a lot of gosh-what-about-privacy? postings: e.g., there's a rundown of comments in BoingBoing today. One that I resonate with is:

What is the difference between posting a picture of people on a public street on Google Street View or on Flickr?

Obviously - the information is the same, only the information access is different. There's no difference in kind of information - just a qualitatively easier interface to getting it. Anyone can take a plane and a taxi and take a picture of my house - but I don't expect that anyone will bother. So it's jarring to find out suddenly that anyone with a DSL line and 30 sec to spare can get the same effect.

Our expectations of privacy (and privacy laws) are driven by what's easy, and what's likely, not what's theoretically possible.

My personal take on this - it's another great example of how smoothly information can be integrated together, when it's all grounded in the physical world. Entries from maps + business listings + satimages + street view are relatively easy to search together and visualize simultaneously, since they're all tied together by physical location in space.
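
A rough sketch of why physical grounding makes integration easy: if every record carries a latitude and longitude, records from unrelated sources can be lined up just by bucketing them into the same grid cell, with no schema matching at all. Everything below (the function names, the 0.001-degree cell size, the sample records and coordinates) is a made-up illustration, not how Google actually does it.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size=0.001):
    """Map a coordinate to a grid cell; 0.001 degrees is an assumed resolution."""
    return (math.floor(lat / size), math.floor(lon / size))


def integrate_by_location(sources, size=0.001):
    """Group records from several sources by the grid cell they fall in.

    sources: dict mapping source name -> list of (lat, lon, payload) tuples
    returns: dict mapping cell -> {source name: [payloads in that cell]}
    """
    merged = defaultdict(lambda: defaultdict(list))
    for name, records in sources.items():
        for lat, lon, payload in records:
            merged[grid_cell(lat, lon, size)][name].append(payload)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {cell: dict(by_source) for cell, by_source in merged.items()}


# Hypothetical records from three unrelated sources; coordinates are invented.
sources = {
    "map": [(40.44335, -79.94285, "Forbes Ave")],
    "business": [(40.44341, -79.94279, "coffee shop"),
                 (40.45052, -79.95037, "bookstore")],
    "streetview": [(40.44338, -79.94281, "photo_001.jpg")],
}
merged = integrate_by_location(sources)
here = merged[grid_cell(40.44335, -79.94285)]
```

Here `here` collects the street name, the coffee shop, and the photo under one key, even though the three sources share nothing but location. A real system would also have to handle points that fall just across a cell boundary from each other, which this sketch ignores.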

New Look and New Resolutions

I'm back to blogging for a while, but I'm staying away from politics. It's just way way too much of a time sink.