Wednesday, January 30, 2008

Measure twice, cut once

My students Vitor and Ramnath have developed a Thunderbird plugin that implements recipient recommendation and leak detection for email. It modifies Thunderbird by adding a pane that pops up after you hit send, giving you one final chance to fix any errors in your recipient list.

The plugin is called Cut Once - the name comes from a phrase used repeatedly by those weird Rastafarian guys in Neuromancer. There's a brief writeup on how to use it, but it's pretty self-explanatory: just download it, open Thunderbird, and go to the tools->addon menu to install. After you've installed it, you train by opening your folder of "Sent" mail and pressing the "train" button. (This took about an hour for my 9000+ old messages, which is pretty good for something written in JavaScript.)

Any comments/feedback are appreciated. Or, if messing with your email client is too extreme for you, you could just read Vitor's ECIR 2008 paper, which is the latest one to come out of the review tunnel.

Saturday, January 19, 2008

Personal control of personal information

Recently I've run into a number of posts discussing privacy from a different perspective - noting that while you, as a consumer/citizen, have limited control over where and how information about you is kept, governments and businesses often do try to control information about their own activities.

For instance, Glenn Reynolds recently wrote in Popular Mechanics:
...government officials and big corporations often want to watch us, but they don't want to be watched in return. Shopping malls are full of security cameras, but many have signs at the entrance telling customers that no photography or video recording is allowed. Police cars have dashboard cameras... But try shooting photos or video of police or other public officials as they go about their business and you might find yourself in wrist restraints. ... Under the law, citizens have no right not to be photographed in public places. So why should people who make their living on the taxpayers' dime enjoy greater freedom from public scrutiny than the taxpayers themselves?
An article with a similar tone also appeared in Computer World, which gave a link to a compelling example of how a citizen's records of police actions were evidence of police wrongdoing (NYT headline: "Recorded on a Suspect’s Hidden MP3 Player, a Bronx Detective Faces 12 Perjury Charges"). The author's summary of the situation:
...surveillance in general ... upsets the balance of power. Whoever has the tape has the power to use, not use, selectively use or misuse the information or proof or evidence recorded.
This is an interesting line of argument. There's something that seems fundamentally wrong about living in a society where surveillance is "endemic", but maybe the most jarring thing is not the loss of privacy, but the loss of power and control.

With that in mind, there's a very interesting move afoot to set up an open standard to describe user "attention" data - which I gather mostly means browsing history, but could certainly also include any other information about what a user is interested in... Netflix reviews, Amazon purchases, search queries, you name it. The hope is to break this data away from the many different sites (each of which controls a piece of it) and put it in the hands of users, who can then go and get purchase recommendations (or what have you) from whoever does the best job.

Not an entirely new idea, but a fascinating one nonetheless. Moore's law, along with progress in collaborative filtering/machine learning techniques, means that the barrier to being able to save this sort of data and do something interesting with it is just going to keep dropping, so one can certainly imagine a more horizontal market for recommendations opening up over the next few years. I don't see any particularly horrible technical roadblocks, but I do see a lot of interesting technical problems (e.g. reference resolution over this data!) and of course there might be pushback from the people who control the data now.

Update: There's some discussion of this from Fernando.

Thursday, January 10, 2008

Wikis at school

In Ars Technica, last October, I saw a story about a professor who assigned her students the task of writing a Wikipedia entry. I went one better, I think... Natalie Glance and I were teaching a seminar in "Analysis of Social Media" this fall, and I assigned the class the task of building a wiki on the subject. During the class I limited wiki access to class members, but it's now open to everyone to read or edit.

It was an interesting experiment. It was only a little extra work in grading and coordinating, but it was worth it for the irony factor alone. The students were mostly positive about it.

The principal content of the wiki is a bunch of paper summaries, not unlike what students would have turned in for a class, but some students came up with some other ideas for contributions. The main change I'd make if I did this again would be switching wiki providers. (Cheapskate that I am, I used a free wiki farm called scribblewiki. They were very helpful early on, and even upgraded me to a "paid" account for free, but took a long time with some other requests later - and never did add support for LaTeX math.) It's not obvious what the best way is to use a wiki when teaching a course for the (n+1)th time, but it's a fun twist for the first time you teach a seminar.