Cranial Darwinism: July 2007

Tuesday, July 31, 2007

Flyer's Rights

You gotta love anything called a bill of rights for airplane passengers. How do I get them to add the right not to be sold credit cards while on the plane?

A paper emerges from the tunnel of anonymous review

Overall I'm in favor of anonymous review, but one of its many annoyances is that during the review process I always feel reluctant to advertise. But now...

Richard's paper describing his "Set Expander in Any Language" system was accepted to ICDM. Of all my students Richard's the only one who really enjoys building systems...so if you don't have time to read the (preliminary version of the) paper you can play with the demo.

Friday, July 27, 2007

Deep packet inspection meets 'Net neutrality, CALEA: Page 1

There's a very interesting, in-depth discussion of "deep packet inspection" (DPI), and some of its implications, on Ars Technica. DPI means diving into the packets flowing through an ISP and opening them up to inspect the content - e.g., there are commercial tools to identify the type of traffic (virus vs YouTube video vs iTunes download vs chat vs email vs ...). "Flow analysis" is assembling packets together (e.g., to reconstruct an email message), and that's also commercialized. DPI products that are "CALEA-compliant" can collect and offload a user's datastream (CALEA is the "Communications Assistance for Law Enforcement Act") - usually this stuff is farmed out by an ISP to specialists. Once packets (or flows) are classified it's also possible to impose rules - e.g., squash viruses, eliminate denial-of-service attacks, disallow on-line games for non-premium users, or slow down traffic from, say, YouTube to a crawl unless Google pays up a designated fee.
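To make "flow analysis" a bit more concrete, here's a toy sketch of the basic idea: group packets by flow and stitch the payloads back together in order. The (flow_key, seq, payload) packet format and the function name are my own assumptions for illustration; real products have to deal with TCP state, retransmissions, and a lot more.

```python
from collections import defaultdict


def reassemble_flows(packets):
    """Group packets by flow and join their payloads in sequence order.

    Each packet is a hypothetical (flow_key, seq, payload) tuple, where
    flow_key identifies the connection (e.g., source, destination, port).
    This is an illustrative simplification, not how any real DPI tool works.
    """
    flows = defaultdict(list)
    for flow_key, seq, payload in packets:
        flows[flow_key].append((seq, payload))

    reassembled = {}
    for flow_key, parts in flows.items():
        # Packets may arrive out of order, so sort by sequence number.
        ordered = sorted(parts, key=lambda part: part[0])
        reassembled[flow_key] = b"".join(payload for _, payload in ordered)
    return reassembled


if __name__ == "__main__":
    mail = ("10.0.0.1", "10.0.0.2", 25)
    web = ("10.0.0.1", "10.0.0.3", 80)
    packets = [
        (mail, 2, b"world"),
        (web, 1, b"GET /"),
        (mail, 1, b"hello "),
    ]
    for key, data in reassemble_flows(packets).items():
        print(key, data)
```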

According to the article, current DPI systems classify packets using signature-based methods, much like anti-virus systems do. This makes a lot of sense if you're only interested in Personally I'm surprised that machine learning isn't used in this step yet - but I suspect that this will happen before long.
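Signature-based classification, in its simplest form, can be sketched roughly like this: check each payload against a list of known byte patterns and return the first label that matches. The signature table and the classify function below are made up for illustration (they're not taken from the article or from any real DPI product), though the patterns themselves are the usual opening bytes of those protocols.

```python
# Hypothetical signature table: (label, byte pattern expected at the start
# of the payload). Real systems use far larger and more elaborate rule sets.
SIGNATURES = [
    ("http", b"GET "),
    ("smtp", b"EHLO "),
    ("bittorrent", b"\x13BitTorrent protocol"),
]


def classify(payload):
    """Return the label of the first signature the payload starts with."""
    for label, pattern in SIGNATURES:
        if payload.startswith(pattern):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify(b"GET /index.html HTTP/1.1\r\n"))
    print(classify(b"EHLO mail.example.com\r\n"))
    print(classify(b"some encrypted blob"))
```

The weakness is visible in the last case: anything that doesn't match a known pattern falls through to "unknown", which is exactly where a learned classifier might help.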

Thursday, July 26, 2007

CBC.ca Arts - Queen's Brian May to complete astrophysics doctorate

Brian May, the 60-year-old former Queen guitarist, just submitted his doctoral thesis in astrophysics on "Radial Velocities in the Zodiacal Dust Cloud" (Imperial College, London). When I retire I think I'll become a rock star.

Data Mining: Text Mining, Visualization and Social Media: LinguisticAgents

Matt Hurst has been spying out the action at AAAI. Today's post on LinguisticAgents (an Israeli company) is interesting. From a quick read, NanoSyntax is combining morphology with syntax - which makes loads and loads of sense in Hebrew, certainly, and similar methods have been effective in other languages - Klein et al. have a very nice paper from CoNLL a few years back on character-level models for NER, to give one example.

I always find it amusin', though, how differently industry types and academics pitch their intellectual wares. AI companies are so often based on transformational revolutionary brand-new ideas (if you believe the white papers), whereas most of us longhair university folks are plugging away with incremental improvements to the big idea from, say, three years ago. Does that seem backwards to anyone else?

What are you willing to do - for science?

Of course you trust your ISP, but next time you're using a friend's, visit http://vancouver.cs.washington.edu

It's, you know, a good cause.

Wednesday, July 25, 2007

It's a sign!

While reading the blogs on my front porch, my laptop ran out of power and hibernated while I was reading this. (Discovered from Boingboing).

Monday, July 23, 2007

More email leaks

For those who are interested in information diffusion processes: I got an update from Vitor on publicity on our paper on "detecting email leaks" - there have been a dozen or so followups to the news stories I mentioned back in June. Interestingly all of these are in Portuguese:

Blogs:
http://alexronald.wordpress.com/2007/06/30/brasileiro-cria-sistema-que-evita-envio-de-e-mail-a-destinatario-errado/
http://idgnow.uol.com.br/seguranca/2007/06/29/idgnoticia.2007-06-29.1623119240
http://bloginfodicas.blogspot.com/2007/06/nesta-ltima-dcada-perodo-em-que-os.html
http://www.rogergoncalves.com/blog/?p=21
http://www.piratadarede.com/blog/?p=1031
http://ouhe.blogspot.com/2007/06/novo-sistema-evita-envio-de-e-mail-para.html
http://videobr.pro.br/forum/viewtopic.php?p=37696&sid=e510424e3ba94063a467a5e5e5970cd3
http://inovabrasil.blogspot.com/2007/07/novo-mtodo-previne-vazamento-de.html

http://mundoetecnologia.wordpress.com/2007/06/29/brasileiro-cria-sistema-que-evita-envio-de-e-mail-a-destinatario-errado/

Research, high-tech and general news websites:

http://www.acessasp.sp.gov.br/html/modules/news2/article.php?storyid=31
http://www.agencia.fapesp.br/boletim_dentro.php?data%5Bid_materia_boletim%5D=7362
http://g1.globo.com/Noticias/Tecnologia/0,,MUL61241-6174,00.html
http://g1.globo.com/Noticias/Tecnologia/0,,MUL34417-6174,00.html
http://www.adn-negocios.com/adn/index.php?option=com_content&task=view&id=38&Itemid=1
http://pcworld.uol.com.br/noticias/2007/06/29/idgnoticia.2007-06-29.3833885183
http://pe360graus.globo.com/noticias360/matLer.asp?newsId=94336

At least 30 comments on a discussion board, asking the question: "Have you ever sent an email to the wrong person?"
http://g1.globo.com/Noticias/Tecnologia/0,,MUL34362-6174,00.html

And some news in Portugal, pretty much echoing the news from Brazil:
http://diariodigital.sapo.pt/news.asp?section_id=44&id_news=283432
http://www.wintech.com.pt/index2.php?option=com_content&do_pdf=1&id=380

I guess there's some element of "local news" with these, since Vitor's Brazilian, but there's nothing particular about the content of the story that seems especially Brazilian - it's just a technique to avoid a class of email-related errors (and was evaluated, in fact, on the Enron corpus, which is all English). It's interesting that language is as much of a barrier as it appears to be for the spread of high-tech news.

If you don't own an iPhone yet...

It got easier. Penetrated, completely compromised; the hack validated by Steve Bellovin and Avi Rubin; published in the NYT; and even slashdotted. On the plus side the guy that hacked it is a former employee of the NSA.

Monday, July 16, 2007

Is this the world's most frequently parodied song yet?

Chortle. (From Lauren Weinstein. Posted, naturally, from my Google-owned blogger account.)

Friday, July 13, 2007

The laughter curve

Ok, maybe this is politics...but it's also a great example of really bad data analysis. Maybe there is a Laffer curve, but you sure have to look hard to see it here. I wonder how many economists would choose the line in the top graph over the one in the bottom graph, if all they saw were the points, without any labels?

Monday, July 09, 2007

Sad news of the day

Don Michie and his wife were killed in a car crash.

Sunday, July 08, 2007

Personalization and polarity

In the more than 20 years I've been studying AI I've discovered that every decent knife has two edges. Even those knives that seem like really, really cool ideas when you first grok them. How could giving everyone a free editorial column be a bad thing? And how could improving access to all that new content not be beneficial?

Anyway, I can't resist responding to Fernando's response to Matt's response to my response to Lauren Weinstein's posting (are you following all this?) on search-term polarity.

The original post by Lauren Weinstein that triggered this thread was about the visible global impact of search rankings, but William's discussion suggests a less global but possibly more powerful effect in search personalization, of whether a personalization algorithm could become a strong reinforcer of prejudice without the counter-pressure of critical discussion of globally visible search ranking.

Here's an even broader suggestion: could just having more and better access to more and more diverse content have the same effect - i.e., is the growing blog world "a strong reinforcer of prejudice without the counter-pressure of critical discussion of globally visible" content? It's certainly easy enough to fill time reading political commentary that you can be 99% sure you'll agree with - and look how bitterly partisan the country has become, and how little is now universally accepted as correct.

Maybe Matt or Fernando knows whether anyone's ever looked into whether the effect I'm speculating about is real - and if it is, what could we scientists do to create the appropriate "counter-pressure"? Ideas, anyone?

Friday, July 06, 2007

iShouldn't have wondered about security

According to RixStep, the iPhone includes, among its many OS X-based cool features, a root password of 'dottie'.

CKAN

Here's something interesting - I wonder how far it will go. Wikipedia's been so successful that a lot of people seem to be trying to take a further step in that direction. So far, Freebase is my favorite project of that sort - they have a very precise, clear vision that's easy to convey (if not accomplish).

Thursday, July 05, 2007

iBuzz

According to Matt Hurst the iPhone buzz has peaked, which means that I'll be fashionably late with my contribution...

I don't expect to get an iPhone anytime soon - I'm still perfectly happy with my ancient Samsung i500. But I'm amazed that Apple decided to ~~jump in bed~~ lock in to using AT&T as a provider. (Apparently the first day the iPhone was released AT&T Edge had widespread network outages, affecting non-iPhone users as well as iPhone users.) And I'm thrilled that Apple decided on a closed platform - it will be way amusing to follow the inevitable opening of the iPhone by hackers. The checklist below (from IP) is already yesterday's news:

Break DMG Password *COMPLETE*
Break Activation *COMPLETE*
Unlock Phone
Run Third Party Applications
Allow DUN/Tethering
Remove IMEI Transmitting
Enable Disk Mode

Yes, it would be cool to be able to use an iPhone on whatever network you want - but there's a socially interesting issue here also. My smartphone holds my information, some of which is potentially quite private (e.g., what doctors do I go to? when am I out of my house?), and which we know is not always very secure. I think I know how and where my Palm/phone device keeps this information, but on a networked device with a closed architecture, what guarantees do you have? We know how careful Apple has been with DRM in iTunes - it'll be interesting to see how much effort Apple has put into locking up its customers' information.
