Sunday, May 18, 2008

Upcoming gig

I'll be playing (mostly mandolin) Tuesday night in Braddock with the band Smokestack Lightenin' at a fundraiser for Unseam’d Shakespeare Company’s Out of This Furnace project. The headliner is Anne Feeney.

Where: Elks Lodge #883, 424 Library Street, Braddock, Pennsylvania
When: Tuesday, May 20, 2008, 7:00 PM to 11:00 PM
Tickets: $25 (or pay what you can), available at the door (cash/check only)
Contact: Tim Dawson, 412-621-0244 for directions and information

Friday, May 16, 2008

We have violated the prime directive

Noah Smith and I are co-supervising Tae Yano on a project involving analysis of political blogs, and Tae left a pile of results and code on her CMU web site as a way of communicating with us... world-readable. Surprisingly, someone at one of the blogs she spidered, Little Green Footballs, actually noticed, which led to a lot of investigative work in this fascinating thread:
Anyone know what this page at Carnegie Mellon means? It’s some kind of experiment that involves comments posted at LGF, and I have a feeling it’s not friendly.
...
#5: Maybe it's post-modern poetry, academia-style? Using LGF comments as gibberish to transcend interpretation?
...
#18: Obviously it's a KGB plot.
...
#12: there's a Kos directory too. Probably just two of what they consider to be partisan sites to test some kind of language algorithms or something
...
#20: I think they are doing a text analysis looking for Charles' sock puppets -- like the way some trolls accuse Robert Spencer of posting as Hugh
...
#23: Looks like they are doing another election-year study looking at patterns of argument or other aspects of communication.
...
#47: Who ever thought that garbage labeled as "research" could be so sophisticated and at the same time be so worthless. ... Polishing farts to get Piled higher and Deeper has become, for the most part, an ephemeral exercise in self deception. [ouch! sounds like my last grant review].
...
#48: This isn't that advanced of stuff - we are working on a similar concept for internal corporate communications. This has been around a while.
...
#60: EXPERIMENTATION WITHOUT REPRESENTATION! I demand compensation!
...
#61: I wonder if linking to their experiment will introduce fatal feedback loops, leading to a disturbance in the Krell mind-field and possible space-time discontinuity?
...
#79: It looks to my un-Pythonesque eye like a programmer is using the cosine function to create some kind of mapping on how comments refer to one another based on comment location in the thread; and looking at different types of postings from Charles (open thread, breaking news, news outlet criticisms) to see if there are any trends in the discussions that vary according to the posting type. That has to be one of the worst run-on sentences I've ever written.
...
#92: [#64: "What is port sniffing?"] after you pour some in the glass, but before you take a sip, swirl it around a bit and inhale whist the dark chocolate is melting on your tongue......
...
#162: Ahh, so basically it's a merger between post-Marxist deconstructionism and THX-1138-esque Orwellian personality dehumanization.
...
#172: I liked the method of withdrawing all posts by 204 commenters, then asking the system to predict what those posts were (or *if* the post existed, I think) based on all the rest.
...
#187: ok here's how I see it. There is a whole group at Carnegie Mellon CS department doing research in Artificial Intelligence by running statistics on blogs. A grad student Tae Yano stashed her research files on a server and left them wide open to WWW. The files pertain to running basic stats on DK and LGF, up to cosine similarity. The AI goal would be to have a blog-commenting software indistinguishable from a blog-commenting human. For all I know, this, or any other comment, is already written by a python script.
...
#277: [Quoted from a grant proposal Noah & I wrote that got unearthed, along with various other papers, pictures of Noah's cats, Tae's CV, her picture, her programming project on knitting, her wedding announcement, etc.:] "Political text is often indirect, sarcastic, repetitive, hyperbolic, emotional, biased, manipulative, and riddled with unstated assumptions." - Byte me.
...
#314: So some students are doing research on blogs. No big deal. It was interesting to see what it was about, but that's about it. I don't think they feel the need to protect their files. Why should they? Who would want to mess with that? It's just a school project, I don't see point of "exposing" the students and publicizing their information. At least that's my take.
...
#316: I noticed that the research group to which she belongs is partially funded by a DARPA grant.
...
#361: And the same techniques/analyses will probably work on Arabic-language websites, too.
...
#428: Are terrorists ‘phone banking’ for Barack?
...
#474: Imagine a natural language program that could respond to comments with charm and style, sort of a robo-blogger. Now imagine an army of them, all set to monitor a different political blog, run by a campaign manager for a politician. Add to its writing ability an encyclopedic memory, with instant access to famous quotes, historical facts, trivia, statistics, and every word ever uttered by the opposition. You now have an army of ultimate bloggers, all completely under the control of one campaign manager... no more "going off message" by some underpaid/volunteer lackey, just high quality counter-opinion, ready to be inserted into the blogs of anyone who disagrees with your candidate. This research will eventually lead to robo-blogging to kill emerging scandals and alternative opinions on issues... no more Rathergates as they will be smothered in the cradle by the most charming bloggers around -- the poli-bots.
...
#482: She's trying for a data-mining tool tailored for blogs that separates "useful, thoughtful" information from all the mindless dreck in the blogosphere. Lotsa luck with that! As an aside, I've been on the Carnegie Mellon campus, toured the Computer Science department, and met with CS faculty. It's a gorgeous campus. The school clearly has big bucks. CMU holds numerous contracts with various government agencies related to the information technology aspects of defense, computer security, homeland security, and similar "black ops" topics. At least some people on that campus have intimate access to NSA, DOD, and CIA. It's a spooky place.
...
#489: We should, for amusement's sake, keep an eye open for a KOS diary about this. There may be some entertaining histrionics and conniption fits over their being the unwitting subject of DARPA funded research.
...
#492: But but but... Markos is a CIA agent.... dKos is a DARPA funded research project....
...
#496: Don't spill the beans. The Koslings haven't figured any of that out yet. Agent Markos will have a rough time of it when he is exposed as being a double secret agent of the Zionist conspiracy. Don't blow his cover.
...
#497: From this, we conclude that LGF not only has MORE numbers than DKOS, but BIGGER numbers as well. If YOU TOO want bigger numbers, choose LGF brand blogs.
...
#514: Was just thinking that all you people have too much time on your hands...but then it occurred to me I'm sitting here reading all this.
...
#518: Looks like the whole thing has gone 404. My guess is that she just wanted a corpus of data for some programming project and, now that the object under study is aware of her, it's no longer useful. I don't see any dark purpose here. How evil can someone be who writes knitting software?
...
#519: I suspect that if the mice KNOW they are in an experiment, they will not produce the same results as they would otherwise, thus invalidating the experiment.
...
#520: [re: #21 zombie: "Talk about pointless. People get PhDs for this crap."] Not really pointless. The "value" of this may be questionable though, at this point in time. As the web and bogosphere has grown exponentially in influence, there has been great interest in determining if real life outcomes can be affected by influential opinions posted on blogs -- and then re-created in numerous other places to mimic majority viewpoints. There are numerous companies involved in this research-like activity which can be tailored for marketing, business intelligence, financial trading, political campaign management uses (for example, Umbria of Denver, CO was just sold to JD Power). They use primarily data visualization tools (like those used now by lawyers engaged in electronic data discovery) -- These are similar to the software tools used by intelligence services to monitor, track, analyze, for example, wireless (web/phone) transmissions and discussions originating in the US, and destined for overseas delivery in places like, say, Iran.
...
#531: Yea- shut down. I didn't get to it in time to even see what it was all about. So, now I'm depressed. And stuff.

This morning Tae chmod'ed all her code to hide it, but I suggested that she keep it visible, since folks were having so much fun with it. (Of course, now she's embarrassed and wants to clean up her code first...) Noah and I also wrote an open letter to LGF explaining what was happening.

I'm mostly amused (as you can tell), but also impressed at (1) how much was uncovered about Tae, Noah and me and (2) how much of this obscure statistical NLP code the LGF community was able to figure out, communally. The peril and power of the internet.

Thursday, May 01, 2008

Farecast and Fonolo

From Dave Farber's IP, news of a new company called Fonolo: a ‘Google’ for phone menus. The basic idea is simple: spider all those annoying phone menus and make every point inside one a URL, which you can reach by having a bot navigate to that point for you and then call you when it gets there. From the video, it sounds like the spidering is done manually, and the navigation uses ASR to check progress (since the menus change frequently).
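To make the "every point is a URL" idea concrete, here is a rough sketch. It assumes a phone menu can be modeled as a tree in which each keypress leads to a submenu, so any point in the menu is identified by the key sequence that reaches it. The `MenuNode` class, the `index_menu` function, the `/company/digit/digit` URL scheme and the example menu are all hypothetical illustrations, not Fonolo's actual design, and the telephony and ASR progress checks are left out entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """Hypothetical model of one point in a phone menu: its prompt and where each key leads."""
    prompt: str
    options: dict[str, "MenuNode"] = field(default_factory=dict)


def index_menu(company: str, root: MenuNode) -> dict[str, tuple[str, str]]:
    """Breadth-first walk of a menu tree, giving every node a URL-like path.

    Returns a map from path (e.g. "/acme/2/0") to (prompt, keys to dial).
    A navigation bot would replay the keys after connecting; checking that
    each prompt was actually reached (the ASR part) is not modeled here.
    """
    index = {}
    queue = deque([(root, "")])
    while queue:
        node, keys = queue.popleft()
        # Each single-character key (0-9, *, #) becomes one path segment.
        path = "/" + "/".join([company, *keys])
        index[path] = (node.prompt, keys)
        for key, child in sorted(node.options.items()):
            queue.append((child, keys + key))
    return index


# A made-up menu for a hypothetical company "acme".
menu = MenuNode("Main menu", {
    "1": MenuNode("Billing", {"1": MenuNode("Pay a bill"),
                              "2": MenuNode("Dispute a charge")}),
    "2": MenuNode("Technical support", {"0": MenuNode("Speak to an agent")}),
})
index = index_menu("acme", menu)
for path, (prompt, keys) in index.items():
    print(f"{path:12} {prompt:20} dial: {keys or '(none)'}")
```

Here "/acme/2/0" maps to the prompt "Speak to an agent" and the key sequence "20". The hard parts Fonolo presumably has to deal with, such as discovering the tree in the first place and coping with menus that change, are exactly what this toy version assumes away.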

Like Farecast, this is a nice example of using AI to solve consumer problems by mining information that is, in principle, public (airfare prices in one case, phone menus in the other) but, in practice, inaccessible. And like Farecast, it's not at all obvious that the companies controlling that information will be happy about that mining, especially if people end up using it as a souped-up version of GetHuman.

This feels like the thin edge of a wedge to me. I predict that within five years there will be open AI warfare between consumer-oriented bots and corporate-controlled information...then again, maybe this is just wishful thinking. After all, I'm in the AI munitions business.