Thursday, November 20, 2008
The Last Veridian Note
On sabbatical this year at Google, I keep noticing how little of my old office space I've taken with me, and how little I miss it. As Bruce Sterling points out: "objects can no longer protect you from want, from humiliation – in fact they are causes of humiliation, as anyone with a McMansion crammed with Chinese-made goods and an unsellable SUV has now learned at great cost." Almost makes me feel like cleaning up at home...
Sunday, October 26, 2008
More data on tax policies
Above is yet another graph, using data from the same source - a simple one just showing the average tax rate in each income bracket: the bottom 20% of the population, the next 20%, and so on, up to the top 0.1%. The famous "Joe the plumber" is right around the 95th percentile - at the top end of that long, long range (from about the 40th percentile mark to the 90th percentile mark, $40k/yr to about $170k/yr) where the tax policies of the two candidates are pretty much the same. Or if you prefer, at the bottom of that long range (from $250k/yr to $2,800k/yr and up), where the policies are different.
I find it exceedingly strange that there's so much rhetoric about very small differences (eg for Joe the plumber's tax rate) and so little discussion about the fairly large differences in tax rates for the top end of the range. The idea that tax credits are socialist is just wacky - these have been pushed by conservatives for decades now as an alternative to heavier-weight social programs, and in fact McCain's health care plan is based on tax credits. And both plans are clearly progressive in the sense that higher incomes pay higher tax rates - the difference is only in degree, not kind.
Even if you accept the claim that the very top of the scale for tax rates is important, the difference in top tax rate is also not especially large in a global sense - in particular moving from 29-38% doesn't turn the US from a free-market bastion to a socialistic Sweden wanna-be. In fact, it doesn't dramatically change the ranking of the US for top personal income tax rate. Look at this chart (based on 2005 data) from Wikipedia, and imagine moving the top US bracket from 28% or so to 38% or so:
Or if you prefer here are some selected values from the data tables on the WP page that I sliced out and graphed myself against the Obama and McCain proposals. It's clearly a difference...but it's just as clearly not a switch from capitalism to Marxism.
Monday, October 20, 2008
More visual comparisons
From Mark Thoma at Economist's View, here's a pie chart showing earmarks as a percentage of the total Federal budget.
..and here is offshore drilling production, by year, as a fraction of total oil consumption.
Thursday, October 16, 2008
Tuesday, October 14, 2008
My brilliant idea of the week...
...how about announcing a concert that hasn't happened yet, instead of one that's over? Our next show is a
Musical Open House for Obama featuring folk / blues / ragtime / bluegrass band Smokestack Lightining & friends.
No cover - just donate what you can if you appreciate the music!
Sunday Oct 26, 7:30 - 9:30pm
at the House of Nick Coles and Jen Matesa
331 S. Fairmount Street, Pittsburgh 15232
RSVP at the snazzy Obama-provided event page or listen to some samples on myspace if you're not one of those "low information" listeners.
Thursday, October 09, 2008
Visualizing the Long Tail for Tax Policy: Part 2
Above is a corresponding visualization of the Obama plan - but it's a little more complicated. Again, you can click to zoom in.
Obama's plan includes both tax cuts and tax increases. All of Obama's tax cuts go to the bottom 900 people (the poorest 90%). The total cut for the bottom 90% is much larger than under McCain's plan, but it's also distributed quite differently, as the inset shows.
For the next 90 people (percentiles 90-99%) it's basically a wash - the bottom half of these will get a tax cut, but a very slight one, and the top half of those 90 will see an increase: in terms of field position, they'll take a penalty, and have to move back to the left. How far, you ask? On average across these 90 people, the penalty is about 6 inches each - not enough to see on the gridiron I drew.
For the next 9 people, in the 99-99.9% bracket, the average tax increase is big enough to be visible: about a 4-yard penalty. An offsides penalty, maybe. I drew the approximate shift of each of these 9 people on the graph, with dotted blue arrows pointing from the old to the new position.
For the last guy, the penalty is nearly 22 yards, moving him back to the 70 or 75. Maybe a holding call.
I didn't bother to draw the appropriate-size stack of bills for the tax increases - you'll just have to imagine it. I didn't draw field-position arrows for the McCain tax plan because nobody moves by more than a foot or so - except for the guy down by the 94-yard line, who just about makes it to the goal.
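For anyone who wants the penalties in dollars rather than yards, here is a rough sketch. It assumes the penalties are drawn on the same scale as the income positions in the previous post (one foot of field per $10k); the function names are mine, invented for illustration.

```python
# Assumption: penalties use the same scale as income positions,
# i.e. one foot of field per $10k, three feet per yard.
DOLLARS_PER_FOOT = 10_000
FEET_PER_YARD = 3

def yards_to_dollars(yards):
    """Convert a field-position penalty in yards to dollars per year."""
    return yards * FEET_PER_YARD * DOLLARS_PER_FOOT

def new_yard_line(old_yard_line, penalty_yards):
    """Field position after moving back toward the starting goal line."""
    return old_yard_line - penalty_yards

if __name__ == "__main__":
    print(f"6-inch penalty: about ${yards_to_dollars(0.5 / FEET_PER_YARD):,.0f}")
    print(f"4-yard penalty: about ${yards_to_dollars(4):,.0f}")
    print(f"22-yard penalty: about ${yards_to_dollars(22):,.0f}")
    print(f"last guy ends up near the {new_yard_line(94, 22)}-yard line")
```

The last line is just the subtraction behind "the 70 or 75" above.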
I should also probably point out (in case it's not obvious) that neither of these plans is likely to be implemented exactly in its proposed form - even if the economy and stock market hadn't imploded over the last couple of weeks, which has obviously changed things, there's a long process of digestion before any major legislation is produced by Congress, and like digestion, what comes out is often quite dissimilar from what goes in. It's probably best to think of this as a statement of what values the candidates support, rather than a prediction of how your taxes will change.
So to summarize:
- The Obama plan makes the spread between the highest-paid and lowest-paid Americans noticeably smaller. The McCain plan makes it slightly larger.
- The bottom 50-60% of the country by income do much better under the Obama plan.
- The tax plans are not very different for the 60-95% bracket, between about $70k and $170k per year.
- The 95-99% bracket ($170-$240k) clearly does better under McCain - by a substantial amount in dollars each, but a small percentage (a little more than 2%, actually) in income.
- The top 1%, and especially the top 0.1%, do much much better under McCain.
Saturday, October 04, 2008
Visualizing the Long Tail for Tax Policy
They say software is hard to understand because it's "invisible" - but aren't so many things? For fun, I downloaded the raw data on Obama and McCain's proposed tax policies from http://www.taxpolicycenter.org - they have it all in Excel form, which makes crunching it convenient and easy - and tried to construct what I considered a reasonable visualization. Their data is broken out to show the effective tax increase or decrease for the first quintile, the second quintile, etc, with the top quintile broken down to partitions of the top 10%, 5%, 1% and 0.1%. I wanted to show the range of salaries in each partition, the number of people in each partition, and the change in tax. I decided that it just can't be done - the ranges are just too broad to see on one chart without using hard-to-explain constructs like a log-log plot.
Here's a mental image that might help. Think of a football field, 1000 typical Americans, and some big stacks of money in small bills. Specifically, think of $1 and $5 bills shuffled together and piled up so that each pile contains about equal numbers of $1's and $5's. If I did the math right (they say a US bill is 0.0043" thick) then a money stack one foot tall is worth about $10k - so the football field is $3M long and $1.6M wide, I'm about $57k tall, and my waist size used to be about $30k (but with my diet, I lost about $1600 around the middle).
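Here is a quick back-of-the-envelope check of that scale, as a sketch. It assumes the commonly quoted bill thickness of 0.0043 inches and a 300 ft by 160 ft playing field; the 5.7 ft height is simply read back from the "$57k tall" figure, and the function names are my own.

```python
# Assumed inputs: 0.0043 inches per bill, and an even mix of $1 and $5 bills.
BILL_THICKNESS_IN = 0.0043
AVG_BILL_VALUE = (1 + 5) / 2  # dollars per bill in an even $1/$5 mix

def stack_value_per_foot(thickness_in=BILL_THICKNESS_IN, avg_value=AVG_BILL_VALUE):
    """Dollar value of a stack of bills one foot (12 inches) tall."""
    bills_per_foot = 12 / thickness_in
    return bills_per_foot * avg_value

# The post rounds the per-foot value to a convenient $10k.
ROUNDED_DOLLARS_PER_FOOT = 10_000

def feet_to_dollars(feet, dollars_per_foot=ROUNDED_DOLLARS_PER_FOOT):
    """Convert a length in feet to dollars on the rounded scale."""
    return feet * dollars_per_foot

if __name__ == "__main__":
    print(f"one-foot stack, unrounded: ${stack_value_per_foot():,.0f}")
    print(f"field length (300 ft): ${feet_to_dollars(300):,.0f}")
    print(f"field width (160 ft): ${feet_to_dollars(160):,.0f}")
    print(f"someone 5.7 ft tall: ${feet_to_dollars(5.7):,.0f}")
```

The unrounded figure comes out somewhat under $10k, so "about $10k" is a rounding for convenience rather than an exact value.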
Now to visualize income distribution, let's take those 1000 typical Americans, and let them walk down the gridiron like this: start at one goal line, and for every $10k of yearly income, walk one foot toward the other end zone...so if you make $90k/year, e.g., you'll end up on the three-yard line, and if you make $300k/year, you'll end up on the 10. Subject to that rule, space everyone out as much as you can.
The first 900 of those 1000 people will end up somewhere before the 4 yard line, because about 90% of Americans make about $120k or less.
Another 90 people will end up between the 4 yard line and the 8 yard line - that is, they make between $120k and $240k per year.
Another 9 people will end up between the 8 yard line and the 20 - they make between $240k and $600k per year.
That accounts for 999 out of 1000 people, but this is America, land of Bill Gates and Warren Buffett - so we're not done yet. The last guy will stand, on average, at the 94 yard line - with an average income of $2.8M per year. Remember this is income, not life savings.
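The walking rule is easy to write down as code. This is only a sketch of the mapping described above (one foot per $10k, three feet per yard); `yard_line` is a name I made up for it.

```python
DOLLARS_PER_FOOT = 10_000  # one foot of field per $10k of yearly income
FEET_PER_YARD = 3

def yard_line(yearly_income):
    """Yard line reached by someone with the given yearly income."""
    feet_walked = yearly_income / DOLLARS_PER_FOOT
    return feet_walked / FEET_PER_YARD

if __name__ == "__main__":
    for income in (90_000, 120_000, 240_000, 300_000, 600_000, 2_800_000):
        print(f"${income:>9,} per year -> yard line {yard_line(income):.1f}")
```

An income of $2.8M works out to a bit past the 93, which is the "94 yard line" in round numbers.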
Now, for the tax cuts. Let's start with McCain's. In his plan, everyone gets a tax cut, and we'll hand it out as stacks of money. McCain has a stack about 9 feet high to be shared by the first 900 people; a stack about 5 feet high to be shared by the next 90 people; another 9.5 foot stack to be shared by the next 9 people; and a final stack about 6.5 feet high - about the height of Ben Roethlisberger - that goes to that one lonely old man (surely, he's old?) down on the 94 yard line.
Above is a picture I drew of this...you can click to zoom in.
This is approximate, and loses some detail - which maybe I'll get to in another post - but it illustrates the main properties of McCain's plan: there is an extremely long tail in income distribution in the US, and as a consequence, an extremely long tail of tax-cut returns. If you imagine the money stacks are all the same (which is order-of-magnitude correct), then:
- For the bottom 90%, the per-person tax cut is about 1/10 that of people in the 90-99% bracket.
- For the 90-99% partition, the per-person tax cut is about 1/10 that of people in the 99-99.9% bracket.
- For the 99-99.9% partition, the per-person tax cut is about 1/10 that of people in the top one-tenth of one percent.
"I shot a moose once"
And now, an informational message, for those who may not fully recognize the intricacies and dangers of moose hunting.
Tuesday, September 30, 2008
The Sarah Palin Turing test
I was wondering .... was it just me, or did anyone else have the reaction "Gosh, her answer sounds like it was generated from a statistical language model trained on a corpus of random Republican talking points?"
Sunday, September 28, 2008
Friday, September 26, 2008
Summary of the morning's news...
Sometime last night, the bailout became no bailout. Obama, via the NYT:
“What I’ve found, and I think it was confirmed today, is that when you inject presidential politics into delicate negotiations, it’s not necessarily as helpful as it needs to be,” Mr. Obama told reporters Thursday evening. “Just because there is a lot of glare of the spotlight, there’s the potential for posturing or suspicions. When you’re not worrying about who’s getting credit, or who’s getting blamed, then things tend to move forward a little more constructively.”
The Republican side of the story, as told by Kevin Smith, aide to House Republican Leader Boehner:
But a top aide to Mr. Boehner said it was Democrats who had done the political posturing. The aide, Kevin Smith, said Republicans revolted, in part, because they were chafing at what they saw as an attempt by Democrats to jam through an agreement on the bailout early Thursday and deny Mr. McCain an opportunity to participate in the agreement.
Tuesday, September 09, 2008
The Value of Experience
The great thing about the web is that if you can imagine a question and spend a little time looking, you can often find someone who has already answered it. This weekend I wondered: there's all this talk about who's experienced enough to be president - what does the data actually say about how experience correlates with performance? Turns out that they asked the same question at electoral-vote.com and got some interesting answers.
The first scatter plot (or maybe it's the second?) above plots total years of experience against rank for all presidents up to W. Bush, where rank=1 for Abraham Lincoln, and rank=42 for Warren Harding (who?). This is a consensus ranking by historians, so it's somewhat subjective, but there's no apparent correlation between x and y here at all. The second scatter plot (or was it the first?) compares experience to a random permutation of rank. Can you see any difference? I can't. If you're curious, and you compute r^2 for this dataset, then you get a value of 0.008 - essentially nil (an r^2 around 0.5 would be a strong correlation for data of this sort). I poked around in Excel for a while, and the data looks just as uncorrelated if you look at gubernatorial experience, for instance, and a couple of other variants I tried.
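If you'd rather not poke around in Excel, the same check can be sketched in a few lines of Python. The numbers in the example are made-up placeholders, NOT the electoral-vote.com data, and `r_squared` and `shuffled_baseline` are names I invented; the point is only to show the computation of r^2 and the comparison against randomly permuted ranks.

```python
import random

def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def shuffled_baseline(xs, ys, trials=1000, seed=0):
    """Average r^2 when ys is randomly permuted, i.e. the chance level."""
    rng = random.Random(seed)
    shuffled = list(ys)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += r_squared(xs, shuffled)
    return total / trials

if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    experience_years = [8, 22, 0, 14, 30, 5, 17, 11, 26, 3]
    historian_rank = [4, 9, 1, 7, 3, 10, 2, 6, 8, 5]
    print("r^2 (placeholder data):", round(r_squared(experience_years, historian_rank), 3))
    print("average r^2 after shuffling:", round(shuffled_baseline(experience_years, historian_rank), 3))
```

If the real r^2 is no bigger than what shuffling produces, that is the numerical version of not being able to tell the two scatter plots apart.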
So what does that mean? Obviously, experience matters for any job, right? Well, one point is that there is a sample bias here - the data is entirely on people who were at least arguably well-qualified for the job, not the result of a controlled scientific experiment in which scientists took [wo]men off the street at random and dropped them into the White House. (Insert Sarah Palin joke here if you like). It's wrong to say experience doesn't matter: more accurately the data says that among people qualified enough to be elected, more experience does not correlate with better performance.
A second point: from a machine learning point of view, certainly not all experiences are created equal. Two important properties are the recency and diversity of experiences - for instance, 1,000 samples of poll data collected last week are better than 10,000 samples of polling data collected 50 years ago, and 1,000 samples collected from 50 states are better than 10,000 samples collected from just Utah and Nevada. This data says little about the quality of the experience. Will John McCain's experience as a child and young man in the 1930s and 40s help him be a better president? How much does he actually know now, midway through his third decade in Congress, that he didn't know midway through his first?
Other explanations are also possible. Maybe being president is just too unlike other jobs for experience to matter much. Or maybe the Lincolns and FDRs of the world are so remarkable that they rise quickly through the ranks and are still great, while the Buchanans, Garfields, and Van Burens will never be more than mediocre no matter how long they slog along.
Or maybe there's some other explanation.
So what does this all mean? What am I, a pundit? In terms of this week's talking points, neither party scores much - Team Red seems to have all but abandoned the experience-vs-celebrity story and is busy trying to co-opt Team Blue's message of change/new faces. But there's still a lot of noise about how much experience Palin has, and how it compares to Obama's, and I think it's worth noting that a major fraction of the talking-head time this cycle has been (and certainly will be) spent on a point that is a matter of faith, not a matter of objective fact. Once you look at the data, it's clear that simple miles-on-the-odometer experience does not tend to make one a better president - or at least, if a trend exists, it is extremely small.
Monday, August 25, 2008
politics shmolitics, as long as it swings
A clever YouTube video produced by my good friend Judy Minot.
Saturday, July 26, 2008
Another performance by Smokestack Lightinin'
This one was at the bizarrely inspired 1st Annual Alexander Berkman Festival, and was captured in full audio glory by Er1k Riebling (thanks Er1k!) The whole set (20min) consisted of:
Sunday, May 18, 2008
Upcoming gig
I'll be playing (mostly mandolin) Tues night in Braddock, with the band Smokestack Lightenin' for a fundraiser for Unseam’d Shakespeare Company’s Out of this Furnace: project. The headliner is Anne Feeney.
Where: Elks Lodge #883- 424 Library Street, Braddock, Pennsylvania
When: Tuesday May 20, 2008 7:00 PM-11:00PM
Tickets: $25 (or pay what you can), available at the door (cash/check only)
Contact: Tim Dawson, 412-621-0244 for directions and information
Friday, May 16, 2008
We have violated the prime directive
Noah Smith and I are co-supervising Tae Yano on a project involving analysis of political blogs, and Tae left a pile of results and code on her CMU web site as a way of communicating with us...world-readable. Surprisingly someone at one of the blogs she spidered, Little Green Footballs, actually noticed, leading to a lot of investigative work in this fascinating thread:
Anyone know what this page at Carnegie Mellon means? It’s some kind of experiment that involves comments posted at LGF, and I have a feeling it’s not friendly.
Since this morning, Tae chmod'ed all her code to hide it, but I suggested to Tae that she keep it visible, since folks were having so much fun with it. (Of course now she's embarrassed and wants to clean up her code first...) Noah and I also wrote an open letter to LGF explaining what was happening.
...
comment #5: Maybe it's post-modern poetry, academia-style? Using LGF comments as gibberish to transcend interpretation?
...
comment #18: Obviously it's a KGB plot.
...
#12: there's a Kos directory too. Probably just too what they consider to be partisan sites to test some kind of language algorithms or something
...
#20: I think they are doing a text analysis looking for Charles' sock puppets -- like the way some trolls accuse Robert Spencer of posting as Hugh
...
#23: Looks like they are doing another election-year study looking at patterns of argument or other aspects of communication.
...
#47: Who ever thought that garbage labeled as "research" could be so sophisticated and at the same time be so worthless. ... Polishing farts to get Piled higher and Deeper has become, for the most part, an ephemeral exercise in self deception. [ouch! sounds like my last grant review].
...
#48: This isn't that advanced of stuff - we are working on a simillar concept for internal corporate communications. This has been around a while.
....
#60: EXPERIMENTATION WITHOUT REPRESENTATION! I demand compensation!
...
#61: I wonder if linking to their experiment will introduce fatal feedback loops, leading to a disturbance in the Krell mind-field and possible space-time discontinuity?
...
#79: It looks to my un-Pythonesque eye like a programmer is using the cosine function to create some kind of mapping on how comments refer to one another based on comment location in the thread; and looking at different types of postings from Charles (open thread, breaking news, news outlet criticisms) to see if there are any trends in the discussions that vary according to the posting type. That has to be one of the worst run-on sentences I've ever written.
...
#92: [#64: "What is port sniffing?"] after you pour some in the glass, but before you take a sip, swirl it around a bit and inhale whist the dark chocolate is melting on your tongue......
...
#162: Ahh, so basically it's a merger between post-Marxist deconstructionism and THX-1138-esque Orwellian personality dehumanization.
...
#172: I liked the method of withdrawing all posts by 204 commenters, then asking the system to predict what those posts were (or *if* the post existed, I think) based on all the rest.
...
#187: ok here's how I see it. There is a whole group at Carnegie Mellon CS department doing research in Artificial Intelligence by running statistics on blogs. A grad student Tae Yano stashed her research files on a server and left them wide open to WWW. The files pertain to running basic stats on DK and LGF, up to cosine similarity. The AI goal would be to have a blog-commenting software indistinguishable from a blog-commenting human. For all I know, this, or any other comment, is already written by a python script.
...
#277: [Quoted from a grant proposal Noah & I wrote that got unearthed, along with various other papers, pictures of Noah's cats, Tae's cv, her picture, her programming project on knitting, wedding announcement, etc:] "Political text is often indirect, sarcastic, repetitive, hyperbolic, emotional, biased, manipulative, and riddled with unstated assumptions." - Byte me.
...
#314: So some students are doing research on blogs. No big deal. It was interesting to see what it was about, but that's about it. I don't think they feel the need to protect their files. Why should they? Who would want to mess with that? It's just a school project, I don't see point of "exposing" the students and publicizing their information. At least that's my take.
...
#316: I noticed that the research group to which she belongs is partially funded by a DARPA grant.
...
#361: And the same techniques/analyses will probably work on Arabic-language websites, too.
...
#428: Are terrorists ‘phone banking’ for Barack?
...
#474: Imagine a natural language program that could respond to comments with charm and style, sort of a robo-blogger. Now imagine an army of them, all set to monitor a different political blog, run by a campaign manager for a politician. Add to its writing ability an encyclopedic memory, with instant access to famous quotes, historical facts, trivia, statistics, and every word ever uttered by the opposition. You now have an army of ultimate bloggers, all completely under the control of one campaign manager... no more "going off message" by some underpaid/volunteer lackey, just high quality counter-opinion, ready to be inserted into the blogs of anyone who disagrees with your candidate. This research will eventually lead to robo-blogging to kill emerging scandals and alternative opinions on issues... no more Rathergates as they will be smothered in the cradle by the most charming bloggers around -- the poli-bots....
#482: She's trying for a data-mining tool tailored for blogs that separates "useful, thoughtful" information from all the mindless dreck in the blogosphere. Lotsa luck with that! As an aside, I've been on the Carnegie Mellon campus, toured the Computer Science department, and met with CS faculty. It's a gorgeous campus. The school clearly has big bucks. CMU holds numerous contracts with various government agencies related to the information technology aspects of defense, computer security, homeland security, and similar "black ops" topics. At least some people on that campus have intimate access to NSA, DOD, and CIA. It's a spooky place.
...
#489: We should, for amusements sake, keep an eye open for a KOS diary about this. There may be some entertaining histrionics and conniption fits over their being the unwitting subject of DARPA funded research.
...
#492: But but but... Markos is a CIA agent.... dKos is a DARPA funded research project....
...
#496: Don't spill the beans. The Koslings haven't figured any of that out yet. Agent Markos will have a rough time of it when he is exposed as being a double secret agent of the Zionist conspiracy. Don't blow his cover.
...
#497: From this, we conclude that LGF not only has MORE numbers than DKOS, but BIGGER numbers as well. If YOU TOO want bigger numbers, choose LGF brand blogs.
...
#514: Was just thinking that all you people have too much time on your hands...but then it occurred to me I'm sitting here reading all this.
...
#518: Looks like the whole thing has gone 404. My guess is that she just wanted a corpus of data for some programming project and, now that the object under study is aware of her, it's no longer useful. I don't see any dark purpose here. How evil can someone be who writes knitting software?
...
#519: I suspect that if the mice KNOW they are in an experiment, they will not produce the same results,as they would otherwise, thus invalidating the experiment.
...
#520: [re: #21 zombie: "Talk about pointless. People get PhDs for this crap."] Not really pointless. The "value" of this may be questionable though, at this point in time. As the web and bogosphere has grown exponentially in influence, there has been great interest in determining if real life outcomes can be affected by influential opinions posted on blogs -- and then re-created in numerous other places to mimick majority viewpoints. There are numerous companies invloved in this research-like activity which can be tailored for marketing, business intelligence financial trading, political campaign managment uses (for example, Umbria of Denver, CO was just sold to JD Power). They use primarily data visualization tools (like those used now by lawyers engaged in electronic data discovery) -- These are similar to the software tools used by intelligence services to monitor, track, analyze, for example, wireless (web/phone) transmissions and discussions originating in the US, and destined for overseas delivery in places like, say, Iran.
...
#531: Yea- shut down. I didn't get to it in time to even see what it was all about. So, now I'm depressed. And stuff.
I'm mostly amused (as you can tell), but also impressed at (1) how much was uncovered about Tae, Noah and me and (2) how much of this obscure statistical NLP code the LGF community was able to figure out, communally. The peril and power of the internet.
Monday, May 05, 2008
Warning: software is packaged by weight not volume, contents may settle on shipping
From Farber's IP list: EULAs have come to malware. So much for honor among thieves.
Thursday, May 01, 2008
Farecast and Fonolo
From Dave Farber's IP, news of a new company called Fonolo: a ‘Google’ for phone menus. The basic idea is simple: spider all those annoying phone menus and make every point inside one a URL, which you can reach by having a bot navigate to that point for you, and then call you when it gets there. From the video it sounds like the spidering is done manually, and the navigation is done using ASR to check progress (since the menus change frequently).
Like Farecast, this is a nice example of using AI to solve consumer problems by mining information that is, in principle, public (in one case airfare prices, in the other phone menus) but, in practice, inaccessible. And like Farecast, it's not at all obvious that the companies controlling that information will be happy about that mining - especially if people end up using it as a souped-up version of GetHuman.
This feels like the thin edge of a wedge to me. I predict that within five years there will be open AI warfare between consumer-oriented bots and corporate-controlled information...then again, maybe this is just wishful thinking. After all, I'm in the AI munitions business.
Thursday, April 24, 2008
Classifiers in the news
Here's a first for me - a blog post consisting of an actual decision tree. It classifies counties as going Clinton vs Obama. Wonder what the weights for a linear classifier would look like?
Wednesday, April 23, 2008
Somehow this seems more appropriate the day after
I was hoping for Obama to have a knockout win in Pennsylvania, just to end the madness...but no. Apparently, as the Three-Toed Sloth says, the voting will continue until morale improves.
Oh well, at least the telerobots will stop calling my house for a while....
Thursday, April 17, 2008
Valuable resource for sentiment analysis?
I didn't make it through last night's debate but it's an ill wind...one result seems to be a list of 11000+ comments that appear to be almost entirely negative in tone.
Monday, April 14, 2008
Sunday Forum: Nation of bridges
If you like Pittsburgh, baseball, Michael Chabon, and Barack Obama - and who doesn't? - there was a lovely essay in my Sunday paper: Nation of bridges
Friday, April 04, 2008
Sterling interview
Via Dave Farber's "Interesting Persons" list, an interview with Bruce Sterling on Life, the Internet and Everything, in which he discusses the role of colleges:
A college is supposed to civilize the young. If you remove those functions merely in order to pump ones and zeros at student eyeballs, you are going to lose a lot of valuable cultural capital. An educated citizen is not a friction-free technocratic philistine myrmidon with an ISO rating. Those guys exist, don't get me wrong, but they bear the relationship to education that the Ron Paul campaign had to political reality.
and himself:
They say that those who worship the Muses end up running the Museums. Similarly, all futurists are doomed to become historians. The avant-garde becomes the gray eminences...
Tuesday, April 01, 2008
After hearing David Evan's excellent talk "What Elements of an Online Social Networking Profile Predict Target-Rater Agreement in Personality?" I now know that there are three things I could tell you about that would (roughly speaking) help you more reliably assess my true personality. Here they are....
- Something I'm proud of.
- Something I'm kind of embarrassed by.
- A humorous video.
Sunday, March 23, 2008
They Told You Not To Reply
Via Dave Farber's IP list, how email leaks have helped raise $5000 for Seattle-area dog pounds:
They Told You Not To Reply
Wednesday, March 19, 2008
Wednesday, March 12, 2008
The Keys to My Heart
...should use at least 128 bits, I'd say. Via Dave Farber's IP list - and giving a new meaning to the phrase "security for embedded systems": A Heart Device Is Found Vulnerable to Hacker Attacks
Friday, March 07, 2008
Cut Once (Again)
For those that want to try out the latest learning technology in Thunderbird - the Cut Once Thunderbird Extension now has a spiffy new home page. And before you ask, no, I have no idea why the parrot. You'll have to ask Vitor or Ramnath.
Wednesday, March 05, 2008
Joining the 21st century
I certainly don't publish enough for it to be necessary, but in an idle moment I used Dapper to wrap my publication list and build an RSS feed for it. As someone that worked on this a few years ago, I'm impressed - wrapping the page took only about 5 minutes, and it was quite easy to use. I didn't even really pay attention to the "documentation" (which is a Flash video).
Monday, March 03, 2008
Greenhouse gases - doing the math
Three times this winter I've turned the crank on my 1994 Ford Escort and had it not start right up. Admittedly once was a dead battery and once was a defective battery but still, I'm considering the reality that one of these days I'll have to overcome my natural lethargy and actually replace the thing....so, I've been reading up on new cars...and carbon footprints...and have been trying to work out to what extent getting a hybrid is an effective way to save the planet and to what extent it's just sort of cool.
Somewhat surprisingly, the nicest carbon-footprint calculator I found was at the EPA. According to them, replacing both my cars with something that had better than double the gas mileage would save about 2300 lbs of CO2/year - but I'd save more with either a newer furnace or new windows for my house. I'm probably atypical here, as together Susan and I drive about 100 miles a week, but still, I was pretty surprised.
Another data-point came up today in Low-tech Magazine: according to them, at least, the carbon cost of manufacturing the 8 m^2 of solar panels needed to power a house is between 60,000 and 940,000 kg of CO2 - and according to the EPA this is the same carbon cost as between 20 and, um, 300+? years of electricity use for an average family of 4. Of course, this could be way off base, but it seems to me that we need to set up some sort of marketplace mechanism to allocate CO2 savings efficiently - just following the trends may not get us there.
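For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch in Python. The only inputs are the figures quoted above (60,000 to 940,000 kg of CO2, and the roughly 20-year equivalence for the low estimate); the annual household figure is not taken from either source but is simply what that equivalence implies, and the function and variable names are made up for illustration.

```python
# Back-of-envelope check of the solar panel numbers quoted in the post.
# Assumption: the "20 years" equivalence applies to the low estimate,
# so the implied annual household figure is derived, not sourced.

PANEL_CO2_KG_LOW = 60_000    # low manufacturing estimate quoted in the post
PANEL_CO2_KG_HIGH = 940_000  # high manufacturing estimate quoted in the post
YEARS_AT_LOW = 20            # years of household electricity the low estimate equals


def implied_annual_kg(total_kg: float, years: float) -> float:
    """Annual household CO2 (kg) implied by 'total_kg equals `years` of use'."""
    return total_kg / years


def equivalent_years(total_kg: float, annual_kg: float) -> float:
    """Years of household electricity use with the same CO2 cost as total_kg."""
    return total_kg / annual_kg


annual_kg = implied_annual_kg(PANEL_CO2_KG_LOW, YEARS_AT_LOW)
high_years = equivalent_years(PANEL_CO2_KG_HIGH, annual_kg)

print(f"implied household electricity: {annual_kg:.0f} kg CO2/year")
print(f"high estimate equals about {high_years:.0f} years of use")
```

Under that assumption the high estimate works out to a bit over 300 years, which is at least consistent with the "300+" above; it says nothing about whether the underlying estimates are right.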
Wednesday, February 27, 2008
Thursday, February 21, 2008
Is 88.80.13.160 the new 09-f9-... ?
Remember the great Digg revolt of 2007, when takedown efforts for a key needed to decrypt HD-DVD led to numerous ways to publish or conceal said key? (I think my favorite was
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-bd
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-be
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-bf
[redacted]
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c1
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c2
09-f9-11-02-9d-74-e3-5b-d8-41-56-c5-63-56-88-c3
but that's just me.) From Steven Bellovin via IP, the web site Wikileaks has been disabled due to a court order...by removing the site from the DNS registry. As Steven points out:
Not surprisingly, the NY Times article quoted the web site as noting the similarity of this case to the Pentagon Papers case. The Times also noted how ineffectual the censorship attempt actually was -- not only are there alternate names wikileaks.be, wikileaks.de, and wikileaks.cx -- but the site is still reachable via 88.80.13.160.
Monday, February 04, 2008
State Secret Abuses Come to a Boil
State Secret Abuses Come to a Boil - an interesting sidelight to my post below on asymmetric information is this story on how secrecy has been abused in the distant past by our government. Of course any abuses in the recent past would still be secret.
Two more papers out of the review tunnel
- Ramesh Nallapati and William W. Cohen (2008): Link-PLSA-LDA: A New Unsupervised Model for Topics and Influence of Blogs in ICWSM-2008.
- Frank Lin and William W. Cohen (2008): The MultiRank Bootstrap Algorithm: SemiSupervised Political Blog Classification and Ranking Using SemiSupervised Link Classification in ICWSM-2008.
Kevin Kelly -- The Technium
Long ago, I remember reading a science fiction story where aliens nearly destroy the world by dumping devices that would copy anything - a book, a flashlight, a hamburger - given an original and a pile of sand (or any other raw matter). This doesn't actually wreck the world economy...in the story...but of course it's fiction. However, given how much of the expensive part of the world is information, the premise is no longer that far-out.
Kevin Kelly gives an excellent analysis of why free information (or even free goods) doesn't mean the end of the world.
Wednesday, January 30, 2008
Measure twice, cut once
My students Vitor and Ramnath have developed a Thunderbird plugin that implements recipient recommendation and leak detection for email. It modifies Thunderbird by adding an additional pane that pops up after you send a message, giving you one final chance to fix any errors in your recipient list.
The plugin is called Cut Once - the name comes from a phrase used repeatedly by those weird Rastafarian guys in Neuromancer. There's a brief writeup on how to use it, but it's pretty self-explanatory: just download it, open Thunderbird, and go to the tools->addon menu to install. After you've installed it, you train by opening your folder of "Sent" mail and pressing the "train" button. (This took about an hour for my 9000+ old messages, which is pretty good for something written in JavaScript.)
Any comments/feedback are appreciated. Or, if messing with your email client is too extreme for you, you could just read Vitor's ECIR 2008 paper, which is the latest one to come out of the review tunnel.
Saturday, January 19, 2008
Personal control of personal information
Recently I've run into a number of posts discussing privacy from a different perspective - noting that while you, as a consumer/citizen, have limited control over where and how information about you is kept, governments and businesses often do try to control information about their activities.
For instance, Glenn Reynolds recently wrote in Popular Mechanics:
...government officials and big corporations often want to watch us, but they don't want to be watched in return. Shopping malls are full of security cameras, but many have signs at the entrance telling customers that no photography or video recording is allowed. Police cars have dashboard cameras... But try shooting photos or video of police or other public officials as they go about their business and you might find yourself in wrist restraints. ... Under the law, citizens have no right not to be photographed in public places. So why should people who make their living on the taxpayers' dime enjoy greater freedom from public scrutiny than the taxpayers themselves?
An article with a similar tone also appeared in Computer World, which gave a link to a compelling example of how a citizen's records of police actions were evidence of police wrongdoing (NYT headline: "Recorded on a Suspect’s Hidden MP3 Player, a Bronx Detective Faces 12 Perjury Charges"). The author's summary of the situation:
...surveillance in general ... upsets the balance of power. Whoever has the tape has the power to use, not use, selectively use or misuse the information or proof or evidence recorded.
This is an interesting line of argument. There's something that seems fundamentally wrong about living in a society where surveillance is "endemic", but maybe the most jarring thing is not the loss of privacy, but the loss of power and control.
With that in mind, there's a very interesting move afoot to set up an open standard to describe user "attention" data - which I gather includes browsing history, mostly, but could also certainly include any other information about what a user is interested in...Netflix reviews, Amazon purchases, search queries, you name it. The hope is to break this data away from the many different sites (each of which controls a piece of it) and put it in the hands of users, who can then go and get purchase recommendations (or what have you) from whoever does the best job.
Not an entirely new idea, but a fascinating one nonetheless. Moore's law, along with progress in collaborative filtering/machine learning techniques, means that the barrier to being able to save this sort of data and do something interesting with it is just going to keep dropping, so one can certainly imagine a more horizontal market for recommendations opening up over the next few years. I don't see any particularly horrible technical roadblocks, but I do see a lot of interesting technical problems (e.g. reference resolution over this data!) and of course there might be pushback from the people that control the data now.
Update: There's some discussion of this from Fernando.
Thursday, January 10, 2008
Wikis at school
In Ars Technica, last October, I saw a story about a professor that assigned her students the task of writing a Wikipedia entry. I went one better, I think...Natalie Glance and I were teaching a seminar in "Analysis of Social Media" this fall, and I assigned the class the task of building a wiki on the subject. During the class I limited wiki access to class members, but it's now open to everyone to read or edit.
It was an interesting experiment. It was only a little extra work in grading and coordinating, but it was worth it for the irony factor alone. The students were mostly positive about it.
The principal content of the wiki is a bunch of paper summaries, not unlike what students would have turned in in a class, but some students came up with some other ideas for contributions. The main change I'd make if I did this again would be switching wiki providers. (Cheapskate that I am, I used a free wiki farm called scribblewiki. They were very helpful early on, and even upgraded me to a "paid" account for free, but took a long time with some other requests later - and never did add support for LaTeX math.) It's not obvious what's the best way to use a wiki in teaching a course for the (n+1)th time, but it's a fun twist for the first time you teach a seminar.