Showing posts with label applied statistics. Show all posts

Tuesday, December 23, 2014

Controlling For Cause, Variance of Lifespan Still Higher for American Blacks


Why Lifespans Are More Variable Among Blacks Than Among Whites in the United States 

 Glenn Firebaugh et al.
Demography, December 2014, Pages 2025-2045

Abstract: Lifespans are both shorter and more variable for blacks than for whites in the United States. Because their lifespans are more variable, there is greater inequality in length of life — and thus greater uncertainty about the future — among blacks. This study is the first to decompose the black-white difference in lifespan variability in America. Are lifespans more variable for blacks because they are more likely to die of causes that disproportionately strike the young and middle-aged, or because age at death varies more for blacks than for whites among those who succumb to the same cause? We find that it is primarily the latter. For almost all causes of death, age at death is more variable for blacks than it is for whites, especially among women. Although some youthful causes of death, such as homicide and HIV/AIDS, contribute to the black-white disparity in variance, those contributions are largely offset by the higher rates of suicide and drug poisoning deaths for whites. As a result, differences in the causes of death for blacks and whites account, on net, for only about one-eighth of the difference in lifespan variance.
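The decomposition the authors perform is, at bottom, the law of total variance: the total variance in age at death splits exactly into a within-cause piece (how much age at death varies among people dying of the same cause) and a between-cause piece (how much the causes' mean ages differ). A minimal sketch with made-up numbers — the causes, means, and spreads below are invented for illustration, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: age at death and cause label (illustrative numbers only)
causes = rng.choice(["heart", "cancer", "homicide"], size=10_000, p=[0.5, 0.4, 0.1])
means = {"heart": 78, "cancer": 72, "homicide": 30}
sds = {"heart": 10, "cancer": 12, "homicide": 9}
ages = np.array([rng.normal(means[c], sds[c]) for c in causes])

total_var = ages.var()

# Law of total variance: Var(age) = E[Var(age|cause)] + Var(E[age|cause])
groups = [ages[causes == c] for c in np.unique(causes)]
weights = np.array([g.size for g in groups]) / ages.size
within = sum(w * g.var() for w, g in zip(weights, groups))
between = sum(w * (g.mean() - ages.mean()) ** 2 for w, g in zip(weights, groups))

assert np.isclose(total_var, within + between)
print(within, between)
```

The paper's finding that "it is primarily the latter" says the black-white gap lives mostly in the `within` term, not the `between` term.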


Wednesday, August 13, 2014

Comparative Politics

As far as I can tell, this is a near perfect description of empirical work in comparative politics.


In honor of Scott de Marchi.

Thursday, January 02, 2014

Bayes' Honeydew

An email from Zach Weiner:

Funny story I thought you might enjoy. Pretty typical kid story, I'm sure, but it was too perfect not to share. I was visiting ***** in Palo Alto. They have two kids: S** and J**. J** is about 3, and S** is about 6. 

We're eating breakfast, and J**'s meal comes with a slice of honeydew. He doesn't like honeydew and says so. His mom asks "Then can S** have it?" He says yes without giving it any thought. 

But S**'s eyes light up. She's very excited to have the honeydew. J** sees her face, immediately changes his opinion and declares he'd like the honeydew. J** then gives the honeydew a taste. J** decides once again he doesn't like honeydew, and now it makes its way to S**, where it is at last consumed. 

By these means I conclude J** is a Bayesian.

Thursday, August 08, 2013

Discriminating Linebackers?


Compensation Discrimination for Defensive Players: Applying Quantile Regression to the National Football League Market for Linebackers and Offensive Linemen 

Nancy Burnett & Lee James Van Scyoc 
Journal of Sports Economics, forthcoming 

Abstract: Keefer’s recent article in the Journal of Sports Economics, “Compensation discrimination for defensive players: applying quantile regression to the National Football League market for linebackers,” finds wage discrimination in the National Football League market for linebackers. Following Keefer, we examine both ordinary least squares and quantile analysis, as well as Oaxaca and quantile treatment effects decompositions though we explore the market not only for linebackers but also for offensive linemen and limit our study to rookie players. We would expect to find stronger evidence of discrimination, as rookies are captured sellers. However, we find no pattern of discrimination against Blacks. 

Nod to Kevin Lewis

"Oaxaca decomposition treatment"?  I always thought that involved smoking weed.  But no...


Tuesday, April 30, 2013

False Specificity


The price of a drink – too exactly? Flawed evidence for minimum unit pricing 

John Duffy
 Significance, April 2013, Pages 23–27

The UK government has been considering whether to introduce minimum unit pricing for alcohol. Extraordinarily precise benefits have been claimed for the measure, down to exactly how many lives a year will be saved. But are the statistics real or illusory? John Duffy says they are flawed to the point of uselessness.

An earlier, longer version.  

Nod to Kevin Lewis

Wednesday, October 17, 2012

Bike Helmet Follies

"One common denominator of successful bike programs around the world — from Paris to Barcelona to Guangzhou — is that almost no one wears a helmet, and there is no pressure to do so. In the United States the notion that bike helmets promote health and safety by preventing head injuries is taken as pretty near God’s truth. Un-helmeted cyclists are regarded as irresponsible, like people who smoke. Cities are aggressive in helmet promotion...'Pushing helmets really kills cycling and bike-sharing in particular because it promotes a sense of danger that just isn’t justified — in fact, cycling has many health benefits,' says Piet de Jong, a professor in the department of applied finance and actuarial studies at Macquarie University in Sydney. He studied the issue with mathematical modeling, and concludes that the benefits may outweigh the risks by 20 to 1. He adds: 'Statistically, if we wear helmets for cycling, maybe we should wear helmets when we climb ladders or get into a bath, because there are lots more injuries during those activities.'" [Elisabeth Rosenthal, NYT op-ed]

I like the comments part of the article.  The idiot parade is in full swing.  The claim is not that (1) wearing a helmet is a bad idea, or that (2) wearing a helmet should be illegal.  The claim is that the statistical risks are in line with wearing a helmet when you brush your teeth.  People slip and fall in the bathroom, sometimes, and hit their heads.  Not very often.  And the survivable accidents on bikes where a helmet matters are statistically rare.

Now I fully expect some goofball to comment and say, "A helmet saved the life of my cousin's stepdaughter!"  Yes.  And your dad should have worn a condom.

(Nod to Kevin Lewis, who likely wears a helmet when he brushes his teeth)

Thursday, August 02, 2012

Wow!! Doug Hibbs basically calls the election for Mittens!

Here is the money quote:

"according to the Bread and Peace model per capita real income growth rates must average out at nearly 6 percent after 2012:q2 for Obama to have a decent chance of re-election."

You can get to the whole paper from here (and obviously a hat tip goes to Brendan).

Beyond this bombshell, the paper is well worth reading as Hibbs excoriates his competitors for using ad hoc and ex post dummy variables as well as endogenous approval ratings as explanatory variables in their vote-share equations.

Sunday, July 15, 2012

All dressed up but nowhere to go

There are a lot of things that drive me crazy about the current practice of econometrics. People who think over-identification tests validate their identifying assumptions. People who think that if you fail to reject the null at the 0.05 level, it's fine to proceed in your analysis as if the null were true (i.e. people who don't believe in type II error).

But one of the biggest is the practice of thinking we do no harm by using estimators we know to be inappropriate for the data at hand and thinking we somehow fully fix that issue by using robust standard errors.

I annually beat my head against the wall trying to get my students to appreciate these issues (only to often have my work undone by their reading papers/books that make these mistakes), but now on this last point, I have some help!


Tuesday, February 21, 2012

Correlation is Causation?

Why are professors liberal?

Neil Gross & Ethan Fosse
Theory and Society, March 2012, Pages 127-168

Abstract: The political liberalism of professors - an important occupational group and anomaly according to traditional theories of class politics - has long puzzled sociologists. This article sheds new light on the subject by employing a two-step analytic procedure. In the first step, we assess the explanatory power of the main hypotheses proposed over the last half century
to account for professors' liberal views. To do so, we examine hypothesized predictors of the political gap between professors and other Americans using General Social Survey data pooled from 1974-2008. Results indicate that professors are more liberal than other Americans because a higher proportion possess advanced educational credentials, exhibit a disparity between their levels of education and income, identify as Jewish, non-religious, or non-theologically conservative Protestant, and express greater tolerance for controversial ideas. In the second step of our article, we develop a new theory of professors' politics on the basis of these findings (though not directly testable with our data) that we think holds more explanatory promise than existing approaches and that sets an agenda for future research.


Do you know what has long puzzled me? How someone can get a paper published simply by running a bunch of regressions of ideology on demographic characteristics, and then saying things like "Jewish causes liberal beliefs." This "two stage" thing described above...if the Onion published empirical papers, this might be a candidate.

Friday, November 04, 2011

A little knowledge is....the BBC!

This is one of the best stats teachable moments I think I have seen in a long time.

An article was published listing local death rates for an admittedly dangerous cancer.

The BooBC weighs in, noting that rates in some parts of this nationwide sample are three times as high as in others.

Dr. Goldacre, perhaps a trifle gleefully, points out that these are SEPARATE local samples, and they have associated variance that comes from the sample size. He writes a nice piece, with a fine funnel graph, and notes that the internet is a groovy, groovy thing, because it enables people like this to check things stated as fact by experts like the BooBC.
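Goldacre's point is easy to reproduce: give every area the identical true rate and the small areas will still produce the most extreme observed rates, purely from sampling noise — which is exactly what a funnel plot displays. A minimal sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

true_rate = 0.02                  # identical underlying rate everywhere
small_n, large_n = 200, 20_000    # hypothetical area populations

# Observed rate in each of 1,000 small and 1,000 large areas
small_rates = rng.binomial(small_n, true_rate, size=1000) / small_n
large_rates = rng.binomial(large_n, true_rate, size=1000) / large_n

# The spread of observed rates is driven by sample size alone:
# the sd of a binomial proportion is sqrt(p * (1 - p) / n)
print(small_rates.std(), large_rates.std())  # small areas ~10x more spread out
assert small_rates.std() > 3 * large_rates.std()
```

Rank the areas by observed rate and the "worst" and "best" performers will overwhelmingly be small ones — no actual variation in quality required.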

The BBC "stands by its story." They failed, utterly, to understand the very basic mistake they had made in looking at the information. (Of course, in journalism indoctrination school, they never had to learn any of that nasty stats stuff!)

As Dr. Goldacre put it in a tweet: "Dear sir, I have completely failed to understand a simple criticism of our work, please tell everyone, yours, BBCnews"

My own favorite bit is that in the BBC rebuttal, there are two parts:
1. We did not make a mistake.
2. Why are you picking on us? Lots of people made the same mistake!

Fantastic stuff. A Lagniappe: they are holding a "Bowel Cancer Comedy Night." No way even the Onion could get away with that.

Tuesday, October 04, 2011

Pitching is Overrated?

Jackie Blue sends this outrage.

Pavitt found hitting accounts for more than 45% of teams' winning records, fielding for 25% and pitching for 25%. And, the impact of stolen bases is greatly overestimated.

He crunched hitting, pitching, fielding and base-stealing records for every MLB team over a 48-year period from 1951-1998 with a method no other researcher has used in this area. In statistical parlance, he used a conceptual decomposition of offense and defense into its component parts and then analyzed recombinations of the parts in intuitively meaningful ways.


Well, as long as it's "intuitively meaningful," right? Charlie is a professor of Communication. Me? I "intuitively doubtful."

Saturday, June 18, 2011

Code-breakers

People, I am not a programming genius by any stretch of the imagination. That said, I've done work with a variety of co-authors (Mark Perry, Mrs. Angus, Rodolfo Cermeno, Olan Henry & Nilss Olekalns) where we write our own code to estimate multivariate GARCH in mean models, which so far are not available as pre-programmed packages in STATA or EVIEWS or SAS.
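For the curious, here is roughly what the core of such hand-rolled code looks like: a minimal sketch of the likelihood for a univariate GARCH(1,1)-in-mean model, fit by direct numerical optimization. The multivariate versions we actually estimate are considerably messier; all data and starting values below are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def garch_m_nll(params, r):
    """Negative log-likelihood of a GARCH(1,1)-in-mean model:
       r_t = mu + delta * h_t + e_t,  e_t ~ N(0, h_t),
       h_t = omega + alpha * e_{t-1}**2 + beta * h_{t-1}."""
    mu, delta, omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                  # keep variance positive and stationary
    h = np.empty_like(r)
    e = np.empty_like(r)
    h[0] = r.var()                     # initialize at the sample variance
    e[0] = r[0] - mu - delta * h[0]
    for t in range(1, len(r)):
        h[t] = omega + alpha * e[t - 1] ** 2 + beta * h[t - 1]
        e[t] = r[t] - mu - delta * h[t]
    return 0.5 * np.sum(np.log(2 * np.pi * h) + e ** 2 / h)

# Usage: fit to a simulated return series (starting values matter in practice)
rng = np.random.default_rng(2)
r = rng.standard_normal(500) * 0.01
res = minimize(garch_m_nll, x0=[0.0, 0.0, 1e-5, 0.05, 0.9],
               args=(r,), method="Nelder-Mead")
print(res.x)
```

Writing the recursion, the parameter restrictions, and the optimizer plumbing yourself is precisely the exercise I think PhD students should go through.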

As a result, I get a fair amount of requests to give code to others. Amazingly to me, many of these requests come from PhD students.

My feeling is that if you are getting a PhD in economics you should write your own code for your dissertation. I usually tell students that and decline to give them the code, but offer to answer any specific questions that they may have about their own coding efforts.

The most bizarre situation I've faced was with a researcher from a central bank in Latin America. He asked for some code and I provided it, but he couldn't get from the code specifically written for my problem and data to a solution to his problem and data. So he asked if he sent me his data would I code up and run the estimations his bosses wanted.

I'd be very interested in hearing in the comments from other researchers how they handle requests to provide code along with any good code-sharing stories.

Tuesday, May 17, 2011

Your Job Counts on Account of the Way It Is Counted

Interesting. Our boy LeBron points out that the way we count matters, and that "offshoring" may overcount the portion of value created in the U.S. (And this in reference to an article he cites. Nice.)

I have taken the other side, claiming that the way we count dramatically UNDERCOUNTS how much value is created in the U.S. Sure, the "jobs" may be offshored, but they are a tiny part of the value of the product, and what is being done overseas is easy, repetitive, and cheap, not something U.S. workers need to do. Our other boy, Mark Perry, writes it up for the iPhone. Here's the value pie chart:

Who is right? I find the Houseman, et al., article pretty persuasive. So, I am, as usual, confused. It can't be that we BOTH overcount and undercount, can it?

(UPDATE: Meant to say that the title is stolen from the evil "May the Schwartz Be With You!")

(UPDATE II: LG and JN had an interesting exchange. Here is my own view: we might well want to soften the blow. Globalization helps all of us a little, but hurts a few of us a lot. Why not smooth out a little?)

Monday, March 07, 2011

Sweet Home Mississippi?

All hail to the State, with Patriotism so great, that less than 20% of its residents even have passports!



I was very proud to see that roughly 1/3 of my fellow Okies have passports.

Represent!!

Hat tip to LeBron, who grew up in New Jersey, that cesspool of disloyalty!


Tuesday, February 01, 2011

The ambulatory ICU

Fantastic article in the New Yorker about applying crime mapping and policing the "hot spots" to medical care. In a lot of situations, a small fraction of the relevant population is responsible for an outsized fraction of medical costs (this is NOT including catastrophic events like organ transplants). The article outlines some currently small programs where lavishing attention and money on these "hot spots" increases the quality of care and produces better outcomes while actually saving money. It's long, but it's a fascinating article.

Note that Megan McArdle is not a believer.

Thursday, September 09, 2010

Go for it?

Most tennis players have a huge speed differential between their first and second serves. They take more risk with the first serve, knowing that if they miss they have another, while on the second serve they ease off because there's no "third serve" (thank God).

But does it make sense? Perhaps not:

Nine of the top 20 men as of the Aug. 2 rankings would be better off statistically or virtually unaffected by using their first-serve technique on the second serve. The list includes Novak Djokovic, Nikolay Davydenko, Fernando Verdasco and many of those with dominating first serves: Soderling, Roddick, John Isner and Sam Querrey.

Yet only on occasion — perhaps with a big lead in a game, like 40-love — do any dare to strike a full-strength second serve.

“You need to at least give yourself a chance to win the point,” Querrey said.

The women who could be better served by hitting two first serves include Serena Williams, Jelena Jankovic, Victoria Azarenka and Maria Sharapova.

Andy Roddick, for one, begs to differ:

“Two double faults in a row and you’re love-30,” Roddick said. “If sports were played on a stat sheet, you know, the look of it would probably be a lot different. One thing you’re not putting into consideration with the numbers is nervous tension.“

"You know, it’s a lot easier on a black-and-white piece of paper with a number. Most people don’t serve a ton better under pressure. So if you’re digging yourself a hole — love-15, love-30 — it’s a totally different ballgame. That can’t be explained by numbers, I don’t think.”
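The arithmetic behind the claim is simple. With hypothetical serve percentages (invented for illustration, not actual tour stats), compare the conventional first-plus-second strategy to hitting two first serves:

```python
def point_win_prob(p_in1, w1, p_in2, w2):
    """P(win service point): first serve lands with prob p_in1 and wins w1 of
    those points; otherwise the second serve lands with prob p_in2 and wins
    w2 of those points. A double fault loses the point."""
    return p_in1 * w1 + (1 - p_in1) * p_in2 * w2

# Hypothetical numbers for a big server:
p_in1, w1 = 0.60, 0.80   # first serve: 60% in, wins 80% of those points
p_in2, w2 = 0.90, 0.50   # second serve: 90% in, wins 50% of those points

conventional = point_win_prob(p_in1, w1, p_in2, w2)   # first + second
two_firsts   = point_win_prob(p_in1, w1, p_in1, w1)   # first serve twice

print(conventional, two_firsts)   # 0.66 vs 0.672 -> two firsts slightly better
```

Roddick's rejoinder is that these per-point probabilities aren't constant: nerves after a double fault change the numbers mid-game, which a one-line expected-value calculation can't capture.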



Personally I'd like to see one-serve, no-let, no-ad tennis! That would be a hoot.

And, let me add in closing that my first serve and second serve are very similar in speed and power, but that's probably because my first serve is pathetic!

Thursday, July 22, 2010

Ed Leamer knocks it out of the park (again)

Leamer has a fantastic essay in the Spring 2010 Journal of Economic Perspectives (ungated version here) reacting to Angrist & Pischke's semi-triumphalist piece on the "revolution" in applied econometrics. My favorite part, reproduced below, hits on something I complain about to my students all the time, namely the practice of eschewing any attempt to obtain more accurate point estimates in favor of using confidence intervals that only have an asymptotic justification despite having a decidedly finite sample. Anyway, I'll just shut up and let Ed preach it:

It should not be a surprise at this point in this essay that I part ways with Angrist and Pischke in their apparent endorsement of White’s (1980) paper on how to calculate robust standard errors. Angrist and Pischke write: “Robust standard errors, automated clustering,
and larger samples have also taken the steam out of issues like heteroskedasticity and serial correlation. A legacy of White’s (1980) paper on robust standard errors, one of the most highly cited from the period, is the near-death of generalized least squares in cross-sectional applied work.”

An earlier generation of econometricians corrected the heteroskedasticity problems with weighted least squares using weights suggested by an explicit heteroskedasticity model.
These earlier econometricians understood that reweighting the observations can have dramatic effects on the actual estimates, but they treated the effect on the standard errors as a secondary matter. A “robust standard” error completely turns this around, leaving the estimates the same but changing the size of the confidence interval.

Why should one worry about the length of the confidence interval, but not the location? This mistaken advice relies on asymptotic properties of estimators. I call it “White-washing.” Best to remember that no matter how far we travel, we remain always in the Land of the Finite
Sample, infinitely far from Asymptopia. Rather than mathematical musings about life in Asymptopia, we should be doing the hard work of modeling the heteroskedasticity and the time dependence to determine if sensible reweighting of the observations materially changes the locations of the estimates of interest as well as the widths of the confidence
intervals.


Yes! Amen!
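For the skeptical, Leamer's point shows up in a small simulation: under heteroskedasticity, reweighting the observations (WLS with the correct weights, which in practice must be modeled) moves and tightens the slope estimate itself, while robust standard errors would leave the OLS point estimate exactly where it was and only resize the interval. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(3)

def one_draw(n=500, beta=1.0):
    x = rng.uniform(1, 5, n)
    sigma = x                      # error sd proportional to x: heteroskedastic
    y = beta * x + sigma * rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x])
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1 / sigma**2               # correct weights (known here by construction)
    Xw, yw = X * np.sqrt(w)[:, None], y * np.sqrt(w)
    b_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    return b_ols[1], b_wls[1]

# Sampling spread of the slope estimate across 500 replications
draws = np.array([one_draw() for _ in range(500)])
ols_sd, wls_sd = draws.std(axis=0)
print(ols_sd, wls_sd)    # WLS slopes are noticeably tighter around the truth
assert wls_sd < ols_sd
```

Both estimators are unbiased, but the reweighted estimates differ draw by draw and are much less variable — the "location" gain Leamer says gets thrown away by White-washing.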

Wednesday, March 31, 2010

When Robert met Edmond

Newish NBER working paper by Aghion, Howitt, & Murtin (ungated version here) is titled: "The Relationship Between Health and Growth: When Lucas Meets Nelson-Phelps."

They argue that Lucas, who modeled an effect of improving health on growth and Nelson and Phelps who modeled an effect of the level of health on growth are both correct.

Their evidence comes mainly from a 96 country cross sectional average growth regression where both the initial level of life expectancy and the growth of life expectancy over the sample have positive and significant coefficients, both in LS and IV models.

Of the two results, they claim the effect of initial life expectancy is more robust.

I like the piece because they take a very reduced form approach. It's health and health improvements on growth, with basically nothing else in the model.

I dislike the piece because they, as do so many others, abuse the Hansen test of over-identifying restrictions to justify their instruments.

First, failing to reject the null, or "passing" the Hansen test, does not validate your identification; the test applies only to the over-identifying restrictions. Consider that in an exactly identified equation the test cannot be performed at all.

Second, failing to reject the null doesn't mean you don't have an instrument problem. A p-value of .13 on a Hansen test means you don't reject the null at conventional levels, but it is hardly strong evidence in favor of your instruments. Another way to say this is that we are rarely given any information about the power of the test, which is crucial when failing to reject the null is what guides our modeling choices.
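To see how little comfort "passing" the test provides, here is a sketch: one of two instruments is built to be invalid, yet the Sargan/Hansen statistic (computed here as n·R² from regressing the 2SLS residuals on the instruments) fails to reject in most draws because the test has low power against this alternative. All numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def sargan_pvalue(n=200, gamma=0.15):
    """2SLS with two instruments, one mildly invalid (it enters the error
    with coefficient gamma). Returns the Sargan over-ID test p-value."""
    z1, z2, u = rng.standard_normal((3, n))
    x = z1 + z2 + u + rng.standard_normal(n)        # endogenous regressor
    e = u + gamma * z2 + rng.standard_normal(n)     # z2 is an invalid instrument
    y = 1.0 * x + e
    Z = np.column_stack([np.ones(n), z1, z2])
    X = np.column_stack([np.ones(n), x])
    # 2SLS: project X onto the instrument space, then solve
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    b = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
    resid = y - X @ b
    # Sargan statistic: n * R^2 from regressing residuals on the instruments;
    # df = (# instruments) - (# regressors) = 3 - 2 = 1
    fitted = Pz @ resid
    stat = n * (fitted @ fitted) / (resid @ resid)
    return stats.chi2.sf(stat, df=1)

pvals = np.array([sargan_pvalue() for _ in range(500)])
print((pvals > 0.05).mean())  # most draws fail to reject, despite the bad instrument
```

So a sample of "passed" Hansen tests tells you very little about whether the instruments are actually valid — exactly the power problem above.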