Showing posts with label cosma shalizi. Show all posts

Thursday, August 2, 2012

i wish i knew...

Cathy and Cosma both feel that knowing specific programming languages is not essential. To quote Cathy, "you shouldn’t obsess over something small like whether they already know SQL." To put it politely, I reject this statement. To apply to a data science job without learning the five key SQL statements is a fool's errand. Simply put, I'd never hire such a person. To come to an interview and draw a blank trying to explain "left join" is a sign of (a) not being smart enough, (b) not wanting the job enough, or (c) not having recently done any data processing, or some combination of the above. If the job candidate is a fresh college grad, I'd be sympathetic. If he/she has been in the industry, he/she won't be called back. (One undisclosed detail in the Cosma-Cathy dialogue is what level of hire they are talking about.)

Why do I insist that all (experienced) hires demonstrate a minimum competence in programming skills? It's not because I think smart people can't pick up SQL. The data science job is so much more than coding -- you need to learn the data structure, what the data mean, the business, the people, the processes, the systems, etc. You really don't want to spend your first few months sitting at your desk learning new programming languages.

Cathy and Cosma also agree that basic statistical concepts are easily taught or acquired. Many studies have disproven this point, starting with the Kahneman-Tversky work...

Terrific post by Kaiser Fung (of Junk Charts and Numbers Rule Your World) - not least for the thrill of discovering that Cosma Shalizi is, er, aggressively discussing...
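
For anyone who does draw a blank on "left join", here is a minimal sketch of what it does, using Python's built-in sqlite3 module; the table and column names are made up for illustration:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A LEFT JOIN keeps every row of the left table (users); where no order matches,
# the columns from the right table come back as NULL (None in Python).
rows = con.execute("""
    SELECT u.name, o.order_id, o.amount
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY u.user_id, o.order_id
""").fetchall()

for name, order_id, amount in rows:
    print(name, order_id, amount)
# 'carol' still shows up, paired with (None, None): she has no orders, but the
# LEFT JOIN keeps her row anyway; an inner join would have dropped it.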

Saturday, April 7, 2012

talks about talks

The point of an academic talk is to try to persuade your audience to agree with you about your research. This means that you need to raise a structure of argument in their minds, in less than an hour, using just your voice, your slides, and your body-language. Your audience, for its part, has no tools available to it but its ears, eyes, and mind. (Their phones do not, in this respect, help.)

This is a crazy way of trying to convey the intricacies of a complex argument. Without external aids like writing and reading, the mind of the East African Plains Ape has little ability to grasp, and more importantly to remember, new information. (The great psychologist George Miller estimated the number of pieces of information we can hold in short-term memory as "the magical number seven, plus or minus two", but this may if anything be an over-estimate.) Keeping in mind all the details of an academic argument would certainly exceed that slight capacity. When you over-load your audience, they get confused and cranky, and they will either tune you out or avenge themselves on the obvious source of their discomfort, namely you.

Cosma Shalizi at Three-Toed Sloth

Saturday, April 10, 2010

hand me the ritalin

(If the idea of a comic about Spinozism and lycanthropy in eighteenth-century central Europe sounds the least bit interesting, you really need to read Family Man.)

Cosma Shalizi at Three-Toed Sloth

Sunday, January 3, 2010

Avanti

2010. Time to move over to WordPress. Yes.

But before we go, here's Cosma Shalizi on the Neyman-Pearson lemma and William James:

When last we saw the Neyman-Pearson lemma, we were looking at how to tell whether a data set x was signal or noise, assuming that we know the statistical distribution of noise (call it p) and the distribution of signal (call it q). There are two kinds of mistake we can make here: a false alarm, saying "signal" when x is really noise, and a miss, saying "noise" when x is really signal. What Neyman and Pearson showed is that if we fix on a false alarm rate we can live with (a probability of mistaking noise for signal; the "significance level"), there is a unique optimal test which minimizes the probability of misses --- equivalently, which maximizes the power to detect signal when it is present. This is the likelihood ratio test, where we say "signal" if and only if q(x)/p(x) exceeds a certain threshold picked to control the false alarm rate.
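
As a concrete, simplified illustration of that recipe (mine, not from CRS's post): suppose the noise is N(0, 1), the signal is N(1, 1), and we decide we can live with a 5% false alarm rate. A short Python sketch:

import numpy as np
from scipy.stats import norm

alpha = 0.05                  # false alarm rate we decide we can live with
p = norm(0.0, 1.0)            # noise distribution (assumed known)
q = norm(1.0, 1.0)            # signal distribution (assumed known)

def likelihood_ratio(x):
    # q(x)/p(x); for these two Gaussians it works out to exp(x - 1/2),
    # so it is an increasing function of x.
    return q.pdf(x) / p.pdf(x)

# Pick the threshold so that P(ratio > t | noise) = alpha. Because the ratio
# is monotone in x, t is the ratio evaluated at the (1 - alpha) quantile of p.
t = likelihood_ratio(p.ppf(1 - alpha))

def say_signal(x):
    return likelihood_ratio(x) > t

# Simulate to check the false alarm rate and the power (1 - miss rate).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 100_000)      # draws from p
signal = rng.normal(1.0, 1.0, 100_000)     # draws from q
print("false alarm rate:", say_signal(noise).mean())   # should land near 0.05
print("power:           ", say_signal(signal).mean())  # chance of catching a real signal

Thresholding the ratio is the same as thresholding x itself here; the simulation just confirms the false alarm rate comes out near 5%, while the power is whatever the overlap of the two distributions allows.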


CRS goes on to elaborate, then gets to William James and the will to believe:

Let's step back a little bit to consider the broader picture here. We have a question about what the world is like --- which of several conceivable hypotheses is true. Some hypotheses are ruled out on a priori grounds, others because they are incompatible with evidence, but that still leaves more than one admissible hypothesis, and the evidence we have does not conclusively favor any of them. Nonetheless, we must choose one hypothesis for purposes of action; at the very least we will act as though one of them is true. But we may err just as much through rejecting a truth as through accepting a falsehood. The two errors are symmetric, but they are not the same error. In this situation, we are advised to pick a hypothesis based, in part, on which error has graver consequences.

This is precisely the set-up of William James's "The Will to Believe". (It's easily accessible online, as are summaries and interpretations; for instance, an application to current controversies by Jessa Crispin.) In particular, James lays great stress on the fact that what statisticians now call Type I and Type II errors are both errors:

There are two ways of looking at our duty in the matter of opinion, — ways entirely different, and yet ways about whose difference the theory of knowledge seems hitherto to have shown very little concern. We must know the truth; and we must avoid error, — these are our first and great commandments as would-be knowers; but they are not two ways of stating an identical commandment, they are two separable laws. Although it may indeed happen that when we believe the truth A, we escape as an incidental consequence from believing the falsehood B, it hardly ever happens that by merely disbelieving B we necessarily believe A. We may in escaping B fall into believing other falsehoods, C or D, just as bad as B; or we may escape B by not believing anything at all, not even A.

Know the truth! Shun error! 2010, Excelsior!

The whole thing here.