December 31, 2011

Spent some of today getting my 2011 charitable donations out of the way, so I've been experiencing a lot of different Web forms. Remember, these people want my money, so it would be nice if they didn't make the experience so irritating. On that basis, here are some things not to do:
  • Refuse to accept spaces or dashes in my credit card number, phone number, social security number, etc. Don't force me into your stupid format; parse whatever I send you. Here, let me help. The following JS code strips out spaces and dashes. input = input.replace(/[ \-]/g, "");. For an appropriately huge consulting fee I'll show you how to replace periods and pluses, too.
  • Force me to tell you what kind of credit card I have. This information is encoded in the leading digits of the credit card number. This table may help. I know that things change, but seriously, you could at least try to guess.
  • Force me to select "USA" out of the end of an incredibly long drop-down list of countries. It's true that you can generally determine someone's country by looking at their IP address, and I can understand not wanting to bother with that. But if most of your customers are American, it's silly to force them to scroll all the way to the end out of a misguided notion of national equity. Make my life easy and put the USA as the first item in the list, people.
  • Make me enter my state and my zip code. In nearly all cases, the zip code encodes the state.
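To put the first two bullets into code, here's a minimal sketch (JS again; the card prefixes below are the commonly cited ones and the real ranges shift over time, so treat the table as illustrative rather than authoritative):

```javascript
// Normalize a card number: strip spaces and dashes (and, while we're at
// it, periods and pluses).
function normalizeCardNumber(input) {
  return input.replace(/[ \-.+]/g, "");
}

// Guess the network from the leading digits. These are the well-known
// prefixes, not a complete or current table.
function guessCardType(number) {
  const n = normalizeCardNumber(number);
  if (/^3[47]/.test(n)) return "American Express";
  if (/^4/.test(n)) return "Visa";
  if (/^5[1-5]/.test(n)) return "MasterCard";
  if (/^(6011|65)/.test(n)) return "Discover";
  return "unknown";
}
```

With something like this in place there's no reason to ask the user for the card type at all.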

Also, not a Web form issue, but I wish there were some way to tell these organizations not to ask me for donations during the year. I give once a year, at the end of the year. It's just a matter of convenience. Sending me a bunch of physical letters asking for money just wastes your fundraising dollars and my time.

 

December 22, 2011

Mark Garrison has a rather odd article in Slate arguing that we need expert advice to order beer in restaurants:
It's a busy night at the D.C. restaurant Birch & Barley, as well as its casual upstairs sister joint, ChurchKey. Greg Engert is guiding me through his beverage list with all the knowledge, talent, and grace one would expect from an award-winning sommelier. With a couple crisp queries, he learned enough to make some intriguing recommendations. He didn't flaunt his knowledge about food and drink, but when I had questions, he gave precise answers about the flavor, aroma, producer, pairing potential, and even the history of the available beverages. Fortunately, there was no attempt at upselling, the odious sin far too many sommeliers commit, a big reason why many diners are suspicious of the entire profession.

...

There may be agreement in the industry that great beer deserves top-notch service, but there's not yet a consensus on what that means. In fact, there's not even agreement on what to call a well-trained beer server. Engert's job title is beer director, but he doesn't mind being called a beer sommelier. (He has put some thought into this.) Some in the beer community find this term problematic, since "sommelier" is tied to the wine world and may imply a professional certification that doesn't exist.

...

The program's website states the claim that wine sommeliers might have known enough to choose a good beer for you a few decades ago, but now "the world of beer is just as diverse and complicated as wine. As a result, developing true expertise in beer takes years of focused study and requires constant attention to stay on top of new brands and special beers." So Daniels set out to build a testing and certification program to create a standard level of knowledge and titles that would signify superior beer knowledge to consumers, similar to the way a Court of Master Sommeliers credential does for wine.

Look, I love beer, don't like wine, and am well aware of the lousy beer service one typically gets at restaurants, so I'm generally in favor of anything that improves beer quality. But the main problem isn't that there's nobody at the restaurant who understands beer. It's that the beer selection at restaurants sucks. To take one recent example, I ate at the Los Altos Grill the other night: they had a page of wines and three beers on tap. This isn't uncommon; in fact it's not uncommon for restaurants to have solid wine lists but only bottled beer, and only a few varieties of bottles at that. The question I have for waiters isn't "what beer do you recommend", but rather "is Peroni really the best beer you have?"

In large part, the culprit here is customer demand: people who eat at high-end restaurants tend to prefer wine to beer, so those restaurants naturally have lousy beer selections. But I suspect that the chemistry of beer has a lot to do with it as well. Wine can last years in the bottle—and many wines are better when aged—but bottled beer has a shelf life measured in months, with draft beer going bad in a few weeks. So, unlike wine, you can't afford to stock any beer that people don't order fairly frequently, since there's too high a chance it will go bad before someone orders it. I suspect that this is why most restaurants keep such a small beer selection. (Anyone with contacts in the restaurant business should feel free to chime in here.)

The major exception here is restaurants that specialize in beer (Garrison's example of Birch & Barley advertises itself as "a completely unique food and beer experience celebrating a full spectrum of styles, traditions, regions and flavors"). If you're that kind of restaurant you probably get enough volume to keep a large inventory without things getting too stale—though I do wonder what the oldest bottle on their shelves tastes like.

 

December 18, 2011

The first step in most Internet communications is name resolution: mapping a text-based hostname (e.g., www.educatedguesswork.org) to a numeric IP address (e.g., 69.163.249.211). This mapping is generally done via the Domain Name System (DNS), a global distributed database. The thing you need to know about the security of the DNS is that it doesn't have much: records are transmitted without any cryptographic protection, either for confidentiality or integrity. The official IETF security mechanism, DNSSEC, is based on digital signatures and so offers integrity, but not confidentiality, and in any case has seen extremely limited deployment. Recently, OpenDNS rolled out DNSCrypt, which provides both encrypted and authenticated communications between your machine and a DNSCrypt-enabled resolver such as the one operated by OpenDNS. DNSCrypt is based on DJB's DNSCurve and I've talked about comparisons between DNSSEC and DNSCurve before, but what's interesting here is that OpenDNS is really pushing the confidentiality angle:

In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks. It doesn't require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers. We know that claims alone don't work in the security world, however, so we've opened up the source to our DNSCrypt code base and it's available on GitHub.

DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user's online security and privacy.

Unfortunately, I don't think this argument really holds up under examination. Remember that DNS is mostly used to map names to IP addresses. Once you have the IP address, you need to actually do something with it, and generally that something is to connect to the IP address in question, which tends to leak a lot of the information you encrypted.

Consider the (target) case where we have DNSCrypt between your local stub resolver and some recursive resolver somewhere on the Internet. The class of attackers this protects against is those which have access to traffic on the wire between you and the resolver. Now, if I type http://www.educatedguesswork.org/ into my browser, what happens is that the browser tries to resolve www.educatedguesswork.org, and what the attacker principally learns is (1) the hostname I am querying for and (2) the IP address(es) that were returned. The next thing that happens, however, is that my browser forms a TCP connection to the target host and sends something like this:

GET / HTTP/1.1
Host: www.educatedguesswork.org
Connection: keep-alive
Cache-Control: max-age=0
...

Obviously, each IP packet contains the IP address of the target, and the Host header contains the target host name, so any attacker on the wire learns both. And since this information is generally sent over the same access network as the DNS request, the attacker learns all the information they would have had if they had been able to observe my DNS query. [Technical note: when Tor is configured properly, DNS requests are routed over Tor, rather than over the local network. If that's not true, you have some rather more serious problems to worry about than DNS confidentiality.]
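To make concrete just how little work the eavesdropper has to do, here's a sketch (in JS, to match the earlier snippet; the captured text is just the example request above) of recovering the hostname from a cleartext HTTP request:

```javascript
// What a passive observer does with a captured cleartext request:
// pull out the Host header. Header names are case-insensitive.
function extractHost(rawRequest) {
  const match = rawRequest.match(/^host:\s*(.+?)\s*$/im);
  return match ? match[1] : null;
}

const captured =
  "GET / HTTP/1.1\r\n" +
  "Host: www.educatedguesswork.org\r\n" +
  "Connection: keep-alive\r\n\r\n";

extractHost(captured); // "www.educatedguesswork.org"
```

One regex: no DNS snooping required.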

"You idiot," I can hear you saying, "if you wanted confidentiality you should have used SSL/TLS." That's true, of course, but SSL/TLS barely improves the situation. Modern browsers provide the target host name of the server in question in the clear in the TLS handshake using the Server Name Indication (SNI) extension. (You can see if your browser does it here), so the attacker learns exactly the same information whether you are using SSL/TLS or not. Even if your browser doesn't provide SNI, the hostname of the server is generally in the server's certificate. Pretty much the only time that a useful (to the attacker) hostname isn't in the certificate is when there are a lot of hosts hidden behind the same wildcard certificate, such as when your domain is hosted using Heroku's "piggyback SSL". But this kind of certificate sharing only works well if your domain is subordinated behind some master domain (e.g, example-domain.heroku.com), which isn't really what you want if you're going to offer a serious service.

This isn't to say that one couldn't design a version of SSL/TLS that didn't leak the target host information quite so aggressively—though it's somewhat harder than it looks—but even if you were to do so, it turns out to be possible to learn a lot about which sites you are visiting via traffic analysis (see, for instance, here and here). You could counter this kind of attack as well, of course, but that requires yet more changes to SSL/TLS. This isn't surprising: concealing the target site simply wasn't a design goal for SSL/TLS; everyone just assumed that it would be clear what site you were visiting from the IP address alone (remember that when SSL/TLS was designed, it didn't even support name-based virtual hosting via SNI). I haven't seen much interest in changing this, but unless and until we do, it's hard to see how providing confidentiality for DNS traffic adds much in the way of security.

 

December 8, 2011

I've been meaning to write something about espresso and the various technology options for making it, but I never got around to it. Now I have. I'm not an espresso-making expert, but I'm a guy who cares about espresso, has a moderate but not extreme budget, and can pull a fairly solid shot. As such, this might or might not be useful to you. There are many articles like this, but this one is mine.

The discussion below is restricted to what are called "semi-automatic" machines: those where you grind the coffee yourself but the machine has controls designed to regulate temperature and pressure. "Super-automatic" machines, where you put in beans and water and get out coffee, are out of scope here.

Consistency
The basic principle of espresso is simple: you grind up the coffee, pack it down and then force heated water through under pressure. The difference between swill and pure liquid perfection is in the details. Moreover, if you're going to get the details right, the first thing you need to do is get them consistent; the exact procedures and settings you need differ with each coffee and each machine, but if you can be consistent then you can dial them in over time. [Aside: when I took machining in college, the first thing the instructor told me was that machining wasn't about cutting metal, it was about measurement. If you could measure accurately, you could cut accurately.] The major variables you need to control are:

  1. The coffee itself.
  2. The grind.
  3. The amount of coffee.
  4. The dispersal into the portafilter basket and the tamp.
  5. Water temperature.
  6. Water pressure.

The coffee is something you buy, so you have some control over it but not complete control. With the right grinder, you can completely control the grind and the amount of coffee. Dispersal and tamp are a matter of personal technique and practice. With the right espresso machine, you can control water temperature quite precisely and with any pump machine, pressure control should be quite good. So, as you can tell, this is primarily a matter of getting good equipment.

Grinder
The grinder thing is pretty simple: get a burr grinder with enough adjustments. Don't get a doser. Get one with a timer. A little elaboration: blade grinders (the cheap canister ones that you can buy for $20-$40) don't do a good job of getting you a consistent grind. The individual grounds aren't the same size and you can't control the overall size except by grinding longer. Don't buy one. You want a burr grinder and you want one that allows you to adjust the grind finely and over a large range. Different beans require different grinder settings, so easy adjustment matters if you change beans much.

The reason you want a timer is to let you control the amount of coffee you grind. This is a parameter people usually specify by mass, but using a scale is a pain in the ass. Grind time is a good proxy here. What I typically do is make some test shots and then set the grind time on my grinder (it has 3 presets). Then when I want to pull a shot I just put the portafilter under the grinder and hit the right preset button. None of this requires much thought once you get it wired.

There are lots of good grinders. What I have is a Baratza Vario. There are two features I like about it. First, it has easy adjustments with two slides up front, one for macro (espresso versus drip) and one for micro (grind fineness once you've selected espresso). Second, it has timer presets, which, as I said earlier, are super-convenient. There's a rest for you to put the portafilter on while you grind, but you need to hold it there or it falls off. I notice that Baratza now makes a weight-based Vario W. This seems like a good idea, but I don't know how well it will work with espresso, since you don't want to grind into a hopper but right into your portafilter, and it's not clear how the scale integrates with that. One caution I would have with the Vario is that the really gross burr adjustments are done with a hex wrench (included). They're easy but kinda scary (keep turning until the motor starts to labor), so if that freaks you out, you might consider another choice.

Espresso Machine
There are a lot of choices in what kind of espresso machine you buy, but let's get something out of the way now: espresso machines have pumps. Yes, you can buy a cheap machine that works off steam pressure, but that's not what you want.

The central problem that dictates the design of an espresso machine is this: The water you use to make espresso needs to be at one temperature (~200 F). The water you use to steam your milk needs to be at steam temperatures (~250 F). If you're going to make milk drinks (I don't, but Mrs. G. does) then you need to somehow address this. There are four basic approaches that I've seen:

  • Have a single boiler and a switch that selects which temperature to maintain at (a single boiler machine).
  • Have two boilers, one at each temperature (a double boiler machine).
  • Have a boiler set to steam temperature and use a heat exchanger to heat your water to espresso temperature.
  • Have a boiler set to water temperature and an electric thermal block heating system to make steam.

Single boiler machines are basically a terrible solution for more than about one or two people if you want to make any kind of steamed milk drink. Here's what the procedure looks like if you want to make a latte: set the thermostat switch to "water"; pull a shot; set the thermostat switch to steam; wait for it to heat up; steam your milk. This is all reasonably fast because the boiler heats up fast. However, say you want to make another latte. Now you have to set the thermostat back to water and wait for it to cool down, which can take minutes. You can accelerate this some by just running water through the group head which pulls cool water out of the reservoir into the system, but basically it's a pain. I've used this kind of machine in an office setting and it sucks.

The obvious (and best) solution to this problem is to have two totally separate boilers, with one set to water and one set to steam. This is of course more expensive, especially since manufacturers seem to have decided to engage in a little market segmentation. To give you an example, Chris Coffee's cheapest double boiler is the Mini Vivaldi II at $1995. They'll sell you a Rancilio Silvia (a very nice single boiler) for $699. This isn't an uncommon pattern: many double boiler machines sell for more than twice what a good single boiler would cost. I don't know anyone who has bought two singles instead, but it's sure occurred to me.

The other two solutions are compromises. In a heat exchanger machine, the boiler is set to steam temperature and the water for the espresso runs through a tube set inside the boiler, heating up on the way (good description here). The idea is that the water heats up as it is pulled out of the reservoir and onto the coffee. The obvious problem, however, is that when you're not pulling espresso, the water in the heat exchanger tube eventually heats up to the temperature of the steam, at which point you're back where you started, as is the heavy metal group head, which provides a lot of thermal inertia. Standard procedure here is a cooling flush, which means that you run some water through the (empty) portafilter/brew group to get it down below the right temperature. Then you quickly pack the portafilter and pull your shot. This all requires some coordination.

About a year ago, QuickMill came out with a new machine (the Silvano), which has a single boiler for the water and a thermoblock for the steam. This has the advantage that you can control the water and group head temperature tightly and still get decent steam fast. The steam isn't as good as it would be if you had an actual boiler, but it's pretty good, so it's a reasonable compromise. And since the water side is temperature controlled, you get to pull a predictable shot without much messing around, which is what I, at least, am after. It shouldn't be surprising at this point that I have a Silvano, which I'm pretty happy with. Here's what it looks like pulling a shot of Four Barrel Ethiopia Welena Suke Quto (and no, those two little spurts onto the backsplash are not intended. That's evidence of tamping error.)

Oh, one more thing: the water supply for espresso machines can either be plumbed (there is a water tube coming from your pipes) or unplumbed (there is a water reservoir you have to refill). Plumbed typically only comes on higher end machines. I don't know if it's worth stepping up to one of those machines to get plumbed, but I do know that my Silvano is unplumbed and I wish it were plumbed. It's pretty annoying to have the shot all ready to go and realize you're out of water. Doubly annoying if it's your last shot worth of coffee.

 

November 29, 2011

As I wrote earlier, many oversubscribed races use a performance-based qualification process as a way of selecting participants. What I mostly passed over, however, is whether different people should have to meet different qualifying standards. If your goal is to get the best people, you could simply pick the top X%. However, if you were to do that, what you would get would be primarily men in the 20-40 age range. To give you an idea of this, consider Ironman Canada 2011, which had 65 Hawaii qualifiers. If you just take the first 65 non-Pro finishers, the slowest qualifier would be around 10:17. This standard would admit just two amateur women, Gillian Clayton (W30-34) at 10:01.58 (a pretty amazing performance, since she's 18 minutes ahead of the next woman) and Rachel Ross (W35-39) at 10:12.17, and no man 55 or above.

If you're going to have a diversified field, then, you need to somehow adjust the qualifying standard for age and gender. The standard practice is to have separate categories for men and women and five year age brackets within each gender. (Some races also have "athena" and "clydesdale" divisions for women and men respectively who are over a certain weight, but at least in triathlon, these are used only for awards and not for Hawaii qualifying purposes.) However, it's also well-known that these categories do a fairly imperfect job of capturing age-related variation: it's widely recognized that "aging up" from the oldest part of your age group to the youngest part of the next age group represents a significant improvement in your expected results.

UPDATE: I forgot to mention. Western States 100 has a completely gender neutral qualifying standard, but it's comparatively very soft.

 

November 28, 2011

One of the common patterns in endurance and ultra-endurance sports is to have one or two races that everyone wants to do (the Hawaii Ironman, the Boston Marathon, Western States 100, etc.) Naturally, as soon as the sport gets popular you have more people who want to do race X than the race organizers can accommodate. [Interestingly, this seems to be true no matter the size of the event: Hawaii typically has around 1800 participants, Boston over 20,000.] As a race organizer, then, you are faced with the problem of deciding how to pick the actual participants from those who have the desire to participate.

The first problem seems to be deciding what to optimize for, with the two most common objectives being:

  • Choose fairly among everyone who wants to do the race.
  • Choose the best athletes.

Fair Selection
The easiest way to choose fairly is generally to run a lottery. You take applications for a race up until date X and then just draw however many entrants you want out of that list. [Note that there is always a yield issue, since some people register who never show because of injuries or whatever, so the number of actual participants is never totally predictable.] For races which are only mildly oversubscribed, what's more common is to take entries up until you're full and then close entry under the "you snooze, you lose" principle. Ironman Canada does this, but now it basically fills up right away every year so you more or less have to be there the day after the race when registration for the next year opens up.
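The lottery itself is about five lines of code. A sketch (JS; the names and structure are mine, not any actual race's system, and a real organizer would presumably overbook a bit to cover yield):

```javascript
// Draw `slots` winners uniformly at random from the applicant pool by
// repeatedly removing a random element (sampling without replacement).
function runLottery(applicants, slots) {
  const pool = applicants.slice(); // don't mutate the input
  const winners = [];
  while (winners.length < slots && pool.length > 0) {
    const i = Math.floor(Math.random() * pool.length);
    winners.push(pool.splice(i, 1)[0]);
  }
  return winners;
}
```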

Merit-Based Selection
Choosing the best athletes is a more difficult proposition, since you first need to identify them. You might think that you could just have a big qualifying race with everyone who wants to race and just pick the top X participants, but this clearly isn't going to work. Since the size of the target event is generally (though not always) set to be about the maximum practical size of a race, if you're going to pick out the top people to race in your target event, the qualifying event would have to be much much larger, well beyond the practical size. Instead, you somehow have to have a set of qualifying races and draw the best candidates from each race. In some cases this is easy: If you are drawing national teams for the world championship, you can just have each nation run its own qualifying race and since each such race only needs to draw from a smaller pool, it's still manageable. However, many events (e.g., Ironman) aren't laid out along national lines so this doesn't work.

There are two basic strategies for drawing your qualifying candidates from a number of races. First, you can have a qualifying time. For instance, if I wanted to run the Boston Marathon, I would need to run some marathon under 3:10. Obviously, there is a lot of variation in how difficult any given race is, and so this leads to people forum shopping for the fastest race. It's extremely common to see marathons advertised as good Boston qualifiers. The key words here are "flat and fast" (a qualifying race can only have a very small amount of net downhill, so non-flat means uphill, which slows you down). Obviously, a qualifying time doesn't give you very tight control over how many people you actually admit, so you still have an admissions problem. As I understand it, Boston used to just use a first-come-first-served policy for qualifiers but in 2012 they're moving towards a rolling admissions policy designed to favor the fastest entrants. At the other end of the spectrum, Western States has its qualifying time set so that there are vastly more qualifiers than eventual participants (it looks to me like it's set so that practically anyone who can finish can qualify [observation due to Cullen Jennings]) and they use a lottery to choose among the qualifiers.
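A fastest-first rolling admissions policy is easy to sketch (JS; times are in minutes and the field and cutoff are invented for illustration, not Boston's actual numbers):

```javascript
// Admit the fastest qualifiers first: keep those who beat the
// qualifying standard, sort ascending by time, take the top `slots`.
function admitFastest(applicants, standard, slots) {
  return applicants
    .filter((a) => a.time <= standard)
    .sort((a, b) => a.time - b.time)
    .slice(0, slots);
}

// A made-up field against a 3:10 (190-minute) standard with two slots:
const field = [
  { name: "A", time: 189 },
  { name: "B", time: 175 },
  { name: "C", time: 191 }, // misses the standard entirely
  { name: "D", time: 182 },
];
```

Here B and D would get the two slots even though A also beat the standard, which is the sense in which this policy favors the fastest entrants rather than the earliest applicants.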

The other major approach, which is more predictable, is the one used for the Hawaii Ironman. The World Triathlon Corporation (which runs Hawaii) has made certain races "Hawaii qualifiers" (my understanding is that a race pays for this privilege) and each race gets a specific number of slots for each gender/age combination. The way that this works is that if there are 5 slots in your age group, then the top 5 finishers get them. If any of those people don't want the slot (for instance, they may have already qualified) then the slot rolls down to the 6th person, and so on. All of this happens the day of or the day after the race, in person. This method gives the race organizer a very predictable race size but poses some interesting strategic issues for participants: because participants compete directly against each other for slots, what you want is to pick a qualifying race that looks like it is going to have a weak field this year. Unfortunately, just because a race had a weak field last year doesn't mean that that will be true again, since everyone else is making the same calculation!
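The roll-down logic for a single age group fits in a few lines (JS sketch; the names and the set of decliners are invented):

```javascript
// `finishers` is one age group in finish order; `declines` is the set
// of finishers who turn their slot down (e.g., they already qualified).
// Slots are offered in order and roll down past anyone who declines.
function allocateSlots(finishers, slots, declines) {
  const awarded = [];
  for (const f of finishers) {
    if (awarded.length === slots) break;
    if (!declines.has(f)) awarded.push(f);
  }
  return awarded;
}
```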

Arbitrary Selection
One thing that I've only seen in ultrarunning is invitational events with arbitrary (or at least unpublished) selection criteria. For instance, here's the situation with Badwater:

The application submission period begins on February 1, 2012 and ends on February 15, 2012. A committee of five race staff members, one of whom is the race director, will then review and rank each application on a scale of 0 to 10. The ranks will be tallied on February 18 and the top 45 rookie applicants with the highest scores, and the top 45 veterans with the highest scores, will be invited (rookies and veterans compete separately for 45 slots for each category). At that time, or later, up to ten more applicants (rookie and/or veteran) may be invited at the race director's discretion, for a total of approximately 100 entrants, and 90 actual competitors on race day.

I guess that's one way to do it.

 

November 8, 2011

MacBooks (Air, Pro, etc.) are great computers, but the sealed battery is a real limitation if you want to travel with one. My Air gets about 5-6 hours of life if I'm careful, which is fine for a transcontinental flight, but not a transatlantic one. The fix, of course, is to buy a HyperMac external battery, which plugs into the laptop at the only real point of access, the magsafe connector. Unfortunately, in 2010 Apple sued HyperMac for patent infringement and HyperMac stopped selling the relevant cable (which, as I understand it, was actually a modified version of an official Apple cable). Without the cable, of course, the battery is pretty useless.

I'm lucky enough to have one of the pre-lawsuit battery/cable combinations but recently a friend wanted one, so I looked again. It seems that HyperMac is back in business, but they've resorted to a do-it-yourself kind of ethos. Basically, you have two choices:

  1. HyperMac will sell you a connector that impersonates a 12V air/auto power connector. You then buy the Apple air/auto to MagSafe adaptor and plug it into your Mac.
  2. They sell you a pair of jacks that you splice into the cable for a legitimate Apple power supply. The way that this works is you take a standard Apple power supply and cut the magsafe half of the cable in two. You strip the wires and attach them to the jack; repeat for the other side.

Without taking a position on the merits of Apple's legal claims, this seems like a pretty lame state of affairs. First, the original HyperMac design was better because you could charge your battery at the same time as you powered your Mac with it. This works with the air/auto version but not with the DIY jack version. Second, while it's not exactly microsurgery to splice the cables, it's still something you could mess up.

Moreover, it's not like Apple has some super-expensive power expansion solution that HyperMac is competing with and the patent is protecting them from. Rather, they're just making life harder for people who want to use Apple's products in situations which are just more extreme versions of the situations which motivated the device having a battery in the first place. I just don't see how this makes anyone's life better.

 

November 5, 2011

A while ago I promised to write about countermeasures to the Rizzo/Duong BEAST attack that didn't involve using TLS 1.1. For reasons that the rest of this post should make clear, I had to adjust that plan a bit.

To recap, mounting this attack requires a channel that:

  1. Is TLS 1.0 or older.
  2. Uses a block cipher (e.g., DES or AES).
  3. Is controllable by an attacker from a different origin.
  4. Allows the attacker to force the target secret to appear at a controllable location.
  5. Allows the attacker to observe ciphertext block n and control a subsequent block m with only a small number of uncontrolled bits in between n and m.

I know this last requirement is a bit complicated, so for now just think of it as "observe block n and control the plaintext of n+1", but with an asterisk. It won't really enter into our story much.

So far, there are two publicly known channels that meet these criteria:

  • WebSockets-76 and previous (only relevant on Safari).
  • Java URLConnection

Note that requirements 1 and 2 are about the TLS stack and requirements 3 and 4 are about the Web application. Requirement 5 is about both. This suggests that there are two main angles for countermeasures: address the TLS stack and address the Web application. Moreover, there are three potential actors here: users, operators of potential victim sites, and implementors.

The TLS Stack
TLS 1.1
First let's dispose of the TLS 1.1 angle. As has been apparent from the beginning, the ultimate fix is to upgrade everyone to TLS 1.1. Unfortunately, the upgrade cycle is really long, especially as many of the popular stacks don't have TLS 1.1 support at all. To make matters worse, due to a number of unfortunate implementation decisions which I'll hopefully get time to write about later, it's likely to be possible for an attacker to force two TLS 1.1 implementations to speak TLS 1.0, making them vulnerable. So, upgrading to TLS 1.1 is basically a non-starter.

RC4
The next obvious angle (per requirement 2) is to force the use of RC4, which isn't vulnerable to this attack. This isn't really a general solution for a number of reasons, including that there are also (more theoretical) security concerns about the use of RC4 and there are a number of government and financial applications where AES is required.

The only really credible place to restrict the use of non-RC4 ciphers is the server. The browsers aren't going to turn them off because some sites require it. Users aren't going to turn them off en masse for the usual user laziness reasons (and because some browsers make it difficult or impossible to do). Even if users do restrict their cipher suite choices, Java uses its own SSL/TLS stack, so configuring the cipher suites on the browser doesn't help here. The server, however, can choose RC4 as long as the client supports it and this provides good protection. [Note that TLS's anti-downgrade countermeasures do help here; the server can use RC4 with clients which support both AES and RC4 and the attacker can't force the server to believe that the client supports only AES.] However, as I've said, this isn't really a permanent solution.
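
As a sketch of what that server-side choice amounts to (illustrative suite names, not any particular server's configuration syntax):

```python
def choose_cipher(client_suites,
                  server_prefs=("RC4-SHA", "AES128-SHA", "AES256-SHA")):
    """Pick the first suite in the server's preference order that the
    client also offers. With RC4 listed first, any RC4-capable client
    gets RC4; TLS's handshake integrity (the Finished MAC over the
    handshake messages) keeps an attacker from silently editing the
    client's offer down to an AES-only list."""
    for suite in server_prefs:
        if suite in client_suites:
            return suite
    return None  # no suite in common; the handshake fails
```

Clients that genuinely only support AES still get AES, which is why this is a mitigation rather than a fix.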

Record Splitting
A number of techniques have been suggested to randomize the CBC state. The general form of this is to split each plaintext write (i.e. the unit the attacker is required to provide) into two records, with the first containing less than one cipher block worth of plaintext. So, for instance, each time the user does a write you could send an empty record (zero-length plaintext). Because TLS encrypts the MAC, this means that the first plaintext block is actually the MAC, which the attacker can't predict, thus randomizing the CBC state.
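
To see why this randomizes the state, recall how TLS 1.0 chains CBC across records: the IV for record k+1 is the last ciphertext block of record k, which the attacker has already observed on the wire. A toy sketch of the structure (the "block cipher" here is just a keyed XOR, purely for illustration, not real cryptography):

```python
BLOCK = 16

def toy_block_encrypt(key, block):
    # Stand-in for AES/DES; any keyed transformation shows the chaining.
    return bytes(k ^ b for k, b in zip(key, block))

def cbc_encrypt(key, iv, plaintext):
    assert len(plaintext) % BLOCK == 0
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        # Each plaintext block is XORed with the previous ciphertext block.
        mixed = bytes(p ^ c for p, c in zip(plaintext[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return b"".join(out)

key = b"K" * BLOCK
record1 = cbc_encrypt(key, b"\x00" * BLOCK, b"A" * BLOCK)
# TLS 1.0 chains the next record off record1's last ciphertext block,
# which the attacker has seen. If the attacker also chooses the next
# plaintext block, the cipher input is fully predictable.
predictable_iv = record1[-BLOCK:]
# Sending an empty record first means the unpredictable MAC is what
# gets encrypted next, re-randomizing the state before attacker data.
```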

In theory, this should work fine, since TLS doesn't guarantee any particular mapping between plaintext and records (just as TCP does not). However, it turns out that some SSL/TLS servers don't handle this kind of record splitting well (i.e., they assume some mapping) and so this technique causes problems in practice. Client implementors tried a bunch of techniques and ultimately settled on one where the first byte of the plaintext is sent separately in a single record and then the rest is sent in as many records as necessary (what has been called 1/n-1 splitting). [*]. This seems to be mostly compatible, though apparently some servers still choke.
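
The splitting itself is simple; here's a sketch (real stacks do this below the write API so applications are unaffected, and they also respect the TLS maximum record size):

```python
MAX_RECORD = 2 ** 14  # TLS caps each record's plaintext at 16384 bytes

def split_1_n_minus_1(data):
    """Split one application write into records: the first byte alone,
    then the rest in as many records as needed. The MAC computed over
    the 1-byte record is encrypted ahead of the remaining bytes, so the
    CBC chaining state the attacker would need to predict is randomized."""
    if not data:
        return []
    records = [data[:1]]
    records += [data[i:i + MAX_RECORD] for i in range(1, len(data), MAX_RECORD)]
    return records
```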

If you use Chrome or Firefox, you should either have this fix already or get it soon. However, as mentioned above, those browsers aren't vulnerable to attack via WebSockets and Java uses a different stack, so the fix doesn't help with Java. The good news is that Oracle's October 18 Java patch claims to fix the Rizzo/Duong attack and running it under ssldump reveals that they are doing a 1/n-1 split. The bad news is that the version of Java that Apple ships for Lion hasn't been updated, so if you have a Mac you're still vulnerable.

Web-based Threat Vectors
The other angle is to remove the Web-based threat vectors. How feasible this is depends on how many such vectors there are. As I noted above, the only two known ones are WebSockets ≤ 76 and Java. I've heard claims that Silverlight is vulnerable, but Microsoft says otherwise here. This could of course be wrong. It's also of course possible to introduce a new threat vector. For instance, if we were to add an XHR variant that allowed streaming uploads, this combined with CORS would create a new potential vector. So, in the future we need to either adopt one of the TLS-based countermeasures or be really careful about what Web features we add; probably both, to be honest.

We also need to subdivide these vectors into two categories: those which the server can protect itself against (WebSockets) and those which it cannot really (Java). To see the difference, consider that before the client is allowed to use WebSockets, the server needs to agree. So, if you have a standard non-WebSockets server, there's no WebSockets threat. By contrast Java allows URLConnections to the server without any agreement, so there's no way for the server to protect itself from a Java threat vector (other than trying to fingerprint the Java SSL/TLS stack and refuse service, which seems kind of impractical.) Obviously, then, the Java vector is more severe, especially since it's present even if the browser has been fixed.

To make matters worse, the previous version of Java is not only vulnerable to the Rizzo/Duong attack, but it also has what's called a "universal CSRF" issue. It's long been known that Java treats two hosts on the same IP address as on the same origin. It turns out that if you manage to be on the same IP address as the victim site (easy if you're a network attacker), then you can inject a Java applet which will do an HTTPS request to the victim site. That request (a) passes cookies to the site and (b) lets you read the response. These are the two elements necessary to mount a CSRF even in the face of the standard CSRF token defenses. (A related issue was fixed a few years ago, but only by suppressing client-side access to the cookie, which is an incomplete fix.) Obviously, this also serves as a vector for the Rizzo/Duong attack, though I don't know if it's the vector they used, since I don't have all the details of their procedure. Adam Barth and I discovered (or rediscovered, perhaps) the problem while trying to figure out how Rizzo and Duong's attack worked and notified Oracle, who fixed it in the most recent Java patch by suppressing sending the cookie in this type of request. (Obviously, I put off writing this post to avoid leaking the issue.) The fix in question would also close this particular vector for the Rizzo/Duong attack, even without the 1/n-1 split, though that doesn't mean that this is the one they were using or that there aren't others.

The bottom line, then, is that you should be upgrading Java, or, if you can't do that, disabling it until you can.

 

October 25, 2011

Threat Level writes about the release of a denial of service tool for SSL/TLS web servers.
The tool, released by a group called The Hackers Choice, exploits a known flaw in the Secure Socket Layer (SSL) protocol by overwhelming the system with secure connection requests, which quickly consume server resources. SSL is what's used by banks, online e-mail providers and others to secure communications between the website and the user.

The flaw exists in the process called SSL renegotiation, which is used in part to verify a user's browser to a remote server. Sites can still use HTTPS without that renegotiation process turned on, but the researchers say many sites have it on by default.

"We are hoping that the fishy security in SSL does not go unnoticed. The industry should step in to fix the problem so that citizens are safe and secure again. SSL is using an aging method of protecting private data which is complex, unnecessary and not fit for the 21st century," said the researchers in a blog post.

The attack still works on servers that don't have SSL renegotiation enabled, the researchers said, though it takes some modifications and some additional attack machines to bring down the system.

Background
In order to understand what's going on, you need to have some background about SSL/TLS. An SSL/TLS connection has two phases:

  • A handshake phase in which the keys are exchanged
  • A data transfer phase in which the actual data is passed back and forth.

For technical cryptographic reasons which aren't relevant here, the handshake phase is generally much more expensive than the data transfer phase (though not as expensive as people generally think). Moreover, the vast majority of the cost is to the server. Thus, if I'm an attacker and you're a server and I can initiate a lot of handshakes to you, I can force you to do a lot of computations. In large enough quantity, then, this is a computational denial of service attack. This is all very well known. What the attack would look like is that I would set up a client or set of clients which would repeatedly connect to your server, do enough of a handshake to force you to incur computational cost, and disconnect.

What's slightly less well-known is that SSL/TLS includes a feature called "renegotiation", in which either side can ask to do a new handshake on an existing connection. Unsurprisingly, the cost of a new handshake is roughly the same as an initial one. [Technical note: not if you're doing resumption but in this case the client wouldn't offer resumption, since he wants to maximize server cost.] So, what this attack would look like is that instead of opening multiple connections, I'd open a single connection and just renegotiate over and over. As I said, this is slightly less well-known, but it's certainly been known to be a possibility for some time, though most of the analyses I have seen suggested that it wasn't a major improvement from the attacker's perspective.

The Impact of This Attack
What you should be asking at this point is whether a computational DoS attack based on renegotiation is any better for the attacker than a computational DoS attack based on multiple connections. The way we measure this is by the ratio of the work the attacker has to do to the work that the server has to do. I've never seen any actual measurements here (and the THC guys don't present any), but some back of the envelope calculations suggest that the difference is small.

If I want to mount the old, multiple connection attack, I need to incur the following costs:

  1. Do the TCP handshake (3 packets)
  2. Send the SSL/TLS ClientHello (1 packet). This can be a canned message.
  3. Send the SSL/TLS ClientKeyExchange, ChangeCipherSpec, Finished messages (1 packet). These can also be canned.

Note that I don't need to parse any SSL/TLS messages from the server, and I don't need to do any cryptography. I'm just going to send the server junk anyway, so I can (for instance) send the same bogus ClientKeyExchange and Finished every time. The server can't find out that they are bogus until it's done the expensive part [Technical note: the RSA decryption is the expensive operation.] So, roughly speaking, this attack consists of sending a bunch of canned packets in order to force the server to do one RSA decryption.

Now let's look at the "new" single connection attack based on renegotiation. I need to incur the following costs.

  1. Do the TCP handshake (3 packets) [once per connection.]
  2. Send the SSL/TLS ClientHello (1 packet). This can be a canned message.
  3. Receive the server's messages and parse the server's ServerHello to get the ServerRandom (1-3 packets).
  4. Send the SSL/TLS ClientKeyExchange and ChangeCipherSpec messages (1 packet).
  5. Compute the SSL/TLS PRF to generate the traffic keys.
  6. Send a valid Finished message.
  7. Repeat steps 2-6 as necessary.

The advantage of this variant is that I get to amortize the TCP handshake (which is very cheap). The disadvantage is that I can't just use canned packets. I need to do actual cryptographic computations in order to force the server to do an RSA private key decryption. This is just a bunch of hashes, but it's still not free.

Briefly then, we've taken an attack which was previously limited by network bandwidth and slightly reduced the bandwidth (by a factor of about 2 in packets/sec and less than 10% in number of bytes) at the cost of significantly higher computational effort on the attacker's client machines. Depending on the exact characteristics of your attack machines, this might be better or worse, but it's not exactly a huge improvement in any case.
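
For concreteness, here's the back-of-the-envelope packet accounting, using my assumed per-handshake counts from the step lists above:

```python
# Old attack: per forced RSA decryption the attacker sends a fresh TCP
# handshake (3 packets), a ClientHello (1), and one canned packet with
# the bogus ClientKeyExchange/ChangeCipherSpec/Finished (1).
old_packets_per_decryption = 3 + 1 + 1
# New attack: TCP is amortized away; each renegotiation still sends a
# ClientHello (1), CKE/CCS (1), and a genuinely computed Finished (1).
new_packets_per_decryption = 1 + 1 + 1
ratio = old_packets_per_decryption / new_packets_per_decryption
print(ratio)  # roughly "a factor of about 2" fewer packets per decryption
```

The exact numbers depend on how the implementation coalesces messages into packets, but the shape of the tradeoff (fewer packets, more client CPU) doesn't.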

Another factor to consider is the control discipline on the server. Remember that the point of the exercise is to deny service to legitimate users. It's not uncommon for servers to service each SSL/TLS connection in a single thread. If you're attacking a server that does this, and you use a single connection with renegotiation, then you're putting a lot of load on that one thread; a sane thread scheduler will try to give each thread equivalent amounts of CPU, which means that you don't have a lot of impact on other legitimate users; your thread just falls way behind. By contrast, if you use a lot of connections then you get much better crowding out of legitimate users. On the other hand, if you have some anti-DoS device in front of your server, it might be designed to prevent a lot of connections from the same client, in which case the single connection approach would be more effective. Of course, if single-connection attacks become popular, it's trivial to enhance anti-DoS devices to stop them. [Technical note: SSL/TLS content types are in the clear so renegotiation is easily visible.]

Is this a flaw in SSL/TLS?
Zetter and the THC guys characterize this as a flaw in SSL/TLS. Without offering a general defense of SSL/TLS, this seems overstated at best. First, this isn't really a threat that endangers citizens' ability to be "safe and secure". Rather, it's a mechanism for bringing down the Web sites they visit. This isn't to say that there aren't problems in SSL/TLS that would lead to compromise of users' data, but this sort of DoS attack doesn't fall into that category.

Second, computational DoS attacks of this type have been known about for a very long time and in general security protocol designers have made a deliberate choice not to attempt to defend against them. Defenses against computational DoS typically fall into two categories:

  • Force users to demonstrate that they are reachable at their claimed IP address. This prevents "blind" attacks where the attacker can send forged packets and thus makes it easier to track down attackers.
  • Try to impose costs on users so that the ratio of attacker work to defender work is more favorable. (There are a variety of schemes of this type but the general term is "client puzzles").

Because SSL/TLS runs over TCP, it gets the first type of defense automatically. [Technical note: Datagram TLS runs over UDP and so Nagendra Modadugu and I explicitly added a reachability proof mechanism to protect against blind attack.] However, SSL/TLS, like most other Internet security protocols, doesn't do anything to push work onto the client. The general reasoning here is that DoS attackers generally use botnets (i.e., other people's compromised computers) to mount their attacks and therefore they have a very large amount of CPU available to them. This makes it very hard to create a puzzle which creates enough of a challenge to attackers to reduce the attack threat without severely impacting people with low computational resources such as those on mobile devices. Obviously, there is a tradeoff here, but my impression of the history of DoS attacks has been that this sort of CPU-based attack isn't that common and so this has been a generally reasonable design decision.
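
A client puzzle of the second category typically looks something like this hashcash-style sketch (nothing of the sort is actually in SSL/TLS):

```python
import hashlib
from itertools import count

def solve(challenge: bytes, bits: int) -> int:
    """Client grinds ~2^bits hashes to find a nonce whose SHA-256 over
    (challenge || nonce) starts with `bits` zero bits."""
    for nonce in count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - bits) == 0:
            return nonce

def verify(challenge: bytes, bits: int, nonce: int) -> bool:
    # The server-side check costs a single hash, which is what improves
    # the attacker-work to defender-work ratio.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0
```

The difficulty knob (`bits`) is exactly the problem described above: a setting that slows a botnet meaningfully also cripples a phone.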

More generally, defending against computational DoS attacks is a really hard problem; you need to be able to serve large numbers of people you don't really have a relationship with, but it's easy for attackers who control large botnets to pretend to be a lot of legitimate users. All the known defenses are about trying to make it easier to distinguish legitimate users from attackers before you've invested a lot of resources in them, but this turns out to be inherently difficult and we don't have any really good solutions.

UPDATE: Fixed a writing error. Thanks to Joe Hall for pointing this out.

 

October 18, 2011

Following up on their demonstration attack on Diebold voting machines (writeup, my comments), the Argonne Vulnerability Assessment Team has developed a set of Suggestions for Better Election Security. My review comments are below:

I've had a chance to go over this document and while there are some suggestions that are valuable, many seem naive, impractical, or actively harmful. More generally, I don't see that it derives from any systematic threat model or cost/benefit analysis about which threats to address; merely following the procedures here would--at great expense--foreclose some security threats while leaving open other threats that are arguably more serious both in terms of severity and ease of attack. Finally, many of the recommendations here seem quite inconsistent with the current state of election practice. That's not necessarily fatal, since that practice is in some cases flawed, but there doesn't seem to be any acknowledgement that these seemingly minor changes actually would require radically reworking election equipment and procedures.

If this document is to be useful rather than harmful, it needs to start with a description of the threat model--and in particular the assumed attacker capabilities--and then proceed to a systematic analysis of which threats it is economical to defend against, rather than just being a grab bag of isolated security recommendations apparently designed to defend against very different levels of threat.

Pre- And Post-Election Inspections
The authors recommend:

... at least 1% of the voting machines actually used in the election--randomly chosen--should be tested, then disassembled, inspected, and the hardware examined for tampering and alien electronics. The software/firmware should also be examined, including for malware. It is not sufficient to merely test the machines in a mock election, or to focus only on cyber security issues!

This document does not specify how the hardware must be "examined", but a thorough examination, sufficient to discover attack by a sophisticated attacker, is likely to be extremely time consuming and expensive. A voting machine, like most embedded computers, consists of a number of chips mounted on one or more printed circuit boards as well as peripherals (e.g., the touchscreen) connected with cabling. This document seems to expect that "alien electronics" will be a separate discrete component added to the device, but this need not be so. A moderately sophisticated attacker could modify or replace any of these components (for instance, by replacing the chips with lookalike chips). As most of these components are sealed in opaque plastic packaging, assessing whether they have been tampered with is no easy matter. For instance, in the case of a chip, one would need to either remove the chip packaging (destroying it in the process) or x-ray it and then compare to a reference example of the chip in order to verify that no substitution had occurred. These are specialized and highly sophisticated techniques that few people are qualified to carry out, and yet this document proposes that they be performed on multiple machines in every jurisdiction in the United States, of which there are on the order of 10,000.

Moreover, this level of hardware analysis is useless against a broad spectrum of informational threats. An attacker who can rewrite the device's firmware--trivial with physical access to the internals, but the California TTBR discovered a number of vectors which did not require such access--can program his malware to erase itself after the election is over, thus evading inspection. Further, to the extent that the microprocessors in the device contain firmware and/or microcode, it may not be possible to determine whether it has been tampered with, since that would require interfaces directly to the firmware which do not depend on the firmware itself; these do not always exist. Absent some well-defined threat model, it is unclear why this document ignores these threats in favor of less effective physical attacks.

Finally, doing any of this inspection requires extremely detailed knowledge of the expected internals of the voting machine (it is insufficient to simply do exact comparison from a single reference unit because there is generally some manufacturing variation due to inter-run engineering fixes and the like). This information would either need to be discovered via expensive reverse engineering or obtained by having the vendor release it, which vendors have historically been very reluctant to do, especially as releasing it to every county in the US would be much like publishing it.

Official and Pollworker Verification
This document recommends that voting officials and pollworkers be subject to a number of verification requirements. In particular:

  • Background checks, including interviews with co-workers
  • Citizenship verification
  • Positive physical identification of poll workers prior to handling sensitive materials
  • Test bribery

These recommendations are highly discordant with existing practice. In real jurisdictions, it is extremely difficult to find poll workers (hence the high number of retirees) and they are paid relatively nominal sums (~$10/hr). It's unclear if they would be required to undergo a background check, but I suspect that many would not be pleased by that. In my experience, poll workers feel they are performing a public service and are unlikely to be pleased to be treated as criminals. Of course, it's unclear if poll workers count for the purposes of background checks. The authors write:

Minimum: All election officials, technicians, contractors, or volunteers who prepare, maintain, repair, test, inspect, or transport voting machines, or compile "substantial" amounts of election results should have background checks, repeated every 3-5 years, that include a criminal background history, credit check, and (when practical) interviews with co-workers.

Volunteers certainly set machines up in the polling place. I'm not sure if this counts as "preparing". It wouldn't surprise me if volunteers transported machines. The bottom line here is that this requirement is problematic either way: if you think poll workers have to get background checks, it's really invasive. If you don't, you're ignoring a category of threat from people who have very high levels of machine access (assuming you think that background checks do anything useful, which seems rather dubious in this case.)

The requirement for positive physical identification seems extremely impractical. As noted above, typical polling places are operated by semi-volunteer poll workers. Given the ease of acquiring false identification, it seems highly unlikely that they will be able to validate the identity of either the poll workers under their supervision or of the (alleged) election officials to whom they are supposed to deliver election materials. Similarly, it's not clear to me that verifying US Citizenship does anything useful. Is there some evidence that non-citizens are particularly likely to want to tamper with elections or that it's especially difficult for foreign countries which want to tamper with elections to find US citizens to do it for them?

This document recommends attempting to bribe a subset of poll workers. I'd be interested to learn whether any systematic study of this has been done on the likely subject population. I.e., does this sort of intervention actually reduce the effective level of bribery?

Seal Practice
This document contains a number of detailed recommendations about seal practice (required level of training, surface preparation, inspection protocols). I don't think there's any doubt that seals are a weak security measure and much of the research showing that comes from the Argonne group. However, it's also not clear to me that the measures described here will improve the situation. Extensive human factors research in the Web context shows that users typically ignore even quite obvious indications of security failures, especially in contexts where they get in the way of completion of some task.

Is there research that shows that (for instance) 10 minutes of training has any material impact on the detection rate of fake seals, especially when that detection is performed in the field?

The authors also write:

Minimize the use of (pressure sensitive) adhesive label seals

I don't really understand how this recommendation is operationalizable: Existing voting equipment is designed with numerous points of entry which are not obviously securable in any way, and for which adhesive seals appear to be the most practical option. What is the recommendation for such equipment?

Excessive Expert Manpower Requirements
The authors write:

Minimum: Election officials will arrange for a local committee (pro bono if necessary) to serve as the Election Security Board. The Board should be made up primarily of security professionals, security experts, university professors, students, and registered voters not employees of the election process. The Board should meet regularly to analyze election security, observe elections, and make suggestions for improved election security and the storage and transport of voting machines and ballots. The Board needs considerable autonomy, being able to call press conferences or otherwise publicly discuss its findings and suggestions as appropriate. Employees of companies that sell or manufacture seals, other security products often used in elections, or voting machines are not eligible to serve on the Board.

The United States has something like 10,000 separate election jurisdictions. If each of these convenes a board of 3-5 people, then approximately 30,000-50,000 security experts will be required. Given that all existing voting system reviews have been short-term affairs and in many cases the experts were compensated, and yet have drawn from the entire country to gather ~30 experts, it's hard to see where we are going to gather 1000 times more people for a largely thankless long-term engagement.

Miscellaneous
The authors recommend that:

The voting machines for the above inspection (or trial bribery discussed below) should be randomly chosen based on pseudo-random numbers generated by computer, or by hardware means such as pulling numbers or names from a hat.

Verifiably generating random values is a significantly harder problem than this makes it sound. In particular, pulling numbers or names from a hat is trivial to game.
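
For comparison, here's a sketch of a selection procedure that anyone can check after the fact (illustrative only; a real protocol would also need the machine list and the entropy source to be committed to in the right order):

```python
import hashlib

def select_machines(machine_ids, seed: bytes, k: int):
    """Rank machines by SHA-256(seed || id) and take the first k. If the
    seed comes from public entropy fixed *after* the machine list is
    published (e.g., dice rolls announced at a public ceremony), anyone
    can recompute the selection, and no one could bias it in advance --
    unlike a hat, where the person drawing controls the outcome."""
    ranked = sorted(machine_ids,
                    key=lambda m: hashlib.sha256(seed + m.encode()).digest())
    return ranked[:k]
```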

Recommended: Each individual in the chain of custody must know the secret password of the day or the election before being allowed to take control of the assets.

Any secret that is distributed this widely is hardly likely to remain a secret for long.

Recommended: Before each election, discuss with poll workers, election judges, and election officials the importance of ballot secrecy, and the importance of watching for miniature wireless video cameras in the polling place, especially mounted to the ceiling or high up on walls to observe voters' choices. The polling place should be checked for surreptitious digital or video cameras at least once on election day.

Elections are typically conducted in spaces which are otherwise reserved for other purposes and therefore are not empty. In my experience with such spaces, it would be very difficult to practically inspect for a surreptitious camera placed in the ceiling and concealed with any level of skill. This is particularly difficult in spaces with drop ceilings, ventilation ducts, etc.