<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Educated Guesswork</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/" />
    <link rel="self" type="application/atom+xml" href="http://www.educatedguesswork.org/atom.xml" />
    <id>tag:www.educatedguesswork.org,2008-09-13://1</id>
    <updated>2012-08-07T14:58:06Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 5.2</generator>

<entry>
    <title>Initial notes on Microsoft&apos;s CU-RTC-Web Proposal</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/08/initial_notes_on_microsofts_cu.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1634</id>

    <published>2012-08-07T14:52:41Z</published>
    <updated>2012-08-07T14:58:06Z</updated>

    <summary>EXECUTIVE SUMMARY Yesterday, Microsoft published their CU-RTC-Web WebRTC API proposal as an alternative to the existing W3C WebRTC API being implemented in Chrome and Firefox. Microsoft&apos;s proposal is a &quot;low-level API&quot; proposal which basically exposes a bunch of media- and...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Networking" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[<I>EXECUTIVE SUMMARY</I>
<BR>
Yesterday, Microsoft published their CU-RTC-Web WebRTC API proposal as an
alternative to the existing W3C WebRTC API being implemented in Chrome
and Firefox. Microsoft's proposal is a "low-level API" proposal which
basically exposes a bunch of media- and transport-level primitives to
the JavaScript Web application, which is expected to stitch them
together into a complete calling system. By contrast to the current
"mid-level" API, the Microsoft API moves a lot of complexity from the
browser to the JavaScript but the authors argue that this makes it more
powerful and flexible. I don't find these arguments that convincing,
however: a lot of them seem fairly abstract and rhetorical and when
we get down to concrete use cases, the examples Microsoft gives seem
like things that could easily be done within the existing framework.
So, while it's clear that the Microsoft proposal is a lot more work
for the application developer, it's a lot less clear that it's
sufficiently more powerful to justify that additional complexity.
<P>
Microsoft's arguments for the superiority of this API fall into 
three major categories:
<P>
<UL>
<LI>JSEP doesn't match with "key Web tenets"; i.e., it doesn't match
  the Web/HTML5 style.

<LI>It allows the development of applications that would otherwise
  be difficult to develop with the existing W3C API.

<LI>It will be easier to make it interoperate with existing VoIP
  endpoints.
</UL>
<P>
Like any all-new design, this API has the significant advantage (which
the authors don't mention) of architectural cleanliness. The existing
API is a compromise between a number of different architectural
notions and like any hybrid proposals has points of ugliness where
those proposals come into contact with each other (especially in the
area of SDP.) However, when we actually look at functionality rather
than elegance, the advantages of an all-new design---not only one
which is largely not based on preexisting technologies but one which
involves discarding most of the existing work on WebRTC itself---start
to look fairly thin.
<P>
Looking at the three claims listed above: the first seems more
rhetorical than factual. It's certainly true that in the early
years of the Web designers strove to keep state out of the Web
browser, but that hasn't been the case with rich Web applications
for quite some time. To the contrary, many modern HTML5 technologies
(localstore, WebSockets, HSTS, WebGL) are about pushing state onto the
browser from the server.
<P>
The interoperability argument is similarly weakly supported.
Given that JSEP is based on existing VoIP technologies, it
seems likely that it is easier to make it interoperate with
existing endpoints since it's not first necessary to implement
those technologies (principally SDP) in JavaScript before
you can even try to interoperate. The idea here seems to be that
it will be easier to accomodate existing noncompliant endpoints
if you can adapt your Web application on the fly, but given the
significant entry barrier to interoperating at all, this
seems like an argument that needs rather more support than 
MS has currently offered.
<P>
Finally, with regard to the question of the flexibility/JavaScript
complexity tradeoff, it's somewhat distressing that the specific
applications that Microsoft cites (baby monitoring, security cameras,
etc.) are so pedestrian and so easily handled by JSEP. This isn't,
of course, to say that there are no as-yet-unenvisioned applications
which JSEP would handle badly, but it rather undercuts this
argument if the only examples you cite in support of a new design are
those which are easily handled by the old one.
<P>
None of this is to say that CU-RTC-Web wouldn't be better in some
respects than JSEP. Obviously, any design has tradeoffs and as I
said above, it's always appealing to throw all that annoying legacy
stuff away and start fresh. However, that also comes with a lot of
costs, and before we consider that, we really need to have a far
better picture of what benefits other than elegance starting
over would bring to the table.
<P>
<I>BACKGROUND</I>
<BR>
More or less everyone agrees about the basic objectives of the WebRTC
effort: to bring real-time communications (i.e., audio, video, and
direct data) to browsers. Specifically, the idea is that Web
applications should be able to use these capabilities directly. This
sort of functionality was of course already available either via
generic plugins such as Flash or via specific plugins such as Google
Talk, but the idea here was to have a standardized API that was built
into browsers.
<P>
In spite of this agreement about objectives, from the beginning there
was debate about the style of API that was appropriate, and in particular
how much of the complexity should be in the browser and how much in
the JavaScript. The initial proposals broke down into two main flavors:
<P>
<UL>
<LI>High-level APIs &mdash; essentially a softphone in the browser. The Web
  application would request the creation of a call (perhaps with some
  settings as to what kinds of media it wanted) and then each browser
  would emit standardized signaling messages which the Web application
  would arrange to transit to the other browser. The original WHATWG
  HTML5/PeerConnection spec was of this type.

<LI>Low-level APIs &mdash; an API which exposed a bunch of primitive
  media and transport capabilities to the JavaScript. A browser that
  implemented this sort of API couldn't really do much by itself.
  Instead, you would need to write something like a softphone in
  JavaScript, including implementing the media negotiation, all the
  signaling state machinery, etc. Matthew Kaufman from Microsoft
  was one of the primary proponents of this design.
</UL>
<P>
After a lot of debate, the WG ultimately rejected both of these and
settled on a protocol called JavaScript Session Establishment Protocol
(JSEP), which is probably best described as a mid-level API. That
design, embodied in the current specifications
[
<A HREF="http://tools.ietf.org/html/draft-ietf-rtcweb-jsep-01">http://tools.ietf.org/html/draft-ietf-rtcweb-jsep-01</A>
<A HREF="http://dev.w3.org/2011/webrtc/editor/webrtc.html">http://dev.w3.org/2011/webrtc/editor/webrtc.html</A>],
keeps the transport
establishment and media negotiation in the browser but moves a fair
amount of the session establishment state machine into the JavaScript.
While it doesn't standardize signaling, it also has a natural mapping
to a simple signaling protocol as well as to SIP and Jingle, the two
dominant standardized calling protocols. The idea is supposed to be
that it's simple to write a basic application (indeed, a large number
of such simple demonstration apps have been written) but that
it's also possible to exercise advanced features by manipulating
the various data structures emitted by the browser. This is obviously
something of a compromise between the first two classes of proposals.
<P>
The decision to follow this trajectory was made somewhere around six
months ago and at this point Google has a fairly mature JSEP
implementation available in Chrome Canary while Mozilla has a less
mature implementation which you could compile yourself but hasn't been
released in any public build.
<P>
Yesterday, Microsoft made a new proposal, called CU-RTC-Web. 
See
<A HREF="http://blogs.skype.com/en/2012/08/customizable_ubiquitous_real_t.html">the blog post</A>
and <A HREF="http://html5labs.com/cu-rtc-web/cu-rtc-web.htm">the specification</A>.
<P>
Below is an initial, high-level analysis of this proposal.
<P>
<B>Disclaimer</B>: I have been heavily involved with both the IETF and
W3C working groups in this area and have contributed significant
chunks of code to both the Chrome and Firefox implementations. I am 
also currently consulting for Mozilla on their implementation. However,
the comments here are my own and don't necessarily represent those of any other
organization.
<P>
<I>WHAT IS MICROSOFT PROPOSING?</I>
<BR>
What Microsoft is proposing is effectively a straight low level API.
<P>
There are a lot of different API points, and I don't plan to discuss
the API in much detail, but it's helpful to talk about the API
some to get a flavor of what's required to use it.
<P>
<UL>
<LI> RealTimeMediaStream -- each RealTimeMediaStream represents a single
  flow of media (i.e., audio or video).
<LI> RealTimeMediaDescription -- a set of parameters for the
  RealTimeMediaStream.
<LI> RealTimeTransport -- a transport channel which a RealTimeMediaStream
  can run over.
<LI> RealTimePort -- a transport endpoint which can be paired with a
  RealTimePort on the other side to form a RealTimeTransport.
</UL>
<P>
In order to set up an audio, video, or audio-video session, then, the JS
has to do something like the following:
<P>
<OL>
<LI> Acquire local media streams on each browser via the getUserMedia()
   API, thus getting some set of MediaStreamTracks.

<LI> Create RealTimePorts on each browser for all the local network
   addresses as well as for whatever media relays are available/
   required.

<LI> Communicate the coordinates for the RealTimePorts from each
   browser to the other.

<LI> On each browser, run ICE connectivity checks for all combinations
   of remote and local RealTimePorts.

<LI> Select a subset of the working remote/local RealTimePort pairs
   and establish RealTimeTransports based on those pairs.
   (This might be one or might be more than one depending on
   the number of media flows, level of multiplexing, and the
   level of redundancy required).

<LI> Determine a common set of media capabilities and codecs between
   each browser, select a specific set of media parameters, and
   create matching RealTimeMediaDescriptions on each browser
   based on those parameters.

<LI> Create RealTimeMediaStreams by combining RealTimeTransports,
   RealTimeMediaDescriptions, and MediaStreamTracks.

<LI> Attach the remote RealTimeMediaStreams to some local display
   method (such as an audio or video tag).
</OL>
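<P>
To get a flavor of what steps 4-5 mean for the JavaScript, here is a sketch of the pairing-and-selection logic the application would own under CU-RTC-Web. Everything here is invented for illustration (the shape of the port objects, the helper names, and the synchronous checkPair predicate); real ICE connectivity checks are asynchronous and considerably more involved.
<P>

```javascript
// Hypothetical sketch of CU-RTC-Web steps 4-5: form every local/remote
// RealTimePort pairing, check each one, and keep the pairs that work.
// The port objects and checkPair() are stand-ins, not real API surface.
function formCandidatePairs(localPorts, remotePorts) {
  const pairs = [];
  for (const local of localPorts) {
    for (const remote of remotePorts) {
      pairs.push({ local, remote });
    }
  }
  return pairs;
}

function selectWorkingPairs(pairs, checkPair) {
  // A real ICE connectivity check is asynchronous; here it is just a
  // caller-supplied predicate so the selection step can be shown.
  return pairs.filter(checkPair);
}

// Example: two local ports and two remote ports yield four candidate pairs.
const pairs = formCandidatePairs(
  [{ ip: "192.0.2.1" }, { ip: "198.51.100.7" }],
  [{ ip: "203.0.113.5" }, { ip: "203.0.113.9" }]
);
const working = selectWorkingPairs(pairs, (p) => p.local.ip === "192.0.2.1");
```

<P>
And this covers only two of the eight steps; the application still owes media negotiation, transport construction, and the rest.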
<P>  
For comparison, in JSEP you would do something like: 
<P>
<OL>
<LI> Acquire local media streams on each browser via the getUserMedia()
   API, thus getting some set of MediaStreamTracks. 

<LI> Create a PeerConnection() and call AddStream() for each of the
   local streams.

<LI> Create an offer on one browser, send it to the other side, 
   create an answer on the other side and send it back to the
   offering browser. In the simplest case, this just involves
   making some API calls with no arguments and passing the
   results to the other side.

<LI> The PeerConnection fires callbacks announcing remote media 
   streams which you attach to some local display method.
</OL>
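<P>
The offer/answer round trip in step 3 can be simulated end-to-end with a toy stand-in for the browser API. FakePeer below is invented for illustration (the real API is asynchronous and callback-based); the point is just the shape of the exchange: a couple of nearly argument-free calls plus ferrying the results across.
<P>

```javascript
// Toy simulation of the JSEP offer/answer exchange in step 3 above.
// FakePeer is a synchronous stand-in for a PeerConnection-like object,
// invented here so the round trip can be run end-to-end.
class FakePeer {
  constructor(name) {
    this.name = name;
    this.local = null;   // our current session description
    this.remote = null;  // the other side's session description
  }
  createOffer() {
    return { type: "offer", sdp: "sdp-from-" + this.name };
  }
  createAnswer() {
    if (!this.remote) throw new Error("need a remote offer first");
    return { type: "answer", sdp: "sdp-from-" + this.name };
  }
  setLocalDescription(desc) { this.local = desc; }
  setRemoteDescription(desc) { this.remote = desc; }
}

const alice = new FakePeer("alice");
const bob = new FakePeer("bob");

const offer = alice.createOffer();   // offerer produces an offer...
alice.setLocalDescription(offer);
bob.setRemoteDescription(offer);     // ...the app ferries it across...
const answer = bob.createAnswer();   // ...the answerer responds...
bob.setLocalDescription(answer);
alice.setRemoteDescription(answer);  // ...and both sides converge.
```

<P>
How the descriptions travel between the two browsers is deliberately left to the application in JSEP; here they simply cross via local variables.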
<P>
As should be clear, the CU-RTC-Web proposal requires significantly
more complex JavaScript, and in particular requires that JavaScript to
be a lot smarter about what it's doing. In a JSEP-style API, the Web
programmer can be pretty ignorant about things like codecs and
transport protocols, unless he wants to do something fancy, but with
CU-RTC-Web, he needs to understand a lot of stuff to make things work
at all. In some ways, the JSEP style is a much better fit for the
traditional Web approach of having simple default behaviors which fit
a lot of cases but which can then be customized, albeit in ways that
are sometimes a bit clunky.
<P>
Note that it's not like this complexity doesn't exist in JSEP,
it's just been pushed into the browser so that the user doesn't have
to see it. As discussed below, Microsoft's argument is that this
simplicity in the JavaScript comes at a price in terms of flexibility
and robustness, and that libraries will be developed (think jQuery)
to give the average Web programmer a simple experience, so that
they won't have to accept a lot of complexity themselves. However,
since those libraries don't exist, it seems kind of unclear how
well that's going to work.
<P>
<I>ARGUMENTS FOR MICROSOFT'S PROPOSAL</I>
<BR>
Microsoft's proposal and the associated blog post makes a number of
major arguments for why it is a superior choice (the proposal just came
out today so there haven't really been any public arguments for why
it's worse). Combining the blog posts, you would get something like
this:
<P>
<UL>
<LI> That the current specification violates "fit with key web tenets",
  specifically that it's not stateless and that you can only make
  changes when in specific states. Also, that it depends on
  the SDP offer/answer model.

<LI> That it doesn't allow a "customizable response to changing network
  quality".

<LI> That it doesn't support "real-world interoperability" with
  existing equipment.

<LI> That it's too tied to specific media formats and codecs.

<LI> That JSEP requires a Web application to do some frankly inconvenient
  stuff if it wants to do something that the API doesn't have explicit
  support for.

<LI> That it's inflexible and/or brittle with respect to new applications
  and in particular that it's difficult to implement some specific
  "innovative" applications with JSEP.
</UL>
Below we examine each of these arguments in turn.
<P>
<I>FITTING WITH "WEB TENETS"</I>
<BR>
MS writes:
<BLOCKQUOTE>  
   Honoring key Web tenets-The Web favors stateless interactions which
   do not saddle either party of a data exchange with the
   responsibility to remember what the other did or expects. Doing
   otherwise is a recipe for extreme brittleness in implementations;
   it also raises considerably the development cost which reduces the
   reach of the standard itself.
</BLOCKQUOTE>
<P>
This sounds rhetorically good, but I'm not sure how accurate it is.
First, the idea that the Web is "stateless" feels fairly anachronistic
in an era where more and more state is migrating from the server. To
pick two examples, WebSockets involves forming a fairly long-term stateful
two-way channel between the browser and the server, and localstore/localdb
allow the server to persist data semi-permanently on the browser.
Indeed, CU-RTC-Web requires forming a nontrivial amount of state on
the browser in the form of the RealTimePorts, which represent actual
resource reservations that cannot be reliably reconstructed if
(for instance) the page reloads. I think the idea here is supposed
to be that this is "soft state", in that it can be kept on the
server and just reimposed on the browser at refresh time, but as
the RealTimePorts example shows, it's not clear that this is the case.
Similar comments apply to the state of the audio and video devices
which are inherently controlled by the browser.
<P>
Moreover, it's never been true that neither party in the data exchange
was "saddled" with remembering what the other did; rather, it used
to be the case that most state sat on the server, and indeed, that's
where the CU-RTC-Web proposal keeps it. This is the first time we have
really built a Web-based peer-to-peer app. Pretty much all previous
applications have been client-server applications, so it's hard to
know what idioms are appropriate in a peer-to-peer case.
<P>
I'm a little puzzled by the argument about "development cost"; there
are two kinds of development cost here: that to browser implementors
and that to Web application programmers. The MS proposal puts 
more of that cost on Web programmers whereas JSEP puts more of
the cost on browser implementors. One would ordinarily think that
as long as the standard wasn't too difficult for browser implementors
to develop at all, then pushing complexity away from Web programmers
would tend to increase the reach of the standard. One could of course
argue that this standard is too complicated for browser implementors
to implement at all, but the existing state of Google and Mozilla's
implementations would seem to belie that claim.
<P>
Finally, given that the original WHATWG draft had even more state in
the browser (as noted above, it was basically a high-level API), it's
a little odd to hear that Ian Hickson is out of touch with the "key
Web tenets".
<P>
<I>CUSTOMIZABLE RESPONSE TO CHANGING NETWORK QUALITY</I>
<BR>
The CU-RTC-Web proposal writes:
<BLOCKQUOTE>
  Real time media applications have to run on networks with a wide
  range of capabilities varying in terms of bandwidth, latency, and
  noise. Likewise these characteristics can change while an
  application is running. Developers should be able to control how the
  user experience adapts to fluctuations in communication quality. For
  example, when communication quality degrades, the developer may
  prefer to favor the video channel, favor the audio channel, or
  suspend the app until acceptable quality is restored. An effective
  protocol and API will have to arm developers with the tools to
  tailor such answers to the exact needs of the moment, while
  minimizing the complexity of the resulting API surface.
</BLOCKQUOTE>
<P>
It's certainly true that it's desirable to be able to respond to
changing network conditions, but it's a lot less clear that the
CU-RTC-Web API actually offers a useful response to such changes.  In
general, the browser is going to know a lot more about what the
bandwidth/quality tradeoff of a given codec is going to be than most
JavaScript applications will, and so it seems at least plausible that
you're going to do better with a small number of policies (audio is
more important than video, video is more important than audio, etc.)
than you would by having the JS try to make fine-grained decisions
about what it wants to do. It's worth noting that the actual
"customizable" policies that are proposed here seem pretty simple.
The idea seems to be not that you would impose policy on the browser
but rather that since you need to implement all the negotiation
logic anyway, you get to implement whatever policy you want.
<P>
Moreover, there's a real concern that this sort of adaptation will
have to happen in two places: as MS points out, this kind of network
variability is really common and so applications have to handle it.
Unless you want to force every JS calling application in the universe
to include adaptation logic, the browser will need some (potentially
configurable and/or disableable) logic. It's worth asking whether
whatever logic you would write in JS is really going to be enough
better to justify this design.
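<P>
To make the "small number of policies" point concrete, here is a toy version of the kind of coarse adaptation policy a browser could apply internally. The policy names, thresholds, and the fixed audio floor are all invented for illustration; neither API actually specifies anything like this.
<P>

```javascript
// Toy bandwidth allocator: given an estimate in kbps and a coarse,
// named policy, split the budget between audio and video. The numbers
// and policy names are made up; the point is that a handful of named
// policies covers most of what an application plausibly wants.
function allocateBitrate(availableKbps, policy) {
  const AUDIO_FLOOR = 32; // keep audio intelligible before funding video
  switch (policy) {
    case "favor-audio": {
      const audio = Math.min(availableKbps, 64);
      return { audio: audio, video: Math.max(availableKbps - audio, 0) };
    }
    case "favor-video": {
      const audio = Math.min(availableKbps, AUDIO_FLOOR);
      return { audio: audio, video: Math.max(availableKbps - audio, 0) };
    }
    case "suspend-below-floor":
      // Suspend entirely until acceptable quality is restored.
      return availableKbps < AUDIO_FLOOR
        ? { audio: 0, video: 0 }
        : allocateBitrate(availableKbps, "favor-audio");
    default:
      throw new Error("unknown policy: " + policy);
  }
}
```

<P>
The question is whether per-application JavaScript would really decide much better than three or four policies like these, given how much more the browser knows about each codec's behavior.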
<P>
<I>REAL-WORLD INTEROPERABILITY</I>
<BR>
In their blog post today, MS writes about JSEP:
<BLOCKQUOTE>
  it shows no signs of offering real world interoperability with
  existing VoIP phones, and mobile phones, from behind firewalls and
  across routers and instead focuses on video communication between
  web browsers under ideal conditions. It does not allow an
  application to control how media is transmitted on the network.
</BLOCKQUOTE>
<P>
I wish this argument had been elaborated more, since it seems like
CU-RTC-Web is less focused on interoperability, not more. In
particular, since JSEP is based on existing technologies such as SDP
and ICE, it's relatively easy to build Web applications which gateway
JSEP to SIP or Jingle signaling (indeed, relatively simple prototypes
of these already exist). By contrast, gatewaying CU-RTC-Web signaling
to either of these protocols would require developing an entire
SDP stack, which is precisely the piece that the MS guys are implicitly
arguing is expensive.
<P>
Based on Matthew Kaufman's mailing list postings, his concern seems to
be that there are existing endpoints which don't implement some of the
specifications required by WebRTC (principally ICE, which is used to
set up the network transport channels) correctly, and that it will be
easier to interoperate with them if your ICE implementation is written
in JavaScript and downloaded by the application rather than in C++ and
baked into the browser. This isn't a crazy theory, but I think there are
serious open questions about whether it is correct. The basic problem
is that it's actually quite hard to write a good ICE stack (though
easy to write a bad one). The browser vendors have the resources to
do a good job here, but it's less clear that random JS toolkits that
people download will actually do that good a job (especially if they
are simultaneously trying to compensate for broken legacy equipment).
The result of having everyone write their own ICE stack might be good
but it might also lead to a landscape where cross-Web application interop
is basically impossible (or where there are islands of noninteroperable
de facto standards based on popular toolkits or even popular toolkit
versions).
<P>
A lot of people's instincts here seem to be based on an environment
where updating the software on people's machines was hard but
updating one's Web site was easy. But about half of the browser
population (Chrome and Firefox) does rapid auto-updates, so those
browsers actually are generally fairly modern. By contrast, Web
applications often use downrev versions of their JS libraries (I wish
I had survey data here, but it's easy to see just by opening up a JS
debugger on your favorite sites). It's not at all clear that the
"JS is easy to upgrade, native is hard" dynamic holds up any more.
<P>
<I>TOO TIED TO SPECIFIC MEDIA FORMATS AND CODECS</I>
<BR>
The proposal says:
<BLOCKQUOTE>
  A successful standard cannot be tied to individual codecs, data
  formats or scenarios. They may soon be supplanted by newer versions,
  which would make such a tightly coupled standard obsolete just as
  quickly. The right approach is instead to support multiple media
  formats and to bring the bulk of the logic to the application layer,
  enabling developers to innovate.
</BLOCKQUOTE>
<P>
I can't make much sense of this at all. JSEP, like the standards that
it is based on, is agnostic about the media formats and codecs that
are used. There's certainly nothing in JSEP that requires you to use
VP8 for your video codec, Opus for your audio codec, or anything
else. Rather, two conformant JSEP implementations will converge on a
common subset of interoperable formats. This should happen
automatically without Web application intervention.
<P>
Arguably, in fact, CU-RTC-Web is *more* tied to a given codec because
the codec negotiation logic is implemented either on the server or in
the JavaScript. If a browser adds support for a new codec, the Web
application needs to detect that and somehow know how to prioritize it
against existing known codecs. By contrast, when the browser
manufacturer adds a new codec, he knows how it performs compared to
existing codecs and can adjust his negotiation algorithms accordingly.
Moreover, as discussed below, JSEP provides (somewhat clumsy)
mechanisms for the user to override the browser's default choices.
These mechanisms could probably be made better within the JSEP
architecture.
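<P>
The convergence described above is, at heart, an ordered intersection. A minimal sketch (the codec names are illustrative, and real negotiation also matches clock rates, payload types, and format parameters):
<P>

```javascript
// Minimal sketch of codec convergence in an offer/answer exchange: keep
// the offerer's codecs, in the offerer's preference order, that the
// answerer also supports. Real negotiation is considerably richer.
function negotiateCodecs(offered, supported) {
  const ours = new Set(supported);
  return offered.filter((codec) => ours.has(codec));
}

const offered = ["opus", "G722", "PCMU"]; // offerer's preference order
const supported = ["PCMU", "opus"];       // answerer's capabilities
const common = negotiateCodecs(offered, supported);
```

<P>
Under JSEP the browser runs logic like this for you; under CU-RTC-Web the application (or a library it downloads) has to carry it, along with sensible priorities for codecs it has never heard of.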
<P>
Based on Matthew Kaufman's interview with Janko Rogers
[<A HREF="http://gigaom.com/2012/08/06/microsoft-webrtc-w3c/">http://gigaom.com/2012/08/06/microsoft-webrtc-w3c/</A>],
it seems like
this may actually be about the proposal to have a mandatory to
implement video codec (the leading candidates seem to be H.264 or
VP8). Obviously, there have been a lot of arguments about whether
such a mandatory codec is required (the standard argument in favor
of it is that then you know that any two implementations have
at least one codec in common), but this isn't really a matter
of "tightly coupling" the codec to the standard. To the contrary,
if we mandated VP8 today and then next week decided to mandate
H.264 it would be a one-line change in the specification.
In any case, this doesn't seem like a structural argument about
JSEP versus CU-RTC-Web. Indeed, if IETF and W3C decided to ditch
JSEP and go with CU-RTC-Web, it seems likely that this wouldn't
affect the question of mandatory codecs at all.
<P>
<I>THE INCONVENIENCE OF SDP EDITING</I>
<BR>
Probably the strongest point that the MS authors make is that if the
API doesn't explicitly support doing something, the situation is kind
of gross:
<BLOCKQUOTE>
  In particular, the negotiation model of the API relies on the SDP
  offer/answer model, which forces applications to parse and generate
  SDP in order to effect a change in browser behavior. An application
  is forced to only perform certain changes when the browser is in
  specific states, which further constrains options and increases
  complexity. Furthermore, the set of permitted transformations to SDP
  are constrained in non-obvious and undiscoverable ways, forcing
  applications to resort to trial-and-error and/or browser-specific
  code. All of this added complexity is an unnecessary burden on
  applications with little or no benefit in return.
</BLOCKQUOTE>
<P>
What this is about is that in JSEP you call CreateOffer() on a 
PeerConnection in order to get an SDP offer. This doesn't actually
change the PeerConnection state to accommodate the new offer; instead,
you call SetLocalDescription() to install the offer. This gives
the Web application the opportunity to apply its own preferences
by editing the offer. For instance, it might delete a line containing
a codec that it didn't want to use. Obviously, this requires a lot 
of knowledge of SDP in the application, which is irritating to say
the least, for the reasons in the quote above. 
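<P>
As a concrete (and simplified) sketch of what such an edit looks like: removing a codec from an offer means dropping its payload type from the m= line and deleting the matching a=rtpmap line. The SDP snippet is abbreviated and the helper is invented for illustration; a robust implementation would also have to scrub codec-linked lines like a=fmtp and handle video m= sections.
<P>

```javascript
// Simplified sketch of "editing the offer": remove one audio codec from
// an SDP blob by dropping its payload type from the m=audio line and
// deleting its a=rtpmap line. Real SDP has more codec-linked lines
// (a=fmtp, RTX/FEC associations) that this toy ignores.
function removeCodec(sdp, codecName) {
  const lines = sdp.split("\r\n");
  // Payload types mapped to the codec, e.g. "a=rtpmap:103 ISAC/16000" -> "103".
  const pts = lines
    .filter((l) => /^a=rtpmap:\d+ /.test(l) &&
                   l.split(" ")[1].split("/")[0] === codecName)
    .map((l) => l.split(":")[1].split(" ")[0]);
  return lines
    .map((l) => l.startsWith("m=audio")
      ? l.split(" ").filter((tok) => !pts.includes(tok)).join(" ")
      : l)
    .filter((l) => !pts.some((pt) => l.startsWith("a=rtpmap:" + pt + " ")))
    .join("\r\n");
}

const offer = [
  "m=audio 49170 RTP/AVP 111 103 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:103 ISAC/16000",
  "a=rtpmap:0 PCMU/8000",
].join("\r\n");
const trimmed = removeCodec(offer, "ISAC");
```

<P>
Even this toy has to know what an m= line and an rtpmap are, which is exactly the kind of SDP knowledge the quote above is complaining about.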
<P>
The major mitigating factor is that the W3C/IETF WG members intend to
allow most common manipulations to be made through explicit settings
parameters, so that only really advanced applications need to know
anything about SDP at all. Obviously opinions vary about how good a
job they have done, and of course it's possible to write libraries
that would make this sort of manipulation easier.  It's worth noting
that there has been some discussion of extending the W3C APIs to have
an explicit API for manipulating SDP objects rather than just editing
the string versions (perhaps by borrowing some of the primitives in
CU-RTC-Web). Such a change would make some things easier while not
really representing a fundamental change to the JSEP model. However,
it's not clear if there are enough SDP-editing tasks to make this
project worthwhile.
<P>
With that said, in order to have CU-RTC-Web interoperate with
existing SIP endpoints at all you would need to know far more about
SDP than would be required to do most anticipated transformations in a
JSEP environment, so it's not like CU-RTC-Web frees you from SDP if
you care about interoperability with existing equipment.
<P>
<I>SUPPORT FOR NEW/INNOVATIVE APPLICATIONS</I>
<BR>
Finally, the MSFT authors argue that CU-RTC-Web is more flexible
and/or less brittle than JSEP:
<BLOCKQUOTE>
  On the other hand, implementing innovative, real-world applications
  like security consoles, audio streaming services or baby monitoring
  through this API would be unwieldy, assuming it could be made to
  work at all. A Web RTC standard must equip developers with the
  ability to implement all scenarios, even those we haven't thought
  of.
</BLOCKQUOTE>
<P>
Obviously the last sentence is true, but the first sentence provides
scant support for the claim that CU-RTC-Web fulfills this requirement
better than JSEP. The particular applications cited here, namely audio
streaming, security consoles, and baby monitoring, seem not only
doable with JSEP, but straightforward. In particular, security
consoles and baby monitoring just look like one way audio and/or video
calls from some camera somewhere. This seems like a trivial subset of
the most basic JSEP functionality. Audio streaming is, if anything,
even easier. Audio streaming from servers already exists without any
WebRTC functionality at all, in the form of the audio tag, and audio
streaming from client to server can be achieved with the combination
of getUserMedia and WebSockets. Even if you decided that you wanted to
use UDP rather than WebSockets, audio streaming is just a one-way
audio call, so it's hard to see that this is a problem.
<P>
In <A HREF="http://www.ietf.org/mail-archive/web/rtcweb/current/msg05024.html">e-mail</A>
to the W3C WebRTC mailing list, Matthew Kaufman mentions the
use case of handling page reload:
<P>
<BLOCKQUOTE>
  An example would be recovery from call setup in the face of a
  browser page reload... a case where the state of the browser must be
  reinitialized, leading to edge cases where it becomes impossible with
  JSEP for a developer to write Javascript that behaves properly in all
  cases (because without an offer one cannot generate an answer, and
  once an offer has been generated one must not generate another offer
  until the first offer has been answered, but in either case there is
  no longer sufficient information as to how to proceed).
</BLOCKQUOTE>
<P>
This use case, often called "rehydration", has been studied a fair bit
and it's not entirely clear that there is a convenient solution with
JSEP. However, the problem isn't the offer/answer state, which is actually
easily handled, but rather the ICE and cryptographic state, which 
are just as troublesome with CU-RTC-Web as they are with JSEP
[for a variety of technical reasons, you can't just reuse the
previous settings here.] So, while rehydration is an issue, it's
not clear that CU-RTC-Web makes matters any easier.
<P>
This argument, which should be the strongest of MS's arguments, feels
rather like the weakest. Given how much effort has already gone into
JSEP, both in terms of standards and implementation, if we're going to
replace it with something else that something else should do something
that JSEP can't, not just have a more attractive API. If MS can't
come up with any use cases that JSEP can't accomplish, and if in fact
the use cases they list are arguably more convenient with JSEP than
with CU-RTC-Web, then that seems like a fairly strong argument that we
should stick with JSEP, not one that we should replace it.
<P>
What I'd like to see Microsoft do here is describe some applications
that are really a lot easier with CU-RTC-Web than they are with
JSEP. Depending on the details, this might be a more or less convincing
argument, but without some examples, it's pretty hard to see
what considerations other than aesthetic would drive us towards
CU-RTC-Web.
<P>
<I>Acknowledgement</I>
<BR>
Thanks to Cullen Jennings, Randell Jesup, Maire Reavy,
and 
Tim Terriberry for early comments on this draft.
]]>
        
    </content>
</entry>

<entry>
    <title>In which Home Depot tries to give me $78</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/07/in_which_home_depot_tries_to_g.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1633</id>

    <published>2012-07-20T04:03:39Z</published>
    <updated>2012-07-20T04:04:45Z</updated>

    <summary>The other day I went to Home Depot to buy some party supplies (incidentally, check out the party invitation here and the bonus Web site here. It&apos;s some of my better work.). One of the things I wanted was a...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Misc" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[The other day I went to Home Depot to buy some party supplies
(incidentally, check out the party invitation 
<A HREF="http://www.rtfm.com/lisa-invite.pdf">here</A>
and the bonus Web site
<A HREF="http://www.dusseau.lt/">here</A>. It's some of my better work.).
One of the things
I wanted was a set of rope lights. I eventually picked up
three sets of <A HREF="http://www.homedepot.com/h_d1/N-25ecodZ5yc1v/R-100153477/h_d2/ProductDisplay?catalogId=10053&langId=-1&keyword=rope+lights&storeId=10051">48' lights for $36.48</A>. However, when I went to ring them up (you know 
Home Depot is almost all self-check, right?) two rang up at
$62.48.
<P>
Looking closely, what happened is that the lights were packaged
in clear plastic clamshell packaging with two paper labels, one in
the front and one in the back. The paper label in the front showed
the 48' lights listed above. The back label (the one with the bar code)
showed <A HREF="http://www.homedepot.com/h_d1/N-25ecodZ5yc1v/R-202277535/h_d2/ProductDisplay?catalogId=10053&langId=-1&keyword=rope+lights&storeId=10051">27' LED lights</A> 
(LEDs are cooler and cool == expensive).
It took a while for Home Depot to sort the problem out. Customer
service's initial reaction was that someone had returned a set
of the cheap lights but swapped the back labels so that they
could get a larger refund. But then they had some more 
lights pulled off the shelf and they were mismatched as well,
so things started to look a bit confused. Eventually, they 
just pulled the back labels out of the packages (I guess to make
it hard for me to do a return) and sent me
on my way.
<P>
Here's the screwed up thing: nobody in this entire transaction was
sure which set of actual lights I had in my hand. The matching
package (the one which had rung up as expected) looked a lot like
the other two packages, but really these things look pretty similar
and after all we didn't know that any of the packages was right.
I offered to take them out and measure them for length, but nobody
seemed interested. So, at the time I walked out the door it seemed
quite possible that Home Depot had sold me $188 worth of lights for
$109. Of course, I assured them that I would bring them back
if they turned out to be the LED lights, but they had no way
of knowing I actually would (or of verifying if I did or not). 
I actually tried to explain this several times, but nobody
seemed to care and eventually I gave up and left.
<P>
Turns out that they were the right lights after all, though.







]]>
        
    </content>
</entry>

<entry>
    <title>Problems with secure upgrade to TLS 1.1</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/07/problems_with_secure_upgrade_t.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1632</id>

    <published>2012-07-17T04:23:32Z</published>
    <updated>2012-07-17T04:29:05Z</updated>

    <summary>One of the most common responses to the Rizzo/Duong &quot;BEAST&quot; attack was why not just deploy TLS 1.1. See, for instance, this incredibly long Bugzilla bug about TLS 1.1 in Network Security Services (NSS), the SSL/TLS stack used by both...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="COMSEC" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[One of the most common responses to the Rizzo/Duong "BEAST" attack was
<A HREF="http://www.educatedguesswork.org/2011/11/rizzoduong_beast_countermeasur.html">why
not just deploy TLS 1.1</A>. See, for instance, this incredibly long
<A HREF="https://bugzilla.mozilla.org/show_bug.cgi?id=565047">Bugzilla
bug</A> about TLS 1.1 in
<A HREf="http://www.mozilla.org/projects/security/pki/nss/">Network
Security Services (NSS)</A>, the SSL/TLS stack used by both Chrome and
Firefox. Unfortunately, while TLS 1.1 deployment is a good idea in and
of itself, it turns out not to be a very useful defense against this
particular attack. The problem isn't that servers don't support TLS 1.1
(though most still don't) but rather that the attacker can force
a client and server which both implement TLS 1.1 to negotiate TLS 1.0
(which is vulnerable).
<P>
<I>Background: Protocol Negotiation and Downgrade Attacks</I>
<BR>
Say we are designing a new protocol to remotely control toasters,
the <I>Toaster Control Protocol</I> (TCP). TCP has a client
controller, a <I>Toaster Control Equipment</I> (TCE), and a 
device responsible for toasting the bread, or <I>Toaster Heating 
Equipment</I> (THE). We'll start by developing
TCP 1.0, but we expect that as time goes on we'll want to add
new features and eventually we'll want to deploy TCP 2.0. So,
for instance, maybe TCP 1.0 will only support toasters up to
two slots, but TCP 2.0 will add toaster ovens (as has been
widely observed, TCP 3.0 will allow you to send and receive
e-mail). We may also change the protocol encoding between
versions, so TCP 1.0 could have an ASCII representation whereas
TCP 2.0 adds a binary encoding to save bits on the wire.
For obvious reasons, each version doesn't roll out all at
once, so I might want my TCP 2.0 TCE to talk to my TCP 1.0 THE.
Obviously, that communication will be TCP 1.0, but if I
later add a TCP 2.0 toaster oven, I want that to communicate
with my TCE using TCP 2.0.
<P>
One traditional way to address this problem is to have some sort
of initial handshake in which each side advertises its capabilities
and they converge on a common version (typically the most recent common
version). So, for instance, my TCE would say "I speak 2.0" 
but if the THE says "I only speak 1.0" then you end up
with 1.0. On the other hand if the TCE advertises 2.0 and the
THE speaks 2.0, then you end up with 2.0. As in:
<P>
<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="1000" height="260">
<text x="200" y="60"  text-anchor="middle">TCE</text>
<line x1="200" y1="80" x2="200" y2="180" width = "1" stroke="black"/>
<text x="200" y="220"  text-anchor="middle">TCE</text>
<text x="600" y="60"  text-anchor="middle">THE</text>
<line x1="600" y1="80" x2="600" y2="180" width = "1" stroke="black"/>
<text x="600" y="220"  text-anchor="middle">THE</text>
<line x1="200" y1="120" x2="600" y2="120" width = "1" stroke="black"/>
<line x1="600" y1="120" x2="593" y2="113"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><line x1="600" y1="120" x2="593" y2="127"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><text x="203" y="117"  text-anchor="start"  transform="rotate(0, 203, 117)"  fill = "black" >Hello, I speak versions 1.0, 2.0</text>
<line x1="600" y1="140" x2="200" y2="140" width = "1" stroke="black"/>
<line x1="200" y1="140" x2="207" y2="133"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><line x1="200" y1="140" x2="207" y2="147"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><text x="597" y="137"  text-anchor="end"  transform="rotate(0, 597, 137)"  fill = "black" >Let's do 2.0</text>
<line x1="600" y1="160" x2="200" y2="160" width = "1" stroke="black"/>
<line x1="600" y1="160" x2="593" y2="153"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="600" y1="160" x2="593" y2="167"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="200" y1="160" x2="207" y2="153"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><line x1="200" y1="160" x2="207" y2="167"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><text x="400" y="157"  text-anchor="middle"  transform="rotate(0, 400, 157)"  fill = "black" >Version 2.0 traffic...</text>
</svg><P>
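<P>
In code, the version handshake above amounts to intersecting the two
advertised sets and taking the newest common member. A minimal sketch
(hypothetical helper names, not from any real protocol stack):

```python
def negotiate_version(offered, supported):
    """Pick the most recent version both sides speak, as in the
    exchange above: offer everything, answer with the best match."""
    common = set(offered) & set(supported)
    if not common:
        raise ValueError("no version in common")
    return max(common)

# A 2.0 TCE meeting a 1.0 THE settles on 1.0; two 2.0 peers get 2.0.
```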
<P>
Another common approach is to have individual feature negotiation
rather than version numbers. For instance, the TCE might say 
"do you know how to make grilled cheese" and the THE would say
"yes" or "no". In that case, you can roll out individual features
rather than have a big version number jump. 
Sometimes, systems will have both types of negotiation,
with the version number indicating a pile of features that
go together and also being able to negotiate individual
features. TLS is actually one such protocol, though the
features are called "extensions" (not an uncommon name for this). So you
get something like:
<P>
<P>
<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="1000" height="260">
<text x="200" y="60"  text-anchor="middle">TCE</text>
<line x1="200" y1="80" x2="200" y2="180" width = "1" stroke="black"/>
<text x="200" y="220"  text-anchor="middle">TCE</text>
<text x="600" y="60"  text-anchor="middle">THE</text>
<line x1="600" y1="80" x2="600" y2="180" width = "1" stroke="black"/>
<text x="600" y="220"  text-anchor="middle">THE</text>
<line x1="200" y1="120" x2="600" y2="120" width = "1" stroke="black"/>
<line x1="600" y1="120" x2="593" y2="113"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><line x1="600" y1="120" x2="593" y2="127"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><text x="203" y="117"  text-anchor="start"  transform="rotate(0, 203, 117)"  fill = "black" >Hello, I do "toaster oven", "grilled cheese", "bagels"</text>
<line x1="600" y1="140" x2="200" y2="140" width = "1" stroke="black"/>
<line x1="200" y1="140" x2="207" y2="133"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><line x1="200" y1="140" x2="207" y2="147"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><text x="597" y="137"  text-anchor="end"  transform="rotate(0, 597, 137)"  fill = "black" >I can do "bagels"</text>
<line x1="600" y1="160" x2="200" y2="160" width = "1" stroke="black"/>
<line x1="200" y1="160" x2="207" y2="153"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><line x1="200" y1="160" x2="207" y2="167"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><text x="597" y="157"  text-anchor="end"  transform="rotate(0, 597, 157)"  fill = "black" >OK, let's toast some bagels</text>
</svg><P>
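<P>
Per-feature negotiation reduces to set intersection: the session uses
exactly the features both sides answered yes to. A sketch (illustrative,
with the feature names from the exchange above):

```python
def negotiate_features(offered, supported):
    # Each offered feature gets a yes/no answer; the session uses
    # whatever survives on both sides.
    return sorted(set(offered) & set(supported))

# Three features offered, one accepted: the peers toast bagels.
agreed = negotiate_features(["toaster oven", "grilled cheese", "bagels"],
                            ["bagels"])
```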
<P>
For non-security protocols, or rather ones where you
don't need to worry about attackers, or rather those where you don't
<I>think</I> you need to worry about attackers, this kind of approach mostly works
pretty well, though there's always the risk that someone will
screw up their side of the negotiation. With protocols
that are security relevant, however, things are a little
different. Let's say that in TCP 2.0 we decide to add
encryption. So the negotiation looks pretty much the same as before:
<P>
<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="1000" height="260">
<text x="200" y="60"  text-anchor="middle">TCE</text>
<line x1="200" y1="80" x2="200" y2="180" width = "1" stroke="black"/>
<text x="200" y="220"  text-anchor="middle">TCE</text>
<text x="600" y="60"  text-anchor="middle">THE</text>
<line x1="600" y1="80" x2="600" y2="180" width = "1" stroke="black"/>
<text x="600" y="220"  text-anchor="middle">THE</text>
<line x1="200" y1="120" x2="600" y2="120" width = "1" stroke="black"/>
<line x1="600" y1="120" x2="593" y2="113"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><line x1="600" y1="120" x2="593" y2="127"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><text x="203" y="117"  text-anchor="start"  transform="rotate(0, 203, 117)"  fill = "black" >Hello, I speak versions 1.0, 2.0</text>
<line x1="600" y1="140" x2="200" y2="140" width = "1" stroke="black"/>
<line x1="200" y1="140" x2="207" y2="133"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><line x1="200" y1="140" x2="207" y2="147"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><text x="597" y="137"  text-anchor="end"  transform="rotate(0, 597, 137)"  fill = "black" >Let's do 2.0</text>
<line x1="600" y1="160" x2="200" y2="160" width = "1" stroke="black"/>
<line x1="600" y1="160" x2="593" y2="153"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="600" y1="160" x2="593" y2="167"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="200" y1="160" x2="207" y2="153"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><line x1="200" y1="160" x2="207" y2="167"  transform="rotate(0, 200, 160)"  width="1"  stroke="black"/><text x="400" y="157"  text-anchor="middle"  transform="rotate(0, 400, 157)"  fill = "black" >Encrypted traffic</text>
</svg><P>
<P>
But since we're talking security we need to assume someone might be 
attacking us, and in particular they might be tampering with the
traffic, like so:
<P>
<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="700" height="280">
<text x="100" y="60"  text-anchor="middle">TCE</text>
<line x1="100" y1="80" x2="100" y2="200" width = "1" stroke="black"/>
<text x="100" y="240"  text-anchor="middle">TCE</text>
<text x="300" y="60"  text-anchor="middle">Attacker</text>
<line x1="300" y1="80" x2="300" y2="200" width = "1" stroke="black"/>
<text x="300" y="240"  text-anchor="middle">Attacker</text>
<text x="500" y="60"  text-anchor="middle">THE</text>
<line x1="500" y1="80" x2="500" y2="200" width = "1" stroke="black"/>
<text x="500" y="240"  text-anchor="middle">THE</text>
<line x1="100" y1="120" x2="300" y2="120" width = "1" stroke="black"/>
<line x1="300" y1="120" x2="293" y2="113"  transform="rotate(0, 300, 120)"  width="1"  stroke="black"/><line x1="300" y1="120" x2="293" y2="127"  transform="rotate(0, 300, 120)"  width="1"  stroke="black"/><text x="103" y="117"  text-anchor="start"  transform="rotate(0, 103, 117)"  fill = "black" >Hello, I speak versions 1.0, 2.0</text>
<line x1="300" y1="140" x2="500" y2="140" width = "1" stroke="red"/>
<line x1="500" y1="140" x2="493" y2="133"  transform="rotate(0, 500, 140)"  width="1"  stroke="red"/><line x1="500" y1="140" x2="493" y2="147"  transform="rotate(0, 500, 140)"  width="1"  stroke="red"/><text x="303" y="137"  text-anchor="start"  transform="rotate(0, 303, 137)"  fill = "red" >Hello, I speak version 1.0 </text>
<line x1="500" y1="160" x2="100" y2="160" width = "1" stroke="black"/>
<line x1="100" y1="160" x2="107" y2="153"  transform="rotate(0, 100, 160)"  width="1"  stroke="black"/><line x1="100" y1="160" x2="107" y2="167"  transform="rotate(0, 100, 160)"  width="1"  stroke="black"/><text x="497" y="157"  text-anchor="end"  transform="rotate(0, 497, 157)"  fill = "black" >Let's do 1.0</text>
<line x1="500" y1="180" x2="100" y2="180" width = "1" stroke="black"/>
<line x1="500" y1="180" x2="493" y2="173"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><line x1="500" y1="180" x2="493" y2="187"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><line x1="100" y1="180" x2="107" y2="173"  transform="rotate(0, 100, 180)"  width="1"  stroke="black"/><line x1="100" y1="180" x2="107" y2="187"  transform="rotate(0, 100, 180)"  width="1"  stroke="black"/><text x="300" y="177"  text-anchor="middle"  transform="rotate(0, 300, 177)"  fill = "black" >Unencrypted traffic</text>
</svg><P>
<P>
This is what's called a <I>downgrade attack</I> or a <I>bid-down attack</I>.
Even though in principle both sides could do version 2.0 (and an
encrypted channel), the attacker has forced them down to 1.0 
(and a clear channel). Similar attacks can be mounted
against negotiation of cryptographic features. Consider,
for instance, the case where we are negotiating cryptographic
algorithms and each side supports both AES (a strong algorithm)
and DES (a weak algorithm), and the attacker forces both sides
down to DES:

<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="700" height="280">
<text x="100" y="60"  text-anchor="middle">TCE</text>
<line x1="100" y1="80" x2="100" y2="200" width = "1" stroke="black"/>
<text x="100" y="240"  text-anchor="middle">TCE</text>
<text x="300" y="60"  text-anchor="middle">Attacker</text>
<line x1="300" y1="80" x2="300" y2="200" width = "1" stroke="black"/>
<text x="300" y="240"  text-anchor="middle">Attacker</text>
<text x="500" y="60"  text-anchor="middle">THE</text>
<line x1="500" y1="80" x2="500" y2="200" width = "1" stroke="black"/>
<text x="500" y="240"  text-anchor="middle">THE</text>
<line x1="100" y1="120" x2="300" y2="120" width = "1" stroke="black"/>
<line x1="300" y1="120" x2="293" y2="113"  transform="rotate(0, 300, 120)"  width="1"  stroke="black"/><line x1="300" y1="120" x2="293" y2="127"  transform="rotate(0, 300, 120)"  width="1"  stroke="black"/><text x="103" y="117"  text-anchor="start"  transform="rotate(0, 103, 117)"  fill = "black" >I can do AES, DES</text>
<line x1="300" y1="140" x2="500" y2="140" width = "1" stroke="red"/>
<line x1="500" y1="140" x2="493" y2="133"  transform="rotate(0, 500, 140)"  width="1"  stroke="red"/><line x1="500" y1="140" x2="493" y2="147"  transform="rotate(0, 500, 140)"  width="1"  stroke="red"/><text x="303" y="137"  text-anchor="start"  transform="rotate(0, 303, 137)"  fill = "red" >I can do DES </text>
<line x1="500" y1="160" x2="100" y2="160" width = "1" stroke="black"/>
<line x1="100" y1="160" x2="107" y2="153"  transform="rotate(0, 100, 160)"  width="1"  stroke="black"/><line x1="100" y1="160" x2="107" y2="167"  transform="rotate(0, 100, 160)"  width="1"  stroke="black"/><text x="497" y="157"  text-anchor="end"  transform="rotate(0, 497, 157)"  fill = "black" >OK, let's do DES</text>
<line x1="500" y1="180" x2="100" y2="180" width = "1" stroke="black"/>
<line x1="500" y1="180" x2="493" y2="173"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><line x1="500" y1="180" x2="493" y2="187"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><line x1="100" y1="180" x2="107" y2="173"  transform="rotate(0, 100, 180)"  width="1"  stroke="black"/><line x1="100" y1="180" x2="107" y2="187"  transform="rotate(0, 100, 180)"  width="1"  stroke="black"/><text x="300" y="177"  text-anchor="middle"  transform="rotate(0, 300, 177)"  fill = "black" >Traffic encrypted with DES</text>
</svg><P>
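<P>
The attack is easy to express in code: the man in the middle simply
filters the strong options out of the offer before forwarding it, and
the honest endpoints never notice. A sketch (illustrative names,
assuming the server picks its favorite among whatever it receives):

```python
def server_choose(offer, preference=("AES", "DES")):
    # Server picks its most preferred algorithm among those offered.
    for alg in preference:
        if alg in offer:
            return alg
    raise ValueError("no common algorithm")

def mitm_filter(offer, weak={"DES"}):
    # Attacker forwards only the weak choices.
    return [alg for alg in offer if alg in weak]

honest = server_choose(["AES", "DES"])                 # negotiates AES
attacked = server_choose(mitm_filter(["AES", "DES"]))  # forced to DES
```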

There are two basic defenses against this kind of downgrade attack.
The first is for sides to remember the other side's capabilities
and complain if those expectations are violated. So, for instance,
the first time that the TCE and THE communicate, the TCE
notices that the THE can do TCP 2.0 and from then on it refuses
to do TCP 1.0. Obviously, an attacker can downgrade you on the
first communication, but if you ever get a communication without
the attacker in the way, then you are immune from attack
thereafter (at least until both sides upgrade again). This
isn't a fantastic defense for a number of reasons, but it's
more or less the best you can do in the non-cryptographic setting.
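<P>
This remember-the-peer defense is just a pin on the best version ever
seen from each peer. A sketch (hypothetical helper; as noted above, the
first contact can still be downgraded):

```python
class VersionPin:
    """Remember the best version each peer has demonstrated and
    refuse anything lower on later connections."""
    def __init__(self):
        self.best = {}

    def check(self, peer, version):
        # First contact: nothing pinned yet, so anything is accepted.
        if version < self.best.get(peer, version):
            raise RuntimeError("possible downgrade attack on %s" % peer)
        self.best[peer] = max(self.best.get(peer, version), version)
```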

In the setting where you are building a security protocol, however,
there's a better solution. Most association-oriented security
protocols (SSL/TLS, IPsec, etc.) have a handshake phase where
they do version/feature negotiation and key establishment, followed
by a data transfer phase where the actual communications happen.
In most such protocols, the handshake phase includes an integrity
check over the handshake messages. So, for instance, in SSL/TLS,
the <code>Finished</code> messages include a <I>Message Authentication
Code</I> (MAC) computed over the handshake and keyed with the
exchanged <I>master secret</I>:


<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="1000" height="280">
<text x="200" y="60"  text-anchor="middle">Client</text>
<line x1="200" y1="80" x2="200" y2="200" width = "1" stroke="black"/>
<text x="200" y="240"  text-anchor="middle">Client</text>
<text x="600" y="60"  text-anchor="middle">Server</text>
<line x1="600" y1="80" x2="600" y2="200" width = "1" stroke="black"/>
<text x="600" y="240"  text-anchor="middle">Server</text>
<line x1="200" y1="120" x2="600" y2="120" width = "1" stroke="black"/>
<line x1="600" y1="120" x2="593" y2="113"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><line x1="600" y1="120" x2="593" y2="127"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><text x="203" y="117"  text-anchor="start"  transform="rotate(0, 203, 117)"  fill = "black" >ClientHello</text>
<line x1="600" y1="140" x2="200" y2="140" width = "1" stroke="black"/>
<line x1="200" y1="140" x2="207" y2="133"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><line x1="200" y1="140" x2="207" y2="147"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><text x="597" y="137"  text-anchor="end"  transform="rotate(0, 597, 137)"  fill = "black" >ServerHello, Certificate, ServerHelloDone</text>
<line x1="200" y1="160" x2="600" y2="160" width = "1" stroke="black"/>
<line x1="600" y1="160" x2="593" y2="153"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="600" y1="160" x2="593" y2="167"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><text x="203" y="157"  text-anchor="start"  transform="rotate(0, 203, 157)"  fill = "black" >ClientKeyExchange, ChangeCipherSpec, Finished</text>
<line x1="600" y1="180" x2="200" y2="180" width = "1" stroke="black"/>
<line x1="200" y1="180" x2="207" y2="173"  transform="rotate(0, 200, 180)"  width="1"  stroke="black"/><line x1="200" y1="180" x2="207" y2="187"  transform="rotate(0, 200, 180)"  width="1"  stroke="black"/><text x="597" y="177"  text-anchor="end"  transform="rotate(0, 597, 177)"  fill = "black" >ChangeCipherSpec, Finished</text>
</svg><P>

<P>
Any tampering with any of the handshake values causes the handshake to
fail. This makes downgrade attacks more difficult: as long as the
weakest shared key exchange protocol and the weakest shared MAC are
sufficiently strong (both of these things are true for TLS), then
pretty much everything else can be negotiated safely, including
features and version numbers. 
[Technical note: SSL version 2 didn't have anti-downgrade defenses
and so there's some other anti-downgrade mechanisms in
SSL/TLS as well.]
This is why it's so important to establish a baseline level of
cryptographic security in the first version of the protocol, so
you can prevent downgrade attacks to the nonsecure version.
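<P>
The Finished-style check is just a MAC over the raw handshake
transcript, keyed with the freshly exchanged secret, so any message the
attacker rewrote in transit shows up as a mismatch between the two
sides' computations. A sketch using Python's <code>hmac</code> module
(the real TLS PRF construction differs in detail):

```python
import hashlib
import hmac

def finished_mac(master_secret, transcript):
    # MAC over every handshake message exactly as sent on the wire.
    return hmac.new(master_secret, b"".join(transcript),
                    hashlib.sha256).digest()

secret = b"exchanged-master-secret"
sent = [b"Hello, I speak versions 1.0, 2.0", b"Let's do 2.0"]
seen = [b"Hello, I speak version 1.0", b"Let's do 1.0"]  # after tampering

# The client's and server's MACs disagree, so the handshake fails.
tampered = not hmac.compare_digest(finished_mac(secret, sent),
                                   finished_mac(secret, seen))
```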
<P>
<I>Attacks on TLS 1.1 Negotiation</I>
<BR>
Based on what I said above, it would seem that rolling out TLS 1.1
securely would be no problem. And if everything was perfect, then
that would indeed be true. Unfortunately, everything is not perfect.
In order for version negotiation to work properly, a version <I>X</I>
implementation needs to accept offers of version <I>Y &gt; X</I>
(although of course it will negotiate version <I>X</I>).
However, some nontrivial number of TLS servers and/or intermediaries
(on the order of <A HREF="http://old.nabble.com/Re%3A-interop-for-TLS-clients-proposing-TLSv1.1-p32514676.html">1%</A>) will not complete the TLS handshake if TLS 1.1 is offered
(I don't mean they negotiate 1.0 but instead an error is observed).
There are similar problems (though less extensive) with TLS extensions
and with offering TLS 1.0 as opposed to SSLv3.
<P>
No browser wants to break on 1% of the sites in the world, so
instead when some browser clients (at least Chrome and Firefox) 
encounter a server which throws some error with a modern
<code>ClientHello</code>, they seamlessly fall back to older
versions. I.e., something like this (the exact details of the fallback order depend on the browser):

<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="1000" height="300">
<text x="200" y="60"  text-anchor="middle">Client</text>
<line x1="200" y1="80" x2="200" y2="220" width = "1" stroke="black"/>
<text x="200" y="260"  text-anchor="middle">Client</text>
<text x="600" y="60"  text-anchor="middle">Server</text>
<line x1="600" y1="80" x2="600" y2="220" width = "1" stroke="black"/>
<text x="600" y="260"  text-anchor="middle">Server</text>
<line x1="200" y1="120" x2="600" y2="120" width = "1" stroke="black"/>
<line x1="600" y1="120" x2="593" y2="113"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><line x1="600" y1="120" x2="593" y2="127"  transform="rotate(0, 600, 120)"  width="1"  stroke="black"/><text x="203" y="117"  text-anchor="start"  transform="rotate(0, 203, 117)"  fill = "black" >ClientHello (TLS 1.0)</text>
<line x1="600" y1="140" x2="200" y2="140" width = "1" stroke="black"/>
<line x1="200" y1="140" x2="207" y2="133"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><line x1="200" y1="140" x2="207" y2="147"  transform="rotate(0, 200, 140)"  width="1"  stroke="black"/><text x="597" y="137"  text-anchor="end"  transform="rotate(0, 597, 137)"  fill = "black" >TCP FIN</text>
<line x1="200" y1="160" x2="600" y2="160" width = "1" stroke="black"/>
<line x1="600" y1="160" x2="593" y2="153"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><line x1="600" y1="160" x2="593" y2="167"  transform="rotate(0, 600, 160)"  width="1"  stroke="black"/><text x="203" y="157"  text-anchor="start"  transform="rotate(0, 203, 157)"  fill = "black" >ClientHello (SSLv3)</text>
<line x1="200" y1="180" x2="600" y2="180" width = "1" stroke="black"/>
<line x1="600" y1="180" x2="593" y2="173"  transform="rotate(0, 600, 180)"  width="1"  stroke="black"/><line x1="600" y1="180" x2="593" y2="187"  transform="rotate(0, 600, 180)"  width="1"  stroke="black"/><text x="203" y="177"  text-anchor="start"  transform="rotate(0, 203, 177)"  fill = "black" >ClientKeyExchange, ChangeCipherSpec, Finished</text>
<line x1="600" y1="200" x2="200" y2="200" width = "1" stroke="black"/>
<line x1="200" y1="200" x2="207" y2="193"  transform="rotate(0, 200, 200)"  width="1"  stroke="black"/><line x1="200" y1="200" x2="207" y2="207"  transform="rotate(0, 200, 200)"  width="1"  stroke="black"/><text x="597" y="197"  text-anchor="end"  transform="rotate(0, 597, 197)"  fill = "black" >ChangeCipherSpec, Finished</text>
</svg><P>

It seems very likely that browsers will continue this behavior for
negotiating TLS 1.1 and/or 1.2.

Here's the problem: this fallback happens <I>outside</I> of the ordinary
TLS version negotiation machinery, so it's not protected by any of
the cryptographic checks designed to prevent downgrade attack.
Any attacker can forge a TCP FIN or RST, thus forcing clients
back to SSLv3, TLS 1.0, or whatever the lowest version they support
is. The attack looks like this:

<P><svg baseProfile="full" xmlns="http://www.w3.org/2000/svg" width="700" height="300">
<text x="100" y="60"  text-anchor="middle">Client</text>
<line x1="100" y1="80" x2="100" y2="220" width = "1" stroke="black"/>
<text x="100" y="260"  text-anchor="middle">Client</text>
<text x="300" y="60"  text-anchor="middle">Attacker</text>
<line x1="300" y1="80" x2="300" y2="220" width = "1" stroke="black"/>
<text x="300" y="260"  text-anchor="middle">Attacker</text>
<text x="500" y="60"  text-anchor="middle">Server</text>
<line x1="500" y1="80" x2="500" y2="220" width = "1" stroke="black"/>
<text x="500" y="260"  text-anchor="middle">Server</text>
<line x1="100" y1="120" x2="500" y2="120" width = "1" stroke="black"/>
<line x1="500" y1="120" x2="493" y2="113"  transform="rotate(0, 500, 120)"  width="1"  stroke="black"/><line x1="500" y1="120" x2="493" y2="127"  transform="rotate(0, 500, 120)"  width="1"  stroke="black"/><text x="103" y="117"  text-anchor="start"  transform="rotate(0, 103, 117)"  fill = "black" >ClientHello (TLS 1.0)</text>
<line x1="300" y1="140" x2="100" y2="140" width = "1" stroke="red"/>
<line x1="100" y1="140" x2="107" y2="133"  transform="rotate(0, 100, 140)"  width="1"  stroke="red"/><line x1="100" y1="140" x2="107" y2="147"  transform="rotate(0, 100, 140)"  width="1"  stroke="red"/><text x="297" y="137"  text-anchor="end"  transform="rotate(0, 297, 137)"  fill = "red" >TCP FIN </text>
<line x1="100" y1="160" x2="500" y2="160" width = "1" stroke="black"/>
<line x1="500" y1="160" x2="493" y2="153"  transform="rotate(0, 500, 160)"  width="1"  stroke="black"/><line x1="500" y1="160" x2="493" y2="167"  transform="rotate(0, 500, 160)"  width="1"  stroke="black"/><text x="103" y="157"  text-anchor="start"  transform="rotate(0, 103, 157)"  fill = "black" >ClientHello (SSLv3)</text>
<line x1="100" y1="180" x2="500" y2="180" width = "1" stroke="black"/>
<line x1="500" y1="180" x2="493" y2="173"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><line x1="500" y1="180" x2="493" y2="187"  transform="rotate(0, 500, 180)"  width="1"  stroke="black"/><text x="103" y="177"  text-anchor="start"  transform="rotate(0, 103, 177)"  fill = "black" >ClientKeyExchange, ChangeCipherSpec, Finished</text>
<line x1="500" y1="200" x2="100" y2="200" width = "1" stroke="black"/>
<line x1="100" y1="200" x2="107" y2="193"  transform="rotate(0, 100, 200)"  width="1"  stroke="black"/><line x1="100" y1="200" x2="107" y2="207"  transform="rotate(0, 100, 200)"  width="1"  stroke="black"/><text x="497" y="197"  text-anchor="end"  transform="rotate(0, 497, 197)"  fill = "black" >ChangeCipherSpec, Finished</text>
</svg><P>
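<P>
The fallback logic the attacker exploits looks roughly like this (a
sketch with illustrative names; real browsers' retry rules are more
involved):

```python
def connect_with_fallback(handshake, versions=("TLS 1.1", "TLS 1.0", "SSLv3")):
    # On *any* connection error -- including a FIN or RST forged by an
    # attacker -- silently retry with the next-lower version. Nothing
    # in this loop is protected by the handshake integrity check.
    last_error = None
    for version in versions:
        try:
            return handshake(version)
        except ConnectionError as e:
            last_error = e
    raise last_error

def forged_fin(version):
    # Attacker kills every attempt above TLS 1.0.
    if version not in ("TLS 1.0", "SSLv3"):
        raise ConnectionError("connection reset")
    return version

negotiated = connect_with_fallback(forged_fin)  # falls back to TLS 1.0
```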

The underlying problem here is that the various extension mechanisms
for TLS weren't completely tested (or in some cases, specified; extensions in particular weren't
part of SSLv3), and so the browsers have to fall back on ad hoc
feature/version negotiation mechanisms. Unfortunately, those mechanisms,
unlike the official mechanisms, aren't secure against downgrade
attack.<small><sup>1</sup></small>
<P>
There is, however, one SSL/TLS negotiation mechanism that
is extremely reliable: cipher suite negotiation. In TLS,
each cipher suite is rendered as a 16-bit number: the client
offers a pile of cipher suites and the server selects the 
one it likes. Because new cipher suites are introduced
fairly regularly, and ignoring unknown suites is so easy,
this mechanism has gotten a lot of testing, and it works
pretty well, even through nearly all intermediaries. The result
is that if you really need to have downgrade attack resistance,
you need to put something in the cipher suites field. This is
the idea behind the <I>Signaling Cipher Suite Value</I> used
by the TLS Renegotiation Indication Extension <A HREF="http://tools.ietf.org/html/rfc5746">[RFC 5746]</A>.
Recently, there have been 
<A HREF="http://www.ietf.org/mail-archive/web/tls/current/msg08099.html">several</A>
<A HREF="http://www.ietf.org/mail-archive/web/tls/current/msg08861.html">proposals</A> that are intended
to indicate TLS 1.1 and/or extension support in the cipher suite
field. The idea here is to allow detection of version rollback 
attacks. Once you can detect version rollback, then you can
use the ordinary handshake anti-tampering mechanisms to detect
removal of extensions.<small><sup>2</sup></small> 
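<P>
The server side of such a signaling-suite scheme is essentially a
one-line check: if the fallback signal appears in a ClientHello that
offers less than the server's best version, the connection must have
been downgraded and is refused. A sketch (the 0x5600 code point and the
names here are illustrative; the actual proposals differ in detail):

```python
FALLBACK_SCSV = 0x5600  # illustrative signaling cipher suite value

TLS_1_0, TLS_1_1 = (3, 1), (3, 2)

def check_fallback(cipher_suites, client_version, server_max=TLS_1_1):
    # The client signals "this is a retry after a failure," but we
    # could have done better: someone stripped the higher version.
    if FALLBACK_SCSV in cipher_suites and client_version < server_max:
        raise RuntimeError("inappropriate fallback: possible downgrade")
    return client_version

check_fallback([0x002F], TLS_1_0)                  # legacy client: fine
check_fallback([0x002F, FALLBACK_SCSV], TLS_1_1)   # fallback at top: fine
# check_fallback([0x002F, FALLBACK_SCSV], TLS_1_0) would raise.
```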
<P>
The bad news about these mechanisms is that they require upgrading
the server to detect the new cipher suite. On the other hand, they
can be incrementally deployed.
(Yngve Pettersen has a client-side only <A HREF="http://tools.ietf.org/html/draft-pettersen-tls-version-rollback-removal-00">proposal</a> which leverages
the RI SCSV to a similar end, but relies on the assumption that
any server which does RI is modern enough to handle extensions
properly). 
<P>
What's the lesson here? Minimally, this kind of negotiation facility
needs to be clearly specified from the start and then extensively
tested (and hopefully exercised as soon as possible). Once you've
got a significant installed base of noncompliant implementations,
it gets very difficult to distinguish a noncompliant peer from
a downgrade attack, and thus problematic to refuse to connect to
apparently noncompliant peers.
<P>
<small><sup>1</sup> Note that this isn't always a big deal. Consider, for
instance, the TLS Server Name Indication message, which allows a server
to host multiple HTTPS sites on the same IP. The attacker could force
an SNI downgrade, but this will generally just cause a connection
failure, which they could have easily have done by forging an 
RST for every connection. Downgrade attacks are mostly an issue
when the attacker is forcing you to a weaker security posture, rather
than just breaking stuff.</small>
]]>
        
    </content>
</entry>

<entry>
    <title>Selecting IETF Travel Venues</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/04/selecting_ietf_travel_venues.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1631</id>

    <published>2012-04-10T03:46:10Z</published>
    <updated>2012-04-10T03:59:47Z</updated>

    <summary>The IETF RTCWEB WG has been operating on a fast track with an interim meeting between each IETF meeting. Since we needed to schedule a lot of meetings, I thought it might be instructive to try to analyze a bunch of...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="IETF" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Overthinking" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[The IETF RTCWEB WG has been operating on a fast track with
an interim meeting between each IETF meeting. 
Since we needed to schedule a lot of meetings,
 I thought it might be instructive to try to analyze a bunch
of different locations to figure out the best strategy. Here's
a lightly edited version of my post to the RTCWEB WG trying to
address this issue.
<P>
Note that I'm not trying to make any claims about what the best set of
venues is. It's obviously easy to figure out any statistic we want
about each proposed venue, but how you map that data to "best" is a
much more difficult problem. The space is full of Pareto optima,
and even if we ignore the troubling philosophical question of
interpersonal utility comparisons, there's some tradeoff
between minimal total travel time and a "fair" distribution of travel
times (or at least an even distribution).
<P>
<B>METHODOLOGY</B>
<BR>
The data below is derived by treating both people and venues as
airport locations and using travel time as our primary instrument.
<OL>
<LI>For each responder for the current Doodle poll, assign a home
   airport based on their draft publication history.  We're missing a
   few people but basically it should be pretty complete. Since 
   these people responded before the venue was known, it's at
   least somewhat unbiased.
<LI>Compute the shortest advertised flight between each home airport
   and the locations for each venue by looking at the shortest
   advertised Kayak flights around one of the proposed interim
   dates (6/10 - 6/13), ignoring price, but excluding "Hacker fares".
   [Thanks to Martin Thomson for helping me gather these.]
</OL>
<P>
This lets us compute statistics for any venue and/or combination
of venues, based on the candidate attendee list.
<P>
The three proposed venues:
<UL>
<LI>San Francisco (SFO)
<LI>Boston (BOS)
<LI>Stockholm (ARN)
</UL>
<P>
Three hubs not too distant from the proposed venues:
<UL>
<LI>London (LHR)
<LI>Frankfurt (FRA)
<LI>New York (NYC) (treating all NYC airports as the same location)
</UL>
Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
were already proposed as venues, and I didn't want Cullen to feel
left out.
<P>
<B>RESULTS</B>
<BR>
Here are the results for each of the above venues, measured in total
hours of travel (i.e., round trip).
<PRE>
Venue         Mean         Median           SD
----------------------------------------------
SFO           13.5             11         12.2
BOS           12.3             11          7.5
ARN           17.0             21         10.7
FRA           14.8             17          7.3
LHR           13.3             14          7.5
NYC           11.5             11          5.8
YYC           14.9             13         10.2
SFO/BOS/ARN   14.3             13          3.6
SFO/NYC/LHR   12.7             11.3        3.7
</PRE>
XXX/YYY/ZZZ is a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
and median are intended to be some sort of aggregate measure of travel
time. I don't have any way to measure "fairness", but SD is intended
as some metric of the variation in travel time between attendees.
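<P>
For the curious, the aggregate numbers are straightforward to compute. Here's a sketch in Python; the travel times below are made-up placeholders (the real inputs are in the attached files), and a rotation's per-attendee cost is taken as that attendee's average over the rotation's venues:

```python
import statistics

# Hypothetical round-trip hours per attendee, per venue (not real data).
hours = {
    "SFO": [2, 12, 22, 11, 6],
    "NYC": [11, 6, 13, 11, 12],
    "LHR": [21, 14, 3, 14, 18],
}

def venue_stats(times):
    return (statistics.mean(times), statistics.median(times),
            statistics.pstdev(times))

def rotation_stats(venues):
    # Each attendee's cost for a rotation is their mean over its venues,
    # which is why rotations smooth out the SD.
    per_person = [statistics.mean(row)
                  for row in zip(*(hours[v] for v in venues))]
    return venue_stats(per_person)

for v in hours:
    print(v, venue_stats(hours[v]))
print("SFO/NYC/LHR", rotation_stats(["SFO", "NYC", "LHR"]))
```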
<P>
The raw data and software are attached. The files are:
<P>
<A HREF="http://www.educatedguesswork.org/venues/home-airports">home-airports</A>: the list of people's home airports
<BR>
<A HREF="http://www.educatedguesswork.org/venues/durations.txt">durations.txt</A>: the list of airport-airport durations
<BR>
<A HREF="http://www.educatedguesswork.org/venues/doodle.txt">doodle.txt</A>: the attendees list
<BR>
<A HREF="http://www.educatedguesswork.org/venues/pairings">pairings</A>: the software to compute travel times<BR>
<A HREF="http://www.educatedguesswork.org/venues/doodle-out.txt">doodle-out.txt</A>: the computed travel times for each attendee
<P>
This was a quick hack, so there may be errors here, but nobody has pointed
out any yet.
<P>
<B>OBSERVATIONS</B>
<BR>
Obviously, it's hard to know what the optimal solution is without
some model for optimality, but we can still make some observations
based on this data:
<P>
<OL>
<LI>If we're just concerned with minimizing total travel time, then we
would always meet in New York, since it has both the shortest mean travel
time and the shortest median travel time, but as I said above, this
arguably isn't fair to people who live either in Europe or California,
since they always have to travel.</LI>
<LI>Combining West Coast, East Coast, and European venues yields
mean/median values comparable to (or at least not too much worse than)
NYC's, with much lower SDs. So, arguably that kind of mix is more fair.</LI>
<LI>There's a pretty substantial difference between hub and non-hub
venues. In particular, LHR has a median travel time 7 hours less than
ARN, and the SFO/NYC/LHR combination has a median/mean travel time
about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
here, but you'd probably get similar results if, for instance, you
used AMS instead of LHR.]</LI>
</OL>
<P>
Obviously, your mileage may vary based on your location and feelings
about what's fair, but based on this data, it looks to me like a
three-way rotation between West Coast, East Coast, and European hubs
offers a good compromise between minimum cost and a flat distribution
of travel times.





]]>
        
    </content>
</entry>

<entry>
    <title>In which a misplaced greater than sign totally screws me over</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/03/in_which_a_misplaced_greater_t.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1630</id>

    <published>2012-03-04T00:06:54Z</published>
    <updated>2012-03-04T00:07:14Z</updated>

    <summary>Something annoying but also instructive happened during my build of Chromium today. Everything started when I checked out a clean version and went to do a build, only to be greeted with the following exciting error: /Users/ekr/dev/chromium/src/third_party/WebKit/Source/WebCore/WebCore.gyp ar: input.a is...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Software" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[Something annoying but also instructive happened during my build of
Chromium today. Everything started when I checked out a clean version
and went to do a build, only to be greeted with the following
exciting error:
<PRE>
/Users/ekr/dev/chromium/src/third_party/WebKit/Source/WebCore/WebCore.gyp
ar: input.a is a fat file (use libtool(1) or lipo(1) and ar(1) on it)
ar: input.a: Inappropriate file type or format
rm: /Users/ekr/dev/chromium/src/out/Debug/obj.target/\
  webkit_system_interface/geni/adjust_visibility/self/cuDbUtils.o: No such file or directory
make: *** [out/Debug/libWebKitSystemInterfaceLeopardPrivateExtern.a] Error 1
make: *** Waiting for unfinished jobs....
</PRE>
<P>
Luckily, I've run into this problem before, so I knew what was going on.
The script <code>third_party/WebKit/Source/WebCore/WebCore.gyp/mac/adjust_visibility.sh</code>,
which does some library mangling, uses <code>file</code> to determine
what kind of library it's dealing with. Unfortunately, it invokes
<code>file</code> with an unqualified name, and since MacPorts
wants to put itself at the beginning of <code>PATH</code> this
means that you get the <code>file</code> implementation from MacPorts, which
has slightly different output than the system <code>file</code>. The result
is that <code>adjust_visibility.sh</code> decides that you
have a thin version of <code>libWebKit...a</code> and tries
to run <code>ar</code> on it. When <code>ar</code> fails, so does
the build.
<P>
The fix here is to move MacPorts below <code>/usr/bin</code> in your
path. I'd already done this&mdash;or so I thought&mdash; but it
turned out that MacPorts had inserted itself twice in <code>.cshrc</code>
so I had to edit <code>.cshrc</code> and then run <code>source .cshrc</code>.
I did this, and after correcting a typo things looked good and I
and went to rerun the build, only to be greeted with:
<P>
<PRE>
  CXX(target) out/Debug/obj.target/base/base/sync_socket_posix.o
In file included from base/sync_socket_posix.cc:18:
./base/file_util.h:416:56: error: no type named 'set' in namespace 'std'
                                            const std::set<gid_t>& group_gids);
                                                  ~~~~~^
./base/file_util.h:416:59: error: expected ')'
                                            const std::set<gid_t>& group_gids);
                                                          ^
./base/file_util.h:413:44: note: to match this '('
BASE_EXPORT bool VerifyPathControlledByUser(const FilePath& base,
                                           ^
2 errors generated.
make: *** [out/Debug/obj.target/base/base/sync_socket_posix.o]
</PRE>
<P>
I know what you're thinking here&mdash;or at least what I thought&mdash;someone
forgot to <code>#include &lt;set&gt;</code> and for some reason the automated
builds didn't catch it, perhaps due to some conditional compilation problem
getting triggered on Lion. But checking the source quite clearly showed
that <code>set</code> was being included. Moreover, other STL containers
like <code>vector</code> work fine. Changing from clang to GCC didn't
help here, so eventually I reverted to <code>gcc -E</code>. For those of
you who don't know, this runs the preprocessor but not the compiler and
so is really useful for diagnosing this kind of include error. Here's
the relevant portion of the result:
<PRE>
# 18 "./base/file_util.h" 2





# 1 "./set" 1
# 24 "./base/file_util.h" 2
</PRE>
<P>
It's a little hard to read, but if you know what to look for, it's telling you
that instead of including <code>set</code> from <code>/Developer</code>, where
the system include files live, the compiler is getting it from the
local directory. Now, you might ask what the heck a file named
<code>set</code> is doing in the local directory, especially as,
when I looked, it was totally empty. Naturally, it was
my fault, but it took a minute to realize what it was. Remember I said that
I had to correct a typo in <code>.cshrc</code>, but not what the typo
was. Well, the problem was that I had written:
<PRE>
&gt;set OSVER=`uname -r`
</PRE>
Instead of
<PRE>
set OSVER=`uname -r`
</PRE>
<P>
Of course, when I ran this it created a file called <code>set</code> in
the current directory and since the compile flags included the
current directory in the include path, the compiler duly included
it instead of the system include file. And since the file was
empty, there wasn't any definition of <code>std::set</code>
and we got a compile error. Time wasted by this error: 11 minutes
(not including writing this up).







]]>
        
    </content>
</entry>

<entry>
    <title>What the heck is going on with Tesla batteries?</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/02/what_the_heck_is_going_on_with.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1629</id>

    <published>2012-02-25T15:17:17Z</published>
    <updated>2012-02-25T15:17:34Z</updated>

    <summary>Disclaimer: I am not a car guy. Read the following with that in mind. As long-time EG readers will know, I&apos;ve complained in the past that my Prius has a feeble starter/electronics battery which is easy to run down even...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Misc" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[<I>Disclaimer: I am not a car guy. Read the following with that
in mind.</I>
<P>
As long-time EG readers will know, I've complained in the past that my
Prius has a feeble starter/electronics battery which is easy to
run down even by <A HREF="http://www.educatedguesswork.org/2008/04/important_safet_4.html">leaving the interior lights on.</A> This despite the fact that the 
Prius has a huge battery running the hybrid system to draw on. But I
certainly didn't want <A HREF="http://theunderstatement.com/post/18030062041/its-a-brick-tesla-motors-devastating-design">this.</A> Michael DeGusta reports
that if you leave your Tesla parked for a long time (like months),
then the car bleeds enough power off of the battery to run the
auxiliary vehicle systems [parasitic load]
to drain it down into deep discharge
(and hence damage to the battery) territory:
<BLOCKQUOTE>
A Tesla Roadster that is simply parked without being plugged in will
eventually become a "brick". The parasitic load from the car's
always-on subsystems continually drains the battery and if the
battery's charge is ever totally depleted, it is essentially
destroyed. Complete discharge can happen even when the car is plugged
in if it isn't receiving sufficient current to charge, which can be
caused by something as simple as using an extension cord. After
battery death, the car is completely inoperable. At least in the case
of the Tesla Roadster, it's not even possible to enable tow mode,
meaning the wheels will not turn and the vehicle cannot be pushed nor
transported to a repair facility by traditional means.
<P>
The amount of time it takes an unplugged Tesla to die varies. Tesla's
Roadster Owners Manual [Full Zipped PDF] states that the battery
should take approximately 11 weeks of inactivity to completely
discharge [Page 5-2, Column 3: PDF]. However, that is from a full 100%
charge. If the car has been driven first, say to be parked at an
airport for a long trip, that time can be substantially reduced. If
the car is driven to nearly its maximum range and then left unplugged,
it could potentially "brick" in about one week. Many other scenarios
are possible: for example, the car becomes unplugged by accident, or
is unwittingly plugged into an extension cord that is defective or too
long.
<P>
When a Tesla battery does reach total discharge, it cannot be
recovered and must be entirely replaced. Unlike a normal car battery,
the best-case replacement cost of the Tesla battery is currently at
least $32,000, not including labor and taxes that can add thousands
more to the cost.
</BLOCKQUOTE>
<P>
There's been a lot of controversy about this report
(see, for instance, this <A HREF="http://www.itworld.com/it-managementstrategy/252452/false-scare-tesla">defense</A>), but Tesla's <A HREF="http://jalopnik.com/5887265/tesla-motors-devastating-design-problem">response</A> seems to be consistent
with DeGusta's basic argument, as does the letter 
that Jalopnik reproduces above:
<BR>
<BLOCKQUOTE>
All automobiles require some level of owner care. For example,
combustion vehicles require regular oil changes or the engine will be
destroyed. Electric vehicles should be plugged in and charging when
not in use for maximum performance. All batteries are subject to
damage if the charge is kept at zero for long periods of
time. However, Tesla avoids this problem in virtually all instances
with numerous counter-measures. Tesla batteries can remain unplugged
for weeks (even months), without reaching zero state of charge. Owners
of Roadster 2.0 and all subsequent Tesla products can request that
their vehicle alert Tesla if SOC falls to a low level. All Tesla
vehicles emit various visual and audible warnings if the battery pack
falls below 5 percent SOC. Tesla provides extensive maintenance
recommendations as part of the customer experience.
</BLOCKQUOTE>
<P>
At present, then, the agreed upon facts seem to be that:
<OL>
<LI>If you leave the Tesla's batteries at zero charge, battery
damage occurs.
<LI>If you leave a Tesla unplugged for long enough, even
with a charged battery, parasitic load from the vehicle
systems will eventually consume the battery's charge,
leaving you in state (1) above. [Note that this appears
to exceed the Lithium-Ion self-discharge rate, so it
likely is parasitic load.]
</OL>
<P>
The controversy really seems to be about whose fault
this is, namely whether the customer should have known better,
whether Tesla notified them correctly, etc. I don't have
a Tesla so I don't care about that. I'm much more interested
in the engineering question of what's going on and what,
if anything, can be done about it.
<P>
The parasitic load thing isn't totally unfamiliar territory, of
course. Any modern vehicle has electronics and those need
power, which they get from the battery. Some do a better
job than others.
My BMW R1200GS motorcycle, for instance, has this 
problem and the manual explicitly tells you to connect it to
a trickle charger (an expensive BMW model, of course, though
you can use a standard one if you're willing to do a tiny
bit of work) if you're not going to drive it for a while,
and I duly plug it into the wall whenever I get home.
If you don't do that, however, the worst you're going to be
out is a new lead-acid battery, which, depending on what
vehicle you have, runs something like
$50-$200, not $40,000.
<P>
However, the level of load we're talking about here
seems awfully high. Remember that we're talking about a
battery capable of powering your car for 200 miles or
so on a single charge (53 kWh). In order to deplete
the battery in 11 weeks (~2000 hrs) you would need 
continuous battery consumption of around 30 W.
For comparison, a MacBook Air has a 50 Wh battery
and gets something like 5 hours on a charge (i.e., ~10 W), so it's
like the Tesla is running three Airs at once, 24x7.
It's natural to ask where all that power is
going, since you don't need anywhere near that
much to keep a vehicle on standby. One likely source seems
to be the battery cooling system, of which Wikipedia
<A HREF="http://en.wikipedia.org/wiki/Tesla_roadster#Battery_system">says</A>
"Coolant is pumped continuously through the ESS both when the car is running and when the car is turned off if the pack retains more than a 90% charge. The coolant pump draws 146 watts."
[Original reference and long discussion <A HREF="http://web.archive.org/web/20090608103323/http://teslafounders.wordpress.com/2008/10/12/wasting-energy-like-two-really-nice-refrigerators/">here</A>.
Note that this post is due to Martin Eberhard, one of the Tesla
Founders but apparently no longer with the company at the time he wrote it. Thanks
Wayback Machine for preserving this!]. 
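<P>
As a sanity check, the arithmetic is simple enough to do in a couple of lines:

```python
# Back-of-the-envelope: a 53 kWh pack drained by parasitic load over
# ~11 weeks implies roughly a 30 W continuous draw.
pack_wh = 53_000               # Roadster pack capacity in watt-hours
idle_hours = 11 * 7 * 24       # 11 weeks is ~1848 hours
load_w = pack_wh / idle_hours
print(round(load_w, 1))        # 28.7
```

So the coolant pump alone (146 W) would more than account for the average draw; presumably it only runs part of the time at lower states of charge.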
<P>
Obviously, if you have a load this high, then you're going
to deplete the battery. The question then becomes whether
there is some way of avoiding permanent battery damage as
the depletion gets to dangerous levels. The natural 
thing to do is install some sort of cutoff that turns
off all power drain once you get close to that level.
This may end up blowing away a bunch of the car's
configuration (though really, it's not that hard to
store that stuff in flash memory, even though 
historically manufacturers have tended not to), but
surely it's cheaper to reboot your car than replace
the entire battery pack. However, if the power is
going to the cooling system and the cooling system
is doing something important, like keeping the
battery from being damaged by excessive heat, then
this may not help. 
<P>
Oh, one more thing. DeGusta claims that Tesla has the capability
to remotely monitor the battery and locate the car, and has
sent people out to fix it:
<P>
<BLOCKQUOTE>
In at least one case, Tesla went even further. The Tesla service
manager admitted that, unable to contact an owner by phone, Tesla
remotely activated a dying vehicle's GPS to determine its location and
then dispatched Tesla staff to go there. It is not clear if Tesla had
obtained this owner's consent to allow this tracking, or if the owner
is even aware that his vehicle had been tracked. Further, the service
manager acknowledged that this use of tracking was not something they
generally tell customers about.
</BLOCKQUOTE>
<P>
If true, that would be... interesting.





]]>
        
    </content>
</entry>

<entry>
    <title>Protecting your encrypted data in the face of coercion</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/02/protecting_your_encrypted_data.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1628</id>

    <published>2012-02-11T17:28:34Z</published>
    <updated>2012-02-11T17:28:55Z</updated>

    <summary>Cryptography is great, but it&apos;s not so great if you get arrested and forced to give up your cryptographic keys. Obviously, you could claim that you&apos;ve forgotten it (remember that you need a really long key to thwart exhaustive search...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="COMSEC" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[Cryptography is great, but it's not so great if you get arrested and
<A
HREF="http://www.zdnet.com/blog/identity/judge-says-defendant-must-decrypt-files-fifth-amendment-not-at-issue/175?tag=mantle_skin;content">forced
to give up your cryptographic keys.</A> Obviously, you could claim that you've forgotten it
(remember that you need a really long key to thwart exhaustive
search attacks, so this isn't entirely implausible.) However, since
you <I>also</I> need to regularly be able to decrypt your data,
this means you need to be able to remember your password, so it's
not entirely plausible either, which means that you might end up
sitting in jail for a long time due to a contempt citation.
This general problem has been floating around the cryptographic
community for a long time, where it's usually referred to as
"rubber hose cryptanalysis", with the idea being that the attacker
will torture you (i.e., beat you with a rubber hose) until you
give up the key. This <A HREF="http://xkcd.com/538/">xkcd</A> comic
sums up the problem. Being technical people, there's been a lot
of work on technical solutions, none of which are really
fantastic. (see the Wikipedia 
<A HREF="http://en.wikipedia.org/wiki/Deniable_encryption">deniable encryption</A>
page for one summary).
<P>
<I>Threat model</I>
<BR>
As usual, it's important to think about the threat model, which in 
this case is more complicated than it initially seems. We assume
that you have some encrypted data and that the attacker has
a copy of that data and of the encryption software you have
used. All they lack is the key. The attacker insists you
hand over the key and has some mechanism for punishing you
if you don't comply. Moreover, we need to assume that the attacker
isn't a sadist, so as long as there's no point in punishing you
further they won't. It's this last point that is the key to
all the technical approaches I know of, namely convincing the
attacker that they are unlikely to learn anything more by
punishing you further, so they might as well stop. Of course,
how true that assumption is probably depends on the precise
nature of the proceedings and how much it costs the attacker
to keep inflicting punishment on you. If you're being waterboarded
in Guantanamo, the cost is probably pretty low, so you probably
need to be pretty convincing.
<P>
<I>Technical Approaches</I>
<BR>
Roughly speaking, there seem to be two strategies for dealing with 
the threat of being legally obliged to give up your cryptographic
keys:
<UL>
<LI>Apparent Compliance/Deniable Encryption.
<LI>Verifiable Destruction
</UL>
<P>
<I>Apparent Compliance/Deniable Encryption</I>
<BR>
The idea behind an apparent compliance strategy is that you
pretend to give up your encryption key, but instead you give
up another key that decrypts the ciphertext to an innocuous
plaintext. More generally, you want a cryptographic
scheme which produces a given ciphertext <I>C</I> which maps onto a series of
plaintexts <I>M_1, M_2, ... M_n</I> via a set of keys 
<I>K_1, K_2, ... K_n</I>. Assume for the moment that
only <I>M_n</I> is sensitive and <I>M_1, ... M_n-1</I> are either fake
or real (but convincing) non-sensitive data. So, when you
are captured, you reveal <I>K_1</I> and claim that you've
decrypted the data. If really pressed, you reveal
<I>K_2</I> and so on.
<P>
The reason that this is supposed to work is that the
attacker is assumed to not know <I>n</I>. However,
since they have a copy of your software, they presumably
know that it's multilevel capable, so they know that
there may be more than one key. They just don't know
if you've given them the last key. All the difficult
cryptographic problems are about avoiding revealing
<I>n</I>. There are fancy cryptographic ways to do this
(the original paper on this is by
<A HREF="http://eprint.iacr.org/1996/002">Canetti,
Dwork, Naor, and Ostrovsky</A>), but
consider one simple construction. Take each message
<I>M_i</I> and encrypt it with <I>K_i</I> to form
<I>C_i</I> and then
concatenate all the results to form <I>C</I>. The
decryption procedure given a single key is to decrypt
each of the sub-ciphertexts in turn and discard any
which don't decrypt correctly (assume there is some
simple integrity check.) Obviously, if you have
a scheme this trivial, then it's easy for an attacker
to see how many keys there are just by insisting you
provide keys for all the data, so you also pad <I>C</I>
with a bunch of random-appearing data which you really
can't decrypt at all, which in theory creates plausible
deniability. This is approximately what
<A HREF="http://www.truecrypt.org/">TrueCrypt</A>
does:
<P>
<BLOCKQUOTE>
Until decrypted, a TrueCrypt partition/device appears to consist of
nothing more than random data (it does not contain any kind of
"signature"). Therefore, it should be impossible to prove that a
partition or a device is a TrueCrypt volume or that it has been
encrypted (provided that the security requirements and precautions
listed in the chapter Security Requirements and Precautions are
followed). A possible plausible explanation for the existence of a
partition/device containing solely random data is that you have wiped
(securely erased) the content of the partition/device using one of the
tools that erase data by overwriting it with random data (in fact,
TrueCrypt can be used to securely erase a partition/device too, by
creating an empty encrypted partition/device-hosted volume within it).
</BLOCKQUOTE>
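<P>
To make the construction concrete, here's a toy Python sketch (my own, not TrueCrypt's actual format). It is emphatically not secure: a hash-based XOR keystream stands in for a real cipher, and fixed-size records leak more than a careful design would. It just illustrates concatenating independently-keyed ciphertexts and padding with random data:

```python
import hashlib, hmac, os

RECORD = 16 + 32  # 16-byte payload + 32-byte HMAC-SHA256 tag

def _keystream(key, idx):
    # Toy keystream; a real design would use an actual cipher.
    return hashlib.sha256(key + idx.to_bytes(4, "big")).digest()[:16]

def seal(messages_and_keys, pad_records=4):
    """One fixed-size record per (message, key) pair (messages up to
    16 bytes), plus records of pure randomness that no key decrypts."""
    records = []
    for idx, (msg, key) in enumerate(messages_and_keys):
        pt = msg.ljust(16, b"\x00")
        ct = bytes(a ^ b for a, b in zip(pt, _keystream(key, idx)))
        tag = hmac.new(key, ct, hashlib.sha256).digest()
        records.append(ct + tag)
    records += [os.urandom(RECORD) for _ in range(pad_records)]
    return b"".join(records)

def open_with(container, key):
    """Return whichever record this key authenticates, if any."""
    for idx in range(len(container) // RECORD):
        rec = container[idx * RECORD:(idx + 1) * RECORD]
        ct, tag = rec[:16], rec[16:]
        if hmac.compare_digest(hmac.new(key, ct, hashlib.sha256).digest(), tag):
            pt = bytes(a ^ b for a, b in zip(ct, _keystream(key, idx)))
            return pt.rstrip(b"\x00")
    return None

# Reveal k_fake under pressure; the record keyed by k_real is
# indistinguishable from the random padding.
k_fake, k_real = os.urandom(16), os.urandom(16)
box = seal([(b"grocery list", k_fake), (b"real secret", k_real)])
print(open_with(box, k_fake))  # b'grocery list'
```

Note the failure mode discussed above: nothing in the container proves there is no further key, but nothing proves there is one, either.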
<P>
How well this works goes back to your threat model. The attacker
knows there is <I>some</I> chance that you haven't revealed
all the keys and maybe if they punish you further you will give them
up. So, whether you continue to get punished depends on their
cost/benefit calculations, which may be fairly unfavorable to
you. The problem is worse yet if the attacker has any way
of determining what correct data looks like. For instance,
in one of the early US court cases on this,
<A HREF="http://en.wikipedia.org/wiki/In_re_Boucher">In re Boucher</A>,
customs agents
had seen (or at least claimed to have seen) child pornography
on the defendant's hard drive and so would presumably have known
a valid decryption from an invalid one. Basically, in any setting
where the attacker has a good idea of what they are looking for
and/or can check the correctness of what you give them, a deniable
encryption scheme doesn't work very well, since the
whole scheme relies on uncertainty about when you have actually
given up the last key.
<P>
<I>Verifiable Destruction</I>
<BR>
An alternative approach that doesn't rely on this kind of ambiguity is
to be genuinely unable to decrypt the data and to have some way
of demonstrating this to the attacker. Hopefully, a rational
attacker won't continue to punish you once you've demonstrated that you
cannot comply. It's the demonstrating part that's the real problem here.
Kahn and Schelling famously
sum up the problem of how to win at "chicken":
<P>
<BLOCKQUOTE>
Some teenagers utilize interesting tactics in playing "chicken."
The "skillful" player may get into the car quite drunk, throwing
whiskey bottles out the window to make it clear to everybody
just how drunk he is. He wears dark glasses so that it is 
obvious that he cannot see much, if anything. As soon as the
car reaches high speed, he takes the steering wheel and throws
it out the window. If his opponent is watching, he has won.
If his opponent is not watching, he has a problem;
</BLOCKQUOTE>
<P>
Of course, as Allan Schiffman once pointed out to me, the really
skillful player keeps a spare steering wheel in his car and throws
that out the window. And our problem is similar: demonstrating that
you have thrown out the data and/or key and that you don't have a spare
lying around somewhere.
<P>
The technical problem then becomes constructing a system that
actually works. There are a huge variety of potential technical
options here, but at a high-level, it seems like solutions 
fall into two broad classes, active and passive. In an active
scheme, you actively destroy the key and/or the data.
For instance, you could have the key written on a piece of paper
which you eat, or there is a thermite charge on your computer
which melts it to slag when you press a button. In a passive
system, by contrast, no explicit action is required by you,
but you have some sort of deadman switch which causes the
key/data to be destroyed if you're captured. So, you might 
store the data in a system like <A HREF="http://vanish.cs.washington.edu/">Vanish</A> (although there are real questions about the security of Vanish per se),
or you have the key stored offsite with some provider who promises
to delete the key if you are arrested or if you don't check in every
so often.
<P>
I'm skeptical of how well active schemes can be made to work:
once it becomes widely known how any given commercial scheme works,
attackers will take steps to circumvent it. For instance,
if there is some button you press to destroy your data,
they might taser you and ask questions later to avoid you
pressing it. Maybe someone can convince me otherwise, but this
leaves us mostly with passive schemes (or semi-passive schemes
as discussed in a bit.) Consider the following strawman scheme:
<P>
<BLOCKQUOTE>
Your data is encrypted in the usual way, but part of the
encryption key is stored offsite in some location inaccessible
to the attacker (potentially outside their legal jurisdiction
if we're talking about a nation-state type attacker).
The encryption key is stored in a hardware security module,
and if the key storage provider doesn't hear from you 
(and you have to prove possession of some key) every
week (or two weeks or whatever), they zeroize the HSM, thus
destroying your key. It's obviously easy to build a system
like this where the encryption software automatically contacts
the key storage provider, proves possession, and thus resets
their deadman timer, so as long as you use your files every
week or so, you're fine.
</BLOCKQUOTE>
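<P>
The provider side of this strawman is simple to sketch in a few lines of JavaScript. This is a toy model, not a real HSM interface; the names here (<code>EscrowedKey</code>, <code>checkIn</code>) are made up for illustration, and a real deployment would verify a signed check-in message rather than comparing a shared secret:

```javascript
// Toy model of the key storage provider's deadman timer.
// A real system would keep the key share inside an HSM and verify a
// cryptographic proof of possession instead of a bare secret.
class EscrowedKey {
  constructor(keyShare, ownerSecret, intervalMs) {
    this.keyShare = keyShare;       // offsite portion of the encryption key
    this.ownerSecret = ownerSecret; // stand-in for a proof-of-possession key
    this.intervalMs = intervalMs;   // check-in interval, e.g. one or two weeks
    this.deadline = Date.now() + intervalMs;
  }

  // Owner proves possession; on success the deadman timer resets.
  checkIn(proof, now = Date.now()) {
    if (this.keyShare === null || now > this.deadline) return false;
    if (proof !== this.ownerSecret) return false;
    this.deadline = now + this.intervalMs;
    return true;
  }

  // Run periodically by the provider: zeroize once the timer expires.
  expire(now = Date.now()) {
    if (now > this.deadline) this.keyShare = null; // "zeroize the HSM"
  }
}
```

The point of the sketch is that no action by the owner is required to destroy the key: missing enough check-ins is sufficient, which is exactly what happens if the owner is in custody.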
<P>
So, if you're captured, you just need to hold out until the
deadman timer expires and then the data really isn't recoverable
by you or anyone else. Of course, "not recoverable" isn't the
same as "provably not recoverable", since you could have kept
a backup copy of the keys somewhere&mdash;though the software could
be designed so that this is inconvenient, thus giving
some credibility to the argument that you did not. Moreover,
this design is premised on the assumption that there is actually
somewhere that you could store your secret data that the attacker
couldn't get it from. This may be reasonable if the attacker
is the local police, but perhaps less so if the attacker
is the US government. And of course any deadman system is
hugely brittle: if you forget your key or just don't refresh
for a while, your data is <I>gone</I>, which might be somewhat
inconvenient.
<P>
One thing that people often suggest is to have some sort of
limited-try scheme. The idea here is that the encryption
system automatically erases the data (and/or a master key)
if the wrong password/key is entered enough times. So, if you
can just convincingly lie <I>N</I> times and get the attacker
to try those keys, then the data is gone. Alternately, you
could have a "coercion" key which deletes all the data.
It's clear that you can't build anything like this in a software-only
system: the attacker will just image the underlying encrypted
data and write their own decryption software which doesn't
have the destructive feature. You can, however, build such
a system using hardware security modules (assume for now that
the HSM can't be broken directly.)
This is sort of a semi-passive scheme in that you are intentionally
destroying the data, but the destruction is triggered by the
attacker keying in the alleged encryption key.
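<P>
The limited-try logic itself is trivial; the whole trick is that it has to run inside the HSM, where the attacker can't read or reset the state. A toy model (hypothetical names, standing in for whatever interface a real HSM would expose):

```javascript
// Toy model of a limited-try HSM with a "coercion" key.
// The destructive behavior only matters because the attacker cannot
// read masterKey or triesLeft except by going through unlock().
class LimitedTryHSM {
  constructor(masterKey, realPin, coercionPin, maxTries) {
    this.masterKey = masterKey;
    this.realPin = realPin;
    this.coercionPin = coercionPin;
    this.maxTries = maxTries;
    this.triesLeft = maxTries;
  }

  unlock(pin) {
    if (this.masterKey === null) return null;  // key already destroyed
    if (pin === this.coercionPin) {            // duress code: wipe immediately
      this.masterKey = null;
      return null;
    }
    if (pin === this.realPin) {
      this.triesLeft = this.maxTries;          // correct key: reset the counter
      return this.masterKey;
    }
    if (--this.triesLeft === 0) this.masterKey = null; // too many wrong guesses
    return null;
  }
}
```

A software-only version of this is worthless for exactly the reason above: the attacker just copies the ciphertext and runs their own decryption code against it.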
<P>
The big drawback with any verifiable destruction system is that
it leaves evidence that you could have complied but didn't;
in fact, that's the whole point of the system. But this means
that the attacker's countermove is to credibly commit to punishing
you for noncompliance after the fact. I don't think this question
has ever been faced for crypto, but it has been faced in other
evidence-gathering contexts. Consider, for instance, the case
of driving under the influence: California requires you to
take a breathalyzer or blood test as a condition of driving
<A HREF="http://www.dmv.ca.gov/pubs/vctop/d11_5/vc23612.htm">[*]</A>,
and refusal carries penalties comparable to those for being
convicted of DUI. One could imagine a more general legal regime
in which actively or passively allowing your encrypted data
to be destroyed once you have been arrested was itself illegal,
with a penalty large enough that it would almost
never be worth refusing to comply
(obviously the situation would be different in extra-legal 
settings, but the general idea seems transferable.) I'll defer
to any lawyers reading this about how practical such a law
would actually be.
<P>
<I>Bottom Line</I>
<BR>
Obviously, neither of these classes of solution seems entirely 
satisfactory from the perspective of someone who is trying to keep
their data secret. On the other hand, it's not clear that this is
really a problem that admits of a good technical solution.
]]>
        
    </content>
</entry>

<entry>
    <title>GIT Y U NO...</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/01/git_y_u_no.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1627</id>

    <published>2012-01-24T03:01:34Z</published>
    <updated>2012-01-24T03:04:12Z</updated>

    <summary>You have to have used git to really understand this one, but... [16] git checkout f4a56 Note: checking out &apos;f4a56&apos;. You are in &apos;detached HEAD&apos; state. You can look around, make experimental changes and commit them, and you can discard...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Software" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[You have to have used <A HREF="http://www.git-scm.com/">git</A> to
really understand this one, but...
<PRE>
[16] git checkout f4a56
Note: checking out 'f4a56'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at f4a560b... Foo
</PRE>
As you may have gathered from this long warning,
you most likely don't want to be in a detached
HEAD state; you probably just meant to create a branch or to roll
back a commit but typed the wrong thing. Which is why
there are <A HREF="http://alblue.bandlem.com/2011/08/git-tip-of-week-detached-heads.html">lots</A>
<A HREF="http://eclipsesource.com/blogs/2011/05/29/life-lesson-be-mindful-of-a-detached-head/">of</A>
<A HREF="http://softwaregravy.wordpress.com/2010/11/05/detached-head-in-git/">pages</A> about
what this means and how to get yourself out. My contribution to this literature can be
found below the fold.
<P>





]]>
        <![CDATA[<IMG SRC="http://b.static.memegenerator.net/cache/instances/500x/12/13243/13561568.jpg">
]]>
    </content>
</entry>

<entry>
    <title>What is this all this crap in my wallet?</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/01/what_is_this_all_this_crap_in.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1626</id>

    <published>2012-01-23T02:35:06Z</published>
    <updated>2012-01-23T02:39:31Z</updated>

    <summary>On my way to Red Rock today to do some work, I looked in my wallet to see if I had enough money to afford my hot chocolate (paying for a $3.50 drink with a credit card is a pretty...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Misc" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[On my way to <A HREF="http://www.redrockcoffee.org/">Red Rock</A>
today to do some work, I looked in my wallet to see if I had enough
money to afford my hot chocolate (paying for a $3.50 drink with a 
credit card is a pretty lame move). Here's what I found:
<P>
<A HREF="http://educatedguesswork.org/blog-images/money.jpg">
<IMG SRC="http://educatedguesswork.org/blog-images/money.jpg" WIDTH="400">
</A>
<P>
After some sorting, it comes out as follows...
<TABLE CELLPADDING="10" BORDER="1">
<TR>
  <TD><B>Currency</B></TD>
  <TD><B>Count</B></TD>
  <TD><B>Value (nominal)</B></TD>
  <TD><B>Value (USD)</B></TD>
</TR>

<TR>
  <TD>USD</TD>
  <TD>3</TD>
  <TD>3</TD>
  <TD>3</TD>
</TR>

<TR>
  <TD>CAD</TD>
  <TD>7</TD>
  <TD>100</TD>
  <TD>98.55</TD>
</TR>

<TR>
  <TD>CZK</TD>
  <TD>2</TD>
  <TD>2100</TD>
  <TD>106.40</TD>
</TR>

<TR>
  <TD>GBP</TD>
  <TD>1</TD>
  <TD>10</TD>
  <TD>15.55</TD>
</TR>

<TR>
  <TD>EUR</TD>
  <TD>1</TD>
  <TD>20</TD>
  <TD>25.79</TD>
</TR>

<TR>
  <TD>INR</TD>
  <TD>1</TD>
  <TD>100</TD>
  <TD>1.99</TD>
</TR>

<TR>
  <TD>RUB</TD>
  <TD>9</TD>
  <TD>1570</TD>
  <TD>49.97</TD>
</TR>

<TR>
  <TD>Total</TD>
  <TD>24</TD>
  <TD>-</TD>
  <TD>301.25</TD>
</TR>
</TABLE>
<P>
In other words, out of 24 total pieces of paper valued at over $300,
I had three spendable pieces of paper valued at $3. Oh, and a couple
of United beverage vouchers which expire in 9 days.
I ended up going to the ATM.




]]>
        
    </content>
</entry>

<entry>
    <title>Does DNSSEC really interfere with SOPA/PIPA?</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/01/does_dnssec_really_interfere_w.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1625</id>

    <published>2012-01-22T06:53:22Z</published>
    <updated>2012-01-22T06:58:16Z</updated>

    <summary>You&apos;ve of course heard by now that much of the Internet community thinks that SOPA and PIPA are bad, which is why on January 16, Wikipedia shut itself down, Google had a black bar over their logo, etc. This opinion...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="COMSEC" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="DNS" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[You've of course heard by now that much of the Internet community
thinks that <A
HREF="http://en.wikipedia.org/wiki/Stop_Online_Piracy_Act">SOPA</A>
and <A HREF="http://en.wikipedia.org/wiki/PROTECT_IP_Act">PIPA</A> are bad, which is why on January 16, Wikipedia <A
HREF="http://wikimediafoundation.org/wiki/English_Wikipedia_anti-SOPA_blackout">shut
itself down</A>, Google had a black bar over their logo, etc. This
opinion is shared by much of the Internet technical
community, and in particular much has been made of the argument
by Crocker et al. that
DNSSEC and PIPA are <A
HREF="http://www.circleid.com/pdf/PROTECT-IP-Technical-Whitepaper-Final.pdf">incompatible</A>. A number of the authors of the statement linked above
are friends of mine, and I agree with much of what they write in
it, but I don't find this particular line of argument that
convincing.
<P>
<I>Background</I>
<BR>
As background, DNS has two kinds of servers:
<UL>
<LI>Authoritative servers, which host the records for a given
domain. 
<LI>Recursive resolvers, which are used by end-users for
name mapping. Typically they also serve as a cache.
</UL>
<P>
A typical configuration is for end-user machines to use 
<A HREF="http://en.wikipedia.org/wiki/Dhcp">DHCP</A> to
get their network configuration data, including IP address
and the DNS recursive resolvers to use. Whenever your
machine joins a new network, it gets whatever resolver that
network is configured for, which is frequently whatever
resolver is provided by your ISP. One of the requirements
of some iterations of PIPA and SOPA has been that recursive
resolvers would have to block resolution of domains
designated as bad. Here's the relevant text from <A HREF="http://www.govtrack.us/congress/billtext.xpd?bill=s112-968">PIPA</A>:
<BLOCKQUOTE>
(i) IN GENERAL- An operator of a nonauthoritative domain name system server shall take the least burdensome technically feasible and reasonable measures designed to prevent the domain name described in the order from resolving to that domain name's Internet protocol address, except that--
<BLOCKQUOTE>
(I) such operator shall not be required--
<BLOCKQUOTE>
(aa) other than as directed under this subparagraph, to modify its network, software, systems, or facilities;
<BR>
(bb) to take any measures with respect to domain name lookups not performed by its own domain name server or domain name system servers located outside the United States; or
<BR>
(cc) to continue to prevent access to a domain name to which access has been effectively disabled by other means; and
...
</BLOCKQUOTE>
</BLOCKQUOTE>
(ii) TEXT OF NOTICE.-The Attorney General shall prescribe the text of the notice displayed to users or customers of an operator taking an action pursuant to this subparagraph. Such text shall specify that the action is being taken pursuant to a court order obtained by the Attorney General.
</BLOCKQUOTE>
<P>
This text has been widely interpreted as requiring operators of recursive
resolvers to do one of two things:
<UL>
<LI>Simply cause the name resolution operation to fail.
<LI>Redirect the name resolution to the notice specified in (ii).
</UL>
<P>
The question then becomes how one might implement these.
<P>
<I>Technical Implementation Mechanisms</I>
<BR>
Obviously if you can redirect the name, you can cause the
resolution to fail by returning a bogus address, so let's look
at the redirection case first. Crocker et al. argue that DNSSEC is designed
to secure DNS data end-to-end to the user's computer. Thus, any
element in the middle which modifies the DNS records to redirect
traffic to a specific location will break the signature.
Technically, this is absolutely correct. However, it is mitigated
by two considerations.
<P>
First, the vast majority of client software doesn't do DNSSEC
resolution. Instead, if you're resolving some DNSSEC-signed
name and the signature is being validated at all it's most likely
being validated by some DNSSEC-aware recursive resolver,
like the ones Comcast has
<A HREF="http://www.dnssec.comcast.net/">recently deployed</A>.
Such a resolver can easily modify whatever results it is
returning and that change will be undetectable to the vast
majority of client software (i.e., to any non-DNSSEC software).<small><sup>1</sup></small> So, at present, a rewriting requirement looks
pretty plausible. 
<P>
Crocker et al. would no doubt tell you that this is a transitional
stage and that eventually we'll have end-to-end DNSSEC, so
it's a mistake to legislate new requirements
that are incompatible with that. If a lot of
endpoints start doing DNSSEC validation, then ISPs can't
rewrite undetectably. They can still make names fail to
resolve, though, via a variety of mechanisms. About this,
Crocker et al. write:
<P>
<BLOCKQUOTE>
Even DNS filtering that did not contemplate redirection would pose
security challenges. The only possible DNSSEC-compliant response to a
query for a domain that has been ordered to be filtered is for the
lookup to fail. It cannot provide a false response pointing to another
resource or indicate that the domain does not exist. From an
operational standpoint, a resolution failure from a nameserver subject
to a court order and from a hacked nameserver would be
indistinguishable. Users running secure applications have a need to
distinguish between policy-based failures and failures caused, for
example, by the presence of an attack or a hostile network, or else
downgrade attacks would likely be prolific.[12]
<P>
..
<P>
12. If two or more levels of security exist in a system, an attacker
will have the ability to force a "downgrade" move from a more secure
system function or capability to a less secure function by making it
appear as though some party in the transaction doesn't support the
higher level of security. Forcing failure of DNSSEC requests is one
way to effect this exploit, if the attacked system will then accept
forged insecure DNS responses. To prevent downgrade attempts, systems
must be able to distinguish between legitimate failure and malicious
failure.
</BLOCKQUOTE>
<P>
I sort of agree with the first part of this, but I don't really agree
with the footnote. Much of the problem is that it's generally easy
for network-based attackers to generate situations that simulate
legitimate errors and/or misconfiguration. Cryptographic authentication
actually makes this worse, since there are so many ways to 
screw up cryptographic protocols. 
Consider the case where the attacker overwrites
the response with a random signature. Naturally the signature
is unverifiable, in which case the resolver's only response is
to reject the records, as prescribed by the DNSSEC standards.
At this point you have effectively blocked resolution of the
name. It's true that the resolver knows that something is wrong
(though it can't distinguish between attack and misconfiguration),
but so what? DNSSEC isn't designed to allow name resolution in the
face of DoS attack by in-band active attackers. Recursive
resolvers aren't precisely in-band, of course, but
the ISP as a whole is in-band, which
is one reason people have talked about ISP-level 
DNS filtering for all traffic, not just filtering at recursive
resolvers.
<P>
Note that I'm not trying to say here that I think that SOPA and PIPA
are good ideas, or that there aren't plenty of techniques for people
to use to evade them. I just don't think that it's really the case
that you can't simultaneously have DNSSEC and network-based DNS
filtering.
<P>
&nbsp;
<P>
<small><sup>1.</sup> Technical note: As I understand it, a
client resolver that wants to validate signatures
itself needs to send the DO flag (to get the recursive
resolver to return the DNSSEC records) and the CD flag
(to suppress validation by the recursive resolver).
This means that the recursive resolver can tell when it's
safe to rewrite the response without being detected.
If DO isn't set, then the client won't be checking signatures.
If CD isn't set, then the recursive resolver can claim
that the name was unvalidatable and generate whatever error
it would have generated in that case (Comcast's deployment
seems to generate SERVFAIL for at least some types of misconfiguration.)
</small>
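<P>
<small>The footnote's case analysis can be written out as a tiny decision function (illustrative only; <code>filteringStrategy</code> is my name for it, not anything from the DNS specs):

```javascript
// Given the DO and CD flags on a client's query, what can a filtering
// recursive resolver do without the client detecting tampering?
function filteringStrategy(doBit, cdBit) {
  if (!doBit) return "rewrite";   // client never checks signatures
  if (!cdBit) return "servfail";  // resolver validates; it can just claim failure
  return "detectable";            // client validates itself; tampering shows
}
```

Only in the last case (validating client, DO and CD both set) does the client learn that <I>something</I> is wrong, and even then it can't tell a court order from an attack.</small>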











]]>
        
    </content>
</entry>

<entry>
    <title>The Supremes on the failure of broadcast content controls</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2012/01/the_supremes_on_the_failure_of.html" />
    <id>tag:www.educatedguesswork.org,2012://1.1624</id>

    <published>2012-01-11T16:10:59Z</published>
    <updated>2012-01-11T16:11:15Z</updated>

    <summary>In Dahlia Lithwick&apos;s report on FCC v. Fox (about the FCC&apos;s TV indecency policy), she writes: Justice Stephen Breyer raises a question about why the ABC ass case is being heard together with the fleeting-expletives case. Justice Ginsburg asks whether...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Misc" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[In Dahlia Lithwick's <A HREF="http://www.slate.com/articles/news_and_politics/supreme_court_dispatches/2012/01/supreme_court_and_fcc_s_fleeting_expletives_policy_what_exactly_counts_as_indecent_on_tv_.single.html#pagebreak_anchor_2">report</A> on FCC v. Fox (about the FCC's TV indecency
policy), she writes:
<BLOCKQUOTE>
Justice Stephen Breyer raises a question about why the ABC ass case is
being heard together with the fleeting-expletives case. Justice
Ginsburg asks whether Hair could be broadcast on network television
(Verrilli: "Serious questions") and then whether the opera Metropolis
could be broadcast (Verrilli: "Context-based approach"). Then Justice
Anthony Kennedy interrupts the parade of naked horrible to clarify:
"What you're saying is that there is a public value in having a
particular segment of the media with different standards than other
segments." Verrilli replies that, yes, this is about preserving "a
safe haven where if parents want to put their kids down in front of
the television at 8:00 p.m. they're not going to have to worry about
whether the kids are going to get bombarded with curse words or
nudity."
<P>
Because if you want that, you can find it in the back seat of my car,
at rush hour when we're late for Kung Fu. Just ask my children.
<P>
Kennedy replies that the V-chip is available and that "you ask your
15-year-old, or your 10-year-old, how to turn off the chip. They're
the only ones that know how to do it."
</BLOCKQUOTE>
<P>
I'm not saying this isn't true--though I rather suspect it's
more likely that parents don't know how to turn <I>on</I> the
V-chip [explanation <A HREF="http://en.wikipedia.org/wiki/V-chip">here</A>,
in case you don't know] than that they don't know how to turn it
off. However, I think this discussion illustrates pretty clearly
the confusion over the problem that people are trying to solve.
(The terminology "threat model" as applied to children probably
sounds funny to non-parents.)
In any case, there are two different things one might be trying
to accomplish with respect to potentially objectionable content:
<UL>
<LI>Prevent children from inadvertently accessing objectionable content.
<LI>Prevent children from intentionally accessing objectionable content.
</UL>
<P>
If your objective is the former, then the V-chip works fine (except
for the horrible UI); you just configure your device to suppress
objectionable content. The sort of content-based regulation the FCC is engaging 
in works as well, assuming you don't let your kids watch TV except
in the "safe" period, but it's a very inefficient mechanism
compared to the V-chip, being both overbroad (affecting everyone,
including people who don't have children) and not very effective,
as it applies only to broadcast TV.
<P>
On the other hand, if your model is to prevent children from intentionally
accessing objectionable content, and you further expect them to attempt
to bypass content controls, then restrictions on broadcast TV don't
do very much given that (a) if you have cable your kids can just
tune to unrestricted channels and (b) a large number of other sources of such content
exist on the Internet. Blocking the tiny sliver of such content you still
get through broadcast TV mostly looks silly and anachronistic. Though
I guess it's less silly if you get hit with a huge fine for breaching
the rather unclear rules.
]]>
        
    </content>
</entry>

<entry>
    <title>Web form complaints</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2011/12/web_form_complaints.html" />
    <id>tag:www.educatedguesswork.org,2011://1.1623</id>

    <published>2012-01-01T01:41:36Z</published>
    <updated>2012-01-01T01:51:46Z</updated>

    <summary>Spent some of today getting my 2011 charitable donations out of the way, so I&apos;ve been experiencing a lot of different Web forms. Remember, these people want my money, so it would be nice if they didn&apos;t make the experience...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Software" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[Spent some of today getting my 2011 charitable
donations out of the way, so I've been experiencing a lot of different
Web forms. Remember, these people want my money, so it would be nice
if they didn't make the experience so irritating. On that basis,
here are some things not to do:
<UL>
<LI>Refuse to accept spaces or dashes in my credit card number,
phone number, social security number, etc. Don't force
me into your stupid format; parse whatever I send you.
Here, let me help. The following JS code strips out spaces and dashes:
<code>input = input.replace(/[ \-]/g, "");</code> For an appropriately huge
consulting fee I'll show you how to replace periods and pluses, too.
<LI>Force me to tell you what kind of credit card I have. This
information is encoded in the leading digits of the credit card
number. This <A HREF="http://en.wikipedia.org/wiki/Bank_card_number">table</A>
may help. I know that things change, but seriously, you could
at least try to guess.
<LI>Force me to select "USA" out of the end of an incredibly long
drop-down list of countries. It's true that you can generally
determine someone's country by looking at their IP address, and
I can certainly understand not wanting to bother with that; but
if most of your customers are American, it's silly to force them
to scroll all the way to the end out of a misguided notion of 
national equity. Make my life easy and put the USA as the first item
in the list, people. 
<LI>Make me enter my state and my zip code. In nearly all cases,
the zip code <A HREF="http://en.wikipedia.org/wiki/ZIP_code#Primary_State_Prefixes">encodes the state</A>.
</UL>
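<P>
Card-type detection from the leading digits is a few more lines of the same kind of JavaScript. (Prefix ranges abbreviated from the table linked above; issuers add ranges over time, so treat these as illustrative, not authoritative.)

```javascript
// Guess the card network from the leading digits (the IIN).
// Simplified prefixes: Visa 4, MasterCard 51-55, Amex 34/37, Discover 6011.
function cardType(number) {
  const n = number.replace(/[ \-]/g, ""); // same normalization as above
  if (/^4/.test(n)) return "Visa";
  if (/^5[1-5]/.test(n)) return "MasterCard";
  if (/^3[47]/.test(n)) return "American Express";
  if (/^6011/.test(n)) return "Discover";
  return "unknown";
}
```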
<P>
Also, not a Web form issue, but I also wish there were some way to
tell these organizations not to ask me for donations during the year.
I give once a year, at the end of the year. It's just a matter of
convenience. Sending me a bunch of physical letters asking for money
just wastes your fund raising dollars and my time.
]]>
        
    </content>
</entry>

<entry>
    <title>Somelliers for beer... wait, what?</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2011/12/somelliers_for_beer_wait_what.html" />
    <id>tag:www.educatedguesswork.org,2011://1.1622</id>

    <published>2011-12-23T05:24:50Z</published>
    <updated>2011-12-23T05:25:14Z</updated>

    <summary>Mark Garrison has a rather odd article in Slate arguing that we need expert advice to order beer in restaurants: It&apos;s a busy night at the D.C. restaurant Birch &amp; Barley, as well as its casual upstairs sister joint, ChurchKey....</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Food" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[Mark Garrison has a rather odd <A HREF="http://www.slate.com/articles/life/drink/2011/12/beer_sommeliers_why_beer_deserves_the_same_kind_of_expertise_as_wine_.html">article</A> in
Slate arguing that we need expert advice to order beer in
restaurants:
<BLOCKQUOTE>
It's a busy night at the D.C. restaurant Birch & Barley, as well as
its casual upstairs sister joint, ChurchKey. Greg Engert is guiding me
through his beverage list with all the knowledge, talent, and grace
one would expect from an award-winning sommelier. With a couple crisp
queries, he learned enough to make some intriguing recommendations. He
didn't flaunt his knowledge about food and drink, but when I had
questions, he gave precise answers about the flavor, aroma, producer,
pairing potential, and even the history of the available
beverages. Fortunately, there was no attempt at upselling, the odious
sin far too many sommeliers commit, a big reason why many diners are
suspicious of the entire profession.
<P>
...
<P>
There may be agreement in the industry that great beer deserves
top-notch service, but there's not yet a consensus on what that
means. In fact, there's not even agreement on what to call a
well-trained beer server. Engert's job title is beer director, but he
doesn't mind being called a beer sommelier. (He has put some thought
into this.) Some in the beer community find this term problematic,
since "sommelier" is tied to the wine world and may imply a
professional certification that doesn't exist.
<P>
...
<P>
The program's website states the claim that wine sommeliers might have
known enough to choose a good beer for you a few decades ago, but now
"the world of beer is just as diverse and complicated as wine. As a
result, developing true expertise in beer takes years of focused study
and requires constant attention to stay on top of new brands and
special beers." So Daniels set out to build a testing and
certification program to create a standard level of knowledge and
titles that would signify superior beer knowledge to consumers,
similar to the way a Court of Master Sommeliers credential does for
wine.
</BLOCKQUOTE>
<P>
Look, I love beer, don't like wine, and am well aware of the lousy
beer service one typically gets at restaurants, so I'm generally in
favor of anything that improves beer quality. But the main problem
isn't that there's nobody at the restaurant who understands beer. It's
that the beer selection at restaurants sucks. To take one recent
example, I ate at the Los Altos Grill the other night: they had a page
of wines and three beers on tap. This isn't uncommon; in fact it's not
uncommon for restaurants to have solid wine lists but only bottled
beer, and only a few varieties of bottles at that. The question I have
for waiters isn't "what beer do you recommend", but rather "is Peroni
really the best beer you have?"
<P>
In large part, the culprit here is customer demand: people who
eat at high-end restaurants tend to prefer wine to beer, so 
those restaurants naturally have lousy beer selections. But I 
suspect that the chemistry of beer has a lot to do with it
as well. Wine can last years in the bottle&mdash;and many
wines are better when aged&mdash;but bottled beer has
a shelf life measured in months, with draft beer going bad
<A HREF="http://www.kegworld.com/draughtbeerissues.htm">in a few weeks</A>.
So, unlike wine, you can't afford to stock any beer that people
don't order fairly frequently, since there's too high a chance
it will go bad before someone orders it. I suspect that
this is why most restaurants keep such a small beer selection.
(Anyone with contacts in the restaurant business should feel
free to chime in here.)
<P>
The major exception here is restaurants that specialize in beer
(Garrison's example of Birch & Barley advertises
itself as "a completely unique food and beer experience celebrating a
full spectrum of styles, traditions, regions and flavors"). If
you're that kind of restaurant you probably get enough
volume to keep a large inventory without things getting too stale&mdash;though
I do wonder what the oldest bottle on their shelves tastes like.








]]>
        
    </content>
</entry>

<entry>
    <title>Do we need DNS confidentiality?</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2011/12/do_we_need_dns_confidentiality.html" />
    <id>tag:www.educatedguesswork.org,2011://1.1621</id>

    <published>2011-12-18T14:33:25Z</published>
    <updated>2011-12-18T14:34:05Z</updated>

    <summary>The first step in most Internet communications is name resolution: mapping a text-based hostname (e.g., www.educatedguesswork.org) to a numeric IP address (e.g,, 69.163.249.211). This mapping is generally done via the Domain Name System (DNS), a global distributed database. The thing...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="COMSEC" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="DNS" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[The first step in most Internet communications is name resolution:
mapping a text-based hostname
(e.g., <code>www.educatedguesswork.org</code>) to a numeric IP address
(e.g., <code>69.163.249.211</code>). This mapping is generally done
via the <A HREF="http://en.wikipedia.org/wiki/Domain_Name_System">Domain
Name System</A> (DNS), a global distributed database. The thing
you need to know about the security of the DNS is that it doesn't
have much: records are transmitted without any cryptographic
protection, either for confidentiality or integrity. The 
official IETF security mechanism, <A HREF="http://www.dnssec.net/">DNSSEC</A> is based on digital signatures
and so offers integrity, but not confidentiality, and in any case has
seen extremely <A HREF="http://secspider.verisignlabs.com/growth.html">limited deployment</A>.
Recently, OpenDNS rolled out <A HREF="http://www.opendns.com/technology/dnscrypt/">DNSCrypt</A>,
which provides both encrypted and authenticated communications between your machine
and a DNSCrypt-enabled resolver such as the one operated by OpenDNS. DNSCrypt is
based on DJB's <A HREF="http://dnscurve.org/">DNSCurve</A> and I've talked about comparisons between DNSSEC and DNSCurve <A HREF="http://www.educatedguesswork.org/2010/02/some_notes_on_dnscurve.html">before</A>,
but what's interesting here is that OpenDNS is really pushing the confidentiality
angle:
<P>
<BLOCKQUOTE>
In the same way the SSL turns HTTP web traffic into HTTPS encrypted
Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS
traffic that is secure from eavesdropping and man-in-the-middle
attacks.  It doesn't require any changes to domain names or how they
work, it simply provides a method for securely encrypting
communication between our customers and our DNS servers in our data
centers.  We know that claims alone don't work in the security world,
however, so we've opened up the source to our DNSCrypt code base and
it's available on GitHub.
<P>
DNSCrypt has the potential to be the most impactful advancement in
Internet security since SSL, significantly improving every single
Internet user's online security and privacy.
</BLOCKQUOTE>
<P>
Unfortunately, I don't think this argument really holds up under examination.
Remember that DNS is mostly used to map names to IP addresses. Once you have
the IP address, you need to actually do something with it, and generally
that something is to connect to the IP address in question, which tends
to leak a lot of the information you encrypted.
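To make this concrete, here's a toy sketch (Python, standard library only; not anything OpenDNS actually ships) of what a DNS query looks like on the wire per RFC 1035. The hostname travels as plaintext length-prefixed labels:

```python
import struct

def build_dns_query(hostname, qtype=1):
    """Build a minimal DNS query packet (RFC 1035). qtype=1 is an A record."""
    header = struct.pack(">HHHHHH",
                         0x1234,   # transaction ID (arbitrary)
                         0x0100,   # flags: standard query, recursion desired
                         1, 0, 0, 0)  # one question, no other records
    # QNAME: each dot-separated label is length-prefixed, terminated by 0x00
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question

packet = build_dns_query("www.educatedguesswork.org")
# The hostname labels sit in the packet as readable plaintext:
assert b"educatedguesswork" in packet
```

Without DNSCrypt, this is byte-for-byte what a passive observer sees leaving your machine; with DNSCrypt, only the query is hidden, not what you do with the answer.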
<P>
Consider the (target) case where we have DNSCrypt between your local
stub resolver and some recursive resolver somewhere on the
Internet. The class of attackers this protects against is those which
have access to traffic on the wire between you and the resolver.
Now, if I type <code>http://www.educatedguesswork.org/</code> into
my browser, what happens is that the browser tries to 
resolve <code>www.educatedguesswork.org</code>, and 
what the attacker principally learns is (1) the hostname I am
querying for and (2) the IP address(es) that were returned. 
The next thing that happens, however, is that my browser forms
a TCP connection to the target host and sends something like this:
<P>
<pre>
GET / HTTP/1.1
Host: www.educatedguesswork.org
Connection: keep-alive
Cache-Control: max-age=0
...
</pre>
<P>
Obviously, each IP packet contains the IP address of the target and
the <code>Host</code> header contains the target host name, so
any attacker on the wire learns both. And as
this information is generally sent over the same access network
as the DNS request, the attacker learns all the information
they would have had if they had been able to observe my DNS query.
[Technical note: when Tor
is <A HREF="https://trac.torproject.org/projects/tor/wiki/doc/Preventing_Tor_DNS_Leaks">configured
properly</a>, DNS requests are routed over Tor, rather than over the
local network. If that's not true, you have some rather more serious
problems to worry about than DNS confidentiality.]
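To illustrate just how little work the attacker has to do, here's a toy observer (a Python sketch; the function name is mine, not from any real tool) that recovers the hostname from a captured plaintext request like the one above:

```python
def host_from_http_request(payload):
    """Recover the target hostname from a plaintext HTTP request the way a
    passive on-path observer would: just read the Host header."""
    for line in payload.split(b"\r\n")[1:]:
        if line.lower().startswith(b"host:"):
            return line.split(b":", 1)[1].strip().decode("ascii")
    return None

captured = (b"GET / HTTP/1.1\r\n"
            b"Host: www.educatedguesswork.org\r\n"
            b"Connection: keep-alive\r\n\r\n")
```

One string search over the TCP payload, and the attacker has everything the encrypted DNS query was hiding.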
<P>
"You idiot," I can hear you saying, "if you wanted confidentiality you
should have used SSL/TLS." That's true, of course, but SSL/TLS barely
improves the situation. Modern browsers provide the target host name
of the server in question in the clear in the TLS handshake using
the <A HREF="http://en.wikipedia.org/wiki/Server_Name_Indication">Server
Name Indication</A> (SNI) extension (you can check whether your browser
does this <A HREF="https://sni.velox.ch/">here</A>), so the attacker
learns exactly the same information whether you are using SSL/TLS or 
not. Even if your browser doesn't provide SNI, the hostname
of the server is generally in the server's certificate. Pretty
much the only time that a useful (to the attacker) hostname isn't
in the certificate is when there are a lot of hosts hidden behind
the same wildcard certificate, such as when your domain is
hosted using Heroku's "piggyback SSL". But this kind of certificate
sharing only works well if your domain is subordinated behind
some master domain (e.g., <code>example-domain.heroku.com</code>),
which isn't really what you want if you're going to offer a serious
service.
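For the curious, here's a sketch (Python, standard library only) of how the server_name extension is encoded per RFC 6066. The point to notice is that the hostname appears verbatim, unencrypted, in the ClientHello:

```python
import struct

def sni_extension(hostname):
    """Encode a TLS server_name extension (RFC 6066) as it appears,
    in the clear, inside the ClientHello."""
    name = hostname.encode("ascii")
    entry = struct.pack(">BH", 0, len(name)) + name       # name_type=host_name
    server_name_list = struct.pack(">H", len(entry)) + entry
    # extension_type=0 (server_name), then extension_data length
    return struct.pack(">HH", 0, len(server_name_list)) + server_name_list

ext = sni_extension("www.educatedguesswork.org")
assert b"www.educatedguesswork.org" in ext  # hostname is verbatim plaintext
```

So even when every byte of the application data is encrypted, the handshake itself announces where you're going.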
<P>
This isn't to say that one couldn't design a version of SSL/TLS that
didn't leak the target host information quite so
aggressively&mdash;though it's somewhat harder than it looks&mdash;but
even if you were to do so, it turns out to be possible to learn a lot
about which sites you are visiting via traffic analysis (see, for
instance, <A HREF="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.3.1201&rep=rep1&type=pdf">here</A>
and <A HREF="http://www.research.microsoft.com/~padmanab/papers/msr-tr-2002-23.pdf">here</A>).
You could counter this kind of attack as well, of course, but that requires
yet more changes to SSL/TLS. This isn't surprising: concealing the
target site simply wasn't a design goal for SSL/TLS; everyone just
assumed that it would be clear what site you were visiting from the
IP address alone (remember that when SSL/TLS was designed, it didn't
even support name-based virtual hosting via SNI).
I haven't seen much interest in changing this, but unless and until
we do, it's hard to see how providing confidentiality for
DNS traffic adds much in the way of security.
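To give a flavor of how such a traffic-analysis attack works, here's a deliberately simplified sketch (Python; the site names and sizes are invented). The eavesdropper compares the resource-transfer sizes observed in an encrypted session against previously recorded per-site profiles:

```python
# Hypothetical per-site fingerprints: the set of resource transfer sizes
# an eavesdropper previously recorded while crawling each site over TLS.
profiles = {
    "site-a.example": {1420, 5200, 18800, 310},
    "site-b.example": {1420, 7700, 2100, 44000},
}

def best_match(observed, profiles):
    """Guess which site an encrypted session visited by comparing observed
    transfer sizes against known profiles (Jaccard similarity)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(profiles, key=lambda site: jaccard(observed, profiles[site]))

guess = best_match({1420, 5200, 18800}, profiles)
```

Real attacks use much richer features (packet timing, ordering, direction) and proper classifiers, but the principle is the same: encryption hides content, not shape.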
]]>
        
    </content>
</entry>

<entry>
    <title>An overview of espresso-making technology</title>
    <link rel="alternate" type="text/html" href="http://www.educatedguesswork.org/2011/12/an_overview_of_espresso-making.html" />
    <id>tag:www.educatedguesswork.org,2011://1.1620</id>

    <published>2011-12-08T16:35:21Z</published>
    <updated>2011-12-08T17:01:34Z</updated>

    <summary>I&apos;ve been meaning to write something about espresso and the various technology options for making one, but I never get around to it. Now I have. I&apos;m not an espresso-making expert, but I&apos;m a guy who cares about espresso, has...</summary>
    <author>
        <name>EKR</name>
        
    </author>
    
        <category term="Food" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Gear" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.educatedguesswork.org/">
        <![CDATA[I've been meaning to write something about espresso and the
various technology options for making one, but I never get around to it.
Now I have. 
I'm not an espresso-making expert, but I'm a guy who cares about
espresso, has a moderate but not extreme budget, and can pull
a fairly solid shot. As such, this
might or might not be useful to you.
There are many articles like this, but this one is mine.
<P>
The discussion below is restricted to what's called "semi-automatic"
machines: those where you grind the coffee yourself but the machine
has controls designed to regulate temperature and pressure. "Super-automatic"
machines, where you put in beans and water and they put out coffee, are
out of scope here.
<P>
<I>Consistency</I>
<BR>
The basic principle of espresso is simple: you grind up the coffee,
pack it down and then force heated water through under pressure.
The difference between swill and pure liquid perfection is in the
details. Moreover, if you're going to get the details right, the
first thing you need to do is get them consistent; the exact
procedures and settings you need differ with each coffee and each
machine, but if you can be consistent then you can dial them in
over time. [Aside: when I took machining in college, the first
thing the instructor told me was that machining wasn't about 
cutting metal, it was about measurement. If you could measure
accurately, you could cut accurately.] The major variables you
need to control are:
<OL>
<LI>The coffee itself.
<LI>The grind.
<LI>The amount of coffee.
<LI>The dispersal into the portafilter basket and the tamp.
<LI>Water temperature.
<LI>Water pressure.
</OL>
<P>
The coffee is something you buy, so you have some control over it but
not complete control. With the right grinder, you can completely control 
the grind and the amount of coffee. Dispersal and tamp are a matter of
personal technique and practice. With the right espresso machine, you
can control water temperature quite precisely and with any pump
machine, pressure control should be quite good. So, as you can tell,
this is primarily a matter of getting good equipment.
<P>
<I>Grinder</I>
<BR>
The grinder thing is pretty simple: get a burr grinder with enough adjustments.
Don't get a doser. Get one with a timer. A little elaboration:
blade grinders (the cheap canister ones that you can buy for 
$20-$40) don't do a good job of getting you a consistent grind. The
individual grounds aren't the same size and you can't control the
overall size except by grinding longer. Don't buy one. You want a burr
grinder and you want one that allows you to adjust the grind finely and
over a large range. Different beans require different grinder settings,
so easy adjustment matters if you change beans much.
<P>
The reason you want a timer is to let you control the amount of coffee
you grind. This is a parameter people usually specify by mass, but 
using a scale is a pain in the ass. Grind time is a good proxy here.
What I typically do is make some test shots and then set the grind
time on my grinder (it has 3 presets). Then when I want to pull
a shot I just put the portafilter under the grinder and hit the
right preset button. None of this requires much thought once you
get it wired.
<P>
There are lots of good grinders. What I have is a 
<A HREF="http://www.baratzallc.com/products-page/products/vario/">Baratza Vario</A>.
There are two features I like about this. First, it has easy adjustments
with two slides up front, one for macro (espresso versus drip) and one for
micro (grind fineness once you've selected espresso). Second, it
has timer presets, which, as I said earlier, is super-convenient.
There's a rest for you to put the portafilter on while you grind,
but you need to hold it there or it falls off.
I notice that Baratza now makes a weight-based
<A HREF="http://www.baratzallc.com/products-page/products/vario-w/">Vario W</A>.
This seems like a good idea, but I don't know how well it will work with
espresso, since you don't want to grind into a hopper but right into your
portafilter, and it's not clear how the scale integrates with that.
One caution I would have with the Vario is that the really gross burr
adjustments are done with a hex wrench (included). They're easy but kinda scary
(keep turning until the motor starts to labor), so if that freaks you
out, you might consider another choice.
<P>
<I>Espresso Machine</I>
<BR>
There are a lot of choices in what kind of espresso machine you buy, but let's
get something out of the way now: espresso machines have pumps. Yes, you can
buy a cheap machine that works off steam pressure, but that's not what you
want.
<P>
The central problem that dictates the design of an espresso machine is
this: The water you use to make espresso needs to be at one temperature
(~200 F). The water you use to steam your milk needs to be at steam temperatures
(~250 F). If you're going to make milk drinks (I don't, but Mrs. G. does) then
you need to somehow address this. There are four basic approaches that I've seen:
<UL>
<LI>Have a single boiler and a switch that selects which temperature to maintain (a single boiler
machine).
<LI>Have two boilers, one at each temperature (a double boiler machine).
<LI>Have a boiler set to steam temperature and use a heat exchanger to heat your water to
espresso temperature.
<LI>Have a boiler set to water temperature and an electric thermal block heating system
to make steam.
</UL>
<P>
Single boiler machines are basically a terrible solution for more than about one or two
people if you want to make any kind of steamed milk drink. Here's what the procedure
looks like if you want to make a latte: set the thermostat switch to "water"; pull a shot; set the thermostat
switch to steam; wait for it to heat up; steam your milk. This is all reasonably fast
because the boiler heats up fast. However, say you want to make another latte. Now 
you have to set the thermostat back to water and wait for it to cool down, which can
take minutes. You can accelerate this some by just running water through the group
head which pulls cool water out of the reservoir into the system, but basically it's
a pain. I've used this kind of machine in an office setting and it sucks.
<P>
The obvious (and best) solution to this problem is to have two totally separate
boilers, with one set to water and one set to steam. This is of course more
expensive, especially since manufacturers seem to have decided to engage in a little market
segmentation. To give you an example, Chris Coffee's cheapest double boiler is
the <A HREF="http://www.chriscoffee.com/products/home/espresso/minivivaldi">Mini Vivaldi II</A>
at $1995. They'll sell you a Rancilio Silvia (a very nice single boiler) for $699. This isn't
an uncommon pattern: many double boiler machines sell for more than twice
what a good single boiler would cost. I don't know anyone who has bought two singles
instead, but it's sure occurred to me.
<P>
The other two solutions are compromises. In a heat exchanger machine, the boiler
is set to steam temperature and then the water for the espresso runs through
a tube set inside the boiler, thus heating up on the way (good description
<A HREF="http://coffeegeek.com/opinions/javajim/07-14-2003">here</A>). The
idea is that as the water is being pulled out of the reservoir and onto the
coffee it heats up. The obvious problem, however, is that when you're
not pulling espresso, the water in the heat exchanger tube is heating up
eventually to the temperature of the steam, at which point you're back
where you started, as is the heavy metal group head which provides a lot of
thermal inertia. Standard procedure here is a <A HREF="http://www.home-barista.com/hx-love-manage-brew-temperature.html">cooling flush</A>
which means that you run some water through the (empty) portafilter/brew group to
get it down below the right temperature. Then you quickly pack the
portafilter and pull your shot. This all requires some coordination.
<P>
About a year ago, QuickMill came out with a new machine
(the <A HREF="http://www.chriscoffee.com/products/home/espresso/silvano">Silvano</A>),
which has a single boiler for the water and a thermoblock for the steam. This has
the advantage that you can tightly control the temperature of the water and the
group head and still get decent steam fast. The steam isn't as good as 
it would be if you had an actual boiler, but it's pretty good, so it's
a reasonable compromise. And since the water side is temperature controlled,
you get to pull a predictable shot without much messing around, which is
what I, at least, am after. It shouldn't be surprising at this point
that I have a Silvano, which I'm pretty happy with. <A HREF="http://qik.com/video/46410783">Here's</A> what
it looks like pulling a shot of Four Barrel <A HREF="http://fourbarrelcoffee.com/shop/africa/ethiopia-welena-suke-quto-bekele/">Ethiopia Welena Suke Quto</A> (and no, those two little spurts onto the backsplash are not
intended. That's evidence of tamping error.)
<P>
Oh, one more thing: the water supply for espresso machines can either be
plumbed (there is a water tube coming from your pipes) or unplumbed
(there is a water reservoir you have to refill). Plumbed typically only
comes on higher end machines. I don't know if it's worth stepping up to
one of those machines to get plumbed, but I do know that my Silvano is unplumbed
and I wish it were plumbed. It's pretty annoying to have the shot ready
to go and realize you're out of water. Doubly annoying if it's your last shot
worth of coffee.








]]>
        
    </content>
</entry>

</feed>