"There's a 70% chance of rain tomorrow," says the weather app on your phone. "There’s a 30% chance my flight will be delayed," posts a colleague on Slack. Scientific theories also include chances: “There’s a 50% chance of observing an electron with spin up,” or (less fundamental) “This is a fair die — the probability of it landing on 2 is one in six.”
We constantly talk about chances and probabilities, treating them as features of the world that we can discover and disagree about. And it seems you can be objectively wrong about the chances. The probability of a fair die landing on 2 REALLY is one in six, it seems, even if everybody in the world thought otherwise. But what exactly are these things called “chances”?
Readers...
One pet peeve of mine is that actual weather forecasts for the public don't disambiguate interpretations of rain chance. Is it the chance of any rain at some point in that day or hour? Is it the expected proportion of that day or hour during which it will be raining?
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate or send me an email, DM, signal message (+1 510 944 3235), or public comment on this post, if you want to support what we do. We are a registered 501(c)3, have big plans for the next year, and due to a shifting funding landscape need support from a broader community more than in any previous year. [1]
I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means:
Have donated $400. I appreciate the site and its team for all it's done over the years. I'm not optimistic about the future wrt AI (I'm firmly on the AGI doom side), but I nonetheless think that LW made a positive contribution on the topic.
Anecdote: In 2014 I was at a LW Community Weekend retreat in Berlin that Habryka either organized or gave a whole bunch of rationality-themed presentations at. My main impression of him was that he was the most agentic person in the room by far. Based on that experience I fully expected him to eventually accomplish some arbitrary impressive thing, though it still took me by surprise to see him specifically move to the US and eventually become the new admin/site owner of LW.
Epistemic status -- sharing rough notes on an important topic because I don't think I'll have a chance to clean them up soon.
Suppose a human used AI to take over the world. Would this be worse than AI taking over? I think plausibly:
Almost no competent humans have human extinction as a goal. AI that takes over is clearly not aligned with the intended values, and so has unpredictable goals, which could very well be ones which result in human extinction (especially since many unaligned goals would result in human extinction whether they include that as a terminal goal or not).
I don't think we have good evidence that almost no humans would pursue human extinction if they took over the world, since no human in history has ever achieved that level of power.
Most historical conquerors ...
I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model.
One question that's really important is how likely scheming is. But it's also really important to know how much we expect this uncertainty to be resolved by various key points in the future. I think it's about 25% likely that the first AIs capable of obsoleting top human experts[1] are scheming. It's really important for me to know whether I expect to make basically no updates to my P(scheming)[2] between here and the advent of potentially dangerously scheming models, or whether I expect...
Abilities/intelligence come almost entirely from pretraining, so all the situation awareness and scheming capability that current (and future similar) frontier models possess is thus also mostly present in the base model.
Yes, but for scheming, we care about whether the AI can self-locate as an AI using its knowledge. The fact that (at a minimum) sampling from the system is required for it to self-locate as an AI might make a big difference here.
Who cares if it greatly reduces competitiveness in experimental training runs?
Yes, reducing situati...
Traditional economics thinking has two strong principles, each based on abundant historical data:
When I stated Principle (A) at the top of the post, I was stating it as a traditional principle of economics. I wrote: “Traditional economics thinking has two strong principles, each based on abundant historical data”,
I don't think you think Principle [A] must hold, but I do think you think it's in question. I'm saying that, rather than taking this very broad general principle of historical economic good sense, and giving very broad arguments for why it might or might not hold post-AGI, we can start reasoning about superintelligent manufacturing [includ...
We should only use AGI once to make it so that no one, including ourselves, can use it ever again.
I'm terrified of both getting atomized by nanobots and of my sense of morality disintegrating in Extremistan. We don't need AGI to create a post-scarcity society, cure cancer, solve climate change, build a Dyson sphere, colonize the galaxy, or any of the other sane things we're planning to use AGI for. It will take us hard work and time, but we can get there with the power of our own minds. In fact, we need that time to let our sense of morality adjust to our ever-changing reality. Even without AGI, most people already feel that technological progress is too fast for them to keep up.
Some of the...
I appreciate you engaging and writing this out. I read your other post as well, and really liked it.
I do think that AGI is the unique bad technology. Let me try to engage with the examples you listed:
links 1/13/2025: https://roamresearch.com/#/app/srcpublic/page/01-13-2025
I'm behind on a couple of posts I've been planning, but am trying to post something every day if possible. So today I'll post a cached fun piece on overinterpreting a random data phenomenon that's tricked me before.
Recall that a random walk or a "drunkard's walk" (as in the title) is a sequence of vectors $x_0, x_1, x_2, \dots$ in some $\mathbb{R}^n$ such that each $x_{t+1}$ is obtained from $x_t$ by adding noise. Here is a picture of a 1D random walk as a function of time:
A random walk is the "null hypothesis" for any ordered collection of data with memory (a.k.a. any "time series" with memory). If you are looking at some learning process that updates state $x_t$ to state $x_{t+1}$ with some degree of stochasticity, seeing a random walk...
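For readers who want to reproduce this kind of data, here is a minimal sketch of a random walk in Python. It assumes Gaussian noise and a walk that starts at the origin; the function name `random_walk` and its parameters are illustrative choices, not anything taken from the post.

```python
import numpy as np

def random_walk(n_steps, dim=1, step_std=1.0, seed=0):
    """Return an array of shape (n_steps + 1, dim): a walk started at the origin.

    Assumption: the noise is i.i.d. Gaussian with standard deviation step_std.
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, step_std, size=(n_steps, dim))
    # Each position is the previous position plus fresh noise,
    # so the whole walk is a cumulative sum of the noise terms.
    return np.vstack([np.zeros((1, dim)), np.cumsum(steps, axis=0)])

walk = random_walk(1000)            # 1D walk as a function of time
increments = np.diff(walk, axis=0)  # recovers the independent noise terms
```

The positions in `walk` are strongly correlated across time, while `increments` are independent. That memory in the positions is what makes random-walk data easy to overinterpret.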
Interesting! Perhaps one way to not be fooled by such situations could be to use a non-parametric statistical test. For example, we could apply permutation testing: by resampling the data to break its correlation structure and performing PCA on each permuted dataset, we can form a null distribution of eigenvalues. Then, by comparing the eigenvalues from the original data to this null distribution, we could assess whether the observed structure is unlikely under randomness. Specifically, we’d look at how extreme each original eigenvalue is relative to those...
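One possible reading of this suggestion, as a rough sketch rather than a definitive procedure: shuffle each column of the data matrix independently, recompute the PCA eigenvalues, and compare the observed eigenvalues to that null distribution rank by rank. The helper names (`pca_eigenvalues`, `permutation_pvalues`), the column-wise shuffling scheme, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows are samples), largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    return np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order

def permutation_pvalues(X, n_perm=200, seed=0):
    """Compare each observed eigenvalue to a permutation null of the same rank."""
    rng = np.random.default_rng(seed)
    observed = pca_eigenvalues(X)
    null = np.empty((n_perm, X.shape[1]))
    for i in range(n_perm):
        # Assumed scheme: shuffle every column independently. This keeps each
        # column's marginal distribution but destroys correlations between columns.
        permuted = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null[i] = pca_eigenvalues(permuted)
    # Fraction of null eigenvalues at least as large as the observed one,
    # with the usual +1 correction so a p-value is never exactly zero.
    pvals = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
    return observed, pvals

# Synthetic example: five noisy copies of one shared factor.
rng = np.random.default_rng(1)
shared = rng.normal(size=(300, 1))
X = shared + 0.3 * rng.normal(size=(300, 5))
observed, pvals = permutation_pvalues(X)
```

Shuffling preserves each column's variance, so the sum of the eigenvalues is the same in every permuted dataset; only how that variance is spread across components changes. Note that for time-series data such as a random walk, this scheme also destroys the temporal ordering, so a different resampling scheme may be more appropriate depending on which null hypothesis is intended.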
Much of this content originated on social media. To follow news and announcements in a more timely fashion, follow me on Twitter, Threads, Bluesky, or Farcaster.
Regarding The Two Cultures essay:
I have gained so much buttressing context from reading dedicated history about science and math that I have come around to a much blunter position than Snow's. I claim that an ahistorical technical education is technically deficient. If a person reads no history of math, science, or engineering then they will be a worse mathematician, scientist, or engineer, full stop.
Specialist histories can show how the big problems were really solved over time.[1] They can show how promising paths still wind up being wrong, and the ...