Daniel Herrmann, ben_levinstein, Aydin Mohseni

"There's a 70% chance of rain tomorrow," says the weather app on your phone. "There’s a 30% chance my flight will be delayed," posts a colleague on Slack. Scientific theories also include chances: “There’s a 50% chance of observing an electron with spin up,” or (less fundamental) “This is a fair die — the probability of it landing on 2 is one in six.”

We constantly talk about chances and probabilities, treating them as features of the world that we can discover and disagree about. And it seems you can be objectively wrong about the chances. The probability of a fair die landing on 2 REALLY is one in six, it seems, even if everybody in the world thought otherwise. But what exactly are these things called “chances”?

Readers...


AnthonyC4m20

One pet peeve of mine is that actual weather forecasts for the public don't disambiguate interpretations of rain chance. Is it the chance of any rain at some point in that day or hour? Is it the expected proportion of that day or hour during which it will be raining?

2transhumanist_atom_understander3h

Yes, de Finetti's theorem shows that if our beliefs are unchanged by exchanging members of the sequence, that's mathematically equivalent to having some "unknown probability" that we can learn by observing the sequence. Importantly, this is always against some particular background state of knowledge, in which our beliefs are exchangeable. We ordinarily have exchangeable beliefs about coin flips, for example, but might not if we had less information (such as not knowing they're coin flips) or more (like information about the initial conditions of each flip). In my post on unknown probabilities, I give more detail on how they are defined, which turns out to involve a specific state of background knowledge, so they only act like a "true" probability relative to that background knowledge. I also describe how they can be interpreted as part of a physical understanding of the situation. Personally, rather than observing that my beliefs are exchangeable and inferring an unknown probability as a mathematical fiction, I would rather "see" the unknown probability directly in my understanding of the situation, as described in my post.
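A small numeric illustration of the easy direction of this equivalence may help: if beliefs about coin flips are a mixture over an unknown bias, then the probability of a sequence depends only on how many heads it contains, so reordering the flips changes nothing. The sketch below assumes a uniform prior over the bias, which is my choice for illustration and not something the comment specifies; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sequence_probability(flips):
    """P(flips) when the bias p is unknown, with a uniform prior on [0, 1].

    Integrating p^k * (1 - p)^(n - k) over p gives k! (n - k)! / (n + 1)!,
    which depends only on the number of heads k, not on their order.
    """
    n = len(flips)
    k = sum(flips)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


def predictive_heads(flips):
    """P(next flip is heads | flips), computed as a ratio of sequence probabilities."""
    return sequence_probability(list(flips) + [1]) / sequence_probability(flips)


observed = (1, 1, 0, 1)  # 1 = heads, 0 = tails

# Every reordering of the observed flips gets the same probability (exchangeability).
distinct_probabilities = {sequence_probability(p) for p in permutations(observed)}

# The flips are exchangeable but not independent: seeing heads makes heads more likely.
p_heads = sequence_probability((1,))
p_two_heads = sequence_probability((1, 1))
```

Here `predictive_heads(observed)` works out to 2/3, matching Laplace's rule of succession, (k + 1) / (n + 2). The "unknown probability" is being learned from the sequence even though no single flip probability was ever assumed to be the true one.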

4winstonBosan3h

Great stuff! I don't have strong fundamentals in math and statistics but I was still able to hobble along and understand the post. It reminds me of what Rissanen said about data/observation - that data is really all we have, and there is no true state of nature. Our job is to squeeze as much alpha out of observation as possible, instead of trying to find a "true" generator function. This post hit the same spot for me :)

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

597

habryka

1mo

TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate or send me an email, DM, Signal message (+1 510 944 3235), or public comment on this post, if you want to support what we do. We are a registered 501(c)(3), have big plans for the next year, and due to a shifting funding landscape need support from a broader community more than in any previous year. ^[1]

I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means:

We build and run LessWrong.com and the AI Alignment Forum.^[2]
We built and run Lighthaven (lighthaven.space), a ~30,000 sq.

...


MondSemmel6m40

Have donated $400. I appreciate the site and its team for all it's done over the years. I'm not optimistic about the future with respect to AI (I'm firmly on the AGI doom side), but I nonetheless think that LW made a positive contribution on the topic.

Anecdote: In 2014 I was on a LW Community Weekend retreat in Berlin which Habryka either organized or did a whole bunch of rationality-themed presentations in. My main impression of him was that he was the most agentic person in the room by far. Based on that experience I fully expected him to eventually accomplish some arbitrary impressive thing, though it still took me by surprise to see him specifically move to the US and eventually become the new admin/site owner of LW.

2FeepingCreature29m

Haven't heard back yet...

2Chipmonk6h

I wonder if you could set up a conditional donation? “I donate $X, minus if total donations exceed $3M"

5Dmitrii Krasheninnikov7h

Donated $100 for now. Thanks for the great work!

Human takeover might be worse than AI takeover

101

Tom Davidson

Epistemic status -- sharing rough notes on an important topic because I don't think I'll have a chance to clean them up soon.

Summary

Suppose a human used AI to take over the world. Would this be worse than AI taking over? I think plausibly:

In expectation, human-level AI will better live up to human moral standards than a randomly selected human. Because:
- Humans fall far short of our moral standards.
- Current models are much nicer and more patient, honest and selfless than humans.
  - Though human-level AI will have much more agentic training for economic output, and a smaller fraction of HHH training, which could make them less nice.
- Humans are "rewarded" for immoral behaviour more than AIs will be
  - Humans evolved under conditions where selfishness and cruelty often paid high dividends, so evolution

...


Matthew Barnett13m20

Almost no competent humans have human extinction as a goal. AI that takes over is clearly not aligned with the intended values, and so has unpredictable goals, which could very well be ones which result in human extinction (especially since many unaligned goals would result in human extinction whether they include that as a terminal goal or not).

I don't think we have good evidence that almost no humans would pursue human extinction if they took over the world, since no human in history has ever achieved that level of power.

Most historical conquerors ... (read more)

4kave5h

I think this is slightly a non sequitur. I take Tom to be saying "AIs will care about stuff that is natural to express in human concept-language" and your evidence to be primarily about "AIs will care about what we tell them to", though I could imagine there being some overflow evidence into Tom's proposition. I do think the limited success of interpretability is an example of evidence against Tom's proposition. For example, I think there's lots of work where you try to replace an SAE feature or a neuron (R) with some other module that's trying to do our natural language explanation of what R was doing, and that doesn't work.

6eggsyntax3h

Thanks, that's a totally reasonable critique. I kind of shifted from one to the other over the course of that paragraph. Something I believe, but failed to say, is that we should not expect those misgeneralized goals to be particularly human-legible. In the simple environments given in the goal misgeneralization spreadsheet, researchers can usually figure out eventually what the internalized goal was and express it in human terms (eg 'identify rulers' rather than 'identify tumors'), but I would expect that to be less and less true as systems get more complex. That said, I'm not aware of any strong evidence for that claim, it's just my intuition. I'll edit slightly to try to make that point more clear.

2Nathan Helm-Burger6h

I thought the argument about the kindly mask was assuming that the scenario of "I just took over the world" is sufficiently out-of-distribution that we might reasonably fear that the in-distribution track record of aligned behavior might not hold?

How will we update about scheming?

137

ryan_greenblatt

Ω 807d

I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model.

One question that's really important is how likely scheming is. But it's also really important to know how much we expect this uncertainty to be resolved by various key points in the future. I think it's about 25% likely that the first AIs capable of obsoleting top human experts^[1] are scheming. It's really important for me to know whether I expect to make basically no updates to my P(scheming)^[2] between here and the advent of potentially dangerously scheming models, or whether I expect...


ryan_greenblatt18m20

Abilities/intelligence come almost entirely from pretraining, so all the situation awareness and scheming capability that current (and future similar) frontier models possess is thus also mostly present in the base model.

Yes, but for scheming, we care about whether the AI can self-locate as an AI using its knowledge. The fact that (at a minimum) sampling from the system is required for it to self-locate as an AI might make a big difference here.

Who cares if it greatly reduces competitiveness in experimental training runs?

Yes, reducing situati... (read more)

Applying traditional economic thinking to AGI: a trilemma

102

Steven Byrnes

22h

Traditional economics thinking has two strong principles, each based on abundant historical data:

Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human labor will retain high value—in other words, the demand curve will move right. Indeed, if anything, the value of labor will ultimately go up, not down—for example, dense cities are engines of economic growth!
Principle (B): “Experience curves”: If the demand for some product goes up, there might be some price increase in the very short term, because the supply curve slopes up. But in the longer term, people will

...


2Steven Byrnes44m

I’m still pretty sure that you think I believe things that I don’t believe. I’m trying to narrow down what it is and how you got that impression. I just made a number of changes to the wording, but it’s possible that I’m still missing the mark. When I stated Principle (A) at the top of the post, I was stating it as a traditional principle of economics. I wrote: “Traditional economics thinking has two strong principles, each based on abundant historical data”, and put in a link to a wikipedia article with more details. You see what I mean? I wasn’t endorsing it as always and forever true. Quite the contrary: The punchline of the whole article is: “here are three traditional economic principles, but at least one will need to be discarded post-AGI.” I did some rewriting of this part, any chance that helps?

Lorec21m10

When I stated Principle (A) at the top of the post, I was stating it as a traditional principle of economics. I wrote: “Traditional economics thinking has two strong principles, each based on abundant historical data”,

I don't think you think Principle [A] must hold, but I do think you think it's in question. I'm saying that, rather than taking this very broad general principle of historical economic good sense, and giving very broad arguments for why it might or might not hold post-AGI, we can start reasoning about superintelligent manufacturing [includ... (read more)

3Hzn1h

The purely technical reason why principle A does not apply in this way is opportunity cost. Let's say S is a highly productive worker who could generate $500,000 for the company over 1 year. Moreover S is willing to work for only $50,000! But if investing $50,000 in AI instead would generate $5,000,000, the true cost of hiring S is actually $4,550,000.
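The arithmetic in this example can be spelled out. The sketch below uses only the dollar figures from the comment; the variable names and the two ways of tallying the forgone return are my own reading, not something the comment states.

```python
# Figures from the comment, in dollars over one year.
s_revenue = 500_000     # what worker S would generate for the company
s_wage = 50_000         # what S is willing to work for
ai_cost = 50_000        # the same budget invested in AI instead
ai_revenue = 5_000_000  # what that AI investment would generate

# Net gain from each option taken on its own.
s_net = s_revenue - s_wage      # 450_000
ai_net = ai_revenue - ai_cost   # 4_950_000

# Reading 1: the AI's gross return minus S's net contribution.
# This reproduces the $4,550,000 figure quoted in the comment.
cost_as_quoted = ai_revenue - s_net

# Reading 2: a net-versus-net comparison of the two uses of the same $50,000.
net_forgone = ai_net - s_net

print(f"Net gain from hiring S:      ${s_net:,}")
print(f"Net gain from the AI option: ${ai_net:,}")
print(f"Cost as quoted:              ${cost_as_quoted:,}")
print(f"Net-versus-net difference:   ${net_forgone:,}")
```

The two tallies differ by the $50,000 outlay, but they point the same way: under these assumed numbers, spending the budget on S forgoes roughly ten times what S brings in.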

2Steven Byrnes32m

We can imagine a hypothetical world where a witch cast a magical spell that destroyed 99.9999999% of existing chips, and made it such that it’s only possible to create one new computer chip per day. And the algorithms are completely optimized—as good as they could possibly be. In that case, the price of compute would get bid up to the maximum economic value that it can produce anywhere in the world, which would be quite high. The company would not have an opportunity cost, because using AI would not be a cheap option. See what I mean? You’re assuming that the price of AI will wind up low, instead of arguing for it. As it happens, I do think the price of AI will wind up low!! But if you want to convince someone who believes in Principle (A), you need to engage with the idea of this race between the demand curve speeding to the right versus the supply curve speeding to the right. It doesn’t just go without saying.

Cast it into the fire! Destroy it!

Aram Panasenco

16h

We should only use AGI once to make it so that no one, including ourselves, can use it ever again.

I'm terrified of both getting atomized by nanobots and of my sense of morality disintegrating in Extremistan. We don't need AGI to create a post-scarcity society, cure cancer, solve climate change, build a Dyson sphere, colonize the galaxy, or any of the other sane things we're planning to use AGI for. It will take us hard work and time, but we can get there with the power of our own minds. In fact, we need that time to let our sense of morality adjust to our ever-changing reality. Even without AGI, most people already feel that technological progress is too fast for them to keep up.

Some of the...


3Thane Ruthenis1h

AGI is not the only technology or set of technologies that could be used to let a small set of people (say, 1-100) attain implacable, arbitrarily precise control over the future of humanity. Some obvious examples:
- Sufficiently powerful industrial-scale social-manipulation/memetic warfare tools.
- Superhuman drone armies capable of reliably destroying designated targets while not-destroying not-designated-targets.
- Self-replicating nanotechnology capable of automated end-to-end manufacturing of arbitrarily complicated products out of raw natural resources.
- Brain uploading, allowing the creation of a class of infinitely exploitable digital workers with AGI-level capabilities.

Any of those would be sufficient to remove the need to negotiate the direction of the future with vast swathes of humanity. You can just brainwash them into following your vision, or threaten them into compliance with overwhelming military power, or just disassemble them into raw materials for superior manufacturers.

Should we ban all of those as well? Generalizing, it seems that we should ban technological progress entirely. What if there's some other pathway to ultimate control that I've overlooked when I've thought about it for a minute? Perhaps we should all return to the state of nature?

I don't mean to say you don't have a point. Indeed, I largely agree that there are no humans or human processes that humanity-as-a-whole is in the epistemic position to trust with AGI (though there are some humans I would trust with it; it's theoretically possible to use it ethically). But "we must ban AGI, it's the unique Bad technology" is invalid. Humanity's default long-term prospects seem overwhelmingly dicey even without it.

I don't have a neat alternate proposal for you. But what you're suggesting is clearly not the way.

Aram Panasenco36m10

I appreciate you engaging and writing this out. I read your other post as well, and really liked it.

I do think that AGI is the unique bad technology. Let me try to engage with the examples you listed:

Social manipulation: I can't even begin to imagine how this could let 1-100 people have arbitrarily precise control over the entire rest of the world. Social manipulation implies that there are societies, humans talking to each other. That's just too large a system for a human mind to fully take the combinatorial explosion of all possible variables into account. Ma

... (read more)

3Aram Panasenco2h

I think this post may be what you're referring to. I really like this comment in that post: Providing for material needs is less than 0.0000001% of the range of powers and possibilities that an AGI/ASI offers. Consider the trans debate. Disclaimer: I'm not trying to take any side in this debate, and am using it for illustrative purposes only. A hundred years ago someone saying "I feel like I'm in the wrong body and feel suicidal" could only be met with one compassionate response, which is to seek psychological or spiritual help. Now scientific progress has advanced enough that it can be hard to determine what the compassionate response is. Do we have enough evidence to determine whether puberty blockers are safe? Are hospitals holding the best interests of the patients at heart or trying to maximize profit from expensive surgeries? If a person is prevented from getting the surgery and kills themselves, should the person who kept them from getting the surgery be held liable? If a person does get the surgery, but later regrets it, should the doctors who encouraged them be held liable? Should doctors who argue against trans surgery lose their medical licenses? ASI will open up a billion possibilities that will go up to such a scale that if the difficulty of determining whether eating human babies is moral is a 1.0 and the difficulty of determining whether encouraging trans surgeries is moral is a 2.0, each of those possibilities will be in the millions. Our sense of morality will just not apply, and we won't be able to reason ourselves into a right or wrong course of action. That which makes us human will drown in the seas of black infinity.

4Seth Herd2h

I'm sorry, I just don't have time to engage on these points right now. You're talking about the alignment problem. It's the biggest topic on LessWrong. You're assuming it won't be solved, but that's hotly debated among people like me who spend tons of time on the details of the debate. My recommended starting point is my Cruxes of disagreement on alignment difficulty post. It explains why some people think it's nearly impossible, some think it's outright easy, and people like me who think it's possible but not easy are working like mad to solve it before people actually build AGI.


sarahconstantin's Shortform

sarahconstantin

3mo

sarahconstantin1h20

links 1/13/2025: https://roamresearch.com/#/app/srcpublic/page/01-13-2025

https://www.construction-physics.com/p/why-skyscrapers-became-glass-boxes
- plain glass box skyscrapers were, in fact, more cost-effective for developers. it's not all about architectural tastes. architects in real life are very far from all-powerful.
  - in fact, I really think people should stop writing books/movies/etc about auteur architects; it only encourages more young people to go into architecture and become unemployed. I'm looking at you, Francis Ford Coppola
https://www.betonit.ai/

... (read more)

The purposeful drunkard

Dmitry Vaintrob

I'm behind on a couple of posts I've been planning, but am trying to post something every day if possible. So today I'll post a cached fun piece on overinterpreting a random data phenomenon that's tricked me before.

Recall that a random walk or a "drunkard's walk" (as in the title) is a sequence of vectors $x_{1}, x_{2}, \dots$ in some $\mathbb{R}^{n}$ such that each $x_{k}$ is obtained from $x_{k - 1}$ by adding noise. Here is a picture of a 1D random walk as a function of time:
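To make the definition concrete, here is a minimal sketch of a 1D walk. It assumes independent Gaussian noise at each step, which is one common choice and not something the definition requires; the function name and parameters are hypothetical.

```python
import random


def random_walk(num_steps, step_std=1.0, start=0.0, seed=None):
    """Return the positions x_0, x_1, ..., x_num_steps of a 1D random walk.

    Each position is the previous one plus independent Gaussian noise
    with mean 0 and standard deviation step_std (an assumed noise model).
    """
    rng = random.Random(seed)
    positions = [start]
    for _ in range(num_steps):
        positions.append(positions[-1] + rng.gauss(0.0, step_std))
    return positions


walk = random_walk(1000, seed=0)
print(f"steps: {len(walk) - 1}, final position: {walk[-1]:.2f}")
```

Plotting `walk` against its index gives the kind of wandering trace a picture of a 1D walk shows. Fixing `seed` makes a run reproducible, which helps when comparing a suspicious pattern in real data against what pure noise produces.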

A random walk is the "null hypothesis" for any ordered collection of data with memory (a.k.a. any "time series" with memory). If you are looking at some learning process that updates state to state with some degree of stochasticity, seeing a random walk...


Guillaume Corlouer1h10

Interesting! Perhaps one way to not be fooled by such situations could be to use a non-parametric statistical test. For example, we could apply permutation testing: by resampling the data to break its correlation structure and performing PCA on each permuted dataset, we can form a null distribution of eigenvalues. Then, by comparing the eigenvalues from the original data to this null distribution, we could assess whether the observed structure is unlikely under randomness. Specifically, we’d look at how extreme each original eigenvalue is relative to those... (read more)

Progress links and short notes, 2025-01-13

jasoncrawford

This is a linkpost for https://newsletter.rootsofprogress.org/p/links-and-short-notes-2025-01-13

Much of this content originated on social media. To follow news and announcements in a more timely fashion, follow me on Twitter, Threads, Bluesky, or Farcaster.

From me and RPI
Jobs and fellowships
Other opportunities
Events
Questions
Announcements
Commentary on the wildfires
Sam Altman: AI workers in 2025, superintelligence next
Never underestimate elasticity of supply
“The earnestness and diligence of smart technical people”
“Americans born on foreign soil”
Undaunted
Eli Dourado’s model of policy change
Stats
Links
AI
Inspiration
Politics
China biotech rising
Predictions about war
Why did we wait so long for the camera?
Housing without homebuilders
Charts
Fun

From me and RPI

2024 in review for me and RPI, in case you missed it, including my annual “highlights from things I read this year”
The first batch of recorded talks from Progress Conference 2024 is available now. Special thanks to Freethink Media for their excellent work producing these.

Jobs and fellowships

Epoch AI hiring a Technical Lead “to develop a next-generation computer-use benchmark

...


ryan_b1h20

Regarding The Two Cultures essay:

I have gained so much buttressing context from reading dedicated history about science and math that I have come around to a much blunter position than Snow's. I claim that an ahistorical technical education is technically deficient. If a person reads no history of math, science, or engineering, then they will be a worse mathematician, scientist, or engineer, full stop.

Specialist histories can show how the big problems were really solved over time.^[1] They can show how promising paths still wind up being wrong, and the ... (read more)


The 2023 Review
Discussion Phase
