So many people have lived such grand lives. I have certainly lived a greater life than I expected, filled with adventures and curious people. But people will soon not live any lives at all. I believe that we will soon build intelligences more powerful than us who will disempower and kill us all. I will see no children of mine grow to adulthood. No people will walk through mountains and trees. No conscious mind will discover any new laws of physics. My mother will not write all of the novels she wants to write. The greatest films that will be made have probably been made. I have not often viscerally reflected on how much love and excitement I have for all the things I could do in the future, so I didn't viscerally feel the loss. But now, when it is all lost, I start to think on it. And I just want to weep. I want to scream and smash things. Then I just want to sit quietly and watch the sun set, with people I love.
I'm currently in the Catalyze Impact AI safety incubator program, working on infrastructure for automating AI safety research. This startup is attempting to fill a gap in the alignment ecosystem and is building with the expectation of under 3 years left until automated AI R&D. This is my short-timelines plan. I'm looking to talk to (and get feedback from) anyone interested in:

* AI control
* Automating math to tackle problems as described in Davidad's Safeguarded AI programme
* High-assurance safety cases
* How to robustify society in a post-AGI world
* Leveraging large amounts of inference-time compute to make progress on alignment research
* Short timelines
* Profitability while still reducing overall x-risk
* Anyone with an entrepreneurial spirit who can spin out traditional businesses within the org to fund the rest of the work (thereby reducing investor pressure)

If you're interested in chatting or giving feedback, please DM me!
Implications of DeepSeek-R1: Yesterday, DeepSeek released a paper on their o1 alternative, R1. A few implications stood out to me:

* Reasoning is easy. A few weeks ago, I described several hypotheses for how o1 works. R1 suggests the answer might be the simplest possible approach: guess & check. No need for fancy process reward models, no need for MCTS.
* Small models, big think. A distilled 7B-parameter version of R1 beats GPT-4o and Claude 3.5 Sonnet (new) on several hard math benchmarks. There appears to be a large parameter overhang.
* Proliferation by default. There's an implicit assumption in many AI safety/governance proposals that AGI development will be naturally constrained to only a few actors because of compute requirements. Instead, we seem to be headed to a world where:
  * Advanced capabilities can be squeezed into small, efficient models that can run on commodity hardware.
  * Proliferation is not bottlenecked by infrastructure.
  * Regulatory control through hardware restriction becomes much less viable.

For now, training still needs industrial compute. But it's looking increasingly like we won't be able to contain what comes after.
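To make "guess & check" concrete, here is a minimal sketch of the recipe as I understand it: sample several candidate solutions, keep the ones whose final answers pass a cheap verifier, and use the pass/fail signal as the reward. This is only an illustration, not DeepSeek's actual training code; `sample_completions` and the `Answer:` convention are hypothetical stand-ins.

```python
import re
from typing import Callable, List, Tuple


def guess_and_check(
    prompt: str,
    reference_answer: str,
    sample_completions: Callable[[str, int], List[str]],  # hypothetical LLM sampler
    n_samples: int = 16,
) -> Tuple[List[str], float]:
    """Sample n candidate solutions and keep those whose final answer checks out.

    The kept completions (or the pass rate) can serve as the reward signal for
    further training; no process reward model or tree search is involved.
    """
    completions = sample_completions(prompt, n_samples)
    correct = []
    for text in completions:
        answer = extract_final_answer(text)
        if answer is not None and answer.strip() == reference_answer.strip():
            correct.append(text)
    pass_rate = len(correct) / max(len(completions), 1)
    return correct, pass_rate


def extract_final_answer(text: str) -> str | None:
    """Toy verifier hook: grab whatever follows a literal 'Answer:' marker."""
    match = re.search(r"Answer:\s*(.+)", text)
    return match.group(1) if match else None
```

In an RL setup, the pass/fail outcome would feed the policy update; the point is that the "check" is a cheap outcome verifier rather than a process reward model or search.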
habryka
Sorry for the downtime. Another approximate DDoS / extremely aggressive crawler. We are getting better at handling these, but this one was another 10x bigger than previous ones, and so kicked over a different part of our infrastructure.

Recent Discussion

This question I submitted got rejected from the Humanity's Last Exam (HLE) benchmark set for being too easy. I'm really proud of it though, so I figured I'd post it here.

A wooden cube of unit side length and relative density 0.75 floats stably in a pool of water. What is the distance from the highest point on the cube to the surface of the water, calculated to four decimal places?

Cross-posted from Telescopic Turnip

As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuclear reactors and microchips, but we still can't create a proper artificial insect that flies, feeds, and lays eggs that turn into more butterflies. That seems like evidence that butterflies are incredibly complex machines – certainly more complex than a nuclear power facility.

Likewise, when you google "most complex object in the universe", the first result is usually not something invented by humans – rather, what people find the most impressive seems to be "the human brain".

As we are getting closer to building super-human AIs, people wonder what kind of unspeakable super-human inventions these machines will come up with. And, most of the time, the...

leogao

the laws of physics are quite compact. and presumably most of the complexity in a zygote is in the dna.

DirectedEvolution
There are two ways to see this is incorrect.

1. DNA's ability to structure an organism is mediated through its chemical milieu. It is densely, dynamically regulated through a complex mesh of proteins, regulatory RNA, and small signaling molecules at every timepoint in every organism throughout the life cycle. Disruption of that chemical milieu renders the organism nonviable. This is a separate issue from the fact that evolution overwhelmingly operates on DNA sequence.
2. The DNA in a particular organism/cell is one point in a very long series of complex inheritance chains going back 4.5 billion years. I'm comfortable rounding off the maximum complexity of the soma to the maximum possible complexity of the complete set of ancestral DNA sequences. But we can go further by noticing that an individual's DNA sequence is not just the combination of their direct ancestors' sequences -- the entire ancestral lineage at every step is sampled from a distribution of possible genomes that is produced by mechanisms impacting reproduction.

In a more mathematical sense, while it's true that, conditional on a specific non-stochastic function, the number of values in the output set is less than or equal to the number of values in the input set, if the function can vary freely then there is no such constraint. The soma might be viewed as a stochastic function mapping DNA inputs to phenotypic outputs. The stochastic aspect gives a much larger number of theoretically possible outputs from the same input set. And the fact that the 'function' (soma) itself varies from organism to organism increases the number of phenotypes that can be generated from a given amount of DNA still further.

All these arguments also apply to technology. MS Word 'co-evolved' with Windows, with programming languages, with hardware, and this context must be taken into account when thinking about how complex a machine is.
quiet_NaN
Unlike Word, the human genome is self-hosting. That means it is paying fair and square for any complexity advantage it might have -- if Microsoft found that the x86 was not expressive enough to code in a space-efficient manner, they could likewise implement more complex machinery to host it. Of course, the core fact is that the DNA of eukaryotes looks memory-efficient compared to the bloat of Word. There was a time when Word was shipped on floppy disks. From what I recall, it came on multiple floppies, but on the order of ten, not a thousand. With these modern CD-ROMs and DVDs, there is simply less incentive to optimize for size. People are not going to switch away from Word to LibreOffice even if the latter were only a gigabyte.
quiet_NaN
I think formally, the Kolmogorov complexity would have to be stated as the length of a description of a Turing machine (not that this gets completely rid of any wiggle room). Of course, TMs do not offer a great gaming experience. "The operating system and the hardware" is certainly an upper bound, but also quite certainly overkill. Your floating point unit or your network stack are not going to be very busy while you play Tetris. If you cut it down to the essentials (getting rid of things like scores which have to be displayed as characters, or background graphics or music), you have a 2D grid in which you need to toggle fields, which is isomorphic to a matrix display. I don't think that having access to Boost or the JCL or the Python ecosystem is going to help you much in terms of writing a shorter program than you would need for a bit-serial processor. And these things can be crazy small -- this one takes about 200 LUTs and FFs. If we can agree that a universal logic gate is a reasonable primitive which would be understandable to any technological civilization, then we are talking on the order of 1k or 2k logic gates here. Specifying that on a circuit-diagram level is not going to set you back by more than 10 kB. So while you are technically correct that there is some overhead, I think directionally Malmesbury is correct in that the binary file makes for a reasonable estimate of the information content, while adding the size of the OS (sometimes multiple floppy disks, these days!) will lead to a much worse estimate.
leogao

a thriving culture is a mark of a healthy and intellectually productive community / information ecosystem. it's really hard to fake this. when people try, it usually comes off weird. for example, when people try to forcibly create internal company culture, it often comes off as very cringe.

Viliam
yep. doing it and then redoing it can still be much faster than procrastinating on it

[Crossposted from windowsontheory]

The following statements seem to be both important for AI safety and are not widely agreed upon. These are my opinions, not those of my employer or colleagues. As is true for anything involving AI, there is significant uncertainty about everything written below. However, for readability, I present these points in their strongest form, without hedges and caveats. That said, it is essential not to be dogmatic, and I am open to changing my mind based on evidence. None of these points are novel; others have advanced similar arguments. I am sure that for each statement below, there will be people who find it obvious and people who find it obviously false.

  1. AI safety will not be solved on its own.
  2. An “AI scientist” will not solve
...
Aaron_Scher
I think your discussion of why humanity could survive a misaligned superintelligence is missing a lot. Here are a couple of claims:

1. When there are ASIs in the world, we will see ~100 years of technological progress in 5 years (or like, what would have taken humanity 100 years in the absence of AI). This will involve the development of many very lethal technologies.
2. The aligned AIs will fail to defend the world against at least one of those technologies.

Why do I believe point 2? It seems like the burden of proof is really high to say "nope, every single one of those dangerous technologies is going to be something that it is technically possible for the aligned AIs to defend against, and they will have enough lead time to do so, in every single case". If you're assuming we're in a world with misaligned ASIs, then every single existentially dangerous technology is another disjunctive source of risk. Looking out at the maybe-existentially-dangerous technologies that have been developed previously and that could be developed in the future (e.g., nuclear weapons, biological weapons, mirror bacteria, false vacuum decay, nanobots), I don't feel particularly hopeful that we will avoid catastrophe. We've survived nuclear weapons so far, but with a few very close calls -- if you assume other existentially dangerous technologies go like this, then we probably won't make it past a few of them. Now crunch that all into a few years, and, like, gosh it seems like a ton of unjustified optimism to think we'll survive every one of these challenges.

It's pretty hard to convey my intuition around the vulnerable world hypothesis; I also try to do so here.
habryka
I don't currently think it's plausible, FWIW! Agree that there are probably substantially easier and less well-specified paths. 
ryan_greenblatt
I find this post somewhat strange to interact with. I think I basically agree with all of the stated claims, at least directionally[1], but I disagree with many of the arguments made for these claims. Additionally, the arguments you make seem to imply you have a very different worldview from me and/or are worried about very different problems.

For example, the section on detection vs. prevention seems to focus entirely on the case of getting models to refuse harmful requests from users. My sense is that API misuse[2] is a tiny fraction of the risk from powerful AI, so it seems like a strange example to focus on from my perspective. I think detection is more important than prevention because catching a scheming AI red-handed would be pretty useful evidence that would alter behavior, and might more speculatively be useful for preventing further bad behavior.

----------------------------------------

1. I'd agree more fully with a bit more precision and hedging. ↩︎
2. I'm worried about humans using AIs for bad ends, but not via the mechanism of doing this as an unprivileged user over a public API; see here for more discussion. ↩︎

I think the link in footnote two goes to the wrong place?

Milan W

Maybe the crawler problem would be mitigated if LessWrong offered a daily XML or plaintext or whatever dump at a different URL and announced it in robots.txt?
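For concreteness, a sketch of what that announcement could look like (all paths here are made up, and robots.txt has no standard directive for bulk dumps, so the dump pointer below is just a comment that crawler operators would have to adopt by convention):

```
# Illustrative robots.txt sketch (hypothetical URLs)
User-agent: *
Crawl-delay: 10
# Full-content dump, regenerated daily; please fetch this instead of
# crawling every page:
# https://www.lesswrong.com/exports/daily-dump.xml.gz
Sitemap: https://www.lesswrong.com/sitemap.xml
```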

Epistemic status: Late-night hot take, noting it down so I don't forget it. Not endorsed. Asked in the spirit of a question post. I am aware that people may respond both "ehm, we are already that" and "no! we don't give in to threats!". I don't know.

Summary: An individual Commodore 64 is almost certainly safe, and the top 10 supercomputers could almost certainly run a superpowerful AGI. But where is the safe line, and how would we get to the safe side?

I started thinking about this topic when I realized that we can safely use uranium because we have a field of nuclear criticality safety[1] but we have no field of computer foom safety (or Artificial General Intelligence takeoff safety).[2] For example, if we had such a field we might be able to have a function AGIT(architecture, time, flops, memory) → Bool to tell us whether or not a computer with that amount of resources could take off into an AGI. Making this a total function (giving a value for all of its domain) might not be possible,...


The comments here are a storage of not-posts and not-ideas that I would rather write down than not.


PSA: If you are writing an important prompt for an LLM that will be run multiple times, it really helps to end it with something like "and if there is anything about this prompt that is unclear to you or that could be improved, tell me about it in a <feedback> tag."

Source: I'm doing MATS, writing an automated evaluation, and my mentor Evan Hubinger said more people should be doing this.
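A minimal sketch of the pattern, for anyone who wants to wire it into an eval loop (`call_llm` is a hypothetical stand-in for whatever client function you already use; the substance is just the suffix plus the extraction step):

```python
import re
from typing import Callable

FEEDBACK_SUFFIX = (
    "\n\nIf there is anything about this prompt that is unclear to you "
    "or that could be improved, tell me about it in a <feedback> tag."
)


def run_with_feedback(prompt: str, call_llm: Callable[[str], str]) -> tuple[str, str | None]:
    """Append the feedback request, call the model, and split out any <feedback> block."""
    response = call_llm(prompt + FEEDBACK_SUFFIX)
    match = re.search(r"<feedback>(.*?)</feedback>", response, flags=re.DOTALL)
    feedback = match.group(1).strip() if match else None
    # Strip the feedback block so downstream parsing sees only the real output.
    answer = re.sub(r"<feedback>.*?</feedback>", "", response, flags=re.DOTALL).strip()
    return answer, feedback
```

The collected feedback can then be logged across runs to catch ambiguities in the prompt before they skew results.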

I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a person that they can’t rationally explain.

But if a person gives you a bad feeling, don’t ignore that.

Both I and several others I know have generally come to regret it when we've gotten a bad feeling about somebody and ignored it or rationalized it away.

I’m not saying to endorse prejudice. But my experience is that many types of prejudice feel more obvious. If someone has an accent that I associate with something negative, it’s usually pretty obvious to me that it’s their accent that I’m reacting to.

Of course, not everyone has the level of reflectivity to make that distinction....

Vibes tend to be based on pattern matching, and are prone to bucket errors, so it's important to watch out for that - particularly for people with trauma. For instance, I tend to automatically dislike anyone who has even one mannerism in common with either of my parents, and it takes me quite a while to identify exactly what it is that's causing it. It usually isn't their fault and they're quite nice people, but the most annoying part is it doesn't go away just because I know that. This drastically reduces the range of people I can feel comfortable around.

MSRayne
I learned a lot from him and I STILL have a bad vibe about him. People can be correct, useful, and also unsafe. (Primarily, I suspect him to be high on scales of narcissism, to which I'm very sensitive. Haven't met the guy personally, but his text reeks of it. Doesn't negate his genius; just negates my will to engage with him in any other dimension.)