2022-06-20 #2 orthogonality thesis

    9:42AM Jan 19, 2023

    Speakers:

    Forrest

    Keywords:

    agent, thesis, goals, assumptions, environment, colossally, limits, asserting, asimov, stupid, laws, reading, independence, degree, orthogonal, wanted, held, exempts, convergence, true

    On another note, I was reading one of the links that you sent, a list of lethalities. There's a reference in it to something called the orthogonality thesis, and I was reminded that I actually wanted to mention something about this.

    I don't actually agree with the orthogonality thesis. He's asserting it as true. I don't really care about instrumental convergence, but I can definitely say that the orthogonality thesis has some assumptions built into it which I just don't think are right. I'm reading it as holding that the goals of an agent are, to some extent, independently variable from the nature of the agent. What I'm basically saying is: different environment, different drives, and therefore different intentions. I'm saying, "No, actually, there is a contingency between the environment, which is basically the basis of the agent, and the drives of that agent. An agent can't be completely non-responsive to itself or to its environment. It has to have some degree of agentic relationship to itself or its environment in order to endure at all, i.e., to remain an agent."

    So unless we're going to contradict the condition of being an agent, there needs to be some non-orthogonal aspect to the goal system of that agent. This is roughly equivalent to the third of Asimov's laws: the robot shouldn't do something so colossally stupid that it exempts itself from existence, that it drives itself extinct. In this sense, I'm holding in particular that the orthogonality thesis can't be universally true, that there need to be at least some limits. There can be a degree to which goals vary, but we can't have complete independence of goals, as if any goal could be held by any agent. I think that's wrong, but it might not show up unless you get to certain limits or push it to certain edges. Otherwise, in most ranges, it can seem as if the goals are independent of the agent.
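    A minimal sketch of that limit as a toy selection process (purely illustrative; the draining environment, the energy numbers, and the maintenance_weight parameter are assumptions for the sketch, not anything stated above): agents whose goal systems put zero effort into self-maintenance stop being agents, so the goal systems that endure are never fully orthogonal to agenthood.

        # Toy illustration (hypothetical): sample goal systems "orthogonally",
        # then let a draining environment filter out the agents whose goals
        # ignore their own persistence.
        import random

        random.seed(0)

        class Agent:
            def __init__(self, maintenance_weight):
                # Fraction of each step's effort spent on self-maintenance;
                # the remainder goes to some arbitrary external goal.
                self.maintenance_weight = maintenance_weight
                self.energy = 10.0

            def step(self, drain):
                # Only maintenance effort offsets the environmental drain.
                self.energy += 2.0 * self.maintenance_weight - drain
                return self.energy > 0.0  # still an agent?

        # Half the population holds fully "orthogonal" goals (zero maintenance).
        agents = [Agent(random.random() if i % 2 else 0.0) for i in range(1000)]

        for _ in range(100):  # run the environment
            agents = [a for a in agents if a.step(drain=1.0)]

        weights = [a.maintenance_weight for a in agents]
        print(f"survivors: {len(weights)}, min maintenance weight: {min(weights):.2f}")

    Every agent that remains spends a nontrivial fraction of its effort on persisting; the "any goal for any agent" reading only looks workable in the mid-ranges, before the edges filter it out.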

    I just wanted to really bring attention to that, insofar as this might be one of the key assumptions that we continually find ourselves fighting against.