The question of whether Machines Can Think… is about as relevant as the question of whether Submarines Can Swim.
- Edsger W. Dijkstra
Three preliminary notes:
- Intelligence and life are synonymous in my opinion, meaning you could replace all the occurrences of the word intelligence with the word life and this essay is still valid.
- I treat intelligence not as a discrete quality, but rather as a continuous spectrum. On this spectrum we can place different systems, stating whether they are more or less intelligent than other systems, but never simply intelligent or simply unintelligent. Normally we would like to think that we are closer to one end (if such an end exists), while stones, chairs, etc. are at the other end. Throughout this essay I call a system intelligent as an abbreviation of “more intelligent than most systems” or “having a level of intelligence similar to humans” (depending on context).
- All elements to which we attribute the quality of intelligence are “systems”. A system in this sense is a bounded part of perceived reality. Bounded means we have to state beforehand which elements are parts of the system and which aren’t, or otherwise provide a rule or a machine to determine this. Perceived means that we can use some measuring device to prove they exist. Usually the elements that compose a system will interact with one another, as well as with elements outside of the system.
For some time now I’ve been trying to explain to myself why “a chair is a chair”, that is, how we can comfortably state that a specific system is intelligent while another isn’t, all without having a clear definition of intelligence. It is well known that both humans and certain algorithms can identify a feature without being able to define it conceptually, in a verbal way. Nevertheless, the search for definitions is important for a deeper understanding of a phenomenon.
I will try to present three observations that lead to the following relativistic definition: a system is more intelligent than another if it can predict the other’s behaviour, which requires some kind of simulation.
The first observation revolves around the ability to resist outer fluctuations in the system’s environment, to act as a buffer between the fluctuations of the outer world and the inner parts of the system, or more poetically: the ability to convert between chaos and order. The more erratic the outer fluctuations that the system can still cope with, the more intelligent it is. This also sits well with intuitive definitions of wisdom and free will. Wisdom: a smart man is one who sees what’s coming. Free will: free will could be defined as how hard it is to coerce or force a system into a specific action. So in a sense, it’s all about prediction, since in order to resist fluctuations you have to foresee them, and in order to control another system you have to predict its response to a specific input.
The second observation addresses interactions between different orders of magnitude. Intelligent systems often exhibit this quality more than less intelligent systems. Think about how single molecules can make us smell something, which can cause us (a system of a practically infinite number of other molecules) to turn the other way. Other examples are the experiments we conduct in particle accelerators, or how a single mistake in one DNA molecule can lead to a person’s death, or astrology, where we can be affected by things as big and as far away as distant stars. So there are indications that intelligent systems are some sort of junctions between orders of magnitude, “portals” if you like, that allow those orders of magnitude to affect each other. Another way to look at this is that intelligent systems can sense a wide range of things, both near and far, big and small, while being very selective about it: for example, not every molecule affects our sense of smell, only very specific ones. Less intelligent systems are affected by what is close to their size and location, and this influence is not very selective. This returns us to the ability to predict, which depends heavily on this kind of wide yet selective sensing.
A third observation, one which is a bit different from the other two, follows. We may be attributing the quality of intelligence to “miraculous systems”, meaning systems that we don’t really understand very well, much like the way gods in Greek mythology were used to explain natural phenomena that humans at that time could not fully comprehend, but in a more immediate setting. In other words, an intelligent system is one that requires us to use empathy instead of “mechanical” understanding. By empathy I mean some kind of animation of a system in order to predict its behaviour (perhaps the more correct word is “personification” or “anthropomorphism”, but I think animation captures the meaning better here). Animation is a very natural heuristic for us as humans when we want to predict each other (it is called “theory of mind” in psychological contexts). This strategy is required because, unlike simple systems such as cars or rocks, other systems are too complex for us, and we simply can’t keep track of a substantial fraction of their components and inner interactions. Furthermore, every attempt to divide these systems into subsystems is made futile by the fact that they are so interconnected. Maybe if we had many more neurons in our brain, a mouse brain would look to us about as intelligent as, say, a lock mechanism. So the idea behind animation is something like this: I’m confronted with a very complex system, maybe even more complex than I am, so the best way to predict its actions is to pretend I’m that system, identify its interests, store its knowledge of the world, and think about what I would have done, given this knowledge, to achieve these interests. This is the practical meaning of what it is to animate something, and one might say it is identical to identifying a system as intelligent. So this is to say that intelligence is defined as the complement of the set of phenomena that we can comprehend or predict mechanically.
So as stated these three observations lead in the same direction — intelligence is a relation between two elements (when used colloquially one of the elements, the benchmark, is the average human), where one element can predict the other by some kind of simulation.
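One way to make this relational definition concrete is a toy sketch, under heavy assumptions: a system is modeled as a deterministic function from an observation to an action, and "A predicts B" is read as "A's internal model of B agrees with B's actual behaviour on a set of probes". The names `prediction_score` and `more_intelligent`, the probe set, and the two example systems are all hypothetical illustrations, and comparing the two scores is only one possible reading of the definition.

```python
from typing import Callable, Iterable

# Assumption: a system maps an observation (int) to an action (int).
System = Callable[[int], int]


def prediction_score(model_of_target: System, target: System,
                     probes: Iterable[int]) -> float:
    """Fraction of probes on which the predictor's simulation matches the target."""
    probes = list(probes)
    hits = sum(1 for x in probes if model_of_target(x) == target(x))
    return hits / len(probes)


def more_intelligent(a_model_of_b: System, b: System,
                     b_model_of_a: System, a: System,
                     probes: Iterable[int]) -> bool:
    """A counts as more intelligent than B if A predicts B better than B predicts A."""
    probes = list(probes)
    return (prediction_score(a_model_of_b, b, probes)
            > prediction_score(b_model_of_a, a, probes))


# Toy systems, purely illustrative.
def thermostat(x: int) -> int:
    return 1 if x > 20 else 0          # simple threshold behaviour


def human(x: int) -> int:
    return (x * x + 3) % 7             # stand-in for less regular behaviour


# The "human" simulates the thermostat exactly; the thermostat can only
# guess a constant about the "human".
human_model_of_thermostat = thermostat


def thermostat_model_of_human(x: int) -> int:
    return 0


probes = range(40)
print(prediction_score(human_model_of_thermostat, thermostat, probes))
print(more_intelligent(human_model_of_thermostat, thermostat,
                       thermostat_model_of_human, human, probes))
```

The sketch only captures the "relation between two elements" part of the definition; how a system obtains its model of the other (mechanical simulation or animation) is left open.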
This definition, which is relative and not very empirical, could be grounds for an argument in favor of abolishing the concept of intelligence altogether, in the spirit of the principle of parsimony (“Occam’s razor”). But is this really a redundant concept, merely an indication of our lack of understanding? Maybe a more empirical approach could be taken by looking at the architecture of intelligent systems. Are there specific structural principles that create intelligence? Do all intelligent systems have a similar architecture? It seems that this question will remain open for now.
What is more or less clear is that intelligence does not emerge from a multitude of components and interactions alone. Take two examples of this:
- A glass of water contains an enormous number of components and interactions which we can’t fathom, but we still wouldn’t call it intelligent. These interactions are not “interesting”. Firstly, they have no macroscopic effect, so we have no trouble generalizing or ignoring them while still predicting what the glass of water will do (that is, nothing). Secondly, there are no “junction effects” (interactions between orders of magnitude). Thirdly, these interactions don’t make the glass of water resistant to its environment: we can easily knock the glass off the table, or drink the water up. There is no buffer between the “inner water” and the outer fluctuations. Bottom line: the whole is not more than the sum of its parts. No architecture, no “design”, no layers. Just a heap of molecules.
- Death is another good example, because the complexity and physical structure of a corpse do not look (to the untrained eye) very different from those of a living person. Yet somehow there is crucial damage to the architecture of the system, so crucial that it is clear the person is now an object, and nothing more.
These two examples show us that it’s not just about complexity in the sense of the number of components and their interactions. That kind of complexity is a necessary condition, but it’s not sufficient. Furthermore, replicating this quality is relatively easy. It’s enough to program a model neural network with 10^(a lot) neurons, and have some sort of interaction between them. There’s your glass of water. Now the big question is how to replicate the specific architecture that leads to intelligence, to connect the neurons in such a way that the whole is more than the sum of its parts. A tricky task.
The Turing test
Language most shows a man: Speak, that I may see thee.
- Ben Jonson
The Turing test for intelligence addresses the difficulty in measuring intelligence. Instead of checking for specific parameters, it chooses to accept as intelligent anything we as humans recognize as such. However, if we do have some kind of definition for intelligence as the ability to predict, maybe we are too biased towards human-recognition tests? Another way to build intelligence tests is using games (as is done today). In any game the most important thing is to be able to predict what will happen in the future, and choose your actions accordingly. The richer the game, the higher an intelligence it can measure. For example chess is limited in options when compared to basketball for two reasons. Firstly because the latter happens in physical reality, and secondly because it is played by many players. This causes basketball to have infinite-like possibilities.
Then what’s so smart about natural language tests? Why shouldn’t we have basketball tests instead? Using language lets us create a rich enough environment without resorting to physical reality, which makes it more convenient for tests (especially in Turing’s time), but there is another reason for sticking to these kinds of tests. In order to understand this second reason, we note that any form of intelligence must conceptualize reality, simply because there are more atoms in a room than there are neurons in a person standing in that room (unless you believe every part of the universe has infinite components). Because of that, almost every interesting system must have some kind of compression or inner representation of the world. This means that for a lot of events that happen on the outside, there will be some kind of measurable effect inside the system: some part which is activated concurrently with that event. A simple human example is neurons that fire for very specific events (e.g. the Halle Berry neuron). Concepts are elements of human inner representations which can be communicated to the outside through natural language. So talking to a system is like looking through the outer layers, directly into the system’s inner workings, a feat that would otherwise require us to reverse engineer the system, which is very hard, maybe even practically impossible for us as humans if the system has too many parts or interconnections. To sum up, conceptual tests based on natural language expose conceptual networks, and therefore give a deeper understanding of a system than simply observing how the system reacts to the world or to a game environment.
The halting problem
“What I cannot create, I do not understand.”
- Richard Feynman
Another observation I would like to make is the interesting “solution” nature has found to the halting problem through the ability to predict using empathy and animation. In essence, the halting problem is a feat of prediction, an impossible one, mathematically speaking. One Turing machine has complete information about another and has to state “semantically” what the second will do (halt or not, for example). The halting problem indicates that no specific Turing machine can have a non-heuristic, complete understanding (prediction) of every other Turing machine, due to a lack of representation resources (as shown through diagonalization). Intuitively speaking, the halting problem is impossible because there are “too many” ways to build a Turing machine that performs a certain task. So any heuristic way to predict the next step could be turned into a trap and used against us. If we try to apply these concepts to nature, it is obvious that the situation is even worse: systems never have full information about each other, and even if they did, it would require a vast amount of resources to fully simulate each other (even a human brain does not have enough resources to effectively simulate the brain of a cockroach). This poses a serious problem, as the ability to predict other systems’ behaviour is crucial for survival. So the solution nature has arrived at is something along these lines: for simpler systems it is possible to reconstruct a conceptual model by recognizing patterns and learning the connections between them. This allows building a simple model, which is then run in order to provide a prediction. This estimated simulation is only slightly different from the usual Turing machine simulation, as it compensates for the lack of knowledge. If, however, the system is too complicated even for that, then the strategy of animation is chosen. In this strategy, instead of modeling the other system, an attempt is made to identify the target system’s interests and then simply act as if they were our own.
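The remark that any heuristic predictor “could be turned into a trap and used against us” is the core of the diagonalization argument, and it can be sketched in a few lines of Python. This is only an illustration, assuming programs are zero-argument functions and a predictor is any function that returns `True` for “halts”; the names `make_trap`, `always_loops` and `always_halts` are hypothetical.

```python
from typing import Callable

Program = Callable[[], None]
Predictor = Callable[[Program], bool]   # True means "I predict this program halts"


def make_trap(predicts_halt: Predictor) -> Program:
    """Build a program that does the opposite of what the predictor says about it."""
    def trap() -> None:
        if predicts_halt(trap):
            while True:       # predictor said "halts", so loop forever
                pass
        # predictor said "loops forever", so halt immediately
    return trap


# Two deliberately naive heuristic predictors (assumptions for illustration).
def always_loops(program: Program) -> bool:
    return False


def always_halts(program: Program) -> bool:
    return True


trap_for_pessimist = make_trap(always_loops)
trap_for_pessimist()   # returns at once, so always_loops was wrong about it

trap_for_optimist = make_trap(always_halts)
# Calling trap_for_optimist() would never return, so always_halts is wrong too.
# We only inspect the prediction here instead of running the trap.
print(always_halts(trap_for_optimist))
```

Whatever rule is plugged in as the predictor, `make_trap` yields a program on which that rule fails, which is why prediction in general has to fall back on heuristics.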
Super Turing machines and human computation capabilities
Another important question in the context of defining intelligence addresses the validity of a Turing machine as a model for the human computing ability. I will try to show this question has a connection to the question whether or not the universe is finite.
It has been proven mathematically that, given actually infinite memory, neural networks can perform super-Turing computations. These are simply neural nets (or Turing machines, as they are equivalent) with the ability to store “real” constants (א1 different numbers, if we identify the continuum with א1). In order to be able to do that, we would need actually infinite storage, and the ability to make calculations using these infinite memory cells.
If the universe has a finite number of particles (atoms, quarks or whatever) this is of course impossible, and in fact even true Turing machines would be impossible, as those assume only potentially infinite memory (א0). In any case, this means humans cannot perform super-Turing calculations, and are mathematically equivalent to something weaker than a Turing machine. The same is true for a potentially infinite universe (א0); the only difference is that we could have infinite discrete memory, like the mathematical model of Turing machines.
If however the universe is continuous, or has א1 parts, then the possibility exists that we do not yet have the physics (or maybe even the mathematics) to describe the calculation capability of humans, and therefore the boundaries on Turing machines may not apply to us.
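To see why the finite case above yields something weaker than a Turing machine, here is a minimal sketch, assuming a deterministic machine whose entire memory is captured by a single hashable state. With finitely many states the machine must either halt or revisit a state, so halting becomes decidable by cycle detection. The function `halts` and the two toy machines are hypothetical names introduced for illustration.

```python
from typing import Callable, Hashable, Optional

State = Hashable
# Assumption: a step function returns the next state, or None when the machine halts.
Step = Callable[[State], Optional[State]]


def halts(step: Step, start: State) -> bool:
    """Decide halting for a deterministic machine with finitely many states.

    A finite-memory machine that never halts must eventually repeat a state,
    and a repeated state means it will cycle forever.
    """
    seen = set()
    state: Optional[State] = start
    while state is not None:
        if state in seen:
            return False      # repeated configuration: infinite loop
        seen.add(state)
        state = step(state)
    return True


# Toy machines over a 4-bit counter (16 possible states).
def count_up(n: int) -> Optional[int]:
    return None if n == 15 else n + 1     # halts when the counter reaches 15


def spin(n: int) -> Optional[int]:
    return (n + 1) % 16                   # cycles through 0..15 forever


print(halts(count_up, 0))
print(halts(spin, 0))
```

Note that the predictor here may need to remember every state of its target, so it needs more memory than the machine it predicts, which echoes the resource argument made in the halting problem section.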
Observations for intelligence:
- Predicting the environment
- Junctions between different orders of magnitude
- Empathy and animation as opposed to mechanical understanding.
Definition: System A is more intelligent than B if A can predict B.
Tests for intelligence:
- The importance of a rich enough environment: physical reality or language, and not a chess game.
- The importance of language as a window to the inner workings of a system.
Other topics discussed:
- Connections between the ability to animate and the halting problem.
- Connection between the question of whether or not the universe is infinite (and to what level of infinity) and the question of whether or not we are Turing machines.
(originally written in 2010, so pardon the lack of more direct references to ML)