Elegant Signatures of Symmetry in Biological and Artificial Intelligence

On the Subtle Impact of Symmetry and Balance on the Design and Emergence of Intelligent Systems in Humans and Machines

Symmetry is a characteristic found throughout nature, lending a sense of beauty and often associated with health and functionality. It can be observed at every scale, from the arrangement of petals on a flower to the patterns of galaxies.

When it comes to intelligence, the concept of symmetry becomes more abstract. We humans exhibit extensive biological symmetry, which is mirrored in the complex structure of our brains: we have two hemispheres that work together cohesively yet are responsible for different functions, and we have sympathetic and parasympathetic nervous systems that operate in antagonistic symmetry. These anatomical and physiological symmetries in our nervous system and brain certainly play a crucial role in the development of natural intelligence, though the exact relationships are still being studied and explored by researchers worldwide.

A stereotypical image of brain lateralisation — demonstrated to be false in neuroscientific research.

In the sphere of intelligence, symmetry can also be seen as achieving a certain equilibrium among abilities such as emotional intelligence and logical reasoning, or among the ‘multiple intelligences’ proposed by Howard Gardner, an American developmental psychologist and Research Professor of Cognition and Education at the Harvard Graduate School of Education: linguistic aptitude, logical-mathematical skills, spatial awareness, and so forth.

Howard Gardner’s Theory of Multiple Intelligences posits that individuals possess various distinct types of intelligences, rather than a single general intelligence. These types encompass areas like linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalistic intelligences, emphasizing a broader understanding of human capability.

In my view, striking a balance (and I mean balance as a notion of symmetry) between these facets, by minimizing some kind of entropic potential function in interrelation with the self and the environment, could result in comprehensive problem-solving capabilities and deeper cognition. Examples of such potential functions have been elaborated in concepts such as the cross-entropy loss, the Friston free energy, or the generalised Landau free energy.

The concept of free energy minimization, for instance, explains how our brain processes perceptions by making unconscious predictions. This approach, rooted in Bayesian theory, suggests that our brain balances these predictions to better align with incoming sensory data through a process similar to gradient descent. This adjustment is mathematically represented as:

$$\dot{\tilde{\mu}} = D\tilde{\mu} - \left.\frac{\partial F(s,\mu)}{\partial \mu}\right|_{\mu=\tilde{\mu}}$$

This equation describes the rate of change of the brain’s estimate of some state μ. The tilde (~) indicates that the variable is expressed in generalised coordinates of motion. D is a derivative operator, capturing how the brain updates its beliefs over time to minimize prediction errors. The second term on the right-hand side is the gradient (partial derivative) of the free energy F with respect to μ, where F(s, μ) depends on the sensory input s and the state μ. This gradient gives the direction and rate at which changes in μ decrease the free energy, guiding the system towards a state that better explains the sensory input. The gradient is evaluated at the point where μ equals its current estimate μ̃.

Schematic figure illustrating a Markov Blanket in biological systems.

With this concept in mind, the figure above schematically illustrates the partition of states into internal states and external (hidden or latent) states, separated by a so-called Markov Blanket comprising sensory states and active states. In the upper picture, focussed on a cell, the dependencies are arranged such that the internal states are associated with the intracellular states of the cell. Here, the sensory states become the surface states of the cell membrane overlying active states, e.g., the actin filaments of the cytoskeleton. The lower illustration, in contrast, shows the same partition and dependencies rearranged as they would apply to action and perception in the entire brain. There, active and internal states minimize a free energy functional of sensory states. The ensuing balanced self-organisation of internal states then corresponds to perception, while action couples brain states back to external states. The entire mechanism resembles a dynamic process that sustains itself through an interplay between the formation of symmetries, by creating manifolds of the free energy functional, and symmetry breaking, by approaching stable attractor dynamics on those manifolds.¹
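To make this free-energy descent concrete, here is a minimal numerical sketch in Python, assuming a simple Gaussian generative model in which the free energy reduces to a precision-weighted sum of squared prediction errors. This is the static case, so the generalised-motion term Dμ̃ is dropped; all names and values are illustrative rather than taken from any particular implementation.

```python
# Illustrative sketch: free-energy minimization by gradient descent,
# assuming a Gaussian generative model. F is then a precision-weighted
# sum of squared prediction errors (sensory term + prior term).

def free_energy(s, mu, prior_mu=0.0, pi_s=1.0, pi_p=1.0):
    """F(s, mu): sensory prediction error plus prior prediction error."""
    return 0.5 * (pi_s * (s - mu) ** 2 + pi_p * (mu - prior_mu) ** 2)

def dF_dmu(s, mu, prior_mu=0.0, pi_s=1.0, pi_p=1.0):
    """Gradient of F with respect to the internal state mu."""
    return -pi_s * (s - mu) + pi_p * (mu - prior_mu)

s = 2.0    # observed sensory input
mu = 0.0   # initial estimate of the hidden state
lr = 0.1   # step size of the gradient descent

for _ in range(50):
    mu -= lr * dF_dmu(s, mu)  # mu follows the negative free-energy gradient

print(f"final mu = {mu:.3f}, F = {free_energy(s, mu):.4f}")
```

The estimate settles between the prior (0.0) and the sensory input (2.0), balancing the two sources of prediction error, which is exactly the kind of equilibrium the equation above describes.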

If symmetry indeed has an impact on the development and functioning of intelligence, it stands to reason that we would also observe indications of symmetry in artificial intelligence.

Indeed, the concept of symmetry, or more broadly, balance in structure, is implicit in the design and training of AI models, including, for instance, Large Language Models (LLMs).

In the transformer architecture, in the encoder, tokens communicate with each other and update their representations; in the decoder, a target token first looks at previously generated target tokens, then at the source, and finally updates its representation. This happens in several layers, usually 6. Inherently, the architecture operates symmetrically, by encoding versus decoding for a desired optimum.

Neural networks often incorporate structural symmetry in their architecture, meaning that their layers are organized in an orderly manner that ensures an optimal flow of information throughout the network. For instance, models like GPT-4 build on the Transformer architecture, which in its original form consists of a symmetric configuration of encoder and decoder attention components (GPT-style models use a decoder-only variant of this design).
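As an illustration, this symmetric configuration can be instantiated directly with PyTorch's built-in nn.Transformer; the hyperparameters below are merely illustrative and mirror the six-layer encoder and decoder stacks mentioned above.

```python
import torch
import torch.nn as nn

# Sketch of a symmetric encoder/decoder Transformer: the encoder stack
# is mirrored by an equally deep decoder stack.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)

src = torch.rand(2, 10, 512)  # (batch, source tokens, embedding dim)
tgt = torch.rand(2, 7, 512)   # (batch, target tokens, embedding dim)
out = model(src, tgt)         # decoder attends to its own past and to src
print(out.shape)              # torch.Size([2, 7, 512])
```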

A high-level view on the attention mechanism in machine translation: An attention mechanism is a part of a neural network. At each decoder step, it decides which source parts are more important. In this setting, the encoder does not have to compress the whole source into a single vector — it gives representations for all source tokens. Regarding the encoder/decoder system as a concept of symmetry and balance, the attention mechanism may determine dynamically the degree of balance at each step.

While encoder/decoder attention looks from one current decoder state to all encoder states, self-attention, at a more sophisticated level of symmetry, looks from each state in a set of states to all other states in the same set. Self-attention can be understood as a dynamic form of self-symmetry or self-balancing.

Self-attention is the component of the model where tokens interact with each other. Each token “looks” at other tokens in the sentence with an attention mechanism, gathers context, and updates the previous representation of “self”.
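A minimal sketch of this computation in plain PyTorch: each token derives queries, keys, and values from the same set of representations, attends to every member of that set, and returns a context-updated version of itself. The dimensions and random weights are illustrative.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Queries, keys, and values all come from the same set of tokens x,
    # which is what makes the mechanism "self"-attention.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # scaled dot products
    weights = F.softmax(scores, dim=-1)  # each token's view of all tokens
    return weights @ v                   # context-updated representations

d = 16
x = torch.rand(5, d)  # 5 tokens with d-dimensional embeddings
w_q, w_k, w_v = (torch.rand(d, d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 16])
```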

Following this rationale, sharing parameters across parts of a model can also be seen as an expression of symmetry (see the case example further down). In this approach, the same weights are used to process different sections of the input data, ensuring consistency and reducing the number of parameters. In addition, regularization techniques are employed to prevent overfitting, encouraging the model to learn more generalizable features and enhancing balance, or symmetry, within the model weights.
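As a sketch of parameter sharing, the toy module below applies one and the same weight matrix at every depth step, in the spirit of ALBERT-style cross-layer sharing; the class name and sizes are hypothetical.

```python
import torch.nn as nn

class SharedBlockNet(nn.Module):
    """Toy network in which all 'layers' share a single weight matrix."""

    def __init__(self, dim=64, depth=4):
        super().__init__()
        self.shared = nn.Linear(dim, dim)  # one weight set, reused at every depth
        self.act = nn.ReLU()
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):  # the same transformation is applied repeatedly
            x = self.act(self.shared(x))
        return x
```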

Furthermore, data augmentation techniques introduce symmetries into training data by performing operations like image flipping or rotation, particularly in vision models. These symmetries enable models to learn features more effectively.
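For instance, a standard torchvision augmentation pipeline might look like the following sketch; the specific transforms and parameters are illustrative choices.

```python
from torchvision import transforms

# Symmetry-inducing augmentations: flips and small rotations present the
# same image in several symmetric variants during training.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # mirror symmetry
    transforms.RandomRotation(degrees=15),   # approximate rotational symmetry
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied to a PIL image in the data loader
```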

Certain models are designed to be multimodal, handling different types of data such as text and images. These models also aim to strike a balance and maintain symmetry in processing the various modalities, allowing them to interact and complement each other effectively. Moreover, ensemble methods, which combine the predictions of multiple models, aim to achieve balanced and robust performance by mitigating the weaknesses of individual models. Likewise, some models utilize symmetric loss functions to ensure that errors are treated symmetrically, promoting a balanced optimization process.
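A minimal sketch of the latter two ideas together, with illustrative stand-in models: ensemble predictions are averaged, and the mean squared error serves as a simple symmetric loss, penalizing over- and under-predictions of equal magnitude equally.

```python
import torch
import torch.nn as nn

models = [nn.Linear(8, 1) for _ in range(3)]  # toy ensemble of three models
x = torch.rand(4, 8)

# Averaging balances out the weaknesses of any individual model.
ensemble_pred = torch.stack([m(x) for m in models]).mean(dim=0)

target = torch.rand(4, 1)
loss = nn.MSELoss()(ensemble_pred, target)  # symmetric in the sign of the error
print(loss.item())
```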

Obviously, symmetry in structure and design shapes the way patterns of intelligence emerge and unfold. When we initialize a neural network, we typically start with random weights. As the network learns from training data, these weights are adjusted to reduce errors. The way these adjustments happen is not fully random but follows certain patterns dictated by the network’s design, patterns that are often related to and determined by symmetries in the architecture. Symmetry in architecture and design thus influences how weights are updated and how the machine learns.
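A classic toy illustration of this point: if symmetric parts of a network are initialized identically, the architecture makes their gradients identical as well, so gradient descent preserves that symmetry indefinitely; random initialization is used precisely to break it. The network below is a hypothetical stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 2, bias=False), nn.Linear(2, 1, bias=False))
with torch.no_grad():
    net[0].weight.fill_(0.5)  # both hidden units start with identical weights
    net[1].weight.fill_(0.5)

x, y = torch.rand(8, 4), torch.rand(8, 1)
loss = nn.MSELoss()(net(x), y)
loss.backward()

# Both rows of the gradient are identical: the two hidden units are
# interchangeable, and the update rule preserves that symmetry.
print(net[0].weight.grad)
```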
