Evolving AI Biosphere

Where Neural Networks Meet Ecosystem Dynamics

A living laboratory combining artificial life, reinforcement learning, and ecological simulation


Core Systems Overview

The simulation is built on five interconnected subsystems working in harmony:

  • 🧠 Neural Network Agents: LSTM policies for intelligent decision-making in organisms
  • 🗺️ Cellular Grid System: 50×50 spatial world representation with 2,500-organism capacity
  • Energy Economy: resource flow and metabolism governing survival
  • 💨 Chemical Signaling: scent diffusion for indirect perception beyond adjacency
  • 📈 Reinforcement Learning: hive-mind training pipeline with the REINFORCE algorithm

Neural Network Architecture

🌿 Herbivore/Plant LSTM (CellLSTM)

Per-organism neural network for individual decision-making

Input Layer (8 neurons)

  • plant_count (0-1)
  • herbivore_count (0-1)
  • predator_count (0-1)
  • nutrient_count (0-1)
  • energy_level (0-1)
  • age_factor (0-1)
  • random_noise_1
  • random_noise_2

Hidden Layer

LSTMCell (32 hidden units)

Maintains internal state (h, c) across timesteps

Output Layer (4 neurons)

Actions with softmax activation:

  • Reproduce
  • Move
  • Rest
  • Default
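
A minimal PyTorch-style sketch of this per-organism policy (class and variable names are illustrative, not the project's exact code):

import torch
import torch.nn as nn

class CellLSTM(nn.Module):
    """Per-organism policy: 8 observation features -> 4 action probabilities."""
    def __init__(self, obs_size=8, hidden_size=32, num_actions=4):
        super().__init__()
        self.cell = nn.LSTMCell(obs_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs, state):
        # state = (h, c) is carried by the organism across timesteps
        h, c = self.cell(obs, state)
        action_probs = torch.softmax(self.head(h), dim=-1)
        return action_probs, (h, c)

# Each organism keeps its own network and its own recurrent state
net = CellLSTM()
h, c = torch.zeros(1, 32), torch.zeros(1, 32)
obs = torch.rand(1, 8)                      # the 8 normalized observation features
probs, (h, c) = net(obs, (h, c))
action = torch.multinomial(probs, 1).item() # sample one of the 4 actions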

🧬 Mutation Mechanism

During reproduction, offspring neural networks undergo random perturbations:

  • 2% probability per weight
  • Gaussian noise with σ=0.01
  • Preserves general behavior while enabling evolutionary drift
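
A sketch of how such a mutation could be applied to an offspring's copied network (assuming PyTorch parameters as in the CellLSTM sketch above; only the 2% rate and σ = 0.01 come from the text):

import copy
import torch

def mutate(parent_net, prob=0.02, sigma=0.01):
    """Copy the parent network and perturb roughly 2% of weights with N(0, 0.01) noise."""
    child = copy.deepcopy(parent_net)
    with torch.no_grad():
        for param in child.parameters():
            mask = torch.rand_like(param) < prob      # ~2% of weights selected
            noise = torch.randn_like(param) * sigma   # Gaussian noise, sigma = 0.01
            param.add_(mask * noise)                  # unselected weights stay unchanged
    return child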

🦁 Predator Hive LSTM (PredatorHive)

Shared policy network used by all predators collectively

Input Layer (8 neurons)

Same observation space as CellLSTM

Hidden Layer

LSTMCell (64 hidden units)

Fresh state per forward pass during training

Output Layer (4 neurons)

  • Reproduce
  • Move
  • Rest
  • Hunt

🎯 Key Advantage: Shared Network

  • Predators learn from collective experience
  • Rare successful strategies spread instantly to all individuals
  • No need for explicit communication—learning IS the communication
  • Faster convergence than per-organism evolution

Training Pipeline (REINFORCE Algorithm)

For each batch of experiences (s, a, r):
  1. Normalize rewards across the batch: r_norm = (r - mean(R)) / std(R)
  2. Compute log probabilities: log π(a|s)
  3. Compute loss: L = -mean(log π(a|s) · r_norm)
  4. Backpropagate and update weights

Hyperparameters: learning rate 1e-3, epochs 8, batch size 64, optimizer Adam
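
A condensed sketch of one such training pass (assuming PyTorch and a hive network with the same forward interface as the CellLSTM sketch above but 64 hidden units; buffer layout and names are illustrative):

import torch

def train_hive(hive, experiences, lr=1e-3, epochs=8, batch_size=64):
    """REINFORCE update over collected (observation, action, reward) tuples."""
    optimizer = torch.optim.Adam(hive.parameters(), lr=lr)
    obs = torch.stack([torch.as_tensor(o, dtype=torch.float32) for o, _, _ in experiences])
    actions = torch.tensor([a for _, a, _ in experiences])
    rewards = torch.tensor([r for _, _, r in experiences], dtype=torch.float32)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # normalize rewards

    for _ in range(epochs):
        for i in range(0, len(experiences), batch_size):
            b = slice(i, i + batch_size)
            h = torch.zeros(obs[b].shape[0], 64)    # fresh LSTM state per forward pass
            c = torch.zeros_like(h)
            probs, _ = hive(obs[b], (h, c))
            chosen = probs.gather(1, actions[b].unsqueeze(1)).squeeze(1)
            loss = -(torch.log(chosen + 1e-8) * rewards[b]).mean()  # policy-gradient loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()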

Spatial Grid System

Grid Structure

World Size 800×800 pixels
Tile Size 16 pixels
Grid Dimensions 50×50 cells
Total Capacity 2,500 organisms
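
A small sketch of the pixel-to-cell mapping these numbers imply (helper names are illustrative):

TILE_SIZE = 16                  # pixels per cell
GRID_SIZE = 800 // TILE_SIZE    # = 50 cells per side

def pixel_to_cell(px, py):
    """Map an 800x800 pixel coordinate to its 50x50 grid cell."""
    return px // TILE_SIZE, py // TILE_SIZE

def cell_to_pixel(cx, cy):
    """Top-left pixel of a grid cell."""
    return cx * TILE_SIZE, cy * TILE_SIZE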

Species Parameters

Species        Max Energy   Metabolism   Reproduction   Max Age
🌱 Plant          150           0.4           130          350
🦌 Herbivore      120           0.9            90          400
🦁 Predator       140           1.2           100          600
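
These values could be kept in a simple configuration table; a sketch of one possible layout (the dictionary structure is an assumption, the numbers come from the table above):

SPECIES_PARAMS = {
    "plant":     dict(max_energy=150, metabolism=0.4, reproduction=130, max_age=350),
    "herbivore": dict(max_energy=120, metabolism=0.9, reproduction=90,  max_age=400),
    "predator":  dict(max_energy=140, metabolism=1.2, reproduction=100, max_age=600),
}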

Energy System

Energy Flow Diagram

☀️ Sunlight → 🌱 Plants (photosynthesis) → 🦌 Herbivores (consumption) → 🦁 Predators (predation) → 💀 Death

🌱 Photosynthesis (Plants)

local_light = light_map[x, y] * global_light
temperature_factor = 1.0 - abs(temperature - 0.5)
energy_gain = 3.5 * local_light * temperature_factor

Constraints:
  - light_map: spatial field (0-1)
  - global_light: (0.3-1.0)
  - temperature: (0-1), optimal at 0.5
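
The same rule written as a small function (a direct transcription of the formula above; the function name and arguments are illustrative):

def photosynthesis_gain(light_map, x, y, global_light, temperature):
    """Energy gained by a plant at cell (x, y) this cycle."""
    local_light = light_map[x, y] * global_light        # 0-1 field scaled by global light (0.3-1.0)
    temperature_factor = 1.0 - abs(temperature - 0.5)   # peaks at the 0.5 optimum
    return 3.5 * local_light * temperature_factor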

🦌 Herbivore Consumption

When herbivore eats plant:
  energy_transfer = min(plant.energy * 0.9, 60)
  herbivore.energy += energy_transfer
  herbivore.energy = min(herbivore.energy, max_energy)
  plant dies (removed from grid)

🦁 Predator Predation

When predator eats herbivore:
  energy_transfer = min(herbivore.energy * 1.5, 120)
  predator.energy += energy_transfer
  predator.energy = min(predator.energy, max_energy)
  herbivore dies (removed from grid)
  
  # Record experience for hive training
  hive_experiences.append((observation, action=3, reward=energy_transfer))
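
A combined sketch of both feeding rules, including the experience hook that feeds the hive trainer (object attributes, the grid.remove helper, and the hive_experiences list are assumptions based on the snippets above):

def herbivore_eats(herbivore, plant, grid):
    """Transfer up to 90% of the plant's energy, capped at 60."""
    gain = min(plant.energy * 0.9, 60)
    herbivore.energy = min(herbivore.energy + gain, herbivore.max_energy)
    grid.remove(plant)                      # plant dies

def predator_eats(predator, herbivore, grid, observation, hive_experiences):
    """Transfer up to 1.5x the herbivore's energy, capped at 120, and log the hunt."""
    gain = min(herbivore.energy * 1.5, 120)
    predator.energy = min(predator.energy + gain, predator.max_energy)
    grid.remove(herbivore)                  # herbivore dies
    # action index 3 = Hunt; the energy gained serves as the REINFORCE reward
    hive_experiences.append((observation, 3, gain))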

Chemical Signaling (Scent System)

Enables indirect perception—organisms detect resources/threats beyond direct adjacency

Diffusion Algorithm

import numpy as np

def diffuse_once(grid, decay=0.85):
    """One diffusion step: each cell keeps `decay` of its scent and sheds the
    rest equally to its 4 neighbors (scent shed across the border is lost)."""
    new_grid = grid * decay
    share = grid * (1 - decay) * 0.25       # amount sent to each neighbor
    new_grid[1:, :]  += share[:-1, :]       # to the cell below
    new_grid[:-1, :] += share[1:, :]        # to the cell above
    new_grid[:, 1:]  += share[:, :-1]       # to the cell on the right
    new_grid[:, :-1] += share[:, 1:]        # to the cell on the left
    return new_grid

# Applied iteratively
for _ in range(3):  # scent_diffuse_steps
    plant_scent = diffuse_once(plant_scent, 0.85)
    herbivore_scent = diffuse_once(herbivore_scent, 0.85)

Properties

  • 📉 Decay Rate: 0.85. Each cell keeps 85% of its scent intensity per diffusion step; the other 15% is shed to its neighbors.
  • 🌊 Diffusion Steps: 3. Scent spreads up to 3 cells outward from its source.
  • 🔄 Share Factor: 0.25. The shed intensity is split equally among the 4 neighbors.
  • 📊 Normalization: final values are scaled to 0-1 for consistent perception.

Environmental Systems

☀️ Spatial Light Map

Creates organic light patches resembling forest canopy or underwater zones

Generation: Base vertical gradient + sinusoidal patterns
Bright Zones (light > 0.7): Plants thrive
Dark Zones (light < 0.18): Energy penalties and increased death risk
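
One way such a field could be generated, as a sketch only: the vertical gradient and sinusoidal mix follow the description above, but the constants and exact formula are illustrative assumptions, not the project's code:

import numpy as np

def make_light_map(size=50):
    """Illustrative light field: vertical gradient blended with sinusoidal patches."""
    y, x = np.mgrid[0:size, 0:size] / (size - 1)
    gradient = 1.0 - 0.5 * y                                 # brighter toward the top
    patches = 0.25 * np.sin(6 * np.pi * x) * np.sin(4 * np.pi * y)
    return np.clip(gradient + patches, 0.0, 1.0)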

🌡️ Temperature

Range: 0.0 (cold) to 1.0 (hot)
Update: Random walk ± 0.02 per cycle
Effect: Multiplies metabolism rates

💡 Global Light

Range: 0.3 (dim) to 1.0 (bright)
Update: Random walk ± 0.03 per cycle
Effect: Multiplies photosynthesis
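
A minimal sketch of these per-cycle drifts (the clamping bounds come from the ranges above; the function itself is illustrative):

import random

def update_environment(temperature, global_light):
    """Random-walk both scalars once per cycle, clamped to their stated ranges."""
    temperature = min(1.0, max(0.0, temperature + random.uniform(-0.02, 0.02)))
    global_light = min(1.0, max(0.3, global_light + random.uniform(-0.03, 0.03)))
    return temperature, global_light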

Theoretical Foundations

📐 Lotka-Volterra Dynamics

The predator-prey oscillations observed follow modified Lotka-Volterra equations:

dP/dt = αP - βPH        (plants grow, eaten by herbivores)
dH/dt = δβPH - γH - ζHR (herbivores eat plants, die naturally, eaten by predators)
dR/dt = εζHR - λR       (predators eat herbivores, die naturally)

Where:
  P = plant population
  H = herbivore population
  R = predator population
  α = plant growth rate (photosynthesis)
  β = herbivore predation rate on plants
  δ = conversion efficiency of consumed plants into new herbivores
  γ = herbivore natural death rate
  ζ = predator predation rate on herbivores
  ε = conversion efficiency of consumed herbivores into new predators
  λ = predator natural death rate

Unlike classical Lotka-Volterra, learning introduces non-constant coefficients—oscillations gradually dampen as the system learns stable equilibrium.
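
For reference, a minimal Euler-integration sketch of the classical constant-coefficient system above (the coefficient values are arbitrary placeholders, chosen only to illustrate the three-level oscillation):

def lotka_volterra_step(P, H, R, dt=0.01,
                        alpha=1.0, beta=0.02, delta=0.5,
                        gamma=0.4, zeta=0.02, epsilon=0.5, lam=0.3):
    """One Euler step of the plant/herbivore/predator ODEs."""
    dP = alpha * P - beta * P * H
    dH = delta * beta * P * H - gamma * H - zeta * H * R
    dR = epsilon * zeta * H * R - lam * R
    return P + dt * dP, H + dt * dH, R + dt * dR

P, H, R = 40.0, 9.0, 3.0
for _ in range(10_000):
    P, H, R = lotka_volterra_step(P, H, R)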

🎓 Reinforcement Learning Framework

The predator hive implements a REINFORCE policy gradient algorithm:

π(a|s; θ): Policy network (parameterized by θ)
J(θ) = E[Σ γᵗ r_t]: Expected cumulative reward
∇J(θ) = E[∇ log π(a|s; θ) * R]: Policy gradient

Update rule:
θ ← θ + α * ∇J(θ)

Advantages of REINFORCE:

  • Simple implementation (single network, no value function)
  • Works with continuous high-dimensional state spaces
  • Naturally handles stochastic policies
  • Suitable for episodic tasks (hunt = episode)

🧬 Evolution vs. Learning

Phylogenetic (Evolutionary)

  • Herbivore neural weights mutate across generations
  • Selection pressure: organisms with better weights survive longer
  • Timescale: 50-200 generations
  • Mechanism: Variation (mutation) + Selection (survival)

Ontogenetic (Learning)

  • Predator hive updates weights through gradient descent
  • Improvement within single generation's lifetime
  • Timescale: 20 generations (one training cycle)
  • Mechanism: Error backpropagation + experience replay

Baldwin Effect: Learning can guide evolution—predators that learn to hunt successfully reproduce more, increasing the "learnability" of the hive over time.

Key Takeaways

  • Energy is the fundamental currency driving all behavior
  • ⏱️ Learning and evolution operate on different timescales but interact
  • 💥 Extinction cascades can wipe out entire ecosystems rapidly
  • 🗺️ Spatial structure creates ecological niches
  • 🧠 Shared intelligence enables rapid species-level adaptation
  • 🌀 Emergent complexity arises from simple local interactions

AI Biosphere represents a unique intersection of artificial life, reinforcement learning, and ecological simulation.

By combining neural network agents with emergent ecosystem dynamics, it creates a living laboratory for studying adaptation, cooperation, and survival strategies.

Experiment, observe, and discover the unexpected patterns that emerge when artificial organisms learn to survive in a dynamic world.

Created by Vivek Dagar