Evolving AI Biosphere

Where Neural Networks Meet Ecosystem Dynamics

A living laboratory combining artificial life, reinforcement learning, and ecological simulation


Core Systems Overview

The simulation is built on five interconnected subsystems working in harmony:

  • 🧠 Neural Network Agents: LSTM policies for intelligent decision-making in organisms
  • 🗺️ Cellular Grid System: 50×50 spatial world representation with 2,500-organism capacity
  • Energy Economy: resource flow and metabolism governing survival
  • 💨 Chemical Signaling: scent diffusion for indirect perception beyond adjacency
  • 📈 Reinforcement Learning: hive-mind training pipeline with the REINFORCE algorithm

Neural Network Architecture

🌿 Herbivore/Plant LSTM (CellLSTM)

Per-organism neural network for individual decision-making

Input Layer (8 neurons)

  • plant_count (0-1)
  • herbivore_count (0-1)
  • predator_count (0-1)
  • nutrient_count (0-1)
  • energy_level (0-1)
  • age_factor (0-1)
  • random_noise_1
  • random_noise_2

Hidden Layer

LSTMCell (32 hidden units)

Maintains internal state (h, c) across timesteps

Output Layer (4 neurons)

Actions with softmax activation:

  • Reproduce
  • Move
  • Rest
  • Default
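
A minimal PyTorch-style sketch of this per-organism policy (class and variable names are illustrative, not the project's exact code):

import torch
import torch.nn as nn

class CellLSTM(nn.Module):
    """Per-organism policy: 8 observation features -> 4 action probabilities."""
    def __init__(self, obs_size=8, hidden_size=32, num_actions=4):
        super().__init__()
        self.cell = nn.LSTMCell(obs_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs, state):
        # state = (h, c) is carried by the organism across timesteps
        h, c = self.cell(obs, state)
        action_probs = torch.softmax(self.head(h), dim=-1)
        return action_probs, (h, c)

# Each organism keeps its own network and its own recurrent state
net = CellLSTM()
h, c = torch.zeros(1, 32), torch.zeros(1, 32)
obs = torch.rand(1, 8)                      # the 8 normalized observation features
probs, (h, c) = net(obs, (h, c))
action = torch.multinomial(probs, 1).item() # sample one of the 4 actions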

🧬 Mutation Mechanism

During reproduction, offspring neural networks undergo random perturbations:

  • 2% probability per weight
  • Gaussian noise with σ=0.01
  • Preserves general behavior while enabling evolutionary drift
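
A sketch of how such a mutation could be applied to an offspring's copied network (assuming PyTorch parameters as in the CellLSTM sketch above; only the 2% rate and σ = 0.01 come from the text):

import copy
import torch

def mutate(parent_net, prob=0.02, sigma=0.01):
    """Copy the parent network and perturb roughly 2% of weights with N(0, 0.01) noise."""
    child = copy.deepcopy(parent_net)
    with torch.no_grad():
        for param in child.parameters():
            mask = torch.rand_like(param) < prob      # ~2% of weights selected
            noise = torch.randn_like(param) * sigma   # Gaussian noise, sigma = 0.01
            param.add_(mask * noise)                  # unselected weights stay unchanged
    return child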

🦁 Predator Hive LSTM (PredatorHive)

Shared policy network used by all predators collectively

Input Layer (8 neurons)

Same observation space as CellLSTM

Hidden Layer

LSTMCell (64 hidden units)

Fresh state per forward pass during training

Output Layer (4 neurons)

  • Reproduce
  • Move
  • Rest
  • Hunt

🎯 Key Advantage: Shared Network

  • Predators learn from collective experience
  • Rare successful strategies spread instantly to all individuals
  • No need for explicit communication—learning IS the communication
  • Faster convergence than per-organism evolution

Training Pipeline (REINFORCE Algorithm)

For each batch of experiences (s, a, r):
  1. Normalize rewards across the batch: r_norm = (r - mean(R)) / std(R)
  2. Compute log probabilities: log π(a|s)
  3. Compute loss: L = -mean(log π(a|s) · r_norm)
  4. Backpropagate and update weights

Hyperparameters: learning rate 1e-3, epochs 8, batch size 64, optimizer Adam
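
A condensed sketch of one such training pass (assuming PyTorch and a hive network with the same forward interface as the CellLSTM sketch above but 64 hidden units; buffer layout and names are illustrative):

import torch

def train_hive(hive, experiences, lr=1e-3, epochs=8, batch_size=64):
    """REINFORCE update over collected (observation, action, reward) tuples."""
    optimizer = torch.optim.Adam(hive.parameters(), lr=lr)
    obs = torch.stack([torch.as_tensor(o, dtype=torch.float32) for o, _, _ in experiences])
    actions = torch.tensor([a for _, a, _ in experiences])
    rewards = torch.tensor([r for _, _, r in experiences], dtype=torch.float32)
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # normalize rewards

    for _ in range(epochs):
        for i in range(0, len(experiences), batch_size):
            b = slice(i, i + batch_size)
            h = torch.zeros(obs[b].shape[0], 64)    # fresh LSTM state per forward pass
            c = torch.zeros_like(h)
            probs, _ = hive(obs[b], (h, c))
            chosen = probs.gather(1, actions[b].unsqueeze(1)).squeeze(1)
            loss = -(torch.log(chosen + 1e-8) * rewards[b]).mean()  # policy-gradient loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()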

Spatial Grid System

Grid Structure

World Size 800×800 pixels
Tile Size 16 pixels
Grid Dimensions 50×50 cells
Total Capacity 2,500 organisms
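
A small sketch of the pixel-to-cell mapping these numbers imply (helper names are illustrative):

TILE_SIZE = 16                  # pixels per cell
GRID_SIZE = 800 // TILE_SIZE    # = 50 cells per side

def pixel_to_cell(px, py):
    """Map an 800x800 pixel coordinate to its 50x50 grid cell."""
    return px // TILE_SIZE, py // TILE_SIZE

def cell_to_pixel(cx, cy):
    """Top-left pixel of a grid cell."""
    return cx * TILE_SIZE, cy * TILE_SIZE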

Species Parameters

Species        Max Energy   Metabolism   Reproduction   Max Age
🌱 Plant          150           0.4           130          350
🦌 Herbivore      120           0.9            90          400
🦁 Predator       140           1.2           100          600
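
These values could be kept in a simple configuration table; a sketch of one possible layout (the dictionary structure is an assumption, the numbers come from the table above):

SPECIES_PARAMS = {
    "plant":     dict(max_energy=150, metabolism=0.4, reproduction=130, max_age=350),
    "herbivore": dict(max_energy=120, metabolism=0.9, reproduction=90,  max_age=400),
    "predator":  dict(max_energy=140, metabolism=1.2, reproduction=100, max_age=600),
}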

Energy System

Energy Flow Diagram

☀️ Sunlight → 🌱 Plants (photosynthesis) → 🦌 Herbivores (consumption) → 🦁 Predators (predation) → 💀 Death

🌱 Photosynthesis (Plants)

local_light = light_map[x, y] * global_light
temperature_factor = 1.0 - abs(temperature - 0.5)
energy_gain = 3.5 * local_light * temperature_factor

Constraints:
  - light_map: spatial field (0-1)
  - global_light: (0.3-1.0)
  - temperature: (0-1), optimal at 0.5
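
The same rule written as a small function (a direct transcription of the formula above; the function name and arguments are illustrative):

def photosynthesis_gain(light_map, x, y, global_light, temperature):
    """Energy gained by a plant at cell (x, y) this cycle."""
    local_light = light_map[x, y] * global_light        # 0-1 field scaled by global light (0.3-1.0)
    temperature_factor = 1.0 - abs(temperature - 0.5)   # peaks at the 0.5 optimum
    return 3.5 * local_light * temperature_factor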

🦌 Herbivore Consumption

When herbivore eats plant:
  energy_transfer = min(plant.energy * 0.9, 60)
  herbivore.energy += energy_transfer
  herbivore.energy = min(herbivore.energy, max_energy)
  plant dies (removed from grid)

🦁 Predator Predation

When predator eats herbivore:
  energy_transfer = min(herbivore.energy * 1.5, 120)
  predator.energy += energy_transfer
  predator.energy = min(predator.energy, max_energy)
  herbivore dies (removed from grid)
  
  # Record experience for hive training
  hive_experiences.append((observation, action=3, reward=energy_transfer))
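
A combined sketch of both feeding rules, including the experience hook that feeds the hive trainer (object attributes, the grid.remove helper, and the hive_experiences list are assumptions based on the snippets above):

def herbivore_eats(herbivore, plant, grid):
    """Transfer up to 90% of the plant's energy, capped at 60."""
    gain = min(plant.energy * 0.9, 60)
    herbivore.energy = min(herbivore.energy + gain, herbivore.max_energy)
    grid.remove(plant)                      # plant dies

def predator_eats(predator, herbivore, grid, observation, hive_experiences):
    """Transfer up to 1.5x the herbivore's energy, capped at 120, and log the hunt."""
    gain = min(herbivore.energy * 1.5, 120)
    predator.energy = min(predator.energy + gain, predator.max_energy)
    grid.remove(herbivore)                  # herbivore dies
    # action index 3 = Hunt; the energy gained serves as the REINFORCE reward
    hive_experiences.append((observation, 3, gain))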

Chemical Signaling (Scent System)

Enables indirect perception—organisms detect resources/threats beyond direct adjacency

Diffusion Algorithm

import numpy as np

def diffuse_once(grid, decay=0.85):
    """One diffusion step: each cell keeps `decay` of its scent and sheds the
    rest equally to its 4 neighbors (scent shed across the border is lost)."""
    new_grid = grid * decay
    share = grid * (1 - decay) * 0.25       # amount sent to each neighbor
    new_grid[1:, :]  += share[:-1, :]       # to the cell below
    new_grid[:-1, :] += share[1:, :]        # to the cell above
    new_grid[:, 1:]  += share[:, :-1]       # to the cell on the right
    new_grid[:, :-1] += share[:, 1:]        # to the cell on the left
    return new_grid

# Applied iteratively
for _ in range(3):  # scent_diffuse_steps
    plant_scent = diffuse_once(plant_scent, 0.85)
    herbivore_scent = diffuse_once(herbivore_scent, 0.85)

Properties

  • 📉 Decay Rate: 0.85. Each cell keeps 85% of its scent intensity per diffusion step; the other 15% is shed to its neighbors.
  • 🌊 Diffusion Steps: 3. Scent spreads up to 3 cells outward from its source.
  • 🔄 Share Factor: 0.25. The shed intensity is split equally among the 4 neighbors.
  • 📊 Normalization: final values are scaled to 0-1 for consistent perception.

Environmental Systems

☀️ Spatial Light Map

Creates organic light patches resembling forest canopy or underwater zones

Generation: Base vertical gradient + sinusoidal patterns
Bright Zones (light > 0.7): Plants thrive
Dark Zones (light < 0.18): Energy penalties and increased death risk
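
One way such a field could be generated, as a sketch only: the vertical gradient and sinusoidal mix follow the description above, but the constants and exact formula are illustrative assumptions, not the project's code:

import numpy as np

def make_light_map(size=50):
    """Illustrative light field: vertical gradient blended with sinusoidal patches."""
    y, x = np.mgrid[0:size, 0:size] / (size - 1)
    gradient = 1.0 - 0.5 * y                                 # brighter toward the top
    patches = 0.25 * np.sin(6 * np.pi * x) * np.sin(4 * np.pi * y)
    return np.clip(gradient + patches, 0.0, 1.0)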

🌡️ Temperature

Range: 0.0 (cold) to 1.0 (hot)
Update: Random walk ± 0.02 per cycle
Effect: Multiplies metabolism rates

💡 Global Light

Range: 0.3 (dim) to 1.0 (bright)
Update: Random walk ± 0.03 per cycle
Effect: Multiplies photosynthesis
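
A minimal sketch of these per-cycle drifts (the clamping bounds come from the ranges above; the function itself is illustrative):

import random

def update_environment(temperature, global_light):
    """Random-walk both scalars once per cycle, clamped to their stated ranges."""
    temperature = min(1.0, max(0.0, temperature + random.uniform(-0.02, 0.02)))
    global_light = min(1.0, max(0.3, global_light + random.uniform(-0.03, 0.03)))
    return temperature, global_light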

Theoretical Foundations

📐 Lotka-Volterra Dynamics

The predator-prey oscillations observed follow modified Lotka-Volterra equations:

dP/dt = αP - βPH        (plants grow, eaten by herbivores)
dH/dt = δβPH - γH - ζHR (herbivores eat plants, die naturally, eaten by predators)
dR/dt = εζHR - λR       (predators eat herbivores, die naturally)

Where:
  P = plant population
  H = herbivore population
  R = predator population
  α = plant growth rate (photosynthesis)
  β = herbivore predation rate on plants
  δ = conversion efficiency of consumed plants into new herbivores
  γ = herbivore natural death rate
  ζ = predator predation rate on herbivores
  ε = conversion efficiency of consumed herbivores into new predators
  λ = predator natural death rate

Unlike classical Lotka-Volterra, learning introduces non-constant coefficients—oscillations gradually dampen as the system learns stable equilibrium.
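
For reference, a minimal Euler-integration sketch of the classical constant-coefficient system above (the coefficient values are arbitrary placeholders, chosen only to illustrate the three-level oscillation):

def lotka_volterra_step(P, H, R, dt=0.01,
                        alpha=1.0, beta=0.02, delta=0.5,
                        gamma=0.4, zeta=0.02, epsilon=0.5, lam=0.3):
    """One Euler step of the plant/herbivore/predator ODEs."""
    dP = alpha * P - beta * P * H
    dH = delta * beta * P * H - gamma * H - zeta * H * R
    dR = epsilon * zeta * H * R - lam * R
    return P + dt * dP, H + dt * dH, R + dt * dR

P, H, R = 40.0, 9.0, 3.0
for _ in range(10_000):
    P, H, R = lotka_volterra_step(P, H, R)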

🎓 Reinforcement Learning Framework

The predator hive implements a REINFORCE policy gradient algorithm:

π(a|s; θ): Policy network (parameterized by θ)
J(θ) = E[Σ γᵗ r_t]: Expected cumulative reward
∇J(θ) = E[∇ log π(a|s; θ) * R]: Policy gradient

Update rule:
θ ← θ + α * ∇J(θ)

Advantages of REINFORCE:

  • Simple implementation (single network, no value function)
  • Works with continuous high-dimensional state spaces
  • Naturally handles stochastic policies
  • Suitable for episodic tasks (hunt = episode)

🧬 Evolution vs. Learning

Phylogenetic (Evolutionary)

  • Herbivore neural weights mutate across generations
  • Selection pressure: organisms with better weights survive longer
  • Timescale: 50-200 generations
  • Mechanism: Variation (mutation) + Selection (survival)

Ontogenetic (Learning)

  • Predator hive updates weights through gradient descent
  • Improvement within single generation's lifetime
  • Timescale: 20 generations (one training cycle)
  • Mechanism: Error backpropagation + experience replay

Baldwin Effect: Learning can guide evolution—predators that learn to hunt successfully reproduce more, increasing the "learnability" of the hive over time.

Key Takeaways

  • Energy is the fundamental currency driving all behavior
  • ⏱️ Learning and evolution operate on different timescales but interact
  • 💥 Extinction cascades can wipe out entire ecosystems rapidly
  • 🗺️ Spatial structure creates ecological niches
  • 🧠 Shared intelligence enables rapid species-level adaptation
  • 🌀 Emergent complexity arises from simple local interactions

AI Biosphere represents a unique intersection of artificial life, reinforcement learning, and ecological simulation.

By combining neural network agents with emergent ecosystem dynamics, it creates a living laboratory for studying adaptation, cooperation, and survival strategies.

Experiment, observe, and discover the unexpected patterns that emerge when artificial organisms learn to survive in a dynamic world.

Created by Vivek Dagar