Where Neural Networks Meet Ecosystem Dynamics
A living laboratory combining artificial life, reinforcement learning, and ecological simulation
The simulation is built on five interconnected subsystems working in harmony:
LSTM policies for intelligent decision-making in organisms
50×50 spatial world representation with 2,500 organism capacity
Resource flow and metabolism governing survival
Scent diffusion for indirect perception beyond adjacency
Hive mind training pipeline with REINFORCE algorithm
Per-organism neural network for individual decision-making
LSTMCell (32 hidden units)
Maintains internal state (h, c) across timesteps
Action probabilities produced by a softmax output layer
During reproduction, offspring neural networks receive small random weight perturbations (mutation)
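The mutation step can be sketched with NumPy. This is a minimal sketch: the perturbation scale `sigma`, the flat weight-dictionary layout, and the input size of 8 are illustrative assumptions, not the simulation's actual parameter layout (an LSTMCell with 32 hidden units stacks its four gates into 128 rows).

```python
import numpy as np

def mutate(parent_params, sigma=0.02, rng=None):
    """Copy the parent's weights and add small Gaussian noise.
    `sigma` is an illustrative mutation scale, not the simulation's value."""
    rng = rng or np.random.default_rng()
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in parent_params.items()}

# Hypothetical parameter shapes for a 32-unit LSTMCell with 8 inputs
parent = {"W_ih": np.zeros((128, 8)), "W_hh": np.zeros((128, 32))}
child = mutate(parent)
```

Because mutations are small and additive, offspring start close to the parent's policy and drift through selection rather than jumping randomly.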
Shared policy network used by all predators collectively
Same observation space as CellLSTM
LSTMCell (64 hidden units)
Fresh state per forward pass during training
For each experience (s, a, r):
1. Normalize rewards: r_norm = (r − mean(R)) / std(R)
2. Compute the log probability: log π(a|s)
3. Compute the loss: L = −mean(log π(a|s) · r_norm)
4. Backpropagate and update the weights
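The four steps above can be sketched in NumPy. To keep the gradient writable by hand, this sketch substitutes a linear-softmax policy for the LSTM; the shapes and learning rate are illustrative assumptions, not the simulation's.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(W, experiences, lr=0.01):
    """One REINFORCE step for a linear-softmax policy pi(a|s) = softmax(W @ s).
    `experiences` is a list of (state, action, reward) tuples."""
    rewards = np.array([r for _, _, r in experiences], dtype=float)
    r_norm = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # step 1
    grad = np.zeros_like(W)
    for (s, a, _), rn in zip(experiences, r_norm):
        probs = softmax(W @ s)             # step 2: pi(.|s)
        dlogits = -probs                   # grad of log pi(a|s) wrt logits
        dlogits[a] += 1.0                  #   is one_hot(a) - probs
        grad += rn * np.outer(dlogits, s)  # steps 3-4: accumulate gradient
    # gradient ascent on expected reward (= descent on the loss L)
    return W + lr * grad / len(experiences)

W = np.zeros((2, 3))                       # 2 actions, 3 state features
s = np.array([1.0, 0.0, 0.0])
experiences = [(s, 0, 1.0), (s, 1, -1.0)]  # action 0 rewarded, action 1 not
W = reinforce_update(W, experiences)
```

After the update, the rewarded action's probability in state `s` rises above its initial 0.5, which is exactly the direction the policy gradient pushes.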
| Species | Max Energy | Metabolism | Reproduction | Max Age |
|---|---|---|---|---|
| 🌱 Plant | 150 | 0.4 | 130 | 350 |
| 🦌 Herbivore | 120 | 0.9 | 90 | 400 |
| 🦁 Predator | 140 | 1.2 | 100 | 600 |
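These parameters bound how long an organism can survive without eating. A back-of-envelope sketch, under the assumption that metabolism is the per-tick energy drain (the table does not state the units):

```python
# Survival horizon from the species table:
# ticks until starvation ~= max_energy / metabolism, assuming metabolism
# is the per-tick energy cost and the organism takes no extra-cost actions.
species = {
    "Plant":     {"max_energy": 150, "metabolism": 0.4},
    "Herbivore": {"max_energy": 120, "metabolism": 0.9},
    "Predator":  {"max_energy": 140, "metabolism": 1.2},
}
horizon = {name: p["max_energy"] / p["metabolism"] for name, p in species.items()}
```

Predators have the shortest horizon (about 117 ticks versus 375 for plants), which is why successful hunting pressure dominates their learned behavior.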
```python
local_light = light_map[x, y] * global_light
temperature_factor = 1.0 - abs(temperature - 0.5)
energy_gain = 3.5 * local_light * temperature_factor
```

Constraints:
- light_map: spatial field (0–1)
- global_light: 0.3–1.0
- temperature: 0–1, optimal at 0.5
When a herbivore eats a plant:

```python
energy_transfer = min(plant.energy * 0.9, 60)
herbivore.energy += energy_transfer
herbivore.energy = min(herbivore.energy, max_energy)
# plant dies (removed from grid)
```
When a predator eats a herbivore:

```python
energy_transfer = min(herbivore.energy * 1.5, 120)
predator.energy += energy_transfer
predator.energy = min(predator.energy, max_energy)
# herbivore dies (removed from grid)
# Record the experience for hive training (action 3 = eat)
hive_experiences.append((observation, 3, energy_transfer))
```
Scent diffusion enables indirect perception: organisms can detect resources and threats beyond directly adjacent cells
```python
import numpy as np

def diffuse_once(grid, decay=0.85):
    # each cell keeps `decay` of its scent and shares the remaining
    # (1 - decay) equally with its 4 neighbors
    new_grid = grid * decay
    share = grid * (1 - decay) * 0.25
    # np.roll wraps at the edges; a clamped border is an alternative choice
    for axis, offset in ((0, 1), (0, -1), (1, 1), (1, -1)):
        new_grid += np.roll(share, offset, axis=axis)
    return new_grid

# Applied iteratively; plant_scent / herbivore_scent are 2-D arrays
# maintained by the world
for _ in range(3):  # scent_diffuse_steps
    plant_scent = diffuse_once(plant_scent, 0.85)
    herbivore_scent = diffuse_once(herbivore_scent, 0.85)
```
Each cell retains 85% of its scent per diffusion step
Scent spreads 3 cells outward from source
Remaining intensity distributed equally to 4 neighbors
Final values scaled to 0-1 for consistent perception
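The behavior described above can be verified with a small self-contained NumPy demo (a sketch that re-implements the diffusion step, assuming wrap-around edges):

```python
import numpy as np

def diffuse_once(grid, decay=0.85):
    # each cell keeps `decay`, shares the rest equally with 4 neighbors
    new_grid = grid * decay
    share = grid * (1 - decay) * 0.25
    for axis, offset in ((0, 1), (0, -1), (1, 1), (1, -1)):
        new_grid += np.roll(share, offset, axis=axis)
    return new_grid

scent = np.zeros((9, 9))
scent[4, 4] = 1.0               # single point source
for _ in range(3):              # scent_diffuse_steps
    scent = diffuse_once(scent)
scent /= scent.max()            # scale to 0-1 for consistent perception
```

After three steps the scent is nonzero up to 3 cells from the source and exactly zero beyond it, matching the "3 cells outward" spread above.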
Creates organic light patches resembling forest canopy or underwater zones
The predator-prey oscillations observed follow modified Lotka-Volterra equations:
```
dP/dt = αP − βPH          (plants grow, eaten by herbivores)
dH/dt = δβPH − γH − ζHR   (herbivores eat plants, die naturally, eaten by predators)
dR/dt = εζHR − λR         (predators eat herbivores, die naturally)
```

Where:
- P = plant population
- H = herbivore population
- R = predator population
- α = plant growth rate (photosynthesis)
- β = herbivore predation rate on plants
- δ = plant-to-herbivore conversion efficiency
- γ = herbivore natural death rate
- ζ = predator predation rate on herbivores
- ε = herbivore-to-predator conversion efficiency
- λ = predator natural death rate
Unlike classical Lotka-Volterra, learning makes the coefficients non-constant: oscillations gradually dampen as the system learns a stable equilibrium.
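A minimal Euler-integration sketch of the three equations, with illustrative coefficient values (the simulation's effective coefficients are learned and time-varying):

```python
# One Euler step of the three-species Lotka-Volterra system.
# All coefficient values here are illustrative assumptions.
def lv_step(P, H, R, dt=0.01, alpha=1.0, beta=0.02, delta=0.5,
            gamma=0.4, zeta=0.02, eps=0.5, lam=0.3):
    dP = alpha * P - beta * P * H            # plants grow, eaten by herbivores
    dH = delta * beta * P * H - gamma * H - zeta * H * R
    dR = eps * zeta * H * R - lam * R        # predators eat herbivores, die
    return P + dt * dP, H + dt * dH, R + dt * dR

P, H, R = 100.0, 20.0, 5.0
P, H, R = lv_step(P, H, R)
```

Iterating this step produces the characteristic phase-lagged oscillations; in the simulation, learning shrinks their amplitude over time.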
The predator hive implements a REINFORCE policy gradient algorithm:
- π(a|s; θ): policy network parameterized by θ
- J(θ) = E[Σₜ γᵗ rₜ]: expected cumulative discounted reward
- ∇J(θ) = E[∇_θ log π(a|s; θ) · R]: policy gradient
- Update rule: θ ← θ + α · ∇J(θ)
Energy is the fundamental currency driving all behavior
Learning and evolution operate on different timescales but interact
Extinction cascades can wipe out entire ecosystems rapidly
Spatial structure creates ecological niches
Shared intelligence enables rapid species-level adaptation
Emergent complexity arises from simple local interactions