Building the Solar System Database
Data Pipeline Architecture
The pipeline follows a clear ETL (Extract, Transform, Load) pattern:
Raw Data (MPC, JPL)
↓
Python Scripts (Extract & Transform)
↓
Generated JSON
↓
MonoGame Assets (Sprites, Textures)
↓
Game Runtime (Load & Render)
Data Sources
Minor Planet Center (MPC)
The Minor Planet Center maintains the authoritative database of small bodies:
- ELEMENTS.NUMBR — ~875,000 numbered asteroids (fixed-width text format)
- ELEMENTS.UNNUM — Unnumbered asteroid discoveries
- ELEMENTS.COMET — ~4,000 known comets
The files use a legacy fixed-width layout, a convention that dates back to punch-card data formats, so fields must be read by column position rather than split on a delimiter.
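A minimal sketch of column-based parsing with explicit error handling is shown below. The `FIELDS` table, its offsets, and the sample record are hypothetical and chosen only for illustration; they are not the real MPC column positions, which should be taken from the format description published with the files.

```python
# Hypothetical layout: field name -> (start, end, converter).
# Slices are 0-based and end-exclusive. These offsets are illustrative
# only and do NOT describe the real MPC file format.
FIELDS = {
    "number": (0, 7, int),
    "a": (8, 18, float),
    "e": (19, 29, float),
}


def parse_fixed_width(line, fields=FIELDS):
    """Slice one fixed-width record into a dict.

    Returns None when the line is too short or a field fails to convert,
    so the caller can count and skip bad records instead of crashing.
    """
    width = max(end for _, end, _ in fields.values())
    if len(line.rstrip("\n")) < width:
        return None
    record = {}
    for name, (start, end, convert) in fields.items():
        try:
            record[name] = convert(line[start:end])
        except ValueError:
            return None
    return record


# Build a sample record that matches the hypothetical layout above.
sample = f"{1:>7} {2.7691652:>10.7f} {0.0760091:>10.7f}"
print(parse_fixed_width(sample))
```

Returning `None` for a bad record, rather than raising, lets a bulk import log how many lines were skipped without aborting on the first malformed entry.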
JPL Horizons & NASA Ephemeris
- MOONS.txt — Planetary satellite mean orbital parameters (~425 moons)
- DE440 Ephemeris — High-precision planetary positions (for validation)
Python Parsers
Scripts in data/scripts/ extract and enrich raw data:
build_numbered_elements.py
Parses MPC ELEMENTS.NUMBR fixed-width format:
#!/usr/bin/env python3
import json


def parse_mpc_elements(filepath):
    """
    Parse MPC fixed-width asteroid orbital elements.

    Slices are 0-based and end-exclusive: line[0:7] is columns 1-7
    (number), line[20:25] is columns 21-25 (epoch), and so on.
    Verify the offsets against the format description for the file.
    """
    asteroids = []
    with open(filepath, 'r') as f:
        for line in f:
            if len(line) < 200:
                continue  # Skip malformed lines
            number = int(line[0:7].strip())
            epoch = float(line[20:25].strip())
            mean_anomaly = float(line[26:35].strip())
            arg_perihelion = float(line[37:46].strip())
            long_asc_node = float(line[48:57].strip())
            inclination = float(line[59:68].strip())
            eccentricity = float(line[70:79].strip())
            semi_major_axis = float(line[80:91].strip())
            # Parsed but currently not written to the JSON output
            mean_motion = float(line[92:103].strip())
            asteroids.append({
                'number': number,
                'epoch': epoch,
                'elements': {
                    'M': mean_anomaly,
                    'w': arg_perihelion,
                    'N': long_asc_node,
                    'i': inclination,
                    'e': eccentricity,
                    'a': semi_major_axis
                }
            })
    return asteroids


if __name__ == '__main__':
    data = parse_mpc_elements('data/DAT/ELEMENTS.NUMBR')
    with open('data/ASTEROIDS.json', 'w') as f:
        json.dump(data, f, indent=2)
build_satellites.py
Groups planetary moons by parent and enriches with physical properties:
import csv
import json


def build_satellite_database():
    moons = []
    with open('data/DAT/MOONS.txt', 'r') as f:
        # MOONS.txt is read as tab-separated with a header row
        reader = csv.DictReader(f, delimiter='\t')
        for row in reader:
            moon = {
                'name': row['Name'],
                'parent': row['ParentBody'],
                'semi_major_axis': float(row['a']),
                'eccentricity': float(row['e']),
                'inclination': float(row['i']),
                'period_days': float(row['Period']),
            }
            moons.append(moon)
    # Group by parent for efficient lookup
    by_parent = {}
    for moon in moons:
        by_parent.setdefault(moon['parent'], []).append(moon)
    with open('data/SATELITES.json', 'w') as f:
        json.dump(by_parent, f, indent=2)
build_comets.py
Classifies comets by type (periodic, long-period, etc.):
import json

# parse_comet_line() and calculate_orbital_period() are helper functions
# that are not shown in this excerpt.


def classify_comet(period_years):
    """Bucket a comet by orbital period in years."""
    if period_years < 20:
        return 'periodic'
    elif period_years < 200:
        return 'intermediate'
    else:
        return 'long_period'


def build_comet_database():
    comets = []
    with open('data/DAT/ELEMENTS.COMET', 'r') as f:
        for line in f:
            comet = parse_comet_line(line)
            period = calculate_orbital_period(comet['elements'])
            comet['type'] = classify_comet(period)
            comets.append(comet)
    with open('data/COMETS.json', 'w') as f:
        json.dump(comets, f, indent=2)
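The body of `calculate_orbital_period` is not shown. One plausible sketch, assuming heliocentric orbits and an `elements` dict that carries the semi-major axis `a` in AU and the eccentricity `e` (key names borrowed from ASTEROIDS.json, which is an assumption for comets), applies Kepler's third law, $P^2 = a^3$ with $P$ in years:

```python
import math


def calculate_orbital_period(elements):
    """Orbital period in years from Kepler's third law (P^2 = a^3).

    Assumes a heliocentric orbit, 'a' in AU, and an optional 'e' key.
    These key names are an assumption, not confirmed by the pipeline.
    """
    a = elements["a"]
    e = elements.get("e", 0.0)
    if e >= 1.0 or a <= 0.0:
        # Parabolic or hyperbolic orbit: the comet never returns,
        # so report an infinite period.
        return math.inf
    return a ** 1.5


# A comet with a = 17.8 AU has a period of about 75 years.
print(calculate_orbital_period({"a": 17.8, "e": 0.967}))
```

Returning `math.inf` for unbound orbits means such comets fall through to the final branch of `classify_comet` and are labeled `'long_period'` without a special case.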
Generated JSON Structure
ASTEROIDS.json
[
  {
    "number": 1,
    "name": "Ceres",
    "class": "G",
    "diameter_km": 946.0,
    "epoch": 58800.0,
    "elements": {
      "a": 2.769,
      "e": 0.0755,
      "i": 10.593,
      "N": 80.329,
      "w": 73.115,
      "M": 355.887
    }
  },
  ...
]
SATELITES.json
{
  "Earth": [
    {
      "name": "Moon",
      "semi_major_axis": 384400,
      "period_days": 27.32,
      "eccentricity": 0.0549,
      "inclination": 5.145
    }
  ],
  "Jupiter": [
    {
      "name": "Io",
      "semi_major_axis": 421700,
      "period_days": 1.769,
      "eccentricity": 0.004,
      "inclination": 0.05
    },
    ...
  ]
}
Jupyter Notebooks for Asset Generation
Visual assets (sprites, textures) are generated programmatically in Jupyter notebooks, stored in data/Notebooks/.
Why Notebooks?
- Visual Iteration: See rendered output inline as you adjust parameters
- Documentation: Mix code and explanatory markdown
- Reproducibility: Committed to git; anyone can regenerate assets
- No Build Step: MonoGame's Content Pipeline is optional; notebooks output directly to images/
Example: Planet Sprite Generation
A notebook generates textured planet sprites:
import numpy as np
from PIL import Image

# Load equirectangular texture (converted so pixels match the RGBA canvas)
earth_texture = Image.open('textures/earth_flat.jpg').convert('RGBA')


# Generate sphere views at different angles
def render_sphere(texture, angle_deg):
    """Map a texture onto a 2D disc to approximate a view of a sphere."""
    size = 256
    circle = Image.new('RGBA', (size, size), (0, 0, 0, 0))
    # Sample texture in a circular pattern
    for x in range(size):
        for y in range(size):
            # Map pixel to sphere
            nx = (x - size/2) / (size/2)
            ny = (y - size/2) / (size/2)
            if nx*nx + ny*ny > 1:
                continue  # Outside circle
            # Sample texture with rotation
            texture_x = (angle_deg + np.arctan2(ny, nx) * 180/np.pi) % 360
            texture_y = np.arccos(np.sqrt(nx*nx + ny*ny)) * 180/np.pi
            # Clamp indices: rounding can push texture_x to exactly 360
            tex_pixel = texture.getpixel((
                min(int(texture_x * texture.width / 360), texture.width - 1),
                min(int(texture_y * texture.height / 180), texture.height - 1)
            ))
            circle.putpixel((x, y), tex_pixel)
    return circle


# Generate sprite sheet: one frame every 15 degrees (24 frames)
sprites = []
for angle in range(0, 360, 15):
    sprites.append(render_sphere(earth_texture, angle))

# Composite into sprite sheet
sheet = Image.new('RGBA', (256 * len(sprites), 256))
for i, sprite in enumerate(sprites):
    sheet.paste(sprite, (256*i, 0))
sheet.save('images/sprites/earth_strip.png')
Example: Asteroid Belt Texture
Another notebook generates procedural textures for asteroids:
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter


# Generate a noise-like pattern for rocky texture
def perlin_noise_2d(width, height, scale=50):
    """Cheap stand-in for Perlin noise: octaves of sinusoids summed fBm-style.

    This is not true gradient noise; the result is smooth and periodic.
    `scale` sets the approximate feature size, in pixels, of the first octave.
    """
    result = np.zeros((height, width))
    for octave in range(4):
        freq = 2 ** octave
        amp = 0.5 ** octave
        # Sample coordinates for this octave; higher octaves add finer detail
        x = np.linspace(0, freq * width / scale, width)
        y = np.linspace(0, freq * height / scale, height)
        xx, yy = np.meshgrid(x, y)
        result += np.sin(xx) * np.cos(yy) * amp
    # Normalize to [0, 1] from the actual range (octave amplitudes sum to 1.875)
    return (result - result.min()) / (result.max() - result.min())


# Generate asteroid texture
texture = perlin_noise_2d(512, 512, scale=100)
texture = gaussian_filter(texture, sigma=2)  # Smooth

# Convert to grayscale image (a 2D uint8 array is read as mode 'L')
gray_img = Image.fromarray((texture * 255).astype(np.uint8))
gray_img.save('images/textures/asteroid_noise.png')
Asset Pipeline
Generated assets are organized by type:
| Source | Output | Used By |
|---|---|---|
| Notebooks (planet sprites) | images/sprites/ | SpriteCache, AnimationStripCache |
| Notebooks (UI elements) | images/ui_textures/ | UITextureCache |
| Notebooks (equirectangular maps) | images/textures/ | PlanetTextureLibrary |
Assets are copied into the executable's bin/ folder via <Content Include> MSBuild items in the csproj.
Runtime Data Loading
The game loads JSON files at startup:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

public class AsteroidRepository : IAsteroidRepository
{
    private List<Asteroid> _asteroids = new List<Asteroid>();

    public void Load(string jsonPath)
    {
        var json = File.ReadAllText(jsonPath);
        // DTO member names are lowercase to match the JSON keys;
        // System.Text.Json matching is case-sensitive by default.
        var data = JsonSerializer.Deserialize<List<AsteroidDto>>(json);
        _asteroids = data.Select(dto => new Asteroid
        {
            Number = dto.number,
            Elements = new OrbitalElements
            {
                SemiMajorAxis = dto.elements.a,
                Eccentricity = dto.elements.e,
                Inclination = dto.elements.i,
                // ...
            }
        }).ToList();
    }

    public IEnumerable<Asteroid> GetAll() => _asteroids;
}
Scaling to 10,000+ Bodies
Performance Considerations
- JSON Streaming: Parse incrementally rather than loading all at once
- Spatial Indexing: Use quad-trees or grids to quickly find nearby bodies
- LOD (Level of Detail): Render distant asteroids as small dots, nearby ones with texture
- Culling: Skip rendering bodies outside the camera viewport
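The spatial indexing and culling ideas above can be sketched with a uniform grid. This is an illustrative Python sketch rather than the game's actual implementation; the `GridIndex` class, its method names, and the 2D coordinates are all hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid spatial index over 2D points (hypothetical helper)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Floor division keeps negative coordinates in the correct cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, body_id, x, y):
        self.cells[self._cell(x, y)].append((body_id, x, y))

    def query_rect(self, min_x, min_y, max_x, max_y):
        """Return ids of bodies inside the axis-aligned rectangle."""
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        found = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get() avoids creating empty cells during queries.
                for body_id, x, y in self.cells.get((cx, cy), ()):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        found.append(body_id)
        return found


index = GridIndex(cell_size=0.5)
index.insert(1, 2.7, 0.3)
index.insert(2, -1.2, 3.9)
index.insert(3, 2.9, 0.1)

# Viewport culling: only bodies inside the camera rectangle are drawn.
visible = index.query_rect(2.0, 0.0, 3.0, 1.0)
print(sorted(visible))
```

A grid is simpler than a quad-tree and works well when bodies are spread fairly evenly. A quad-tree may be the better choice when density varies a lot, as it does between the main belt and the outer solar system.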
Memory Efficiency
- Store only the necessary orbital elements (six 32-bit floats = 24 bytes per body)
- Share textures across similar bodies (all C-type asteroids use same noise texture)
- Consider float16 for positions of distant bodies, where the reduced precision (roughly three significant decimal digits) is unlikely to be visible on screen
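To make the byte counts concrete, the sketch below packs the six elements with Python's `struct` module and round-trips a value through half precision. The helper names and the field order are assumptions for illustration; the game's runtime representation is not shown in this document.

```python
import struct

# Six little-endian 32-bit floats: a, e, i, N, w, M (order is an assumption).
ELEMENT_FORMAT = "<6f"
ELEMENT_KEYS = ("a", "e", "i", "N", "w", "M")


def pack_body(elements):
    """Pack one body's orbital elements into 24 bytes."""
    return struct.pack(ELEMENT_FORMAT, *(elements[k] for k in ELEMENT_KEYS))


def unpack_body(blob):
    """Inverse of pack_body; values come back at float32 precision."""
    return dict(zip(ELEMENT_KEYS, struct.unpack(ELEMENT_FORMAT, blob)))


def to_float16(value):
    """Round-trip a value through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


ceres = {"a": 2.769, "e": 0.0755, "i": 10.593,
         "N": 80.329, "w": 73.115, "M": 355.887}
blob = pack_body(ceres)
print(len(blob))                    # bytes per body
print(unpack_body(blob)["a"])       # float32 round trip
print(to_float16(ceres["a"]))       # float16 round trip
```

The largest finite float16 value is 65504, so a distance such as 384400 km cannot be stored in half precision without rescaling. Positions would need to be expressed in a unit like AU, or relative to a local origin, before narrowing them.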
Workflow: Adding New Data
To update the database with newer MPC data:
- Download fresh ELEMENTS.* files from the MPC website into data/DAT/
- Run the corresponding Python script: python data/scripts/build_numbered_elements.py
- Generated JSON appears in data/
- Rebuild the game; new data is loaded at startup
No MonoGame Content Pipeline build step needed — JSON is parsed directly at runtime.
Version Control & Reproducibility
Key design choices for sustainability:
- Commit Notebooks: Jupyter notebooks are committed to git (stored as JSON; diffs can be noisy but still useful)
- Gitignore Raw Data: Large raw .txt files are gitignored; scripts document where to download them
- Pin Dependencies: Python environment is captured in requirements.txt
- Document Regeneration: Each script has a docstring explaining how to run it
Future Enhancements
- Automated Updates: Scheduled job to fetch latest MPC data and regenerate JSON
- Validation: Compare computed positions against JPL Horizons for accuracy
- Compression: Use MessagePack or Protocol Buffers instead of JSON for smaller file size
- Differential Updates: Only include new/changed bodies in updates
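For the validation idea above, the first step is turning stored elements into a position. The sketch below solves Kepler's equation with Newton's method and returns coordinates in the orbital plane. It assumes elliptical orbits and the `a`, `e`, `M` conventions from ASTEROIDS.json; the function names are hypothetical. Comparing against JPL Horizons would also require rotating by `w`, `i`, and `N` into the ecliptic frame and propagating `M` from the epoch, which is not shown here.

```python
import math


def solve_kepler(mean_anomaly_deg, eccentricity, tolerance=1e-12, max_iter=50):
    """Solve M = E - e*sin(E) for the eccentric anomaly E (radians).

    Newton's method; valid for elliptical orbits (0 <= e < 1).
    """
    m = math.radians(mean_anomaly_deg) % (2.0 * math.pi)
    # Starting at pi is a common safe guess for highly eccentric orbits.
    e_anom = m if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        residual = e_anom - eccentricity * math.sin(e_anom) - m
        delta = residual / (1.0 - eccentricity * math.cos(e_anom))
        e_anom -= delta
        if abs(delta) < tolerance:
            break
    return e_anom


def position_in_orbital_plane(a, e, mean_anomaly_deg):
    """Return (x, y) in the units of a, with +x pointing at perihelion."""
    e_anom = solve_kepler(mean_anomaly_deg, e)
    x = a * (math.cos(e_anom) - e)
    y = a * math.sqrt(1.0 - e * e) * math.sin(e_anom)
    return x, y


# Ceres, using the example values from ASTEROIDS.json
x, y = position_in_orbital_plane(a=2.769, e=0.0755, mean_anomaly_deg=355.887)
print(x, y, math.hypot(x, y))
```

The heliocentric distance `math.hypot(x, y)` should equal `a * (1 - e * cos(E))`, which gives a quick self-check before comparing against an external ephemeris.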