Data Creation Pipeline

From raw astronomical data to game-ready JSON and sprite assets using Python and Jupyter notebooks

Building the Solar System Database

Challenge: SolarApp renders 10,000+ celestial bodies. Getting accurate orbital data and generating visual assets at scale requires automation.

Data Pipeline Architecture

The pipeline follows a clear ETL (Extract, Transform, Load) pattern:

Raw Data (MPC, JPL)
    ↓
Python Scripts (Extract & Transform)
    ↓
Generated JSON
    ↓
MonoGame Assets (Sprites, Textures)
    ↓
Game Runtime (Load & Render)

Data Sources

Minor Planet Center (MPC)

The Minor Planet Center maintains the authoritative database of small bodies.

The format is legacy fixed-width text, a layout convention inherited from the punch-card era, so it requires careful parsing.
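To illustrate what fixed-width parsing involves, here is a minimal sketch that slices a line using a table of column ranges. The field names and column positions in FIELDS, and the sample line, are invented for illustration; they do not reproduce the real MPC layout.

```python
# Hypothetical layout: (field name, start, end, converter).
# Offsets are 0-based and end-exclusive, like Python slices.
FIELDS = [
    ("number", 0, 7, int),
    ("name", 8, 24, str),
    ("eccentricity", 25, 33, float),
]

def parse_fixed_width(line, fields=FIELDS):
    """Slice one fixed-width line into a dict; return None if a field is unparseable."""
    record = {}
    for name, start, end, convert in fields:
        raw = line[start:end].strip()
        try:
            record[name] = convert(raw)
        except ValueError:
            return None  # header, separator, or damaged line
    return record

# Build a sample line that matches the hypothetical layout above.
sample = f"{1:>7} {'Ceres':<16} {0.0755:>8.4f}"
print(parse_fixed_width(sample))
```

Keeping the column table separate from the slicing loop means a layout change only touches FIELDS, and unparseable lines are skipped rather than crashing the whole run.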

JPL Horizons & NASA Ephemeris

Python Parsers

Scripts in data/scripts/ extract and enrich raw data:

build_numbered_elements.py

Parses MPC ELEMENTS.NUMBR fixed-width format:

#!/usr/bin/env python3
import json

def parse_mpc_elements(filepath):
    """
    Parse MPC fixed-width asteroid orbital elements.
    Field positions are the 0-based, end-exclusive slices used below;
    check them against the format description for your copy of the file.
    """
    asteroids = []
    with open(filepath, 'r') as f:
        for line in f:
            if len(line) < 200:
                continue  # Skip malformed lines

            try:
                number = int(line[0:7].strip())
            except ValueError:
                continue  # Skip header or separator lines

            epoch = float(line[20:25].strip())
            mean_anomaly = float(line[26:35].strip())
            arg_perihelion = float(line[37:46].strip())
            long_asc_node = float(line[48:57].strip())
            inclination = float(line[59:68].strip())
            eccentricity = float(line[70:79].strip())
            semi_major_axis = float(line[80:91].strip())
            # Parsed but not currently written to the JSON output
            mean_motion = float(line[92:103].strip())

            asteroids.append({
                'number': number,
                'epoch': epoch,
                'elements': {
                    'M': mean_anomaly,
                    'w': arg_perihelion,
                    'N': long_asc_node,
                    'i': inclination,
                    'e': eccentricity,
                    'a': semi_major_axis
                }
            })

    return asteroids

if __name__ == '__main__':
    data = parse_mpc_elements('data/DAT/ELEMENTS.NUMBR')
    with open('data/ASTEROIDS.json', 'w') as f:
        json.dump(data, f, indent=2)

build_satellites.py

Groups planetary moons by parent and enriches with physical properties:

import csv
import json

def build_satellite_database():
    moons = []
    with open('data/DAT/MOONS.txt', 'r') as f:
        reader = csv.DictReader(f, delimiter='\t')
        for row in reader:
            moon = {
                'name': row['Name'],
                'parent': row['ParentBody'],
                'semi_major_axis': float(row['a']),
                'eccentricity': float(row['e']),
                'inclination': float(row['i']),
                'period_days': float(row['Period']),
            }
            moons.append(moon)

    # Group by parent for efficient lookup
    by_parent = {}
    for moon in moons:
        by_parent.setdefault(moon['parent'], []).append(moon)

    with open('data/SATELITES.json', 'w') as f:
        json.dump(by_parent, f, indent=2)

build_comets.py

Classifies comets by type (periodic, long-period, etc.):

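import json

# parse_comet_line() and calculate_orbital_period() are assumed to be
# defined elsewhere in the script (not shown here).

def classify_comet(period_years):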
    if period_years < 20:
        return 'periodic'
    elif period_years < 200:
        return 'intermediate'
    else:
        return 'long_period'

def build_comet_database():
    comets = []
    with open('data/DAT/ELEMENTS.COMET', 'r') as f:
        for line in f:
            comet = parse_comet_line(line)
            period = calculate_orbital_period(comet['elements'])
            comet['type'] = classify_comet(period)
            comets.append(comet)

    with open('data/COMETS.json', 'w') as f:
        json.dump(comets, f, indent=2)
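calculate_orbital_period is not shown above. One way it could work is Kepler's third law for heliocentric orbits: P = a^1.5 years, with a = q / (1 - e) in AU. This sketch assumes the parsed elements provide the perihelion distance and eccentricity under the keys 'q' and 'e'; those key names are an assumption, since the real ones are not given here. Parabolic and hyperbolic orbits (e >= 1) have no period, so the sketch returns infinity, which classify_comet then labels as long_period.

```python
import math

def calculate_orbital_period(elements):
    """Return the orbital period in years from perihelion distance q (AU) and eccentricity e.

    Assumes the keys 'q' and 'e' (hypothetical names). Uses Kepler's third law
    for a body orbiting the Sun: P[years] = a[AU] ** 1.5, with a = q / (1 - e).
    """
    q = elements["q"]
    e = elements["e"]
    if e >= 1.0:
        return math.inf  # parabolic or hyperbolic: unbound, so no period
    a = q / (1.0 - e)  # semi-major axis in AU
    return a ** 1.5

# Approximate, illustrative values for a Halley-like orbit.
halley_like = {"q": 0.586, "e": 0.967}
print(round(calculate_orbital_period(halley_like), 1))
```

With these illustrative inputs the period comes out at roughly 75 years, which classify_comet would place in the 'intermediate' bucket.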

Generated JSON Structure

ASTEROIDS.json

[
  {
    "number": 1,
    "name": "Ceres",
    "class": "G",
    "diameter_km": 946.0,
    "epoch": 58800.0,
    "elements": {
      "a": 2.769,
      "e": 0.0755,
      "i": 10.593,
      "N": 80.329,
      "w": 73.115,
      "M": 355.887
    }
  },
  ...
]
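A quick sanity check on generated records can catch parsing slips before they reach the game. The sketch below assumes that the epoch field is a Modified Julian Date (the five-digit value 58800.0 suggests this, but it is not stated here) and that angles are in degrees. validate_asteroid and mjd_to_date are hypothetical helper names.

```python
from datetime import date, timedelta

REQUIRED_ELEMENTS = ("a", "e", "i", "N", "w", "M")
MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0

def mjd_to_date(mjd):
    """Convert a Modified Julian Date to a calendar date (fraction of day dropped)."""
    return MJD_EPOCH + timedelta(days=int(mjd))

def validate_asteroid(record):
    """Return a list of problems found in one ASTEROIDS.json record (empty if none)."""
    problems = []
    elements = record.get("elements", {})
    for key in REQUIRED_ELEMENTS:
        if key not in elements:
            problems.append(f"missing element {key!r}")
    if elements.get("a", 1.0) <= 0:
        problems.append("semi-major axis must be positive")
    if not 0 <= elements.get("e", 0.0) < 1:
        problems.append("eccentricity outside [0, 1)")
    if not 0 <= elements.get("i", 0.0) <= 180:
        problems.append("inclination outside [0, 180] degrees")
    return problems

ceres = {"number": 1, "epoch": 58800.0,
         "elements": {"a": 2.769, "e": 0.0755, "i": 10.593,
                      "N": 80.329, "w": 73.115, "M": 355.887}}
print(validate_asteroid(ceres), mjd_to_date(ceres["epoch"]))
```

Returning a list of problems rather than raising on the first one makes it easy to log every bad record in a single pass over the file.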

SATELITES.json

{
  "Earth": [
    {
      "name": "Moon",
      "semi_major_axis": 384400,
      "period_days": 27.32,
      "eccentricity": 0.0549,
      "inclination": 5.145
    }
  ],
  "Jupiter": [
    {
      "name": "Io",
      "semi_major_axis": 421700,
      "period_days": 1.769,
      "eccentricity": 0.004,
      "inclination": 0.05
    },
    ...
  ]
}

Jupyter Notebooks for Asset Generation

Visual assets (sprites, textures) are generated programmatically in Jupyter notebooks, stored in data/Notebooks/.

Why Notebooks?

Example: Planet Sprite Generation

A notebook generates textured planet sprites:

import math
from PIL import Image

# Load equirectangular texture (RGBA so pixels match the sprite mode)
earth_texture = Image.open('textures/earth_flat.jpg').convert('RGBA')

# Generate sphere views at different rotation angles
def render_sphere(texture, angle_deg, size=256):
    """Render an orthographic view of a textured sphere (no lighting applied)."""
    sprite = Image.new('RGBA', (size, size), (0, 0, 0, 0))
    half = size / 2

    for x in range(size):
        for y in range(size):
            # Map pixel centre to the unit disc
            nx = (x + 0.5 - half) / half
            ny = (y + 0.5 - half) / half

            r2 = nx*nx + ny*ny
            if r2 > 1:
                continue  # Outside circle

            # Recover the point on the sphere's visible hemisphere
            nz = math.sqrt(1 - r2)
            lon = (angle_deg + math.degrees(math.atan2(nx, nz))) % 360
            lat = math.degrees(math.asin(-ny))  # image y grows downward

            # Look up the equirectangular texel, clamped to the image bounds
            tx = min(int(lon * texture.width / 360), texture.width - 1)
            ty = min(int((90 - lat) * texture.height / 180), texture.height - 1)
            sprite.putpixel((x, y), texture.getpixel((tx, ty)))

    return sprite

# Generate sprite sheet frames (24 frames, 15 degrees apart)
sprites = [render_sphere(earth_texture, angle) for angle in range(0, 360, 15)]

# Composite into a horizontal sprite sheet
sheet = Image.new('RGBA', (256 * len(sprites), 256))
for i, sprite in enumerate(sprites):
    sheet.paste(sprite, (256 * i, 0))

sheet.save('images/sprites/earth_strip.png')

Example: Asteroid Belt Texture

Another notebook generates procedural textures for asteroids:

import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

# Generate fractal noise for a rocky texture
def perlin_noise_2d(width, height, scale=50, octaves=4, seed=0):
    """Perlin-style noise approximation (not true gradient noise).

    Sums octaves of Gaussian-smoothed white noise (fractional Brownian
    motion); scale is the feature size of the coarsest octave in pixels.
    """
    rng = np.random.default_rng(seed)
    result = np.zeros((height, width))
    for octave in range(octaves):
        freq = 2 ** octave
        amp = 0.5 ** octave

        # Smooth random values; higher octaves keep finer detail
        layer = gaussian_filter(rng.standard_normal((height, width)), sigma=scale / freq)
        result += amp * layer / layer.std()

    # Normalize to [0, 1]
    result -= result.min()
    return result / result.max()

# Generate asteroid texture
texture = perlin_noise_2d(512, 512, scale=100)
texture = gaussian_filter(texture, sigma=2)  # Smooth

# Convert to an 8-bit grayscale image (mode 'L' is inferred from uint8)
gray_img = Image.fromarray((texture * 255).astype(np.uint8))
gray_img.save('images/textures/asteroid_noise.png')

Asset Pipeline

Generated assets are organized by type:

Source                             Output                 Used By
Notebooks (planet sprites)         images/sprites/        SpriteCache, AnimationStripCache
Notebooks (UI elements)            images/ui_textures/    UITextureCache
Notebooks (equirectangular maps)   images/textures/       PlanetTextureLibrary

Assets are copied into the executable's bin/ folder via <Content Include> MSBuild items in the csproj.

Runtime Data Loading

The game loads JSON files at startup:

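using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

public class AsteroidRepository : IAsteroidRepository
{
    // Start empty so GetAll() is safe to call before Load()
    private List<Asteroid> _asteroids = new List<Asteroid>();

    public void Load(string jsonPath)
    {
        var json = File.ReadAllText(jsonPath);
        // Deserialize returns null for a JSON "null" document; fall back to an empty list
        var data = JsonSerializer.Deserialize<List<AsteroidDto>>(json) ?? new List<AsteroidDto>();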

        _asteroids = data.Select(dto => new Asteroid
        {
            Number = dto.number,
            Elements = new OrbitalElements
            {
                SemiMajorAxis = dto.elements.a,
                Eccentricity = dto.elements.e,
                Inclination = dto.elements.i,
                // ...
            }
        }).ToList();
    }

    public IEnumerable<Asteroid> GetAll() => _asteroids;
}

Scaling to 10,000+ Bodies

Performance Considerations

Memory Efficiency

Workflow: Adding New Data

To update the database with newer MPC data:

  1. Download fresh ELEMENTS.* files from the MPC website into data/DAT/
  2. Run the corresponding Python script: python data/scripts/build_numbered_elements.py
  3. Generated JSON appears in data/
  4. Rebuild the game; new data is loaded at startup

No MonoGame Content Pipeline build step is needed: the JSON is parsed directly at runtime.
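Steps 2 and 3 of the workflow lend themselves to a small driver script. The sketch below is a hypothetical rebuild_all.py, not part of the project as described. It assumes each build script lives in data/scripts/ under the names used on this page and can be run from the repository root with no arguments.

```python
import subprocess
import sys
from pathlib import Path

# Script names as they appear on this page; the run order is an assumption.
SCRIPTS = [
    "build_numbered_elements.py",
    "build_satellites.py",
    "build_comets.py",
]

def build_commands(scripts_dir="data/scripts", python=sys.executable):
    """Return one argv list per build script, in the order they should run."""
    return [[python, str(Path(scripts_dir) / name)] for name in SCRIPTS]

def rebuild_all(scripts_dir="data/scripts"):
    """Run every build script, stopping at the first failure."""
    for argv in build_commands(scripts_dir):
        print("running", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit
```

Calling rebuild_all() from the repository root would regenerate all three JSON files in one pass. Keeping build_commands separate makes the command list easy to inspect without running anything.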

Version Control & Reproducibility

Key design choices for sustainability:

Future Enhancements