Transformer-based Gravitational Wave Forecasting (2024)

Resources: Paper (Link) | GitHub Repo (Link) | Interactive Visualizations (Link)

Implementation: Python (PyTorch); causal transformer architecture; large-scale distributed training on NCSA’s Delta supercomputer (16× NVIDIA A100); evaluated on 14M+ surrogate waveforms and 521 numerical-relativity simulations.

Overview: This project treats higher-order gravitational-wave modeling as a causal forecasting task: from an early inspiral prefix, the model predicts both polarizations and multiple harmonic modes through merger and ringdown. This enables fast, accurate completions of expensive NR simulations, rapid PN/EOB–NR hybridization studies, and consistency checks of late-time segments where simulations are incomplete or unreliable, complementing traditional parameter-to-waveform surrogates.

I led the end-to-end machine learning side of this project, including model design, data and training pipeline implementation on HPC systems, large-scale experimentation, evaluation on surrogate and NR datasets, and model diagnostics (excluding construction of the underlying NRHybSur3dq8 surrogate used for data generation).

Motivation & Scope:

Sequence modeling of gravitational waveforms has previously been used for fast parameter-to-waveform surrogates and quadrupole-only forecasting [17, 19, 20, 22]. Practical applications require fast, physically consistent completions when numerical-relativity (NR) simulations are truncated, sparse, or affected by late-time systematics. In this work, we extend waveform forecasting to a more demanding regime than previous studies: predicting both polarizations (h+, h×) across higher-order harmonic modes using only early inspiral information from quasi-circular, spinning, non-precessing binary black hole mergers. Higher-order modes are particularly important in systems with unequal masses, high spins, or edge-on orientations, but introduce substantial modeling challenges.

Comparison of waveforms in previous work and this study
Figure 1. Comparison of the waveforms considered in previous studies with the richer structure of the higher-order wave modes in this study.

Model and Training: We trained the transformer [31] on over 14 million waveforms generated with the NRHybSur3dq8 surrogate, covering mass ratios up to q = 8, spins in [−0.8, 0.8], and inclination angles in [0, π]. Using 16 NVIDIA A100 GPUs on the Delta supercomputer, training completed in 15 hours. The model processes input over t ∈ [−5000M, −100M) and predicts (h+, h×) over t ∈ [−100M, 130M]. Our approach uses an encoder-decoder architecture with causal attention and conditional processing layers that improve performance on waveforms with near-zero cross polarization (edge-on configurations).

Waveform split into encoder and decoder input
Figure 2. Example waveform over the entire range, showing how it is split between encoder and decoder input.
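The encoder/decoder split described above can be sketched with plain arrays. This is a toy illustration only: the sampling step `DT` and the sinusoidal stand-in waveform are assumptions for demonstration, not the actual data pipeline.

```python
import numpy as np

# Hypothetical sampling setup: waveform sampled every 2M over the full range.
DT = 2.0                          # time step in units of M (assumed)
t = np.arange(-5000.0, 130.0 + DT, DT)
h_plus = np.sin(0.02 * t)         # toy stand-in for the real h+ time series
h_cross = np.cos(0.02 * t)        # toy stand-in for the real hx time series

# Split described in the text: the encoder sees t in [-5000M, -100M),
# and the decoder forecasts (h+, hx) over t in [-100M, 130M].
enc_mask = t < -100.0
dec_mask = t >= -100.0

encoder_input = np.stack([h_plus[enc_mask], h_cross[enc_mask]], axis=-1)
decoder_target = np.stack([h_plus[dec_mask], h_cross[dec_mask]], axis=-1)
```

At this assumed sampling rate the encoder context is far longer than the forecast horizon, which matches the intent of the task: condition on a long inspiral prefix, predict the short merger-ringdown segment.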

Results: On the surrogate-based test set of 840,000 waveforms, the model achieved mean and median overlap scores of 0.996 and 0.997, respectively. Benchmarking on numerical relativity waveforms from the SXS catalog demonstrated strong out-of-distribution generalization, with a median overlap of 0.969 across 521 NR waveforms and up to 0.998 in face-on/off configurations. The interactive visualizations include, among other things, an extended waveform gallery comparing predicted and true waveforms across parameter space.

Performance scatter plot
Figure 3. Distribution of overlap scores (overlap scores ∈ [−1,1], with 1 indicating perfect alignment).
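A minimal time-domain version of the overlap score can be sketched as a normalized inner product. Note this is a simplification: published overlaps are typically computed in the frequency domain, possibly noise-weighted, and may be maximized over time and phase shifts; the function below only illustrates why the score lies in [−1, 1].

```python
import numpy as np

def overlap(h1: np.ndarray, h2: np.ndarray) -> float:
    """Normalized inner product between two real waveforms (toy version)."""
    den = np.sqrt(np.dot(h1, h1) * np.dot(h2, h2))
    return float(np.dot(h1, h2) / den) if den > 0 else 0.0

t = np.linspace(0.0, 10.0, 1000)
h_true = np.sin(2 * np.pi * t)

print(overlap(h_true, h_true))    # identical waveforms: overlap of 1
print(overlap(h_true, -h_true))   # sign-flipped waveform: overlap of -1
```

By Cauchy-Schwarz the score is bounded by 1 in magnitude, with 1 indicating perfect alignment, matching the range shown in Figure 3.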

Model Diagnostics: We conducted diagnostic studies to understand how the transformer achieves accurate predictions. In input obfuscation experiments, fixed segments of the encoder input were masked out and the resulting drop in predictive performance was measured. Masking early inspiral segments had limited impact on accuracy, while masking segments approaching merger disproportionately degraded performance, highlighting the importance of late-time dynamics for accurate forecasting. We also correlated the overlap drop with binary parameters such as mass ratio, spin components, and inclination angle to reveal which physical features most influence prediction accuracy. High mass-ratio binaries (large q) showed greater sensitivity to masking of late-time input, indicating that the model relies more on merger-phase data in asymmetric systems.

Obfuscation study matrix plot
Figure 4. Correlation between the drop in overlap score from obfuscating fixed segments of the encoder input and the physical parameters of the binary system. Positive values indicate a stronger correlation between a parameter and sensitivity to masking that segment, highlighting which parts of the waveform are crucial for capturing features linked to mass ratio, spin, and inclination.
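The structure of the obfuscation experiment can be sketched as follows. Everything here is a toy stand-in: `toy_model` is a placeholder for the trained transformer (it merely echoes the tail of its input), and the segmentation scheme is an assumption for illustration; the real study ran the model on each masked input and compared the forecast to the true waveform.

```python
import numpy as np

def overlap(h1, h2):
    den = np.sqrt(np.dot(h1, h1) * np.dot(h2, h2))
    return float(np.dot(h1, h2) / den) if den > 0 else 0.0

def mask_segment(x, n_segments, k):
    """Zero out the k-th of n equal-length segments of the encoder input."""
    x = x.copy()
    bounds = np.linspace(0, len(x), n_segments + 1).astype(int)
    x[bounds[k]:bounds[k + 1]] = 0.0
    return x

def toy_model(enc_input):
    # Placeholder predictor standing in for the trained transformer:
    # it simply echoes the last 100 samples of its input.
    return enc_input[-100:]

enc_input = np.sin(0.05 * np.arange(1000))
h_true = toy_model(enc_input)                         # baseline "truth"
baseline = overlap(toy_model(enc_input), h_true)      # unmasked performance

n_segments = 10
drops = [baseline - overlap(toy_model(mask_segment(enc_input, n_segments, k)), h_true)
         for k in range(n_segments)]
# For this toy predictor only the final (late-time) segment matters; in the
# real study, the per-segment drops were then correlated against q, spins,
# and inclination to produce the matrix in Figure 4.
```

The per-segment drop profile is exactly the quantity whose correlation with binary parameters is plotted above.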

We also examined how truncating higher-order modes affects forecasting accuracy. Mode reduction experiments showed that including higher-order modes improves model performance, particularly in systems with large mass ratios and edge-on orientations.

Mode reduction experiment results
Figure 5. Difference in waveform overlap (𝒪) as a function of binary parameters (mass ratio q, spin components s₁ᶻ, s₂ᶻ, and inclination angle θ) when removing higher-order modes (ℓₘₐₓ = 2, 3) relative to ℓₘₐₓ = 4. This shows how including higher-order modes improves predictive accuracy, especially at high mass ratio and edge-on inclination.
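The mode-truncation comparison rests on the fact that the observed strain is a sum of spherical-harmonic modes up to some ℓₘₐₓ. The sketch below illustrates that sum with made-up mode amplitudes and constant stand-ins for the spin-weight −2 harmonics Yₗₘ(θ, φ); real values would come from the surrogate and a spherical-harmonics library.

```python
import numpy as np

def strain_from_modes(modes, ylm, ell_max):
    """Sum harmonic modes up to ell_max: h = sum_{l<=ell_max, m} h_lm * Y_lm.

    `modes` maps (ell, m) -> complex time series h_lm(t); `ylm` maps
    (ell, m) -> the harmonic evaluated at the observer's orientation.
    Both mappings here are toy assumptions for illustration.
    """
    h = None
    for (ell, m), h_lm in modes.items():
        if ell > ell_max:
            continue
        term = h_lm * ylm[(ell, m)]
        h = term if h is None else h + term
    return h  # complex strain; h = h_plus - 1j * h_cross

t = np.linspace(-100.0, 130.0, 500)
modes = {(2, 2): np.exp(-1j * 0.2 * t),          # dominant quadrupole (toy)
         (3, 3): 0.1 * np.exp(-1j * 0.3 * t),    # weaker higher-order mode (toy)
         (4, 4): 0.03 * np.exp(-1j * 0.4 * t)}   # still weaker (toy)
ylm = {(2, 2): 0.6, (3, 3): 0.3 + 0.1j, (4, 4): 0.2}  # toy constants, not real Y_lm

h_full = strain_from_modes(modes, ylm, ell_max=4)
h_quad = strain_from_modes(modes, ylm, ell_max=2)
residual = np.max(np.abs(h_full - h_quad))  # higher-order content dropped at l_max = 2
```

The residual is what grows with mass ratio and inclination: higher-order modes carry more of the signal in asymmetric and edge-on systems, which is why truncating them costs the most accuracy exactly there.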



References

[17] Lee et al. (2021) Phys. Rev. D 103 123023
[19] Khan and Green (2021) Phys. Rev. D 103 064015
[20] Khan et al. (2022) Phys. Rev. D 105 024024
[22] Luo et al. (2024) ALLDATA 2024
[31] Vaswani et al. (2017) Advances in Neural Information Processing Systems