Transformer-based Gravitational Wave Forecasting (2024)

Resources: Paper (Link) | GitHub Repo (Link) | Interactive Visualizations (Link)

Machine learning is playing an increasingly central role in gravitational-wave astrophysics, providing tools for tasks ranging from signal detection to population inference ([1], [2]). Sequence modeling of gravitational waveforms has emerged as a promising direction for accelerating waveform generation, particularly in the late inspiral, merger, and ringdown regimes. Previous studies have primarily focused on parameter-to-waveform models or on the dominant harmonic mode ([17], [19], [20], [22]).

In this work, we extend waveform forecasting to a more complex regime: predicting both polarizations (h+, h×) across higher-order harmonic modes using only early inspiral information from quasi-circular, spinning, non-precessing binary black hole mergers. Higher-order modes are particularly important in systems with unequal masses, high spins, or edge-on orientations, but introduce substantial modeling challenges.

Comparison of waveforms in previous work and this study
Figure 1. Comparison of waveforms considered in previous studies to the richer structure of higher-order wave modes in this study.

We trained a transformer model [31] on over 14 million waveforms generated with the NRHybSur3dq8 surrogate, covering mass ratios up to q = 8, spins between −0.8 and 0.8, and inclination angles in [0, π]. Training took 15 hours on 16 NVIDIA A100 GPUs of the Delta supercomputer. The model takes input data over t ∈ [−5000M, −100M) and predicts (h+, h×) over t ∈ [−100M, 130M]. Our architecture is an encoder-decoder with causal attention and conditional processing layers that improve performance on waveforms with near-zero cross polarization (edge-on configurations).
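The time windowing can be made concrete with a short sketch. The function below (hypothetical names, not the project's actual code) slices a sampled waveform, with time in units of total mass M, into the encoder context over t ∈ [−5000M, −100M) and the decoder target over t ∈ [−100M, 130M], stacking the two polarizations as a 2-channel sequence:

```python
import numpy as np

def split_waveform(times, hp, hc,
                   context=(-5000.0, -100.0),  # encoder input: t in [-5000M, -100M)
                   target=(-100.0, 130.0)):    # decoder target: t in [-100M, 130M]
    """Slice a sampled waveform into encoder context and decoder target."""
    ctx = (times >= context[0]) & (times < context[1])
    tgt = (times >= target[0]) & (times <= target[1])
    # Stack (h+, hx) as a 2-channel sequence for the transformer.
    enc_in = np.stack([hp[ctx], hc[ctx]], axis=-1)
    dec_out = np.stack([hp[tgt], hc[tgt]], axis=-1)
    return enc_in, dec_out

# Toy sampled grid (a stand-in oscillation, not a physical waveform):
times = np.arange(-5000.0, 130.0 + 0.5, 0.5)
hp = np.cos(0.02 * times)
hc = np.sin(0.02 * times)
enc_in, dec_out = split_waveform(times, hp, hc)
```

The context window is deliberately half-open at −100M so that no sample appears in both the encoder input and the decoder target.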

Waveform split into encoder and decoder input
Figure 2. Example waveform over the entire range, showing how it is split between encoder and decoder input.

On the surrogate-based test set of 840,000 waveforms, the model achieved mean and median overlap scores of 0.996 and 0.997, respectively. Benchmarking on numerical relativity waveforms from the SXS catalog demonstrated strong out-of-distribution generalization, with a median overlap of 0.969 across 521 NR waveforms and up to 0.998 in face-on/off configurations. The interactive visualizations include, among other things, an extended waveform gallery comparing predicted and true waveforms across parameter space.
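For intuition, the core of the overlap metric can be sketched as a normalized inner product between two waveforms, O = ⟨a, b⟩ / √(⟨a, a⟩⟨b, b⟩). This is a simplified flat-noise, time-domain version; the paper's metric may additionally optimize over time and phase shifts:

```python
import numpy as np

def overlap(a, b):
    """Normalized inner product between two real sampled waveforms."""
    inner = lambda x, y: np.sum(x * y)
    return inner(a, b) / np.sqrt(inner(a, a) * inner(b, b))

# A small phase error between "true" and "predicted" signals barely
# reduces the overlap, which is why values near 1 indicate a good match.
t = np.linspace(0.0, 1.0, 1000)
h_true = np.sin(2 * np.pi * 30 * t)
h_pred = np.sin(2 * np.pi * 30 * t + 0.05)  # small phase offset
score = overlap(h_true, h_pred)
```

Identical waveforms give an overlap of exactly 1, and a sign flip gives −1, matching the [−1, 1] range shown in Figure 3.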

Performance scatter plot
Figure 3. Distribution of overlap scores (scores lie in [−1, 1], with 1 indicating perfect alignment).

We conducted interpretability studies to understand how the transformer achieves accurate predictions. We performed input obfuscation experiments, where fixed segments of the encoder input were masked out, and measured the resulting drop in predictive performance. Masking early inspiral segments had limited impact on accuracy, while masking segments approaching merger disproportionately degraded performance, highlighting the importance of late-time dynamics for accurate forecasting. Additionally, we correlated the overlap drop with binary parameters such as mass ratio, spin components, and inclination angle to reveal which physical features most influence prediction accuracy. High mass-ratio binaries (large q) showed greater sensitivity to masking of late-time input, indicating the model relies more on merger-phase data in asymmetric systems.
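The obfuscation procedure is simple to express in code. The sketch below (with `model` and `overlap` as stand-ins for the trained transformer and the overlap metric, not the project's actual implementation) zeroes out one fixed segment of the encoder input at a time and records the resulting drop in overlap:

```python
import numpy as np

def obfuscation_study(model, overlap, enc_in, target, n_segments=10):
    """Mask fixed encoder-input segments and measure the overlap drop."""
    baseline = overlap(model(enc_in), target)
    bounds = np.linspace(0, len(enc_in), n_segments + 1, dtype=int)
    drops = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        masked = enc_in.copy()
        masked[lo:hi] = 0.0  # obfuscate this segment of the input
        drops.append(baseline - overlap(model(masked), target))
    return np.array(drops)  # one overlap drop per segment
```

A large drop for a late segment and a small drop for an early one reproduces the qualitative finding above: the model leans most heavily on input samples approaching merger.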

Obfuscation study matrix plot
Figure 4. Correlation between the drop in overlap score from obfuscating fixed segments of the encoder input and various physical parameters of the binary system. Positive values indicate stronger correlation between parameter and sensitivity of that segment, highlighting which parts of the waveform are crucial for capturing features linked to mass ratio, spin, and inclination.

We also examined how truncating higher-order modes affects forecasting accuracy. Mode reduction experiments showed that including higher-order modes improves model performance, particularly in systems with large mass ratios and edge-on orientations.

Mode reduction experiment results
Figure 5. Difference in waveform overlap (𝒪) as a function of binary parameters (mass ratio q, spin components s₁ᶻ, s₂ᶻ, and inclination angle θ) when removing higher-order modes (ℓₘₐₓ = 2, 3) relative to ℓₘₐₓ = 4. This shows how including higher-order modes improves predictive accuracy, especially at high mass ratio and edge-on inclination.

For readers interested in the mathematical underpinnings of transformer architectures, my Transformers Explained page covers the model in detail, with math, diagrams, and references.

References

[1] Cuoco et al. (2025) Living Reviews in Relativity 28
[2] Cuoco et al. (2021) Mach. Learn. Sci. Tech. 2 011002
[17] Lee et al. (2021) Phys. Rev. D 103 123023
[19] Khan and Green (2021) Phys. Rev. D 103(6) 064015
[20] Khan et al. (2022) Phys. Rev. D 105 024024
[22] Luo et al. (2024) ALLDATA 2024
[31] Vaswani et al. (2017) Advances in Neural Information Processing Systems