PERSPECTIVE article
Front. Phys., 10 February 2025
Sec. Fusion Plasma Physics
Volume 12 - 2024 | https://doi.org/10.3389/fphy.2024.1531334
This article is part of the Research Topic "Visualizing Offline and Live Data with AI (VOLDA) Workshop", first edition, Princeton, 11–13 June 2024.
Artificial Intelligence (AI) foundation models, while successful in various domains of language, speech, and vision, have not been adopted in production for fusion energy experiments. This brief paper presents how AI foundation models could be used for fusion energy diagnostics, enabling, for example, automated visual logbooks that provide greater insight into chains of plasma events in a discharge, in time for between-shot analysis.
AI foundation models [1] encapsulate a concept wherein an AI model is pre-trained in an unsupervised or self-supervised manner on a fundamental task, for example, predicting the next word in a sentence, over a wide range of data; the trained model then serves as a foundation that can be fine-tuned for more specific downstream tasks, for example, sentence generation, text summarization, machine translation, etc. Essentially, instead of being narrow experts, these models are generalists. Although the concept gained popularity with large language models (LLMs), such as those underlying ChatGPT [2], in principle similar techniques can be applied across a range of modalities, for example, images, audio, video, unstructured meshes, etc. Given the plethora of data modalities in experimental magnetic confinement fusion devices and the wide variety of tasks experimental fusion scientists need to perform, a natural question arises as to whether AI foundation models can be created for experimental fusion data to enhance and accelerate fusion science. This paper seeks to explain at a conceptual level how these foundation models could be created and how they could be used effectively in experimental fusion settings.
Currently, when AI/machine learning (ML) is used for tasks within fusion energy experiments, the focus is most often on bespoke solutions for a particular task. These bespoke solutions require substantial work from the practitioner: gathering data, cleaning data, often performing data reduction (i.e., feature engineering), labeling data for classification problems, etc. The targeted tasks range widely, including models created specifically for anomaly detection [3], classification of plasma events [4–7], and time-series semantic search [8]. Figure 1 shows a representation of a foundation model that would instead serve as the basis for these many tasks and more, substantially reducing the burden of repeating these steps for each custom bespoke solution. The question then arises as to what this foundation model is and how it is achieved.
Figure 1. Foundation models for fusion energy enable many downstream tasks to be accomplished by a single model, including classification of plasma phenomena from fast diagnostics, prediction from few examples, combination of multiple diagnostics (modalities), extraction of physics model parameters from diagnostic data, anomaly detection, and more.
For LLMs, one of the more popular foundation models is the generative pre-trained transformer (GPT) [9]. This is a decoder-only transformer [10], pre-trained for next-token prediction (where tokens are created by splitting the text into a fixed vocabulary of subwords, usually on the order of tens of thousands), maximizing the standard language-modeling objective

$$\mathcal{L}_{LM} = \sum_{i} \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right),$$

where $u_i$ is the $i$-th token in the sequence, $k$ is the size of the context window, and $\Theta$ are the parameters of the neural network.
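To make the objective concrete, here is a minimal PyTorch sketch of next-token prediction training (the tensor shapes and the random stand-ins for real text and model outputs are illustrative assumptions, not from this article); the cross-entropy over targets shifted by one position is exactly the negative of the likelihood above.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: a batch of token sequences, and a decoder-only
# transformer that outputs one vocabulary distribution per position.
batch, seq_len, vocab_size = 8, 128, 50000
tokens = torch.randint(0, vocab_size, (batch, seq_len))  # stand-in for tokenized text
logits = torch.randn(batch, seq_len, vocab_size)         # stand-in for model(tokens)

# Next-token prediction: the output at position t is scored against token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),               # targets are positions 1..T-1
)
```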
In experimental fusion energy sciences, the data are fundamentally different from text: they are continuous rather than discrete, and they consist of hundreds of different diagnostic data modalities, ranging from simple time series to more complex multi-channel, line-integrated 2D spatial videos. The time-series nature of the data maps well onto foundation models created for audio or music [12], where typical downstream tasks are speaker identification, automatic speech recognition, music generation, etc. To train these models, the self-supervised learning objective typically differs from the discrete language case, since the continuous nature of the time series makes the space too large for direct next-token prediction. Instead, contrastive learning is often used for self-supervised training: a time-series sequence is partially masked, and the model learns to predict the masked portion by discerning the true sequence from a set that also contains many negative (false) sequence samples:

$$\mathcal{L}_c = -\log \frac{\exp\left(\mathrm{sim}(\mathbf{c}_t, \mathbf{q}_t)/\kappa\right)}{\sum_{\tilde{\mathbf{q}} \in \mathbf{Q}_t} \exp\left(\mathrm{sim}(\mathbf{c}_t, \tilde{\mathbf{q}})/\kappa\right)},$$

where $\mathrm{sim}(\mathbf{a}, \mathbf{b}) = \mathbf{a}^{\top}\mathbf{b} / \left(\lVert\mathbf{a}\rVert \lVert\mathbf{b}\rVert\right)$ is the cosine similarity, $\mathbf{c}_t$ is the model output at masked time step $t$, $\mathbf{q}_t$ is the true latent representation of the masked portion, $\mathbf{Q}_t$ is a candidate set containing $\mathbf{q}_t$ and $K$ negative samples, and $\kappa$ is a temperature parameter.
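A minimal PyTorch sketch of such a contrastive objective follows (an InfoNCE-style loss; the tensor names are hypothetical, and in wav2vec 2.0 [17] the positive would be the quantized latent at a masked time step while the distractors are drawn from other masked steps):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(context, positive, negatives, temperature=0.1):
    """Pick the true (masked) latent out of a set that includes distractors.

    context:   (batch, dim)    model output at a masked time step
    positive:  (batch, dim)    true latent for that time step
    negatives: (batch, K, dim) K negative (false) samples
    """
    candidates = torch.cat([positive.unsqueeze(1), negatives], dim=1)      # (batch, K+1, dim)
    sims = F.cosine_similarity(context.unsqueeze(1), candidates, dim=-1)   # (batch, K+1)
    targets = torch.zeros(context.size(0), dtype=torch.long)  # true sample is index 0
    return F.cross_entropy(sims / temperature, targets)

# Toy usage with random tensors standing in for encoder outputs.
ctx, pos, negs = torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 100, 256)
loss = contrastive_loss(ctx, pos, negs)
```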
It should be noted here that while LLMs based on a next-token prediction loss have been useful for both generative and discriminative downstream tasks, foundation models for audio or time series have often focused on one set or the other. Figure 1 focuses on discriminative downstream tasks (e.g., classifying plasma modes in diagnostic data), but there are also generative tasks that can be useful in fusion energy, such as scenario planning. Many foundation models for modalities like audio that focus on generative tasks use diffusion or flow-matching models [13], although these are not studied here.
AI foundation models can be created for single diagnostics; however, AI model architectures exist to incorporate multiple modalities [14–16], thereby taking advantage of the correlations between modalities. For fusion experiments this is particularly useful, as different diagnostics, for example, electron cyclotron emission imaging (ECEI) and beam emission spectroscopy (BES), measure different physical phenomena, and combining their data for predictions can potentially provide more information than the sum of the parts.
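As a sketch of what such a multi-modal architecture might look like (the channel counts, module sizes, and class name are hypothetical illustrations, not a published design), each diagnostic can get its own encoder projecting into a shared latent space, with a transformer attending across the concatenated token streams:

```python
import torch
import torch.nn as nn

class MultiDiagnosticEncoder(nn.Module):
    """Toy two-modality encoder: per-diagnostic projections into a shared
    latent space, fused by a transformer attending across both streams."""

    def __init__(self, ecei_channels=160, bes_channels=64, dim=256):
        super().__init__()
        self.ecei_proj = nn.Linear(ecei_channels, dim)  # ECEI frames -> tokens
        self.bes_proj = nn.Linear(bes_channels, dim)    # BES frames -> tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, ecei, bes):
        # ecei: (batch, time, ecei_channels); bes: (batch, time, bes_channels)
        tokens = torch.cat([self.ecei_proj(ecei), self.bes_proj(bes)], dim=1)
        return self.fusion(tokens)  # joint latent sequence over both modalities

model = MultiDiagnosticEncoder()
latent = model(torch.randn(2, 100, 160), torch.randn(2, 100, 64))
```

Self-attention across both token streams lets the model pick up cross-diagnostic correlations that either encoder alone would miss.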
Because AI foundation models are pre-trained to effectively learn the underlying data distribution, models with large parameter counts pre-trained on large amounts of unlabeled data are observed to perform better [11]. The consequence is that large high-performance computing (HPC) resources with many GPUs are needed to train these models. With the popularity of deep learning and foundation models, many good frameworks and tools are available to make this easier, including PyTorch, Hugging Face Accelerate, and the Meta FAIR library.
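As an illustration of how little distributed-training boilerplate such tools require, here is a minimal pre-training loop with Hugging Face Accelerate (the model, dataloader, and loss function are placeholders for the foundation-model components discussed above, not code from this article); launched with `accelerate launch`, the same script runs on one GPU or many:

```python
import torch
from accelerate import Accelerator

def pretrain(model, dataloader, loss_fn, epochs=1, lr=1e-4):
    accelerator = Accelerator()  # handles device placement and multi-GPU setup
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for _ in range(epochs):
        for batch in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model, batch)    # e.g., a contrastive loss as above
            accelerator.backward(loss)      # replaces loss.backward()
            optimizer.step()
```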
One relevant example of how such an AI foundation model could be used in a fusion energy experiment is the automated logbook shown in Figure 2. Fusion energy researchers have a deluge of experimental data to process and understand, both on short timescales between experimental discharges (usually 10–20 min) and on longer timescales of months to years for understanding campaign-level data. Insights, if recorded, are normally formulated as text in personal or online logbooks, and this manual analysis can be laborious. An AI foundation model could be used to automatically tag plasma events of interest in the diagnostic data, creating a metadata database and enabling fast visualization of plasma event sequences between plasma discharges.
Figure 2. Workflow for the automated logbook, enriched by few-shot learning with large neural networks. A CNN + Transformer foundation model is pre-trained on unlabeled data and then fine-tuned with a small labeled dataset. With the fine-tuned network, fast inference can be done between shots on diagnostic data, to quickly identify plasma events of interest.
As shown in Figure 2, first, a large dataset of raw diagnostic data from many plasma discharges is gathered, without having to label specific plasma events in the data. The AI foundation model is pre-trained on these data, passing in sequences of data and using a contrastive loss to learn to predict masked portions of each sequence (the model shown is based on the wav2vec 2.0 model [17], with a convolutional neural network (CNN) encoder that reduces the data to a latent space representation, followed by a transformer model [10]). In the second step, a small dataset is gathered and labeled at time slices with specific plasma events or modes, for example, neoclassical tearing modes (NTMs), Alfvén eigenmodes (AEs), edge harmonic oscillations (EHOs), etc. The model can be fine-tuned to predict a single type of plasma event or several different types. This labeled dataset is smaller than would be required when training a model directly in a traditional supervised learning fashion, since the pre-trained model has already learned good representations of the underlying data distribution. In principle the labeled dataset can be as small as one or a few examples, but in practice it may require more and is problem-dependent. A decoder layer with learnable parameters is added onto the end of the pre-trained model, and with the labeled dataset, the model is fine-tuned to output predicted labels for an input sequence. This fine-tuning can involve updating only the decoder layer parameters while keeping the rest of the pre-trained model frozen, or unfreezing various layers of the pre-trained model so that those parameters are also updated by the learning process (a sketch of the frozen-encoder case follows below). The fine-tuning needs to be done only once; thereafter the model is used for inference (in machine learning parlance, prediction rather than learning). As new plasma discharges are completed, the fine-tuned model takes in the new diagnostic data and predicts labels for the various plasma events. In the final step shown in Figure 2, these predictions can be visualized together with the data in the automated logbook, providing fast feedback to fusion researchers between plasma discharges and supporting further investigation later. Detected modes can also trigger further analysis, for example, bandpass filtering at the detected mode frequencies and visualizing the resulting spatial mode structure in different diagnostics.
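A minimal sketch of this fine-tuning step in PyTorch (the simple encoder here is a stand-in for the pre-trained CNN + Transformer of Figure 2, and all names and sizes are hypothetical): the pre-trained parameters are frozen, a small decoder head is added, and only the head is trained on the labeled time slices.

```python
import torch
import torch.nn as nn

def build_classifier(pretrained_encoder, latent_dim=256, n_event_types=5):
    """Attach a learnable decoder head to a frozen pre-trained encoder."""
    for p in pretrained_encoder.parameters():
        p.requires_grad = False                  # keep pre-trained representations fixed
    head = nn.Linear(latent_dim, n_event_types)  # per-time-slice event logits
    return pretrained_encoder, head

def finetune_step(encoder, head, optimizer, signals, labels):
    # signals: (batch, time, channels); labels: (batch, time) integer event ids
    with torch.no_grad():
        latents = encoder(signals)               # (batch, time, latent_dim)
    logits = head(latents)                       # (batch, time, n_event_types)
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

encoder = nn.Sequential(nn.Linear(32, 256), nn.GELU())  # stand-in for the real model
encoder, head = build_classifier(encoder)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)  # head parameters only
loss = finetune_step(encoder, head, optimizer,
                     torch.randn(4, 500, 32), torch.randint(0, 5, (4, 500)))
```

Unfreezing deeper layers of the pre-trained model is then just a matter of flipping `requires_grad` back on for those parameters and handing them to the optimizer as well.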
Although bespoke AI models could be created for each diagnostic or each plasma event, the traditional supervised learning route would almost surely require thousands of labeled examples gathered by researchers, a long, tedious process that is often avoided. The AI foundation model offers a route where fewer labeled examples are needed, and it can be fine-tuned for different plasma events. This enables identification of chains of events, which is often important for understanding phenomena such as disruptions [18, 19].
Foundation models do require a large unlabeled dataset, and there are no well-defined rules for its size (this depends on the variety of the data and the information content per sample, which may be hard to quantify). For many fusion energy experiments, substantial data are available, depending on the device and diagnostic. The largest diagnostic datasets on the DIII-D tokamak are shown in Table 1 (out of a total of 60 different diagnostic systems on DIII-D), showing a substantial amount of data that can reasonably be expected to be sufficient for training an AI foundation model.
Table 1. Diagnostics on the DIII-D tokamak with the largest dataset sizes. Note that not all of these data are for overlapping plasma discharges (i.e., some plasma discharges will not have all of these diagnostics available).
In addition to the need for sizeable, information-rich data to train on, out-of-distribution (OOD) data during inference need to be considered. Fusion experiments often push into new operational regimes, resulting in diagnostic data that may be far from anything seen previously. Various works have approached this topic with bespoke AI models for fusion energy, seeking to enable models to adapt to new datasets [5, 20, 21]. In the context of AI foundation models, there are indications in other fields, such as medical imaging, that foundation models are more robust to data distribution shift [22] and can even be useful for discriminating OOD data [23]. However, this needs to be researched in the specific context of AI foundation models for fusion energy diagnostic data.
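One simple way a foundation model's representations could be used to flag OOD inputs, sketched here as an illustrative baseline under the assumption that in-distribution latents cluster in the learned representation space (this is not a method from the cited works), is a Mahalanobis distance score against statistics collected on the training set:

```python
import torch

class LatentOODScorer:
    """Score samples by Mahalanobis distance in the model's latent space;
    large distances suggest data far from the training distribution."""

    def __init__(self, train_latents):    # (N, dim) latents from training data
        self.mean = train_latents.mean(dim=0)
        cov = torch.cov(train_latents.T)  # (dim, dim) latent covariance
        # A small ridge term keeps the inverse numerically stable.
        self.precision = torch.linalg.inv(cov + 1e-4 * torch.eye(cov.size(0)))

    def score(self, latents):             # (batch, dim) latents from new discharges
        d = latents - self.mean
        return torch.einsum("bi,ij,bj->b", d, self.precision, d)  # squared distance

scorer = LatentOODScorer(torch.randn(1000, 64))   # toy in-distribution latents
flags = scorer.score(torch.randn(8, 64)) > 100.0  # threshold is task-dependent
```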
AI foundation models could serve to simplify and greatly expand the use of AI in experimental fusion energy. The ability to create good latent space representations of diagnostic data can aid in a number of downstream tasks for experimental fusion scientists, such as identification of plasma phenomena across multiple diagnostics, anomaly detection, extraction of physics parameters from data, and use in control systems. The automation of these tasks offers remarkable opportunities to gain further insights across many plasma discharges and uncover hidden relationships. Foundation models also reduce the burden on scientists of identifying and labeling thousands of examples for AI models to a much more manageable level. Some work toward foundation models for fusion energy diagnostics has begun, for example, through the ExaLearn project, which was part of the Exascale Computing Project [Rodriguez et al., 2024 (unpublished study)], EUROfusion projects [24], and multi-modal bespoke models [25, 26], but the full realization of AI foundation models as a production-ready tool in experimental fusion science has yet to be achieved.
Although the focus of this paper has been foundation models for multi-modal, time-series-based diagnostics, the advent of reasoning models such as the OpenAI o1 model [27] presents an opportunity to combine these in hybrid systems of AI agents, which can leverage multi-modal time-series foundation models as tools to further automate discovery and increase the utility of the investment in these experimental devices, including coupling with simulation. Creating these flexible building blocks of multi-modal time-series foundation models, from which such advanced workflows can be built, could greatly aid fusion energy scientists toward the ultimate realization of fusion energy as a clean and sustainable energy source.
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
RC: writing–original draft and writing–review and editing.
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the US Department of Energy under DE-AC02-09CH11466.
The author gratefully acknowledges stimulating conversations and collaboration with colleagues in the ExaLearn project and with attendees at the Visualizing Offline and Live Data with AI (VOLDA) workshop where the work was presented.
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declare that no Generative AI was used in the creation of this manuscript.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
1. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, et al. On the opportunities and risks of foundation models. arXiv [Preprint]. arXiv:2108.07258 (2021). Available from: http://arxiv.org/abs/2108.07258 (Accessed August 27, 2021).
2. OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 technical report. arXiv [Preprint]. arXiv:2303.08774 (2024). Available from: http://arxiv.org/abs/2303.08774 (Accessed November 19, 2024).
3. Anand H, Sammuli BS, Olofsson KEJ, Humphreys DA. Real-time magnetic sensor anomaly detection using autoencoder neural networks on the DIII-D tokamak. IEEE Trans. Plasma Sci. (2022) 50:4126–30. doi:10.1109/TPS.2022.3181548
4. Churchill RM, Tobias B, Zhu Y. Deep convolutional neural networks for multi-scale time-series classification and application to tokamak disruption prediction using raw, high temporal resolution diagnostic data. Phys. Plasmas (2020) 27(6):062510. doi:10.1063/1.5144458
5. Kates-Harbeck J, Svyatkovskiy A, Tang W. Predicting disruptive instabilities in controlled fusion plasmas through deep learning. Nature (2019) 568:526–31. doi:10.1038/s41586-019-1116-4
6. Rea C, Granetz RS, Montes K, Tinguely RA, Eidietis N, Hanson JM, et al. Disruption prediction investigations using Machine Learning tools on DIII-D and Alcator C-Mod. Plasma Phys. Control. Fusion (2018) 60: 8. doi:10.1088/1361-6587/aac7fe
7. Škvára V, Šmídl V, Pevný T, Seidl J, Havránek A, Tskhakaya D, et al. Detection of alfvén Eigenmodes on COMPASS with generative neural networks. Fusion Sci. Technol. (2020) 76(8):962–71. doi:10.1080/15361055.2020.1820805
8. Montes KJ, Rea C, Tinguely RA, Sweeney R, Zhu J, Granetz RS. A semi-supervised machine learning detector for physics events in tokamak discharges. Nucl. Fusion (2021) 61(2):026022. doi:10.1088/1741-4326/abcdb9
9. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Tech Rep (2019).
10. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. arXiv [Preprint]. arXiv:1706.03762 (2017). Available from: http://arxiv.org/abs/1706.03762 (Accessed July 17, 2019).
11. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. arXiv [Preprint]. arXiv:2005.14165 (2020). Available from: http://arxiv.org/abs/2005.14165 (Accessed June 17, 2020).
12. Ma Y, Øland A, Ragni A, Del Sette BM, Saitis C, Donahue C, et al. Foundation models for music: a survey. arXiv [Preprint]. arXiv:2408.14340 (2024). Available from: http://arxiv.org/abs/2408.14340 (Accessed November 20, 2024).
13. Lipman Y, Chen RTQ, Ben-Hamu H, Nickel M, Le M. Flow matching for generative modeling. arXiv [Preprint]. arXiv:2210.02747 (2023). Available from: http://arxiv.org/abs/2210.02747 (Accessed October 2, 2024).
14. Akbari H, Yuan L, Qian R, Chuang W-H, Chang S-F, Cui Y, et al. VATT: transformers for multimodal self-supervised learning from raw video, audio and text. arXiv [Preprint]. arXiv:2104.11178 (2021). Available from: http://arxiv.org/abs/2104.11178 (Accessed November 13, 2024).
15. Jaegle A, Borgeaud S, Alayrac J-B, Doersch C, Ionescu C, Ding D, et al. Perceiver IO: a general architecture for structured inputs and outputs. arXiv [Preprint]. arXiv:2107.14795 (2021). Available from: http://arxiv.org/abs/2107.14795 (Accessed January 18, 2022).
16. Alayrac J-B, Donahue J, Luc P, Miech A, Barr I, Hasson Y, et al. Flamingo: a visual language model for few-shot learning. arXiv [Preprint]. arXiv:2204.14198 (2022). Available from: http://arxiv.org/abs/2204.14198 (Accessed November 20, 2024).
17. Baevski A, Zhou H, Mohamed A, Auli M. wav2vec 2.0: a framework for self-supervised learning of speech representations. arXiv [Preprint]. arXiv:2006.11477 (2020). Available from: http://arxiv.org/abs/2006.11477 (Accessed June 25, 2020).
18. de Vries PC, Johnson MF, Alper B, Buratti P, Hender TC, Koslowski HR, et al. Survey of disruption causes at JET. Nucl. Fusion (2011) 51(5):053018. doi:10.1088/0029-5515/51/5/053018
19. Sabbagh SA, Berkery JW, Park YS, Ahn JH, Jiang Y, Riquezes JD, et al. Disruption event characterization and forecasting in tokamaks. Phys. Plasmas (2023) 30:032506. doi:10.1063/5.0133825
20. Murari A, Rossi R, Peluso E, Lungaroni M, Gaudio P, Gelfusa M, et al. On the transfer of adaptive predictors between different devices for both mitigation and prevention of disruptions. Nucl. Fusion (2020) 60(5):056003. doi:10.1088/1741-4326/ab77a6
21. Murari A, Rossi R, Craciunescu T, Vega J, Mailloux J, Abid N, et al. A control oriented strategy of disruption prediction to avoid the configuration collapse of tokamak reactors. Nat Commun (2024) 15(1):2424. doi:10.1038/s41467-024-46242-7
22. Nguyen DMH, Pham TN, Diep NT, Phan NQ, Pham Q, Tong V, et al. On the out-of-distribution robustness of foundation models in medical image segmentation. arXiv [Preprint]. arXiv:2311.11096 (2023). Available from: http://arxiv.org/abs/2311.11096 (Accessed December 10, 2024).
23. Liu J, Wen X, Zhao S, Chen Y, Qi X. Can OOD object detectors learn from foundation models? arXiv [Preprint]. arXiv:2409.05162 (2024). Available from: http://arxiv.org/abs/2409.05162 (Accessed December 10, 2024).
24. de Vries G. EUROfusion spearheads advances in Artificial Intelligence and Machine Learning to unlock fusion energy (2024). Available from: https://euro-fusion.org/eurofusion-news/eurofusion-spearheads-advances-in-artificial-intelligence-and-machine-learning-to-unlock-fusion-energy/ (Accessed December 10, 2024).
25. Zheng W, Xue F, Chen Z, Chen D, Guo B, Shen C, et al. Disruption prediction for future tokamaks using parameter-based transfer learning. Commun. Phys. (2023) 6(1):1–11. doi:10.1038/s42005-023-01296-9
26. Jalalvand A, Kim S, Seo J, Hu Q, Curie M, Steiner P, et al. Multimodal super-resolution: discovering hidden physics and its application to fusion plasmas. arXiv [Preprint]. arXiv:2405.05908 (2024). Available from: http://arxiv.org/abs/2405.05908 (Accessed November 19, 2024).
27. OpenAI. Learning to Reason with LLMs (2024). Available from: https://openai.com/index/learning-to-reason-with-llms/ (Accessed November 20, 2024)
Keywords: fusion energy, artificial intelligence, machine learning, foundation models, diagnostic
Citation: Churchill RM (2025) AI foundation models for experimental fusion tasks. Front. Phys. 12:1531334. doi: 10.3389/fphy.2024.1531334
Received: 20 November 2024; Accepted: 12 December 2024;
Published: 10 February 2025.
Edited by: Alessandro Maffini, Polytechnic University of Milan, Italy
Reviewed by: Riccardo Rossi, University of Rome Tor Vergata, Italy
Copyright © 2025 Churchill. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: R. Michael Churchill, rchurchi@pppl.gov