ORIGINAL RESEARCH article

Front. Mech. Eng.
Sec. Solid and Structural Mechanics
Volume 10 - 2024 | doi: 10.3389/fmech.2024.1408649
This article is part of the Research Topic Hybrid Modeling - Blending Physics with Data

Flow-based parameterizations for DAG and feature discovery in scientific multimodal data

Provisionally accepted
  • 1 Sandia National Laboratories, Albuquerque, United States
  • 2 Arizona State University, Tempe, Arizona, United States
  • 3 University of Pennsylvania, Philadelphia, Pennsylvania, United States

The final, formatted version of the article will be published soon.

    Representation learning algorithms are often used to extract essential features from high-dimensional datasets. These algorithms commonly assume that such features are independent. However, multimodal datasets containing complementary information often have causally related features. Consequently, there is a need to discover features that exhibit conditional independencies. Bayesian Networks (BN) are probabilistic graphical models that use directed acyclic graphs (DAGs) to encode the conditional independencies of a joint distribution. To discover features and their conditional independence structure, we develop pimaDAG, a variational autoencoder framework that learns features from multimodal datasets, possibly with known physics constraints, together with a BN describing the feature distribution. Our algorithm introduces a new DAG parameterization, which we use to learn a BN simultaneously with the latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the Gaussians with an outcome of the DAG nodes; this identification enables feature discovery with conditional independence relationships obeying the Markov factorization property. Tested on a synthetic dataset and a scientific dataset, our method demonstrates the capability of learning a BN on simultaneously discovered key features in a fully unsupervised setting.
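
    The Markov factorization property mentioned above states that a joint distribution over the DAG nodes factors as a product of each node's distribution conditioned on its parents. The following is a minimal sketch of that property only, not the pimaDAG method: the three-node DAG, the node names, and the probability tables are hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical DAG over three binary nodes: A -> B and A -> C.
# Neither the structure nor the numbers come from the article.
parents = {"A": (), "B": ("A",), "C": ("A",)}

# Made-up conditional probability tables:
# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
}


def joint(assignment):
    """Markov factorization: p(x) = prod_i p(x_i | parents(x_i))."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Enumerate every joint outcome of the DAG nodes.
nodes = list(parents)
outcomes = [dict(zip(nodes, values))
            for values in product((0, 1), repeat=len(nodes))]

# The factorized probabilities form a valid distribution over the outcomes.
total = sum(joint(o) for o in outcomes)
```

    In this toy setting there are 8 joint outcomes. Under the abstract's description, each outcome of the DAG nodes would be identified with one Gaussian of the mixture prior, so `joint` would plausibly play the role of the mixture weights; that reading is an assumption based on the abstract alone.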

    Keywords: multimodal machine learning, DAGs, Bayesian networks, variational inference, variational autoencoders, fingerprinting, causal discovery algorithms

    Received: 28 Mar 2024; Accepted: 23 Sep 2024.

    Copyright: © 2024 Walker, Actor, Martinez and Trask. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Elise Walker, Sandia National Laboratories, Albuquerque, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.