DYNOTEARS: Learning the Structure of Dynamic Bayesian Networks
Authors: Roxana Pamfil, Data Science Consultant, QuantumBlack and Nisara Sriwattanaworachai, Senior Data Scientist, QuantumBlack
Over the last couple of years, a hot topic in the data science community has been pushing models beyond correlations to encode causal relationships instead. This led us to launch CausalNex, an open source library that leverages Bayesian Networks to uncover structural relationships in data, learn complex distributions, and observe the effect of potential interventions.
Until recently, CausalNex has been limited to static (i.i.d.) data. With many of our projects operating in settings that change over time, we realised the need to adapt the methodology behind CausalNex to also work for temporal data. Several Data Scientists within our R&D function developed a new approach, which we recently presented at AISTATS 2020.
A Bayesian Network is a graphical model in which arrows (“directed edges”) indicate conditional dependencies between variables (“nodes”). In a Dynamic Bayesian Network, each variable exists as a node in several time slices, and edges connect nodes both within and across slices.
Why Use Dynamic Bayesian Networks?
Much of machine learning relies on the supervised learning paradigm, which is extremely powerful for predictive modelling. Bayesian networks have some advantages that make them worth considering in certain cases:
- Interpretability and the possibility to use subject-matter knowledge to improve the graphical model.
- Possibility for counterfactual (“what if”) analysis, which makes it possible to assess the impact of interventions.
- Sample efficiency, meaning that not a lot of data is required for learning.
Compared to static Bayesian networks, DBNs have the additional advantage that they model several timescales simultaneously: faster interactions that occur within a time slice and slower interactions that connect earlier events to later events. Many real-world systems exhibit these different timescales, and it is important for our models to account for them explicitly.
Example: A Dynamic Bayesian Network for Predictive Maintenance
Let’s illustrate these concepts through an example. Suppose that we work with a company that wants to improve how they maintain equipment in their factory. They give us a data set of measurements over time of the following quantities: equipment condition, sensor reading, whether the sensor is faulty, and whether there was any servicing of the equipment.
- Dependencies within a time slice. On a given day, if the sensor reading goes outside a certain range, a team will come in to service the machine. There is also a possibility that the sensor is faulty, in which case the reading will be off. This information is summarised by the three directed edges:
Equipment condition → Sensor reading, Faulty sensor → Sensor reading, Sensor reading → Servicing.
- Dependencies between time slices. The current state of the machine depends on its previous state, for example if it degrades over time or if it’s overused. Maintenance affects the future state of the machine and also reduces the probability of a faulty sensor. These dependencies appear in the graph as the three directed edges:
Equipment condition at time t-1 → Equipment condition at time t, Servicing at time t-1 → Equipment condition at time t, Servicing at time t-1 → Faulty sensor at time t. (All six edges are sketched in code after this list.)
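To make the unrolled structure concrete, here is a minimal sketch of this six-edge graph using networkx. The node names are ours, chosen only for illustration, with a suffix encoding the time slice:

```python
import networkx as nx

# Unrolled two-slice view of the predictive-maintenance example.
dbn = nx.DiGraph()

# Dependencies within a time slice (same-day effects)
dbn.add_edges_from([
    ("equipment_condition_t", "sensor_reading_t"),
    ("faulty_sensor_t", "sensor_reading_t"),
    ("sensor_reading_t", "servicing_t"),
])

# Dependencies between time slices (yesterday -> today)
dbn.add_edges_from([
    ("equipment_condition_t-1", "equipment_condition_t"),
    ("servicing_t-1", "equipment_condition_t"),
    ("servicing_t-1", "faulty_sensor_t"),
])

print(sorted(dbn.edges()))
```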
This is, of course, a simplified example, in which we have just 4 variables and the dependencies between them are fairly clear. So how can we build Dynamic Bayesian Networks from data with potentially hundreds of variables?
Structure Learning From Time-Series Data
Our paper introduces an algorithm for learning Dynamic Bayesian Networks from temporal data, a problem known as structure learning. We leverage an insight from DAGs with NO TEARS, published at NeurIPS 2018, to make this problem tractable. While that earlier work focused on static data, our paper tackles structure learning for dynamic data. We call our method Dynamic NOTEARS, or DYNOTEARS (🦖💧) for short.
DYNOTEARS performs structure learning by minimising a loss function, subject to the constraint that the output graph is acyclic. If there are cycles, there is no hope for a causal interpretation of the edges, as there is no way to establish which event in the chain was the first to occur.
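For readers who want the mathematical shape of the problem, the sketch below is our summary of the optimisation (see the paper for the precise statement). Here X is the matrix of observations, Y stacks the time-lagged observations, W holds the intra-slice edge weights, A the inter-slice edge weights, the ℓ1 penalties encourage sparse graphs, and the trace-exponential term is the NOTEARS characterisation of acyclicity (d is the number of variables, n the number of samples):

```latex
\min_{W,\,A}\; \frac{1}{2n}\,\lVert X - XW - YA \rVert_F^2
  + \lambda_W \lVert W \rVert_1 + \lambda_A \lVert A \rVert_1
\quad \text{subject to} \quad
h(W) = \operatorname{tr}\!\bigl(e^{W \circ W}\bigr) - d = 0
```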
The optimisation finds the conditional dependencies that are best supported by the data. In the predictive maintenance example, the algorithm might tell us that servicing done yesterday has an impact on the equipment condition today, but not on the sensor reading.
Figure: A Dynamic Bayesian Network with 3 nodes in which variables at time t depend on other variables at time t (solid arrows) and on variables at times t-1 and t-2 (dashed arrows).
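In equation form, this figure corresponds to a structural vector autoregression of order 2 (our paraphrase of the model in the paper): the solid arrows are the non-zero entries of the intra-slice matrix W, the dashed arrows are the non-zero entries of the lag matrices A_1 and A_2, and z_t is a noise term.

```latex
x_t = x_t W + x_{t-1} A_1 + x_{t-2} A_2 + z_t
```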
Summary of Our Findings
In our paper, we set up a simulation study and compared DYNOTEARS with three other algorithms for structure learning. The key question is how close the structure learned by each of the algorithms is to the true DBN. We found that DYNOTEARS consistently outperforms the other methods. Notably, our algorithm has good performance even in low-data regimes where the number of variables exceeds the number of samples.
We also applied DYNOTEARS to two other data sets: stock returns of companies in the S&P100, and synthetic gene-expression data from the DREAM4 Challenge.
The full paper and supplementary materials are available in the AISTATS proceedings. Thanks to Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, and Bryon Aragam, who co-authored the paper.
DYNOTEARS is implemented in CausalNex, QuantumBlack’s open source library for causal reasoning and “what-if” analysis using Bayesian Networks. You can learn more about CausalNex on ReadTheDocs, Medium and GitHub.
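If you want to try it on your own time-series data, here is a minimal sketch assuming the from_pandas_dynamic helper in causalnex.structure.dynotears; the file name and parameter values are ours, so check the ReadTheDocs API reference for the exact signature and defaults:

```python
import pandas as pd
from causalnex.structure.dynotears import from_pandas_dynamic

# Rows are consecutive time steps; columns are the observed variables.
df = pd.read_csv("maintenance_timeseries.csv")  # hypothetical file name

# Learn a DBN with one time lag (p=1); lambda_w and lambda_a control
# the sparsity of intra-slice and inter-slice edges respectively.
sm = from_pandas_dynamic(df, p=1, lambda_w=0.1, lambda_a=0.1)

# The result is a graph whose edge weights indicate dependency strength.
for u, v, w in sm.edges(data="weight"):
    print(f"{u} -> {v}: {w:.3f}")
```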