A DRL Approach for Sensor-driven Decision Making and RUL Prediction
Frameworks for decision making and RUL prediction are suggested
This post reviews the following paper:
Erotokritos Skordilis, et al., “A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics”, Jul 2020
In this paper, the authors propose two decision making methods that use reinforcement learning (RL) and particle filtering for:
- Deriving real-time maintenance policies
- Estimating remaining useful life for sensor-monitored degrading systems
Before getting started
Prerequisites for the paper:
- Bayesian Filter
- Particle Filter
Bayesian Filtering
A Bayesian filter is a general approach for recursively estimating an unknown probability density function (PDF). It is designed for online state-estimation problems and is discussed in more detail in Section 2.2, Bayesian filtering.
Particle Filter
TBD
Table of Contents
- Introduction
- System characteristics and monitoring tools
- State-space modeling
- Bayesian filtering
- DRL?
- NN?
- Bayesian filtering-based DRL for maintenance decision making
- Numerical Experiments
1. Introduction
Combining an RL framework with Bayesian filtering has several advantages.
Bayesian filtering
- allows an explicit quantification of uncertainties
  - Standard DRL does not incorporate uncertainty
  - Instead, standard DRL uses point values from the sensor observations
- can infer the current state based on the history of sensor measurements and the latent degradation process
- is computationally more appealing
  - Bayesian filtering can combine complex multi-dimensional sensor data, so its output can be used as input for training an RL framework
2. System characteristics and monitoring tools
Before looking at the proposed approach in more detail, we first need to cover the following:
2.1. State-space modeling
First of all, what is a state-space model (SSM)? We are dealing with a dynamic system whose degradation is latent, so the degradation variables (latent variables) can only be inferred indirectly from the sensors we use. That is, sensor observations are the key to uncovering the hidden degradation process. An SSM is a mathematical model that describes the dependencies between the latent variables and the observations.
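For health-monitoring systems, the SSM typically follows a generic stochastic filtering structure in a dynamic state-space form of
\[\begin{align} &x_t = f_x(x_{t-1}, {u}_t, {\theta_x}, \epsilon) \\ &{y}_t = f_y(x_t, {u}_t, {\theta_y}, \delta) \end{align}\]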
where $x_t$ denotes the one-dimensional continuous degradation process value at time $t$ and ${y_t}$ denotes the multi-dimensional observation process. The stochastic functions $f$ describe the evolution of the model variables. ${\Theta} = \{ {\theta_x}, {\theta_y} \}$ are the model parameters that characterize the behaviour of the corresponding function $f$. ${u}_t$ is a multi-dimensional vector of operational inputs specific to each problem. During operation, the processes are perturbed by statistical noise $\epsilon$ and $\delta$. Both noise terms are zero-mean Gaussian with covariance matrices $Q$ and ${R}$, as follows:
\[\begin{aligned} \epsilon &\sim \mathcal{N}(0, Q) \\ \delta &\sim \mathcal{N}(0, {R}) \end{aligned}\]
However, an SSM cannot capture multimodal hidden dynamics that involve discrete-valued processes. A hybrid state-space model (HSSM) has therefore been developed to monitor the remaining useful life (RUL). Like the SSM, the HSSM includes the continuous degradation state process $x_t$ and the observation process ${y_t}$ presented in Eq. (1)-(2). In addition, it includes an operating condition of the system $c_t$ (i.e., normal or faulty), a latent hazard process $\lambda_t$ representing the system’s probability of failure, and a working status process $o_t$ representing the overall system’s working conditions.
\[\begin{align} &x_t = f_x(x_{t-1}, c_t, {u}_t, {\theta_x}) \\ &{y}_t = f_y(x_t, c_t, {u}_t, {\theta_y}) \\ &c_t = f_c(c_{t-1}, {u}_t, {\theta_{c}}) &\text{latent process stating normal or faulty}\\ &\lambda_t = f_\lambda(x_t, c_t, {u}_t, {\theta_{\lambda}}) &\text{hazard process showing system's probability of failure}\\ &o_t = f_o(\lambda_t, {\theta_o}) &\text{overall system's working conditions} \end{align}\]
As mentioned earlier, finding the relationship between the latent variables and the observations is not easy. Since the latent variables can only be inferred indirectly from the observations, it is natural to learn a function that captures these dependencies. Here, we can use a single-hidden-layer feedforward neural network, namely the Extreme Learning Machine (ELM). An ELM can model high-dimensional and correlated sensor measurements.
Each of Eq. (1)-(7) needs a suitable function to describe the dependencies between its input variables. The ELM’s job is to find that function for each equation, and every ELM has its own weight vector $\theta$. For instance, to find the function $f_x$, the ELM for $x$ learns appropriate weights $\theta_x$ that capture the dependencies between the input variables $x_{t-1}$, $c_t$, and ${u_t}$.
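To make the idea concrete, here is a minimal ELM sketch in NumPy. It follows the generic ELM recipe (random, fixed hidden-layer weights; output weights solved by least squares) and is not the authors' implementation. The class name `ELM`, the hidden size, the `tanh` activation, and the toy target standing in for $f_x$ are all assumptions made for illustration.

```python
import numpy as np


class ELM:
    """Single-hidden-layer network with random hidden weights (illustrative sketch)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights and biases are drawn once and never trained.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.beta_out = None  # output weights, the only trained part

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        H = self._hidden(X)
        # Least-squares solution of H @ beta_out ~= y.
        self.beta_out, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta_out


# Toy data (hypothetical): inputs are x_{t-1}, c_t, u_t and the target is x_t.
rng = np.random.default_rng(1)
x_prev = rng.uniform(0.0, 1.0, size=500)
c = rng.integers(0, 2, size=500)        # 0 = normal, 1 = faulty
u = rng.uniform(0.0, 1.0, size=500)
X = np.column_stack([x_prev, c, u])
y = x_prev + 0.01 + 0.05 * c * u        # made-up degradation increment

elm = ELM(n_inputs=3, n_hidden=50).fit(X, y)
mse = float(np.mean((elm.predict(X) - y) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because only the output weights are fitted, training reduces to one linear least-squares solve, which is what makes ELMs cheap to train on many correlated sensor channels.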
The latent hazard process $\lambda_t$ gives the probability of system failure given the inputs $x_t$, $c_t$, and ${u_t}$. Under the binary assumption that the operating condition is either normal or faulty, the hazard process takes the form of a logistic regression function:
\[\begin{aligned} \lambda_t &= [1 + \text{exp}[-(\alpha x_t + \beta c_t + {\gamma u_t} + \beta_0)]]^{-1} \\ &= \frac{1}{1 + e^{-(\alpha x_t + \beta c_t + {\gamma u_t} + \beta_0)}} \end{aligned}\]
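The logistic hazard above can be evaluated directly. The sketch below is a plain-Python transcription of the formula; the function name `hazard` and the coefficient values in the usage lines are hypothetical and chosen only to show the shape of the curve.

```python
import math


def hazard(x_t, c_t, u_t, alpha, beta, gamma, beta0):
    """lambda_t = 1 / (1 + exp(-(alpha*x_t + beta*c_t + gamma . u_t + beta0)))."""
    linear = alpha * x_t + beta * c_t + sum(g * u for g, u in zip(gamma, u_t)) + beta0
    # Numerically stable logistic: avoid exp() of a large positive number.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical coefficients, for illustration only.
params = dict(alpha=4.0, beta=1.5, gamma=[0.3], beta0=-5.0)
lam_healthy = hazard(x_t=0.2, c_t=0, u_t=[1.0], **params)
lam_degraded = hazard(x_t=1.0, c_t=1, u_t=[1.0], **params)
print(lam_healthy, lam_degraded)
```

With positive `alpha` and `beta`, the hazard rises as the degradation level grows and jumps when the condition switches from normal to faulty.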
Fig. 1. Structure of the proposed SSM.
- Circles: latent states
- Rectangles: Observations
- Diamonds: Operating inputs
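To see how the five HSSM processes interact, the following sketch simulates one toy trajectory. Every functional form here is an assumption made for illustration, not taken from the paper: $c_t$ is a two-state Markov chain that stays faulty once it fails, $x_t$ drifts faster when faulty, $y_t$ is a noisy linear readout from two sensors, $\lambda_t$ is the logistic hazard, and $o_t$ is drawn as a Bernoulli failure event with probability $\lambda_t$.

```python
import numpy as np


def simulate_hssm(T=200, seed=0):
    """Simulate a toy HSSM trajectory; all dynamics are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x, c = 0.0, 0                       # degradation level, condition (0 = normal)
    history = []
    for t in range(T):
        u = 1.0                         # constant operating input (assumption)
        # c_t = f_c(c_{t-1}, u_t): normal -> faulty with small probability.
        if c == 0 and rng.random() < 0.02 * u:
            c = 1
        # x_t = f_x(x_{t-1}, c_t, u_t) plus Gaussian process noise.
        x = x + (0.005 + 0.02 * c) * u + rng.normal(0.0, 0.002)
        # y_t = f_y(x_t, c_t, u_t) plus Gaussian sensor noise (two sensors).
        y = np.array([2.0 * x, 1.0 - 0.5 * x + 0.1 * c]) + rng.normal(0.0, 0.05, size=2)
        # lambda_t = f_lambda(x_t, c_t, u_t): logistic hazard.
        lam = float(1.0 / (1.0 + np.exp(-(6.0 * x + 1.0 * c + 0.1 * u - 8.0))))
        # o_t = f_o(lambda_t): 1 = working, 0 = failed.
        o = 0 if rng.random() < lam else 1
        history.append({"t": t, "x": x, "c": c, "y": y, "lam": lam, "o": o})
        if o == 0:
            break                       # the run ends at failure
    return history


history = simulate_hssm()
print("steps simulated:", len(history))
print("final hazard:", history[-1]["lam"])
```

In a monitoring setting only `y` and `u` would be visible; `x`, `c`, and `lam` are the latent quantities that have to be inferred.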
2.2. Bayesian filtering
Finding the dependencies between the latent states $(x_t, c_t, \lambda_t)$ is not easy. For notational convenience, let us denote the latent states as ${z_t} = \{x_t, c_t, \lambda_t\}$. Here, we use a Bayesian filter to approximate the posterior distribution of the latent states given the observations. But what is a Bayesian filter? The note below gives a short summary.
Bayesian Filter
The purpose of a Bayesian filter is to calculate the belief in a hypothesis given the evidence, and to update that belief as new evidence arrives:
\[P(H | E) = \frac{P(H) P(E|H)}{P(E)}\]
- $H$: Hypothesis
- $E$: Evidence
- $P(H|E)$: Posterior
- $P(H)$: Prior
- $P(E|H)$: Likelihood
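Applied recursively over time, Bayes' rule becomes a filter. The sketch below shows one common approximation, a bootstrap particle filter, on a toy one-dimensional model: propagated particles play the role of the prior, the Gaussian observation density supplies the likelihood, and the reweighted particles approximate the posterior. The model ($x_t = x_{t-1} + 0.01 + \epsilon$, $y_t = 2x_t + \delta$), the noise levels, and the function name `particle_filter` are assumptions for illustration, not the paper's setup.

```python
import numpy as np


def particle_filter(ys, n_particles=1000, q=0.01, r=0.1, seed=0):
    """Bootstrap particle filter for a toy linear-Gaussian model (sketch)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 0.1, size=n_particles)   # initial belief
    estimates = []
    for y in ys:
        # Prior: push each particle through the state transition.
        particles = particles + 0.01 + rng.normal(0.0, q, size=n_particles)
        # Likelihood: Gaussian log-density of y given each particle.
        log_w = -0.5 * ((y - 2.0 * particles) / r) ** 2
        weights = np.exp(log_w - log_w.max())             # guard against underflow
        weights /= weights.sum()
        # Posterior mean estimate from the weighted particles.
        estimates.append(float(np.sum(weights * particles)))
        # Resample so particles concentrate where the posterior mass is.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)


# Simulate the same toy model to obtain a ground truth and observations.
rng = np.random.default_rng(42)
x, xs, ys = 0.0, [], []
for _ in range(100):
    x = x + 0.01 + rng.normal(0.0, 0.01)
    xs.append(x)
    ys.append(2.0 * x + rng.normal(0.0, 0.1))

estimates = particle_filter(ys)
mae = float(np.mean(np.abs(estimates - np.array(xs))))
print(f"mean absolute error: {mae:.4f}")
```

The same predict, weight, and resample loop carries over to the hybrid latent state ${z_t}$, with the transition and likelihood replaced by the learned HSSM functions.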
2.3. Neural network
3. Bayesian filtering-based DRL for maintenance decision making
3.1. From Bayesian filtering to DRL
Algorithm 1. Bayesian Filtering-based Deep Reinforcement Learning
Input: Full
\[T = \sum_{j=1}^{N}{T_j}, \quad j \in \{1, 2, \ldots, N \} \quad \text{(sum of all lifetimes in $N$)}\]
\[\sum_{i=1}^{N_s}\mathbf{1}_{z_t^{i}\in(b_d, b_{d-1})}\]
4. Numerical Experiments
The experiments use the CMAPSS (Commercial Modular Aero-Propulsion System Simulation) turbofan engine degradation dataset from the NASA diagnostic repository (Saxena & Goebel, 2008).
References
2020
- A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics. Computers & Industrial Engineering, 2020