ODE filter

ODE filters are a class of probabilistic numerical methods for solving ordinary differential equations (ODEs) that frame the problem of finding a solution to an initial value problem as a Bayesian inference task. Solutions are modeled as probability distributions that also account for the discretization error introduced through the numerical approximation. This probabilistic treatment provides an computation-aware alternative to classical numerical ODE-solvers, which typically only provide point-wise error bounds. It further enables sampling of joint trajectories from the posterior, as well as quantifying uncertainty about the underlying ODE itself. ODE filters offer a flexible framework for incorporating additional information, such as measurements or conservation laws.

The general ODE filtering procedure consists of two steps: The first is the definition of a Prior distribution, for the ODE solution, in the form of a stochastic processes, more specifically, as Gauss–Markov processes. Discretization results in nonlinear Gaussian state space models. Second, use a general Bayesian filtering and smoothing algorithm, from the field of recursive Bayesian estimation, to find numerical solutions to the ODE. The naming convention of ODE filters typically reflects the underlying filtering algorithm: For instance, combining the particle filter with this framework yields the particle ODE filter. Since smoothers are extensions of filters, the literature often refers to both under the umbrella term "ODE filters".

Theory

Problem setup

Consider a first-order ordinary differential equation (ODE) initial value problem (IVP) of the form

${\dot {y}}(t)=f(y(t),t),\quad t\in ,\quad y(0)=y_{0}\in \mathbb {R} ^{d}.$

The vector field ${\textstyle f:\mathbb {R} ^{d}\times \rightarrow \mathbb {R} ^{d}}$ is assumed to be Lipschitz-continuous such that a unique global solution to this ODE-IVP exists according to the Pickard-Lindelöf theorem:^[1]. Classical Numerical ODE Solvers, are algorithms that compute approximate solutions ${\textstyle {\hat {y}}(t)\approx y(t)}$ on a discrete mesh ${\textstyle \{t_{n}\}_{n=0}^{N},\quad 0=t_{0}<\cdots <t_{N}=T,\quad n\in \mathbb {N} }$ ^[2]. Probabilistic ODE Solvers additionally estimate the uncertainty introduced through the discretization. ODE Filters are a type of probabilistic ODE solvers, that adopt a Bayesian Inference framework to compute a posterior^[3]

$p\left(y(t)\mid y(0),\{{\dot {y}}(t_{n})=f(y(t_{n}),t_{n})\}_{n=0}^{N}\right).$

This is done in two steps. First, prior data and likelihood is defined by modeling solutions to the ODE as a Gauss-Markov process. Second, the posterior is computed with Bayesian filtering and smoothing algorithms.^[3]

Step 1: Gauss-Markov process

Prior

The prior is specified by a linear time invariant (LTI) stochastic differential equation (SDE) of the form

$X(0)\sim {\mathcal {N}}(\mu _{0},\Sigma _{0})$

$\mathrm {d} X(t)=FX(t)\mathrm {d} t+L\mathrm {d} B_{t}.$

It describes a stochastic process ("the system"):

$X(t)=\left^{T}\in \mathbb {R} ^{(q+1)d}.$

The state of the system are realizations $x(t)\sim X(t)$ , where $X^{(0)}(t)$ and $X^{(1)}(t)$ model , $y(t)$ and ${\dot {y}}(t)$ respectively. Te remaining $q-1$ sub-vectors may be used to model higher-order derivatives. $F$ is a state transition matrix, $L$ a diffusion matrix, and $B(t)$ is a vector standard Wiener process. Note that $F$ and $L$ need to be chosen such that $\mathrm {d} X^{(0)}(t)=X^{(1)}(t)\mathrm {d} t$ holds^[4]. Solutions to LTI-SDEs are Gauss-Markov processes (GMP).^[5], of the form:

$p\left(x(t)\mid X^{(0)}(0)=x_{0}=y_{0}\right)=GP\left(x(t)\mid \mu (t),k(t,t')\right)$

$\mu (t)=exp(Ft)x_{0}$

$k(t,t')=\int _{0}^{\mathrm {min(t,t')} }exp(F(t-\tau ))LL^{T}exp(F(t'-\tau ))^{T}\mathrm {d} \tau .$

Where $\mu :\rightarrow \mathbb {R} ^{(q+1)d}$ and $k:\times \rightarrow \mathbb {R} ^{(q+1)d\times (q+1)d}$ . Gauss-Markov processes are computationally favorable over generic Gaussian process priors, as they allow for inference with linear time complexity ${\mathcal {O}}(N)$ compared to the general GP inference complexity of ${\mathcal {O}}(N^{3})$ ^[6]^[4]. Discretization of the GMP results in a linear Gaussian state space model (LGSSM):

$x(t_{0})\sim {\mathcal {N}}(\mu _{0},\Sigma _{0})$

$x(t_{n+1})\mid x(t_{n})\sim {\mathcal {N}}(A(t_{n+1}-t_{n})x(t_{n}),Q(t_{n+1}-t_{n}))$

$A(h)=exp(Fh)$

$Q(h)=\int _{0}^{h}e^{F\tau }LL^{T}e^{F^{T}\tau }\mathrm {d} \tau .$

Data

Information about the ODE is introduced through a measurement operator (i.e. state misalignment), known as an information operator in this context^[4],

$Z(t)=g(X(t))=X^{(1)}(t)-f(X^{(0)}(t),t).$

If it were possible to condition $X(t)$ on the event $0=z(t)\sim Z(t)$ for all $t\in$ , this would result in a point-mass on the true solution^[7] . This is computationally intractable (just like for all other forms of numerical simulation), which motivates the discretization of the problem and conditioning the state on discrete observations ${\textstyle \{z(t_{n})=0\}_{n=0}^{N}.}$ The information operator can be modified to solve higher-order ODEs. Multiple information operators can be added to incorporate additional information, such as conservation laws or measurement data^[8]^[9]

Likelihood

The likelihood of the observations given the GMP prior is:

$z_{n}\mid x(t_{n})\sim \delta (z_{n}-g(x(t_{n}))).$

To improve the model's generality and account for numerical errors introduced by discretization, a measurement variance $R_{n}$ can be added to $z_{n}$ ^[4]^[10]. The observations can be incorporated into the LGSSM as a nonlinear measurement model, yielding the final nonlinear Gaussian state space model (NLGSSM), that is the basis of all ODE filters:

$x_{0}\sim {\mathcal {N}}(\mu _{0},\Sigma _{0})$

$x_{n+1}\mid x_{n}\sim {\mathcal {N}}(A(h_{n+1})x_{n},Q(h_{n+1}))$

$z_{n}\mid x_{n}\sim {\mathcal {N}}(g_{n}(x_{n}),R_{n}).$

Here $h_{n+1}=t_{n+1}-t_{n}$ , $x_{n}=x(t_{n})$ , $z_{n}=z(t_{n})$ , and $g_{n}(x_{n})=x_{n}^{(1)}-f(x_{n}^{(0)},t_{n})$ . The desired posterior $p\left(x(t)\mid x_{0},\{z_{n}\}_{n=0}^{N}\right)$ can now be computed iteratively by any filtering and smoothing algorithms known from signal processing or recursive Bayesian estimation.

Step 2: Bayesian filter and smoother

The most general ODE filter and smoother algorithms to solve the NLGSSM are equivalent to the general recursive Beyesian estimation algorithms:

Algorithm: ODE Filter $(f,x_{0},p(x_{n+1}\mid x_{n}))$ 
 1: Initialize  $p(x_{0})$ 
 2: for  ${\textstyle n=0,1,\cdots ,N-1}$  do:
 3:     optional: adapt dynamic model  ${\textstyle p(x_{n+1}\mid x_{n})}$ 
 4:     optional: choose step size  $h_{n}>0$ 
 5:     predict  ${\textstyle p(x_{n+1}\mid z_{1:n})=\int p(x_{n+1}\mid x_{n})p(x_{n}\mid z_{1:n})\mathrm {d} x_{n}}$ 
 6:     observe the ODE  $z_{n+1}=0$ 
 7:     update  $p(x_{n+1}\mid z_{1:n+1})=p(z_{n+1}\mid x_{n+1})p(x_{n+1}\mid z_{1:n})p(z_{n+1})^{-1}$ 
 8:     optional: backward transition  $p(x_{n}|x_{n+1},z_{1:n})=p(x_{n+1}|x_{n})p(x_{n}|z_{1:n})p(x_{n+1}|z_{1:n})^{-1}$ 
 9: end for
10: return  ${\textstyle \{p(x_{n}\mid z_{1:n}),p(x_{n}\mid z_{1:n-1})\}_{n=0,\dots ,N}}$

Algorithm: ODE Smoother $(f,x_{0},p(x_{n+1}\mid x_{n}))$ 
 1:  ${\textstyle \{p(x_{n}\mid z_{1:n}),p(x_{n}\mid z_{1:n-1})\}_{n=0,\dots ,N}\;=\;{\text{ODE filter}}(f,x_{0},p(x_{n+1}\mid x_{n}))}$  
 2: for  $n=N-1,N-2,\cdots ,0$  do:
 3:     compute  ${\textstyle p(x_{n}\mid z_{1:N})=p(x_{n}\mid z_{1:n})\int p(x_{n+1}\mid x_{n})p(x_{n+1}\mid z_{1:N})p(x_{n+1}\mid z_{1:n})^{-1}\mathrm {d} x_{n+1}}$ 
 4: end for
 5: return  $\{p(x_{n}\mid z_{1:N})\}_{n=0,\dots ,N}$

For nonlinear observation models, the probability distributions become non-Gaussian, rendering the above equations intractable. Therefore, to define a specific ODE filter and smoother, the predict, update, and compute steps must be specified in a tractable form. The full posterior can be computed from

$p(x_{0:N}\mid z_{1:N})=p(x_{N}\mid z_{1:N})\prod _{n=0}^{N-1}p(x_{n}\mid x_{n+1},z_{n}).$

Having access to the full posterior enables sampling of joint trajectories that classical numerical methods do not allow.

Implementation Details

Choice of prior

The choice of a prior distribution is replaced by choosing the free parameters of the GMP. That is choosing $\mu _{0},\Sigma _{0},F,L,q$ , however, under some constraints. One of the most popular and most widely used GMPs is the q-times integrated Wiener process (i.e. integrated Brownian motion) ^[11]. For simplicity we can set w.l.o.g. $d=1$ ^[1]^[12]. The q-times Integrated Wiener process (IWP) is defined such that the sub-coordinates model the first q-derivatives of $X^{(0)}(t),$ i.e. $dX^{(n-1)}=X^{(n)}dt\quad \forall n\in$ . Further, $X^{(q)}$ is a standard Wiener Process. This is equivalent to the drift and dispersion matrices:

${\breve {F}}_{\mathrm {IWP} }=\left,\quad {\breve {L}}_{\mathrm {IWP} }=\sigma \left.$

Note that for $d>1$ the full drift and dispersion matrices follow from the one-dimensional versions with a Kronecker product $F_{\mathrm {IWP} }={\breve {F}}_{\mathrm {IWP} }\otimes I_{d}$ , and $L_{\mathrm {IWP} }={\breve {L}}_{\mathrm {IWP} }\otimes I_{d}$ :^[11]. This yields the following known closed form solutions for the transition matrices^[12]

$_{ij}=_{ij}=\mathbb {I} (j\geq i){\frac {h^{j-i}}{(j-i)!}},$

$_{ij}=\sigma ^{2}{\frac {h^{2q+3-i-j}}{(2q+3-i-j)(q+1-i)!(q+1-j)!}}.$

The ideal initialization of the IWP that considers all available information from the ODE with initial value $x_{0}$ is to evaluate all derivatives at $x_{0}$ :

$\mu _{0}=^{T},\quad \Sigma _{0}={\boldsymbol {0}}.$

Here $f^{<i>}(a)=(a)$ is recursively defined from $f^{<0>}(a)=a$ and $f^{<1>}(a)=f(a)$ , and $\odot$ is the elementwise product. Computationally this can be efficiently computed via Taylor mode automatic differentiation.^[1]^[3]

Choice of filter and smoother

One possible approach to building efficient inference algorithms for nonlinear measurement operators $g(X(t))$ is via linearization. This results in a Gaussian inference framework, which is computationally desirable because it allows efficient closed-form inference. Taylor approximation provides one viable linearization technique, whereas the degree and the locality/globality result in different subcategories of extended Kalman filters. Choosing a local, first order Taylor expansion corresponds to the extended Kalman filter of order 1 (EKF1). Other options are a Taylor approximation of order 0, leading to the EKF0 algorithm, or a global linearization, known as the iterated extended Kalman smoother (IEKS).^[1]

Example: EKF1

The first-order Taylor expansion of $g$ around a point $\xi \in \mathbb {R} ^{(q+1)d}$ is:

${\hat {g}}(X)=g(\xi )+J_{g}(\xi )(X-\xi ),$

where $J_{g}(\xi )\in \mathbb {R} ^{d\times (q+1)d}$ is the Jacobian of $g$ . This leads to an affine observation model

$z_{n}\mid x_{n}\sim g(z_{n}\mid x_{n})={\mathcal {N}}(H_{n}x_{n}+c_{n},R_{n}),\quad H_{n}=J_{g}(\xi _{n}),\quad c_{n}=g(\xi _{n})-J_{g}(\xi _{n})\xi _{n}.$

The linearization point is chosen as the predicted mean from the previous observation $\xi _{n}=m_{n}^{-}=\mathbb {E}$ . With this, the predict, update, and compute equations all take on Gaussian form, for which closed form inference is possible. See extended Kalman filter for more details.

$\mathrm {\textbf {predict}} :p(x_{n+1}\mid z_{1:n})={\mathcal {N}}(x_{n+1}\mid m_{n+1}^{-},P_{n+1}^{-})$

$\mathrm {\textbf {update}} :p(x_{n+1}\mid z_{1:n+1})={\mathcal {N}}(x_{n+1}\mid m_{n+1},P_{n+1})$

$\mathrm {\textbf {compute}} :p(x_{n}\mid z_{1:N})={\mathcal {N}}(x_{n}\mid m_{n}^{s},P_{n}^{s}).$

Examples

Logistic ODE

As a one dimensional example, consider the following logistic differential equation IVP:

${\dot {y}}(t)=y(t)(1-y(t)),\quad t\in \quad y(0)=y_{0}=0.01,$

for which a analytical solution exists:

$y(t)={\frac {1}{1+\left({\frac {1+y_{0}}{y_{0}}}\right)y_{0}}}.$

To solve this ODE-IVP with a PDE filter, the prior, discretization, as well as the predict, update, and compute steps need to be specified.

Prior

2-times IWP:

$A(h)=\left,\quad \sigma =1,\quad Q(h)=\left,$

with Taylor mode initialization:

$X(0)=\left^{T}.$

Data and lielihood

We discretized the time domain uniformly with $N=20\rightarrow h=0.5\rightarrow t_{n}=n*h$ and observe data $\{z(t_{n})=0\}_{n=1}^{N}$ without any noise, hence $R_{n}=0$ .

Filter and smoother

As filtering algorithm we use the EKF1 algorithm with the Jacobian is $J_{g}(X)=$ .

Software

probdiffeq JAX based ODE filter library.^[3].
ProbNumDiffEq.jl ODE filter library in Julia^[13].
probnum Probabilistic Numerics in Python^[14]

References

^ ^a ^b ^c ^d Hennig, Philipp; Osborne, Michael A.; Kersting, Hans P. (2022). Probabilistic Numerics: Computation as Machine Learning. Cambridge: Cambridge University Press. doi:10.1017/9781316681411. ISBN 978-1-316-68141-1.
^ Tronarp, Filip; Särkkä, Simo; Hennig, Philipp (2021). "Bayesian ODE solvers: the maximum a posteriori estimate". Statistics and Computing. 31 (3): 23. doi:10.1007/s11222-021-09993-7.
^ ^a ^b ^c ^d Krämer, Peter Nicholas (2024-04-11), Implementing Probabilistic Numerical Solvers for Differential Equations, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-94093, retrieved 2025-09-26
^ ^a ^b ^c ^d Tronarp, Filip; Kersting, Hans; Särkkä, Simo; Hennig, Philipp (2019). "Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective". Statistics and Computing. 29 (6): 1297–1315. doi:10.1007/s11222-019-09900-1.
^ Särkkä, Simo; Solin, Arno (2019). Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks. Cambridge: Cambridge University Press. doi:10.1017/9781108186735. ISBN 978-1-108-18673-5.
^ Øksendal, Bernt (2003). "Stochastic Differential Equations". Universitext. doi:10.1007/978-3-642-14394-6. ISBN 978-3-540-04758-2. ISSN 0172-5939.
^ Cockayne, Jon; Oates, Chris J.; Sullivan, T. J.; Girolami, Mark (2019). "Bayesian Probabilistic Numerical Methods". SIAM Review. 61 (3): 756–789. doi:10.1137/17M1139357. ISSN 0036-1445.
^ Schmidt, Jonathan; Krämer, Nicholas; Hennig, Philipp (2022-07-05), A Probabilistic State Space Model for Joint Inference from Differential Equations and Data, arXiv:2103.10153
^ Bosch, Nathanael; Tronarp, Filip; Hennig, Philipp (2021-10-20), Pick-and-Mix Information Operators for Probabilistic ODE Solvers, arXiv:2110.10770
^ Kersting, Hans (2021-03-11), Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-54639, retrieved 2025-09-25
^ ^a ^b Bosch, Nathanael (2025-05-15), A Flexible and Efficient Framework for Probabilistic Numerical Simulation and Inference, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-106849, retrieved 2025-09-25
^ ^a ^b Kersting, Hans; Sullivan, T. J.; Hennig, Philipp (2020). "Convergence rates of Gaussian ODE filters". Statistics and Computing. 30 (6): 1791–1816. doi:10.1007/s11222-020-09972-4. ISSN 1573-1375. PMC 7527376. PMID 33088027.
^ Bosch, Nathanael (2024-09-30). "ProbNumDiffEq.jl: Probabilistic Numerical Solvers for Ordinary Differential Equations in Julia". Journal of Open Source Software. 9 (101): 7048. Bibcode:2024JOSS....9.7048B. doi:10.21105/joss.07048. ISSN 2475-9066.
^ Wenger, Jonathan; Krämer, Nicholas; Pförtner, Marvin; Schmidt, Jonathan; Bosch, Nathanael; Effenberger, Nina; Zenn, Johannes; Gessner, Alexandra; Karvonen, Toni (2021-12-03), ProbNum: Probabilistic Numerics in Python, arXiv:2112.02100

[:1-1] Hennig, Philipp; Osborne, Michael A.; Kersting, Hans P. (2022). Probabilistic Numerics: Computation as Machine Learning. Cambridge: Cambridge University Press. doi:10.1017/9781316681411. ISBN 978-1-316-68141-1.

[:5-2] Tronarp, Filip; Särkkä, Simo; Hennig, Philipp (2021). "Bayesian ODE solvers: the maximum a posteriori estimate". Statistics and Computing. 31 (3): 23. doi:10.1007/s11222-021-09993-7.

[:4-3] Krämer, Peter Nicholas (2024-04-11), Implementing Probabilistic Numerical Solvers for Differential Equations, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-94093, retrieved 2025-09-26

[:0-4] Tronarp, Filip; Kersting, Hans; Särkkä, Simo; Hennig, Philipp (2019). "Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective". Statistics and Computing. 29 (6): 1297–1315. doi:10.1007/s11222-019-09900-1.

[5] Särkkä, Simo; Solin, Arno (2019). Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks. Cambridge: Cambridge University Press. doi:10.1017/9781108186735. ISBN 978-1-108-18673-5.

[6] Øksendal, Bernt (2003). "Stochastic Differential Equations". Universitext. doi:10.1007/978-3-642-14394-6. ISBN 978-3-540-04758-2. ISSN 0172-5939.

[7] Cockayne, Jon; Oates, Chris J.; Sullivan, T. J.; Girolami, Mark (2019). "Bayesian Probabilistic Numerical Methods". SIAM Review. 61 (3): 756–789. doi:10.1137/17M1139357. ISSN 0036-1445.

[8] Schmidt, Jonathan; Krämer, Nicholas; Hennig, Philipp (2022-07-05), A Probabilistic State Space Model for Joint Inference from Differential Equations and Data, arXiv:2103.10153

[9] Bosch, Nathanael; Tronarp, Filip; Hennig, Philipp (2021-10-20), Pick-and-Mix Information Operators for Probabilistic ODE Solvers, arXiv:2110.10770

[10] Kersting, Hans (2021-03-11), Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-54639, retrieved 2025-09-25

[:2-11] Bosch, Nathanael (2025-05-15), A Flexible and Efficient Framework for Probabilistic Numerical Simulation and Inference, Universitaet Tuebingen, Hennig, Philipp (Prof Dr.), Universität Tübingen, doi:10.15496/PUBLIKATION-106849, retrieved 2025-09-25

[:3-12] Kersting, Hans; Sullivan, T. J.; Hennig, Philipp (2020). "Convergence rates of Gaussian ODE filters". Statistics and Computing. 30 (6): 1791–1816. doi:10.1007/s11222-020-09972-4. ISSN 1573-1375. PMC 7527376. PMID 33088027.

[13] Bosch, Nathanael (2024-09-30). "ProbNumDiffEq.jl: Probabilistic Numerical Solvers for Ordinary Differential Equations in Julia". Journal of Open Source Software. 9 (101): 7048. Bibcode:2024JOSS....9.7048B. doi:10.21105/joss.07048. ISSN 2475-9066.

[14] Wenger, Jonathan; Krämer, Nicholas; Pförtner, Marvin; Schmidt, Jonathan; Bosch, Nathanael; Effenberger, Nina; Zenn, Johannes; Gessner, Alexandra; Karvonen, Toni (2021-12-03), ProbNum: Probabilistic Numerics in Python, arXiv:2112.02100

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

ODE filter

Theory

Problem setup

Step 1: Gauss-Markov process

Prior

Data

Likelihood

Step 2: Bayesian filter and smoother

Implementation Details

Choice of prior

Choice of filter and smoother

Example: EKF1

Examples

Logistic ODE

Prior

Data and lielihood

Filter and smoother

Software

See also

References

Wikious

Boobota

Sagapedia