I ODE, You ODE and the Neural ODE

There comes a time in the life of every mathematician, engineer, and mythological figure alike when

You like analytics, huh!? Then take this ODE.

And it smacks you with a nonlinear, coupled, implicit one, with coefficients changing like the Hogwarts staircases. The only thing you can do is not solve it. Not even by praying to the good souls of Gauss, Leibniz, and whatever other mathematicians you worship. Or at least, that was true until the advent of numerical methods. Not very elegant, often messy, but incredibly effective. They won’t give you any closed-form formulas to brag about, but if you settle for an approximate result, you really can’t complain. After all, although it might not seem like it, we are surrounded by approximations: when you use Google Maps (that tells you’ll arrive in 10 minutes and you get there in 17), when you add a pinch of salt (where “pinch” means a tablespoon), when you read my blog (which is supposed to be educational but helps me more to remember than you to learn)...

But enough chit-chat, since you’re probably on the edge of your seat, let’s finally look at these famous numerical methods. But first, if you haven’t already, go check out Part 1 and Part 2.

ODE in Small Steps

Let's revisit the much-praised Lotka-Volterra system introduced here.

$$ \begin{cases} \frac{dx}{dt} = \alpha x - \beta x y \\ \frac{dy}{dt} = \delta x y - \gamma y \end{cases} $$

where:

$x(t)$ is the number of prey (e.g., rabbits).
$y(t)$ is the number of predators (e.g., foxes).
$\alpha$ is the prey growth rate. It describes how quickly prey reproduce in the absence of predators (which, in the case of rabbits, is apparently quite high according to legend).
$\beta$ is the predation rate. It indicates how often prey are killed by predators.
$\delta$ is the conversion rate. It tells how many new predators are born per prey eaten and digested.
$\gamma$ is the predator mortality rate. It represents the natural death rate of predators in the absence of prey. If they don’t eat, they starve.

There is no known analytical method to symbolically solve this system, so we turn to numerical methods. These are strategies to obtain an approximate solution, using small, discrete, and hopeful steps. All thanks to the great Euler, who, in addition to giving us the iconic tattoo

$$ e^{i \pi} + 1 = 0 $$

(no seriously folks, stop getting it tattooed if you don’t even know what pi is), also gave us the progenitor of all numerical methods: the Euler method. The convergence of this method was proven by the venerable Cauchy and later generalized by the eminent Runge and Kutta with the family of RK methods. Basically, everyone here is great, and the world of numerical methods is more crowded than TikTok’s algorithm. There are many more we could cover, but we’ll stick to a select few.

Euler's Method

Let’s go back to the Taylor series expansion. Remember? I already talked about it here. The Taylor series is like a wildcard, it shows up everywhere.

Let’s consider in particular a first-order expansion, i.e.,

$$ f(x) = f(a) + f'(a) \cdot (\bar x) + O(\bar x^2) $$

where

$$ \bar x = (x - a) $$

In other words, we are evaluating the function at $x$, using derivatives calculated at a point $a$ that is offset from $x$. Now let’s consider a generic Cauchy problem:

$$ \begin{cases} y'(t) = f(y(t), t) \\ y(t_0) = y_0 \end{cases} $$

and recall the definition of derivative. For $h \rightarrow 0$ we can write:

$$ y'(t) \approx \frac{y(t+h) - y(t)}{h} $$

$$ \Rightarrow h \cdot y'(t) \approx y(t+h) - y(t) $$

$$ \Rightarrow y(t+h) \approx y(t) + h \cdot y'(t) $$

Does it look familiar? Exactly, a first-order Taylor expansion, where:

$x = t + h$
$a = t$

So we calculate the derivative at time $t$ and use it to estimate the function at $t+h$. Now let’s go further and assume we start at time $t_0$, which we know because we’re solving a Cauchy problem. We take a tiny step $h$ forward and land at $t_1$. Thus:

$$ y(t_1) = y(t_0 + h) \approx y(t_0) + h \cdot y'(t_0) $$

where

$$ t_1 = t_0 + h $$

Once we have $t_1$, we can move to $t_2$ in the same way:

$$ y(t_2) = y(t_1 + h) \approx y(t_1) + h \cdot y'(t_1) $$

where

$$ t_2 = t_1 + h = t_0 + 2h $$

Get the idea? We can generalize it like this:

$$y(t_{n+1}) = y(t_n + h) \approx y(t_n) + h \cdot y'(t_n)$$

Formula 1. First-Order ODE Expansion

Thus:

$$t_{n+1} = t_n + h = t_0 + n \cdot h$$

Formula 2. Time Discretization

Where:

$t_0$ is the initial time
$t_{n+1}$ is the final time
$h$ is the integration step size, freely chosen
$n$ is the number of iterations

In short, we’re discretizing time. The smaller $h$ is, the more accurate the method, but also more computationally expensive.

Going back to the Cauchy problem:

$$ y'(t) = f(y(t), t) $$

we can plug this into Euler’s method and get the final formulation:

$$y(t_{n+1}) \approx y(t_n) + h \cdot f(y(t_n), t_n)$$

Formula 3. Explicit Euler Method

That’s the formula for the Explicit Euler Method. Why “explicit”? Because we use only known values. Suppose we want to compute $y(t_1)$:

$y(t_0)$ is given by the Cauchy problem
$f(y(t_0), t_0)$ can be calculated by simple substitution

And of course, if there’s an explicit method, there’s also an implicit one, whose formula (without proof) is:

$$y(t_{n+1}) \approx y(t_n) + h \cdot f(y(t_{n+1}), t_{n+1})$$

Formula 4. Implicit Euler Method

It’s called implicit because, as you can see, the value we want to compute, $y(t_{n+1})$, appears on both sides of the equation. It’s not always easy to solve. In some cases, iterative methods like the successive substitution method are needed. But that’s beyond the scope of this article, and since we’ve already made life complicated enough, I’ll leave it to your curiosity to dig deeper.

Euler Example

All of this has been a bit too theoretical so far, I know, so let’s look at something practical with an exercise. Let’s consider the following Cauchy problem representing exponential decay. It may seem pointless, but many natural phenomena follow this law, such as radioactive decay, capacitor discharge in RC circuits, etc.

$$ \begin{cases} y' = -ky \\ y(t_0) = 1 \\ k = 1 \end{cases} $$

where $k$ is the decay rate, and in this case, we choose it ourselves. The solution to this ODE is:

$$ y(t) = e^{-1 \cdot t} $$

If you want the steps, do them yourself. You already know the tools to solve it. I chose this ODE (which can be solved analytically, as you've seen) to show the difference between an exact and a numerical solution. Let’s see the values at $t=1$ and $t=2$.

$$ \begin{array}{|c|c|c|} \hline t & y(t) & \text{Result} \\ \hline 0 & e^0 & 1 \\ \hline 1 & e^{-1 \cdot 1} & 0.367 \\ \hline 2 & e^{-1 \cdot 2} & 0.135 \\ \hline \end{array} $$

Now let’s solve it using Euler’s method. From the time discretization formula, we already know how many steps we need using $h=0.1$, initial time $t_0=0$, and final time $t_n=2$:

$$ t_{n} - t_0 = n \cdot h \Rightarrow 2 - 0 = n \cdot h \Rightarrow n = 20 $$

For clarity, let’s go step by step. The value at $t_0$ is given in the Cauchy problem, so:

Step $n=1$
$$ t_1 = t_0 + 1 \cdot 0.1 = 0.1 $$

$$ y(t_1) \approx y(t_0) + h \cdot f(y(t_0), t_0) $$

$$ y(0.1) \approx y(0) + 0.1 \cdot f(y(0), 0) $$

$$ y(0.1) \approx 1 + 0.1 \cdot (-1 \cdot 1) = 0.9 $$

Step $n=2$

$$ t_2 = t_0 + 2 \cdot 0.1 = 0.2 $$

$$ y(t_2) \approx y(t_1) + h \cdot f(y(t_1), t_1) $$

$$ y(0.2) \approx y(0.1) + 0.1 \cdot f(y(0.1), 0.1) $$

$$ y(0.2) \approx 0.9 + 0.1 \cdot (-1 \cdot 0.9) = 0.81 $$

And so on. Trust me when I say the results are those shown in this table:

$$ \begin{array}{|c|c|c|} \hline n & t_n & \text{Result} \\ \hline 0 & 0 & 1 \\ \hline 1 & 0.1 & 0.9 \\ \hline 2 & 0.2 & 0.810 \\ \hline \dots & \dots & \dots \\ \hline 10 & 1 & 0.349 \\ \hline \dots & \dots & \dots \\ \hline 20 & 2 & 0.122 \\ \hline \end{array} $$

Now let’s see what happens with $h=0.5$. I won’t go step by step because you know how it’s done now, so just take this table:

$$ \begin{array}{|c|c|c|} \hline n & t_n & \text{Result} \\ \hline 0 & 0 & 1 \\ \hline 1 & 0.5 & 0.5 \\ \hline 2 & 1 & 0.250 \\ \hline 3 & 1.5 & 0.125 \\ \hline 4 & 2 & 0.062 \\ \hline \end{array} $$

Here’s a summary comparison between the analytical solution and the two numerical approximations:

$y(t)$ is the analytical solution
$E(h=0.1)$ is Euler with $h=0.1$
$E(h=0.5)$ is Euler with $h=0.5$

$$ \begin{array}{|c|c|c|c|} \hline t & y(t) & E(h=0.1) & E(h=0.5) \\ \hline 0 & 1 & 1 & 1 \\ \hline 1 & 0.367 & 0.349 & 0.25 \\ \hline 2 & 0.135 & 0.122 & 0.062 \\ \hline \end{array} $$

As you can see, apart from $n=0$, the known solution, $h=0.1$ gives a much better approximation than $h=0.5$, but it also requires 20 steps compared to just 4 in the latter case.

All of this can be summarized by the following image:

Figure 1. Results Comparison

The exact solution is the red one, and as $h$ decreases, the approximation gets better and better. Alright, time to move on to the next round where the Runge-Kutta methods will knock you out.

The Mans... RK Family

Let’s give credit where credit is due: Euler’s method is the patriarch of all numerical methods. But is it really that good? Well, as always... it depends. For instance, on the type of problem you’re dealing with. For problems that are too complex, it can fail miserably. So yes, it’s good and dear, but let’s be honest, it’s getting old. We need fresh blood, brave and strong youngsters. That brings us to the Runge-Kutta methods, or simply RK.

RK methods are like a sprawling dynasty, and each has its own quirks. The only certainty, besides death, is that they keep Euler’s core idea: proceed in small steps.

Let’s go back to the Taylor expansion. As you’ll remember, the higher the order of expansion, the better the approximation. RK methods are built exactly on higher-order Taylor expansions. So Euler can be viewed as a method of order 1, i.e., RK1, or first-order Runge-Kutta. Now let’s see what happens when we go beyond first order and move to the second, starting from the First-Order Formula:

$$y(t_{n+1}) = y(t_n + h) \approx y(t_n) + h \cdot y'(t_n) + \frac{h^2}{2} \cdot y''(t_n)$$

Given that

$$ y'(t_n) = f(y(t_n), t_n) $$

$$ \Rightarrow y''(t_n) = f'(y(t_n), t_n) $$

$$\Rightarrow y(t_{n+1}) \approx y(t_n) + h \cdot f(y(t_n), t_n) + \frac{h^2}{2} \cdot f'(y(t_n), t_n)$$

Formula 5. Second-Order ODE Expansion

We can then use the definition of the first derivative and expand $f'$ using a finite difference:

$$ f'(y(t_n), t_n) \approx \frac{f(y(t_n + h), t_n+h) - f(y(t_n), t_n)}{h} = \frac{f(y(t_{n+1}), t_{n+1}) - f(y(t_n), t_n)}{h} $$

But from Euler’s method we know that

$$ y(t_{n+1}) = y_{n+1} = y_n + h \cdot f(y_n, t_n) $$

Substituting in:

$$ f'(y_n, t_n) \approx \frac{f(y_n + h \cdot f(y_n, t_n), t_{n+1}) - f(y_n, t_n)}{h} $$

Substituting again into the Second-Order Expansion, we get:

$$ y_{n+1} \approx y_n + h \cdot f(y_n, t_n) + \frac{h}{2} \cdot \left [ f(y_n + h \cdot f(y_n, t_n), t_{n+1}) - f(y_n, t_n) \right] $$

$$ \Rightarrow y_{n+1} \approx y_n + \frac{h}{2} \cdot \left [ f(y_n + h \cdot f(y_n, t_n), t_{n+1}) + f(y_n, t_n) \right] $$

So let’s define:

$$ k_1 = f(y_n, t_n) \\ k_2 = f(y_n + h \cdot f(y_n, t_n), t_{n+1}) = f(y_n + h \cdot k_1, t_{n+1}) $$

and we obtain the so-called RK2 Method, or Heun’s Method:

$$ \begin{cases} k_1 = f(y_n, t_n) \\ k_2 = f(y_n + h k_1, t_n + h) \\ y_{n+1} \approx y_n + \frac{h}{2} (k_1 + k_2) \end{cases} $$

Formula 6. Heun’s Method

Following the same logic, you can derive higher-order methods by expanding Taylor’s series further:

RK3 or Shu-Osher
RK4
RK5 or Dormand-Prince
... and many more.

Given its importance, we must include the RK4 formulation, it’s an excellent balance between precision and computational complexity:

$$ \begin{cases} k_1 = f(y_n, t_n) \\ k_2 = f(y_n + \frac{h}{2} k_1, t_n + \frac{h}{2}) \\ k_3 = f(y_n + \frac{h}{2} k_2, t_n + \frac{h}{2}) \\ k_4 = f(y_n + h k_3, t_n + h) \\ y_{n+1} \approx y_n + \frac{h}{6} (k_1 + 2k_2 + 2k_3 + k_4) \end{cases} $$

Formula 7. RK4 Method

Note that, just because I care, we’ve only considered explicit methods here. But sticking our heads in the sand doesn’t mean that implicit RK methods (yes, they exist) can’t hurt you.

RK Example

If you’ve made it this far, you’ve probably heard about Lotka-Volterra a bazillion times. And now that we can, why not solve it?! So let’s take the following setup:

$$ \begin{cases} \frac{dx}{dt} = \alpha x - \beta x y \\ \frac{dy}{dt} = \delta x y - \gamma y \\ \alpha = 1.1 \\ \beta = 0.4 \\ \delta = 0.1 \\ \gamma = 0.4 \\ x_0 = 10 \\ y_0 = 5 \end{cases} $$

Obviously, after all the rambling above, we’ll use numerical methods with:

$$ h = 0.1 \\ t \in [0, 2] $$

Specifically, we’ll apply both Heun and RK4. Don’t worry, this is just to show you how they work, so we’ll only carry out one step.

Heun, $n=1$

Remember, Lotka-Volterra is a system of two ODEs, so it gets a little more involved, but nothing crazy. Don’t panic. Let's start by calculating $k_1$ and $k_2$. Since we have two ODEs, we’ll compute:

$k_1^x, k_2^x$ for the first ODE
$k_1^y, k_2^y$ for the second ODE

$$ k_1^x = \alpha x_0 - \beta x_0 y_0 \\ k_1^y = \delta x_0 y_0 - \gamma y_0 $$

$$ k_2^x = \alpha (x_0 + h k_1^x) - \beta (x_0 + h k_1^x) \cdot (y_0 + h k_1^y) \\ k_2^y = \delta (x_0 + h k_1^x) \cdot (y_0 + h k_1^y) - \gamma (y_0 + h k_1^y) $$

$$ x_1 = x_0 + \frac{h}{2} (k_1^x + k_2^x) \\ y_1 = y_0 + \frac{h}{2} (k_1^y + k_2^y) $$

Substituting:

$$ k_1^x = 1.1 \cdot 10 - 0.4 \cdot 10 \cdot 5 = 11 - 20 = -9 \\ k_1^y = 0.1 \cdot 10 \cdot 5 - 0.4 \cdot 5 = 5 - 2 = 3 $$

$$ k_2^x = 1.1 \cdot (10 - 0.1 \cdot 9) - 0.4 \cdot (10 - 0.1 \cdot 9) \cdot (5 + 0.1 \cdot 3) \approx 10.01 - 19.292 = -9.282 \\ k_2^y = 0.1 \cdot (10 - 0.1 \cdot 9) \cdot (5 + 0.1 \cdot 3) - 0.4 \cdot (5 + 0.1 \cdot 3) \approx 4.823 - 2.12 = 2.703 $$

$$ x_1 = 10 + 0.05 \cdot (-9 - 9.282) = 9.086 \\ y_1 = 5 + 0.05 \cdot (3 + 2.703) = 5.285 $$

The sharpest among you may wonder why $t+h$, required for computing $k_2$, doesn’t appear anywhere. Simple: all the ODEs in the system don’t explicitly depend on time $t$, so it’s irrelevant for the computation. But if it did matter, the current time would be calculated using the Time Discretization Formula shown in the previous section.

RK4, $n=1$

Using the same idea, we calculate:

$k_1^x, k_2^x, k_3^x, k_4^x$ for the first ODE
$k_1^y, k_2^y, k_3^y, k_4^y$ for the second ODE

$$ k_1^x = \alpha x_0 - \beta x_0 y_0 \\ k_1^y = \delta x_0 y_0 - \gamma y_0 $$

$$ k_2^x = \alpha \cdot (x_0 + \frac{h}{2} \cdot k_1^x) - \beta \cdot (x_0 + \frac{h}{2} \cdot k_1^x) \cdot (y_0 + \frac{h}{2} \cdot k_1^y) \\ k_2^y = \delta \cdot (x_0 + \frac{h}{2} \cdot k_1^x) \cdot (y_0 + \frac{h}{2} \cdot k_1^y) - \gamma \cdot (y_0 + \frac{h}{2} \cdot k_1^y) $$

$$ k_3^x = \alpha \cdot (x_0 + \frac{h}{2} \cdot k_2^x) - \beta \cdot (x_0 + \frac{h}{2} \cdot k_2^x) \cdot (y_0 + \frac{h}{2} \cdot k_2^y) \\ k_3^y = \delta \cdot (x_0 + \frac{h}{2} \cdot k_2^x) \cdot (y_0 + \frac{h}{2} \cdot k_2^y) - \gamma \cdot (y_0 + \frac{h}{2} \cdot k_2^y) $$

$$ k_4^x = \alpha \cdot (x_0 + h \cdot k_3^x) - \beta \cdot (x_0 + h \cdot k_3^x) \cdot (y_0 + h \cdot k_3^y) \\ k_4^y = \delta \cdot (x_0 + h \cdot k_3^x) \cdot (y_0 + h \cdot k_3^y) - \gamma \cdot (y_0 + h \cdot k_3^y) $$

$$ x_1 = x_0 + \frac{h}{6} (k_1^x + 2k_2^x + 2k_3^x + k_4^x) \\ y_1 = y_0 + \frac{h}{6} (k_1^y + 2k_2^y + 2k_3^y + k_4^y) $$

Substituting:

$$ k_1^x = 1.1 \cdot 10 - 0.4 \cdot 10 \cdot 5 = 11 - 20 = -9 \\ k_1^y = 0.1 \cdot 10 \cdot 5 - 0.4 \cdot 5 = 5 - 2 = 3 $$

$$ k_2^x = 1.1 \cdot (10 - 0.05 \cdot 9) - 0.4 \cdot (10 - 0.05 \cdot 9) \cdot (5 + 0.05 \cdot 3) = 10.5 - 19.64 = -9.14 \\ k_2^y = 0.1 \cdot (10 - 0.05 \cdot 9) \cdot (5 + 0.05 \cdot 3) - 0.4 \cdot (5 + 0.05 \cdot 3) = 4.91 - 2.06 = 2.85 $$

$$ k_3^x = 1.1 \cdot (10 - 0.05 \cdot 9.14) - 0.4 \cdot (10 - 0.05 \cdot 9.14) \cdot (5 + 0.05 \cdot 2.85) = -9.13 \\ k_3^y = 0.1 \cdot (10 - 0.05 \cdot 9.14) \cdot (5 + 0.05 \cdot 2.85) - 0.4 \cdot (5 + 0.05 \cdot 2.85) = 2.84 $$

$$ k_4^x = 1.1 \cdot (10 - 0.1 \cdot 9.13) - 0.4 \cdot (10 - 0.1 \cdot 9.13) \cdot (5 + 0.1 \cdot 2.84) = -9.21 \\ k_4^y = 0.1 \cdot (10 - 0.1 \cdot 9.13) \cdot (5 + 0.1 \cdot 2.84) - 0.4 \cdot (5 + 0.1 \cdot 2.84) = 2.68 $$

$$ x_1 = 10 + \frac{0.1}{6} \cdot (-9 + 2 \cdot (-9.14) + 2 \cdot (-9.13) + (-9.21)) = 9.085 \\ y_1 = 5 + \frac{0.1}{6} \cdot (3 + 2 \cdot (2.85) + 2 \cdot (2.84) + 2.68) = 5.285 $$

Yeah, I know, it's a lot and seems overly complicated, but once you have the method’s formula, it’s just about making the right substitutions carefully.

Just like with Euler’s method, let’s provide a comparison table here. This time, we’ll group by method rather than $h$, and compare the results of RK4, Heun, and Euler. As we cannot compute the analytical solution here, all we have are approximations.

$$ \begin{array}{|c|c|c|c|c|} \hline n & t_n & \text{Euler} & \text{Heun} & \text{RK4} \\ \hline 0 & 0 & x=10, y=5 & x=10, y=5 & x=10, y=5 \\ \hline 1 & 0.1 & x=9.1, y=5.3 & x=9.08, y=5.28 & x=9.08, y=5.28 \\ \hline 2 & 0.2 & x=8.17, y=5.57 & x=8.16, y=5.53 & x=8.16, y=5.53 \\ \hline \dots & \dots & \dots & \dots & \dots \\ \hline 10 & 1 & x=2.61, y=6.19 & x=2.88, y=6.06 & x=2.88, y=6.06 \\ \hline \dots & \dots & \dots & \dots & \dots \\ \hline 20 & 2 & x=0.83, y=4.97 & x=0.96, y=4.8 & x=0.95, y=4.81 \\ \hline \end{array} $$

As seen from the table, Heun and RK4 give very similar results, with differences appearing only after $t=2$ seconds. In contrast, Euler starts to deviate poorly as early as $t=0.1$. The following image shows the phase behavior:

Figure 2. Results Comparison at $t=2$

Finally, let’s see what happens when we extend the time from $t=2$ to $t=15$.

Figure 3. Results Comparison at $t=15$

As you can see, Heun and RK4 have very similar, theoretically consistent behavior, forming a closed orbit (read previous posts if you don’t remember what that means). Meanwhile, Euler accumulates error so heavily that its solution goes completely off track.

To be thorough, here’s the time series of prey and predator populations, i.e., $x$ and $y$, over time $t$:

Comparison Figure 4. Behavior of $x$ and $y$ over time $t=15$

To Wrap Up

Here we are at the end of this third article. By now, you should know (or at least remember, if you already had the knowledge) what derivatives, integrals, differential equations, and numerical methods are. There’s so much more to say, but I tried to keep it concise. No need to thank me. In any case, we now have all the ingredients to talk about Neural ODEs. And in case you were wondering, yes, there will be plenty of math there too. As always, I’ll end with the link to the repo where you can see the code behind everything we’ve discussed.

Until next time.

I ODE, You ODE and the Neural ODE - 3

ODE in Small Steps

Euler's Method

Euler Example

The Mans... RK Family

RK Example

To Wrap Up

References

Published

Category

Tags

Contacts

ODE in Small Steps

Euler's Method

Euler Example

The Mans... RK Family

RK Example

To Wrap Up

Related Posts

References

Published

Category

Tags

Contacts