Mean-Reverting Statistical Arbitrage Strategies in Crude Oil Markets

Fanelli, Viviana

doi:10.3390/risks12070106

Department of Economics, Management and Business Law, University of Bari, Largo Abbazia Santa Scolastica 53, 70124 Bari, Italy

Risks 2024, 12(7), 106; https://doi.org/10.3390/risks12070106

Submission received: 4 March 2024 / Revised: 4 June 2024 / Accepted: 12 June 2024 / Published: 25 June 2024

(This article belongs to the Special Issue Portfolio Theory, Financial Risk Analysis and Applications)

Abstract

In this paper, we introduce the concept of statistical arbitrage through the definition of a mean-reverting trading strategy that captures persistent anomalies in long-run relationships among assets. We model statistical arbitrage in three steps: (1) identifying mispricings in the chosen market, (2) testing for mean-reverting statistical arbitrage, and (3) developing statistical arbitrage trading strategies. We empirically investigate the existence of statistical arbitrage opportunities in crude oil markets. In particular, we focus on long-term pricing relationships between West Texas Intermediate crude oil futures and a so-called statistical portfolio composed of two other crude oils, Brent and Dubai. Firstly, a cointegration regression is used to track the persistent pricing equilibrium between the West Texas Intermediate crude oil price and the statistical portfolio value, and to identify mispricings between the two. Secondly, we verify that mispricing dynamics revert to equilibrium in a predictable way, and we exploit this stylized fact by applying trading rules commonly used in equity markets to the crude oil market. Trading performance is then measured by three specific profit indicators on out-of-sample data.

Keywords: statistical arbitrage; trading strategy; commodity markets

1. Introduction

For years, both academics and professionals have sought to identify and exploit arbitrage opportunities that arise in financial markets. To this end, they have developed ever more sophisticated trading strategies. The Morgan Stanley trading group led by the quant Nunzio Tartaglia was the first to develop an automated trading strategy in the 1980s. This strategy was called ‘Pairs Trading’. It consists of observing short-term mispricings in two similar securities and deriving trading signals from a graphical analysis of trends and their reversion. In Pairs Trading, the assets are selected on the basis of intuition, economic fundamentals, long-term correlations, or simply past experience. By the end of the last millennium, the growing demand for models that could properly capture arbitrage opportunities had led to the development of so-called statistical arbitrage strategies. Roughly speaking, a statistical arbitrage is a trading strategy that generates profit in the long run by exploiting pricing inefficiencies identified through mathematical and statistical models.

In this paper, we focus on the specific area of mean-reverting statistical arbitrage. This means that we consider an arbitrage portfolio strategy which is associated with a mean-reverting process. Our purpose is twofold. Firstly, we introduce the concept of statistical arbitrage over a finite time horizon through the definition of a trading strategy that captures persistent anomalies in long-run relationships among asset prices. Secondly, we apply our definitions to a portfolio composed of three related crude oils, namely the West Texas Intermediate, Brent, and Dubai oils. Indeed, we empirically investigate the statistical arbitrage opportunities that arise when this portfolio is traded according to basic trading rules on the time window 2005–2017.

We have chosen to analyse the crude oil markets because their dynamics play a central role in the worldwide economy, since oil price movements substantially affect most macroeconomic activity, especially after the 1970s crises (see Barsky and Kilian (2004), Kilian (2009)).

Although in the literature there exist papers that deal with the empirical investigation of statistical arbitrage in several financial markets, to the best of our knowledge and to date, there is no published paper that focuses on modelling statistical arbitrage strategies in crude oil markets.

This paper proceeds as follows. In Section 2, we review the literature. In Section 3.1, we define the mean-reverting statistical arbitrage strategy. Section 3.2 deals with the procedure used to implement the statistical arbitrage strategy by using market data. Section 4 discusses the empirical application on crude oil markets. In particular, Section 4.1.1 contains the description of the dataset used for the two case studies: weekly trading and daily trading. We first implement the statistical arbitrage strategy by using weekly data. In Section 4.1.2, we carry out statistical analyses of time series in order to build statistical arbitrage strategies and, in Section 4.1.3, we verify that strategy dynamics are mean-reverting. In Section 4.1.4, we outline the trading rules used, and we discuss their performance in Section 4.1.5. In Section 4.1.6, we test the statistical arbitrage strategies through an out-of-sample analysis. In Section 4.2, we extend our analyses to daily data. We discuss the use of the mean-reverting statistical arbitrage strategy in comparison with other trading strategies in crude oil markets in Section 5, and Section 6 concludes.

2. Literature Review

The term statistical arbitrage appeared for the first time in the 1990s and it has remained widely employed by traders ever since. The crisis of 2000 affected market dynamics, and many mathematical models built by using statistical arbitrage methods failed to generate good performance. According to Pole (2007), in 2006, researchers developed new and advanced algorithms that guaranteed more accurate and profitable statistical arbitrage trading strategies.

Burgess (1999) defined statistical arbitrage as a generalization of traditional zero-risk or pure arbitrage. In the latter case, a trader constructs fair-price relationships between two assets that have identical cash flows, and any deviation from that relationship constitutes a pure arbitrage opportunity. According to Burgess (1999), zero-risk opportunities cannot exist in the market because of several uncertainty factors related, for example, to future dividend rates or to market price volatility during the short trading period. By contrast, Burgess (1999) stated that so-called statistical arbitrage opportunities originate from security mispricings whose dynamics fluctuate around an equilibrium level. Thus, through suitable strategies, he exploited small but consistent similarities in asset price dynamics for profit.

Bondarenko (2003) distinguished between the pure arbitrage opportunity and the statistical arbitrage opportunity. The first one is defined as a zero-cost trading strategy that prevents any possibility of losses. Instead, the statistical arbitrage opportunity has been described as a zero-cost trading strategy that guarantees a positive expected payoff and a non-negative conditional expected payoff in each state of the economy. This means that the strategy value can be negative in some elementary states, as long as the average payoff in each final state is non-negative.

Hogan et al. (2004) and Jarrow et al. (2005) described statistical arbitrage as a trading opportunity which generates profits in the long run without taking risks. In their opinion, it is a natural development of the trading strategies utilized to analyse empirical market anomalies in the existing literature. Indeed, Jensen (1978) argued that arbitrage opportunities are not compatible with an efficient market, but tests of market efficiency should refer to an equilibrium model. On the contrary, statistical arbitrage existence rejects market efficiency without appealing to the joint hypothesis of an equilibrium model.

Among others, statistical arbitrage approaches based on quantitative methods have been proposed by many authors, like Burgess (1999), Vidyamurthy (2004), Elliot et al. (2005), Do et al. (2006), Bertram (2009), Cummins and Bucca (2012), Lin and Tan (2023), Vergara and Werner (2024), and Horikawa and Nakagawa (2024). Their studies have aimed at identifying arbitrage opportunities in stock markets.

For commodity markets, we refer to studies of crude oil price dynamics such as Kristoufek and Vosvrda (2014), who analysed long-run dependence phenomena in prices empirically. More recent contributions include Cerqueti et al. (2019), Nakajima (2019), and Cerqueti and Fanelli (2021), who investigated long-run equilibria and statistical arbitrage in commodity markets, as well as He et al. (2023), Poutre et al. (2023), and Zhang et al. (2024).

3. Research Methods

3.1. Statistical Arbitrage Modelling

Let $(Ω, F, {(F_{t})}_{t \geq 0}, P)$ be the filtered probability space, where $\bar{T}$ is a finite time horizon. $F = F_{\bar{T}}$ is the $σ$-algebra at time $\bar{T}$. All statements and definitions are understood to be valid until $\bar{T}$. Furthermore, we assume there are a finite number of trading dates, indexed by $t = 0, 1, \dots, \bar{T}$. Trading strategies are at the basis of the notion of statistical arbitrage. They are formulated using only available information, such as rules based on historical data, company size, earning announcements, market versus book values, sales growth, or macroeconomic conditions.

We consider N assets¹ whose prices at time t are the row vector $[v_{t}^{1}, \dots, v_{t}^{N}]$ and an additional asset whose value is $Z_{t}$. We define a portfolio of $N + 1$ assets as an $(N + 1)$-dimensional row vector $[h^{0}, h^{1}, \dots, h^{N}]$, where each $h^{i}$, $i = 0, \dots, N$, represents the weight of the i-th constituent asset in the portfolio that is bought at time $t = 0$ and held until time T. In particular, $h^{0}$ is the weight of the additional asset $Z_{t}$. The coefficients $h^{i}$, $i = 0, \dots, N$, can be either positive or negative, so that we may have, respectively, long or short positions in the assets. The value process of the $(N + 1)$-asset portfolio is ${(X_{t})}_{t \geq 0}$, defined as

$$X_{t} = h^{0} Z_{t} + \sum_{i = 1}^{N} h^{i} v_{t}^{i}, \quad t \geq 0. \tag{1}$$

Definition 1.

The portfolio process ${(X_{t})}_{t \geq 0}$ generates statistical arbitrage if there exists a time T such that the following conditions are satisfied:

1. $X_{0} = 0$,
2. $E [X_{T} | F_{0}] \geq 0$,
3. The variance $\mathrm{Var} [X_{T} | F_{t}]$ decreases monotonically through time²,

where $E [\cdot | F_{0}]$ is the expected value under the objective probability measure $P$ and $\mathrm{Var} [\cdot | F_{t}]$ is the variance, conditional on the information available at time t.

Then, the portfolio ${(X_{t})}_{t \geq 0}$ is called the statistical arbitrage strategy³.

Remark 1.

The statistical arbitrage strategy of Definition 1 satisfies three conditions: 1. it is a zero-initial-cost strategy; 2. the expected payoff at the trading day T, as seen at time 0, is non-negative; and 3. the strategy reduces its variance over time by adjusting the magnitude of its long and short positions. Condition 3 is essential to generate statistical arbitrage⁴.

Remark 2.

In the literature, statistical arbitrage is contrasted with pure arbitrage (see, for example, Bondarenko (2003)). Given the portfolio with process ${(X_{t})}_{t \geq 0}$, pure arbitrage has $X_{0} = 0$, and there exists a time T such that $X_{T} \geq 0$ with probability 1 and $X_{T} > 0$ with positive probability. This means that a pure arbitrage portfolio is basically a deterministic money-making machine that exploits mispricings on the market. In statistical arbitrage, by contrast, the mispricings on the market are defined relative to the expected value of the assets; that is, the price relationships hold only in expectation, in the long run. Cerqueti et al. (2019) and Cerqueti and Fanelli (2021) develop and apply some methodologies for analysing long-run equilibrium among commodities.

Let ${(Z_{t})}_{t \geq 0}$ be the price process of a particular asset, called a target asset. We consider the portfolio vector $h = [h^{1}, \dots, h^{N}]$, which consists of the N assets with price vector $v_{t} = [v_{t}^{1}, \dots, v_{t}^{N}]$ and is such that the portfolio value is

$$V_{t} = h v_{t} = \sum_{i = 1}^{N} h^{i} v_{t}^{i}, \quad t \geq 0. \tag{2}$$

Definition 2.

The portfolio $h$ is a statistical portfolio for the target asset ${(Z_{t})}_{t \geq 0}$ if the following fair-price relationship holds:

$$E [Z_{t} | F_{s}] = E [V_{t} | F_{s}], \quad 0 \leq s \leq t, \tag{3}$$

where $E [\cdot | F_{s}]$ is the expected value under the objective probability measure $P$ conditional on the information available at time s, $F_{s}$, and $V_{t}$ is the portfolio value in Equation (2).

Equation (3) gives a long-run equilibrium relationship between the target asset and the portfolio $h$.

The definition of the mispricing portfolio at a generic time t follows.

Definition 3.

The mispricing portfolio is a trading strategy, ${(M_{t})}_{t \geq 0}$, which has mean-reverting dynamics described by the following equation:

$$d M_{t} = α (Θ - M_{t}) \, d t + σ \, d W_{t}, \quad M_{0} = 0, \tag{4}$$

where $α > 0$ is the speed of mean reversion, $Θ > 0$ is the long-run mean, $σ$ is the return volatility, and ${(W_{t})}_{t \geq 0}$ is a Brownian motion.

Definition 3 implies that $M_{t}$ is normally distributed and that the conditional mean and variance between any two instants s and t, $0 \leq s < t$, given $M_{s}$, are

$$E [M_{t} | F_{s}] = Θ + (M_{s} - Θ) e^{- α (t - s)}, \quad \mathrm{Var} [M_{t} | F_{s}] = \frac{σ^{2}}{2 α} (1 - e^{- 2 α (t - s)}), \tag{5}$$

where we again consider the conditional expected value and variance.
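The closed-form moments in Equation (5) can be checked numerically. The following is a minimal sketch, not part of the original study: the function names and the parameter values are illustrative assumptions, and the simulation chains the exact Gaussian transition implied by Equation (5) rather than discretising Equation (4).

```python
import math
import random


def ou_conditional_moments(m_s, alpha, theta, sigma, dt):
    """Conditional mean and variance of M_t given M_s, with dt = t - s (Equation (5))."""
    decay = math.exp(-alpha * dt)
    mean = theta + (m_s - theta) * decay
    var = sigma ** 2 / (2.0 * alpha) * (1.0 - decay ** 2)
    return mean, var


def simulate_ou_terminal(m0, alpha, theta, sigma, horizon, n_steps, rng):
    """Draw M at time `horizon` by chaining exact Gaussian transitions."""
    dt = horizon / n_steps
    m = m0
    for _ in range(n_steps):
        mean, var = ou_conditional_moments(m, alpha, theta, sigma, dt)
        m = rng.gauss(mean, math.sqrt(var))
    return m


# Illustrative parameter values (assumptions, not estimates from the paper).
alpha, theta, sigma, horizon = 2.0, 1.0, 0.5, 1.0
rng = random.Random(0)

samples = [
    simulate_ou_terminal(0.0, alpha, theta, sigma, horizon, 10, rng)
    for _ in range(20000)
]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)

# Closed-form values from Equation (5) with s = 0 and M_0 = 0.
exact_mean, exact_var = ou_conditional_moments(0.0, alpha, theta, sigma, horizon)

print("mean:", sample_mean, "vs", exact_mean)
print("variance:", sample_var, "vs", exact_var)
```

The sample mean and variance should lie close to the closed-form values, up to Monte Carlo error.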

In the following proposition, we link the concept of the mispricing portfolio to statistical arbitrage.

Proposition 1.

The mispricing portfolio ${(M_{t})}_{t \geq 0}$ whose dynamics are given by Formula (4) is a statistical arbitrage strategy. Indeed, all conditions in Definition 1 are fulfilled:

1. $M_{0} = 0$,
2. $E [M_{T} | F_{0}] = Θ (1 - e^{- α T}) > 0$,
3. $\frac{\partial}{\partial t} \mathrm{Var} [M_{T} | F_{t}] = - σ^{2} e^{- 2 α (T - t)} \leq 0$.
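As a worked check, conditions 2 and 3 can be obtained from Equation (5): the mean by setting $s = 0$, $t = T$ and $M_{0} = 0$, and the variance by setting $s = t$ and taking T as the terminal date. The steps below are a sketch of that calculation and assume $T > 0$.

```latex
\begin{align*}
E[M_{T} \mid F_{0}]
  &= \Theta + (M_{0} - \Theta)\, e^{-\alpha T}
   = \Theta \left(1 - e^{-\alpha T}\right) > 0
   \quad \text{since } M_{0} = 0,\ \Theta > 0,\ \alpha > 0,\ T > 0, \\
\mathrm{Var}[M_{T} \mid F_{t}]
  &= \frac{\sigma^{2}}{2\alpha} \left(1 - e^{-2\alpha (T - t)}\right), \\
\frac{\partial}{\partial t} \mathrm{Var}[M_{T} \mid F_{t}]
  &= \frac{\sigma^{2}}{2\alpha} \left(-2\alpha\, e^{-2\alpha (T - t)}\right)
   = -\sigma^{2} e^{-2\alpha (T - t)} \leq 0 .
\end{align*}
```

The conditional variance therefore shrinks as t approaches T, which is the monotonic decrease required by condition 3 of Definition 1.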

Proposition 2.

The portfolio given by a long position on the target asset and a short position on the statistical portfolio and whose process follows the mean-reverting dynamics (4) is a mispricing portfolio ${(M_{t})}_{t \geq 0}$.

Therefore, given the target asset ${(Z_{t})}_{t \geq 0}$ and the statistical portfolio ${(V_{t})}_{t \geq 0}$, consisting of N assets and obtained according to Formula (3), the mispricing portfolio ${(M_{t})}_{t \geq 0}$ is an $(N + 1)$-dimensional vector $\hat{h} = [1, - h^{1}, \dots, - h^{N}]$, such that

$$M_{t} = Z_{t} - V_{t} = Z_{t} - \sum_{i = 1}^{N} h^{i} v_{t}^{i}, \quad t \geq 0. \tag{6}$$

Remark 3.

Formula (6) is equivalent to Formula (1), where $h^{0} = 1$ and $h^{i}$, $i = 1, \dots, N$, are negative.

An example of statistical arbitrage occurs when a commodity intermarket spread is traded. Intermarket spreads involve the simultaneous purchase and sale of different but related commodities that have a reasonably stable relationship to each other. Opportunities for intermarket spreads occur when the commodities being traded are substitutes for each other or when some other relationship causes their prices to be correlated. For example, random disturbances in supply and demand in cash and futures markets can cause futures prices to diverge and give rise to intermarket spread opportunities. A classical example of an intermarket spread in commodity futures markets is the crack spread. The crack spread is the difference between the futures price of crude oil and an appropriate combination of the futures prices of two petroleum products, namely heating oil and gasoline. The portfolio consisting of a long position in three crude oil futures contracts and short positions in two gasoline futures contracts and one heating oil futures contract is a statistical arbitrage portfolio with the following time t value:

$$X_{t} = 3 Z_{t} - 2 v_{t}^{1} - v_{t}^{2}, \quad t \geq 0, \tag{7}$$

where $X_{t}$ is the portfolio value, and $Z_{t}$, $v_{t}^{1}$, and $v_{t}^{2}$ are, respectively, the futures prices of crude oil, gasoline, and heating oil. The dynamics of ${(X_{t})}_{t \geq 0}$ are shown in Figure 1.
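As a small arithmetic illustration of Equation (7), the sketch below evaluates the 3:2:1 portfolio for one set of prices. The function name and the prices are hypothetical, and all three prices are assumed to be quoted in the same unit.

```python
def crack_spread_value(crude, gasoline, heating_oil):
    """Value of the portfolio in Equation (7): long 3 crude, short 2 gasoline, short 1 heating oil."""
    return 3.0 * crude - 2.0 * gasoline - 1.0 * heating_oil


# Hypothetical futures prices, all expressed in USD per barrel (an assumption).
x_t = crack_spread_value(crude=70.0, gasoline=85.0, heating_oil=90.0)
print(x_t)  # 3 * 70 - 2 * 85 - 90 = -50.0
```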

A crack spread position (buy crude oil and sell gasoline and heating oil) would be assumed when refined product prices are high relative to crude oil prices and are expected to fall. Because refineries purchase crude oil and sell refined products in relatively fixed proportions, the prices of crude oil, heating oil, and gasoline tend to move in a parallel fashion. When prices of refined products rise substantially above crude prices, there exists an incentive to purchase crude oil and sell refined products. This would cause the price spread between crude and refined products to narrow. When prices of refined products fall relative to the price of crude oil, the incentive is to purchase less crude oil and run the refinery at less than the full capacity. This would cause the price spread between crude and refined products to rise. Other examples of statistical arbitrage portfolios are the spark spread and the frac spread. The first one mimics financially the generation costs of electricity for a specific facility and involves the simultaneous purchase of natural gas futures and the sale of electric futures. The second one is the difference between the price of gas liquids and natural gas.

3.2. Implementation of the Statistical Arbitrage Strategy

In this Section, we build the mispricing portfolio with dynamics described by Equation (6) by using crude oil market data. First of all, we describe the procedure that we have to follow in order to properly obtain a statistical arbitrage strategy according to Section 3.1.

Firstly, in order to build the mispricing portfolio (6), we need to choose a target asset and then to identify N assets such that condition (3) holds. These N assets are selected on the basis of the investors' subjective analysis, drawing on information from price behaviour, market rumours, physical or financial characteristics of the assets, etc. Usually, the target asset and the N assets have common characteristics, such as similar physical or financial characteristics or the same reference market, or they are assets whose prices are affected by the same external factors. The technique adopted to obtain the weights of the statistical portfolio $h = [h^{1}, \dots, h^{N}]$ of Definition 2, however, is the cointegration regression. The concept of cointegration has a financial meaning; indeed, it represents a long-term relationship among assets. On the one hand, the cointegration approach allows us to obtain the coefficients of the constituent asset prices $v_{t}^{i}$, $i = 1, \dots, N$, in order to form the portfolio (6). On the other hand, it allows us to verify that the chosen constituent assets are appropriate in the sense that their prices are positively correlated with the target asset price; namely, they share the same common trend, the long-run equilibrium (3). The coefficients $h^{i}$, $i = 1, \dots, N$, in (2) are elements of the cointegration vector. They are estimated by regressing historical target asset prices on a set of historical prices $v_{t}^{i}$, $i = 1, \dots, N$, such that

$$h = \arg \min \sum_{t} {\left(Z_{t} - \sum_{i = 1}^{N} h^{i} v_{t}^{i}\right)}^{2}. \tag{8}$$
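The least-squares problem in Equation (8) and the resulting mispricing series of Equation (6) can be sketched in plain Python as follows. This is an illustration under stated assumptions only: the helper names (`solve_linear_system`, `cointegration_weights`, `mispricing`) are hypothetical, the prices are synthetic rather than market data, and the fit is ordinary least squares without an intercept, as Equation (8) is written. It does not perform any cointegration test.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def cointegration_weights(target, constituents):
    """Weights h minimising sum_t (Z_t - sum_i h^i v_t^i)^2, via the normal equations."""
    n = len(constituents)
    gram = [
        [sum(x * y for x, y in zip(constituents[i], constituents[j])) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(x * z for x, z in zip(constituents[i], target)) for i in range(n)]
    return solve_linear_system(gram, rhs)


def mispricing(target, constituents, weights):
    """Mispricing series M_t = Z_t - sum_i h^i v_t^i of Equation (6)."""
    return [
        z - sum(h * v[t] for h, v in zip(weights, constituents))
        for t, z in enumerate(target)
    ]


# Synthetic prices (not market data): Z is 0.6 * v1 + 0.5 * v2 plus a small disturbance.
v1 = [50.0, 52.0, 51.0, 55.0, 58.0, 57.0, 60.0, 62.0]
v2 = [48.0, 49.5, 50.0, 52.0, 56.0, 56.5, 57.0, 61.0]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.05]
z = [0.6 * a + 0.5 * b + e for a, b, e in zip(v1, v2, noise)]

h = cointegration_weights(z, [v1, v2])
m = mispricing(z, [v1, v2], h)
print("weights:", h)
print("mispricing:", m)
```

By construction, the fitted mispricing series is orthogonal to each constituent price series, which is the first-order condition of Equation (8).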

It is important that the coefficients of the linear cointegration relationship be stable over time, particularly over the long period. However, parameter stability is difficult to establish empirically when the dataset is large. Some authors in the literature, such as Gregory and Hansen (1996), demonstrate that cointegration can exist even if there are structural breaks in the time series. Therefore, the Quandt likelihood ratio (henceforth QLR) test of Stock and Watson (2003) is used to verify that the coefficients $h^{i}$, $i = 1, \dots, N$, are stable in the long period. The QLR F-statistic tests the hypothesis that the intercept and coefficients in Formula (8) are constant against the alternative of a break in the central 70% of the sample.

Secondly, statistical tests are used to verify the mean reversion of the mispricing portfolio dynamics. We analyse the autocorrelations across time steps and apply the augmented Dickey–Fuller (ADF) test to search for unit roots and to study time series stationarity. However, it is well known in econometrics that classical Dickey–Fuller tests (see Dickey and Fuller (1979)) have low power to clearly identify the stationarity, and hence the predictability, of a price process. Therefore, we use the more robust variance ratio test (see Cochrane (1988) and Lo and MacKinlay (1988), among others) in order to verify whether the dynamics of the mispricing portfolio deviate from random walk behaviour. If we calculate the variance ratio over consecutive time periods $τ > 0$, we obtain the variance ratio function. Its analysis allows us to detect the mean-reverting nature of the mispricing portfolio. The variance ratio statistic is defined as the normalized ratio of the long-term variance calculated over a period $τ$ to the single-period variance. Values of the variance ratio greater than one for any $τ$ suggest that the historical prices are positively serially correlated and the mispricing portfolio has a trending behaviour. By contrast, values of the variance ratio less than one for any $τ$ suggest that the historical prices are negatively serially correlated and the mispricing portfolio has a mean-reverting behaviour.
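The variance ratio function described above can be sketched as follows. This is a simplified estimator and an assumption on our part: it divides the variance of overlapping $τ$-period changes by $τ$ times the variance of one-period changes, and it omits the small-sample corrections and test statistics discussed in the variance ratio literature. The function name and the two simulated series are illustrative.

```python
import random


def _variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def variance_ratio(series, tau):
    """Variance of tau-period changes divided by tau times the variance of one-period changes."""
    one_step = [series[t + 1] - series[t] for t in range(len(series) - 1)]
    tau_step = [series[t + tau] - series[t] for t in range(len(series) - tau)]
    return _variance(tau_step) / (tau * _variance(one_step))


rng = random.Random(1)

# Random walk: cumulative sum of independent Gaussian shocks.
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

# Mean-reverting AR(1) series with autoregressive coefficient 0.5.
reverting = [0.0]
for _ in range(5000):
    reverting.append(0.5 * reverting[-1] + rng.gauss(0.0, 1.0))

# Variance ratio functions for tau = 1, ..., 10.
vr_walk = [variance_ratio(walk, tau) for tau in range(1, 11)]
vr_reverting = [variance_ratio(reverting, tau) for tau in range(1, 11)]

print("random walk, tau = 5:", vr_walk[4])
print("mean-reverting, tau = 5:", vr_reverting[4])
```

For the random walk the ratio should stay near one, while for the mean-reverting series it should fall below one and decrease with $τ$, which is the pattern the text associates with mean reversion.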

Finally, appropriate trading rules may be developed in order to take advantage of the mean-reverting behaviour and to open or close positions to profit.

4. Empirical Application on Crude Oil Markets and Results

In this Section, we aim to apply the theory discussed in Section 3.1 and Section 3.2 to real market data. Although many statistical arbitrage opportunities have been empirically identified in stock markets, commodity markets can be explored as well. Some forms of arbitrage may be identified in these markets, as reviewed by Fanelli (2015). In this article, we focus on crude oils traded on different markets because, as we have already pointed out in the Introduction, crude oil market dynamics play a central role in the worldwide economy, since oil price movements substantially affect most macroeconomic activity, especially after the 1970s crises. A great deal of the recent literature discusses the efficiency of crude oil markets, and research focuses on the dynamics of the three major crude oil prices: the West Texas Intermediate, Brent, and Dubai (see, for example, Wilkinson et al. (2004), Wlazlowski et al. (2011), Scarpa et al. (2015), and Kilian (2016)).

We show that their prices are related to each other, and we build a mispricing portfolio by assuming a long position on the West Texas Intermediate crude oil futures and a short position on the statistical portfolio composed of futures on Brent and Dubai crude oils. Furthermore, we develop three basic trading strategies that rely on the mean-reverting behaviour of the mispricing portfolio, and we measure their profitability through performance indicators. Finally, we carry out a backtest of the strategies on out-of-sample data.

4.1. Weekly Trading

4.1.1. Description of the Data

We consider three crude oils. The three largest crude benchmarks in the world are the West Texas Intermediate (henceforth WTI), Brent, and Dubai crude oils. The first two are the most important global crude benchmarks for the light and sweet crude. Instead, Dubai is the most important benchmark for the sour and heavy crude. WTI crude oil is traded on the New York Mercantile Exchange and was launched in March 1983. Nowadays, it is the most liquid futures contract in crude oil markets. The WTI is deliverable to Cushing, Oklahoma, which is accessible to the spot market via pipeline. Brent crude oil, which is traded on the Intercontinental Exchange, was launched in July 1989. Dubai crude oil is quoted by Platt’s.

The dataset consists of weekly futures prices for the first month, spanning from 25 October 2000 to 19 October 2009, resulting in 461 observations. All time series were downloaded from Thomson Reuters Datastream/Eikon.

4.1.2. Time-Series Analysis

We choose the WTI crude oil as the target commodity and we consider the statistical portfolio (3) composed of Brent and Dubai crude oils, according to Definition 2. The weight vector $h$ is obtained by applying the cointegration regression according to (8). The results of the cointegration regression on the 461 observations are summarized in Table 1. Furthermore, the ADF test statistic for the residuals is −3.75029 with a p-value of 0.04869, which provides evidence of a cointegration relationship. Consequently, the mispricing portfolio value time series is obtained through Formula (6), and the values are plotted in Figure 2.

In Figure 2, we observe that up to the end of 2006, the mispricing (in USD) fluctuates in the range [−2, 2]. This is because, most of the time, the WTI price was higher than the Brent price, and the three crude oil prices behaved in the same way. The exception is June 2001, when a weakened US economy and increased non-OPEC production put downward pressure on WTI prices relative to Brent and made markets more volatile. From 2006, and even more so in 2007, the gap reversed and the Brent price was higher than the WTI price; the gap peaked in February 2009 at an average of 4.23 USD/barrel. Macroeconomic changes in recent years, such as movements in EUR/USD, may have affected this spread. This caused more volatility in crude oil markets, which is reflected in a wider mispricing oscillation.

Then, the QLR test is applied to check whether the long-run relationship between WTI, Brent, and Dubai crude oils is stable. In particular, here, the QLR F-statistic tests the hypothesis that the intercept and coefficients in the equation

$$Z_{t} = c - h^{1} v_{t}^{1} - h^{2} v_{t}^{2},$$

(see Table 1) are constant against the alternative of a break in the central 70% of the sample. The obtained F-statistic is 61.2695, which means that the null hypothesis that these coefficients are stable is rejected at the 1% significance level. Therefore, there is a structural break in the sample. The breakpoint is dated 14 February 2005. We consequently divide our data into two subsets, one for the pre-breakpoint dates and the other for the post-breakpoint dates, leaving the data of 2008 to 2009 to test the model by an out-of-sample analysis. We estimate the mispricing coefficients by OLS regression. Table 2 and Table 3 provide the details of the regression estimates obtained for each subsample.

Comparing Table 2 and Table 3, we observe that there is a change in Dubai position in the mispricing portfolio after February 2005. Indeed, before the structural break, the mispricing portfolio consists of a long position on the target commodity WTI crude oil and short positions on Brent and Dubai oils. Instead, after the break, the long position on WTI crude oil is balanced by a short position on Brent oil and a long position on Dubai oil.

By applying the Johansen (1991) test, we verify that the cointegration relation holds also in the presence of a structural break. The results of the test are shown in Table 4 and Table 5.

4.1.3. Mean-Reversion Analysis

In this subsection, we analyse the mispricing portfolio time series obtained in the previous subsection, in order to find predictable components and verify the mean-reverting behaviour.

The autocorrelation function of the mispricing time series is used to examine the short-term effects. As we can see from Figure 3, an autocorrelation coefficient with a value different from zero means that a mispricing value is related to the past value, and hence, the presence of a predictable component is expected.

If we look at the results of the unit root tests shown in Table 1, we can verify the stationarity of the time series. Stationarity is supported by the ADF test statistic of $-3.75$, although the p-value of $0.04869$, while acceptable, is high enough that an absence of mean reversion cannot be ruled out. Therefore, we calculate variance ratio statistics for different time lags and plot them in Figure 4. The variance ratio function takes values lower than one, and it is also a decreasing function. We can conclude that the mispricing dynamics follow a mean-reverting behaviour, confirming the existence of predictable components.

4.1.4. Trading Rules Implementation

In this subsection, we aim at investigating suitable trading rules that identify trading signals for opening and closing positions in the mispricing portfolio. We review three basic trading rules described in Burgess (1999) and we test them on our commodity data.

The adopted trading rules rely implicitly on the mean-reverting behaviour of the mispricing time series. In fact, if, in the long run, the mispricing reduces as prices change, a trader, who has previously opened a position in the mispricing portfolio, can realize profits. The trader should only optimize the trade-off between transaction costs and trading gains.

These trading rules define the sign and the magnitude of the mispricing portfolio components $\hat{h}$ in Formula (6). Although in the following we define three trading rules as functions of time, we do not need to verify that they fulfil the statistical arbitrage conditions of Definition 1, because the rule functions inherit the mean-reverting characteristics of the mispricing portfolio and can be considered strategies along the same lines as Proposition 1. Hereafter, we use the terms “trading rule” and “trading strategy” interchangeably.

The characteristics of the three adopted trading rules are summarized in Table 6. For each strategy, we give a short description.

We recall that we suppose a finite number of trading dates, $t = 1, \dots, \bar{T}$. Let $S_{t}^{k}$ be the plain vanilla strategy, which is the basic trading rule at date t that depends on the sign and the level of the mispricing at the previous time and on the value of a sensitivity parameter $k \in R$ according to the following formula:

$$S_{t}^{k} = - \mathrm{sign} (M_{t - 1}) {| M_{t - 1} |}^{k}. \tag{9}$$

The mispricing portfolio must be sold when $S_{t}^{k}$ is negative and bought when it is positive. An example of a trading rule as a function of time is displayed in Figure 5.

The holding magnitude varies as a function of the size of the previous mispricing through the sensitivity parameter k. We implement this rule on the mispricing time series for different values of k. We can summarize our results as follows. When $k = 0$, we have a step function, meaning that the entire holding is always invested in the mispricing portfolio. If $k > 0$, the size of the portfolio increases with the magnitude of the mispricing; in particular, $k > 1$ corresponds to more aggressive strategies.
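The plain vanilla rule of Equation (9) can be sketched as follows. The function names and the sample mispricing values are hypothetical, and the convention that a zero mispricing yields a zero holding (taking the sign of zero to be zero) is an assumption.

```python
import math


def plain_vanilla_signal(prev_mispricing, k):
    """S_t^k = -sign(M_{t-1}) * |M_{t-1}|**k, as in Equation (9)."""
    if prev_mispricing == 0.0:
        return 0.0  # assumed convention: sign(0) = 0, so no position is taken
    return -math.copysign(1.0, prev_mispricing) * abs(prev_mispricing) ** k


def plain_vanilla_series(mispricings, k):
    """Signals S_1, S_2, ...: entry j of the result is computed from M_j."""
    return [plain_vanilla_signal(m, k) for m in mispricings]


# Hypothetical mispricing values M_0, ..., M_3.
mispricings = [1.5, -0.5, 0.0, 2.0]

print(plain_vanilla_series(mispricings, k=0.0))  # step function: only the sign matters
print(plain_vanilla_series(mispricings, k=1.0))  # holding proportional to the mispricing
print(plain_vanilla_series(mispricings, k=2.0))  # more aggressive for large mispricings
```

A positive mispricing produces a negative signal (sell the mispricing portfolio) and a negative mispricing produces a positive one (buy), in line with the text.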

In order to reduce the risk taken by the investor in mispricing trading, we use the moving-average strategy $S_{t}^{h}$, defined as follows:

S_{t}^{h} = \frac{1}{h} \sum_{j = 1}^{h} S_{t - j}^{k},

(10)

where $h > 0$ is the moving-average parameter. Furthermore, any transaction made according to any trading strategy implies some costs, and every operator wants to optimize the trade-off between costs and gains from the exploitation of trading signals. To this end, we consider the following smooth strategy:

S_{t}^{O} = (1 - O) S_{t}^{k} + O S_{t - 1}^{O},

(11)

where $O \geq 0$ is a smoothing parameter. Increasing the values of $h$ and $O$ lowers the number of transactions, and hence the transaction costs, but it also reduces the accuracy of the smoothed trading signal. This behaviour is illustrated in Figure 6, where we compare the three trading strategies (9), (10), and (11) as functions of time, assuming $k = 0.33$, $h = 5$, and $O = 0.8$. From the figure, we deduce that strategy function (9) changes more frequently than function (10) and has a larger amplitude than function (11). Function (11) is much smoother than the other functions, meaning that, ceteris paribus, adopting this strategy reduces the number of transactions.
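The two filters in Equations (10) and (11) can be sketched in Python as below. The sketch rests on assumptions that the text does not state: while fewer than $h$ past signals exist, the moving average uses those available; the smooth strategy starts from a hypothetical initial value `start = 0`; and the helper `turnover`, which sums the absolute position changes, is introduced here only to illustrate the cost side of the trade-off.

```python
def moving_average_signal(s_k, h):
    """Eq. (10): mean of the h previous plain vanilla signals S_{t-1}, ..., S_{t-h}."""
    out = []
    for t in range(len(s_k)):
        window = s_k[max(0, t - h):t]  # strictly past values only
        # assumed start-up rule: average what is available, 0 if nothing is
        out.append(sum(window) / len(window) if window else 0.0)
    return out

def smooth_signal(s_k, o, start=0.0):
    """Eq. (11): S_t^O = (1 - O) * S_t^k + O * S_{t-1}^O."""
    out, prev = [], start
    for s in s_k:
        prev = (1.0 - o) * s + o * prev
        out.append(prev)
    return out

def turnover(signal):
    """Sum of absolute position changes, the quantity charged by transaction costs."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

# Hypothetical plain vanilla signal that flips sign at every date
raw = [1.0, -1.0, 1.0, -1.0]
averaged = moving_average_signal(raw, h=2)
smoothed = smooth_signal(raw, o=0.5)
raw_turnover = turnover(raw)
smoothed_turnover = turnover(smoothed)
```

On this toy input the smoothed signal trades roughly one third of the volume of the raw signal, which is the cost-reducing effect of a larger $O$ described above.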

4.1.5. Performance Analysis

In order to evaluate the performance of the trading rules defined in Section 4.1.4, we define and estimate the following performance indicators.

The first indicator, $R_{t}$, is the mark-to-market profit and loss, which evaluates the return obtained over a generic trading period $[t - 1, t]$ by applying any trading rule. Consider, for example, rule (9); the mark-to-market profit and loss at time $t$ is computed by the following formula:

R_{t} = S_{t}^{k} \frac{\Delta M_{t}}{S_{t} + V_{t}^{h}} - c | \Delta S_{t}^{k} |,

(12)

where $\Delta M_{t} = M_{t} - M_{t - 1}$, $\Delta S_{t}^{k} = S_{t}^{k} - S_{t - 1}^{k}$, $c$ is the percentage transaction cost, and $S_{t} + V_{t}^{h}$ is the sum of the mispricing portfolio components. We can then replace $S_{t}^{k}$ with the other trading strategies (10) and (11) to obtain the returns of those strategies.
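A minimal Python sketch of Equation (12) follows. The argument `gross_value` stands for the denominator of (12), the sum of the mispricing portfolio components; its name, the function name `mark_to_market_pnl`, and all numerical inputs are hypothetical and serve only to show the arithmetic.

```python
def mark_to_market_pnl(signal, mispricing, gross_value, cost):
    """Eq. (12): R_t = S_t * dM_t / gross_value_t - cost * |dS_t| for t = 1, 2, ...

    All three lists are aligned on the same date index t.
    """
    pnl = []
    for t in range(1, len(signal)):
        d_m = mispricing[t] - mispricing[t - 1]  # change in the mispricing
        d_s = signal[t] - signal[t - 1]          # change in the position
        pnl.append(signal[t] * d_m / gross_value[t] - cost * abs(d_s))
    return pnl

# Hypothetical inputs: short the mispricing while it falls, then go long
signal = [0.0, -1.0, -1.0, 1.0]
mispricing = [2.0, 1.0, 0.5, 1.5]
gross_value = [100.0, 100.0, 100.0, 100.0]
pnl = mark_to_market_pnl(signal, mispricing, gross_value, cost=0.0025)
```

In this example the second period earns the full 0.5% because the position is unchanged and no cost is charged, whereas the third period gives up half of its 1% gain to the cost of reversing the position.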

A strategy profitability indicator is the cumulative mark-to-market profit and loss, $\rho_{t}$, which represents the total return or cumulative profit of a strategy from the inception $t = 0$ to the generic trading date $t$. It is computed as the cumulative sum of the $R_{s}$, $s = 0, \dots, t$:

\rho_{t} = \sum_{s = 0}^{t} R_{s} .

(13)

We can use the indicator $\rho_{t}$ to compare the performances of strategy (9) according to different values of $k$.

A performance indicator that takes into account not only the level of profit but also the level of strategy risk, measured by the variability of profits, is the Sharpe ratio, denoted by $\Pi_{t}$ when calculated at date $t$. As in the traditional sense, it measures the profit per unit of risk. In the context of statistical arbitrage, it is calculated as the ratio between the annualized mean profitability of the strategy and the annualized standard deviation of its profits:

\Pi_{t} = \frac{\frac{1}{t} \sum_{s = 1}^{t} R_{s}}{\sqrt{\frac{1}{t} \sum_{s = 1}^{t} {\left[ R_{s} - \frac{1}{t} \sum_{u = 1}^{t} R_{u} \right]}^{2}}} .

(14)
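The two indicators in Equations (13) and (14) can be sketched in Python as follows. The function names and the sample returns are hypothetical. The optional argument `periods_per_year` is an assumption: Equation (14) shows no explicit annualization factor, so the sketch applies the common square-root-of-time scaling only when that argument is set, and the default of 1 reproduces (14) as written.

```python
import math

def cumulative_pnl(returns):
    """Eq. (13): running sum of the mark-to-market profit and loss."""
    total, out = 0.0, []
    for r in returns:
        total += r
        out.append(total)
    return out

def sharpe_ratio(returns, periods_per_year=1):
    """Eq. (14): mean return over population standard deviation.

    periods_per_year > 1 applies sqrt-of-time annualization (an assumed convention).
    """
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / n
    if variance == 0:
        return float("nan")  # ratio undefined for constant returns
    return math.sqrt(periods_per_year) * mean / math.sqrt(variance)

# Hypothetical weekly returns
returns = [0.01, -0.005, 0.02, 0.005]
rho = cumulative_pnl(returns)
sharpe = sharpe_ratio(returns)
sharpe_annual = sharpe_ratio(returns, periods_per_year=52)
```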

Figure 7 shows the cumulative profit functions for $k = 0, 0.5, 1$ and confirms that the value $k = 1$ ensures the greatest profit.

Using the performance indicators described above, we compare the trading strategies for different parameter values and look for the most efficient one. We apply the strategy $S_{t}^{O}$, Equation (11), to our mispricing data, letting $k$ vary between 0 and 1 and $O$ vary between 0 and $0.75$. We calculate the value of $\rho_{t}$ at the last observation for each pair $(k, O)$ and represent the results in Figure 8. From the figure, we deduce that, assuming a transaction cost percentage of $0.25\%$, the maximum profit is obtained with $k = 1$ and $O = 0.5$. We use this optimal rule to test the effectiveness of our statistical arbitrage strategy in an out-of-sample analysis, as described in the following subsection.
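The parameter search described above can be sketched as a grid evaluation of the final cumulative profit. Everything specific in the sketch is an assumption made for illustration: the mispricing is a simulated AR(1) series rather than the crude oil data, the denominator of Equation (12) is fixed at a hypothetical constant of 100, the position at the first date is set to zero, and the grid is much coarser than the surface in Figure 8. The selected pair therefore says nothing about the empirical optimum.

```python
import math
import random

def smooth_strategy(mispricing, k, o):
    """Eq. (9) fed into Eq. (11); signal[t] depends on M_{t-1}, signal[0] = 0 (assumed)."""
    signal, prev = [0.0], 0.0
    for t in range(1, len(mispricing)):
        m = mispricing[t - 1]
        raw = 0.0 if m == 0 else -math.copysign(abs(m) ** k, m)
        prev = (1.0 - o) * raw + o * prev
        signal.append(prev)
    return signal

def total_return(signal, mispricing, gross_value, cost):
    """Final value of Eq. (13) built from the period returns of Eq. (12)."""
    total = 0.0
    for t in range(1, len(signal)):
        d_m = mispricing[t] - mispricing[t - 1]
        d_s = signal[t] - signal[t - 1]
        total += signal[t] * d_m / gross_value - cost * abs(d_s)
    return total

# Simulated mean-reverting mispricing (hypothetical AR(1) with coefficient 0.8)
random.seed(0)
mispricing = [0.0]
for _ in range(300):
    mispricing.append(0.8 * mispricing[-1] + random.gauss(0.0, 1.0))

# Evaluate every (k, O) pair on a coarse grid and keep the most profitable one
grid = {
    (k, o): total_return(smooth_strategy(mispricing, k, o), mispricing, 100.0, 0.0025)
    for k in (0.0, 0.5, 1.0)
    for o in (0.0, 0.25, 0.5, 0.75)
}
best_k, best_o = max(grid, key=grid.get)
```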

4.1.6. Out-of-Sample Analysis

In order to carry out a backtest of the strategy, we apply the optimal strategy (11) with parameters $k = 1$, $O = 0.5$, and $c = 0.25\%$ to weekly out-of-sample data spanning from 7 January 2008 to 23 December 2010, thereby extending our dataset with the data of the year 2010. The time series spanning from 25 October 2000 to 31 December 2007 is instead treated as in-sample data and is used to estimate the parameters of the mispricing in Section 4.1.2. We compare the yearly performances on the in-sample data with those on the out-of-sample data. The trading performances are measured using the total return, the annual Sharpe ratio, and the percentage of profitable weeks, that is, the percentage of periods with positive returns. The results of the in-sample analysis are reported in Table 7, while those of the out-of-sample analysis are in Table 8. We take into account the structural break in 2005 by changing the cointegration coefficients in the mispricing estimation.
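The three yearly indicators can be computed from a series of dated returns along the following lines. This is a sketch under assumptions: the function name `yearly_performance`, the `(year, return)` input layout, the annualization by the square root of 52 weekly periods, and the sample figures are all hypothetical and are not the data behind Tables 7 and 8.

```python
import math
from collections import defaultdict

def yearly_performance(dated_returns, periods_per_year=52):
    """Group (year, return) pairs by year and compute the three indicators."""
    by_year = defaultdict(list)
    for year, r in dated_returns:
        by_year[year].append(r)
    report = {}
    for year, rs in sorted(by_year.items()):
        n = len(rs)
        mean = sum(rs) / n
        std = math.sqrt(sum((r - mean) ** 2 for r in rs) / n)
        report[year] = {
            "total_return": sum(rs),
            # sqrt-of-time annualization is an assumed convention
            "sharpe": math.sqrt(periods_per_year) * mean / std if std > 0 else float("nan"),
            # share of periods with a strictly positive return
            "profitable_pct": 100.0 * sum(r > 0 for r in rs) / n,
        }
    return report

# Hypothetical weekly returns tagged with their calendar year
sample = [(2008, 0.01), (2008, -0.005), (2008, 0.02), (2008, 0.005),
          (2009, 0.03), (2009, -0.01)]
report = yearly_performance(sample)
```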

From this simple analysis, we find that the strategy performs well in the out-of-sample years, in line with the results obtained on in-sample data. The anomalous 2009 total return is mainly due to some macroeconomic events already highlighted in Figure 2.

We could state that an optimal strategy may be developed and updated daily, taking into consideration the three indices of performance (total return, Sharpe ratio, and profitable periods), so that any trading decision would be taken in line with the specific risk profile of the investor.

4.2. Daily Trading

In this section, we use daily one-month futures prices, and the dataset spans from 1 January 2010 to 10 April 2017. Each futures contract is traded until the close of business on the third business day prior to the 25th calendar day of the month preceding the delivery month, and it is assumed that the investor rolls over the front-month pair contracts on the first day of the trading month. We use data from 1 January 2010 to 31 December 2014 for the in-sample analysis, and we leave data from 1 January 2015 to 10 April 2017 for the out-of-sample analysis. We implement the statistical arbitrage strategy according to Section 3.2 and briefly show the results below.
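The expiry rule stated above can be sketched with the Python standard library. The sketch is a simplification under explicit assumptions: business days are taken to be Monday to Friday, exchange holidays are ignored, and no adjustment is made when the 25th itself is not a business day, so the result can differ from the actual exchange calendar. The function name `last_trading_day` is hypothetical.

```python
from datetime import date, timedelta

def last_trading_day(delivery_year, delivery_month):
    """Third business day before the 25th of the month preceding delivery.

    Assumes Monday-Friday business days and ignores exchange holidays.
    """
    # Step back to the month preceding the delivery month
    if delivery_month == 1:
        year, month = delivery_year - 1, 12
    else:
        year, month = delivery_year, delivery_month - 1
    day = date(year, month, 25)
    remaining = 3
    while remaining > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return day

march_2015_expiry = last_trading_day(2015, 3)
january_2015_expiry = last_trading_day(2015, 1)  # ignores the 25 December holiday
```

For the March 2015 delivery month, the 25th of February 2015 falls on a Wednesday, so counting back three weekdays skips the weekend and gives Friday, 20 February 2015.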

By using observations 1 January 2010–25 April 2017, the cointegration analysis gives the following results:

The WTI, Brent, and Dubai time series are first-order integrated.
We apply the cointegration regression. We consider the equation $Z_{t} = c + h^{1} v_{t}^{1} + h^{2} v_{t}^{2}$, where $Z_{t}$ is the price of the WTI futures, $v_{t}^{1}$ is the price of the Brent futures, and $v_{t}^{2}$ is the price of the Dubai futures; $h^{1}$ and $h^{2}$ are the weights of $v_{t}^{1}$ and $v_{t}^{2}$ in the statistical portfolio, and $c$ is the constant of regression. The regression coefficients are in Table 9.
The residuals from the regression are stationary.
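The regression step and the resulting mispricing can be sketched in pure Python as below. The sketch solves the ordinary least squares normal equations directly; the function names and the price values are hypothetical, the data are constructed to lie exactly on a known line so that the fit can be checked, and the unit-root and stationarity tests mentioned in the list are not reproduced, since they would require a dedicated econometrics library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def cointegration_ols(z, v1, v2):
    """OLS estimates (c, h1, h2) of Z_t = c + h1 * v1_t + h2 * v2_t."""
    rows = [[1.0, a, b] for a, b in zip(v1, v2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, z)) for i in range(3)]
    return solve_linear(xtx, xty)

def mispricing_series(z, v1, v2, coef):
    """M_t = Z_t - (c + h1 * v1_t + h2 * v2_t): the regression residuals."""
    c, h1, h2 = coef
    return [zt - (c + h1 * a + h2 * b) for zt, a, b in zip(z, v1, v2)]

# Hypothetical prices generated from known coefficients (2.0, 0.9, 0.1)
v1 = [50.0, 52.0, 55.0, 53.0, 58.0]
v2 = [48.0, 51.0, 47.0, 52.0, 49.0]
z = [2.0 + 0.9 * a + 0.1 * b for a, b in zip(v1, v2)]
coef = cointegration_ols(z, v1, v2)
residuals = mispricing_series(z, v1, v2, coef)
```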

Linear cointegration assumes coefficient stability in the long-run equilibrium between oil prices. By applying the QLR test, we find that a break occurs on 1 June 2011. This structural break can be attributed to several important events that impacted crude oil prices. The storage and pipeline capacity constraints at Cushing, Oklahoma, an oil trade hub and the delivery location for NYMEX crude oil futures contracts, put downward pressure on the WTI price. The following events put upward pressure on the price of Brent: the Tunisian revolution in December 2010, the increased weight of Brent and decreased weight of WTI in the Standard and Poor’s GSCI commodity index in January 2011, the Libyan crisis in February 2011, and the Fukushima-Daiichi nuclear disaster in Japan in March 2011. By contrast, the upsurge in U.S. oil production put downward pressure on the price of WTI. The Dubai price is strongly correlated with the Brent price: although Dubai remains overwhelmingly the most important Asian crude marker, Brent remains a default alternative in Asia.
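The idea behind the QLR test can be sketched as the largest Chow-type F statistic over a trimmed range of candidate break dates, with the maximizing date taken as the break estimate. The sketch below is illustrative only: it uses a simulated regression with a single regressor and a known break at observation 60, assumes homoskedastic errors and the customary 15% trimming, and uses hypothetical function names. The statistic must be compared with the non-standard QLR critical values rather than with the usual F distribution.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ssr(rows, y):
    """Sum of squared OLS residuals of y on the regressors in rows."""
    q = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(q)] for i in range(q)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(q)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * xi for b, xi in zip(beta, r))) ** 2 for r, v in zip(rows, y))

def qlr(rows, y, trim=0.15):
    """Return (max Chow F statistic, index of the first post-break observation)."""
    n, q = len(y), len(rows[0])
    pooled = ssr(rows, y)
    best_f, best_tau = float("-inf"), None
    for tau in range(int(trim * n), int((1 - trim) * n) + 1):
        split = ssr(rows[:tau], y[:tau]) + ssr(rows[tau:], y[tau:])
        f_stat = ((pooled - split) / q) / (split / (n - 2 * q))
        if f_stat > best_f:
            best_f, best_tau = f_stat, tau
    return best_f, best_tau

# Simulated data: intercept and slope both change at observation 60
random.seed(42)
x = [(7 * t) % 10 + 0.5 for t in range(100)]
y = [(1.0 + 2.0 * xt if t < 60 else 5.0 + 2.5 * xt) + random.gauss(0.0, 0.1)
     for t, xt in enumerate(x)]
rows = [[1.0, xt] for xt in x]
f_max, break_index = qlr(rows, y)
```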

We verify the mean-reverting behaviour of the mispricing portfolio $M_{t} = Z_{t} - (c + h^{1} v_{t}^{1} + h^{2} v_{t}^{2})$, obtained from the cointegration analysis above, according to Section 4.1.3. Then, according to Formula (11), we apply the following optimal strategy:

S_{t}^{O} = (1 - O) S_{t}^{h} + O S_{t - 1}^{O},

(15)

where $S_{t}^{h}$ is the strategy (10), and $k = 1$, $h = 6$, $O = 0.5$, and $c = 0.25\%$.
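Equation (15) chains the three rules: the plain vanilla signal (9) is averaged as in (10), and the average is then smoothed. A compact Python sketch of this chain is given below; the start-up conventions (zero position before any mispricing is observed, averaging over the available history while fewer than $h$ values exist, zero initial smoothed value) and the function name `daily_signal` are assumptions made for illustration.

```python
import math

def daily_signal(mispricing, k=1.0, h=6, o=0.5):
    """Chain Eq. (9) -> Eq. (10) -> Eq. (15) on a list of mispricing values."""
    # Eq. (9): raw[t] is driven by M_{t-1}; raw[0] = 0 by assumption
    raw = [0.0]
    for m in mispricing[:-1]:
        raw.append(0.0 if m == 0 else -math.copysign(abs(m) ** k, m))
    # Eq. (10): mean of up to h strictly past raw signals
    averaged = []
    for t in range(len(raw)):
        window = raw[max(0, t - h):t]
        averaged.append(sum(window) / len(window) if window else 0.0)
    # Eq. (15): exponential smoothing of the averaged signal
    smoothed, prev = [], 0.0
    for s in averaged:
        prev = (1.0 - o) * s + o * prev
        smoothed.append(prev)
    return smoothed

# Hypothetical constant positive mispricing: the short position builds up gradually
example = daily_signal([2.0, 2.0, 2.0, 2.0], k=1.0, h=2, o=0.5)
```

The example shows the combined lag of the two filters: with a constant mispricing of 2, the position is still only -1.25 at the fourth date, against -2 for the plain vanilla rule.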

In Table 10 and Table 11, we show the yearly performances on the in-sample data and the out-of-sample data, respectively. The trading performances are measured using the total return, the annual Sharpe ratio, and the percentage of profitable weeks, calculated as the percentage of periods with positive returns, as in Section 4.1. We take into account the structural break in 2011 by changing the cointegration coefficients in the mispricing portfolio.

We can conclude that, also with daily data and with the dataset extended to the prices of 2015, the strategy performs well in the out-of-sample years, in line with the results obtained on in-sample data. This evidence confirms the robustness of our analysis.

5. Discussion

It is known that there are many factors that impact crude oil prices and therefore influence trading in the crude oil market. For example, crude oil supply and demand cause price fluctuations, which often lead to inflationary pressures and immediate realignments of US dollars and Forex crosses. Geopolitical events can also have a major impact on the market, resulting in increased retail prices of gasoline due to production and supply disruptions. Weather events affect crude oil trading, impacting supply (because of, for example, production disruption) and refinery operations. Therefore, in order to achieve profitable trading, traders need to have an in-depth understanding of the market and select appropriate trading strategies.

There are several trading techniques that can be applied in crude oil markets, and the choice among them depends on the knowledge, experience, and risk tolerance of the investor. Although the choice of one strategy over another may seem immaterial, the mean-reverting statistical arbitrage strategy has substantial advantages over the most common strategies. We consider the following widely used strategies:

(a): Fundamental Analysis: traders examine macroeconomic data, geopolitical events, and supply and demand factors to choose the favourable trading strategy;
(b): Technical Analysis: traders use chart patterns and indicators to predict future price movements and identify trading opportunities;
(c): Buy and Hold Strategy: traders carry out a long-term investment by holding onto open positions until the desired profit is achieved;
(d): Spread Trading Strategy: traders take different positions (buy and sell) on two related assets, and then at an appropriate time, they reverse the positions to obtain profit.

Strategies (a) and (c) require that traders, on the one hand, have an in-depth knowledge of the macroeconomic factors, such as GDP growth, manufacturing data, and employment rates, that influence crude oil market dynamics and, on the other hand, know how supply and demand factors influence the international oil market over long periods. Strategy (b) requires that the trader has experience in reading price charts in order to choose the right time to buy or sell an asset with the help of supporting chart tools. Mean-reverting statistical arbitrage strategies allow traders to dispense with all these requirements, which rest on personal knowledge of the crude oil market and its dynamics.

Strategy (d) may be included among mean-reverting statistical arbitrage strategies because it consists of modelling the divergence between two related prices (for example, between a crude oil price and a distillate price) that can fluctuate due to changes in supply and demand or other influences within the oil market, but that reverts to a long-run mean. However, while with spread trading the relationship between two correlated assets is considered and only the difference in prices is traded, the mean-reverting strategy is based on a more complex procedure aimed at building a mispricing portfolio of three or more correlated assets. In this way, and differently from the other cited strategies, it is possible to have an accurate prediction of the mean-reverting portfolio dynamics and obtain statistically predictable and expected profits through the application of specific trading rules.

6. Conclusions

In this paper, we introduce the concept of statistical arbitrage through the definition of a trading strategy, called mispricing portfolio. We focus on mean-reverting strategies in order to capture persistent anomalies in the markets. Furthermore, we show how we identify statistical arbitrages and apply trading rules adopted from equity markets.

We show the empirical evidence of statistical arbitrage in crude oil markets. We have built the mispricing portfolio by using a cointegration regression in order to identify long-term pricing relationships between the WTI crude oil futures and the price of a replication portfolio composed of two other crude oils, Brent and Dubai. Finally, we apply trading rules commonly used in equity markets to generate profit.

Overall, the statistical arbitrage presented in this paper and borrowed from the equity world shows a very promising result in the commodity world and, in particular, in the crude oil markets.

This suggests that more research should be directed along these lines, in particular on the relationship between the physical market and the futures market and on the role of the convenience yield. Another interesting extension of this work would be to apply mean-reverting statistical arbitrage strategies to the period of the COVID-19 pandemic and of new armed conflicts and to analyse the results.

Funding

This research received no external funding.

Data Availability Statement

All time series used in this study were downloaded from Thomson Reuters Datastream/Eikon.

Conflicts of Interest

The author declares no conflict of interest.

Notes

1	One of the assets may represent a risk-free bond.
2	Conditional variance is a decreasing function of time t, that is, $\frac{\partial}{\partial t} \mathrm{Var} [X_{T} \mid F_{t}] \leq 0$.
3	The portfolio ${(X_{t})}_{t \geq 0}$ is also called simply “statistical arbitrage”.
4	In economic terms, Condition 3 implies that the Sharpe ratio associated with the strategy increases monotonically through time. This is consistent with the policy adopted by hedge funds that profit by exploiting mean-reverting dynamics of a portfolio driven by a continuously evolving Sharpe ratio (see Lo (2010)).

References

Barsky, Robert B., and Lutz Kilian. 2004. Oil and the macroeconomy since the 1970s. The Journal of Economic Perspectives 18: 115–34.
Bertram, William. 2009. Analytic solutions for optimal statistical arbitrage trading. Physica A 389: 2234–43.
Bondarenko, Oleg. 2003. Statistical arbitrage and securities prices. Review of Financial Studies 16: 875–919.
Burgess, A. Neil. 1999. A Computational Methodology for Modelling the Dynamics of Statistical Arbitrage. Ph.D. Thesis, London Business School, London, UK.
Cerqueti, Roy, and Viviana Fanelli. 2021. Long memory and crude oil’s price predictability. Annals of Operations Research 299: 895–906.
Cerqueti, Roy, Viviana Fanelli, and Giulia Rotundo. 2019. Long run analysis of crude oil portfolios. Energy Economics 79: 183–205.
Cochrane, John H. 1988. How big is the random walk in GNP? The Journal of Political Economy 96: 893–920.
Cummins, Mark, and Andrea Bucca. 2012. Quantitative spread trading on crude oil and refined products markets. Quantitative Finance 12: 1857–75.
Dickey, David A., and Waine A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–31.
Do, Bihn, Robert Faff, and Kais Hamza. 2006. A New Approach to Modeling and Estimation for Pairs Trading. Working Paper. Clayton: Monash University.
Elliot, Robert, J. van der Hoek, and W. Malcolm. 2005. Pairs trading. Quantitative Finance 5: 271–76.
Fanelli, Viviana. 2015. Commodity-linked arbitrage strategies and portfolio management. In Handbook of Multi-Commodity Markets and Products: Structuring, Trading and Risk Management. Hoboken: John Wiley & Sons, pp. 901–38.
Fanelli, Viviana. 2020. Financial Modelling in Commodity Markets. London: Chapman and Hall/CRC.
Gregory, Allan, and Bruce Hansen. 2023. Residual-based tests for cointegration in models with regime shift. Journal of Economics 70: 100429.
He, Chengying, Tianqi Wang, **nwen Liu, and Ke Huang. 2023. An innovative high-frequency statistical arbitrage in Chinese futures market. Journal of Economics 8: 99–126.
Hogan, Steve, Robert Jarrow, Melvyn Teo, and Mitch Warachka. 2004. Testing market efficiency using statistical arbitrage with applications to momentum and value strategies. Journal of Financial Economics 73: 525–65.
Horikawa, Hiroaki, and Kei Nakagawa. 2024. Relationship between deep hedging and delta hedging: Leveraging a statistical arbitrage strategy. Finance Research Letters 73: 105101.
Jarrow, Robert A., Melvyn Teo, Yiu Kuen Tse, and Mitch Warachka. 2005. Statistical arbitrage and market efficiency: Enhanced theory, robust tests and further applications. In Robust Tests and Further Applications (February 2005). Available online: https://ink.library.smu.edu.sg/lkcsb_research/3168 (accessed on 11 June 2024).
Jensen, Michael C. 1978. Some anomalous evidence regarding market efficiency. Journal of Financial Economics 6: 95–101.
Johansen, Soren. 1991. Cointegration and hypothesis testing of cointegration vectors in gaussian vector autoregressive models. Econometrica 59: 1551–80.
Kilian, Lutz. 2009. Not all oil price shocks are alike: Disentangling demand and supply shocks in the crude oil market. The American Economic Review 99: 1053–69.
Kilian, Lutz. 2016. The impact of the shale oil revolution on us oil and gasoline prices. Review of Environmental Economics and Policy 10: 185–205.
Kristoufek, Ladislav, and Miloslav Vosvrda. 2014. Commodity futures and market efficiency. Energy Economics 42: 50–57.
Lin, Boqiang, and Zhizhou Tan. 2023. Exploring arbitrage opportunities between China’s carbon markets based on statistical arbitrage pairs trading strategy. Environmental Impact Assessment Review 99: 107041.
Lo, Andrew W. 2010. Hedge Funds: An Analytic Perspective. Princeton: Princeton University Press.
Lo, Andrew W., and A. Craig MacKinlay. 1988. Stock market prices do not follow random walks: Evidence from a simple specification test. Review of Financial Studies 1: 41–66.
Nakajima, Tadahiro. 2019. Expectations for statistical arbitrage in energy futures markets. Journal of Risk and Financial Management 12: 14.
Pole, Andrew. 2007. Statistical Arbitrage. Hoboken: Wiley Finance.
Poutré, Cédric, Georges Dionne, and Gabriel Yergeau. 2023. International high-frequency arbitrage for cross-listed stocks. International Review of Financial Analysis 89: 102777.
Scarpa, Elisa, Alessandro Cologni, and Francesco Sitzia. 2015. Big Fish: Oil Markets and Speculation. FEEM Fondazione Eni Enrico Mattei Research Paper Series; Berkeley: Bepress.
Stock, James H., and Mark W. Watson. 2003. Introduction to Econometrics. Boston: Addison Wesley Boston, vol. 104.
Vergara, Gabriel, and Kristjanpoller Werner. 2024. Deep reinforcement learning applied to statistical arbitrage investment strategy on cryptomarket. Applied Soft Computing 153: 111255.
Vidyamurthy, Ganapathy. 2004. Pairs Trading: Quantitative Methods and Analysis. Wiley Finance Series; Hoboken: Wiley.
Wilkinson, Rick. 2004. Wec: Brent crude challenged as oil price benchmark. Oil & Gas Journal 102: 24.
Wlazlowski, Szymon, Bjorn Hagströmer, and Monica Giulietti. 2011. Causality in crude oil prices. Applied Economics 43: 3337–47.
Zhang, Huiming, Siji Qian, and Zhen Ma. 2024. An analysis of the market efficiency of the Chinese copper futures based on intertemporal and intermarket arbitrages. International Review of Financial Analysis 94: 103243.

Figure 1. Crack spread dynamics.

Figure 2. Crude oil mispricing portfolio.

Figure 3. Mispricing autocorrelation function (ACF) and partial autocorrelation Function (PACF).

Figure 4. Variance ratio function.

Figure 5. Trading rule example: the red line represents the trading rule in Equation (9) with parameter $k = 0.33$; it gives signals for trading the mispricing given by the blue line.

Figure 6. Trading signal comparison: the blue line represents the dynamics of the trading rule in Equation (9) with parameter $k = 0.33$; the red line describes the dynamics of the trading rule in Equation (10) with parameters $k = 0.33$ and $h = 5$; the green line represents the dynamics of the trading rule in Equation (11) with parameters $k = 0.33$, $h = 5$, and $O = 0.8$.

Figure 7. Total return function comparison: three cumulative profit functions given by Formula (13) are shown according to different values of parameter k of the strategy in Equation (9).

Figure 8. Optimal trading strategy: the surface represents the total return, given by Formula (13), of the strategy in Equation (11); the parameter $k$ varies between 0 and 1 and the parameter $O$ varies between 0 and $0.75$; the transaction cost percentage is $0.25\%$.

Table 1. Cointegration regression for crude oils.

Coefficient	Estimate	Std. Error	t-Statistic	Prob.
c	1.61763	0.19417	8.33071	0.00
$h^{1}$	1.19378	0.04061	9.39531	0.00
$h^{2}$	−0.21702	0.04222	−5.13930	0.00
	$R^{2}$		0.99515
	Adjusted $R^{2}$		0.995131
	S.E. of regression		1.832739
	Akaike info crit		4.055987
	Schwarz crit		4.082885
	F-statistic		47.012000
	Prob (F-stat)		0.000000
	Durbin-Watson stat		0.256805
	RMSE		0.030600
	MAE		0.022500

Note: We consider the equation $Z_{t} = c + h^{1} v_{t}^{1} + h^{2} v_{t}^{2}$, where $Z_{t}$ is the price of the WTI futures, $v_{t}^{1}$ is the price of the Brent futures, and $v_{t}^{2}$ is the price of the Dubai futures. $h^{1}$ and $h^{2}$ are the weights of $v_{t}^{1}$ and $v_{t}^{2}$ in the statistical portfolio, and $c$ is the constant of regression (source: Fanelli (2020)).

Table 2. Pre-breakpoint regression results.

Coefficient	Estimate	Std. Error	t-Statistic	Prob.
c	−2.22903	0.403895	−5.519	0.0000
$h^{1}$	0.975588	0.0398486	24.482	<0.0000
$h^{2}$	0.184552	0.0518756	3.558	0.0000
	$R^{2}$		0.982
	Adjusted $R^{2}$		0.982
	Durbin–Watson stat		0.251
	Akaike info crit		622.284
	Schwarz crit		632.410
	RMSE		0.030
	MAE		0.022
	RMSE		0.041
	MAE		0.030

Note: The dataset spans from 25 October 2000 to 14 February 2005. We consider the equation $Z_{t} = c + h^{1} v_{t}^{1} + h^{2} v_{t}^{2}$, where $Z_{t}$ is the price of the WTI futures, $v_{t}^{1}$ is the price of the Brent futures, and $v_{t}^{2}$ is the price of the Dubai futures. $h^{1}$ and $h^{2}$ are the weights of $v_{t}^{1}$ and $v_{t}^{2}$ in the statistical portfolio, and $c$ is the constant of regression.

Table 3. Post-breakpoint regression results.

Coefficient	Estimate	Std. Error	t-Statistic	Prob.
c	−1.07760	0.992658	−1.086	0.27943
$h^{1}$	1.31559	0.102020	12.895	0.0000
$h^{2}$	−0.317238	0.102883	−3.083	0.00244
	$R^{2}$		0.982
	Adjusted $R^{2}$		0.972
	Durbin–Watson stat		0.972
	Akaike info crit		605.086
	Schwarz crit		614.138
	RMSE		0.049
	MAE		0.0310

Note: The dataset spans from 14 February 2005 to 31 December 2007. We consider the equation $Z_{t} = c + h^{1} v_{t}^{1} + h^{2} v_{t}^{2}$, where $Z_{t}$ is the price of the WTI futures, $v_{t}^{1}$ is the price of the Brent futures, and $v_{t}^{2}$ is the price of the Dubai futures. $h^{1}$ and $h^{2}$ are the weights of $v_{t}^{1}$ and $v_{t}^{2}$ in the statistical portfolio, and $c$ is the constant of regression.

Table 4. Pre-breakpoint Johansen test.

Rank	Eigenvalue	Trace Test	Lmax Test
0	0.073294	28.206 [0.0766]	16.366 [0.2123]
1	0.052025	11.840 [0.1666]	11.487 [0.1324]
2	0.001640	0.353 [0.5524]	0.35307 [0.5524]

Table 5. Post-breakpoint Johansen test.

Rank	Eigenvalue	Trace Test	Lmax Test
0	0.17463	37.851 [0.0042]	28.789 [0.0024]
1	0.058590	9.0619 [0.3664]	9.0565 [0.2880]
2	3.59 × 10⁻⁵	0.005388 [0.9415]	0.005388 [0.9415]

Table 6. Trading strategies.

Name	Symbol	Description
Plain vanilla strategy	$S_{t}^{k}$	The mispricing is traded according to the investor risk profile. Very risky and aggressive positions can be taken.
Moving-average strategy	$S_{t}^{h}$	The mispricing is traded according to a prudential consideration of the investor risk profile.
Smooth strategy	$S_{t}^{O}$	The mispricing is traded according to the investor risk profile and by considering the transaction costs.

Table 7. In-sample performance.

Year	2001	2002	2003	2004
Total return	7.72%	5.16%	4.57%	2.93%
Sharpe ratio	1.70	1.43	1.14	0.99
Profitable weeks	39.39%	51.92%	44.23%	46.15%
Year	2005	2006	2007
Total return	2.14%	4.20%	13.53%
Sharpe ratio	0.91	0.87	0.85
Profitable weeks	30.77%	53.85%	50.94%

Note: We apply the optimal strategy in Equation (11) with parameters $k = 1$, $O = 0.6$, and $c = 0.25\%$ to weekly in-sample data spanning from 25 October 2000 to 31 December 2007. We calculate yearly performance indicators.

Table 8. Out-of-sample performance.

Year	2008	2009	2010
Total return	4.97%	30.26%	4.90%
Sharpe ratio	0.85	0.69	0.80
Profitable weeks	42.31%	57.14%	50.00%

Note: We apply the optimal strategy in Equation (11) with parameters $k = 1$, $O = 0.6$, and $c = 0.25\%$ to weekly out-of-sample data spanning from 7 January 2008 to 23 December 2010. We calculate yearly performance indicators.

Table 9. Cointegration regression results.

Coefficient	Estimate	Std. Error	t-Statistic	Prob.
c	23.8809	1.16787	20.45	8.31 × 10⁻⁸¹
$h^{1}$	−0.0401021	0.0808145	−0.4962	0.6198
$h^{2}$	0.730442	0.0892876	8.181	6.65 × 10⁻¹⁶

Table 10. In-sample performance.

Year	2010	2011	2012	2013
Total return	14.27%	28.22%	17.07%	26.36%
Sharpe ratio	1.45	2.13	1.20	0.99
Profitable weeks	42.18%	43.2%	45.83%	46.36%

Note: We apply the optimal strategy in Equation (15) with parameters $k = 1$, $h = 6$, $O = 0.5$, and $c = 0.25\%$ to in-sample data spanning from 1 January 2010 to 31 December 2014. We calculate yearly performance indicators.

Table 11. Out-of-sample performance.

Year	2014	2015
Total return	20.34%	13.84%
Sharpe ratio	0.86	0.86
Profitable weeks	47.50%	46.36%

Note: We apply the optimal strategy in Equation (15) with parameters $k = 1$, $h = 6$, $O = 0.5$, and $c = 0.25\%$ to out-of-sample data spanning from 1 January 2015 to 10 April 2017. We calculate yearly performance indicators.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fanelli, V. Mean-Reverting Statistical Arbitrage Strategies in Crude Oil Markets. Risks 2024, 12, 106. https://doi.org/10.3390/risks12070106

AMA Style

Fanelli V. Mean-Reverting Statistical Arbitrage Strategies in Crude Oil Markets. Risks. 2024; 12(7):106. https://doi.org/10.3390/risks12070106

Chicago/Turabian Style

Fanelli, Viviana. 2024. "Mean-Reverting Statistical Arbitrage Strategies in Crude Oil Markets" Risks 12, no. 7: 106. https://doi.org/10.3390/risks12070106
