The Good, the Bad and the Ugly of Backtesting
We have previously discussed how the central aim of Classic Trend Followers is to ‘Hunt for Outliers’. This is achieved through diversifying our simple Trend Following Models across a vast array of different liquid markets. We can of course manage our capital risk at all times, but that does not necessarily lead to net wealth over the long term. It simply preserves our capital base…….but for us to lift the capital base….we need the occasional Outlier…and the more the merrier.
Most traders understand that diversification is a method to ‘smooth portfolio returns’…but while we understand this principle naturally applies when consolidating multiple non-correlated return streams into a diversified portfolio, and we benefit from this principle, the real reason for our diversification is for another purpose.
Namely, the most important reason that we deploy extensive diversification as a Classic Trend Follower, is to increase our chances of capturing Outliers by restricting our trading opportunities to only those times when trends become material in nature.
Our frequency of trades that target the infrequent Outlier per return stream is therefore very low….but our diversification across return streams lifts the total trade frequency to capture a broad class of Outliers (wherever they may be found).
Now of course in our acceptance of the well known principle that a diverse portfolio has generally smoother returns, we do not let this principle thwart our ambitions of widely diversifying across ANY liquid markets, as Outliers by their very nature are unpredictable in the ‘where and when’ they occur. So when you hear a statement that adding a new return stream does not offer any additional marginal benefit in smoothing the returns, we say….“who cares how smooth it is….we want as much diversification as possible as we are ‘hunting rabbits’.”
This preferential ambition to target Outliers at the possible expense of a smoother equity curve does make our Portfolio equity curves more volatile than our ‘non trend following’ diversified competition, but this is the sacrifice we make in our hunt for Outliers that may reside in ANY liquid market. You see, placing small bet sizes and having systems that cut losses short at all times allows us to trade even fairly highly correlated markets, and when the Outliers emerge, the magnitude of their effect more often than not cascades across correlated (related) markets. The sacrifice we therefore make in the smoothness of our equity curve is more than made up by the bounty that all these Outliers bring to our portfolios ‘IF’ they occur.
This more ‘selective approach’ we Classic Trend Followers adopt, in deciding only to participate in trading ‘material trends’, allows us to capitalise on the major price changes that reside in a market’s total history. These Outlier events are typically non-linear in magnitude when compared to the many small losses we encounter along the way. The size of these behemoths more than compensates for our losses and leads to the outstanding edge in our technique over the long term.
The causative reasons for Outliers can only be determined in hindsight, but unlike other trading approaches that address a particular risk premium (or factor), the possible causes contributing to an Outlier event are vast. They may relate to endogenous events within the market such as a mistake by a trader (aka a Nick Leeson event), a failure of a large financial institution that is intricately tied to the market, an economic announcement, a cat running across an investment banker’s keyboard, a force majeure of a market…and the list is endless….or the Outlier may relate to an exogenous event that arises external to the system and whose effect cascades into the financial markets….such as a pandemic, a war, a tidal wave, a moon landing, a civil riot, a shuttle explosion, a new technology and countless other possible reasons.
Now because of this vast array of possible causative reasons for Outliers, and due to their anomalous quantitative nature, they are excluded from standard quantitative assessment by many traders. The statement….”well that Outlier was just too large and needs to be removed from the data sample given its influence in the series” is exactly the statement we as Classic Trend Followers shake our fists at. You see, traditional economics has never liked these anomalies and refuses to recognise them….but this, as it turns out, is to its loss. We just love them and capitalise on them.
You would be surprised how in the ‘real world’ these extreme events are far more frequent than we might assume and their magnitude of impact far more than we could imagine. This uncertain quality of all Outliers catches those traders who interpret risk as being represented by a Sharpe Ratio, off-guard and scratching their heads stating….”if only I had seen that coming”.
Of course for those that wish to avoid them as opposed to hunting them, you can always be lucky for a short while….but inevitably these ‘bogeymen’ catch up with you and change your fortunes in an instant. Well as Classic Trend Followers….it is these earth shattering events that we are hunting, as opposed to avoiding…but to hunt for them, you need to be prepared for a battering in this exotic domain and go to war with a battalion of heavy armored vehicles at your disposal (aka robust systems).
Given our understanding that Outliers can reside in any liquid market and can cascade across interconnected markets….it is in our best interests to diversify widely to capture as many of them as possible.
The degree of diversification is of course hampered by our limited capital, so our efforts are placed on reducing our consistent $ bet size allocation per trade, to as small a size as possible, to allow us to trade as broad a universe as we can achieve within these finite capital constraints.
Furthermore, we apply many different types of simple trend following model across all these markets whose designs are all configured around the Golden Rules of “cutting losses short and letting profits run”, as we understand that Outliers can take a vast array of different possible forms in their expression.
So, now let’s get to the question of this article. What is the point of backtesting if we are hunting Outliers that defy standard quantitative treatment? This is a pretty good question actually. We have stated that Outliers are inherently unpredictable in nature. Furthermore we have also stated that our systems we use to extract an edge from these outliers, are simple in nature and generic in application across any liquid market. There must be more to it?
Well, the devil of course is in the detail.
Backtests are not all bad for the Classic Trend Follower. They of course get a bad rap due to the way they are misused to imply predictive power…..but in fact, there is some good, some bad and some ugly to be had with any backtest.
- A backtest over a sufficient trade sample can be used to inform us of how our models respond to adverse risk events AND how to improve our method for Targeting Outliers = The Good.
- A backtest cannot be used to estimate future expected returns. This is nonsense in our Classic Trend Following World as Outliers which are the principal driver of our returns cannot be predicted in advance = The Bad
- A backtest applied over a small sample of trades under a single market condition with a nice linear equity curve has no meaning for the longer term. In fact it encourages you to believe in illusions. Market conditions are non-stationary and are never persistent in nature = The Ugly
Being the Eternal Optimists, we are going to focus on “The Good”. Namely how we use Backtest outputs to:
- address any risk weakness in a Portfolio; and
- assist us in better distinguishing the difference between Trend versus Outlier.
How we Use Backtests to Mitigate Portfolio Risk
We have discussed this principle at length in prior posts but in a nutshell it goes like this.
While we can never forecast future returns given our predisposition to catching unpredictable Outliers, as Trend Followers we focus our attention on preserving capital at all times while waiting for these unforeseen events to emerge at the edge of the day to day churn of normal market activity.
Our predominant risk mitigation method that we deploy to manage adverse risk exposure on any single trade, is via our application of very small bet sizes in relation to the size of our trading capital. This ensures that any single trade is insignificant in relation to our overall capital, and in the event of an unforeseen risk event which is adverse to our cause, is very unlikely to result in compromising our trading ambitions. With many small equally sized bets, we can compile them into a portfolio and confidently trade the entire portfolio with the knowledge that no single return stream is going to place undue weight in terms of its negative impact on the portfolio.
In addition to our small bet size, which protects us from unforeseen adverse impacts when other risk mitigation methods are not respected (like stops), our initial stop which we apply to every trade, is our method of releasing risk ‘steam’ in a portfolio. This risk release mechanism ensures that no individual return stream carries warehoused risk and that our portfolios are always positioned to carry additional future risk.
Both the application of a small bet size PLUS an initial stop for all our trades ensures that our ‘left tail exposure’ is minimized, and that we never allow an adverse risk event to go Non-Linear on us. We always keep our losses to a linear sequence of small losses. We never let them become an adverse tail event. This is to be contrasted of course against the Non-Linear wins arising from an Outlier, which we seek using trailing stops that allow for unlimited profits that can be many multiples greater than any small loss in our Trade Distribution.
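To make the link between a small bet size and an initial stop concrete, here is a minimal sketch of fixed fractional sizing. The function name, the 0.25% risk fraction and the prices are all hypothetical values chosen for illustration, not figures from our own models, and the sketch assumes the stop is honoured at its stated price.

```python
def position_size(capital, risk_fraction, entry_price, stop_price, point_value=1.0):
    """Units to trade so that being stopped out loses about risk_fraction of capital.

    All parameter names and values here are illustrative assumptions.
    """
    risk_per_unit = abs(entry_price - stop_price) * point_value
    if risk_per_unit <= 0:
        raise ValueError("the initial stop must differ from the entry price")
    dollar_risk = capital * risk_fraction  # the small, consistent $ bet
    return dollar_risk / risk_per_unit


# Hypothetical trade: risk 0.25% of a $100,000 account with a stop 2 points away.
units = position_size(capital=100_000, risk_fraction=0.0025,
                      entry_price=50.0, stop_price=48.0)
loss_at_stop = units * abs(50.0 - 48.0)
print(f"units: {units:.2f}, loss if stopped: ${loss_at_stop:.2f}")
```

The wider the initial stop, the fewer units are traded, so the $ loss at the stop stays roughly constant from trade to trade regardless of the market.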
Our method therefore enforces positive skew into our trade distribution, where we find that our success is predicated on a handful of massive outliers. At all other times our small wins and losses effectively translate into a random trade sequence.
Now of course we can use our backtest to test if our method does actually translate into a positively skewed outcome where adverse tail risk is mitigated, and where we can occasionally be lucky and capture an outlier in our process.
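One simple way to run this check is to measure the skewness of the trade results that come out of a backtest. The sketch below uses a made-up list of trade results expressed in multiples of initial risk (R), and a plain population skewness formula; both the data and the choice of formula are assumptions for illustration only.

```python
import statistics


def sample_skewness(values):
    """Population skewness: the mean of cubed standardised deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical backtest output in R multiples: many small losses,
# a few small wins and a single large Outlier.
trades_r = [-1.0] * 12 + [-0.5] * 5 + [0.5] * 4 + [1.5, 2.0] + [18.0]

skew = sample_skewness(trades_r)
worst_loss = min(trades_r)
largest_win = max(trades_r)
total_r = sum(trades_r)

print(f"skewness: {skew:.2f}")
print(f"worst loss: {worst_loss}R, largest win: {largest_win}R")
print(f"total: {total_r}R, total without the largest win: {total_r - largest_win}R")
```

In this invented sample the worst loss is capped at the initial risk, the skew is positive, and the overall result turns negative once the single largest win is removed, which is the signature we would hope to see.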
A backtest is therefore very useful in this regard to test our assumptions and as a way to stress test our method under different market conditions. This is akin to a bank undertaking sensitivity tests to assess their ability to handle different economic regimes and ‘worst scenarios’.
A backtest is also very useful in determining a conservative position sizing to apply to your models which prevents undue impacts arising from leverage. Under extensive historical testing across different market regimes, you can observe via a backtest the result of aggressive position sizing and the volatility of your equity curve. You do not want your equity curves to be too volatile and compromise your geometric returns (compounding benefits).
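The effect of aggressive sizing on compounding can be sketched with a few lines of arithmetic. The return series below is invented, and the leverage multiples are arbitrary; the only point is that the average log growth per period (the geometric return) does not keep rising as position size is scaled up.

```python
import math


def geometric_growth(returns, leverage):
    """Average log growth per period when every return is scaled by leverage."""
    total = 0.0
    for r in returns:
        gross = 1.0 + leverage * r
        if gross <= 0:
            return float("-inf")  # the account is wiped out on this path
        total += math.log(gross)
    return total / len(returns)


# Hypothetical per-period returns at a baseline position size.
period_returns = [0.10, -0.05, -0.05, 0.12, -0.06, -0.04, 0.15, -0.05]

growth_by_leverage = {
    lev: geometric_growth(period_returns, lev) for lev in (0.5, 1, 2, 4, 8)
}
for lev, growth in growth_by_leverage.items():
    print(f"leverage {lev}: average log growth per period {growth:.4f}")
```

On this made-up series the growth rate improves up to a point and then deteriorates, turning negative at the highest multiple even though the arithmetic average return is positive throughout.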
Backtests are also a useful method for factoring in the impact of trading costs such as spread and holding costs (SWAP) to observe their overall impact on trading performance and in applying sensitivity tests when these costs are varied over history. There may also be other useful reasons for a backtest that I simply have missed in my haste to write this piece.
With no backtests to assess the efficacy of our process in managing adverse risk….then we are flying blind. So it is a useful exercise as a method to ‘stress test our assumptions’.
How we Use Backtests to Distinguish ‘Trends’ From ‘Outliers’
In a prior post we introduced you to the reason why Classic Trend Followers like to distinguish between what are referred to as ‘Trends’ versus what are referred to as ‘Outliers’. The distinguishing feature of an ‘Outlier’ is that it really stands out as a quantitative anomaly in the data series, whereas trends can be evident on any timeframe. However it needs to be noted that ‘Outliers’ can take multiple different forms and do not simply constitute exponential anomalies. They may represent extreme linear anomalies or combinations of different extreme forms. The point is that, no matter what the form, these directional anomalies really stand out in the series as exceptional events.
The ‘normal everyday trend’ is a far broader class of linear sequence and can be constructed from random data, from a segment of a larger mean reverting cycle associated with a convergent market condition, or from a data series that possesses serial correlation in the series that can provide enduring directional momentum to the trajectory. Unfortunately not all these different forms of trend are beneficial to our ambitions.
These 3 different forms of possible trend affect the equity curve of a Classic Trend Follower in the following ways:
- Random Trend – A trend derived from a series with no serial correlation = Produces a Random Equity curve (that could be unfavourable or favourable for an extended period before it inevitably decays under path dependence);
- Convergent Trend as a segment of a Mean Reversion cycle – A Trend of serially correlated data that is part of a larger mean reverting cycle which produces trend/counter-trend cycles about an equilibrium = Produces a deteriorating equity curve characterised by drawdown from the many ‘whipsaws’ received when attempting to apply breakout signals to this form of trend; and
- Divergent Trend with serial correlation- A Trend with persistent enduring directional momentum and found with Outliers = Favourable Equity Curves that are sought by the Classic Trend Follower.
We of course would like to capture the 3rd class of trend (namely the divergent trend), that offers enduring persistence, but this type of trend is far less frequent than we might surmise. By far the majority of visual trends in market data are the result of ‘noise’ or mean reverting market tendency. Unfortunately, it is impossible to distinguish which trend is which during the formative stages of a trend’s development, as a trending condition could be any of these possible 3 types of trend.
It is only when a trend starts to mature beyond the bounds of other ‘normal trends’ in a series that we can infer that there is some causative bias (aka serial/auto correlation) in that particular trending series, which is pushing it towards what we would regard as an ‘Outlier’ (an anomaly).
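As a rough illustration of what serial correlation looks like in numbers, the sketch below generates three synthetic return series and estimates their lag-1 autocorrelation. Treating the random, convergent and divergent cases as AR(1) processes with coefficients of 0, -0.3 and +0.3 is purely our simplifying assumption for the example; real market data is far messier and, as noted above, the classes cannot be told apart early in a trend.

```python
import random


def lag1_autocorrelation(xs):
    """Estimate the correlation between each value and the one before it."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def ar1_returns(phi, n, rng):
    """Synthetic returns where each value carries over phi times the previous one."""
    out = []
    prev = 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


rng = random.Random(42)  # fixed seed so the sketch is repeatable
ac_random = lag1_autocorrelation(ar1_returns(0.0, 20000, rng))
ac_convergent = lag1_autocorrelation(ar1_returns(-0.3, 20000, rng))
ac_divergent = lag1_autocorrelation(ar1_returns(0.3, 20000, rng))

print(f"random-like:     {ac_random:+.3f}")
print(f"convergent-like: {ac_convergent:+.3f}")
print(f"divergent-like:  {ac_divergent:+.3f}")
```

Only the series with positive carry-over shows the persistent directional bias we associate with a divergent trend; the other two give a breakout model little to hold on to.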
So a backtest is a useful device to describe the typical normal range of market activity within which we find these 3 classes of trend, namely the zone where all trends are about the same size. If we can identify this region where the three classes of trend reside, we can avoid this zone of price activity with the entry filters used by our trend following models (aka our lookbacks w.r.t. breakout models or other filters for different classes of trend entry). In doing so we can avoid the majority of noise and mean reversion that resides in the trending condition, and then we are well placed to capitalise on the 3rd class of trend that emerges from this normal range of market activity, namely a divergent trend that is more likely to possess directional serial correlation within its signature and possibly turn into an ‘Outlier’ when all is said and done.
While we appreciate that we can never predict the where or when of an outlier event, a long term backtest lets us statistically identify, and therefore exclude, that zone of a market’s range which leads to stagnation or deteriorating equity performance through the effects of random noise or ‘whipsaws’ (aka false breakouts) arising from mean reversion.
Typically, we find that this normal range of market activity can be defined within say 0-3 ATRs (Average True Range) on either side of current price (excluding those periods defined by anomalies), but we could use other Gaussian measures to distinguish the boundary of the normal distribution of returns.
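For readers who want to see the measurement itself, here is a minimal sketch of an ATR calculation and of expressing a price move in ATR multiples. It uses a simple average of the true range rather than Wilder's smoothing, and the bar data, the 4 bar period and the 3 ATR threshold are illustrative assumptions only.

```python
def true_ranges(highs, lows, closes):
    """True range per bar: the largest of the bar range and the gaps from the prior close."""
    trs = []
    for i in range(1, len(closes)):
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - closes[i - 1]),
                       abs(lows[i] - closes[i - 1])))
    return trs


def simple_atr(highs, lows, closes, period):
    """Simple average of the last `period` true ranges (an assumption, not Wilder's method)."""
    trs = true_ranges(highs, lows, closes)
    if len(trs) < period:
        raise ValueError("not enough bars for the requested period")
    return sum(trs[-period:]) / period


def move_in_atrs(price, reference_price, atr):
    """Signed distance of price from a reference level, in ATR multiples."""
    return (price - reference_price) / atr


# Hypothetical bars.
highs = [11.0, 12.0, 12.0, 13.0, 12.0]
lows = [9.0, 10.0, 10.0, 11.0, 10.0]
closes = [10.0, 11.0, 11.0, 12.0, 11.0]

atr = simple_atr(highs, lows, closes, period=4)
distance = move_in_atrs(price=18.0, reference_price=closes[-1], atr=atr)
outside_normal_range = abs(distance) > 3.0
print(f"ATR: {atr}, move: {distance} ATRs, outside 0-3 ATR zone: {outside_normal_range}")
```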
The region that is encapsulated by this normal distribution range of price action can be characterised by the bulk of price action. Within this zone there is a prevalence of trading behaviour that destructively interferes with a serially correlated trending signal.
However, when price manages to extend clear of this ‘disruptive’ zone…which is like the notion of a breakout from a congestion zone….price tends to become more elastic (volatile) and unidirectional in nature. At this boundary, we see that volatility typically expands when price manages to escape this ‘compressed zone’.
Within this more elastic zone away from everyday market activity, the impact of unidirectional serial correlation becomes evident as the predominant behaviour in these abnormal conditions becomes more uniform. Convergent traders are now hitting the ‘escape clause’ with their models or reducing their risk exposure, which turns destructive behaviour into confirmatory behaviour for divergence, and divergent traders are now aggressively entering the market. This creates positive feedback that makes price either ‘lift off’ or ‘fall like a stone’, which suits the long AND short divergent trader.
This boundary of elasticity is where we would like to place our first trend entry traps for our models. We use filters that help to place our traps at the boundary of these more elastic price zones. A backtest serves to help delineate these zones in the market data.
Here is how we use a backtest to define the models we will deploy that avoid the disruptive zone and focus attention on the elastic non linear zone where Outliers emerge. We obviously cannot use conventional metrics such as Standard Deviation, Sharpe, Sortino or Profit Factor to assist us as they do not have applicability in our uncertain world that resides in the Tails of the Distributions of returns. These measures are only applicable in more ‘predictable’ and stable domains of the market….so we need different metrics.
Our preferred metric is a risk adjusted metric such as the MAR or Serenity Ratio, which gives us very important information relating to the maximum degree of adverse risk exposure that our models face. These path dependent risk adjusted metrics seek an optimal balance between the reward and risk achieved by a Trend Following model over any possible historic path. If our selection method used criteria relating only to the historic returns of a strategy, it would be of no use to us with a different future ahead of us….however if the metrics we use provide a ratio that allows us to compare alternatives in terms of their past reward to risk relationship, then at least we are able to select the model that provided the least relative risk exposure for every $ earned by the model.
So let us assume we have a choice to make between 50 different Trend Following models when applied to a very large data set of real market data such as 40 years of data from 25 different real market data sets = 1000 years of historic data. You might say where did we get all this data? Well, because we normalise our models so they can be applied to any liquid market, then we have 40 years worth of data from 25 different markets that we can use to test the robustness of our models. We view different normalised market data sets as just different possible historic paths, so by using an array of them in our backtesting process we test our models across many different possible paths that history has presented to us.
So we have 50 models to choose from and we compare their MAR or Serenity Ratio performance over the complete 1000 years of market data to evaluate their relative performance prior to evaluating their compounded performance. It is essential that we evaluate our models on an ‘uncompounded basis’ so that any performance attribution from compounding impacts is eliminated.
What these ‘uncompounded’ ratios reveal to us is the relative performance of our models in terms of their reward to risk contribution to the portfolio. Any significant adverse risk events to our models will be reflected by significant adverse drawdowns over this extreme data sample, which will naturally dilute their risk adjusted performance. Choosing the models with the superior MAR or Serenity ratio will therefore define those candidates that have managed to achieve all the positive benefits received from Outliers in this common data set, and will also provide us with the superior performer in terms of risk management throughout the series. These ratios therefore help us select those candidates that avoid the noise and the mean reversion impacts that materially affect our Outliers.
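A minimal sketch of this comparison follows. The MAR ratio is conventionally the annualised return divided by the maximum drawdown; the uncompounded version here, which sums fixed size trade results and divides the average annual profit by the largest peak to trough decline in the same units, is our assumption about how to express it for the example. The two models, their trade results and the 2 year span are invented.

```python
def max_drawdown(equity):
    """Largest peak to trough decline of an equity curve, in the curve's own units."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def uncompounded_mar(trade_pnls, years):
    """Average annual profit divided by max drawdown, with no reinvestment of profits."""
    equity = [0.0]
    for pnl in trade_pnls:
        equity.append(equity[-1] + pnl)  # fixed bet size, so results simply add
    drawdown = max_drawdown(equity)
    if drawdown == 0:
        return float("inf")
    return (equity[-1] / years) / drawdown


# Hypothetical trade results (in units of initial risk) over the same 2 years.
models = {
    "model_a": [-1, -1, 8, -1, -1, -1, 6, -1],
    "model_b": [3, -6, 4, 5, -4, 3, 4, -1],
}
mar = {name: uncompounded_mar(pnls, years=2) for name, pnls in models.items()}
ranked = sorted(mar, key=mar.get, reverse=True)

for name in ranked:
    print(f"{name}: uncompounded MAR {mar[name]:.2f}")
```

Both invented models earn the same total, but the one that took the shallower drawdown along the way ranks higher, which is the kind of distinction a returns-only comparison would miss.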
A backtest further allows us to compare our equity curves produced against real market data to observe our relative performance against different forms of trend. So, for example, if we can observe a significant directional anomaly in the market data, then we would like to see a positive correlation between that directional anomaly and our equity performance. While it is very difficult to discriminate between the type of trend in its infancy, we certainly can in hindsight evaluate the performance of our models when presented with Outliers that are a significant anomaly in the series.
If we find that our backtest concludes that we have beneficially exploited all major anomalies in the series and have not been significantly hampered by drawdowns outside of these ‘anomalous periods’, then we can conclude that our method is doing what it is supposed to do.
So, there we have it. Not all backtests are bad. There is some use to them after all…however do not use backtests as a basis to forecast future returns cause if you do….then……..
Trade well and prosper
The ATS mob