# Technical Tuesday: Exploring financial market data with Z-Scores in Python

Nothing says festive season quite like a post on statistics and programming! Welcome to the inaugural Technical Tuesday, a fortnightly series on data science, machine learning and programming with Python. I am often asked about topics in these areas, and when I have good answers I plan to share them here in a short, digestible format. Most articles will be written with finance use cases in mind, but many of the insights apply to other fields as well.

We’ll start with Z-Scores, or Standard Scores, today. Z-scores are ubiquitous in descriptive and inferential statistics. Moreover, they are often a core methodological part of financial index construction. Standardisation is also a key step in machine learning to transform all input features to the same scale or distribution. (This ultimately helps deal with the curse of dimensionality, a tricky problem in ML.)

# Getting started: definition, formula and introductory example

Put simply, the Z-score shows how far away a data point is from the mean. More specifically, the distance of a data point (or raw value) x from the mean μ is measured in terms of standard deviations σ: z = (x − μ) / σ.

Let’s use an example to illustrate this. We have a sample of five people’s heights measured in centimetres. The mean height of our sample is 174.2cm and the standard deviation is 9.12cm (calculated with an n − 1 denominator, i.e. ddof=1, as it is a sample and not the population).

We’re interested in the Z-Score of George’s height of 189cm (which is not part of the above sample). With the below code block, which also creates the Pandas DataFrame table shown above, we manually calculate the Z-Score using the formula introduced earlier. The Z-Score for George’s height is 1.62, i.e. 1.62 standard deviations above our sample mean.
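The original table is not reproduced in this version of the post, so the heights below are illustrative placeholders chosen to match the article’s sample mean of 174.2cm (the sample standard deviation comes out marginally different, at roughly 9.15cm); the Z-Score calculation itself follows the formula above.

```python
import pandas as pd

# Illustrative sample of five heights (placeholder values chosen so
# the mean matches the article's 174.2cm)
heights = pd.Series([161, 171, 174, 180, 185], name="height_cm")

mean = heights.mean()       # 174.2
std = heights.std(ddof=1)   # sample standard deviation (n - 1 denominator)

# George's height is not part of the sample above
george = 189
z_score = (george - mean) / std
print(round(z_score, 2))  # 1.62
```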

# The challenges of sequential data

We could have calculated the Z-Score of Alice’s height (second row of the table) as well — order does not matter here. Many phenomena in nature have attractive statistical properties such as being close to normally distributed and stationary. By contrast, time series — and in particular financial market data — are more complicated in many ways. We need to take into account the sequential nature of the data, especially if we are planning to make predictions.

Imagine you have 30 daily data points from December 1st to 30th. If you calculate the mean and standard deviation for the whole period and use both to compute the Z-Score for December 10th, you have entered the statistical *danger zone*. This is called look-ahead bias (or data leakage): on December 10th you would not yet have had the other 20 “future” data points needed to calculate the mean and standard deviation for the Z-Score. If you do this, any predictive model will look much better than it actually is because you have peeked into the future. The first day on which you could legitimately use all 30 data points is the following day, December 31st. So in contrast to the earlier example with human heights, order does matter if the data is sequential!
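A minimal sketch of the difference, using synthetic data (shifting the expanding statistics by one day is one way to keep them strictly “prior”; the `min_periods=5` warm-up is my choice to avoid unstable early estimates):

```python
import numpy as np
import pandas as pd

# 30 synthetic daily data points, December 1st to 30th
idx = pd.date_range("2020-12-01", periods=30, freq="D")
s = pd.Series(np.random.default_rng(1).normal(100, 10, 30), index=idx)

# Leaky: mean and std over the whole month include "future" days
# relative to, say, December 10th
z_leaky = (s - s.mean()) / s.std(ddof=0)

# Leak-free: expanding statistics shifted by one day, so each day's
# Z-Score only uses data up to and including the previous day
mean_prior = s.expanding(min_periods=5).mean().shift(1)
std_prior = s.expanding(min_periods=5).std(ddof=0).shift(1)
z_safe = (s - mean_prior) / std_prior
```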

# Building early warning signals with the right look-back window: Lockdown news volume example

Our first time series example involves news volume for the keyword “lockdown” from Marex Spectron’s NowCast. In this case we need to make sure we only use means and standard deviations computed on data prior to the observation for which we wish to calculate the Z-Score. Finding the appropriate lookback methodology and windows is tricky business.

One option is to define fixed rolling lookback windows, e.g. the past 30 days or 6 months. For the below analysis of “lockdown” news volume I picked another option: the *expanding window* methodology, with data going back to 2015. The expanding method takes all data points going back to the first one for each calculation, instead of using a fixed rolling window (where data prior to the window is dropped as the window rolls forward).

This approach produces means and standard deviations that tend to react more slowly to changes than (often shorter) rolling windows; both are shown as the green and orange lines in the top part of the chart right below. Extreme short-term moves in the raw data we’re interested in are therefore compared to slower-moving means and standard deviations, which produces faster and bigger moves in the Z-Scores and highlights extraordinary developments. Z-Scores based on expanding lookback windows can therefore often work well as “early warning systems” when big shifts, or regime changes, happen.

The expanding window Z-Score above reacts very strongly to what look like little tremors in January 2020 when Wuhan and parts of China were locked down. In that sense it served well as an early warning system.

This is easily implemented with Python and the Pandas library. First we create three variables (e.g. *smoothing*) containing parameters (e.g. *14* days) in the first few lines, and then an empty *result* DataFrame. Next we calculate the mean using an expanding lookback window (where the DataFrame *df* contains the raw daily data points), followed by the standard deviation, also with an expanding lookback window. I set delta degrees of freedom (ddof) to 0 as the Marex Spectron NowCast data is considered population data rather than sample data (in contrast to the human height sample above). The last column, “zscore_expand”, implements the Z-Score calculation: subtract the “ma_expand” mean from our daily data points (or, to be precise, from their slightly smoothed 14-day moving average) and divide the result by “st_dev_expand”.
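The original code block is not reproduced in this version of the post, so the sketch below reconstructs it from the description above, with synthetic stand-in data; apart from *smoothing* and the column names, the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Parameter variables; smoothing and the 2015 start come from the article,
# the other names are hypothetical stand-ins
smoothing = 14              # days for the moving average of the raw data
start_date = "2015-01-01"
end_date = "2020-12-31"

# df stands in for the raw daily "lockdown" news volume from the NowCast;
# synthetic Poisson counts are used here for illustration
idx = pd.date_range(start_date, end_date, freq="D")
df = pd.DataFrame(
    {"news_volume": np.random.default_rng(0).poisson(5, len(idx))},
    index=idx,
)

result = pd.DataFrame(index=df.index)

# Expanding mean and standard deviation of the raw daily data;
# ddof=0 treats the NowCast data as a population, not a sample
result["ma_expand"] = df["news_volume"].expanding().mean()
result["st_dev_expand"] = df["news_volume"].expanding().std(ddof=0)

# Z-Score: the lightly smoothed (14-day moving average) series minus the
# expanding mean, divided by the expanding standard deviation
smoothed = df["news_volume"].rolling(smoothing).mean()
result["zscore_expand"] = (smoothed - result["ma_expand"]) / result["st_dev_expand"]
```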

# Spotting recurring outliers with Z-Scores: Trade war example

Let’s hope we will not have to face another pandemic and damaging lockdowns any time soon. Certainly 2020 has felt like a once-in-a-century outlier. Other market-moving developments such as the trade disputes between the US and China have been of a recurring nature. How do we know when it’s critical and when it’s a blip? If we use expanding lookback windows for our Z-Score calculation, the means and standard deviations tend to remain elevated for quite some time in both the *lockdown* and *trade war* cases. It can make sense to use shorter, rolling windows so the Z-Score does not “get used to” high levels and fail to highlight new, big moves.

The methodology and parameters (shown below in the code block) produce a reasonable result that picks up the first tremors in 2016 and 2017 quite well. The raw news volume in those years is tiny compared to what was about to come in 2018 and afterwards. The Z-Score also picks up the 2018 and 2019 spikes (in the absolute count) quite well — in contrast to what expanding window Z-Scores would have shown for those later years.

The key difference to the code block above is that we use the *.rolling()* method instead of the *.expanding()* method to calculate the mean and standard deviation. The *.rolling()* method also requires that we pass an argument: the length of the lookback window, expressed in the number of days, 350 in our case.
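A sketch of the change, assuming the same setup as the expanding example (synthetic stand-in data; the `_roll` column names are my placeholders):

```python
import numpy as np
import pandas as pd

window = 350    # rolling lookback window length, in days
smoothing = 14  # smoothing of the raw series, as before

# Synthetic stand-in for the daily trade-war news volume
idx = pd.date_range("2015-01-01", "2020-12-31", freq="D")
df = pd.DataFrame(
    {"news_volume": np.random.default_rng(0).poisson(5, len(idx))},
    index=idx,
)

result = pd.DataFrame(index=df.index)

# .rolling(window) replaces .expanding(): each statistic now only
# looks back over the past 350 days
result["ma_roll"] = df["news_volume"].rolling(window).mean()
result["st_dev_roll"] = df["news_volume"].rolling(window).std(ddof=0)

smoothed = df["news_volume"].rolling(smoothing).mean()
result["zscore_roll"] = (smoothed - result["ma_roll"]) / result["st_dev_roll"]
```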

# Comparing time series with disparate absolute levels: Inflation v Deflation example

Drawing on Marex Spectron’s NowCast data again, we are able to compare related topics such as growth and recession, supply and demand (for a range of commodities) and inflation and deflation. Often the news volume for these related topics might differ enormously. As the chart right below shows, daily mentions of inflation are five to ten times as high as those for deflation.

If we took the difference of the absolute figures we would end up with the result shown right below. In a time of minimal concerns about deflation (from mid-2015 onwards), the chart looks almost the same as the one for inflation itself, as the news volume for deflation is minuscule.

Should concerns about deflation resurface, it would be hard to detect this in the absolute difference chart above. One way to adjust the imbalance between the two is to standardise the time series for each topic by calculating the Z-Scores. We then subtract the Z-Scores (inflation minus deflation) and end up with the below chart showing the difference.

When the COVID/lockdown crisis hit in Q1 2020, the conversation swung strongly towards deflation, and this is only visible in the Z-Score chart below. The Inflation/Deflation net figure went to almost -3, i.e. three standard deviations below the mean of the past year.

Parts of the code will look familiar as we calculate the individual Z-Scores for inflation and deflation with a 350-day rolling lookback. As an aside, repeating the code to compute the two results violates the DRY principle of programming (Don’t Repeat Yourself). A reusable function could be written that processes the computation of both scores instead, but an explanation of this is beyond the scope of this introductory post. The second-to-last line of code takes the difference of both Z-Scores and the last line adds a horizontal line at zero.
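The code itself is not reproduced in this version of the post; the sketch below reconstructs it from the description, with synthetic stand-in data and hypothetical variable names (`zscore_inflation` etc.):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

window = 350  # rolling lookback window, in days

# Synthetic stand-ins for the NowCast daily news volumes; inflation
# mentions run well above deflation mentions, as in the article
rng = np.random.default_rng(0)
idx = pd.date_range("2014-01-01", "2020-12-31", freq="D")
df = pd.DataFrame(
    {"inflation": rng.poisson(50, len(idx)),
     "deflation": rng.poisson(7, len(idx))},
    index=idx,
)

# Z-Score for inflation (ddof=0: population data)
infl_mean = df["inflation"].rolling(window).mean()
infl_std = df["inflation"].rolling(window).std(ddof=0)
zscore_inflation = (df["inflation"] - infl_mean) / infl_std

# Z-Score for deflation; repeating the code like this is exactly
# what violates the DRY principle mentioned above
defl_mean = df["deflation"].rolling(window).mean()
defl_std = df["deflation"].rolling(window).std(ddof=0)
zscore_deflation = (df["deflation"] - defl_mean) / defl_std

# Difference of the two standardised series, then a horizontal zero line
zscore_diff = zscore_inflation - zscore_deflation
ax = zscore_diff.plot()
ax.axhline(0, color="black", linewidth=0.5)
```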

# Standardising inputs for Machine Learning

Apart from data exploration and time-series comparison, Z-Scores can also be used as a key data pre-processing step in Machine Learning pipelines. The three code blocks below describe a quick example. First, we import the preprocessing module from the widely used scikit-learn library, along with NumPy. We then create some dummy data in the shape of a real-valued matrix containing three feature vectors (columns) and three training samples (rows).

The next step is to instantiate *StandardScaler()* from the preprocessing module and then pass the training data set (X_train) into the *fit()* method, which computes the mean and standard deviation, as we did manually in our earlier examples. The second line of code performs the standardisation, i.e. it calculates the Z-Scores for each feature vector (column).

Finally, we print the re-scaled matrix and take a look at the first column, which shows Z-Scores of 0, 1.225 and -1.225. The mean of our raw inputs of 1, 2 and 0 equals 1, so a Z-Score of 0 for the first row entry is correct. The standardised matrix can now be fed into the next pre-processing steps or directly into the algorithm fitting part of the ML pipeline.
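The steps above can be condensed into one sketch; the first feature column (1, 2, 0) matches the description in the text, while the other two columns are placeholders of my choosing:

```python
import numpy as np
from sklearn import preprocessing

# Dummy training data: three samples (rows), three features (columns).
# The first column (1, 2, 0) is the one discussed in the text.
X_train = np.array([[1., -1.,  2.],
                    [2.,  0.,  0.],
                    [0.,  1., -1.]])

# fit() computes each column's mean and standard deviation
scaler = preprocessing.StandardScaler().fit(X_train)

# transform() performs the standardisation, i.e. the per-column Z-Scores
X_scaled = scaler.transform(X_train)
print(X_scaled[:, 0])  # first column: 0., 1.2247..., -1.2247...
```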

To summarise, we explained what Z-Scores are and then explored the basic steps of how to compute different variations of them, with practical examples using data from Marex Spectron’s NowCast. Standard scores can be used, for example, as early warning signals, to spot recurring outliers and to compare disparate time series. We also explained how Z-Scores are implemented in a machine learning pipeline using scikit-learn.

Future posts will include a whole range of issues — from data engineering questions like how to handle very large financial market data sets to machine learning applications to more time series analysis and other topics. This article is also cross-posted on LinkedIn.