Correlation is a rather tricky subject. Using visual inspection alone, what would you say is the correlation between the returns of security 1 and security 2 in the left chart, and between security 1 and security 3 in the right chart?
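The charts themselves are not reproduced here, but as a minimal sketch of what the question is getting at, a Pearson correlation can be computed in plain Python. The return series below are made-up numbers for illustration, not the data behind the charts:

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Illustrative daily returns (made-up numbers)
returns_1 = [0.01, -0.02, 0.015, 0.005, -0.01]
returns_2 = [0.02, -0.04, 0.03, 0.01, -0.02]  # = 2 * returns_1, so correlation is 1 up to rounding
print(pearson(returns_1, returns_2))
```

The point the charts usually make is that visual impressions of "how correlated" two series are can differ sharply from the computed coefficient.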

Sourcing, cleaning and storing data are often the most time-consuming steps of a data science pipeline. Using and calibrating models has become fairly convenient over the past decade, but the “plumbing” parts of the process can still be quite painful. When you experiment with new data to evaluate a potential project, it is a good idea to keep things very simple. Instead of setting up a full database, it might be worth managing and storing your data in flat files. I will describe two simple and effective methods today, both of which are far easier and quicker to use than a…
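The teaser cuts off before naming the two methods, so as a hedged sketch only, here are two common flat-file approaches in Python: CSV for portability across tools and pickle for fast round-trips of native Python objects. These are my stand-ins, not necessarily the two methods the article describes:

```python
import csv
import pickle
from pathlib import Path

# Illustrative records -- e.g. a small slice of daily price data
rows = [
    {"date": "2019-01-02", "close": 101.5},
    {"date": "2019-01-03", "close": 99.8},
]

# Option 1: CSV -- human-readable and portable across tools
csv_path = Path("prices.csv")
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "close"])
    writer.writeheader()
    writer.writerows(rows)

# Option 2: pickle -- preserves Python types, fast to write and read back
pkl_path = Path("prices.pkl")
with pkl_path.open("wb") as f:
    pickle.dump(rows, f)

with pkl_path.open("rb") as f:
    restored = pickle.load(f)
```

CSV is the safer default when other people or tools need to read the files; pickle is convenient for intermediate results that only your own Python code will touch.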

Statistics is not known for good jokes, but this one is worth a try as it touches on today’s topic: “There are two kinds of people in the world. 1) Those who can extrapolate from missing data.”

As any experienced data scientist will confirm, 80% of the work of building statistical and machine learning models is pre-processing one’s data properly. That includes steps such as obtaining, cleaning, wrangling, standardising and transforming it. If you import data from Bloomberg into an Excel spreadsheet you will likely have come across blanks and “N/A” in certain cells. Alternatively, connecting to an API with…
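The excerpt breaks off before showing any code, so here is a minimal, hedged sketch of the kind of cleaning step it alludes to: mapping blanks and "N/A" markers to missing values, then imputing with the mean. The raw values and the choice of mean imputation are illustrative assumptions, not the article's method:

```python
import statistics

# Raw values as they might arrive from a spreadsheet export:
# blanks and "N/A" strings mixed in with numbers (illustrative data)
raw = ["1.2", "N/A", "0.8", "", "1.0"]

def to_float(value):
    """Map blanks and 'N/A'-style markers to None, everything else to float."""
    return None if value in ("", "N/A", "#N/A") else float(value)

parsed = [to_float(v) for v in raw]  # missing entries become None

# One simple imputation strategy: replace missing values with the mean
mean = statistics.fmean(v for v in parsed if v is not None)
cleaned = [mean if v is None else v for v in parsed]
print(cleaned)  # the two missing entries are replaced by the mean (~1.0)
```

Mean imputation is only one of many strategies; dropping rows, forward-filling time series or model-based imputation are often better choices depending on the data.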

Nothing says festive season quite like a post on statistics and programming! Welcome to the inaugural Technical Tuesday, a fortnightly series on data science, machine learning and programming with Python. I am often asked about topics in those areas and if I have some good answers I plan to share them here — in a short and digestible format. Most articles will be written with finance use cases in mind, but many insights can be applied to other fields as well.

We’ll start with Z-Scores, or Standard Scores, today. Z-scores are ubiquitous in descriptive and inferential statistics. Moreover, they are…
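The teaser is truncated, but the core computation is simple enough to sketch: a z-score subtracts the mean and divides by the standard deviation. A minimal version (using the population standard deviation, one of two common conventions):

```python
import statistics

def z_scores(data):
    """Standardise a series: subtract the mean, divide by the population std dev."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    return [(x - mean) / std for x in data]

sample = [1, 2, 3, 4, 5]
print(z_scores(sample))  # symmetric around 0; the extremes sit near +/-1.41
```

After standardising, a value's z-score tells you how many standard deviations it sits from the mean, which is what makes z-scores so useful for comparing series on different scales.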

All you need is 25 years’ worth of Fed text data, Python and OpenAI’s new deep learning language model labelled “too dangerous to release”

It took an AI language model barely a few hours to learn how to create fake monetary policy statements like the ones regularly issued by the Federal Reserve. The open-source model can also mimic “Fedspeak” when given a few words with which a statement is meant to begin, e.g. “The recent stock market volatility”.

The AI model — OpenAI’s GPT-2 — then generates coherent texts that are almost indistinguishable from human-created Fed statements. Moreover, the model also taught itself the logic and reasoning that underlie monetary policy, e.g. …

The fastest-growing category of data is unstructured, e.g. text and images. In finance many still rely — almost exclusively — on traditional, numeric time-series of prices and fundamental data. How can we access these growing sources of untapped, alternative data? And how do we make sense of millions of documents of text that no human can process in any reasonable amount of time?

“Data is the new oil”

This article demonstrates how you can source and analyse such data. As an example, we want to scrape freely-accessible news articles about the oil company Royal Dutch Shell. …
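The article's actual scraping code is not shown in this excerpt. As a hedged, standard-library-only sketch of the parsing half of the job, here is how headlines could be pulled out of a fetched page with `html.parser`; the HTML snippet and the assumption that headlines live in `<h2>` tags are made up for illustration:

```python
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> tag -- a stand-in for article headlines."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())

# Illustrative page snippet -- a real scraper would fetch this with urllib or requests
html = """
<html><body>
  <h2>Shell reports quarterly results</h2>
  <p>Some article text...</p>
  <h2>Oil prices rise on supply cuts</h2>
</body></html>
"""

parser = HeadlineParser()
parser.feed(html)
print(parser.headlines)
```

In practice a library such as BeautifulSoup makes this far more convenient, and a real scraper must also respect the site's terms of use and robots.txt.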

Part II of a series on building a free, automated dashboard of real-time macroeconomic indicators

In part I of this series we created an automated process for sourcing and wrangling economic data, from convenient sources such as the Quandl API, through the slightly more challenging data sets of the Reserve Bank of Australia, all the way to really difficult sources such as the German Bundesbank.

In this part of the series we will perform some simple analyses and mainly focus on visualising the data we sourced and wrangled before. The main tools will be Python libraries such as Matplotlib and Seaborn.
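As a minimal sketch of the kind of chart such a dashboard is built from, here is a single Matplotlib panel rendered headlessly to a PNG; the series and labels are made-up stand-ins for the indicators sourced in part I:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, renders straight to file
import matplotlib.pyplot as plt

# Illustrative series -- stand-ins for the sourced economic indicators
quarters = ["Q1", "Q2", "Q3", "Q4"]
gdp_growth = [0.4, 0.6, 0.3, 0.5]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(quarters, gdp_growth, marker="o")
ax.set_title("GDP growth (illustrative data)")
ax.set_ylabel("% q/q")
fig.tight_layout()
fig.savefig("dashboard_panel.png")
plt.close(fig)
```

A dashboard is essentially a grid of panels like this one, regenerated automatically whenever the sourcing step pulls fresh data.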

The full code for both parts of this series can be found on my GitHub.

Part I of a series on building an automated dashboard that monitors the health of the global economy

Platforms like Kaggle offer a great way of learning about data science and applying machine learning algorithms to real problems. But such data science competitions often fall short in one regard: in most real-world projects you are not simply given all relevant data sets and they certainly won’t be delivered in one convenient CSV file.

Granted, in Kaggle competitions you have to do data wrangling too, e.g. dealing with missing values and converting categorical to numeric values. But the focus is clearly on optimising machine learning algorithms and model stacking. So let’s deal with the aspect of data sourcing, importing…

Crypto markets are going through another bear market. Almost anyone can make money in a bull market, but falling prices tend to separate the wheat from the chaff. The good news is that this also creates opportunities for both traders and investors. Traders who are bold and skilled enough to run short positions can generate significant returns in a downturn. Investors with patience get great entry prices for new positions or to add to existing positions.

1. Position sizing

To stay in the game, you must avoid positions large enough to ruin you. Nobody blows up because they are wrong…
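The excerpt breaks off before any numbers, so as a hedged sketch of one standard approach (not necessarily the article's rule), fixed-fractional sizing caps the loss at the stop to a set fraction of the account:

```python
def position_size(account_value, risk_fraction, entry_price, stop_price):
    """Units to trade so that a stop-out loses at most `risk_fraction` of the account.

    Fixed-fractional sizing: risk budget divided by the per-unit loss at the stop.
    """
    per_unit_risk = abs(entry_price - stop_price)
    if per_unit_risk == 0:
        raise ValueError("entry and stop prices must differ")
    return (account_value * risk_fraction) / per_unit_risk

# Risk 1% of a 100,000 account, buying at 50 with a stop at 45:
size = position_size(100_000, 0.01, 50, 45)
print(size)  # 200.0 units -> a stop-out loses 1,000, i.e. 1% of the account
```

The `abs` makes the same formula work for shorts, where the stop sits above the entry price.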

Marcel Dietsch

Head of Quantamental Analytics @MarexSpectron. Machine learning. Quant strategy. Commodities. Python 🐍. Ex-hedge fund PM. PhD @UniofOxford. Views my own.
