It took an AI language model barely a few hours to learn how to create fake monetary policy statements like the ones regularly issued by the Federal Reserve. The open-source model can also mimic “Fedspeak” when given a few words with which a statement is meant to begin, e.g. “The recent stock market volatility”.
The AI model — OpenAI’s GPT-2 — then generates coherent texts that are almost indistinguishable from human-created Fed statements. Moreover, the model also taught itself the logic and reasoning that underlie monetary policy, e.g. that a contraction in the housing market may cause the Fed to lean towards more accommodative policy.
This development illustrates the breakneck speed of innovation in artificial intelligence in general and the subfield of natural language processing (NLP) in particular. Some are calling it the “ImageNet moment for language”, referring to the breakthrough in computer vision in the early 2010s that massively boosted technologies like self-driving cars.
The opportunities created by such models, which can teach themselves to “understand” language and generate human-like texts and speech, are huge. For example, the models can help us summarise vast amounts of data, highlight the most important information and extract sentiment — all instantly. The quality of information extraction could be 10x better than today’s methods because the models have a much deeper understanding of language. This could be a crucial decision-making advantage in business, financial markets and politics.
The GPT-2 deep learning model was trained by OpenAI on more than 40GB of English-language text data — a process that is both very time-consuming and expensive. The small and medium-size trained models were open-sourced here. Transfer learning then enables us to take the pre-trained model — which taught itself how the English language “works” (the source task) — and fine-tune the model weights for a different target task.
It even works for Shakespeare
An example of this is fine-tuning the model on thousands of pages of Shakespeare’s works in order to generate “Shakespeare-like” text. This is a sample output:
In order to create AI-generated Fed Statements we need to go through the following steps:
- Data and tools — scrape and clean statements from the Fed website
- Quick exploratory data analysis (EDA)
- Loading and fine-tuning the GPT-2 language model
- Examine results — generic Fed statements and those with a given prefix
Data and tools
We need to scrape, clean and analyse the text data with which we will fine-tune the GPT-2 language model. To that end, we scrape the last 25 years’ worth of statements from the website of the Federal Reserve. Here is a recent example:
The tools we use from the Python ecosystem include Jupyter notebooks, the NLP library spaCy, Google’s cloud-hosted notebook Colab and the deep-learning library Tensorflow.
The below Python script implements the web scraper and performs data cleaning, mostly with Regex operations (lines 18–33). The scraping code can be found in a different package — full code on my Github. Finally, we have to save the scraped text as a single string in a file that we feed to the model.
More details on data cleaning
Data cleaning and wrangling are very important and often the most time-consuming parts of data science and machine learning projects. So let’s look at this in a bit more detail here. A lot of traditional natural language processing involves heavy manual cleaning (tokenisation, stemming, lemmatisation, removal of stopwords and punctuation etc). There are lots of articles on this to be found on the web, here’s a tutorial on text data I published recently (and another one on cleaning numeric data too).
However, in this case we want to preserve the text as it is, because the language model needs to teach itself how exactly the Fed formulates its statements. The only cleaning needed relates to things that we couldn’t iron out in the scraping process. Regular expressions (Regex) can help us with this. Lines 19 and 21 present two different options for using Regex with pandas — both achieve the same thing.
Over the past 25 years the Fed has occasionally changed the way it phrases and presents its statements. For example, certain bits of text may appear before and after the main body of text we’re interested in. The scraping script cannot pick up all of these changes, so we use a list of regex_terms that are removed by calling the function regex_clean in the below script.
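A minimal sketch of this kind of cleanup is shown below. The patterns here are illustrative stand-ins — the actual regex_terms list lives in the repository:

```python
import re

# Illustrative patterns -- the real regex_terms list targets boilerplate
# the scraper could not strip (release headers, footers, navigation links)
regex_terms = [
    r"For immediate release",
    r"Last update:.*",            # trailing "Last update: ..." footer lines
    r"Home \| News and events",   # navigation breadcrumbs
]

def regex_clean(text, patterns=regex_terms):
    """Remove boilerplate patterns and collapse whitespace/line breaks."""
    for pattern in patterns:
        text = re.sub(pattern, " ", text)
    # collapse \r, \t, \n and runs of spaces into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Applied to the raw scraped text, this turns the messy example shown further below into clean, continuous prose.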
A few lines of Regex operations may not look like much but they’re an incredibly powerful tool.
Exploratory data analysis
The Jupyter notebook has a few “before and after” examples of data cleaning. The scraped data does indeed require some cleanup, because we wouldn’t want the GPT-2 language model to learn to put line breaks (“\n”) everywhere when it generates its own text!
For immediate release\n\n\n\n\n\n\r\nThe Federal Open Market Committee decided today to ease the stance of monetary policy slightly, expecting the federal funds rate to decline 1/4 percentage point to around 5-1/4 percent.\r\n\r\n\tThe action was taken to cushion the effects on prospective economic growth in the United States of increasing weakness in foreign economies and of less accommodative financial conditions domestically.
\r\n\r\n\tThe discount rate remains unchanged at 5 percent.\n\nThe discount rate remains unchanged at 5 percent.\n\n1998 Monetary policy\n\n Home | News and events\nAccessibility\n\nLast update: September 29, 1998, 2:15 PM
Here are some aggregate statistics regarding our now clean data:
Total word count: 62,820
Total article count: 169
Average number of words per article: 371
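These figures can be reproduced with a few lines of plain Python; the sketch below assumes statements is the list of cleaned statement strings:

```python
def corpus_stats(statements):
    """Aggregate word-count statistics over a list of cleaned statements."""
    word_counts = [len(s.split()) for s in statements]
    total_words = sum(word_counts)
    n_articles = len(statements)
    return {
        "total_words": total_words,
        "total_articles": n_articles,
        "avg_words_per_article": round(total_words / n_articles),
    }
```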
We also apply a spaCy NLP pipeline, which removes stopwords and lemmatises the words. The 15 most common words across all statements of the past 25 years are shown below. As an interesting aside (at least for economists!), the second most common word is “inflation”. The Fed’s mandate is two-fold: maximum sustainable employment and price stability. So inflation, which relates to price stability, is a more common word here than employment: with 916 occurrences to 169, inflation is mentioned 5.4 times more often than employment.
[('Committee', 1237), ('inflation', 931), ('rate', 776), ('economic', 746), ('market', 544), ('Federal', 516), ('condition', 461), ('percent', 458), ('continue', 440), ('price', 438), ('growth', 432), ('remain', 400), ('federal', 392), ('fund', 385), ('policy', 366)]
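The real frequency list comes from spaCy’s pipeline; a simplified stdlib-only sketch — skipping lemmatisation and using a tiny illustrative stopword list — looks like this:

```python
from collections import Counter

# Tiny illustrative stopword list -- the real pipeline uses spaCy's
# built-in stopwords plus lemmatisation
STOPWORDS = {"the", "to", "of", "and", "a", "in", "that", "is", "at"}

def most_common_words(text, n=15):
    """Count non-stopword tokens, keeping case as in the original analysis."""
    tokens = [t.strip(".,;:()") for t in text.split()]
    tokens = [t for t in tokens if t and t.lower() not in STOPWORDS]
    return Counter(tokens).most_common(n)
```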
Finally, let’s look at a dispersion plot for certain words from 1994 until 2019 (the x-axis shows the number of words from 0 to 62,820, which can double as a rough timeline too). Interestingly the word “inflation” was almost always mentioned but “employment” only came up regularly around the global financial crisis of 2008. “Housing” did come up a lot in the mid-2000s.
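The dispersion plot itself is drawn with a plotting library, but the underlying data is just the token offsets at which each target word occurs. A sketch of that computation, assuming the 25 years of statements are concatenated into one string:

```python
def word_offsets(text, target):
    """Return the token positions at which `target` occurs
    (case-insensitive) -- the x-coordinates of one row of a
    lexical dispersion plot."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    return [i for i, tok in enumerate(tokens) if tok == target.lower()]
```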
Loading and fine-tuning the GPT-2 language model
Now that we have the cleaned text, we can feed it to the model for fine-tuning so it can learn “Fedspeak”. We use the below Python package, which wraps the GPT-2 model fine-tuning and text generation scripts.
A complete Google Colab notebook with the code is available here. The main reason for using Colab is that Google kindly provides every user with a free GPU, which is by far the fastest way of doing the fine-tuning. Click the “Open in Colab” link if you want to run the notebook yourself. Here’s a quick screencast of the fine-tuning process run in Colab:
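The core of the notebook boils down to a handful of calls to the wrapper package (gpt-2-simple). The sketch below shows the typical sequence; the file name, step count and sampling parameters are assumptions, not the exact values used in the notebook:

```python
import gpt_2_simple as gpt2

# Download the pre-trained medium-size model
# (named "345M" in earlier releases, "355M" in later ones)
gpt2.download_gpt2(model_name="355M")

# Fine-tune on the cleaned Fed statements (file name and step
# count are illustrative -- adjust to taste and GPU budget)
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="fed_statements.txt",
              model_name="355M",
              steps=1000)

# Generate a few fake statements, optionally seeded with a prefix
gpt2.generate(sess,
              prefix="The recent stock market volatility",
              length=200,
              temperature=0.7,
              nsamples=3)
```

The same generate call without a prefix produces the “generic” statements discussed next.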
The results: AI-generated Fedspeak
Here’s a fake monetary policy statement, which sounds very similar to what the Fed might actually say. Moreover, please check out this link to my Github, which has around 100 such model-generated statements.
In response to the deterioration in the labor market, the Committee decided to extend the average maturity of its holdings of securities. The Committee will regularly review the size and composition of its securities holdings and is prepared to adjust those holdings as appropriate. The Committee also decided to keep the target range for the federal funds rate at 0 to 1/4 percent and currently anticipates that economic conditions — including low rates of resource utilization and a subdued outlook for inflation over the medium run — are likely to warrant exceptionally low levels of the federal funds rate at least through mid-2013. The Committee will continue to assess the economic outlook in light of incoming information and is prepared to employ its tools to promote a stronger economic recovery in a context of price stability.
Giving the algorithm a prefix to start a statement
Here we can give the model a word or phrase with which to start (‘seed’) the first sentence. It should be something within the monetary policy context, otherwise the result will be odd or funny — or possibly both.
“The stock market has continued to rally, supported by solid growth in employment and incomes and a reduction in excess inventories…”
“Global risks to the economic outlook have shifted in recent months. Inflation and longer-term inflation expectations remain well contained. The Committee perceives the upside and downside risks to the attainment of both sustainable growth and price stability for the next few quarters to be roughly equal…”
“Copper futures have increased in recent weeks, and some analysts have raised concerns about the potential for inflationary imbalances that could undermine economic growth…”
“Asset prices have risen further, but longer-term inflation expectations remain well contained...”
“The asset purchase program is providing these funds to support mortgage lending and housing markets, and it stands ready to expand the program as conditions warrant...”
“The global financial crisis demands a rapid response of monetary policy. The Committee is concerned that this conflict could lead to a continuing deterioration in business conditions that could contribute to inflationary imbalances in the economy that could undermine the favorable performance of the economy and therefore supports the adoption of stringent measures…”
Most of these statements again sound pretty much like what the Fed would say. As with the generic ones above, there is another text file for you to check out with more statements that start with a specific word or phrase.
How to trip up the model and spot fake text? Try “Trump”
The AI-generated texts might be coherent and very similar to human-created ones. In many cases the model also gets the logic and reasoning right, e.g. a “volatile stock market” or “housing market contraction” is correctly associated with Fed action towards “easing of monetary policy” and “promoting sustainable economic growth”.
However, when we bring the words “President Trump” into the picture, the model seems to get things wrong. The first statement below starts with the highlighted prefix, but the paragraph ends with 2010! The issue is obvious: Obama was president in 2010, not Trump. My guess is that the model learnt that Obama was president then, knows that Trump is president now, and swapped the names given that the context is similar.
President Trump lifted the temporary restrictions placed on certain foreign investors and temporarily suspended the capital gains tax on the U.S. Treasury securities market.[...] In order to promote a smooth transition in markets, the Committee will gradually slow the pace of its purchases of both agency debt and agency mortgage-backed securities and anticipates that these transactions will be executed by the end of the first quarter of 2010.
Here’s another example where the model probably took something Obama said about supporting the Fed and put in “Trump”. The tweet below shows one of the many examples where Trump criticised the Fed harshly in public. This obviously doesn’t chime with the model-generated text at all.
“President Trump today expressed strong support for the policies of the FOMC. In taking the discount rate action, the Board approved requests submitted by the Boards of Directors of the Federal Reserve Banks of New York, Philadelphia, Atlanta, Chicago, St. Louis, Minneapolis, and San Francisco…”
Bearing in mind that fine-tuning took barely more than an hour, the results are still remarkably good, and with some light human editing they could be close to perfect.
OpenAI has decided not to release the really big language models that produce even higher-quality text than the medium-size models used here — yet. The obvious concern is that these larger models might be used to generate unprecedented amounts of fake news that can no longer be identified as such because it is too similar to human-written text. This is a real risk, but possible solutions are in the works, e.g. verified sources and identification systems.
The unrealised potential of such self-trained language models is significant, because the algorithm develops a real understanding of the text. The use case of summarisation alone could be huge: the model processes large amounts of text (in English and other languages) and instantly summarises the content in “its” own words. This would benefit all kinds of decision makers — from foreign policy to trading in financial markets.