Summary
• A new ECB working paper uses texts from European newspapers to create real-time nowcasts of GDP growth in Germany, France, Italy and Spain.
• It shows that text-based sentiment metrics improve on the forecasting performance of benchmarks such as PMI-based models and the ECB’s official GDP projections.
• The forecasting benefits of these text-based metrics are largest during turbulent times, and they work best as inputs into non-linear machine learning models like Neural Networks and Random Forests.
Introduction
Econometrics and machine learning appear to be intersecting increasingly often. That the former can tease out causation and the latter improve forecasting means macro-finance researchers now regularly combine them. We recently showed how this hybrid method can unlock information in the Wall Street Journal to predict economic activity.
Similarly, a new ECB working paper uses textual data from European newspapers to nowcast quarterly real GDP in the euro area. The authors use economics-orientated dictionaries and more general-purpose lexicon-based methods like VADER and AFINN to create daily news sentiment metrics. They pit their measure against the ECB’s official GDP projections and find:
• Newspaper sentiment indices substantially improve nowcasting performance versus PMIs, and these gains are largest in the first half of each quarter.
• Standard linear models (e.g., OLS) provide the best estimates of GDP in normal times, while non-linear machine learning models perform better when extreme shocks occur.
• The Google Translate API translates foreign language texts into English well: sentiment indicators built from the translated articles correlate strongly with those built from the original, untranslated texts.
High-Frequency Indicators
Over the last 12 months, we have covered several machine learning methods that researchers use to create high-frequency indicators. One was the OECD’s weekly growth tracker, which uses neural networks to fit Google search intensity to official quarterly GDP figures.
Another used Latent Dirichlet Allocation (LDA) topic modelling to create a monthly measure of news attention from Wall Street Journal texts. This high-frequency indicator was then added as an input to a traditional econometrics model.
Macro Hive also recently published a paper on using machine learning methods to predict recessions. We based this on work done by the Bank of England, which uses monthly macroeconomic data in Random Forest and Extreme Randomised Trees models.
The ECB paper goes further. It creates daily sentiment indicators from newspaper articles to produce daily, real-time GDP nowcasts. The authors run these nowcasts through both econometric and machine learning models to determine which produces the best estimates of GDP growth.
Creating Sentiment Indices
The daily sentiment metrics are based on economic, corporate or financial market articles published in newspapers across the ‘Big Four’ euro area economies. The authors collect them from the Factiva database, using 5mn articles in total spanning 1998-2021. All the articles are in their native language.
To obtain the sentiment measures, the authors can use language-specific dictionaries at a country level (e.g., an economics/business-specific dictionary for the German language). Or they can translate the texts to use English language sentiment measures. As most methods in the natural language processing literature focus on English, the authors choose the latter.
Creating a sentiment index derived from translated non-English texts is one of the paper’s USPs. The authors have two options. They can translate the full articles using the Google Translate API (they use Python’s ‘googletrans’ package). Or they can translate the various sentiment dictionaries from English to the native language. A key finding is that the sentiment indices produced by the two methods are highly correlated (Table 1).
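The comparison between the two translation routes boils down to a correlation between two sentiment series. The sketch below shows that calculation in plain Python. The function name `pearson` and both series are hypothetical illustrations, not the paper's code or data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only (not taken from Table 1):
# sentiment from translated articles vs. sentiment from a translated dictionary
from_translated_text = [0.10, -0.20, 0.05, 0.30, -0.40]
from_translated_dictionary = [0.12, -0.15, 0.02, 0.25, -0.35]

rho = pearson(from_translated_text, from_translated_dictionary)
print(f"correlation between the two indices: {rho:.3f}")
```

A value close to 1 would indicate that the choice of translation route makes little practical difference to the resulting index.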
There are six sentiment metrics in Table 1. These correspond to the English language sentiment dictionaries used by the authors. AFINN is produced by Nielsen (2011). It is a general-purpose dictionary used to measure sentiment, classifying words on an integer scale from positive (+5) to negative (-5). Tetlock (2007) designed HIV, which is also general purpose, but classifies words as either positive (+1) or negative (-1).
The other dictionaries (CGLM, LM, NKTGOS, and HL) are designed specifically for an economics application, but still classify words as either positive (+1) or negative (-1). Lastly, the authors also use one sentence-level sentiment dictionary, VADER. This measure works on a continuous scale from -1 to +1 because it takes the intensity of the expressed sentiment into account. It also recognises emojis!
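To make the word-level approach concrete, here is a minimal Python sketch of dictionary-based scoring. The two tiny lexicons, the tokeniser and the normalisation (sum of word scores divided by word count) are assumptions chosen for illustration. The paper's exact scoring rule may differ, and real dictionaries contain thousands of entries.

```python
import re

# Hypothetical toy lexicons, for illustration only
BINARY_LEXICON = {"growth": 1, "recovery": 1, "recession": -1, "losses": -1}  # +1 / -1 style
GRADED_LEXICON = {"good": 3, "outstanding": 5, "bad": -3, "catastrophic": -4}  # -5..+5 style

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text, lexicon):
    """Sum of the lexicon scores of all tokens, divided by the token count.

    Words missing from the lexicon score zero; empty text scores 0.0.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0) for tok in tokens) / len(tokens)

article = "Recovery gathers pace as growth offsets earlier losses"
score = sentiment_score(article, BINARY_LEXICON)
print(score)
```

The same function works with either lexicon. Only the scale of the word scores changes. A sentence-level tool like VADER goes further by adjusting for intensity, which this sketch does not attempt.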
The authors obtain actual sentiment scores daily. But because they want to nowcast quarterly GDP, they aggregate the sentiment scores by quarter. Then, at the start of every new quarter, they reset the indices to reflect only signals occurring within the relevant quarter. In other words, growth in any given quarter is only predicted by news in that quarter, not previous quarters. It is not a rolling average.
Practically, this means as the quarter progresses, the sentiment index averages over more days and therefore more articles. So, the sentiment measure starts noisy but smooths over the quarter (Chart 1). Notably, the authors construct the sentiment indices individually for each country and aggregate them using Eurostat’s GDP weights to create a euro area average. They also standardise them.
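The three steps described here (a running average that resets each quarter, a GDP-weighted euro area aggregate, and standardisation) can be sketched in Python as follows. All function names, the daily scores and the country weights are hypothetical. In particular, the weights are placeholders rather than Eurostat figures, and averaging daily scores (rather than individual articles) is an assumption.

```python
from datetime import date
from statistics import mean, pstdev

def quarter_of(d):
    """Return (year, quarter number) for a date."""
    return (d.year, (d.month - 1) // 3 + 1)

def within_quarter_average(daily_scores):
    """Running mean of daily scores that restarts at each new quarter.

    daily_scores: list of (date, score) tuples sorted by date.
    """
    out = []
    current, total, count = None, 0.0, 0
    for d, s in daily_scores:
        q = quarter_of(d)
        if q != current:  # new quarter: discard earlier news
            current, total, count = q, 0.0, 0
        total += s
        count += 1
        out.append((d, total / count))
    return out

def weighted_average(values_by_country, weights):
    """Weighted average across countries, normalising the weights used."""
    total_weight = sum(weights[c] for c in values_by_country)
    return sum(v * weights[c] for c, v in values_by_country.items()) / total_weight

def standardise(series):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("cannot standardise a constant series")
    return [(x - mu) / sigma for x in series]

# Illustrative inputs only
daily = [(date(2020, 3, 30), 0.2), (date(2020, 3, 31), 0.4),
         (date(2020, 4, 1), -0.6), (date(2020, 4, 2), -0.2)]
running = within_quarter_average(daily)

placeholder_weights = {"DE": 0.4, "FR": 0.3, "IT": 0.2, "ES": 0.1}  # not Eurostat data
euro_area = weighted_average({"DE": 0.1, "FR": 0.2, "IT": -0.1, "ES": 0.0},
                             placeholder_weights)
z_scores = standardise([1.0, 2.0, 3.0])
```

In this toy input the running average restarts on 1 April, so the first reading of the new quarter rests on a single day of news. That is why the index is noisiest early in the quarter.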
Nowcasting in an Econometric Framework
On any given day, the authors want to predict quarter-on-quarter GDP growth using the daily sentiment score derived above and another high-frequency indicator, PMIs. They compare this model’s performance to one with only PMIs and another that contains the ECB’s latest official projection. They measure performance as the mean squared error (MSE) between the nowcast and the actual data.
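As a rough illustration of how such a horse race is scored, the Python sketch below computes the MSE of two competing sets of nowcasts against realised growth. The model labels and every number are invented for the example. None of them comes from the paper.

```python
def mean_squared_error(actual, predicted):
    """Average squared gap between realised values and nowcasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up quarterly growth rates (%) and nowcasts, for illustration only
actual_growth = [0.4, 0.3, -0.2, 0.5]
nowcasts = {
    "pmi_only": [0.2, 0.4, 0.1, 0.3],
    "pmi_plus_sentiment": [0.3, 0.3, -0.1, 0.4],
}

errors = {name: mean_squared_error(actual_growth, preds)
          for name, preds in nowcasts.items()}
best = min(errors, key=errors.get)  # model with the smallest MSE
print(errors, best)
```

Repeating this calculation for nowcasts made on each day of the quarter would trace out how the error evolves as more information arrives.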
The authors only publish the results from the CGLM and VADER sentiment metrics (Chart 2). First, the error decreases throughout the quarter, which is consistent with more information becoming available throughout the period. Second, the text models have a smaller nowcast error than the PMI model and ECB projection, but only in the first half of the quarter.
Non-Linear Models
We know the relationship between economic activity and soft indicators (e.g., PMIs and texts) is non-linear. So, the authors check nowcast performance in four non-linear models. Three are machine learning models: Random Forests, Neural Networks and a boosting algorithm. The other is a Ridge regression.
During normal times, traditional PMIs and the ECB’s projections are better estimates of growth than text-based sentiment measures. However, during the Great Recession, the text-based CGLM model produces consistently more accurate nowcasts. Meanwhile, during the Covid-19 crisis, the VADER metric performed better.
The results have two key implications. First, text-based measures are excellent forecasters of growth during economic shocks. Second, the type of dictionary used is important. The VADER dictionary is general purpose rather than designed for an economics application, and it did better during Covid. The CGLM dictionary was designed for economics, so it outperformed during the global financial crisis (GFC).
In terms of the modelling strategy, all non-linear models provide better projections than the linear one in turbulent times, and the machine learning models do better than the Ridge regression. In normal times, the Ridge regression delivers the best non-linear nowcasts, but little difference exists between the linear and non-linear models in this instance.
Bottom Line
The ECB paper is the first to study the role of translating non-English texts in constructing economic sentiment metrics. It finds little difference between translating the full texts from the native language to English and translating the dictionary words from English to the native language.
The authors also show how important the choice of dictionary is in constructing sentiment measures. Ones designed with economics in mind provide better forecasts during financial crises. Then, once the text-based sentiment measures are added, predicting GDP is very easy in normal times – just use a standard OLS regression. But in turbulent times, consider using a machine learning model, like the one Macro Hive has designed!
Sam van de Schootbrugge is a Macro Research Analyst at Macro Hive, currently completing his PhD in international finance. He has a master’s degree in economic research from the University of Cambridge and has worked in research roles for over 3 years in both the public and private sector.