A longer version of this article co-authored with Neil Seeman appeared as Privacy vs. Alpha: A conversation on Data Driven Investor. Apurv conducted the research when visiting at Harvard Business School earlier in 2019.
Imagine you are the CEO of a well-known asset management company. You are sitting in a meeting with a new data vendor, your new portfolio manager (PM) who is an expert in alternative data, and your compliance officer.
The Food Data Pitch
The data vendor has data showing exactly what each public company’s CEO and all their workers ordered for lunch every day since 1 January 2011. He shares a case study in which the vendor’s proprietary signal, built from patterns in the food data, predicted the merger of company X, which had been shopping for strategic partners, with conglomerate Y. The senior management of both public companies started eating more steaks, fries, and pizza three months before the merger went through. The vendor’s resident food scientist, an authority in the field, presents intriguing studies on how protein, sugar, and fat intake correlate with dopamine receptor activity, and on how changes in senior leadership’s food consumption patterns might signal important business events on the horizon. Even if the exact event remains unpredictable, it could be profitable to buy options ahead of time that gain value when volatility and activity in X and Y increase once the news breaks officially.
Your star portfolio manager is salivating. She is still asking all the hard questions, ranging from data quality and completeness to the high likelihood of back-fitting – “What about the cases where people just changed their food habits randomly and nothing happened?” – but you know from experience that she really wants the data. You also think the investors in your pension fund might enjoy the story: “Hey, we really know the companies in our concentrated equities portfolio. We even know what they are eating! A long protein and short empty carb strategy actually made 3% α in our concentrated event-driven strategy.” Insert a fun discussion about food, and you might just have retained a client that would otherwise have switched to smart beta.
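The PM's back-fitting question can be made concrete with a small calculation. The sketch below is only an illustration: every count in it is invented, and the function name `signal_precision` is hypothetical. It shows how a signal that looks impressive in a single case study can still be wrong most of the times it fires.

```python
def signal_precision(hits, false_alarms):
    """Share of signal firings that were actually followed by the event."""
    fired = hits + false_alarms
    return hits / fired if fired else 0.0


# Hypothetical counts, invented purely for illustration.
hits = 4             # diet shifted and a merger followed
false_alarms = 196   # diet shifted and nothing happened
misses = 16          # a merger happened with no diet shift beforehand

precision = signal_precision(hits, false_alarms)
recall = hits / (hits + misses)

print(f"precision: {precision:.1%}, recall: {recall:.1%}")
```

With these made-up numbers the signal is right on only a small fraction of its firings, which is exactly the information a vendor case study built on one successful merger leaves out.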
Alpha vs. Privacy
On the other hand, the data are expensive. And you are also concerned about privacy – would you want people to know what you are eating?
Your compliance officer points out that if no one else were using these data, you could be in trouble. Additionally, he raises the issue of informed consent: most people fail to really understand what they click ‘yes’ to.[1] To collect the data, the vendor had issued consent forms which the CEO and his colleagues had ticked off on their iPad menus when ordering their food. These forms did contain explicit consent information; but were they even read?
Are the costs and risks of these data worth the expected reward? “Simple”, says the PM. “To earn the higher returns, we must take more risk and pay more.” And sure, the privacy of a few people may diminish, but those CEOs and their colleagues signed up for a certain amount of risk when they took on public roles in a public company, right? Wouldn’t the slight potential diminution of a few people’s privacy justify more dollars in the hands of retirees?
[1] Cameron Kerry of the Center for Technology Innovation at the Brookings Institution recently wrote: “In a constant stream of online interactions, especially on the small screens that now account for the majority of usage, it is unrealistic to read through privacy policies. And people simply don’t.”
Figure 1: The Month 0 Expected Privacy vs. α Trade-off
Costs Arise
In the end you choose to obtain the data. You have to hire various specialist consultants to process them, and then everyone has to understand what those specialists are really saying; it seems that only the PM does. On the reward side, little has materialised. Yes, your firm espouses patience in investing, but it has been six months and your team has not found any systematic alpha signals that you honestly believe in.
It turns out that this lack of results from hyper-personalisation and vast amounts of analysis is not uncommon in other fields. As recently shown, while large sums of money are spent on targeted advertising — using sophisticated techniques from web bugs, to cookies, to browser and device fingerprinting — its effectiveness is unclear.
The Moral Hazard of Alpha and Privacy Breach
What if these hyper-personal data help generate some alpha for a few years and then there is a terrible leak that exposes sensitive information about people’s DNA to malicious hackers in a different country? How does the current incentive structure play out? If the firm is organised such that the PM and the quant team could reap significant benefits from the alpha signals derived from new data, and compliance or the CEO bear the long-tail risks in case of a leak, then there is room for moral hazard. The PM and the quant teams are incentivised to be aggressive – to procure and exploit the data – because they bear no personal risk.
And these long-tail costs can be real. Capital One™ found this out when its data were breached, and it is expecting to pay $100M-$150M in 2019 as a result. And if a big company known for being tech savvy could not protect its customers, what guarantee is there that a smaller asset management firm can?
What’s more, potential counters, like data anonymisation, fail to perform as expected. According to a recent paper, 99.98% of Americans would be correctly re-identified in any available ‘anonymised’ dataset using just 15 characteristics, including age, gender, and marital status.
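The intuition behind re-identification can be sketched in a few lines. This is not the method used in the paper cited above; it is a much simpler uniqueness count on a toy table, with hypothetical records and a hypothetical helper `unique_fraction`. It only shows how quickly people become unique as more ordinary attributes are combined, even after names are removed.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given attributes is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical toy records: no names, so the table looks 'anonymised'.
records = [
    {"age": 34, "gender": "F", "marital": "single",  "zip3": "021"},
    {"age": 34, "gender": "F", "marital": "married", "zip3": "021"},
    {"age": 34, "gender": "M", "marital": "single",  "zip3": "100"},
    {"age": 51, "gender": "M", "marital": "married", "zip3": "100"},
    {"age": 51, "gender": "M", "marital": "married", "zip3": "941"},
    {"age": 34, "gender": "F", "marital": "single",  "zip3": "606"},
]

one_attr = unique_fraction(records, ["age"])
three_attr = unique_fraction(records, ["age", "gender", "marital"])
four_attr = unique_fraction(records, ["age", "gender", "marital", "zip3"])

print(one_attr, three_attr, four_attr)
```

In this toy table nobody is unique on age alone, a third of the records are unique on three attributes, and every record is unique once a fourth attribute is added. Anyone holding a second dataset with the same attributes could then link the rows back to named individuals.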
Learning from Healthcare: the Risks of Exposing Personal Data in the Context of Urgency
So ingrained in healthcare research is the risk of data leakage that Faculties of Medicine have long had independently functioning Institutional Review Boards (IRBs) pre-assess data methods and potential datasets for demonstrable downstream benefit well before any research data are collected from human subjects.
Research projects built around methods that do not collect personal data – for example, a survey of hospital protocols on IT usage – can be declared exempt from in-depth IRB protocol reviews. That is a relief for researchers, because a full review can be taxing and can delay the research significantly. As such, there is an inherent incentive to avoid collecting personal data unless the ROI of linking datasets justifies it. Mapping clinical outcomes data to specific surgical procedures, for instance, can lead to important discoveries, such as finding that only one of many surgical procedures results in longer post-surgical survival. Such research requires personal data but is obviously worth the potential stress and delay of going before an IRB.
Healthcare has also seen the damaging effects of ‘filter failure’ (i.e. information overload): researchers have noted that the main problem is not that there is too much information, but rather that the current tools of managing and evaluating information are ill-suited to the realities of the digital age. Some of the major instances of filter failure are inadequate information retrieval systems in clinical settings, and the problem of identifying all relevant evidence in a complex, diverse landscape of information resources.
Designing a Way Forward: Privacy Friendly α?
We can flip the privacy vs. α paradigm in our initial example on its head to create a positive-sum situation. Imagine that, with the same dataset, you find that some of the food data, when aggregated, are useful for predicting store sales. Your firm’s PE arm could use this information to invest better in private companies with low analyst coverage. Perhaps you can do better seasonal correction for macro data by using aggregated information about consumption patterns – especially liquids – across various states during a heat wave. Such strategies do not rely upon individual users losing control over their personally identifiable data.
Other examples of privacy friendly investment or social α may be:
(1) Better ESG and corporate governance being enforced by using anonymised and aggregated data from surveys and forums about employee happiness and their judgment of management effectiveness.
(2) Better macro-economic predictions that help foster improved monetary policy and timelier asset allocation, especially in difficult economic times or for poorer countries when these government data tend to be poorly measured.
(3) Using anonymous surveys or data from explicitly public forums such as Twitter™ where users can provide feedback to the government about fiscal policy or nudge corporations to act more ethically. The sentiment expressed in such surveys and tweets tends to be noisy but leading and hence can also be used for investing when appropriately combined with aggregated credit card or other data or a prior strong view.
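The aggregation idea that runs through these examples can be sketched as follows. It is a minimal illustration under stated assumptions: the field names, the `aggregate_by_region` helper, and the `MIN_GROUP_SIZE` threshold are all hypothetical, and small-group suppression on its own is not a complete privacy guarantee. The point is only that the analyst receives regional totals, never individual rows.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 3  # hypothetical threshold; the real value is a policy decision


def aggregate_by_region(orders, min_group_size=MIN_GROUP_SIZE):
    """Total litres per region, dropping regions with too few distinct people."""
    people = defaultdict(set)
    litres = defaultdict(float)
    for order in orders:
        people[order["region"]].add(order["person_id"])
        litres[order["region"]] += order["litres"]
    # person_id is used only for counting and never leaves this function.
    return {r: litres[r] for r in litres if len(people[r]) >= min_group_size}


# Hypothetical drink orders during a heat wave.
orders = [
    {"person_id": "p1", "region": "TX", "litres": 1.5},
    {"person_id": "p2", "region": "TX", "litres": 2.0},
    {"person_id": "p3", "region": "TX", "litres": 0.5},
    {"person_id": "p1", "region": "TX", "litres": 1.0},
    {"person_id": "p4", "region": "VT", "litres": 3.0},
    {"person_id": "p5", "region": "VT", "litres": 1.0},
]

totals = aggregate_by_region(orders)
print(totals)
```

Here the region with only two distinct people is suppressed entirely, so its total cannot be used to infer what either person ordered, while the larger region still yields a usable consumption figure.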
Figure 2: Towards a Constructive Conversation. Privacy and α
Starting a Conversation
Investment professionals are taking note. Bill Kelly, the CEO of the Chartered Alternative Investment Analyst (CAIA) Association, agrees about the potential that alternative data offer:
Finding alpha in what have become highly efficient markets is increasingly a very difficult challenge. The advent of 1.2 million terabytes of alternative data represents a potential sea of inefficiency and the opportunity for alpha discovery.
Yet the future privacy risks concern Bill; he believes that as an industry we are better off self-policing:
The client, regulator, and legislature are mostly standing onshore observing and trying to understand what this all means. They will eventually weigh-in when hindsight is 20/20. The agency problem between the investment professional and the organisation that signs her cheque is real. It must be self-policed with the utmost caution, erring on the side of the consumer at a time when the definition of privacy has failed to keep pace with the rapid digitisation of our world.
As enthusiastic researchers who believe in the promise of data, we have two broad suggestions:
(1) A risk mitigation solution: Form a committee within the firm, modelled on the IRB structure in healthcare, and reduce the moral hazard by making the CIO, to whom the PM reports, also wear the hat of Chief Privacy Officer, so that she owns the long-tail risk as well as the alpha. A long-tail risk manager facing an ‘extinction-level event’ (ELE) after a data breach would therefore err on the side of demanding proof of alpha creation well before any sensitive data are procured. If compliance also reported to the CIO, compliance managers could move from being ‘if, then’ tick-box checkers[2] to enterprise risk managers who care about long-tail protection for all people who contribute their data to ‘free services’ (search engines, social media, etc.).
(2) A reward increasing solution: Examples of funds or companies that do not use individual data by design but are achieving investment alpha or improving outcomes for society – as suggested in the previous privacy-friendly α section – could also be powerfully effective in encouraging others to follow suit.
This is a complex topic but an important one. And we do not possess complete answers. Yet we hope to hear from all the stakeholders and return to healthcare for our concluding thought. In a sector that operates urgently and under high stakes, live by the Hippocratic Oath: First, Do No Harm.
[2] Example: Q: “Does the vendor scrape social media sentiment that can be matched to a person?” “If yes, then does the data vendor have a protocol in place to de-identify and scrub all personal identifiers?”
Apurv Jain’s latest venture MacroXstudio is focused on privacy friendly α. Previously he led efforts at MSFT towards predicting the macroeconomy and the financial markets with alternative data and artificial intelligence (AI) based techniques for five years. His prior experience includes managing a $3B portfolio as a PM, a senior researcher role at Bridgewater Associates, and options trading at Deutsche Bank. His academic appointments include being a visiting researcher at Harvard Business School and a Senior Data Scientist at Microsoft Research.
Neil Seeman is Founder, Chief Executive Officer and Chief Privacy Officer at RIWI Corp. (CSE: RIW), a global trend-tracking and prediction technology firm, and Senior Fellow at Massey College in the University of Toronto.
(The commentary contained in the above article does not constitute an offer or a solicitation, or a recommendation to implement or liquidate an investment or to carry out any other transaction. It should not be used as a basis for any investment decision or other decision. Any investment decision should be based on appropriate professional advice specific to your needs.)