Open Source Intelligence x Alternative Data
What these data sources are and how your organization can use them
Throughout my tenure at Walmart, HSBC, and the European Commission & Parliament, I dedicated my efforts to establishing their initial capabilities for open-source intelligence (OSINT) and alternative data. By combining this data with internal information and machine learning, we were able to formulate strategies and make decisions that were more closely aligned with market opportunities and risks. This approach enabled us to achieve a level of speed and context that would not have been possible through expertise or intuition alone.
Open-source intelligence (OSINT) is intelligence derived from publicly available information. Examples include news articles, social media posts, economic, governmental, and market data, 10-K filings, government policies and records, and open-sourced corporate data such as Google search trends.
Alternative data can be signals from web traffic and transactional data, such as shopping and spending patterns by region, type of good, and type of consumer, as well as satellite images used to gauge regional growth or trade levels.
The key difference
Alternative data and Open-Source Intelligence (OSINT) are two different types of data that companies can use to gain insights into emerging trends and changes in the market. Alternative data is private and not publicly available, while OSINT is technically free and accessible. However, in most cases, it makes sense to buy OSINT data formatted and cleaned to enable faster analysis. When combined, these data points can provide an additional layer of insights that are impossible with internal data alone. By combining both types of data with internal data, companies can gain a 360-degree view of everything happening in the market, which is the holy grail for firms seeking to make informed decisions.
Once the data is in place, it’s possible to implement a technique called a “domain analysis,” which uses machine learning and augmented intelligence to analyze the massive amounts of data from OSINT, alternative, and internal data when it’s available. The technique can be used in simple or complex situations and will enrich any strategic decision.
Advantages
Save resources and time by finding patterns between products, events, markets, trends, and people of the world’s collective knowledge before deciding what to prioritize and invest in.
Less prone to bias, more contextual, and faster than heuristics.
Intelligence is cheaper and quicker than internal data, business expertise, or external consultants.
No wait times for internal, often outdated data. Insights on any topic from OSINT can be immediately collected and acted on.
Legacy vs. Zero Assumptions
In most companies, the top executives set the company's strategy and priorities. Then, teams gather information and data to support the initial strategy. This process usually involves using tools like Google Search, Excel spreadsheets, and sometimes business intelligence dashboards. However, the data collected often only includes financial information or market share comparisons. While Google can provide helpful search results and financial data, it is ultimately up to humans to analyze and interpret the information to evaluate the strategy's potential success.
There are major issues with this approach:
Each teammate's individualized Google search results, news, and social media feeds distort information, leading to no coherent center of truth.
The strategy is anchored by currently known information. Peripheral trends that are unknown yet still influence outcomes (call this "dark matter") are ignored.
Excel doesn't properly contextualize multiple data streams; fundamental data alone is just the tip of the iceberg.
As such, companies have a hard time making sense of the system of the market they are targeting. They favor defined albeit superficial intelligence that confirms pre-existing strategies and then optimizes them versus actively seeking to find flaws in the thesis and new opportunities for value creation. It is a core reason why all but a few businesses often appear tone-deaf, and many are upended daily by market volatility or disruption from competitive forces.
It's common for human-driven strategies to be executed before analyzing any data. This means that any refinements to the strategy using a data-driven approach can only be made during the next implementation. Unfortunately, this approach can make it difficult for firms to create value. Data science is usually applied to already-known information or processes, which creates an "optimization trap." This means that firms don't try to create value in any novel way from multiple signals before setting the strategy. The result can be uncompetitive products that are costly in terms of both time and money.
When conducting a domain analysis, it's important to start with a blank slate and avoid making assumptions about the core focus or topics. This is where machines come in handy, as they can scan open-source and alternative data to identify signals to help prioritize strategic goals within the domain. This approach can eliminate bias and achieve faster and more precise results than relying solely on human experience. Combined with human expertise, this technique produces unparalleled outputs since machines have the necessary processing power to consider all relevant variables that humans may overlook.
Domain Analysis
An excellent example of how the process works is by using advanced natural language processing and network analytics to cluster open-source intelligence (OSINT) documents that mention "Global Risk" before the COVID-19 outbreak. The domain analysis methodology quickly surfaced that pandemics were central to the broader domain of "Global Risk," and should be focused on. It also highlighted how pandemics are connected to more obvious narratives such as recession, oil, and the USD. While this intelligence wouldn't have stopped the COVID-19 pandemic, companies would have been better prepared and could have bought themselves more time to respond to it.
Furthermore, OSINT data from the World Bank and topological data analysis surfaced that Germany, Austria, and France were the most robust countries against COVID-19, and that these economies would probably open up sooner. Additionally, a review of Google Search Trends suggests the markets perhaps believed the FTSE was the most exposed exchange, given the high correlation between Google search volume and the VIX volatility index during the COVID pandemic, likely due to additional concerns over how the UK economy can cope with the combination of QE policies and Brexit post-lockdown. A two-week lead time can be worth hundreds of millions of dollars to a fund manager or corporation.
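To make the clustering idea from the "Global Risk" example more concrete, here is a minimal sketch of the network side of a domain analysis. It assumes each document has already been reduced to the set of themes it mentions (the natural language processing step is skipped), and the theme sets below are made up purely to show the mechanics. The function names are hypothetical, and weighted degree is used as a simple stand-in for whatever centrality measure a production pipeline would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each "document" is the set of themes it mentions.
# These sets are invented for illustration and are not real OSINT output.
DOCS = [
    {"pandemic", "recession", "supply chain"},
    {"pandemic", "oil", "recession"},
    {"pandemic", "usd", "oil"},
    {"recession", "usd"},
    {"pandemic", "supply chain"},
    {"cyberattack", "usd"},
]


def cooccurrence_edges(docs):
    """Count how often each pair of themes appears in the same document."""
    edges = Counter()
    for themes in docs:
        # Sort so that every pair is stored under one consistent key.
        for a, b in combinations(sorted(themes), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum edge weights per theme: a simple proxy for how central it is."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = cooccurrence_edges(DOCS)
centrality = weighted_degree(edges)

# Themes ordered from most to least connected within the domain.
ranked = [theme for theme, _ in centrality.most_common()]
for theme in ranked:
    print(f"{theme:<14} {centrality[theme]}")
```

The point of the sketch is the zero-assumptions posture: nothing in the code says which theme matters, and the ranking falls out of how the themes connect across documents. A real analysis would run on thousands of documents and would likely use a graph library and richer centrality or community-detection measures.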
What Can Organizations Do?
Start Now
The best way to learn is by doing, and one of the greatest upsides of OSINT over other types of data is that it can be accessed and leveraged immediately, with an upside, as shown in the visual below. This will get your firm ahead of others in your space - often very important - ask Walmart, who missed the boat on hiring machine learning talent in contrast to Amazon, who invested heavily in it (a 300B dollar mistake).
OSINT You Can Use Immediately
Google Trends is my first go-to in all cases. It's simple and lets users quickly compare the relative search volume of two or more terms. Google Trends is a free tool and is an excellent example of open-sourced intelligence by a corporation. Search volume is extremely predictive in politics, retail, and finance, as it can be used as a leading, not lagging, indicator. One technique that can be valuable is looking at how different topics or keywords contrast or correlate with one another, e.g., holidays, flights, credit cards, and mortgages. Businesses often look at these products in silos even though a variety of factors influence them. Google Trends can quickly quantify those hypotheses, especially for macroeconomic themes. Additionally, Google Trends has been deadly accurate in predicting electoral outcomes where the polls have failed, showing both Brexit and Trump coming out on top.
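One way to check the "leading, not lagging" idea is to correlate a search-interest series against a target series at several time shifts and see which shift fits best. The sketch below is only an illustration under stated assumptions: the two weekly series are made up (the target is built as a two-week-delayed copy of the search series), the function names are hypothetical, and in practice the search series would come from a Google Trends export. A high lagged correlation is a hint worth investigating, not proof of predictive power.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither series is constant (otherwise this divides by zero).
    return cov / sqrt(var_x * var_y)


def lagged_correlation(search, target, max_lag):
    """Correlate search[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]  # drop the last `lag` search points
        y = target[lag:]                 # drop the first `lag` target points
        result[lag] = pearson(x, y)
    return result


# Hypothetical weekly data, invented for illustration only.
search = [10, 12, 15, 30, 45, 40, 25, 18, 14, 12]
# Built so that target[t] = 2 * search[t - 2] + 1 from week 2 onward.
target = [20, 20, 21, 25, 31, 61, 91, 81, 51, 37]

corr = lagged_correlation(search, target, max_lag=3)
best_lag = max(corr, key=corr.get)

for lag, r in corr.items():
    print(f"lag {lag} weeks: r = {r:.3f}")
print("best lag:", best_lag)
```

The same function can be pointed at two search terms (for example, flights and credit cards) to see whether interest in one tends to move ahead of the other.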
Governments and International Organizations. The World Bank and Eurostat offer data on a variety of topics, including economic indicators, social indicators, and environmental indicators. Nonetheless, I’d suggest using Google Public Data Explorer, which aggregates all these data in one place and offers great visualization tools based on the legendary Hans Rosling’s Gapminder - also a great resource (below). Moreover, open-source data can be found on the U.S. open data portal and in numerous other open-source repositories like Kaggle, where you can find datasets for anything from natural language processing to computer vision.
Wolfram Alpha is a computational tool that provides data on various topics such as GDP and population growth. It also contains information about entities such as companies, places, and markets. Unlike a search engine, it answers factual queries directly by computing the answer from externally sourced "curated data." This makes it a reliable source for accessing Morningstar data, as well as projections. The current projections show a downward trend for HSBC and an upward trend for GS and JPM. These trends are good news for people in corporate strategy or investments.
The Globe of Economic Complexity shows the true scale of the world economy. It visualizes 15 trillion dollars of world trade. One node equals 100 million dollars of exported products and shows how those product spaces and countries are interconnected. It's the best example of economic data and complex systems visualized in their full reality. If you are unfamiliar with graph analytics, look at the Atlas of Economic Complexity, which the examples below are from.
Management
Technology is only half the story. To get the most out of these tools, managers must think differently, since most of these outputs will be new and abstract to an organization. It can take some time for people to get their heads around what's possible with machines and alternative data sets.
The Actual World Doesn’t Dashboard
The world is interconnected, ambiguous, and complex most of the time. As such, so are the data and outputs, as illustrated by the network below, which uses natural language processing to connect thematics from OSINT. Decision-makers should learn to defer judgment - not immediately revert to their heuristics if they do not understand machine-derived outputs. While humans crave black-white classification (like dashboards), accepting ambiguity and probabilistic thinking leads to better decisions.
How the Actual World Is
Understand the cost of indecision. Firms must be mindful that the value of a "good" or "ok" decision is highest at the beginning, and is often much greater than that of a perfect choice made later. This is because the margins of a competitive edge are becoming smaller but, at the same time, exponentially more valuable globally. As a result, developing decision architecture that lowers "information-to-action" times (similar to how traders look at financial markets) is essential and will be a driving competitive edge.
The level of expertise needed to beat machines gets higher every day. Managers must accept that the value of their expertise, and of insights drawn from their often static information, diminishes rapidly. Windows to exploit or maintain a market position open and close faster than ever, and political disputes that can threaten a business strategy, along with new competition, seemingly emerge from nowhere. As such, over the next five to ten years, the most successful companies will place the burden of proof on human intuition, not data. Even when machines are wrong, they are consistently wrong for the same reasons, making them less noisy and easier to fix. Ask five people to rank the most important drivers of a given topic, say interest rates, and you will get five different rank orders. Machines don't do that.
In short, there are a lot of applications for OSINT and alternative data, with more popping up every day. Firms must realize that their most valuable expertise will be synthesizing disparate signals in the data with machines that enhance their ability to find patterns and quickly pivot to where those lead, not any specific expertise or knowledge base. The difference in outcomes will be as drastic as that between a captain who has mastered the use of a compass and map and one who can only sail along a familiar shoreline when trying to reach the new world.
If you want to learn more about OSINT or have any questions, feel free to reach out. I am passionate about this subject and about helping firms get the most out of these technologies.