Published on Data Blog

The pandemic unfolds on Google (Part 1): A new global dashboard for COVID-19 monitoring


Image: Woman in a face mask using her computer (Engin Akyurt on Unsplash)

Highlights

  • This blog presents a global dashboard that compiles publicly available Google search data for almost 200 countries. Policymakers and the general public can use it to understand how internet search interest in different terms or phrases can help assess the spread of the coronavirus effectively and in real time.
  • Search interest for various symptoms of coronavirus is highly correlated with administrative COVID-19 cases and deaths as reported by the WHO, especially in countries with relatively well-developed reporting capacity. The correlations have been stable over time for the majority of countries, suggesting that even several months into the pandemic, when people fall sick, their information needs remain high.
  • Search interest precedes official data by more than a week, on average, and can anticipate spikes in the incidence of the disease. Instances of high search interest but low reported administrative cases suggest that the dashboard can warn of potential underreporting.

When we fall ill, many of us turn to Google to understand our symptoms and treatment options. Online searches for medical information have grown along with internet access, which globally stands at 59% today. Beginning with studies of flu outbreaks, researchers have found that Google search activity often mirrors actual disease burdens. Moreover, because people often search online for information about symptoms before (or instead of) seeking medical care, online search activity for symptoms can precede observed disease outbreaks. This means that search activity can potentially be used as an early warning system. Early warnings can help governments and health practitioners mobilize resources and prepare to respond, which is particularly crucial during the COVID-19 pandemic.

Beyond influenza, researchers have shown that Google search activity is correlated with the incidence of several other diseases, including dengue fever, diabetes, Ebola, Zika, and sexually transmitted diseases such as syphilis. For COVID-19, the search terms “COVID pneumonia” and “COVID heart” have been shown to correlate with daily COVID-19 incidence in the United States: search activity from 12 to 14 days earlier correlates most strongly with present-day cases, and search activity from 19 days earlier correlates with deaths. Search interest in “Loss of Smell” has been shown to correlate well with COVID-19 data across a number of countries.

While search activity for a variety of keywords often moves in patterns similar to officially recorded disease incidence, Google Trends data are best used as a complement to administrative data in most cases. Search interest in specific diseases can be driven by news, (mis-)information campaigns, or other events. The usefulness of search activity data also depends on geographic characteristics, including internet access and usage rates, population size, and linguistic fragmentation. However, in some circumstances, such as a rapidly evolving situation like the COVID-19 pandemic, where regions or countries may lack resources for widespread testing or where test processing may be delayed, Google Trends can provide the most up-to-date information available at scale.

In this blog, we first show that select search terms correlate strongly with officially recorded COVID-19 cases and deaths, using data from almost 200 countries. We then highlight two use cases of Google Trends: first, as an early warning system to detect possible future growth in cases or deaths, and second, to detect possible underreporting of COVID-19 cases by official sources. In Part 2 of this blog, we present a case study of Brazil that illustrates how the dashboard can be used effectively alongside additional Google Trends data.

Search Interest in Select Keywords Correlates Strongly with Official Data

We have developed a global dashboard, incorporating data on nearly 200 countries, to show how search interest in a variety of keywords correlates with COVID-19 cases and deaths, and to facilitate the use of these publicly available data for optimal national policy response and intervention.[1] In addition to search terms for COVID-19 symptoms, the dashboard includes search terms that capture the mental health impacts of COVID-19 (e.g., anxiety and loneliness), other potential consequences (e.g., unemployment and debt), preventive measures (e.g., face masks and social distancing), and treatments (e.g., ventilators).

Searches for COVID-specific symptoms, such as “Loss of Smell” and “Loss of Taste”, correlate strongly with administrative cases and deaths. For the 89 countries with recorded search interest in “Loss of Smell”,[2] the average correlation with COVID-19 cases is 0.66 (the same correlation is found with COVID-19 deaths; see Figure 1 for trends in the 89 countries). On average, the correlation is strongest when COVID-19 cases are lagged by 9 days relative to search interest, suggesting that rising search interest in “Loss of Smell” can give advance warning of growth in observed caseloads. The correlation also remains strong when only recent data are considered: for the period from June through October, the average correlation is 0.64. The correlation between symptom searches and official data is especially strong in countries known for aggressive testing strategies and well-developed death registries (e.g., Germany, United States, New Zealand).
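The lag analysis described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than the dashboard's own code (its methodology is documented on the dashboard page). It assumes that, for each country, daily search-interest and case series are already aligned by date; it computes the Pearson correlation between search interest on day t and cases on day t + lag, averages the per-country correlations at each lag, and picks the lag with the highest average. The function names, the 0 to 21 day search range (`max_lag=21`), and the synthetic data are our own assumptions.

```python
from math import isnan, sqrt

# Hypothetical sketch; not the dashboard's actual implementation.


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def lagged_correlation(search, cases, lag):
    """Correlate search interest on day t with reported cases on day t + lag."""
    if lag == 0:
        return pearson(search, cases)
    return pearson(search[:-lag], cases[lag:])


def best_average_lag(panel, max_lag=21):
    """Return (lag, mean_r): the lag whose cross-country average correlation is highest.

    panel maps a country name to a (search, cases) pair of equal-length daily series.
    """
    best = None
    for lag in range(max_lag + 1):
        rs = [lagged_correlation(s, c, lag) for s, c in panel.values()]
        rs = [r for r in rs if not isnan(r)]  # drop countries with a flat series
        if not rs:
            continue
        mean_r = sum(rs) / len(rs)
        if best is None or mean_r > best[1]:
            best = (lag, mean_r)
    return best


def synthetic_country(scale, days=60, lag=9):
    """Hypothetical data: cases echo search interest `lag` days later, times `scale`."""
    search = [(t % 17) * 3 + (t * t) % 11 for t in range(days)]
    cases = [0] * lag + [scale * v for v in search[: days - lag]]
    return search, cases


panel = {"Country A": synthetic_country(10), "Country B": synthetic_country(250)}
lag, mean_r = best_average_lag(panel)
print(lag, round(mean_r, 3))  # 9 1.0
```

Restricting each country's series to a date window (for example, June through October) before calling `best_average_lag` would reproduce the kind of recent-data check described above. Real series would of course give correlations well below the perfect 1.0 of this synthetic example.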

Figure 1. Trends in search interest in ‘Loss of Smell’ (using a 7-day moving average) and COVID-19 cases from February through October 2020

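Figure 1 plots search interest as a 7-day moving average, which smooths day-to-day noise in the daily series. The caption does not say whether the window is trailing or centered, so the sketch below assumes a trailing window (each day averages itself and the six preceding days); `moving_average` is a hypothetical helper name.

```python
def moving_average(values, window=7):
    """Trailing moving average: element i averages values[i - window + 1 .. i].

    The first window - 1 days lack a full window and are returned as None.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


daily = [0, 7, 14, 0, 7, 14, 0, 7, 14, 70]
print(moving_average(daily))
# [None, None, None, None, None, None, 6.0, 7.0, 8.0, 16.0]
```

A centered window would instead average three days on either side of each day, which avoids shifting peaks later in time but cannot be computed for the most recent three days.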

More generic symptoms such as “Fever” fare worse overall. However, these search terms are more widely available, and in certain contexts their trends closely track COVID-19 cases. For example, across the 184 countries where search data for “Fever” is available, the average correlation with cases is 0.24, yet in two countries (Thailand and Turkey) the correlation exceeds 0.8. While COVID-specific symptoms tend to work well across countries, these results show that search patterns, and therefore the search terms most useful for understanding potential growth in COVID-19 cases, can differ significantly between countries.

Figure 2. COVID-19 Cases and Search Interest in “Loss of Taste” (using a 7-day moving average) from February through October 2020


Given the strong correlation between search interest in COVID-specific symptoms and administrative records, instances of high search interest but low reported cases can warn countries of potential underreporting. For example, Figure 2 shows that Tanzania saw an increase in both COVID-19 cases and search interest in “Loss of Taste” from April to May 2020. Since then, however, no cases have been reported to the WHO, while search interest in “Loss of Taste” has persisted, suggesting that COVID-19 cases are still occurring. Similarly, rises and falls in search interest for “Loss of Taste” in Turkey have generally mirrored fluctuations in COVID-19 cases; in recent months, however, search interest in “Loss of Taste” has grown to significantly higher levels than in April.[3] High search interest in COVID-19 symptoms combined with comparatively small caseloads could suggest underreporting, a possibility that other research has also raised. Google Trends data cannot establish underreporting with certainty, because a variety of other factors can influence search interest. For example, both Côte d'Ivoire and Chile saw spikes in search interest for “Loss of Taste” in mid-October, but because these spikes were short-lived, they may be less suggestive of underreporting. Such discrepancies can nonetheless help policy planners identify areas for closer investigation.
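The discussion above treats discrepancies qualitatively rather than through a formal rule. As a purely illustrative sketch, one could operationalize “sustained high search interest with few reported cases” as a run of consecutive days in which the Google Trends index (which ranges from 0 to 100) stays at or above a threshold while reported cases stay at or below another. Requiring a minimum run length screens out short-lived spikes like those seen in Côte d'Ivoire and Chile. The function name and all thresholds below are hypothetical and would need calibrating country by country; this is not the dashboard's method.

```python
def flag_discrepancies(search, cases, search_threshold=50, case_threshold=0, min_days=14):
    """Return (start, end) day-index pairs (inclusive) of runs where search interest
    stays >= search_threshold while reported cases stay <= case_threshold for at
    least min_days consecutive days. Shorter runs (brief spikes) are ignored.

    Hypothetical heuristic; thresholds are illustrative assumptions.
    """
    runs = []
    start = None
    for t, (s, c) in enumerate(zip(search, cases)):
        if s >= search_threshold and c <= case_threshold:
            if start is None:
                start = t  # a candidate run begins
        else:
            if start is not None and t - start >= min_days:
                runs.append((start, t - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(search) - start >= min_days:
        runs.append((start, len(search) - 1))
    return runs


# Hypothetical series: a 20-day sustained episode, then a 3-day spike.
search = [10] * 5 + [80] * 20 + [10] * 5 + [90] * 3 + [10] * 5
cases = [5] * 5 + [0] * 33
print(flag_discrepancies(search, cases))  # [(5, 24)]
```

A flagged run is only a prompt for closer investigation, in line with the caveats above, not evidence of underreporting on its own.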

For more information on how Google Trends data can help fill gaps in COVID-19 administrative records, tune in for Part 2 of this blog, where we use the new dashboard and additional data to explore COVID-19 at the subnational level in Brazil.

 

[1] The methodology is presented on the global dashboard page. COVID-19 case and death data come from the WHO: https://covid19.who.int/table

[2] Google releases search interest data only for relatively popular terms. https://support.google.com/trends/answer/4365533?hl=en

[3] As of this writing, data in our dashboard is current up to October 31, 2020. In mid-November, new daily COVID-19 cases in Turkey began to increase rapidly. WHO data show that new daily cases rose from 3,116 on November 15th to 7,318 on November 25th, the highest daily count reported in Turkey to date. The rapid growth in search interest in “Loss of Taste” starting in mid-October may therefore point either to underreporting during that period or to the capacity of Google searches to anticipate an outbreak of COVID-19 cases.


Authors

Nausheen Khan

Research Analyst within the World Bank's Development Economics Vice Presidency

Manuel Ramos-Maqueda

Impact Evaluation Analyst, Development Impact Evaluation (DIME), World Bank

Robert Marty

Research Analyst, Development Impact Evaluation (DIME), World Bank

Arndt Reichert

Professor of Health Economics and Development Research, University of Hannover

Bibind Vasu

Data Science Intern, Development Impact Evaluation (DIME), World Bank
