Working with Data? Here's How to Verify Your Sources and Numbers.

By: Amr Eleraqi | 09/01/2021

Amr Eleraqi is a former ICFJ Knight International Fellow.

The year 2020 wasn’t just dominated by the pandemic. It was also a year of open data. 

Many health-related organizations published daily and real-time updates about the spread of the virus around the world, circulating an unprecedented amount of numbers and figures. The challenge for journalists has been to analyze this information accurately, and communicate their findings to the public effectively.

It’s imperative that journalists first understand the data they’re working with. While there is often a rush to publish in today’s non-stop news cycle, doing so inaccurately does more harm than good. During a crisis like COVID-19, data can help raise critical awareness among the public. But if mishandled, it can place people at greater risk. 

Always analyze numbers with healthy skepticism. As journalists, we should investigate when and where the data we use originated. We should determine who originally collected and published the numbers, as well as who funded the work.

Journalists must also fix illogical or missing values, and clean up mislabeled figures. These errors may occur during the data entry process, whether done manually or automatically.

The Jordanian Ministry of Health, for example, used to manually enter some COVID-19 test results that didn’t automatically get uploaded into the government database. As the number of daily cases increased, results were lost, and mistakes were made involving names and their samples, former Jordanian Health Minister Saad Jaber told local media.

Keep in mind, too: even when using reliable software like Microsoft Excel, human error can sneak through. Take, for instance, this incident that occurred in the U.K. last year: 16,000 records of COVID-19 patients were accidentally deleted from an official database, resulting in the spread of inaccurate data which hindered efforts like contact tracing to combat the virus. 
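Entry errors like the ones described above can often be caught with a short script before any analysis begins. The following is a minimal sketch in Python, not a definitive method: the records, the field names (`date`, `region`, `new_cases`) and the `find_problems` helper are all hypothetical and invented for illustration. It flags missing values, negative counts and duplicated rows.

```python
# Hypothetical daily case records; every figure here is invented for illustration.
records = [
    {"date": "2020-10-01", "region": "North", "new_cases": 120},
    {"date": "2020-10-02", "region": "North", "new_cases": None},  # missing value
    {"date": "2020-10-03", "region": "North", "new_cases": -15},   # illogical: negative count
    {"date": "2020-10-01", "region": "North", "new_cases": 120},   # same date and region as row 0
]


def find_problems(rows):
    """Return a list of (row_index, description) pairs for suspicious rows."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        value = row["new_cases"]
        if value is None:
            problems.append((index, "missing value"))
        elif value < 0:
            problems.append((index, "negative count"))

        # A repeated date/region pair suggests the row was entered twice.
        key = (row["date"], row["region"])
        if key in seen:
            problems.append((index, "duplicate date/region"))
        seen.add(key)
    return problems


problems = find_problems(records)
for index, issue in problems:
    print(f"row {index}: {issue}")
```

A script like this only points to rows worth a second look. Deciding whether a flagged value is a typo, a correction or a genuine anomaly still means going back to the source.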

To avoid publishing inaccurate data, rely on credible sources and verify the numbers. Here’s a checklist to help:

Transparency

Seek out resources that are transparent about how they compile and document data. This includes the technology and algorithms they used during the process. The more transparent data providers are, the more potential for accuracy there is. 

To this end, make sure you understand how data is being collected by the source you’re referencing. This will enable you to best analyze and verify numbers before you include them in your own reporting.

Methodology

Don’t publish a dataset without attaching the corresponding metadata file, which explains how the data was collected. The file can also document sample size, margin of error and missing values, and provide a glossary of terms and abbreviations. Without these details, you’re like a person who has discovered a treasure chest full of gold but doesn’t have the keys to open it.

In Italy, for example, journalists questioned the credibility of official government data around COVID-19 after finding flaws in the numbers presented to the public. The flaws can be attributed to a variety of factors, among them that the government changed its testing policies several times in 2020, and that methodologies to track cases of the virus differed by region. This contributed to inconsistent, deficient data overall. Had a metadata file been made available, these errors could have been more easily identified.

Context

Context is key when analyzing data. For example, consider how you present information about total infections and infection rates. When a government authority presents regional data about the number of people infected with COVID-19, a large city might show the highest value. This doesn’t necessarily mean its infection rates are highest, however — it might simply be the result of being the most populated area. 

The more appropriate way to compare numbers in locations with different populations is to calculate infection rates per 100 people. This will more accurately demonstrate the spread of the virus.
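As a small worked example of this calculation, here is a sketch in Python. The two regions and their figures are hypothetical, invented only to show how the ranking can flip once population is taken into account. The `rate_per_100` helper is likewise an illustrative name.

```python
# Hypothetical figures, invented for illustration only.
regions = {
    "Big City": {"cases": 50_000, "population": 5_000_000},
    "Small Town": {"cases": 2_000, "population": 50_000},
}


def rate_per_100(cases, population):
    """Infections per 100 residents."""
    return cases * 100 / population


rates = {
    name: rate_per_100(figures["cases"], figures["population"])
    for name, figures in regions.items()
}

# Compare which region leads on raw totals and which leads on rate.
highest_total = max(regions, key=lambda name: regions[name]["cases"])
highest_rate = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {regions[name]['cases']} cases, {rate:.1f} per 100 people")
```

In this invented data, the large city has 25 times as many cases as the small town, but the small town’s infection rate is four times higher.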

Understand the data

Don’t start working on a database unless you understand what is being presented. To do so, ask yourself the following questions: 

  • What does the data indicate?
  • Do I understand all terms and definitions included in the data?
  • What is not included in the data that could provide context?
  • What are the units of measurement?
  • Can I cross-reference the data with a different source to corroborate the values?
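The last question, cross-referencing, can be partly automated. The sketch below, in Python, compares daily figures from two sources and lists the dates where they disagree or where one source has no entry. Both datasets, the 5% tolerance and the `compare_sources` helper are assumptions made for illustration, not a standard.

```python
# Two hypothetical sources reporting daily cases; all figures are invented.
official = {"2020-10-01": 120, "2020-10-02": 135, "2020-10-03": 150}
independent = {"2020-10-01": 120, "2020-10-02": 180, "2020-10-04": 160}


def compare_sources(a, b, tolerance=0.05):
    """Return (date, issue) pairs where the sources disagree.

    Values count as different when they diverge by more than
    `tolerance` (a fraction of the larger value).
    """
    mismatches = []
    for date in sorted(set(a) | set(b)):
        if date not in a or date not in b:
            mismatches.append((date, "missing in one source"))
            continue
        larger = max(a[date], b[date])
        if larger and abs(a[date] - b[date]) / larger > tolerance:
            mismatches.append((date, "values differ"))
    return mismatches


mismatches = compare_sources(official, independent)
for date, issue in mismatches:
    print(date, issue)
```

A mismatch doesn’t show which source is wrong. It shows where to start asking questions, such as whether the two sources define a “case” the same way or report on the same schedule.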

Verifying data requires investigation and analysis. Fortunately, journalists don’t need to be data analysis experts to carry this out. Journalistic values, skills and instinct are all effective methods to assist with fact-checking numbers. Manual verification can be even more effective than automated verification algorithms. While technology might not always be able to determine the credibility of data, it can provide journalists with useful tools and guidance to help. 

At all turns ask questions, be skeptical, and review and cross-reference your numbers as much as possible. The following diagram shows the steps I follow when dealing with numbers in a database. It might help you build your own verification strategy.
