Using “Big Data” and Other Digital Methodologies to Monitor Substance Use Disorders (SUDs)

By Iris Smith, Ph.D.

Coordinated data collection, analysis, and synthesis at the individual, community, state, and national levels are key elements of an effective prevention system. Epidemiology helps us identify patterns in behavioral outcomes, as well as associated risk and protective factors. Epidemiological data provide information about the scope, frequency, and severity of prevention targets and their impact on local communities, and aid in generating hypotheses to test causal relationships between exposures and outcomes.1 The public health approach to substance use emphasizes looking beyond the individual to the contributions of the physical and social environment to health outcomes. Techniques such as spatial analysis can broaden the lens of prevention to include factors in the social environment that influence outcomes, helping us determine where, when, and with whom to intervene.2

Advances in technology have led to increased use of large, publicly available data sets (“big data”) and new methodologies for abstracting, analyzing, and applying them. Data collected by other agencies and organizations (secondary data sources) can be a cost-effective way to supplement primary data collection. However, large data warehouses are also complex and most often consist primarily of observational rather than experimental data, which can lead to assumptions about causality that may not be valid.3 Wesson et al. (2022) caution that the use of secondary data, particularly “big data,” has the potential to perpetuate inequities in public health, especially when vulnerable populations are not adequately represented.4 Whether screening tools have been validated for the online environment should also be considered when interpreting data.5

Qualitative data obtained from individuals or groups can also inform prevention efforts. Techniques such as crowdsourcing, digital focus groups, and web conferencing have been added to the toolbox of data collection strategies. These digital sources have limitations, however. For example, data from social media platforms such as Facebook, Twitter, and Google may not capture information about populations that lack the resources to participate fully in the digital world or that have limited access to technology (for example, rural populations). Such data, while valuable, may not be representative of the general population or reflect the needs and behaviors of vulnerable populations. Gathering data from social media platforms has also raised concerns about privacy and confidentiality. Improving the quality and health equity of data requires that originators and users consider six “Vs”: Volume: the amount of data available; Value: the usefulness of data for decision-making; Variety: the types of data included; Veracity: the trustworthiness of the data; Virtuosity: equity and ethics in design and analysis (inclusiveness); and Velocity: the speed with which data are collected and processed (timeliness).6



1 Eberth JM, Kramer MR, Delmelle EM, Kirby RS (2021). What is the Place for Space in Epidemiology? Annals of Epidemiology, 64; pp. 41-46.


3 Wesson P, Haswen Y, Valdez G, Stojanovski K, Handley MA (2022). Risks and Opportunities to Ensure Equity in the Application of Big Data Research in Public Health. Annual Review of Public Health, 43.8; 8.1-8.20.


5 Kolc KL, Yue Xuan KT, Lo AZY, Shvetcov A, Mitchell PB, Perkes IE (2023). Measuring Psychiatric Symptoms Online: A Systematic Review of the Use of Inventories on Amazon Mechanical Turk (mTurk). Journal of Psychiatric Research.

6 Wesson et al. (2022).

Copyright © 2024 Prevention Technology Transfer Center (PTTC) Network