About this MRS guidance
The challenges faced in collecting high quality online data have never been greater. While emphasis should continue to be placed on making sure that participants have a good experience, the research sector is increasingly challenged by fraudulent participants taking part in online data collection. As the development of Large Language Models (LLMs), Generative AI and other forms of artificial intelligence continues to advance, this threat will only increase.
This MRS guidance aims to provide background on various checks that are currently used by research practitioners to address data quality and integrity challenges for online research projects.
It should be noted that these approaches are not solutions to fraud: there is evidence that fraudulent participants can pass through some of these checks, e.g., IP address protocol. These techniques have been sourced from GDQ members across a range of research vendors and represent some of the approaches that are currently being deployed across our sector to help identify fraud and poor data quality. This is not an exhaustive list and more will be added over time. In addition, as more is learned about fraudulent participant behaviour, the approaches will need to be enhanced.
It is key that sources / suppliers do their utmost to manage fraud and poor quality through their systems. Inevitably, however, some fraudulent participants will get through into survey data, so research practitioners need to be able to analyse their data for problematic participants as thoroughly as they can.
Poor quality data can come from individuals or organised groups that deliberately participate in surveys, typically for monetary benefit. Such issues, if left unchecked, can significantly compromise the integrity of research data. By addressing these issues, the sector can aim to elevate the standard of online research projects by ensuring, as far as possible, that the data collected is both reliable and valid.
The other key message is that some participants are inappropriately removed from surveys (false positives) because the checks being used are not well understood, e.g., removal for “straight lined” responses alone is bad practice. It should also be noted that poor research / questionnaire design that drives poor research data needs to be considered when reviewing data and determining what is genuinely fraudulent behaviour. The GDQ has developed separate guidance on improving the participant experience: https://www.globaldataquality.org/GDQ-resources.
While this document provides insights into identifying issues, it does not prescribe specific actions for each identified problem. This is the first step in a broader process. In the future, MRS aims to provide more detailed guidance on how to act upon the types of issues identified through the techniques outlined in this document. For example, this could cover whether to add “open ends” to surveys, how to manage multiple checks, how to review checks and, importantly, what other data (e.g., participant source) could be made available to support the quality steps through data analysis.
Similarly, this document sets no explicit thresholds for what is acceptable for each specific behaviour. For example, no specific value is provided for the acceptable level of repetition across open-ended responses. The guidance is designed to inform, rather than impose rigid standards.
This MRS guide is intended as a resource for practitioners seeking to understand how they could enhance the quality of their data whilst acknowledging that individual checks may not be appropriate and may generate “false positives”.
By understanding and identifying quality issues, practitioners can increase the credibility and reliability of their research findings. We invite you to explore this document as a stepping stone toward higher data integrity in research projects.
Scope of the guidance
This guide focuses on steps that research practitioners could take before, during and after data collection to improve data quality and integrity. The amount of control practitioners have over the checks will depend on how much influence and accountability they have within the quality process.
This version of the document is to help inform and guide practitioners about data quality approaches and contains details of current approaches. This guidance does not recommend which approaches and checks should be used as default on all studies but does inform on how effective these approaches can be in addressing data quality and fraud challenges.
This guide does not cover activities being implemented by online participant suppliers / sources as part of the quality process. This will be addressed in other GDQ documentation in terms of the questions that should be asked and understood.
Please note that this data quality guidance, including potential legal issues, is provided for information. It is not legal advice and cannot be relied upon as such. Specific legal advice should be taken in relation to any specific legal problems or matters.