Global Data Quality Glossary

Research is successful when trust and confidence in the data are at the foundation of everything we do and say as professionals. That trust and confidence is greatly impacted by the language we use when talking about our activities and the data we generate. With that in mind, the goal of this glossary is to refine the way we talk about data quality overall and become more precise in describing the challenges. This is important both to assure users of data that they can act on the insights generated, and to clearly define quality issues so we can address each issue with the correct set of solutions.

As you use this glossary, we ask you to incorporate the definitions and terms into your professional vocabulary. When speaking about quality overall, use the correct terms, and when defining issues such as fraud and validity, choose the terms that accurately reflect the issue to be addressed.

For each term you will find a short definition. Each definition is supplemented by a more detailed contextual statement, some of which include examples to help illustrate the terms being defined.

We recognise that these are living definitions, which will need to be refined and added to as understanding of data quality evolves. If you have any suggestions for additions or changes, please use the Submission Form.

Acquiescence Bias (Agreement Bias)

Simple definition: Tending to disproportionately select a positive response.

Full description: A participant is likely to give disproportionately positive responses to statements. It should be noted, and care therefore taken, that some cultures can provide more positive or aspirational responses.

Aggregator

Simple definition: A supplier that provides access to participants by gathering multiple panel sources and making them all accessible via a single interface. 

Full description: An organisation that provides access to participants by gathering multiple panel sources and making them all accessible via a single interface. 

Artificial Intelligence (AI)

Simple definition: Artificial Intelligence is a computing environment where the machine makes its own autonomous decisions.

Full description: Artificial Intelligence (AI) is a computing environment where the machine makes its own autonomous decisions and acts, creates, evolves, or changes decisions without the oversight or contribution of an actual human.

B2B Surveys

Simple definition: Business to Business research activities.

Full description: Business to Business surveys that are targeted toward business professionals. Examples of commonly targeted B2B groups include IT decision makers (ITDM), human resources decision makers (HRDM), and healthcare providers (HCP). In addition to the types of fraud that threaten any research study, B2B surveys are particularly vulnerable to false claims of group membership that can severely compromise the validity of conclusions. Fraud rates in quantitative B2B studies are often much higher than in B2C studies, in part because of the higher monetary rewards associated with such studies.

B2C Surveys

Simple definition: Business to Consumer research activities.

Full description: Business to Consumer surveys that are targeted toward the general consumer. 

Behavioural Validation

Simple definition: A process to identify problematic participants from their behaviour.

Full description: The process of identifying problematic participants through an examination of their behaviour, which can include responses to specific survey questions, response patterns across multiple questions, mouse movements and other behavioural measurement techniques. Behavioural validation can be applied both pre-survey and in-survey. This can be done at a survey level but also at a longitudinal level across multiple surveys over time. A participant may behave differently across surveys, being engaged in one but not in another.

Benchmark Comparisons

Simple definition: Comparing data to a known baseline.

Full description: Benchmarks are generally used in order to assess how brands, products or services are performing in relation to a known measure - some research organisations have built up repositories of benchmarks (also known as "norms") over time that can be used to rate over- or under-performance. Any project measure, result or response that is outside of acceptable ranges versus available benchmarks may suggest data fraud or quality issues. Care should be taken in assessing whether outlying responses could plausibly occur in reality.
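
To make the benchmark idea concrete, here is a minimal sketch in Python of a range check against norms. The benchmark values, tolerance and survey results are illustrative assumptions only, not real norms.

    # Hypothetical benchmark ("norm") check: flag survey measures that fall
    # outside an acceptable range around a known baseline.
    BENCHMARKS = {"brand_awareness": 0.62, "purchase_intent": 0.18}  # assumed norms
    TOLERANCE = 0.15  # acceptable relative deviation from the norm (assumption)

    def flag_outliers(results: dict) -> list:
        flagged = []
        for measure, value in results.items():
            norm = BENCHMARKS.get(measure)
            if norm is None:
                continue  # no benchmark held for this measure
            if abs(value - norm) / norm > TOLERANCE:
                flagged.append(measure)
        return flagged

    print(flag_outliers({"brand_awareness": 0.91, "purchase_intent": 0.19}))
    # ['brand_awareness'] - roughly 47% above the 0.62 norm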

Bots

Simple definition: Software that operates as an agent for a user or a programme, or simulates human activity.

Full description: Computer programs/scripts designed to mimic human activity and participate in online surveys for the purpose of collecting the incentive or reward. Bots can be highly flexible, with many designed behaviours. For example, they may be set up to pause during surveys to extend the length of time taken and avoid being flagged as speeding.

Click Farm / "Organised Fraud Groups"

Simple definition: A group of people paid to complete surveys.

Full description: Significant activity from organised groups of people, generally participating in surveys for malicious purposes to earn rewards. People are usually used to avoid bot-detection technology. Surveys may be completed manually, or a bot may be used as a tool to complete surveys at a higher speed than a human could achieve. Click farms can be a single person spoofing multiple accounts on one computer, a group of people working together, or a remotely distributed group working anywhere in the world.

Click farms often use fake user profiles or automated scripts to simulate online activity. Click farms undermine the integrity of surveys by generating a large volume of fake and fraudulent responses. This can skew the data and lead to inaccurate findings.

It becomes challenging for researchers to differentiate genuine responses from those generated by click farms. This leads to biased results, because the data cannot accurately represent the target audience or reflect true opinions and user experiences, and to artificially inflated response rates. The result is that it is hard to determine true response rates and sample size.

Contradictory Answers/Hidden Trap Question

Simple definition: An attention-checking survey question.

Full description: Answers within one question, or across multiple questions, where the data does not align. When a participant is unengaged or inattentive, or is showing acquiescence bias, they are likely to give contradictory answers. For example, if the statements "I like dogs" and "I dislike dogs" are put in different parts of the same list, you would expect them to get opposing responses if a participant is taking care to read the questions. If they give the same or very similar answers, it would be considered contradictory and a failure at this hidden trap question. This can produce false positives, e.g. someone could honestly like some dogs and dislike others and therefore be confused, so practitioners should always use multiple questions to separate the truly engaged from the unengaged.
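
A minimal sketch in Python of a paired-statement consistency check; the question identifiers, the 1-5 agreement scale and the tolerance are assumptions made for the example.

    # Hypothetical check for contradictory answers to reversed statement pairs.
    # On a 1-5 agreement scale, opposing statements such as "I like dogs" /
    # "I dislike dogs" should receive roughly mirrored scores summing to ~6.
    REVERSED_PAIRS = [("likes_dogs", "dislikes_dogs")]  # assumed question IDs

    def contradiction_flags(response: dict, max_sum_dev: int = 1) -> list:
        flags = []
        for q, q_rev in REVERSED_PAIRS:
            if abs((response[q] + response[q_rev]) - 6) > max_sum_dev:
                flags.append((q, q_rev))
        return flags

    print(contradiction_flags({"likes_dogs": 5, "dislikes_dogs": 5}))
    # [('likes_dogs', 'dislikes_dogs')] - agreed strongly with both statements
    print(contradiction_flags({"likes_dogs": 4, "dislikes_dogs": 2}))
    # [] - responses mirror each other as expected

As the description stresses, a single failed pair should be treated as one signal among several rather than grounds for rejection on its own.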

Data Fraud

Simple definition: Intentional misrepresentation of identity or data.

Full description: Deception or malicious behaviour intended for financial gain, with fake insights provided or through system manipulation. Before or during the research or panel sign-up process, inaccurate or false information may be provided, or the participant flow system manipulated, to register a person as qualified for research, most likely for the purpose of claiming an incentive.

The term "Fraud" should only be used in the context of deliberate or malicious deception and should not be confused with poor "Data Quality".

Data Quality

Simple definition: The measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and how up to date it is.

Full description: There are a number of ways to check and measure data quality. In the research sector, it is critical for all parties to ensure that the highest possible quality data is collected, processed and analysed. For online data this might include checking open-ended text answers, looking for patterns in data and so on. For other methods, it may involve listening to live calls or recontacting participants after a research task to verify qualifying criteria.

The quality of data also depends on other key aspects of research, e.g. coverage/representativity - online panels do not readily cover the whole population and this can impact results - as well as aspects of sample design, quota design and questionnaire design.

Digital Device Fingerprinting

Simple definition: Information collected about a device for the purpose of identification of individual research participants or devices.

Full description: Browser fingerprinting is a technique used to gather information about a web browser's configuration and settings to create a unique identifier or "fingerprint" for that particular browser. It involves collecting various data points, such as the browser version, operating system, screen resolution, installed plugins, fonts, time zone, language preferences, and other attributes that can be easily obtained through standard web technologies like JavaScript.
By combining these data points, websites and online services can generate a unique identifier that can be used to track and distinguish individual browsers, even if cookies are cleared or disabled. The fingerprint is typically a combination of several characteristics, making it difficult for users to change or manipulate all the attributes to avoid tracking.
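
A minimal sketch in Python of the combining step described above. The attributes shown are a small illustrative subset; real fingerprinting solutions collect far more signals.

    import hashlib

    def browser_fingerprint(attrs: dict) -> str:
        """Combine browser attributes into a single stable identifier."""
        # Sort keys so the same attributes always produce the same hash.
        canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    print(browser_fingerprint({
        "user_agent": "Mozilla/5.0 ...",
        "screen": "1920x1080",
        "timezone": "Europe/London",
        "language": "en-GB",
    }))
    # Same attribute set -> same fingerprint, even with cookies cleared.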

In research, browser fingerprinting can be used to ensure data quality in online surveys by identifying and mitigating fraudulent or suspicious responses. Browser fingerprinting can help identify and flag duplicate responses from the same browser or device.

It is important to note that while browser fingerprinting can help detect fraudulent participants, it is not infallible. Dedicated fraudsters can find ways to manipulate or alter their fingerprints. Also, some browsers, such as Firefox, offer ways to block fingerprinting.

Disengaged Participant

Simple definition: A participant who, in reaction to the research itself, does not give an adequate level of thought to the responses they provide.

Full description: A participant who is disengaged and responds to questions without fully processing what is being asked due to the content or structure of the questionnaire or research process.

Double Opt-In

Simple definition: A process to confirm the agreement of participants to opt in.

Full description: After signing up for a panel (opt-in) a participant will be sent a confirmation email. Double opt-in involves confirming an email address by responding to the confirmation email. The double opt-in process does not by itself provide sufficient protection against fraud. 

False Negatives

Simple definition: Problematic participants incorrectly identified as being of good quality.

Full description: The proportion of poor quality or fraudulent participants who are incorrectly categorised as valid, i.e. problematic participants that quality or fraud checks fail to catch.

False Positives

Simple definition: Valid participants incorrectly identified as being of poor quality.

Full description: The proportion of genuine or good quality participants who are incorrectly categorised as poor quality or fraudulent, i.e. genuine participants wrongly flagged by quality or fraud checks. This is the sense in which "false positives" is used throughout this glossary.
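
A minimal sketch in Python of computing both rates from validation outcomes, where a "positive" means a participant flagged as problematic - the sense used throughout this glossary. All counts are hypothetical.

    # Hypothetical outcomes from validating a fraud-detection solution.
    genuine_flagged = 40      # false positives: genuine but flagged
    genuine_passed = 910      # true negatives
    fraudulent_passed = 15    # false negatives: problematic but passed
    fraudulent_flagged = 35   # true positives

    false_positive_rate = genuine_flagged / (genuine_flagged + genuine_passed)
    false_negative_rate = fraudulent_passed / (fraudulent_passed + fraudulent_flagged)

    print(f"False positive rate: {false_positive_rate:.1%}")  # 4.2%
    print(f"False negative rate: {false_negative_rate:.1%}")  # 30.0%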

Form Filling

Simple definition: A script or program to answer survey questions automatically.

Full description: An app, typically a browser plugin or server app, which automatically and randomly or algorithmically fills out survey questions quickly to save a fraudulent participant time completing a questionnaire.

Fraudulent Participant

Simple definition: A participant who deliberately misrepresents their identity, profiling information or responses, including organisations that use bots to impersonate participants.

Full description: A participant who intentionally circumvents the research process, usually for monetary gain. This can occur in both qualitative and quantitative research. Most commonly, survey fraud occurs when participants attempt to collect survey rewards while a) not being qualified for the survey and/or b) making efforts to collect rewards while bypassing as much of the survey as possible. This includes but is not limited to:
1) not responding to questions honestly
2) taking surveys they are not qualified for
3) falsely posing as belonging to a particular demographic group or overclaiming at the screener
4) accessing surveys from countries not being targeted, or in languages in which surveys are not being offered
5) intentionally taking surveys more than once from the same or multiple different accounts
6) the use of automation to generate closed-ended and open-ended survey responses
7) generation of false completes in the participant system, rather than actual survey data, through manipulation of the participant process flow (i.e. a ghost complete)
8) using false ID to impersonate or create a profile

Generative AI

Simple definition: A branch of artificial intelligence that focuses on creating systems capable of producing new and original content.

Full description: This could include such things as images, music, or text. It involves training models to learn the underlying patterns and characteristics of a given dataset and then using that knowledge to generate new instances that resemble the original data. Generative AI models aim to mimic human creativity by generating novel outputs that have not been explicitly programmed, although human-like creativity is aspirational at this point.

Geo-Location Tracking

Simple definition: A process to detect participants' locations.

Full description: Identifying the physical location of participants (usually via IP address) to ensure participants are in the geographic locale they claim to be in. Many survey platforms can detect and reject participants that are not in the country or region they claim to be in. However, given the manipulation of IPs and other information, this may not be reliable.
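
As a sketch only, a simplified location check in Python. The resolve_country function is a hypothetical stand-in for a real IP-geolocation lookup, and, as noted above, IP-based location should be treated as a flag rather than proof.

    # Stub mapping in place of a real IP-geolocation service (hypothetical).
    IP_COUNTRY = {"203.0.113.7": "GB", "198.51.100.2": "US"}

    def resolve_country(ip: str) -> str:
        return IP_COUNTRY.get(ip, "UNKNOWN")

    def location_mismatch(ip: str, claimed_country: str) -> bool:
        resolved = resolve_country(ip)
        # Unresolved IPs are not treated as mismatches, since VPNs and
        # proxies already make IP-based location unreliable.
        return resolved not in ("UNKNOWN", claimed_country)

    print(location_mismatch("198.51.100.2", "GB"))  # True - claims GB, resolves to US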

Ghost Complete

Simple definition: A complete survey response recorded in a participant system but not recorded as complete in the survey data.

Full description: The process by which a person creates a complete in a participant system, rather than the survey system, by manipulating survey links to generate a false complete. This is done solely for financial gain and typically does not impact the survey data itself. There are solutions, such as utilising server-to-server flows.

In Survey Duplicates

Simple definition: Repeated data in a data set.

Full description: The same participant completing the same survey more than once, providing the same responses throughout a survey, or providing the same personal data when that information is collected. Intent may vary: is this someone purposely trying to take the survey multiple times, or simply someone on multiple panels who receives the same survey and has passed through any other duplication check, e.g. by using two devices through two different sources?

Inattentive Participant

Simple definition: A participant who does not give an adequate level of thought to the responses they provide.

Full description: A participant who is distracted or disengaged and responds to questions without fully processing or understanding their content. There may be many reasons for the inattentiveness, for example, they might be watching TV while completing a survey and are not reading the questions carefully enough.

Incidence

Simple definition: A test to predict how many people may qualify for a survey.

Full description: Using known profiling data to create a "sample" to determine the number of people within a panel who will qualify for a survey. When a qualifying criterion is included that does not already exist as a data point in a participant's profile, known or estimated probabilities may be used to determine the incidence rate of the target participant within the panel. When probabilities are estimated, reaching full target is not guaranteed.
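
A worked sketch in Python of estimating incidence by combining a known profile probability with an estimated one. The figures are hypothetical, and the calculation assumes the two criteria are independent.

    panel_size = 100_000
    p_age_25_34 = 0.18   # known from profile data
    p_dog_owner = 0.30   # estimated - not held as a profiled data point

    # Assumes the criteria are independent of each other.
    incidence_rate = p_age_25_34 * p_dog_owner
    expected_qualifiers = int(panel_size * incidence_rate)
    print(f"Estimated incidence: {incidence_rate:.1%}, "
          f"~{expected_qualifiers:,} qualifying panellists")
    # Estimated incidence: 5.4%, ~5,400 qualifying panellists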

Intercept Sample/River Sample

Simple definition: Participants directed to surveys via website or other sources of advertising in real time.

Full description: Participants who participate in surveys via banners, video games and other ads. There is typically no opt-in process. River samples improve reach, but more care is required to ensure data quality. 

IP-Deduplication

Simple definition: A process to identify participants who are taking surveys from the same IP address.

Full description: Due to the easy and inexpensive access to VPNs that can spoof IP addresses, this is not a truly reliable quality/fraud-detection criterion. Anyone making more than a casual attempt at fraud would be likely to use multiple IP addresses. This also means that using an IP address to verify location can be unreliable.
It is also possible that more than one genuine participant in the same household may be on the same panel and would share the same IP address, so other measures should be factored in. It is also very important to understand that IP address is considered personal data in many countries, and certainly within EEA markets and the UK where GDPR applies. Informed consent must be gained before an IP address is captured for any purpose.

Large Language Model

Simple definition: A specific type of GenAI model that is designed to understand and generate human language.

Full description: These models, such as ChatGPT and Bard, are trained on massive amounts of text data and can generate coherent and contextually relevant text responses based on plain language prompts or queries. They learn the statistical relationships between words and phrases in the training data and use this knowledge to generate human-like responses or create original written content. LLMs have a wide range of applications, including natural language understanding, translation, programming, chatbots, summarisation, and content generation.

One of the risks to research is that these models can be used by fraudsters to program intelligent bots that provide good and valid answers to open-ended questions in surveys.

Length Of Interview (LoI)

Simple definition: The time taken to complete the survey.

Full description: Usually based on the median survey completion time. This is used to detect speeding, which can indicate problematic participants. The length of a survey can have a significant impact on engagement and quality, which is why longer interviews are more often conducted by phone or face to face, as these are more engaging data collection methods.
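
A minimal sketch in Python of a median-based speeding flag. The one-third-of-median cutoff is a common heuristic rather than a standard, and should be tuned to the specific study.

    import statistics

    def speeding_flags(completion_times: list, fraction: float = 0.33) -> list:
        """Return indices of completes faster than a fraction of the median LoI."""
        cutoff = statistics.median(completion_times) * fraction
        return [i for i, t in enumerate(completion_times) if t < cutoff]

    times_sec = [610, 585, 95, 660, 702, 120, 598]  # hypothetical completion times
    print(speeding_flags(times_sec))  # [2, 5] - the 95s and 120s completes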

Low Incidence Check

Simple definition: An attention-checking survey question.

Full description: Disguised screener questions. For example, a very low incidence brand of toothpaste may be placed prominently in a brand list to see whether claimed awareness matches its known incidence. This can produce false positives, e.g. through a halo effect, so practitioners should always use multiple questions to separate the truly engaged from the unengaged.

Machine Learning

Simple definition: Machine learning is the training of computer systems on a given data set to recognize specific patterns that exist in the data.

Full description: Machine learning is the training of computer systems on a given data set to recognize specific patterns that exist in the data. As new data is added, analysts can guide parameters to tune the model or allow the algorithm to train itself to make more accurate predictions. Machine learning is used to ingest large quantities of data to help humans identify patterns more quickly and accurately.

Matrix (AKA Grid) Questions

Simple definition: Closed-ended survey questions with a grid like column structure.

Full description: Closed-ended survey questions with a characteristic grid like column structure in which columns typically correspond to response-options. The number of questions in a grid and how many are presented on screen at once should be considered as an important factor in creating a good participant experience.

Max Diff Questions

Simple definition: A question format in which participants evaluate sets of items, selecting the best and worst.

Full description: Results from a MaxDiff (Maximum Difference or Best/Worst Scaling) exercise can also be used to identify patterns of fraudulent responses. Further reading: https://en.wikipedia.org/wiki/MaxDiff.

Mischievous Participant

Simple definition: A person who provides information that is intentionally false or misleading.

Full description: As opposed to fraud, mischievousness is less likely to be motivated by financial gain. It could be for personal or societal reasons as a protest. For example, 400,000 people reported their religion as Jedi in the 2001 British Census. 

Non-Naivete

Simple definition: Being very familiar with survey content.

Full description: A response bias that results from familiarity with the survey intentions or content. This is most commonly observed among highly active participants who participate in many studies. 

Open-Ended Response Validation

Simple definition: A process to check open-ended responses to determine the quality of participants.

Full description: Open-ended responses are an effective tool for measuring quality and detecting fraud. Controls should include checks for the following (illustrated in the sketch at the end of this entry):

  • Gibberish and nonsense answers.
  • Answers that are off topic and have nothing to do with the question.
  • Checking for duplications between questions and participants.
  • Checking the language. Is the answer written in the correct language?
  • Has the answer been copy-pasted into the text field? (This can only be detected with third-party solutions)
  • Was the answer generated by an AI model? (This can only be detected with third-party solutions)

It should be noted that some responses, such as "gibberish" open ends, cannot by themselves be taken as evidence of a Problematic Participant, as some participants simply do not like providing open ends, and that should be accepted. Different actions are required depending on the type of open-end response, including reviewing the open-end responses manually.
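
A minimal sketch in Python of two of the simpler controls listed above: a gibberish heuristic and a cross-participant duplicate check. The thresholds are assumptions, and, as noted, copy-paste and AI-generation detection require third-party solutions.

    import re
    from collections import Counter

    def looks_like_gibberish(text: str) -> bool:
        """Rough heuristic: too short, or too few vowels to be real words."""
        letters = re.sub(r"[^a-zA-Z]", "", text)
        if len(letters) < 3:
            return True
        vowel_ratio = sum(c in "aeiouAEIOU" for c in letters) / len(letters)
        return vowel_ratio < 0.2  # e.g. "sdkfjhsdkf" fails this check

    def duplicate_answers(answers: list) -> set:
        """Identical open ends across participants are a quality flag."""
        counts = Counter(a.strip().lower() for a in answers)
        return {a for a, n in counts.items() if n > 1}

    print(looks_like_gibberish("sdkfjhsdkf"))       # True
    print(looks_like_gibberish("I buy it weekly"))  # False
    print(duplicate_answers(["Good product", "good product ", "Too expensive"]))
    # {'good product'}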


Open Ends / Verbatim Comments

Simple definition: Type of question where participants are asked to answer in their own words.

Full description: Questions designed for participants to answer without a pre-populated answer set. Can be text or numeric. Generally designed to obtain unprompted responses and gain a deeper understanding of personal opinion. Open ended responses are used in a number of different types of data quality checks.

Overclaiming

Simple definition: Deliberately exaggerating to qualify for a survey.

Full description: For example, a participant claiming to be a senior manager in an organisation when they are a junior manager, or claiming extremely high awareness of brands or products.

Panel Management

Simple definition: Process by which panels and panellists are administered to ensure high quality and engaged participants.

Full description: Panel management is driven by a panel's business demands. For example, in some markets smoking is prevalent, so related profile data would need to be updated regularly, compared with other markets where this may need less frequent review. Panel management is used to monitor panel targeting/profiling to make sure it is consistent with qualifying or termination metrics. Each panel will have activity thresholds that allow for grouping panellists where different levels of engagement are required, which maximises response rates, making sure the panellists' experiences are good and relevant activities are delivered to them.

Panel Opt-In

Simple definition: The process by which an individual registers with a panel as a participant, providing an email address and potentially other personal data.

Full description: To “opt-in” to an online panel is to sign up for a panel as a participant. This involves providing an email address and potentially other personal data and information. It is during the opt-in stage that multiple checks are made to ensure the individual is a unique and valid potential member for an online panel. Validating a person's identity successfully now requires even greater time and cost due to the easy accessibility of multiple free email accounts.

Panel Sample

Simple definition: Participants drawn from a panel to which they have opted in.

Full description: Participants recruited from a documented source who have provided profile data and appropriate information for validation of identity, have given explicit consent to participate in research according to the terms and conditions of panel membership, and have not opted out.

Although a nationally representative sample (or other types of representation) can be curated from within a panel using quotas, panels will still comprise those who are interested in taking part in surveys and in the rewards offered for doing so.

Participant (Survey) Experience

Simple definition: A participant's view on the quality of the interaction when completing a survey.

Full description: How a participant feels when participating in a survey. A participant can experience excitement, boredom, interest, frustration, and a wide range of other reactions that can affect their level of engagement, attention, and focus, influencing the quality of their responses. The experience is affected by all stages from the panel portal, survey invite, screening and profiling and the actual survey. In particular, a survey router process may involve being profiled and qualified more than once which can create frustration.

Passive Data Collection

Simple definition: Collection of data without direct interaction with the participant.

Full description: The permission-based or ethical collection of data by researchers observing, measuring, recording, or appending a research subject’s actions or behaviour for the purpose of research and without direct interaction with the research subject. 

Personal Data

Simple definition: Personal information that is attributable to an individual.

Full description: Personal data (sometimes erroneously referred to as "personally identifiable information" or "PII" or "personal information") means any information relating to a natural living person that can be used to identify an individual, for example by reference to direct identifiers (such as a name, specific geographic location, telephone number, picture, sound, video recording or biometric data) or indirectly by reference to an individual’s physical, physiological, mental, economic, cultural or social characteristics. The definition most widely used is that defined in the EU GDPR.

"Sensitive personal information" is a sub-category of this and requires additional care during collection, transferring and processing. This is dependent on which country and laws apply to the data subject. For example, in the UK, the following are considered special category data:
- race;
- ethnic origin;
- political opinions;
- religious or philosophical beliefs;
- trade union membership;
- genetic data;
- biometric data (where this is used for identification purposes);
- health data;
- sex life; or
- sexual orientation

Positional Bias

Simple definition: Selecting a response option based on its position in a list.

Full description: When a participant preferentially selects a response based on its position. Most commonly, problematic participants select the first (or top) available response option. There are simple techniques to reduce the impact of this, notably randomising or rotating lists.

Pre Survey Duplicates

Simple definition: Survey entry attempts from the same individual.

Full description: The same participant directed to the same survey more than once, as identified by checks on IP addresses and browser fingerprints; duplication is device based rather than participant based and assumes that the same participant is using the same device or providing the same personal data when that information is collected. Intent may vary: is this someone purposely trying to take the survey multiple times, or simply someone on multiple panels who receives the same survey? There can be false positives if two people share the same device, e.g. within the same household.
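
A minimal sketch in Python of a device-based entry check keyed on IP address and browser fingerprint. As the description notes, two people sharing a device can collide, so a match should be a flag rather than an automatic reject.

    # Seen (IP, fingerprint) pairs for the current survey (illustrative only).
    seen_devices: set = set()

    def is_pre_survey_duplicate(ip: str, fingerprint: str) -> bool:
        key = (ip, fingerprint)
        if key in seen_devices:
            return True
        seen_devices.add(key)
        return False

    print(is_pre_survey_duplicate("203.0.113.7", "ab12cd"))  # False - first entry
    print(is_pre_survey_duplicate("203.0.113.7", "ab12cd"))  # True - repeat attempt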

Pre-Screening Validation Questions

Simple definition: Initial questions in a data collection instrument used to establish suitability of participants.

Full description: Best practice would be to disguise the topic of the research activity so as to minimise the risk of individuals making false claims to be included. A disguised screener, which includes high and low incidence categories, supported with data cleaning, makes it easier to determine whether an individual participant is overclicking (choosing everything, even if very low incidence) or underclicking (not choosing a reasonable number of categories given the incidence). In addition, systems like CAPTCHA and reCAPTCHA can be used to ensure a human participant is taking surveys.

Pre-Survey Quality Validation

Simple definition: A type of behavioural activity to verify the quality of participants before surveys are completed.

Full description: The process of identifying and removing low quality participants before they enter a survey. For example, digital fingerprinting, validation against third party financial fraud databases, IP address checks, exclusion lists.

Privacy Laws

Simple definition: Laws that regulate the processing of personal data in any way.

Full description: The body of laws that deals with the regulation, storage and use of personal information, which can be collected by governments, public or private organisations, or other individuals. The CPRA and GDPR are two examples of privacy laws. Laws vary greatly around the world and even within countries like the USA. Care should be taken to ensure compliance in the country the work is being conducted from, as well as in the country of which the individual is a citizen.

Problematic Participant

Simple definition: An umbrella term for all forms of bad quality participants.

Full description: This includes those who use certain types of automation such as translation apps. Not all forms of problematic participants should be categorised as fraud. For example, inattention may be considered problematic participant behaviour rather than fraud.

Professional Survey Taker (Professional Participant)

Simple definition: A person who participates in surveys (or research generally) as a hobby or source of income.

Full description: There are two main types of professional survey taker:

  • Bad intent: One simply participating in surveys at a high rate but with no regard for validity or quality of responses. They would usually be doing this to maximise the financial return.
  • Good intent: An active participant that provides valid and quality responses and enjoys taking part in surveys and giving their opinions. It is important not to exclude those with good intentions purely based on frequency. Other quality checks should be factored in.

A consideration relating to all professional survey takers is "Non-Naivete" which is a response bias that results from familiarity with the survey intentions or content. This is most commonly observed among highly active participants who participate in many studies. This may result in participants that can predict what answers may qualify them for the study or may generate higher rewards.

Qualitative Research

Simple definition: Collection and analysis of open ended unstructured data used to develop insights.

Full description: An unstructured research approach with a small number of carefully selected individuals used to produce non-quantifiable insights into behaviour, motivations and attitudes. Generally conducted by telephone or face to face, either online or in person, and in some cases using live online text/image-based discussion platforms. Qualitative research generally involves small sample sizes and a longer length of interview, typically between 30 and 60 minutes. It shares a number of data quality concerns with quantitative research.

Quantitative Research

Simple definition: Collection and analysis of structured data.

Full description: Research centred around the numerical variations held within the dataset and based on statistical outcomes usually in the form of percentages. Surveys are made up of single/multiple choice questions, a small number of open ends, scales, ranking etc.

Survey methodology that employs closed-ended response options that are easily numericized. Quantitative research is particularly vulnerable to fraud due to the ease with which closed-ended responses can be made by inattentive participants and automated form fillers, and the relative difficulty in assessing the quality of closed-ended responses. Attention to the speed of completion, answer patterns and quality of verbatim comments can be used to assess whether fraud is taking place.

Question Design

Simple definition: How a question is designed to elicit the appropriate type of response from participants.

Full description: Elements of question design that affect how and in what manner participants understand and respond. Various aspects of question design can affect participant experience, including whether it is easy to understand, includes an answer option for all possibilities, is skipped for individuals for whom it should be skipped, is leading and/or double-barrelled, or is cognitively hard to complete.

Random Response Profile

Simple definition: Selecting random and unconsidered responses.

Full description: When a participant randomly selects from among a question’s available response options. Note that this is different from straight lining or acquiescence bias, which are not random, and are more easily detectable. Form-fillers at times employ a random or semi-random response strategy. 

Red Herring/Explicit Trap Question

Simple definition: An attention-checking survey question.

Full description: A question in a survey designed to check whether participants are paying attention. A common example instructs participants to select a "strongly disagree" response option. Sometimes known as an Instructional Manipulation Check (IMC). These can generate false positives and are typically not used as a standalone check.

Representativeness

Simple definition: Degree to which a sample reflects the target population being studied.

Full description: As with "sample" - ensuring that a representative snapshot of the larger target population is obtained in order to guarantee a like for like measure of opinion.

Quota sampling is required as a means of obtaining an illustrative data set that represents the larger potential target market. A good example is "Nat Rep" (Nationally Representative), where the specific project target requires full coverage of all groups that feature within a country's census data, for example gender, age, region etc.

Response Patterns For Repeated Question Sets

Simple definition: Systematic responses to survey question banks / grids.

Full description: When a participant answers a bank or grid of survey questions in a detectable pattern that indicates a lack of attention or deliberate fraud. This includes, but is not limited to, straight lining/flat lining, as a participant may select response options in a non-linear pattern that is harder to detect. This is particularly hard to detect in data sets where the order of questioning is randomised. Other factors, such as time taken across the particular question set, can be utilised.
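
A minimal sketch in Python of two pattern checks over a grid of scale answers: simple straight lining, and a short repeating cycle that a straight-lining check alone would miss. The cycle lengths tested are an assumption.

    def straight_lined(grid: list) -> bool:
        """All answers in the grid are identical."""
        return len(set(grid)) == 1

    def patterned(grid: list, cycles=(2, 3)) -> bool:
        """Answers repeat in a short cycle, e.g. 1,2,1,2,..."""
        for cycle in cycles:
            if len(grid) >= cycle * 2 and all(
                grid[i] == grid[i % cycle] for i in range(len(grid))
            ):
                return True
        return False

    print(straight_lined([4, 4, 4, 4, 4, 4]))  # True
    print(patterned([1, 2, 1, 2, 1, 2]))       # True
    print(patterned([3, 5, 2, 4, 1, 3]))       # False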

Router

Simple definition: Technology that redirects participants to specific surveys. 

Full description: An online software application that screens incoming research participants and then uses those results to assign participants to one of multiple available research projects. A router can also offer participants additional screeners and surveys after screener qualification failure or survey completion.

As many panel organisations work with partners or partner networks, technology has been developed to automate the process of passing a panel member from their source panel, via the panel managing the project, and ultimately to surveys. During this routing process, participants may have to answer multiple, and sometimes repetitious, profiling questions, which can create a negative experience. Panel organisations and the research sector are working together to create guidelines on how to align profiling data to reduce repetition and optimise this process.

Sample (Survey Sample)

Simple definition: A subset of the target population from which quantitative data are collected to represent a larger population.

Full description: It is a subset containing the characteristics of the larger target population but still sizeable enough to allow for robust analysis - the subset of the population or universe of interest which is interviewed. Typically, the sample is pulled using random methods, so that everyone has an equal chance of inclusion. However, quotas are set in order to guarantee representativity in terms of gender, age, geolocation etc., in addition to the specific characteristics of the project design, for example primary grocery shopper, automotive purchase decision maker, C-suite etc.

Speeding/Racing

Simple definition: Completing a questionnaire extremely quickly.

Full description: Extremely fast survey completion times. Thresholds for what is considered “very fast” can be based on specific minimum completion times (e.g. faster than 1 minute in a study with a median completion time of ten minutes). Many studies show that some valid participants are capable of very fast completion times, and care must be taken to avoid false positives. Thus, speeding is often used as a flag, rather than as a sole rejection criterion unless the completion time is considered impossibly fast. Speed can be measured at a question level and multiple question timings can be combined to create a more accurate picture of the behaviour of a participant. Fraudulent Participants may complete the bulk of questions extremely fast but then pause on the last question to deliberately increase the overall completion time.
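
A minimal sketch in Python of combining question-level timings, illustrating the behaviour described above where a participant races through most questions but pauses at the end so the overall completion time looks normal. Both thresholds are assumptions.

    def mostly_speeding(question_times: list, per_q_floor: float = 2.0,
                        max_fast_share: float = 0.5) -> bool:
        """True if more than max_fast_share of questions were answered
        faster than per_q_floor seconds (both thresholds are assumptions)."""
        fast = sum(t < per_q_floor for t in question_times)
        return fast / len(question_times) > max_fast_share

    # Nine near-instant answers, then a long pause on the final question:
    print(mostly_speeding([1.1, 0.9, 1.0, 1.2, 0.8, 1.1, 1.0, 0.9, 1.2, 95.0]))
    # True - the overall LoI looks normal, but 9 of 10 answers were near-instant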

Straight Lining/Flat Lining

Simple definition: Providing the same answer to the majority of survey grid questions.

Full description: This is a type of Response Pattern. While this behaviour can be easily detected in survey data, using this as a reason for rejection needs to be carefully considered. A participant may do this if, for example, a question is not relevant to them, or if they have become disengaged for some reason. They may actually be answering honestly but just happen to have the same answer each time. The specific threshold can depend on the nature of the study and should be set accordingly based on the number of survey grid questions, and number of items in grids. The term "straight lining check" should be discontinued and replaced with "response pattern check".

Survey Design

Simple definition: The questionnaire structure and methods used to gather data.

Full description: Survey design is the structure of the questionnaire and any specific techniques (methods) leveraged to capture data.

Underclicking

Simple definition: Deliberately choosing fewer items in a list to prevent repetition.

Full description: For example, choosing 2-3 brands in a list where many would be well known. This would be to avoid repeated loop questions about each brand.

Validation Of Fraud-Detection Solutions

Simple definition: A process to determine the effectiveness of solutions used to identify inappropriate activity.

Full description: Empirical assessment of the effectiveness of solutions at identifying and blocking problematic participants. Such assessment should include a description of methods and outcomes, such as false positive and false negative rates.

Valid Participant

Simple definition: A person who is engaged, honest, and meets research participation requirements.

Full description: This is the ideal participant but great care should be taken to identify a valid participant as, for example, what may look like an unengaged participant may be a result of research being poorly designed or worded, or a survey being too long. Multiple data points and factors should be taken into account when determining whether a person is a valid participant or not.

Verification

Simple definition: The process of validating a person's identity.

Full description: Establishing a participant’s identity or background via personal information or qualifying questions is crucial for online panels and other participant sources to combat fraudulent participation. Some panels, especially B2B panels, incorporate or plan to incorporate official ID verification. The verification level varies based on its placement in the survey process and who controls it. Proprietary research panels may employ more personal data for verification compared to other sources. Sample exchanges rely more on technical factors like browser or device information, although these are not foolproof measures and rather serve as mitigating strategies.

Glossary - What have we missed?

Let us know if there's a term you think should be included in this glossary.