Thursday, May 28, 2020

Big Data does not mean big sample size – in praise of small samples in qualitative research and evaluation


You may count the flowers in this tree, but it would be a far-fetched undertaking to count the feelings and joy it gives to passers-by. Picture taken by the author of this blog in Harare, Zimbabwe (2019).

With experience in mixed methods from my research methodology studies and evaluation work, I read with interest what Matt Gallivan wrote about talking to users in the age of big data. I commend Matt for an insightful read that breaks the traditional methodological silos. While complementing Matt’s post, I would like to contribute a scenario showing how social (don’t ask me about natural!) science researchers, regardless of their methodological allegiance, think qualitatively, and then end with some (dis)advantages of using a single method and its sampling strategy in a complex social or behavioural inquiry. All that is big (I mean Big Data) is not necessarily big from a value perspective. A small sample may be smarter and more insightful (note that I am saying it may, just to draw the reader’s attention to the context and purpose of the research).

I believe (and some evidence shows, possibly incorrectly, hence the need for more evidence!) that quantitative researchers first think qualitatively while breaking down the concepts under inquiry into measurable variables. They then think quantitatively during the data analysis process, pulling all their statistical knowledge and skills together to make predictions and draw inferences. They eventually think qualitatively again at the stage of data interpretation or discussion. In my life as a researcher and evaluator, I have not seen a single research report where a quantitative researcher presents only the figures and statistical tables exactly as analysed and retrieved from respondent-provided data. There is usually a results table or figure followed by a narrative (however short it may be) from the researcher, adding flesh to the numerical skeleton. It is not the survey respondents who interpret results; it is the quantitative researcher who uses his/her own language to make sense of numerical and statistical results (which cannot talk for and by themselves).

Take the example of scaled questions in a survey. A researcher may need to measure motivation towards or self-confidence in a particular subject. S/he comes up with a list of items meant to give insights about self-confidence (subjectively imposing on respondents a list drawn from her/his own worldview). Any factor inaccessible to or non-existent in the mind of the quantitative researcher, yet worth including in the data collection instrument at the design stage, is less likely to be measured. It would be surprising if a researcher managed to capture every perspective needed to measure self-confidence from respondents' worldviews. S/he scales the items into a Likert format (just an example) and collects data from respondents who can do nothing but honestly (if no social desirability bias is activated) choose from the researcher's shopping list of items to numerically depict their self-confidence. The survey researcher eventually runs a factor analysis to find out which items hang together to measure self-confidence. S/he may find different shades or dimensions of self-confidence, which s/he will surely have to name according to the underlying concept(s). The qualitative name given to an underlying dimension may (slightly) differ from one person to another (due to multiple factors), and this is still labelled quantitative, objective research. This post is not meant to challenge the objectivity usually associated with quantitative research. I do like its rigour anyway.
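To make that workflow concrete, here is a minimal, hypothetical sketch of the steps described above: simulated Likert-scale responses are fed to an exploratory factor analysis, and the researcher is still left with the qualitative job of naming whatever dimensions emerge. The item wording, sample size, number of factors and all numbers are made-up assumptions for illustration, not data from any study mentioned in this post.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)

# Fake survey: 200 respondents, six 5-point Likert items about self-confidence.
# By construction, items 0-2 are driven by one latent trait and items 3-5 by another.
n_respondents = 200
trait_a = rng.normal(size=n_respondents)   # e.g. confidence in one's ability (assumed)
trait_b = rng.normal(size=n_respondents)   # e.g. confidence in social settings (assumed)
loadings = np.array([[1.0, 0.0], [0.9, 0.0], [0.8, 0.0],
                     [0.0, 1.0], [0.0, 0.9], [0.0, 0.8]])
noise = rng.normal(scale=0.5, size=(n_respondents, 6))
raw = np.column_stack([trait_a, trait_b]) @ loadings.T + noise
likert = np.clip(np.round(raw + 3), 1, 5)  # squash responses onto a 1-5 scale

# Exploratory factor analysis: which items "hang together"?
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(likert)
print(np.round(fa.components_, 2))
# The output is only a matrix of loadings; deciding that factor 1 "is"
# ability-related confidence and factor 2 social confidence is exactly the
# qualitative naming step discussed above.
```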

In a piece of research I published on quantitizing and qualitizing data in mixed methods research, the findings point to many things that cannot be quantitized and that would have gone missing from the data collected if the study had relied solely on a quantitative survey questionnaire. As argued in that article, silence, hesitation fillers, and implicit, unsaid, supra-segmental or non-linguistic responses are not easily captured quantitatively. I call them un-quantitizable data. If a student is determined to make a difference in their educational choice, their tone (sometimes of surrender) and/or sense of revolt against all sorts of sociocultural stereotypes may never appear anywhere in a quantitative survey data collection instrument. I assume data science coupled with technology will forge the way out of this huge Big Data mess. Silence, deep thought, and the pain points a respondent feels while answering are meaningful in qualitative research, but in quantitative research they are data loss. Big Data needs small data.

Each research paradigm has its advantages and disadvantages. Quantitative research has its established sampling strategies, and it is not surprising to find the minimum sample size commonly set at around 30 or more. The maximum sample size, however, is rarely set, on the argument that the bigger the sample, the better. I like the fact that there are established formulas to determine the sample size for quantitative research (one common example is sketched below). When investigating social and human behaviours, however, it is important to think twice to ensure adequate data are collected to meet the objective the research is designed for, bearing the context in mind as well. Context is of paramount importance in the conduct of research. The size of the sample is important, but it is not everything in research. In a well-designed and well-conducted qualitative study, you may reach saturation with fewer than 20 interviews (my rule of thumb, to be tested with data in the near future, but I would welcome any pointers to existing evidence). It is neither cost-effective nor ethical to keep gathering data once sufficient data have been collected to generate the necessary insights. In qualitative research, this is called data saturation, the point beyond which no new information is collected from respondents.
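For illustration, one of the most widely cited of those established formulas is Cochran's sample-size formula with a finite-population correction. The confidence level, margin of error, estimated proportion and population size below are invented values for the example, not figures taken from this post.

```python
import math

def cochran_sample_size(z, p, e, population=None):
    """z: z-score for the chosen confidence level, p: estimated proportion,
    e: margin of error, population: optional finite population size."""
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite-population correction
    return math.ceil(n0)

# 95% confidence (z ≈ 1.96), maximum variability (p = 0.5), ±5% margin of error
print(cochran_sample_size(1.96, 0.5, 0.05))                   # ≈ 385
print(cochran_sample_size(1.96, 0.5, 0.05, population=2000))  # ≈ 323
```

Note how quickly the formula settles on a few hundred respondents regardless of how large the population is, which is part of why "bigger is better" is more habit than necessity.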

Depending on the research objective, I would even go for a sample size of 1 and still be able to generate the insights needed to understand whether a product or service is functional. Depending on the importance attached to a product or service and the user's familiarity with it, one user can fully describe it, find errors, and propose improvements in a way that can even surprise its designers. A user experience researcher can collect all that is needed from one "good and regular" user of a product (of course, two heads are better than one) to understand what is going on with the product or service in real life. Various researchers have proposed sample sizes for qualitative research ranging from a minimum of 3 to a maximum of 25, depending on the qualitative design and approach (case studies, ethnographic studies, phenomenological research, grounded theory, focus group-based research, etc.). Don't mind these sample size numbers: I almost killed myself with more than 200 interviews and narrative biographic educational essays, and it took me about three years to process such huge, thick data (the research objectives required those extra miles, anyway!). There is no harm in going quantitative or qualitative, but mixing both in some (not all) studies helps generate more insights. The key is knowing why, when, and how to go monomethod or multimethod in this Big Data era, which is not an era of big samples in qualitative research. Big Data will contribute, and is contributing, to how we access and process data, but cost-efficiency and other factors remain instrumental in sampling.

