Chen Li's post in our Facebook discussion drew my attention to MTurk. I used to take it for granted that when our studies concern psychology or certain human behaviors, in other words, when we want to run online questionnaires and experiments, MTurk is a good tool to use. However, a question I have been ignoring so far has come to mind: is the data collected from MTurk representative and reliable enough?
Past experience designing experiments and questionnaires has taught me that generalization is a tough problem, and generalization depends on what data we obtain. I used to think that as long as the number of participants was big enough, the data would somehow represent a larger population, such as the US population. However, some scholars have claimed that the MTurk population is a unique one, because many MTurk subjects tend to be "younger, overeducated, underemployed and less religious" (Paolacci & Chandler, 2014). Therefore, MTurk cannot represent the whole population, particularly Black and Asian respondents. On the other hand, according to their earlier paper, although online participants tend to have lower incomes and higher education levels than the general US population, "internet subject populations tend to be closer to the US population as a whole than subjects recruited from traditional university subject pools" (Paolacci, Chandler, & Ipeirotis, 2010).
Beyond the representativeness problem, people are also concerned about data quality. Some MTurkers may be too experienced at answering these questions, and some may not want to give honest answers. This is somewhat similar to lab experiments, especially when we want to test user behaviors or psychological states. I once read an article describing how some participants like to give results that confirm the pre-established hypothesis. Even if we don't tell them the purpose of our experiment, some participants can still infer it from the questionnaire items. I am one of these people, and I realize that my behavior could introduce bias into a study… So far, this remains an open problem, much like the questions about experimental generalization and self-report bias that I asked Bryan, my HCI instructor. He told me these issues are pretty difficult to address at present.
To sum up, I am still willing to use MTurk as a subject-recruitment tool in future studies. After all, it is a good platform in terms of time, compensation, and sampling.
Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23(3), 184-188.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411-419.