Difficult to Solve: Ethics and Privacy in Big Data

As the internet and social media have developed so fast and come to dominate people’s daily lives, their underlying drawbacks have gradually been revealed: problems of ethics and privacy.

Let’s first talk about personal data and privacy. I’m a self-protective person, so this topic is valuable to me. When I was in high school, I had heard about Facebook and Twitter. However, they were blocked in China at that time. Then QQ (an instant messaging (IM) tool by Tencent, which later produced WeChat) became popular in China. QQ has not only an IM function but also a social feature called Qzone. Users could publish posts and logs on Qzone. It is kind of like WordPress, but with more features. Then, when I was an undergraduate, blogs, microblogs, and WeChat became popular.

Actually, back in 2010 and 2011, I actively made posts, wrote blogs, and shared information with friends. However, after 2012, all kinds of online bluffs caught my eye, and I also saw how easy it is to find people, even their private info, based on their online records. Though this mostly happened to people who had behaved badly, I felt uncomfortable that my info could be so easily retrieved. So I became a “lurker”.

It seems that the majority of social media users haven’t formed the awareness to protect their private information online. In my opinion, this is related to their unawareness of personal rights, and also to whether their country pays attention to the issue. Take research involving people as an example: researchers in China don’t need to submit any application like an IRB (Institutional Review Board) application, even for publications. Therefore, when I first filled out an IRB application, I felt it was really time-consuming and heavily held back my progress. I once spent over a month getting IRB approval. However, I know it is about the ethics of protecting human rights.

Even with IRBs, most people still have no idea that their data has been used by others, whether for scientific research or for other purposes. For example, before I took a text mining class, I never thought that my comments on a public page on FB could be collected by others. The same goes for Twitter. It is true that we know our posts are set as public, but we may just think they can be viewed in public, not downloaded or further analyzed. On the other hand, we also understand that it is almost impossible to collect data only after getting everyone’s permission, especially with big data. Therefore, how to define the boundary of ethics is still vague to me.

Final Project Progress

Enlightened by Jenny’s suggestion, I incorporated the Big Five Inventory into my research. I used to think I might only test “Introversion” and “Extraversion”, but when I looked into the other personality traits, I thought lurking might relate not just to introversion but also to openness. So I proposed a further question: whether openness also affects lurking behavior. However, I have the feeling that the ultimate result might contain more than two related variables.

I’ve revised my survey questions based on Jenny’s feedback, and I will publish the survey tomorrow to collect data. Things are going along nicely, so I’m expecting a good result.


