Final Project Progress

The biggest challenge for my final project, which I also mentioned in my poster, is insufficient sample data. Since lurkers are a hard-to-involve online population (Andrews, Nonnecke & Preece, 2003), it is very difficult to collect enough data online. Based on the data I have collected so far, almost 70% of the participants are active Facebook users. The correlation analysis would therefore be inaccurate, especially for the personality traits related to a high lurking level. I’m still figuring out how to narrow down my online participants to lurkers on Mturk. I thought I could set a restriction on Mturk. However, Jenny said I shouldn’t mention “lurker” or “lurking” online, because these words are easily interpreted as negative. Alternatively, I could expand the recruitment to increase lurkers’ participation, but that is not an option for me since I cannot afford the cost.

Then, I decided to use Mabi and Li Chen’s method: sending questionnaire links on Facebook. I would add a further step: pre-selecting the participants. I would look into some Facebook groups, mark the lurkers, and then send them my link to ask for help. I think personal messages might be useful for getting lurkers involved. My target is to collect 30 responses from lurkers.

Since this is the last blog post I write for IST700, I would like to talk about what I’ve learned so far. Although I will be a PhD student this fall, I feel I know little about research. That is why I wanted to take this course in my last semester. I found it very difficult to keep up with the other students in the first several classes. One reason is my poor listening; the other, and the main one, is that I didn’t understand some of their terms and methodologies, especially those from philosophy, which were really strange to me. Thanks to IST700, I have started to learn about them, as they are crucial to my future research. By doing the assignments and readings, I have gained a brand new understanding of writing, especially professional writing. Finally, the class discussions gave me the chance to hear others’ fresh ideas and showed me the possibilities of interdisciplinary research.

Thanks to Jenny for her great instruction, and thanks to all my classmates for their great work this semester. I hope you all have a bright future in research.

Search Engine as Research Machine?

Actually, I never regarded search engines as a research method in the past, and I still hold this view unless the research topic is about searching or search engines.

A search engine is a good tool for me to get some information quickly before the research. After I get enough information from various websites, I can also download some literature via search engines.

When talking about what research could be done based on search engines, I’m pretty interested in people’s behavior on search engines. As we discussed this Monday, although most people use the search box to find websites, I like to type the web address directly. The addresses of famous websites, with www. or even in a shorter form (you don’t actually need to type the “www”), are very easy to type, and then you can simply press Enter to reach these sites. Why am I different? Actually, I don’t know. My reason is very simple: since I’m already typing on the keyboard, I don’t want to switch to the mouse or touchpad. That would cost me about 5 more seconds for the operation, and another 5 seconds or so for loading one more page (the search results).

My research ideas always come from my daily life. You can see that my final project topic is lurking. I’m a lurker, and I might post only once a month on average. I think it is worth my while to think about what others think of lurking.

Sentiment Analysis on emoji

An interesting topic got me thinking about sentiment analysis: sentiment analysis on emoji. Yingya shared an article about emoji on social media, and then she posted a statement that future sentiment analysis could cover emoji in addition to text-based information. So I started to wonder whether that is possible. My first thought is that emoji can represent human sentiment to some extent.

The emoji in row 1, columns 4 and 5, are pretty standard expressions of love and kisses.

However, not every emoji has a single meaning for everybody. Take the emoji in row 2, column 2. I often use it to mean helpless, neither crying nor smiling, whereas my roommate thinks it means laughing to tears. I like to use the emoji in row 3, column 2 when I don’t want to speak, while my roommate thinks it means ill. The emoji in row 2, column 7 I could use for either surprise or fright. As for the emoji in row 2, column 5, I don’t even know its meaning.

Different people understand the same emoji differently. Even the same person might use it differently in different contexts. So how can we apply purely sentiment analysis techniques to emoji?

Another article Yingya shared on Facebook mentioned that automatic sentiment analysis can be inaccurate, because slang and sarcasm are really difficult to detect. I think the same applies to emoji. Some people might use a “smile” emoji, but how can you know it is not a smirk or irony?

In conclusion, even though sentiment analysis has developed fast, core issues like detecting sarcasm and slang are still unsolved. Emoji, being different from text, are even more difficult to interpret correctly. One possible idea I could propose is to connect demographics with emoji usage, use post-demographics to categorize the emoji, and then apply categorization algorithms for further analysis.
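To make the idea of connecting emoji interpretations with groups of people a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the emoji names, the group labels and the responses are invented, not collected data. It only shows how one might measure how much people agree on an emoji’s meaning and how the dominant meaning differs between groups.

```python
from collections import Counter, defaultdict

# Invented, illustrative responses: (emoji name, respondent group, interpretation).
# These are NOT real survey results.
responses = [
    ("face_with_tears", "group_a", "helpless"),
    ("face_with_tears", "group_a", "helpless"),
    ("face_with_tears", "group_b", "laughing"),
    ("face_with_tears", "group_b", "laughing"),
    ("heart", "group_a", "love"),
    ("heart", "group_b", "love"),
]

def agreement_by_emoji(rows):
    """Per emoji, the share of responses matching its most common interpretation."""
    labels = defaultdict(Counter)
    for emoji, _group, meaning in rows:
        labels[emoji][meaning] += 1
    return {
        emoji: counts.most_common(1)[0][1] / sum(counts.values())
        for emoji, counts in labels.items()
    }

def majority_by_group(rows):
    """Per (emoji, group) pair, the most common interpretation within that group."""
    labels = defaultdict(Counter)
    for emoji, group, meaning in rows:
        labels[(emoji, group)][meaning] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in labels.items()}

agreement = agreement_by_emoji(responses)
by_group = majority_by_group(responses)
print(agreement)
print(by_group)
```

An emoji with low overall agreement but a clear majority meaning inside each group would be a candidate for group-aware categorization, while an emoji with high overall agreement could probably be treated like an ordinary sentiment word.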

Finally, as for my research progress, unfortunately I was pretty busy this week, so I had little time to work on it. I’ve finished my survey questions; next I will post the survey on Mturk or other crowdsourcing websites to get the results.



Sentiment VS Semantics

Actually, I’m pretty familiar with the word “sentiment”. Last semester, I took Prof. Yu’s Text Mining class, in which sentiment analysis was one of the main components. I still remember that Prof. Yu introduced two online sentiment analysis tools and had us manually label 50 online movie comments. The three major sentiment classes are positive, neutral and negative.

Unlike sentiment, semantics is rather complex in natural language processing. Also, different fields define it differently. For example, semantics in linguistics studies the common rules of meaning and the similarities and differences among languages. In logic, semantics is the interpretation of a logical system; in other words, it concerns values, including symbols such as >, <, !=, 1 and 0. In computer science, semantics comes down to machines’ understanding of natural language.

From my perspective, these two concepts have similarities, particularly from the NLP aspect. Tokenisation, stemming, stopword removal, and even decision trees are general approaches whether you want to study sentiment or semantics. However, sentiment is relatively easier than semantics, since language is very abstruse and complicated for human beings to process or analyze, let alone machines.
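The shared preprocessing steps can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the stopword list is tiny and made up, and `crude_stem` is a simple suffix stripper standing in for a real stemmer such as Porter’s, not an implementation of it.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "or", "of", "to", "this", "it"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def crude_stem(token):
    """Strip one common suffix if at least 3 characters remain. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]

print(preprocess("The actors were amazing and the ending surprised me"))
# prints ['actor', 'were', 'amaz', 'end', 'surpris', 'me']
```

The same token list could feed either a sentiment classifier or a semantic analysis step, which is why these approaches are common to both.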

The Development of Web Space

Rogers (2013) describes several different periods of web space in his book: the web-as-hyperspace period, the web-as-public-sphere period, the social-networks period and the locative period. According to this sequence, our web space is changing from hyperspace to a more “grounded” space. Or, as Rogers puts it in his book, the death of cyberspace is also caused by cybergeographic space (p. 40).

Indeed, what I have experienced of web space so far is pretty similar to Rogers’s account. In the past, around when I was in senior high school, websites could only associate with other sites and organizations by linking. Some websites would show lots of links to related websites in a certain area. As Rogers mentions, these links could be viewed as “acts of associations” (p. 44). Some conclusions could also be derived from the different domain types. Rogers presents two very interesting figures (Figures 2.7 and 2.8) about cyberspace from a time when social networks and geography had not yet appeared (pp. 50-51).

Then, with the development of search engines like Google, Yahoo! and Baidu, websites tended to be separated into different spheres. Here an issue arises: since search engines apply different algorithms to build these “spheres”, the accuracy of the websites’ links is crucial. If website owners set their links improperly, search engines will certainly fail. Therefore, the politics of web space matters a lot. Additionally, as Rogers mentions, web space is divided into subspheres such as the news sphere and the blog sphere. It becomes difficult to do cross-sphere searches, and the resources are also hard to explore (p. 52).

The network period may be the most understandable period to me. I did many studies on social networks as an undergraduate, especially for e-commerce. Network mapping, applied across many different sites, can dig out information hidden in the dark. Blogs and microblogs boomed in the network period, and self-representation came to the surface. Information is no longer released by a single medium but by everybody. Therefore, based on network mapping, we can review all the related information, which Rogers describes as “fact-checking”. For instance, if we get some information from a single media source, we might want to check its reliability against other related sources via network mapping.

Last is the locative period. Location is currently a very popular research field in combination with mobile technology. When I was looking for advisors for my PhD program, I found that many professors had started to study geographic information systems (GIS) and related areas. The locative period can benefit us a lot. Mobile location services bring us a lot of convenience when going out. For example, I just took a solo tour in Canada, and what supported me all through my trip was Google Maps. Especially in Montreal and Quebec, where people mostly use French, it was really hard for me to get familiar with the streets. As for locative web space, Rogers states that it could relieve the problems of “equality and demographic concentration” and scandalizing, with which I totally agree. Location and geography information collects geographic demographics automatically, and it can also easily be used to investigate scandals, for which Rogers presents an example in his book (p. 58).

Overall, the way people approach web space is evolving, and Rogers asks a good question: what is the politics of cross-sphere research? When we are doing our own research, we should think about it thoroughly.


Rogers, R. (2013). Digital methods. MIT Press.

Something about Mturk

What Chen Li brought up in the Facebook discussion drew my attention to Mturk. I used to take it for granted that when our studies are about psychology or certain human behaviors, in other words, when we want to use online questionnaires and experiments, Mturk is a good tool. However, something I had ignored so far came to mind: is the data collected from Mturk representative and reliable enough?

Past experiment and questionnaire designs have taught me that generalization is a tough question. Also, generalization depends on what data we obtain. I used to think that as long as the number of participants was big enough, the data would somehow represent a bigger population such as the US population. However, some scholars claim that the Mturk population is a unique population, because many Mturk subjects tend to be “younger, overeducated, underemployed and less religious” (Paolacci and Chandler, 2014). Therefore, Mturk cannot represent the whole population, especially Blacks and Asians. However, according to their earlier paper (Paolacci, Chandler & Ipeirotis, 2010), although online participants tend to have lower incomes and higher education levels than the general US population, “internet subject populations tend to be closer to the US population as a whole than subjects recruited from traditional university subject pools.”

Besides the lack of representativeness, people are also concerned about data quality. Some Mturkers might be too experienced at answering these questions, and some Mturkers don’t want to give their real answers. This is somewhat similar to lab experiments, especially when we want to test user behaviors or psychological states. I once read an article about participants who tend to give results that favor the pre-established hypothesis. Even if we don’t tell them the purpose of our experiment, some participants can still work it out from the questionnaire questions. I am one of these people, and I also realize that my behavior could introduce some bias into a study… So far, this is not a solvable problem. I asked Bryan, my HCI instructor, about experiment generalization and self-report bias, and he told me these things are pretty difficult to address at present.

To sum up, I’m still willing to use Mturk as a participant recruitment tool for future studies. After all, it is a good platform in terms of time, compensation, and sampling.



Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23(3), 184-188.

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411-419.

About Virtual Ethnography

In Kevin’s paper, he mentions that virtual ethnography has become an increasingly common methodology for studying online interaction. He also points out the common conflicts of virtual ethnography:

  • Reconciliation of space and time in a virtual field site
  • Negotiation of identity and authenticity in digital interactions
  • Management of ethical dilemmas encountered in online field study

I agree that these conflicts are the most typical, especially the ethical one. Since my research interest is social computing, I’m quite interested in studying human behavior on social media. That’s why I want to study whether social media affects people’s social ability. Or, for certain people, social media is like another, virtual society, in which they have gradually developed different forms of communication or social style, namely online communities. One concrete form of data could be online text-based data such as the message boards the paper discusses. However, ethics is a big deal in my research. As I said in my last blog post, I planned to use experiments for my study. However, taking many factors into consideration, it cannot be conducted at present. Even so, I still need to consider the problem of ethics for my future research. We need to take ethics into consideration whenever we involve ethnography in our research. For online ethnography, ethics is linked to identity and authenticity. As Kevin mentions in the paper, specific usernames can easily be traced back to real-world identities. Therefore, when conducting ethnography, participants’ personal information should be well protected.

Inspired by Kevin, I also need to take asynchronous time into account. As he states in the paper, thanks to the robust storage capability of current technology, message boards can support asynchronous conversation by archiving past messages, with different timelines for different people. This variable should be controlled in my future experiment. Additionally, lurkers are an interesting group of people on social media, and I’m also interested in studying them. I will definitely study these ideas in the future.

Ethnographic Research

Research, especially in the field of Human-Computer Interaction, regards ethnography as a crucial methodology. Then what does ethnography mean? Many books define ethnography as an approach to learning about the social and cultural life of individuals, communities, organizations, etc. I first learned about it from Bryan, the instructor of our HCI course. He also introduced us to several methods of ethnography such as field observation, surveys, interviews, experiments, etc.

I have gradually found that I cannot do research without ethnography. What I’m doing now and what I did in the past are all based on user observation, interviews, questionnaires and experiments. Hypotheses are always tested by analyzing the ethnographic data with both qualitative and quantitative methods.

However, here an issue arises: since our studies involve human beings, we should pay attention to ethics. In other words, how can we obtain data from people legally, and what data about people can we interpret and publish? In particular, people today have formed a habit of posting private information online. Some people are not even aware of the risks to their information security. Frankly speaking, our behavior is exposed online as long as we are online users. For instance, our browsing histories drive the advertisements recommended to us. This online information is very useful for behavior researchers as well as marketing specialists. This is still a big concern for HCI researchers at present.

As for ethnographic research, I want to talk a little about data collection, especially online and offline data acquisition. Orgad (2009) mentions in her chapter that online and offline are the two main sources of data. She also states that the internet itself differentiates online from offline data. The internet as a medium has its own features. Hine (2000) called the internet a plausible research field site. That is, the internet provides us with another research platform, especially for social behavior. However, should data from both sources be collected at the same time? Orgad (2009) argues that it depends on what question we want to ask. For example, if we want to ask about the connection between online and offline social environments, we need both. She also states in her conclusion that what matters most is the quality of the data, not whether it is online or offline.

When it comes to my own research project, I think it is quite complex. Last class, Jenny advised me against doing an experiment on online and offline sociability. Therefore, I’m trying to come up with a new methodology for this study. So, if I eventually need to collect data, what data should I collect? Based on Orgad’s argument, I need to collect both kinds, since my study connects the online environment to real-life settings. My research questions could be:

  • Can the social media environment weaken one’s social ability in real-life settings? (making friends)
  • Do people act differently on social media than in real-life settings? Why? (making friends)

I’m working hard on figuring out what data I should obtain.


References: Orgad, S. (2009). How can researchers make sense of the issues involved in collecting and interpreting online and offline data? In A. Markham & N. Baym (Eds.), Internet inquiry: Conversations about method (pp. 33-67).

Post-demographics: How does it benefit the social media researchers?

Rogers (2009) created the term post-demographics, which refers to data that is not demographic and can be grouped by interests such as movies, sports, TV shows, etc. These grouped data can show the basic social networks on different social media platforms. Social media researchers also aim to discover further underlying rules in post-demographic data, especially in social networking.

Here, I want to mention two social media platforms: Twitter and Weibo. Both are microblogging platforms, and from my perspective the post-demographic method fits them well, for the following reasons:

First, these platforms are topic-driven. Information spreads rapidly through topics because a topic is brief and mysterious, which attracts people to participate. Microblogs provide topics with a medium (the hashtag) through which they get noticed. A large population eventually leads public opinion. That’s why so many politicians and celebrities have public accounts. Then we users get used to the “hashtag mode”. For instance, as a recently active Weibo user, I habitually go through the top 10 topics first, and then choose the few that attract me most to get a broader view of people’s opinions.

Therefore, people start to realize that a topic can lead them to others who share the same interests or points of view. Weibo has noticed this too, so it provides a function that categorizes topics into movies, songs, travel, etc. This could be regarded as the demographic

Fig 1. Sample screenshot of top topics on Weibo

Second, following forms another kind of social network, which is different from the Facebook network.

References: Rogers, R. (2009) “Post-Demographic Machines”. In Annet Dekker and Annette Wolfsberger (eds.), Walled Garden. Amsterdam: Virtual Platform, 29-39.

Social Media of Things

Imagining that aliens know nothing about this world, let alone social media, I had better describe Facebook and Twitter in simple and understandable words.


In my understanding, Facebook is a famous social media platform founded in the United States. To know exactly how it works, we first need to know what social media is. “Social” refers to a kind of communication, just like the talking between you and me every day. “Media” means the tools for communication, like the telephone, radio and TV. Nowadays, social media relies mostly on the internet, which is a magic network. On the internet, we can use these social media tools to communicate by typing on a keyboard.

How does Facebook work for people?

Once we create our own account and enter our personal information, we get a homepage of our own to manage. On this homepage, we can upload photos, post sentences, and add friends. The photos and sentences can be seen by other Facebook users, who can also leave comments. We can also share this information and send it to friends via messages. In terms of privacy, you can choose a privacy level for each piece of your information, so as long as you have a strong sense of information security, you can use Facebook at ease.

Since Facebook is popular, it has begun to develop more applications such as games, events, etc. I have to say, most people in the US regard it as an indispensable part of daily life. That is also why so many researchers in psychology and HCI do research based on Facebook.


Twitter is another very popular social media platform in the US. However, it is somewhat different from Facebook. Here are the differences:

First, posts on Twitter are limited in length. In other words, each post can contain at most 140 characters.

Second, it is not a friend-based social network but a follow-based one. You can follow whoever you want, and anyone can follow you.

Third, it doesn’t let you choose a privacy level for each post, which means your posts are open to the public by default.

Fourth, it is mainly organized around topic hashtags, so that people interested in the same theme can discuss it together. That is also why so many celebrities, companies and organizations like using Twitter.

Here’s a sample page of mobile Twitter. As you can see, it shows all the posts from the people you follow; you can also follow new people, find new topics and write your own tweets.

As for how these social media tools affect interaction between people, I think they are a double-edged sword. On one hand, these tools broaden people’s social circles and create opportunities for them to make friends with people all over the world, especially those who share common interests and opinions. They also improve the efficiency and effectiveness of group work. On the other hand, they reduce communication in real-life settings, which might weaken people’s basic social ability and in turn affect their normal life.