Good memories from the seminar organized last year. It's time to make a new one!
Following the previous post, let's see if sampling matters.
This graph shows a few episodes of Euromaidan communication. It features QAP correlations between likes and betweenness for a particular week (keep in mind that QAP is about differences in likes/betweenness between nodes i and j). The correlation is quite strong; however, a regular sample overestimates it.
P.S. A sample correction will be discussed in further posts.
Graph: QAP correlation between likes and betweenness.
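A minimal sketch of what a QAP correlation does, assuming the dyadic matrices hold absolute differences |v_i - v_j| between nodes. The function names and the toy values below are hypothetical and are not taken from the Euromaidan data. The idea is to correlate the off-diagonal cells of two matrices and then compare the result with correlations obtained after randomly relabelling the nodes of one matrix.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def diff_matrix(values):
    """Dyadic matrix of absolute differences |v_i - v_j| (an assumed operationalisation)."""
    return [[abs(a - b) for b in values] for a in values]

def off_diagonal(matrix, order):
    """Flatten the off-diagonal cells after relabelling the nodes by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]

def qap_correlation(x, y, n_perm=1000, seed=0):
    """Observed correlation plus a permutation p-value (rows and columns permuted together)."""
    rng = random.Random(seed)
    identity = list(range(len(x)))
    y_cells = off_diagonal(y, identity)
    observed = pearson(off_diagonal(x, identity), y_cells)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if abs(pearson(off_diagonal(x, order), y_cells)) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

# Hypothetical node-level values, for illustration only.
likes = [12, 3, 7, 1, 9, 4]
betweenness = [0.40, 0.05, 0.20, 0.00, 0.35, 0.10]
r, p = qap_correlation(diff_matrix(likes), diff_matrix(betweenness))
print(f"QAP r = {r:.2f}, permutation p = {p:.3f}")
```

Permuting rows and columns together keeps the dependence between dyads that share a node, which is why an ordinary significance test is not used here.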
Imagine you have thousands of people commenting on a Facebook page multiple times. It would be nice to construct a network from these comments, but the computations become heavy because of the large number of people. Thus, a sample is needed. A question arises immediately: how would you draw it? Sampling networks is known to be a tough task. There are two quick answers: sample by the people who make comments, or sample by post. Which way is better? An intuitive suggestion would be to sample by people, because one can then observe the whole communication pattern as people talk to each other throughout the page. The risk is that, by chance, you lose an important person who is very active and holds the network together. When posts are sampled, it is possible that all typical topics are represented in the sample and, due to homophily effects, people are likely to be found within these conversations. The risk here is losing an important post and, with it, many comments and dialogues.
Let's check both ways!
For the exercise I use the Euromaidan data for the first week of January 2014. I run permutations and sample 10 different sets (this is enough to see the pattern). Then I generate 10 networks for each permutation and check whether the mean density and betweenness correspond to the real values in the complete dataset. I also vary the sample size: I run 10 permutations each for samples that are 5%, 20%, 40%, 60%, and 80% of the total data. I draw the samples by commentators and by posts separately.
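The two sampling schemes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the post: comments are assumed to be (person, post) pairs, two people are tied when they commented on the same post, and the toy data is randomly generated. All function names are hypothetical.

```python
import random
from itertools import combinations

def co_comment_edges(comments):
    """Undirected ties between people who commented on the same post."""
    by_post = {}
    for person, post in comments:
        by_post.setdefault(post, set()).add(person)
    edges = set()
    for people in by_post.values():
        edges.update(combinations(sorted(people), 2))
    return edges

def density(comments):
    """Share of realised ties among all possible ties between the people present."""
    n = len({person for person, _ in comments})
    if n < 2:
        return 0.0
    return 2 * len(co_comment_edges(comments)) / (n * (n - 1))

def sample_comments(comments, share, by, rng):
    """Keep comments whose person (by=0) or post (by=1) falls into a random sample."""
    units = sorted({row[by] for row in comments})
    k = max(1, round(share * len(units)))
    kept = set(rng.sample(units, k))
    return [row for row in comments if row[by] in kept]

# Randomly generated toy data: 200 comments by up to 40 people on up to 15 posts.
rng = random.Random(1)
comments = [(f"u{rng.randrange(40)}", f"p{rng.randrange(15)}") for _ in range(200)]
full = density(comments)

for share in (0.05, 0.2, 0.4, 0.6, 0.8):
    for by, label in ((0, "people"), (1, "posts")):
        runs = [density(sample_comments(comments, share, by, rng)) for _ in range(10)]
        print(f"{share:.0%} by {label}: mean density {sum(runs) / len(runs):.3f} (full {full:.3f})")
```

The same loop can report any other network statistic in place of density, which is how the comparison for betweenness would be set up.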
The first graph shows the results for betweenness centrality. It is evident that the sample by commentators works better. Sampling 20% of the people is enough to get the same results as in the original data. The samples by posts work very poorly. This is quite logical: since many posts are dropped by chance, not all classes of conversations are selected, which reduces the likelihood that a node lies on a path between different classes.
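For readers who want to see what is being compared, here is a small sketch of betweenness centrality for an undirected, unweighted network, written after Brandes' algorithm. The toy network (two triangles joined through nodes "c" and "d") is hypothetical; the post does not say which software produced the real values.

```python
from collections import deque

def build_adjacency(edges):
    """Symmetric adjacency sets from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness(adj):
    """Unnormalised betweenness: number of shortest paths passing through each node."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair of endpoints was visited from both sides, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
scores = betweenness(build_adjacency(edges))
print(scores)
```

In this toy network only the two bridging nodes get a non-zero score, which illustrates why losing the conversations that connect different clusters distorts betweenness so strongly.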
However, the story is not so bright for density. Both samples work pretty badly. One must select 80% of the people to get even close to the real value. Again, this makes sense. Since density is the ratio of actual connections to all possible connections, it's intuitive that any sample will, by chance, drop a number of nodes along with their connections.
It's nice to have robust samples for betweenness, since the diffusion of information can then be studied. The fact that density is not reproduced is upsetting. It looks like some dyad and triad effects can be missed because of the wrong sampling.
Another thing to check is the weight of each object selected for a sample. For instance, in this data I have 330 posts. Imagine you toss out only one of them: does it really do harm? How many edges are you going to lose? It appears that a lot! The minimum number of edges lost is 1 and the maximum is… 360. So in the worst-case scenario, just by chance, you can lose hundreds of ties by not selecting particular posts. For samples by people, the maximum number of “killed” ties from not selecting one person in my data is 70. This explains why the samples by people are more efficient for betweenness centrality and why both samples are so fragile in the case of density.
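The leave-one-post-out check can be sketched like this, again assuming comments come as (person, post) pairs and that a tie disappears only when no other post supports it. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def edges_lost_per_post(comments):
    """For each post, count the ties that exist only because of that post."""
    by_post = {}
    for person, post in comments:
        by_post.setdefault(post, set()).add(person)
    # How many posts support each pair of people.
    support = Counter()
    for people in by_post.values():
        support.update(combinations(sorted(people), 2))
    return {
        post: sum(1 for pair in combinations(sorted(people), 2) if support[pair] == 1)
        for post, people in by_post.items()
    }

# Toy data: a and b also meet under p2, so dropping p1 only removes a-c and b-c.
comments = [("a", "p1"), ("b", "p1"), ("c", "p1"),
            ("a", "p2"), ("b", "p2"), ("d", "p3")]
lost = edges_lost_per_post(comments)
print(lost)
print("min:", min(lost.values()), "max:", max(lost.values()))
```

The equivalent figure for people is simply each person's degree, since dropping a person removes all of their ties.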
Hi! This post summarises my latest activities.
And some materials after the master class in Kyiv (in Ukrainian).
The Ukrainian revolution “Euromaidan” triggered a quick response known as the “Anti-Maidan” social movement. This movement has a Facebook page, which I analyse using Netvizz. Here I discuss an undirected network of people who are tied by commenting on the same posts during March and April 2014. Overall, I have 596 nodes and 7,204 edges. A brilliant paper recently published in Social Networks shows that betweenness centrality has no effect on making profitable choices in a lab. I wonder whether betweenness centrality plays no role in my data as well. When a person makes a comment in my data, he or she can receive a premium from others for their opinion (Facebook does not allow punishments, though). In a way, earning more likes (profit) depends on having the correct opinion and choosing the right words. In this setting closeness centrality means that a person is directly connected to many other threads of conversation. However, the same person may be well connected within one cluster only, making the exchange of information redundant. Betweenness should solve this issue. Indeed, an OLS model shows that 40% of the variation in likes is explained by betweenness, whereas closeness has no significant effect. P.S. And of course an image made in Gephi =)
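To make the "share of variation explained" concrete, here is a minimal one-predictor OLS sketch. It is a simplification: the model in the post also includes closeness, and the numbers below are made up for illustration, so they do not reproduce the 40% figure.

```python
def ols_simple(x, y):
    """Least-squares fit of y = intercept + slope * x, returning R squared as well."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical per-person values, not the Anti-Maidan data.
betweenness = [0.00, 0.02, 0.05, 0.10, 0.12, 0.20, 0.30]
likes = [1, 4, 2, 9, 5, 14, 12]
intercept, slope, r_squared = ols_simple(betweenness, likes)
print(f"likes = {intercept:.2f} + {slope:.2f} * betweenness, R^2 = {r_squared:.2f}")
```

R squared here is the share of the variance in likes that the fitted line accounts for, which is the quantity the 40% in the text refers to.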
UPD: of course, the original article of Bas is about social learning, and here we don’t have any empirical sign of it
Public online conversations on Facebook assisted the Euromaidan revolution. Yet the topics of the political debates people were engaged in are not known exactly. What were the issues driving Euromaidan activists online? Did they talk about Euro-integration, corruption, injustice, or economic inequalities?
Just out of curiosity I decided to compare how many comments people left with the keywords “Euro…” and “Oligarchs” (in English/Russian/Ukrainian). This is just a crude count without any sentiment analysis, i.e. no positive/negative dichotomising. A few observations:
1. In terms of frequency, talk about “euro” clearly dominated.
2. Both topics followed critical junctures: the start of the movement, Parliament (VR) voting for the antidemocratic laws in January, Yanukovych fleeing in February, then the Crimean annexation in March.
3. There was only one episode of divergence. In March an increase in “euro” talk coincided with a critical drop in “oligarch” conversations. Perhaps the Crimean story generated a sort of political mobilisation?
4. Surprisingly, the period of elections did not witness an increase in these topics (apart from a slight lead for “euro”). I had expected the opposite.
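A crude count of this kind can be sketched as below. The keyword stems, the weekly grouping, and the (date, text) input format are all assumptions made for the example; the post only says that the keywords were searched in English, Russian, and Ukrainian.

```python
import re
from collections import Counter
from datetime import date

# Assumed keyword stems; a comment counts once per topic if any stem appears.
PATTERNS = {
    "euro": re.compile(r"euro|евро|євро", re.IGNORECASE),
    "oligarch": re.compile(r"oligarch|олигарх|олігарх", re.IGNORECASE),
}

def weekly_counts(comments):
    """Count comments mentioning each topic, grouped by ISO (year, week)."""
    counts = {topic: Counter() for topic in PATTERNS}
    for day, text in comments:
        year, week, _ = day.isocalendar()
        for topic, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[topic][(year, week)] += 1
    return counts

# Made-up comments for illustration.
comments = [
    (date(2014, 1, 6), "Euromaidan now"),
    (date(2014, 1, 7), "Олігархи знову"),
    (date(2014, 1, 8), "євроінтеграція"),
    (date(2014, 3, 3), "nothing relevant here"),
]
counts = weekly_counts(comments)
for topic, by_week in counts.items():
    print(topic, dict(by_week))
```

Note that a stem like "euro" also matches the word "Euromaidan" itself, so a real count would need a decision on whether such mentions belong to the topic.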
A few days ago I was honoured to participate in the special symposium organised by the Journal of Comparative Economics and VoxUkraine, where I presented some speculations about civic society and online social capital in Ukraine. It was a great forum where I received neat and challenging comments & suggestions (this is the best euphemism for criticism I could come up with).
I guess this random and scattered blog post is just an attempt to spell out my views on social capital online. I think it is important to reflect one more time on the definitions and their interpretation in the context of online interactions.
(1) Let's start from the very beginning. What is social capital? Sociological determinism, rooted in the works of Granovetter, Lin, Flap and other scholars of the late 80s and early 90s, suggests that social capital is something an individual may get after investing in social ties. Social ties with important people bring relevant information, reputation, better jobs, etc. I guess Bourdieu belongs to this school as well, with the exception that social stratification was crucial in his works: social capital helps a person with a low social status to climb the social ladder.
An alternative view on social capital is rooted in the works of Coleman (big fan!) and Putnam (not such a big fan). Social capital here is a common good that arises from communication between people. People engage in social interactions, and that is how they generate social norms, trust, and institutions. When interactions between people are “nice” (repetitive, mutually oriented, rewarding), social capital pops up as a by-product of these interactions and benefits all individuals indirectly. For example, a high level of trust reduces transaction costs.
The question is: is there any way for social capital to emerge from online communication? And can it segue offline? I believe that the answer is positive in both cases; however, it is not going to please sociologists. I guess we should abandon the idea that social capital is a return on ties. I doubt that online interactions, especially during revolutions, are based on social group cleavages. First of all, it is not so simple to signal social group membership online; secondly, grass-roots civic movements usually unite people with different backgrounds (EuroMaidan united the working class and the service class as well as students and pensioners). Social capital in the case of online civic movements is indeed something that people did not have before their online interactions, something that emerged from those interactions, and something that benefits the whole community indirectly, in the sense that people who belong to the same Facebook page derive positive experience from it and act with respect to this experience. Here Coleman’s theory fits perfectly, because we really want to measure the new social norms and trust that emerge from human interaction. A crucial thing here (and I thank Ruben Enikolopov for his hint) is to establish a valid reference category. What if social capital had NOT been developed online? This means that people would have gone online and spent their time reading some posts… but they would not have developed mutual trust, a sense of community, or an understanding of how to behave according to the group's expectations in a given context. I think online social capital promotes homogeneity and regularity of behaviour (and this is the path to offline), while the lack of social capital leads to the atomisation of individuals whose behaviour is not related to that of others. An important thing here is that their behaviour may look the same (sharing the same post), but this behaviour is not necessarily affected by the actions of other people.
It is just like Weber's example: when people open umbrellas, this is not a social action because it is not mutually oriented. These are just reactions to the same independent cause (rain), and the reactions just happen to be the same.
(2) Social capital and networks. When many people share the same post, this is important. But it says nothing about their chances of sharing the next post. That is why my study concentrated on social networks. Regardless of the theory adopted (Granovetter vs Coleman), social capital is always embedded in social networks. And the only way to see whether people's actions are related to the actions of other people is to measure the properties of social networks. By looking at the density or transitivity of online networks we may see whether there are some structures of connectivity, or whether people are unrelated in their online behaviour. By looking at various measures of centrality we may see whether people occupy a certain position within the network and, therefore, have some special role in creating social capital.
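As a reminder of what these two measures capture, here is a short sketch that computes density and transitivity for a toy undirected network. The network is hypothetical, and the adjacency-set representation is just one convenient choice.

```python
from itertools import combinations

def density_and_transitivity(adj):
    """Density = realised ties / possible ties; transitivity = closed triples / all triples."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    closed = 0
    triples = 0
    for neighbours in adj.values():
        # Every pair of neighbours of a node forms a connected triple centred on it.
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # the triple is closed into a triangle
    transitivity = closed / triples if triples else 0.0
    return density, transitivity

# Toy network: a triangle a-b-c with one extra person d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
density, transitivity = density_and_transitivity(adj)
print(f"density = {density:.2f}, transitivity = {transitivity:.2f}")
```

High transitivity with moderate density is the kind of pattern that would point to clustered, related behaviour rather than people acting independently.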
(3) Online social capital and posts on Facebook. Let's say you agree with me and think that social capital is a common good and that it can somehow be seen in social networks… the next issue is how to measure online interactions in a way that reflects networks of people. This has been a big issue for my research, since I study people who comment on the same post. Here I have to agree with all my critics (Keith and Eric made a very strong point there), who pointed out that this measure reflects the exposure of people to the same information but not necessarily communication between people. Can it still be the case that my measure captures social capital in any way? I guess so. Again, if we agree that social capital is a common good that emerges from communication, then it is crucial to understand the channels through which this information flows. In the worst-case scenario, all people in my data are exposed to different posts, or they have no pattern of exposure, just random connections to miscellaneous information. In the best-case scenario, people in my data are not only exposed to the same information; there is also a structure of connections that allows us to draw conclusions about clusters where information sticks and bridges that theoretically allow information to spread. How do these bridges work? Even though I don’t measure conversations between people and their reciprocity, I think I may still suggest that when a person receives information from one cluster, the same person is likely to bring this information to another cluster in the comment section.
(4) Some other issues, like the social profile of users, the role of strong/weak ties (and their operationalisation online), and the possibility of studying alternative social media, will be discussed in the next entries.