Editorial matter, selection and introduction © Hallvard Fossheim and Helene Ingierd 2015.
Individual chapters © respective authors 2015.
The authors have asserted their rights to be identified as the authors of this work in accordance with
Open Access:
Except where otherwise noted, this work is licensed under a
This book was first published 2015 by Cappelen Damm Akademisk.
This book has been published in cooperation with The Norwegian National Committees for Research Ethics.
ISBN: 978-82-02-48035-6 (Printed Edition)
ISBN: 978-82-02-48951-9 (E-PDF)
ISBN: 978-82-02-49235-9 (XML)
Typesetting: Datapage India (Pvt.) Ltd.
Cover Design: Kristin Berg Johnsen
«Internet research» does not make up one unified object. The term denotes a wide array of research on Internet activities and structures, as well as research that utilizes the Internet as a source of data or even as a means of processing it. There is still good reason to make Internet research the unifying topic of an ethical treatment, however, for many forms of Internet research confront us with the same or similar ethical challenges.
In a given Internet research project, there is sometimes real worry or disagreement about what will constitute the best solution from an ethical point of view. It is relevant to this state of affairs that the relative novelty of the technology and practices involved can sometimes make it difficult to see when two cases are ethically similar in a relevant way and when they are not. Similarly, it is not always entirely clear whether and to what extent we may transfer our experiences from other areas of research to Internet research. In some respects, Internet research seems to be part of a broader technological development that confronts us with substantially new challenges, and to the extent that this is true, there will be less of a well-established practice on how to handle them. Some of these challenges also seem to apply to the judicial sphere when it comes to formulating and interpreting relevant laws.
To provide something by way of a very rough sketch of the sort of issues that confront us, many of the ethically relevant questions voiced about Internet research concern personal information, and are posed in terms of access, protection, ownership, or validity. These questions are especially relevant when the research concerns what is often referred to as Big Data, our amassed digital traces constituting enormous data sets available to others. The fact that much of this information has been created and spread willingly generates complex questions about degrees of legitimacy for the researcher who chooses to appropriate and recontextualize that information, sometimes also with the potential of re-identification from purportedly anonymized data. The issues naturally become even more complex in cases involving information about third parties, or where the individual is a child or young person.
Person-related information that is available online to researchers (and others) covers the entire spectrum from the trivial and commonly known to the deeply sensitive and personal. Along another ethically relevant axis, one encounters information that is openly available to anyone interested at one extreme, and information that is protected by access restrictions or encryption at the other extreme.
There is also the question of the impact that research can have on the object of research, i.e., whether there is a risk of harm to participants, and whether and to what extent one should demand that the research constitutes a good to others besides the researcher. This feature is shared by all research on human beings. But in the case of Internet research, the impact (and the importance of the impact) is often particularly difficult to foresee; think, e.g., of how research on an individual or a group online can affect those people's behavior, either as a direct consequence of the researcher's presence or as an indirect consequence of the publication of the results.
Moreover, in much Internet research, the data that is collected, systematized, and interpreted is generated in contexts other than those of research. Questions arise as to when exceptions from consent are justified, as well as to how consent may be obtained in a voluntary and informed manner in an online setting. In research on or with human beings, voluntary informed consent constitutes a sort of gold standard, deviations from which in most contexts require special justification. While the requirement of voluntary informed consent is grounded in respect for research subjects, a related strategy for ensuring ethically responsible practice in cases where consent might not be required is to take steps to inform the relevant persons about the research that is carried out.
Finally, it bears mention that ten years ago, there was precious little hint of how important social media would be today, and it is as difficult – i.e., impossible – for us to predict what might appear on our online horizon in the coming years. So while sorting out the ethically salient differences between practices and platforms is of great importance for finding responsible ethical solutions, we should also keep in mind the importance of realizing that both habits and technology can change the premises of the discussion swiftly and dramatically.
In his contribution,
As Internet research evolves, there is also a great need for knowledge about the legal requirements related to using the Internet as a data source. In her contribution,
My topic is research on social media and the requirements regarding information and consent arising from such research. This article will primarily discuss the responsibility of researchers for giving due consideration to their research participants. It is also important to remember, however, that the value of the research is an ethical consideration that must be given weight, as the Norwegian National Committees for Research Ethics (NESH) points out in its guidelines on Internet research (NESH, 2003, point 1): Research on the Internet is valuable both because it can generate insight into a new and important communication channel and because the Internet provides the opportunity to study known phenomena (e.g. formation of norms, dissemination of information, communication, formation of groups) in new ways.
The requirements regarding information and consent when conducting research on social media are not essentially different from those in other research involving people’s participation. However, this research is conducted in contexts that are structured by technologies and in which the conditions for communication are not always clear or known to everyone involved. This applies in particular to the boundaries between the public and private spheres, which are often drawn in new ways and which therefore leave us in some cases uncertain about which requirements regarding information and consent should apply. But not everything is equally unclear. In cases where a service is both password-protected and contains sensitive information, such as a personal Facebook profile, it seems obvious that the usual requirements regarding consent must apply. In contrast, I argue in this chapter that there are weaker grounds for obtaining consent to use non-private information that individuals themselves have made available in a public forum, such as postings about political issues in debate forums in online newspapers or on Twitter. I argue that in some cases research on social media is ethically responsible without consent and that the interests of those involved may be safeguarded in other ways.
A useful starting point for this discussion is the model developed by McKee and Porter (
McKee and Porter’s model identifies some of the sources of the uncertainty surrounding the requirements regarding consent when conducting research on social media: the ethically relevant factors (public versus private, sensitivity, interaction, vulnerability) are present in varying degrees and may occur in various combinations. It is therefore difficult to formulate simple, general rules, and on this basis McKee and Porter recommend a case-based approach with concrete assessments of the ethical issues raised by various research projects.
It is clear that the four factors affecting requirements regarding consent in McKee and Porter’s model are not unique to research on the Internet, but are relevant in all research on communication. However, what complicates matters is that the boundaries between the private and public spheres appear in new ways, and the technological context creates new forms of interaction. This means that our ethical intuition about how we should regard these aspects is less clear.
In a number of often cited works, danah boyd has identified some properties of what she calls the «networked public sphere», which give communication on the Internet a character different from communication in other channels (boyd,
These are interesting and important observations of some of the special features of Internet communication, which also shed light on why issues related to consent in research on the Internet may be more difficult to assess than in other types of research. For example, since it may be unclear who the audience is for postings in the public sphere of the Internet, it is also less clear who such postings are intended for, and thus more difficult to assess whether using the communication in research conflicts with that intention. The question is whether the use of the information serves a purpose different from the original one. A clear «yes» to this question will normally result in a requirement to obtain consent. The problem is that much communication on the Internet has no clear delimitation of its intended audience, because the audience is not restricted by the context of the communication. Examples of postings in which the audience is «invisible» and not clearly defined are replies in a comment field in an online newspaper, a Twitter post or an article in a blog. Below I return to the question of what role consent should play in research on media with an invisible audience.
At the same time, not all communication on the Internet has all of these properties to the same degree. Not all Facebook content is searchable by everyone, and we know who the audience is for the comments we post there (if we have set our privacy settings correctly). Often the ethical requirements regarding research will be stricter when the communication lacks the four properties identified by boyd, because such communication is more private.
I share McKee and Porter’s view that it is difficult to give simple, general rules for assessing when the requirement regarding consent should apply, and that it is necessary to make concrete assessments on a case-by-case basis. However, I will argue that there is an ethically relevant distinction between situations in which participating in the research entails a risk of harm or discomfort and those in which there is no such risk but the research nonetheless challenges the individual’s interest in retaining control over information about himself/herself. Although the boundary here is fluid, and breaches of personal privacy are of course burdensome, I believe the two situations are different in ethically relevant ways. In the first case, there must be a
Situations in which there is a risk of discomfort or harm trigger an unconditional requirement to obtain consent: It must be up to the potential research participant to decide whether to subject himself/herself to the relevant risk or discomfort. As mentioned in the introduction, I believe that assessments related to the value of the research and its quality are relevant considerations in an ethical assessment, but in situations in which there is a risk of discomfort or harm, the consideration given to the value of the research will not diminish the requirement to obtain consent. My view – and I think I am in line with the NESH guidelines – is that if it is not possible to obtain participants’ consent in projects that entail such risk, the research cannot be carried out. Allow me to illustrate this point with an example:
A group of economists in one of Norway’s neighbouring countries wanted to study preference patterns of partner selection on Internet dating sites. Simply explained, the researchers created fictional profiles on the dating site, some of women and some of men. The profiles had some similar features, but were different with regard to income, education and ethnicity. The researchers wanted to find out what difference these features made in the market for partners. For each variable the researchers planned to contact a random sample of (real) persons on the dating site and register the features of the profiles of those who responded and those who did not. After the data was collected, the researchers would tell those who had answered the inquiries that they were no longer interested.
The project, which as far as I know was never carried out,
Ethical challenges related to personal privacy arise when the research infringes on the individual’s interest in retaining control of information about himself/herself. The problem here is not necessarily that the research may be burdensome, as in the example above, but whether the research shows reasonable respect for the individual’s integrity and interest in retaining control of his/her own information. Respect for personal privacy indicates that consent to use information about an individual in a research project should normally be obtained, although I will argue that this consideration is weaker than the requirement to avoid the risk of harm and discomfort.
In situations where the research will challenge the individual’s interest in retaining control of information about himself/herself, this interest should normally be protected through consent obtained by the researcher. At the same time, I believe there are situations, especially when consent is very difficult to obtain, in which consideration for the value of the research may make it defensible to implement the project without consent. I return to this matter below. But let us first look at an example of research on social media that is clearly problematic from the perspective of personal privacy.
In 2008, US researchers made the Facebook profiles of an entire class of students from an unidentified US college available on the Internet. The dataset contained 1,700 profiles from the students’ first academic year in 2006. Comparable data were also collected from the two subsequent years, with publication planned at a later date. Making the data publicly available was done in accordance with requirements imposed by the project’s public funding source to allow other researchers to reuse the data.
The data was collected by research assistants who were also members of the Facebook network, but the other students had not given their consent to the use of the information in the research project. However, the information was made less identifiable and less sensitive before it was published by deleting the students’ names and identification numbers and removing the most sensitive information about their interests. Thus the information published was not directly identifiable, and it could only be used for statistical purposes.
The researcher responsible for the project defended the project on the grounds that the research would not entail a risk or burden for the people involved. «We have not accessed any information not otherwise available on Facebook. We have not interviewed anyone, nor asked them for any information, nor made information about them public (unless, as you all point out, someone goes to the extreme effort of cracking our dataset, which we hope it will be hard to do).»
As it turned out, however, it was possible to identify the school in question. But the most important objection raised in the discussion about the project concerned the method of data collection. Zimmer criticized the absence of consent to collect the information as undermining the conditions for communication between the members of the network: While the information was indeed available to the RA, it might have been accessible only due to the fact that the RA was within the same «network» as the subject, and that a privacy setting was explicitly set with the intent to keep that data within the boundaries of that network. Instead, it was included in a dataset released to the general public.
In my view, Zimmer’s objection is reasonable. Facebook is a system in which participants create a framework of protected communication with selected friends by logging in and actively choosing who they want to share information with. Participants in the network express clear preferences about the limitation of access to information about themselves through their privacy settings on their profiles. Using the information for research therefore violates the conditions on which the participants’ communication is based, although it is correct, as the researchers pointed out, that they did not do anything to expose the students to risk or discomfort.
In a system with a log-in function and privacy settings that limit access to personal information, it is clear in my view that consideration for the individual’s interest in retaining control of information about himself/herself triggers a requirement to obtain consent. However, in contexts where the communication channel is more open, it is not as clear. In that case, some of the other factors identified by McKee and Porter may play a role: degree of vulnerability, sensitivity and degree of interaction with the research participants. I will return to this point, but first I want to discuss a particular way of formulating the requirement regarding control over information about oneself. Many have proposed that information should not be used without consent if the people being studied
Hoser and Nitschke (2009) are among those who have spoken in favour of such a formulation of the consent requirement in research on social network services: Thus, we could establish a simple rule: The data someone posted, e.g. in a social network site or newsgroup may be used in the context and by the audience he or she intended it for. The intended audience is, even if it is large and not personally known to the user, the «community» he or she joined. So nobody else should be allowed to use, without consent, the data generated in such a site. Researchers are probably not the audience an average user intends to reach by his or her postings and serving as a research object is normally not the purpose an average user has in mind when posting on a social network site or in a newsgroup.
We see that the authors do not qualify which types of network services they believe should require consent, e.g. whether or not there is a log-in function. It appears they believe that if the postings were not intended for researchers, they should not be used in research. But if we formulate the criteria in this way, it will imply a consent requirement for all research, including for comments posted in the public sphere, e.g. postings in a debate forum in an online newspaper. There are two problems connected with this. One is that in some cases it is so difficult and resource-intensive to obtain consent, such as from everyone who has participated in a debate on Twitter, that it is not possible in practical terms. The other problem is that it seems unreasonable to require consent in cases where people themselves seek public attention for their views, such as about political issues on Twitter. Let us look at an example.
A Norwegian and a Swedish researcher argue as follows: Everything that gets tweeted is public, but all of it is not necessarily for the public. Still, we would argue that the setting of our project – thematically tagged communication about an upcoming election – is public, and that the users could be expected to share that view.
Note that they do not assert that all communication on Twitter should necessarily be available for research without consent: There may be communication on Twitter that should be protected. They argue for their conclusion on the basis of a concrete assessment that the channel is open, the topic is of a general political nature and the condition for discussion is that people are seeking attention for their views in a public debate.
Such a concrete assessment of how researchers should regard communication in open forums is in keeping with the NESH guidelines. On the one hand, NESH says that research on open forums may be conducted without obtaining consent: As a general rule, researchers may freely use material from open forums without obtaining consent from those who have produced the information or those to whom the information applies. For example, a researcher may freely use information obtained from the coverage an online newspaper has gathered about an issue.
At the same time, NESH emphasizes in its guidelines that information that appears in open forums may also require researchers to exercise caution when disseminating research results, e.g. due to topic sensitivity or the subjects’ vulnerability.
I argued above that it is unreasonably limiting to formulate a general requirement regarding consent if the subjects do not expect that researchers will obtain access to the information. In my view, the Twitter project discussed above is an example of a project in which the subjects do not necessarily expect that researchers will study their postings, but in which the research must nonetheless be said to be acceptable. My view is that research may be compatible with the premises for the communication situation even though the participants do not actively expect that researchers will gain access to it.
There is a logical difference between an expectation that something will not occur and the absence of an expectation that it will occur. The first implies the second, i.e. if the expression to the left of the arrow is true, the expression to the right of the arrow must also be true:
expect not-A → do not expect A,
– but the opposite does not follow.
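This inference can be made explicit. Writing E for an expectation operator («the participants expect that …» – a notation introduced here purely for illustration, not taken from the sources discussed), the step relies only on the assumption that expectations are consistent, i.e. that no one simultaneously expects A and not-A:

```latex
% Consistency assumption (illustrative): E(\neg A) and E(A) never hold together.
E(\neg A) \rightarrow \neg E(A)
% The converse fails: the mere absence of an expectation that A will occur
% is not itself an expectation that A will not occur.
\neg E(A) \not\rightarrow E(\neg A)
```

Here E(not-A) corresponds to «expect not-A» in the formulation above, and ¬E(A) to «do not expect A».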
If there is an expectation that people on the outside will not gain access, as was the case in the Facebook example, then it is a breach of this expectation to use the information in research without consent. While in the Twitter example most of the debaters do not expect that the information will be used in research, neither is it a reasonable expectation, given the context, that the information will not be used in research. Thus, in the latter instance the researchers’ access to the information does not undermine the premises for communication.
But even though researchers’ access to the information does not necessarily undermine the premises for communication, researchers will often need to give special consideration to this when disseminating their results. For instance, there are challenges related to the fact that quoting from the Internet makes it easier to search for the person being quoted. The question here is whether the further use of the research presents challenges, especially if identification is burdensome. The ethical assessments that this type of situation raises are different from those we have seen above, where the data collection in itself is burdensome or clearly infringes on the individual’s interest in retaining control of information about himself/herself.
Also in cases where the researchers’ access to information does not necessarily undermine the premises for communication, there may often be grounds to require consent to use the information in research, because the information is sensitive or the persons concerned are vulnerable. NESH mentions this consideration in its guidelines: Persons whose personal or sensitive information appears in an open forum are entitled in a research context to have such information used and disseminated in an appropriate manner. Living persons are also entitled to control whether sensitive information about themselves may be used for research purposes. The potential to trace an informant’s identity is greater when using digital forums compared with other information channels […]. Researchers must anonymize sensitive information that they make use of.
Regarding the third type of situation, the assessment is more complex, and the consideration given to the value of the research is clearer. In this case, obtaining consent is not the only means of taking research participants into account. One alternative is to refrain from identifying the participants, but here a concrete assessment must be made of the specific case; it is not possible to formulate rules that can be used more or less mechanically. This also means that cases will appear in this landscape where it is not so easy to draw clear conclusions. Let me give an example.
A Swedish project, described in Halvarson and Lilliengren (
In my view, the most difficult question in this connection is whether the researchers should quote the participants’ postings, especially because it involves comments with sensitive information involving a vulnerable group. Halvarson and Lilliengren argue that it is not necessary to obtain consent to gather the information. They believe that the researchers’ observation of the discussion in this open forum does not entail any risk or burden for the participants. Moreover, they point out that this is an openly available forum and that the researchers’ observation and registration of the communication does not limit the participants’ control over information about themselves. The question could be raised as to whether all the participants are aware of this openness to the same degree, but let us assume that the researchers are correct. They also argue that the project is beneficial by pointing out that it is important to understand ordinary psychological explanations. Such explanations are the most important resource used by most people to tackle personal and interpersonal problems, and it is important to understand the basis for the strategies people use, e.g. for providing a basis for improving professional treatment. In addition, the researchers believe there is no workable alternative to observing natural communication: if they were instead to set up a discussion group and invite people to participate in it, they believe the recruitment would be biased and they would not have got very many participants.
The question that remains, if they are correct that gathering information without consent is acceptable, is how the researchers should handle the information they collect when they disseminate their results. The two researchers chose to quote from the postings on the forum without giving the pseudonyms that the young people use when they participate in the discussions. The argument for this is that people often use the same pseudonym for several different Internet services, so that the names can be used in a search to find them in other places and thus help to identify them. But should the researchers have asked for consent to use the quotes they gathered? Halvarson and Lilliengren discuss this question and conclude that asking for consent could negatively affect communication in the forum: When studying private explanatory systems at this specific venue, obtaining informed consent is not a practical problem. All informants can be contacted via their public e-mail address and thus asked for consent to quote their postings. However, it is difficult to know how this would affect their experience and future use of the venue. If it were to be perceived as an intrusion it could have negative effects and violate later participation in discussions.
The problem is that those who receive such an inquiry might regard it as an intrusion, which would decrease their interest in taking part in the forum in the future. This is obviously an important consideration. But if the researchers believe people may dislike it if they knew they were being quoted, is this not a reason to refrain from quoting their postings or to ask for their consent – especially because many of the comments are posted by young people and by people who might be in a vulnerable situation? In this case it is not easy to give a straightforward answer. It has to do in part with how great the potential is to be identified through the quotes, but it also has to do with how much the documentation is weakened by not using quotes when the results are presented, what alternatives are available for providing evidence for interpretations of the communication, and through which channels the results are disseminated. We do not have enough information to assess all of these aspects, but I would stress that there is no way to avoid a concrete assessment of all relevant values and alternatives in the situation, including the research consideration, in order to reach a decision. One thing that is clear, however, is that if it is decided that consent to quote should be obtained, people should also be allowed to decide whether they want to take part in the study at all.
The problem encountered here by Halvarson and Lilliengren is typical for many studies of communication processes: Information and questions about consent will disturb the natural interaction researchers want to study. Hudson and Bruckman ( Based on this study, we can safely conclude that individuals in online environments such as chatrooms generally do not approve of being studied without their consent. The vehement reaction of many in our study indicates that they object to being studied. Further, when given the option to opt in or opt out of research, potential subjects still object.
However, Hudson and Bruckman point out that in many groups they were not thrown out, and that they had no way of finding out who did not want to participate in research and who was only reacting to the way the question about consent was asked. Thus they argue that it is acceptable – and the only possibility – to conduct research without consent if the IRB (Institutional Review Board) grants a waiver on the following conditions: (1) The research involves no more than minimal risk to the subjects. (2) The waiver or alteration will not adversely affect the rights and welfare of the subjects. (3) The research could not practicably be carried out without the waiver or alteration. (4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.
The key question is whether it is practicable to conduct the research based on consent (point 3). Hudson and Bruckman’s response is that in practice it is impossible, because their experiment shows that in synchronous forums it is difficult to implement a recruitment process in which the researchers reach those who want to participate without disturbing the communication.
This is problematic as a general conclusion, and Hudson and Bruckman also believe that a concrete assessment must be conducted of the potential negative effects of the research. But an objection to their approach is that they do not assess alternative strategies for obtaining the consent of participants from communities on the Internet. McKee and Porter comment on Hudson and Bruckman’s argument for research without consent in the following way: We arrive at a different conclusion: Users are not
I have proposed a model for ethical assessments that distinguishes between three types of situations in which the question of consent is raised when research is conducted on users of social media. Research that exposes the participants to the risk of harm or discomfort triggers a requirement to obtain consent. If the research undermines the premises for communication that the participants have given their explicit approval to, consent is also necessary for maintaining the participants’ autonomy. In situations where the researchers’ observation and registration of the communication do not undermine the conditions for participation, typically public debate arenas, consent is not the only way to take the research participants into account. One problem will often be how the information will be used when the research results are presented, e.g. whether quotes that may identify the participants will be used. In this assessment, consideration for the quality and value of the research should also play a role.
The properties of social media vary along many dimensions, and this variation is a source of uncertainty in their ethical assessment. An important dimension is the degree to which communication is accessible to the public, which varies in ways that differ from other media. Social media such as Facebook, Twitter, Instagram and Snapchat have different forms of user control, offering different ways of limiting the audience. This makes it difficult to draw a clear distinction between situations where the researchers' participation undermines the premises for communication and situations where it does not. Other considerations may also affect the weight of the ethical concerns. Among these are the vulnerability of the people being studied, the sensitivity of the topic of communication, the searchability of the information being presented, the degree of interactivity with those being studied, and the participants' actual competence in and understanding of how social media function.
As the data protection official for research for some 150 Norwegian research and educational institutions, NSD has noticed an increase in research conducted on data harvested from the Internet in recent years. Today, the Internet is an important arena for self-expression. Our social and political life is increasingly happening online. This will have a major impact on how we understand the society in which we live and the opportunities for future generations to reconstruct the history of the 21st century.
Thus, data generated by the growth in electronic communications, the use of the Internet and web-based services, and the emergence of a digital economy are increasingly valuable resources for researchers across many disciplines. At the same time, there is a great need for knowledge and awareness of both the legal requirements and the ethical challenges related to the use of these new data sources, and for an understanding of the data's quality and scientific value.
In addition to the increased volume of this type of research, we have also seen a shift in focus. At first, the Internet and social media were studied mainly as a tool. The studies often concentrated on how the Internet worked as an instrument in e.g. education, health services or online dating. The methodological approach was usually interviews or surveys based on informed consent from the research subjects.
Today, the trend is to study the Internet as an arena for expressing or negotiating identity, often through projects of a sensitive character (e.g. political opinion, religious beliefs, health). Data are usually collected from social media such as blogs, social networking sites or virtual game worlds. These sources are publicly available, and often research is conducted without informed consent from the persons being studied.
This development raises questions such as: Which rules and regulations apply to research on personal data collected from the Internet? In which cases is it legal and ethical to conduct research on such data without the consent of the data subjects? When is it necessary to inform the data subjects of their involvement in a research project and when should this information be accompanied by an opportunity to refuse to be the object of research? These issues will be discussed in further detail in the following.
The use of new types of data, such as those collected online and so-called Big Data, ranks high on the international agenda. The OECD Global Science Forum points out the challenges related to the large amounts of digital data being generated from new sources such as the Internet: although these new forms of personal data can provide important insights, their use as research resources may pose risks to individuals' privacy, particularly in case of inadvertent disclosure of the identities of the individuals concerned. There is a need for greater transparency in the research use of new forms of data, maximizing the gains in knowledge derived from such data while minimizing the risks to individuals' privacy, so as to retain public confidence in scientific research that makes use of new forms of data.
To address this challenge, the forum recommends that research funding agencies and data protection authorities collaborate to develop an international framework that protects individuals’ privacy and at the same time promotes research.
The European Commission has proposed a comprehensive reform of the EU's 1995 data protection rules. In the words of the responsible Commissioner: «17 years ago less than 1 % of Europeans used the Internet. Today, vast amounts of personal data are transferred and exchanged, across continents and around the globe in fractions of seconds. The protection of personal data is a fundamental right for all Europeans, but citizens do not always feel in full control of their personal data. My proposals will help build trust in online services because people will be better informed about their rights and in more control of their information.»
We will not go further into this, but just briefly mention that the new digital media, and the Internet as an increasingly significant data source, are important reasons why the EU is currently upgrading its data protection legislation from a directive to a regulation. A regulation is a binding legislative act that must be applied in its entirety across the EU.
The demand for harmonization of rules and practices is high, particularly related to the use of data generated by or in relation to global communication networks such as the Internet. This type of network weakens the significance of national borders and the impact of national policies and legislation on the protection of personal data.
NSD's general impression of the Commission's initial proposal was that it would not lead to any dramatic changes for Norwegian research. The reason is primarily that Norwegian data protection legislation, and the way this legislation is practised in relation to research, is stringent, and that we have a high degree of protection of personal data in Norway. However, some of the recently proposed amendments to the Commission's proposal made by the European Parliament may have negative consequences for parts of the research sector if transposed into EU legislation. There is a clear tendency in this proposal towards strengthening the right to personal privacy and control of one's own personal data at the expense of researchers' access to such data.
In Norway there are primarily three laws (i.e. the Personal Data Act, the Personal Health Data Filing System Act, and the Health Research Act) that regulate the use of personal data for research purposes. In cases of collecting research data from the Internet, it is mainly the Personal Data Act that applies, so our focus will be on this. Although the regulations are not always crystal clear, they provide important guidelines on how data initially produced for other purposes can be used for research purposes. The regulations may set limitations on usage, but they also provide many opportunities for valuable research.
The Personal Data Act is technology-neutral, although it is not necessarily adapted and updated with regard to technological development. The law applies to the processing of personal data, irrespective of source. It is applicable regardless of whether the data are self-reported, collected from a confidential source or gathered from a public registry. This implies that a research project is subject to notification to the Data Inspectorate or Data Protection Official when personal data are processed by electronic means, even if the information is gathered from a publicly available source on the Internet.
The purpose of the Personal Data Act is to protect the individual's privacy from being violated through the processing of personal data.
These data protection principles are applicable irrespective of the methods for data collection and the data sources involved in the research. Consequently, they also apply to data collection online. However, handling these fundamental data protection principles in this context presents the researcher with certain challenges. Should one expect those who express themselves online to understand that their personal data may be used for purposes other than those originally intended, such as research? Have they given up control of their personal data by publishing on the Internet? And how does the availability of the data affect the researchers' duty to protect the privacy and personal integrity of the persons being studied? For the researcher, it might be helpful to consider the following factors when working through these issues.
First of all, from what type of medium are the data obtained? Data collected from a public forum for debate will probably require fewer safeguards than a Facebook page with access restrictions. Second, do the data have the character of a public statement, or is it reasonable to presume that the information is meant to be of a private and personal kind? Further, should the information be safeguarded in the data subject's best interests, irrespective of the medium or the author's assumptions? Sensitive data (e.g. information related to health) might require a high level of protection even when published as part of a public statement on an open webpage. One might claim that the researcher has a special responsibility to protect sensitive personal data even though the subject has disclosed it voluntarily, bearing in mind that the person might not foresee the full consequences of publishing the information online.
A fourth important factor is whether the data subject is a child or an adult. Information concerning children is subject to strict regulations. In 2012 a new provision of the Personal Data Act was implemented. The Act states that «[p]ersonal data relating to children shall not be processed in a manner that is indefensible in respect of the best interests of the child».
Furthermore, in relation to this, one should consider whether the information is published by the data subjects themselves or by a third party. If there has already been a breach of data protection principles, which may be the case when information is published by someone other than the data subject, researchers should be particularly careful.
As a default rule, personal data cannot legally be used for purposes other than the original one, unless the data subject consents.
An important provision in this respect is that subsequent processing of personal data for historical, statistical or scientific purposes is not deemed to be incompatible with the original purposes of the collection of the data, cf. first paragraph, litra c, if the public interest in the processing being carried out clearly exceeds the disadvantages this may entail for natural persons.
Thus, research activities are, by definition, not considered incompatible with the original purpose. Science is afforded a special position in the current legal framework, and this provision might be seen as a fundamental principle guaranteeing further use of data for research purposes regardless of the original reason for their production. This leaves open the possibility of conducting research on information obtained online without consent.
Having said that, as a general rule personal data may only be processed when the data subject has freely given an informed consent.
However, another provision offers a direct exemption from the main rule. Even sensitive personal data may be processed if this «is necessary for historical, statistical or scientific purposes, and the public interest in such processing being carried out clearly exceeds the disadvantages it might entail for the natural person».
Firstly, this entails that the planned processing of personal data must be required in order to answer relevant research questions. The researcher has to substantiate that the planned harvesting of data from the Internet is absolutely necessary to achieve the purpose of the study.
Secondly, if the necessity requirement is met, the law requires a balancing of interests between the project's societal value and any possible disadvantage for the individuals who are subject to research. It is crucial that the research will benefit society in some way, or at least be an advantage for the group being researched. When assessing the probable disadvantages for the data subject, relevant factors are the degree of sensitivity, the author's presumed purpose in publishing (e.g. private communication or exercising freedom of expression), the source (e.g. a forum with restricted access or a publicly available one), who the data subject is (e.g. a child, a vulnerable or disadvantaged individual, an adult) and the degree to which the data subject is identifiable.
Another important aspect to keep in mind in deciding for or against the processing of personal data for research purposes without consent is whether or not it will be possible to publish the results anonymously. This may be a challenge if one wishes to publish direct quotes, as these will be searchable on the Internet. It is also important to note that pseudonyms or nicknames may be identifiable because they may be used in various contexts online and hence function as a digital identity.
Moreover, an important factor is whether the data subject is informed of the research project. Having information and the opportunity to object to being included in the research will limit the disadvantages because the individual will then be able to exercise control over his or her own personal data. This may be a weighty argument for exempting a research project from the consent requirement. However, the right to object is not in itself considered a valid consent under the Personal Data Act. A valid consent must be a freely given, active and specific declaration by the data subject to the effect that he or she agrees to the processing of personal data relating to him or her.
If a research project includes the processing of highly sensitive data (e.g. from blogs about personal experiences with eating disorders, self-harm or the like), and the information being processed is detailed enough to make the bloggers identifiable (a factor one generally must take into account), it may be difficult to grant an exemption from the requirement of consent. This holds particularly if the researcher deems it necessary to publish direct quotes, so that it will be hard to guarantee anonymity in the publication. If the authors are minors, the threshold for not obtaining consent should be even higher. Adolescents over the age of sixteen will often be considered mature enough to give an independent consent in such cases. However, when obtaining consent online, it might be a challenge to be certain of the actual age of the person granting consent.
In the case of research on utterances from Twitter that involves thousands of people and focuses on e.g. elections (which in the legal sense may constitute sensitive information about political views), there will clearly be legitimate reasons not to obtain consent from the data subjects, considering the public character of both the source and the content of the data.
In between these two rather clear-cut examples lies a range of grey areas that require concrete assessments in each case. My main message is that it certainly can be legal to conduct research on personal information obtained from the Internet without consent, as long as the researcher can justify the necessity and the benefits for the public clearly outweigh the disadvantages for the individual. The violation of personal privacy is often minimal when data are harvested from the Internet for research purposes. However, research on social media with restricted access differs somewhat from most other contexts in this respect. It is plausible that individuals who publish information about themselves under such circumstances think of themselves as acting in a «private» arena, and that their purpose is to interact with a closed group of people. This indicates that the threshold should be slightly higher when considering not obtaining consent in such cases.
The general rule is that the research subjects should be informed about the research. This is the case even if the exception clause from the requirement for consent applies. The basis for this rule is the fundamental right to exercise control over one's own personal data, and the assumption that the data subject should have the right to object to the intended processing of her personal data. However, a relevant exemption provision allows for research to be conducted without informing the data subjects: «The data subject is not entitled to notification […] if […] notification is impossible or disproportionately difficult».
If it is not feasible to get in touch with the affected individuals because it is not possible to obtain contact information or to communicate through the website, there is of course no way to provide those individuals with information.
Relevant factors in the assessment of whether it is disproportionately difficult to provide information are, on the one hand, the number of data subjects and the effort, in terms of time or money, that providing information would entail. However, technological developments will most likely make it increasingly easy to distribute information to thousands of individuals simultaneously at little extra cost. And the violation of personal privacy is not automatically less because the data subjects are numerous.
On the other hand, one should consider what use the data subject will have of being informed of the research project. Is it likely that the research subjects would wish to object if they had the opportunity to do so? If that is a reasonable assumption, information should be provided. Another important question is to what extent the research subjects will benefit from being able to refuse to be part of the research project. This will depend on the type of data being processed and how sensitive the information is. If what is at stake is very sensitive information, data protection principles indicate that information should be provided. This holds independently of whether the data subject initially has made the information publicly available.
Legally, the obligation to provide information is met only if the researcher gives individual information in such a way that the information is certain to reach the intended receiver. But in some cases, it may be appropriate to provide public information instead. This may be done through collective information published on the website from which the data is collected. It is not guaranteed that this information will reach everyone in the same way as when it is communicated directly by mail, email or other channels, but public information is nevertheless a measure that, to a certain extent, can justify exemptions from the requirement of individual information.
The Personal Data Act is applicable irrespective of the data source. The regulations do not distinguish between data harvested from the Internet and other sources (such as administrative registers). However, the legal framework leaves open a range of possibilities for conducting research on information obtained online. It might be challenging, though, to apply the rules in this context.
The main rule is that the processing of personal information should be based on informed consent. But a number of exemptions make it possible to conduct research on personal information obtained from the Internet without consent, as long as the researcher can justify the necessity, and the benefits for the public clearly outweigh the disadvantages for the individual. The violation of personal privacy might often be limited when data is harvested on the Internet for research purposes.
The first set of ethical guidelines explicitly devoted to Internet research appeared only in 2002 (Ess et al. 2002).
Perhaps the most foundational element in ethical reflection is our set of assumptions regarding the human being as a moral agent.
These shifts thus require a transformational rethinking of our ethical frameworks and approaches – including within Internet Research Ethics. Indeed, IRE is a primary domain within which to explore and develop these transformations: relational selfhood is most apparent as it is performed or enacted precisely through the communicative networks at the focus of IRE. At the same time, Norway may play a distinctive role in these transformations. Norwegian research ethics has already recognized in at least one important way that we are indeed relational selves: it has enjoined upon researchers the ethical duty of protecting the privacy and confidentiality not only of individual research subjects, but also of their close relations.
In the following, I will highlight how modern conceptions of the individual self lead to distinctively modern expectations regarding privacy and privacy rights.
Internet Research Ethics began to expand rapidly in the early 2000s (Buchanan and Ess).
From an ethical perspective, these diverse documents drew on one of three primary ethical theories: utilitarianism, deontology, and feminist ethics (Stahl). Utilitarian approaches, which justify ethical choices by weighing the overall benefits of an action against its costs, predominate in English-speaking countries such as the United States.
Deontological ethics is primarily affiliated with the work of Immanuel Kant (1724–1804). Kantian deontologies appear to enjoy greater currency in the Germanic-language countries, including Denmark, Norway, and Sweden – first of all, as manifest in the profoundly influential conceptions of the public sphere and of democratic processes as rooted in rights of self-determination and autonomy developed by Jürgen Habermas (Buchanan and Ess).
Finally, feminist ethics is occasionally invoked by researchers, especially in connection with participant-observation methodologies (e.g. Hall, Frederick & Johns).
As diverse as utilitarianism and deontology are, they nonetheless share a more foundational set of assumptions – namely, the (high modern) conception of the human being as a strongly individual self. In the utilitarian calculus, it is the individual who counts as the basic unit of pleasure and pain, benefit and cost.
The emphasis on individual selfhood is equally apparent in Kantian deontology, as anchored in core notions of ethical autonomy: the freedom and capacity of the individual rational agent to determine the moral law for him- or herself.
And so in both traditions, the moral agent is presumed to be a solitary individual. Confronted with a specific ethical choice, such an agent is envisioned as considering her possibilities and options as a solitary being, apart from the voices, influences, and perhaps coercion of others. Moreover, whether making her choice through a more deontological or a more utilitarian approach, the moral agent is thereby the entity who bears the sole and exclusive responsibility for that choice.
This strongly individual conception of human beings is thus the subject that both justifies and demands democratic-liberal states – and with these, basic individual rights, including rights to privacy.
We can also note that U.S. conceptions of privacy and privacy rights are squarely individual in their focus.
Finally, we need to be clear how such a conception of privacy – specifically, of privacy as a right held by the individual – has shaped research ethics.
With this as a background, we can now see how high modern, strongly individual notions of privacy and privacy rights have been foundational to Internet Research Ethics. In the U.S., to begin with, IRE is rooted in human subjects protections that grew up after both «internal» scandals such as the Tuskegee Institute syphilis study and the horrors of Japanese and Nazi «experimentation» on prisoners during WWII. Protecting the privacy of individuals is an explicit requirement – along with other protections of anonymity, confidentiality, and identity that likewise serve to protect individual privacy (see Buchanan and Ess).
Again, how we are to implement such protections varies – first of all, depending on whether we take a more utilitarian or a more deontological approach. For example, IRE in the U.S. context characteristically discusses the need to weigh the potential benefits of the research against its possible risks and harms to subjects.
We can also see some difference between U.S. and Norwegian approaches in terms of whose privacy researchers are obliged to protect: the NESH guidelines extend this obligation beyond the individual research subject to the subject's close relations.
Insofar as this is true, as we are about to see, the NESH guidelines thus stand ahead of the curve of change and development that seems required in IRE as our conceptions of selfhood in Western societies are changing more broadly.
To be sure, strong notions of individual privacy became ever more fully encoded and protected in various ways in Western societies throughout the 20th century. In light of the rise of networked communications in the latter half of the 20th century, perhaps most important among these were developing notions of informational privacy and data protection.
Within philosophy, as we have seen, conceptions of selfhood even at the time of Kant and Hegel were not exclusively individual.
Twentieth-century philosophy included several other emerging movements that likewise emphasized the social or relational dimensions of selfhood, beginning with phenomenology. So Maurice Natanson reversed Descartes' famous dictum, «I think, therefore I am», to stress that the self first emerges in and through its relations with others.
Similar shifts can be seen in the literatures of psychology and social science. So Georg Simmel describes the self as a «sociable self» (Simmel).
These social and psychological accounts are of particular import as they have become prevailing theories for studying our engagements with one another in the online and mediated contexts facilitated by Internet communication. This relational – but still also individual – self is further apparent in more contemporary work in Internet Studies, beginning with the widely used conception of the self as a «networked individual.» Michelle Willson summarizes this conception as stressing how «the individual experiences her/himself as largely in control of her/his sociability through the possibilities of the [network] technology,» a view that highlights such individuals as «compartmentalized or individuated persons who approach and engage in constitutive social practices in ways chosen by themselves» (Willson).
These shifts, finally, are recognized within philosophy to require correlative changes in our conceptions of ethical responsibility. As a first example, contemporary feminists are developing notions of «relational autonomy» that build on these various recognitions that our sense of selfhood and agency is interwoven through and defined by our relationships with others; at the same time, the notion of relational autonomy retains earlier (high modern) understandings of moral agency and responsibility as connected with strongly individual notions of selfhood (Mackenzie).
These shifts in our philosophical, sociological, and psychological conceptions of selfhood further appear to correlate with observed changes in everyday attitudes towards, and practices of, privacy.
As a first example: especially with the emergence of social networking sites (SNSs) in the early part of the 21st century, it is a commonplace for parents to complain and worry about information their adolescent children post in such settings. Simply put, from the parents' perspective, their children are revealing far too much private information.
Similarly, Stine Lomborg has documented how users of social media willingly share personal, sometimes intimate, information in settings that are at least semi-public (Lomborg).
Lomborg's analysis is of particular interest precisely in that she argues that these communicative phenomena reflect Georg Simmel's notion of «the sociable self,» i.e. a self «engaged in a network of relationships» which as such is a self that «is attuned to the norms and practices within the network of affiliation» (Lomborg).
In response to these transformations, there have been a number of efforts to reconceptualize privacy. The most significant of these is Helen Nissenbaum's account of privacy as a matter of «contextual integrity»: in this view, privacy emerges as a right to an «appropriate» flow of information as defined by a specific context (2010: 107 ff.). Such appropriateness is determined by the informational norms of the given context, rather than by the individual's exclusive control over his or her information.
More broadly, precisely as Nissenbaum invokes the norms of specific social contexts, her account resonates with James Rachels' earlier analysis of privacy as what makes it possible for us to maintain our diverse relationships with others.
To my knowledge, neither Rachels nor Nissenbaum explicitly invokes a notion of selfhood as relational. But their accounts of privacy, as bound up with our contexts and relationships with others, clearly fit such a conception.
These transformations in our practices and philosophical conceptions of privacy thus appear to correlate closely with the major shifts we first examined in some of our most foundational ethical concepts – namely, our conceptions of human identity and selfhood, as these in turn interweave with our understandings of ethical agency and ethical responsibility. We have also seen that extant forms of ethical guidelines for Internet research – apparently, with the exception of the NESH guidelines – presume an all but exclusively high modern conception of the individual as ethical agent and all but exclusive bearer of ethical responsibility: these presumptions result precisely in primary obligations (whether utilitarian or deontological) to protect individual privacy.
Again, it appears that the NESH guidelines – as enunciating most articulately and explicitly the requirement that researchers protect the privacy and confidentiality not only of individual research subjects but also of their close relations – stand ahead of the curve in this regard.
«Difficult,» however, does not necessarily mean impossible. On the contrary, three recent research projects using smartphones – i.e. devices that usually accompany us precisely into our most intimate and private spaces – exemplify some of the privacy challenges opened up not simply by current networked technologies, but by individuals who seem increasingly willing to share intimate and private information across these networks. Two of these projects – one in Denmark and the second in the U.K. – appear to show that researchers can build privacy protections that are sufficiently strong to persuade their subjects that their personal information is safely held. A third example, however, shows that these new challenges are still sufficiently novel that extant guidelines, codes, and laws are not always able to provide researchers with needed guidance and support.
A first project, «Device Analyzer,» is based at the University of Cambridge, U.K., and, at the time of writing, has attracted the voluntary participation of over 17,000 participants worldwide (see the project's website).
Indeed, the project goes to great lengths to explain to participants how their individual identities are protected, coupled with a detailed list of the extensive range of data collected (see the project's documentation).
At the same time, the kinds and amount of data collected are breathtaking: the detailed list of data types alone fills more than one A4 page. It is distributed across four categories: basic data, data about applications and their use, hashed identifiers of the GSM cells the phone connects with, and an estimate of the phone's location.
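The hashing mentioned here – replacing a raw identifier with an opaque token before it leaves the device – can be sketched as follows. This is a minimal illustration, not the project's actual code; the salt handling, the example cell ID, and the token length are assumptions:

```python
import hashlib
import hmac

# Assumption: a secret salt generated once per device/install. With it, the
# same cell ID always maps to the same opaque token (so usage patterns remain
# analyzable), while the raw identifier itself is never transmitted.
DEVICE_SALT = b"per-device-random-secret"

def pseudonymize(identifier: str) -> str:
    """Return a stable, non-reversible token for a raw identifier."""
    digest = hmac.new(DEVICE_SALT, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Two observations of the same GSM cell yield the same token,
# but the cell ID itself does not appear in the transmitted data.
t1 = pseudonymize("cell-240-01-12345")
t2 = pseudonymize("cell-240-01-12345")
assert t1 == t2                    # stable pseudonym across observations
assert t1 != "cell-240-01-12345"   # raw identifier never leaves the device
```

A keyed hash (HMAC) rather than a plain hash is used in this sketch because a plain hash over a small identifier space can be reversed by brute force; whether Device Analyzer uses exactly this construction is not stated in the text.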
A similar project on how Danes use their smartphones likewise requires participants to install a «Mobile Life» app, one that collects data such as «number of calls, sent and received text messages, location information, data usage, and product and application usage» (Zokem).
These two examples suggest that, so far at least, extant forms of privacy protections (e.g. hashing data and using only statistical aggregations) and relevant law (in the Danish example) are sufficient to assure contemporary subjects, addressed as individual bearers of privacy rights, that their personal information will be protected.
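The second protection named here, publishing only statistical aggregations, can be illustrated with a small sketch (the field names, data, and threshold are hypothetical; only the general technique is taken from the text). Groups smaller than a suppression threshold are dropped, so that no individual's behaviour is recoverable from the published counts:

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # assumption: suppression threshold chosen by the researcher

def aggregate(records, key):
    """Count records per category, suppressing categories with few members."""
    counts = Counter(r[key] for r in records)
    return {k: v for k, v in counts.items() if v >= MIN_GROUP_SIZE}

# Seven users of a common app, two of a rare (and potentially revealing) one:
records = [{"app": "browser"}] * 7 + [{"app": "banking"}] * 2
print(aggregate(records, "app"))  # → {'browser': 7}; the small group is suppressed
```

Only the counts, never the underlying records, would be released; the threshold trades statistical detail against the risk of singling out individuals.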
A proposed research project in Scandinavia was designed around the use of an app on participants' smartphones similar to the apps described above. The app would record the whole range of communicative behaviors facilitated through the phone, including texting, status updates, web-browsing, photos taken and saved, contacts added, deleted and retained, and so on. This unprecedented glimpse into their subjects' personal lives – obviously a rich source of new research data – also presented now familiar ethical challenges regarding how to protect subjects' anonymity, privacy, and confidentiality. The researchers themselves were uncertain of how to proceed; worse, the various relevant authorities – their own university guidelines, national law and national research council guidelines – offered advice and direction based on earlier, more limited modes of research. The researchers thus faced a mix of both inappropriate and inconsistent guidelines. The result was, in effect, an ethical paralysis – with the further result that the research could not go forward.
I suggest that these research examples are significant primarily because they (seek to) implement communication technologies that represent a qualitatively new reach into the most intimate spaces of their subjects' lives.
The collapse of the third project, however, suggests that current possibilities for Internet research that move into the most intimate spaces of our lives – a move that is coherent with increasingly relational senses of selfhood and more shared conceptions of privacy – are well ahead of extant guidelines, policy, and law in at least some cases. This collapse further suggests that Internet Research Ethics should pursue the development of new guidelines more precisely tuned to more relational senses of selfhood – though not necessarily at the cost of more traditional, individual senses of selfhood. In this development, we would likely be well served by taking up Nissenbaum's notions of privacy as contextual integrity as a starting point.
Internet Research Ethics can now point to a long and deep tradition of both national and international literatures – including the AoIR and Norwegian National Ethics Committees’ guidelines. But the relentless pace of technological development and diffusion constantly offers us new ways of communicating and interacting with one another – ways that frequently open up novel ethical challenges for us both as human communicants and as researchers. In particular, I have tried to show that a specific strand of challenges emerges because of transformations at the most foundational levels, i.e. with regard to our primary assumptions regarding the nature of the self and thus how we are to understand moral agency, ethical responsibility, and affiliated notions of privacy – where protection of privacy stands as a primordial ethical obligation for researchers. To do this, I have traced important connections between the high modern ethical frameworks of deontology and utilitarianism with strongly
Certainly, these shifts from more individual towards more relational understandings and practices of selfhood complicate and make more difficult the articulation and fulfillment of researchers’ ethical obligations. But as both the extant Norwegian codes and the first two case studies explored in the final section suggest, «difficult» does not mean impossible. On the contrary, the success of these cases – of apps installed on smartphones that allow researchers to reach into what would otherwise have been the most closed and intimate spaces of our lives – exemplifies techniques, including articulate legal contracts, that appear to be viable and effective in protecting both individual and more relational forms of privacy.
The failure of a third project, however, illustrates in part the fatal consequences that can result for researchers in these new domains when local guidelines and national codes fail to mesh effectively with these newer understandings and practices. Those of us engaged with the ongoing development of Internet Research Ethics obviously have our work cut out for us. I have argued that both the Norwegian research ethics codes and Nissenbaum's account of privacy as contextual integrity provide us with both real-world examples and philosophical approaches that should prove most useful in such efforts.
In only a matter of years, people have populated online spaces in ways that interweave us in mediated spheres as part of our lived realities. We live in screens and in the intersections between screens. Many of these spaces are public and semi-public. At first glance, personal practices in these spaces might appear at odds with the public character of the venue; yet self-presentational strategies and interactions become more meaningful when we share actual traces of life. The concept of privacy is hence a moving target, constantly negotiated and renegotiated as a consequence of how we perceive the in-flux boundaries between public and private spheres. Nor is it only our personal life that is closely integrated with mediated practices: professional life is increasingly moving online, with the emergence of social intranets attempting to replicate social network sites within enterprise contexts.
In this paper, I will discuss how we can research mediated practices, both personal and professional, without compromising the privacy of the people being studied. One core premise is that the potential public character of the content and people being studied does not warrant the public display and exposure of the research subjects. Traditional research ethics, ensuring the privacy of the research subject, remains key, and perhaps ever more so.
I have conducted several studies of how people make use of the Internet in their everyday life, for personal matters as well as in organizational and professional domains. Two different qualitative studies will be presented as examples, demonstrating why researchers need to tread carefully when approaching research subjects, gathering data, and presenting results in publications. The first example is taken from a study of young people's use of social media, and is based on interviews with 20 young people between 15 and 19 years of age, as well as on observations of their online practices in blogs and social network sites (SNS) in the years 2004–2007. The second example is taken from a study of the use of a social intranet in an international ICT consultancy enterprise. This study was conducted in the years 2010–2013, and is based on interviews with 27 employees as well as on analyses of their social intranet user patterns. In the latter example, the social intranet is available only to company employees, and the content studied cannot be republished in research publications. In both examples, the informants interviewed gave their informed consent to participate and were guaranteed anonymity, in compliance with the requirements and procedures for data handling defined by the Privacy Issues Unit at the Norwegian Social Science Data Service (NSD).
In the following pages, I will first briefly review relevant literature on mediated practices and the specific challenges this poses for research. I will then discuss the particular research challenges experienced in studying social media practices in personal and professional contexts, before concluding with a discussion of the consequences of blurred private/public/professional realities for qualitative research.
Traditional research ethics stipulate certain requirements regarding how research should be conducted (for example, participants should be informed about the purpose of the study, that participation is voluntary, and that they can withdraw from the study at any time). Only then can the participants give their informed consent to participate. The requirement to obtain informed consent from research participants is incorporated in European legislation (European Commission,
Yet, whereas traditional research ethics may seem relatively uncomplicated, challenges arise for researchers who attempt to understand and analyze online personal practices, particularly when it comes to republishing online content in research publications. When discussing research on online behaviour, we need to consider how such behaviour is situated in the constantly renegotiated space between what is private and what is public.
Overall, the dual notions of online/offline tend to keep us focused on the differences between online and offline rather than on the embodied realness of online behaviours. Addressing online/offline is preferable to the «old» dual notions of virtual/real, yet we need to improve our understanding of online life as an integral part of life. Users’ behaviour online is usually «firmly rooted in their experience as embodied selves» (Ess,
In a Norwegian context, the NESH guidelines for research ethics stress the importance of the researcher considering people's perceptions of what is private and what is public (Bromseth,
However, assessing the acknowledged publicity of an online venue is not always straightforward, at least not as seen from the point of view of the participants. A personal blog might be publicly available for all to read, though very often it can be regarded as a personal and private space by the author. As a researcher I typically inform study participants that personal data will be anonymized and that it will not be possible to identify who they are. This means that I cannot republish online content originally published by research participants even if that content is publicly available online. The fact that people publish personal information online, and leave publicly available traces of sociability and self-performance, does not mean that this content is «up for grabs» by social scientists without carefully considering the privacy of the people being studied. As emphasized in a number of studies, people may maintain strong expectations of privacy and ownership of personal data even if that data is in fact publicly available (Walther et al.,
Much has changed since the publication of the first version of ethical guidelines for Internet research by AoIR in 2002. As a consequence of technological developments, a new version of the guidelines was published in 2012 (Markham and Buchanan,
I will return to some of these principles in the conclusion, addressing how two different studies require certain strategies for ensuring the privacy of the research participants.
The two case studies I will discuss are similar in that I rely on interviewing people in addition to studying their online practices. As the participants agreed to take part in the study on the condition that their identity would not be revealed, I do not include explicit examples of content they have published online. Protecting the privacy of my informants concerns how I gather and store data, as well as how I refer to them and their online practices in publications. As will become evident, a consequence of the agreement with the informants is that any empirical examples of content must be reconstructed, even if this practice is scientifically disputed. Reconstructing empirical examples does not mean inventing them, but making the changes required to preserve the original meaning and message while ensuring that the original content cannot be retrieved through searches.
In my PhD work I followed 20 people between 15 and 19 years of age in the period 2004–2007. I followed their online practices in their blogs and in social network sites, and I interviewed all of them once. My informants were guaranteed anonymity, and their names were changed in the analyses and publications. These conditions were described in a formal postal letter of consent, which the informants signed. According to the Norwegian Data Inspectorate (Datatilsynet), minors who are 15 years or older can give their informed consent in the relevant sort of cases (Datatilsynet,
My study concerned mediated individual practices, some of which could expose the informants in rather intimate ways (e.g. revealing photos or textual confessions) and disclose their identity if republished in the context of this thesis. Though these expressions were often publicly available online, efforts were made to secure the privacy of the informants. I reported on their online lives and user patterns, but I did not republish their online expressions or photos.
At that time, young Norwegian bloggers typically avoided revealing their full name in their blogs and/or protected all or part of the content as accessible only to connected blog friends (e.g. with friends-only blogs in LiveJournal). Hence you could not google my informants and find their blogs. Yet those with publicly available blogs were all easily recognizable if you found their blogs and knew them offline: they revealed their first names, and often published pictures and other information that exposed their identities. My obligation to ensure the anonymity of my informants meant I simply could not include any information that might identify them. My inability to include content which my informants had created online was thus a consequence of conducting research interviews. It was simply not viable to combine anonymous interviews with analyses of online practices if those practices were also reproduced in my work.
However, even if a researcher relies only on analyzing online content (and thus avoids the problem of revealing the identity of interviewees who have been promised anonymity), the public availability of content does not necessarily imply that content can be used without consent, or at all. We need to consider people's perceptions of what is public and what is private. The experiences and perceptions of my informants illustrate the complexities involved.
Informants who had a publicly available presence (even if anonymous or pseudonymous) perceived these spaces as private. They did not regard their blogs as public by practice, even if the blogs indeed were public by technology. The development of media technologies has always been connected to the increasing exposure of the private sphere in the public sphere (Warren and Brandeis,

Kristoffer (18): For a long time I had a title on my blog saying that if you know me, don't say that you have read this.
Marika: Why?
Kristoffer: Because then it would affect what I write. Then I would begin to think in relation to that person. I try to write my thoughts, but if I know that a person is reading it I begin to think of that person as a recipient. And I just want my message to get across; this is my message to myself.
18-year-old Linnea describes a similar experience with her blog as her own private space:

Linnea (18): I try to pretend that no one reads it. Or that I should be able to be honest and write what I want to without thinking, no, I can't say that because he will read it and I can't write that because she will read it and I definitely can't write that because the whole class will read it.
In spite of the fact that Linnea, like Kristoffer, emphasizes that she cannot consider her readers when she writes, she does appreciate having readers and is happy and grateful when she meets people who have followed her life through her texts and photos: «I think that if the diary is worth spending time on, then I am doing something good. […] And that I can mean something to someone. That feels really good.» Kristoffer and Linnea publish texts and photos online because they enjoy writing and taking photos, and they appreciate comments from readers.
The interviews demonstrate the indeterminate distinction between the private and public subject, and also pinpoint how offline as well as online publics include private spaces:

Kristian (17): After all, the Internet is no more public than the world outside […]. I don't care if a stranger sitting at the next table in a Chinese restaurant eavesdrops on my personal conversation with a friend.
The Internet is more public to the extent that actions are available to an audience independently of time and space: i.e. expressions stretch beyond the here and now, as is the case for public blogs, social profiles and photo sharing services. All the same, Kristian does have a point that often seems to disappear when distinctions between private and public arenas are discussed: private actions take place within public spaces both online and offline. My informants thus perceive the Internet as a public space, but within this public space they create their own private spaces where they share personal narratives and experiences. Worry that personal information published online can be misused is characteristic of dominant societal discourses, and it also affects the perceptions of the informants. Simultaneously, they regard having a public presence online as meaningful and valuable. My informants often appeared surprisingly honest online, but they typically emphasized that they negotiated what they shared, as they were well aware that their blogs were publicly available. Fifteen-year-old Mari explains, «I only share a little bit of myself with the rest of the world, not everything».
Although I did not include content published by my informants in my publications, I did include extracts from other «typical» teenage blogs and profiles. I chose to reconstruct these extracts, and obtained the authors’ informed consent to publish the content in my own work. One of the extracts I included was a blog post by 17-year-old Mari. She writes mainly friends-only posts in her LiveJournal, available only to users she has added to her friends list. Occasionally she makes exceptions and writes public posts, and I used one of these posts to illustrate how she negotiates boundaries between her private and public self-performance. I first translated her blog post into Norwegian for a Norwegian publication, then translated it back into English for my thesis. I also googled different parts of the quote to make sure her blog could not be retrieved on the basis of the reconstructed post in my own work. Reconstructing her blog post in this way keeps Mari's identity anonymous, though it does mean that my research becomes less traceable:

The first time we kissed was at the traffic lights near Hyde Park. I was still sort of in a state of shock and giggled and laughed at what he said without saying much myself. To be honest I was quite frightened by the weirdness of the situation. My Internet friend had come out of the screen and as always when that happens, my brain and ability to formulate do not cooperate particularly well. So there we are waiting for the green man and he has his arm around me and I lean in to him and try not to pass out from all of these new experiences and I look up at him and smile (something which in itself is nothing new – I always smile) and he looks at me and leans towards me and we kiss. I get a bit funny inside but I do not gasp, and it is actually not unbelievably romantic. […] We kiss affectionately with open mouths and then the green man appears and we stop kissing and we giggle a bit before we move across the road while still closely entwined. («Mari's blog post»)
It may seem contradictory that Mari chooses to air a rather private experience in public; however, in an e-mail she explains that she chose to make this post public, because «it's about a very important and positive event in my life, and I managed to write something nice and reasonably meaningful». In continuing she explains that she is satisfied with how she manages to present herself and her personality, and that she wants to share this story because she knows that numerous others identify with «Internet romances». In this way, online spaces are used to mediate personal experiences and bring what is private into public spaces.
My informants stressed that they felt they had personal control over mediated expressions, meaning they could carefully create expressions that they were comfortable sharing. A consequence of this perceived sense of control was that they would share stories online that they would not typically share with their friends offline:

Andreas (18): It's easier to express yourself accurately online, so online conversations are often profound and very open. You can write it down, and have a second look at what you're trying to say. If you don't like how you expressed something, you can just edit it. Then it's easier to be honest, and I think it's easier to tell people what I really feel.
Most of the texts and photos that Andreas (18) publishes are publicly available. Private revelations, however, are only available to registered friends or acquaintances (i.e. people added to his friends lists). Yet occasionally he needs time to decide what he wants to share with others:

Andreas (18): You kind of want to get it out, but you don't want anyone to know just yet, so it is good to be able to write a private post. Even if I have a tendency to make private posts available to friends when I read them the day after.
Anders (17) writes a paper diary in addition to his online diary: «I'm more open and honest in my diary, but on the whole what I write in my diary comes out on LiveJournal a couple of days later. I just need some time to think and such.» The comments of Andreas and Anders indicate that the opportunity to construct expressions, and to reconsider them at a later point, sometimes makes them present themselves differently in mediated settings. Similarly, the physical absence of others makes users feel more in control of their mediated sense of self, or in Goffmanian terms, gives users more control over expressions given off (Goffman,
In other words, mediated forms of communication have unique qualities, and these qualities affect how individual users choose and manage to present themselves. Mediated communication is thus sometimes characterized by candidness, as users have more time to create expressions and can exercise greater control over their self-representations.
To summarize, Internet services such as blogs and SNSs are peculiar: although technically they might be public or semi-public, these spaces provide us with an opportunity to be publicly private in modes we have not previously been accustomed to. We can inhabit them and share experiences in the form of texts and photos with an audience that stretches far beyond what used to be possible in pre-Internet times. Yet my informants were very clear about their limits of intimacy. The online subject can be open and honest, often more so than in offline sociability, yet what is made available remains a filtered reflection of the self. Most importantly, the ambiguity of blogs as private or public means that «technically public» does not equal «public in practice» or «public» as content that researchers can choose to use as they please.
The second case study I will discuss with regard to research-ethical assessment is a qualitative in-depth study of the adoption of the social intranet JIVE (by Jive Software) in an international ICT consultancy enterprise that employs approximately 5,000 people. The study, involving qualitative interviews with 27 employees located in four different countries and observations of user patterns in the social intranet, was conducted by Lene Pettersen and myself.
Consultants in all divisions of the enterprise are typical knowledge workers, and the company introduced JIVE in the summer of 2010 to enable employees to «build professional networks, develop competence by following others more skilled, finding out what others are doing and not reinventing the wheel, having things you're working on easy to find and share, easily work with colleagues in other business units» (obtained from the company's strategy for implementing JIVE). JIVE has been organized as a social intranet tool, with national as well as public intranets and restricted groups for discussions and sharing of content, experiences and knowledge. The newsfeed that the employees see when they log in depends on which office they belong to, which peers they follow, and which groups they have joined as members (i.e. similar to how Facebook and LinkedIn function).
As used by our case company, JIVE is a non-public space: online practices are available only to the employees of the company. As such, the information is not public. In this study, protecting the privacy of our informants proved to be very important. In the course of the research process we also realized we had to keep the company anonymous in order to report as truthfully as possible what our informants told us. In qualitative research projects, the aim is often to uncover in-depth knowledge about experiences and opinions as truthfully as possible. To succeed, researchers need to develop trust and rapport with the interviewees.
In our study, we soon experienced the benefits of having established a relationship of trust with our informants. They were informed that the study would be conducted without disclosing their identities to the company or anyone else. Information about data handling was included in the description of the study that the informants received before giving their informed consent to participate. This letter also stated that the study had been reported to the Norwegian Social Science Data Services, and that the study and the handling of empirical data would be conducted in compliance with its regulations regarding confidentiality and archiving of data. It would not be possible to recognize the persons interviewed in any reports or articles. This formal procedure for guaranteeing our informants’ confidentiality and privacy seems to have helped us establish rapport and trust. We experienced a high level of candidness from our informants, as demonstrated by highly opinionated statements about the company, the social intranet, and their local work environment.
In the past few years, the company had faced a series of acquisitions, reorganizations, and significant labour turnover, resulting in frustration for some informants. The interviews we conducted provided us with in-depth insight into the workplace experiences of our informants. The openness our informants showed us demonstrates the importance of having established a trustful relationship:

No, as I said earlier, our culture has changed significantly. When I started […] everyone had their own voice and were individuals with their own opinions. This has changed. Now we're supposed to brown nose those with many important and international contacts and who might be promoted to an important position. [Those who participate extensively in JIVE] are those who try the hardest to position themselves. […] Their billable time is minor, and they talk a lot [laughs]. (Female in her 40s)
All is not misery in our case company: there are distinctive differences between the informants, as well as differences in how the work environment is experienced at local offices, and the quote above is representative only of that informant. Yet her opinions are reflected in more modest forms among other informants as well:

[Active users of the social intranet] are the Yes-people. Those who flatter and agree with the management. The Yes-people are those who participate in the social intranet, and who reproduce their Yes-views in their Yes-clan. (Male in his 30s)
This input is crucial when we try to understand the employees’ experiences of the company's social intranet. The honesty our informants showed us made them more vulnerable, and making sure they could not be identified by the company or anyone else became even more essential to us. Moreover, conducting the interviews uncovered that reluctant users of the social intranet had significant privacy concerns: for example, they would not «like» critical posts by colleagues even if they actually liked the content, because their own name would then be visible to everyone in the company, including managers (Pettersen and Brandtzæg,

I think you can use JIVE to brand your name within the organization. I'm not saying I'm schmoozing with the management […]. But with JIVE […] like when I comment on a post from [manager], the distance between us decreases and my name might be noticed. […] There were no similar opportunities before JIVE. Like I couldn't keep track of what my manager was thinking and feeling, and then e-mail him and say, «Hey, I really like what we're doing now». (Female in her 30s)
Our promise to keep the informants anonymous both to the company and to the public meant that we avoided providing information about the office they belonged to and their specific ages, and that we removed any information that might identify them. Gender and approximate age are included, as in the examples above: «female in her 40s». The combination of gender, specific age, and office could easily reveal who many of them are. We carefully and consistently assessed whether the information we included in publications could make individuals identifiable. This is of course particularly important, as our informants showed us a level of trust and told us stories that might jeopardize their professional position in the company, and even future positions should they choose to pursue a career elsewhere.
In this study, our responsibility to our informants makes it more challenging to present JIVE in a meaningful way to readers who are not familiar with the service, i.e. most readers. Screenshots of how the company makes use of JIVE cannot be included as is, but must be manipulated to protect both the company and the users. In her work, Pettersen has manipulated screenshots from the company's social intranet, substituting fictional photos and names for real photos and names in order to visualize the technical solution (for illustrations, see Pettersen and Brandtzæg,

I learned that a publisher had rejected a paper written by two of my colleagues, solely on the claim that they were faking their data by presenting invented composite blogs instead of quoting directly from actual blogs. (Markham,
Similarly, Pettersen and I have received reviews of our work that express concern about the lack of detail about JIVE: «A first concern is the lack of detail we have on JIVE – its particular functionality – screenshots and so forth might be useful» (from a review on a paper submitted to a conference). Pleasing reviewers would require us to reconstruct, in greater detail, screenshots with fabricated textual and visual content to protect the anonymity of the company and the employees, which in turn might prompt reviewers to criticize the illustrations as fake and constructed.
To summarize, researching non-public company websites that contain confidential information requires specific considerations with regard to how the researchers treat the research subjects. Clearly, content cannot be published as is. However, information retrieved through research interviews must also be handled carefully. Our informants trusted us, and several shared stories they would not share publicly in the company. The relation between trust and sharing is of course well documented in several studies, and is also something we as researchers benefit from. As a consequence we cannot share information or
The informants in the two case studies are vulnerable, but for different reasons. Young research participants are vulnerable due to their age. In my study of young people's online practices, my informants were also vulnerable as a consequence of their self-performance practices in social media. Even if their blogs were publicly available, they still perceived their own blogs as their own private space and disclosed honest and intimate (if nevertheless filtered and edited) accounts of life. The interviews were conducted on the condition that the participants would remain anonymous, and this made it impossible to include content from their online practices in research publications. The knowledge workers interviewed in the second case study are adults with a high level of social, cultural and economic capital. However, most of our informants made themselves more vulnerable by sharing experiences and opinions they would not share openly in the company. They felt comfortable doing so because they trusted us to keep their identities anonymous.
Both case studies demonstrate that a one-size-fits-all approach to ethical decision-making is not viable. The peculiarities of each case are only uncovered in the actual research process, and consequently the premises for making ethically sound judgments changed during the course of the studies. The interviews with the young bloggers uncovered complicated perceptions of private versus public, which only underscored that I could not simply use their content as I pleased in my own work. Similarly, Pettersen and I entered the social intranet case study rather naively, thinking it would suffice to keep the identity of the informants anonymous. The stories we heard are typical for knowledge workers, yet we could only report them truthfully if we also kept the identity of the enterprise anonymous. In both cases, ethical issues arose throughout the research process. Both case studies thus point to the importance of thinking in terms of ethics throughout the research process, from planning to publication and dissemination. Strategies to ensure the privacy of the research participants, for instance, demanded creativity in finding ways to illustrate online practices: these could not be republished as is in publications, but had to be anonymized and reconstructed, even if such reconstructions might be at odds with «normal» scientific practice.
A process approach to research ethics means that the particular judgments made in the above case studies cannot easily be applied in other research cases. Ethical challenges will arise at different stages in the research process, and many of these challenges will only become apparent as the researcher becomes embedded in her research project.
One of the challenges we often face as researchers within the social sciences and humanities is to answer the question of how our research can contribute to society at large. What is the relevance of what we do? Can we make people's lives better? Safer? More manageable? A typical way of ensuring – or providing – such relevance is by informing policy development. As we all know, the global nature of the Internet creates challenges for policy makers. The Internet, with its content, risks, services and users, is in essence trans-national – and even global. Its basis is not governments, but rather commercial companies, private citizens and organizations. At the same time, most societies, on a political level, need and want to keep services and businesses in line with current policy developments and to ensure the moral and legal rights of citizens, in this case users. This relates both to the right to protection from potential harm and illegal content, and to the right to communication and activities whose status depends on legal, societal, cultural and financial frameworks, conventions and expectations. For instance, pornography and violent content can be (legally) published in one country and accessed by users – young and old – in other countries where the content is deemed illegal. Likewise, a paedophile can groom a child anywhere in the world, and a teen can illegally download copyrighted material from a server or network situated on the other side of the planet.
Traditionally, a practical way of solving the legal challenges that the Internet's seemingly borderless nature creates has been to implement and rely on self-regulatory agreements where commercial companies take on social responsibility (Staksrud,
In doing so, the researcher must look beyond standard modes of operation in order to fulfil obligations on all levels. Obvious as these points may be, one must actually heed the fact that online activities cross borders; one must consider ethical aspects, such as whether gathering information may be problematic even if the data is freely available; and one will have to consider commercial interests to a perhaps greater degree than the usual political ones.
Building on this understanding, this chapter addresses three methodological and ethical challenges related to Internet research: How can we make sure that we do not replicate standard perceptions of minors’ Internet use, but rather open up the field to gain new insights? How can we research communicative features and user practices in online communication services, such as social networking sites, without compromising the privacy and integrity of third party users, especially when these users are children? How can we as researchers fulfil our mission of independence and critical reflection on policy development in a field where businesses rather than governments are the principal regulators?
These challenges will be addressed in a setting that makes them even harder: Critical policy research in online environments is challenging in itself due to the new and complex array of stakeholders and the lack of transparent regulatory frameworks. It is harder still when the users in question are minors (children), often assumed to be «vulnerable» users, with all the added ethical complications and obligations this entails. Throughout the discussion, «children» will be used as the Internet user and informant of reference.
Perhaps the biggest challenge in the quest to ensure people's right to be researched is to research underrepresented, so-called «vulnerable» groups. These are groups for which informed consent cannot be obtained directly and sufficiently, e.g. people with mental disabilities, those with financial needs that might make them vulnerable to certain types of research (having no real choice to opt out), or – the largest group of them all: children. For all of these groups there are special ethical considerations, which place a larger responsibility on the research community as a whole. For in a risk society, as described by Farrell (
An unintended yet real result of this state of affairs is that these groups are often less researched than others. This is especially the case when we look at issues related to the «population», the «public» or, as is typical within the research field of media and communication, the «audience» or «users». Children constitute a prime example of a subgroup that is often (if not always) forgotten, and in reality overlooked when it comes to sampling and collection of data that aims to inform policy development in general, and policy related to children themselves in particular. Therefore, in policy as in research, when we refer to «the public» and its features, forms, needs and meanings, we often do not mean all people, but rather those above the age of 15 (at best), or more likely above 18.
Traditionally, to the extent that children are researched, this has typically been done by proxy, by asking their parents and guardians about how they are, what they do and how they feel. Even when comparing
Thomson (
The rationale behind such research approaches
At the same time there is a need to recognize that children may be vulnerable and in need of special attention and protection throughout the research process. Additionally, one needs to pay attention to other differences such as their physical size and strength compared to adults, their general social and legal status and their dependence on others and fixed institutional position (such as in schools) (Hill,
With the above observations as background, I now turn to two practical examples of how one might make children count when researching online environments. Both are cases in which the researchers' key aim was to make children count by taking a child-centred perspective in the collection of high-quality research data related to policy evaluation and development in the area of children and online safety and risk. The first example is about being open to children's voices without imposing an adult frame of reference and definition. The second example is about how we might make children count by researching from their point of view.
The first example relates to the following above-mentioned challenge: How can we make sure that we do not replicate standard perceptions of minors’ Internet use, but rather open up the field to gain new insights? Representative, statistical surveys constitute one of the most efficient research tools in terms of applicability to policy development. Politicians as well as journalists appreciate the value of being able to say «how many» or «how much» in identifying or presenting a problem or an issue. Using statistics to frame current events might be particularly meaningful when the numbers represent a majority or a minority view, as this speaks to political language with its rhetoric and its need to frame standpoints in a prioritized context as «more» or «less» important than other issues on the agenda. Similarly, representative statistics can help define the appropriate, financially sound or «fair» limits for state-based services to groups of the population.
So, the legitimacy of policy development and intervention in Western societies partly relies on the ability to generalize findings to the population as a whole, or to specific, demographically defined sections of it. Statistical analysis is
When using (representative) surveys as a tool for mapping the state of the art and providing policy recommendations, one generally asks closed-ended questions about areas of already-established policy interest and agendas. One of the topics high on the agenda in relation to Internet policy is the question of online content regulation and what is perceived as harmful for children. Research on such questions can either be based on what is perceived as problematic by (adult) stakeholders in the field, or by researching what children actually do online and how their activities may or may not lead to distress and potential harm.
Although some qualitative research is beginning to investigate a wider array of possible risks to children online, much of the research is done within frameworks pre-defined by (adult) researchers. While such frameworks are firmly embedded both in theory, experience and observation, there is a need to ask if we miss out on the children's perspective. This is especially critical in the field of online research, with its fairly new and continuously shifting user patterns and rapid technological service innovations.
In order to counter this, when collecting data from a random stratified sample of 25,142 Internet-using European children aged 9–16 years (interviewing them at home during the spring and summer of 2010), the children were first asked one open-ended, unprompted question: «What things on the Internet would bother people of about your age?» In recognition of the methodological and ethical challenges of researching children's conceptions of risk (Görzig,
The results included answers such as the following:
«The things that bother people about my age are the influence of bad websites such as how to diet or lose weight so you could be known as the pretty one; like vomiting things.» (Girl, 15, Ireland)
«To take a photo of me without my knowledge and upload it to an inappropriate website.» (Girl, 10, Bulgaria)
Yet, pornography (named by 22 % of the children who mentioned risks) and conduct risks such as cyber-bullying (19 %) and violent content (18 %) were at the top of children's concerns online. The extensive priority given to violent content was both surprising in its own right and a methodological reminder. It is noteworthy insofar as this area tends to receive less attention than sexual material or bullying in safety and awareness-raising initiatives and policy discussions. Not only did a considerable group of the children mention violent content, they also elaborated on this as being realistic content from the news – typically accessed through video-sharing sites such as YouTube. And many felt disgusted and scared:
«Some shocking news like terrorist attacks.» (Boy, 12, Finland)
«I have seen what life was like in Chernobyl. People were suffering from physical deformities. I was upset to see the pictures and it made me sad.» (Girl, 9, France)
«I was shocked seeing a starving African child who was going to die and a condor waiting to eat him. Also, news about soldiers who died while serving [in] the army, Palestine and Israel war scenes upset me very much.» (Girl, 13, Turkey)
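As an aside on method, moving from open-ended answers to percentages of the kind reported above requires coding free-text responses into risk categories. The following is a minimal, hypothetical sketch of such coding; it uses keyword matching purely for illustration (in practice such answers are typically coded manually by trained coders), and all keyword lists and answers below are invented:

```python
from collections import Counter

# Hypothetical keyword lists; real coding schemes are developed and
# applied manually by trained coders, not by simple string matching.
CATEGORIES = {
    "pornography": ["porn", "naked", "sexual"],
    "bullying": ["bully", "mean messages", "harass"],
    "violent content": ["violent", "war", "terror", "shocking"],
}

# Invented answers standing in for open-ended survey responses
answers = [
    "Some shocking news like terrorist attacks.",
    "People sending mean messages to each other.",
    "Naked pictures popping up.",
]

def code_answer(text):
    """Return the set of risk categories whose keywords occur in the answer."""
    text = text.lower()
    return {cat for cat, words in CATEGORIES.items()
            if any(w in text for w in words)}

# Tally: share of answers mentioning each risk category
counts = Counter(cat for a in answers for cat in code_answer(a))
shares = {cat: counts[cat] / len(answers) for cat in CATEGORIES}
```

However the coding is done, the crucial step remains the same: the categories are derived from what the children themselves said, not imposed in advance.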
Thus, tweaking the methodology to let the digital users, in this case children, be heard gives direct and practical implications for future policy priorities. This procedure also connects online user research to traditional media user research both theoretically and empirically. In addition, it is a reminder of why and how children should be consulted also when the foundation is laid for policy development and solutions for protection are being sought (Livingstone et al.,
So, an answer to our first challenge might be that in order to make sure we do not replicate standard perceptions, but open up our field to gain new insights, we need to actually ask the users – in this case children – themselves. Allowing them to freely reflect upon their own situations showed how policy and awareness work in the field of Internet safety was not on par with the actual worries of many children.
As stated in the introduction, researchers can and perhaps should have a key role in the evaluation of Internet self-regulatory initiatives. This second example therefore relates to the two challenges of 1) How can we research communicative features and user practices in online communication services, such as social networking sites, without compromising the privacy and integrity of third party users? and 2) How can we as researchers fulfil our mission of independence and critical reflection on policy development in a field where businesses rather than governments are the principal regulators?
In 2008, as part of its Safer Internet Plus Programme, the European Commission gathered 18 of the major online social networks active in Europe as well as a number of researchers and child welfare organizations to form a European Social Networking Task Force to discuss guidelines for the use of social networking sites by children and young people (Staksrud & Lobe,
The guidelines were adopted voluntarily by the major online social networks active in Europe, and signed on Safer Internet Day, February 10th, 2009.
As part of its extensive encouragement and support of the self-regulatory initiative of the SNS providers, the European Commission committed to monitoring the implementation of the principles by supporting independent researchers to assess the services. But how could this be done? The self-reporting from the services on their adherence to the principles would be the first place to look.
Another method could be to look at and investigate the respective social networking sites as a user, making sure that the described information and services were there and could be found. But would this guarantee a
As many Internet researchers have learned, social networking sites are complex to review and research, as they usually contain and host a wide range of different (connected) services. This in turn makes them hosts of a range of (potential) online risks in terms of content, contact and conduct. This situation also raises a range of ethical issues, as the relational aspect means that you as a researcher, if you were to study children's (actual) use of the SNS by observing or engaging, would also study the communication and behaviour of third parties in the child's network who have not consented to take part in the research.
For instance: a commonly discussed problem high on the policy agenda related to children and media in general, and to Internet services in particular, is the potential for underage children to access inappropriate services. Many SNSs have age restrictions; e.g. Facebook has a 13-year age limit for signing up. If Facebook were to allow access by younger children, it would also have to comply with stricter rules of parental consent and handling of personal information as laid out in the US Children's Online Privacy Protection Act (COPPA) (United States Federal Trade Commission,
Testing such restrictions with real underage children would mean that the researcher might:
Ask them to do something illegal or in breach of terms of service and codes of conduct;
Ask them to lie;
Potentially expose them to new risks they might not be ready to cope with by making them sign up with services they have not previously used;
Give the child an implicit or explicit acceptance of defined deviant behaviour;
Teach children how to circumvent technical restrictions (and get away with it);
Potentially ask children to do something they really do not want to do, but feel they must comply with;
Potentially expose other underage children (third parties, e.g. «friends») as liars;
… to mention only some of the ethical challenges such a strategy would entail. In addition, of course, come the ethical dilemmas of ensuring informed consent from the child's parents or guardians as well as from the child him- or herself in order to do something that is illegal or at best a breach of the «terms of service» (but yet accepted practice by many parents, see Staksrud, Ólafsson, & Livingstone, (
The solution reached in this case was to develop a method that allowed for testing the SNS services from a child's perspective without the substantial ethical complications of involving actual children. Thus the researcher had to become the child.
Addressing the ethical considerations relevant to testing such sites in the manner described, 13 carefully chosen national researchers were asked, while testing the sites, to choose fictitious nicknames/usernames for testing purposes and to set up a minimum of three profiles. In this way, the test avoided including any real underage children and the potential risks that such involvement could have resulted in. Why three profiles? Because many of the features of SNSs derive from communication with other individuals: there is a need for «a friend» to communicate with, «like» and comment. The main testing profile was set up as an 11-year-old girl (if possible; if not, a 15-year-old girl) with an «ordinary» name, all localized versions of «Maria». In addition, a peer-friend profile was established, as well as a profile of an adult. The latter made it possible to test whether adult «strangers» could access personal information about the minors, and whether they would be able to contact them directly through the social networking site services.
The national experts were given detailed instructions on how to perform the testing in order to ensure as much consistency in the testing process as possible. The testing was meant to give a comprehensive and clear view of the extent of implementation of the principles in terms of compliance between what has been stated in the self-declarations
So, can 11-year-olds get access to a restricted site? When the results were collected, it became clear that all services tested asked for date of birth as part of the registration process. On four services the users had to state that they were above a certain age (e.g. by ticking a box), while an e-mail address for verification was required by 20 services. Of those 20 services, the testers were able to sign up on seven without verifying over e-mail.
On three services intended to be age-restricted, sign-up for 11-year-olds was not allowed; 17 services denied sign-up with explicit reference to the age restrictions on the site.
But children are people, and as people they might also employ strategies to circumvent restrictions. Thus it becomes important also to test if the services hold when children do something active to get access, such as changing their age.
Accordingly, for services that restricted signing up, another attempt was made
Another example pertains to the principle of «Provide easy-to-use mechanisms to report conduct or content that violates the terms of service» for children (principle no. 4 of the SNS self-regulation agreement). This was specified as:
Providers should provide a mechanism for reporting inappropriate content, contact or behaviour as outlined in their Terms of Service, acceptable use policy and/or community guidelines. These mechanisms should be easily accessible to users at all times and the procedure should be easily understandable and age-appropriate. Reports should be acknowledged and acted upon expeditiously. Users should be provided with the information they need to make an effective report and, where appropriate, an indication of how reports are typically handled. (Arto et al.,
In order to test principle 4, the social networking sites were, where at all possible, sent the following message from the expert testers on their service(s): «I am writing to you because someone is sending me scary messages. What should I do about this? Please help me.»
This message was carefully designed and worded to be a general request, and would in most cases be sent from the profile of a registered, underage user of the site (in most cases an 11-year-old, in a few cases a 15-year-old, depending on the overall age restriction of the SNS). In this message, the SNSs were asked to give specific advice on how the users themselves should handle the situation. Please note that the message did not mention the SNS in particular. It was a general cry for help.
As the signatories are very diverse in the services they provide, this message might not have been fully relevant to all the 20 social networking sites that at that time had committed to the principles. However, it was deemed that an underage user asking for advice and help from a professional party should receive some sort of feedback, preferably with information relevant to the request sent. The way the message was worded should also prompt a personal response. So, did they get one?
Of a total of 22 services tested, 13 did not give any reply to the message asking for help during the testing period of about 6 weeks, two replied within a week (3–4 days), while seven replied within 24 hours.
From a policy point of view, the results were discouraging and signalled a failure to commit to the practical implementation of the SNS principles. As such it serves as a reminder that in terms of policy, including those implemented to protect user rights and safety, there is a distinct difference between the stated state of the art and the reality as experienced by the user.
From a research and methodological point of view, however, the undertaking was a success, in that it ensured quality data without compromising the integrity of the users.
One answer to challenge no. 2, then, «How can we research communicative features and user practices in online communication services, such as social networking sites, without compromising the privacy and integrity of third party users?», is that we sometimes need to go undercover. As observational research on the use of social networking services inevitably will involve more than one informant, it is vital that third parties are also protected according to ethical standards. This is especially critical as you might never really know who is on the other side of the interaction. Your primary informant might be interacting with a minor.
This example is also a reminder that if we as researchers are to fulfil our mission of independence and critical reflection on policy development in a field where businesses rather than governments are the principal regulators (challenge no. 3), we need to take a critical approach and avoid relying on self-reporting as the (only) tool of assessment. The example also highlights the need for in-depth, real-world use and testing. This is particularly important for services offered in several countries and languages, where there is a real risk of differences between «versions». It is the end-user experience that should be evaluated, not their guardians' report on it, nor the service provider's own account of its system and routines. Said out loud, this seems self-evident, but the history of research on children's use of online digital media suggests otherwise.
Unlike other places in life, where the mismatch between parental perception and children's activities can often be observed, not only by researchers, but by the individual citizen, the online lives of children and the features of the digital services they use slip beneath most people's radars. It is the researcher's trade and obligation to provide critical, quality research to inform policy. In the online field our job might be harder, but no less important. However, we might take some comfort in the fact that new technologies do not necessarily mean new methodologies, simply new approaches.
In a passionate argument for the idea that social research must adopt social transactional data generated through new information technologies and new analytical techniques, Savage and Burrows claim that: both the sample survey and the in-depth interview are increasingly dated research methods, which are unlikely to provide a robust base for the jurisdiction of empirical sociologists in coming decades. (Savage and Burrows,
Savage and Burrows base their claim on the argument that digital data on social transactions is data about actual events, while also being data that pertains to entire populations. While the survey depends on representative samples and makes predictions based on such samples, those who analyze web data have direct access to complete data about actions and statements. In other words, analysts of digital transactional data, or what has frequently been termed «Big Data», evade the problem of representativeness: they can provide actual descriptions of people's actions and infer future actions from these. This has proven a strong tool for predictions, as exemplified by how
There is no doubt that the use of Big Data in research presents researchers with new opportunities for analyzing social phenomena. Yet the use of such data also has its limitations, and introduces a set of new ethical and practical challenges. Both the opportunities and the challenges are closely linked not only to the very nature of the data, but also to how ownership of and access to data are regulated.
In this article, we will attempt to shed light on the role of research in a field of tension between the new opportunities the data offer, the ethical considerations that research makes necessary, and the limitations that follow from the regulation of and access to digital data. Underlying our considerations is the realization that the way in which we as researchers approach this field of tension has consequences. When we study social transactions through Big Data, we are studying a social reality. Through research, we participate both in constructing a social reality, such as the digital public sphere, and in giving society insight into what its social reality is, both of which can have social consequences.
First, we want to describe what characterizes digital transactional data and what kinds of opportunities it offers to research. We will use the term Big Data in order to underscore the new analytical opportunities embedded in the characteristics of digital data, and use social media data as our main case. Then we wish to say something about the new ecosystem that has emerged around the production, gathering and analysis of digital data, and how it changes the research premises for the production of knowledge. We have borrowed the idea that the use of Internet data must be understood as being a part of the growth of a new ecosystem from boyd & Crawford's article «Critical Questions for Big Data» (
The amount of available digital data about people has exploded in recent years. This has to do with mundane status updates on Facebook, videos posted on YouTube, and Twitter posts that are available to anyone who wants to read them. It also pertains to data from purchases, both those that are made on the Internet and those that are made by credit card. Other examples of digital data include data from Google searches and data that logs phone calls.
The term «Big Data» is a collective term for data that is of such a scope that more computing power than usual is required in order to collect and analyze it (Manovich,
There are two aspects of Big Data in particular that will greatly impact the social sciences. First, transaction data differ from survey data in that they directly reflect what individuals actually do, instead of supporting conclusions drawn from individuals' statements about their actions. Second, digitized data combined with cheap computing power make it possible to study entire populations, instead of drawing on a sample. Thus, it becomes possible to conduct very sophisticated analyses, and also to predict future actions. In the book
The way in which Obama's second presidential campaign used and analyzed data provides examples of both how data may be used to predict behaviour, and how data from different sources can be pulled together and offer powerful analyses.
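The contrast drawn above between sample-based inference and analysis of complete transaction data can be sketched with synthetic data. The population, rate and sizes below are invented purely for illustration:

```python
import random

random.seed(0)  # make the sketch reproducible

# Synthetic "population" of 100,000 transaction records:
# 1 = the transaction occurred, 0 = it did not
population = [1 if random.random() < 0.37 else 0 for _ in range(100_000)]

# Survey-style approach: estimate the rate from a sample of 1,000
sample = random.sample(population, 1_000)
estimated_rate = sum(sample) / len(sample)

# Transaction-data approach: compute the rate over the entire
# population, with no sampling error (though other biases may remain)
true_rate = sum(population) / len(population)
```

The point is not that samples fail, but that complete data removes sampling error while leaving untouched the question of what the data actually represent.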
The example of Obama highlights an important characteristic of digital transactional data, namely that such data contains several layers of information. Metadata, such as email addresses, make it possible to link the different data that an individual has produced, for example data about purchase transactions and toll transactions. Savage and Burrows (
The result of this linking of data is that it creates new opportunities for assembling very detailed information, not only about individuals, but also about groups and organizations. One characteristic of the built-in opportunities for action in social media is that they make it possible to establish a social graph – lists of followers and friends (boyd & Ellison, 2011). Data gathered from social media thus contain information about how individuals and groups are linked together. If data about a user is collected on Facebook, data is simultaneously collected about several people in this user's network.
It would appear that such data represent an obvious enrichment to social research, as they give direct access to people's lives, statements and actions, provide detailed information and can be easily collected. Both access to Internet data and the opportunity to conduct analyses of large datasets based on concrete actions and interactions have caused many researchers to feel that the use of Internet data will revolutionize social research in fundamental ways.
However, in the article «Critical Questions for Big Data», boyd and Crawford caution against believing that we can leapfrog over fundamental methodological challenges, such as the issue of representativeness, simply because the data used are large enough. Analyses of Twitter provide a good example of some of these challenges. Twitter studies have become very popular internationally, especially due to the availability of data. However, questions may be raised as to what analyses of Twitter posts represent. An obvious challenge is that Twitter users constitute only a certain selection of the population. Other issues are linked to the fact that there is no one-to-one relationship between user accounts and actual people. One person can have several accounts, several people can use the same account, and accounts can also be automated – so-called «bots».
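The lack of a one-to-one relationship between accounts and people can be illustrated in miniature: counting accounts, and counting distinct human owners after excluding bots, yield different population sizes. The accounts and owners below are, of course, invented:

```python
# Each account maps to an owner; automated accounts ("bots") have none
accounts = {
    "@anna":        {"owner": "anna", "bot": False},
    "@anna_backup": {"owner": "anna", "bot": False},  # one person, two accounts
    "@newsbot":     {"owner": None,   "bot": True},   # automated account
    "@bob":         {"owner": "bob",  "bot": False},
}

n_accounts = len(accounts)  # what the platform's user count reports
humans = {a["owner"] for a in accounts.values() if not a["bot"]}
n_people = len(humans)      # what a social analysis is actually about
```

In real Twitter data the mapping from accounts to owners is unknown to the researcher, which is precisely why account counts are a poor proxy for people.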
Another problem is linked to the definition of what constitutes an active – and thereby relevant – user on Twitter. According to Twitter, as many as 40 per cent of their users in 2011 were so-called «lurkers», that is, users who read content without posting anything themselves. Brandtzæg showed in his doctoral thesis that the same finding pertained to 29 per cent of those who use Facebook in Norway (Brandtzæg,
Another example of the difficulties in determining what constitutes the correct representation of the digital public sphere may be found in our contribution to Enjolras, Karlsen, Steen-Johnsen & Wollebæk (
Based on analyses of the material, we paint a picture of a fairly well-functioning public debate: the debating Internet population is hardly distinct from the population in general when it comes to socio-economic background and political views. Many debate with people they disagree with, few experience getting hurt, and many learn something from the experience – though few change their opinions. We also draw a comparison of political attitudes among those who debate on the Internet and those who do not, a comparison which is broken down into different forums (Facebook, discussion forums, Internet papers, blogs, etc.). As a whole, there are hardly any differences between those who engage in discussion on the Internet and those who do not. There are differences, however, between debaters on the various platforms. This picture differs significantly from the picture that is sometimes presented in the mass media and the socially mediated public sphere. At the same time, there is little doubt that if we conducted a thorough qualitative examination of the content from selected discussion forums and the debate forums of online newspapers, and studied the political attitudes within them, we would find a different picture. These two approaches would both give valid representations of online discussions, but they would be representative of different phenomena – either the broad picture or the dynamics of particular forums.
This example illustrates the argument that, depending on whether one uses selected web content or representative survey data as the basis for analysis, very different pictures of an Internet phenomenon can be obtained. The same phenomenon is pointed out by Hirzalla et al. (
Even though the use of Internet data may potentially provide access to complete data and enable researchers to analyze statements and content from a large number of users, this does not eliminate the problem of representativeness or the need to interpret and discuss whatever phenomenon one has captured. In addition, it is important to point out that the assumption of having complete data is hardly ever correct. We can again use Twitter as an example. Only Twitter has complete access to the information in Twitter accounts and the complete set of Twitter posts. Twitter makes Twitter posts available through so-called APIs,
In connection with the increased access to digital data, important changes have taken place in the social landscape in which research is positioned. Savage and Burrows use the term «knowing capitalism» to describe a new social order in which social «transaction data» become increasingly important, and where such data are routinely collected and analyzed by a large number of private and public institutions. The main point for Savage and Burrows is that research has thereby ended up in a competitive setting. Research is no longer the dominant party when it comes to providing interpretations of society. boyd and Crawford use the concept of a new «ecosystem» to describe the new set of actors connected to the analysis of digital data and the power relationships that exist between them. Several elements of this new ecosystem touch on the potential of social research to represent and interpret society. We will highlight a few of these elements here.
As the data are under private ownership, these research departments are not in the same situation when it comes to privacy protection as researchers are, for example when it comes to requirements regarding informed consent. This is because users are required to accept the terms and conditions for the use of their digital data in order to be able to use the service. The result is that actors outside of academic research get a head start when it comes to providing relevant social analyses and interpretations. Besides the fact that researchers inside these private companies are in a position to produce unique analyses, it is a problem that they are not required to let their research be reproduced or evaluated through the examination of data, given that the data are private property (Enjolras,
As described above, a complicating element is the complexity of digital data. The fact that data exist in many layers, with different kinds of information, constitutes one such complexity. The fact that different data gathered from different applications and websites are linked together is another. In addition, it is hard to get a full overview of what the network structure in data really entails. For example, those who gather information about you can also at the same time gather information about the users in your network, and vice versa.
The difficulties in understanding both the technical and legal stipulations for the use of an application mean that users lose control over their own statements. A survey conducted in 2009 by Brandtzæg and Lüders on Internet use and privacy protection revealed that many users were concerned about the consequences of sharing personal information on the Internet. The study also showed that most users had limited insight into how social media function and how to handle the privacy settings. boyd & Crawford call attention to the fact that data that have been produced in a certain context may not necessarily be brought into another context just because they are available (
Based on the preconditions that exist in the new ecosystem linked to digital data, it is possible to see the contours of a set of digital dividing lines that will affect what knowledge is being produced by whom (boyd & Crawford,
An important disparity pertains to access to data and to the resources required to utilize them. As pointed out above, private companies and their analysis departments have privileged access to data. For those who wish to conduct research on such data, access is restricted, and financial resources are required. Digital data from different platforms have been commercialized and can be purchased through so-called «data brokers». Access is thus dependent on the financial resources one has available. Alternatively, one can collect certain types of data, such as Twitter data, by programming against the platforms' APIs oneself, that is, by writing programs that fetch data based on certain criteria. However, this also requires resources in the form of computing power. As a result of the requirements regarding finances and technological investment, a disparity emerges, not only between private actors and academic institutions, but also between the academic institutions themselves. Elite universities, such as Harvard, MIT and Stanford, have the resources to build and equip research environments with technology that allows them to utilize digital data. Smaller universities may not have the same resources. In Norway, we can imagine dividing lines both among the universities and between the university sector and the research institutes.
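To make concrete what writing one's own programs to fetch data via an API involves, the sketch below pages through a search endpoint until no further pages remain. It is a minimal illustration only: the `fetch_page` function, the cursor-based paging scheme, and the response fields are assumptions for the example, not any particular platform's actual API.

```python
# Illustrative sketch: collecting posts matching a criterion from a paged API.
# `fetch_page` stands in for a real HTTP request; cursor-based paging is assumed.

def collect_posts(fetch_page, query, max_pages=10):
    """Gather posts for `query`, following page cursors until exhausted."""
    posts, cursor = [], None
    for _ in range(max_pages):
        page = fetch_page(query=query, cursor=cursor)
        posts.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:          # no more pages available
            break
    return posts

# A stub standing in for a real network call, for demonstration:
def fake_fetch(query, cursor=None):
    pages = {None: {"items": ["post1", "post2"], "next_cursor": "c1"},
             "c1": {"items": ["post3"], "next_cursor": None}}
    return pages[cursor]

print(collect_posts(fake_fetch, "election"))  # ['post1', 'post2', 'post3']
```

Even this toy version hints at the resource question raised above: a real collector must handle rate limits, authentication and storage, which is where computing power and technical competence become dividing lines.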
The use of digital data also creates dividing lines when it comes to competency. In order to utilize such data, competencies in programming, analysis and visualization are required. Such competencies are still a limited resource within the social sciences and the humanities. Building up such competencies requires resources, and larger institutions have an advantage if they are able to connect technical and interpretative environments.
An important disparity-generating dimension has to do with the regulations and requirements that pertain to the use of Internet data. Here we can find several dividing lines. First, there are differences between countries' legislation on privacy protection in research on digital data, with regard to collection, information and storage. This creates differences in who can get access to and use such data. At present, there are efforts at the European level to harmonize data protection rules, among them regulations pertaining to research.
Second, there is a fundamental gap between those who own digital data and those who do not. When consumers make use of services such as mobile phones, bank cards or online shopping, or use Facebook or Google, they also give the providers of these services permission to use the personal data they enter, by accepting the terms and conditions for the service. This permission places private companies in a privileged position when it comes to utilizing data, for research and analytical as well as commercial purposes, without having to abide by the same set of ethical guidelines that research does. The companies are not subject to any requirement to make these data publicly accessible so that other researchers can make use of them.
It is not only reasonable but of some importance that research on digital data is subject to strict ethical requirements, especially considering the users' potential powerlessness when it comes to protecting their own data. At the same time, it can be claimed that current regulation imposes conditions that lag behind the general public's perception of the boundary between public and private information, and of what kind of information should be covered by the privacy clause. One example that was published in
Referring to Manovich (
Internet research presents us not only with a new set of data and methods with corresponding ethical problems, but also with a new ecosystem for the production of knowledge. To define what the responsibilities of research are, we believe it is important to be aware that research plays a role in at least two different ways: as a producer of knowledge and as an actor with a shared responsibility to develop and practise a code of conduct adapted to Big Data. Both of these roles are to some extent dependent on conditions outside the research itself, such as the activities of other actors and public and private regulations.
The role of research is thus primarily about providing knowledge about the new way in which information is stored and structured in a digital society, as well as shedding light on what kind of power different types of actors, such as citizens, elites, organizations, and states, possess when it comes to having access to and the ability to interpret information. This means contributing research-based interpretations of what the Internet is and how it works. Researchers should take it upon themselves to provide understandings of both the structural qualities of the web and the social practices that develop within different social fields and among different groups. A major challenge for segments of the social sciences is addressing the methodological opportunities that the Internet and Big Data present us with in such a way that researchers will be able to provide these types of analyses. This requires an investment in a new competency which is not included in most researchers' basic education. A further requirement is reflection on the methodological challenges and problems, such as the question of whom and what the different forms of web data represent. Through research, we take part in constructing the new information society as an object and providing society with insight into what this reality is. This implies providing perspectives on such questions as: Do we understand the Internet as being open and free, or as something that is regulated and conditional? Does the Internet really constitute a public sphere, or rather a network consisting of isolated islands? Which voices are being represented in our research? What social interpretation is being produced that will potentially impact what legitimacy and significance the Internet will have in socio-political processes?
Depending on which instruments, methods and theoretical tools we, as researchers, use to attack the digital research field, we will be able to provide different answers to these questions.
The second role of research is about contributing toward developing an ethic that is adapted to the new premises laid out by digital systems and Big Data. We feel that such an ethic cannot be developed in a vacuum, but must take into account the ecosystem of knowledge of which research is part. The challenge for research is to produce valid research in an ecosystem of knowledge while under pressure from privatization and from ethical regulations that differ from one country to another, and which to varying degrees are adapted to digital data. One way to think about this is through the concept
Presenting interpretations of society is nothing new in social research, nor is the fact that these interpretations may have social consequences. Still, our claim is that the Internet presents us with a new set of challenges. It requires that we both understand and critically evaluate what kind of data Internet data are and what they can produce knowledge about, and that we understand the social context of this type of research. The ethical dilemmas that Internet research presents cannot be resolved without a deeper reflection on, and establishment of, ground rules for how Big Data should be handled in society, where research and other forms of public and commercial data use are put into context.
The rules that pertain to digital data are adapted to a «small data» world, not to one where both data and computing power are accessible in large quantities. The ethical challenges pertain not only to research, but also to industry and administration. Therefore, we need new forms of accountability for both research and society.
New technologies for communication tend to raise certain expectations regarding the more or less overwhelming societal influence arising from them. Such was the case with radio and television – and subsequently, during the mid-1990s, when the Internet started to grow in popularity throughout much of the Western world. While more traditional or established forms of media remain a major part of our everyday media diets, the Internet has indeed come to play an important role in our day-to-day activities. Needless to say, such a move to a variety of online environments is of significant interest to a variety of scholars from the social sciences and the humanities. This chapter presents an overview of some of the challenges of performing research on the activities taking place in such environments. While my personal experience with this type of research is geared more towards perspectives often associated with the social sciences – specifically various aspects of online political communication – it is my hope that the concerns raised will also resonate with readers who approach studies of online environments from other perspectives.
Specifically, the focus here is on the phase in the development of the Internet often referred to as the «Web 2.0.» While there is no detailed agreement on what this supposed second stage of the World Wide Web entails, attempts at a definition tend to revolve around ideas of increased user participation (e.g. O'Reilly,
While the term «Big Data» can be tagged onto a multitude of discussions regarding the increased possibilities of tracing, archiving, storing and analyzing online data, the specific appropriation of the term here deals with how masses of data are gathered from social media services like the ones discussed previously and subsequently analyzed for research purposes. In so doing, I would like to discuss two broad thematic groups of challenges that researchers often face when doing research on social media. The first deals with ethical issues, while the second concerns methodological possibilities and problems. Before delving into these issues, though, we need to look a bit closer at the term «Big Data» and its many connotations.
As with the Web 2.0 concept, the term «Big Data» carries with it a number of differently ascribed meanings and ideas. As the name implies, definitions often involve discussions regarding the swelling size of the data sets that researchers as well as other professionals now have to deal with. Indeed, the growing use of social media combined with the increased sophistication of tools for «scraping» such online environments for data has provided «an ocean of data» (Lewis, Zamith, and Hermida,
It follows from this that while the scope of the data – the number of cases gathered and the number of variables employed – is of importance, size is perhaps not all that matters. As suggested by Margetts and Sutcliffe, «Big Data does not necessarily mean interesting data» (Margetts and Sutcliffe,
Regarding ethical considerations pertaining to this type of research, I will raise three interrelated issues for discussion: (1) the «open» or «closed» nature of data, (2) communicating this type of research to ethics boards, and finally, (3) the need for respondent consent.
First, developments regarding computing power for collecting, storing and analyzing data are not showing signs of stopping or even plateauing. This implies that issues pertaining to the technical limits of the kind of operations that can be performed need to be discussed in tandem with discussions of which types of activities should be performed. We might label this a practical approach to research ethics.
As an example, we can point to some considerations that tend to arise when researching two of the currently most popular social media platforms, Twitter and Facebook. While the services differ in terms of modes of use, privacy settings and so on, we can distinguish between more «open» and more «closed» types of data from both platforms. For Twitter, users can add so-called hashtags – keywords formatted with a number sign (#) that signal a willingness on behalf of the user for their tagged tweet to be seen in a specific thematic context – that can assist researchers as well as other interested users in finding and selecting tweets of relevance. Such uses of hashtags are usually prevalent around the time of specific events, such as political elections, and have served as useful criteria for data collection in a series of studies (e.g. Bruns and Highfield,
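The hashtag-based selection criterion described above can be pictured as a simple filter over collected tweets. The sketch below is a toy illustration only: the tweet texts and the hashtag are invented, and real studies would of course apply such criteria at collection time via the platform's API rather than to plain strings.

```python
import re

def select_by_hashtag(tweets, tag):
    """Keep tweets whose text contains the given hashtag (case-insensitive)."""
    pattern = re.compile(r"#" + re.escape(tag) + r"\b", re.IGNORECASE)
    return [t for t in tweets if pattern.search(t)]

# Invented example tweets around a (hypothetical) election hashtag:
tweets = ["Debate tonight #valg2013",
          "Lunch photos, no politics here",
          "Polls just closed #Valg2013 #norway"]
print(select_by_hashtag(tweets, "valg2013"))
```

The point of the example is the signalling function discussed above: only the first and third tweets opted into the thematic context by carrying the tag, so only they enter the sample, regardless of whether the second tweet was also politically relevant.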
The same reasoning can (taking into account obvious differences regarding the specificities of the platform) be applied when dealing with Facebook. Arguably a more locked-in service – a user essentially needs to have an account in order to gain access to most of the content – Facebook features Profiles, which is the type of Facebook presence most of us deal with in our everyday uses of the service. While Profiles are mostly associated with non-professional, personal Facebook use, professional employment has recently been taking place on so-called Pages. These Pages differ from Profiles in a number of ways – they are open to peruse by all, including by those who do not have a Facebook account, and they allow their respective owners (in this case, the political actors themselves) to extract more advanced metrics and information regarding usage rates than they would have been able to do if they had employed a personal Profile for professional matters. As with Twitter, we can differentiate between varying degrees of closed or open data here as well, where the operation of a Facebook Page at the hands of a political actor – be it individual parliamentarians, party leaders or even party accounts – could be considered a more open and public approach to the platform, thereby also making our job as researchers interested in the activities of politicians slightly less cumbersome. As general knowledge regarding privacy boundaries on Facebook is rather low (boyd and Hargittai,
Second, the need for research ethics boards has been evident in basically all branches of scholarly activity, in order to make sure that research efforts meet the needs and standards set by society at large. While I have dealt with my own experiences regarding the relative difficulty of trying to communicate these issues to ethics boards in a separate, co-authored paper (Moe and Larsson,
The second issue has to do with the varying degrees of feedback and transparency that characterize the decision-making processes of ethics boards. While the information that needs to be submitted to these boards is often plentiful and requires significant amounts of legwork from the individual researcher, the degree to which the submitter gains insight into the reasoning of the ethics board, once a decision has been reached, varies considerably. While the proverbial «burden of proof» should indeed lie on the researcher applying for ethical consultation, we also need to make sure that the feedback received – whatever the decision – is rich enough in detail that the individual researcher can gain insight into the reasoning applied. By securing at least some degree of transparency in these interactions, and by being more open in communicating these results to the academic community as well as to the general public, we will also be able to move towards precedents that will be very helpful for other, similarly interested researchers.
The third issue has to do with the necessity of obtaining consent when performing research on human subjects. While the practice of securing the willingness of those to be included in your study is more often than not a necessity, the practicalities of performing such operations must be raised for discussion when dealing with certain research projects. As an example, I would like to point to the work performed by myself and colleagues regarding political activity in conjunction with parliamentary elections in Sweden and Norway (Larsson and Moe,
As for challenges and questions pertaining to method when performing research on Big Data sets gathered from social media, I would like to raise four points in particular: the problem of «streetlight research,» the access to data, the stability of tools used, and finally, the competencies of researchers.
First, we can broadly conclude that among the many social media platforms available, Twitter and Facebook are (currently, at least) among the most popular, and as such, more interesting for researchers. As suggested by Lotan et al. (
Second, and related to the first point, is the problem of gaining access to data from a financial perspective. As both Twitter and Facebook have started to monetize access to certain parts of their respective APIs, partnering with third-party corporations to handle the day-to-day sales of data, it seems clear that finances will play an ever-increasing role in this type of research. For Twitter, this state of affairs can be illustrated by considering the different types of access allowed. While the so-called «firehose» API – including all the tweets sent through the service – is available, it carries with it a price tag that most academic institutions will not be able to pay. Instead, most researchers make do with what is labeled the «gardenhose» API – which provides a limited stream of tweets for free (e.g. Lewis, et al.,
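The practical difference between the «firehose» and the «gardenhose» can be mimicked locally as random sampling from a full stream. The sketch below assumes, purely for illustration, a sampling rate of about one percent – a figure often cited for the free sampled access, though the actual rate and mechanism are set by the platform and are not random in any documented way.

```python
import random

def gardenhose(stream, rate=0.01, seed=42):
    """Simulate sampled API access: return roughly `rate` of the full stream."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    return [tweet for tweet in stream if rng.random() < rate]

firehose = [f"tweet-{i}" for i in range(100_000)]   # the full (simulated) stream
sample = gardenhose(firehose)
print(len(sample))   # close to 1,000, i.e. about 1% of the firehose
```

The methodological worry discussed above follows directly: any analysis built on the sample inherits an unknown relationship to the full stream, and the researcher cannot verify how representative the one percent actually is.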
Third, the stability of the tools we use for data collection and analysis is of the utmost importance. While this is less of a problem for the latter of these activities, where open-source software such as Gawk (Bruns,
As a result of this lack of stability, a number of research teams have taken it upon themselves to build their own tools for data collection. While such tools are often impressive and attuned to the needs of the specific team, the use of these types of «homebrew» software could lead to what is often referred to as a «silo problem» down the line. If each research team makes use of its own individually constructed data collection tools, ensuring comparability between research results could become a challenge. Although scholars have different needs with regard to the type of data they work with, there is a need for a point of comparison between teams. While total homogeneity is hardly an ideal, a complete lack of comparative possibilities is clearly problematic.
Finally, the competencies of social scientists and humanities scholars for dealing with these sometimes novel issues of data collection and analysis need to be assessed. Indeed, the need for interdisciplinary efforts is perhaps more pressing than ever before (e.g. Lazer, et al.,
The move to an online environment for social science and humanities research has indeed been fruitful for both branches. While the above considerations need to be taken into account when planning and executing a «Big Data» research project, they do not amount to a complete list. For example, the «siren-song of abundant data» (Karpf,
The issue of stability, mentioned above, is relevant to our understanding of the rather short history of social media platforms. Indeed, while Twitter and Facebook are currently among the more popular social media, this status is destined to come to an end when some novel service is launched and makes its claim for the online audience. With this in mind, researchers need to make sure that their instruments for inquiry – the way questions are posed or coding sheets are constructed – are «stress tested» and stable for future online platforms as well. This is almost certainly easier said than done. As rapid online developments take place, suitably aligned research instruments will enhance the quality not only of our present scholarly inquiries, but also of those to come in the future. Being prepared for these developments might help us in securing longitudinal insights regarding the uses of social media.
This chapter has outlined some of the considerations and challenges faced by researchers studying social media. While my specific starting point has been experience gained from my own research into online political communication, it is my hope that the topics dealt with here also resonate with those interested in other areas. Finally, it must be mentioned that what has been presented here should not be considered an exhaustive list of issues to be dealt with – ideally, this piece will also serve as a conversation starter for moving on to those further issues.
Few concepts have made as many headlines in the past few years as the term «Big Data». From its beginnings in technology circles, the term has rapidly been catapulted into mainstream society. In recent years it has even become a household notion in the higher echelons of government: the Obama administration has launched its multi-million dollar «Big Data Initiative» (Office of Science and Technology Policy
With a promise to fundamentally «transform the way we live, work and think» through extensive «datafication» of all things human (Mayer-Schönberger, Cukier
To be sure, there is no shortage of definitions. The «Big» alludes to unfathomable troves of digital data, in various shapes and forms, which we deliberately or passively generate in our daily interactions with technology. Then there is our enhanced ability to store, manage and extract insight from these data troves using powerful computing technology and the latest in advanced analytical techniques. But «Big Data» does not refer to a fixed quantitative threshold or clear-cut technological constraint. Indeed what is considered «big», «complex» and «advanced» varies widely. So much so that researchers have found it necessary to collate various definitions of the term «Big Data» and furnish the following meta-definition: Big Data is a term describing the storage and analysis of large and complex datasets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning. (Ward, Barker
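To give a flavour of one technique named in that meta-definition, the MapReduce pattern splits an analysis into a map step, which emits key–value pairs, and a reduce step, which aggregates them per key. The word count below is the canonical textbook illustration, written as plain functions rather than in any particular framework (a real deployment would distribute both steps across many machines):

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data big promises", "big questions"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(pairs))   # {'big': 3, 'data': 1, 'promises': 1, 'questions': 1}
```

The appeal for «large and complex datasets» is that both phases parallelise naturally: documents can be mapped independently, and pairs for different keys can be reduced independently.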
Perhaps it is only natural that early attempts to capture and define an elusive concept will come in many guises and possibly fall along a «moving technological axis». But while we struggle to pin down this new technology, it is important to recognise that Big Data's entry into the mainstream is equally about cultural changes in how we think about data, their capture and analysis, and their rightful place in the fabric of society.
Whether hype or substance, and however most appropriately defined, the Big Data discourse is taking place against some profound (and I would argue exciting) changes in how we interact with our physical and social surroundings. These interactions invariably involve technologies and result in digital traces, manifested in ways as various as Internet clickstreams, location data from cell phones interacting with phone towers, data streams from credit card transactions, the logging of purchasing patterns in shops, the rich and multifaceted sensor data from an Airbus A380 in flight or the vast detectors at research labs like CERN. Moreover, the emergent proliferation of low-cost sensors allows us to track and monitor objects and mechanisms in ways that were previously impossible. Farmers employ moisture sensors to monitor moisture levels in fields, shipments of fish and fruit are monitored for temperature and location in real time as they are moved between continents, and people log personal health indicators using their smartphones.
Not only do we spend more time «online», but as we continue to add more «things» to the Internet, our lives become increasingly more entwined with the virtual world. And the digital traces we constantly leave behind in the virtual world now give us new handles on complex problems in the physical world.
The sceptic might demur that Big Data still has some way to go to deliver on its promise; targeted advertising and tailored movie recommendations may not appear to be the stuff of «revolutions». But fascinating applications are beginning to emerge: mobile phone data is being leveraged to map and track the spread of disease (Talbot
The nascent debate around Big Data may appear to be quite polarised. As Sandra González-Bailón remarks, the discussion on the proper governance and use of all these novel data sources has bifurcated public opinion into two camps: … the sceptics, who question the legitimate use of that data on the basis of privacy and other ethical concerns; and the enthusiasts, who focus on the transformational impact of having more information than ever before. (González
Both camps have extremists that will either dismiss the Big Data phenomenon as overhyped and underwhelming, or espouse the view that we are witnessing a new era in which the proliferation of data will render theory and interpretation superfluous (Anderson
While a healthy dose of both enthusiasm and scepticism is essential when dealing with new technologies, there are valuable lessons to be learned from the moderates on either side. Firstly, theory and interpretation are not likely to be discarded any time soon. Instead, their importance is reinforced as a sense-making tool in a growing sea of noisy data. Secondly, we would be wise to tread carefully, lest the critics be vindicated and we end up sleepwalking into a surveillance society.
As the debate and rhetoric advances and matures, it becomes important to capture the full range of nuanced challenges associated with the Big Data paradigm.
An interesting turn in this direction is provided by Richards and King in their paper «Three Paradoxes of Big Data» (Richards and King
However, a fair portion of our personal data exhaust – small data inputs from sensors, cell phones, clickstreams and the like – are generally amassed into aggregated datasets «behind the scenes», largely without our knowledge. These datasets may in turn be saved in unknown and remote cloud services, where equally hidden algorithms mine the data for strategic insights. The paradox of this, they argue, is that if Big Data promises to make the world more transparent, then why is it that its «collection is invisible, and its tools and techniques are opaque, shrouded by layers of physical, legal, and technical privacy by design?» (Richards and King). While the authors acknowledge the need for trade secrets and the like, data collected from and used to make decisions about and on behalf of individuals merit the development of proper technical, commercial, ethical and legal safeguards. «We cannot have a system, or even the appearance of a system, where surveillance is secret, or where decisions are made about individuals by a Kafkaesque system of opaque and unreviewable decision-makers» (Richards and King
The paradoxes framed around transparency, identity and power touch on more than one raw nerve in the current discourse on the ethical and societal implications of Big Data. A closer look at the various elements along the «Big Data chain» – namely data collection and storage, the application of analytical tools and finally action on the basis of insights mined – also reveals a host of potential shortcomings in current protective measures, as well as new challenges and problems.
To date, most of the ethical concerns that have been raised relate to privacy challenges in the first link in the chain, namely that of the collection and storage of data. Many of these problems are not entirely new, but traditional mechanisms for ensuring privacy protection have come under increasing pressure with the advent of Big Data. Let us consider two cases: The system of «notice and consent», whereby individuals are given the choice to opt out of sharing their personal data with third parties, has become a favoured mechanism of data protection. In practice, however, the online user is frequently met with lengthy privacy notices written in obscure legal language, where ultimately she is presented with a binary choice to either accept the complex set of terms or forsake the service in its entirety. The fatigue and apathy this generates are less than satisfactory, and the mechanism fails to give the individual meaningful ownership of her data. The problem is further exacerbated in the Big Data era because it places the onus of evaluating the consequences of data sharing on the individual generating the data. These evaluations can be both technical and complex, and individuals will necessarily be on unequal footing in terms of their ability to make informed choices. Therefore, researchers seeking to leverage e.g. social media data to study social systems cannot assume that they have tacit approval from users of these services – even if the consent agreement provides no legal impediments for such use. The key challenge lies in devising technical and regulatory frameworks that provide the user with tight and meaningful controls on personal data without compromising the practical utility of those same data.
Beyond the mere impracticability of the researcher having to actively seek consent from large swathes of people in all cases, some will argue that giving the data owner an absolute say in whether and how her data are used risks interfering with the innovation potential of data use (Cate and Schönberger).

Another favoured privacy-protecting measure is to anonymise datasets by stripping them of personally identifiable information before they are made available for analysis. While such techniques might be privacy-preserving when the dataset is treated in isolation, anonymised datasets have sometimes proven easy to de-anonymise when combined with other sources of information.
As part of a contest to improve its movie recommendation service, the online movie streaming service Netflix released an anonymised dataset containing the rental and rating history of almost half a million customers. By running the anonymised dataset against ratings on the online service «Internet Movie Database», researchers were not only able to identify individuals in Netflix's records, but also to uncover the political preferences of those people (Narayanan and Shmatikov).
There are similar examples of how an apparently anonymised dataset, when properly contextualised, is no longer truly anonymous. And the Big Data paradigm makes it increasingly difficult to secure anonymity, because ever more data streams are generated, stored and made available for advanced data mining techniques (Navetta).
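The mechanics of such a linkage attack can be illustrated with a minimal sketch. All data, names and the `reidentify` function below are invented for illustration; real attacks, such as the Netflix case, are statistical and tolerate noisy matches, whereas this toy requires exact matches on the quasi-identifiers (title, rating, date):

```python
# An «anonymised» ratings table is re-identified by matching quasi-identifiers
# against a public source where users appear under their real names.
# All records here are fabricated for illustration.

anonymised = [  # (pseudonym, title, rating, date) — names have been stripped
    ("u17", "Movie A", 5, "2006-03-01"),
    ("u17", "Movie B", 2, "2006-03-04"),
    ("u42", "Movie A", 1, "2006-03-02"),
]

public = [  # (real_name, title, rating, date) — e.g. public reviews elsewhere
    ("Alice", "Movie A", 5, "2006-03-01"),
    ("Alice", "Movie B", 2, "2006-03-04"),
]

def reidentify(anonymised, public):
    """Map pseudonyms to real names when (title, rating, date) records match.
    Even a handful of matching records can single out an individual."""
    candidates = {}
    for name, title, rating, date in public:
        for pseudonym, t, r, d in anonymised:
            if (title, rating, date) == (t, r, d):
                candidates.setdefault(pseudonym, set()).add(name)
    # Keep only unambiguous matches: exactly one real name per pseudonym.
    return {p: min(names) for p, names in candidates.items() if len(names) == 1}

print(reidentify(anonymised, public))  # → {'u17': 'Alice'}
```

The point of the sketch is that neither table is sensitive in isolation; the privacy breach arises only from the combination, which is why governing single datasets falls short.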
The re-identification problem not only highlights the shortcomings of established protective measures, but also shows that focusing exclusively on the proper governance of datasets and their attributes will often fall short of capturing the nuanced ethical challenges associated with data analysis. In order to take the bull by the horns and provide the individual with meaningful control over personal information, it is necessary to govern not just the datasets themselves, but how data are used throughout the entire chain.
For while the current Big Data scene may appear to be dominated by a handful of major players, such as Google, Facebook and Amazon, its ecosystem is in fact highly distributed, with a host of third-party actors operating behind the scenes that «often piggyback on the infrastructure built by the giants» (Sandberg).
Moreover, the technical opacity of the algorithms underpinning Big Data analysis, as well as the real-time nature of such analyses, does not easily lend itself to meaningful scrutiny by way of traditional transparency and oversight mechanisms. In a world where «… highly detailed research datasets are expected to be shared and re-used, linked and analysed, for knowledge that may or may not benefit the subjects, and all manner of information exploited for commercial gain, seemingly without limit» (Dwork), it can be hard to gauge the risks to which any given individual is exposed.
Technology is likely to be at least one part of the solution. Novel approaches such as «differential privacy» leverage mathematics to ensure consistently high standards of privacy protection in statistical operations on datasets involving sensitive data. Differentially private algorithms satisfy mathematical conditions that allow the privacy risk involved in an operation on a dataset to be duly quantified. Once a threshold is passed, the algorithm intentionally blurs the output so that individuals whose data are being analysed are ensured «plausible deniability». In other words, their presence or absence in the datasets in question has such a marginal impact on the aggregate result that there is no way of telling whether or not they were part of the dataset in the first place. Researchers can still draw value from the dataset, because the «blurred» output differs only marginally from the true output and the uncertainty, or «degree of blurring», is well known. Differentially private algorithms can also keep track of and appropriately quantify the cumulative privacy risk an individual sustains through repeated or multiple queries, by iteratively adding more noise to mask any personal information residing in the data (Klarreich 2012).
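The core «blurring» idea can be sketched in a few lines of Python. This is an illustrative toy, not the systems Klarreich describes: the function name `dp_count`, the dataset and the epsilon value are all invented for the example. For a counting query (sensitivity 1), adding Laplace noise with scale 1/epsilon yields a differentially private answer; here the Laplace draw is generated as the difference of two exponential draws:

```python
import random

def dp_count(records, predicate, epsilon):
    """Return a differentially private count: the true count plus Laplace
    noise of scale 1/epsilon. A counting query has sensitivity 1 (adding or
    removing one person changes the count by at most 1), so this satisfies
    epsilon-differential privacy. Smaller epsilon => more noise, more privacy."""
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Invented example: how many people in a sensitive dataset are 40 or older?
ages = [23, 35, 41, 29, 52, 47, 38]
noisy_answer = dp_count(ages, lambda a: a >= 40, epsilon=0.5)
```

The true count is 3, but any single run returns a perturbed value, so no individual's presence can be confirmed from the output; the analyst still knows the noise distribution and can reason about the answer's accuracy. Under composition, the epsilons of repeated queries add up, which is the «cumulative privacy risk» bookkeeping mentioned above.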
Privacy and personal data protection are often touted as the central ethical challenges presented by Big Data. Technology may certainly help mitigate some of these risks; other challenges, however, will require strong governance and legal protections. Furthermore, while we attend to the very pressing privacy concerns raised by Big Data, we should not lose sight of the many issues that fall outside the traditional privacy debate.
With recent technological advances, the cost of collecting, storing and analysing various kinds of data snippets has decreased dramatically. Novel methods also allow us to interlink and make sense of various kinds of data long after they have been collected. As Alistair Croll remarks in an interesting blog post: «In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. […] With the new, data-is-abundant model, we collect first and ask questions later» (Croll).
Many remarkable successes of the Big Data paradigm, such as detecting disease outbreaks or predicting traffic jams, come from utilising data in ways that differ greatly from the original purpose of collection, or from the original context and meaning we bestowed on the data. However, Croll argues, this is a slippery slope fraught with ethical problems that go well beyond the regular privacy debate. Instead, they concern the inferences we are allowed to make and how we act on or apply the resulting insight. As the technical and financial barriers to what we can collect and do with data begin to crumble, the regulatory challenges to the proper use of data and analytics are likely to intensify (Soltani).
As an example, Croll points to a study performed by the popular online dating service OkCupid, in which the profile essays of some half a million users were mined for words that made each racial group in its member database statistically distinguishable from the others. According to the OkCupid blog post, «black people are 20 times more likely than everyone else to mention soul food, whereas no foods are distinct for white people» (Rudder).
While such inferences may be partially construed as privacy issues – and legislative regulation can assist in preventing obvious transgressions – there are arguably deeper issues at play.
Inferences like the above are typically used to personalise and tailor ads, information and online experiences to individuals. Such tailoring relies on classification – the algorithmic grouping of data points (people) – and, as Dwork and Mulligan point out, such algorithms are a «messy mix of technical and human curating» and are «neither neutral nor objective», but always geared towards a specific purpose in a given context (Dwork and Mulligan).
Some cities across the U.S., notably Philadelphia, use statistical profiling techniques to determine the risk of criminal recidivism among parolees. The method relies on classifying offenders into groups for which certain statistical probabilities can be computed. While critics dismiss such methods as ethically questionable at best (should a cold calculus of past offences punish you for crimes you have not yet committed?), proponents argue that the method does nothing a parole board would not do, except with greater accuracy, without discriminatory urges, and with more consistency and transparency.
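The logic of such group-based profiling, and the ethical tension it creates, can be made concrete with a small sketch. The data, group labels and function names below are all invented; real systems use far richer models, but the contested step is the same:

```python
# Toy sketch of statistical profiling: an individual's «risk score» is simply
# the historical recidivism rate of the group they are classified into.
# All numbers below are fabricated for illustration.

history = [
    # ((prior_offences_bucket, age_bucket), reoffended)
    (("0-1", "under_30"), True),  (("0-1", "under_30"), False),
    (("0-1", "30_plus"), False),  (("0-1", "30_plus"), False),
    (("2_plus", "under_30"), True), (("2_plus", "under_30"), True),
    (("2_plus", "30_plus"), True),  (("2_plus", "30_plus"), False),
]

def group_rates(history):
    """Compute the observed reoffending rate for each group."""
    counts = {}
    for group, reoffended in history:
        n, k = counts.get(group, (0, 0))
        counts[group] = (n + 1, k + (1 if reoffended else 0))
    return {group: k / n for group, (n, k) in counts.items()}

def risk_score(rates, group):
    # The ethically contested step: the past behaviour of *others* in the
    # same group determines this individual's score.
    return rates.get(group)

rates = group_rates(history)
print(risk_score(rates, ("2_plus", "under_30")))  # → 1.0
```

The sketch makes the critics' worry visible: a new parolee in the high-risk group is scored 1.0 before they have done anything, purely on the strength of aggregate statistics about people who resemble them.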
The case highlights the challenging problems that we are likely to face as Big Data moves out of its nascent stage of tailored ads to affect a wider range of human activity. It also shows how easy it is to fall prey to the temptation of framing the problem as one of man versus machine. Such an approach is likely to be counterproductive. The challenge of managing «ethical risk» in a Big Data world is one that is jointly technological and sociological, and as Dwork and Mulligan succinctly put it: «[…] Big Data debates are ultimately about values first, and about math and machines only second» (Dwork and Mulligan).
Like other technologies in the past, as Big Data unfolds and is absorbed into society we are likely to see adjustments and changes in our current notions of privacy, civil liberties and moral guidelines. And as the hype eventually wears off and a proper equilibrium between the role of human intuition and data-driven insight is established, we will need to develop tools, guidelines and legislation to govern this new world of data and data analysis. This process will require a wide perspective, coupled with constant and close scrutiny of all links along the Big Data chain. It is an arduous task, but also one that should be exciting for all involved. Future societies might be shaped by technological advances, but technology itself is moulded by human choices. And these choices are available for us to make now.
Personal communication.
The description is based on Zimmer (
Quoted in Zimmer (
Zimmer,
There are open profiles on Facebook, e.g. open groups or open political profiles which, in my view, should not require consent in order to be used in research.
Hoser and Nitschke, 2009, pp. 185–186, my emphasis.
Moe and Larsson (
Moe and Larsson, 2011, p. 122.
NESH, 2003, point 4.
NESH, 2003, point 6.
Hallvarson and Lilliengren, 2003, p. 130.
Hudson and Bruckman, 2004, p. 135.
Independent ethical committees that oversee human subject research at each institution.
Quoted in Hudson and Bruckman, 2004, p. 137.
McKee and Porter, 2009, p. 109.
OECD Global Science Forum (
European Commission – IP/12/46 25/01/2012 – Press release: «Commission proposes a comprehensive reform of data protection rules to increase users’ control of their data and to cut costs for businesses»
Act of 14 April 2000 No. 31 relating to the processing of personal data, section one.
The National Committee for Research Ethics in the Social Sciences and the Humanities (NESH) (
Act of 14 April 2000 No. 31 relating to the processing of personal data, section eleven, third paragraph.
Prop. 47 L (2011–2012) Proposisjon til Stortinget (forslag til lovvedtak) Endringer i personopplysningsloven, Chapter five.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section eleven, first paragraph, litra c.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section eleven, second paragraph.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section eight, first paragraph and section nine, litra a.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section nine, litra h.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section two, number seven.
Act of 14 April 2000 No. 31 relating to the processing of personal data, section 20, second paragraph, litra b.
The distinction between «high modern» and «late modern» is taken from Anthony Giddens (
Annette Markham, personal communication. I would also like to express my deep gratitude to Annette Markham and Elizabeth Buchanan for their expert help and invaluable references on this point.
(Accessed 14 March 2014). I am very grateful to Rich Ling (Telenor / IT-University, Copenhagen) for first calling my attention to this app.
Christine Von Seelen Schou (Telenor & University of Copenhagen), personal communication, 20.12.2012.
Anonymous researcher, personal communication, 20.06.11.
This contribution is in part built on Staksrud (
In this paper, the distinction «quantitative – qualitative» is used literally, not derivatively.
For a more detailed account of such research models and their implications see Hogan (
There are honourable exceptions to be noted of quantitative surveys in the field of children, Internet and online safety that have had children as informants. Most such surveys have taken place during the past decade, and many have been funded by the European Commission or by national governments (see Staksrud (
For an overview of other various and typical sociological dichotomies and discussions thereof see Jenks (
Most of the documentation of this approach can be found in the field of medicine. Using children (especially those institutionalized, such as in orphanages, with little protection from authorities) for medical experiments (of which exposure to viruses and the testing of vaccines would be a typical example) has been quite frequent, as the children often were «cheaper than calves» (a Swedish physician of the late 1800s, quoted in Lederer & Grodin,
For a critical discussion on the various positions researchers can take in relation to children as informants see for instance Lahman (
This example, including the quotes, is taken from Livingstone, Kirwil, Ponte, & Staksrud (
A standard coding scheme was used. Children's responses, written in 21 languages, were then coded by native speakers. One in three (38 %) identified one or more online risks that they think bothers people their age on the Internet (N = 9,636 children: 5,033 girls and 4,603 boys). Response rates ranged from 73 % of children in Denmark to 4 % of those in Spain (with below 30 % also in Austria, Slovenia, Hungary, Bulgaria and Czech Republic). This variation may be due to genuine differences in children's level of concern, or it may have resulted from differences in fieldwork methodology. Of the 9,636 children who identified risks, 54 % identified one risk, 31 % identified two risks, and 15 % identified three or more risks. Up to three risks per child were coded, when applicable.
Within the context of the principles, «social networking services» are defined as services that combine the following features (Arto et al.,
Please refer to the original report (Staksrud & Lobe,
Self-declaration reports were submitted by the social networks between April 10th and June 17th 2009. All these reports are public and can be downloaded from the European Commission's website:
The author recognizes that the idea of «becoming» the child by setting up a social networking profile with a different persona carries with it problems and theoretical implications of its own.
The test and methodology were developed by the two lead experts, Dr. Bojana Lobe, University of Ljubljana, Slovenia, and Dr. Elisabeth Staksrud, University of Oslo, Norway. In the draft stage, the testing questionnaire was submitted to the Social Networking Task Force for comments and suggestions. The European Commission approved the final version. The test has since been used in subsequent testing initiated by the European Commission.
On a few services it was not possible to send a general request asking for help, but rather requests pre-defined by the SNS. In these cases the tester was asked to send the pre-defined report that resembled the original message the most.
Figure adapted from Staksrud & Lobe (
See
API stands for «application programming interface», software that can be used to collect structured data, for example from Twitter (boyd & Crawford, 2011: 7).
Cf.