Challenges associated with data collection practices
However, current data collection practices have raised a number of social, political, and regulatory challenges related to privacy, consent, security, and bias. In recent years, there has been increasing media coverage, public debates, advocacy, and political discussions pointing towards the lack of policies surrounding the collection and use of an individual’s data, creating gaps in oversight and citizen protections.
At the root of this discussion is the issue of consent. While regulation requiring consent exists, there is debate surrounding the impact of these laws. The Government of Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) requires organizations to obtain consent from consumers in order to collect their data, and to disclose how they will use, share, and manage it. However, it is questionable whether individuals are able to give informed consent. Individuals are only able to give informed consent when they have a clear understanding of how organizations will be collecting, using, and sharing their data.
Beyond the issue of consent, privacy is a major concern for many consumers, as individuals are becoming more aware that information such as their location data, search history, or genetic information is being collected by application providers and other organizations. Privacy policies generate the expectation that organizations will take measures to ensure the privacy of individuals and their data. For this reason, many organizations anonymize the data they collect from individuals. However, in July 2019, a group of researchers found that 99.98 percent of Americans could be correctly re-identified in any anonymized dataset by using just 15 demographic attributes. Privacy challenges like these are likely to continue arising as technology advances and the amount of data shared by individuals increases.
The issue of privacy is closely tied to data security. Once individuals provide their consent, it is typically expected that the organization collecting their data will use and manage it securely, so as to prevent individuals’ information from being used by other organizations, or for purposes for which it was not intended. However, many high-profile data security breaches have raised concerns about how well organizations meet this expectation. One of the most prominent in recent years was the Cambridge Analytica scandal, in which the personal data of approximately 87 million Facebook users was acquired without their knowledge or permission, and used to inform election campaign strategies in countries such as the United States, Kenya, and India.
Another major challenge associated with data collection practices is bias. Pre-existing biases related to race, ethnicity, religion, gender, sexual orientation, age, or disability could be, consciously or unconsciously, “baked” into a dataset by virtue of the person who collected it, the processes through which the data has been gathered, and how it is used. Datasets could also contain bias by virtue of the representativeness of populations, that is, who has and has not been represented in the data. This does not necessarily come about due to pre-existing biases, but could be the result of individuals lacking access to the technologies used to collect this data or not consenting to data collection. In many cases, data is repurposed to train the underlying algorithms within AI systems, a purpose for which the data was not originally collected. Some notable examples of repurposed data include historical police data used to train predictive policing algorithms and Flickr photos used to train facial recognition algorithms. In a number of cases, biased data used to train AI-driven hiring tools and facial recognition algorithms has produced largely unintended social consequences. The use of biased data can have far-reaching impacts, such as affecting individuals’ access to services and opportunities.
The first article in this series will explore data collection practices that occur at home: the place where most individuals start and end their days.
Technology and policy related to this topic are constantly evolving. If you think we have missed something or see an error, please contact Sarah Villeneuve (email@example.com). If you want to get involved in subsequent phases of this project, apply here.