How To Stalk Everyone
by Max von Beust
This is part 5 of my master thesis on Monetization Strategies and Business Models Behind Consumer Data.
3. Consumers as Data Models
3.3 Collecting Consumer Data
The previous chapter discussed the implications and forms of data collection to distinguish between a customer profile and a customer as a data model. This chapter turns to the technical aspects of data collection, still differentiating between online and environmental data collection.
3.3.1 Online Collection
Online data collection is almost ubiquitous. Every website with some kind of analytics system, advertising, a Facebook Like button or another social media plugin tracks the user, and often more than one party is doing the tracking (Schneier, 2015). As The Economist (2014) already reported, this can lead to a situation where a user is monitored by more than 1,300 firms on the 100 most relevant web pages (a short experiment showed that visiting the top 50 websites in 2018 led to at least 594 third-party connections). There are three main technologies that companies (or services) use to track a user successfully and continuously: cookies, device fingerprinting and unique device identifiers (IDs) (Federal Trade Commission, 2016). These three tracking technologies will now be discussed briefly.
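To make the notion of a "third-party connection" concrete, the following is a minimal sketch of how such connections could be counted from a list of requests a browser makes while loading a page. It is not the method used in the experiment mentioned above. The function names, the example URLs and the crude "last two labels" rule for deciding which hosts belong to the same site are all assumptions made for illustration.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def site_of(host: str) -> str:
    """Crude guess at the site a host belongs to: its last two labels.

    Assumption: fine for names like news.example.com, wrong for
    suffixes such as co.uk (a real tool would use a public suffix list).
    """
    return ".".join(host.lower().split(".")[-2:])


def third_party_hosts(page_url: str, request_urls: Iterable[str]) -> Set[str]:
    """Return the hosts contacted that do not belong to the visited site."""
    first_party = site_of(urlsplit(page_url).hostname or "")
    hosts = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host and site_of(host) != first_party:
            hosts.add(host)
    return hosts


# Hypothetical request log for one page load (not real measurement data).
requests_seen = [
    "https://news.example.com/index.html",
    "https://static.example.com/site.css",
    "https://tracker.example.net/pixel.gif",
    "https://ads.example.org/banner.js",
    "https://tracker.example.net/collect?id=1",
]

found = third_party_hosts("https://news.example.com/", requests_seen)
print(sorted(found))
```

Repeating this over a list of popular sites and merging the resulting sets would give a rough count of distinct third parties, under the simplifications stated above.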
Most User-Specific: Cookies
Cookies are the most prominent tracking technology: a website saves a small piece of information on the local device when the device connects to it (Krishnamurthy & Wills, 2006, p. 68). Cookies can be useful for users because they store information such as logins, game scores or shopping carts. At the same time, they enable website providers and third parties to save online usage information about the user even after the user leaves the specific website - this also works if a user has ad-blockers activated (Krishnamurthy & Wills, 2006, pp. 69–70).
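The cross-site effect of a third-party cookie can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a description of any real tracker: the class `ToyTracker`, the cookie name `uid` and the example page addresses are hypothetical. Only `http.cookies.SimpleCookie` and `uuid` from the Python standard library are used.

```python
import uuid
from http.cookies import SimpleCookie


class ToyTracker:
    """Hypothetical third-party server whose content is embedded on many sites."""

    def __init__(self):
        self.visits = {}  # cookie ID -> list of pages that embedded the tracker

    def handle(self, cookie_header, referer):
        """Process one request; return (user ID, Set-Cookie value or None)."""
        jar = SimpleCookie()
        jar.load(cookie_header or "")
        if "uid" in jar:
            # Returning browser: the cookie it sends back identifies it.
            uid = jar["uid"].value
            set_cookie = None
        else:
            # First contact: issue a random ID and ask the browser to store it.
            uid = uuid.uuid4().hex
            out = SimpleCookie()
            out["uid"] = uid
            out["uid"]["path"] = "/"
            out["uid"]["max-age"] = 31536000  # one year, an arbitrary choice
            set_cookie = out["uid"].OutputString()
        self.visits.setdefault(uid, []).append(referer)
        return uid, set_cookie


tracker = ToyTracker()

# The tracker is embedded on a shop page; the browser has no cookie yet.
uid, set_cookie = tracker.handle(None, "https://shop.example/cart")

# Later the same browser loads a news page embedding the same tracker
# and sends the stored cookie back.
uid_again, _ = tracker.handle("uid=" + uid, "https://news.example/article")

print(tracker.visits[uid])
```

Because the browser returns the same `uid` value regardless of which page embeds the tracker, the tracker ends up with one browsing history that spans both sites.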
Least User-Specific: Unique Device IDs
On mobile devices such as smartphones, a mass adoption of single products is visible (Gartner, 2017). In addition, apps are used and browsers are more uniform, making fingerprinting ineffective and tracking more complex (Eckersley, 2010, p. 8). Nonetheless, tracking within web browsers still works through cookies, and the Android operating system (OS) offers software development kit (SDK) access to unique (resettable) advertising IDs and (stable) device IDs, which makes tracking consistent and reliable (Google, 2016). iOS provides a similar unique advertising ID, although it is seen as less stable. Developers can instead fall back on fingerprinting the device across web and apps through several other variables that are accessible without permission (e.g. installed apps, device name, music list, time zone, clock skew) (Apple, 2017; Khanna, 2015, pp. 108–111; Kurtz et al., 2016, pp. 6–14).
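How a handful of freely readable variables can stand in for a device ID is sketched below. This is an illustrative assumption, not the procedure described in the cited studies: the function `fingerprint`, the attribute names and the example values are hypothetical, and exact hashing is used where a real system would need fuzzy matching (for instance for clock skew or a changing app list).

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a short, stable identifier from observable device attributes.

    Lists are sorted and keys are serialized in a fixed order, so the same
    set of observations always produces the same identifier.
    """
    normalized = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in attributes.items()
    }
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical observations of one device made by two different apps.
seen_by_app_a = {
    "device_name": "Alex's phone",
    "time_zone": "Europe/Berlin",
    "installed_apps": ["maps", "mail", "chess"],
}
seen_by_app_b = {
    "installed_apps": ["chess", "maps", "mail"],
    "time_zone": "Europe/Berlin",
    "device_name": "Alex's phone",
}

# A different, equally hypothetical device.
other_device = {
    "device_name": "Sam's phone",
    "time_zone": "Europe/Berlin",
    "installed_apps": ["maps", "mail"],
}

print(fingerprint(seen_by_app_a) == fingerprint(seen_by_app_b))
print(fingerprint(seen_by_app_a) == fingerprint(other_device))
```

The two apps never exchange an official ID, yet both compute the same value for the same device, which is what allows observations to be linked without asking for permission.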
The data collected in online environments allows companies to fully understand user behavior and create meaningful profiles of users without their explicit input. As an example of the power of user data, Ortigosa, Carro and Quiroga (2014, p. 70) showed that they were able to compute personality traits of users simply by using quantitative data from users’ Facebook interactions. Companies that can follow every move of users in online environments across platforms and devices have insights into purchasing behavior, needs, desires, abilities, secrets, medical profiles and more by analyzing data that is produced by online user-technology interactions.
3.3.2 Environmental Collection
With the Internet of Things (IoT), environments outside of the online sphere present the next (and current) frontier of data collection, as all technical objects around and within human bodies are evolving into connected computational devices (Schneier, 2015). In this area, we can see four different categories of data collection: smart home, autonomous machines, quantified self and public surveillance.
Sensors in smart home environments can collect information about temperature, humidity, pressure on sofas, beds and chairs, movement, speech, images, water levels and flows, electricity usage, stock keeping and many other variables (Reichherzer et al., 2016, pp. 17–18). Products such as the line-up by smart home company Nest offer all-day live streaming of indoor and outdoor locations, tracking of gases and fumes, image recognition, temperature and presence tracking and many more data points for the company to create a virtual twin of a person’s state and immediate environment (Nest Labs Inc., 2018). Moreover, these services connect with the location-tracking feature on smartphones and are compatible with smart speakers such as Amazon’s Alexa or Google Home. Smart home connections are expected to grow by 18% until 2021 (from 2016) (Cisco, 2017b, p. 11).
The creation and collection of data about consumers does not stop within smart homes. Increasingly, autonomous machines such as cars are collecting data about their passengers and surroundings. This enables companies to build full virtual representations of the public world, people’s movements and their behavior - be it inside or outside of a vehicle (Madrigal, 2017). Here, data collection can be done with lasers, sonars and cameras (which complement technologies like GPS or mobile network connections). These strategies are also applied in other autonomous technology such as delivery robots or drones (Ackerman, 2018; Louw & Silk, Amazon Technologies, Inc., Patent No. US 9714089 B1, 2017).
A method of collecting data about consumers that is already more present in everyday life is the quantified self movement. Individuals use different kinds of internal and external sensors to track physical, psychological, biological, and behavioral information (Swan, 2013, p. 85; The Economist, 2012). The wearable technology company Fitbit promotes features such as continuous heart rate tracking, automatic workout recognition and tracking, sleep (quality) tracking, and step and elevation tracking (Fitbit, 2018a). The data that is collected through a basic, external sensor is then transmitted to Fitbit servers or to central health platforms such as Apple HealthKit or Google Fit. Further developments are pills that can track whether a patient has taken them or even track specific internal biological parameters that can be transmitted for further analysis (Kalantar-Zadeh et al., 2018, pp. 81–84; U.S. Food and Drug Administration, 2017). These developments show that within a few years, the mass market is likely to have access to technologies that enable tracking not only from outside of a body, but also from the inside, potentially making consumers and their actions fully transparent to data collection companies.
The last category of data collection about consumers is public surveillance in on- and offline environments. The National Security Agency's (NSA) Prism program might be the most prominent example of full-scale online data collection about consumers (Greenwald & MacAskill, 2013), but what is of even greater interest in this context is offline data collection through CCTV and artificial-intelligence-based (AI) analysis systems in the background (Vincent, 2018a). Surveillance cameras collect a large amount of data, which is already used for data mining, and have become almost omnipresent in modern cities (Pribadi et al., 2017, p. 22; Wakefield, 2017), as has a large array of other advanced technologies such as phone call tracking, number plate readers, surveillance-enabled light bulbs or through-the-wall audio sensors (Wheeler, 2016).
Undoubtedly, in a world where retailers are able to accurately track customers by watching their shoes (Wakefield, 2017), where police forces identify criminals through facial recognition in public (Vincent, 2018b) and where companies (and governments) are able to identify individuals based on their voice (Kofman, 2018; Pinsky, 2017), consumers are increasingly becoming transparent.
3.4 Consumer Awareness
In light of these developments, it is necessary to take a closer look at how aware consumers are of data collection and of their becoming data models, as this awareness is the precondition for consumers to engage actively in debates surrounding these topics - be it for economic or privacy reasons. In an online, and especially an app-based, environment, consumers are not able to fully gauge the extent or the nature of data that is collected (Grundy, Held & Bero, 2017, p. 7) and have the feeling that they have lost control over what is done with their personal information (Rainie, 2016). This is clearly reflected in the findings of Golbeck and Mauriello (2016), which show that only a quarter of Facebook app users are aware that the application can potentially access private messages and app logs. Awareness was high for the collection of profile information (such as gender, name, and education) and the possibility for Facebook to access spatial information (Golbeck & Mauriello, 2016, p. 14). A study by Morey, Forbath and Schoop (2015, p. 101) demonstrates that consumers are largely unaware of the data that is collected about them: only a quarter of all participants were aware of friend list and location recording, and only 14% were aware of web history collection.
In 2016, at least 38% of consumers believed that their online activities were not monitored at all, although 49% of consumers were aware of the possibility of companies selling their data (Centre for International Governance Innovation & IPSOS, 2016). Even though these numbers are low, awareness has been increasing over the past years (Hampson, 2016). Nonetheless, consumer awareness of environmental data collection and its implications remains fairly low (Weaver, 2015; Webster, 2009, p. 22), which might enable businesses to obtain greater benefit from consumers as data models. This asymmetry in information, awareness and, thus, power raises concerns which will be discussed in chapter 4.6.5.