How To Stalk Everyone
by Max von Beust
This is part 5 of my master thesis on Monetization Strategies and Business Models Behind Consumer Data.
3. Consumers as Data Models
3.3 Collecting Consumer Data
The previous chapter discussed the implications and forms of data collection to distinguish between a customer profile and a customer as data-model. The following chapter will dive into the technical aspects of data collection specifically, still differentiating between online and environmental data collection.
3.3.1 Online Collection
Online data collection is almost ubiquitous. Every website with some kind of analytics system, advertising, a Facebook Like button or another social media plugin tracks the user, and often several parties do so at once (Schneier, 2015). As a result, a user can be monitored by more than 1,300 firms across the 100 most relevant web pages, as The Economist reported as early as 2014 (The Economist, 2014); a short experiment showed that visiting the top 50 websites in 2018 led to at least 594 third-party connections. Companies (or services) rely on three main technologies to track a user successfully and continuously: cookies, device fingerprinting and unique device identifiers (IDs) (Federal Trade Commission, 2016). These three tracking technologies are discussed briefly below.
Most User-Specific: Cookies
Cookies are the most prominent tracking technology: a website saves a small piece of information on the local device when the device connects to it (Krishnamurthy & Wills, 2006, p. 68). Cookies can be useful for users because they store information about them (e.g. login, game scores, shopping cart). At the same time, they enable website providers and third parties to record online usage information about the user even after the user leaves the specific website; this also works if the user has an ad-blocker activated (Krishnamurthy & Wills, 2006, pp. 69–70).
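The cross-site effect described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not a description of any real tracking service: the `Tracker` class, the `simulate_browsing` helper, the cookie name `uid` and the example URLs are all hypothetical. It shows how a third party embedded on several websites could link visits through one cookie ID.

```python
import uuid
from http.cookies import SimpleCookie


class Tracker:
    """Hypothetical third-party server embedded on several websites."""

    def __init__(self):
        self.visits = {}  # cookie ID -> list of pages that embedded the tracker

    def handle_request(self, cookie_header, embedding_page):
        """Log the embedding page and return a Set-Cookie header value."""
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        if "uid" in cookie:
            uid = cookie["uid"].value   # returning browser: reuse its ID
        else:
            uid = uuid.uuid4().hex      # first contact: assign a new ID
        self.visits.setdefault(uid, []).append(embedding_page)

        response = SimpleCookie()
        response["uid"] = uid
        response["uid"]["max-age"] = 31536000  # keep the ID for one year
        return response["uid"].OutputString()


def simulate_browsing(tracker, pages):
    """Simulate one browser that stores and resends the tracker's cookie."""
    jar = ""  # the browser's stored cookie for the tracker's domain
    uid = None
    for page in pages:
        set_cookie = tracker.handle_request(jar, page)
        jar = set_cookie.split(";")[0]   # the browser keeps "uid=<value>"
        uid = jar.split("=", 1)[1]
    return uid
```

Calling `simulate_browsing(Tracker(), [...])` with pages from two different sites leaves a single entry in `visits` that lists both pages, which is the kind of usage record the paragraph refers to.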
Mid User-Specific: Device Fingerprinting
Fingerprinting takes every piece of information (e.g. user agent, screen resolution, cookie state, time zone, plugins, fonts) that a browser transmits to a web server (through HTTP headers, JavaScript, AJAX), concatenates it and uses the result as a unique ID, called a fingerprint, for a single device (Eckersley, 2010, p. 5). Eckersley (2010) describes the technical details and accuracy: even minor changes in the fingerprint can be traced with basic algorithms, which makes it a reliable tool for associating collected data with a single device. It was developed because it makes tracking a user easy, but its drawback is the difficulty of tying all this information to a single user identity. The technique relies on the uniqueness of configurations across devices and is therefore effective only as long as there is no mass adoption of standardized hardware and software in the market.
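To make the concatenate-and-identify idea concrete, here is a minimal Python sketch. It rests on several assumptions: the attribute names are hypothetical, hashing the concatenated string with SHA-256 is an illustrative choice, and the change-tolerant comparison is a simplified heuristic rather than the algorithm Eckersley (2010) describes.

```python
import hashlib

# Hypothetical attribute set; a real script would read many more values.
ATTRIBUTES = (
    "user_agent",
    "screen_resolution",
    "cookies_enabled",
    "time_zone",
    "plugins",
    "fonts",
)


def fingerprint(attrs):
    """Concatenate the attribute values in a fixed order and hash them."""
    joined = "|".join(str(attrs.get(name, "")) for name in ATTRIBUTES)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def probably_same_device(old, new, max_changed=1):
    """Simplified heuristic (an assumption): treat two attribute sets as the
    same device if at most `max_changed` attributes differ."""
    changed = sum(1 for name in ATTRIBUTES if old.get(name) != new.get(name))
    return changed <= max_changed
```

In this sketch, a device that installs one new plugin produces a different hash but still passes `probably_same_device`, which mirrors the point that minor fingerprint changes remain traceable.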
Least User-Specific: Unique Device IDs
On mobile devices such as smartphones, a trend toward mass adoption of a few individual products is visible (Gartner, 2017). In addition, apps are used and browsers are more uniform, which makes fingerprinting ineffective and tracking more complex (Eckersley, 2010, p. 8). Nonetheless, tracking within web browsers still works through cookies, and the Android operating system (OS) offers software development kit (SDK) access to unique (resettable) advertising IDs and (stable) device IDs, which makes tracking consistent and reliable (Google, 2016). iOS provides a similar unique advertising ID, although it is seen as less stable; developers can fall back on fingerprinting the device across web and apps through several other variables that are accessible without permission (e.g. installed apps, device name, music list, time zone, clock skew) (Apple, 2017; Khanna, 2015, pp. 108–111; Kurtz et al., 2016, pp. 6–14).
The data collected in online environments allows companies to fully understand user behavior and create meaningful profiles of users without their explicit input. As an example of the power of user data, Ortigosa, Carro and Quiroga (2014, p. 70) showed that they were able to compute personality traits of users simply by using quantitative data from users’ Facebook interactions. Companies that can follow every move of users in online environments across platforms and devices have insights into purchasing behavior, needs, desires, abilities, secrets, medical profiles and more by analyzing data that is produced by online user-technology interactions.
3.3.2 Environmental Collection
Environments outside of the online sphere present the next (and current) frontier of data collection: with the Internet of Things (IoT), all technical objects around and within human bodies are evolving into connected computational devices (Schneier, 2015). In this area, we can see four different categories of data collection: smart home, autonomous machines, quantified self and public surveillance.
Smart Home
Sensors in smart home environments can collect information about temperature, humidity, pressure on
sofas, beds and chairs, movement, speech, images, water levels and flows, electricity usage, stock keeping
and many other variables (Reichherzer et al., 2016, pp. 17–18). Products such as the line-up by smart
home company Nest offer all-day live streaming of indoor and outdoor locations, tracking of gases and
fumes, image recognition, temperature and presence tracking and many more data points for the company
to create a virtual twin of a person’s state and immediate environment (Nest Labs Inc., 2018). Moreover,
these services connect with the location-tracking feature on smartphones and are compatible with smart
speakers such as Amazon’s Alexa or Google Home. Smart home connections are expected to grow by 18% until
2021 (from 2016) (Cisco, 2017b, p. 11).
Autonomous Machines
The creation and collection of data about consumers does not stop within smart homes. Increasingly,
autonomous machines such as cars are collecting data about their passengers and their surroundings. This
enables companies to build full virtual representations of the public world, people’s movements and their
behavior - be it inside or outside of a vehicle (Madrigal, 2017). Here, data collection can be done with
lasers, sonars and cameras (which complement technologies like GPS or mobile network connections). These
strategies are also applied in other autonomous technology such as delivery robots or drones (Ackerman,
2018; Louw & Silk, Amazon Technologies, Inc., Patent No. US 9714089 B1, 2017).
Quantified Self
A more widespread method of collecting data about consumers is the quantified self movement. Individuals
use different kinds of internal and external sensors to track physical, psychological, biological, and
behavioral information (Swan, 2013, p. 85; The Economist, 2012). The wearable technology company Fitbit
promotes features such as continuous heart rate tracking, automatic workout recognition and tracking,
sleep (quality) tracking, step and elevation tracking (Fitbit, 2018a). The data that is collected through
a basic, external sensor is then transmitted to Fitbit servers or to central health platforms such as
Apple HealthKit or Google Fit. Further developments are pills that can track whether a patient has taken
them or even track specific internal biological parameters that can be transmitted for further analysis
(Kalantar-Zadeh et al., 2018, pp. 81–84; U.S. Food and Drug Administration, 2017). These developments
show that within a few years, the mass market is likely to have access to technologies that enable
tracking not only from outside of a body but also from the inside, potentially making consumers and their
actions fully transparent to data collection companies.
Public Surveillance
The last part of data collection about consumers is public surveillance in on- and offline environments.
The National Security Agency's (NSA) Prism program might be the most prominent example of full-scale
online data collection about consumers (Greenwald & MacAskill, 2013), but what is of even greater interest
in this context is offline data collection through CCTV and artificial-intelligence-based (AI) analysis
systems in the background (Vincent, 2018a). Surveillance cameras collect a large amount of data, which
is used for data mining already, and have become almost omnipresent in modern cities (Pribadi et al.,
2017, p. 22; Wakefield, 2017), as have a large array of other advanced technologies such as phone call
tracking, number plate readers, surveillance-enabled light-bulbs or through-the-wall audio sensors (Wheeler,
2016).
Undoubtedly, in a world where retailers are able to accurately track customers by watching their shoes
(Wakefield, 2017), where police forces recognize criminals based on facial recognition in public (Vincent,
2018b) and companies (and governments) are able to identify individuals based on their voice (Kofman,
2018; Pinsky, 2017), consumers are increasingly becoming transparent.
3.4 Consumer Awareness
In light of these developments, it is necessary to take a closer look at how aware consumers are of data collection and of their becoming data models, as this awareness is the precondition for consumers to engage actively in debates surrounding these topics, be it for economic or privacy reasons. In an online environment, especially an app-based one, consumers are not able to fully gauge the extent or the nature of the data that is collected (Grundy, Held & Bero, 2017, p. 7) and feel that they have lost control over what is done with their personal information (Rainie, 2016). This is clearly reflected in the findings of Golbeck and Mauriello (2016), which show that only a quarter of Facebook app users are aware that the application can potentially access private messages and app logs. High awareness was shown for the collection of profile information (such as gender, name, and education) and the possibility for Facebook to access spatial information (Golbeck & Mauriello, 2016, p. 14). A study by Morey, Forbath and Schoop (2015, p. 101) demonstrates that consumers are largely unaware of the data that is collected about them: only a quarter of all participants were aware of friend list and location recording, and only 14% were aware of web history collection.
In 2016, at least 38% of consumers believed that their online activities were not monitored at all, although 49% of consumers were aware of the possibility of companies selling their data (Centre for International Governance Innovation & IPSOS, 2016). Even though these numbers are low, awareness has been increasing over the past years (Hampson, 2016). Nonetheless, consumers' awareness of environmental data collection and its implications remains fairly low (Weaver, 2015; Webster, 2009, p. 22), which might enable businesses to obtain greater benefit from consumers as data models. This asymmetry in information, awareness and, thus, power raises concerns which will be discussed in chapter 4.6.5.