The Data Economy

How To Stalk Everyone

This is part 5 of my master's thesis on Monetization Strategies and Business Models Behind Consumer Data.

3. Consumers as Data Models

3.3 Collecting Consumer Data

The previous chapter discussed the implications and forms of data collection to distinguish between a customer profile and a customer as a data model. This chapter dives into the technical aspects of data collection, again differentiating between online and environmental data collection.

3.3.1 Online Collection

Online data collection is almost ubiquitous. Every website with some kind of analytics system, advertising, a Facebook Like button or another social media plugin tracks the user, and often more than one party is tracking the user at once (Schneier, 2015). As The Economist (2014) reported as early as 2014, a user can be monitored by more than 1,300 firms on the 100 most relevant web pages (a short experiment showed that visiting the top 50 websites in 2018 leads to at least 594 third-party connections). There are three main technologies that companies (or services) use to track a user successfully and continuously: cookies, device fingerprinting and unique device identifiers (IDs) (Federal Trade Commission, 2016). These three tracking technologies will now be discussed briefly.
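To make the notion of a "third-party connection" concrete, the following Python sketch counts the distinct third-party sites contacted while loading one page. It is an illustration under stated assumptions, not the method used in the experiment mentioned above: the URLs are made up, the function names are hypothetical, and a "site" is approximated naively as the last two labels of the host name (a real measurement would use the Public Suffix List).

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Naive 'site' of a URL: the last two labels of its host name.

    Assumption for illustration only; this is wrong for suffixes
    such as .co.uk, where a public-suffix lookup would be needed.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_sites(page_url: str, request_urls: list[str]) -> set[str]:
    """Return the sites contacted that differ from the visited page's site."""
    first_party = site_of(page_url)
    return {site_of(u) for u in request_urls} - {first_party, ""}


# Hypothetical list of requests a browser made while loading one page.
requests_seen = [
    "https://news.example/article.html",      # the page itself
    "https://static.news.example/app.js",     # same site, different subdomain
    "https://tracker.example.net/pixel.gif",  # third party
    "https://ads.example.org/banner.js",      # third party
    "https://cdn.example.org/lib.js",         # same third party as above
]

found = third_party_sites("https://news.example/", requests_seen)
print(len(found), sorted(found))
```

Summing such counts over a list of popular websites would give a rough figure comparable in spirit to the numbers quoted above.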

Most User-Specific: Cookies

Cookies are the most prominent tracking technology: a small piece of information is saved on the local device when it connects to a specific website (Krishnamurthy & Wills, 2006, p. 68). Cookies can be useful for users because they store information about the user (e.g. login, game scores, shopping cart). At the same time, they enable website providers and third parties to save online usage information about the user even after the user leaves the specific website; this also works if the user has ad blockers activated (Krishnamurthy & Wills, 2006, pp. 69–70).
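The mechanism can be sketched in a few lines of Python. The example simulates a third-party tracker that is embedded on several websites: on the first request it issues a random ID via a Set-Cookie header, and on later requests it recognizes the returning ID and appends the embedding page to that ID's history. The class name `Tracker`, the cookie name `uid` and the page URLs are assumptions made for illustration; only the standard-library modules `http.cookies` and `uuid` are real.

```python
import uuid
from http.cookies import SimpleCookie


class Tracker:
    """Hypothetical third-party tracker embedded on many websites."""

    def __init__(self):
        self.visits = {}  # cookie ID -> list of pages that embedded the tracker

    def handle(self, cookie_header, embedding_page):
        """Process one request; return (ID, Set-Cookie value or None)."""
        jar = SimpleCookie(cookie_header or "")
        if "uid" in jar:
            # Returning browser: the cookie identifies it across websites.
            uid = jar["uid"].value
            set_cookie = None
        else:
            # First contact: issue a random ID that is valid for one year.
            uid = uuid.uuid4().hex
            out = SimpleCookie()
            out["uid"] = uid
            out["uid"]["max-age"] = 60 * 60 * 24 * 365
            set_cookie = out["uid"].OutputString()
        self.visits.setdefault(uid, []).append(embedding_page)
        return uid, set_cookie


tracker = Tracker()
# The first visit carries no cookie, so the tracker sets one.
uid, header = tracker.handle(None, "https://shop.example/shoes")
# The browser sends the cookie back when another site embeds the same tracker.
same_uid, no_header = tracker.handle(f"uid={uid}", "https://news.example/politics")
print(header)
print(tracker.visits[uid])
```

The browsing history ends up keyed by the cookie ID rather than by a name, which is why the profile persists across websites without any login.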

Mid User-Specific: Device Fingerprinting

Fingerprinting takes all the information (e.g. user agent, screen resolution, cookie state, time zone, plugins, fonts) that a browser transmits to a web server (through HTTP headers, JavaScript, AJAX), concatenates it and uses the resulting expression as a unique ID, called a fingerprint, for a single device (Eckersley, 2010, p. 5). Eckersley (2010) describes the technical details and accuracy: even minor changes in the fingerprint are traceable through basic algorithms, making it a reliable tool for associating collected data with a single device. It was developed because it makes tracking a user easy, but its drawback is the difficulty of tying all this information to a single user identity. The technique relies on the uniqueness of configurations across devices and is therefore effective only as long as there is no mass adoption of standardized hardware and software in the market.
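A minimal Python sketch of the concatenate-and-identify idea follows. It assumes the attributes have already been collected into a dictionary; the attribute names and values are made up, and hashing the concatenated string with SHA-256 is a choice made here to get a compact ID, not necessarily the procedure described by Eckersley (2010).

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Concatenate browser attributes in a fixed order and hash the result."""
    # Sorting the keys makes the ID independent of collection order.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values as a server might observe them.
device_a = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "time_zone": "UTC+1",
    "cookies_enabled": "yes",
    "fonts": "Arial,DejaVu Sans,Liberation Serif",
}
# A second device that differs only in its installed fonts.
device_b = dict(device_a, fonts="Arial,Helvetica")

print(fingerprint(device_a))
print(fingerprint(device_a) == fingerprint(device_b))
```

Because any change to an attribute yields a completely different hash, an exact hash alone cannot follow a device through a browser update. Tracing such minor changes, as discussed above, requires comparing the underlying attributes rather than the hashes.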

Least User-Specific: Unique Device IDs

On mobile devices such as smartphones, mass adoption of individual products is visible (Gartner, 2017). In addition, apps are used and browsers are more uniform, making fingerprinting ineffective and tracking more complex (Eckersley, 2010, p. 8). Nonetheless, tracking within web browsers still works through cookies, and the Android operating system (OS) offers software development kit (SDK) access to unique (resettable) advertising IDs and (stable) device IDs, which makes tracking consistent and reliable (Google, 2016). iOS provides a similar unique advertising ID, although it is seen as less stable; developers can fall back on fingerprinting the device across web and apps through several other variables that are accessible without permission (e.g. installed apps, device name, music list, time zone, clock skew) (Apple, 2017; Khanna, 2015, pp. 108–111; Kurtz et al., 2016, pp. 6–14).

The data collected in online environments allows companies to fully understand user behavior and create meaningful profiles of users without their explicit input. As an example of the power of user data, Ortigosa, Carro and Quiroga (2014, p. 70) showed that they were able to compute personality traits of users simply by using quantitative data from users’ Facebook interactions. Companies that can follow every move of users in online environments across platforms and devices have insights into purchasing behavior, needs, desires, abilities, secrets, medical profiles and more by analyzing data that is produced by online user-technology interactions.

3.3.2 Environmental Collection

Environments outside of the online sphere present the next (and current) frontier of data collection: with the Internet of Things (IoT), all technical objects around and within human bodies are evolving into connected computational devices (Schneier, 2015). In this area, we can see four different categories of data collection: smart home, autonomous machines, quantified self and public surveillance.

Smart Home

Sensors in smart home environments can collect information about temperature, humidity, pressure on sofas, beds and chairs, movement, speech, images, water levels and flows, electricity usage, stock keeping and many other variables (Reichherzer et al., 2016, pp. 17–18). Products such as the line-up by smart home company Nest offer all-day live streaming of indoor and outdoor locations, tracking of gases and fumes, image recognition, temperature and presence tracking and many more data points that allow the company to create a virtual twin of a person’s state and immediate environment (Nest Labs Inc., 2018). Moreover, these services connect with the location-tracking feature on smartphones and are compatible with smart speakers such as Amazon’s Alexa or Google Home. Smart home connections are expected to grow by 18% between 2016 and 2021 (Cisco, 2017b, p. 11).

Autonomous Machines

The creation and collection of data about consumers does not stop within smart homes. Increasingly, autonomous machines such as cars are collecting data about their passengers and their surroundings. This enables companies to build full virtual representations of the public world, people’s movements and their behavior - be it inside or outside of a vehicle (Madrigal, 2017). Here, data is collected with lasers, sonar and cameras (which complement technologies like GPS or mobile network connections). These strategies are also applied in other autonomous technology such as delivery robots or drones (Ackerman, 2018; Louw & Silk, Amazon Technologies, Inc., Patent No. US 9714089 B1, 2017).

Quantified Self

A method of collecting consumer data that is already more widespread is the quantified-self movement. Individuals use different kinds of internal and external sensors to track physical, psychological, biological, and behavioral information (Swan, 2013, p. 85; The Economist, 2012). The wearable technology company Fitbit promotes features such as continuous heart rate tracking, automatic workout recognition and tracking, sleep (quality) tracking, and step and elevation tracking (Fitbit, 2018a). The data collected through a basic, external sensor is then transmitted to Fitbit servers or to central health platforms such as Apple HealthKit or Google Fit. Further developments are pills that can track whether a patient has taken them or even track specific internal biological parameters that can be transmitted for further analysis (Kalantar-Zadeh et al., 2018, pp. 81–84; U.S. Food and Drug Administration, 2017). These developments show that within a few years, the mass market is likely to have access to technologies that enable tracking not only from outside the body but also from inside, potentially making consumers and their actions fully transparent to data collection companies.

Public Surveillance

The last category of data collection about consumers is public surveillance in online and offline environments. The National Security Agency's (NSA) Prism program might be the most prominent example of full-scale online data collection about consumers (Greenwald & MacAskill, 2013), but what is of even greater interest in this context is offline data collection through CCTV with artificial-intelligence-based (AI) analysis systems in the background (Vincent, 2018a). Surveillance cameras collect a large amount of data, which is already used for data mining, and have become almost omnipresent in modern cities (Pribadi et al., 2017, p. 22; Wakefield, 2017), as has a large array of other advanced technologies such as phone call tracking, number plate readers, surveillance-enabled light bulbs or through-the-wall audio sensors (Wheeler, 2016).

Undoubtedly, in a world where retailers are able to accurately track customers by watching their shoes (Wakefield, 2017), where police forces recognize criminals through facial recognition in public (Vincent, 2018b) and where companies (and governments) are able to identify individuals by their voice (Kofman, 2018; Pinsky, 2017), consumers are becoming increasingly transparent.

3.4 Consumer Awareness

In light of these developments, it is necessary to take a closer look at how aware consumers are of data collection and of their becoming data models, as this awareness is the precondition for consumers to engage actively in debates surrounding these topics - be it for economic or privacy reasons. In an online, and especially an app-based, environment, consumers are not able to fully gauge the extent or the nature of data that is collected (Grundy, Held & Bero, 2017, p. 7) and feel that they have lost control over what is done with their personal information (Rainie, 2016). This is reflected in the findings of Golbeck and Mauriello (2016), which show that only a quarter of Facebook app users are aware that the application can potentially access private messages and app logs. Awareness was high for the collection of profile information (such as gender, name, and education) and for the possibility of Facebook accessing spatial information (Golbeck & Mauriello, 2016, p. 14). A study by Morey, Forbath and Schoop (2015, p. 101) likewise finds that consumers are largely unaware of the data collected about them: only a quarter of all participants were aware of friend-list and location recording, and only 14% were aware of web history collection.

In 2016, at least 38% of consumers believed that their online activities were not monitored at all, although 49% of consumers were aware of the possibility of companies selling their data (Centre for International Governance Innovation & IPSOS, 2016). Even though these numbers are low, awareness has been increasing over the past years (Hampson, 2016). Nonetheless, consumer awareness of environmental data collection and its implications remains fairly low (Weaver, 2015; Webster, 2009, p. 22), which might enable businesses to obtain greater benefit from consumers as data models. This asymmetry in information, awareness and, thus, power raises concerns, which will be discussed in chapter 4.6.5.

References


Ackerman, E. (2018, January 30). "Nuro Raises $92 Million for Adorable Autonomous Delivery Vehicles: Somewhere between a delivery truck and a sidewalk robot, Nuro's robotic vehicles want to deliver your groceries", in: IEEE Spectrum. Retrieved from (accessed February 08, 2018).

Alexa by Amazon (2018). "Alexa Top 500 Global Sites". Retrieved from (accessed February 05, 2018).

Amatriain, X. (2013): "Mining large streams of user data for personalized recommendations", in: ACM SIGKDD Explorations Newsletter, Vol. 14 (2), pp.37–48.

Apple (2017). "ASIdentifierManager - AdSupport: Apple Developer Documentation". Retrieved from (accessed February 06, 2018).

Centre for International Governance Innovation & IPSOS (2016). "2016 CIGI-Ipsos Global Survey on Internet Security and Trust". Retrieved from (accessed February 10, 2018).

Cisco (2017). "The Zettabyte Era: Trends and Analysis". Retrieved from (accessed February 02, 2018).

Eckersley, P. (2010): "How Unique Is Your Web Browser?". In Lecture notes in computer science: Vol. 6205. Privacy enhancing technologies: 10th international symposium, PETS 2010, Berlin, Germany, July 21-23, 2010 ; proceedings (pp. 1–18). Berlin: Springer.

Federal Trade Commission (2016). "Online Tracking". Retrieved from (accessed February 05, 2018).

Fitbit (2018). "Fitbit Charge 2: Features". Retrieved from (accessed February 09, 2018).

Gartner (2017). "Gartner Says Worldwide Sales of Smartphones Grew 7 Percent in the Fourth Quarter of 2016". Egham, U.K. Retrieved from (accessed April 10, 2018).

Golbeck, J. & Mauriello, M. (2016): "User Perception of Facebook App Data Access: A Comparison of Methods and Privacy Concerns", in: Future Internet, Vol. 8 (4), pp.9–23.

Google (2016). "Best Practices for Unique Identifiers: Android Developer Guidelines". Retrieved from (accessed February 06, 2018).

Greenwald, G. & MacAskill, E. (2013, June 6). "NSA Prism program taps in to user data of Apple, Google and others", in: The Guardian. Retrieved from (accessed December 20, 2017).

Grundy, Q., Held, F. & Bero, L. (2017): "Tracing the Potential Flow of Consumer Data: A Network Analysis of Prominent Health and Fitness Apps", in: Journal of Medical Internet Research, Vol. 19 (6).

Hampson, F. (2016, April 19). Interview by Morgen, H. "Consumer surveillance awareness grows and Internet trust dwindles. Particular concern surrounds personal data in the hands of government and private corporations". Retrieved from (accessed February 10, 2018).

Kalantar-Zadeh, K., Berean, K., Ha, N., Chrimes, A., Xu, K., Grando, D., Ou, J., Pillai, N., Campbell, J., Brkljača, R., Taylor, K., Burgell, R., Yao, C., Ward, S., McSweeney, C., Muir, J. & Gibson, P. (2018): "A human pilot trial of ingestible electronic capsules capable of sensing different gases in the gut", in: Nature Electronics, Vol. 1 (1), pp.79–87.

Khanna, V. (2015): "Remote fingerprinting of mobile phones", in: IEEE Wireless Communications, Vol. 22 (6), pp.106–113.

Kofman, A. (2018). "Finding Your Voice: Forget About Siri and Alexa — When It Comes to Voice Identification, the “NSA Reigns Supreme”". Retrieved from (accessed February 09, 2018).

Krishnamurthy, B. & Wills, C. (2006). "Generating a privacy footprint on the internet". In Almeida, J. (Ed.), Proceedings of the 6th ACM SIGCOMM conference on Internet measurement (pp. 65–70). New York, NY: ACM.

Kurtz, A., Gascon, H., Becker, T., Rieck, K. & Freiling, F. (2016): "Fingerprinting mobile devices using personalized configurations", in: Proceedings on Privacy Enhancing Technologies, Vol. 2016 (1), pp.4–19.

Louw, D. & Silk, R. (2017). "Trigger agents in video streams from drones", Amazon Technologies, Inc., Patent No. US 9714089 B1. USA.

Madrigal, A. (2017, August 23). "Waymo Built a Secret World for Self-Driving Cars: An exclusive look at how Alphabet understands its most ambitious artificial intelligence project", in: The Atlantic. Retrieved from (accessed February 08, 2018).

Morey, T., Forbath, T. & Schoop, A. (2015): "Customer data: Designing for transparency and trust", in: Harvard Business Review, Vol. 93 (5), pp.96–105. Retrieved from (accessed February 04, 2018).

Nest Labs Inc. (2018). "Create a Connected Home". Retrieved from (accessed February 07, 2018).

Ortigosa, A., Carro, R. & Quiroga, J. (2014): "Predicting user personality by mining social interactions in Facebook", in: Journal of Computer and System Sciences, Vol. 80 (1), pp.57–71.

Pinsky, Y., Google (2017). "Tomato, tomahto. Google Home now supports multiple users". Retrieved from (accessed February 09, 2018).

Pribadi, A., Kumiawan, F., Hariadi, M. & Nugroho, S. (2017). "Urban distribution CCTV for smart city using decision tree methods". In 2017 International Seminar on Intelligent Technology and its Application (ISITIA): Proceeding : Surabaya, Indonesia, August, 28-29, 2017 (pp. 21–24). Piscataway, NJ: IEEE.

Rainie, L. (2016). "The state of privacy in post-Snowden America". Retrieved from (accessed February 10, 2018).

Reichherzer, T., Satterfield, S., Belitsos, J., Chudzynski, J. & Watson, L. (2016). "An Agent-Based Architecture for Sensor Data Collection and Reasoning in Smart Home Environments for Independent Living". In Khoury, R. & Drummond, C. (Eds.): Vol. 9673. Lecture notes in computer science, Advances in Artificial Intelligence (pp. 15–20). Springer International Publishing.

Schneier, B. (2015, May 17). "How we sold our souls – and more – to the internet giants", in: The Guardian. Retrieved from (accessed February 04, 2018).

Swan, M. (2013): "The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery", in: Big Data, Vol. 1 (2), pp.85–99.

The Economist (2012). "Counting every moment". Retrieved from (accessed February 09, 2018).

The Economist (2014). "Getting to know you: Everything people do online is avidly followed by advertisers and third-party trackers". Retrieved from (accessed February 05, 2018).

U.S. Food and Drug Administration (2017). "FDA approves pill with sensor that digitally tracks if patients have ingested their medication". Silver Spring, MD 20993, USA. Retrieved from (accessed February 09, 2018).

Vincent, J. (2018a). "Artificial intelligence is going to supercharge surveillance: What happens when digital eyes get the brains to match?". Retrieved from (accessed February 03, 2018).

Vincent, J. (2018b). "Chinese police are using facial recognition sunglasses to track citizens: The glasses are being used by officers in police stations to oversee travelers during the Lunar New Year". Retrieved from (accessed February 09, 2018).

Wakefield, J. (2017). "Tomorrow's cities: Are your shoes giving away data?". Retrieved from (accessed February 09, 2018).

Weaver, M. (2015, January 6). "UK public must wake up to risks of CCTV, says surveillance commissioner: Tony Porter says Britons are blind to extent of monitoring and wants public bodies to be more open about use of cameras", in: The Guardian. Retrieved from (accessed February 10, 2018).

Webster, C. (2009): "CCTV policy in the UK: Reconsidering the evidence base", in: Surveillance and Society, Vol. 6 (1), pp.10–22. Retrieved from

Wheeler, B. (2016). "The US city that beat Big Brother". Retrieved from (accessed February 09, 2018).