Open Access (CC BY 4.0), published by De Gruyter, April 21, 2020

“Select before You Collect”: Uses and Abuses of Profiling and Data Mining in Law and Literature

Jeanne Gaakeer
From the journal Pólemos


This article addresses some of the risks involved in the uses of information technologies such as profiling and data mining by means of the German jurist-philosopher Juli Zeh’s dystopic novel Leere Herzen.

1 Breaking free?

1.1 Big brother

In 2017 a Dutch newspaper covered a story about the CEO of iRobot, the company behind the Roomba robotic vacuum cleaner, announcing that he expected to close a deal with one of the global players in the information technology market (i.e., Amazon, Apple, or Google’s parent company Alphabet). At stake was the valuable information indicative of the lifestyle of its owners that the robotic vacuum cleaner could provide. While cleaning, the device scans the house with its sensors and produces a map or chart of it, so that its owners can be targeted with advertisements for things found wanting and, subsequently, be tempted into buying, for example, new furniture, or anything else that one would need, or perhaps not need at all, in a “smart home”.

This is a kind of profiling and data mining, albeit on a miniature scale, i.e., of just a couple of individuated subjects rather than an indefinite group of people. Profiling, then, is the term used to denote

The process of ‘discovering’ correlations between data in databases that can be used to identify and represent a human or nonhuman subject (individual or group) and/or the application of profiles (sets of correlated data) to individuate and represent a subject or to identify a subject as a member of a group or category. [1]

The process consists of two phases. The first is that of drawing up a profile of a group on the basis of specific indicators deemed of decisive importance in view of the information sought; the second is that of analysing and weighing the information. Data mining as a form of ambient intelligence creates

a context-aware environment in which, by means of the coordinated use of databases, sensors, micro-devices and software agents, numerous systems spontaneously scan our environment and daily activities for data to be able to serve us with particular information and (inter)actions, based on certain notions about what is appropriate for us as unique individuals given the particulars of our daily life and context. [2]

The data subject is the subject that any profile refers to, and the data controller is the subject that determines both the purpose and the use of such data processing. Both forms of ambient intelligence are computational models of human behaviour that use algorithms, first, to collect data and, second, to analyse patterns and correlations in order to predict future human behaviour. These new technologies that big companies deploy to find out more about their targeted customers and then act on that information – how does Netflix “know” what you would like to watch? why does Amazon alert you to the fact that other customers who bought article A also bought articles B and C? – raise various questions that are important in themselves, but also for law, which has put these technologies to good use these past decades.
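The correlational logic behind such “customers who bought A also bought B and C” alerts can be sketched in a few lines. The purchase data, article names and simple co-occurrence counting below are invented for illustration only; they are not Amazon’s actual method, merely a minimal sketch of the kind of pattern-finding the text describes:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: each set is one customer's basket.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]

# Count how often each pair of articles is bought together.
pair_counts = defaultdict(int)
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def also_bought(article):
    """Articles most often co-purchased with `article`, most frequent first."""
    related = {
        (b if a == article else a): n
        for (a, b), n in pair_counts.items()
        if article in (a, b)
    }
    return sorted(related, key=related.get, reverse=True)

print(also_bought("A"))  # articles correlated with "A" in the sample data
```

Note that the correlation says nothing about *why* articles co-occur, which is precisely the gap between correlation and causality the article returns to in section 3.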

1.2 Big data

In legal surroundings, for example, it is important to note that Big Data mining as already practised by police forces all over the world implies that huge databases that as such contain no indications of criminal behaviour can be and are being connected and computerised. Think of systems that store and file the number plates of all the cars driving on a specific motorway and then compare these with existing police files. It is a well-known fact that both the hits and the no-hits are stored for later use. In 2014, the Dutch Financial Intelligence Unit, instituted on the basis of legislation to prevent money laundering and the financing of terrorism, investigated 277,000 of what it considered “unusual transactions”, of which eventually only 10% proved to be suspect. That means that the data of 90% of the citizens in this target group were analysed by the government without any reasonable suspicion. The British Government Communications Headquarters (GCHQ) is said to use a data vacuum cleaner to create profiles of internet users worldwide for further and/or future analysis.
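Computationally, the number-plate matching described above is little more than a lookup against an existing file, in which every observation is retained whether it matches or not. The plate numbers and file contents in this sketch are invented; it illustrates only the retention point, namely that no-hits are stored alongside hits:

```python
# Invented police file of plates flagged in earlier investigations.
police_file = {"XY-123-Z", "AB-456-C"}

# Invented plates observed passing a motorway camera.
observed = ["KL-789-M", "XY-123-Z", "QR-001-P"]

log = []  # everything is stored, hit or no-hit alike
for plate in observed:
    log.append({"plate": plate, "hit": plate in police_file})

hits = [entry for entry in log if entry["hit"]]
print(f"{len(hits)} hit(s) out of {len(log)} observations retained")
```

The triviality of the code is the point: connecting such a log to other databases later requires no new suspicion, only another lookup.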

Social media and their users inadvertently supply information on the internet that is up for grabs. Children are especially vulnerable; they provide private data when downloading apps or preparing their personal profiles on social media. In the USA, this has already led to the Children’s Online Privacy Protection Act, which requires parental permission for the collection of data of children younger than 13. “Spyware” and Big Data may be helpful tools for local councils having to decide who gets social security payments and who does not. [3] Mass surveillance is not only connected to the Snowden affair, nor is the Cambridge Analytica affair a rare example. Thus, Big Data mining can violate the privacy rights of large groups of citizens protected by article 8 of the European Convention on Human Rights if it is deployed without any regard for the presumption of innocence. [4] The algorithm used in the Public Safety Assessment program in the USA, which gives each suspect a score indicating the risk of his jumping bail, is supposedly less biased than human beings making such assessments. However, the algorithm used in the classification all too often gives a high score to black people. The risk of discrimination looms large when profiling is deliberately racial and/or ethnic. What is more, as far as existing forms of data protection are concerned, it makes a difference whether we are talking about data protection by design, i.e., inscribed in the design itself, or about data protection by default, i.e., by means of standards that are known to be adjustable in themselves. One mistake made when feeding the algorithm may negatively affect a person for the rest of his or her life. And then we have not even considered the risk of outdated information being used: what value can then be attached to the correlations found?
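One elementary way an auditor might probe a risk-scoring tool of the kind described above is to compare the rate of “high risk” labels across groups. The scores, group labels and threshold below are entirely hypothetical, and real fairness audits use far richer statistics than this single gap figure; the sketch only shows how such a disparity can be made visible at all:

```python
# Hypothetical scored population; "group" stands for any protected attribute.
scores = [
    {"group": "g1", "score": 8},
    {"group": "g1", "score": 7},
    {"group": "g1", "score": 3},
    {"group": "g2", "score": 6},
    {"group": "g2", "score": 2},
    {"group": "g2", "score": 1},
]
THRESHOLD = 6  # a score at or above this counts as "high risk"

def high_risk_rate(group):
    """Fraction of a group's members flagged as high risk."""
    members = [s for s in scores if s["group"] == group]
    flagged = [s for s in members if s["score"] >= THRESHOLD]
    return len(flagged) / len(members)

# A large gap between groups is one warning sign of disparate impact.
gap = high_risk_rate("g1") - high_risk_rate("g2")
print(round(gap, 2))
```

Even this toy check presupposes access to the scores and the group labels, which is exactly what data protection by design may grant or withhold.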

1.3 Big problem

In short, there is an exponential growth of Big Data, and we should be aware of the risk of an equally exponential growth of big problems, namely with respect to privacy and self-determination. Who am I – the ontological question – if the vacuum cleaner is watching me and enticing me to act in a specific manner? Am I still in control, or is that just an Enlightenment illusion? And are we fully aware, politically and socially, that in the wrong hands algorithms can become instruments of oppression and discrimination? Our grasp on reality – the epistemological perspective – may, in situations of constant targeting, need re-education with respect to the many risks involved. Van der Hof and Prins argue that the problem is that “the use and ‘value’ of personal data cannot be separated from the specifics of the context (social, economic and institutional settings) within which these data are collected and used.” [5] Thus, our being unconscious of the fact that we leave a digital trail with practically our every move, including the unconscious generation of clickstream data when we visit the world wide web, shows that “[T]he real problem is how personal data are processed, in what context and towards what end.” [6] In other words, the data controller is in total control. Furthermore, before this problem can even be determined, the question that needs addressing is how to create public awareness of the fact that our attitude toward technological innovations is all too often downright careless, given what information we readily give away without considering the possible consequences we may suffer. This is also to say that we need to pay careful attention to the difference between volunteered and observed data. As Hildebrandt notes,

Volunteered data concerns whatever people deliberately posted or provided to a service provider, for instance a photo posted on a social networking site (SNS) or credit card details provided to a webshop. Observed data concerns the behavioural data that software machines create by measuring online clickstream behaviour […] Observed data are not consciously provided by a person; they are like footprints left in the digital sand of a landscape where the distinction between online and offline becomes increasingly artificial. [7]

An example of a comparable problem can be found in the advice that the Dutch Council of State recently issued. In it, the Council addressed the possibly reduced legal protection of the individual citizen when binary, black-and-white thought takes over in I-Government. [8] Reality is more variegated than the algorithms presuppose. The owners of camper vans, for example, had a very hard time explaining that their vehicles were not lorries when trajectory speed controls on motorways “recognised” them as such and fined them accordingly, since lorries have a lower speed limit than cars or vans.

I aim to illustrate some of these risks by means of an analysis of Juli Zeh’s novel Leere Herzen (Empty Hearts), in order also to raise the question of the role of the humanities when it comes to reflecting on such technological developments. Obviously, Jane Austen already used profiling techniques, if we look at the opening line of Pride and Prejudice: “It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.” [9] In Henry James’s novella “In the Cage” [10] – the cage being the “framed and wired confinement” in which the protagonist, a young woman, works as a telegraphist – we read how she imagines all kinds of interconnections between her customers on the basis of their presumed character traits, and so on and so forth. And then we have not even considered Edith Wharton’s short story “The Copy”, [11] which presages the digital copies so easily made and presumably preserved until the end of time.

When as human beings we find ourselves in a situation in which we are looked upon as clusters of algorithms and are constantly targeted for whatever information we willingly or unwillingly share – our data taken to be portable, i.e., transportable and usable in an environment different from the context of their production or genesis – John Donne’s premise that “No man is an island” has acquired a dark side. The development of new technologies should therefore not only make us aware of the need to think, or re-think, about our human, all too human, proclivities, but should also, with each new method of data mining, make us keep in mind the adage “select before you – and, I would add, they – collect”. This is also to suggest that we need to consider the late Freddie Mercury’s heartfelt cry when he sang that he wanted to break free from his vacuum cleaner world.

2 A bridge over troubled water

In the novel Leere Herzen (Empty Hearts) the German jurist-philosopher Juli Zeh brings together two topics that hold us captive in contemporary societies, namely populism and terrorism, with data mining and profiling in a profoundly dystopian and disturbing narrative. [12]

The story is set in the near future, in 2025. Its title is derived from the fictional 2025 hit song “Full Hands Empty Hearts/It’s a Suicide World, Baby” from the debut album of an artist named Molly Richter. It is indicative of the plot in more than one way. We read about the successful German entrepreneur Britta Söldner and her less successful husband Richard, whose seven-year-old daughter Vera is the bosom friend of Cora, also seven, and daughter of the happy-go-lucky couple Janina and Knut, whose dream of living in a house in the countryside is for the time being thwarted by their poverty and lack of resolve. Politically all is well, or so it seems. Presidents Trump and Putin have solved the problems in Syria. Federal Chancellor Angela Merkel has lost the election to the nationalists, whose party of presumably caring, anxious citizens, the Besorgte-Bürger-Bewegung, wants to abolish federalism and dissolve the United Nations and the European Union. Now that they are in power – after democratic elections, as was the case with the National Socialists in the 1930s, as one character ominously remarks during what is to be a final reckoning in the novel – efficiency is the keyword. Practically all parliamentary checks and balances have been done away with. The police and the secret services thrive. There are only three constitutional judges left to preserve the rule of law. The inner cities are purged of Islamic influences such as Koran bookshops and teahouses. To be deplored is, of course, the violence in the streets – the collateral damage of rightist people attacking migrants and the other way around – which is out of the government’s control, so that a return to a Hobbesian state of nature or the twilight zone of the bellum omnium contra omnes is to be feared. What is more, citizens no longer know what to think, or what they are allowed to think, so that Orwell too comes to mind.

Yet all is well in the district of Braunschweig, where Britta and her business partner Babak Hamwi, a homosexual refugee from Iraq, have founded Die Brücke, a clinic for psychotherapy, self-management, life-coaching, and ego-polishing. The firm’s patients are people who are suicidal or, rather, at risk of becoming suicidal. Potential targets are sought by means of an algorithm called Lassie and psychological and behavioural tests developed by Britta. When, after a treatment involving various tests, clinical observations and waterboarding, a patient remains suicidal, Britta “resells” him to one of the terrorist organisations that are also clients of the firm, ranging from Islamic State to Green Power. In short, terrorism has professionalised, and Britta’s firm discreetly provides it with martyrs who delight in the win-win situation.

Then a terrorist attack takes place at Leipzig airport. It is perpetrated in such a clumsy way that it cannot have been the work of a former patient. Babak has Lassie run the data, and it turns out that the two terrorists do not even have suicidal traits. If the algorithm does not make mistakes, then maybe there are competitors? Britta becomes anxious when she finds that she is repeatedly being watched by a man called Guido Hatz, who also enters their lives as the wealthy entrepreneur who is going to invest in Richard’s start-up. Hatz tells Britta that he has been her guardian angel for quite a while, and he advises her to prepare for a time-out because she is far too busy to live a happy life. Does Hatz know what Die Brücke actually does? Britta tries to reach all her professional contacts to find out more. She can hardly think clearly anymore. After yet more advice from Hatz to close the firm, Britta and Babak realise that the firm has been burgled and that they are being watched. Luckily Lassie had already been evacuated, but unfortunately Babak’s handwritten chart, which literally ‘connected the dots’ of the list of the names of the 120 people possibly involved in the Leipzig attack, is gone. Britta and Babak decide to disappear, together with their latest client Julietta. They find refuge in the house in the countryside that Britta helped Janina and Knut finance. Every night Babak goes to the secret place where he hid Lassie in order to find out more about members of the Empty Hearts movement, because one of the Leipzig terrorists, Markus Blattner – who, not incidentally, is found dead in prison before Babak has a chance to visit him – had a tattoo with that eponymous song title.

Suddenly Guido Hatz arrives on their doorstep. They manage to overpower him. Hatz’s story is that he is some sort of official who has for the past ten years made sure that Die Brücke was excluded from government surveillance. He informs them that the Empty Hearts are planning a coup in order to reinstate Angela Merkel and democratic government, and that if they cooperate, Britta and Babak will obviously get a place in the new world order. Britta is suspicious. Rather than setting Hatz free, she uses his crypto phone to send what later proves to be a fake message to his accomplices that Hatz has won them over. The plan goes like this: Julietta and two other clients of the firm will help stage the coup, Britta can go home to her family, and Babak can help Richard become a market leader in the new world order. However, when the breaking news comes that the Besorgte-Bürger-Bewegung leadership is stormed by people with Empty Hearts tattoos, Britta and Babak find out that they have been deceived. The coup was not a coup but a successful attempt to silence them. In the end the Besorgte-Bürger-Bewegung is victorious. But why then did Britta agree to have her clients participate in the coup, sacrificing them to Hatz’s trap? One thing is certain: had Britta and Babak accepted Hatz’s story at face value, they themselves would have been destroyed. Now, through Britta’s cunning strategy of pretending to cooperate, they are even more part of society’s establishment than they were before. The novel thus ends on an ultimately cynical note. The powers that be remain in charge, although, ironically, Britta has upped the ante on them, given their belief in conspiracy theories. She can continue her life of affluence with the empty heart that was hers all along.

3 The task of the humanities: Connecting the dots

What Zeh’s novel poignantly shows is how easily the suicidal clients of Die Brücke are nudged into compliance with Britta’s plan for them, because it taps into their inner views about their goal in life and about who they are. “Nudging” as a way of steering human behaviour thrives in an environment in which causality has been exchanged for correlation, as is the case in the world of Big Data. [13] The idea behind it is that people experience difficulties when it comes to making choices if there are (too) many options available, because that makes rational decision-making difficult. Nudging then helps people decide. When governmental institutions nudge their citizens, the question is whether they are transparent about it. The answer would be: probably not, because transparency hinders compliance. As a negative reaction to governmental “Behavioural Insight Teams” (BITs) or “Nudge Units”, people may try to actively avoid being under the surveillance of, for example, tax authorities. And when function creep sets in and systems originally designed for different purposes are connected and then applied together, the risk of data spreading like an inkblot is immense. With that comes a disruption of the individual citizen’s freedom when, for example, the government emphasises safety and efficiency as core values rather than the legality and basic rights that the rule of law in democratic societies promotes.

Turned the other way around, we therefore need to ask ourselves what it is that people do when they start behaving in the way a governmental institution expects them to, if we recall Orwell’s protagonist Winston Smith in Nineteen Eighty-Four and the chilling effects of surveillance and nudging. As noted in section 1, who am I if I am one of the “dots”, [14] and what does this mean for “reality as we know it”? Can we as ordinary citizens fully grasp technology’s new realities? “Trying to nudge people into obedience or into desirable behaviours does not take them seriously as individual human actors,” as Hildebrandt claims. [15] This leads me to consider two topics closely connected to free will and responsibility: first, personhood, and second, identity, both in their interconnection with self-determination. Am I supposed to be “just” the whole of my “data”, and who decides what my data are? What is more, causality is not a given, and the idea of looking for correlations by means of “deep learning” is misleading: algorithms do not and cannot know what humans call reality, and hence they cannot give legitimacy to what they decide for us. [16]

3.1 Vacuum cleaners as persons?

When it comes to conferring rights and responsibilities in the case of technological innovations such as “smart” devices, it should be noted that legislation as a form of legal reaction is necessarily one step behind any new development. What is more, as Teubner aptly notes, “The dynamics of digitalization are constantly creating responsible-free spaces that will expand in the future,” and the reason that this is so is that “legal doctrine insists on responding to the new digital realities exclusively with traditional conceptual instruments.” [17] Think, for example, of the legal subject as an entity with a specific status. Legal personhood is conferred by law on humans and entities such as corporations. In everyday life it is relatively easy to recognise a fellow human being if you meet one. That is to say that we then also recognise the rights and responsibilities of that independent legal-human unit. With artificial persons such as corporations, things are already more difficult in terms of the information we require in order to be able to make an assessment of what the artificial person’s rights and obligations are. To Teubner, this means that in the case of new technologies, “a legal status should be granted to each of the algorithmic types that is carefully calibrated to their specific role.” [18] Here the problem arises. For while we already record tangible objects such as ships or aeroplanes as legal entities, and the same will obviously go for robotic devices and (clusters of) algorithms, we have to accept that this will lead to a proliferation of information because each technological “device” will require its own specifications as far as legal status and/or legal subjectivity is concerned. 
Doctrinally, most countries have a closed system of legal personhood, so adding to it may not be as easy as, for example, the European Parliament thought when in 2017 it spoke of personhood in the form of “electronic personalities” for robots, without explaining which form this could, or should, take. The European Commission subsequently declined to grant such legal status to AI devices. [19] That is also why the recent example of the human look-alike robot Sophia (note that Adamic naming is a sure sign of anthropomorphism, as Zeh’s fictional “Lassie” also shows) being granted Saudi Arabian citizenship is hilarious; Sophia is right, though, when she claims that humans rather than robots are the ones that are programmable, because they respond in the desired manner to all forms of profiling and nudging. [20]

What is more, digital technologies can help construct identities, and these may include false identities. The legal persona and the human person may diverge. That is why the point raised by Garreau is acute: “The law is based on the Enlightenment principle that we hold a human nature in common. Increasingly, the question is whether this still exists.” [21] New technologies influence the construction of the human, literally and figuratively, since human thought may change accordingly in the posthuman era, as Katherine Hayles already suggested it did. [22]

3.2 Identity or simulacrum?

As Luciano Floridi notes, the risk of the informational perspective on the human is that the more time we spend in the digital-informational sphere, the more we will show the forms of behaviour that this sphere promotes. The “onlife experience”, I suggest, reduces us to simulacra. [23] Or, as Mirdin Gnägy succinctly puts it,

The onlife will in turn encourage and facilitate the informational perspective, which leads to processes of dematerialization (objects and processes will be designed to be medium-dependent) and typification. [24]

This process may reduce us to being considered objects that stand reserved only for purposes of yet further ordering by others. [25] This is also to say that Data Protection Acts may be wonderful legal devices, but do they protect data or do they protect humans? [26] And it is to ask whether modern technologies are perhaps a return to the mechanistic worldview cherished by Descartes and La Mettrie that we thought we had left behind. [27]

So we might say that one risk of modern technologies is that if something can be done, somewhere, someone, at some point in time, will actually do it. That is why we need to pay careful attention to the “how” and the “what” of profiling and data mining, because of their effects on our lives, both private and in the polity, not least because much of what these technologies do takes place in a stream of unconsciousness from the ordinary citizen’s point of view. These technologies can lead to inequalities in daily life and, at a political level, to undesired forms of “social sorting”. The consequences of digital-technological misunderstandings and misinformation at the juridical-political level of human equality are not to be underestimated. How can one decide at all about what or what not to do when the self as data subject has become objectified as a designed product standing reserved for further ordering by ever more sophisticated technologies? If the data subject is targeted without being aware of it, and acts on that, i.e., shows a certain kind of behaviour, is this to be called an exercise of free will? Does the subject in this way show his or her preferences? John Finnis’s analysis of Thomas Aquinas’s commentary on Aristotle’s Nicomachean Ethics, which consists of four kinds of explanations of reality that are “instantiated paradigmatically in the human person”, offers a valuable contribution to our further thought on identity in relation to informational technologies. He refers to the third explanation as “that which is anticipated and shaped in deliberation”, and this includes “one’s personal identity both as self-determining and as self-determined […] What counts is what one becomes in choosing what one chooses”, so that

Self-determination, the fact of autonomy and its exercise, is not itself, as such, a good, but rather is simply the fact that one can and does make oneself good or evil, and that this freedom is a necessary condition for the moral goodness which is the proper measure deployed (even if only implicitly) in fully adequate explanations and explanatory descriptions of this third order. [28]

In order to prevent “data dictatorship”, Big Data should therefore be treated with a hermeneutics of suspicion, precisely because acting on statistically significant correlations without probing the correlations themselves is “looking for the what without knowing the why.” [29] If the problem is also linguistic, in that legal norms and codified rules are (a) written in ordinary language that resists translation into algorithms, because (b) legal norms and codified rules are by definition abstract in that they apply to an indeterminable number of situations, [30] then we will do well to ask what the humanities can contribute by way of a Lesbian rule in order to provide equitable suggestions and (Aristotelian) practical wisdom.

3.3 Platform for public debate

The humanities are admirably suited to provide a platform for discussing the “how” and the “why” questions pertaining to new technologies, as they use a variety of disciplinary fields, ranging from philosophy to literary and cultural studies, with which to ask the awkward legal questions this article aims to highlight. Public debate is essential in order to be informed about and to reflect on the benefits and hazards of new technologies. In doing so, the goal would not be merely to take new technologies to task, but to bring to the surface that in all scientific environments, “Disciplinary lenses inevitably inform perception.” [31] As Toulmin wrote, “Man knows, and he is also conscious that he knows.” [32] This makes epistemology the area of interdisciplinary enquiry in the humanities. It contrasts the still central question “Can we, and if so, how do we understand human behaviour?” with the explanatory methodologies that conform to the paradigms developed in the natural sciences.

It is precisely because in the humanities the knowing subject and the known object converge that they can help promote the self-reflection necessary to gauge the consequences at the juridical-political level for human equality, for our freedom to act, and for our freedom from interference by private parties and public institutions. [33] Here it is important to bear in mind the fourth order of reality and the human as proffered by John Finnis, namely that of “culture, or mastery over materials […] every kind of techne,” or what we would call technology in the broad sense. [34] In profiling, too, the position of the observer is crucial. [35] The degree to which the observer influences both the observed in the form of the human subject and the observed in the form of the goals for which the algorithms are put to work, and for which the data are gathered from the human subject, needs our continued attention. We ourselves should decide what the limits of technology are to be, precisely because our humanity has so far proved to be ineradicable. Or, as the narrator in Ian McEwan’s dystopian short story “Düssel … ” – on the love relation between a “traditional” human being and an artificial person called Jenny in an era fifty years from now – has it, real humans keep asking “the indelicate question”: “Are you real?” [36] When she tells him she was “formed in Düsseldorf in Greater France,” he feels both ashamed, because “why should it matter what she was made of?”, and anxious that she will outpace him intellectually. [37] If our humanity is to be our responsibility [38] and if we cannot yet fully comprehend technology’s negative effects in the midst of huge transformations, caution is good.
The humanities can help explore what Caudill calls, “the significance of scientific knowledge for law,” [39] not least because literature “creates autonomous figures that may be used as terms of comparison with […] other constructions of the mind.” [40]

4 Conclusion

To return to Zeh’s novel by way of conclusion: at the end of the day we must think more in terms of the unsettling possibility that someone else is going to decide everything for us, possibly state ideology writ large, with or without anti-democratic populist tendencies, as in Leere Herzen. What is more, comparable to Michel Houellebecq’s novel Submission, Leere Herzen can also be viewed as a novel about how ordinary people who consider themselves democrats contribute to the rise of populism by taking refuge in social and mental indifference, seeking material gain only, and disregarding long-cherished values and principles as they retreat to the safety of their “smart” homes and their inner, private lives. It is a Wigmorian mirror of society that we may have to recognise closer to home as well, in the present state of the European Union and in individual nations and quite a few of their citizens. The absence in the novel of anyone who is critical and/or intellectual contributes to the atmosphere of cynicism pervading the main characters’ lives. The moralising note at the end of the novel is that Britta is an accessory, because her (and Janina’s and Knut’s and …) lack of commitment to the public cause helped facilitate the rise of the Besorgte-Bürger-Bewegung. Here, too, the humanities can serve to keep open the channels of communication by providing critical food for thought for our deliberations on the values of democracy that we need to hold dear for the future of the rule of law. Our hearts should not be empty. We should ask whether or not we are all too easily lulled into complacency when the services rendered by new technologies aim at presenting the least inconvenience while providing supposedly maximum benefits. The basic question is not “so what?” but “what if?”. That is why we should select before we collect.

Published Online: 2020-04-21
Published in Print: 2020-04-28

© 2021 Jeanne Gaakeer, published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.