Privacy and artificial intelligence: challenges for protecting health information in a new era
Concerns with access, use and control
AI systems have several unique characteristics compared with traditional health technologies. Notably, they can be prone to certain types of errors and biases [20,21,22,23], and sometimes cannot easily or even feasibly be supervised by human medical professionals. The latter stems from the “black box” problem, whereby the methods and “reasoning” a learning algorithm uses to reach its conclusions can be partially or entirely opaque to human observers [10, 18]. This opacity may also extend to how health and personal information is used and manipulated if appropriate safeguards are not in place. Notably, in response to this problem, many researchers have been developing interpretable forms of AI that will be easier to integrate into medical care [24]. Because of these unique features, the regulatory systems used for the approval and ongoing oversight of AI will also need to differ from those used for traditional health technologies.
A significant portion of existing technology relating to machine learning and neural networks rests in the hands of large tech corporations. Google, Microsoft, IBM, Apple and other companies are all “preparing, in their own ways, bids on the future of health and on various aspects of the global healthcare industry [25].” Information sharing agreements can be used to grant these private institutions access to patient health information. We also know that some recent public–private partnerships for implementing machine learning have resulted in poor protection of privacy. For example, DeepMind, owned by Alphabet Inc. (hereinafter referred to as Google), partnered with the Royal Free London NHS Foundation Trust in 2016 to use machine learning to assist in the management of acute kidney injury [22]. Critics noted that patients were not afforded agency over the use of their information, nor were privacy impacts adequately discussed [22]. A senior advisor with England’s Department of Health said the patient information was obtained on an “inappropriate legal basis” [26]. Further controversy arose after Google subsequently took direct control of DeepMind’s app, effectively transferring control over stored patient data from the United Kingdom to the United States [27]. The ability to essentially “annex” mass quantities of private patient data to another jurisdiction is a new reality of big data, and one more likely to occur when commercial healthcare AI is implemented. The concentration of technological innovation and knowledge in big tech companies creates a power imbalance in which public institutions risk becoming dependent partners rather than equal and willing ones in health technology implementation.
While some of these violations of patient privacy may have occurred in spite of existing privacy laws, regulations, and policies, it is clear from the DeepMind example that appropriate safeguards must be in place to maintain privacy and patient agency in the context of these public–private partnerships. Beyond the possibility of general abuses of power, AI systems pose a novel challenge because the algorithms often require access to large quantities of patient data and may use the data in different ways over time [28]. In these scenarios, the location and ownership of the servers and computers that store and provide access to patient health information for healthcare AI become important considerations. Regulation should require that patient data remain in the jurisdiction from which they are obtained, with few exceptions.
Strong privacy protection is realizable when institutions are structurally encouraged, by their very design, to cooperate to ensure data protection [29]. Commercial implementations of healthcare AI can be managed in ways that protect privacy, but they introduce competing goals. As we have seen, corporations may not be sufficiently motivated to maintain privacy protection if they can monetize the data or otherwise gain from them, and if the legal penalties are not high enough to offset this behaviour. Because of these and other concerns, there have been calls for greater systemic oversight of big data health research and technology [30].
Given that we have already seen such examples of corporate abuse of patient health information, it is unsurprising that issues of public trust can arise. For example, a 2018 survey of four thousand American adults found that only 11% were willing to share health data with tech companies, versus 72% with physicians [31]. Moreover, only 31% were “somewhat confident” or “confident” in tech companies’ data security [28]. In some jurisdictions, such as the United States, this has not stopped hospitals from sharing patient data that is not fully anonymized with companies like Microsoft and IBM [32]. A lack of public trust might heighten scrutiny of, or even litigation against, commercial implementations of healthcare AI.
The problem of re-identification
Another concern with big data use of commercial AI relates to the external risk of privacy breaches from highly sophisticated algorithmic systems themselves. Healthcare data breaches have risen in many jurisdictions around the world, including the United States [33, 34], Canada [35,36,37], and Europe [38]. And while they may not be widely used by criminal hackers at this time, AI and other algorithms are contributing to a growing inability to protect health information [39, 40]. A number of recent studies have highlighted how emerging computational strategies can be used to identify individuals in health data repositories managed by public or private institutions [41]. And this is true even if the information has been anonymized and scrubbed of all identifiers [42]. A study by Na et al., for example, found that an algorithm could be used to re-identify 85.6% of adults and 69.8% of children in a physical activity cohort study, “despite data aggregation and removal of protected health information [43].” A 2018 study concluded that data collected by ancestry companies could be used to identify approximately 60% of Americans of European ancestry and that, in the near future, the percentage is likely to increase substantially [44]. Furthermore, a 2019 study successfully used a “linkage attack framework” (that is, an algorithm aimed at re-identifying anonymous health information) to link online health data to real-world people, demonstrating “the vulnerability of existing online health data [45].” And these are just a few examples of the developing approaches that have raised questions about the security of health information framed as being confidential. Indeed, it has been suggested that today’s “techniques of re-identification effectively nullify scrubbing and compromise privacy [46].”
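To make the underlying mechanism concrete, the following minimal Python sketch illustrates the basic idea of a linkage attack: joining a “scrubbed” health dataset to a publicly available dataset on shared quasi-identifiers. The column names and records are hypothetical and purely illustrative; the attacks cited above rely on far more sophisticated probabilistic and statistical matching.

```python
# Minimal, illustrative sketch of a linkage attack (hypothetical data).
import pandas as pd

# "Anonymized" health records: direct identifiers removed, but
# quasi-identifiers (postal code, birth year, sex) retained.
health = pd.DataFrame({
    "postal_code": ["M5V", "M5V", "K1A"],
    "birth_year":  [1984, 1990, 1975],
    "sex":         ["F", "M", "F"],
    "diagnosis":   ["diabetes", "asthma", "hypertension"],
})

# A public or commercially available dataset that pairs names with the
# same quasi-identifiers (e.g., a marketing or registration list).
public = pd.DataFrame({
    "name":        ["A. Smith", "B. Jones", "C. Lee"],
    "postal_code": ["M5V", "M5V", "K1A"],
    "birth_year":  [1984, 1990, 1975],
    "sex":         ["F", "M", "F"],
})

# Re-identification by exact match on the quasi-identifiers: any record
# whose combination of quasi-identifiers is unique in both tables can be
# linked back to a named individual, despite the removal of identifiers.
linked = health.merge(public, on=["postal_code", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```

Even this toy example shows why removing direct identifiers alone does not guarantee anonymity when auxiliary datasets are available for linkage.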
This reality potentially increases the privacy risks of allowing private AI companies to control patient health information, even in circumstances where “anonymization” occurs. It also raises questions of liability, insurability and other practical issues that differ from instances where state institutions directly control patient data. Considering the variable and complex legal risk that private AI developers and maintainers could take on when dealing with large quantities of patient data, carefully constructed contracts will be needed to delineate the rights and obligations of the parties involved, as well as liability for the various potential negative outcomes.
One way that developers of AI systems can potentially mitigate continuing privacy concerns is through the use of generative data. Generative models can be trained to produce realistic but synthetic patient data with no connection to real individuals [47, 48]. This can enable machine learning without the long-term use of real patient data, though real data may initially be needed to train the generative model.
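As a simple illustration of the idea, the sketch below fits a basic generative model (a Gaussian mixture, standing in for the richer generative approaches cited above) to a toy numeric “patient” dataset and then samples synthetic records for downstream use. All variable names and values are hypothetical, and production systems would typically pair such models with formal privacy guarantees.

```python
# Minimal sketch of synthetic data generation (hypothetical features).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for a real, de-identified numeric patient dataset:
# columns here represent age, systolic blood pressure, and BMI.
real_patients = np.column_stack([
    rng.normal(55, 12, 500),   # age
    rng.normal(130, 15, 500),  # systolic blood pressure
    rng.normal(27, 4, 500),    # BMI
])

# Fit a generative model to the real data; this is the only step that
# needs to touch real patient records.
model = GaussianMixture(n_components=3, random_state=0).fit(real_patients)

# Sample synthetic records that follow the learned distribution but do
# not correspond to any real individual; subsequent machine learning can
# then proceed on the synthetic set alone.
synthetic_patients, _ = model.sample(1000)
print(synthetic_patients[:5])
```

In this arrangement, the real data are used once to train the generative model, after which model development and sharing can rely on the synthetic records.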