The why and how of trustworthy AI

Abstract: Artificial intelligence is increasingly penetrating industrial applications as well as areas that affect our daily lives. As a consequence, there is a need for criteria to validate whether the quality of AI applications is sufficient for their intended use. Both in the academic community and in the societal debate, an agreement has emerged under the term "trustworthiness" as the set of essential quality requirements that should be placed on an AI application. At the same time, the question of how these quality requirements can be operationalized is to a large extent still open. In this paper, we consider trustworthy AI from two perspectives: the product and the organizational perspective. For the former, we present an AI-specific risk analysis and outline how verifiable arguments for the trustworthiness of an AI application can be developed. For the latter, we explore how an AI management system can be employed to assure the trustworthiness of an organization with respect to its handling of AI. Finally, we argue that in order to achieve AI trustworthiness, coordinated measures from both the product and the organizational perspective are required.


Introduction
Artificial intelligence (AI) is penetrating numerous areas of our lives at a breathtaking speed. In the process, it is increasingly taking over central activities, on the one hand in safety- and security-relevant domains, such as automated driving [60], but also in domains that affect human self-determination, such as the granting of loans [20] or hiring processes [2]. It is generally assumed that AI will only deploy its full potential if it is used in accordance with high quality standards and in line with our core values. In this context, the concept of trustworthiness has been coined, which has been the subject of intensive research for several years [16], [17], [54]. For European practice, the Ethics Guidelines for Trustworthy AI of the High-Level Expert Group [23], published in 2019, set the direction, and it can be expected that their specifications will find their way into legal requirements through the recently published proposal for AI regulation of the European Commission [15] (for an analysis of the proposal, see [51]). It is noteworthy that the High-Level Expert Group's guidelines, like numerous other AI ethics statements [9], leave open which entity has to comply with their requirements and how. Comparisons to related but simpler domains, such as IT security, make it plausible that trustworthiness can and must refer to the organizations that use or develop AI as well as to the AI systems themselves. Looking at the concrete example of "diversity, non-discrimination and fairness," the HLEG recommends establishing a mechanism that allows others to flag issues related to bias and discrimination [23], which is a non-technical measure taken by the organization, while the recommendation to monitor fairness through metrics [23] can only be addressed by a technical examination of the specific AI application. For both interpretations, the operationalization of trustworthiness requirements is to a large extent still open [21], [43]. This paper takes both perspectives with a focus on AI applications based on Machine Learning (ML) technologies, presents approaches for operationalization, and discusses how they are related.

The why of trustworthy AI
The impressive progress of AI in recent years, for instance in medical diagnostics [59] and predictive maintenance [8], was largely driven by Machine Learning. The great potential of ML-based applications originates from the fact that their behavior is not predefined by programmed rules but inferred from data instead. More specifically, ML methods are designed to identify patterns in so-called training data and incorporate their statistical insights into a model. Deep neural networks in particular, a type of complex ML model generated from vast amounts of data, are capable of performing tasks that, to date, could not be described effectively by human-made rules (e.g., image [12] and speech recognition [4]). However, the data-driven nature of ML gives rise to novel imponderables. For instance, ML-based applications bear the risk of incorporating and reproducing discriminatory patterns present in real data [42], by which individuals as well as the reputation of the organization in charge might be adversely affected. This example illustrates why the recommendations and requirements in the numerous AI guidelines presented (see chapter 1) are not solely aimed at preventing bodily injury, property damage or financial loss, as is the case with "classic" IT security and safety. Complementarily, parts of them also consider the (intangible) impact of AI on individuals and society, given, for example, its increasing use to support sensitive decision-making processes (e.g., the dismissal of workers [9] or the sentencing of delinquent persons [1]).
Given the over 700 AI policy initiatives from 60 countries and territories, as well as the over 170 emerging AI-related regulation initiatives [47], it is apparent that there are different views around the world on how AI should be used. This partially originates from the diversity of national and territorial AI strategies driving official guidelines and AI regulation, e.g., [14], [55], [46], and [52]. While requirements differ globally, one finds recurring elements in numerous AI guidelines, especially when looking at Western initiatives. More precisely, there is a consensus among the variety of European as well as US American guidelines that privacy, fairness and transparency (see [34] and [64]) are, in addition to the established objectives of reliability, safety, and security, essential quality dimensions of AI. Moreover, the EU-level documents on which we focus in the following emphasize human autonomy and oversight (partially in contrast to other international perspectives) as a further key requirement for trustworthy AI. While reliability, safety, and security address the proper functioning and potential vulnerabilities of the system (see #2 [23] and Art. 15 [15]), especially the data-driven nature of ML, as mentioned, gives rise to novel aspects. Clearly, as soon as personal data is processed, the privacy of the individual needs to be protected. Here, the use of ML involves risks such as personal data being newly generated during operation (e.g., personal keystroke patterns [37]) or being extracted from the ML model without authorization [48], [18]. This notion can be extended to other forms of data, e.g., business-sensitive data. Furthermore, fairness must be ensured in decision processes (see #5 [23] and Art. 10, 15 [15]), considering that ML algorithms might learn from a biased ground truth or have a significantly lower performance with respect to certain groups of people [6] (e.g., due to inconsistent data quality). In addition, as stated, data-driven AI thrives on concepts that are hard to understand or ill-defined for humans. It is therefore not surprising that the resulting solutions, i.e., ML models, are often equally incomprehensible even for experts [7]. For cases where sound argumentation about quality or specific properties is needed, additional goals towards explainability can thus be desirable (see #4 [23] and Art. 13 [15]). As the terminology is not always consistent among the different documents, nor are there precise definitions in all cases, we consider (technical) explainability one aspect of transparency in the context of AI, while transparency is related to further aspects such as traceability (see #4, 7 [23] and Art. 12 [15]) and the provision of information (see #1, 2, 4, 7 [23] and Art. 13, 52 [15]). Finally, the High-Level Expert Group and the European Commission have emphasized human autonomy and oversight among the most important requirements for the use of AI. While the former addresses risks stemming from the sheer fact that the automation provided by an AI application may (in some form) limit or manipulate human agency (see "fundamental rights impact analysis" [23] and Art. 5, 6 [15]), the latter aims at ensuring effective human supervision and, if necessary, intervention (see #1 [23] and Art. 14 [15]). As such, it appears that a broader conception of quality has been adopted in the context of AI, which is well captured by the term "trustworthiness".
While a consensus is emerging on the key dimensions of trustworthy AI, especially at the European level, practical test procedures are needed to evaluate whether the corresponding quality characteristics are actually present in a given AI application. Notably, research regarding the measurement and implementation of AI trustworthiness is broad and ongoing [24], [22], [45]. However, while it is already apparent that effective measures to ensure properties such as reliability, fairness, or transparency often require access to the training data, the design process, and the representation of the output, there are still several challenges to be addressed in the concrete verification of requirements [62], [5]. For example, as individual parts of a complex ML model are typically hard to interpret [63], quality aspects cannot easily be broken down to single components (see [39] for a discussion of single-component evaluations). Further challenges arise as AI applications often operate in complex environments which cannot be fully understood and captured by humans themselves (e.g., the indicators of diseases in medical data). Thus, concise technical criteria for evaluating the coverage of training and test data are often missing (see [19] for a combinatorial perspective). In open-world contexts in particular, it is typically not possible to quantify the application domain precisely. Looking further at the example of reliability: in contrast to other engineering disciplines such as construction, where one can guarantee by computational proof (or at least reach a very high assurance level) that a building is statically safe (provided no huge earthquake occurs), the verification of ML-based applications proves challenging. While promising approaches to attest the flawless functionality of an ML model are being developed (e.g., SMT solvers [35], linear relaxation [13], and branch-and-bound algorithms [61]; see also [40] for a survey of verification algorithms), they are not yet sufficiently advanced to be of use for practical AI applications [5]. Additional complexity emerges in case the ML models are continuously trained during operation. Notably, even if the challenge of conducting an AI system audit were solved, giving a warranty would still appear difficult given the dynamics of ML. At the same time, however, AI systems are already being applied to many areas of our lives, and their trustworthiness must be ensured.
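To make the idea behind the relaxation-based verification approaches mentioned above more tangible, the following is a minimal numpy sketch of interval bound propagation through a single ReLU layer. The weights, input, and perturbation radius are purely hypothetical; the sketch only illustrates the underlying principle and is not one of the cited algorithms.

```python
# Minimal sketch: interval bound propagation for a single ReLU layer,
# illustrating the idea behind relaxation-based robustness verification.
# Weights, input, and perturbation radius are hypothetical examples.
import numpy as np

def relu_layer_bounds(W, b, x, eps):
    """Propagate the L-infinity box [x - eps, x + eps] through W x + b, then ReLU."""
    center = W @ x + b
    radius = np.abs(W) @ np.full_like(x, eps)   # worst-case spread of the affine map
    lower, upper = center - radius, center + radius
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)  # ReLU is monotone

W = np.array([[1.0, -2.0], [0.5, 1.5]])
b = np.array([0.1, -0.3])
x = np.array([0.4, 0.2])

lo, hi = relu_layer_bounds(W, b, x, eps=0.05)
print("output lies in", list(zip(lo, hi)))
# If, e.g., the lower output bound of the correct class stays above the upper
# bounds of all other classes, the prediction is verifiably robust in this box.
```

Such bounds become loose for deep networks and large perturbations, which is one reason why the cited approaches are not yet practical for many real AI applications.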

The how of trustworthy AI
As elaborated in the previous chapter, trustworthiness has been adopted as a broad notion of quality for AI applications, exceeding the classic concepts of IT security and safety. While requirements and recommendations that characterize "trustworthy AI" have been presented in numerous guidelines [34], they are introduced at an abstract level and leave it to a large extent open which measures or methods should best be taken to meet them [21], [43]. Overall, there is a lack of standards that would concretize requirements [11], and technical verification remains a challenge (see chapter 2). At the same time, numerous organizations already use AI in practice and have high quality expectations for their systems. Thus, it is crucial that they take into account the imponderables caused by AI and find a systematic way of dealing with them. To this end, we observe that two fundamental perspectives must work together: i) a high technical quality of the AI systems themselves is required, and ii) the organization should make appropriate preparations (e.g., establish structures, processes and roles) for handling its AI applications and their development in a trustworthy manner. In the following, we refer to these two perspectives as the product and the organizational perspective.
The product perspective, detailed in Section 3.1, focuses on an individual "product", meaning here an AI system, within its application context. The approach presented serves to systematically evaluate the trustworthiness of an AI application with respect to the dimensions mentioned in the previous chapter. It is based on a structured analysis of risks and, moreover, involves their mitigation through mostly technical but, in parts, also non-technical measures along the lifecycle of the application.
Notably, risks can extend well beyond the initial rollout, either because they were missed during the conception of the system or because they did not even exist at that time. Such continuous or latent issues need to be addressed at the organizational level, and the corresponding (organizational) superstructure is discussed in Section 3.2 (we refer to [53] and [41] for the importance of management support for achieving organizational objectives in the implementation of classic IT systems). This superstructure not only needs to reflect long-lasting risk situations, but also governs the general policies of the organization towards the dimensions of trustworthy AI and orchestrates general, often process-oriented, measures to maintain these policies in practice. We introduce AI management systems as potential candidates for such a superstructure and explore their role in operationalizing trustworthy AI from the organizational perspective.

Approach from the product perspective
Judging by the notion of trustworthiness endorsed by current EU-level documents, it is apparent that trustworthy AI can, in general, only be achieved by a broad analysis of both the application and its environment, accompanied by careful risk mitigation measures and checkups. However, existing test procedures for traditional IT systems cannot be straightforwardly transferred to AI (see chapter 2) and, moreover, do not address intangible risks related to discrimination and human autonomy, among others. For these reasons, novel technical foundations are needed for the assessment and control of AI risks. Below, we elaborate on the challenges in this regard and present an approach for the systematic evaluation of trustworthy AI, with a particular focus on AI applications based on ML technologies.
Since the recommendations and requirements for trustworthy AI are abstract [21], [43], they must first be operationalized further in order to evaluate a specific AI application. Here, we observe three main challenges. First, various aspects of trustworthiness are not readily quantifiable by metrics or key performance indicators. For instance, it is unclear how to measure the user autonomy afforded by an AI application or the degree of its explainability. Second, the concrete requirements and evaluation standards typically depend strongly on the application context as well as on the specific AI method used. For example, while fairness is irrelevant for use cases in production plants, there are high risks in applications that support critical decisions, such as algorithms for credit scoring or the rating of job applicants. Continuing with this example, over 20 fairness metrics have been proposed [56], each of them quantifying a different aspect of discrimination. However, there is no statement by official bodies about which metric(s) should be applied for a given use case, nor which target interval to rate as acceptable. Third and lastly, the disparate objectives associated with trustworthiness may conflict with one another, leading to conflicting system-level requirements. One reason is that numerous requirements reflecting societal values are detached from performance-related quality requirements. For instance, a perfect predictor trained on a discrimination-inherent ground truth violates the principle of fairness, creating a conflict between non-discrimination and performance optimization. Similarly, the prediction capabilities of an ML model may be compromised if personal information is removed from the training dataset for privacy reasons.
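To illustrate why the choice of fairness metric matters, the following minimal sketch computes two common group-fairness metrics on the same, entirely hypothetical, predictions and group labels; the two metrics can point in different directions on identical data, which is why metric choice and target intervals are use-case specific.

```python
# Minimal sketch: two group-fairness metrics computed on hypothetical data.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])   # hypothetical ground truth
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])   # hypothetical model decisions
group  = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

def statistical_parity_diff(y_pred, group):
    """Difference in positive-decision rates between the two groups."""
    return y_pred[group == "a"].mean() - y_pred[group == "b"].mean()

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr("a") - tpr("b")

print("statistical parity difference:", statistical_parity_diff(y_pred, group))
print("equal opportunity difference: ", equal_opportunity_diff(y_true, y_pred, group))
```

On this toy data the decision rates differ between groups while the true-positive rates do not, so the two metrics already lead to different conclusions about discrimination.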
Interestingly, some of the aforementioned challenges can also be found, for example, in IT security and functional safety, where the objectives of resistance to manipulation or malfunction often lead to very different quality criteria for different systems [58], [3]. Similar to the risk-based approaches established in these fields, we believe that the specification of trustworthiness criteria for an AI application must be based on a comprehensive risk analysis. Moreover, given the variety of methods for risk mitigation, the measures for meeting the defined criteria must also be adapted to the respective AI application. Accordingly, for assuring and evaluating trustworthy AI, we propose an approach consisting of two consecutive phases: i) top-down and ii) bottom-up. While the top-down phase serves to identify and break down risks into specific criteria for the AI system, the bottom-up phase encompasses the (likewise specific) measures taken to work towards meeting these criteria and argues for their fulfillment. In the following, we briefly present the main steps of this approach, which we have described in more detail in [49] and [50].
As a basis for systematically analyzing and breaking down AI risks in the top-down phase, especially those stemming from the use of ML, we propose a scheme along the six dimensions presented in chapter 2 (see Figure 1). Since each dimension covers numerous aspects which must be weighted and evaluated differently depending on the application context, it is necessary to subdivide each of them into more granular risk areas [49]. For instance, regarding privacy, different evaluation standards with respect to personal and business-sensitive data may emerge. Similarly, when defining specific criteria with respect to transparency, a distinction should be made, for example, between explainability for technical experts and information for users and affected persons. Finally, our proposed scheme includes dedicated risk areas "control of dynamics" to take into account risks which arise during operation, such as model and data drift.
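Purely as an illustration (the authoritative subdivision of the scheme is given in [49]), such a scheme could be organized as a simple mapping from dimensions to risk areas; the dimension and area names below loosely follow the examples in the text and are not the official terminology.

```python
# Illustrative sketch only: one possible way to organize a risk scheme as data.
# Dimension and risk-area names are assumptions loosely based on the text; the
# authoritative scheme is described in [49].
RISK_SCHEME = {
    "reliability":            ["performance in the application domain", "robustness", "control of dynamics"],
    "safety and security":    ["functional safety", "attack resistance", "control of dynamics"],
    "privacy":                ["personal data", "business-sensitive data", "control of dynamics"],
    "fairness":               ["non-discrimination", "performance across groups", "control of dynamics"],
    "transparency":           ["explainability for experts", "information for users", "traceability"],
    "autonomy and oversight": ["human agency", "effective human supervision"],
}

def relevant_risk_areas(relevant_dimensions):
    """Return the risk areas needing an in-depth analysis for the given dimensions."""
    return {dim: RISK_SCHEME[dim] for dim in relevant_dimensions}

print(relevant_risk_areas(["fairness", "privacy"]))
```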
The top-down phase of our approach (see Figure 2, l.h.s.) serves to specify trustworthiness criteria for measuring the control of the AI risks in the scheme presented. To increase efficiency, this phase begins with a protection requirement analysis at the dimension level, where an evaluation of the potential damage determines whether a dimension is relevant for the AI application under consideration. Only for relevant dimensions does an in-depth analysis of each associated risk area follow. Based on the results, targets are formulated describing under which circumstances residual risks are acceptable. These targets are then translated into (preferably quantitative) criteria for the specific AI application. Let us illustrate this with the example of ML-based credit scoring. Since this application has an impact on the financial capabilities of individuals, fairness will most likely be among the relevant dimensions. The in-depth risk analysis should be informed, e.g., by the applicable anti-discrimination legislation, and help to identify input features which characterize potentially disadvantaged population groups. Lastly, use-case-adequate fairness metrics (and target values, if applicable) are chosen, such as statistical parity and counterfactual fairness [57].
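As a hypothetical illustration of what such a quantitative criterion could look like for the credit-scoring example, the following sketch checks the ratio of approval rates between groups against an assumed threshold of 0.8 (a common rule of thumb, not a target value prescribed by any official body).

```python
# Hypothetical sketch of a criterion resulting from the top-down phase:
# the ratio of approval rates between groups must not fall below an assumed 0.8.
import numpy as np

def approval_rate_ratio(decisions, protected):
    """Ratio of positive-decision rates, protected group over reference group."""
    return decisions[protected].mean() / decisions[~protected].mean()

decisions = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])                  # hypothetical loan approvals
protected = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)      # hypothetical group indicator

print("criterion met:", approval_rate_ratio(decisions, protected) >= 0.8)
```

In the approach described here, both the metric and the acceptable interval would be fixed during the top-down phase, informed by the risk analysis and applicable legislation.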
After the trustworthiness criteria for the AI application have been defined, the bottom-up phase of our approach assures and argues that they are fulfilled. To this end, measures (preferably technical) for risk mitigation and testing, which often need to be specifically adapted to the AI technology used, should be implemented where applicable. We distinguish between four categories of measures, see Figure 2 (r.h.s.): i) data, ii) ML component, iii) embedding, and iv) operation. These categories, however, should not be considered separately, as they represent different starting points for risk control along the life cycle. Since our approach focuses on AI applications whose functionality is typically derived directly from data, the selection and quality of data are of particular importance. Complementarily, risks should be considered during the design and training procedure of the ML component included in the system (meaning the ML model and, if applicable, pre- and post-processing). Moreover, in numerous applications, the ML component operates in interaction with further, non-ML components (e.g., rule-based ones), which may serve for logging, monitoring or otherwise processing in- or outputs. Unlike for the ML component, conventional measures and tests can be used for this embedding. Lastly, ensuring the trustworthiness of an AI application does not end at the development stage, since various levels of support and maintenance are needed during operation. This cannot, or rather should not, be addressed by technical solutions alone, but should involve human oversight as well as allow for the option of human intervention in critical situations. Returning to the example of credit scoring, several measures can be employed to work towards fulfilling the set target values for the chosen fairness metrics, for example bias-removing pre-processing of the training data or post-processing of the ML model outputs. Moreover, a monitoring of fairness indicators should be installed to check, e.g., for potential deviations from the target values during operation.
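A minimal sketch of such an operational measure, with assumed interfaces and thresholds, could look as follows: a rolling fairness indicator is tracked during operation and deviations from the target interval are flagged for human review.

```python
# Minimal sketch (assumed interfaces and tolerance): monitoring a fairness
# indicator during operation for the credit-scoring example.
from collections import deque

class FairnessMonitor:
    def __init__(self, tolerance=0.1, window=500):
        self.tolerance = tolerance
        self.records = deque(maxlen=window)   # rolling window of (decision, is_protected)

    def record(self, decision, is_protected):
        self.records.append((decision, is_protected))

    def parity_difference(self):
        prot = [d for d, p in self.records if p]
        rest = [d for d, p in self.records if not p]
        if not prot or not rest:
            return 0.0
        return sum(prot) / len(prot) - sum(rest) / len(rest)

    def check(self):
        """True if the rolling indicator is still within the target interval."""
        return abs(self.parity_difference()) <= self.tolerance

monitor = FairnessMonitor()
monitor.record(decision=1, is_protected=True)
monitor.record(decision=0, is_protected=False)
if not monitor.check():
    print("fairness indicator outside target interval -- trigger human review")
```

In practice such a check would feed into the human oversight and intervention mechanisms described above rather than acting autonomously.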
The bottom-up phase (see Figure 2, r.h.s.) encompasses the implementation of (technical as well as non-technical) measures with respect to the four above-mentioned categories. Based on their documentation, a safeguarding argumentation is developed for each risk area under consideration in order to demonstrate, if possible, that the criteria from the top-down phase are met. Subsequently, a conclusion is drawn per dimension on whether the criteria are met or not. Finally, if residual risks still exist, for example due to conflicting requirements between the dimensions, trade-offs should be discussed and carefully weighed prior to the final judgement about the application's trustworthiness.

Approach from the organizational perspective
Looking at the recommendations and requirements for trustworthy AI in detail, one quickly recognizes that meeting them is not limited to the AI system, but also involves the organization in charge, even if this is not explicitly demanded. This becomes apparent, for example, in the discussion of trade-offs mentioned in the previous section: while the product perspective provides technical foundations for analyzing and treating risks in relation to a specific AI application, it cannot generally determine how conflicts between system requirements should be resolved. For instance, in the credit-scoring example above, a trade-off between performance (reliability) and fairness might exist. Decisions on such trade-offs should be guided by organizational policies and, where applicable, take into account further requirements such as non-AI sector regulation or budget. In general, this also affects which AI risks the organization considers acceptable. In parallel, setting up standardized processes and procedures, also at the implementation level, can help to effectively achieve a consistent level of trustworthiness. Below, we present how a trustworthy handling of AI can be approached in a structured manner and, if successful, demonstrated using a dedicated AI management system. A generally established means for organizations to achieve defined objectives purposefully and accountably are management systems (MSs). Following [33], MSs constitute the "set of interrelated or interacting elements of an organization to establish policies and objectives, as well as processes to achieve those objectives" [ibid.]. Initially, it is not specified which goals are to be achieved; for example, management systems are being used for assuring information security [27], product or service quality [28], and environmental sustainability [29] within organizations. Aiming at combining all organizational units and processes which are directed at set objectives into a clear framework, an MS inevitably affects the entire organization, from leadership to specific technical and organizational measures. Typical parts or tasks within a management system (as stated in [36], [44]) include:
- formulating objectives in the form of policies,
- analyzing risks and opportunities with respect to these objectives,
- defining roles and responsibilities for specific (sub-)objectives,
- (…) planning and implementing processes and the measures required to achieve them,
- and planning, implementing and evaluating reviews of the achievement of objectives.
Given the above description, an appropriately designed management system could be a suitable means to create an environment in which it is ensured that the technical as well as non-technical requirements for trustworthy AI are met. Unlike in the product perspective, such an MS is at first detached from individual applications but can eventually impact their trustworthiness. The European Commission also appears to consider MSs an appropriate approach, since it requires both a risk management system and a quality management system for providers of high-risk AI systems (Art. 9 and 17 [15]). While the proposal for AI regulation lists basic processes and activities that such MSs should encompass, their specific design needs to be further operationalized. To provide effective support for an organization in handling its AI applications in a trustworthy way, we believe that a management system should be AI-specific in each of the points listed above. First, starting from the leadership level, the trustworthiness dimensions discussed in chapter 2 should be reflected in policies and awareness of them should be raised within the organization. Second, the notion of (organizational) risk must be extended to the broad concept of trustworthy AI. For example, a credit-scoring application that systematically refuses loans for women while approving them for men with equal financial provision can cause financial damage to the provider and, foremost, harms the personal rights of women. Here, by formulating guidelines for risk analysis (e.g., based on the approach presented in Section 3.1), an AI management system can help not only to consider potential (negative) effects on the organization itself but also to examine how its AI products might affect individuals and society. In addition, as mentioned, decisions must be taken (informed by organizational policies) regarding risk acceptance levels and specific trustworthiness criteria. To this end, procedures such as stakeholder consultation for choosing use-case-adequate fairness metrics might be needed. Third, being accountable for decisions is an essential aspect of a trustworthy handling of AI. Given the residual risks which might exist in the use of AI, responsibilities and liabilities must be assigned to certain roles within the organization. Moreover, regarding the challenge that AI trustworthiness combines diverse objectives which may conflict with each other (e.g., performance vs. fairness, see Section 3.1), a committee such as an "Ethical AI Review Board" should be established to discuss, agree upon, and be accountable for trade-offs [23]. Fourth, as indicated above, numerous technical foundations emerging from the product perspective require an organizational counterpart, i.e., an activity or process (preferably standardized) that puts them into practice. Notably, due to the dynamics inherent to ML, "post-market monitoring" is crucial (Art. 61 [15]). Furthermore, organizations must deal with the fact that AI-specific risks usually cannot be completely safeguarded against, and unforeseen errors may occur. In this respect, non-technical measures might also be needed, such as mechanisms that allow issues, for example related to discrimination, to be flagged or that allow for redress [23]. In order to create an AI-sensitized environment, these (non-)technical implementation levels should be accompanied by adequate resource planning (e.g., to assemble diverse developer teams) as well as training measures.
Lastly, specific risks and challenges, such as fairness or data quality in general, call for continuous and critical re-evaluation. Audits of an AI application (internal or external) must be carefully planned and prepared by the organization, which, for instance, needs to produce technical documentation (particularly if the proposal for AI regulation is adopted, see Art. 11 [15]). The success of management systems in other domains (such as IT security and product quality, see [32]) can in part be attributed to their standardization, which captures best practices and thus enhances operationalization as well as comparability across organizations. For instance, the international standard for quality management systems (QMS) has been in use for over 30 years, and more than 1 million certificates have been issued [38]. Interestingly, given that the European Commission requires management systems for both quality and risk (see above), risk management has been standardized as well [31]. However, neither of the standards mentioned provides guidance for organizations on how to deal with the specific challenges and novel risks that arise from the use of AI. Therefore, a joint working group (ISO/IEC JTC 1/SC 42/WG 1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) is developing a set of standards to address different aspects of the use of AI technologies. Two of these, currently under development, are the international standards for AI risk management [25] and AI management systems [26].
Judging by the Working Draft of the international standard for AI management systems, it provides a suitable tool for organizations to address the requirements for trustworthy AI presented in [23] and [15] (for an in-depth analysis, we refer to [44]), despite not formally being a QMS as demanded in [15]. While the main part of the standard defines requirements for the basic framework of an AI MS, the annexes provide controls, i.e., recommendations for its implementation (whereby it should be noted that technical aspects are only treated by controls). For example, the Working Draft lists "fairness, security, safety, privacy, transparency and explainability, accountability, availability, maintainability, quality of training data, and AI expertise" as possible AI-related organizational objectives when managing risks (Annex B, [25]), thus directing risk management towards the dimensions of chapter 2. Another recommendation in this standard is to integrate an impact assessment into the organization's risk analysis process. Interestingly, the document introduces the term "impact" explicitly in the context of the health and safety of society, as well as traditions, values, the environment, privacy, and fairness, among other things, thus extending the concept of organizational risk to the notion of trustworthiness. Notably, the request for an impact assessment differentiates this Working Draft from other MS standards, making clear that such standards would not be sufficient to ensure a trustworthy handling of AI within an organization.
Lastly, it is noteworthy that MS standards not only provide support for organizations to achieve their goals in a responsible and accountable manner, but are also a generally accepted means to generate evidence thereof. Accordingly, based on the previous discussion, we consider certification of an AI management system to be the basis for trustworthy AI and a suitable way for organizations to demonstrate their trustworthiness. Nonetheless, compliance with an MS standard is not sufficient for assuring the trustworthiness of an AI system itself. As technical verification cannot be waived, numerous challenges still need to be overcome in order to conduct a comprehensive audit at the system level.

Conclusion
In this paper, we have outlined essential quality dimensions of trustworthy AI and argued that two different perspectives are involved in their operationalization, namely the product and the organizational perspective. While the product perspective deals with the risks emerging from a specific AI application, the organizational perspective considers the handling of AI systems in view of organizational objectives and risks. For each of the perspectives, we have presented an approach for operationalizing and assuring trustworthy AI. The first approach serves to systematically evaluate an AI application from the product perspective. The second approach presents a management system specific to AI. Overall, we conclude that AI trustworthiness can only be achieved through a harmonized interplay between the product and organizational perspectives, since risks emerging from the product must be reflected in organizational policies as well as monitored, and in parts also mitigated, by corresponding processes. In addition, as shown in both approaches presented, we believe that the implementation of concrete measures, from either perspective, should be based on a comprehensive risk analysis and specifically adapted to its results.
Our discussion of trustworthy AI focuses on ML-specific key dimensions for which there is a consensus in both academia and society. Beyond these, there are other important principles, such as sustainability and environmental friendliness ("underrepresented compared to other ethical dimensions" [34]), which could be harmed by the use of ML, for example due to the high energy consumption of training models. We believe that these principles are generally related to numerous non-ML-specific aspects and thus cannot be addressed solely from a product perspective; other approaches are needed. For instance, environmental sustainability cannot be assessed without considering the resources from which electricity is generated.
Regarding the two perspectives introduced in this paper, we observe that the operationalization from the organizational perspective is further advanced, since best practices are gradually emerging with respect to processes in AI development and operation (e.g., facilitating automation and tracking via MLflow, DVC, or Jenkins) as well as AI management (as reflected in current standardization activities [25], [26]). As a next step, audit and certification schemes are needed, and it can be assumed that, once processes and procedures have been standardized, these schemes can be developed and applied in a similar fashion as in non-AI domains. From the product perspective, numerous testing tools are available for assessing specific trustworthiness-related characteristics of AI systems, despite the challenges presented (see chapter 2). However, there is an apparent gap between the results of such tools and the formal requirements for deeming a risk sufficiently mitigated or controlled. This gap between tools and abstract trustworthiness requirements should be addressed by future research. While testing methods need to be further developed, a promising approach for concretizing requirements is to define (analogously to the protection profiles of the Common Criteria [30]) adequate classes of AI use cases (e.g., regarding the application environment or sector) and to specify criteria as well as evaluation standards in more detail for the respective classes.
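As a hypothetical illustration of the tracking practices mentioned above, the following sketch logs trustworthiness-related indicators alongside the usual training metadata with MLflow; the run name, parameter names, values, and artifact file are illustrative assumptions, not prescribed by any guideline or standard.

```python
# Hypothetical sketch: recording trustworthiness-related indicators with MLflow
# so that training runs remain traceable for later audits. All names and values
# are illustrative assumptions.
import mlflow

with mlflow.start_run(run_name="credit-scoring-v2"):
    mlflow.log_param("training_data_version", "dvc:rev-3f2a1c")          # assumed DVC revision tag
    mlflow.log_param("fairness_metric", "statistical_parity_difference")
    mlflow.log_metric("accuracy", 0.87)
    mlflow.log_metric("statistical_parity_difference", 0.04)
    mlflow.log_artifact("model_card.md")                                  # assumes this file exists locally
```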

Figure 1: AI risk scheme. For a detailed elaboration on the dimensions, see [10], and on the risk areas, see [49].

Figure 2: Visualization of the product perspective approach. The l.h.s. shows the top-down phase, the r.h.s. the bottom-up phase.