Subjective logic reasoning: an urn model intuition and application to connected automated driving

Subjective Logic (SL) is a powerful extension of classical probability theory that can handle small sample sizes and the statistical uncertainty that results from them. However, SL is a rather abstract theory and has received limited attention in the field of automation so far. In this work, we present a new urn model intuition for SL that connects SL with the Pólya urn scheme. The application of SL-based reliability estimation in automation is demonstrated on two examples from the domain of connected automated driving: first, to assess external information for motion planning on-board the vehicle and, second, to rate connected vehicles as agents within a large-scale multi-agent system.


Introduction
Cooperative automated driving can significantly increase traffic efficiency and improve road safety. However, if the restrictive requirements of motion planning and other safety-critical modules in Connected Automated Vehicles (CAVs) are not met, this may result in significant harm [36,39]. Yet, full reliability of cooperative information distributed through a vehicular Multi-Agent System (vMAS) is often assumed without question, although it is not given in general [39]. While already widely adopted in avionics and navigation [31], monitoring and assuring the system's functional performance plays an increasing role in the automotive industry and is generally termed Safety of the Intended Functionality (SOTIF). Thus, to reach SOTIF for connected automated driving, accounting for the reliability of cooperative information is mandatory. We refer to reliability as the extent to which systematic errors, e. g., from a silent failure of a subsystem, can be excluded. Since in practice only small sample sizes are available, the evidence-based statistical uncertainty must be accounted for.
While classical probabilistic approaches can easily model the uncertainty of state estimates in terms of covariance matrices, they lack the ability to explicitly model the evidence-based statistical uncertainty [11]. Yet, for small sample sizes, the statistical uncertainty is crucial to keep a probabilistically inferred result meaningful. For example, consider a coin that is tested for fairness by tossing it three times. In this example, frequentist inference can never conclude that the coin is fair, as a relative frequency of 0.5 for each side of the coin cannot result from three tosses. Methods like [23,24] based on evidence theory [35], in turn, have an explicit representation of statistical uncertainty. However, they suffer from unintuitive or even wrong results when the incoming information is highly contradictory [40]. In connected automated driving, the reliability of a source of cooperative information often has to be estimated quickly based on a small number of measurements. Furthermore, the information on which the reliability decision is based might be highly contradictory.
This challenging task can be performed using Subjective Logic (SL). SL is a recent, powerful mathematical theory that extends classical probability theory as well as evidence theory and bridges the gap between both [11]. However, SL is a complex theory with limited intuition and has therefore received limited attention so far. In fact, the community was quite doubtful about the theory [5]. Therefore, after a short summary of the related work in Section 2, an alternative intuition for SL is presented in Section 3 using Pólya's urn model [7]. The SL-based reliability estimation can deal with small sample sizes and yields intuitive results even if the reliability is estimated from highly contradictory information. Thus, it overcomes two major drawbacks of existing reliability estimation schemes. This is demonstrated on two applications from the domain of connected and automated driving in Section 4. First, the SL-based reliability estimation from [20] for external information on traffic participants for motion planning on-board a CAV is presented. It is extended by an SL-based assessment of the estimation's reliability using the urn model. Second, the SL-based estimation of the reliability of CAVs as agents in a large-scale vMAS from [21] is summarized. The article closes with some conclusions in Section 5.

Related work
Since the early works on SL [10], several tutorials, e. g., [12], have been published, giving some intuition for SL. Most of it is summarized in the textbook [11]. However, as opposed to this work and to the best of our knowledge, none of these publications discusses the connection between SL and the Pólya urn scheme or offers an urn model intuition for this theory. Instead, the barycentric triangle in combination with numerous examples is usually used to illustrate SL. The barycentric triangle is a triangle with belief, disbelief, and uncertainty at its corners, where each point within that triangle represents an SL opinion.
Besides, there are several publications that present applications of SL, mostly for trust management in computer networks [2,13,39], information fusion [15], and communications [3,6]. In this context, decision making has been addressed on an abstract level, e. g., in [39]. In contrast, this work presents a concrete application of SL in the field of automation, namely reliability estimation and decision making under uncertainties for CAVs.
So far, other methods, such as Bayesian inference, the Dempster-Shafer theory [35], or neural networks, have been used for reliability estimation [8,23,24]. These methods come with the drawback that they either cannot explicitly differentiate between statistical evidence and prior or that unintuitive results occur in case of strongly contradicting information sources [40]. The presented SL-based reliability estimation scheme overcomes these drawbacks.
Motion planning under uncertainties so far has been addressed through classical approaches [1,4,30,37,38], set-based methods [17,25,27,28], Markov Decision Processes (MDP) [9,34], and communication-based approaches [33]. None of these works, however, accounts for the possibility of having an unreliable environmental model. We distinguish unreliable from uncertain in that uncertain refers to known statistical effects, whereas unreliable refers to possible systematic model errors. In this work, an example is summarized that integrates SL reliability estimates into decision making for motion planning and, thus, accounts for unreliable cooperative information.

An urn model intuition to subjective logic
In this section, the basic object of SL, the so-called Opinion, as well as the most important SL operators are derived from Pólya's urn model [7] as an intuitive alternative to the original derivation by Jøsang in [11]. Consider a classical urn with red and black balls in it, where the probability of drawing the respective colors has to be estimated from a sample of draws with replacement. In the classical maximum likelihood approach, the respective probabilities are calculated as the fraction of observations of a specific color relative to the overall number of observations. As long as the sample size is sufficiently large, this approach yields reasonable approximations of the true distribution; however, it fails for insufficient statistics. While prior knowledge in combination with maximum a posteriori estimation can improve the situation, there are still situations that cannot be handled appropriately. For example, if the actual probabilities are $\mathbf{p}^* = (0.5, 0.5)$, but only three balls are drawn from the urn, it is impossible to correctly estimate the probabilities from the sample, even if the correct probabilities are assumed as prior $\mathbf{a} = (0.5, 0.5)$.
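The three-draw example can be checked by enumeration. The following minimal Python sketch lists every maximum likelihood estimate that a sample of three draws can produce; the function name `ml_estimates` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def ml_estimates(n_draws):
    """Return all maximum likelihood estimates of P(red) for n_draws draws.

    The ML estimate is the relative frequency k / n_draws, where k is the
    number of red balls observed in the sample.
    """
    return [Fraction(k, n_draws) for k in range(n_draws + 1)]


estimates = ml_estimates(3)
print([str(e) for e in estimates])    # every estimate reachable with three draws
print(Fraction(1, 2) in estimates)    # whether the true value 0.5 is reachable
```

For an odd number of draws, the value 0.5 is not among the reachable relative frequencies, which is the situation described for the sample of three balls.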
To overcome the problem of insufficient statistics, some sort of upsampling is needed to estimate the likelihood of each combination of probabilities $\mathbf{p} = (p, 1 - p)$, $p \in [0, 1]$, where these probabilities can be seen as the parameters of the urn model. Intuitively, the empirical estimate $(\hat{p}, 1 - \hat{p})$ of red and black balls in the urn is likely to be close to the corresponding actual distribution $(p^*, 1 - p^*)$ for many samples. At the same time, it can be expected that with a decreasing number of samples, the prior distribution will become increasingly important. A statistical model suitable for such an upsampling with reinforcement is the Pólya urn scheme [7], which, however, does not account for prior knowledge.
Hence, to overcome the problem of insufficient statistics while accounting for prior knowledge, we extend the classical urn model by two further urns, see Figure 1. The second urn initially contains as many special balls (colored green in our example) as the cardinality $W$ of the sample space $\mathbb{X}$, i. e., $W = \operatorname{card}\{\mathbb{X}\} = 2$ in our example. The third urn contains red and black balls according to the prior distribution. A small sample is drawn from the first urn and put into the second urn, while balls of the same kind are returned to the first urn. Then, the upsampling starts: From the second urn, balls are drawn according to the Pólya urn scheme, i. e., each ball drawn from the urn is replaced by two of the same kind. However, whenever a green ball is drawn, the green ball is put back, but instead of a second green ball, a ball from the third urn is drawn and put into the second urn. Balls drawn from the third urn are always replaced by a ball of the same kind afterwards, so that the distribution in the third urn represents the prior again.
This method can be extended to the multi-dimensional case ($W > 2$) by adding balls of further colors to the first and third urn and adjusting the number of special balls in the second urn according to $W$. Then, for an infinite number of upsampling steps, the likelihood of the generalized combination of probabilities $\mathbf{p} \in [0, 1]^W$ follows a Dirichlet distribution with the parameters $\mathbf{r} = \{r_i\}_{i=1,\dots,W}$ [29]. Here, $r_i$ describes the expected number of balls of a specific color before upsampling, if the green balls are replaced by balls drawn with replacement from the third urn. The likelihood of the generalized combination of probabilities $\mathbf{p}$ can be imagined as the probability that the balls in the first urn actually follow the probability distribution $\mathbf{p}$. Hence, it is a probability on probabilities and therefore is considered a second order probability [11].
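The upsampling procedure described above can be simulated directly. The sketch below is a minimal illustration for the two-color case, not the authors' implementation; the function name `upsample`, the dictionary representation of the urns, and the number of steps are assumptions made for this example.

```python
import random


def upsample(sample, prior, n_steps, rng):
    """Simulate the upsampling in the second urn.

    sample  -- dict color -> number of balls observed from the first urn
    prior   -- dict color -> base rate, i.e., composition of the third urn
    n_steps -- number of draws from the second urn
    rng     -- random.Random instance (kept explicit for reproducibility)
    """
    colors = list(prior)
    urn = {c: sample.get(c, 0) for c in colors}   # second urn: observed balls ...
    urn["green"] = len(colors)                    # ... plus W special green balls
    for _ in range(n_steps):
        kinds = list(urn)
        drawn = rng.choices(kinds, weights=[urn[k] for k in kinds])[0]
        if drawn == "green":
            # The green ball is put back; a ball from the third urn is added.
            added = rng.choices(colors, weights=[prior[c] for c in colors])[0]
        else:
            # Polya reinforcement: the drawn ball returns with a twin.
            added = drawn
        urn[added] += 1
    return urn


rng = random.Random(0)
urn = upsample({"red": 3, "black": 0}, {"red": 0.5, "black": 0.5}, 1000, rng)
colored = urn["red"] + urn["black"]
print(urn["red"] / colored)    # one realization of the upsampled red fraction
```

Each run yields one realization of the color fractions. Repeating the simulation with different seeds gives a spread of fractions, which corresponds to the second order probability discussed in the text.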
Speaking in terms of evidence theory, before upsampling, the fraction of balls of one of the $W$ colors relative to the total number of balls represents the respective element of the statistical evidence or belief mass $\mathbf{b} = [b_1, \dots, b_W]^T$. In turn, the fraction of green balls $u$ represents statistical uncertainty, as a green ball can act as a ball of any color. Mathematically, the urn model can be described completely by the 3-tuple

$$\omega = (\mathbf{b}, u, \mathbf{a}), \tag{1}$$

where $\mathbf{a}$ describes the prior distribution. In turn, the urn model can be projected to a classical probability. For that, instead of upsampling, the green balls in the second urn are replaced by balls drawn from the third urn. Then, the fractions as above are used to determine the projected probability distribution

$$\mathbf{P} = \mathbf{b} + u\,\mathbf{a}. \tag{2}$$

This leads to the definition of an opinion as the basic SL element and its projected probability:

Definition 1 (Subjective Logic Opinion [11]). Let $\mathbb{X}$ be a domain with $\operatorname{card}\{\mathbb{X}\} \geq 2$. Let $X$ further be a random variable in $\mathbb{X}$. Then, the ordered triple $\omega = (\mathbf{b}, u, \mathbf{a})$ according to (1) is termed Subjective Logic Opinion or opinion in short. Moreover, the probability distribution $\mathbf{P}$ calculated from $\omega$ by (2) is termed projected probability of $\omega$.
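As a minimal sketch, an opinion and its projected probability can be computed from the contents of the second urn as follows. The function names are hypothetical, and the projection is assumed to take the standard form from [11], i. e., belief mass plus uncertainty distributed according to the prior.

```python
def opinion_from_counts(counts, base_rate):
    """Build an opinion (b, u, a) from the colored balls in the second urn.

    counts    -- number of observed balls per color
    base_rate -- prior distribution a (composition of the third urn)
    """
    W = len(counts)                      # number of green balls
    total = sum(counts) + W              # all balls in the second urn
    belief = [n / total for n in counts]
    uncertainty = W / total
    return belief, uncertainty, list(base_rate)


def projected_probability(opinion):
    """Replace the green balls by prior draws: P = b + u * a (assumed form)."""
    belief, uncertainty, base_rate = opinion
    return [b + uncertainty * a for b, a in zip(belief, base_rate)]


omega = opinion_from_counts([3, 0], [0.5, 0.5])   # three red balls observed
print(omega)
print(projected_probability(omega))
```

With three red balls and two green balls, three fifths of the mass is belief in red and two fifths is uncertainty, which the projection splits evenly between both colors.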
Furthermore, the equivalent mapping theorem proven in [11] is a direct consequence of the urn model:

Theorem 1 (Equivalent Mapping [11]). Let $\omega = (\mathbf{b}, u, \mathbf{a})$ be an opinion and $\mathrm{Dir}(\mathbf{p}, \mathbf{r}, \mathbf{a})$ a Dirichlet distribution over the same $x \in \mathbb{X}$. Then, for $u \neq 1$, the equivalent mapping (3) transforms the Dirichlet distribution into its equivalent opinion and vice versa. ◼

Additionally, operators need to be defined to combine such SL opinions. To do so, consider again the aforementioned urn model. A first sample A of, e. g., three red balls is drawn from the first urn, and an opinion $\omega_A$ on A is created. Later on, a second sample B is drawn from the urn, e. g., two black balls and a red ball, and a second opinion $\omega_B$ is created. As the two samples are statistically independent, in classical probabilistic reasoning, A and B would be joined to estimate the distribution of the first urn. For the given example, this would yield $(0.67, 0.33)$. For SL, an operator $\oplus$ corresponding to this intuition can be derived, combining the observations from A and B according to (4). This can be transferred to the opinion domain using the equivalent mapping theorem (3), which yields (5). Here, (5c) covers the generalized case that different base rates $\mathbf{a}$ are assumed for $\omega_A$ and $\omega_B$, i. e., the balls in the third urn are differently distributed for A and B, respectively. This leads to the definition of the Aleatory Cumulative Belief Fusion (CBF) operator.
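The example with samples A and B can be sketched in a few lines. The code assumes the standard mapping between evidence and opinions from [11], namely $b_i = r_i / (W + \sum_j r_j)$ and $u = W / (W + \sum_j r_j)$, and treats $\mathbf{r}$ as the counts of observed colored balls. It further assumes equal base rates for both samples. All function names are hypothetical.

```python
def counts_to_opinion(counts, base_rate):
    """Map observed ball counts to an opinion (b, u, a)."""
    W = len(counts)
    total = sum(counts) + W
    return [n / total for n in counts], W / total, list(base_rate)


def opinion_to_counts(opinion):
    """Inverse mapping r_i = W * b_i / u; requires u > 0."""
    belief, uncertainty, _ = opinion
    W = len(belief)
    return [W * b / uncertainty for b in belief]


def fuse_cumulative(op_a, op_b):
    """Fuse two independent samples by adding their evidence counts."""
    counts = [ra + rb for ra, rb in zip(opinion_to_counts(op_a),
                                        opinion_to_counts(op_b))]
    return counts_to_opinion(counts, op_a[2])   # assumes equal base rates


omega_a = counts_to_opinion([3, 0], [0.5, 0.5])   # sample A: three red balls
omega_b = counts_to_opinion([1, 2], [0.5, 0.5])   # sample B: one red, two black
belief, uncertainty, base_rate = fuse_cumulative(omega_a, omega_b)
print(belief, uncertainty)
```

The fused opinion is the one that would have been obtained from a single sample of four red and two black balls, so the uncertainty is lower than for either sample alone.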
Definition 2 (Cumulative Belief Fusion [11]). Let $\omega_A$ and $\omega_B$ be opinions over the same variable $X$ in the domain $\mathbb{X}$. Then, the operator $\oplus$ in

$$\omega_{A \oplus B} = \omega_A \oplus \omega_B \tag{6}$$

with $0 < u_A < 1$ and $0 < u_B < 1$ is called aleatory cumulative belief fusion (CBF), and $\mathbf{b}_{A \oplus B}$, $u_{A \oplus B}$, and $\mathbf{a}_{A \oplus B}$ are calculated according to (5). For accumulating multiple opinions from a set, a corresponding shorthand notation is used.

[Fig. 2: Imperfect observation. The depicted step replaces the first step in Fig. 1(a); the other steps from Fig. 1 remain the same.]

So far, undistorted observability has been assumed, i. e., the color of the balls drawn from the first urn was evident. However, in many practical situations, this undistorted observability is not given, e. g., due to measurement noise. To reflect this imperfect observation, the following procedure, which is depicted in Figure 2, replaces the first step: An additional, i. e., fourth, urn containing white and gray balls is added to the setup. As before, a sample is drawn from the first urn, but the drawn balls are not put into the second urn directly. Instead, for each of the balls drawn from the first urn, one ball is drawn from the fourth urn. Whenever a white ball is drawn, a ball of the same color as its corresponding ball in the original sample is put into the second urn. However, if a gray ball is drawn, a ball of a different (non-green) color than in the original sample is added to the second urn. This process is hidden, so that only the balls put into the second urn are known as the observation. Finally, the upsampling is conducted as described previously and an opinion is created. If multiple entities observe the same sample, each observed sample is individually created by drawing white and gray balls from the additional urn and transformed to an individual opinion by upsampling for each entity.
As opposed to multiple samples of the first urn, multiple observations of the same sample drawn from the first urn are obviously statistically dependent. Hence, the CBF operator (6) is not applicable and it is not possible to reduce the statistical uncertainty by evaluating multiple observations of the same sample. However, the accuracy of the knowledge on the available evidence can be increased by averaging out the noise added to the second urn. The observation accuracy of the entities A and B observing the same sample might differ. This corresponds to different fractions of white and gray balls in the additional urn. If prior knowledge on the observation accuracies is available, this can be used to infer a more accurate estimation of the actually available evidence by weighting the observed evidence according to the observation accuracy using (7), with $w_A$ and $w_B$ being the weights of the observed evidences, respectively. In the opinion space, (7) maps to the weighted Average Belief Fusion (ABF) operator (8).

Another effect that comes with an imperfect observation is the systematic overestimation of the available evidence, as not every ball put into the second urn actually reflects statistical evidence from the first urn. If prior knowledge about the observation accuracy is available, this error can be corrected by using the trust discounting operator:

Definition 4 (Trust Discounting [11]). Let $\omega$ be an opinion over a variable $X$ on domain $\mathbb{X}$ and let $p_d \in [0, 1]$ be a probability. Then, the operator $T(\omega, p_d)$ with (9) is called Trust Discounting operator.
Speaking in terms of the urn model, the discounting probability $p_d$ describes the fraction of white balls in the additional urn. By multiplying the belief mass with $p_d$, the number of balls added to the second urn due to drawing a gray ball is removed in a stochastic sense. Thus, fictional statistical evidence fades into uncertainty.
From the urn model, furthermore, it becomes clear that with an increasing fraction of gray balls, the gap between actual and assumed evidence grows until merging two opinions does not improve the inferred result any more. This particularly holds if the fraction is unknown and thus cannot be corrected for. Therefore, a consistency criterion is needed with which it can be decided whether or not the fusion of two opinions is reasonable. One possible consistency criterion is the likelihood that the opinions converge to the same probability density for an infinite number of samples drawn from the first urn. This likelihood is primarily determined by the statistical uncertainty, i. e., the number of balls from the first urn that have not yet been observed, and the resulting projected probability inferred from the observed sample. The smaller the difference between the projected probabilities (2), the likelier it is that they would end up in the same distribution. At the same time, it becomes less likely that differing projected probabilities would still end up in the same distribution with an increasing number of balls already observed from the first urn.
In practice, the consistency check is often used as an alarm mechanism. Therefore, it is more convenient to formulate an inconsistency measure so that further steps are triggered whenever the measure exceeds a certain threshold. This leads to the definition of the Degree of Conflict (DC):
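The effect of trust discounting on an opinion can be sketched as follows. The code assumes that the discounted belief mass is $p_d\,\mathbf{b}$, as described above, that the removed mass is added to the uncertainty so that the opinion stays normalized, and that the base rate is left unchanged. The function name `trust_discount` is hypothetical.

```python
def trust_discount(opinion, p_d):
    """Discount an opinion (b, u, a) with the probability p_d in [0, 1]."""
    if not 0.0 <= p_d <= 1.0:
        raise ValueError("p_d must be a probability")
    belief, _, base_rate = opinion
    discounted = [p_d * b for b in belief]
    # Belief mass removed by the discounting fades into uncertainty.
    return discounted, 1.0 - sum(discounted), list(base_rate)


omega = ([0.6, 0.0], 0.4, [0.5, 0.5])
print(trust_discount(omega, 0.5))   # half of the evidence is trusted
print(trust_discount(omega, 0.0))   # nothing is trusted: vacuous opinion
```

For $p_d = 1$ the opinion is unchanged, while $p_d = 0$ yields an opinion that consists of uncertainty only.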

Definition 5 (Degree of Conflict [11]). Let $\omega_A$ and $\omega_B$ be opinions over the variables $X_A$ and $X_B$ in the domain $\mathbb{X}$. Then, the measure (10) is called Degree of Conflict (DC).
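A minimal sketch of such a conflict measure is given below. It assumes the standard form from [11], in which the DC is the product of the projected distance, i. e., half the summed absolute difference of the projected probabilities, and the conjunctive certainty $(1 - u_A)(1 - u_B)$. The function names are hypothetical.

```python
def projected_probability(opinion):
    """Projected probability P = b + u * a of an opinion (b, u, a)."""
    belief, uncertainty, base_rate = opinion
    return [b + uncertainty * a for b, a in zip(belief, base_rate)]


def degree_of_conflict(op_a, op_b):
    """Projected distance times conjunctive certainty (assumed form)."""
    p_a, p_b = projected_probability(op_a), projected_probability(op_b)
    distance = 0.5 * sum(abs(x - y) for x, y in zip(p_a, p_b))
    certainty = (1.0 - op_a[1]) * (1.0 - op_b[1])
    return distance * certainty


confident_red = ([0.9, 0.0], 0.1, [0.5, 0.5])
confident_black = ([0.0, 0.9], 0.1, [0.5, 0.5])
vacuous = ([0.0, 0.0], 1.0, [0.5, 0.5])

print(degree_of_conflict(confident_red, confident_black))  # strong conflict
print(degree_of_conflict(confident_red, vacuous))          # no conflict
```

Two confident but contradicting opinions produce a high DC, whereas an opinion without evidence cannot conflict with any other opinion. This matches the intuition that conflict requires both differing projected probabilities and low uncertainty.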
With these intuitively derived definitions of SL elements and operators, many applications from automation can be realized already. However, there exist even more operators and measures if the presented ones are not sufficient, see e. g., [11].

Exemplary SL applications for connected automated vehicles
So far, a new urn model interpretation of SL was given to improve the intuition for the theory. In this section, two applications are given to exemplarily illustrate how SL can be used in the context of connected automated driving.
To estimate the reliability of the cooperative information, some sort of redundancy is required. As redundancy raises cost in most cases, usually only small sample sizes of redundant information are available. Specifically in the case of CAVs, the availability of redundant information additionally depends strongly on the situation, e. g., on a common field of view of external and on-board sensors or on information from several CAVs on the same objects. Since SL can deal with small sample sizes and uses the statistical uncertainty to weight the result of a reliability test according to the statistical evidence supporting it, SL is particularly well suited for such situations in CAVs and outperforms classical probabilistic approaches, which do not provide that information. Set-based methods, in turn, complement SL and can be used together with SL to improve safety, as we will show in the first application example.
The first example, given in Subsection 4.1, is based on [20], where SL is used on-board a CAV to assess the reliability of cooperative information from off-board sources. Here, we extend the results of [20] by further experiments and show how the reliability of the reliability estimation can be assessed using SL and the proposed urn model. Then, it is briefly sketched how this reliability estimate can be integrated into the motion planning scheme from [19,22]. The second example, which is given in Subsection 4.2, summarizes the approach from [21]. It shows how SL can be used in a large-scale multi-agent system to assess the reliability of moving agents such as CAVs.

Reliability estimation of cooperative information
Consider the scenario that a CAV approaches a yield T-junction where buildings occlude the view of the vehicle's sensors on the main road. Without cooperative information, the vehicle has to stop at the yield line, so that the vehicle's perception can sense the upcoming traffic on the main road. Then, the CAV has to wait for a sufficiently wide traffic gap to merge into it. Now, consider additional infrastructure sensors and a Road Side Unit (RSU) at the junction providing (pre-processed) data from the sensors to any connected vehicle. With this cooperative information, the CAV can merge more efficiently by synchronizing its motion to a traffic gap reported by the RSU. However, if the cooperative information is not reliable, e. g., if there actually is a vulnerable road user within the (wrongly reported) traffic gap, this might result in a severe accident. Therefore, a key to the safety of the CAV's merging functionality is an adequate, quickly available reliability measure, based on which the CAV can decide whether it will use the cooperative information while approaching the yield junction. Our approach to solve this task, which was initially presented in [20], is based on SL.
The key idea of the approach is to test the incoming cooperative information from the RSU on four different aspects for consistency and, in combination with the ego perception, for plausibility. If the incoming data is consistent, fits with previously sent information, and matches the information measured through the ego perception, the reliability of the RSU providing the cooperative information is estimated to be high. In turn, if inconsistencies are detected or the information does not match the ego perception of the CAV, the estimated reliability of the RSU is reduced. For the reliability estimation, the measurement uncertainties can vary over time as long as they are given in conjunction with the measurements. In our case, the RSU transmits the respective variances together with the objects. In contrast, the test statistics describing the second order probabilities are assumed to be stationary. This means for our case that the actual reliability of the RSU does not change while the few samples are collected.
In detail, the following four tests are evaluated:

Prediction Test: The prediction test assumes that the RSU provides not only real-time information, but also predictions of the movement of objects, e. g., to account for latency in the system and allow for predictive planning. The test compares buffered predictions from previous time steps with the current measurements, where a high estimated reliability results if the current measurements are consistent with earlier predictions.

Map Test: The map test compares the positions of reported road users with the digital map of the vehicle. The underlying assumption is that an expectation on where road users drive is available, e. g., in the form of a map. If the reported positions of the road users fit well with that expectation, the RSU is considered reliable, while otherwise, e. g., if vehicles drive through houses according to the CAV's map, the estimated reliability is drastically reduced. The latter can result, e. g., from a calibration error.

Ego Localization Test: This test uses the ego localization of the CAV to rate the uncertainties reported in the cooperative information. Usually, the ego localization is very accurate in a CAV as it is needed for motion planning. When the ego vehicle approaches the junction, at some point, it enters the field of view of the infrastructure sensors so that the RSU reports an object at the position of the ego vehicle as well as the corresponding measurement uncertainties in terms of a covariance matrix. If the variance is correct, the Mahalanobis distance [16] between the ego localization and the position of the reported object is expected to be small. In this case, the reported variance is plausible and the RSU is estimated to be reliable.

Ego Perception Test: The ego perception test compares the perception of the ego vehicle with the objects reported by the RSU. If the ego perception detects an object in the junction area, it is supposed to have a corresponding object in the object list reported by the RSU. If so, the estimated reliability of the RSU is increased, while a missing detection of an object in the object list of the RSU leads to a drastic decrease of the estimated reliability.
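The core quantity of the ego localization test, the Mahalanobis distance between the ego position and the reported object position, can be sketched for the planar case as follows. The function name, the example numbers, and the gating threshold are assumptions for illustration: the value 5.991 is the 95 % quantile of the chi-square distribution with two degrees of freedom and is not taken from [20].

```python
import math

CHI2_GATE_95 = 5.991   # assumed gate on the squared distance (2 degrees of freedom)


def mahalanobis_2d(ego_xy, reported_xy, cov):
    """Mahalanobis distance between two 2D positions for a 2x2 covariance."""
    dx = ego_xy[0] - reported_xy[0]
    dy = ego_xy[1] - reported_xy[1]
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # d^2 = [dx dy] * inv(cov) * [dx dy]^T with the 2x2 inverse written out.
    d2 = (syy * dx * dx - (sxy + syx) * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


ego = (10.0, 5.0)          # accurate ego localization
reported = (10.3, 5.4)     # object reported by the RSU at the ego position

d_ok = mahalanobis_2d(ego, reported, [[0.25, 0.0], [0.0, 0.25]])
d_bad = mahalanobis_2d(ego, reported, [[0.0025, 0.0], [0.0, 0.0025]])
print(d_ok, d_ok ** 2 <= CHI2_GATE_95)     # reported variance is plausible
print(d_bad, d_bad ** 2 <= CHI2_GATE_95)   # reported variance is too optimistic
```

The same position offset is plausible for a standard deviation of 0.5 m but implausible for a reported standard deviation of 0.05 m, which is the kind of overconfident variance the test is meant to reveal.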
For the individual tests, the CBF, the ABF, and the trust discounting operator are used in combination with binomial and multinomial SL opinions. Finally, the results from all four tests, each formulated as an SL opinion, are weighted and merged together using the weighted Average Belief Fusion operator (8). This yields an overall opinion on the RSU's reliability. With the weighted ABF operator, the merits of SL as compared to classical probability theory become apparent. As opposed to classical methods, the operator can use the uncertainty of SL to weight the respective test results according to their expressiveness. For example, if the ego perception test cannot be performed since no object is seen by the CAV's on-board sensors, the uncertainty becomes 1 and the fusion neglects this test. If classical statistical methods without additional measures were used, the ego perception test result would equal the prior distribution, which then would be fused with the results from the other tests.
As an example, Fig. 3 shows a real-world scenario to which the ego perception test is applied, while Figure 4 shows the estimated reliability according to the test. In the example, laser scanners are used on the infrastructure side as well as on-board the CAV to perceive other traffic participants. It can be seen that there is a bicyclist in the perception of the CAV, marked in red, that is not included in the RSU data. As can be seen in Figure 4, this leads to a massive decrease of the estimated reliability. In Figure 5, the overall opinions, represented as Dirichlet distributions, are evaluated for 30 manually labeled sequences. Remember that the opinions represent second order probability, i. e., a probability of probabilities. Therefore, the probability of the data's reliability $p_{rel}$ being at least $x\,\%$ can be retrieved by integrating the given distributions over this confidence interval, i. e., from $x/100$ to $1.0$. With that, it can be seen that for every reliable sequence, the probability that the actual reliability is more than 90 % is always more than 90 %. In contrast, for the sequences labeled as not reliable, the probability that the actual reliability is more than 90 % is negligible. This demonstrates that the SL approach is able to estimate the reliability of a source of cooperative information (in our case the RSU) based on few samples in real-world applications. The reliability estimation scheme profits from the statistical uncertainty provided by the SL opinions, as the overall fusion can be adapted to the available information.
With the results from Figure 5, we evaluate the reliability of the reliability estimation with SL and the urn model. To do so, we interpret the reliability of the estimation scheme as a random variable $x$ sampled from the domain $\mathbb{X}$. Then, $\mathbb{X}$ contains $W = 2$ elements: $X$, that the estimation works, and $\bar{X}$, that the estimation does not work. In the urn scheme, $X$ and $\bar{X}$ are represented as red and black balls in the first urn, respectively. Each plot in Figure 5 represents a red or black ball in the second urn, i. e., statistical evidence supporting either that the estimation works or that it does not work, respectively. Because there is no a priori knowledge on how the experiment will turn out, $\mathbf{a} = [0.5, 0.5]^T$ is assumed. According to Fig. 5, all 30 experiments are in favor of the reliability estimation scheme. Thus, before upsampling, the second urn contains 30 red balls and the two green balls, which corresponds to an expected 31 red balls and one black ball once the green balls are replaced by draws from the third urn. This corresponds to the SL opinion $\omega_x = ([0.94, 0]^T, 0.06, [0.5, 0.5]^T)$. Using Theorem 1, $\omega_x$ is mapped to its corresponding β-distribution. The evaluation of the p-value $p(X) \geq 90\,\%$ yields a confidence level of $1 - \alpha = 95\,\%$, where $\alpha$ is the probability of error.
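The numbers of this evaluation can be reproduced with a short calculation. The sketch assumes 30 supporting sequences and no contradicting one, and it assumes the standard mapping from [11], in which the Dirichlet parameters are the evidence counts plus $W$ times the base rate. For a β-distribution with second parameter 1, the cumulative distribution function is $x^{\alpha}$, so no numerical integration is needed. The function names are hypothetical.

```python
def dirichlet_parameters(evidence, base_rate):
    """Map evidence counts to parameters alpha_i = r_i + W * a_i (assumed)."""
    W = len(evidence)
    return [r + W * a for r, a in zip(evidence, base_rate)]


def tail_probability(threshold, alpha, beta):
    """P(p >= threshold) for Beta(alpha, beta); closed form for beta == 1 only."""
    if beta != 1.0:
        raise NotImplementedError("closed form x**alpha requires beta == 1")
    return 1.0 - threshold ** alpha


evidence = [30, 0]                   # 30 sequences in favor, none against
W = len(evidence)
belief = [r / (sum(evidence) + W) for r in evidence]
uncertainty = W / (sum(evidence) + W)

alpha, beta = dirichlet_parameters(evidence, [0.5, 0.5])
confidence = tail_probability(0.9, alpha, beta)
print(belief, uncertainty)           # opinion before rounding
print(alpha, beta)                   # parameters of the beta distribution
print(round(confidence, 3))          # probability that p(X) is at least 0.9
```

The resulting belief and uncertainty round to the values 0.94 and 0.06 of the opinion stated above, and the tail probability exceeds 95 %.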
As a result, the reliability estimates can be used for decision making in motion planning. Details on our motion planning scheme can be found in [19,22]. As is common in the literature, we formulated the motion planning for merging scenarios as an Optimal Control Problem (OCP), in which the passenger's comfort and safety are optimized. Here, the safety goal is formulated in terms of minimizing the residual risk according to a risk model. At the same time, the OCP is constrained to a maximum acceptable risk that must not be exceeded.
Our risk model is based on the set-based approaches [25,27]. Set-based approaches come with safety guarantees and thus are particularly beneficial for safety verification [26]. However, the safety guarantees require deterministic input sets, while most data processing approaches for CAVs, such as multi-object tracking [32] or the reliability estimation presented above, are formulated probabilistically. Hence, we adapted the set-based methods such that we preserve their safety guarantees as much as possible, i. e., up to a maximum acceptable residual risk, while we formulate the risk model probabilistically to stay within the probabilistic framework.
This risk model is extended by the SL-based reliability estimation of the data. To do so, the p-value $p_{rel}$ of the corresponding β-distribution is calculated for a pre-defined confidence level $1 - \alpha$. The residual risk is then weighted with this $p_{rel}$, and $(1 - p_{rel})$ is added to the residual risk. The latter is done as the situation gets highly dangerous if the motion planning is performed on unreliable data, while the weighting of the original residual risk is done for normalization reasons. Overall, through this adaptation of the risk model, the motion planning reacts to unreliable data by discarding them and, in this case, using only the ego perception.
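The adaptation of the residual risk described above amounts to a single expression. The sketch below assumes that the residual risk is normalized to $[0, 1]$; the function name and the maximum acceptable risk of 0.05 are hypothetical values chosen for illustration and are not taken from [19,22].

```python
MAX_ACCEPTABLE_RISK = 0.05   # assumed bound used as OCP constraint


def adapted_risk(residual_risk, p_rel):
    """Weight the residual risk with p_rel and add the unreliability 1 - p_rel."""
    if not (0.0 <= residual_risk <= 1.0 and 0.0 <= p_rel <= 1.0):
        raise ValueError("residual_risk and p_rel must lie in [0, 1]")
    return p_rel * residual_risk + (1.0 - p_rel)


risk_reliable = adapted_risk(0.01, 0.99)     # reliable cooperative data
risk_unreliable = adapted_risk(0.01, 0.50)   # unreliable cooperative data
print(risk_reliable, risk_reliable <= MAX_ACCEPTABLE_RISK)
print(risk_unreliable, risk_unreliable <= MAX_ACCEPTABLE_RISK)
```

Since the adapted risk never falls below $1 - p_{rel}$, plans based on unreliable data violate the risk constraint regardless of how small the original residual risk is. The adapted risk also stays within $[0, 1]$, which reflects the normalization mentioned in the text.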

Reliability estimation and misbehavior detection in vehicular multi-agent systems
As a second example, the application from [21] is summarized, showing how SL-based reliability estimation can be applied to a large-scale vMAS. As already discussed, cooperative information that is shared among different agents on the road via vehicle-to-anything (V2X) communication can improve the traffic efficiency as long as the information is reliable. However, unreliable information can lead to serious consequences. As opposed to the example before, in a large-scale scenario, security aspects and trust among the agents play an increasing role in addition to safety considerations and have to be accounted for [39].
In this example, based on an attacker model, a communication scheme is proposed that allows for distributing and updating reliability information on the agents in addition to the cooperative data. The reliability that is observed by the respective agents during a traffic scenario is evaluated through SL and communicated to a central instance. The central instance checks the reliability estimates for consistency, revises them if they are inaccurate, and acts upon unreliable agents. Basically, the trust management process can be divided into two cases: trust building, if all involved agents consistently report a traffic scenario, and trust revision, if the reported data are inconsistent, i. e., some agents provide wrong information. All reliabilities are represented as SL opinions.
The trust building essentially consists of two steps: first, a trust discounting (9) is applied to the former reliability estimate of the involved agents, as the reliability of a respective agent might age over time. Thus, statistical evidence fades into uncertainty over time and agents need to steadily provide statistical evidence on their reliability to remain highly trusted. In the second step, the trust discounted reliability opinion is merged with the new statistical evidence for the agent's reliability using the CBF operator (5).
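The two steps of trust building can be sketched for binomial opinions over the outcomes reliable and unreliable. The code assumes the discounting form described in Section 3 and the aleatory cumulative fusion formulas from [11] for equal base rates, $\mathbf{b} = (\mathbf{b}_A u_B + \mathbf{b}_B u_A) / (u_A + u_B - u_A u_B)$ and $u = u_A u_B / (u_A + u_B - u_A u_B)$. The function names and the aging probability of 0.9 are hypothetical.

```python
def trust_discount(opinion, p_d):
    """Age an opinion: belief is scaled by p_d, the rest becomes uncertainty."""
    belief, _, base_rate = opinion
    discounted = [p_d * b for b in belief]
    return discounted, 1.0 - sum(discounted), list(base_rate)


def cumulative_fusion(op_a, op_b):
    """Aleatory cumulative belief fusion for equal base rates (assumed form)."""
    (b_a, u_a, base_rate), (b_b, u_b, _) = op_a, op_b
    denom = u_a + u_b - u_a * u_b          # nonzero unless u_a == u_b == 0
    belief = [(x * u_b + y * u_a) / denom for x, y in zip(b_a, b_b)]
    return belief, u_a * u_b / denom, list(base_rate)


def trust_building(old_opinion, new_evidence, p_aging):
    """Step 1: discount the former estimate. Step 2: fuse in new evidence."""
    aged = trust_discount(old_opinion, p_aging)
    return cumulative_fusion(aged, new_evidence)


former = ([0.8, 0.0], 0.2, [0.5, 0.5])     # former reliability estimate
observed = ([0.5, 0.0], 0.5, [0.5, 0.5])   # new evidence from the scenario
updated = trust_building(former, observed, 0.9)
print(updated)
```

Without new evidence, repeated discounting would drive the opinion toward full uncertainty, which reflects that agents need to steadily provide evidence to remain highly trusted.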
The trust revision, in turn, is slightly more complex. First, again, a trust discounting (9) is applied to all agents involved to account for the aging of the reliability information. In the second step, the SL opinions of the respective agents on what happened during the traffic scenario are clustered using the DC (10) as metric. Loosely speaking, agents reporting more or less the same story are clustered together in this step. In the next step, a reference opinion is calculated for each opinion cluster. As the agents redundantly observe the same scenario, their opinions are expected to be statistically dependent. Hence, the ABF operator (8) is used to find the reference opinion of each cluster. Subsequently, the DC between each agent's opinion and each reference opinion is calculated. For each reference opinion, the number of opinions that are consistent with it is counted, i. e., those whose DC does not exceed a threshold θ_DC. The reference opinion with the highest number of consistent opinions is then selected. The agents that reported opinions consistent with the chosen reference are classified as reliable, while all other agents are classified as not reliable. Accordingly, the estimated reliability of agents classified as reliable is increased by applying trust building. In turn, the estimated reliability of the other agents is revised using the trust revision mechanism from [14].
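The classification part of the trust revision (clustering, reference opinions, DC thresholding, majority selection) can be sketched as follows. Several parts are assumptions made for illustration only: the greedy clustering rule is hypothetical, since the clustering algorithm is not specified here; the DC is taken as projected distance times conjunctive certainty, which may differ in detail from (10); and the averaging fusion is the multi-source form for non-dogmatic opinions with equal base rates. The names `Opinion`, `cluster`, and `classify` are illustrative.

```python
from math import prod
from typing import NamedTuple


class Opinion(NamedTuple):
    b: float        # belief mass
    d: float        # disbelief mass
    u: float        # uncertainty mass
    a: float = 0.5  # base rate


def projected(o: Opinion) -> float:
    return o.b + o.a * o.u


def degree_of_conflict(x: Opinion, y: Opinion) -> float:
    """Assumed DC: projected distance times conjunctive certainty."""
    return abs(projected(x) - projected(y)) * (1.0 - x.u) * (1.0 - y.u)


def averaging_fusion(ops: list[Opinion]) -> Opinion:
    """Averaging belief fusion of n opinions, all with u > 0."""
    n = len(ops)
    # weight of opinion i is the product of all other uncertainties
    w = [prod(o.u for j, o in enumerate(ops) if j != i) for i in range(n)]
    total = sum(w)
    b = sum(wi * o.b for wi, o in zip(w, ops)) / total
    d = sum(wi * o.d for wi, o in zip(w, ops)) / total
    u = n * prod(o.u for o in ops) / total
    return Opinion(b, d, u, ops[0].a)


def cluster(ops: list[Opinion], theta: float) -> list[list[int]]:
    """Hypothetical greedy clustering: join the first cluster whose
    first member is within theta, otherwise open a new cluster."""
    clusters: list[list[int]] = []
    for i, o in enumerate(ops):
        for c in clusters:
            if degree_of_conflict(o, ops[c[0]]) <= theta:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def classify(ops: list[Opinion], theta: float) -> set[int]:
    """Return the indices of the agents classified as reliable."""
    refs = [averaging_fusion([ops[i] for i in c]) for c in cluster(ops, theta)]
    support = [[i for i, o in enumerate(ops)
                if degree_of_conflict(o, ref) <= theta] for ref in refs]
    return set(max(support, key=len))  # reference with most consistent opinions


if __name__ == "__main__":
    reports = [Opinion(0.80, 0.10, 0.10), Opinion(0.75, 0.10, 0.15),
               Opinion(0.70, 0.10, 0.20), Opinion(0.10, 0.80, 0.10)]
    print(classify(reports, theta=0.2))  # the last agent contradicts the rest
```

Agents in the returned set would then undergo trust building, while the remaining agents would be passed to the trust revision mechanism from [14], which is not sketched here.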
To evaluate the performance of the SL-based reliability estimation scheme, a simulation was performed. Figure 6 shows two resulting receiver operating characteristic (ROC) curves of the reliable/unreliable classification. The faulty agents were assumed to have a systematic error of 0.6 σ for the first and of 1 σ for the second ROC curve. For all other agents, Gaussian distributed measurements were assumed. The curves show that even small systematic errors can be detected, while the classification performance quickly rises with an increasing systematic error of the faulty agents. At 1 σ, the reliability estimation scheme reacts sensitively to unreliable agents even for small false positive rates.
The SL reliability estimation scheme was tested on a large-scale vMAS scenario using a simulator for the whole traffic of Cologne [18]. For the simulation, an error rate of 10 % for the reliable agents was assumed, i. e., even reliable agents sent 10 % faulty measurements due to noise. With a false positive rate of 10 % and a detection probability of 46 %, this setting corresponds to the 0.6 σ ROC curve. To reduce misclassification, classification was based on three decisions in a row. Thus, after 45 reports, 76 % of the unreliable agents and 1.5 % of the reliable agents were sent to maintenance. This shows that the reliability estimation scheme can increase the reliability of cooperative information throughout the vMAS [21].
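The reported rates can be checked for plausibility with a short calculation. The sketch below rests on a simplifying assumption that is not stated in the source: the 45 reports are treated as 15 non-overlapping blocks of three statistically independent decisions, and an agent is sent to maintenance as soon as all three decisions of one block classify it as unreliable. The function name `flag_probability` is hypothetical.

```python
def flag_probability(p_single: float, n_reports: int,
                     run_length: int = 3) -> float:
    """Probability that at least one block of `run_length` independent
    decisions is unanimously 'unreliable' within `n_reports` reports."""
    blocks = n_reports // run_length      # non-overlapping blocks (assumption)
    p_block = p_single ** run_length      # all decisions in a block agree
    return 1.0 - (1.0 - p_block) ** blocks


if __name__ == "__main__":
    # single-decision rates taken from the 0.6 sigma operating point
    print(f"reliable agents flagged:   {flag_probability(0.10, 45):.3f}")
    print(f"unreliable agents flagged: {flag_probability(0.46, 45):.3f}")
```

Under these assumptions, the calculation gives about 1.5 % for reliable agents and roughly 78 % for unreliable agents, which is close to the simulated values of 1.5 % and 76 %. The remaining difference is not surprising, since the simulation in [21] need not satisfy the independence and block assumptions made here.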

Conclusion
In this work, a novel urn model intuition to SL was given based upon Pólya's urn scheme. We hope that this additional tutorial approach helps SL to find wider use in automation applications in the future. Additionally, we presented two examples of how the reliability of cooperative information can be inferred through SL. For the first example, the reliability estimation of RSU information on-board a CAV, we additionally sketched how this information can directly support decision making in motion planning.
Funding: Part of this work was financially supported by the Federal Ministry of Economic Affairs and Energy of Germany within the program "Highly and Fully Automated Driving in Demanding Driving Situations" (project MEC-View, grant number 19A16010I). Part of this work has been conducted as part of the ICT4CART project, which has received funding from the European Union's Horizon 2020 research & innovation program under grant agreement No. 768953. Content reflects only the authors' view, and the European Commission is not responsible for any use that may be made of the information it contains.