Relationships between step and cumulative PMI and E-factors: implications on estimating material efficiency with respect to charting synthesis optimization strategies

Abstract This report describes mathematical relationships between step and cumulative process mass intensities (PMIs) for synthesis plans, and analogous parameters applied to E-factors. It is shown that neither step E-factors nor step PMIs are additive for synthesis plans. It is also shown that a recursive calculation of cumulative PMIs from step PMIs is a rapid method of determining overall PMIs for synthesis plans, though cumulative PMIs are not as informative as step PMIs or step E-factors for identifying bottlenecks in synthesis plans. The use of these metrics to track the material efficiency of published synthesis plans for the pharmaceutical apixaban is illustrated as a template example. Advantages and disadvantages of each metric are discussed. A general algorithm is also presented for selecting, at the design stage, the candidate synthesis plans for a given molecular target that are most likely to satisfy “green” material efficiency criteria.

Greek letters: ε = reaction yield; φ = mass of input materials

Introduction:
The use of green chemistry metrics is now a well-established tool used by synthetic organic and process chemists to estimate the material efficiency, environmental impact, and safety-hazard impact of their synthesis plans and chemical processes to a desired target molecule [1,2]. More importantly, it is a powerful tool in making decisions about which candidate synthesis strategies to choose and ultimately to actualize in the pursuit of "green" syntheses at the early design and planning stages of a synthesis campaign. Even more importantly, it is imperative that whatever algorithms are used to carry out the estimation of those metrics are robust so that good decisions are made with respect to achieving true synthesis optimization. In this way, chemists will be confident in implementing reliable, reproducible, and intelligible metrics as a routine tool at their disposal in their practice of green chemistry principles in their everyday work.
In this report we wish to present general relationships between step and cumulative process mass intensity (PMI) and general relationships between step and cumulative E-factors for linear and convergent synthesis plans. Both sets of metrics have been advanced by the pharmaceutical industry as highly relevant to the determination of material efficiency both for "after-the-fact" published plans in the literature and for exploring new synthesis strategies using classical and novel methods at the planning stages before any experiments are carried out. Specifically, we develop facile methods of calculating overall PMI, cumulative PMI, overall E-factor, and cumulative E-factor for synthesis plans. The structure of this report is as follows. We begin by showing the derivation of each method and supply mathematical evidence supporting the connections between the relevant metrics. Next, we illustrate the computation of those metrics for published synthesis plans to the blood anti-coagulant pharmaceutical apixaban 1 [3][4][5][6][7][8] since this was chosen as a showcase example by the pharmaceutical industry for achieving a "green" synthesis [9]. Finally, we show how these metrics may be used for ranking the published plans with respect to material efficiency.

E-factor and PMI
For any individual chemical reaction, the mass of waste generated may be generally considered to originate from seven sources: by-products (BP) arising as a mechanistic consequence in producing the target product, side products (SP) arising as a consequence of competing reactions not leading to the intended target product, unreacted excess reagents (XS), catalysts and ligands (CAT), reaction solvents (RS), work-up materials (WU), and purification materials (PU). If $m_P$ is the mass of the desired target product of the reaction, then the total mass of waste and the E-factor are defined according to equations (1) and (2). Using the relation PMI = E + 1, the last line in equation (2) can be transformed as shown in equation (4).
Equation (4) indicates that the overall PMI for a chemical reaction is not the sum of the contributing PMIs as was the case for E-factors. Rather, it is the sum of the contributing PMIs minus one less than the number of waste contributing factors. Based on these findings for a single reaction, one might think that the same kind of parallel relationship also holds for sequences of reactions in a synthesis plan; namely, that the overall E-factor is the sum of the step E-factors and that the overall PMI is not the sum of the step PMIs.
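The single-reaction bookkeeping described above can be checked numerically. The following is a minimal sketch, assuming that each contributing PMI is defined as the corresponding contributing E-factor plus one; the masses are invented for illustration and the function names are hypothetical.

```python
# Hypothetical masses (g) for one reaction; illustrative numbers only.
waste = {
    "BP": 2.0,    # by-products
    "SP": 1.0,    # side products
    "XS": 3.0,    # unreacted excess reagents
    "CAT": 0.5,   # catalysts and ligands
    "RS": 40.0,   # reaction solvents
    "WU": 60.0,   # work-up materials
    "PU": 100.0,  # purification materials
}
m_product = 10.0  # mass of desired target product (g)


def contributing_e_factors(waste, m_product):
    """E-factor contribution of each waste source: waste mass / product mass."""
    return {k: m / m_product for k, m in waste.items()}


def contributing_pmis(waste, m_product):
    """Assumed definition: each contributing PMI equals its E-factor plus 1."""
    return {k: e + 1.0 for k, e in contributing_e_factors(waste, m_product).items()}


e_parts = contributing_e_factors(waste, m_product)
pmi_parts = contributing_pmis(waste, m_product)

e_total = sum(waste.values()) / m_product  # overall E-factor
pmi_total = e_total + 1.0                  # PMI = E + 1

# E-factors add up directly; PMIs add up only after subtracting
# (number of waste sources - 1), which is 6 for seven sources.
e_sum = sum(e_parts.values())
pmi_sum_corrected = sum(pmi_parts.values()) - (len(waste) - 1)
```

With seven waste sources the correction term is six, which is the "one less than the number of waste contributing factors" referred to in the text.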
In fact, for a linear synthesis plan neither the overall E-factor nor the overall PMI is the sum of the corresponding step parameters. We demonstrate what the connecting relationships are between overall and step parameters for each metric with the following simple logic. For a linear sequence of $N$ steps involving the production of intermediate products $P_1$, $P_2$, …, and $P_N$ in steps 1, 2, …, $N$, where all of the preceding intermediate in any given step is committed as a reagent in the next step, we can write the step PMI for step $j$ as shown in equation (5).
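As a hedged sketch of the mass balance behind this argument, the relations below are reconstructed from the definitions in the text and are not copied from the numbered equations. The symbols $m_{P_j}$ (mass of intermediate $P_j$) and $m_{\mathrm{in},j}$ (mass of all input materials in step $j$ other than the preceding intermediate) are notation assumed here for illustration.

```latex
% Step PMI: all mass entering step j divided by the mass of P_j (with m_{P_0} = 0)
\mathrm{PMI}_j = \frac{m_{P_{j-1}} + m_{\mathrm{in},j}}{m_{P_j}}

% Overall PMI: all fresh inputs over the whole plan divided by the mass of P_N
\mathrm{PMI}_{\mathrm{overall}}
  = \frac{\sum_{j=1}^{N} m_{\mathrm{in},j}}{m_{P_N}}
  = \sum_{j=1}^{N} \mathrm{PMI}_j \, \frac{m_{P_j}}{m_{P_N}}
    - \sum_{j=1}^{N-1} \frac{m_{P_j}}{m_{P_N}}
```

The second equality follows by substituting $m_{\mathrm{in},j} = \mathrm{PMI}_j \, m_{P_j} - m_{P_{j-1}}$ into the sum. Under these assumptions, each step PMI enters weighted by a mass ratio of intermediates, so a plain sum of step PMIs does not give the overall value.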
From equation (6) … yield is 100% with no loss of mass arising from by-products or unreacted starting materials along the way. This very rare possibility involving consecutive rearrangement reactions has been documented before [11]. We can transform the expression given in equation (6) to find an analogous expression given in equation (7) that relates the overall E-factor for a linear sequence to the step E-factors using the substitution PMI = E + 1.
Again, we observe that the overall E-factor is generally not the sum of the step E-factors, except for the very special case mentioned above. This mathematical evidence therefore shows that step E-factors for a linear plan are not additive. Close examination of equation (7) shows that each of the E-factor parameters is defined with respect to … and then if we divide that mass of overall waste by the mass of the final target product we obtain, of course, the overall E-factor for the synthesis plan. Furthermore, if we define a step contributing E-factor, $E_j^*$, with respect to the mass of final target product, representing the contribution of waste produced from each reaction step to the total waste produced, then we can transform the expression given in equation (8) to obtain a relationship that links the overall E-factor to the step contributing E-factors as shown in equation (9).
In this case, we see that, indeed, the overall E-factor for a linear synthesis plan is the sum of the step contributing E-factors. Note that each step E-factor appearing in equation (8) is defined with respect to the mass of intermediate product produced at that step, whereas the step contributing E-factors appearing in equation (9) are all defined with respect to the mass of final product in the sequence.
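The distinction between step E-factors and step contributing E-factors can be illustrated with a small numerical sketch. It assumes a linear plan in which all of each intermediate is carried into the next step, so that the waste of a step is everything entering it minus the product leaving it. The `Step` class, function names, and masses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    other_inputs: float  # mass (g) of all inputs except the preceding intermediate
    product: float       # mass (g) of the intermediate or product obtained


def step_e_factors(steps):
    """Step E-factor: waste of the step divided by the mass of ITS OWN product."""
    e_factors, prev = [], 0.0
    for s in steps:
        waste = prev + s.other_inputs - s.product
        e_factors.append(waste / s.product)
        prev = s.product
    return e_factors


def overall_e_factor(steps):
    """Overall E-factor: total waste divided by the mass of the FINAL product."""
    final = steps[-1].product
    return (sum(s.other_inputs for s in steps) - final) / final


def contributing_e_factors(steps):
    """Step contributing E-factor: step waste divided by the mass of the FINAL product."""
    final = steps[-1].product
    return [e * s.product / final for e, s in zip(step_e_factors(steps), steps)]


# Hypothetical three-step linear plan (masses in g).
plan = [Step(100.0, 20.0), Step(150.0, 15.0), Step(90.0, 10.0)]

e_steps = step_e_factors(plan)          # referenced to each intermediate
e_star = contributing_e_factors(plan)   # referenced to the final product
e_overall = overall_e_factor(plan)
```

For these invented numbers the step E-factors sum to about 23.8, whereas the contributing E-factors sum to 33, which is the overall E-factor.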

Cumulative Versus Step Metrics
We previously reported expressions for cumulative PMI and cumulative E-factor that depended on step PMI and step E-factors, respectively [12,13]. The recursive relationships for linear sequences are shown in equations (10) and (11).
Here, MW refers to molecular weights of intermediates multiplied by stoichiometric coefficients as appropriate from balanced chemical equations, $\varepsilon_i$ is the reaction yield of step $i$, the $1 \rightarrow i$ notation means that the cumulative PMI quantity extends from step 1 to step $i$, and the counting index runs over $i = 2, 3, \ldots$ Note that the connecting relation PMI = E + 1 was used in obtaining equation (11) from equation (10). The derivation of equation (10) arises as follows. For a three-step plan, from equation (5), the step PMIs for steps 1, 2, and 3 are given by equations (12)(13)(14).
The cumulative PMI for steps 1 and 2 is given by equation (15).
where it is understood that the step PMI for step 1 is identical to the cumulative PMI for step 1. Similarly, the cumulative PMI for steps 1, 2, and 3 is given by equation (16).
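A recursive cumulative PMI calculation of this kind can be sketched as follows. The recursion used here is reconstructed from a mass balance on a linear plan and written in terms of the mass ratio of consecutive intermediates; it is an assumption that this matches the form of equation (10), where the same ratio is expressed through molecular weights and the step yield. The function name and the numbers are hypothetical.

```python
def cumulative_pmis(step_pmis, product_masses):
    """Cumulative PMI from step 1 to step i for a linear plan.

    Assumed recursion (from mass balance):
        cPMI(1->i) = PMI_i + (cPMI(1->i-1) - 1) * m(P_{i-1}) / m(P_i)
    with cPMI(1->1) = PMI_1.
    """
    cumulative = [step_pmis[0]]
    for i in range(1, len(step_pmis)):
        mass_ratio = product_masses[i - 1] / product_masses[i]
        cumulative.append(step_pmis[i] + (cumulative[-1] - 1.0) * mass_ratio)
    return cumulative


# Hypothetical three-step linear plan (masses in g).
fresh_inputs = [100.0, 150.0, 90.0]   # inputs other than the preceding intermediate
product_masses = [20.0, 15.0, 10.0]   # intermediates P1, P2 and final product P3

# Step PMI: everything entering the step divided by the product of that step.
step_pmis = []
prev = 0.0
for inputs, prod in zip(fresh_inputs, product_masses):
    step_pmis.append((prev + inputs) / prod)
    prev = prod

cum = cumulative_pmis(step_pmis, product_masses)

# Direct check: total fresh inputs up to step i divided by the mass of P_i.
direct = [sum(fresh_inputs[: i + 1]) / product_masses[i] for i in range(3)]
```

The last entry of `cum` is the overall PMI of the plan. In this example it equals 34, which agrees with the direct calculation of total input mass divided by the mass of final product.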
It is readily apparent from the emerging pattern of equations (15) and (16) … convergent synthesis plan having a convergent branch consisting of 3 steps leading up to intermediate $P_{3^*}$ and the convergent step occurs at step 5 along the main branch, then the cumulative PMI from step 1 to step 5 is given by equation (17).
The first two terms in equation (17) … where the mole ratio in equation (19) is a number that has a value greater than 1.
Appendix 1 contains a more complex example of a convergent plan drawn in the form of a synthesis tree diagram [14,15] containing 5 branches including the main branch (See Figure S1). From that diagram it is possible to write out all the cumulative PMI expressions sequentially from the first step all the way to the final step by inspection (see equations S1 to S17).

Application to Syntheses of Apixaban
We evaluated the material efficiency of various literature-documented syntheses of the blood anti-coagulant apixaban 1 as a demonstration of the mathematical relationships presented in this work. Six synthesis plans were considered including two from Bristol-Myers Squibb [3,4], one from an academic lab in China [5], and three from generic drug companies [6][7][8]. All of them follow the same convergent synthesis strategy as evidenced by the common target bond mapping shown in Figure 1 which highlights the construction bonds made in the product chemical structure. The mapping for the pyrazolo[3,4-c]pyridin-7-one ring frame can be encoded as [(2 + 1 + 1 + 1) + (5 + 1)].

Figure 1
Table 1 summarizes the essential material efficiency metrics for all six plans, which are listed in ascending order of total PMI. These metrics were calculated using our previously reported REACTION and SYNTHESIS spreadsheets [11]. … What is certain is that the sum of step E-factors does not equal the true overall E-factor for a given synthesis plan.

Table 1
Following our developed cumulative PMI calculator spreadsheet based on the recursive relation given in equation (10) … where SF is the stoichiometric factor [16], and MRP is the material recovery parameter accounting for all auxiliary materials used [16].
The values highlighted in yellow in Tables 2 to 4 agree with those given in the last column of Table 1. Table 5 shows the rankings of all plans based on these overall cumulative PMI values. We can see that the Bristol-Myers Squibb 2006 plan ranks first in all three regimes whereas the ranking order changes for the other plans. This is an excellent test of the robustness of "green" material efficiency for a synthesis. If a plan consistently ranks first under these three regimes then the probability that it represents the "greenest" plan to a given target is very high.
Ranking first under the kernel regime means that the plan likely has the highest atom economy and overall yield; ranking first under the kernel plus excess regime means that it also consumes the least excess reagents; and finally ranking first under the complete regime means that it also consumes the least auxiliary materials in terms of reaction solvents, work-up materials, and purification materials.

Table 2

Table 3

Table 4

A close examination of the rankings of the other plans shown in Table 5 reveals some significant changes depending on the calculation regime. For example, the Optimus plan is ranked #2 under the complete regime but drops to #5 under the kernel regime. This is consistent with its auxiliary material consumption being close to that of the #1 plan, and with its low-ranking overall yield, which is about half that of the #1 plan. Apart from the winning plan, the Jiang-Ji and MSN plans show little variation in ranking among the three regimes. It is clear from this example that true overall synthesis optimization is achieved when optimization in all three regimes is orchestrated in the same direction.
We believe and emphasize that this criterion is the best and most significant indicator that controls material efficiency optimization of synthesis plans, and therefore the achievement of comparatively "green" synthesis plans. We also point out that in order to drill down to discover where the bottlenecks are in any given plan, it is insufficient to look only at overall E-factor, overall PMI, overall AE, and overall yield values as guides, or for that matter cumulative counterparts of those parameters. Two important guides are the partitioning of E-factors as shown in Table 1, and radial pentagon diagrams for each reaction step in a synthesis as shown in the REACTION spreadsheets for each plan and discussed elsewhere [12]. We determined the precise liabilities of each synthesis plan for apixaban by following this procedure as described in our earlier discussion. We note that the computation of overall PMI via a recursive cumulative PMI calculation is indeed facile. However, the tracking of the incremental increases in PMI does not shed enough light on spotting bottlenecks in a synthesis plan in comparison to tracking individual reaction step PMIs in conjunction with their radial pentagon diagrams which show how the partitioned step metrics come together to result in the corresponding step PMI values.
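The ranking test discussed above, in which a plan is judged robustly "green" when it ranks first under the kernel, kernel plus excess, and complete regimes, can be sketched in a few lines. The plan names and PMI values below are invented placeholders and are not the apixaban results from Tables 2 to 5. The function names are hypothetical, and ties are not handled.

```python
def rank_plans(pmi_by_plan):
    """Rank plans by ascending PMI (rank 1 = lowest PMI, most material efficient)."""
    ordered = sorted(pmi_by_plan, key=pmi_by_plan.get)
    return {plan: position for position, plan in enumerate(ordered, start=1)}


def robust_winner(pmi_by_regime):
    """Return the plan that ranks first in every regime, or None if there is none."""
    winners = {min(pmis, key=pmis.get) for pmis in pmi_by_regime.values()}
    return winners.pop() if len(winners) == 1 else None


# Placeholder overall PMI values for three hypothetical plans.
pmi_by_regime = {
    "kernel": {"plan_A": 5.0, "plan_B": 9.0, "plan_C": 7.0},
    "kernel_plus_excess": {"plan_A": 8.0, "plan_B": 15.0, "plan_C": 16.0},
    "complete": {"plan_A": 120.0, "plan_B": 150.0, "plan_C": 400.0},
}

rankings = {regime: rank_plans(pmis) for regime, pmis in pmi_by_regime.items()}
winner = robust_winner(pmi_by_regime)
```

Here `plan_A` ranks first in all three regimes, while `plan_B` and `plan_C` swap places between the kernel regime and the other two. This is the kind of regime-dependent reordering described for the lower-ranked plans.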

Table 5
Though it appears that synthesis optimization of apixaban using the existing synthesis strategy has reached a plateau, this does not mean that entirely new strategies cannot be considered that would drive optimization further. What has been achieved is comparatively the "greenest" synthesis plan within the constraint of a particular design synthesis strategy, as shown by the common target bond mapping in Figure 1. We demonstrate that it is possible to use the kernel green metrics of reaction … In the case of the apixaban plans studied in this work, the corresponding reaction network summarizing all intermediates involved in all routes is shown in Figure 4. Figures 5 and 6 show flowcharts describing the logical flow of steps involved in stages I and II. We followed both sets of protocols in presenting our findings on the synthesis of apixaban in this work. … (MGS-1) Project is to tackle this question [13]. We hope that this work will put chemists on a confident path to optimize their own synthesis plans with the goal of practicing green chemistry principles and making intelligent decisions about route selection based on material efficiency in a simple, logical, robust, and reliable manner.

Acknowledgements:
Aleksandr Fukovitch (Apotex Pharmaceuticals) is thanked for useful discussions and François Blandois, Philippe Rigaud, and Camille Lagnier (all of Don Mills Collegiate Institute Mathematics Club) are thanked for checking the mathematical derivations.

Appendix 2
Case scenarios for a two-step plan with respect to sum of step E-factors and overall E-factor.
For a two-step plan given by step 1: $A + B \rightarrow P_1$, and step 2: $P_1 + C \rightarrow P_2$, the overall E-factor in terms of the step E-factors $E_1$ and $E_2$ is given by equation (S18).
Case scenarios for a three-step plan with respect to sum of step E-factors and overall E-factor.
For a three-step plan given by step 1: $A + B \rightarrow P_1$, step 2: $P_1 + C \rightarrow P_2$, and step 3: $P_2 + D \rightarrow P_3$, the overall E-factor in terms of the step E-factors $E_1$, $E_2$, and $E_3$ is given by equation (S19). Figure S2 summarizes the 9 possible case scenarios for a three-step plan with respect to the magnitude of the mass ratio of intermediates, whether they are equal to 1, greater than 1, or less than 1. Seven of the nine cases yield definite conclusions with respect to the relative magnitudes of the true overall E-factor and the sum of the step E-factors.
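The count of seven definite cases out of nine can be reproduced with a short enumeration. This sketch assumes that the overall E-factor of a three-step linear plan is $E_1 m_1/m_3 + E_2 m_2/m_3 + E_3$, a form reconstructed from mass balance rather than copied from equation (S19). It also assumes that the two mass ratios compared with 1 are $m_1/m_2$ and $m_2/m_3$, and that $E_1$ and $E_2$ are strictly positive. All names are hypothetical.

```python
from itertools import product


def overall_e_three_step(e1, e2, e3, m1, m2, m3):
    """Assumed overall E-factor for a three-step linear plan (mass-balance form)."""
    return e1 * m1 / m3 + e2 * m2 / m3 + e3


def classify(s1, s2):
    """Compare the overall E-factor with E1 + E2 + E3.

    s1 and s2 are -1, 0 or +1 according to whether m1/m2 and m2/m3 are
    below, equal to, or above 1. The difference is
        E1 * (m1/m3 - 1) + E2 * (m2/m3 - 1),
    whose sign is fixed unless the two ratios pull in opposite directions.
    """
    if s1 * s2 < 0:
        return "indeterminate"
    total = s1 + s2
    if total > 0:
        return "overall > sum"
    if total < 0:
        return "overall < sum"
    return "overall = sum"


cases = {(s1, s2): classify(s1, s2) for s1, s2 in product((-1, 0, 1), repeat=2)}
definite = sum(outcome != "indeterminate" for outcome in cases.values())

# Numerical spot check for one case where both ratios exceed 1 (masses in g).
e1, e2, e3 = 4.0, 155.0 / 15.0, 9.5
m1, m2, m3 = 20.0, 15.0, 10.0
overall = overall_e_three_step(e1, e2, e3, m1, m2, m3)
step_sum = e1 + e2 + e3
```

Under these assumptions the two indeterminate cases are those in which one mass ratio is above 1 and the other below 1, since the sign of the difference then depends on the actual magnitudes of the E-factors and ratios.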

Figure S2
Supporting Information: Supporting-Information-EXCEL.zip contains calculator-cumulative PMI.xls and six similar files applied to the six synthesis plans for apixaban examined in this work.