A QUASIEXPERIMENTAL EVALUATION OF A CLINICAL RESEARCH TRAINING PROGRAM
There is a growing need for research training programs that can accelerate the careers of clinical and translational scientists. The Clinical and Translational Science Award KL2 Scholars programs funded by the National Institutes of Health support the research and training of junior faculty advancing towards independent research careers. This study evaluates the impact of KL2 funding on participants' subsequent receipt of a Research Project Grant (R01) award, which represents a commonly referenced milestone of faculty progress towards independence. Propensity score matching was used to compare the number of months KL2 scholars took to receive an R01 award with that of an equivalent group of early career faculty who did not receive KL2 funding. Although the KL2 scholars who received an R01 award did so sooner than similarly supported faculty who did not participate in the program, more rigorous and longitudinal evaluations are needed to measure the impact of these programs on faculty careers overall.
INTRODUCTION
The US National Institutes of Health's (NIH) National Center for Advancing Translational Sciences (NCATS) funds Clinical and Translational Science Award (CTSA) programs at over 60 clinical research institutions across the country. These CTSA programs support clinical and translational investigators and study teams in a variety of ways, and each administers a Clinical Research KL2 Scholars Program that supports early career investigators. Although there is considerable variation in the size, duration, structure, and operation of the KL2 Scholar Programs across the CTSA Consortium, they are collectively making demonstrable contributions to the development of the next generation of clinical and translational research investigators (Sorkness et al., 2020).
This paper contributes to a growing body of research on the impact of KL2 Scholars Programs in the CTSA Consortium (e.g., Trochim et al., 2013; Sorkness et al., 2020). The evaluation presented here uses quasiexperimental methods to estimate the impact one KL2 Scholars Program had on the amount of time participating faculty took to earn a Research Project Grant (R01). The R01 is the NIH's oldest and most widely used funding mechanism for independent research investigators, and the first receipt of this award is a commonly noted milestone in many scientific careers. The specific hypothesis tested here with quasiexperimental methods is that faculty participating in this KL2 Scholars Program received R01 awards in fewer months than similarly supported faculty at the same institution.
The results of this evaluation are relevant to the study of clinical and translational science in general and more particularly to the administration and improvement of programs for early career investigators working in this field. The use of quasiexperimental methods for the purpose of program evaluation also advances best practices in performance improvement (Austin, 2011; Austin & Stuart, 2021; Wang, 2021). Finally, the conclusions of this work speak to the well-recognized need for more precise evidence of the impact these programs have on the research careers of clinical and translational scientists (Rubio et al., 2015; Rubio, 2013).
The MICHR KL2 Scholars Program
The Clinical Research KL2 Scholars Program evaluated here is offered by the University of Michigan (U-M), a large research university and medical center located in the Midwest. Prior to receiving a CTSA award in 2007, U-M offered a variety of postdoctoral training opportunities to early-career faculty, including K30 and KL2 programs funded by the National Center for Research Resources and two KL2 programs funded by the NIH. After receiving CTSA funding, the Michigan Institute for Clinical and Health Research (MICHR) combined and modified some of these programs to create the KL2 Scholars Program, which has been in continuous operation since.
MICHR's KL2 Scholars Program is a 2-year mentored career development award for early-career faculty members initiating clinical and translational research agendas. Faculty applying to MICHR's KL2 Scholars Program submit proposals using the NIH K23 application format, and all applications are reviewed by a committee of experienced U-M faculty investigators. Participants who receive this award are assigned a mentoring team that includes a scientific mentor, a peer mentor, and a faculty mentor assigned by MICHR to guide their achievement of personalized career development goals. They receive 50%–75% protected time for research and dedicated funding for their studies, career development, and research dissemination. These types of support are comparable to those provided by other KL2 programs in the CTSA Consortium (Sorkness et al., 2020).
The first cohort of MICHR KL2 scholars was enrolled in 2007 and included 14 individuals, eight of whom were grandfathered into the program from K programs that were phased out when MICHR was established. These eight individuals received their awards no earlier than 2005, as is noted in the Methods section. During its first decade of operation, over 100 faculty members applied to this competitive program, of whom just over 45 were accepted. Considered overall, the backgrounds of those accepted into the program were similar to those who were not accepted, as shown in Table 1.
All MICHR KL2 scholars are expected to meet four programmatic requirements. First, they must complete the research described in their award proposal, fulfill all required U-M research training (e.g., research ethics and HIPAA training), and satisfy all applicable regulatory reviews and requirements. Second, scholars are required to attend workshops on responsible conduct of research and mentor training as well as participate in monthly Research Studio seminars. During these seminars, scholars share their work with their peers and learn best practices related to research and career development. Third, scholars are required to meet with their mentoring team, including their MICHR and scientific faculty mentors, quarterly to review their career development plans and goals, and to volunteer their time mentoring trainees from other MICHR programs. Lastly, scholars are required to present their work at two Research Studio sessions and at national conferences. These requirements are intended to motivate and facilitate scholars' ongoing pursuit of independent research careers.
METHODS
The conceptual framework used to evaluate MICHR's KL2 Scholars Program is designed expressly for evaluating the impact of adult education programs (Kirkpatrick & Kirkpatrick, 2005). This framework categorizes evaluation outcomes into four levels, including participants' (1) reactions, (2) learning, (3) behavior, and (4) programmatic results, in order to measure outcomes that fall into a comprehensive range of programmatic domains (e.g., engagement of program resources; scientific activity; scholarly outputs and outcomes; and health and society benefits). The data MICHR uses to evaluate these outcomes are derived from multiple sources, including direct observation, online surveys, and secondary data collection from public and university-managed datasets. The aims, data, and research methods used to conduct this study have been reviewed by the U-M Institutional Review Board and were determined to be exempt from oversight (HUM00133443 and HUM00113293).
KL2 Scholars Program participants' experience, learning, and behavior were evaluated using surveys, skill assessments, and secondary data collection from public sources. Specifically, all KL2 scholars were surveyed following each Research Studio to evaluate the didactic experience. In addition, a 12-item short form of the Clinical Research Appraisal Inventory (CRAI-12), validated by Robinson and colleagues (2013), was administered to KL2 Program cohorts starting in 2016. CRAI-12 measures scholars' confidence in their research knowledge and skill in six domains: (1) Designing studies and collecting data; (2) reporting, interpreting, and presenting study results; (3) conceptualizing study topics and collaboration; (4) planning a study; (5) funding a study; and (6) protecting study participants. Finally, data collected from public records were analyzed to identify all KL2 scholars currently in research careers, a common outcome measure that has been used by CTSAs to evaluate their effectiveness and benchmark their programmatic improvement over time (Rubio et al., 2013; Friedman, 2015).
Propensity Score Matching
The programmatic results of MICHR's KL2 Scholars Program analyzed here were evaluated using propensity score matching. This quasiexperimental method can account for individual background characteristics, such as those that make faculty more likely to apply to or be selected for training programs (Guo & Fraser, 2015). Because we could not assign scholars into treated and untreated groups, this method aided in the identification of a comparison group against which the programmatic impact on scholars' time to R01 attainment could be estimated.
The data used for this study are drawn primarily from national and institutional datasets curated and managed by the U-M Institute for Research on Innovation & Science (IRIS). In particular, the IRIS UMETRICS dataset (The Institute for Research on Innovation and Science, 2018) was used to establish linkages between sponsored program data and other publicly available data sources, including the receipt dates of NIH R01 awards.
Matched Samples
The U-M faculty included in the propensity score matching analyses were all K award scholars, having received MICHR KL2, NIH K23, or K08 funding between 2005 and 2015. The NIH K23 or K08 awards provide a level of research support that is generally comparable to that offered by the KL2 programs across the CTSA Consortium; however, the MICHR KL2 is only 2 years in length, compared with 4 to 5 years for the K23 or K08. All of these mechanisms are mentored career development awards offered to clinical or translational researchers.
The faculty who participated in the KL2 Program, including those who went on to obtain subsequent K23 or K08 support, were included in the KL2 scholars group. The time to R01 attainment for KL2 scholars was compared with that of the other faculty constituting a matched group, as identified through propensity score matching. As is detailed later, all calculated impact estimates used models that controlled for faculty members' application to the KL2 Program, even if they were not ultimately accepted.
Statistical Models
To predict the probability of receiving a KL2 award, a logistic regression model was used to estimate propensity scores. This model can be represented using an equation that includes a set of covariates for basic background characteristics (βix) and an intercept (β0): \begin{equation}{\rm{Probability\ of\ KL2\ participation}} = \frac{e^{\beta_0 + \beta_i x}}{1 + e^{\beta_0 + \beta_i x}}.\end{equation}
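To make this model concrete, the sketch below shows how such propensity scores might be estimated in R with a standard logistic regression. The data frame k_awardees and its column names are hypothetical stand-ins for the UMETRICS-derived variables described in the next paragraph; this is an illustrative sketch, not the study's actual code.

```r
# Minimal sketch of the propensity score model (hypothetical data frame and
# column names, not the study's actual code).
# k_awardees: one row per K awardee, with kl2_scholar coded 1 for MICHR KL2
# scholars and 0 for other K awardees.
ps_model <- glm(
  kl2_scholar ~ award_year + awardee_age + n_prior_awards + log_prior_award_value,
  family = binomial(link = "logit"),
  data   = k_awardees
)

# Estimated probability of KL2 participation for each faculty member
k_awardees$pscore <- predict(ps_model, type = "response")
```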
The selection of covariates was guided by theory and prior research relevant to this investigation, as is recommended for program evaluators and statisticians using quasiexperimental methods (Steiner et al., 2010; Stuart & Rubin, 2008; Reynolds & DesJardins, 2009). The models used in this study include controls for the year of the award, the age of the awardee, and measures of the awardee's prior scientific achievement, operationalized here as the number and value (log transformed) of prior federal awards received. The choice of these controls was informed by published research evaluating the effectiveness of similar award programs using propensity score matching (Pion & Cordray, 2008; Martinez et al., 2015), as well as recent studies of the distinguishing features of CTSA KL2 Scholars Programs (Sorkness et al., 2020). This parsimonious approach to modeling serves the common support condition, potentially decreasing the variance of the propensity score estimates and the bias of the resultant impact estimates (Heckman & Navarro-Lozano, 2004; Bryson et al., 2002).
As is best practice, analyses were conducted to identify the optimal calibration of the propensity score matching algorithms used for this evaluation (Austin, 2011; Austin & Stuart, 2021; Wang, 2021). Specifically, to find the best match between the KL2 scholars and matched groups, combinations of calipers and matching procedures were tested. Caliper widths of 0.2, 0.4, 0.6, 0.8, and 1.0 (representing the distance in standard deviation units of the propensity scores) were used to control the distance between potential matches. These calipers were tested across several matching procedures that varied the matching ratio, from one KL2 scholar matched to one other K awardee to one KL2 scholar matched to as many as five K awardees. A full matching, or "fullmatch," procedure was also tested because it is designed to minimize the total distance in propensity scores between KL2 scholars and the faculty matched to them (Sekhon, 2011).
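As an illustration of how these caliper widths and matching ratios might be explored with the packages cited in the next paragraph, the sketch below loops over the candidate combinations using the hypothetical objects from the propensity model sketch above. It is a simplified outline rather than the study's actual procedure, and argument details may differ across package versions.

```r
# Illustrative search over caliper widths and matching ratios
# (hypothetical objects; not the study's actual code).
library(Matching)   # Match(); Sekhon (2011)
library(optmatch)   # fullmatch(); Hansen & Klopfer (2016)

calipers <- c(0.2, 0.4, 0.6, 0.8, 1.0)   # in SD units of the propensity score
ratios   <- 1:5                          # K awardees matched per KL2 scholar

candidate_matches <- list()
for (cal in calipers) {
  for (m in ratios) {
    key <- sprintf("caliper_%.1f_ratio_%d", cal, m)
    candidate_matches[[key]] <- Match(
      Tr       = k_awardees$kl2_scholar,  # treatment indicator (1 = KL2 scholar)
      X        = k_awardees$pscore,       # estimated propensity score
      M        = m,                       # matches per KL2 scholar
      caliper  = cal,
      estimand = "ATT"
    )
  }
}

# Full matching on the same propensity model, shown here with a 0.6 caliper
fm <- fullmatch(match_on(ps_model, caliper = 0.6), data = k_awardees)
```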
To identify the matching method that provided the most balanced comparison group, the p values of chi-square tests comparing covariates between the KL2 scholars and the candidate comparison groups were compared, and the matching method that returned the largest p value was selected for further testing. To determine the best combination of matching algorithm and caliper, we chose the combination that yielded the best covariate balance, using the optmatch and Matching packages in R (Hansen & Klopfer, 2016; Sekhon, 2011). The covariate balance of the two groups was then assessed with a chi-square test of the hypothesis that all coefficients except the intercept (β0) equal zero (Zeileis et al., 2008).
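One way to generate balance diagnostics for a candidate match is sketched below using MatchBalance from the Matching package. Note that this routine reports t-test and Kolmogorov–Smirnov statistics rather than the overall chi-square balance test described above, so it is illustrative only, and the object names remain hypothetical.

```r
# Illustrative balance diagnostics for one candidate match
# (hypothetical objects; not the study's actual code).
balance_formula <- kl2_scholar ~ award_year + awardee_age +
  n_prior_awards + log_prior_award_value

MatchBalance(
  balance_formula,
  data      = k_awardees,
  match.out = candidate_matches[["caliper_0.6_ratio_1"]],
  nboots    = 500   # bootstrap replications for the KS tests
)
```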
Zero-inflated Poisson (ZIP) models were used to measure the association between participation in the KL2 Scholars Program and the number of months until receipt of an R01 award. ZIP and zero-inflated negative binomial models are appropriate for analyses of outcomes that contain a large proportion of "structural" zeros, indicating null observations (Zeileis et al., 2008; Long & Freese, 2014; Chambers & Hastie, 1992). These models first estimate the likelihood that an observation is a structural zero and then assess the association between receipt of a KL2 award and the number of months between the first K23 or K08 award and the first R01 award.
Following an established practice in clinical research and evaluation, this model was also used to account for the serial correlation of the data (Nelson & Leroux, 2006; Yau et al., 2004). For example, He and colleagues determined that ZIP models should be used with cross-sectional data in which the outcome variable is a count of time increments with structural zeros, such as when estimating the effects of selected treatments on the number of days of alcohol drinking per month (He et al., 2014). The limitations of using ZIP models with small samples of cross-sectional data, particularly in comparison to the use of survival models with larger longitudinal datasets, are discussed later.
Each of the ZIP models used in this study included two covariates: one dichotomous variable that flagged individuals who applied to the KL2 Scholars Program, regardless of award receipt, and one ordinal variable counting the number of different K awards received by each individual (i.e., KL2 followed by K23/K08). Including these covariates is necessary to control for the fact that faculty often apply for and receive multiple K awards, each of which could hypothetically promote faculty members' progress towards scientific independence. The Akaike information criterion (AIC) and AIC-corrected Vuong tests were used to compare these models and identify the one with the best fit to the data (Vuong, 1989). Then, bootstrapping with 2000 replications was used to resample the data and calculate more precise confidence intervals for the best-fitting model (Reynolds & DesJardins, 2009).
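A minimal sketch of how such a ZIP model, the model comparison, and the bootstrapped confidence intervals might be implemented with the pscl and boot packages is shown below. The outcome and covariate names are hypothetical placeholders consistent with the variables described above, and the code is illustrative rather than the study's actual analysis.

```r
# Illustrative ZIP analysis of months from first K award to first R01
# (hypothetical column names; not the study's actual code).
library(pscl)   # zeroinfl() and vuong(); Zeileis et al. (2008)
library(boot)

zip_formula <- months_to_r01 ~ kl2_scholar + applied_kl2 + n_k_awards

zip_fit  <- zeroinfl(zip_formula, data = k_awardees, dist = "poisson")
zinb_fit <- zeroinfl(zip_formula, data = k_awardees, dist = "negbin")

# Vuong test (raw, AIC-, and BIC-corrected statistics) comparing the
# non-nested zero-inflated models
vuong(zip_fit, zinb_fit)

# Bootstrapped 95% CI for the KL2 count-model coefficient (2000 replications);
# kl2_scholar is assumed to be a 0/1 numeric indicator
boot_kl2 <- function(d, idx) {
  fit <- zeroinfl(zip_formula, data = d[idx, ], dist = "poisson")
  coef(fit)[["count_kl2_scholar"]]
}
set.seed(2020)
boot_out <- boot(k_awardees, boot_kl2, R = 2000)
boot.ci(boot_out, type = "perc")
```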
Finally, sample size estimates were produced using PASS and G*Power software (Faul et al., 2009; Mathews, 2010; Morrow & Peter, 1996; Campbell & Stephen, 2014) to assess how this study could be extended to more precisely determine the impact of CTSA KL2 Scholars Programs on related measures of scientific productivity. These analyses used the same set of covariates included in the regression models already described (Hastie & Pregibon, 1992; Venables & Ripley, 2002) and the results of the ZIP models, along with one multiple linear regression that measured the strength of the association between KL2 participation and the number of months taken to receive R01 funding.
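For readers without access to PASS or G*Power, the sketch below shows a rough normal-approximation calculation of power for a two-sided z-test of the difference between two Poisson event rates. It is a simplified approximation, not the procedure implemented in those packages, so its output will not exactly match the power figures reported in the Results.

```r
# Rough normal-approximation power for a two-sided z-test of the difference
# between two Poisson event rates (illustrative only; PASS and G*Power use
# more refined procedures).
poisson_diff_power <- function(rate1, rate2, n1, n2, alpha = 0.05) {
  se     <- sqrt(rate1 / n1 + rate2 / n2)   # SE of the rate difference
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(abs(rate1 - rate2) / se - z_crit)   # approximate power
}

# Example with group sizes similar to those discussed in the Results
poisson_diff_power(rate1 = 0.52, rate2 = 1.0, n1 = 37, n2 = 197)
```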
RESULTS
Propensity Score Matching Analyses
Over 500 U-M faculty with non-CTSA K awards were identified from the UMETRICS data, just over 200 of whom were ultimately included in the matched group following the optimized propensity score matching process detailed in the Methods section. The fullmatch procedure returned the highest p values compared with the other matching procedures. As shown in Table 2, tests for balance between the matched group and the KL2 scholars group indicate that the full matching method with a caliper of 0.6 returned the highest p value (p = 0.93).
Based on these findings, a fullmatch procedure using a 0.6 caliper was used to generate propensity scores for the matched group and the KL2 scholars. The propensity scores generated for the matched group showed a considerable degree of overlap with those of faculty who obtained MICHR's KL2 award, as shown in Figure 1. This degree of overlap demonstrates an acceptable level of common support (Reynolds & DesJardins, 2009), indicating that the matching process could be successfully used to identify a KL2 scholars group and similar matched group based on these propensity scores.
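A plot like the one in Figure 1 could be produced from the hypothetical objects in the Methods sketches with a single boxplot call, for example:

```r
# Illustrative overlap plot of propensity scores by group
# (hypothetical objects; not the study's actual code).
boxplot(pscore ~ kl2_scholar, data = k_awardees, log = "y",
        names = c("Matched group (0)", "KL2 scholars (1)"),
        ylab  = "Propensity score (log scale)")
```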



After matching, no significant difference was evident between the two groups on any of the covariates (Table 3). Before matching, the K awardees in the matched group had significantly more prior grant awards than the KL2 scholars (p < 0.01), indicating that the matching process improved balance. The successful identification of this matched group enabled the impact of MICHR's KL2 Program to be estimated relative to a similar comparison group.
As previously stated, our hypothesis is that participation in MICHR's KL2 Scholars Program reduced the time faculty took to receive an NIH R01 award. The paths taken to an R01 award by all U-M faculty members receiving any type of K award between 2005 and 2015 are shown in Figure 2. As this figure shows, the faculty receiving KL2 awards comprise only a fraction of the total U-M faculty with K awards.



Of the 43 faculty identified in the UMETRICS data as being in the KL2 Scholars Program during this time period, 16.3% (n = 7) obtained an R01. No substantial difference was found between the KL2 scholars and the matched faculty group in their likelihood of receiving an R01 award regardless of the amount of time taken to do so. However, faculty who obtained a KL2 as well as another K award went on to receive R01 awards at a slightly higher rate than those with just KL2 support (0.8% vs. 0.4%, respectively).
Impact of KL2 Awards on the Time to an R01 Award
To estimate the impact of the KL2 Scholars Program on the time taken for participants to receive an R01 award, the KL2 scholars were compared with the matched group. Comparisons between the fit of the four models tested in this study suggest that either the zero-inflated Poisson or the zero-inflated negative binomial model provided the best fit. These two models returned AIC values that were not significantly different from each other (AIC-corrected Vuong z-statistic = 1.003; p = 0.158). The ZIP model was therefore used to model the effects of program participation on the count outcome.
The results of the ZIP model suggest that those participating in the MICHR KL2 Scholars Program earned an R01 award in significantly less time than their peers in the matched group (Table 4). The exponential of the coefficient for KL2 Scholars Program participation represents the expected proportional difference in the number of months between the KL2 scholars and the matched group, holding all other predictors constant. When interpreted directly, these results suggest that participation in MICHR's KL2 Scholars Program is associated with roughly a halving of the number of months needed to obtain an R01 award (i.e., \({e^{ - .662}} = 0.515\)) compared with the matched group of faculty receiving a different K award.
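As a quick arithmetic check of this interpretation, the exponentiated point estimate can be computed directly:

```r
exp(-0.662)   # ≈ 0.52, i.e., roughly half the expected number of months
```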
However, these ZIP coefficients could also be interpreted as representing the difference in the number of months taken to earn an R01 award between those who receive an R01 award and those who never will (Vuong, 1989). In addition, the bootstrapped 95% confidence interval for the variable flagging participation in the KL2 Scholars Program ranged from −17.162 to 0.007, indicating that the actual magnitude of the impact of the MICHR KL2 Scholars Program on the time to R01 attainment could be much smaller or even negligible.
The ZIP point estimates were used to calculate the sample size needed to reproduce the study with other, similar measures of scientific productivity. A sample of 234, composed of 37 KL2 scholars and 197 faculty receiving other K awards, achieves 80.4% power to detect a difference of −0.49 between the treatment and control groups (with event rates of 0.52 and 1, respectively) using a two-sided large-sample z-test of the difference in Poisson event rates at a significance level of 0.05. While the sample used in this study exceeds these estimates, the sample size calculations assume that both groups are followed for the same fixed duration; imposing an equal, fixed follow-up duration in this study would have reduced the sample available for analysis. A sample exceeding that used for this study (i.e., 49 KL2 scholars and 261 faculty with other K awards) would be required to achieve 90% power to detect a difference of the size estimated here, assuming this approach is extended to evaluate similarly variable measures of scientific productivity.
Analysis of Scholar Self-Evaluations and Program Evaluations
The results of the propensity score matching process and the impact estimates on R01 attainment are indirectly supported by additional evaluations that drew on different sources of data. Specifically, surveys of recent cohorts of MICHR KL2 scholars indicate that the Research Studio training they participated in was helpful to their work. Approximately 97% of 150 total responses to feedback surveys collected from Studio attendees in 2016, and 99% of 149 total responses collected in 2017, indicated that this training helped to advance their research. These results, and those of other questions included on the feedback form, broadly suggest that participating faculty have consistently positive training experiences through the MICHR Research Studios.
Similarly, the scholars' responses to the CRAI-12 assessments suggested they had confidence in their ability to carry out a range of clinical and translational research skills. Across all of the research skills assessed by this instrument, KL2 scholars reported an average self-rating of 5.82 on an 11-point scale ranging from no confidence (0) to total confidence (10). These scholars were most confident in their abilities to obtain informed consent and to write up the results of their research (with an average rating of 7.0 for both) and least confident in their ability to ask staff to leave a project team when necessary (with a mean rating of 3.8). While these figures reflect the self-efficacy of only recent cohorts of scholars, they suggest that these scholars possess considerable research knowledge and skill, as would also be expected of past cohorts, for whom these data could not be collected post hoc.
Further data obtained from publicly available records, including NIH RePORTER and PubMed, indicate that all 29 KL2 scholars who graduated from the program since 2012 were engaged in clinical and translational research as of 2019. Specifically, these data were used to find evidence of recent research activity as defined by the CTSA Common Metrics operational guidelines designed for use in evaluating KL2 programs (Schneider et al., 2015). This standard outcome metric suggests that MICHR KL2 scholars are going on to pursue successful clinical and translational research careers.
Collectively, these measures represent key programmatic outcomes across multiple levels, including scholars' programmatic experience, learning, and relevant scientific behavior. However, the evaluation outcomes in these three domains were not analyzed relative to those of any meaningful comparison group. For this and other reasons detailed in the Limitations section, these results provide only indirect support for the hypothesis tested in this study: specifically, they support the plausibility of the hypothesis that MICHR's KL2 Scholars Program helped to accelerate U-M faculty members' receipt of an NIH R01 award.
DISCUSSION
The main objective of this research was to examine the impact of the MICHR KL2 Scholars program on the time participating faculty took to obtain an NIH R01 award. First, the results suggest that MICHR's KL2 Scholars Program likely does have a positive impact by reducing the time participating faculty take to attain an R01 research award, although the magnitude of this effect cannot yet be accurately determined.
Second, this work demonstrated that this KL2 Scholars Program has had a positive impact on faculty development, as evaluated using Kirkpatrick and Kirkpatrick's (2005) comprehensive evaluation framework. While the evaluations of these scholars' experience, learning, and behavior were not the primary focus of this paper, they provide additional evidence regarding the impact of this training program on faculty development. The results of these evaluations also indirectly support the plausibility of the hypothesis that these programs can have a downstream impact on the time taken to receive an R01 award. Triangulating the results of these secondary analyses (Mathison, 1988) suggests that they complement the claims of programmatic impact advanced in this paper. Overall, this evidence suggests MICHR's KL2 Scholars Program effectively supports the development of scholars' careers as clinical and translational scientists.
The evidence presented here also raises the possibility that MICHR's KL2 Scholars Program does not change faculty members' likelihood of earning an R01 award but rather accelerates their progress towards these awards. These findings can inform further evaluations of CTSA KL2 programs, including studies with larger samples of KL2 scholars. These findings also highlight the need for evaluations of KL2 programs that include a more comprehensive range of scientific outcomes, including the receipt of other notable clinical and translational research awards.
LIMITATIONS
Two particular limitations of this study merit emphasis. First, evaluations capable of reliably and accurately measuring the impact of research funding programs for university faculty cannot be conducted without controlling for factors (such as gender and scientific discipline) known to be linked both to the pursuit of a career in clinical and translational research and to the likelihood of obtaining an R01 award (Trochim et al., 2013; Pion & Cordray, 2008; Martinez et al., 2015; Sung et al., 2003). While this study did control for faculty members' intent to participate in the KL2 Scholars Program by including a variable flagging those who applied, this predictor clearly cannot capture the whole of individuals' intent to pursue clinical and translational research careers.
Second, small sample sizes limited the interpretability of the ZIP models. Although there is no well-established standard for the sample size required by ZIP models, the power calculations presented here suggest that a lack of power may have yielded imprecise impact estimates. A larger sample would make it possible to derive the matched group entirely from unfunded applicants to a CTSA KL2 Scholars Program, which would better control for investigators' intention to pursue clinical and translational research. A larger sample would also permit refinement of the matching covariates to account for structural differences between the K23/K08 and KL2 mechanisms, such as differences in the duration and funding of the training awards, as well as operational and cultural differences across institutional settings, all of which are known to vary considerably across hubs (Sorkness et al., 2020).
CONCLUSION
This study shows that MICHR's KL2 Scholars Program had a positive impact on a key measure of scientific independence, namely the time taken to earn an NIH R01 award. The evaluation data presented here also suggest that this program has had a positive impact on faculty members' research training experience, learning, and careers. Additionally, by using common measures of scientific productivity and well-established conceptual frameworks, this study adopted an approach that can inform the evaluation of similar KL2 Scholars Programs throughout the CTSA Consortium. Future quasiexperimental evaluations should also include other long-term outcome measures of scientific productivity obtained from within and outside the CTSA Consortium.
The findings from this study have practical and theoretical implications that could inform new directions for research. Future research can build upon this work by using similar approaches with larger sample sizes to test longitudinal models capable of controlling for a wider range of those factors likely to affect clinical and translational research production over long periods of time. Doing so would enable the average treatment effects of these programs on the participants to be specified and calculated with greater precision (Long & Freese, 2014; Greene, 2012). Such work could also test both matching algorithms and calipers to confirm or identify which combinations routinely produce the most well-balanced comparison groups (Austin, 2011; Austin & Stuart, 2021; Campbell & Stephen, 2014; Wang, 2021). Further evaluations of KL2 Scholars Programs in the CTSA Consortium can help scientists, program administrators, and evaluators better understand the impact that these awards have on the field of clinical and translational science.

Figure 1. Box plot of propensity scores for KL2 scholars (1) and the institutional comparison group (0) on a log scale.

Figure 2. Sankey diagram of the paths U-M K awardees took to an R01 award (2005–2015).
Note: Diagram was created using SankeyMATIC.
Contributor Notes
ELIAS SAMUELS
Elias Morrel Samuels, PhD, serves as the Administrative Program Director for MICHR Workforce Development and Director of Evaluation. He has extensive experience in higher education administration and is responsible for the development, implementation, and evaluation of workforce development initiatives for all of MICHR. Email: eliasms@med.umich.edu
PHILLIP A. IANNI
Phillip Ianni, PhD, is an expert in the validation of skill assessments and is responsible for designing processes to examine program impact on the long-term career outcomes of MICHR trainees and scholars. Email: pianni@med.umich.edu
BRENDA EAKIN
Brenda Eakin, MS, serves as the Administrative Program Director for MICHR Career Development and Mentoring programs. She has extensive experience in the design and implementation of competency-based education programs and works closely with MICHR faculty on all aspects of the MICHR education portfolio. Email: beakin@med.umich.edu
ELLEN CHAMPAGNE
Ellen Champagne supports the evaluation of training initiatives and programs throughout MICHR in collaboration with the Director of Evaluation, Dr. Samuels. She also serves as liaison with evaluation administrators at other CTSA hubs. Email: ellecham@med.umich.edu
VICKI ELLINGROD
Vicki Ellingrod, PharmD, FCCP, is Associate Dean for Research and Graduate Education at the University of Michigan College of Pharmacy. She currently serves as the John Gideon Searle Professor of Clinical and Translational Psychiatry in the Clinical Pharmacy Department. Dr. Ellingrod is also the Associate Director for MICHR. She serves as Faculty Lead for the Education and Mentoring Group at MICHR and is Director of the MICHR KL2 program. Email: vellingr@med.umich.edu


