Objective To explore the impact of racial and ethnic diversity on the performance of cardiac surgical risk models, the Chinese SinoSCORE was compared with the Society of Thoracic Surgeons (STS) risk model in a diverse American population.
Methods The SinoSCORE risk model was applied to 13 969 consecutive coronary artery bypass surgery patients from twelve American institutions. SinoSCORE risk factors were entered into a logistic regression to create a ‘derived’ SinoSCORE whose performance was compared with that of the STS risk model.
Results Observed mortality was 1.51% (66% of that predicted by the STS model). The SinoSCORE ‘low-risk’ group had a mortality of 0.15%±0.04%, while the medium-risk and high-risk groups had mortalities of 0.35%±0.06% and 2.13%±0.14%, respectively. The derived SinoSCORE model had relatively good discrimination (area under the curve (AUC)=0.785) compared with that of the STS risk score (AUC=0.811; P=0.18 comparing the two). However, specific factors that were significant in the original SinoSCORE but that lacked significance in our derived model included body mass index, preoperative atrial fibrillation and chronic obstructive pulmonary disease.
Conclusion SinoSCORE demonstrated limited discrimination when applied to an American population. The derived SinoSCORE had a discrimination comparable with that of the STS, suggesting underlying similarities of physiological substrate undergoing surgery. However, differential influence of various risk factors suggests that there may be varying degrees of importance and interactions between risk factors. Clinicians should exercise caution when applying risk models across varying populations due to potential differences that racial, ethnic and geographic factors may play in cardiac disease and surgical outcomes.
- surgery-coronary bypass
- risk stratification
- research methods
Despite considerable improvements in cardiovascular therapies, coronary artery disease (CAD) remains the leading cause of morbidity and mortality worldwide.1–3 Likewise, this trend shows no signs of reversing in rapidly developing countries such as China, where there is expected to be a >50% increase in coronary events over the next 15 years.2 This has resulted in a similar upwards shift in coronary artery bypass grafting (CABG) surgery volume. However, notably absent from this movement is the adoption of a standard risk model to assess CABG mortality in Chinese populations.1 4
Cardiac risk models have traditionally been derived from and developed for American and European populations; however, these models may prove inaccurate when applied to Chinese populations.5–8 As medicine becomes increasingly collaborative and standardised internationally, it remains unclear if objective standards can be applied in risk adjustment to compare outcomes and prognosis in the setting of racial, ethnic and cultural differences. We therefore undertook the novel exploration of applying SinoSCORE, a contemporary Chinese cardiac risk model, in a diverse American population.9 10 Given both the increasing emphasis on quality assessment in cardiac surgery and the increasing prevalence of CAD in China, we find this investigation to be both timely and compelling.11
Materials and methods
The study population consisted of all patients who underwent CABG surgery in any of 12 cardiac surgery programmes affiliated with the Columbia University HeartSource programme from May 2006 to March 2017. These 12 programmes represent a broad array of medical centres (academic and non-academic; urban, suburban and rural) located throughout different regions of the USA. The purpose of the Columbia HeartSource project is to help community hospitals enhance an existing cardiovascular programme or to launch one de novo. The programme involves Columbia faculty in real-time consultation, discussion of challenging cases and external peer review of all deaths and other significant cases. As a result, Columbia HeartSource personnel have ready access to affiliates’ surgical data. All surgical cases are prospectively entered into a web-based proprietary Society of Thoracic Surgeons (STS)-compliant database that is regularly monitored for timeliness and accuracy.
Our hypothesis is that racial and ethnic differences between populations limit the ability of professionally accepted risk models to predict mortality following CABG surgery. More specifically, we hypothesise that differences between American and Chinese populations will impact the performance of risk models derived from each of those surgical populations. Ideally, our hypothesis would be tested by applying both a Chinese and an American risk model to a representative sample of patients from each population; however, since the Chinese population was not available for study, we elected to compare the performance of each risk model in a diverse American population.
A total of 13 969 consecutive patients underwent CABG surgery and could have an STS operative mortality risk score calculated (ie, isolated CABG or CABG+valve) during this study period. Preoperative and demographic information, operative data and inhospital mortality for all patients were retrieved from the STS-compliant HeartSource database. Excluded were all patients who had associated procedures for which there was no STS risk prediction model.
All risk factors relevant to SinoSCORE9 10 and STS12 13 risk calculations were collected. Available data were entered into each risk calculation using appropriate definitions. Where there were minor discrepancies between the two models in the definitions of risk factors (chronic obstructive pulmonary disease (COPD), atrial fibrillation and so on), reasonable approximations were made (see online supplementary file). Based on original SinoSCORE methodology, a SinoSCORE was then calculated for each patient. STS mortality risk scores were retrieved from the HeartSource database.9 SinoSCORE was calculated for all patients undergoing CABG, including those with associated valve surgery, whereas separate STS risk models were employed for isolated CABG and for combined CABG/valve procedures.12 13 In accordance with original SinoSCORE methodology, patients were stratified into low-risk, medium-risk and high-risk groups according to their respective SinoSCOREs (low: ≤1, medium: >1 and <6, high: ≥6).
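The stratification step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `sinoscore_risk_group` is hypothetical, and a score of exactly 1 is assumed to belong to the low-risk group, as the group definitions reported in the Results (≤1, between 1 and 6, ≥6) imply.

```python
def sinoscore_risk_group(score: float) -> str:
    """Assign a SinoSCORE to a risk group.

    Assumed cut-offs: low <= 1, medium > 1 and < 6, high >= 6.
    """
    if score <= 1:
        return "low"
    if score < 6:
        return "medium"
    return "high"


# Illustrative scores only; these are not patient data from the study.
for example_score in (-1, 1, 3, 6, 12):
    print(example_score, sinoscore_risk_group(example_score))
```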
The primary endpoint for this study was mortality. All calculations for mortality related to SinoSCORE were based on inhospital mortality—or death occurring during the hospitalisation in which the operation was performed—as SinoSCORE is a risk prediction model for inhospital mortality. All calculations for mortality related to the STS risk models were based on operative mortality—or death occurring during the hospitalisation in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities), or death occurring after discharge from the hospital, but before the end of the thirtieth postoperative day—as STS is a risk prediction model for operative mortality.
Categorical data are presented as frequency distributions and simple percentages. Values of continuous variables are expressed as a mean±SD or as a median±IQR, as appropriate. Performance of both the SinoSCORE and STS risk models was assessed by comparing the observed and expected mortalities for the overall population and for each subgroup defined by predicted risk score.
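As an illustration of the observed versus expected comparison, the sketch below computes an observed to expected (O/E) mortality ratio in Python. The function name and the toy inputs are hypothetical; expected deaths are assumed to be the sum of the individual predicted risks, which is the usual convention but is not spelled out in the text.

```python
def observed_expected_ratio(deaths, predicted_risks):
    """Return observed deaths divided by expected deaths.

    deaths: iterable of 0/1 outcomes (1 = died).
    predicted_risks: iterable of model-predicted probabilities of death.
    """
    observed = sum(deaths)
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Toy cohort of four patients, one death, 0.6 expected deaths.
ratio = observed_expected_ratio([0, 0, 1, 0], [0.01, 0.02, 0.50, 0.07])
print(round(ratio, 2))
```

A ratio below 1 means fewer deaths were observed than the model predicted; a ratio above 1 means more.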
To assess discrimination within the overall cohort and subgroups, we used receiver operating characteristic (ROC) curves to calculate the area under the curve (AUC). An AUC of 0.5 indicates no discriminatory ability, while an AUC of 1.0 indicates perfect discriminatory ability. To assess calibration, we used the Hosmer-Lemeshow (H-L) goodness-of-fit test. Models with H-L P values greater than 0.05 were considered well calibrated for our study population. Calibration plots comparing observed with expected mortality based on the predicted risks calculated by the SinoSCORE and STS were created. A perfectly calibrated prediction model would follow the 45° line, while curves below the line are considered to be overestimates and curves above the line are considered to be underestimates. All analyses were performed using SAS, V.9.2. P<0.05 was considered significant.
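The two performance measures can be sketched in plain Python. This is an illustration under stated assumptions, not a reproduction of the SAS analysis: the function names are hypothetical, the AUC is computed as the proportion of death/survivor pairs in which the death received the higher predicted risk (ties count as half), and the H-L statistic uses equal-sized groups of patients ordered by predicted risk. The H-L P value, omitted here, is conventionally taken from a chi-square distribution with (groups − 2) degrees of freedom.

```python
def auc(risks, outcomes):
    """Area under the ROC curve via pairwise comparison (quadratic time)."""
    died = [r for r, y in zip(risks, outcomes) if y == 1]
    survived = [r for r, y in zip(risks, outcomes) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in died:
        for s in survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(died) * len(survived))


def hosmer_lemeshow_statistic(risks, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(r for r, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_risk = expected / len(chunk)
        variance = len(chunk) * mean_risk * (1 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data: the two deaths have the two highest predicted risks.
print(auc([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1]))
```

The pairwise AUC is written for clarity rather than speed; for a cohort of this size a rank-based implementation would be preferable.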
The published form of the SinoSCORE risk model does not provide a direct correlation between SinoSCORE and predicted risk of mortality for each individual score, but rather categorises patients into low-risk, medium-risk and high-risk groups, indicating a range of predicted mortality for each of these subgroups. Due to the inability to assign a distinct predicted risk of mortality to each increment of the SinoSCORE, we developed a logistic regression model using risk factors found to be significant in the originally developed SinoSCORE as independent variables and hospital mortality as the dependent variable, resulting in a ‘derived SinoSCORE’. We then compared the performance of this ‘derived SinoSCORE’ with the STS risk model in our diverse American patient population.
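To show how a logistic regression model of this kind yields an individual predicted risk, the following Python sketch evaluates the logistic function for one patient and converts a coefficient to an OR. The function names, the intercept and the coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted values of the derived SinoSCORE.

```python
import math


def predicted_mortality(intercept, coefficients, risk_factors):
    """Predicted probability of death from a logistic model.

    coefficients: dict mapping risk-factor name to log-odds coefficient.
    risk_factors: dict mapping risk-factor name to the patient's value
                  (1/0 for presence/absence, or a numeric value).
    """
    log_odds = intercept + sum(
        beta * risk_factors.get(name, 0.0)
        for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(beta):
    """OR associated with a one-unit increase in a risk factor."""
    return math.exp(beta)


# Hypothetical coefficients, for illustration only.
example_coefficients = {"chronic_renal_failure": 0.48, "combined_valve_surgery": 0.91}
example_patient = {"chronic_renal_failure": 1, "combined_valve_surgery": 0}
print(predicted_mortality(-4.5, example_coefficients, example_patient))
```

Because every patient receives a distinct probability, a model of this form can be compared directly with STS predicted risk using the AUC and calibration measures described above.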
Missing data regarding COPD for the SinoSCORE risk model were replaced using single imputation methods. SinoSCORE defines COPD as ‘long-term use of bronchodilators or steroids for lung disease’; however, use of bronchodilators was not collected in earlier versions of the STS Database (V.2.52 and V.2.61) but was recorded in later versions (V.2.73 and V.2.81). Patients with applicable COPD information available were examined. For populations with no information available, an imputed value for SinoSCORE proportionate to the probability of the prevalence of COPD in that population was calculated and assigned to all patients with missing data.
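One possible reading of this imputation step is sketched below in Python. It is an assumption-laden illustration: the function name is hypothetical, `copd_points` stands in for the SinoSCORE weight for COPD (passed as a parameter because its value is not given here), and prevalence is estimated from the patients whose COPD status is recorded.

```python
def impute_copd_contribution(copd_status, copd_points):
    """Return each patient's COPD contribution to the score.

    copd_status: list of 1 (COPD), 0 (no COPD) or None (not recorded).
    copd_points: score weight for COPD (hypothetical parameter).
    Patients with missing status receive the weight multiplied by the
    COPD prevalence observed among patients with recorded status.
    """
    recorded = [s for s in copd_status if s is not None]
    if not recorded:
        raise ValueError("no recorded COPD status to estimate prevalence")
    prevalence = sum(recorded) / len(recorded)
    imputed = copd_points * prevalence
    return [copd_points * s if s is not None else imputed for s in copd_status]


# Toy example: 1 of 4 recorded patients has COPD, one status is missing.
print(impute_copd_contribution([1, 0, 0, 0, None], copd_points=2.0))
```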
Institutional review board
This study was approved by the Columbia University Institutional Review Board with waiver of consent.
A summary of population characteristics can be found in table 1. The mean age of our patient cohort was 67.14 years, and 74.55% were men. Of note, only 1.38% of the study population was of Asian origin. The SinoSCORE ‘low risk’ group (≤1) had a mean SinoSCORE of −0.96±1.22, while the ‘medium’ (1<x<6) and ‘high’ (≥6) risk groups had a mean SinoSCORE of 3.4±1.07 and 11.69±4.47, respectively.
Comparison in different populations (primary analyses)
Of the entire cohort, there were 211 deaths observed, resulting in an overall mortality of 1.51%. The STS observed to expected ratio for operative mortality was 0.66. The low-risk subgroup had a mortality of 0.15%±0.04%, the medium-risk subgroup had a mortality of 0.35%±0.06% and the high-risk subgroup had a mortality of 2.13%±0.14%. Observed and predicted operative mortality using SinoSCORE methodology in the American and Chinese populations are summarised in table 2. In both populations, SinoSCORE successfully identified patients who were low-risk, medium-risk and high-risk for CABG-related mortality. However, in contrast to the distinct non-overlapping CIs in the original Chinese population from which the SinoSCORE was derived, there was considerable overlap in CIs between risk groups in the American population, suggesting that the discrimination of SinoSCORE in the American population may not be optimal.
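The overall mortality figure follows directly from the counts (211 deaths among 13 969 patients). The Python sketch below reproduces that arithmetic and adds an approximate 95% CI together with a simple overlap check. The CI method is an assumption: the text does not state how the ± values or CIs were obtained, so a normal-approximation (Wald) interval is used here purely for illustration, and the function names are hypothetical.

```python
import math


def mortality_with_ci(deaths, patients, z=1.96):
    """Return (rate, lower, upper) using a Wald interval (assumed method)."""
    rate = deaths / patients
    standard_error = math.sqrt(rate * (1 - rate) / patients)
    lower = max(0.0, rate - z * standard_error)
    upper = min(1.0, rate + z * standard_error)
    return rate, lower, upper


def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share any value."""
    return first[0] <= second[1] and second[0] <= first[1]


rate, lower, upper = mortality_with_ci(211, 13969)
print(f"{100 * rate:.2f}%")  # overall mortality from the reported counts
```

Applying `intervals_overlap` to the intervals of two risk groups gives a quick check of whether the groups are separated in the sense discussed above.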
Comparison of models (secondary analyses)
To explore whether this discrepancy in performance was a byproduct of the risk factors included in the SinoSCORE model or the population from which the model is derived, we developed a model (‘derived SinoSCORE’) using logistic regression. Significant risk factors included in both the SinoSCORE and derived SinoSCORE model are summarised in table 3.10 Factors that were found to be significant in the original SinoSCORE, but lacked significance in the derived SinoSCORE model included body mass index, preoperative atrial fibrillation, COPD and non-elective surgery.
The derived SinoSCORE model had relatively good discrimination (AUC=0.785) for our patient cohort compared with that of the STS risk score (AUC=0.811), with no significant difference in discrimination between these two models (P=0.18; figure 1). Model calibration was evaluated using the H-L test. The derived SinoSCORE was well calibrated for this population (P=0.146), whereas the STS risk score tended to overestimate operative mortality (P<0.001) due to the lower than expected mortality in this high-quality surgical population (figure 2).
To our knowledge, this is the first study that applies the SinoSCORE to a non-Chinese population. With increasing globalisation of information and medical knowledge, these findings sound a certain note of caution in applying risk models to populations different from those from which they were derived. A stronger understanding of the underlying factors that limit the broad application of risk models may shed insight into the intrinsic differences between populations and reduce inappropriate utilisation of risk models, thereby potentially reducing patient mortality.8 14 15 Moreover, since China in particular has recently been more open to embracing Western medical standards, a more nuanced understanding of these limitations is increasingly relevant.
Application of the SinoSCORE model to the American population suggested questionable discrimination, as demonstrated by overlapping confidence intervals, and thus SinoSCORE may not be an appropriate risk model for American populations. On further analysis, when SinoSCORE significant risk factors were applied to our cohort, the risk model (derived SinoSCORE) had a discrimination that was comparable with that of the STS (P=0.18). This suggests that there are similarities in the patient-related factors that determine operative risk, but that the relative influence of each risk factor may vary between populations (table 3).
Overlapping risk factors between SinoSCORE and derived SinoSCORE also suggest that risk factors for CABG mortality in both populations may be similar, but the differing ORs for identical risk factors indicate that the interaction between risk factors may differ between the two populations. For example, chronic renal failure (OR: 2.4 vs 1.62), COPD (OR: 2.46 vs 0.86) and extracardiac arteriopathy (OR: 4.38 vs 1.54) played greater roles in mortality in the Chinese population than in the American population. This may be explained by underlying factors that have uniquely shaped the CAD landscape in China. For example, the prevalence of hypertension in China rose dramatically in the previous decade, and ineffective management of hypertension has been linked to the rising incidence of renal failure.16 17 The increased role of COPD in CABG mortality may reflect a higher incidence of cigarette smoking and worsening air pollution, as pollutant concentrations have been recorded to be greater than four times the WHO recommended limit and have been linked to more than 1.3 million premature deaths in 2010.18–20 Moreover, China’s urbanisation and industrialisation have been associated with the adoption of ‘Western’ dietary patterns and decreased physical activity that are likely linked to increasing rates of extracardiac arteriopathy.1 3 11 17 20 However, factors such as combined valve surgery (OR: 2.51 vs 2.49) and preoperative critical staging (OR: 2.20 vs 2.92) had similar impacts in both populations, which is not surprising given that these parameters are less influenced by cultural factors but rather tend to reflect sicker patients and more complicated operations. Our data shed light on the intrinsic differences between the Chinese and American populations included in this study, which are then reflected in the performance of risk models/risk scores.
This study highlights how risk models may not capture underlying differences in non-clinical aspects of disease and supports the hypothesis that variation in patient composition may be a significant source of error in cardiac risk models.21
While the derived SinoSCORE and STS had comparable discrimination, the derived SinoSCORE demonstrated superior calibration to STS (P=0.146 vs P<0.001) in our patient population. This is unsurprising given that the derived SinoSCORE was derived from the patient population to which it was applied and would therefore be expected to show superior calibration. In addition, the relatively high-performance network from which the data were derived had an observed to expected mortality ratio of 0.66, which likely contributed to the STS model’s overprediction of mortality and relatively poor calibration.
Our study highlights how risk model performance may be dependent on sample population and shaped by both clinical and non-clinical factors including racial and cultural differences, which need to inform comparative performance assessment. Independent external validations of CABG risk models in broad populations should continue as current risk models may not be suitable for all populations.
As alluded to in the Methods, the patient population would ideally have included both a Chinese and an American population to evaluate risk model performance; however, we were unable to obtain access to a Chinese population. In addition, our cohort cannot definitively be said to represent the general American population. However, our sample included patients from a broad range of institutions that varied in size, location and volume. Likewise, SinoSCORE was derived from 42 centres that likely represent an appropriate sampling of the Chinese population with advanced CAD who undergo CABG. Furthermore, while definitions and criteria for certain risk factors did not align exactly between SinoSCORE and STS, appropriate approximations were made (see online supplementary file).
To our knowledge, this is the first study that applies the SinoSCORE to a non-Chinese population. The original SinoSCORE demonstrated limited discrimination when applied to an American population, indicating that it may not be appropriate for evaluating CABG mortality in certain populations. The derived SinoSCORE had a discrimination comparable with that of the STS, suggesting that there may be varying degrees of importance and interactions between risk factors within the context of the fundamental physiological similarity of patients undergoing advanced surgical coronary revascularisation. Clinicians must exercise caution when applying risk models across varying populations due to potential differences that racial, ethnic and geographic factors may play in cardiac disease and outcomes.
What is already known about this subject?
To date, approximately 20 risk models have been developed to study mortality following coronary artery bypass surgery. Given the wealth of supporting data and penetration of the database in the American population, the Society of Thoracic Surgeons risk model has gained increasing stature as the most widely accepted standard in the USA. However, emergence of risk models from other countries has raised the question as to the potential limitations of applying risk models that were well derived in one population to other populations with differences in racial and ethnic composition.
What does this study add?
Although the application and limitations of American risk models for cardiac surgical mortality in Chinese patient populations have previously been reported, this is the first report of a Chinese model being applied to a diverse contemporary American population. The findings suggest that although there is a fundamental physiological similarity among patients undergoing coronary artery bypass surgery, the performance of risk models is limited by differences in the impact of various risk factors in different populations.
How might this impact on clinical practice?
Although there is a tendency to seek ‘international’ standards for evaluating surgical performance, such efforts must always be tempered by adjustment for specific racial, ethnic and social differences between populations.
Contributors All authors participated in the analysis of the data and either manuscript writing or review.
Funding Funding for this project was supplied by entirely unrestricted internal funds from the Department of Surgery of Columbia University.
Disclaimer None of the authors has any conflict to disclose regarding any aspect of this work; all views expressed are those of the authors, not an official position of the Department of Surgery nor of the University.
Competing interests None declared.
Patient consent Not required.
Ethics approval Columbia University Human Subjects Protocol AAAQ2103.
Provenance and peer review Not commissioned; internally peer reviewed.
Presented at Presented at the 2016 Scientific Sessions of the American Heart Association, New Orleans, 14 November 2016.