Objective: The introduction of bundled payment for diabetes care in the Netherlands led to the emergence of care groups. This study explored to what extent variation in health care costs per patient can be attributed to the performance of care groups. Furthermore, the commonly applied simple mean aggregation was compared with the more advanced generalized linear mixed model (GLMM) for benchmarking health care costs per patient between care groups.

Data Source: Dutch nationwide insurance claims data from 2009 on patients with type 2 diabetes (104,544 patients, 50 care groups).

Study Design: Both simple mean aggregation and a GLMM approach were applied to rank care groups, using two health care cost indicators: total treatment costs and diabetes-specific specialist care costs per diabetes patient.

Principal Findings: Care groups varied only slightly on the first indicator; variation was concentrated in the second. Care group variation was not explained by composition. Although the two ranking methods were correlated, the rank positions of some care groups differed, which changed the membership of the top-10 and bottom-10 positions.

Conclusions: Differences between care groups become apparent when an appropriate indicator and a sophisticated aggregation technique are used. Currently applied benchmarking may have unfair consequences for some care groups.
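The abstract does not specify the GLMM used (distribution, link function, or case-mix covariates). To illustrate why a mixed model can rank care groups differently from raw means, the sketch below substitutes the simplest related case: a Gaussian random-intercept model with method-of-moments variance components. This model yields empirical-Bayes ("shrunken") group means. The function names (`rank`, `simple_mean_ranks`, `shrunken_means`) and the toy cost data are hypothetical. This is not the study's actual model.

```python
from statistics import mean, variance


def rank(scores):
    """Map each care group to its rank position, 1 = lowest cost."""
    order = sorted(scores, key=scores.get)
    return {group: position + 1 for position, group in enumerate(order)}


def simple_mean_ranks(costs):
    """Simple mean aggregation: rank care groups by raw mean cost per patient."""
    return rank({group: mean(values) for group, values in costs.items()})


def shrunken_means(costs):
    """Empirical-Bayes group means under an ASSUMED Gaussian random-intercept model.

    Assumed model: cost_ij = mu + u_j + e_ij, u_j ~ N(0, tau2), e_ij ~ N(0, sigma2).
    Variance components come from a simple method of moments; mu is approximated
    by the patient-weighted grand mean (a simplification of the full estimator).
    """
    groups = list(costs)
    n = {g: len(costs[g]) for g in groups}
    m = {g: mean(costs[g]) for g in groups}
    total_n = sum(n.values())
    grand = sum(sum(costs[g]) for g in groups) / total_n
    # Pooled within-group (patient-level) variance.
    within_ss = sum((x - m[g]) ** 2 for g in groups for x in costs[g])
    sigma2 = within_ss / (total_n - len(groups))
    # Variance of the group means, minus the part expected from sampling noise alone.
    noise = mean(sigma2 / n[g] for g in groups)
    tau2 = max(variance(list(m.values())) - noise, 0.0)
    estimates = {}
    for g in groups:
        # Reliability weight: close to 1 for large groups, smaller for small groups.
        weight = tau2 / (tau2 + sigma2 / n[g]) if tau2 > 0 else 0.0
        estimates[g] = grand + weight * (m[g] - grand)
    return estimates


# Hypothetical per-patient costs for four care groups (toy numbers, not study data).
costs = {
    "A": [100, 120, 110, 90, 105, 95, 115, 85, 100, 110],
    "B": [95, 105, 100, 98, 102, 100, 97, 103, 99, 101],
    "C": [70, 90],
    "D": [105, 115, 110, 100, 120, 110, 108, 112, 105, 115],
}
print(simple_mean_ranks(costs))
print(rank(shrunken_means(costs)))
```

In this toy example, group C has the lowest raw mean but only two patients. Its estimate is therefore pulled toward the overall mean much more strongly than the estimates for the larger groups. With many groups of differing size, this kind of shrinkage can move groups into or out of the top-10 and bottom-10 positions.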