Wait, so you are telling me that a physician’s gestalt about whether a patient has sepsis is more likely than a scoring tool to predict that same physician coding the patient as having sepsis? I can’t believe it!
The fundamental design of this study practically guarantees this exact result, and I am surprised it was published in Annals. Additionally, looking at the methods, there are ethnic disparities between those diagnosed with sepsis and those without, and given the design, are we underdiagnosing sepsis in certain populations? This is not addressed at all. I would be more curious to see whether gestalt is more sensitive and/or specific than decision tools at identifying objective criteria for sepsis, not whether it predicts the same physician assigning the very diagnosis that constitutes the study outcome. While sepsis is both a real clinical syndrome and a CMS metric, the two are so interconnected now that trying to divorce them in a study leaves too many uncontrolled confounders. The study even suggests how this could improve SEP-1, yet it did not evaluate patients against SEP-1 criteria. Screeners like SIRS and qSOFA screen for all CMS sepsis, not just the CMS severe sepsis and septic shock that make up the SEP-1 measure, so I am not sure why apples are being compared to oranges in this study.
Another thing that impairs its external validity is that the study enrolled only patients brought directly to the resuscitation area, not all comers. Most decision tools are designed to flag potentially ill patients before a physician sees them; using them on patients already identified as sick, standing in front of a physician, is not their primary purpose. Given the reported design of the treatment area studied, I would be surprised if either approach identified any difference in outcomes for these patients. Outcomes are not reported here, but I would be skeptical of any conclusions from a follow-up study using this design.