BJA Advance Access originally published online on January 31, 2008
British Journal of Anaesthesia 2008 100(3):315-321; doi:10.1093/bja/aem399
Evaluation of a decision support system to predict preoperative investigations
1 Department of Anaesthesia, Royal Liverpool and Broadgreen University Hospitals NHS Trust, Prescot Street, Liverpool L7 8XP, UK
2 Department of Clinical Engineering, Royal Liverpool and Broadgreen University Hospitals NHS Trust, Prescot Street, Liverpool L7 8XP, UK
* Corresponding author. E-mail: burra.murthy{at}rlbuht.nhs.uk
Accepted for publication December 11, 2007.
Abstract
Background: We have developed the Optimising Surgical Care and Assessment Record (OSCAR), a clinical decision support system, to help nurses predict which preoperative investigations are necessary before surgery. OSCAR applies the hospital's protocols, which are based on the National Institute for Health and Clinical Excellence guidelines, to the patient's medical history and surgical details before recommending the required investigations.
Methods: We randomly selected from the OSCAR system the case notes of 50 patients recorded between October 2006 and January 2007. To form a reference standard, these case histories were anonymized and sent to 10 consultant anaesthetists across the country, who were asked to study each case history and choose which tests they would and would not carry out. We then evaluated OSCAR's ability to predict the necessary investigations, and the nurses' judgement, against the reference standard.
Results: OSCAR's ability to identify which investigations should be carried out (its sensitivity) was 91.5%, and its ability to identify which investigations should not be carried out (its specificity) was 82.7%. OSCAR was consistent in predicting investigations across differing severities of surgery, ASA grades, and genders. We were unable to demonstrate any overall difference between OSCAR's and the nurses' ability to predict preoperative investigations. When the nurses' predictions were combined with OSCAR's recommendations, an even greater sensitivity of 98.2% could be achieved.
Conclusions: OSCAR's prediction algorithm cannot replace the nurses' judgement, but it can be used as a supplementary decision aid to promote consistency and improve accuracy.
Keywords: assessment, preanaesthetic; computers; statistics, sensitivity; statistics, specificity
Introduction
In the 21st century, health care is becoming more complex with the growing demands of caring for increasing numbers of patients with chronic conditions. Decision support systems (DSS) have recently been recognized as a coherent and important category of health technology for delivering this complex task. These computerized systems have an increasing role in an era of clinical governance and managed care, in which outcomes research, quality of care, and risk/benefit and cost/benefit analyses are becoming crucial. A clinical DSS is a computer tool that uses two or more items of data to generate patient-specific advice.1 It is an active knowledge resource that uses patient data or history to generate case-specific advice, which supports healthcare professionals in making decisions about individual patients.2
The preoperative assessment process ensures that patients are medically and socially evaluated for anaesthesia and surgery. Historically, junior doctors carried out preoperative assessments, but increasingly nurses are carrying out this task. At our hospital, highly trained and experienced nurses carry out preoperative assessments. The Preoperative Assessment Team at the Royal Liverpool and Broadgreen University Hospitals, in collaboration with the clinical engineering department, has developed a computerized clinical DSS called Optimising Surgical Care and Assessment Record (OSCAR) for its assessments. When we record a patient's vital signs and medical history (knowledge), OSCAR generates a recommendation of appropriate investigations (aid in decision-making) according to our hospital's protocols, which are based on a localized version of the National Institute for Health and Clinical Excellence (NICE) guidelines. The system is used to record the patient's history, to predict preoperative investigations, and to provide clinical management advice. It also provides advisory comments based on the medical history; for example, if a patient has a latex allergy, they should be first on the operating list. The system was designed to compile a knowledge base about patients that is independent of any single assessment, so that if the patient returns for another operation, these facts are instantly available. Our team uses OSCAR for its daily assessments. We aimed to evaluate OSCAR's ability to predict the necessary investigations and to compare it with the nurses' judgement.
Methods
OSCAR has been developed as a joint initiative between the preoperative assessment team and the clinical engineering department at this hospital. The development team attended assessment clinics and obtained copies of the original paper pathway, along with the hospital's protocols and guidelines for preoperative investigations. They then developed an intranet-based solution, OSCAR, for the preoperative assessment team. OSCAR was evaluated alongside the paper pathway in the assessment clinics to ensure that it would operate at a suitable speed and gather a full account of the patient's medical history. Once the nurses had gained confidence in OSCAR, they started to record the assessment directly into it, using its printout to replace the pathway in the patient's case notes. A preliminary study of OSCAR's ability to predict investigations was then carried out. Day-to-day maintenance of this service has been handed over to the hospital's IT department, but the development team has continued to refine OSCAR to support the ongoing needs of the patients and the preoperative assessment team. An interface to the laboratory system is now in operation that allows the nurse to transfer laboratory results electronically back into the assessment. An electronic copy of the assessment can be posted securely onto the hospital intranet, so that the patient's history and the assessment's findings can be read without resorting to the patient's case notes.
Our preoperative assessment nurses have used OSCAR in their routine preoperative assessment process since October 2005. During the assessment, the nurse enters the patient's history and vital signs into OSCAR and decides which investigations to carry out. Once the nurse has recorded these, OSCAR shows the investigations it has identified, and the nurse can then alter which investigations to book. In this exercise, we considered only the nurse's choice of investigations before seeing OSCAR's recommendations, not any changes made afterwards. These selections were compared with those identified by our reference standard (discussed later). For the study, we randomly selected (using random numbers) 50 cases from OSCAR that were recorded between October 2006 and January 2007 to evaluate OSCAR's sensitivity and specificity in predicting the necessary preoperative investigations.
To evaluate complex DSS such as OSCAR, for which there is no gold standard, a reference standard needs to be established.3 Our reference standard panel comprised 12 consultant anaesthetists working in teaching hospitals and district general hospitals across the UK, with a wide variety of experience in anaesthetizing different surgical populations. The anonymized case histories from OSCAR were sent to them by e-mail. They were asked to study each case history and choose which tests they would and would not carry out, using the options given in Table 1. The evaluation was confined to eight sets of investigations.
We required a consensus of greater than 75% among consultant anaesthetists for each test, as described by NICE guidance in their definition of agreement.4
For each test, we formed a consensus using the following rule: if 75% or more of the consultant anaesthetists selected 'Definitely Yes' or 'Possibly Yes', the test was classed as positive (do test) in our reference standard; if 75% or more selected 'Definitely No' or 'Possibly No', it was classed as negative (do not test). Otherwise, the test was classed as unknown in our reference standard.
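The consensus rule can be expressed as a short classification function. The sketch below is illustrative only: it assumes that each consultant's answer is stored as one of the option labels named in the text, that any other option counts towards neither group, and that the threshold is applied as "75% or more" of the responding panel. With 10 respondents, 75% is 7.5, so at least 8 of the 10 consultants must agree under either reading of the threshold ("greater than 75%" or "75% or more"). The function name `consensus` is hypothetical.

```python
from collections import Counter

# Option labels as named in the text; any other response counts towards neither group.
YES_OPTIONS = ("Definitely Yes", "Possibly Yes")
NO_OPTIONS = ("Definitely No", "Possibly No")


def consensus(responses, threshold=0.75):
    """Classify one test for one case from the panel's responses.

    Returns "positive" (do test), "negative" (do not test) or "unknown".
    """
    n = len(responses)
    if n == 0:
        return "unknown"
    counts = Counter(responses)
    yes = sum(counts[option] for option in YES_OPTIONS)
    no = sum(counts[option] for option in NO_OPTIONS)
    if yes >= threshold * n:
        return "positive"
    if no >= threshold * n:
        return "negative"
    return "unknown"


# Example: 8 of 10 consultants chose a "yes" option, so the test is a reference-standard positive.
panel = ["Definitely Yes"] * 5 + ["Possibly Yes"] * 3 + ["Possibly No"] * 2
print(consensus(panel))  # positive
```

Tests classed as unknown are then simply dropped before OSCAR and the nurse are scored.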
Results from OSCAR and the nurse were disregarded where there was an unknown in our reference standard since this prevented us from evaluating whether OSCAR or the nurse had correctly categorized that particular test.
Statistical analysis
We evaluated OSCAR's ability to predict the necessary investigations and compared it with that of the nurse. Sensitivity was taken as the proportion of tests required by the reference standard consensus that were indicated as necessary by OSCAR (or by the nurse). Specificity was taken as the proportion of tests not required by the reference standard consensus that were indicated as not required by OSCAR (or by the nurse).
We calculated 95% confidence intervals for the sensitivity and specificity results using the Confidence Interval Analysis software supplied with the book Statistics with Confidence.5 The confidence interval shows the range of sensitivity and specificity values that is likely (at the 95% confidence level) to include the true value for the population. Where OSCAR or the nurse identified <20 tests as being required or not required for a particular grouping, the sensitivity or specificity for that grouping was excluded, since the resulting confidence interval was deemed too wide to allow any meaningful conclusion to be drawn.
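The definitions of sensitivity and specificity, the 95% confidence intervals, and the exclusion of small groupings can be sketched as follows. We assume the Wilson score interval, the method for a single proportion recommended in Statistics with Confidence; the CIA software may report slightly different limits. We also assume that the <20 cut-off is applied to the denominator of each proportion. The example counts (150 of 164 reference-standard positives and 67 of 81 reference-standard negatives) are derived from the overall figures reported later in this paper (14 tests missed and 14 tests predicted unnecessarily by OSCAR). The function names are hypothetical.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a single proportion (95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def proportion_with_ci(successes, n, min_n=20):
    """Estimate with 95% CI, or None when fewer than min_n tests are available."""
    if n < min_n:
        return None  # interval judged too wide to be meaningful
    lower, upper = wilson_ci(successes, n)
    return successes / n, lower, upper


# Sensitivity = true positives / reference-standard positives
sensitivity = proportion_with_ci(150, 164)
# Specificity = true negatives / reference-standard negatives
specificity = proportion_with_ci(67, 81)
print(sensitivity, specificity)
```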
Where the confidence intervals of OSCAR's results overlapped with those of the nurse, we concluded that the null hypothesis could not be rejected.
We used the approximation form of McNemar's test to evaluate the discrepancies in cases where OSCAR disagreed with the nurse.6 P<0.05 was taken as statistically significant.
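The approximation form of McNemar's test uses only the discordant pairs: b, the tests that OSCAR classified correctly but the nurse did not, and c, the reverse. A minimal sketch with the usual continuity correction is given below; the function name is hypothetical, and the upper tail of the χ² distribution with 1 degree of freedom is obtained from the complementary error function. The example uses the discordant positive counts quoted in the Discussion (11 tests flagged by OSCAR alone and 11 by the nurse alone) purely for illustration; it is not intended to reproduce the reported P value.

```python
from math import erfc, sqrt


def mcnemar_approx(b, c):
    """McNemar chi-square (1 df) with continuity correction.

    b: pairs where only OSCAR was correct; c: pairs where only the nurse was correct.
    Returns (statistic, P value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


stat, p = mcnemar_approx(11, 11)
print(round(stat, 3), round(p, 2))
```

Because only discordant pairs enter the statistic, tests on which OSCAR and the nurse agree carry no information about which of them performs better.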
We used the following formula for estimating the sample size when comparing sensitivities in matched-group diagnostic studies.7
$$n = \frac{\left(\mathrm{SLF}\sqrt{\psi} + \mathrm{PF}\sqrt{\psi - \delta^{2}}\right)^{2}}{\delta^{2}}$$

In our previous work, we found that the nurse sensitivity was approximately 0.9 (P1). As a measure of improvement, we wished to detect a sensitivity for OSCAR of 0.95 (P2), giving a difference of δ = P2 − P1 = 0.05. The probability of disagreement between OSCAR and the nurse is represented by ψ. The minimum value for this was 0.05 and the maximum, calculated as P1 × (1 − P2) + (1 − P1) × P2, was 0.14. The significance level factor (SLF) was 1.645 for a 5% significance level and the power factor (PF) was 0.840 for 80% power. Therefore, using the maximum value of ψ, the sample size was estimated to be n=344 tests. Since we had eight tests on each case, 43 (344/8) cases were required. We rounded this up to the next 10, resulting in 50 cases being used.
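The sample-size calculation for comparing paired sensitivities, n = [SLF√ψ + PF√(ψ − δ²)]²/δ², where δ = P2 − P1 and ψ is the probability that OSCAR and the nurse disagree on a test, can be checked numerically. The sketch below uses the factors quoted in the text; the function name is hypothetical. With the minimum ψ of 0.05 the same formula gives only about 122 tests, so the maximum ψ of 0.14 is the conservative choice.

```python
from math import ceil, sqrt


def paired_sensitivity_sample_size(p1, p2, psi, slf=1.645, pf=0.840):
    """Number of paired tests needed to detect a difference p2 - p1 in sensitivity.

    psi: probability that the two methods disagree on a test.
    slf, pf: significance level and power factors quoted in the text.
    """
    delta = p2 - p1
    n = (slf * sqrt(psi) + pf * sqrt(psi - delta**2)) ** 2 / delta**2
    return ceil(n)


p1, p2 = 0.90, 0.95                      # nurse vs. target OSCAR sensitivity
psi_max = p1 * (1 - p2) + (1 - p1) * p2  # maximum probability of disagreement (0.14)
n_tests = paired_sensitivity_sample_size(p1, p2, psi_max)
n_cases = ceil(n_tests / 8)              # eight tests per case
n_rounded = ceil(n_cases / 10) * 10      # rounded up to the next 10
print(n_tests, n_cases, n_rounded)       # 344 43 50
```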
Results
Of the 50 patients (29 males and 21 females), 31 underwent major surgery (colectomy, repair of abdominal aortic aneurysm, transurethral resection of prostate, cholecystectomy, and stabilization of spine) and 19 underwent intermediate surgery (knee arthroscopy, inguinal hernia repair, and umbilical hernia repair). Nine patients were ASA I, 21 were ASA II, and 20 were ASA III. The majority (32 patients) were >60 yr old; 14 patients were aged between 40 and 59 yr and four were aged <40 yr.
Although we had invited 12 consultant anaesthetists to participate in the validation process, two provided answers only for the first five cases, so we analysed the answers given by the remaining 10 consultants. The evaluation of eight types of investigation gave a total of 400 tests for the 50 cases. Overall, the consultants agreed on 164 positive tests and 81 negative tests, that is, only 61% (245/400) agreement within the reference standard. The results of OSCAR and the nurse for the other 155 tests had to be disregarded, since no consensus could be obtained from the reference standard. Within the reference standard, consensus was >80% for full blood count (FBC), electrocardiogram (ECG), and urea and electrolytes (U&E), 72% for pulmonary function tests, and only 50% or less for chest X-ray (CXR), liver function tests (LFTs), glucose, and clotting screen. The number of tests agreed by severity of surgery, gender, ASA grade, age group, and type of test is shown in Table 2.
Table 3 shows the total number of positives and negatives identified by the consultant anaesthetists' consensus that were also identified by OSCAR and by the nurse, along with the overall sensitivity, specificity, and 95% confidence intervals.
The number of true positives and true negatives for each grouping by severity of operation, gender, ASA grade, age group, and type of test is shown in Table 4. There was no significant difference between OSCAR and the nurse in predicting the necessary preoperative investigations (P=0.82; Table 5). Table 5 shows the data array comparing agreements and disagreements between OSCAR and the nurse on which preoperative investigations to carry out.
Discussion
We were unable to demonstrate any difference between OSCAR's and the nurse's ability to predict preoperative investigations. The sensitivity (91.5%) and specificity (82.7%), along with their confidence intervals, were identical for OSCAR and the nurse, so we could not reject our null hypothesis. This is consistent with the McNemar test (P=0.82; Table 5). OSCAR and the nurse both showed a high sensitivity in identifying the need for FBC, U&E, and ECG. However, OSCAR appears to have a very poor specificity (66.7%) for chest X-ray (CXR) (Table 4).
OSCAR missed 14 tests that the anaesthetists identified as necessary. Three of these were full blood counts, missed because the rule in OSCAR required both a major operation and a likely blood loss of 20%; this rule has now been amended to any major surgery, irrespective of expected blood loss. Four pulmonary function tests (PFTs) were missed because the rule in OSCAR for patients presenting with a respiratory disease also required them to be undergoing a major procedure; we now believe that such a simple rule is inappropriate for detecting whether this test is required. OSCAR has two levels of instruction: a recommendation, where there is strong evidence that a test is required, and a suggestion, where there is only an indication. We are now considering amending the rule to trigger for all patients presenting with a respiratory disease, but downgrading the instruction to a suggestion so that the nurse can use their own judgement. An anaesthetist needs to know the results of this test for any patient with severe respiratory disease who is about to undergo general anaesthesia. Two other tests (one PFT and one coagulation screen) were missed because the nurse completed the computer record incorrectly.
Five tests (three coagulation screen and two blood glucose) were missed because OSCAR did not have the necessary rules to predict them.
Of these 14 missed tests, seven were due to coding errors in OSCAR, two to incorrectly completed computer records, and five to a lack of rules in OSCAR to predict the investigations. All the coding errors were subsequently corrected, and the improved practice has been disseminated throughout the Trust.
OSCAR predicted 14 tests that the anaesthetists did not consider necessary. Seven CXRs were recommended unnecessarily: four because of heart problems in the past that are now considered not to be an indication, two because the patient was considered elderly, and one because the patient had asthma. OSCAR appears to have a very poor specificity (66.7%) for CXR because of the prediction algorithm's wide criteria: in our local guidelines, one of the criteria for preoperative CXR was any history of cardiovascular or respiratory disease. This is clearly a fault in the coding and strongly supports our case for a two-tiered method of recommending and suggesting tests. If a test is recommended, OSCAR has found strong evidence in the case for doing it; if a test is suggested, OSCAR has found only an indication that the test may be required, and the nurse must exercise judgement. Three urea and electrolyte tests were recommended unnecessarily; in two of these cases, we believe the tests should have been recommended, one because the patient had an excessive alcohol intake and the other because the patient was elderly. Two liver function tests were recommended unnecessarily. In one case, we believe the test should have been recommended, as there was a history of significant malignancy. In the other, the rule in OSCAR incorrectly triggered on the patient having had a heart attack, when it should have triggered only on heart failure. One PFT was recommended unnecessarily because a rule in OSCAR predicted it from the patient becoming breathless on mild exertion; this rule is to be changed so that it only suggests the test. One glucose test was recommended unnecessarily because an intra-articular steroid injection for a sports injury was treated as steroid therapy.
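The two-tier approach can be illustrated with a small rule table in which each rule names a test and an instruction level, and the strongest level triggered for each test is reported to the nurse. This is a hypothetical sketch, not OSCAR's actual code: the record fields, the rule table, and the treatment of a cardiovascular or respiratory history as a CXR suggestion rather than a recommendation are assumptions for illustration. The FBC rule follows the amendment described above (any major surgery), and the PFT rules follow the proposed changes (respiratory disease, or breathlessness on mild exertion, triggers a suggestion only).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Level(IntEnum):
    SUGGEST = 1    # only an indication: the nurse must exercise judgement
    RECOMMEND = 2  # strong evidence that the test is required


@dataclass
class Patient:
    # Hypothetical, simplified record; fields are illustrative only
    major_surgery: bool = False
    respiratory_disease: bool = False
    breathless_on_mild_exertion: bool = False
    cardiorespiratory_history: bool = False


@dataclass
class Rule:
    test: str
    level: Level
    applies: Callable[[Patient], bool]


RULES = [
    Rule("FBC", Level.RECOMMEND, lambda p: p.major_surgery),  # amended: any major surgery
    Rule("PFT", Level.SUGGEST, lambda p: p.respiratory_disease),
    Rule("PFT", Level.SUGGEST, lambda p: p.breathless_on_mild_exertion),
    Rule("CXR", Level.SUGGEST, lambda p: p.cardiorespiratory_history),  # assumed downgrade
]


def evaluate(patient, rules=RULES):
    """Return {test: strongest level triggered} for one patient."""
    result = {}
    for rule in rules:
        if rule.applies(patient):
            result[rule.test] = max(result.get(rule.test, rule.level), rule.level)
    return result


print(evaluate(Patient(major_surgery=True, respiratory_disease=True)))
```

Keeping the level as data on each rule means a criterion can be moved between "recommend" and "suggest" without changing the evaluation logic.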
Distinguishing the context of the information provided by the patient is difficult for a computer system, as shown by the four CXRs recommended unnecessarily on the basis of past conditions. The fact that these conditions are in the past, and that the information relies on the patient's memory and integrity, needs careful consideration, and possibly annotation, when constructing an electronic medical record.
OSCAR and the nurse each had an overall sensitivity of 91.5% in our study. We asked the nurses to use their judgement first on which investigations to book, before they were shown OSCAR's results. We then allowed them to add any investigations identified by OSCAR that they may have overlooked, although these refined selections were not the main focus of this study. If the refined selections were analysed, a true positive could be redefined as a positive correctly identified by either the nurse or OSCAR. The results in Table 5 suggest that if we combine the positives identified by both OSCAR and the nurse (139) with those identified by OSCAR alone (11) and by the nurse alone (11), we have 161 positives out of 164, that is, a sensitivity of 98.2% (Table 6). This suggests that by combining OSCAR's ability with the nurse's judgement, we would miss only 18 necessary investigations per thousand.
This would also affect the system's specificity, since a true negative would now have to be redefined as a test that both OSCAR and the nurse correctly identified as negative; if either of them identified a positive, this would satisfy our condition for booking the test. The results in Table 5 suggest that we would then have 57 negatives out of 81, that is, a specificity of 70.4% (Table 6). Thus, where the nurse was originally booking 173 unnecessary investigations per thousand (original specificity of 82.7%), we would now be booking 296 unnecessary investigations per thousand (new specificity of 70.4%). This apparent overbooking may indicate a major failure in the software or in the nurses' judgement, but could also be due to a discrepancy in the interpretation of the NICE guidelines.4 It could also reflect poor correlation between the low-consensus reference standard and these guidelines, partly owing to the relatively small number of consultants who completed the exercise (10). Even though the NICE guidelines have not brought about great change, they have raised the profile of preoperative testing, prompting necessary wider discussion and review.
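The effect of booking a test whenever either OSCAR or the nurse flags it can be reproduced from the counts quoted in the two preceding paragraphs: 139, 11, and 11 of the 164 reference-standard positives, and 57 of the 81 reference-standard negatives left unbooked by both. The nurse's own specificity is taken as 67 of 81, which corresponds to the reported 82.7%. This is a minimal sketch with hypothetical function names.

```python
def either_rule_metrics(both_pos, oscar_only_pos, nurse_only_pos, ref_pos, both_neg, ref_neg):
    """Sensitivity and specificity when a test is booked if EITHER OSCAR or the nurse flags it."""
    # A reference-standard positive is caught if at least one of them flagged it
    sensitivity = (both_pos + oscar_only_pos + nurse_only_pos) / ref_pos
    # A reference-standard negative is only left unbooked if BOTH classed it as negative
    specificity = both_neg / ref_neg
    return sensitivity, specificity


def per_thousand(rate):
    """Express a proportion as a whole number per thousand tests."""
    return round(1000 * rate)


sens, spec = either_rule_metrics(139, 11, 11, 164, 57, 81)
missed = per_thousand(1 - sens)                # necessary tests missed
unnecessary = per_thousand(1 - spec)           # unnecessary tests booked
unnecessary_nurse = per_thousand(1 - 67 / 81)  # nurse alone (specificity 82.7%)
print(f"{sens:.1%} {spec:.1%} {missed} {unnecessary} {unnecessary_nurse}")  # 98.2% 70.4% 18 296 173
```

The trade-off is built into the "either" rule: every extra positive caught by one party is also a chance for that party to book a test the reference standard did not require.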
It is very difficult to measure the impact of missing a test compared with that of booking unnecessary tests. In an NHS systematic review, Munro and colleagues8 found that the power of preoperative tests to predict adverse postoperative outcomes in asymptomatic patients is either weak or non-existent. Conversely, Finegan and colleagues9 noted a small increase in complications during a period when staff anaesthesiologists ordered investigations selectively, although they emphasized that none of these complications was considered preventable.
Our reference standard was based on a consensus from 10 consultant anaesthetists, who only agreed on 61% of the tests. This clearly demonstrates the complexity of the issue and the ethical problems of screening vs not screening. A consensus of only 61% is clear evidence of a disparity of opinion on the criteria for the selection of preoperative investigations. The NICE guidance on the use of routine preoperative testing for elective surgery is based on clinician consensus rather than evidence-based medicine.4
Many computerized DSS developments have failed because they were mainly technology led, with insufficient emphasis on the need for high-quality clinical data or clinician input.2 The developers did not appreciate the need for clinician input in the development and testing of these systems. Liu and colleagues2 describe the failure to encourage health professionals who use DSS to apply their own clinical judgement in the context of the patient encounter as one of the most important problems of contemporary DSS. It is also likely that most clinicians do not appreciate the potential of these tools, an important category of health technology, to help in implementing the evidence from rigorous studies. In contrast, OSCAR, an innovative clinician-led DSS development, has been designed to help improve clinical practice and the patient experience. The development of a DSS should be a dynamic rather than a static process, producing a mature product that can be tailored to the changing needs of the patient population. Wide use of OSCAR could help build a body of evidence with which to evaluate and revise the NICE guidelines on preoperative investigations.
We conclude that neither the nurse nor OSCAR can act as the definitive decision maker when selecting preoperative investigations; each has strengths and weaknesses with different groups of patients. OSCAR's prediction algorithm cannot replace the nurse's judgement, but it can be used as a supplementary decision aid in promoting consistency. By combining the investigation prediction abilities of the nurse and OSCAR, we can promote consistency and improve accuracy.
Acknowledgements
We thank the following Consultant Anaesthetists for their work in reviewing all the case records and formulating our reference standard: Dr Jonas Appiah Ankam, Royal Liverpool & Broadgreen Hospital, Liverpool; Dr John Carlisle, Torbay Hospital, Torquay, Devon; Dr Winston Demello, University Hospital of South Manchester NHS Foundation Trust, Manchester; Dr Andy Dennis, Northern General Hospital, Sheffield; Dr Mark Human, Torbay Hospital, Torquay, Devon; Dr Andrew Kitching, Royal Berkshire Hospital, Reading; Dr Fernando Mateu, Royal Liverpool & Broadgreen Hospital, Liverpool; Dr Christopher Parker, Royal Liverpool & Broadgreen Hospital, Liverpool; Dr Hoo Tsang, Royal Liverpool & Broadgreen Hospital, Liverpool; Dr Oliver Zuzan, Royal Liverpool & Broadgreen Hospital, Liverpool. We are grateful to Prof. J. M. Hunter, Professor of Anaesthesia, University of Liverpool, Liverpool, for her valuable suggestions and comments on the manuscript. We also thank Mr Steven Lane, Centre for Medical Statistics and Health Evaluation, University of Liverpool, for his help in statistical analysis.
References
1 Wyatt JC. Knowledge for the clinician 9. Decision support systems. J R Soc Med (2000) 93:629–33.
2 Liu J, Wyatt JC, Altman DG. Decision tools in health care: focus on the problem, not the solution. BMC Med Inform Decis Mak (2006) 6:4. (Available from http://www.biomedcentral.com/1472-6947/6/4).
3 Potts HWW, Wyatt JC, Altman DG. Challenges in evaluating complex decision support systems: lessons from Design-a-Trial. In: Quaglini S, Barahona P, Andreassen S, eds. Artificial Intelligence in Medicine (Lecture Notes in Computer Science). (2001) Heidelberg: Springer. 453–6.
4 National Institute for Clinical Excellence. Preoperative tests. The use of routine preoperative tests for elective surgery. (2003) June (Available from http://guidance.nice.org.uk/CG3, accessed June 2007).
5 Bryant TN. Computer software for calculating confidence intervals (CIA). In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence. (2000) London: BMJ Publishing Group. 208–13.
6 Dwyer AJ. Matching and McNemar in the comparison of diagnostic modalities. Radiology (1991) 178:328–30.
7 Beam CA. Strategies for improving power in diagnostic radiology research. Am J Roentgenol (1992) 159:631–7.
8 Munro J, Booth A, Nicholl J. Routine preoperative testing: a systematic review of the evidence. Health Technol Assess (1997) 1:1–62.
9 Finegan BA, Rashiq S, McAlister FA, O'Connor P. Selective ordering of preoperative investigations by anesthesiologists reduces the number and cost of tests. Can J Anaesth (2005) 52:575–80.