BJA Advance Access originally published online on August 30, 2009
British Journal of Anaesthesia 2009 103(4):472-483; doi:10.1093/bja/aep241
Assessment of procedural skills in anaesthesia
1 The Hospital for Sick Children, University of Toronto, 555 University Avenue, Canada M5B 1W5.
2 Oxford Radcliffe Hospitals NHS Trust, Headley Way, Oxford OX3 9DU, UK.
3 The Ottawa Hospital, University of Ottawa, 1053 Carling Avenue, Ottawa, Ontario, K1Y 4E9, Canada
* Corresponding author. E-mail: dylan.bould{at}utoronto.ca
| Abstract |
|---|
A key aspect of the practice of anaesthesia is the ability to perform practical procedures efficiently and safely. Decreased working hours during training, an increasing focus on patient safety, and greater accountability have resulted in a paradigm shift in medical education. The resulting international trend towards competency-based training demands robust methods of evaluation of all domains of learning. The assessment of procedural skills in anaesthesia is poor compared with other domains of learning and has fallen behind surgical fields. Logbooks and procedure lists are best suited to providing information regarding likely opportunities within training programmes. Retrospective global scoring and direct observation without specific criteria are unreliable. The current best evidence for a gold standard for assessment of procedural skills in anaesthesia consists of a combination of previously validated checklists and global rating scales, used prospectively by a trained observer, for a procedure performed in an actual patient. Future research should include core assessment parameters to ensure methodological rigor and facilitate robust comparisons with other studies: (i) reliability, (ii) validity, (iii) feasibility, (iv) cost-effectiveness, and (v) comprehensiveness with varying levels of difficulty. Simulation may become a key part of the future of formative and summative skills assessment in anaesthesia; however, research is required to develop and test simulators that are realistic enough to be suitable for use in high-stakes evaluation.
Keywords: education; risk; safety
| Introduction |
|---|
A key aspect of the practice of anaesthesia is the ability to perform practical procedures efficiently and safely. Gaba and colleagues30 differentiated between technical performance (the adequacy of actions taken from a medical and technical perspective) and non-technical performance (decision-making and team interaction processes). This broad definition of technical skills includes many different processes, including the recall of factual information, diagnosis, and performing practical procedures. However, for the purposes of this review, we will distinguish between Gaba's broad definition of technical skill and a more focused definition of procedural skill, by which we mean the performance of a practical procedure.
The assessment of procedural skill in anaesthesia is generally given less importance than the assessment of knowledge and judgement-based skills.84 93 This is partly because there has been no universally accepted and comprehensive way to assess procedural skill. It is notable that non-technical skills such as communication constitute another domain that has also been assessed informally, although a full discussion of the evaluation of non-technical skills26 is outside the scope of this review. There has been an international trend towards decreasing working hours for doctors in training and reduced trainee exposure to procedural skills. This has led some in the medical profession to question whether there is time for procedural skills to be adequately learned during current training programmes.14 There is a perceived need for greater accountability to the public and government and an increasing emphasis on patient safety.48 The consequences of suboptimal skills include death and permanent brain damage, emphasizing the importance of a robust system for ensuring competence in procedural skills in anaesthesia.13 These factors have led to a paradigm shift in postgraduate medical education from systems based on completing accredited posts for a specified amount of time to competency-based curricula,72–74 96 which demand a focused and rigorous method of evaluating procedural skill.86
Surgical outcome relies on sound procedural skills,58 perhaps even more than in anaesthesia, and research in the assessment of procedural skills has been pioneered in surgery.69 We will therefore also discuss innovations in the surgical field that may be useful for anaesthesia in the future. This review will discuss the acquisition of expertise in procedural skills and the general principles of the educational theory behind assessment, examine the literature regarding the different techniques that can be used for the assessment of procedural skills in anaesthesia, and make relevant extrapolations from literature in other fields of medicine. Finally, we will discuss the context of procedural skills assessment, including the use of simulation.
The acquisition of expertise in procedural skills
There are three stages in the acquisition of procedural skills: cognition, integration, and automation.52 Cognition includes developing an understanding of the task, and perceptual awareness. It is assisted by a clear description and demonstrations of the task. In the integration stage, the knowledge from the cognition phase is incorporated into the learning of the motor skills for that task. Ultimately, the task becomes automatic and even subconscious. For an expert, it may be difficult to break a task down into component parts in order to teach a novice.
The acquisition of competence or expertise in a procedural skill requires experience through a variable number of attempts depending on the skill, the quality of teaching, and the aptitude of the individual. In anaesthesia, skills are typically initially learned on relatively straightforward cases. When competence is achieved in those situations, trainees are exposed to a broader range of normal and pathological variations required for the development of true expertise. Retention of motor skills seems to be most dependent on the degree to which the skill was perfected.37
Which skills should be assessed?
A full discussion of which skills should be assessed is beyond the scope of this review. One viewpoint is that technical skills should be assessed if they are either commonly performed or potentially life saving.93 However, determining the core skills for life-saving manoeuvres can be controversial, as demonstrated by a recent debate on whether competence in cricothyrotomy can reasonably be expected.11 39 Core skills also change as medical technology and knowledge develops.42
| General principles |
|---|
A summative assessment is made at the end of a period of training. Summative assessments usually assign either a grade or a pass/fail and have been described as an assessment of learning.32 Summative evaluations may: (i) be part of progress within a competency-based training scheme; (ii) be required before being allowed more significant levels of responsibility; or (iii) form part of certification or revalidation of medical licensing. Summative assessments of procedural skills within anaesthesia training programmes have historically been performed through retrospective subjective feedback from supervising faculty without specific criteria, self-reported procedure logs, or both.
Formative assessments are used to aid learning and have been described as assessment for learning. In order to be useful, feedback from formative assessment needs to occur in a timely manner, so that it can influence a trainee's progress. Traditional thinking has been that feedback should be given as soon as possible, sometimes concurrently during the performance of the procedure.87 However, there is recent evidence that feedback after completion of a task is more effective than concurrent feedback, especially for long-term retention of skills.77 97 Formative assessment is usually undertaken during supervised clinical work and so its effectiveness is subject to attributes of the supervisor such as willingness to teach and interpersonal skills,45 regardless of the assessment tool used.
Criterion-referenced assessment is when the basis for comparison is a well-defined list of criteria. Norm-referenced assessment is when trainees are compared with their peers. Ipsative assessment is when a trainee's performance is compared with their own over a period of time.
The quality of a method of assessment is described by its reliability and validity. Feasibility relates to its usefulness in the real world. Other important factors in assessing procedural skills are whether they comprehensively test every aspect of the skill and whether feedback is suitably timely to promote further learning.84 Comparison of these factors for procedural skills testing is described in Table 1.
Reliability
Reliability refers to the reproducibility of a test. In education, this may refer to inter-rater agreement or test–retest agreement. These agreements are often described as external reliability. There are different statistical analyses that can be used to describe reliability:
- Pearson's product-moment correlation coefficient has been used to describe inter-rater agreement. An r >0.75 indicates excellent agreement.64 However, this method has been criticized as not accounting for bias.43 For instance, if one examiner consistently uses the low marks in a scale and another consistently uses the higher marks, they could still have a high correlation coefficient if they rank subjects similarly. This bias would be particularly significant if the assessment tool has a set pass mark.
- Intraclass correlation coefficient (ICC). The ICC is also used to describe inter-rater agreement. It accounts for the agreement that would be seen by chance and is defined as the ratio of the variance between subjects to the total variance (the variance between subjects plus the error variance). An ICC of 0.8 means that 80% of the variance among scores can be attributed to true variance among subjects.90 Cohen's κ coefficient is a type of ICC61 that can only be used when there are two raters. A κ >0.80 has been described as indicating near-perfect agreement; 0.61–0.80, substantial agreement; 0.41–0.60, moderate agreement; 0.21–0.40, fair agreement; 0.00–0.20, slight agreement; and <0.00, poor agreement.54 However, it should be noted that this categorization is not universally accepted, and in practice, the degree of acceptable agreement depends on the circumstances. For instance, a high-stakes licensing exam requires a particularly reliable assessment tool.3
- Cronbach's α, a measure of internal consistency, gives a value from 0.0 to 1.0. By convention 0.0–0.5 can be regarded as imprecise, 0.5–0.8 moderately reliable, and 0.8–1.0 can be used with confidence for high-stakes purposes such as certification, although again these cut-offs are arbitrary.47 68 Internal consistency is a commonly used measure of reliability as it describes the reproducibility of a test. However, the consequences of poor internal reliability are less problematic than poor external reliability. For example, a test where different candidates fail on different questions could be due to differences in clinical experience or teaching, whereas a test with poor inter-rater reliability could be considered intrinsically unfair to the subjects.
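The criticism that a correlation coefficient does not account for bias can be illustrated with a short sketch. The scores, the pass mark of 5, and the function names below are hypothetical and chosen only for illustration: one examiner consistently marks three points higher than the other, so the two rank the candidates identically, yet they disagree on who passes.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson's product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning categorical outcomes.

    Undefined (division by zero) if chance agreement is exactly 1.
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical scores for five candidates; examiner B is always 3 points higher.
examiner_a = [2, 3, 4, 5, 6]
examiner_b = [5, 6, 7, 8, 9]
PASS_MARK = 5  # hypothetical pass mark

r = pearson_r(examiner_a, examiner_b)
passes_a = [score >= PASS_MARK for score in examiner_a]
passes_b = [score >= PASS_MARK for score in examiner_b]
kappa = cohen_kappa(passes_a, passes_b)

print(f"Pearson r on raw scores: {r:.2f}")
print(f"Cohen's kappa on pass/fail decisions: {kappa:.2f}")
```

In this constructed example the correlation is perfect, while agreement on the pass/fail decision is no better than chance, which is why a set pass mark makes systematic examiner bias particularly important.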
Validity
Validity describes whether the test is measuring what it sets out to measure.
- Face validity refers to a general impression of whether the evaluation seems appropriate. For example, evaluating performance of epidural anaesthesia on a model consisting of a banana has little face validity compared with direct observation of the placement of an epidural in a patient. Face validity is perhaps best assessed by expert opinion, although good face validity as judged by the subjects of the assessment improves buy-in of the evaluation.
- Content validity refers to whether an assessment tests the content either of what was being taught or appropriate content as defined by a group of experts.
- Concurrent validity establishes validity based on agreement with another established valid measure.
- Construct validity is used when there is no established gold standard for comparison. A construct is a concept that is to some extent abstract. For instance, although we can all agree that there is such a thing as expertise, it is more difficult to precisely define or test that construct. Instead, we can test a surrogate outcome, such as experience, that is easier to quantify and that we expect to be associated with the construct of expertise. An educational evaluation is therefore considered valid if it can differentiate between groups with different levels of experience and is increasingly valid if the groups it can distinguish between are more similar in experience. Reznick comments that validity cannot be proven in any one experiment. Rather, over time and experimentation one accrues evidence for the validity of a test.68 This is particularly true for construct validity, which only examines a surrogate measure.
- Predictive validity is the ability of a test to predict something that happens after the test such as a clinical outcome or the future test results of a subject. Although such outcome data are the most useful demonstration of validity, predictive validity is generally the most difficult to establish.
| Techniques for the assessment of procedural skills |
|---|
Psychometric and aptitude testing
Psychometric testing has been found to be of limited value in predicting subsequent procedural performance in the surgical field. A statistically significant but modest correlation was found between the performance of a Z-plasty procedure and scores in tests that assess the ability to rotate 2D and 3D figures mentally, but not with less complex tests that assess the recognition of simple shapes.92 Laparoscopic surgery requires surgeons to infer the shape of 3D structures from 2D screens. The Pictorial Surface Orientation (PicSOr) is a computer-based test of depth perception.31 Gallagher and colleagues31 compared PicSOr performance to simulated laparoscopic cutting tasks in both novices and expert laparoscopic surgeons. There was a modest correlation between performance in PicSOr and laparoscopic performance.
The MICROcomputerised Personnel Aptitude Tester (MICROPAT) measures psychomotor ability and has been used in anaesthesia to compare performance in adaptive pursuit tracking tasks with subsequent performance in fibreoptic nasotracheal endoscopy. Pursuit tracking was correlated with faster times to completion of nasotracheal endoscopy, accounting for approximately one-third of the ability early in the learning curve.17 The MICROPAT has also been investigated as a method of predicting obstetric epidural failure rates but was not correlated with failure rates for either the first 25 or the first 50 epidurals, suggesting a limited application for this evaluation.16
As psychometric tests have only been shown to have moderate correlations with performance in the early stages of technical skill acquisition, it remains to be seen if they have a role in medical recruitment or selection for any specialty. Indeed, evidence from fields of expertise outside medicine suggests that large numbers of hours of deliberate practice are more important to the acquisition of motor skills than innate ability.21 In the context of procedural skills in anaesthesia, psychometric testing is currently essentially a research tool and largely unproven.
Procedure lists
Assessment of technical skill has historically been by a combination of a subjective impression from an educational supervisor and logbooks or procedural lists.96 Self-reported procedure lists are a common form of assessment of technical skills mainly because of high feasibility. Although a certain number of procedures performed are clearly necessary to provide the opportunity to progress through the stages of acquisition of technical skills, the actual number is highly variable between individuals.15 20 There are clear limitations in the value of procedure lists, especially if used for summative assessment: there is no guarantee that a task was performed correctly and trainee anaesthetists can consistently repeat mistakes, despite considering their own performance to be acceptable. Performing a procedure badly a large number of times is of little educational value and also puts patients at risk.93 In conclusion, procedural lists are most useful for assessing opportunities provided by training posts and to guide programme directors rather than to assess individuals.8
Cumulative sum analysis
Cumulative sum (Cusum) analysis was originally developed in industry as a method of quality control. It is a statistical method that looks at the outcome rather than the process of performing procedural skills. Cusum plots a graph of the subject's performance over time based on predetermined criteria for success or failure. A value for Cusum is plotted on the y-axis and the number of attempts on the x-axis (Appendix). Failures result in a move up (and successes down) the y-axis as the subject progresses through increasing numbers of attempts. When the Cusum score decreases below a level based on a predetermined acceptable failure rate, the subject can be considered to be competent with statistical significance. The distance that a trainee is above a predetermined line is an indication of how far they are from achieving competency as defined by Cusum.46 Authors have also described other useful endpoints such as a change in the curve that denotes either an improvement or worsening in performance.46 98
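The plotting rule described above can be sketched in a few lines of code. This is a minimal illustration, assuming one commonly described formulation of Cusum in which an acceptable failure rate (`p0`), an unacceptable failure rate (`p1`), and type I and type II error rates (`alpha`, `beta`) determine the step size and the decision boundaries. The function names, the example rates, and the simplified rule of crossing the lower boundary are assumptions for illustration; they are not taken from this review or its Appendix.

```python
import math


def cusum_parameters(p0, p1, alpha=0.1, beta=0.1):
    """Return (s, h0, h1): step per success and lower/upper decision boundaries.

    p0: acceptable failure rate, p1: unacceptable failure rate (p1 > p0).
    """
    P = math.log(p1 / p0)
    Q = math.log((1 - p0) / (1 - p1))
    s = Q / (P + Q)                                # curve falls by s per success
    h0 = -math.log((1 - alpha) / beta) / (P + Q)   # lower boundary
    h1 = math.log((1 - beta) / alpha) / (P + Q)    # upper boundary
    return s, h0, h1


def cusum_curve(outcomes, s):
    """Cumulative score after each attempt; outcomes are True for a failure."""
    curve = []
    total = 0.0
    for failed in outcomes:
        # A failure moves the curve up by (1 - s); a success moves it down by s.
        total += (1 - s) if failed else -s
        curve.append(total)
    return curve


def attempts_to_competence(curve, h0):
    """First attempt number at which the curve falls below h0, or None."""
    for attempt, value in enumerate(curve, start=1):
        if value < h0:
            return attempt
    return None


# Hypothetical trainee: one early failure followed by 30 successes.
s, h0, h1 = cusum_parameters(p0=0.10, p1=0.20)
curve = cusum_curve([True] + [False] * 30, s)
print(f"s = {s:.3f}, h0 = {h0:.2f}, h1 = {h1:.2f}")
print("Attempt at which the lower boundary is crossed:",
      attempts_to_competence(curve, h0))
```

Because each failure moves the curve up by much more than a success moves it down, a single early failure in this sketch delays the crossing of the lower boundary by several attempts.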
The Cusum analysis is an effective objective tool to define learning curves for technical skills. Learning curves can be constructed for individuals or summated to provide learning curves for a population. Cusum can be used to provide an estimate of the number of cases that are required to achieve competency98 and demonstrates a wide variety in that number between individuals.62 For example, subjects required between nine and 88 attempts to become competent at tracheal intubation.20 It can be used to identify when a change in the training process is indicated,98 and poor technique may be corrected before demoralization.46 Cusum has been used to alter the schedule of training rotations, change curricula, and initiate mentoring programmes.98 As such, it can be used not only to assess the competency of individuals in procedural skills but also as an assessment of a training programme and its ability to teach those skills. Another potential application is as a continuous audit of quality of practice for experienced clinicians, although it is more commonly used this way in surgery where complications that constitute negative endpoints are more common.9
A potential disadvantage of the Cusum method is that it often relies on self-reporting, which may be inaccurate. Direct observation of all procedures by trainees is unlikely to be possible for most procedures, as a considerable amount of work may be performed while on call. Cusum is only as objective as the pre-defined success/failure endpoint and these definitions may vary widely.20 51 98 Acceptable rates of success can be determined by institutional rates or expert consensus46 62 98 and also depend on the definition of success. As training in anaesthesia is based on the principle of gradually increasing responsibility and reducing supervision, an increase in responsibility may result in more difficult cases and therefore result in a deterioration of the Cusum curve, despite no deterioration in skill. The issue of accounting for the difficulty level posed to the subject can also be problematic for other methods of assessing technical skill and is further discussed in the simulation section.
A final disadvantage of Cusum is that a great number of attempts may be necessary to prove statistical significance7 78 and this could be unfair to trainees if their progression through a training rotation depends on Cusum defined competency.
Direct observation without criteria
Direct observation by a consultant is traditionally used to assess procedural skills and is feasible in anaesthesia because of the high degree of supervision in cases performed by trainees.84 Despite being feasible, assessments without specific criteria result in poor reliability and validity. To overcome these problems, direct observation with specific criteria has been developed.
Direct observation with criteria
Checklists
Binary content checklists can be used as a way of grading performance during direct observation. Checklists break a task down into its component parts and assign a dichotomous pass/fail outcome to each point. A new checklist needs to be designed and validated for each procedural skill that is to be assessed. Checklists can be constructed by surveying experts,64 although a different group of experts may not agree on each point of the checklist. For instance, different groups have published checklists for epidural anaesthesia with 14, 27, and 61 items on the list.28 71 80 A systematic review and content analysis of checklists for procedural skills assessment can be found elsewhere.58 Checklists have also been designed with outcomes of not performed, performed poorly, and performed well rather than a binary pass–fail outcome to allow them to become more qualitative28 at the cost of a loss in objectivity. A potential problem with checklists is that if all stages are weighted equally regardless of clinical importance, then a trainee may be able to obtain a high score, despite omitting important stages. To prevent this, certain stages can be marked as resulting in an automatic fail if not completed or an overall pass/fail option can be added to the scoring system.
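The weighting problem and the automatic-fail remedy can be made concrete with a small sketch. The item labels, the choice of which item is critical, and the 80% pass mark are hypothetical and are not drawn from any of the published checklists cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    description: str
    critical: bool = False  # omitting a critical item fails the whole assessment


def score_checklist(items, performed, pass_fraction):
    """Score a binary checklist; any omitted critical item is an automatic fail."""
    if len(items) != len(performed):
        raise ValueError("one observation is needed per checklist item")
    score = sum(1 for done in performed if done)
    missed_critical = [
        item.description
        for item, done in zip(items, performed)
        if item.critical and not done
    ]
    passed = not missed_critical and score >= pass_fraction * len(items)
    return {
        "score": score,
        "max_score": len(items),
        "missed_critical": missed_critical,
        "passed": passed,
    }


# Hypothetical five-item checklist with one critical item.
ITEMS = [
    ChecklistItem("step 1"),
    ChecklistItem("aseptic technique maintained", critical=True),
    ChecklistItem("step 3"),
    ChecklistItem("step 4"),
    ChecklistItem("step 5"),
]

# The trainee completes four of five items but omits the critical one.
observed = [True, False, True, True, True]
result = score_checklist(ITEMS, observed, pass_fraction=0.8)
print(result)
```

With equal weighting alone this trainee would reach the 80% pass mark; flagging the critical item turns the same performance into a fail.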
Checklists have been found to have excellent reliability in the assessment of epidural anaesthesia28 and good reliability for the assessment of interscalene brachial plexus blocks when used by trained assessors.64 An advantage of checklists is that they have intrinsic content validity, if they are constructed well, for instance, ensuring that the checklist examines what is taught at that centre. Construct validity has been established for checklists in the assessment of epidural anaesthesia28 and interscalene brachial plexus blocks.64 Predictive validity has not been studied for checklists in anaesthesia.
Global rating scales
Global rating scales (GRSs) differ from checklists, in that they use a Likert scale rather than a dichotomous outcome. As the GRS has a gradation of response in each category, it is less objective than a checklist, although this allows the assessment to be more qualitative.90 GRS can be used to assess many different skills and they are the most objective way that aspects of performance such as professionalism and interpersonal skills can be assessed. When used to assess procedural skill, a GRS may either describe an overall impression of the quality of performance or there may be a Likert scale for a number of different domains within an overall performance.64 70 GRS can be used prospectively or retrospectively, although, as in other forms of assessment, there is evidence that reliability is poor if used retrospectively.85 Potential pitfalls with GRS include the halo effect, when good or bad performance in one domain unduly influences the grading of performance in other domains. This may be partly due to lack of training of assessors41 85 and can make a GRS seem falsely internally consistent.47 Another common problem with GRS is self-imposed scale limitation. Assessors commonly restrict themselves to the high end of the scale: in one study, 95.6% of scores on a nine-point scale were between 6 and 9.85 This may be because of either lack of training of assessors or unwillingness to fail a trainee knowing the potentially serious consequences. An alternative explanation for scale limitation is that most trainees are of a high standard. However, to be useful, GRS must be designed to be able to differentiate grades of quality beyond distinguishing between outstanding and failing trainees.
A GRS developed for the assessment of procedural skills in surgery at the University of Toronto (Table 2)55 68 has been repeatedly found to have construct validity for the assessment of procedural skills in both surgery55 68 and anaesthesia: it differentiates between junior and senior trainees performing an interscalene block64 and discriminates between various levels of experience at performing epidural anaesthesia.28 It has also been found to have good reliability for the assessment of orotracheal fibreoptic intubation,63 epidural anaesthesia,28 and interscalene brachial plexus blocks.64
An advantage of GRS is that they are not confined to one procedure but can be used for different procedural skills. However, some domains may be particularly useful for certain kinds of procedure, for instance, the domain depth perception has been added for laparoscopy but is unlikely to be useful in anaesthesia while autonomy is likely to be useful for procedural skills assessment in any speciality.90
GRSs are currently being used for in-training assessment of procedural skills in the UK Foundation Programme that covers the first 2 postgraduate years in all specialities. One mandatory competence assessment tool is the Direct Observation of Procedural Skills (DOPS),10 a specific six-point, 11 domain GRS used to assess performance in procedural skills (Table 3). It is notable that DOPS focuses on the context of the procedural skill: nine of the domains describe pre- and post-procedure care and non-technical skills. The actual assessment of the procedural skill is limited to a single domain. DOPS was developed by the Royal College of Physicians UK and is currently in the process of being studied as a pilot system. Foundation year trainees need to undertake 6 DOPS each year from an approved list.66 The trainee chooses the timing, the procedure, and the assessor who may be a more senior doctor or a nurse but is expected to have had some training in the use of DOPS.
McKinley and colleagues argue for a holistic evaluation of procedural skills for summative purposes. Their Leicester Clinical Procedure Assessment Tool (LCAT)57 was developed in response to a systematic review58 of checklists and GRS for the evaluation of procedural skills that found that teamwork competencies and humanistic competencies such as safety and infection prevention were omitted from the majority of assessment tools. The LCAT has been shown to be reliable and has thoroughly demonstrated content and face validity, but construct and predictive validity have not yet been investigated.57
It is not clear whether construct validity as tested within an educational research trial is necessarily generalizable to real-world practice using in-training assessments by untrained raters. Potential impediments to achieving, in the real world, the degree of reliability obtained in research with direct observation of a procedure on actual patients include an imprecise GRS, patient variability resulting in heterogeneous levels of difficulty, lack of training of the assessor resulting in differing levels of expectation from staff, and the degree to which the trainee is acting independently.55
Comparisons between GRSs and checklists
Both checklists and GRSs have been found to have good reliability.33 60 Checklists have been challenged as being able to distinguish novice and expert performance but failing to differentiate between high levels of performance, rewarding thoroughness rather than expertise. This may be particularly true for non-technical skills: Hodges and colleagues40 compared checklists and GRSs in psychiatry OSCE stations and found that experts scored better than trainees or medical students when assessed with GRS but worse when assessed with checklists. The authors concluded that checklists might penalize experts who take shortcuts that they have learned as part of their expertise and that an instrument that is valid at one level of training may not be valid at another.40 Similarly, for procedural skills assessment, a comparison of checklists and GRSs in the performance of simulated surgical procedures by trainees found that GRS had better construct validity than the checklist, although both instruments demonstrated construct validity and good reliability.67
Other authors have suggested that checklists may be better suited for assessing procedural skills than other domains of learning as procedural skills tend to be sequential and predictable.53 It has also been suggested that procedure-specific scales may provide an additional degree of formative feedback to generic scales.1 A good example is in ultrasound-guided regional anaesthesia, where radiological visualization of the nerve is a key skill that is not accounted for by generic scales.79 It is notable that GRS generally gather different information than a checklist. Friedman and colleagues29 have demonstrated the value of a checklist in identifying poor aseptic technique, despite good performance in other aspects of procedural skills. This suggests that a combination of a checklist and GRS may be advantageous when a comprehensive evaluation is required, for instance, when testing an intervention in education research.
A dilemma for both GRSs and checklists is determining cut-off scores for what can be considered competent or not competent if used for summative evaluation. One solution is to examine a population of expert anaesthetists and define proficiency from their scores. Using a normative marking scheme with a fixed proportion of subjects passing and failing has the limitation that it would fail to identify either a good or bad cohort of trainees.
Other instruments
Global Operative Assessment of Laparoscopic Skills (GOALS) is an assessment tool that combines both a GRS and a checklist with visual analogue scales (VAS). The two 10 cm VAS are anchored at each end with specific descriptors and refer to (i) overall competence and (ii) the observed difficulty of the procedure. GOALS was found to be more reliable than a GRS and checklist alone with suitable inter-rater reliability for a high-stakes examination. The VAS for competence was found to have construct validity in the assessment of performance of laparoscopic cholecystectomy where a checklist did not.90 The use of a VAS for difficulty when directly observing procedural skills has not yet been explored in anaesthesia and future research in this area is warranted. Another alternative to checklists and GRSs is to measure the number of pre-determined errors.79
Motion analysis
The Imperial College Surgical Assessment Device (ICSAD) is a motion analysis device originally designed for the investigation of hand movements in surgeons. It provides an objective measure of technical ability that has been validated in various surgical fields and has begun to be used in anaesthesia.94 It uses an electromagnetic tracking system (Isotrak II; Polhemus Inc., Colchester, VT, USA) consisting of an electromagnetic field generator and two 10 mm sensors that are attached to the dorsum of each hand. Robotic Video and Motion Analysis Software retrieves time-stamped Cartesian coordinates and defines hand movements by changes in velocity. It processes this information to produce values for total distance moved by each hand, number of movements, total time, and average hand velocity. A Gaussian filter is used to eliminate background noise, so that only meaningful actions register as movements: the size of movement that designates a movement is adjustable; however, after this point, the ICSAD is entirely objective.
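The kind of summary values described above can be derived from time-stamped coordinates with a short sketch. This is not the ICSAD's actual algorithm: it omits the Gaussian filtering, and the rule used to count a movement (speed rising above a fixed threshold) and all names, units, and sample values are assumptions made for illustration only.

```python
import math


def motion_metrics(samples, speed_threshold):
    """Summarize one hand's motion from (t, x, y, z) samples.

    t is in seconds and coordinates in millimetres (assumed units).
    A 'movement' is counted each time speed rises above speed_threshold.
    """
    path_length = 0.0
    movements = 0
    moving = False
    for (t0, x0, y0, z0), (t1, x1, y1, z1) in zip(samples, samples[1:]):
        step = math.dist((x0, y0, z0), (x1, y1, z1))
        path_length += step
        speed = step / (t1 - t0)
        if speed > speed_threshold and not moving:
            movements += 1  # a new movement starts when the hand speeds up
        moving = speed > speed_threshold
    total_time = samples[-1][0] - samples[0][0]
    return {
        "path_length_mm": path_length,
        "movements": movements,
        "total_time_s": total_time,
        "mean_speed_mm_per_s": path_length / total_time,
    }


# Hypothetical trace: the hand moves, pauses for one second, then moves again.
trace = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 3.0, 4.0, 0.0),
    (2.0, 3.0, 4.0, 0.0),
    (3.0, 3.0, 4.0, 12.0),
]
metrics = motion_metrics(trace, speed_threshold=1.0)
print(metrics)
```

The threshold plays the role of the adjustable movement size mentioned above: raising it causes small or slow displacements to be ignored rather than counted as movements.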
Different aspects of economy of movement have been validated as assessment tools for different procedures. Time taken and reduced total movements have been demonstrated to be associated with expertise in open,5 19 laparoscopic,83 and micro-surgery.75 Reduced number of movements has also been found to be correlated with less anastomotic leakage in a simulated arterial graft model, providing some evidence of predictive validity.18 Reduced total path length is associated with expertise in laparoscopic59 81 83 and micro-surgery75 but not in open surgery.5 19
In anaesthesia, the ICSAD has potential to objectively assess the development of expertise in technical skills but has not yet been validated for specific anaesthesia procedures and this should be an area of future research. It may also be of use in formative assessment, although the data output is not intuitively interpretable and would perhaps be best compared with previous performance or that of peers. A limitation of the ICSAD is that it can only ever assess process rather than outcome, for instance, it gives no information on whether the procedure was performed well. The data are likely to be more useful if triangulated with a checklist or GRS.
| Contexts of procedural skills assessment |
|---|
Procedural skills have historically been both taught and evaluated on patients. In recent years, advances in simulation technology have enabled the evaluation of procedural skills to be taken out of the clinical context into the simulation laboratory. This has created the possibility of standardized assessment of procedural skills.
Simulation
Simulation includes the use of manikins, human cadavers, animals, virtual reality, and standardized patients. Part-task trainers (PTT) simulate a particular anatomical area or procedure, in contrast to full patient simulation with a computer-enhanced manikin. PTT are generally less expensive than full-patient computerized manikin-based simulators, especially when taking into account the costs of the technician and actors required, but full-patient systems can better simulate the whole clinical environment in order to recreate situations where procedures must be completed quickly to prevent further physiological deterioration.88 Hybrid simulation involves more than one type of simulation and can be used to put procedural skills in context. An example is having an actor play the part of a patient to assess consent and communication skills, with an attached manikin arm to assess placing an i.v. cannula.49 50
The fidelity of a simulation refers to its similarity to an actual task or patient. Although simulation will never be exactly the same as a clinical experience, there are a number of advantages in assessing procedural skills in a simulated rather than a clinical environment. Some procedures, such as emergency cricothyrotomy, are uncommon enough that most anaesthetists will not perform them during their training. Assessing procedural skills using a simulator prevents potential harm to patients and has been described as an ethical imperative,99 although ethical issues remain with the use of animals or human cadavers. In-training assessment has also been challenged as problematic due to power relations between the trainee and the trainer,65 potential conflicts of interest between the consultant as a teacher and as an evaluator,95 and the trivialization of evaluation and development of a tick box mentality.6 As there are problems with both standardized simulated evaluations and in-training assessment, a combination of both may offer the best solution.
Predictive validity has been demonstrated for the Human Patient Simulator (METI, Sarasota, FL, USA) for teaching tracheal intubation: 10 h of deliberate practice was found to be as effective as 15 intubations in the operating theatre.36 The AirSim (TruCorp, Belfast, Northern Ireland) has content validity because it is anatomically correct, having been designed using a 3D model and spiral CT scans of the human airway.23 The extubated anaesthetized sheep model also has content validity for the "can't intubate, can't ventilate" scenario: it has secretions, a mobile larynx, and a cricothyroid membrane; it will bleed if a surgical airway is attempted; it can develop s.c. emphysema; and it desaturates after extubation, producing a sense of responsibility to regain the airway. There are, however, both cost and ethical disadvantages to this model.38 Construct validity has been established for some simulations in anaesthesia, including the Bill 1 airway simulator (VBM Medizintechnik GmbH, Sulz, Germany) for cricothyrotomy82 and a virtual reality flexible bronchoscopy simulator (Immersion Medical, Gaithersburg, MD, USA).15 However, most studies on the validity of anaesthesia simulators for procedural skills have limited themselves to discussions of face validity, rated by the users' subjective impression of realism.89 Face validity depends largely on the simulation's fidelity. Although there is some evidence that low-fidelity PTT are as effective as high-fidelity simulations in the teaching of anaesthesia procedural skills,12 56 this has not been investigated for evaluation.
Surgical specialities are considerably ahead of anaesthesia in the development and validation of PTT. One example is the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS), a non-anatomical simulator that has been assessed at multiple institutions.27 The MISTELS simulation and metric have been found to be reliable, with excellent internal consistency, inter-rater and test–retest agreement,27 91 and to have construct validity27 and concurrent validity.24 27
There is good evidence that simulation-based learning with PTT is effective for teaching procedural skills in surgery. Virtual reality PTT, both non-anatomical25 35 and anatomical,2 have been demonstrated to have predictive validity for the teaching of laparoscopic cholecystectomy and to reduce error when operating on patients. Virtual reality PTT have also been used for high-stakes evaluation in surgery.76 Predictive validity has only been demonstrated in anaesthesia for the teaching of tracheal intubation.36 However, procedural skills can be as vital in anaesthesia as decision-making outside the operating theatre is in the surgical specialities, and it could be argued that PTT for anaesthesia have been neglected compared with full-patient manikin simulators. As an example, it is notable that there is no commercially available airway PTT that is capable of simulating a comprehensive range of pathology that can cause a difficult airway.
The success of outcome-driven simulation-based learning suggests that using simulation as an evaluation before independent practice has the potential to impact positively on patient safety. At the present time, there has been no research that has demonstrated predictive validity for any evaluation using any simulation. A priority for future patient safety research should be to demonstrate that a competent performance on a simulator results in competence in actual patients.
Multiple station examinations of procedural skills
The Objective Structured Assessment of Technical Skills (OSATS) was developed as an objective assessment of procedural skill outside of the operating theatre68 and is similar to an OSCE. Candidates perform a series of standardized surgical skills on bench models. At each time-limited station, candidates are examined by direct expert observation, and technical skills are assessed using both a generic GRS and a task-specific checklist. OSATS has been shown to be a reliable measure of technical skill.4 33 55 68 Construct validity33 68 and concurrent validity have been established for OSATS, including by correlating surgical faculty rankings with OSATS scores for senior surgical trainees.22 However, organizing and running an OSATS examination is labour intensive and relatively expensive.34 An OSATS examination format has not yet been used in anaesthesia but has the potential to standardize procedural skills assessment if suitable simulations can be validated, as discussed above.
| Conclusions |
|---|
Procedural skills in anaesthesia are assessed poorly compared with other domains of learning. This domain of learning has undergone detailed investigation in surgery because of the importance of procedural skill to surgical patient outcome, and anaesthesia has the opportunity to learn from recent advances in the surgical fields.
It has been argued that evaluation drives learning.58 Current evaluations in anaesthesia tend to focus on broadly defined technical skills but neglect the details of procedural skills (and non-technical skills). Improving the evaluation of procedural skills has the potential to promote excellence in a neglected domain of learning.
Research into the assessment of technical skills in anaesthesia has been conducted with heterogeneous methodologies, which makes comparison difficult. Future studies should include core elements that ensure methodological rigor and facilitate robust comparisons between trials. Such studies should be able to demonstrate an assessment's (i) validity (it measures what it purports to measure), (ii) reliability, (iii) feasibility (including cost-effectiveness), and (iv) comprehensiveness (allowing for various levels of difficulty). The demonstration of predictive validity for the evaluation of procedural skills should be considered an achievable priority.
This review has presented a diverse array of methods, each with its own particular advantages and disadvantages. A key question is how a training programme should assess the procedural skills of trainees in practice. First, several methods can be excluded. Logbooks and procedure lists are best suited to providing information regarding likely opportunities within training programmes, and there is little evidence to promote the use of psychometric or aptitude testing in anaesthesia. Cusum analysis has the potential to provide a robust statistical measure of procedural competence but relies on either self-reported performance or repeated direct observations and can require large numbers of performances to demonstrate competency. New technology such as motion analysis may have a role in focusing on manual dexterity during technical tasks but requires further validation before being used to assess procedural skills in anaesthesia.
Currently, the best evidence for a valid, reliable, feasible, and comprehensive assessment tool to assess procedural skill in anaesthesia lies in the use of checklists and GRSs. There is good evidence for the use of a combination of a checklist and GRS in the setting of medical educational research and this combination of tools could be considered the gold standard in this setting. When choosing an assessment tool for procedural skills, the key question is what the purpose of the evaluation is. For formative assessment, checklists and GRSs provide in-depth information to promote learning. Most research has focused on the measurement of performance; however, summative evaluation requires not only the measurement of performance but also the setting of standards and the designation of a trainee as competent or not competent. Subjective judgements of competence as a binary outcome are not psychometrically robust enough for high-stakes purposes and more reliable options include a GRS that includes competence as a behavioural descriptor or calibrating the scores of an assessment tool against an expert group.
There is a need to investigate available assessment tools in real-time in-training evaluations using large numbers of both trainees and raters. Directorate or faculty development is key: trained staff are more reliable in summative assessment and more likely to give feedback in formative assessment.65 As Kelly and colleagues44 have stated, a macroview is needed on the purposes and systems for assessment before becoming involved in the details of developing particular instruments. In particular, although a mediocre assessment within a good system can be manipulated to be "good enough", a good instrument within a poor system is likely to perform poorly.
Patient simulation is a growth area in anaesthesia training programmes; however, simulation for evaluation remains a controversial area. Research is required to develop and test simulators that are realistic enough and have a suitable range of difficulty to be used for high-stakes evaluation. Simulation and multistation assessments similar to OSATS may become a key part of the future of procedural skills assessment in anaesthesia if suitable part-task simulators can be validated. However, at the present time, there is not enough evidence to recommend that anaesthetic trainees are evaluated in procedural skills using simulators, especially when these skills can be reliably assessed by direct observation of performance on patients.
| Appendix |
|---|
Formulae for calculating the variables for Cusum analysis:46
[Formulae not available in this copy; see reference 46.]
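As a hedged illustration of how such variables are typically computed, the Python sketch below uses one commonly cited formulation based on the sequential probability ratio test. It is an assumption that this matches the formulae originally printed here. The function and parameter names are hypothetical, and the two failure rates are passed by name (`acceptable`, `unacceptable`) rather than as p0 and p1 so that the sketch does not depend on how those symbols are labelled.

```python
import math


def cusum_parameters(alpha, beta, acceptable, unacceptable):
    """Return (s, h0, h1) for a Cusum chart of procedural failures.

    alpha: risk of a type I error.
    beta: risk of a type II error.
    acceptable / unacceptable: failure rates, with
        0 < acceptable < unacceptable < 1.
    s is the amount subtracted for each success; h0 and h1 are the
    spacings of the lower and upper decision boundaries (magnitudes).
    """
    if not 0 < acceptable < unacceptable < 1:
        raise ValueError("need 0 < acceptable < unacceptable < 1")
    a = math.log((1 - beta) / alpha)
    b = math.log((1 - alpha) / beta)
    p = math.log(unacceptable / acceptable)
    q = math.log((1 - acceptable) / (1 - unacceptable))
    s = q / (p + q)
    h0 = b / (p + q)
    h1 = a / (p + q)
    return s, h0, h1


def cusum_trace(outcomes, s):
    """Running Cusum: add (1 - s) for a failure, subtract s for a success.

    outcomes: iterable of booleans, True meaning the attempt failed.
    """
    total = 0.0
    trace = []
    for failed in outcomes:
        total += (1 - s) if failed else -s
        trace.append(total)
    return trace
```

With alpha and beta both set to 0.1, an acceptable failure rate of 5%, and an unacceptable failure rate of 10%, `cusum_parameters` gives s of about 0.07 and h0 and h1 of about 2.94. A trace that rises across an upper boundary suggests an unacceptable failure rate, and one that falls across a lower boundary suggests an acceptable one.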
α, the risk of a type I error; β, the risk of a type II error; p1, the acceptable failure rate; and p0, the unacceptable failure rate.
| References |
|---|
1 Aggarwal R, Grantcharov T, Moorthy K, Milland T, Darzi A. Toward feasible, valid, and reliable video-based assessments of technical surgical skills in the operating room. Ann Surg (2008) 247:372.
2 Ahlberg G, Enochsson L, Gallagher AG, et al. Proficiency-based virtual reality training significantly reduces the error rate for residents during their first 10 laparoscopic cholecystectomies. Am J Surg (2007) 193:797–804.
3 Altman DG. Practical Statistics for Medical Research. (1991) London: Chapman Hall/CRC Press.
4 Ault G, Reznick R, MacRae H, et al. Exporting a technical skills evaluation technology to other sites. Am J Surg (2001) 182:254–6.
5 Bann SD, Khan MS, Darzi AW. Measurement of surgical dexterity using motion analysis of simple bench tasks. World J Surg (2003) 27:390–4.
6 Bisson DL, Hyde JP, Mears JE. Assessing practical skills in obstetrics and gynaecology: educational issues and practical implications. Obstet Gynaecol (2006) 8:107.
7 Bolsin S, Colson M. The use of the Cusum technique in the assessment of trainee competence in new procedures. Int J Qual Health Care (2000) 12:433–8.
8 Bould MD, Crabtree NA. Are logbooks of training in anaesthesia a valuable exercise? Br J Hosp Med (2008) 69:236.
9 Bowles TA, Watters DA. Time to Cusum: simplified reporting of outcomes in colorectal surgery. ANZ J Surg (2007) 77:587.
10 Carr S. The Foundation Programme assessment tools: an opportunity to enhance feedback to trainees? Postgrad Med J (2006) 82:576–9.
11 Chambers WA. Difficult airways—difficult decisions: guidelines for publication? Anaesthesia (2004) 59:631–3.
12 Chandra DB, Savoldelli GL, Joo HS, Weiss ID, Naik VN. Fiberoptic oral intubation: the effect of model fidelity on training for transfer to patient care. Anesthesiology (2008) 109:1007–13.
13 Cheney FW, Posner KL, Lee LA, Caplan RA, Domino KB. Trends in anesthesia-related death and brain damage: a closed claims analysis. Anesthesiology (2006) 105:1081–6.
14 Cooper GM, McClure JH. Anaesthesia chapter from Saving mothers' lives; reviewing maternal deaths to make pregnancy safer. Br J Anaesth (2008) 100:17–22.
15 Crawford SW, Colt HG. Virtual reality and written assessments are of potential value to determine knowledge and skill in flexible bronchoscopy. Respiration (2004) 71:269–75.
16 Dashfield AK, Coghill JC, Langton JA. Correlating obstetric epidural anaesthesia performance and psychomotor aptitude. Anaesthesia (2000) 55:744–9.
17 Dashfield AK, Smith JE. Correlating fibreoptic nasotracheal endoscopy performance and psychomotor aptitude. Br J Anaesth (1998) 81:687–91.
18 Datta V, Chang A, Mackay S, Darzi A. The relationship between motion analysis and surgical technical assessments. Am J Surg (2002) 184:70–3.
19 Datta V, Mackay S, Mandalia M, Darzi A. The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model. J Am Coll Surg (2001) 193:479–85.
20 de Oliveira Filho GR. The construction of learning curves for basic skills in anesthetic procedures: an application for the cumulative sum method. Anesth Analg (2002) 95:411–6.
21 Ericsson KA, Charness N. Expert performance. Am Psychol (1994) 49:725–47.
22 Faulkner H, Regehr G, Martin J, Reznick R. Validation of an objective structured assessment of technical skill for surgical residents. Acad Med (1996) 71:1363–5.
23 Fee JPH, Murray JM, McBride A, Edgar T. A realistic manikin for airway training. Anaesthesia (2003) 58:509–10.
24 Feldman LS, Hagarty SE, Ghitulescu G, Stanbridge D, Fried GM. Relationship between objective assessment of technical skills and subjective in-training evaluations in surgical residents. J Am Coll Surg (2004) 198:105–10.
25 Feldman LS, Sherman V, Fried GM. Using simulators to assess laparoscopic competence: ready for widespread use? Surgery (2004) 135:28–42.
26 Fletcher G, Flin R, McGeorge P, et al. Anaesthetists' Non-Technical Skills (ANTS): evaluation of a behavioural marker system. Br J Anaesth (2003) 90:580–8.
27 Fried GM, Feldman LS, Vassiliou MC, et al. Proving the value of simulation in laparoscopic surgery. Ann Surg (2004) 240:518–25.
28 Friedman Z, Katznelson R, Devito I, Siddiqui M, Chan V. Objective assessment of manual skills and proficiency in performing epidural anesthesia—video-assisted validation. Reg Anesth Pain Med (2006) 31:304–10.
29 Friedman Z, Siddiqui N, Katznelson R, Devito I, Davies S. Experience is not enough: repeated breaches in epidural anesthesia aseptic technique by novice operators despite improved skill. Anesthesiology (2008) 108:914–20.
30 Gaba DM, Howard SK, Flanagan B, Smith BE, Fish KJ, Botney R. Assessment of clinical performance during simulated crises using both technical and behavioral ratings. Anesthesiology (1998) 89:8–18.
31 Gallagher AG, Cowie R, Crothers I, Jordan-Black JA, Satava RM. PicSOr: an objective test of perceptual skill that predicts laparoscopic technical skill in three initial studies of laparoscopic performance. Surg Endosc (2003) 17:1468–71.
32 Gardner J. Assessment and Learning (2006) London: Sage Publications.
33 Goff BA, Lentz GM, Lee D, Houmard B, Mandel LS. Development of an objective structured assessment of technical skills for obstetric and gynecology residents. Obstet Gynecol (2000) 96:146–50.
34 Goff BA, Nielsen PE, Lentz GM, et al. Surgical skills assessment: a blinded examination of obstetrics and gynecology residents. Am J Obstet Gynecol (2002) 186:613–7.
35 Grantcharov TP, Kristiansen VB, Bendix J, Bardram L, Rosenberg J, Funch-Jensen P. Randomized clinical trial of virtual reality simulation for laparoscopic skills training. Br J Surg (2004) 91:146–50.
36 Hall RE, Plant JR, Bands CJ, Wall AR, Kang J, Hall CA. Human patient simulation is effective for teaching paramedic students endotracheal intubation. Acad Emerg Med (2005) 12:850–5.
37 Hamdorf JM, Hall JC. Acquiring surgical skills. Br J Surg (2000) 87:28–37.
38 Heard A, Eakins P. Emergency surgical airway access using a sheep model. Anaesthesia (2005) 60:833–4.
39 Henderson J, Popat M, Latto P, Pearce A. Difficult Airway Society guidelines. Anaesthesia (2004) 59:1242–3.
40 Hodges B, Regehr G, McNaughton N, Tiberius R, Hanson M. OSCE checklists do not capture increasing levels of expertise. Acad Med (1999) 74:1129–34.
41 Holmboe ES, Hawkins RE. Methods for evaluating the clinical competence of residents in internal medicine: a review. Ann Intern Med (1998) 129:42–8.
42 Hopkins PM. Ultrasound guidance as a gold standard in regional anaesthesia. Br J Anaesth (2007) 98:299–301.
43 Hunt RJ. Percent agreement, Pearson's correlation, and kappa as measures of inter-examiner reliability. J Dent Res (1986) 65:128–30.
44 Kelly A, Canter R. A new curriculum for surgical training within the United Kingdom: context and model. J Surg Educ (2007) 64:10–9.
45 Kelly SP, Shapiro N, Woodruff M, Corrigan K, Sanchez LD, Wolfe RE. The effects of clinical workload on teaching in the emergency department. Acad Emerg Med (2007) 14:526–31.
46 Kestin IG. A statistical approach to measuring the competence of anaesthetic trainees at practical procedures. Br J Anaesth (1995) 75:805–9.
47 Keynan A, Friedman M, Benbassat J. Reliability of global rating scales in the assessment of clinical competence of medical students. Med Educ (1987) 21:477–81.
48 Kmietowicz Z. Make patient safety part of everyday routines, says watchdog. Br Med J (2008) 336:294–5.
49 Kneebone R, Kidd J, Nestel D, Asvall S, Paraskeva P, Darzi A. An innovative model for teaching and learning clinical procedures. Med Educ (2002) 36:628–34.
50 Kneebone R, Nestel D, Yadollahi F, et al. Assessing procedural skills in context: exploring the feasibility of an Integrated Procedural Performance Instrument (IPPI). Med Educ (2006) 40:1105–14.
51 Konrad C. Learning manual skills in anesthesiology: is there a recommended number of cases for anesthetic procedures? Anesth Analg (1998) 86:635–9.
52 Kopta JA. The development of motor skills in orthopaedic education. Clin Orthop Relat Res (1971) 75:80–5.
53 Lammers RL, Davenport M, Korley F, et al. Teaching and assessing procedural skills using simulation: metrics and methodology. Acad Emerg Med (2008) 15:1079–87.
54 Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics (1977) 33:363–74.
55 Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg (1997) 84:273–8.
56 Matsumoto ED, Hamstra SJ, Radomski SB, Cusimano MD. The effect of bench model fidelity on endourological skills: a randomized controlled study. J Urol (2002) 167:1243–7.
57 McKinley RK, Strand J, Gray T, Schuwirth L, Alun-Jones T, Miller H. Development of a tool to support holistic generic assessment of clinical procedure skills. Med Educ (2008) 42:619–27.
58 McKinley RK, Strand J, Ward L, Gray T, Alun-Jones T, Miller H. Checklists for assessment and certification of clinical procedural skills omit essential competencies: a systematic review. Med Educ (2008) 42:338–49.
59 Moorthy K, Munz Y, Dosis A, Bello F, Chang A, Darzi A. Bimodal assessment of laparoscopic suturing skills: construct and concurrent validity. Surg Endosc (2004) 18:1608–12.
60 Morgan PJ, Cleave-Hogg D, Guest CB. A comparison of global ratings and checklist scores from an undergraduate assessment using an anesthesia simulator. Acad Med (2001) 76:1053–5.
61 Muller R, Buttner P. A critical discussion of intraclass correlation coefficients. Stat Med (1994) 13:2465–76.
62 Naik VN, Devito I, Halpern SH. Cusum analysis is a useful tool to assess resident proficiency at insertion of labour epidurals. Can J Anaesth (2003) 50:694–8.
63 Naik VN, Matsumoto ED, Houston PL, et al. Fiberoptic orotracheal intubation on anesthetized patients: do manipulation skills learned on a simple model transfer into the operating room? Anesthesiology (2001) 95:343–8.
64 Naik VN, Perlas A, Chandra DB, Chung DY, Chan VW. An assessment tool for brachial plexus regional anesthesia performance: establishing construct validity and reliability. Reg Anesth Pain Med (2007) 32:41–5.
65 Norcini J. Workplace-based assessment as an educational tool: AMEE Guide No. 31. Med Teach (2007) 29:855–71.
66 Norcini JJ, McKinley DW. Assessment methods in medical education. Teach Teach Educ (2007) 23:239–50.
67 Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med (1998) 73:993–7.
68 Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skill via an innovative bench station examination. Am J Surg (1997) 173:226–30.
69 Reznick RK, MacRae H. Teaching surgical skills—changes in the wind. N Engl J Med (2006) 355:2664.
70 Ringsted C, Ostergaard D, Ravn L, Pedersen JA, Berlac PA, van der Vleuten CPM. A feasibility study comparing checklists and global rating forms to assess resident performance in clinical skills. Med Teach (2003) 25:654–8.
71 Ringsted C, Ostergaard D, Scherpbier A. Embracing the new paradigm of assessment in residency training: an assessment programme for first-year residency training in anaesthesiology. Med Teach (2003) 25:54–62.
72 Royal College of Anaesthetists. CCT in Anaesthesia III: Competency Based Intermediate Level (Years 3 and 4) Training and Assessment. A Manual for Trainees and Trainers. (2007) London: Royal College of Anaesthetists.
73 Royal College of Anaesthetists. CCT in Anaesthesia II: Competency Based Basic Level (ST Years 1 and 2) Training and Assessment. A Manual for Trainees and Trainers. (2007) London: Royal College of Anaesthetists.
74 Royal College of Anaesthetists. CCT in Anaesthesia IV: Competency Based Higher and Advanced Level (Years 5, 6 and 7) Training and Assessment. A Manual for Trainees and Trainers. (2007) London: Royal College of Anaesthetists.
75 Saleh GM, Voyatzis G, Hance J, Ratnasothy J, Darzi A. Evaluating surgical dexterity during corneal suturing. Arch Ophthalmol (2006) 124:1263–6.
76 Salgado J, Grantcharov TP, Papasavas PK, Gagne DJ, Caushaj PF. Technical skills assessment as part of the selection process for a fellowship in minimally invasive surgery. Surg Endosc (2009) 23:641–4.
77 Schmidt RA, Bjork RA. New conceptualizations of practice: common principles in three paradigms suggest new concepts for training. Psychol Sci (1992) 3:207–17.
78 Schuepfer G, Johr M. Generating a learning curve for penile block in neonates, infants and children: an empirical evaluation of technical skills in novice and experienced anaesthetists. Paediatr Anaesth (2004) 14:574–8.
79 Sites BD, Spence BC, Gallagher JD, Wiley CW, Bertrand ML, Blike GT. Characterizing novice behavior associated with learning ultrasound-guided peripheral regional anesthesia. Reg Anesth Pain Med (2007) 32:107–15.
80 Sivarajan M, Miller E, Hardy C, et al. Objective evaluation of clinical performance and correlation with knowledge. Anesth Analg (1984) 63:603–7.
81 Smith SG, Torkington J, Brown TJ, Taffinder NJ, Darzi A. Motion analysis. Surg Endosc (2002) 16:640–5.
82 Sulaiman L, Tighe SQM, Nelson RA. Surgical vs wire-guided cricothyroidotomy: a randomised crossover study of cuffed and uncuffed tracheal tube insertion. Anaesthesia (2006) 61:565–70.
83 Taffinder NJ, McManus IC, Gul Y, Russell RC, Darzi A. Effect of sleep deprivation on surgeons' dexterity on laparoscopy simulator. Lancet (1998) 352:1191.
84 Tetzlaff JE. Assessment of competency in anesthesiology. Anesthesiology (2007) 106:812–25.
85 Thompson WG, Lipkin M Jr, Gilbert DA, Guzzo RA, Roberson L. Evaluating evaluation: assessment of the American Board of Internal Medicine Resident Evaluation Form. J Gen Intern Med (1990) 5:214–7.
86 Tooke J. Aspiring to Excellence. Finding and Recommendations of the Independent Inquiry into Modernising Medical Careers (2008) Chiswick: Aldridge Press.
87 Turnbull J, Gray J, MacFadyen J. Improving in-training evaluation programs. J Gen Intern Med (1998) 13:317–23.
88 Vadodaria BS, Gandhi SD, McIndoe AK. Comparison of four different emergency airway access equipment sets on a human patient simulator. Anaesthesia (2004) 59:73–9.
89 Varaday SS, Yentis SM, Clarke S. A homemade model for training in cricothyrotomy. Anaesthesia (2004) 59:1012–5.
90 Vassiliou MC, Feldman LS, Andrew CG, et al. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg (2005) 190:107–13.
91 Vassiliou MC, Ghitulescu GA, Feldman LS, et al. The MISTELS program to measure technical skill in laparoscopic surgery: evidence for reliability. Surg Endosc (2006) 20:744–7.
92 Wanzel KR, Hamstra SJ, Anastakis DJ, Matsumoto ED, Cusimano MD. Effect of visual-spatial ability on learning of spatially-complex surgical skills. Lancet (2002) 359:230–1.
93 Watts J, Feldman WB. Assessment of technical skills. In: Neufeld VR, Norman GR, eds. Assessing Clinical Competence. (1985) New York: Springer. 259–74.
94 Weiss ID, Naik VN, Savoldelli G, Chandra DB, Joo HS, LeBlanc V. Sleep deprivation and anesthesiologists' technical skills (Abstract). In: Canadian Anesthesiologists' Society Annual Meeting (2007) Calgary.
95 Wilkinson TJ, Wade WB. Problems with using a supervisor's report as a form of summative assessment. Postgrad Med J (2007) 83:504.
96 Wragg A, Wade W, Fuller G, Cowan G, Mills P. Assessing the performance of specialist registrars. Clin Med (2003) 3:131–4.
97 Xeroulis GJ, Park J, Moulton CA, Reznick RK, LeBlanc V, Dubrowski A. Teaching suturing and knot-tying skills to medical students: a randomized controlled study comparing computer-based video instruction and (concurrent and summary) expert feedback. Surgery (2007) 141:442–9.
98 Young A, Miller JP, Azarow K. Establishing learning curves for surgical residents using Cumulative Summation (CUSUM) analysis. Curr Surg (2005) 62:330–4.
99 Ziv A, Wolpe PR, Small SD, Glick S. Simulation-based medical education: an ethical imperative. Acad Med (2003) 78:783–8.