    “Competency Assessment - Are Your Measures Reliable?”

    March 26, 2026

    Joseph Donroe MD, MPH, MHS - Yale School of Medicine

    March 19, 2026

    Yale GIM “Research in Progress” Meeting Presented by: Yale School of Medicine’s Department of Internal Medicine, Section of General Internal Medicine

    ID
    14007

    Transcript

    • 00:00Yeah.
    • 00:04Alright. Well, welcome everybody.
    • 00:07Thank you for coming to
    • 00:09noon conference, general medicine.
    • 00:12Today's
    • 00:14CME code is five
    • 00:16five nine zero one.
    • 00:21Okay. So upcoming,
    • 00:23retreats. Next one will be
    • 00:25the education retreat on the
    • 00:27West Campus.
    • 00:28Stay tuned.
    • 00:30Important
    • 00:31but,
    • 00:33familiar.
    • 00:34F DAC reminders, please be
    • 00:35on the lookout for your
    • 00:37next steps, which
    • 00:38most likely will
    • 00:40be including meeting with your
    • 00:42mentors
    • 00:43or your delegates.
    • 00:48Research
    • 00:48in progress,
    • 00:51I guess the next one,
    • 00:52March twenty sixth will be
    • 00:53Nate Wood.
    • 00:55No. That's not okay. Got
    • 00:56it. Sorry.
    • 00:57Grand Rounds, seven thirty AM,
    • 00:59with Nate Wood. Food insecurity
    • 01:01and culinary medicine,
    • 01:03followed by,
    • 01:04noon conference,
    • 01:06next week. Ben Amba,
    • 01:08speaking about the impact of
    • 01:10the current landscape
    • 01:12on GME
    • 01:13nationwide.
    • 01:16Disclosure and accreditation.
    • 01:20So I'm excited to introduce,
    • 01:23Joe Donroe, who is going
    • 01:25to be joining us. So
    • 01:26the original plan for today
    • 01:27was to have two thirty
    • 01:29minute time slots. Our second,
    • 01:31presenter is unfortunately unavailable at
    • 01:33the last minute. So we
    • 01:35have one thirty minute time
    • 01:36slot, but we're really glad
    • 01:37to have,
    • 01:38Joe joining us. Joe started
    • 01:41his training career at Tufts
    • 01:42where he earned his MD
    • 01:44and MPH,
    • 01:45spent two years
    • 01:47in Peru,
    • 01:48before heading back north to
    • 01:50New Haven,
    • 01:51to complete his med-peds
    • 01:53residency and chief residency. And
    • 01:55he's here now on faculty
    • 01:56where he focuses on
    • 01:59teaching clinical skills,
    • 02:01clinical education overall, and also
    • 02:03taking care of patients with
    • 02:05addiction.
    • 02:06He's done a lot of
    • 02:07work on
    • 02:09teaching POCUS to our house
    • 02:11staff and others. And he
    • 02:13is going to be speaking
    • 02:14about, more generally, competency assessment
    • 02:17and are your measures reliable.
    • 02:19So thank you.
    • 02:22Alright. Thanks for the opportunity
    • 02:23to come and talk.
    • 02:27It's
    • 02:28about competency assessment, but
    • 02:30it's going to closely
    • 02:32parallel a study that
    • 02:34we
    • 02:35did here and recently
    • 02:37completed. So,
    • 02:39actually,
    • 02:40when I was approached to
    • 02:41do this, it
    • 02:42was research in
    • 02:43progress.
    • 02:45I'm happy to say now
    • 02:46that
    • 02:48we've been published this week,
    • 02:50with a group of
    • 02:52authors from here and
    • 02:54elsewhere.
    • 02:55And I can
    • 02:57pause for a
    • 02:57second for you to hit that QR
    • 02:59code and download it and
    • 03:00get the metrics up a
    • 03:01little bit.
    • 03:08I'll go through
    • 03:09the background and the
    • 03:11methodology
    • 03:11of what we did
    • 03:14briefly,
    • 03:15because we only have twenty
    • 03:16five minutes or so.
    • 03:18I really wanted to try
    • 03:19and focus
    • 03:20the conversation
    • 03:22around
    • 03:23a form of reliability
    • 03:25testing that's called generalizability
    • 03:27theory and the decision study,
    • 03:31and give a brief overview
    • 03:33of that and how we
    • 03:34interpreted it in
    • 03:36the context of
    • 03:37our study. And my disclaimer
    • 03:38is I am
    • 03:40not an expert in
    • 03:42this analytic
    • 03:44technique.
    • 03:44And it was actually really
    • 03:46hard to find expertise
    • 03:48to move our
    • 03:49project forward, but I'll
    • 03:50circle back to that in
    • 03:51a moment.
    • 03:55So some
    • 03:56background information.
    • 03:58My interest in this stems
    • 03:59from the work I do
    • 04:00in terms of leading
    • 04:02the point of care ultrasound
    • 04:04programs for internal medicine,
    • 04:07training
    • 04:07residents and training faculty
    • 04:10to use this as a
    • 04:10tool in the clinical
    • 04:12environment.
    • 04:13Point of care ultrasound
    • 04:15is the utilization
    • 04:16of ultrasound at the point
    • 04:18of care
    • 04:19by the treating physician. And
    • 04:21so we use it to
    • 04:22help diagnose
    • 04:23and manage,
    • 04:25and in contrast to comprehensive
    • 04:27ultrasound,
    • 04:27this is used to really
    • 04:29address very focused problems.
    • 04:35What's
    • 04:37been lagging,
    • 04:38as point of care ultrasound becomes
    • 04:40more and more popular amongst
    • 04:43medical schools, amongst residents, and
    • 04:44amongst faculty:
    • 04:46the utilization of POCUS
    • 04:48is increasing
    • 04:50at a tremendous rate. However,
    • 04:51our ability to understand whether
    • 04:53people are actually competent to use
    • 04:55this tool,
    • 04:56that is lagging way behind.
    • 04:58We don't have very good
    • 04:59measures
    • 05:00to be able to do
    • 05:01that, especially at that top
    • 05:03part of Miller's pyramid,
    • 05:05where it's really
    • 05:07competency in action. You know,
    • 05:09how is the learner actually
    • 05:10performing in the clinical
    • 05:12arena? And so
    • 05:17we formed a research question
    • 05:19around this this existing gap,
    • 05:22and the question became,
    • 05:24what is the validity evidence
    • 05:25supporting the use of an
    • 05:26entrustable professional activity
    • 05:29framework
    • 05:30to assess point of care
    • 05:31ultrasound competency
    • 05:33in internal medicine
    • 05:35learners.
    • 05:37Yeah.
    • 05:39What is the state of
    • 05:40care at
    • 05:42the level of certification of
    • 05:44what's required for someone to
    • 05:45roll out the POCUS machine
    • 05:47in their own practice, I
    • 05:48guess? That's the level. Yeah.
    • 05:50Yeah. It's
    • 05:51a little bit
    • 05:52of the wild west.
    • 05:56Right now,
    • 05:57most
    • 05:59departments at Yale do not
    • 06:00have a privileging
    • 06:02mechanism
    • 06:03for point of care ultrasound.
    • 06:04There are a few that
    • 06:05do. Emergency medicine
    • 06:07does.
    • 06:08Surprisingly, you know, groups like
    • 06:10Pulm Crit do not.
    • 06:12Internal medicine does not.
    • 06:14And so as these are
    • 06:15being used more and more,
    • 06:17they're being used in the
    • 06:18absence of a privileging process.
    • 06:20In the absence of a privileging
    • 06:22process for the hospital, there's
    • 06:23no formal
    • 06:25credentialing process either by which
    • 06:27to verify
    • 06:28competency.
    • 06:29And so it's
    • 06:30sort of up to
    • 06:32the professional to make a
    • 06:33decision on whether or not
    • 06:34they feel comfortable using that
    • 06:37in the clinical
    • 06:37arena.
    • 06:38And as we
    • 06:40know, clinicians are not always
    • 06:41the best self-assessors,
    • 06:43which,
    • 06:44you know, invites a problem,
    • 06:46I think. But we're moving
    • 06:47in that direction. So, actually,
    • 06:48I chair the committee
    • 06:50for establishing a standard process
    • 06:52for privileging across the hospital
    • 06:54and the delivery networks.
    • 06:58That committee has been
    • 07:00together for about five
    • 07:01years now,
    • 07:03but I think we are
    • 07:04close. I would
    • 07:05expect that there's probably
    • 07:07privileging that's gonna happen
    • 07:08within the next six months
    • 07:10or so. Now that I've
    • 07:11said that, I've cursed it,
    • 07:12but I think we are
    • 07:13closer than we ever
    • 07:15have been. So there
    • 07:16should be a credentialing and privileging
    • 07:17process soon.
    • 07:23The methodology
    • 07:25for
    • 07:26the study that
    • 07:27we did: so,
    • 07:30we developed an EPA, or
    • 07:32entrustable professional activity,
    • 07:34framework
    • 07:34and instrument to use. That
    • 07:36process was guided by a
    • 07:38panel of experts in point
    • 07:40of care ultrasound
    • 07:41and medical education,
    • 07:44and it followed a very
    • 07:44standardized way to
    • 07:47create an EPA.
    • 07:48The tool we created, the
    • 07:49instrument we created, is online
    • 07:52so learners access it on
    • 07:53their phones,
    • 07:54so it can be used
    • 07:55in real time in
    • 07:56the workplace.
    • 07:57Then we trained a group
    • 07:59of
    • 08:00ultrasound experts to become assessors
    • 08:02for us,
    • 08:03so that they can do
    • 08:04the assessments with
    • 08:06our learners
    • 08:07at the bedside.
    • 08:09And then we evaluated the
    • 08:10framework and the
    • 08:12instrument that we're using for
    • 08:13sources of evidence of validity,
    • 08:15reliability, and feasibility.
    • 08:21The EPA that
    • 08:22we came up with is
    • 08:24this: assessing the acutely ill
    • 08:26patient using point of care
    • 08:27ultrasound,
    • 08:28and the scale that we
    • 08:30use as our
    • 08:32assessment
    • 08:34scale is up
    • 08:36there. So
    • 08:37with entrustable professional activities,
    • 08:40the key cutoff is
    • 08:42the level
    • 08:43at which somebody
    • 08:44can be entrusted to perform
    • 08:46the activity
    • 08:47by themselves in an unsupervised
    • 08:49way. On our
    • 08:51scale, that is level four,
    • 08:52allowed to practice the EPA
    • 08:54unsupervised.
    • 08:55And between level one and
    • 08:57four, there's a
    • 08:59gradation.
    • 09:00What's nice about this tool
    • 09:02is that at each level,
    • 09:03it really directs the feedback
    • 09:05that the learner needs to
    • 09:07advance to the next step.
    • 09:08So it becomes an
    • 09:10important way to
    • 09:11track competency, but also
    • 09:13to
    • 09:15make sure that the
    • 09:16feedback that's given is
    • 09:18the right feedback for where
    • 09:19the learner is on their
    • 09:20competency pathway.
    • 09:25So
    • 09:26skipping some steps because I
    • 09:27just wanted to get
    • 09:28to really what's the
    • 09:29focus of today, which
    • 09:31is reliability testing. So one
    • 09:34source of validity evidence when
    • 09:36we're thinking about
    • 09:38developing a tool
    • 09:40is reliability. And when we
    • 09:41think about reliability, what
    • 09:43we're asking is: are the
    • 09:45measures consistent across
    • 09:47different workplace conditions and across
    • 09:49different assessors and learners?
    • 09:54Another way to think about
    • 09:55reliability testing is how
    • 09:57close is the observed score
    • 09:59to the true score. Right?
    • 10:01How close
    • 10:02is my observation
    • 10:03of competence?
    • 10:05How close is that to
    • 10:06the learner's true competence?
    • 10:09If you prefer to think
    • 10:11about it in terms of a formula,
    • 10:12you see the formula on
    • 10:13the screen there:
    • 10:14observed score equals true score
    • 10:16plus some error in
    • 10:17our measurement. Right? We can
    • 10:19never really get to the
    • 10:20true score. There's
    • 10:22always some error that we
    • 10:23wanna try and understand
    • 10:25and minimize.
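
The relationship described on the slide can be written out as the classical test theory model. This is a sketch in conventional notation: the symbols X (observed score), T (true score) and E (error) are labels chosen here, not taken from the slide, and the reliability ratio is the standard textbook definition, which assumes the error is uncorrelated with the true score.

```latex
X = T + E,
\qquad
\text{reliability}
  = \frac{\sigma^2_T}{\sigma^2_X}
  = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Read this way, "minimizing error" means shrinking the error variance relative to the true-score variance, which pushes the ratio toward 1.
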
    • 10:29The classical approach to reliability
    • 10:32testing
    • 10:35really looks at,
    • 10:36or focuses on, one source
    • 10:38of error. So studies are
    • 10:39designed to look at things
    • 10:40like interrater reliability or intercase
    • 10:43reliability or internal consistency, alpha
    • 10:45statistics, the Cronbach alpha that
    • 10:47you're probably familiar with.
    • 10:52The challenge with that, though,
    • 10:53is that in medical education
    • 10:55and the assessments that we
    • 10:56do, there's more than
    • 10:58just one source of error
    • 11:00that we have to worry
    • 11:01about.
    • 11:03There are multiple potential sources of
    • 11:04error. And so in reality,
    • 11:07we have to move on from
    • 11:09that
    • 11:10classical
    • 11:12formula.
    • 11:13And we have to consider,
    • 11:15you know, what is the
    • 11:16error that we can attribute
    • 11:17to
    • 11:18the learner?
    • 11:19What is the error that
    • 11:20we can attribute to the
    • 11:21raters? Some raters are more
    • 11:22lenient. Some are more strict.
    • 11:24Some
    • 11:26know the learner, and
    • 11:27that influences the scores.
    • 11:28We have to
    • 11:29think about error attributed to
    • 11:32the clinical case. Are there
    • 11:34differences in difficulty between the
    • 11:36cases that the
    • 11:37learners are being assessed on?
    • 11:38And all of those factor
    • 11:40into
    • 11:40that error value.
    • 11:42And so in reality, what
    • 11:44we really need our formula
    • 11:46to look like is this.
    • 11:47So our observed score equals
    • 11:49the true score plus multiple
    • 11:51sources of error. How do
    • 11:52we get to evaluating what
    • 11:54those sources of error
    • 11:56are and what the relative
    • 11:58contributions are to the overall
    • 12:00error number?
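
One common way to write "true score plus multiple sources of error" is the generalizability theory decomposition below. It is a sketch under an assumption: it uses a fully crossed learner (p) by rater (r) by case (c) design, which is the usual textbook setup and may not match the design actually analyzed in the study. The subscripts and the symbol names are chosen here for illustration.

```latex
X_{prc} = \mu + \nu_p + \nu_r + \nu_c
        + \nu_{pr} + \nu_{pc} + \nu_{rc} + \nu_{prc,e}

\sigma^2(X_{prc}) = \sigma^2_p + \sigma^2_r + \sigma^2_c
        + \sigma^2_{pr} + \sigma^2_{pc} + \sigma^2_{rc} + \sigma^2_{prc,e}
```

Here the learner term plays the role of the true score, and every other term is a separate, separately estimable source of error.
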
    • 12:04And I
    • 12:06was stuck there for a
    • 12:07long time. We had collected
    • 12:09our data,
    • 12:11and I was, you know,
    • 12:12really trying to move forward.
    • 12:14And the problem was there
    • 12:16just wasn't the expertise
    • 12:17to run the studies that
    • 12:18we needed to run, at
    • 12:20least that I could
    • 12:21find after
    • 12:22a lot
    • 12:23of emails and communications around
    • 12:25this, trying to find somebody
    • 12:27to run the studies
    • 12:28that we needed to do
    • 12:29to get to these
    • 12:31multiple sources of error.
    • 12:33And it's a type of
    • 12:34analysis that's called
    • 12:35generalizability
    • 12:36theory.
    • 12:39Fortunately,
    • 12:40two things happened.
    • 12:42One,
    • 12:43Donna Windish in the department
    • 12:45started
    • 12:46the Department of Medicine
    • 12:48educational
    • 12:49grant.
    • 12:50That came out about the
    • 12:51same time
    • 12:53as I was in
    • 12:53the struggle to do
    • 12:54this analysis.
    • 12:56And I was introduced to
    • 12:57Haidong Lu, who's here
    • 13:00today as well. And
    • 13:02with the funding support,
    • 13:03I was able to connect
    • 13:05with Haidong, and we
    • 13:06were able to plan
    • 13:08together, and
    • 13:10he became my
    • 13:12expert for getting this
    • 13:13done and was really
    • 13:15the
    • 13:16key piece to
    • 13:18be able to move
    • 13:18this forward. So I'm
    • 13:21extremely, extremely grateful, both for
    • 13:23the educational research grant and
    • 13:25for Haidong.
    • 13:28And so what he
    • 13:29was able to do is
    • 13:30this analysis called generalizability
    • 13:34theory.
    • 13:35And what this does is
    • 13:37it
    • 13:39tries
    • 13:41to distill down the various
    • 13:43sources of error that could
    • 13:45be contributing to our overall
    • 13:46reliability
    • 13:48and
    • 13:49figure out the relative
    • 13:51contributions of each. So within
    • 13:53this
    • 13:53framework of this analysis,
    • 13:56we see that there are
    • 13:57effects,
    • 13:58otherwise known as facets.
    • 14:00These are the potential sources
    • 14:02of error
    • 14:04as we're performing our assessment.
    • 14:05So we see things on
    • 14:06there like the learner,
    • 14:08the rater.
    • 14:09The syndrome refers to,
    • 14:12within our EPA,
    • 14:14learners are evaluating the
    • 14:16dyspneic patient, the patient with
    • 14:17abdominal distension, the patient with
    • 14:19hypotension. So various syndromes that
    • 14:21they're evaluating.
    • 14:23And there are
    • 14:24interactions between these things as
    • 14:26well. So there are interactions
    • 14:27between the learner and the
    • 14:29rater, the learner and the
    • 14:29syndrome, the rater and the syndrome,
    • 14:31and on and on and
    • 14:33on. And the idea is
    • 14:34to try and get to
    • 14:35how much each of
    • 14:36these is contributing to the overall
    • 14:39error. And
    • 14:41we call that the percent
    • 14:42variance. So if we think
    • 14:44about it,
    • 14:44there is
    • 14:46an absolute number that is
    • 14:48that error, and within that
    • 14:49absolute number, there are contributions
    • 14:51from each one of these
    • 14:52things. How much does each
    • 14:53of these contribute to that
    • 14:54error number? And it also
    • 14:56gives us a measure of
    • 14:58reliability, and we're gonna circle
    • 15:00back to this,
    • 15:01because one of
    • 15:03the powers of this
    • 15:05assessment technique is it allows
    • 15:06us to do what's called
    • 15:07the decision study,
    • 15:09where we can estimate
    • 15:11how many observations or how
    • 15:13many raters we need
    • 15:14to achieve a certain level
    • 15:15of reliability,
    • 15:17which really helps us to
    • 15:18optimize our processes
    • 15:20of assessment moving forward.
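
The idea of "percent variance" can be illustrated with a few lines of Python. This is a minimal sketch, not the study's analysis: the function name `percent_variance` and the variance component values are hypothetical and made up for illustration, and estimating the components themselves (the hard part of a generalizability study) is not shown.

```python
def percent_variance(components):
    """Convert variance components into each facet's share of total variance (percent)."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical variance components, for illustration only (not the study's estimates).
example_components = {
    "learner": 0.30,
    "rater": 0.20,
    "syndrome": 0.05,
    "residual_and_interactions": 0.45,
}

shares = percent_variance(example_components)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Each share answers the question posed above: out of all the variation in observed scores, how much traces back to this one facet.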
    • 15:22So we're just gonna take
    • 15:22a quick peek
    • 15:24at each of these and
    • 15:25talk briefly about some of
    • 15:26the main effects.
    • 15:29So we looked at learner
    • 15:30variance. And for medical education
    • 15:33studies, what you really wanna
    • 15:34see is that the learner
    • 15:35variance is high. You want
    • 15:38the variance to be attributable to differences
    • 15:41between learners:
    • 15:42different skill sets, different degrees
    • 15:45of competence.
    • 15:47A high
    • 15:48learner variance tells you
    • 15:50that you are able to
    • 15:51accurately
    • 15:52discriminate between differences in competency
    • 15:55between your learners.
    • 15:57And one of the things
    • 15:58I had to kind of
    • 15:59wrap my head around was,
    • 16:00well, what is
    • 16:01high?
    • 16:03You know, this number, twenty
    • 16:04seven point seven,
    • 16:06felt low when it came
    • 16:07out. As it turns out,
    • 16:08that's actually
    • 16:09quite a robust number for
    • 16:10this type of study. And
    • 16:12so when we're looking at
    • 16:13numbers above twenty five percent,
    • 16:16that's actually considered
    • 16:17quite good
    • 16:19for a medical education reliability
    • 16:22study. So we're quite
    • 16:23pleased with our learner variance.
    • 16:29We looked at rater variance.
    • 16:31So this is the idea
    • 16:32of
    • 16:34how
    • 16:35much of that error term
    • 16:36is attributed to just differences
    • 16:37in how the raters are
    • 16:38scoring.
    • 16:39And that could be, as
    • 16:41we know, some of
    • 16:42us are very strict when
    • 16:43we evaluate our learners. Some
    • 16:45of us are very lenient
    • 16:47when we evaluate our learners.
    • 16:49There's also the element that
    • 16:50we're using EPAs, and
    • 16:52that's a newer way
    • 16:54of assessment. And so,
    • 16:56you know, how well did
    • 16:57our raters understand this tool
    • 16:59that we're using?
    • 17:01We trained them. We
    • 17:02would hope that they would
    • 17:02understand it well,
    • 17:04but did they?
    • 17:05Ideally, we want this portion
    • 17:07of the variance to be
    • 17:08quite small. We don't want
    • 17:10a large portion
    • 17:12of the error being attributed
    • 17:14to the raters. And for
    • 17:15us, the number was sixteen
    • 17:17point five percent.
    • 17:19And,
    • 17:20boy, I was happy because
    • 17:21that seemed really low. But
    • 17:23as it turns out, sixteen
    • 17:25point five is not
    • 17:26high or low. It's right
    • 17:27in the middle. I would
    • 17:28call it a modest contribution
    • 17:30to
    • 17:31the error value.
    • 17:33And what's nice about this
    • 17:34is it really points us
    • 17:35in a direction to say,
    • 17:37you know, where can we
    • 17:38improve in our assessment methodology,
    • 17:40and gives us a target
    • 17:42for that, perhaps more training
    • 17:43of our raters.
    • 17:45Carrie, did you have a
    • 17:45question?
    • 17:46In this data set... Yeah.
    • 17:49How many
    • 17:50raters did each
    • 17:51learner have? Yeah. It's a
    • 17:53good question. There was a
    • 17:54range. There were six hundred
    • 17:56and four assessments that were
    • 17:58done by,
    • 17:59I think our final number
    • 18:00was, fifteen
    • 18:02different
    • 18:03raters.
    • 18:04And there was variability in
    • 18:06terms of
    • 18:07how many
    • 18:08assessments were done
    • 18:10by each rater. I don't
    • 18:11have off the top of
    • 18:11my head what the average
    • 18:12number of
    • 18:14assessments per rater was.
    • 18:16But the analysis
    • 18:19factors
    • 18:20that in. How? I'd
    • 18:22have to ask.
    • 18:23I don't agree with her
    • 18:24to to get into go
    • 18:25into the depths with them.
    • 18:27Like, if I No. No.
    • 18:29Each learner has
    • 18:31encounters with multiple raters.
    • 18:33Yeah. Yeah.
    • 18:39And then the
    • 18:40last of the effects that
    • 18:41I'll highlight is
    • 18:43case variance.
    • 18:44And this is really looking
    • 18:45at how much of the
    • 18:46variance is due to differences
    • 18:48in case
    • 18:50variability or case difficulty.
    • 18:52And, ideally, you want this
    • 18:53to be quite
    • 18:55low.
    • 18:57That number
    • 18:59of one percent looks low
    • 19:00and is low, so
    • 19:01we were actually quite happy
    • 19:02with
    • 19:03our case variance.
    • 19:05To be honest,
    • 19:05I was a bit surprised
    • 19:07because there's such a range
    • 19:08of different clinical syndromes that
    • 19:10the
    • 19:12residents
    • 19:13were seeing. I have some
    • 19:14theories around why it might
    • 19:16be low,
    • 19:17such as:
    • 19:18really the
    • 19:20difficulty is in the
    • 19:21use of the ultrasound, not
    • 19:23in the approach
    • 19:24to the patient. The
    • 19:25residents have a certain skill
    • 19:26level with the patients.
    • 19:28The new skill is the
    • 19:29ultrasound, and so residents of
    • 19:30a certain level of competence
    • 19:32with ultrasound are gonna score
    • 19:34the same regardless
    • 19:36of the patient that's
    • 19:37in front of them. And
    • 19:38that's
    • 19:40my assessment of why that
    • 19:41number is so low.
    • 19:45As I mentioned before, one
    • 19:46of the powerful parts
    • 19:48of the generalizability
    • 19:50theory analysis is that it
    • 19:52can
    • 19:53lead to what's called a
    • 19:54decision study. And a decision study
    • 19:56allows us to predict
    • 19:59the
    • 20:00reliability
    • 20:01of the assessments
    • 20:02for varying levels of effects
    • 20:04or facets. And so
    • 20:05in this hypothetical
    • 20:08dataset here,
    • 20:10we can say, how much
    • 20:12does the reliability
    • 20:15estimate change if we keep
    • 20:16the number of raters the
    • 20:18same
    • 20:19but (this is
    • 20:21an OSCE) we increase
    • 20:22the number of stations within
    • 20:24the OSCE? And we see
    • 20:25that by increasing the number
    • 20:27of stations, you actually get
    • 20:28a nice jump in
    • 20:30your reliability. And our thresholds
    • 20:32for reliability
    • 20:33here:
    • 20:34for most clinical
    • 20:37items, you want a reliability
    • 20:38of point seven or point
    • 20:39eight. And so by
    • 20:40increasing the number of stations,
    • 20:41we're able to,
    • 20:43these authors were able
    • 20:44to, get the reliability up
    • 20:45to over point eight.
    • 20:48You might ask the question,
    • 20:49well, what happens if we
    • 20:50increase the number of raters
    • 20:51instead of increasing the number
    • 20:52of stations? Can we improve
    • 20:53our reliability that way?
    • 20:55And going from two raters
    • 20:57to eight raters really didn't
    • 20:58make a meaningful impact on
    • 21:00reliability. And so you
    • 21:01can take this and you
    • 21:02can say, alright. Well, if
    • 21:03we're designing an assessment
    • 21:04tool and assessment process, really,
    • 21:06we wanna put our focus
    • 21:08on
    • 21:09the number of observations or
    • 21:10the number of stations.
    • 21:11And so that's just an
    • 21:12example of sort of how
    • 21:14a decision study can
    • 21:15be utilized. Yeah.
    • 21:17About that.
    • 21:18That would suggest to me
    • 21:19that the variability is largely
    • 21:21within a rater rather than across
    • 21:23raters.
    • 21:24Is that correct?
    • 21:29In fact, it doesn't... I
    • 21:31mean, if you think
    • 21:32there's a lot of
    • 21:33variability among raters. Yeah.
    • 21:35Some are really conservative. Right.
    • 21:38Then you'd expect
    • 21:39increasing the number of raters
    • 21:41would have a substantial effect,
    • 21:42would have an impact. Averaging
    • 21:43of that. Yeah. So I
    • 21:45would agree with you. I
    • 21:45would say that in
    • 21:47this particular (this isn't my
    • 21:48data, this is a hypothetical
    • 21:49dataset)
    • 21:50case,
    • 21:51there
    • 21:52probably wasn't a lot of
    • 21:53variability amongst the raters, and
    • 21:54so adding more raters
    • 21:56didn't make a
    • 21:57difference in terms of reliability.
    • 21:59Well, but
    • 22:00is this consistent with the
    • 22:01numbers you showed us before
    • 22:02for the percentage of variability that
    • 22:03was attributable to the raters?
    • 22:05No. And so I'll
    • 22:06show you what it looked
    • 22:07like for our data.
    • 22:09This
    • 22:09is just a hypothetical, just
    • 22:11to make the point of
    • 22:12what decision
    • 22:13studies can do
    • 22:14if we change
    • 22:16the different elements.
    • 22:20I just got a text.
    • 22:21Please repeat the question. In
    • 22:23general
    • 22:24Oh, okay.
    • 22:25I think the microphone's not
    • 22:26working. Just when you get
    • 22:28a question, just repeat it
    • 22:30so online people can hear
    • 22:31it. Okay. We'll
    • 22:33do that moving forward.
    • 22:36So
    • 22:37this is
    • 22:39our data.
    • 22:41And this is the final
    • 22:42product
    • 22:43of our data, meaning
    • 22:45this
    • 22:46is the data that
    • 22:48seemed to improve
    • 22:50reliability
    • 22:52best.
    • 22:53And
    • 22:54what it came
    • 22:55out with is that really
    • 22:57the number of observations
    • 22:58made the biggest impact on
    • 23:01moving our
    • 23:03reliability
    • 23:05curve
    • 23:06towards that point eight. We
    • 23:07chose the higher value, point
    • 23:09eight rather than point seven,
    • 23:10as our cutoff.
    • 23:12And what it looked like
    • 23:13is, you
    • 23:15know, given the parameters, keeping
    • 23:16everything else stable, and just
    • 23:17changing the number of observations,
    • 23:21getting our observations up to
    • 23:22about ten gives us a
    • 23:25reliability
    • 23:26at a level of point eight
    • 23:27for understanding
    • 23:28where our learners are with
    • 23:30their POCUS competency. Doesn't mean
    • 23:32ten observations and your learner
    • 23:34is competent in POCUS. It
    • 23:36means after ten observations,
    • 23:38I can reliably
    • 23:39understand
    • 23:40what level they are
    • 23:42at.
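
The logic of a decision study can be sketched in a few lines of Python. This is a rough illustration under a strong simplifying assumption that is not the study's model: every observation is treated as involving a different rater and case, so all non-learner variance counts as error and is divided by the number of observations being averaged. The function name `projected_reliability` is hypothetical; the only number taken from the talk is the learner share of 27.7 percent.

```python
def projected_reliability(learner_pct, n_observations):
    """Project reliability when a learner's scores are averaged over n observations.

    Simplifying assumption (not the study's model): each observation has its own
    rater and case, so all non-learner variance is error that shrinks by 1/n.
    """
    if n_observations < 1:
        raise ValueError("need at least one observation")
    error_pct = 100.0 - learner_pct
    return learner_pct / (learner_pct + error_pct / n_observations)


LEARNER_PCT = 27.7  # learner share of variance quoted in the talk

for n in (1, 2, 5, 10, 15):
    print(n, round(projected_reliability(LEARNER_PCT, n), 2))
```

Under this rough assumption the projection sits just under 0.8 at ten observations and crosses it at eleven, which is in the same range as the figure quoted above. The published analysis presumably used a fuller model that separates rater, syndrome and interaction terms.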
    • 23:44So that's quite useful
    • 23:46for helping us to
    • 23:47understand sort of
    • 23:48next steps.
    • 23:50The limitations
    • 23:51of our work: I
    • 23:53think the main limitation is that we
    • 23:54did it across three large
    • 23:56academic hospitals. So
    • 23:58it was us, it was
    • 23:59MGH, and it was OHSU
    • 24:01that were part of it.
    • 24:03Most of the observations came
    • 24:04from us here at Yale.
    • 24:07I
    • 24:08think that
    • 24:10influences the generalizability
    • 24:11of what
    • 24:13we're putting out there.
    • 24:14You know, how would a
    • 24:15tool like this work at
    • 24:16a smaller program, someplace that
    • 24:18does not have such
    • 24:19robust point of care ultrasound
    • 24:22expertise?
    • 24:24Unclear.
    • 24:25And it's mostly in an
    • 24:26inpatient setting. How does this
    • 24:27translate to an outpatient setting,
    • 24:29where ultrasound is also being
    • 24:31used? Also unclear.
    • 24:34So conclusions and next
    • 24:36steps:
    • 24:37you
    • 24:38know, within the
    • 24:40study, we were able to
    • 24:41generate validity
    • 24:43and feasibility
    • 24:44evidence to support
    • 24:45what is a
    • 24:47very novel
    • 24:48approach to looking at point
    • 24:49of care ultrasound
    • 24:51competency.
    • 24:52We
    • 24:53need to put more time,
    • 24:54I think, into rater training,
    • 24:56to make sure that raters
    • 24:58are being consistent
    • 25:00in their assessments
    • 25:01of the learners, which
    • 25:03probably means
    • 25:04both
    • 25:05reorienting them to EPAs and
    • 25:07making sure they feel comfortable
    • 25:08with that, and probably
    • 25:10doing some calibration training to
    • 25:12make sure that my level
    • 25:13three is the same as
    • 25:14your level three, etcetera.
    • 25:17When you find the outlier
    • 25:18using these data, can you
    • 25:19find the people,
    • 25:21find the raters who are
    • 25:22giving everybody... Yeah.
    • 25:24We probably can. Yeah. We
    • 25:26could probably jump in
    • 25:27and figure out, like,
    • 25:28pinpoint who
    • 25:29really needs the
    • 25:31help.
    • 25:32But I guess it's, you
    • 25:33know, it's challenging because
    • 25:34you always gotta say, like,
    • 25:35who's the
    • 25:36standard, I guess, that you
    • 25:37would compare to. So
    • 25:40maybe it's me. Maybe
    • 25:41I'm too lenient or too
    • 25:42strict. I don't know. So
    • 25:44it'd be interesting to
    • 25:45think about. That might be
    • 25:45another study that we
    • 25:47look at. We'll group the
    • 25:48standard. I mean, basically, what
    • 25:50you do is predict
    • 25:52the score based on the
    • 25:53rater identification.
    • 25:54Interesting.
    • 25:55People who have higher than
    • 25:57average scores, you can do
    • 25:58that. People who have lower
    • 25:59than average scores, you can
    • 26:00do that. Yeah. Nice. So
    • 26:02the question for
    • 26:03the Zoom room
    • 26:06was about using the data
    • 26:08to
    • 26:09predict who are
    • 26:10the more lenient or the
    • 26:11more strict
    • 26:12raters.
    • 26:14And Dr. Justice was just
    • 26:16giving some tips on
    • 26:17how we might design
    • 26:18that.
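
The suggestion above, predicting the score from the rater's identity, can be sketched in its simplest form. With rater as the only predictor, the predicted score for each rater is just that rater's mean score, so a rough leniency estimate is the rater's mean minus the overall mean. This is a minimal sketch under stated assumptions, not the study's analysis: the function name `rater_leniency`, the rater labels, and the scores are all hypothetical, and the calculation does not adjust for differences in the learners each rater observed.

```python
from collections import defaultdict
from statistics import mean


def rater_leniency(ratings):
    """Return {rater_id: rater mean score minus overall mean score}.

    `ratings` is an iterable of (rater_id, score) pairs. Positive values
    suggest a more lenient rater, negative values a stricter one.
    """
    scores_by_rater = defaultdict(list)
    for rater_id, score in ratings:
        scores_by_rater[rater_id].append(score)

    # Overall mean across every observation, used as the reference point.
    all_scores = [s for scores in scores_by_rater.values() for s in scores]
    overall_mean = mean(all_scores)

    return {
        rater_id: mean(scores) - overall_mean
        for rater_id, scores in scores_by_rater.items()
    }


# Hypothetical entrustment-level ratings from three raters.
ratings = [
    ("A", 4), ("A", 5), ("A", 4),
    ("B", 2), ("B", 3), ("B", 2),
    ("C", 3), ("C", 4), ("C", 3),
]

effects = rater_leniency(ratings)

# List raters from strictest to most lenient.
for rater_id, effect in sorted(effects.items(), key=lambda item: item[1]):
    print(f"{rater_id}: {effect:+.2f}")
```

Here the group average serves as the standard, which avoids having to pick one rater as the reference. A fuller model would also account for which learners each rater assessed, since a rater who only sees strong learners will look lenient under this calculation.
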
    • 26:21I think one of the
    • 26:22things that I'm interested
    • 26:23in thinking about is, you
    • 26:24know, particularly as we're working
    • 26:25on this privileging process at
    • 26:27the hospital for point of
    • 26:28care ultrasound, is thinking
    • 26:30about using this as a
    • 26:31tool for more summative-level
    • 26:33decision making
    • 26:35around
    • 26:36the privileging
    • 26:37process
    • 26:38here at Yale.
    • 26:43Some
    • 26:44thank-yous. So,
    • 26:46Janet and John, I did
    • 26:47this work as part of
    • 26:48my master's of health
    • 26:50science; Donna in the department,
    • 26:52for making the grant
    • 26:54available;
    • 26:56David and Jeanette, just
    • 26:58master
    • 26:59mentors, really encouraging
    • 27:02and
    • 27:02facilitating my interaction with
    • 27:05Haidong Liu, which is
    • 27:06really what made this
    • 27:07project move forward;
    • 27:10and then the team of
    • 27:12researchers
    • 27:13that I was able to
    • 27:14work with.
    • 27:15Alright. That's it. Matt.
    • 27:17Yeah. That's really great. Thank
    • 27:18you.
    • 27:20So I had a couple
    • 27:21questions. One was,
    • 27:23are there other at the
    • 27:24hospital level, in terms of
    • 27:25privileging,
    • 27:27is there anything analogous to
    • 27:28this sort of
    • 27:30level of really assessing like,
    • 27:32a heart transplant is probably
    • 27:33more competency assessment than a
    • 27:35heart transplant,
    • 27:37for surgeon, I would think.
    • 27:39Yeah. So question
    • 27:41one was, is there
    • 27:42something comparable to this
    • 27:44type of assessment in
    • 27:46other areas of privileging?
    • 27:48And then do you have
    • 27:49second question too? Second one
    • 27:51was, I know it's very
    • 27:52different, but was there
    • 27:53anything useful
    • 27:54in the radiology world
    • 27:57in terms of how competency
    • 27:59is assessed
    • 28:00for either technicians or ultrasonographer
    • 28:03radiologists?
    • 28:04Yeah. And then the second
    • 28:05question was, is there
    • 28:06anything comparable in the
    • 28:07radiology world?
    • 28:09So the first question,
    • 28:11there's nothing comparable that I'm
    • 28:12aware of within privileging.
    • 28:15You know, for
    • 28:16example,
    • 28:18privileging,
    • 28:19you know,
    • 28:20if you're a heart surgeon,
    • 28:21to do a heart transplant
    • 28:22is really kind of number
    • 28:23of cases that you've done
    • 28:25in graduating from a, you
    • 28:26know, an accredited program, or
    • 28:28you did your fellowship in
    • 28:30x, y,
    • 28:31or z.
    • 28:34A lot of training works
    • 28:35that way. I think it's
    • 28:36probably more comparable to sort
    • 28:38of how we
    • 28:40privilege around procedures, where
    • 28:42it's like you have to
    • 28:43do, you know, five
    • 28:45central lines, and then
    • 28:46you're magically competent in
    • 28:49that,
    • 28:50which creates a real
    • 28:51problem. So, you know, a
    • 28:52lot of hospital systems
    • 28:54use, like, a number based
    • 28:56algorithm for deciding who's privileged
    • 28:57or not. So you've done
    • 28:59fifty cardiac studies. Now you're
    • 29:01privileged. But the number
    • 29:04definitely does not tell the
    • 29:05story.
    • 29:06I work with trainees.
    • 29:08Some have done fifty cardiac
    • 29:09studies, and they're great. And
    • 29:11I work with others that
    • 29:12have done fifty, and they
    • 29:13really still stink. And so
    • 29:14the number...
    • 29:15but there's always a feasibility
    • 29:16element, you know, for the,
    • 29:17you know, like, the credentialing
    • 29:19committee, where they have
    • 29:20to say, like,
    • 29:22you know, if it gets
    • 29:22too complicated,
    • 29:23it gets unmanageable
    • 29:25for them to do. So
    • 29:26numbers make it very simple.
    • 29:27Alright? I can check
    • 29:28the box. They've done x
    • 29:30number and therefore you're
    • 29:31privileged,
    • 29:33which may work for
    • 29:34privileging. I think if
    • 29:36we want a true assessment
    • 29:37of competency though, we
    • 29:39have to take a more
    • 29:40holistic
    • 29:41way of looking at
    • 29:42that. And then,
    • 29:45nothing
    • 29:46from the radiology
    • 29:47world.
    • 29:49I think also because
    • 29:51you finish your residency
    • 29:53in radiology and
    • 29:55then you are privileged
    • 29:56to be a radiologist.
    • 29:57And so I don't
    • 29:59know that they're necessarily
    • 30:00faced with this
    • 30:02problem.
    • 30:03And they have a whole
    • 30:04residency
    • 30:05to learn this stuff,
    • 30:06whereas we're trying to say
    • 30:07how quickly can I get
    • 30:08somebody from, you know, being
    • 30:10a novice to an expert
    • 30:11so they can start using
    • 30:12this in clinical practice?
    • 30:15I'll be mindful of the
    • 30:16fact that you mentioned you
    • 30:17had
    • 30:19an obligation. So,
    • 30:21Thank you.
    • 30:22If you have questions, follow-up
    • 30:23that. I'm gonna
    • 30:25maybe I should tell people.
    • 30:26Oh, great. Yeah. So,
    • 30:28if there are more
    • 30:29questions,
    • 30:31please feel free to
    • 30:32email me.
    • 30:33Happy to answer things
    • 30:35over email as well.
    • 30:36I will. Great. Thanks.
    • 30:42Great testaments to finding the
    • 30:44interest.