4-29-25 Workshop Session 2
June 04, 2025
Information
- ID: 13191
- To Cite: DCA Citation Guide
Transcript
- 00:01 So in the previous session, I think, we mainly focused on showing you how to access the data and how to request computing resources. Right? In the next session, what we will do is more about technical details: how you can call existing large language models in the CHP safe environment, or, if you want to build your own customized large language model by leveraging an existing model and using your own data, how you can train that model in the environment. So that's what will happen in the next session.
- 00:44 And I will provide a very brief overview of large language models, and then a number of speakers will jump in to show the different tools that are available in the CHP safe environment, which you can use. Okay? For example, if you want to annotate data, we have a tool. If you want to train a model or call an existing API, how to call it, and how to use your own data to train the model. So that's what will happen next.
- 01:13 Let me just do a quick introduction to large language models. Because we are running out of time, I'm trying to shrink and reduce my session.
- 01:32 I think I'm going to skip this. I was planning to give a short history of AI. Basically, what I'm trying to say is that this is not just starting now; we have been doing this for a long time. But this wave of generative AI is slightly different from the previous ones in a few ways: how the models are trained, that they are much bigger than previous models, that they focus on generation tasks rather than trying to do prediction or analyze the data, and that they rely heavily on GPUs.
- 02:04 And one thing I really want to mention, because this is a large language model session: I'll just give a brief history of language models. Language models have been around for a while, starting in the 1960s, basically as probabilistic models that, given a sequence of words, try to predict the next word. Okay. Then later on, neural language models showed very good performance, but they suffered from computational efficiency and related issues, so they didn't really scale up. Until later, in 2017, the transformer model was proposed, and together with the abundance of GPUs available, it became possible to actually train neural language models with a lot of data. Then we moved to these pretrained language models.
- 02:49 Okay? At that time, the BERT model, which you have probably heard a lot about too, right, showed good performance when you pretrain on a reasonable amount of text. But then, in 2022, we really moved to what we call large language models. Basically, a large language model is a transformer-based pretrained language model, but trained on a lot more data. And the rationale behind this is the emergent phenomenon of large language models: people found that when you train on a lot of data, the model can not just do one thing with reasonable performance; it can actually do a lot of different tasks, all with reasonable performance. That's what we call the emergent phenomenon of the large language model. Suddenly, it becomes very smart.
- 03:38 And that led to a lot of development of all those large language models: open-source models like LLaMA and DeepSeek, which you have probably heard of, which give you all the weights, so you can actually use your own data to continue pretraining or to fine-tune. Okay? And commercial models like GPT used to be closed models, so you cannot really fine-tune them. But now GPT also has a service: you can upload data, do some kind of fine-tuning on their side, and then host the fine-tuned model on the GPT side. You still have to pay every time you call it. Okay? And then there are also different architectures, encoder versus decoder. And the trend is also that we are moving more towards multimodal large language models: instead of just a text-based model, text plus images, text plus other genomic data, and all those things. A lot of things are going on.
- 04:29 And in particular, in the NLP world, especially in the biomedical NLP world, we often focus on one NLP task called information extraction. So the idea is that in clinical data there is a lot of unstructured text; for example, a text document with a lot of details. The task of information extraction is: okay, given this document, can you extract all the disease information about the patient out of this document? I would say this accounts for about seventy to eighty percent of the requirements for a lot of EHR-based analysis, both for clinical practice and for clinical research.
- 05:16 So today, almost all the work we show here is on this information extraction task, and it can be further divided into three subtasks. The first one is called named entity recognition (NER). So the idea is that, given a document, the system needs to recognize that "MRI of the abdomen" is a test. Okay? You need to know both the type, that it is a test, and the boundary. And you need to know that "June 18, 2008" is a temporal expression: what type of entity it is and what its boundary is.
- 05:51 That's the NER task. The second one we call relation extraction. So what do you need to know? You want to know that this "June 18, 2008" is a modifier of the MRI. Right? That's a relation between those two entities. So it's very important for you to recognize the context of that clinical entity.
- 06:11 And the third one is called concept normalization. So if you read the note, what you see here is "renal cell carcinoma". But if you want to build a clinical decision support system, this entity needs to be coded to a concept in a medical terminology. It could be ICD-10; it could be SNOMED. Right? So you want to normalize this detected entity to a term in the standard vocabulary. And as you can see, it's actually not straightforward, because you see "renal cell carcinoma", but the term in the terminology is actually "malignant neoplasm of kidney". So you need this kind of mapping. Okay? So today, most of our work will show how we do those three tasks and build systems to extract information out of the text.
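The mapping step described above can be sketched as a simple dictionary lookup. Real systems use full terminologies and fuzzy matching; the codes and synonym table below are illustrative placeholders, not real SNOMED or ICD-10 content.

```python
# Minimal sketch of concept normalization as a dictionary lookup.
# Codes and synonyms below are illustrative placeholders only.

# Surface form (lowercased) -> (placeholder code, preferred term)
CONCEPT_TABLE = {
    "renal cell carcinoma": ("D001", "Malignant neoplasm of kidney"),
    "kidney cancer": ("D001", "Malignant neoplasm of kidney"),
}

def normalize(entity: str):
    """Map a detected entity string to a standard term, or None."""
    return CONCEPT_TABLE.get(entity.strip().lower())

print(normalize("Renal cell carcinoma"))
# ('D001', 'Malignant neoplasm of kidney')
```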
- 07:01 I will primarily talk about three different approaches. I'll skip those; I have a few slides about the history.
- 07:10 Around 2000 or so, we mainly worked on rule-based systems: you have a dictionary, and you try to look up all the diseases from the dictionary. Okay? Then around 2010, we annotated corpora, and we started to do machine learning. So what happens is, if you have "packed red blood cells" as an entity, the beginning of the entity is labeled B, intermediate tokens of the entity are labeled I, and all tokens outside an entity are labeled O. You then convert this to a sequence labeling task: you label each word as B, I, or O, so it becomes a machine learning task to learn. Okay?
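The B/I/O labeling scheme just described can be sketched in a few lines of Python (the whitespace tokenization and single-entity setup are simplifying assumptions):

```python
# Sketch of BIO sequence labeling: given a token list and a known entity
# (as a token sequence), label the entity's first token B, the rest I,
# and everything else O.

def bio_tags(tokens, entity_tokens):
    """Assign B/I/O labels for one entity occurring in the token list."""
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            tags[start] = "B"                      # beginning of entity
            for i in range(start + 1, start + n):  # intermediate tokens
                tags[i] = "I"
            break
    return tags

tokens = ["Transfused", "one", "pack", "of", "red", "blood", "cells", "today"]
entity = ["pack", "of", "red", "blood", "cells"]
print(bio_tags(tokens, entity))
# ['O', 'O', 'B', 'I', 'I', 'I', 'I', 'O']
```

A sequence model is then trained to predict these per-token labels directly from the text.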
- 07:48 So that's what we were doing around that time. And then around 2020, it was more about deep learning, which I think many of you have heard about. At that time, we were looking at contextual embeddings like the BERT model. So we actually fine-tuned BERT models from the open domain with clinical data and showed the performance. This is just a summary: all you want to know is that, moving from rule-based to machine learning to deep learning, the performance kept getting better.
- 08:17 And now we move to large language models: how you can use a large language model to do this information extraction task. I'll give you three examples, three different approaches we have worked on. I think, potentially, if you're going to work on your own task, those are the three approaches you may take.
- 08:39 The first one you probably all know: you have GPT over there, and all you need to do is write a prompt. Right? Give it the document and tell GPT what you want. So that's what we did as a first experiment here. Basically, we gave GPT-3.5 and GPT-4, at that time, the instruction that we want to extract medical problems, treatments, and tests out of clinical notes.
- 09:04 So the main exercise here is really about the prompt. So we actually tried different strategies for the prompt: you define the task, you define the output, and you also need to tell the model the definition of a medical problem. And then you can also give a guideline. Earlier I showed you what a boundary is, so you may say that the entity has to be a noun phrase; you give that kind of guideline. Then you can also give additional examples: here's the sentence, here's the entity I want to extract. This is what is called few-shot learning; it gives a few examples, say three, hence "few-shot". Right? And we tested all of those: we made a framework for the prompts and evaluated on the annotated corpus.
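The pieces just listed (task definition, output format, entity definition, guideline, and few-shot examples) can be assembled into a single prompt string. A sketch with illustrative wording, not the exact prompts used in the study:

```python
# Sketch of assembling a few-shot NER prompt from the components described
# above. All wording is illustrative; a real study would tune each part.

def build_prompt(note, examples):
    parts = [
        "Task: extract medical problems, treatments, and tests from the clinical note.",
        "Output: one entity per line, formatted as <type> | <text span>.",
        "Definitions: a medical problem is a disease, symptom, or abnormal finding.",
        "Guideline: each extracted span must be a noun phrase.",
    ]
    for sentence, entities in examples:  # few-shot demonstrations
        parts.append(f"Example sentence: {sentence}")
        parts += [f"{etype} | {span}" for etype, span in entities]
    parts.append(f"Clinical note: {note}")
    return "\n".join(parts)

prompt = build_prompt(
    "MRI of the abdomen was performed on June 18, 2008.",
    [("Patient denies chest pain.", [("problem", "chest pain")])],
)
print(prompt)
```

The resulting string is what gets sent to the model; zero-shot is the same prompt with the examples list left empty.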
- 09:49 And then we showed that, actually, if you have a lot of annotated data, the BERT model, the previous deep-learning-based approach, still works better. The zero-shot performance of GPT is not as good as the BERT model if you have a lot of annotated data. So that's what we found at that time. But it's actually close, because the GPT-4 model can reach about 86, versus about 90 for the BERT model trained on hundreds of samples, in relaxed matching. Relaxed matching means the entity you predicted and the entity you annotated overlap but are not exactly the same. Okay.
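Exact versus relaxed matching can be made concrete with character-offset spans; a minimal sketch (the offsets are made up for illustration):

```python
# Sketch of exact vs relaxed span matching for NER evaluation.
# Spans are (start, end) character offsets; relaxed matching counts any
# overlap between a predicted span and a gold span as a hit.

def exact_match(pred, gold):
    return pred == gold

def relaxed_match(pred, gold):
    (ps, pe), (gs, ge) = pred, gold
    return ps < ge and gs < pe  # half-open intervals overlap

gold = (10, 30)   # annotated span, e.g. "MRI of the abdomen"
pred = (10, 13)   # predicted span, e.g. just "MRI"

print(exact_match(pred, gold))    # False
print(relaxed_match(pred, gold))  # True
```

Precision, recall, and F1 can then be computed under either matching rule, which is why papers often report both.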
- 10:27 And I'll skip this one. Then I'll skip those two.
- 10:32 The second exercise we did: okay, later, LLaMA came out. You have all the weights, like I said, so you can actually fine-tune those using your additional data. That's what we did here. We were working on the same task, extracting medical problems, treatments, and tests, but now we have the open-source LLaMA model, and we actually used annotated data from a local corpus to fine-tune the LLaMA model for this task.
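Fine-tuning like this typically starts by converting each annotated note into an instruction-response record. A minimal sketch; the `instruction`/`input`/`output` field names are a common convention assumed here, not necessarily the exact format used in this work:

```python
# Sketch of converting one annotated NER example into an instruction-tuning
# record. Field names follow a common instruction-dataset convention.
import json

def to_instruction_record(text, entities):
    """entities: list of (type, span) pairs annotated in the text."""
    return {
        "instruction": "Extract medical problems, treatments, and tests "
                       "from the clinical text. List one entity per line "
                       "as <type> | <span>.",
        "input": text,
        "output": "\n".join(f"{etype} | {span}" for etype, span in entities),
    }

record = to_instruction_record(
    "MRI of the abdomen was done on June 18, 2008 for flank pain.",
    [("test", "MRI of the abdomen"), ("problem", "flank pain")],
)
print(json.dumps(record, indent=2))
```

A corpus of such records is then used to fine-tune the model so it learns to emit the structured output for unseen notes.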
- 11:00 And this is what we did through the instruction tuning approach. I think we are going to talk more about this later, so I'm not going to repeat it. Basically, you convert the annotated data, which I showed, to an instruction dataset, and then you fine-tune the LLaMA model to change the weights for this specific task.
- 11:21 And what we found here is that, when you have a lot of annotated data, the large language model, LLaMA 3, is actually almost the same as the BERT model, just slightly better. This comparison is more fair, because both models used those hundreds of annotated samples for training. Okay? But then the last dataset is an unseen dataset for both models, and there you can see that the LLaMA model actually has much better performance than the BERT model.
- 11:55 This indicates the LLaMA model is actually more generalizable, because it has almost an eight percent improvement: the BERT model is around 79, versus 87 here. So now we have started to actually build this as a LLaMA-based information extraction system. But one thing I want to point out: at least when we tested LLaMA 3, speed is an issue. The BERT model takes us 0.2 seconds to do named entity recognition for one document; LLaMA 3 took 39 seconds. So if you're processing millions of notes, that's another concern. There are a lot of other issues, in addition to the performance, that you want to consider. That's what I want to bring up.
- 12:39 So, for the third approach: think about the first approach, prompting; you don't really need much GPU, right? It just costs money to call GPT. The second is fine-tuning: you do need a GPU machine to load the model and fine-tune it, and it may take a couple of hours to a couple of days. But this one we call continued pretraining of the LLaMA model. You use a lot of clinical data, like all the notes and all the literature; we combined about 129 billion tokens to continue pretraining the LLaMA model. It took 150 GPUs running for a month. That would be a lot of money if you went to Amazon.
- 13:23 As you can see, the computational cost for training this model is much bigger compared to the previous ones. But the benefit is that the model becomes more generalizable: it can work on multiple clinical NLP tasks. That's what we call the Me-LLaMA model. We trained it on LLaMA 2 and showed better performance on multiple tasks: not just entity recognition, but question answering, inference, and other tasks. I'm just going to stop here.
- 13:54 So, in summary, just to quickly talk about what we have learned so far. Basically, when you try to extract information out of notes using a large language model, you can still think about: do you really need a large language model? If the task is simple, I think sometimes even regular expressions or a rule-based approach still work. And also, if you already have a lot of annotated data, then a deep learning model like BERT still works well. Okay? And it also costs less in terms of computational effort.
- 14:26 Then, if you think a large language model does help for that specific task, you also want to discuss: should I train my own large language model based on an open-source model, or should I go with GPT? Right? Then there are a lot of concerns: in addition to performance, you also have to think about the cost, right, and the GPU requirements; do you have GPUs locally, and all those issues?
- 14:54 So in the next three to four presentations, we will basically talk about several things: tools available in the CHP safe environment which will allow you to do this kind of work. The first tool we will talk about is an annotation tool. A lot of people don't really pay much attention to annotation, but if you really look at all the model training, even in the era of large language models, you still need to do some annotation, even just for validation and evaluation. Then you need a tool to do that, and we have a tool installed on the CHP for that purpose. The second one: we will show you a tool we already fine-tuned and made available on the CHP, which you can just call; I think Nate also talked about the services. The third one really goes deep: I think Lingfue is going to talk about, if you have your own data to start with, how you can fine-tune that model with your own data on the CHP. So let them just start.
- 15:58 Do you wanna go ahead? Start with the annotation. Just watch the time. Maybe go just a little fast.
- 16:15 Hi, everyone. Today, I'm just gonna go through why we need annotation and then try to introduce our annotation tool, Blue. So, annotation is the process of labeling data: marking spans of text, images, or other content with additional information such as entity types, categories, or relationships, just as Dr. Xu mentioned before when showing the graph with entities annotated and also the relationships annotated between those entities. Annotation is critically important because it serves as the foundation for machine learning and deep learning models. These models heavily rely on annotated datasets to learn meaningful patterns and then to make accurate predictions. Annotation remains important in the era of large language models: although LLMs are highly capable, they still depend on annotated data for fine-tuning for specific tasks and for evaluation against the ground truth.
- 17:28 Here you can see a table that compares performance between multiple models, including the LLaMA 3 variants and also a fine-tuned LLaMA model, on an annotation task. As seen in the results, the fine-tuned model, which was trained on a well-annotated dataset, tends to perform better on the specific targeted project.
- 18:04 There are several key topics related to annotation. The process always begins with developing a clear and detailed annotation guideline. A well-developed guideline improves consistency among annotators and leads to higher annotation quality; it also speeds up the onboarding of new annotators and makes conflict resolution easier. Once the guideline is developed, the next step is to select annotators with appropriate domain knowledge and then train them thoroughly based on the guideline. After the training, it is important to continuously check and monitor the annotation quality. This includes checking agreement among annotators, holding discussions to resolve disagreements, and refining the guideline based on common errors or ambiguities identified during the process. And later, I will also introduce the annotation tool that can support and streamline the annotation workflow.
- 19:17 For annotation guideline development, the first step is always to define the goal of your project. Clearly state what you are trying to achieve with the project. Next, provide a clear definition for all the concepts, like entities, relations, or special terms. After that, develop detailed annotation rules that cover the majority of scenarios and edge cases, to minimize ambiguity. It is also essential to include many real-world examples in the guideline, illustrating both correct annotations and common errors. Guideline development is not a one-time effort; it is an iterative process. It is important to involve both domain experts and linguists or informaticians to ensure both technical accuracy and practical usability. After the initial guideline is created, it should be refined during the annotator training and tested on real-world data. Given the variability and complexity of real-world data, new scenarios will inevitably arise and may require further guideline updates. Once the guideline is stable and robust, the process can move to corpus finalization.
- 20:51 Here you can see an example of an annotation guideline. The goal of this guideline is to identify meaningful clinical concepts from patient medical records and to help extract information like tests, problems, drugs, and treatments. As shown on the left side, we provide a detailed definition to ensure the annotators understand what should be labeled. In this guideline, we also introduced modifiers: concepts that complement an entity and extend its meaning. For each modifier, such as severity and body location, the guideline also needs to be specific on how it should be annotated and what its relationship with the entities is. Additionally, the guideline needs to include many real-world examples, like the diagram shown at the bottom, to illustrate correct annotation practice. And for ambiguous phrases or tricky scenarios, examples also need to be provided to establish clear, consistent rules for annotators to follow.
- 22:12 It is important to choose annotators with a proper background for your task. Depending on the complexity, you might need to choose domain experts like physicians, nurses, or medical students, or just some laypersons for more general and broad annotation. Training for annotators is an iterative process: annotators should be trained and evaluated multiple times until they achieve the expected level of performance. Quality checking needs to be ongoing during the annotation process: regular review is always needed during their work, and you also need to provide feedback on time and sometimes additional retraining for the annotators.
- 23:07 When managing a project with multiple annotators, there are several important steps that must be taken to ensure quality. Before starting the actual annotation, train each annotator thoroughly to ensure they can produce consistent and reliable annotation results that align with the guideline you developed. If resources allow, implement a double annotation strategy: ideally, each sample should be annotated by two annotators independently, and then a third, more experienced annotator can review any discrepancies and make the final decision. This process helps to maintain a high quality of annotation. If double annotation for the entire dataset is not feasible, assign a small overlapping subset of the data to multiple annotators. This overlap allows you to calculate inter-annotator agreement and then provides a way to monitor and maintain annotation quality.
- 24:20 When checking the annotation quality for the NER task, we focus on two main areas. The first one is entity type agreement, and then entity span agreement. For entity type agreement, we verify whether annotators assigned the same type to an entity. You can see in this graph that one of them annotated "Vancomycin HCl" as a drug and another one annotated it as a treatment. This mismatch will need to be discussed during the annotation process and then corrected for the final step. And for entity span agreement, we check whether both annotators selected the same portion of text. In the same example, one labeled "a lot of emotional stress" as a problem and another one annotated just "emotional stress". When such a mismatch occurs, it is important to refer back to the guideline and determine which is the correct one to move forward.
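Both kinds of disagreement can be detected automatically by comparing the two annotators' span lists. A sketch, with character offsets and labels made up to mirror the slide's examples:

```python
# Sketch of comparing two annotators' NER output: find type disagreements
# (same span, different label) and span disagreements (same label, partial
# overlap). Each annotation is (start, end, label) on the same document.

def compare(a, b):
    type_conflicts, span_conflicts = [], []
    for (s1, e1, t1) in a:
        for (s2, e2, t2) in b:
            if (s1, e1) == (s2, e2) and t1 != t2:
                type_conflicts.append(((s1, e1), t1, t2))
            elif s1 < e2 and s2 < e1 and (s1, e1) != (s2, e2) and t1 == t2:
                span_conflicts.append(((s1, e1), (s2, e2), t1))
    return type_conflicts, span_conflicts

ann1 = [(0, 14, "drug"), (20, 45, "problem")]       # annotator 1
ann2 = [(0, 14, "treatment"), (30, 45, "problem")]  # annotator 2
types, spans = compare(ann1, ann2)
print(types)  # [((0, 14), 'drug', 'treatment')]
print(spans)  # [((20, 45), (30, 45), 'problem')]
```

Flagged pairs like these are exactly the cases to bring to an adjudication discussion.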
- 25:40 When checking the annotation quality in relation extraction, there are three main aspects we need to evaluate. The first one is relation type agreement: we check whether both annotators assigned the same type of relation between entities. Then we evaluate the entity pair: we verify whether the same entities are being linked by the relation. And finally, we need to check the directionality, which is important for some tasks, because the direction may change the meaning.
- 26:19 To evaluate, there are several metrics we can use. The common ones are precision, recall, and F1 measure, which help quantify how consistently annotators identify and classify entities. Additionally, we can also use statistical measures such as Cohen's kappa. Another important method is self-train and self-test: by training a model on the annotated dataset and then testing on the same dataset, we can check whether the model achieves high performance. If the performance is low, it may indicate underlying issues with annotation inconsistency or quality.
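Cohen's kappa corrects raw agreement for the agreement expected by chance; a minimal sketch for two annotators labeling the same items:

```python
# Sketch of Cohen's kappa for two annotators who labeled the same items.
# kappa = (observed agreement - chance agreement) / (1 - chance agreement);
# assumes the annotators are not entirely in chance agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["drug", "drug", "problem", "test", "problem", "drug"]
b = ["drug", "treatment", "problem", "test", "problem", "drug"]
print(round(cohens_kappa(a, b), 3))  # prints 0.76
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more than chance would predict.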
- 27:09 Here are some examples of widely used annotation tools: you can see there are MedTator, eHOST, and Doccano. All of those tools are open source and available on GitHub. Today, I'm gonna introduce the annotation tool Blue, which is implemented in the CHP environment, so that users don't need to install it by themselves and it can be managed by an admin user.
- 27:50and then, adding connect to
- 27:52VPN.
- 27:53So for Mac user, you
- 27:54will need to install a
- 27:55Windows application.
- 27:57And for Windows user, you
- 27:58can use the remote desk,
- 28:01connection to oh, application.
- 28:05First step, you need to
- 28:06connect to the VPN.
- 28:08Open the VPN
- 28:09application and then, in the
- 28:11address, type the telecom mute
- 28:14dot y h h dot
- 28:16org backslash y s m.
- 28:18Here, you need to use
- 28:19your Yale Net ID and
- 28:21password to log in.
- 28:24And then once you successfully,
- 28:27log in to the VPN
- 28:28environment, you can open the
- 28:29application and click the add
- 28:32button
- 28:33to add the
- 28:35IP address.
- 28:36It's ten dot forty eight
- 28:38dot one two eight dot,
- 28:40ninety six
- 28:42dot sixty nine.
- 28:44And
- 28:45once the PC successfully added,
- 28:48it will show on the
- 28:50application and then double click
- 28:53to insert your credential.
- 28:55Here, we'll need your one
- 28:57HH ID and the one
- 28:59HH password.
- 29:03Once you, log in to
- 29:06the PC, you will see
- 29:07a Ubuntu environment.
- 29:13 After you get access to that environment, you can use any browser on the left side, and in the address bar, enter the URL http://localhost to open the annotation tool. The first step is to create your account. You always want to have an admin person who creates an account first; that will be the person who can manage the whole group and assign the projects and tasks to each annotator. Please use your email, username, and password to sign up. And for the verification code field, we disabled that function, so you can just enter any four-digit number or combination of characters.
- 30:02 After you log in to Blue, you will be able to create projects and invite users to the tool. Once the admin person has successfully logged in, he or she can send invitations to the other group members: the person needs to click the invite button and then copy the invitation link to each of the annotators. The annotators need to use this link to register; otherwise, they will not be in the same group.
- 30:41And,
- 30:42by click the click on
- 30:44add new project button, you
- 30:46can you will be able
- 30:47to
- 30:48choose your task either NER
- 30:50or NER plus relational extraction.
- 30:57The created project will show on the front page, and then you will be able to add annotators to the project.
- 31:10To add the data source, you can click the data source button and then choose what kind of format you want to upload to the tool.
- 31:19We accept two formats. One is txt: plain text without any entity or relationship annotations.
- 31:27For pre-annotated data, you can also choose the Blue format, a JSON file in which you can include the entities and relationships.
- 31:42For each project, you can create tasks for the annotators by clicking the add task button.
- 31:52And for each task, you can assign multiple annotators to that one task, just as I mentioned before.
- 31:59Different annotators can annotate the same subgroup of data in order to calculate the agreement among annotators.
- 32:12After the tasks are created, you will be able to start annotation.
- 32:16First, you need to define the entities and relationships that you already have in your annotation guideline.
- 32:28And then, after that, highlight the phrase you want to annotate and choose what kind of entity or relationship you want to annotate it as.
- 32:40The Blue tool will also provide you a function to calculate the agreement among the annotators.
- 32:48Once the annotators finish the task, you can finalize the annotations. Then you can just use the button to check the agreement among them.
- 33:03It will give you an F1 score for both entities and relationships.
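The agreement score just described can be sketched as pairwise F1 over the finalized entity spans. The `(start, end, type)` span representation here is an assumption for illustration, not the tool's actual schema.

```python
def agreement_f1(anns_a, anns_b):
    """Score annotator B against annotator A over exact (start, end, type) spans."""
    set_a, set_b = set(anns_a), set(anns_b)
    matched = len(set_a & set_b)  # spans both annotators marked identically
    if matched == 0:
        return 0.0
    precision = matched / len(set_b)
    recall = matched / len(set_a)
    return 2 * precision * recall / (precision + recall)

a = [(0, 5, "problem"), (10, 14, "drug")]
b = [(0, 5, "problem"), (20, 24, "test")]
print(agreement_f1(a, b))  # one span in common out of two each -> 0.5
```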
- 33:10Then I will have a
- 33:12quick demo for the process.
- 34:18Okay. As I mentioned, you just connect to the VPN and then type the password.
- 34:32And then open the Windows app and click the server we have.
- 35:09And then open the browser.
- 35:24Here, you can sign into
- 35:26your account.
- 35:29To create a new project,
- 35:30you can just click this
- 35:31button and
- 35:33type the project name and
- 35:35select the project type.
- 35:38Here, I already created a demo project.
- 35:41I want to import the data source here, so I just click this button.
- 35:48I downloaded some notes I need to annotate from the CHIP environment, and then I drag them here.
- 35:58This is a txt file, so I just choose text, and then I confirm.
- 36:08For each project, you can add the annotators; that's the people within your group.
- 36:19And then you just, create
- 36:21annotation task for them.
- 36:27You can choose multiple annotators here.
- 36:33And then you go to the file.
- 36:36On this side, you can define the entities. For example, we want to define "problem."
- 36:46And then you can start to do the annotation.
- 36:52Yeah. Basically, that's the whole process for how you do the annotation and how to use our tool.
- 37:06Yeah. Any questions?
- 37:09So how can we import our own data to this? Because I think this is your server, right?
- 37:17As the team also mentioned, we use the CHIP environment. With Camino, you can upload your own data to that environment, and this server will connect to Camino.
- 37:34You can download that data from the Camino environment.
- 37:47Not yet. Right now, this tool is hosted in a secure environment because there is a lot of PHI information. That's the purpose of hosting it there.
- 38:00So for example, let's say there are other publicly available datasets that we really want to annotate. Would it be possible for us to upload them to this server? Yes.
- 38:18Well, if you try to annotate public data, don't use this one.
- 38:23I think what we can do is set up a Blue instance on an open, public website; then you can just go over there.
- 38:32We can spin it up, and then you can just upload, because there's no sensitive data. We can just make another instance of Blue for public data.
- 38:41Because this one we installed in Camino, in the CHP, to support this annotation work.
- 38:47And if there is public data, well, we can just set up another instance, because it's a web application; we can set up another web application in a public space. Yeah, we can discuss that.
- 38:57And should we ask you to set up that specific public instance, or is it already available?
- 39:05We have not, but you can contact us. Maybe we can just give you a copy, and you can set it up yourselves.
- 39:10But right now, we didn't really distribute this package; we just set it up for ourselves.
- 39:35It's just a different tool.
- 39:38Yeah.
- 39:51Thanks, Silja. I'm gonna be
- 39:53very quick.
- 39:55And machine gun mode on.
- 39:57Okay.
- 39:59Yeah. Dr. Xu already discussed the difference between BERT and Llama.
- 40:04To summarize everything, there is a trade-off between performance, computational resources, and time.
- 40:12Okay? So if you need better performance and the computational resources are there, go for Llama models, high-billion-parameter models.
- 40:20Across a wide variety of tasks, they work well.
- 40:24But if time is a concern, he projected the issue of speed between BERT models and large language models: they are up to twenty to thirty times slower.
- 40:34So if that is a concern, you need to switch to BERT models.
- 40:36So I'm gonna talk about the clinical information extraction system where we have developed both BERT-based and Llama-based large language models for you,
- 40:49in such a way that whether you have no programming experience, some programming experience, or are a pro programmer,
- 40:58we have features that will help you take it and customize it to whatever task you want to use it for.
- 41:05And that
- 41:07is what we call Kiwi.
- 41:09Okay? So we are building
- 41:11Kiwi. The one pipeline that
- 41:13I'm currently gonna show you
- 41:14that is set for all
- 41:16these sort of use cases
- 41:17that I'm talking about
- 41:19is a general clinical information
- 41:21extraction pipeline.
- 41:23I also have things coming
- 41:24up for you, and if
- 41:25you have suggestions or something
- 41:27that you have been really
- 41:28working on, it's a real
- 41:29need of the time, let
- 41:30us know, and then we
- 41:32would work on developing those
- 41:33things.
- 41:34Okay.
- 41:36We have the clinical notes.
- 41:37We need to do some
- 41:38preprocessing,
- 41:39deidentification,
- 41:40these sort of things. Doctor
- 41:42Shu mentioned named entity recognition
- 41:45followed by relation extraction, then
- 41:47there is this concept mapping
- 41:48or concept normalization.
- 41:50Finally, post process it and
- 41:53get
- 41:54all the structured data
- 41:55from the unstructured
- 41:57clinical notes. So that is
- 41:59the basic block diagram of
- 42:01any clinical information extraction pipeline.
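The block diagram just described can be sketched as a chain of stages. Each stage here is an illustrative stub, and the function names and concept ID are assumptions for the sketch, not Kiwi's actual implementation.

```python
def preprocess(note):
    """Stand-in for cleanup / de-identification / sentence splitting."""
    return note.strip()

def extract_entities(text):
    """Stand-in for named entity recognition."""
    return [{"text": "fever", "type": "problem"}] if "fever" in text else []

def extract_relations(entities):
    """Stand-in for relation extraction (links modifiers to main entities)."""
    return []

def normalize(entities):
    """Stand-in for concept normalization to a standard vocabulary ID."""
    for e in entities:
        e["cui"] = "C0015967" if e["text"] == "fever" else None  # illustrative CUI
    return entities

def run_pipeline(note):
    entities = normalize(extract_entities(preprocess(note)))
    return {"entities": entities, "relations": extract_relations(entities)}

print(run_pipeline("Patient denied fever."))
```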
- 42:06I don't need to go over this: named entity recognition identifies the boundaries, relation extraction identifies the relationships between the entities,
- 42:14and normalization: doctors write the same thing in a hundred different ways. High BP, hypertension, all these are the same. Right?
- 42:20So you need to map it to a standardized vocabulary or terminology like ICD or SNOMED; that is what concept normalization does.
- 42:29All these three things come together; that is where you take unstructured data and get your structured thing out of it.
- 42:37Okay. What does our general
- 42:39clinical,
- 42:41information extraction pipeline give you?
- 42:44We mainly focused on four
- 42:45main entities,
- 42:47medical problem, treatment,
- 42:49drug, and test.
- 42:50Right? So our Kiwi tool will give you all these four main types of entities, but these entities are not just by themselves. Right?
- 43:00When you are talking about
- 43:01a drug, you have things
- 43:03like the strength, the dosage,
- 43:04the duration, the route, all
- 43:06these things are important. And
- 43:08we need to connect that
- 43:09specific drug to that specific
- 43:11route or specific
- 43:12strength or dosage
- 43:14to actually identify what the doctor has written about giving that information to the patient.
- 43:21So we have a bunch
- 43:22of main entities, and we
- 43:24have a bunch of modifiers
- 43:25that correspond to those main
- 43:27entities.
- 43:28Altogether, this is what Kiwi is gonna extract for you.
- 43:31I know many of the things that you may be needing might be missing from this, but if there are some other cases that you would like to extract, we may in the future think about incorporating them.
- 43:45So for medical problem, you have the severity, the condition, the uncertainty, and who the subject is: whether it is really talking about the patient or his family, because we can see all these sorts of things appearing in the notes,
- 43:57and whether that particular problem is negated or not. So we have the four main entities and all these modifiers.
- 44:08Going very briefly: so YuJa mentioned the annotation. When you annotate, the top figure is something that you get.
- 44:16Now suppose you are using a large language model: it understands the language of prompts.
- 44:21Right? And Dr. Xu covered this, how to write a proper prompt for named entity recognition.
- 44:26So you define the task: we want to identify medical problems, treatments, tests, and other things, and then you specify how you need the output.
- 44:36That is for making your programming life easy: taking the output in a particular format so that you can convert and evaluate it fast. So that is the output guideline markup.
- 44:46Then you define each entity, because we have developed the annotation guidelines, and that is how the humans actually annotate.
- 44:53So the model should also know how the humans have annotated. Otherwise, how do you compare that gold-standard human-annotated data with what the model is giving?
- 45:02So whatever information you are giving the human, you also give to the model, in terms of entity definitions.
- 45:11And then annotation guidelines: we talked about, okay, annotate only complete noun phrases, not partial ones, and complete adjective phrases.
- 45:19These sorts of rules that are there in the annotation guideline that you developed are also provided to the model.
- 45:25Now then you build your
- 45:27training data by showing the
- 45:29model a bunch of examples.
- 45:31Suppose your input is "At the time of admission, he denied fever, dysphoria," whatever it is.
- 45:37So how does the model provide you the output? It should say <span class="problem">fever</span>.
- 45:42That is telling the model: okay, fever is a problem. Whenever you see a medical problem, put it between the HTML tags: the opening tag, span class equal to problem, and the closing tag, slash span.
- 45:56We did that for our convenience because we were comparing it with the BERT and other models.
- 46:03You can provide the output
- 46:04in the way that you
- 46:05want. You can use JSON
- 46:07format, or if you just
- 46:08want it to be plain
- 46:09text in question answering and
- 46:10things like that, you can
- 46:12give the output in such
- 46:13a way. But at least with named entity recognition and relation extraction, this really helps us.
- 46:17And another thing is that this also helps us know that the model is not hallucinating.
- 46:24You see? You are giving
- 46:25the input sentence and you
- 46:27are also telling the model
- 46:28to repeat the same sentence
- 46:29but with some tags attached.
- 46:31You can compare your input and your output to see that the model is not inserting entities or things that are not already there in the original sentence.
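That input/output comparison can be sketched as: strip the span tags from the model's output and check that what remains equals the input sentence. The tag pattern follows the span markup described above.

```python
import re

def strip_span_tags(tagged):
    """Remove the <span class="..."> markup, leaving the raw sentence."""
    return re.sub(r"</?span[^>]*>", "", tagged)

source = "At the time of admission, he denied fever."
output = 'At the time of admission, he denied <span class="problem">fever</span>.'

# If the stripped output differs from the input, the model altered the text.
print(strip_span_tags(output) == source)  # True
```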
- 46:42Okay.
- 46:43So that is how you
- 46:45create a prompt and do
- 46:46NER with large language models.
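The pieces above (task definition, output markup, entity definitions, annotation rules, and a worked example) can be assembled into a prompt template along these lines. The section wording and entity definition are placeholders, not the actual Kiwi prompt.

```python
# Hypothetical prompt skeleton assembled from the parts described in the talk.
PROMPT_TEMPLATE = """### Task
Mark up medical problems, treatments, drugs, and tests in the sentence.

### Output format
Repeat the sentence, wrapping each entity in <span class="TYPE">...</span> tags.

### Entity definitions
{definitions}

### Annotation rules
Annotate complete noun phrases only, never partial phrases.

### Example
Input: At the time of admission, he denied fever.
Output: At the time of admission, he denied <span class="problem">fever</span>.

### Sentence
{sentence}
"""

prompt = PROMPT_TEMPLATE.format(
    definitions="problem: any disease, symptom, or abnormal finding.",
    sentence="His chest pain resolved after treatment.",
)
print(prompt.splitlines()[0])  # ### Task
```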
- 46:48The next step is relation
- 46:50extraction.
- 46:51There is a particular drug; you need to associate its strength, its route, its form, its frequency, everything, and connect that particular drug to whatever is mentioned for it. Right?
- 47:04So for the relation extraction, you need to slightly modify your prompt when you give it to the model.
- 47:17So how do we train
- 47:18the model for this task?
- 47:20We will show the model the main entity in the input text: span class equal to drug around the drug name.
- 47:26Then you will ask the model: given this main entity, what are the modifier entities associated with it?
- 47:32And then you give examples in the output, where you see that now the main entity is not annotated inside the span class tags, whereas 0.35 mg is within span class equal to strength.
- 47:47So given a drug, when the model repeatedly sees appearances like 0.5 milligram or mcg, these sorts of things, it's actually learning that this is the strength associated with that particular main entity.
- 48:01So a lot of examples annotated like this are what help the model learn.
- 48:07Again, this is another example of the same sort. His blood pressure on discharge was 126 over 63; heart rate is 80.
- 48:14You cannot say blood pressure is 80. Right? It's the same sentence, which has two values and two tests.
- 48:20You need to correctly associate blood pressure with 126 over 63 and heart rate with 80.
- 48:29Right? So we give the input: when we say blood pressure is the entity, its value should be 126 over 63. If we highlight heart rate as the entity, then the value should be 80.
- 48:44Again, so we originally had the annotated data, and we converted it into the instruction format that I have been showing for named entity recognition; this is an instruction demonstration sample.
- 48:56And for relation extraction, it's the one that you see at the bottom.
- 49:01So the entire dataset converted into such things is your instruction demonstration: a bunch of these examples that you collectively call your instruction dataset.
- 49:16Previously, we have annotated datasets for the other models; this is just slightly different. The term is instruction dataset because the dataset is comprised of a bunch of instructions, or prompts, with input and output examples.
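One common way to lay out such an instruction dataset is a list of instruction/input/output records, one per training example. The field names here are an assumption for illustration, not the exact format used.

```python
import json

# Hypothetical layout: one record per demonstration, for NER and for
# relation extraction, using the span markup described in the talk.
instruction_dataset = [
    {
        "instruction": "Mark up medical problems in the sentence with span tags.",
        "input": "At the time of admission, he denied fever.",
        "output": 'At the time of admission, he denied <span class="problem">fever</span>.',
    },
    {
        "instruction": "Given the main entity, mark up its modifier entities.",
        "input": 'His <span class="test">heart rate</span> is 80.',
        "output": 'His heart rate is <span class="value">80</span>.',
    },
]
print(json.dumps(instruction_dataset[0], indent=2))
```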
- 49:29To instruction fine-tune a large language model, all you need is such an instruction dataset specific to your task and a base large language model like Llama 2, Llama 3, Llama 4, or whatever it is.
- 49:43And then you give it to this model, you train it, and finally you get an instruction-tuned large language model.
- 49:51So if it is a Llama model as your base, you will get an instruction-tuned Llama, but one that is actually adapted for those tasks.
- 50:02So when you just take the originally available Llama model, it's a generally trained model. Right? It is not domain-adapted for your specific task.
- 50:12By fine-tuning a large language model, what you are doing is making its capabilities lean much more toward whatever task you want to perform, by showing it a lot of such examples and modifying its weights in such a way that it adapts to that specific task.
- 50:30That particular model, if you
- 50:32now go and test back
- 50:33on some general task, it
- 50:35might not perform the way
- 50:36that it previously
- 50:38performed
- 50:39because you have changed the
- 50:40model weights and adapted it
- 50:42to that specific task.
- 50:44Okay. So this is basically
- 50:46fine tuning and then you
- 50:47would evaluate the model.
- 50:49Now going back: I said we also had the BERT-based models. There too, as was just shown, you would annotate the dataset.
- 50:56But while for a large language model you would give a prompt, for BERT you would convert it in a different way.
- 51:11is a word. So vital
- 51:13sign remains stable. And then
- 51:15doctor Xu has covered this
- 51:17BIO tagging is what we
- 51:19call beginning of an entity,
- 51:20inside of an entity, outside
- 51:22of an entity. So vital
- 51:24sign is a test here,
- 51:25so you say b test.
- 51:27If you have a problem,
- 51:28you would say acute carcinoma
- 51:30or something. I'm making this
- 51:31up. So acute is gonna
- 51:33be b problem
- 51:34and carcinoma is gonna be
- 51:36I problem. If it is
- 51:37not within the four main
- 51:39entities and four modifiers that
- 51:41we have, we tag it
- 51:42as o, which means outside
- 51:44of an entity. So take
- 51:46the same annotated dataset, convert
- 51:48into two different formats. One,
- 51:50safe for the llama, another
- 51:51for the bird.
- 51:52And for bird, this is
- 51:54token classification. So given a
- 51:56content given a sentence, you
- 51:57are basically predicting whether vital
- 51:59is among the b test,
- 52:01I test, b problem, I
- 52:02for problem, b value, I
- 52:04value, whatever is the corresponding
- 52:06label that should be for
- 52:07that particular token. So it
- 52:09is token classification task what
- 52:11we do.
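The BIO conversion just described can be sketched as follows; whitespace tokenization and token-index spans are simplifications for illustration, since a real system would use the model's own tokenizer.

```python
def to_bio(tokens, spans):
    """spans: list of (first_token_idx, last_token_idx, type), inclusive."""
    tags = ["O"] * len(tokens)  # default: outside of any entity
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"              # beginning of the entity
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{etype}"              # inside of the entity
    return tags

tokens = "Vital sign remains stable".split()
print(to_bio(tokens, [(0, 1, "test")]))
# ['B-test', 'I-test', 'O', 'O']
```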
- 52:13How does relation extraction work in the case of BERT? Here it is the same idea: now it becomes sentence classification.
- 52:21You have two classes: "has value," which is the positive class, and the negative class.
- 52:27So if you show blood pressure and 80, that is a negative sample: you should label that sentence as negative. If you have blood pressure and 126 over 63, then it is a positive sample.
- 52:42So it becomes a sentence classification task. And with many patterns like this, seeing repeated sentences like that, the model is learning that particular pattern and identifying it.
- 52:51The next time that sort of sentence appears: okay, this is a positive, has-value case, or this is a negative class there.
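Building those sentence-classification samples can be sketched by pairing each test with each value and labeling the gold pairs positive; the field names and label strings here are illustrative, not the actual training format.

```python
def make_re_samples(sentence, tests, values, gold_pairs):
    """One labeled sample per (test, value) candidate pair in the sentence."""
    samples = []
    for t in tests:
        for v in values:
            label = "has_value" if (t, v) in gold_pairs else "negative"
            samples.append({"text": sentence, "pair": (t, v), "label": label})
    return samples

sentence = "Blood pressure was 126/63, heart rate 80."
samples = make_re_samples(
    sentence,
    tests=["blood pressure", "heart rate"],
    values=["126/63", "80"],
    gold_pairs={("blood pressure", "126/63"), ("heart rate", "80")},
)
print(sum(s["label"] == "has_value" for s in samples))  # 2 positive of 4 pairs
```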
- 53:01This is the entire Kiwi
- 53:03pipeline.
- 53:04Okay.
- 53:06Having
- 53:07data
- 53:08from multiple sources is important.
- 53:11Something that works on your specific data, at a particular hospital, written by one specific doctor in one particular setting, might not generalize well when you try to use that same pipeline at another hospital,
- 53:24on another note that is written by another physician or health care provider.
- 53:29So for Kiwi, we actually
- 53:31have data from four sources
- 53:33so that we can
- 53:34make the model much more
- 53:36generalizable
- 53:37and make it see the
- 53:38patterns that happen in a wide variety of data.
- 53:42We have UTP, that is UT Physicians; MTSamples, which is a publicly available dataset; and MIMIC-III, which you might know.
- 53:48And all the data from these different sources are incorporated in our training process.
- 53:54And as I mentioned: instruction format for Llama, BERT format for training the BERT models.
- 54:00Then you fine-tune both the models, and then you test the models out.
- 54:05So you test on a subset of UTP, MTSamples, and MIMIC-III, and also on i2b2.
- 54:11Again, Dr. Xu mentioned that i2b2 is unseen data. Right? It's not in your training data.
- 54:16That's how we are testing the generalizability: to see whether it actually performs on unseen data.
- 54:23Then post-process, separate the entities and relationships, and calculate precision, recall, and F1; that is your evaluation.
- 54:31So this is, quickly, the composition of the Kiwi dataset, meaning the Kiwi model that we are giving out currently.
- 54:39It has been trained on about 1,400 documents and then tested on four different types, each having fifty or twenty-five documents.
- 54:52For evaluation, I mentioned precision, recall, and F1, and it is both exact match and relaxed match.
- 54:57To be clear: for exact match, the entity type should match and the boundary should also match; for relaxed match, the entity type should still match, but the boundary can just be overlapping.
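The two matching criteria can be sketched directly: exact match compares type and boundaries, relaxed match compares type and requires only span overlap. The character offsets are illustrative.

```python
def exact_match(gold, pred):
    """Same type AND identical boundaries."""
    return gold == pred

def relaxed_match(gold, pred):
    """Same type, boundaries need only overlap."""
    (gs, ge, gt), (ps, pe, pt) = gold, pred
    return gt == pt and gs < pe and ps < ge  # half-open intervals intersect

gold = (10, 25, "problem")  # character offsets, illustrative
pred = (10, 20, "problem")  # same type, shorter boundary
print(exact_match(gold, pred), relaxed_match(gold, pred))  # False True
```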
- 55:12How did we perform? Llama 3 70B was somewhat better for the NER task and, again, for relation extraction, but you also see that some smaller models still performed on par with it.
- 55:25Sometimes you do not have much difference with the BERT model, but here we saw that at least some statistical significance was there.
- 55:31And i2b2 is the unseen data, and Dr. Xu mentioned again how large language models are better on unseen data compared to BERT. And BERT models, again, definitely need a lot more data to train on.
- 55:49Now what about the memory usage, total GPU hours, GPU hours per epoch, energy consumption, carbon emission?
- 55:56That is where a lot of these computational resources come into play. You need a huge amount of memory.
- 56:03As you know, we are comparing a BERT model of about one hundred million to three hundred million parameters to something that is seven billion, eight billion, seventy billion parameters, and that difference really shows in the amount of compute and the hours that you require for training these models and the memory that they utilize.
- 56:23So if you want to fine-tune the model using parameter-efficient fine-tuning approaches like LoRA, which Melingfei is gonna discuss, then you need one A100 80 GB GPU.
- 56:39But if you need to do inference for the seventy-billion model, you need two A100 80 GB GPUs.
- 56:47Okay. Again, our paper has a lot of things; I can skip through this. I just want to talk about concept normalization.
- 56:53The actual way we do concept normalization is with Elasticsearch, which basically does exact match and partial match, and then BM25 to rerank the extracted candidates.
- 57:07So here in Kiwi, we have mapped it to UMLS concept unique identifiers.
- 57:13For anyone who's not familiar with UMLS: the UMLS Metathesaurus basically incorporates a hundred-some vocabularies and gives each concept a unique identity.
- 57:23The same concepts from all the different vocabularies are mapped to one unique concept ID.
- 57:30So here, this is a concept normalization pipeline that utilizes a large language model.
- 57:35Once you do the NER, you get the query. On the left side, you see "left atrium dilated": that is your query entity with its context, let's say the sentence that contains it.
- 57:46You give that to a large language model and ask it to generate multiple synonyms of it.
- 57:50Why are we doing that? Because the exact phrase might not appear in any of the standardized vocabularies.
- 57:56So we generate as many variations of that particular entity as we can, so that we can do the matching; Elasticsearch and BM25 actually do that.
- 58:05So you give the original utterance and all the synonyms and check in the database that you have created whether that entity is actually present there.
- 58:18So you search, you get a bunch of concepts that are sort of similar, and then you again use a large language model to find, among those concepts, the best one that actually represents the originally recognized entity.
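A toy version of that flow, with a fixed synonym list standing in for the LLM step and token overlap standing in for Elasticsearch/BM25; the concept names and IDs are illustrative, not real retrieval results.

```python
# Illustrative concept inventory; real retrieval would query Elasticsearch
# over the UMLS Metathesaurus and rerank candidates with BM25.
CONCEPTS = {
    "C0344720": "left atrial dilatation",
    "C0018802": "congestive heart failure",
}

def token_overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def normalize(query, synonyms):
    """Score every concept against the query and its synonyms; keep the best."""
    variants = [query] + synonyms
    return max(
        CONCEPTS,
        key=lambda cui: max(token_overlap(v, CONCEPTS[cui]) for v in variants),
    )

print(normalize("left atrium dilated", ["left atrial dilatation"]))
```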
- 58:33I know I'm going so
- 58:34fast, but
- 58:36the slides will be available,
- 58:37and we will also think
- 58:38of making the recordings available
- 58:40on the YBIG website.
- 58:42Okay. Last step: Kiwi usually gives you output in a JSON format, but we also have scripts to make it easy for you, so that the JSON can be converted into a CSV.
- 58:52And what I have highlighted: you see in the first column the entity, the term that we have actually extracted, and the highlighted one is the concept ID for it, which is basically the CUI, or concept unique identifier, of that particular thing from UMLS.
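The JSON-to-CSV step might look like this; the JSON field names and the CUIs here are hypothetical stand-ins for Kiwi's actual output schema.

```python
import csv
import io
import json

# Hypothetical Kiwi-style output: one record per extracted entity.
output_json = json.loads("""
[{"entity": "hypertension", "type": "problem", "cui": "C0020538"},
 {"entity": "lisinopril",   "type": "drug",    "cui": "C0065374"}]
""")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["entity", "type", "cui"])
writer.writeheader()
writer.writerows(output_json)
print(buf.getvalue())
```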
- 59:09And if you ask why UMLS: if there is a concept unique identifier, you can actually map it back to SNOMED, ICD, or MeSH, because UMLS includes all those things. That's a very easy task.
- 59:21Where can you find Kiwi? This is our website: kiwi.clinicalnlp.org.
- 59:27The QR code will take you right there. You press the live demo, and you get a prepopulated note, a few sentences.
- 59:38Click submit, and it will show you the entities and the relations extracted. You can remove that text and add your own text.
- 59:45No programming experience needed: you can put something in there and get to see what the entities are. Just play around with that.
- 59:52Okay. Now, if you want to download the models, we have another page called download. You need to fill in a form, and then we will send you the Docker images.
- 01:00:03Now how is Docker different? Everything is prepackaged into a container; you do not need to install things separately.
- 01:00:10The Docker image comes with instructions as to what to do. It's just like an executable you run, selecting, okay, options one, two, three.
- 01:00:19It also has a readme file, which tells you how to run it. You give it where your input data is and where you want the output to be, and it will run the entire Kiwi and give you the output there.
- 01:00:38Okay? So: easy-to-install Docker images, with all the dependencies taken care of. They can be run on Linux, Mac, or Windows.
- 01:00:44If you have a CPU, we have versions for that; if you have a GPU, we have versions for that.
- 01:00:50And we have both the BERT-based and Llama-based models that do the things I was talking about.
- 01:00:57Finally, what Vincent is gonna demo is: forget about all these things. Your data is on the CHIP; you want to use it directly, just with an API call.
- 01:01:07Currently, you need to contact Chris Gilman, who's a senior software engineer, to get the API for calling Kiwi.
- 01:01:14But in the future, we are gonna come up with a system where you can submit tickets and get the API key.
- 01:01:18So get your API key, put it into a program that we are gonna give you, and run it. That's as easy as it gets.
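What such an API call could look like, sketched as payload construction; the endpoint URL, header, and field names are placeholders, since the real values come with the API key you request.

```python
def build_request(api_key, text):
    """Assemble a hypothetical Kiwi extraction request (placeholder endpoint)."""
    return {
        "url": "https://example.org/kiwi/extract",  # placeholder, not the real URL
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"text": text},
    }

req = build_request("YOUR_API_KEY", "He denied fever.")
print(req["json"])
# A real call would then be, e.g.:
#   import requests
#   resp = requests.post(req["url"], headers=req["headers"], json=req["json"])
```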
- 01:01:27A growing database: about thirty-two requests so far since we released, and that's it. I don't want to go more into that.
- 01:01:35What's coming up? We have more packages that we have actually built but not made available as a Docker image or a service or something like that.
- 01:01:44One of those that we are thinking of making available on the CHIP, as Kiwi currently is, is the RECIST pipeline, which extracts systemic anticancer therapy and the responses based on the RECIST guidelines.
- 01:01:55Again, I'm not a clinician, so I'm not going into it.
- 01:01:59So probably you can see something similar in the future: available as Docker images or API services or something that you can download and play with.
- 01:02:09My main area of research is
- 01:02:11social determinants of health. This
- 01:02:12is another pipeline that I
- 01:02:14have built: twenty-one social
- 01:02:16determinants of health,
- 01:02:19with four different models,
- 01:02:21from XGBoost to
- 01:02:23TextCNN,
- 01:02:24SentenceBERT, and
- 01:02:25LLaMA. It can take your notes
- 01:02:28and annotate them on two levels
- 01:02:32across twenty-one social
- 01:02:34determinant factors. It does a
- 01:02:37sort of
- 01:02:38sentence classification:
- 01:02:39it takes your note,
- 01:02:41divides it into sentences, and tells
- 01:02:42you, okay, this sentence is
- 01:02:43talking about race, sex, or gender;
- 01:02:46this sentence is talking about
- 01:02:47the person's insurance;
- 01:02:48this sentence is talking
- 01:02:50about their education. Then we
- 01:02:51go one more level down. You
- 01:02:52also have models that
- 01:02:54tell you, okay, this
- 01:02:56person's education is high
- 01:02:57school or below; for insurance,
- 01:02:59yes, the person
- 01:03:01has insurance, or no.
- 01:03:02So: high-level labels on the
- 01:03:04twenty-one factors, plus the
- 01:03:06values and attributes
- 01:03:09one level deeper. So
- 01:03:11that's all we have here.
- 01:03:12And, also, I almost
- 01:03:15forgot this.
- 01:03:18When you sign a DUA
- 01:03:19with us,
- 01:03:20we are gonna give you
- 01:03:21the model weights of Kiwi.
- 01:03:23That is still in the
- 01:03:24pipeline; it will come through
- 01:03:26a form. So the
- 01:03:27form that I'm asking
- 01:03:28you to fill out to get
- 01:03:29the Docker images will also
- 01:03:31cover this: if you are good
- 01:03:32at programming,
- 01:03:33take our model,
- 01:03:35continuously fine-tune
- 01:03:37it with your data,
- 01:03:38and make it whatever you want.
- 01:03:41So that is another thing, but
- 01:03:42you need to sign a
- 01:03:43DUA with us, and that
- 01:03:44form will be available soon.
- 01:03:46With that, Vincent, take
- 01:03:48it over for the Kiwi
- 01:03:49API demo.
- 01:03:56We're not taking questions
- 01:03:58because of the time
- 01:03:59constraints.
- 01:04:07So good afternoon, everyone. My
- 01:04:09name is Vincent, and I'm
- 01:04:10a software developer in
- 01:04:12Dr. Shi's lab.
- 01:04:13And today, I will talk
- 01:04:15about how to use the
- 01:04:16Kiwi API service.
- 01:04:18The core concepts
- 01:04:20of Kiwi have
- 01:04:22already been discussed,
- 01:04:24so I will go through this
- 01:04:26quickly.
- 01:04:28So what is the Kiwi
- 01:04:31API service?
- 01:04:33The Kiwi API service provides
- 01:04:35an API-as-a-service
- 01:04:37interface
- 01:04:38that allows users within
- 01:04:41CHP's internal network to
- 01:04:42access Kiwi without requesting a
- 01:04:44high-performance GPU and having
- 01:04:46to install
- 01:04:47or manage the model locally.
- 01:04:50Users simply request an API
- 01:04:53key and make standard HTTP
- 01:04:55API calls to use
- 01:04:57the service.
- 01:04:58All computational
- 01:04:59resources run on CHP,
- 01:05:01so we don't need
- 01:05:03to request a local
- 01:05:05GPU.
- 01:05:06This setup streamlines
- 01:05:08access to Kiwi functionality
- 01:05:11and makes it more accessible
- 01:05:13in resource-constrained
- 01:05:15environments.
- 01:05:20So how does the Kiwi
- 01:05:21API service actually work?
- 01:05:23The process follows a simple
- 01:05:25request-and-response
- 01:05:29pattern.
- 01:05:30Users send a request
- 01:05:31to the Kiwi API server
- 01:05:33in the CHP environment, such as
- 01:05:35from Camino, which includes
- 01:05:38either clinical notes or other
- 01:05:40text-related data.
- 01:05:42Once the API service receives
- 01:05:44the request, it determines
- 01:05:46the task type based on
- 01:05:48the specific
- 01:05:49endpoint and then returns the
- 01:05:51appropriate response.
- 01:05:55Most tasks are handled by
- 01:05:57a background process on the
- 01:05:58API server.
- 01:06:00All incoming requests are
- 01:06:03queued and processed sequentially
- 01:06:06to ensure efficient use of
- 01:06:08the limited
- 01:06:10computational resources.
- 01:06:11So
- 01:06:15let's take a closer look
- 01:06:17at how to use the
- 01:06:19Kiwi API service.
- 01:06:20Before we get started,
- 01:06:23there are some things you need
- 01:06:25to prepare. First,
- 01:06:27obviously, you need
- 01:06:29access to the CHP environment,
- 01:06:31like Camino.
- 01:06:32You need to have a
- 01:06:33OneHx
- 01:06:35account.
- 01:06:36Then you
- 01:06:37need to request an API
- 01:06:38key; as just
- 01:06:40mentioned,
- 01:06:42you need to ask Chris
- 01:06:44Gilman to get the API
- 01:06:45key.
- 01:06:47Users who have
- 01:06:49some coding experience can
- 01:06:51write their own script to
- 01:06:52access the API, but we
- 01:06:54also provide an API launch
- 01:06:56script and some use cases
- 01:06:58in a Jupyter Notebook, provided at
- 01:07:00this GitHub
- 01:07:01link.
- 01:07:06Now, assuming you already
- 01:07:08have access to
- 01:07:09the platform, you
- 01:07:10open a Jupyter notebook,
- 01:07:12and you have an
- 01:07:13API key. First,
- 01:07:16you need to define a
- 01:07:17variable to instantiate
- 01:07:19the class that I
- 01:07:20provide in the script.
- 01:07:21At this step, you need
- 01:07:23to insert
- 01:07:24your API key into
- 01:07:26this instance.
- 01:07:28The first time
- 01:07:29you use the API
- 01:07:31server,
- 01:07:32you can use the key
- 01:07:33info function to test your
- 01:07:35connection.
- 01:07:37This will give you a result.
- 01:07:40It responds with
- 01:07:42information in JSON
- 01:07:43format.
- 01:07:44There are three main components
- 01:07:46in this response.
- 01:07:48You can see the usage
- 01:07:49count, which tells you
- 01:07:52how many tokens you have used
- 01:07:53since you created the API
- 01:07:54key; the tokens remaining,
- 01:07:57which tells you how many
- 01:07:58tokens you still have on
- 01:08:00the API key; and finally, the
- 01:08:02expire-at field, which tells you when
- 01:08:04the key expires.
- 01:08:06For the token limit and the expiration
- 01:08:07date, you can contact our
- 01:08:09team to extend
- 01:08:10the usage in the future.
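As a sketch of this setup step (the class name, header scheme, and JSON field names below are assumptions for illustration, not the actual script's API; no network call is made):

```python
import json

# Hypothetical client sketch; the real script's class and method
# names may differ. We only show key handling and parse a sample
# response of the shape described in the talk.
class KiwiClient:
    def __init__(self, api_key):
        self.api_key = api_key
        # A typical header scheme for key-based HTTP APIs (assumed).
        self.headers = {"Authorization": f"Bearer {api_key}"}

client = KiwiClient("my-api-key")

# Example key-info response with the three fields described:
# usage count, tokens remaining, and expiry date.
sample = '{"usage_count": 1200, "tokens_remaining": 98800, "expire_at": "2025-12-31"}'
info = json.loads(sample)
print(info["tokens_remaining"])
```

A real call would send `client.headers` with an HTTP request to the key-info endpoint and parse the returned JSON the same way.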
- 01:08:13Our main function is
- 01:08:15batch prediction,
- 01:08:17which allows users to process their
- 01:08:19clinical notes in bulk.
- 01:08:21To use this function,
- 01:08:23you need to provide the
- 01:08:24path of your files
- 01:08:26in the Camino environment.
- 01:08:29Currently, the supported upload
- 01:08:31formats are compressed files
- 01:08:32such as zip or
- 01:08:33tar, or a single text
- 01:08:36file, and you can
- 01:08:37compress your
- 01:08:38notes into a single
- 01:08:39file as well.
- 01:08:41Once submitted,
- 01:08:42your notes go onto the
- 01:08:44Kiwi server as a task
- 01:08:45in the queue. The function
- 01:08:47will return the task status,
- 01:08:48including the task ID
- 01:08:51for this task, in
- 01:08:52JSON format.
- 01:08:53All task IDs are
- 01:08:54tied to the API key,
- 01:08:57which means all user
- 01:08:58data is isolated by
- 01:09:00API key and
- 01:09:01task ID.
- 01:09:04Here is an example of
- 01:09:06a batch prediction.
- 01:09:09As you can see, it
- 01:09:10returns the task information
- 01:09:12in JSON format,
- 01:09:14including the task ID
- 01:09:16and a message that shows
- 01:09:17the status, how
- 01:09:20many tokens this task used,
- 01:09:22and how many tokens remain
- 01:09:24in your
- 01:09:25account.
- 01:09:26Finally, the estimated time for
- 01:09:28your task is calculated based
- 01:09:30on your task's queue position
- 01:09:31and the progress of the
- 01:09:33tasks ahead of it.
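A minimal sketch of reading such a response (the JSON field names below are assumed from the description, not the service's exact schema):

```python
import json

# Hypothetical batch-prediction response of the shape described:
# task ID, status message, tokens used, tokens remaining, estimate.
response_text = '''{
  "task_id": "abc123",
  "message": "queued",
  "tokens_used": 5000,
  "tokens_remaining": 95000,
  "estimated_seconds": 420
}'''

task = json.loads(response_text)

# Keep the task ID: every later call (status, download, cancel)
# is keyed by API key plus task ID.
task_id = task["task_id"]
print(task_id, task["message"], task["estimated_seconds"])
```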
- 01:09:37After you submit a task
- 01:09:39and receive a task ID,
- 01:09:41you can
- 01:09:42check the current
- 01:09:43status of your task at
- 01:09:45any time using the task
- 01:09:47status function.
- 01:09:49It will provide your
- 01:09:51task information in detail.
- 01:09:53Typically, there are three main
- 01:09:56statuses for your
- 01:09:58task. First
- 01:09:59is the queued status:
- 01:10:01when a
- 01:10:02task is in the queue, the
- 01:10:04system will report the
- 01:10:06task's current queue position as
- 01:10:08well as the estimated time
- 01:10:10to processing.
- 01:10:11Next is processing:
- 01:10:13when no one is ahead of
- 01:10:16you, your task is put into
- 01:10:18the processing pool. It
- 01:10:20will indicate how many
- 01:10:21files are in your task,
- 01:10:24including how many have not yet been
- 01:10:26processed,
- 01:10:27how many have
- 01:10:28been processed,
- 01:10:29and the remaining time based
- 01:10:31on the remaining files.
- 01:10:33Finally, the complete
- 01:10:35status: you can then use the
- 01:10:37task ID with the next
- 01:10:38function to download your results.
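The three states can be sketched as a simple polling loop; the status dictionaries below are simulated stand-ins for real status-endpoint responses, and the field names are illustrative:

```python
# Simulated status sequence for one task: queued -> processing -> complete.
# A real client would call the status endpoint each iteration instead.
def fake_status_stream():
    yield {"status": "queued", "queue_position": 2, "eta_seconds": 600}
    yield {"status": "processing", "files_total": 10, "files_done": 4}
    yield {"status": "complete"}

seen = []
final = None
for status in fake_status_stream():
    seen.append(status["status"])
    if status["status"] == "complete":
        final = status
        break
    # A real loop would sleep between polls, e.g. time.sleep(30).

print(seen)
```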
- 01:10:42Once your task is
- 01:10:44complete,
- 01:10:45you can download your
- 01:10:47results using the download
- 01:10:49function.
- 01:10:50In this function, you need
- 01:10:51to give the output path
- 01:10:53where you want to save
- 01:10:54your file
- 01:10:56locally, and the output type
- 01:10:58you prefer.
- 01:11:00By default, the path is
- 01:11:01the working directory,
- 01:11:03and the output
- 01:11:05is JSON format.
- 01:11:07For saving files, it
- 01:11:08supports three output types
- 01:11:10that are typically used. First
- 01:11:13is a zip file,
- 01:11:14which compresses the results
- 01:11:16into a separate JSON file
- 01:11:17for each of your input files.
- 01:11:19Then there is single
- 01:11:21JSON, which combines all the
- 01:11:23JSON results into a single
- 01:11:25JSON file.
- 01:11:26Finally, there is CSV:
- 01:11:28we have a
- 01:11:29converter
- 01:11:31integrated into the
- 01:11:33Kiwi API service, so it
- 01:11:34can just
- 01:11:37output CSV.
- 01:11:39Each file can only be
- 01:11:41downloaded once. After you download it,
- 01:11:43you cannot access it
- 01:11:45again because,
- 01:11:46for
- 01:11:48privacy reasons, the server deletes
- 01:11:50the record
- 01:11:52of the data.
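The three output types can be illustrated with a toy conversion; the result structure and entity contents below are made up for illustration, not the service's actual output schema:

```python
import csv
import io
import json

# Per-file JSON results, as the zip output type would contain
# (one JSON result per input file; contents are illustrative).
results = {
    "note1.txt": {"entities": [{"text": "Spanish", "tag": "language_fluent"}]},
    "note2.txt": {"entities": [{"text": "English", "tag": "language_some"}]},
}

# "Single JSON" output type: combine everything into one document.
combined = json.dumps(results)

# "CSV" output type: flatten to one row per extracted entity.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["file", "text", "tag"])
for fname, res in results.items():
    for ent in res["entities"]:
        writer.writerow([fname, ent["text"], ent["tag"]])

csv_text = buf.getvalue()
print(csv_text.splitlines()[1])
```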
- 01:11:54And here is an example
- 01:11:55of a download. Typically, it will
- 01:11:57save your file into the local
- 01:11:59directory and give you a
- 01:12:01message telling you whether
- 01:12:02it succeeded.
- 01:12:04On the left side, you
- 01:12:05can see the
- 01:12:06typical
- 01:12:08result format.
- 01:12:10And,
- 01:12:11in some cases, you might
- 01:12:13submit multiple tasks or
- 01:12:16forget a specific
- 01:12:18task ID; this function allows
- 01:12:20you to quickly review the
- 01:12:22status of each task.
- 01:12:24This function lists
- 01:12:26all tasks that have not yet been
- 01:12:27downloaded.
- 01:12:30And if you
- 01:12:33accidentally
- 01:12:34submitted a task, as long as the
- 01:12:36task has not
- 01:12:37entered processing,
- 01:12:39you can still use
- 01:12:42this function to cancel your task
- 01:12:44before the
- 01:12:46task enters
- 01:12:48the processing pool,
- 01:12:49and it will give you
- 01:12:51the tokens back.
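The cancel-and-refund rule can be sketched as a toy model of the queue semantics described (field names and token amounts are illustrative):

```python
# A task can be cancelled, with its tokens refunded, only while it
# is still queued; once it enters the processing pool it is too late.
def cancel_task(task, balance):
    if task["status"] == "queued":
        task["status"] = "cancelled"
        return balance + task["tokens_charged"]  # refund the charge
    return balance  # already processing or done: no refund

balance = 90000
queued = {"status": "queued", "tokens_charged": 5000}
processing = {"status": "processing", "tokens_charged": 5000}

balance = cancel_task(queued, balance)      # refunded
balance = cancel_task(processing, balance)  # no refund
print(balance)
```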
- 01:12:53So I will show you
- 01:12:54a quick demo of it.
- 01:13:17Okay. Sure. So
- 01:13:18I think we'll skip
- 01:13:19the demo and move to
- 01:13:20the next speaker. It's the
- 01:13:22same thing, but you'd see
- 01:13:23it running live.
- 01:13:25I actually showed you the
- 01:13:26program there. Right.
- 01:13:27Yeah.
- 01:13:29Thank you.
- 01:13:35Oh, hi, everyone. My name
- 01:13:37is Lingfei Chen.
- 01:13:39I am a postdoc
- 01:13:40in Dr. Vashu's group. Today,
- 01:13:42I'm going to show you
- 01:13:43how to develop
- 01:13:45customized models for some specific
- 01:13:46applications.
- 01:13:48So at the beginning, I
- 01:13:49would like to introduce why
- 01:13:51we need these customized models.
- 01:13:53We all know that large
- 01:13:54language models like LLaMA and
- 01:13:56the GPT series have shown great
- 01:13:58potential in many domains,
- 01:14:00as they are
- 01:14:01pretrained on large-scale
- 01:14:03text, they have strong
- 01:14:05instruction-following abilities
- 01:14:07across different tasks, and they
- 01:14:09have wide coverage of
- 01:14:10general knowledge.
- 01:14:11However, they may not fully
- 01:14:13capture the nuances of some
- 01:14:15specific tasks or user
- 01:14:17needs, especially when
- 01:14:20the task involves
- 01:14:23specific custom definitions
- 01:14:26or the task is
- 01:14:27rare among common users.
- 01:14:29And that's why we need
- 01:14:31to develop customized models for
- 01:14:33ourselves.
- 01:14:34We can enhance the
- 01:14:35model with domain-specific
- 01:14:37expertise
- 01:14:38for the task and improve
- 01:14:40on the performance of existing large
- 01:14:42language models.
- 01:14:43In the process of
- 01:14:45improving the performance, we can
- 01:14:47actually get
- 01:14:49smaller-size models to
- 01:14:51achieve performance comparable to
- 01:14:53larger-size models, and
- 01:14:55become more
- 01:14:57efficient and cost-effective.
- 01:14:59And, also, we can improve the
- 01:15:01user experience
- 01:15:03by reducing some
- 01:15:05of the hallucinations in
- 01:15:06existing large language models.
- 01:15:09And here are some key
- 01:15:10steps for developing customized models.
- 01:15:13The first is to
- 01:15:14actually define what
- 01:15:16your NLP task is, and the
- 01:15:18second is to prepare the
- 01:15:20data to
- 01:15:21train and evaluate the large
- 01:15:23language model.
- 01:15:27The preparation of
- 01:15:28data involves some steps
- 01:15:31that have been introduced
- 01:15:33before: data annotation
- 01:15:35and data preprocessing to
- 01:15:36feed into the models.
- 01:15:38And after we get
- 01:15:40the data, we can start
- 01:15:41model training to enhance
- 01:15:43the performance of the model
- 01:15:44with the task-specific data.
- 01:15:47And then,
- 01:15:48once we finish the
- 01:15:50model training, we can
- 01:15:52use another set of
- 01:15:53annotated data to evaluate the
- 01:15:55performance of our developed
- 01:15:57model, to see if the
- 01:15:59performance actually improved compared
- 01:16:02with the backbone model.
- 01:16:03And then, once we confirm
- 01:16:05that the performance of the
- 01:16:07model has
- 01:16:07improved, we can actually use
- 01:16:09these customized
- 01:16:11models in production.
- 01:16:14And here is a general
- 01:16:15workflow of model training and
- 01:16:17evaluation.
- 01:16:18Once we define our task
- 01:16:20and prepare our data, we
- 01:16:22need to split the data
- 01:16:23into different subsets.
- 01:16:26Usually, we would have three
- 01:16:27subsets. The first is the
- 01:16:28training data to develop the
- 01:16:30model, and the second would
- 01:16:31be the validation
- 01:16:32data to validate the effectiveness
- 01:16:35of the trained model. But
- 01:16:36for simplification here,
- 01:16:38we just use the
- 01:16:42test data to evaluate
- 01:16:45the trained model. If the
- 01:16:46trained model is effective compared
- 01:16:49with the backbone model, we
- 01:16:50can then use it
- 01:16:52as the production model to
- 01:16:54process the production data.
- 01:16:56And if the trained model's
- 01:16:58performance actually decreases,
- 01:17:00we might
- 01:17:01need to adjust the training
- 01:17:03process
- 01:17:06and redo the training part.
- 01:17:10And,
- 01:17:11next, I will show more
- 01:17:12details of each step. I
- 01:17:14will start from how to
- 01:17:16define tasks.
- 01:17:18This is actually a real
- 01:17:19example.
- 01:17:20Starting from the task
- 01:17:22design, I will show you
- 01:17:24how to develop customized
- 01:17:26models step by step.
- 01:17:28So let's say that we
- 01:17:29have a research project to investigate
- 01:17:31the impact of
- 01:17:34bilingualism
- 01:17:35on ADRD progression.
- 01:17:38So the first step,
- 01:17:40as in other clinical
- 01:17:42research, is to find eligible
- 01:17:43patients.
- 01:17:44Once we find those
- 01:17:46eligible patients, we need to
- 01:17:47identify the bilingual or monolingual
- 01:17:49patients among them.
- 01:17:53The first thought is to
- 01:17:55check the structured data,
- 01:17:57the preferred-language or
- 01:17:59written-language
- 01:18:01fields, to see which
- 01:18:03language the patient prefers.
- 01:18:05But when we
- 01:18:07check the actual data, we
- 01:18:08find that
- 01:18:11there might not be
- 01:18:13enough of this kind of structured data
- 01:18:15to support our research,
- 01:18:16and some of it might
- 01:18:18not even be accurate.
- 01:18:20But we noticed that there
- 01:18:22is a lot of language
- 01:18:23information contained in the clinical
- 01:18:25notes. For example, many notes
- 01:18:27record what the patient
- 01:18:29speaks, what their preferred
- 01:18:30language is, and how well
- 01:18:32they speak it. So we might
- 01:18:34comprehensively
- 01:18:36extract the language-speaking
- 01:18:38status from all the clinical
- 01:18:40notes using NLP
- 01:18:43models.
- 01:18:43models.
- 01:18:44There are two targets that
- 01:18:46we want to extract. The
- 01:18:47first is what language does
- 01:18:49the patient speak and how
- 01:18:50well do they speak. So,
- 01:18:52for these two
- 01:18:54specific tasks,
- 01:18:55aims, we, like,
- 01:18:57could formulate the task as
- 01:18:59a task.
- 01:19:01The first the first thing
- 01:19:02we want to do is
- 01:19:03to identify all the language
- 01:19:05entities in the clinical notes,
- 01:19:06and then we could assign
- 01:19:08different tags based on different
- 01:19:10context
- 01:19:11to indicate different speaking status
- 01:19:13of the patient.
- 01:19:16Once we formulate the
- 01:19:18task as an NER task, we
- 01:19:20need to further refine the
- 01:19:22details of the task.
- 01:19:24We might need to review
- 01:19:26some of the clinical notes
- 01:19:27and design different tags for
- 01:19:30the task.
- 01:19:34After the data review, we
- 01:19:36designed four different tags for
- 01:19:38this task. The first is
- 01:19:39language-fluent, which indicates the
- 01:19:41patient speaks some
- 01:19:42language fluently.
- 01:19:44Then there is
- 01:19:46language-some, to indicate the patient
- 01:19:47speaks some of the language,
- 01:19:49and there are
- 01:19:50language-no and language-other.
- 01:19:52And here are some examples.
- 01:19:54For the first one,
- 01:19:56language-fluent,
- 01:19:57one of the sentences says
- 01:19:59that the patient speaks Italian
- 01:20:03primarily,
- 01:20:05so this indicates the
- 01:20:06person is fluent
- 01:20:10in Italian.
- 01:20:11And for language-some,
- 01:20:12here's the sentence: she speaks
- 01:20:14some English.
- 01:20:15For language-no, the patient does
- 01:20:17not speak English. And for
- 01:20:19language-other,
- 01:20:22when we were reviewing the data,
- 01:20:24we found a lot of
- 01:20:25language mentions referring to other
- 01:20:28individuals, for example the patient's
- 01:20:30family, or to
- 01:20:32written language. These kinds
- 01:20:34of mentions do not indicate
- 01:20:36the language-speaking status of
- 01:20:38the patient, so we categorize
- 01:20:39them as language-other.
- 01:20:42Once we refine the details
- 01:20:44of the NER task, we
- 01:20:45need to start
- 01:20:47preparing the data for model
- 01:20:49training and model evaluation.
- 01:20:52Here is the overall flow
- 01:20:54to prepare the data for
- 01:20:56model development.
- 01:20:58We first need to get
- 01:20:59some raw data and
- 01:21:01do the annotation with the
- 01:21:03annotation guideline that we developed
- 01:21:05before. And once we get
- 01:21:07the annotated results, we may
- 01:21:09need to design a
- 01:21:10prompt for this task.
- 01:21:12Some of the prompts have
- 01:21:14been discussed
- 01:21:16by Dr. Shui and Vipina.
- 01:21:18And once we get the
- 01:21:19prompts and all the annotated
- 01:21:21results, we need to process
- 01:21:22the results for the models
- 01:21:24to load, to start the
- 01:21:25training and evaluation.
- 01:21:28And,
- 01:21:29this is the annotation
- 01:21:30using Blue, which Yujia has
- 01:21:32mentioned before, so I'm just
- 01:21:33gonna skip this. And here
- 01:21:35is what the annotated results look
- 01:21:37like. Usually, we would have
- 01:21:39a JSON file for each
- 01:21:40input sample, and each
- 01:21:43sample looks like this.
- 01:21:45It records the file
- 01:21:47name, the original sentence,
- 01:21:49and also the entities and
- 01:21:51the positions of the entities that
- 01:21:54we marked.
- 01:21:56And here is the prompt;
- 01:21:58I'm just gonna skip this.
- 01:22:00And
- 01:22:01after we get all the
- 01:22:03files and the prompt, we
- 01:22:04need to process the data
- 01:22:05based on the different prompts. For
- 01:22:07example,
- 01:22:08our task is to
- 01:22:10annotate the text in
- 01:22:12the original sentence with HTML
- 01:22:15tags. So we
- 01:22:17process the input as
- 01:22:19the original sentence, and
- 01:22:20the target output
- 01:22:22would be
- 01:22:24the same sentence, but with
- 01:22:26all the entities
- 01:22:27wrapped in
- 01:22:30the language-fluent, language-some,
- 01:22:32and other tags,
- 01:22:34as HTML
- 01:22:35tags.
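A minimal sketch of that transformation, assuming the annotations provide character offsets as in the JSON files just shown (the function name and tag strings are illustrative):

```python
# Wrap annotated entity spans in HTML-style tags. Working from the
# end of the sentence backwards keeps earlier offsets valid while
# we insert tag text.
def wrap_entities(sentence, entities):
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        s, e, tag = ent["start"], ent["end"], ent["tag"]
        sentence = (sentence[:s]
                    + f"<{tag}>" + sentence[s:e] + f"</{tag}>"
                    + sentence[e:])
    return sentence

sent = "She speaks some English."
ents = [{"start": 16, "end": 23, "tag": "language_some"}]
print(wrap_entities(sent, ents))
```

The input to the model would then be the plain sentence, and the fine-tuning target would be this tagged version.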
- 01:22:37And this is also the
- 01:22:39data preparation for the models
- 01:22:41to load.
- 01:22:42I will show
- 01:22:44the code later, so you
- 01:22:45can just directly try the
- 01:22:47code to process it.
- 01:22:49So after we prepare the
- 01:22:51data, we now finally
- 01:22:54can start the fine-tuning
- 01:22:55process.
- 01:22:58So the fine-tuning
- 01:23:00process is actually
- 01:23:01a process of adjusting the
- 01:23:03weights of the large language
- 01:23:05model to make it adapt
- 01:23:07to our task-specific data.
- 01:23:10So, usually, we would need to
- 01:23:13adjust all the weights of
- 01:23:14the model, but we
- 01:23:16know that large language
- 01:23:17models have a lot
- 01:23:18of parameters. So
- 01:23:20full fine-tuning would
- 01:23:21have a very high
- 01:23:24computational
- 01:23:26cost. So instead, we
- 01:23:28use a
- 01:23:30widely used method, LoRA, to
- 01:23:32do the fine-tuning. LoRA
- 01:23:33is
- 01:23:35low-rank adaptation,
- 01:23:36which uses two small
- 01:23:39matrices, shown here
- 01:23:46in the green part.
- 01:23:48Instead of fine-tuning the
- 01:23:50entire large language model, we
- 01:23:52only need to adjust
- 01:23:54the parameters in these
- 01:23:55small matrices.
- 01:23:57So, compared with full
- 01:23:59fine-tuning, it is much
- 01:24:00faster, and we only need
- 01:24:02minimal training resources.
- 01:24:04But it needs
- 01:24:05high-quality datasets,
- 01:24:07so we need to
- 01:24:08define the task and
- 01:24:10develop the annotation guideline carefully.
- 01:24:13And it also has some
- 01:24:15risk of overfitting.
- 01:24:17As for the resources,
- 01:24:19for an eight-billion-parameter model,
- 01:24:21you might need one
- 01:24:22A100 or H100
- 01:24:23GPU, while for
- 01:24:25seventy-billion-parameter models, you might need
- 01:24:26two H100 GPUs
- 01:24:28to do the fine-tuning.
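The parameter savings can be worked out directly: for a weight matrix of shape (d, k), full fine-tuning updates all d times k entries, while a rank-r LoRA adapter trains only the two low-rank factors A (d by r) and B (r by k). A quick check with typical sizes (the dimensions and rank below are common choices, not the talk's exact settings):

```python
# Trainable parameters for one weight matrix of shape (d, k).
def full_params(d, k):
    return d * k          # every weight is updated

def lora_params(d, k, r):
    return r * (d + k)    # only factors A (d x r) and B (r x k)

d = k = 4096   # a typical transformer projection size (assumed)
r = 16         # a common LoRA rank (assumed)
print(full_params(d, k), lora_params(d, k, r))
print(lora_params(d, k, r) / full_params(d, k))  # trainable fraction
```

This is why a LoRA adapter ends up being a very small file relative to the backbone weights.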
- 01:24:30So, here is the environment
- 01:24:32setup. I also provide the
- 01:24:34code
- 01:24:36at the end, so you can
- 01:24:37check it for more details.
- 01:24:40And if we want to
- 01:24:41do the fine-tuning, we
- 01:24:43need to
- 01:24:45modify
- 01:24:46the config file in the
- 01:24:48code that I provided.
- 01:24:50The first setting
- 01:24:51indicates where the
- 01:24:53model is
- 01:24:54stored in the Camino or
- 01:24:55CHP environment.
- 01:24:57So,
- 01:24:58in the folder, it should
- 01:24:59look like this: it has
- 01:25:01the
- 01:25:02weights of the model and some
- 01:25:03details of the model.
- 01:25:05And for the data, we
- 01:25:07also need to provide
- 01:25:08the path of the data
- 01:25:09that we
- 01:25:10processed before, to tell the
- 01:25:12model where the data is.
- 01:25:14And
- 01:25:15besides the model and the
- 01:25:16data, we also need to
- 01:25:17set up some other
- 01:25:19configs. For example, the most
- 01:25:21important one might be the
- 01:25:23learning rate. You can
- 01:25:25adjust the learning rate based
- 01:25:26on
- 01:25:27the evaluation results of the
- 01:25:30trained model.
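As a hypothetical illustration of such a config (the keys and paths below are made up to mirror the settings described, not the actual config file's schema):

```python
# Illustrative config values: model path, data path, and training
# hyperparameters. All keys and paths here are invented examples.
config = {
    "model_path": "/path/to/backbone-model",        # where the weights live
    "train_data": "/path/to/processed/train.json",  # the processed data
    "learning_rate": 1e-4,  # the knob to revisit if evaluation drops
    "num_epochs": 3,
    "lora_rank": 16,
}
print(config["learning_rate"])
```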
- 01:25:32And finally, we
- 01:25:35can start the
- 01:25:36model's fine-tuning.
- 01:25:38The fine-tuning process is
- 01:25:40actually very easy. Once we
- 01:25:42finish the config file, we
- 01:25:43can just start the
- 01:25:45fine-tuning with only one line
- 01:25:47of command.
- 01:25:48So after fine-tuning, we
- 01:25:49will get the
- 01:25:52adapter
- 01:25:54parameters, which is a very
- 01:25:55small file.
- 01:25:57So after we get this
- 01:25:58LoRA adapter, we need to
- 01:26:00combine the adapter with the
- 01:26:01original backbone model to form
- 01:26:04our own customized model.
- 01:26:07Once we get our customized
- 01:26:09model, we need to
- 01:26:10test the model to see
- 01:26:12if the performance actually improved
- 01:26:14compared with the backbone model.
- 01:26:16So we need to do
- 01:26:17inference on the test
- 01:26:18data.
- 01:26:19And here is an
- 01:26:21example of how to set up
- 01:26:23the environment,
- 01:26:24and this is an example
- 01:26:25of how to do
- 01:26:27the inference on the test
- 01:26:28data to get
- 01:26:29the results.
- 01:26:32This also sets
- 01:26:34up all the inference
- 01:26:36configs.
- 01:26:38For example, max
- 01:26:39tokens indicates how long you
- 01:26:42expect the model's output to be,
- 01:26:44and the stop token EOS
- 01:26:47means that once the model
- 01:26:50generates the EOS token,
- 01:26:52it finishes
- 01:26:53the generation
- 01:26:54instead of generating, you know,
- 01:26:56five hundred tokens.
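The stopping behavior can be sketched as a toy generation loop (the token stream here is a hand-made stand-in for real model output, and the EOS string is illustrative):

```python
# Two stopping rules: stop at the EOS token, or at the max-token
# budget, whichever is reached first.
EOS = "<eos>"

def generate(stream, max_tokens):
    out = []
    for tok in stream:
        if tok == EOS or len(out) >= max_tokens:
            break
        out.append(tok)
    return out

stream = ["<language_some>", "English", "</language_some>", EOS, "junk"]
print(generate(stream, max_tokens=500))  # stops at EOS, well before 500
```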
- 01:26:59So once we get the
- 01:27:01inference results, we can evaluate
- 01:27:03the performance
- 01:27:04of the model and compare
- 01:27:06it with the performance of
- 01:27:07the backbone model.
- 01:27:08And here are some evaluation
- 01:27:10metrics that Yujia
- 01:27:11introduced before, so I'm
- 01:27:13just gonna skip this.
- 01:27:15And I also provide some
- 01:27:16scripts
- 01:27:17for the
- 01:27:18evaluation. You can
- 01:27:20refer to the code that
- 01:27:21I provided for more details.
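The per-tag F1 score used in the comparison can be sketched as follows (the counts below are made up for illustration):

```python
# F1 from entity-level counts: precision = TP/(TP+FP),
# recall = TP/(TP+FN), F1 = their harmonic mean.
def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one tag, e.g. language-fluent.
score = f1(tp=90, fp=10, fn=10)
print(round(score, 3))
```

Computing this per tag for both the fine-tuned model and the backbone gives the comparison shown on the results slide.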
- 01:27:23And here are the
- 01:27:25fine-tuning results that we
- 01:27:27got after we did
- 01:27:30the fine-tuning with the eight
- 01:27:31hundred samples that we annotated
- 01:27:33before.
- 01:27:34"Fine-tuned" means that
- 01:27:36we used the LLaMA 3
- 01:27:38seventy-billion instruct model as the
- 01:27:40backbone model to do the
- 01:27:42fine-tuning.
- 01:27:43So compared with the backbone
- 01:27:44model, we see that for
- 01:27:46every tag (language fluent, some,
- 01:27:48no, and other), the
- 01:27:49F1 scores all
- 01:27:51actually improved.
- 01:27:53So
- 01:27:54in this case, we can
- 01:27:55say that we have
- 01:27:56an effective fine-tuned
- 01:27:58customized model.
- 01:28:01And if we find that
- 01:28:02the customized model's performance dropped,
- 01:28:04we need to go back
- 01:28:06to the training process to
- 01:28:07retrain the model,
- 01:28:12iteratively
- 01:28:13checking the performance to see
- 01:28:14if there is any
- 01:28:16gain.
- 01:28:17And here are the code and
- 01:28:19data that we provide, with
- 01:28:21more details.
- 01:28:23And if you have any
- 01:28:25questions, you
- 01:28:28can leave comments or directly
- 01:28:30send me an email.
- 01:28:31Oh, okay. Thank you so
- 01:28:33much.