
4-29-25 Workshop Session 2

June 04, 2025
ID 13191

Transcript

[00:01] So in the previous session, I think we mainly focused on showing you how to access the data and how to request computing resources. Right? In this next session, what we will do is more about technical details: how you can call existing large language models in the CHP safe environment, or, if you want to build your own customized large language model by leveraging an existing model and using your own data, how you can train that model in the environment. So that's what will happen in the next session. I will provide a very brief overview of large language models, then a number of speakers will jump in to show the different tools that are available in the CHP safe environment which you can use. Okay? For example, if you want to annotate data, we have a tool. If you want to train a model or call an existing API like the Kiwi tool: how to call it, and how to use your own data to train a model. So that's what will happen next. Let me just do a quick introduction to large language models. Because we are running out of time, I'm trying to shrink, to reduce my session.
[01:32] I think I'm going to skip this. I was planning to give a short history of AI. Basically, what I'm trying to say is that this did not just start now; we have been doing this for a long time. But this wave of generative AI is slightly different from the previous ones in a few ways: how the models are trained, the fact that they are much bigger than previous models, the focus on generation tasks rather than prediction or analyzing data, and the heavy reliance on GPUs.
[02:04] And one thing I really want to mention, because this is a large language model session: I'll give a brief history of language models. Language models have been around for a while, starting in the 1960s, essentially as probabilistic models that, given a sequence of words, try to predict the next word. Okay. Then later on, neural language models showed very good performance, but they suffered from computational efficiency problems and other issues, so they didn't really scale up. Until, in 2017, the transformer model was proposed; together with the abundance of GPUs available, that made it possible to train neural language models with a lot of data. Then we moved to pretrained language models. At that time, models like BERT, which you have probably heard a lot about, showed good performance when pretrained on a reasonable amount of text. But then in 2022 we really moved to what we call large language models. Basically, a large language model is a transformer-based pretrained language model, but trained on a lot of data. And the rationale behind this is the emergent phenomenon of large language models: people found that when you train on a lot of data, the model can not just do one thing with reasonable performance, it can actually do a lot of different tasks, all with reasonable performance. That's what we call the emergent phenomenon of large language models. Suddenly, it becomes very smart.
[03:38] And that led to a lot of development of all those large language models: open-source models like LLaMA and DeepSeek, which you have probably heard of, which give you all the weights, so you can actually use your own data to continue pretraining or to fine-tune. And commercial models like GPT: they used to be closed models, so you could not really fine-tune them. But now GPT also has a service where you can upload data, do some fine-tuning on their side, and then host the fine-tuned model on the GPT side; you still have to pay every time you call it. Okay? Then there are also different architectures, encoder versus decoder. And the trend is also moving more towards multimodal large language models: instead of just a text-based model, it's text plus images, text plus genomic data, and all those things. There is a lot going on.
[04:29] And in particular, in the NLP world, especially the biomedical NLP world, we often focus on one NLP task called information extraction. The idea is that in clinical data there is a lot of unstructured text; for example, a text document with a lot of details. So the task of information extraction is: given this document, can you extract all the disease information about the patient out of it? I would say this covers about seventy to eighty percent of the requirements for a lot of EHR-based analyses, both for practical clinical use and for clinical research. So almost all the work we show here today is on this information extraction task, and it can be further divided into three subtasks.

[05:26] The first one is called named entity recognition. The idea is that, given a document, the system needs to recognize that "MRI of the abdomen" is a test; you need to know both the type (it is a test) and the boundary. And you need to know that "June 18, 2008" is a temporal expression, what type of entity it is and what its boundary is. That's the NER task. The second one we call relation extraction. What do you need to know? You want to know that "June 18, 2008" is a modifier of the MRI. Right? That's a relation between those two entities, so it's very important to recognize the context of that clinical entity. And the third one is called concept normalization. If you read the note, what you see is "renal cell carcinoma". But if you want to build a clinical decision support system, this entity needs to be coded to a concept in a medical terminology; it could be ICD-10, it could be SNOMED. Right? So you want to normalize this detected entity to a term in a standard vocabulary. And as you can see, it's actually not straightforward, because you see "renal cell carcinoma", but the term in the terminology is actually "malignant neoplasm of kidney", so you need this kind of mapping. Okay? So today, most of our work will show how we do those three tasks and build systems to extract information out of the text.
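As a minimal illustration of how these three subtasks fit together, the sketch below shows one possible structured record for the examples above; the field names and the "modifier_of" relation label are assumptions for illustration, not the actual output format of any system shown here.

```python
# Illustrative sketch: a structured record combining the three
# information-extraction subtasks (NER, relation extraction, normalization).
record = {
    "entities": [
        {"id": "e1", "text": "MRI of the abdomen", "type": "test"},
        {"id": "e2", "text": "June 18, 2008", "type": "temporal_expression"},
        {"id": "e3", "text": "renal cell carcinoma", "type": "problem"},
    ],
    # relation extraction: the date modifies the test
    "relations": [{"head": "e2", "tail": "e1", "type": "modifier_of"}],
    # concept normalization: map the detected mention to a standard vocabulary
    "normalizations": [{"entity": "e3", "vocabulary": "SNOMED CT",
                        "concept": "malignant neoplasm of kidney"}],
}
```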
[07:01] I'll primarily talk about three different approaches. I'll skip those; I have a few slides about the history. Around 2000 or so, we mainly worked on rule-based systems: you have a dictionary, and you try to look up all the diseases from the dictionary. Okay? Then around 2010, we annotated corpora and started to do machine learning. What happens is, if you have "packed red blood cells" as an entity, the beginning token of the entity is labeled B, intermediate tokens of the entity are labeled I, and all tokens outside any entity are labeled O. You convert this into a sequence labeling task: you label each word as B, I, or O, so it becomes a machine learning task. Okay? That's what we were doing around that time.
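A minimal sketch of that BIO conversion is below, assuming whitespace tokenization and character-offset annotations (both simplifying assumptions for illustration):

```python
# Sketch: convert a character-span entity annotation into BIO token labels.
# Assumes whitespace tokenization; real systems use proper tokenizers.
def to_bio(text, entities):
    """entities: list of (start, end, type) character spans, end exclusive."""
    labels, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        label = "O"
        for (e_start, e_end, e_type) in entities:
            if start >= e_start and end <= e_end:
                label = ("B-" if start == e_start else "I-") + e_type
                break
        labels.append((token, label))
    return labels

text = "Transfused one unit of packed red blood cells today"
print(to_bio(text, [(23, 45, "treatment")]))
# [('Transfused', 'O'), ('one', 'O'), ('unit', 'O'), ('of', 'O'),
#  ('packed', 'B-treatment'), ('red', 'I-treatment'),
#  ('blood', 'I-treatment'), ('cells', 'I-treatment'), ('today', 'O')]
```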
[07:49] And then, around 2020, it was more about deep learning, which I think many of you have heard about. At that time, we were looking at contextual embeddings like the BERT model: we actually fine-tuned BERT models from the open domain with clinical data and showed the performance. This slide is just a summary; all you want to know is that, moving from rule-based systems to machine learning to deep learning, the performance kept getting better. And now we move to large language models: how can you use a large language model to do this information extraction task? I'll give you three examples, three different approaches we have worked on. I think, potentially, if you're going to work on your own task, those are the three approaches you can take.
[08:39] The first one you probably all know: you have GPT over there, and all you need to do is write a prompt. Right? You give it the document and tell GPT what you want. That's what we did as a first experiment here. Basically, we gave GPT-3.5 and GPT-4, at that time, the task of extracting medical problems, treatments, and tests out of clinical notes. The main exercise here is really about the prompt, so we tried different prompting strategies. You define the task and define the output. You also need to tell the model the definition of a medical problem, and then you can also give guidelines. For example, because earlier I also showed you what a boundary is, you may say that the entity has to be a noun phrase; you give that kind of guideline. Then you can also give additional examples: here's the sentence, here's the entity I want to extract. That's what we call few-shot learning; if you give three examples, that's few-shot learning. Right?
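A minimal sketch of such a prompt, assembled in Python, is below; the wording, definitions, guideline, and example are illustrative assumptions, not the exact prompt used in the experiment described here.

```python
# Sketch of a few-shot NER prompt for a chat-style LLM API.
prompt_template = """Task: Extract all medical problems, treatments, and
tests from the clinical note below.

Output format: repeat the input text, wrapping each entity in
<span class="TYPE">...</span> tags, where TYPE is problem, treatment, or test.

Definitions:
- problem: a disease, symptom, or abnormal finding about the patient.
- treatment: a procedure, medication, or therapy addressing a problem.
- test: a diagnostic or laboratory procedure.

Guideline: annotate complete noun phrases only, not partial phrases.

Example:
Input: At the time of admission, he denied fever.
Output: At the time of admission, he denied
<span class="problem">fever</span>.

Input: {note}
Output:"""

note = "MRI of the abdomen was performed on June 18, 2008."
print(prompt_template.format(note=note))  # send this to the model of choice
```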
[09:42] And we tested all of those: we made a framework for the prompts, and we evaluated on the annotated corpus. We showed that, actually, if you have a lot of annotated data, then the BERT model, the previous deep-learning-based approach, still works better; the zero-shot performance of GPT is not as good as the BERT model when you have a lot of annotated data. That's what we found at that time. But it's actually close, because the GPT-4 model can reach about 86, versus about 90 for the BERT model trained on hundreds of samples, in relaxed matching. Relaxed matching means the entity you predicted and the entity you annotated overlap but are not exactly the same.
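A minimal sketch of the exact-versus-relaxed matching criterion over character spans (the tuple representation is an assumption for illustration):

```python
# Sketch: exact vs. relaxed span matching for NER evaluation.
# An entity is (start, end, type); start/end are character offsets.
def exact_match(pred, gold):
    return pred == gold  # same type AND identical boundaries

def relaxed_match(pred, gold):
    (ps, pe, pt), (gs, ge, gt) = pred, gold
    return pt == gt and ps < ge and gs < pe  # same type, overlapping spans

gold = (23, 45, "problem")
print(exact_match((23, 45, "problem"), gold))    # True
print(relaxed_match((27, 45, "problem"), gold))  # True: overlap, same type
print(relaxed_match((27, 45, "test"), gold))     # False: type differs
```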
[10:27] I'll skip this one, and I'll skip those two. The second exercise we did: later, LLaMA came out. You have all the weights, like I said, so you can actually fine-tune those models with your additional data. That's what we did here. We were working on the same task, extracting medical problems, treatments, and tests, but now we have the open-source LLaMA model, and we used annotated data from a local corpus to fine-tune the LLaMA model for this task. We did this through the instruction tuning approach. I think we're going to talk more about this later, so I'm not going to repeat it: basically, you convert the annotated data, which I showed, into an instruction dataset, then you fine-tune the LLaMA model to change the weights for this specific task.
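A minimal sketch of one record of such an instruction dataset, built from an annotated sentence; the instruction/input/output field names follow a common convention and are an assumption here, not the exact schema used in this work.

```python
# Sketch: one instruction-tuning record derived from an annotated sentence.
example = {
    "instruction": "Mark up all medical problems in the sentence using "
                   '<span class="problem">...</span> tags.',
    "input": "At the time of admission, he denied fever.",
    "output": 'At the time of admission, he denied '
              '<span class="problem">fever</span>.',
}
# Thousands of such records, converted from the annotated corpus,
# make up the instruction dataset used for fine-tuning.
```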
[11:19] And what we found here is that when you have a lot of annotated data, the large language model, LLaMA 3, is almost the same as the BERT model, just slightly better. This comparison is more fair, because both models used those hundreds of annotated samples for training. Okay? But the last dataset is an unseen dataset for both models, and there you can see that the LLaMA model has much better performance than the BERT model. That indicates the LLaMA model is actually more generalizable, because it has almost an eight percent improvement: the BERT model is around 79, and here LLaMA is at 87. So we have now started to build this out as a LLaMA-based information extraction system. But one thing I want to point out, at least when we tested LLaMA 3: speed is an issue. The BERT model takes 0.2 seconds to do named entity recognition for one document; LLaMA 3 took 39 seconds. So if you're processing millions of notes, that's another concern. There are a lot of other issues, in addition to performance, that you want to consider. That's what I wanted to bring up.
[12:38] So, the third approach. Think about the first approach, prompting: you don't really need much GPU there, it just costs money to call GPT. The second is fine-tuning: you do need a GPU machine to load the model and fine-tune it, which may take a couple of hours to a couple of days. But this one we call continued pretraining of the LLaMA model. You use a lot of clinical data, like all the notes and all the literature; we combined about 129 billion tokens to continue pretraining the LLaMA model. It took 150 GPUs running for a month; that would be a lot of money if you went to Amazon. As you can see, the computational cost of training this model is much bigger than the previous approaches. But the benefit is that the model becomes more generalizable: it can work on multiple clinical NLP tasks. That's what we call the Me-LLaMA model. We trained on LLaMA 2 and showed better performance on multiple tasks, not just entity recognition but also question answering, inference, and other tasks. I'm just going to stop here.
[13:54] So in summary, just to quickly go over what we have learned so far. Basically, when you try to extract information out of notes using a large language model, you should still ask: do you really need a large language model? If the task is simple, sometimes even regular expressions or a rule-based approach still work. Also, if you already have a lot of annotated data, then a deep learning model like BERT still works well, and it costs less in terms of computational effort. Then, if you think a large language model does help for your specific task, you also want to consider: should I train my own large language model based on an open-source model, or should I go with GPT? Right? There are a lot of concerns: in addition to performance, you also have to think about the cost, the GPU requirements, whether you have GPUs locally, and all those issues.
[14:54] So in the next three to four presentations, we will basically talk about several tools available in the CHP safe environment which will allow you to do this kind of work. The first tool we will talk about is an annotation tool. A lot of people don't pay much attention to annotation, but if you really look at all the model training, even in the era of large language models, you still need to do some annotation, even if just for validation and evaluation, and then you need a tool to do that. We have a tool installed on the CHP for that purpose. For the second one, we will show you a tool we already fine-tuned and made available on the CHP; you can just call it. I think Nate also talked about the services. The third one goes deeper: I think Lingfue is going to talk about, if you have your own data to start with, how you can fine-tune that model with your own data on the CHP. So let them start. Do you want to go ahead? Start with the annotation. Just watch the time; maybe go a little fast.
[16:15] Hi, everyone. Today I'm just going to go through why we need annotation and then introduce our annotation tool, BLUE.

[16:26] Annotation is the process of labeling data: marking spans of text, images, or other content with additional information such as entity types, categories, or relationships. As Dr. Xu showed in the earlier figure, there are annotated entities and also annotated relationships between those entities. Annotation is critically important because it serves as the foundation for machine learning and deep learning models: these models rely heavily on annotated datasets to learn meaningful patterns and then make accurate predictions. Annotation remains important for large language models: although LLMs are highly capable, they still depend on annotated data for fine-tuning for specific tasks and for evaluation against the ground truth.

[17:28] Here you can see a table comparing performance between multiple models, including the LLaMA 3 variants and a fine-tuned LLaMA model, on the annotation task. As seen in the results, the fine-tuned model, which was trained on our well-annotated dataset, tends to perform better on the specific targeted project.
[18:04] There are several key topics related to annotation. The process always begins with developing a clear and detailed annotation guideline. A well-developed guideline improves consistency among annotators and leads to higher annotation quality; it also speeds up the onboarding of new annotators and makes conflict resolution easier. Once the guideline is developed, the next step is to select annotators with appropriate domain knowledge and train them thoroughly on the guideline. After the training, it is important to continuously check and monitor annotation quality. This includes checking agreement among annotators, holding discussions to resolve disagreements, and refining the guideline based on common errors or ambiguities identified during the process. Later, I will also introduce the annotation tool that can support and streamline the annotation workflow.
[19:17] For annotation guideline development, the first step is always to define the goal of your project: clearly state what you are trying to achieve. Next, provide clear definitions for all the concepts, such as entities, relations, or any special terms. After that, develop detailed annotation rules that cover the majority of scenarios and edge cases to minimize ambiguity. It is also essential to include many real-world examples in the guideline, illustrating both correct annotations and common errors. Guideline development is not a one-time effort; it is an iterative process. It is important to involve both domain experts and linguists or informaticians to ensure both technical accuracy and practical usability. After the initial guideline is created, it should be refined during annotator training and tested on real-world data. Given the variability and complexity of real-world data, new scenarios will inevitably arise and may require further guideline updates. Once the guideline is stable and robust, the process can move to corpus finalization.
[20:51] Here you can see an example of an annotation guideline. The goal of this guideline is to identify meaningful clinical concepts from patient medical records and to help extract information such as tests, problems, drugs, and treatments. As shown on the left side, we provide detailed definitions to ensure the annotators understand what should be labeled. In this guideline we also introduced modifiers, a concept that complements an entity and extends its meaning. For each modifier, such as severity or body location, the guideline also needs to be specific about how it should be annotated and what its relationship with the entities is. Additionally, the guideline needs to include many real-world examples, like the diagram shown at the bottom, to illustrate correct annotation practice; and for ambiguous phrases or tricky scenarios, examples also need to be provided to establish clear, consistent rules for annotators to follow.
[22:12] It is important to choose annotators with the proper background for your task. Depending on the complexity, you might need domain experts like physicians, nurses, or medical students, or just laypersons for more general and broad annotation. Training annotators is an iterative process: annotators should be trained and evaluated multiple times until they achieve the expected level of performance. Quality checking needs to be ongoing throughout the annotation process: regular review of their work is always needed, and you also need to provide timely feedback and sometimes additional retraining for the annotators.
[23:07] When managing a project with multiple annotators, several important steps must be taken to ensure quality. Before starting the actual annotation, train each annotator thoroughly to ensure they can produce consistent and reliable annotation results that align with the guideline you developed. If resources allow, implement a double-annotation strategy: ideally, each sample is annotated by two annotators independently, and then a third, more experienced annotator reviews any discrepancies and makes the final decision. This process helps maintain high annotation quality. If double annotation of the entire dataset is not feasible, assign small overlapping subsets of the data to multiple annotators. This overlap allows you to calculate inter-annotator agreement and provides a way to monitor and maintain annotation quality.
[24:20] When checking annotation quality for the NER task, we focus on two main areas: entity type agreement and entity span agreement. For entity type agreement, we verify whether annotators assign the same type to an entity. You can see in this figure that one of them annotated "Vancomycin HCl" as a drug and the other annotated it as a treatment. This mismatch needs to be discussed during the annotation process and then corrected for the final version. For entity span agreement, we check whether both annotators selected the same portion of text. In the same example, one labeled "a lot of emotional stress" as a problem and the other annotated just "emotional stress". When such mismatches occur, it is important to refer back to the guideline and determine which is the correct one to move forward with.
[25:40] When checking annotation quality for relation extraction, there are three main aspects we need to evaluate. The first is relation type agreement: we check whether both annotators assign the same type of relation between entities. Then we evaluate the entity pair: we verify whether the same entities are being linked by the relation. And finally, we need to check the directionality, which is important for some tasks because the direction may change the meaning.
[26:19] To evaluate agreement, there are several metrics we can use. The common ones are precision, recall, and the F1 measure, which help quantify how consistently annotators identify and classify entities. Additionally, we can use statistical measures such as Cohen's kappa. Another important method is self-train and self-test: by training a model on the annotated dataset and then testing on the same dataset, we can check whether the model achieves high performance. If the performance is low, it may indicate underlying issues with annotation inconsistency or quality.
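A minimal sketch of computing Cohen's kappa over two annotators' entity-type labels, assuming the labels have already been aligned to the same set of mentions (that alignment step is an assumption here):

```python
# Sketch: Cohen's kappa for two annotators' labels on the same items.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: probability both pick the same label independently
    expected = sum(freq_a[lab] / n * freq_b[lab] / n for lab in freq_a)
    return (observed - expected) / (1 - expected)

a = ["drug", "problem", "drug", "treatment", "problem"]
b = ["drug", "problem", "treatment", "treatment", "problem"]
print(round(cohens_kappa(a, b), 3))  # 0.706: agreement corrected for chance
```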
[27:09] Here are some examples of widely used annotation tools: MedTator, eHOST, and Doccano. All of these tools are open source and available on GitHub. Today, I'm going to introduce the annotation tool BLUE, which is deployed in the CHP environment, so individual users don't need to install it themselves and it can be managed by an admin user.
[27:42] For the BLUE tool, there are several prerequisites for access. First, you will need a YNHH account and a connection to the VPN. Mac users will need to install the Windows App, and Windows users can use the Remote Desktop Connection application.

[28:05] The first step is to connect to the VPN: open the VPN application and, in the address field, type telecommute.ynhh.org\ysm. Here you need to use your Yale NetID and password to log in. Once you have successfully logged in to the VPN, open the remote desktop application, click the add button, and add the PC's IP address: 10.48.128.96.69. Once the PC is successfully added, it will show in the application; double-click it to enter your credentials. Here you will need your YNHH ID and YNHH password. Once you log in to the PC, you will see an Ubuntu environment. After you get access to that environment, you can use any browser on the left side and, in the address bar, enter the URL http://localhost to open the annotation tool.
[29:28] The first step is to create your account. You always want to have an admin person who creates an account first; that is the person who can manage the whole group and assign projects and tasks to each annotator. Please use your email, username, and password to sign up. For the verification code field, we disabled that function, so you can just enter any four-digit number or combination of characters. After you log in to BLUE, you will be able to create projects and invite users to the tool. Once the admin has successfully logged in, he or she can send invitations to the other group members: the admin needs to click the invite button and then copy the invitation link to each of the annotators. The annotators need to use this link to register; otherwise, they will not be in the same group.
[30:41] By clicking the add new project button, you will be able to choose your task: either NER or NER plus relation extraction. The created project will show on the front page, and then you will be able to add annotators to the project. To add a data source, click the data source button and then choose what format you want to upload to the tool. We accept two formats. One is txt: plain text without any pre-annotated entities or relationships. You can also choose the BLUE format, which is a JSON file in which you can include the entities or relationships. For each project, you can create tasks for the annotators by clicking the add task button, and for each task you can assign multiple annotators, just as I mentioned before: different annotators can annotate the same subgroup of data in order to calculate agreement among annotators.
[32:12] After a task is created, you will be able to start annotation. First, you need to define the entities and relationships that you already have in your annotation guideline. After that, highlight the phrase you want to annotate and choose which entity or relationship type you want to assign. The BLUE tool also provides a function to calculate agreement among annotators: once the annotators finish the task and finalize it, you can just use the button to check the agreement among them. It will give you an F1 score for both entities and relationships. Now I will give a quick demo of the process.
[34:18] Okay. As I mentioned, you just connect to the VPN and type the password. Then open the Windows App and click into the server we have. Then open the browser. Here, you can sign in to your account. To create a new project, you just click this button, type the project name, and select the project type. Here, I have already created a demo project. I want to import the data source, so I just click this button; I downloaded some notes I need to annotate from the CHP environment, and I drag them in here. This is a txt file, so I just choose text and confirm. For each project you can add annotators, that is, the people within your group, and then you create annotation tasks for them; you can choose multiple annotators here. Then you go to the file. On this side, you can define the entities; for example, we want to define "problem". And then you can start annotating. Yeah, basically that's the whole process for how you do the annotation and how to use our tool.
[37:06] Yeah. Any questions?

[37:09] "How can we import our own data into this? Because I think this is your server, right?"

[37:17] As the team also mentioned, we use the CHP environment. Right? With Camino, you can upload your own data to that environment, and this server connects to Camino, so you can download that data from the Camino environment.

[37:47] Not yet. Right now, this tool is hosted in a secure environment because there is a lot of PHI information; that's the purpose of hosting it there.

[38:00] "So, for example, say there are other, publicly available datasets we really want to annotate. Would we be able to ask to upload them to this server?"

[38:18] Yes. Well, one thing: if you're trying to annotate public data, don't use this one. What we can do is set up BLUE on an open, public website; then you can just go over there and upload, because there's no sensitive data involved. We can just make another instance of BLUE for public data. This one we installed on Camino, in the CHP, to support annotation of protected data. For public data, since it's a web application, we can just set up another web application in a public space. Yeah, we can discuss that.

[38:56] "And should we ask you to set up that specific public instance, or is one available?"

[39:05] We have not, but you can contact us; maybe we can just give you a copy, or we can set it up ourselves. Right now we haven't really distributed this package; we just set it up for our own use.

[39:35] It's just a different tool. Yeah.
[39:51] Thanks, Silja. I'm going to be very quick; machine-gun mode on. Okay. Dr. Xu already discussed the difference between BERT and LLaMA. To summarize everything: there is a trade-off between performance, computational resources, and time. If you need better performance and the computational resources are there, go for the LLaMA models, the high-billion-parameter models; across a wide variety of tasks, they work well. But if time is a concern, he presented the issue of speed between BERT models and large language models: up to twenty to thirty times slower. If that is a concern, you need to switch to BERT models. So I'm going to talk about a clinical information extraction system where we have developed both BERT-based and LLaMA-based large language models for you, in such a way that whether you have no programming experience, some programming experience, or you are a pro programmer, we have features that will help you take it and customize it to whatever task you want to use it for. And that is what we call Kiwi.
[41:09] Okay? So we are building Kiwi. The one pipeline that I'm currently going to show you, set up for all these sorts of use cases that I'm talking about, is a general clinical information extraction pipeline. I also have things coming up for you, and if you have suggestions, or something you have really been working on that is a real need of the time, let us know, and we will work on developing those things. Okay. We have the clinical notes; we need to do some preprocessing, de-identification, those sorts of things. Dr. Xu mentioned named entity recognition, followed by relation extraction; then there is concept mapping, or concept normalization. Finally, you post-process it and get all the structured data out of the unstructured clinical notes. That is the basic block diagram of any clinical information extraction pipeline.
[42:06] I don't need to go over this again: named entity recognition identifies the boundaries; relation extraction identifies the relationships between the entities; and normalization, because doctors write the same thing in a hundred different ways. High BP, hypertension: all of these are the same. Right? So you need to map them to a standardized vocabulary or terminology like ICD or SNOMED; that is what concept normalization does. All three of these come together; that is where you take unstructured data and get your structured output out of it.
[42:37] Okay. What does our general clinical information extraction pipeline give you? We mainly focused on four main entities: medical problem, treatment, drug, and test. Our Kiwi tool will give you all four of these main entity types, but these entities are not just by themselves. Right? When you are talking about a drug, you have things like the strength, the dosage, the duration, the route; all of these are important. And we need to connect that specific drug to that specific route or strength or dosage to actually identify what the doctor has written about giving that medication to the patient. So we have a set of main entities, and we have a set of modifiers that correspond to those main entities. Altogether, this is what Kiwi is going to extract for you. I know some of the things you may need might be missing from this, but if there are other cases you would like to extract, we may think about incorporating them in the future. So for a medical problem, you have the severity, the condition, the uncertainty, who the subject is, whether it is really talking about the patient or their family, because we see all of these things appearing in the notes, and whether that particular problem is negated or not. So that is how it is: we have the four main entities and all these modifiers.
[44:08] Going very briefly: YuJa mentioned annotation. When you annotate, the top figure is what you get. Now suppose you are using a large language model: it understands the language of prompts. Right? And Dr. Xu covered how to write a proper prompt for named entity recognition. You define the task: we want to identify medical problems, treatments, tests, and other things. Then you specify how you need the output; that is to make your programming life easy, getting the output in a particular format so that you can convert it and evaluate it fast. That is the output format guideline. Then you define each entity, because we have developed annotation guidelines, and that is how the humans actually annotate; the model should also know how the humans have annotated. Otherwise, how do you compare the gold-standard human-annotated data with what the model is giving? So whatever information you give the human annotators, you also give to the model in the form of entity definitions. And then the annotation guidelines: annotate only complete noun phrases, not partial ones, complete adjective phrases; those sorts of rules from the annotation guideline you developed are also provided to the model.

[45:25] Then you build your training data by showing the model a bunch of examples. Suppose your input is "At the time of admission, he denied fever, dysphoria," whatever it is. How should the model provide the output? It should output <span class="problem">fever</span>. That is telling the model: fever is a problem; whenever you see a medical problem, put it between the HTML tags, span class equals problem as the opening tag and slash span as the closing tag. We did that for our convenience, because we were comparing with BERT and other models. You can have the output in whatever form you want: you can use JSON format, or if you just want plain text for question answering and things like that, you can have the output that way. But at least for named entity recognition and relation extraction, this really helps us. Another thing is that this also prevents, or helps us detect, hallucination by the model: you are giving the input sentence and also telling the model to repeat the same sentence with some tags attached, so you can compare your input and your output to see that the model is not inserting entities or things that were not already there in the original sentence.
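A minimal sketch of that faithfulness check: strip the span tags from the model output and verify that the remaining text matches the input. The tag format follows the span-class convention described above; the function itself is an illustrative assumption.

```python
# Sketch: verify the model echoed the input faithfully, i.e. the output
# with <span class="...">...</span> tags removed equals the original input.
import re

TAG = re.compile(r"</?span[^>]*>")

def faithful(input_text, output_text):
    return TAG.sub("", output_text).strip() == input_text.strip()

inp = "At the time of admission, he denied fever."
out = 'At the time of admission, he denied <span class="problem">fever</span>.'
print(faithful(inp, out))  # True: no hallucinated insertions or deletions
```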
[46:42] Okay. So that is how you create a prompt and do NER with large language models. The next step is relation extraction. There is a particular drug, and you need to associate its strength, its route, its form, its frequency, everything, and connect that particular drug to whatever is mentioned for it. Right? So for relation extraction, you need to slightly modify your prompt. Here we are saying: your task is to mark up modifier entities when given a main entity. How do we train the model for this task? We show the model the main entity in the input text, marked as <span class="drug">...</span>. Then you ask the model: given this main entity, what are the modifier entities associated with it? And then you give examples in the output, where the main entity is no longer annotated inside span tags, whereas you see that "0.35 mg" is within <span class="strength">...</span>. So given a drug, when the model repeatedly sees things like 0.5 milligrams or mcg, it is actually learning that this is the strength associated with that particular main entity. A lot of examples annotated like this is what helps the model learn.

[48:07] Again, this is another example of the same sort. "His blood pressure on discharge was 126/63. Heart rate is 80." You cannot say blood pressure is 80. Right? It is the same sentence with two values and two tests; you need to correctly associate blood pressure with 126/63 and heart rate with 80. So when we give the input with blood pressure marked as the entity, its value should be 126/63; if we highlight heart rate as the entity, then the value should be 80.
[48:44] So, again: we originally had the annotated data, and we converted it into the instruction format I have been showing for named entity recognition. This is an instruction demonstration sample, and the one you see at the bottom is for relation extraction. The entire dataset converted into such records is what you collectively call your instruction dataset. Previously we had annotated datasets for the other models; this is just slightly different: the term is "instruction dataset" because the dataset is composed of a bunch of instructions, or prompts, with input and output examples. To instruction fine-tune a large language model, all you need is such an instruction dataset specific to your task and a base large language model like LLaMA 2, LLaMA 3, LLaMA 4, or whatever it is. Then you train the model, and finally you get an instruction-tuned large language model. So if a LLaMA model is your base, you will get an instruction-tuned LLaMA, but one that is actually adapted for those tasks. When you just take the originally available LLaMA model, it is a generally trained model; it is not domain-adapted for your specific task. By fine-tuning a large language model, what you are doing is leaning its capabilities much more toward whatever task you want to perform, by showing it a lot of such examples and modifying its weights in such a way that it adapts to that specific task. If you now go and test that particular model back on some general task, it might not perform the way it previously did, because you have changed the model weights and adapted it to that specific task. Okay. So this is basically fine-tuning, and then you would evaluate the model.
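A minimal sketch of such instruction fine-tuning with Hugging Face transformers and peft (LoRA); the checkpoint name, dataset path, and hyperparameters are assumptions for illustration, not the settings actually used in this work.

```python
# Sketch: LoRA instruction fine-tuning of a base causal language model.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"        # any local base checkpoint works
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(AutoModelForCausalLM.from_pretrained(base),
                       LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

def format_example(ex):
    # concatenate instruction, input, and output into one training sequence
    text = f"{ex['instruction']}\n{ex['input']}\n{ex['output']}"
    return tokenizer(text, truncation=True, max_length=1024)

data = load_dataset("json", data_files="instructions.jsonl")["train"]
data = data.map(format_example, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```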
[49:49] Now, going back: I said we also had the BERT-based models. There, too, you annotate the dataset as was just shown, but where for a large language model you would give a prompt, for BERT you convert the data differently. The BERT model does sequence tagging, so you convert each sentence into tokens; let's say a token is a word: "Vital signs remained stable." Dr. Xu covered this BIO tagging: beginning of an entity, inside of an entity, outside of an entity. "Vital signs" is a test here, so you say B-test. If you have a problem, say "acute carcinoma" (I'm making this up), then "acute" is going to be B-problem and "carcinoma" is going to be I-problem. If a token is not within the four main entities and the modifiers that we have, we tag it as O, which means outside of any entity. So you take the same annotated dataset and convert it into two different formats: one for LLaMA, another for BERT. For BERT, this is token classification: given a sentence, you are basically predicting whether "vital" is B-test, I-test, B-problem, I-problem, B-value, I-value, whatever the corresponding label should be for that particular token. So what we do is a token classification task.
[51:24] How does relation extraction correspond in the case of BERT? The same idea, but now it becomes sentence classification. You have two classes: one is "has value", which is the positive class, and the other is the negative class. If you show "blood pressure" and "80", that is a negative sample; you label that sentence as negative. If you have "blood pressure" and "126/63", then it is a positive sample. So it becomes a sentence classification task. From many patterns like this, seeing repeated sentences like that, the model learns the pattern and identifies it: the next time that sort of sentence appears, it is a positive, "has value", class or a negative class.
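A minimal sketch of framing this as sentence classification with a BERT encoder; the [E1]/[E2] entity-marker convention and the checkpoint name are assumptions for illustration, and a real system would first train this classification head on many labeled pairs.

```python
# Sketch: relation extraction as sentence classification with BERT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "emilyalsentzer/Bio_ClinicalBERT"   # any BERT-family checkpoint works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Mark the candidate entity pair inline, then classify has_value vs. none.
text = ("His [E1] blood pressure [/E1] on discharge was "
        "[E2] 126/63 [/E2]. Heart rate is 80.")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits         # scores for [none, has_value]
print(logits.softmax(-1))                   # untrained head: train on pairs
```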
  • 53:01This is the entire Kiwi
  • 53:03pipeline.
  • 53:04Okay.
  • 53:06Having
  • 53:07data
  • 53:08from multiple sources is important.
  • 53:11Something that works on your specific data, at a particular hospital, written by one specific doctor in one particular setting, might not generalize well when you try to use that same pipeline at another hospital, on a note written by another physician or health care provider.
  • 53:29So for Kiwi, we actually have data from four sources, so that we can make the model much more generalizable and let it see the patterns that happen in a wide variety of data. We have UTP, that is UT Physicians; MTSamples, which is a publicly available dataset; and MIMIC-III, which you might know. All the data from these different sources is incorporated into our training process.
  • 53:54And as I mentioned, the instruction format is for Llama, and the BERT format is for training the BERT models. Then you fine-tune both models and test them out. You test on a subset of UTP, MTSamples, and MIMIC-III, and also on i2b2. Again, Doctor Xu mentioned that i2b2 is unseen data, right? It's not in your training data. That's how we test the generalizability, to see whether it actually performs on unseen data.
  • 54:23Post-process to separate entities and relations, calculate precision, recall, and F1, and that is your evaluation.
  • 54:31So this is, quickly, the composition of the Kiwi dataset, meaning the Kiwi model that we are currently giving out. It has been trained on about one thousand four hundred documents and then tested on four different types, each having fifty or twenty-five documents.
  • 54:52For evaluation, I mentioned precision, recall, and F1, with both exact match and relaxed match. To be clear: for exact match, the entity type should match and the boundary should also match; for relaxed match, the entity type should still match, but the boundaries only need to overlap.
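A small sketch of what these two matching criteria amount to, assuming entities are represented as (type, start, end) character spans; this mirrors the definitions above rather than the project's actual evaluation script.

```python
# Exact vs. relaxed span matching for NER evaluation, plus P/R/F1.
def exact_match(p, g):
    # Same entity type AND identical boundaries.
    return p[0] == g[0] and (p[1], p[2]) == (g[1], g[2])

def relaxed_match(p, g):
    # Same entity type; boundaries only need to overlap.
    return p[0] == g[0] and p[1] < g[2] and g[1] < p[2]

def prf1(preds, golds, match):
    tp = sum(any(match(p, g) for g in golds) for p in preds)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

golds = [("test", 0, 10)]
preds = [("test", 0, 8)]                   # right type, shorter boundary
print(prf1(preds, golds, exact_match))     # (0.0, 0.0, 0.0)
print(prf1(preds, golds, relaxed_match))   # (1.0, 1.0, 1.0)
```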
  • 55:12How did we perform? Llama 3 70B was somewhat better for the NER task and, again, for relation extraction, but you also see that some smaller models still performed on par with it. Sometimes you do not have much difference from the BERT model, but here we saw that at least some statistical significance was there. And i2b2 is the unseen data; Doctor Xu mentioned again how large language models are better on unseen data compared to BERT. And BERT models, again, definitely need a lot more data to train on.
  • 55:49Now what about the memory usage, total GPU hours, GPU hours per epoch, energy consumption, carbon emission? That is where a lot of these computational resources come into play. You need a huge amount of memory. We are comparing a BERT model of about one hundred to three hundred million parameters to something that is seven, eight, or seventy billion, and that difference really shows in the amount of compute and the hours you require for training these models and the memory they utilize.
  • 56:23So if you want to fine-tune the model using parameter-efficient fine-tuning approaches like LoRA, which Lingfei is going to discuss, you need one A100 80 GB GPU. But if you need to do inference for the seventy billion model, you need two A100 80 GB GPUs.
  • 56:47Okay, again, our paper has a lot of things; I can skip through this. I just want to talk about concept normalization. The actual way we do concept normalization is with Elasticsearch, which basically does exact and partial matching, and then BM25 to rerank the extracted candidates.
  • 57:07So here in Kiwi, we have mapped it to UMLS concept unique identifiers. For anyone who's not familiar with UMLS, the UMLS Metathesaurus basically incorporates over a hundred vocabularies and gives each concept a unique identifier: the same concept from all the different vocabularies is mapped to one unique concept ID. So this is a concept normalization pipeline that makes use of a large language model.
  • 57:35Once you do the NER, you get the query. On the left side, you see "left atrium dilated"; that is your query entity with its context, let's say the sentence that contains it. You give that to a large language model and ask it to generate multiple synonyms of it. Why are we doing that? Because the exact phrase might not appear in any of the standardized vocabularies. So we generate as many variations of that particular entity as we can so that we can match them; Elasticsearch and BM25 actually do that. You give the original utterance and all the synonyms, and check in the database you have created whether that entity is actually present there.
  • 58:18So you search, you get a bunch of concepts that are somewhat similar, and then you again use a large language model to find, among those concepts, which one best represents the original entity.
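Here is a hedged sketch of that flow. The real pipeline uses Elasticsearch with BM25 and a large language model; below, the two LLM calls are stubbed out, the rank_bm25 package stands in for the search layer, and the term dictionary and CUI are made-up placeholders.

```python
# Sketch of the normalization flow: LLM synonyms -> BM25 candidate
# retrieval -> LLM reranking to pick the best concept.
from rank_bm25 import BM25Okapi

# Toy slice of a UMLS-style term dictionary (terms and CUI are placeholders).
terms = ["left atrial dilatation", "left ventricular hypertrophy",
         "atrial fibrillation"]
cuis = {"left atrial dilatation": "C0000000"}  # hypothetical CUI

def llm_synonyms(entity, context):
    # Stub for "ask the LLM to generate variants of the query entity".
    return ["left atrial dilatation", "dilated left atrium"]

def llm_pick_best(variants, candidates):
    # Stub for "ask the LLM which candidate best represents the entity";
    # here we just score token overlap with all the variants instead.
    vocab = set(" ".join(variants).split())
    return max(candidates, key=lambda c: len(vocab & set(c.split())))

query = "left atrium dilated"
variants = [query] + llm_synonyms(query, context="echo report sentence")

bm25 = BM25Okapi([t.split() for t in terms])
candidates = []
for v in variants:
    candidates += bm25.get_top_n(v.split(), terms, n=2)

best = llm_pick_best(variants, candidates)
print(best, "->", cuis.get(best))
```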
  • 58:33I know I'm going so
  • 58:34fast, but
  • 58:36the slides will be available,
  • 58:37and we will also think
  • 58:38of making the recordings available
  • 58:40on the YBIG website.
  • 58:42Okay, last step. Kiwi usually gives you output in a JSON format, but we also have scripts to make it easy for you, so that the JSON can be converted into a CSV. What I have highlighted: in the first column is the entity, the term we actually extracted, and the highlighted one is the concept ID for it, which is basically the CUI, the concept unique identifier of that particular thing from UMLS.
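For anyone who wants to do this conversion by hand, a minimal sketch follows; the field names are assumptions about the output schema, not the exact keys Kiwi emits, and the shipped scripts handle this for you.

```python
# Sketch: flatten Kiwi-style JSON output into a CSV of entities and CUIs.
import csv
import json

records = json.loads("""[
  {"entity": "left atrium dilated", "type": "problem", "cui": "C0000000"}
]""")  # illustrative record; real output has more fields

with open("kiwi_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["entity", "type", "cui"])
    writer.writeheader()
    writer.writerows(records)
```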
  • 59:09And if you ask why UMLS: if there is a concept unique identifier, you can actually map it back to SNOMED, ICD, or MeSH, because UMLS includes all those things. That's a very easy task.
  • 59:21Where can you find Kiwi? This is our website: kiwi.clinicalnlp.org. The QR code will take you right there. You press the live demo, and you get a prepopulated note of a few sentences. Click submit, and it will show you the entities and relations extracted. You can remove that text and add your own. No programming experience needed: you can put something in there and see what the entities are. Just play around with it.
  • 59:52Okay. Now, if you want to download the models, we have another page called Download. You need to fill out a form, and then we will send you the Docker images. Now, how is Docker different? Everything is prepackaged into a container; you do not need to install things separately. The Docker image comes with instructions on what to do: it's just like an executable you run, selecting a few numbered options. It also has a readme file that explains how to run it. You need to specify where your input data is and where you want the output to be, and it will run the entire Kiwi pipeline and put the output there.
  • 01:00:38Okay? So, easy-to-install Docker images; all the dependencies, everything, is taken care of. They can be run on Linux, Mac, and Windows. If you have a CPU, we have versions for that; if you have a GPU, we have versions for that. And we have both the BERT-based and Llama-based models that do what I was talking about.
  • 01:00:57Finally, what Vincent is going to demo is: forget about all these things. Your data is in the CHP environment, and you want to use Kiwi directly with just an API call. Currently, you need to contact Chris Gilman, who is a senior software engineer, to get the API key for calling Kiwi. But in the future, we are going to have a system where you can submit a ticket and get the API key. So: get your API key, put it into a program that we will give you, and run it. That's as easy as it gets.
  • 01:01:27A growing database: about thirty-two requests so far since we released, and that's it; I don't want to go more into that. What's coming up? We have more packages that we have built but not yet made available as a Docker image or a service. One that we are thinking of making available on CHP, the way Kiwi currently is, is the RECIST pipeline, which extracts systemic anticancer therapy and the responses based on the RECIST guidelines. Again, I'm not a clinician, so I'm not going into it. You will probably see something similar in the future, available as Docker images or API services or something you can download and play with.
  • 01:02:09My main area of research is social determinants of health. This is another pipeline that I have built: twenty-one social determinants of health, four different models (XGBoost, TextCNN, SentenceBERT, and Llama). It can take your notes and annotate them on two levels across the twenty-one social determinant factors. It does a sort of sentence classification: it takes your note, divides it into sentences, and tells you, okay, this sentence is talking about race, sex, or gender; this sentence is talking about the person's insurance; this sentence is talking about their education. Then we go one level deeper: there are also models that tell you, okay, this person's education is high school or below; for insurance, yes, the person has insurance, or no. So, high level on the twenty-one factors, and then the values and attributes at another level deeper. That's all we have here.
  • 01:03:12And also, I almost forgot: when you sign a DUA with us, we will give you the model weights of KB; that is still in the pipeline and will come through the form. So the form that I'm asking you to fill out to get the Docker images will also let you, if you are good at programming, take our model, continuously fine-tune it on your data, and do whatever you want with it. That is another option, but you need to sign a DUA with us, and that form will be available there soon. With that, Vincent, take it over for the Kiwi API demo.
  • 01:03:56We're not taking questions because of the time constraints.
  • 01:04:07So, good afternoon, everyone. My name is Vincent, and I'm a software developer in Doctor Xu's lab. Today, I will talk about how to use the Kiwi API service. The core concepts of Kiwi have already been discussed, so I will go very quickly. So, what is the Kiwi API service? The Kiwi API service provides an API-as-a-service interface that allows users within the CHP internal network to access Kiwi without requesting a high-performance GPU and without having to install or manage the model locally. Users simply request an API key and make standard HTTP API calls to use the service. All computational resources run on CHP, so we don't need to request a local GPU. This setup streamlines access to Kiwi functionality and makes it more accessible in resource-constrained environments.
  • 01:05:20So how does the Kiwi API service actually work? The process follows a simple request-and-response pattern. Users send a request to the Kiwi API server in the CHP environment, such as from Camino, which includes either the clinical notes or other text-related data. Once the API service receives the request, it determines the task type based on the specific endpoint and then returns the appropriate response. Most tasks are handled by a background process on the API server. All incoming requests are queued and processed sequentially to ensure efficient use of the limited computational resources.
  • 01:06:11So now let's take a closer look at how to use the Kiwi API service. Before we get started, there are some things you need to prepare. First, obviously, you need access to the CHP environment, like Camino, and you need to have a one HX account. Then you need the right API key; as just mentioned, you need to ask Chris Gilman to get the API key. Users who have some coding experience can write their own script to access the API, but we also provide an API launch script and some use cases in a Jupyter Notebook under this GitHub link.
  • 01:07:06Now, assuming you already have access to the Camino environment, you open a Jupyter Notebook, and you have an API key. First, you need to define a variable to instantiate the class that I provide in the script. At this step, you need to insert your API key into this instance. The first time you use the API server, you can use the key info function to test your connection. This will return some results as JSON-formatted information.
  • 01:07:44There are three main components in this response. You can see the usage count, which tells you how many tokens you have used since you created the API key; the tokens remaining, which tells you how many tokens you still have on the API key; and finally the expires-at field, which tells you when the key expires. For more tokens or an extended expiration date, you can contact our team to add usage in the future.
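A sketch of these first steps is below. The module, class, and field names are assumptions that mirror the walkthrough; the real names ship in the launch script on GitHub, so treat this as pseudocode for the actual client.

```python
# Hypothetical client setup and connection test, mirroring the steps above.
from kiwi_api import KiwiClient  # assumed module name from the launch script

client = KiwiClient(api_key="YOUR_API_KEY")  # insert your own key here

info = client.key_info()  # test the connection
print(info)
# Expected JSON-style fields, per the walkthrough:
#   usage_count   - tokens used since the key was created
#   tokens_remain - tokens still available on the key
#   expires_at    - when the key expires
```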
  • 01:08:13Our main function is batch prediction, which allows users to process their clinical notes in bulk. To use this function, you need to provide the path of your files in the Camino environment. Currently, the supported upload formats are compressed files, such as .zip or .tar, or a single .txt file; you can also combine your notes into a single .txt file. Once submitted, your notes will sit on the Kiwi server as a task in the queue. The function returns the task status, including the task ID, in JSON format. All task IDs are tied to the API key, which means each user's data is isolated by API key and task ID.
  • 01:09:04Here is an example of a batch prediction. As you can see, it returns the task information in JSON format, including the task ID and a message that shows the status, how many tokens this task used, and how many tokens remain on your account. Finally, the estimated time for your task is calculated based on your position in the task queue and the progress of the tasks ahead of it.
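In code, submitting a batch job might look like the sketch below; again, the method name and response fields are assumptions mirroring the walkthrough rather than the client's exact API.

```python
# Hypothetical batch submission, reusing the client from the sketch above.
from kiwi_api import KiwiClient  # assumed module name

client = KiwiClient(api_key="YOUR_API_KEY")
status = client.batch_prediction("/path/to/notes.zip")  # .zip, .tar, or .txt
print(status)
# Illustrative response shape:
# {"task_id": "...", "status": "queued",
#  "tokens_used": 1200, "tokens_remain": 98800,
#  "estimated_time": "00:05:30"}
task_id = status["task_id"]
```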
  • 01:09:37After you submit a task and receive a task ID, you can use it to check the current status of your task at any time using the task status function. It will give you the task information in detail. Typically, there are three main statuses. First is the queued status: while a task is in the queue, the system reports the task's current queue position as well as the estimated time until processing. Next is processing: when no one is ahead of you, your task is put into the processing pool. The status indicates how many files are in your task, how many have not yet been processed, how many have been processed, and the remaining time based on the remaining files. Finally, in the complete status, you can use the task ID with the next function to download your results.
  • 01:10:42Once your task is complete, you can download your results using the task download function. In this function, you give the output path where you want to save your file locally and the output type you prefer. By default, the path is the working directory and the output is JSON format. For saving files, three commonly used types are supported. First is a zip file, which compresses the results into a separate JSON file per input file. Then there is single JSON, which combines all the JSON results into one JSON file. Finally, there is CSV: as was mentioned earlier, we have integrated the JSON-to-CSV conversion into the Kiwi API service, so it can simply output CSV.
  • 01:11:39Each file can only be downloaded once. After you download it, you cannot access it again because, for privacy reasons, the server deletes the record of the data.
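Putting the status check and download together, a poll-then-download loop might look like this; the method names and status values are assumptions mirroring the walkthrough (queued, processing, complete), and the one-time download defaults are as described above.

```python
# Hypothetical poll-then-download loop for a submitted task.
import time
from kiwi_api import KiwiClient  # assumed module name

client = KiwiClient(api_key="YOUR_API_KEY")
task_id = "YOUR_TASK_ID"

while True:
    status = client.task_status(task_id)
    if status["status"] == "complete":
        break
    print(status)   # queue position, or files processed so far
    time.sleep(60)  # check again in a minute

# Remember: each result can be downloaded only once before it is deleted.
client.task_download(task_id, output_path=".", output_type="csv")
```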
  • 01:11:54And here is an example of a download. Typically, it will save your file into the local directory and give you a message telling you whether it succeeded. On the left side, you can see the typical result format.
  • 01:12:10In some cases, you might submit multiple tasks or forget a specific task ID. This function allows you to quickly review the status of each task; it lists all tasks that have not yet been downloaded.
  • 01:12:30And if you accidentally submit a task, as long as it has not yet entered processing, you can still use this function to cancel it before it enters the processing pool, and the tokens will be given back to you.
  • 01:12:53So I will show you a quick demo of these.
  • 01:13:17Okay, sure. So I think we'll skip the demo and move to the next speaker. It's the same things you just saw; he actually showed you the program there. Right. Yeah. Thank you.
  • 01:13:35Oh, hi, everyone. My name is Lingfei Chen. I am a postdoc in Doctor Xu's group. Today, I'm going to show you how to develop customized models for specific applications. At the beginning, I would like to introduce why we need these customized models. We all know that large language models like the Llama and GPT series have shown great potential in many domains, as they are pretrained on large-scale text, have strong instruction-following abilities across different tasks, and have wide coverage of general knowledge.
  • 01:14:11However, they may not fully capture the nuances of specific tasks or user needs, especially when the task involves specific definitions or is rarely seen in common use. That's why we need to develop customized models for ourselves. We can enhance the model with domain-specific expertise for the task and improve on the performance of existing large language models. In the process of improving the performance, we can actually let smaller models reach performance comparable to larger models, making things more efficient and cost-effective. Also, we can improve the user experience by reducing some of the hallucinations in existing large language models.
  • 01:15:09Here are the key steps of developing customized models. The first is to define what your NLP task is, and the second is to prepare the data to train and evaluate the large language model. The preparation of data involves steps that were introduced before: data annotation and data preprocessing, to feed the models afterward.
  • 01:15:38After we get the data, we can start model training, which enhances the performance of the models with the task-specific data. Once we finish the model training, we use another set of the annotated data to evaluate the performance of the developed model, to see if the performance actually improved compared with the backbone model. And once we confirm that the performance of the model improved, we can use this customized model in production.
  • 01:16:14Here is a general workflow of model training and evaluation. Once we define our task and prepare our data, we need to split the data into different subsets. Usually, we would have three subsets: the training data to develop the model, the validation data to validate the effectiveness of the trained model, and the test data. For simplification, we just use the test data to evaluate the trained model. If the trained model is effective compared with the backbone model, we can then use it as the production model to process the production data. And if the trained model's performance actually decreases, we might need to adjust the training process and redo the training.
  • 01:17:10Next, I will show more details of each step, starting from how to define tasks. This is actually a real example; starting from the task design, I will show you how to develop customized models step by step. So let's say we have a research project to investigate the impact of bilingualism on ADRD progression. The first step, as in many clinical research projects, is to find eligible patients. Once we find those eligible patients, we need to identify the bilingual or monolingual patients among them.
  • 01:17:52The first thought is to check the structured data, such as the preferred-language or written-language fields, to see what language the patient prefers. But when we check the actual data, we find there might not be enough of this kind of structured data to support our research, and some of it might not even be accurate. However, we noticed there is a lot of language information contained in the clinical notes. For example, many notes record what the patient speaks, what their preferred language is, and how well they speak it. So we might comprehensively extract the language-speaking status from all the clinical notes using NLP models.
  • 01:18:43models.
  • 01:18:44There are two targets that
  • 01:18:46we want to extract. The
  • 01:18:47first is what language does
  • 01:18:49the patient speak and how
  • 01:18:50well do they speak. So,
  • 01:18:52for these two
  • 01:18:54specific tasks,
  • 01:18:55aims, we, like,
  • 01:18:57could formulate the task as
  • 01:18:59a task.
  • 01:19:01The first the first thing
  • 01:19:02we want to do is
  • 01:19:03to identify all the language
  • 01:19:05entities in the clinical notes,
  • 01:19:06and then we could assign
  • 01:19:08different tags based on different
  • 01:19:10context
  • 01:19:11to indicate different speaking status
  • 01:19:13of the patient.
  • 01:19:16Once we formulate this as an NER task, we need to further refine the details. We might need to review some of the clinical notes and design different tags for the task. After this data review, we designed four different tags. The first is language-fluent, which indicates the patient speaks some language fluently. Then there is language-some, to indicate the patient speaks some of a language, and there are also language-no and language-other.
  • 01:19:52And here are some examples. For the first one, language-fluent, one of the sentences says that the patient speaks Italian primarily, which indicates that the person speaks Italian fluently. For language-some, here is the sentence: "She speaks some English." For language-no: the patient does not speak English. And for language-other: when we were reviewing the data, we found a lot of language mentions that refer to other individuals, for example the patient's family, or to written language. These kinds of mentions do not indicate the speaking status of the patient, so we categorize them as language-other.
  • 01:20:42Once we refine the details of the NER task, we need to start preparing the data for model training and model evaluation. Here is the overall flow to prepare the data for model development. We first need to get some raw data and do the annotation with the annotation guideline that we developed before. Once we get the annotated results, we need to design a prompt for this task; prompts were discussed by Doctor Xu and Vipina. Once we have the prompts and all the annotated results, we process them into a format the models can load, to start the training and evaluation.
  • 01:21:28This is the annotation using Blue, which Yuja mentioned before, so I'm just going to skip it. Here is what the annotated results look like. Usually we have a JSON file for each input sample, and each sample looks like this: it records the file name, the original sentence, and also the entities and the positions of the entities that we marked.
  • 01:21:56And here is the prompt; I'm just going to skip this. After we get all the files and the prompt, we need to process the data based on the prompts. For example, our task is to annotate all the text in the original sentence with HTML-style tags. So we process the input as the original sentence, and the target output would be the same sentence but with all the entities wrapped in those tags, such as language-fluent or language-some.
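A minimal sketch of this wrapping step, assuming the character offsets come from the annotation JSON shown earlier and using tag names like those defined above:

```python
# Sketch: build the target output by wrapping gold entities in HTML-style
# tags, working right-to-left so earlier offsets stay valid.
def wrap_entities(sentence, entities):
    # entities: (start, end, tag) character spans from the annotation file
    out = sentence
    for start, end, tag in sorted(entities, reverse=True):
        out = out[:start] + f"<{tag}>" + out[start:end] + f"</{tag}>" + out[end:]
    return out

sent = "She speaks some English."
print(wrap_entities(sent, [(16, 23, "language_some")]))
# She speaks some <language_some>English</language_some>.
```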
  • 01:22:37This is also part of the data preparation for the models to load. I will show the code later, so you can directly try the code to process it.
  • 01:22:49So after we prepare the data, we can now finally start the fine-tuning process. Fine-tuning is a process of adjusting the weights of the large language model to make it adapt to our task-specific data. In principle we would adjust all the weights of the model, but we know that a large language model has a huge number of parameters, so the computational cost of full fine-tuning would be very high. Instead, we use a widely used method, LoRA, to do the fine-tuning. LoRA is low-rank adaptation: it adds two small low-rank matrices (shown here in green), and instead of fine-tuning the entire large language model, we only need to adjust the parameters in these small matrices. Compared with full fine-tuning, it is much faster and needs only minimal training resources. But it needs high-quality datasets, so we need to define the task and develop the annotation guideline carefully, and it also carries some risk of overfitting.
  • 01:24:17So for the resources: for an eight-billion-parameter model, it might need one A100 or H100 GPU, but for seventy-billion models, we might need two H100 GPUs to do the fine-tuning.
  • 01:24:30So here is some environment setup. I have also provided the code at the last slide, so you can check it for more details. If we want to do the fine-tuning, we need to modify the config file in the code that I provided. The first thing is to indicate where the model is stored in the Camino or CHP environment; the model folder should look like this, containing the weights of the model and some details about it. For the data, we also need to provide the path of the data that we processed before, to tell the model where the data is. Besides the model and the data, we also need to set up some other configs; for example, the most important one might be the learning rate, which you can adjust based on the evaluation results of the trained model.
  • 01:25:32And finally, we can start the model's fine-tuning. The fine-tuning process is actually very easy: once we finish the config file, we can start the fine-tuning with only one line of command.
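Under the hood, the config amounts to something like the Hugging Face peft sketch below; the model path, target modules, and hyperparameters here are illustrative placeholders, and the workshop code drives all of this through its own config file and launch command.

```python
# Minimal LoRA setup sketch with Hugging Face transformers + peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_path = "/path/to/llama-backbone"  # placeholder local model path
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank matrices
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter weights train
```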
  • 01:25:48After the fine-tuning, we get the adapter parameters, which is a very small file. Once we have this LoRA adapter, we need to combine the adapter with the original backbone model to form our own customized model.
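With peft, combining the adapter and the backbone can be sketched as below; the paths are placeholders.

```python
# Sketch: merge the trained LoRA adapter into the backbone so the result
# can be loaded and served as a single customized model.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("/path/to/llama-backbone")
merged = PeftModel.from_pretrained(base, "/path/to/lora-adapter")
merged = merged.merge_and_unload()  # fold adapter weights into the base
merged.save_pretrained("/path/to/customized-model")
```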
  • 01:26:07Once we get our customized model, we need to test it to see whether the performance actually improved compared with the backbone model, so we do inference on the test data. Here is an example of how to set up the environment, and this is an example of how to do the inference on the test data to get the results.
  • 01:26:32This is also where we set up the inference configs. For example, max tokens indicates how long you expect the model's output to be, and the stop token, EOS, means that once the model generates the EOS token, it finishes the generation instead of generating, say, five hundred tokens.
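In transformers terms, those two settings map onto generation arguments roughly like this; the prompt and paths are placeholders.

```python
# Illustrative inference settings: cap output length and stop at EOS.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/customized-model")
model = AutoModelForCausalLM.from_pretrained("/path/to/customized-model")

inputs = tok("Tag the languages: She speaks some English.",
             return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=256,               # upper bound on the output length
    eos_token_id=tok.eos_token_id,    # stop once EOS is generated
)
print(tok.decode(out[0], skip_special_tokens=True))
```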
  • 01:26:59Once we get the inference results, we can evaluate the performance of the model and compare it with the performance of the backbone model. Here are the evaluation metrics that were introduced before, so I'm just going to skip this. I have also provided some scripts for the evaluation; you can refer to the code I provided for more details.
  • 01:27:23And here are the fine-tuning results we got after fine-tuning with the eight hundred samples that we annotated before. "Fine-tuned" means we used the Llama 3 70B Instruct model as the backbone for the fine-tuning. Compared with the backbone model, we see that for every tag (language fluent, some, no, and other), the F1 score actually improved. So in this case, we can say we have an effective fine-tuned customized model.
  • 01:28:01And if we find that the customized model's performance dropped, we go back to the training process, retrain the model, and iteratively check the performance to see if there is any gain.
  • 01:28:17And here are the code and data that we provide for more details. If you have any interest or any questions, you can leave comments or send me an email directly. Okay, thank you so much.