ABERT-BiGRU-CRFModelforEntityRecognitionofChinese ...

Research ArticleA BERT-BiGRU-CRF Model for Entity Recognition of ChineseElectronic Medical Records

Qiuli Qin 1 Shuang Zhao 1 and Chunmei Liu 2

1School of Economics and Management Beijing Jiaotong University Beijing 100044 China2Beijing Jiaotong University Health Service Center Beijing 100044 China

Correspondence should be addressed to Qiuli Qin qlqinbjtueducn

Received 10 December 2020 Revised 4 January 2021 Accepted 9 January 2021 Published 28 January 2021

Academic Editor Abd EI-Baset Hassanien

Copyright copy 2021 Qiuli Qin et al +is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Because of difficulty processing the electronic medical record data of patients with cerebrovascular disease there is little maturerecognition technology capable of identifying the named entity of cerebrovascular disease Excellent research results have beenachieved in the field of named entity recognition (NER) but there are several problems in the pre processing of Chinese namedentities that have multiple meanings of which neglecting the combination of contextual information is one +erefore to extractfive categories of key entity information for diseases symptoms body parts medical examinations and treatment in electronicmedical records this paper proposes the use of a BERT-BiGRU-CRF named entity recognition method which is applied to thefield of cerebrovascular diseases +e BERT layer first converts the electronic medical record text into a low-dimensional vectorthen uses this vector as the input to the BiGRU layer to capture contextual features and finally uses conditional random fields(CRFs) to capture the dependency between adjacent tags +e experimental results show that the F1 score of the modelreaches 9038

1 Introduction

Named entity recognition is to extract entities with actualmeaning from massive unstructured text data [1 2] In themedical field medical entities mainly include symptomsexaminations diseases drugs treatments operations bodyparts etc and are an important part of the establishment of amedical knowledge base Chinese electronic medical record(CMR) [3] is a combination of structured and unstructuredtexts which generally include not only patient informationbut also a large amount of medical knowledge but it isdifficult to process With the development of deep learningtechnology entity recognition algorithms have applied re-search in many fields but they lack applications in the fieldof cerebrovascular diseases (CVD) [4]

Cerebrovascular diseases have become one of the mostthreatening diseases to human health in the world due to thefour characteristics [5 6] +e treatment of cerebrovasculardiseases highly depends on the doctorrsquos experience With theincrease in the number of patients with CVD there is a

greater demand for cerebrovascular disease physicians Sincethe training cycle of professional doctors is relatively long[7 8] it will cause an imbalance in supply and demand ofldquomore patients and fewer doctorsrdquo With the introduction ofthe concept of ldquoAI +Medicalrdquo the use of machine learningtechnology to assist diagnosis and treatment that is throughthe construction of a complex model the feedback mech-anism is used to continuously optimize the parameters of themodel then use the existing clinical data and neuroimagingdata in the hospital to diagnose and treat cerebrovasculardiseases or predict recurrence On the one hand auxiliarydiagnosis and treatment decision-making is helpful to im-prove the professional level of doctors and improve thequality of CVD medical services On the other hand it canoptimize the uneven distribution of medical resources [9]At present the scientific research of machine learning in thefield of CVD mainly focuses on two aspects diagnosis andprognosis prediction of cerebrovascular disease (1) Fromthe perspective of CVD diagnosis most scholars usestructured data to nest machine learning models to complete

HindawiComplexityVolume 2021 Article ID 6631837 11 pageshttpsdoiorg10115520216631837

disease diagnosis +e literature [10ndash12] established a jointdiagnosis model based on logitic regression method andXGBoost machine learning method by collecting clinicaldata of demographic characteristics (2) From the per-spective of prognosis prediction the use of machine learningmethods for risk prediction has gradually become the trendof disease prediction while machine learning methods suchas random forest decision tree SVM and other machinelearning methods have achieved certain research results inthe prediction of cerebrovascular diseases Literature [13ndash15]constructed logistic regression k-NN random forest de-cision tree and SVM machine learning models based onfollow-up data and verified the advantages of machinelearning models in cerebrovascular disease risk prediction Itshows that the effect of neural network learning score isbetter In short from the analysis of clinical data sourcesCVD medical data includes cerebrovascular disease imagingdata follow-up data electronic medical records and otherdata +e focus of most scholars is still on structured datasuch as follow-up data and neuroimaging data while thefocus on electronic medical record data in the field of CVD isslightly lower At present the increase in the number ofCVD patients is accompanied by the ever-increasingnumber of electronic medical records for CVD patientsElectronic medical records can provide scholars with moredata resources For the processing of unstructured text in-formation in electronic medical records named entityrecognition (NER) is a key step and there are relatively fewresearches dedicated to named entity recognition in the fieldof cerebrovascular

+e current research on named entity recognition fo-cuses on three aspects (1) From the perspective of tradi-tional entity recognition methods traditional methodsinclude methods based on dictionaries and rules [16ndash23]+is method relies heavily on domain dictionaries anddomain experts +e selection of features is done manuallyand subjectivity and labor costs are relatively large With thedevelopment of machine learning technology [19 20] moreand more scholars are paying attention to models such asconditional random field (CRF) Hidden Markov Model(HMM) Support Vector Machine (SVM) However NERbased on traditional machine learning technology has higherrequirements for feature selection [21ndash23] and the quality offeature selection directly affects the effect of entity recog-nition (2) From the perspective of deep learning methodswith the development of deep learning technology literature[24ndash27] confirmed the advantages of deep neural networktechnology by comparing traditional CRF models that isdeep neural network technology has less artificial featureintervention than traditional methods and can obtain higheraccuracy and recall rate Deep learning can automaticallyextract word features reduce the subjectivity of featureselection and help further improve the accuracy of recog-nition results +erefore it is better than traditional statis-tical algorithms such as CRF and HMM +e commonsingle-entity recognition neural network generally onlyconsiders the sample input and lacks in-depth thinkingabout the output relationship Based on the idea of modelfusion most scholars usually use LSTM-CRF as the main

framework to solve the deficiencies of neural networkmodels +e available literature [28ndash32] uses traditionalword2vec Glove and other word vector methods uses aBiLSTM-CRF model as the core and adds a CNN modelattention mechanism RNN model etc to the coreframework Furthermore the word vector undergoes con-tinuous fine-tuning of the parameters resulting in the finalrecognition achieving more accurate recognition +e pro-cess of parameter tuning is employed to set the hyper-parameters of the model However for the BiLSTM-CRFmodel there are more parameter settings and the modeltraining time is longer Literature [26ndash31 33 34] proposedthe BGRU neural model which has simple results and highcomputational efficiency can make full use of context in-formation to eliminate entity ambiguity and has some goodeffects in the field of entity recognition (3) From the per-spective of pre-training models the above-described pre-processing models all use traditional word vector methodssuch as word2vec and Glove +is method focuses on thefeature extraction between words and often ignores wordcontext information In order to improve this problem asthe Google BERT pre-training model is proposed the lit-erature studies [35ndash38] combine the BERTword embeddingmodel on the basis of the traditional BiLSTM-CRF modeland consider the polysemy of a word in combination withthe context +e P value R value and F1 score have all beenimproved It can be seen that BERT has strong semanticanalysis capabilities

In order to solve the problems of ignoring context in-formation low model efficiency and susceptibility to wordsegmentation in electronic CVD medical entity recognitionprocessing We propose a BERT-BiGRU-CRF neural net-work model to identify named entities in electronic medicalrecords of cerebrovascular diseases Specifically the BERTlayer first converts the electronic medical record text into alow-dimensional vector then inputs the vector into theBiGRU layer to capture contextual features and finally uses aCRF to capture dependency between adjacent tags +eentity extraction model proposed in this paper has achievedgood recognition results

2 BERT-BiGRU-CRF Model Construction

In the NER field the use of deep neural network modelsfor entity recognition has become the mainstream +isarticle uses BiGRU-CRF as a benchmark to extract namedentities in the field of cerebrovascular diseases +ereason why the BERT pre-training language model ischosen is that the text vector is used as the input of themodel and the granularity of Chinese division is dividedinto character-level and word-level Existing researchshows that character-level pretraining schemes showbetter results [37 39] while the BERT pre-traininglanguage model is a character-level pretraining program+at is each word in the text is converted into a vector byquerying the word vector table as the model input themodel output is the vector representation combined withthe context

2 Complexity

+e overall structure of the BERT-BiGRU-CRF model isshown in Figure 1+emodel is mainly divided into 3 layers+e first layer is the BERT layer +rough the BERT pre-training language model each word in the sentence isconverted into a low-dimensional vector form +e secondlayer is the BiGRU layer which aims to automatically extractsemantic and temporal features from the context +e thirdlayer is the CRF layer which aims to solve the dependencybetween the output tags to obtain the global optimal an-notation sequence of the text

In this study the named entity recognition model wasused to identify the medical named entity in the electronicmedical record of cerebrovascular disease +e specific stepsare as follows

(1) EMR data preprocessing that is processing theoriginal electronic medical record text data set andexpress the electronic medical record text set asH h1 h2 hn1113864 1113865 where the i-th electronicmedical record text is expressed ashi ltwi1 wi2 win gt Predefined entity categoryC c1 c2 cm1113864 1113865 is divided and annotatedaccording to character-level and the characters andpredefined categories are separated by spaces whenannotated

(2) Construct the electronic medical record text trainingdata set

(3) Model training that is training the BERT-BiGRU-CRF named entity recognition modelTake the electronic medical record test text col-lection Dtest d1 d2 dN1113864 1113865 as input and takethe entity and its corresponding category pair asoutput langm1 c1 gt ltm2 c2 gt ltmp cprang1113966 1113967where the entity mi lt hi bi ei gt represents theentity that appears in the document hi bi and eirespectively represent the start and end positionsof mi in hi and no overlap between entities isrequired ie ei lt bi + 1 Cmi represents the pre-defined category of the entity mi then calculatesthe F1 score according to the precision rate andthe recall rate and uses the F1 score as the modelcomprehensive evaluation index

21 BERT Pre-training Language Model Bidirectional En-coder Representation from Transformers (BERT) [40] is anunsupervised and deep bidirectional language representa-tion model for pre-training In order to accurately representthe context-related semantic information in the EMR it isnecessary to call the interface of the model to obtain theembedded representation of each word in the electronicmedical record BERT uses the deep two-way transformerencoder as the main structure of the model Transformerintroduces the self-attention mechanism and also draws onthe residual mechanism of the convolutional neural net-work so the training speed of the model is fast and theexpression ability is strong And also abandoning the RNNloop structure the overall structure of the BERT model isshown in Figure 2

En is the coded representation of the word Trm is thetransformer structure and Tn is the word vector of the targetword after training +e operating principle of the model isto use the transformer structure to construct a multi-layerbidirectional Encoder network which can read the entiretext sequence at one time so that each layer can integrate thecontextual information+e input of the BERTmodel adoptsthe embedding addition method By adding three vectorsToken Embeddings Segment Embeddings and PositionEmbeddings the purpose of pre-training and predicting thenext sentence is achieved In Chinese electronic medicalrecord text processing the semantics of characters or wordsin different positions have different semantics Transformerindicates that the information embedded in the sequence ofthe tag sequence is its relative position or absolute positioninformation as shown in the following formulae

PE Ppos 2i+11113872 1113873 cosPpos

1000(2idmodel)1113888 1113889 (1)

PE Ppos 2i1113872 1113873 sinPpos

1000(2idmodel)1113888 1113889 (2)

where Ppos is the position of the word in the text i representsthe dimension and dmodel is the dimension of the encodedvector +e odd position is encoded using the cosinefunction Even positions are coded using a sine function

In order to better capture word-level and sentence-levelinformation the BERT pre-training language model isjointly trained by two tasks Masked Language Model andNext Sentence Prediction +e Masked LM model [36] issimilar to cloze filling 15 of the words in the randommaskcorpus are marked with the ldquoMASKrdquo form and then theBERTmodel is used to correctly predict the masked words+e strategy adopted in the training is that for 15 of thewords only 80 of the words are actually replaced with[mask] 10 of the words will be randomly replaced withother words and the remaining 10 are unchanged +eNext SP model is to train the model to understand therelationship between sentences that is to judge whether thenext sentence is the next sentence of the previous sentence+e specific method is to randomly select 50 correctsentence pairs from the text corpus and 50 randomlyselect sentence pairs to judge the correctness of the sentencepairs +e Masked LM word processing and Next SP sen-tence processing are jointly trained to ensure that the in-formation is represented by the vector of each word so themodel is comprehensive and semantically accurate It fullydepicts the characteristics of the character-level word-levelsentence-level and even the relationship between sentencesand increases the generalization ability of the BERTmodel

22 BiGRU Layer Gated Recurrent Unit (GRU) [34] gatedrecurrent unit structure is a variant of long and short-termmemory neural network (LSTM) +e LSTM structure in-cludes forget gates input gates and output gates In tradi-tional recurrent neural network (RNN) training gradientdisappearance or explosion problems often occur LSTM

Complexity 3

only solves the problem of gradient disappearance to acertain extent and the calculation is time-consuming +eGRU structure includes an update gate and a reset gate andthe GRU combines the forget gate and the input gate in theLSTM into an update gate +erefore GRU not only has theadvantages of LSTM but also simplifies its network struc-ture In the task of entity recognition of cerebrovasculardisease electronic medical record GRU can extract featureseffectively Its network structure is shown as in Figure 3

In the GRU structure the update gate is z and the resetgate is r +e update gate zt is to calculate how muchelectronic medical record information of the previoushidden layer state htminus1 needs to be transmitted to the currenthidden state ht If zt takes the value [0 1] it needs to betransmitted when it is close to 1 and the information needsto be ignored when the value is close to 0 +e reset gate rt

calculation formula is similar to the update gate principlebut the weight matrix is different+e calculation of zt and rt

is shown in formulae (3) and (4) First the electronic medicalrecord data xt input at time t the state htminus1 of the hiddenlayer at the previous time and the corresponding weights

O

BERT

Xt Xt+1Xtndash1 Xt+2

htndash1 ht ht+1 ht+2

GRU

GRU GRU GRU GRU

GRU GRU GRU

O B ICRF layer

The forwardGRU

Input vector

The backwardGRU

BERT layer

BiGRU layer

Figure 1 BERT-BiGRU-CRF model structure

E1 E2 En

Trm Trm Trm

TrmTrmTrm

T1 T2 Tn

hellip

hellip

hellip

hellip

Figure 2 BERT network model

htndash1

tanh

ht

rt zt

1ndash

xt

σ σh t

Figure 3 GRU structural unit

4 Complexity

are respectively multiplied and added to the σ functionAfter the calculation of zt and rt is completed the contentthat needs to be memorized at time t can be calculatedSecondly use the reset gate to determine the hidden state ofthe electronic medical record at t minus 1 +e information thatneeds to be ignored at time t +en input rt htminus1 xt and usetanh function to calculate the candidate hidden strong stateFinally httransfers the cerebrovascular disease electronicmedical record information retained in the current unit tothe next unit that is at time t the product of zt and 1113957h

represents the cerebrovascular disease information that thehidden unit ht needs to retain +e product of (1 minus zt) andhtminus1 indicates how much information needs to be forgotten+e calculation is shown in formulae (5) and (6) for details

zt σ wz lowast htminus1 xt1113858 1113859( 1113857 (3)

rt σ wr lowast htminus1 xt1113858 1113859( 1113857 (4)

1113957h tanh wh lowast rt μt htminus1 xt( 1113857 (5)

ht 1 minus zt( 1113857μht minus 1 + ztμ1113957ht (6)

where xt is the input of the electronic medical record ofcerebrovascular disease at time t and htminus1 is the state of thehidden layer at the previous time ht is the hidden state attime t w is the weight matrix wz is the update gate weightmatrix and wr is the reset gate weight matrix σ is thesigmoid nonlinear transformation function and tanh is theactivation function 1113957h is the hidden state of candidate

From the operating principle of the GRU unit it candiscard some useless information and the structure of themodel is simple which reduces the computational com-plexity However the simple GRU cannot fully utilize thecontext information of the electronic medical record+erefore this paper designs the backward GRU to learn thebackward semantics and the GRU neural network forwardsand backwards to extract the key features of the named entityin the electronic medical record of cerebrovascular diseasenamely the BiGRUmodel+e specific structure is shown inFigure 4 Based on the GRU principle forward GRU is toobtain the above semantic feature (ht) and backward GRU isto obtain the following semantic features ht and finally theabove and the following semantic features are combined toget ht Refer to formulae (7) and (8) for details

hrarr

t G Rrarr

U xt( 1113857 (7)

hlarr

t GRlarr

U xt( 1113857 (8)

ht lt hrarr

t hlarr

t gt (9)

where hrarr

t is the hidden layer state the purpose is to obtain

the above information from the GRU hlarr

t is the hidden layerstate the purpose is to obtain the following informationfrom the GRU G R

rarrU(xt) means that it is represented by

features from front to back GRlarr

U(xt) means that it is

represented by the back-to-front feature and the finalhidden layer state of ht is the feature of the electronicmedical record report

23 CRF Layer +e NER problem can be regarded as asequence labeling problem +e BiGRU layer outputs thehidden state context feature vector h denoted ash (h1 h2 ht) +is vector only considers the contextinformation in the electronic medical record and does notconsider the inter-label dependencies +erefore this paperadds a CRF layer to label the global optimal sequence andconverts the hidden state sequence h (h1 h2 ht) intothe optimal label sequence y (y1 y2 yt) CRF cal-culation principle [34] firstly for the specified electronicmedical record input sequence x (x1 x2 xt) it cal-culates the score of each location shown in formula (10)Secondly calculate the probability of normalized sequence ythrough the Softmax function shown in formula (11) Fi-nally the label sequence with the highest score is calculatedusing the Viterbi algorithm shown in formula (12)

score(h y) 1113944T

t1Aytminus1 yt + 1113944

T

t1W

tytht (10)

Py

h1113874 1113875

escore(hy)

1113936yprimeisinY(h)escore hyprime( )

(11)

ylowast arg maxyprimeisinY(h)

h yprime( 1113857 (12)

whereA is the transfer score matrix between tags score (h y)is the position score WT

yt is the parameter vector p(yh)

normalized probability functionY(h) represents all possibletag sequences and formula (10) is to calculate the score (h y)of each position in the input sequence from the outputfeature matrix of the BiGRU layer and the CRF transitionmatrix

24 Training Process +e process of deep network modeltraining is a process of repeatedly adjusting parameters sothat loss reaches a minimum However due to the stronglearning ability of deep network models the problem ofmodel generalization is prone to occur For example theproblem of model under-fitting and over-fitting leads topoor adaptability of the model to new sample data+erefore regularization methods can generate manymodels with small parameter values In other words such

GRU

GRU GRU GRU GRU

GRU GRU GRU

hellip

hellip

Figure 4 BiGRU structure

Complexity 5

models have strong anti-interference ability and can adapt todifferent datasets and different ldquoextreme conditionsrdquo It canincrease the generalization capabilities of the model duringthe network training process +e method to solve theproblem in this paper is the L2 regularization method whichcan avoid the over-fitting problem that is adding regula-rization calculation to the cost function shown in the fol-lowing formula

loss Ein + λ1113944j

w2j (13)

where Ein is the training sample error that does not includethe regularization term λ is the adjustable parameter ofregularization and wi represents the weight parameter

3 Experimental Design

31 Data Preparation +e experimental data in this articlewas obtained from the Airsquoai medical electronic medicalrecords website +e electronic medical record data is a totalof 1300 electronic medical records related to cerebrovas-cular diseases which are composed of general patient in-formation chief complaint medical history physicalexamination and diagnosis In addition this article sorts outthe types of entities in the published papers related to namedentities in electronic medical records as shown in Table 1According to the frequency of occurrence of entities inpublished literature electronic medical record entities forcerebrovascular diseases are divided into five entities dis-ease symptom body part examination and treatmentwhich are also proposed by CCKS

32 Data Preprocessing +e electronic medical record in-formation is preprocessed that is line breaks and invalidcharacters etc are removed and 36400 sentences are finallyobtained +e testing dataset and the training dataset aredivided into 2 8 +e labeling system used in this article isBIO labeling +e five types of entities are disease symp-toms body parts examination and treatment +ereforethere are 11 labels namely O Disease-B Disease-I Body-BBody-I Symptom-B Symptom-I Examination-B Exami-nation-I Treatment-B and Treatment-I We conduct namedentity labeling with doctors Among the 300medical recordswe designated two annotators to annotate them at the sametime used Cohenrsquos kappa to calculate the consistency of theannotations and obtained a kappa value of 08 +e labels tobe predicted are shown in Table 2

33 Experimental Settings +e experimental model in thisarticle is built using tensorflow deep learning frameworkand Python programming language +e parameter updatemethod is to update the parameters of the BiGRU-CRFmodel and BERT is a fixed parameter Table 3 lists thehyperparameter values of the experimental model in thisarticle +ese values have been modified according torelevant literature [34ndash38 41] and have not been adjustedfor the cerebrovascular electronic medical record data setin this article +e model parameter optimization in this

paper adopts the stochastic gradient descent method(SGD) the initial learning rate is 0015 the update of thelearning rate adopts the step decay method and the decayrate is 005 +e model has achieved good experimentalresults in the training set and test set

34 Evaluation +is article uses the most commonly usedevaluation index in the field of named entity recognitionprecision rate (P) recall rate (R) and F1 score (F1) +atis P is the recognition rate of correctly recognized namedentities R is the rate of correctly recognized namedentities in the test set and F1 is the harmonic average of Pand R which is the comprehensive evaluation index ofthe model Among them the higher the P and R valuesthe higher the accuracy and the recall rate but in fact thetwo are contradictory in some cases +erefore the F1score is often used to evaluate the overall performance ofthe model +e calculation formula is

Table 1 Sorting of entities

Entity category LiteratureSymptom [28 31 32 37 39 40 42]Treatment [28 31 32 37 39 40 42]Body part [31 32 34 42]Examination [28 31 32 40]Disease [28 31 37 40 42]Surgery [34 40]Anatomy [31 32]Independent symptoms [31 39]Drug [31 39]

Table 2 BIO annotation example

Text Label Meaning头 Body-B +e beginning of the body part entity部 Body-I +e middle of the body part entity疼 Symptom-B +e beginning of the symptom entity痛 Symptom-I +e middle of the symptom entity3 O Non-entity小 O Non-entity时 O Non-entity

Table 3 Parameter settings

Parameter ValueDropout 05Embedding 50GRU-dim 200Regularized intensity 1prime10-8Batch_size 64Lr (learning rate) 0015LSTM_dim 200Dr (decay rate) 005

6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)

where Tnum entities is the number of correct entities identifiedSnum entities is the total number of all entities identified andCnum entities is the number of entities in the test set

4 Experimental Results

41 Comparative Experiment Analysis In order to prove theentity recognition effect based on the BERT-BiGRU-CRFmodel this article uses BiLSTM-CRF as the baseline modeland the selected comparison models are BiLSTM-CRFBERT-BiLSTM-CRF and BERT-BiGRU-CRF Modelintroduction

(1) BiGRU-CRF model this model inputs word vectorsinto the model for training

(2) BiLSTM-CRF model this model is a classic model inthe NER field It uses trained word vectors and thenuses the BiLSTM-CRF model to extract entities

(3) BERT-BiLSTM-CRF model this model is based onthe Google BERT model Many scholars haveembedded BERT in the BiLSTM-CRF model andachieved better recognition results in NERresearch

42ModelPerformanceComparison +e comparisonmodelproposed in this paper first uses electronic medical recorddata for training and then uses a test set for testing +especific comparison results are shown in Table 4

From the comparison results of Table 4 and Figure 5 wecan see that in terms of comprehensive evaluation indica-tors In terms of precision rate recall rate and F1 score theBERT-BiGRU-CRF model proposed in this article has in-creased by 29 50 and 395 respectively comparedwith the BiGRU-CRFmodel +e difference between the twomodels is the embedding of BERT It shows that BERTembedding can improve the recognition effect of entitiesCompared with the BiLSTM-CRF model the increase was314 440 and 434 respectively Compared with theBERT-BiLSTM-CRF model the increase was 125 077and 101 respectively +erefore all P R and F1 score areimproved compared to the baseline model indicating thatthe BERT-BiGRU-CRF model is more applicable to elec-tronic medical record recognition in the CVD field +is ismainly due to the stronger ability of embedding BERT toextract features which enables word vectors to fuse contextinformation On the other hand the BiGRU-CRF model caninput bidirectional information before and after the se-quence which can effectively avoid entity ambiguity

Figure 6 in terms of entity types horizontally comparesthe recognition effects of various entities under differentmodels and compare BiLSTM-CRF BERT-BiLSTM-CRFand BiGRU-CRF In terms of disease entities they wereincreased by 987 273 and 963 respectively on thesymptom entity they were increased by 162 minus033 and329 on the body part entity they were increased by285 045 and 310 On the examination entity theywere increased by 076 minus025 and 049 In terms ofthe treatment entities they were increased by 38 246and 329 respectively +e overall recognition effect ofdifferent entities is compared longitudinally the effect ofchecking entity recognition is higher than the comparisonmodel and the F1 score reaches 90 However the rec-ognition effect of entities in the treatment category isrelatively poor because the entities are relatively long andcannot clearly identify the boundaries of each entity Inshort the recognition effect of the BERT-BiGRU-CRFmodel proposed in this paper is higher than that of thecontrol group

43 Model Training Time Model training is the process ofparameter update +is article analyzes the relationshipbetween the four models in the first 10 rounds of Epoch andF1 It can be seen from Figure 7 that the F1 score of theneural network model without BERT is continuously risingfrom a lower level while the F1 score of the neural networkmodel with BERTcan be maintained at a higher level and ittakes iterations to reach the optimal F1 score fewer times Inaddition as a whole the F1 score of the BERT-BiGRU-CRFmodel is the highest From the comparison of training timeTable 5 lists the time required for each model iteration +etraining time of the BERT-BiGRU-CRFmodel for one roundis 37 seconds shorter than that of the BERT-BiLSTM-CRFmodel+is is due to the simple structure of the BiGRU-CRFmodel and the higher efficiency of the model in calculationIn addition comparing BiGRU-CRF and BERT-BiGRU-CRF models it is worth noting that the BERTwith full wordmask is added to the neural network model which improvesthe overall training efficiency of the model +e overalltraining efficiency is improved

In summary the BERT-BiGRU-CRF entity recognitionmodel proposed in this paper has a better recognition effectthan the control group +is model can make full use ofcontext information further avoid ambiguity and effectivelyavoid repetition between entities and the granularity ofword segmentation in this article is small which can im-prove the accuracy of entity recognition

44 Entity Recognition Result +is paper uses the BERT-BiGRU-CRF named entity recognition model to identify9393 entities (without deduplication) Among them elec-tronic medical records have the most descriptions of bodyparts followed by symptoms and examination entities whiletreatment and disease types are less +e specific results areshown in Figure 8

Complexity 7

Table 4 Comparison results of each model

Model Evaluation index ()Entity type

Comprehensive valueDisease Symptoms Body parts Examination Treatment

BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660

BERT-BiLSTM-CRFP 8864 9038 8782 9198 8412 8859R 8902 9246 8901 9324 8709 9016F1 8883 9141 8841 9261 8558 8937

BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643

BERT-BiGRU-CRFP 9065 9026 8864 9233 8729 8983R 9248 9192 8908 9239 8881 9094F1 9156 9108 8886 9236 8804 9038

PRF1

80

85

90

95

100

Rate

()

BERT-BiLSTM-CRF BiGRU-CRF BERT-BiGRU-CRFBiLSTM-CRFModels

Figure 5 Comparison of various indicators of different models

BiLSTM-CRF

Symptom Body part Examination TreatmentDisease80

85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)

Figure 6 Comparison of entity F1

BiLSTM-CRFBERT-BiLSTM-CRFBiGRU-CRFBERT-BiGRU-CRF

5 100Epoch

0

20

40

60

80

100

F 1 (

)

Figure 7 Epoch and F1 relationship

Table 5 Time of one round of model training

Model Training time (s)BiLSTM-CRF 472BERT-BiLSTM-CRF 356BiGRU-CRF 429BERT-BiGRU-CRF 319

Complexity 9

5 Conclusions

Aiming at the text data of electronic medical records of ce-rebrovascular diseases this paper proposes a BERT-BiGRU-CRF entity recognition model to identify five key entities inthe field of cerebrovascular diseases which are ldquodiseasesymptoms body parts examination and treatmentrdquo +emodel obtains the word vector combined with context in-formation through the BERT layer and then obtains theoptimal annotation sequence through the BiGRU-CRF neuralnetwork model It not only guarantees a simple networkstructure and fast training speed but also can solve theproblem of ambiguity in combination with context infor-mation Next on the one hand we will study the constructionof high-quality dictionaries On the other hand we will extractthe relationship between different entities based on NER toconstruct a knowledge map in the field of cerebrovasculardiseases which is conducive to the further potential infor-mation of electronic medical records in the field of CVD

Data Availability

+e Chinese electronic medical record data used to supportthe findings of this study have been deposited in the Airsquoaimedical repository

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is research was supported by the Beijing MunicipalCommission of Science and Technology Project(Z131100005613017) the model and demonstration appli-cation of the collaborative prevention and treatment ofmajor diseases in Beijingrsquos medical reform

References

[1] H-J Jang and K-O Cho ldquoApplications of deep learning forthe analysis of medical datardquo Archives of Pharmacal Researchvol 42 no 6 pp 492ndash504 2019

[2] J R Ubbens and I Stavness ldquoDeep plant phenomics a deeplearning platform for complex plant phenotyping tasksrdquoFrontiers in Plant Science vol 8 no 1190 p 2245 2018

[3] L Shen Q Li W Wang et al ldquoTreatment patterns and directmedical costs of metastatic colorectal cancer patients a ret-rospective study of electronic medical records from UrbanChinardquo Journal of Medical Economics vol 23 no 5pp 456ndash463 2020

[4] Z Li T Liu L Ding Z Liu X Li and Y Wang ldquoResearchprogress of machine learning in the diagnosis and treatmentof cerebrovascular diseasesrdquo Chinese Journal of Stroke vol 15no 3 pp 283ndash289 2020

[5] S Wu B Wu and M Li ldquoStroke in China advances andchallenges in epidemiology prevention and managementrdquoLancet Neurology vol 18 no 4 pp 394ndash405 2019

[6] J M Wardlaw C Smith and M Dichgans ldquoSmall vesseldisease mechanisms and clinical implicationsrdquo 4e LancetNeurology vol 18 no 7 pp 684ndash696 2019

[7] W J Powers A A Rabinstein T Ackerson et al ldquoGuidelinesfor the early management of patients with acute ischemicstroke 2019 update to the 2018 guidelines for the earlymanagement of acute ischemic stroke a guideline forhealthcare professionals from the American heart associationAmerican stroke associationrdquo Stroke vol 50 no 12pp e344ndashe418 2019

[8] Z Li A B Singhal and Y Wang ldquoStroke physician trainingin Chinardquo Stroke vol 48 no 12 pp e338ndashe340 2017

[9] E Smit G Saposnik G Biessels et al ldquoPrevention of stroke inpatients with silent cerebrovascular disease a scientificstatement for healthcare professionals from the Americanheart associationAmerican stroke associationrdquo Strokevol 48 no 2 pp E44ndashE71 2017

[10] J Virtanen M Varpela F Biancari J Jalkanen andH Hakovirta ldquoAssociation between anatomical distributionof symptomatic peripheral artery disease and cerebrovasculardiseaserdquo Vascular vol 28 no 3 pp 295ndash300 2020

[11] J Virtanen M Varpela and F Biancari ldquoAssociation betweenanatomical distribution of symptomatic peripheral arterydisease and cerebrovascular diseaserdquo Vascular vol 28 no 32pp 95ndash300 2020

[12] A Baluja R Moises A Cordero et al ldquoPrediction of majoradverse cardiac cerebrovascular events in patients with di-abetes after acute coronary syndromerdquo Diabetes amp VascularDisease Research vol 17 no 1 pp 67ndash69 2019

[13] C Romana T Eva D Edin J Raka and T Milan ldquoBrainimage segmentation based on firefly algorithm combined withK-means clusteringrdquo Studies in Informatics and Controlvol 28 no 2 pp 167ndash176 2019

[14] B Ambale-Venkatesh X Yang C Wu et al ldquoCardiovascularevent prediction by machine learning the multi-ethnic study

Total = 9393

1996 symptom1564 treatment2877 body part2140 examination1423 disease

Figure 8 Distribution of various entities

10 Complexity

of atherosclerosisrdquo Circulation Research vol 121 no 9p 1092 2017

[15] I Korvigo M Holmatov A Zaikovskii and M SkoblovldquoPutting hands to rest efficient deep cnn-rnn architecture forchemical named entity recognition with no hand-craftedrulesrdquo Journal of Cheminformatics vol 10 no 28 2018

[16] H Afify K Mohammed and A Hassanien ldquoMulti-imagesrecognition of breast cancer histopathological via probabi-listic neural network approachrdquo Journal of System andManagement Sciences vol 1 no 2 pp 53ndash68 2020

[17] A Ekbal and S Saha ldquoSimultaneous feature and parameterselection using multi-objective optimization application tonamed entity recognitionrdquo International Journal of MachineLearning and Cybernetics vol 7 no 4 pp 597ndash611 2020

[18] Y Meng and L Long ldquoA deep learning Approach for a sourcecode detection model using self-attentionrdquo Complexityvol 2020 Article ID 5027198 15 pages 2020

[19] L Gligic A Kormilitzin P Goldberg and A Nevado-Hol-gado ldquoNamed entity recognition in electronic health recordsusing transfer learning bootstrapped neural networksrdquoNeural Networks vol 121 pp 132ndash139 2020

[20] C Dumitrescu and I Dumitrache ldquoCombining deep learningtechnologies with multi-level gabor features for facial rec-ognition in biometric automated systemsrdquo Studies in Infor-matics and Control vol 28 no 2 pp 221ndash230 2019

[21] J Chen Z Chang and J Xu ldquoRecognition and classificationof biomedical named entities based on HMMrdquo Computer Ageno 10 pp 40ndash42 2006

[22] M Sui and L Cui ldquoCRF model combining multiple char-acteristics for chemical substance-disease named entity rec-ognitionrdquo Modern Library and Information Technologyno 10 pp 91ndash97 2016

[23] X Sun C Sun and F Ren ldquoBiomedical named entity rec-ognition based on deep conditional random fieldsrdquo PatternRecognition and Artificial Intelligence vol 29 no 11pp 997ndash1008 2016

[24] A Cocos A Fiks and A Masino ldquoDeep learning for phar-macovigilance recurrent neural network architectures forlabeling adverse drug reactions in Twitter postsrdquo Journal ofthe American Medical Informatics Association vol 24 no 4pp 813ndash821 2017

[25] F Zhang and M Wang ldquoMedical named entity recognitionbased on deep learningrdquo Computing Technology and Auto-mation vol 36 no 1 pp 123ndash127 2017

[26] L Qin N Yu and D Zhao ldquoApplying the convolutionalneural network deep learning technology to behaviouralrecognition in intelligent videordquo Tehnicki Vjesnik-TechnicalGazette vol 25 no 2 pp 528ndash535 2018

[27] U Zuperl and F Cus ldquoA cyber-physical system for smartfixture monitoring via clamping simulationrdquo InternationalJournal of Simulation Modelling vol 18 no 1 pp 112ndash1242019

[28] J M Giorgi and G D Bader ldquoTowards reliable named entityrecognition in the biomedical domainrdquo Bioinformaticsvol 36 no 1 pp 280ndash286 2020

[29] I Unanue E Borzeshi and M Piccardi ldquoRecurrent neuralnetworks with specialized word embeddings for health-do-main named-entity recognitionrdquo Journal of Biomedical In-formatics vol 76 pp 102ndash109 2017

[30] L Weston V Tshitoyan J Dagdelen et al ldquoNamed entityrecognition and normalization applied to large-scale infor-mation extraction from the materials science literaturerdquoJournal of Chemical Information and Modeling vol 59 no 9pp 3692ndash3702 2019

[31] Z H Kilimci and S Akyokus ldquoDeep learning- and wordembedding-based heterogeneous classifier ensembles for textclassificationrdquo Complexity vol 2018 Article ID 713014610 pages 2018

[32] S Chowdhury X Dong and L Qian ldquoA multitask bi-di-rectional RNN model for named entity recognition on Chi-nese electronic medical recordsrdquo BMC Bioinformatics vol 19no 17 pp 75ndash84 2018

[33] M Al-Smadi S Al-Zboon Y Jararweh and P JuolaldquoTransfer learning for Arabic named entity recognition withdeep neural networksrdquo Ieee Access vol 8 no 31 pp 37ndash452020

[34] M Gridach and H Haddad ldquoArabic named entity recogni-tion a bidirectional GRU-CRF approachrdquo in ComputationalLinguistics and Intelligent Text Processing Lecture Notes inComputer Science vol 10761 pp 264ndash275 no 1 SpringerBerlin Germany 2018

[35] Y Kim and T Lee ldquoKorean clinical entity recognition fromdiagnosis text using bertrdquo BMC Medical Informatics andDecision Making vol 20 no 1 p 242 2020

[36] J Lee W Yoon S Kim et al ldquoBiobert a pre-trained bio-medical language representation model for biomedical textminingrdquo Bioinformatics (Oxford England) vol 36 no 4pp 1234ndash1240 2020

[37] P Yang and W Dong ldquoChinese named entity recognitionmethod based on BERT embeddingrdquo Computer Engineeringvol 46 no 4 pp 40ndash45+52 2020

[38] Y Cho and Y Lee ldquoBiomedical named entity recognitionusing deep neural networks with contextual informationrdquoBMC Bioinformatics vol 20 no 1 p 735 2019

[39] X Chen C Ouyang Y Liu and Y Bu ldquoImproving the namedentity recognition of Chinese electronic medical records bycombining domain dictionary and rulesrdquo InternationalJournal of Environmental Research and Public Health vol 17no 8 p 2687 2020

[40] F Li Y Jin W Liu B Rawat P Cai and H Yu ldquoFine-tuningbidirectional encoder representations from transformers(Bert)-based models on large-scale electronic health recordnotes an empirical studyrdquo JMIR Medical Informatics vol 7no 3 Article ID e14830 2019

[41] Y Feng H Yu and G Sun ldquoNamed entity recognitionmethod based on BLSTM[J]rdquo Computer Science vol 45 no 2pp 261ndash268 2018

[42] J Wu Y Cheng and H Hao ldquoResearch on Chinese pro-fessional term extraction based on BERTembedded BiLSTM-CRF modelrdquo Journal of Information vol 39 no 4 pp 409ndash418 2020

Complexity 11

disease diagnosis +e literature [10ndash12] established a jointdiagnosis model based on logitic regression method andXGBoost machine learning method by collecting clinicaldata of demographic characteristics (2) From the per-spective of prognosis prediction the use of machine learningmethods for risk prediction has gradually become the trendof disease prediction while machine learning methods suchas random forest decision tree SVM and other machinelearning methods have achieved certain research results inthe prediction of cerebrovascular diseases Literature [13ndash15]constructed logistic regression k-NN random forest de-cision tree and SVM machine learning models based onfollow-up data and verified the advantages of machinelearning models in cerebrovascular disease risk prediction Itshows that the effect of neural network learning score isbetter In short from the analysis of clinical data sourcesCVD medical data includes cerebrovascular disease imagingdata follow-up data electronic medical records and otherdata +e focus of most scholars is still on structured datasuch as follow-up data and neuroimaging data while thefocus on electronic medical record data in the field of CVD isslightly lower At present the increase in the number ofCVD patients is accompanied by the ever-increasingnumber of electronic medical records for CVD patientsElectronic medical records can provide scholars with moredata resources For the processing of unstructured text in-formation in electronic medical records named entityrecognition (NER) is a key step and there are relatively fewresearches dedicated to named entity recognition in the fieldof cerebrovascular

+e current research on named entity recognition fo-cuses on three aspects (1) From the perspective of tradi-tional entity recognition methods traditional methodsinclude methods based on dictionaries and rules [16ndash23]+is method relies heavily on domain dictionaries anddomain experts +e selection of features is done manuallyand subjectivity and labor costs are relatively large With thedevelopment of machine learning technology [19 20] moreand more scholars are paying attention to models such asconditional random field (CRF) Hidden Markov Model(HMM) Support Vector Machine (SVM) However NERbased on traditional machine learning technology has higherrequirements for feature selection [21ndash23] and the quality offeature selection directly affects the effect of entity recog-nition (2) From the perspective of deep learning methodswith the development of deep learning technology literature[24ndash27] confirmed the advantages of deep neural networktechnology by comparing traditional CRF models that isdeep neural network technology has less artificial featureintervention than traditional methods and can obtain higheraccuracy and recall rate Deep learning can automaticallyextract word features reduce the subjectivity of featureselection and help further improve the accuracy of recog-nition results +erefore it is better than traditional statis-tical algorithms such as CRF and HMM +e commonsingle-entity recognition neural network generally onlyconsiders the sample input and lacks in-depth thinkingabout the output relationship Based on the idea of modelfusion most scholars usually use LSTM-CRF as the main

framework to solve the deficiencies of neural networkmodels +e available literature [28ndash32] uses traditionalword2vec Glove and other word vector methods uses aBiLSTM-CRF model as the core and adds a CNN modelattention mechanism RNN model etc to the coreframework Furthermore the word vector undergoes con-tinuous fine-tuning of the parameters resulting in the finalrecognition achieving more accurate recognition +e pro-cess of parameter tuning is employed to set the hyper-parameters of the model However for the BiLSTM-CRFmodel there are more parameter settings and the modeltraining time is longer Literature [26ndash31 33 34] proposedthe BGRU neural model which has simple results and highcomputational efficiency can make full use of context in-formation to eliminate entity ambiguity and has some goodeffects in the field of entity recognition (3) From the per-spective of pre-training models the above-described pre-processing models all use traditional word vector methodssuch as word2vec and Glove +is method focuses on thefeature extraction between words and often ignores wordcontext information In order to improve this problem asthe Google BERT pre-training model is proposed the lit-erature studies [35ndash38] combine the BERTword embeddingmodel on the basis of the traditional BiLSTM-CRF modeland consider the polysemy of a word in combination withthe context +e P value R value and F1 score have all beenimproved It can be seen that BERT has strong semanticanalysis capabilities

In order to solve the problems of ignoring context in-formation low model efficiency and susceptibility to wordsegmentation in electronic CVD medical entity recognitionprocessing We propose a BERT-BiGRU-CRF neural net-work model to identify named entities in electronic medicalrecords of cerebrovascular diseases Specifically the BERTlayer first converts the electronic medical record text into alow-dimensional vector then inputs the vector into theBiGRU layer to capture contextual features and finally uses aCRF to capture dependency between adjacent tags +eentity extraction model proposed in this paper has achievedgood recognition results

2 BERT-BiGRU-CRF Model Construction

In the NER field the use of deep neural network modelsfor entity recognition has become the mainstream +isarticle uses BiGRU-CRF as a benchmark to extract namedentities in the field of cerebrovascular diseases +ereason why the BERT pre-training language model ischosen is that the text vector is used as the input of themodel and the granularity of Chinese division is dividedinto character-level and word-level Existing researchshows that character-level pretraining schemes showbetter results [37 39] while the BERT pre-traininglanguage model is a character-level pretraining program+at is each word in the text is converted into a vector byquerying the word vector table as the model input themodel output is the vector representation combined withthe context

2 Complexity









1000(2idmodel)1113888 1113889 (1)


1000(2idmodel)1113888 1113889 (2)




Complexity 3





O

BERT



GRU

GRU GRU GRU GRU

GRU GRU GRU

O B ICRF layer

The forwardGRU

Input vector

The backwardGRU

BERT layer

BiGRU layer


E1 E2 En

Trm Trm Trm

TrmTrmTrm

T1 T2 Tn

hellip

hellip

hellip

hellip


htndash1

tanh

ht

rt zt

1ndash

xt

σ σh t


4 Complexity









hrarr

t G Rrarr

U xt( 1113857 (7)

hlarr

t GRlarr

U xt( 1113857 (8)

ht lt hrarr

t hlarr

t gt (9)

where hrarr









score(h y) 1113944T


T

t1W

tytht (10)

Py

h1113874 1113875

escore(hy)


(11)


h yprime( 1113857 (12)





GRU

GRU GRU GRU GRU

GRU GRU GRU

hellip

hellip


Complexity 5



w2j (13)














6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11









1000(2idmodel)1113888 1113889 (1)


1000(2idmodel)1113888 1113889 (2)




Complexity 3





O

BERT



GRU

GRU GRU GRU GRU

GRU GRU GRU

O B ICRF layer

The forwardGRU

Input vector

The backwardGRU

BERT layer

BiGRU layer


E1 E2 En

Trm Trm Trm

TrmTrmTrm

T1 T2 Tn

hellip

hellip

hellip

hellip


htndash1

tanh

ht

rt zt

1ndash

xt

σ σh t


4 Complexity









hrarr

t G Rrarr

U xt( 1113857 (7)

hlarr

t GRlarr

U xt( 1113857 (8)

ht lt hrarr

t hlarr

t gt (9)

where hrarr









score(h y) 1113944T


T

t1W

tytht (10)

Py

h1113874 1113875

escore(hy)


(11)


h yprime( 1113857 (12)





GRU

GRU GRU GRU GRU

GRU GRU GRU

hellip

hellip


Complexity 5



w2j (13)














6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11





O

BERT



GRU

GRU GRU GRU GRU

GRU GRU GRU

O B ICRF layer

The forwardGRU

Input vector

The backwardGRU

BERT layer

BiGRU layer


E1 E2 En

Trm Trm Trm

TrmTrmTrm

T1 T2 Tn

hellip

hellip

hellip

hellip


htndash1

tanh

ht

rt zt

1ndash

xt

σ σh t


4 Complexity









hrarr

t G Rrarr

U xt( 1113857 (7)

hlarr

t GRlarr

U xt( 1113857 (8)

ht lt hrarr

t hlarr

t gt (9)

where hrarr









score(h y) 1113944T


T

t1W

tytht (10)

Py

h1113874 1113875

escore(hy)


(11)


h yprime( 1113857 (12)





GRU

GRU GRU GRU GRU

GRU GRU GRU

hellip

hellip


Complexity 5



w2j (13)














6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11









hrarr

t G Rrarr

U xt( 1113857 (7)

hlarr

t GRlarr

U xt( 1113857 (8)

ht lt hrarr

t hlarr

t gt (9)

where hrarr









score(h y) 1113944T


T

t1W

tytht (10)

Py

h1113874 1113875

escore(hy)


(11)


h yprime( 1113857 (12)





GRU

GRU GRU GRU GRU

GRU GRU GRU

hellip

hellip


Complexity 5



w2j (13)














6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11



w2j (13)














6 Complexity

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11

P Tnum entities

Snum entities (14)

R Tnum entities

Cnum entities (15)

F1 2PlowastR

(P + R) (16)













Complexity 7




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11




BiLSTM-CRFP 8401 8909 8579 9116 8328 8667R 7950 8983 8624 9204 8506 8653F1 8169 8946 8601 9160 8416 8660


BiGRU-CRFP 8393 8820 8633 9190 8429 8693R 8002 8738 8520 9184 8522 8593F1 8193 8779 8576 9187 8475 8643


PRF1

80

85

90

95

100

Rate

()



BiLSTM-CRF


85

90

95

F 1 (

)

(a)

BERT-BiLSTM-CRF


85

90

95

F 1 (

)

(b)

Figure 6 Continued

8 Complexity

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11

BiGRU-CRF


85

90

95F 1

()

(c)

BERT-BiGRU-CRF


85

90

95

F 1 (

)

(d)



5 100Epoch

0

20

40

60

80

100

F 1 (

)




Complexity 9

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11

5 Conclusions


Data Availability




Acknowledgments


References















Total = 9393



10 Complexity






























Complexity 11






























Complexity 11

ABERT-BiGRU-CRFModelforEntityRecognitionofChinese ...

Documents

Transcript of ABERT-BiGRU-CRFModelforEntityRecognitionofChinese ...