ckiplab/ckiptagger: CKIP Neural Chinese Word ... - GitHub

2025-02-08

# CkipTagger

CKIP Neural Chinese Word Segmentation, POS Tagging, and NER

Also: 中文 README

## GitHub

https://github.com/ckiplab/ckiptagger

## PyPI

https://pypi.org/project/ckiptagger

## Documentation

https://github.com/ckiplab/ckiptagger/wiki

## Author / Maintainers

- Peng-Hsuan Li @ CKIP (author / maintainer)
- Wei-Yun Ma @ CKIP (maintainer)

## Introduction

This open-source library implements neural CKIP-style Chinese NLP tools.

- (WS) word segmentation
- (POS) part-of-speech tagging
- (NER) named entity recognition

Related demo sites

- CkipTagger
- CKIP CoreNLP
- CKIPWS (classic)

## Features

- Performance improvements
- Do not auto delete/change/add characters
- Support indefinitely long sentences
- Support user-defined recommended-word list and must-word list

ASBC 4.0 Test Split (50,000 sentences)

| Tool | (WS) prec | (WS) rec | (WS) f1 | (POS) acc |
|---|---|---|---|---|
| CkipTagger | 97.49% | 97.17% | 97.33% | 94.59% |
| CKIPWS (classic) | 95.85% | 95.96% | 95.91% | 90.62% |
| Jieba-zh_TW | 90.51% | 89.10% | 89.80% | -- |

## Installation

tl;dr.

```
pip install -U ckiptagger[tf,gdown]
```

CkipTagger is a Python library hosted on PyPI. Requirements:

- python>=3.6
- tensorflow>=1.13.1 / tensorflow-gpu>=1.13.1 (one of them)
- gdown (optional, for downloading model files from Google Drive)

(Minimum installation) If you have already set up tensorflow and would like to download the model files yourself.

```
pip install -U ckiptagger
```

(Complete installation) If you have just set up a clean virtual environment and want everything, including GPU support.

```
pip install -U ckiptagger[tfgpu,gdown]
```

## Usage

Complete demo script: demo.py. The following sections assume:

```python
from ckiptagger import data_utils, construct_dictionary, WS, POS, NER
```

### 1. Download model files

The model files are available on several mirror sites.

- iis-ckip
- gdrive-ckip
- gdrive-jacobvsdanniel

You can download and extract them to the desired path using one of the included API functions.

```python
# Downloads to ./data.zip (2GB) and extracts to ./data/
# data_utils.download_data_url("./") # iis-ckip
data_utils.download_data_gdown("./") # gdrive-ckip
```

- ./data/model_ner/pos_list.txt -> POS tag list, see Wiki / Technical Report no. 93-05
- ./data/model_ner/label_list.txt -> Entity type list, see Wiki / OntoNotes Release 5.0
- ./data/embedding_* -> character/word embeddings, see Wiki

### 2. Load model

```python
# To use GPU:
#    1. Install tensorflow-gpu (see Installation)
#    2. Set CUDA_VISIBLE_DEVICES environment variable, e.g. os.environ["CUDA_VISIBLE_DEVICES"] = "0"
#    3. Set disable_cuda=False, e.g. ws = WS("./data", disable_cuda=False)
# To use CPU:
ws = WS("./data")
pos = POS("./data")
ner = NER("./data")
```

### 3. (Optional) Create dictionary

You can supply words for WS special consideration, including their relative weights.

```python
word_to_weight = {
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,  # empty word; absent from the printed dictionary below
    "來亂的": "啦",  # non-numeric weight; absent from the printed dictionary below
    "緯來體育台": 1,
}
dictionary = construct_dictionary(word_to_weight)
print(dictionary)
```

```
[(2, {'公有': 2.0}), (3, {'土地公': 1.0, '土地婆': 1.0}), (5, {'緯來體育台': 1.0})]
```

### 4. Run the WS-POS-NER pipeline

```python
sentence_list = [
    "傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。",
    "美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會，預料她將會很順利通過參議院支持，成為該國有史以來第一位的華裔女性內閣成員。",
    "",
    "土地公有政策?？還是土地婆有政策。.",
    "… 你確定嗎… 不要再騙了……",
    "最多容納59,000個人,或5.9萬人,再多就不行了.這是環評的結論.",
    "科長說:1,坪數對人數為1:3。2,可以再增加。",
]

word_sentence_list = ws(
    sentence_list,
    # sentence_segmentation = True, # To consider delimiters
    # segment_delimiter_set = {",", "。", ":", "?", "!", ";"}, # This is the default set of delimiters
    # recommend_dictionary = dictionary1, # words in this dictionary are encouraged
    # coerce_dictionary = dictionary2, # words in this dictionary are forced
)

pos_sentence_list = pos(word_sentence_list)

entity_sentence_list = ner(word_sentence_list, pos_sentence_list)
```

### 5. (Optional) Release memory

```python
del ws
del pos
del ner
```

### 6. Show Results

```python
def print_word_pos_sentence(word_sentence, pos_sentence):
    # Print each word with its POS tag, separated by full-width spaces
    assert len(word_sentence) == len(pos_sentence)
    for word, pos in zip(word_sentence, pos_sentence):
        print(f"{word}({pos})", end="\u3000")
    print()
    return

for i, sentence in enumerate(sentence_list):
    print()
    print(f"'{sentence}'")
    print_word_pos_sentence(word_sentence_list[i], pos_sentence_list[i])
    for entity in sorted(entity_sentence_list[i]):
        print(entity)
```

```

'傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。'
傅達仁(Nb)　今(Nd)　將(D)　執行(VC)　安樂死(Na)　，(COMMACATEGORY)　卻(D)　突然(D)　爆出(VJ)　自己(Nh)　20(Neu)　年(Nf)　前(Ng)　遭(P)　緯來(Nb)　體育台(Na)　封殺(VC)　，(COMMACATEGORY)　他(Nh)　不(D)　懂(VK)　自己(Nh)　哪裡(Ncd)　得罪到(VJ)　電視台(Nc)　。(PERIODCATEGORY)　
(0, 3, 'PERSON', '傅達仁')
(18, 22, 'DATE', '20年前')
(23, 28, 'ORG', '緯來體育台')

'美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會，預料她將會很順利通過參議院支持，成為該國有史以來第一位的華裔女性內閣成員。'
美國(Nc)　參議院(Nc)　針對(P)　今天(Nd)　總統(Na)　布什(Nb)　所(D)　提名(VC)　的(DE)　勞工部長(Na)　趙小蘭(Nb)　展開(VC)　認可(VC)　聽證會(Na)　，(COMMACATEGORY)　預料(VE)　她(Nh)　將(D)　會(D)　很(Dfa)　順利(VH)　通過(VC)　參議院(Nc)　支持(VC)　，(COMMACATEGORY)　成為(VG)　該(Nes)　國(Nc)　有史以來(D)　第一(Neu)　位(Nf)　的(DE)　華裔(Na)　女性(Na)　內閣(Na)　成員(Na)　。(PERIODCATEGORY)　
(0, 2, 'GPE', '美國')
(2, 5, 'ORG', '參議院')
(7, 9, 'DATE', '今天')
(11, 13, 'PERSON', '布什')
(17, 21, 'ORG', '勞工部長')
(21, 24, 'PERSON', '趙小蘭')
(42, 45, 'ORG', '參議院')
(56, 58, 'ORDINAL', '第一')
(60, 62, 'NORP', '華裔')

''


'土地公有政策?？還是土地婆有政策。.'
土地公(Nb)　有(V_2)　政策(Na)　?(QUESTIONCATEGORY)　？(QUESTIONCATEGORY)　還是(Caa)　土地(Na)　婆(Na)　有(V_2)　政策(Na)　。(PERIODCATEGORY)　.(PERIODCATEGORY)　
(0, 3, 'PERSON', '土地公')

'… 你確定嗎… 不要再騙了……'
…(ETCCATEGORY)　 (WHITESPACE)　你(Nh)　確定(VK)　嗎(T)　…(ETCCATEGORY)　 (WHITESPACE)　不要(D)　再(D)　騙(VC)　了(Di)　…(ETCCATEGORY)　…(ETCCATEGORY)　

'最多容納59,000個人,或5.9萬人,再多就不行了.這是環評的結論.'
最多(VH)　容納(VJ)　59,000(Neu)　個(Nf)　人(Na)　,(COMMACATEGORY)　或(Caa)　5.9萬(Neu)　人(Na)　,(COMMACATEGORY)　再(D)　多(D)　就(D)　不行(VH)　了(T)　.(PERIODCATEGORY)　這(Nep)　是(SHI)　環評(Na)　的(DE)　結論(Na)　.(PERIODCATEGORY)　
(4, 10, 'CARDINAL', '59,000')
(14, 18, 'CARDINAL', '5.9萬')

'科長說:1,坪數對人數為1:3。2,可以再增加。'
科長(Na)　說(VE)　:1,(Neu)　坪數(Na)　對(P)　人數(Na)　為(VG)　1:3(Neu)　。(PERIODCATEGORY)　2(Neu)　,(COMMACATEGORY)　可以(D)　再(D)　增加(VHC)　。(PERIODCATEGORY)　
(4, 6, 'CARDINAL', '1,')
(12, 13, 'CARDINAL', '1')
(14, 15, 'CARDINAL', '3')
(16, 17, 'CARDINAL', '2')
```

## Model Details

Please see:

- Peng-Hsuan Li, Tsu-Jui Fu, and Wei-Yun Ma. 2020. Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI/arXiv).

## LICENSE

Copyright (c) 2019 CKIP Lab.

This Work is licensed under the GNU General Public License v3.0 without any warranties. The license text in full can be accessed in the file named COPYING-GPL-3.0. Any person obtaining a copy of this Work and associated documentation files is granted the rights to use, copy, modify, merge, publish, and distribute the Work for any purpose. However, if any work is based upon this Work and hence constitutes a Derivative Work, the GPL-3.0 license requires distributions of the Work and the Derivative Work to remain under the same license or a similar license with the Source Code provision obligation.

For commercial license without the Source Code conveying liability, please contact

For other questions, please contact
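
### Working with the entity tuples (illustrative sketch)

The entity tuples printed in the Show Results section appear to have the form `(start, end, type, text)`, where `start` and `end` are half-open character offsets into the input sentence. The following sketch checks that reading against the first example sentence and groups the entities by type. It does not call ckiptagger: the tuples are copied by hand from the printed output, and `group_entities_by_type` is a hypothetical helper name, not part of the library.

```python
from collections import defaultdict

# First example sentence and its entities, copied from the printed results.
sentence = "傅達仁今將執行安樂死，卻突然爆出自己20年前遭緯來體育台封殺，他不懂自己哪裡得罪到電視台。"
entities = [
    (0, 3, "PERSON", "傅達仁"),
    (18, 22, "DATE", "20年前"),
    (23, 28, "ORG", "緯來體育台"),
]


def group_entities_by_type(sentence, entities):
    """Hypothetical helper: validate offsets, then group entity text by type."""
    grouped = defaultdict(list)
    for start, end, label, text in sorted(entities):
        # Assumption under test: sentence[start:end] reproduces the entity text.
        if sentence[start:end] != text:
            raise ValueError(f"offset mismatch for {text!r} at [{start}:{end}]")
        grouped[label].append(text)
    return dict(grouped)


by_type = group_entities_by_type(sentence, entities)
print(by_type)
```

If the offsets were byte positions or inclusive ranges instead, the slice check would raise, so the helper doubles as a sanity check on that assumption before the offsets are used for highlighting or indexing.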