What is the regex to extract all the emojis from a string?
Asked 7 years, 4 months ago
Active 3 months ago
Viewed 101k times
64
24
I have a String encoded in UTF-8. For example:
Thats a nice joke 😆😆😆 😛
I have to extract all the emojis present in the sentence. And the emoji could be any
When this sentence is viewed in terminal using command less text.txt it is viewed as:
Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>
This is the corresponding UTF code for the emoji. All the codes for emojis can be found at emojitracker.
For the purpose of finding all the occurrences, I used a regular expression pattern () but it didn't work for the UTF-8 encoded string.
Following is my code:
    String s = "Thats a nice joke 😆😆😆 😛";
    Pattern pattern = Pattern.compile("()");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();
    while (matcher.find()) {
        matchList.add(matcher.group());
    }
    for (int i = 0; i …

… string is specific to less - also, your solution idea would also capture just about any other Unicode character. The only real solution would be to have a list of all Unicode code points corresponding to emojis.
– DrewMcGowen
Jul 19 '14 at 13:04
You'll have to find a list of all of the emoji characters (code points) you want to find, they're spread over many different Unicode blocks. This PDF has a "good sample" (according to the first link)...
– T.J.Crowder
Jul 19 '14 at 13:07
@T.J.Crowder the pdf that you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So lets say I want to capture any character lying within this range. Now what to do?
– vishalaksh
Jul 19 '14 at 13:16
1
I came here trying to find a regex that I can paste into Sublime Text to find emojis. No luck.
– adib
Nov 14 '16 at 1:59
You can use Character class stackoverflow.com/questions/28366172/check-if-letter-is-emoji/…
– user2474486
Dec 14 '16 at 16:32
17 Answers
53
Using emoji-java I've written a simple method that removes all emojis, including Fitzpatrick modifiers. It requires an external library but is easier to maintain than those monster regexes.
Use:
    String input = "A string 😄with a \uD83D\uDC66\uD83C\uDFFFfew 😉emojis!";
    String result = EmojiParser.removeAllEmojis(input);
emoji-java maven installation:
    <dependency>
      <groupId>com.vdurmont</groupId>
      <artifactId>emoji-java</artifactId>
      <version>3.1.3</version>
    </dependency>
gradle:
    implementation 'com.vdurmont:emoji-java:3.1.3'
EDIT: previously submitted answer was pulled into emoji-java source code.
edited Sep 16 '20 at 13:42
user14817809
answered Sep 30 '15 at 17:35
gidim
6
4
I love answers like these. This worked like a charm. Thanks!
– TheKingInTheNorth
Jan 19 '16 at 16:03
I also used this library to remove emojis and it worked perfectly. One thing: the code snippet is outdated and did not work for me with the latest version (threw some pattern exception). The documentation recommends EmojiParser#removeAllEmojis(String), and that indeed works smoothly.
– YonatanWilkof
Jun 2 '16 at 5:44
If you are using this, here is a link to the jar: github.com/vdurmont/emoji-java/releases and this is a link to the dependency: mvnrepository.com/artifact/org.json/json/20080701
– Whitecat
Oct 12 '16 at 18:25
1
@gidim, please update the version of the dependencies to 3.1.3. Version 2.0.1 that you listed doesn't have EmojiParser.removeAllEmojis(String input). Other than that, thumbs up for the great library!
– BrunoCarrier
Oct 31 '16 at 20:14
1
@BrunoCarrier thanks! Updated. By the way, I'm not the author of the library. I just wrote the emoji removal function.
– gidim
Nov 1 '16 at 16:55
34
the pdf that you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So lets say I want to capture any character lying within this range. Now what to do?
Okay, but I will just note that the emoji in your question are outside that range! :-)
The fact that these are above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use one simple character class for it. We're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html)
U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first character went up, so we cross at least one boundary. So we have to know what ranges of surrogate pairs we're looking for.
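As a quick cross-check of those two pairs, here is a minimal sketch that derives the surrogates by hand from the UTF-16 rules and compares them with what Character.toChars produces. The class and method names (SurrogatePairDemo, toSurrogates, escape) are made up for this illustration and are not part of the answer's own program.

```java
import java.util.Arrays;

// Illustrative sketch only: names here are hypothetical, not from the answer.
class SurrogatePairDemo {

    // Manual UTF-16 encoding of a supplementary code point (>= 0x10000).
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Renders each UTF-16 unit as a backslash-u escape followed by 4 hex digits.
    static String escape(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int cp : new int[] { 0x1F300, 0x1F5FF, 0x1F606 }) {
            char[] manual = toSurrogates(cp);
            char[] library = Character.toChars(cp);
            System.out.println(String.format("U+%X", cp) + " -> " + escape(manual)
                    + " (matches Character.toChars: " + Arrays.equals(manual, library) + ")");
        }
    }
}
```

The third code point, U+1F606, is the first emoji from the question; its high surrogate is the same as that of U+1F5FF, but its low surrogate lies past \uDDFF, which is why it falls outside the range discussed here.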
Not being steeped in knowledge about the inner workings of UTF-16, I wrote a program to find out (source at the end; I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).
So armed with that knowledge, in theory we could now write a pattern:
    // This is wrong, keep reading
    Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
That's an alternation of two non-capturing groups, the first group for the pairs starting with \uD83C, and the second group for the pairs starting with \uD83D.
But that fails (doesn't find anything). I'm fairly sure it's because we're trying to specify half of a surrogate pair in various places:
    Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
    // Half of a pair --------------^------^------^-----------^------^------^
We can't just split up surrogate pairs like that, they're called surrogate pairs for a reason. :-)
Consequently, I don't think we can use regular expressions (or indeed, any string-based approach) for this at all. I think we have to search through char arrays.
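For what it's worth, on Java 7 and later java.util.regex matches by code point and accepts \x{h...h} escapes, so a supplementary range can be written without splitting any pair. The following is only a sketch under that assumption: the class and method names are invented for illustration, and the second range (U+1F600 to U+1F64F, the Emoticons block) is my addition so that the emoji from the question are covered; it is not a complete emoji list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names and the second range are assumptions.
class CodePointRegexDemo {

    // U+1F300..U+1F5FF is the range quoted in the question's comment;
    // U+1F600..U+1F64F (Emoticons) is added here as an assumption.
    static final Pattern EMOJI =
            Pattern.compile("[\\x{1F300}-\\x{1F5FF}\\x{1F600}-\\x{1F64F}]");

    // Returns every match; each match is one code point (two chars).
    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOJI.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Builds the sample sentence from the question via code points.
    static String sample() {
        return new StringBuilder()
                .append("Thats a nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
    }

    public static void main(String[] args) {
        List<String> found = findAll(sample());
        System.out.println("Matches: " + found.size());
        for (String match : found) {
            System.out.println(String.format("U+%X", match.codePointAt(0)));
        }
    }
}
```

Whether a range-based class like this is good enough depends on which characters you count as emoji; it says nothing about modifiers, flags, or joined sequences.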
char arrays hold UTF-16 values, so we can find those half-pairs in the data if we look for them the hard way:
    String s = new StringBuilder()
        .append("Thats a nice joke ")
        .appendCodePoint(0x1F606)
        .appendCodePoint(0x1F606)
        .appendCodePoint(0x1F606)
        .append(" ")
        .appendCodePoint(0x1F61B)
        .toString();
    char[] chars = s.toCharArray();
    int index;
    char ch1;
    char ch2;
    index = 0;
    while (index < chars.length - 1) { // stop one short: each check reads chars[index + 1]
        ch1 = chars[index];
        if ((int)ch1 == 0xD83C) {
            ch2 = chars[index + 1];
            if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
                System.out.println("Found emoji at index " + index);
                index += 2;
                continue;
            }
        }
        else if ((int)ch1 == 0xD83D) {
            ch2 = chars[index + 1];
            if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
                System.out.println("Found emoji at index " + index);
                index += 2;
                continue;
            }
        }
        ++index;
    }
Obviously that's just debug-level code, but it does the job. (In your given string, with its emoji, of course it won't find anything as they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. No idea if that would also include non-emojis, though.)
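The same scan can also be written in terms of code points rather than half-pairs. This is a sketch, not a replacement for the loop above: it assumes the widened bound mentioned in the parenthesis (0xDEFF), which together with the first pair corresponds to the single range U+1F300 to U+1F6FF. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical.
class CodePointScanDemo {

    // Returns the char index of every code point in U+1F300..U+1F6FF.
    // That range equals \uD83C + [DF00-DFFF] plus \uD83D + [DC00-DEFF].
    static List<Integer> findIndexes(String s) {
        List<Integer> indexes = new ArrayList<>();
        int index = 0;
        while (index < s.length()) {
            int cp = s.codePointAt(index);      // reads a full pair if present
            if (cp >= 0x1F300 && cp <= 0x1F6FF) {
                indexes.add(index);
            }
            index += Character.charCount(cp);   // 1 for BMP chars, 2 for pairs
        }
        return indexes;
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
                .append("Thats a nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
        for (int i : findIndexes(s)) {
            System.out.println("Found emoji at index " + i);
        }
        // prints indexes 18, 20, 22 and 25
    }
}
```

As with the original loop, the range check is only as good as the range itself, so non-emoji characters inside U+1F300 to U+1F6FF would be reported too.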
Source of my program to find out what the surrogate ranges were:
    public class FindRanges {
        public static void main(String[] args) {
            char last0 = '\0';
            char last1 = '\0';
            for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
                char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
                if (chars[0] != last0) {
                    if (last0 != '\0') {
                        System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
                    }
                    System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + "\\u" + Integer.toHexString((int)chars[1]).toUpperCase());
                    last0 = chars[0];
                }
                last1 = chars[1];
            }
            if (last0 != '\0') {
                System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
            }
        }
    }
Output:
\uD83C\uDF00-\uDFFF
\uD83D\uDC00-\uDDFF
edited Jul 19 '14 at 15:52
answered Jul 19 '14 at 13:45
T.J.Crowder
1
@purrrminator: See notes about ranges. The above is just an example handling a specific range, but I warned the OP there were others.
– T.J.Crowder
Aug 11 '14 at 12:35
19
Had a similar problem. The following served me well and matches surrogate pairs:
    public class SplitByUnicode {
        public static void main(String[] argv) throws Exception {
            String string = "Thats a nice joke 😆😆😆 😛";
            System.out.println("Original String:" + string);
            String regexPattern = "[\uD83C-\uDBFF\uDC00-\uDFFF]+";
            byte[] utf8 = string.getBytes("UTF-8");
            String string1 = new String(utf8, "UTF-8");
            Pattern pattern = Pattern.compile(regexPattern);
            Matcher matcher = pattern.matcher(string1);
            List<String> matchList = new ArrayList<String>();
            while (matcher.find()) {
                matchList.add(matcher.group());
            }
            for (int i = 0; i …

    … matchList = new ArrayList<String>();
            while (matcher.find()) {
                matchList.add(matcher.group());
            }
            for (int i = 0; i …

    … = codePoint && codePoint <= 0x1ffff)
            {
                return Integer.toHexString(codePoint).toCharArray();
            }
            return Character.toChars(codePoint);
        }
    }
answered Aug 18 '14 at 17:43
Mr.C
5
Emoji regex
public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|" +
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"+
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?"+
"(?:\\u200d(?:[^\\ud800-\\udfff]|"+
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|"+
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|"+
"[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";
some emojis (1627)
// count = 1627
publicstaticfinalStringsEmojiTest="😀😃😄😁😆😅😂🤣☺️😊😇🙂🙃😉😌😍😘😗😙😚😋😜😝😛🤑🤗🤓😎🤡🤠😏😒😞😔😟😕🙁☹️😣😖😫😩😤😠😡😶😐😑😯😦😧😮😲😵😳😱😨😰😢😥🤤😭😓😪😴🙄🤔🤥😬🤐🤢🤧😷🤒🤕😈👿👹👺💩👻💀☠️👽👾🤖🎃😺😸😹😻😼😽🙀😿😾👐🙌👏🙏🤝👍👎👊✊🤛🤜🤞✌️🤘👌👈👉👆👇☝️✋🤚🖐🖖👋🤙💪🖕✍️🤳💅💍💄💋👄👅👂👃👣👁👀🗣👤👥👶👦👧👨👩👱♀👱👴👵👲👳♀👳👮♀👮👷♀👷💂♀💂🕵️♀️🕵👩⚕👨⚕👩🌾👨🌾👩🍳👨🍳👩🎓👨🎓👩🎤👨🎤👩🏫👨🏫👩🏭👨🏭👩💻👨💻👩💼👨💼👩🔧👨🔧👩🔬👨🔬👩🎨👨🎨👩🚒👨🚒👩✈👨✈👩🚀👨🚀👩⚖👨⚖🤶🎅👸🤴👰🤵👼🤰🙇♀🙇💁💁♂🙅🙅♂🙆🙆♂🙋🙋♂🤦♀🤦♂🤷♀🤷♂🙎🙎♂🙍🙍♂💇💇♂💆💆♂🕴💃🕺👯👯♂🚶♀🚶🏃♀🏃👫👭👬💑👩❤️👩👨❤️👨💏👩❤️💋👩👨❤️💋👨👪👨👩👧👨👩👧👦👨👩👦👦👨👩👧👧👩👩👦👩👩👧👩👩👧👦👩👩👦👦👩👩👧👧👨👨👦👨👨👧👨👨👧👦👨👨👦👦👨👨👧👧👩👦👩👧👩👧👦👩👦👦👩👧👧👨👦👨👧👨👧👦👨👦👦👨👧👧👚👕👖👔👗👙👘👠👡👢👞👟👒🎩🎓👑⛑🎒👝👛👜💼👓🕶🌂☂️🐶🐱🐭🐹🐰🦊🐻🐼🐨🐯🦁🐮🐷🐽🐸🐵🙈🙉🙊🐒🐔🐧🐦🐤🐣🐥🦆🦅🦉🦇🐺🐗🐴🦄🐝🐛🦋🐌🐚🐞🐜🕷🕸🐢🐍🦎🦂🦀🦑🐙🦐🐠🐟🐡🐬🦈🐳🐋🐊🐆🐅🐃🐂🐄🦌🐪🐫🐘🦏🦍🐎🐖🐐🐏🐑🐕🐩🐈🐓🦃🕊🐇🐁🐀🐿🐾🐉🐲🌵🎄🌲🌳🌴🌱🌿☘️🍀🎍🎋🍃🍂🍁🍄🌾💐🌷🌹🥀🌻🌼🌸🌺🌎🌍🌏🌕🌖🌗🌘🌑🌒🌓🌔🌚🌝🌞🌛🌜🌙💫⭐️🌟✨⚡️🔥💥☄☀️🌤⛅️🌥🌦🌈☁️🌧⛈🌩🌨☃️⛄️❄️🌬💨🌪🌫🌊💧💦☔️🍏🍎🍐🍊🍋🍌🍉🍇🍓🍈🍒🍑🍍🥝🥑🍅🍆🥒🥕🌽🌶🥔🍠🌰🥜🍯🥐🍞🥖🧀🥚🍳🥓🥞🍤🍗🍖🍕🌭🍔🍟🥙🌮🌯🥗🥘🍝🍜🍲🍥🍣🍱🍛🍚🍙🍘🍢🍡🍧🍨🍦🍰🎂🍮🍭🍬🍫🍿🍩🍪🥛🍼☕️🍵🍶🍺🍻🥂🍷🥃🍸🍹🍾🥄🍴🍽⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️♀️🏋🤺🤼♀🤼♂🤸♀🤸♂⛹️♀️⛹🤾♀🤾♂🏌️♀️🏌🏄♀🏄🏊♀🏊🤽♀🤽♂🚣♀🚣🏇🚴♀🚴🚵♀🚵🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹♀🤹♂🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰🚗🚕🚙🚌🚎🏎🚓🚑🚒🚐🚚🚛🚜🛴🚲🛵🏍🚨🚔🚍🚘🚖🚡🚠🚟🚃🚋🚞🚝🚄🚅🚈🚂🚆🚇🚊🚉🚁🛩✈️🛫🛬🚀🛰💺🛶⛵️🛥🚤🛳⛴🚢⚓️🚧⛽️🚏🚦🚥🗺🗿🗽⛲️🗼🏰🏯🏟🎡🎢🎠⛱🏖🏝⛰🏔🗻🌋🏜🏕⛺️🛤🛣🏗🏭🏠🏡🏘🏚🏢🏬🏣🏤🏥🏦🏨🏪🏫🏩💒🏛⛪️🕌🕍🕋⛩🗾🎑🏞🌅🌄🌠🎇🎆🌇🌆🏙🌃🌌🌉🌁⌚️📱📲💻⌨️🖥🖨🖱🖲🕹🗜💽💾💿📀📼📷📸📹🎥📽🎞📞☎️📟📠📺📻🎙🎚🎛⏱⏲⏰🕰⌛️⏳📡🔋🔌💡🔦🕯🗑🛢💸💵💴💶💷💰💳💎⚖️🔧🔨⚒🛠⛏🔩⚙️⛓🔫💣🔪🗡⚔️🛡🚬⚰️⚱️🏺🔮📿💈⚗️🔭🔬🕳💊💉🌡🚽🚰🚿🛁🛀🛎🔑🗝🚪🛋🛏🛌🖼🛍🛒🎁🎈🎏🎀🎊🎉🎎🏮🎐✉️📩📨📧💌📥📤📦🏷📪📫📬📭📮📯📜📃📄📑📊📈📉🗒🗓📆📅📇🗃🗳🗄📋📁📂🗂🗞📰📓📔📒📕📗📘📙📚📖🔖🔗📎🖇📐📏📌📍✂️🖊🖋✒️🖌🖍📝✏️🔍🔎🔏🔐🔒🔓❤️💛💚💙💜🖤💔❣️💕💞💓💗💖💘💝💟☮️✝️☪️🕉☸️✡️🔯🕎☯️☦️🛐⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️🆔⚛️🉑☢️☣️📴📳🈶🈚️🈸🈺🈷️✴️🆚💮🉐㊙️㊗️🈴🈵🈹🈲🅰️🅱️🆎🆑🅾️🆘❌⭕️🛑⛔️📛🚫💯💢♨️🚷🚯🚳🚱🔞📵🚭❗️❕❓❔‼️⁉️🔅🔆〽️⚠️🚸🔱⚜️🔰♻️✅🈯️💹❇️✳️❎🌐💠Ⓜ️🌀💤🏧🚾♿️🅿️🈳🈂️🛂🛃🛄🛅🚹🚺🚼🚻🚮🎦📶🈁🔣ℹ️🔤🔡🔠🆖🆗🆙🆒🆕🆓0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣🔟🔢#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️🔼🔽➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️🔀🔁🔂🔄🔃🎵🎶➕➖➗✖️💲💱™️©️®️〰️➰➿🔚🔙🔛🔝🔜✔️☑️🔘⚪️⚫️🔴🔵🔺🔻🔸🔹🔶🔷🔳🔲▪️▫️◾️◽️◼️◻️⬛️⬜️🔈🔇🔉🔊🔔🔕📣📢👁🗨💬💭🗯♠️♣️♥️♦️🃏🎴🀄️🕐🕑🕒🕓🕔🕕🕖🕗🕘🕙🕚🕛🕜🕝🕞🕟🕠🕡🕢🕣🕤🕥🕦🕧🏳️🏴🏁🚩🏳️🌈🇦🇫🇦🇽🇦🇱🇩🇿🇦🇸🇦🇩🇦🇴🇦🇮🇦🇶🇦🇬🇦🇷🇦🇲🇦🇼🇦🇺🇦🇹🇦🇿🇧🇸🇧🇭🇧🇩🇧🇧🇧🇾🇧🇪🇧🇿🇧🇯🇧🇲🇧🇹🇧🇴🇧🇶🇧🇦🇧🇼🇧🇷🇮🇴🇻🇬🇧🇳🇧🇬🇧🇫🇧🇮🇨🇻🇰🇭🇨🇲🇨🇦🇮🇨🇰🇾🇨🇫🇹🇩🇨🇱🇨🇳🇨🇽🇨🇨🇨🇴🇰🇲🇨🇬🇨🇩🇨🇰🇨🇷🇨🇮🇭🇷🇨🇺🇨🇼🇨🇾🇨🇿🇩🇰🇩🇯🇩🇲🇩🇴🇪🇨🇪🇬🇸🇻🇬🇶🇪🇷🇪🇪🇪🇹🇪🇺🇫🇰🇫🇴🇫🇯🇫🇮🇫🇷🇬🇫🇵🇫🇹🇫🇬🇦🇬🇲🇬🇪🇩🇪🇬🇭🇬🇮🇬🇷🇬🇱🇬🇩🇬🇵🇬🇺🇬🇹🇬🇬🇬🇳🇬🇼🇬🇾🇭🇹🇭🇳🇭🇰🇭🇺🇮🇸🇮🇳🇮🇩🇮🇷🇮🇶🇮🇪🇮🇲🇮🇱🇮🇹🇯🇲🇯🇵🎌🇯🇪🇯🇴🇰🇿🇰🇪🇰🇮🇽🇰🇰🇼🇰🇬🇱🇦🇱🇻🇱🇧🇱🇸🇱🇷🇱🇾🇱🇮🇱🇹🇱🇺🇲🇴🇲🇰🇲🇬🇲🇼🇲🇾🇲🇻🇲🇱🇲🇹🇲🇭🇲🇶🇲🇷🇲🇺🇾🇹🇲🇽🇫🇲🇲🇩🇲🇨🇲🇳🇲🇪🇲🇸🇲🇦🇲🇿🇲🇲🇳🇦🇳🇷🇳🇵🇳🇱🇳🇨🇳🇿🇳🇮🇳🇪🇳🇬🇳🇺🇳🇫🇲🇵🇰🇵🇳🇴🇴🇲🇵🇰🇵🇼🇵🇸🇵🇦🇵🇬🇵🇾🇵🇪🇵🇭🇵🇳🇵🇱🇵🇹🇵🇷🇶🇦🇷🇪🇷🇴🇷🇺🇷🇼🇧🇱🇸🇭🇰🇳🇱🇨🇵🇲🇻🇨🇼🇸🇸🇲🇸🇹🇸🇦🇸🇳🇷🇸🇸🇨🇸🇱🇸🇬🇸🇽🇸🇰🇸🇮🇸🇧🇸🇴🇿🇦🇬🇸🇰🇷🇸🇸🇪🇸🇱🇰🇸🇩🇸🇷🇸🇿🇸🇪🇨🇭🇸🇾🇹🇼🇹🇯🇹🇿🇹🇭🇹🇱🇹🇬🇹🇰🇹🇴🇹🇹🇹🇳🇹🇷🇹🇲🇹🇨🇹🇻🇺🇬🇺🇦🇦🇪🇬🇧🇺🇸🇻🇮🇺🇾🇺
🇿🇻🇺🇻🇦🇻🇪🇻🇳🇼🇫🇪🇭🇾🇪🇿🇲🇿🇼⚽️🏀🏈⚾️🎾🏐🏉🎱🏓🏸🥅🏒🏑🏏⛳️🏹🎣🥊🥋⛸🎿⛷🏂🏋️♀️🏋🏻♀️🏋🏼♀️🏋🏽♀️🏋🏾♀️🏋🏿♀️🏋️🏋🏻🏋🏼🏋🏽🏋🏾🏋🏿🤺🤼♀️🤼♂️🤸♀️🤸🏻♀️🤸🏼♀️🤸🏽♀️🤸🏾♀️🤸🏿♀️🤸♂️🤸🏻♂️🤸🏼♂️🤸🏽♂️🤸🏾♂️🤸🏿♂️⛹️♀️⛹🏻♀️⛹🏼♀️⛹🏽♀️⛹🏾♀️⛹🏿♀️⛹️⛹🏻⛹🏼⛹🏽⛹🏾⛹🏿🤾♀️🤾🏻♀️🤾🏼♀️🤾🏽♀️🤾🏾♀️🤾🏿♀️🤾♂️🤾🏻♂️🤾🏼♂️🤾🏽♂️🤾🏾♂️🤾🏿♂️🏌️♀️🏌🏻♀️🏌🏼♀️🏌🏽♀️🏌🏾♀️🏌🏿♀️🏌️🏌🏻🏌🏼🏌🏽🏌🏾🏌🏿🏄♀️🏄🏻♀️🏄🏼♀️🏄🏽♀️🏄🏾♀️🏄🏿♀️🏄🏄🏻🏄🏼🏄🏽🏄🏾🏄🏿🏊♀️🏊🏻♀️🏊🏼♀️🏊🏽♀️🏊🏾♀️🏊🏿♀️🏊🏊🏻🏊🏼🏊🏽🏊🏾🏊🏿🤽♀️🤽🏻♀️🤽🏼♀️🤽🏽♀️🤽🏾♀️🤽🏿♀️🤽♂️🤽🏻♂️🤽🏼♂️🤽🏽♂️🤽🏾♂️🤽🏿♂️🚣♀️🚣🏻♀️🚣🏼♀️🚣🏽♀️🚣🏾♀️🚣🏿♀️🚣🚣🏻🚣🏼🚣🏽🚣🏾🚣🏿🏇🏇🏻🏇🏼🏇🏽🏇🏾🏇🏿🚴♀️🚴🏻♀️🚴🏼♀️🚴🏽♀️🚴🏾♀️🚴🏿♀️🚴🚴🏻🚴🏼🚴🏽🚴🏾🚴🏿🚵♀️🚵🏻♀️🚵🏼♀️🚵🏽♀️🚵🏾♀️🚵🏿♀️🚵🚵🏻🚵🏼🚵🏽🚵🏾🚵🏿🎽🏅🎖🥇🥈🥉🏆🏵🎗🎫🎟🎪🤹♀️🤹♂️🎭🎨🎬🎤🎧🎼🎹🥁🎷🎺🎸🎻🎲🎯🎳🎮🎰";
function to test emojis
    public void checkMatchingEmojis() {
        final Pattern pattern = Pattern.compile(sEmojiRegex);
        final Matcher matcher = pattern.matcher(sEmojiTest);
        int foundEmojiCount = 0;
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            foundEmojiCount++;
        }
        System.out.println("*******************************************");
        System.out.println("Input Emoji count = 1627");
        System.out.println("Captured Emoji count = " + foundEmojiCount);
        System.out.println("*******************************************");
    }
Here is the gist, tested on all Unicode 10 emojis
Thanks to Kevin Scott for writing a great example
edited Oct 2 '18 at 13:07
answered Sep 18 '17 at 12:38
SergeyChilingaryan
4
There are two ways to solve this sticky problem.
The first one is using third-party libs like emoji-java and emoji4j. These are mentioned above. You can easily use the method containsEmoji or removesEmoji, etc. And in your own apps, you need to keep up to date with these libs.
As for me, I wanted to find a simple solution to this problem.
After a whole day of searching, I've found a magic regex:
"(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"
which I have tested OK in Java. It perfectly solved my problem.
You can view this on the GitHub page:
https://github.com/zly394/EmojiRegex
Notes:
The answer provided by @EricNakagawa contains some errors and does not work properly.
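A note on why character classes such as [\uD83C\uDF00-\uD83D\uDDFF] in the regex above can work at all in Java: the regex engine reads each surrogate pair in the pattern as one code point, so that class is effectively the range U+1F300 to U+1F5FF. Here is a small sketch to check this for the first alternative only; the class and method names are made up for illustration and are not from the linked repository.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: names are hypothetical.
class SurrogateRangeDemo {

    // First alternative of the regex above. Each pair of escapes becomes one
    // supplementary character in the string, so the class spans
    // U+1F300..U+1F5FF rather than four separate chars.
    static final Pattern RANGE = Pattern.compile("[\uD83C\uDF00-\uD83D\uDDFF]");

    // True if the whole one-code-point string is matched by the class.
    static boolean matchesWhole(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        int[] samples = { 0x1F300, 0x1F5FF, 0x1F600, 'A' };
        for (int cp : samples) {
            System.out.println(String.format("U+%X", cp) + " -> " + matchesWhole(cp));
        }
        // U+1F300 and U+1F5FF are inside the class; U+1F600 and 'A' are not
    }
}
```

This only demonstrates how the ranges are interpreted; it does not say whether everything inside those ranges is an emoji.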
edited Jul 10 '17 at 7:48
answered Jul 10 '17 at 7:43
VensentWang
1
This captures a lot more than emoji. If you use this on the Big List of Naughty Strings you'll get plenty of non-emoji matches.
– JackCole
May 15 '19 at 23:52
3
You may also use the emoji4j library.
    String emojiText = "A 🐱, 🐱 and a 🐭 became friends. For 🐶's birthday party, they all had 🍔s, 🍟s, 🍪s and 🍰.";
    EmojiUtils.removeAllEmojis(emojiText); // returns "A ,  and a  became friends. For 's birthday party, they all had s, s, s and ."
answered Jan 29 '16 at 6:37
Chaitanya
2
This is what I use to remove emojis, and so far it has shown to allow all other alphabets.
    private static String remove_Emojis(String name)
    {
        // we will store all the letters in this array
        ArrayList<Character> nonEmoji = new ArrayList<>();
        // and when we rebuild the name we will put it in here
        String newName = "";
        // we are going to loop through checking each character to see if it's an emoji or not
        for (int i = 0; i … 18)
                {
                    if (Character.isAlphabetic(name.charAt(i)))
                    {
                        nonEmoji.add(name.charAt(i));
                    }
                }
            }
            if (name.charAt(i) == ' ') // may want to consider adding or '-' or '\''
            {
                nonEmoji.add(name.charAt(i)); // just add it
            }
            if (name.charAt(i) == '@' && !name.contains(" ")) // I put this in for email addresses
            {
                nonEmoji.add('@');
            }
        }
        // finally just loop through building it back out
        for (int i = 0; i