Java - How to add and remove BOM from UTF-8 file
By mkyong | Last updated: April 14, 2021

This article shows you how to add, check, and remove the byte order mark (BOM) from a UTF-8 file. The UTF-8 representation of the BOM is the byte sequence 0xEF, 0xBB, 0xBF (hexadecimal) at the beginning of the file.

1. Add BOM to a UTF-8 file
2. Check if a file contains UTF-8 BOM
3. Remove BOM from a UTF-8 file
4. Copy a file and add BOM
5. Download Source Code
6. References

P.S. The BOM examples below only work for UTF-8 files.

1. Add BOM to a UTF-8 file

To add a BOM to a UTF-8 file, we can write either the Unicode character \ufeff or the three bytes 0xEF, 0xBB, 0xBF at the beginning of the file.

Note: The Unicode character \ufeff is encoded in UTF-8 as the bytes 0xEF, 0xBB, 0xBF.

1.1 The example below writes a BOM to the UTF-8 file /home/mkyong/file.txt.

AddBomToUtf8File.java
package com.mkyong.io.howto;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AddBomToUtf8File {

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/home/mkyong/file.txt");
        writeBomFile(path, "mkyong");
    }

    private static void writeBomFile(Path path, String content) {
        // Java 8 default UTF-8
        try (BufferedWriter bw = Files.newBufferedWriter(path)) {
            bw.write("\ufeff");
            bw.write(content);
            bw.newLine();
            bw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output

$ hexdump -C /home/mkyong/file.txt
00000000  ef bb bf 6d 6b 79 6f 6e  67 0a 6d 6b 79 6f 6e 67  |...mkyong.mkyong|
00000010

$ file /home/mkyong/file.txt
file.txt: UTF-8 Unicode (with BOM) text

$ cat /home/mkyong/file.txt
mkyong
mkyong
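
The hexdump above starts with ef bb bf even though the code wrote only the single character \ufeff. The following is a minimal sketch of why, assuming a hypothetical class name BomBytesDemo and helper bomAsHex (neither is part of the original article): it encodes that one character as UTF-8 and prints the resulting bytes as hex.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class BomBytesDemo {

    // Encodes the single character U+FEFF as UTF-8 and returns the bytes as lowercase hex.
    public static String bomAsHex() {
        byte[] bytes = "\ufeff".getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bomAsHex()); // prints efbbbf
    }
}
```

This is also why the writer-based examples depend on the writer's charset being UTF-8: with a different charset, the same character would be encoded as different bytes.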
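1.2 Before Java 8, we can use BufferedWriter and OutputStreamWriter to write a BOM to a UTF-8 file.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

    private static void writeBomFile(Path path, String content) {
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(
                        new FileOutputStream(path.toFile()),
                        StandardCharsets.UTF_8))) {
            bw.write("\ufeff");
            bw.write(content);
            bw.newLine();
            bw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }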
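1.3 A PrintWriter and OutputStreamWriter example that writes a BOM to a UTF-8 file. The value 0xfeff is the byte order mark (BOM) code point.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

    private static void writeBomFile(Path path, String content) {
        try (PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(
                        new FileOutputStream(path.toFile()), StandardCharsets.UTF_8))) {
            // pw.write("\ufeff");
            pw.write(0xfeff); // alternative, code point
            pw.write(content);
            pw.write(System.lineSeparator());
            pw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }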
1.4 Alternatively, we can write the BOM byte sequence 0xEF, 0xBB, and 0xBF directly to a file.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

    private static void writeBomFile4(Path path, String content) {
        try (FileOutputStream fos = new FileOutputStream(path.toFile())) {
            // UTF-8 encoding of the BOM character
            byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            fos.write(BOM);
            fos.write(content.getBytes(StandardCharsets.UTF_8));
            fos.write(System.lineSeparator().getBytes(StandardCharsets.UTF_8));
            fos.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
2. Check if a file contains UTF-8 BOM

The example below reads the first 3 bytes from a file and checks whether they match the 0xEF, 0xBB, 0xBF byte sequence.

CheckBom.java
package com.mkyong.io.howto;

import org.apache.commons.codec.binary.Hex;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckBom {

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/home/mkyong/file.txt");
        if (isContainBOM(path)) {
            System.out.println("Found BOM!");
        } else {
            System.out.println("No BOM.");
        }
    }

    private static boolean isContainBOM(Path path) throws IOException {
        if (Files.notExists(path)) {
            throw new IllegalArgumentException("Path: " + path + " does not exist!");
        }

        boolean result = false;
        byte[] bom = new byte[3];
        try (InputStream is = new FileInputStream(path.toFile())) {
            // read first 3 bytes of a file.
            is.read(bom);
            // BOM encoded as ef bb bf
            String content = new String(Hex.encodeHex(bom));
            if ("efbbbf".equalsIgnoreCase(content)) {
                result = true;
            }
        }
        return result;
    }
}

Output

Found BOM!
The import org.apache.commons.codec.binary.Hex; comes from the commons-codec library, which must be declared as a dependency in pom.xml. Alternatively, we can use another method to convert bytes to hex.
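
If adding a dependency only for the hex conversion is not desirable, the same check can be done by comparing the raw bytes. The following is a minimal sketch, not the article's method: the class name CheckBomNoDeps and the method startsWithUtf8Bom are hypothetical, and the sketch loops on read() on the assumption that a single call may return fewer than 3 bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical class, for illustration only; uses only the JDK.
public class CheckBomNoDeps {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the file starts with the three UTF-8 BOM bytes.
    public static boolean startsWithUtf8Bom(Path path) throws IOException {
        byte[] head = new byte[UTF8_BOM.length];
        int total = 0;
        try (InputStream is = Files.newInputStream(path)) {
            // read() may return fewer bytes than requested, so keep reading
            // until the buffer is full or the stream ends.
            while (total < head.length) {
                int n = is.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        // Files shorter than 3 bytes cannot contain the BOM.
        return total == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-check", ".txt");
        try {
            // BOM followed by the ASCII bytes for "hi"
            Files.write(tmp, new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x68, 0x69});
            System.out.println(startsWithUtf8Bom(tmp)); // prints true

            // Same content without the BOM
            Files.write(tmp, new byte[]{0x68, 0x69});
            System.out.println(startsWithUtf8Bom(tmp)); // prints false
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Comparing bytes directly avoids both the extra library and the string conversion, at the cost of a few more lines for the read loop.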