Sunday, 20 September 2020

How To Solve XML Parsing Issue "Content is not allowed in prolog" In Java

If you have come to this post, you are probably facing an XML parsing issue in Java that looks like this:

Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)

One common cause of this error is a BOM character. So what is a BOM (byte order mark)? In a multi-byte encoding like UTF-16 or UTF-32, the server needs to indicate whether the most significant byte of each code unit comes first or last (big endian or little endian). That is the purpose of the BOM. It is the code point U+FEFF, a zero-width invisible character placed at the very start of the data. Depending on whether its bytes arrive as FE FF or FF FE, the byte sequence is treated as big endian or little endian.
If your XML is in UTF-8, a BOM is not required: UTF-8 is read one byte at a time, so there is no byte order to signal (even though a single character may take several bytes). But the remote server might still send a BOM (the bytes EF BB BF in UTF-8) at the start of the XML response for legacy reasons. If the parser sees that character before the XML declaration, for example while you are unmarshalling with JAXB, you can get the above error.
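
To make the byte values above concrete, here is a small sketch that encodes U+FEFF in three charsets and prints the resulting bytes. It uses only the JDK; the class name BomBytes and the helper bomHex are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytes {

    // Encodes the BOM code point (U+FEFF) in the given charset
    // and returns the bytes as space-separated hex.
    static String bomHex(Charset charset) {
        byte[] bytes = "\uFEFF".getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16BE: " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
        System.out.println("UTF-16LE: " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
    }
}
```

If you dump the first few bytes of a failing response and see EF BB BF before the first "<", a BOM is a likely culprit.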

To solve this, you can wrap the response input stream in a BOMInputStream. It comes from the Apache Commons IO library.

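BOMInputStream bomIn = new BOMInputStream(inputStream); // from org.apache.commons.io.input
boolean hadBom = bomIn.hasBOM(); // optional: only reports whether a BOM was found
That should resolve the issue. The name hasBOM() is a bit confusing here: the method only checks whether the stream starts with a BOM. It is the BOMInputStream wrapper itself that, by default, detects a UTF-8 BOM and leaves it out of the bytes you read, so the hasBOM() call is optional.
Now you can parse the XML response from bomIn, instead of the original stream, without this error.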
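
If adding Commons IO is not an option, the same idea can be sketched with only the JDK. The code below is not how BOMInputStream is implemented. It is a minimal stand-in that handles only the UTF-8 BOM, and the names BomSkipDemo, skipUtf8Bom and rootName are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class BomSkipDemo {

    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF).
    // If there is no BOM, the bytes that were peeked at are pushed back.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream is shorter than 3 bytes
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    // Parses the XML after BOM removal and returns the root element name.
    static String rootName(byte[] xmlBytes) throws Exception {
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(xmlBytes));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><response/>";
        // Prepending U+FEFF and encoding as UTF-8 yields EF BB BF + the XML bytes.
        byte[] withBom = ("\uFEFF" + xml).getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(withBom)); // response
    }
}
```

In real code you would pass the stream returned by skipUtf8Bom (or the BOMInputStream) to your JAXB Unmarshaller in place of the raw response stream.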

