22 Replies Latest reply: Dec 13, 2010 3:18 PM by Steven Hawkins

Unable to parse xml

Balaji Seshadri Newbie

I'm getting the error below while parsing a 70 MB XML file using the XMLPARSE function. It looks like a character encoding issue.

 

Please let me know of any workarounds.

 

2010-10-20 16:23:21,678 DEBUG [org.teiid.PROCESSOR] (Worker0_QueryProcessorQueue7) [Ljava.lang.Object;@13fbd4e
[ExpressionEvaluationException]Unable to evaluate XMLPARSE(DOCUMENT F.file): Value is not valid XML
1 [ExpressionEvaluationException]Value is not valid XML
2 [TransformationException]Value is not valid XML
3 [WstxIOException]Input length = 1
4 [UnmappableCharacterException]Input length = 1
    at org.teiid.query.eval.Evaluator.evaluate(Evaluator.java:606)
    at org.teiid.query.eval.Evaluator.evaluateXQuery(Evaluator.java:846)
    at org.teiid.query.processor.relational.XMLTableNode.nextBatchDirect(XMLTableNode.java:120)
    at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
    at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69)
    at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:69)
    at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:81)
    at org.teiid.common.buffer.AbstractTupleSource.hasNext(AbstractTupleSource.java:91)
    at org.teiid.query.processor.relational.NestedTableJoinStrategy.process(NestedTableJoinStrategy.java:120)
    at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:196)
    at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
    at org.teiid.query.processor.relational.ProjectNode.nextBatchDirect(ProjectNode.java:159)
    at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
    at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69)
    at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:69)
    at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:81)
    at org.teiid.common.buffer.AbstractTupleSource.nextTuple(AbstractTupleSource.java:48)
    at org.teiid.query.processor.relational.SortUtility.initialSort(SortUtility.java:214)
    at org.teiid.query.processor.relational.SortUtility.sort(SortUtility.java:168)
    at org.teiid.query.processor.relational.SortNode.sortPhase(SortNode.java:96)
    at org.teiid.query.processor.relational.SortNode.nextBatchDirect(SortNode.java:85)
    at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
    at org.teiid.query.processor.relational.RelationalPlan.nextBatch(RelationalPlan.java:107)
    at org.teiid.query.processor.QueryProcessor.nextBatchDirect(QueryProcessor.java:150)
    at org.teiid.query.processor.QueryProcessor.nextBatch(QueryProcessor.java:105)
    at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:115)
    at org.teiid.dqp.internal.process.RequestWorkItem.processMore(RequestWorkItem.java:250)
    at org.teiid.dqp.internal.process.RequestWorkItem.process(RequestWorkItem.java:184)
    at org.teiid.dqp.internal.process.AbstractWorkItem.run(AbstractWorkItem.java:49)
    at org.teiid.dqp.internal.process.DQPWorkContext.runInContext(DQPWorkContext.java:188)
    at org.teiid.dqp.internal.process.ThreadReuseExecutor$RunnableWrapper.run(ThreadReuseExecutor.java:116)
    at org.teiid.dqp.internal.process.ThreadReuseExecutor$3.run(ThreadReuseExecutor.java:290)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: [ExpressionEvaluationException]Value is not valid XML
1 [TransformationException]Value is not valid XML
2 [WstxIOException]Input length = 1
3 [UnmappableCharacterException]Input length = 1
    at org.teiid.query.eval.Evaluator.evaluateXMLParse(Evaluator.java:695)
    at org.teiid.query.eval.Evaluator.internalEvaluate(Evaluator.java:662)
    at org.teiid.query.eval.Evaluator.evaluate(Evaluator.java:604)
    ... 34 more
Caused by: [TransformationException]Value is not valid XML
1 [WstxIOException]Input length = 1
2 [UnmappableCharacterException]Input length = 1
    at org.teiid.core.types.basic.StringToSQLXMLTransform.isXml(StringToSQLXMLTransform.java:74)
    at org.teiid.query.eval.Evaluator.validate(Evaluator.java:726)
    at org.teiid.query.eval.Evaluator.evaluateXMLParse(Evaluator.java:691)
    ... 36 more
Caused by: com.ctc.wstx.exc.WstxIOException: Input length = 1
    at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
    at org.teiid.core.types.basic.StringToSQLXMLTransform.isXml(StringToSQLXMLTransform.java:71)
    ... 38 more
Caused by: java.nio.charset.UnmappableCharacterException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:261)
    at org.teiid.core.util.InputStreamReader.read(InputStreamReader.java:84)
    at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
    at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
    at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3035)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2934)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2846)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    ... 39 more

  • 1. Re: Unable to parse xml
    Ramesh Reddy Master

    Balaji,

     

    Teiid uses 'UTF-8' as the default encoding when parsing XML documents. If this XML document needs a different encoding, edit the 'vdb.xml' file inside the VDB or, if you are using a dynamic VDB such as "my-vdb.xml", add the following. In the sample below, I set the "encoding" property to "UTF-16" on the "file" translator. This defines a new translator type, "utf16-file", which I then reference in the source tag of my model called 'satellite'.

     

    <vdb name="vdbname" version = "1">
        <model name="satellite">
               <source name="file" translator-name="utf16-file" connection-jndi-name="java:beam"/>
        </model>
        <translator name="utf16-file" type="file">
               <property name="encoding" value="UTF-16"/>
        </translator>
    </vdb>
    

     

    Teiid does use streaming, so it should handle huge XML files well. If you successfully parse the 70 MB file, please let us know, as that would be interesting to hear.

     

    Thanks

     

    Ramesh..
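
To illustrate what streaming means here, the following is a minimal sketch that uses the JDK's StAX API directly. It is not Teiid's internal code. The class name `StationGuidStream` is hypothetical, and the element and attribute names (`station`, `guid`) are borrowed from the query posted later in this thread. The reader pulls one parse event at a time, so the document is never held in memory as a tree. Handing the parser a byte stream rather than a decoded string also lets it pick the encoding from the XML declaration or byte order mark.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
class StationGuidStream {

    // Streams through the document and collects the guid attribute of every
    // element named "station". Unlike the XQuery path '/stations/station',
    // this simple sketch matches "station" at any depth.
    static List<String> readGuids(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<String> guids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "station".equals(reader.getLocalName())) {
                    // A null namespace URI means "do not check the namespace".
                    guids.add(reader.getAttributeValue(null, "guid"));
                }
            }
        } finally {
            reader.close();
        }
        return guids;
    }
}
```

Only the collected attribute values stay in memory, which is why this style of parsing scales to documents far larger than the heap would allow for a full tree.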

  • 2. Re: Unable to parse xml
    Ramesh Reddy Master

    Also, we do not have tooling to do the above in Teiid Designer, but we plan on adding it in future releases, so that you can modify the properties without manual editing.

  • 3. Re: Unable to parse xml
    Balaji Seshadri Newbie

    I changed the file encoding system property in Java to UTF-8, and it worked for me.
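
For readers unfamiliar with this setting, here is a small sketch assuming the property meant above is the standard `file.encoding` system property, which is usually set at JVM startup with `-Dfile.encoding=UTF-8`. The class name `DefaultCharsetCheck` is hypothetical. It prints what the running JVM uses as its default and shows the alternative of naming the charset explicitly when decoding, which does not depend on the default at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class DefaultCharsetCheck {

    // Decodes with an explicitly chosen charset, so the result is the same
    // whatever file.encoding happens to be.
    static String decodeExplicit(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // What the JVM was started with (may be null on some runtimes).
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // What APIs without a charset argument will actually use.
        System.out.println("default charset = " + Charset.defaultCharset());

        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeExplicit(utf8, StandardCharsets.UTF_8));
    }
}
```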

  • 4. Re: Unable to parse xml
    Steven Hawkins Master

    I think that we should be able to handle this for you without changing the system encoding.  I'll look into this.

  • 6. Re: Unable to parse xml
    Balaji Seshadri Newbie

    Hi Steve,

     

    I'm getting the issue again when I parse another XML file, this time even after setting the system encoding to UTF-8.

     

    Please see the attached XML.

  • 7. Re: Unable to parse xml
    Steven Hawkins Master

    To help me understand exactly what is happening, which release are you using and how are you using XMLPARSE or otherwise getting an exception?

  • 8. Re: Unable to parse xml
    Balaji Seshadri Newbie

    I was using Teiid trunk for 7.2 CR1. The last check-in was for this bug: https://jira.jboss.org/browse/TEIID-1313.

     

    Here is the query I used.

     

    SELECT
            station.guid
        FROM
            (EXEC GetXml.getTextFiles('Station.xml')) AS f, XMLTABLE('$d/stations/station' PASSING XMLPARSE(DOCUMENT F.file) AS d COLUMNS guid string PATH '@guid') AS station

  • 9. Re: Unable to parse xml
    Steven Hawkins Master

    Balaji,

     

    It is detecting the charset correctly, but there is a bug in the stream reading logic that treats multi-byte values at the internal buffer boundary as an error.  Logged as https://issues.jboss.org/browse/TEIID-1390

     

    Try using the wellformed option: XMLPARSE(DOCUMENT F.file WELLFORMED)

     

    Given the size of your docs, and that you may know they are already valid, you probably want to use the wellformed option in any case to skip the initial validation of the document.

     

    Steve
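
The kind of boundary bug described above can be sketched with the JDK's own `CharsetDecoder`. This is an illustration of the general failure mode under stated assumptions, not a reproduction of Teiid's `InputStreamReader` or of the fix for TEIID-1390. The class and method names are hypothetical. `naiveFails` decodes every chunk on its own, so a multi-byte UTF-8 sequence cut by a chunk boundary is reported as an error. `decodeChunked` keeps one decoder for the whole stream and carries the undecoded tail of each chunk into the next one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class BoundaryDecodeDemo {

    // Naive approach: each chunk is decoded as if it were complete input.
    // Returns true if any chunk is rejected.
    static boolean naiveFails(byte[] data, int chunkSize) {
        for (int pos = 0; pos < data.length; pos += chunkSize) {
            int n = Math.min(chunkSize, data.length - pos);
            try {
                StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data, pos, n));
            } catch (CharacterCodingException e) {
                return true; // a sequence split by the boundary looked truncated
            }
        }
        return false;
    }

    // Boundary-safe approach: bytes of an incomplete trailing sequence stay
    // in the input buffer and are retried together with the next chunk.
    static String decodeChunked(byte[] data, int chunkSize) throws CharacterCodingException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);  // chunk plus carried-over bytes
        CharBuffer out = CharBuffer.allocate(chunkSize + 4); // UTF-8 never yields more chars than bytes
        StringBuilder text = new StringBuilder();
        int pos = 0;
        boolean last;
        do {
            int n = Math.min(chunkSize, data.length - pos);
            in.put(data, pos, n);
            pos += n;
            last = pos == data.length;
            in.flip();
            // endOfInput=false tells the decoder a trailing partial sequence is not an error yet.
            CoderResult result = decoder.decode(in, out, last);
            if (result.isError()) {
                result.throwException();
            }
            in.compact(); // keep the undecoded tail for the next round
            out.flip();
            text.append(out);
            out.clear();
        } while (!last);
        decoder.flush(out);
        out.flip();
        text.append(out);
        return text.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] xml = "<station guid=\"caf\u00e9\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println("naive decoding fails: " + naiveFails(xml, 1));
        System.out.println("chunked decoding:     " + decodeChunked(xml, 1));
    }
}
```

The design point is the `endOfInput` flag: only on the final chunk may the decoder treat a dangling partial sequence as malformed input.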

  • 10. Re: Unable to parse xml
    Balaji Seshadri Newbie

    Steve,

     

    Thanks for the workaround.

     

    I'm getting a syntax error when I try that option.

     


    08:29:08,566 WARN  [PROCESSOR] Processing exception 'Failed to evaluate XQuery expression; Please check the query and correct errors in syntax or usage. ' for request iI1hbPi5rsmt.10.  Exception type org.teiid.core.TeiidProcessingException thrown from org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source). Enable more detailed logging to see the entire stacktrace.

     

    SELECT
            station.guid
        FROM
            (EXEC GetXml.getTextFiles('Station.xml')) AS f, XMLTABLE('$d/stations/station' PASSING XMLPARSE(DOCUMENT F.file WELLFORMED) AS d COLUMNS guid string PATH '@guid') AS station

  • 11. Re: Unable to parse xml
    Steven Hawkins Master

    Balaji,

     

    Ah, the full stacktrace would probably indicate this is related to the same problem, but now caused by using the file initially as a clob.  The full workaround is to use the file as a blob:

     

    SELECT
            station.guid
        FROM
            (EXEC  GetXml.getFiles('Station.xml')) AS f, XMLTABLE('/stations/station'  PASSING XMLPARSE(DOCUMENT F.file WELLFORMED) COLUMNS guid string  PATH '@guid') AS station

     

    Also since document projection is only performed on the context item, you'll get better performance if you pass the document as the context item and not through a specific variable name.

     

    Steve

  • 12. Re: Unable to parse xml
    Balaji Seshadri Newbie

    Thanks a lot, that worked.

  • 13. Re: Unable to parse xml
    Balaji Seshadri Newbie

    How do I pass the document as the context item? Can you give me an example?

  • 14. Re: Unable to parse xml
    Steven Hawkins Master

    Sorry, I should have been more explicit.  My example was also updated to pass the doc as the context item.  Note the lack of "AS d" and of the variable reference in the XQuery.

     

    Steve
