22 Replies Latest reply on Dec 13, 2010 3:18 PM by shawkins

    Unable to parse xml

    balaji.seshadri

      I'm getting the error below while parsing a 70 MB XML file using the XMLPARSE function. It looks like a character encoding issue.

       

      Please let me know of any workarounds.

       

      2010-10-20 16:23:21,678 DEBUG [org.teiid.PROCESSOR] (Worker0_QueryProcessorQueue7) [Ljava.lang.Object;@13fbd4e
      [ExpressionEvaluationException]Unable to evaluate XMLPARSE(DOCUMENT F.file): Value is not valid XML
      1 [ExpressionEvaluationException]Value is not valid XML
      2 [TransformationException]Value is not valid XML
      3 [WstxIOException]Input length = 1
      4 [UnmappableCharacterException]Input length = 1
          at org.teiid.query.eval.Evaluator.evaluate(Evaluator.java:606)
          at org.teiid.query.eval.Evaluator.evaluateXQuery(Evaluator.java:846)
          at org.teiid.query.processor.relational.XMLTableNode.nextBatchDirect(XMLTableNode.java:120)
          at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
          at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69)
          at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:69)
          at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:81)
          at org.teiid.common.buffer.AbstractTupleSource.hasNext(AbstractTupleSource.java:91)
          at org.teiid.query.processor.relational.NestedTableJoinStrategy.process(NestedTableJoinStrategy.java:120)
          at org.teiid.query.processor.relational.JoinNode.nextBatchDirect(JoinNode.java:196)
          at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
          at org.teiid.query.processor.relational.ProjectNode.nextBatchDirect(ProjectNode.java:159)
          at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
          at org.teiid.query.processor.BatchIterator.finalRow(BatchIterator.java:69)
          at org.teiid.common.buffer.AbstractTupleSource.getCurrentTuple(AbstractTupleSource.java:69)
          at org.teiid.query.processor.BatchIterator.getCurrentTuple(BatchIterator.java:81)
          at org.teiid.common.buffer.AbstractTupleSource.nextTuple(AbstractTupleSource.java:48)
          at org.teiid.query.processor.relational.SortUtility.initialSort(SortUtility.java:214)
          at org.teiid.query.processor.relational.SortUtility.sort(SortUtility.java:168)
          at org.teiid.query.processor.relational.SortNode.sortPhase(SortNode.java:96)
          at org.teiid.query.processor.relational.SortNode.nextBatchDirect(SortNode.java:85)
          at org.teiid.query.processor.relational.RelationalNode.nextBatch(RelationalNode.java:262)
          at org.teiid.query.processor.relational.RelationalPlan.nextBatch(RelationalPlan.java:107)
          at org.teiid.query.processor.QueryProcessor.nextBatchDirect(QueryProcessor.java:150)
          at org.teiid.query.processor.QueryProcessor.nextBatch(QueryProcessor.java:105)
          at org.teiid.query.processor.BatchCollector.collectTuples(BatchCollector.java:115)
          at org.teiid.dqp.internal.process.RequestWorkItem.processMore(RequestWorkItem.java:250)
          at org.teiid.dqp.internal.process.RequestWorkItem.process(RequestWorkItem.java:184)
          at org.teiid.dqp.internal.process.AbstractWorkItem.run(AbstractWorkItem.java:49)
          at org.teiid.dqp.internal.process.DQPWorkContext.runInContext(DQPWorkContext.java:188)
          at org.teiid.dqp.internal.process.ThreadReuseExecutor$RunnableWrapper.run(ThreadReuseExecutor.java:116)
          at org.teiid.dqp.internal.process.ThreadReuseExecutor$3.run(ThreadReuseExecutor.java:290)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
      Caused by: [ExpressionEvaluationException]Value is not valid XML
      1 [TransformationException]Value is not valid XML
      2 [WstxIOException]Input length = 1
      3 [UnmappableCharacterException]Input length = 1
          at org.teiid.query.eval.Evaluator.evaluateXMLParse(Evaluator.java:695)
          at org.teiid.query.eval.Evaluator.internalEvaluate(Evaluator.java:662)
          at org.teiid.query.eval.Evaluator.evaluate(Evaluator.java:604)
          ... 34 more
      Caused by: [TransformationException]Value is not valid XML
      1 [WstxIOException]Input length = 1
      2 [UnmappableCharacterException]Input length = 1
          at org.teiid.core.types.basic.StringToSQLXMLTransform.isXml(StringToSQLXMLTransform.java:74)
          at org.teiid.query.eval.Evaluator.validate(Evaluator.java:726)
          at org.teiid.query.eval.Evaluator.evaluateXMLParse(Evaluator.java:691)
          ... 36 more
      Caused by: com.ctc.wstx.exc.WstxIOException: Input length = 1
          at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
          at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
          at org.teiid.core.types.basic.StringToSQLXMLTransform.isXml(StringToSQLXMLTransform.java:71)
          ... 38 more
      Caused by: java.nio.charset.UnmappableCharacterException: Input length = 1
          at java.nio.charset.CoderResult.throwException(CoderResult.java:261)
          at org.teiid.core.util.InputStreamReader.read(InputStreamReader.java:84)
          at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
          at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
          at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
          at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
          at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1034)
          at com.ctc.wstx.sr.StreamScanner.getNextChar(StreamScanner.java:794)
          at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1900)
          at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3035)
          at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2934)
          at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2846)
          at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
          ... 39 more

        • 1. Re: Unable to parse xml
          rareddy

          Balaji,

           

          'UTF-8' is the default encoding Teiid uses to parse XML documents. If this XML document needs a different encoding scheme, you need to edit the 'vdb.xml' file inside the VDB or, if you are using a dynamic VDB such as "my-vdb.xml", add the following. In the sample below, I set the "encoding" property to "UTF-16" on the "file" translator. This defines a new translator type, "utf16-file", which I then reference in the source tag of my model called 'satellite'.

           

          <vdb name="vdbname" version = "1">
              <model name="satellite">
                     <source name="file" translator-name="utf16-file" connection-jndi-name="java:beam"/>
              </model>
              <translator name="utf16-file" type="file">
                     <property name="encoding" value="UTF-16"/>
              </translator>
          </vdb>
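
To illustrate why the declared encoding has to match the bytes on disk, here is a minimal Java sketch. It is not Teiid code: the class name, the helper `decodeStrict`, and the sample document are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`. It decodes the same UTF-16 bytes once with the matching charset and once as UTF-8, with the decoder set to report errors rather than substitute replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchSketch {

    // Decode strictly: report bad input instead of silently substituting U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document; only the element names come from the thread.
        String xml = "<stations><station guid=\"\u00e9-1\"/></stations>";
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);

        // Matching charset: the text round-trips.
        System.out.println("UTF-16 round trip ok: "
                + decodeStrict(utf16, StandardCharsets.UTF_16).equals(xml));

        // Mismatched charset: a strict decoder rejects the bytes.
        try {
            decodeStrict(utf16, StandardCharsets.UTF_8);
            System.out.println("decoded as UTF-8 without error");
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decode failed: " + e);
        }
    }
}
```

Catching `CharacterCodingException` covers both `MalformedInputException` and the `UnmappableCharacterException` seen in the stack trace, since both are its subclasses.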
          

           

          Teiid does use streaming, so it should be good at parsing huge XML files. If you successfully parse the 70 MB file, please let us know, as that would be interesting to hear.

           

          Thanks

           

          Ramesh..

          • 2. Re: Unable to parse xml
            rareddy

            Also, we do not have tooling to do the above in Teiid Designer, but we plan on adding it in future releases so that you can modify the properties without manual editing.

            • 3. Re: Unable to parse xml
              balaji.seshadri

              I changed the file encoding system property in Java to UTF-8 and it worked for me.

              • 4. Re: Unable to parse xml
                shawkins

                I think that we should be able to handle this for you without changing the system encoding.  I'll look into this.
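
As a general Java illustration of reading text without depending on the JVM-wide default, here is a minimal sketch. It uses the standard `java.io.InputStreamReader`, not the `org.teiid.core.util.InputStreamReader` that appears in the stack trace, and the class name, the `readAll` helper, and the sample bytes are assumptions made for the example. Passing the charset explicitly means the result does not change with the `file.encoding` system property.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetSketch {

    // Read the whole stream as text using the given charset,
    // independent of the JVM default charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The default depends on the platform and JVM settings.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Hypothetical sample bytes, encoded as UTF-8.
        String original = "<station guid=\"\u00e9\"/>";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println("Round trip ok: " + text.equals(original));
    }
}
```

Note that `java.io.InputStreamReader` constructed with a `Charset` replaces malformed input rather than throwing, so this sketch shows only the charset selection, not strict validation.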

                • 5. Re: Unable to parse xml
                  shawkins
                  • 6. Re: Unable to parse xml
                    balaji.seshadri

                    Hi Steve,

                     

                    I'm getting the issue again when I parse another XML file, this time even after setting the system encoding to UTF-8.

                     

                    Please see the attached xml.

                    • 7. Re: Unable to parse xml
                      shawkins

                      To help me understand exactly what is happening, which release are you using and how are you using XMLPARSE or otherwise getting an exception?

                      • 8. Re: Unable to parse xml
                        balaji.seshadri

                        I was using Teiid trunk for 7.2 CR1. The last check-in was for this bug: https://jira.jboss.org/browse/TEIID-1313

                         

                        Here is the query I used.

                         

                        SELECT
                                station.guid
                            FROM
                                (EXEC GetXml.getTextFiles('Station.xml')) AS f, XMLTABLE('$d/stations/station' PASSING XMLPARSE(DOCUMENT F.file) AS d COLUMNS guid string PATH '@guid') AS station

                        • 9. Re: Unable to parse xml
                          shawkins

                          Balaji,

                           

                          It is detecting the charset correctly, but there is a bug in the stream reading logic that is treating multi-byte values at the internal buffer boundary as being an error.  Logged as https://issues.jboss.org/browse/TEIID-1390
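
The class of bug described here can be sketched in plain Java. This is not the Teiid code and not the fix for TEIID-1390; the class and method names are hypothetical, and Java 7 or later is assumed. The first method decodes each buffer-sized chunk as if it were complete input, so a multi-byte UTF-8 character split across two chunks is reported as an error. The second keeps one `CharsetDecoder` for the whole stream and carries the incomplete trailing bytes over to the next chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class BoundaryDecodeSketch {

    static CharsetDecoder strictUtf8() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Faulty approach: every chunk is treated as a complete input, so a
    // character whose bytes straddle a chunk boundary is rejected.
    static String decodeChunksIndependently(byte[] data, int chunkSize)
            throws CharacterCodingException {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(strictUtf8().decode(ByteBuffer.wrap(data, off, len)));
        }
        return sb.toString();
    }

    // Streaming approach: one decoder for the whole input; bytes of an
    // incomplete trailing sequence stay in the buffer for the next round.
    static String decodeStreaming(byte[] data, int chunkSize)
            throws CharacterCodingException {
        if (data.length == 0) {
            return "";
        }
        CharsetDecoder decoder = strictUtf8();
        // Room for one chunk plus up to 3 leftover bytes of a UTF-8 sequence.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder sb = new StringBuilder();
        int off = 0;
        while (off < data.length) {
            int len = Math.min(chunkSize, data.length - off);
            in.put(data, off, len);
            off += len;
            in.flip();
            // endOfInput is true only for the final chunk.
            CoderResult result = decoder.decode(in, out, off == data.length);
            if (result.isError()) {
                result.throwException();
            }
            in.compact();   // keep undecoded trailing bytes
            out.flip();
            sb.append(out);
            out.clear();
        }
        decoder.flush(out); // UTF-8 keeps no pending output, so nothing is added here
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // The two bytes of U+00E9 sit at offsets 6 and 7, so a chunk size
        // of 7 splits the character across two chunks.
        String xml = "<s n=\"\u00e9\"/>";
        byte[] data = xml.getBytes(StandardCharsets.UTF_8);
        try {
            decodeChunksIndependently(data, 7);
            System.out.println("independent decode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("independent decode failed: " + e);
        }
        System.out.println("streaming decode ok: " + decodeStreaming(data, 7).equals(xml));
    }
}
```

The key design point is the `endOfInput` flag: passing `false` for all but the last chunk tells the decoder that a truncated sequence at the end of the buffer is not an error yet.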

                           

                          Try using the wellformed option: XMLPARSE(DOCUMENT F.file WELLFORMED)

                           

                          Given the size of your docs, and that you may know they are already valid, you probably want to use the wellformed option in any case to skip the initial validation of the document.

                           

                          Steve

                          • 10. Re: Unable to parse xml
                            balaji.seshadri

                            Steve,

                             

                            Thanks for the workaround.

                             

                            I'm getting a syntax error when I try that option.

                             


                            08:29:08,566 WARN  [PROCESSOR] Processing exception 'Failed to evaluate XQuery expression; Please check the query and correct errors in syntax or usage. ' for request iI1hbPi5rsmt.10.  Exception type org.teiid.core.TeiidProcessingException thrown from org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source). Enable more detailed logging to see the entire stacktrace.

                             

                            SELECT
                                    station.guid
                                FROM
                                    (EXEC GetXml.getTextFiles('Station.xml')) AS f, XMLTABLE('$d/stations/station' PASSING XMLPARSE(DOCUMENT F.file WELLFORMED) AS d COLUMNS guid string PATH '@guid') AS station

                            • 11. Re: Unable to parse xml
                              shawkins

                              Balaji,

                               

                              Ah, the full stacktrace would probably indicate this is related to the same problem, but now triggered by initially reading the file as a CLOB.  The full workaround is to read the file as a BLOB (note getFiles instead of getTextFiles):

                               

                              SELECT
                                      station.guid
                                  FROM
                                      (EXEC GetXml.getFiles('Station.xml')) AS f, XMLTABLE('/stations/station' PASSING XMLPARSE(DOCUMENT F.file WELLFORMED) COLUMNS guid string PATH '@guid') AS station

                               

                              Also since document projection is only performed on the context item, you'll get better performance if you pass the document as the context item and not through a specific variable name.

                               

                              Steve

                              • 12. Re: Unable to parse xml
                                balaji.seshadri

                                Thanks a lot, that worked.

                                • 13. Re: Unable to parse xml
                                  balaji.seshadri

                                  How do I pass the document as the context item? Can you give me an example?

                                  • 14. Re: Unable to parse xml
                                    shawkins

                                    Sorry, I should have been more explicit.  My example was also updated to pass the doc as the context item.  Note the absence of "AS d" in the PASSING clause and of the "$d" variable reference in the XQuery path.

                                     

                                    Steve
