2 Replies Latest reply on Feb 17, 2014 4:33 AM by nl

Text extractor ignores text if write limit is exceeded

nl Feb 17, 2014 3:33 AM

Hello ModeShapers,

If the text extractor runs into an exception, it does not check whether any output is already available. When the write limit is exceeded, Tika throws a TikaException:

{noformat}

Parsing exception while extracting text: Your document contained more than 1001 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).

{noformat}

which is caught by the extractor, but the output is not recorded (it is recorded only when no exception occurs).

Is there a reason for this behaviour or is it just a bug?

Thanks, Niels

EDIT: This refers to ModeShape 3.x.

1. Re: Text extractor ignores text if write limit is exceeded

hchiorean Feb 17, 2014 3:32 AM (in response to nl)

Not really. I think it's just because I wasn't aware that, even though Tika throws an exception, the body content handler still contains the partially extracted text.

Feel free to log an enhancement/bug. Thanks.
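
For readers following along, here is a minimal sketch of the behaviour under discussion: catch the limit exception, then read whatever the handler has collected so far instead of discarding it. It deliberately uses only the Java standard library. `LimitedHandler`, `LimitReachedException` and `PartialTextDemo` are hypothetical stand-ins for Tika's body content handler, its limit exception and the ModeShape extractor, not the real classes, so treat it as an illustration of the pattern rather than the actual fix.

```java
/** Hypothetical stand-in for a content handler with a write limit. */
final class LimitedHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit;

    LimitedHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    /** Appends a chunk; once the limit is hit, keeps what fits and then throws. */
    void characters(String chunk) throws LimitReachedException {
        int room = writeLimit - text.length();
        if (chunk.length() > room) {
            text.append(chunk, 0, room);
            throw new LimitReachedException(
                "write limit of " + writeLimit + " characters reached");
        }
        text.append(chunk);
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

/** Hypothetical exception signalling that the write limit was exceeded. */
final class LimitReachedException extends Exception {
    LimitReachedException(String message) {
        super(message);
    }
}

/** Hypothetical extractor that records the partial text even after the exception. */
final class PartialTextDemo {
    static String extract(String[] chunks, int writeLimit) {
        LimitedHandler handler = new LimitedHandler(writeLimit);
        try {
            for (String chunk : chunks) {
                handler.characters(chunk);
            }
        } catch (LimitReachedException e) {
            // The limit was reached: the handler still holds the text up to the
            // limit, so fall through and record it instead of dropping it.
        }
        return handler.toString();
    }
}
```

With a limit of 10, `PartialTextDemo.extract(new String[] {"Hello, ", "ModeShape"}, 10)` returns the first 10 characters rather than an empty result. In a real extractor, only the limit-reached case should be treated this way; other parsing failures would presumably still be reported as errors.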
2. Re: Text extractor ignores text if write limit is exceeded

nl Feb 17, 2014 4:33 AM (in response to hchiorean)

Logged as MODE-2154.

Thanks.