Exception when trying to use Microsoft Office Document Sequencer
jacobdoran May 29, 2012 11:24 AM
Hi,
I'm trying to create a simple repository for storing and searching Microsoft Office documents and images, but I'm getting a strange error when trying to sequence Microsoft Office documents. I have the following code to create my repository (exception handling is stripped):
static final String repositoryName = "repository A";
static final String repositorySource = "source A";
static final String sourcePath = "c:\\data\\modeshape\\";
JcrConfiguration config = new JcrConfiguration();
config.repositorySource(repositorySource)
.usingClass(DiskSource.class)
.setProperty("repositoryRootPath", sourcePath)
.setProperty("updatesAllowed", true)
.setDescription("The repository for our content")
.setProperty("defaultWorkspaceName", workspaceName);
config.repository(repositoryName)
.setSource(repositorySource);
config.textExtractor("Tika Text Extractors")
.setDescription("Text extractors using Tika parsers")
.usingClass(org.modeshape.extractor.tika.TikaTextExtractor.class)
.setProperty("includedMimeTypes", "application/msword,application/vnd.oasis.opendocument.text");
config.sequencer("Image Sequencer")
.usingClass("org.modeshape.sequencer.image.ImageMetadataSequencer")
.loadedFromClasspath()
.setDescription("Sequences image files to extract the characteristics of the image")
.sequencingFrom("//(*.(jpg|jpeg|gif|bmp|pcx|png|iff|ras|pbm|pgm|ppm|psd)[*])/jcr:content[@jcr:data]")
.andOutputtingTo("/images/$1");
config.sequencer("Microsoft Office Document Sequencer")
.usingClass("org.modeshape.sequencer.msoffice.MSOfficeMetadataSequencer")
.loadedFromClasspath()
.setDescription("Sequences MS Office documents, including spreadsheets and presentations")
.sequencingFrom("//(*.(doc|xls|docx|xlsx)[*])/jcr:content[@jcr:data]")
.andOutputtingTo("/msoffice/$1");
engine = config.build();
engine.start();
Then to store a document I do the following:
repository = engine.getRepository(repositoryName);
Session session = repository.login();
JcrTools tools = new JcrTools();
Node node = tools.findOrCreateNode(session, fileName, "nt:folder", "nt:file");
// Upload the file to that node ...
Node contentNode = tools.findOrCreateChild(node, "jcr:content", "nt:resource");
contentNode.setProperty("jcr:lastModified", Calendar.getInstance());
Binary binaryValue = session.getValueFactory().createBinary(stream);
contentNode.setProperty("jcr:data", binaryValue);
session.save();
For image files it works perfectly, but for Office documents I get the following error:
org.modeshape.repository.sequencer.SequencerException: java.lang.IllegalArgumentException: The bytes argument may not be null
    at org.modeshape.repository.sequencer.StreamSequencerAdapter.execute(StreamSequencerAdapter.java:198)
    at org.modeshape.repository.sequencer.SequencingService.processChange(SequencingService.java:498)
    at org.modeshape.repository.sequencer.SequencingService$RepositoryObserver$1.run(SequencingService.java:666)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
I'm a newbie, so I'm probably doing something stupid. I would really appreciate some pointers as to what I'm doing wrong.
Thanks in advance,
Jacob