14 Replies. Latest reply: May 10, 2012 9:43 AM by Dmitry Zhuravlev

Using FileSystemBinary?

Steen Laursen Newbie

Using ModeShape with a FileSystemSource, I am trying to add a binary file of 3 GB. I am getting an OutOfMemoryError. After a little investigation, I believe the default implementation uses InMemoryBinary, which loads all bytes into memory. My heap was only 256M.

 

Does anyone know how to enable the FileSystemBinary class instead of InMemoryBinary?

  • 1. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    It should be using FileSystemBinary by default, unless you set eagerFileLoading to true on the FileSystemSource.  Could you post a stack trace from the OOM?

  • 2. Re: Using FileSystemBinary?
    Steen Laursen Newbie

    Here is my configuration of the FileSystemSource repository. Maybe something is wrong in the configuration?

     

              JcrConfiguration configuration = new JcrConfiguration();
              configuration.repositorySource("store")
                         .usingClass(FileSystemSource.class)
                         .setDescription("The repository for our content")
                         .setProperty("workspaceRootPath", "/home/nextgen/content")
                         .setProperty("updatesAllowed", true);
    
              configuration.repository(repositoryId)
                         .setSource("store");
    
              try {
                          // Start the ModeShape engine ...
                          this.engine = configuration.build();
                          this.engine.start();
    
                          // Now get the JCR repository instance ...
                          this.repository = this.engine.getRepository(repositoryId);
               } catch (Exception e) {
                          this.repository = null;
                          throw e;
               }
    
    

     

    Below is the code that inserts the large file

     

              // Insert a folder "video" and add a "abc.mp4" video file
              Node root = session.getRootNode();
    
              // Create folder node
              Node videoNode = root.addNode("video", "nt:folder");
              Node fileNode = videoNode.addNode("abc.mp4", "nt:file");
    
              // Insert file
              Node resNode = fileNode.addNode ("jcr:content", "nt:resource");
              resNode.setProperty("jcr:mimeType", "video/mp4");
              File file = new File("/home/nextgen/abc.mp4");
              Binary binary = (session.getValueFactory().createBinary(new FileInputStream(file)));
              resNode.setProperty("jcr:data",binary);
              session.save();
    
              binary.dispose();
    

     

    And here is the stack trace I get with the OutOfMemoryError. The heap size is set to 512 MB.

     

    java.lang.OutOfMemoryError: Java heap space
              at java.util.Arrays.copyOf(Arrays.java:2786)
              at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
              at org.modeshape.common.util.IoUtil.readBytes(IoUtil.java:66)
              at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:229)
              at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:55)
              at org.modeshape.graph.property.basic.AbstractValueFactory.create(AbstractValueFactory.java:123)
              at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:111)
              at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:45)
              at com.nextgen.core.repository.ModeShapeLargeFileInsertTest.testInsert(RespositoryTest.java:132)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
              at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
              at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
              at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
              at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
              at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
              at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
              at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
              at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
              at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
              at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
              at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
              at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
              at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
              at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
              at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    
  • 3. Re: Using FileSystemBinary?
    Steen Laursen Newbie

    Adding the configuration property "eagerFileLoading=false" (which is the default according to the docs) does not change anything.

  • 4. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    No, you're doing everything right.  We don't have any provision for using a FileSystemBinary when writing to a repository, only when reading from a FileSystemSource.  With hindsight, that looks like a fairly important omission.

     

    Would you mind creating a JIRA issue for this?  I'm pretty confident that we could turn around a fix ASAP.
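
    For readers following the trace in reply 2: IoUtil.readBytes accumulates the stream in a ByteArrayOutputStream, so the heap must hold the entire value (and briefly more, each time the backing array grows). A minimal sketch of the alternative, copying through a fixed-size buffer so that memory use stays constant regardless of file size, is below. This is not ModeShape's code; the class name, method name, and buffer size are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of ModeShape.
final class StreamCopySketch {

    // Copies everything from 'in' to 'out' using one reusable buffer.
    // Heap use for the data is bounded by bufferSize, not by the stream length.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

    With an 8 KB buffer, for example, copying a 3 GB stream to a FileOutputStream needs only those 8 KB of heap for the data itself.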

  • 5. Re: Using FileSystemBinary?
    Steen Laursen Newbie

    I created JIRA issue MODE-1201

  • 6. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    Thanks, Steen.  We should be able to get this fix into the trunk by Monday.

  • 7. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    I've got a pull request in for this at https://github.com/ModeShape/modeshape/pull/131.  You can apply it locally if you're brave enough to build from trunk[1].  Thanks for the great description of the issue and the very helpful steps-to-reproduce.  I've incorporated a very similar test to verify that this is no longer an issue once the patch is applied.

     

    The patch still has to pass review before it gets added into trunk though, so it may or may not get in on Monday.

     

    [1] - Actually, you don't have to be particularly brave to do this.  Our trunk almost always compiles.

  • 8. Re: Using FileSystemBinary?
    Randall Hauch Master

    I'll be merging that change into the 'master' branch this morning. Thanks for working on this, Brian, and thanks Steen for finding and reporting this in a very thorough manner! That helped a lot!

     

    [1] - Actually, you don't have to be particularly brave to do this.  Our trunk almost always compiles.

    Our 'master' branch (aka, trunk) is very stable at this point. We do all our development in other branches, and merge to 'master' only when things are ready. So our 'master' branch not only almost always compiles, it's almost always very stable.

  • 9. Re: Using FileSystemBinary?
    Randall Hauch Master

    I've merged the changes into the 'master' branch, and resolved the issue.

     

    If you want to try it, get the latest code and build locally, and the "2.6-SNAPSHOT" version will go into your local Maven repository. You can use it in your Maven application by then specifying "2.6-SNAPSHOT" in your POM. Let us know if you have any problems.

  • 10. Re: Using FileSystemBinary?
    Steen Laursen Newbie

    Thanks for the quick turnaround. I tried out the fix and it works well.

     

    I noticed that the insert time (on my system) for a 3GB file using JCR Binary is about 167 seconds, while reading the file takes about 88 seconds. Just copying the file (no JCR) using Apache Commons IO's IOUtils.copyLarge(InputStream, OutputStream) takes about 49 seconds.

     

    I am not sure if I am doing anything wrong or if there is room for performance optimizations somewhere in the code.

     

     

    This takes about 49 seconds

    @Test
    public void copy() throws IOException {
              long begin = System.currentTimeMillis();
              InputStream is = new FileInputStream(new File("/opt/vmware/Windows 7 x64/Windows7x64.jpg"));
              OutputStream os = new FileOutputStream(new File("/home/steen/vm.vm"));
              long copied = IOUtils.copyLarge(is, os);
              System.out.println("Total time: " + (System.currentTimeMillis() - begin) + " to copy " + copied + " bytes");
    }
    
  • 11. Re: Using FileSystemBinary?
    Randall Hauch Master

    Glad it worked. We're doing a few more things than the copy utility, including writing the file to a temporary file before moving it over any existing file (to handle any error conditions during reads; we don't want to corrupt the file that's there if there's an error reading the new binary value). Also, we're not using Apache Commons' IOUtils, and our utility is using a smaller byte buffer. Not sure how much difference that makes.

  • 12. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    Steen,

     

    By any chance, is your /tmp directory on a different filesystem than where your FileSystemSource.repositoryRootPath is located?  Even if it's on the same HDD, being on a different filesystem would make a big difference at the moment.  I'm profiling some of the impact now, but that could explain the very large discrepancy.
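
    One way to answer this question from code, as a rough sketch: on Java 7 or later, java.nio.file.Files.getFileStore reports the file store a path lives on. The class and method names below are made up for illustration, and the comparison relies on FileStore.equals, whose behaviour is implementation-specific, so treat the result as a hint rather than a guarantee.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of ModeShape.
final class SameFileStoreCheck {

    // True when both existing paths are reported as living on the same file store.
    static boolean onSameFileStore(Path a, Path b) throws IOException {
        return Files.getFileStore(a).equals(Files.getFileStore(b));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        // Pass the repository root as the first argument; defaults to the working directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(tmp + " and " + root + " on same file store: "
                + onSameFileStore(tmp, root));
    }
}
```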

  • 13. Re: Using FileSystemBinary?
    Brian Carothers Apprentice

    The more I think about this, the more I think that we're not quite doing this right.  As Randall noted above, our current algorithm for updating file content goes like this:

     

    1.  Write the content to a temp file in java.io.tmpdir to make sure that we have a safe copy of the data

    2.  Delete the existing target file (if it exists)

    3.  Rename the temp file to the target file

     

    This isn't the worst solution, but it could be improved.  In particular, if java.io.tmpdir happens to point to a different filesystem than the one the target file is on, the rename turns from a call to File.renameTo() into another file copy and delete.  I'm pretty sure that's what Steen is seeing above, because I get roughly equivalent performance on my MBP (with only one filesystem) whether I copy a 3 GB file directly with Commons IO or ModeShape's FileUtil or whether I write the 3 GB file into a file system connector.
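
    The three steps and the fallback described above can be sketched roughly as follows. This is an illustration only, not ModeShape's implementation: SafeFileReplace, replace, and the tempDir parameter are hypothetical names, and error handling is kept to a minimum.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the write-to-temp, delete, rename sequence.
final class SafeFileReplace {

    static void replace(InputStream content, File target, File tempDir) throws IOException {
        // Step 1: write the new content to a temp file so a failed read cannot damage the target.
        File temp = File.createTempFile("content", ".tmp", tempDir);
        try {
            copyToFile(content, temp);
            // Step 2: delete the existing target file, if it exists.
            if (target.exists() && !target.delete()) {
                throw new IOException("Could not delete " + target);
            }
            // Step 3: rename. If that fails (for example across filesystems),
            // fall back to a second full copy of the data.
            if (!temp.renameTo(target)) {
                try (InputStream in = new FileInputStream(temp)) {
                    copyToFile(in, target);
                }
            }
        } finally {
            // After a successful rename the temp path no longer exists, so nothing happens here.
            if (temp.exists() && !temp.delete()) {
                temp.deleteOnExit();
            }
        }
    }

    private static void copyToFile(InputStream in, File dest) throws IOException {
        try (OutputStream out = new FileOutputStream(dest)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

    Passing target.getParentFile() as tempDir keeps the temp file on the same filesystem as the target, which is the case where the rename can succeed without the extra copy. Passing null falls back to java.io.tmpdir.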

     

    I've opened MODE-1206 to describe this and will submit a patch that allows users to specify the temporary directory that is used, allowing them to keep everything on one filesystem. 

     

    I added a pull request at https://github.com/ModeShape/modeshape/pull/132.

  • 14. Re: Using FileSystemBinary?
    Dmitry Zhuravlev Newbie

    As I understand it, you rejected this solution. If so, why is MODE-1201 marked as "Closed"? This problem still exists in ModeShape 2.7. Please provide a patch for this problem for the 2.x ModeShape versions.