11 Replies Latest reply on Jan 19, 2010 4:13 PM by alrubinger

    SHRINKWRAP-116: Write on-demand during ZIP Export

    alrubinger

      There are a couple issues with our implementation of the ZIP Exporter:

       

      • The caller must wait until we've encoded the full archive before he/she may start reading
      • We hold the full contents of the ZIP in memory

       

      I like the API; from the start we considered that we pass an InputStream (readable pointer) to the user instead of a full byte array for just this reason.

       

      I've attached a patch to the issue which spawns off the encoding process into a new Thread, immediately returning a reference to the target InputStream.  It uses piped streams; if the underlying buffer gets filled (the user has not read quickly enough), the write operation blocks; our footprint is now limited to the size of the buffer.

       

      So we now avoid potential OOM and provide the user with more efficient return (performance).

       

      The drawback:

       

      Because we're spawning the write operations into another Thread, we now cannot catch problems during encoding and wrap them in ArchiveExportException.  This would require blocking, and could lead to deadlock (we block before returning the InputStream, similarly the writing operations block because the user hasn't read anything from the buffer because we haven't returned it yet).

       

      I say we drop ArchiveExportException if some error is encountered on the writer Thread, and just throw it up that stack.

       

      WDYT?

       

      S,

      ALR

        • 1. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
          alrubinger
          • 2. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
            aslak

            Just a quick question.

             

            What can you do with the InputStream you get?

            (besides the obvious, write it out somewhere...)

             

            • 3. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
              alrubinger

              aslak wrote:

               

              Just a quick question.

               

              What can you do with the InputStream you get?

              (besides the obvious, write it out somewhere...)

               

               

               

              Trick question, right?  You can only read.

               

              It's another leap to assume that clients have an OutputStream in mind and available at the time they request the encoding to ZIP.  Take for example JClouds:

               

              http://code.google.com/p/jclouds/source/browse/trunk/demos/antjruby/build.xml?spec=svn2610&r=2610

               

              TBH I'm not exactly sure what Adrian is doing here, but that BlobStore looks like its accepting an InputStream and handling the write operations on its own.  Same for any number of other utilities which abstract the writing process from the user - all they want is the input to write.

               

              S,

              ALR

              • 4. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                adriancole

                Underneath the scenes, jclouds would take this inputstream and write it somewhere.  this is true

                 

                There are a couple places where this won't be perfect.  For example, Amazon S3 doesn't support chunked uploads, which implies we need to know the size of the content before we send it.  In this case, we would have to rebuffer to disk or something to avoid oom errors.  These are complexities that jclouds and the like have to deal with when we accept a stream as opposed to the far easier byte array.

                • 5. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                  alrubinger

                  So in your view Adrian, you'd prefer the framework provide an option (utility) to give you the byte array, or do it yourself?

                   

                  We work to find the balance between:

                   

                  1) Keeping things in proper scope (generic I/O utilities are really not our business)

                  2) Still be usable

                   

                  A good example of where we break 1) in favor of 2) is the "ZIP export to File" feature.

                   

                  S,

                  ALR

                  • 6. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                    adriancole

                    For archives, if given the choice, I'd much rather have an inputstream than a byte array.  I can always make a bytearray out of the inputstream.  I understand that you need to keep scope close.  I think only having an option for byte array isn't going to work while only having an option of inputstream can always work.

                     

                    Does this make sense?

                    • 7. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                      alrubinger

                      Sure thing.

                       

                      I think we'd always support InStream but perhaps offer a byte[] via some util.  As of right now I don't see enough of a case for it (it's like 3 lines after all, the caller can handle it).

                       

                      Aslak, that leaves open the question of dropping ArchiveExportException during the encoding process.

                       

                      S,

                      ALR

                      • 8. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                        aslak

                         

                        The user base has spoken...

                         

                        The new laziness of ZipExporter might cause some problems tho.

                         

                        Currently it gets a unmodifiableMap by a call to archive.getContent, creates a new List of all the keys(ArchivePaths) for processing each Asset.

                        If 'someone' deletes a path from the archive outside of the Exporter, the Exporter will not find it in the content and assume it's a directory.

                         

                        There is nothing stopping 'someone' from overwriting Assets either, this will change the expected outcome of the Exporter.

                         

                        I guess the Exporter should clone the archive content on initialization.. ?

                         

                        {code}

                             @Test      public void shouldBeAbleToChangeArchiveAfterExporter() throws Exception      {           ArchivePath deletePath = ArchivePaths.create("org/jboss/shrinkwrap/api/spec/JavaArchive.class");           ArchivePath changePath = ArchivePaths.create("org/jboss/shrinkwrap/api/spec/WebArchive.class");           JavaArchive archive = Archives.create("test.jar", JavaArchive.class)                                    .addPackages(                                              true,                                              Archives.class.getPackage());          

                                  InputStream stream = archive.as(ZipExporter.class).exportZip();           // give the Exporter some time to fill up the buffer           Thread.sleep(1000);           // remove path from archive, the exporter now thinks this is a directory           archive.delete(deletePath);           // change the asset content           archive.add(new FileAsset(new File("pom.xml")), changePath);           ByteArrayOutputStream output = new ByteArrayOutputStream();           IOUtil.copyWithClose(stream, output);           JavaArchive test = Archives.create("test2.jar", ZipImporter.class)                     .importZip(                               new ZipInputStream(new ByteArrayInputStream(output.toByteArray())))                     .as(JavaArchive.class);           Assert.assertTrue(                     "The exporter should have taken a snapshot of the archive content before the path was deleted." +                     "(directories are not imported)",                     test.contains(deletePath));           String assetContent = new String(IOUtil.asByteArray(test.get(changePath).openStream()));           Assert.assertFalse(                     "The exporter should have taken a snapshot of the archive content before the path was changed.",                     assetContent.contains("artifactId"));      }

                        {code}

                         

                        • 9. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                          alrubinger

                          Good catch.  We'll just make a copy of the contents before spawing off into the new Thread.

                           

                          Let's put this test in place, but it needs one modification:

                           

                          
                               @Test
                               public void shouldBeAbleToChangeArchiveAfterExporter() throws Exception 
                               {
                                    ...
                                    // give the Exporter some time to fill up the buffer           Thread.sleep(1000);      }

                           

                           

                          We can't Thread.sleep in tests; this is the #1 cause of transient failures in the AS testsuites.  Instead we can make something more reliable (like an overridden implementation of the exporter to be used in testing which permits setting of a barrier).

                           

                          But let's give this issue its own JIRA and handle it separately.

                           

                          S,

                          ALR

                          • 10. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                            alrubinger

                            Looks like I've got a good enough test case showing the OOM:

                             

                            http://jboss.hudson.alrubinger.com/job/ShrinkWrap/org.jboss.shrinkwrap$shrinkwrap-impl-base/60/

                             

                            But giving the caller no mechanism to determine if the export process fails bothers me.

                             

                            I think instead of returning InputStream directly, we should give back a handle:

                             

                            interface ZipExportHandle{
                            
                              InputStream getInputStream();
                            
                              boolean isDone() throws ArchiveExportException;
                            }
                            

                             

                            In this scenario a user can check that all's completed OK, and verify the integrity of the data he/she has read from the InputStream.  If isDone is true without an exceptional circumstance, all's good in the hood.

                             

                            S,

                            ALR

                            • 11. Re: SHRINKWRAP-116: Write on-demand during ZIP Export
                              alrubinger

                              I've committed this, preserving all the old features and introducing some API changes:

                               

                              /**
                               * Handle returned to callers from a request to export via
                               * the {@link ZipExporter}.  As the encoding process is an asynchronous
                               * operation, here we provide the user access to read the 
                               * content as well as check for completeness and integrity.
                               *
                               * @author <a href="mailto:andrew.rubinger@jboss.org">ALR</a>
                               */
                              public interface ZipExportHandle
                              {
                                 /**
                                  * Obtains an {@link InputStream} from which the encoded
                                  * content may be read.
                                  * 
                                  * @return
                                  */
                                 InputStream getContent();
                              
                                 /**
                                  * Blocking operation which will wait until the encoding process's internal
                                  * streams have been closed and verified for integrity.  Do not call this method
                                  * until all bytes have been read from {@link ZipExportHandle#getContent()}; otherwise 
                                  * this may introduce a deadlock.  Any problems with the encoding process will be reported
                                  * by throwing {@link ArchiveExportException}.
                                  * @return
                                  * @throws ArchiveExportException If an error occurred during export
                                  * @throws IllegalStateException If invoked before {@link ZipExportHandle#getContent()} has been
                                  * fully-read
                                  */
                                 void checkComplete() throws ArchiveExportException, IllegalStateException;
                              }
                              

                               

                              S,

                              ALR