5 Replies Latest reply on Feb 6, 2013 12:40 PM by mmatloka

    Improving the MavenDependency Performance

    alrubinger

      The following observation was kicked off by Vineet Reynolds:

       

          I've spent some time profiling a "slow" test of mine in a project that uses Seam Security. The deployment is created using the Maven Dependency resolver (2.0.0-alpha-1), and there appears to be scope for improving the performance of this resolver: creating the deployment takes roughly 20 seconds (wall clock time) in a test that lasts ~40 seconds.

       

          The hotspot appears to be ClasspathWorkspaceReader, which is not only invoked many times (1779 in my test), but also features heavily in the "own time" stats (216s spent in this method out of 217s contributed to overall execution time by this method; I've sampled every method invocation, so the profiled times are slower). The DefaultArtifact constructor is also invoked 1.4k+ times, but doesn't contribute heavily to the consumed time (800ms or so). I haven't got any more useful information regarding the cause, since the profiler hasn't picked up anything, but I suspect it is due to the method-local DocumentBuilderFactory and XPathFactory instances being created. But I think you would know Aether better than I do, so I'd like to have your input on improving this class before I submit a pull request.

      ...additionally:

       

      By the way, I did manage to improve the performance a bit - the XPath query evaluation did take some time, but the WorkspaceReader is being hit 1.7k times, so it didn't help much - wall clock time for the tests dropped from 38s to 34s. I logged the info about the artifacts being requested, and it turns out that there are duplicates among the parameters, so we need some form of cache to improve performance. Maybe not a cache that we manage ourselves, but one that Aether provides - which is why I think I need more feedback here.

       

      S,

      ALR

        • 1. Re: Improving the MavenDependency Performance
          vineet.reynolds

          I wasn't sure if this came out correctly in the above observations - using a class-member DocumentBuilderFactory and XPathFactory, and compiled XPath expressions, did not improve performance as much as I would have liked. This is because XPath expression evaluation is itself expensive, and when it is repeated across a large number of invocations, the use of a compiled expression does not help a lot. Now, given that most of these lookups are duplicates, we need to find some way to cache the results, or we need to prevent the invocations with the duplicate artifacts.
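For readers unfamiliar with the change described above, here is a minimal sketch of what "class-member factory plus compiled expression" can look like. It is not the actual ClasspathWorkspaceReader code: the class name `PomXPathSketch`, the method `artifactId`, and the `/project/artifactId` query are assumptions made for illustration. Only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper, for illustration only. */
final class PomXPathSketch {

    // Created once per class instead of once per method call.
    private static final DocumentBuilderFactory DOCUMENT_FACTORY = DocumentBuilderFactory.newInstance();

    // Compiled once and reused for every POM that is inspected.
    private static final XPathExpression ARTIFACT_ID;

    static {
        try {
            ARTIFACT_ID = XPathFactory.newInstance().newXPath().compile("/project/artifactId");
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private PomXPathSketch() {
    }

    // synchronized because the JAXP factory and expression objects are not guaranteed to be thread-safe.
    static synchronized String artifactId(String pomXml) {
        try {
            DocumentBuilder builder = DOCUMENT_FACTORY.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(pomXml)));
            return ARTIFACT_ID.evaluate(document);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read artifactId from POM", e);
        }
    }
}
```

Note that every call still parses the document and evaluates the expression, which is consistent with the observation that this change alone does not help much when the same lookups are repeated many times.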

           

          Also, if we would solve this with a cache, we need to ensure that the cache would be re-used across tests, since it would help immensely when multiple @Deployments in different tests use the same Maven artifact.
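One way to picture such a cache is a JVM-wide map keyed by artifact coordinates. The sketch below is only an illustration under stated assumptions: `ArtifactLookupCache`, its `resolve` method, and the use of a plain coordinate string as the key are hypothetical, the code assumes Java 8 or later, and it says nothing about how Aether caches internally.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache, for illustration only. */
final class ArtifactLookupCache {

    // static: a single map per JVM, so tests that run in the same JVM share the entries.
    private static final Map<String, File> RESOLVED = new ConcurrentHashMap<>();

    private ArtifactLookupCache() {
    }

    /**
     * Returns the cached file for the given coordinates, or runs the slow lookup
     * once and remembers its result. A null result from the lookup is not cached.
     */
    static File resolve(String coordinates, Function<String, File> slowLookup) {
        return RESOLVED.computeIfAbsent(coordinates, slowLookup);
    }
}
```

A static map is only shared between tests that run in the same JVM. If the test runner forks a new JVM per test class, the cache would have to be persisted somewhere (for example on disk) to be re-used.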

          • 2. Re: Improving the MavenDependency Performance
            kpiwko
            • 3. Re: Improving the MavenDependency Performance
              aslak

              Another possible optimization that came to mind, for the case where you're not resolving transitive deps:

               

              We could start by bypassing the Aether lib altogether (not even initializing anything from it) and do a simple check whether the artifact exists in the local repo. If it does, simply return that File. If it doesn't exist, or you're resolving transitive deps, fall back to Aether.

               

              In theory we would only need to run Aether the first time you depend on an artifact that is not yet in the local repo, so that it gets downloaded. All following runs on the same machine would simply resolve GAV to File via path concatenation.

               

              To speed up transitive deps, we could cache the resolved GAVs for a specific Artifact combination. On a cache miss, fall back to Aether; on a cache hit, do a simple GAV to File resolve.
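The "GAV to File via path concatenation" idea can be sketched as follows. This is a simplified illustration, not resolver code: `LocalRepositoryShortcut` and its methods are hypothetical names, the Aether call is represented by a plain `Supplier`, and the sketch assumes a jar artifact with a release version in the default Maven 2 repository layout (no classifiers, other packaging types, or SNAPSHOT handling). The local repository directory is passed in by the caller, since it is commonly `~/.m2/repository` but can be overridden in `settings.xml`.

```java
import java.io.File;
import java.util.function.Supplier;

/** Hypothetical shortcut, for illustration only. */
final class LocalRepositoryShortcut {

    private LocalRepositoryShortcut() {
    }

    /** Relative path of a jar in the default Maven 2 repository layout. */
    static String relativePath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    /**
     * Returns the file from the local repository if it is already there;
     * otherwise delegates to the fallback (standing in for the full Aether resolution).
     */
    static File resolve(File localRepository, String groupId, String artifactId, String version,
            Supplier<File> aetherFallback) {
        File candidate = new File(localRepository, relativePath(groupId, artifactId, version));
        return candidate.isFile() ? candidate : aetherFallback.get();
    }
}
```

The fallback is only invoked when the file is missing, so nothing from Aether would need to be initialized on the fast path.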

              • 4. Re: Improving the MavenDependency Performance
                lfryc

                Vineet Reynolds wrote:

                 

                Also, if we would solve this with a cache, we need to ensure that the cache would be re-used across tests, since it would help immensely when multiple @Deployments in different tests use the same Maven artifact.

                This is a good point - and a good use case for Arquillian Core. I use caching in the form of serialized objects for Drone's reusable WebDriver sessions. If it were part of core, all projects could cache through a unified interface.

                • 5. Re: Improving the MavenDependency Performance
                  mmatloka

                  I've performed a few optimizations of ClasspathWorkspaceReader.

                  • Reuse of DocumentBuilder and XPath, as already mentioned
                  • Compilation of the XPath queries to XPathExpression
                  • Caching of artifacts retrieved from the classpath
                  • Reduction of the number of disk operations (previously, disk isFile/isDirectory checks were performed on every classpath entry for every retrieved artifact, and again for the POM files found)
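To illustrate the last point, the sketch below classifies the classpath entries once, so that later artifact lookups can iterate over in-memory lists instead of asking the file system again. It is not taken from the pull request: `ClasspathIndex` and its methods are hypothetical names, and the real change may be structured differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical index, for illustration only. */
final class ClasspathIndex {

    private final List<File> files = new ArrayList<>();
    private final List<File> directories = new ArrayList<>();

    /** Touches the disk once per classpath entry, at construction time only. */
    ClasspathIndex(String classPath) {
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            File file = new File(entry);
            if (file.isDirectory()) {
                directories.add(file);
            } else if (file.isFile()) {
                files.add(file);
            }
            // Entries that do not exist are skipped.
        }
    }

    /** Classpath entries that are regular files (typically jars). */
    List<File> files() {
        return Collections.unmodifiableList(files);
    }

    /** Classpath entries that are directories (typically compiled classes). */
    List<File> directories() {
        return Collections.unmodifiableList(directories);
    }
}
```

An instance could be built once from `System.getProperty("java.class.path")` and then reused for every lookup.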

                   

                  I've run tests based on Forge's core\javaee-impl\pom.xml (136 retrieved dependencies). Before the changes, on my PC, running

                   

                  Maven.resolver().loadPomFromFile(pom).importRuntimeDependencies().asFile();

                   

                  took about 32s. After the optimizations mentioned above, it takes ~1.5s.

                   

                  Code is available here: https://github.com/shrinkwrap/resolver/pull/35