11 Replies Latest reply on Aug 20, 2015 12:39 PM by ma6rl

    Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)

    bes82

      I'm currently facing the problem that an import job is simply too slow. After a bit of configuration tweaking and profiling I'm at a point where I can't improve the performance any further, but I'm not sure whether there is simply nothing left to improve or whether I've done it all wrong.

       

      So here is what I'm doing:

       

      I need to import a lot (~10k) of "objects". Each has about 20 properties, and it is ensured that no import node contains more than 200 subnodes.

      I'm searching these objects/nodes via SQL2 and an indexed property. The index is on the local machine. This search alone (firing the query and calling NodeIterator.hasNext()) takes 30-40 ms. Then some properties are added or a new node is created, which takes about 1-3 ms. Then the node is saved, which takes another 40-50 ms. This includes updating the mentioned index synchronously. So this is ~80 ms per item just for loading and storing.

       

      I already configured infinispan to use async write behind and set eviction to ~4k nodes.

       

      My question simply is: is this normal speed, because it means I'm only processing 12 objects per second which is incredibly slow.
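For reference, the throughput figure follows directly from the per-item timings. The snippet below is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and it assumes the import is strictly sequential with a flat ~80 ms per item.

```java
final class ImportThroughput {

    /** Items processed per second when each item takes millisPerItem. */
    static double itemsPerSecond(double millisPerItem) {
        return 1000.0 / millisPerItem;
    }

    /** Total wall-clock seconds for a strictly sequential import. */
    static double totalSeconds(int items, double millisPerItem) {
        return items * millisPerItem / 1000.0;
    }

    public static void main(String[] args) {
        // Roughly 30-40 ms query + 1-3 ms modify + 40-50 ms save, rounded to ~80 ms as in the post.
        double perItemMs = 80.0;
        int items = 10_000; // "~10k objects"
        System.out.printf("%.1f items/s, %.0f s total%n",
                itemsPerSecond(perItemMs), totalSeconds(items, perItemMs));
    }
}
```

At 80 ms per item this works out to about 12.5 items per second, or roughly 800 seconds (a bit over 13 minutes) for 10k objects.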

       

      As I said, I have no idea where else to look for a bottleneck. Profiling revealed that all the time is eaten up by node.persist - which I don't understand at all, because with infinispan configured to async write through should not hold up persist at all - and by hasNext of NodeIterator and query.execute.

       

       

      As mentioned previously, I would simply like to hear a few opinions, as I'm just not sure whether 12 objects / second for the described task is the maximum possible on "normal" hardware, or whether I have a massive configuration problem. In the first case I will have to think of something different; in the latter I can continue searching for a solution.

        • 1. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
          kbachl

          The numbers you report are way too low IMHO;

           

          What infinispan storage do you use? How is it configured?

          Why do you write

           

          "This includes updating the mentioned index synchronously."

           

          vs.

           

          "because with infinispan configured to async write through should not"

          ?

           

          Are you doing it sync or async?

           

          Maybe you want to post your modeshape config file as well as your infinispan config;

           

           

          Best,

           

          KB

          • 2. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
            bes82

            Modeshape local Indexes are updated synchronously, infinispan cache is set to async, two different things.

             

            ----

             

            <?xml version="1.0" encoding="UTF-8"?>
            <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xsi:schemaLocation="urn:infinispan:config:6.0 http://www.infinispan.org/schemas/infinispan-config-6.0.xsd"
                        xmlns="urn:infinispan:config:6.0">

                <global>
                    <globalJmxStatistics enabled="false" allowDuplicateDomains="true"/>
                </global>

                <namedCache name="contentRepository">

                    <transaction
                        transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                        transactionMode="TRANSACTIONAL"
                        lockingMode="OPTIMISTIC" />

                    <persistence
                        passivation="false">
                        <singleFile
                            preload="false"
                            shared="false"
                            fetchPersistentState="false"
                            purgeOnStartup="false"
                            location="${datalocation}">
                            <!-- write behind configuration -->
                            <async enabled="true"/>
                        </singleFile>
                    </persistence>

                    <!-- limit the number of nodes to hold in memory -->
                    <eviction maxEntries="8192" strategy="LIRS" />

                </namedCache>
            </infinispan>

             

            ----

             

            {
                "name" : "modeshapeRepository",
                "jndiName": "jcr/modeshapeRepository",
                "monitoring" : {
                    "enabled" : true
                },
                "indexProviders" : {
                    "local" : {
                        "classname" : "org.modeshape.jcr.index.local.LocalIndexProvider",
                        "directory" : "${indexlocation}"
                    }
                },
                "storage" : {
                    "cacheName" : "contentRepository",
                    "cacheConfiguration" : "META-INF/infinispan-file-config-6.xml",
                    "binaryStorage" : {
                        "type" : "file",
                        "directory": "${binarylocation}"
                    }
                },
                "workspaces" : {
                    "default" : "default",
                    "allowCreation" : true
                },
                "security" : {
                    "anonymous" : {
                        "roles" : ["readonly","readwrite","admin"],
                        "useOnFailedLogin" : false
                    }
                }
            }

            • 3. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
              kbachl

              I think your problem is here:

               

              "<eviction maxEntries="8192" strategy="LIRS" />"

               

              You work with way over 10k objects that have to go in, but you limit the maximum number of in-memory entries for infinispan to 8k - remove this and then try it;

               

              Also

               

              <singleFile

                              preload="false"

               

              leads to a slower warm-up in exchange for a slightly faster startup time; I would set this to true. Also make sure your RAM is big enough to hold *all* data (the whole repo), as you use the singleFileStore of infinispan 6;

               

              You might also want to give your binary storage a

              "minimumBinarySizeInBytes": 1048576

              (here 1 MB) so that only big files are written directly to the filesystem (writing every small binary to the filesystem can lead to slower performance);

               

               

              Best,


              KB

              • 4. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                bes82

                Thanks for the info.

                 

                I was talking about an import job on an empty repository. I can see when eviction starts to take place, because then (I guess when nodes have to be reloaded) performance starts to drop a bit (but not much). So 80ms is the performance measured right from the beginning (test started on empty repository)

                 

                Preload doesn't change anything for my test, I guess again because of the empty repository at the start.

                 

                Currently I don't store binaries but thanks for mentioning this.

                 

                What I store though is some nodes having ~1k properties, might this be a problem?

                • 5. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                  bes82

                  I found out where the bottleneck is, LocalIndexProvider using MapDB:

                   

                  It seems (at least with synchronous indexes) that every node put into an index means direct, synchronous, blocking disk I/O.
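To illustrate why a forced disk sync per write is so expensive, here is a minimal sketch using plain java.nio. It is not MapDB's or ModeShape's code; the class name, record size and method names are hypothetical, and the actual difference depends heavily on the disk and filesystem (on a ramdisk or tmpfs the two variants will be close).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SyncWriteCost {

    static final int RECORD_SIZE = 64; // arbitrary size chosen for the illustration

    /** Writes `records` records, forcing them to disk after every `syncEvery` records; returns elapsed nanos. */
    static long write(Path file, int records, int syncEvery) {
        ByteBuffer record = ByteBuffer.allocate(RECORD_SIZE);
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 1; i <= records; i++) {
                record.clear();
                while (record.hasRemaining()) {
                    ch.write(record);
                }
                if (i % syncEvery == 0) {
                    ch.force(true); // blocks until the OS reports the data as written to the device
                }
            }
            ch.force(true); // final flush so both variants end up durable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    /** Returns {nanos with a sync per record, nanos with a single sync, bytes written per file}. */
    static long[] compare(int records) {
        try {
            Path perRecord = Files.createTempFile("sync-each", ".bin");
            Path batched = Files.createTempFile("sync-once", ".bin");
            try {
                long slow = write(perRecord, records, 1);
                long fast = write(batched, records, records);
                return new long[] { slow, fast, Files.size(batched) };
            } finally {
                Files.deleteIfExists(perRecord);
                Files.deleteIfExists(batched);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = compare(1_000);
        System.out.printf("sync per record: %d ms, single sync: %d ms%n",
                r[0] / 1_000_000, r[1] / 1_000_000);
    }
}
```

Both variants write the same bytes; only the number of force() calls differs, which is the cost a synchronous index update pays on every node save.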

                   

                  So I started with a simple FS-based ramdisk and suddenly modifying or storing nodes is 100 times (no joke) faster. The ramdisk implementation used is even able to sync to disk every few seconds.

                   

                  So I started looking at the code of LocalIndexProvider and tried to play around with DBMaker in order to come up with a pure Java solution.

                  Currently MapDB is just used as: this.db = DBMaker.newFileDB(file).make();

                   

                  Adding mmapFileEnableIfSupported() already increased performance by a factor of 10, I guess without any drawbacks. That is still ten times slower than with a ramdisk.
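As far as I understand it, mmapFileEnableIfSupported() makes MapDB use memory-mapped files where the platform supports them. What that means at the JDK level can be sketched with plain java.nio as below. This is an illustration under that assumption, not MapDB's implementation, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWriteSketch {

    /** Stores `count` long values through a memory mapping and returns the value read back at index `probe`. */
    static long writeAndReadBack(int count, int probe) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping beyond the current file size in READ_WRITE mode grows the file.
                MappedByteBuffer map =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) count * Long.BYTES);
                for (int i = 0; i < count; i++) {
                    map.putLong(i * Long.BYTES, i * 10L); // a memory store, no explicit write call per value
                }
                map.force(); // one explicit flush of the dirty pages
                return map.getLong(probe * Long.BYTES);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack(1_000, 7));
    }
}
```

The point of the sketch is that individual puts become memory writes and the operating system decides when pages reach the disk, unless force() is called explicitly.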

                   

                  So I'm currently playing around with asyncWriteFlushDelay, asyncWriteEnable and cacheLRUEnable.

                   

                  AsyncWriteFlushDelay seems to totally block everything, which I don't understand, and I don't yet understand the implications of cacheLRUEnable.

                   

                  If anyone could shed some light on which options I could/should (or better not) use, that would be very helpful.

                  • 6. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                    rhauch

                    Obviously we'd like to have it perform as fast as possible, and it sounds like you've found a few switches that make a considerable difference. Feel free to log an enhancement request in JIRA and create a pull request with specific changes to how the MapDB maps are created. First, doing so lets us see what you're proposing as well as run with the same proposed changes. Second, it would allow us to collaborate on which set of switches/options makes the most sense -- there are quite a few added in recent MapDB releases that I wasn't aware of.

                    • 7. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                      ma6rl

                      bes82 Were you able to make any additional progress with improving the MapDB performance?

                      • 8. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                        bes82

                        I guess the MapDb index implemented some configuration settings in 4.3; mmapFileEnableIfSupported and cacheLRUEnable are the options to go for.

                         

                        But in general MapDb single property indexes were just way too slow for my use case.

                         

                        So I created my own lucene based provider that can handle multiple properties per index. So every nodeType is now an index and I can do things like "search all from nt:x where a=b and c!=d and e>f" dramatically faster than with MapDb.

                         

                        However, the Lucene index does not yet support joins and some property types that the MapDb provider supports. It's working very well for me, but I'm not at the point where I think it's bug-free enough to release it to the public.

                        • 9. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                          ma6rl

                          Thanks for the update

                           

                          Bjoern Schmidt wrote:

                           

                          I guess the MapDb index implemented some configuration settings in 4.3; mmapFileEnableIfSupported and cacheLRUEnable are the options to go for.

                           

                          I can see the flags in LocalIndexProvider.java but there does not appear to be any way to set them other than changing the values in the class and rebuilding Modeshape. I don't believe they have been exposed via any of the Modeshape configuration mechanisms.

                           

                          We are also seeing really poor performance with writes to MapDB, especially under concurrent load. We see our throughput drop from being able to write 400 nodes a second without indexes to less than 40 nodes a second with a single sync index enabled. While I did expect to see a drop in write performance with indexing, a tenfold reduction does seem a little extreme. I did some profiling using Flame Graphs and all of the CPU time is spent in LocalIndexProvider and the MapDB classes.

                           

                          It's encouraging to hear that you are seeing better performance with Lucene as we have also been looking at alternative index providers. I know there is an open issue [MODE-2159] Store indexes in local Lucene - JBoss Issue Tracker to add support for Lucene. It is currently assigned to 4.4 but am not sure if it is going to be in the release. hchiorean, do you know if MODE-2159 is still planned for 4.4 or is it going to be moved out?

                           

                          I'm also running into another interesting issue with MapDB indexes when using user transactions and pessimistic locking. I intermittently see concurrent writes (to different parts of the node hierarchy) deadlock so that only a few complete and the others all sit until the underlying infinispan locks time out. These issues do not occur with indexing disabled. I'm working on creating a test case to demonstrate this and, if I can, am going to create an issue for it.

                           

                          hchiorean, rhauch, do you have any suggestions or feedback about the performance we see writing to a MapDB local index?

                          • 10. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                            hchiorean

                            You can configure various MapDB options (see LocalIndexProvider.java at modeshape-4.3.0.Final in the ModeShape/modeshape GitHub repository), both via JSON (see local-index-provider-with-custom-settings.json at modeshape-4.3.0.Final) and via the Wildfly XML (see standalone-modeshape.xml at modeshape-4.3.0.Final).

                            It's encouraging to hear that you are seeing better performance with Lucene as we have also been looking at alternative index providers. I know there is an open issue [MODE-2159] Store indexes in local Lucene - JBoss Issue Tracker to add support for Lucene. It is currently assigned to 4.4 but am not sure if it is going to be in the release. hchiorean, do you know if MODE-2159 is still planned for 4.4 or is it going to be moved out?

                            It is going to be moved out. It's simply too much work to finish it in time for 4.4

                            • 11. Re: Modeshape / Infinispan performance [how performant should it be] (ModeShape 4.0)
                              ma6rl

                              I've experimented with the MapDB configuration and based on the earlier findings posted by bes82 was able to get a significant performance boost by setting:

                               

                              cacheLRUEnable="true"

                              mmapFileEnable="true"

                              commitFileSyncDisable="true"

                               

                              and am able to create ~300 nodes a second with synchronous indexes. The majority of the performance boost came from 'commitFileSyncDisable'. It is worth noting that this greatly increases the chance that the index cache may become corrupted if it is not shut down correctly, so it should be used with caution.

                               

                              I was also able to figure out the deadlocking issue I was running into above. It turned out to be a result of using the JDBC Infinispan Cache Store and setting the datasource's max connection pool size too low. The significant overhead that indexing added meant that connections were not being released quickly enough. Writes were blocked waiting for connections while holding a lock on the infinispan entries, which in turn blocked writes that had open connections from completing and releasing their connections.
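The interaction described above can be reproduced in miniature with plain java.util.concurrent. This is only a sketch of the pattern, not Infinispan or JDBC code: the Semaphore stands in for the connection pool, the ReentrantLock for a lock on a cache entry, and the timeouts, class name and method names are all hypothetical. The pool size passed in must be at least 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class PoolStarvationSketch {

    /** Returns {lockHolderCompleted, connectionHolderCompleted} for the given pool size (>= 1). */
    static boolean[] run(int poolSize) {
        Semaphore pool = new Semaphore(poolSize);      // stand-in for the connection pool
        ReentrantLock entryLock = new ReentrantLock(); // stand-in for a lock on a cache entry
        CountDownLatch bothHolding = new CountDownLatch(2);
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            // Write 1: already holds the entry lock and now needs a connection.
            Future<Boolean> lockHolder = exec.submit(() -> {
                entryLock.lock();
                try {
                    bothHolding.countDown();
                    bothHolding.await();
                    if (!pool.tryAcquire(2, TimeUnit.SECONDS)) {
                        return false;
                    }
                    pool.release();
                    return true;
                } finally {
                    entryLock.unlock();
                }
            });
            // Write 2: already holds a connection and now needs the entry lock.
            Future<Boolean> connectionHolder = exec.submit(() -> {
                pool.acquire();
                try {
                    bothHolding.countDown();
                    bothHolding.await();
                    // The short timeout plays the role of the lock timeout.
                    if (!entryLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                        return false;
                    }
                    entryLock.unlock();
                    return true;
                } finally {
                    pool.release();
                }
            });
            return new boolean[] { lockHolder.get(), connectionHolder.get() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=1: " + java.util.Arrays.toString(run(1)));
        System.out.println("pool=2: " + java.util.Arrays.toString(run(2)));
    }
}
```

With a single permit, the write that holds the connection can only give up once its lock wait times out; with two permits both writes complete, which mirrors the effect of enlarging the pool.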

                               

                              Based on metrics from my test environment, I was able to handle node writes with a connection pool to request ratio of 1 connection per 4 concurrent requests with indexing disabled. With indexing enabled I need a ratio of 1 connection per 2 concurrent requests, or more simply, I needed to double the max connections in the pool to cope with indexing.
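As a small worked example of those ratios, the helper below computes the pool size for a given number of concurrent requests. The figure of 40 concurrent requests is purely an assumption for illustration, as are the class and method names. The ratios themselves are the ones measured above and may not carry over to other environments.

```java
final class PoolSizing {

    /** Connections needed when one connection can serve `requestsPerConnection` concurrent requests (rounded up). */
    static int connectionsNeeded(int concurrentRequests, int requestsPerConnection) {
        return (concurrentRequests + requestsPerConnection - 1) / requestsPerConnection;
    }

    public static void main(String[] args) {
        int concurrentRequests = 40; // hypothetical load, not a number from the thread
        System.out.println("indexing disabled: " + connectionsNeeded(concurrentRequests, 4));
        System.out.println("indexing enabled:  " + connectionsNeeded(concurrentRequests, 2));
    }
}
```

For 40 concurrent requests this gives 10 connections without indexing and 20 with it, i.e. the doubling described above.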

                               

                              At this point we can work within the constraints of using the local index provider shipped with Modeshape but will most likely move to one of the other providers currently being implemented (lucene or elastic search) once they are available.