18 Replies Latest reply on Apr 1, 2014 9:46 AM by lhelander

    FileSystemConnector in 4.0

    lhelander

      I am using the FileSystemConnector in 4.0-SNAPSHOT (git pull made yesterday). It does not seem like the "imported" files trigger configured sequencers, is this expected behaviour?

       

      I also noticed that the files are not indexed immediately, but it takes a minute or two before they are returned by a query. Is this expected behaviour?

        • 1. Re: FileSystemConnector in 4.0
          rhauch

          I am using the FileSystemConnector in 4.0-SNAPSHOT (git pull made yesterday). It does not seem like the "imported" files trigger configured sequencers, is this expected behaviour?

          What do you mean by "imported"? The files that are added/modified outside of the JVM process and discovered through the eventing mechanism?

           

          I also noticed that the files are not indexed immediately, but it takes a minute or two before they are returned by a query. Is this expected behaviour?

           

          Well, it actually depends on your platform OS and JVM. The file system connector uses Java 7's WatchService, which is implemented completely differently in Oracle's JDK on different platforms. For example, on Mac OS X I believe it is implemented via a polling mechanism with a surprisingly high polling interval. What sucks is that OS X really has a fantastic (and nearly instantaneous) native notification system for changed files - why Oracle doesn't use that is completely bonkers.

           

          So be sure to learn how the WatchService is implemented by your JVM on your OS, and whether there are any JVM tuning parameters for it.
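
One way to find out how quickly the WatchService reacts on a given JVM and OS is to measure it directly. The following is a minimal sketch, not part of ModeShape: the class name `WatchLatencyProbe` and the method `measureMillis` are made up for illustration. It registers a temporary directory, creates a file in it, and reports how long it takes for the first event to arrive.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a ModeShape class) for measuring WatchService latency.
class WatchLatencyProbe {

    /**
     * Creates a file in dir and returns the milliseconds that pass until the
     * WatchService signals a key for that directory, or -1 if nothing arrives
     * within timeoutSeconds.
     */
    static long measureMillis(Path dir, long timeoutSeconds)
            throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            long start = System.nanoTime();
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            try {
                // Blocks until a key is signalled or the timeout elapses.
                WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
                if (key == null) {
                    return -1;
                }
                key.pollEvents(); // drain the pending events
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            } finally {
                Files.deleteIfExists(probe);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-probe");
        try {
            System.out.println("Event latency (ms): " + measureMillis(dir, 30));
        } finally {
            Files.deleteIfExists(dir);
        }
    }
}
```

Running this a few times on the target machine gives a rough lower bound on the delay the connector's events can have; it says nothing about any additional delay inside the repository.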

          • 2. Re: FileSystemConnector in 4.0
            lhelander

            By "imported" I meant the files that are discovered by the connector. I have set the "enableEvents" to true in my connector configuration.

             

            I am running on OS X, and filesystem modifications (adding or deleting files) are detected by the "Monitor Task" of the connector within a few seconds, but it takes an additional one to two minutes before the change actually shows up in the repository (I repeat the query until the modification shows up). The OS X issue you mention would explain the few seconds it takes before the watcher is triggered, but the additional one to two minutes can hardly be "blamed" on "Java on OS X".

             

            The number of files in the connected file system is less than 10.

             

            What could be the reason that the sequencer is not triggered when new files are added to the connected file system?

            • 3. Re: FileSystemConnector in 4.0
              rhauch

              Okay, so I'm not operating on all cylinders this AM yet. My previous comment does accurately explain why events might be delayed relative to the actual file system changes, but it doesn't really address how that might be affecting your situation.

               

              First off, are you setting a Time To Live (TTL) property on the connector? If you are, and you are either navigating to or querying the repository *before* externally making changes to the file system, then ModeShape will cache the representations of those external nodes for up to the number of TTL seconds. That means no matter how many times you navigate or query the repository, ModeShape will continue to use the cached (and potentially stale) representation of those external nodes and will not go back to the connector. At some point, when ModeShape is looking for nodes that are still cached, it will examine the TTL to see if it's expired, and if so will evict them from the cache and will fetch new representations from the connector. Therefore, the TTL setting completely dictates how long nodes might be cached, even if they are "stale" compared to the underlying file system.


              Now, events also play into this. When a connector fires an event that a specific node has changed in some way, then ModeShape immediately evicts the cached representation of that node from the internal cache, regardless of whether there is any TTL left for that node. If the file system connector immediately fired events when things changed on the file system, then you could use a long TTL since ModeShape would evict any changed node from its cache. However, Java 7's WatchService has no guarantee about how quickly it will kick in, so there is a delay between changes to the file system and when the corresponding events are fired into ModeShape. And unfortunately (for the reasons discussed in my previous post), with Oracle's JDK on OS X this delay is significant. (I've seen a delay as long as about 1.5 seconds.)
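
The interplay of TTL and events described above can be modelled in a few lines. This is only an illustrative sketch under simplifying assumptions, not ModeShape's actual cache: `TtlNodeCache`, `get` and `onChangeEvent` are hypothetical names, node representations are plain strings, and the "connector" is just a function from path to content.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical model of a TTL cache with event-driven eviction (not ModeShape code).
class TtlNodeCache {

    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;             // injectable clock for testing
    private final Function<String, String> connector;   // stand-in for the external source

    TtlNodeCache(long ttlSeconds, LongSupplier clockMillis, Function<String, String> connector) {
        this.ttlMillis = ttlSeconds * 1000L;
        this.clockMillis = clockMillis;
        this.connector = connector;
    }

    /** Returns the cached value while the TTL has not expired; otherwise refetches. */
    String get(String path) {
        long now = clockMillis.getAsLong();
        Entry cached = entries.get(path);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value; // possibly stale, but still within its TTL
        }
        String fresh = connector.apply(path);
        entries.put(path, new Entry(fresh, now + ttlMillis));
        return fresh;
    }

    /** A change event evicts the entry immediately, regardless of remaining TTL. */
    void onChangeEvent(String path) {
        entries.remove(path);
    }
}
```

In this model a TTL of 0 means every `get` goes back to the connector, while a long TTL is only safe if `onChangeEvent` is called promptly for every external change.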


              The sequencing system also is dependent upon events and any delay that might happen.


              Now, let's talk about queries. With 4.0.0.Alpha1, every time the workspace is queried and no built-in index can be used, then ModeShape will "scan" the workspace by navigating the entire tree of nodes. (When certain criteria like ISSAMENODE, ISCHILDNODE, and ISDESCENDANTNODE are used, ModeShape knows not to scan the whole workspace but do something far smarter via navigation. These are treated like "built-in" indexes, except we don't need to maintain any actual index for this.) The scanning, just like navigation, uses the internal cache: once a node is cached in the cache, it remains there until it is evicted due to TTL or events.

               

              So, if you are querying the repository, externally adding files, then re-querying the repository, it is possible that the second query will not see the new files because of TTL or delayed events.

               

              Bottom line, please check what (if anything) you're using for a TTL and understand how that TTL affects staleness. Then understand the kind of impact that the event delays might cause on your JDK and OS.

               

              I hope this explains what you are seeing. If not, please provide more information about exactly what you are performing and in what order, and we'll try to look into it more.

              • 4. Re: FileSystemConnector in 4.0
                lhelander

                Hi,

                 

                I am not setting any specific TTL level.

                 

                My setup basically consists of the modeshape-explorer web application and a repository configured with a filesystem connector and a zip sequencer.

                • 5. Re: FileSystemConnector in 4.0
                  rhauch

                  I wonder if the cache TTL is getting set to some default. It shouldn't take that long for the content to become visible.


                  Can you try setting the cacheTtlSeconds property on the connector to '0', and see if that changes the behavior?

                  • 6. Re: FileSystemConnector in 4.0
                    lhelander

                    I think that the problem with the "delay" is solved. I had an indexing part in my configuration in order to make sure that indexing was performed at startup. That enabled me to search for connected nodes in ModeShape 3. Your description of the built-in indexing in ModeShape made me remove this, and now the content of the repository follows the state of the connected file system without any significant delays.

                     

                    Correction: it looks like it was the combination of removing the indexing and adding "cacheTtlSeconds" : 0 to the connector configuration that did the trick.

                     

                    The indexing part I removed was:

                        "query" : {
                            "enabled" : true,
                            "indexStorage" : {
                                "type" : "filesystem",
                                "location" : "/Users/lars_adm/jcmsReports/indexes"
                            },
                            "indexing" : {
                                "rebuildOnStartup" : {
                                    "when" : "always",
                                    "includeSystemContent" : true,
                                    "mode" : "sync"
                                }
                            }
                        },

                     

                    But the sequencer still does not get triggered.

                     

                    Are there any particular log "categories" that I can configure to a more "detailed" level in order to trace why the sequencer does not get triggered?

                     

                     

                    Here is my configuration:

                    {
                        "name" : "jcmsReports",
                        "jndiName" : "java:module/jcr/jcmsReports",
                        "monitoring" : {
                            "enabled" : true
                        },
                        "externalSources" : {
                            "files" : {
                                "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
                                "directoryPath" : "/Users/lars_adm/imports",
                                "projections" : [ "default:/jcmsReports => /" ],
                                "enableEvents" : true,
                                "cacheTtlSeconds" : 0
                            }
                        },
                        "workspaces" : {
                            "default" : "default",
                            "allowCreation" : true
                        },
                        "security" : {
                            "anonymous" : {
                                "roles" : ["readonly","readwrite","admin"],
                                "useOnFailedLogin" : false
                            }
                        },
                        "query" : {
                            "enabled" : true
                        },
                        "sequencing" : {
                            "removeDerivedContentWithOriginal" : true,
                            "threadPool" : "modeshape-workers",
                            "sequencers" : {
                                "Report Meta Data Sequencer" : {
                                    "classname" : "net.lehswe.modeshape.sequencer.ReportsMetadataSequencer",
                                    "pathExpressions" : ["default:/jcmsReports/*.zip => default:/seq"]
                                }
                            }
                        }
                    }

                    • 7. Re: FileSystemConnector in 4.0
                      rhauch

                      Try enabling debug or trace on "org.modeshape.jcr.sequencing" and "org.modeshape.jcr.bus" and "org.modeshape.jcr.JcrObservationManager".
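
When ModeShape is embedded in an application, the log levels can also be raised programmatically. The sketch below assumes that ModeShape's logging ends up in `java.util.logging` in your deployment, which may not be the case (on an application server the logging subsystem configuration is usually the right place). The class `ModeShapeTraceLogging` is a hypothetical helper, not part of ModeShape; only the category names come from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes the categories are routed through java.util.logging.
class ModeShapeTraceLogging {

    // java.util.logging only holds loggers weakly, so keep strong references;
    // otherwise the configured level could be lost when a logger is collected.
    private static final List<Logger> PINNED = new ArrayList<>();

    /** Sets the given logger categories to the finest (trace-like) level. */
    static synchronized void enableTrace(String... categories) {
        for (String category : categories) {
            Logger logger = Logger.getLogger(category);
            logger.setLevel(Level.FINEST);
            PINNED.add(logger);
        }
    }

    public static void main(String[] args) {
        enableTrace("org.modeshape.jcr.sequencing",
                    "org.modeshape.jcr.bus",
                    "org.modeshape.jcr.JcrObservationManager");
    }
}
```

Note that the handlers attached to these loggers (or their parents) must also accept `FINEST` records, otherwise nothing extra will appear in the output.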

                       

                      It'd be very useful to know whether the "cacheTtlSeconds" property affects the outcome/delay at all.

                      • 8. Re: FileSystemConnector in 4.0
                        lhelander

                        It looks like the absence of the "cacheTtlSeconds" property results in no changes to the connected file system being propagated to the repository. If I set it to 0, then file system changes propagate quickly to the repository.

                         

                        The sequencer however does not get triggered. The only outcome of setting TRACE on the suggested log categories is that I get one log entry that says that the sequencer has been initialized. Making changes in the connected file system does not create any log entries.

                         

                        I should probably point out that I have not installed Modeshape on the server but it is embedded into the application. I am running on EAP 6.2 with Java 7.

                        • 9. Re: FileSystemConnector in 4.0
                          lhelander

                          I have experimented with log settings, but when I add a new file to the connected file system the only log entry that I can get is:

                          08:32:37,718 TRACE [org.modeshape.jcr.cache.document.WorkspaceCache] (modeshape-event-dispatcher-3-thread-1) Cache for workspace 'system' received 1 changes from remote sessions: Save by 'files' at 2014-03-29T08:32:37.718+01:00 with user data = {} in repository with key '0297115' and workspace 'default'
                            Added node 'a1f13b3eaa9871/robin.zip' at "/{}jcmsReports/{}robin.zip" under 'a1f13b3eaa9871/'
                          changed 0 nodes:

                           

                          Any idea why the sequencer is not triggered on this event?

                          • 10. Re: FileSystemConnector in 4.0
                            lhelander

                            Another observation:

                             

                            I tried using an observer that listens for events related to added nodes. The event handler gets called when I add a new file to the connected file system.

                            If I shut down the app server, and while the app server is stopped I add some new files to the file system, then when I restart the app server (and my application) the new files are available in the repository, but the event handler does not get called. Is there some way that I can get the detection of the added files to trigger an observer event handler?

                            • 11. Re: FileSystemConnector in 4.0
                              rhauch

                              First of all, when things are added while ModeShape is shut down, ModeShape does not attempt to figure out what might have changed. You could do that in a specialization of the file system connector, but that would be too expensive and complex to do for all scenarios that use the file system connector.
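
For a small directory, detecting what was added while the process was down can be approximated outside of ModeShape by persisting a listing at shutdown and comparing it at startup. The following is a rough sketch of that idea only; `StartupFileDiff` and its methods are hypothetical names, it is not a connector specialization, and it only looks at file names in a single directory (no subdirectories, no modification or deletion detection).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not ModeShape code) that diffs a directory against a saved listing.
class StartupFileDiff {

    /** Returns the names of the regular files directly inside dir. */
    static Set<String> listFileNames(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        return names;
    }

    /** Call at shutdown: writes one file name per line to snapshotFile. */
    static void saveSnapshot(Path dir, Path snapshotFile) throws IOException {
        Files.write(snapshotFile, listFileNames(dir), StandardCharsets.UTF_8);
    }

    /** Call at startup: returns the names present now but absent from the snapshot. */
    static Set<String> addedSinceSnapshot(Path dir, Path snapshotFile) throws IOException {
        Set<String> added = listFileNames(dir);
        if (Files.exists(snapshotFile)) {
            added.removeAll(Files.readAllLines(snapshotFile, StandardCharsets.UTF_8));
        }
        return added;
    }
}
```

The snapshot file should live outside the watched directory, and the application would still have to decide what to do with each returned name (for example, invoke the same logic its observer would have run). This is exactly the kind of bookkeeping that gets expensive for large trees.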

                               

                              Try adding a file to a non-federated area (and save your session), and see if that will cause the sequencer to run. It may be that your sequencer was not properly initialized. Also, I presume that you can easily tell when your sequencer is running; it may also be that the sequencer encountered an error while running and either didn't terminate or terminated silently.

                               

                              Something else you can try is to remove the "default:" from both parts of your path expression, using "/jcmsReports/*.zip => /seq" instead.

                               

                              About the only other thing I can suggest is to connect a debugger to the server to see what is going on and why.

                              • 12. Re: FileSystemConnector in 4.0
                                lhelander

                                I have done some debugging and from what I can see the following happens:

                                When the file system connector detects a new file node, that triggers the event listener in the sequencer "system". It looks like the sequencer "system" only accepts property (and not node) changes.

                                • 13. Re: FileSystemConnector in 4.0
                                  rhauch

                                  Okay, try a path expression like this:

                                   

                                       /jcmsReports/*.zip[/jcr:content@jcr:data] => /seq

                                   

                                  That basically tells ModeShape to sequence the "jcr:content/jcr:data" property (relative to the changed/added node). This is how the sequencers generally work.

                                  • 14. Re: FileSystemConnector in 4.0
                                    lhelander

                                    I have tried that, but it does not work.

                                     

                                    The problem is that I cannot get any sequencer to work, and according to my findings this is why:

                                     

                                    The notify() method in the org.modeshape.jcr.Sequencers class tests whether the incoming event is of type PropertyAdded or PropertyChanged, and only if it is one of these two kinds are the configured sequencers checked. Since the event generated by the filesystem connector is of type "node" and not "property", the notify() method basically ignores the event and no sequencers are triggered (their configured path expressions are never even evaluated, since the sequencers are not checked at all). Maybe one could handle node events in the Sequencers class, but it would probably be better for the file connector to signal events for the added jcr:content/jcr:data property.
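
The behaviour described in this finding can be illustrated with a small stand-alone model. This is not the actual org.modeshape.jcr.Sequencers code; `SequencerDispatchModel`, `ChangeType`, `Change` and `handleChanges` are hypothetical names, and path-expression matching is reduced to simply recording the path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the dispatch logic described above (not ModeShape code).
class SequencerDispatchModel {

    enum ChangeType { NODE_ADDED, PROPERTY_ADDED, PROPERTY_CHANGED }

    static final class Change {
        final ChangeType type;
        final String path;

        Change(ChangeType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    /** Paths that would have been handed to the configured sequencers. */
    final List<String> sequenced = new ArrayList<>();

    /** Only property changes are considered; node-level changes are skipped. */
    void handleChanges(List<Change> changes) {
        for (Change change : changes) {
            if (change.type == ChangeType.PROPERTY_ADDED
                    || change.type == ChangeType.PROPERTY_CHANGED) {
                sequenced.add(change.path); // stand-in for path-expression matching
            }
            // NODE_ADDED falls through: no sequencer is ever consulted.
        }
    }
}
```

In this model, a change set that contains only a `NODE_ADDED` entry for `/jcmsReports/robin.zip` leaves `sequenced` empty, whereas an additional `PROPERTY_ADDED` entry for the file's `jcr:content/jcr:data` would reach the sequencers.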
