21 Replies Latest reply on Jun 27, 2012 12:00 AM by rhauch

    Binary values in ModeShape 3

    rhauch

      As I've mentioned on previous threads, ModeShape 3 handles binary values in a very different way than in 2.x. I'm starting this thread to describe how the new code has been implemented and to solicit feedback and suggestions for improvements.

       

      I've written the initial documentation for how ModeShape 3 handles binary values. Since we're still in the alpha stage, please ask questions or provide comments in this thread (rather than on the documentation page). None of this is locked down at the moment.

        • 1. Re: Binary values in ModeShape 3
          bwallis42

          Hi Randall,

            How will the binary values be treated within a transaction, in particular with the file system store? Also, I'm curious how this would work in a clustered setup, in which case I assume I would have to use the Infinispan binary store?

           

          thanks

          • 2. Re: Binary values in ModeShape 3
            rhauch

            Because the BinaryStore is used to create the Binary value when the client calls ValueFactory.createBinary(InputStream), the binary content is immediately placed in the BinaryStore. In the case of the FileSystemBinaryStore, that means the binary content is immediately stored in a file on the file system, and it's immediately available for use in any other properties and nodes.

             

            Now, if the transaction is rolled back, the Session used to create the Binary should still contain the transient state that couldn't be committed. So the fact that the Binary is still persisted is a good thing. Only when the Session's transient state (with the unsaved Binary value) is cleared would the BinaryStore contain a Binary value that might no longer be used. And that's okay, actually -- having more unused Binary values doesn't hurt anything (other than space) -- and the BinaryStore includes a garbage collection mechanism to clean those up. I'll update the page with some info on how garbage collection works.

            • 3. Re: Binary values in ModeShape 3
              rhauch

              I've updated the page with more info about garbage collection.

              • 4. Re: Binary values in ModeShape 3
                jonathandfields

                Hi All,

                 

                I was wondering if it would be possible to generalize the 3.0 Binary/BinaryStore approach to support the creation of Binaries that expose files (or other resources) that are created and accessible outside of Modeshape. Although this is not a feature of JCR, it would be useful, and seems to be in the spirit of Modeshape "isn't yet another silo of isolated information". In my use cases, my binary data (video files) needs to be accessible both from within JCR and as files to other applications. While this is a form of federation, it seems specific to Binary storage, not to general federation. Indeed, I want to store everything but my video data in the 3.0 Infinispan-based nodes. I haven't worked this out in much detail, and there are probably problems with this, but I was wondering if something like the following might be feasible. If so, I might try to make the changes to Modeshape myself.

                 

                What if BinaryKey was just a generic key - not necessarily a SHA1 hash - and if there were extended ValueFactory methods (or a separate Modeshape API) to create a Binary with an already known key.  For example consider a BinaryStore implementation based upon Apache VFS, called VFSBinaryStore, where the underlying keys were the URLs.  We might have:

                 

                     VFSBinaryKey key = new VFSBinaryKey("ftp://server1/path/to/file1");

                     Binary b = factory.createBinary(key);

                 

                VFSBinaryKey would be an extension of BinaryKey, that just contains the URL. This would create a Binary that refers to the existing file that was created outside Modeshape.

                 

                To create a file from within Modeshape, but to specify its URL so that it can be accessed outside of Modeshape with a user-friendly name, we might have:

                 

                     VFSBinaryKey key = new VFSBinaryKey("ftp://server1/path/to/file2");

                     Binary b = factory.createBinary(key, inputStream);

                 

                VFSBinaryStore would  create a new ftp://server1/path/to/file2 and copy the InputStream into it (handling the edge cases if the file already exists, the parent folders don't exist, etc).

                 

                Last, we would have the standard JCR case where Modeshape determines the file name:

                 

                     Binary b = factory.createBinary(inputStream);

                 

                In this case, the behavior would be identical to that of FileSystemBinaryStore - calculate the SHA1, use that as the key, and also as the basis of the file name. The file would be placed in a location specified by the VFSBinaryStore configuration.
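A minimal sketch of the "calculate the SHA1, use that as the key, and also as the basis of the file name" step, using only the JDK. The class name `Sha1KeySketch`, the `fileFor` helper, and the two-character subdirectory layout are assumptions made for illustration; they are not taken from ModeShape's FileSystemBinaryStore.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of ModeShape.
final class Sha1KeySketch {

    // Returns the SHA-1 digest of the content as 40 lowercase hex characters.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder(40);
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: <root>/<first two hex characters>/<full key>.
    static Path fileFor(Path root, String key) {
        return root.resolve(key.substring(0, 2)).resolve(key);
    }

    public static void main(String[] args) {
        String key = sha1Hex("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(key);
        System.out.println(fileFor(Paths.get("binaries"), key));
    }
}
```

Because the key depends only on the bytes, storing the same content twice yields the same key and therefore the same file.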

                 

                I would also want to be able to retrieve the underlying key, perhaps through an extended Binary interface (or a separate Modeshape API), so that the key (VFS URL in this case) could be used, or provided to other software that is not JCR-enabled, but is VFS-enabled. For example:

                 

                    BinaryKey key = b.getKey();

                    if (key instanceof VFSBinaryKey) {

                        // We know that we can use the key to use VFS directly.

                    }

                 

                I'm not sure how to handle the case where the underlying key changes (the file is moved) or the file is deleted. It could just be that this is not allowed, and that once Modeshape is referencing the files, by convention, they must not change or be deleted. The problem is that, in practice, this may not be possible. Could there be some API to change the BinaryKey of an existing Binary, or to delete all Properties with the Binary given its key?

                • 5. Re: Binary values in ModeShape 3
                  jonathandfields

                  After looking at the code, it appears as if the BinaryStore implementations are hardwired into the configuration, not something that can be plugged into a repository configuration (org.modeshape.jcr.RepositoryConfiguration). Is that intended to be the final approach in 3.0?

                  • 6. Re: Binary values in ModeShape 3
                    rhauch

                    Jonathan,

                     

                    I think this is a great suggestion, and something that I think we can make work. You are correct in that it's not something we do with the current code, but that can be changed -- and I don't think we're too far from being able to do this.

                     

                    BinaryKey already contains a String, but there is code we'd have to change that assumes that's the SHA-1 of the binary value. So before we talk about that, let's first go over how ModeShape clients would use this feature.

                    API

                    Does the JCR client ever upload the original binary content through JCR, or does the binary content "appear" in the binary store (after being added a different way)? If the answer is that JCR clients should be able to do this, then they'd have to supply the binary key. Is that acceptable? Desirable? We could add a new method to our 'org.modeshape.jcr.api.ValueFactory' extension to 'javax.jcr.ValueFactory' where the user could provide the key:

                     

                    Binary createBinary( String key, InputStream content) throws RepositoryException;
                    

                     

                    Should the method fail if the key has already been used, or should it replace the content?

                     

                    Now, obtaining a Binary value given a key is a completely different thing, and we should absolutely provide a way of doing that. Again, we can add a method to our 'org.modeshape.jcr.api.ValueFactory' interface:

                     

                    Binary createBinary( String key) throws RepositoryException;
                    

                     

                    where this would throw an exception if the Binary was not found (using perhaps a new exception type to distinguish it from other repository errors). Note that it is not possible to find out where such a Binary value is used (this is good from an access control perspective), but it does mean that a JCR client can use this to confirm whether the content has been used in the repository (at some point in time). Is that a problem?

                    Multiple Binary Stores

                    We currently only allow a single BinaryStore, but I think we could pretty easily define a chain of them (each would be consulted in order, until one of them succeeded), and expand the format for the BinaryKey string values to allow some sort of scoping. I think we'd generally want the built-in BinaryStore to be first, but it'd only understand BinaryKeys that are SHA-1 hashes. And even if it tried and failed to find a BinaryValue for a SHA-1 hash, the rest of the BinaryStores in the chain can be consulted. Yes, this would change our configuration, but that's okay since we're not yet at 3.0.0.Final.

                     

                    Currently, the BinaryKey class contains a string, but we make assumptions that this string value is a SHA-1 hash. We could probably eliminate those assumptions, and allow the BinaryKey to be any string (as long as at least one BinaryStore understood the format).
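The chain idea could be sketched roughly as follows: each store says whether it understands a key's format, stores are consulted in order, and a miss in one store falls through to the next. All types here (`SketchStore`, `PatternStore`, `ChainedStore`) are hypothetical simplifications for illustration, and using a regular expression to recognise key formats is an assumption; ModeShape's actual BinaryStore interface is different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical, simplified store contract; not ModeShape's BinaryStore.
interface SketchStore {
    boolean understands(String key);
    Optional<byte[]> find(String key);
}

// In-memory store that only recognises keys matching a given format.
final class PatternStore implements SketchStore {
    private final Pattern accepted;
    private final Map<String, byte[]> content = new HashMap<>();

    PatternStore(String regex) {
        this.accepted = Pattern.compile(regex);
    }

    void put(String key, byte[] bytes) {
        if (!understands(key)) {
            throw new IllegalArgumentException("Unsupported key format: " + key);
        }
        content.put(key, bytes.clone());
    }

    @Override
    public boolean understands(String key) {
        return accepted.matcher(key).matches();
    }

    @Override
    public Optional<byte[]> find(String key) {
        return Optional.ofNullable(content.get(key));
    }
}

// Consults each store in order until one of them returns the content.
final class ChainedStore {
    private final List<SketchStore> chain;

    ChainedStore(List<? extends SketchStore> chain) {
        this.chain = new ArrayList<>(chain);
    }

    Optional<byte[]> find(String key) {
        for (SketchStore store : chain) {
            if (!store.understands(key)) {
                continue; // wrong key format for this store
            }
            Optional<byte[]> found = store.find(key);
            if (found.isPresent()) {
                return found;
            }
            // Understood the key but had no content: keep consulting the chain.
        }
        return Optional.empty();
    }
}
```

A SHA-1 store would then be constructed with a pattern such as `[0-9a-f]{40}` and placed first in the list, with stores that accept other formats after it.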

                    • 7. Re: Binary values in ModeShape 3
                      rhauch

                      One more thing. If we are to support this kind of BinaryStore, it would mean relaxing several existing requirements:

                       

                      1) The BinaryKey is no longer purely a function of the content, and

                      2) The Binary content for a given, fixed BinaryKey may change.

                       

                      Not all BinaryStores would relax those requirements (e.g., our SHA-1 based stores would still use keys based on the immutable content). Is this acceptable?

                      • 8. Re: Binary values in ModeShape 3
                        jonathandfields

                        It would be wonderful if Modeshape provided this flexibility. My thoughts on your questions:

                         

                        Does the JCR client ever upload the original binary content through JCR, or does the binary content "appear" in the binary store (after being added a different way)?

                         

                        I think that there could be use cases for both.  I think that providing the binary key is desirable when uploading. That way, you can create a Binary from within JCR, but control its underlying name, so that it can be accessible outside of JCR, since the name can be made meaningful to another app.

                         

                        As far as what to do if the key has already been used.... That could either be a decision that the BinaryStore makes and is configurable; or, the createBinary() method could have an optional third argument that specifies whether to overwrite, truncate, or throw an exception.
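The "optional third argument" variant might look something like the sketch below. The enum `OnExistingKey`, the class `PolicyStore`, and its methods are hypothetical names invented for illustration, not a proposed ModeShape API, and only two behaviours (overwrite and fail) are modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical policy for a key that is already in use.
enum OnExistingKey { OVERWRITE, FAIL }

// Hypothetical in-memory store; stands in for a real BinaryStore.
final class PolicyStore {
    private final Map<String, byte[]> content = new HashMap<>();

    // Stores the bytes under the caller-supplied key, honouring the policy.
    void store(String key, byte[] bytes, OnExistingKey policy) {
        if (policy == OnExistingKey.FAIL && content.containsKey(key)) {
            throw new IllegalStateException("Binary key already in use: " + key);
        }
        content.put(key, bytes.clone());
    }

    // Returns a copy of the stored bytes, or null if the key is unknown.
    byte[] read(String key) {
        byte[] bytes = content.get(key);
        return bytes == null ? null : bytes.clone();
    }
}
```

The configurable alternative mentioned above would simply move the policy from a method argument to a field set when the store is constructed.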

                         

                        Equally important to being able to specify the key for a Binary, is to obtain the key from a Binary.  For example, I might have a Binary with key "http://server/path/to/image.jpeg"  I can then use that key as a URL to display the image in a web app. Otherwise, to display that image, I would need to write a servlet that sends the Binary data over HTTP.

                        • 9. Re: Binary values in ModeShape 3
                          bwallis42

                          Multiple Binary Stores

                          We currently only allow a single BinaryStore, but I think we could pretty easily define a chain of them (each would be consulted in order, until one of them succeeded), and expand the format for the BinaryKey string values to allow some sort of scoping. I think we'd generally want the built-in BinaryStore to be first, but it'd only understand BinaryKeys that are SHA-1 hashes. And even if it tried and failed to find a BinaryValue for a SHA-1 hash, the rest of the BinaryStores in the chain can be consulted. Yes, this would change our configuration, but that's okay since we're not yet at 3.0.0.Final.

                           

                          I've been watching this discussion with interest.

                           

                          The point above about only having a single binary store, can you expand on that? I absolutely need multiple binary stores so I can partition the storage of a large amount of data across multiple storage locations that may have different performance characteristics. This has been discussed before in the thread about federation requirements.

                           

                          thanks,

                          • 10. Re: Binary values in ModeShape 3
                            rhauch

                            The point above about only having a single binary store, can you expand on that? I absolutely need multiple binary stores so I can partition the storage of a large amount of data across multiple storage locations that may have different performance characteristics.

                            Currently, a ModeShape repository uses a single BinaryStore instance. Now, that BinaryStore implementation can store the binary content however it wants, including storing it on multiple machines based upon whatever criteria. We currently have several BinaryStore implementations (see our initial documentation), and support using custom BinaryStores.

                             

                            What we've been talking about in the past few posts is for a single repository to be able to use a chain of multiple BinaryStore instances. Each BinaryStore can still do whatever it wants, but the ability to have separate BinaryStore instances would likely mean that different instances can be configured to do different things.

                             

                            Perhaps the one idea that might have the biggest impact, however, is changing the BinaryKey from effectively only SHA-1s to arbitrary formats, and to expose the keys to JCR clients. This allows a ModeShape installation to give the clients some control over where the binary content is stored.

                             

                            This has been discussed before in the thread about federation requirements.

                             

                            I presume you're talking about this thread. Now that I've re-read your requirements, it does sound like you'd benefit from multiple (or even a custom) BinaryStore to persist the larger data the way you want. Would the improved BinaryStore capabilities be the complete solution for your federation use case, or do you still need to control where the regular (non-Binary value) content is stored, too?

                            • 11. Re: Binary values in ModeShape 3
                              rhauch

                              Since we agree that this is a useful feature, I've created a feature request in JIRA (MODE-1452) to support multiple BinaryStores. We do need to further identify the requirements and behaviors, however, and I think this thread is the perfect place to do that.

                              • 12. Re: Binary values in ModeShape 3
                                rhauch

                                Jonathan Fields wrote:

                                 

                                It would be wonderful if Modeshape provided this flexibility. My thoughts on your questions:

                                 

                                Does the JCR client ever upload the original binary content through JCR, or does the binary content "appear" in the binary store (after being added a different way)?

                                 

                                I think that there could be use cases for both.  I think that providing the binary key is desirable when uploading. That way, you can create a Binary from within JCR, but control its underlying name, so that it can be accessible outside of JCR, since the name can be made meaningful to another app.

                                Agreed. And a BinaryStore implementation can probably ignore a key if it doesn't know the format. Perhaps our built-in SHA-1 based BinaryStore implementations might accept client-supplied keys, but only after verifying them (by computing the SHA-1 themselves).
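Verifying a client-supplied key before accepting it could be as simple as recomputing the hash and comparing. This is only a sketch: `VerifiedKeySketch` and `acceptKey` are hypothetical names, and rejecting a mismatch with an exception is an assumption (a store could equally ignore the claimed key and use the computed one).

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of ModeShape.
final class VerifiedKeySketch {

    // SHA-1 of the content as 40 lowercase hex characters (zero-padded).
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            return String.format("%040x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the key to store under, or throws if the client's claim is wrong.
    static String acceptKey(String claimedKey, byte[] content) {
        String actual = sha1Hex(content);
        if (!actual.equalsIgnoreCase(claimedKey)) {
            throw new IllegalArgumentException(
                "Claimed key " + claimedKey + " does not match content hash " + actual);
        }
        return actual;
    }
}
```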

                                 

                                I'm also trying to think of any holes or problems that this might cause.

                                 

                                As far as what to do if the key has already been used.... That could either be a decision that the BinaryStore makes and is configurable; or, the createBinary() method could have an optional third argument that specifies whether to overwrite, truncate, or throw an exception.

                                Agreed. Tho I'm not sure what "truncate" means.

                                 

                                Equally important to being able to specify the key for a Binary, is to obtain the key from a Binary.  For example, I might have a Binary with key "http://server/path/to/image.jpeg"  I can then use that key as a URL to display the image in a web app. Otherwise, to display that image, I would need to write a servlet that sends the Binary data over HTTP.

                                 

                                Agreed. We'd probably expose this in our 'org.modeshape.jcr.api.Binary' extension to 'javax.jcr.Binary'.

                                • 13. Re: Binary values in ModeShape 3
                                  jonathandfields

                                  Overwrite and truncate mean the same thing. Typing a bit too fast....

                                  • 14. Re: Binary values in ModeShape 3
                                    jonathandfields

                                    I am wondering if the key should explicitly specify the BinaryStore instance  along with the binary object ID within that instance to avoid ambiguity. Something like:

                                     

                                        BinaryKey key = new BinaryKey("store1", "id1");  // "store1" is the name of the store from the config file

                                        Binary b = factory.createBinary(key);

                                        assert b.getKey().getStore().equals("store1");

                                        assert b.getKey().getId().equals("id1");

                                        assert b.getKey().toString().equals("store1:id1"); // or something similar

                                     

                                    That way, if "id1" is a valid id in more than one binary store, there is no ambiguity.
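A store-scoped key along these lines might be sketched as below. `ScopedKey` is a hypothetical stand-in for the proposed two-argument BinaryKey, and both the `store:id` string form and the `parse` method are assumptions. Splitting at the first colon means the id itself may contain colons, so a URL can serve as an id.

```java
import java.util.Objects;

// Hypothetical stand-in for a BinaryKey that names its store explicitly.
final class ScopedKey {
    private final String store;
    private final String id;

    ScopedKey(String store, String id) {
        if (store.isEmpty() || store.indexOf(':') >= 0) {
            throw new IllegalArgumentException("Invalid store name: " + store);
        }
        this.store = store;
        this.id = Objects.requireNonNull(id);
    }

    // Parses "store:id", splitting at the first colon only.
    static ScopedKey parse(String text) {
        int separator = text.indexOf(':');
        if (separator <= 0) {
            throw new IllegalArgumentException("Expected store:id but got: " + text);
        }
        return new ScopedKey(text.substring(0, separator), text.substring(separator + 1));
    }

    String getStore() { return store; }

    String getId() { return id; }

    @Override
    public String toString() { return store + ":" + id; }

    @Override
    public boolean equals(Object other) {
        return other instanceof ScopedKey
            && ((ScopedKey) other).store.equals(store)
            && ((ScopedKey) other).id.equals(id);
    }

    @Override
    public int hashCode() { return Objects.hash(store, id); }
}
```

With this shape, the same id in two different stores produces two unequal keys, which is the ambiguity the scoping is meant to remove.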
