11 Replies Latest reply on Dec 21, 2005 10:04 AM by manik

    CacheLoader: exists() before get() inefficient - solution?

    antonyblakey

      I have a CacheLoader that I am using to allow a TreeCache to act as the primary interface to a DAV/JDBC data model. Each access to the backend can be quite expensive. I note that a CacheLoader.get() is always preceeded by a CacheLoader.exists(), which seems redundant to me and has the unfortunate effect of doubling the backend load. Am I missing something here?

        • 1. Re: CacheLoader: exists() before get() inefficient - solutio
          belaban

          You're right. I have created a JIRA issue (http://jira.jboss.com/jira/browse/JBCACHE-118) that I already started working on. However, it broke the CacheLoader unit tests, so I commented my changes and moved to issue to 1.2.3. I don't want to retest 1.2.2 extensively, this is just the standalone module plus some minor changes

          • 2. Re: CacheLoader: exists() before get() inefficient - solutio
            manik

            Sorry, just realised this forum already exists - I started a new one on the same subject. I'll cross-post here, pls ignore the new one I created.

            Looking at JBCACHE-118 now.

            There is a reported inefficiency of calling loader.exists() before calling loader.get().

            We could call loader.get() directly, but since only attribs are stored in the cache loader, loader.get() returning null could mean the node does not exist or that the node does exist but doesn't contain any data. Which means that if loader.get() returns a null, we would THEN have to call loader.exist() to test whether the node exists in the first place.

            1) The most efficient thing I can think of is to mandate in the API that loader.get() ALWAYS returns a Map of data, an empty map if there is no data. A null ALWAYS means the node doesn't exist. But this may involve patching our CacheLoaders (easy) and clients patching their custom CacheLoaders (harder).

            The next best option depends on use case.

            2) If most nodes contain data then it is best to do a loader.get() followed by (maybe) a loader.exists().

            3) If most nodes are empty, then the current implementation of loader.exists() followed by loader.get() is better.

            I have both implementations with me at the moment, with a new test case (CacheLoaderMethodCallCounterTest) that counts how many times each cache loader method is called and this is how I determined the above.

            Does anyone have any idea of most common use cases? I hardly think many people would have caches of many nodes without any data in them (case 3 above) while I think the case of having nodes with data (case 2 above) is more common. I don't suppose there is any chance we could "enforce" a change in API semantics (case 1, most efficient), is there? :)

            • 3. Re: CacheLoader: exists() before get() inefficient - solutio
              belaban

              We're talking about 2 get() methods in CacheLoader:
              #1 Object get(Fqn name, Object key) throws Exception;
              #2 Map get(Fqn name) throws Exception;

              You're only referring to the 2nd case.

              #1
              If you return null, this could mean:
              - key not associated with a value or
              - node doesn't exist

              #2
              (same as above)

              I think the cleanest way is to add an exception to the get() methods, e.g. NodeNotExistsException. I'm thinking about whether it would make sense to make this a runtime exception, so we don't have to change the signature, but then again, we should make this clear on the API level.
              Of course this would break existing CacheLoader implementations, so we'd have to communicate this clearly.
              Thoughts ?

              • 4. Re: CacheLoader: exists() before get() inefficient - solutio
                belaban


                Manik and I just had a phone call, and here's the outcome (comments welcome!):
                - we will remove the get(node, key) method from the CacheLoader interface. It is not used and I don't think it ever will be used, because maintaining which keys have been loaded, and which not, is too complex
                - For the Map get(node) method, either
                #1 return null if node not found and an empty hashmap if no attributes are present

                or

                #2 throw a NodeNotExists exception

                #1 will silently change semantics, and all CacheLoader impls have to change to conform to the new semantics
                #2 will force existing CacheLoader implementations to be changed

                I'm open for suggestions on which solution people like more. I'm slightly leaning towards #2

                • 5. Re: CacheLoader: exists() before get() inefficient - solutio
                  manik

                  Hi,

                  Method #1 - loader.get(fqn, key) - is not used in the cache loader interceptors and should be removed from the CacheLoader interface as it just adds unnecessary complexity.

                  I like having loader.get(fqn) to throw a NodeNotExistsException - forces implementations to be revisited rather than 'silently' failing because of changed semantics.

                  The other option - changing the semantics of loader.get() so it returns a null only if the node doesn't exist and an empty Map if the load does but with no attribs - may result in 'silent' bugs because people haven't updated their cacheloader impl's.

                  The downside of using an exception is the relative overhead of using an exception for a fairly common situation (trying to load a node that doesn't exist) so it isn't really an "exceptional" circumstance...

                  Naturally such changes will be clearly documented (Javadocs, user guides and FAQs, wikis, even a blog about it).

                  Preferences?

                  • 6. Re: CacheLoader: exists() before get() inefficient - solutio
                    belaban

                    I don't think there is any overhead when throwing an exception compared to returning a value.
                    Hmm, this could easily be microbenched though... Volunteers ? :-)

                    • 7. Re: CacheLoader: exists() before get() inefficient - solutio
                      manik

                      Throwing an exception will NOT force people to revisit their cache loader impls - just checked the codebase, CacheLoader.get() throws Exception.

                      So it will be a 'silent' semantic change either way, which is a shame. I could log warnings when this semantic change could be a potential problem, alerting the user to ensure that the cache loader has been updated accordingly...

                      • 8. Re: CacheLoader: exists() before get() inefficient - solutio
                        belaban

                        Yes, but the exception was designed to be thrown when the cache loader implementation encountered a problem, e.g. if it was not able to connect to a DB in the case of JDBCCacheLoader.
                        I think we should differentiate between an exception caused by an internal failure (a 'real' exception), and an exception thrown when a node is not found.
                        If we go for reuse of the existing Exception in the signature, then I'd rather use the proposed "null return value" scheme.

                        • 9. Re: CacheLoader: exists() before get() inefficient - solutio
                          manik

                          Precisely. So having NodeNotFoundException may cause problems (as it is a subclass of Exception) unless CacheLoader authors are careful about what they throw.

                          I still think not finding a node is not "exceptional" and shouldn't be treated as an Exception.

                          Null value return then?

                          • 10. Re: CacheLoader: exists() before get() inefficient - solutio
                            belaban

                            Looks like it. Yuk ... :-)

                            • 11. Re: CacheLoader: exists() before get() inefficient - solutio
                              manik

                              Tell me about it. Will have to be heavily documented and support will need to be educated on it so they can filter cases that will inevitably come in ... :S