6 Replies Latest reply on May 23, 2012 1:25 PM by dex80526

What is the correct way to get all keySet of a cache

dex80526 May 21, 2012 1:03 PM

On the Cache interface, the methods Set<K> keySet() and values() both carry this note:

"This method should only be used for testing or debugging purposes such as to verify that the cache contains all the keys (values) entered. Any other use involving execution of this method on a production system is not recommended."

However, I do not see any alternative way to get the key set of a cache. What is the correct way to do this?

Note: I am using replication mode here.

1. Re: What is the correct way to get all keySet of a cache

mgencur May 21, 2012 3:39 PM (in response to dex80526)

Getting all keys with keySet() is not recommended because it is a dangerous method. It is not atomic and could result in inconsistencies, i.e. you could get an incomplete list if another thread on another node in the cluster adds new keys while you are getting the key set.

The proper solution is IMO to store the key set as a separate cache entry:

Set<String> keys = new HashSet<String>();
keys.add("key1");
...
keys.add("keyX");

cache.put(KNOWN_KEYS, keys);

Then you can atomically get the whole set of keys.
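
A minimal sketch of how such an index entry could be kept up to date on every write. A plain Map stands in for the cache so the example runs on its own; the KnownKeysIndex class, the putTracked and knownKeys helpers, and the "__known_keys__" entry name are all hypothetical and not part of Infinispan.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any cache library.
class KnownKeysIndex {
    // Assumed name for the reserved entry that holds the key index.
    static final String KNOWN_KEYS = "__known_keys__";

    // Stores the value, then adds its key to the index entry.
    @SuppressWarnings("unchecked")
    static void putTracked(Map<String, Object> cache, String key, Object value) {
        cache.put(key, value);
        // merge() swaps in a new immutable copy of the index, so readers never
        // observe a half-updated set. On a ConcurrentHashMap this step is atomic;
        // a clustered cache would need a transaction or lock (assumption).
        cache.merge(KNOWN_KEYS, Collections.singleton(key), (current, added) -> {
            Set<String> next = new HashSet<>((Set<String>) current);
            next.addAll((Set<String>) added);
            return Collections.unmodifiableSet(next);
        });
    }

    // Reads the whole key index with a single get.
    @SuppressWarnings("unchecked")
    static Set<String> knownKeys(Map<String, Object> cache) {
        Object index = cache.get(KNOWN_KEYS);
        return index == null ? Collections.<String>emptySet() : (Set<String>) index;
    }
}
```

Note that every tracked write rewrites the whole index entry, so in a replicated cache the full set would be shipped to the other nodes on each change.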
2. Re: What is the correct way to get all keySet of a cache

dex80526 May 21, 2012 4:01 PM (in response to mgencur)

Thanks for the suggestion and the insight.

The suggested approach does not scale. In my case, the number of keys could be up to 100K, so this would result in extra replication across nodes.
I am aware of the potential inconsistencies and the lack of atomicity. That is one of the reasons I asked earlier about having a cache-wide lock or making a cache read-only.

In my case, the cache is configured for synchronous replication, and I use a cluster-wide lock token to ensure that no cache items are added or deleted while I call cache.keySet().
Are there any other reasons why the keySet() method on a cache is not recommended?

The comment in the source code does not state why the method is not recommended for production.

It would then seem that the size() method is not reliable either.

It seems to me that operations such as keySet(), values() or size() on a cache are so fundamental that they have to be supported.
3. Re: What is the correct way to get all keySet of a cache

kodadma May 21, 2012 6:06 PM (in response to dex80526)

Some caches can have millions or even billions of keys, and this API is not suitable for very large caches, imo (it will take too long and too much RAM to collect all keys from the cluster), but you can try, of course. I think that instead of returning a Set<>, this API call should return an iterator of keys and should allow a Filter object to be specified as well.
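
A rough sketch of what such an iterator-with-filter call could look like. This is a hypothetical API shape, not an existing Infinispan method: the FilteredKeys class and filteredKeys method are invented for illustration, a plain Map stands in for the cache, and java.util.function.Predicate plays the role of the Filter object.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical API shape; not an existing cache method.
class FilteredKeys {
    // Returns a lazy iterator over the keys accepted by the filter.
    // Keys are tested one at a time as the caller advances the iterator,
    // so no second collection holding all matching keys is built.
    static <K> Iterator<K> filteredKeys(Map<K, ?> cache, Predicate<? super K> filter) {
        return cache.keySet().stream().filter(filter).iterator();
    }
}
```

A caller would then walk the result with hasNext() and next(), for example with a filter such as k -> k.startsWith("user:") to visit only keys with that prefix.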
4. Re: What is the correct way to get all keySet of a cache

dex80526 May 21, 2012 8:19 PM (in response to kodadma)

I agree it could take a long time to get all keys. Memory is a different issue. The question is how a user can reliably get the key set of a cache.

5. Re: What is the correct way to get all keySet of a cache

galder.zamarreno May 23, 2012 3:33 AM (in response to dex80526)

With replication mode, keySet() is not problematic; you can use it anytime really.

With distribution mode though, it only gives you a local view of the keys present in the cache. IOW, it doesn't go and try to find all keys in the distributed cache, since that could be lengthy.
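
The difference between the two modes can be illustrated with a toy model in which each node is just a map of the entries it stores locally. This is only a sketch under simplifying assumptions: the LocalViewDemo class and its methods are hypothetical, and the one-owner-per-key hashing below is much cruder than what a real distributed cache does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: each "node" is a map of the entries it holds locally.
class LocalViewDemo {
    // Replication: every node stores a copy of every entry.
    static List<Map<String, String>> replicate(Map<String, String> data, int nodes) {
        List<Map<String, String>> cluster = new ArrayList<>();
        for (int i = 0; i < nodes; i++) cluster.add(new HashMap<>(data));
        return cluster;
    }

    // Distribution, simplified to a single owner per key chosen by hash code.
    static List<Map<String, String>> distribute(Map<String, String> data, int nodes) {
        List<Map<String, String>> cluster = new ArrayList<>();
        for (int i = 0; i < nodes; i++) cluster.add(new HashMap<>());
        for (Map.Entry<String, String> e : data.entrySet()) {
            int owner = Math.floorMod(e.getKey().hashCode(), nodes);
            cluster.get(owner).put(e.getKey(), e.getValue());
        }
        return cluster;
    }

    // The cluster-wide key set is the union of every node's local key set.
    static Set<String> allKeys(List<Map<String, String>> cluster) {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> node : cluster) keys.addAll(node.keySet());
        return keys;
    }
}
```

In this model, any single node's keySet() equals the full key set under replicate(), while under distribute() the full set is only recovered by visiting every node, which is the potentially lengthy step described above.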
6. Re: What is the correct way to get all keySet of a cache

dex80526 May 23, 2012 1:25 PM (in response to galder.zamarreno)

Galder: that's what I wanted to get confirmed. Thanks.