Version 4

    This is tracked  by ISPN-3351.

    Purpose

    Recently it has been more and more a requirement from the users to be able to:

    - controlled shutdown a cluster and flush data to persistent storage

    - restart the whole cluster and preload the data from the storage (no data loss)

    Design

    Here's a suggested design to achieve this functionality:

    Controlled shutdown

      - when a *ClusterShutdown* operation (JMX) is triggered:

      - the coordinator doesn't initiate any rebalance on nodes leaving

      - each node:

        - stops accepting client requests (in future we might get smarter and have a timeout for ongoing transactions)

        - flushes whatever state it has in memory to the cache store: e.g. if async store waits for it to stop, if SingleFileCacheStore is used directly forces a sync

        - writes a "successfulShutdown" flag to the *LocallyRegistry*

        - process exits

    Restart

    - each individual node is restarted with a "ClusterRestart" option indicating they want to restore cluster state

    - each node:

       - reads the "successfulShutdown" and "shutDownView" information from the *LocalRegistry* and sends it to the coordinator

       - if the node doesn't have have the "successfulShutdown" flag present it wipes out the cache store assuming the data is stale

       - the coordinator doesn't start a rebalance on nodes joining, but:

         - the coordinator waits for all nodes from the previously stored shutDownView to join again - ignoring new nodes

         - when each of the expected nodes have rejoined the coordinator informs the cluster to preload

         - after each nodes finished preloading, the coordinator install the "shutDownView" in all the nodes in the cluster (no state transfer yet if there are new nodes as well)

          - enable new nodes to join (enable normal state transfer)

    - in order to support the case in which not all nodes are being restarted the following JMX operations are available

       - restartStatus - provides the following information

                        - list of the expected cluster nodes: "shutDownView"

                        - current cluster state: how many nodes have joined up to this moment, which ones are we still waiting on

                        - noDataLost: a boolean indicating weather enough nodes joined so that no state is lost (e.g. numOwners = 3, if  shutDownView.size -2 nodes have joined; this flag could become smarter in future by looking at the actual segment distribution)

       - forceRestart - allows an explicit restart even if not all nodes from the "shutDownView" are present

            - the nodes that have currently joined preload the state from cache store

            - after all have preloaded a rebalance is triggered in which old topology == "shutDownView" and thenew topology == current existing topology based on the available nodes

    LocalRegistry

    Implemented either as a file on the local disk (backed by the SingleFileCacheStore)

      <global>
         <localRegistry store="file:/path/to/my/local/regitry"/>
      <global>
    

     

    Or in environments where a local store is not available (e.g. clouds): 

    <global>
       <localRegistry store="cahe:/myCache/myCacheStore"/>
    <global>
    ...
    <namedCache name="myCache">
      <loaders>
        <loader name="myCacheStore" class="...">
        ...
        </loader>
      </loaders>
    </namedCache>