4 Replies Latest reply on Dec 2, 2015 8:25 PM by jessie_jie_xie

    Binary store in replicated mode

    jessie_jie_xie

      Hi Guys,

       

      I am setting up a ModeShape (4.5.0) environment that uses Infinispan in replicated mode (2 nodes).


      After uploading files, I can find RepositoryData.dat under both

      C:\modeshape\Repository\node1\binaries\data and C:\modeshape\Repository\node2\binaries\data with the same file size.

      So, the binary files are successfully synchronized.


      However, a table named "ispn_string_table_repositorydata" was also created in the msnode database. This database is shared between the two nodes and also holds the table ispn_string_table_repository for the repository structure data. The data size and record count in ispn_string_table_repositorydata increase with each newly uploaded file.


      I am confused about which binary store is used in this case: ispn_string_table_repositorydata or C:\modeshape\Repository\node[n]\binaries\data?

      In this case, I would prefer separate binary storage in a different location for each node over centralized, shared binary storage.

       

       

      Would you please take some time to review my ModeShape and Infinispan configuration below?

      Any advice/suggestion?


      Many Thanks,


      Jessie


       

      {

          "name" : "Repository",

          "jndiName": null,

          "transactionMode" : "auto",

          "monitoring" : {

              "enabled" : true,

          },

          "workspaces" : {

              "predefined" : ["EA"],

              "default" : "default",

              "allowCreation" : true,

          },

          "storage" : {

              "cacheName" : "Repository",

              "cacheConfiguration" : "infinispan_configuration.xml",

              "binaryStorage" : {

                  "minimumBinarySizeInBytes" : 4096,

                  "minimumStringSize" : 4096,

                  "type" : "cache",

                  "dataCacheName" : "RepositoryData",

                  "metadataCacheName" : "RepositoryMetadata"

              }

          },

      ...

      }

      ======================================================================================

      <?xml version="1.0" encoding="UTF-8"?>

      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                  xsi:schemaLocation="urn:infinispan:config:7.2 http://www.infinispan.org/schemas/infinispan-config-7.2.xsd"

                  xmlns="urn:infinispan:config:7.2">

        <jgroups>

        <stack-file name="ModeShape" path="default-configs/default-jgroups-tcp.xml"/>

        </jgroups>

       

        <cache-container default-cache="Repository">

        <transport stack="ModeShape" node-name="jetty"/>

        <jmx duplicate-domains="true"/>

        <replicated-cache name="Repository" mode="ASYNC" queue-flush-interval="100">

                  <transaction mode="NON_XA" locking="PESSIMISTIC"/>

                  <persistence passivation="false">                

        <string-keyed-jdbc-store xmlns="urn:infinispan:config:store:jdbc:7.2"           

        fetch-state="false"

                              shared="true"

                              preload="false"

                              purge="false">

        <connection-pool driver="com.mysql.jdbc.Driver" 

        connection-url="jdbc:mysql://localhost:3306/msnode?useUnicode=true&amp;characterEncoding=UTF-8"

        username="root"

        password="root"/>

        <string-keyed-table drop-on-exit="false" create-on-start="true" prefix="ISPN_STRING_TABLE">

        <id-column name="ID_COLUMN" type="VARCHAR(255)" />

        <data-column name="DATA_COLUMN" type="LONGBLOB" />

        <timestamp-column name="TIMESTAMP_COLUMN" type="BIGINT" />

        </string-keyed-table>

        </string-keyed-jdbc-store>

                  </persistence>

        </replicated-cache>

        <replicated-cache name="RepositoryData" mode="ASYNC" queue-flush-interval="100">

                  <transaction mode="NON_XA" locking="PESSIMISTIC"/>

                  <persistence passivation="false">                

                      <file-store fetch-state="true"

                                  shared="false"

                                  preload="false"

                                  purge="false"

                                  path="C:\modeshape\Repository\node1\binaries\data"/>

                  </persistence>

        </replicated-cache>

        <replicated-cache name="RepositoryMetadata" mode="ASYNC" queue-flush-interval="100">

                  <transaction mode="NON_XA" locking="PESSIMISTIC"/>

                  <persistence passivation="false">                

                      <file-store fetch-state="true"

                                  shared="false"

                                  preload="false"

                                  purge="false"

                                  path="C:\modeshape\Repository\node1\binaries\metadata"/>

                  </persistence>

        </replicated-cache>

       

          </cache-container>

       

      </infinispan>

        • 1. Re: Binary store in replicated mode
          hchiorean

          There is a distinction between repository data and binary data: repository data is everything JCR plus ModeShape internal metadata, except the byte[] content that makes up the binary values per se. However, in JCR, binary values are always created as properties on nodes, so the only way to store binaries is via JCR nodes/properties.

           

          Whenever you store some binary information in the form of properties on nodes in a JCR repository:

          1. the repository data increases - which in your case is a JDBC backed store under a table with the prefix ISPN_STRING_TABLE (this is because JCR information is stored)
          2. the byte[] content of the binary stream is stored
            1. in the RepositoryData cache (as per your configuration) and
            2. some binary metadata (e.g. SHA1 of the binary stream) in the RepositoryMetadata cache.

          Note that in general you can choose another type of store for binary values (e.g. NFS, database, Mongo etc) not just ISPN, even though you're clustering.

           

          You can read more about binary values in the ModeShape documentation: "Binary values" (ModeShape 4 documentation).

          • 2. Re: Binary store in replicated mode
            jessie_jie_xie

            Hi Horia,

             

            Thanks for the reply!

             

            I can easily understand #2, and it works as configured: the binary data (files, in my case) are stored in different directories according to the Infinispan XML.

             

            What confuses me is #1. My configuration results in 3 tables in the MySQL database.

             

            mysql> show tables;
            +--------------------------------------+
            | Tables_in_msnode                     |
            +--------------------------------------+
            | ispn_string_table_repository         |
            | ispn_string_table_repositorydata     |
            | ispn_string_table_repositorymetadata |
            +--------------------------------------+

             

            After uploading a file of 10,831,023 bytes, the total DATA_COLUMN size of ispn_string_table_repositorydata increased from 0 to 10,831,540, and the number of records from 0 to 1.

            Adding a single file node should not change the pure repository data by that much.

             

            mysql> select SUM(LENGTH(DATA_COLUMN)) from ispn_string_table_repositorydata;
            +--------------------------+
            | SUM(LENGTH(DATA_COLUMN)) |
            +--------------------------+
            |                 10831540 |
            +--------------------------+

             

            mysql> select count(*) from ispn_string_table_repositorydata;
            +----------+
            | count(*) |
            +----------+
            |       11 |
            +----------+

             

            When I repeat the upload, the size of ispn_string_table_repositorydata increases in line with the uploaded file size.

            That's why I suspected that the binary data are stored not only in files as configured, but also in the database.


            But I have found out why it happened: I had set "default-cache" to "Repository" in the Infinispan configuration. After resetting it to "", ispn_string_table_repositorydata and ispn_string_table_repositorymetadata were no longer created in the database. I will close this discussion then.
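
            For reference, a minimal sketch of the change described above, assuming the rest of infinispan_configuration.xml stays exactly as originally posted. A likely explanation (an assumption, not confirmed in this thread) is that in this Infinispan version named caches inherit settings from the default cache, so RepositoryData and RepositoryMetadata picked up the JDBC store of "Repository" in addition to their own file stores.

```xml
<!-- Sketch only: default-cache reset to "" as described in the post. -->
<cache-container default-cache="">
  <transport stack="ModeShape" node-name="jetty"/>
  <jmx duplicate-domains="true"/>

  <!-- The three replicated-cache definitions (Repository, RepositoryData,
       RepositoryMetadata) are assumed to remain unchanged from the
       original configuration. -->
</cache-container>
```

            With this change, only "Repository" should be backed by the shared JDBC store, while the two binary caches keep their per-node file stores.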


            Thank you very much!


            Jessie

            • 3. Re: Binary store in replicated mode
              jessie_jie_xie

              Well Horia, I am considering your suggestion to "choose another type of store for binary values (e.g. NFS, database, Mongo etc) not just ISPN, even though you're clustering."

               

              MongoDB sounds like a good choice for files, but I cannot find any specific configuration for it. It seems the only way to use it is via the "custom" type. The disadvantage is that only the default constructor MongodbBinaryStore() is called, which will fail if a username/password is required for MongoDB.

              Or am I missing some documentation about MongodbBinaryStore?

               

              Thank you very much!

               

              Jessie

              • 4. Re: Binary store in replicated mode
                jessie_jie_xie

                OK, I guess I know how to do it.