5 Replies Latest reply on Jan 17, 2012 10:30 PM by rareddy

    Infinispan Connector

    j3hntan

      Hi, has anyone got any work going on an Infinispan connector for Teiid?

      I was thinking of storing the data so that each key/value entry in Infinispan uses the row ID as the key and the entire row as the value.

        • 1. Re: Infinispan Connector
          rareddy

          None that we know of. We have had some discussions about using Infinispan as an internal cache that could be used for materialized-view caching and result set caching. We need features like bulk data transfer to implement this layer correctly.

           

          Infinispan as a source has somewhat limited use cases: the data is stored in a custom, unstructured fashion and is not backed by a schema, so writing a generic translator becomes a challenge. If you want to use this translator as a relational data backing store, there may be some options using the Infinispan JPA/Hibernate interfaces.

           

          What is your use case? Based on that, maybe we can invite others to participate and suggest interesting solutions.

           

          Ramesh..

          • 2. Re: Infinispan Connector
            j3hntan

            First, let me say we like Teiid's ease of use and quick start-up. In our environment, we are looking at revamping our custom-developed central data store, which provides low-latency data access to the large number of relational data sources underneath it. We did a POC on Teiid and found (I should not call it a limitation, as it is not one) that developers need to be very careful when querying "one common database": a trivial query can perform poorly because the tables actually come from different databases, so the buffer works very hard.

             

            I recently saw Infinispan positioned as a data grid layer that provides features such as write-through and write-behind (to the databases, in our scenario). If we put our data in Infinispan and let Teiid provide natural access for applications, Teiid would then query a single data source (provided we can make a single Teiid query read from one Infinispan instance). We want to test this scenario. We are fully aware of the key/value nature of Infinispan. I was thinking of having region, key, and value mimic table, ID, and the entire row. Of course, we still need to think about how to organise this data store so it can be mapped onto some sort of schema.
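
To make the region/key/value idea concrete, here is a minimal sketch of the layout in plain Python. It does not use the Infinispan or Teiid APIs; the `KeyValueGrid` class, the `customer` region, and the column names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

Row = Dict[str, Any]


class KeyValueGrid:
    """Hypothetical stand-in for a data grid: one dict per region (table)."""

    def __init__(self) -> None:
        self._regions: Dict[str, Dict[Any, Row]] = {}

    def put(self, region: str, key: Any, row: Row) -> None:
        # region ~ table, key ~ row ID, value ~ the entire row
        self._regions.setdefault(region, {})[key] = dict(row)

    def get(self, region: str, key: Any) -> Optional[Row]:
        # A lookup by row ID is a single key read.
        return self._regions.get(region, {}).get(key)

    def scan(
        self, region: str, predicate: Callable[[Row], bool] = lambda row: True
    ) -> Iterator[Tuple[Any, Row]]:
        # Criteria on non-key columns need a scan over the whole region.
        for key, row in self._regions.get(region, {}).items():
            if predicate(row):
                yield key, row


grid = KeyValueGrid()
grid.put("customer", 1, {"id": 1, "name": "Ann", "country": "SG"})
grid.put("customer", 2, {"id": 2, "name": "Bob", "country": "US"})

by_key = grid.get("customer", 1)
sg_rows = [row for _, row in grid.scan("customer", lambda r: r["country"] == "SG")]
```

The sketch also shows where the schema question bites: only lookups by row ID are cheap, and anything else has to walk the region unless some index is kept alongside it.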

             

            Does such a scenario even make sense?

            • 3. Re: Infinispan Connector
              rareddy

              John,

               

              A further question about the use case: do you need data integration across all those relational data sources you are fronting with Teiid or your custom-developed data source? Also, are you trying to keep a copy of the source databases in your caching layer, or are you trying to cache the results of user queries? I am trying to see whether Teiid is a good fit. From the above, it sounds like you would very much like to use Teiid as the JDBC access mechanism (which is a valid use case in some scenarios).

               

              As for the poor query performance, it really depends on the source of the data and the capabilities that source supports. If your user queries go against a single source at a time, the user query is more or less pushed to the source as is. There is still some performance hit compared to going directly to the source, because you add one network hop and an extra serialization/deserialization step. When a user query spans two or more sources, the capabilities of the sources, the supplied criteria, costing information, and join semantics play a big role in making the query execute faster. You can look at the query plan to see how Teiid is going to execute the query.
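
As a rough illustration of why multi-source queries cost more, here is a toy sketch of a join performed in a middle tier over rows fetched from two separate sources. It is not Teiid's planner or its join implementation; the `fetch` and `hash_join` helpers and the sample data are hypothetical, and the predicate passed to `fetch` merely stands in for criteria that get pushed to a source.

```python
from collections import defaultdict

# Two hypothetical sources, each standing in for a separate database.
orders_source = [
    {"order_id": 10, "customer_id": 1, "total": 50},
    {"order_id": 11, "customer_id": 2, "total": 20},
    {"order_id": 12, "customer_id": 1, "total": 70},
    {"order_id": 13, "customer_id": 3, "total": 5},
]
customers_source = [
    {"id": 1, "name": "Ann", "country": "SG"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Cy", "country": "SG"},
]


def fetch(source, predicate=None):
    """Simulate a source query; the predicate models criteria pushed to the source."""
    return [row for row in source if predicate is None or predicate(row)]


def hash_join(left, right, left_key, right_key):
    """Join two row lists in the engine by building a hash index on the right side."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


# The country filter is applied at the customer source, so fewer rows travel.
sg_customers = fetch(customers_source, lambda c: c["country"] == "SG")
# No usable criteria for orders here, so every order row is fetched.
all_orders = fetch(orders_source)

joined = hash_join(all_orders, sg_customers, "customer_id", "id")
joined_order_ids = sorted(row["order_id"] for row in joined)
```

In this toy setup the engine has to pull every order row and hold the join index in memory, which is the kind of work that disappears when the whole query can be pushed to one source.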

               

              Another thing to consider is who populates Infinispan. If your application inserts directly, the data may be limited to one database. Through Teiid it can come from multiple sources; however, in that case you need to use the materialization feature. As I said before, we do not currently support internal materialization into Infinispan (which would have been the best-case scenario for you). With external materialization, your application is responsible for the cache loads and refreshes; the Teiid engine does not perform these automatically.
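
To show what "the application is responsible for loads and refreshes" can look like, here is a minimal sketch of an application-managed cache that rebuilds a snapshot and swaps it in. It is not a Teiid or Infinispan API; the `MaterializedCache` class and the `loader` callable are hypothetical, and a real setup would read from the actual sources and run the refresh on a schedule.

```python
import threading


class MaterializedCache:
    """Hypothetical application-managed copy of source data."""

    def __init__(self, loader):
        self._loader = loader  # assumed callable returning {key: row} from the sources
        self._snapshot = {}
        self._lock = threading.Lock()

    def refresh(self):
        # Build the new copy off to the side, then swap it in under the lock
        # so readers never observe a partially loaded cache.
        fresh = dict(self._loader())
        with self._lock:
            self._snapshot = fresh

    def get(self, key):
        with self._lock:
            return self._snapshot.get(key)


source_rows = {1: "Ann"}
cache = MaterializedCache(lambda: source_rows)
cache.refresh()

source_rows[2] = "Bob"  # the source changes after the load
stale = cache.get(2)    # the cache does not see it until the next refresh
cache.refresh()
fresh = cache.get(2)
```

The gap between `stale` and `fresh` is the staleness window the application has to manage itself under this approach.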

               

              You may also want to try Teiid's internal materialization, which is currently based on disk + memory and does not involve Infinispan. We have tested very large sources with this strategy with acceptable speed.

               

              The Infinispan route can also make sense, as you describe it.

              • 4. Re: Infinispan Connector
                j3hntan

                In our scenario, we have a number of data sources (more than 40, from acquisitions and such, for some of the systems). Many of our queries span 4-5 data sources and, as expected, the performance does not look good and the CPU runs wild.

                 

                What we are thinking of is to "virtualize" the data by loading it into Infinispan and querying it through Teiid. Since the query plan would then be against a single source, hopefully the performance will be great.

                Any put/remove operation will run in a transaction scope spanning the cache store and the underlying data source.
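
A minimal sketch of that put/remove idea, assuming a plain dict as the cache and an in-memory SQLite database as the underlying data source: the cache is only touched after the database transaction commits. This is a simplification and not a true distributed (XA) transaction; the `customer` table and the `put`/`remove` helpers are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the underlying data source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cache = {}  # stands in for the cache store


def put(key, name):
    """Write to the database first; update the cache only if the commit succeeds."""
    try:
        with db:  # commits on success, rolls back if an exception is raised
            db.execute(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                (key, name),
            )
    except sqlite3.Error:
        return False  # database rolled back, cache left untouched
    cache[key] = {"id": key, "name": name}
    return True


def remove(key):
    """Delete from the database first; evict from the cache only after the commit."""
    try:
        with db:
            db.execute("DELETE FROM customer WHERE id = ?", (key,))
    except sqlite3.Error:
        return False
    cache.pop(key, None)
    return True


ok = put(1, "Ann")
rejected = put(2, None)  # violates NOT NULL, so the write is rolled back
row_count = db.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Ordering the database write before the cache update keeps the cache from holding rows the data source rejected, but a crash between the two steps can still leave them out of sync, which is what a real transaction manager would have to cover.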

                 

                Of course, we are seeing different query plans for different positions of the joins in Teiid. However, it takes developers a lot more effort, and an understanding of the internal workings of Teiid query plans, to refactor their code, which can be expensive.

                 

                Unless there are areas of Teiid that you can enlighten us on.

                • 5. Re: Infinispan Connector
                  rareddy

                  John,

                  > In our scenario, we have a number of data sources (more than 40, from acquisitions and such, for some of the systems). Many of our queries span 4-5 data sources and, as expected, the performance does not look good and the CPU runs wild.

                  So you do need a data integration engine. Is this performance with your code or with Teiid? Teiid should give performance very comparable to your custom code.

                   

                  > What we are thinking of is to "virtualize" the data by loading it into Infinispan and querying it through Teiid. Since the query plan would then be against a single source, hopefully the performance will be great.

                  > Any put/remove operation will run in a transaction scope spanning the cache store and the underlying data source.

                  So this is a kind of data mart solution you are going after. The issue is that if you want Teiid to handle the updates to the cache and the underlying sources, only internal materialization has that capability. As I mentioned above, we do not have Infinispan integration there. With external materialization, Teiid does not do the updates.

                   

                  > Of course, we are seeing different query plans for different positions of the joins in Teiid. However, it takes developers a lot more effort, and an understanding of the internal workings of Teiid query plans, to refactor their code, which can be expensive.

                  These are defined inside the VDB, when you define the view layer for the VDB. So query planning is exposed only to the VDB developers, not to the end-user application developers. I suggest you take this route first for at least a few queries and see how it performs. Then try Teiid's built-in internal materialization feature. If those two do not work, you can explore the Infinispan options.

                   

                  Ramesh..