6 Replies Latest reply on Jan 27, 2016 12:19 PM by shawkins

    Why is a JOIN between sources so slow?

    werx.io

      Hi,

      We are trying to create a single JDBC source for our applications and create views that will join between two database schemas and/or two JDBC sources..

       

      We hoped that the caching and other features would mean that joins between two schemas or JDBC sources would be faster than materialized views, stored procedures and DB Links we currently use. However, an initial test to replace a regular view (that internally makes calls to stored functions) is 600x as slow.

       

      The unusual thing is that the views themselves, when they don't join across sources, are very effective. To get one of the items from "schema A" -- without the columns from schema B only takes 1 second. And to get the results from schema B directly takes two seconds But when joined, the result takes 4 minutes.

       

      I've trimmed the code below to the minimum, hopefully it shows that I sub-select the ASSIST schema, then join that to the "..UNBILLED..." view which is in a different schema. I presumed that it would run the inner select (once), then the outer join with the results.  But this still takes 4 minutes.

       

      Any thoughts?

      Need actual source code?

       

      Thank you,

      Hank

       

      -- Example query, that takes 4 minutes, but the view it replaces only takes 4 seconds.

       

      CREATE VIEW TEST3_VDB (

      ...

      ) AS SELECT

      ...

      -- Make the sub select query to schema 1:

        FROM  (SELECT

      ...

      FROM ASSIST.ACCPT_RPT_DETAIL ard

      JOIN ASSIST.PROCUREMENT pr

      ...

        AND aribp.SUBTASK_REF_ID = libp.SUBTASK_REF_ID ) lamibp

      -- And join that to the schema 2 query, which is in its own view:

      JOIN LAM_BP_UNBILLED_TO_ORIG_VDB unbilled

      ...

       

       

      -- Original view that calls stored functions for some columns:

      select * from v_lam_accpt_rpt_item_bill_pref where accpt_rpt_item_id = 187518;

        Elapsed Time:  0 hr, 0 min, 4 sec, 116 ms.

      -- Snapshot view replacing above:

      select * from lam_accpt_rpt_item_bill_pref_vdb where accpt_rpt_item_id = 187518;

        Elapsed Time:  0 hr, 6 min, 23 sec, 974 ms.

      -- Snapshot view replacing stored procedures:

      select * from LAM_BP_UNBILLED_TO_ORIG_VDB where subtask_ref_id = 151976;

        Elapsed Time:  0 hr, 0 min, 2 sec, 82 ms.

      -- Snapshot test that joins only schema A:

      select * from TEST2_VDB where accpt_rpt_item_id = 187518;

        Elapsed Time:  0 hr, 0 min, 1 sec, 34 ms.

      -- Snapshot view in subselect join to table_master VDB view:

      select * from TEST3_VDB where accpt_rpt_item_id = 187518

        Elapsed Time:  0 hr, 4 min, 1 sec, 853 ms.

        • 1. Re: Why is a JOIN between sources so slow?
          shawkins

          The first thing we would need to see is the query plan from each of the approaches.  More than likely there is something inefficient about the join strategy chosen in the materialized case that should be easy to resolve with additional statistic/costing values or hints.

          • 2. Re: Why is a JOIN between sources so slow?
            werx.io

            Update: I created a new VDB that imported BOTH schemas into the same JDBC connection with all the elements prefixed. This may not work for our application, since our Hibernate entities may not be prefixing table names, but as a test, this works very well. Perhaps DV is "passing along" the query to JDBC or not creating any intermediates.

             

            The view is nominally slower, but I'm sure a query for many items would be faster, (e.g. item in ('1234','2354'...))

             

            In other words, DV works very effectively when querying to multiple schemas in the same database, but, as I have currently coded, very inefficiently joining between two databases.

             

            -- Existing view (called two times in a row, so it is a bit "primed" as it were):

            select * from v_lam_accpt_rpt_item_bill_pref   where accpt_rpt_item_id = 187518

            Elapsed Time:  0 hr, 0 min, 3 sec, 71 ms.

             

            -- snapshot View defined by joining all tables

            select * from lam_accpt_rpt_item_bill_pref_vdb where accpt_rpt_item_id = 187518

            Elapsed Time:  0 hr, 0 min, 5 sec, 478 ms.

             

            -- View defined in a view that uses a sub-select

            select * from TEST3_VDB where accpt_rpt_item_id = 187518

            Elapsed Time:  0 hr, 0 min, 5 sec, 179 ms.

             

            Cheers,

            Hank

            • 3. Re: Why is a JOIN between sources so slow?
              werx.io

              Thank you for reviewing. Please take a look at the update. In this case, a JDBC connection is available that can access both schemas. In this case, it could be a viable solution, because the snapshot view doesn't expose that it is joining across two schemas. Many stored procedures were created to access a second schema, but hide it from the application. And although slow for single item, I think it would be much faster for a group of items because the join is done once, not for each item through a procedure call.

               

              Likely, the query requires some optimization  -- but is there a way to get a query plan from TEIID ? It seems there is hardly a problem when the query runs to the same database, but we have other use cases where we want to eliminate Oracle database links.

               

              --Hank

              • 4. Re: Why is a JOIN between sources so slow?
                shawkins

                To get query plans you can set your general logging up to a debug level, use Teiid Designer, extension methods on our JDBC Statement, or statements see Query Plans - Teiid 9.0 (draft) - Project Documentation Editor, SET Statement - Teiid 9.0 (draft) - Project Documentation Editor, and SHOW Statement - Teiid 9.0 (draft) - Project Documentation Editor

                • 5. Re: Why is a JOIN between sources so slow?
                  werx.io

                  Final results for a VDB with a single JDBC connection, one database with multiple schemas.  Note: The original view (first) calls embedded procedures, the VDB view unwraps those procedures (since in can join to both schemas). Basically, I'm seeing exactly what I hoped to, the procedure calls are replaced with a join .

                   

                  select * from v_lam_accpt_rpt_item_bill_pref   where accpt_rpt_item_id = 187518

                  Elapsed Time:  0 hr, 0 min, 5 sec, 794 ms.

                   

                  select * from lam_accpt_rpt_item_bill_pref_vdb where accpt_rpt_item_id = 187518

                  Elapsed Time:  0 hr, 0 min, 1 sec, 790 ms.

                   

                  It is still an issue to achieve this kind of performance improvement between two different JDBC sources.

                  • 6. Re: Why is a JOIN between sources so slow?
                    shawkins

                    > It is still an issue to achieve this kind of performance improvement between two different JDBC sources.

                     

                    Unfortunately it's hard to speculate about the differences without seeing the query plans.  Generally you should expect a join that goes against multiple sources to be slower as a join against a single source can be pushed as a single query.  If possible you can also provide the vdb if you are having an issue with getting the query plans.