Want to start a discussion on getting a better understanding of analyzing a query plan. With the goal of answering these 3 questions (but not limited to):
- Better understanding of some of the node rules
- What to look for in a query plan that would indicate the query will perform at less than optimal performance
- What are the recommended action(s) to improve query performance
Probably, many of the recommendations will be to update your model or add a hint to your query. But I would like to tie the identified problem in the query plan to the recommended resolution. An example of this is the table cardinality. Not having this value set in the source model can definitely impact how the query plan is constructed. But when would a query plan show the symtom? Which node rules utilize this value?
This post may get lengthy, so I'm going to start out with one rule at a time, and try to connect all the dots. Also, I'm running some of my own tests to show what I'm seeing in the query plan in an effort to help draw a correlation between the node rules and the query plan.
NOTE: this is using the 7.7 doc's.
As details are determined, they will be noted in this section.
RuleChooseJoinStrategy - this rules determines the base join strategy and join type
A Join Strategy is, what kind of algorithm Teiid using internally to sorting through the results on each side to satisfy the JoinType.
Join Strategy: Merge, Nested Loop, Nested Table, Enhanced Sort
Default strategy: Nested Loop
Merge: Merge Join Strategy supports generalized Full, Left Outer, and Inner Joins (containing non-equi join criteria) as long as there is at least one equi-join criteria
Nested Loop: Currently implmented as a degenerate case of merge join, and only for use with Full, Left, Inner and Cross joins.
Nested Table: A variation of the NEST_LOOP join that handles nested tables.
Enhanded Sort: An extension to the MERGE join strategy, to check for conditions necessary to NOT fully sort one of the sides. Will be used for inner joins and only if both sorts are not required. Degrades to a normal merge join if the tuples are balanced.
NESTED LOOP is expensive
Join Type: Inner Join, Right Outer Join, Left Outer Join, Full Outer Join, Cross Join, Union Join, Semi Join, Anti Semi Join
CROSS JOINS are usually expensive, as they produce Cartesian product of two resultsets.
- It talks about enabling costing? how is that done
- reference is made to "column ndv and column nnv values", what are ndv / nnv?
- indicates that if "key metadata information" is there, what key metadata is it specifically looking for?
4. When a JOIN is planned, you will see a JoinNode in the query plan. For each JoinNode, at the end for that planned node, it indicates the strategy, type and criteria. Example:
+ Child 0:
+ Join Strategy:NESTED LOOP JOIN
+ Join Type:CROSS JOIN
+ Join Criteria:product.symbol = stock.symbol
What are the possible values for Strategy and Type (I didn't see a link or reference in the doc)? And which one's indicate a possible performance problem?
What are possible resolutions?
Is there value (or null) for which the Join Criteria will contain such that it indicates a problem may exist?
The explanations of the nodes and rules are described here http://docs.jboss.org/teiid/7.7.0.Final/reference/en-US/html_single/#d0e10787, however I see those are much more simplistic than what you are looking for. IMO, this is very broad discussion that will be complete only by reading whole planner code. Teiid does add some annotations to plan when it does take certain decisions, however I think this capability can be improved upon more at all critical decision points, that can lead some what into your suggestions on next questions (2 & 3) as to how they can be improved upon.
As per specific questions on JoinStrategy and JoinType take look at classes JoinNode and JoinType. I believe RuleChooseJoinStrategy decides which one to choose.
JoinType is how two resultsets are being joined in SQL terms. CROSS JOINS are usually expensive, as they produce Cartesian product of two resultsets.
A JoinStrategy is, what kind of algorithm Teiid using internally to sorting through the results on each side to satisfy the JoinType. I believe NESTED LOOP is expensive.
https://issues.jboss.org/browse/TEIID-2043 moves most of the "plain text" decisions into both annotations and the debug log. The goal being that most users should not need to look at the full debug log. I believe that the debug log is mostly for internal diagnostics and is too verbose, changes too much with each release, etc. to be of general use.
As for possible property varients and other information that appears in the processing plan, yes those should be documented. Please open a JIRA for anything that is missing.
As for going further to detect problems that is usually done reactively and has to be in reference to what is expected. Just highlighting cross joins or nested loops would rarely be sufficient to understand poor performance due to Teiid (as those situations are usually a user issue with the query issued and not an optimization issue).