Automatic Physical DB Tuning

CS848 Presentation of
:
The general definition of the problem:
»Given: a workload and space constraint (B)
»Output: the set of phy. str. (a configuration) fits in B
The paper criticizes previous work, e.g. papers #1,3,4.
»Too many assumptions and heuristics
»Loose interaction with the optimizer in candidate selection
»Independence between enumeration and merging
Assumptions (made in related work):
»What-if API (hypothetical physical structures)
»Dependence of optimizer
»Search framework


An optimizer, after getting a logical query
»Issues requests for indexes and materialized views (via access path generation module)
-All sargable predicates - Additional columns
-Sort columns - SPJG sub-queries
Each time optimizer issues a request
»Suspend the optimization (only in tuning mode)
»Analyze the request
»Simulate the physical structure
q No guessing
q Don’t miss candidates that might not be obvious at the final plan
q Don’t propose not exploitable candidates

How to get the best configuration-2

Union of all optimal structures is the best configuration
 Number of requests made by optimizer is not overwhelming

Advantages of knowing the best configuration

 A configuration is transformed into a smaller but less efficient new configurations
»Re-optimization of a relaxed configuration is more efficient
 More useful information to DBA:

Transformations
 C -> C’, a subset of structures in C are replaced
 Indexes
»Merging: I’=(K1,(S1 U K2 U S2)-K1)
»Splitting: re-arrange overlapping columns of wider indexes
Ic=(              ,            ), Ir1=(K1-KC,I1-Ic), Ir2=(K2-KC,I2-Ic)
»Prefixing: I1’=(K1’,    ).
»Promotion to clustered
»Removal
 Views
»V1,V2 -> Vm
»Removal

 For each valid tr,
»C -> Ctr
»Decrease in space:
»Increase in cost:
                                         
Relax C into Cnew using the transformation tr which minimizes the penaltytr.
»
 The configuration with minimum cost
»Impractical, too long to converge
 Instead,
»Use greedy approach till we satisfy the space constraint
“remember time limitation”
»At the end, we have a chain of relaxed configurations
-Pick the configuration resulted in the largest penalty when relaxed
   Correct the transformation

How to find cost of a workload and space of a config.
»Materialize to find space consumed by a configuration
»Re-optimize (simulate) the workload under configuration
It’s too expensive to call the optimizer
Estimating Cost and Space of a configuration

Approximate values for space and cost
Space
»Index: use width of keys and size of table, estimate B-tree size
»
Views: Index + estimate from “cardinality optimizer” info 
query
Previous discussion ignores update queries (UPDATE, DELETE,INSERT queries)
»Update queries affect indexes and views
Adding more index and views would reduce execution cost is not a true assumption
»Index-view update overhead
»Each update query implicitly references all indexes over its referenced table (not just 1 index)
»Cascading effect ?
Now queries which don’t use the indexes of an update query are affected indirectly.
An update query is divided into two,
»A pure SELECT query
»A small UPDATE shell
Remember the goal is to estimate the cost!!!
For each index-I and update query-q
»Find the cost of updating I for q
»While estimating a configuration
-Use just SELECT portion (treat as a normal query)
-Then estimate the cost of indexes referenced in the query
 

0 comments:

Post a Comment

 
 
 

sponsors

indiae.in we are in
Indiae.in