Quantcast
Channel: Oracle
Viewing all articles
Browse latest Browse all 1814

Blog Post: “My Kingdom For A Histogram”

$
0
0
"Don't you know anything at all about numbers?" "Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth. "NOT IMPORTANT!" roared the Dodecahedron, turning red with fury. "Could you have tea for two without the two, or three blind mice without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without a seven?" "All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on. "If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded, waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the world. Just follow me and I'll show you." He turned on his heel and stalked off into the cave. -- Norton Juster, The Phantom Tollbooth In one of the current Oracle forums an “old saw” was resurrected, namely that a single-column primary key has no need of a histogram. This presumes that the primary key has no ‘holes’ in the data and has a continuous range of data. The reality of that situation is such primary keys are rarely, if ever, unbroken and the range of data may not be continuous. With that last sentence in mind it’s not strange to consider a histogram on a single-column primary key. And depending on the release of Oracle in use that histogram may be an old standard (a height-balanced histogram) or a new histogram offered in 12c, the hybrid histogram. Let’s look at an example how this plays out in 11.2.0.4 and in 12.1.0.2. First, though, let’s look at that new Hybrid histogram. The Hybrid histogram is a new type of histogram that uses a number of buckets less than the number of distinct values in the table data; it also determines, and utilizes, the repeat frequency of the endpoint values. The bucket size can be adjusted from the initial size based on endpoint values; there are two special buckets which will not change in size, and these are the ‘boundary’ buckets, bucket 0 and the last bucket of the histogram. The Hybrid histogram is based on four apparent ‘rules’, which are: 1) a value should not be found in more than one bucket 2) the bucket size is allowed to be extended in order to contain all instances of the same distinct value 3) adjusted bucket size cannot be less than the original size (not applicable at either end of the data set) 4) the original number of buckets should not be reduced One way to create a hybrid histogram is shown in the example below, part of which was provided by Jonathan Lewis, the remainder of this example was adapted from work done by Mohammed Houri. We begin: SQL> SQL> -- SQL> -- Create and populate a table so we can SQL> -- build a hybrid histogram SQL> -- SQL> -- Thanks to Jonathan Lewis for generating SQL> -- this table and data set SQL> -- SQL> create table t1 (id number, n1 number); Table created. Elapsed: 00:00:00.01 SQL> SQL> @InsT1.sql Elapsed: 00:00:00.00 SQL> Choosing a number of buckets less than the number of distinct values in the table is one way to create a hybrid histogram: SQL> SQL> -- SQL> -- Return the number of distinct values SQL> -- in column n1 SQL> -- SQL> SQL> select count(distinct n1) from t1; COUNT(DISTINCTN1) ----------------- 37 Elapsed: 00:00:00.00 SQL> 20 buckets will be used for our histogram and use DBMS_STATS to create it: SQL> SQL> -- SQL> -- Create a hybrid histogram on column n1 SQL> -- by setting the number of buckets less SQL> -- than the number of distinct values SQL> -- SQL> BEGIN 2 dbms_stats.gather_table_stats 3 (user, 't1', method_opt => 'for columns n1 size 20'); 4 END; 5 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.43 SQL> During the creation of this histogram Oracle places the column values in order to find values that occur in more than one bucket. When such values are found new bucket sizes are computed for all but the boundary buckets, in this case bucket 0 and bucket 19: SQL> SQL> -- SQL> -- Four rules exist to create a hybrid histogram, and the SQL> -- first two are: SQL> -- SQL> -- a value should not be found in more than one bucket SQL> -- a bucket size is allowed to be extended in order to contain SQL> -- all instances of the same value SQL> -- SQL> -- Compute the 'new' bucket size based on the data and SQL> -- endpoint values SQL> -- SQL> -- To illustrate how the bucket size adjustment is SQL> -- calculated let's look at the value 13 in the data SQL> -- SQL> -- Oracle 'walks' the ordered list of values until SQL> -- it finds 13 in two buckets SQL> -- SQL> -- Oracle extends the bucket where 13 is first found (bucket n) SQL> -- and moves all other instances of 13 (from bucket n+1) SQL> -- to this bucket (n) SQL> -- SQL> -- Moving all instances of 13 to a single bucket SQL> -- also makes 13 an endpoint value allowing Oracle SQL> -- to compute the repeat count of that value SQL> -- SQL> -- This continues on through the values so that SQL> -- Oracle can place all occurrences of a distinct SQL> -- value so they don't populate more than one bucket SQL> -- SQL> SELECT 2 (row_number() over(order by ept_nbr)-1) NumBucket 3 ,ept_nbr 4 ,ept_act_val 5 ,rpt_cnt 6 ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size" 7 ,bucket_size "original bucket_size" 8 FROM 9 (SELECT 10 ah.endpoint_number ept_nbr 11 ,ah.endpoint_actual_value ept_act_val 12 ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag 13 ,ah.endpoint_repeat_count rpt_cnt 14 ,at.sample_size/at.num_buckets bucket_size 15 FROM 16 user_tab_histograms ah 17 ,user_tab_col_statistics at 18 WHERE ah.table_name = at.table_name 19 AND ah.column_name = at.column_name 20 AND ah.table_name = 'T1' 21 AND ah.column_name = 'N1' 22 ) ORDER BY ept_nbr; NUMBUCKET EPT_NBR EPT_ACT_VAL RPT_CNT new bucket size original bucket_size ---------- ---------- ------------ ---------- --------------- -------------------- 0 1 8 1 1 5 1 6 13 3 5 5 2 12 18 2 6 5 3 20 20 5 8 5 4 26 23 2 6 5 5 32 26 3 6 5 6 38 27 6 6 5 7 44 28 6 6 5 8 50 29 6 6 5 9 58 31 5 8 5 10 69 33 8 11 5 NUMBUCKET EPT_NBR EPT_ACT_VAL RPT_CNT new bucket size original bucket_size ---------- ---------- ------------ ---------- --------------- -------------------- 11 79 35 7 10 5 12 86 38 5 7 5 13 90 41 1 4 5 14 92 42 2 2 5 15 95 43 3 3 5 16 96 44 1 1 5 17 97 45 1 1 5 18 98 46 1 1 5 19 100 59 1 2 5 20 rows selected. Elapsed: 00:00:00.21 SQL> Notice that for buckets 1-12 the bucket size increased; for the remaining non-boundary buckets the bucket size should decrease. But according to Rule 3 no bucket can be adjusted to a size smaller than the original bucket size so those buckets remain unadjusted. The bucket size computation in the query results below is based on the sample size divided by the number of buckets and, thus, displays a constant value for all buckets: SQL> SQL> -- SQL> -- Given that boundary values are the exception computing the new SQL> -- bucket sizes shows that for bucket numbers 1-12 the sizes are SQL> -- the same or larger than the originally computed size (5) SQL> -- SQL> -- All remaining non-endpoint buckets compute to a smaller size SQL> -- than the original SQL> -- SQL> -- Now the third rule appears: SQL> -- SQL> -- No non-endpoint bucket can be smaller than the originally SQL> -- computed size SQL> -- SQL> select 2 uth.endpoint_number 3 ,uth.endpoint_actual_value 4 ,uth.endpoint_repeat_count 5 ,ucs.sample_size/ucs.num_buckets bucket_size 6 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity 7 from 8 user_tab_histograms uth 9 ,user_tab_col_statistics ucs 10 where 11 uth.table_name = ucs.table_name 12 and uth.column_name = ucs.column_name 13 and uth.table_name = 'T1' 14 and uth.column_name = 'N1' 15 order by uth.endpoint_number; ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY --------------- ------------ --------------------- ----------- ---------- 1 8 1 5 -4 6 13 3 5 -2 12 18 2 5 -3 20 20 5 5 0 26 23 2 5 -3 32 26 3 5 -2 38 27 6 5 1 44 28 6 5 1 50 29 6 5 1 58 31 5 5 0 69 33 8 5 3 ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY --------------- ------------ --------------------- ----------- ---------- 79 35 7 5 2 86 38 5 5 0 90 41 1 5 -4 92 42 2 5 -3 95 43 3 5 -2 96 44 1 5 -4 97 45 1 5 -4 98 46 1 5 -4 100 59 1 5 -4 20 rows selected. Elapsed: 00:00:00.19 SQL> For a Hybrid histogram three types of column value exist to base cardinality estimates upon: popular values, non-popular values which are endpoint and non-popular values that are not an endpoint. Let’s look at how Oracle estimates the cardinality for each case, starting with popular values: SQL> SQL> -- SQL> -- Display the endpoint, value, repeat count, SQL> -- bucket size and the 'popularity' SQL> -- SQL> -- Popularity is determined by computing the SQL> -- difference between the frequency of appearance SQL> -- (repeat count) and the bucket size (sample size/number of buckets) SQL> -- Positive values are considered 'popular' SQL> -- SQL> select 2 endpoint_number 3 ,endpoint_actual_value 4 ,endpoint_repeat_count 5 ,bucket_size 6 ,case when Popularity > 0 then 'Pop' 7 else 'Non-Pop' 8 end Popularity 9 from 10 ( 11 select 12 uth.endpoint_number 13 ,uth.endpoint_actual_value 14 ,uth.endpoint_repeat_count 15 ,ucs.sample_size/ucs.num_buckets bucket_size 16 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity 17 from 18 user_tab_histograms uth 19 ,user_tab_col_statistics ucs 20 where 21 uth.table_name = ucs.table_name 22 and uth.column_name = ucs.column_name 23 and uth.table_name = 'T1' 24 and uth.column_name = 'N1' 25 ) 26 order by endpoint_number; ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR --------------- ------------ --------------------- ----------- ------- 1 8 1 5 Non-Pop 6 13 3 5 Non-Pop 12 18 2 5 Non-Pop 20 20 5 5 Non-Pop 26 23 2 5 Non-Pop 32 26 3 5 Non-Pop 38 27 6 5 Pop 44 28 6 5 Pop 50 29 6 5 Pop 58 31 5 5 Non-Pop 69 33 8 5 Pop ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR --------------- ------------ --------------------- ----------- ------- 79 35 7 5 Pop 86 38 5 5 Non-Pop 90 41 1 5 Non-Pop 92 42 2 5 Non-Pop 95 43 3 5 Non-Pop 96 44 1 5 Non-Pop 97 45 1 5 Non-Pop 98 46 1 5 Non-Pop 100 59 1 5 Non-Pop 20 rows selected. Elapsed: 00:00:00.19 SQL> Let’s return only the popular values and work through the cardinality estimate Oracle generates: SQL> -- SQL> -- Using the popularity Oracle estimates the cardinality SQL> -- by considering the following three types of values: SQL> -- SQL> -- popular value SQL> -- non-popular value with an endpoint number SQL> -- non-popular value not present in the histogram table SQL> -- SQL> -- Case 1: Popular values SQL> -- SQL> -- Display the 'popular' values in this data set SQL> -- SQL> select 2 endpoint_actual_value 3 ,endpoint_repeat_count 4 ,bucket_size 5 ,Popularity 6 from 7 ( 8 select 9 endpoint_number 10 ,endpoint_actual_value 11 ,endpoint_repeat_count 12 ,bucket_size 13 ,case when Popularity > 0 then 'Pop' 14 else 'Non-Pop' 15 end Popularity 16 from 17 ( 18 select 19 uth.endpoint_number 20 ,uth.endpoint_actual_value 21 ,uth.endpoint_repeat_count 22 ,ucs.sample_size/ucs.num_buckets bucket_size 23 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity 24 from 25 user_tab_histograms uth 26 ,user_tab_col_statistics ucs 27 where 28 uth.table_name = ucs.table_name 29 and uth.column_name = ucs.column_name 30 and uth.table_name = 'T1' 31 and uth.column_name = 'N1' 32 ) 33 ) 34 where Popularity = 'Pop'; ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR ------------ --------------------- ----------- ------- 27 6 5 Pop 28 6 5 Pop 29 6 5 Pop 33 8 5 Pop 35 7 5 Pop Elapsed: 00:00:00.20 SQL> Looking at one of these popular values (33) let’s return the cardinality estimate Oracle calculated: SQL> -- SQL> -- Using explain plan the cardinality estimation SQL> -- can be displayed for one of the 'popular' values SQL> -- SQL> explain plan for select count(1) from t1 where n1 = 33; Explained. Elapsed: 00:00:00.00 SQL> SQL> -- SQL> -- The cardinality estimate displayed below was SQL> -- calculated using the following formula: SQL> -- SQL> -- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size SQL> -- E-Rows = 8 * 100/100 = 8 SQL> -- SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T1 | 8 | 24 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- 2 - filter("N1"=33) 14 rows selected. Elapsed: 00:00:00.03 SQL> From the formula found in the above comments our calculated cardinality matches with that provided by the optimizer, so our understanding of the Hybrid histogram seems sound. To continue that verification we consider the case where we have a non-popular value that is an endpoint. The formula for that cardinality calculation is: E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). We calculate the NewDensity with a query based on the work of Alberto Dell’Era. First we return the non-popular values that are endpoints: SQL> -- SQL> -- Case 2: Non-popular values with an endpoint SQL> -- SQL> -- First return all non-popular values that are an SQL> -- endpoint SQL> -- SQL> select 2 endpoint_actual_value 3 ,endpoint_repeat_count 4 ,bucket_size 5 ,Popularity 6 from 7 ( 8 select 9 endpoint_number 10 ,endpoint_actual_value 11 ,endpoint_repeat_count 12 ,bucket_size 13 ,case when Popularity > 0 then 'Pop' 14 else 'Non-Pop' 15 end Popularity 16 from 17 ( 18 select 19 uth.endpoint_number 20 ,uth.endpoint_actual_value 21 ,uth.endpoint_repeat_count 22 ,ucs.sample_size/ucs.num_buckets bucket_size 23 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity 24 from 25 user_tab_histograms uth 26 ,user_tab_col_statistics ucs 27 where 28 uth.table_name = ucs.table_name 29 and uth.column_name = ucs.column_name 30 and uth.table_name = 'T1' 31 and uth.column_name = 'N1' 32 ) 33 ) 34 where Popularity = 'Non-Pop'; ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR ------------ --------------------- ----------- ------- 8 1 5 Non-Pop 13 3 5 Non-Pop 18 2 5 Non-Pop 20 5 5 Non-Pop 23 2 5 Non-Pop 26 3 5 Non-Pop 31 5 5 Non-Pop 38 5 5 Non-Pop 41 1 5 Non-Pop 42 2 5 Non-Pop 43 3 5 Non-Pop ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR ------------ --------------------- ----------- ------- 44 1 5 Non-Pop 45 1 5 Non-Pop 46 1 5 Non-Pop 59 1 5 Non-Pop 15 rows selected. Elapsed: 00:00:00.20 SQL> Let’s return Oracle’s estimated cardinality for one of these values (45): SQL> -- SQL> -- Use explain plan to return the estimated cardinality SQL> -- SQL> -- In this case the estimated cardinality is computed as: SQL> -- SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size) SQL> -- SQL> explain plan for select count(1) from t1 where n1 = 45; Explained. Elapsed: 00:00:00.00 SQL> SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T1 | 2 | 6 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- --------------------------------------------------- 2 - filter("N1"=45) 14 rows selected. Elapsed: 00:00:00.02 SQL> Now let’s use our query to return the value for NewDensity: SQL> -- SQL> -- The NewDensity is computed by an internal algorithm but SQL> -- we can get a reliable density using the following query SQL> -- based on work done by Alberto Dell'Era: SQL> -- SQL> SELECT 2 BktCnt 3 ,PopBktCnt 4 ,PopValCnt 5 ,NDV 6 ,pop_bucketSize 7 ,trunc(((BktCnt-PopBktCnt)/BktCnt)/(NDV-PopValCnt),10) NewDensity 8 FROM 9 (SELECT 10 COUNT(1) PopValCnt, 11 SUM(endpoint_repeat_count) PopBktCnt, 12 ndv, 13 BktCnt, 14 pop_bucketSize 15 FROM 16 (SELECT 17 (sample_size - num_nulls) BktCnt, 18 num_distinct ndv, 19 num_buckets, 20 density OldDensity, 21 (sample_size-num_nulls)/num_buckets pop_bucketSize 22 FROM user_tab_col_statistics 23 WHERE 24 table_name = 'T1' 25 AND column_name = 'N1' 26 ), 27 user_histograms 28 WHERE table_name = 'T1' 29 AND column_name = 'N1' 30 AND endpoint_repeat_count> pop_bucketSize 31 GROUP BY ndv, 32 BktCnt, 33 pop_bucketSize 34 ); BKTCNT POPBKTCNT POPVALCNT NDV POP_BUCKETSIZE NEWDENSITY ---------- ---------- ---------- ---------- -------------- ---------- 100 33 5 37 5 .0209375 Elapsed: 00:00:00.19 SQL> We now see if our calculation returned a valid value by ‘plugging’ it into the following equation: E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size). The cardinality estimate we compute is: SQL> -- SQL> -- 'Plugging' this new density into the equation above produces: SQL> -- SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size) SQL> -- E-Rows = 100 * greatest (.0209375, 1/100) = 2.09375 ~ 2 SQL> -- SQL> -- which matches the value returned by Oracle SQL> -- SQL> SQL> -- SQL> -- To validate these findings use a different non-popular value SQL> -- that's also an endpoint: SQL> -- SQL> explain plan for select count(1) from t1 where n1 = 43; Explained. Elapsed: 00:00:00.00 SQL> SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T1 | 3 | 9 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- 2 - filter("N1"=43) 14 rows selected. Elapsed: 00:00:00.02 SQL> Oracle returned 3 as the estimated cardinality and so did our calculation, more proof that our concept of the Hybrid histogram is correct. One final test will prove how well we understand this histogram, considering non-popular values that are not endpoints. Using our calculated value of NewDensity in this equation: E-Rows = num_rows * NewDensity gives us the following result: E-Rows = 100 * .0209375 = 2.09375 ~ 2. Time to see if our calculated value matches that which Oracle has estimated: SQL> SQL> -- SQL> -- Case 3: Non-popular value without an endpoint number SQL> -- SQL> -- This calculation is fairly simple and straightforward: SQL> -- SQL> -- E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2 SQL> -- SQL> SQL> explain plan for select count(1) from t1 where n1 = 17; Explained. Elapsed: 00:00:00.00 SQL> SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- Plan hash value: 3724264953 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T1 | 2 | 6 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- 2 - filter("N1"=17) 14 rows selected. Elapsed: 00:00:00.02 SQL> Our calculated value, as evidenced by the Rows value in the execution plan, matches Oracle’s estimate, giving further evidence that our contcspt of the Hybrid histogram is sound. ` As mentioned earlier in this article Hybrid histograms can exhibit instability in the form of endpoint changes based on the changing frequency of values as inserts are executed againt the table. To illustrate this let’s add another 16 to the data and see what happens: SQL> SQL> -- SQL> -- Hybrid histograms can display instability in terms of SQL> -- endpoints and bucket size SQL> -- SQL> -- To illustrate this we add 16 to the data, increasing SQL> -- the repeat count to 3, causing this newly added 16 SQL> -- to shift into the previous bucket leaving 17 SQL> -- as the new endpoint SQL> -- SQL> insert into t1 values (2, 16); 1 row created. Elapsed: 00:00:00.00 SQL> SQL> BEGIN 2 dbms_stats.gather_table_stats 3 (user, 't1', method_opt => 'for all columns size 20'); 4 END; 5 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.08 SQL> SQL> select 2 uth.endpoint_number 3 ,uth.endpoint_actual_value 4 ,uth.endpoint_repeat_count 5 from 6 user_tab_histograms uth 7 ,user_tables ut 8 ,user_tab_col_statistics ucs 9 where 10 uth.table_name = 'T1' 11 and uth.column_name = 'N1' 12 and uth.table_name = ut.table_name 13 and ut.table_name = ucs.table_name 14 and uth.column_name = ucs.column_name 15 order by uth.endpoint_number; ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT --------------- ------------ --------------------- 1 8 1 6 13 3 11 17 1 SQL> -- SQL> -- Add one more 16 and the endpoint shifts to 16 SQL> -- SQL> insert into t1 values (3, 16); 1 row created. Elapsed: 00:00:00.00 SQL> SQL> BEGIN 2 dbms_stats.gather_table_stats 3 (user, 't1', method_opt => 'for all columns size 20'); 4 END; 5 / PL/SQL procedure successfully completed. Elapsed: 00:00:00.03 SQL> SQL> select 2 uth.endpoint_number 3 ,uth.endpoint_actual_value 4 ,uth.endpoint_repeat_count 5 from 6 user_tab_histograms uth 7 ,user_tables ut 8 ,user_tab_col_statistics ucs 9 where 10 uth.table_name = 'T1' 11 and uth.column_name = 'N1' 12 and uth.table_name = ut.table_name 13 and ut.table_name = ucs.table_name 14 and uth.column_name = ucs.column_name 15 order by uth.endpoint_number; ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT --------------- ------------ --------------------- 1 8 1 6 13 3 11 16 4 The endpoint has shifted again and is now 16. Now that we know what a Hybrid histogram is and what it can do let’s look at another way to generate such a histogram in 12.1.0.2, and this goes back (finally!) to that “old saw” mentioned earlier. Let’s create a table with a single-column primary key and see what Oracle does with it when we run DBMS_STATS. This example, again by Jonathan Lewis, creates a table and populates the id column (the eventual primary key for the table) in a rather interesting way: SQL> SQL> create table t1 2 as 3 with generator as ( 4 select --+ materialize 5 rownum id 6 from dual 7 connect by 8 level SQL> column maxid1 new_value max1 SQL> column maxid2 new_value max2 SQL> column maxid3 new_value max3 SQL> SQL> select min(id), max(id) maxid1 2 from t1; MIN(ID) MAXID1 ---------- ---------- 1 10000 SQL> SQL> insert into t1 select id + 1e6 from t1; 10000 rows created. SQL> SQL> select min(id), max(id) maxid2 2 from t1; MIN(ID) MAXID2 ---------- ---------- 1 1010000 SQL> SQL> insert into t1 select id + 1e7 from t1; 20000 rows created. SQL> SQL> select min(id), max(id) maxid3 2 from t1; MIN(ID) MAXID3 ---------- ---------- 1 11010000 SQL> Notice the values in the column; the values go from 1 to 10000 then jump to 1000001, continuing to 1010000, and jump again to 11000001 continuing on to 11010000. The breaks in the value sequence are what make this data set qualify for a Hybrid histogram in 12.1.0.2; in 11.2.0.4 a height-balanced histogram is created. Let’s add the primary key to the table: ` SQL> SQL> alter table t1 add constraint t1_pk primary key(id); Table altered. Now let’s query for data that isn’t in the table: SQL> SQL> select 2 /*+ dynamic_sampling(0) */ 3 * 4 from t1 5 where 6 id between 12000 and 13000 7 ; no rows selected SQL> Doing nothing special we generate statistics on the table; Oracle will automatically determine that the PK column needs a histogram and create it: SQL> begin 2 dbms_stats.gather_table_stats( 3 ownname => user, 4 tabname =>'T1' 5 ); 6 end; 7 / PL/SQL procedure successfully completed. SQL> SQL> column column_name format a35 SQL> set linesize 150 SQL> SQL> select 2 column_name, sample_size, 3 num_distinct, num_nulls, density, 4 histogram, num_buckets 5 from 6 user_tab_cols 7 where 8 table_name = 'T1' 9 order by 10 column_name 11 ; COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS ----------------------------------- ----------- ------------ ---------- ---------- --------------- ----------- ID 5502 40000 0 .000025 HYBRID 254 SQL> To prove that 11.2.0.4 will create a height-balanced histogram let’s run the example there: SQL> SQL> create table t1 2 as 3 with generator as ( 4 select --+ materialize 5 rownum id 6 from dual 7 connect by 8 level 'T1' 5 ); 6 end; 7 / PL/SQL procedure successfully completed. SQL> SQL> column column_name format a35 SQL> set linesize 150 SQL> SQL> select 2 column_name, sample_size, 3 num_distinct, num_nulls, density, 4 histogram, num_buckets 5 from 6 user_tab_cols 7 where 8 table_name = 'T1' 9 order by 10 column_name 11 ; COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS ----------------------------------- ----------- ------------ ---------- ---------- --------------- ----------- ID 5509 40000 0 .000024959 HEIGHT BALANCED 254 SQL> So it would appear that having a histogram on a single-column primary key isn’t so far-fetched, after all. And we didn’t have to do anything special to get it created. It all comes down to numbers. Filed under: General

Viewing all articles
Browse latest Browse all 1814

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>