CHOOSING SAMPLE PERIOD TO CALCULATE AVERAGES

Author JP Vijaykumar
Date Mar 8th 2016

This is a test case scenario in which I generate average database growth figures. From the average growth values, I project the disk space a database will need for its growth over a period of three months. We have historical data on multiple databases, collected on a weekly basis for the last 3 years. I wondered how the average of a database's growth numbers varies with the sample period. This test case is generated for educational purposes only.

create table temp_jp(run_date date, size_gb number);

There are other methods to calculate the average growth of a database, but to keep things simple I chose the two methods used in the report script.

truncate table temp_jp;

set serverout on size 1000000 timing on
declare
  v_date date:=to_date('2013-03-12','YYYY-MM-DD');
  v_num number:=10;
begin
  execute immediate 'alter session set nls_date_format='''||'yyyy-mm-dd'||''' ';
  for i in 1..156 loop --POPULATING DATA FOR 3 YEARS(3*52 WEEKS)
    v_date:=v_date+7;
    v_num:=v_num+round(dbms_random.value(1,20)); --INSERTED RANDOM NUMBERS, TO SIMULATE IRREGULAR DB GROWTH
    insert into temp_jp values(v_date,v_num);
  end loop;
  commit;
end;
/

I generated a report that starts with a sample period of 36 months (1080 days) and reduces it by 30 days in every iteration, down to 1 month.

In Method 1, if the data is not collected uniformly on a weekly basis, the calculated averages will not be accurate, because I use count(*) as the number of weeks between the collection periods. To handle the discrepancy of irregular collection periods, a different approach has to be implemented. Since this is a test case, I generated sample data for every week over a period of 3 years.
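One possible shape for such a different approach, offered only as a sketch and not as one of the two methods tested in this article: divide the total growth by the elapsed time between the first and last sample, expressed in weeks, instead of by the row count. The query assumes the same temp_jp table; the 180-day window and the column aliases are illustrative choices.

```sql
-- Sketch only: interval-aware weekly average growth.
-- Assumes temp_jp(run_date date, size_gb number) as created above.
-- The 180-day window is an arbitrary, illustrative sample period.
-- Like Method 1, max(size_gb) - min(size_gb) assumes the size never shrinks.
select min(run_date) start_date,
       max(run_date) end_date,
       round((max(size_gb) - min(size_gb))
             -- date subtraction returns days; /7 converts to weeks,
             -- nullif guards against a single-row sample (0 elapsed days)
             / nullif((max(run_date) - min(run_date)) / 7, 0), 2) weekly_avg_growth_gb
from   temp_jp
where  run_date >= sysdate - 180;
```

Because the divisor is elapsed weeks, a missed or late collection does not distort the result the way a row count would. With strictly weekly data the divisor equals the number of rows minus one, so the figure should track the average of week-over-week differences rather than the count(*) based figure.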
set serverout on size 1000000 timing on
declare
  v_num number:=1080;
  v_min date;
  v_max date;
  v_msz number(6,2);
  v_avg number(6,2);
begin
  execute immediate 'alter session set nls_date_format='''||'yyyy-mm-dd'||''' ';
  dbms_output.put_line('METHOD NUM_DAYS START_DATE WEEKLY_AVG_GROWTH_GB 3MTHS_PROJECTED_GROWTH_GB');
  while (v_num>0) loop
    execute immediate 'select min(run_date),max(run_date),max(size_gb),
      round((max(size_gb) - min(size_gb))/count(*),2)
      from temp_jp where run_date>=(sysdate -'||v_num||')'
      into v_min,v_max,v_msz,v_avg;
    dbms_output.put_line('Method 1: '||' '||lpad(v_num,9)||' '||lpad(v_min,10)||' '||lpad(v_avg,23)||' '||lpad(v_avg*12,23));
    execute immediate 'with t as (select run_date,size_gb from temp_jp where run_date>=(sysdate - '||v_num||'))
      select round(avg(growth_gb),2) from (
      select e.run_date,e.size_gb - b.size_gb growth_gb from t e,t b where e.run_date = b.run_date +7)'
      into v_avg;
    dbms_output.put_line('Method 2: '||' '||lpad(v_num,9)||' '||lpad((sysdate - v_num),10)||' '||lpad(v_avg,23)||' '||lpad(v_avg*12,23));
    v_num:=v_num-30; --REDUCING THE PERIOD BY 30 DAYS FOR EACH ITERATION
  end loop;
end;
/

METHOD     NUM_DAYS  START_DATE  WEEKLY_AVG_GROWTH_GB  3MTHS_PROJECTED_GROWTH_GB
Method 1:  1080      2013-03-26  10.76                 129.12
Method 2:  1080      2013-03-24  10.83                 129.96
Method 1:  1050      2013-04-30  10.89                 130.68
Method 2:  1050      2013-04-23  10.96                 131.52
Method 1:  1020      2013-05-28  10.77                 129.24
Method 2:  1020      2013-05-23  10.84                 130.08
Method 1:  990       2013-06-25  10.95                 131.4
Method 2:  990       2013-06-22  11.03                 132.36
Method 1:  960       2013-07-23  10.87                 130.44
Method 2:  960       2013-07-22  10.95                 131.4
Method 1:  930       2013-08-27  10.83                 129.96
Method 2:  930       2013-08-21  10.91                 130.92
Method 1:  900       2013-09-24  10.73                 128.76
Method 2:  900       2013-09-20  10.81                 129.72
Method 1:  870       2013-10-22  10.74                 128.88
Method 2:  870       2013-10-20  10.82                 129.84
Method 1:  840       2013-11-26  10.7                  128.4
Method 2:  840       2013-11-19  10.79                 129.48
Method 1:  810       2013-12-24  10.74                 128.88
Method 2:  810       2013-12-19  10.83                 129.96
Method 1:  780       2014-01-21  10.74                 128.88
Method 2:  780       2014-01-18  10.84                 130.08
Method 1:  750       2014-02-18  10.75                 129
Method 2:  750       2014-02-17  10.85                 130.2
Method 1:  720       2014-03-25  10.8                  129.6
Method 2:  720       2014-03-19  10.9                  130.8
Method 1:  690       2014-04-22  10.8                  129.6
Method 2:  690       2014-04-18  10.91                 130.92
Method 1:  660       2014-05-20  10.93                 131.16
Method 2:  660       2014-05-18  11.04                 132.48
Method 1:  630       2014-06-24  10.77                 129.24
Method 2:  630       2014-06-17  10.89                 130.68
Method 1:  600       2014-07-22  10.87                 130.44
Method 2:  600       2014-07-17  11                    132
Method 1:  570       2014-08-19  10.82                 129.84
Method 2:  570       2014-08-16  10.95                 131.4
Method 1:  540       2014-09-16  10.94                 131.28
Method 2:  540       2014-09-15  11.08                 132.96
Method 1:  510       2014-10-21  10.96                 131.52
Method 2:  510       2014-10-15  11.11                 133.32
Method 1:  480       2014-11-18  10.75                 129
Method 2:  480       2014-11-14  10.91                 130.92
Method 1:  450       2014-12-16  10.55                 126.6
Method 2:  450       2014-12-14  10.72                 128.64
Method 1:  420       2015-01-20  10.38                 124.56
Method 2:  420       2015-01-13  10.56                 126.72
Method 1:  390       2015-02-17  10.61                 127.32
Method 2:  390       2015-02-12  10.8                  129.6
Method 1:  360       2015-03-17  10.46                 125.52
Method 2:  360       2015-03-14  10.67                 128.04
Method 1:  330       2015-04-14  10.75                 129
Method 2:  330       2015-04-13  10.98                 131.76
Method 1:  300       2015-05-19  10.7                  128.4
Method 2:  300       2015-05-13  10.95                 131.4
Method 1:  270       2015-06-16  10.74                 128.88
Method 2:  270       2015-06-12  11.03                 132.36
Method 1:  240       2015-07-14  10.77                 129.24
Method 2:  240       2015-07-12  11.09                 133.08
Method 1:  210       2015-08-18  10.93                 131.16
Method 2:  210       2015-08-11  11.31                 135.72
Method 1:  180       2015-09-15  10.73                 128.76
Method 2:  180       2015-09-10  11.16                 133.92
Method 1:  150       2015-10-13  11.27                 135.24
Method 2:  150       2015-10-10  11.81                 141.72
Method 1:  120       2015-11-10  11.39                 136.68
Method 2:  120       2015-11-09  12.06                 144.72
Method 1:  90        2015-12-15  10.38                 124.56
Method 2:  90        2015-12-09  11.25                 135
Method 1:  60        2016-01-12  10.11                 121.32
Method 2:  60        2016-01-08  11.38                 136.56
Method 1:  30        2016-02-09  7.8                   93.6
Method 2:  30        2016-02-07  9.75                  117

PL/SQL procedure successfully completed.
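Elapsed: 00:00:00.07

From the above test, it is evident that when the sample period is too small, the averages become unstable and, at the shortest periods, drop sharply. For sample periods of 180 days or more, the weekly average stays between 10.38 and 10.96 GB for Method 1 and between 10.56 and 11.31 GB for Method 2. Below 180 days it ranges from 7.8 to 11.39 GB for Method 1 and from 9.75 to 12.06 GB for Method 2. Method 1 falls further than Method 2 at short periods because it divides by count(*), the number of rows, which is one more than the number of weekly intervals between them.

Conservatively, it is better to use a sample period of at least 6 months' worth of data when projecting a database's future growth. However, depending on the amount of data available and the acceptable percentage of variance from projections, you can choose the sampling period best suited to your specific scenario.

Happy scripting.

References:
http://www.toadworld.com/platforms/oracle/w/wiki/10837.tablespace-growth-report.aspx
http://www.databasejournal.com/features/oracle/article.php/3673616

Aaron Levenstein's quotation on statistics:
http://math.mohawkcollege.ca/kezys/41_StatIntro%2010038_F07.pdf
http://www.readoo.in/2015/06/irrelevance-of-statistics-and-maths-in-economics
http://funsms.org/one_liners2.php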