Wednesday, 4 January 2017

ANALYTICAL FUNCTIONS



ANALYTICAL FUNCTIONS:
--------------------------------------
1.Oracle Analytical Functions results an aggregate value based on a group of rows.
  that group of rows will be identified dynamically during the run time based on the anlytical clause.

2.Analytical function return multiple rows for each group however aggregate fuctions return a single row
  for each group.

AGGREGATE FUNCTIONS:
=====================
MAX
MIN
AVG
COUNT
SUM
 
 GROUP BY CLAUSE:
 -----------------
 The GROUP BY clause allows us to apply aggregate functions to group of rows.
 the aggregate function reduces the number of rows returned by the query.

 what are the things that GROUP BY cannot do?
 ---------------------------------------------
 SELECT AVG(SAL) FROM EMP;

 AVG(SAL)
 --------
 2073.21429

lets say if we want to display the average salary for each department.

SELECT DEPTNO,AVG(SAL) FROM EMP
GROUP BY DEPTNO
ORDER BY DEPTNO;

DEPTNO        AVG(SAL)
-------      -----------
10             2916.66667
20             2175
30             1566.66667
here we can observe that the aggregate function reduces the number of rows returned by the query.

ANALYTICAL FUNCTIONS:
=====================
1.Analytical functions also operate on subsets of rows, similar to aggregate functions in GROUP BY
  queries, but they do not reduce the number of rows returned by the query.

the basic syntax for analytical function:
----------------------------------------------
analytical_function([arguments]) OVER (analytical_clause)

OVER: OVER clause is a keyword as part of the anlytical clause which act same as GROUP BY .

TYPES OF ANALYTICAL FUNCTIONS:
===============================
FIRST_VALUE
LAST_VALUE
LEAD
LAG
ROLLUP
CUBE
PIVOT
UNPIVOT
DECODE
CASE
GROUPING_ID
RANK()
DENSE_RANK()
ROWNUMBER


------------------------------------------------------------------------------------------------------
Example: if our requirement is to fetch and display all employee records along with avg salary of the
employees within the department.
------------------------------------------------------------------------------------------------------
**// requirement cannot be fulfilled by a simple group by statement **//
here makes analytical function we makes job easy.

SELECT EMPNO,DEPTNO,SAL,AVG(SAL)OVER(PARTITION BY DEPTNO) AS AVG_DEPT_SAL FROM EMP;
--------------------------------------------
"EMPNO"        "DEPTNO"      "SAL"  "AVG_DEPT_SAL"
-------  ------    -------  ---------------
7782    10        2450    2916.666666666666666666666666666666666667
7839    10        5000    2916.666666666666666666666666666666666667
7934    10        1300    2916.666666666666666666666666666666666667
7566    20        2975    2175
7902    20        3000    2175
7876    20        1100    2175
7369    20        800      2175
7788    20        3000    2175
7521    30        1250    1566.666666666666666666666666666666666667
7844    30        1500    1566.666666666666666666666666666666666667
7499    30        1600    1566.666666666666666666666666666666666667
7900    30        950      1566.666666666666666666666666666666666667
7698    30        2850    1566.666666666666666666666666666666666667
7654    30        1250    1566.666666666666666666666666666666666667

in this time AVG is an analytical function operating on the group of rows defined by the content of
the OVER clause. we can observe how the AVG function is still reporting the departmental average, like it did in the GROUP BY query ,but the result is present in the each row rather than reducing the total
number of rows returned.
the analytical_clause breaks down into the following optional elements:
====================================================
1.Query_Partition_Clause
2.Order_By_Clause
3.Windowing_Clause

Query_Partition_Clause:
----------------------------
1.The Query_Partition_Clause breaks the result set into partitions,or groups of data.
2.the operation of the analytical fuction is specific to the boundary created by these partitions.
3.If the Query_Partition_Clause is NULL  then the whole result set is treaed as a single partition.

SELECT AVG(SAL) FROM EMP;

AVG(SAL)
--------------
2073.21429

SELECT EMPNO,DEPTNO,SAL,AVG(SAL)OVER()AS AVG_SAL FROM EMP;
-----------------------------------------------------------
"EMPNO"        "DEPTNO"      "SAL"  "AVG_DEPT_SAL"
-------  -------     -----  ---------------
7369    20        800               2073.214  285714285714285714285714285714286
7499    30        1600             2073.214 285714285714285714285714285714286
7521    30        1250             2073.214 285714285714285714285714285714286
7566    20        2975             2073.214 285714285714285714285714285714286
7654    30        1250             2073.214 285714285714285714285714285714286
7698    30        2850             2073.214 285714285714285714285714285714286
7782    10        2450             2073.214 285714285714285714285714285714286
7788    20        3000             2073.214 285714285714285714285714285714286
7839    10        5000             2073.214 285714285714285714285714285714286
7844    30        1500             2073.214 285714285714285714285714285714286
7876    20        1100             2073.214 285714285714285714285714285714286
7900    30        950               2073.214 285714285714285714285714285714286
7902    20        3000             2073.214 285714285714285714285714285714286
7934    10        1300             2073.214 285714285714285714285714285714286

the above query average is based on  all the rows of the result set, same as GROUP BY CLAUSE.

If we change the OVER clause to include a Query_Partition_Clause based on the department, then average salary
displays for each dept.


SELECT EMPNO,DEPTNO,SAL,AVG(SAL)OVER(PARTITION BY DEPTNO) AS AVG_DEPT_SAL FROM EMP;

--------------------------------------------------------------------------
"EMPNO"        "DEPTNO"      "SAL"  "AVG_DEPT_SAL"
-------  ------    -------  ---------     --------   -------------------------
7782    10        2450    2916.666666666666666666666666666666666667
7839    10        5000    2916.666666666666666666666666666666666667
7934    10        1300    2916.666666666666666666666666666666666667
7566    20        2975    2175
7902    20        3000    2175
7876    20        1100    2175
7369    20        800          2175
7788    20        3000    2175
7521    30        1250    1566.666666666666666666666666666666666667
7844    30        1500    1566.666666666666666666666666666666666667
7499    30        1600    1566.666666666666666666666666666666666667
7900    30        950          1566.666666666666666666666666666666666667
7698    30        2850    1566.666666666666666666666666666666666667
7654    30        1250    1566.666666666666666666666666666666666667

Order_By_Clause:
==============
the ORDER BY clause specifies how the data is sorted within each group(partition).
SELECT EMPNO,DEPTNO,SAL,AVG(SAL) OVER(PARTITION BY DEPTNO ORDER BY DEPTNO) AS AVG_DEPT_SAL FROM EMP;

SELECT EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO) AS FIRST_IN_DEPT FROM EMP;
-------------------------------------------
"EMPNO"        "DEPTNO"      "SAL"  "FIRST_IN_DEPT"
-------  -------   ------   ----------------
7782        10       2450        2450
7839        10       5000        2450
7934        10     1300          2450
7566        20       2975        2975
7902        20       3000        2975
7876        20       1100        2975
7369        20        800         2975
7788        20       3000        2975
7521        30       1250        1250
7844        30       1500        1250
7499        30       1600        1250
7900        30        950         1250
7698        30       2850        1250
7654        30       1250        1250

not to the same FIRST_VALUE function lets include order by clause to fethc the highest salary in each
department group.

SELECT EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC NULLS LAST)AS FIRST_VAL_IN_DEPT
FROM EMP;

ASCENDING ORDER:
------------------------------------------------
"EMPNO"        "DEPTNO"      "SAL"  "FIRST_VAL_IN_DEPT"
7934    10        1300    1300
7782    10        2450    1300
7839    10        5000    1300
7369    20        800        800
7876    20        1100    800
7566    20        2975    800
7788    20        3000    800
7902    20        3000    800
7900    30        950      950
7654    30        1250    950
7521    30        1250    950
7844    30        1500    950
7499    30        1600    950
7698    30        2850    950
-----------------------------------------------
SELECT EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO ORDER BY SAL DESC NULLS FIRST)AS FIRST_VAL_IN_DEPT
FROM EMP;

DESCENDING ORDER:
-----------------
"EMPNO"        "DEPTNO"      "SAL"  "FIRST_VAL_IN_DEPT"
7839    10        5000    5000
7782    10        2450    5000
7934    10        1300    5000
7788    20        3000    3000
7902    20        3000    3000
7566    20        2975    3000
7876    20        1100    3000
7369    20        800          3000
7698    30        2850    2850
7499    30        1600    2850
7844    30        1500    2850
7654    30        1250    2850
7521    30        1250    2850
7900    30        950          2850

the above queries ASC NULLS LAST keywords are unnecessary as ASC is the default for an Order_By_Clause
and NULLS LAST is the default for ASC orders.
when ordering by DESC the default is NULLS FIRST.

Windowing_Clause:
===============
Windowing_Clause gives further control on the partitions created on the data as it creates another level of
boundary on top of the partition.
Windowing_Clause is not mandatory i.e if not mentioned then by default it considers from first row of
a group to the current row.

POSSIBLE VALUES FOR start_point and end_point:
----------------------------------------------
UNBOUNDED PRECEDING: the window start at first row of the partition.only available for start points.
UNBOUNDED FOLLOWING: the window ends at the last row of the partition.only available for end points.

select ename, deptno, sal, FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MIN_SAL_NAME FROM (SELECT * FROM EMP WHERE DEPTNO=30);
----------------------------------
 ename             deptno             sal       MIN_SAL_-NAME
------  -----  ---  ------------
JAMES                        30                    950      JAMES
WARD             30                    1250    JAMES
MARTIN         30                    1250    JAMES
TURNER         30                    1500    JAMES
ALLEN            30                    1600    JAMES
BLAKE            30                    2850    JAMES


select ename, deptno, sal, LAST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM (SELECT * FROM EMP WHERE DEPTNO=30);
-------------------------------
ENAME           DEPTNO SAL             MAX_SAL_NAME
------                 ------     ------                 -----------
JAMES                        30        950                 BLAKE
WARD             30        1250                BLAKE
MARTIN         30        1250                BLAKE
TURNER         30        1500                BLAKE
ALLEN            30        1600                BLAKE
BLAKE            30        2850                BLAKE

select ename, deptno, sal, FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MIN_SAL_NAME FROM EMP;

ENAME     DEPTNO   SAL     MIN_SAL_NAME
----------   ------------  --------   -------------------------
SMITH             20        800      SMITH
JAMES                        30        950      SMITH
ADAMS           20        1100    SMITH
WARD             30        1250    SMITH
MARTIN         30        1250    SMITH
MILLER          10        1300    SMITH
TURNER         30        1500    SMITH
ALLEN            30        1600    SMITH
CLARK           10        2450    SMITH
BLAKE            30        2850    SMITH
JONES             20        2975    SMITH
SCOTT                        20        3000    SMITH
FORD  20        3000    SMITH
KING   10        5000    SMITH

Select  ename, deptno, sal, LAST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM EMP;

SELECT ENAME,DEPTNO,SAL,FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC)MIN_SAL_NAME FROM
(SELECT * FROM EMP WHERE DEPTNO=30);

----------------------------------
 ename                          deptno                        sal       MIN_SAL_-NAME
------                 -----                  ---         ------------
JAMES                        30                    950      JAMES
WARD             30                    1250    JAMES
MARTIN         30                    1250    JAMES
TURNER         30                    1500    JAMES
ALLEN            30                    1600    JAMES
BLAKE            30                    2850    JAMES

SELECT ENAME,DEPTNO,SAL,LAST_VALUE(ENAME)OVER(ORDER BY SAL ASC)MIN_SAL_NAME FROM (SELECT * FROM EMP WHERE
DEPTNO=30);

------------------------------
ENAME           DEPTNO        SAL     MAX_SAL_NAME
------                 ----                  ----       -------------
JAMES                        30                    950      JAMES
MARTIN         30                    1250    WARD
WARD             30                    1250    WARD
TURNER         30                    1500    TURNER
ALLEN            30                    1600    ALLEN
BLAKE            30                    2850    BLAKE


select ename, deptno, sal,
FIRST_VALUE(ENAME)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MIN_SAL_NAME,
LAST_VALUE(ENAME)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM EMP;

select ename, deptno, sal,
FIRST_VALUE(ENAME)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING)MIN_SAL_NAME,
LAST_VALUE(ENAME)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM (SELECT * FROM EMP WHERE DEPTNO IN(10,30));

MILLER          10        1300    MILLER          KING
CLARK           10        2450    MILLER          KING
KING               10        5000    MILLER          KING
JAMES                        30        950      JAMES           BLAKE
MARTIN         30        1250    JAMES                        BLAKE
WARD             30        1250    JAMES                        BLAKE
TURNER         30        1500    JAMES                        BLAKE
ALLEN            30        1600    JAMES                        BLAKE
BLAKE            30        2850    JAMES                        BLAKE

SELECT ENAME,DEPTNO,SAL,SUM(SAL)OVER(ORDER BY SAL ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)SLIDESAL
FROM EMP;

-------------------------------------------------------------
ENAME           DEPTNO        SAL     SLIDESAL
-----                  ------                 ---       -------------
SMITH             20                    80          1750
JAMES                        30                    950      2850
ADAMS           20                    1100    3300
WARD             30                    1250    3600
MARTIN         30                    1250    3800
MILLER          10                    1300    4050
TURNER         30                    1500    4400
ALLEN            30                    1600    5550
CLARK           10                    2450    6900
BLAKE            30                    2850    8275
JONES             20                    2975    8825
SCOTT                        20                    3000    8975
FORD              20                    3000    11000
KING               10                    5000    8000

ROW_NUMBER():
=============
ROW_NUMBER assigns a unique number to each row of same window in the ordered sequence of rows specified by order_by_clause.
SELECT ROWNUM,ENAME,SAL FROM EMP;

SELECT DEPTNO,ENAME,ROW_NUMBER()OVER(ORDER BY HIREDATE NULLS LAST)SEQNO
FROM EMP;

the above both queries are similar

SELECT DEPTNO,ENAME,SAL,ROW_NUMBER()OVER(PARTITION BY DEPTNO
ORDER BY SAL DESC NULLS LAST)RR FROM EMP;



-----------------------------
DEPTNO         ENAME           SAL     RR
-----                  -----                  ----       -----------
10                    KING               5000    1
10                    CLARK           2450    2
10                    MILLER          1300    3
20                    SCOTT                        3000    1
20                    FORD              3000    2
20                    JONES             2975    3
20                    ADAMS           1100    4
20                    SMITH             800      5
30                    BLAKE            2850    1
30                    ALLEN            1600    2
30                    TURNER         1500    3
30                    MARTIN         1250    4
30                    WARD             1250    5
30                    JAMES                        950      6

SELECT E1.* FROM(SELECT DEPTNO,ENAME,ROW_NUMBER()OVER(PARTITION BY DEPTNO
ORDER BY SAL DESC NULLS LAST)SEQNO FROM EMP)E1
WHERE E1.SEQNO<=3;
-----------------------
DEPTNO         ENAME          SAL     RR
-----                   -----                 ---       -----
10                    KING               5000    1
10                    CLARK           2450    2
10                    MILLER          1300    3
20                    SCOTT                        3000    1
20                    FORD              3000    2
20                    JONES             2975    3
30                    BLAKE            2850    1
30                    ALLEN            1600    2
30                    TURNER         1500    3

Difference Between RANK(),ROW_NUMBER(), DENSE_RANK()

RANK():
-------
 RANK is almost same as ROW_NUMBER but rows with equal values, within same window, for on which order by clause is specified receive the same rank but next row receives RANK as per it ROW_NUMBER.

          One of the most obvious and useful set of window functions are ranking functions where rows from your result set are ranked according to a certain scheme. There are three ranking functions:

    ROW_NUMBER()
    RANK()
    DENSE_RANK()

The difference is easy to remember. For the examples, let’s assume we have this table (using PostgreSQL syntax):
           
CREATE TABLE t(v) AS
SELECT * FROM (
  VALUES('a'),('a'),('a'),('b'),
        ('c'),('c'),('d'),('e')
) t(v)

ROW_NUMBER():
------------------------
… assigns unique numbers to each row within the PARTITION given the ORDER BY clause. So you’d get:
           
SELECT v, ROW_NUMBER() OVER()
FROM t

Note that some SQL dialects (e.g. SQL Server) require an explicit ORDER BY clause in the OVER() clause:
           
SELECT v, ROW_NUMBER() OVER(ORDER BY v)
FROM t

The above query returns:

| V | ROW_NUMBER |
|---|       ------------|
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |

(see also this SQLFiddle)

RANK():
-------
… behaves like ROW_NUMBER(), except that “equal” rows are ranked the same. If we substitute RANK() into our previous query:
           
SELECT v, RANK() OVER(ORDER BY v)
FROM t









… then the result we’re getting is this:

| V | RANK |
|---|------|
| a |    1 |
| a |    1 |
| a |    1 |
| b |    4 |
| c |    5 |
| c |    5 |
| d |    7 |
| e |    8 |

(see also this SQLFiddle)

As you can see, much like in a sports ranking, we have gaps between the different ranks. We can avoid those gaps by using

DENSE_RANK()

Trivially, DENSE_RANK() is a rank with no gaps, i.e. it is “dense”. We can write:

SELECT v, DENSE_RANK() OVER(ORDER BY v)
FROM t

… to obtain

| V | DENSE_RANK |
|---|       ------------|
| a |          1 |
| a |          1 |
| a |          1 |
| b |          2 |
| c |          3 |
| c |          3 |
| d |          4 |
| e |          5 |

(see also this SQLFiddle)

One interesting aspect of DENSE_RANK() is the fact that it “behaves like” ROW_NUMBER() when we add the DISTINCT keyword.
           
SELECT DISTINCT v, DENSE_RANK() OVER(ORDER BY v)
FROM t






… to obtain

| V | DENSE_RANK |
|---|------------|
| a |          1 |
| b |          2 |
| e |          5 |
| d |          4 |
| c |          3 |

(see also this SQLFiddle)

In fact, ROW_NUMBER() prevents you from using DISTINCT, because ROW_NUMBER() generates unique values across the partition before DISTINCT is applied:
           
SELECT DISTINCT v, ROW_NUMBER() OVER(ORDER BY v)
FROM t
ORDER BY 1, 2

DISTINCT has no effect:

| V | ROW_NUMBER |
|---|------------|
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |

(see also this SQLFiddle)
Putting it all together

A good way to understand the three ranking functions is to see them all in action side-by-side. Run this query
           
SELECT
  v,
  ROW_NUMBER() OVER(ORDER BY v),
  RANK()       OVER(ORDER BY v),
  DENSE_RANK() OVER(ORDER BY v)
FROM t
ORDER BY 1, 2

… or this one (using the SQL standard WINDOW clause, to reuse window specifications):
           
SELECT
  v,
  ROW_NUMBER() OVER(w),
  RANK()       OVER(w),
  DENSE_RANK() OVER(w)
FROM t
WINDOW w AS (ORDER BY v)

… to obtain:

| V | ROW_NUMBER | RANK | DENSE_RANK |
|---|------------|------|------------|
| a |                                1 |    1 |          1 |
| a |                               2 |    1 |          1 |
| a |                                 3 |    1 |          1 |
| b |                                4 |    4 |          2 |
| c |                                5 |    5 |          3 |
| c |                                6 |    5 |          3 |
| d |                                7 |    7 |          4 |
| e |                                8 |    8 |          5 |

No comments:

Post a Comment