ANALYTICAL FUNCTIONS:
--------------------------------------
1.Oracle
Analytical Functions results an aggregate value based on a group of rows.
that group of rows will be identified
dynamically during the run time based on the anlytical clause.
2.Analytical
function return multiple rows for each group however aggregate fuctions return
a single row
for each group.
AGGREGATE
FUNCTIONS:
=====================
MAX
MIN
AVG
COUNT
SUM
GROUP BY CLAUSE:
-----------------
The GROUP BY clause allows us to apply
aggregate functions to group of rows.
the aggregate function reduces the number of
rows returned by the query.
what are the things that GROUP BY cannot do?
---------------------------------------------
SELECT AVG(SAL) FROM EMP;
AVG(SAL)
--------
2073.21429
lets
say if we want to display the average salary for each department.
SELECT
DEPTNO,AVG(SAL) FROM EMP
GROUP
BY DEPTNO
ORDER
BY DEPTNO;
DEPTNO AVG(SAL)
------- -----------
10 2916.66667
20 2175
30 1566.66667
here
we can observe that the aggregate function reduces the number of rows returned
by the query.
ANALYTICAL
FUNCTIONS:
=====================
1.Analytical
functions also operate on subsets of rows, similar to aggregate functions in
GROUP BY
queries, but they do not reduce the number of
rows returned by the query.
the
basic syntax for analytical function:
----------------------------------------------
analytical_function([arguments])
OVER (analytical_clause)
OVER:
OVER clause is a keyword as part of the anlytical clause which act same as
GROUP BY .
TYPES
OF ANALYTICAL FUNCTIONS:
===============================
FIRST_VALUE
LAST_VALUE
LEAD
LAG
ROLLUP
CUBE
PIVOT
UNPIVOT
DECODE
CASE
GROUPING_ID
RANK()
DENSE_RANK()
ROWNUMBER
------------------------------------------------------------------------------------------------------
Example:
if our requirement is to fetch and display all employee records along with avg
salary of the
employees
within the department.
------------------------------------------------------------------------------------------------------
**//
requirement cannot be fulfilled by a simple group by statement **//
here
makes analytical function we makes job easy.
SELECT
EMPNO,DEPTNO,SAL,AVG(SAL)OVER(PARTITION BY DEPTNO) AS AVG_DEPT_SAL FROM EMP;
--------------------------------------------
"EMPNO" "DEPTNO" "SAL" "AVG_DEPT_SAL"
------- ------
------- ---------------
7782 10 2450 2916.666666666666666666666666666666666667
7839 10 5000 2916.666666666666666666666666666666666667
7934 10 1300 2916.666666666666666666666666666666666667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800 2175
7788 20 3000 2175
7521 30 1250 1566.666666666666666666666666666666666667
7844 30 1500 1566.666666666666666666666666666666666667
7499 30 1600 1566.666666666666666666666666666666666667
7900 30 950 1566.666666666666666666666666666666666667
7698 30 2850 1566.666666666666666666666666666666666667
7654 30 1250 1566.666666666666666666666666666666666667
in
this time AVG is an analytical function operating on the group of rows defined
by the content of
the
OVER clause. we can observe how the AVG function is still reporting the
departmental average, like it did in the GROUP BY query ,but the result is
present in the each row rather than reducing the total
number
of rows returned.
the
analytical_clause breaks down into the following optional elements:
====================================================
1.Query_Partition_Clause
2.Order_By_Clause
3.Windowing_Clause
Query_Partition_Clause:
----------------------------
1.The
Query_Partition_Clause breaks the result set into partitions,or groups of data.
2.the
operation of the analytical fuction is specific to the boundary created by
these partitions.
3.If
the Query_Partition_Clause is NULL then
the whole result set is treaed as a single partition.
SELECT
AVG(SAL) FROM EMP;
AVG(SAL)
--------------
2073.21429
SELECT
EMPNO,DEPTNO,SAL,AVG(SAL)OVER()AS AVG_SAL FROM EMP;
-----------------------------------------------------------
"EMPNO" "DEPTNO" "SAL" "AVG_DEPT_SAL"
------- -------
----- ---------------
7369 20 800
2073.214 285714285714285714285714285714286
7499 30 1600
2073.214 285714285714285714285714285714286
7521 30 1250
2073.214 285714285714285714285714285714286
7566 20 2975
2073.214 285714285714285714285714285714286
7654 30 1250
2073.214 285714285714285714285714285714286
7698 30 2850
2073.214 285714285714285714285714285714286
7782 10 2450
2073.214 285714285714285714285714285714286
7788 20 3000
2073.214 285714285714285714285714285714286
7839 10 5000
2073.214 285714285714285714285714285714286
7844 30 1500
2073.214 285714285714285714285714285714286
7876 20 1100
2073.214 285714285714285714285714285714286
7900 30 950
2073.214 285714285714285714285714285714286
7902 20 3000
2073.214 285714285714285714285714285714286
7934 10 1300
2073.214 285714285714285714285714285714286
the
above query average is based on all the
rows of the result set, same as GROUP BY CLAUSE.
If we
change the OVER clause to include a Query_Partition_Clause based on the department,
then average salary
displays
for each dept.
SELECT
EMPNO,DEPTNO,SAL,AVG(SAL)OVER(PARTITION BY DEPTNO) AS AVG_DEPT_SAL FROM EMP;
--------------------------------------------------------------------------
"EMPNO" "DEPTNO" "SAL" "AVG_DEPT_SAL"
-------
------
------- --------- --------
-------------------------
7782 10 2450 2916.666666666666666666666666666666666667
7839 10 5000 2916.666666666666666666666666666666666667
7934 10 1300 2916.666666666666666666666666666666666667
7566 20 2975 2175
7902 20 3000 2175
7876 20 1100 2175
7369 20 800
2175
7788 20 3000 2175
7521 30 1250 1566.666666666666666666666666666666666667
7844 30 1500 1566.666666666666666666666666666666666667
7499 30 1600 1566.666666666666666666666666666666666667
7900 30 950
1566.666666666666666666666666666666666667
7698 30 2850 1566.666666666666666666666666666666666667
7654 30 1250 1566.666666666666666666666666666666666667
Order_By_Clause:
==============
the
ORDER BY clause specifies how the data is sorted within each group(partition).
SELECT
EMPNO,DEPTNO,SAL,AVG(SAL) OVER(PARTITION BY DEPTNO ORDER BY DEPTNO) AS
AVG_DEPT_SAL FROM EMP;
SELECT
EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO) AS FIRST_IN_DEPT
FROM EMP;
-------------------------------------------
"EMPNO" "DEPTNO" "SAL" "FIRST_IN_DEPT"
------- -------
------ ----------------
7782 10
2450 2450
7839 10
5000 2450
7934
10 1300 2450
7566 20
2975 2975
7902 20
3000 2975
7876 20
1100 2975
7369 20
800 2975
7788 20
3000 2975
7521 30
1250 1250
7844 30
1500 1250
7499 30
1600 1250
7900 30
950 1250
7698 30
2850 1250
7654 30
1250 1250
not
to the same FIRST_VALUE function lets include order by clause to fethc the
highest salary in each
department
group.
SELECT
EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO ORDER BY SAL ASC
NULLS LAST)AS FIRST_VAL_IN_DEPT
FROM
EMP;
ASCENDING
ORDER:
------------------------------------------------
"EMPNO" "DEPTNO" "SAL" "FIRST_VAL_IN_DEPT"
7934 10 1300 1300
7782 10 2450 1300
7839 10 5000 1300
7369 20 800 800
7876 20 1100 800
7566 20 2975 800
7788 20 3000 800
7902 20 3000 800
7900 30 950 950
7654 30 1250 950
7521 30 1250 950
7844 30 1500 950
7499 30 1600 950
7698 30 2850 950
-----------------------------------------------
SELECT
EMPNO,DEPTNO,SAL,FIRST_VALUE(SAL)OVER(PARTITION BY DEPTNO ORDER BY SAL DESC
NULLS FIRST)AS FIRST_VAL_IN_DEPT
FROM
EMP;
DESCENDING
ORDER:
-----------------
"EMPNO" "DEPTNO" "SAL" "FIRST_VAL_IN_DEPT"
7839 10 5000 5000
7782 10 2450 5000
7934 10 1300 5000
7788 20 3000 3000
7902 20 3000 3000
7566 20 2975 3000
7876 20 1100 3000
7369 20 800
3000
7698 30 2850 2850
7499 30 1600 2850
7844 30 1500 2850
7654 30 1250 2850
7521 30 1250 2850
7900 30 950
2850
the
above queries ASC NULLS LAST keywords are unnecessary as ASC is the default for
an Order_By_Clause
and
NULLS LAST is the default for ASC orders.
when
ordering by DESC the default is NULLS FIRST.
Windowing_Clause:
===============
Windowing_Clause
gives further control on the partitions created on the data as it creates
another level of
boundary
on top of the partition.
Windowing_Clause
is not mandatory i.e if not mentioned then by default it considers from first
row of
a
group to the current row.
POSSIBLE
VALUES FOR start_point and end_point:
----------------------------------------------
UNBOUNDED
PRECEDING: the window start at first row of the partition.only available for
start points.
UNBOUNDED
FOLLOWING: the window ends at the last row of the partition.only available for
end points.
select
ename, deptno, sal, FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MIN_SAL_NAME FROM (SELECT * FROM EMP WHERE DEPTNO=30);
----------------------------------
ename deptno sal MIN_SAL_-NAME
------ -----
--- ------------
JAMES 30 950 JAMES
WARD 30 1250 JAMES
MARTIN 30 1250 JAMES
TURNER 30 1500 JAMES
ALLEN 30 1600 JAMES
BLAKE 30 2850 JAMES
select
ename, deptno, sal, LAST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MAX_SAL_NAME FROM (SELECT * FROM EMP WHERE DEPTNO=30);
-------------------------------
ENAME DEPTNO
SAL MAX_SAL_NAME
------
------ ------
-----------
JAMES 30 950
BLAKE
WARD 30 1250 BLAKE
MARTIN 30 1250 BLAKE
TURNER 30 1500 BLAKE
ALLEN 30 1600 BLAKE
BLAKE 30 2850 BLAKE
select
ename, deptno, sal, FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC ROWS BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MIN_SAL_NAME FROM EMP;
ENAME DEPTNO SAL MIN_SAL_NAME
---------- ------------
--------
-------------------------
SMITH 20 800 SMITH
JAMES 30 950
SMITH
ADAMS 20 1100 SMITH
WARD 30 1250 SMITH
MARTIN 30 1250 SMITH
MILLER 10 1300 SMITH
TURNER 30 1500 SMITH
ALLEN 30 1600 SMITH
CLARK 10 2450 SMITH
BLAKE 30 2850 SMITH
JONES 20 2975 SMITH
SCOTT 20 3000 SMITH
FORD 20 3000 SMITH
KING 10 5000 SMITH
Select
ename, deptno, sal, LAST_VALUE(ENAME)OVER(ORDER
BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MAX_SAL_NAME FROM EMP;
SELECT
ENAME,DEPTNO,SAL,FIRST_VALUE(ENAME)OVER(ORDER BY SAL ASC)MIN_SAL_NAME FROM
(SELECT
* FROM EMP WHERE DEPTNO=30);
----------------------------------
ename deptno
sal MIN_SAL_-NAME
------ ----- ---
------------
JAMES 30 950 JAMES
WARD 30 1250 JAMES
MARTIN 30 1250 JAMES
TURNER 30 1500 JAMES
ALLEN 30 1600 JAMES
BLAKE 30 2850 JAMES
SELECT
ENAME,DEPTNO,SAL,LAST_VALUE(ENAME)OVER(ORDER BY SAL ASC)MIN_SAL_NAME FROM
(SELECT * FROM EMP WHERE
DEPTNO=30);
------------------------------
ENAME DEPTNO SAL MAX_SAL_NAME
------ ---- ----
-------------
JAMES 30 950
JAMES
MARTIN 30 1250 WARD
WARD 30 1250 WARD
TURNER 30 1500 TURNER
ALLEN 30 1600 ALLEN
BLAKE 30 2850 BLAKE
select
ename, deptno, sal,
FIRST_VALUE(ENAME)OVER(PARTITION
BY DEPTNO ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MIN_SAL_NAME,
LAST_VALUE(ENAME)OVER(PARTITION
BY DEPTNO ORDER BY SAL ASC
ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM EMP;
select
ename, deptno, sal,
FIRST_VALUE(ENAME)OVER(PARTITION
BY DEPTNO ORDER BY SAL ASC ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED
FOLLOWING)MIN_SAL_NAME,
LAST_VALUE(ENAME)OVER(PARTITION
BY DEPTNO ORDER BY SAL ASC
ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)MAX_SAL_NAME FROM (SELECT *
FROM EMP WHERE DEPTNO IN(10,30));
MILLER 10 1300 MILLER KING
CLARK 10 2450 MILLER KING
KING 10 5000 MILLER KING
JAMES 30 950
JAMES BLAKE
MARTIN 30 1250 JAMES BLAKE
WARD 30 1250 JAMES BLAKE
TURNER 30 1500 JAMES BLAKE
ALLEN 30 1600 JAMES BLAKE
BLAKE 30 2850 JAMES BLAKE
SELECT
ENAME,DEPTNO,SAL,SUM(SAL)OVER(ORDER BY SAL ROWS BETWEEN 1 PRECEDING AND 1
FOLLOWING)SLIDESAL
FROM
EMP;
-------------------------------------------------------------
ENAME DEPTNO SAL SLIDESAL
----- ------ --- -------------
SMITH 20 80 1750
JAMES 30 950 2850
ADAMS 20 1100 3300
WARD 30 1250 3600
MARTIN 30 1250 3800
MILLER 10 1300 4050
TURNER 30 1500 4400
ALLEN 30 1600 5550
CLARK 10 2450 6900
BLAKE 30 2850 8275
JONES 20 2975 8825
SCOTT 20 3000 8975
FORD 20 3000 11000
KING 10 5000 8000
ROW_NUMBER():
=============
ROW_NUMBER
assigns a unique number to each row of same window in the ordered sequence of
rows specified by order_by_clause.
SELECT
ROWNUM,ENAME,SAL FROM EMP;
SELECT
DEPTNO,ENAME,ROW_NUMBER()OVER(ORDER BY HIREDATE NULLS LAST)SEQNO
FROM
EMP;
the
above both queries are similar
SELECT
DEPTNO,ENAME,SAL,ROW_NUMBER()OVER(PARTITION BY DEPTNO
ORDER
BY SAL DESC NULLS LAST)RR FROM EMP;
-----------------------------
DEPTNO
ENAME SAL RR
-----
----- ---- -----------
10 KING 5000 1
10 CLARK 2450 2
10 MILLER 1300 3
20 SCOTT 3000 1
20 FORD 3000 2
20 JONES 2975 3
20 ADAMS 1100 4
20 SMITH 800 5
30 BLAKE 2850 1
30 ALLEN 1600 2
30 TURNER 1500 3
30 MARTIN 1250 4
30 WARD 1250 5
30 JAMES 950 6
SELECT
E1.* FROM(SELECT DEPTNO,ENAME,ROW_NUMBER()OVER(PARTITION BY DEPTNO
ORDER
BY SAL DESC NULLS LAST)SEQNO FROM EMP)E1
WHERE
E1.SEQNO<=3;
-----------------------
DEPTNO ENAME SAL RR
-----
-----
--- -----
10 KING 5000 1
10 CLARK 2450 2
10 MILLER 1300 3
20 SCOTT 3000 1
20 FORD 3000 2
20 JONES 2975 3
30 BLAKE 2850 1
30 ALLEN 1600 2
30 TURNER 1500 3
Difference Between
RANK(),ROW_NUMBER(), DENSE_RANK()
RANK():
-------
RANK is almost same as ROW_NUMBER but rows
with equal values, within same window, for on which order by clause is
specified receive the same rank but next row receives RANK as per it
ROW_NUMBER.
One of the most obvious and useful
set of window functions are ranking functions where rows from your result set
are ranked according to a certain scheme. There are three ranking functions:
ROW_NUMBER()
RANK()
DENSE_RANK()
The
difference is easy to remember. For the examples, let’s assume we have this
table (using PostgreSQL syntax):
CREATE
TABLE t(v) AS
SELECT
* FROM (
VALUES('a'),('a'),('a'),('b'),
('c'),('c'),('d'),('e')
)
t(v)
ROW_NUMBER():
------------------------
…
assigns unique numbers to each row within the PARTITION given the ORDER BY
clause. So you’d get:
SELECT
v, ROW_NUMBER() OVER()
FROM
t
Note that
some SQL dialects (e.g. SQL Server) require an explicit ORDER BY clause in the
OVER() clause:
SELECT
v, ROW_NUMBER() OVER(ORDER BY v)
FROM
t
The
above query returns:
| V |
ROW_NUMBER |
|---| ------------|
| a
| 1 |
| a
| 2 |
| a | 3 |
| b
| 4 |
| c
| 5 |
| c
| 6 |
| d
| 7 |
| e
| 8 |
(see
also this SQLFiddle)
RANK():
-------
…
behaves like ROW_NUMBER(), except that “equal” rows are ranked the same. If we
substitute RANK() into our previous query:
SELECT
v, RANK() OVER(ORDER BY v)
FROM
t
…
then the result we’re getting is this:
| V |
RANK |
|---|------|
| a
| 1 |
| a
| 1 |
| a
| 1 |
| b
| 4 |
| c
| 5 |
| c
| 5 |
| d
| 7 |
| e
| 8 |
(see also
this SQLFiddle)
As
you can see, much like in a sports ranking, we have gaps between the different
ranks. We can avoid those gaps by using
DENSE_RANK()
Trivially,
DENSE_RANK() is a rank with no gaps, i.e. it is “dense”. We can write:
SELECT
v, DENSE_RANK() OVER(ORDER BY v)
FROM
t
… to
obtain
| V |
DENSE_RANK |
|---| ------------|
| a
| 1 |
| a
| 1 |
| a
| 1 |
| b
| 2 |
| c
| 3 |
| c
| 3 |
| d
| 4 |
| e
| 5 |
(see
also this SQLFiddle)
One
interesting aspect of DENSE_RANK() is the fact that it “behaves like”
ROW_NUMBER() when we add the DISTINCT keyword.
SELECT
DISTINCT v, DENSE_RANK() OVER(ORDER BY v)
FROM
t
… to
obtain
| V |
DENSE_RANK |
|---|------------|
| a |
1 |
| b
| 2 |
| e
| 5 |
| d
| 4 |
| c
| 3 |
(see
also this SQLFiddle)
In
fact, ROW_NUMBER() prevents you from using DISTINCT, because ROW_NUMBER()
generates unique values across the partition before DISTINCT is applied:
SELECT
DISTINCT v, ROW_NUMBER() OVER(ORDER BY v)
FROM
t
ORDER
BY 1, 2
DISTINCT
has no effect:
| V |
ROW_NUMBER |
|---|------------|
| a
| 1 |
| a
| 2 |
| a
| 3 |
| b
| 4 |
| c
| 5 |
| c
| 6 |
| d
| 7 |
| e
| 8 |
(see
also this SQLFiddle)
Putting
it all together
A
good way to understand the three ranking functions is to see them all in action
side-by-side. Run this query
SELECT
v,
ROW_NUMBER() OVER(ORDER BY v),
RANK()
OVER(ORDER BY v),
DENSE_RANK() OVER(ORDER BY v)
FROM
t
ORDER
BY 1, 2
… or
this one (using the SQL standard WINDOW clause, to reuse window
specifications):
SELECT
v,
ROW_NUMBER() OVER(w),
RANK()
OVER(w),
DENSE_RANK() OVER(w)
FROM
t
WINDOW
w AS (ORDER BY v)
… to
obtain:
| V |
ROW_NUMBER | RANK | DENSE_RANK |
|---|------------|------|------------|
| a
| 1 | 1 |
1 |
| a
| 2 | 1
| 1 |
| a
| 3 |
1 | 1 |
| b
| 4 | 4 |
2 |
| c
| 5 | 5 |
3 |
| c
| 6 | 5 |
3 |
| d
| 7 | 7 |
4 |
| e
| 8 | 8 |
5 |
No comments:
Post a Comment