SQL Aggregate Functions
Aggregate functions compute a single result from a set of input values. The built-in aggregate functions are listed below.
The aggregate functions that accept an expression parameter can be modified by the keywords DISTINCT
or ALL
. If neither is specified, the result is the same as if ALL
were specified.
DISTINCT |
Modifies the expression to include only distinct values that are not |
ALL |
Includes all rows where expression is not |
AVG ( [DISTINCT|ALL] expression )
-
Average (arithmetic mean) of all input
expression
values COUNT (*)
-
Number of input rows, including rows that are
NULL
COUNT ( [DISTINCT|ALL] expression )
-
Number of input rows where expression value is not equal to
NULL
MAX ( [DISTINCT|ALL] expression )
-
Maximum value of expression across all input values
MIN ( [DISTINCT|ALL] expression )
-
Minimum value of expression across all input values
SUM ( [DISTINCT|ALL] expression )
-
Sum of
expression
across all input values LISTAGG ( [DISTINCT|ALL] expression,'delimiter') [ WITHIN GROUP ( ORDER BY order_list [ASC|DESC] ) ]
-
Displays the order of data within each group specified in the ORDER BY clause and then concatenates the values of the measure column.
-
As a single-set aggregate function, LISTAGG operates on all rows and returns a single output row.
-
As a group-set aggregate, the function operates on and returns an output row for each group defined by the
GROUP BY
clause. -
As an analytic function, LISTAGG partitions the query result set into groups based on one or more expression in the query_partition_clause.
-
STDDEV_SAMP( [DISTINCT|ALL] expression )
-
The
STDDEV_SAMP()
function computes the estimation of the standard deviation based on the unbiased sample varianceVAR_SAMP(X)
. This function can be used when the values of the column are a subset of a larger population. The formula use isSTDDEV_SAMP(X) = SQRT(VAR_SAMP(X))
. STDDEV_POP( [DISTINCT|ALL] expression )
-
The
STDDEV_POP()
function computes the standard deviation of the values in a column. The standard deviation is a measure of the variability of the values around the mean value. As for the variance, NULL values are ignored for the computation of the standard deviation. The formula used for computation isSTDDEV_POP(X) = SQRT(VAR_POP(X))
. VAR_SAMP( [DISTINCT|ALL] expression )
-
The
VAR_SAMP()
function computes the unbiased sample variance of the values in a column, ignoring NULL values. This is an estimation of the variance of a larger dataset where the column is only a sample of the entire population. The formula used isVAR_SAMP(X) = (SUM(X*X) - SUM(X)*SUM(X)/COUNT(X)) / (COUNT(X) -1)
when there are at least two non-NULL values in the column, otherwise NULL. VAR_POP( [DISTINCT|ALL] expression )
-
The
VAR_POP()
function computes the mathematical variance of the values in a column - ignoring NULL values. The variance is the expected value of the squared deviation from the mean. The formula used forVAR_POP(X)
isVAR_POP(X) = (SUM(X*X) - SUM(X)*SUM(X)/COUNT(X)) / COUNT(X))
, where there is at least on non-null value in the column, otherwise NULL. GROUPING ( expression )
-
GROUPING
is an aggregate function used to determine whether an expression is part of theGROUP BY
key. TheGROUPING
aggregate function evaluates to0
if the expression is part of theGROUP BY
key and1
otherwise. When used in combination withROLLUP
,CUBE
orGROUPING SETS
it can be used to distinguish columns containingNULL
values and columns that areNULL
because the columns are not part of the grouping set.
Examples
AVG()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT AVG(a) FROM one;
[AVG]
------
12.5
SELECT AVG(a+b) FROM one;
[AVG]
------
15
SELECT AVG(a+b+c) FROM one;
[AVG]
------
27.5
COUNT(*)
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT COUNT(*) FROM one;
[COUNT]
--------
4
COUNT(expression)
CREATE TABLE tst_table (column1 string);
INSERT INTO tst_table VALUES ('a'),('b'),('c'),(NULL);
SELECT COUNT(*) FROM tst_table;
[COUNT]
--------
4
SELECT COUNT(column1) FROM tst_table;
[COUNT]
--------
3
COUNT( DISTINCT | ALL expression)
CREATE TABLE two (a INT, b DOUBLE, c STRING);
INSERT INTO two VALUES(21,1,21), (22,2,22), (21,1,21), (21,1,21), (21,1,21), (22,2,22),(21,null,null);
SELECT COUNT(*) FROM two;
[COUNT]
--------
7
SELECT COUNT(ALL c) FROM two;
[COUNT]
--------
6
SELECT COUNT(DISTINCT a ) FROM two;
[COUNT]
--------
2
SELECT COUNT(DISTINCT(a+b+c)) FROM two;
[COUNT]
--------
2
LISTAGG()
SELECT LISTAGG(name, '; ') "Player List",
MIN(number) "Min Jersey #"
FROM hockey
WHERE position = 'Forward' and number < 20;
Player List Min Jersey #
---------------------------------------------------------- -------------
GREGORY CAMPBELL; NATHAN HORTON; MILAN LUCIC; TYLER SEGUIN 11
SELECT LISTAGG(name, '; ')
WITHIN GROUP (ORDER BY number, name asc) "Player List",
MIN(number) "Min Jersey #"
FROM hockey
WHERE position = 'Forward' and number < 20;
Player List Min Jersey #
---------------------------------------------------------- -------------
GREGORY CAMPBELL; MILAN LUCIC; NATHAN HORTON; TYLER SEGUIN 11
Example with LISTAGG()
in a SELECT
with a GROUP BY
clause:
CREATE TABLE t1 (f1 int, f2 string);
INSERT TABLE INTO t1 VALUES (1,'one'),(2,'two'),(3,'three'),(2, 'another two');
SELECT LISTAGG (F2, ',') FROM t1;
[LISTAGG]
-------------------------
one,two,three,another two
SELECT LISTAGG (F2, ',') FROM t1 GROUP BY f1;
[LISTAGG]
---------------
one
two,another two
three
MAX()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT MAX(a) FROM one;
[MAX]
------
14
MIN()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT MIN(a) FROM one;
[MIN]
------
11
SUM()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT SUM(a) FROM one;
[SUM]
------
50
STDDEV_SAMP()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT STDDEV_SAMP(a) FROM one;
[STDDEV_SAMP]
------------------
1.118033988749895
STDDEV_POP()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT STDDEV_POP(a) FROM one;
[STDDEV_POP]
------------------
1.12
VAR_SAMP()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT VAR_SAMP(a) FROM one;
[VAR_SAMP]
------------------
1.25
VAR_POP()
CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT VAR_POP(a) FROM one;
[VAR_POP]
------------------
1.25
GROUPING ()
SELECT country,
state,
city
FROM employee;
COUNTRY STATE CITY
-------- ------- ------
germany bavaria munich
germany bavaria <null>
SELECT country,
state,
city,
Grouping(city),
Count(*)
FROM employee
GROUP BY rollup ( ( country ), ( state ), ( city ) );
COUNTRY STATE CITY [GROUPING] [COUNT]
-------- ------- ------ ----------- --------
germany bavaria munich 0 1
germany bavaria <null> 0 1
germany bavaria <null> 1 2
germany <null> <null> 1 2
<null> <null> <null> 1 2