SQL Aggregate Functions

Aggregate functions compute a single result from a set of input values. The built-in aggregate functions are listed below.

The aggregate functions that accept an expression parameter can be modified by the keywords DISTINCT or ALL. If neither is specified, the result is the same as if ALL were specified.

DISTINCT

Modifies the expression to include only distinct values that are not NULL

ALL

Includes all rows where expression is not NULL

AVG ( [DISTINCT|ALL] expression )

Average (arithmetic mean) of all input expression values

COUNT (*)

Number of input rows, including rows that are NULL

COUNT ( [DISTINCT|ALL] expression )

Number of input rows where expression value is not equal to NULL

MAX ( [DISTINCT|ALL] expression )

Maximum value of expression across all input values

MIN ( [DISTINCT|ALL] expression )

Minimum value of expression across all input values

SUM ( [DISTINCT|ALL] expression )

Sum of expression across all input values

LISTAGG ( [DISTINCT|ALL] expression,'delimiter') [ WITHIN GROUP ( ORDER BY order_list [ASC|DESC] ) ]

Displays the order of data within each group specified in the ORDER BY clause and then concatenates the values of the measure column.

  • As a single-set aggregate function, LISTAGG operates on all rows and returns a single output row.

  • As a group-set aggregate, the function operates on and returns an output row for each group defined by the GROUP BY clause.

  • As an analytic function, LISTAGG partitions the query result set into groups based on one or more expression in the query_partition_clause.

STDDEV_SAMP( [DISTINCT|ALL] expression )

The STDDEV_SAMP() function computes the estimation of the standard deviation based on the unbiased sample variance VAR_SAMP(X). This function can be used when the values of the column are a subset of a larger population. The formula use is STDDEV_SAMP(X) = SQRT(VAR_SAMP(X)).

STDDEV_POP( [DISTINCT|ALL] expression )

The STDDEV_POP() function computes the standard deviation of the values in a column. The standard deviation is a measure of the variability of the values around the mean value. As for the variance, NULL values are ignored for the computation of the standard deviation. The formula used for computation is STDDEV_POP(X) = SQRT(VAR_POP(X)).

VAR_SAMP( [DISTINCT|ALL] expression )

The VAR_SAMP() function computes the unbiased sample variance of the values in a column, ignoring NULL values. This is an estimation of the variance of a larger dataset where the column is only a sample of the entire population. The formula used is VAR_SAMP(X) = (SUM(X*X) - SUM(X)*SUM(X)/COUNT(X)) / (COUNT(X) -1) when there are at least two non-NULL values in the column, otherwise NULL.

VAR_POP( [DISTINCT|ALL] expression )

The VAR_POP() function computes the mathematical variance of the values in a column - ignoring NULL values. The variance is the expected value of the squared deviation from the mean. The formula used for VAR_POP(X) is VAR_POP(X) = (SUM(X*X) - SUM(X)*SUM(X)/COUNT(X)) / COUNT(X)), where there is at least on non-null value in the column, otherwise NULL.

GROUPING ( expression )

GROUPING is an aggregate function used to determine whether an expression is part of the GROUP BY key. The GROUPING aggregate function evaluates to 0 if the expression is part of the GROUP BY key and 1 otherwise. When used in combination with ROLLUP, CUBE or GROUPING SETS it can be used to distinguish columns containing NULL values and columns that are NULL because the columns are not part of the grouping set.

Examples

AVG()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT AVG(a) FROM one;
 [AVG]
 ------
  12.5
SELECT AVG(a+b) FROM one;
 [AVG]
 ------
   15
SELECT AVG(a+b+c) FROM one;
 [AVG]
 ------
  27.5

COUNT(*)

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT COUNT(*) FROM one;
 [COUNT]
 --------

    4

COUNT(expression)

CREATE TABLE tst_table (column1 string);
INSERT INTO tst_table VALUES ('a'),('b'),('c'),(NULL);
SELECT COUNT(*) FROM tst_table;
 [COUNT]
 --------
    4
SELECT COUNT(column1) FROM tst_table;
 [COUNT]
 --------
    3

COUNT( DISTINCT | ALL expression)

CREATE TABLE two (a INT, b DOUBLE, c STRING);
INSERT INTO two VALUES(21,1,21), (22,2,22), (21,1,21), (21,1,21), (21,1,21), (22,2,22),(21,null,null);
SELECT COUNT(*) FROM two;
 [COUNT]
 --------
    7
SELECT COUNT(ALL c) FROM two;
 [COUNT]
 --------
    6
SELECT COUNT(DISTINCT a ) FROM two;
 [COUNT]
 --------
    2
SELECT COUNT(DISTINCT(a+b+c)) FROM two;
 [COUNT]
 --------
    2

LISTAGG()

SELECT LISTAGG(name, '; ') "Player List",
  MIN(number) "Min Jersey #"
  FROM hockey
  WHERE position = 'Forward' and number < 20;
                        Player List                         Min Jersey #
 ---------------------------------------------------------- -------------
 GREGORY CAMPBELL; NATHAN HORTON; MILAN LUCIC; TYLER SEGUIN      11
SELECT LISTAGG(name, '; ')
  WITHIN GROUP (ORDER BY number, name asc) "Player List",
  MIN(number) "Min Jersey #"
  FROM hockey
  WHERE position = 'Forward' and number < 20;
                        Player List                         Min Jersey #
 ---------------------------------------------------------- -------------
 GREGORY CAMPBELL; MILAN LUCIC; NATHAN HORTON; TYLER SEGUIN      11

Example with LISTAGG() in a SELECT with a GROUP BY clause:

CREATE TABLE t1 (f1 int, f2 string);
INSERT TABLE INTO t1 VALUES (1,'one'),(2,'two'),(3,'three'),(2, 'another two');
SELECT LISTAGG (F2, ',') FROM t1;
          [LISTAGG]
  -------------------------
  one,two,three,another two
SELECT LISTAGG (F2, ',') FROM t1 GROUP BY f1;
     [LISTAGG]
  ---------------
  one
  two,another two
  three

MAX()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT MAX(a) FROM one;
 [MAX]
 ------
   14

MIN()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT MIN(a) FROM one;
 [MIN]
 ------
   11

SUM()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT SUM(a) FROM one;
 [SUM]
 ------
   50

STDDEV_SAMP()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT STDDEV_SAMP(a) FROM one;
    [STDDEV_SAMP]
 ------------------
  1.118033988749895

STDDEV_POP()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT STDDEV_POP(a) FROM one;
    [STDDEV_POP]
 ------------------
       1.12

VAR_SAMP()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT VAR_SAMP(a) FROM one;
    [VAR_SAMP]
 ------------------
       1.25

VAR_POP()

CREATE TABLE one (a INT, b DOUBLE, c STRING);
INSERT INTO one
VALUES (11,1,11), (12,2,12), (13,3,13), (14,4,14);
SELECT VAR_POP(a) FROM one;
    [VAR_POP]
 ------------------
      1.25

GROUPING ()

SELECT country,
       state,
       city
FROM   employee;
 COUNTRY   STATE   CITY
 -------- ------- ------

 germany  bavaria munich
 germany  bavaria <null>
SELECT country,
       state,
       city,
       Grouping(city),
       Count(*)
FROM   employee
GROUP  BY rollup ( ( country ), ( state ), ( city ) );
 COUNTRY   STATE   CITY  [GROUPING]  [COUNT]
 -------- ------- ------ ----------- --------

 germany  bavaria munich      0         1
 germany  bavaria <null>      0         1
 germany  bavaria <null>      1         2
 germany  <null>  <null>      1         2
 <null>   <null>  <null>      1         2