I've been using
GROUP BY for all types of aggregate queries over the years. Recently, I've been reverse-engineering some code that uses
PARTITION BY to perform aggregations. In reading through all the documentation I can find about
PARTITION BY, it sounds a lot like
GROUP BY, maybe with a little extra functionality added in? Are they two versions of the same general functionality, or are they something different entirely?
They're used in different places.
group by modifies the entire query, like:
select customerId, count(*) as orderCount from Orders group by customerId
partition by just works on a window function, like
select row_number() over (partition by customerId order by orderId) as OrderNumberForThisCustomer from Orders
group by normally reduces the number of rows returned by rolling them up and calculating averages or sums for each row.
partition by does not affect the number of rows returned, but it changes how a window function's result is calculated.
We can take a simple example.
Consider a table named
TableA with the following values:
id firstname lastname Mark ------------------------------------------------------------------- 1 arun prasanth 40 2 ann antony 45 3 sruthy abc 41 6 new abc 47 1 arun prasanth 45 1 arun prasanth 49 2 ann antony 49
The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.
In more simple words GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
SELECT expression1, expression2, ... expression_n, aggregate_function (aggregate_expression) FROM tables WHERE conditions GROUP BY expression1, expression2, ... expression_n;
We can apply
GROUP BY in our table:
select SUM(Mark)marksum,firstname from TableA group by id,firstName
marksum firstname ---------------- 94 ann 134 arun 47 new 41 sruthy
In our real table we have 7 rows and when we apply
GROUP BY id, the server group the results based on
In simple words:
GROUP BYnormally reduces the number of rows returned by rolling them up and calculating
Sum()for each row.
Before going to PARTITION BY, let us look at the
According to the MSDN definition:
OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
PARTITION BY will not reduce the number of rows returned.
We can apply PARTITION BY in our example table:
SELECT SUM(Mark) OVER (PARTITION BY id) AS marksum, firstname FROM TableA
marksum firstname ------------------- 134 arun 134 arun 134 arun 94 ann 94 ann 41 sruthy 47 new
Look at the results - it will partition the rows and returns all rows, unlike GROUP BY.
partition by doesn't actually roll up the data. It allows you to reset something on a per group basis. For example, you can get an ordinal column within a group by partitioning on the grouping field and using
rownum() over the rows within that group. This gives you something that behaves a bit like an identity column that resets at the beginning of each group.