Concatenating rows in Redshift, Postgres, & MySQL
- Blog
- Tech Talk
Sometimes it’s helpful to look at an aggregated overview of many rows. With numeric columns, it’s easy to sum or average many values, but for string columns we need something different. We can concatenate strings from multiple rows with concatenating…
Sometimes it’s helpful to look at an aggregated overview of many rows. With numeric columns, it’s easy to sum or average many values, but for string columns we need something different. We can concatenate strings from multiple rows with concatenating aggregations.
Imagine we have a table of salespeople and the customers they manage:
We can compact this and make it more human-readable by flattening the many rows per salesperson into a single row:
This can be done with any concatenating aggregation. There’s group_concat in MySQL, string_agg in Postgres, and listagg in redshift.
Let’s first look at how to use each of these functions. Then, we’ll cover a few tricks with concatenating aggregations.
Redshift listagg
Redshift recently added listagg, and a corresponding window function version. The basic usage is:
select
salesperson,
listagg(customer, ', ')
within group (order by contract_size desc)
from customers
group by 1
The within group clause is an uncommon bit of syntax, but all it does is allow for ordering the rows used in an aggregation. Here, it orders all of the customers by contract size, then listagg concatenates their names with a comma separator.
Postgres string_agg
select
salesperson,
string_agg(customer , ', ' order by contract_size desc)
from customers
group by 1
Postgres has a robust set of aggregation functions, and you can similarly aggregate multiple rows with array_agg and json_agg.
By combining array_to_string with array_agg, you can duplicate the behavior of string_agg.
MySQL group_concat
select
salesperson,
group_concat(customer
order by contract_size desc
separator ',')
from customers
group by 1
MySQL supports an optional distinct operator in group_concat:
select group_concat(distinct customer)
from customers
MySQL has a relatively low cap of 1024 characters of the length of a group_concat result, but it can be increased by changing the group_concat_max_len system variable.
SQL Server workarounds
There’s no concatenating aggregation in SQL Server, but there are workarounds based on XML path.
Compound concatenation
While the customer names are nice, it would be better if we could include contract size in our collapsed table:
For the remaining examples, we’ll use Redshift’s dialect.
Because these concatenating aggregations concatenate expressions, and not just column names, we can easily accomplish this by pre-concatenating the customer name and contract size:
select
salesperson,
listagg(customer || ' $' || contract_size, ', ')
within group (order by contract_size desc)
from customers
group by 1
Concatenating a limited number of values
In a real database, concatenating all the rows for a given group will often create a string of unwieldy length. If we use a subquery to limit the number of rows in a given group first, then our final concatenation will be more readable:
with numbered_customers as (
select
*,
row_number() over (partition by salesperson
order by contract_size desc)
from customers
)
select
salesperson,
listagg(customer || ' $' || contract_size, ', ')
within group (order by contract_size desc)
from numbered_customers
where row_number <= 3
group by 1
We first declare numbered_customers, which numbers every customer per salesperson, starting with the largest contracts. Then we concatenate only the first three rows for each salesperson. Nice and tidy.
Happy concatenating!
We hope you learned a few tricks for formatting SQL tables for human reading.