Selecting only one row per group
- Blog
- Tech Talk
Selecting the First Row for each Group Sometimes you just want to graph the winners. Were there more iOS or Android users today? Grouping and counting the daily usage per platform is easy, but getting only the top platform for…
Selecting the First Row for each Group
Sometimes you just want to graph the winners. Were there more iOS or Android users today? Grouping and counting the daily usage per platform is easy, but getting only the top platform for each day can be tough.
Unlike joining only the first row, primary and foreign keys won’t be of much use, so we’ll need a different approach to make a chart like this:
Let’s start with daily counts per platform from our gameplays table:
select date(created_at) dt, platform, count(1) ct
from gameplays
group by 1, 2
This gives us a familiar table, with one date spanning multiple rows:
dt | platform | ct |
2014-06-30 | iOS | 49751 |
2014-06-30 | android | 80781 |
2014-06-29 | iOS | 158909 |
2014-06-29 | android | 91380 |
2014-06-28 | iOS | 108206 |
2014-06-28 | android | 95363 |
2014-06-27 | iOS | 105756 |
2014-06-27 | android | 92316 |
We want a table with one row per date with the highest count, like this:
dt | platform | ct |
2014-06-30 | android | 80781 |
2014-06-29 | iOS | 158909 |
2014-06-28 | iOS | 108206 |
2014-06-27 | iOS | 105756 |
Postgres and Redshift
As usual on Postgres and Redshift, window functions make this an easy task. We’ll use the row_number() function partitioned by date in an inner query, and then filter to row_num = 1 in the outer query to get just the first record per group.
This function numbers each of the rows:
row_number() over (partition by dt order by ct desc) row_num
We’ll plug it into an inner query, like so, to get the desired results:
select dt, platform, ct
from (
select
date(created_at) dt,
platform,
count(1) ct,
row_number() over
(partition by dt order by ct desc) row_num
from gameplays
group by 1, 2
) t
where row_num = 1
MySQL
Since MySQL doesn’t have window functions, we’ll do something similar using group_concat. With group_concat we can roll up the platform column into a comma-separated string per-date, ordered by its count:
group_concat(platform order by ct desc) platform
That’ll give us all the platforms, with the highest-performing platform first.
The complete query looks like this:
select
dt,
group_concat(platform order by ct desc) platform
from (
select date(created_at) dt, platform, count(1) ct
from gameplays
group by 1, 2
) t
group by 1
Which gives us results like this:
dt | platform |
2014-06-30 | android, iOS |
2014-06-29 | iOS, android |
2014-06-28 | iOS, android |
2014-06-27 | iOS, android |
2014-06-26 |
android, iOS |
2014-06-25 | iOS, android |
Perfect! The platforms with the highest counts are first in the list. Using substring_index — which grabs the first word before the comma — we can extract out only the first platform:
substring_index(
group_concat(
platform order by ct desc
)
, ',',1) platform,
Once we know the highest-performing platform each day, we can use max(ct) to get the count associated with that platform.
The complete query:
select
dt,
substring_index(
group_concat(
platform order by ct desc
), ',',1
) platform,
max(ct) ct
from (
select date(created_at) dt, platform, count(1) ct
from gameplays
group by 1, 2
) t
group by 1
And that’s all there is to it!