How to Explode Arrays with Presto
Learn how to use the equivalent of HiveQL’s lateral view explode in Presto and Trino.
If you ever come across a column in a relational table which includes arrays or maps (key value pairs), then you’ll usually want to expand the array so that each element gets its own row. This makes it much easier to use the data in your analysis.
Lateral view explode
If you’re using SQL or HiveQL then you can use a LATERAL VIEW EXPLODE
statement.
It puts each element of the array into its own row which is combined with the other columns of the table - confused yet? Don’t worry! - there’s an example coming up. 🤔
I came across this situation when I was working on a piece of analysis to map out which channels customers interact with in the 30 days pre-purchase. Luckily for me, the data already existed at transaction level. But, as the table maintained one row per transaction, the channels which customers interacted with were in an array.
Presto and Trino don’t have the lateral view explode function which you’d find in SQL or HiveQL, but there is an alternative - you can use the UNNEST()
function with a CROSS JOIN
Before we look at how to do this in Presto and Trino, let’s first look at an example of what a lateral view explode can be used for.
Example of a lateral view explode
A lateral view explode expands your array elements across multiple rows.
As you can see in the example below, the channel_array
elements have now been distributed across multiple rows for each customer_id
.
Before lateral view explode
transaction_id | channel_array |
---|---|
11111 | ["Email","App","Website"] |
22222 | ["Social","Website"] |
33333 | ["Website","App","Email","App"] |
After lateral view explode
transaction_id | channel_number | channel |
---|---|---|
11111 | 1 | |
11111 | 2 | App |
11111 | 3 | Website |
22222 | 1 | Social |
22222 | 2 | Website |
33333 | 1 | Website |
33333 | 2 | App |
33333 | 3 | |
33333 | 4 | App |
How can I do this with Presto and Trino?
You can use CROSS JOIN UNNEST
as in this query:
SELECT
a.transaction_id,
b.n AS channel_number,
b.channel
FROM db.transactions a
CROSS JOIN UNNEST(a.channel_path) WITH ORDINALITY AS b (channel, n)
There are a few things to unpack here. Let’s go through them one by one:
UNNEST()
: This function takes the array column from the original table and expands into individual rows -channel_array
column from thedb.transactions
table in this example.CROSS JOIN
: The result of theUNNEST()
function is joined to all rows in the original table, giving you a repeated row whenever you have more than one element in the array (see repeated values oftransaction_id
in the example).WITH ORDINALITY
: This is optional. It brings back the item position in the array - columnn
in the example above. If you don’t useWITH ORDINALITY
then you don’t need to specifyn
here:CROSS JOIN UNNEST(a.channel_path) AS b (channel, n)
(channel, n)
: These specify what you want the expanded array elements to be called (channel
in this example) as well as the position in the array (n
in this example). They can be called anything you like!b
: You can use this alias to specify columns from theCROSS JOIN UNNEST
in theSELECT
statement using any alias you give it -b
in this example.
Note that if you’re doing this on a mapping of key value pairs instead of an array then you can remove the redundant WITH ORDINALITY
statement and refer to the key values using b.key
and b.value
in the SELECT
statement when b
, key
and value
are defined like this:
CROSS JOIN UNNEST(a.channel_path) AS b (key, value)
Now you now know how to perform a lateral view explode in Presto and Trino.
This is a handy trick to know as more processes use API data returned in JSON format which finds its way downstream to the analysts!