How to Explode Arrays with Presto
Learn how to use the equivalent of HiveQL’s lateral view explode in Presto and Trino.
If you ever come across a column in a relational table which includes arrays or maps (key value pairs), then you’ll usually want to expand the array so that each element gets its own row. This makes it much easier to use the data in your analysis.
Lateral view explode
If you’re using SQL or HiveQL then you can use a
LATERAL VIEW EXPLODE statement.
It puts each element of the array into its own row which is combined with the other columns of the table - confused yet? Don’t worry! - there’s an example coming up. 🤔
I came across this situation when I was working on a piece of analysis to map out which channels customers interact with in the 30 days pre-purchase. Luckily for me, the data already existed at transaction level. But, as the table maintained one row per transaction, the channels which customers interacted with were in an array.
Presto and Trino don’t have the lateral view explode function which you’d find in SQL or HiveQL, but there is an alternative - you can use the
UNNEST() function with a
Before we look at how to do this in Presto and Trino, let’s first look at an example of what a lateral view explode can be used for.
Example of a lateral view explode
A lateral view explode expands your array elements across multiple rows.
As you can see in the example below, the
channel_array elements have now been distributed across multiple rows for each
Before lateral view explode
After lateral view explode
How can I do this with Presto and Trino?
You can use
CROSS JOIN UNNEST as in this query:
SELECT a.transaction_id, b.n AS channel_number, b.channel FROM db.transactions a CROSS JOIN UNNEST(a.channel_path) WITH ORDINALITY AS b (channel, n)
There are a few things to unpack here. Let’s go through them one by one:
UNNEST(): This function takes the array column from the original table and expands into individual rows -
channel_arraycolumn from the
db.transactionstable in this example.
CROSS JOIN: The result of the
UNNEST()function is joined to all rows in the original table, giving you a repeated row whenever you have more than one element in the array (see repeated values of
transaction_idin the example).
WITH ORDINALITY: This is optional. It brings back the item position in the array - column
nin the example above. If you don’t use
WITH ORDINALITYthen you don’t need to specify
CROSS JOIN UNNEST(a.channel_path) AS b (channel, n)
(channel, n): These specify what you want the expanded array elements to be called (
channelin this example) as well as the position in the array (
nin this example). They can be called anything you like!
b: You can use this alias to specify columns from the
CROSS JOIN UNNESTin the
SELECTstatement using any alias you give it -
bin this example.
Note that if you’re doing this on a mapping of key value pairs instead of an array then you can remove the redundant
WITH ORDINALITY statement and refer to the key values using
b.value in the
SELECT statement when
value are defined like this:
CROSS JOIN UNNEST(a.channel_path) AS b (key, value)
Now you now know how to perform a lateral view explode in Presto and Trino.
This is a handy trick to know as more processes use API data returned in JSON format which finds its way downstream to the analysts!