Split multiline pipe-delimited string into columns and rows in MySQL


Do you have an output that is in comma or pipe-delimited format and you want to go digging in a MySQL database for the data that corresponds to that output? Don’t feel like cutting and pasting and/or scripting in a language better suited for the task and just want it all to happen in MySQL SQL?

I highly advise against this.

But if you insist, here’s an example:

select * from (select substring_index(substring_index(
  substring_index(
    substring_index(
      '1234555 | 3c4004d2-b2a1-4a0c-face-edaac27ef695 | 23
      1234556 | 03d607a3-c2d2-face-870f-26bb551d6da7 | 12
      2344464 | face7729-a1e1-4df8-9dd1-ac73c3383e4a | 19
      2344344 | 9d49fc59-face-47a9-801d-49ffed1dfb56 | 91', '\n', list.i),
    '\n', -1), '|', 2), '|', -1) as uuid
from (select t.i * 2 + o.i + 1 as I
  from (select 0 as i union select 1) as o -- ones
    cross join (select 0 as i union select 1) as t  -- twos
  ) as list
order by list.i) as uuids;

Great. How does it work?

Let’s start with the subselect for list:

select 0 as i union select 1 returns a set of rows whose contents are in a single column, with values 0 and 1. The cross join of two copies of this query creates all permutations of 0 and 1 from the o (ones) select with 0 and 1 from the t (twos) select. I did this as binary digits because 4 was an even power of two, but you could do base ten with:

from (select 0 as i union select 1 union select 2 union select 3...

With the permutations of rows, I just add t.i * 2 + o.i to get values 00 through 11 in binary and then add 1 to them. This calculated value will be aliased as i in the subselect list.

Getting the rows

substring_index(substring_index('multiline string goes here', '\n', list.i),'\n',-1) gets the string up to the list.ith delimiter (in other words, list.i rows) in the inner substring_index and the gets the last occurrence between the '\n' delimiters (last row) with the outer substring_index with the -1 argument.

Getting a specific pipe-delimited field

Getting the uuid (second) pipe-delimited only requires repeated the substring_index pattern but with substring_index(substring_index(row returned from the above section), '|', 2), '|', -1). The '|' and the 2 gets up to the second pipe and then the '|' and the -1 gets the last occurrence.

Don’t do this

This is a terrible idea for multiple reasons, not the least of which is that it was enough material for a blog post just trying to explain how it worked!


Leave a Reply

%d bloggers like this: