That's an interesting idea, but it also means that both sets must have the same number of entries. That might be a reasonable extension of the approach under discussion. I've already created a ticket for this on GitHub.
Syllable-break detection is an interesting problem. I think providing a syllable template would give us a good place to start. The biggest problem is that none of my tools actually handle suprasegmental features correctly, or know what a syllable is.
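
To make the template idea concrete, here is a rough sketch of what template-driven syllabification could look like. Everything in it is an assumption for illustration, not something the existing tools do: the `(C)V(C)` notation, the `CLASSES` table, and the `compile_template` and `syllabify` names are all made up, and it only looks at segments, so stress, tone, and other suprasegmentals are ignored entirely.

```python
import re

# Hypothetical segment classes; a real tool would read these from its own config.
CLASSES = {"C": "ptkbdgmnszrlh", "V": "aeiou"}


def compile_template(template, classes=CLASSES):
    """Turn a template such as "(C)V(C)" into a regex matching one syllable.

    Parenthesised slots are optional. Optional slots after the first required
    V are treated as coda slots and are only filled when the next segment is
    not a vowel, so a consonant between two vowels becomes an onset.
    """
    vowels = "[" + re.escape(classes["V"]) + "]"
    pattern = ""
    seen_nucleus = False
    for optional, required in re.findall(r"\((\w)\)|(\w)", template):
        char_class = "[" + re.escape(classes[optional or required]) + "]"
        if required:
            pattern += char_class
            seen_nucleus = seen_nucleus or required == "V"
        elif seen_nucleus:
            pattern += "(?:" + char_class + "(?!" + vowels + "))?"
        else:
            pattern += char_class + "?"
    return re.compile(pattern)


def syllabify(word, template="(C)V(C)"):
    """Split word into syllables, or return None if the template cannot cover it."""
    syllable = compile_template(template)
    syllables, pos = [], 0
    while pos < len(word):
        match = syllable.match(word, pos)
        if match is None or match.end() == pos:
            return None  # the template does not fit at this position
        syllables.append(match.group())
        pos = match.end()
    return syllables


print(syllabify("banana"))
print(syllabify("kanta"))
print(syllabify("strada"))
```

With this template, a word like `strada` comes back as `None` because the onset cluster does not fit `(C)V(C)`. That is probably the right failure mode for a first pass: it shows where the template is too narrow instead of guessing at a break.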