Sound Change Appliers

Anthologica Universe Atlas / Forums / Miscellaneria / Sound Change Appliers


Morrígan Witch Queen of New York
posts: 303
Marquise

Also, I gave up trying to enhance the pattern-matching capabilities of my SCA* and proceeded with the original implementation, so that I can do more debugging and add flags to the script files to control things like segmentation, decomposition/composition mode, and the loading of other files. I might also move forward with supporting feature models soon, which shouldn't be too difficult given that the overall framework was designed to accommodate the use of features.

You can't use ! on sets or subexpressions when using NFAs without blowing up the graph size, and I didn't want to bother with that. I tried just using an expression tree, where this would be possible, but I spent a lot of time futzing around with the implementation, and realized this weekend that something is badly wrong with the way that repeatable expressions with * or + are matched inside sets {}. I have an idea, but I want to leave that until later.
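Why ! is costly on an NFA, in brief: an NFA can't be negated just by flipping its accepting states; it has to be determinized first (subset construction), and the resulting DFA can have exponentially more states. The toy automaton below, which accepts words whose n-th symbol from the end is a, is a textbook worst case. The encoding and names are illustrative assumptions, not the SCA's internals.

```python
from itertools import chain

def nth_from_end_nfa(n):
    """NFA over {a, b} accepting words whose n-th symbol from the end is 'a'.
    State 0 loops; on 'a' it may also guess "this is the one" and count down."""
    trans = {0: {"a": {0, 1}, "b": {0}}}
    for i in range(1, n):
        trans[i] = {"a": {i + 1}, "b": {i + 1}}
    trans[n] = {}  # accepting state with no outgoing transitions
    return trans, 0, {n}

def determinize(trans, start, accept, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in alphabet:
            nxt = frozenset(chain.from_iterable(trans[s].get(sym, ()) for s in current))
            dfa[current][sym] = nxt
            todo.append(nxt)
    return dfa, start_set, {S for S in dfa if S & accept}

def accepts(dfa, start, accept, word):
    state = start
    for sym in word:
        state = dfa[state][sym]
    return state in accept

trans, start, accept = nth_from_end_nfa(3)
dfa, dstart, daccept = determinize(trans, start, accept, "ab")
# Only once we have a DFA is "!" cheap: complement the accepting set.
not_accept = set(dfa) - daccept

print(len(trans), "NFA states ->", len(dfa), "DFA states")  # 4 NFA states -> 8 DFA states
print(accepts(dfa, dstart, not_accept, "bbb"))              # True
```

With n = 10 the NFA has 11 states but the DFA needs 1024, which is the blow-up that makes ! on arbitrary subexpressions unattractive.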

9 years ago

Rhetorica Your Writing System Sucks
posts: 1292
Kelatetía, Koitra, Illera

No one ever asks for sound change appliers—but that is because no one knows how to use them or what features they offer, and they are generally not willing to admit it.

Except me, that is, because I had to shop around while planning my own. WHEN ARE YOU ADDING METATHESIS?

9 years ago

Morrígan Witch Queen of New York
posts: 303
Marquise

Metathesis might not actually be too difficult, though every time I've thought something like that I've been wrong. I get the feeling that the solution lies somewhere inside a LinkedHashMap, which might be a good way of doing this one thing in general.
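Metathesis doesn't need anything exotic once words are segmented: match a run of slots, then emit the matched segments in a different order. This is one possible approach, not the LinkedHashMap idea; `metathesize` and the slot/order encoding are hypothetical.

```python
def metathesize(word, pattern, order):
    """Scan left to right; wherever consecutive segments match `pattern`
    (a list of sets), emit them in the order given by `order`
    (indices into the matched slots, e.g. [1, 0] swaps two slots)."""
    out, i, n = [], 0, len(pattern)
    while i < len(word):
        window = word[i:i + n]
        if len(window) == n and all(seg in cls for seg, cls in zip(window, pattern)):
            out.extend(window[j] for j in order)
            i += n  # skip past the rewritten span so it isn't re-matched
        else:
            out.append(word[i])
            i += 1
    return out

VOWELS = {"a", "e", "i", "o", "u"}
LIQUIDS = {"r", "l"}

# Hypothetical liquid metathesis VR > RV, e.g. "karta" -> "krata".
print("".join(metathesize(list("karta"), [VOWELS, LIQUIDS], [1, 0])))  # krata
```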

Right now I'm finishing up adding support for legacy segmentation (that is, no special segmentation) and normalization control. The test version used canonical decomposition (NFD), and future versions will do so by default, but you can turn it off, or use NFC, NFKC, or NFKD if you need to.
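What the normalization forms actually do to input, shown with Python's standard `unicodedata` module (for illustration only; the SCA itself is Java). The compatibility forms (NFKC, NFKD) also rewrite modifier letters, so ʰ becomes a plain h and pʰ collapses into ph, which matters for exactly the p/pʰ distinction discussed in this thread.

```python
import unicodedata

word = "\u00e1ka"       # "áka" with a precomposed á (U+00E1)
aspirated = "p\u02b0"   # "pʰ": p + U+02B0 MODIFIER LETTER SMALL H

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    w = unicodedata.normalize(form, word)
    a = unicodedata.normalize(form, aspirated)
    # NFD/NFKD split á into a + U+0301 (combining acute), so a rule on "a"
    # can see the base vowel; NFKC/NFKD also turn ʰ into plain "h".
    print(form, len(w), [f"U+{ord(c):04X}" for c in w], repr(a))
```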

After that, I need to add support for negatives in the rule condition (just on variables and literals for now). I have an idea of how I can do this on sets and subexpressions, but since I literally got stuck on this for SIX MONTHS I'm managing my expectations.
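A minimal sketch of negation restricted to variables and literals in the following environment, i.e. rules like t > d / _!V. The `Slot` class and `apply_rule` are hypothetical, and treating the word edge as *not* satisfying a negated slot is a design assumption (one could argue either way).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    """One position in a rule condition: a literal (one-member set) or a
    variable (a set of segments), optionally negated with "!"."""
    members: frozenset
    negated: bool = False

    def matches(self, seg):
        if seg is None:  # past the word edge: "!x" does not match here (assumption)
            return False
        return (seg not in self.members) if self.negated else (seg in self.members)

def apply_rule(word, target, replacement, after):
    """target > replacement / _ after, where `after` is a list of Slots."""
    out = []
    for i, seg in enumerate(word):
        following = word[i + 1:i + 1 + len(after)]
        following = following + [None] * (len(after) - len(following))  # pad at edge
        if seg == target and all(s.matches(f) for s, f in zip(after, following)):
            out.append(replacement)
        else:
            out.append(seg)
    return out

VOWELS = frozenset("aeiou")
not_vowel = Slot(VOWELS, negated=True)        # "!V": negated variable
not_s = Slot(frozenset({"s"}), negated=True)  # "!s": negated literal

print(apply_rule(list("atra"), "t", "d", [not_vowel]))  # ['a', 'd', 'r', 'a']
# Final "a" is untouched: the word edge does not count as "!s" here.
print(apply_rule(list("arsa"), "a", "e", [not_s]))      # ['e', 'r', 's', 'a']
```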

But seriously, don't we all want to live in a world where

a e o á é ó > ā ē ō â ê ô / _{s n?t r l m}-s#

CH > J / _R?VV?C*CH

is a valid sound change expression?
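How a line like the ones above might be taken apart. This sketch only splits off the condition and pairs the parallel source and target lists element-wise; it makes no attempt to interpret {}, ?, *, - or #, whose exact semantics belong to the SCA. `parse_rule` is a hypothetical helper.

```python
def parse_rule(line):
    """Split 'SOURCE > TARGET / CONDITION' and pair the space-separated
    source and target segments element-wise. The condition stays a string."""
    change, _, condition = line.partition("/")
    source, sep, target = change.partition(">")
    if not sep:
        raise ValueError(f"no '>' in rule: {line!r}")
    src, tgt = source.split(), target.split()
    if len(src) != len(tgt):
        raise ValueError(f"{len(src)} source vs {len(tgt)} target segments")
    return dict(zip(src, tgt)), condition.strip()

mapping, condition = parse_rule("a e o á é ó > ā ē ō â ê ô / _{s n?t r l m}-s#")
print(mapping)    # {'a': 'ā', 'e': 'ē', 'o': 'ō', 'á': 'â', 'é': 'ê', 'ó': 'ô'}
print(condition)  # _{s n?t r l m}-s#
```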

Also, I probably should get this onto GitHub or something.

edited once, last update 9 years ago

Radius C / 2π
posts: 113
Hydrogen, United States

quoting Rhetorica:
No one ever asks for sound change appliers—but that is because no one knows how to use them or what features they offer, and they are generally not willing to admit it.

Anyone who has much end-user experience with SCAs can tell you that the process of debugging your SC list so the program finally comprehends exactly what you want it to do, a painful and tedious process not given full justice by your phrasing "knows how to use them", is often more work than just sound-changing all your words by hand. If you're only applying six or eight changes to a wordlist that's thousands of entries long, it would be worth it. But using an SCA to apply longer and quite complicated changes of the sort you actually find in natlang histories makes one want to rip off one's own arms and gnaw them to bits, because relatively few real-world changes are so simple as x > y / _z. Just as often it's something like "change the rightmost t to a k if and only if the word does not already contain a k" where the SCA needs to examine the contents of the entire word in order to correctly make the requested change (and none that I know of will actually do this; they all seem to be limited to mere pattern matching). Or, you'll get things like "ijj, ij, j > ij, j, 0 / respectively" or some other such situation where multiple lines are necessary but no matter which order you put them in they interfere with each other's outputs. That's just two examples off the top of my head. But there's a whole sea of possible sound changes that SCAs are too dumb to deal with nicely, and in principle they're all fixable by re-working the structure of your SC list. But figuring out how to do so - especially if there's several of these messes and they interact with each other - is no easy task for people who aren't programmers.
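Radius's first example, written as ordinary code, shows why it resists pure pattern matching: the condition is about the whole word (is there a k anywhere?) and about position relative to the whole word (the rightmost t), not a local environment. A minimal sketch, assuming words are lists of segments; the function name is hypothetical.

```python
def rightmost_t_to_k(word):
    """Change the rightmost t to k if and only if the word has no k already."""
    if "k" in word or "t" not in word:
        return list(word)
    i = len(word) - 1 - word[::-1].index("t")  # index of the rightmost t
    return word[:i] + ["k"] + word[i + 1:]

print("".join(rightmost_t_to_k(list("tatata"))))  # tataka
print("".join(rightmost_t_to_k(list("katata"))))  # katata (already has a k)
```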

9 years ago

Nessari
posts: 932
Illúbequía, Seattle, Cascadia

quoting Radius:
quoting Rhetorica:
No one ever asks for sound change appliers—but that is because no one knows how to use them or what features they offer, and they are generally not willing to admit it.

Anyone who has much end-user experience with SCAs can tell you that the process of debugging your SC list so the program finally comprehends exactly what you want it to do, a painful and tedious process not given full justice by your phrasing "knows how to use them", is often more work than just sound-changing all your words by hand. If you're only applying six or eight changes to a wordlist that's thousands of entries long, it would be worth it. But using an SCA to apply longer and quite complicated changes of the sort you actually find in natlang histories makes one want to rip off one's own arms and gnaw them to bits, because relatively few real-world changes are so simple as x > y / _z. Just as often it's something like "change the rightmost t to a k if and only if the word does not already contain a k" where the SCA needs to examine the contents of the entire word in order to correctly make the requested change (and none that I know of will actually do this; they all seem to be limited to mere pattern matching). Or, you'll get things like "ijj, ij, j > ij, j, 0 / respectively" or some other such situation where multiple lines are necessary but no matter which order you put them in they interfere with each other's outputs. That's just two examples off the top of my head. But there's a whole sea of possible sound changes that SCAs are too dumb to deal with nicely, and in principle they're all fixable by re-working the structure of your SC list. But figuring out how to do so - especially if there's several of these messes and they interact with each other - is no easy task for people who aren't programmers.

This, this, a thousand times this. Especially the figuring out how to do so part — admittedly I haven't spent much time trying to bang my head on them, but it's only recently that I've gotten lexica (ok ok, just two so far) which are nearing the point where doing them by hand may be slower than picking out which one might be best for my particular uses and figuring the damn thing out. Maybe that'll change when I get around to making myself start learning to code, idk.

9 years ago

Morrígan Witch Queen of New York
posts: 303
Marquise

I can do both of those examples. Syllables are a little hard, but I think the approach I've got in mind for metathesis should also work for syllables generally.

Rule feeding and bleeding is definitely hard. Something I'd kind of like to do is analyze rules and produce a graph showing which rules interact with each other, and which rules are unordered with respect to one another.
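A rough sketch of that kind of analysis, under a big simplifying assumption: rules here are context-free segment mappings, so "A feeds B" means A outputs a segment that B rewrites, and "A bleeds B" means A rewrites a segment B would also have rewritten. Rules with environments would need a much finer check, and the pairwise test ignores transitive chains. Rule names and data layout are hypothetical.

```python
from itertools import combinations

# Hypothetical rules: context-free segment mappings only.
RULES = {
    "deaspirate": {"pʰ": "p"},
    "voice":      {"p": "b"},
    "nasalize":   {"p": "m"},
    "lenite":     {"b": "β"},
    "raise":      {"e": "i"},
}

def interacts(a, b):
    """Ways rule a can affect rule b when a applies first."""
    kinds = []
    if set(a.values()) & set(b):
        kinds.append("feeds")   # a creates a segment b rewrites
    if set(a) & set(b):
        kinds.append("bleeds")  # a consumes a segment b would rewrite
    return kinds

def interaction_graph(rules):
    """Directed edges (a, b) -> list of interaction kinds."""
    edges = {}
    for (na, a), (nb, b) in combinations(rules.items(), 2):
        for src, dst, r1, r2 in ((na, nb, a, b), (nb, na, b, a)):
            kinds = interacts(r1, r2)
            if kinds:
                edges[(src, dst)] = kinds
    return edges

edges = interaction_graph(RULES)
for (src, dst), kinds in sorted(edges.items()):
    print(src, "/".join(kinds), dst)

# Pairs with no edge in either direction can be applied in either order.
unordered = [(a, b) for a, b in combinations(RULES, 2)
             if (a, b) not in edges and (b, a) not in edges]
print(unordered)
```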

Ultimately, this SCA is part of another system for hypothesis testing proposed reconstructions and sets of rules deriving child forms from them. The fact that it's usable by humans too is an added bonus. IMO, the most important contribution is that it can be used with a feature model, and understands (by default, though you can turn it off) that pʰ is different from p, and that a rule affecting the latter will not affect the former.
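A toy illustration of the feature-model point: when p and pʰ are separate segments with separate feature bundles, a rule stated over features of plain p simply never matches pʰ. The feature names and inventory are hypothetical and far smaller than any real model.

```python
# Hypothetical toy feature model; a real one would have many more features.
FEATURES = {
    "p":  {"labial": True,  "voice": False, "spread_glottis": False},
    "pʰ": {"labial": True,  "voice": False, "spread_glottis": True},
    "b":  {"labial": True,  "voice": True,  "spread_glottis": False},
    "t":  {"labial": False, "voice": False, "spread_glottis": False},
}

def matches(segment, spec):
    """True if the segment is in the model and agrees with every value in spec."""
    feats = FEATURES.get(segment)
    return feats is not None and all(feats.get(f) == v for f, v in spec.items())

def apply_feature_rule(word, spec, change):
    """Rewrite each segment matching `spec` by overriding features with `change`,
    then map the new bundle back to a symbol (StopIteration if none exists)."""
    out = []
    for seg in word:
        if matches(seg, spec):
            target = {**FEATURES[seg], **change}
            seg = next(s for s, f in FEATURES.items() if f == target)
        out.append(seg)
    return out

# "p > b" stated as [+labial -voice -spread glottis] > [+voice]:
# pʰ has its own features, so it is left alone.
PLAIN_P = {"labial": True, "voice": False, "spread_glottis": False}
print(apply_feature_rule(["pʰ", "a", "p", "a"], PLAIN_P, {"voice": True}))  # ['pʰ', 'a', 'b', 'a']
```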

9 years ago

Kereb Ba'al
posts: 50
Reader

quoting Radius:
Anyone who has much end-user experience with SCAs can tell you that the process of debugging your SC list so the program finally comprehends exactly what you want it to do, a painful and tedious process not given full justice by your phrasing "knows how to use them", is often more work than just sound-changing all your words by hand.

Yeah, I haven't found an SCA yet that wasn't more trouble to learn, and to trick into doing what I need, than just applying the sound changes myself. But then again I don't actually value having a fully sound-changed lexicon done for me automatically since I don't feel any entry should BE in the lexicon that hasn't been manually vetted anyway.

9 years ago

Hallow XIII Primordial Crab
posts: 539
侯, Basel, Switzerland

machining it and then revising by hand as necessary is less work tho

in theory

9 years ago

Nessari
posts: 932
Illúbequía, Seattle, Cascadia

quoting Morrígan:
Ultimately, this SCA is part of another system for hypothesis testing proposed reconstructions and sets of rules deriving child forms from them. The fact that it's usable by humans too is an added bonus. IMO, the most important contribution is that it can be used with a feature model, and understands (by default, though you can turn it off) that pʰ is different from p, and that a rule affecting the latter will not affect the former.

I think the issue is more that if you have situations XYZ where pʰ > p, it's fiendishly hard to get the rules to not treat the subsequent p along with original p, if that's what you want to do.

quoting Kereb:
quoting Radius:
Anyone who has much end-user experience with SCAs can tell you that the process of debugging your SC list so the program finally comprehends exactly what you want it to do, a painful and tedious process not given full justice by your phrasing "knows how to use them", is often more work than just sound-changing all your words by hand.

Yeah, I haven't found an SCA yet that wasn't more trouble to learn, and to trick into doing what I need, than just applying the sound changes myself. But then again I don't actually value having a fully sound-changed lexicon done for me automatically since I don't feel any entry should BE in the lexicon that hasn't been manually vetted anyway.

It is entirely about saving time. I adore personally vetting each change, but above a certain lexicon size errors (both in the changes themselves and plain old making sure all the words get changed) start to increase noticeably.

edited once, last update 9 years ago

Kereb Ba'al
posts: 50
Reader

"in theory" yes, but if everything went as In Theory, we wouldn't have this conversation

9 years ago

Morrígan Witch Queen of New York
posts: 303
Marquise

quoting Nessari:
I think the issue is more that if you have situations XYZ where pʰ > p, it's fiendishly hard to get the rules to not treat the subsequent p along with original p, if that's what you want to do.

Sorry, I don't follow this.

9 years ago

Rhetorica Your Writing System Sucks
posts: 1292
Kelatetía, Koitra, Illera

I think Ness's point is about rule contamination; without using an unambiguous intermediate representation, poorly-defined environmental descriptions can cause conflicts, e.g.

pʰ > p
p > b | m_

...when one desires that original mp > mb, but original mpʰ > mp. (Presumably real examples are somewhat more complex.) As a programmer, of course, that just looks like an order-of-operations mistake, but, then, the standard formalism for representing sound changes seems designed to cause such mistakes, and it can be daunting to understand how to break a sound change into multiple steps to avoid such collisions and conflicts.
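Rhetorica's example reproduced with two regex rules applied in sequence: with pʰ > p first, the newly created p is swept up by p > b / m_; swapping the order gives the intended result. This works on plain strings with Python's `re` module, so the p rule needs a `(?!ʰ)` lookahead to avoid touching pʰ, which is the segmentation problem in miniature. The helper name is hypothetical.

```python
import re

def apply(rules, word):
    """Apply (pattern, replacement) regex rules in order, one after another."""
    for pattern, repl in rules:
        word = re.sub(pattern, repl, word)
    return word

deaspirate = ("pʰ", "p")
voice_after_m = ("(?<=m)p(?!ʰ)", "b")  # p > b / m_, never the p of pʰ

words = ["ampa", "ampʰa"]
bad = [deaspirate, voice_after_m]   # new p from pʰ gets voiced too
good = [voice_after_m, deaspirate]  # voice original p first, then deaspirate

print([apply(bad, w) for w in words])   # ['amba', 'amba']
print([apply(good, w) for w in words])  # ['amba', 'ampa']
```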

tl;dr oh god you're all terrible programmers and the entire history of the field of linguistics is to blame

edited once, last update 9 years ago

dhok
posts: 235
Alkali Metal

One idea is to have an entry field for strings of characters that must be treated as single characters. Using Zompist's SCA, I've often found myself resorting to such extravagant representations as Devanagari characters for phonemes that the Latin alphabet has trouble representing easily.

This SCA could set aside a block of rarely-used Unicode characters- say, Yi syllabics- and replace each string in this block with a character from this block. (The nice thing about computer programs is that they can automatically replace and read characters that humans can't easily type or interpret). It will do the same with strings in the rule file as it runs. Then, at the end, it converts them all back. You're still working with one phoneme = one character, but the human doesn't have to mess around with special symbols.
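A minimal sketch of dhok's encode/decode idea. dhok suggests Yi syllabics; this swaps in the Unicode Private Use Area (from U+E000), which is reserved for this kind of internal use. `make_codec` and the multigraph list are hypothetical.

```python
PUA_START = 0xE000  # first Private Use Area code point

def make_codec(multigraphs):
    """Map each multi-character string to one private-use character."""
    encode_map = {g: chr(PUA_START + i) for i, g in enumerate(multigraphs)}
    decode_map = {c: g for g, c in encode_map.items()}
    # Replace longer strings first so "tsʰ" isn't split up by the shorter "ts".
    ordered = sorted(encode_map, key=len, reverse=True)

    def encode(text):
        for g in ordered:
            text = text.replace(g, encode_map[g])
        return text

    def decode(text):
        return "".join(decode_map.get(c, c) for c in text)

    return encode, decode

encode, decode = make_codec(["ts", "tsʰ", "ng"])
word = encode("tsʰangts")
print(len(word))     # 4: one character each for tsʰ, a, ng, ts
print(decode(word))  # tsʰangts
```

Rules would be encoded the same way before matching, so every rule sees one character per sound.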

9 years ago

Pthagnar Benedictine Ovulation
posts: 209
Quaestor, Hole of Aspiration

sound change applier rules are a stupid DSL: the simpler the applier, the stupider the DSL

has anyone tried approaching the problem from the other way round — rather than starting from HERE IS A PORKROM THAT TAKE A LISZT OF WERD AND DO x > y / _fart TO IT, then coming up with some special cases, and only exposing these few rules to the user, instead starting with a full scripting language and writing modules to cover some common cases, but without losing the full power of python, or perl, or ruby or whatever kinky shit you like best?
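Pthagnar's alternative, sketched in Python: rules are ordinary functions, a small helper covers the common x > y / environment case via regex, and anything the DSL can't express is just code. All names are hypothetical, and the aspirate rule is a simplified Grassmann's-law-style dissimilation (it ignores syllable adjacency).

```python
import re
from typing import Callable, List

Rule = Callable[[str], str]

def sound_change(pattern: str, replacement: str) -> Rule:
    """The common case: an 'x > y / environment' rule written as a regex."""
    compiled = re.compile(pattern)
    return lambda word: compiled.sub(replacement, word)

def dissimilate_aspirates(word: str) -> str:
    """If a word has two or more aspirates, the first loses its aspiration."""
    if word.count("ʰ") >= 2:
        i = word.index("ʰ")
        return word[:i] + word[i + 1:]
    return word

def run(rules: List[Rule], words: List[str]) -> List[str]:
    """Apply every rule, in order, to every word."""
    results = []
    for word in words:
        for rule in rules:
            word = rule(word)
        results.append(word)
    return results

RULES: List[Rule] = [
    dissimilate_aspirates,                           # whole-word condition
    sound_change(r"(?<=[aeiou])s(?=[aeiou])", "r"),  # s > r / V_V
]

print(run(RULES, ["pʰatʰa", "asa", "tʰisa"]))  # ['patʰa', 'ara', 'tʰira']
```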

9 years ago

Rhetorica Your Writing System Sucks
posts: 1292
Kelatetía, Koitra, Illera

quoting dhok:
One idea is to have an entry field for strings of characters that must be treated as single characters. Using Zompist's SCA, I've often found myself resorting to such extravagant representations as Devanagari characters for phonemes that the Latin alphabet has trouble representing easily.

This SCA could set aside a block of rarely-used Unicode characters- say, Yi syllabics- and replace each string in this block with a character from this block. (The nice thing about computer programs is that they can automatically replace and read characters that humans can't easily type or interpret). It will do the same with strings in the rule file as it runs. Then, at the end, it converts them all back. You're still working with one phoneme = one character, but the human doesn't have to mess around with special symbols.

This is, in fact, something I had planned on for klank, our on-site SCA. Syllable-finding, too. However, it is a pain in the ass to write an SCA, so I'd really prefer it if Morrígan could just add every feature everyone's ever requested to hers... but, hey, y'gotta make do.

9 years ago

Hallow XIII Primordial Crab
posts: 539
侯, Basel, Switzerland

Well, the simultaneity issue can be solved: if you have p_h p b > p b p_h, convert them to, say, PH P B or similar container characters first and then do the change. But really this is quite inconvenient, and I am pretty sure that the SCA could also do this. Some kind of simultaneity tag should be doable, no? Something like

=p_h > p
=p > b
=b > p

Idk how you would implement the notation, but it would have to make the SCA first transform all the phonemes changed under the tag into unique variables, and only then resolve the changes, to prevent overlap.
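Hallow's two-phase idea is easy to sketch: replace every source with a unique placeholder first, then resolve placeholders to targets, so nothing produced by one change can be caught by another. This uses Private Use Area code points as the container characters and says nothing about the = notation itself; `simultaneous` is a hypothetical name.

```python
def simultaneous(word, sources, targets, placeholder_base=0xE000):
    """Phase 1 turns every source string into a unique placeholder;
    phase 2 turns each placeholder into its target."""
    placeholders = [chr(placeholder_base + i) for i in range(len(sources))]
    # Longest sources first, so "p_h" is claimed before "p".
    for i in sorted(range(len(sources)), key=lambda k: -len(sources[k])):
        word = word.replace(sources[i], placeholders[i])
    for ph, tgt in zip(placeholders, targets):
        word = word.replace(ph, tgt)
    return word

print(simultaneous("p_hapaba", ["p_h", "p", "b"], ["p", "b", "p_h"]))  # pabap_ha
# Applied one line at a time instead, the same three changes give p_hap_hap_ha.
```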

9 years ago

Morrígan Witch Queen of New York
posts: 303
Marquise

quoting Rhetorica:
quoting dhok:
One idea is to have an entry field for strings of characters that must be treated as single characters. Using Zompist's SCA, I've often found myself resorting to such extravagant representations as Devanagari characters for phonemes that the Latin alphabet has trouble representing easily.

This SCA could set aside a block of rarely-used Unicode characters- say, Yi syllabics- and replace each string in this block with a character from this block. (The nice thing about computer programs is that they can automatically replace and read characters that humans can't easily type or interpret). It will do the same with strings in the rule file as it runs. Then, at the end, it converts them all back. You're still working with one phoneme = one character, but the human doesn't have to mess around with special symbols.

This is, in fact, something I had planned on for klank, our on-site SCA. Syllable-finding, too. However, it is a pain in the ass to write an SCA, so I'd really prefer it if Morrígan could just add every feature everyone's ever requested to hers... but, hey, y'gotta make do.

For Dhok's idea, we can already do this (almost). IMO, the most difficult thing about basically every SCA is that they don't understand that pʰ is not just a sequence of p + ʰ; it's a separate, single symbol. My program doesn't manipulate strings. It does a bunch of work up front to turn strings into sequence objects, which are arrays of segments, which in turn hold a string/symbol and, optionally, feature data. So, during segmentation it's still possible to have character sequences reserved.
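Roughly what that up-front segmentation could look like, sketched in Python rather than Java: reserved sequences are matched first (longest wins), otherwise a base character absorbs any following combining marks or modifier letters. The `Segment` class, `segment` function and the modifier set are hypothetical, not Morrígan's code.

```python
import unicodedata
from dataclasses import dataclass, field

# Hypothetical modifier letters that attach to the preceding base: ʰ ʷ ʲ ː
MODIFIERS = {"\u02b0", "\u02b7", "\u02b2", "\u02d0"}

@dataclass
class Segment:
    symbol: str
    features: dict = field(default_factory=dict)  # optional feature data

def segment(text, reserved=()):
    """Turn a string into a list of Segments."""
    reserved = sorted(reserved, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        match = next((r for r in reserved if text.startswith(r, i)), None)
        if match:
            out.append(Segment(match))
            i += len(match)
            continue
        j = i + 1  # extend over combining marks (nonzero class) and modifiers
        while j < len(text) and (unicodedata.combining(text[j]) or text[j] in MODIFIERS):
            j += 1
        out.append(Segment(text[i:j]))
        i = j
    return out

print([s.symbol for s in segment("tsap\u02b0a\u02d0", reserved=["ts"])])  # ['ts', 'a', 'pʰ', 'aː']
```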

For the example with Ps, I'd have to agree that it seems like a simple order-of-operations mistake, but I think the standard formalism for sound changes is perfectly sensible and I don't (personally) see it as contributing to this problem. If you understand clearly what you want, and pay attention to the feeding and bleeding orders, it's fairly simple.

Hallow: no need for anything extra: p_h p b > p b p should work for that because of the way the rules are applied. ~~At least in theory - this is a bug I'm working on, looks like the cursor isn't advancing through the word correctly in this case.~~
Fixed it! There was a dumb quirk in the way I was advancing the cursor.
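One guess at what "the way the rules are applied" could mean, sketched in Python: a rule with parallel source and target lists runs in a single left-to-right pass, the longest matching source wins at each position, and the cursor then skips past the matched input, so rewritten material is never re-read by the same rule. This is an assumption about the design, not Morrígan's code; the same pass also handles Radius's "ijj, ij, j > ij, j, 0" case, with 0 written as an empty string.

```python
def apply_parallel(word, sources, targets):
    """Apply 'a b c > x y z' in one left-to-right pass with a cursor."""
    pairs = sorted(zip(sources, targets), key=lambda st: len(st[0]), reverse=True)
    out, i = [], 0
    while i < len(word):
        for src, tgt in pairs:
            if src and word.startswith(src, i):
                out.append(tgt)
                i += len(src)  # advance the cursor past the matched input
                break
        else:
            out.append(word[i])  # nothing matches here: copy and move on
            i += 1
    return "".join(out)

print(apply_parallel("p_hapaba", ["p_h", "p", "b"], ["p", "b", "p"]))   # pabapa
print(apply_parallel("ijjaijaja", ["ijj", "ij", "j"], ["ij", "j", ""]))  # ijajaa
```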

edited once, last update 9 years ago

Rhetorica Your Writing System Sucks
posts: 1292
Kelatetía, Koitra, Illera

The happiness thread was overflowing. So: it is moved here.

...for hilarious technical reasons, this post is the first post of the thread, despite not being the first post in any numeric sense.

edited once, last update 9 years ago

Morrígan Witch Queen of New York
posts: 303
Marquise

So, at this point I'll add that I might be very close to a release. You can turn segmentation and normalization on or off in the script file and this appears to propagate correctly, though it hasn't been extensively tested yet. I would really love to get negatives into rules, at least for variables and literals if nowhere else.

But I still need to produce a manual, and add support for reserving character sequences to be treated as single sounds. I might just release it tonight anyway. Sorry guys, it's Java - I'm an engineer, I can't help it.

9 years ago

Rhetorica Your Writing System Sucks
posts: 1292
Kelatetía, Koitra, Illera

Don't worry, there is a cure—you just need to start taking literature intravenously, and everything will mellow out.

I am semi-seriously considering the possibility of integrating your SCA into Anthologica as a server-side Java application, though, so you'd better be thorough in your documentation!

9 years ago