I worked SO much this weekend, it was great. I made a test-standard data set for Ingush-Chechen-Batsbi, did a lot of refactoring and modularization of my alignment code, and set up the Jenetics library so I can use genetic algorithms to tune my model parameters.
The results so far are fantastic, but I have a lot of work still ahead. One rather interesting result is that using a gap penalty (at least, a constant one) is actually bad:
Admittedly, I have not yet tried this with my Indo-Iranian data. I had been using a penalty of 6, which turned out to be a very bad choice.
So, the latest good news is that I implemented a few new gap penalty functions, and so far the Indo-Iranian data behaves better with a non-negative gap (using a convex gap function). Still, building more sample data will be a big help.
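To make the gap-function talk concrete, here's a toy sketch (not my actual code; the unit match/mismatch costs, the names, and the log-shaped convex function are all just illustrative) of global alignment with an arbitrary gap-length penalty g(k), using the slow general-gap recurrence:

```python
import math

def convex_gap(k, open_cost=1.0, scale=0.5):
    # Convex (logarithmic) gap function: cost grows sub-linearly with
    # gap length, so one long gap is cheaper than several short ones.
    return open_cost + scale * math.log(k)

def align_cost(a, b, gap, mismatch=1.0):
    # Cubic-time DP for arbitrary gap functions:
    # D[i][j] = minimum cost of aligning a[:i] with b[:j].
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + sub)
            for k in range(1, i + 1):   # gap of length k in b
                D[i][j] = min(D[i][j], D[i - k][j] + gap(k))
            for k in range(1, j + 1):   # gap of length k in a
                D[i][j] = min(D[i][j], D[i][j - k] + gap(k))
    return D[n][m]
```

The point of the convexity is visible right in the numbers: with this function, a single gap of length 2 costs less than two separate length-1 gaps.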
Your Writing System Sucks
posts: 1279, Kelatetía: Dis, Major Belt 1
Yyyeah, you should've started with that mentality, I think. Sequence alignment is my speciality, and the functions optimized by the standard approaches are grossly incorrect from a biological standpoint; they just have momentum because they're verifiable and objective. The thought of trying to bring that into a linguistics setting makes me squeamish.
Sequence alignment isn't the interesting problem here, though; for the most part that's probably going to be inferring proto-forms and rules from correspondences, which I'm able to get fairly handily. Without training on my Chechen-Ingush-Batsbi data, the system was able to pick out correspondences which I know to be correct. What will be interesting is seeing whether it can use a (probably statistical) model to infer reasonable ancestor forms and identify conditioning environments.
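The correspondence-extraction step itself is dead simple once the forms are aligned; here's a toy sketch (the mini-data is invented for illustration, not my actual Nakh set):

```python
from collections import Counter

def count_correspondences(aligned_pairs):
    # Given already-aligned cognate pairs (same length, "-" for gaps),
    # tally how often each segment pair lines up. Recurring pairs like
    # (g, k) are candidate regular sound correspondences.
    counts = Counter()
    for w1, w2 in aligned_pairs:
        assert len(w1) == len(w2), "forms must be pre-aligned"
        for s1, s2 in zip(w1, w2):
            counts[(s1, s2)] += 1
    return counts

# Toy aligned cognate forms (invented):
pairs = [("dagd", "dakd"), ("gud-", "gudo")]
corr = count_correspondences(pairs)
```

The interesting work is everything after this table exists: deciding which correspondences are regular, and what conditioning environments and proto-segments explain them.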
der saz ûf eime steine
posts: 291, Transition Metal, Marburg, Germany
So … you may have explained it before, but what are those charts depicting? I understand that historical linguistics has become increasingly influenced and informed by genetics, epidemiology and population biology over the past decades, but that still doesn't give me a clue what those charts you posted mean.
They depict how the model parameters affect the algorithm's performance against some human-aligned data. Basically all of that involves the sequence alignment machinery, plus weight coefficients for my feature model.
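By "weight coefficients" I mean roughly this (a toy sketch; the features and weights are made up): each segment is a bundle of distinctive features, segment distance is a weighted count of feature mismatches, and those per-feature weights are exactly the parameters the genetic algorithm tunes against the human-aligned data.

```python
def feature_distance(seg1, seg2, weights):
    # Weighted feature mismatch: each differing feature contributes its
    # weight to the distance between the two segments.
    return sum(w * (seg1.get(f) != seg2.get(f)) for f, w in weights.items())

# Invented feature bundles for /p/, /b/, /m/:
p = {"voice": 0, "nasal": 0, "place": "labial"}
b = {"voice": 1, "nasal": 0, "place": "labial"}
m = {"voice": 1, "nasal": 1, "place": "labial"}

# Invented weights -- in my setup these are the GA's chromosome.
weights = {"voice": 0.5, "nasal": 1.0, "place": 2.0}
```

So /p/~/b/ comes out closer than /p/~/m/, and how much closer depends entirely on what the GA decides the voice and nasal weights should be.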
I was thinking I could make a thread about this over in Terra Firma
yeah, I don't get that. we DID have a quantitative methods course, but frankly it was terrible; the professor sucked (he failed to get tenure, ultimately) and the book sucked. Keith Johnson, I think, the orange one that does everything in R and explains nothing.
Yeah, same here. I mean, okay, it's vaguely more forgivable in traditional linguistics, but if I study CL and I don't have to take statistics, what does that say about the value of my degree? (The regulations that prevent CS from being available in 90-point format are also terrible.)
An example of one of the ways in which you shouldn't start sentences is that this is a terrible sentence no one wants to read. Seriously, if you ever start a sentence like that, you’re fired. Clean out your desk. Go.