happiness thread (2015-06-17 11:54:51)

Morrígan Witch Queen of New York (Marquise, posts: 303)
Definitely NOT using Levenshtein. Each segment in a sequence is represented by a multivalued feature vector. Right now, the comparison is just a multidimensional vector distance weighted by the strength of each feature (they use different scales), but I'd prefer it to represent a probability (or -log thereof) that a pair is related, which is obviously more complicated to model.
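To make the weighted distance concrete, here is a minimal sketch. The feature names, their scales, and the weights are all invented for illustration (the real feature set isn't listed here), and weighted Euclidean distance is just one plausible reading of "vector distance weighted by the strength of each feature".

```python
import math

# Hypothetical features and weights, purely for illustration.
WEIGHTS = {"voice": 1.0, "place": 2.0, "manner": 1.5}

def segment_distance(a, b, weights=WEIGHTS):
    """Weighted Euclidean distance between two segments.

    Each segment is a dict mapping feature name to a numeric value.
    The weight lets features on different scales contribute sensibly.
    """
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in weights.items()))

# Toy segments (values are made up, not real phonological data).
p = {"voice": 0.0, "place": 0.0, "manner": 0.0}
b = {"voice": 1.0, "place": 0.0, "manner": 0.0}
k = {"voice": 0.0, "place": 3.0, "manner": 0.0}

print(segment_distance(p, b))  # differs only in voicing
print(segment_distance(p, k))  # differs only in place
```

Swapping this out for a -log probability later would only change the body of `segment_distance`; anything that sums per-segment costs could stay the same.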

I don't have metathesis explicitly implemented yet, but the algorithm already has a way of comparing short runs of segments (length 1 to n, though anything above n=3 is absurd and even that is questionable), so a 2-2 comparison would cover pairs where one side underwent metathesis. There are cases where this is probably not sufficient, though.
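A rough sketch of how a 2-2 comparison can pick up metathesis. To keep it short, this swaps in a plain edit-distance-style recurrence with an extra crossed 2-2 move, which is NOT how the actual system works (that one compares feature vectors). `seg_cost`, `GAP`, and `SWAP_PENALTY` are placeholder names and values.

```python
from functools import lru_cache

GAP = 1.0           # assumed cost of leaving a segment unmatched
SWAP_PENALTY = 0.5  # assumed extra cost for a crossed 2-2 block

def seg_cost(x, y):
    # Stand-in for a feature-vector distance: 0 if identical, else 1.
    return 0.0 if x == y else 1.0

def align_cost(a, b):
    """Cheapest alignment of a and b using 1-1, 1-0, 0-1 and crossed 2-2 moves."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(a) and j == len(b):
            return 0.0
        options = []
        if i < len(a) and j < len(b):
            # 1-1: match one segment against one segment
            options.append(seg_cost(a[i], b[j]) + best(i + 1, j + 1))
        if i < len(a):
            # 1-0: segment in a with no counterpart
            options.append(GAP + best(i + 1, j))
        if j < len(b):
            # 0-1: segment in b with no counterpart
            options.append(GAP + best(i, j + 1))
        if i + 2 <= len(a) and j + 2 <= len(b):
            # 2-2: pair the segments crosswise, as if metathesized
            crossed = seg_cost(a[i], b[j + 1]) + seg_cost(a[i + 1], b[j])
            options.append(crossed + SWAP_PENALTY + best(i + 2, j + 2))
        return min(options)
    return best(0, 0)

print(align_cost("ask", "aks"))  # the crossed 2-2 move is cheapest here
```

This only handles adjacent swaps, which is one of the cases where a 2-2 window is probably not sufficient: long-distance metathesis falls outside it.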

The ranking is an interesting problem, but conceivably that's the interesting problem. Given a set of correspondences and environments, I'll need to figure out a way to work backward and possibly re-run the alignments using new information based on discovering non-viable reconstructions, or discovering that the alignments we derived are somehow not viable.

The most important question is how this system performs when given garbage, viz. a set of chance resemblances between unrelated languages. I need to build an algorithm that can tell chance resemblances apart from real correspondences, or at least determine that the relationship is a chance one.
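One generic way to approach the garbage question is a permutation test: shuffle which word is paired with which, rescore, and see how often random pairings look as good as the real ones. This is a standard statistical technique, not something already built here, and `word_distance` is a deliberately crude placeholder for whatever the alignment score ends up being.

```python
import random

def word_distance(w1, w2):
    # Toy score: positional mismatches plus length difference.
    return sum(x != y for x, y in zip(w1, w2)) + abs(len(w1) - len(w2))

def permutation_p_value(words_a, words_b, trials=1000, seed=0):
    """Estimate how often a random re-pairing scores as well as the real one.

    words_a[i] and words_b[i] are assumed to share a meaning.
    A small return value suggests the resemblance is unlikely to be chance.
    """
    rng = random.Random(seed)
    observed = sum(word_distance(x, y) for x, y in zip(words_a, words_b))
    shuffled = list(words_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total = sum(word_distance(x, y) for x, y in zip(words_a, shuffled))
        if total <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

Feeding it two lists from unrelated languages should give a large value most of the time; how large is large enough is a separate threshold question.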

I'll start a thread some time tonight if I'm able to get the time. I'm supposed to have dinner with my cousin, so who knows.