The frequency table is reset after every row.
It's Markov-ish in a way, but with a different kind of "frequencization".
Right, here's how the algorithm works.
See the number above each column? That's the "weight" of each language. By default every weight is set to 10. In Lojban, each language's weight is adjusted according to its number of speakers.
Let's take the first row as an example. First, the engine pairs every word in the row with every word in the row (itself included) and concatenates each pair. It then gives each concatenation a value by multiplying the weights of the two columns involved (10 × 10 = 100 with the default weights), like so:
CONCAT VALUE
itit 100
itpito 100
ithana 100
pitoit 100
pitopito 100
pitohana 100
hanait 100
hanapito 100
hanahana 100
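The pairing step can be sketched in Python as follows. This is a minimal sketch, not the engine's actual code: the function name `concatenate_row` is hypothetical, and it assumes a concatenation's value is simply the product of its two columns' weights, which reproduces the 100s in the table above.

```python
from itertools import product


def concatenate_row(words, weights):
    """Concatenate every ordered pair of column entries (an entry with
    itself included) and score each one by multiplying the two weights.
    Hypothetical helper; the scoring rule is assumed from the table."""
    combos = []
    # product(..., repeat=2) yields (it, it), (it, pito), (it, hana), (pito, it), ...
    for (w1, k1), (w2, k2) in product(zip(words, weights), repeat=2):
        combos.append((w1 + w2, k1 * k2))
    return combos


row = ["it", "pito", "hana"]  # the first row from the example
weights = [10, 10, 10]        # default weight of 10 per language

for concat, value in concatenate_row(row, weights):
    print(concat, value)
```

The loop order matches the table: `itit`, `itpito`, `ithana`, `pitoit`, and so on, each with value 100.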
Then, for each concatenated string, it builds a set of digraphs by pairing each letter with every letter to its right (not just the adjacent one). For example, the string "pitohana" gives the following digraphs:
p i t o h a n a
p - pi pt po ph pa pn pa
i - - it io ih ia in ia
t - - - to th ta tn ta
o - - - - oh oa on oa
h - - - - - ha hn ha
a - - - - - - an aa
n - - - - - - - na
a - - - - - - - -
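A small Python sketch of the digraph step, assuming "digraph" here means every letter paired with each letter that comes after it, as the grid shows. The helper name `digraphs` is hypothetical; it also returns the 1-based positions of both letters, since the scoring step needs them.

```python
from itertools import combinations


def digraphs(s):
    """Pair each letter with every letter to its right.
    Returns (digraph, pos_first, pos_second) tuples with 1-based positions."""
    # combinations() keeps the original order, so (pos1, letter1) is always
    # to the left of (pos2, letter2).
    return [(a + b, i, j) for (i, a), (j, b) in combinations(enumerate(s, start=1), 2)]


result = digraphs("pitohana")
print(len(result))                    # 28 pairs for an 8-letter string
print([d for d, _, _ in result[:7]])  # the "p" row: pi pt po ph pa pn pa
```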
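Each digraph is then given a value using the following formula:
combined value / (position of first letter + position of second letter)
where the combined value is the concatenation's value from the table above (100 here), and positions are counted from 1.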
Like this:
        1      2      3      4      5      6      7      8
        p      i      t      o      h      a      n      a
1 p     -      33.33  25     20     16.66  14.28  12.5   11.11
2 i     -      -      20     16.66  14.28  12.5   11.11  10
3 t     -      -      -      14.28  12.5   11.11  10     9.09
4 o     -      -      -      -      11.11  10     9.09   8.33
5 h     -      -      -      -      -      9.09   8.33   7.69
6 a     -      -      -      -      -      -      7.69   7.14
7 n     -      -      -      -      -      -      -      6.66
8 a     -      -      -      -      -      -      -      -
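The scoring formula can be checked with a short Python sketch. The function name `score_digraphs` is hypothetical, and it keeps full floating-point precision, whereas the table shows values cut to two decimals.

```python
from itertools import combinations


def score_digraphs(s, combined_value):
    """Score every letter pair of s as combined_value / (pos1 + pos2),
    using 1-based positions. Hypothetical helper mirroring the table."""
    return [
        (a + b, combined_value / (i + j))
        for (i, a), (j, b) in combinations(enumerate(s, start=1), 2)
    ]


for digraph, value in score_digraphs("pitohana", 100)[:3]:
    print(digraph, round(value, 2))  # pi 33.33, pt 25.0, po 20.0
```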
Notice how some digraphs (like "pa" and "ia") appear more than once? We simply add their values together. Do this for the other 8 concatenations as well and add everything up!
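Putting the steps together, here is a minimal Python sketch of building one row's frequency table. It is not the engine's code: the name `row_frequencies` is hypothetical, and it assumes the combined value is the product of the two column weights. A fresh dictionary per call stands in for the table being reset after every row.

```python
from collections import defaultdict
from itertools import combinations, product


def row_frequencies(words, weights):
    """Build the digraph frequency table for a single row."""
    table = defaultdict(float)  # new, empty table for every row
    # Every ordered pair of column entries, an entry with itself included.
    for (w1, k1), (w2, k2) in product(zip(words, weights), repeat=2):
        combined = k1 * k2  # assumed: product of the two column weights
        # Each letter paired with every letter to its right, 1-based positions.
        for (i, a), (j, b) in combinations(enumerate(w1 + w2, start=1), 2):
            table[a + b] += combined / (i + j)  # repeated digraphs add up
    return dict(table)


freq = row_frequencies(["it", "pito", "hana"], [10, 10, 10])
# "pa" only occurs in "pitohana", at positions (1, 6) and (1, 8): 100/7 + 100/9
print(round(freq["pa"], 2))  # 25.4
```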
Once all that is finished, we sort the digraphs from the largest value to the lowest, one list per row.
Like this:
["it","ha","pi","an","to","na","tp","ti","th","ah","op","oh","ai","ap","oi"]
["tu","ni","pu","ur","ut","ta","ip","it","in","rt","rp","rn","at","ap","an"]
["se","sa","mi","is","ei","am","ms","im","ss","mm","sm"]
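The final sort is a one-liner in Python. In this sketch `rank_digraphs` is a hypothetical name and the example values are made up for illustration. The source doesn't say how ties are broken; Python's `sorted()` is stable, so tied digraphs here simply keep the order in which they were first added.

```python
def rank_digraphs(table):
    """Return the digraphs ordered from the largest value to the lowest.
    Ties keep insertion order (assumption; the real tie-break is unknown)."""
    return sorted(table, key=table.get, reverse=True)


example = {"it": 120.5, "pi": 80.0, "ha": 95.25}  # hypothetical values
print(rank_digraphs(example))  # ['it', 'ha', 'pi']
```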
Now... the engine will perform the wordbuilding.
(Continued in the next reply.)