Hanam, a mixed language (2014-04-20 02:49:48)

Yaali Annar The Gote (posts: 94, Initiate Speaker)
The frequency table is reset after every row.
It's Markov-ey in a way but with a different... "frequencization".

Right, here's how the algorithm works.

See the number above each column? That's the "weight" of each language. By default they are all set to 10. In Lojban, the weight for each language is adjusted by its number of speakers.

Let's take the first row as an example. First the engine concatenates every ordered pair of entries in the row (including each entry with itself) and gives each result a value by multiplying the weights of the two columns involved, like so:

CONCAT   VALUE
itit     100
itpito   100
ithana   100
pitoit   100
pitopito 100
pitohana 100
hanait   100
hanapito 100
hanahana 100
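
A rough Python sketch of this concatenation step, for anyone who wants to follow along. This is not the engine's actual code: the function name `weighted_concats` and the list-of-tuples layout are assumptions made for illustration.

```python
from itertools import product

def weighted_concats(words, weights):
    """Concatenate every ordered pair of words (self-pairs included).

    Each result is paired with the product of the two words' column weights.
    """
    columns = list(zip(words, weights))
    return [(w1 + w2, k1 * k2)
            for (w1, k1), (w2, k2) in product(columns, repeat=2)]

# First row of the example, with every column at the default weight of 10.
concats = weighted_concats(["it", "pito", "hana"], [10, 10, 10])
for text, value in concats:
    print(text, value)
```

With unequal weights (say 10 and 30) the value of a cross-column pair would be 300 instead of 100, which is how a heavier language ends up contributing more.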


Then, for each concatenated string, it collects digraphs by pairing each letter with every letter to its right. For example, the string "pitohana" yields the following digraphs:

  p  i  t  o  h  a  n  a
p -  pi pt po ph pa pn pa
i -  -  it io ih ia in ia
t -  -  -  to th ta tn ta
o -  -  -  -  oh oa on oa
h -  -  -  -  -  ha hn ha
a -  -  -  -  -  -  an aa
n -  -  -  -  -  -  -  na
a -  -  -  -  -  -  -  -
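
The pairing above can be sketched in Python with `itertools.combinations`, which pairs items by position and keeps them in left-to-right order. The helper name `digraphs` is made up for this example and is not taken from the engine.

```python
from itertools import combinations

def digraphs(text):
    """Pair each letter with every letter to its right, keeping duplicates."""
    return [first + second for first, second in combinations(text, 2)]

pairs = digraphs("pitohana")
print(len(pairs))   # number of filled cells in the table above
print(pairs[:7])    # the "p" row of the table
```

Duplicates such as the two "pa" entries are kept on purpose, since each occurrence gets its own value in the next step.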


Then each digraph is given a value by the following formula:

combined_value / (location_of_first_letter + location_of_second_letter)

Like this:

       1     2     3     4     5     6     7     8
       p     i     t     o     h     a     n     a
1 p    - 33,33    25    20 16,66 14,28  12,5 11,11
2 i    -     -    20 16,66 14,28  12,5 11,11    10
3 t    -     -     - 14,28  12,5 11,11    10  9,09
4 o    -     -     -     - 11,11    10  9,09  8,33
5 h    -     -     -     -     -  9,09  8,33  7,69
6 a    -     -     -     -     -     -  7,69  7,14
7 n    -     -     -     -     -     -     -  6,66
8 a    -     -     -     -     -     -     -     -
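
Here is a small Python sketch of the formula, assuming letter positions are counted from 1 as in the table. The name `digraph_values` is hypothetical; the real engine may be organized differently.

```python
def digraph_values(text, combined_value):
    """Return (digraph, value) for each letter paired with every letter to its right.

    value = combined_value / (position of first letter + position of second letter),
    with positions counted from 1.
    """
    values = []
    for i in range(len(text)):
        for j in range(i + 1, len(text)):
            first_pos, second_pos = i + 1, j + 1
            values.append((text[i] + text[j],
                           combined_value / (first_pos + second_pos)))
    return values

values = digraph_values("pitohana", 100)
for digraph, value in values[:7]:
    print(digraph, round(value, 2))
```

The effect of the divisor is that pairs near the start of the string, and pairs whose letters sit close together, score higher than pairs far to the right.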


Notice how some digraphs appear twice or more (for example "pa")? We simply add their values together. Do the same for the other 8 combos and add everything up!
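
Putting the pieces together, the summing step for one row might look like the sketch below. It is only an illustration under the assumptions stated so far (ordered pairs, 1-based positions); `frequency_table` is an invented name, not the engine's.

```python
from collections import defaultdict
from itertools import product

def frequency_table(words, weights):
    """Sum digraph values over every ordered concatenation of one row."""
    table = defaultdict(float)
    columns = list(zip(words, weights))
    for (w1, k1), (w2, k2) in product(columns, repeat=2):
        text, combined_value = w1 + w2, k1 * k2
        for i in range(len(text)):
            for j in range(i + 1, len(text)):
                # Positions are 1-based, hence (i + 1) + (j + 1).
                table[text[i] + text[j]] += combined_value / (i + j + 2)
    return dict(table)

table = frequency_table(["it", "pito", "hana"], [10, 10, 10])
print(len(table), "distinct digraphs")
```

Since a fresh table is built on every call, calling the function once per row matches the remark that the frequency table is reset after every row.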

After all that is finished, we sort the digraphs from the largest value to the lowest.

Like this:
["it","ha","pi","an","to","na","tp","ti","th","ah","op","oh","ai","ap","oi"]
["tu","ni","pu","ur","ut","ta","ip","it","in","rt","rp","rn","at","ap","an"]
["se","sa","mi","is","ei","am","ms","im","ss","mm","sm"]
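
The sorting step is short enough to sketch in one line of Python. The values below are the ones "pitohana" alone contributes for four digraphs (so this is not a full row table), and `rank_digraphs` is a made-up helper name.

```python
def rank_digraphs(table):
    """Return digraphs ordered from the largest summed value to the lowest."""
    return sorted(table, key=table.get, reverse=True)

# Values taken from the "pitohana" grid only; "pa" occurs twice, so it is summed.
table = {
    "pt": 100 / 4,
    "na": 100 / 15,
    "pi": 100 / 3,
    "pa": 100 / 7 + 100 / 9,
}
ranked = rank_digraphs(table)
print(ranked)  # ['pi', 'pa', 'pt', 'na']
```

Python's `sorted` is stable, so digraphs with equal values keep their insertion order; how the engine itself breaks ties is not stated here.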

Now... the engine will perform the wordbuilding.

Continued in the next reply.