Hanam, a mixed language

Yaali Annar The Gote
posts: 94, Initiate Speaker
Here's a secret: I have never invented a word personally. It was always the product of a programmed engine.
My latest toy is something that literally mixes words together. It was written as an attempt to automate the creation of Lojban roots from their source words.

The algorithm is...
well...

I'm too lazy to explain it; study the algorithm yourself by looking at the JavaScript: https://dl.dropboxusercontent.com/u/5517255/JS%20Tests/Mixer.html

Anyway, my first experiment is combining Japanese, Korean, and Chinese. For starters, here are the numerals (because everyone loves numerals):

      CH    JP     KR     CAN1  CAN2  CHOS
1     it    pito   hana   itop  hana  itop
2     ni    puta   tur    tur   nip   nip
3     sam   mi     seis   sei   sam   sei
4     si    io     neis   sio   nei   syo
5     ñu    itu    tases  tase  ñuit  tase
6     riuk  mu     ieses  ries  iuki  ryes
7     tshit nana   irkop  natti irkop natti
8     prat  ia     ieter  prat  iete  yete
9     kiu   kokono ahop   kokiu ahoko kokyu
10    dsip  too    ier    ieri  toip  yeri
100   braik momo   on     mono  brai  mono
1000  tshen ti     tsëmën tshen sëmën chen
10000 mion  iorodu tëmen  ioron tëmen yoron
Rhetorica Your Writing System Sucks
posts: 1292, Kelatetía
Second-order Markov chain. That's all you had to say!

Looks neat, though! Definitely has potential for koineization. I see you're using syllable structure from the whole dataset but restricting composition to the same row—how are you doing that without modifying the transition frequency table? Do you just throw out hits to bad characters, or are you rewriting the frequency table to exclude them afterwards?
Yaali Annar The Gote
posts: 94, Initiate Speaker
The frequency table is reset after every row.
It's Markov-ey in a way but with a different... "frequencization".

Right, here's how the algorithm works.

See the number above each column? That's the "weight" of each language. By default they are all set to 10. In Lojban, the weight of each language is adjusted by its number of speakers.

Let's take the first row as an example. First the engine concatenates every ordered pair of words in the row (each word with itself included) and gives each concatenation a value by multiplying the weights of the two columns it came from, like so:

CONCAT   VALUE
itit     100
itpito   100
ithana   100
pitoit   100
pitopito 100
pitohana 100
hanait   100
hanapito 100
hanahana 100
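
A minimal Python sketch of this concatenation step, for readers who want to try it. It is not the Mixer.html code (which is JavaScript); the function name `weighted_concats` is made up for illustration, and the pair ordering is simply chosen to match the table above.

```python
from itertools import product

def weighted_concats(words, weights):
    """Return (concatenation, value) for every ordered pair of words.

    Self-pairs such as "itit" are included, and the value is the product
    of the two source columns' weights.
    """
    pairs = []
    for (w1, k1), (w2, k2) in product(list(zip(words, weights)), repeat=2):
        pairs.append((w1 + w2, k1 * k2))
    return pairs

# First row of the numeral table, with every language weighted 10.
row_one = weighted_concats(["it", "pito", "hana"], [10, 10, 10])
for text, value in row_one:
    print(f"{text:<9}{value}")
```

With unequal weights (say 10 and 5) the value of a mixed pair would simply be 50 instead of 100.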


Then, for each concatenated string, it builds a set of digraphs by pairing each letter with every letter to its right. For example, the string "pitohana" yields the following digraphs:

  p  i  t  o  h  a  n  a
p -  pi pt po ph pa pn pa
i -  -  it io ih ia in ia
t -  -  -  to th ta tn ta
o -  -  -  -  oh oa on oa
h -  -  -  -  -  ha hn ha
a -  -  -  -  -  -  an aa
n -  -  -  -  -  -  -  na
a -  -  -  -  -  -  -  -
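
The pairing can be sketched in a few lines of Python. Again this is only an illustration with a hypothetical function name, not the engine's actual code; it lists the pairs row by row, in the same order as the table above.

```python
def digraphs(text):
    """Pair each letter with every letter to its right (not only its neighbour)."""
    pairs = []
    for i in range(len(text)):
        for j in range(i + 1, len(text)):
            pairs.append(text[i] + text[j])
    return pairs

pairs = digraphs("pitohana")
print(len(pairs))   # an 8-letter string gives 8 * 7 / 2 = 28 pairs
print(pairs[:7])    # the "p" row of the table
```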


Each digraph is then given a value by the following formula:

combined value / (location_of_first_letter + location_of_second_letter)

Like this:

       1     2     3     4     5     6     7     8
       p     i     t     o     h     a     n     a
1 p    - 33,33    25    20 16,66 14,28  12,5 11,11
2 i    -     -    20 16,66 14,28  12,5 11,11    10
3 t    -     -     - 14,28  12,5 11,11    10  9,09
4 o    -     -     -     - 11,11    10  9,09  8,33
5 h    -     -     -     -     -  9,09  8,33  7,69
6 a    -     -     -     -     -     -  7,69  7,14
7 n    -     -     -     -     -     -     -  6,66
8 a    -     -     -     -     -     -     -     -
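
As a quick check of the formula, here is a small Python sketch that recomputes the values for "pitohana" with a combined value of 100. Positions are counted from 1, as in the table; the name `pair_value` is made up for this example, and the output is rounded rather than truncated, so the last decimal can differ slightly from the figures above.

```python
def pair_value(combined_value, first_pos, second_pos):
    """Value of one digraph: combined value / (position of 1st + position of 2nd)."""
    return combined_value / (first_pos + second_pos)

text = "pitohana"
for i in range(1, len(text) + 1):
    cells = []
    for j in range(1, len(text) + 1):
        # Only pairs where the second letter is to the right of the first count.
        cells.append(f"{pair_value(100, i, j):6.2f}" if j > i else "     -")
    print(i, text[i - 1], " ".join(cells))
```

Because the divisor grows with both positions, pairs near the start of the string weigh more than pairs near the end.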


Notice how some digraphs appear twice or more (here "pa", "ia", "ta", "oa" and "ha")? We simply add their values together. Do this for the other 8 combos and add them all up!

Once all that is finished, we sort the digraphs from the largest value to the lowest.

Like dis:
["it","ha","pi","an","to","na","tp","ti","th","ah","op","oh","ai","ap","oi"]
["tu","ni","pu","ur","ut","ta","ip","it","in","rt","rp","rn","at","ap","an"]
["se","sa","mi","is","ei","am","ms","im","ss","mm","sm"]

Now... the engine will perform the wordbuilding.

Continues to the next reply.
Yaali Annar The Gote
posts: 94, Initiate Speaker
Oh god. I just realized there's a mistake in how I translated my algorithm to JavaScript.

BRB, changing the JS.
Yaali Annar The Gote
posts: 94, Initiate Speaker
Okay, where was I again? Ah yes... the digraph list. Right, here are all the digraphs generated from the first row:

["it","ha","pi","pt","hn","an","aa","po","io","ii","to","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","oi","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]

The length of the resulting mix is the average length of the parents, rounded up to the next integer. For this example, though, let's ignore that limit. Now... we take the very first digraph, "it", and strike it from the array:

"it": ["ha","pi","pt","hn","an","aa","po","io","ii","to","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","oi","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]

Next, we find the first digraph that starts with "t" (here "to"), append its second letter, and strike it out too:

"ito": ["ha","pi","pt","hn","an","aa","po","io","ii","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","oi","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]

Repeat until the word reaches the desired length:

"ito": ["ha","pi","pt","hn","an","aa","po","io","ii","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","oi","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]
"itoi": ["ha","pi","pt","hn","an","aa","po","io","ii","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]
"itoio": ["ha","pi","pt","hn","an","aa","po","ii","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]
"itoiot": ["ha","pi","pt","hn","an","aa","po","ii","na","ti","ia","tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa","tn","ni","nt","hh","hp","pp","ph","ao","oa","pn","nh","np","oh","ho","op","nn","no","on","oo"]

And so forth.

What if we haven't reached the wanted length but there are no viable digraphs left? Well... the engine takes a step back and looks for the next digraph.
The medial and final filters are there to prevent the engine from making phonotactically illegal strings.
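
The word-building walk, including the step back, can be sketched in Python as a small depth-first search. This is an illustration under assumptions: the names `target_length` and `build_word` are made up, "rounded to the next integer" is read as rounding up, and the medial and final filters are left out because their rules are not described here.

```python
import math

# Ranked digraph list for the first row, copied from the post above.
RANKED = ["it","ha","pi","pt","hn","an","aa","po","io","ii","to","na","ti","ia",
          "tt","ip","ih","ta","ai","at","th","tp","hi","in","ht","ah","ap","pa",
          "tn","ni","oi","nt","hh","hp","pp","ph","ot","ao","oa","pn","nh","np",
          "oh","ho","op","nn","no","on","oo"]

def target_length(parents):
    """Average parent length, rounded up (assumed reading of the rule)."""
    return math.ceil(sum(len(p) for p in parents) / len(parents))

def build_word(ranked, length):
    """Chain digraphs in rank order; each digraph may be used only once."""
    if length < 2:
        raise ValueError("a mixed word needs at least one digraph")

    def extend(word, used):
        if len(word) == length:
            return word
        for idx, pair in enumerate(ranked):
            if idx in used or pair[0] != word[-1]:
                continue
            result = extend(word + pair[1], used | {idx})
            if result is not None:
                return result
        return None  # dead end: the caller steps back and tries its next digraph

    for idx, pair in enumerate(ranked):
        result = extend(pair, {idx})
        if result is not None:
            return result
    return None

print(build_word(RANKED, 6))                                      # itoiot
print(build_word(RANKED, target_length(["it", "pito", "hana"])))  # itoi
```

With the length limit ignored (6 letters) this reproduces the "itoiot" walk shown above; with the averaged length of 4 it stops at "itoi".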
Yaali Annar The Gote
posts: 94, Initiate Speaker
After fixing the JS, here's the result of the mixer. The last column is the one I chose to be a word in Hanam:

CAN1  CAN2  CHOS
itoi  hana  itoy
tut   put   tut
mis   sei   mis
sio   ioi   syo
ñuta  itas  nguta
iese  muki  muki
nanni irkop nanni
iera  iate  yera
kokiu kiuko kokyu
toie  ieri  toye
mono  onom  mono
tsënë titsë cënë
ioduo miodu yodwo
Yaali Annar The Gote
posts: 94, Initiate Speaker
Anyway, let's get back to the language. First, to make it more cohesive, I should settle on some rules, namely the phonemes and the phonotactics.

For phonemes, the language will be roughly similar to Middle Chinese. That means it will have three series of stops: voiced, voiceless, and aspirated.
The only fricatives will be /s h/.
The only liquid will be r/l, which varies much as in Korean: /r/ at the start of a syllable and /l/ at the end.
There will be six vowels: /a e i o u @/.

So... all in all the language will have the following inventory.

p  t   k
ph th  kh
b  d   g
m  n   ng
   s   h
   r/l 

a e i o u ë


As for phonotactics, the language will allow C[r/y/w]VC at its most complex, with ty, thy, and dy written as c, ch, and j.
Yaali Annar The Gote
posts: 94, Initiate Speaker
1st: nal
2nd: nen
3rd: gil
Reflexive: sësso
Proximal: ikol
Non-proximal: ces

Interrogative (thing): hawe
Interrogative (person): nuyu

Since all the parent languages distinguish between three states of rice, so does this language!

rice (plant): yey
rice (husked): meye
rice (cooked): papi

Let's generate a word for "eat"... ah it's "meta"

Now... since both Korean and Japanese are SOV with case markers, I'll make this language the same:

topic: wan / an
object: wol / ol
subject: ga / ak
locative: neyo / eyo
genitive: noy / oy
instrumental: dey

We have our first sentence in this language:

Naran papiwol meta.
Papiwan narak meta.

Puppy one...
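
For anyone who wants to play with the case markers, here is a small Python sketch that reproduces the two sentences. The rules in it are inferred from the examples, not stated in the posts: the first form of each marker is assumed to follow a vowel and the second a consonant, and a final "l" is assumed to become "r" before a vowel-initial marker (in line with the r/l alternation described earlier). The names `CASE_MARKERS` and `mark` are made up.

```python
VOWELS = set("aeiouë")

# (form assumed after a vowel, form assumed after a consonant)
CASE_MARKERS = {
    "topic": ("wan", "an"),
    "object": ("wol", "ol"),
    "subject": ("ga", "ak"),
    "locative": ("neyo", "eyo"),
    "genitive": ("noy", "oy"),
    "instrumental": ("dey", "dey"),  # only one form is listed
}

def mark(noun, case):
    """Attach a case marker, choosing the form by the noun's final letter."""
    after_vowel, after_consonant = CASE_MARKERS[case]
    if noun[-1] in VOWELS:
        return noun + after_vowel
    # Assumed: final l resurfaces as r once it starts the next syllable.
    if noun.endswith("l") and after_consonant[0] in VOWELS:
        noun = noun[:-1] + "r"
    return noun + after_consonant

print(mark("nal", "topic"), mark("papi", "object"), "meta")   # naran papiwol meta
print(mark("papi", "topic"), mark("nal", "subject"), "meta")  # papiwan narak meta
```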