Basic Vocabulary for Philologists

dhok (posts: 235), Alkali Metal, Norman, United States
I'm starting this topic to avoid cluttering up the Historical Linguistics thread with musings about it.

After a hiatus of a couple of weeks, caused mainly by apathy deriving from a bout of depression, I'm back to working on this thing.

Currently, it's a bunch of spreadsheets. I have Classical Vocabulary for Philologists, which contains sheets for Latin, Greek and Sanskrit; Romance Vocabulary for Philologists, which contains Italian, Spanish, Portuguese, Catalan, French and Romanian, plus Latin (and now that I've discovered that there is in fact an English-language etymological dictionary of Sardinian, I may as well include that, too); and some inkling of a Germanic Vocabulary for Philologists, which will probably contain at least Old Norse, Old English and modern German. I'd also like to put something together for Russian, since I'm taking it.

Ideally, I'd eventually like to put this on Anthologica somehow or other. The idea is that you would be able to call up any language that's included and browse the database, but be able to exclude cognates you have no use for. (For example, if I'm learning Sanskrit and know Greek and Latin, I want to be able to see Greek and Latin cognates, but I don't need to look at Tocharian or Old Irish ones.) There would be a better UI than just a giant, fugly Excel spreadsheet. Perhaps you could even have it construct you a one-of-a-kind Anki deck. I don't know.
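To make the "exclude cognates you have no use for" idea concrete, here is a minimal sketch in Python. The `Entry` class, its field names and the `visible_cognates` helper are all hypothetical; nothing like this exists yet, and the real thing would read from the spreadsheets rather than a hard-coded entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Entry:
    """One headword with its cognates (hypothetical layout, not the real spreadsheet schema)."""
    language: str
    headword: str
    gloss: str
    cognates: Dict[str, str] = field(default_factory=dict)  # language name -> cognate form


def visible_cognates(entry: Entry, wanted: Set[str]) -> Dict[str, str]:
    """Keep only the cognates in languages the reader has asked to see."""
    return {lang: form for lang, form in entry.cognates.items() if lang in wanted}


# Someone learning Sanskrit who already knows Greek and Latin.
entry = Entry(
    language="Sanskrit",
    headword="pitṛ",
    gloss="father",
    cognates={
        "Greek": "πατήρ",
        "Latin": "pater",
        "Old Irish": "athair",
        "Tocharian B": "pācer",
    },
)
shown = visible_cognates(entry, {"Greek", "Latin"})
print(shown)
```

The Old Irish and Tocharian forms stay in the data; they are just hidden from this particular reader.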

I'm not sure how this will all play out, but I'm currently giving the spreadsheets a redo. I have a 1000-word frequency list for Latin and a 500-word list for Greek. Sanskrit frequency lists are hard to come by; the Heidelberg Corpus will let you construct frequency lists for any work it has, though. It doesn't have the Ṛgveda, which is a pity, since you'd ideally like to be able to use Vedic Sanskrit for a project like this. Instead, I'm going to base the list on the frequency lists of the sort of Classical Sanskrit works that someone learning Sanskrit is most likely to read. I'll use the Mahābhārata as a starting point, which seems wise, since it's so big and was written over such a long period that its frequency list should be fairly representative of Sanskrit as a whole. I'll then throw out proper names and include anything else that's in the frequency lists of the Rāmāyaṇa, the Hitopadeśa and the two texts in the corpus that are by Kālidāsa (the Meghadūta and Kumārasaṃbhava; he's pretty widely read, isn't he?).
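The merging procedure just described might look roughly like this. It's only a sketch: the function name, the `top_n` cutoff and the toy counts are all made up for illustration, and it assumes each frequency list has already been exported from the corpus as a plain lemma-to-count mapping.

```python
from typing import Dict, Iterable, List, Set


def build_core_list(
    base: Dict[str, int],
    supplements: Iterable[Dict[str, int]],
    proper_names: Set[str],
    top_n: int,
) -> List[str]:
    """Start from the base text's top lemmas, then append new lemmas from the other texts."""

    def top(freq: Dict[str, int]) -> List[str]:
        # Sort by descending count (ties alphabetically), drop proper names, keep top_n.
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        return [lemma for lemma, _ in ranked if lemma not in proper_names][:top_n]

    result = top(base)
    seen = set(result)
    for freq in supplements:
        for lemma in top(freq):
            if lemma not in seen:
                seen.add(lemma)
                result.append(lemma)
    return result


# Toy counts, invented purely for illustration (not real corpus figures).
mahabharata = {"ca": 50, "tad": 45, "arjuna": 40, "rājan": 30}
meghaduta = {"rāma": 60, "ca": 55, "megha": 10}
names = {"arjuna", "rāma"}

core = build_core_list(mahabharata, [meghaduta], names, top_n=3)
print(core)
```

The proper-name set would have to be compiled by hand (or from whatever tagging the corpus provides), which is probably the tedious part.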

There's also the trouble of defining what a "cognate" is for the purposes of this exercise, especially when you have a word that's a root plus a preverb, or, worse, an alpha privative. (It's clear that Latin ignorare and Greek γιγνώσκω are both from *ǵneh₃, for example, but they have opposite meanings, because ignorare also includes the negative prefix *n̥-.) Do we just include anything that falls under the same root?
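One possible answer, sketched below, is to group everything by root but record the prefix separately, so that the UI can either show or hide prefixed derivatives. This is just one way of slicing it; the `Reflex` record and its `negated` flag are hypothetical, and a real version would probably want a list of preverbs rather than a single boolean.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reflex(NamedTuple):
    """A word traced back to a root (hypothetical record layout)."""
    language: str
    form: str
    root: str
    negated: bool  # True if the word also contains the negative prefix


def group_by_root(reflexes: List[Reflex]) -> Dict[str, List[Reflex]]:
    """Collect all reflexes of the same root, prefixed or not."""
    groups: Dict[str, List[Reflex]] = defaultdict(list)
    for reflex in reflexes:
        groups[reflex.root].append(reflex)
    return dict(groups)


reflexes = [
    Reflex("Greek", "γιγνώσκω", "*ǵneh₃", negated=False),
    Reflex("Latin", "ignorare", "*ǵneh₃", negated=True),
]
groups = group_by_root(reflexes)

# Both words sit under the same root, but the negated one can be filtered out.
plain = [r.form for r in groups["*ǵneh₃"] if not r.negated]
print(plain)
```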

Classical Vocabulary for Philologists may be viewed here. Suggestions are encouraged. Right now there are a hell of a lot of columns. If and when the spreadsheet gets turned into something easier on the eyes, it will be much easier for everyone if the same sort of information can be found in the same columns. Eventually I'd like to be able to collapse some of the cognate columns (just have an Iranian column instead of Avestan and Persian, for example), but I'm not sure what the best way to do that is. Right now I'm ignoring cognates in the other Italic languages for basically this reason.
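For collapsing columns, one option is a lookup table from language to branch, applied when a row is displayed rather than baked into the spreadsheet. A rough sketch, where the `BRANCH_OF` table and the `collapse_columns` function are invented names and each form is labelled with its language so nothing is lost in the merge:

```python
from typing import Dict, List

# Hypothetical mapping; languages not listed keep their own column.
BRANCH_OF: Dict[str, str] = {
    "Avestan": "Iranian",
    "Persian": "Iranian",
}


def collapse_columns(cognates: Dict[str, str], branch_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Merge per-language cognate columns into per-branch columns."""
    collapsed: Dict[str, List[str]] = {}
    for language, form in cognates.items():
        column = branch_of.get(language, language)
        collapsed.setdefault(column, []).append(f"{language} {form}")
    return collapsed


row = {"Avestan": "pitar-", "Persian": "pedar", "Latin": "pater"}
merged = collapse_columns(row, BRANCH_OF)
print(merged)
```

The same table could later absorb the other Italic languages under an Italic column without adding any new columns to the underlying sheet.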