Sanskrit corpus help needed. (2014-04-17 21:41:05)

dhok (Alkali Metal, 235 posts)
quoting Rhetorica:
I think you may like this. The Digital Corpus of Sanskrit only lets you list the words in a given text or look them up one at a time, and their only download, the FrameNet XML, is pretty much useless outside of a very limited range of computational-linguistics analyses. (If that's the Sanskrit equivalent of Perseus, the field is doooomed.) It's not impossible to get the information you want from them, but it would require scraping and aggregating the results from each text.

It's got potential. The other half of the problem is that all the available Sanskrit texts are natural texts, which means they are written with the language's infamous sandhi, the sound changes that fuse and alter words at their boundaries. I'll eat my hat if I can get a computer to work out that gajośvaścagrāmādāgacchathaḥ is really gajas aśvas ca grāmāt āgacchathas. There are some easy shortcuts you can take: -ṃ is usually a final -m, and -ḥ is usually -s or -r. But combine the whole system with Sanskrit's equally infamous love of compounds, and you've got a recipe for disaster.
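To make the problem concrete, here is a minimal Python sketch. It is not an existing tool, and every name in it (`join_pair`, `pausa`, `join`, `segment`, `guess_final`, `LEXICON`) is hypothetical. It encodes only the handful of external sandhi rules needed for the example sentence, plus the -ṃ/-ḥ shortcuts. The splitter is plain generate-and-test: it tries every word sequence from a tiny made-up lexicon, applies the rules forward, and keeps the sequences whose output matches the surface string. The avagraha is left unwritten to match the spelling above.

```python
import itertools
import unicodedata


def nfc(s: str) -> str:
    """Normalize to NFC so precomposed letters like ś, ā, ḥ compare equal."""
    return unicodedata.normalize("NFC", s)


VOWELS = set(nfc("aāiīuūṛṝeo"))            # word-initial vowels (ai/au start with a)
VOICED_T = set("gdbyrv")                   # final -t becomes -d before these (j, ḍ, l, nasals not handled)
VOICED_AS = set(nfc("gjḍdbnmyrlvh"))       # final -as becomes -o before these voiced consonants


def join_pair(left: str, right: str) -> tuple[str, str]:
    """Apply a few external sandhi rules at one word boundary (a small subset only)."""
    if not right:
        return left, right
    first = right[0]
    if left.endswith("as") and first == "a" and not right.startswith(("ai", "au")):
        # -as + a- -> -o + elided a- (avagraha left unwritten)
        return left[:-2] + "o", right[1:]
    if left.endswith("as") and first in VOICED_AS:
        return left[:-2] + "o", right       # -as + voiced consonant -> -o
    if left.endswith("s") and first == "c":
        return left[:-1] + "ś", right       # -s + c/ch -> -ś
    if left.endswith("t") and (first in VOWELS or first in VOICED_T):
        return left[:-1] + "d", right       # -t + vowel or listed voiced initial -> -d
    if left.endswith("m") and first not in VOWELS:
        return left[:-1] + "ṃ", right       # -m + consonant -> anusvara -ṃ
    return left, right


def pausa(word: str) -> str:
    """Sentence-final -s or -r surfaces as visarga -ḥ."""
    return word[:-1] + "ḥ" if word.endswith(("s", "r")) else word


def join(words) -> str:
    """Forward sandhi: turn a sequence of underlying words into one surface string."""
    forms = [nfc(w) for w in words]
    if not forms:
        return ""
    for i in range(len(forms) - 1):
        forms[i], forms[i + 1] = join_pair(forms[i], forms[i + 1])
    forms[-1] = pausa(forms[-1])
    return nfc("".join(forms))


def segment(surface: str, lexicon, max_words: int = 6):
    """Generate-and-test: every word sequence whose sandhi output equals `surface`."""
    target = nfc(surface)
    lexicon = [nfc(w) for w in lexicon]
    return [
        list(seq)
        for n in range(1, max_words + 1)
        for seq in itertools.product(lexicon, repeat=n)
        if join(seq) == target
    ]


def guess_final(word: str) -> list[str]:
    """The shortcuts, reversed: -ṃ is usually -m; -ḥ is usually -s or -r."""
    word = nfc(word)
    if word.endswith(nfc("ṃ")):
        return [word[:-1] + "m"]
    if word.endswith(nfc("ḥ")):
        return [word[:-1] + "s", word[:-1] + "r"]
    return [word]


# Hypothetical toy lexicon: exactly the five underlying words of the example sentence.
LEXICON = ["gajas", "aśvas", "ca", "grāmāt", "āgacchathas"]

if __name__ == "__main__":
    print(segment("gajośvaścagrāmādāgacchathaḥ", LEXICON))
    # [['gajas', 'aśvas', 'ca', 'grāmāt', 'āgacchathas']]
    print(guess_final("āgacchathaḥ"))
    # ['āgacchathas', 'āgacchathar']
```

The catch is the lexicon. Here it holds only the five right answers, so the search is trivial. With a real dictionary the number of candidate sequences explodes, and compounds add splits inside each "word" on top of that.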