[WFS-India] [Talk-in] Automating OSM translation into Indic languages
Srikanth Lakshmanan
srik.lak at gmail.com
Mon Apr 6 09:10:59 UTC 2015
Hello,
Great work. I have been thinking about this for some time. I am of the
opinion that place names (towns, villages, etc.) should be translated and not
transliterated. Arun has a point about locality addresses: people might be
so used to English that they find translations in their own language
unusable.
For place names, would it be a good idea to run a script that looks up
Wikidata, extracts names in multiple languages and updates OSM? Below is a
sample query for 'Bangalore' in multiple languages.
[1]
https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&titles=Bangalore&languages=hi|ka|ml|ta|or|mr|gu|te&props=labels&format=xml
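
A rough sketch of what the lookup step of such a script could look like in Python, using only the standard library. It builds the same wbgetentities request as [1] (with format=json) and turns the returned labels into OSM-style name:xx tags. The function names and the User-Agent string are assumptions made up for illustration. Note that the Wikidata code for Kannada is kn; ka is the code for Georgian.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def build_label_query(title, languages, site="enwiki"):
    """Build a wbgetentities URL asking only for labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "languages": "|".join(languages),
        "props": "labels",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def extract_labels(response):
    """Map language code -> label from a decoded wbgetentities response."""
    labels = {}
    for entity in response.get("entities", {}).values():
        for lang, entry in entity.get("labels", {}).items():
            labels[lang] = entry["value"]
    return labels


def to_osm_tags(labels):
    """Turn {"hi": "..."} into {"name:hi": "..."}."""
    return {"name:" + lang: value for lang, value in labels.items()}


def fetch_name_tags(title, languages):
    """Network call; the User-Agent value is a placeholder, not a real tool name."""
    request = Request(
        build_label_query(title, languages),
        headers={"User-Agent": "osm-indic-names-sketch/0.1"},
    )
    with urlopen(request) as resp:
        return to_osm_tags(extract_labels(json.load(resp)))
```

Calling fetch_name_tags("Bangalore", ["hi", "kn", "ta"]) would return a dict of name:xx tags for whichever of those labels Wikidata has. The labels should still be reviewed by someone who reads the language before anything is uploaded to OSM.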
On Sat, Apr 4, 2015 at 11:54 PM, Aruna S <safincrazy at gmail.com> wrote:
> Hello!
>
> Long email warning.
>
> I've been thinking a little bit about automating the translation of maps
> into multiple Indic languages ever since I saw the Kannada map at geoBLR in
> March.
>
> I started some work on it today, and I have lots of interesting things to
> report. Right now I am mostly transliterating as opposed to translating but
> if a dictionary of common words/tags can be compiled, upgrading the script
> to translate instead of transliterating should be doable.
>
> Here's the algorithm I followed:
>
> 1. Get the nodes within a bounding box from OSM using the Python
> wrapper for Overpass - overpy
> <http://python-overpy.readthedocs.org/en/latest/example.html> - This
> returns a collection of nodes and associated ID, tags, lat, lon and other
> attributes. This can also be repeated for ways by using the corresponding
> overpy query.
> 2. Filter nodes that have tags
> 3. From the result of the filter, identify nodes with Indic language
> tags - eg:["name:kn"]
> 4. Transliterate the string value for tag["name:kn"] to another
> language - I used Tamil - and store it within tag["name:ta"] - I used the Indic
> transliterator <http://silpa.org.in/Transliteration> APIs from SILPA
> <http://transliteration.readthedocs.org/en/latest/> for this
> 5. Create a new changeset and upload the result(node with
> tag["name:ta"]) to OSM using osmapi
> <http://osmapi.divshot.io/#OsmApi.OsmApi.NodeUpdate>
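
Steps 2 to 4 above can be sketched without touching the network. This is not the script from the repository, only a minimal illustration: nodes are plain dicts shaped like Overpass JSON elements, and `transliterate` is a stand-in callable for whatever transliteration backend is used (its signature is an assumption). Skipping nodes that already carry the target tag is also an assumption, added so that existing names are not overwritten.

```python
def plan_name_updates(nodes, source_lang, target_lang, transliterate):
    """Return the nodes that would gain a new name:<target_lang> tag.

    nodes         -- iterable of dicts with "id" and (optionally) "tags"
    transliterate -- hypothetical callable (text, source_lang, target_lang) -> str
    """
    src_key = "name:" + source_lang
    dst_key = "name:" + target_lang
    updates = []
    for node in nodes:
        tags = node.get("tags") or {}
        if not tags:
            continue  # step 2: only nodes that have tags
        if src_key not in tags:
            continue  # step 3: only nodes with the source Indic tag
        if dst_key in tags:
            continue  # assumption: never overwrite an existing name
        new_tags = dict(tags)
        # step 4: transliterate and store under the target tag
        new_tags[dst_key] = transliterate(tags[src_key], source_lang, target_lang)
        updates.append({"id": node["id"], "tags": new_tags})
    return updates
```

Keeping this step free of API calls makes it easy to print and review the planned changes before step 5 uploads anything.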
>
> I did it only for one node:
> https://www.openstreetmap.org/edit?node=1118255762#map=19/12.99451/77.55430
>
>
> *Advantages*
>
> - *Indic to Indic transliterations - ✓*: The Indic transliterator APIs
> seem to convert quite effortlessly from one Indic language to another.
> Right now, support is available for Hindi, Tamil, Punjabi, Gujarati,
> Malayalam, Oriya, Bengali and Kannada. So, if a Kannada tag exists in OSM,
> the same text can be transliterated into multiple Indic languages using the
> naive algorithm I described above.
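
One reason Indic-to-Indic conversion is comparatively easy is that the Unicode blocks for these scripts are laid out in parallel, so a letter usually sits at the same offset in each block. The sketch below is a naive code-point shift that illustrates this. It is not a description of how the SILPA transliterator works internally, and it ignores many edge cases (shared punctuation, script-specific letters, pronunciation rules). Target code points that are unassigned are replaced with U+FFFD so that gaps are visible.

```python
import unicodedata

# First code point of each script's Unicode block, keyed by language code.
BLOCK_START = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Oriya
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80
MISSING = "\ufffd"  # replacement character for letters the target script lacks


def shift_script(text, source_lang, target_lang):
    """Move every character of the source block to the same offset in the target block."""
    src = BLOCK_START[source_lang]
    dst = BLOCK_START[target_lang]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # unicodedata.name() returns the default for unassigned code points
            if unicodedata.name(candidate, None) is None:
                candidate = MISSING
            out.append(candidate)
        else:
            out.append(ch)  # spaces, digits, Latin text stay as they are
    return "".join(out)
```

For example, Kannada KA (U+0C95) shifts to Devanagari KA (U+0915), while Kannada KHA (U+0C96) has no counterpart in the Tamil block and comes out as U+FFFD.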
>
> *Limitations*
>
> - *English to Indic transliterations - X*: Though the Indic
> Transliterator works for English To Indic transliterations as well, it is
> not very useful. This is because only English words that are in the CMU
> dictionary are capable of being transliterated - which means that we can't
> transliterate "Raajaajeenagar", even if we had a custom tag for
> transliteration on OSM. On emailing the developer
> <http://thottingal.in/blog/about/> of the transliterator about
> extending the capabilities of English transliteration, I was told that
> extending the dictionary by adding additional words is one option. I am not
> sure how feasible this is, or whether it is any better than
> translating to one Indic language and transliterating+translating to the
> rest.
> - *Translations of English Words - X* - Right now, I am only able to
> transliterate words, but if a list of common words (I am guessing all the
> OSM tags, and other common words) could be compiled and translated into
> all the Indic languages, the translation process could be automated quite
> easily. This would require the algorithm to have 2 additional steps:
>
>
> 1. From an Indic tag (i.e., an already translated tag), we would have to
> identify portions that are in the translations list, and leave them out of
> the transliteration process.
> 2. For the word(s) identified in step 1, we must find a translation
> in the translations list for the language we are translating into. This
> must then be suffixed or prefixed with the transliterated portion. I am
> guessing suffix will be the norm, while prefixes might occasionally be
> necessary.
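
The two extra steps could look roughly like this. It is only a sketch: the one-entry translations list, the function names and the `transliterate` callable are all made up for illustration, and splitting on spaces will miss generic words that are written joined to the proper name. Because word order is kept, a generic word that is a suffix in the source stays a suffix in the result.

```python
# Hypothetical translations list: concept -> form in each language.
TRANSLATIONS = {
    "road": {"kn": "ರಸ್ತೆ", "ta": "சாலை", "hi": "सड़क"},
}


def build_reverse_index(translations, lang):
    """Map a word in `lang` back to its concept, e.g. the Kannada word for road -> "road"."""
    return {
        forms[lang]: concept
        for concept, forms in translations.items()
        if lang in forms
    }


def convert_name(name, source_lang, target_lang, transliterate,
                 translations=TRANSLATIONS):
    """Translate words found in the translations list, transliterate the rest."""
    reverse = build_reverse_index(translations, source_lang)
    out = []
    for word in name.split():
        concept = reverse.get(word)  # extra step 1: is this a listed word?
        target = translations[concept].get(target_lang) if concept else None
        if target is not None:
            out.append(target)  # extra step 2: use the listed translation
        else:
            out.append(transliterate(word, source_lang, target_lang))
    return " ".join(out)
```

If the translations list has no entry for the target language, the word falls back to plain transliteration, so the script degrades to the current behaviour instead of failing.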
>
>
> - *Tracking node version numbers - X* - Right now, I am unable to
> read the version attribute of a node using the overpy API, so I entered
> the version number manually. Not sure if I am missing something. This is
> just a "need-to-figure-out" issue more than anything. It is very
> important for automatically updating a node on the server, because if
> there's a mismatch between the version number being passed to the API and
> the version number on the server, the server rejects the update.
> - *Which Indic language to begin transliterating from* - Issues might
> arise if a language like Tamil - where the letter for ka, kha, ga, gha etc
> is the same - is used as the source when transliterating to Hindi. But if
> we start from a language like Kannada or Hindi, this issue can probably
> be resolved easily.
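
On the version numbers: Overpass only includes the version (along with changeset, timestamp and user) when the query ends with `out meta;` instead of a plain `out;`. The sketch below works on the raw Overpass JSON response, since how overpy exposes these fields has not been checked here. The output dict uses the key names understood to be expected by osmapi's NodeUpdate (id, lat, lon, tag, version); treat that shape as an assumption and verify it against the osmapi documentation.

```python
def build_node_query(south, west, north, east):
    """Overpass QL for all nodes in a bounding box, including metadata."""
    return f"[out:json];node({south},{west},{north},{east});out meta;"


def versions_by_id(overpass_json):
    """Map node id -> current version from a decoded Overpass JSON response."""
    return {
        el["id"]: el["version"]
        for el in overpass_json.get("elements", [])
        if el.get("type") == "node" and "version" in el
    }


def node_update_payload(element, key, value):
    """Copy an Overpass node element into an update dict with one tag added.

    The version is carried over unchanged so that the server can detect
    whether the node was edited by someone else in the meantime.
    """
    tags = dict(element.get("tags", {}))
    tags[key] = value
    return {
        "id": element["id"],
        "lat": element["lat"],
        "lon": element["lon"],
        "tag": tags,
        "version": element["version"],
    }
```

With the version taken from the same response as the tags, there is no need to enter it by hand, and a conflict on upload then means the node really did change after it was downloaded.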
>
> The script is on GitHub
> <https://github.com/anura28/Automate-Translations-OSM/blob/master/automateIndicTranslation.py>.
> Feel free to fork it, use it, work on it, edit it and suggest changes,
> different language, other possibilities, alternatives etc. Pull Requests
> very welcome. :)
>
> This is my first time writing code in Python, so advice on improving code
> would be very welcome. Also, let me know if I'm missing something else,
> obvious or subtle.
>
> Thanks!
>
> Warmly,
>
> Aruna
>
> _______________________________________________
> Talk-in mailing list
> Talk-in at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk-in
>
>
--
Regards
Srikanth.L