What a wanderer could wonder about...

Sunday, December 02, 2007

MIT's Lecture Browser, Dehkhoda and Moien online and Nastaliq!

People at MIT's CSAIL have set up this very interesting Lecture Browser system, which, with the help of voice recognition, lets you search through lectures given at MIT (I think the whole audio and video database is not yet available there, but that is the plan).

In the introduction video the project is described like this:

"Conventional search engines are all text based, and that's been very effective, it's great. There's all sorts of text on the web, but it doesn't do anything for audio and video materials, and this type of data is just exploding these days. It's becoming easier than ever to create, to store, to disseminate these kinds of data. Just look at the explosion of podcasting, for example. This project is all about letting people search inside video material of recorded lectures, to find particular snippets that they're interested in..."

This is just great. Although it is not as powerful as you would expect (the ranking of the results doesn't look that good, and there isn't much room for complex search queries), it is still a great idea. Perhaps the idea has been around for a long time, but putting it into practice for lectures given at MIT is, I think, going to show how useful it is to apply semi-perfect voice recognition to a voice/video database and index the database based on the result.
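To make the indexing idea a bit more concrete, here is a minimal sketch of how time-stamped transcript snippets could be indexed and searched. This is only my own illustration, not how the Lecture Browser actually works: the segment format, the function names (tokenize, build_index, search) and the sample sentences are all made up, and a real system would also need ranking and some tolerance for recognition errors.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(segments):
    # Inverted index: word -> set of segment positions containing it.
    index = defaultdict(set)
    for seg_id, (_lecture, _start, text) in enumerate(segments):
        for word in tokenize(text):
            index[word].add(seg_id)
    return index

def search(index, segments, query):
    # Return (lecture, start_seconds) for segments containing every query word.
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [(segments[i][0], segments[i][1]) for i in sorted(hits)]

# Hypothetical recognizer output: (lecture title, start time in seconds, text).
segments = [
    ("Lecture A", 0.0, "today we talk about eigenvalues of a matrix"),
    ("Lecture A", 42.5, "the recognizer may mishear some words"),
    ("Lecture B", 10.0, "a symmetric matrix has real eigenvalues"),
]

index = build_index(segments)
print(search(index, segments, "matrix eigenvalues"))
```

Each hit carries the lecture and the start time of the snippet, which is what would let a player jump straight to the right spot in the video.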

We have a long way to go before we can have this for video and audio recordings of speeches given in Persian. As far as I know, we are far behind in voice recognition and NLP systems for Persian; there isn't even a proper marked-up database of Persian scripts or voice recordings, which current technology needs in order to build a functioning voice recognition or NLP system for the language.

Speaking of Persian, I have long been looking for a proper Persian-Persian dictionary online, and I had always wished that looking up words in Persian were as easy as looking up words in English. Fortunately, last week I found Mibo, which lets you look up Persian words in the two prominent dictionaries, Dehkhoda and Moien. There seem to be some legal and copyright problems, and some people are trying to shut the site down (according to Tehran Emrooz), but I hope that, instead of shutting it down, people put the effort into getting a legal version online first.

Another piece of good news related to the Persian language that I heard last week was that, within a Tasma- or Takfa-supported project, a Nastaliq font has been developed and is freely available online (the release announcement, download link). It seems to have some problems with پ and چ when they come before a final ی, and the font is relatively smaller than other fonts, but if these bugs are fixed it will be just great.

1 comment:

Tim said...

Hi, do you by chance know the folks at Mibo? We are interested in creating a Persian version of our Chinese-English tool - see http://loqu8.com. We just need to track down the data...