SubDB
The Portuguese version of our web client, Easy Subtitles, recently got a Facebook page. Check it out here: http://facebook.com/legendafacil :)

Catching up

So, it has been a long time since our last post here. Google has already dropped its Translate API, and our N-Gram-Based Text Categorization implementation has proven to be more than suitable for the job.

Since our last post, we moved to a new server at Linode and tweaked our services to provide blazing fast access. Even with almost 9 million requests per month, our CPU load never exceeded 6.11%. So, I think we are good, for now ;)

Also, our HTML5 web client, Easy Subtitles, has been updated with a new user interface and some other new features, like notifications via boxcar.io and the ability to add language extensions to the downloaded files.

To our Brazilian high-definition lovers: we are proud to announce that the Extreme HD Team is now uploading their amazing subtitles (resyncs for high-definition videos) directly to our database.

We are working hard to provide you with a great, lightning fast, totally free experience, without annoying ads. We hope you enjoy it ;D

N-Gram-Based Text Categorization

At this point, you might have heard that Google has deprecated its Language API:

The Google Translate API has been officially deprecated as of May 26, 2011. Due to the substantial economic burden caused by extensive abuse, the number of requests you may make per day will be limited and the API will be shut off completely on December 1, 2011.

More on: Google Code Blog and Google Translate API Documentation

As you know, we used the Google Translate API for language detection. No more! We are now using an implementation of N-Gram-Based Text Categorization to do this vital work on our own servers.

This means faster uploads and better language detection: faster, because we no longer have to send requests to Google’s servers during the upload process, and better, because we can now probe more chunks of data, analyze the results and tune the system to our specific needs.
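For the curious, here is a minimal sketch of the general idea behind N-Gram-Based Text Categorization: build a ranked profile of the most frequent character n-grams for each language, then pick the language whose profile is "closest" to the profile of the text being classified. This is an illustration only, not our production code. The function names, the tiny training samples, the profile size and the penalty value are all assumptions made for the example.

```python
from collections import Counter

PROFILE_SIZE = 300  # assumed cap on n-grams per profile; also used as the penalty

def ngram_profile(text, max_n=3):
    """Map each of the most frequent character n-grams of `text` to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ranked[:PROFILE_SIZE])}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language pay a fixed penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else PROFILE_SIZE
        for gram, rank in doc_profile.items()
    )

def detect_language(text, lang_profiles):
    """Return the language whose profile has the smallest out-of-place distance."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))

# Toy training data (hypothetical); a real system would use far larger samples.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "pt": "a raposa marrom pula sobre o cachorro e depois o cachorro dorme no sol da manhã",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(detect_language("the dog sleeps in the sun", PROFILES))  # en
print(detect_language("o cachorro dorme no sol", PROFILES))    # pt
```

Because the profiles are just ranked lists, probing several chunks of a subtitle only means calling `detect_language` on each chunk and comparing the answers.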

To make this possible, we now support only a small subset of languages: Dutch, English, French, Italian, Polish, Portuguese (Brazil), Romanian, Spanish, Swedish and Turkish. We have plans to add more languages in the future, but we’ve chosen quality over quantity.

Thanks to @edufelipedev for the suggestion of the algorithm and @wilkerlucio for helping with the tests.

Cheers.

Easy Subtitles (and some stats)

How about trying a new way of searching for subtitles in many languages for your video files? All you need to do is drag a video file directly from your computer to your browser. The file size doesn’t matter: the search is almost instantaneous.

Try it now on Easy Subtitles!

Didn’t find what you were looking for? Come back later, or leave your email to be notified as soon as a subtitle becomes available.

You can also contribute to the SubDB project by uploading subtitles through your browser. Just drag a video file (the same way you do when searching) together with its subtitle, which should have the same name (e.g. video.mkv and video.srt). Just like the search, the upload is almost instantaneous.
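The "same name" rule can be sketched in a few lines. This is only an illustration of the matching idea, not the Easy Subtitles code: the function name and the list of video extensions are assumptions, and only .srt is taken from the example above.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}  # assumed list, for illustration
SUBTITLE_EXTENSIONS = {".srt"}               # the extension used in the example above

def pair_videos_with_subtitles(filenames):
    """Return (video, subtitle) pairs whose names match apart from the extension."""
    subtitles = {}
    videos = []
    for name in filenames:
        path = Path(name)
        extension = path.suffix.lower()
        if extension in SUBTITLE_EXTENSIONS:
            subtitles[path.stem] = name
        elif extension in VIDEO_EXTENSIONS:
            videos.append(name)
    # Keep only the videos that have a subtitle with the same base name.
    return [(video, subtitles[Path(video).stem])
            for video in videos if Path(video).stem in subtitles]

print(pair_videos_with_subtitles(["video.mkv", "video.srt", "other.avi"]))
# [('video.mkv', 'video.srt')]
```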

As usual, some more stats:

More than 60K subtitles were found with SubDB in the past two months. It’s still a small number considering that, in the same period, more than 2M (two million) requests were made to our API.

We had a significant drop in the number of requests, due to summer vacations I suppose, but in general the important stats, like the number of uploaded and downloaded subtitles, are getting better.

Do you like our service? Tell your friends. It’s all we ask! (Not to mention uploading subtitles too… :)

We’re getting smarter!

First, I would like to share some stats with you:

Percent of requests on our API last week:

  • Periscope: 93.63%
  • Pyrrot: 3.26%
  • XBMCSubtitles: 2.91%
  • EasySubtitles: 0.20%

The number of uploads rose a little bit but, sadly, it is nothing to celebrate yet.

Now, the good news:

We have improved our SubRank algorithm, which means that the chances of getting a wrong, out-of-sync or otherwise bad subtitle are now even lower than before. Of course, since our algorithm works by analyzing the subtitles you upload, more uploads mean higher accuracy.

XBMC Subtitles and other stats

Thanks to the XBMC Subtitles plugin, which you can install directly from the XBMC interface, you’ll now be able to download subtitles from SubDB. It’s the way we like it: simple and easy to use.

Learn how to install it from XBMC here.

Other stats:

Yesterday alone, we handled more than 90K (yes, ninety thousand) requests on our API. Sadly, only five of those requests were uploads. This is probably related to the fact that only our demo client implements upload for now.

So we ask you, developer, to help us by implementing the upload method in your client. It’s very simple.

And we ask you, user: if you know how to use our demo client (Pyrrot), please upload your subtitles.

New clients and API methods

For those who follow us on Twitter, this is not news, but if you don’t…

SubDB has two new clients

Periscope - a Python module to download subtitles that supports SubDB and many others; and

Easy Subtitles - search for subtitles by dragging video files directly into your browser.

Two new API methods

Search - lists the available subtitle languages for a given hash.

Available languages - lists the languages of all subtitles stored in our database (not all languages that we support - see Google Language API for this).

For more information, check our API documentation.
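As a rough sketch of what a client does before calling Search: it computes a hash that identifies the video file and then builds the request. Everything specific below is an assumption to be checked against the API documentation, not something stated in this post: the hash scheme (MD5 over the first and last 64 KiB of the file), the base URL, and the `action` and `hash` parameter names. The sketch only builds the URL and makes no network request.

```python
import hashlib
import os
from urllib.parse import urlencode

READ_SIZE = 64 * 1024                 # assumed: 64 KiB from each end of the file
API_BASE = "http://api.thesubdb.com/"  # assumed base URL; check the API documentation

def subdb_hash(path):
    """MD5 of the first and last READ_SIZE bytes of the file (assumed scheme)."""
    size = os.path.getsize(path)
    with open(path, "rb") as video:
        head = video.read(READ_SIZE)
        video.seek(max(0, size - READ_SIZE))  # avoid seeking before the start
        tail = video.read(READ_SIZE)
    return hashlib.md5(head + tail).hexdigest()

def search_url(file_hash):
    """Build the URL for the Search method (parameter names are assumptions)."""
    return f"{API_BASE}?{urlencode({'action': 'search', 'hash': file_hash})}"
```

Under this assumed scheme only two small slices of the file are read, so hashing takes about the same time for a small clip and for a full-length movie.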

Supported languages

Another question we frequently receive is why SubDB supports so few languages. The answer is: we do not support just a few languages, we support a bunch of them. Our systems can handle any language that the Google Language API supports. Unfortunately, no subtitles in any language other than Dutch, English and Portuguese have been uploaded yet. If someone uploads a French subtitle, for example, it’ll instantly appear on the SubDB main page.

Of course, the website and API documentation aren’t translated into all those languages yet, but you can help us translate SubDB into your own language. Contact us.

Update your clients

Attention: If you are using Pyrrot-cli, update your client to the latest version from GitHub. Also, if you are developing a client and receive a “412 Precondition failed” response, check the API documentation to see how the User-Agent string must be formatted.

How to report an invalid subtitle

People frequently ask us how they can report an invalid subtitle (out of sync, broken, with lines missing, etc.).

This is one of the coolest things about SubDB. You just don’t need to. If you happen to download an invalid subtitle, all you have to do is upload a correct one. Our database can handle more than one subtitle per video and language. Sounds easy, right?

How do you choose the right one to download? Again, you don’t need to! Our system uses a number we call SubRank, calculated to determine which is the best subtitle for your video file. Cool, huh?

We handle all the boring stuff, so you can have more time to enjoy your videos.
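To picture how several subtitles per video and language can coexist, here is a minimal sketch of the selection step only. How SubRank itself is calculated is not described in this post, so the `subrank` values below are made-up numbers, and the `Subtitle` class and `best_subtitle` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass

@dataclass
class Subtitle:
    language: str
    subrank: float  # hypothetical score; the real SubRank calculation is not shown here
    content: str

def best_subtitle(candidates, language):
    """Return the highest-ranked subtitle in `language`, or None if there is none."""
    matching = [sub for sub in candidates if sub.language == language]
    return max(matching, key=lambda sub: sub.subrank, default=None)

# Two English uploads for the same video: the better-ranked one is served.
uploads = [
    Subtitle("en", 0.40, "out of sync"),
    Subtitle("en", 0.90, "corrected upload"),
    Subtitle("pt", 0.70, "legenda"),
]
print(best_subtitle(uploads, "en").content)  # corrected upload
```

Uploading a corrected subtitle simply adds another candidate to the list; nothing has to be reported or deleted.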