Thursday, September 2, 2010

Current state of Blu-ray Subtitle Rendering Software

This is a reply I wrote this morning. I think it is quite decently written apart from the grammar… It pretty much summarizes my understanding of Blu-ray subtitle software after nearly a year of research and experiments on this very subject. It mainly compares BDSup2Sub and easySUP, the most widely used subtitle “rendering software” among video enthusiasts, with how a CORRECT subtitle tool should work to prevent flickering, which becomes an issue once someone decides to do lots of fancy typesetting. The correct tool I found is called avs2bdnxml, and only the latest update, 1.13, works correctly. The subtitle software alone is NOT enough either: the BD authoring tool has to be able to do some optimization as well. Most of the information here is derived from this thread on doom9, if you’re interested in how the buffer overflow and flickering issues were recognized, and how techniques to solve them were discovered and developed over the past year…


The flicker on DVD is completely different from the flicker on BD. First of all, I assume we all use MaestroSBT to generate DVD subtitles (subpictures). From what I read, the flicker on DVD is caused by the authoring software muxman, which does not work with continuous timecodes. It was never a problem for DVDMaestro and Scenarist, which is really what MaestroSBT is designed for. Also, subtitles and menus are in the same plane and are both called subpictures; the only difference is that menus are forced to display and have buttons. There may be no dedicated decoding buffer (I can’t verify this). The craziest thing I have done is generate a 30fps 720*480 four-color stream from Flash and stick it into the subtitle track. Some cheap DVD players, such as a CyberHome CH-300, play it totally fine.
The flicker on BD is caused by the limited decoding buffer (4MB), a corner SONY decided to cut when making the spec. In HDMV mode, the subtitles (Presentation Graphics in BD terms, PG for short) are no longer in the same layer as the menus (Interactive Graphics in BD terms, IG for short). In BD-J mode, the menus are in yet another separate layer for Java graphics, while subtitles still use the same PG. So PG has its own dedicated decoding buffer of 4MB, and lots of optimization has to be done in order not to overflow it.
The subtitles have a hierarchical structure:
Epoch -> Window -> DisplaySet -> Color Palette -> Composition Object.
The Epoch governs everything underneath it. Subtitles with continuous timecodes have to be grouped into the same Epoch; otherwise the buffer will overflow. Ironically, BDSup2Sub and easySUP do exactly the opposite: if the subtitles have continuous timecodes, they put each subtitle into its own Epoch. This means that the output will have gaps (a source of flickering), since Epochs cannot be too close to each other. BDSup2Sub and easySUP fail this criterion from the very beginning.
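To make the grouping rule concrete, here is a minimal Python sketch of how events with continuous timecodes could be collected into one Epoch. It is only an illustration, not how Scenarist BD or any other tool actually does it: the function names, the 24 fps frame rate, and the `max_gap` threshold are all my own assumptions, and the timecodes are plain HH:MM:SS:FF strings.

```python
from typing import List, Tuple

FPS = 24  # assumed frame rate for the FF field; the real value depends on the project


def tc_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an HH:MM:SS:FF timecode string to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def group_into_epochs(events: List[Tuple[str, str]],
                      max_gap: int = 0) -> List[List[Tuple[str, str]]]:
    """Group (in_tc, out_tc) pairs, already sorted by start time, into epochs.

    An event joins the current epoch when it starts no more than
    max_gap frames after the previous event ends (hypothetical rule).
    """
    epochs: List[List[Tuple[str, str]]] = []
    prev_out = None
    for in_tc, out_tc in events:
        start = tc_to_frames(in_tc)
        if prev_out is not None and start - prev_out <= max_gap:
            epochs[-1].append((in_tc, out_tc))
        else:
            epochs.append([(in_tc, out_tc)])
        prev_out = tc_to_frames(out_tc)
    return epochs


events = [
    ("00:02:51:09", "00:02:55:17"),
    ("00:02:55:17", "00:02:58:00"),  # starts exactly where the previous one ends
    ("00:03:10:00", "00:03:12:00"),  # clear gap, so it opens a new epoch
]
epochs = group_into_epochs(events)
print(len(epochs))  # 2
```

The first two events share an Epoch because their timecodes are continuous, which is the opposite of giving every subtitle its own Epoch.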
The Window controls where your subtitle resides. It is read from your BDN XML file:
<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>
This means your window is 443x207, and it sits 92 pixels horizontally and 766 pixels vertically from the top left corner of the 1920x1080 background plane.
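As a quick sanity check, the same Graphic line can be read with a few lines of Python to get the window rectangle and confirm it stays inside the 1920x1080 plane. This is just a sketch: `window_rect` and `fits_plane` are names I made up for illustration, not part of any subtitle tool.

```python
import xml.etree.ElementTree as ET

PLANE_W, PLANE_H = 1920, 1080  # background plane size from the text

snippet = '<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>'


def window_rect(graphic: ET.Element):
    """Return (left, top, right, bottom) for a Graphic element."""
    x, y = int(graphic.get("X")), int(graphic.get("Y"))
    w, h = int(graphic.get("Width")), int(graphic.get("Height"))
    return (x, y, x + w, y + h)


def fits_plane(rect) -> bool:
    """True when the rectangle lies completely inside the plane."""
    left, top, right, bottom = rect
    return left >= 0 and top >= 0 and right <= PLANE_W and bottom <= PLANE_H


rect = window_rect(ET.fromstring(snippet))
print(rect, fits_plane(rect))  # (92, 766, 535, 973) True
```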
When timecodes are continuous and a bunch of subtitles are in one Epoch, the window size goes by the largest window of these subtitles. This is how Scenarist BD works. Tsmuxer cannot alter a subtitle stream this way, so the subtitle rendering software has to have a built-in mechanism to make the adjustment. I’m not sure if the SUP format can contain this information.
The BD spec also allows a dual window configuration, where you can have two windows in one Epoch. One subtitle event can have two graphics in different windows. This is achieved like this:
<Event Forced="False" InTC="00:02:51:09" OutTC="00:02:55:17">
<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>
<Graphic Width="974" Height="140" X="474" Y="62">00005139_1.png</Graphic>
</Event>

This results in two windows: one of 443x207, 92 pixels horizontally and 766 pixels vertically from the top left corner, and another of 974x140, 474 pixels horizontally and 62 pixels vertically from the top left corner. Note that these two windows cannot overlap.
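The no-overlap rule is easy to check before importing anything. Below is a small Python sketch that parses the event above and tests whether its two windows intersect; the `overlaps` helper is my own illustration, and I am assuming that windows which merely touch at an edge do not count as overlapping.

```python
import xml.etree.ElementTree as ET

event_xml = """<Event Forced="False" InTC="00:02:51:09" OutTC="00:02:55:17">
<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>
<Graphic Width="974" Height="140" X="474" Y="62">00005139_1.png</Graphic>
</Event>"""


def rect_of(graphic: ET.Element):
    """Return (left, top, right, bottom) for a Graphic element."""
    x, y = int(graphic.get("X")), int(graphic.get("Y"))
    w, h = int(graphic.get("Width")), int(graphic.get("Height"))
    return (x, y, x + w, y + h)


def overlaps(a, b) -> bool:
    """True when two rectangles share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


event = ET.fromstring(event_xml)
windows = [rect_of(g) for g in event.findall("Graphic")]
print(windows)
print(overlaps(windows[0], windows[1]))  # False
```

Here the two windows share some horizontal range but sit far apart vertically, so they are fine.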
When timecodes are continuous and a bunch of subtitles are in one Epoch, the two window sizes go by the largest two windows of these subtitles. This is how Scenarist BD works. If the image splitting and window assignments are too complex, Scenarist BD will lump the two windows into one giant window.
BTW Scenarist BD also supports auto cropping if your subtitles are not cropped, but only for a single window.
Dual window splitting together with Epoch combination is the most important part of minimizing buffer usage. The window splitting comes from your subtitle rendering software. BDSup2Sub and easySUP do NOT have the ability to read a dual window configuration, and this is where they ultimately fail in handling flickering. The Epoch combination comes from your BD authoring tool; I’m not sure if this information can be included in the SUP format.
A DisplaySet is one event (subtitle). Each DisplaySet can have its own palette. An event is read from the BDN XML file as well:
<Event Forced="False" InTC="00:02:51:09" OutTC="00:02:55:17">
<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>
<Graphic Width="974" Height="140" X="474" Y="62">00005139_1.png</Graphic>
</Event>

This event is a non-forced subtitle starting at 00:02:51:09 and ending at 00:02:55:17.
Tsmuxer has a weird bug where you can’t mux a subtitle that starts within the first 2s or so. But this is totally valid in the BD spec; the timing can start at 00:00:00:00.
A color palette can have a maximum of 256 colors, where the last color entry has to be transparent. So it’s really 255 colors…
A Composition Object is a graphic:
<Event Forced="False" InTC="00:02:51:09" OutTC="00:02:55:17">
<Graphic Width="443" Height="207" X="92" Y="766">00005139_0.png</Graphic>
<Graphic Width="974" Height="140" X="474" Y="62">00005139_1.png</Graphic>
</Event>

So when dual window splitting is on, a DisplaySet will have two graphics (Composition Objects). Normally a DisplaySet has one Composition Object. A Composition Object cannot be smaller than 8x8 pixels. Although software players don’t have this limitation, the PS3 follows it rigorously: graphics smaller than 8x8 are discarded.
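Since the PS3 silently drops anything too small, it is worth screening the graphics up front. Here is a tiny Python sketch of such a check. I am assuming the 8x8 limit applies to both width and height, and the `tiny_dot.png` entry is a made-up example, not something from a real project.

```python
MIN_SIDE = 8  # minimum Composition Object size from the text


def is_valid_object(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """True when both dimensions meet the minimum (assumed interpretation)."""
    return width >= min_side and height >= min_side


graphics = [
    ("00005139_0.png", 443, 207),
    ("00005139_1.png", 974, 140),
    ("tiny_dot.png", 6, 6),  # hypothetical graphic that would be discarded
]

discarded = [name for name, w, h in graphics if not is_valid_object(w, h)]
print(discarded)  # ['tiny_dot.png']
```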
Also, according to the author of BDSup2Sub, there can be 8 palettes and 64 composition objects per Epoch. But I don’t think Scenarist BD uses this kind of implementation. Scenarist BD uses the DisplaySet: each DisplaySet can have its own palette and up to 2 Composition Objects. I’m not sure exactly how many DisplaySets an Epoch can have, but I think it’s more than enough to cover the most complex timings...
 
Summary:
For BD subtitles, you can't just slap in a bunch of full 1080 png pictures (Presentation Graphics in BD terms, PG for short) like you did for DVD. If you want to keep it simple yet have some degree of fanciness, the rule of thumb is to limit your text to 1920*540, which is half of the vertical resolution. From what I tested, this works pretty well, even for continuous timecodes. The only catch is that you can’t use BDSup2Sub and easySUP if your timings are continuous, because they will intentionally create gaps…
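Some rough arithmetic shows why full-frame pictures are a problem and why the half-height rule helps. The Python sketch below assumes one byte per pixel, which follows from the 256-color palette, and treats the 4MB buffer as holding plain uncompressed bitmaps. How a real player accounts for its buffer is not something I can verify, so take this as an estimate only.

```python
BUFFER_BYTES = 4 * 1024 * 1024  # the 4MB decoding buffer from the text


def bitmap_bytes(width: int, height: int) -> int:
    """Size of an uncompressed bitmap at one byte per pixel (assumption)."""
    return width * height


full = bitmap_bytes(1920, 1080)  # full-frame picture
half = bitmap_bytes(1920, 540)   # the half-height rule of thumb

print(full, BUFFER_BYTES // full)  # 2073600 2
print(half, BUFFER_BYTES // half)  # 1036800 4
```

Under these assumptions only two full-frame pictures fit in the buffer at once, while four half-height ones do, and tightly cropped windows leave far more headroom still.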
If you want to go fancy, basically to reach the same level as MaestroSBT for DVD, things can get really messy. A lot of tweaking is needed before importing into a BD authoring tool, including cropping the png pictures and referencing dual window splitting in the BDN XML file. Avs2bdnxml 1.13 can finally do this after almost a year of development. Your BD authoring tool also needs to be able to read this dual window splitting format (Composition Objects in BD terms) and to combine subtitles that are close to each other into the same Epoch. Only when both of these conditions are met can you have totally flicker-free subtitles. So Scenarist BD is THE tool to handle this; I’m not sure if the other high-end commercial solutions would make it.
Avs2bdnxml can only output png+xml files, which means it only works with Scenarist BD, which takes png+xml and outputs its proprietary PES+mui files. These PES+mui files retain all the information I mentioned above except the language setting. They can be used in other projects by simply dragging them into the Data browser.
Folks who use the most popular free muxer on earth, tsmuxer, cannot take advantage of it. The SUP format is the only way to go with tsmuxer. I’m not sure if ps auxw, the author of avs2bdnxml, is still planning SUP support. A couple of months back he said it was 50% done. I’m not sure if he ran into difficulties or just hasn't had enough time to look into it yet. But for SUP to work, it must be able to contain Epoch combination and dual window information, which, according to the developer of BDSup2Sub, is very unlikely.
Here is a quote from him:
“I don't think that the BDN XML format in its current form is able to store the information that would be needed to define palettes, objects, windows and epochs in a way that would be needed to allow an external converter to accurately convert it to BD-SUP.
Even if someone would define a new XML format to cover all of this, there would be no tool to create such files. Last but not least, someone would have to write the converter with authoring capabilities (calculating all the DTS/PTS timestamps for a complex PG stream with all plausibility and limitation checks will be a nightmare).”

 
 
Shortly after I posted the above, someone asked if text-based subtitles are workable, and here is my response:
No, text-based subtitles (textST) are even worse. They're really designed for minimalists.
TextST does not support overlapping timecodes, period! It does support dual windows like PG, and each composition object can have its own style (color, size, font). If your font is too big, you can't have timings that are too close. I think there are certain formulas it follows: text this big has to be x seconds apart from the next one...
What's nice about textST is that it is of the out-of-mux type, meaning that you can add it to your Blu-ray player's temporary storage any time you want via the BD-Live feature. (I have yet to see any studio utilizing this feature though...) PG, on the other hand, is of the in-mux type, meaning that it's already muxed with video and audio, and you cannot change it unless you reauthor the disc. A BD can have up to 32 PG tracks and 256 textST tracks. So only use textST if you are running out of the 32 PG tracks…
TextST relies on OpenType fonts (*.otf) for display. Each font cannot be larger than 4MB. I think the total size cannot be larger than 4MB either (I might be wrong on this). What's really weird about this font issue is that real otf fonts are NOT supported, but TrueType fonts (*.ttf) can be used just by changing their extension to *.otf... You should have quite a few TrueType fonts already in your Windows OS; just copy out any of them (as long as it is less than 4MB), change the extension to *.otf, and it’s ready to use.
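If you have many fonts to prepare, the copy-and-rename step can be scripted. Below is a minimal Python sketch that copies a TrueType file under an .otf name and refuses anything over the 4MB per-font limit. The function name and the placeholder demo file are my own inventions; in practice you would point it at a real font copied out of your system font folder.

```python
import shutil
import tempfile
from pathlib import Path

MAX_FONT_BYTES = 4 * 1024 * 1024  # the 4MB per-font limit from the text


def copy_ttf_as_otf(src: Path, dest_dir: Path) -> Path:
    """Copy a .ttf file into dest_dir under an .otf name, enforcing the size limit."""
    size = src.stat().st_size
    if size > MAX_FONT_BYTES:
        raise ValueError(f"{src.name} is {size} bytes, over the 4MB limit")
    dest = dest_dir / (src.stem + ".otf")
    shutil.copyfile(src, dest)
    return dest


# Demo with a placeholder file so the sketch runs anywhere; it is not a real font.
with tempfile.TemporaryDirectory() as tmp:
    fake_font = Path(tmp) / "demo.ttf"
    fake_font.write_bytes(b"\x00" * 1024)
    out_dir = Path(tmp) / "fonts"
    out_dir.mkdir()
    result = copy_ttf_as_otf(fake_font, out_dir)
    result_name = result.name
    result_size = result.stat().st_size

print(result_name, result_size)  # demo.otf 1024
```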
Currently Lemony Pro and srt2bdn (a program written by someone on bdjforum.com) can be used to generate textST. But you also need a professional BD authoring tool such as Scenarist BD to generate this out-of-mux type of subtitle. The results come as *.m2ts files of very small size. Tsmuxer does not support textST muxing/remuxing. Lemony Pro has a built-in checker for validity issues such as the close timing mentioned above, while srt2bdn does not check, so you’ll have to go by trial and error in Scenarist BD. Their subtitle quality is about the same, but the final result depends on your player. I’ve found that PowerDVD insists on its own style, which is extremely ugly, while TMT3 and PS3 display the same picture as previewed in Scenarist BD.
