Using Captionate Exported XML (June 13, 2006, revision 1)
 
Introduction
Captionate lets you embed data into Flash Video (FLV) files. Captionate can also export (and import) the data in its own XML format.
 
XML export/import in Captionate is designed mainly for data exchange. For example, it may be preferable to have the caption text translated off-site using the XML file, rather than doing the translation in Captionate. While embedding data has its advantages, it also has a disadvantage (which we will discuss shortly), and you may want to use the exported XML file for captioning (or for other purposes). In this article we will also provide some sample code for using the Captionate XML captions data in Flash.
 
Captionate can also export to other formats via export library plug-ins. Currently, there's a free plug-in available from Manitu Group for exporting the captions of a single language track in the Rich Media Project players XML format.

 
Why Embed the Data?
Why not? Your video file already has the video and audio embedded in a single file, so it's only natural to have other types of data there too, in a single 'complete' file.
 
Also, the obvious reason why video (and/or audio) files are generally used externally applies to other data as well: you load the data as it is needed, not the whole file at once. Loading the whole file consumes memory and, maybe more importantly, usually introduces a noticeable delay. Caption data will be quite small compared to the video/audio data, but loading it from another source will still introduce a (maybe unnecessary) delay.
 
Another advantage is that you receive the data via a callback function (your event handler is called) when it's the correct time for the data. The alternative involves constantly polling the time of the video (usually via a timer) and figuring out which data to display, which means a performance penalty.
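 
For comparison, here is roughly what the callback approach looks like with embedded Captionate data. (This is only a minimal sketch: nets is assumed to be the NetStream playing the FLV, which we create later in this article, and the onCaption parameters are omitted; they are described in the Receiving Captionate Embedded Data article.)
//minimal sketch: the player calls this handler at the caption's time,
//so no polling is required (parameters omitted, see the companion article)
nets.onCaption = function () {
	trace("caption event received at " + nets.time);
};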
 
Captionate can convert caption data to cue points. Cue points are normally also included in the onMetaData event data, and duplicating the caption data in the onMetaData event like this can also introduce a noticeable delay when the video is starting. Captionate therefore has an option for not including 'event' type cue points in the onMetaData event.

 
The Disadvantage of Using Embedded Data
When the data is embedded, you don't have access to the whole data set; each piece is received as its time comes.
 
For cue points, Flash Video Encoder (and Captionate; also see the note at the end of the previous section) includes a copy of the data in the onMetaData event. While this solves the problem of not having access to the whole data, having duplicate data reduces the advantages of data embedding, especially if you won't need the copy. For 'navigation' cue points, you certainly need the data beforehand to be able to seek to a cue point's time. For 'event' type cue points, it depends on how you will be using them.
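 
For example, if the cue point list is duplicated in onMetaData, you can keep a copy of it when the metadata arrives and use it for seeking later. (A minimal sketch; nets is assumed to be the NetStream playing the FLV, and the metadata is assumed to contain the standard cuePoints array, where each entry carries at least a time value in seconds.)
//keep the duplicated cue point list for later use
var cuePointList = [];
nets.onMetaData = function (info) {
	if (info.cuePoints != undefined) {
		cuePointList = info.cuePoints;
	}
};
//seek to the n-th cue point, e.g. from a chapter menu
function seekToCuePoint(n) {
	if (cuePointList[n] != undefined) {
		nets.seek(cuePointList[n].time);
	}
}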
 
For captions/subtitles, it depends on whether you want to display the correct caption immediately when you seek to an arbitrary time. When playing the video linearly, you receive onCaption events as necessary while the video plays and display the caption text. But when you seek to an arbitrary time, you won't know the correct caption to display there; you'll have to wait for the next onCaption event.
 
For most applications, this is not a big problem. When the user seeks, you would just empty the text field you display the captions in; the next onCaption event will arrive within a few seconds, so you only lose a single caption. Nevertheless, you may want to have that caption too. Also, if the video is paused after the seek, the missing caption may be somewhat annoying. In those cases you'll need all the caption data beforehand, and loading it from the Captionate XML may be preferable.
 
Generally speaking, the problem arises from the fact that the data in the FLV sits at certain points in time; physically it doesn't (and can't) span a time period. The solution is building a system that logically interprets the data.
 
(Consider the case where you want to change some elements in the host SWF depending on the content of the FLV. This will work with simple markers (or cue points) as long as you don't seek to an arbitrary time; after a seek you would have to wait for the next event. You can either embed events repeatedly at regular intervals, or interpret the data according to the current time.)

 
The XML Solution
If you have decided to use the external XML approach, here's a solution plan. (As mentioned before, you can also choose to use the Rich Media Project players XML format, which only supports one language per XML file. Captionate XML, on the other hand, can have more than one language track, which lets you change the language on the fly.)
 
Reading in the XML
We will develop our solution for Flash MX 2004 (version 7) and Flash 8. Flash MX (version 6) does not support external FLV files, and while some of the code here may apply, it's not covered in this article.
 
Reading an XML file with Flash is rather easy, as Flash has a built-in XML Object:
	XMLNAME = "test_data.xml";
	var captionsXML = new XML();
	captionsXML.load(XMLNAME);
Obviously, you don't even need to keep the file name on a separate line; even as it is, it takes only three lines.
 
Unfortunately for our simple coding purposes, the load function is asynchronous: it returns immediately and lets us know that loading has finished by calling the onLoad function and setting the loaded property to true.
 
We have decided to use the XML object only temporarily and move the data to arrays, hoping for better performance. Here are the four arrays we will use:
	// time values
	//: for every caption, each array element is the time value in seconds
	var captionTimes = new Array();
	//
	// speaker index and caption texts
	//: for every caption, each array element is an array which has 
	//: speaker index at [0] and caption texts, such that
	//: track0 text is at [1], track1 text is at [2], and so on.
	var captions = new Array();
	//
	// speaker info
	//: for every speaker, each array element is an array such that
	//: [0] = speaker name
	//: [1] = string data for the speaker
	var speakerInfo = new Array();
	//
	// track info
	//: for every track, each element is an array such that
	//: [0] = display name for the track
	//: [1] = type of the track
	//: [2] = language code for the track
	//: [3] = target wpm for the track
	//: [4] = string data for the track
	var trackInfo = new Array();
We just need a function to populate these arrays after the XML is loaded. Here's that function, which checks every node in the XML recursively and moves the caption-related data in one pass:
//recursive function that copies the data in Captionate XML to 4 arrays
function CreateCaptionArrays(node:XMLNode) {
	if (node.hasChildNodes()) {
		for (var aNode = node.firstChild; aNode != null; aNode=aNode.nextSibling) {
			switch (aNode.nodeName) {
			case "captionate" :
				inCaptionate = true;
				CreateCaptionArrays(aNode);
				inCaptionate = false;
				break;
			case "captioninfo" :
				if (inCaptionate) {
					inCaptioninfo = true;
					CreateCaptionArrays(aNode);
					inCaptioninfo = false;
				}
				break;
			case "trackinfo" :
				if (inCaptioninfo) {
					inTrackinfo = true;
					CreateCaptionArrays(aNode);
					inTrackinfo = false;
				}
				break;
			case "speakerinfo" :
				if (inCaptioninfo) {
					inSpeakerinfo = true;
					CreateCaptionArrays(aNode);
					inSpeakerinfo = false;
				}
				break;
			case "captions" :
				if (inCaptionate) {
					inCaptions = true;
					CreateCaptionArrays(aNode);
					inCaptions = false;
				}
				break;
			case "caption" :
				if (inCaptions) {
					//convert time to seconds
					captionTimes.push(aNode.attributes.time/1000);
					current = new Array();
					captions.push(current);
					inCaption = true;
					CreateCaptionArrays(aNode);
					inCaption = false;
				}
				break;
			case "speaker" :
				if (inCaption) {
					current.push(Number(aNode.firstChild.nodeValue));
				}
				if (inSpeakerinfo) {
					current = new Array();
					speakerInfo.push(current);
					inSpeaker = true;
					CreateCaptionArrays(aNode);
					inSpeaker = false;
				}
				break;
			case "track" :
				if (inTrackinfo) {
					current = new Array();
					trackInfo.push(current);
					inTrack = true;
					CreateCaptionArrays(aNode);
					inTrack = false;
				}
				break;
			case "tracks" :
				if (inCaption) {
					inTracks = true;
					CreateCaptionArrays(aNode);
					inTracks = false;
				}
				break;
			case "name" :
				if (inSpeaker) {
					current.push(aNode.firstChild.nodeValue);
				}
				break;
			case "stringdata" :
				if (inSpeaker) {
					current.push(aNode.firstChild.nodeValue);
				}
				if (inTrack) {
					current[4] = aNode.firstChild.nodeValue;
				}
				break;
			case "displayname" :
				if (inTrack) {
					current[0] = aNode.firstChild.nodeValue;
				}
				break;
			case "type" :
				if (inTrack) {
					current[1] = aNode.firstChild.nodeValue;
				}
				break;
			case "languagecode" :
				if (inTrack) {
					current[2] = aNode.firstChild.nodeValue;
				}
				break;
			case "targetwpm" :
				if (inTrack) {
					current[3] = aNode.firstChild.nodeValue;
				}
				break;
			}
			if ((inTracks) && (aNode.nodeName.slice(0, 5) == "track")) {
				current[Number(aNode.nodeName.slice(5))+1] =
								aNode.firstChild.nodeValue;
			}
			CreateCaptionArrays(aNode);
		}
	}
}
The CreateCaptionArrays function makes use of external Boolean flags and our four arrays, so we need to be careful to initialize these variables before calling the function.
 
In the onLoad handler of our XML object we will initialize the variables and call CreateCaptionArrays; after that, all the necessary information will be in four easy-to-access arrays.
captionsXML.onLoad = function(success) {
	if (!success) {
		// problem with XML load, should be handled
		return;
	}	
	// reset initial values
	inCaptionate = false;
	inCaptioninfo = false;
	inSpeakerinfo = false;
	inSpeaker = false;
	inTrackinfo = false;
	inTrack = false;
	inCaptions = false;
	inCaption = false;
	inTracks = false;
	captionTimes = [];
	captions = [];
	speakerInfo = [];
	trackInfo = [];
	//move data to arrays
	CreateCaptionArrays(captionsXML);
	//XML is no longer needed
	captionsXML = "";
	//proceed to frame 2
	gotoAndStop(2);
};
You may have noticed that the last line is gotoAndStop(2);. The reason it's there is that in our sample we do the loading in the first frame and play the video in the second frame. For that we also have a stop(); right after we call load.
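 
Putting frame 1 together, it would look roughly like this. (A sketch; the ignoreWhite line is an addition not shown in the snippets above, it simply keeps the whitespace between elements from becoming extra text nodes.)
//Frame 1: load the Captionate XML, then wait here
XMLNAME = "test_data.xml";
var captionsXML = new XML();
captionsXML.ignoreWhite = true;//skip whitespace-only text nodes
captionsXML.onLoad = function (success) {
	//...the handler shown above (it ends with gotoAndStop(2))...
};
captionsXML.load(XMLNAME);
//wait on this frame; onLoad will send us to frame 2
stop();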

 
Finding the Correct Caption
Now we are at frame 2...
 
For this article we will use the simplest way of playing the video we can. (If you are not familiar with the code below, you can find step-by-step instructions for this kind of video playback in the Receiving Captionate Embedded Data article).
//Simple sample code to play a FLV file
FLVNAME = "test.flv";
nc = new NetConnection();
nc.connect(null);
nets = new NetStream(nc);
video.attachVideo(nets);//video is the instance name of the video symbol
nets.play(FLVNAME);
The current time of the video is in the time property of the NetStream object playing the video, which is nets.time in the example above.
 
We will need to poll the video time and display the correct caption. Let's create a timer for this:
...
nets.play(FLVNAME);
//display captions via timer
setInterval(callback, 100);
//wait
stop();
This timer will call the function callback. We also have a stop() there because we will be waiting on this frame while the video is playing.
 
We have all the time values for the captions in the captionTimes array, in seconds. NetStream.time also returns the time in seconds. We assume that the captions in the XML are sorted by time, as exported by Captionate, so our array is also sorted. Captionate uses empty captions to signal the end of the previous caption, so what we need to do is find the caption whose time is the highest value that is less than or equal to the current time. (If this sounds complicated, give yourself a second to think about it; for example, with caption times of 0, 2.5 and 7 seconds and a current time of 5.2 seconds, the correct caption is the one at 2.5 seconds.)
 
Since there are normally more than a couple of captions, a binary search is more appropriate than a linear search. The following function takes a time value and returns the correct caption index by searching the captionTimes array:
//Binary search function that returns:
// the index to captions array for the time value, OR,
// -1 if no caption applies
function getCaptionIndex(time:Number):Number {
	var lo = 0;
	var hi = captionTimes.length-1;
	if (hi == -1) {
		return (-1);
	}
	var mid;
	var ts;
	while (lo<=hi) {
		mid = Math.floor((lo+hi)/2);
		ts = captionTimes[mid];
		if (time<ts) {
			hi = mid-1;
		} else if (time>ts) {
			lo = mid+1;
		} else {
			return (mid);
		}
	}
	return (Math.floor((lo+hi)/2));
}
Now, we can simply find the index for the correct caption in the callback function.
function callback() {
	//get the caption index for current time
	var index = getCaptionIndex(nets.time);
	//Display track0 text and time for demonstration purposes
	//dtCaption is the variable name of the dynamic text field
	//[1] is the first track (that is track0, the speaker index is at [0])
	dtCaption = nets.time+"<br>"+captions[index][1];	
}
That's about it for our sample. It doesn't have a slider, but the method works even if the video time is arbitrarily changed by the user (which is why we have done all this in the first place). As you can see, data handling in the callback function is left as an exercise for the reader (for a start, you should check whether the index returned is -1)...
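 
For example, a slightly more defensive version of the callback could handle the -1 case and, since the Captionate XML can carry more than one language track, select the track to display by its language code instead of hard-coding track0. (A sketch: the getTrackIndexByLanguage helper and the "en" language code are illustrative assumptions, adjust them to your own data.)
//illustrative helper: find the array index of the track with a given
//language code (returns -1 if there is no such track)
function getTrackIndexByLanguage(code:String):Number {
	for (var i = 0; i<trackInfo.length; i++) {
		if (trackInfo[i][2] == code) {
			return (i);
		}
	}
	return (-1);
}
//track to display, e.g. the English one (falls back to track0)
var currentTrack = Math.max(0, getTrackIndexByLanguage("en"));
function callback() {
	//get the caption index for current time
	var index = getCaptionIndex(nets.time);
	if (index == -1) {
		//no caption applies yet (we are before the first caption time)
		dtCaption = "";
		return;
	}
	//caption text for the selected track is at [currentTrack+1]
	var txt = captions[index][currentTrack+1];
	dtCaption = (txt == undefined) ? "" : txt;
}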
 
Last Words...
While using embedded captions has many advantages, it's also possible to use the Captionate exported XML instead. In this article, we have provided two key sample functions for doing so: one that moves the data from the XML into arrays, and one that returns the correct caption index for a given time.
 
You can download the FLA file (in Flash MX 2004 format) with the code presented in this article here: capxml.zip (~6.25 KB). (Note that the ZIP does not include an XML or a FLV file).

 
Please send any feedback about this article to support@captionate.com
 
Copyright © 2006 Manitu Group. All rights reserved. All trademarks acknowledged.