Captionate can export and import its data in its own XML format. We don't have a schema or DTD for the format, but this document describes the format in detail and will be kept current. Since Captionate has been free to upgrade/update up to now, no version-related information is included.
A Captionate XML file is a text file that starts with the following declaration:
<?xml version="1.0" encoding="UTF-8"?>
The encoding is UTF-8. Characters that are not allowed in XML text, such as '<', are encoded as entities ('<' becomes '&lt;').
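As a small illustration of this entity encoding, the following Python sketch uses the standard library's xml.sax.saxutils helpers. The sample text is made up, and Captionate's own export code may work differently; this only shows the standard XML escaping that any conforming parser will reverse on import.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical caption text containing characters that are special in XML.
raw_text = "if a < b & b > c"

# escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;'.
encoded = escape(raw_text)

# unescape() reverses the replacement, as an XML parser does on import.
decoded = unescape(encoded)

print(encoded)
print(decoded == raw_text)
```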
The root tag of a Captionate XML file is <captionate>. All other tags are children of the root tag.
The root tag parents the 'top level tags'. There are two types of top level tags: Configuration Tags and Content Tags.
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <!-- top level tags here -->
</captionate>
All top level tags are optional: all, some or none of them can be present, in any order. Obviously, a file without any content type top level tag does not make much sense.
Tags |
Configuration type top level tags |
timeformat | Optional | Defines the time format that will be used in time attributes. For details, see below. |
namesareprefixed | Optional | Declares that parameter tag names are prefixed, in order to work around issues caused by our bad decision of having 'values' (parameter names) as tag names. For details, see below. |
Content type top level tags |
custommetadata | Optional | Defines custom metadata name:value pairs for the custommetadata object of the onMetaData event. For details, see below. |
markers | Optional | Defines markers. For details, see below. |
captioninfo | Optional | Defines language tracks and speakers for captions. For details, see below. |
captions | Optional | Defines captions. For details, see below. |
cuepoints | Optional | Defines cue points. For details, see below. |
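The top level structure can be checked with a few lines of Python. The sketch below is only an illustration using the standard xml.etree.ElementTree module; the function name classify_top_level and the idea of collecting unknown tags are assumptions of this example, not something Captionate does.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the table of top level tags.
CONFIG_TAGS = {"timeformat", "namesareprefixed"}
CONTENT_TAGS = {"custommetadata", "markers", "captioninfo", "captions", "cuepoints"}

def classify_top_level(xml_text):
    """Return (config, content, unknown) lists of top level tag names."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "captionate":
        raise ValueError("root tag must be <captionate>")
    config, content, unknown = [], [], []
    for child in root:
        if child.tag in CONFIG_TAGS:
            config.append(child.tag)
        elif child.tag in CONTENT_TAGS:
            content.append(child.tag)
        else:
            unknown.append(child.tag)  # not described in this document
    return config, content, unknown

# Top level tags may appear in any order.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <markers/>
  <timeformat>s</timeformat>
</captionate>"""

print(classify_top_level(SAMPLE))
```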
Configuration Tags
Configuration tags are <timeformat> and <namesareprefixed>.
<timeformat>
Defines the timestamp value format used in all other tags in the XML. If not present, all timestamp values in the XML are in milliseconds (ms) format.
Valid values:
- ms : milliseconds (default)
- s : seconds
- hh:mm:ss:ff/30 : hours : minutes : seconds : frames (where frames range is 0..29)
- hh:mm:ss:ms : hours : minutes : seconds : milliseconds
<timeformat> Samples:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>s</timeformat>
  <!-- other top level tags here -->
</captionate>

<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ff/30</timeformat>
  <!-- other top level tags here -->
</captionate>

<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ms</timeformat>
  <!-- other top level tags here -->
</captionate>
These formats correspond to the export formats you can set in the XML Export tab of the Preferences dialog.
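To make the four formats concrete, here is a hedged Python sketch that converts a timestamp string in any of them to milliseconds. The function name to_milliseconds is hypothetical. Two details are assumptions of this example and are not stated in the format description: that values in the 's' format may be fractional, and that a frame number is converted by rounding frames * 1000 / 30.

```python
def to_milliseconds(value, timeformat="ms"):
    """Convert a Captionate timestamp string to integer milliseconds."""
    if timeformat == "ms":
        return int(value)
    if timeformat == "s":
        # Assumption: seconds may carry a fractional part such as "1.5".
        return round(float(value) * 1000)
    parts = [int(part) for part in value.split(":")]
    if len(parts) != 4:
        raise ValueError("expected four colon separated fields: " + value)
    hours, minutes, seconds, last = parts
    base = ((hours * 60 + minutes) * 60 + seconds) * 1000
    if timeformat == "hh:mm:ss:ms":
        return base + last
    if timeformat == "hh:mm:ss:ff/30":
        if not 0 <= last <= 29:
            raise ValueError("frame number must be in the range 0..29")
        # Assumption: round each frame to the nearest millisecond.
        return base + round(last * 1000 / 30)
    raise ValueError("unknown time format: " + timeformat)

print(to_milliseconds("1500"))
print(to_milliseconds("00:00:08:316", "hh:mm:ss:ms"))
print(to_milliseconds("00:01:00:15", "hh:mm:ss:ff/30"))
```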
<namesareprefixed>
If this tag is present exactly as below,
<namesareprefixed>namesareprefixed</namesareprefixed>
the <custommetadata> and <cuepoints> parameter names, which are implemented as tag names, are prefixed with the string 'name_'. If this tag is not present in this exact form, Captionate assumes those names are not prefixed.
Implementing parameter names as tag names was a mistake, which we will most probably fix soon (we will always support older versions though). Tag names in XML have limitations: for example, they cannot begin with a digit and they cannot be empty. Prefixing the tag names works around some of these issues.
When importing an XML file with this tag, Captionate will assume parameter tag names are prefixed and remove the prefixes.
<namesareprefixed> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <namesareprefixed>namesareprefixed</namesareprefixed>
  <!-- other top level tags here -->
</captionate>
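The import rule described above can be sketched in Python as follows. The helper names names_are_prefixed and import_name are hypothetical, and the exact string comparison (no whitespace trimming) is an assumption based on the phrase "present exactly as below"; Captionate's real import code may be more or less strict.

```python
import xml.etree.ElementTree as ET

FLAG = "namesareprefixed"
PREFIX = "name_"

def names_are_prefixed(root):
    """True only for <namesareprefixed>namesareprefixed</namesareprefixed>."""
    flag = root.find(FLAG)
    return flag is not None and flag.text == FLAG

def import_name(tag, prefixed):
    """Return the parameter name for a tag, removing the prefix if declared."""
    if prefixed and tag.startswith(PREFIX):
        return tag[len(PREFIX):]
    return tag

root = ET.fromstring(
    "<captionate><namesareprefixed>namesareprefixed</namesareprefixed></captionate>"
)
prefixed = names_are_prefixed(root)

# A parameter named "0_intro" could not be a tag name without the prefix,
# because XML tag names cannot begin with a digit.
print(import_name("name_0_intro", prefixed))
```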
Content Tags
Content Tags are <custommetadata>, <markers>, <captioninfo>, <captions> and <cuepoints>.
<cuepoints>
Defines cue points.
Tags |
cuepoint | Optional | Each <cuepoint> tag contains information for a single cue point. For details, see below. |
<cuepoint>
Defines a single cue point.
Attributes |
time | Required | Timestamp for the cue point. The format is defined by the <timeformat> configuration tag. Two cue points cannot have the same timestamp. |
Tags |
name | Required | Name of the cue point. |
type | Required | Type of the cue point. Can be either event or navigation. |
parameters | Optional | Contains the parameters of the cue point as name:value pairs, where the name is a tag and the value is the content of that tag: <name>value</name>. If the <namesareprefixed> configuration tag is present, the name is prefixed with the string 'name_'. |
<cuepoints> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <namesareprefixed>namesareprefixed</namesareprefixed>
  <cuepoints>
    <cuepoint time="1500">
      <name>My First Cue Point</name>
      <type>navigation</type>
    </cuepoint>
    <cuepoint time="2400">
      <name>New Cue Point</name>
      <type>event</type>
      <parameters>
        <name_name_0>my value</name_name_0>
        <name_url>http://www.captionate.com</name_url>
      </parameters>
    </cuepoint>
  </cuepoints>
  <!-- other top level tags here -->
</captionate>
The above sample defines two cue points. The first one is a navigation type cue point and has no parameters. The second one is an event cue point and has two parameters, named 'name_0' and 'url'. Because <namesareprefixed> is present, the tags for the cue point parameter names are prefixed with 'name_'. The <timeformat> tag is not present, so timestamp values are in milliseconds.
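For readers who process these files outside Captionate, the following Python sketch reads the cue points of the sample into plain dictionaries. The function name read_cuepoints and the dictionary layout are choices of this example; only the tag and attribute names come from the format description.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <namesareprefixed>namesareprefixed</namesareprefixed>
  <cuepoints>
    <cuepoint time="1500">
      <name>My First Cue Point</name>
      <type>navigation</type>
    </cuepoint>
    <cuepoint time="2400">
      <name>New Cue Point</name>
      <type>event</type>
      <parameters>
        <name_name_0>my value</name_name_0>
        <name_url>http://www.captionate.com</name_url>
      </parameters>
    </cuepoint>
  </cuepoints>
</captionate>"""

def read_cuepoints(xml_text):
    """Return a list of dicts, one per <cuepoint>, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    flag = root.find("namesareprefixed")
    prefixed = flag is not None and flag.text == "namesareprefixed"
    cuepoints = []
    for node in root.findall("cuepoints/cuepoint"):
        parameters = {}
        for param in node.findall("parameters/*"):
            name = param.tag
            if prefixed and name.startswith("name_"):
                name = name[len("name_"):]  # remove the declared prefix
            parameters[name] = param.text or ""
        cuepoints.append({
            "time": node.get("time"),      # raw string, format per <timeformat>
            "name": node.findtext("name"),
            "type": node.findtext("type"),
            "parameters": parameters,
        })
    return cuepoints

for cuepoint in read_cuepoints(SAMPLE):
    print(cuepoint)
```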
<custommetadata>
Defines name:value pairs that will be saved in the custommetadata object of the onMetaData event data.
Tags |
[name] | Optional | [value] The tag name is the name part of the name:value pair and the tag content is the value. If the <namesareprefixed> configuration tag is present, the name is prefixed with the string 'name_'. |
<custommetadata> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <custommetadata>
    <name1>value1</name1>
    <name2>value2</name2>
    <name3>value3</name3>
  </custommetadata>
  <!-- other top level tags here -->
</captionate>
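Going the other way, a file for import can be generated from name:value pairs. The Python sketch below is an assumption-laden illustration: build_custommetadata is a hypothetical helper, and it always writes the <namesareprefixed> tag together with prefixed names so that names such as '2ndtitle', which are not valid XML tag names on their own, can still be represented.

```python
import xml.etree.ElementTree as ET

def build_custommetadata(pairs, prefixed=True):
    """Return a Captionate XML string holding the given name:value pairs."""
    root = ET.Element("captionate")
    if prefixed:
        flag = ET.SubElement(root, "namesareprefixed")
        flag.text = "namesareprefixed"
    meta = ET.SubElement(root, "custommetadata")
    for name, value in pairs.items():
        tag = "name_" + name if prefixed else name
        # ElementTree escapes '<' and '&' in the text when serializing.
        ET.SubElement(meta, tag).text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical metadata; "2ndtitle" starts with a digit on purpose.
xml_out = build_custommetadata({"2ndtitle": "Intro", "note": "a < b"})
print(xml_out)
```

Note that ElementTree does not validate tag names while building, so calling the helper with prefixed=False and a name like '2ndtitle' produces text that an XML parser will reject.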
<markers>
Defines markers.
Tags |
marker | Optional | Each <marker> tag contains information for a single marker. For details, see below. |
<marker>
Defines a single marker.
Attributes |
time | Required | Timestamp for the marker. The format is defined by the <timeformat> configuration tag. Two markers cannot have the same timestamp. |
Tags |
label | Required | Label (text) of the marker. |
<markers> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ms</timeformat>
  <markers>
    <marker time="00:00:08:316">
      <label>scene1 starts</label>
    </marker>
    <marker time="00:00:42:211">
      <label>video ends</label>
    </marker>
  </markers>
  <!-- other top level tags here -->
</captionate>
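The two requirements on markers (a time attribute with a unique timestamp, and a label tag) can be checked with a short Python sketch. The function name read_markers is hypothetical, and timestamps are compared as raw strings here, which is a simplification: two different spellings of the same instant would not be detected as duplicates.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ms</timeformat>
  <markers>
    <marker time="00:00:08:316">
      <label>scene1 starts</label>
    </marker>
    <marker time="00:00:42:211">
      <label>video ends</label>
    </marker>
  </markers>
</captionate>"""

def read_markers(xml_text):
    """Return a list of (time, label) tuples, rejecting invalid markers."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    markers = []
    seen_times = set()
    for node in root.findall("markers/marker"):
        time = node.get("time")
        label = node.findtext("label")
        if time is None:
            raise ValueError("marker without a time attribute")
        if label is None:
            raise ValueError("marker without a <label> tag at " + time)
        if time in seen_times:
            # Simplification: raw string comparison of timestamps.
            raise ValueError("two markers share the timestamp " + time)
        seen_times.add(time)
        markers.append((time, label))
    return markers

print(read_markers(SAMPLE))
```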
<captioninfo>
Defines language tracks and speakers.
Tags |
trackinfo | Required | Defines language tracks. For details, see below. |
speakerinfo | Required | Defines speakers. For details, see below. |
<trackinfo>
Defines language tracks, used in captions.
The order of the <track> tags contained in this tag is important. The first defined track becomes track number 0, the second one becomes track number 1, and so on.
At least one track definition is required. The track count must also match the number of caption texts defined for each caption in <captions>.
Tags |
track | Required | Each <track> tag defines and contains information for a single language track. For details, see below. |
<track>
Defines the properties of a single language track.
Note that Captionate only embeds this data into FLV files along with the caption texts; it does not make further use of the data. The data can be received in the player SWF file. So, while the tags have their intended uses, it ultimately depends on the player SWF to make use of the data. The following tag names match the track properties Captionate provides.
Tags |
displayname | Required | A string. Intended for the name of the language track as it will be presented to the user. |
type | Required | A string. Intended for the type of the track. Captionate provides two default types: 'Caption' and 'Subtitle'. |
languagecode | Required | A string. Intended for a short code for the language, such as 'en-us' or 'fr'. |
targetwpm | Required | A number. Should be the target 'words per minute' rate for the track. |
stringdata | Required | A string per track for general purpose use. |
<speakerinfo>
Defines speakers, used in captions.
The order of the <speaker> tags contained in this tag is important. The first defined speaker becomes speaker number 0, the second one becomes speaker number 1, and so on. (Speaker number '-1' means no speaker.)
Tags |
speaker | Optional | Each <speaker> tag defines and contains information for a single speaker. For details, see below. |
<speaker>
Defines a single speaker.
Tags |
name | Required | A string. Intended for the name of the speaker. |
stringdata | Required | A string per speaker for general purpose use. |
<captioninfo> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <captioninfo>
    <trackinfo>
      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>
      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>
    </trackinfo>
    <speakerinfo>
      <speaker>
        <name>John</name>
        <stringdata></stringdata>
      </speaker>
      <speaker>
        <name>Vera</name>
        <stringdata></stringdata>
      </speaker>
    </speakerinfo>
  </captioninfo>
  <!-- other top level tags here -->
</captionate>
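Because track and speaker numbers come from document order, reading <captioninfo> into two lists gives the numbering for free: the list index is the track or speaker number. The Python sketch below illustrates this on a shortened version of the sample; read_captioninfo is a hypothetical name and the dictionary-per-item layout is just one convenient choice.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <captioninfo>
    <trackinfo>
      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>
      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>
    </trackinfo>
    <speakerinfo>
      <speaker><name>John</name><stringdata></stringdata></speaker>
      <speaker><name>Vera</name><stringdata></stringdata></speaker>
    </speakerinfo>
  </captioninfo>
</captionate>"""

def read_captioninfo(xml_text):
    """Return (tracks, speakers); the list index is the track/speaker number."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    tracks = [
        {child.tag: child.text or "" for child in track}
        for track in root.findall("captioninfo/trackinfo/track")
    ]
    speakers = [
        {child.tag: child.text or "" for child in speaker}
        for speaker in root.findall("captioninfo/speakerinfo/speaker")
    ]
    if not tracks:
        raise ValueError("at least one <track> definition is required")
    return tracks, speakers

tracks, speakers = read_captioninfo(SAMPLE)
print(tracks[1]["displayname"], int(tracks[1]["targetwpm"]))
print(speakers[0]["name"])
```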
<captions>
Defines caption texts and times.
Tags |
caption | Optional | Each <caption> tag defines a single caption. For details, see below. |
<caption>
Defines a caption.
Attributes |
time | Required | Timestamp for the caption. The format is defined by the <timeformat> configuration tag. Two captions cannot have the same timestamp. |
Tags |
speaker | Required | Speaker index number as defined in the <speakerinfo> tag. The number -1 means no speaker is defined for the caption. |
tracks | Required | Contains the caption texts in tags named trackx, where x is the language track number as defined in the <trackinfo> tag. At least one trackx tag ('track0') is required. |
<captions> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <captions>
    <caption time="12650">
      <speaker>1</speaker>
      <tracks>
        <track0>Hi John! Could you please tell me about your childhood?</track0>
        <track1>Hi John! tell me your childhood</track1>
      </tracks>
    </caption>
    <caption time="12766">
      <speaker>-1</speaker>
      <tracks>
        <track0></track0>
        <track1></track1>
      </tracks>
    </caption>
    <caption time="13430">
      <speaker>0</speaker>
      <tracks>
        <track0>I don't know where to start.</track0>
        <track1>I don't know where to start.</track1>
      </tracks>
    </caption>
  </captions>
  <captioninfo>
    <trackinfo>
      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>
      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>
    </trackinfo>
    <speakerinfo>
      <speaker>
        <name>John</name>
        <stringdata></stringdata>
      </speaker>
      <speaker>
        <name>Vera</name>
        <stringdata></stringdata>
      </speaker>
    </speakerinfo>
  </captioninfo>
  <!-- other top level tags here -->
</captionate>
The above sample also includes the captioninfo tag (which is generally placed before the captions tag, unlike here). The captioninfo tag defines 2 language tracks and 2 speakers, so each caption should have texts for track0 and track1. There are 3 captions; the second one is an empty caption used for removing the previous caption.
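The consistency rules between <captioninfo> and <captions> can be expressed as a small checker. The Python sketch below is an illustration only: check_captions and its message texts are made up, the sample is abbreviated (the track and speaker definitions omit most of their required tags), and it assumes both tags are present in the same file.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample: the second caption lacks a track1 text on purpose.
SAMPLE = """<captionate>
  <captioninfo>
    <trackinfo>
      <track><displayname>English</displayname></track>
      <track><displayname>English for kids</displayname></track>
    </trackinfo>
    <speakerinfo>
      <speaker><name>John</name></speaker>
    </speakerinfo>
  </captioninfo>
  <captions>
    <caption time="12650">
      <speaker>0</speaker>
      <tracks><track0>Hi!</track0><track1>Hi!</track1></tracks>
    </caption>
    <caption time="12766">
      <speaker>-1</speaker>
      <tracks><track0></track0></tracks>
    </caption>
  </captions>
</captionate>"""

def check_captions(xml_text):
    """Return a list of problems found in <captions>; empty means none."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    track_count = len(root.findall("captioninfo/trackinfo/track"))
    speaker_count = len(root.findall("captioninfo/speakerinfo/speaker"))
    problems = []
    seen_times = set()
    for caption in root.findall("captions/caption"):
        time = caption.get("time")
        if time is None:
            problems.append("caption without a time attribute")
        elif time in seen_times:
            problems.append("duplicate timestamp %s" % time)
        seen_times.add(time)
        try:
            speaker = int(caption.findtext("speaker", ""))
        except ValueError:
            speaker = None  # missing or non-numeric <speaker>
        if speaker is None or not -1 <= speaker < speaker_count:
            problems.append("bad speaker index at time %s" % time)
        for index in range(track_count):
            # An empty <trackN></trackN> is fine; only absence is a problem.
            if caption.find("tracks/track%d" % index) is None:
                problems.append("missing track%d at time %s" % (index, time))
    return problems

print(check_captions(SAMPLE))
```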