Captionate XML Format
November 2, 2006, Revision 1
by Mehmet GOZUBUYUK, et al., of Manitu Group
 
Introduction
Captionate can export and import data in its own XML format. We don't have a schema or DTD for the format, but this document describes it in detail and will be kept current. Since Captionate has been free to upgrade/update up to now, no version-related information is included.
 
A Captionate XML file is a text file that starts with the following declaration:
<?xml version="1.0" encoding="UTF-8"?>
The encoding is UTF-8. Characters that cannot appear literally in XML text, such as '<', are escaped (for example, '<' is written as '&lt;').

 
Root tag
The root tag of a Captionate XML file is <captionate>. All other tags are children of the root tag.
 
The root tag parents the 'top level tags'. There are two types of top level tags: Configuration Tags and Content Tags.
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <!-- top-level tags here -->
</captionate>
All top level tags are optional: any, all, or none of them may be present, in any order. A file with no content type top level tag, however, serves little purpose.
 
Tags
Configuration type top level tags
timeformat Optional Defines the time format that will be used in time attributes. For details, see below.
namesareprefixed Optional Declares that parameter tag names are prefixed, in order to work around issues caused by our bad decision of using 'values' as tag names. For details, see below.
Content type top level tags
custommetadata Optional Defines custom metadata name:value pairs for the custommetadata object of the onMetaData event. For details, see below.
markers Optional Defines markers. For details, see below.
captioninfo Optional Defines language tracks and speakers for captions. For details, see below.
captions Optional Defines captions. For details, see below.
cuepoints Optional Defines cue points. For details, see below.
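 
As a rough illustration of the structure described above, the following Python sketch sorts the top level tags of a file into configuration tags and content tags. It uses the standard library's xml.etree.ElementTree; the function name classify_top_level and the sample string are hypothetical and not part of Captionate.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the tables above.
CONFIG_TAGS = {"timeformat", "namesareprefixed"}
CONTENT_TAGS = {"custommetadata", "markers", "captioninfo", "captions", "cuepoints"}


def classify_top_level(xml_text):
    """Return (config, content, unknown) lists of top level tag names, in file order."""
    # Parse from bytes so the UTF-8 declaration is honored by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "captionate":
        raise ValueError("root tag must be <captionate>")
    config, content, unknown = [], [], []
    for child in root:
        if child.tag in CONFIG_TAGS:
            config.append(child.tag)
        elif child.tag in CONTENT_TAGS:
            content.append(child.tag)
        else:
            unknown.append(child.tag)
    return config, content, unknown


# Hypothetical sample: top level tags may appear in any order.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <markers></markers>
  <timeformat>s</timeformat>
</captionate>"""

config, content, unknown = classify_top_level(SAMPLE)
print(config, content, unknown)
```

Because every top level tag is optional, a reader written this way has to tolerate any of the three lists being empty.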

 
Configuration Tags
Configuration tags are <timeformat> and <namesareprefixed>.
 
<timeformat>
Defines the timestamp value format used in all other tags in the XML. If not present, all timestamp values in the XML are in milliseconds (ms) format.
 
Valid values are illustrated by the samples below.
 
<timeformat> Samples:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>s</timeformat>
  <!-- other top-level tags here -->
</captionate>
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ff/30</timeformat>
  <!-- other top-level tags here -->
</captionate>
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ms</timeformat>
  <!-- other top-level tags here -->
</captionate>

 
These formats correspond to the export formats you can set in the Preferences dialog, on the XML Export tab.
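 
As a rough illustration of how a reader of the format might normalize timestamps, the following Python sketch converts a time attribute value to milliseconds. The function name to_milliseconds is hypothetical, and the interpretation of each format is an assumption that this document does not spell out: 's' is taken as decimal seconds, the last field of 'hh:mm:ss:ms' as milliseconds, and the last field of 'hh:mm:ss:ff/30' as a frame number at 30 frames per second.

```python
def to_milliseconds(value, timeformat="ms"):
    """Convert a time attribute string to integer milliseconds.

    timeformat is the content of <timeformat>; "ms" stands in for the
    default used when the tag is absent. The field meanings below are
    assumptions, not confirmed by the format description.
    """
    if timeformat == "ms":
        return int(value)
    if timeformat == "s":
        # Assumed: decimal seconds, e.g. "1.5".
        return round(float(value) * 1000)
    if timeformat == "hh:mm:ss:ms":
        hh, mm, ss, ms = (int(part) for part in value.split(":"))
        return ((hh * 60 + mm) * 60 + ss) * 1000 + ms
    if timeformat.startswith("hh:mm:ss:ff/"):
        # Assumed: the number after the slash is the frame rate.
        fps = int(timeformat.split("/", 1)[1])
        hh, mm, ss, ff = (int(part) for part in value.split(":"))
        return ((hh * 60 + mm) * 60 + ss) * 1000 + round(ff * 1000 / fps)
    raise ValueError("unsupported time format: " + timeformat)


print(to_milliseconds("1500"))
print(to_milliseconds("00:00:08:316", "hh:mm:ss:ms"))
print(to_milliseconds("00:00:01:15", "hh:mm:ss:ff/30"))
```

Converting everything to one unit up front keeps later comparisons, such as checking for duplicate timestamps, independent of the format chosen at export time.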
 
<namesareprefixed>
If this tag is present exactly as below,
<namesareprefixed>namesareprefixed</namesareprefixed>
the <custommetadata> and <cuepoints> parameter names, which are implemented as tag names, are prefixed with the string 'name_'. If this tag is not present in this exact form, Captionate assumes those names are not prefixed.
 
Implementing parameter names as tag names was a mistake, which we will most probably fix soon (we will always support older versions though). Tag names in XML have limitations: for example, they cannot begin with a digit and they cannot be empty. Prefixing tag names works around some of these issues.
 
When importing an XML with this tag, Captionate will assume parameter tag names are prefixed and remove the prefixes.
 
<namesareprefixed> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <namesareprefixed>namesareprefixed</namesareprefixed>
  <!-- other top-level tags here -->
</captionate>
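The tag name limitation that motivates the prefix can be checked with any XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree; the helper names is_well_formed and prefixed_element are hypothetical, and the parameter name '0' is only an example.

```python
import xml.etree.ElementTree as ET

PREFIX = "name_"  # prefix string given in the description above


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def prefixed_element(name, value):
    """Serialize one name:value pair as <name_NAME>value</name_NAME>."""
    elem = ET.Element(PREFIX + name)
    elem.text = value
    return ET.tostring(elem, encoding="unicode")


# A parameter called "0" cannot be written as a tag name directly.
unprefixed = "<0>my value</0>"
prefixed = prefixed_element("0", "my value")

print(is_well_formed(unprefixed))
print(prefixed, is_well_formed(prefixed))
```

The prefix only helps with some names. A name containing a space, for example, still does not form a valid tag name after prefixing.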

 
Content Tags
Content Tags are <custommetadata>, <markers>, <captioninfo>, <captions> and <cuepoints>.
 
<cuepoints>
Defines cue points.
 
Tags
cuepoint Optional Each <cuepoint> tag contains information for a single cue point. For details, see below.

 
<cuepoint>
Defines a single cue point.
 
Attributes
time Required Timestamp for the cue point. The format is defined by the <timeformat> configuration tag. Two cue points cannot have the same timestamp.
Tags
name Required Name of the cue point.
type Required Type of the cue point. Can be either event or navigation.
parameters Optional Contains the parameters of the cue point as name:value pairs, such that name is a tag and value is the content of the tag: <name>value</name>. If the <namesareprefixed> configuration tag is present, the name is prefixed with the string 'name_'.

 
<cuepoints> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>

  <namesareprefixed>namesareprefixed</namesareprefixed>

  <cuepoints>

    <cuepoint time="1500">
      <name>My First Cue Point</name>
      <type>navigation</type>
    </cuepoint>
  
    <cuepoint time="2400">
      <name>New Cue Point</name>
      <type>event</type>
      <parameters>
        <name_name_0>my value</name_name_0>
        <name_url>http://www.captionate.com</name_url>
      </parameters>
    </cuepoint>

  </cuepoints>
  <!-- other top-level tags here -->
</captionate>
The above sample defines two cue points. The first is a navigation cue point and has no parameters. The second is an event cue point and has two parameters. Because <namesareprefixed> is present, the tags for the cue point parameter names are prefixed with 'name_'. The <timeformat> tag is not present, so timestamp values are in milliseconds.
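 
A minimal Python sketch of reading such a file follows, using the standard library's xml.etree.ElementTree. The function name read_cuepoints and the dictionary layout are hypothetical; the XML is the cue point sample from this section. The prefix is removed only when <namesareprefixed> is present in the exact form described earlier.

```python
import xml.etree.ElementTree as ET

PREFIX = "name_"

CUEPOINTS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <namesareprefixed>namesareprefixed</namesareprefixed>
  <cuepoints>
    <cuepoint time="1500">
      <name>My First Cue Point</name>
      <type>navigation</type>
    </cuepoint>
    <cuepoint time="2400">
      <name>New Cue Point</name>
      <type>event</type>
      <parameters>
        <name_name_0>my value</name_name_0>
        <name_url>http://www.captionate.com</name_url>
      </parameters>
    </cuepoint>
  </cuepoints>
</captionate>"""


def read_cuepoints(xml_text):
    """Return a list of dicts, one per <cuepoint>, with prefixes removed."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # The flag counts only when the tag content matches exactly.
    prefixed = root.findtext("namesareprefixed") == "namesareprefixed"
    cuepoints = []
    for cp in root.findall("cuepoints/cuepoint"):
        params = {}
        for param in cp.findall("parameters/*"):
            name = param.tag
            if prefixed and name.startswith(PREFIX):
                name = name[len(PREFIX):]
            params[name] = param.text or ""
        cuepoints.append({
            "time": cp.get("time"),  # left as a string; format depends on <timeformat>
            "name": cp.findtext("name"),
            "type": cp.findtext("type"),
            "parameters": params,
        })
    return cuepoints


for cuepoint in read_cuepoints(CUEPOINTS_XML):
    print(cuepoint)
```

Only one prefix is stripped per tag, so the tag name_name_0 in the sample yields the parameter name name_0.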

 
<custommetadata>
Defines name:value pairs that will be saved in the custommetadata object of the onMetaData event data.
 
Tags
[name] Optional [value] The tag name is the name part of the property pair and the tag content is the value part. If the <namesareprefixed> configuration tag is present, the name is prefixed with the string 'name_'.

 
<custommetadata> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>

  <custommetadata>
  
    <name1>value1</name1>
    <name2>value2</name2>
    <name3>value3</name3>

  </custommetadata>

  <!-- other top-level tags here -->
</captionate>
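Going the other way, a tool that prepares a file for import could build the <custommetadata> tag from a set of name:value pairs. The Python sketch below is one possible approach, using the standard library's xml.etree.ElementTree; the function name build_custommetadata and the sample pairs are hypothetical, and how Captionate itself writes the file is not described here.

```python
import xml.etree.ElementTree as ET


def build_custommetadata(pairs, prefixed=True):
    """Return a <captionate> document (as text) holding the given pairs."""
    root = ET.Element("captionate")
    if prefixed:
        # Declare the prefix in the exact form the importer looks for.
        flag = ET.SubElement(root, "namesareprefixed")
        flag.text = "namesareprefixed"
    meta = ET.SubElement(root, "custommetadata")
    for name, value in pairs.items():
        item = ET.SubElement(meta, ("name_" if prefixed else "") + name)
        item.text = value  # special characters are escaped on serialization
    # Note: tostring() with encoding="unicode" does not emit the XML
    # declaration; a real file would need it written separately.
    return ET.tostring(root, encoding="unicode")


# Hypothetical pairs: one name starts with a digit, one value contains '<'.
xml_text = build_custommetadata({"author": "John", "2006": "a < b"})
print(xml_text)
```

With prefixing enabled, the name 2006 becomes the tag name_2006, and the '<' in the value is written as '&lt;', matching the encoding rule stated in the introduction.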

 
<markers>
Defines 'Markers'.
 
Tags
marker Optional Each <marker> tag contains information for a single marker. For details, see below.

 
<marker>
Defines a single marker.
 
Attributes
time Required Timestamp for the marker. The format is defined by the <timeformat> configuration tag. Two markers cannot have the same timestamp.
Tags
label Required Label (text) of the marker.

 
<markers> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>

  <timeformat>hh:mm:ss:ms</timeformat>

  <markers>

    <marker time="00:00:08:316">
      <label>scene1 starts</label>
    </marker>
  
    <marker time="00:00:42:211">
      <label>video ends</label>
    </marker>

  </markers>
  <!-- other top-level tags here -->
</captionate>
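The two rules for markers (a required label, and no two markers at the same timestamp) could be checked before import with a sketch like the following. It uses Python's standard xml.etree.ElementTree; the function name read_markers is hypothetical, and comparing timestamps as raw strings is a simplifying assumption that would miss two different spellings of the same time.

```python
import xml.etree.ElementTree as ET

MARKERS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <timeformat>hh:mm:ss:ms</timeformat>
  <markers>
    <marker time="00:00:08:316">
      <label>scene1 starts</label>
    </marker>
    <marker time="00:00:42:211">
      <label>video ends</label>
    </marker>
  </markers>
</captionate>"""


def read_markers(xml_text):
    """Return {time: label}; raise ValueError on a broken rule."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    markers = {}
    for marker in root.findall("markers/marker"):
        time = marker.get("time")
        label = marker.findtext("label")
        if time is None:
            raise ValueError("marker without a time attribute")
        if label is None:
            raise ValueError("marker at %s has no <label>" % time)
        if time in markers:
            # Simplification: timestamps are compared as strings.
            raise ValueError("duplicate marker timestamp: " + time)
        markers[time] = label
    return markers


print(read_markers(MARKERS_XML))
```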

 
<captioninfo>
Defines language tracks and speakers.
 
Tags
trackinfo Required Defines language tracks. For details, see below.
speakerinfo Required Defines speakers. For details, see below.

 
<trackinfo>
Defines language tracks, used in captions.
 
The order of the <track> tags contained in this tag is important. The first defined track becomes track number 0, the second becomes track number 1, and so on.
 
At least one track definition is required. The track count must also match the number of caption texts (trackx tags) defined for each caption in <captions>.
 
Tags
track Required Each <track> tag defines and contains information for a single language track. For details, see below.

 
<track>
Defines the properties of a single language track.
 
Note that Captionate only embeds this data into FLV files along with the caption texts; it does not make further use of the data. The data can be received in the player SWF file. So, while the tags have their intended uses, it ultimately depends on the player SWF to make use of the data. The following tag names match the track properties Captionate provides.
 
Tags
displayname Required A string. Intended for the name of the language track as will be presented to the user.
type Required A string. Intended for the type of the track. Captionate provides two default types, 'Caption' and 'Subtitle'.
languagecode Required A string. Intended for a short code for the language, such as 'en-us' or 'fr'.
targetwpm Required A number. Should be the target 'words per minute' rate for the track.
stringdata Required A string per track for general purpose use.

 
<speakerinfo>
Defines speakers, used in captions.
 
The order of the <speaker> tags contained in this tag is important. The first defined speaker becomes speaker number 0, the second becomes speaker number 1, and so on. (Speaker number '-1' means no speaker.)
 
Tags
speaker Optional Each <speaker> tag defines and contains information for a single speaker. For details, see below.

 
<speaker>
Defines a single speaker.
 
Tags
name Required A string. Intended for the name of the speaker.
stringdata Required A string per speaker for general purpose use.

 
<captioninfo> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>

  <captioninfo>

    <trackinfo>

      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>

      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>

    </trackinfo>

    <speakerinfo>

      <speaker>
        <name>John</name>
        <stringdata></stringdata>
      </speaker>

      <speaker>
        <name>Vera</name>
        <stringdata></stringdata>
      </speaker>

    </speakerinfo>
	
  </captioninfo>
  <!-- other top-level tags here -->
</captionate>
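Since track and speaker numbers come only from the order of the tags, a reader can recover them by position. The following Python sketch shows one way to do this with the standard library's xml.etree.ElementTree; the function names read_captioninfo and speaker_name are hypothetical, and the XML repeats the sample from this section.

```python
import xml.etree.ElementTree as ET

CAPTIONINFO_XML = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <captioninfo>
    <trackinfo>
      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>
      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>
    </trackinfo>
    <speakerinfo>
      <speaker>
        <name>John</name>
        <stringdata></stringdata>
      </speaker>
      <speaker>
        <name>Vera</name>
        <stringdata></stringdata>
      </speaker>
    </speakerinfo>
  </captioninfo>
</captionate>"""


def read_captioninfo(xml_text):
    """Return (tracks, speakers); list positions are the track/speaker numbers."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    tracks = [t.findtext("displayname")
              for t in root.findall("captioninfo/trackinfo/track")]
    speakers = [s.findtext("name")
                for s in root.findall("captioninfo/speakerinfo/speaker")]
    if not tracks:
        raise ValueError("at least one <track> definition is required")
    return tracks, speakers


def speaker_name(speakers, index):
    """Map a speaker number to a name; -1 means no speaker."""
    return None if index == -1 else speakers[index]


tracks, speakers = read_captioninfo(CAPTIONINFO_XML)
print(list(enumerate(tracks)))
print(speaker_name(speakers, 1), speaker_name(speakers, -1))
```

The explicit check for -1 matters in Python, where a plain speakers[-1] would silently return the last speaker instead of "no speaker".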

 
<captions>
Defines caption texts and times.
 
Tags
caption Optional Each <caption> tag defines a single caption. For details, see below.

 
<caption>
Defines a caption.
 
Attributes
time Required Timestamp for the caption. The format is defined by the <timeformat> configuration tag. Two captions cannot have the same timestamp.
Tags
speaker Required Speaker index number as defined in the <speakerinfo> tag. The number -1 means no speaker is defined for the caption.
tracks Required Contains caption texts in tags named trackx, where x is the language track number as defined in the <trackinfo> tag. At least one trackx tag ('track0') is required.

 
<captions> Sample:
<?xml version="1.0" encoding="UTF-8"?>
<captionate>

  <captions>

    <caption time="12650">
      <speaker>1</speaker>
      <tracks>
        <track0>Hi John! Could you please tell me about your childhood?</track0>
        <track1>Hi John! tell me your childhood</track1>
      </tracks>
    </caption>

    <caption time="12766">
      <speaker>-1</speaker>
      <tracks>
        <track0></track0>
        <track1></track1>
      </tracks>
    </caption>

    <caption time="13430">
      <speaker>0</speaker>
      <tracks>
        <track0>I don't know where to start.</track0>
        <track1>I don't know where to start.</track1>
      </tracks>
    </caption>
	
  </captions>

  <captioninfo>

    <trackinfo>

      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>

      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>

    </trackinfo>

    <speakerinfo>

      <speaker>
        <name>John</name>
        <stringdata></stringdata>
      </speaker>

      <speaker>
        <name>Vera</name>
        <stringdata></stringdata>
      </speaker>

    </speakerinfo>
	
  </captioninfo>
  <!-- other top-level tags here -->
</captionate>
The above sample also includes the captioninfo tag (which is generally placed before the captions tag, unlike here). The captioninfo tag defines 2 language tracks and 2 speakers, so each caption should have texts for track0 and track1. There are 3 captions; the second one is an empty caption used for removing the previous caption.
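 
The relationship between the track definitions and the trackx tags can be checked mechanically. The Python sketch below, using the standard library's xml.etree.ElementTree, reads the captions and verifies that each one has a text for every defined track. The function name read_captions and the result layout are hypothetical, the XML is a shortened version of the sample above, and timestamps are assumed to be in the default milliseconds format.

```python
import xml.etree.ElementTree as ET

CAPTIONS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<captionate>
  <captioninfo>
    <trackinfo>
      <track>
        <displayname>English</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>140</targetwpm>
        <stringdata></stringdata>
      </track>
      <track>
        <displayname>English for kids</displayname>
        <type>Caption</type>
        <languagecode>en-us</languagecode>
        <targetwpm>70</targetwpm>
        <stringdata></stringdata>
      </track>
    </trackinfo>
    <speakerinfo>
      <speaker><name>John</name><stringdata></stringdata></speaker>
      <speaker><name>Vera</name><stringdata></stringdata></speaker>
    </speakerinfo>
  </captioninfo>
  <captions>
    <caption time="12650">
      <speaker>1</speaker>
      <tracks>
        <track0>Hi John! Could you please tell me about your childhood?</track0>
        <track1>Hi John! tell me your childhood</track1>
      </tracks>
    </caption>
    <caption time="12766">
      <speaker>-1</speaker>
      <tracks>
        <track0></track0>
        <track1></track1>
      </tracks>
    </caption>
  </captions>
</captionate>"""


def read_captions(xml_text):
    """Return a list of caption dicts; raise ValueError if a track text is missing."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    track_count = len(root.findall("captioninfo/trackinfo/track"))
    captions = []
    for cap in root.findall("captions/caption"):
        texts = []
        for n in range(track_count):
            node = cap.find(f"tracks/track{n}")
            if node is None:
                raise ValueError(f"caption at {cap.get('time')} lacks track{n}")
            texts.append(node.text or "")  # an empty tag yields an empty text
        captions.append({
            "time": int(cap.get("time")),  # assumes milliseconds
            "speaker": int(cap.findtext("speaker")),  # -1 means no speaker
            "texts": texts,
        })
    return captions


for caption in read_captions(CAPTIONS_XML):
    print(caption)
```

An empty caption shows up here as a list of empty strings, which a player could treat as the signal to clear the previous caption.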

 

 

 
Please send any feedback about this article to support@captionate.com
 
Copyright © 2006 Manitu Group. All rights reserved. All trademarks acknowledged.