Tablecast: Draft 0.2

December 14, 2010
Ka-Ping Yee

(See notes and rationale. Your comments are welcome! Please send feedback to the mailing list.)

Abstract

Tablecast is an extension of the Atom Syndication Format to represent a stream of changes to a dataset. Applications can use Tablecast together with a publish-subscribe protocol such as PubSubHubbub to receive timely updates about datasets managed by other parties.

1. Overview

This specification defines a data format called a Tablecast feed and an application programming interface called a Tablecast service.

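A Tablecast feed is an Atom feed where each entry represents an edit to a record in a dataset. A Tablecast service provides a specific way to request and retrieve Tablecast feeds. Each entry in a Tablecast feed includes four pieces of information: the identifier of the record that was edited, the author of the edit, the effective time of the edit, and content describing the change.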

This specification defines one type of content called a row edit, an XML element representing a change to a row in a database table. Other types of content are possible.

In this specification, a dataset is a mutable set of records in which each record has a unique record identifier. Records can be added or deleted over time, and the data within records can change over time. The structure of the data within records is up to the application. Applications that process incoming Tablecast feeds are responsible for maintaining the metadata needed to interpret incoming edits in order by effective time, regardless of the order in which edits actually arrive.
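One way to keep the metadata mentioned above is to remember, for each field, the effective time of the last edit applied to it. The sketch below assumes a per-field "latest effective time wins" rule, which this overview does not itself spell out; the names apply_edit, replay, table, and stamps are illustrative only.

```python
def apply_edit(table, stamps, record_id, effective, fields):
    """Apply one edit, keeping a field's value only if the edit is not older.

    table  maps record identifier -> {field name: value}
    stamps maps record identifier -> {field name: effective time applied}
    Universal timestamps compare correctly as plain strings only when they
    all carry the same precision (for example, whole seconds).
    """
    row = table.setdefault(record_id, {})
    row_stamps = stamps.setdefault(record_id, {})
    for name, value in fields.items():
        # Assumption: an edit older than the one already stored is ignored.
        # Ties fall back to arrival order, which a real application would
        # need to settle in some deterministic way.
        if name not in row_stamps or effective >= row_stamps[name]:
            row[name] = value
            row_stamps[name] = effective


def replay(edits):
    """Build a table from (record_id, effective, fields) tuples in any order."""
    table, stamps = {}, {}
    for record_id, effective, fields in edits:
        apply_edit(table, stamps, record_id, effective, fields)
    return table


edits = [
    ("tag:example.org,2010:1", "2010-06-29T15:27:39Z", {"available_beds": 55}),
    ("tag:example.org,2010:1", "2010-06-30T09:00:00Z",
     {"available_beds": 40, "facility_name": "New name"}),
]
in_order = replay(edits)
out_of_order = replay(list(reversed(edits)))
```

With distinct effective times, both arrival orders lead to the same table, which is the property the paragraph above asks applications to preserve.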

The purpose of Tablecast is to help applications maintain and share data, with low latency, in a decentralized fashion. For example, a subscriber could apply the incoming edits to its own table, to maintain a synchronized copy of the publisher's table. Or, an application could subscribe to multiple Tablecast feeds and merge the edits together to produce a combined table.

1.1. Example

The following is an example of a Tablecast feed with one entry. The Tablecast-specific parts are the xmlns:tc namespace declaration and the content element together with everything inside it.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository2.com,2010:feed1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <entry>
    <id>tag:repository1.com,2010:entry1</id>
    <updated>2010-07-02T20:11:03Z</updated>
    <author><uri>mailto:user@mailprovider.org</uri></author>
    <title>tag:repository1.com,2010:entry1</title>
    <content type="application/tablecast+xml">
      <tc:edit tc:record="tag:example.org,2010:1234567"
               tc:author="mailto:user@mailprovider.org"
               tc:effective="2010-06-29T15:27:39Z"
               tc:type="{http://schemas.google.com/tablecast/2010}row">
        <tc:row>
          <tc:field tc:name="facility_name">"New name"</tc:field>
          <tc:field tc:name="available_beds"
                    tc:comment="estimated by doctors on site">55</tc:field>
        </tc:row>
      </tc:edit>
    </content>
  </entry>
</feed>

This entry says that, at 15:27:39 on June 29, 2010, UTC, someone with the e-mail address user@mailprovider.org edited two fields on the record with identifier tag:example.org,2010:1234567, setting the "facility_name" field to the string "New name" and the "available_beds" field to the number 55. The latter change also has an associated change comment explaining that the value 55 was "estimated by doctors on site".
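As a rough illustration of how a subscriber might read such an entry, the following sketch uses Python's standard xml.etree.ElementTree and json modules. It assumes that the content of each tc:field is JSON text, as the example suggests; the function name read_edit and the shape of the returned dictionary are hypothetical. The entry is repeated here without the enclosing feed so the block stands alone.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
TC = "{http://schemas.google.com/tablecast/2010}"

ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository1.com,2010:entry1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <author><uri>mailto:user@mailprovider.org</uri></author>
  <title>tag:repository1.com,2010:entry1</title>
  <content type="application/tablecast+xml">
    <tc:edit tc:record="tag:example.org,2010:1234567"
             tc:author="mailto:user@mailprovider.org"
             tc:effective="2010-06-29T15:27:39Z"
             tc:type="{http://schemas.google.com/tablecast/2010}row">
      <tc:row>
        <tc:field tc:name="facility_name">"New name"</tc:field>
        <tc:field tc:name="available_beds"
                  tc:comment="estimated by doctors on site">55</tc:field>
      </tc:row>
    </tc:edit>
  </content>
</entry>
"""


def read_edit(entry):
    """Extract the record, author, effective time, and field values."""
    edit = entry.find(ATOM + "content/" + TC + "edit")
    fields = {
        # Assumption: field content is JSON text, so "New name" keeps its quotes.
        field.get(TC + "name"): json.loads(field.text)
        for field in edit.iter(TC + "field")
    }
    return {
        "record": edit.get(TC + "record"),
        "author": edit.get(TC + "author"),
        "effective": edit.get(TC + "effective"),
        "fields": fields,
    }


edit = read_edit(ET.fromstring(ENTRY))
```

ElementTree exposes namespaced attributes such as tc:record under keys of the form {namespace}name, which is why the lookups above prepend the TC constant.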

Note:

1.2. Namespace

The XML Namespace URI for all the XML elements and attributes defined in this document is:

http://schemas.google.com/tablecast/2010

1.3. Notational conventions

For convenience, this format may be referred to as "Tablecast 0.2".

This format is based on the Atom Syndication Format, defined in RFC 4287. This specification uses a shorthand for references to the Atom Syndication Format specification: "(Atom: 1.3)" refers to Section 1.3 of RFC 4287.

This specification uses the namespace prefix tc: to signify the Tablecast namespace URI shown in the preceding section and the namespace prefix atom: to signify the Atom namespace URI. The choice of namespace prefix is arbitrary and has no semantic significance.

2. Definitions

2.1. Record identifiers

A record identifier is a tag URI designating a particular record in a dataset. The authorityName part of the URI must be a fully qualified, lowercase ASCII domain name with no trailing period.

Applications that process Tablecast feeds for a common dataset will need to agree on a common scheme for their record identifiers. The authorityName should be the domain name of the organization that governs the identifier scheme.
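The syntactic requirement on the authorityName can be checked mechanically. The sketch below is a loose approximation rather than a full tag URI parser: it treats "fully qualified" as "at least two dot-separated labels", which is an assumption, and the function name is_record_identifier is hypothetical.

```python
import re

# Loose shape of a tag URI: "tag:" authorityName "," date ":" specific
TAG_URI = re.compile(
    r"tag:(?P<authority>[^,:]+),"
    r"(?P<date>[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?):"
    r"(?P<specific>.*)"
)

# Lowercase ASCII labels separated by periods, with no trailing period.
LABEL = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"
DOMAIN = re.compile(LABEL + r"(\." + LABEL + r")+")


def is_record_identifier(uri):
    """Return True if uri looks like a tag URI with a lowercase domain authority."""
    match = TAG_URI.fullmatch(uri)
    return bool(match and DOMAIN.fullmatch(match.group("authority")))
```

For example, is_record_identifier("tag:example.org,2010:1234567") is true, while an uppercase authority, a trailing period, or an e-mail address as the authority is rejected.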

2.2. Author identifiers

An author identifier is a URI that identifies the person or organization responsible for making an edit described in a Tablecast entry. The URI must designate a resource that is owned or controlled by this person or organization. Author URIs can designate e-mail addresses, web pages, domains, or other resources; for example, mailto, tel, http, and tag URIs are all acceptable. Domain names in author identifiers must be fully qualified, lowercase ASCII domain names with no trailing period.

2.3. Universal timestamps

A universal timestamp is a string that conforms to the date-time production in RFC 3339, with these additional requirements:

  1. An uppercase T must be used to separate the date and time.
  2. An uppercase Z must be used as the time zone specifier.
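The two requirements above narrow the RFC 3339 date-time production enough that a single pattern can describe it. The following is a minimal sketch; the names is_universal_timestamp and parse_universal_timestamp are illustrative, and the parser does not handle leap seconds (a seconds value of 60), which Python's datetime cannot represent.

```python
import re
from datetime import datetime, timezone

# Date, uppercase T, time, optional fractional seconds, uppercase Z.
UNIVERSAL_TIMESTAMP = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z"
)


def is_universal_timestamp(text):
    """Check the shape only; field ranges are checked by the parser below."""
    return UNIVERSAL_TIMESTAMP.fullmatch(text) is not None


def parse_universal_timestamp(text):
    """Return a timezone-aware UTC datetime, or raise ValueError."""
    if not is_universal_timestamp(text):
        raise ValueError(f"not a universal timestamp: {text!r}")
    whole, _, fraction = text[:-1].partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    # Fractions finer than microseconds are truncated.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)
```

A lowercase t or z, a numeric offset such as +00:00, or a space between date and time all fail the shape check.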

2.4. Universal names

The universal name of an XML element, using the terminology and syntax defined by James Clark, is an XML Namespace URI enclosed in curly braces (ASCII 0x7B and 0x7D) followed by the local name of an XML element in that namespace. For example, the universal name of the tc:row element is {http://schemas.google.com/tablecast/2010}row.
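Python's xml.etree.ElementTree happens to report element tags in this same notation, which makes universal names convenient to compare. The helper names in this sketch (universal_name, split_universal_name) are illustrative and not part of this specification.

```python
import xml.etree.ElementTree as ET

TC_URI = "http://schemas.google.com/tablecast/2010"


def universal_name(namespace_uri, local_name):
    """Join a namespace URI and a local name as {uri}local."""
    return "{" + namespace_uri + "}" + local_name


def split_universal_name(name):
    """Return (namespace URI, local name) for a name in {uri}local form."""
    if not name.startswith("{") or "}" not in name:
        raise ValueError(f"not a universal name: {name!r}")
    uri, _, local = name[1:].partition("}")
    return uri, local


# ElementTree stores the tag of a namespaced element as its universal name.
row = ET.fromstring(f'<tc:row xmlns:tc="{TC_URI}"/>')
row_name = universal_name(TC_URI, "row")
```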

3. Tablecast feeds

This section defines a data format. A Tablecast feed is a valid Atom feed (Atom: 4.1.1) with these additional requirements:

  1. All entries must be valid Tablecast entries.
  2. All Atom Date constructs (Atom: 3.3) must use universal timestamps.

3.1. Tablecast entries

A Tablecast entry is a valid Atom entry (Atom: 4.1.2) with these additional requirements:

  1. The atom:content element must contain exactly one tc:edit element.
  2. The type attribute of the atom:content element must be application/tablecast+xml.
  3. All Atom Date constructs (Atom: 3.3) must use universal timestamps.
  4. There should be exactly one atom:author element (Atom: 4.2.1) containing one atom:uri element (Atom: 3.2.2) whose content is equal to the value of the tc:edit element's tc:author attribute.

Tablecast entries may also contain other elements that are valid in Atom entries (Atom: 4.1.2). The order of the child elements within an entry is not significant.
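Some of these requirements can be checked with a few lines of code. The sketch below covers only requirements 1, 2, and 4; it does not check that the entry is a valid Atom entry or that its dates are universal timestamps. The function name entry_problems is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
TC = "{http://schemas.google.com/tablecast/2010}"


def entry_problems(entry):
    """Return a list of problems found; an empty list means none were found."""
    content = entry.find(ATOM + "content")
    if content is None:
        return ["missing atom:content"]
    problems = []
    if content.get("type") != "application/tablecast+xml":
        problems.append("atom:content type is not application/tablecast+xml")
    edits = content.findall(TC + "edit")
    if len(edits) != 1:
        problems.append("atom:content must contain exactly one tc:edit")
        return problems
    # Requirement 4 is a "should", so a real checker might only warn here.
    uris = [a.findtext(ATOM + "uri") for a in entry.findall(ATOM + "author")]
    if uris != [edits[0].get(TC + "author")]:
        problems.append("atom:author/atom:uri does not match tc:author")
    return problems
```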

3.2. The tc:edit element

The tc:edit element expresses an edit to a record.

The content of the tc:edit element describes the change to the specified record, which can be expressed in two ways:

  1. The change can be expressed as an XML element. In this case, the tc:edit element must contain exactly one child element, and the value of the tc:type attribute must be the universal name of that child element.
  2. The change can be expressed as character data. In this case, the value of the tc:type attribute must be an HTTP media type for the character data, as defined by the media-type production in RFC 2616.

This specification defines just one type of edit content, the tc:row element, for expressing changes to a row of a relational database table. Using other values for the tc:type attribute, Tablecast can accommodate other methods for expressing changes to data, such as XML delta formats, text diffs, or operational transformations.
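A consumer can tell the two forms apart by looking for a child element. In the sketch below, which assumes the tc:edit element has already been parsed with xml.etree.ElementTree, the function name edit_content and its return shape are illustrative only.

```python
import xml.etree.ElementTree as ET

TC = "{http://schemas.google.com/tablecast/2010}"


def edit_content(edit):
    """Return ("element", child) or ("text", (media_type, text)) for a tc:edit."""
    declared = edit.get(TC + "type")
    if declared is None:
        raise ValueError("tc:edit has no tc:type attribute")
    children = list(edit)
    if children:
        # ElementTree tags are universal names, so they compare directly.
        if len(children) != 1 or children[0].tag != declared:
            raise ValueError("tc:type must name the single child element")
        return "element", children[0]
    # No child element: treat the content as character data of that media type.
    return "text", (declared, edit.text or "")
```

A receiver that does not recognise the declared type can then skip the edit without needing to understand its content.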

3.3. The tc:row element

The tc:row element represents an update to a database row. Its content must be either:

  1. a sequence of zero or more tc:field elements specifying the values of fields in the row; or
  2. a single tc:deleted element.

The order of the child elements in a tc:row element is not significant.
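The two permitted shapes of tc:row content can be distinguished as in the following sketch. It assumes that every tc:field carries a tc:name attribute and JSON text, as in the example in Section 1.1; the sentinel DELETED and the function name read_row are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

TC = "{http://schemas.google.com/tablecast/2010}"
DELETED = object()  # hypothetical marker for a row containing tc:deleted


def read_row(row):
    """Return DELETED, or a dict of field values (possibly empty)."""
    children = list(row)
    tags = [child.tag for child in children]
    if tags == [TC + "deleted"]:
        return DELETED
    if any(tag != TC + "field" for tag in tags):
        raise ValueError("tc:row must hold only tc:field elements or one tc:deleted")
    # A dict is used because the order of the fields is not significant.
    return {child.get(TC + "name"): json.loads(child.text) for child in children}
```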

The semantics of tc:row contain some embedded assumptions:

A tc:row edit containing tc:field elements has the following meaning:

A tc:row edit containing a tc:deleted element has the following meaning:

3.4. The tc:field element

The tc:field element represents an update to a single field in a single row at a particular effective time.

The recommended translation to JSON for some common data types is as follows:

Original data type | JSON representation
Null (e.g. None, null, NULL) | JSON null (null)
Boolean | JSON boolean (true or false)
Integer, Floating-point number | JSON number (note that JSON can express numbers of arbitrary size and precision, and JSON processors should not assume that JSON numbers are representable as IEEE 754 floating-point numbers)
String, Text, Binary object | JSON string
Date | JSON string in ISO 8601 format ("YYYY-MM-DD")
Time | JSON string in universal timestamp format ("YYYY-MM-DDThh:mm:ssZ")
Single-valued enumerated type (e.g. enum, MySQL ENUM) | JSON string (use "UPPERCASE_WITH_UNDERSCORES" in the absence of compelling reasons otherwise)
Multi-valued enumerated type (e.g. bitmask, flag set, MySQL SET) | JSON array of strings (use "UPPERCASE_WITH_UNDERSCORES" in the absence of compelling reasons otherwise)
Geographic point location | JSON array of two or three numbers, either [latitude, longitude] or [latitude, longitude, altitude] in the WGS84 coordinate frame, where latitude is measured in degrees north, longitude in degrees east, and altitude in metres
Geographic polyline, Geographic multi-line, Geographic polygon, Geographic multi-polygon | GeoJSON representation

Note that these JSON representations are not intended to convey the original types of the values, only to be interpretable as the correct values when the appropriate types are known. For flexibility, Tablecast intentionally avoids requiring applications to agree exactly on table schemas. Applications that receive and then republish changes should emit JSON representations using the same structure they received.
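A publisher written in Python might apply part of the recommended translation as in the sketch below. It covers only the scalar, date, time, and enumerated rows of the table; binary objects and geographic types are left out, and the function name to_json_value and the Service enumeration are hypothetical. Sorting the members of a multi-valued enumeration is a choice made here for repeatable output, not something this specification requires.

```python
import json
from datetime import date, datetime, timezone
from enum import Enum


def to_json_value(value):
    """Translate a Python value into something json.dumps can emit."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, datetime):  # checked before date: datetime subclasses date
        if value.tzinfo is None:
            raise ValueError("naive datetimes are ambiguous; attach a time zone")
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, Enum):
        return value.name
    if isinstance(value, (set, frozenset)):
        return sorted(to_json_value(member) for member in value)
    raise TypeError(f"no recommended translation for {type(value).__name__}")


class Service(Enum):  # hypothetical enumerated type
    X_RAY = 1
    ICU = 2


row = {
    "facility_name": "New name",
    "available_beds": 55,
    "opened": date(2010, 6, 29),
    "checked": datetime(2010, 6, 29, 15, 27, 39, tzinfo=timezone.utc),
    "services": {Service.X_RAY, Service.ICU},
    "closed": None,
}
encoded = {name: json.dumps(to_json_value(value)) for name, value in row.items()}
```

Each value of encoded is the text that would appear inside the corresponding tc:field element.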

3.5. The tc:deleted element

The tc:deleted element indicates that the edit consists of deleting the specified row from the table at a particular effective time.

4. Tablecast services

This section defines an application programming interface based on HTTP requests that yield Tablecast feeds. Applications may choose to support additional query parameters beyond those specified here, to filter, search, or otherwise manipulate the returned entries.

A Tablecast service provides two views of a dataset: a stream view (Section 4.1) and a snapshot view (Section 4.2).

Suppose that D is a dataset and u is an HTTP or HTTPS URL. An application is said to provide a Tablecast of D at u if it meets the requirements in both sections below.

4.1. Stream view

Let X(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that excludes the snapshot parameter.

  1. For any time t, the dataset produced by starting with an empty dataset and applying all the edits in X(t), in order by effective time, must be identical to the state of D at time t.
  2. For any two times t1 < t2, every entry in X(t2) must either:
  3. Suppose a client makes a GET request for u at time t with these query parameters: The response must be a Tablecast feed containing the list of entries that would be produced by these operations:
    1. Let F be an ordered list of all the entries in X(t) sorted by increasing atom:updated time.
    2. Remove all entries in F that have atom:updated times less than m.
    3. Remove the first k entries in F.
    4. Return the first one or more entries in F, up to a maximum of n entries, or an empty list if F is empty.
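The four operations above can be written out directly. In this sketch the set X(t) is stood in for by a list of (atom:updated, entry) pairs, and m, k, and n are the symbols used in the steps; how those values are carried as query parameters is not shown. The function name stream_page is hypothetical. The sketch always returns as many entries as it can up to n, although step 4 also permits a shorter non-empty result.

```python
def stream_page(entries, m, k, n):
    """Select one page of the stream view from (updated, entry) pairs."""
    # Step 1: order by increasing atom:updated time. Timestamps of equal
    # precision sort correctly as strings; the order of ties is arbitrary here.
    feed = sorted(entries, key=lambda pair: pair[0])
    # Step 2: drop entries updated before m.
    feed = [pair for pair in feed if pair[0] >= m]
    # Step 3: drop the first k remaining entries.
    feed = feed[k:]
    # Step 4: return at most n entries (an empty list if nothing is left).
    return feed[:n]


entries = [
    ("2010-07-02T20:11:03Z", "entry2"),
    ("2010-07-01T08:00:00Z", "entry1"),
    ("2010-07-02T20:11:03Z", "entry3"),
    ("2010-07-03T00:00:00Z", "entry4"),
]
page = stream_page(entries, "2010-07-02T20:11:03Z", 1, 2)
```

Here two entries share the atom:updated time equal to m, and k = 1 skips the first of them, so the page holds entry3 and entry4.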

4.2. Snapshot view

Let Y(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that includes the snapshot parameter.

  1. For any time t, the dataset produced by starting with an empty dataset and applying all the edits in Y(t), in order by effective time, must be identical to the state of D at time t.
  2. For any two times t1 < t2, every entry in Y(t2) must either:
  3. Y(t) must contain exactly one entry for each record identifier in D at time t.
  4. Suppose a client makes a GET request for u at time t with these query parameters: The response must be a Tablecast feed containing the list of entries that would be produced by these operations:
    1. Let F be an ordered list of all the entries in Y(t), sorted by increasing record identifier, where record identifiers are compared in lexicographic order by ASCII value.
    2. Remove all entries in F that have record identifiers less than or equal to s.
    3. Return the first one or more entries in F, up to a maximum of n entries, or an empty list if F is empty.
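The snapshot operations lend themselves to simple paging: a client can pass the last record identifier it received as s to fetch the next page. The sketch below stands in for Y(t) with a list of (record identifier, entry) pairs; the function names are hypothetical, and starting from an empty string for s is an assumption made for the illustration.

```python
def snapshot_page(entries, s, n):
    """Select one page of the snapshot view from (record_id, entry) pairs."""
    # Step 1: Python compares ASCII strings by code point, which matches
    # lexicographic order by ASCII value.
    feed = sorted(entries, key=lambda pair: pair[0])
    # Step 2: drop record identifiers less than or equal to s.
    feed = [pair for pair in feed if pair[0] > s]
    # Step 3: return at most n entries (an empty list if nothing is left).
    return feed[:n]


def read_snapshot(entries, n):
    """Collect every page, resuming after the last identifier seen."""
    s, collected = "", []
    while True:
        page = snapshot_page(entries, s, n)
        if not page:
            return collected
        collected.extend(page)
        s = page[-1][0]


records = [
    ("tag:example.org,2010:b", "B"),
    ("tag:example.org,2010:a", "A"),
    ("tag:example.org,2010:c", "C"),
]
second_page = snapshot_page(records, "tag:example.org,2010:a", 2)
everything = read_snapshot(records, 2)
```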

5. Security considerations

5.1. Content secrecy and integrity

Tablecast does not provide a mechanism to assure the secrecy or integrity of content. Applications wanting to protect content against eavesdropping or modification in transit should use a transport layer with encryption and/or content authentication, such as HTTPS, to transmit Tablecast feeds.

5.2. Client authorization

Tablecast does not provide a mechanism to verify that a client is authorized to retrieve a Tablecast feed. Applications wanting to protect content from unauthorized clients should use unguessable HTTPS URLs or other authorization mechanisms to protect their feeds.

5.3. Publisher identity

Tablecast does not provide a mechanism to verify the identity of the publisher. Applications should subscribe to publishers they trust, and can use DNS, DNSSEC, or certificate validation to verify the identity of the publisher.

5.4. Author identity

Tablecast does not provide a mechanism to verify the identity associated with an author identifier. Applications should treat author identifiers in Tablecast feeds as claims made by the publisher, and may choose to reinterpret, selectively process, or ignore edits based on their own policies about publishers and authors. For example, an application may decide that only certain publishers are trusted to speak for certain authors, and accept or ignore incoming edits accordingly. An application may also decide that only certain authors are trusted to edit certain fields, and selectively ignore parts of incoming edits accordingly.

6. Acknowledgements

Thanks to Brett Slatkin, Steve Hakusa, Craig Nevill-Manning, and Alon Halevy for their advice and input.

License

This document is licensed under the GNU Free Documentation License 1.2.