Tablecast: Draft 0.1

November 30, 2010
Ka-Ping Yee

(See notes and rationale.)

Abstract

Tablecast is an extension of the Atom Syndication Format to represent a stream of changes to a dataset. Applications can use Tablecast together with a publish-subscribe protocol such as PubSubHubbub to receive timely updates about datasets managed by other parties.

Contents

  1. Overview
    1.1. Example
    1.2. Namespace
    1.3. Notational conventions
  2. Definitions
    2.1. Record identifiers
    2.2. Author identifiers
    2.3. Universal timestamps
    2.4. Universal names
  3. Tablecast feeds
    3.1. Tablecast entries
    3.2. The tc:edit element
    3.3. The tc:row element
    3.4. The tc:field element
    3.5. The tc:deleted element
  4. Tablecast services
    4.1. Stream view
    4.2. Snapshot view
  5. Security considerations
    5.1. Content secrecy and integrity
    5.2. Client authorization
    5.3. Publisher identity
    5.4. Author identity
  6. Acknowledgements
    License

    1. Overview

    This specification defines a data format called a Tablecast feed and an application programming interface called a Tablecast service.

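    A Tablecast feed is an Atom feed where each entry represents an edit to a record in a dataset. A Tablecast service provides a specific way to request and retrieve Tablecast feeds. Each entry in a Tablecast feed includes four pieces of information:

    1. the identifier of the record that was edited;
    2. the author of the edit;
    3. the effective time of the edit; and
    4. the content of the edit.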

    This specification defines one type of content called a row edit, an XML element representing a change to a row in a database table. Other types of content are possible.

    In this specification, a dataset is a mutable set of records in which each record has a unique record identifier. Records can be added or deleted over time, and the data within records can change over time. The structure of the data within records is up to the application. Applications that process incoming Tablecast feeds are responsible for maintaining the metadata needed to interpret incoming edits in order by effective time, regardless of the order in which edits actually arrive.

    The purpose of Tablecast is to help applications maintain and share data, with low latency, in a decentralized fashion. For example, a subscriber could apply the incoming edits to its own table, to maintain a synchronized copy of the publisher's table. Or, an application could subscribe to multiple Tablecast feeds and merge the edits together to produce a combined table.
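
    As an illustration of interpreting edits in order by effective time regardless of arrival order, the following is a minimal sketch, not part of this specification. It assumes a per-field "latest effective time wins" policy, which is one possible reading; the Table class and its method names are hypothetical, and deletions are left out.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a universal timestamp such as 2010-06-29T15:27:39Z."""
    # Swap the trailing "Z" for an explicit UTC offset that fromisoformat accepts.
    return datetime.fromisoformat(ts[:-1] + "+00:00")


class Table:
    """Hypothetical subscriber-side copy of a table (assumed policy: per field,
    the value with the latest effective time wins)."""

    def __init__(self):
        self.values = {}  # (record, field name) -> current value
        self.stamps = {}  # (record, field name) -> effective time of that value

    def apply(self, record, effective, fields):
        when = parse_ts(effective)
        for name, value in fields.items():
            key = (record, name)
            # Assumption: on equal effective times, the later arrival wins.
            if key not in self.stamps or when >= self.stamps[key]:
                self.values[key] = value
                self.stamps[key] = when
```

    With this bookkeeping, an edit that arrives late but carries an earlier effective time fills in fields that have no newer value and leaves newer values alone.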

    1.1. Example

    The following is an example of a Tablecast feed with one entry.

    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:tc="http://schemas.google.com/tablecast/2010">
      <id>tag:repository2.com,2010:feed1</id>
      <updated>2010-07-02T20:11:03Z</updated>
      <entry>
        <id>tag:repository1.com,2010:entry1</id>
        <updated>2010-07-02T20:11:03Z</updated>
        <author>
          <uri>mailto:user@mailprovider.org</uri>
        </author>
        <title>tag:repository1.com,2010:entry1</title>
        <content type="application/tablecast+xml">
          <tc:edit record="tag:example.org,2010:1234567"
                   author="mailto:user@mailprovider.org"
                   effective="2010-06-29T15:27:39Z"
                   type="{http://schemas.google.com/tablecast/2010}row">
            <tc:row>
              <tc:field name="facility_name">"New name"</tc:field>
              <tc:field name="available_beds">55</tc:field>
            </tc:row>
          </tc:edit>
        </content>
      </entry>
    </feed>
    

    This entry says that, at 15:27:39 on June 29, 2010, UTC, someone with the e-mail address user@mailprovider.org edited two fields on the record with identifier tag:example.org,2010:1234567, setting the "facility_name" field to the string "New name" and the "available_beds" field to the number 55.
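
    To make the reading above concrete, here is a small sketch that extracts the same information from the example entry using Python's standard library. The function name read_row_edit is hypothetical, and decoding each tc:field as JSON is an assumption based on how the example's values are written.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
TC = "http://schemas.google.com/tablecast/2010"

# The entry from the example feed, trimmed to the parts used here.
ENTRY_XML = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository1.com,2010:entry1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <content type="application/tablecast+xml">
    <tc:edit record="tag:example.org,2010:1234567"
             author="mailto:user@mailprovider.org"
             effective="2010-06-29T15:27:39Z"
             type="{http://schemas.google.com/tablecast/2010}row">
      <tc:row>
        <tc:field name="facility_name">"New name"</tc:field>
        <tc:field name="available_beds">55</tc:field>
      </tc:row>
    </tc:edit>
  </content>
</entry>
"""


def read_row_edit(entry_xml):
    """Return the record, author, effective time and fields of a row edit."""
    entry = ET.fromstring(entry_xml)
    edit = entry.find(f"{{{ATOM}}}content/{{{TC}}}edit")
    # Assumption: field content is a JSON value, as in the example.
    fields = {
        field.get("name"): json.loads(field.text)
        for field in edit.iter(f"{{{TC}}}field")
    }
    return {
        "record": edit.get("record"),
        "author": edit.get("author"),
        "effective": edit.get("effective"),
        "fields": fields,
    }
```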

    Note:

    1.2. Namespace

    The XML Namespace URI for the XML elements and attributes defined in this document is:

    http://schemas.google.com/tablecast/2010

    This specification uses the namespace prefix tc: to signify the above namespace URI and the namespace prefix atom: to signify the Atom namespace URI. These choices are arbitrary; applications can use any prefixes.

    1.3. Notational conventions

    For convenience, this format may be referred to as "Tablecast 0.1".

    This format is based on the Atom Syndication Format, defined in RFC 4287. This specification uses a shorthand for references to the Atom Syndication Format specification: "(Atom: 1.3)" refers to Section 1.3 of RFC 4287.

    2. Definitions

    2.1. Record identifiers

    A record identifier is a tag URI designating a particular record in a dataset. The authorityName part of the URI must be a fully qualified, lowercase ASCII domain name with no trailing period.

    Applications that process Tablecast feeds for a common dataset will need to agree on a common scheme for their record identifiers. The authorityName should be the domain name of the organization that governs the identifier scheme.
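
    A receiving application might screen record identifiers with a check along these lines. This is a rough sketch: the function name is hypothetical, "fully qualified" is approximated as "contains at least one dot", and the part of the tag URI after the date is not validated.

```python
import re

# One DNS label: lowercase ASCII letters, digits and inner hyphens.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# tag:<authorityName>,<date>:<specific>, where the date is YYYY, YYYY-MM
# or YYYY-MM-DD. The specific part is accepted as-is in this sketch.
_RECORD_ID = re.compile(
    rf"tag:{_LABEL}(?:\.{_LABEL})+,"
    r"[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?:.*",
    re.DOTALL,
)


def is_record_identifier(uri):
    """Loosely check a tag URI against the record identifier rules."""
    return _RECORD_ID.fullmatch(uri) is not None
```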

    2.2. Author identifiers

    An author identifier is a URI that identifies the person or organization responsible for making an edit described in a Tablecast entry. The URI must designate a resource that is owned or controlled by this person or organization. Author URIs can designate e-mail addresses, web pages, domains, or other resources; for example, mailto, tel, http, and tag URIs are all acceptable. Domain names in author identifiers must be fully qualified, lowercase ASCII domain names with no trailing period.

    2.3. Universal timestamps

    A universal timestamp is a string that conforms to the date-time production in RFC 3339, with these additional requirements:

    1. An uppercase T must be used to separate the date and time.
    2. An uppercase Z must be used as the time zone specifier.
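
    The two requirements can be checked with a regular expression. The following sketch is syntactic only (it does not reject out-of-range values such as month 13), and the function name is hypothetical.

```python
import re

# RFC 3339 date-time shape, restricted to an uppercase "T" separator and
# an uppercase "Z" time zone specifier. Fractional seconds are optional.
_UNIVERSAL_TIMESTAMP = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?Z"
)


def is_universal_timestamp(text):
    """Return True if text has the shape of a universal timestamp."""
    return _UNIVERSAL_TIMESTAMP.fullmatch(text) is not None
```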

    2.4. Universal names

    The universal name of an XML element, using the terminology and syntax defined by James Clark, is an XML Namespace URI enclosed in curly braces (ASCII 0x7B and 0x7D) followed by the local name of an XML element in that namespace. For example, the universal name of the tc:row element is {http://schemas.google.com/tablecast/2010}row.
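
    For illustration, universal names are simple to build and split. The helper names below are hypothetical; Python's xml.etree.ElementTree happens to use the same notation for element tags, which the usage note relies on.

```python
import xml.etree.ElementTree as ET

TC = "http://schemas.google.com/tablecast/2010"


def universal_name(namespace_uri, local_name):
    """Build a universal name: the namespace URI in braces, then the local name."""
    return "{" + namespace_uri + "}" + local_name


def split_universal_name(name):
    """Split a universal name into (namespace URI, local name)."""
    if not name.startswith("{") or "}" not in name:
        raise ValueError("not a universal name: %r" % name)
    namespace_uri, local_name = name[1:].split("}", 1)
    return namespace_uri, local_name
```

    Parsing <tc:row xmlns:tc="http://schemas.google.com/tablecast/2010"/> with ET.fromstring yields an element whose tag equals universal_name(TC, "row").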

    3. Tablecast feeds

    This section defines a data format. A Tablecast feed is a valid Atom feed (Atom: 4.1.1) with these additional requirements:

    1. All entries must be valid Tablecast entries.
    2. All Atom Date constructs (Atom: 3.3) must use universal timestamps.

    3.1. Tablecast entries

    A Tablecast entry is a valid Atom entry (Atom: 4.1.2) with these additional requirements:

    1. The atom:content element must contain exactly one tc:edit element.
    2. The type attribute of the atom:content element must be application/tablecast+xml.
    3. All Atom Date constructs (Atom: 3.3) must use universal timestamps.
    4. There should be exactly one atom:author element (Atom: 4.2.1) containing one atom:uri element (Atom: 3.2.2) whose content is equal to the value of the tc:edit element's author attribute.

    Tablecast entries may also contain other elements that are valid in Atom entries (Atom: 4.1.2). The order of the child elements within an entry is not significant.
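
    A consumer could check requirements 1, 2, and 4 roughly as follows. This sketch uses a hypothetical check_entry function, does not verify general Atom validity or timestamps, and reports the "should" in requirement 4 in the same list as the "must" requirements.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
TC = "http://schemas.google.com/tablecast/2010"


def check_entry(entry):
    """Return a list of problems found in an atom:entry Element."""
    content = entry.find(f"{{{ATOM}}}content")
    if content is None:
        return ["missing atom:content"]
    problems = []
    # Requirement 2: the content type.
    if content.get("type") != "application/tablecast+xml":
        problems.append("content type is not application/tablecast+xml")
    # Requirement 1: exactly one tc:edit inside atom:content.
    edits = content.findall(f"{{{TC}}}edit")
    if len(edits) != 1:
        problems.append("atom:content must contain exactly one tc:edit")
        return problems
    # Requirement 4: one atom:author whose atom:uri matches the edit's author.
    uris = [
        author.findtext(f"{{{ATOM}}}uri")
        for author in entry.findall(f"{{{ATOM}}}author")
    ]
    if uris != [edits[0].get("author")]:
        problems.append("atom:author URI does not match the tc:edit author")
    return problems
```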

    3.2. The tc:edit element

    The tc:edit element expresses an edit to a record.

    The content of the tc:edit element describes the change to the specified record, which can be expressed in two ways:

    1. The change can be expressed as an XML element. In this case, the tc:edit element must contain exactly one child element, and the value of the type attribute must be the universal name of that child element.
    2. The change can be expressed as character data. In this case, the value of the type attribute must be an HTTP media type for the character data, as defined by the media-type production in RFC 2616.

    This specification defines just one type of edit content, the tc:row element, for expressing changes to a row of a relational database table. Using other values for the type attribute, Tablecast can accommodate other methods for expressing changes to arbitrary kinds of documents or data structures, such as XML delta formats, text diffs, or operational transformations.
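
    The relationship between the type attribute and the content can be checked mechanically. In the sketch below the function name is hypothetical, and the media type test is a loose "type/subtype" pattern, not the full RFC 2616 media-type production.

```python
import re
import xml.etree.ElementTree as ET

# Loose approximation of a media type: type "/" subtype, optional parameters.
_MEDIA_TYPE = re.compile(r"[^\s/;]+/[^\s/;]+(?:\s*;.*)?")


def edit_type_is_consistent(edit):
    """Check a tc:edit Element's type attribute against its content."""
    declared = edit.get("type")
    if declared is None:
        return False
    children = list(edit)
    if children:
        # XML content: one child, named by its universal name.
        # ElementTree stores tags in the same {namespace}local notation.
        return len(children) == 1 and declared == children[0].tag
    # Character data: the type should be a media type.
    return _MEDIA_TYPE.fullmatch(declared) is not None
```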

    3.3. The tc:row element

    The tc:row element represents an update to a database row. Its content must be either:

    1. a sequence of zero or more tc:field elements specifying the values of fields in the row; or
    2. a single tc:deleted element.

    The semantics of a tc:row edit are as follows:

    The order of the child elements in a tc:row element is not significant.

    3.4. The tc:field element

    The tc:field element represents an update to a single field in a single row at a particular effective time.

    The recommended translation to JSON for some common data types is as follows:

    Original data type                   JSON representation
    -----------------------------------  ------------------------------------
    None, null, NULL                     JSON null (null)
    Boolean                              JSON boolean (true or false)
    Integer, floating-point number       JSON number
    String, text, binary object          JSON string
    Date                                 JSON string in ISO 8601 format
                                         ("YYYY-MM-DD")
    Time                                 JSON string in universal timestamp
                                         format ("YYYY-MM-DDThh:mm:ssZ")
    Single-valued enumerated type        JSON string (use
    (e.g. enum, MySQL ENUM)              "UPPERCASE_WITH_UNDERSCORES" in the
                                         absence of compelling reasons
                                         otherwise)
    Multi-valued enumerated type         JSON array of strings (use
    (e.g. bitmask, flag set, MySQL SET)  "UPPERCASE_WITH_UNDERSCORES" in the
                                         absence of compelling reasons
                                         otherwise)
    Geographic location                  JSON array of two or three numbers,
                                         either [latitude, longitude] or
                                         [latitude, longitude, altitude] in
                                         the WGS84 coordinate frame, where
                                         latitude is measured in degrees
                                         north, longitude in degrees east,
                                         and altitude in metres
    Geographic polyline,                 JSON array containing JSON arrays of
    geographic polygon                   two or three numbers, using the
                                         above representation for geographic
                                         locations

    Note that these JSON representations are not intended to convey the original types of the values, only to be interpretable as the correct values when the appropriate types are known. For flexibility, Tablecast intentionally avoids requiring applications to agree exactly on table schemas. Applications that receive and then republish changes should emit JSON representations using the same structure they received.
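
    As a rough illustration of the recommended translation, the sketch below maps some Python values to JSON-ready values. The function name is hypothetical, and so is the choice of Python types standing in for each original data type (Enum members for enumerated types, sets of them for multi-valued ones, tuples for locations). Binary objects are omitted because their string encoding is not specified here.

```python
import json
from datetime import date, datetime, timezone
from enum import Enum


def to_json_value(value):
    """Translate a Python value into a JSON-ready value."""
    if isinstance(value, Enum):
        return value.name  # e.g. "UPPERCASE_WITH_UNDERSCORES"
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, datetime):  # test before date: datetime is a date
        if value.tzinfo is None:
            raise ValueError("naive datetime: time zone unknown")
        utc = value.astimezone(timezone.utc)
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.isoformat()  # "YYYY-MM-DD"
    if isinstance(value, (set, frozenset)):
        # Multi-valued enumerated type; sorted only to make output stable.
        return sorted(to_json_value(v) for v in value)
    if isinstance(value, (list, tuple)):
        # Locations, polylines and polygons are (nested) arrays of numbers.
        return [to_json_value(v) for v in value]
    raise TypeError("no JSON translation for %r" % type(value))
```

    The result can be passed to json.dumps to produce the text of a field value.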

    3.5. The tc:deleted element

    The tc:deleted element indicates that the edit consists of deleting the specified row from the table at a particular effective time.

    4. Tablecast services

    This section defines an application programming interface based on HTTP requests that yield Tablecast feeds. Applications may choose to support additional query parameters beyond those specified here, to filter, search among, or limit the number of returned entries.

    A Tablecast service provides two views of a dataset: a stream view (Section 4.1) and a snapshot view (Section 4.2).

    Suppose that D is a dataset and u is an HTTP or HTTPS URL. An application is said to provide a Tablecast of D at u if it meets the requirements in both sections below.

    4.1. Stream view

    Let X(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that excludes the snapshot parameter.

    1. For any time t, the dataset produced by starting with an empty dataset and applying all the edits in X(t), in order by effective time, must be identical to the state of D at time t.
    2. For any two times t1 < t2, every entry in X(t2) must either:
    3. Suppose a client makes a GET request for u at time t with these query parameters: The response must be a Tablecast feed containing the list of entries that would be produced by these operations:
      1. Let F be an ordered list of all the entries in X(t) sorted by increasing atom:updated time.
      2. Remove all entries in F that have atom:updated times less than m.
      3. Remove the first k entries in F.
      4. Return the first one or more entries in F, or an empty list if F is empty.
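
    The four operations above can be sketched as a function over an in-memory list. Everything outside the steps themselves is an assumption: entries are modelled as dictionaries with an "updated" datetime, m and k are passed as plain arguments, and page_size is a hypothetical server-side choice for how many entries "one or more" means.

```python
from datetime import datetime


def stream_page(entries, m=None, k=0, page_size=50):
    """Select the entries of a stream view response from X(t)."""
    # Step 1: sort by increasing atom:updated time.
    f = sorted(entries, key=lambda entry: entry["updated"])
    # Step 2: remove entries with atom:updated times less than m.
    if m is not None:
        f = [entry for entry in f if entry["updated"] >= m]
    # Step 3: remove the first k entries.
    f = f[k:]
    # Step 4: return the first one or more entries (empty if f is empty).
    return f[:page_size]


# Hypothetical data: four entries updated on consecutive days.
EXAMPLE_ENTRIES = [
    {"id": "entry%d" % day, "updated": datetime(2010, 7, day)}
    for day in (3, 1, 4, 2)
]
```

    For example, stream_page(EXAMPLE_ENTRIES, m=datetime(2010, 7, 2), k=1, page_size=2) skips entry2 and returns entry3 and entry4.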

    4.2. Snapshot view

    Let Y(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that includes the snapshot parameter.

    1. For any time t, the dataset produced by starting with an empty dataset and applying all the edits in Y(t), in order by effective time, must be identical to the state of D at time t.
    2. For any two times t1 < t2, every entry in Y(t2) must either:
    3. Y(t) must contain exactly one entry for each record identifier in D at time t.
    4. Suppose a client makes a GET request for u at time t with these query parameters: The response must be a Tablecast feed containing the list of entries that would be produced by these operations:
      1. Let F be an ordered list of all the entries in Y(t), sorted by increasing record identifier, where record identifiers are compared in lexicographic order by ASCII value.
      2. Remove all entries in F that have record identifiers less than s.
      3. Remove the first k entries in F.
      4. Return the first one or more entries in F, or an empty list if F is empty.
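
    The snapshot selection can be sketched the same way. As before, the data model is an assumption: entries are dictionaries with a "record" string, s and k are plain arguments, and page_size is a hypothetical server-side limit.

```python
def snapshot_page(entries, s=None, k=0, page_size=50):
    """Select the entries of a snapshot view response from Y(t)."""
    # Step 1: sort by record identifier, comparing by ASCII value.
    # Encoding to ASCII bytes makes the byte-wise comparison explicit
    # and raises an error on non-ASCII identifiers.
    f = sorted(entries, key=lambda entry: entry["record"].encode("ascii"))
    # Step 2: remove entries with record identifiers less than s.
    if s is not None:
        floor = s.encode("ascii")
        f = [e for e in f if e["record"].encode("ascii") >= floor]
    # Step 3: remove the first k entries.
    f = f[k:]
    # Step 4: return the first one or more entries (empty if f is empty).
    return f[:page_size]
```

    Note that ASCII order puts uppercase letters and digits before lowercase letters, so "tag:example.org,2010:B" sorts before "tag:example.org,2010:a".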

    5. Security considerations

    5.1. Content secrecy and integrity

    Tablecast does not provide a mechanism to assure the secrecy or integrity of content. Applications wanting to protect content against eavesdropping or modification in transit should use a transport layer with encryption and/or content authentication, such as HTTPS, to transmit Tablecast feeds.

    5.2. Client authorization

    Tablecast does not provide a mechanism to verify that a client is authorized to retrieve a Tablecast feed. Applications wanting to protect content from unauthorized clients should use unguessable HTTPS URLs or other authorization mechanisms to protect their feeds.

    5.3. Publisher identity

    Tablecast does not provide a mechanism to verify the identity of the publisher. Applications should subscribe to publishers they trust, and can use DNS, DNSSEC, or certificate validation to verify the identity of the publisher.

    5.4. Author identity

    Tablecast does not provide a mechanism to verify the identity associated with an author identifier. Applications should treat author identifiers in Tablecast feeds as claims made by the publisher, and may choose to reinterpret, selectively process, or ignore edits based on their own policies about publishers and authors. For example, an application can decide that only certain publishers are trusted to speak for certain authors, and accept or ignore incoming edits accordingly. An application can also decide that only certain authors are trusted to edit certain fields, and selectively ignore parts of incoming edits accordingly.
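
    One way to express such policies is sketched below. The data structures are hypothetical: trusted_publishers maps a publisher to the set of authors it may speak for, and author_fields maps an author to the set of fields that author may edit.

```python
def filter_edit(edit, publisher, trusted_publishers, author_fields):
    """Apply a local trust policy to an incoming row edit.

    Returns None if the whole edit is ignored, otherwise a copy of the
    edit containing only the fields the author is trusted to change.
    """
    author = edit["author"]
    # Ignore the edit unless this publisher may speak for this author.
    if author not in trusted_publishers.get(publisher, set()):
        return None
    # Keep only the fields this author is trusted to edit.
    allowed = author_fields.get(author, set())
    kept = {
        name: value
        for name, value in edit["fields"].items()
        if name in allowed
    }
    return dict(edit, fields=kept)
```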

    6. Acknowledgements

    Thanks to Brett Slatkin, Steve Hakusa, Craig Nevill-Manning, and Alon Halevy for their advice and input.

    License

    This document is licensed under the GNU Free Documentation License 1.2.