(See notes and rationale. Your comments are welcome! Please send feedback to the mailing list.)
Tablecast is an extension of the Atom Syndication Format to represent a stream of changes to a dataset. Applications can use Tablecast together with a publish-subscribe protocol such as PubSubHubbub to receive timely updates about datasets managed by other parties.
This specification defines a data format called a Tablecast feed and an application programming interface called a Tablecast service.
A Tablecast feed is an Atom feed where each entry represents an edit to a record in a dataset. A Tablecast service provides a specific way to request and retrieve Tablecast feeds. Each entry in a Tablecast feed includes four pieces of information:
This specification defines one type of content called a row edit, an XML element representing a change to a row in a database table. Other types of content are possible.
In this specification, a dataset is a mutable set of records in which each record has a unique record identifier. Records can be added or deleted over time, and the data within records can change over time. The structure of the data within records is up to the application. Applications that process incoming Tablecast feeds are responsible for maintaining the metadata needed to interpret incoming edits in order by effective time, regardless of the order in which edits actually arrive.
The purpose of Tablecast is to help applications maintain and share data, with low latency, in a decentralized fashion. For example, a subscriber could apply the incoming edits to its own table, to maintain a synchronized copy of the publisher's table. Or, an application could subscribe to multiple Tablecast feeds and merge the edits together to produce a combined table.
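The following is an example of a Tablecast feed with one entry. The Tablecast-specific parts are the tc: namespace declaration, the application/tablecast+xml content type, and the elements and attributes with the tc: prefix.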
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository2.com,2010:feed1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <entry>
    <id>tag:repository1.com,2010:entry1</id>
    <updated>2010-07-02T20:11:03Z</updated>
    <author><uri>mailto:user@mailprovider.org</uri></author>
    <title>tag:repository1.com,2010:entry1</title>
    <content type="application/tablecast+xml">
      <tc:edit tc:record="tag:example.org,2010:1234567"
               tc:author="mailto:user@mailprovider.org"
               tc:effective="2010-06-29T15:27:39Z"
               tc:type="{http://schemas.google.com/tablecast/2010}row">
        <tc:row>
          <tc:field tc:name="facility_name">"New name"</tc:field>
          <tc:field tc:name="available_beds"
                    tc:comment="estimated by doctors on site">55</tc:field>
        </tc:row>
      </tc:edit>
    </content>
  </entry>
</feed>
This entry says that, at 15:27:39 on June 29, 2010, UTC, someone with the e-mail address user@mailprovider.org edited two fields on the record with identifier tag:example.org,2010:1234567, setting the "facility_name" field to the string "New name" and the "available_beds" field to the number 55. The latter change also has an associated change comment explaining that the value 55 was "estimated by doctors on site".
Note:
- The domain that issued the entry identifier (repository1.com), the domain that hosts the feed (repository2.com), the domain that assigned the record identifier (example.org), and the domain of the author's e-mail address (mailprovider.org) can all be different.
- The effective time of the edit (2010-06-29T15:27:39Z) and the time that the entry appeared in this feed (2010-07-02T20:11:03Z) can be different. In fact, the effective time of the edit need not match the <atom:updated> or <atom:published> time in any entry or feed.
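As a rough illustration of how a subscriber might read the example entry, the following Python sketch extracts the record identifier, author, effective time, and field values. It assumes that field content is JSON text (as the example values suggest); the function name read_edits and the returned dictionary layout are hypothetical, not part of Tablecast.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
TC = "http://schemas.google.com/tablecast/2010"

# The example feed from this section, without the XML declaration.
FEED_XML = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository2.com,2010:feed1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <entry>
    <id>tag:repository1.com,2010:entry1</id>
    <updated>2010-07-02T20:11:03Z</updated>
    <author><uri>mailto:user@mailprovider.org</uri></author>
    <title>tag:repository1.com,2010:entry1</title>
    <content type="application/tablecast+xml">
      <tc:edit tc:record="tag:example.org,2010:1234567"
               tc:author="mailto:user@mailprovider.org"
               tc:effective="2010-06-29T15:27:39Z"
               tc:type="{http://schemas.google.com/tablecast/2010}row">
        <tc:row>
          <tc:field tc:name="facility_name">"New name"</tc:field>
          <tc:field tc:name="available_beds"
                    tc:comment="estimated by doctors on site">55</tc:field>
        </tc:row>
      </tc:edit>
    </content>
  </entry>
</feed>
"""


def read_edits(feed_xml):
    """Return one dict per tc:edit found in the feed's entries (hypothetical layout)."""
    root = ET.fromstring(feed_xml.strip())
    edits = []
    for entry in root.iter(f"{{{ATOM}}}entry"):
        edit = entry.find(f"{{{ATOM}}}content/{{{TC}}}edit")
        if edit is None:
            continue  # not a Tablecast entry
        fields = {}
        for field in edit.iter(f"{{{TC}}}field"):
            # Assumption: the text of tc:field is a JSON value.
            fields[field.get(f"{{{TC}}}name")] = json.loads(field.text)
        edits.append({
            "record": edit.get(f"{{{TC}}}record"),
            "author": edit.get(f"{{{TC}}}author"),
            "effective": edit.get(f"{{{TC}}}effective"),
            "fields": fields,
        })
    return edits


edits = read_edits(FEED_XML)
print(edits[0]["record"], edits[0]["fields"])
```

Attributes in the tc: namespace are looked up by their namespace-qualified names because ElementTree stores them that way.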
The XML Namespace URI for all the XML elements and attributes defined in this document is:
http://schemas.google.com/tablecast/2010
For convenience, this format may be referred to as "Tablecast 0.2".
This format is based on the Atom Syndication Format, defined in RFC 4287. This specification uses a shorthand for references to the Atom Syndication Format specification: "(Atom: 1.3)" refers to Section 1.3 of RFC 4287.
This specification uses the namespace prefix tc: to signify the Tablecast namespace URI shown in the preceding section and the namespace prefix atom: to signify the Atom namespace URI. The choice of namespace prefix is arbitrary and has no semantic significance.
A record identifier is a tag URI designating a particular record in a dataset. The authorityName part of the URI must be a fully qualified, lowercase ASCII domain name with no trailing period.
Applications that process Tablecast feeds for a common dataset will need to agree on a common scheme for their record identifiers. The authorityName should be the domain name of the organization that governs the identifier scheme.
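The constraints on record identifiers can be checked mechanically. The sketch below is a simplified approximation of the tag URI syntax (authority, date, specific part) that only accepts lowercase domain names with at least one dot as the authority; the function name record_authority is hypothetical, and the pattern does not implement the full tag URI grammar.

```python
import re

# One lowercase ASCII DNS label: letters, digits, inner hyphens.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# Simplified tag URI: tag:<authority>,<date>:<specific>
_RECORD_ID = re.compile(
    rf"tag:(?P<authority>{_LABEL}(?:\.{_LABEL})+)"
    r",(?P<date>[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?)"
    r":(?P<specific>.*)"
)


def record_authority(record_id):
    """Return the authorityName of a record identifier, or None if it does not fit."""
    match = _RECORD_ID.fullmatch(record_id)
    return match.group("authority") if match else None


print(record_authority("tag:example.org,2010:1234567"))
```

Identifiers with an uppercase authority or a trailing period in the domain name are rejected by this pattern.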
An author identifier is a URI that identifies the person or organization responsible for making an edit described in a Tablecast entry. The URI must designate a resource that is owned or controlled by this person or organization. Author URIs can designate e-mail addresses, web pages, domains, or other resources; for example, mailto, tel, http, and tag URIs are all acceptable. Domain names in author identifiers must be fully qualified, lowercase ASCII domain names with no trailing period.
A universal timestamp is a string that conforms to the date-time production in RFC 3339, with these additional requirements:
- T must be used to separate the date and time.
- Z must be used as the time zone specifier.
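A minimal sketch of a parser for universal timestamps follows. It accepts only an uppercase T separator and an uppercase Z zone specifier, and allows the optional fractional seconds of the RFC 3339 date-time production. The function name parse_universal_timestamp is hypothetical, and fractions finer than a microsecond are truncated, which is a limitation of this sketch rather than of the format.

```python
import re
from datetime import datetime, timezone

_UNIVERSAL = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2})(\.[0-9]+)?Z"
)


def parse_universal_timestamp(text):
    """Parse a universal timestamp into a timezone-aware UTC datetime."""
    match = _UNIVERSAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a universal timestamp: {text!r}")
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = match.group(2)
    if fraction:
        # Keep at most microsecond precision; extra digits are truncated.
        parsed = parsed.replace(microsecond=int(fraction[1:7].ljust(6, "0")))
    return parsed.replace(tzinfo=timezone.utc)


print(parse_universal_timestamp("2010-06-29T15:27:39Z"))
```

Parsing into datetime values, rather than comparing the raw strings, keeps ordering correct when some timestamps carry fractional seconds and others do not.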
The universal name of an XML element, using the terminology and syntax defined by James Clark, is an XML Namespace URI enclosed in curly braces (ASCII 0x7B and 0x7D) followed by the local name of an XML element in that namespace. For example, the universal name of the tc:row element is {http://schemas.google.com/tablecast/2010}row.
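Building and splitting universal names is a small string operation. The helper names universal_name and split_universal_name in this sketch are hypothetical. Python's xml.etree.ElementTree happens to report element tags in the same braces notation, so the two can be compared directly.

```python
import xml.etree.ElementTree as ET

TC = "http://schemas.google.com/tablecast/2010"


def universal_name(namespace_uri, local_name):
    """Join a namespace URI and a local name as {uri}local."""
    return "{" + namespace_uri + "}" + local_name


def split_universal_name(name):
    """Split {uri}local back into (uri, local)."""
    if not name.startswith("{") or "}" not in name:
        raise ValueError(f"not a universal name: {name!r}")
    uri, _, local = name[1:].partition("}")
    return uri, local


row = ET.fromstring(f'<tc:row xmlns:tc="{TC}"/>')
print(row.tag == universal_name(TC, "row"))
```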
This section defines a data format. A Tablecast feed is a valid Atom feed (Atom: 4.1.1) with these additional requirements:
A Tablecast entry is a valid Atom entry (Atom: 4.1.2) with these additional requirements:
- The atom:content element must contain exactly one tc:edit element.
- The type attribute of the atom:content element must be application/tablecast+xml.
- The entry contains an atom:author element (Atom: 4.2.1) containing one atom:uri element (Atom: 3.2.2) whose content is equal to the value of the tc:edit element's tc:author attribute.
Tablecast entries may also contain other elements that are valid in Atom entries (Atom: 4.1.2). The order of the child elements within an entry is not significant.
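The entry requirements that survive in this section (the content type, the single tc:edit element, and the matching atom:author) can be checked with a few lookups. This is a sketch under that reading; the function name entry_problems and the wording of the returned messages are hypothetical, and it does not validate the entry as Atom.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
TC = "http://schemas.google.com/tablecast/2010"

GOOD_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:tc="http://schemas.google.com/tablecast/2010">
  <id>tag:repository1.com,2010:entry1</id>
  <updated>2010-07-02T20:11:03Z</updated>
  <author><uri>mailto:user@mailprovider.org</uri></author>
  <title>tag:repository1.com,2010:entry1</title>
  <content type="application/tablecast+xml">
    <tc:edit tc:record="tag:example.org,2010:1234567"
             tc:author="mailto:user@mailprovider.org"
             tc:effective="2010-06-29T15:27:39Z"
             tc:type="{http://schemas.google.com/tablecast/2010}row">
      <tc:row><tc:deleted/></tc:row>
    </tc:edit>
  </content>
</entry>
"""


def entry_problems(entry):
    """Return a list of human-readable problems; empty if the checks pass."""
    content = entry.find(f"{{{ATOM}}}content")
    if content is None:
        return ["missing atom:content"]
    problems = []
    if content.get("type") != "application/tablecast+xml":
        problems.append("atom:content type is not application/tablecast+xml")
    edits = content.findall(f"{{{TC}}}edit")
    if len(edits) != 1:
        problems.append("atom:content does not contain exactly one tc:edit")
        return problems
    author = edits[0].get(f"{{{TC}}}author")
    uris = [u.text for u in entry.findall(f"{{{ATOM}}}author/{{{ATOM}}}uri")]
    if author not in uris:
        problems.append("no atom:author/atom:uri equal to tc:author")
    return problems


print(entry_problems(ET.fromstring(GOOD_ENTRY.strip())))
```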
tc:edit element
The tc:edit element expresses an edit to a record.
- A tc:record attribute whose value is a record identifier specifying which record in the dataset is edited.
- A tc:author attribute whose value is an author identifier for the person or entity responsible for the edit.
- A tc:effective attribute whose value is a universal timestamp specifying when the edit took effect or will take effect. The effective time may be a time in the past or future; the exact semantics of the time are application-dependent. When the edit represents something observed or reported about the real world, the effective time should be the time that the information was observed or first recorded.
- A tc:type attribute (see below).
The content of the tc:edit element describes the change to the specified record, which can be expressed in two ways:
- As an XML element: the tc:edit element must contain exactly one child element, and the value of the tc:type attribute must be the universal name of that child element.
- As character data: the value of the tc:type attribute must be an HTTP media type for the character data, as defined by the media-type production in RFC 2616.
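A consumer can tell the two forms of edit content apart by comparing the tc:type attribute with what the tc:edit element actually contains. The sketch below is one possible reading: the function name edit_content_kind and its return values are hypothetical, and the media type pattern is a simplified stand-in for the RFC 2616 production rather than a full implementation of it.

```python
import re
import xml.etree.ElementTree as ET

TC = "http://schemas.google.com/tablecast/2010"

# Simplified: type "/" subtype, optionally followed by parameters.
_MEDIA_TYPE = re.compile(r"[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+(\s*;.*)?")


def edit_content_kind(edit):
    """Return "element", "text", or None if tc:type does not match the content."""
    declared = edit.get(f"{{{TC}}}type")
    children = list(edit)
    if len(children) == 1:
        # ElementTree tags are already in universal-name form.
        return "element" if declared == children[0].tag else None
    if not children and declared is not None and _MEDIA_TYPE.fullmatch(declared):
        return "text"
    return None


row_edit = ET.fromstring(
    f'<tc:edit xmlns:tc="{TC}" tc:type="{{{TC}}}row"><tc:row/></tc:edit>'
)
print(edit_content_kind(row_edit))
```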
This specification defines just one type of edit content, the tc:row element, for expressing changes to a row of a relational database table. Using other values for the tc:type attribute, Tablecast can accommodate other methods for expressing changes to data, such as XML delta formats, text diffs, or operational transformations.
tc:row element
The tc:row element represents an update to a database row. Its content must be either:
- tc:field elements specifying the values of fields in the row; or
- a tc:deleted element.
The order of the child elements in a tc:row element is not significant.
The semantics of tc:row contain some embedded assumptions:
A tc:row edit containing tc:field elements has the following meaning:
- The tc:field elements replace any fields that have an earlier effective time, and the replaced fields acquire the new authors and effective times.
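The replacement rule for tc:field elements can be sketched as a per-field comparison of effective times. In the sketch below, the Cell class and apply_field_edits function are hypothetical names, a field that the row does not yet hold is simply stored (an assumption, since the text only covers replacing existing fields), and an incoming field with an equal effective time does not replace the stored one because the text says "earlier".

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Cell:
    value: object
    author: str
    effective: datetime  # parsed universal timestamp, timezone-aware


def apply_field_edits(row, edits):
    """Apply (field name, Cell) pairs to row, a dict mapping names to Cells."""
    for name, incoming in edits:
        current = row.get(name)
        # Replace only fields whose stored effective time is strictly earlier.
        if current is None or current.effective < incoming.effective:
            row[name] = incoming
    return row


june = datetime(2010, 6, 29, 15, 27, 39, tzinfo=timezone.utc)
july = datetime(2010, 7, 1, 9, 0, 0, tzinfo=timezone.utc)
older = ("available_beds", Cell(55, "mailto:user@mailprovider.org", june))
newer = ("available_beds", Cell(40, "mailto:other@mailprovider.org", july))

# Either arrival order leaves the field with the later effective time.
in_order = apply_field_edits({}, [older, newer])
out_of_order = apply_field_edits({}, [newer, older])
print(in_order == out_of_order)
```

Because each field keeps its own author and effective time, edits that arrive out of order converge on the same row.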
A tc:row edit containing a tc:deleted element has the following meaning:
tc:field element
The tc:field element represents an update to a single field in a single row at a particular effective time.
- A tc:name attribute whose value is the name of the field.
- A tc:author attribute whose value is an author identifier. This overrides the author in the enclosing tc:edit element's tc:author attribute, for this particular field.
- A tc:effective attribute whose value is a universal timestamp. This overrides the time in the enclosing tc:edit element's tc:effective attribute, for this particular field.
- A tc:comment attribute whose value is a comment by the author, explaining the change to this particular field.
The recommended translation to JSON for some common data types is as follows:
| Original data type | JSON representation |
| --- | --- |
| Null (e.g. None, null, NULL) | JSON null (null) |
| Boolean | JSON boolean (true or false) |
| Integer; Floating-point number | JSON number (note that JSON can express numbers of arbitrary size and precision, and JSON processors should not assume that JSON numbers are representable as IEEE 754 floating-point numbers) |
| String; Text; Binary object | JSON string |
| Date | JSON string in ISO 8601 format ("YYYY-MM-DD") |
| Time | JSON string in universal timestamp format ("YYYY-MM-DDThh:mm:ssZ") |
| Single-valued enumerated type (e.g. enum, MySQL ENUM) | JSON string (use "UPPERCASE_WITH_UNDERSCORES" in the absence of compelling reasons otherwise) |
| Multi-valued enumerated type (e.g. bitmask, flag set, MySQL SET) | JSON array of strings (use "UPPERCASE_WITH_UNDERSCORES" in the absence of compelling reasons otherwise) |
| Geographic point location | JSON array of two or three numbers, either [latitude, longitude] or [latitude, longitude, altitude] in the WGS84 coordinate frame, where latitude is measured in degrees north, longitude in degrees east, and altitude in metres |
| Geographic polyline; Geographic multi-line; Geographic polygon; Geographic multi-polygon | GeoJSON representation |
Note that these JSON representations are not intended to convey the original types of the values, only to be interpretable as the correct values when the appropriate types are known. For flexibility, Tablecast intentionally avoids requiring applications to agree exactly on table schemas. Applications that receive and then republish changes should emit JSON representations using the same structure they received.
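To make the recommended translation concrete, here is a sketch that converts some Python values to JSON text along the lines of the table. The function names to_json_value and encode_field are hypothetical. The sketch assumes that enumeration member names are already in UPPERCASE_WITH_UNDERSCORES form, requires timezone-aware datetimes, and leaves out binary objects and GeoJSON shapes.

```python
import json
from datetime import date, datetime, timezone
from enum import Enum


def to_json_value(value):
    """Map a Python value to something json.dumps can serialize."""
    if isinstance(value, Enum):
        return value.name  # single-valued enumerated type
    if isinstance(value, (set, frozenset)):
        # Multi-valued enumerated type: array of strings, sorted for stable output.
        return sorted(to_json_value(v) for v in value)
    if isinstance(value, datetime):  # must be tested before date
        if value.tzinfo is None:
            raise ValueError("naive datetime has no defined UTC time")
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, date):
        return value.isoformat()
    return value  # None, bool, int, float, str, and lists pass through


def encode_field(value):
    """Return the JSON text for one field value."""
    return json.dumps(to_json_value(value))


class Status(Enum):
    OPEN = 1
    CLOSED_TEMPORARILY = 2


print(encode_field(Status.CLOSED_TEMPORARILY))
print(encode_field(datetime(2010, 6, 29, 15, 27, 39, tzinfo=timezone.utc)))
```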
tc:deleted element
The tc:deleted element indicates that the edit consists of deleting the specified row from the table at a particular effective time.
- A tc:author attribute whose value is an author identifier. This overrides the author in the enclosing tc:edit element's tc:author attribute.
- A tc:effective attribute whose value is a universal timestamp. This overrides the time in the enclosing tc:edit element's tc:effective attribute.
- A tc:comment attribute whose value is a comment by the author, explaining the deletion.
This section defines an application programming interface based on HTTP requests that yield Tablecast feeds. Applications may choose to support additional query parameters beyond those specified here, to filter, search, or otherwise manipulate the returned entries.
A Tablecast service provides two views of a dataset:
Suppose that D is a dataset and u is an HTTP or HTTPS URL. An application is said to provide a Tablecast of D at u if it meets the requirements in both sections below.
Let X(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that excludes the snapshot parameter.
atom:updated time strictly greater than that of all entries in X(t1) and all entries in Y(t1). (See below for the definition of Y.)
GET request for u at time t with these query parameters:
- The min-updated parameter has the value m, where m is a universal timestamp.
- The skip parameter has an integer value k.
- The limit parameter has an integer value n.
atom:updated time.
atom:updated times less than m.
Let Y(t) be the set of all Tablecast entries that would be retrievable at u at a given point in time, t, using any combination of query parameters that includes the snapshot parameter.
atom:updated time strictly greater than that of all entries in Y(t1).
GET request for u at time t with these query parameters:
- The snapshot parameter has the value 1.
- The skip-record parameter has an ASCII string value s.
- The limit parameter has an integer value n.
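A client builds its requests by adding the query parameters named above to u. The sketch below only constructs the request URLs and says nothing about how the service selects or orders the returned entries. The function names changes_url and snapshot_url and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _with_params(u, params):
    """Append the non-None (name, value) pairs to the query string of u."""
    parts = urlsplit(u)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [(name, value) for name, value in params if value is not None]
    return urlunsplit(parts._replace(query=urlencode(query)))


def changes_url(u, min_updated=None, skip=None, limit=None):
    """URL for a request that excludes the snapshot parameter."""
    return _with_params(u, [("min-updated", min_updated), ("skip", skip), ("limit", limit)])


def snapshot_url(u, skip_record=None, limit=None):
    """URL for a request that includes snapshot=1."""
    return _with_params(u, [("snapshot", 1), ("skip-record", skip_record), ("limit", limit)])


print(changes_url("https://example.org/feed", min_updated="2010-07-02T20:11:03Z", limit=50))
print(snapshot_url("https://example.org/feed", limit=50))
```

Percent-encoding is left to urlencode, so colons in timestamps are escaped in the resulting query string.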
Tablecast does not provide a mechanism to assure the secrecy or integrity of content. Applications wanting to protect content against eavesdropping or modification in transit should use a transport layer with encryption and/or content authentication, such as HTTPS, to transmit Tablecast feeds.
Tablecast does not provide a mechanism to verify that a client is authorized to retrieve a Tablecast feed. Applications wanting to protect content from unauthorized clients should use unguessable HTTPS URLs or other authorization mechanisms to protect their feeds.
Tablecast does not provide a mechanism to verify the identity of the publisher. Applications should subscribe to publishers they trust, and can use DNS, DNSSEC, or certificate validation to verify the identity of the publisher.
Tablecast does not provide a mechanism to verify the identity associated with an author identifier. Applications should treat author identifiers in Tablecast feeds as claims made by the publisher, and may choose to reinterpret, selectively process, or ignore edits based on their own policies about publishers and authors. For example, an application may decide that only certain publishers are trusted to speak for certain authors, and accept or ignore incoming edits accordingly. An application may also decide that only certain authors are trusted to edit certain fields, and selectively ignore parts of incoming edits accordingly.
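One way an application might express such policies is as two lookup tables: which authors each publisher is trusted to speak for, and which fields each author is trusted to edit. Everything in this sketch (the function name accepted_fields, the table shapes, and the example identifiers) is hypothetical and only illustrates the kind of filtering described above.

```python
def accepted_fields(publisher, author, fields, trusted_authors, editable_fields):
    """Return the subset of an edit's fields that local policy accepts.

    trusted_authors maps a publisher to the set of authors it may speak for.
    editable_fields maps an author to the set of field names it may edit;
    an author absent from that table is not restricted by field.
    """
    if author not in trusted_authors.get(publisher, set()):
        return {}  # ignore the whole edit
    allowed = editable_fields.get(author)
    if allowed is None:
        return dict(fields)
    return {name: value for name, value in fields.items() if name in allowed}


trusted_authors = {"https://repository2.com/feed": {"mailto:user@mailprovider.org"}}
editable_fields = {"mailto:user@mailprovider.org": {"available_beds"}}
incoming = {"facility_name": "New name", "available_beds": 55}

print(accepted_fields("https://repository2.com/feed", "mailto:user@mailprovider.org",
                      incoming, trusted_authors, editable_fields))
```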
Thanks to Brett Slatkin, Steve Hakusa, Craig Nevill-Manning, and Alon Halevy for their advice and input.
This document is licensed under the GNU Free Documentation License 1.2.