Maintain an FT content dataset

A common use case for the FT API is to keep an up-to-date copy of recent FT content in a data store you manage, so that you can organise, filter or query the content as you wish.

The following steps explain how to use the Notifications and Enriched Content endpoints to achieve this.

Main polling loop

When your service starts up, it should call Notifications to find the most recent updates:

GET https://api.ft.com/content/notifications?apiKey={yourApiKey}&since={timestamp}

To determine the request parameters, we recommend following this process:

  • If you have a links.href value reported from a previous call to Notifications, use that URL directly. It may contain extra parameters such as page; these should not be removed or altered, as they are used to ensure multi-region consistency.
  • If you don’t have a links.href value, you could use the latest publishedDate in your data store as the since value.
  • If your data store is empty, you could use a fixed offset from the current time as the since value. (Avoid using a fixed offset in other situations, as you may miss notifications requiring you to take action.)

In any case, the time you choose should be within 2 days of the current time. If you need to build a mirror which includes older FT content, we can supply a data dump on request.

The {timestamp} must be in ISO 8601 format in UTC, indicated by the Z suffix, for example: 2014-06-23T13:50:00.000Z
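The selection rules and timestamp format above can be sketched in Python as follows. This is a minimal illustration rather than an official client: the function names, the one-hour offset for an empty store and the way the saved links.href is passed in are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode

NOTIFICATIONS_URL = "https://api.ft.com/content/notifications"


def format_since(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2014-06-23T13:50:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def first_request_url(
    api_key: str,
    saved_href: Optional[str] = None,
    latest_published: Optional[datetime] = None,
    now: Optional[datetime] = None,
    empty_store_offset: timedelta = timedelta(hours=1),  # assumed value
) -> str:
    """Pick the URL for the first Notifications call after start-up."""
    # 1. A links.href saved from a previous call is used exactly as given.
    if saved_href:
        return saved_href
    # 2. Otherwise use the latest publishedDate held in the data store.
    # 3. If the store is empty, fall back to a fixed offset from now.
    if latest_published is None:
        now = now or datetime.now(timezone.utc)
        latest_published = now - empty_store_offset
    query = urlencode(
        {"apiKey": api_key, "since": format_since(latest_published)},
        safe=":",  # keep the colons in the timestamp readable
    )
    return f"{NOTIFICATIONS_URL}?{query}"
```

Note that format_since expects a timezone-aware datetime; a naive value would be interpreted as local time by astimezone.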

The response to this call includes a list of notifications and a list of links.

You can either request the content synchronously, or store the notifications and have them picked up by a separate process.

Next, identify the URL you will use to make the next call. To do this, iterate through the objects in links until you find one with a rel value of next. (Currently the links list contains only one type of link, but we recommend that you iterate through the list anyway for future compatibility.) The URL is provided in the href field of this object.
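As a rough sketch, the lookup could be written like this in Python. It assumes the response has already been parsed from JSON into a dictionary whose links entry is a list of objects with rel and href fields, as described above; the function name is illustrative.

```python
from typing import Any, Mapping, Optional


def find_next_href(response: Mapping[str, Any]) -> Optional[str]:
    """Return the href of the first link whose rel is "next", if any."""
    for link in response.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None  # no "next" link was present in this response
```

Iterating rather than indexing the first element keeps the client working if other link types are added later.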

There is a maximum size limit to the notifications list, so if it was not empty, there may be further notifications. Repeat the main loop immediately using the URL you have identified.

If the notifications list was empty, you have caught up with all the pending notifications. Wait for a suitable polling interval, and repeat the main loop. The FT publishes around 1000 updates per day. Most clients use a polling interval of around 5 minutes.
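The loop described above might look something like the following Python sketch. The fetch_json and enqueue callables are hypothetical hooks you would supply (for example, an HTTP GET that returns parsed JSON, and a function that stores a notification for later processing), and the notifications and links field names are taken from the description of the response.

```python
import time
from typing import Any, Callable, Dict


def drain_notifications(
    url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    enqueue: Callable[[Dict[str, Any]], None],
) -> str:
    """Follow "next" links until a page is empty; return the URL to poll next."""
    while True:
        page = fetch_json(url)
        notifications = page.get("notifications", [])
        for notification in notifications:
            enqueue(notification)
        next_url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
        if not notifications:
            # Caught up: keep the "next" URL (or the current one) for later.
            return next_url or url
        if next_url is None:
            raise RuntimeError("non-empty page without a 'next' link")
        url = next_url  # there may be more pending, so repeat immediately


def poll_forever(
    start_url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    enqueue: Callable[[Dict[str, Any]], None],
    interval_seconds: float = 300.0,  # around 5 minutes
) -> None:
    """Run the main loop: drain pending pages, wait, and repeat."""
    url = start_url
    while True:
        url = drain_notifications(url, fetch_json, enqueue)
        time.sleep(interval_seconds)
```

Persisting the URL returned by drain_notifications lets a restarted service resume from the saved links.href value.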

By following this process, you are guaranteed to receive all the notifications, so your mirror of FT content will be up to date. In some situations, you may receive a notification more than once for a single publish event. Since this will not affect the integrity of your mirror, we recommend that you do not attempt to identify these occasions or implement any special processing for them.

Requesting content

Each entry in the notifications list has a type, an id and an apiUrl. Depending on your key and your access to different types of content, it might have a content type as well.

When you want to request content, iterate through the notifications which are waiting to be processed. For each one, look at its type to determine what to do.

If the type is http://www.ft.com/thing/ThingChangeType/UPDATE, call the Content API at the URL given in apiUrl, and use the response to refresh the content in your system. The simplest kind of mirror will just save the response object to a store, to be retrieved and processed later. Your use case will determine whether you do this or perform some kind of processing before storing it.

If the type is http://www.ft.com/thing/ThingChangeType/DELETE, delete the content referenced by the id from your system, if you have saved it, and trigger any other processing required to reflect the deletion in your system.

Depending on your key, you may be able to distinguish between new and updated content. In this case, newly published articles will be of type http://www.ft.com/thing/ThingChangeType/CREATE. If your key permissions do not allow it, both newly created and updated content will be of type http://www.ft.com/thing/ThingChangeType/UPDATE.
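A minimal Python sketch of this dispatch is shown below. The store is assumed to be any dictionary-like object keyed by the notification id, and fetch_content is a hypothetical callable that retrieves the content at a given URL. Treating CREATE the same way as UPDATE is a simplification chosen for the example.

```python
from typing import Any, Callable, Dict, MutableMapping

CHANGE_TYPE = "http://www.ft.com/thing/ThingChangeType/"
CREATE = CHANGE_TYPE + "CREATE"
UPDATE = CHANGE_TYPE + "UPDATE"
DELETE = CHANGE_TYPE + "DELETE"


def process_notification(
    notification: Dict[str, Any],
    fetch_content: Callable[[str], Any],
    store: MutableMapping[str, Any],
) -> str:
    """Apply one notification to the store and report what was done."""
    kind = notification["type"]
    content_id = notification["id"]
    if kind in (CREATE, UPDATE):
        # Insert or overwrite, so repeated notifications are harmless.
        store[content_id] = fetch_content(notification["apiUrl"])
        return "saved"
    if kind == DELETE:
        # Remove the item if we have it; do nothing otherwise.
        store.pop(content_id, None)
        return "deleted"
    return "ignored"  # unknown type: left untouched in this sketch
```

Because saving is an overwrite keyed by id, receiving the same notification twice leaves the store in the same state.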

If you are using any of the metadata annotations provided by the FT, which connect content to concepts such as companies, people, topics and brands, you will want to use the Enriched Content API instead of the Content API. To do this, replace the /content/ part in the apiUrl with /enrichedcontent/ and call the URL obtained in this way.
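The URL rewrite can be sketched in Python as follows. The function name is illustrative, and raising an error when /content/ is absent is a defensive choice made for this example.

```python
def to_enriched_url(api_url: str) -> str:
    """Swap the first /content/ segment of an apiUrl for /enrichedcontent/."""
    if "/content/" not in api_url:
        raise ValueError(f"no /content/ segment in {api_url!r}")
    # count=1 so only the first occurrence is replaced
    return api_url.replace("/content/", "/enrichedcontent/", 1)
```

For example, with a placeholder identifier, to_enriched_url("https://api.ft.com/content/1234") returns "https://api.ft.com/enrichedcontent/1234".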

Although the id and apiUrl look similar, they have different functions. The id identifies the content and will not change. The apiUrl tells you where to find the content in the API, which may change.

When things go wrong

If the Notifications API is unavailable in one region, it will fail over to another region. You may see duplicate notifications as described above, but your client should be otherwise unaffected.

If the Notifications API becomes completely unavailable, you will see an HTTP error response code. We recommend waiting a random amount of time before retrying. Any updates saved in the system during the outage will be returned by your next successful call.
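One possible way to implement the random wait is sketched below in Python. The attempt limit and the 1 to 30 second range are arbitrary values chosen for illustration, and call stands for whatever function performs your request. By default the sketch retries on OSError, which covers the errors raised by the standard library's urllib; pass a different retry_on tuple for other HTTP clients.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_random_retry(
    call: Callable[[], T],
    max_attempts: int = 5,      # assumed limit
    min_wait: float = 1.0,      # assumed lower bound, in seconds
    max_wait: float = 30.0,     # assumed upper bound, in seconds
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, waiting a random time between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # A random wait stops many clients from retrying in lockstep.
            sleep(random.uniform(min_wait, max_wait))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter exists only so that the wait can be replaced in tests.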

Sometimes you may be notified about changes to content that your key is not permitted to retrieve. This most often occurs for content which we have syndicated, so we do not have the rights to distribute it to you. When this happens, you will get a 403 Forbidden HTTP response to your Content or Enriched Content API request. Clients should treat this as expected behaviour and continue processing with the next content item.

If you get a 403 Forbidden HTTP response to your Content or Enriched Content API request, and you already have the content, you should treat this as you would a DELETE for that content item, and remove it from your system.
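The two 403 cases above could be handled along these lines. This Python sketch assumes a dictionary-like store keyed by content id and a caller that has already made the HTTP request; the function name and the decision to raise on other status codes are choices made for the example.

```python
from typing import Any, MutableMapping


def apply_fetch_result(
    content_id: str,
    status: int,
    body: Any,
    store: MutableMapping[str, Any],
) -> str:
    """Update the store from a Content or Enriched Content API response."""
    if status == 200:
        store[content_id] = body
        return "saved"
    if status == 403:
        if content_id in store:
            # We hold content we may no longer retrieve: treat as a DELETE.
            del store[content_id]
            return "deleted"
        # Expected for content the key cannot access: move on to the next item.
        return "skipped"
    raise RuntimeError(f"unexpected status {status} for {content_id}")
```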

Sometimes you may see a notification for content which appears to be unchanged. This most often occurs when the update affects an aspect of the content which your key does not have access to. For example, we have internal fields which drive certain aspects of the FT.com website, which are not made available externally. An update to one of these fields would still result in a notification, because all API clients see the same set of notifications, but a client request for the content would not show the updated field.

Notes

The FT’s publishing platform uses the principle of eventual consistency across regions, so the notifications you see in one region may occur at slightly different times, or in a different order from another region. We attempt to route clients consistently to the same region, so your mirror will be an accurate reflection of the state in that region.

If a piece of content has been updated several times in the time period you request, or it has been updated and later deleted, the Notifications API hides earlier notifications and only returns the latest one, because there is no need to process notifications that have been superseded. To take full advantage of this feature, avoid using a polling interval which is shorter than you need for your use case.

Depending on your key, you may not be able to distinguish new content from updates in the notifications feed. In this case, you should process all updates in the same way, inserting new content if you don’t already have it. To determine whether a content item is new or has been significantly updated, use the publishedDate and firstPublishedDate fields of the content, which are editorially managed.
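One way to use these two fields is sketched below in Python. It rests on an assumption that is not stated above: that a content item whose publishedDate equals its firstPublishedDate has not been republished since it first appeared. Treat the comparison rule as illustrative and check it against the content you actually receive.

```python
from datetime import datetime
from typing import Any, Mapping, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2014-06-23T13:50:00.000Z."""
    # Older Python versions do not accept a trailing Z in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_first_publication(content: Mapping[str, Any]) -> Optional[bool]:
    """Assumed rule: equal dates mean the item has not been republished."""
    first = content.get("firstPublishedDate")
    latest = content.get("publishedDate")
    if first is None or latest is None:
        return None  # cannot tell from this response
    return parse_timestamp(first) == parse_timestamp(latest)
```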

We do not keep a record of which clients have received which notifications. If you make a request which covers a time period that you have already requested, you will receive the same notifications as the previous time (except for updates to the same article, as described above).
