Access Metadata Interface Definition

Metadata Resource

A host that wishes to place any of its content under Remote Auth using Access MUST expose a public resource:

http://host_and_port/__access_metadata

This resource MUST return a document with Content-Type application/json in the format described below. The scheme of the URI on which this resource is exposed MUST be http (not https).

If any of these requirements is not met, Access treats the full URI of each resource on that host and port as its Unique Identifier, and classifies the content as unconditional.
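The requirements above can be sketched as two small checks. This is a minimal illustration only: the class and method names are hypothetical, the path is the /__access_metadata path used in the examples in this document, and requiring a 200 status and omitting port 80 from the URI are assumptions that the text does not state.

```java
import java.net.URI;
import java.util.Locale;

final class MetadataResource {
    private MetadataResource() {}

    /** Builds the metadata URI for a host and port; the scheme is always http. */
    static URI metadataUri(String host, int port) {
        // Assumption: the default port 80 is left out of the authority.
        String authority = (port == 80) ? host : host + ":" + port;
        return URI.create("http://" + authority + "/__access_metadata");
    }

    /** True when the response looks like a usable metadata document. */
    static boolean isUsable(int status, String contentType) {
        // Assumption: only a 200 response is accepted.
        if (status != 200 || contentType == null) {
            return false;
        }
        // Ignore parameters such as "; charset=utf-8" when comparing the media type.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("application/json");
    }
}
```

When isUsable returns false, the caller would fall back to the full URI as the Unique Identifier and to the unconditional classification, as described above.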

Format

The format of the Access Metadata JSON document is as follows:

{
  access_metadata [
      array: 1..n objects describing the access metadata of a path pattern
    {
        object: an object representing a single path pattern
      path_regex
          string: A regular expression, in the Java regex format, for matching against a URI path and query. It may optionally contain a named group called “uid”, in which case the authorisation engine extracts that group and uses it as the Unique Identifier for the resource. The Java regex format is very close to the PCRE format; the differences from Perl 5 are documented in the java.util.regex.Pattern Javadoc.
      resolution_method
          string: Whether the authorisation engine should interrogate the individual path via a HEAD request in order to get resource-specific UID and Content Classification metadata. One of:
            • remote_headers
            • none
          Default “none”.
      classification
          string: The content classification. One of:
            • unconditional
            • conditional_registered
            • conditional_standard
            • conditional_standard_uncounted
            • conditional_premium
            • conditional_premium_uncounted
            • conditional_alphaville_longroom
          Further classifications may be supported in future. Default “unconditional”.
    },
    { … }
        object: repeat for additional path patterns
  ]
}
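As an illustration of the format, the fields and their defaults could be modelled as below. This is a sketch under stated assumptions, not a definitive implementation: all type names are hypothetical, JSON parsing is left out, and treating an unrecognised value the same as an absent one (falling back to the default) is an assumption.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class AccessMetadataModel {
    private AccessMetadataModel() {}

    enum Classification {
        UNCONDITIONAL("unconditional"),
        CONDITIONAL_REGISTERED("conditional_registered"),
        CONDITIONAL_STANDARD("conditional_standard"),
        CONDITIONAL_STANDARD_UNCOUNTED("conditional_standard_uncounted"),
        CONDITIONAL_PREMIUM("conditional_premium"),
        CONDITIONAL_PREMIUM_UNCOUNTED("conditional_premium_uncounted"),
        CONDITIONAL_ALPHAVILLE_LONGROOM("conditional_alphaville_longroom");

        final String wire;

        Classification(String wire) {
            this.wire = wire;
        }

        /** Exact match on the wire value; empty for null or unknown input. */
        static Optional<Classification> parse(String value) {
            for (Classification c : values()) {
                if (c.wire.equals(value)) {
                    return Optional.of(c);
                }
            }
            return Optional.empty();
        }
    }

    enum ResolutionMethod {
        NONE("none"),
        REMOTE_HEADERS("remote_headers");

        final String wire;

        ResolutionMethod(String wire) {
            this.wire = wire;
        }

        static Optional<ResolutionMethod> parse(String value) {
            for (ResolutionMethod m : values()) {
                if (m.wire.equals(value)) {
                    return Optional.of(m);
                }
            }
            return Optional.empty();
        }
    }

    /** One object of the access_metadata array, with the documented defaults applied. */
    static final class PathPattern {
        final Pattern pathRegex;
        final ResolutionMethod resolutionMethod;
        final Classification classification;

        PathPattern(String pathRegex, String resolutionMethod, String classification) {
            this.pathRegex = Pattern.compile(pathRegex);
            // Assumption: absent (null) and unrecognised values both take the default.
            this.resolutionMethod =
                    ResolutionMethod.parse(resolutionMethod).orElse(ResolutionMethod.NONE);
            this.classification =
                    Classification.parse(classification).orElse(Classification.UNCONDITIONAL);
        }
    }
}
```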

Example for www.ft.com:

GET request to http://www.ft.com/__access_metadata

Which may return:

{
    "access_metadata": [
        {
            "path_regex": "(/intl)?/cms/s/3/(?<uid>[a-f0-9-]+).",
            "classification": "conditional_premium"
        },
        {
            "path_regex": "(/intl)?/cms/s/[01]/(?<uid>[a-f0-9-]+).",
            "classification": "conditional_standard"
        },
        {
            "path_regex": "(/intl)?/cms/s/1/(?<uid>[a-f0-9-]+).",
            "classification": "conditional_standard_uncounted"
        },
        {
            "path_regex": "(/intl)?/cms/s/2/(?<uid>[a-f0-9-]+).",
            "classification": "unconditional"
        },
        {
            "path_regex": "(/intl)?/fast[fF]tT?",
            "classification": "conditional_standard"
        },
        {
            "path_regex": ".de_login(\\?.)?",
            "classification": "conditional_premium_uncounted"
        },
        {
            "path_regex": "/presscuttings/tools/.",
            "classification": "conditional_premium"
        },
        {
            "path_regex": "/presscuttings/s/3/.",
            "classification": "conditional_premium"
        }
    ]
}

OR

{
    "access_metadata": [
        {
            "path_regex": "(/intl)?/cms/s/.",
            "resolution_method": "remote_headers"
        }
    ]
}

Example for Blogs:

GET request to http://blogs.ft.com/__access_metadata

Which may return:

{
    "access_metadata": [
        {
            "path_regex": "/the-a-list/.",
            "classification": "conditional_standard"
        },
        {
            "path_regex": ".",
            "classification": "conditional_registered"
        }
    ]
}

OR

{
    "access_metadata": [
        {
            "path_regex": ".",
            "resolution_method": "remote_headers"
        }
    ]
}

Example for FTAlphaVille:

GET request to http://ftalphaville.ft.com/__access_metadata

Which may return:

{
    "access_metadata": [
        {
            "path_regex": ".",
            "classification": "conditional_premium"
        }
    ]
}

OR

{
    "access_metadata": [
        {
            "path_regex": ".",
            "resolution_method": "remote_headers"
        }
    ]
}

Processing Rules

The authorising engine will process the file in the following manner:

Given that the engine needs to authorise the path /docs on host www.myhost.com:

  1. The UID for the resource is initially defaulted to http://www.myhost.com/docs, and the classification to unconditional.
  2. If an error occurs at any point in the steps below, whether I/O related or due to an invalid format, processing ends with the latest values calculated for the resource's UID and classification.
  3. http://www.myhost.com/__access_metadata is downloaded.
  4. The string /docs (the path together with any query string) is compared to the path_regex of each object in the JSON document, strictly in order, until the first match is found. The path regexes are not reordered according to how specific they are: if the first one were “.” then it would always be matched and no others in the file would ever be matched.
  5. If no match is found, the process exits and uses the defaults.
  6. If the object contains a valid classification field, that value is used as the resource’s classification.
  7. If the regex, when applied to the path, matched a group named uid, the value of that group is used as the unique identifier.
  8. If the object has a resolution_method field with the value remote_headers, the authorisation engine makes a HEAD request to http://www.myhost.com/docs (the scheme may be assumed to be http) with the header X-FT-Access-Metadata: remote_headers (see below for the rationale).
  9. If the response contains a header named X-FT-UID, the value of that header is used as the unique identifier.
  10. If the response contains a header named X-FT-Content-Classification with a valid value, that value is used as the resource’s classification.
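The steps above can be sketched as a single function. This is a minimal illustration under several assumptions. AccessResolver, Rule, Result and HeadClient are hypothetical names. The metadata document is taken to be already downloaded and parsed into a list of rules (step 3), with null standing for an absent field. The pattern is applied with an unanchored search (Matcher.find), because the text does not say whether the engine anchors the match. The HeadClient is assumed to return header names exactly as written here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessResolver {
    static final Set<String> VALID = Set.of(
            "unconditional", "conditional_registered", "conditional_standard",
            "conditional_standard_uncounted", "conditional_premium",
            "conditional_premium_uncounted", "conditional_alphaville_longroom");

    /** One object from the access_metadata array; null means "field absent". */
    static final class Rule {
        final Pattern pathRegex;
        final String resolutionMethod;
        final String classification;

        Rule(String pathRegex, String resolutionMethod, String classification) {
            this.pathRegex = Pattern.compile(pathRegex);
            this.resolutionMethod = resolutionMethod;
            this.classification = classification;
        }
    }

    static final class Result {
        final String uid;
        final String classification;

        Result(String uid, String classification) {
            this.uid = uid;
            this.classification = classification;
        }
    }

    /** Hypothetical hook: performs the HEAD request and returns the response headers. */
    interface HeadClient {
        Map<String, String> head(String uri, Map<String, String> requestHeaders) throws Exception;
    }

    static boolean isValid(String classification) {
        return classification != null && VALID.contains(classification);
    }

    /** Value of the named group "uid", or null if the pattern has no such group. */
    static String uidGroup(Matcher m) {
        try {
            return m.group("uid");
        } catch (IllegalArgumentException noSuchGroup) {
            return null;
        }
    }

    static Result resolve(String host, String pathAndQuery, List<Rule> rules, HeadClient client) {
        String uri = "http://" + host + pathAndQuery;
        String uid = uri;                                   // step 1: defaults
        String classification = "unconditional";
        try {
            for (Rule rule : rules) {                       // step 4: strictly in order
                Matcher m = rule.pathRegex.matcher(pathAndQuery);
                if (!m.find()) {                            // assumption: unanchored search
                    continue;
                }
                if (isValid(rule.classification)) {         // step 6
                    classification = rule.classification;
                }
                String group = uidGroup(m);                 // step 7
                if (group != null) {
                    uid = group;
                }
                if ("remote_headers".equals(rule.resolutionMethod)) {   // step 8
                    Map<String, String> headers =
                            client.head(uri, Map.of("X-FT-Access-Metadata", "remote_headers"));
                    String remoteUid = headers.get("X-FT-UID");         // step 9
                    if (remoteUid != null) {
                        uid = remoteUid;
                    }
                    String remoteClass = headers.get("X-FT-Content-Classification"); // step 10
                    if (isValid(remoteClass)) {
                        classification = remoteClass;
                    }
                }
                break;                                      // only the first match is used
            }
        } catch (Exception e) {
            // step 2: on any error, keep the latest values calculated so far
        }
        return new Result(uid, classification);
    }
}
```

If no rule matches, or the rule list is empty because the download failed, the loop leaves the defaults from step 1 untouched, which corresponds to step 5.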

[Figure: flow diagram of the processing rules above]

Per URI Values

If a service wishes to have UIDs or Content Classifications defined on a per URI basis (for instance, in future www.ft.com might want friendly URIs that do not contain semantic information such as the UID and classification number), it may do so by setting the resolution_method field to “remote_headers” for a path pattern. The service then expects the authorisation engine to make a HEAD request to the path with the header X-FT-Access-Metadata: remote_headers, and responds with one or both of the following headers:

X-FT-UID: The unique identifier the service wants the authorisation engine to use for this item of content.

X-FT-Content-Classification: One of the valid content classification values as defined above.

There is a potential issue here with an endless loop: because the resource is protected, the HEAD request itself routes into Access, which makes the HEAD request again, and so on indefinitely. This is avoided by a rule in Access that HEAD requests carrying the header X-FT-Access-Metadata: remote_headers are always granted.
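The loop-avoidance rule can be illustrated with a small predicate. This is only a sketch: the class and method names are hypothetical, and the request is reduced to a method string and a header map. Header names are compared case-insensitively, as HTTP requires.

```java
import java.util.Map;

final class MetadataProbeRule {
    private MetadataProbeRule() {}

    /** True for HEAD requests carrying X-FT-Access-Metadata: remote_headers. */
    static boolean isAlwaysGranted(String method, Map<String, String> headers) {
        if (!"HEAD".equals(method)) {
            return false;
        }
        for (Map.Entry<String, String> header : headers.entrySet()) {
            if (header.getKey().equalsIgnoreCase("X-FT-Access-Metadata")
                    && "remote_headers".equals(header.getValue())) {
                return true;
            }
        }
        return false;
    }
}
```

Access would evaluate such a check before any classification lookup, so that the metadata probe never triggers a further probe.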

Caching Considerations

There SHOULD be a standards-compliant HTTP/1.1 cache on the authorisation engine side, which SHOULD work on a fail-stale basis: if a refresh fails, the last cached response continues to be used.

The authorisation engine MUST NOT disobey the semantics of the HTTP cache headers on this resource.

The __access_metadata resource SHOULD return appropriate Cache-Control or Expires headers. The longer the resource can be cached, the less often the authorisation engine will respond slowly while it refetches the resource, though this must be balanced against any need to be responsive to business changes. In general it is hoped that these lifetimes can be quite long, of the order of 24 hours or so.

If it makes sense to do so, responses to HEAD requests for protected resources with an X-FT-Access-Metadata header MAY carry different Cache-Control / Expires headers from those returned for a normal HEAD request, by including a Vary: X-FT-Access-Metadata header in the response. The longer the authorisation engine is able to cache the results of these HEAD requests, the more efficient it will be at making authorisation decisions, so it is desirable to make them cacheable for as long as possible. Business requirements for responsiveness when a resource’s classification changes will need to be considered.
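The fail-stale behaviour can be sketched as follows. This is a deliberately simplified illustration and not a standards-compliant HTTP cache: the class name is hypothetical, a single fixed time-to-live stands in for the lifetime that a real cache would derive from the Cache-Control or Expires headers, and the clock is injected so that the behaviour can be exercised without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class FailStaleCache<V> {
    private static final class Entry<V> {
        final V value;
        final long freshUntilMillis;

        Entry(V value, long freshUntilMillis) {
            this.value = value;
            this.freshUntilMillis = freshUntilMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock;
    private final long ttlMillis;

    FailStaleCache(LongSupplier clock, long ttlMillis) {
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Returns a fresh cached value, else reloads; on reload failure serves the stale value. */
    V get(String key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.freshUntilMillis) {
            return cached.value;                 // still fresh
        }
        try {
            V fresh = loader.get();
            entries.put(key, new Entry<>(fresh, now + ttlMillis));
            return fresh;
        } catch (RuntimeException e) {
            if (cached != null) {
                return cached.value;             // fail stale
            }
            throw e;                             // nothing cached to fall back on
        }
    }
}
```

Keyed by URI, the same structure could hold both the __access_metadata document and the results of the per-resource HEAD requests.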

 

Future Improvements

There is scope to evolve this contract in future. Two possibilities have been suggested:

Content Metadata

It may be useful to know what metadata is associated with an item of content. A likely use case is that products are developed that allow customers access to content based on metadata - e.g. the famous “Oil and Gas” theoretical product. This could be added as a new field in the path object:

{
  access_metadata [
      array: 1..n objects describing the access metadata of a path pattern
    {
        object: an object representing a single path pattern
      …
      metadata: [“tag1”]
          array of strings: A set of tags indicating metadata values associated with all the content that matches this pattern.
    },
    { … }
        object: repeat for additional path patterns
  ]
}

In the (likely) event that metadata is specific to a unit of content rather than to a path pattern, the resolution_method field could be set to remote_headers for a path pattern, and a header returned in the response to the resulting HEAD request as follows: X-FT-Metadata: <metadata tag names, delimited by a comma and a space>
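Should the proposed X-FT-Metadata header be adopted, reading it could look like the sketch below. The class and method names are hypothetical, and tolerating missing or extra spaces around the commas is an assumption; the proposal only describes a comma followed by a space.

```java
import java.util.ArrayList;
import java.util.List;

final class MetadataHeader {
    private MetadataHeader() {}

    /** Splits an X-FT-Metadata header value into tag names; null yields an empty list. */
    static List<String> parseTags(String headerValue) {
        List<String> tags = new ArrayList<>();
        if (headerValue == null) {
            return tags;
        }
        for (String part : headerValue.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }
}
```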

Custom Deny Locations

At present, when access to a resource is denied, the response is a 302 with a Location header in the following format:

http://registration.ft.com/registration/barrier?location=&referer=

This may be insufficiently flexible, and it may be desirable to allow a new field in the path object:

{
  access_metadata [
      array: 1..n objects describing the access metadata of a path pattern
    {
        object: an object representing a single path pattern
      …
      deny_redirect
          string: A URI with interpolated dynamic values uri and referer, to be extracted from the original authorisation request, with syntax {{variable_name}}.
          Default: http://registration.ft.com/registration/barrier?location={{uri}}&referer={{referer}}
    },
    { … }
        object: repeat for additional path patterns
  ]
}
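The interpolation of a deny_redirect template could be sketched as below. This illustrates the proposal only: the class and method names are hypothetical, and form-style percent-encoding of the substituted values (via URLEncoder, Java 10 or later for this overload) is an assumption, since the proposal does not say how the values are escaped.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DenyRedirect {
    static final String DEFAULT_TEMPLATE =
            "http://registration.ft.com/registration/barrier?location={{uri}}&referer={{referer}}";

    private DenyRedirect() {}

    /** Expands {{uri}} and {{referer}}; a null template selects the default. */
    static String expand(String template, String uri, String referer) {
        String t = (template == null) ? DEFAULT_TEMPLATE : template;
        return t.replace("{{uri}}", encode(uri))
                .replace("{{referer}}", encode(referer));
    }

    // Assumption: values are form-encoded; a null value becomes the empty string.
    private static String encode(String value) {
        return value == null ? "" : URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Because encoded values can never contain a literal "{{", substituting the two variables one after the other cannot cause one value to be expanded inside the other.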