Why is there such a perceived problem with metadata? Given that most organisations agree that metadata solves many problems, why isn’t it used widely? Why is metadata perceived to be ‘difficult’ or ‘too complex’?
In a previous post I wrote about the critical nature of metadata – https://vinesolutions.co.uk/2019/09/24/meta-data-and-prorogation/ . Here I extend the discussion, arguing that ‘it isn’t really difficult’ if one focuses on a specific problem!
Why Do Relying Parties Need Attributes?
Usually a given business process is interested in verified attributes to determine eligibility (e.g. paying a government benefit), to reduce risk (e.g. making a loan), or to comply with regulation (e.g. seeking parental approval for a child to use a web site). Attribute consumers don’t care about the operational process used to determine that an attribute is assured; they do care that the attribute is assured ‘well enough’, i.e. to an agreed standard, for their business process to rely upon. They also need associated metadata so their process knows the properties of the attributes: syntax, semantics, timeliness, assurance and provenance.
Build a metadata standard focused on your eco-system
It is feasible to build a useful metadata model for many common personal data attributes such as name, date of birth, address, spouse/civil partner, children, assets held, disability, etc. A particular eco-system of parties who need such attributes can simply define their metadata, stating semantics, provenance, timeliness, and verification/assurance levels.
That is, they agree a trust framework between the parties in the eco-system. Why would this not be relatively easy for a closed or controlled eco-system such as banks and KYC/AML attributes, or a government for its common public services? The eco-system can determine its own standards, quickly and incrementally, and start with the key attributes with most value or most associated risk.
Most common attributes have very obvious mechanisms of assurance – it is just that no one has classified them and standardised the resulting ‘metadata’.
Valuable attributes are either observable by behaviour or correlation (like person-address), or are the designation of an authority which is often an agency of the state (e.g. DOB, spouse, child of, disability status), or are aggregate properties of such (e.g. sum over all authoritative attributes owned by person).
Attribute Stores should be controlled by the data owner, not Data Aggregators
Mechanisms of triangulation or aggregation are particularly amenable to personally controlled attribute stores which keep verified attributes and thereby avoid any external agency or organisation owning ‘big data’ about the data subject.
Such ‘customer led assurance’ enables the user to marshal many separate attributes from trusted sources and so derive new trusted attributes. The store (as an ‘assurance providing’ capability) can derive triangulation assurance if given many origin sources of such assurance which are attested by their issuing organisations.
> all my bank and savings accounts lead to ‘declared financial assets’ (unless I am deliberately excluding some)
> all my utility and bank accounts derive ‘trusted address for financial transactions’
> my council tax bills lead to ‘address known to councils for taxation purposes’, and ‘customer of local authority’
> my children’s or ward’s school reports lead to ‘has provable interest in caring for a child’.
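As a rough illustration of how a personal attribute store might derive one of these attributes, the sketch below counts how many distinct issuers attest the same address within an agreed period. The class and function names, and the rule of counting each issuer once, are assumptions made for this example; they are not part of any existing standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttestedAddress:
    """One issuer's attestation that the person is known at an address."""
    issuer: str        # hypothetical issuer name, e.g. a bank or utility
    address: str       # address string, assumed already normalised
    attested_on: date  # when the issuer attested it


def triangulate_address(attestations, period_start, period_end):
    """Return (address, n): the address attested by the most distinct
    issuers within the agreed period, and how many issuers back it.
    Returns (None, 0) when nothing falls inside the period."""
    issuers_by_address = {}
    for a in attestations:
        if period_start <= a.attested_on <= period_end:
            # A set means repeated attestations from one issuer count once.
            issuers_by_address.setdefault(a.address, set()).add(a.issuer)
    if not issuers_by_address:
        return None, 0
    address, issuers = max(issuers_by_address.items(),
                           key=lambda item: len(item[1]))
    return address, len(issuers)
```

The store would publish only the derived attribute and the count of sources, not the underlying bank or utility records, which is the point of customer led assurance.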
Towards a metadata model
The actual metadata model for most common attributes is not large and need not be very complicated. We are interested in standards for a specific eco-system covering syntax, semantics, timeliness, assurance and provenance. These need to be codified for each attribute, by whatever mechanism the eco-system chooses (e.g. OWL/RDF, transforms such as JSON, query languages such as SPARQL, other semantic web technologies).
There is no need to propagate the process or sequence of events by which the assurance was derived. It is the result, the outcome, which matters: an assured attribute and its associated metadata, which is valuable to citizens in obtaining services, to relying parties (RPs) in managing risk, and to those who wish to ‘monetise’ such data. Representing process is an unnecessary complexity if the focus is on attributes which are assured to our eco-system’s agreed standards.
Below is an initial example of the metadata for a residential address. All of the key characteristics of metadata are illustrated. A varying ‘level of assurance’ is suggested. A relying party, presented with metadata according to this scheme, could make its own risk-based decision on providing service to the data owner.
An example: ‘verified address’
We mean by this ‘assuring the address is the <residential> address of <person>’. (We might have other processes for non-residential addresses, holiday homes, temporary residence with others, public buildings etc.)
So, we can begin the definition of metadata for residential address of a person:
> syntax: PAF assured Postcode and dwelling number, or UPRN as stated by OS database
> semantics: place of <residence> of <person> (residence meaning …. e.g. spends overnights at this property more than 60% of the time, or returns to overnight there with gaps of not more than 30 days, or whatever our eco-system needs)
> timeliness: time of assurance by <provenance>, and assured valid <from> <to>, …
> provenance: <attribute provider>, <assurance organisation e.g. gov, e.g. bank>
> assurance: <mechanism of assurance of address> – see below.
Where the mechanism of assurance of address is one or more of:
> – Self-asserted: the user simply states her address (i.e. no assurance)
> – PIN-in-post: send a letter with a one-time personal identification number to <person> at <residential address> and get a response code via web account
> – Delivery-success: send parcels <over time> to <residential address> and get no returns, and the sender receives payment from <person>
> – Triangulate <n>: find <person> <address> across multiple <n> sources, including consideration of timing over a commonly agreed period <from> <to> (‘found in electoral roll at this address’ is an example of triangulate<1>)
> – Geolocation-overnight: <person> discloses geo-location history to <assurance organisation> (Or better, the person discloses such data to a trusted software agent which processes this raw data to create a derived attribute ‘resident of … <from> <to>’ with the geolocation assurance type.)
> – Independent-inspection: inspection of <person> at <property> overnight for ‘several nights’ by <assurance organisation> (This is clearly extreme and potentially privacy invading, but is used as an example of very high assurance, even if this will not be needed in a real eco-system.)
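To make the scheme above concrete, here is a minimal sketch of how the ‘verified address’ metadata might be represented, and how a relying party might apply its own risk-based rule to it. The field names, the `Mechanism` identifiers and the `acceptable` function are illustrative assumptions, not an agreed standard; a real eco-system would choose its own encoding.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Mechanism(Enum):
    """Assurance mechanisms from the list above (identifiers are assumed)."""
    SELF_ASSERTED = "self-asserted"
    PIN_IN_POST = "pin-in-post"
    DELIVERY_SUCCESS = "delivery-success"
    TRIANGULATE = "triangulate"
    GEOLOCATION_OVERNIGHT = "geolocation-overnight"
    INDEPENDENT_INSPECTION = "independent-inspection"


@dataclass(frozen=True)
class VerifiedAddress:
    # syntax: postcode and dwelling number (a UPRN would be an alternative)
    postcode: str
    dwelling_number: str
    # semantics: the eco-system's agreed definition of 'residence'
    semantics: str
    # timeliness: when assured, and the period the assurance covers
    assured_on: date
    valid_from: date
    valid_to: date
    # provenance
    attribute_provider: str
    assurance_organisation: str
    # assurance: one or more mechanisms
    mechanisms: frozenset


def acceptable(attribute, accepted_mechanisms, on_date):
    """A relying party's rule: the attribute must be in date and assured
    by at least one mechanism that this relying party accepts."""
    in_date = attribute.valid_from <= on_date <= attribute.valid_to
    return in_date and bool(attribute.mechanisms & accepted_mechanisms)
```

Each relying party passes its own set of accepted mechanisms, so a low-risk service might accept PIN-in-post while a lender insists on triangulation, without the scheme having to rank the mechanisms globally.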
Thank you for reading.
If you or your organisation would like to discuss creating a metadata standard for a business process or an eco-system, please feel free to contact us!
Image by InsightPhotography on pixabay https://pixabay.com/users/InsightPhotography-3337557