Metadata design and evolution

I’d love to see (and would be happy to contribute to) something about concept design and metadata. The need for clear delineation of what a given concept is (and is not!) is parallel to how we delineate the metadata associated with assets of any kind.

I work in metadata for news, in all formats (text, video, photos, audio, etc.). As I have read your work, I have come to understand that the challenge of aligning metadata across media types is rooted in the challenge of aligning concepts in those different spaces.

Hi Annette – Welcome to the forum! This sounds fascinating. Can you give an example for us to think about?

Sure. To explain where I am coming from, I see an intersection of concepts and metadata. As I understand your writing, concepts are implemented via actions, and those actions (in code) create values, which might be used immediately or might be stored for later use. The fields and values that are preserved are codified and made available to downstream systems in metadata schemas (if you disagree, or if this needs clarification, please say so!).

Metadata schemas might be purpose-built one-offs, or might follow (rigidly or loosely) a community or industry standard. I work in the News industry, with standard schemas as well as ad hoc data structures that feed into standard schemas.

In general, in my work, I find that people are loath to change schema standards, and sometimes err on the side of stretching standard elements beyond the scope of their intended concept, to avoid excessive churn for those who want to stay current. This is one place trouble can start.

A problem arises when standard metadata elements start off as a way to capture information related to one concept and, over time and despite the best of intentions, evolve to support related or adjoining concepts that do not quite fit. Every day we work with metadata that, as a result of this evolution, is no longer fit for purpose.

While the misalignment of concept and action can result in a bad user experience, codifying that misalignment in the metadata ensures that the bad result hangs around and continues to affect every downstream system that has to work with it.

I could go on at length about metadata standards, but will try to keep it brief, and grounded in examples.

In the past, metadata standards for News in different media types evolved with those media technologies – metadata for photos, metadata for video, metadata for a written text story all had their own standards. And that worked fine, because News content was licensed by providers and bought by different clients in different businesses, each media type in its own system. As the news business has become less siloed and more integrated, the demand is for multimedia content: not “photos of the royal wedding” and “video of the royal wedding”, but “news of the royal wedding”. The driving preference is to be able to offer text stories, photos, and videos all on one platform, necessitating a standard, media-type-agnostic metadata schema. At the same time, on the content creation side, photo ‘still’ cameras have evolved to shoot video and digital video cameras easily produce ‘stills’, which are used in place of traditional photos. Combining assets created from these different media tools is made easier when the metadata aligns. So, from both a production side and a consumption side, there has been interest in merging metadata schema standards, where possible.

This applies for metadata embedded within digital assets (like IPTC metadata in .jpgs) and also for metadata schemas that are media type agnostic, which allow for federated search across different content sets and integrated displays of news items in any media format.

So, a few examples of where trying to have standard metadata across media types and across supply chains runs into problems…

Your example of a clash of concepts around social media ratings put me in mind of other ratings issues. An IPTC image rating, which was initially meant to be 0-5 stars based on the opinion of a photo creator, supplier or user (already a clash of concepts), is now at risk of being expanded to include video ratings. Those can encompass multiple star ratings from different opinion holders, including editorial users and random users of websites who want to highlight or recommend their preferred images, and may also need to carry a text value representing an assessment by a national board (like the MPAA in the US) that designates a commercial video’s appropriateness for an audience. Too many things called Ratings all vie for the same metadata slots.
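To make the clash concrete, here is a minimal sketch of what it could look like to give each kind of rating its own slot instead of sharing one. All class and field names below are hypothetical and are not taken from IPTC or any other standard; the "PG-13" label and the rater roles are only illustrative values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RaterRole(Enum):
    """Who holds the opinion behind a star rating (hypothetical roles)."""
    CREATOR = "creator"
    SUPPLIER = "supplier"
    EDITOR = "editor"
    AUDIENCE = "audience"


@dataclass(frozen=True)
class StarRating:
    """An opinion: 0 to 5 stars from one identified kind of rater."""
    stars: int
    rater_role: RaterRole

    def __post_init__(self) -> None:
        if not 0 <= self.stars <= 5:
            raise ValueError("stars must be between 0 and 5")


@dataclass(frozen=True)
class AudienceClassification:
    """An assessment by a national board, carried as a text label."""
    board: str
    label: str


@dataclass
class AssetRatings:
    """Keeps opinions and board classifications in separate slots."""
    star_ratings: List[StarRating]
    classification: Optional[AudienceClassification] = None

    def average_stars(self, role: RaterRole) -> Optional[float]:
        """Average the stars given by one role, or None if there are none."""
        values = [r.stars for r in self.star_ratings if r.rater_role is role]
        return sum(values) / len(values) if values else None


ratings = AssetRatings(
    star_ratings=[
        StarRating(4, RaterRole.EDITOR),
        StarRating(5, RaterRole.EDITOR),
        StarRating(2, RaterRole.AUDIENCE),
    ],
    classification=AudienceClassification(board="MPAA", label="PG-13"),
)
print(ratings.average_stars(RaterRole.EDITOR))  # 4.5
```

The point of the separation is that a downstream system never has to guess whether a value in a "rating" field is an editor's opinion, an audience vote, or a board's assessment.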

Another example – identifying Creator and Source, which are key pieces of News metadata and are difficult to use in multimedia content, because they hold different meanings when dealing with different media types and business arrangements. For example, in common parlance, a source is someone that a text journalist gets information from and may or may not be mentioned in the resulting article. But the Creator and Source of the article noted in metadata are the organization the journalist works for, not the person who provides information. However, a photo Creator or Source may be a photographer or the organization that employs him, if it is a work for hire, depending on where in the world he works. A photo editor (for a reputable news organization) will not make substantive changes to the photo, so will not be considered a Creator or Source. A videographer or his organization may be a Creator or Source, and a video editor who compiles multiple clips into a finished product may also be a Creator, though not likely a Source, again, depending on where in the world the work is done and local laws. In all of these cases, the people and organizations who create and provide content hold certain rights to the content, which is important to understand when licenses are granted for reuse. They all carry the same metadata labels, but the concepts behind those labels overlap without being identical. We have substantial work ahead of us to sort out the concepts in play here and make them identifiable, especially in a consistent, machine-consumable way.

In summary, for historical reasons, we find ourselves in a tough situation, where converging digital content is being described by a standard set of metadata that does not map cleanly onto current needs. Changing those definitions or expanding the set of elements used will be a challenging process.

I am pretty sure this is not what you have in mind as an aspect of concept design, but I did see in your ideas some useful parallels to the alignment work that I have in front of me. I see the principles you have set out for identifying concepts and the discipline that you encourage for limiting actions to the scope of a single concept as a useful guide to the development of new and well-defined metadata. I’d like to further my understanding of how to first tease out the critical components of concepts, so that we can then create the schemas necessary to cleanly support the asset metadata.

This is a really important topic, and your post is a great start to a discussion, @annette. So I hope you won’t mind that I moved it to a topic of its own.

I see very strong connections between metadata design and concept design: the issues you raise are very much what I have in mind. I think the following ideas from concept design might be useful in metadata design.

  1. The metadata of a single item is usually divided into groups of fields. For a photo, for example, there’s the EXIF data that contains technical info about exposure etc, the IPTC data that contains captions, etc., and more. Classes may overlap, so a photo might also be an insured asset, say. Concept design is all about finding appropriate granularity for the state of a system, and so it might help us think about natural groupings of metadata, not necessarily confined to the groupings that represent different standards.

  2. The notion of granular purpose in concept design can be applied to help understand what a metadata field or group is for, and conflicts of purpose (overloading, in concept terms) are likely to be problematic. @annette gives several examples of that.

  3. A key difference between concept design and the approaches typically taken in the worlds of ontology and conceptual modeling is that concept design looks to the actions of a concept to clarify the role that the state plays, rather than giving static definitions to state components. It might be useful to apply this to metadata and ask what actions motivate the fields and have to be supported. One simple example: you might think that the capture time of a photo in EXIF is immutable, but it’s very common to want to change it, for example, to correct cases when a camera’s clock was set wrong (or to account for daylight saving time). A more complex example: I shot a wedding with two cameras and then discovered their clocks were off by a minute or two. It was a huge pain to get the images in order, and it would have been really helpful if my software had allowed me to apply a fixed translation to the times of a set of images.

  4. The operational principle may be a helpful way to understand the essential role of some metadata. @annette’s example of the MPAA giving a rating is a good one, because it suggests that software is likely to have various predicates that are applied (e.g., to determine if a movie can be shown by an online provider to a minor).
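The "fixed translation" action from point 3 is simple to sketch. This is only an illustration, assuming capture times are held as strings in the EXIF date-time layout (`YYYY:MM:DD HH:MM:SS`); the function name, the file names, and the 95-second offset are all made up for the example, and nothing here reads or writes real image files.

```python
from datetime import datetime, timedelta
from typing import Dict

# EXIF stores date-times as "YYYY:MM:DD HH:MM:SS".
EXIF_TIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def shift_capture_times(
    capture_times: Dict[str, str], offset: timedelta
) -> Dict[str, str]:
    """Return a new mapping with every capture time moved by the same offset."""
    shifted = {}
    for filename, raw in capture_times.items():
        original = datetime.strptime(raw, EXIF_TIME_FORMAT)
        shifted[filename] = (original + offset).strftime(EXIF_TIME_FORMAT)
    return shifted


# Hypothetical data: the second camera's clock ran 95 seconds fast.
camera_b = {
    "B_0001.jpg": "2021:06:12 14:59:30",
    "B_0002.jpg": "2021:06:12 15:00:10",
}
corrected = shift_capture_times(camera_b, timedelta(seconds=-95))
print(corrected["B_0001.jpg"])  # 2021:06:12 14:57:55
print(corrected["B_0002.jpg"])  # 2021:06:12 14:58:35
```

Treating the correction as an explicit action on a set of images, rather than as an edit to a supposedly immutable field, is what makes the role of the capture time clear.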

I have some other examples from my metadata experiences with Lightroom which I’ll try and dig up and add to this thread later.

In a world where digital assets are routinely exchanged across organizations, standards are a double-edged sword.

I can add, re: your #3, that one challenge of working with standards is exactly what you suggest. It is indeed helpful to think about the role that the state plays, but different organizations have different workflows, i.e., the same state may play a different role in a different organization. Identifying what are essentially universal roles (maybe via ubiquitous workflows?) is a way to start. But another challenge of metadata standards, at least those that are freely available and widely used, is that compliance with the intended use varies widely.

One user’s plus is another user’s minus. Your example shows that the ability to adjust or correct a time value is needed, for sure. To bring another example from news: some users want to require inclusion of the photographer’s identity and location, taken from a registered serial number and a GPS-aware camera, in standard metadata to support IP claims. For news photographers who may sometimes capture images in places where they must not be identified for safety’s sake, this is a non-starter. This different role for the same information changes the set of actions that must be allowed when creating or editing images.
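As a rough sketch of how that difference in role turns into a difference in allowed actions, the snippet below treats removal of identifying fields as an explicit, optional step. The field names and the `protect_identity` flag are hypothetical and do not correspond to any particular standard's element names.

```python
from typing import Dict, Set

# Hypothetical field names for information that can identify a photographer.
IDENTIFYING_FIELDS: Set[str] = {
    "creator_name",
    "camera_serial_number",
    "gps_latitude",
    "gps_longitude",
}


def redact_for_safety(
    metadata: Dict[str, str], protect_identity: bool
) -> Dict[str, str]:
    """Return a copy of the metadata, dropping identifying fields on request."""
    if not protect_identity:
        return dict(metadata)
    return {
        key: value
        for key, value in metadata.items()
        if key not in IDENTIFYING_FIELDS
    }


photo = {
    "headline": "Protest in the city centre",
    "creator_name": "A. Photographer",
    "camera_serial_number": "SN-000000",
    "gps_latitude": "0.0000",
    "gps_longitude": "0.0000",
}
safe_copy = redact_for_safety(photo, protect_identity=True)
print(sorted(safe_copy))  # ['headline']
```

A schema that makes these fields mandatory rules this action out, which is exactly the conflict between the IP-claim workflow and the safety workflow.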
