Hi Rob
So here is the thing. xmp is incredibly complicated, and the rules are not at all clear.
I had to figure out how to write xmp data to files, so that it could be read by Lightroom, so I had to understand some of this.
The first thing to realize is that the rules for which kinds of files require "xmp sidecar" files are defined in an ad hoc fashion, and there don't appear to be any publicly defined rules about them. Adobe has adopted the strategy of always using sidecar files for proprietary raw formats, but never using them (and never consulting them) for formats like jpg, psd, dng, tif, etc. This is not a requirement of the spec (which Adobe wrote), but only Adobe's particular decision. Adobe says (in "XMP Specification"):
It is recommended that XMP metadata be embedded in the file that the metadata describes. There are cases where this is not appropriate or possible, such as database storage models, extremes of file size, or due to format and access issues. Small content intended to be frequently transmitted over the Internet might not tolerate the overhead of embedded metadata. Archival systems for video and audio might not have any means to represent the metadata. Some high-end digital cameras have a proprietary, non-extensible file format for “raw” image data and typically store EXIF metadata as a separate file.
If metadata is stored separately from content, there is a risk that the metadata can be lost. The question arises of how to associate the metadata with the file containing the content. Applications should:
● Write the external file as a complete well-formed XML document, including the leading XML declaration.
● The file extension should be .xmp. For Mac OS, optionally set the file’s type to 'TEXT'.
● If a MIME type is needed, use application/rdf+xml.
● Write external metadata as though it were embedded and then had the XMP Packets extracted and catenated by a postprocessor.
● If possible, place the values of the xmpMM:DocumentID, xmpMM:InstanceID, or other appropriate properties within the file the XMP describes, so that format-aware applications can make sure they have the right metadata.
For applications that need to find external XMP files, look in the same directory for a file with the same name as the main document but with an .xmp extension. (This is called a sidecar XMP file.)
If you read this carefully, they are only giving guidelines about whether to use sidecar files, and suggesting that in the absence of a good reason, you shouldn't use them. Yet, in practice, Adobe seems to always use xmp sidecars for every file format other than the "open" formats like jpg, tif, dng, psd, gif, png, etc.
Once you have a sidecar file, this means that the same field, like xmp-dc:title, can exist in both the raw file and the sidecar, and the field can have different values in the two files. In fact, one of my cameras (I can't remember which right now) put the camera model into the title field, which was stupid, but there it is. So when I give a "title" to a photo, and Lightroom writes it to the xmp sidecar, there is a guaranteed conflict between the two values.
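To make the conflict concrete, here is a minimal Python sketch. It assumes the metadata from the raw file and from the sidecar has already been read into plain dictionaries (how you read them is up to you); the function name and the sample values are hypothetical, not taken from any real camera or tool.

```python
from typing import Dict, Tuple


def find_conflicts(
    embedded: Dict[str, str], sidecar: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return fields present in both sources whose values differ.

    The result maps field name -> (embedded value, sidecar value).
    """
    shared = embedded.keys() & sidecar.keys()
    return {
        name: (embedded[name], sidecar[name])
        for name in shared
        if embedded[name] != sidecar[name]
    }


# Hypothetical values: the camera wrote its model into the title,
# and the user later typed a real title that went to the sidecar.
raw_fields = {"title": "CAMERA MODEL", "creator": "Alan"}
sidecar_fields = {"title": "Sunset", "creator": "Alan"}
conflicts = find_conflicts(raw_fields, sidecar_fields)
```

Which of the two values should win is exactly the part that does not seem to be written down anywhere, so the sketch only reports the disagreement and does not try to resolve it.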
Now we get the other complication that various xmp fields are supposed to have the same meaning. As far as I can tell, XMPxmpRights_Marked and Photoshop_CopyrightFlag are the same field under different names, with potentially different values. Adobe applications have a lot of smarts to try to keep these various fields which mean "the same thing" in sync. See the Adobe document "IPTC Standard Photo Metadata (July 2010)", which says:
The same information can appear multiple times within Adobe Photoshop's Custom panels/tabs. The data is not duplicated. It is stored only once, and all the panels, tabs or schemas that read or write to that field use it as a “shared property.” Some IPTC Core properties already appear as part of Adobe’s Description, Origin and Categories in the File Info panels as well as the Adobe Photoshop File Browser’s Metadata panel. As an example, enter the name “John Doe” in the “Creator” field of the "IPTC Contact" section of the IPTC tab, then switch to the Adobe Photoshop “Description” tab — notice that the name “John Doe” automatically appears in the "Author" field in that panel. Change that “Author” entry to “Jane Doe” and it will appear in the “Creator” field in the IPTC Contact section using the new name. Both tabs simply provide two different views of the same metadata.
When image files are opened and saved by Adobe Photoshop (version 7.01 or higher), the “IPTC Fields” stored within those files are synchronized with the stored XMP metadata. To maintain compatibility with older versions of Adobe Photoshop, no pre-existing mappings have changed.
Workflow interoperability is another reason why some metadata appears shared. For example, many types of documents need a “Title” (technical specifications, tax forms, blueprints), but only news items need an IPTC Subject Code. If another metadata standard has already defined a useful property, such as the PLUS schema (http://www.useplus.org/), it is adopted in one of the IPTC schemas. In many cases (title and keywords are two examples) this mapping was already established by Adobe Photoshop’s mapping of binary IPTC IIM metadata to XMP. Likewise, metadata entered using the IPTC Core panels will continue to appear in other locations. The IPTC Core, IPTC Extension, and PLUS fields that are shared are noted in the descriptions of each field that follow.
What this means (as far as I can tell) is that the "same" information can be (a) in one of two different files, and (b) in any of many different xmp fields. Adobe attempts to keep all this in sync, or if it can't, it figures out which is the "correct" one using various heuristics.
Now, exiftool is not an Adobe product, and it has different rules. As I found out through long experimentation (this doesn't seem to be written down anywhere, but maybe I just didn't find it), exiftool tries hard to unify the various "names" for the "same field". You can ask for the -title field, or you can ask for the -xmp-dc:title field. If the former, exiftool tries to report the value after consulting various xmp fields all of which might be the "title" field. If the latter, it just looks at the one variant of the title field. And if you set the value (using "-title=NewValue"), then it tries to keep the various "title" fields in sync, while if you set the title using "-xmp-dc:title=NewValue", it just sets that Dublin Core xmp field.
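The two ways of addressing the title can be put side by side. The following is a small Python sketch that only builds the exiftool command lines described above; the helper name `exiftool_title_args` and the file name are made up for illustration, and the sketch does not run exiftool itself.

```python
from typing import List, Optional


def exiftool_title_args(
    path: str,
    new_value: Optional[str] = None,
    dublin_core_only: bool = False,
) -> List[str]:
    """Build an exiftool command line that reads or writes the title.

    dublin_core_only=False uses the unqualified -title tag name.
    dublin_core_only=True addresses only the Dublin Core field, -xmp-dc:title.
    With new_value set, the tag becomes an assignment (TAG=VALUE), i.e. a write.
    """
    tag = "-xmp-dc:title" if dublin_core_only else "-title"
    if new_value is not None:
        tag = f"{tag}={new_value}"
    return ["exiftool", tag, path]


# Read via the unqualified name, then write only the Dublin Core field.
read_cmd = exiftool_title_args("photo.cr2")
write_cmd = exiftool_title_args("photo.cr2", "NewValue", dublin_core_only=True)
```

If exiftool is installed, either list can be handed to `subprocess.run(...)`. What exiftool then does with the unqualified name is as described above, which is my understanding from experimentation rather than something I found documented.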
However, the one thing exiftool does not do (which Lightroom does) is to consult both the raw file and the xmp sidecar file, and try to combine the data from both files into one unambiguous set of metadata. Similarly, when you set the metadata for a file, Lightroom will decide whether to write to the file or a sidecar file depending on the file type. Exiftool, on the other hand, will just write to the file you tell it to, if it can.
What this means is that any program, like your plugin, needs to decide whether to consult an xmp sidecar or whether to consult the file itself. For instance, the rule might be: when reading from raw files, use the xmp sidecar if there is one, and otherwise use the raw file directly; when writing to raw files, always use the xmp sidecar; and for jpg/tif/psd/dng/gif/png, always ignore the sidecar even if it exists. These are, I believe, Adobe's rules.
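Written out as code, that rule might look roughly like the Python sketch below. It is only my reading of Adobe's behavior, not anything official. The function names are hypothetical, the `.jpeg` and `.tiff` spellings are my own additions to the list, and treating every other extension as "raw" is a simplifying assumption. The sidecar name follows the spec text quoted earlier: same directory, same name, `.xmp` extension.

```python
from pathlib import Path
from typing import Optional

# Formats where the sidecar is ignored and metadata lives in the file itself.
# Assumption: .jpeg and .tiff are treated the same as .jpg and .tif.
EMBEDDED_ONLY = {".jpg", ".jpeg", ".tif", ".tiff", ".psd", ".dng", ".gif", ".png"}


def sidecar_path(image: Path) -> Path:
    """Same directory and base name as the image, with an .xmp extension."""
    return image.with_suffix(".xmp")


def read_source(image: Path, sidecar_exists: Optional[bool] = None) -> Path:
    """Pick the file to read metadata from.

    sidecar_exists=None means "check the disk"; pass True or False to override.
    """
    if image.suffix.lower() in EMBEDDED_ONLY:
        return image  # never consult a sidecar for these formats
    sidecar = sidecar_path(image)
    if sidecar_exists is None:
        sidecar_exists = sidecar.exists()
    return sidecar if sidecar_exists else image


def write_target(image: Path) -> Path:
    """Pick the file to write metadata to."""
    if image.suffix.lower() in EMBEDDED_ONLY:
        return image
    return sidecar_path(image)  # raw files: always write the sidecar
```

For example, `read_source(Path("IMG_1.CR2"), sidecar_exists=True)` gives `IMG_1.xmp`, while the same call for `IMG_1.jpg` gives back the jpg itself.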
But none of this seems to be written down, and you need to figure out what to do by following what others do.
I suspect that much of what I am saying here is already well-known to you. And, I'm sure, some of what I am saying might be wrong. But this is my understanding of how metadata currently works.
I find this all very annoying, and wish that someone could come in and set up some simple rules, write them down, and we could all follow them.
In short, if your metadata plugin is not consulting the xmp sidecars of raw files, in a way consistent with how Lightroom uses sidecar files, I don't think I can use your plugin. It might be perfect for everyone who converts their raw files to dng's. I have a legacy of tens of thousands of raw files, and don't want to convert them right now.
Thanks for all your good work.
Alan