Last year IMATAG published the first study on the presence of metadata, including credit and copyright, in images used by newspaper publishers on their websites.
The conclusion, although predictable, was alarming: the vast majority (97%) of images published on the internet are stripped of their credit metadata. This makes it nearly impossible to link an image reused outside its original context (a web page) back to its source, author or rights holder.
Our report exposed publishers' negligence in keeping this identification data attached to the image file, even though a simple technical specification given to their CMS provider could save the credit from the purge caused by image compression.
#1 FOCUS 2019: PRESS SITES
Here is an update of these statistics. Our robots visited the websites of the same publishers as in 2018, this time prioritizing the images displayed on each site's homepage and on the full article pages linked from it. The observation was made during May 2019 on a sample of 36 French and international press sites, representing 100,000 images (with a minimum of 1,000 photos per site).
#2 ONLY 5 SITES HAVE MORE THAN 50% OF IMAGES WITH CREDIT
Publishers with a high rate of credit metadata have a CMS designed to keep it “alive” (that is, not to delete it). Very few fall into this category; Der Spiegel is the best example.
#3 5 OTHERS HAVE GOOD INTENTIONS BUT ALSO A FEW QUIRKS
Sites between 30% and 50% can behave oddly on closer inspection.
For example, Politico serves credited images by default when analyzed by our robots. However, if you browse the site as a human user, the images displayed are resized to fit your device profile (screen size and resolution, network). These resized images, alas, carry no metadata.
Another example: Le Figaro. This year, we modified our algorithm to focus only on the photo credit fields. We discovered that some images had metadata but no credit: the “owner” field had been filled in instead of the “credit” field. The information was there, but not in the right place, so it did not count as a credit.
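The Le Figaro case can be illustrated with a minimal sketch. It assumes the metadata has already been extracted into a plain dictionary; the keys `credit` and `owner` and the function `classify_credit` are illustrative names, not IMATAG's actual algorithm or a real metadata schema.

```python
def classify_credit(metadata):
    """Return 'credited', 'misplaced', or 'missing' for one image.

    `metadata` is a plain dict of already-extracted fields. The keys
    'credit' and 'owner' are hypothetical names used for illustration.
    """
    credit = (metadata.get("credit") or "").strip()
    owner = (metadata.get("owner") or "").strip()
    if credit:
        return "credited"
    if owner:
        # A name is present, but in a field that a credit-only check ignores.
        return "misplaced"
    return "missing"


# Hypothetical sample records, one per image.
samples = [
    {"credit": "Jane Doe / Example Agency"},
    {"owner": "Example Agency"},
    {},
]
labels = [classify_credit(m) for m in samples]
print(labels)
```

Reporting the "misplaced" case separately makes it possible to tell a publisher that its credits exist but sit in the wrong field, which is easier to fix than credits that were deleted.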
#4 BELOW 30%, THE PUBLISHER DOES NOT REALLY CARE
Sites below 30% may have metadata “by accident”: their CMS does not erase it, but it has probably already disappeared earlier, during the editing process.
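The tiers used so far (above 50%, between 30% and 50%, below 30%) amount to a simple bucketing of each site's credit rate. Here is a rough sketch; the function names and the sample counts are invented for illustration and are not figures from the study.

```python
def credit_rate(images_with_credit, total_images):
    """Percentage of a site's images that carry a credit."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return 100.0 * images_with_credit / total_images


def tier(rate):
    """Map a credit rate (in percent) to one of the article's tiers."""
    if rate > 50:
        return "above 50%"
    if rate >= 30:
        return "30% to 50%"
    return "below 30%"


# Hypothetical counts: (images with credit, total images) per site.
sites = {
    "site-a.example": (620, 1000),
    "site-b.example": (410, 1000),
    "site-c.example": (12, 1000),
}
for name, (credited, total) in sites.items():
    print(name, tier(credit_rate(credited, total)))
```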
#5 THE MAJORITY OF SITES CLEAR METADATA SYSTEMATICALLY
In the end, the majority of sites have a credit metadata rate close to 0%. Whether the negligence lies with the news publisher, its editors or its image providers, the whole chain shares responsibility. The consequences will be far greater now that the European Parliament has passed a law to reward contributors to the information flows exploited by GAFA platforms.
Without proof that the content relayed by Facebook or Google News is yours, no neighboring rights! This is exactly what metadata was designed for: signing your visual productions.
#6 RANKING OF PHOTO AGENCIES: NEW EXCLUSIVE IMATAG DATA
To demonstrate the usefulness of credit metadata, IMATAG looked at the 3% of web images that still contain a credit and identified those whose credit names a photographic agency.
Then, for each agency, we counted the number of websites on which its images were found and ranked the agencies by that count. This gives a good idea of each agency's global coverage.
Getty Images' supremacy in this area needed no proof; this study simply makes it obvious. The trio of AFP, REUTERS and AP follows, in roughly equal shares. Other newswires come next, their rank depending on their production volume and the extent of their geographical coverage.
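The ranking method described in this section, counting the distinct websites on which each agency's credited images appear, can be sketched as follows. This is an illustration under assumptions: the input is taken to be a list of (site, agency) pairs, one per credited image, and the site and agency names are placeholders rather than data from the study.

```python
from collections import defaultdict


def rank_agencies(observations):
    """Rank agencies by the number of distinct sites showing their images.

    `observations` is an iterable of (site, agency) pairs, one per
    credited image. Ties are broken alphabetically by agency name.
    """
    sites_by_agency = defaultdict(set)
    for site, agency in observations:
        sites_by_agency[agency].add(site)
    return sorted(
        ((agency, len(sites)) for agency, sites in sites_by_agency.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Placeholder observations: two images on the same site count once.
observations = [
    ("site-a.example", "Agency X"),
    ("site-a.example", "Agency X"),
    ("site-b.example", "Agency X"),
    ("site-b.example", "Agency Y"),
]
ranking = rank_agencies(observations)
print(ranking)
```

Counting sites rather than images measures breadth of coverage: an agency with many images on a single site ranks below one with fewer images spread across several sites.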
WHAT TO REMEMBER FROM THIS STUDY
To assert their copyright and neighboring rights, all actors in the image production and distribution chain must mobilize to safeguard image metadata intended to identify rights holders.
To mitigate the erasure or falsification of this essential data, the press industry also needs a unique and secure Content Registry in which publishers record the content they produce and broadcast, and which platforms can consult to check rights (copyright) or assess veracity (fake news).
This solution exists: we will talk about RoC, the Register of Content, in a future article!