Performance re. keywording, an in-depth analysis

Postby mli20 » Wed Jan 16, 2019 12:28 pm

A common database design for implementing keywording (simplified example) would comprise three tables

[Image] <----- [Image/keyword relations] -----> [Keyword]

Keywords are maintained in the [Keyword] table, with exactly one row per keyword. To search for a keyword, rename a keyword, etc., only one lookup is necessary (in an index on keywords), so the operation is fast.
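To make the three-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (image, keyword, image_keyword) and the sample rows are made up for illustration; they are not taken from any real catalog.

```python
import sqlite3

# Hypothetical table and column names; not Capture One's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image   (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    -- UNIQUE gives the keyword name an index, so a lookup by name is one probe.
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE image_keyword (
        image_id   INTEGER NOT NULL REFERENCES image(id),
        keyword_id INTEGER NOT NULL REFERENCES keyword(id),
        PRIMARY KEY (image_id, keyword_id)
    );
    CREATE INDEX image_keyword_by_keyword ON image_keyword(keyword_id);
""")

conn.executemany("INSERT INTO image VALUES (?, ?)",
                 [(1, "a.raw"), (2, "b.raw"), (3, "c.raw")])
conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                 [(1, "beach"), (2, "sunset")])
conn.executemany("INSERT INTO image_keyword VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])

def images_with(name):
    """Return the filenames of all images tagged with the given keyword."""
    rows = conn.execute("""
        SELECT i.filename
        FROM keyword k
        JOIN image_keyword ik ON ik.keyword_id = k.id
        JOIN image i ON i.id = ik.image_id
        WHERE k.name = ?
        ORDER BY i.filename
    """, (name,)).fetchall()
    return [r[0] for r in rows]

# Renaming touches exactly one row, however many images use the keyword.
renamed = conn.execute(
    "UPDATE keyword SET name = 'seaside' WHERE name = 'beach'").rowcount
```

The point of the sketch is the rename at the end: because the keyword text lives in one place, the update changes a single row and every tagged image follows automatically.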

In Capture One the database layout re. keywords (simplified) is

[Image] <----- [Metadata, incl. keywords]

In this layout, every keyword is repeated once for each image that uses it, which may run into thousands of occurrences. For the same operations as before (search for a keyword, rename a keyword, etc.), each and every metadata record must be checked, so searches are slowed down considerably.

Exacerbating the problem is the fact that keywords are not kept individually in the metadata table; the design makes that impossible. Instead, all the keywords of an image are concatenated into one string.

Whenever filtering, searching etc., the concatenated string must be parsed, i.e. broken up into the individual keywords for them to be available for evaluation. Such string operations are seriously resource demanding, and if repeated for thousands of records... performance goes down the drain. And indexing the concatenated string cannot help, because an index on the whole string does not make the individual keywords inside it searchable.
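A small sketch of what this means in practice, again with Python's sqlite3 module. The metadata table, its columns, the sample keywords and the comma delimiter are all assumptions made for illustration; the real delimiter and schema may differ.

```python
import sqlite3

# Hypothetical schema; the comma delimiter is an assumption as well.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (image_id INTEGER PRIMARY KEY, keywords TEXT)")
conn.execute("CREATE INDEX metadata_by_keywords ON metadata(keywords)")
conn.executemany("INSERT INTO metadata VALUES (?, ?)", [
    (1, "beach,sunset"),
    (2, "cat,portrait"),
    (3, "catalog,studio"),
])

def images_with(keyword):
    """Read every row and split its string to test for an exact keyword."""
    hits = []
    query = "SELECT image_id, keywords FROM metadata ORDER BY image_id"
    for image_id, joined in conn.execute(query):
        if keyword in joined.split(","):
            hits.append(image_id)
    return hits

# A substring match avoids the split but is wrong: 'cat' also matches 'catalog'.
substring_hits = [r[0] for r in conn.execute(
    "SELECT image_id FROM metadata WHERE keywords LIKE ? ORDER BY image_id",
    ("%cat%",))]

# Ask SQLite how it would run the substring search; the last column of each
# EXPLAIN QUERY PLAN row is a human-readable description of the step.
plan = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT image_id FROM metadata WHERE keywords LIKE '%cat%'"))
print(plan)
```

Even with an index on the keywords column, the plan printed at the end should describe a scan over all entries rather than a targeted search, since a pattern with a leading wildcard gives the index nothing to seek on.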

Kindly note that I have not had any COP documentation available to me in preparing this analysis, other than the database layout. So, it may not be the gospel truth, but hopefully can still contribute to a better understanding of the relevant design issues.

15 January 2019

I finally got around to checking the numbers on the viability of the two database layouts for keywording. For each of them I constructed the SQL query required to populate the Library/Filter/Keywords pane. I chose this query because it is very obvious how slowly this pane is populated when you open CO with "All Images" selected; it stutters and hesitates for minutes.

Let us check the numbers:

The common layout: [Image] <----- [Image/keyword relations] -----> [Keyword]

- 10000 images
- 2000 keywords
- 15000 metadata rows

Time to execute query: 40 ms
returning 1900 rows

The CO layout: [Image] <----- [Metadata, incl. keywords]

- 3800 images
- 350 keywords
- 12000 metadata rows

Time to execute query: 1540 ms
returning 99 rows

As you can see, even though the "common layout" database is considerably larger, the "CO layout" database query is slower by a factor of 38.5 (1540 ms / 40 ms).
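The queries used for the measurements above are not reproduced here. As a rough stand-in, the following Python/sqlite3 sketch builds both layouts from the same synthetic tagging data and produces the "keyword plus image count" list that a filter pane would need. All names, sizes and the comma delimiter are assumptions, and the timings it prints depend on the machine and data; a toy in-memory example like this may well not reproduce the gap reported above.

```python
import random
import sqlite3
import time
from collections import Counter

random.seed(0)
N_IMAGES, N_KEYWORDS = 2000, 200   # arbitrary sizes, not the figures above
names = [f"kw{i:03d}" for i in range(N_KEYWORDS)]
tags = {img: random.sample(names, 5) for img in range(N_IMAGES)}
kw_id = {name: i for i, name in enumerate(names)}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE image_keyword (image_id INTEGER, keyword_id INTEGER,
                                PRIMARY KEY (image_id, keyword_id));
    CREATE INDEX image_keyword_by_keyword ON image_keyword(keyword_id);
    CREATE TABLE metadata (image_id INTEGER PRIMARY KEY, keywords TEXT);
""")
conn.executemany("INSERT INTO keyword VALUES (?, ?)", list(enumerate(names)))
conn.executemany("INSERT INTO image_keyword VALUES (?, ?)",
                 [(img, kw_id[k]) for img, ks in tags.items() for k in ks])
conn.executemany("INSERT INTO metadata VALUES (?, ?)",
                 [(img, ",".join(ks)) for img, ks in tags.items()])

def pane_relational():
    """Keyword -> image count, computed by the database from the relation table."""
    return dict(conn.execute("""
        SELECT k.name, COUNT(*) FROM keyword k
        JOIN image_keyword ik ON ik.keyword_id = k.id
        GROUP BY k.name"""))

def pane_concatenated():
    """Same result, but every metadata string has to be fetched and split."""
    counts = Counter()
    for (joined,) in conn.execute("SELECT keywords FROM metadata"):
        counts.update(joined.split(","))
    return dict(counts)

def timed(fn):
    """Run fn once and return its result and the elapsed time in milliseconds."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000

rel, rel_ms = timed(pane_relational)
cat, cat_ms = timed(pane_concatenated)
print(f"relational: {rel_ms:.1f} ms, concatenated: {cat_ms:.1f} ms")
```

Both functions return the same counts; what differs is where the work happens. In the relational layout the database can answer from the relation table and its indexes, while in the concatenated layout every string must be read and parsed before anything can be counted.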

I'd suggest that a database redesign is required if one wishes to address the Capture One keywording performance problem.

Posts: 299
Joined: Wed May 01, 2013 5:07 pm

Re: Performance re. keywording, an in-depth analysis

Postby Eric Nepean » Wed Jan 16, 2019 1:42 pm

That explains many things.

There have been other comments in these forums on the poor choice of database indexing in this SW.

Organisations are sometimes quite resistant to change, and then this becomes their achilles heel.
Cheers, Eric
[late 2015 iMac, 4GHz i7, 24GB RAM, external SSDs. GX8, E-M1, GX7, GM5, GM1 ....]
Eric Nepean
Posts: 419
Joined: Sat Oct 25, 2014 8:02 am
Location: Ottawa

Re: Performance re. keywording, an in-depth analysis

Postby SFA » Wed Jan 16, 2019 2:05 pm

What would be the relative performance effect of writing and maintaining the keywords compared to the benefits of using them in searches?

In a Catalog environment one could always keep an index of the relationship (by variant), assuming nothing will have changed when opening the catalogue, and do some checks in the background during use.
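One way to picture such a kept index, as a minimal Python sketch: a keyword-to-variants map is built once from the concatenated strings and then adjusted whenever a variant's keywords are edited. The dictionary standing in for the metadata rows, the function names and the comma delimiter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for metadata rows: variant id -> concatenated keywords.
metadata = {1: "beach,sunset", 2: "beach,portrait", 3: "studio"}

def split_keywords(joined, sep=","):
    """Break a concatenated string into clean, non-empty keywords."""
    return [part.strip() for part in joined.split(sep) if part.strip()]

def build_index(rows):
    """One full pass over all rows, e.g. when the catalog is opened."""
    index = defaultdict(set)
    for variant_id, joined in rows.items():
        for kw in split_keywords(joined):
            index[kw].add(variant_id)
    return index

def update_variant(index, rows, variant_id, new_joined):
    """Keep the index in step when one variant's keywords change."""
    for kw in split_keywords(rows.get(variant_id, "")):
        index[kw].discard(variant_id)
    rows[variant_id] = new_joined
    for kw in split_keywords(new_joined):
        index[kw].add(variant_id)

index = build_index(metadata)
```

In this sketch the write cost is a handful of set operations per edited variant, while a lookup such as index["beach"] no longer needs to touch the metadata strings at all. Whether that trade-off holds in a real catalog would depend on how often keywords change compared to how often they are searched.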

This approach may not be so useful with session architectures, depending on how the session is being managed over time. (For example, whether the user sees any benefit in using the "favourites" concept, or in using multiple sessions.)
Last edited by SFA on Thu Feb 14, 2019 12:47 pm, edited 1 time in total.
Posts: 6167
Joined: Tue Dec 20, 2011 9:32 pm

Re: Performance re. keywording, an in-depth analysis

Postby mli20 » Wed Jan 16, 2019 4:46 pm

Eric Nepean wrote:There have been other comments in these forums on the poor choice of database indexing in this SW.

And on the issues with the database design. If only better indexing could salvage a poor design, but it can't.

Interestingly, it seems the catalog comprises a list of entities, some 50 of them. One would then expect the number of tables to be approximately the same....but the number is only 18.

Posts: 299
Joined: Wed May 01, 2013 5:07 pm

Re: Performance re. keywording, an in-depth analysis

Postby Tim Trim » Wed Jan 16, 2019 11:20 pm

I agree, as an ex-Oracle consultant, the database design is, interesting! I had a look when I lost a load of folders from the folder tool.

It may explain why I keep losing folders from the folder tool: when I delete a folder in C1 and in Finder, others will randomly disappear from the folder tool in C1. I can just add them again, but this has happened to me in C1 11 and again in C1 12. All images were deleted first, and from the catalog trash as well. It makes no difference whether the folder is deleted in C1 first and then in Finder, or vice versa. I'm somewhat puzzled why C1 doesn't delete the folder in the filesystem too; that is inconsistent with how it handles images, renames, moves etc., which are supposed to be done inside C1.

I guess that something goes wrong during the import process that creates folders and adds each folder to the database. Instead of having my import create a YYYY/MM/DD structure, I've just gone to YYYY/MM. That way it is less likely to go wrong, easier to spot when it does, and quicker to fix. Luckily, when it loses folders the collections are still correct, so All Images contains the images, the edits are intact, and they can be found via metadata, just not in the folders tool.
Tim Trim
Posts: 45
Joined: Sun Mar 25, 2018 12:27 pm

Re: Performance re. keywording, an in-depth analysis

Postby NNN634424542422937993 » Fri Jan 25, 2019 11:14 pm

Wow! Thank you.

I knew there was a database problem, but I had not looked into it. It is clear these fellows are all about adjustments, and they do a very fine job of it even though I am not at a point where I can fully appreciate it. Although I am not a professional, I am planning to use a session for an upcoming vacation trip on my Surface Go. Then I will import the images to my catalog when I get home.

I moved to CO because of Media Pro, and my stronger need for tagging and finding images (of which I have thousands, many of them scanned film as JPEGs). I never warmed up to Lightroom's way of doing it. Since I had Media Pro, I did not spend much time on the cataloging functions in LR.

For the record, I have had several crashes that appear to have been related to tagging and moving images around. The system often freezes, or crashes without a hint before the Phase One crash dialogue box comes up. So far, CO has been able to fix the database itself, but I am a bit fearful at this point. Phooey!

Thanks again for your insight.
Posts: 3
Joined: Sun Jan 15, 2017 5:15 pm

Re: Performance re. keywording, an in-depth analysis

Postby Francesco Mariani » Tue Feb 12, 2019 8:31 pm

Eric Nepean wrote:Organisations are sometimes quite resistant to change, and then this becomes their achilles heel.

I've worked in IT for over 30 years. There usually is a budget for a project. If some wrong decisions were made during the project, it is difficult to find additional funds to spend time properly correcting the problem.

Phase One is receiving money every time someone purchases or upgrades the software. They should have the budget to fix simple problems like the one discussed in this thread.

Now that Media Pro is dead, what will all the loyal customers do?

In another thread I read a comment where a forum member was wondering why Apple Aperture users keep talking about Aperture. It appears to me that a four-year-old, dead-in-the-water piece of software still has the best DAM available to macOS users.

I've hesitated to switch to CO since v9. I've purchased v11 but still haven't migrated. Then v12 comes along, promising performance improvements that end up being minor, but at the same cost as new software.
Olympus OM-D E-M5 Mark II
MacBook Pro - Retina, 15-inch, Mid 2014 - 2.5 GHz Intel Core i7 processor - 16 GB RAM
macOS Sierra - 10.12.6
Aperture 3.6 > Migrating to Capture One Pro 11 or 12
Francesco Mariani
Posts: 39
Joined: Mon May 11, 2015 3:26 am
Location: Toronto, Ontario, Canada

Re: Performance re. keywording, an in-depth analysis

Postby photoGrant » Thu Feb 14, 2019 2:02 am

Holy shit, the fact they're parsing concatenated strings...per image...

It's one thing to blame the hardware on poor performance. It's another to do so whilst neglecting your code.
Grant Hodgeon
Digital Imaging Technician
North America | Europe
Posts: 180
Joined: Thu Sep 18, 2014 5:16 pm
Location: Midwest
