Investigate extremely slow execution butler creation
Description
Activity
Jim Bosch January 24, 2022 at 9:56 PM
Looks good!
For posterity, the conversation about the difference between old and new execution butlers happened on the PR: we're no longer saving TAGGED and CALIBRATION memberships, but we never had to.

Tim Jenness January 24, 2022 at 9:50 PM
After switching from export/import to Butler.transfer_from(), the entire execution butler step takes about 20 minutes. It's a very small patch to enable this.
The profile now shows that 75% of the time is in execution butler creation and 23% in reading the graph. Of that 75%, half is the datastore transfer (with no file system checks) and a quarter each is export and import.
I could probably do with a verbose log message in Butler.transfer_from() for when it actually completes; at the moment you are told when all the refs have been imported into the registry, but not when the datastore transfer has completed.
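The speed-up described above can be sketched with a toy model (pure Python, not the daf_butler API; the function names here are hypothetical): the export/import path ends up statting every file to build fresh datastore records, while a bulk transfer can copy the existing records verbatim and never touch the filesystem.

```python
import os
import tempfile

def ingest_per_file(paths):
    """Per-file ingest: one os.stat per dataset to record its size."""
    return [{"path": p, "size": os.stat(p).st_size} for p in paths]

def transfer_records(records):
    """Bulk transfer: reuse the source butler's records verbatim,
    with no filesystem access at all."""
    return [dict(r) for r in records]

# A few temp files standing in for datasets.
tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, f"dataset_{i}.fits")
    with open(p, "wb") as f:
        f.write(b"x" * (i + 1))
    paths.append(p)

source_records = ingest_per_file(paths)    # built once, in the source repo
copied = transfer_records(source_records)  # execution-butler path: no stats
assert copied == source_records
```

The per-file path scales with both dataset count and filesystem latency; the record-copy path scales only with dataset count, which is why the bulk transfer dominates less of the remaining profile.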
Jim Bosch January 24, 2022 at 3:30 PM (edited)
Here's my attempt to reconstruct them (I lost the RSP instance used to do this originally, and apparently shell history isn't saved).
First make the QG, since it's too painful to share those (takes about 45 minutes, I think, or 30 minutes if you don't run the profiler here):
Then make the execution butler (this ran overnight):

Tim Jenness January 22, 2022 at 12:13 AM
I've made a very small change that might help. It now does a direct transfer of records from one datastore to the other, without involving the file system at all. pipelines_check works; my only worry is that the execution registry that gets created is now 21 lines shorter (when doing sqlite3 gen3.sqlite3 .dump). I'm not entirely sure why it has shrunk, and because the tags tables get renumbered between the old code and the new, it's hard to compare the dumps (the datastore records do match). Even better, this patch is small enough that, if it works, we can backport it to v23.
Can you either try this branch on data-int or tell me exactly the command I need to run?
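One way around the renumbering problem when diffing the two dumps is to compare table contents with the autoincrement column dropped. A minimal sketch (the `tags` table and `id` column here are illustrative, not the real gen3 registry schema):

```python
import os
import sqlite3
import tempfile

def table_contents(db_path, table, skip_cols=("id",)):
    """Sorted row tuples for `table`, dropping columns that get renumbered."""
    conn = sqlite3.connect(db_path)
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    keep = [c for c in cols if c not in skip_cols]
    rows = conn.execute(f"SELECT {', '.join(keep)} FROM {table}")
    result = sorted(tuple(r) for r in rows)
    conn.close()
    return result

# Demo: two registries with identical content but different id numbering.
tmp = tempfile.mkdtemp()
db_paths = {}
for name, start in (("old", 1), ("new", 100)):
    path = os.path.join(tmp, f"{name}.sqlite3")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO tags VALUES (?, ?)",
                     [(start + i, n) for i, n in enumerate(("a", "b"))])
    conn.commit()
    conn.close()
    db_paths[name] = path

assert table_contents(db_paths["old"], "tags") == table_contents(db_paths["new"], "tags")
```

Sorting the rows after dropping the renumbered columns makes the comparison order-independent, so dumps written in different row orders still match.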

Tim Jenness January 21, 2022 at 9:34 PM
Okay. A few things:
There is a new switch to ingest() that tells it not to bother recording the file size. That would remove 38% of the overhead.
The problem, though, is that the execution butler is built using export + import; it is not built as a butler-to-butler transfer. This means there is no mechanism in place to pass that parameter in to the datastore ingest.
If Butler.transfer_from() were used, some of the inefficiencies would go away, since it would transfer datastore records over in bulk. Maybe I should take a look to see whether using that API is relatively simple.
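The effect of such an ingest switch can be sketched with a toy function (the parameter name `record_file_size` is hypothetical; the comment above only says a new switch avoids recording the file size):

```python
import os
import tempfile

def ingest(paths, record_file_size=True):
    """Toy ingest: optionally skip the per-file stat that records size."""
    records = []
    for p in paths:
        rec = {"path": p}
        if record_file_size:
            rec["size"] = os.stat(p).st_size  # one filesystem call per dataset
        records.append(rec)
    return records

tmp = tempfile.mkdtemp()
p = os.path.join(tmp, "dataset.fits")
with open(p, "wb") as f:
    f.write(b"data")

fast = ingest([p], record_file_size=False)  # no filesystem calls
full = ingest([p])                          # stats each file
assert "size" not in fast[0] and full[0]["size"] == 4
```

Skipping the stat removes the dominant per-file cost, but only helps if the export/import machinery can actually forward the flag down to the datastore ingest, which is the gap noted above.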
Here is a screen grab of a subset of the profile:
Details
Assignee: Tim Jenness
Reporter: Jim Bosch
Reviewers: Jim Bosch
Story Points: 2
Rubin Team: Ops Middleware

reported some extremely slow execution butler creation in DP0.2 step 2 processing here. This ticket is to reproduce, profile, and investigate that. If a fix looks obvious, I'll make it on this ticket too, but it's more likely that this will help us avoid the problem in the execution butler's replacement.