Demonstrate process for use of execution butler

Description

Demonstrate that execution butler works and fix any issues discovered:

  • Run pipetask qgraph and create execution butler.

  • Execute that graph using that execution butler.

  • Use butler transfer-datasets to update the original registry.

Activity

Jim Bosch
May 27, 2021 at 12:53 AM

Looks good; only a couple of very minor comments.

Tim Jenness
May 26, 2021 at 11:29 PM

I think what we have is enough to unblock BPS, so I'll send it out for review. There are two PRs, one in daf_butler and one in pipe_base, and both are small. I will see if I can add a test for the new butler configuration option.

I will add the butler command-line subcommand for collection chaining on a different ticket. It is not needed immediately for the BPS tests, since everything works without it; people will just have to fix up the chained collection after the fact.

Tim Jenness
May 26, 2021 at 8:10 PM
(edited)

The previous code handled runs incorrectly and did not support put in the execution butler. The new fixes allow:

$ pipetask qgraph -b DATA_REPO/butler.yaml --input HSC/calib,HSC/raw/all,refcats -p "${PIPE_TASKS_DIR}/pipelines/DRP.yaml#processCcd" --instrument lsst.obs.subaru.HyperSuprimeCam --output-run demo_collection_5 --save-execution-butler ./execution3 --save-qgraph test.qgraph
$ pipetask run --register-dataset-types -b execution3/ --instrument lsst.obs.subaru.HyperSuprimeCam --output-run demo_collection_5 --qgraph test.qgraph --extend-run
$ butler transfer-datasets execution3/ DATA_REPO/ --collections=demo_collection_5

Some notes:

  • We have decided that we should be using an explicit output run with no chaining. Code using the execution butler can construct the time-stamped run name itself.

  • Because this creates a SQLite registry for the execution butler, we cannot create it directly at a remote URI. The execution butler must be copied to each node somehow.

  • A new command is needed to create the chained collection in the parent registry. We can do that on a separate ticket.
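
As a rough illustration of the first note, the calling code could build the time-stamped run name itself and pass the same value to every command. This is only a sketch: the OUTPUT_BASE, TIMESTAMP and OUTPUT_RUN variable names are hypothetical, and the timestamp format is an assumption chosen to resemble the "demo_collection_3/2021*" runs seen in the earlier attempt. The block only prints the commands, so it runs without the LSST stack installed.

```shell
#!/bin/sh
# Hypothetical base name for the output collection; substitute your own.
OUTPUT_BASE="demo_collection_5"

# UTC timestamp such as 20210526T201000Z. The exact format is an assumption,
# not something the execution butler requires.
TIMESTAMP="$(date -u +%Y%m%dT%H%M%SZ)"

# Explicit output run with no chaining: base name plus timestamp.
OUTPUT_RUN="${OUTPUT_BASE}/${TIMESTAMP}"
echo "output run: ${OUTPUT_RUN}"

# The same value would be passed as --output-run to both pipetask qgraph and
# pipetask run, and then to the transfer step. Printed here, not executed.
echo "butler transfer-datasets execution3/ DATA_REPO/ --collections=${OUTPUT_RUN}"
```

Computing the name once and reusing it keeps the graph, the execution, and the transfer pointing at the same run.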

Tim Jenness
May 25, 2021 at 10:26 PM

The default attempt of:

$ pipetask qgraph -b DATA_REPO/butler.yaml --input HSC/calib,HSC/raw/all,refcats -p "${PIPE_TASKS_DIR}/pipelines/DRP.yaml#processCcd" --instrument lsst.obs.subaru.HyperSuprimeCam --output demo_collection_3 --save-execution-butler ./execution --save-qgraph test.qgraph
$ pipetask run --register-dataset-types -b execution/ --instrument lsst.obs.subaru.HyperSuprimeCam --output demo_collection_3 --qgraph test.qgraph
$ butler --log-level=VERBOSE transfer-datasets execution/ DATA_REPO/ --collections "demo_collection_3/2021*"

works (with the change to pipe_base found on the associated PR), but the number of datasets in the execution butler output runs currently does not look like what we expect. It has also shown that we will need an additional tool to create, in the parent butler, the chained collection that the pipetask run call created in the execution butler.

Done

Details

Reviewers

Jim Bosch

Created May 25, 2021 at 4:41 PM
Updated July 29, 2021 at 10:04 PM
Resolved May 27, 2021 at 11:26 PM