Investigate Unicode collection names in butler

Description

A discussion about Rucio filename regexes mentioned Unicode, and I wondered whether butler could support Unicode (restricted to alphanumeric characters, not emoji) in collection names.

A quick change seemed to work fine for Greek characters with SQLite: the collection name and file names did not cause any difficulties, and both the standard butler query tooling and butler.get worked as usual.
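As a rough illustration of the SQLite side only, the sketch below stores a Greek-character name in an in-memory database and reads it back using Python's standard sqlite3 module. It does not use butler; the table name `collection_names` and the function `roundtrip_collection_name` are hypothetical and chosen just for this example.

```python
import sqlite3


def roundtrip_collection_name(name: str) -> str:
    """Store a name in an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical single-column table; not the butler registry schema.
        conn.execute("CREATE TABLE collection_names (name TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO collection_names (name) VALUES (?)", (name,))
        row = conn.execute(
            "SELECT name FROM collection_names WHERE name = ?", (name,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    original = "αβγ/run1"
    print(roundtrip_collection_name(original) == original)
```

A similar round trip against Postgres would need a live server and a driver, so it is not sketched here.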

Postgres needs to be tested.

Activity

Tim Jenness March 11, 2022 at 8:08 PM

I implemented the explicit version.

Kian-Tat Lim March 11, 2022 at 4:42 PM

The intent is fine.

I'm not sure why you picked U+263A as the start; U+2639 seems like a perfectly useful character as well. I think my suggestion will be even easier to explain and won't require future maintenance.

Tim Jenness March 11, 2022 at 4:14 PM

This is mainly a one-line change that expands the regex to remove the ASCII constraint and include emoji character ranges. The rest of the changes use a Greek character and an emoji in the test code.
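For readers unfamiliar with this kind of change, here is a minimal sketch of the general idea in Python. It is not the actual butler regex: the separator characters, the two emoji ranges, and the names `ASCII_ONLY`, `UNICODE_NAME`, and `is_valid_name` are all assumptions made up for this example.

```python
import re

# ASCII-constrained form: with re.ASCII, \w matches only [a-zA-Z0-9_].
ASCII_ONLY = re.compile(r"[\w\-./]+", re.ASCII)

# Illustrative emoji ranges (assumed, not taken from the butler code):
# U+2600..U+27BF and U+1F300..U+1FAFF.
_EMOJI = "\u2600-\u27BF\U0001F300-\U0001FAFF"

# Without re.ASCII, \w in a str pattern also matches non-ASCII letters
# and digits, such as Greek characters.
UNICODE_NAME = re.compile(rf"[\w\-./{_EMOJI}]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the relaxed pattern."""
    return UNICODE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["αβγ/run1", "u/user/☺", "bad name"]:
        print(candidate, is_valid_name(candidate))
```

Dropping the `re.ASCII` flag is what admits Greek letters; the explicit ranges are needed because emoji are symbols rather than word characters and so are not covered by `\w`.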

Done
Details

Reviewers

Kian-Tat Lim

Created March 9, 2022 at 7:58 PM
Updated March 11, 2022 at 8:08 PM
Resolved March 11, 2022 at 8:05 PM