FilesystemFdw: Enhance pattern matching and timestamp support #205

Open
wants to merge 8 commits into
base: master


rvernica
Contributor

Highlights:

  • Support regular expressions in pattern option
  • Allow for case sensitive or insensitive pattern matches
  • Expose file mtime and ctime as timestamps

This PR adds four more options to FilesystemFdw:

  • escape_pattern (default: TRUE)
    If TRUE, the pattern used to match files is escaped before it is
    used for regular expression matching. If FALSE, the pattern used to
    match files is used as is and it is assumed to be a valid regular
    expression.
  • ignore_case (default: FALSE)
    If FALSE, the pattern used to match files is case sensitive. If
    TRUE, the pattern used to match files is case insensitive.
  • mtime_column
    If set, defines which column will contain the file mtime.
  • ctime_column
    If set, defines which column will contain the file ctime.

With this PR, the following example is supported:

> ls -R1 f
f:
taz

f/taz:
a_b.jpeg
a_b.JPEG
a-b.jpg
a_b.png
a b.PNG

CREATE FOREIGN TABLE foo (
    filename VARCHAR,
    mtime TIMESTAMP,
    ctime TIMESTAMP,
    foo VARCHAR,
    bar VARCHAR,
    taz VARCHAR
) SERVER filesystem_srv OPTIONS (
    root_dir        '/f',
    pattern         '{taz}/{foo}[ _-]{bar}\.(jpe?g|png)',
    escape_pattern  'FALSE',
    ignore_case     'TRUE',
    filename_column 'filename',
    mtime_column    'mtime',
    ctime_column    'ctime');

SELECT * FROM foo;
 filename |        mtime        |        ctime        | foo | bar | taz 
----------+---------------------+---------------------+-----+-----+-----
 a_b.jpeg | 2018-04-19 18:37:06 | 2018-04-20 19:20:12 | a   | b   | taz
 a_b.png  | 2018-04-19 18:37:04 | 2018-04-20 19:20:12 | a   | b   | taz
 a-b.jpg  | 2018-04-19 18:37:30 | 2018-04-20 19:20:12 | a   | b   | taz
 a_b.JPEG | 2018-04-19 18:37:09 | 2018-04-20 19:20:12 | a   | b   | taz
 a b.PNG  | 2018-04-19 18:37:15 | 2018-04-20 19:20:12 | a   | b   | taz
(5 rows)

Notice the regular expression used for the pattern option, which allows the foo and bar tokens to be separated by a space, _, or -. Also, the file extension can be .jpg, .jpeg, or .png, matched case-insensitively.
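
For illustration, here is a minimal Python sketch of how escape_pattern and ignore_case could feed into re.compile. It is not the code in this PR: compile_pattern and the `[^/]+?` group used for each token are assumptions made for the example, and tokens are assumed to start with a letter or underscore so that regex quantifiers such as `{2}` are left alone.

```python
import re

# Hypothetical helper, not the PR's implementation.
# Matches {name} tokens; {2}-style quantifiers are deliberately not matched.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def compile_pattern(pattern, escape_pattern=True, ignore_case=False):
    # re.split with one capturing group alternates literal text (even
    # indexes) and token names (odd indexes).
    pieces = []
    for index, part in enumerate(PLACEHOLDER.split(pattern)):
        if index % 2:
            # Assumption: a token matches one or more non-slash characters.
            pieces.append("(?P<%s>[^/]+?)" % part)
        elif escape_pattern:
            # Literal text: regex metacharacters lose their meaning.
            pieces.append(re.escape(part))
        else:
            # Literal text is trusted to be a valid regular expression.
            pieces.append(part)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile("^" + "".join(pieces) + "$", flags)


regex = compile_pattern(r"{taz}/{foo}[ _-]{bar}\.(jpe?g|png)",
                        escape_pattern=False, ignore_case=True)
for path in ["taz/a_b.jpeg", "taz/a_b.JPEG", "taz/a-b.jpg",
             "taz/a_b.png", "taz/a b.PNG"]:
    match = regex.match(path)
    print(path, match.groupdict() if match else None)
```

In this sketch, leaving escape_pattern at TRUE would make `[ _-]` and `(jpe?g|png)` literal text, so none of the five files would match; leaving ignore_case at FALSE would drop a_b.JPEG and a b.PNG.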

Fix for #203

* Allow the user to use regular expression syntax in the pattern
* Track the actual filename matched in the Item since it can't be
  recomputed from the pattern
* Example pattern supported now:

    pattern         '{taz}/{foo}[ _-]{bar}\.(jpe?g|png)',

  allows for tokens to be separated by " ", "-", or "_" and allows the
  extension to be "jpg", "jpeg", or "png"
* Add ignore_case option, set to FALSE by default
* Use it in re.compile to allow for case-insensitive regular expression
  matches
* Extract mtime and ctime from file using os.stat
* Propagate mtime and ctime to Item and convert to datetime
* Add mtime and ctime column options
* Propagate and update mtime and ctime throughout FilesystemFdw
* Example:

CREATE FOREIGN TABLE foo (
    filename VARCHAR,
    mtime TIMESTAMP,
    ctime TIMESTAMP,
    foo VARCHAR
) SERVER filesystem_srv OPTIONS (
    root_dir        '/f',
    pattern         '{foo}.zip',
    filename_column 'filename',
    mtime_column    'mtime',
    ctime_column    'ctime');

* Set actual_filename on from_filename Items
* Add set_timestamp function to Item to set mtime and ctime
  * Used by constructor
  * Used by execute
* execute function:
  * Use isfile to check if file exists (consistent with
    StructuredDirectory)
  * Get file timestamps and set them in the item
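
The timestamp handling described in the commits above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: file_timestamps is a hypothetical helper, and only os.path.isfile, os.stat, and datetime.fromtimestamp are real standard-library calls.

```python
import os
import tempfile
from datetime import datetime


def file_timestamps(path):
    """Return mtime and ctime as datetimes, or None if path is not a file.

    Hypothetical helper for illustration only.
    """
    # isfile is False for missing paths and for directories.
    if not os.path.isfile(path):
        return None
    stat = os.stat(path)
    return {
        # Naive datetimes in local time, matching a TIMESTAMP column
        # without time zone.
        "mtime": datetime.fromtimestamp(stat.st_mtime),
        "ctime": datetime.fromtimestamp(stat.st_ctime),
    }


with tempfile.NamedTemporaryFile(suffix=".zip") as handle:
    print(file_timestamps(handle.name))
print(file_timestamps(os.path.join(tempfile.gettempdir(), "no-such-file.zip")))
```

Note that on Unix st_ctime is the time of the last metadata change rather than the creation time, which would be consistent with all five files sharing one ctime in the example output above.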