Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up CI with Azure Pipelines #2

Closed
wants to merge 1 commit into from
Closed

Conversation

yongtang
Copy link
Owner

@yongtang yongtang commented Feb 9, 2019

No description provided.

@yongtang yongtang closed this Feb 9, 2019
@yongtang yongtang deleted the azure-pipelines branch February 9, 2019 04:10
yongtang pushed a commit that referenced this pull request Dec 22, 2021
Implements reading from bigtable in a synchronous manner.
yongtang pushed a commit that referenced this pull request Feb 4, 2022
* feat: reading from bigtable (#2)

Implements reading from bigtable in a synchronous manner.

* feat: RowRange and RowSet API.

* feat: parallel read (#4)

In this pr we make the read methods accept a row_set reading only rows specified by the user.
We also add a parallel read, that leverages the sample_row_keys method to split work among workers.

* feat: version filters (#6)

This PR adds support for Bigtable version filters.

* feat: support for other data types (#5)

* fix: linter fixes (#8)

* feat docs (#9)

* fix: building on windows (#12)

* fix: refactor bigtable package to api folder (#14)

moved bigtable to tfensorflow_io.python.api

* fix: tests hanging (tensorflow#30)

changed path to bigtable emulator and cbt in tests

moved arguments' initializations to the body of the function in bigtable_ops.py

 fixed interleaveFromRange of column filters when using only one column

* fix: temporarily disable macos tests (tensorflow#32)

* disable tests on macos

Co-authored-by: Kajetan Boroszko <[email protected]>
Co-authored-by: Kajetan Boroszko <[email protected]>
yongtang pushed a commit that referenced this pull request Jul 25, 2022
General:
    Asserts were added and enabled after each DAOS event-related call in order
    to track down internal race conditions in the DAOS client code, see
    DAOS-10601.

    The DAOS_FILE structure documented behavior for the 'offset' field, but
    most of that behavior didn't need to be implemented.  Field 'offset' was
    removed while fixing the Append() and Tell() functions, leaving only a
    single field 'file' in DAOS_FILE, so the DAOS_FILE struct was removed as
    well.

    Typos in error messages were corrected.

File dfs_utils.cc:

In DFS::Setup():
   Code after the Connect() call replaced the detailed error status set in
   Connect() with a generic TF_NOT_FOUND error with no accompanying message.
   This cost me several days of debugging to realize that a problem was not
   some object not being found, but rather was a connection failure to an
   unhealthy container.  The TF_NOT_FOUND has been removed, allowing the more
   detailed error messages in Connect() to be reported.

In ReadBuffer::ReadBuffer()
    By setting buffer_offset to ULONG_MAX, an uninitialized buffer will never
    be matched by CacheHit(), removing the need for a separate 'initialized'
    variable.  The valid variable is no longer needed as well, more on that
    below.

In ReadBuffer::~ReadBuffer()
    daos_event_fini(() cannot be called on an event that is still in flight,
    it fails without doing anything, daos_event_test() must wait for any prior
    event to complete, otherwise the event delete that follows
    daos_event_fini() could then cause corruption of the event queue.  Call the
    reworked WaitEvent() (see below) first to ensure that daos_event_fini()
    will clean up the event before it is deleted.

In ReadBuffer::FinalizeEvent()
    The same problem exists here as in ~ReadBuffer(), daos_event_fini() can't
    be called on an event that is still in flight.  However, FinalizeEvent()
    isn't actually needed, a call to dfs_file->buffers.clear() in Cleanup()
    accomplishes the same thing using the ~ReadBuffer code, so FinalizeEvent
    was removed.

ReadBuffer::WaitEvent()
    There is a need for a WaitEvent() function in several places to wait for
    any outstanding event to complete, but this routine manipulates 'valid',
    so it can't be used anywhere else.  Removed the 'valid' code so that this
    routine can become a void and be called in multiple places.

ReadBuffer::AbortEvent()
    daos_event_abort() doesn't actually contain any logic to ask the server to
    abort an in-flight dfs_read() request.  In addition it is buggy, internal
    DAOS asserts were hit due to daos_event_abort() calls during I/O testing.
    The code was changed to instead use WaitEvent() to simply wait for a prior
    read to complete before issuing a new one, and AbortEvent() was removed.

ReadBuffer::ReadAsync()
    Both daos_event_fini() and daos_event_init() must be called on a
    daos_event_t structure before the event can be reused for another
    dfs_read() call.  These have been added.  The AbortEvent() call was
    replaced with a call to WaitEvent().  The code was changed to save the
    errno from a failed dfs_read() call in the event's ev_error field so
    that the error will be detected, and so a user cannot accidentally read
    trash data after a failed dfs_read() call.

ReadBuffer::ReadSync()
    This function is no longer used, see below.

ReadBuffer::CopyData()
    The WaitEvent() call ensures that the thread blocks until any in-flight
    read request is done.  The event->ev_error field is used to detect I/O
    failure either at the time the dfs_read() is issued or in the reply, so
    the valid flag is no longer needed.

ReadBuffer::CopyFromCache()
    The TF_RandomAccessFile read() function allows for int64_t-sized reads, so
    change the return value here to int64_t.  If an I/O error occurred, then
    return -1 so that the caller function Read() can easily tell when there
    has been an I/O error.  Provide a detailed error message so that the user
    can tell what caused the error.

File dfs_filesystem.cc:

In DFSRandomAccessFile constructor:
    Added an assert() on the event queue creation.

In Cleanup():
    Replaced FinalizeEvent() code with a dfs_file->buffers.clear() call.
    Add asserts on dfs function calls.

In df_dfs_filesystem::Read():
    The loop "for (auto& read_buf : dfs_file->buffers)" was missing a break
    statement, so CacheHit was called 256 times for each curr_offset value.
    A break was added.

    Support was added for detecting a read error and returning -1.

    Since Read() is now a while loop, there is no reason to specially use
    ReadSync() for the first buffer.  Code changed to use ReadAsync() for
    all readahead, CopyFromCache() will block until the first buffer's I/O
    is complete.  ReadSync is now unused, and is removed.

    I could not determine a reason for the WaitEvent loop:
        if (curr_offset >= dfs_file->file_size)
    because I/O requests will never be be started beyond EOF.  The loop
    was removed.

In DFSWritableFile:
   The Append() function had to make a dfs_get_size() call for each append
   to a file, adding a second round trip to the server for each append.  This
   is very expensive.  Member functions were added to cache the file size
   and update it locally as Append() operations are done..  Since the
   tensorflow API only allows one writer, local caching is allowable.  Should
   there be an I/O error, the actual size of the file becomes unknown, the
   new member functions take that into account and call dfs_get_size() in
   those situations to reestablish the correct size of the file.

In Append():
   The dfs_file->daos_file.offset field was not updated after an Append()
   operation completed successfully, so a subsequent Tell() call would return
   the size of the file before the last Append(), not after, the reported size
   was incorrect.  The code was changed to update the cached file size after
   successful Append() operations.

In RenameFile():
    Similar to the Setup() case, the detailed error statuses in Connect() were
    being hidden by a genereric TF_NOT_FOUND error.  The generic error was
    removed.

Signed-off-by: Kevan Rehm <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant