Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving dataset utils implementation to Keras Core specifically [split_dataset] #503

Closed
asingh9530 opened this issue Jul 16, 2023 · 6 comments

Comments

@asingh9530
Copy link
Contributor

Hi,

This issue is for tracking to move dataset utils implementation from tf.data.utils to keras-core implementation become framework agnostic. These are following needed to be change/added.

@suvadityamuk Please add if I had left anything.

@fchollet This is for tracking moving logic for [split_dataset].

  • moving keras data-utils from here making sure all the settings stay's intact.
  • need to rewrite following functions since all of them accept tf.Dataset and we need to support torch and Jax.
    • split_dataset to accept torch and Jax inputs.
    • _convert_dataset_to_list to convert torch and Jax inputs to list.
    • _get_data_iterator_from_dataset this might not be entirely trivial since it handles iterator based on tf.Dataset .
    • _get_next_sample again need to rewrite iterator in this too but this might remain same if we make sure iterator we built in _get_data_iterator_from_dataset is python iterator.
    • _get_type_spec add condition to accept torch and Jax.
    • _get_next_sample since need it currently accept tf.tensor need to add support to get torch tensor and Jax or keras-core tensor.
    • _rescale_dataset_split_sizes This might not require any change but need to test it out.
    • _restore_dataset_from_list This might not be entirely trivial as it recovers from tf.Dataset.
@suvadityamuk
Copy link
Collaborator

Yes, looks good to me for now. I'm tackling the split_dataset and a few other functions, will update and link here when I make a PR, hopefully soon enough.

@asingh9530
Copy link
Contributor Author

@suvadityamuk Let's change only split_dataset in this Issue will raise subsequent for rest of function changes also here is the PR. I have only added changes yet.

@suvadityamuk
Copy link
Collaborator

Hi, @asingh9530.

I'd have appreciated it if you indicated beforehand that you were working on the split_dataset functionality, as I had been working on it for approximately 4-5 hours until you made a conflicting PR, evident from my mention on the original issue and our subsequent discussion on the same thread. That is engineering time lost for me due to what seems to be miscommunication (not to mention that today is also a Sunday).

Please feel free to take this issue ahead and complete the task, happy to advise on anything you think I can help with. Thank you for your contribution.

@asingh9530
Copy link
Contributor Author

Hi @suvadityamuk, sorry for any miscommunication, since this was a big change my Intention was for us to work together on this that's why I raised a specific issue for this function change, but if you have already worked on it then I am happy to close my PR and you can take it forward. 😊

@suvadityamuk
Copy link
Collaborator

There is no need for that, I think your PR is already under review, so it's best to go ahead and complete it.

@asingh9530
Copy link
Contributor Author

@suvadityamuk Sure and apologies for any miscommunication I'll raise a different issue for other components of data_utils and will tag you there. 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants