Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Es6 rewrite #91

Closed
wants to merge 2 commits into from
Closed

Es6 rewrite #91

wants to merge 2 commits into from

Conversation

takoyaro
Copy link

That's just a WIP, tackling the issues I can, one at a time. Contributions welcome

Also, someone else mentioned using prettier the other day, I'm aware this should be a different PR but I think it'd make contributions easier for everybody, including this one.

@kungfooman
Copy link
Contributor

That's just a WIP, tackling the issues I can, one at a time. Contributions welcome

You can just mark it as draft and concentrate on real info?

Also, someone else mentioned using prettier the other day, I'm aware this should be a different PR but I think it'd make contributions easier for everybody, including this one.

This sentence is highly confusing. I don't know what you wanted to say, but please don't overlap PR's with different concerns (linting + ES6 modules + ...), it just makes it cumbersome to review.

@takoyaro takoyaro marked this pull request as draft April 16, 2023 07:57
@takoyaro
Copy link
Author

You can just mark it as draft and concentrate on real info?

I'm not really familiar with contributions, doing the best I can. I converted the PR to draft for the time being.
Can you explain what you mean by "concentrate on real info" ?

This sentence is highly confusing. I don't know what you wanted to say, but please don't overlap PR's with different concerns (linting + ES6 modules + ...), it just makes it cumbersome to review.

Sorry about my poor wording and english. I got it, will be careful.

@kungfooman
Copy link
Contributor

Can you explain what you mean by "concentrate on real info" ?

Yes, just some explanations why and what you did:

  • Why replace fs with node:fs, is it just the linter you wrote about?
  • What does this PR fix?
    • Does Webpack build work now?
    • Does it work via import-maps without a bundler?
    • Does it play nice with ES5 ONNX?
    • Does it work in node?
  • To what extend did you test it?
  • Which issues were you running into (which you mentioned in the first post)?

@kungfooman
Copy link
Contributor

src/transformers.js can be simplified into... not sure if it needs comments any longer, because it feels self-explaining:

export * from './tokenizers.js';
export * from './models.js';
export * from './processors.js';
export * from './env.js';
export * from './pipelines.js';
export * from './tensor_utils.js';

And usually that is called the entry point, named src/index.js.

It simply exports everything, which I like for debugging. I rewrote @do-me's SemanticFinder into ES6 using ES6 transformers.js and it just makes it nice to quickly test around with:

image

@kungfooman
Copy link
Contributor

Can you also rewrite:

if (global.ReadableStream === undefined && typeof process !== 'undefined') {
    try {
        // @ts-ignore
        global.ReadableStream = (await import('node:stream/web')).ReadableStream; // ReadableStream is not a global with Node 16
    } catch (err) {
        console.warn("ReadableStream not defined and unable to import from node:stream/web");
    }
}

into code which works in a browser + node at the same time (simply replacing global with globalThis):

if (!globalThis.ReadableStream && typeof process !== 'undefined') {
    try {
        // @ts-ignore
        globalThis.ReadableStream = (await import('node:stream/web')).ReadableStream; // ReadableStream is not a global with Node 16
    } catch (err) {
        console.warn("ReadableStream not defined and unable to import from node:stream/web");
    }
}

This kind of code is also called a polyfill and is often separated into e.g. polyfills/ReadableStream.js and included in index.js:

import 'polyfills/ReadableStream.js';
export * from './tokenizers.js';
export * from './models.js';
export * from './processors.js';
export * from './env.js';
export * from './pipelines.js';
export * from './tensor_utils.js';

@kungfooman
Copy link
Contributor

@takoyaro Did you say you are not afraid to get your hands dirty? Is this still in progress or are you already out?

@takoyaro
Copy link
Author

I was planning to get back on it this weekend...not sure why you speak to me with this tone but I hope you're ok.

@kungfooman
Copy link
Contributor

I was planning to get back on it this weekend...not sure why you speak to me with this tone but I hope you're ok.

Thank you! Unfortunately no, I'm not, I'm sitting on a burning stone flying with high-speed through the universe and people have no other issues than "increasing casualties" of each other.

I just want to check-in with you since I heard nothing from you for some days and people open issues etc. I'm sorry for the tone and can you forgive me?

@xenova
Copy link
Owner

xenova commented Apr 24, 2023

You should be able to continue from the latest version of the docs branch. It seems to work in node.js, but doesn't work in the browser for some reason. Still working that out. It's quite late here, so, I'll try figure it out tomorrow :)

@kungfooman
Copy link
Contributor

It seems to work in node.js, but doesn't work in the browser for some reason. Still working that out. It's quite late here, so, I'll try figure it out tomorrow :)

I'm not sure about the exact problem you run into, but I believe it's because ONNX is not a proper ES6 module:

Test code:

await import("https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.14.0/ort.es6.min.js").default 

image

However, it should be possible to just import ONNX and after the await you can access it via globalThis.ort:

image

@xenova
Copy link
Owner

xenova commented Apr 24, 2023

So you're suggesting the problem is occurring in https://github.com/xenova/transformers.js/blob/docs/src/backends/onnx.js? I'll look into it.

@kungfooman
Copy link
Contributor

Yep, I just found this issue, they seem to have it on their radar:

microsoft/onnxruntime#14529

And their ES6 issue: microsoft/onnxruntime#10913

The ES6 issue has been open for a year, but eventually it should be fixed. Pretty much two options:

  1. Fork / fix it ourselves - possibly creating a PR which could be merged, while using repo-fork in the meantime
  2. Wait till the devs realize that a name like ort.es6.min.js should be a proper ES6 module with an export 😅

@fs-eire Do you happen to have any ETA for ES6 support? I mean... it kinda works, just not in the way ES6 wants it (using globalThis.ort).

Also ES6 standard is to name it ort.min.mjs (mjs meaning an ES6 module)

@fs-eire
Copy link

fs-eire commented Apr 25, 2023

sorry, I don't have an ETA for ES6 yet. If you would like to contribute, it would be great!

The major reason is that we did a lot of "optimizations" for the file size of ort.min.js (UMD), and those are not compatible with webpack with ES6 format. Those including:

  • duplicated contents from
    • ort-wasm.js vs. ort-wasm-simd.js
    • ort-wasm-threaded.js vs. ort-wasm-simd-threaded.js
    • ort-wasm-threaded.worker.js vs. ort-wasm-simd-threaded.worker.js
  • non-standard way dealing inline worker for proxy feature to avoid CORS issue
  • non-standard way dealing inline threaded worker for wasm multithreading to avoid CORS issue
  • not able to tree-shaking in ES6 as all imports are top layer, as we also deploy trimed files like ort.webgl.min.js
  • custom build steps involving other tools like terser

I actually worked on this last month, but it turns out to be much more difficult than expected. I may come back some day to revisit this issue.

@kungfooman
Copy link
Contributor

sorry, I don't have an ETA for ES6 yet. If you would like to contribute, it would be great!

Thank you for the quick response! It feels like you are drowning in complexity there. ES6 should be as simple as renaming *.ts to *.js and using:

import * from './tensor.js';

Instead of:

import * from './tensor';

This is what every modern browser speaks natively.

You can keep the non-standard-ways for CORS etc. to build a bundle while not messing with TypeScript and their AST rewriting first aswell? The interface types just go into *.d.ts and normal types go into JSDoc (which exist anyway for documentation).

Without payment I cannot deep-dive into this, even though I would like to take on this challange. This is also something that probably doesn't take weeks, but maybe two months aswell to get everything right, figured out and tested in multiple browsers/environments etc.

@fs-eire
Copy link

fs-eire commented May 2, 2023

sorry, I don't have an ETA for ES6 yet. If you would like to contribute, it would be great!

Thank you for the quick response! It feels like you are drowning in complexity there. ES6 should be as simple as renaming *.ts to *.js and using:

import * from './tensor.js';

Instead of:

import * from './tensor';

This is what every modern browser speaks natively.

You can keep the non-standard-ways for CORS etc. to build a bundle while not messing with TypeScript and their AST rewriting first aswell? The interface types just go into *.d.ts and normal types go into JSDoc (which exist anyway for documentation).

Without payment I cannot deep-dive into this, even though I would like to take on this challange. This is also something that probably doesn't take weeks, but maybe two months aswell to get everything right, figured out and tested in multiple browsers/environments etc.

I did some study and find that it seems no way for me to create a NPM package which allows user to import in both ESM/CJS way:

// this is Typescript code
import { a, b, b } from 'onnxruntime-common';

It generates error like this:

//The current file is a CommonJS module whose imports will produce 'require' calls; however, the referenced file is an ECMAScript module and cannot be imported with 'require'. Consider writing a dynamic 'import("onnxruntime-common")' call instead.
To convert this file to an ECMAScript module, change its file extension to '.mts', or add the field "type": "module" to 'package.json'.

This article works for JavaScript but still so far no way to make it works for TypeScript.

@xenova
Copy link
Owner

xenova commented May 2, 2023

Hi @fs-eire - Yes, this has also been a massive headache for me too (see dev branch). Why doesn't "type": "module" work?

Fortunately, for each of my use-cases, I have found a workaround so that ES6 modules work in node... which was way more difficult the other way around (getting node modules working in ES6).

Perhaps @kungfooman can give some advice in this regard, as I am way less experienced haha.

@fs-eire
Copy link

fs-eire commented May 2, 2023

Hi @fs-eire - Yes, this has also been a massive headache for me too (see dev branch). Why doesn't "type": "module" work?

Fortunately, for each of my use-cases, I have found a workaround so that ES6 modules work in node... which was way more difficult the other way around (getting node modules working in ES6).

Perhaps @kungfooman can give some advice in this regard, as I am way less experienced haha.

"type": "module" works. However this is not what I want. converting an existing JS/TS project from commonjs to module may take a lot of efforts. My goal is to make onnxruntime-web able to be imported by either ESM/CJS modules.

I already make onnxruntime-common a ESM/CJS dual package, with a very tricky hack. The article I mentioned above works by setting up CJS/ESM source codes with duplicated types declarations in different folders, which containing a package.json with different type.

This is because TypeScript tries to determine whether the current code is CJS/ESM by looking for a nearest "package.json" file, and the solution is to prepare such a file to make it happy. Not sure whether TypeScript will change this behavior in future, but it works for now at least.

The change is here: microsoft/onnxruntime#15772, which is far more complicated than just appending the ".js". And I expect doing that for onnxruntime-web need take a while....

@kungfooman
Copy link
Contributor

This article works for JavaScript but still so far no way to make it works for TypeScript.

The change is here: microsoft/onnxruntime#15772, which is far more complicated than just appending the ".js". And I expect doing that for onnxruntime-web need take a while....

From your referenced article:

image

Once you realize that a project/language basically actively blocks you, it's time to identify it as technical debt and get rid of it. I used TypeScript for years, but it became so annoying that I just don't want to touch it anymore (in its "pure" form with all the peculiar design decisions of TS purists who basically dislike JS while barely knowing it).

I take the good parts (the type system with turing complete types, the TS language server) and *.d.ts files while otherwise keeping everything *.js with JSDoc. It just makes everything a tad easier.

I will probably not convince you, but I think it should be pointed out (before more people apply these kinds of find/xargs/sed rewrite hacks or referring to *.ts files as *.js for example).

The TypeScript stance doesn't even make sense... they happily introduced enums that transpile to different JS code or public-constructor-assign-AST rewrites, but adding an import-rewrite-feature requested by all kinds of people is ideologically blocked for purity reasons... oh well, glad I'm outta there 😂

@fs-eire
Copy link

fs-eire commented May 2, 2023

@kungfooman I know that Typescript has some limitations, but I don't want to jump into the debate of whether or not to use Typescript in my project. The important thing is, as a library developer, I want to make sure my library can be consumed by both JavaScript & Typescript users. Even if I write pure JavaScript with hand-written .d.ts in my package, I still need to deal with the problems and do the same hack to make sure TypeScript users to consume the package happily.

@kungfooman
Copy link
Contributor

@fs-eire No problem, lets cease that debate. Do you still run into problems with that PR? I'm not on the right computer for testing your PR, but the way you are using "exports" field in package.json is exactly how it works for me aswell. But you mentioned the ugly hack, which I don't need to make it work for me.

My playcanvas package simply contains this in package.json:

  "exports": {
    "import": "./build/playcanvas.mjs/index.mjs",
    "require": "./build/playcanvas.js"
  },

Testing with node:

image

Funny thing is that it only works for me when the files actually are named *.mjs, which I wrote a quick/simple PHP script for:

<?php
  function renameFilesIn($here) {
    $files = glob($here);
    $n = count($files);
    print("rename files in $here n=$n\n");
    foreach ($files as $file) {
        $mjs_file_name = str_replace(".js", ".mjs", $file);
        rename($file, $mjs_file_name);
        print("$file -> $mjs_file_name\n");
    }
  }
  renameFilesIn("**/*.js");
  renameFilesIn("**/**/*.js");
  renameFilesIn("**/**/**/*.js");
  renameFilesIn("*/*/*/*/*.js");
  renameFilesIn("*/*/*/*/*/*.js");
  renameFilesIn("*/*/*/*/*/*/*.js");
?>

And the ES5 version is simply generated from the ES6 files (using rollup, but you can use whatever you want).

I'm not sure what your current problems are or if you solved it so far. As test I simply made src/test.ts:

import { Entity } from "playcanvas";
export function lol() {
  const entity = new Entity("test");
  console.log("lol entity name:", entity.name);
}
lol();

JS code is simply generated using npx tsc and runs fine:

image

So if you summarize your current state that would be helpful and I applaud and congratulate you for all of your efforts 👍 💯 🥇

@fs-eire
Copy link

fs-eire commented May 3, 2023

using .mjs / .cjs is a good option, but that turns out to be not compatible with some tools (eg. Clang-format). Clang-format only supports .mjs .js .ts for JavaScript language. (see doc). Need to use the "--assumeFileName" flag to trick clang-format otherwise it treats .mts/.cts/.cjs as C++ source code. I added a PR to the VSCode extension but seems no response yet: xaverh/vscode-clang-format#150

The lucky thing is that, the one-liner package.json is working. The PR should be good to consume by TS/JS & CJS/ESM users.

@kungfooman
Copy link
Contributor

kungfooman commented May 4, 2023

The lucky thing is that, the one-liner package.json is working. The PR should be good to consume by TS/JS & CJS/ESM users.

Glad to hear that 👍 I tested in node and nodes module resolution respects it well. And for browsers people should use import-maps anyway, specifying the ES6 path. Some people of course also prefer to create giant files which contain everything and that should work aswell via webpack/rollup.

Clang-format only supports .mjs .js .ts for JavaScript language.

Why do you need .mts? Wouldn't it be possible to just stick to .ts and apply your import "$path" -> import "$path.mjs" TypeScript hack? It should also be possible to just apply that to the distributed "dist"/"build" folder and keep everything else as it is.

@kungfooman
Copy link
Contributor

kungfooman commented May 11, 2023

Thanks to all the work of @xenova in the docs branch, we can load transformers.js and onnxruntime in a browser like this now:

<body></body>
<script>
  function importFile(content) {
    return "data:text/javascript;base64," + btoa(content);
  }
  const imports = {
    "transformers": "./src/transformers.js",
    "fs": importFile("export default {};"),
    "url": importFile("export default {};"),
    "path": importFile("export default {};"),
    "stream/web": importFile("export default {};"),
    "sharp": importFile("export default {};"),
    "onnxruntime-node": importFile("export default {};"),
    "onnxruntime-web": importFile(`
      await import("https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.14.0/ort.es6.min.js");
      let ONNX = globalThis.ort;
      export default ONNX;
      export {
        ONNX
      };
    `),
  };
  const importmap = document.createElement("script");
  importmap.type = "importmap";
  importmap.textContent = JSON.stringify({imports});
  document.body.appendChild(importmap);
</script>
<script type="module">
  import * as transformers from "./src/transformers.js";
  Object.assign(window, {...transformers});
</script>
<audio id="SPEECH2TEXT_AUDIO" src="./examples/demo-site/assets/audio/jfk.wav" controls="true"></audio>

To test, you can just hack code into F12/devtools, e.g. Whisper example:

const pipe = await pipeline("automatic-speech-recognition");
const audioCTX = new AudioContext({
  sampleRate: 16000
});
const arrayBuffer = await (await fetch(SPEECH2TEXT_AUDIO.currentSrc)).arrayBuffer();
const decoded = await audioCTX.decodeAudioData(arrayBuffer);
const audio = decoded.getChannelData(0);
const result = await pipe(audio);
console.log("result", result);

Result:

image

Once onnxruntime is a proper ES6 module, we can simplify the import map code for it, but otherwise it works nice for a quick dev cycle already.

@xenova
Copy link
Owner

xenova commented May 11, 2023

Great! I also think I've found a way to do it without import maps - will keep you updated :)

@xenova
Copy link
Owner

xenova commented May 15, 2023

Changes live on main. 👍

@xenova xenova closed this May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants