Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implicit module M wrapping #23619

Open
StefanKarpinski opened this issue Sep 7, 2017 · 28 comments
Open

implicit module M wrapping #23619

StefanKarpinski opened this issue Sep 7, 2017 · 28 comments
Assignees
Labels

Comments

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Sep 7, 2017

Problem

Currently, when you do using M and the code for M.jl is automatically loaded, the file is evaluated in Main and expected to define a module M. It can, however, evaluate arbitrary code in Main before or after the definition of M and can fail to define M entirely:

# M.jl
@show @__MODULE__ # => Main

module M # can not do this bit
    @show @__MODULE__ # => Main.M
end # module M

@show @__MODULE__ # => Main

With #23579 and the plan outlined in #4600 (comment), this becomes increasingly awkward. Modules will no longer live inside of Main so what module is code before and after the definition of M evaluated in?

Proposal

The simplest solution would be for the code in the file M.jl to be evaluated inside of M instead of defining M. In other words, the file M.jl would be reduced to this:

# M.jl
@show @__MODULE__ # => Main.M

When using M or import M ends up loading this file, it would implicitly wrap it in the equivalent of module M; include("M.jl"); end. This way the module M cannot help but be defined and the boilerplate module M and corresponding end # module at the beginning and end of most files is eliminated.

Transition

One fact makes this transition surprisingly straightforward: modules automatically have a constant, internal self-binding. Thus, module M; module M; end; end causes a warning to be printed: WARNING: replacing module M. Because of this, we can safely implement the above proposal and simply change the behavior of module M; module M; end; end – the inner module M; end would have no effect, simply continuing with the definition of the outer M. To encourage people to change their code and eliminate the boilerplate, we can emit a warning that the module M is no longer necessary and should be deleted.

The most common and official case of code evaluated in Main before module code is __precompile__ calls in packages, controlling/indicating whether a package is precompilable or not. This would no longer work as expected, so we need an alternative. Since precompilability is a package-level property, it would make sense for this bit to be recorded in the new Pkg3-style as a precompile = [true|false] entry in the Project.toml file. This metadata would automatically be propagated to the Pkg3 registry metadata for various versions of each package.

@stevengj
Copy link
Member

stevengj commented Sep 7, 2017

Note that the __precompile__(true) statement is currently evaluated outside the module scope, so that it can avoid parsing of the entire module when it is not needed. Oh, I see, you addressed that at the end. But what about loading modules that are not in Pkg3 metadata? Do we disallow precompiling such modules?

@StefanKarpinski
Copy link
Sponsor Member Author

Packages don't have to be registered to have a Project.toml file, so I don't think that's a limitation.

@jebej
Copy link
Contributor

jebej commented Sep 8, 2017

To keep the module definition explicit, which I personally prefer, we could also disallow having any code outside of the module definition. The package file would have to start with module M and end with end.

@StefanKarpinski
Copy link
Sponsor Member Author

I'm curious why you prefer keeping the module name explicit – what's the benefit?

@stevengj
Copy link
Member

stevengj commented Sep 19, 2017

The benefit of keeping it explicit is that I think users may be confused if the "same" code creates a module in a file vs. global definitions if pasted into the REPL. On the other hand, it doesn't seem to confuse Python people too much?

@jebej
Copy link
Contributor

jebej commented Sep 19, 2017

I thought it created an asymmetry/inconsistency with submodules, for which you still would need to write out the keyword. I also liked that using a module is equivalent to just includeing the code. It clears up the magic associated with loading modules (compared to other languages where you might not know what happens behind the scene).

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 5, 2017

But what about loading modules that are not in Pkg3 metadata?

I think we could make __precompile__(true) the default now (and make it a no-op). Opting out with __precompile__(false) at the top would still work.

@stevengj
Copy link
Member

stevengj commented Oct 5, 2017

Julia is a safe language by default, and __precompile__(true) is not safe for arbitrary code.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 5, 2017

I don't think there should be any safety issues with it that are specific to it, just additional warnings / errors.

@stevengj
Copy link
Member

stevengj commented Oct 6, 2017

I guess you could take the position that any code using ccall is intrinsically unsafe and people should be expected to know that they may need __precompile__(false), but it still seems a bit unfriendly.

There is also the fact that new users creating modules probably don't want precompiling, because it slows things down if you are continually changing the module. Precompiling is mainly for experienced developers of larger packages.

@JeffBezanson
Copy link
Sponsor Member

probably don't want precompiling, because it slows things down if you are continually changing the module

That's a good point.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 6, 2017

There is also the fact that new users creating modules probably don't want precompiling, because it slows things down if you are continually changing the module. Precompiling is mainly for experienced developers of larger packages.

I think this was solved by Revise.jl. I don't think there's any particular need to distinguish experienced and novice developers via some arbitrary keyword anymore.

any code using ccall

I'm not aware of any __precompile__-specific issues with ccall. I know there were some initial changes to cfunction, but I think that's mostly all sorted out now. (I also generally assume that users needing a CFFI are pretty quickly headed towards being advanced users anyways.)

And making __precompile__ the default will greatly simplify the code logic and substantially improve the error messages.

@stevengj
Copy link
Member

stevengj commented Oct 6, 2017

@vtjnash, once you start using ccall, it is pretty easy to get data structures back from C that have pointers hidden in them, and you have to know not to stick these into a global constant for precompiling.

Revise doesn't completely solve the problem because it can't handle e.g. changes in type definitions. In general, I wouldn't recommend using precompilation initially when starting new packages.

@lucatrv
Copy link
Contributor

lucatrv commented Oct 29, 2017

Once this and #4600 are implemented, how will it be possible to specify the absolute path to a module file that is outside the current working directory? Please find an example discussed here. Would the include statement still be needed for absolute paths?

@tpapp
Copy link
Contributor

tpapp commented Oct 30, 2017

@stevengj: AFAICT #22721 should resolve the issue with changed type definitions (timholy/Revise.jl#18 for Revise.jl)

@StefanKarpinski
Copy link
Sponsor Member Author

StefanKarpinski commented Oct 30, 2017

@lucatrv: I'm not seeing the issue. Calling include("/path/to/file.jl") on a file containing an explicit module M ... end would continue to work just as it does now. You cannot use using to load code from arbitrary absolute paths currently – you have to use include for that already.

@lucatrv
Copy link
Contributor

lucatrv commented Oct 30, 2017

@StefanKarpinski: ok, but in this case how would one decide whether to put an explicit module M ... end statement in his M.jl file when he first writes it? Because if he puts it, people evaluating using .M will get the "warning that the module M is no longer necessary and should be deleted". While if he does not put it, people evaluating include("/path/to/M.jl") may potentially get weird side effects because all the code in script M.jl will be evaluated in the current variable workspace.

IMHO an optional string passed to using together with the module name would solve this issue. When passed, it would establish the reference root path which the relative path starts from.
To make some examples:

  • using M looks for module M in file M.jl within the LOAD_PATH
  • using .M looks for module M in file current_working_directory/M.jl
  • using .M.M looks for module M in file current_working_directory/M/M.jl
  • using "/home/foo/srcdir".M looks for module M in file /home/foo/srcdir/M.jl
  • using A, .B, "path_to_C".C would be a combination of above cases

@StefanKarpinski
Copy link
Sponsor Member Author

StefanKarpinski commented Oct 30, 2017

I'm not sure what the use case for being able to either load code via using or include is. These are fairly different mechanisms. include is for loading pieces of some larger body of code from separate files, which is why it interprets relative paths relative to the current source file (not the current working directory). using is for loading top-level external packages – the current source file is largely irrelevant.

Your proposal looks both more complicated and less backwards compatible than what's discussed in #4600. Relative using should not look in the current working directory, nor is the fact that the module path is relative the most salient point. Did you read through the latest proposal in #4600?

If one wants to simulate the behavior of using with include one can just do module M; include("/path/to/M.jl"); end – that doesn't seem all that hard; as I mentioned before, one can have a helper macro as well to make it less verbose, although it's already pretty short.

@lucatrv
Copy link
Contributor

lucatrv commented Oct 31, 2017

Sorry I mistakenly wrote "current working directory" while I meant "current source file directory", so I am going to report here the examples above corrected (with a few additions):

  • using M looks for module M in file M.jl within the LOAD_PATH
  • using .M looks for module M in file current_source_file_directory/M.jl
  • using A.M looks for module M in file module_A_source_file_directory/M.jl
  • using .M.M looks for module M in file current_source_file_directory/M/M.jl
  • using .path.to.M.M looks for module M in file current_source_file_directory/path/to/M/M.jl
  • using "/home/foo/srcdir".M looks for module M in file /home/foo/srcdir/M.jl
  • using A, .B, "path_to_C".C is a combination of above cases

After lots of reading and thinking on how Julia code should be organized into modules to be reusable, in my mind I got the idea that include should be used as a sort of "reusable" source code cut and paste tool, such as in:

module Foo
include("file1.jl")
include("file2.jl")
end

while using in my opinion is the command of choice to load functions implemented in modules (reusable code). In other words in my opinion to make source code reusable the best practice should be to put functions into modules, and load them where needed with using (not with include). I discussed a particular use case here.

A module source file could either:

  1. be installed by an external package (so available in the LOAD_PATH)
  2. be part of the current project (so reachable with path relative to the current source file directory)
  3. be part of another project, or in general be in another directory (this is the case for instance of the JuliaConfig.jl module discussed in the above reference, but in general this would happen any time the module source file is not easily reachable with path relative to the current source file directory)

When I read the latest proposal in #4600, I though that it covers cases 1 and 2, but not 3. So my intent was actually to extend your proposal, with the following two differences:

  • module M in file current_source_file_directory/M/M.jl would need to be referenced explicitly with using .M.M (this change is optional, the benefit in my opinion would be to keep the "dotted" notation more aligned with the system's path notation, but maybe there are other reasons to prefer implicit using .M => src/M/M.jl path notation)
  • absolute paths could be optionally passed as strings, and in this case they would integrate with the "dotted" notation by replacing the parent module source file directory (this is the only significant difference).

IMHO this proposal is still in line with #4600, and with similar degree of backwards compatibility (just more features). In case you do not like it, there can be of course other options, but in general I think that a way to provide absolute paths to using is needed once #4600 and #23619 are implemented, so that include is not required anymore to load modules.

@krrutkow
Copy link

krrutkow commented Nov 2, 2017

Would this proposal prevent baremodule from being used for top level modules?

@stevengj
Copy link
Member

stevengj commented Nov 2, 2017

It seems like several independent issues are getting mixed together here: (1) whether module Foo ... end is implicit in module files, (2) whether precompilation is the default, and (3) how to specify different paths in the using statement. Can we just focus on (1)?

Personally, I tend to agree with @jebej: module loading is much less magical if the module code looks the same no matter where you load it from. Code before and after the module is executed in an undocumented module.

@lucatrv
Copy link
Contributor

lucatrv commented Nov 5, 2017

@stevengj: I think that points (1) and (3) are interdependent. If module Foo ... end is implicit in module files, then include alone cannot be used to load modules, so using and import should provide a way to load module files from specific paths (or otherwise module M; include("/path/to/M.jl"); end has to be used). The choice on one point affects the other one, so IMHO it is better to ponder them together.

Regarding @jebej comment:

I also liked that using a module is equivalent to just includeing the code"

on the other hand I subscribe the 13th aphorism of the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

which has been a main general goal of Python 3.

I think that having a clear distinction on what include and using do, with no doubts on when one or the other one should be used, is healthy. IMHO include should be used as a sort of reusable source code "copy and paste" tool, while using should be the standard way of loading modules. Then of course everything that using does (which would be just a few things beyond what include does) should be documented in detail to avoid any magic associated with loading modules.

@StefanKarpinski
Copy link
Sponsor Member Author

then include alone cannot be used to load modules

This is not the case – include can load files that have module M ... end in them just fine. This would only require module M; include("/path/to/M.jl"); end to use include to load files that are intended to be loaded via using or import – which is a questionable thing to do in the first place.

IMHO include should be used as a sort of reusable source code "copy and paste" tool, while using should be the standard way of loading modules.

Yet your above argument is that implicit module wrapping is bad because it prevents you from using include and using interchangeably? This seems like an incoherent position. If you don't like implicit module wrapping for imported code on aesthetic reasons, that's fine, but this whole argument seems to be self-contradictory.

@lucatrv
Copy link
Contributor

lucatrv commented Nov 12, 2017

@StefanKarpinski sorry if I was not able to express myself clearly. I actually like very much the implicit module wrapping proposal, for the reasons outlined in your first post of this thread. My position however is that together with the implicit module wrapping proposal, the capability to load modules from absolute paths (case 3 of my Oct 31st post above) should be added to the using/import commands, unless you think that the need to load modules from absolute paths should never happen in normal conditions (see end of this post). One possibility could be for instance to specify absolute paths as proposed in my posts above (using "/home/foo/srcdir".M), or otherwise another possibility could be to directly provide the full path of the module file (using "/home/foo/srcdir/M.jl”), or otherwise maybe there could be other better options to specify absolute paths that someone else can propose. If this capability is not added to the using/import commands (unless loading from absolute paths should never happen, see the end of this post) then I would rather prefer the status quo, but I would like to make clear that the status quo is not my best preference.

I will try to better detail my position in the following. Please consider that all this is based
on my personal preference, I do not want to open a diatribe on this, my aim is just to give suggestions, we can make things work anyway whatever is decided about this.
So here are the possible cases in order of my own preference:

  • Implicit module wrapping AND capability for the using/import commands to load module files from absolute paths (unless this should never happen).
    In this case I see only benefits, for instance:
    a) “boilerplate module M and corresponding end # module at the beginning and end of most files is eliminated”
    b) the using/import commands are always “the standard way of loading modules” in all possible cases, regardless if the module is (case 1) available in the LOAD_PATH or (case 2) reachable through relative path (once relative using/import should search current directory #4600 is implemented) or (case 3) reachable through absolute path
    In this case I personally do not see any drawbacks.
  • Status quo.
    In this case I see the following drawbacks:
    a) “boilerplate module M and corresponding end # module...” remains
    b) sometimes the obvious way of loading modules is to use the using/import commands (case 1, and case 2 once relative using/import should search current directory #4600 is implemented), while other times only the include command can be used (case 3).
  • Implicit module wrapping BUT NO capability for the using/import commands to load module files from absolute paths.
    In this case I see one benefit (“boilerplate module M and corresponding end # module ... is eliminated”), but I see the following drawbacks:
    a) sometimes the obvious way of loading modules is to use the using/import commands (case 1, and case 2 once relative using/import should search current directory #4600 is implemented), while other times only the include command can be used (case 3). This is the same as the status quo but:
    b) when the include command must be used (absolute path), it should be explicitly wrapped with module M; include("/path/to/M.jl"); end. Apart from the additional boilerplate, in doing this one should take care not to write for instance module N; include("/path/to/M.jl"); end.

Now that I should have fully explained my opinion, if you do not agree with it, for my understanding I would like to ask why instead in your opinion it is better that the using/import commands are able to load modules only from LOAD_PATH or relative paths, and not from absolute paths, in other words why in your opinion for case 3 of my Oct 31st post (absolute path) it is better to use module M; include("/path/to/M.jl"); end rather than still relying on the using/import commands with something like using "/home/foo/srcdir".M or otherwise using "/home/foo/srcdir/M.jl”. Is it because normally for production code all modules should always be reachable either from LOAD_PATH or through relative paths, and never through absolute paths, and so you think that loading from absolute paths should be a very particular and temporary condition (not for production code), which justifies the need for a sort of "include hack”? If this is the case, then I would actually agree with your position not to add absolute paths to the using/import commands.

Sorry for the long post but I hope it clarifies.

@StefanKarpinski
Copy link
Sponsor Member Author

StefanKarpinski commented Nov 12, 2017

the capability to load modules from absolute paths should be added to the using/import commands, unless you think that the need to load modules from absolute paths should never happen in normal conditions

otherwise maybe there could be other better options to specify absolute paths that someone else can propose

I don't think it's the business of the code that loads a module via import Foo to include the path to Foo. The whole point of a statement like that is to say "give me the known resource Foo wherever it may be". If you bake it into the usage location there are various problems:

  1. It tightly couples the source code to the layout of the file system in an undesirable way – try to run the code on a different system, it's not going to work.
  2. It allows multiple places in the code to claim that Foo should be loaded from different places; they can disagree on the location of the same Foo – where should it be loaded from?
  3. Even if all the places agree on where to load Foo from, if it moves they all have to change.

The new package manager provides the ability to put absolute (or relative) paths in a project manifest file, telling Julia to load Foo from that location. Any place in the code that needs that Foo then knows where to look without coupling the code with a specific location, without repeating the location through the code, and without the ability for code to be written such that different places loading the same Foo can disagree about where it comes from.

Even with that mechanism, I don't really think using absolute paths for this sort of thing is a great idea. Relative paths are ok since you may want to refer to a resource defined within the same project which is updated in sync with the code that uses it. Otherwise, you want to specify the identity of a package (i.e. package UUID) that you'd like to use and the identity of a particular state of that package (i.e. version tree hash) that you'd like to load. As long as it gets the right version of the right package, code shouldn't care where it came from or lives on disk.

@lucatrv
Copy link
Contributor

lucatrv commented Nov 12, 2017

@StefanKarpinski: this clarifies all my questions, now that I see the full picture I perfectly agree with you, thanks!

@JeffBezanson JeffBezanson added the status:triage This should be discussed on a triage call label Nov 20, 2017
@JeffBezanson
Copy link
Sponsor Member

I think this will be out of scope for 1.0.

@StefanKarpinski
Copy link
Sponsor Member Author

This could be added in a non-breaking way by just making the inner module in module M; module M; end; end a no-op. Then we would just have to convince people to change their code in a manner other than deprecation warnings, for which we have various options.

@StefanKarpinski StefanKarpinski modified the milestones: 1.0, 1.x Nov 20, 2017
@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Nov 20, 2017
@DilumAluthge DilumAluthge removed this from the 1.x milestone Mar 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants