Add a compiler flag that emits crate dependencies on a best-effort basis #1622

Open
wants to merge 1 commit into from

4 participants

@mahkoh

No description provided.

@mahkoh

The implementation of (a basic version of) this is almost trivial and I've done it myself in my compiler driver.

This can then be used as in the following python script: https://gist.github.com/anonymous/a12206da17ab36f0c30d054ba5b05e89

When a new compilation unit is added, the only thing that has to be done is to add its name to the top of the file. If the dependencies between the units change, nothing has to be done because the script tracks those dependencies itself.

@oli-obk oli-obk commented on the diff May 20, 2016
text/0000-emit-crate-deps.md
+# Summary
+[summary]: #summary
+
+Add a compiler flag that emits crate dependencies on a best-effort basis.
+
+# Motivation
+[motivation]: #motivation
+
+When working with projects that consist of dozens or hundreds of compilation
+units, it is useful to have some kind of automatic dependency management in the
+build system. Otherwise such information has to be maintained in at least two
+locations: Once in the compilation unit itself via an `extern crate X` statement
+and once in the build system.
+
+To this end, this RFC proposes a new compiler flag which emits the names of the
+crates a compilation unit depends on to a text file.
@oli-obk
oli-obk added a note May 20, 2016

Since we might want to add more information than just the crate name, I suggest to produce a json file (or any other structured format), which also yields the attributes given to the extern crate item. This is important for things like macro imports and compiler plugins.

@oli-obk
oli-obk added a note May 20, 2016

This can be used for the first iteration, since you can now check for #[macro_use] attributes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@oli-obk oli-obk commented on the diff May 20, 2016
text/0000-emit-crate-deps.md
+The crates are collected by walking the AST and inspecting all `extern crate`
+statements. If the statement is of the form `extern crate X as Y`, `X` but not
+`Y` is added to the list.
+
+The file contains the names of the crates separated by `\n`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+If `-Z parse-only` is passed, the created file is, in general, not precise. This
+is because macro expansion can itself create new `extern crate` statements.
+However, macro expansion does, in general, depend on the dependencies having
+already been compiled.
+
+The `-Z parse-only` variant is, however, essential for the first round of
+dependency resolution in the build system.
@oli-obk
oli-obk added a note May 20, 2016 edited

Please elaborate in the detailed design why this is so (maybe add a workflow of calls to rustc that uses crates that have macros)

@mahkoh
mahkoh added a note May 20, 2016

What's unclear about it?

@oli-obk
oli-obk added a note May 20, 2016

You are just stating that it is essential, not why.

@mahkoh
mahkoh added a note May 20, 2016

A build system needs a dependency graph to know where to start compiling.

@oli-obk
oli-obk added a note May 20, 2016 edited

That's what I meant with add a workflow of calls (maybe the motivation would be a better place than the design)

  1. rustc -Z parse-only -Z emit-crate-deps $file
  2. find dependencies, run this entire process on the dependencies
  3. rustc -Z no-analysis -Z emit-crate-deps $file
  4. find new dependencies, run this entire process on the new dependencies
  5. compile $file by passing the dependencies

make the life easier for readers of your RFC, right now your idea of this process is scattered at random through the RFC

@mahkoh
mahkoh added a note May 20, 2016 edited

The process is not part of the RFC. If you want to suggest a process, keep it in the comments. I've already done so myself in the first comment.

@oli-obk
oli-obk added a note May 20, 2016 edited

it can be part of the motivation, so the detailed design can be understood. And you are stating things like

the problem can be mitigated by tracking only the generated statements separately in the build system

and

first round of dependency resolution in the build system.

which are very much statements relating to the process

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@oli-obk oli-obk commented on the diff May 20, 2016
text/0000-emit-crate-deps.md
+The file contains the names of the crates separated by `\n`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+If `-Z parse-only` is passed, the created file is, in general, not precise. This
+is because macro expansion can itself create new `extern crate` statements.
+However, macro expansion does, in general, depend on the dependencies having
+already been compiled.
+
+The `-Z parse-only` variant is, however, essential for the first round of
+dependency resolution in the build system.
+
+I assume that the generation of `extern crate` statements is a rarely used
+features. If it is used, the problem can be mitigated by tracking only the
+generated statements separately in the build system.
@oli-obk
oli-obk added a note May 20, 2016

Every macro that takes an expression can potentially contain extern crate items in the expression (if it contains a block). for example:

println!("foo: {:?}", { extern crate chrono; chrono::UTC::now() } );
@mahkoh
mahkoh added a note May 20, 2016

Yes, that's what it says in this paragraph.

@oli-obk
oli-obk added a note May 20, 2016

I would include that in the detailed design, instead of the drawbacks. I don't think your system should be best-effort, but either do this for all possible rust code or not at all

@mahkoh
mahkoh added a note May 20, 2016

That is impossible as I've explained in this section.

@oli-obk
oli-obk added a note May 20, 2016 edited

sure it is. The macro expansion -> new dependency game can't be continued forever, even inside rustc + cargo with explicitly stating the dependencies:

// main.rs in blub
#[macro_use]
extern crate blarg;
fn main() {
    let x = blarg!();
}
// lib.rs in blarg
#[macro_export]
macro_rules! blarg {
    () => {{
        #[macro_use] extern crate foo;
        foo!();
    }}
}
// lib.rs in foo
#[macro_export]
macro_rules! foo {
    () => {5}
}

and you get an error while compiling blub:

macro undefined: 'foo!'

@mahkoh
mahkoh added a note May 20, 2016
  1. Macro expansion is, in general, not deterministic so it's impossible to give precise results.

  2. You've already given an example where your algorithm fails for different reasons.

@oli-obk
oli-obk added a note May 20, 2016

Macro expansion is, in general, not deterministic so it's impossible to give precise results.

Well it is deterministic between runs if there are no compiler plugins involved.

You've already given an example where your algorithm fails for different reasons.

I have? Everything that works right now would still work with my algorithm, because rustc cannot handle macros that expand to exter crate items that add new macros.

@mahkoh
mahkoh added a note May 23, 2016 edited

I have?

Yes, steps 3 and 4 must be run in a loop since macro expansion can reveal new dependencies which can allow more macros to expand which reveal new dependencies etc.

I don't think it's worth discussing a more complicated process that allows the compiler to emit such a file once it encounters the first error at this time. I have no idea how such a process would be implemented but I know that the process I've described can be implemented in about 100 lines of code. And while such simple code cannot solve the 100% case, it can solve the 99% case and allows the other 1% to be handled downstream.

It's also not productive to spend more time writing comments here until those who rule over us have signaled whether they might add this feature (in one form or the other) or not. Like I said, I'm already using this feature in my driver and maintaining 100 loc every two months isn't that much work.

@oli-obk
oli-obk added a note May 23, 2016

As I showed in my example, rustc doesn't allow macros to add dependencies that add macros, so there's no need to support it in a build system.

@mahkoh
mahkoh added a note May 23, 2016

This is with the current macro system. Does this still hold with the new one? From what I know, the new macro system will perform macro resolution lazily which would allow macros to add new dependencies.

@mahkoh
mahkoh added a note May 23, 2016

Buf if you're working under the assumption that it does not, then I don't see the point of your comments since the system suggested in this RFC already supports the 100% case via your algorithm.

@mahkoh
mahkoh added a note May 30, 2016

The current behavior seems to be considered a bug by some: rust-lang/rust#33936

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@nrc nrc added the T-compiler label May 23, 2016
@nrc nrc was assigned by pnkfelix Jul 7, 2016
@nrc nrc added the I-nominated label Jul 21, 2016
@nikomatsakis

One obvious thing: the flag should not be -Z, but something else. -Z is for unstable functionality.

@nrc

This RFC is entering its 🔔 final comment period 🔔

@nikomatsakis

Also, I would rather not have a separate -Z parse-only flag, but rather combine the two things into one flag.

@nrc nrc commented on the diff Jul 22, 2016
text/0000-emit-crate-deps.md
+build system. Otherwise such information has to be maintained in at least two
+locations: Once in the compilation unit itself via an `extern crate X` statement
+and once in the build system.
+
+To this end, this RFC proposes a new compiler flag which emits the names of the
+crates a compilation unit depends on to a text file.
+
+# Detailed design
+[design]: #detailed-design
+
+Add a new compiler flag `-Z emit-crate-deps`. If the flag is passed, the
+compiler emits a file `<crate name>.crate_deps` which contains a list of all
+crates the compilation unit depends on.
+
+If the flag `-Z parse-only` is passed, the file is emitted immediately after
+parsing. Otherwise it's emitted after macro expansion.
@nrc
nrc added a note Jul 22, 2016

Why is this detail important?

@nrc
nrc added a note Jul 22, 2016

Ah, nevermind, it is explained below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@nrc

CC @rust-lang/tools

Nominating for the tools team to have a looks at this.

@nrc

To summarise, this seems like a desirable and small feature to add - emit a list of deps for the current crate. However, the details are a little concerning, in particular the need for a 'best effort' pre-macro expansion version feels bad to me (and @oli-obk by the sounds of it). I'm not sure if there is a good solution for that? What does Cargo do?

@brson
brson commented Jul 22, 2016 edited

Seems like a useful feature, but as-is the details are a bit light.

I don't understand the need for 'best effort' resolution. The only motivation for it I see is this:

The -Z parse-only variant is, however, essential for the first round of dependency resolution in the build system.

What does this mean? What are tools going to use pre-expansion resolution for? Why can't they tolerate doing the expansion? Please expand the main text to include this information.

The build model this is proposing is backwards of Cargo: instead of the build system telling rustc its dependencies, it is asking rustc for its dependencies. This may be important for non-Cargo build systems but they are necessarily going to get less fidelity about the deps (all rustc knows is crate names).

What exactly is in the crate_deps file? The RFC says "a list of all crates". Are these the idents of the crates as contained in the extern crate statement? Are the seperated by newlines? Please make it explicit. For example, it could also conceivably be paths to crate files.

This feature has parallels to --emit dep-info so it seems like it probably makes the most sense to call this --emit crate-deps. The names are very similar though so it would probably pay to brainstorm more distinct names.

@nikomatsakis nikomatsakis commented on the diff Jul 26, 2016
text/0000-emit-crate-deps.md
+crates the compilation unit depends on.
+
+If the flag `-Z parse-only` is passed, the file is emitted immediately after
+parsing. Otherwise it's emitted after macro expansion.
+
+The crates are collected by walking the AST and inspecting all `extern crate`
+statements. If the statement is of the form `extern crate X as Y`, `X` but not
+`Y` is added to the list.
+
+The file contains the names of the crates separated by `\n`.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+If `-Z parse-only` is passed, the created file is, in general, not precise. This
+is because macro expansion can itself create new `extern crate` statements.

Also, macro expansion may remove extern crate statements, naturally. (Since it e.g. applies #[cfg] directives.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@nikomatsakis

The build model this is proposing is backwards of Cargo: instead of the build system telling rustc its dependencies, it is asking rustc for its dependencies.

I wanted to bring this up. The cargo model has many advantages -- for example, it allows one to process the dependencies without inspecting the source, and sidesteps the problems around macro expansion -- which is why it was adopted in the first place. It does seem a bit odd for build systems to "swim against the grain" (to mix a few metaphors) by extracting "extern crates" rather than adopting a Cargo-like alternative where dependency information is extracted into a manifest (this connects to my desire to allow extern crate declarations to be elided entirely when they are supplied on the command line).

Also, it seems like one could literally use grep or a regular expression applied across the source files to achieve the goals of the -Z parse-only flag, and in fact with better fidelity, since it would cover cases like foo!({ extern crate bar; }).

@nrc nrc removed the I-nominated label Jul 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment