From 0639d49a4c4fd0f55fb8bbe50e2805b0f299fbd4 Mon Sep 17 00:00:00 2001 From: Mac Malone Date: Fri, 10 Oct 2025 22:17:39 -0400 Subject: [PATCH] feat: scope output cache by platform & toolchain (#10730) This PR changes the Lake's remote cache interface to scope cache outputs by toolchain and/or platform were useful. Packages that set `platformIndependent = true` will not be scoped by platform and the core build (i.e., `bootstrap = true`) will not be scoped by toolchain. Lake's detected platform and toolchain can be overridden with the new `--platform` and `--toolchain` options to `cache get` and `cache put`. Lake no longer accepts the `--scope` option when using `cache get` with Reservoir.. The `--repo` option must be used instead. --- src/lake/Lake/CLI/Help.lean | 79 +++++++++---- src/lake/Lake/CLI/Main.lean | 150 ++++++++++++++++-------- src/lake/Lake/Config/Cache.lean | 159 ++++++++++++++++++++------ src/lake/Lake/Config/Package.lean | 4 + src/lake/Lake/Util/Url.lean | 6 +- src/lake/Lake/Util/Version.lean | 115 +++++++++---------- tests/lake/tests/cache/online-test.sh | 53 ++++++--- 7 files changed, 384 insertions(+), 182 deletions(-) diff --git a/src/lake/Lake/CLI/Help.lean b/src/lake/Lake/CLI/Help.lean index 93cc333566..db3446a4b9 100644 --- a/src/lake/Lake/CLI/Help.lean +++ b/src/lake/Lake/CLI/Help.lean @@ -326,7 +326,15 @@ def helpCacheGet := "Download artifacts from a remote service into the Lake cache USAGE: - lake cache get [] [--scope=] [--max-revs=] + lake cache get [] + +OPTIONS: + --max-revs= backtrack up to n revisions (default: 100) + --rev= uses this exact revision to lookup artifacts + --repo= GitHub repository of the package or a fork + --platform= with Reservoir or --repo, sets the platform + --toolchain= with Reservoir or --repo, sets the toolchain + --scope= scope for a custom endpoint Downloads artifacts for packages in the workspace from a remote cache service. The cache service used can be configured via the environment variables: @@ -334,45 +342,72 @@ The cache service used can be configured via the environment variables: LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact downloads LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping download -If neither of these are set, Lake will use Reservoir instead. +If neither of these are set, Lake will use Reservoir. -If an input-to-outputs mappings file or a scope is provided, Lake will -download artifacts for the root package. Otherwise, it will download artifacts -for each package in the root's dependency tree in order (using Reservoir). +If an input-to-outputs mappings file, `--scope`, or `--repo` is provided, +Lake will download artifacts for the root package. Otherwise, it will use +Reservoir to download artifacts for each dependency in workspace (in order). Non-Reservoir dependencies will be skipped. -To determine the artifacts to download, Lake uses the package's Git revision -(commit hash) to lookup its input-to-outputs mappings on the cache service. -Lake will download the artifacts for the most recent commit with available -mappings. It will backtrack up to `--max-revs`, which defaults to 100. -If set to 0, Lake will search the repository's whole history. +To determine the artifacts to download, Lake searches for input-to-output +mappings for a given build of the package via the cache service. This mapping +is identified by a Git revision and prefixed with a scope derived from the +package's name, GitHub repository, Lean toolchain, and current platform. +The exact configuration can be customized using options. -While downloading, Lake will continue on when a download for an artifact -fails or if the download process for a whole package fails. However, it will -report this and exit with a nonzero status code in such cases." +For Reservoir, setting `--repo` will make Lake lookup artifacts for the root +package by a repository name, rather than the package's. This can be used to +download artifacts for a fork of the Reservoir package (if such artifacts are +available). The `--platform` and `--toolchain` options can be used to download +artifacts for a different platform/toolchain configuration than Lake detects. +For a custom endpoint, the full prefix Lake uses can be set via `--scope`. + +If `--rev` is not set, Lake uses the package's current revision to lookup +artifacts. If no mappings are found, Lake will backtrack the Git history up to +`--max-revs`, looking for a revision with mappings. If `--max-revs` is 0, Lake +will search the repository's entire history (or as far as Git will allow). + +If a download for an artifact fails or the download process for a whole +package fails, Lake will report this and continue on to the next. Once done, +if any download failed, Lake will exit with a nonzero status code." def helpCachePut := "Upload artifacts from the Lake cache to a remote service USAGE: - lake cache put --scope= + lake cache put Uploads the input-to-outputs mappings contained in the specified file along with the corresponding output artifacts to a remote cache. The cache service used is configured via the environment variables: - LAKE_CACHE_KEY authentication key for requests - LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact uploads - LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping upload + LAKE_CACHE_KEY authentication key for requests + LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact uploads + LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping upload Files are uploaded using the AWS Signature Version 4 authentication protocol via `curl`. Thus, the service should generally be an S3-compatible bucket. -Artifacts are uploaded to the artifact endpoint under the prefix `scope` -with a file name corresponding to their Lake content hash. The mappings file -is uploaded to the revision endpoint under the prefix `scope` with a file name -corresponding to the package's current Git revision. As such, the command will -fail if the the work tree currently has changes." +Since Lake does not currently use cryptographically secure hashes for +artifacts and outputs, uploads to the cache are prefixed with a scope to avoid +clashes. This scoped is configured with the following options: + + --scope= sets a fixed scope + --repo= uses the repository + toolchain & platform + --toolchain= with --repo, sets the toolchain + --platform= with --repo, sets the platform + +At least one of `--scope` or `--repo` must be set. If `--repo` is used, Lake +will produce a scope by augmenting the repository with toolchain and platform +information as it deems necessary. If `--scope` is set, Lake will use the +specified scope verbatim. + +Artifacts are uploaded to the artifact endpoint with a file name derived +from their Lake content hash (and prefixed by the repository or scope). +The mappings file is uploaded to the revision endpoint with a file name +derived from the package's current Git revision (and prefixed by the +full scope). As such, the command will warn if the the work tree currently +has changes." def helpScriptCli := "Manage Lake scripts diff --git a/src/lake/Lake/CLI/Main.lean b/src/lake/Lake/CLI/Main.lean index bda58d8a47..c2ab802723 100644 --- a/src/lake/Lake/CLI/Main.lean +++ b/src/lake/Lake/CLI/Main.lean @@ -67,6 +67,10 @@ public structure LakeOptions where outputsFile? : Option FilePath := none forceDownload : Bool := false scope? : Option String := none + /-- Was `scope?` set with `--repo` (and not `--scope`)? -/ + repoScope : Bool := false + platform? : Option String := none + toolchain? : Option String := none rev? : Option String := none maxRevs : Nat := 100 @@ -211,6 +215,24 @@ def lakeShortOption : (opt : Char) → CliM PUnit | 'J' => modifyThe LakeOptions ({· with outFormat := .json}) | opt => throw <| CliError.unknownShortOption opt +/-- Returns an error if the string is not valid GitHub repository name. -/ +-- Limitations derived from https://github.com/dead-claudia/github-limits +def validateRepo? (repo : String) : Option String := Id.run do + unless repo.all isValidRepoChar do + return "invalid characters in repository name" + match repo.split (· == '/') with + | [owner, name] => + if owner.length > 39 then + return "invalid repository name; owner must be at most 390 characters long" + if name.length > 100 then + return "invalid repository name; owner must be at most 100 characters long" + | _ => return "invalid repository name; must contain exactly one '/'" + return none +where + /-- Returns whether `c` is a valid character in GitHub repository name -/ + isValidRepoChar (c : Char) : Bool := + c.isAlphanum || c == '-' || c == '_' || c == '.' || c == '/' + def lakeLongOption : (opt : String) → CliM PUnit | "--quiet" => modifyThe LakeOptions ({· with verbosity := .quiet}) | "--verbose" => modifyThe LakeOptions ({· with verbosity := .verbose}) @@ -230,10 +252,22 @@ def lakeLongOption : (opt : String) → CliM PUnit | "--force-download" => modifyThe LakeOptions ({· with forceDownload := true}) | "--scope" => do let scope ← takeOptArg "--scope" "cache scope" - modifyThe LakeOptions ({· with scope? := some scope}) + modifyThe LakeOptions ({· with scope? := some scope, repoScope := false}) | "--repo" => do - let repo ← takeOptArg "--repo" "repository" - modifyThe LakeOptions ({· with scope? := some repo}) + let repo ← takeOptArg "--repo" "GitHub repository" + if let some e := validateRepo? repo then error e + modifyThe LakeOptions ({· with scope? := some repo, repoScope := true}) +| "--platform" => do + let platform ← takeOptArg "--platform" "cache platform" + if platform.length > 100 then + error "invalid platform; platform is expected to be at most 100 characters long" + modifyThe LakeOptions ({· with platform? := some platform}) +| "--toolchain" => do + let toolchain ← takeOptArg "--toolchain" "cache toolchain" + let toolchain := if toolchain.isEmpty then toolchain else normalizeToolchain toolchain + if toolchain.length > 256 then + error "invalid toolchain version; toolchain is expected to be at most 256 characters long" + modifyThe LakeOptions ({· with toolchain? := some toolchain}) | "--rev" => do let rev ← takeOptArg "--rev" "Git revision" modifyThe LakeOptions ({· with rev? := some rev}) @@ -332,59 +366,86 @@ namespace lake namespace cache +@[inline] private def cacheToolchain (pkg : Package) (toolchain : String) : String := + if pkg.bootstrap then "" else toolchain + +@[inline] private def cachePlatform (pkg : Package) (platform : String) : String := + if pkg.isPlatformIndependent then "" else platform + protected def get : CliM PUnit := do processOptions lakeOption let opts ← getThe LakeOptions let mappings? ← takeArg? - noArgsRem do + noArgsRem do let cfg ← mkLoadConfig opts let ws ← loadWorkspace cfg let cache := ws.lakeCache - if let some file := mappings? then + if let some file := mappings? then liftM (m := LoggerIO) do + if opts.platform?.isSome || opts.toolchain?.isSome then + logWarning "the `--platform` and `--toolchain` options do nothing for `cache get` with a mappings file" + if .warning ≤ opts.failLv then + failure let some remoteScope := opts.scope? - | error "to use `cache get` with a mappings file, `--scope` must be set" + | error "to use `cache get` with a mappings file, `--scope` or `--repo` must be set" let service : CacheService := if let some artifactEndpoint := ws.lakeEnv.cacheArtifactEndpoint? then - {artifactEndpoint} + .downloadArtsService artifactEndpoint else - {apiEndpoint? := some ws.lakeEnv.reservoirApiUrl} + .reservoirService ws.lakeEnv.reservoirApiUrl let map ← CacheMap.load file - service.downloadOutputArtifacts map cache remoteScope ws.root.cacheScope opts.forceDownload + service.downloadOutputArtifacts map cache ws.root.cacheScope remoteScope opts.forceDownload else + let platform := opts.platform?.getD System.Platform.target + let toolchain := opts.toolchain?.getD ws.lakeEnv.toolchain let service : CacheService ← id do match ws.lakeEnv.cacheArtifactEndpoint?, ws.lakeEnv.cacheRevisionEndpoint? with | some artifactEndpoint, some revisionEndpoint => - return {artifactEndpoint, revisionEndpoint} + return .downloadService artifactEndpoint revisionEndpoint | none, none => - return {apiEndpoint? := some ws.lakeEnv.reservoirApiUrl} + return .reservoirService ws.lakeEnv.reservoirApiUrl | some artifactEndpoint, none => error (invalidEndpointConfig artifactEndpoint "") | none, some revisionEndpoint => error (invalidEndpointConfig "" revisionEndpoint) if let some remoteScope := opts.scope? then - let service := if service.apiEndpoint?.isNone then service else { - artifactEndpoint := s!"{ws.lakeEnv.reservoirApiUrl}/artifacts" - revisionEndpoint := s!"{ws.lakeEnv.reservoirApiUrl}/outputs" - } - let map ← getOutputs cache service ws.root remoteScope opts - service.downloadOutputArtifacts map cache remoteScope ws.root.cacheScope opts.forceDownload - else if service.apiEndpoint?.isSome then -- Reservoir + if !opts.repoScope && service.isReservoir then + -- `--scope` with Reservoir would imply downloading artifacts for a different package. + -- This is likely user error (they meant `--repo`) rather than something actually useful. + error "to use `cache get` with `--scope`, a custom endpoint must be set (not Reservoir); \ + if you instead want to download artifacts for a fork of the package, use `--repo`" + let service := service.withRepoScope opts.repoScope + let pkg := ws.root + let repo := GitRepo.mk pkg.dir + let platform := cachePlatform pkg platform + let toolchain := cacheToolchain pkg toolchain + let map ← id do + if let some rev := opts.rev? then + let rev ← repo.resolveRevision rev + let some map ← service.downloadRevisionOutputs? rev cache pkg.cacheScope remoteScope platform toolchain + | error s!"{remoteScope}: outputs not found for revision {rev}" + return map + else + findOutputs cache service pkg remoteScope opts platform toolchain + service.downloadOutputArtifacts map cache pkg.cacheScope remoteScope opts.forceDownload + else if service.isReservoir then -- TODO: Parallelize? let ok ← ws.packages.foldlM (start := 1) (init := true) (m := LoggerIO) fun ok pkg => do try if pkg.scope.isEmpty then logInfo s!"{pkg.name}: skipping non-Reservoir dependency`" else + let platform := cachePlatform pkg platform + let toolchain := cacheToolchain pkg toolchain let remoteScope := s!"{pkg.scope}/{pkg.name.toString (escape := false)}" - let map ← getOutputs cache service pkg remoteScope opts - service.downloadOutputArtifacts map cache remoteScope pkg.cacheScope opts.forceDownload + let map ← findOutputs cache service pkg remoteScope opts platform toolchain + service.downloadOutputArtifacts map cache pkg.cacheScope remoteScope opts.forceDownload return ok catch _ => return false unless ok do error "failed to download artifacts for some dependencies" else - error "to use `cache get` with a custom endpoint, the `--scope` option must be set" + error "to use `cache get` with a custom endpoint, the `--scope` or `--repo` option must be set" where invalidEndpointConfig artifactEndpoint revisionEndpoint := s!"invalid endpoint configuration:\ @@ -392,45 +453,42 @@ where \n LAKE_CACHE_REVISION_ENDPOINT={revisionEndpoint}\n\ To use `cache get` with a custom endpoint, both environment variables \ must be set to non-empty strings. To use Reservoir, neither should be set." - getOutputs cache service pkg remoteScope opts : LoggerIO CacheMap := do + findOutputs cache service pkg remoteScope opts platform toolchain : LoggerIO CacheMap := do let repo := GitRepo.mk pkg.dir - if let some rev := opts.rev? then - let rev ← repo.resolveRevision rev - let some map ← service.downloadRevisionOutputs? rev cache remoteScope - | error s!"{remoteScope}: outputs not found for revision {rev}" - return map - else - if (← repo.hasDiff) then - logWarning s!"{pkg.name}: package has changes; \ - only artifacts for committed code will be downloaded" - if .warning ≤ opts.failLv then - failure - let n := opts.maxRevs - let revs ← repo.getHeadRevisions n - let map? ← revs.findSomeM? fun rev => - service.downloadRevisionOutputs? rev cache remoteScope - let some map := map? - | let revisions := - if n = 0 || revs.size < n then "for any revision" else s!"in {n} revisions from HEAD" - error s!"{remoteScope}: no outputs found {revisions}" - return map + if (← repo.hasDiff) then + logWarning s!"{pkg.name}: package has changes; \ + only artifacts for committed code will be downloaded" + if .warning ≤ opts.failLv then + failure + let n := opts.maxRevs + let revs ← repo.getHeadRevisions n + let map? ← revs.findSomeM? fun rev => + service.downloadRevisionOutputs? rev cache pkg.cacheScope remoteScope platform toolchain + let some map := map? + | let revisions := + if n = 0 || revs.size < n then "for any revision" else s!"in {n} revisions from HEAD" + error s!"{remoteScope}: no outputs found {revisions}" + return map protected def put : CliM PUnit := do processOptions lakeOption let file ← takeArg "mappings" let opts ← getThe LakeOptions let some scope := opts.scope? - | error "the `--scope` option must be set for `cache put`" + | error "the `--scope` or `--repo` option must be set for `cache put`" noArgsRem do let cfg ← mkLoadConfig opts let ws ← loadWorkspace cfg + let pkg := ws.root + let platform := cachePlatform pkg (opts.platform?.getD System.Platform.target) + let toolchain := cacheToolchain pkg (opts.toolchain?.getD ws.lakeEnv.toolchain) let service : CacheService ← id do match ws.lakeEnv.cacheKey?, ws.lakeEnv.cacheArtifactEndpoint?, ws.lakeEnv.cacheRevisionEndpoint? with | some key, some artifactEndpoint, some revisionEndpoint => - return {key, artifactEndpoint, revisionEndpoint} + return .uploadService key artifactEndpoint revisionEndpoint | key?, artifactEndpoint?, revisionEndpoint? => error (invalidEndpointConfig key? artifactEndpoint? revisionEndpoint?) - let pkg := ws.root + let service := service.withRepoScope opts.repoScope let repo := GitRepo.mk pkg.dir if (← repo.hasDiff) then logWarning s!"{pkg.name}: package has changes; \ @@ -444,7 +502,7 @@ protected def put : CliM PUnit := do service.uploadArtifacts ⟨descrs, rfl⟩ paths scope -- Mappings are uploaded after artifacts to allow downloads to assume that -- if the mappings exist, the artifacts should also exist - service.uploadRevisionOutputs rev file scope + service.uploadRevisionOutputs rev file scope platform toolchain where invalidEndpointConfig key? artifactEndpoint? revisionEndpoint? := s!"invalid endpoint configuration:\ diff --git a/src/lake/Lake/Config/Cache.lean b/src/lake/Lake/Config/Cache.lean index 9c842550c2..46699b4bd7 100644 --- a/src/lake/Lake/Config/Cache.lean +++ b/src/lake/Lake/Config/Cache.lean @@ -9,6 +9,7 @@ prelude public import Lake.Util.Log public import Lake.Config.Artifact public import Lake.Build.Trace +import Lake.Config.InstallPath import Lake.Build.Actions import Lake.Util.Url import Lake.Util.Proc @@ -19,6 +20,8 @@ open Lean System namespace Lake +/-! ## Cache Map -/ + /-- Maps an input hash to a structure of output artifact content hashes. @@ -180,6 +183,8 @@ where go as o := do end CacheMap +/-! ## Cache Ref -/ + /-- A mutable reference to a `CacheMap`. -/ public abbrev CacheRef := IO.Ref CacheMap @@ -198,6 +203,8 @@ public def insert [ToJson α] (inputHash : Hash) (val : α) (cache : CacheRef) : end CacheRef +/-! ## Local Cache -/ + /-- The Lake cache directory. -/ public structure Cache where dir : FilePath @@ -291,6 +298,8 @@ public def readOutputs? (cache : Cache) (scope : String) (inputHash : Hash) : Lo end Cache +/-! ## Remote Cache Service -/ + /-- Uploads a file to an online bucket using the Amazon S3 protocol. -/ def uploadS3 (file : FilePath) (contentType : String) (url : String) (key : String) @@ -308,31 +317,79 @@ def uploadS3 /-- Configuration of a remote cache service (e.g., Reservoir or an S3 bucket). -All fields are not required to be valid. -A user should set at least the necessary ones for a given function. +A given configuration is not required to support all cache service functions, and no validation +of the configuration is performed by its operations. Users should construct a service that supports +the desired functions by using `CacheService`'s smart constructors +(e.g., `reservoir`, `uploadService`). -/ public structure CacheService where - key : String := "" - artifactEndpoint : String := "" - revisionEndpoint : String := "" - /-- Reservoir API endpoint. -/ - apiEndpoint? : Option String := none + private mk :: + /- S3 Bucket -/ + private key : String := "" + private artifactEndpoint : String := "" + private revisionEndpoint : String := "" + /- Reservoir -/ + /-- Is this a Reservoir cache service configuration? -/ + isReservoir : Bool := false + private apiEndpoint : String := "" + /-- Whether interpret the scope as a repository or not. -/ + private repoScope : Bool := false namespace CacheService +/-! ### Constructors -/ + +/-- Constructs a `CacheService` for a Reservoir endpoint. -/ +@[inline] public def reservoirService (apiEndpoint : String) (repoScope := false) : CacheService := + {isReservoir := true, apiEndpoint, repoScope} + +/-- Constructs a `CacheService` to upload artifacts and/or outputs to an S3 endpoint. -/ +@[inline] public def uploadService (key artifactEndpoint revisionEndpoint : String) : CacheService := + {key, artifactEndpoint, revisionEndpoint} + +/-- Constructs a `CacheService` to download artifacts and/or outputs from to an S3 endpoint. -/ +@[inline] public def downloadService (artifactEndpoint revisionEndpoint : String) : CacheService := + {artifactEndpoint, revisionEndpoint} + +/-- Constructs a `CacheService` to download just artifacts from to an S3 endpoint. -/ +@[inline] public def downloadArtsService (artifactEndpoint : String) : CacheService := + {artifactEndpoint} + +/-- +Reconfigures the cache service to interpret scopes as repositories (or not if `false`). + +For custom endpoints, if `true`, Lake wil augment the provided scope with +toolchain and platform information in a manner similar to Reservoir. +-/ +@[inline] public def withRepoScope (service : CacheService) (repoScope := true) : CacheService := + {service with repoScope} + +/-! ### Artifact Transfer -/ + /-- The MIME type of Lake/Reservoir artifact. -/ public def artifactContentType : String := "application/vnd.reservoir.artifact" -public def artifactUrl (contentHash : Hash) (scope : String) (service : CacheService) : String := - let scope := "/".intercalate <| scope.split (· == '/') |>.map uriEncode - if let some apiEndpoint := service.apiEndpoint? then - s!"{apiEndpoint}/packages/{scope}/artifacts/{contentHash.hex}.art" +private def appendScope (endpoint : String) (scope : String) : String := + scope.split (· == '/') |>.foldl (init := endpoint) fun s component => + uriEncode component s |>.push '/' + +private def s3ArtifactUrl (contentHash : Hash) (service : CacheService) (scope : String) : String := + appendScope s!"{service.artifactEndpoint}/" scope ++ s!"{contentHash.hex}.art" + +public def artifactUrl (contentHash : Hash) (service : CacheService) (scope : String) : String := + if service.isReservoir then + let endpoint := + if service.repoScope then + s!"{service.apiEndpoint}/repositories/" + else + s!"{service.apiEndpoint}/packages/" + appendScope endpoint scope ++ s!"artifacts/{contentHash.hex}.art" else - s!"{service.artifactEndpoint}/{scope}/{contentHash.hex}.art" + service.s3ArtifactUrl contentHash scope public def downloadArtifact - (descr : ArtifactDescr) (cache : Cache) (scope : String) - (service : CacheService) (force := false) + (descr : ArtifactDescr) (cache : Cache) + (service : CacheService) (scope : String) (force := false) : LoggerIO Unit := do let url := service.artifactUrl descr.hash scope let path := cache.artifactDir / descr.relPath @@ -345,13 +402,13 @@ public def downloadArtifact download url path let hash ← computeFileHash path if hash != descr.hash then - logError s!"{path}: downloaded artifact does not have expect hash" + logError s!"{path}: downloaded artifact does not have the expected hash" IO.FS.removeFile path failure public def downloadArtifacts - (descrs : Array ArtifactDescr) (cache : Cache) (scope : String) - (service : CacheService) (force := false) + (descrs : Array ArtifactDescr) (cache : Cache) + (service : CacheService) (scope : String) (force := false) : LoggerIO Unit := do let ok ← descrs.foldlM (init := true) fun ok descr => try @@ -363,17 +420,17 @@ public def downloadArtifacts error s!"{scope}: failed to download some artifacts" public def downloadOutputArtifacts - (map : CacheMap) (cache : Cache) (remoteScope localScope : String) - (service : CacheService) (force := false) + (map : CacheMap) (cache : Cache) (service : CacheService) + (localScope remoteScope : String) (force := false) : LoggerIO Unit := do cache.writeMap localScope map let descrs ← map.collectOutputDescrs service.downloadArtifacts descrs cache remoteScope force public def uploadArtifact - (contentHash : Hash) (art : FilePath) (scope : String) (service : CacheService) + (contentHash : Hash) (art : FilePath) (service : CacheService) (scope : String) : LoggerIO Unit := do - let url := service.artifactUrl contentHash scope + let url := service.s3ArtifactUrl contentHash scope logInfo s!"\ {scope}: uploading artifact {contentHash}\ \n local path: {art}\ @@ -381,35 +438,62 @@ public def uploadArtifact uploadS3 art artifactContentType url service.key public def uploadArtifacts - (descrs : Vector ArtifactDescr n) (paths : Vector FilePath n) (scope : String) (service : CacheService) -: LoggerIO Unit := n.forM fun n h => uploadArtifact descrs[n].hash paths[n] scope service + (descrs : Vector ArtifactDescr n) (paths : Vector FilePath n) + (service : CacheService) (scope : String) +: LoggerIO Unit := n.forM fun n h => service.uploadArtifact descrs[n].hash paths[n] scope + +/-! ### Output Transfer -/ /-- The MIME type of Lake/Reservoir input-to-output mappings for a Git revision. -/ public def mapContentType : String := "application/vnd.reservoir.outputs+json-lines" +private def s3RevisionUrl + (rev : String) (service : CacheService) (scope : String) + (platform : String := "") (toolchain : String := "") +: String := Id.run do + let mut url := appendScope s!"{service.revisionEndpoint}/" scope + if service.repoScope then + unless platform.isEmpty do + url := uriEncode platform s!"{url}pt/" |>.push '/' + unless toolchain.isEmpty do + url := uriEncode (toolchain2Dir toolchain).toString s!"{url}tc/" |>.push '/' + return s!"{url}{rev}.jsonl" -public def revisionUrl (rev : String) (scope : String) (service : CacheService) : String := - let scope := "/".intercalate <| scope.split (· == '/') |>.map uriEncode - if let some apiEndpoint := service.apiEndpoint? then - s!"{apiEndpoint}/packages/{scope}/build-outputs?rev={rev}" +public def revisionUrl + (rev : String) (service : CacheService) (scope : String) + (platform : String := "") (toolchain : String := "") +: String := + if service.isReservoir then Id.run do + let mut url := service.apiEndpoint + if service.repoScope then + url := url ++ "/repositories/" + else + url := url ++ "/packages/" + url := appendScope url scope ++ s!"build-outputs?rev={rev}" + unless platform.isEmpty do + url := uriEncode platform s!"{url}&platform=" + unless toolchain.isEmpty do + url := uriEncode toolchain s!"{url}&toolchain=" + return url else - s!"{service.revisionEndpoint}/{scope}/{rev}.jsonl" + service.s3RevisionUrl rev scope platform toolchain public def downloadRevisionOutputs? - (rev : String) (cache : Cache) (scope : String) - (service : CacheService) (force := false) + (rev : String) (cache : Cache) (service : CacheService) (localScope remoteScope : String) + (platform : String := "") (toolchain : String := "") (force := false) : LoggerIO (Option CacheMap) := do - let path := cache.revisionPath rev scope + -- TODO: toolchain-scoped revision paths for system cache? + let path := cache.revisionPath rev localScope if (← path.pathExists) && !force then return ← CacheMap.load path - let url := service.revisionUrl rev scope + let url := service.revisionUrl rev remoteScope platform toolchain logInfo s!"\ - {scope}: downloading build outputs for revision {rev}\ + {localScope}: downloading build outputs for revision {rev}\ \n local path: {path}\ \n remote URL: {url}" - let headers := if service.apiEndpoint?.isSome then Reservoir.lakeHeaders else {} + let headers := if service.isReservoir then Reservoir.lakeHeaders else {} let contents? ← try getUrl? url headers catch e => - logError s!"{scope}: output lookup failed" + logError s!"{remoteScope}: output lookup failed" throw e let some contents := contents? | return none @@ -418,9 +502,10 @@ public def downloadRevisionOutputs? CacheMap.load path public def uploadRevisionOutputs - (rev : String) (outputs : FilePath) (scope : String) (service : CacheService) + (rev : String) (outputs : FilePath) (service : CacheService) (scope : String) + (platform : String := "") (toolchain : String := "") : LoggerIO Unit := do - let url := service.revisionUrl rev scope + let url := service.s3RevisionUrl rev scope platform toolchain logInfo s!"\ {scope}: uploading build outputs for revision {rev}\ \n local path: {outputs}\ diff --git a/src/lake/Lake/Config/Package.lean b/src/lake/Lake/Config/Package.lean index 90df724905..b334b7d133 100644 --- a/src/lake/Lake/Config/Package.lean +++ b/src/lake/Lake/Config/Package.lean @@ -213,6 +213,10 @@ namespace Package @[inline] public def platformIndependent (self : Package) : Option Bool := self.config.platformIndependent +/-- Whether the package's has been configured with `platformIndependent = true`. -/ +@[inline] public def isPlatformIndependent (self : Package) : Bool := + self.config.platformIndependent == some true + /-- The package's `releaseRepo`/`releaseRepo?` configuration. -/ @[inline] public def releaseRepo? (self : Package) : Option String := self.config.releaseRepo diff --git a/src/lake/Lake/Util/Url.lean b/src/lake/Lake/Util/Url.lean index 8177bd02f9..22bfee50ff 100644 --- a/src/lake/Lake/Util/Url.lean +++ b/src/lake/Lake/Util/Url.lean @@ -96,8 +96,8 @@ public def uriEncodeChar (c : Char) (s := "") : String := uriEscapeChar c s /-- Encodes a string as an ASCII URI component, escaping special characters (and unicode). -/ -public def uriEncode (s : String) : String := - s.foldl (init := "") fun s c => uriEncodeChar c s +public def uriEncode (s : String) (init := "") : String := + s.foldl (init := init) fun s c => uriEncodeChar c s /-- Performs a HTTP `GET` request of a URL (using `curl` with JSON output) and, if successful, @@ -124,7 +124,7 @@ public def getUrl? else if code == 404 then return none else - error s!"failed to GET URL, error {code}" + error s!"failed to GET URL, error {code}; received:\n{out.stdout}" | .error e => error s!"curl produced invalid JSON output: {e}" diff --git a/src/lake/Lake/Util/Version.lean b/src/lake/Lake/Util/Version.lean index bdcf3e1f2f..e0768817d4 100644 --- a/src/lake/Lake/Util/Version.lean +++ b/src/lake/Lake/Util/Version.lean @@ -12,7 +12,6 @@ public import Lake.Util.Date /-! # Version This module contains useful definitions for manipulating versions. -It also defines a `v!""` syntax for version literals. -/ open System Lean @@ -111,20 +110,34 @@ public instance : ToString StdVer := ⟨StdVer.toString⟩ public instance : ToJson StdVer := ⟨(·.toString)⟩ public instance : FromJson StdVer := ⟨(do StdVer.parse <| ← fromJson? ·)⟩ -/-- A Lean toolchain version. -/ -public inductive ToolchainVer -| release (ver : LeanVer) -| nightly (date : Date) -| pr (no : Nat) -| other (name : String) -deriving Repr, DecidableEq - -public instance : Coe LeanVer ToolchainVer := ⟨ToolchainVer.release⟩ +/-- The `elan` toolchain file name (i.e., `lean-toolchain`). -/ +public def toolchainFileName : FilePath := "lean-toolchain" public def ToolchainVer.defaultOrigin := "leanprover/lean4" public def ToolchainVer.prOrigin := "leanprover/lean4-pr-releases" -public def ToolchainVer.ofString (ver : String) : ToolchainVer := Id.run do +public section -- for `@[computed_field]` +open ToolchainVer in +/-- A Lean toolchain version. -/ +inductive ToolchainVer +| release (ver : LeanVer) +| nightly (date : Date) +| pr (n : Nat) +| other (v : String) +with + @[computed_field] toString : ToolchainVer → String + | .release ver => s!"{defaultOrigin}:v{ver}" + | .nightly date => s!"{defaultOrigin}:nightly-{date}" + | .pr n => s!"{prOrigin}:pr-release-{n}" + | .other v => v +deriving Repr, DecidableEq +end + +namespace ToolchainVer + +public instance : Coe LeanVer ToolchainVer := ⟨ToolchainVer.release⟩ + +public def ofString (ver : String) : ToolchainVer := Id.run do let colonPos := ver.posOf ':' let (origin, tag) := if h : colonPos < ver.endPos then @@ -132,27 +145,29 @@ public def ToolchainVer.ofString (ver : String) : ToolchainVer := Id.run do (ver.extract 0 colonPos, ver.extract pos ver.endPos) else ("", ver) + let noOrigin := origin.isEmpty if tag.startsWith "v" then if let .ok ver := StdVer.parse (tag.drop 1) then - if origin.isEmpty || origin == defaultOrigin then + if noOrigin|| origin == defaultOrigin then return .release ver - return .other ver - else if tag.startsWith "nightly-" then - if let some date := Date.ofString? (tag.drop "nightly-".length) then - if origin.isEmpty || origin == defaultOrigin then + else if let some date := tag.dropPrefix? "nightly-" then + if let some date := Date.ofString? date.toString then + if noOrigin then return .nightly date - else if tag.startsWith "pr-release-" then - if let some n := (tag.drop "pr-release-".length).toNat? then - if origin.isEmpty || origin == prOrigin then + else if let some suffix := origin.dropPrefix? defaultOrigin then + if suffix.isEmpty || suffix == "-nightly".toSubstring then + return .nightly date + else if let some n := tag.dropPrefix? "pr-release-" then + if let some n := n.toNat? then + if noOrigin || origin == prOrigin then return .pr n - else - if let .ok ver := StdVer.parse ver then - if origin.isEmpty || origin == defaultOrigin then - return .release ver + else if let .ok ver := StdVer.parse ver then + if noOrigin || origin == defaultOrigin then + return .release ver return .other ver /-- Parse a toolchain from a `lean-toolchain` file. -/ -public def ToolchainVer.ofFile? (toolchainFile : FilePath) : IO (Option ToolchainVer) := do +public def ofFile? (toolchainFile : FilePath) : IO (Option ToolchainVer) := do try let toolchainString ← IO.FS.readFile toolchainFile return some <| ToolchainVer.ofString toolchainString.trim @@ -161,61 +176,41 @@ public def ToolchainVer.ofFile? (toolchainFile : FilePath) : IO (Option Toolchai return none | e => throw e -/-- The `elan` toolchain file name (i.e., `lean-toolchain`). -/ -public def toolchainFileName : FilePath := "lean-toolchain" - /-- Parse a toolchain from the `lean-toolchain` file of the directory `dir`. -/ -@[inline] public def ToolchainVer.ofDir? (dir : FilePath) : IO (Option ToolchainVer) := +@[inline] public def ofDir? (dir : FilePath) : IO (Option ToolchainVer) := ToolchainVer.ofFile? (dir / toolchainFileName) -public protected def ToolchainVer.toString (ver : ToolchainVer) : String := - match ver with - | .release ver => s!"{defaultOrigin}:v{ver}" - | .nightly date => s!"{defaultOrigin}:nightly-{date}" - | .pr n => s!"{prOrigin}:pr-release-{n}" - | .other s => s - public instance : ToString ToolchainVer := ⟨ToolchainVer.toString⟩ public instance : ToJson ToolchainVer := ⟨(·.toString)⟩ public instance : FromJson ToolchainVer := ⟨(ToolchainVer.ofString <$> fromJson? ·)⟩ -@[expose] public def ToolchainVer.lt (a b : ToolchainVer) : Prop := +public def blt (a b : ToolchainVer) : Bool := match a, b with | .release v1, .release v2 => v1 < v2 | .nightly d1, .nightly d2 => d1 < d2 - | _, _ => False + | _, _ => false -public instance : LT ToolchainVer := ⟨ToolchainVer.lt⟩ +public instance : LT ToolchainVer := ⟨(·.blt ·)⟩ +public instance decLt (a b : ToolchainVer) : Decidable (a < b) := + decidable_of_bool (a.blt b) Iff.rfl -public instance ToolchainVer.decLt (a b : ToolchainVer) : Decidable (a < b) := - match a, b with - | .release v1, .release v2 => inferInstanceAs (Decidable (v1 < v2)) - | .nightly d1, .nightly d2 => inferInstanceAs (Decidable (d1 < d2)) - | .release _, .pr _ | .release _, .nightly _ | .release _, .other _ - | .nightly _, .release _ | .nightly _, .pr _ | .nightly _, .other _ - | .pr _, _ | .other _, _ => .isFalse (by simp [LT.lt, ToolchainVer.lt]) - -@[expose] public def ToolchainVer.le (a b : ToolchainVer) : Prop := +public def ble (a b : ToolchainVer) : Bool := match a, b with | .release v1, .release v2 => v1 ≤ v2 | .nightly d1, .nightly d2 => d1 ≤ d2 | .pr n1, .pr n2 => n1 = n2 | .other v1, .other v2 => v1 = v2 - | _, _ => False + | _, _ => false -public instance : LE ToolchainVer := ⟨ToolchainVer.le⟩ +public instance : LE ToolchainVer := ⟨(·.ble ·)⟩ +public instance decLe (a b : ToolchainVer) : Decidable (a ≤ b) := + decidable_of_bool (a.ble b) Iff.rfl -public instance ToolchainVer.decLe (a b : ToolchainVer) : Decidable (a ≤ b) := - match a, b with - | .release v1, .release v2 => inferInstanceAs (Decidable (v1 ≤ v2)) - | .nightly d1, .nightly d2 => inferInstanceAs (Decidable (d1 ≤ d2)) - | .pr n1, .pr n2 => inferInstanceAs (Decidable (n1 = n2)) - | .other v1, .other v2 => inferInstanceAs (Decidable (v1 = v2)) - | .release _, .pr _ | .release _, .nightly _ | .release _, .other _ - | .nightly _, .release _ | .nightly _, .pr _ | .nightly _, other _ - | .pr _, .release _ | .pr _, .nightly _ | .pr _, .other _ - | .other _, .release _ | .other _, .nightly _ | .other _, .pr _ => - .isFalse (by simp [LE.le, ToolchainVer.le]) +end ToolchainVer + +/-- Converts a toolchain version to its normal form (e.g., with an origin). -/ +public def normalizeToolchain (s : String) : String := + ToolchainVer.ofString s |>.toString /-- Parses a version from a string. -/ public class DecodeVersion (α : Type u) where diff --git a/tests/lake/tests/cache/online-test.sh b/tests/lake/tests/cache/online-test.sh index c9eb241df0..3237e384a9 100755 --- a/tests/lake/tests/cache/online-test.sh +++ b/tests/lake/tests/cache/online-test.sh @@ -1,4 +1,4 @@ -gi#!/usr/bin/env bash +#!/usr/bin/env bash source ../common.sh ./clean.sh @@ -42,28 +42,41 @@ init_git TEST_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd)" export LAKE_CACHE_DIR="$TEST_DIR/.lake/cache" +# Ensure that Lake is run without a toolchain name +# (so toolchain does not end up in the cache) +export ELAN_TOOLCHAIN= + echo "# TESTS" +# Test `--repo` validation +test_err "must contain exactly one '/'" cache get --repo='invalid' +test_err 'invalid characters in repository name' cache get --repo='!/invalid' + # Test `cache get` command errors for bad configurations -with_cdn_endpoints test_err 'the `--scope` option must be set' cache get +test_err 'the `--platform` and `--toolchain` options do nothing' \ + cache get bogus.jsonl --scope='bogus' --platform='bogus' --wfail +test_err 'the `--platform` and `--toolchain` options do nothing' \ + cache get bogus.jsonl --scope='bogus' --toolchain='bogus' --wfail +test_err 'a custom endpoint must be set (not Reservoir)' cache get --scope='bogus' +with_cdn_endpoints test_err 'the `--scope` or `--repo` option must be set' cache get LAKE_CACHE_ARTIFACT_ENDPOINT=bogus test_err 'both environment variables must be set' cache get LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'both environment variables must be set' cache get # Test `cache put` command errors for bad configurations -test_err 'the `--scope` option must be set' cache put bogus.jsonl +with_upload_endpoints test_err 'the `--scope` or `--repo` option must be set' cache put bogus.jsonl test_err 'these environment variables must be set' \ - cache put bogus.jsonl --scope='!/bogus' + cache put bogus.jsonl --scope='bogus' LAKE_CACHE_KEY= test_err 'these environment variables must be set' \ - cache put bogus.jsonl --scope='!/bogus' + cache put bogus.jsonl --scope='bogus' LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'these environment variables must be set' \ - cache put bogus.jsonl --scope='!/bogus' + cache put bogus.jsonl --scope='bogus' LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'these environment variables must be set' \ - cache put bogus.jsonl --scope='!/bogus' + cache put bogus.jsonl --scope='bogus' -# Test revision failure +# Test revision failure (with Reservoir) REV=$(git rev-parse HEAD) -test_err "revision not found" cache get --scope='!/bogus' --rev='bogus' -test_err "outputs not found for revision" cache get --scope='!/bogus' --rev=$REV +test_err "revision not found" cache get --repo='leanprover/bogus' --rev='bogus' +test_err "outputs not found for revision" cache get --repo='leanprover/bogus' --rev=$REV # Test `cache get` skipping non-Reservoir dependencies test_run -f non-reservoir.toml update @@ -91,12 +104,24 @@ with_cdn_endpoints test_not_out "downloading" cache get --scope='!/test' # Test `--force-download` with_cdn_endpoints test_out "downloading" cache get --scope='!/test' --force-download +# Test cache put/get with a set platform/toolchain +with_upload_endpoints test_run cache put .lake/outputs.jsonl \ + --repo='leanprover/test' --platform=foo --toolchain=bar +test_cmd rm -rf .lake/build "$LAKE_CACHE_DIR" +with_cdn_endpoints test_run cache get \ + --repo='leanprover/test' --platform=foo --toolchain=bar +test_run build +Test --no-build +test_cmd rm -rf .lake/build "$LAKE_CACHE_DIR" +test_run cache get \ + --repo='leanprover/test' --platform=foo --toolchain=bar +test_run build +Test --no-build + # Test dirty work tree warnings test_cmd touch Ignored.lean test_cmd git add -f Ignored.lean with_upload_endpoints test_err "package has changes" --wfail \ - cache put .lake/outputs.jsonl --scope='!/test' -test_err "package has changes" --wfail cache get --scope='!/test' + cache put .lake/outputs.jsonl --repo='leanprover/bogus' +test_err "package has changes" --wfail cache get --repo='leanprover/bogus' test_cmd git commit -m "v2" # Test revision search @@ -112,7 +137,7 @@ test_out "the artifact cache is not enabled for this package" \ LAKE_ARTIFACT_CACHE=true test_run -d .lake/packages/Cli \ build -o "$TEST_DIR/.lake/cli-outputs.jsonl" --no-build with_upload_endpoints test_run -d .lake/packages/Cli \ - cache put "$TEST_DIR/.lake/cli-outputs.jsonl" --scope "leanprover/lean4-cli" + cache put "$TEST_DIR/.lake/cli-outputs.jsonl" --repo=leanprover/lean4-cli test_cmd rm -rf .lake/packages/Cli/.lake/build "$LAKE_CACHE_DIR" test_fails -f reservoir.toml build @Cli --no-build test_err "failed to download artifacts for some dependencies" \ @@ -120,7 +145,7 @@ test_err "failed to download artifacts for some dependencies" \ test_run -f reservoir.toml cache get --max-revs=1 test_run -f reservoir.toml build @Cli --no-build -# Test Reservoir with `--scope`/`--repo` uses GitHub scope +# Test Reservoir with `--repo` uses GitHub scope test_cmd rm -rf .lake/cache test_run -d .lake/packages/Cli cache get --repo=leanprover/lean4-cli --max-revs=1 test_run -d .lake/packages/Cli build --no-build