feat: scope output cache by platform & toolchain (#10730)

This PR changes the Lake's remote cache interface to scope cache outputs
by toolchain and/or platform were useful.

Packages that set `platformIndependent = true` will not be scoped by
platform and the core build (i.e., `bootstrap = true`) will not be
scoped by toolchain. Lake's detected platform and toolchain can be
overridden with the new `--platform` and `--toolchain` options to `cache
get` and `cache put`.

Lake no longer accepts the `--scope` option when using `cache get` with
Reservoir.. The `--repo` option must be used instead.
This commit is contained in:
Mac Malone 2025-10-10 22:17:39 -04:00 committed by GitHub
parent 3a26eb7281
commit 0639d49a4c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 384 additions and 182 deletions

View file

@ -326,7 +326,15 @@ def helpCacheGet :=
"Download artifacts from a remote service into the Lake cache
USAGE:
lake cache get [<mappings>] [--scope=<remote-scope>] [--max-revs=<n>]
lake cache get [<mappings>]
OPTIONS:
--max-revs=<n> backtrack up to n revisions (default: 100)
--rev=<commit-hash> uses this exact revision to lookup artifacts
--repo=<github-repo> GitHub repository of the package or a fork
--platform=<target-triple> with Reservoir or --repo, sets the platform
--toolchain=<name> with Reservoir or --repo, sets the toolchain
--scope=<remote-scope> scope for a custom endpoint
Downloads artifacts for packages in the workspace from a remote cache service.
The cache service used can be configured via the environment variables:
@ -334,45 +342,72 @@ The cache service used can be configured via the environment variables:
LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact downloads
LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping download
If neither of these are set, Lake will use Reservoir instead.
If neither of these are set, Lake will use Reservoir.
If an input-to-outputs mappings file or a scope is provided, Lake will
download artifacts for the root package. Otherwise, it will download artifacts
for each package in the root's dependency tree in order (using Reservoir).
If an input-to-outputs mappings file, `--scope`, or `--repo` is provided,
Lake will download artifacts for the root package. Otherwise, it will use
Reservoir to download artifacts for each dependency in workspace (in order).
Non-Reservoir dependencies will be skipped.
To determine the artifacts to download, Lake uses the package's Git revision
(commit hash) to lookup its input-to-outputs mappings on the cache service.
Lake will download the artifacts for the most recent commit with available
mappings. It will backtrack up to `--max-revs`, which defaults to 100.
If set to 0, Lake will search the repository's whole history.
To determine the artifacts to download, Lake searches for input-to-output
mappings for a given build of the package via the cache service. This mapping
is identified by a Git revision and prefixed with a scope derived from the
package's name, GitHub repository, Lean toolchain, and current platform.
The exact configuration can be customized using options.
While downloading, Lake will continue on when a download for an artifact
fails or if the download process for a whole package fails. However, it will
report this and exit with a nonzero status code in such cases."
For Reservoir, setting `--repo` will make Lake lookup artifacts for the root
package by a repository name, rather than the package's. This can be used to
download artifacts for a fork of the Reservoir package (if such artifacts are
available). The `--platform` and `--toolchain` options can be used to download
artifacts for a different platform/toolchain configuration than Lake detects.
For a custom endpoint, the full prefix Lake uses can be set via `--scope`.
If `--rev` is not set, Lake uses the package's current revision to lookup
artifacts. If no mappings are found, Lake will backtrack the Git history up to
`--max-revs`, looking for a revision with mappings. If `--max-revs` is 0, Lake
will search the repository's entire history (or as far as Git will allow).
If a download for an artifact fails or the download process for a whole
package fails, Lake will report this and continue on to the next. Once done,
if any download failed, Lake will exit with a nonzero status code."
def helpCachePut :=
"Upload artifacts from the Lake cache to a remote service
USAGE:
lake cache put <mappings> --scope=<remote-scope>
lake cache put <mappings> <scope-option>
Uploads the input-to-outputs mappings contained in the specified file along
with the corresponding output artifacts to a remote cache. The cache service
used is configured via the environment variables:
LAKE_CACHE_KEY authentication key for requests
LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact uploads
LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping upload
LAKE_CACHE_KEY authentication key for requests
LAKE_CACHE_ARTIFACT_ENDPOINT base URL for artifact uploads
LAKE_CACHE_REVISION_ENDPOINT base URL for the mapping upload
Files are uploaded using the AWS Signature Version 4 authentication protocol
via `curl`. Thus, the service should generally be an S3-compatible bucket.
Artifacts are uploaded to the artifact endpoint under the prefix `scope`
with a file name corresponding to their Lake content hash. The mappings file
is uploaded to the revision endpoint under the prefix `scope` with a file name
corresponding to the package's current Git revision. As such, the command will
fail if the the work tree currently has changes."
Since Lake does not currently use cryptographically secure hashes for
artifacts and outputs, uploads to the cache are prefixed with a scope to avoid
clashes. This scoped is configured with the following options:
--scope=<remote-scope> sets a fixed scope
--repo=<github-repo> uses the repository + toolchain & platform
--toolchain=<name> with --repo, sets the toolchain
--platform=<target-triple> with --repo, sets the platform
At least one of `--scope` or `--repo` must be set. If `--repo` is used, Lake
will produce a scope by augmenting the repository with toolchain and platform
information as it deems necessary. If `--scope` is set, Lake will use the
specified scope verbatim.
Artifacts are uploaded to the artifact endpoint with a file name derived
from their Lake content hash (and prefixed by the repository or scope).
The mappings file is uploaded to the revision endpoint with a file name
derived from the package's current Git revision (and prefixed by the
full scope). As such, the command will warn if the the work tree currently
has changes."
def helpScriptCli :=
"Manage Lake scripts

View file

@ -67,6 +67,10 @@ public structure LakeOptions where
outputsFile? : Option FilePath := none
forceDownload : Bool := false
scope? : Option String := none
/-- Was `scope?` set with `--repo` (and not `--scope`)? -/
repoScope : Bool := false
platform? : Option String := none
toolchain? : Option String := none
rev? : Option String := none
maxRevs : Nat := 100
@ -211,6 +215,24 @@ def lakeShortOption : (opt : Char) → CliM PUnit
| 'J' => modifyThe LakeOptions ({· with outFormat := .json})
| opt => throw <| CliError.unknownShortOption opt
/-- Returns an error if the string is not valid GitHub repository name. -/
-- Limitations derived from https://github.com/dead-claudia/github-limits
def validateRepo? (repo : String) : Option String := Id.run do
unless repo.all isValidRepoChar do
return "invalid characters in repository name"
match repo.split (· == '/') with
| [owner, name] =>
if owner.length > 39 then
return "invalid repository name; owner must be at most 390 characters long"
if name.length > 100 then
return "invalid repository name; owner must be at most 100 characters long"
| _ => return "invalid repository name; must contain exactly one '/'"
return none
where
/-- Returns whether `c` is a valid character in GitHub repository name -/
isValidRepoChar (c : Char) : Bool :=
c.isAlphanum || c == '-' || c == '_' || c == '.' || c == '/'
def lakeLongOption : (opt : String) → CliM PUnit
| "--quiet" => modifyThe LakeOptions ({· with verbosity := .quiet})
| "--verbose" => modifyThe LakeOptions ({· with verbosity := .verbose})
@ -230,10 +252,22 @@ def lakeLongOption : (opt : String) → CliM PUnit
| "--force-download" => modifyThe LakeOptions ({· with forceDownload := true})
| "--scope" => do
let scope ← takeOptArg "--scope" "cache scope"
modifyThe LakeOptions ({· with scope? := some scope})
modifyThe LakeOptions ({· with scope? := some scope, repoScope := false})
| "--repo" => do
let repo ← takeOptArg "--repo" "repository"
modifyThe LakeOptions ({· with scope? := some repo})
let repo ← takeOptArg "--repo" "GitHub repository"
if let some e := validateRepo? repo then error e
modifyThe LakeOptions ({· with scope? := some repo, repoScope := true})
| "--platform" => do
let platform ← takeOptArg "--platform" "cache platform"
if platform.length > 100 then
error "invalid platform; platform is expected to be at most 100 characters long"
modifyThe LakeOptions ({· with platform? := some platform})
| "--toolchain" => do
let toolchain ← takeOptArg "--toolchain" "cache toolchain"
let toolchain := if toolchain.isEmpty then toolchain else normalizeToolchain toolchain
if toolchain.length > 256 then
error "invalid toolchain version; toolchain is expected to be at most 256 characters long"
modifyThe LakeOptions ({· with toolchain? := some toolchain})
| "--rev" => do
let rev ← takeOptArg "--rev" "Git revision"
modifyThe LakeOptions ({· with rev? := some rev})
@ -332,59 +366,86 @@ namespace lake
namespace cache
@[inline] private def cacheToolchain (pkg : Package) (toolchain : String) : String :=
if pkg.bootstrap then "" else toolchain
@[inline] private def cachePlatform (pkg : Package) (platform : String) : String :=
if pkg.isPlatformIndependent then "" else platform
protected def get : CliM PUnit := do
processOptions lakeOption
let opts ← getThe LakeOptions
let mappings? ← takeArg?
noArgsRem do
noArgsRem do
let cfg ← mkLoadConfig opts
let ws ← loadWorkspace cfg
let cache := ws.lakeCache
if let some file := mappings? then
if let some file := mappings? then liftM (m := LoggerIO) do
if opts.platform?.isSome || opts.toolchain?.isSome then
logWarning "the `--platform` and `--toolchain` options do nothing for `cache get` with a mappings file"
if .warning ≤ opts.failLv then
failure
let some remoteScope := opts.scope?
| error "to use `cache get` with a mappings file, `--scope` must be set"
| error "to use `cache get` with a mappings file, `--scope` or `--repo` must be set"
let service : CacheService :=
if let some artifactEndpoint := ws.lakeEnv.cacheArtifactEndpoint? then
{artifactEndpoint}
.downloadArtsService artifactEndpoint
else
{apiEndpoint? := some ws.lakeEnv.reservoirApiUrl}
.reservoirService ws.lakeEnv.reservoirApiUrl
let map ← CacheMap.load file
service.downloadOutputArtifacts map cache remoteScope ws.root.cacheScope opts.forceDownload
service.downloadOutputArtifacts map cache ws.root.cacheScope remoteScope opts.forceDownload
else
let platform := opts.platform?.getD System.Platform.target
let toolchain := opts.toolchain?.getD ws.lakeEnv.toolchain
let service : CacheService ← id do
match ws.lakeEnv.cacheArtifactEndpoint?, ws.lakeEnv.cacheRevisionEndpoint? with
| some artifactEndpoint, some revisionEndpoint =>
return {artifactEndpoint, revisionEndpoint}
return .downloadService artifactEndpoint revisionEndpoint
| none, none =>
return {apiEndpoint? := some ws.lakeEnv.reservoirApiUrl}
return .reservoirService ws.lakeEnv.reservoirApiUrl
| some artifactEndpoint, none =>
error (invalidEndpointConfig artifactEndpoint "")
| none, some revisionEndpoint =>
error (invalidEndpointConfig "" revisionEndpoint)
if let some remoteScope := opts.scope? then
let service := if service.apiEndpoint?.isNone then service else {
artifactEndpoint := s!"{ws.lakeEnv.reservoirApiUrl}/artifacts"
revisionEndpoint := s!"{ws.lakeEnv.reservoirApiUrl}/outputs"
}
let map ← getOutputs cache service ws.root remoteScope opts
service.downloadOutputArtifacts map cache remoteScope ws.root.cacheScope opts.forceDownload
else if service.apiEndpoint?.isSome then -- Reservoir
if !opts.repoScope && service.isReservoir then
-- `--scope` with Reservoir would imply downloading artifacts for a different package.
-- This is likely user error (they meant `--repo`) rather than something actually useful.
error "to use `cache get` with `--scope`, a custom endpoint must be set (not Reservoir); \
if you instead want to download artifacts for a fork of the package, use `--repo`"
let service := service.withRepoScope opts.repoScope
let pkg := ws.root
let repo := GitRepo.mk pkg.dir
let platform := cachePlatform pkg platform
let toolchain := cacheToolchain pkg toolchain
let map ← id do
if let some rev := opts.rev? then
let rev ← repo.resolveRevision rev
let some map ← service.downloadRevisionOutputs? rev cache pkg.cacheScope remoteScope platform toolchain
| error s!"{remoteScope}: outputs not found for revision {rev}"
return map
else
findOutputs cache service pkg remoteScope opts platform toolchain
service.downloadOutputArtifacts map cache pkg.cacheScope remoteScope opts.forceDownload
else if service.isReservoir then
-- TODO: Parallelize?
let ok ← ws.packages.foldlM (start := 1) (init := true) (m := LoggerIO) fun ok pkg => do
try
if pkg.scope.isEmpty then
logInfo s!"{pkg.name}: skipping non-Reservoir dependency`"
else
let platform := cachePlatform pkg platform
let toolchain := cacheToolchain pkg toolchain
let remoteScope := s!"{pkg.scope}/{pkg.name.toString (escape := false)}"
let map ← getOutputs cache service pkg remoteScope opts
service.downloadOutputArtifacts map cache remoteScope pkg.cacheScope opts.forceDownload
let map ← findOutputs cache service pkg remoteScope opts platform toolchain
service.downloadOutputArtifacts map cache pkg.cacheScope remoteScope opts.forceDownload
return ok
catch _ =>
return false
unless ok do
error "failed to download artifacts for some dependencies"
else
error "to use `cache get` with a custom endpoint, the `--scope` option must be set"
error "to use `cache get` with a custom endpoint, the `--scope` or `--repo` option must be set"
where
invalidEndpointConfig artifactEndpoint revisionEndpoint :=
s!"invalid endpoint configuration:\
@ -392,45 +453,42 @@ where
\n LAKE_CACHE_REVISION_ENDPOINT={revisionEndpoint}\n\
To use `cache get` with a custom endpoint, both environment variables \
must be set to non-empty strings. To use Reservoir, neither should be set."
getOutputs cache service pkg remoteScope opts : LoggerIO CacheMap := do
findOutputs cache service pkg remoteScope opts platform toolchain : LoggerIO CacheMap := do
let repo := GitRepo.mk pkg.dir
if let some rev := opts.rev? then
let rev ← repo.resolveRevision rev
let some map ← service.downloadRevisionOutputs? rev cache remoteScope
| error s!"{remoteScope}: outputs not found for revision {rev}"
return map
else
if (← repo.hasDiff) then
logWarning s!"{pkg.name}: package has changes; \
only artifacts for committed code will be downloaded"
if .warning ≤ opts.failLv then
failure
let n := opts.maxRevs
let revs ← repo.getHeadRevisions n
let map? ← revs.findSomeM? fun rev =>
service.downloadRevisionOutputs? rev cache remoteScope
let some map := map?
| let revisions :=
if n = 0 || revs.size < n then "for any revision" else s!"in {n} revisions from HEAD"
error s!"{remoteScope}: no outputs found {revisions}"
return map
if (← repo.hasDiff) then
logWarning s!"{pkg.name}: package has changes; \
only artifacts for committed code will be downloaded"
if .warning ≤ opts.failLv then
failure
let n := opts.maxRevs
let revs ← repo.getHeadRevisions n
let map? ← revs.findSomeM? fun rev =>
service.downloadRevisionOutputs? rev cache pkg.cacheScope remoteScope platform toolchain
let some map := map?
| let revisions :=
if n = 0 || revs.size < n then "for any revision" else s!"in {n} revisions from HEAD"
error s!"{remoteScope}: no outputs found {revisions}"
return map
protected def put : CliM PUnit := do
processOptions lakeOption
let file ← takeArg "mappings"
let opts ← getThe LakeOptions
let some scope := opts.scope?
| error "the `--scope` option must be set for `cache put`"
| error "the `--scope` or `--repo` option must be set for `cache put`"
noArgsRem do
let cfg ← mkLoadConfig opts
let ws ← loadWorkspace cfg
let pkg := ws.root
let platform := cachePlatform pkg (opts.platform?.getD System.Platform.target)
let toolchain := cacheToolchain pkg (opts.toolchain?.getD ws.lakeEnv.toolchain)
let service : CacheService ← id do
match ws.lakeEnv.cacheKey?, ws.lakeEnv.cacheArtifactEndpoint?, ws.lakeEnv.cacheRevisionEndpoint? with
| some key, some artifactEndpoint, some revisionEndpoint =>
return {key, artifactEndpoint, revisionEndpoint}
return .uploadService key artifactEndpoint revisionEndpoint
| key?, artifactEndpoint?, revisionEndpoint? =>
error (invalidEndpointConfig key? artifactEndpoint? revisionEndpoint?)
let pkg := ws.root
let service := service.withRepoScope opts.repoScope
let repo := GitRepo.mk pkg.dir
if (← repo.hasDiff) then
logWarning s!"{pkg.name}: package has changes; \
@ -444,7 +502,7 @@ protected def put : CliM PUnit := do
service.uploadArtifacts ⟨descrs, rfl⟩ paths scope
-- Mappings are uploaded after artifacts to allow downloads to assume that
-- if the mappings exist, the artifacts should also exist
service.uploadRevisionOutputs rev file scope
service.uploadRevisionOutputs rev file scope platform toolchain
where
invalidEndpointConfig key? artifactEndpoint? revisionEndpoint? :=
s!"invalid endpoint configuration:\

View file

@ -9,6 +9,7 @@ prelude
public import Lake.Util.Log
public import Lake.Config.Artifact
public import Lake.Build.Trace
import Lake.Config.InstallPath
import Lake.Build.Actions
import Lake.Util.Url
import Lake.Util.Proc
@ -19,6 +20,8 @@ open Lean System
namespace Lake
/-! ## Cache Map -/
/--
Maps an input hash to a structure of output artifact content hashes.
@ -180,6 +183,8 @@ where go as o := do
end CacheMap
/-! ## Cache Ref -/
/-- A mutable reference to a `CacheMap`. -/
public abbrev CacheRef := IO.Ref CacheMap
@ -198,6 +203,8 @@ public def insert [ToJson α] (inputHash : Hash) (val : α) (cache : CacheRef) :
end CacheRef
/-! ## Local Cache -/
/-- The Lake cache directory. -/
public structure Cache where
dir : FilePath
@ -291,6 +298,8 @@ public def readOutputs? (cache : Cache) (scope : String) (inputHash : Hash) : Lo
end Cache
/-! ## Remote Cache Service -/
/-- Uploads a file to an online bucket using the Amazon S3 protocol. -/
def uploadS3
(file : FilePath) (contentType : String) (url : String) (key : String)
@ -308,31 +317,79 @@ def uploadS3
/--
Configuration of a remote cache service (e.g., Reservoir or an S3 bucket).
All fields are not required to be valid.
A user should set at least the necessary ones for a given function.
A given configuration is not required to support all cache service functions, and no validation
of the configuration is performed by its operations. Users should construct a service that supports
the desired functions by using `CacheService`'s smart constructors
(e.g., `reservoir`, `uploadService`).
-/
public structure CacheService where
key : String := ""
artifactEndpoint : String := ""
revisionEndpoint : String := ""
/-- Reservoir API endpoint. -/
apiEndpoint? : Option String := none
private mk ::
/- S3 Bucket -/
private key : String := ""
private artifactEndpoint : String := ""
private revisionEndpoint : String := ""
/- Reservoir -/
/-- Is this a Reservoir cache service configuration? -/
isReservoir : Bool := false
private apiEndpoint : String := ""
/-- Whether interpret the scope as a repository or not. -/
private repoScope : Bool := false
namespace CacheService
/-! ### Constructors -/
/-- Constructs a `CacheService` for a Reservoir endpoint. -/
@[inline] public def reservoirService (apiEndpoint : String) (repoScope := false) : CacheService :=
{isReservoir := true, apiEndpoint, repoScope}
/-- Constructs a `CacheService` to upload artifacts and/or outputs to an S3 endpoint. -/
@[inline] public def uploadService (key artifactEndpoint revisionEndpoint : String) : CacheService :=
{key, artifactEndpoint, revisionEndpoint}
/-- Constructs a `CacheService` to download artifacts and/or outputs from to an S3 endpoint. -/
@[inline] public def downloadService (artifactEndpoint revisionEndpoint : String) : CacheService :=
{artifactEndpoint, revisionEndpoint}
/-- Constructs a `CacheService` to download just artifacts from to an S3 endpoint. -/
@[inline] public def downloadArtsService (artifactEndpoint : String) : CacheService :=
{artifactEndpoint}
/--
Reconfigures the cache service to interpret scopes as repositories (or not if `false`).
For custom endpoints, if `true`, Lake wil augment the provided scope with
toolchain and platform information in a manner similar to Reservoir.
-/
@[inline] public def withRepoScope (service : CacheService) (repoScope := true) : CacheService :=
{service with repoScope}
/-! ### Artifact Transfer -/
/-- The MIME type of Lake/Reservoir artifact. -/
public def artifactContentType : String := "application/vnd.reservoir.artifact"
public def artifactUrl (contentHash : Hash) (scope : String) (service : CacheService) : String :=
let scope := "/".intercalate <| scope.split (· == '/') |>.map uriEncode
if let some apiEndpoint := service.apiEndpoint? then
s!"{apiEndpoint}/packages/{scope}/artifacts/{contentHash.hex}.art"
private def appendScope (endpoint : String) (scope : String) : String :=
scope.split (· == '/') |>.foldl (init := endpoint) fun s component =>
uriEncode component s |>.push '/'
private def s3ArtifactUrl (contentHash : Hash) (service : CacheService) (scope : String) : String :=
appendScope s!"{service.artifactEndpoint}/" scope ++ s!"{contentHash.hex}.art"
public def artifactUrl (contentHash : Hash) (service : CacheService) (scope : String) : String :=
if service.isReservoir then
let endpoint :=
if service.repoScope then
s!"{service.apiEndpoint}/repositories/"
else
s!"{service.apiEndpoint}/packages/"
appendScope endpoint scope ++ s!"artifacts/{contentHash.hex}.art"
else
s!"{service.artifactEndpoint}/{scope}/{contentHash.hex}.art"
service.s3ArtifactUrl contentHash scope
public def downloadArtifact
(descr : ArtifactDescr) (cache : Cache) (scope : String)
(service : CacheService) (force := false)
(descr : ArtifactDescr) (cache : Cache)
(service : CacheService) (scope : String) (force := false)
: LoggerIO Unit := do
let url := service.artifactUrl descr.hash scope
let path := cache.artifactDir / descr.relPath
@ -345,13 +402,13 @@ public def downloadArtifact
download url path
let hash ← computeFileHash path
if hash != descr.hash then
logError s!"{path}: downloaded artifact does not have expect hash"
logError s!"{path}: downloaded artifact does not have the expected hash"
IO.FS.removeFile path
failure
public def downloadArtifacts
(descrs : Array ArtifactDescr) (cache : Cache) (scope : String)
(service : CacheService) (force := false)
(descrs : Array ArtifactDescr) (cache : Cache)
(service : CacheService) (scope : String) (force := false)
: LoggerIO Unit := do
let ok ← descrs.foldlM (init := true) fun ok descr =>
try
@ -363,17 +420,17 @@ public def downloadArtifacts
error s!"{scope}: failed to download some artifacts"
public def downloadOutputArtifacts
(map : CacheMap) (cache : Cache) (remoteScope localScope : String)
(service : CacheService) (force := false)
(map : CacheMap) (cache : Cache) (service : CacheService)
(localScope remoteScope : String) (force := false)
: LoggerIO Unit := do
cache.writeMap localScope map
let descrs ← map.collectOutputDescrs
service.downloadArtifacts descrs cache remoteScope force
public def uploadArtifact
(contentHash : Hash) (art : FilePath) (scope : String) (service : CacheService)
(contentHash : Hash) (art : FilePath) (service : CacheService) (scope : String)
: LoggerIO Unit := do
let url := service.artifactUrl contentHash scope
let url := service.s3ArtifactUrl contentHash scope
logInfo s!"\
{scope}: uploading artifact {contentHash}\
\n local path: {art}\
@ -381,35 +438,62 @@ public def uploadArtifact
uploadS3 art artifactContentType url service.key
public def uploadArtifacts
(descrs : Vector ArtifactDescr n) (paths : Vector FilePath n) (scope : String) (service : CacheService)
: LoggerIO Unit := n.forM fun n h => uploadArtifact descrs[n].hash paths[n] scope service
(descrs : Vector ArtifactDescr n) (paths : Vector FilePath n)
(service : CacheService) (scope : String)
: LoggerIO Unit := n.forM fun n h => service.uploadArtifact descrs[n].hash paths[n] scope
/-! ### Output Transfer -/
/-- The MIME type of Lake/Reservoir input-to-output mappings for a Git revision. -/
public def mapContentType : String := "application/vnd.reservoir.outputs+json-lines"
private def s3RevisionUrl
(rev : String) (service : CacheService) (scope : String)
(platform : String := "") (toolchain : String := "")
: String := Id.run do
let mut url := appendScope s!"{service.revisionEndpoint}/" scope
if service.repoScope then
unless platform.isEmpty do
url := uriEncode platform s!"{url}pt/" |>.push '/'
unless toolchain.isEmpty do
url := uriEncode (toolchain2Dir toolchain).toString s!"{url}tc/" |>.push '/'
return s!"{url}{rev}.jsonl"
public def revisionUrl (rev : String) (scope : String) (service : CacheService) : String :=
let scope := "/".intercalate <| scope.split (· == '/') |>.map uriEncode
if let some apiEndpoint := service.apiEndpoint? then
s!"{apiEndpoint}/packages/{scope}/build-outputs?rev={rev}"
public def revisionUrl
(rev : String) (service : CacheService) (scope : String)
(platform : String := "") (toolchain : String := "")
: String :=
if service.isReservoir then Id.run do
let mut url := service.apiEndpoint
if service.repoScope then
url := url ++ "/repositories/"
else
url := url ++ "/packages/"
url := appendScope url scope ++ s!"build-outputs?rev={rev}"
unless platform.isEmpty do
url := uriEncode platform s!"{url}&platform="
unless toolchain.isEmpty do
url := uriEncode toolchain s!"{url}&toolchain="
return url
else
s!"{service.revisionEndpoint}/{scope}/{rev}.jsonl"
service.s3RevisionUrl rev scope platform toolchain
public def downloadRevisionOutputs?
(rev : String) (cache : Cache) (scope : String)
(service : CacheService) (force := false)
(rev : String) (cache : Cache) (service : CacheService) (localScope remoteScope : String)
(platform : String := "") (toolchain : String := "") (force := false)
: LoggerIO (Option CacheMap) := do
let path := cache.revisionPath rev scope
-- TODO: toolchain-scoped revision paths for system cache?
let path := cache.revisionPath rev localScope
if (← path.pathExists) && !force then
return ← CacheMap.load path
let url := service.revisionUrl rev scope
let url := service.revisionUrl rev remoteScope platform toolchain
logInfo s!"\
{scope}: downloading build outputs for revision {rev}\
{localScope}: downloading build outputs for revision {rev}\
\n local path: {path}\
\n remote URL: {url}"
let headers := if service.apiEndpoint?.isSome then Reservoir.lakeHeaders else {}
let headers := if service.isReservoir then Reservoir.lakeHeaders else {}
let contents? ← try getUrl? url headers catch e =>
logError s!"{scope}: output lookup failed"
logError s!"{remoteScope}: output lookup failed"
throw e
let some contents := contents?
| return none
@ -418,9 +502,10 @@ public def downloadRevisionOutputs?
CacheMap.load path
public def uploadRevisionOutputs
(rev : String) (outputs : FilePath) (scope : String) (service : CacheService)
(rev : String) (outputs : FilePath) (service : CacheService) (scope : String)
(platform : String := "") (toolchain : String := "")
: LoggerIO Unit := do
let url := service.revisionUrl rev scope
let url := service.s3RevisionUrl rev scope platform toolchain
logInfo s!"\
{scope}: uploading build outputs for revision {rev}\
\n local path: {outputs}\

View file

@ -213,6 +213,10 @@ namespace Package
@[inline] public def platformIndependent (self : Package) : Option Bool :=
self.config.platformIndependent
/-- Whether the package's has been configured with `platformIndependent = true`. -/
@[inline] public def isPlatformIndependent (self : Package) : Bool :=
self.config.platformIndependent == some true
/-- The package's `releaseRepo`/`releaseRepo?` configuration. -/
@[inline] public def releaseRepo? (self : Package) : Option String :=
self.config.releaseRepo

View file

@ -96,8 +96,8 @@ public def uriEncodeChar (c : Char) (s := "") : String :=
uriEscapeChar c s
/-- Encodes a string as an ASCII URI component, escaping special characters (and unicode). -/
public def uriEncode (s : String) : String :=
s.foldl (init := "") fun s c => uriEncodeChar c s
public def uriEncode (s : String) (init := "") : String :=
s.foldl (init := init) fun s c => uriEncodeChar c s
/--
Performs a HTTP `GET` request of a URL (using `curl` with JSON output) and, if successful,
@ -124,7 +124,7 @@ public def getUrl?
else if code == 404 then
return none
else
error s!"failed to GET URL, error {code}"
error s!"failed to GET URL, error {code}; received:\n{out.stdout}"
| .error e =>
error s!"curl produced invalid JSON output: {e}"

View file

@ -12,7 +12,6 @@ public import Lake.Util.Date
/-! # Version
This module contains useful definitions for manipulating versions.
It also defines a `v!"<ver>"` syntax for version literals.
-/
open System Lean
@ -111,20 +110,34 @@ public instance : ToString StdVer := ⟨StdVer.toString⟩
public instance : ToJson StdVer := ⟨(·.toString)⟩
public instance : FromJson StdVer := ⟨(do StdVer.parse <| ← fromJson? ·)⟩
/-- A Lean toolchain version. -/
public inductive ToolchainVer
| release (ver : LeanVer)
| nightly (date : Date)
| pr (no : Nat)
| other (name : String)
deriving Repr, DecidableEq
public instance : Coe LeanVer ToolchainVer := ⟨ToolchainVer.release⟩
/-- The `elan` toolchain file name (i.e., `lean-toolchain`). -/
public def toolchainFileName : FilePath := "lean-toolchain"
public def ToolchainVer.defaultOrigin := "leanprover/lean4"
public def ToolchainVer.prOrigin := "leanprover/lean4-pr-releases"
public def ToolchainVer.ofString (ver : String) : ToolchainVer := Id.run do
public section -- for `@[computed_field]`
open ToolchainVer in
/-- A Lean toolchain version. -/
inductive ToolchainVer
| release (ver : LeanVer)
| nightly (date : Date)
| pr (n : Nat)
| other (v : String)
with
@[computed_field] toString : ToolchainVer → String
| .release ver => s!"{defaultOrigin}:v{ver}"
| .nightly date => s!"{defaultOrigin}:nightly-{date}"
| .pr n => s!"{prOrigin}:pr-release-{n}"
| .other v => v
deriving Repr, DecidableEq
end
namespace ToolchainVer
public instance : Coe LeanVer ToolchainVer := ⟨ToolchainVer.release⟩
public def ofString (ver : String) : ToolchainVer := Id.run do
let colonPos := ver.posOf ':'
let (origin, tag) :=
if h : colonPos < ver.endPos then
@ -132,27 +145,29 @@ public def ToolchainVer.ofString (ver : String) : ToolchainVer := Id.run do
(ver.extract 0 colonPos, ver.extract pos ver.endPos)
else
("", ver)
let noOrigin := origin.isEmpty
if tag.startsWith "v" then
if let .ok ver := StdVer.parse (tag.drop 1) then
if origin.isEmpty || origin == defaultOrigin then
if noOrigin|| origin == defaultOrigin then
return .release ver
return .other ver
else if tag.startsWith "nightly-" then
if let some date := Date.ofString? (tag.drop "nightly-".length) then
if origin.isEmpty || origin == defaultOrigin then
else if let some date := tag.dropPrefix? "nightly-" then
if let some date := Date.ofString? date.toString then
if noOrigin then
return .nightly date
else if tag.startsWith "pr-release-" then
if let some n := (tag.drop "pr-release-".length).toNat? then
if origin.isEmpty || origin == prOrigin then
else if let some suffix := origin.dropPrefix? defaultOrigin then
if suffix.isEmpty || suffix == "-nightly".toSubstring then
return .nightly date
else if let some n := tag.dropPrefix? "pr-release-" then
if let some n := n.toNat? then
if noOrigin || origin == prOrigin then
return .pr n
else
if let .ok ver := StdVer.parse ver then
if origin.isEmpty || origin == defaultOrigin then
return .release ver
else if let .ok ver := StdVer.parse ver then
if noOrigin || origin == defaultOrigin then
return .release ver
return .other ver
/-- Parse a toolchain from a `lean-toolchain` file. -/
public def ToolchainVer.ofFile? (toolchainFile : FilePath) : IO (Option ToolchainVer) := do
public def ofFile? (toolchainFile : FilePath) : IO (Option ToolchainVer) := do
try
let toolchainString ← IO.FS.readFile toolchainFile
return some <| ToolchainVer.ofString toolchainString.trim
@ -161,61 +176,41 @@ public def ToolchainVer.ofFile? (toolchainFile : FilePath) : IO (Option Toolchai
return none
| e => throw e
/-- The `elan` toolchain file name (i.e., `lean-toolchain`). -/
public def toolchainFileName : FilePath := "lean-toolchain"
/-- Parse a toolchain from the `lean-toolchain` file of the directory `dir`. -/
@[inline] public def ToolchainVer.ofDir? (dir : FilePath) : IO (Option ToolchainVer) :=
@[inline] public def ofDir? (dir : FilePath) : IO (Option ToolchainVer) :=
ToolchainVer.ofFile? (dir / toolchainFileName)
public protected def ToolchainVer.toString (ver : ToolchainVer) : String :=
match ver with
| .release ver => s!"{defaultOrigin}:v{ver}"
| .nightly date => s!"{defaultOrigin}:nightly-{date}"
| .pr n => s!"{prOrigin}:pr-release-{n}"
| .other s => s
public instance : ToString ToolchainVer := ⟨ToolchainVer.toString⟩
public instance : ToJson ToolchainVer := ⟨(·.toString)⟩
public instance : FromJson ToolchainVer := ⟨(ToolchainVer.ofString <$> fromJson? ·)⟩
@[expose] public def ToolchainVer.lt (a b : ToolchainVer) : Prop :=
public def blt (a b : ToolchainVer) : Bool :=
match a, b with
| .release v1, .release v2 => v1 < v2
| .nightly d1, .nightly d2 => d1 < d2
| _, _ => False
| _, _ => false
public instance : LT ToolchainVer := ⟨ToolchainVer.lt⟩
public instance : LT ToolchainVer := ⟨(·.blt ·)⟩
public instance decLt (a b : ToolchainVer) : Decidable (a < b) :=
decidable_of_bool (a.blt b) Iff.rfl
public instance ToolchainVer.decLt (a b : ToolchainVer) : Decidable (a < b) :=
match a, b with
| .release v1, .release v2 => inferInstanceAs (Decidable (v1 < v2))
| .nightly d1, .nightly d2 => inferInstanceAs (Decidable (d1 < d2))
| .release _, .pr _ | .release _, .nightly _ | .release _, .other _
| .nightly _, .release _ | .nightly _, .pr _ | .nightly _, .other _
| .pr _, _ | .other _, _ => .isFalse (by simp [LT.lt, ToolchainVer.lt])
@[expose] public def ToolchainVer.le (a b : ToolchainVer) : Prop :=
public def ble (a b : ToolchainVer) : Bool :=
match a, b with
| .release v1, .release v2 => v1 ≤ v2
| .nightly d1, .nightly d2 => d1 ≤ d2
| .pr n1, .pr n2 => n1 = n2
| .other v1, .other v2 => v1 = v2
| _, _ => False
| _, _ => false
public instance : LE ToolchainVer := ⟨ToolchainVer.le⟩
public instance : LE ToolchainVer := ⟨(·.ble ·)⟩
public instance decLe (a b : ToolchainVer) : Decidable (a ≤ b) :=
decidable_of_bool (a.ble b) Iff.rfl
public instance ToolchainVer.decLe (a b : ToolchainVer) : Decidable (a ≤ b) :=
match a, b with
| .release v1, .release v2 => inferInstanceAs (Decidable (v1 ≤ v2))
| .nightly d1, .nightly d2 => inferInstanceAs (Decidable (d1 ≤ d2))
| .pr n1, .pr n2 => inferInstanceAs (Decidable (n1 = n2))
| .other v1, .other v2 => inferInstanceAs (Decidable (v1 = v2))
| .release _, .pr _ | .release _, .nightly _ | .release _, .other _
| .nightly _, .release _ | .nightly _, .pr _ | .nightly _, other _
| .pr _, .release _ | .pr _, .nightly _ | .pr _, .other _
| .other _, .release _ | .other _, .nightly _ | .other _, .pr _ =>
.isFalse (by simp [LE.le, ToolchainVer.le])
end ToolchainVer
/-- Converts a toolchain version to its normal form (e.g., with an origin). -/
public def normalizeToolchain (s : String) : String :=
ToolchainVer.ofString s |>.toString
/-- Parses a version from a string. -/
public class DecodeVersion (α : Type u) where

View file

@ -1,4 +1,4 @@
gi#!/usr/bin/env bash
#!/usr/bin/env bash
source ../common.sh
./clean.sh
@ -42,28 +42,41 @@ init_git
TEST_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
export LAKE_CACHE_DIR="$TEST_DIR/.lake/cache"
# Ensure that Lake is run without a toolchain name
# (so toolchain does not end up in the cache)
export ELAN_TOOLCHAIN=
echo "# TESTS"
# Test `--repo` validation
test_err "must contain exactly one '/'" cache get --repo='invalid'
test_err 'invalid characters in repository name' cache get --repo='!/invalid'
# Test `cache get` command errors for bad configurations
with_cdn_endpoints test_err 'the `--scope` option must be set' cache get
test_err 'the `--platform` and `--toolchain` options do nothing' \
cache get bogus.jsonl --scope='bogus' --platform='bogus' --wfail
test_err 'the `--platform` and `--toolchain` options do nothing' \
cache get bogus.jsonl --scope='bogus' --toolchain='bogus' --wfail
test_err 'a custom endpoint must be set (not Reservoir)' cache get --scope='bogus'
with_cdn_endpoints test_err 'the `--scope` or `--repo` option must be set' cache get
LAKE_CACHE_ARTIFACT_ENDPOINT=bogus test_err 'both environment variables must be set' cache get
LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'both environment variables must be set' cache get
# Test `cache put` command errors for bad configurations
test_err 'the `--scope` option must be set' cache put bogus.jsonl
with_upload_endpoints test_err 'the `--scope` or `--repo` option must be set' cache put bogus.jsonl
test_err 'these environment variables must be set' \
cache put bogus.jsonl --scope='!/bogus'
cache put bogus.jsonl --scope='bogus'
LAKE_CACHE_KEY= test_err 'these environment variables must be set' \
cache put bogus.jsonl --scope='!/bogus'
cache put bogus.jsonl --scope='bogus'
LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'these environment variables must be set' \
cache put bogus.jsonl --scope='!/bogus'
cache put bogus.jsonl --scope='bogus'
LAKE_CACHE_REVISION_ENDPOINT=bogus test_err 'these environment variables must be set' \
cache put bogus.jsonl --scope='!/bogus'
cache put bogus.jsonl --scope='bogus'
# Test revision failure
# Test revision failure (with Reservoir)
REV=$(git rev-parse HEAD)
test_err "revision not found" cache get --scope='!/bogus' --rev='bogus'
test_err "outputs not found for revision" cache get --scope='!/bogus' --rev=$REV
test_err "revision not found" cache get --repo='leanprover/bogus' --rev='bogus'
test_err "outputs not found for revision" cache get --repo='leanprover/bogus' --rev=$REV
# Test `cache get` skipping non-Reservoir dependencies
test_run -f non-reservoir.toml update
@ -91,12 +104,24 @@ with_cdn_endpoints test_not_out "downloading" cache get --scope='!/test'
# Test `--force-download`
with_cdn_endpoints test_out "downloading" cache get --scope='!/test' --force-download
# Test cache put/get with a set platform/toolchain
with_upload_endpoints test_run cache put .lake/outputs.jsonl \
--repo='leanprover/test' --platform=foo --toolchain=bar
test_cmd rm -rf .lake/build "$LAKE_CACHE_DIR"
with_cdn_endpoints test_run cache get \
--repo='leanprover/test' --platform=foo --toolchain=bar
test_run build +Test --no-build
test_cmd rm -rf .lake/build "$LAKE_CACHE_DIR"
test_run cache get \
--repo='leanprover/test' --platform=foo --toolchain=bar
test_run build +Test --no-build
# Test dirty work tree warnings
test_cmd touch Ignored.lean
test_cmd git add -f Ignored.lean
with_upload_endpoints test_err "package has changes" --wfail \
cache put .lake/outputs.jsonl --scope='!/test'
test_err "package has changes" --wfail cache get --scope='!/test'
cache put .lake/outputs.jsonl --repo='leanprover/bogus'
test_err "package has changes" --wfail cache get --repo='leanprover/bogus'
test_cmd git commit -m "v2"
# Test revision search
@ -112,7 +137,7 @@ test_out "the artifact cache is not enabled for this package" \
LAKE_ARTIFACT_CACHE=true test_run -d .lake/packages/Cli \
build -o "$TEST_DIR/.lake/cli-outputs.jsonl" --no-build
with_upload_endpoints test_run -d .lake/packages/Cli \
cache put "$TEST_DIR/.lake/cli-outputs.jsonl" --scope "leanprover/lean4-cli"
cache put "$TEST_DIR/.lake/cli-outputs.jsonl" --repo=leanprover/lean4-cli
test_cmd rm -rf .lake/packages/Cli/.lake/build "$LAKE_CACHE_DIR"
test_fails -f reservoir.toml build @Cli --no-build
test_err "failed to download artifacts for some dependencies" \
@ -120,7 +145,7 @@ test_err "failed to download artifacts for some dependencies" \
test_run -f reservoir.toml cache get --max-revs=1
test_run -f reservoir.toml build @Cli --no-build
# Test Reservoir with `--scope`/`--repo` uses GitHub scope
# Test Reservoir with `--repo` uses GitHub scope
test_cmd rm -rf .lake/cache
test_run -d .lake/packages/Cli cache get --repo=leanprover/lean4-cli --max-revs=1
test_run -d .lake/packages/Cli build --no-build