Conversation
|
Review requested:
|
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
Nice! This is a great addition. Since it's such a large PR, this will take me some time to review. Will try to tackle it over the next week. |
| */ | ||
| existsSync(path) { | ||
| // Prepend prefix to path for VFS lookup | ||
| const fullPath = this.#prefix + (StringPrototypeStartsWith(path, '/') ? path : '/' + path); |
| validateObject(files, 'options.files'); | ||
| } | ||
|
|
||
| const { VirtualFileSystem } = require('internal/vfs/virtual_fs'); |
There was a problem hiding this comment.
Shouldn't we import this at the top level / lazy load it at the top level?
| ArrayPrototypePush(this.#mocks, { | ||
| __proto__: null, | ||
| ctx, | ||
| restore: restoreFS, |
There was a problem hiding this comment.
| restore: restoreFS, | |
| restore: ctx.restore, |
nit
lib/internal/vfs/entries.js
Outdated
| * @param {object} [options] Optional configuration | ||
| */ | ||
| addFile(name, content, options) { | ||
| const path = this._directory.path + '/' + name; |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const dirPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const parentPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
| } | ||
| callback(null, content); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
lib/internal/vfs/virtual_fs.js
Outdated
| const bytesToRead = Math.min(length, available); | ||
| content.copy(buffer, offset, readPos, readPos + bytesToRead); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
|
|
||
| callback(null, bytesToRead, buffer); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
|
Left an initial review, but like @Ethan-Arrowood said, it'll take time for a more in depth look |
|
It's nice to see some momentum in this area, though from a first glance it seems the design has largely overlooked the feedback from real world use cases collected 4 years ago: https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md - I think it's worth checking that the API satisfies the constraints that users of this feature have provided, to not waste the work that have been done by prior contributors to gather them, or having to reinvent it later (possibly in a breaking manner) to satisfy these requirements from real world use cases. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #61478 +/- ##
==========================================
+ Coverage 89.68% 89.82% +0.13%
==========================================
Files 676 692 +16
Lines 206555 215352 +8797
Branches 39552 41190 +1638
==========================================
+ Hits 185249 193438 +8189
- Misses 13444 14027 +583
- Partials 7862 7887 +25
🚀 New features to boost your workflow:
|
|
And why not something like OPFS aka whatwg/fs? const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.mount('/app', rootHandle) // to make it work with fs
fs.readFileSync('/app/config.json')OR const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.readFileSync('sandbox:/config.json')fs.createVirtual seems like something like a competing specification |
5e317de to
977cc3d
Compare
I generally prefer not to interleave with WHATWG specs as much as possible for core functionality (e.g., SEA). In my experience, they tend to perform poorly on our codebase and remove a few degrees of flexibility. (I also don't find much fun in working on them, and I'm way less interested in contributing to that.) On an implementation side, the core functionality of this feature will be identical (technically, it's missing writes that OPFS supports), as we would need to impact all our internal fs methods anyway. If this lands, we can certainly iterate on a WHATWG-compatible API for this, but I would not add this to this PR. |
|
Small prior art: https://github.com/juliangruber/subfs |
8d711c1 to
73c18cd
Compare
|
I also worked on this a bit on the side recently: Qard@73b8fc6 That is very much in chaotic ideation stage with a bunch of LLM assistance to try some different ideas, but the broader concept I was aiming for was to have a module.exports = new VirtualFileSystem(new LocalProvider())I intended for it to be extensible for a bunch of different interesting scenarios, so there's also an S3 provider and a zip file provider there, mainly just to validate that the model can be applied to other varieties of storage systems effectively. Keep in mind, like I said, the current state is very much just ideation in a branch I pushed up just now to share, but I think there are concepts for extensibility in there that we could consider to enable a whole ecosystem of flexible storage providers. 🙂 Personally, I would hope for something which could provide both read and write access through an abstraction with swappable backends of some variety, this way we could pass around these virtualized file systems like objects and let an ecosystem grow around accepting any generalized virtual file system for its storage backing. I think it'd be very nice for a lot of use cases like file uploads or archive management to be able to just treat them like any other readable and writable file system. |
just a bit off topic... but this reminds me of why i created this feature request: Would not lie, it would be cool if NodeJS also provided some type of static example that would only work in NodeJS (based on how it works internally) const size = 26
const blobPart = BlobFrom({
size,
stream (start, end) {
// can either be sync or async (that resolves to a ReadableStream)
// return new Response('abcdefghijklmnopqrstuvwxyz'.slice(start, end)).body
// return new Blob(['abcdefghijklmnopqrstuvwxyz'.slice(start, end)]).stream()
return fetch('https://httpbin.dev/range/' + size, {
headers: {
range: `bytes=${start}-${end - 1}`
}
}).then(r => r.body)
}
})
blobPart.text().then(text => {
console.log('a-z', text)
})
blobPart.slice(-3).text().then(text => {
console.log('x-z', text)
})
const a = blobPart.slice(0, 6)
a.text().then(text => {
console.log('a-f', text)
})
const b = a.slice(2, 4)
b.text().then(text => {
console.log('c-d', text)
})An actual working PoC(I would not rely on this unless it became officially supported by nodejs core - this is a hack) const blob = new Blob()
const symbols = Object.getOwnPropertySymbols(blob)
const blobSymbol = symbols.map(s => [s.description, s])
const symbolMap = Object.fromEntries(blobSymbol)
const {
kHandle,
kLength,
} = symbolMap
function BlobFrom ({ size, stream }) {
const blob = new Blob()
if (size === 0) return blob
blob[kLength] = size
blob[kHandle] = {
span: [0, size],
getReader () {
const [start, end] = this.span
if (start === end) {
return { pull: cb => cb(0) }
}
let reader
return {
async pull (cb) {
reader ??= (await stream(start, end)).getReader()
const {done, value} = await reader.read()
cb(done ^ 1, value)
}
}
},
slice (start, end) {
const [baseStart] = this.span
return {
span: [baseStart + start, baseStart + end],
getReader: this.getReader,
slice: this.slice,
}
}
}
return blob
}currently problematic to do: also need to handle properly clone, serialize & deserialize, if this where to be sent of to another worker - then i would transfer a MessageChannel where the worker thread asks main frame to hand back a transferable ReadableStream when it needs to read something. but there are probably better ways to handle this internally in core with piping data directly to and from different destinations without having to touch the js runtime? - if only getReader could return the reader directly instead of needing to read from the ReadableStream using js? |
|
I'll let others decide. I hope the comments provided here express my point of view succinctly, so I'll try avoid thrashing this PR further :-) |
P0 fixes: - setup.js: use fileURLToPath() instead of path.pathname for file: URLs - file_handle.js: enforce read/write permissions and exclusive flags - memory.js: prevent auto-creation of parent dirs in openSync - memory.js: validate destination before removing source in renameSync P1 fixes: - memory.js: convert numeric O_* flags to string equivalents - dir.js: add callback support to read() and close() - watcher.js: fix unwatchFile leak by clearing internal listeners set - provider.js: pass options.flag through writeFile/appendFile - memory.js: implement recursive readdir - watcher.js: poll children for directory watch, rescan for new files - streams.js: support fd and start options in read/write streams P2 fixes: - provider.js: check R_OK/W_OK/X_OK permission bits in access() - provider.js: check COPYFILE_EXCL in copyFile() - stats.js: add createZeroStats() without S_IFREG for watchFile - errors.js: add createEACCES error factory
Extract common patterns into shared helpers to reduce repetition in the VFS fs-handler registration code: - toPathStr(): replaces ~40 occurrences of URL-to-string resolution - findVFSWith/findVFSWithAsync(): generic exists+try/catch+mounted lookup, replacing 11 nearly-identical findVFSFor* functions - noopPathSync/noopFdSync: shared no-op handlers for chmod/chown/ utimes-style operations (13 handlers consolidated) Also fixes .pathname usage in rename/copyFile/link handlers to use fileURLToPath() for correct percent-encoding handling. Reduces setup.js from 1524 to 1080 lines (-29%).
This comment has been minimized.
This comment has been minimized.
Switch MemoryFileHandle from exact-fit allocation to geometric doubling (capacity * 2) for amortized O(1) append performance. Track actual content size separately from buffer capacity via a #size field, and expose only valid data through subarray views.
|
Can folks chiming in on the meta discussions about AI/LLM move to #62105 or a separate thread about this topic specifically? GitHub doesn’t handle long threads very well. This PR is already quite difficult to track for technical reviews, please keep the discussions here specific to this PR (question and answers about the copyright analysis are okay as they are technically specific to the content of this PR; more meta discussions about tools can use a dedicated space elsewhere or they will quickly be buried in GitHub’s thread folding along with technical reviews). |
|
I've marked my analysis comment as resolved as I believe it made the point I was trying to make. For the remaining technical discussions, I really think it's going to be easier to do incremental follow-ups than it will be to continue refining this beast of a PR. But, agreed... The rest of the discussion should be focused on the technical side of things |
This comment was marked as spam.
This comment was marked as spam.
|
are there any plans to support command execution inside the vfs? like const vfs = require('node:vfs');
const fs = require('node:fs');
// Create a VFS with default MemoryProvider
const myVfs = vfs.create();
myVfs.exec('ls /')
myVfs.spwan('npm i') |
|
I'm locking this PR on the grounds that it is becoming off-topic and diverging from technical discussions. This is an impartial lock to ensure this PR is on-track. Questions and concerns regarding AI-practices directly related to this PR or not, please follow to the linked issue on OpenJS Foundation On a separate note, and not connected to the lock, I fully agree with @jasnell on all his grounds and do believe that the elusive assertion of the DCO not being fullfiled should be prompted elsewhere instead of directly on this PR. OpenJS Foundation Legal cleared the DCO is correctly being followed, so any further deliberation on the matter won't be useful for the development of this PR. Thank you! |
Ethan-Arrowood
left a comment
There was a problem hiding this comment.
Really impressed with this work. I particularly focussed my review on the docs and tests. I'm very happy with the shape of the API and am excited for this feature. Excellent! 🚀
| if (h !== null) { | ||
| const vfsResult = await h.readdir(path, options); | ||
| if (vfsResult !== undefined) return vfsResult; | ||
| } |
There was a problem hiding this comment.
Follow up for later: there's a fair amount of duplication in these checks... using a shared utility would reduce the duplication.
| * @param {string|number} flags The flags to normalize | ||
| * @returns {string} Normalized string flags | ||
| */ | ||
| function normalizeFlags(flags) { |
There was a problem hiding this comment.
Nit: I'm sure we have this implemented somewhere else already. In a follow up this should be de-duplicated.
| flags === 'ax' || flags === 'ax+'; | ||
| const isExclusive = flags === 'wx' || flags === 'wx+' || | ||
| flags === 'ax' || flags === 'ax+'; | ||
| const isWritable = flags !== 'r'; |
There was a problem hiding this comment.
Likewise... we should have similar logic elsewhere already... in a follow up this should be deduplicated
| entry = new MemoryEntry(TYPE_DIR, { mode: options?.mode }); | ||
| entry.children = new SafeMap(); |
There was a problem hiding this comment.
Nit: for follow-up ... is there ever a case where entry.children should not just be initialized to a new SafeMap on construction?
|
|
||
| async open(vfsPath, flags, mode) { | ||
| const realPath = this.#resolvePath(vfsPath); | ||
| return new Promise((resolve, reject) => { |
There was a problem hiding this comment.
Nit: for follow-up, would prefer using PromiseWithResolvers to replace the new Promise pattern. It avoids allocating the additional closure.
|
|
||
| statSync(vfsPath, options) { | ||
| const realPath = this.#resolvePath(vfsPath); | ||
| return fs.statSync(realPath, options); |
There was a problem hiding this comment.
Nit: for follow-up consideration... should these be using the public API? These can be monkeypatched by user-land and end up with very different implementations.
| const bytesToRead = MathMin(length, available); | ||
| this.#content.copy(buffer, offset, readPos, readPos + bytesToRead); | ||
|
|
||
| if (position === null || position === undefined) { |
There was a problem hiding this comment.
Nit: for follow-up position == null
| // Create a copy so the returned buffer is independently mutable. | ||
| // Buffer.from(ArrayBuffer) creates a view sharing the same memory, | ||
| // which may be backed by read-only segments in the executable. | ||
| return Buffer.from(new Uint8Array(content)); |
There was a problem hiding this comment.
Nit: what type is content here? This should be documented.
there's a risk of double copy here depending on what content is.
|
Should this live behind a CLI flag initially? It might be better to work through #62328 while the API is not exposed by default. |
Honestly... wouldn't hurt. |
The biggest risk of this PR is the patches on the modules' loading side and the fs side. You cannot put them behind a flag. The rest is just a new API with bugs (like many others). |
Add 18 feature-focused test files covering bug fixes from both analysis rounds, and fix lint issues in stats.js and setup.js. Test files added: - test-vfs-access.js: access validation and mode enforcement - test-vfs-buffer-encoding.js: buffer encodings and buffer path args - test-vfs-copyfile-mode.js: copyFile COPYFILE_EXCL support - test-vfs-dir-handle.js: Dir double-close and read/close callbacks - test-vfs-file-url.js: file: URL handling with fileURLToPath - test-vfs-mkdir-recursive-return.js: mkdirSync recursive return value - test-vfs-no-auto-mkdir.js: writes/opens don't auto-create parents - test-vfs-open-flags.js: read-only/write-only/exclusive/numeric flags - test-vfs-readdir-recursive.js: readdir recursive with names and dirents - test-vfs-readfile-fd.js: readFileSync with virtual fd - test-vfs-rename-safety.js: renameSync preserves source on failure - test-vfs-rm-edge-cases.js: rmSync dir/link edge cases - test-vfs-stats-bigint.js: BigInt stats support - test-vfs-stream-options.js: writestream start and stream fd option - test-vfs-symlink-edge-cases.js: broken/intermediate symlinks - test-vfs-watch-directory.js: directory and recursive watch - test-vfs-watchfile.js: unwatchFile cleanup and zero stats - test-vfs-writefile-flags.js: writeFile/appendFile flag support Lint fixes: - stats.js: use BigInt from primordials, add missing JSDoc @returns - setup.js: consolidate ERR_MODULE_NOT_FOUND import
|
Could those patches land separately from the flaggable part? That way it’d be much easier to ensure no regressions from either portion. (i ofc wouldn’t suggest splitting the PR until after it was otherwise philosophically landable) |
|
@ljharb maybe. But we would be landing code that is not tested, so we’d be in a worse stance? Note that experimental features are covered by our threat model. |
|
Surely either the patches are meant to be noops in isolation (in which case existing tests should be sufficient - or improved - to assure no breakage) or have independent functionality (which could ship with its own tests)? |
|
I don’t understand why. Ease of review? |
|
Ease of review, and if there’s any risk in patching core subsystems, that PR could also be released as a patch and then that assumption would be tested, and the VFS functionality could land separately as a minor. Certainly not required, but it might be prudent for assuring stability. |
A first-class virtual file system module (
node:vfs) with a provider-based architecture that integrates with Node.js's fs module and module loader.Key Features
Provider Architecture - Extensible design with pluggable providers:
MemoryProvider- In-memory file system with full read/write supportSEAProvider- Read-only access to Single Executable Application assetsVirtualProvider- Base class for creating custom providersStandard fs API - Uses familiar
writeFileSync,readFileSync,mkdirSyncinstead of custom methodsMount Mode - VFS mounts at a specific path prefix (e.g.,
/virtual), clear separation from real filesystemModule Loading -
require()andimportwork seamlessly from virtual filesSEA Integration - Assets automatically mounted at
/seawhen running as a Single Executable ApplicationFull fs Support - readFile, stat, readdir, exists, streams, promises, glob, symlinks
Example
SEA Usage
When running as a Single Executable Application, bundled assets are automatically available:
Public API
Disclaimer: I've used a significant amount of Claude Code tokens to create this PR. I've reviewed all changes myself.
Fixes #60021