wiki:PathnamesCowan

Version 7 (modified by cowan, 6 years ago) (diff)

--

Pathnames

Pathnames are immutable record-style objects of a disjoint type that represent the natural components of (string) filenames in a way that is intended to be system-independent. This design is mostly based on MIT Scheme, which is mostly based on Common Lisp. However, it is extended to accept additional components which are required to represent URIs and IRIs as well as filenames. In addition, given the pervasiveness of Windows and Posix filenames nowadays, specific provisions for them has been made.

Components and their accessors

The components of a pathname are scheme, host, port, username, password, device, directory, name, type, query, and fragment.

Scheme is the scheme part of a URI/IRI. It is a string.

Username is the username associated with a URI/IRI host. It is a string.

Password is the password associated with a username. It is a string.

Host is the host part of a URI/IRI. It is a string.

Port is the port part of a URI/IRI. It is an exact integer.

Device is a Windows-style device name without a trailing colon. It is a string, typically but not always a single character.

Directory is a directory path, which is represented as a list. The first element is always a symbol, which can be absolute or relative or unc, representing absolute, relative, and UNC directory paths respectively. The root directory is represented by the list (absolute). A null directory path is represented by the empty list. The other elements are strings.

Name is a simple file name without the associated file type or extension, if any. It is a string.

Type is the type or extension of the filename, without any leading period. It is a string.

Query is the query part of a URI/IRI. It is a string.

Fragment is the fragment part of an IRI/URI. It is a string.

All of these components can also be #f to indicate that the component is not present. In particular, a filename or URI ending in / or \ will result in the file and type components being #f.

Accessors

There are 11 accessors corresponding to the 11 components of a pathname, which take the form pathname-* where * is one of the components.

It is an error to mutate anything returned by any of these accessors.

Constructors and parsers

(make-pathname components)

Constructs and returns a pathname. Components is an a-list mapping component names (as symbols) to values. Component names not defined in this specification are allowed, but the results are implementation-specific. Any component names not appearing in components are set to #f.

ISSUE: Should make-pathname accept multiple arguments rather than a single argument which is an a-list? Variable a-lists can be constructed by quasiquotation.

The following procedures accept a string and construct and return a pathname corresponding to that string.

(posix-filename->pathname filename)

Constructs and returns a pathname. Filename is a string representing a Posix filename.

Filename is parsed using / as the component separator. The directory, name, and type components are potentially populated from it, and the device component is set to #f.

The scheme component is set to "file", and the username, password, port, query, and fragment components are set to #f.

(posix-dirname->pathname dirname)

Constructs and returns a pathname. Dirname is a string representing a Posix directory name.

Filename is parsed using / as the component separator. The directory component is potentially populated from it.

The scheme component is set to "file", and the username, password, port, device, name, type, query, and fragment components are set to #f.

(windows-filename->pathname filename)

Constructs and returns a pathname. Filename is a string representing a Windows filename.

Filename is parsed using both \ and / as component separators, and a leading // or \\ results in a directory component whose car is unc. The device and directory, name, and type components are potentially populated from it.

The scheme component is set to "file", and the username, password, port, query, and fragment components are set to #f.

(windows-dirname->pathname dirname)

Constructs and returns a pathname. Dirname is a string representing a Windows directory name.

Filename is parsed using both \ and / as component separators, and a leading // or \\ results in a directory component whose car is unc. The device and directory components are potentially populated from it.

The scheme component is set to "file", and the username, password, port, name, type, query, and fragment components are set to #f.

(uri->pathname uri)

Constructs and returns a pathname. Uri is a string to be converted to a pathname. It is parsed according to IRI rules, which are the same as URI rules except that sequences of %-hexdigit-hexdigit escapes that represent UTF-8 byte sequences are parsed into characters, provided the requisite characters are supported by the implementation.

The scheme, host, port, username, password, directory, name, type, query, and fragment components are potentially populated from uri. The device component is set to #f.

Deconstructors

The following procedures accept a pathname and construct and return a string corresponding to that pathname.

(pathname->posix-filename pathname)

Constructs and returns a filename string from pathname.

The result is constructed using / as the component separator from the directory, name, and type components of pathname.

It is an error if the username, password, port, query, and fragment components are not #f. It is an error if the scheme component is not "file" or #f.

(pathname->posix-filename pathname)

Constructs and returns a filename string from pathname.

The result is constructed using \ as the component separator from the device, directory, name, and type components.

It is an error if the username, password, port, query, and fragment components are not #f. It is an error if the scheme component is not "file" or #f.

(pathname->iri pathname)

Constructs and returns a string according to RFC 3987 rules. No escaping of non-ASCII characters is done.

The scheme, host, port, username, password, directory, name, type, query, and fragment components are used to construct the URI. It is an error if the device component is not #f, or if the first element of the directory component is not absolute.

(pathname-directory->posix-dirname pathname)

Returns pathname as a Posix directory string. The result consists of the strings in the cdr of the directory component joined using / with another / prepended if the car is absolute. (pathname-directory->window-dirname pathname)

Returns the pathname as a Windows directory string. The result consists of the strings in the cdr of the directory component joined using \, prefixed by one of the following:

  • \ if the car of the directory component is absolute
  • \\ followed by the host component if the car of the directory component is unc
  • the device followed by : if the device component is not #f
  • the empty string otherwise

(pathname->uri pathname)

Constructs and returns a string according to RFC 3986 rules. In addition to required %-escaping, non-ASCII characters are converted to UTF-8 byte sequences and %-escaped.

The scheme, host, port, username, password, directory, name, type, query, and fragment components are used to construct the URI. It is an error if the device component is not #f, or if the car of the directory component is not absolute.

Other procedures

(pathname? ''object'')`

Returns #t if object is a pathname, and #f otherwise.

(pathname= pathname1 pathname2)

Compares pathname1 and pathname2 for equality, component-wise. Note that two pathnames may be different according to pathname=? but still access the same resource. The host components are compared case-insensitively.

(merge-pathnames pathname1 pathname2)

Constructs and returns a new pathname with the same components as pathname1. However, if any component of pathname1 is #f, the corresponding component of pathname2 is used instead. The directory component is handled specially; if the pathname1 component begins with relative, the rest of it is appended to the pathname2 component.

(pathname-absolute-filename? pathname)

Returns #t if pathname specifies an absolute or UNC filename.

(pathname-absolute-uri? pathname)

Returns #t if pathname specifies an absolute URI (the fragment component must be #f).

(pathname-uri? pathname)

Returns #t if pathname specifies a URI/IRI (that is, an absolute URI with a possible fragment).

(pathname-query->alist pathname)

Returns the query component of pathname as an a-list, where & and ; separate the entries in the alist, and = separates the car from the cdr.