A tool for displaying and manipulating Web Request+Response (WRR) files of the Private Passive Web Archive (pwebarc) project
What?
`wrrarms` (`pwebarc-wrrarms`) is a tool for displaying, programmatically manipulating, organizing, importing, and exporting Web Request+Response (WRR) files of the Personal Private Passive Web Archive (pwebarc) project, as produced by the pWebArc browser extension.
Quickstart
Installation
- Install with:
pip install pwebarc-wrrarms
and run as
wrrarms --help
- Alternatively, install it via Nix:
nix-env -i -f ./default.nix
wrrarms --help
- Alternatively, run without installing:
alias wrrarms="python3 -m wrrarms"
wrrarms --help
How to build a file system tree of latest versions of all hoarded URLs
Assuming you keep your WRR dumps in `~/pwebarc/raw`, you can generate a hierarchy of symlinks for each URL, pointing from under `~/pwebarc/latest` to the most recent WRR file in `~/pwebarc/raw`, via:
wrrarms organize --symlink --latest --output hupq --to ~/pwebarc/latest --and "status|== 200C" ~/pwebarc/raw
Personally, I prefer the `flat_mhs` format (see the documentation of the `--output` option below), as I dislike deep file hierarchies; using it also simplifies filtering in my `ranger` file browser, so I do this:
wrrarms organize --symlink --latest --output flat_mhs --to ~/pwebarc/latest --and "status|== 200C" ~/pwebarc/raw
These commands rescan the whole of `~/pwebarc/raw` and so take a while to complete.
If you have a lot of WRR files and want to keep your symlink tree updated in real time, you can use a two-stage `--stdin0` pipeline, shown in the examples section below.
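Conceptually, `--symlink --latest` combined with `--and "status|== 200C"` keeps, for each URL, only the newest matching reqres and points one symlink at its WRR file. A minimal Python sketch of that selection step, assuming each reqres has already been reduced to a hypothetical `Rec` tuple (this is not `wrrarms`' actual code or data model, and comparing by `stime` is an assumption of the sketch):

```python
from typing import NamedTuple

class Rec(NamedTuple):
    # Hypothetical, simplified stand-in for a parsed reqres.
    url: str
    status: str   # e.g. "200C", see the `status` derived attribute
    stime: float  # response start time, seconds since the UNIX epoch
    path: str     # path of the WRR file on disk

def latest_per_url(recs: list[Rec]) -> dict[str, str]:
    """Map each URL to the WRR file of its newest reqres with status "200C"."""
    best: dict[str, Rec] = {}
    for r in recs:
        if r.status != "200C":  # what `--and "status|== 200C"` filters on
            continue
        cur = best.get(r.url)
        if cur is None or r.stime > cur.stime:
            best[r.url] = r
    return {url: r.path for url, r in best.items()}

recs = [
    Rec("https://example.org/", "200C", 10.0, "raw/a.wrr"),
    Rec("https://example.org/", "200C", 20.0, "raw/b.wrr"),
    Rec("https://example.org/", "404C", 30.0, "raw/c.wrr"),
]
print(latest_per_url(recs))  # {'https://example.org/': 'raw/b.wrr'}
```

The newer `404C` reqres is ignored because of the filter, so the symlink for that URL would point at `raw/b.wrr`.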
How to generate a local offline website mirror like wget -mpk
If you want to render your WRR files into a local offline website mirror containing interlinked HTML files and their resources a-la `wget -mpk` (`wget --mirror --page-requisites --convert-links`), run one of the above `--symlink --latest` commands, and then do something like this:
wrrarms export mirror --to ~/pwebarc/mirror1 ~/pwebarc/latest/archiveofourown.org
On completion, `~/pwebarc/mirror1` will contain a bunch of interlinked minimized HTML files, their resources, and everything else available from WRR files living under `~/pwebarc/latest/archiveofourown.org`.
By default, all the links in exported HTML files will be remapped to local files (even if source WRR files for those would-be exported files are missing in `~/pwebarc/latest/archiveofourown.org`), and those HTML files will also be stripped of all JavaScript, CSS, and other stuff of various levels of evil (see the documentation for the `scrub` function below).
On the plus side, the result will be completely self-contained and safe to view with a dumb unconfigured browser.
If you are unhappy with this behaviour and, for instance, want to keep the CSS and produce human-readable HTML, run the following instead:
wrrarms export mirror -e 'response.body|eb|scrub response +all_refs,-actions,+styles,+pretty' --to ~/pwebarc/mirror2 ~/pwebarc/latest/archiveofourown.org
Note, however, that CSS resource filtering and remapping is not implemented yet.
If you also want links that point to not yet hoarded Internet URLs to keep pointing to those URLs in the exported files, instead of pointing to non-existent local files, similarly to what `wget -mpk` does, run `wrrarms export mirror` with `--remap-open`, e.g.:
wrrarms export mirror -e 'response.body|eb|scrub response +all_refs,-actions,+styles,+pretty' --remap-open --to ~/pwebarc/mirror3 ~/pwebarc/latest/archiveofourown.org
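The difference between the default remapping and `--remap-open` can be sketched as a function deciding where a single link should point. This is a simplified illustration, not `wrrarms`' implementation: `local_path` is a hypothetical URL-to-file mapping (the real tool derives file names from its own path attributes), and relative links are computed with `posixpath.relpath`:

```python
import posixpath
from urllib.parse import urlsplit

def local_path(url: str) -> str:
    # Hypothetical URL -> exported file name mapping, for illustration only.
    u = urlsplit(url)
    parts = [p for p in u.path.split("/") if p] or ["index"]
    return posixpath.join(u.hostname or "", *parts) + ".html"

def remap(link: str, page_url: str, exported: set, remap_open: bool) -> str:
    """Decide where `link`, found on the page for `page_url`, should point."""
    if remap_open and link not in exported:
        return link  # --remap-open: keep links to not-yet-hoarded URLs as-is
    # Default: point at the would-be local file, relative to the current page.
    here = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(link), here)

page = "https://example.org/works/2"
exported = {"https://example.org/works/1"}
print(remap("https://example.org/works/1", page, exported, True))  # 1.html
print(remap("https://other.net/x", page, exported, True))  # https://other.net/x
print(remap("https://other.net/x", page, exported, False))  # ../../other.net/x.html
```

The last call shows the default behaviour: the link is rewritten to a local file even though nothing will be exported there.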
Finally, if you want a mirror made of raw files without any content censorship or link conversions, run:
wrrarms export mirror -e 'response.body|eb' --to ~/pwebarc/mirror-raw ~/pwebarc/latest/archiveofourown.org
The latter command will render your mirror pretty quickly, but the other above-mentioned commands will call the `scrub` function, and that will be pretty slow (as in avg ~5Mb, ~3 files per second on my 2013-era laptop), mostly because `html5lib`, which `wrrarms` uses for paranoid HTML parsing and filtering, is fairly slow.
Using `--root` and `--depth`
As an alternative to (or in combination with) keeping a symlink hierarchy of latest versions, you can load (an index of) an assortment of WRR files into `wrrarms`' memory but then `export mirror` only select URLs (and all resources needed to properly render those pages) by running something like:
wrrarms export mirror --to ~/pwebarc/mirror4 \
--root 'https://archiveofourown.org/works/3733123?view_adult=true&view_full_work=true' \
--root 'https://archiveofourown.org/works/30186441?view_adult=true&view_full_work=true' \
~/pwebarc/raw/*/2023
(`wrrarms` loads (indexes) WRR files pretty fast, so if you are running from an SSD, you can totally feed it years of WRR files and then only export a couple of URLs, and it will take a couple of seconds to finish anyway.)
There is also the `--depth` option, which works similarly to `wget`'s `--level` option in that it will follow all jump (`a href`) and action links accessible with no more than `--depth` browser navigations from recursion `--root`s and then `export mirror` all those URLs (and their resources) too.
When using `--root` options, `--remap-open` works exactly like `wget`'s `--convert-links` in that it will only remap the URLs that are going to be exported and will keep the rest as-is.
Similarly, `--remap-closed` will consider only the URLs reachable from the `--root`s in no more than `--depth` jumps as available.
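The `--root`/`--depth` recursion described above amounts to a breadth-first search bounded by the number of navigations, as with `wget --level`. A minimal sketch over a hypothetical in-memory link graph (`wrrarms` itself works on indexed WRR files and also pulls in page resources, which this sketch omits):

```python
from collections import deque

def reachable(roots: list, links: dict, depth: int) -> set:
    """URLs reachable from `roots` in at most `depth` navigations.

    `links` maps a URL to the jump/action links found on that page
    (a hypothetical stand-in for parsed, hoarded pages).
    """
    seen = set(roots)
    queue = deque((url, 0) for url in roots)
    while queue:
        url, d = queue.popleft()
        if d == depth:
            continue  # do not follow links any further from here
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)  # BFS visits each URL at its minimal depth
                queue.append((nxt, d + 1))
    return seen

links = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(sorted(reachable(["a"], links, 2)))  # ['a', 'b', 'c']
```

Under the description above, a set like this is what `--remap-closed` would treat as available.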
How to generate local offline website mirrors like `wget -mpk` from your old `mitmproxy` stream dumps
Assuming `mitmproxy.001.dump`, `mitmproxy.002.dump`, etc. are files that were produced by running something like
mitmdump -w +mitmproxy.001.dump
at some point, you can generate website mirrors from them by first importing them all to WRR:
wrrarms import mitmproxy --to ~/pwebarc/mitmproxy mitmproxy.*.dump
and then `export mirror` like above, e.g. to generate mirrors for all URLs:
wrrarms export mirror --to ~/pwebarc/mirror ~/pwebarc/mitmproxy
How to generate previews for WRR files, listen to them via TTS, open them with `xdg-open`, etc.
See the `script` sub-directory for examples that show how to use `pandoc` and/or `w3m` to turn WRR files into previews and readable plain text that can be viewed or listened to via other tools, or dump them into temporary raw data files that can then be immediately fed to `xdg-open` for one-click viewing.
Usage
wrrarms
A tool to pretty-print, compute and print values from, search, organize (programmatically rename/move/symlink/hardlink files), import, export, (WIP: check, deduplicate, and edit) pWebArc WRR (WEBREQRES, Web REQuest+RESponse) archive files.
Terminology: a `reqres` (`Reqres` when a Python type) is an instance of a structure representing an HTTP request+response pair with some additional metadata.
- options:
  - `--version`: show program's version number and exit
  - `-h, --help`: show this help message and exit
  - `--markdown`: show help messages formatted in Markdown
- subcommands:
  - `{pprint,get,run,stream,find,organize,import,export}`
    - `pprint`: pretty-print given WRR files
    - `get`: print values produced by computing given expressions on a given WRR file
    - `run`: spawn a process with generated temporary files produced by given expressions computed on given WRR files as arguments
    - `stream`: produce a stream of structured lists containing values produced by computing given expressions on given WRR files, a generalized `wrrarms get`
    - `find`: print paths of WRR files matching specified criteria
    - `organize`: programmatically rename/move/hardlink/symlink WRR files based on their contents
    - `import`: convert other HTTP archive formats into WRR
    - `export`: convert WRR archives into other formats
wrrarms pprint
Pretty-print given WRR files to stdout.
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `-u, --unabridged`: print all data in full
  - `--abridged`: shorten long strings for brevity (useful when you want to visually scan through batch data dumps) (default)
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution (default)
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters:
  - `--or EXPR`: only print reqres which match any of these expressions...
  - `--and EXPR`: ... and all of these expressions, both can be specified multiple times, both use the same expression format as `wrrarms get --expr`, which see
- MIME type sniffing:
  - `--naive`: populate "potentially" lists like `wrrarms (get|run|export) --expr '(request|response).body|eb|scrub \2 defaults'` does; default
  - `--paranoid`: populate "potentially" lists in the output using paranoid MIME type sniffing like `wrrarms (get|run|export) --expr '(request|response).body|eb|scrub \2 +paranoid'` does; this exists to answer "Hey! Why did it censor out my data?!" questions
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given (default)
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order (default)
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
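The `--or`/`--and` filters and the `--errors` policies above can be modelled as follows. This is a simplified sketch, not `wrrarms`' code: plain Python predicates stand in for compiled `EXPR`s, and treating an empty `--or` list as "match everything" is an assumption of the sketch:

```python
import sys
from typing import Any, Callable, Iterable, Iterator

Pred = Callable[[Any], bool]

def select(reqres: Iterable[Any], ors: list, ands: list,
           errors: str = "fail") -> Iterator[Any]:
    """Yield reqres matching any of `ors` and all of `ands`.

    `errors` mimics `--errors {fail,skip,ignore}`.
    """
    for rr in reqres:
        try:
            ok = (not ors or any(p(rr) for p in ors)) and all(p(rr) for p in ands)
        except Exception as exc:
            if errors == "fail":
                raise  # report failure and stop
            if errors == "skip":
                print(f"error: {exc!r}", file=sys.stderr)  # report, then skip
            continue  # both "skip" and "ignore" drop this reqres
        if ok:
            yield rr

items = [{"status": "200C"}, {"status": "404C"}, {}]  # the last one is broken
is_200 = lambda rr: rr["status"] == "200C"
is_404 = lambda rr: rr["status"] == "404C"
print(list(select(items, [], [is_200], errors="skip")))
```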
wrrarms get
Compute output values by evaluating expressions `EXPR`s on a given reqres stored at `PATH`, then print them to stdout terminating each value as specified.
- positional arguments:
  - `PATH`: input WRR file path
- expression evaluation:
  - `-e EXPR, --expr EXPR`: an expression to compute; can be specified multiple times in which case computed outputs will be printed sequentially; see also "output" options below; (default: `response.body|eb`); each `EXPR` describes a state-transformer (pipeline) which starts from value `None` and evaluates a script built from the following:
    - constants and functions:
      - `es`: replace `None` value with an empty string `""`
      - `eb`: replace `None` value with an empty byte string `b""`
      - `false`: replace `None` value with `False`
      - `true`: replace `None` value with `True`
      - `missing`: `True` if the value is `None`
      - `0`: replace `None` value with `0`
      - `1`: replace `None` value with `1`
      - `not`: apply logical `not` to value
      - `len`: apply `len` to value
      - `str`: cast value to `str` or fail
      - `bytes`: cast value to `bytes` or fail
      - `bool`: cast value to `bool` or fail
      - `int`: cast value to `int` or fail
      - `float`: cast value to `float` or fail
      - `echo`: replace the value with the given string
      - `quote`: URL-percent-encoding quote value
      - `quote_plus`: URL-percent-encoding quote value and replace spaces with `+` symbols
      - `unquote`: URL-percent-encoding unquote value
      - `unquote_plus`: URL-percent-encoding unquote value and replace `+` symbols with spaces
      - `to_ascii`: encode `str` value into `bytes` with "ascii" codec
      - `to_utf8`: encode `str` value into `bytes` with "utf-8" codec
      - `sha256`: replace `bytes` value with its `sha256` hex digest (`hex(sha256(value))`)
      - `==`: apply `== arg`, `arg` is cast to the same type as the current value
      - `!=`: apply `!= arg`, similarly
      - `<`: apply `< arg`, similarly
      - `<=`: apply `<= arg`, similarly
      - `>`: apply `> arg`, similarly
      - `>=`: apply `>= arg`, similarly
      - `add_prefix`: add prefix to the current value
      - `add_suffix`: add suffix to the current value
      - `take_prefix`: take first `arg` characters or list elements from the current value
      - `take_suffix`: take last `arg` characters or list elements from the current value
      - `abbrev`: leave the current value as-is if its length is less than or equal to `arg` characters, otherwise take first `arg/2` followed by last `arg/2` characters
      - `abbrev_each`: `abbrev arg` each element in a value `list`
      - `replace`: replace all occurrences of the first argument in the current value with the second argument, casts arguments to the same type as the current value
      - `pp_to_path`: encode `path_parts` `list` into a POSIX path, quoting as little as needed
      - `qsl_urlencode`: encode parsed `query` `list` into a URL's query component `str`
      - `qsl_to_path`: encode `query` `list` into a POSIX path, quoting as little as needed
      - `scrub`: scrub the value by optionally rewriting links and/or removing dynamic content from it; what gets done depends on `--remap-*` command line options, the MIME type of the value itself, and the scrubbing options described below; this function takes two arguments:
        - the first must be either of `request|response`, it controls which HTTP headers `scrub` should inspect to help it detect the MIME type;
        - the second is either `defaults` or a ","-separated string of `(+|-)(paranoid|unknown|jumps|actions|srcs|all_refs|scripts|iframes|styles|iepragmas|prefetches|tracking|dyndoc|all_dyns|verbose|whitespace|optional_tags|indent|pretty|debug)` tokens which control the scrubbing behaviour:
          - `+paranoid` will assume the server is lying in its `Content-Type` and `X-Content-Type-Options` HTTP headers, sniff the contents of `(request|response).body` to determine what it actually contains regardless of what the server said, and then use the most paranoid interpretation of both the HTTP headers and the sniffed possible MIME types to decide what should be kept and what should be removed by the options below; i.e., this will make the `-unknown`, `-scripts`, and `-styles` options below censor out more things, in particular, at the moment, most plain text files will get censored out as potential JavaScript; the default is `-paranoid`;
          - `(+|-)unknown` controls if the data with unknown content types should be passed to the output unchanged or censored out (respectively); the default is `+unknown`, which will keep data of unknown content types as-is;
          - `(+|-)(jumps|actions|srcs)` control which kinds of references to other documents should be remapped or censored out (respectively); i.e. it controls whether jump-links (HTML `a href`, `area href`, and similar), action-links (HTML `a ping`, `form action`, and similar), and/or resource references (HTML `img src`, `iframe src`, CSS `url` references, and similar) should be remapped using the specified `--remap-*` option (which see) or censored out similarly to how `--remap-void` will do it; the default is `+jumps,-actions,-srcs`, which will produce a self-contained result that can be fed into another tool (be it a web browser or `pandoc`) without that tool trying to access the Internet;
          - `(+|-)all_refs` is equivalent to enabling or disabling all of the above options simultaneously;
          - `(+|-)(scripts|iframes|styles|iepragmas|prefetches|tracking)` control which things should be kept or censored out w.r.t. HTML, CSS, and JavaScript, i.e. it controls whether JavaScript (both separate files and HTML tags and attributes), `<iframe>` HTML tags, CSS (both separate files and HTML tags and attributes; why? because CSS is Turing-complete), HTML Internet-Explorer pragmas, HTML content prefetch `link` tags, and other tracking HTML tags and attributes (like `a ping` attributes), should be respectively kept in or censored out from the input; the default is `-scripts,-iframes,-styles,-iepragmas,-prefetches,-tracking`, which ensures the result will not produce any prefetch and tracking requests when loaded in a web browser, and that the whole result is simple data, not a program in some Turing-complete language, thus making it safe to feed the result to other tools too smart for their own users' good;
          - `(+|-)all_dyns` is equivalent to enabling or disabling all of the above `(scripts|...)` options simultaneously;
          - `(+|-)verbose` controls whether tag censoring controlled by the above options is to be reported in the output (as comments) or stuff should be wiped from existence without evidence instead; the default is `-verbose`;
          - `(+|-)whitespace` controls whether HTML renderer should keep the original HTML whitespace as-is or collapse it away (respectively); the default is `-whitespace`;
          - `(+|-)optional_tags` controls whether HTML renderer should put optional HTML tags into the output or skip them (respectively); the default is `+optional_tags` (because many tools fail to parse minimized HTML properly);
          - `(+|-)indent` controls whether HTML renderer should indent HTML elements (where whitespace placement in the original markup allows for it) or not (respectively); the default is `-indent`;
          - `+pretty` is an alias for `+verbose,-whitespace,+indent`, which produces the prettiest possible human-readable output that keeps the original whitespace semantics; `-pretty` is an alias for `+verbose,+whitespace,-indent`, which produces the approximation of the original markup with censoring applied; neither is the default;
          - `+debug` is an alias for `+pretty` that also uses a much more aggressive version of `indent` that ignores the semantics of original whitespace placement, i.e. it will indent `<p>not<em>sep</em>arated</p>` as if there was whitespace before and after `p`, `em`, `/em`, and `/p` tags; this is useful for debugging custom mutations; `-debug` is noop, which is the default;
    - reqres fields, these work the same way as constants above, i.e. they replace current value of `None` with field's value, if reqres is missing the field in question, which could happen for `response*` fields, the result is `None`:
      - `version`: WEBREQRES format version; int
      - `source`: `+`-separated list of applications that produced this reqres; str
      - `protocol`: protocol; e.g. `"HTTP/1.1"`, `"HTTP/2.0"`; str
      - `request.started_at`: request start time in seconds since 1970-01-01 00:00; Epoch
      - `request.method`: request HTTP method; e.g. `"GET"`, `"POST"`, etc; str
      - `request.url`: request URL, including the fragment/hash part; str
      - `request.headers`: request headers; list[tuple[str, bytes]]
      - `request.complete`: is request body complete?; bool
      - `request.body`: request body; bytes
      - `response.started_at`: response start time in seconds since 1970-01-01 00:00; Epoch
      - `response.code`: HTTP response code; e.g. `200`, `404`, etc; int
      - `response.reason`: HTTP response reason; e.g. `"OK"`, `"Not Found"`, etc; usually empty for Chromium and filled for Firefox; str
      - `response.headers`: response headers; list[tuple[str, bytes]]
      - `response.complete`: is response body complete?; bool
      - `response.body`: response body; Firefox gives raw bytes, Chromium gives UTF-8 encoded strings; bytes | str
      - `finished_at`: request completion time in seconds since 1970-01-01 00:00; Epoch
      - `websocket`: a list of WebSocket frames
    - derived attributes:
      - `fs_path`: file system path for the WRR file containing this reqres; str | bytes | None
      - `qtime`: alias for `request.started_at`; mnemonic: "reQuest TIME"; seconds since UNIX epoch; decimal float
      - `qtime_ms`: `qtime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `qtime_msq`: three least significant digits of `qtime_ms`; int
      - `qyear`: year number of `gmtime(qtime)` (UTC year number of `qtime`); int
      - `qmonth`: month number of `gmtime(qtime)`; int
      - `qday`: day of the month of `gmtime(qtime)`; int
      - `qhour`: hour of `gmtime(qtime)` in 24h format; int
      - `qminute`: minute of `gmtime(qtime)`; int
      - `qsecond`: second of `gmtime(qtime)`; int
      - `stime`: `response.started_at` if there was a response, `finished_at` otherwise; mnemonic: "reSponse TIME"; seconds since UNIX epoch; decimal float
      - `stime_ms`: `stime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `stime_msq`: three least significant digits of `stime_ms`; int
      - `syear`: similar to `qyear`, but for `stime`; int
      - `smonth`: similar to `qmonth`, but for `stime`; int
      - `sday`: similar to `qday`, but for `stime`; int
      - `shour`: similar to `qhour`, but for `stime`; int
      - `sminute`: similar to `qminute`, but for `stime`; int
      - `ssecond`: similar to `qsecond`, but for `stime`; int
      - `ftime`: alias for `finished_at`; seconds since UNIX epoch; decimal float
      - `ftime_ms`: `ftime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `ftime_msq`: three least significant digits of `ftime_ms`; int
      - `fyear`: similar to `syear`, but for `ftime`; int
      - `fmonth`: similar to `smonth`, but for `ftime`; int
      - `fday`: similar to `sday`, but for `ftime`; int
      - `fhour`: similar to `shour`, but for `ftime`; int
      - `fminute`: similar to `sminute`, but for `ftime`; int
      - `fsecond`: similar to `ssecond`, but for `ftime`; int
      - `status`: `"NR"` if there was no response, `str(response.code) + "C"` if response was complete, `str(response.code) + "N"` otherwise; str
      - `method`: alias for `request.method`; str
      - `raw_url`: alias for `request.url`; str
      - `net_url`: `raw_url` with Punycode UTS46 IDNA encoded hostname, unsafe characters quoted, and without the fragment/hash part; this is the URL that actually gets sent to the server; str
      - `pretty_url`: `raw_url`, but using `hostname`, `mq_path`, and `mq_query`; str
      - `pretty_nurl`: `raw_url`, but using `hostname`, `mq_path`, and `mq_nquery`; str
      - `scheme`: scheme part of `raw_url`; e.g. `http`, `https`, etc; str
      - `raw_hostname`: hostname part of `raw_url` as it is recorded in the reqres; str
      - `net_hostname`: hostname part of `raw_url`, encoded as Punycode UTS46 IDNA; this is what actually gets sent to the server; ASCII str
      - `hostname`: `net_hostname` decoded back into UNICODE; this is the canonical hostname representation for which IDNA-encoding and decoding are bijective; UNICODE str
      - `rhostname`: `hostname` with the order of its parts reversed; e.g. `"www.example.org"` -> `"org.example.www"`; str
      - `port`: port part of `raw_url`; str
      - `netloc`: netloc part of `raw_url`; i.e., in the most general case, `<username>:<password>@<hostname>:<port>`; str
      - `raw_path`: raw path part of `raw_url` as it is recorded in the reqres; e.g. `"https://www.example.org"` -> `""`, `"https://www.example.org/"` -> `"/"`, `"https://www.example.org/index.html"` -> `"/index.html"`; str
      - `path_parts`: component-wise unquoted "/"-split `raw_path` with empty components removed and dots and double dots interpreted away; e.g. `"https://www.example.org"` -> `[]`, `"https://www.example.org/"` -> `[]`, `"https://www.example.org/index.html"` -> `["index.html"]`, `"https://www.example.org/skipped/.//../used/"` -> `["used"]`; list[str]
      - `mq_path`: `path_parts` turned back into a minimally-quoted string; str
      - `filepath_parts`: `path_parts` transformed into components usable as an exportable file name; i.e. `path_parts` with an optional additional `"index"` appended, depending on `raw_url` and `response` MIME type; extension will be stored separately in `filepath_ext`; e.g. for HTML documents `"https://www.example.org/"` -> `["index"]`, `"https://www.example.org/test.html"` -> `["test"]`, `"https://www.example.org/test"` -> `["test", "index"]`, `"https://www.example.org/test.json"` -> `["test.json", "index"]`, but if it has a JSON MIME type then `"https://www.example.org/test.json"` -> `["test"]` (and `filepath_ext` will be set to `".json"`); this is similar to what `wget -mpk` does, but a bit smarter; list[str]
      - `filepath_ext`: extension of the last component of `filepath_parts` for recognized MIME types, `".data"` otherwise; str
      - `raw_query`: query part of `raw_url` (i.e. everything after the `?` character and before the `#` character) as it is recorded in the reqres; str
      - `query_parts`: parsed (and component-wise unquoted) `raw_query`; list[tuple[str, str]]
      - `query_ne_parts`: `query_parts` with empty query parameters removed; list[tuple[str, str]]
      - `mq_query`: `query_parts` turned back into a minimally-quoted string; str
      - `mq_nquery`: `query_ne_parts` turned back into a minimally-quoted string; str
      - `oqm`: optional query mark: `?` character if `query` is non-empty, an empty string otherwise; str
      - `fragment`: fragment (hash) part of the URL; str
      - `ofm`: optional fragment mark: `#` character if `fragment` is non-empty, an empty string otherwise; str
    - a compound expression built by piping (`|`) the above, for example:
      - `response.body|eb` (the default for `get`) will print raw `response.body` or an empty byte string, if there was no response;
      - `response.body|eb|scrub response defaults` will take the above value, `scrub` it using default content scrubbing settings which will censor out all action and resource reference URLs;
      - `response.body|eb|scrub response +all_refs,-actions` (the default for `export`) will remap all `href` jump-links and `src` resource references to local files while still censoring out all action URLs (since those don't make sense for a static mirror);
      - `response.complete` will print the value of `response.complete` or `None`, if there was no response;
      - `response.complete|false` will print `response.complete` or `False`;
      - `net_url|to_ascii|sha256` will print the `sha256` hash of the URL that was actually sent over the network;
      - `net_url|to_ascii|sha256|take_prefix 4` will print the first 4 characters of the above;
      - `path_parts|take_prefix 3|pp_to_path` will print the first 3 path components of the URL, minimally quoted to be used as a path;
      - `query_ne_parts|take_prefix 3|qsl_to_path|abbrev 128` will print the first 3 non-empty query parameters of the URL, abbreviated to 128 characters or less, minimally quoted to be used as a path;
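To make the pipeline semantics above concrete, here is a toy Python evaluator for a handful of the documented atoms (`eb`, `false`, `not`, `len`, `to_ascii`, `sha256`, `take_prefix`, `abbrev`, `==`). It is a sketch under simplifying assumptions, not `wrrarms`' parser: the reqres is a plain dict keyed by field name, atoms are split on `|` and then on the first space, a field lookup only happens while the current value is `None`, and rounding `arg/2` down in `abbrev` is a guess:

```python
import hashlib

def _abbrev(v, n: int):
    # Keep `v` if it fits, otherwise first n/2 followed by last n/2 items.
    if len(v) <= n:
        return v
    half = n // 2
    return v[:half] + v[len(v) - half:]

def _cast_like(current, arg: str):
    # Comparison atoms cast their argument to the current value's type.
    if isinstance(current, bool):
        return arg == "True"
    if isinstance(current, bytes):
        return arg.encode("utf-8")
    return type(current)(arg)

ATOMS = {
    "eb": lambda v, a: b"" if v is None else v,
    "false": lambda v, a: False if v is None else v,
    "not": lambda v, a: not v,
    "len": lambda v, a: len(v),
    "to_ascii": lambda v, a: v.encode("ascii"),
    "sha256": lambda v, a: hashlib.sha256(v).hexdigest(),
    "take_prefix": lambda v, a: v[:int(a)],
    "abbrev": lambda v, a: _abbrev(v, int(a)),
    "==": lambda v, a: v == _cast_like(v, a),
}

def evaluate(expr: str, reqres: dict):
    value = None  # every pipeline starts from None
    for atom in expr.split("|"):
        name, _, arg = atom.partition(" ")
        if name in ATOMS:
            value = ATOMS[name](value, arg)
        elif value is None:
            value = reqres.get(name)  # a field replaces None (or stays None)
        else:
            raise ValueError(f"unknown atom {name!r}")
    return value

print(evaluate("response.body|eb", {}))  # b''
print(evaluate("status|== 200C", {"status": "200C"}))  # True
```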
- URL remapping, used by `scrub` `--expr` atom:
  - `--remap-id`: remap all URLs with an identity function; i.e. don't remap anything (default)
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
- output:
  - `--not-separated`: don't separate output values with anything, just concatenate them
  - `-l, --lf-separated`: separate output values with `\n` (LF) newline characters (default)
  - `-z, --zero-separated`: separate output values with `\0` (NUL) bytes
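Several of the derived attributes listed for `--expr` above are simple functions of the reqres fields. The following Python sketch reimplements a few of them from their descriptions (`status`, the `*time_ms`/`*time_msq`/`*year`... family, `rhostname`, `path_parts`). It is illustrative only: IDNA handling and exact quoting rules are not reproduced, and using `decimal.Decimal` for the documented "decimal float" timestamps is an assumption of the sketch:

```python
import time
from decimal import Decimal
from typing import Optional
from urllib.parse import unquote, urlsplit

def status(code: Optional[int], complete: bool) -> str:
    # "NR": no response; otherwise the code plus "C" (complete) or "N" (not).
    if code is None:
        return "NR"
    return str(code) + ("C" if complete else "N")

def time_fields(prefix: str, t: Decimal) -> dict:
    # E.g. prefix "s" with t = stime gives stime_ms, stime_msq, syear, ...
    ms = int(t * 1000)  # rounded down, for non-negative times
    g = time.gmtime(int(t))
    return {
        prefix + "time_ms": ms,
        prefix + "time_msq": ms % 1000,  # three least significant digits
        prefix + "year": g.tm_year,
        prefix + "month": g.tm_mon,
        prefix + "day": g.tm_mday,
        prefix + "hour": g.tm_hour,
        prefix + "minute": g.tm_min,
        prefix + "second": g.tm_sec,
    }

def rhostname(hostname: str) -> str:
    return ".".join(reversed(hostname.split(".")))

def path_parts(url: str) -> list:
    parts: list = []
    for comp in urlsplit(url).path.split("/"):
        if comp in ("", "."):
            continue  # drop empty components and single dots
        if comp == "..":
            if parts:
                parts.pop()  # a double dot cancels the previous component
            continue
        parts.append(unquote(comp))
    return parts

print(time_fields("s", Decimal(1000)))
print(path_parts("https://www.example.org/skipped/.//../used/"))  # ['used']
```

With `stime` = 1000 this yields `shour` 0, `sminute` 16, `ssecond` 40, and `stime_msq` 0, which matches the `001640000` part of the `default` `--output` examples of `wrrarms organize`.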
wrrarms run
Compute output values by evaluating expressions `EXPR`s for each of `NUM` reqres stored at `PATH`s, dump the results into newly generated temporary files terminating each value as specified, spawn a given `COMMAND` with given arguments `ARG`s and the resulting temporary file paths appended as the last `NUM` arguments, wait for it to finish, delete the temporary files, exit with the return code of the spawned process.
- positional arguments:
  - `COMMAND`: command to spawn
  - `ARG`: additional arguments to give to the `COMMAND`
  - `PATH`: input WRR file paths to be mapped into new temporary files
- options:
  - `-n NUM, --num-args NUM`: number of `PATH`s (default: `1`)
- expression evaluation:
  - `-e EXPR, --expr EXPR`: see `wrrarms get`
- URL remapping, used by `scrub` `--expr` atom:
  - `--remap-id`: remap all URLs with an identity function; i.e. don't remap anything (default)
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
- output:
  - `--not-separated`: don't separate output values with anything, just concatenate them
  - `-l, --lf-separated`: separate output values with `\n` (LF) newline characters (default)
  - `-z, --zero-separated`: separate output values with `\0` (NUL) bytes
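A rough Python model of what `run` does with the computed values: one temporary file per value, file paths appended to the command line, cleanup afterwards, and the child's exit code passed through. The function name and the `wrrarms-sketch-` prefix are hypothetical, and the real tool's file naming and value termination are not reproduced:

```python
import os
import subprocess
import sys
import tempfile

def run_with_tempfiles(values: list, command: list) -> int:
    """Write each value to its own temporary file, append the file paths
    to `command`, wait for it, delete the files, return its exit code."""
    paths = []
    try:
        for value in values:
            fd, path = tempfile.mkstemp(prefix="wrrarms-sketch-")
            paths.append(path)
            with os.fdopen(fd, "wb") as f:
                f.write(value)
        return subprocess.run(command + paths).returncode
    finally:
        for path in paths:
            os.unlink(path)  # temporary files are always removed

# Usage: a child process that prints the contents of its last argument.
code = run_with_tempfiles(
    [b"hello\n"],
    [sys.executable, "-c", "import sys; print(open(sys.argv[-1]).read(), end='')"],
)
```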
wrrarms stream
Compute given expressions for each of given WRR files, encode them into a requested format, and print the result to stdout.
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `-u, --unabridged`: print all data in full
  - `--abridged`: shorten long strings for brevity (useful when you want to visually scan through batch data dumps) (default)
  - `--format {py,cbor,json,raw}`: generate output in:
    - py: Pythonic Object Representation aka `repr` (default)
    - cbor: CBOR (RFC8949)
    - json: JavaScript Object Notation aka JSON; binary data can't be represented, UNICODE replacement characters will be used
    - raw: concatenate raw values; termination is controlled by `*-terminated` options
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution (default)
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters:
  - `--or EXPR`: only print reqres which match any of these expressions...
  - `--and EXPR`: ... and all of these expressions, both can be specified multiple times, both use the same expression format as `wrrarms get --expr`, which see
- expression evaluation:
  - `-e EXPR, --expr EXPR`: an expression to compute, see `wrrarms get --expr` for more info on expression format; can be specified multiple times; the default is `.`, which will dump the whole reqres structure
- URL remapping, used by `scrub` `--expr` atom:
  - `--remap-id`: remap all URLs with an identity function; i.e. don't remap anything (default)
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
- `--format=raw` output:
  - `--not-terminated`: don't terminate `--format=raw` output values with anything, just concatenate them
  - `-l, --lf-terminated`: terminate `--format=raw` output values with `\n` (LF) newline characters (default)
  - `-z, --zero-terminated`: terminate `--format=raw` output values with `\0` (NUL) bytes
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given (default)
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order (default)
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
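The `py`, `json`, and `raw` output formats described above can be approximated in a few lines of Python. This sketch encodes one reqres' list of computed values; decoding `bytes` as UTF-8 with replacement characters for JSON follows the description, while the exact per-record layout and how non-`bytes` values are rendered in `raw` mode are assumptions. CBOR is left out because the Python standard library has no CBOR encoder:

```python
import json

def _json_safe(value):
    # JSON has no bytes type: decode as UTF-8, using U+FFFD for invalid bytes.
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, (list, tuple)):
        return [_json_safe(v) for v in value]
    return value

def encode(values: list, fmt: str = "py", terminator: bytes = b"\n") -> bytes:
    """Encode one reqres' computed `values` roughly as `--format fmt` would."""
    if fmt == "py":
        return repr(values).encode("utf-8") + b"\n"
    if fmt == "json":
        return json.dumps(_json_safe(values)).encode("ascii") + b"\n"
    if fmt == "raw":
        # `terminator` plays the role of -l (b"\n"), -z (b"\0"),
        # or --not-terminated (b"").
        return b"".join(
            (v if isinstance(v, bytes) else str(v).encode("utf-8")) + terminator
            for v in values
        )
    raise ValueError(f"unsupported format: {fmt!r}")

print(encode([b"\xff", "x"], "json"))
```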
wrrarms find
Print paths of WRR files matching specified criteria.
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution (default)
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters:
  - `--or EXPR`: only output paths to reqres which match any of these expressions...
  - `--and EXPR`: ... and all of these expressions, both can be specified multiple times, both use the same expression format as `wrrarms get --expr`, which see
- output:
  - `-l, --lf-terminated`: terminate output absolute paths of matching WRR files with `\n` (LF) newline characters (default)
  - `-z, --zero-terminated`: terminate output absolute paths of matching WRR files with `\0` (NUL) bytes
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given (default)
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order (default)
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
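The `--stdin0` input, the sorted recursive walk, and the `-z`/`-l` output termination of `find` can be sketched as below. This is an approximation: here "lexicographic order" is modelled by sorting each directory's entries by name, which may differ from sorting whole path strings, and the walk follows symlinks without loop protection:

```python
import os
import tempfile
from typing import Iterator

def read_stdin0(data: bytes) -> list:
    # `--stdin0`: PATHs are NUL-terminated; drop the empty tail after the last NUL.
    return [os.fsdecode(p) for p in data.split(b"\0") if p]

def walk(path: str, reverse: bool = False) -> Iterator[str]:
    # Sort each directory's entries by name; reverse=True mimics `--walk-reversed`.
    if os.path.isdir(path):
        for name in sorted(os.listdir(path), reverse=reverse):
            yield from walk(os.path.join(path, name), reverse)
    else:
        yield os.path.abspath(path)

def emit(paths, zero: bool = False) -> bytes:
    # `-z` terminates each absolute path with NUL, `-l` with LF.
    term = b"\0" if zero else b"\n"
    return b"".join(os.fsencode(p) + term for p in paths)

# Usage: build a small tree and list it in sorted order.
with tempfile.TemporaryDirectory() as d:
    for rel in ("b.wrr", "a/2.wrr", "a/1.wrr"):
        p = os.path.join(d, *rel.split("/"))
        os.makedirs(os.path.dirname(p), exist_ok=True)
        open(p, "wb").close()
    found = [os.path.relpath(p, d) for p in walk(d)]
    found_rev = [os.path.relpath(p, d) for p in walk(d, reverse=True)]
print(found)
```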
wrrarms organize
Parse given WRR files into their respective reqres and then rename/move/hardlink/symlink each file to `DESTINATION` with the new path derived from each reqres' metadata.
Operations that could lead to accidental data loss are not permitted.
E.g. `wrrarms organize --move` will not overwrite any files, which is why the default `--output` contains `%(num)d`.
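The no-overwrite rule explains the role of `%(num)d`: if the generated path is taken, the next `num` is tried. A minimal Python sketch of that idea using the `short` alias documented below (the `free_path` helper is hypothetical; a real implementation would also need to guard against races, e.g. by creating files exclusively). Note that Python's `%`-formatting accepts any text between the parentheses as a mapping key, which is why keys like `%(net_url|to_ascii|sha256|take_prefix 4)s` can appear in a format string at all:

```python
import os
import tempfile

SHORT = "%(syear)d/%(smonth)02d/%(sday)02d/%(stime_ms)d_%(qtime_ms)s.%(num)d"

def free_path(dest: str, fmt: str, fields: dict) -> str:
    """First path under `dest` built from `fmt` that does not exist yet,
    trying num = 0, 1, 2, ... so that no existing file is overwritten."""
    num = 0
    while True:
        path = os.path.join(dest, fmt % {**fields, "num": num})
        if not os.path.lexists(path):
            return path
        num += 1

# Field values for a reqres with stime = 1000 s and qtime = 0 s.
fields = {"syear": 1970, "smonth": 1, "sday": 1, "stime_ms": 1000000, "qtime_ms": 0}

with tempfile.TemporaryDirectory() as dest:
    first = free_path(dest, SHORT, fields)
    os.makedirs(os.path.dirname(first))
    open(first, "wb").close()  # pretend a file was already organized there
    second = free_path(dest, SHORT, fields)
    print(os.path.relpath(first, dest), os.path.relpath(second, dest))
```

On a POSIX system this prints `1970/01/01/1000000_0.0 1970/01/01/1000000_0.1`, the first path matching the `short` example output below.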
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--dry-run`: perform a trial run without actually performing any changes
  - `-q, --quiet`: don't log computed updates to stderr
  - `-t DESTINATION, --to DESTINATION`: destination directory, when unset each source `PATH` must be a directory which will be treated as its own `DESTINATION`
  - `-o FORMAT, --output FORMAT`: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string:
    - available aliases and corresponding %-substitutions:
      - `default`: `%(syear)d/%(smonth)02d/%(sday)02d/%(shour)02d%(sminute)02d%(ssecond)02d%(stime_msq)03d_%(qtime_ms)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(hostname)s.%(num)d` (default)
        - `https://example.org` -> `1970/01/01/001640000_0_GET_50d7_200C_example.org.0`
        - `https://example.org/` -> `1970/01/01/001640000_0_GET_8198_200C_example.org.0`
        - `https://example.org/index.html` -> `1970/01/01/001640000_0_GET_f0dc_200C_example.org.0`
        - `https://example.org/media` -> `1970/01/01/001640000_0_GET_086d_200C_example.org.0`
        - `https://example.org/media/` -> `1970/01/01/001640000_0_GET_3fbb_200C_example.org.0`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `1970/01/01/001640000_0_GET_5658_200C_example.org.0`
        - `https://königsgäßchen.example.org/index.html` -> `1970/01/01/001640000_0_GET_4f11_200C_königsgäßchen.example.org.0`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `1970/01/01/001640000_0_GET_c4ae_200C_ジャジェメント.ですの.example.org.0`
short
:%(syear)d/%(smonth)02d/%(sday)02d/%(stime_ms)d_%(qtime_ms)s.%(num)d
-https://example.org
,https://example.org/
,https://example.org/index.html
,https://example.org/media
,https://example.org/media/
,https://example.org/view?one=1&two=2&three=&three=3#fragment
,https://königsgäßchen.example.org/index.html
,https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->1970/01/01/1000000_0.0
surl
:%(scheme)s/%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s
-https://example.org
,https://example.org/
->https/example.org/
-https://example.org/index.html
->https/example.org/index.html
-https://example.org/media
,https://example.org/media/
->https/example.org/media
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view?one=1&two=2&three&three=3
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is
surl_msn
:%(scheme)s/%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s_%(method)s_%(status)s.%(num)d
-https://example.org
,https://example.org/
->https/example.org/_GET_200C.0
-https://example.org/index.html
->https/example.org/index.html_GET_200C.0
-https://example.org/media
,https://example.org/media/
->https/example.org/media_GET_200C.0
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view?one=1&two=2&three&three=3_GET_200C.0
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index.html_GET_200C.0
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is_GET_200C.0
shupq
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->https/example.org/index.htm
-https://example.org/index.html
->https/example.org/index.html
-https://example.org/media
,https://example.org/media/
->https/example.org/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three&three=3.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index.htm
shupq_msn
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->https/example.org/index_GET_200C.0.htm
-https://example.org/index.html
->https/example.org/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->https/example.org/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index_GET_200C.0.htm
shupnq
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->https/example.org/index.htm
-https://example.org/index.html
->https/example.org/index.html
-https://example.org/media
,https://example.org/media/
->https/example.org/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index.htm
shupnq_msn
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->https/example.org/index_GET_200C.0.htm
-https://example.org/index.html
->https/example.org/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->https/example.org/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index_GET_200C.0.htm
shupnq_mhs
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->https/example.org/index_GET_50d7_200C.htm
-https://example.org/
->https/example.org/index_GET_8198_200C.htm
-https://example.org/index.html
->https/example.org/index_GET_f0dc_200C.html
-https://example.org/media
->https/example.org/media/index_GET_086d_200C.htm
-https://example.org/media/
->https/example.org/media/index_GET_3fbb_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three=3_GET_5658_200C.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index_GET_4f11_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index_GET_c4ae_200C.htm
shupnq_mhsn
:%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
->https/example.org/index_GET_50d7_200C.0.htm
-https://example.org/
->https/example.org/index_GET_8198_200C.0.htm
-https://example.org/index.html
->https/example.org/index_GET_f0dc_200C.0.html
-https://example.org/media
->https/example.org/media/index_GET_086d_200C.0.htm
-https://example.org/media/
->https/example.org/media/index_GET_3fbb_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/example.org/view/index?one=1&two=2&three=3_GET_5658_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/königsgäßchen.example.org/index_GET_4f11_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/ジャジェメント.ですの.example.org/испытание/is/index_GET_c4ae_200C.0.htm
srhupq
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->https/org.example/index.htm
-https://example.org/index.html
->https/org.example/index.html
-https://example.org/media
,https://example.org/media/
->https/org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three&three=3.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index.htm
srhupq_msn
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->https/org.example/index_GET_200C.0.htm
-https://example.org/index.html
->https/org.example/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->https/org.example/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index_GET_200C.0.htm
srhupnq
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->https/org.example/index.htm
-https://example.org/index.html
->https/org.example/index.html
-https://example.org/media
,https://example.org/media/
->https/org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index.htm
srhupnq_msn
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->https/org.example/index_GET_200C.0.htm
-https://example.org/index.html
->https/org.example/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->https/org.example/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index_GET_200C.0.htm
srhupnq_mhs
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->https/org.example/index_GET_50d7_200C.htm
-https://example.org/
->https/org.example/index_GET_8198_200C.htm
-https://example.org/index.html
->https/org.example/index_GET_f0dc_200C.html
-https://example.org/media
->https/org.example/media/index_GET_086d_200C.htm
-https://example.org/media/
->https/org.example/media/index_GET_3fbb_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three=3_GET_5658_200C.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index_GET_4f11_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index_GET_c4ae_200C.htm
srhupnq_mhsn
:%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
->https/org.example/index_GET_50d7_200C.0.htm
-https://example.org/
->https/org.example/index_GET_8198_200C.0.htm
-https://example.org/index.html
->https/org.example/index_GET_f0dc_200C.0.html
-https://example.org/media
->https/org.example/media/index_GET_086d_200C.0.htm
-https://example.org/media/
->https/org.example/media/index_GET_3fbb_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->https/org.example/view/index?one=1&two=2&three=3_GET_5658_200C.0.htm
-https://königsgäßchen.example.org/index.html
->https/org.example.königsgäßchen/index_GET_4f11_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->https/org.example.ですの.ジャジェメント/испытание/is/index_GET_c4ae_200C.0.htm
url
:%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s
-https://example.org
,https://example.org/
->example.org/
-https://example.org/index.html
->example.org/index.html
-https://example.org/media
,https://example.org/media/
->example.org/media
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view?one=1&two=2&three&three=3
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is
url_msn
:%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s_%(method)s_%(status)s.%(num)d
-https://example.org
,https://example.org/
->example.org/_GET_200C.0
-https://example.org/index.html
->example.org/index.html_GET_200C.0
-https://example.org/media
,https://example.org/media/
->example.org/media_GET_200C.0
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view?one=1&two=2&three&three=3_GET_200C.0
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html_GET_200C.0
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is_GET_200C.0
hupq
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.htm
-https://example.org/index.html
->example.org/index.html
-https://example.org/media
,https://example.org/media/
->example.org/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three&three=3.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index.htm
hupq_msn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index_GET_200C.0.htm
-https://example.org/index.html
->example.org/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->example.org/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index_GET_200C.0.htm
hupnq
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.htm
-https://example.org/index.html
->example.org/index.html
-https://example.org/media
,https://example.org/media/
->example.org/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index.htm
hupnq_msn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index_GET_200C.0.htm
-https://example.org/index.html
->example.org/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->example.org/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index_GET_200C.0.htm
hupnq_mhs
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->example.org/index_GET_50d7_200C.htm
-https://example.org/
->example.org/index_GET_8198_200C.htm
-https://example.org/index.html
->example.org/index_GET_f0dc_200C.html
-https://example.org/media
->example.org/media/index_GET_086d_200C.htm
-https://example.org/media/
->example.org/media/index_GET_3fbb_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three=3_GET_5658_200C.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_4f11_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index_GET_c4ae_200C.htm
hupnq_mhsn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
->example.org/index_GET_50d7_200C.0.htm
-https://example.org/
->example.org/index_GET_8198_200C.0.htm
-https://example.org/index.html
->example.org/index_GET_f0dc_200C.0.html
-https://example.org/media
->example.org/media/index_GET_086d_200C.0.htm
-https://example.org/media/
->example.org/media/index_GET_3fbb_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view/index?one=1&two=2&three=3_GET_5658_200C.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_4f11_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание/is/index_GET_c4ae_200C.0.htm
rhupq
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.htm
-https://example.org/index.html
->org.example/index.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three&three=3.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.htm
rhupq_msn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index_GET_200C.0.htm
-https://example.org/index.html
->org.example/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index_GET_200C.0.htm
rhupnq
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.htm
-https://example.org/index.html
->org.example/index.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.htm
rhupnq_msn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index_GET_200C.0.htm
-https://example.org/index.html
->org.example/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index_GET_200C.0.htm
rhupnq_mhs
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->org.example/index_GET_50d7_200C.htm
-https://example.org/
->org.example/index_GET_8198_200C.htm
-https://example.org/index.html
->org.example/index_GET_f0dc_200C.html
-https://example.org/media
->org.example/media/index_GET_086d_200C.htm
-https://example.org/media/
->org.example/media/index_GET_3fbb_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3_GET_5658_200C.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index_GET_4f11_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index_GET_c4ae_200C.htm
rhupnq_mhsn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
->org.example/index_GET_50d7_200C.0.htm
-https://example.org/
->org.example/index_GET_8198_200C.0.htm
-https://example.org/index.html
->org.example/index_GET_f0dc_200C.0.html
-https://example.org/media
->org.example/media/index_GET_086d_200C.0.htm
-https://example.org/media/
->org.example/media/index_GET_3fbb_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3_GET_5658_200C.0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index_GET_4f11_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index_GET_c4ae_200C.0.htm
flat
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.htm
-https://example.org/index.html
->example.org/index.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.htm
flat_ms
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index_GET_200C.htm
-https://example.org/index.html
->example.org/index_GET_200C.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index_GET_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3_GET_200C.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index_GET_200C.htm
flat_msn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index_GET_200C.0.htm
-https://example.org/index.html
->example.org/index_GET_200C.0.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index_GET_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3_GET_200C.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index_GET_200C.0.htm
flat_mhs
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->example.org/index_GET_50d7_200C.htm
-https://example.org/
->example.org/index_GET_8198_200C.htm
-https://example.org/index.html
->example.org/index_GET_f0dc_200C.html
-https://example.org/media
->example.org/media__index_GET_086d_200C.htm
-https://example.org/media/
->example.org/media__index_GET_3fbb_200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3_GET_5658_200C.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_4f11_200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index_GET_c4ae_200C.htm
flat_mhsn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s.%(num)d%(filepath_ext)s
-https://example.org
->example.org/index_GET_50d7_200C.0.htm
-https://example.org/
->example.org/index_GET_8198_200C.0.htm
-https://example.org/index.html
->example.org/index_GET_f0dc_200C.0.html
-https://example.org/media
->example.org/media__index_GET_086d_200C.0.htm
-https://example.org/media/
->example.org/media__index_GET_3fbb_200C.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3_GET_5658_200C.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index_GET_4f11_200C.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index_GET_c4ae_200C.0.htm
- available substitutions:
num
: number of times the resulting output path was encountered before; adding this parameter to your --output format will ensure all generated file names will be unique
- all expressions of wrrarms get --expr, which see
--stdin0
: read zero-terminated PATHs from stdin; these will be processed after PATHs specified as command-line arguments
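Since the --output formats are pythonic %-substitution strings, plain Python's % operator can render the simpler ones, as this sketch does for the short alias using the field values behind the 1970/01/01 examples. Keys containing | pipes (like %(net_url|to_ascii|sha256|take_prefix 4)s) are not special to Python: % simply passes the whole key to the mapping's __getitem__. The PipeFields class and the path field below are hypothetical, and the semantics given to sha256, take_prefix, and replace are assumptions, so its hash prefixes are not guaranteed to match the ones shown in the examples above.

```python
import hashlib

# Field values behind the 1970/01/01 examples above (illustrative).
fields = {
    "syear": 1970,
    "smonth": 1,
    "sday": 1,
    "stime_ms": 1000000,  # 1000 seconds, i.e. 00:16:40 after the epoch
    "qtime_ms": "0",
    "num": 0,
}

short = "%(syear)d/%(smonth)02d/%(sday)02d/%(stime_ms)d_%(qtime_ms)s.%(num)d"
print(short % fields)  # 1970/01/01/1000000_0.0


class PipeFields:
    """Hypothetical mapping that evaluates "name|func arg ...|func" keys."""

    def __init__(self, values):
        self.values = values
        # Assumed semantics for a few of the pipe functions used by the aliases.
        self.funcs = {
            "sha256": lambda v: hashlib.sha256(str(v).encode("utf-8")).hexdigest(),
            "take_prefix": lambda v, n: v[: int(n)],
            "replace": lambda v, old, new: v.replace(old, new),
        }

    def __getitem__(self, key):
        # "%(a|f x|g)s" % obj calls obj["a|f x|g"]; split it into a pipeline.
        name, *steps = key.split("|")
        value = self.values[name]
        for step in steps:
            func, *args = step.split(" ")
            value = self.funcs[func](value, *args)
        return value


pipes = PipeFields({"net_url": "https://example.org/", "path": "media/index"})
print("%(net_url|sha256|take_prefix 4)s" % pipes)  # four hex digits
print("%(path|replace / __)s" % pipes)  # media__index
```

Pipe-free keys pass straight through, so short % PipeFields(fields) renders the same string as short % fields.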
- error handling:
--errors {fail,skip,ignore}
: when an error occurs:
- fail: report failure and stop the execution (default)
- skip: report failure but skip the reqres that produced it from the output and continue
- ignore: skip, but don't report the failure
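A minimal sketch of the three --errors policies applied to a per-item handler; process_all and its signature are hypothetical and only model the behaviour described above.

```python
import sys


def process_all(items, handle, errors="fail"):
    """Apply handle() to each item following a fail/skip/ignore policy."""
    results = []
    for item in items:
        try:
            results.append(handle(item))
        except Exception as exc:
            if errors == "fail":
                raise  # report failure and stop
            if errors == "skip":
                # report the failure, drop this item, keep going
                print(f"error: {item!r}: {exc}", file=sys.stderr)
            # errors == "ignore": drop this item silently
    return results
```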
- filters:
--or EXPR
: only work on reqres which match any of these expressions...
--and EXPR
: ... and all of these expressions; both can be specified multiple times, both use the same expression format as wrrarms get --expr, which see
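The --or/--and combination can be modelled with Python predicates standing in for EXPR strings such as "status|== 200C". This is a hypothetical sketch (matches is not a wrrarms function); treating an empty --or list as "no constraint" is an assumption about what happens when only --and is given.

```python
def matches(reqres, ors, ands):
    """At least one --or predicate (if any were given) and every --and
    predicate must hold for the reqres to be processed."""
    if ors and not any(pred(reqres) for pred in ors):
        return False
    return all(pred(reqres) for pred in ands)


# A dict stands in for a parsed reqres here.
reqres = {"status": "200C", "method": "GET"}
print(matches(reqres, [], [lambda r: r["status"] == "200C"]))
```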
- output:
--no-output
: don't print anything (default)
-l, --lf-terminated
: terminate output absolute paths of newly produced files with \n (LF) newline characters
-z, --zero-terminated
: terminate output absolute paths of newly produced files with \0 (NUL) bytes
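Zero-terminated path lists are used because file names may contain newlines but never NUL bytes. The helpers below (hypothetical names) sketch how a Python script could produce input for --stdin0 or parse output written with -z, --zero-terminated.

```python
import os


def encode_paths0(paths):
    """Serialize paths as NUL-terminated bytes."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def decode_paths0(data):
    """Split NUL-terminated bytes back into paths, dropping the empty
    element produced by the trailing NUL."""
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# A newline inside a file name survives the round trip.
print(decode_paths0(encode_paths0(["a/b.wrr", "weird\nname.wrr"])))
```

For example, the bytes returned by encode_paths0() could be written to the stdin of a wrrarms organize --stdin0 process, and its -z output fed back through decode_paths0().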
- action:
--move
: move source files under DESTINATION (default)
--copy
: copy source files to files under DESTINATION
--hardlink
: create hardlinks from source files to paths under DESTINATION
--symlink
: create symlinks from source files to paths under DESTINATION
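The four actions map naturally onto standard library calls. The following is a sketch under stated assumptions, not wrrarms' implementation: apply_action is a hypothetical helper, it refuses any existing destination (stricter than --keep, which may replace broken symlinks), and it uses absolute symlink targets, which may differ from what wrrarms itself does.

```python
import os
import shutil


def apply_action(action, src, dst):
    """Move/copy/hardlink/symlink src to dst, never replacing an existing dst."""
    if os.path.lexists(dst):  # lexists() also sees broken symlinks
        raise FileExistsError(dst)
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if action == "move":
        os.rename(src, dst)  # only works within a single file system
    elif action == "copy":
        shutil.copy2(src, dst)  # copies data and metadata such as mtime
    elif action == "hardlink":
        os.link(src, dst)
    elif action == "symlink":
        os.symlink(os.path.abspath(src), dst)
    else:
        raise ValueError(f"unknown action: {action}")
```

The existence check and the write are separate steps, so a concurrent writer could still race this sketch.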
- updates:
--keep
: disallow replacements and overwrites for any existing files under DESTINATION (default); broken symlinks are allowed to be replaced; if source and target directories are the same then some files can still be renamed into previously non-existing names; all other updates are disallowed
--latest
: replace files under DESTINATION if stime_ms for the source reqres is newer than the same value for reqres stored at the destination
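A hypothetical decision rule combining the --keep and --latest descriptions above. may_write and its parameters are illustrative, and the stime_ms of the reqres already stored at the destination is assumed to be known (wrrarms would have to read the destination file to obtain it).

```python
import os


def may_write(dst, mode, src_stime_ms, dst_stime_ms=None):
    """Decide whether a new file may be written at dst under "keep" or "latest"."""
    if not os.path.lexists(dst):
        return True  # nothing there yet
    if os.path.islink(dst) and not os.path.exists(dst):
        return True  # broken symlink: allowed to be replaced
    if mode == "latest":
        # replace only if the source reqres is strictly newer
        return dst_stime_ms is not None and src_stime_ms > dst_stime_ms
    return False  # "keep": never overwrite
```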
- caching, deferring, and batching:
--seen-number INT
: track at most this many distinct generated --output values; default: 16384; making this larger improves disk performance at the cost of increased memory consumption; setting it to zero will force wrrarms to constantly re-check existence of --output files and force wrrarms to execute all IO actions immediately, disregarding the --defer-number setting
--cache-number INT
: cache stat(2) information about this many files in memory; default: 8192; making this larger improves performance at the cost of increased memory consumption; setting this too low will likely force wrrarms into repeatedly performing lots of stat(2) system calls on the same files; setting this to a value smaller than --defer-number will not improve memory consumption very much since deferred IO actions also cache information about their own files
--defer-number INT
: defer at most this many IO actions; default: 1024; making this larger improves performance at the cost of increased memory consumption; setting it to zero will force all IO actions to be applied immediately
--batch-number INT
: queue at most this many deferred IO actions to be applied together in a batch; this queue will only be used if all other resource constraints are met; default: 128
--max-memory INT
: the caches, the deferred actions queue, and the batch queue, all taken together, must not take more than this much memory in MiB; default: 1024; making this larger improves performance; the actual maximum whole-program memory consumption is O(<size of the largest reqres> + <--seen-number> + <sum of lengths of the last --seen-number generated --output paths> + <--cache-number> + <--defer-number> + <--batch-number> + <--max-memory>)
--lazy
: sets all of the above options to positive infinity; most useful when doing wrrarms organize --symlink --latest --output flat or similar, where the number of distinct generated --output values and the amount of other data wrrarms needs to keep in memory is small, in which case it will force wrrarms to compute the desired file system state first and then perform all disk writes in a single batch
- file system path ordering:
  - --paths-given-order: argv and --stdin0 PATHs are processed in the order they are given (default when --keep)
  - --paths-sorted: argv and --stdin0 PATHs are processed in lexicographic order
  - --paths-reversed: argv and --stdin0 PATHs are processed in reverse lexicographic order (default when --latest)
  - --walk-fs-order: recursive file system walk is done in the order readdir(2) gives results
  - --walk-sorted: recursive file system walk is done in lexicographic order (default when --keep)
  - --walk-reversed: recursive file system walk is done in reverse lexicographic order (default when --latest)
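The --keep/--latest choice boils down to a small per-output-path decision rule. The sketch below restates it in Python under stated assumptions: ReqresMeta and should_replace are hypothetical names, only the stime_ms field is modelled, "nothing readable at the destination" (no file, or a broken symlink) is represented as None, and the same-directory renaming case of --keep is not modelled. It is an illustration, not wrrarms' actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReqresMeta:
    """Hypothetical stand-in for WRR metadata; only stime_ms is modelled."""
    stime_ms: int


def should_replace(mode: str, source: ReqresMeta,
                   existing: Optional[ReqresMeta]) -> bool:
    """Decide whether `source` may be written to an output path.

    `existing` is None when nothing readable is stored there yet
    (no file, or a broken symlink, which --keep also allows replacing).
    """
    if existing is None:
        return True
    if mode == "keep":
        # --keep: never overwrite an existing file
        return False
    if mode == "latest":
        # --latest: overwrite only if the source reqres is strictly newer
        return source.stime_ms > existing.stime_ms
    raise ValueError(f"unknown update mode: {mode!r}")


print(should_replace("latest", ReqresMeta(2_000), ReqresMeta(1_000)))  # True
print(should_replace("keep", ReqresMeta(2_000), ReqresMeta(1_000)))    # False
```

In this sketch, re-applying --latest to inputs that were already processed changes nothing, because an equal stime_ms never triggers a replacement.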
wrrarms import
Use the specified parser to parse data in each INPUT PATH into reqres and dump them under DESTINATION with paths derived from their metadata. In short, this is wrrarms organize --copy but for non-WRR INPUT files.
- file formats:
  - {mitmproxy}
    - mitmproxy: convert mitmproxy stream dumps into WRR files
wrrarms import mitmproxy
Parse each INPUT PATH as a mitmproxy stream dump (by using mitmproxy's own parser) into a sequence of reqres and dump them under DESTINATION with paths derived from their metadata.
- positional arguments:
  - PATH: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - --dry-run: perform a trial run without actually performing any changes
  - -q, --quiet: don't log computed updates to stderr
  - -t DESTINATION, --to DESTINATION: destination directory
  - -o FORMAT, --output FORMAT: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string; same as wrrarms organize --output, which see
  - --stdin0: read zero-terminated PATHs from stdin, these will be processed after PATHs specified as command-line arguments
- error handling:
  - --errors {fail,skip,ignore}: when an error occurs:
    - fail: report failure and stop the execution (default)
    - skip: report failure but skip the reqres that produced it from the output and continue
    - ignore: skip, but don't report the failure
- filters:
  - --or EXPR: only import reqres which match any of these expressions...
  - --and EXPR: ... and all of these expressions, both can be specified multiple times, both use the same expression format as wrrarms get --expr, which see
- output:
  - --no-output: don't print anything (default)
  - -l, --lf-terminated: terminate output absolute paths of newly produced files with \n (LF) newline characters
  - -z, --zero-terminated: terminate output absolute paths of newly produced files with \0 (NUL) bytes
- caching, deferring, and batching:
  - --seen-number INT: track at most this many distinct generated --output values; default: 16384; making this larger improves disk performance at the cost of increased memory consumption; setting it to zero will force wrrarms to constantly re-check existence of --output files and to execute all IO actions immediately, disregarding the --defer-number setting
  - --cache-number INT: cache stat(2) information about this many files in memory; default: 8192; making this larger improves performance at the cost of increased memory consumption; setting this too low will likely force wrrarms into repeatedly performing lots of stat(2) system calls on the same files; setting this to a value smaller than --defer-number will not improve memory consumption very much since deferred IO actions also cache information about their own files
  - --defer-number INT: defer at most this many IO actions; default: 0; making this larger improves performance at the cost of increased memory consumption; setting it to zero will force all IO actions to be applied immediately
  - --batch-number INT: queue at most this many deferred IO actions to be applied together in a batch; this queue will only be used if all other resource constraints are met; default: 1024
  - --max-memory INT: the caches, the deferred actions queue, and the batch queue, all taken together, must not take more than this much memory in MiB; default: 1024; making this larger improves performance; the actual maximum whole-program memory consumption is O(<size of the largest reqres> + <--seen-number> + <sum of lengths of the last --seen-number generated --output paths> + <--cache-number> + <--defer-number> + <--batch-number> + <--max-memory>)
  - --lazy: sets all of the above options to positive infinity; most useful when doing wrrarms organize --symlink --latest --output flat or similar, where the number of distinct generated --output values and the amount of other data wrrarms needs to keep in memory are small, in which case it will force wrrarms to compute the desired file system state first and then perform all disk writes in a single batch
- file system path ordering:
  - --paths-given-order: argv and --stdin0 PATHs are processed in the order they are given (default)
  - --paths-sorted: argv and --stdin0 PATHs are processed in lexicographic order
  - --paths-reversed: argv and --stdin0 PATHs are processed in reverse lexicographic order
  - --walk-fs-order: recursive file system walk is done in the order readdir(2) gives results
  - --walk-sorted: recursive file system walk is done in lexicographic order (default)
  - --walk-reversed: recursive file system walk is done in reverse lexicographic order
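The two filter options combine as "any of the --or expressions and all of the --and expressions". Below is a minimal Python model of that rule. It assumes reqres are plain dicts and each expression is a predicate; passes_filters, the dict keys, and treating an empty --or list as satisfied are assumptions made for illustration and are not taken from wrrarms.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical flattened reqres: a mapping of field names to values.
Reqres = Mapping[str, Any]
Predicate = Callable[[Reqres], bool]


def passes_filters(reqres: Reqres,
                   ors: Sequence[Predicate] = (),
                   ands: Sequence[Predicate] = ()) -> bool:
    """True if reqres matches any of `ors` and all of `ands`."""
    # Assumption: when no --or expression is given, that part counts as matched.
    any_ok = not ors or any(p(reqres) for p in ors)
    return any_ok and all(p(reqres) for p in ands)


# Rough analogues of --and "status|== 200C" and --and "response.body|len|> 1024".
def is_complete_200(r: Reqres) -> bool:
    return r["status"] == "200C"


def big_body(r: Reqres) -> bool:
    return len(r["body"]) > 1024


samples = [
    {"status": "200C", "body": b"x" * 2048},
    {"status": "200C", "body": b"tiny"},
    {"status": "404C", "body": b"x" * 4096},
]
kept = [r for r in samples if passes_filters(r, ands=[is_complete_200, big_body])]
print(len(kept))  # 1
```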
wrrarms export
Parse given WRR files into their respective reqres, convert them to another file format, and then dump the result under DESTINATION with the new path derived from each reqres' metadata.
- file formats:
  - {mirror}
    - mirror: convert given WRR files into a local website mirror stored in interlinked plain files
wrrarms export mirror
Parse given WRR files, filter out those that have no responses, transform and then dump their response bodies into separate files under DESTINATION with the new path derived from each reqres' metadata. In short, this is a combination of wrrarms organize --copy followed by in-place wrrarms get. In other words, this generates static offline website mirrors, producing results similar to those of wget -mpk.
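Rewriting links so that a mirror works offline, as wget -k does, comes down to replacing each URL with the path of its output file relative to the page that references it. Here is a minimal sketch. It assumes the URL-to-output-path mapping (e.g. via the hupq format) has already been computed. relative_link and the example paths are hypothetical, and this is not the exporter's own code.

```python
import posixpath


def relative_link(from_file: str, to_file: str) -> str:
    """Link to put inside `from_file` so that it points at `to_file`.

    Both arguments are output paths relative to the mirror root
    (DESTINATION), always using "/" separators.
    """
    # Browsers resolve relative links against the page's own directory.
    start_dir = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start=start_dir)


print(relative_link("example.org/blog/post.htm", "example.org/static/style.css"))
# ../static/style.css
```

A real converter would also need to percent-encode characters such as ? and # that may appear in generated file names before putting them into href attributes. The sketch leaves that out.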
- positional arguments:
  - PATH: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - --dry-run: perform a trial run without actually performing any changes
  - -q, --quiet: don't log computed updates to stderr
  - -t DESTINATION, --to DESTINATION: target directory
  - -o FORMAT, --output FORMAT: format describing generated output paths, an alias name or a custom pythonic %-substitution string; same as wrrarms organize --output, which see
  - --stdin0: read zero-terminated PATHs from stdin, these will be processed after PATHs specified as command-line arguments
- error handling:
  - --errors {fail,skip,ignore}: when an error occurs:
    - fail: report failure and stop the execution (default)
    - skip: report failure but skip the reqres that produced it from the output and continue
    - ignore: skip, but don't report the failure
- filters:
  - --or EXPR: only export reqres which match any of these expressions...
  - --and EXPR: ... and all of these expressions, both can be specified multiple times, both use the same expression format as wrrarms get --expr, which see
- output:
  - --no-output: don't print anything (default)
  - -l, --lf-terminated: terminate output absolute paths of newly produced files with \n (LF) newline characters
  - -z, --zero-terminated: terminate output absolute paths of newly produced files with \0 (NUL) bytes
- expression evaluation:
  - -e EXPR, --expr EXPR: an expression to export, see wrrarms get --expr for more info on expression format (default: response.body|eb|scrub response +all_refs,-actions)
- URL remapping, used by the scrub --expr atom:
  - --remap-id: remap all URLs with an identity function; i.e. don't remap anything
  - --remap-void: remap all jump-link and action URLs to javascript:void(0) and all resource URLs into empty data: URLs; resulting web pages will be self-contained
  - --remap-open, -k, --convert-links: point all URLs present in input PATHs and reachable from --roots in no more than --depth steps to their corresponding output paths, remap all other URLs like --remap-id does; this is similar to wget (-k|--convert-links)
  - --remap-closed: remap all reachable URLs like --remap-open does, remap all other URLs like --remap-void does; exported mirrors will be self-contained
  - --remap-all: remap all reachable URLs like --remap-open does, point other URLs to paths produced by the current --output format for a corresponding trivial GET <URL> -> 200 OK reqres; this will produce broken links if the --output format depends on anything but the URL itself, but for a simple --output (like the default hupq) this allows wrrarms export to be used incrementally; exported mirrors will be self-contained (default)
- export targets (default: net_urls of all input PATHs):
  - -r URL, --root URL: recursion root; a URL which will be used as a root for recursive export; can be specified multiple times; if none are specified, then all URLs available from PATHs are treated as roots
  - -d DEPTH, --depth DEPTH: maximum recursion depth level; the default is 0, which means "--root documents and their resources only"; setting this to 1 will also export one level of documents referenced via jump and action links, if those are being remapped to local files with --remap-*; higher values will mean even more recursion
- file system path ordering:
  - --paths-given-order: argv and --stdin0 PATHs are processed in the order they are given (default)
  - --paths-sorted: argv and --stdin0 PATHs are processed in lexicographic order
  - --paths-reversed: argv and --stdin0 PATHs are processed in reverse lexicographic order
  - --walk-fs-order: recursive file system walk is done in the order readdir(2) gives results
  - --walk-sorted: recursive file system walk is done in lexicographic order (default)
  - --walk-reversed: recursive file system walk is done in reverse lexicographic order
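The --root/--depth pair describes a breadth-first walk that stops after a fixed number of jump/action-link steps. The sketch below models that traversal. It assumes a hypothetical jump_links(url) callback that returns the jump and action links found in the document stored for url. It deliberately leaves out the resources (images, stylesheets) that each exported document pulls in, and it is not wrrarms' own traversal code.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable


def reachable(roots: Iterable[str],
              jump_links: Callable[[str], Iterable[str]],
              depth: int) -> Dict[str, int]:
    """Return {url: steps from the nearest root} for URLs within `depth` steps."""
    steps: Dict[str, int] = {}
    queue: Deque[str] = deque()
    for root in roots:
        if root not in steps:
            steps[root] = 0
            queue.append(root)
    while queue:
        url = queue.popleft()
        if steps[url] >= depth:
            # depth 0 keeps only the roots; never follow links past the limit
            continue
        for target in jump_links(url):
            if target not in steps:
                steps[target] = steps[url] + 1
                queue.append(target)
    return steps


# Hypothetical link graph standing in for parsed documents.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": []}
print(reachable(["a"], lambda u: LINKS.get(u, []), 1))  # {'a': 0, 'b': 1, 'c': 1}
```

Breadth-first order is what makes the step counts minimal. A page linked both from a root and from a deep page is recorded at its smallest distance, so --depth does not cut it off by accident.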
Examples
- Pretty-print all reqres in ../dumb_server/pwebarc-dump using an abridged (for ease of reading and rendering) verbose textual representation:
wrrarms pprint ../dumb_server/pwebarc-dump
- Pipe raw response body from a given WRR file to stdout:
wrrarms get ../dumb_server/pwebarc-dump/path/to/file.wrr
- Pipe response body scrubbed of dynamic content from a given WRR file to stdout:
wrrarms get -e "response.body|eb|scrub response defaults" ../dumb_server/pwebarc-dump/path/to/file.wrr
- Get the first 4 characters of the hex digest of the SHA256 hash computed on the URL without the fragment/hash part:
wrrarms get -e "net_url|to_ascii|sha256|take_prefix 4" ../dumb_server/pwebarc-dump/path/to/file.wrr
- Pipe the response body from a given WRR file to stdout, but less efficiently, by generating a temporary file and giving it to cat:
wrrarms run cat ../dumb_server/pwebarc-dump/path/to/file.wrr
Thus wrrarms run can be used to do almost anything you want, e.g.:
wrrarms run less ../dumb_server/pwebarc-dump/path/to/file.wrr
wrrarms run -- sort -R ../dumb_server/pwebarc-dump/path/to/file.wrr
wrrarms run -n 2 -- diff -u ../dumb_server/pwebarc-dump/path/to/file-v1.wrr ../dumb_server/pwebarc-dump/path/to/file-v2.wrr
- List paths of all WRR files from ../dumb_server/pwebarc-dump that contain only complete 200 OK responses with bodies larger than 1K:
wrrarms find --and "status|== 200C" --and "response.body|len|> 1024" ../dumb_server/pwebarc-dump
- Rename all WRR files in ../dumb_server/pwebarc-dump/default according to their metadata using --output default (see the wrrarms organize section for its definition; the default format is designed to be human-readable while causing almost no collisions, which keeps the num substitution parameter almost always equal to 0, making things nice and deterministic):
wrrarms organize ../dumb_server/pwebarc-dump/default
Alternatively, just show what would be done:
wrrarms organize --dry-run ../dumb_server/pwebarc-dump/default
- The output of wrrarms organize --zero-terminated can be piped into wrrarms organize --stdin0 to perform complex updates. E.g., the following will move new reqres from ../dumb_server/pwebarc-dump to ~/pwebarc/raw, renaming them with --output default; the for loop is there to preserve profiles:
for arg in ../dumb_server/pwebarc-dump/* ; do
  wrrarms organize --zero-terminated --to ~/pwebarc/raw/"$(basename "$arg")" "$arg"
done > changes
Then, we can reuse changes to symlink all new files from ~/pwebarc/raw to ~/pwebarc/all using --output hupq_msn, which would show most of the URL in the file name:
wrrarms organize --stdin0 --symlink --to ~/pwebarc/all --output hupq_msn < changes
And then, we can reuse changes again and use them to update ~/pwebarc/latest, filling it with symlinks pointing to the latest complete 200 OK reqres from ~/pwebarc/raw, similar to what wget -r would produce (except wget would do network requests and produce response bodies, while this will build a file system tree of symlinks to WRR files in ~/pwebarc/raw):
wrrarms organize --stdin0 --symlink --latest --to ~/pwebarc/latest --output hupq --and "status|== 200C" < changes
- wrrarms organize --move is de-duplicating when possible, while --copy, --hardlink, and --symlink are non-duplicating when possible, i.e.:
wrrarms organize --copy --to ~/pwebarc/copy1 ~/pwebarc/original
wrrarms organize --copy --to ~/pwebarc/copy2 ~/pwebarc/original
wrrarms organize --hardlink --to ~/pwebarc/copy3 ~/pwebarc/original
# noops
wrrarms organize --copy --to ~/pwebarc/copy1 ~/pwebarc/original
wrrarms organize --hardlink --to ~/pwebarc/copy1 ~/pwebarc/original
wrrarms organize --copy --to ~/pwebarc/copy2 ~/pwebarc/original
wrrarms organize --hardlink --to ~/pwebarc/copy2 ~/pwebarc/original
wrrarms organize --copy --to ~/pwebarc/copy3 ~/pwebarc/original
wrrarms organize --hardlink --to ~/pwebarc/copy3 ~/pwebarc/original
# de-duplicate
wrrarms organize --move --to ~/pwebarc/all ~/pwebarc/original ~/pwebarc/copy1 ~/pwebarc/copy2 ~/pwebarc/copy3
will produce ~/pwebarc/all, which has each duplicated file stored only once. Similarly,
wrrarms organize --symlink --output hupq_msn --to ~/pwebarc/pointers ~/pwebarc/original
wrrarms organize --symlink --output shupq_msn --to ~/pwebarc/schemed ~/pwebarc/original
# noop
wrrarms organize --symlink --output hupq_msn --to ~/pwebarc/pointers ~/pwebarc/original ~/pwebarc/schemed
will produce ~/pwebarc/pointers, which has each symlink only once.
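The net_url|to_ascii|sha256|take_prefix 4 pipeline from the examples above can be approximated with the Python standard library. The approximation assumes the URL is already plain ASCII and that net_url differs from the raw URL only by dropping the fragment. wrrarms may normalize URLs further, so treat url_hash_prefix (a hypothetical name) as an approximation rather than a drop-in replacement.

```python
import hashlib
from urllib.parse import urldefrag


def url_hash_prefix(url: str, n: int = 4) -> str:
    """First n hex digits of SHA256 over the URL minus its #fragment."""
    # Approximates net_url by dropping the fragment only (assumption).
    without_fragment, _fragment = urldefrag(url)
    # Assumption: the URL is plain ASCII; non-ASCII URLs would raise here.
    return hashlib.sha256(without_fragment.encode("ascii")).hexdigest()[:n]


print(url_hash_prefix("https://example.org/page#top"))
```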
Advanced examples
- Pretty-print all reqres in ../dumb_server/pwebarc-dump by dumping their whole structure into an abridged Pythonic Object Representation (repr):
wrrarms stream --expr . ../dumb_server/pwebarc-dump
wrrarms stream -e . ../dumb_server/pwebarc-dump
- Pretty-print all reqres in ../dumb_server/pwebarc-dump using the unabridged verbose textual representation:
wrrarms pprint --unabridged ../dumb_server/pwebarc-dump
wrrarms pprint -u ../dumb_server/pwebarc-dump
- Pretty-print all reqres in ../dumb_server/pwebarc-dump by dumping their whole structure into the unabridged Pythonic Object Representation (repr) format:
wrrarms stream --unabridged --expr . ../dumb_server/pwebarc-dump
wrrarms stream -ue . ../dumb_server/pwebarc-dump
- Produce a JSON list of [<file path>, <time it finished loading in milliseconds since UNIX epoch>, <URL>] tuples (one per reqres) and pipe it into jq for indented and colored output:
wrrarms stream --format=json -ue fs_path -e finished_at -e request.url ../dumb_server/pwebarc-dump | jq .
- Similarly, but produce a CBOR output:
wrrarms stream --format=cbor -ue fs_path -e finished_at -e request.url ../dumb_server/pwebarc-dump | less
- Concatenate all response bodies of all the requests in ../dumb_server/pwebarc-dump:
wrrarms stream --format=raw --not-terminated -ue "response.body|es" ../dumb_server/pwebarc-dump | less
- Print all unique visited URLs, one per line:
wrrarms stream --format=raw --lf-terminated -ue request.url ../dumb_server/pwebarc-dump | sort | uniq
- Same idea, but using NUL bytes while processing and printing two URLs per line:
wrrarms stream --format=raw --zero-terminated -ue request.url ../dumb_server/pwebarc-dump | sort -z | uniq -z | xargs -0 -n2 echo
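The NUL-terminated pipeline in the last example (sort -z | uniq -z | xargs -0 -n2 echo) can also be done in Python, which helps when the URLs need further processing. The function names in the sketch below are hypothetical. Python sorts raw bytes, which matches sort -z under LC_ALL=C rather than a locale-aware collation.

```python
from typing import List


def unique_sorted(data: bytes) -> List[bytes]:
    """Split NUL-terminated records, de-duplicate them and sort them bytewise."""
    # The empty chunk after the final \0 is dropped by the `if r` filter.
    return sorted({r for r in data.split(b"\0") if r})


def two_per_line(records: List[bytes]) -> str:
    """Join records two per line, like `xargs -0 -n2 echo`."""
    lines = []
    for i in range(0, len(records), 2):
        chunk = records[i:i + 2]
        lines.append(" ".join(r.decode("utf-8", "replace") for r in chunk))
    return "\n".join(lines)


# Wiring it up as a stdin filter would look like:
#   import sys
#   print(two_per_line(unique_sorted(sys.stdin.buffer.read())))
```

Saved as a script (say, url_pairs.py, a hypothetical name) with the commented-out lines enabled, it would replace everything after the first | in that example.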
How to handle binary data
Trying to use response bodies produced by wrrarms stream --format=json is likely to result in garbled data, as JSON can't represent raw sequences of bytes, so binary data will have to be encoded into Unicode using replacement characters:
wrrarms stream --format=json -ue . ../dumb_server/pwebarc-dump/path/to/file.wrr | jq .
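To see why the JSON route loses information, consider what happens to a few non-UTF-8 bytes when they are forced into a JSON string through replacement characters. This is a general illustration using Python's json module, not a trace of wrrarms' own encoder.

```python
import json

raw = b"GIF89a\x01\x00\xff\xfe"  # starts like a GIF; \xff and \xfe are never valid UTF-8
text = raw.decode("utf-8", errors="replace")  # each invalid byte becomes U+FFFD
restored = json.loads(json.dumps(text)).encode("utf-8")

print(text.count("\ufffd"))  # 2
print(restored == raw)       # False: the original bytes cannot be recovered
```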
The most generic solution to this is to use --format=cbor instead, which would produce a verbose CBOR representation equivalent to the one used by --format=json but with binary data preserved as-is:
wrrarms stream --format=cbor -ue . ../dumb_server/pwebarc-dump/path/to/file.wrr | less
Or you could just dump raw response bodies separately:
wrrarms stream --format=raw -ue response.body ../dumb_server/pwebarc-dump/path/to/file.wrr | less
wrrarms get ../dumb_server/pwebarc-dump/path/to/file.wrr | less
Hashes for pwebarc_wrrarms-0.11.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 224ee9accf38b491822b8b29cfe73962a2affc0a0ecad2c6f26dfdb0a4687759 |
| MD5 | a5f0cdaacd033786165bd922de858b0d |
| BLAKE2b-256 | f97767ec11014d39866309a922884ff9093774b5a99a848535c524c45c9a81e1 |
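The hashes listed for the wheel let you check a downloaded copy before installing it. Below is a minimal sketch that assumes the wheel has already been downloaded to a local path. sha256_of and verify are hypothetical helper names, and only the SHA256 value is used.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "224ee9accf38b491822b8b29cfe73962a2affc0a0ecad2c6f26dfdb0a4687759"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# e.g. verify(Path("pwebarc_wrrarms-0.11.1-py3-none-any.whl"))
```

On the command line, sha256sum pwebarc_wrrarms-0.11.1-py3-none-any.whl prints the same digest. pip can also enforce it through a --hash=sha256:... entry in a requirements file installed with --require-hashes.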