uri_string
(stdlib) URI processing functions.
This module contains functions for parsing and handling URIs (RFC 3986). Parsing and serializing non-UTF-8 form-urlencoded query strings are also supported (HTML 5.2).
A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986.
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment:
URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty
scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
authority   = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.
The functions implemented by this module cover the following use cases:
- Parsing a URI into its components and returning a map
parse/1
- Recomposing a map of URI components into a URI string
recompose/1
- Changing inbound binary and percent-encoding of URIs
transcode/2
- Transforming URIs into a normalized form
normalize/1
normalize/2
- Composing form-urlencoded query strings from a list of key-value pairs
compose_query/1
compose_query/2
- Dissecting form-urlencoded query strings into a list of key-value pairs
dissect_query/1
There are four different encodings present during the handling of URIs:
- Inbound binary encoding in binaries
- Inbound percent-encoding in lists and binaries
- Outbound binary encoding in binaries
- Outbound percent-encoding in lists and binaries
Functions with a uri_string() argument accept lists, binaries, and mixed lists (lists with binary elements) as input. All functions except transcode/2 expect input as lists of Unicode code points, UTF-8 encoded binaries, and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the Unicode character "ö").
Unless otherwise specified, the return value type and encoding are the same as the input type and encoding: binary input returns binary output and list input returns list output. Mixed input returns list output.
For lists, only percent-encoding applies. For binaries, both the binary encoding and the percent-encoding must be considered. transcode/2 converts between the supported encodings: it takes a uri_string() and a list of options specifying the inbound and outbound encodings.
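The input and output type rule described above can be illustrated with parse/1. This is a minimal sketch, not part of the module: the module name uri_types_sketch and the function demo/0 are hypothetical, and each pattern match simply crashes with a badmatch if the stated rule does not hold.

```erlang
-module(uri_types_sketch).
-export([demo/0]).

%% demo/0 is a hypothetical helper used only for illustration.
demo() ->
    %% List input yields list components.
    #{host := "example.com"} = uri_string:parse("http://example.com/a"),
    %% Binary input yields binary components.
    #{host := <<"example.com">>} = uri_string:parse(<<"http://example.com/a">>),
    %% Mixed input (a list with binary elements) yields list components.
    #{path := "/a"} = uri_string:parse(["http://example.com", <<"/a">>]),
    ok.
```

Calling uri_types_sketch:demo() is expected to return ok when all three matches succeed.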
Types
error() = {error, atom(), term()}
Error tuple indicating the type of error. Possible values of the second component:
invalid_character
invalid_encoding
invalid_input
invalid_map
invalid_percent_encoding
invalid_scheme
invalid_uri
invalid_utf8
missing_value
The third component is a term providing additional information about the cause of the error.
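Because error() is a three-element tuple and a successful parse returns a map, callers can separate the two cases by pattern matching. The following is a sketch under that assumption: the module uri_error_sketch, the helper host_of/1, and the no_host atom are hypothetical names, not part of uri_string.

```erlang
-module(uri_error_sketch).
-export([host_of/1]).

%% host_of/1 is a hypothetical helper. It returns {ok, Host} when the
%% URI parses and has a host component, and {error, Reason} otherwise.
host_of(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, _Details} ->
            %% Reason is one of the atoms listed above.
            {error, Reason};
        #{host := Host} ->
            {ok, Host};
        #{} ->
            %% Parsed successfully, but there is no host component
            %% (for example a relative reference such as "/a/b").
            {error, no_host}
    end.
```

For example, host_of("http://example.com/") should yield {ok, "example.com"}, while a path-only input such as "/just/a/path" falls through to the no_host clause.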
uri_map() =
#{fragment => unicode:chardata(),
host => unicode:chardata(),
path => unicode:chardata(),
port => integer() >= 0 | undefined,
query => unicode:chardata(),
scheme => unicode:chardata(),
userinfo => unicode:chardata()} |
#{}
Map holding the main components of a URI.
uri_string() = iodata()
List of Unicode code points, a UTF-8 encoded binary, or a mix of the two, representing an RFC 3986 compliant URI (percent-encoded form).
Functions
compose_query(QueryList) -> QueryString
QueryList = [{unicode:chardata(), unicode:chardata()}]
QueryString = uri_string() | error()
Composes a form-urlencoded QueryString based on a QueryList, a list of non-percent-encoded key-value pairs.
Form-urlencoding is defined in section 4.10.21.6 of the HTML 5.2 specification.
See also the opposite operation dissect_query/1.
Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
"foo+bar=1&city=%C3%B6rebro"
2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
2> {<<"city">>,<<"örebro"/utf8>>}]).
<<"foo+bar=1&city=%C3%B6rebro">>
compose_query(QueryList, Options) -> QueryString
QueryList = [{unicode:chardata(), unicode:chardata()}]
Options = [{encoding, atom()}]
QueryString = uri_string() | error()
Same as compose_query/1, but with an additional Options parameter that controls the encoding ("charset") used by the encoding algorithm. There are two supported encodings: utf8 (or unicode) and latin1.
Each character in the entry's name and value that cannot be expressed using the selected character encoding, is replaced by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more ASCII digits representing the Unicode code point of the character in base ten, and finally a ";" (U+003B) character.
All bytes other than 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, and 0x61 to 0x7A are percent-encoded: each is replaced by a U+0025 PERCENT SIGN character (%) followed by uppercase ASCII hex digits representing the hexadecimal value of the byte.
See also the opposite operation dissect_query/1.
Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
1> [{encoding, latin1}]).
"foo+bar=1&city=%F6rebro"
2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>
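The byte-level percent-encoding rule described for compose_query/2 can be sketched as a small classifier. This is an illustration of the stated rule only, not the module's implementation: form_byte_sketch, encode_byte/1, and is_unescaped/1 are hypothetical names, and the replacement of spaces by "+" seen in the examples is deliberately not handled here.

```erlang
-module(form_byte_sketch).
-export([encode_byte/1]).

%% True for the bytes that the rule leaves unescaped:
%% "*", "-", ".", "_", digits, and ASCII letters.
is_unescaped(B) when B =:= 16#2A; B =:= 16#2D; B =:= 16#2E; B =:= 16#5F -> true;
is_unescaped(B) when B >= 16#30, B =< 16#39 -> true;
is_unescaped(B) when B >= 16#41, B =< 16#5A -> true;
is_unescaped(B) when B >= 16#61, B =< 16#7A -> true;
is_unescaped(_) -> false.

%% Returns the byte unchanged (as a one-element string) or its
%% percent-encoded form with two uppercase hex digits.
encode_byte(B) when is_integer(B), B >= 0, B =< 255 ->
    case is_unescaped(B) of
        true -> [B];
        false -> lists:flatten(io_lib:format("%~2.16.0B", [B]))
    end.
```

Under this sketch, the characters "&", "#", and ";" produced by the numeric character reference map to "%26", "%23", and "%3B", which matches the shape of the latin1 example output.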
dissect_query(QueryString) -> QueryList
QueryString = uri_string()
QueryList =
[{unicode:chardata(), unicode:chardata()}] | error()
Dissects a form-urlencoded QueryString and returns a QueryList, a list of non-percent-encoded key-value pairs.
Form-urlencoding is defined in section 4.10.21.6 of the HTML 5.2 specification.
See also the opposite operation compose_query/1.
Example:
1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
[{"foo bar","1"},{"city","örebro"}]
2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
[{<<"foo bar">>,<<"1">>},
{<<"city">>,<<230,157,177,228,186,172>>}]
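Since compose_query/1 and dissect_query/1 are opposite operations, a round trip should reproduce the original pairs. The sketch below checks this for the list-based example values used in this section. The module name uri_query_roundtrip_sketch and the function roundtrip/0 are hypothetical, and the character "ö" is written as the code point 16#F6 to keep the source file ASCII-only.

```erlang
-module(uri_query_roundtrip_sketch).
-export([roundtrip/0]).

%% roundtrip/0 is a hypothetical helper used only for illustration.
roundtrip() ->
    City = [16#F6 | "rebro"],                 % "örebro" as code points
    Pairs = [{"foo bar", "1"}, {"city", City}],
    Query = uri_string:compose_query(Pairs),
    %% Dissecting the composed query is expected to give back the
    %% original pairs; the match fails with badmatch otherwise.
    Pairs = uri_string:dissect_query(Query),
    Query.
```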
normalize(URI) -> NormalizedURI
URI = uri_string() | uri_map()
NormalizedURI = uri_string() | error()
Transforms a URI into a normalized form using Syntax-Based Normalization as defined by RFC 3986.
This function implements case normalization, percent-encoding normalization, path segment normalization and scheme based normalization for HTTP(S) with basic support for FTP, SSH, SFTP and TFTP.
Example:
1> uri_string:normalize("/a/b/c/./../../g").
"/a/g"
2> uri_string:normalize(<<"mid/content=5/../6">>).
<<"mid/6">>
3> uri_string:normalize("http://localhost:80").
"http://localhost/"
4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
4> host => "localhost-örebro"}).
"http://localhost-%C3%B6rebro/a/g"
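A common use of normalization is comparing two URIs for equivalence. The following is a minimal sketch of that idea, assuming both arguments have the same input type (both lists or both binaries), since the output type follows the input type. The module uri_equiv_sketch and the function equivalent/2 are hypothetical names.

```erlang
-module(uri_equiv_sketch).
-export([equivalent/2]).

%% equivalent/2 is a hypothetical helper. It treats two URIs as
%% equivalent when their normalized forms are identical, and returns
%% false if either one cannot be normalized.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {Same, Same} -> true;
        _ -> false
    end.
```

With this helper, "http://localhost:80" and "http://localhost/" compare as equivalent, as do "/a/b/c/./../../g" and "/a/g".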
normalize(URI, Options) -> NormalizedURI
URI = uri_string() | uri_map()
Options = [return_map]
NormalizedURI = uri_string() | uri_map()
Same as normalize/1, but with an additional Options parameter that controls whether the normalized URI is returned as a uri_map().
There is one supported option: return_map.
Example:
1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
#{path => "/a/g"}
2> uri_string:normalize(<<"mid/content=5/../6">>, [return_map]).
#{path => <<"mid/6">>}
3> uri_string:normalize("http://localhost:80", [return_map]).
#{scheme => "http",path => "/",host => "localhost"}
4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
4> host => "localhost-örebro"}, [return_map]).
#{scheme => "http",path => "/a/g",host => "localhost-örebro"}
parse(URIString) -> URIMap
URIString = uri_string()
URIMap = uri_map() | error()
Parses a uri_string() into a uri_map() that holds the parsed components of the URI.
If parsing fails, an error tuple is returned.
See also the opposite operation recompose/1.
Example:
1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
#{fragment => "nose",host => "example.com",
path => "/over/there",port => 8042,query => "name=ferret",
scheme => "foo",userinfo => "user"}
2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
#{host => <<"example.com">>,path => <<"/over/there">>,
port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
userinfo => <<"user">>}
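Because parse/1 returns a plain map, a single component can be replaced with ordinary map update syntax before the map is passed to recompose/1. The following sketch sets the query component of a URI from a list of key-value pairs. The module uri_query_sketch and the function set_query/2 are hypothetical, and the sketch assumes the URI and the pairs are of the same type (for example, all lists).

```erlang
-module(uri_query_sketch).
-export([set_query/2]).

%% set_query/2 is a hypothetical helper. It replaces (or adds) the
%% query component of URI with the form-urlencoded Pairs, and passes
%% a parse error through unchanged.
set_query(URI, Pairs) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        Map ->
            Query = uri_string:compose_query(Pairs),
            uri_string:recompose(Map#{query => Query})
    end.
```

For instance, set_query("http://example.com/search", [{"q", "erlang otp"}]) should yield "http://example.com/search?q=erlang+otp".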
recompose(URIMap) -> URIString
URIMap = uri_map()
URIString = uri_string() | error()
Creates a URIString (percent-encoded) based on the components of URIMap.
If the URIMap is invalid, an error tuple is returned.
See also the opposite operation parse/1.
Example:
1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
#{fragment => "nose",host => "example.com",
path => "/over/there",port => 8042,query => "name=ferret",
scheme => "foo",userinfo => "user"}
2> uri_string:recompose(URIMap).
"foo://user@example.com:8042/over/there?name=ferret#nose"
transcode(URIString, Options) -> Result
URIString = uri_string()
Options =
[{in_encoding, unicode:encoding()} |
{out_encoding, unicode:encoding()}]
Result = uri_string() | error()
Transcodes a URIString, where Options is a list of tagged tuples specifying the inbound (in_encoding) and outbound (out_encoding) encodings. in_encoding and out_encoding specify both the binary encoding and the percent-encoding of the input and output data. Mixed encoding, where the binary encoding is not the same as the percent-encoding, is not supported.
If an argument is invalid, an error tuple is returned.
Example:
1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
1> [{in_encoding, utf32},{out_encoding, utf8}]).
<<"foo%C3%B6bar"/utf8>>
2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
2> {out_encoding, utf8}]).
"foo%C3%B6bar"