URL encoding is a mechanism for translating unprintable or special characters into a format that web servers and browsers universally accept. The encoding can be applied to Uniform Resource Names (URNs), Uniform Resource Identifiers (URIs) and Uniform Resource Locators (URLs). Selected characters in the URL are replaced by one or more character triplets, each consisting of the percent character followed by two hexadecimal digits. The hexadecimal digits in each triplet represent the numerical value of the byte being replaced. URL encoding is widely used when HTML form data is submitted in HTTP requests.
URL encoding is also known as percent-encoding.
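As a small illustration of the character triplets described above, the following sketch uses Python's standard `urllib.parse` module. The sample strings are chosen only for demonstration and do not come from any particular application.

```python
from urllib.parse import quote, unquote

# A space has the ASCII value 32, which is 20 in hexadecimal.
encoded_space = quote(" ")

# A semicolon has the ASCII value 59, which is 3B in hexadecimal.
encoded_semicolon = quote(";")

# Decoding reverses the substitution and restores the original characters.
decoded = unquote("a%20b%3Bc")

print(encoded_space)      # %20
print(encoded_semicolon)  # %3B
print(decoded)            # a b;c
```

Each triplet is simply the percent sign followed by the byte value written as two hexadecimal digits.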
As per RFC 3986, characters found in a URL must be present in the defined set of reserved and unreserved ASCII characters. However, URL encoding allows characters that would otherwise not be permitted to be represented with the help of allowed characters. URL encoding is used mostly for ASCII control characters, for non-ASCII characters (those beyond the ASCII character set of 128 characters), for reserved characters such as the semicolon and equals sign when they appear as data, and for other disallowed characters such as the space or caret.
A two-step process is usually followed for URL encoding. First, the character string is converted into a byte sequence using UTF-8 encoding. Second, each byte that does not represent an unreserved ASCII character is converted to “%HH,” where HH is the hexadecimal representation of the replaced byte. In this way, URL encoding converts non-ASCII characters to a format that can be transmitted over the internet.
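The two-step process can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper name `percent_encode` and the sample string are hypothetical, and the sketch assumes that every byte outside the RFC 3986 unreserved set (letters, digits, "-", ".", "_" and "~") should be encoded.

```python
from urllib.parse import quote

# Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: encode every byte outside the unreserved set."""
    pieces = []
    # Step 1: convert the character string into a UTF-8 byte sequence.
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in UNRESERVED:
            # Unreserved ASCII characters are kept as they are.
            pieces.append(char)
        else:
            # Step 2: replace the byte with "%" plus two hexadecimal digits.
            pieces.append(f"%{byte:02X}")
    return "".join(pieces)

sample = "café au lait"
result = percent_encode(sample)
print(result)  # caf%C3%A9%20au%20lait

# The standard library gives the same result when no extra characters are marked safe.
assert result == quote(sample, safe="")
```

Note that the single character "é" becomes two triplets, because its UTF-8 representation is two bytes long.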