Introducing the WHATWG URL API, officially implemented in Node.js 8.0!

Hello.
I'm Mandai, in charge of Wild on the development team.
It was a while ago, but Node.js 8.0.0 was released on May 30, 2017.
Starting with this version, npm version 5.0.0 has been bundled, and the code around caching has been rewritten to make it faster.
A tweet comparing its speed with the previous version was also published; in that example, installation appears to finish in about one-fifth of the time.
With #npm5 about to come out, I thought I'd update those benchmarks.
Here's the npm5 code I'm working on, vs [email protected] on a popular repo pic.twitter.com/KWPfbpE46p
— ✨11x gayer Kat✨ (@maybekatz) May 19, 2017
Node.js 8.0 ships with V8 5.8, which is said to remain ABI-compatible with V8 5.9 and V8 6.0, so the engine will likely be upgraded in later releases for further speedups.
→ Node.js 8.0 has been released. It includes npm 5.0 bundles, the Node.js API, and official support for the WHATWG URL parser. - Publickey
This time, we will look at the WHATWG URL API, which was officially implemented in Node.js 8.0
What is the WHATWG URL API?
The WHATWG URL API has been around since Node.js 7, but became official in 8.0.0.
Many of you may have already used it, but because it was in an "experimental" state, you may have been a little hesitant to use it in a production environment.
This API is intended to standardize URL parsing and is provided as an extension to the existing url module
const URL = require('url').URL;
const beyondUrl = new URL('http://www.beyondjapan.com/?abc=123&xyz=999#first');
console.log(beyondUrl);
// Result
/*
URL {
  href: 'http://www.beyondjapan.com/?abc=123&xyz=999#first',
  origin: 'http://www.beyondjapan.com',
  protocol: 'http:',
  username: '',
  password: '',
  host: 'www.beyondjapan.com',
  hostname: 'www.beyondjapan.com',
  port: '',
  pathname: '/',
  search: '?abc=123&xyz=999',
  searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
  hash: '#first' }
*/
Of course, you can also use the url module, and it will feel the same as before
const url = require('url');
const beyondUrl = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
console.log(url.parse(beyondUrl));
// Result
/*
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'www.beyondjapan.com',
  port: null,
  hostname: 'www.beyondjapan.com',
  hash: '#first',
  search: '?abc=123&xyz=999',
  query: 'abc=123&xyz=999',
  pathname: '/',
  path: '/?abc=123&xyz=999',
  href: 'http://www.beyondjapan.com/?abc=123&xyz=999#first' }
*/
Confusingly, the class names differ only in capitalization (URL and Url), and the contents of the output objects also differ slightly.
In the response from the WHATWG URL API, the query string is parsed and returned with a key called searchParams, which is convenient.
This alone makes me want to use it
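As a small sketch of why the parsed searchParams is handy (the variable names here are only for illustration), individual query values can be read and changed directly through the URL object, without splitting the string by hand:

```javascript
const { URL } = require('url');

const beyondUrl = new URL('http://www.beyondjapan.com/?abc=123&xyz=999#first');

// searchParams is a URLSearchParams object, so values are looked up by key.
const abc = beyondUrl.searchParams.get('abc');
const missing = beyondUrl.searchParams.get('nothing'); // a key that does not exist gives null

console.log(abc);     // 123
console.log(missing); // null

// Changes made through searchParams are reflected in the URL itself.
beyondUrl.searchParams.set('abc', '456');
console.log(beyondUrl.href);
// http://www.beyondjapan.com/?abc=456&xyz=999#first
```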
URL Object Behavior
The URL object returned from the WHATWG API also allows you to access each piece of data
const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
const URL = require('url').URL;
const beyondUrl = new URL(u);
console.log(beyondUrl.hostname);
// Result
// www.beyondjapan.com
Try a different hostname
const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
const URL = require('url').URL;
const beyondUrl = new URL(u);
beyondUrl.hostname = 'example.com';
console.log(beyondUrl);
// Result
/*
URL {
  href: 'http://example.com/?abc=123&xyz=999#first',
  origin: 'http://example.com',
  protocol: 'http:',
  username: '',
  password: '',
  host: 'example.com',
  hostname: 'example.com',
  port: '',
  pathname: '/',
  search: '?abc=123&xyz=999',
  searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
  hash: '#first' }
*/
The host name is recognized and rewritten, so even if you do the following, only the host name will be changed
const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
const URL = require('url').URL;
const beyondUrl = new URL(u);
beyondUrl.hostname = 'example.com:443'; // Try adding the port number
console.log(beyondUrl);
// Result
/*
URL {
  href: 'http://example.com/?abc=123&xyz=999#first',
  origin: 'http://example.com',
  protocol: 'http:', // Unchanged
  username: '',
  password: '',
  host: 'example.com',
  hostname: 'example.com',
  port: '', // Unchanged
  pathname: '/',
  search: '?abc=123&xyz=999',
  searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
  hash: '#first' }
*/
To change the port number, assign to the port property instead.
const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
const URL = require('url').URL;
const beyondUrl = new URL(u);
beyondUrl.port = 443;
console.log(beyondUrl);
// Result
/*
URL {
  href: 'http://www.beyondjapan.com:443/?abc=123&xyz=999#first',
  origin: 'http://www.beyondjapan.com:443',
  protocol: 'http:', // Unchanged
  username: '',
  password: '',
  host: 'www.beyondjapan.com:443',
  hostname: 'www.beyondjapan.com',
  port: '443',
  pathname: '/',
  search: '?abc=123&xyz=999',
  searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
  hash: '#first' }
*/
However, when you assign to host rather than hostname, the hostname and the port are both updated.
const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
const URL = require('url').URL;
const beyondUrl = new URL(u);
beyondUrl.host = 'example.com:443';
console.log(beyondUrl);
// Result
/*
URL {
  href: 'http://example.com:443/?abc=123&xyz=999#first',
  origin: 'http://example.com:443',
  protocol: 'http:', // Unchanged
  username: '',
  password: '',
  host: 'example.com:443', // Changed
  hostname: 'example.com', // Changed
  port: '443', // Changed
  pathname: '/',
  search: '?abc=123&xyz=999',
  searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
  hash: '#first' }
*/
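One more detail that is easy to trip over when working with host and port: the default port of a protocol is stored as an empty string. The following is a minimal sketch (port 8080 and the helper variable names are arbitrary choices for illustration):

```javascript
const { URL } = require('url');

const beyondUrl = new URL('http://www.beyondjapan.com/?abc=123&xyz=999#first');

// host carries both the hostname and the port.
beyondUrl.host = 'example.com:8080';
const hostnameAfter = beyondUrl.hostname;
const portAfter = beyondUrl.port;
console.log(hostnameAfter); // example.com
console.log(portAfter);     // 8080

// 80 is the default port for http, so it is dropped from the URL.
beyondUrl.port = 80;
console.log(JSON.stringify(beyondUrl.port)); // ""
console.log(beyondUrl.host);                 // example.com
console.log(beyondUrl.href);                 // http://example.com/?abc=123&xyz=999#first
```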
URLSearchParams Class
We looked at the behavior of the URL object earlier, but now we'll look at the URLSearchParams class obtained from URL.searchParams.
This class was added in Node.js 7; it parses query strings and provides methods for reading and modifying them.
The official documentation also compares it to the querystring module, but it seems that the URLSearchParams class is not as flexible as the querystring module, so it does not mean that the querystring module becomes unnecessary.
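To give a rough idea of the difference, here is a small sketch comparing the two on the same kind of input. The sample strings are made up for illustration. The point is that querystring.parse returns a plain object and accepts custom separator characters, while URLSearchParams always works with & and =:

```javascript
const querystring = require('querystring');
const { URLSearchParams } = require('url');

// querystring.parse returns a plain key/value object;
// duplicate keys are collected into an array.
const parsed = querystring.parse('a=1&a=2&b=3');
console.log(parsed.a); // [ '1', '2' ]
console.log(parsed.b); // 3

// querystring also accepts custom separator and assignment characters.
const custom = querystring.parse('a:1;b:2', ';', ':');
console.log(custom.a, custom.b); // 1 2

// URLSearchParams always uses & and =, and keeps duplicates as separate entries.
const params = new URLSearchParams('a=1&a=2&b=3');
console.log(params.getAll('a')); // [ '1', '2' ]
console.log(params.get('b'));    // 3
```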
The URLSearchParams class is provided as a class in the url module, so it can be used independently.
Therefore, it is a powerful class that can be used not only for parsing but also for generation.
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const qsObject = { abc:123, xyz:456, aaa:789 };
const qsIterable = [
  ['abc', 123],
  ['xyz', 456],
  ['aaa', 789],
];
const qsMap = new Map();
qsMap.set('abc', 123);
qsMap.set('xyz', 456);
qsMap.set('aaa', 789);
function* qsGenerator(){
  yield ['abc', 123];
  yield ['xyz', 456];
  yield ['aaa', 789];
}
const params1 = new URLSearchParams(qs); // Can be a regular query string
const params2 = new URLSearchParams(qsObject); // Can be a regular object
const params3 = new URLSearchParams(qsIterable); // Can be an iterable such as an array of pairs
const params4 = new URLSearchParams(qsMap); // Can be a Map object
const params5 = new URLSearchParams(qsGenerator()); // Can be a generator
console.log(params1);
console.log(params2);
console.log(params3);
console.log(params4);
console.log(params5);
// Result
/*
URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
*/
It is not picky at all: it happily accepts strings, objects, arrays, Map objects, and generators.
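As a sketch of the generation side, a query string can be built from plain data and attached to a URL. The /search path and the q and page parameters below are hypothetical and only serve as an example:

```javascript
const { URL, URLSearchParams } = require('url');

// Build a query string from a plain object; values are encoded as needed
// (a space becomes +).
const params = new URLSearchParams({ q: 'node js', page: 2 });
console.log(params.toString()); // q=node+js&page=2

// Attach it to a URL through the search property.
const target = new URL('http://www.beyondjapan.com/search');
target.search = params.toString();
console.log(target.href); // http://www.beyondjapan.com/search?q=node+js&page=2
```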
The created URLSearchParams object has various methods
append
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
params.append('bbb', 963);
console.log(params.toString());
// Result
// abc=123&xyz=456&aaa=789&bbb=963
delete
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
params.append('bbb', 963); // Add a key first so that there is something to delete
params.delete('bbb');
console.log(params.toString());
// Result
// abc=123&xyz=456&aaa=789
entries returns an iterator of entries
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
for (let v of params.entries()) console.log(v);
// Result
/*
[ 'abc', '123' ]
[ 'xyz', '456' ]
[ 'aaa', '789' ]
*/
forEach loops over every entry
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
params.forEach((value, key, p) => {
  console.log(value, key, p);
});
// Result
/*
123 abc URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
456 xyz URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
789 aaa URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
*/
get returns the value for the given key
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
console.log(params.get('abc'));
// Result
// 123
getAll returns all values for the given key
To show how it differs from get, here are two samples.
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
console.log(params.getAll('abc'));
// Result
// [ '123' ]

const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789&abc=777';
const params = new URLSearchParams(qs);
console.log(params.getAll('abc'));
// Result
// [ '123', '777' ]
The URLSearchParams object allows duplicate keys, which is why it has a getAll method.
Incidentally, get is specified to return the value of the first matching key, so the values of later duplicates can only be reached through getAll or a loop.
has checks whether a key exists
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
console.log(params.has('abc'));
// Result
// true
keys returns an iterator of keys
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
for (let k of params.keys()) console.log(k);
// Result
/*
abc
xyz
aaa
*/
set overwrites a value
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
params.set('vvv', 247);
console.log(params.toString());
// Result
// abc=123&xyz=456&aaa=789&vvv=247
params.set('vvv', 247); // Setting the same key again does not add a second entry
console.log(params.toString());
// Result
// abc=123&xyz=456&aaa=789&vvv=247
If the key does not exist, set behaves like append; if the key exists, its value is overwritten.
If the key exists more than once, the first entry is overwritten and the remaining ones are removed, so only one entry is left.
const {URLSearchParams} = require('url');
const qs = 'a=1&a=2&a=3';
const params = new URLSearchParams(qs);
console.log(params.toString());
// At this point
// a=1&a=2&a=3
params.set('a', 4);
console.log(params.toString());
// Result
// a=4
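When other keys sit between the duplicates, it becomes visible where the surviving entry ends up. A minimal sketch with a made-up query string:

```javascript
const { URLSearchParams } = require('url');

const params = new URLSearchParams('a=1&b=2&a=3');
params.set('a', 4);

// The first "a" is overwritten where it stands and the later "a" is removed.
console.log(params.toString()); // a=4&b=2
```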
Destructive sort
Sorts the entries in place, in ascending order by key.
Reverse sorting is not supported.
Note that sort does not return a new, reordered URLSearchParams object; it reorders the object it is called on (although the order is rarely something you need to care about).
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
console.log(params.toString());
params.sort();
console.log(params.toString());
// Result
// Before execution
// abc=123&xyz=456&aaa=789
// After execution
// aaa=789&abc=123&xyz=456
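If descending order is really needed, one possible workaround (a sketch, not a built-in feature) is to sort first, reverse the entries as an array, and build a new object from them. The variable name reversed is just for illustration:

```javascript
const { URLSearchParams } = require('url');

const params = new URLSearchParams('abc=123&xyz=456&aaa=789');
params.sort(); // in place: aaa, abc, xyz

// Spread the sorted entries into an array of [key, value] pairs, reverse it,
// and pass it back to the constructor, which accepts an iterable of pairs.
const reversed = new URLSearchParams([...params].reverse());

console.log(reversed.toString()); // xyz=456&abc=123&aaa=789
console.log(params.toString());   // aaa=789&abc=123&xyz=456 (the original stays sorted ascending)
```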
values returns an iterator of values
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
for (let v of params.values()) console.log(v);
// Result
/*
123
456
789
*/
The URLSearchParams object itself is iterable
const {URLSearchParams} = require('url');
const qs = 'abc=123&xyz=456&aaa=789';
const params = new URLSearchParams(qs);
for (const [k, v] of params) console.log(k, v);
// Result
/*
abc 123
xyz 456
aaa 789
*/
Summary
I've summarized the WHATWG URL API and URLSearchParams, which will likely play an important role in URL parsing in the future. I hope you found it helpful.
It looks like it will make building and handling query strings, which has been surprisingly tedious, go much more smoothly.
That's it.