Introducing the WHATWG URL API, officially implemented in Node.js 8.0!
Hello.
I'm Mandai, in charge of Wild on the development team.
A little while ago, Node.js 8.0.0 was released on May 30, 2017.
Starting from this version, npm version 5.0.0 is bundled, and the code around caching has been rewritten and seems to be faster.
A tweet comparing its speed with past versions has also been published, and in that example the installation appears to finish in roughly one fifth of the time taken by the previous version.
With #npm5 about to come out, I thought I'd update those benchmarks.
Here's the npm5 code I'm working on, vs [email protected] on a popular repo pic.twitter.com/KWPfbpE46p
— ✨11x gayer Kat✨ (@maybekatz) May 19, 2017
The bundled V8 engine is version 5.8, but it seems to be compatible with V8 5.9 and V8 6.0, so future releases can be expected to get even faster as the V8 engine is upgraded.
→ Node.js 8.0 released. npm 5.0 bundle, Node.js API included, official support for WHATWG URL parser, etc. - Publickey
This time, I would like to take a look at the WHATWG URL API, which was officially implemented in Node.js 8.0.
What is the WHATWG URL API?
Actually, the WHATWG URL API has existed since the Node.js 7 series, but it became the official version in 8.0.0.
Many people may have already used it, but since it was in an "experimental" position, they may have been a little hesitant to use it in a production environment.
This API is intended to standardize URL parsing and is provided as an extension of the conventional URL module.
    const URL = require('url').URL;
    const beyondUrl = new URL('http://www.beyondjapan.com/?abc=123&xyz=999#first');
    console.log(beyondUrl);

    // Result
    /*
    URL {
      href: 'http://www.beyondjapan.com/?abc=123&xyz=999#first',
      origin: 'http://www.beyondjapan.com',
      protocol: 'http:',
      username: '',
      password: '',
      host: 'www.beyondjapan.com',
      hostname: 'www.beyondjapan.com',
      port: '',
      pathname: '/',
      search: '?abc=123&xyz=999',
      searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
      hash: '#first' }
    */
Of course, the conventional url.parse() of the url module is still available and works the same as before.
    const url = require('url');
    const beyondUrl = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    console.log(url.parse(beyondUrl));

    // Result
    /*
    Url {
      protocol: 'http:',
      slashes: true,
      auth: null,
      host: 'www.beyondjapan.com',
      port: null,
      hostname: 'www.beyondjapan.com',
      hash: '#first',
      search: '?abc=123&xyz=999',
      query: 'abc=123&xyz=999',
      pathname: '/',
      path: '/?abc=123&xyz=999',
      href: 'http://www.beyondjapan.com/?abc=123&xyz=999#first' }
    */
Confusingly, the object names differ only slightly (URL versus Url), and the contents of the returned objects are also slightly different.
A convenient point of the WHATWG URL API is that the query string is already parsed and returned under the searchParams key.
This alone makes you want to use it.
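As a small sketch of why this is convenient, the following reads one query parameter with both APIs. The variable names (address, legacyQuery, whatwgParams) are only illustrative. With the conventional API you have to pass true as the second argument of url.parse() to get a parsed query, whereas searchParams is always there.

```javascript
const url = require('url');
const { URL } = url;

const address = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';

// Conventional API: the second argument (true) asks url.parse() to parse the query string.
const legacyQuery = url.parse(address, true).query;

// WHATWG API: searchParams is always a URLSearchParams object.
const whatwgParams = new URL(address).searchParams;

console.log(legacyQuery.abc);          // 123
console.log(whatwgParams.get('abc'));  // 123
console.log(whatwgParams.get('nope')); // null (the key does not exist)
```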
Behavior of URL objects
Each component of the URL object returned by the WHATWG API can also be accessed as a property.
    const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    const URL = require('url').URL;
    const beyondUrl = new URL(u);
    console.log(beyondUrl.hostname);

    // Result
    // www.beyondjapan.com
Let's try setting a different host name.
    const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    const URL = require('url').URL;
    const beyondUrl = new URL(u);
    beyondUrl.hostname = 'example.com';
    console.log(beyondUrl);

    // Result
    /*
    URL {
      href: 'http://example.com/?abc=123&xyz=999#first',
      origin: 'http://example.com',
      protocol: 'http:',
      username: '',
      password: '',
      host: 'example.com',
      hostname: 'example.com',
      port: '',
      pathname: '/',
      search: '?abc=123&xyz=999',
      searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
      hash: '#first' }
    */
The hostname property recognizes and rewrites only the host name, so even if you append a port number as in the following, only the host name is changed.
    const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    const URL = require('url').URL;
    const beyondUrl = new URL(u);
    beyondUrl.hostname = 'example.com:443'; // Try adding the port number
    console.log(beyondUrl);

    // Result
    /*
    URL {
      href: 'http://example.com/?abc=123&xyz=999#first',
      origin: 'http://example.com',
      protocol: 'http:', // unchanged
      username: '',
      password: '',
      host: 'example.com',
      hostname: 'example.com',
      port: '', // unchanged
      pathname: '/',
      search: '?abc=123&xyz=999',
      searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
      hash: '#first' }
    */
If you want to change the port number, you need to set the port property explicitly.
    const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    const URL = require('url').URL;
    const beyondUrl = new URL(u);
    beyondUrl.port = 443;
    console.log(beyondUrl);

    // Result
    /*
    URL {
      href: 'http://www.beyondjapan.com:443/?abc=123&xyz=999#first',
      origin: 'http://www.beyondjapan.com:443',
      protocol: 'http:', // unchanged
      username: '',
      password: '',
      host: 'www.beyondjapan.com:443',
      hostname: 'www.beyondjapan.com',
      port: '443',
      pathname: '/',
      search: '?abc=123&xyz=999',
      searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
      hash: '#first' }
    */
The host property, however, behaves differently: assigning a value that includes a port number updates both the host name and the port.
    const u = 'http://www.beyondjapan.com/?abc=123&xyz=999#first';
    const URL = require('url').URL;
    const beyondUrl = new URL(u);
    beyondUrl.host = 'example.com:443';
    console.log(beyondUrl);

    // Result
    /*
    URL {
      href: 'http://example.com:443/?abc=123&xyz=999#first',
      origin: 'http://example.com:443',
      protocol: 'http:', // unchanged
      username: '',
      password: '',
      host: 'example.com:443', // changed
      hostname: 'example.com', // changed
      port: '443', // changed
      pathname: '/',
      search: '?abc=123&xyz=999',
      searchParams: URLSearchParams { 'abc' => '123', 'xyz' => '999' },
      hash: '#first' }
    */
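One related point worth checking for yourself: in the examples above, port 443 stays visible because the protocol is http:. As a minimal sketch (the https URL and the port 8443 are made up for illustration), setting the port to the default port of the protocol turns it back into an empty string.

```javascript
const { URL } = require('url');

// Hypothetical URL with a non-default port for https:
const secureUrl = new URL('https://www.beyondjapan.com:8443/');
console.log(secureUrl.port); // 8443
console.log(secureUrl.host); // www.beyondjapan.com:8443

// 443 is the default port for https:, so it is normalized to an empty string.
secureUrl.port = 443;
console.log(secureUrl.port === ''); // true
console.log(secureUrl.host);        // www.beyondjapan.com
console.log(secureUrl.href);        // https://www.beyondjapan.com/
```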
URLSearchParams class
Earlier we looked at the behavior of the URL object, but now let's look at the URLSearchParams class obtained from URL.searchParams.
This class, implemented in the Node.js 7 series, parses the query string and provides getters and setters.
The official documentation has a section comparing it with the querystring module, but the URLSearchParams class does not seem to be as flexible as the querystring module, so this does not mean that the querystring module has become unnecessary.
The URLSearchParams class is provided as a class in the URL module, so it can also be used independently.
Therefore, it is a powerful class that can be used not only for analysis but also for generation.
    const {URLSearchParams} = require('url');

    const qs = 'abc=123&xyz=456&aaa=789';
    const qsObject = { abc: 123, xyz: 456, aaa: 789 };
    const qsIterable = [
        ['abc', 123],
        ['xyz', 456],
        ['aaa', 789],
    ];
    const qsMap = new Map();
    qsMap.set('abc', 123);
    qsMap.set('xyz', 456);
    qsMap.set('aaa', 789);
    function* qsGenerator() {
        yield ['abc', 123];
        yield ['xyz', 456];
        yield ['aaa', 789];
    }

    const params1 = new URLSearchParams(qs);            // A regular query string
    const params2 = new URLSearchParams(qsObject);      // A regular object
    const params3 = new URLSearchParams(qsIterable);    // An iterable (array of pairs)
    const params4 = new URLSearchParams(qsMap);         // A Map object
    const params5 = new URLSearchParams(qsGenerator()); // A generator

    console.log(params1);
    console.log(params2);
    console.log(params3);
    console.log(params4);
    console.log(params5);

    // Result
    /*
    URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    */
It is not a picky eater at all: it will take anything, including strings, objects, arrays, Map objects, and generators.
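Since the class can be used for generation as well as parsing, here is a minimal sketch of building a query string and attaching it to a URL. The path /search and the parameter names (q, page, tag) are hypothetical, chosen only to show how spaces and & are encoded.

```javascript
const { URL, URLSearchParams } = require('url');

// Hypothetical endpoint and parameters, for illustration only.
const searchUrl = new URL('http://www.beyondjapan.com/search');
const generated = new URLSearchParams({ q: 'node js', page: 2 });
generated.append('tag', 'a&b');

// toString() serializes the pairs; a space becomes + and & becomes %26.
console.log(generated.toString());
// q=node+js&page=2&tag=a%26b

// Assigning the string to search updates href as well.
searchUrl.search = generated.toString();
console.log(searchUrl.href);
// http://www.beyondjapan.com/search?q=node+js&page=2&tag=a%26b
```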
The created URLSearchParams object has various methods.
append to add
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    params.append('bbb', 963);
    console.log(params.toString());

    // Result
    // abc=123&xyz=456&aaa=789&bbb=963
delete
    const {URLSearchParams} = require('url');
    // Start from a query string that contains the key to be deleted
    const qs = 'abc=123&xyz=456&aaa=789&bbb=963';
    const params = new URLSearchParams(qs);
    params.delete('bbb');
    console.log(params.toString());

    // Result
    // abc=123&xyz=456&aaa=789
entries returning an iterator
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    for (let v of params.entries()) console.log(v);

    // Result
    /*
    [ 'abc', '123' ]
    [ 'xyz', '456' ]
    [ 'aaa', '789' ]
    */
forEach loops over all entries
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    params.forEach((value, key, p) => {
        console.log(value, key, p);
    });

    // Result
    /*
    123 abc URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    456 xyz URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    789 aaa URLSearchParams { 'abc' => '123', 'xyz' => '456', 'aaa' => '789' }
    */
get returns the value of the argument key
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    console.log(params.get('abc'));

    // Result
    // 123
getAll returns all the values of the argument keys
To show how it differs from get, I prepared two samples.
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    console.log(params.getAll('abc'));

    // Result
    // [ '123' ]

    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789&abc=777';
    const params = new URLSearchParams(qs);
    console.log(params.getAll('abc'));

    // Result
    // [ '123', '777' ]
The URLSearchParams object allows duplicate keys, which is why a getAll method is provided.
By the way, the get method is specified to return only the first registered value among duplicate keys, so the later values can only be reached through getAll or inside a loop.
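The difference can be sketched as follows. The grouping loop at the end is just one possible way (not something the API provides) to collect every value per key into a Map; the names dupParams and grouped are illustrative.

```javascript
const { URLSearchParams } = require('url');

const dupParams = new URLSearchParams('abc=123&xyz=456&abc=777');

console.log(dupParams.get('abc'));    // 123 (only the first value)
console.log(dupParams.getAll('abc')); // [ '123', '777' ]

// Group every value by its key while iterating over all pairs.
const grouped = new Map();
for (const [key, value] of dupParams) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}
console.log(grouped.get('abc')); // [ '123', '777' ]
console.log(grouped.get('xyz')); // [ '456' ]
```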
has checks whether a key exists
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    console.log(params.has('abc'));

    // Result
    // true
keys returns an iterator of keys
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    for (let k of params.keys()) console.log(k);

    // Result
    /*
    abc
    xyz
    aaa
    */
set to overwrite
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);

    // The key does not exist yet, so it is added
    params.set('vvv', 247);
    console.log(params.toString());
    // Result
    // abc=123&xyz=456&aaa=789&vvv=247

    // The key already exists, so it is overwritten rather than duplicated
    params.set('vvv', 247);
    console.log(params.toString());
    // Result
    // abc=123&xyz=456&aaa=789&vvv=247
If the corresponding key does not exist, it performs an action equivalent to append, and if the key exists, it overwrites the contents of the key.
If the same key exists multiple times, the first occurrence is set to the new value and all the other occurrences are removed.
    const {URLSearchParams} = require('url');
    const qs = 'a=1&a=2&a=3';
    const params = new URLSearchParams(qs);
    console.log(params.toString());
    // At this point
    // a=1&a=2&a=3

    params.set('a', 4);
    console.log(params.toString());
    // Result
    // a=4
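With only one kind of key, the sample above cannot show where the new value ends up. As a small check (the query string here is made up for illustration), putting another key between the duplicates shows that the value stays at the position of the first occurrence rather than moving to the end.

```javascript
const { URLSearchParams } = require('url');

// Hypothetical query string with another key between the duplicates.
const mixed = new URLSearchParams('a=1&b=2&a=3');
mixed.set('a', 4);

// The first 'a' is overwritten in place and the later 'a' is removed.
console.log(mixed.toString()); // a=4&b=2
```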
sort performs a destructive sort
Sorts the key/value pairs held by the object by key name.
The method takes no arguments, so reverse sorting does not seem to be possible.
Note that it does not return a new, rearranged URLSearchParams object; it rearranges the object it is called on (although the order is rarely something you care about).
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    console.log(params.toString());
    params.sort();
    console.log(params.toString());

    // Result
    // Before execution
    // abc=123&xyz=456&aaa=789
    // After execution
    // aaa=789&abc=123&xyz=456
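If you do need descending order, one possible workaround (an assumption on my part, not a feature of sort itself) is to sort first and then build a new object from the reversed pairs, since the constructor accepts an array of pairs. The names sortable and descending are illustrative.

```javascript
const { URLSearchParams } = require('url');

const sortable = new URLSearchParams('abc=123&xyz=456&aaa=789');
sortable.sort(); // ascending by key, in place

// Spread the pairs into an array, reverse it, and build a new object.
const descending = new URLSearchParams([...sortable].reverse());

console.log(sortable.toString());   // aaa=789&abc=123&xyz=456
console.log(descending.toString()); // xyz=456&abc=123&aaa=789
```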
values returns an iterator of values
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    for (let v of params.values()) console.log(v);

    // Result
    /*
    123
    456
    789
    */
URLSearchParams object itself is iterable
    const {URLSearchParams} = require('url');
    const qs = 'abc=123&xyz=456&aaa=789';
    const params = new URLSearchParams(qs);
    for (const [k, v] of params) console.log(k, v);

    // Result
    /*
    abc 123
    xyz 456
    aaa 789
    */
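Because the object is iterable, it is easy to copy it into a plain object, for example when you want to pass it to JSON.stringify. This is only a sketch with illustrative names (source, plain), and note that with duplicate keys the later value would overwrite the earlier one.

```javascript
const { URLSearchParams } = require('url');

const source = new URLSearchParams('abc=123&xyz=456&aaa=789');

// Copy each key/value pair into a plain object.
const plain = {};
for (const [key, value] of source) {
  plain[key] = value;
}

console.log(JSON.stringify(plain));
// {"abc":"123","xyz":"456","aaa":"789"}
```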
Summary
I have summarized the WHATWG URL API and URLSearchParams, which will likely play an important role in URL parsing in the future. I hope this helps you understand them.
The implementation around query strings, which was surprisingly troublesome, seems to be progressing.
That's it.