How is it different from JSON? Introduction to YAML starting from
Hello.
I'm Mandai, in charge of Wild on the development team.
It has been a while since the JSON data format became popular, but there is also a similar data format called YAML.
Whether you use JSON or YAML more often probably depends on the programming language you usually work with.
When developing an API, most communication with the outside world is in JSON. If you mainly work on the front end, or on a PHP back end, you will most likely use JSON as well, since built-in functions can convert it in a single call. As a result, there are few opportunities to come into contact with YAML.
This time I will summarize YAML, which is easy to understand once you know it.
About YAML
The name YAML probably reminds you of other terms you already know.
HTML looks quite similar, and so does XML.
However, the official name of YAML is "YAML Ain't Markup Language", which is a so-called backronym.
So if you understood it to be a data format rather than a markup language, you were correct.
[Basics] Supported data types
YAML has only three basic data types.
- Sequence
- Map (mapping)
- Scalar
Isn't it relatively easy to understand, given that all data is represented using only these three types?
They are combined with various symbols that give YAML a wide range of expression, so a variety of data can be represented.
Let's take a closer look at each data type.
Sequence type
A sequence is an array with consecutive numerical subscripts.
There is no concept of a key, so it is used when simply having multiple values.
    - abc
    - def
    - 123
Each element starts with a hyphen, followed by a space and then the data.
Converting the above YAML data to JSON results in the following format.
["abc", "def", 123]
Map type
A map is an array whose subscripts (keys) you can specify yourself, that is, an associative array.
Because keys and values are paired, and values can be scalars, sequences, or maps, the structure can be complex.
    aaa: test1
    bbb: test2
    ccc: test3
    ddd:
      eee: test4
      fff: test5
In the above example, the value of the key ddd is itself a map, so the maps are nested.
Converting this to JSON results in the following format.
{"aaa": "test1", "bbb": "test2", "ccc": "test3", "ddd": {"eee": "test4", "fff": "test5"}}
If you format it for easier viewing, it will look like this:
    {
      "aaa": "test1",
      "bbb": "test2",
      "ccc": "test3",
      "ddd": {
        "eee": "test4",
        "fff": "test5"
      }
    }
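For readers who think in code, the same map can be pictured as a nested dictionary. The following is a minimal Python sketch using only the standard `json` module; the variable names are illustrative and do not come from the article. It shows that the one-line and the formatted JSON are the same data and differ only in whitespace.

```python
import json

# The map from the YAML example, written as a nested Python dict.
data = {
    "aaa": "test1",
    "bbb": "test2",
    "ccc": "test3",
    "ddd": {"eee": "test4", "fff": "test5"},  # nested map under key ddd
}

compact = json.dumps(data)           # one-line JSON
pretty = json.dumps(data, indent=2)  # same data, formatted for reading

print(compact)
print(pretty)
```

Parsing either string back with `json.loads` yields the same dictionary, which is why the formatted version is safe to use for readability.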
Furthermore, there is also an ordered map type (ordered mapping), which combines a map with the sequence type described earlier so that the order of the entries is fixed.
    - aaa: test1
    - ccc: test2
    - bbb: test3
If the order of the keys is important, you can use this combination: the order is specified as in a sequence, and each entry carries a key as in a map.
JSON has no dedicated type for this concept; the closest equivalent is an array of single-key objects.
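One way to picture the ordered form is as a list of one-entry maps. The sketch below (Python, standard library only; the names are hypothetical) models the example that way and reads the keys back in their declared order.

```python
import json

# The ordered-mapping example as a list of single-key dicts:
# the list fixes the order, each dict carries one key/value pair.
ordered = [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": "test3"}]

# Keys in their declared order (not alphabetical).
keys_in_order = [key for entry in ordered for key in entry]

print(json.dumps(ordered))
print(keys_in_order)
```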
Scalar type
The last one is the scalar.
A scalar simply refers to a single value, such as a number, string, or Boolean value.
Scalars have already appeared in the map and sequence examples. Which values count as scalars? Everything other than a map or a sequence is a scalar.
Maps and sequences represent structure rather than data, so it is the scalars that define the actual values.
To show several scalars side by side, we will write them as elements of a sequence.
    - 100 # number (decimal)
    - 016 # number (octal in YAML 1.1; YAML 1.2 writes octal as 0o16)
    - 0xAC # number (hexadecimal)
    - 3.14 # number (float)
    - 12.3e+4 # number (exponent)
    - .inf # number (positive infinity)
    - -.inf # number (negative infinity)
    - true # bool
    - false # bool
    - on # bool (YAML 1.1 only)
    - off # bool (YAML 1.1 only)
    - yes # bool (YAML 1.1 only)
    - no # bool (YAML 1.1 only)
    - null # null
    - ~ # null
    - test # string
    - 1980/09/02 # string (note: slashes do not make a date)
    - 1980-09-02 00:00:00.000-09:00 # datetime (ISO-8601 format)
    - 1980-09-02 # date (year, month, day)
    - 1980 # number (a bare year is read as an integer, not a date)
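To make the numeric notations concrete, the following Python sketch evaluates a few of the literals above with built-in conversions. It is not a YAML parser; it only shows which values the notations stand for, assuming the YAML 1.1 reading in which a leading 0 marks octal. The variable names are illustrative.

```python
import math

# Values that the numeric notations stand for (YAML 1.1 reading).
decimal_value = int("100")         # plain decimal
octal_value = int("016", 8)        # leading 0 read as octal
hex_value = int("0xAC", 16)        # hexadecimal
float_value = float("3.14")        # floating point
exponent_value = float("12.3e+4")  # 12.3 * 10**4
positive_inf = math.inf            # corresponds to .inf
negative_inf = -math.inf           # corresponds to -.inf

print(octal_value, hex_value, exponent_value)
```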
Among these, only strings can hold multi-line data that includes line breaks, so several notations are provided for writing multi-line text.
    # Text including simple line breaks
    test1: |
      You can write text including line breaks.
      You can write text including line breaks.

    # Replace line breaks with single spaces (no line break remains inside the text when the data is read)
    test2: >
      You can write text including line breaks.
      You can write text including line breaks.
In either pattern, the indentation spaces at the beginning of each line are removed.
With ">", each line break inside the text is then replaced with a single space.
Also, by combining either of the two with "+" or "-", you can specify whether the final line break is kept or removed.
    # Keep the line break at the end of the last line (and any trailing blank lines)
    test3: |+
      You can write text including line breaks.
      You can write text including line breaks.

    # Do not keep a line break at the end of the last line
    test4: |-
      You can write text that includes a line break.
      You can write text including line breaks.

    # Keep the line break at the end of the last line (line breaks within the text are still replaced with spaces)
    test5: >+
      You can write text that includes line breaks.
      You can write text including line breaks.

    # Do not keep a line break at the end of the last line
    test6: >-
      You can write text that includes a line break.
      You can write text including line breaks.
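The combinations of "|" or ">" with "+" or "-" can be modeled roughly as follows. This is a simplified Python sketch, not a real YAML parser: the function name and parameters are hypothetical, and it assumes the content lines have already had their indentation removed and contain no blank or more-indented lines (real YAML has extra rules for those).

```python
def block_scalar(lines, style="|", chomp="", trailing_blank_lines=0):
    """Simplified model of the string a YAML block scalar produces.

    lines: content lines with indentation already removed.
    style: "|" keeps line breaks, ">" folds them into single spaces.
    chomp: "" keeps one final line break, "+" keeps the final line
           break plus any trailing blank lines, "-" removes them all.
    """
    joiner = "\n" if style == "|" else " "
    body = joiner.join(lines)
    if chomp == "-":
        return body
    if chomp == "+":
        return body + "\n" * (1 + trailing_blank_lines)
    return body + "\n"


text = ["first line", "second line"]
literal = block_scalar(text, "|")        # line breaks kept
folded = block_scalar(text, ">")         # line breaks folded into spaces
stripped = block_scalar(text, "|", "-")  # no final line break
kept = block_scalar(text, ">", "+", trailing_blank_lines=1)

print(repr(literal), repr(folded), repr(stripped), repr(kept))
```

Printing with `repr` makes the final line breaks visible, which is the only difference between the plain, "+", and "-" variants.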
Depending on the content of the text, you may not want every leading space to be removed as indentation.
In that case, you can write a number after the symbol to specify how many spaces count as indentation; any spaces beyond that number remain at the beginning of each line.
    # Indentation declared as 2 spaces: lines indented by 4 spaces (relative to the key) keep 2 leading spaces
    test1: |2
        You can write text including line breaks.
        You can write text including line breaks.

    # Indentation declared as 4 spaces: lines indented by 8 spaces (relative to the key) keep 4 leading spaces
    test1: |4
            You can write text including line breaks.
            You can write text including line breaks.
The specified number and the actual indentation are related in a delicate way: a line indented by more than the specified number of spaces is fine (the extra spaces become part of the text), but a line indented by fewer is a format error.
    # Error because four or more spaces are required at the beginning of each line
    test1: |4
      You can write text including line breaks.
      You can write text including line breaks.
Also, according to the YAML specification, this number can be specified in the range 1 <= n <= 9.
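A rough model of how the number after "|" is applied, assuming the YAML specification's reading in which it declares how many leading spaces are indentation. The Python function below is a hypothetical helper, not part of any YAML library: it removes the declared indentation, keeps any extra spaces, and rejects lines that are indented too little.

```python
def strip_declared_indent(lines, n):
    """Remove n leading spaces from each line, as an indentation indicator would.

    Simplified model: raises ValueError when a non-empty line has fewer
    than n leading spaces; spaces beyond n stay in the text.
    """
    if not 1 <= n <= 9:
        raise ValueError("indentation indicator must be between 1 and 9")
    result = []
    for line in lines:
        if line.strip() and not line.startswith(" " * n):
            raise ValueError(f"line is indented by fewer than {n} spaces: {line!r}")
        result.append(line[n:])
    return result


# Lines indented by 4 spaces with a declared indentation of 2:
# two spaces are removed and two remain in the text.
kept = strip_declared_indent(["    text", "    more"], 2)
print(kept)

# Lines indented by only 2 spaces with a declared indentation of 4 are rejected.
try:
    strip_declared_indent(["  text"], 4)
except ValueError as error:
    print("error:", error)
```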
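About nesting
As we have already seen several times, maps and sequences can contain other elements, and this nesting is expressed using indentation.
YAML does not fix the width of an indentation level as long as sibling elements line up (tabs cannot be used for indentation); this article uses two spaces per level.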
    # Nesting sequences
    - aaa
    - bbb
    - ccc
    -
      - ddd
      - eee
      - fff
    # Nesting maps
    aaa: test1
    bbb: test2
    ccc:
      ddd: test3
      eee: test4
      fff: test5
    # Nesting sequences and maps
    - aaa
    - bbb
    - ccc
    - ddd: test1
      eee: test2
      fff: test3
    # Nesting maps and sequences
    aaa: test1
    bbb: test2
    ccc:
      - ddd
      - eee
      - fff
    # Nesting ordered maps (note the number of indentations)
    - aaa: test1
    - ccc: test2
    - bbb:
        - ddd
        - eee
        - fff
When a map or sequence is a child element of an entry in an ordered map, it should be indented by two levels (four spaces): the key already sits two columns in, after the "- ", and the child is indented relative to that key.
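To check how such nested YAML is meant to be read, it can help to write out the data structure it describes. The Python sketch below (standard library only; variable names are illustrative) spells out two of the cases: a map holding a sequence, and an ordered map whose last entry holds a sequence.

```python
import json

# A map whose key ccc holds a sequence.
map_with_sequence = {
    "aaa": "test1",
    "bbb": "test2",
    "ccc": ["ddd", "eee", "fff"],
}

# An ordered map (a list of single-key maps) whose last entry,
# bbb, holds a nested sequence.
ordered_with_sequence = [
    {"aaa": "test1"},
    {"ccc": "test2"},
    {"bbb": ["ddd", "eee", "fff"]},
]

print(json.dumps(map_with_sequence, indent=2))
print(json.dumps(ordered_with_sequence, indent=2))
```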
When working with YAML in Visual Studio Code
When working with YAML in Visual Studio Code (hereinafter referred to as VSCode), the basics are easy to handle because some features are supported out of the box.
However, I feel that extensions are necessary to create an even more convenient environment.
If you install the extension called YAML, released by Red Hat, you can quickly set up an environment where code hints and validation work, so I highly recommend it.
It is a full-fledged extension built on a language server, and it is easy to use.
It also supports external schemas, so you can load predefined schemas such as the Swagger YAML schema often used in API development, the schema for AWS CloudFormation, and the schema for docker-compose.yml, which makes it even easier to use.
The YAML schema for Kubernetes is included, so you don't need to prepare it.
Summary
This time, I wrote an introductory article about YAML, including a few side notes. What did you think?
I think some people have been avoiding YAML because it seems difficult, without knowing that there are actually only three basic data types.
The scalar type covers several built-in data types, and although they all look like strings, you have now seen that YAML clearly distinguishes each of them.
For strings, there are several convenient ways to express multi-line text, so it is best to look at the examples and use the notation that suits the situation.
If you can freely express data structures using maps, sequences, or an ordered map that combines them (I want a good name), and can define data with a clear view, the code will definitely be easier to maintain.
This time I could not cover more advanced usage, such as anchors and aliases, which let you write a data definition in one place and use it in another, or how to write multiple data structures in one file. I plan to introduce them on another occasion, so please look forward to it!
That's it.