How is it different from JSON? An introduction to YAML, starting from the basics

Hello.
I'm Mandai, in charge of Wild on the development team.
The JSON data format has been around for quite some time, but there is another, similar data format called YAML. Whether you use JSON or YAML more often depends on the programming language you normally work with.
When developing APIs, most communication with the outside world happens in JSON. If you mainly use PHP, whether for front-end or back-end development, you will probably exchange data as JSON, which PHP can convert in one step with its built-in functions, so it's true that you rarely get a chance to work with YAML.
This time, I'll summarize YAML, which is easy to understand once you get to know it.
About YAML
I'm sure you're all familiar with the name YAML. It looks similar to HTML and XML, but YAML is a backronym meaning "YAML Ain't Markup Language".
So my understanding that it is a data format rather than a markup language was correct.
[Basics] Data types that can be handled
YAML handles only three basic data types:
- Sequence
- Map (mapping)
- Scalar
Since all data can be represented with just these three types, don't you think it's fairly easy to get started?
By combining them with various symbols that widen what you can express, you can represent many kinds of data.
Let's take a closer look at each data type.
Sequence Types
A sequence is an array whose elements are indexed by consecutive numbers.
There is no concept of a key, so it is used simply to store multiple values.
```yaml
- abc
- def
- 123
```
Start each item with a hyphen followed by a space, then write the data.
When you convert the above YAML data to JSON, it looks like this:
```json
["abc", "def", 123]
```
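To make the "hyphen, then a space" rule concrete, here is a toy Python sketch. The function name `parse_flat_sequence` is hypothetical, and it is not a real YAML parser: it assumes a flat sequence with no nesting, comments, or quoting, and it turns items made only of digits into integers.

```python
import json
import re
from typing import List, Optional, Union

Item = Optional[Union[int, str]]


def parse_flat_sequence(text: str) -> List[Item]:
    """Toy parser for a flat block sequence: one "- value" entry per line.

    Assumptions: no nesting, comments, or quoting; values made only of ASCII
    digits become integers and everything else stays a string.
    """
    items: List[Item] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == "-":
            items.append(None)  # a hyphen with no value is an empty (null) entry
            continue
        if not line.startswith("- "):
            # "-abc" is not an entry: the hyphen must be followed by a space
            raise ValueError(f"not a sequence entry: {raw!r}")
        value = line[2:].strip()
        items.append(int(value) if re.fullmatch(r"[0-9]+", value) else value)
    return items


print(json.dumps(parse_flat_sequence("- abc\n- def\n- 123\n")))
# → ["abc", "def", 123]
```

For real data, use a full YAML library instead; this only shows how each "- " line becomes one array element.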
Map type
A map is an associative array, in which each value is stored under a key you choose.
It consists of key-value pairs, and the values can be scalars, sequences, or maps, so the structure can easily become complex.
```yaml
aaa: test1
bbb: test2
ccc: test3
ddd:
  eee: test4
  fff: test5
```
In the above example, the value for the key ddd is itself a map, so it is nested.
When converted to JSON, it becomes the following:
```json
{"aaa": "test1", "bbb": "test2", "ccc": "test3", "ddd": {"eee": "test4", "fff": "test5"}}
```
Formatted for easier reading, it looks like this:
```json
{
  "aaa": "test1",
  "bbb": "test2",
  "ccc": "test3",
  "ddd": {
    "eee": "test4",
    "fff": "test5"
  }
}
```
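The only thing that makes ddd's value a nested map is indentation. As a rough illustration, here is a toy Python sketch (the name `parse_block_map` is hypothetical, not a library API) that reads block mappings of plain string values, opens a nested map when a key has no value, and closes it again when the indentation decreases. It assumes spaces-only indentation and skips scalar typing, sequences, and quoting.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_block_map(text: str) -> Dict[str, Any]:
    """Toy parser for nested block mappings whose leaf values are plain strings.

    Assumptions: every line is "key: value" or "key:" (which opens a nested
    map), indentation uses spaces only, and there are no sequences or quotes.
    """
    root: Dict[str, Any] = {}
    # Each entry is (minimum indentation of this map's keys, the map itself).
    stack: List[Tuple[int, Dict[str, Any]]] = [(0, root)]
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and full-line comments
        indent = len(raw) - len(raw.lstrip(" "))
        # A less-indented line closes the nested maps it no longer belongs to.
        while indent < stack[-1][0]:
            stack.pop()
        current = stack[-1][1]
        key, sep, value = raw.strip().partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        value = value.strip()
        if value:
            current[key.strip()] = value
        else:
            child: Dict[str, Any] = {}
            current[key.strip()] = child
            # The child's keys must be indented further than this key.
            stack.append((indent + 1, child))
    return root


sample = """\
aaa: test1
bbb: test2
ccc: test3
ddd:
  eee: test4
  fff: test5
"""
print(json.dumps(parse_block_map(sample)))
# → {"aaa": "test1", "bbb": "test2", "ccc": "test3", "ddd": {"eee": "test4", "fff": "test5"}}
```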
There is also a type called an ordered mapping, which combines the sequence type above with maps so that the order of the keys is fixed.
```yaml
- aaa: test1
- ccc: test2
- bbb: test3
```
When the order of the keys matters, you can combine sequence-style ordering with map-style keys in this way.
This concept does not exist in JSON.
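Once loaded, the example above is just a list of one-key maps. The YAML 1.1 type repository names this shape `!!omap` (an ordered sequence of key/value pairs without duplicate keys). The closest JSON can get is an array of single-member objects, because JSON defines objects as unordered. The Python sketch below uses a hypothetical helper, `omap_pairs`, and writes the loaded data out by hand rather than parsing it; it checks the shape and recovers the ordered pairs.

```python
import json
from typing import Any, Dict, List, Tuple


def omap_pairs(seq: List[Dict[str, Any]]) -> List[Tuple[str, Any]]:
    """Turn a sequence of one-key maps into an ordered list of (key, value) pairs.

    Mirrors the !!omap rules: each entry holds exactly one key, and keys
    must not repeat.
    """
    pairs: List[Tuple[str, Any]] = []
    seen = set()
    for entry in seq:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"each entry must be a single-key map: {entry!r}")
        ((key, value),) = entry.items()
        if key in seen:
            raise ValueError(f"duplicate key in ordered map: {key!r}")
        seen.add(key)
        pairs.append((key, value))
    return pairs


# Hand-written equivalent of "- aaa: test1", "- ccc: test2", "- bbb: test3"
loaded = [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": "test3"}]
print(omap_pairs(loaded))  # → [('aaa', 'test1'), ('ccc', 'test2'), ('bbb', 'test3')]
print(json.dumps(loaded))  # → [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": "test3"}]
```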
Scalar types
Finally, there are scalars.
Simply put, a scalar is a single value, such as a number, a string, or a Boolean.
We've already seen scalars as the values in the map and sequence examples: every value that is not a map or a sequence is a scalar.
Maps and sequences describe structure rather than kinds of data, so you could say that the scalar types are what define the actual values.
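A YAML document can consist of a single scalar, but to show many of them at once, we'll write them as elements of a sequence.
```yaml
- 100                            # number (decimal)
- 016                            # number (octal)
- 0xAC                           # number (hexadecimal)
- 3.14                           # number (float)
- 12.3e+4                        # number (exponent)
- .inf                           # number (positive infinity)
- -.inf                          # number (negative infinity)
- true                           # bool
- false                          # bool
- on                             # bool
- off                            # bool
- yes                            # bool
- no                             # bool
- null                           # null
- ~                              # null
- test                           # string
- 1980/09/02                     # string (note: not a date)
- 1980-09-02 00:00:00.000-09:00  # datetime (ISO 8601 style)
- 1980-09-02                     # date (year, month, day)
- 1980                           # number (integer; a bare year is not a date)
```
These interpretations follow YAML 1.1. Under the YAML 1.2 core schema, on, off, yes, and no are ordinary strings, and octal numbers are written with a 0o prefix (0o16), so how these values load depends on the parser and the YAML version it implements.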
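As a rough illustration of how a parser can decide these types from plain text, here is a Python sketch of a hypothetical `resolve_scalar` function. It covers only the cases in the list above, using simplified versions of the YAML 1.1 matching rules (no `_` digit separators, base-60 numbers, `.nan`, or `Z` time zones), and it is not how any particular library is implemented. It needs Python 3.7+ for `fromisoformat`.

```python
import math
import re
from datetime import date, datetime
from typing import Any

# Simplified YAML 1.1-style rules for plain (unquoted) scalars, checked in order.
_NULL = {"", "~", "null", "Null", "NULL"}
_TRUE = {"true", "True", "TRUE", "yes", "Yes", "YES", "on", "On", "ON"}
_FALSE = {"false", "False", "FALSE", "no", "No", "NO", "off", "Off", "OFF"}
_HEX = re.compile(r"[-+]?0x[0-9a-fA-F]+")
_OCT = re.compile(r"[-+]?0[0-7]+")
_DEC = re.compile(r"[-+]?(0|[1-9][0-9]*)")
_FLOAT = re.compile(r"[-+]?([0-9]+\.[0-9]*|\.[0-9]+)([eE][-+][0-9]+)?")
_INF = re.compile(r"([-+]?)\.(inf|Inf|INF)")
_DATETIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}[T ][0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]{3}|\.[0-9]{6})?([-+][0-9]{2}:[0-9]{2})?"
)
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def resolve_scalar(text: str) -> Any:
    """Guess the type of a plain scalar, trying each rule in a fixed order."""
    if text in _NULL:
        return None
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    if _HEX.fullmatch(text):
        return int(text.replace("0x", "", 1), 16)
    if _OCT.fullmatch(text):
        return int(text, 8)  # a leading 0 means octal in YAML 1.1
    if _DEC.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    match = _INF.fullmatch(text)
    if match:
        return -math.inf if match.group(1) == "-" else math.inf
    if _DATETIME.fullmatch(text):
        return datetime.fromisoformat(text)
    if _DATE.fullmatch(text):
        return date.fromisoformat(text)
    return text  # anything else stays a string


for sample in ["016", "0xAC", "12.3e+4", "on", "~", "1980/09/02", "1980"]:
    print(f"{sample!r:>14} -> {resolve_scalar(sample)!r}")
```

The order of the checks matters: booleans are tested before numbers, and `1980/09/02` falls through every rule, so it stays a string.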
Of these, only strings can span multiple lines and contain line breaks, so YAML provides several notations for writing multi-line text.
```yaml
# Text whose line breaks are kept as written
test1: |
  You can write text with line breaks.
  You can write text with line breaks.
```
```yaml
# Line breaks are replaced with spaces (the loaded string has no line breaks except the final one)
test2: >
  You can write text that includes line breaks.
  You can write text that includes line breaks.
```
In both patterns, the spaces used for indentation at the beginning of each line are removed.
With ">", the line breaks are then replaced with spaces.
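The difference between the two styles can be sketched in a few lines of Python. The function `block_scalar` is hypothetical and deliberately simplified: it assumes no blank lines or extra-indented lines inside the text (real folding treats those specially) and uses the default handling of the final line break.

```python
def block_scalar(content: str, style: str) -> str:
    """Toy model of a "|" (literal) or ">" (folded) block scalar body.

    content: the indented lines below the indicator line. Assumptions: no
    blank or extra-indented lines, and the default handling of the final
    line break (exactly one is kept).
    """
    lines = content.splitlines()
    # Treat the first line's indentation as the indentation to remove.
    indent = len(lines[0]) - len(lines[0].lstrip(" "))
    body = [line[indent:] for line in lines]
    if style == "|":
        text = "\n".join(body)  # literal: keep the line breaks
    elif style == ">":
        text = " ".join(body)  # folded: line breaks become spaces
    else:
        raise ValueError('style must be "|" or ">"')
    return text + "\n"


sample = "  You can write text with line breaks.\n  You can write text with line breaks.\n"
print(repr(block_scalar(sample, "|")))
# → 'You can write text with line breaks.\nYou can write text with line breaks.\n'
print(repr(block_scalar(sample, ">")))
# → 'You can write text with line breaks. You can write text with line breaks.\n'
```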
You can also add "+" or "-" to either of the above to control the line break at the end of the last line. Without either, exactly one final line break is kept (this default is called "clip"); "+" ("keep") keeps the final line break along with any trailing blank lines, and "-" ("strip") removes them all.
```yaml
# Keep the line break at the end of the last line
test3: |+
  You can write text that includes a line break.
  You can write text that includes a line break.
```
```yaml
# Do not keep a line break at the end of the last line
test4: |-
  You can write text that includes a line break.
  You can write text that includes a line break.
```
```yaml
# Keep the line break at the end of the last line (line breaks within the text become spaces)
test3: >+
  You can write text that includes line breaks.
  You can write text that includes line breaks.
```
```yaml
# Do not keep a line break at the end of the last line (line breaks within the text become spaces)
test4: >-
  You can write text that includes a line break.
  You can write text that includes a line break.
```
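The three chomping behaviors (clip, strip, keep) fit in one small Python function. This is a sketch with a hypothetical name, `chomp`; its input is assumed to be the scalar's text after indentation removal (and folding, for ">"), with all of its trailing line breaks still attached.

```python
def chomp(text: str, indicator: str = "") -> str:
    """Apply a chomping indicator to block-scalar text that still carries
    all of its trailing line breaks.

    ""  (clip, the default): keep exactly one final line break
    "-" (strip): remove every trailing line break
    "+" (keep): leave trailing line breaks, including blank lines, untouched
    """
    stripped = text.rstrip("\n")
    if indicator == "-":
        return stripped
    if indicator == "+":
        return text
    if indicator == "":
        return stripped + "\n" if stripped else ""
    raise ValueError('indicator must be "", "-" or "+"')


raw = "line one\nline two\n\n"  # text followed by one trailing blank line
print(repr(chomp(raw)))       # → 'line one\nline two\n'
print(repr(chomp(raw, "-")))  # → 'line one\nline two'
print(repr(chomp(raw, "+")))  # → 'line one\nline two\n\n'
```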
Depending on the content of the text, you may not want all of the leading indentation to be removed.
In such cases, you can put a digit after "|" or ">" (an indentation indicator) to state exactly how many spaces count as indentation. Only that many spaces are removed from each line, and any extra spaces are kept as part of the text. The number is counted from the indentation of the parent node, which is column 0 for the top-level keys below.
```yaml
# Indicator 2: only 2 of the 4 spaces are indentation, so each line keeps 2 leading spaces
test1: |2
    You can write text including line breaks.
    You can write text including line breaks.
```
```yaml
# Indicator 4: all 4 spaces are indentation, so no leading spaces remain
test1: |4
    You can write text including line breaks.
    You can write text including line breaks.
```
The number of leading spaces on each line has to fit the indicator: if a line is indented by more spaces than specified, that's fine (the extras are kept), but if it is indented by fewer, parsing fails.
```yaml
# Error, because indicator 4 requires at least four spaces at the beginning of each line
test1: |4
  You can write text including line breaks.
  You can write text including line breaks.
```
Also, according to the YAML specification, the indentation indicator must be a single digit in the range 1 <= n <= 9.
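The indentation indicator rule can be modeled in Python as a small sketch. The function `literal_block` and its parameters are hypothetical; it assumes "|" style, no blank lines, spaces only, and the default final-line-break handling. A real parser ends the scalar at an under-indented line instead of raising an error itself, but in a case like the one above, the leftover text then makes the document invalid.

```python
from typing import List, Optional


def literal_block(lines: List[str], indicator: Optional[int] = None, parent_indent: int = 0) -> str:
    """Toy model of a "|" literal block scalar with an optional indentation indicator.

    lines: the raw lines below the "|" line. Assumptions: no blank lines,
    spaces only, and the default (clip) handling of the final line break.
    With an indicator n, exactly parent_indent + n spaces count as indentation;
    without one, the indentation is taken from the first line.
    """
    if indicator is not None:
        if not 1 <= indicator <= 9:
            raise ValueError("the indentation indicator must be a digit from 1 to 9")
        indent = parent_indent + indicator
    else:
        indent = len(lines[0]) - len(lines[0].lstrip(" "))
    body = []
    for line in lines:
        if len(line) - len(line.lstrip(" ")) < indent:
            # Under-indented line: this toy version reports it directly.
            raise ValueError(f"expected at least {indent} leading spaces: {line!r}")
        body.append(line[indent:])  # spaces beyond `indent` stay in the text
    return "\n".join(body) + "\n"


text = ["    You can write text.", "    Second line."]
print(repr(literal_block(text, indicator=2)))  # → '  You can write text.\n  Second line.\n'
print(repr(literal_block(text, indicator=4)))  # → 'You can write text.\nSecond line.\n'
```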
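About nesting
As we've seen several times, maps and sequences let you nest elements, and the nesting is expressed with indentation.
YAML does not fix the indentation width: any number of spaces works as long as elements at the same level line up, although tab characters cannot be used for indentation. Two spaces per level is the common convention, and the examples below use it.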
```yaml
# Nesting sequences
- aaa
- bbb
-
  - ccc
  - ddd
  - eee
  - fff
```
```yaml
# Nesting maps
aaa: test1
bbb: test2
ccc:
  ddd: test3
  eee: test4
  fff: test5
```
```yaml
# Nesting sequences and maps
- aaa
- bbb
- ccc
- ddd: test1
  eee: test2
  fff: test3
```
```yaml
# Nesting maps and sequences
aaa: test1
bbb: test2
ccc:
  - ddd
  - eee
  - fff
```
```yaml
# Nested ordered maps (note the number of indentations)
- aaa: test1
- ccc: test2
- bbb:
    - ddd
    - eee
    - fff
```
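When a map is a child element of an ordered map entry, it must be indented further than its parent key, which already starts two columns in (after "- "), so in practice it goes two levels (four spaces) from the margin. A child sequence may start at the same column as the key, but indenting it as above is clearer.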
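Going the other way can also help show how indentation encodes nesting. Below is a toy Python emitter (the name `to_yaml` is hypothetical; real libraries also handle quoting, empty containers, and much more) that writes nested lists and dicts as block YAML with two spaces per level. It assumes keys and leaf values are plain-safe strings or numbers and that containers are non-empty. Note how a sequence under a `- bbb:` entry ends up four spaces from the margin.

```python
from typing import Any, List


def to_yaml(node: Any, indent: int = 0) -> List[str]:
    """Toy emitter: render nested dicts/lists as block YAML, 2 spaces per level.

    Assumptions: keys and leaf values are plain-safe strings or numbers
    (no quoting is done) and containers are non-empty.
    """
    pad = " " * indent
    lines: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 2))  # children go one level deeper
            else:
                lines.append(f"{pad}{key}: {value}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)):
                # Render the child one level deeper, then pull its first line up
                # onto the "- " line; the remaining lines keep the deeper indent.
                child = to_yaml(item, indent + 2)
                lines.append(f"{pad}- {child[0].lstrip()}")
                lines.extend(child[1:])
            else:
                lines.append(f"{pad}- {item}")
    else:
        raise TypeError("top-level node must be a dict or a list")
    return lines


ordered = [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": ["ddd", "eee", "fff"]}]
print("\n".join(to_yaml(ordered)))
```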
Using YAML in Visual Studio Code
Visual Studio Code (hereafter VSCode) has some built-in support for YAML, so it is easy to work with out of the box.
However, I feel an extension is needed to make the environment truly convenient.
I highly recommend the YAML extension released by Red Hat. It is a full-fledged extension built on a language server, and it is easy to use.
It also supports external schemas, so you can load predefined schemas, such as those for Swagger definitions (often used in API development), AWS CloudFormation templates, and docker-compose.yml files, which makes it even more convenient.
A schema for Kubernetes is built in, so you don't need to prepare one yourself.
Summary
This time, I wrote an introductory article about YAML with a few small tips mixed in. What did you think?
Some of you may have found YAML difficult and avoided it, not knowing that it actually has only three basic data types.
There are several built-in scalar types, and although they all look like plain text, YAML properly distinguishes between each of them.
For strings, there are several convenient ways to write multi-line text, so look at the examples and use the notation that suits you best.
You can freely express data structures with maps, sequences, and their combination, the ordered map (I need a better name for this). If you can write clear data definitions, you're sure to end up with code that is easy to maintain.
There wasn't room this time for more advanced techniques, such as anchors and aliases, which let you define data in one place and reuse it elsewhere, or writing multiple documents in one file, so I plan to cover them on another occasion. Stay tuned!
That's it.