An Introduction to YAML, Starting with How It Differs from JSON

Hello.
I'm Mandai, a member of the Wild team in charge of development.

The JSON data format has been widely used for quite some time now, but there is another, similar data format called YAML.
Whether you use JSON or YAML more often depends on the programming language you normally work in.
When developing APIs, most communication with external parties is in JSON. Even in front-end or back-end development, if you mainly work with PHP you will mostly exchange data as JSON, which built-in functions can convert in a single call, so it is true that you will rarely have occasion to work with YAML.

This time, I'll summarize YAML, which is easy to understand once you get to know it.

About YAML

I'm sure you're all familiar with names that look similar to YAML.
HTML is quite similar, and so is XML.

However, YAML stands for "YAML Ain't Markup Language," a recursive backronym that essentially says "YAML is not a markup language."

So my understanding that it is a data format rather than a markup language was correct.

[Basics] Data types that can be handled

YAML can only handle three data types:

  • Sequence
  • Map (mapping)
  • Scalar

Since data is represented using only these three types, don't you think it's relatively easy to grasp?
By combining these with various symbols to broaden the range of expression, a variety of data can be represented.

Let's take a closer look at each data type.

Sequence type

A sequence refers to an array with consecutive numerical indices.
Since there is no concept of a key, it is simply used to represent multiple values.

- abc
- def
- 123

Write each element after a hyphen followed by a space.
Converting the above YAML data to JSON will result in the following format:

["abc", "def", 123]
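The conversion above can be sketched in Python (an assumption made only for illustration; the article itself is language-neutral). `parse_flat_sequence` is a hypothetical helper, not part of any YAML library, and it understands only the flat `- value` form: each entry becomes a list element, and plain decimal integers are converted, which is why `123` comes out as a number rather than a string.

```python
import json
import re


def parse_flat_sequence(text):
    """Hypothetical helper: parse a flat YAML block sequence of scalars.

    Handles only lines of the form "- value" (no nesting, quoting, or
    comments), and only plain decimal integers are converted to int.
    """
    items = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if not line.startswith("- "):
            raise ValueError(f"expected '- ' at the start of: {line!r}")
        value = line[2:].strip()
        if re.fullmatch(r"-?(0|[1-9][0-9]*)", value):
            items.append(int(value))  # e.g. "123" -> 123
        else:
            items.append(value)  # everything else stays a string
    return items


data = parse_flat_sequence("- abc\n- def\n- 123\n")
print(json.dumps(data))  # ["abc", "def", 123]
```

For anything beyond this toy form (nesting, quoting, comments), a real YAML parser is the right tool.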

 

Map type

A map is an associative array, in which each value is indexed by a key.
Because it consists of key-value pairs, and values can be scalars, sequences, or maps, its structure can easily become complex.

aaa: test1
bbb: test2
ccc: test3
ddd:
  eee: test4
  fff: test5

In the example above, the value for the key `ddd` is itself a map, so the maps are nested.
Converting this to JSON results in the following:

{"aaa": "test1", "bbb": "test2", "ccc": "test3", "ddd": {"eee": "test4", "fff": "test5"}}

Formatted for easier reading, it looks like this:

{
  "aaa": "test1",
  "bbb": "test2",
  "ccc": "test3",
  "ddd": {
    "eee": "test4",
    "fff": "test5"
  }
}
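The same conversion for nested maps, again as a minimal Python sketch under stated assumptions: `parse_block_map` and `_parse_level` are hypothetical helpers that only understand `key: value` lines and `key:` followed by an indented child map, with every value kept as a string. What it illustrates is that nesting is decided purely by indentation: the lines indented further than a key form that key's value.

```python
import json


def parse_block_map(text):
    """Hypothetical helper: parse nested YAML block maps whose values are plain strings.

    Supports only "key: value" lines and "key:" followed by a more-indented
    child map. A child just has to be indented further than its parent key,
    and siblings must line up.
    """
    # (indentation, stripped content) for every non-blank line
    lines = [(len(line) - len(line.lstrip(" ")), line.strip())
             for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    mapping, pos = _parse_level(lines, 0, lines[0][0])
    if pos != len(lines):
        raise ValueError(f"unexpected dedent at: {lines[pos][1]!r}")
    return mapping


def _parse_level(lines, pos, indent):
    """Parse sibling keys at exactly `indent` spaces; return (dict, next position)."""
    mapping = {}
    while pos < len(lines):
        cur_indent, content = lines[pos]
        if cur_indent < indent:
            break  # dedent: this level is finished
        if cur_indent > indent:
            raise ValueError(f"unexpected indentation at: {content!r}")
        key, sep, value = content.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value' at: {content!r}")
        key, value = key.strip(), value.strip()
        pos += 1
        if value:
            mapping[key] = value
        elif pos < len(lines) and lines[pos][0] > indent:
            # A child map; its indentation is set by its first line
            mapping[key], pos = _parse_level(lines, pos, lines[pos][0])
        else:
            mapping[key] = None  # "key:" with nothing nested under it is null
    return mapping, pos


yaml_text = """\
aaa: test1
bbb: test2
ccc: test3
ddd:
  eee: test4
  fff: test5
"""
print(json.dumps(parse_block_map(yaml_text)))
# {"aaa": "test1", "bbb": "test2", "ccc": "test3", "ddd": {"eee": "test4", "fff": "test5"}}
```

Because this sketch only compares indentation levels, the child map could equally be indented four spaces and the result would be the same.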

 

Furthermore, there is also a type called an ordered mapping, which combines the sequence type mentioned above with maps so that the order can be specified.

- aaa: test1
- ccc: test2
- bbb: test3

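If the order of keys is important, this lets you fix the order as in a sequence while still assigning keys as in a map.
JSON has no dedicated type for this (a JSON object is an unordered collection of name/value pairs); the closest equivalent is an array of single-key objects.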
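In Python terms (assumed here only for illustration), a YAML loader typically reads the untagged example above as a list of single-key dicts. The sketch below writes that result out by hand, shows its JSON form, and uses a hypothetical `to_pairs` helper to recover the keys in the order they were written.

```python
import json

# What a YAML loader typically returns for the untagged ordered-mapping example:
# a sequence (list) whose entries are single-key maps (dicts).
ordered = [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": "test3"}]


def to_pairs(seq_of_maps):
    """Hypothetical helper: turn a sequence of single-key maps into (key, value) pairs."""
    pairs = []
    for entry in seq_of_maps:
        if len(entry) != 1:
            raise ValueError(f"each entry must have exactly one key: {entry!r}")
        key, value = next(iter(entry.items()))
        pairs.append((key, value))
    return pairs


# In JSON this is an array of one-key objects, so any parser keeps the order
print(json.dumps(ordered))  # [{"aaa": "test1"}, {"ccc": "test2"}, {"bbb": "test3"}]

# The pairs keep the written order (aaa, ccc, bbb), not alphabetical order
print(to_pairs(ordered))  # [('aaa', 'test1'), ('ccc', 'test2'), ('bbb', 'test3')]
```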

 

Scalar type

Finally, there are scalars.
Scalars are simply single values, such as numbers, strings, and boolean values.
We've already seen them in the sections on the map and sequence types, but what kind of values are scalars? Every value that is not a map or a sequence is a scalar.

Maps and sequences represent structure rather than kinds of data, so you could say that scalars are what define the values that actually exist.

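A YAML document may consist of just a single scalar, but to show many of them at once we will write them as elements of a sequence.

- 100 # number (decimal)
- 016 # number (octal)
- 0xAC # number (hexadecimal)
- 3.14 # number (float)
- 12.3e+4 # number (exponent)
- .inf # number (positive infinity)
- -.inf # number (negative infinity)
- true # bool
- false # bool
- on # bool
- off # bool
- yes # bool
- no # bool
- null # null
- ~ # null
- test # string
- 1980/09/02 # string (note: slashes do not make it a date)
- 1980-09-02 00:00:00.000-09:00 # datetime (ISO 8601 style)
- 1980-09-02 # date (year, month, day)
- 1980 # number (integer; a bare year is not a date)

Several of these interpretations follow YAML 1.1. Under the YAML 1.2 core schema, only `true` and `false` are booleans (so `on`, `off`, `yes`, and `no` are plain strings), octal numbers need a `0o` prefix (so `016` is the decimal 16), and timestamps are not part of the core schema.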
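To see how a loader can tell these apart even though they all look like strings, here is a minimal Python sketch of implicit type resolution. `resolve_scalar` is a hypothetical function covering only part of the YAML 1.1 rules used in the list above; it is not how any particular library is written, and the cases it leaves out are listed in its comments.

```python
import math
import re
from datetime import date

# Hypothetical resolver for a SUBSET of YAML 1.1 implicit typing (assumption:
# YAML 1.1 rules, as in the list above). Not covered: underscores in numbers,
# base-60 numbers, .nan, y/n booleans, and timestamps with a time part
# (those fall through and stay strings here).
_BOOLS = {"true": True, "yes": True, "on": True,
          "false": False, "no": False, "off": False}


def resolve_scalar(text):
    lower = text.lower()
    if text in ("", "~", "null", "Null", "NULL"):
        return None
    # Booleans may be written lowercase, Capitalized, or UPPERCASE
    if lower in _BOOLS and text in (lower, lower.capitalize(), lower.upper()):
        return _BOOLS[lower]
    m = re.fullmatch(r"([-+]?)0x([0-9a-fA-F]+)", text)
    if m:  # hexadecimal, e.g. 0xAC
        return int(m.group(1) + m.group(2), 16)
    m = re.fullmatch(r"([-+]?)0([0-7]+)", text)
    if m:  # YAML 1.1 octal: a leading 0, e.g. 016
        return int(m.group(1) + m.group(2), 8)
    if re.fullmatch(r"[-+]?(0|[1-9][0-9]*)", text):  # decimal integer
        return int(text)
    m = re.fullmatch(r"([-+]?)\.(inf|Inf|INF)", text)
    if m:  # positive or negative infinity
        return -math.inf if m.group(1) == "-" else math.inf
    if re.fullmatch(r"[-+]?([0-9]+\.[0-9]*|\.[0-9]+)([eE][-+][0-9]+)?", text):
        return float(text)  # e.g. 3.14, 12.3e+4
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if m:  # date only, e.g. 1980-09-02
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return text  # anything else is a string


for sample in ["100", "016", "0xAC", "12.3e+4", "-.inf", "on", "~",
               "test", "1980/09/02", "1980-09-02", "1980"]:
    value = resolve_scalar(sample)
    print(f"{sample:>10} -> {value!r} ({type(value).__name__})")
```

The order of the checks matters: `1980` is tested as an integer before any date pattern, which is why a bare year never becomes a date.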

 

Of these, only strings can contain multiple lines of data (text including line breaks), so YAML provides several special notations for writing multi-line text.

# Simple text with line breaks
test1: |
  You can write text with line breaks.
  You can write text with line breaks

# Replace line breaks with half-width spaces (only the final line break is kept when the data is loaded)
test2: >
  You can write text that includes line breaks.
  You can write text that includes line breaks

In either style, the half-width spaces used for indentation at the beginning of each line are removed.
With ">", the line breaks between lines are then replaced with half-width spaces.

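Additionally, appending "+" or "-" to either indicator controls the line break at the end of the last line (this is called chomping): "-" removes it, "+" keeps it together with any trailing blank lines, and with neither, exactly one final line break is kept.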

# Keep the line break at the end of the last line (plus any trailing blank lines)
test3: |+
  You can write text that includes a line break.
  You can write text that includes a line break

# Do not add a line break code to the end of the last line
test4: |-
  You can write text that includes a line break.
  You can write text that includes a line break

# Keep the line break at the end of the last line (but fold line breaks within the text into spaces)
test3: >+
  You can write text that includes line breaks.
  You can write text that includes line breaks

# Do not add a line break code to the end of the last line
test4: >-
  You can write text that includes a line break.
  You can write text that includes a line break
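The effect of `|`, `>`, and the `+`/`-` chomping indicators can be summarized in a few lines of Python. This is a simplified, hypothetical model (`block_scalar_value` is not a library function): it assumes the indentation has already been removed and that the text has no blank or extra-indented lines inside it, where YAML's folding rules become more involved.

```python
def block_scalar_value(lines, style="|", chomp="", trailing_blank_lines=0):
    """Hypothetical helper: the string a simple block scalar produces.

    lines: the text lines with their indentation already removed. Assumes at
    least one line and no blank or extra-indented lines inside the text.
    style: "|" keeps line breaks; ">" folds them into single spaces.
    chomp: "" (clip) keeps one final line break, "-" (strip) drops it,
    "+" (keep) keeps it plus one line break per trailing blank line.
    """
    if not lines:
        raise ValueError("this sketch expects at least one line of text")
    if style not in ("|", ">") or chomp not in ("", "-", "+"):
        raise ValueError("unsupported indicator")
    body = ("\n" if style == "|" else " ").join(lines)
    if chomp == "-":
        return body
    if chomp == "+":
        return body + "\n" * (1 + trailing_blank_lines)
    return body + "\n"


text = ["You can write text that includes line breaks.",
        "You can write text that includes line breaks"]
for style in ("|", ">"):
    for chomp in ("", "-", "+"):
        # Pretend one blank line follows the text, to show what "+" keeps
        print(style + chomp, repr(block_scalar_value(text, style, chomp, trailing_blank_lines=1)))
```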

 

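Depending on the text, stripping the indentation can be a problem, for example when the first line itself is supposed to start with spaces.
In such cases, you can put a digit after the indicator to state the indentation width explicitly: only that many spaces are removed from each line, and any extra leading spaces are kept as part of the text.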

# Each line keeps two leading spaces (indented 4, indicator 2)
test1: |2
    You can write text including line breaks.
    You can write text including line breaks

# Each line keeps four leading spaces (indented 8, indicator 4)
test1: |4
        You can write text including line breaks.
        You can write text including line breaks

The number of spaces at the beginning of each line must be at least the width given by the indicator. More spaces are fine (the extra ones remain in the text), but with fewer spaces the line is no longer part of the block, and the document typically fails to parse.

# Error because four or more spaces are required at the beginning of each line
test1: |4
  You can write text including line breaks.
  You can write text including line breaks

Also, according to the YAML specification, the indentation indicator is a single digit, so it can be specified in the range 1 <= n <= 9.
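Here is a minimal Python sketch of what the indentation indicator does, assuming the block scalar belongs to a top-level key (in real YAML the digit is counted relative to the parent node's indentation). `strip_block_indent` is a hypothetical helper: with an explicit digit it removes exactly that many spaces and keeps the rest, and without one it takes the width from the first non-empty line.

```python
def strip_block_indent(raw_lines, indicator=None):
    """Hypothetical helper: remove block-scalar indentation from raw lines.

    Assumes the block scalar belongs to a top-level key (parent indentation 0),
    so the indicator equals the number of spaces removed from each line.
    indicator=None imitates auto-detection from the first non-empty line.
    """
    non_empty = [line for line in raw_lines if line.strip()]
    if indicator is None:
        first = non_empty[0] if non_empty else ""
        indent = len(first) - len(first.lstrip(" "))
    elif 1 <= indicator <= 9:
        indent = indicator
    else:
        raise ValueError("the indentation indicator must be a digit from 1 to 9")
    result = []
    for line in raw_lines:
        if not line.strip():
            result.append("")  # treat blank lines as empty text lines
            continue
        leading = len(line) - len(line.lstrip(" "))
        if leading < indent:
            # A real parser would end the block scalar here, and the leftover
            # text would then usually fail to parse; this sketch just raises.
            raise ValueError(f"line indented {leading} spaces, need at least {indent}")
        result.append(line[indent:])  # extra spaces beyond `indent` are kept
    return result


text = ["    You can write text including line breaks.",
        "    You can write text including line breaks"]
print(strip_block_indent(text, 2))  # each line keeps 2 leading spaces
print(strip_block_indent(text))     # auto-detected: all 4 spaces removed

# Why the indicator exists: when the first line should itself start with spaces,
# auto-detection picks up the wrong width, but an explicit indicator works.
poem = ["    indented first line", "  second line"]
print(strip_block_indent(poem, 2))  # ['  indented first line', 'second line']
```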

 

About Nesting

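As has been mentioned several times already, elements in maps and sequences can be nested, and this nesting is represented by indentation.
YAML does not fix the width of one indentation level: any number of spaces works as long as sibling elements line up, and tab characters cannot be used for indentation. Two spaces per level is the common convention, and the examples below use it.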

# Nesting sequences
- aaa
- bbb
- ccc
-
  - ddd
  - eee
  - fff

# Nesting maps
aaa: test1
bbb: test2
ccc:
  ddd: test3
  eee: test4
  fff: test5

# Nesting sequences and maps
- aaa
- bbb
- ccc
- ddd: test1
  eee: test2
  fff: test3

# Nesting maps and sequences
aaa: test1
bbb: test2
ccc:
  - ddd
  - eee
  - fff

# Nested ordered maps (note the number of indentations)
- aaa: test1
- ccc: test2
- bbb:
    - ddd
    - eee
    - fff

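When a map or sequence is the child of an entry in an ordered map, indent it relative to the key (`bbb`), not the hyphen. The key starts two columns in, so with two-space steps the children start four spaces from the left margin. A child sequence may also sit at the same column as the key, but a child map placed there becomes a sibling key of `bbb` instead of its value.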
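For reference, these are the structures a YAML loader should produce for the nesting examples, written out by hand as Python literals and printed as JSON (Python is assumed here only for illustration; no YAML library is involved). The last two entries show the ordered-map pitfall: a child map placed at the same column as its key becomes a sibling key with a null value instead of being nested.

```python
import json

# Expected loader results for the nesting examples, written by hand as
# Python literals (no YAML library is used here).
examples = {
    "nesting sequences": ["aaa", "bbb", "ccc", ["ddd", "eee", "fff"]],
    "nesting maps": {"aaa": "test1", "bbb": "test2",
                     "ccc": {"ddd": "test3", "eee": "test4", "fff": "test5"}},
    "sequences and maps": ["aaa", "bbb", "ccc",
                           {"ddd": "test1", "eee": "test2", "fff": "test3"}],
    "maps and sequences": {"aaa": "test1", "bbb": "test2",
                           "ccc": ["ddd", "eee", "fff"]},
    "nested ordered maps": [{"aaa": "test1"}, {"ccc": "test2"},
                            {"bbb": ["ddd", "eee", "fff"]}],
    # "- bbb:" followed by "  ddd: test1" (same column as bbb): a sibling key
    "child map at 2 spaces": [{"bbb": None, "ddd": "test1"}],
    # "- bbb:" followed by "    ddd: test1" (further in than bbb): nested
    "child map at 4 spaces": [{"bbb": {"ddd": "test1"}}],
}

for name, value in examples.items():
    print(f"{name}: {json.dumps(value)}")
```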

 

If you are using YAML in Visual Studio Code

While Visual Studio Code (VSCode) offers some built-in support for YAML that makes it relatively easy to use, I feel that an extension is necessary to create an even more convenient environment.

I highly recommend the YAML extension released by Red Hat.
It's a full-fledged extension built on a language server and is very user-friendly. It also supports external schemas, so you can load predefined schemas such as the Swagger schema (commonly used in API development), the AWS CloudFormation schema, and the docker-compose.yml schema, making it even easier to use.

The YAML schema for Kubernetes is built in, so you don't need to prepare it yourself.

Summary

This time, I wrote an introductory article about YAML, including some fun facts. What did you think?
I think some of you thought YAML was difficult, and others avoided it without knowing that there are actually only three basic data types.

Scalars come in several built-in types, and although they all look like strings, you've probably noticed that YAML distinguishes between them properly. For strings, there are several convenient ways to write multi-line text, so use the examples to pick the right one for each situation.

You can freely express data structures using maps, sequences, and a combination of the two called ordered maps (I need a better name for this), and if you create clear data definitions, you're sure to end up with code that's easy to maintain.

There were more advanced techniques I couldn't cover this time, such as anchors and aliases (defining data once and referring to it from another place) and writing multiple data structures in one file, so I plan to introduce them on another occasion. Stay tuned!

That's all.


About the author

Yoichi Bandai

My main job is developing web APIs for social games, but thankfully I'm also given the opportunity to work on various other tasks, including marketing.
My image rights within Beyond are treated as CC0.