Minor format spec rewording

Nathan McRae 2024-02-15 20:27:35 -08:00
parent a4ac31992d
commit 6227b63aed

@@ -16,7 +16,7 @@ Empty fields (i.e. two subsequent '\t' characters) are allowed.
 The first line is always the header and the fields of the header are the column names for the file. Column names must be unique within the file and must not contain ':' characters (for compatibility with [Typed TSVs](#typed-tsv)).
-All lines in the file must have the same number of fields.
+All lines in the file must have the same number of fields as are in the header.
 The file must not end with '\n'. That will be treated as if there is an empty row at the end of a file and cause an error.
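The structural rules in this hunk (unique column names without ':', every line matching the header's field count, no trailing '\n') can be sketched as a small check. This is a minimal illustration, not the project's parser: the function name `check_tsv_shape` is hypothetical, and the sketch assumes escape sequences can be ignored, so it splits on raw tab and newline characters only.

```python
def check_tsv_shape(text: str) -> list[list[str]]:
    """Split raw TSV text into rows and enforce the structural rules quoted above.

    Assumption: escape handling is out of scope, so fields are split on raw
    '\t' characters and lines on raw '\n' characters.
    """
    # A trailing newline would read as an extra, empty row, which the spec rejects.
    if text.endswith("\n"):
        raise ValueError("file must not end with a newline")

    rows = [line.split("\t") for line in text.split("\n")]
    header = rows[0]

    # Column names must be unique and must not contain ':'.
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique")
    for name in header:
        if ":" in name:
            raise ValueError(f"column name {name!r} must not contain ':'")

    # Every line must have as many fields as the header.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number} has {len(row)} fields, header has {len(header)}"
            )
    return rows
```

Empty fields pass through unchanged, so a final row of `"\t"` is accepted as two empty fields, while a final `"\n"` is rejected.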
@@ -53,13 +53,17 @@ Aside from the 'binary' column type, all fields must be UTF-8 encoded text. Each
 - 'uint32' and 'uint64' are unsigned 32 and 64 bit integers respectively. They should be formatted like this regex: `[1-9][0-9]*`
 - 'int32' and 'int64' are signed 32 and 64 bit integers respectively. They should be formatted like this regex: `-?[1-9][0-9]*` (except that '-0' is not allowed)
+Binary fields are left as-is (after unescaping is performed).
 Typed TSV files should have the .ytsv extension (.ttsv is already used).
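As a rough illustration of the integer column types above, the following sketch validates a field against the quoted patterns and then checks the bit width. The names `parse_integer_field` and `BOUNDS` are hypothetical, and the range check is an assumption inferred from "32 and 64 bit"; the excerpt does not say how out-of-range values are handled. Taken literally, neither pattern matches a bare `0`, and the sketch follows the patterns as written rather than guessing at the intended handling of zero.

```python
import re

# Patterns copied from the spec text.
UNSIGNED = re.compile(r"[1-9][0-9]*")
SIGNED = re.compile(r"-?[1-9][0-9]*")

# Assumed value ranges for each column type (two's-complement widths).
BOUNDS = {
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def parse_integer_field(field: str, column_type: str) -> int:
    """Parse one integer field, rejecting anything the quoted regex does not match."""
    low, high = BOUNDS[column_type]
    pattern = UNSIGNED if column_type.startswith("u") else SIGNED
    # fullmatch ensures the whole field matches, not just a prefix.
    if pattern.fullmatch(field) is None:
        raise ValueError(f"{field!r} is not a valid {column_type} literal")
    value = int(field)
    if not low <= value <= high:
        raise ValueError(f"{field!r} is out of range for {column_type}")
    return value
```

Leading zeros, a sign on an unsigned column, and '-0' are all rejected by the patterns themselves; only the overflow case needs the separate range check.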
 # Commented TSV
-Commented TSV builds on Typed TSV and allows for more flexibility in the format by including line comments. They are kept distinct so that some applications of it can take advantage of the extra flexibility, while others can stick with the more restricted Typed TSV format.
+Commented TSV builds on Typed TSV and allows for more flexibility in the format by including line comments. The formats are kept distinct so that some applications can take advantage of the extra flexibility comments allow, while others can stick with the more restricted Typed TSV format.
-Commented lines start with a '#' character at the beginning of the line. Unescaped '#' characters are not allowed on a line that does not start with a '#'. Any '#' characters in fields must be escaped.
+Commented lines start with a '#' character at the beginning of the line. Unescaped '#' characters are not allowed on a line that does not start with a '#'. Any '#' characters in fields must be escaped. Note that the '#' character is excluded from the comment data.
+Multiple consecutive comment lines are considered a single comment, with each line separated by a '\n'.
 Comments must be UTF-8 encoded text.
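The comment rules added in this hunk (drop the leading '#', merge consecutive comment lines with '\n') can be sketched as follows. The function name `group_comments` and the `("comment", ...)` / `("record", ...)` tuple shape are hypothetical choices for illustration; the sketch assumes the input has already been split into lines and does not handle escaped '#' characters inside fields.

```python
def group_comments(lines: list[str]) -> list[tuple[str, str]]:
    """Tag each line as a record or fold it into a multi-line comment.

    Consecutive lines starting with '#' become one ("comment", text) item,
    with the '#' removed and the lines joined by '\n'.
    """
    items: list[tuple[str, str]] = []
    pending: list[str] = []  # comment lines collected since the last record

    for line in lines:
        if line.startswith("#"):
            pending.append(line[1:])  # the '#' is excluded from the comment data
            continue
        if pending:
            items.append(("comment", "\n".join(pending)))
            pending = []
        items.append(("record", line))

    # Flush a comment block that ends the input.
    if pending:
        items.append(("comment", "\n".join(pending)))
    return items
```

Only the '#' itself is removed, so a line such as `# note` contributes ` note` with its leading space intact; whether that space should be trimmed is not stated in the excerpt.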
@@ -82,7 +86,7 @@ Note that extended formats must remain parseable by baseline parsers, hence we m
 Extending formats may also have restrictions. For example, they could disallow record comments and only allow the file comment above the header.
-Extended formats may still use the .ctsv extension, though they could use a dedicated one as well.
+Extended formats may still use the .ctsv extension, though they could use a dedicated one instead.
 ## Ideas for Extension