Dattle is a new data specification format. Dattle sits in the same camp as JSON and EDN formats. Data described in Dattle is human readable and self-describing, i.e. you don't need an out-of-band schema to parse the data.
A demonstration of all Dattle syntax:
{"nil" nil
"true" true
"false" false
"string" "UTF-8 \"escaping\""
["vector"] ["one" "two" true false]
{"map" "example"} {"key" "value"
"name" "value"}}
Files containing data expressed in Dattle should have the extension .dt.
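To make the syntax demonstration above concrete, here is a minimal recursive-descent parser sketch in Python. It is written against the demonstration only, not against the BNF specification, so several details are assumptions: whitespace is space, tab, CR and LF; the only string escapes are \" and \\; vectors become tuples; and maps become a hypothetical hashable dict subclass, FrozenMap, so that vectors and maps can themselves be keys, as in the demo. Duplicate map keys are silently overwritten here, which the real grammar may handle differently.

```python
WHITESPACE = " \t\r\n"  # assumption: the exact whitespace set comes from the BNF


class DattleError(ValueError):
    """Raised for input this sketch does not accept."""


class FrozenMap(dict):
    """Hypothetical hashable map so maps (and vectors of maps) can be keys."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def parse(text: str):
    """Parse exactly one Dattle value from already-decoded UTF-8 text."""
    value, pos = _parse_value(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise DattleError(f"trailing data at offset {pos}")
    return value


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def _parse_value(text, pos):
    if pos >= len(text):
        raise DattleError("unexpected end of input")
    ch = text[pos]
    if ch == '"':
        return _parse_string(text, pos + 1)
    if ch in "[{":
        items, pos = _parse_items(text, pos + 1, "]" if ch == "[" else "}")
        if ch == "[":
            return tuple(items), pos
        if len(items) % 2:
            raise DattleError("map has a key without a value")
        return FrozenMap(zip(items[::2], items[1::2])), pos
    # Literals are case sensitive: TRUE and True fall through to the error.
    for word, value in (("nil", None), ("true", True), ("false", False)):
        end = pos + len(word)
        if text.startswith(word, pos) and (end == len(text) or text[end] in WHITESPACE + "]}"):
            return value, end
    # Commas and bare numbers both end up here.
    raise DattleError(f"unexpected character {ch!r} at offset {pos}")


def _parse_items(text, pos, close):
    """Collect whitespace-separated values until the closing bracket."""
    items = []
    while True:
        pos = _skip_ws(text, pos)
        if pos < len(text) and text[pos] == close:
            return items, pos + 1
        value, pos = _parse_value(text, pos)
        items.append(value)


def _parse_string(text, pos):
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch == "\\":
            if pos + 1 >= len(text) or text[pos + 1] not in '"\\':
                raise DattleError(f"unsupported escape at offset {pos}")
            ch = text[pos + 1]  # keep the escaped character itself
            pos += 1
        chars.append(ch)
        pos += 1
    raise DattleError("unterminated string")
```

In this sketch, decoding happens before parsing, e.g. parse(open(path, encoding="utf-8").read()).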
I'm mulling over the design of a new programming language. I don't think it's a coincidence that JSON became so popular as JavaScript became so popular. There's a huge advantage to having both a data format and a language which 'click' together. Data is the more fundamental thing, so I want to start there.
The primary influence here is Extensible Data Notation (EDN), the data language of the programming language Clojure. This is another example of a data format and language pairing. Compared to JavaScript/JSON, this relationship is hardcore: almost everything in Clojure is written in EDN.
EDN gets a lot right:
No colons (:) between the keys and values. We certainly don't need the trailing comma (,).
EDN has a bunch I don't need though:
Commas are treated as whitespace, so you can write {"a", "b", "c", "d"}, which reads like four separate items but is a map of "a" to "b" and "c" to "d". That is just confusing. Optionality has little place in a data format. People like to run stringify(parse(data)) to clean data up into its proper form, so I want to reduce the number of allowable-but-not-proper forms. Dattle disallows commas outside of strings, which removes that piece of optionality.
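One way to read the stringify(parse(data)) habit is that a serializer should emit exactly one proper form. A minimal Python sketch of such a stringify follows, assuming a canonical layout of single spaces between elements with no line breaks (the post doesn't define a canonical layout) and assuming that only \ and " need escaping inside strings.

```python
def stringify(value) -> str:
    """Serialize Python data to one canonical Dattle form (sketch)."""
    if value is None:
        return "nil"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, str):
        # Assumption: only backslash and double quote need escaping.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (list, tuple)):
        # Elements separated by exactly one space; never a comma.
        return "[" + " ".join(stringify(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = (stringify(k) + " " + stringify(v) for k, v in value.items())
        return "{" + " ".join(pairs) + "}"
    # Dattle has no number type, so ints and floats are rejected.
    raise TypeError(f"no Dattle form for {type(value).__name__}")
```

Since there is one spacing and no optional separators, two documents that differ only in whitespace come out identical once parsed and re-serialized.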
Data expressed in Dattle is defined using the following elements. Dattle is case sensitive, e.g. TRUE and True are not valid replacements for true.
true and false for booleans
nil has three intended use cases:
Dattle has a BNF specification.
Some of the key decisions that make Dattle what it is.
Notably absent from the specification are numbers. Both EDN and JSON have number types. It is an understatement to say numbers are common in data formats.
However, numbers go against the design goals because they are complex, both in terms of problem space and syntax. There are hundreds of different memory representations and syntaxes for numbers to pick from. For simplicity, Dattle does not make a decision here.
Both JSON and EDN chose which numbers to support based on the influences of their languages. JSON numbers are, in practice, 64-bit floating point numbers, matching JavaScript. JavaScript has since added BigInt for integers of arbitrary size. EDN supports integers and doubles with togglable precision, rooted in Java's long and double.
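To see why that choice matters, Python's standard json module reads JSON integers as arbitrary-size ints by default, while its parse_int hook lets a caller mimic a parser that stores every number as a 64-bit double, as JavaScript's JSON.parse does. The identifier value below is only an illustration; the same JSON text yields two different numbers:

```python
import json

text = '{"id": 9007199254740993}'  # 2**53 + 1, not representable as a double

# Default: JSON integers become exact Python ints.
as_int = json.loads(text)["id"]

# parse_int receives the digit string; float() rounds it to the nearest double.
as_double = json.loads(text, parse_int=float)["id"]

print(as_int)          # 9007199254740993
print(int(as_double))  # 9007199254740992
```

Neither reading violates the JSON grammar; the format left the decision to each parser, which is the kind of choice Dattle sidesteps by having no number type.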
Mathematical operations are not performed on inert data. Numbers are a convenience in a data format, and this convenience forces formats to make a choice of:
#custom/uint8 4 such that the 4 is represented as the uint8 type, which the programming language or parser may or may not support.
Numbers can be represented using strings in Dattle: "123". This seems like the right outcome in terms of being explicit, given the complex nature of numbers.
Both producer and consumer of Dattle will need to be conscious
of the memory/string representation of their numbers:
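number = 6
// producer: choose the string form of the number
dattle = stringify({ amount: stringifyFloat(number) })
// => '{"amount" "6"}'
result = parse(dattle)
// consumer: choose how to read the string back
assert(parseFloat(result.get('amount')) === number)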
For anyone who has worked with floating point numbers or underspecified number formats before, this need to be explicit comes as a huge relief.
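A runnable variant of that producer/consumer contract in Python, leaving out the Dattle serialization itself: the producer fixes the exact text of the number that would go inside the Dattle string, and each consumer picks its own numeric type. Decimal and float are standard library; the 0.10 amount is only illustrative.

```python
from decimal import Decimal

# Producer: decides the exact text that goes inside the Dattle string.
amount = Decimal("0.10")
encoded = format(amount, "f")  # fixed-point text: "0.10"

# Consumer A wants exact decimal arithmetic.
exact_total = Decimal(encoded) + Decimal("0.20")

# Consumer B accepts binary floating point and its rounding.
float_total = float(encoded) + 0.2

print(encoded, exact_total, float_total)  # 0.10 0.30 0.30000000000000004
```

The string "0.10" is unambiguous on the wire; any rounding happens only where a consumer explicitly opts into it.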
With numbers left undecided, it seems incongruent that strings aren't also undecided. After all, strings have just as many different representations and are certainly complicated. However, strings are so complicated that for human readable formats we seem to have all arrived at the decision to use Unicode.
UTF-8 was chosen because Dattle is human readable and used in programming contexts, so the majority of strings are expected to be ASCII while still needing to support more complex characters. Otherwise, UTF-16 would also have been fine.
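The size argument behind that expectation can be checked directly. This Python sketch counts bytes for a few sample strings (chosen only for illustration) in UTF-8 and in UTF-16 little-endian, using the "utf-16-le" codec so no byte-order mark is added:

```python
samples = ["amount", "café", "日本"]

# (UTF-8 bytes, UTF-16 bytes) for each sample string.
sizes = {s: (len(s.encode("utf-8")), len(s.encode("utf-16-le"))) for s in samples}

for s, (utf8, utf16) in sizes.items():
    print(s, utf8, utf16)
# amount 6 12
# café 5 8
# 日本 6 4
```

ASCII-heavy text is half the size in UTF-8, and UTF-16 only wins for text dominated by characters such as CJK, which matches the reasoning above.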
Another missing feature is the ability to write comments in Dattle. They aren't supported in the spirit of reducing the number of allowable-but-not-proper forms. Comments are neither composable nor data. Having comments would lead people to develop half-baked extension systems inside their Dattle files.
Creating a "Dattle but with comments" would be trivial given how limited the grammar is. No one can stop others from tacking comments onto their data format, so let's specify it here to avoid fragmentation.
Dattle with comments uses the file extension .dtc and the following syntax:
# <comment> #
# <multi-line-
<comment> #
# <comment with a hash \# in it> #
The hash # is chosen for consistency with the other single-character delimiters that contain strings, vectors, and maps:
"abcd"
["a" "b" "c" "d"]
{"a" "b" "c" "d"}
# abcd #
The spaces between the comment text and the hashes are required. We want to discourage extension systems built on comments, and # marker # deters that urge more than #marker# does.
We also don't support unclosed comments like # comment <end-of-line>, as the Dattle grammar has no notion of end-of-line itself. End-of-line behaviors get weird when parsing streams of data, since the parser then also needs to factor them in.
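A .dtc file can be turned back into plain Dattle by removing comments before parsing. Below is a minimal Python sketch of that pre-pass; strip_comments and CommentError are hypothetical names. Beyond the rules above, it assumes that whitespace is space, tab, CR and LF, that a removed comment becomes a single space so neighbouring values stay separated, that strings are copied verbatim (a # inside a string is data), and that a backslash escapes the next character in both strings and comments.

```python
WS = " \t\r\n"  # assumption: same whitespace set as plain Dattle


class CommentError(ValueError):
    """Raised when the .dtc comment rules are broken."""


def _scan_to(text: str, start: int, stop: str) -> int:
    """Index of the next unescaped `stop` at or after `start`, or -1."""
    i = start
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character, e.g. \" or \#
        elif text[i] == stop:
            return i
        else:
            i += 1
    return -1


def strip_comments(text: str) -> str:
    """Turn .dtc text into .dt text by replacing each '# ... #' with a space."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            end = _scan_to(text, i + 1, '"')
            if end == -1:
                raise CommentError(f"unterminated string at offset {i}")
            out.append(text[i:end + 1])  # copied verbatim; '#' here is data
            i = end + 1
        elif ch == "#":
            # Opening hash must be followed by whitespace: '#marker#' is rejected.
            if i + 1 >= len(text) or text[i + 1] not in WS:
                raise CommentError(f"'#' at offset {i} must be followed by whitespace")
            end = _scan_to(text, i + 1, "#")
            if end == -1:
                # No end-of-line fallback: an unclosed comment is an error.
                raise CommentError(f"unclosed comment at offset {i}")
            if text[end - 1] not in WS:
                raise CommentError(f"closing '#' at offset {end} must follow whitespace")
            out.append(" ")
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Keeping this as a separate pass means the plain Dattle parser never has to know comments exist.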
Dattle was chosen because it:
.dt.
Speaking of brand optics, we also need a logo, so here we go:
This logo was chosen because:
And with that, Dattle has arrived.