Re: [community-group] Split units and values in type definitions (#121)

Before going deeper down the rabbit hole, I want to enthusiastically agree that splitting numbers and units would make the spec simpler and make it easier for parsers to rely on languages' built-in types (like numbers and strings).

On the other hand, I think we can agree that `"$value": "100ms"` is easier to type than `"$value": { "number": "100", "unit": "ms" }`.
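
To make the tradeoff concrete, here's a rough sketch of the same duration token written both ways. The `number` and `unit` keys are just the illustrative ones from this thread, not settled spec, and the variable names are made up for the example:

```js
// The same hypothetical duration token, authored in each shape.
const combined = { "$value": "100ms" };
const split = { "$value": { "number": "100", "unit": "ms" } };

// Split form: a consumer reads the fields directly, no string parsing needed.
const { number, unit } = split.$value;

// Combined form: less to type, but a consumer has to parse the string
// before it can tell the number apart from the unit.
const combinedLength = JSON.stringify(combined.$value).length;
const splitLength = JSON.stringify(split.$value).length;

console.log({ number, unit, combinedLength, splitLength });
```

The split form moves the cost from the parser to the person typing the file, which is the tension the two questions below try to resolve.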

So, given that, along with the stated goal of making token files easy for people to edit, the two questions I'm trying to answer are:
1. Can we come up with a clear specification (I think that's what we're calling a microformat) for the possible value of `$value` in a way that allows someone to write a parser that correctly separates units and numbers?
2. Is the complexity of implementing that spec/microformat worth the benefit gained by being easier to write?

Obviously my opinion on both of these is "yes," but as we go through @romainmenke's excellent counterpoints I'm having doubts. Still, I think it's worth continuing to iterate to see if we can get to clarity around the spec's currently-implied microformat before moving to splitting units and numbers.

Ok, now down the rabbit hole.

---

> What if you want to express a string value that starts with #?

Yes, good point. It might be productive to limit our discussion to types like `dimension` and `duration`, where separating units from numbers is important. The need for clarity around string values is a topic that has come up a lot, and warrants a separate topic.

> Negative numbers?

Another good point: my microformat proposal didn't account for negative numbers. I'll iterate at the end of this comment.

> What if a string literal starts with a number?
> "10 horses"
> Is this a parse error or a string value?

I think this example is a little too hypothetical, but it would be reasonable to understand `"10 horses"` as a number of `10` and a unit of `horses` (say, if you needed to convert this to another unit). The microformat doesn't account for a space between the number and the unit, but it should.

If you want `"10 horses"` to be a string (not a number + unit), then you would need something like a `string` type, which again we haven't fully addressed and warrants its own topic.

> How do I express string literals that contain only numbers?

That would be something like a (currently undefined) `string` type, not a `dimension` or a `duration`.

> Parsing becomes much easier if you already know that something is a dimension and must not be something else.

Agree. I think we're discussing a microformat for any tokens that have numbers+units, not for every single type/token.

> If any program produces numbers with scientific notation it forces people to convert them first before using them in design token files.

I agree that converting numbers out of scientific notation is an extra step. I think we have to consider tradeoffs here: how often is someone copying and pasting numbers into a token file from a tool that produces scientific notation? My estimation is that it is pretty rare. If we are making files harder to write and/or harder to parse to accommodate this workflow, is it worth it?
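
For a sense of what that extra step could look like, here's a minimal sketch of a helper that expands scientific notation into a plain decimal string before it goes into a token file. `expandScientific` is a hypothetical name, not part of any tool or spec, and it works on the digits as text so it doesn't pick up floating-point noise:

```js
// Hypothetical helper: rewrite "1.5e-3" as "0.0015" using string manipulation only.
// Input that is not in scientific notation is returned unchanged.
function expandScientific(str) {
  const m = /^([-+]?)(\d+)(?:\.(\d+))?[eE]([-+]?\d+)$/.exec(str);
  if (!m) return str;

  const [, sign, intPart, fracPart = '', expStr] = m;
  let digits = intPart + fracPart;
  // Where the decimal point sits within `digits` after applying the exponent.
  let pointIndex = intPart.length + Number(expStr);

  if (pointIndex <= 0) {
    // Point moves left past the first digit: pad with leading zeros.
    digits = '0'.repeat(1 - pointIndex) + digits;
    pointIndex = 1;
  } else if (pointIndex > digits.length) {
    // Point moves right past the last digit: pad with trailing zeros.
    digits = digits + '0'.repeat(pointIndex - digits.length);
  }

  const whole = digits.slice(0, pointIndex).replace(/^0+(?=\d)/, '');
  const frac = digits.slice(pointIndex).replace(/0+$/, '');
  return (sign === '-' ? '-' : '') + whole + (frac ? '.' + frac : '');
}

console.log(expandScientific('123e3') + 'px');
```

So someone pasting `123e3` from another tool would run it through something like this once and write `"123000px"` into the file.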

---

Here's an updated proposal for the format based on these ideas:

The `$value` of a token that has units will always take the following format:

1. MAY include a leading `-` or `+`, then
2. MUST include at least one digit, then
3. MAY include a decimal point
    a. if a decimal point is present, MUST include at least one digit after it
4. MAY include a single space, then
5. MUST end with at least one letter

Here's a reference implementation:

```js
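function parse(str) {
  // Optional sign, one or more digits, optional ".digits",
  // optional whitespace character, then one or more letters.
  // The number group is not repeated, so input like "1.2.3px" is rejected.
  const regex = /^([-+]?\d+(?:\.\d+)?)\s?([A-Za-z]+)$/;
  const matches = str.match(regex);

  if (!matches) {
    // Throw rather than return, so callers cannot mistake the Error for a result.
    throw new Error('Input string does not follow the format');
  }

  const number = matches[1];
  const unit = matches[2];

  return { number, unit };
}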
```

`parse("100ms")` results in `{ "number": "100", "unit": "ms" }`.
`parse("10 horses")` results in `{ "number": "10", "unit": "horses" }`.
`parse("-10.23px")` results in `{ "number": "-10.23", "unit": "px" }`.

`parse("123e3px")` results in an error.
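
The reference implementation hands `number` back as a string. Here's a rough sketch of how a consumer might go one step further and get a built-in number out of it, for example to normalize a `duration` to milliseconds. `parseDimension`, `toMilliseconds` and the `MS_PER_UNIT` table are hypothetical names for illustration, and the regex is my reading of the five rules, inlined so the snippet stands on its own:

```js
// Assumed grammar: optional sign, digits, optional ".digits", optional space, letters.
const DIMENSION = /^([-+]?\d+(?:\.\d+)?)\s?([A-Za-z]+)$/;

function parseDimension(str) {
  const matches = DIMENSION.exec(str);
  if (!matches) {
    throw new Error(`"${str}" does not follow the format`);
  }
  // Number() is safe here: the regex has already rejected anything
  // that is not a plain signed decimal.
  return { number: Number(matches[1]), unit: matches[2] };
}

// Illustrative conversion table only; the set of allowed units is not decided here.
const MS_PER_UNIT = new Map([
  ['ms', 1],
  ['s', 1000],
]);

function toMilliseconds(str) {
  const { number, unit } = parseDimension(str);
  if (!MS_PER_UNIT.has(unit)) {
    throw new Error(`"${unit}" is not a known duration unit`);
  }
  return number * MS_PER_UNIT.get(unit);
}

console.log(toMilliseconds('1.5s'));
```

This is the "if you needed to convert this to another unit" case from earlier: once the string is split, the conversion itself is a lookup and a multiplication.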

---

@romainmenke I appreciate you continuing to push on this and having an exacting attention to detail and edge cases.


-- 
GitHub Notification of comment by ilikescience
Please view or discuss this issue at https://github.com/design-tokens/community-group/issues/121#issuecomment-1345645090 using your GitHub account



Received on Sunday, 11 December 2022 20:09:37 UTC