Parse, Don't Validate – In a Language That Doesn't Want You To

(cekrem.github.io)

119 points | by fagnerbrack 8 hours ago

94 comments

Xenoamorphous 5 hours ago
I don't like zod. I want to define my types, not write schemas. And I don't like that then I have to use the types derived from those schemas rather than types I've defined myself directly.
So I just define my types and then use typescript-json-schema or similar to build a JSON Schema at build time (i.e. from an npm script), which I then use to validate input with ajv.
The only thing I do on top of that is to use annotations like "@minimum 0" (or, in the email example, "@format email") where the base types are not enough, but those simply go inside comments.
So the compiled package only has ajv as a runtime dependency (which you're likely to have anyway, as it's everywhere). You're just defining regular types with some annotations on top and using a dev dependency to build the JSON Schema for you. And as popular as zod is, I think JSON Schema is more of a standard and likely to stay with us longer.
I also reference those generated JSON Schemas from my OpenAPI definition, as a bonus.
- adam_beck 4 hours ago
  While I would much prefer to only write Typescript types, this would drive me insane:
  > The only thing I do on top of that is to use annotations like "@minimum 0" (or, in the email example, "@format email") where the base types are not enough, but those simply go inside comments.
  - Xenoamorphous 4 hours ago
    Obviously it's not ideal, but IMO it's the better option. Much better than `z.number().integer().min(0)` or whatever zod equivalent there is and then have to deal with the inferred types which among other things tend to suck for IntelliSense etc. Those annotations map directly to JSON Schema attributes.
    - HeyImAlex 3 hours ago
      You can generate TypeScript types ahead of time (AOT) from zod schemas if IntelliSense is your main complaint. I think it generally makes more sense to transform from more expressive => less expressive, and zod is more expressive than TypeScript (as evidenced by your need to add doc comment annotations to get similar behavior going the other way).
- gabes an hour ago
  I think this misses what is undeniably the best part about zod. Yes, you could define all of your types this way, but it’s only necessary at the boundary of the program. Internal functions don’t need to validate inputs if the caller is trusted.
  Being able to define a loose input schema at the boundary and then transform it into a shape that your program actually needs is extremely useful.
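  A minimal, dependency-free sketch of that idea (the `Signup` type and `parseSignup` helper are hypothetical, and the loose inputs it tolerates are assumptions chosen for illustration): the boundary accepts an `unknown` payload where `age` may be a number or a numeric string and `tags` a comma-separated string or an array, and returns the stricter shape the rest of the program relies on.

```typescript
// Strict internal shape: everything past the boundary can rely on it.
type Signup = {
  readonly email: string;
  readonly age: number; // always a non-negative integer
  readonly tags: readonly string[];
};

// Parse an untrusted payload once. Throws on anything that cannot become a
// valid Signup, so internal functions never need to re-check.
function parseSignup(input: unknown): Signup {
  if (typeof input !== "object" || input === null) {
    throw new Error("payload must be an object");
  }
  // Reading fields of an arbitrary object: every value is still `unknown`.
  const { email, age, tags } = input as Record<string, unknown>;

  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("email must be a string containing '@'");
  }

  // Loose: accept 42 or "42". Strict: store a non-negative integer.
  const ageNum = typeof age === "string" && age.trim() !== "" ? Number(age) : age;
  if (typeof ageNum !== "number" || !Number.isInteger(ageNum) || ageNum < 0) {
    throw new Error("age must be a non-negative integer");
  }

  // Loose: accept "a, b" or ["a", "b"] or nothing. Strict: trimmed, non-empty strings.
  let tagList: string[];
  if (tags === undefined) {
    tagList = [];
  } else if (typeof tags === "string") {
    tagList = tags.split(",");
  } else if (Array.isArray(tags) && tags.every((t) => typeof t === "string")) {
    tagList = tags;
  } else {
    throw new Error("tags must be a string or an array of strings");
  }

  return {
    email: email.trim().toLowerCase(),
    age: ageNum,
    tags: tagList.map((t) => t.trim()).filter((t) => t.length > 0),
  };
}
```

  Everything past `parseSignup` can take a `Signup` and skip re-checking; a zod schema using `.transform()` would express the same loose-to-strict step declaratively.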
- sheept 4 hours ago
  What you're doing is essentially what Zod is designed to avoid. If you tolerate needing a separate build step more than having to define types with Zod's syntax, then it makes sense not to use Zod since it's not made for you.
  - Xenoamorphous 3 hours ago
    To me the build step is a good thing. It's a simple npm script, and it means I only keep what I need at runtime (the JSON Schema, which I don't need at dev time), while whatever package generates those schemas from TS types can remain a dev dependency.
    zod can't be a dev only dependency, and you have to deal with breaking changes and maybe switching to a completely different library in a few years (joi, with a syntax very similar to zod's, was very popular a while ago too).
    - Lvl999Noob 3 hours ago
      How would you use zod as a dev-only dependency? The whole point there is to be a parser. If you remove it from runtime then you are not parsing the production payloads, which is the exact place where you do need to parse them.
- botfriendsarent 3 hours ago
  This is what I do too, not just in JS but in other languages as well. But I write the schemas, and I don't use TS. I'm glad I'm not the only one. The OP's post gave me a serious headache trying to read it.
  Parse and Validate are not binary choices and have nothing to do with each other. Both are useful when applied correctly to a given situation.
  I felt punked by most of it. I don't see what programming languages have to do with it either. Look at Swift, a language that can only barely parse JSON. Who cares?
- BiteCode_dev 4 hours ago
  For all its faults, this is one of the things the Python typing system gets right. It's dynamically introspectable at runtime, so you can define type, parsing and validation in one go with stuff like pydantic.
Altern4tiveAcc 6 hours ago
Zod is by far the most ergonomic way to express those ideas in TypeScript these days. I miss it when writing code in other languages.
The friction with the rest of the ecosystem is real, though. Most code out there expects you to handle errors with exceptions.
I get the impression that polymorphic return types could get in the way of JSC/V8/SpiderMonkey's JIT, but I haven't measured it and I'm not sure of the actual impact on hot and cold paths. Same for all the allocations caused by custom Option<T>/Result<T,E> implementations.
I think using Zod at the edge (with branded types and whatnot), while keeping return types as T/Promise<T> to keep a sane relationship with the ecosystem is a good middle ground.
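A rough, zod-free sketch of that middle ground, assuming a hypothetical `Email` brand and `parseEmail` helper (zod's `.brand()` can play a similar role): the edge throws, as the rest of the ecosystem expects, and everything behind it takes the branded type and returns plain `T` / `Promise<T>`.

```typescript
// Brand: structurally still a string, but only parseEmail hands one out.
type Email = string & { readonly __brand: "Email" };

// Edge parser: throws, like most of the ecosystem expects, instead of returning a Result.
function parseEmail(input: unknown): Email {
  if (typeof input !== "string") throw new TypeError("email must be a string");
  const trimmed = input.trim();
  // Deliberately loose shape check; real address validation is more involved.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(trimmed)) {
    throw new TypeError(`not an email address: ${trimmed}`);
  }
  return trimmed as Email; // the single place the cast is allowed
}

// Internal code: no re-validation and an ordinary Promise<T> return type.
async function sendWelcome(to: Email): Promise<string> {
  return `queued welcome mail for ${to}`;
}

// Request handler at the edge: parse first, then hand the branded value inward.
async function handleRequest(body: { email?: unknown }): Promise<string> {
  const email = parseEmail(body.email); // throws (rejects) on bad input
  return sendWelcome(email);
}

// sendWelcome("someone@example.com"); // compile error: string is not an Email
```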
- jerf 5 hours ago
  I haven't done a lot of Typescript, but I've done at least a couple of months' worth now, and every time I have to type "as" my inner Haskell programmer screams.
  If I could add one feature to Typescript it would be something like "as" that actually validates the result against the type system and can fail. Unfortunately, that's way, way easier said than done. It's the bad type of keyword that has unbounded runtime cost because it would have to be a runtime comparison, and there are a lot of design questions about how to write it. However, I still petulantly want it even though I can hardly define it. "zod" is pretty good but you can see how trying to add that as a "keyword" is nightmare fuel for a language-level change.
  - WorldMaker 38 minutes ago
    In addition to exploring `satisfies` as a better `as` (compile checks, that don't assert), you may also be looking for `is` aka Type Guards (runtime checks that assert types).
    If you have a validator function `function isSomeSpecificType(obj: unknown): boolean` you make it a Type Guard by changing the return type to be the type assertion you need: `function isSomeSpecificType(obj: unknown): obj is SomeSpecificType`. Typescript's narrowing is pretty good about using Type Guards to good advantage.
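    A small sketch of that pattern with made-up names (`Point`, `isPoint`, `formatPoint`). Inside the `if`, TypeScript narrows `unknown` to `Point` with no `as` at the call site; the `"x" in obj` narrowing the guard relies on assumes TypeScript 4.9 or later.

```typescript
type Point = { x: number; y: number };

// Type guard: a runtime check whose return type tells the compiler what it proved.
function isPoint(obj: unknown): obj is Point {
  return (
    typeof obj === "object" &&
    obj !== null &&
    "x" in obj &&
    "y" in obj &&
    typeof obj.x === "number" &&
    typeof obj.y === "number"
  );
}

function formatPoint(value: unknown): string {
  if (isPoint(value)) {
    // Narrowed to Point here, so these property reads are type-checked.
    return `(${value.x}, ${value.y})`;
  }
  return "not a point";
}
```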
  - SebastianKra an hour ago
    I wonder what you're doing that you need to type it so often. I almost never use it in application code (outside of tests and generic utilities).
    There are some techniques that aren't immediately obvious. Look into...
    - type guards
    - pushing constraints up: `function print(i: Invoice & { issueDate: string })` is better than `assert(i.issueDate)`
    - discriminated unions
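    The "pushing constraints up" bullet, sketched with a hypothetical `Invoice` type: the signature asks for an invoice whose `issueDate` is known to be present, and the caller proves that once (here via a type guard passed to `filter`) instead of the function asserting it internally.

```typescript
type Invoice = { id: string; total: number; issueDate?: string };
type IssuedInvoice = Invoice & { issueDate: string };

// The requirement lives in the signature rather than in an assert() in the body.
function formatInvoice(i: IssuedInvoice): string {
  return `${i.id}: ${i.total.toFixed(2)} issued ${i.issueDate}`;
}

// The caller discharges the requirement once, where it is actually known.
function isIssued(i: Invoice): i is IssuedInvoice {
  return typeof i.issueDate === "string";
}

const drafts: Invoice[] = [
  { id: "A-1", total: 10, issueDate: "2024-01-31" },
  { id: "A-2", total: 5 },
];

// filter with a type guard yields IssuedInvoice[], so map(formatInvoice) type-checks.
const issuedLines = drafts.filter(isIssued).map(formatInvoice);
// formatInvoice(drafts[1]); // compile error: issueDate might be undefined
```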
  - chrisfarms an hour ago
    The "satisfies" keyword may be what you are missing. Most of the cases where I used to need "as" (usually weirdly permissive types from some lib) can be nudged into line with a "satisfies".
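    A short illustration of the difference with made-up `Route` data (`satisfies` needs TypeScript 4.9 or later): `as` is an assertion the compiler takes on trust, while `satisfies` checks the value against the type and keeps the value's own, more precise inferred type.

```typescript
type Route = { path: string; auth: boolean };

// `as` asserts: this compiles even though `auth` is missing.
const brokenRoute = { path: "/oops" } as Route;

// `satisfies` checks: the same literal is rejected.
// const rejected = { path: "/oops" } satisfies Route; // error: 'auth' is missing

// satisfies also keeps the literal's own inferred type, so `routes.home` is
// known to exist and a typo such as `routes.hmoe` is a compile error.
const routes = {
  home: { path: "/", auth: false },
  account: { path: "/account", auth: true },
} satisfies Record<string, Route>;

const homePath: string = routes.home.path;
```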
  - dolia 4 hours ago
    "every time I have to type "as" my inner Haskell programmer screams." - most of the time you don't have to. You choose to.
    "If I could add one feature to Typescript it would be something like "as" that actually validates the result against the type system and can fail." - I don't think it's fair to expect that, since most statically typed languages will not guarantee things at runtime unless you specifically run validation code at runtime.
    There's also type guards and good old self-written validation functions you can use.
    - jerf 3 hours ago
      "I don't think it's fair to expect that since most of the statically typed languages will not guarantee things in runtime unless you specifically run a validation code in runtime."
      If I have a value of type X in a static language, then I know that it absolutely conforms to the layout of type X. It isn't even that we have to provide a "validation function", it is that it is quite literally physically impossible for my value to not conform to the definition of type X, in the languages like C or Go or Rust where the type layout is actually a specification of a layout of data in memory. If I have JSON '{"a": 1}', there is no way whatsoever to stick that in a "struct { A string }", because it physically doesn't fit. By "physically", I mean, in RAM, in the physical cells and voltages. There's no way to validate that a "struct { A int }" 'really contains an int' because there is no way for it to be anything else.
      Typescript specifically has these issues because all of its objects boil down to a Javascript object with certain keys, and all of the values are ultimately of type "any" no matter what Typescript tries to lay on top of it. If I have this sort of data come in to a static language, I have to have a step where it very deliberately converts it down to the static representation. There isn't an equivalent of "as". Modulo unsafe, but we don't count unsafe in these sorts of discussions.
      I am absolutely guaranteed in a static language that if my struct says field "A" is an int, it absolutely, positively is, always has been, and always will be.
      The main problem I encounter with "as" is when I have external data coming in. For that I have zod and validation functions. What prompted my post is my experience yesterday where I corrected an AI that was using "as" (which it used because a lot of its training data does this) and had it call an actual validation function that I happened to already have before it cast it into the type. But the reason my Haskell programmer screams inside is that a validation function can still be wrong, because the compiler isn't helping me. In a static language, I can guarantee that if I have "I dunno, some JSON value" on one side and a struct comes out the other end with some value derived from it, I have absolutely had some bit of code check that and pack it into the static value in a way that the compiler has helped check. I can then compose these promises quite reliably through further type specification in my static type.
      A validation function can still have bugs that a static language compiler would have strictly validated at compile time. It's better, but it's still a manifestation of the quite accurate criticism that dynamic languages end up trading away all the convenience of not specifying types for vast swathes of validation code, plus tests of that validation. In Typescript, I can mostly sorta kinda compose them together, but it takes a lot more features and grease and effort. I appreciate Typescript in its capacity for taming Javascript and prefer it substantially over Javascript alone; it's probably the best thing of its type that we could hope for. But if I consider it as a language that stands alone, I really really dislike it.
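      Today's closest approximation to a checked `as` is probably an assertion function (`asserts value is T`), sketched below with a hypothetical `assertIs` helper. It does not remove the concern above: the guard passed in is ordinary code the compiler trusts, so a buggy guard still produces a wrongly typed value.

```typescript
// A type guard's `value is T` claim is trusted by the compiler, so it must be right.
type Guard<T> = (value: unknown) => value is T;

// Assertion function: behaves like a checked `as`, throwing on mismatch at runtime.
function assertIs<T>(value: unknown, guard: Guard<T>, label: string): asserts value is T {
  if (!guard(value)) {
    throw new TypeError(`expected ${label}, got ${JSON.stringify(value)}`);
  }
}

type User = { id: number; name: string };

const isUser: Guard<User> = (v): v is User =>
  typeof v === "object" &&
  v !== null &&
  "id" in v &&
  "name" in v &&
  typeof v.id === "number" &&
  typeof v.name === "string";

function loadUser(json: string): User {
  const data: unknown = JSON.parse(json);
  assertIs(data, isUser, "User"); // instead of `JSON.parse(json) as User`
  return data; // narrowed to User by the assertion above
}
```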
- zahlman 3 hours ago
  > Most code out there expects you to handle errors with exceptions.
  Because you have to build the Option/Result/whatever system yourself, and propagating and unwrapping isn't fun.
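  To show what building it yourself looks like, a minimal hand-rolled `Result` as a discriminated union (the names are illustrative, not from a library). Without something like Rust's `?` operator, every caller checks `ok` and forwards the error by hand, which is the unfun propagation part.

```typescript
// A minimal Result: success and failure as a discriminated union on `ok`.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

function parsePort(raw: string): Result<number, string> {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 && n < 65536 ? ok(n) : err(`bad port: ${raw}`);
}

// Manual propagation: each step checks `ok` and forwards the failure itself.
function parseHostPort(raw: string): Result<{ host: string; port: number }, string> {
  const idx = raw.lastIndexOf(":");
  if (idx <= 0) return err(`missing host or port: ${raw}`);
  const port = parsePort(raw.slice(idx + 1));
  if (!port.ok) return port; // what `?` would do for you in Rust
  return ok({ host: raw.slice(0, idx), port: port.value });
}
```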
- IshKebab 5 hours ago
  > I miss it when writing code in other languages.
  You can use Pydantic in Python and serde_derive in Rust. I assume most languages have a thing like that.
  - Altern4tiveAcc 34 minutes ago
    > I assume most languages have a thing like that.
    You're not wrong, I assume. My problem is specifically with the remaining languages without anything like that. :')
ramon156 6 hours ago
The author found out about the square-peg-in-a-round-hole situation with TS. Functions can implicitly error, and no enforced annotation tells you that they might. FP solves this with Result/Option, but that doesn't fit in TS. Effect is there to find a solution but will fail.
Zod is the acceptable middleground in my opinion. Zod will allow you to throw a schema against an object and it'll tell you "yes the result fits your schema". This is fine for most projects.
If you want to go zero-dependency, you can see how far you can get with TS's type system. Branded types are kinda cool. NewTypes are also cool, but also high maintenance. Unless you're building a library that millions depend on, it's probably not worth it.
- whilenot-dev 5 hours ago
  FYI branded types and newtypes are kind of the same thing, branded types just use a unique symbol that's expressed explicitly.
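  For reference, one common way to spell a brand with a `unique symbol` (the `Brand`, `UserId`, and `OrderId` names are hypothetical). Two brands over the same `number` stop being interchangeable at compile time, while at runtime the values remain plain numbers.

```typescript
// `declare` emits nothing: the symbol exists only at the type level.
declare const brand: unique symbol;
type Brand<T, B extends string> = T & { readonly [brand]: B };

type UserId = Brand<number, "UserId">;
type OrderId = Brand<number, "OrderId">;

// Constructors are the only places that cast, so validation lives there.
function userId(n: number): UserId {
  if (!Number.isInteger(n) || n <= 0) throw new RangeError(`invalid user id: ${n}`);
  return n as UserId;
}
function orderId(n: number): OrderId {
  return n as OrderId;
}

function cancelOrder(id: OrderId): string {
  return `cancelled order ${id}`;
}

const u = userId(7);
const o = orderId(7);
// cancelOrder(u); // compile error: UserId is not assignable to OrderId
```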
- epolanski 6 hours ago
  > Effect is there to find a solution but will fail.
  What do you mean?
  I've been into Effect for a long time, and it scales really well as your applications get more complex.
  Schema is way more advanced than Zod, by the way: both at the type level and in functionality, it has a proper decoder/encoder architecture.
  You can encode not just "string -> non-empty string -> valid email pattern" but also "an email the user has confirmed by clicking a link" at the type level, by leveraging effectful schemas (and durable workflows if you want).
  You may not need it 99% of the time, I myself rarely use that, but it's not a fair comparison.
  Zod is more ergonomic, has easier apis and is perfect for most users. Would not recommend schema unless one buys the whole package.
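  Without Effect, the staged idea can be roughly approximated with ordinary brands, where each stage can only be built from the previous one. This is a hand-rolled sketch, not Effect Schema's API; the `isTokenValid` callback standing in for the "user clicked the link" step is hypothetical, and a real confirmation check would be asynchronous and backed by storage.

```typescript
// Each stage is a string carrying the proofs of every earlier stage.
type NonEmptyString = string & { readonly __nonEmpty: true };
type EmailAddress = NonEmptyString & { readonly __email: true };
type ConfirmedEmail = EmailAddress & { readonly __confirmed: true };

function nonEmpty(s: string): NonEmptyString | null {
  const t = s.trim();
  return t.length > 0 ? (t as NonEmptyString) : null;
}

// Can only be called with something already proven non-empty.
function emailAddress(s: NonEmptyString): EmailAddress | null {
  return /^[^\s@]+@[^\s@]+$/.test(s) ? (s as EmailAddress) : null;
}

// The effectful stage; isTokenValid is a hypothetical stand-in for a real lookup.
function confirmEmail(
  email: EmailAddress,
  token: string,
  isTokenValid: (email: EmailAddress, token: string) => boolean,
): ConfirmedEmail | null {
  return isTokenValid(email, token) ? (email as ConfirmedEmail) : null;
}

// Only a confirmed address may receive, say, a password-reset link.
function sendReset(to: ConfirmedEmail): string {
  return `reset link sent to ${to}`;
}

const raw = nonEmpty("  ada@example.com ");
const addr = raw === null ? null : emailAddress(raw);
const confirmed = addr === null ? null : confirmEmail(addr, "t0k3n", (_e, t) => t === "t0k3n");
const message = confirmed === null ? "not confirmed" : sendReset(confirmed);
```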
  - programmarchy 5 hours ago
    I haven’t used Effect but the problem I see with using it is that it seems to want to completely swallow the whole app architecture. At that point, why not just use a functional language?
    - steve_adams_86 3 hours ago
      I can’t think of a single functional language that offers what effect gives you, though. A fully typed and declarative error channel, managed dependency layer with compile time safety, excellent resource management, the best parsing/validating/serializing library I’ve used in TypeScript, concurrency, streams, cache, otel primitives baked in…
      In all fairness it does require buy-in and gradual adoption isn’t perfectly seamless or frictionless, but I think it’s worth it. They’ve done an outstanding job with it.
      - cptmurphy 2 hours ago
        Scala. Zio/Cats/Kyo
        - epolanski 11 minutes ago
          Relevant talk by John De Goes: Why Effect is more important than ZIO:
          https://www.youtube.com/watch?v=Ei6VTwhI8QQ
    - epolanski 5 hours ago
      Yes, your hunch is correct.
      Which functional language has a similarly huge ecosystem, works across the frontend/backend, has first class support of different runtimes, provides similar ergonomics, has meetups and conferences in so many countries and is easy to hire for (all you need is solid TypeScript)?
      There's a reason effect-ts keeps spreading despite its syntax and learning curve, and I say it as somebody that used Haskell, functional Scala, Purescript, Elm, Racket, Elixir and tested another half a dozen.
      Give me an Elixir with properly powerful types (not gleam) and I'm in.
      I'd gladly throw effect and typescript especially out of my work day, but I see no sane replacement at complexity scaling.
      I wouldn't personally recommend Effect without a solid champion on the team and without having the complexity needs for it (I'm talking recurring durable workflows, complex encodings, suspension, retries, etc.), and even if you have them the price is steep without a champion, but that's my 2 cents.
      You use it for an agentic CLI (opencode uses it, e.g.), not for a simple CRUD app (which is 90% of the web dev industry).
lumpysnake 6 hours ago
We should make authors disclose how much AI was used to write an article. This reeks of Opus 4.8.
- ramon156 6 hours ago
  I recently made a Firefox Extension to mark authors as Slop for the same goal but not the same reason.
  I don't think disclosing helps here. If the article wasn't obviously generated, why would that affect you?
  The only issue I have is being half-way through the article and realizing I am reading hallucinated text. If I can mark the author once, I won't see them again. This works fine for me. You could argue that disclosing would fix this issue, but the issue is not that AI was used, but that it was not curated.
- lijok 6 hours ago
  Why should they disclose how much AI was used to write an article?
  - hombre_fatal 4 hours ago
    Because it's a trapdoor function. You generate heaps of content with AI with 1:10 or 1:100 amplification of time/attention invested, but your readers spend their time reading it at 1:1.
    Also, what are we doing using AI to write our blogs? Surely that's the final domain of human writing outside of our local circle?
  - invader 4 hours ago
    If the author hasn't bothered to spend time writing the article, why should I spend my time reading it? Let agents do it for me!
  - Bjartr 6 hours ago
    If nothing else, it should be done as a courtesy to those who would like to avoid such content.
    If the result is better for having used AI, why wouldn't an author want to disclose it?
    - lijok 6 hours ago
      Should they disclose the use of a spellchecker? A translation app? Grammarly? A writing tutor?
      - btrettel 6 hours ago
        A spell checker, grammar checker, and tutor change a relatively small fraction of the writing, preserve the writer's style/voice, and rarely introduce errors that are hard to detect like hallucinations.
        A translation app changes nearly 100% of the content, often changes the writer's style/voice, and can introduce hard to detect errors. But there's a far closer correspondence to what was written by the original writer. The basic ideas are still from the writer. A translation app is not expanding a short idea into something longer, and including some things the original writer never thought in the process.
        ***
        Pre-LLMs, I did in fact disclose when I was using a translation app in some translations of scientific articles I produced. It would be weird to disclose the use of spell checking, grammar checking, or who previously taught me writing as these things are ubiquitous. I will also acknowledge people who were influential in my thinking. If a LLM is doing a lot of the thinking for me then I do think disclosing LLM use is appropriate.
      - ben-schaaf 5 hours ago
        I find it's incredibly helpful to know when someone's using an automated translator, as they usually get details wrong while still reading like a native speaker. Not using a translator at all, or disclosing that one was used means I can make a better educated guess as to what they mean. It also changes how I reply.
      - Bjartr 5 hours ago
        If there were groups that voiced a desire to be informed of that, then it would indeed be courteous to do so.
      - BiteCode_dev 3 hours ago
        It used to be the polite thing to disclose that you used a translation app. In fact, traditionally, you disclose when you translate anything so people know the context in which to interpret your text.
        In the same way, I want to know if a book by some famous person was actually ghostwritten.
        Of course, the point is moot. Somebody using AI to write a blog post is unlikely to be self-conscious enough to think it's necessary to disclose it in the first place.
    - NeutralCrane 6 hours ago
      I think the need to jump through hoops to disclose anything and everything that might offend someone’s particular sensibilities is a losing battle. What if I want a disclosure on if the content is being hosted via AWS vs some non-magacorp that agrees with my sensibilities more? Or that the power being used by the data center is renewable? Or a disclosure of the author’s every political position so I know if I agree with them and if I should amplify their message and/or generate ad revenue through their site?
      At the end of the day, the ideas within the content are what matters. An idea has or does not have merit regardless of whether it was produced entirely by a person, by a person using AI as an editor, or 100% by AI. If you need a disclosure on whether an idea was produced by AI, you are saying that you have no interest in debating the content on the grounds of the arguments it is making, while simultaneously conceding you can’t tell the difference between someone using AI and someone who isn’t (which undermines one of the primary arguments against AI, that it makes for inferior outputs).
      - lumpysnake 5 hours ago
        > if the content is being hosted via AWS vs some non-magacorp
        > power being used by the data center is renewable
        That doesn't change anything about the content itself. AI writing is a disservice to the reader. Why should I even care to read an article you didn't even care about writing yourself? At this point a 300-character tweet would've achieved the same effect.
        - NeutralCrane 4 hours ago
          That’s my point. The AI writing either affects the content or it doesn’t. If you require a disclaimer to tell the difference, then it isn’t affecting the content.
          Requiring a disclaimer is essentially admitting the content isn’t meaningfully different than human generated content. At that point, who cares? Just engage with the premise on its own merits, rather than on how it was written.
          - infamia 3 hours ago
            > Requiring a disclaimer is essentially admitting the content isn’t meaningfully different than human generated content. At that point, who cares? Just engage with the premise on its own merits, rather than on how it was written.
            The problem is the reader has to invest time to find out and LLM written text will (on average) lower the quality towards "meh" and spend more words doing so. Even if the author is making an earnest effort to produce high quality content, they need to admit to themselves and others that their results will be more hit or miss. The disclosure allows the reader to make a more informed decision about how to engage with the material (e.g., have an LLM summarize or analyze the content, or just dive in because we know it will very likely be a good read). Editing what someone has written is like reviewing code, you're by default not as invested, so the results will likely reflect that reality.
          - lumpysnake 4 hours ago
            Then you're totally right. In this case, it's a poor usage of AI because we are able to tell it's slop.
            Odds are very high at this point that I've come across a piece of content I enjoyed that was at least partly written by an LLM without having detected it.
  - lumpysnake 6 hours ago
    Because I would've completely avoided the article if I knew that I would be served slop. I was interested in the content, but I was immediately thrown off by the writing style, which closely resembles what I've been getting from Opus 4.8 lately in my dev work. Filler language and useless metaphors everywhere.
    > Booleans look tidy until somebody adds a third case and exhaustiveness silently doesn’t kick in. Strings narrow honestly.
    Like, nobody truly writes like that. It wouldn't get past any competent editor.
    Strings narrow honestly? What does that even mean? This kind of 3-word precision is useless, and it appears everywhere in the article. We get the point within the first sentence; no need to add more.
    - hombre_fatal 4 hours ago
      > Strings narrow honestly.
      This is a great example of the latest "LLM tell" I'm seeing in prose.
      It's so terse with its "power-verb" that I have to read it multiple times. It's a clever compaction of English, not something I want to read outside of a headline or motto.
      Here's another example from a Claude convo I had open: "Alerts flag mirrors". It's agreeing with my proposal that the alert system should be expanded to consider duplicates, and it came up with a cutesy phrase for it that ends up reading like three unrelated words.
      Makes me appreciate how helper words help make the structure of a sentence more obvious.
      More examples: "Errors surface drift", "Tests anchor scope", "Guards screen input". That's probably what it is: when the verb is also the form of a noun (flag, surface) or adjective (narrow).
      Slogans mask meaning.
    - twoodfin 5 hours ago
      I just flag like I would terrible writing by a human and move on.
      It’s frankly depressing when (2018) oldies-but-goodies get reposted here for the Nth time. The clarity of thought and obvious effort that went into communicating that thought was expected for top-voted posts at the time. Now those posts appear exceptional in this era’s standard of “the LLM just cleaned up my notes” slop.
exceptione 6 hours ago
It is nice that the author mentioned F#, because if you want to target the browser (or any JavaScript runtime), you can do so from F# directly with Fable (https://fable.io). This allows you to program in a type-safe manner by default, without having to play tricks to circumvent the limits of structural typing.
- robrenaud 5 hours ago
  I suspect idiomatic TypeScript or idiomatic F# are both way better solutions in the real world than abstruse Typescript emulating idiomatic F#.
  - exceptione 5 hours ago
    Possibly, but I think what we wish for is a language with a nominal type system that lets you switch to structural typing when needed.
    Luckily, F# has type providers, which lets the compiler construct nominal types based on the structure of real data (like json, xml or any format you want), saving you from the effort of building wrapper types by hand.
robertlagrant 8 hours ago
This feels right, and I also have never done it (or had the guts to get others to do it).
The reason I've not is - say there's an optional field. Currently we call that null, probably, and check each time if it's there or not. I could instead make a type, like User and UserWithPhoneNumber. Should we be making types for each combination of present/absent fields? That can't be right.
The classic answer is to move the logic inside the domain object, or have a helper function outside the object, so you aren't constantly checking for field presence/absence, but are instead writing the logic once and calling some code.
I'm not sure in practice types can help with this. But I'd love to be proven wrong.
- xx_ns 7 hours ago
  I think this is a slightly different problem. The absence of an optional field, if that's a legal state, is meaningful every time you use the type, so you encode it on the field: `phone: ValidPhoneNumber | null`. When it's not null you're still guaranteed a valid phone number. When it is null, that's a legal state you have to handle and which is domain logic, not validation you forgot to do.
  The combinatorial explosion you're picturing only shows up if you make a separate type per combination of present fields, but you don't need to. An independent optional field stays one `T | null`. You only reach for distinct types when fields are correlated and present together because they represent a state, and then it's a discriminated union on a status field, which is N states, not 2^N.
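  A sketch of both halves of that answer, with invented names and a deliberately simplistic phone check: the independent optional field stays a single `ValidPhoneNumber | null`, while correlated fields become a discriminated union on `status`, so there are three variants to handle rather than one type per combination of fields.

```typescript
// Validated once at the boundary; a deliberately simplistic check for illustration.
type ValidPhoneNumber = string & { readonly __phone: true };

function parsePhone(raw: string): ValidPhoneNumber | null {
  const compact = raw.replace(/[\s()-]/g, "");
  return /^\+?\d{7,15}$/.test(compact) ? (compact as ValidPhoneNumber) : null;
}

// Correlated fields travel together inside one variant: N states, not 2^N types.
type Account =
  | { status: "invited"; invitedAt: Date }
  | { status: "active"; verifiedAt: Date }
  | { status: "suspended"; verifiedAt: Date; reason: string };

type User = {
  name: string;
  phone: ValidPhoneNumber | null; // independent: absence is a legal state
  account: Account;
};

function summary(u: User): string {
  const phone = u.phone ?? "no phone";
  const acct = u.account;
  switch (acct.status) {
    case "invited":
      return `${u.name} (invited, ${phone})`;
    case "active":
      // Narrowed: verifiedAt is only reachable in variants that have it.
      return `${u.name} (active since ${acct.verifiedAt.toISOString()}, ${phone})`;
    case "suspended":
      return `${u.name} (suspended: ${acct.reason})`;
  }
}
```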
  - robertlagrant 6 hours ago
    That's fair enough - I see what you mean. I think I read the case I was thinking of into the article. Now that I've re-read it, it is saying what you're saying, which does make a lot of sense.
    Using types like this also means you can more easily avoid assignment errors, as everything will have a very specific type (e.g. Age instead of int).
- frogulis 6 hours ago
  This explosion of optionality types is (the most important) topic of Rich Hickey's "Maybe Not" talk. I recommend it!
  The short version is: the shape of a type is inherent to the type itself, but the optionality of its members is dependent on the situation. A type system that solves this problem separates these concepts to allow for this distinction.
  I _suspect_ it's possible to implement something like that in typescript but I haven't tried it myself (and I doubt it's very ergonomic).
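  One way to get part of the way there in TypeScript, sketched with a hypothetical `WithRequired` helper built from the built-in `Pick` and `Required` utility types: declare the shape once with optional members, and let each function state which keys it needs for its situation. It is only an approximation of the idea in the talk.

```typescript
// The shape: what a User can have, with no claim about what is present.
type User = {
  id: string;
  email?: string;
  birthday?: string; // ISO date such as "1815-12-10"
};

// Per-situation optionality: demand exactly the keys a function reads.
type WithRequired<T, K extends keyof T> = T & Required<Pick<T, K>>;

function birthdayGreeting(u: WithRequired<User, "email" | "birthday">): string {
  return `to ${u.email}: happy birthday on ${u.birthday.slice(5)}!`;
}

// One generic guard instead of a named type per combination of fields.
function hasKeys<T extends object, K extends keyof T>(
  value: T,
  ...keys: K[]
): value is WithRequired<T, K> {
  return keys.every((k) => value[k] !== undefined);
}

const people: User[] = [
  { id: "1", email: "ada@example.com", birthday: "1815-12-10" },
  { id: "2", email: "bob@example.com" },
];

const greetings: string[] = [];
for (const p of people) {
  if (hasKeys(p, "email", "birthday")) greetings.push(birthdayGreeting(p));
}
```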
- pillmillipedes 7 hours ago
  If having and not having a phone number are equally valid states for a user, then types won't help you much. I think it's more about writing
```
  class User{phone: ?PhoneNumber}
```
  over
```
  class User{phone: ?string}
```
  - throwwwll 7 hours ago
    To expand and give some notion of good taste:
    It's more about writing
```
    struct User {phone: MaybePhoneNumber} // give or take, it's a monoid
```
    over
```
    struct User {phone: Option<String>}
```
    - pillmillipedes 6 hours ago
      I don't mind discussing syntax when appropriate, but this feels like arguing over which trivial brainfuck substitution[1] is the best.
      > monoid
      nullables with `??` and `?.` are also give-or-take monoids. is it common though to `or` two MaybePhoneNumbers together or to apply a PhoneNumber->MaybePhoneNumber function to it? if not then why mention it?
      let's see something meaningfully different like a database schema.
      [1] https://esolangs.org/wiki/Trivial_brainfuck_substitution
sigbottle 4 hours ago
Parse, don't validate is one way of building constraints. The issue is it feeds into a tree-based view of constraints. However it does yield the philosophy of "constraints by construction".
Another is making a set of "linearly independent" configurations - except in practice it never is, is it? Has anyone actually ever had a clean CI Matrix that didn't have weird hidden edge cases, for example?
Functional programming really wants to emphasize the notion of pure functions, which have modularity and independence built in. But there are perf issues and in practice, you don't really escape the issues of "how to design constraints". Sure, you don't need inheritance and OOP and all of that, but you can easily have a tree-based view of constraints and ontology in FP as well.
(Incidentally, my view of the issue with something like Carnap's logical frameworks is that they are so general and flexible that they fail to capture anything operationally useful; yes, I know that isn't always philosophy's goal but I view the same with a lot of purported theories of everything today)
Are there any other philosophies in software that have certain distinct wins versus losses when it comes both to the organization & encoding of your constraints, and coming up with them? Tree-based hierarchical decomposition and linearly independent axes in a space are two go-to things for me.
I suppose you could design a state machine, but that requires understanding all the semantics upfront, encoding them once, and hoping that requirement changes don't mess you up.
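For the state-machine option, TypeScript can at least make "encode the semantics once" checkable: states as a discriminated union, plus an exhaustive switch ending in an `assertNever` helper, so adding a new state later surfaces as a compile error in every switch that does not handle it yet. The door example and helper name are purely illustrative.

```typescript
type DoorState =
  | { kind: "open" }
  | { kind: "closed" }
  | { kind: "locked"; code: string };

type DoorEvent =
  | { type: "close" }
  | { type: "open" }
  | { type: "lock"; code: string }
  | { type: "unlock"; code: string };

// Only reachable if the compiler's exhaustiveness reasoning was bypassed.
function assertNever(x: never): never {
  throw new Error(`unhandled case: ${JSON.stringify(x)}`);
}

// Illegal transitions return the current state unchanged.
function transition(state: DoorState, event: DoorEvent): DoorState {
  switch (state.kind) {
    case "open":
      return event.type === "close" ? { kind: "closed" } : state;
    case "closed":
      if (event.type === "open") return { kind: "open" };
      if (event.type === "lock") return { kind: "locked", code: event.code };
      return state;
    case "locked":
      return event.type === "unlock" && event.code === state.code ? { kind: "closed" } : state;
    default:
      // Adding a fourth state without a case above makes this line a type error.
      return assertNever(state);
  }
}
```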
I have seen poset-based solutions as well (actually, I think "monotonic" distributed architectures are based around this approach) but that obviously requires a very specific type of problem domain.
----
There are also some very common memes from physics-swes: such as how information cancels out over "long distances" and therefore certain kinds of abstractions are good; attractor states in idea space; or even people loving the idea of symmetry (which, granted - in physics, is truly a beautiful approach, but does not seem to generalize well to generic software engineering). But those are a bit too high level to put into a concrete software plan. Still interesting though.
rzmmm 6 hours ago
Is there a benefit to using this branded type over just encapsulating the raw string in a private variable in a closure or class? This feels a bit like forced nominal typing. The Email type doesn't have to be a string; it can be encapsulated so that invalid Emails are not representable.
- iainmerrick 6 hours ago
  The main advantage of branding is that it’s a zero-cost abstraction -- the boilerplate vanishes at runtime. Just using a string instead of a containing object can give you a lighter-weight runtime.
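  To make the trade-off concrete, both options side by side (all names are hypothetical). Each rejects an arbitrary string at compile time; the difference is at runtime, where the branded value is still a primitive string but the class allocates a wrapper object that has to be unwrapped wherever a plain string is expected.

```typescript
const EMAIL_RE = /^[^\s@]+@[^\s@]+$/;

// Option 1, brand: compile-time only; at runtime the value is the string itself.
type EmailBrand = string & { readonly __email: true };
function toEmailBrand(s: string): EmailBrand {
  if (!EMAIL_RE.test(s)) throw new Error(`invalid email: ${s}`);
  return s as EmailBrand;
}

// Option 2, encapsulation: values only come from `parse`, and the raw string
// sits behind a private field, so callers must unwrap it explicitly.
class EmailBox {
  private readonly value: string;
  private constructor(value: string) {
    this.value = value;
  }
  static parse(s: string): EmailBox {
    if (!EMAIL_RE.test(s)) throw new Error(`invalid email: ${s}`);
    return new EmailBox(s);
  }
  toString(): string {
    return this.value;
  }
}

const domainOf = (s: string): string => s.slice(s.indexOf("@") + 1);

const branded = toEmailBrand("ada@example.com");
const boxed = EmailBox.parse("ada@example.com");

const fromBrand = domainOf(branded); // a brand is usable wherever a string is
const fromBox = domainOf(boxed.toString()); // the wrapper needs unwrapping first
```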
gherkinnn 3 hours ago ago
I found that having clean models and parsing your data using Zod religiously at the application boundary (requests, URL, DB, env) gets you 80% of the way without fighting the language.
The stray email: string causing trouble is fine and is less work than self-imposed constraints that will be worked around by others.
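As a dependency-free sketch of the same idea (parse once at the boundary, then trust the types), here is a hypothetical `parseConfig` for environment variables. This is not Zod's API, and the variable names `PORT` and `DATABASE_URL` are assumptions for illustration.

```typescript
// The shape the rest of the application is allowed to rely on.
type Config = {
  readonly port: number;
  readonly databaseUrl: string;
};

// Parse untrusted input once, at the boundary. Throwing here means a bad
// environment fails loudly at startup instead of deep inside a request.
function parseConfig(env: Record<string, string | undefined>): Config {
  const port = Number(env["PORT"]);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  const databaseUrl = env["DATABASE_URL"];
  if (databaseUrl === undefined || databaseUrl === "") {
    throw new Error("DATABASE_URL is required");
  }

  return { port, databaseUrl };
}

// Usage: in Node you would pass process.env here.
const config = parseConfig({ PORT: "8080", DATABASE_URL: "postgres://localhost/app" });
console.log(config.port + 1); // prints: 8081 (port is already a number)
```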
throwaw12 6 hours ago ago
I personally love the idea and concept, but struggle to apply to real projects.
Suppose I have a User with some attributes like birthday, email and whether they have been verified.
In a typical codebase you would see `if (user.verified_at != null)` or something along those lines; with parsed code, I feel like I should have types (or interfaces) for each of them:
```
    - UserWithBirthday
    - VerifiedUser, UnverifiedUser
    - UserWithEmail, UserWithoutEmail
```
(and imagine having a method which accepts user with birthday and email to send an email day before their birthday, would you create UserWithBirthdayAndEmail type?)
it feels like it is going to bloat the interface space, how do you tackle this problem?
[-]
- bern4444 5 hours ago ago
  It's pretty trivial to create derived and augmented types with Pick, Omit, Required, and Partial. Combine them with a few parsing functions that return an object typed to whatever specification you need and you are set, e.g.:
```
    type User = { name: string; verified: boolean; email?: string; lastName: string; birthday?: string | { year: string; month: string; date: string; }}

    // Required<Pick<...>> turns the optional `birthday` into a required field.
    type UserWithBirthday = User & Required<Pick<User, 'birthday'>>;
    type VerifiedUser = User & { verified: true; email: string; }
    type VerifiedUserWithBirthday = UserWithBirthday & VerifiedUser;

    // Type guard: when it returns true, the compiler narrows `user` to VerifiedUserWithBirthday.
    const userHasBDayAndEmail = (user: User): user is VerifiedUserWithBirthday => {
        // The target type promises `verified: true`, so that has to be checked as well.
        if (!user.verified || user.email === undefined || user.birthday === undefined) {
            return false
        }

        return true
    }
```
  Any caller of userHasBDayAndEmail knows for the rest of its nested call stack if the provided user is a User object or a VerifiedUserWithBirthday.
  The types are cheap to write (they're all derived), they have no runtime impact (types are erased at build/compile time), and these parsing functions are quite small to write.
  https://www.typescriptlang.org/play/?#code/FAFwngDgpgBAqgZyg...
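A self-contained sketch of how the narrowing plays out at a call site, using simplified versions of these types (`sendBirthdayEmail` and `handle` are hypothetical functions):

```typescript
type User = {
  name: string;
  verified: boolean;
  email?: string;
  birthday?: string;
};

type VerifiedUserWithBirthday = User & { verified: true; email: string; birthday: string };

const isVerifiedWithBirthday = (user: User): user is VerifiedUserWithBirthday =>
  user.verified && user.email !== undefined && user.birthday !== undefined;

// Hypothetical sender: it can only be called with a user that has already been checked.
function sendBirthdayEmail(user: VerifiedUserWithBirthday): string {
  // No null checks needed: email and birthday are non-optional here.
  return `To: ${user.email} / Happy birthday (${user.birthday}), ${user.name}!`;
}

function handle(user: User): string {
  if (isVerifiedWithBirthday(user)) {
    return sendBirthdayEmail(user); // narrowed to VerifiedUserWithBirthday
  }
  // sendBirthdayEmail(user); // would not compile: user is only `User` here
  return "skipped";
}

console.log(handle({ name: "Ada", verified: true, email: "ada@example.com", birthday: "12-10" }));
console.log(handle({ name: "Bob", verified: false })); // prints: skipped
```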
  [-]
  - throwaw12 5 hours ago ago
    creation is not a problem, maintenance is.
    Suppose you want to add one more property to VerifiedUserWithBirthday and UnverifiedUserWithBirthday, you might get 2 more new types, and somewhere at the higher layer call chains you need to know which enclosing type you should pass so that some method in the bottom chain will accept it.
    I am sure there are more elegant ways, but I am struggling to generalize it to most enterprise SaaS CRUD apps, where you have one object with bunch of properties and can conditionally traverse the code logic
    [-]
    - bern4444 5 hours ago ago
      Yeah that's the engineering part in software engineer :)
      If you have VerifiedUserWithBirthday, any value that fails the parsing function is implicitly UnverifiedUserOrUserWithoutBirthday... No need to define it separately. You get the inverse type for free, i.e., a value that is of type User and not of type VerifiedUserWithBirthday.
      A new property doesn't mean a new derived type. Only if that new property impacts what a VerifiedUserWithBirthday should represent should the VerifiedUserWithBirthday type be updated and even then, it's not a new type, just an update to an existing type. Again minimal updates needed.
      The compiler handles all the validation and will tell you exactly where there are any issues - the compiler is what makes the maintenance cost quite low.
- columnarx3 6 hours ago ago
  I think this is the wrong pattern in this instance. You parse an email or phone number because validating leaves it as a plain string, and you lose the context to know for sure if that string is actually an email or phone number.
  In your instance, you could have:
```
  type User = {
    // ... rest of fields
    email: {
      verified: boolean,
      // branded type here ensures that this string is a proper email address
      value: EmailAddress,
    },
    birthday: Date | null,
  };
```
  In this instance, your logic with a method that accepts birthday and email has all the information it needs to make its choice.
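A minimal, self-contained version of this shape, assuming a simple branded `EmailAddress`. The `parseEmailAddress` check and the `birthdayReminderTarget` function are hypothetical.

```typescript
declare const emailAddressBrand: unique symbol;
type EmailAddress = string & { readonly [emailAddressBrand]: true };

// Hypothetical, deliberately naive parser: the only way to obtain an EmailAddress.
function parseEmailAddress(raw: string): EmailAddress | null {
  return /^[^@\s]+@[^@\s]+$/.test(raw) ? (raw as EmailAddress) : null;
}

type User = {
  name: string;
  email: { verified: boolean; value: EmailAddress };
  birthday: Date | null;
};

// Everything the decision needs is on the object; no extra UserWith... types.
function birthdayReminderTarget(user: User): EmailAddress | null {
  if (user.birthday === null || !user.email.verified) return null;
  return user.email.value;
}

const address = parseEmailAddress("ada@example.com");
if (address !== null) {
  const ada: User = {
    name: "Ada",
    email: { verified: true, value: address },
    birthday: new Date(1815, 11, 10),
  };
  console.log(birthdayReminderTarget(ada)); // prints: ada@example.com
}
```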
- sirwhinesalot 6 hours ago ago
  The computer-science answer to this problem is called "refinement types", where you can attach arbitrary predicates to a type, e.g. (pseudo-code):
```
    fn send_birthday_mail(user: {u: User, u.birthday != null})
```
  Contracts are a similar solution that restricts the predicates to only appearing in function types.
  The difference between this and an assert is that it gets checked at compile time (it can get quite expensive to do the check though).
  What can you do in mainstream languages? As much as is worth and no more than that. String -> User is worth it, User -> UserWithBirthday is not.
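TypeScript has no refinement types, but a common approximation is a brand plus a runtime predicate: the check runs once at runtime, and the type records that it passed. `Refined` and `refine` are hypothetical helpers, not a library API, and unlike Dafny or Liquid Haskell nothing here is proved at compile time.

```typescript
// A value of type T that has passed the check identified by Name.
declare const refinedBrand: unique symbol;
type Refined<T, Name extends string> = T & { readonly [refinedBrand]: Name };

// Run the predicate once; on success, record the fact in the type.
function refine<T, Name extends string>(
  value: T,
  predicate: (v: T) => boolean,
): Refined<T, Name> | undefined {
  return predicate(value) ? (value as Refined<T, Name>) : undefined;
}

type User = { name: string; birthday: string | null };
type UserWithBirthday = Refined<User, "HasBirthday">;

function sendBirthdayMail(user: UserWithBirthday): string {
  // The brand says the check ran, but the field type is still string | null.
  // That gap is why this is only an approximation of a refinement type.
  return `Mail ${user.name} on ${user.birthday ?? "?"}`;
}

const checked = refine<User, "HasBirthday">({ name: "Ada", birthday: "12-10" }, (u) => u.birthday !== null);
if (checked !== undefined) console.log(sendBirthdayMail(checked));
```

The leftover `?? "?"` illustrates the point made above: String -> User is usually worth it, while User -> UserWithBirthday buys little in a mainstream type system.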
  [-]
  - throwaw12 5 hours ago ago
    this looks cool, but you are doing validation when accepting the object, you probably can't do it excessively, for example, if you are dealing with objects with heights, you might have a HumanLikeHeight where height range is between 40cm and 250cm, and you want to send email to that human, would you keep adding these conditions to the predicates?
    [-]
    - sirwhinesalot 5 hours ago ago
      Languages with refinement types (or contracts) like Dafny and Liquid Haskell can typically handle numerical predicates directly. Some can even handle string predicates directly, including regular expressions. They also allow you to write complex predicates as separate functions, albeit with limited expressiveness.
      But you hit performance and/or outright computational limits (halting problem) rather quickly.
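For a numeric predicate like the `HumanLikeHeight` example, a mainstream-language approximation is again a parse function with a branded result. The 40 to 250 cm bounds come from the comment above; the names are hypothetical. Unlike Dafny or Liquid Haskell, arithmetic on the result is not tracked.

```typescript
declare const heightBrand: unique symbol;
// Centimetres, known to lie in [40, 250].
type HumanLikeHeightCm = number & { readonly [heightBrand]: true };

const MIN_CM = 40;
const MAX_CM = 250;

function parseHumanLikeHeight(cm: number): HumanLikeHeightCm | undefined {
  // Number.isFinite rejects NaN and Infinity before the range check.
  if (!Number.isFinite(cm) || cm < MIN_CM || cm > MAX_CM) return undefined;
  return cm as HumanLikeHeightCm;
}

// Downstream code states its requirement in the signature instead of re-checking.
function describeHeight(h: HumanLikeHeightCm): string {
  return `${(h / 100).toFixed(2)} m`;
}

const h = parseHumanLikeHeight(180);
if (h !== undefined) console.log(describeHeight(h)); // prints: 1.80 m

// Arithmetic drops the brand: h + 1000 is a plain number, so the invariant is not carried along.
```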
- win311fwg 5 hours ago ago
  > Suppose I have a User with some attributes like birthday, email and whether they have been verified.
  Philosophically, birthday and email are not attributes of a user. If you remove a user from existence, a birthdate and email address still exist. So...
  > would you create UserWithBirthdayAndEmail type
  ...yes, something like a `profile { user, birthday, email }` type is necessary to compose the attributes you are interested in into something where those attributes do belong together.
  > it feels like it is going to bloat the interface space, how do you tackle this problem?
  Like all things formal verification, increase the level of verification in your critical sections and don't sweat the non-critical sections. How impactful will it be to your business if sending a birthday email message fails?
toolslive 4 hours ago ago
You don't have to use TypeScript if you don't want to: you can compile Haskell, OCaml, Rust, F#, ... to JavaScript. This is quite efficient, especially if your backend is already in one of those languages. It saves you from creating the same abstraction twice in different languages.
[-]
- esafak 4 hours ago ago
  How does that work with JS frameworks like React, since most development takes place with them?
  [-]
  - toolslive 2 hours ago ago
    it's not that different compared to using the FFI.
    The link below shows how it can work for OCaml <-> TypeScript.
    https://github.com/ocsigen/ts2ocaml
ivolimmen 6 hours ago ago
One of the pillars of Domain Driven Design. I love working on a pure DDD application but I do not often convince my team (I am a constant) that this is the best way ...
[-]
- jve 6 hours ago ago
  > I am a constant
  What did you mean by that? You don't accept mutability or any inputs on your state of mind?
somat 6 hours ago ago
"TypeScript is structurally typed, which means two types with the same shape are the same type. string is string is string"
I don't speak TypeScript, so I'm probably missing something obvious, but why would you parse an email (or anything, really) into a string (or string equivalent)? When parsed, it will end up as a specific email object, that is, something closer to a C struct. What is the article's dance doing?
[-]
- exceptione 6 hours ago ago
  Javascript doesn't have structs. The idea is that you have data on one hand and you have type witness about that data on the other hand. Type witness is something for the type system. But here you encounter the limits of structural typing versus nominal typing, because structural typing isn't able to witness that directly.
  In sufficiently strong nominal type systems, I can hide the constructor for an EmailAddress type (as in: nobody can just construct an EmailAddress value). In Haskell speak, I can then export a function `parseEmailAddress :: String -> Maybe EmailAddress`. The function parseEmailAddress is the only place that has access to the constructor, which means that the only way to turn a string into an EmailAddress is by calling parseEmailAddress.
  Note that at runtime EmailAddress is just a string. The boundaries live in the type system, not on the value level. A structural type system (as in TypeScript) does not enable that; it forces you to turn EmailAddress into something other than just a string.
  Are you confusing Email vs EmailAddress? I think that in many cases people would prefer EmailAddress to be represented as a dumb string at runtime. But if you don't, you will easily find other examples where you have 2 structurally similar types, that you don't want to mix up.
  [-]
  - somat 5 hours ago ago
    Javascript does have structs, it calls them objects.
    If I parsed an emailAddress, the thing that came out would look like {'domain':'example.com', 'user':'john-doe'}, i.e. emailaddr.domain and emailaddr.user, plus an emailaddr.address method if you like that form. Even if what I parsed ended up as a single string-like field, I would still name that field: emailaddr.address.
    Salutes for the bit on hiding the constructor, that makes a lot of sense.
    It probably does not help that in my one attempt at making a JavaScript web application I did not bother trying to understand how JavaScript likes its objects and just forced a Python-looking model onto it. If any web development team saw my code, I would definitely get laughed out of the club.
    [-]
    - exceptione 5 hours ago ago
      Yeah, in your example the structure is sufficiently dissimilar to a string for TypeScript not to confuse them for each other. However, if you also have an identity provider returning UserInfo objects in the form of {'domain':'example.com', 'user':'john-doe'}, you might not like it that now any email address is a valid UserInfo object. On the type level in TS, you cannot tell those types apart. But I guess you figured that out already.
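A small sketch of the collision described here, with hypothetical `EmailAddress` and `UserInfo` types, and one common workaround (a phantom brand on each):

```typescript
// Two unrelated concepts that happen to share a shape.
type EmailAddress = { domain: string; user: string };
type UserInfo = { domain: string; user: string };

function greet(info: UserInfo): string {
  return `Hello ${info.user} from ${info.domain}`;
}

const email: EmailAddress = { domain: "example.com", user: "john-doe" };
console.log(greet(email)); // compiles: structurally, every EmailAddress is a UserInfo

// One common workaround: a phantom brand on each type, so the shapes no longer match.
declare const emailTag: unique symbol;
declare const userInfoTag: unique symbol;
type BrandedEmailAddress = EmailAddress & { readonly [emailTag]: true };
type BrandedUserInfo = UserInfo & { readonly [userInfoTag]: true };

function greetBranded(info: BrandedUserInfo): string {
  return `Hello ${info.user} from ${info.domain}`;
}

const brandedEmail = email as BrandedEmailAddress;
console.log(brandedEmail.domain); // the fields are still usable as before
// greetBranded(brandedEmail); // compile error: BrandedEmailAddress lacks the UserInfo brand
```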
- camdenreslink 6 hours ago ago
  In some languages you can create a type that is equivalent to a string but is its own distinct type (sometimes called the newtype pattern). I guess that is the same as a struct with a single field, but languages provide syntactic sugar for it, and, depending on the implementation, it doesn't allocate an extra wrapper object on the heap (which would happen in JavaScript/TypeScript).
- LelouBil 6 hours ago ago
  Look up NewTypes.
  The article's dance is to avoid having extra fields that are completely unnecessary here. They want some kind of nominal email type, that is actually a string, so can be used in places where a string is needed, but when a method requires an "email" you can't use any string.
  It's a pretty common pattern in functional programming and in many other languages nowadays
hankbond 7 hours ago ago
As a new TypeScript user, I've found that these concepts have greatly helped me simplify my code and improve reliability independently of testing. Many LLMs guide in this direction if you loosely ask them, but having a concise post like this with the what and the why is fantastic as reference material. The suggestion to use Separation and a Linter rule is something I'm going to immediately look into for my current project. Great post!
wwalexander 6 hours ago ago
This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article). If you’re just branding the types as “parsed” (in reality, simply validated) you still have to know that the invariants you care about hold when using the “parsed” type (e.g. splitting the email type using “@“ will always yield 2 elements), instead of the structure of the type holding that info inherently (e.g. struct Email { name: String, host: String }).
[-]
- jerf 5 hours ago ago
  "This is just validation that is using the type system to indicate the validation has already occurred. I think the real point of “parse, don’t validate” is to make the type system give you structural guarantees that couldn’t exist otherwise (e.g. always having a first/last element in the NonEmpty example from the original article)."
  It's the same thing. In the latter case, something has validated that your NonEmpty has a first and a last element. It's all validation before you stick it in a type that asserts that the validation is guaranteed to have occurred so every function receiving it doesn't need to do it itself.
  Any non-trivial use of a type system will involve making guarantees the type system itself cannot actually express [1]. There's nothing wrong with saying "this is a valid email in accordance with my standards" in a type. Merely using the type system to assert "I have some sort of value in the name and host fields" is valid but a degenerate use. "struct Email { name: Name, host: Hostname }" is an even stronger use of the type system, where Name and Hostname are themselves values you can only get by passing some incoming string through a validation process. Asserting that these things exist is just the most basic check possible, but your type still permits {name: "\0\0\0\0\0\0", host: "!"}, whereas under my definition, assuming that Name and Hostname are reasonably defined, that value will never be something that can be witnessed.
  In fact in general, while I don't absolutely rigidly apply this, especially in smaller script-like programs, when a "string" appears in my strong types that specifically means "this has unbounded contents". It's an appropriate type for "stuff I got off a network" or "stuff a user typed". What stuff? Don't know. Haven't checked it yet. When I do it'll get a more specific type like a Username or DecodedUTF8String or something else. Thanks to people using way too many "strings" and "ints" in the world I have to constantly explain to my LLM that I want stronger types. I'm yet to find the invocation to put into my CLAUDE.md or equivalent to get it to do it right the first time consistently.
  [1]: With a wistful stare into the distance acknowledging the theoretical utopia of dependent types... but it doesn't seem to be coming down from "theoretical" any time soon.
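The `struct Email { name: Name, host: Hostname }` idea can be sketched in TypeScript with a brand per component, so the composite can only be built from values that each passed their own check. The character rules below are simplified assumptions for illustration, not the full email grammar.

```typescript
declare const nameBrand: unique symbol;
declare const hostBrand: unique symbol;
type Name = string & { readonly [nameBrand]: true };
type Hostname = string & { readonly [hostBrand]: true };

// Simplified local-part rule (assumption): letters, digits, and . _ + -
function parseName(raw: string): Name | undefined {
  return /^[A-Za-z0-9._+-]+$/.test(raw) ? (raw as Name) : undefined;
}

// Simplified hostname rule (assumption): two or more dot-separated labels.
function parseHostname(raw: string): Hostname | undefined {
  return /^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(raw) ? (raw as Hostname) : undefined;
}

type Email = { readonly name: Name; readonly host: Hostname };

// An unbounded `string` goes in; either a fully checked Email comes out, or nothing.
function parseEmail(raw: string): Email | undefined {
  const at = raw.lastIndexOf("@");
  if (at <= 0) return undefined;
  const name = parseName(raw.slice(0, at));
  const host = parseHostname(raw.slice(at + 1));
  return name !== undefined && host !== undefined ? { name, host } : undefined;
}

console.log(parseEmail("john.doe@example.com")); // prints: { name: 'john.doe', host: 'example.com' }
console.log(parseEmail("\0\0\0@!")); // prints: undefined
```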
  [-]
  - wwalexander 5 hours ago ago
    > It's the same thing. In the latter case, something has validated that your NonEmpty has a first and a last element.
    No, it has parsed it into a structure that structurally has at least one element, not just the promise that there ought to be one. From the original “Parse, don’t validate” article:
```
    data NonEmpty a = a :| [a]
```
    > your type still permits {name: "\0\0\0\0\0\0", host: "!"}
    I actually originally wrote it with an array of EmailNameCharacters, etc but didn’t want to overcomplicate the example.
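The `NonEmpty` type from the original article has a close TypeScript counterpart in a tuple type with a rest element, which carries the guarantee in its structure rather than in a brand. `parseNonEmpty` and `first` are hypothetical names for this sketch.

```typescript
// Structurally at least one element: a head plus a (possibly empty) tail.
type NonEmpty<T> = [T, ...T[]];

// The parse step: an ordinary array either becomes a NonEmpty or is rejected.
function parseNonEmpty<T>(items: readonly T[]): NonEmpty<T> | undefined {
  if (items.length === 0) return undefined;
  const [head, ...tail] = items;
  return [head as T, ...tail];
}

// No undefined check needed: index 0 always exists in the type.
function first<T>(xs: NonEmpty<T>): T {
  return xs[0];
}

const ports = parseNonEmpty([8080, 8081]);
if (ports !== undefined) console.log(first(ports)); // prints: 8080
console.log(parseNonEmpty([])); // prints: undefined
```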
conartist6 7 hours ago ago
Don't forget to freeze the objects
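A minimal sketch of that suggestion, assuming a hypothetical parsed `Email` object: `readonly` stops mutation at compile time, and `Object.freeze` stops it at runtime (assignment to a frozen property is silently ignored in sloppy mode and throws a `TypeError` in strict mode). Note that `Object.freeze` is shallow.

```typescript
type Email = { readonly local: string; readonly domain: string };

// Parse, then freeze, so later code cannot break the invariant that was checked.
function parseEmail(raw: string): Readonly<Email> | undefined {
  const at = raw.indexOf("@");
  // Exactly one "@", with text on both sides.
  if (at <= 0 || at !== raw.lastIndexOf("@") || at === raw.length - 1) return undefined;
  return Object.freeze({ local: raw.slice(0, at), domain: raw.slice(at + 1) });
}

const email = parseEmail("ada@example.com");
// email.domain = "evil.example"; // compile error: domain is read-only
console.log(email !== undefined && Object.isFrozen(email)); // prints: true
```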
ramses0 6 hours ago ago
Meta: in addition to upvotes and downvotes, we almost need a slop/not-slop slider.
This one barely scrapes by at what feels like 30-40% "slop": "honestly", "the one thing", etc...
...but I did learn something about "Brand" types, and have personally tried to do more of "parse don't validate" in my own code.
Recently I did a similar trick for `exec( ValidExecutable(...) )` [python], where it required tagging/washing through a private function/variable to "get" the private bit.
All the scanners tend to light up when they see "exec" at all (eg: `exec( "pandoc" )` for PDF generation), but I needed to hard code a few "expected" pandoc locations so the imaginary hackers couldn't shadow "pandoc" on a path location they controlled.
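The trick above was in Python; a TypeScript sketch of the same idea might use a brand that only an allowlist check can produce. The paths listed are hypothetical examples, the check only tests membership (not whether the file exists), and `describeCommand` is a stand-in for whatever actually spawns the process.

```typescript
declare const executableBrand: unique symbol;
type ValidExecutable = string & { readonly [executableBrand]: true };

// Hard-coded absolute locations (hypothetical examples), so a "pandoc" placed
// earlier on a user-controlled PATH is never picked up.
const ALLOWED_PANDOC_PATHS: readonly string[] = [
  "/usr/bin/pandoc",
  "/usr/local/bin/pandoc",
  "/opt/homebrew/bin/pandoc",
];

function validExecutable(path: string): ValidExecutable | undefined {
  return ALLOWED_PANDOC_PATHS.includes(path) ? (path as ValidExecutable) : undefined;
}

// Stand-in for the real spawn call: because it accepts only ValidExecutable,
// the type checker rejects arbitrary strings at every call site.
function describeCommand(exe: ValidExecutable, args: readonly string[]): string {
  return [exe, ...args].join(" ");
}

const pandoc = validExecutable("/usr/bin/pandoc");
if (pandoc !== undefined) console.log(describeCommand(pandoc, ["in.md", "-o", "out.pdf"]));
console.log(validExecutable("pandoc")); // prints: undefined (bare names are rejected)
```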
philipwhiuk 4 hours ago ago
The problem with encoding stuff in type systems is where you stop.
whilenot-dev 5 hours ago ago
```
  default: {
    const _exhaustive: never = result;
    return _exhaustive;
  }
```
...is not how people should implement an exhaustiveness check ever! An exhaustiveness check exhausts your knowledge about the world, it should throw an exception at runtime. Just returning the non-matched case is a recipe for disaster. Do this instead:
```
  default:
    ((value: never) => { throw new Error(`Missing case for value: ${value}`); })(result);
```
[-]
- terminatornet3 4 hours ago ago
  The original author is correct. Their implementation of an exhaustive check will give you a compiler error if you miss a variant in your switch statement. I much prefer a compiler error over a run time error.
  It's even recommended in the official typescript docs - https://www.typescriptlang.org/docs/handbook/2/narrowing.htm...
  [-]
  - whilenot-dev 3 hours ago ago
    > Their implementation of an exhaustive check will give you a compiler error if you miss a variant in your switch statement. I much prefer a compiler error over a run time error.
    What are you talking about? You'd still get the compile error just the same.
    Falling back on returning the input argument doesn't even make sense in the typescript docs:
```
  type Shape = Circle | Square;
 
  function getArea(shape: Shape) {
    switch (shape.kind) {
      case "circle":
        return Math.PI * shape.radius ** 2;
      case "square":
        return shape.sideLength ** 2;
      default:
        const _exhaustiveCheck: never = shape;
        return _exhaustiveCheck;
    }
  }
```
    Case circle and square are returning a number, but an unknown shape is returning itself? This is especially annoying when teammates are starting to cast values into a Shape throughout the codebase. Guess I'll need to make a PR to the typescript docs.
    [-]
    - terminatornet3 2 hours ago ago
      you don't need a runtime error if you have a compiler error. best of luck with your PR
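The two positions in this sub-thread can be combined: a helper whose parameter is typed `never` gives the compile-time error, and throwing inside it gives a loud runtime failure if a bad value was cast into the union anyway. A minimal sketch, with `assertNever` as a hypothetical helper name and assumed `Circle`/`Square` definitions filled in:

```typescript
type Circle = { kind: "circle"; radius: number };
type Square = { kind: "square"; sideLength: number };
type Shape = Circle | Square;

// Compile time: only `never` is accepted, so an unhandled variant is a type error.
// Runtime: if a bad value was cast into Shape anyway, fail loudly instead of returning it.
function assertNever(value: never): never {
  throw new Error(`Missing case for value: ${JSON.stringify(value)}`);
}

function getArea(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.sideLength ** 2;
    default:
      return assertNever(shape);
  }
}

console.log(getArea({ kind: "square", sideLength: 3 })); // prints: 9
```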
roywiggins 5 hours ago ago
ai; dr, unfortunately
simonreiff 3 hours ago ago
I always felt a little duped whenever I tried coding in TypeScript. You get zero runtime type safety guarantees, plus it's often harder to tell in TypeScript whether the transpilation will result in an efficient and performant implementation. Maybe the worst thing is that if you have two objects, one called EmailAddress and one called UnrelatedThing, but both have a UUID as the first thing and a string as the second thing, and now you create an object at runtime that is called TotallyUnrelatedThing that has a UUID followed by a string, the runtime sees EmailAddress, UnrelatedThing, and TotallyUnrelatedThing as being structurally identical and in fact they are all "compatible" under TS at runtime, which is usually the exact opposite of what one would expect. Now in other languages you can get some additional guarantees like in C# at the cost of more ceremony and boilerplate to establish all your abstract primitives and layers.
My own approach is mostly to prefer JS and JSON objects with helper chains including validators and constructor/builder and parser utilities. Get the age from the user and get the domain from the email address and don't be surprised by the type, because everything is an object, and don't be surprised that you need to validate and parse, but expect to do so always. Do it in as modular and reusable a pattern as makes sense, which often isn't exactly the same for every scenario, but that's OK. Speaking of which, am I the only one who thinks it's usually more of a hassle than it's worth to define a universal EmailAddress for all times and places? Often the conflict happens because even if I try to do so, I usually am using one vendor as an IdP and a different vendor for transactional emails (even if I use the same cloud provider, say, for both). They each probably have different robust regex implementations to check whether something is truly an email address. I then still need authEmailAddress and billingEmailAddress objects to pass to each, respectively, but there is no enforcement or requirement to instantiate an interface that contains an enforceable contract in TypeScript, so remind me why I am bothering to say these things are both email addresses? It just always feels like the worst of all worlds when I work in TypeScript, kind of a "rules for thee but not for me" situation. I have to follow typing, but TypeScript doesn't quite have to do so.
In particular it always feels like I still have to enforce a lot more validation at the API layer than should be required, without any feeling that I can trust an EmailAddress to instantiate an IEmailAddress interface or that AuthEmailAddress and BillingEmailAddress inherit from the base EmailAddress or that those structures are guaranteed to persist at runtime and that a TotallyUnrelatedThing that just so happens to have a UUID plus a string but isn't strictly instantiating the email class will never accidentally end up populating an email field, which is kind of a concern. (By the way I think the hardcore email address validation really ought to be handled by the upstream provider anyway. I just do some minimal checks on length and presence of dots and at-symbols but don't bother trying to implement a full regex compendium of all email possibilities since these details frankly conflict frequently enough at the edges that I would rather let my IdP and email providers decide for themselves if they truly have an acceptable email input, and handle the failure loudly and up front, rather than try to do all the gatekeeping myself.)
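On the `authEmailAddress` / `billingEmailAddress` point: brands can express that subtype relation at compile time, even though, as noted, nothing about it survives to runtime. A sketch with hypothetical names; the length limit and the provider-specific rules are placeholder assumptions, not any vendor's real policy.

```typescript
declare const emailBrand: unique symbol;
declare const authBrand: unique symbol;
declare const billingBrand: unique symbol;

type EmailAddress = string & { readonly [emailBrand]: true };
// Each specialised address is also an EmailAddress, but not vice versa.
type AuthEmailAddress = EmailAddress & { readonly [authBrand]: true };
type BillingEmailAddress = EmailAddress & { readonly [billingBrand]: true };

// Minimal base check in the spirit of "length, dots and @"; the rest is left to providers.
function parseEmailAddress(raw: string): EmailAddress | undefined {
  const at = raw.indexOf("@");
  const ok = raw.length <= 254 && at > 0 && raw.slice(at + 1).includes(".");
  return ok ? (raw as EmailAddress) : undefined;
}

// Placeholder provider rules (assumptions for illustration only).
function toAuthEmail(e: EmailAddress): AuthEmailAddress | undefined {
  return e.endsWith(".invalid") ? undefined : (e as AuthEmailAddress);
}
function toBillingEmail(e: EmailAddress): BillingEmailAddress | undefined {
  return e.includes("+") ? undefined : (e as BillingEmailAddress);
}

// Accepts any EmailAddress, including the specialised ones.
function logAddress(e: EmailAddress): string {
  return `address: ${e}`;
}

const base = parseEmailAddress("ada@example.com");
const auth = base === undefined ? undefined : toAuthEmail(base);
if (auth !== undefined) console.log(logAddress(auth)); // an AuthEmailAddress is an EmailAddress
// A BillingEmailAddress cannot be passed where an AuthEmailAddress is required (compile error).
```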