"I think we in the curl project as well as more or less the entire world has learned through the years that it is usually better to be strict when parsing protocols and data, rather than be lenient and try to accept many things and guess what it otherwise maybe meant."
Found this explicit rejection of the Robustness principle[1] fascinating. It comes after decades of cURL operating in the environment that was an ostensible poster child for the benefits of the principle--i.e., HTML over HTTP.
The robustness principle is locally optimal. If you want your software to not crash for users, then yes you should just silently correct weird inputs and you should make sure your outputs are following everyone else's happy paths. If you want a globally optimal ecosystem of reliable and predictable behaviour then you want everyone rejecting non-conforming inputs and outputing data that hits all the edge cases of the formats to shake out non-compliant servers.
The more experienced I get, the more I've started to think that most of the 'principals', 'patterns' and 'best practices' tossed around in the industry are mostly bullshit.
Be attentive to the classes of bugs you (and your team) produce, and act accordingly to correct those.
I think it's been a commonly held opinion in security circles for at least 15+ years that the Robustness principle is generally counterproductive to security. It (almost inevitably) leads to unexpected interactions between different systems which, ultimately, allow for Weird Machines to be constructed.
An argument can be made that it was instrumental in bootstrapping the early Internet, but it's not really necessary these days. People should know what they're doing 35+ years on.
It is usually better to just state fully formally up front what is acceptable and reject anything else out of hand. Of course some stuff does need dynamic checks, e.g. ACLs and such, but that's fine... rejecting "iffy" input before we get to that stage doesn't interfere with that.
> I think it's been a commonly held opinion in security circles for at least 15+ years that the Robustness principle is generally counterproductive to security
Well yes, that's because people have been misapplying and misunderstanding it. The original idea was predicated on the concept of "assume that the network is filled with malevolent entities that will send in packets designed to have the worst possible effect"
But then the Fail Fast, Fail Often stupidity started spreading like wildfire and companies realized that the consequence for data breaches or other security failures was an acceptable cost of doing business (even if not always true) vs the cost of actually paying devs and sec teams to implement things properly and people kinda lost the plot on it. They just focused on the "be liberal in what you accept" part, went "Wow! That makes thing easy" and maybe only checked for the most common potential abuses/failure/exploit modes, if they bothered at all and only patched things retroactively as issues and exploits popped up in the wild.
Doing it correctly, like building anything robust and/or secure, is a non-trivial task.
Somewhat related, if you are on a C++ project, please consider std::from_chars. It's non-allocating and non-throwing. Works with non-NULL terminated strings.
The strtoul()/strtoull() also have a somewhat strange semantics regarding the leading '-': it will apply it to the (unsigned) result, so e.g strtoul("-40", ...) happily returns 18446744073709551576.
Also, the wording of the standard suggests that using strtol()/strtoll() to parse the string representation of LONG_MIN/LLONG_MIN is UB, since it kinda has to go through un-negated LONG_MAX+1/LLONG_MAX+1 which can't be represented in the return type?
"I think we in the curl project as well as more or less the entire world has learned through the years that it is usually better to be strict when parsing protocols and data, rather than be lenient and try to accept many things and guess what it otherwise maybe meant."
Found this explicit rejection of the Robustness principle[1] fascinating. It comes after decades of cURL operating in the environment that was an ostensible poster child for the benefits of the principle--i.e., HTML over HTTP.
[1] https://en.wikipedia.org/wiki/Robustness_principle
The robustness principle is locally optimal. If you want your software not to crash for users, then yes, you should silently correct weird inputs and make sure your outputs follow everyone else's happy paths. If you want a globally optimal ecosystem of reliable and predictable behaviour, then you want everyone rejecting non-conforming inputs and outputting data that hits all the edge cases of the formats, to shake out non-compliant servers.
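As a toy sketch of that trade-off (nothing curl-specific, just standard library calls; the strict_parse_long helper is mine): a lenient, Postel-style parser guesses a value out of malformed input, while a strict parser refuses anything that is not fully well-formed.

    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>

    // Strict: accept only a complete, in-range decimal number; reject all else.
    static bool strict_parse_long(const char *s, long *out) {
        char *end = nullptr;
        errno = 0;
        *out = std::strtol(s, &end, 10);
        return errno == 0 && end != s && *end == '\0';  // every byte consumed
    }

    int main() {
        long v = 0;
        // Lenient: atoi("42abc") quietly guesses 42 and hides the error.
        std::printf("lenient: %d\n", std::atoi("42abc"));
        std::printf("strict:  %s\n",
                    strict_parse_long("42abc", &v) ? "accepted" : "rejected");
    }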
I disagree with the robustness principle. Be strict in what you accept - require senders to meet the spec.
Being liberal in what you accept is fine, as long as what you accept is precisely documented. But then, is that actually "being liberal"?
Better advice is to not do something unexpected -- even if that unexpected result is clearly documented, but someone did not read it.
The more experienced I get, the more I've started to think that most of the 'principles', 'patterns' and 'best practices' tossed around in the industry are bullshit.
Be attentive to the classes of bugs you (and your team) produce, and act accordingly to correct those.
I think it's been a commonly held opinion in security circles for at least 15+ years that the Robustness principle is generally counterproductive to security. It (almost inevitably) leads to unexpected interactions between different systems which, ultimately, allow for Weird Machines to be constructed.
An argument can be made that it was instrumental in bootstrapping the early Internet, but it's not really necessary these days. People should know what they're doing 35+ years on.
It is usually better to formally state up front what is acceptable and reject anything else out of hand. Of course some stuff does need dynamic checks, e.g. ACLs and such, but that's fine... rejecting "iffy" input before we get to that stage doesn't interfere with that.
> I think it's been a commonly held opinion in security circles for at least 15+ years that the Robustness principle is generally counterproductive to security
Well yes, that's because people have been misapplying and misunderstanding it. The original idea was predicated on the concept of "assume that the network is filled with malevolent entities that will send in packets designed to have the worst possible effect".
But then the Fail Fast, Fail Often stupidity started spreading like wildfire, and companies realized that the consequence of data breaches or other security failures was an acceptable cost of doing business (even if that wasn't always true) vs the cost of actually paying dev and sec teams to implement things properly, and people kinda lost the plot on it. They just focused on the "be liberal in what you accept" part, went "Wow! That makes things easy", maybe only checked for the most common potential abuse/failure/exploit modes (if they bothered at all), and only patched things retroactively as issues and exploits popped up in the wild.
Doing it correctly, like building anything robust and/or secure, is a non-trivial task.
Quick link to the code: https://github.com/curl/curl/blob/3d42510118a9eba12a0d3cd4e2...
Somewhat related: if you are on a C++ project, please consider std::from_chars. It's non-allocating and non-throwing, and it works with strings that are not null-terminated.
https://mobiarch.wordpress.com/2022/12/12/string-to-number-c...
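A minimal sketch of what that looks like in practice (C++17; the parse_u64 wrapper name is mine, not from the linked post):

    #include <charconv>
    #include <cstdint>
    #include <cstdio>
    #include <optional>
    #include <string_view>

    // from_chars works on a [first, last) range, so a slice of a larger
    // buffer can be parsed with no copy, no allocation, and no exceptions.
    static std::optional<std::uint64_t> parse_u64(std::string_view s) {
        std::uint64_t value = 0;
        auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
        if (ec != std::errc{} || ptr != s.data() + s.size())
            return std::nullopt;  // error, overflow, or trailing garbage
        return value;
    }

    int main() {
        std::string_view buf = "1234,5678";   // parse a non-terminated slice
        if (auto v = parse_u64(buf.substr(0, 4)))
            std::printf("%llu\n", (unsigned long long)*v);
        if (!parse_u64("-40"))                // for unsigned types, '-' is rejected
            std::puts("\"-40\" rejected");
    }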
strtoul()/strtoull() also have somewhat strange semantics regarding a leading '-': it is applied to the (unsigned) result, so e.g. strtoul("-40", ...) happily returns 18446744073709551576.
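A quick check of that behaviour (assuming a 64-bit platform where ULONG_MAX is 2^64-1):

    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        errno = 0;
        char *end = nullptr;
        unsigned long v = std::strtoul("-40", &end, 10);
        // The '-' is accepted: the digits are converted, then the result is
        // negated in the unsigned return type, so this prints
        // 18446744073709551576 (i.e. ULONG_MAX - 39) and errno stays 0.
        std::printf("value = %lu, errno = %d\n", v, errno);
    }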
Also, the wording of the standard suggests that using strtol()/strtoll() to parse the string representation of LONG_MIN/LLONG_MIN is UB, since it kinda has to go through un-negated LONG_MAX+1/LLONG_MAX+1 which can't be represented in the return type?
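For what it's worth, a quick probe of that edge case (common implementations such as glibc accumulate the magnitude in unsigned arithmetic internally and return LONG_MIN with no error; the worry above is about the standard's wording, not observed behaviour):

    #include <cerrno>
    #include <climits>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        errno = 0;
        char *end = nullptr;
        long v = std::strtol("-9223372036854775808", &end, 10);  // LONG_MIN on LP64
        // Typical result: v == LONG_MIN and errno == 0, even though a literal
        // "convert, then negate" reading would need LONG_MAX + 1 in a long.
        std::printf("v == LONG_MIN: %d, errno = %d\n", v == LONG_MIN, errno);
    }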
I find the handling of "-" (and "+") on an unsigned integer utterly bizarre.
Words no longer have meaning.