Show HN: Cicada – A scripting language that integrates with C

(github.com)

57 points | by briancr 3 days ago

32 comments

smartmic 3 days ago
Cool, I like these kinds of projects. When it comes to embedding a scripting language in C, there are already some excellent options: notable ones are Janet, Guile, and Lua. Tcl is also worth considering. My personal favorite is still Janet[0]. Others?
[0]: https://janet-lang.org/
- forgotpwd16 3 days ago
  Io is nice (Smalltalk/Self-like). A mostly comprehensive list: https://dbohdan.github.io/embedded-scripting-languages/
  - publicdebates 3 days ago
    That list (or any similar list) would be so helpful if it had a health column, something that takes into account number of contributors, time since last commit, number of forks, number of commits, etc. So many projects are effectively dead but it's not obvious at first sight, and it takes 2 or 3 whole minutes to figure out. That seems short but it adds up when evaluating a project, causing people to just go to a well known solution like Lua (and why not? Lua is just fine; in fact it's great).
    - briancr 3 days ago
      Seconded.
  - briancr 3 days ago
    Should have replied directly. Thanks! That’s a great list.
- dualogy 3 days ago
  AngelScript. Mature and maintained since 2003, fully typed, with C-like syntax. https://www.angelcode.com/angelscript/
  - briancr 3 days ago
    Yes very C-like.. One immediate difference is that in these C-like scripting languages there’s a split between definitions and executable commands. In Cicada there are only executable commands: definitions are done using a define operator. (That’s because everything is on the heap; Cicada functions don’t have access to the stack). I personally think the latter method makes more sense for command-line interactivity, but that’s a matter of taste.
- briancr 3 days ago
  Thanks! I’m unfamiliar with Janet but I’ve looked into the others you listed.
  One personal preference is that a scripting syntax be somewhat ‘C-like’, which might argue for a straight embedded-C implementation, although I think that involves some compromises.
- zem 3 days ago
  squirrel: http://squirrel-lang.org/
  - briancr 3 days ago
    Yes I like this one. It’s similar and even more C-like, in that it discriminates between classes, class instances, functions, methods vs constructors, etc. (Cicada does not).
publicdebates 3 days ago
The for loop is odd. Why is the word counter in there twice?
```
    counter :: int

    for counter in <1, 10-counter> (
       print(counter)
       print(" ")
    )
```
Using backfor to count backwards is an odd choice. Why not overload for?
```
    backfor counter in <1, 9> print(counter, " ")
```
This is confusing to me. Maybe I'm misunderstanding the design principles, but the syntax seems unintuitive.
- briancr 3 days ago
  Yeah this is why the syntax is customizable.. maybe it’s not optimal.
  The example I gave was strange and I’ll have to change it. Not sure what I was trying to show there. The basic syntax is just:
  for counter in <1, 5> print(counter)
  backfor counter in <1, 5> print(counter)
  It’s not overloaded because ‘for’ is basically a macro, expanding to ‘iterate, increment counter, break on counter > 5’ where ‘>’ is hard-coded. If ‘for’ were a fundamental operator then yes, there would be a step option and it would be factored into the exit condition.
  You’ve got me thinking, there’s probably a way to overload it even as a macro.. hmmm…
  - nextaccountic 3 days ago
    Just do for counter in <1, 5>.rev(), which would iterate in a reversed range.
    IMO it's pointless to distinguish syntactically between iterating forwards and backwards, especially if you also support things like for counter in <1, 5>.map({ return args[1] * 2 }) to iterate over even numbers (the double of each number), rather than having to define a fordoubled macro. I mean, adding methods like map and rev to ranges is more orthogonal and composes better. (See for example iterators in Rust.)
    Not that I don't like syntactic flexibility. I am a big fan of Ruby's unless, for example.
    - briancr 2 days ago
      “IMO it's pointless to distinguish syntactically between iterating forwards and backwards” — I completely agree. It’s really a compiler-macro limitation that’s preventing me from doing this.. though I don’t have to go that route.
      I think what you’re suggesting would require the <a, b> syntax to produce a proper iterator type, which it doesn’t currently do. That’s definitely worth considering — then you could attach methods, etc.
      Thanks for the suggestion! I’ll think about the best way to fix this..
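A rough sketch of the iterator idea from this subthread, written in Python rather than Cicada. Everything here is an assumption made for illustration: the `Range` class and its `rev` and `map` methods are hypothetical names, and the inclusive bounds are inferred from the `<1, 5>` examples above. It only shows how one range type with composable methods could replace separate `for` and `backfor` keywords.

```python
from typing import Callable, Iterator, Optional


class Range:
    """Hypothetical inclusive integer range, standing in for Cicada's <a, b>."""

    def __init__(self, first: int, last: int, reverse: bool = False,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self.first = first
        self.last = last
        self.reverse = reverse
        self.fn = fn

    def rev(self) -> "Range":
        # Same values, opposite order; the mapping function is kept as-is.
        return Range(self.first, self.last, not self.reverse, self.fn)

    def map(self, fn: Callable[[int], int]) -> "Range":
        # Compose the new function on top of any earlier mapping.
        prev = self.fn
        composed = fn if prev is None else (lambda x: fn(prev(x)))
        return Range(self.first, self.last, self.reverse, composed)

    def __iter__(self) -> Iterator[int]:
        values = range(self.first, self.last + 1)  # inclusive upper bound
        ordered = reversed(values) if self.reverse else values
        for v in ordered:
            yield v if self.fn is None else self.fn(v)


# One loop construct covers forward, backward, and mapped iteration.
print(list(Range(1, 5)))                          # [1, 2, 3, 4, 5]
print(list(Range(1, 5).rev()))                    # [5, 4, 3, 2, 1]
print(list(Range(1, 5).map(lambda x: x * 2)))     # [2, 4, 6, 8, 10]
```

Because `rev` and `map` each return a new `Range`, they can be chained in either order, which is the composability point raised above.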
codr7 3 days ago
Nice, the more the merrier!
I've been working on one for Kotlin lately:
https://gitlab.com/codr7/shik
- briancr 3 days ago
  Very cool! I’ve never used Kotlin..
briancr 3 days ago
Thanks for the references! Writing a language was almost an accident — I worked on a neural networks tool with a scripted interface back around 2000, before I’d ever heard of some of these other languages.. and I’ve been using/updating it ever since.
Beyond NNs, my use case is to embed fast C calculations into the language to make scientific programming easier. But the inspiration was less about the use case and more about certain programming innovations which I’m sure exist elsewhere, though I’m not sure where: aliases, callable function arguments, generalized inheritance, etc.
That’s a great list — most of those languages I’ve honestly never heard of..
nextaccountic 3 days ago
> Uses aliases not pointers, so it's memory-safe
How does it deal with use after free? How does it deal with data races?
Memory safety can't be solved by just eliminating pointer arithmetic; more is needed to achieve it.
- briancr 3 days ago
  There’s no multithreading so race conditions don’t apply. That simplifies things quite a bit.
  There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?
  I agree that my earlier statement wasn’t quite a complete explanation.
  Of course, since it interfaces with C, it’s easy to overwrite memory in the callback functions.
  - nextaccountic 3 days ago
    I mean, that's a neat tradeoff, however..
    > There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?
    Does this mean that Cicada will happily and wildly leak memory if I allocate short lived objects in a loop?
    Why don't you just add some reference counting or tracing GC like everybody else?
    > 1) members can be removed;
    Does this cause a use after free if somebody had access to this member? Or will it give an error during access?
    - briancr 2 days ago
      No, there are both reference-based and tracing-based GC routines that will deallocate short-lived objects. Sorry, I was just trying to enumerate the ways memory goes out of scope to show that none of those ways results in an invalid pointer _within the scripting language_.
      The safety comes because there is no way to access a pointer address within the scripting language. The main functionality of pointers is replaced by aliases (e.g. a = @b.c, a = @array[2], etc.). The only use of pointers is behind the scenes, e.g. when you write ‘b.c’ there is of course pointer arithmetic to find the data in member ‘b’.
      Having said that, it is certainly possible for a C callback routine to store an internal pointer, then on a second callback try to use that pointer after it has fallen out of scope. This is the only use-after-free I can imagine.
      - nextaccountic 2 days ago
        Okay, this is the usual way to perform safe memory management in managed / high level programming languages.. it was just that your "alias" terminology threw me off
        Note that you can add multithreading later if you adopt a message-passing / actor model. Even JavaScript, which is famously single threaded, gained workers with message passing at some point.
        - briancr 2 days ago
        Yes, multithreading seems to be a consistent theme among the comments.. so I should definitely look into that. Thanks for the comment. (I actually haven’t done much threaded programming myself so this would be a learning experience for me..)
    - briancr 2 days ago
      Also, if someone else has access to the member, meaning that there is an alias to the member, then the reference count should reflect that. Here’s an example:
      i :: int | 1 reference
      a := @i | 2 references
      remove i | 1 reference
      The data originally allocated for ‘i’ should persist because its reference count hasn’t hit zero yet.
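The reference-count behaviour in the three-line example above can be mimicked in a toy model. This is a hedged Python sketch, not Cicada's implementation: `Store`, `define`, `alias`, `remove`, `refs`, and `read` are all hypothetical names, and the model covers only counting, not the tracing-based collection mentioned earlier in the thread.

```python
class Store:
    """Toy model: member names refer to shared data cells with a reference count."""

    def __init__(self):
        self.members = {}   # member name -> cell id
        self.cells = {}     # cell id -> [value, reference count]
        self._next_id = 0

    def _release(self, name):
        # Drop one reference; free the cell only when nothing refers to it.
        cell = self.members.pop(name)
        self.cells[cell][1] -= 1
        if self.cells[cell][1] == 0:
            del self.cells[cell]

    def define(self, name, value):
        if name in self.members:
            self._release(name)
        cell = self._next_id
        self._next_id += 1
        self.cells[cell] = [value, 1]
        self.members[name] = cell

    def alias(self, name, target):
        cell = self.members[target]
        self.cells[cell][1] += 1      # count the new reference first
        if name in self.members:      # re-aliasing releases the old target
            self._release(name)
        self.members[name] = cell

    def remove(self, name):
        self._release(name)

    def refs(self, name):
        return self.cells[self.members[name]][1]

    def read(self, name):
        return self.cells[self.members[name]][0]


store = Store()
store.define("i", 0)    # i :: int   | 1 reference
store.alias("a", "i")   # a := @i    | 2 references
store.remove("i")       # remove i   | 1 reference
print(store.refs("a"), store.read("a"))  # prints "1 0"
```

In this model the data cell outlives the removed member because the alias still holds a reference, matching the description above; the cell is freed only when the last name referring to it is removed or re-aliased.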
tayistay 3 days ago
Can I call into the interpreter from multiple threads or does it use global state?
- briancr 3 days ago
  There’s no multithreading capability built into Cicada. So a given instance of the interpreter only has a single concurrent state, and all C callbacks share memory with that global state. Multithreading would require a C-based thread manager.
eps 3 days ago
What's the use case? Clearly, you made it with some specific use in mind, at least initially. What was it?
- briancr 3 days ago
  To be more specific (see my general comment), I’ve used the language in two open-source projects: 1) a chromosome conformation reconstruction tool, and 2) a fast neural network generator (back end). Re Project 2: I’m also planning to embed the language into results webpages served from the NN generator website.
languagehacker 3 days ago
I've lost count of projects called Cicada
- publicdebates 3 days ago
  A new one seems to pop up every year, and some every 13 or 17 years.
  - briancr 3 days ago
    This one’s Brood VI!
- briancr 3 days ago
  I know, I was dismayed to find out that there’s even another scripting language called Cicada.
  The name came when I was living in Seattle and missed the sounds of east coast summer..