Cool, I like these kinds of projects. When it comes to embedding a scripting language in C, there are already some excellent options: Notable ones are Janet, Guile, and Lua. Tcl is also worth considering. My personal favorite is still Janet[0]. Others?
[0]: https://janet-lang.org/
Io is nice (Smalltalk/Self-like). A mostly comprehensive list: https://dbohdan.github.io/embedded-scripting-languages/
That list (or any similar list) would be so helpful if it had a health column, something that takes into account number of contributors, time since last commit, number of forks, number of commits, etc. So many projects are effectively dead but it's not obvious at first sight, and it takes 2 or 3 whole minutes to figure out. That seems short but it adds up when evaluating a project, causing people to just go to a well known solution like Lua (and why not? Lua is just fine; in fact it's great).
Seconded.
Should have replied directly, thanks! That’s a great list..
AngelScript. Mature and maintained since 2003, fully typed, with C syntax. https://www.angelcode.com/angelscript/
Yes very C-like.. One immediate difference is that in these C-like scripting languages there’s a split between definitions and executable commands. In Cicada there are only executable commands: definitions are done using a define operator. (That’s because everything is on the heap; Cicada functions don’t have access to the stack). I personally think the latter method makes more sense for command-line interactivity, but that’s a matter of taste.
Thanks! I’m unfamiliar with Janet but I’ve looked into the others you listed.
One personal preference is that a scripting syntax be somewhat ‘C-like’.. which might recommend a straight C embedded implementation although I think that makes some compromises.
squirrel: http://squirrel-lang.org/
Yes I like this one. It’s similar and even more C-like, in that it discriminates between classes, class instances, functions, methods vs constructors, etc. (Cicada does not).
The for loop is odd. Why is the word counter in there twice?
Using backfor to count backwards is an odd choice. Why not overload for? This is confusing to me. Maybe I'm misunderstanding the design principles, but the syntax seems unintuitive.
Yeah this is why the syntax is customizable.. maybe it’s not optimal.
The example I gave was strange and I’ll have to change it. Not sure what I was trying to show there. The basic syntax is just:
for counter in <1, 5> print(counter)
backfor counter in <1, 5> print(counter)
It’s not overloaded because ‘for’ is basically a macro, expanding to ‘iterate, increment counter, break on counter > 5’ where ‘>’ is hard-coded. If ‘for’ was a fundamental operator then yes, there would be a step option and it would be factored into the exit condition.
You’ve got me thinking, there’s probably a way to overload it even as a macro.. hmmm…
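To make the macro point concrete, here is a rough Python model (not Cicada code) of a loop that expands to "iterate, increment counter, break on counter > 5", next to the mirrored expansion a backwards loop would need. The names expanded_for and expanded_backfor are made up for illustration, and the real expansion inside Cicada may well look different.

```python
def expanded_for(first, last, body):
    """Model of 'for': increment, with the exit test hard-coded as counter > last."""
    counter = first
    while True:
        if counter > last:    # the hard-coded '>' exit condition
            break
        body(counter)
        counter += 1


def expanded_backfor(first, last, body):
    """Model of 'backfor': decrement, with the exit test hard-coded as counter < first."""
    counter = last
    while True:
        if counter < first:   # the mirrored, equally hard-coded test
            break
        body(counter)
        counter -= 1


forward, backward = [], []
expanded_for(1, 5, forward.append)
expanded_backfor(1, 5, backward.append)
print(forward)    # [1, 2, 3, 4, 5]
print(backward)   # [5, 4, 3, 2, 1]
```

The two expansions differ only in the comparison and the sign of the step, which is why a single hard-coded comparison forces two keywords.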
Just do for counter in <1, 5>.rev(), which would iterate in a reversed range.
IMO it's pointless to distinguish syntactically between iterating forwards and backwards, especially if you also support things like for counter in <1, 5>.map({ return args[1] * 2 }) to iterate on even numbers (the double of each number), rather than having to define a fordoubled macro. I mean, adding methods like map and rev to ranges is more orthogonal and composes better. (See for example iterators in Rust.)
Not that I don't like syntactic flexibility. I am a big fan of Ruby's unless, for example
“IMO it's pointless to distinguish syntactically between iterating forwards and backwards” — I completely agree. It’s really a compiler-macro limitation that’s preventing me from doing this.. though I don’t have to go that route.
I think what you’re suggesting would require the <a, b> syntax to produce a proper iterator type, which it doesn’t currently do. That’s definitely worth considering — then you could attach methods, etc.
Thanks for the suggestion! I’ll think about the best way to fix this..
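For what the "proper iterator type" idea could look like, here is a small Python sketch, assuming <a, b> is an inclusive integer range. The Range class and its step and transform fields are hypothetical; only the method names rev and map come from the suggestion above.

```python
class Range:
    """Inclusive integer range, a stand-in for what <first, last> could produce."""

    def __init__(self, first, last, step=1, transform=None):
        self.first, self.last, self.step = first, last, step
        self.transform = transform  # optional function applied to each value

    def rev(self):
        # Same bounds, walked from the other end.
        return Range(self.last, self.first, -self.step, self.transform)

    def map(self, fn):
        # Compose with any transform that is already attached.
        prev = self.transform
        composed = fn if prev is None else (lambda x: fn(prev(x)))
        return Range(self.first, self.last, self.step, composed)

    def __iter__(self):
        value = self.first
        # The exit test follows the direction, so it is no longer hard-coded.
        while (value <= self.last) if self.step > 0 else (value >= self.last):
            yield value if self.transform is None else self.transform(value)
            value += self.step


print(list(Range(1, 5)))                              # [1, 2, 3, 4, 5]
print(list(Range(1, 5).rev()))                        # [5, 4, 3, 2, 1]
print(list(Range(1, 5).map(lambda x: x * 2)))         # [2, 4, 6, 8, 10]
print(list(Range(1, 5).rev().map(lambda x: x * 2)))   # [10, 8, 6, 4, 2]
```

Because rev and map each return a new Range, they chain in any order and no extra loop keyword is needed.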
Nice, the more the merrier!
I've been working on one for Kotlin lately:
https://gitlab.com/codr7/shik
Very cool! I’ve never used Kotlin..
Thanks for the references! Writing a language was almost an accident — I worked on a neural networks tool with a scripted interface back around 2000, before I’d ever heard of some of these other languages.. and I’ve been using/updating it ever since.
Beyond NNs, my use case is to embed fast C calculations into the language to make scientific programming easier. But the inspiration was less about the use case and more about certain programming innovations which I’m sure exist elsewhere, though I’m not sure where: aliases, callable function arguments, generalized inheritance, etc.
That’s a great list — most of those languages I’ve honestly never heard of..
> Uses aliases not pointers, so it's memory-safe
How does it deal with use after free? How does it deal with data races?
Memory safety can't be solved just by eliminating pointer arithmetic; more is needed to achieve it.
There’s no multithreading so race conditions don’t apply. That simplifies things quite a bit.
There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?
I agree that my earlier statement wasn’t quite a complete explanation.
Of course, since it interfaces with C, it’s easy to overwrite memory in the callback functions.
I mean, that's a neat tradeoff, however..
> There’s actually no ‘free’, but in the (member -> variable data) ontology of Cicada there are indeed a few ways memory can become disused: 1) members can be removed; 2) members can be re-aliased; 3) arrays or lists can be resized. In those conditions the automated/manual collection routines will remove the disused memory, and in no case is there any dangling ‘pointer’ (member or alias) pointing to unallocated memory. Does this answer your question?
Does this mean that Cicada will happily and wildly leak memory if I allocate short lived objects in a loop?
Why don't you just add some reference counting or tracing GC like everybody else?
> 1) members can be removed;
Does this cause a use after free if somebody had access to this member? Or will it give an error during access?
No, there are both reference-based and tracing-based GC routines that will deallocate short-lived objects. Sorry, I was just trying to enumerate the ways memory goes out of scope to show that none of those ways results in an invalid pointer _within the scripting language_.
The safety comes because there is no way to access a pointer address within the scripting language. The main functionality of pointers is replaced by aliases (e.g. a = @b.c, a = @array[2], etc.). The only use of pointers is behind the scenes, e.g. when you write ‘b.c’ there is of course pointer arithmetic behind the scenes to find the data in member ‘b’.
Having said that, it is certainly possible for a C callback routine to store an internal pointer, then on a second callback try to use that pointer after it has fallen out of scope. This is the only use-after-free I can imagine.
Okay, this is the usual way to perform safe memory management in managed / high level programming languages.. it was just that your "alias" terminology threw me off
Note that you can add multithreading later if you adopt a message passing / actor model. Even JavaScript, which is famously single threaded, gained workers with message passing at some point.
Yes, multithreading seems to be a consistent theme among the comments.. so I should definitely look into that. Thanks for the comment. (I actually haven’t done much threaded programming myself so this would be a learning experience for me..)
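As a rough illustration of the message-passing idea, here is a Python sketch in which one worker thread is the sole owner of the interpreter state and every other thread talks to it through a queue. The state dict is only a stand-in for a single-threaded interpreter's globals; the names interpreter_worker and send_add are hypothetical, and nothing here reflects how Cicada itself would do it.

```python
import queue
import threading


def interpreter_worker(inbox):
    """Sole owner of the interpreter state; no other thread touches it."""
    state = {"total": 0}   # stand-in for a single-threaded interpreter's globals
    while True:
        message = inbox.get()
        if message is None:         # sentinel: shut down
            break
        amount, reply = message
        state["total"] += amount    # only this thread ever mutates state
        reply.put(state["total"])


def send_add(inbox, amount):
    """Post a request from any thread and wait for the answer."""
    reply = queue.Queue(maxsize=1)
    inbox.put((amount, reply))
    return reply.get()


def client(inbox):
    """A client thread that sends ten requests."""
    for _ in range(10):
        send_add(inbox, 1)


inbox = queue.Queue()
worker = threading.Thread(target=interpreter_worker, args=(inbox,))
worker.start()

clients = [threading.Thread(target=client, args=(inbox,)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()

final_total = send_add(inbox, 0)   # read the total back through the same channel
inbox.put(None)
worker.join()
print(final_total)   # 40
```

No locks are needed around the state because requests are handled one at a time, in the order the queue delivers them.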
Also, if someone else has access to the member, meaning that there is an alias to the member, then the reference count should reflect that. Here’s an example:
i :: int | 1 reference
a := @i | 2 references
remove i | 1 reference
The data originally allocated for ‘i’ should persist because its reference count hasn’t hit zero yet.
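The same three steps can be modelled in a few lines of Python, assuming a plain reference count per block of data. The Data and Scope classes are hypothetical and say nothing about Cicada's actual internals; they only show why the storage outlives the removed member.

```python
class Data:
    """A block of storage with a reference count."""

    def __init__(self, value):
        self.value = value
        self.refs = 0
        self.freed = False


class Scope:
    """Maps member names to Data (hypothetical model, not Cicada's internals)."""

    def __init__(self):
        self.members = {}

    def define(self, name, value):
        data = Data(value)
        self._bind(name, data)
        return data

    def alias(self, name, target):
        # The new member shares the target's storage instead of copying it.
        self._bind(name, self.members[target])

    def remove(self, name):
        data = self.members.pop(name)
        data.refs -= 1
        if data.refs == 0:
            data.freed = True   # storage is reclaimed only at zero references

    def _bind(self, name, data):
        self.members[name] = data
        data.refs += 1


scope = Scope()
data = scope.define("i", 0)    # i :: int   -> 1 reference
scope.alias("a", "i")          # a := @i    -> 2 references
scope.remove("i")              # remove i   -> 1 reference
print(data.refs, data.freed)   # 1 False
scope.remove("a")
print(data.refs, data.freed)   # 0 True
```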
Can I call into the interpreter from multiple threads or does it use global state?
There’s no multithreading capability built into Cicada. So a given instance of the interpreter only has a single concurrent state, and all C callbacks share memory with that global state. Multithreading would require a C-based thread manager.
What's the use case? Clearly, you made it with some specific use in mind, at least initially. What was it?
To be more specific (see my general comment), I’ve used the language in two open-source projects: 1) a chromosome conformation reconstruction tool, and 2) a fast neural network generator (back end). Re Project 2: I’m also planning to embed the language into results webpages served from the NN generator website.
I've lost count of projects called Cicada
A new one seems to pop up every year, and some every 13 or 17 years.
This one’s Brood VI!
I know, I was dismayed to find out that there’s even another scripting language called Cicada.
The name came when I was living in Seattle and missed the sounds of east coast summer..