Here are some measurement results based on the Are-we-fast-yet benchmark suite: https://github.com/rochus-keller/Are-we-fast-yet/blob/main/L...
Luau in interpreter mode is pretty much as fast as LuaJIT 2.1 in interpreter mode.
Luau with (partial) native compilation is a factor of 1.6 slower than LuaJIT 2.1 in JIT mode. I used Luau with the -g0 -O2 --codegen options (without adding --!native to the code), which according to my understanding automatically selects the "profitable" functions for native compilation.
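For anyone wanting to reproduce this, the invocation presumably looked something like the following (a sketch assembled from the flags named above; benchmark.lua is a placeholder file name):

    luau -g0 -O2 --codegen benchmark.lua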
The thing that sticks out at me most in that table is "Mandelbrot" being such an outlier; has the LuaJIT implementation been checked over?
Looking at the code, the Mandelbrot benchmark seems to have a version switch, so does that mean LuaJIT is going down the < 5.3 path?
( Sorry, this isn't my area of expertise, I'm just trying to make sense of the table! )
> has the LuaJIT implementation been checked over
Just re-checked that I inserted the Luau Mandelbrot results in the correct cell.
> does that mean LuaJIT is going down the < 5.3 path?
Yes.
Thank you, I kept waiting for a chart or some numbers that never came. As usual, we are talking about an orders-of-magnitude difference compared to actually high-performing code. Another word for that is "slow". Just worlds apart in expectations.
Of course the lesson is when it comes to performance, it's extremely hard to make up with tuning what you lose in language design. You can optimize the work all you want but nothing beats designing it so that you don't have to do a good chunk of it in the first place.
Asking as a newbie in this area, could you share any pointers to language design for performance?
I'm aware of the basic difference between compiled and interpreted languages. Luau has to be interpreted to meet its security goals, and I'm asking with similar goals in mind, so I guess I'm starting from that significant limitation.
Lua gets some perf from simple value types that can represent lots of things without pointers. Truthiness is also fast, since only the nil/false singletons are falsy, whereas Python has __bool__. But look at the metatable machinery for how much Lua has to check.
All of these introduce guards, whether with a JIT or with inline caches; it's preferable to have no guard at all.
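For concreteness, a minimal Lua sketch of both points (the vector-ish table is purely illustrative):

    -- Only nil and false are falsy; 0 and "" are truthy, so a truth
    -- test is a cheap tag check with no __bool__-style hook to call.
    if 0 then print("0 is truthy") end
    if "" then print("the empty string is truthy") end

    -- Metatables are where the checks pile up: every a + b on
    -- non-numbers means a lookup for an __add metamethod.
    local v = setmetatable({ x = 1 }, {
      __add = function(a, b) return { x = a.x + b.x } end,
    })
    print((v + v).x)  --> 2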
This isn't unique to dynamic languages: see C++'s std::map, which is forced into a layer of indirection to keep pointers to elements valid across later inserts. Rust, by contrast, doesn't allow a borrow to live past that point, and Go doesn't allow taking the address of a map value at all.
Other examples: C optimizations having to worry about pointer aliasing, or Go interfaces having to box everything. Go used to let small value types avoid boxing in interface values, but dropped that when switching to a precise GC.
I was actually surprised to see nearly a factor ten between C99 and LuaJIT. In previous measurements (on x86, see e.g. https://github.com/rochus-keller/Are-we-fast-yet/blob/main/L...) there was rather a factor five. So either GCC 12.2 produces much faster code than GCC 4.8, or LuaJIT 2.1 got much slower, or the C99 version of Are-we-fast-yet is much better supported by the CPU cache of the T480 than my previous EliteBook 2530. I don't think that the x86 vs x86_64 makes such a difference (at least I didn't observe this in many other experiments).
I’ve always been curious how Roblox games are deployed and managed. Is each instance of a game executed in a Docker container, with the Luau code isolated that way, or is there some multi-tenant solution?
They run the game servers in Docker. Doing multi-tenant is a weaker security boundary and makes it easier to steal places from other users, which Roblox takes pretty seriously when places represent all the time invested by game studios and millions of dollars in revenue.
running multiple game servers in docker is a multi-tenant environment, because docker is not a serious security boundary unless you're applying significant kernel hardening to your kconfig to the tune of grsecurity patches or similar
How is this cost-effective though? There are a lot of low-quality games not made by big studios. Do these also get a dedicated Docker container?
I haven't used Roblox but Lua has the ability to create sandboxes to run user code. You expose only the functionality you allow to the user code, usually block I/O, and any unsafe functions. https://luau.org/sandbox
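A minimal sketch of that idea in plain Lua 5.2+, using load with an explicit environment (Luau's actual sandboxing mechanism differs, see the linked page):

    -- Only whitelisted functionality is visible to the untrusted chunk.
    local env = {
      print = print,   -- allow output
      math = math,     -- pure functions
      -- deliberately no io, os, load, require, ...
    }

    local untrusted = [[
      print(math.sqrt(2))
      -- io.open("secret.txt") would fail here: io is nil
    ]]

    -- "t" restricts the chunk to source text (no precompiled bytecode)
    local chunk, err = load(untrusted, "user_code", "t", env)
    if chunk then
      print(pcall(chunk))
    else
      print("compile error:", err)
    end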
It's obviously a deliberate choice that this isn't done, but with static modules you could know whether * is overloaded. That would improve procedure calls a lot, almost always. Sure, with polymorphic functions you can get part of the way there using inline caches, but in my experience knowing the callee is always going to be a speedup.
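To illustrate why the check exists at all, a small Lua example (the Vec type is hypothetical):

    -- Because * can be overloaded via __mul, the VM must check the
    -- operands' types/metatables on every multiplication.
    local Vec = {}
    Vec.__index = Vec
    Vec.__mul = function(a, s)
      return setmetatable({ x = a.x * s }, Vec)
    end

    local a = setmetatable({ x = 3 }, Vec)
    print((a * 2).x)  --> 6

    -- If the compiler knew statically that both operands were plain
    -- numbers, a * 2 could compile to a single multiply; without that
    -- knowledge, a guard or inline cache on the types comes first.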
i use luau a lot as part of my Roblox development work, it's pretty fast for its main use case.
there are people a lot more knowledgeable about this topic so i won't pretend to know whether this is possible, but could a versioning flag similar to the --!native flag be added? it would allow for both backwards compatibility and better optimizations, although i know it might add complexity where it's not needed