Fenils Blog

Weeknote 25 2026

Mon, 22 Jun 2026 00:00:00 GMT

Tech

This week feels like a super major one. My decision to leave tech forever got vetoed, and for some reason I feel super relieved. I can, for the near future, throw out my worries about AI and focus back on my work plus study of computer science. I am fully motivated to start a project, it has been a while since I last built one end-to-end. The last one I completed was a JIT interpreter for the brainfuck language. Maybe working on a join operator, just a join operator with Arrow batches as the vector format, and benchmarking it against DataFusion's implementation would be an interesting thing to do. I can start with a basic nested loop join implementation, then move to hash join, adding partitioning, spilling and vectorization as I move forward. AHHH this excites me so much!

I also finally got around to integrating wander console into my website. It is available as a top level link in my header. I visit it from time to time to find interesting pages from people. If you are reading this note, give wander console a try, it really is something which can be cherished by IndieWeb peeps!

Also, if you are actually integrating it, do take care of the wander.js file's Content-Type. I had messed it up, but Susam helpfully pointed it out and I promptly fixed it with some help from Yash.

Non tech

This week Gunwant and I went to Pizza 4Ps. This time we ordered garlic bread as a starter, and for the main pizza, one half was the tried and tested Burrata Salad Pizza and the other half was the feeling-lucky Honey Chilli pizza. Well, I can say for sure I didn't feel lucky after having the second half :(

Image: Garlic Bread Sticks

Image: Burrata and Honey Chilli Pizza

After that Gunwant, Harsh, Yash and I went to watch Backrooms. Having watched the original Backrooms series on YouTube and fallen in love with it instantly, I was pretty stoked when it got picked up by A24, a studio I have grown to absolutely love. They also recruited Kane to make the movie, so he could finally use real people and real sets instead of blender-ing it all out. Suffice to say, it is a must watch if you are a fan of the horror genre or Backrooms in general.

Image: Backrooms movie title screen

Sports

My climbing friends are discussing a Hampi trip. AND I LOVE HAMPI! My last experience there at the end of the previous year was so good, I couldn't think about missing it this time! And hence I ditched my "don't climb, get strong" plans and finally went to the climbing gym this Thursday. It was still going to be a test of how I feel when using my right foot, jumping on different holds or from high above. Luckily all felt good, so this means I am going to continue climbing and training my fingers for the Hampi trip next week! Enjoy a video of a climb I completed off-camera, but couldn't repeat on-camera 😭

https://www.youtube.com/shorts/1NZMz0H1obs?mute=1

Food explore

I tried out Bun Maska at two new places:

For me a good Bun Maska is:

Super soft bun
Proper maska, not just butter

Pure Coffee has the best Bun Maska near Jayanagar metro station at the very least!

As for Poha exploration, this time I tried Maharaja Wada, it was 4/5. Not perfect, but good enough.

Interesting links from around internet

https://www.jvm-weekly.com/p/project-valhalla-explained-how-a
https://www.linkedin.com/posts/andygrove_nice-to-see-aws-labs-publish-a-comparison-share-7472505161546911745-ksQ9/
https://medium.com/@kimth0312/computer-architecture-block-i-o-optimization-18a3f64458e1
https://xkcd.com/3261/
https://xkcd.com/3260/
https://thedailywtf.com/articles/required-fields
https://thedailywtf.com/articles/microbits

Minimal CSS for everything

Wed, 10 Jun 2026 00:00:00 GMT

Today I am going to talk about minimal CSS to make responsive websites. This is the best time to write this down cause I built my website recently, and I fought through a jungle of loosely connected ideas in my brain to make this website responsive and with no horizontal scroll. Please please remove horizontal scrolls from your websites, it is not that hard. Personally, if I come across one, it drives me nuts. And it's not just noobs writing CSS who introduce them, I saw one just yesterday on the GitHub compare page!! A multi-million dollar company, with all the resources in the world, and they can still introduce such bugs. It's still present as of writing this blog i.e. 10-06-2026-06-09 in DD-MM-YYYY, check it out here.

I was doing a lot of web-dev during my college days. Being part of a super active club, we used to conduct a lot of events! And each one required at least a website, with an Android/iOS app being optional. We were always in build mode: one event goes, another one is knocking on the door. You would think, with all this pressure, why not use something like WordPress? Well there are two reasons:

We had some of the best designers, they had wild imaginations and always pushed us to our limits
We were students and we wanted to learn!

This was the pre-LLM era, so all we did was Fuck Around Find Out. Ahh the golden days. But due to all this grinding, we had become super efficient at the base responsiveness of any page. For each project, we would bang out responsive base layouts pretty fast. The hard part was things like rendering a 3D globe (well, this was a library so we escaped), or making CSS animations with canvas, etc. I was involved in almost all the projects either as a frontend guy or a backend guy, sometimes both :P

Around when I took over most of the development, something flipped in the development practices of our team. I had pushed a rule of not using any CSS frameworks. I had recently learnt about flexbox, and it felt magical! I wondered to myself: what have I been doing till this day? Trying to use loads of media queries, z-index, float, display and what not to make the website just barely responsive. Responsiveness was just a weird screen away from breaking. As soon as I learnt about flexbox and built enough intuition, I realized: all these frameworks were helping with was hiding a skill issue. There was zero reason for a pure learning project to use any framework. So they were banned, but this meant literally hand-rolling everything in the whole website, which could be a CTF website or a recruitment portal, anything the next event and design curiosity led to.

The reason I am talking about this today is I recently read an interesting article from Matklad. While I follow his blog posts for systems engineering/low level/PL design, etc, this one stuck with me. No one wants to learn the oddities of CSS, and this is especially true for people who want to control everything about their website but not be bothered with too much framework complexity. I am still that guy! So following his lead, what are the rules I follow and want to remember to make this happen?

With this motivation, I am going to give you a recipe which has worked for me over the years! Let's get started:

Box Sizing

First set:

* { box-sizing: border-box; }

Make all the elements use border-box. By default, CSS does not include padding and border in the width and height you set on an element, so they get added on top. It just feels unintuitive to me, and to Matklad too xD

MDN link
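To make the difference concrete, here is a small sketch of the arithmetic. The class names are made up for illustration:

```css
/* Hypothetical classes, only to compare the two sizing models. */
.content-box-card {
    box-sizing: content-box; /* the CSS default */
    width: 200px;
    padding: 20px;
    border: 5px solid black;
    /* rendered width: 200 + 2*20 + 2*5 = 250px */
}

.border-box-card {
    box-sizing: border-box;
    width: 200px;
    padding: 20px;
    border: 5px solid black;
    /* rendered width: 200px, content area shrinks to 200 - 2*20 - 2*5 = 150px */
}
```

With border-box, the number you type in width is the number you see on screen, which is why percentage widths stop overflowing the moment you add padding.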

Flexbox

display: flex;

Anywhere you want to arrange multiple elements so that they can wrap, be strictly stacked, sit strictly next to each other, have strict space between them or any such scenario, use flexbox. I am not doing proper justice to how important and life changing flexbox is. So, for this one thing, do sit and read the full MDN reference. It is really good! And you will definitely forget it, so next time you will have to come back and look up a few things again xD

Next is:

justify-content: center;
align-items: center;

How to center a child div? SOLVED!

I am not even kidding, this is it. And it's not just this, justify-content has other values like start, end, space-between, space-around and space-evenly. Again, the MDN page gives a good interactive example for these properties.

The same goes for align-items: stretch/center/start/end.

There is one question to ask though: why two properties? Why not just one? Well, for that we will have to understand flexbox a bit, and this explanation will also double as our introduction to the next two flexbox properties. Again, I am only going to touch on just enough to give an idea, for detailed documentation do refer to the official MDN page.
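Putting the three declarations together, a minimal centering sketch could look like this. The .parent class and the min-height are assumptions for illustration, the parent just needs some height for vertical centering to be visible:

```css
/* Hypothetical class name; any parent element works. */
.parent {
    display: flex;
    justify-content: center; /* center children along the main axis (horizontal by default) */
    align-items: center;     /* center children along the cross axis (vertical by default) */
    min-height: 100vh;       /* assumption: give the parent room to center within */
}
```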

Flexbox has two directions along which it lays out its child elements. One is called the main axis and the other is called the cross axis. By default, the main axis is horizontal and the cross axis is vertical. So, if you say:

justify-content: start;
align-items: start;

You can expect all children to be in the top-left corner of your parent element. Or, if you set:

justify-content: end;
align-items: end;

they will be in the bottom-right corner. There are loads of different combinations. You should personally play with these combinations to get the best feel! Once you understand these, it will start to feel like a superpower :)

There is one catch though, the directions of the main and cross axes change according to flex-direction, which can be row, row-reverse, column or column-reverse. The main axis in each case (assuming a left-to-right, top-to-bottom language like English) is:

row: left-to-right
row-reverse: right-to-left
column: top-to-bottom
column-reverse: bottom-to-top
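As a quick sketch of how the axes swap, here are the same two alignment properties under flex-direction: column. The .sidebar class and its height are hypothetical:

```css
/* Hypothetical class; the height is only there so there is free space to align in. */
.sidebar {
    display: flex;
    flex-direction: column; /* main axis now runs top-to-bottom */
    justify-content: end;   /* main axis: children are pushed to the bottom */
    align-items: start;     /* cross axis (now horizontal): children hug the left edge */
    height: 100vh;
}
```

So with column, justify-content controls vertical placement and align-items controls horizontal placement, the opposite of the default row.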

The cross axis is always perpendicular to the main axis, so it is easy to infer. And finally there's flex-grow. It tells an element how big a share of the leftover free space along the main axis it should take. A simple example to understand it would be:

Imagine there's a website with the body set to 100% of the screen space and laid out as a column flex container. At the top there's a header and at the bottom there's a footer, and in the middle you have dynamically sized content. The header has a height of 10% and the footer 5%. Now we can set flex-grow on the middle element to 1, which makes it automatically occupy all the remaining space between them.

In this example one could compute the height of the middle element as 85%, but think of multiple elements of varying size, some fixed, some dynamic, all interacting with each other and inserted dynamically in any order. Having properties like flex-grow helps in that case!
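The header/content/footer example can be sketched roughly like this. It assumes the page uses header, main and footer tags as direct children of body, which is an illustration and not necessarily how any real site is structured:

```css
html, body {
    height: 100%; /* body fills the whole screen */
    margin: 0;
}

body {
    display: flex;
    flex-direction: column; /* stack header, main and footer vertically */
}

header { height: 10%; }
footer { height: 5%; }

main {
    flex-grow: 1; /* take all the free space left between header and footer */
}
```

No 85% anywhere: if the header or footer height changes later, main adjusts on its own.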

There are other properties on flexbox, but I don't remember now if I used any regularly in college, at least I didn't need any in my current website.

Single media query

Just use:

@media (max-width: 480px) { /* all needed CSS */ }

for any mobile phone specific stuff. An important point to note: it is only to be used for things like an image that is 30% wide on desktops but needs to be 50% on phones. DO NOT use it to get responsiveness. Flexbox should have everything you need to achieve that!
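For the image case mentioned above, a minimal sketch would be the following. The .post-image class is a made-up name:

```css
/* Hypothetical class name for an image inside a post. */
.post-image {
    width: 30%; /* desktop and anything wider than 480px */
}

@media (max-width: 480px) {
    .post-image {
        width: 50%; /* phones get a relatively bigger image */
    }
}
```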

Margin on body

For some reason Firefox (not sure about other browsers) had a default margin of 8px on the body?? This caused me so much confusion :(

Setting it to zero removed the horizontal scroll and overflow I was debugging xD

By default just remove it on the html and body tags:

html, body {
    /* body has an 8px margin by default 🤦 */
    margin: 0;
}

From Matklad's blogpost I also learnt about other such non-intuitive defaults browsers set, and hence every website should perform a so-called "CSS Reset": basically a small set of sane properties to keep at the top of the project and start building from that base. One linked by Matklad was this.

Percent over pixels for margin and padding

Percentages scale better across different screen sizes by default, so stop hard-coding pixels! Btw these pixels do not map to the actual pixels of your screen. It's just a logical unit, and the browser decides how it maps to the actual physical pixels!

There could be cases where margin and padding look off on different screen resolutions. I usually keep two sets, one for mobile and one for any other screens.
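A rough sketch of the "two sets" idea, with a hypothetical .content class and made-up numbers. One thing worth knowing: percentage margin and padding are resolved against the width of the containing block, even for top and bottom.

```css
/* Hypothetical class and values, tune them for your own layout. */
.content {
    margin: 0 auto;
    padding: 2% 10%; /* roomy side padding on larger screens */
}

@media (max-width: 480px) {
    .content {
        padding: 4% 5%; /* less side padding so text gets more width on phones */
    }
}
```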

Rem over pixels for font size

Again, don't hard-code font sizes in pixels, just use rem. While it's intuitive, a good read on the same was linked in Matklad's post.
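As a small sketch (the exact sizes are made up), rem is relative to the font size of the root html element, so everything scales together when the user changes their browser's default font size:

```css
html {
    font-size: 100%; /* respect the user's browser default, commonly 16px */
}

h1    { font-size: 2rem; }     /* 2x the root size */
p     { font-size: 1rem; }     /* same as the root size */
small { font-size: 0.875rem; } /* slightly smaller than the root size */
```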

CSS variables for light and dark mode

Use global CSS variables for setting light and dark mode colors and use them with system preferences like this:

:root {
    --bg: #fafafa;
    --fg: #212121;
    --muted: #5a5a5a;
    --logo-backdrop: #1565c0;
    --alt-bg: #e0e0e0;
    --border: #d0d0d0;
    --text-color: #212121;
}

@media (prefers-color-scheme: dark) {
    :root {
        --bg: #212121;
        --fg: #dadada;
        --muted: #a0a0a0;
        --logo-backdrop: #42a5f5;
        --alt-bg: #424242;
        --border: #3a3a3a;
        --text-color: #dadada;
    }
}

This would change theme of the website according to system preferences by default. You can also force it using a button on your website :)
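For completeness, this is roughly how the variables get consumed, plus one possible way to force a theme. The data-theme attribute is an assumption for illustration: something (for example a button with a tiny script) would have to set it on the html element.

```css
/* Use the variables everywhere instead of raw colors. */
body {
    background-color: var(--bg);
    color: var(--text-color);
}

hr {
    border-color: var(--border);
}

/* Hypothetical override: html[data-theme="dark"] forces dark mode
   even when the system preference is light. */
:root[data-theme="dark"] {
    --bg: #212121;
    --text-color: #dadada;
    --border: #3a3a3a;
}
```

The attribute selector makes the override more specific than the plain :root rules, so it wins regardless of what the media query says.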

Don't forget LVHA

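The Love/Hate relationship of CSS. It's basically a rule about the order in which you should write the link pseudo-class rules (:link, :visited, :hover, :active) for anchor tags. They all have the same specificity, so whichever rule comes later wins: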

:hover must come after :link and :visited
:active must come after :hover

If you do not follow this order, you might see properties being overridden seemingly at random. So always remember it!
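A minimal sketch of the order, with placeholder colors:

```css
/* L-V-H-A: Link, Visited, Hover, Active. Colors are placeholders. */
a:link    { color: blue; }
a:visited { color: purple; }
a:hover   { color: orange; } /* after :link and :visited, so hovering wins over both */
a:active  { color: red; }    /* after :hover, so clicking wins while the mouse is over the link */
```

If :hover came before :visited here, a visited link would never show the hover color, because both rules match and the later one wins.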

Semantic tags

Try to use HTML semantic tags like <em> for emphasis (rendered italic), <strong> for strong importance (rendered bold), etc. These are for text, but there are tags for loads of other things like lists (<ul>, <li>), navigation (<nav>), etc. There are a ton of semantic tags baked into HTML, try to make use of them as much as possible and leave the defaults as they are!

Divs

And finally, use divs liberally when trying to make sense of some responsive layout. I usually find giving borders of different colors to different divs a pretty good way to debug CSS issues. Now, I know we have a very powerful Inspector, but when you need to debug two very different elements together in a complex, hierarchical, top-down dependent system (flexbox all the way down), this trick is useful xD
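The border trick is as simple as it sounds. The class names below are hypothetical, and these rules are meant to be deleted once the layout makes sense:

```css
/* Temporary debugging rules, hypothetical class names. */
.page-header  { border: 1px solid red; }
.page-content { border: 1px solid green; }
.page-footer  { border: 1px solid blue; }
```

If the extra 1px itself shifts things around, outline: 1px solid red is an alternative worth trying, since outlines do not take up space in the layout.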

Conclusion

And that's it, a good example of all these tips in use is my website's codebase. I am not saying these are golden rules which are absolutely perfect, there could be things wrong with them. Accessibility comes to mind as the major footgun when discussing webdev. So, I would love to hear about things which can go wrong or are incorrect! Thanks for reading and until next time, ciao!

My new small space in the vast web

Sun, 24 May 2026 00:00:00 GMT

Motivation

I have been pilled by the "write more" propaganda, and it's not something new, I have written about it before but never actually made progress on it. Something has changed this year. I used my new year to go through the book Accidental Genius. It talks about freewriting, basically the idea of just typing out words till the editor in your brain takes a step back and you can get your original ideas out, a process of self-discovery you could say. Well, this is one benefit, but most importantly I really liked writing as a means to gain clarity. This could work for anything: life, work, problem solving, etc. To be frank, I hadn't written much after reading the book in January, but you know how a few things grow on you? I think the idea has grown on me. So now, I just bang out lines, edit once maybe and hit push. I even migrated to using markdown files for this: the lower the friction to actually publish a blog post, the better.

This is one part of the equation. The second is, I have grown to like the IndieWeb concept. I first discovered it through Susam Pal's Wander instances. When I first visited one, I felt like my childlike curiosity came back again, it was so wonderful to visit people's websites and read what they have been up to in life/tech, etc. I wanted a little space of my own too, not as fancy as some of those neocities pages, but just my own simple little space where I can host my own interests.

Thirdly, I have finally been able to get my RSS reader set up! I use newsboat, it's simple and it works! This time I made sure to add URLs to my feed very slowly, and to regularly scrutinize which feeds I want to keep. This has allowed me to sustain a habit of actually opening the RSS reader every day and looking forward to new content 😁. Btw one of the keys to achieving this was also having feeds with a rather regular cadence, for me that is DailyWTF and xkcd. It's so fun to read them! With all this working out, of course I want an RSS feed of my own writing, and we need a hosted space for that to happen, so why not build one!?

Execution

Now let's talk about the details: how I achieved it, what I liked about the route I took and what I didn't.

Requirements

Keep writing blog posts in markdown files
Simple framework without any client-side javascript
Still able to use npm packages, basically easy to leverage community work
Keep draft blogs private if possible
Easy to deploy

I don't want to increase the friction of writing: a simple markdown file, edited in Obsidian, and that should do it!

I want to have it load super fast!! No heavy frameworks like React, Vue, etc. I really don't want client side javascript. It should be built once and keep serving forever after that! Basically a static site.

There are a bunch of neat things which you can find in npm packages like better markdown parsers, fancy wallpapers, dangerous malware 💀 (:P). I definitely want to be able to use some of them if possible.

Ditching hugo

For my old website, which my old self made, I used a simple Hugo template. It might have been easy to get up and going at that time, but I am not a fan of these templating engines myself. When I decided to revamp the website, I realized I needed a Hugo binary pinned to a specific version, cause it was not building with the latest! Some people push the binary itself to the repo, but what about changing machines? Working across different architectures, etc.? I decided I did not want to deal with this crap. So what to do now? Well, recently I migrated my resume from all these templating engines to a custom HTML/CSS file. It is so beautiful now: I can make any changes, add as many points as I want, lay it out as I want. This is what I was looking for! LaTeX? Hold my gun, HTML/CSS are always the original king 👑.

I wanted to replicate the same success with my new setup too, so that's why I started writing my website in .... HTML and CSS. I could do all my pages with it, but what about markdown files? Would I have to write scripts for rendering them? And then actually managing/using them in the website is also a bit painful.

Enter Astro

While I was figuring all this out, I attended an IndieWebClubBengaluru meetup and there, some of my old colleagues showed me their websites which were built using Astro. I initially wasn't interested as I thought it would be another heavy client side javascript framework. But then they started showing me their lighthouse scores, how fast their sites load and how they did not have "client heavy javascript"! Bingo, that was all I wanted to hear. That's when I started exploring Astro, and went through the full guide section in its documentation starting from here.

And OH BOY!! I had hit the jackpot, somehow these guys had the exact same thoughts as me and had everything needed to fulfil my requirements. Markdown files to HTML? Content management from a third folder of all the markdown content? No client side javascript? RSS? EVERYTHING WAS PRESENT! And everything was supported first class!

There is just one thing amongst my requirements which you might be confused about: how do you keep draft blogs private? That's technically not something these frameworks can support. Well, to solve this I created a content collection, basically a folder which has all the blog posts. Astro walks all the dirs/sub-dirs in this collection and makes an array of blog posts. The main catch is, this folder is a separate repo in my case. I added it as a submodule, and while it is public for now, I can convert it into a private one at any time. I can clone the submodule because I have access to it, but others can't. I can still keep the website code open though! Match made in heaven xD.

Not so good parts

So till now, we discussed how Astro is ticking all the boxes, but nothing is perfect, and there were a few caveats. First is resizing images in markdown. Images can be of varied sizes, and resizing them dynamically is a very common use case. Well, it seems like Astro does not have good support for this with markdown files. I had to convert my blog of chinaga betta hike to an MDX file and inject Astro specific JSX to get it working, which was a huge bummer for me. Before moving on to MDX I tried a bunch of things:

Use img tag in markdown file with local path
Use span and div tag in markdown file with local path
Try to use githubusercontent URL of the image with height and width mentioned in URL

None of these worked 😓. In the end, I had to shift to MDX, which Obsidian does not support out of the box, it does not even show the files in the UI!! Well, this was one grievance.

Next was, @astrojs/rss by default does not support linking to images. So if I convert my markdown files to HTML for RSS consumption, it does not separate out the images in those with some kinda permalink. I am not sure if this is an Astro problem, but the out-of-the-box experience kinda broke here. For now, I am sending MDX files directly in the RSS feed without even linking to the images :(

One last problem was a surprising behaviour. When I first added frontmatter to all my blogs and registered them as a content collection, Astro's hot refresh didn't pick them up at all. There were no errors in the logs or the browser console, anywhere?? I had my npm run dev running since the morning, as I had been building the website incrementally. But for some reason I decided to kill and restart the server, and lo and behold, it didn't start. It failed with an error. The frontmatter of one of the markdown files was not proper (it was reading the root README.md too). Well well, I had spent so much time tweaking content.config.ts and the frontmatter of all the blogs, this was very frustrating.

Further plans

But you know, as they say, all's well that ends well. Except it hasn't ended, I will take a pause on development for now, but will come back to try to make fixes for the image handling flow for HTML rendering and RSS readers.

I don't want this to be a passion project for now, it is more of a means to an end, to fulfil the core motivations I listed in the beginning of this blog. I want to take it slow, not try to go full speed once and then never come back to touch it. Core goal is writing more, so we will focus on that above all!

There's also one thing I wanna do before taking on the work of fixing the image flow, and that is adding a wander instance to my webpage. That would be fun to have, a corner of my own in the small/indie web community :)

For now, you can find my website here and RSS feed here. Till next time, chao! 😺

Upgrading to neovim 0.12

Wed, 20 May 2026 00:00:00 GMT

I recently upgraded my neovim to 0.12.2. I have been chipping away at it slowly using the NVIM_APPNAME feature so that my day to day activities are not hindered. One interesting thing about NVIM_APPNAME: I saw a blog where the author manually created all the ~/.local/share/nvim-next etc dirs, and I did the same cause I was not aware, but later I realized that one can literally just say NVIM_APPNAME=nvim-next ~/.local/share/bob/0.12/bin/nvim and neovim will automatically make all those dirs for you. Also yes, I use the awesome bob-nvim for managing neovim versions. With that out of the way, let's get started!

Major changes

Cool, let's talk about major changes, listing them out first:

vim.pack: Migrating to builtin package manager
lsp: Ditching nvim-lspconfig etc for vim.lsp.enable
ui2
no vimscript in config

Migrating to builtin package manager

Neovim now comes with an inbuilt package manager called vim.pack. It's currently experimental but is considered good enough for daily driving. I have used a bunch of package managers over time: vim-plug, packer, lazy and now vim.pack. While I don't care about them a lot, each migration has been inspired by reasons. vim-plug -> packer: the lua shift of the whole ecosystem. packer -> lazy.nvim: extra features like dependencies, clean config, etc. And finally lazy -> vim.pack, cause I want to reduce the count of external dependencies. With all the changes happening upstream, I am really hopeful that some day my whole config will just fit in a small file!

But I couldn't simply move to using vim.pack, I had to evaluate if it was strictly an upgrade. Factors I looked for:

Do I depend on dependencies feature of lazy?
Does it slow down my startup times?
Does separating config in vim.pack make config structure worse?

For dependencies section, I went through my plugin list and realized I had these dependencies:

aerial.nvim on nvim-treesitter and nvim-web-devicons
telescope on telescope-fzy-native, telescope-live-grep-args and plenary.nvim
nvim-treesitter-context on nvim-treesitter

These don't seem that bad, all of them are loaded much later in the neovim startup process and I could just keep them in a particular order to make the resolution pass. Well I tried it and that did work out, so ticked this off.

For my startup times, I used nvim --startuptime startuptime.log .. If you are not familiar with this command, it instructs neovim to write a log of its startup activities, and to get the startup time you look for the log line --- NVIM STARTED ---. The first number in that row is the time, in milliseconds, it took to start up your neovim. I measured this and realized I hadn't used the "lazy" in the lazy.nvim package manager 😭. But I never felt the need to make it go faster, cause I wasn't able to perceive a delay when starting it. The day I start noticing the delay is the day I bring down my hammer. I had done this earlier for my shell too. But after the migration, timings seemed the same, actually a bit better than lazy.nvim, so I was already happy :)

For config structure, I really liked how lazy forced a dir structure for plugins and encouraged keeping config alongside the plugin installation line itself. It was a clean way. With vim.pack I had two ways:

keep installation line with config in top level plugin/ dir
keep all installation lines together and config separately

I wanted to maintain order, remember? That can easily be done with a single vim.pack.add call, listing all the plugins together in their install order. But with the plugin/ dir, I would have to name files 0_, 1_ etc. This is because files in plugin/ are loaded automatically by vim and neovim in sorted order. Naming files like that was a turn off for me, so I went with a single vim.pack.add call, created a new dir called plugins/ in the lua/ dir and shoved all plugin related config there. I enforced the setup order in lua/plugins/init.lua.

With all of this out of the way, migration was really SMOOTH! And I kinda love that I am not pulling a heavy dependency like lazy.nvim in my dep tree :)

If you have advanced use cases or want to understand the feature better, there are two awesome guides: the official manual and then this. Read it end-to-end, each section has something you can take away. It is the most comprehensive guide out there right now.

One small trick I learned from the article above is that calling vim.loader.enable() speeds up startup times for free, and this was blessed on us by Folke himself!! Ofc I added that and instantly shaved 25ms off the loading time xD

LSP

Neovim v0.12 also brings in more ergonomic LSP usage support. Now, you can place LSP server settings in the lsp/ dir and just call vim.lsp.enable(<file-name-in-lsp-dir>), and this is all the setup you need! nvim-lspconfig is now reduced to just maintaining settings for upstream LSP servers. As these rarely change, I just copied them from upstream and placed them in my lsp/ dir. I also realized I only use two of them now: rust_analyzer and taplo (toml LSP server, also for rust dev :P). With this, I was able to cut down a bunch of lines in my LSP config and also drop the nvim-lspconfig and mason-lspconfig deps.

ui2

This is an experimental feature where command line meets messages meets pager meets dialog windows. Honestly it's best explained in the official docs. This was an interesting change which allowed me to drop a dep I really liked: fidget.nvim. It shows LSP progress in the bottom right corner. Now I do it in ui2 itself using this autocommand, that's it, 15 lines are all we need. I get the progress in the same place as the command line, without Hit Enter prompts.

This works best with cmdheight = 0, which prevents Hit Enter prompts. The cmdheight feature was merged in 0.11 itself. I had tried it then but it felt incomplete and weird, but now with ui2 it has the perfect UX, they have really nailed this! There's just one small hiccup: somehow I am not able to see macro recording messages. I could reproduce it on master with a minimal config, so it's definitely an upstream issue. Hopefully that gets fixed, but till then I have two hacky autocmds which almost do the same job, just slightly worse 🙃.

No vimscript in config

I finally took the leap and ditched all the vimscript from my config, it's completely lua based now! I know I am very late to the party, but I really wanted to keep it around so that I could use it on VMs where vim is the default. Well, what finally prompted this move was making a minimal vimscript based vim config, which I can easily drop anywhere and get productive with vim! Here's the minimal config for the curious. I also have a minimal tmux config along the same lines here.

Bugs discovered

This is an interesting section cause I usually never come across any hiccups when upgrading. Neovim is a super polished, heavily tested piece of software. People are out there doing builds super frequently to test the latest and greatest! (I was one of them till a few years ago :P)

But this release I came across two interesting bugs!

macro recording message display with ui2 and cmdheight=0
a segfault with ui2 + invalid rtp and syntax on!

The first one we have already discussed. I hope it gets fixed in an upcoming version, or even the next release is fine :P

For the second one, this is something severe and I was very astonished to come across it! First, link to bug report

minimal repro is just:

require('vim._core.ui2').enable()
vim.cmd [[
set rtp+=$LMAO " Cannot be a non-existing dir, needs to be an env var which does not exist
set syntax
]]

So conditions are:

ui2 should be enabled
rtp should be set to an env var which does not exist
syntax should be on

The reason I came across this is because of this weird rtp setting I had set in my config:

set rtp+=$GOPATH/src/golang.org/x/lint/misc/vim

I have no recollection of why I had added this. I have messed around a lot with my config over the years, so there are artifacts I still find in weird corners. But the main point is, I don't have golang installed on my system, so GOPATH is not set and hence it satisfies the condition mentioned above. I am not exactly sure what is causing this crash in neovim internally, it needs some investigation 🧐.

But yeah interesting times! I created a new APPNAME with nvim-debug and managed to track it down to this config and also reproduce on latest master. I have reported it, let's see if someone upstream picks it up before I get my hands dirty 🏃.

Sides

I was checking :checkhealth vim.lsp and realized I hadn't seen :checkhealth in a while, I did that and boom, it was so much cleaner!! Now we have ✅ and ❌ and ⚠️ to show overall health of the features etc, and in general it looked really really clean!

There are also other features like :restart, etc which I didn't delve into much cause I wasn't sure how to make use of them right now. There are a bunch of other features too, do go through all the release notes. The amount of things I have learned by reading the release notes in detail is mind blowing, 100% recommended!

Conclusion

Overall, I am super happy with this new release. The only thing I couldn't change this time is the colorscheme, I didn't have any lined up to try out 😓.

Apart from that, I am already looking forward to more amazing things in upcoming releases (multi-cursor, looking at you). Till then, chao!

Working with LLMs

Tue, 05 May 2026 00:00:00 GMT

As we usher in this new era of LLMs, it is interesting to see how different people are starting to work with them. And as a typical keyboard thudding monkey, I want to optimize my workflow too. Because a true master understands the tools at his/her disposal the best.

The way I currently work with them is the straightforward way popularized by Claude Code: plan with it first in plan mode, then jump into implementation. I try to manually approve everything, but still I lose context in the "hit enter" hell. To overcome it, I sometimes just let it make all the changes and then go back and start editing them. Now, ideally this should work, you plan meticulously and once the plan is solid, bang on, all code will be perfect. Right? Right?

I think that's the wrong model of how software engineers work. Most of the time, we discover/realize things on the fly, and that could lead to anything from a super small limited scope change to a complete re-design. So it is more of an iterative loop rather than a one shot model. In that case, one would go in and out of plan mode refining the spec as they learn more.

My questions

But before trying to refine our process, let's try to come up with points I want answers to:

How do I know I have explored all the possible ways to attack a problem, could there be a simpler solution?
Another is, breaking down abstractions at correct boundaries, I think LLMs struggle with this right now. I see a lot of people dumping code in places where it shouldn't belong in the first place. Why is it dumped there? Cause no one cared enough to think about boundaries. Well, this was a problem before LLMs too, but it's much worse right now.
Writing code by hand is a process which forces one to slow down and look at the surrounding code, think about frictions we face when coming up with code. Just being lazy and realizing a lot of things. LLMs don't have that (1). How to bring back this process of slowing down? And in what form? Hand-write everything again?
How do I trust the tests written by LLM? The number of people who are not reading generated tests is bafflingly high. No one, literally no one I know is reading generated tests. They think if there are tests it's enough. The number of times I have found generated tests to not be helpful is actually very high. Like the saying of man goes: "To know a man, check his trash". "To know about an implementation, check its tests".
How to find subtle problems within the implementation, Antirez put it nicely: "but still things that superficially work do not mean they are optimal."(2)

User workflows

Before we try to answer these questions, let's try to read about Antirez's use of LLMs for array type support in redis (2) (3). Summarizing it the way I understood it:

He wrote first design draft completely by himself
Brought in LLM, started attacking draft from different angles, this would have likely required him asking correct questions to LLM
He read whole code line by line with extreme care. I liked this a lot: but still things that superficially work do not mean they are optimal.
He rewrote the whole implementation again in a mix of manual and LLM mode
Extensive testing, a complete month dedicated to just that

In his own words towards the end:

For high quality system programming tasks you have to still be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms. To write the initial huge specification was the key to the successive work, as it was the key to review each single line of sparsearray.c and t_array.c and modifying everything was not a good fit.

As we are at it, these are some ways I have seen people around me use it:

Clowns: Absolute direct vibe code, this is just dumb
GreatPretenders: Give the problem to LLM, act like they understand it by saying: "we manually accepted edits", test it on basic cases and ship to production.
Meticulously try to plan things with it, try to attack from different angles. From here on two more routes:
- Strategist: Write code using a LLM assisted autocomplete
- OldieGoldie: Write code completely by hand

We are not going to talk about Clowns and GreatPretenders at all except one statement to these people, PLEASE stop making my life difficult.

Thoughts on my questions

Now that we have everyone's workflows in place, let's try to come back to our questions. (Answers in the same bullet point number as the question)

I like Antirez's approach here, he took a month just to write the spec, and he didn't write the first draft with the help of an LLM, it was completely by himself. This is where I think Strategist and OldieGoldie get defeated, I believe the key point is: not reading the approach given by the LLM first. Cause there are times, they just don't know, and they don't know what they don't know. They are not able to come up with a few of the strategies you might come up with. You could call this the creative step or whatever. I have noticed, reading an LLM's output first creates a bias in the mind, and we might also lose sight of the correct questions to ask. That's why try to come up with a plan on your own and then work with the LLM to try to attack it from different sides to solidify it.
On this part, I think there are two steps where this comes up, first is when planning i.e. 1st step and next is when actually writing code by hand and noticing a friction point. First part is addressable during first step itself, this is usually the easy part. But when it comes to the latter, I think it correlates with 3rd point of mine.
Now this is a tricky one, one needs to slow down, we slowed down once in the initial planning phase, but when next? In the iterative cycle I mentioned above, how do we slow down during the actual implementation section to notice these frictions? Well one way is converting into OldieGoldie, it is slow but definitely works! Though one could be lost completely in implementation details and want to complete it fast, which would lead us to the pre-LLM era problem of people writing absolute horrendous code without respecting any abstractions.

So, completely automated is bad, completely hand written is dicey, then Strategist wins? Well, I don't think so, again this is a point about slowing down, fancy autocompletes are not a great way to slow down and understand cross module dependencies. Well then what? I like Antirez's way here, seemingly he generated all code first as a PoC, realized few things during PoC to fix, re-generated it, assumed just reading everything in extreme detail would help but he didn't know the answer to: but still things that superficially work do not mean they are optimal. So he went back and rewrote whole implementation in a mix of manual and AI-assisted mode.

The difference between Strategist and this is using code as a throw-away signal, Antirez used the first version as a PoC, that's it. He then, rewrote the implementation in his own way completely, this makes the process so much faster and more context aware than one shotting the implementation and making abstractions etc on the fly.
For this point, Antirez said two things:
- "Everything was working, and this type has massive testing, thanks, again to AI"
- "When this stage was done, I started, during the third month, to stress test the implementation in many different ways."
I don't think there's info on what he did here. As such testing is a very subjective topic and how to do it properly for a particular system is a monster of its own. For now, I try to follow the same procedure as before, try to come up with test cases myself and then involve LLMs to expand upon them on their own and combine to form a better list. This helps avoid a bunch of test cases which add 1K lines of abstractions on their own to test a simple thing.
For this one, I think 3rd point above goes in enough details about everything. Key to this point I believe is slowing down and reading code multiple times to try and think from different angles.

This kinda also lays out how I want to try using LLMs going forward.

Unanswered Questions in Antirez's article

Taking a small detour and going back to Antirez's article, I have a few things I would have liked to understand in more detail:

When he used an LLM in the planning phase, what part of it was him trying to probe questions out of the LLM as an experienced user and what part of it was the LLM finding defects/improvements on its own?
What parts of the codebase did he rewrite manually and what parts were rewritten using the LLM? How did he decide which part to allocate to whom?
How did he approach testing in general, did he check the LLM's generated tests in super detail? Did he rewrite them too? What did his one month of testing look like in detail? How did LLMs help outside unit tests?

Conclusion

It was interesting to think about these things while writing this article down. I would have not imagined myself thinking about these things because I had previously been haunted by a college senior of mine being too strict on writing down huge number of pages of LLD, HLD, PRD, etc etc for club projects. Ofc we never finished the projects which he was supervising. I still don't know if all of this was coherent or just random rambling. Well, there's one thing for sure, I have something new to try and I will make sure I keep the rigor up in LLM age! [4]

Footnotes:

ClaudeHeads

Thu, 09 Apr 2026 00:00:00 GMT

So firstly what are ClaudeHeads? They are people who have claude in place of their head. They literally think LLM is the only answer and can only think using them, they are just straight up bad at individual thinking, but that doesn't matter, cause LLM can solve everything given right context, given all the information in the world about thing which exists.

For me this is a problem. I joined the database industry cause I could not bear writing HTTP APIs for the rest of my life. I am not smart enough personally, but there are horrible software engineers out there, you would find shitty code in all parts of the software stack. But for something which is performance critical, needs to be correct always, and is always the black box for programmers, that would need the highest and purest levels of programmers, right right??? Well it seems I could only enjoy this dream for some time, cause LLMs have given birth to ClaudeHeads. We use an open source project named datafusion and have based our database on it. It's not a direct stock integration, we have had to make a lot of changes according to our needs, and it seems distribution is still an unsolved problem there. Also the main IP of the planner stays with us, single execution is not a solved problem, but open source projects are very very good!

Well, given that it is open source, of course LLMs are trained on it. Now, that is one part of the equation, in recent months they have also become good enough to interact with private parts of our codebase. Migration to a datafusion based engine is a recent enough project and we had been working hard to get performance on TPCDS-like benchmark[1], TPCH-like benchmark, Clickbench, and a bunch of internal benchmarks. We were very very slow as compared to our good old internal custom developed Java engine, as compared to Databricks, as compared to Snowflake. The whole team was heads down working on getting perf better than all of the above combined. Me and other senior colleagues took an "old school" methodical approach of looking at heap profiles, CPU flamegraphs, custom metrics collected by us, and finding gaps in our understanding of the system, as the whole codebase was not familiar to us yet. Here enters my ClaudeHead hero, who downloads research papers of Datafusion/Arrow etc, keeps them in a folder, keeps all TPCDS queries, their flamegraphs, heap profiles and metrics together, and sends the agents to "find perf improvements". The result was pretty shit with earlier models, but what about recent ones? You leave them for a night and they conjure up a bunch of things. Tho how do you test them?

So, incidentally someone in my company developed an easy to use benchmark setup. What was left now? Multiple branches started getting created, purely vibe coded and benchmarked in parallel. Whatever improved perf was posted as is after "understanding from Claude summary" to the channel and merged to main. Well the problem is the "understanding" part is absent, if asked to reason about the change from different angles, like architectural correctness, my friend would turn around and just ask Claude. There's no head working there, it's just Claude. Well how do I know this? Cause I have asked questions around why some part of it didn't make sense in the larger scheme of things. Why, even tho the metric shows there's nothing to optimize, you keep repeating there's an optimization in a specific region, without it being backed by proof, just because Claude said that. What's worse, this is a junior engineer just entering the field. Not good at coding, not good at databases, not good at CS fundamentals. But given LLMs, he can keep on posting perf optimizations and get them merged. One could argue, if those don't make sense why can't you prove them wrong? Well here comes the main point of the article, my views on ClaudeHeads and how they are correct at times, but expert bullshitters at times. Back in the days, when no LLMs existed, if someone bullshitted, they had to put a LOT OF EFFORT to even get something remotely good out, btw this is considering that it was still considered easy, Brandolini's law. During the process, they learned 100s of things and would definitely come out as a much better engineer, but now? Tell the LLM to fire off and conjure stuff, what if that does not make sense in the grand scheme of things and would literally break in just a different environment (someone shares my feelings). Well that's benchmaxxing. But what if you could keep benchmaxxing again and again for each dataset. That's not exactly what's happening, but I am thinking through scenarios.

Not understanding what changes you are making to me is the biggest risk of all time, and it breaks what I thought about before starting to work on databases. It's not a race of understanding, now it's a race of trying random folder structures with random bits of information to get the best output out of LLM models. Oh guess what, I am still stuck in the old model, and this has caused a big disadvantage to me. Not only am I slow now, I am also losing learning opportunities myself, just because someone decided to not understand them and rely completely on LLMs. I can pick up LLMs to speed up my work, but all I have ever learnt is, slow and steady wins the race. I believe it to my heart, mental models are the biggest factors of a product. That's the reason when someone who understands codebases deeply leaves, new team gets in frenzy. That's the reason losing a product person who understood product deeply, is such a big loss. They are hard to replace. Code was never the moat, mental models were the actual secret sauce. But in this case, trash out the mental models, we will just use LLMs to not just write code, but also think, not build mental models, just straight up outsource thinking.

Writing code is amongst the best way to build mental models, slow, deliberate thinking is what dials down core ideas of anything. This applies to product building as well as programming. Having faced the "friction", and letting mind battle with it is the best way. I recently read a blog post in similar vein, but talking about astrophysics. It's an excellent read, do go through it.

But yeah, this is my problem, sorry I am not slow, I am just not a ClaudeHead.

[1] I say TPCDS-like cause I remember how badly PlanetScale was thrashed in an official blog post when the data generator used by them did not actually comply with TPCDS specifications :upside_down:

Estimating filter equality selectivity using NDVs

Thu, 09 Apr 2026 00:00:00 GMT

Now that I work on databases, I have a habit of keeping up with upstream datafusion PRs. Today I noticed an interesting PR talking about usage of NDVs in equality filter selectivity. I have always been fascinated by NDVs cause my colleagues in planner team always mention them as something super helpful. I started looking into the PR and it turned out to be a small one, but there was a review on it and honestly I did not understand it at all. So I sat down to do some reading on how this works.

Firstly, what are NDVs? NDVs are number of distinct values, these are usually stored at parquet file level. We can also compute NDVs for a column, i.e. how many distinct values a single column contains.

What is filter selectivity? For a filter supplied in a query, the fraction of rows selected by it is called its selectivity. For e.g. a filter which selects 50 rows out of a total of 100 has a selectivity of 50%.

Now let's understand how they come together. Let's say we have a join query like this:

select * from A, B where A.x = B.y;

In this, we have our join condition as A.x = B.y, if we can predict what will be the selectivity of this filter expression we can make interesting decisions based on it. A good example is: do we want to use partition-wise join or non-partition-wise join? ( i.e. if we have loads of rows to join on, we can distribute them across cores instead of doing it on single core itself )

NDVs help us do exactly that. Let's say we have a condition where y = 42, and let's say the y column has 5 distinct values, that means our NDV count is 5. As we don't have exact histograms telling us about data distribution, we assume each value is "uniformly distributed" across the whole column. For e.g. if the y column is made up of {38,39,40,41,42} and has 100 values in total, we assume there are 20 values of 38, 20 values of 39 and so on. This assumption means the probability of 42 getting matched is equal to that of every other distinct value i.e. 1 / 5, so the selectivity of y = 42 is 20%. If we multiply this with the total number of rows in the column, we get an estimate of 20 rows matching y = 42. The key point here is that we are assuming uniform distribution, if we had histograms, we could tell exactly how many rows have value 42 in the column, but NDVs work as the next best case.
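To make the uniform-distribution assumption concrete, here is a minimal sketch in Python. The function name and the two sample columns are made up for illustration and are not datafusion code. It shows the 1/NDV estimate matching reality on a uniform column and missing badly on a skewed one with the same row count and NDV.

```python
from collections import Counter


def estimate_equality_rows(total_rows: int, ndv: int) -> float:
    """Estimate rows matching `col = constant`, assuming every distinct value is equally likely."""
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    # selectivity is 1 / ndv, so the row estimate is total_rows / ndv
    return total_rows / ndv


# The uniform column from the example: 20 rows each of 38..42.
uniform_column = [v for v in (38, 39, 40, 41, 42) for _ in range(20)]
# A skewed column with the same row count and the same NDV (hypothetical data).
skewed_column = [42] * 60 + [38, 39, 40, 41] * 10

for name, column in (("uniform", uniform_column), ("skewed", skewed_column)):
    ndv = len(set(column))
    estimate = estimate_equality_rows(len(column), ndv)
    actual = Counter(column)[42]
    print(f"{name}: estimated={estimate:.0f} actual={actual}")
```

Both columns get the same estimate because the estimator only ever sees the row count and the NDV. That is exactly the information a histogram would add.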

This estimation of rows helps in join order estimation, join type estimation, etc.

After understanding this I noticed there was a review comment on the PR and tried to decode that. Review was as follows

I think this new `1 / distinct_count` branch is a little too broad as written. Right now it fires whenever the pruned interval collapses to a single value, but that is not quite the same thing as proving we have an equality filter.

For example, if the incoming stats already describe a singleton interval, or if a conjunction of inequalities narrows the range to one point without actually adding any selectivity beyond the existing stats, we would still scale by `1 / NDV` here and end up under-estimating the row count.

This was a total bouncer for me, it was so high that if this were a cricket match, umpire would call it a WIDE. But let's try to break it down, so author's current condition to use 1/NDV is as follows:

if ...
	target.distinct_count
                    && distinct_count > 0
                    && !target_interval.lower().is_null()
                    && target_interval.lower() == target_interval.upper() {...}

In this, interval means zone maps i.e. min/max values of that column. In our case above y would have min/max values as {38,42}.

Condition checks if NDV count is not zero and target_interval's lower value is same as upper value, if everything passes we assume our filter selectivity as 1/NDV.

According to reviewer, 1/NDV estimation is incorrect in following cases:

"if the incoming stats already describe a singleton interval"
"if a conjunction of inequalities narrows the range to one point without actually adding any selectivity beyond the existing stats"

Both of these reviews at the core address the problems of:

shape of data changes as it gets processed by different operators

singleton does not guarantee an equality filter source

Let's try to understand the above line with an example. Let's say we have two filters on our y column due to some CTE/subquery etc:

first being: y >= 41 AND y <= 42
and second being: y > 33 AND y < 42 (non equality condition)

After first filter we would have:

bounds: [41, 42]
NDV count: 5 (notice it didn't change)

When we come to the second filter and apply bounds to the predicate we only get rows containing 41. Here we will predict selectivity as 1/NDV. This is the exact problem. Let's say we get 70 rows out of the first filter, i.e. the first filter has a selectivity of 70%. Now let's say we have 35 rows of 41 and 35 rows of 42, after applying the second filter 35 rows remain i.e. 50% selectivity. But, if we go by the NDV route, we get 70/5 i.e. 14 rows, that is a super low estimation!
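The same scenario can be replayed as a small Python sketch. Everything here (ColumnStats, narrow, singleton_rule) is a hypothetical model of the idea and not datafusion's actual types or logic. The important assumption is that filters narrow the interval but leave the NDV untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ColumnStats:
    rows: float  # estimated row count flowing into the filter
    lo: int      # lower bound of the column (zone map min)
    hi: int      # upper bound of the column (zone map max)
    ndv: int     # number of distinct values; filters never update it in this sketch


def narrow(stats: ColumnStats, lo: int, hi: int) -> ColumnStats:
    """Intersect the interval with [lo, hi]; NDV is deliberately left stale."""
    return replace(stats, lo=max(stats.lo, lo), hi=min(stats.hi, hi))


def singleton_rule(stats: ColumnStats) -> float:
    """Scale by 1/NDV whenever the interval collapses to a single value."""
    if stats.ndv > 0 and stats.lo == stats.hi:
        return stats.rows / stats.ndv
    return stats.rows


# Column y: 35 rows of 41, 35 rows of 42, and 10 rows each of 38, 39, 40.
y = [41] * 35 + [42] * 35 + [38, 39, 40] * 10

# First filter: y >= 41 AND y <= 42 keeps 70 rows, bounds become [41, 42], NDV stays 5.
after_first = [v for v in y if 41 <= v <= 42]
stats = ColumnStats(rows=len(after_first), lo=41, hi=42, ndv=5)

# Second filter: y > 33 AND y < 42, which is [34, 41] for integers.
stats = narrow(stats, 34, 41)
estimated = singleton_rule(stats)
actual = len([v for v in after_first if 33 < v < 42])

print(f"interval=[{stats.lo}, {stats.hi}] estimated={estimated:.0f} actual={actual}")
```

The interval collapsed to [41, 41] without any equality predicate in sight, and dividing by the stale NDV of 5 gives 14 rows against the real 35.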

Our NDV count did not change as data flowed through both filters, and the same phenomenon can happen with different operators in the middle. We also saw that even though we got a singleton interval as a result, it did not come from an equality filter.

This was an interesting dive, which confused me a lot at different places, even while writing this down!

References:

Chinaga Betta Hike

Sun, 08 Feb 2026 00:00:00 GMT

Chinaga betta is a nice little day hike near Bengaluru. It is said to be 2.1 kms one side, so in total 4.2 kms up and down. It's a simple hike, can be done with family and friends. I recently got to visit it and I am gonna mention how the experience was. First thing is it needs a permit, so book it from the Aranya Vihaara website. Choose a convenient slot, in my time it was just 6 AM to 6:30 AM. This is needed cause it's said that a forest ranger will accompany you to the top, I say it that way, cause surprise surprise there was no one when we reached there.

Trek starts from the base of the Torana Anjaneya Swami Temple. This is where the forest department is supposed to check your IDs before starting. There were a lot of locals when we reached there, it's a temple which seemed super active and when we were returning it also seemed they were in the process of sacrificing a goat. We didn't stay back to see that. Well that's for a later bit, first in the start, when we were approaching the base of the temple we were very scared as it was pitch black and when we entered the forest side, it started to feel like off roading. So driving through a lonely road at night with no one in sight was a bit concerning, but when we reached there and saw a few fellow hikers we were relieved.

We waited for forest ranger till 6:45, but when we realized we were played for a fool, we just started on our own. So yeah, 250 rupees went in vain :(

Okay, next thing, let's talk about the actual trail. We followed the trail from this website. I would divide the whole hike into four sections, first is big temple to small temple, next is the rocky/slabby tiring uphill section, next is flatlands and finally the last remaining part to the summit.

Forest ranger is mostly not needed for the majority of the hike, it's just that at the top, there's a vertical rock, which you have to climb and most people would not be comfortable doing that. I was a climber so I climbed it pretty easily (subtle flex xD). Hike starts at the back of the temple, and there's an outward protruding rock at the top, which has a flag above, that is your summit.

First part, when we start from the back of the temple, there's a trail which seems to go into the forest, follow that. Once you follow that you will reach another small temple. There will be two paths there, and just as Robert Frost's protagonist, we have to take the road less taken. This is the first part done, it's just light walking, perfect warmup for the next tiring section.

Next section is where the uphill starts, it's through mildly dense forest, well it was less dense cause we could see people there had burnt a lot of trees to keep it clear for the trail. This section is also where we saw sunrise. It would have been better to watch it from the flatlands above, but we were late due to waiting for the forest ranger :(

This section is also slippery at times, two of my friends survived the slip, but it could get dangerous, so either wear good shoes or be extra careful where you are stepping and how you are shifting body weight. It shouldn't be a problem for most people, but extra caution is never harmful.

There's a section between flatland and this uphill where you will start seeing eucalyptus trees. They are beautiful with little yellow flowers on them. This gave me a feeling of walking through magic forest of berserk. If locals hadn't burnt a lot of trees, I wonder if I would have declared it Garden of Eden.

Image: Walk through eucalyptus trees

Also while going uphill you will notice white arrows, follow them. Though you could also just rely on the trail map you have downloaded, it was on OpenStreetMaps on IndiaHikes website so you could use any FOSS maps app to view it.

Next section is flatlands, it is what the name says. I don't think it's called flatlands in any blog or something, but I am calling it that cause of the minecraft biome xD This is one spot where you can catch sunrise. Best would be the top, but even this is fine. From here you would start getting views of the surrounding area. It's serene, breathe in and get ready for the remaining part!

Now, next is walking through some part of the flatlands to reach the farthest bottom of the last section, there should be an easily visible trail starting there. Just follow that.

As you walk through the last section, you will come across two big rocks creating a narrow space between them. You have to squeeze and pass, that is also very slippery, but all I could think at that point is, if I could climb this chimney 😂

Once you complete that you would reach the final vertical rock which you have to climb. There's a rope there to assist but it seemed we were the first one to reach so it was thrown above!? Not sure why would someone do that. It looks like this:

Image: Vertical wall to climb

It doesn't look much but there's no proper footing down below, so if you slip you can get injured. And getting up is one thing, getting down is more scary unless you have someone looking at your feet and telling you where the footholds are.

They are also carved inside the rock, and not projecting outwards. Also part of the reason why when getting down you have to look for them. Well, getting back in our case, I was the guy who got pushed forward to climb to get rope from above. I had no safety, so my friends were a bit concerned but I was confident as I had climbed much higher rocks with much dicier footholds and handholds as compared to this in Hampi. I threw the rope down and assisted everyone else to climb above. As we reached above, we could get a much clearer view of everything around, it's a beautiful place. On one side, you are seeing mountains till eyes can reach, covered with a blanket of clouds. Next side, you see tiny settlements, with lesser hills, perfect for people to make actual small towns around. While it's not a lot of height, wind there was super strong. So if you are lean and light weight, please don't fly off xD

Image: Amongst the clouds Image: Tiny settlements between small hills

We had a dog guide us the whole time, she was so cute and playful, unfortunately she disappeared when we completed the hike, so we couldn't treat her :( We also could not carry her to the topmost section as that was a climb on a straight rock. It wasn't the biggest but not possible for us to do with a dog.

And yeah that's it, we came down the same path, but there's another surprise waiting for you after hike, we stopped by the Swandenahalli Lake. We think it's a small pond rather than a lake. We chilled there for some time, skipped some stones which itself was good enough to offset lake vs pond disappointment :)

And that's it, on the way back we tried Pavithra Idli Hotel's Benne thatte idli, vada and masala dose. They have been cooking since 1942 and are pretty famous, we had to wait for 10-15 minutes to get a seat on a Saturday morning. Benne thatte idli wasn't up to the hype for me, I have had better ones in Jayanagar. But the masala dose was better and we washed everything down with a hot filter coffee, always the best part for me xD It's worth a try once :)

And that's it, enjoy and have a nice trip.

References: https://indiahikes.com/documented-trek/chinaga-betta-trek

Understanding Snapshots in Apache Iceberg

Wed, 12 Feb 2025 00:00:00 GMT

External Link

https://www.e6data.com/blog/apache-iceberg-snapshots-time-travel
https://archive.is/VJ7pE

Sink consistency in RisingWave

Wed, 12 Feb 2025 00:00:00 GMT

NOTE: I am mostly writing this down to present to someone who is already familiar with the system, but I have laid down some ground work to make it slightly better. Write up is also heavily code referential, so sorry if that's not up your alley.

RisingWave is a popular and open-source streaming database, it can work with a variety of different sources and sinks and has capabilities to provide performant real time analyses on streaming data alongside serving ad-hoc queries. Basically a lot of buzzwords.

I have grown interested in the system and was trying to understand how it prevents data loss with so many different sinks in case one of its compute nodes dies. We will be looking into the handling of the iceberg sink cause that's what I am working with these days. I am going to assume familiarity with iceberg already cause understanding that would take another several blog posts.

One of the good features of iceberg is its decoupling between data files and metadata files. One can take existing parquet files and create a table out of them easily. Work for the same is active in iceberg-rust. Even when committing, iceberg writers do the same: write data files first and then try to write metadata files, if they fail (they may fail cause another writer's commit would cause ACID guarantees to fail on the table) they just have to re-generate metadata files and try to commit again.
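The "write data files once, retry only the commit" idea can be sketched in Python like this. ToyCatalog, CommitConflict, write_data_files and commit_with_retry are all hypothetical names for illustration, not the iceberg-rust or RisingWave API. The before_attempt hook exists only so the example can inject a competing writer.

```python
class CommitConflict(Exception):
    """Raised when the table moved on since we read its version."""


class ToyCatalog:
    """Hypothetical stand-in for a catalog with an atomic compare-and-swap commit."""

    def __init__(self):
        self.version = 0
        self.committed_files = []

    def commit(self, expected_version, files):
        if expected_version != self.version:
            raise CommitConflict(f"expected v{expected_version}, table is at v{self.version}")
        self.committed_files.extend(files)
        self.version += 1


def write_data_files(batches):
    # Stand-in for writing parquet files to object storage; this happens exactly once.
    return [f"data-{i}.parquet" for i, _ in enumerate(batches)]


def commit_with_retry(catalog, files, max_attempts=5, before_attempt=None):
    """Retry only the metadata commit; the already written data files are reused."""
    for attempt in range(1, max_attempts + 1):
        base_version = catalog.version  # re-reading table state models regenerating metadata
        if before_attempt is not None:
            before_attempt(attempt)     # hook used below to simulate a concurrent writer
        try:
            catalog.commit(base_version, files)
            return attempt
        except CommitConflict:
            continue
    raise RuntimeError("gave up after repeated commit conflicts")


catalog = ToyCatalog()
files = write_data_files(["batch-a", "batch-b"])


def racing_writer(attempt):
    # Another writer sneaks in a commit during our first attempt only.
    if attempt == 1:
        catalog.commit(catalog.version, ["other-writer.parquet"])


attempts = commit_with_retry(catalog, files, before_attempt=racing_writer)
print(attempts, catalog.version, catalog.committed_files)
```

The first attempt loses the race, the second one succeeds, and the data files never had to be rewritten in between.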

So writing data files vs committing are separate processes, same happens in iceberg-rs and hence RisingWave, for iceberg sink these are the locations where each occurs:

Writing happens under IcebergSinkWriter here
Committing happens here

Getting back to RisingWave, core idea of persisting such state in databases is to use some kind of logs, a lot of databases have their own WAL implementation. RisingWave also leverages concept of log stores for the same. These are the current log stores implementation:

In Memory Log Store
KV Log Store

Now our doubt was: what if a RisingWave compute node crashes before the commit happens? Log stores implement LogReader. The LogReader abstraction shows which methods it provides, namely:

init
next_item: read next item in log
truncate: increments read offset in log
rewind: decrements read offset in log

These methods are used alongside RisingWave's internal global clock to make sure no data is lost. Hierarchy of the internal clock looks like this:

barriers every configurable ms, configurable using barrier_interval_ms in system params
checkpoints every N barriers, configurable using checkpoint_frequency system param
commits every N checkpoints, configurable for iceberg sink using commit_checkpoint_interval
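To see how the three knobs stack on top of each other, here is a rough Python sketch. It assumes barriers are counted from 1 and that checkpoints and commits line up on exact multiples, which is my simplification and not something taken from RisingWave's code. The function name is made up.

```python
def classify_barriers(num_barriers, checkpoint_frequency, commit_checkpoint_interval):
    """Label each barrier as a plain barrier, a checkpoint, or a checkpoint that also commits."""
    events = []
    checkpoints_seen = 0
    for barrier in range(1, num_barriers + 1):
        if barrier % checkpoint_frequency != 0:
            events.append("barrier")
            continue
        checkpoints_seen += 1
        if checkpoints_seen % commit_checkpoint_interval == 0:
            events.append("commit")
        else:
            events.append("checkpoint")
    return events


# Example values picked for illustration only.
events = classify_barriers(12, checkpoint_frequency=2, commit_checkpoint_interval=3)
for number, event in enumerate(events, start=1):
    print(number, event)
```

With a checkpoint every 2 barriers and a commit every 3 checkpoints, a commit only happens once every 6 barriers.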

So we can keep reading data async on every barrier using next_item and keep truncating on every commit. This would ensure we lose no data for different types of sinks.
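Here is a toy model of that read/truncate/rewind dance in Python. ToyLogStore is hypothetical and much simpler than RisingWave's real LogReader trait. I am modelling truncate as advancing a durable "consumed up to here" offset and rewind as moving the read position back to that offset.

```python
class ToyLogStore:
    """Minimal in-memory log with a read position and a truncation point."""

    def __init__(self, items):
        self._items = list(items)
        self._read_pos = 0        # index of the item next_item() will return next
        self._truncated_upto = 0  # everything before this index was committed downstream

    def next_item(self):
        item = self._items[self._read_pos]
        self._read_pos += 1
        return item

    def truncate(self):
        # Called after a successful sink commit: items read so far are no longer needed.
        self._truncated_upto = self._read_pos

    def rewind(self):
        # Called on recovery: go back to the last point that was safely committed.
        self._read_pos = self._truncated_upto


log = ToyLogStore(["r1", "r2", "r3", "r4", "r5"])

delivered = [log.next_item() for _ in range(3)]
log.truncate()                                    # sink committed r1..r3

in_flight = [log.next_item() for _ in range(2)]   # r4, r5 were read but never committed
log.rewind()                                      # crash and recovery before the commit

replayed = [log.next_item() for _ in range(2)]
print(delivered, in_flight, replayed)
```

The rows that were read but not committed before the crash get replayed, while the committed ones are not read again.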

Let's see what happens for iceberg sink:

Firstly, how LogReader relates to our iceberg writer. LogReader is used by LogSinker , in our case DecoupleCheckpointLogSinker here and that finally calls:

For writing: write_batch here
For committing: commit here, this follows central clock of barriers

Now, DecoupleCheckpointLogSinker also listens to the central clock of barriers and writes data files to the object store on each barrier here (i.e. it calls the close method on the data file writer), but it actually commits the result only every N checkpoints.

So technically, if the barrier and checkpoint values are not the same and a compute node crashes between two checkpoints, we would have written data files to the object store, but they would not be committed, i.e. there would be no metadata files for them. These files would then be left for the orphan-file table maintenance job to clean up.

This can be mitigated by simply setting checkpoint_frequency to 1 i.e. trigger at every barrier and also commit_checkpoint_interval to 1 i.e. commit on every barrier/checkpoint.
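A small back-of-the-envelope model of this, again in C and again just a sketch: I am assuming one data file is written per barrier and that a commit covers every file written up to and including that barrier. The function name is hypothetical, and the tiny window between closing a file and the commit succeeding is not modelled.

```c
#include <assert.h>

/*
 * Toy model, not RisingWave code. Returns how many data files are written
 * but not yet committed if the node crashes right after `barriers_seen`
 * barriers have been processed.
 */
int uncommitted_files_at_crash(int barriers_seen, int checkpoint_frequency,
                               int commit_checkpoint_interval) {
  assert(barriers_seen >= 0);
  assert(checkpoint_frequency > 0 && commit_checkpoint_interval > 0);
  /* A commit happens once every this many barriers. */
  int barriers_per_commit = checkpoint_frequency * commit_checkpoint_interval;
  /* Files written since the last commit are the potential orphans. */
  return barriers_seen % barriers_per_commit;
}
```

With both values set to 1 the result is always 0 in this model, while something like checkpoint_frequency = 2 and commit_checkpoint_interval = 5 can leave up to 9 files behind.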

Now, how do we increase the batch size? That can be done by configuring barrier_interval_ms . Though this could be a bad idea, because barriers are used internally for a lot of other things; they are like ticks in the Minecraft engine. So slowing everything down for the sake of batching can make us lose other internal system state/data, leaving the system in a weird non-recoverable condition.

Lets write a Brainfuck Interpreter: Optimizations

Sun, 26 May 2024 00:00:00 GMT

In the last part, we wrote a naive implementation of brainfuck, which is painfully slow. Let's try to optimize it; we will mainly discuss two major optimizations in this blog. We will end up with really nice speedups at the end, so buckle up and let's go!

First optimization

One of the best things about implementing brainfuck is that its implementation is simple and straightforward, and hence one can find optimization opportunities relatively easily. We don't try to plot a flamegraph, cause we know most of the time is spent in the exec function; that's where we execute all of our operations, so any optimization done in that flow would give us direct, noticeable speedups.

Let's look at the implementations of our operands again; this is the core loop right now. There's not much to see: implementations for >, <, +, -, ., , are pretty simple and even one-liners :P . So let's have a look at the multi-liners, i.e. the loop implementations. Here the hottest path would definitely be finding the corresponding loop operand. Let's say we have a program like:

[:::[::]:::[::[::]::]]
^1  ^2     ^3 ^4

(: here means any random operand.) We have 4 loops in total: 1 is the parent loop of all, containing 2 and 3 as its immediate child loops, and finally 4 sits inside 3. Let's say 1 repeats 5 times. In a single iteration of loop 1 we will be finding the end of loop 2 once, which makes this find operation happen 5 times. Now let's say loop 3 executes 10 times per iteration of loop 1; for loop 4 we will then execute the find operation 10 * 5 = 50 times. This is wasted computation. We can do this computation once and store it for the whole execution of the program.

So do we make a kind of caching mechanism to store just for the inner loops? Technically we also have to jump for outer loops, so we do need jump indices for them too, but only once for the outermost loop, and fewer times for depth-one loops. What if we precompute all bracket locations? Right now we compute them while executing; we could instead do it before execution starts, and then reference them to jump around easily. Let's give it a try and see our benchmark results.

We make an array as big as the program and fill it with -1 values; at the exact index of each loop operand we fill in the index of its corresponding loop operand. We create two such arrays: open_brackets_loc and close_brackets_loc. Now just before entering the core loop of exec we call a new function called fill_brackets_loc; it takes in the program and its length, calculates all bracket locations and fills them into our arrays. The implementation is simple: we find a [ and maintain a counter till we find the corresponding ], same as what we did in the last blogpost, but we only do it once this time, at the very start. Code looks like this:

void fill_brackets_loc(char *prog, int prog_len) {
  int i = 0, next_open_bracket_loc = -1;

  while (i < prog_len) {
    switch (prog[i]) {
    case '[': {
      int brackets_depth = 0;
      for (int j = i; j < prog_len; j++) {
        if (prog[j] == '[') { // found a new loop start operand
          if (next_open_bracket_loc == -1 && j != i) {
            next_open_bracket_loc = j;
          }
          brackets_depth++; // increase the counter
        } else if (prog[j] == ']') { // found a new loop end operand
          brackets_depth--; // decrease the counter
        }

        if (brackets_depth == 0) {
          open_brackets_loc[i] = j; // filling in our arrays
          close_brackets_loc[j] = i;
          break;
        }
      }

      if (brackets_depth != 0) {
        ABORT("brackets mismatch"); // oops didn't find corresponding loop operand
      }

      break;
    }
    default:
      break;
    }

    if (next_open_bracket_loc != -1) {
      i = next_open_bracket_loc;
      next_open_bracket_loc = -1;
    } else {
      i++;
    }
  }
}

We can even do one better by also storing each [] pair identified while traversing nested loops. But for now, this works :P
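For the curious, that "one better" version could look roughly like the sketch below: a single pass with an explicit stack, so every pair is recorded the moment its ] is seen. This is not what the interpreter in this post uses. The name match_brackets and the single jump array (instead of the two arrays above) are my own choices for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Single-pass bracket matching with an explicit stack.
 * `jump` needs room for prog_len ints. On success jump[i] holds the index
 * of the matching bracket for every '[' and ']', and -1 everywhere else.
 * Returns 0 on success, -1 on mismatched brackets or allocation failure.
 */
int match_brackets(const char *prog, int prog_len, int *jump) {
  assert(prog != NULL && jump != NULL && prog_len >= 0);
  int *stack = malloc(sizeof(int) * (size_t)(prog_len > 0 ? prog_len : 1));
  if (stack == NULL)
    return -1;
  int depth = 0;

  for (int i = 0; i < prog_len; i++) {
    jump[i] = -1;
    if (prog[i] == '[') {
      stack[depth++] = i; // remember where this loop opened
    } else if (prog[i] == ']') {
      if (depth == 0) { // found ']' with no '[' before it
        free(stack);
        return -1;
      }
      int open = stack[--depth]; // innermost loop still open
      jump[open] = i;
      jump[i] = open;
    }
  }

  free(stack);
  return depth == 0 ? 0 : -1; // leftover '[' means a mismatch
}
```

Each character is visited exactly once, so the cost stays linear in program size no matter how deeply the loops are nested.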

Now our [ handler in exec looks like this:

case '[':
  if (tape[pointer] == 0) {
    int idx = open_brackets_loc[i];
    if (idx == -1) {
      DBG_PRINTF("[: got bracket_loc as -1 for i: %d", i);
      ABORT("invalid state");
    }
    i = idx;
    continue;
  }

  break;

We directly look up the location of the corresponding loop operand and jump!

Same for ]:

case ']': {
  if (tape[pointer] != 0) {
    int idx = close_brackets_loc[i];
    if (idx == -1) {
      DBG_PRINTF("]: got bracket_loc as -1 for i: %d", i);
      ABORT("invalid state");
    }
    i = idx;
    continue;
  }

  break;
}

Running our benchmarks now gives us: Factor: ~7s Mandelbrot: ~22s

That's some big gains from a simple observation! But wait we have more :)

Second optimization

Before getting to this one, I made a small change to our core exec: instead of using characters I am using enum variants to identify each operation. It's essentially the same thing as before, just a different representation. For the conversion between character operations and enum variants I wrote a simple parse function:

enum Op_type {
  INVALID = 0,
  FWD,
  BWD,
  INCREMENT,
  DECREMENT,
  OUTPUT,
  INPUT,
  JMP_IF_ZERO,
  JMP_IF_NOT_ZERO,
};

void parse(char *prog, int prog_len) {
  int i = 0;

  while (i < prog_len) {
    enum Op_type op_type = INVALID;
    switch (prog[i]) {
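    case '>':
      op_type = FWD;
      // intentional fallthrough: these four cases share one break
    case '<':
      if (op_type == INVALID)
        op_type = BWD;
      // intentional fallthrough
    case '+':
      if (op_type == INVALID)
        op_type = INCREMENT;
      // intentional fallthrough
    case '-': {
      if (op_type == INVALID)
        op_type = DECREMENT;
      break;
    }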
    case '.':
      op_type = OUTPUT;
      break;
    case ',':
      op_type = INPUT;
      break;
    case '[':
      op_type = JMP_IF_ZERO;
      break;
    case ']':
      op_type = JMP_IF_NOT_ZERO;
      break;
    default:
      break;
    }

    if (op_type == INVALID) {
      i++;
      continue; // this can happen when there are comments which are supposed to
                // be ignored
    }
    i++;
  }
}

Now, an interesting optimization I have seen done in bytecode interpreters is combining instructions when they occur together way too often. This can happen with the same or with different instructions. I learnt about this for the first time while completing Crafting Interpreters (https://craftinginterpreters.com/), an amazing book by Bob Nystrom. So let's try to find out if it is possible to combine any instructions in our case. We add an array of size [number of instructions][number of instructions], because we want to check how each instruction relates to the other ones. At the end of the parse function we add this code to make it record op_assoc:

ops[++ops_len] = op;
if (ops_len > 0) {
    // This logic simply tries to unite op_assoc[1][5]
    // and op_assoc[5][1] into one single field
    int op_type_1 = (int)ops[ops_len].op_type;
    int op_type_2 = (int)ops[ops_len - 1].op_type;
    if (op_type_1 >= op_type_2) {
      op_assoc[op_type_2][op_type_1]++;
    } else {
      op_assoc[op_type_1][op_type_2]++;
    }
}

We unite results of the form op_assoc[i][j] and op_assoc[j][i] into one field op_assoc[i][j] (with i <= j), cause we don't want to see the association of + followed by [ and [ followed by + as separate results. With this done, let's get its output for mandelbrot:

DEBUG: op_assoc[1][1]: 3506
DEBUG: op_assoc[1][3]: 438
DEBUG: op_assoc[1][4]: 337
DEBUG: op_assoc[1][5]: 3
DEBUG: op_assoc[1][7]: 498
DEBUG: op_assoc[1][8]: 568
DEBUG: op_assoc[2][2]: 3604
DEBUG: op_assoc[2][3]: 386
DEBUG: op_assoc[2][4]: 246
DEBUG: op_assoc[2][5]: 3
DEBUG: op_assoc[2][7]: 362
DEBUG: op_assoc[2][8]: 521
DEBUG: op_assoc[3][3]: 224
DEBUG: op_assoc[3][7]: 30
DEBUG: op_assoc[3][8]: 86
DEBUG: op_assoc[4][4]: 2
DEBUG: op_assoc[4][7]: 462
DEBUG: op_assoc[4][8]: 133
DEBUG: op_assoc[5][7]: 1
DEBUG: op_assoc[5][8]: 1
DEBUG: op_assoc[7][7]: 10
DEBUG: op_assoc[8][8]: 32

The highest ones are (2, 2) and (1, 1), so repeating instructions, specifically < and >. These should be easy to club together. Let's do just that: we will add a repeat field to each operation which stores how many times the operation repeats. After this we can make the exec function change values by repeat instead of just 1. After this change our exec function looks like this:

int exec(char *prog, int prog_len) {
  DBG_PRINT(prog);
  int i = 0, val;

  parse(prog, prog_len);
  // print_op_assoc(); // This is for checking which all ops occur together
  fill_brackets_loc();

  while (i <= ops_len) {
    // start = clock();
    switch (ops[i].op_type) {
    case FWD:
      pointer += ops[i].repeat; // We increment by `repeat` now
      break;
    case BWD:
      pointer -= ops[i].repeat; // We decrement by `repeat` now
      break;
    case INCREMENT:
      val = (int)tape[pointer];
      val += ops[i].repeat; // We increment by `repeat` now
      tape[pointer] = (char)val;
      break;
    case DECREMENT:
      val = (int)tape[pointer];
      val -= ops[i].repeat; // We decrement by `repeat` now
      tape[pointer] = (char)val;
      break;
    case OUTPUT:
      printf("%c", tape[pointer]);
      break;
    case INPUT: {
      char ch = (char)getchar();
      tape[pointer] = ch;
      break;
    }
    case JMP_IF_ZERO:
      if (tape[pointer] == 0) {
        int idx = open_brackets_loc[i];
        if (idx == -1) {
          DBG_PRINTF("[: got bracket_loc as -1 for i: %d", i);
          ABORT("invalid state");
        }
        i = idx;
        continue;
      }

      break;
    case JMP_IF_NOT_ZERO: {
      if (tape[pointer] != 0) {
        int idx = close_brackets_loc[i];
        if (idx == -1) {
          DBG_PRINTF("]: got bracket_loc as -1 for i: %d", i);
          ABORT("invalid state");
        }
        i = idx;
        continue;
      }

      break;
    }
    case INVALID:
      ABORT("INVALID shouldn't have leaked till here, there's a bug in parsing "
            "code");
    default:
      break;
    }

    i++;
  }

  return 0;
}
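The exec above only works if parse has already folded runs of the same operation into one entry with a repeat count. The post's real parse isn't reproduced here, so below is a minimal sketch of how that folding could look. struct Op, to_op_type, is_foldable and parse_folded are hypothetical names; only the op_type and repeat fields and the enum come from the code above.

```c
#include <assert.h>
#include <stddef.h>

enum Op_type { INVALID = 0, FWD, BWD, INCREMENT, DECREMENT, OUTPUT, INPUT,
               JMP_IF_ZERO, JMP_IF_NOT_ZERO };

/* Hypothetical layout; the post only shows the fields op_type and repeat. */
struct Op {
  enum Op_type op_type;
  int repeat;
};

static enum Op_type to_op_type(char c) {
  switch (c) {
  case '>': return FWD;
  case '<': return BWD;
  case '+': return INCREMENT;
  case '-': return DECREMENT;
  case '.': return OUTPUT;
  case ',': return INPUT;
  case '[': return JMP_IF_ZERO;
  case ']': return JMP_IF_NOT_ZERO;
  default:  return INVALID; // any other character is a comment
  }
}

/* Only these four fold safely: n single steps equal one step of size n. */
static int is_foldable(enum Op_type t) {
  return t == FWD || t == BWD || t == INCREMENT || t == DECREMENT;
}

/*
 * Parses prog into out (which needs room for prog_len entries) and returns
 * the number of ops written. Runs of the same foldable op become one entry.
 */
int parse_folded(const char *prog, int prog_len, struct Op *out) {
  assert(prog != NULL && out != NULL);
  int n = 0;
  for (int i = 0; i < prog_len; i++) {
    enum Op_type t = to_op_type(prog[i]);
    if (t == INVALID)
      continue; // skip comments
    if (n > 0 && is_foldable(t) && out[n - 1].op_type == t) {
      out[n - 1].repeat++; // extend the current run
    } else {
      out[n].op_type = t;
      out[n].repeat = 1;
      n++;
    }
  }
  return n;
}
```

Note that once ops are folded, bracket locations have to be computed over the folded array and not over the raw program text, which fits with exec calling parse before fill_brackets_loc.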

Simple and easy, let's benchmark this change:

Factor: ~2.16s Mandelbrot: ~5.9s

And we get another round of massive speedups! Whole code is available here

This is where we halt our optimization efforts. Next we are going to learn about JITs from a systems perspective: how do we leverage kernel APIs to achieve JITting?