Code Optimization – The what and the how?

An optimization problem is the problem of finding the best solution from all feasible solutions. Well, that's what Wikipedia says about optimization. For the most part, in the programming realm, optimization feels more like an art: a balancing act of choosing what to optimize and what not to, and when to optimize and when not to.

Errr, What Again?

Alright, let's try that again. For me, optimization is not the holy grail; whether it is worth doing depends on what you want to achieve at the end of the day. It often becomes self-sabotage if not executed carefully.
Premature optimization, in particular, can lead to (or be the result of):

  • Less clear code.
  • Poor code architecture/arrangement.
  • Less secure code.
  • Wasted programming hours.
Fortunately, we live in a world where most technologies do not need optimization and are tuned to perform well in most cases right out of the box. Because of that, most systems may not need optimization for extended periods of time.
About 97% of the time you will not need optimization, and for the remaining 3% you should only be concerned about it if it is something that can be quantified. Quantification can be anything from response time, cost incurred, CPUs used, and RAM requirements to thread counts and anything in between.

Often this quantification is done with a profiler, but that does not always have to be the case. If you don't have access to one, chances are your engineers are already aware of where the performance bottlenecks are.
Keep in mind there will be instances where bottlenecks appear in areas of your code you would never have suspected, so having a profiling step is going to help you in the long run.

The idea, essentially, is to avoid early micro-optimizations. Optimizations at a larger scale, i.e. macro-optimizations (things like choosing an O(log N) algorithm instead of an O(N²) one), are always worth the effort and should be incorporated in the early stages.
A good example is choosing which data structure to work with when using Redis as a cache. There are trade-offs for each data structure, and your decision will mean making compromises based on the results you want to achieve at the end of the day.
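
To make that concrete, here is a minimal sketch (assuming the redis-py client, a local Redis instance, and made-up key names) of a leaderboard kept as a sorted set versus a plain hash that has to be sorted client-side on every read:

    import redis  # assumes redis-py is installed and Redis runs on localhost

    r = redis.Redis()

    # Sorted set: inserts are O(log N) and Redis can return the top N directly.
    r.zadd("leaderboard", {"alice": 4200, "bob": 3100, "carol": 5000})
    top_10 = r.zrevrange("leaderboard", 0, 9, withscores=True)

    # Plain hash: every "top N" query pulls the whole structure over the network
    # and sorts it client-side, which is O(N log N) per read.
    r.hset("scores", mapping={"alice": 4200, "bob": 3100, "carol": 5000})
    scores = r.hgetall("scores")
    top_10_naive = sorted(scores.items(), key=lambda kv: float(kv[1]), reverse=True)[:10]

Neither choice is wrong in the abstract; the point is that the access pattern you care about should drive the structure you pick.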

THE WHEN

I think this is the easy part, and something I have been discussing for a while now. To summarize, I follow a three-step rule to determine when optimization is something we should consider doing.
Step 1. Get the code working.
Step 2. Verify that the code is correct.
Step 3. Make optimizations only when they can be quantified (in the context of making things faster, cost optimization, resource optimization, etc.). This is often done after using a profiling tool suitable for the technology in question.

The evaluation in the third step should start with defining clear goals, such as the performance threshold we want to achieve. Once that is done, we can select a plan of action, determine the root causes behind the need for optimization, and approach it accordingly.
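
As a rough sketch of what that profiling step can look like in Python (handle_request here is just a stand-in for whatever hot path you suspect), the built-in cProfile module is usually enough to get started:

    import cProfile
    import pstats

    def handle_request():
        # Stand-in for the code path you suspect is slow.
        return sum(i * i for i in range(1_000_000))

    profiler = cProfile.Profile()
    profiler.enable()
    handle_request()
    profiler.disable()

    # Print the ten functions with the highest cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)

The numbers it prints are the quantification we were after: a before/after comparison against the goal you defined, not a gut feeling.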

THE HOW?

We need to understand that every technology is different and will need a different strategy and approach towards optimization. There is no silver bullet, and optimization can happen at different levels of granularity.

The ideal approach would be to avoid the need for optimization altogether by adopting easy-to-follow best practices from the start.
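For example, something as simple as picking a container suited to your access pattern can prevent a whole class of slowdowns from ever appearing. Here is a tiny, hypothetical Python illustration (the data and sizes are made up) of membership checks against a list versus a set:

    import time

    ids = list(range(100_000))
    id_set = set(ids)

    start = time.perf_counter()
    hits = sum(1 for i in range(50_000) if i in ids)      # list: O(N) per lookup
    print("list lookups:", time.perf_counter() - start)

    start = time.perf_counter()
    hits = sum(1 for i in range(50_000) if i in id_set)   # set: O(1) average per lookup
    print("set lookups: ", time.perf_counter() - start)
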
Most of the time, though, I believe this luxury will not be available, and we will need to intervene as the need for optimization crops up.
When it does, try the following, among other things:

  • Start with, or move towards, a good/better architecture:
    Be it stack architecture or code architecture, a loosely coupled, highly modular system is something we should always aim for. The benefits of a maintainable codebase, and of a loosely coupled architecture, each deserve a separate post of their own. In my experience, microservices and distributed architectures work wonders most of the time.
  • Choose, or migrate towards, the right data structure:
    When choosing a data structure, make sure the access pattern you hit most often is the fastest one. Conversely, I almost never try to optimize a process that runs only once a week or once a day.
  • The database:
    A correctly architected database design goes a long way, and it would be foolish to assume that DB design is something which can easily be refactored at a later stage. A poorly designed DB is almost impossible to fix easily, and it will cost you time, money, and sleepless nights! This is because the DB sits at the base of the whole system; if we are stuck with a poor design, the changes needed to recover from it can propagate through every level of your software stack. I will always recommend spending enough time designing and testing your DB schema, and avoiding cursors within your code.
  • Optimize for memory:
    The speed of retrieving data goes in the following order:
    cache memory > memory > disk > network
    Hence, try to use memory-bound I/O as much as possible, which roughly translates to using a memory-based DB as a cache (Redis or Memcached) whenever possible, to keep disk-based database fetches to a minimum (a small cache-aside sketch follows this list). I remember reading somewhere:

    The memory subsystem is 10 to 100 times slower than the CPU (Cache memory), and if your data gets paged to disk, it’s 1000 to 10,000 times slower.

Once we have a clear view of the things that need to be optimized, we should tackle them one item at a time. Although this might sound like a lot of work and time, it ensures high availability and keeps our optimization efforts granular. It also lowers the chance of system-wide error propagation while keeping the cost of each individual optimization down.
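
As for the cache-aside pattern mentioned above, a minimal sketch might look like this (assuming redis-py; fetch_user_from_db is a hypothetical stand-in for your real data layer, and the key format and TTL are made up):

    import json
    import redis  # assumes redis-py and a Redis instance on localhost

    cache = redis.Redis()

    def fetch_user_from_db(user_id: int) -> dict:
        # Stand-in for a real database query; replace with your own data layer.
        return {"id": user_id, "name": "example"}

    def get_user(user_id: int) -> dict:
        """Cache-aside read: try memory first, fall back to the database."""
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)              # served from memory
        user = fetch_user_from_db(user_id)         # disk-backed fallback
        cache.set(key, json.dumps(user), ex=300)   # cache for 5 minutes
        return user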
