Earlier this evening I stumbled over this great article in Luis Tineo's blog: http://www.kingletas.com/2012/09/use-sqlite3-with-magento.html

Once again he had some great ideas and shared them on his blog. Always looking for ways to improve Magento's performance, this time he addressed the autoloading mechanism.

Magento's powerful concept of having multiple code pools, the deep directory nesting and the fact that it literally comes with thousands of files mean that a single file can take up to five lookups, which adds up to thousands of lookups per request. Check this code snippet in Mage.php:

    /**
     * Set include path
     */
    $paths[] = BP . DS . 'app' . DS . 'code' . DS . 'local';
    $paths[] = BP . DS . 'app' . DS . 'code' . DS . 'community';
    $paths[] = BP . DS . 'app' . DS . 'code' . DS . 'core';
    $paths[] = BP . DS . 'lib';

    $appPath = implode(PS, $paths);
    set_include_path($appPath . PS . Mage::registry('original_include_path'));
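
To make the cost of that include path a bit more tangible, here is a minimal sketch of the locations that have to be probed for one class. The helper names (classToRelativePath, candidateFiles) are made up for this illustration and are not part of Magento; the path list only contains the four entries from the snippet above and leaves out the original include path.

```php
<?php
// Sketch only: classToRelativePath() and candidateFiles() are hypothetical helpers.

/** Convert an underscore-separated class name to its relative file path. */
function classToRelativePath($class)
{
    $spaced = ucwords(str_replace('_', ' ', $class));
    return str_replace(' ', DIRECTORY_SEPARATOR, $spaced) . '.php';
}

/** List every location that may have to be probed, in include path order. */
function candidateFiles($class, array $includePaths)
{
    $relative = classToRelativePath($class);
    $candidates = array();
    foreach ($includePaths as $path) {
        $candidates[] = rtrim($path, '/\\') . DIRECTORY_SEPARATOR . $relative;
    }
    return $candidates;
}

// The four entries set in Mage.php, relative to the base path.
$paths = array('app/code/local', 'app/code/community', 'app/code/core', 'lib');
$candidates = candidateFiles('Mage_Catalog_Model_Product', $paths);
```

A core class is only found at the third candidate, so the first two probes are wasted on every cold lookup.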

Magento's native compiler

Magento actually comes with a solution for this: The compiler.

Magento's compiler copies all php files into a flat directory and renames each file to match its class name. This way the native autoloader only has to look in one directory. This mechanism comes with some issues, though. Magento won't operate on the original files anymore (which makes it impossible to develop while the compiler is enabled), and lots of complex file operations have to be done during the compilation process (including copying whole folders of related stuff; check Mage_Compiler_Model_Process). On top of that, you will very likely run into problems if the compiler is enabled.
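
The difference between the two layouts can be sketched in a few lines. The function names are hypothetical, and 'includes/src' is only an assumed location for the flat directory; this is not the compiler's actual code.

```php
<?php
// Sketch only: nestedPath() and flatPath() are hypothetical helpers.

/** Nested layout: the class name is split into directories. */
function nestedPath($class)
{
    return str_replace('_', '/', $class) . '.php';
}

/** Flat layout: the file is simply named after the class. */
function flatPath($class, $flatDir = 'includes/src')
{
    // 'includes/src' is an assumption made for this example.
    return $flatDir . '/' . $class . '.php';
}

$nested = nestedPath('Mage_Core_Model_App'); // still relative to several include paths
$flat   = flatPath('Mage_Core_Model_App');   // exactly one possible location
```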

This is the reason why we aren't using the Magento compiler at all. Plus, if you're using a php bytecode cache (like APC), loading classes should be remarkably faster anyway. (Also check this blog post: http://www.byte.nl/blog/2011/10/11/should-i-use-the-magento-compiler/)

Luis' solution

Now back to Luis' approach: Using sqlite3 he's keeping track of the mapping between a class name and the corresponding file. While this is a great idea I was wondering what justifies adding another component to handle this problem.

My solution

So my approach is: Instead of using sqlite3, redis, memcache or any other additional component, I'm using plain php. The amount of data that needs to be stored is not too big and storing this data in a file in the local filesystem, reading it once and processing requests from memory is probably much faster than interacting with any other database.

So I created a simple proof-of-concept module, "Aoe_ClassPathCache". Download it from GitHub (Aoe_ClassPathCache) or check it out from there. (As always, this module includes a modman configuration file.)

This module basically replaces Varien_Autoload and searches for the files in the configured include paths itself instead of handing this task over to php's include function. The exact location of each file is cached as a serialized array in var/classpathcache.php.
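
The idea can be sketched roughly like this. It is a simplified illustration and not the module's actual code: the class name ClassPathCacheSketch, its methods and the constructor arguments are all hypothetical.

```php
<?php
// Sketch only: ClassPathCacheSketch and its methods are hypothetical names.

class ClassPathCacheSketch
{
    protected $cacheFile;
    protected $includePaths;
    protected $cache = array();
    protected $dirty = false;

    public function __construct($cacheFile, array $includePaths)
    {
        $this->cacheFile = $cacheFile;
        $this->includePaths = $includePaths;
        // Read the serialized map once; everything after that is served from memory.
        if (is_file($cacheFile)) {
            $data = unserialize(file_get_contents($cacheFile));
            if (is_array($data)) {
                $this->cache = $data;
            }
        }
    }

    /** Return the full path for a relative file, probing the include paths only on a cache miss. */
    public function resolve($relativeFile)
    {
        if (isset($this->cache[$relativeFile])) {
            return $this->cache[$relativeFile];
        }
        foreach ($this->includePaths as $path) {
            $candidate = $path . DIRECTORY_SEPARATOR . $relativeFile;
            if (is_file($candidate)) {
                $this->cache[$relativeFile] = $candidate;
                $this->dirty = true;
                return $candidate;
            }
        }
        return false;
    }

    /** Autoload callback: map the class name to a relative file and include it. */
    public function autoload($class)
    {
        $relative = str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
        $file = $this->resolve($relative);
        if ($file !== false) {
            include $file;
        }
    }

    /** Persist the map, but only if new entries were added during this request. */
    public function save()
    {
        if ($this->dirty) {
            file_put_contents($this->cacheFile, serialize($this->cache));
            $this->dirty = false;
        }
    }
}
```

Such an object could be hooked in with spl_autoload_register(array($loader, 'autoload')) and written back once per request with register_shutdown_function(array($loader, 'save')).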

No php class file is touched, and as long as you don't add new files that override other files in the autoloader hierarchy, you don't even need to delete the cache while developing or deploying.

Currently there's no interface for deleting this class path cache, but you can simply delete the var/classpathcache.php file. The file will then be gradually repopulated as classes are accessed for the first time.

I don't have any performance comparisons in place, and also note that this is currently a proof-of-concept. If you consider using this on a production system, please test it properly (and let me know your results). On the other hand, this mechanism is pretty unobtrusive and doesn't interfere much with any other processes and concepts, so I don't expect any bigger issues.

Have fun,

Fabrizio

Update

After publishing this module yesterday I ended up having lots of new ideas overnight and spent some time this morning improving it.

The new version 0.1.0 contains the following changes/improvements:

  • The base path isn't part of the cached content anymore. This way the cache size (that is kept in memory) is remarkably smaller. And the cached file is portable and can be part of your deployment.
  • Instead of using the file path as the key for the cache I changed this to use the actual class names. That again is slightly shorter and avoids converting class names to file names over and over again.
  • (Un)serializing is slow. And as the file will be read so much more often than it will be written, I decided to store the cache information as native php and include that file. This way you can also see what's actually stored in the cache. But don't change anything manually in the cache file, as it will be overwritten.
  • Writing cache files can cause issues if two requests are trying to update the same file. By writing into a temporary file first and then renaming it to the final file name, the update should be almost atomic (well, php's rename function isn't atomic itself, but this should be good enough).
  • Whitespace fix.
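
Taken together, these changes could look roughly like the following sketch. Again this is only an illustration under my own assumptions and not the module's actual code: the function names, the example entries and the cache file location are hypothetical.

```php
<?php
// Sketch only: writeClassMap() and readClassMap() are hypothetical helpers.

/** Write the class map as a plain php file that returns an array. */
function writeClassMap($cacheFile, array $map)
{
    $code = '<?php return ' . var_export($map, true) . ';' . "\n";
    // Write to a temporary file in the same directory first ...
    $tmpFile = tempnam(dirname($cacheFile), 'cpc');
    if ($tmpFile === false || file_put_contents($tmpFile, $code) === false) {
        return false;
    }
    // ... then move it into place, so readers never see a half-written file.
    return rename($tmpFile, $cacheFile);
}

/** Load the class map by including the file; a missing file means an empty cache. */
function readClassMap($cacheFile)
{
    if (!is_file($cacheFile)) {
        return array();
    }
    $map = include $cacheFile;
    return is_array($map) ? $map : array();
}

// Keys are class names, values are paths relative to the base path.
$map = array(
    'Mage_Core_Model_App' => 'app/code/core/Mage/Core/Model/App.php',
    'Varien_Object'       => 'lib/Varien/Object.php',
);

// Assumed location for this example only.
$cacheFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'classpathcache_sketch.php';
writeClassMap($cacheFile, $map);
$loaded = readClassMap($cacheFile);
```

Because the values are relative, the base path has to be prepended before a file is included, which is what keeps the cache file small and portable.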

Magento 2's solution

As Andrey Tserkus pointed out my solution is very close to the Magento 2 implementation of the autoloader caching. I hadn't checked that before, but it makes me happy to see that they came up with the same solution.

So consider my module as a backport of that feature :) (However the static nature of the class makes the Magento 1 based solution look a little bit more ugly...)
