I've been struggling with url-keys way too long these days and I think there are a number of things that aren't as smooth as they should be yet. I'm referring to Magento EE 1.13 (and parts of this also apply to the Magento CE 1.8 alpha1). This is not so much a blog post, but more offloading the results of my research and my open questions. I'm happy to read your comments and ideas.
- A product url-key used to be a normal attribute that was stored in the product's EAV tables:
select * from catalog_product_entity_varchar where attribute_id = 86;
- Now url-keys are stored as separate entities in the tables catalog_product_entity_url_key and catalog_category_entity_url_key instead (or in addition?)
- Url-keys can either live on the global scope only or be website or store specific. Like any other attribute, this can be configured in Catalog > Attributes > Manage Attributes.
- The new tables catalog_product_entity_url_key and catalog_category_entity_url_key come with a unique key that does not allow duplicate url-keys in the same store.
- However, if the url-key attribute is configured to allow store specific values you can still have conflicts between store values and a global fallback. This is not covered by the MySQL key.
- Then there are the index tables: Index data lives in core_url_rewrite and enterprise_url_rewrite. The data in core_url_rewrite does not seem to be updated at all unless I'm reusing the code from the old indexer to fill it. But this data is still used to render the links. Something seems to be broken here.
- Of course before anything can be done the duplicate urls need to be resolved. And we had tons of them. When products are duplicated the url-key will not be changed. And even if the url-keys were changed afterwards there is a high chance that the default value is still the copied one.
- I implemented a script that detects all duplicate keys and tries to resolve them by addressing hidden and disabled products first and changing the url-key by appending a meaningful attribute value (especially for variants of a configurable product). E.g. shoe, shoe-brown, shoe-black. The script worked, but due to all the other confusion mentioned here I couldn't get the new urls to be displayed correctly. Then I saw Vinai's solution (https://gist.github.com/Vinai/5451584) that is a lot faster and cleaner than my original one. But this version does not take url-keys on different store levels into account. I addressed this and this seems to work fine now: https://gist.github.com/fbrnc/5464097. But I still want to merge in the idea of my original script and try to make the url-key change as unobtrusive as possible by looking at invisible products first and changing the key to something more or less meaningful.
- You need to fix all duplicates first, otherwise the indexer will not do anything. And also will not complain about it.
- Using another script I tried to copy the store specific value to the global scope and to delete the store specific value so it would fall back to the global value. This worked. For some products. Not for all. Strange…
- Btw, if you have only one store configured Magento falls back into a single store mode and won't even show you the store selectors. I'm not sure if values will be stored in the global scope or in the store scope then. However, you can imagine what happens if you create a second store in that instance after some time. Data integrity isn't getting better...
- And then there are thousands of redirects from old urls to new urls that blow up everything and won't make it any simpler.
- Before 1.13/1.8 any CMS page with a url-key that was also used as a category or product url-key would be evaluated first. This way you could easily replace the main categories by cms landing pages. This has changed now. Even though the CMS controller is processed first, the product and category urls will be evaluated before the routing process starts, making it much harder to display cms content in a clean way instead.
- The Magento backend won’t allow you to add a leading dot before your url suffixes but claims that this is happening automatically. It doesn't. Here's my solution: https://github.com/fbrnc/Aoe_SuffixFix
- Generated urls will be cached in the object. This is a good thing, but results in broken urls when category-specific and category-independent urls are generated in the same request.
- Magento comes with a couple of new scheduler tasks to process the indexes. The interesting thing here is that a new way of scheduling was introduced. Instead of a cron syntax (e.g. */5 * * * *) now there is "always". Those tasks will be executed on every cron run. And the ugly part is: The same cron record will be reused over and over again. So you cannot see a history of what's happening and error messages will be stuck in this single record forever. I guess I'll have to revisit this for my Aoe_Scheduler module. Also there are a number of other crons that are executed every minute. Why isn't this consistent and also running "always"? And if your scheduler isn't triggered every minute from the system's cron the tasks will start piling up.
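
To make the store-versus-global conflict from the notes above a bit more concrete, here is a minimal sketch in Python (not Magento code). It assumes url-key rows are available as (product_id, store_id, url_key) tuples and that store_id 0 holds the global value. The function names and the sample data are made up for illustration. The idea is to compute the effective key per store (store value if present, otherwise the global one) and then look for keys shared by more than one product, which is exactly the case a per-store unique key cannot catch.

```python
from collections import defaultdict

GLOBAL_STORE = 0  # assumption: store_id 0 carries the global (fallback) value


def effective_keys(rows, store_ids):
    """Map (store_id, url_key) to the set of products that resolve to it.

    rows: iterable of (product_id, store_id, url_key) tuples.
    A store-specific value wins; otherwise the global value is used.
    """
    by_product = defaultdict(dict)
    for product_id, store_id, url_key in rows:
        by_product[product_id][store_id] = url_key

    resolved = defaultdict(set)
    for product_id, values in by_product.items():
        for store_id in store_ids:
            key = values.get(store_id, values.get(GLOBAL_STORE))
            if key is not None:
                resolved[(store_id, key)].add(product_id)
    return resolved


def find_conflicts(rows, store_ids):
    """Return only the (store_id, url_key) pairs used by more than one product."""
    resolved = effective_keys(rows, store_ids)
    return {k: sorted(v) for k, v in resolved.items() if len(v) > 1}


# Hypothetical data: no two rows share the same (store_id, url_key),
# so a per-store unique key would accept all of them.
rows = [
    (1, 0, "shoe"),        # product 1: global value only
    (2, 0, "shoe-brown"),  # product 2: global value ...
    (2, 1, "shoe"),        # ... overridden in store 1
]
conflicts = find_conflicts(rows, store_ids=[1, 2])
print(conflicts)
```

In store 1 both products end up with "shoe" (product 1 via the global fallback, product 2 via its store value), while store 2 is fine.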
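
The "rename hidden and disabled products first, append a meaningful attribute value" idea from the duplicate resolution note can be sketched roughly like this. This is a simplified Python illustration for a single scope, not the script from the gists. The dictionary fields (visible, enabled, variant) and the function names are assumptions chosen for the example.

```python
import re


def slugify(value):
    """Lowercase the value and collapse anything non-alphanumeric into dashes."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def resolve_duplicates(products):
    """Return {product_id: new_url_key} for products that need a new key.

    Visible, enabled products are processed first so they keep their key;
    hidden or disabled ones are the ones that get renamed.
    """
    ordered = sorted(
        products,
        key=lambda p: (not (p["visible"] and p["enabled"]), p["id"]),
    )
    taken = set()
    renamed = {}
    for product in ordered:
        key = product["url_key"]
        if key in taken:
            # Prefer a meaningful suffix (e.g. the variant's color),
            # fall back to the product id if there is none.
            suffix = slugify(product.get("variant") or "")
            candidate = f"{key}-{suffix}" if suffix else f"{key}-{product['id']}"
            key = candidate
            counter = 2
            while key in taken:
                key = f"{candidate}-{counter}"
                counter += 1
            renamed[product["id"]] = key
        taken.add(key)
    return renamed


# Hypothetical catalog: three products that all share the key "shoe".
products = [
    {"id": 1, "url_key": "shoe", "visible": False, "enabled": True, "variant": "Brown"},
    {"id": 2, "url_key": "shoe", "visible": True, "enabled": True, "variant": None},
    {"id": 3, "url_key": "shoe", "visible": False, "enabled": False, "variant": "Black"},
]
renamed = resolve_duplicates(products)
print(renamed)
```

The visible product 2 keeps "shoe", and the two hidden variants become "shoe-brown" and "shoe-black". A real run would have to repeat this per store scope and take the global fallback into account.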