When we were debugging this 10 (!) years ago, we used the lmbench memory test tool to confirm that the cache was set up the way we thought it was. It gives you very clear performance numbers, and you can see the jump in memory access time when you fall out of the cache. In fact, it's how I first figured out that L2 was not enabled at all ...
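
For anyone who wants to see the effect without installing lmbench, here is a minimal sketch of the same idea: chase pointers through a buffer of a given size and time the average dependent load. This is not lmbench's code, only loosely in the spirit of its memory read latency test. The names build_chain and chase_ns_per_load are made up for illustration, and it assumes a hosted C environment where clock() is good enough for timing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill chain[] with a random single-cycle permutation (Sattolo's algorithm).
 * Every load then depends on the previous one, and the access order is
 * hard for a hardware prefetcher to predict. */
static void build_chain(size_t *chain, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++)
        chain[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i keeps it one big cycle */
        size_t tmp = chain[i];
        chain[i] = chain[j];
        chain[j] = tmp;
    }
}

/* Average nanoseconds per dependent load over a working set of `bytes`.
 * Returns a negative value on bad arguments or allocation failure. */
static double chase_ns_per_load(size_t bytes, size_t loads)
{
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || loads == 0)
        return -1.0;

    size_t *chain = malloc(n * sizeof *chain);
    if (chain == NULL)
        return -1.0;
    build_chain(chain, n);

    size_t idx = 0;
    /* Warm-up pass so the timed loop starts from a steady cache state. */
    for (size_t i = 0; i < n; i++)
        idx = chain[idx];

    clock_t c0 = clock();
    for (size_t i = 0; i < loads; i++)
        idx = chain[idx];
    clock_t c1 = clock();

    /* Store the result through a volatile so the loop is not optimized out. */
    volatile size_t sink = idx;
    (void)sink;

    free(chain);
    return (double)(c1 - c0) / CLOCKS_PER_SEC * 1e9 / (double)loads;
}
```

To use it, call chase_ns_per_load() for working-set sizes doubling from a few KiB up to well past the largest cache you expect (say 4 KiB to 64 MiB) with a few million loads each, and print the size next to the result. The time per load should step up each time the working set falls out of a cache level. If a level such as L2 is not enabled, I would expect its plateau to be missing, so the curve goes from L1 latency more or less straight to DRAM latency.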
ron