ARM Architecture

Exclusive Access Monitors – Stress Validation

By Aditi Sharma, NXP


With the rising complexity of automotive subsystems, the need for multiple cores in an SoC is growing. Multiple cores access shared system resources all the time. When more than one core accesses the same memory or any shared variable, the value of the shared variable can end up wrong because of a race condition. Race conditions pose a serious threat to data integrity. They can be prevented if proper synchronization methods are used. One such synchronization method is using Exclusive Load and Store on a shared memory resource.

Exclusive Load and Store instructions need exclusive access monitors on the target memory.

In this article, we will see the methods required to stress the exclusive access monitors at the target memory to uncover any bug buried deep inside the design.


The exclusive access mechanism enables the implementation of semaphore-type operations without requiring the bus to remain locked to a particular master for the duration of the operation. The advantage of exclusive access is that semaphore-type operations do not impact either the critical bus access latency or the maximum achievable bandwidth[2].

The basic process of exclusive access is:

a) Controller_1 performs an exclusive read from a shared address location.

b) Controller_1 then immediately attempts to complete the exclusive operation by performing an exclusive write to the same shared address location.

c) The exclusive write access of Controller_1 is considered:

  • Successful if an EXOKAY response is received. This can occur only if no other controller has completed a normal or exclusive write to that location between the read and write accesses.
  • Failed if an OKAY response is received. It means another controller has completed a normal or exclusive write to the shared location between the read and write accesses. In this case, the shared address location is not updated by Controller_1.

When Controller_1 performs an exclusive read, an exclusive access monitor is allocated. The exclusive access monitor contains the transaction ID, the address to be monitored, and the number of bytes to monitor. Different scenarios can occur if the target memory has at least two monitors. Five of them are listed below:

i) Controller_1 (C1) performs an exclusive read at shared address A1, leading to the allocation of an exclusive access monitor. Controller_1 then immediately issues an exclusive write to the same shared address A1. No other controller has written in between the exclusive read and write operations of Controller_1. The exclusive access monitor is deallocated on the exclusive write, and an EXOKAY response is sent, signaling a successful write to Controller_1.

ii) Controller_1 (C1) performs an exclusive read on shared address A1. Controller_2 (C2) then performs an exclusive read and an exclusive write on the same location. Once the exclusive write of Controller_2 succeeds with EXOKAY, both the Controller_1 and Controller_2 monitors are deallocated immediately. Controller_1 then performs an exclusive write and receives an OKAY response.

The Controller_1 exclusive write is unsuccessful, but the Controller_2 exclusive write is successful.

iii) Controller_1 (C1) performs an exclusive read on shared address A1. Controller_2 (C2) performs a normal write to the same location. The Controller_1 monitor is deallocated immediately. Controller_1 finally performs an exclusive write. Both Controller_2 and Controller_1 receive an OKAY response. The Controller_1 exclusive write is unsuccessful, but the Controller_2 normal write is successful.

iv) Controller_1 (C1) performs an exclusive read on shared address A1. Controller_2 (C2) performs an exclusive read on the same address A1. Controller_1 performs an exclusive write and receives an EXOKAY response. Both the Controller_1 and Controller_2 monitors are deallocated immediately. Controller_2 then performs an exclusive write and receives an OKAY response.

The Controller_1 exclusive write is successful, but the Controller_2 exclusive write is unsuccessful.

v) Controller_1 (C1) performs an exclusive read on shared address A1. Controller_1 then performs an exclusive read on a different address A2 within the same target memory. Since only one monitor can be associated with one transaction ID at the target memory at any particular time, the first monitor is deallocated immediately. Now, the Controller_1 exclusive write to address A2 receives an EXOKAY response, and its exclusive write to address A1 receives an OKAY response.
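The five scenarios can be reproduced with a small behavioral model. The Python sketch below is illustrative only and is not taken from any specification or RTL: the class `TargetMemory`, the addresses `A1`/`A2` and the IDs `C1`/`C2` are hypothetical names, the number of monitors is unbounded, and the monitored byte count is ignored. It assumes, as in the scenarios above, one monitor per transaction ID and that any completed write to an address deallocates every monitor on that address.

```python
class TargetMemory:
    """Toy model of a target memory with exclusive access monitors."""

    def __init__(self):
        self.data = {}      # address -> value
        self.monitors = {}  # transaction ID -> monitored address

    def excl_read(self, txn_id, addr):
        # A new exclusive read replaces any monitor already held by this ID.
        self.monitors[txn_id] = addr
        return self.data.get(addr, 0)

    def _clear_monitors(self, addr):
        # A completed write deallocates every monitor watching this address.
        self.monitors = {i: a for i, a in self.monitors.items() if a != addr}

    def write(self, txn_id, addr, value):
        # Normal write: always updates memory and always returns OKAY.
        self.data[addr] = value
        self._clear_monitors(addr)
        return "OKAY"

    def excl_write(self, txn_id, addr, value):
        if self.monitors.get(txn_id) != addr:
            return "OKAY"  # monitor lost: memory is not updated
        self.data[addr] = value
        self._clear_monitors(addr)
        return "EXOKAY"


A1, A2 = 0x1000, 0x2000  # hypothetical shared addresses
C1, C2 = 1, 2            # hypothetical transaction IDs


def run_scenarios():
    """Replay scenarios i) to v) and return the write responses."""
    res = {}

    m = TargetMemory()  # i) uncontended read then write
    m.excl_read(C1, A1)
    res["i"] = {"C1": m.excl_write(C1, A1, 1)}

    m = TargetMemory()  # ii) C2 exclusive read and write in between
    m.excl_read(C1, A1)
    m.excl_read(C2, A1)
    c2 = m.excl_write(C2, A1, 2)
    res["ii"] = {"C1": m.excl_write(C1, A1, 1), "C2": c2}

    m = TargetMemory()  # iii) C2 normal write in between
    m.excl_read(C1, A1)
    c2 = m.write(C2, A1, 2)
    res["iii"] = {"C1": m.excl_write(C1, A1, 1), "C2": c2}

    m = TargetMemory()  # iv) both read, C1 writes first
    m.excl_read(C1, A1)
    m.excl_read(C2, A1)
    c1 = m.excl_write(C1, A1, 1)
    res["iv"] = {"C1": c1, "C2": m.excl_write(C2, A1, 2)}

    m = TargetMemory()  # v) same ID reads A1, then A2
    m.excl_read(C1, A1)
    m.excl_read(C1, A2)
    a2 = m.excl_write(C1, A2, 1)
    res["v"] = {"A2": a2, "A1": m.excl_write(C1, A1, 1)}

    return res
```

Calling `run_scenarios()` returns one response per write, keyed by scenario, which can be compared against the expected EXOKAY/OKAY pattern described above.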


A system must implement two sets of monitors, local and global, to support synchronization between controllers. A Load-Exclusive operation updates all the monitors to an exclusive state[1].

Local monitors

Each controller that supports exclusive access has a local monitor. Exclusive accesses to memory regions marked as Non-shareable are checked only against this local monitor. Exclusive accesses to memory regions marked as Shareable are checked against both the local and global monitors.

Global monitors

A global monitor tracks exclusive accesses to memory regions marked as Shareable. Global monitors are implemented at the target memory. Any Store-Exclusive operation targeting Shareable memory must check its local and global monitors to determine whether it can update memory. There are usually multiple global monitors present.

If the interconnect is cache coherent, it may also have exclusive access monitors. An exclusive access to a cacheable coherent region may be answered by the interconnect monitors themselves.

If the global monitors at the target memory end have to be validated, it is very important to keep the cache disabled and the memory region shareable in the memory attributes of the controller MPU/MMU settings.
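The rule for which monitors a Store-Exclusive consults can be written as a small decision function. This is a minimal sketch of the rule as stated above, not of any particular core or interconnect; the names `Region` and `store_exclusive_allowed` are hypothetical, and the two boolean arguments stand in for "this monitor is still in the exclusive state for the access".

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Memory attributes relevant to the monitor check (illustrative)."""
    shareable: bool


def store_exclusive_allowed(region, local_ok, global_ok):
    """Return True if a Store-Exclusive may update memory.

    local_ok / global_ok: whether the local / global monitor is still
    in the exclusive state for this access.
    """
    if not region.shareable:
        # Non-shareable: only the local monitor is consulted.
        return local_ok
    # Shareable: both the local and the global monitor must agree.
    return local_ok and global_ok
```

In this model a Non-shareable region never exercises the global monitor, which is why the region under test has to be marked Shareable when the target memory monitors are the validation goal.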


i) Eliminating setup issues

Multicore software environments can be complex and need programming in many places. To make sure that all memory attributes and cacheability settings are correct for all cores, it is good to run a basic test involving two cores to check the functionality of the global exclusive access monitors. It is best to run Scenario ii) as described in Section I. If the response is as expected, setup issues are ruled out and we can move forward to further stress testing of the exclusive access monitors.

ii) Multiple masters accessing different locations

As discussed earlier, there can be ‘n’ global monitors on the target memory. ‘n’ monitors can be allocated if ‘n’ masters do parallel exclusive reads on different target memory locations. An important point to note here is that the same master cannot allocate ‘n’ monitors at the same time. So, we can validate all ‘n’ monitors only if we have at least ‘n’ masters capable of doing exclusive access in the SoC.

The following code runs on all ‘n’ cores for different memory locations:

EXCL READ: LDREX <in_register>, <memory_location_n>

WAIT: Primary core writes the key and secondary cores wait for the key
This wait is there to make sure that all cores have run an exclusive read before any core attempts an exclusive write

EXCL WRITE: STREX <result_register>, <out_register>, <memory_location_n>

At the end of EXCL WRITE, <result_register> holds the result for that core. 0 means EXOKAY and 1 means OKAY. <out_register> contains the value to be written after doing some arithmetic operation on the value received in <in_register>.

All ‘n’ cores should receive EXOKAY in the case of ‘n’ global monitors. If any of the cores receives OKAY, there may be an issue in the number of global monitors or in the interconnect monitors.
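To see why this test exposes a shortfall in monitors, the sequence can be replayed against a toy monitor pool of limited size. The Python sketch below is a model under stated assumptions, not the behavior of any real memory controller: the function name `run_different_locations` and the base address `0x1000` are hypothetical, and the policy that a new exclusive read evicts the oldest monitor when the pool is full is an assumption made purely for illustration.

```python
from collections import OrderedDict


def run_different_locations(num_cores, num_monitors):
    """Replay EXCL READ / WAIT / EXCL WRITE for cores on distinct addresses.

    Returns {core: result}, where 0 means EXOKAY and 1 means OKAY,
    mirroring the STREX result register convention.
    """
    if num_monitors < 1:
        raise ValueError("at least one monitor is required")

    monitors = OrderedDict()  # core ID -> monitored address, oldest first

    def addr_of(core):
        return 0x1000 + 4 * core  # hypothetical per-core location

    # EXCL READ phase: every core reads its own location.
    for core in range(num_cores):
        if core not in monitors and len(monitors) == num_monitors:
            monitors.popitem(last=False)  # assumed policy: evict the oldest
        monitors[core] = addr_of(core)

    # WAIT: all reads above have completed before any write is issued.

    # EXCL WRITE phase: a write succeeds only if its monitor survived.
    results = {}
    for core in range(num_cores):
        if monitors.get(core) == addr_of(core):
            del monitors[core]
            results[core] = 0  # EXOKAY
        else:
            results[core] = 1  # OKAY
    return results
```

With as many monitors as cores, every core in this model gets 0 (EXOKAY). With one monitor fewer, the core whose monitor was evicted gets 1 (OKAY), which is the symptom the test looks for.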

iii) Multiple masters accessing the same location

This test is designed to validate the real use case scenario where ‘n’ cores try to do a read-modify-write on the same memory location in a loop. If any race condition exists, it should be captured by this test. At the end of the test, the final value of the shared location should be the number of cores * the number of loops, assuming the location starts at zero and each loop iteration increments it by one. If the value is less than this number, a race condition exists within the system. It then needs debugging to determine whether the problem lies in the target memory controller or in the interconnect.

The following code runs on all ‘n’ cores for the same memory location:

As the number of cores increases, the number of attempts per core will also increase. The number of attempts should be similar for all cores running at the same speed. If the difference is large, some cores may be incurring high latencies in the path, and this may need debugging.
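The expected final value and the growth in attempts can be explored with a software simulation. The Python sketch below is only a model of the test idea, not the test code itself: `run_same_location` is a hypothetical name, the cores are interleaved by a seeded random scheduler rather than by real hardware timing, and the `faulty` flag is an invented switch that makes every exclusive write succeed, to mimic a broken monitor.

```python
import random


def run_same_location(num_cores, loops, seed=0, faulty=False):
    """Simulate cores incrementing one shared location with exclusive accesses.

    Returns (final_value, attempts), where attempts[c] is the number of
    exclusive writes issued by core c.
    """
    rng = random.Random(seed)
    value = 0                             # shared location, starts at zero
    monitors = set()                      # cores holding a valid monitor
    loaded = [0] * num_cores              # value seen by each exclusive read
    pending_write = [False] * num_cores   # True once a core has done its read
    done = [0] * num_cores                # successful increments per core
    attempts = [0] * num_cores

    active = list(range(num_cores)) if loops > 0 else []
    while active:
        core = rng.choice(active)         # pick the next core to make progress
        if not pending_write[core]:
            # EXCL READ: load the value and allocate a monitor.
            loaded[core] = value
            monitors.add(core)
            pending_write[core] = True
        else:
            # EXCL WRITE: succeeds only if the monitor is still valid.
            attempts[core] += 1
            pending_write[core] = False
            if faulty or core in monitors:
                value = loaded[core] + 1
                monitors.clear()          # a completed write clears all monitors
                done[core] += 1
                if done[core] == loops:
                    active.remove(core)
            # Otherwise: OKAY response, memory untouched, retry from EXCL READ.
    return value, attempts
```

With working monitors the model always ends at cores * loops, and the per-core attempt counts exceed the loop count once several cores contend. With `faulty=True` every write is accepted, updates can be lost, and the final value will typically fall below cores * loops, which is the failure signature described above.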

Below is an example showing how the number of attempts for Core_0 increases with increasing core count.


[1] ARM Synchronization Primitives Development Article
[2] AMBA AXI Protocol Specification (
