Search found 75 matches

by agner
2022-04-24, 16:56:59
Forum: Agner's CPU blog
Topic: Intel's new Chimera: Alder Lake
Replies: 14
Views: 644035

Intel's new Chimera: Alder Lake

A chimera is a monster combining parts from different animals, or an organism containing multiple different sets of DNA. I am calling Intel's new Alder Lake processor a chimera because it is a hybrid containing two different kinds of CPU cores with very different designs. The Alder Lake processor co...
by agner
2021-12-20, 12:28:04
Forum: Agner's CPU blog
Topic: INTEL X86,why do align access and non-align access have same performance?
Replies: 1
Views: 45260

Re: INTEL X86,why do align access and non-align access have same performance?

Modern high-end microprocessors are able to handle unaligned access quite efficiently in most cases. In your case, the caches, prefetchers, and in/out buffers are handling the alignment issue smoothly while the bottleneck that determines the execution time is the exchange of data between caches and ...
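One way to set up such a comparison yourself is sketched below. It is a minimal illustration, not the code from the original question: the function names `sum_u64` and `sum_at_offset` are hypothetical, and `std::memcpy` is used because it is the portable way to express a load from an address that may not be aligned. The sketch only shows that both layouts give the same result; timing the two calls is left to the reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sum `count` 64-bit values starting at an arbitrary byte address.
// std::memcpy expresses a load that is allowed to be unaligned.
std::uint64_t sum_u64(const unsigned char* bytes, std::size_t count) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t value;
        std::memcpy(&value, bytes + i * sizeof value, sizeof value);
        total += value;
    }
    return total;
}

// Hypothetical helper: store the values 1..count at `offset` bytes past the
// start of a fresh buffer, then sum them back from that same offset.
// A nonzero offset that is not a multiple of 8 gives unaligned accesses
// relative to the start of the buffer.
std::uint64_t sum_at_offset(std::size_t offset, std::size_t count) {
    std::vector<unsigned char> buffer(offset + count * sizeof(std::uint64_t));
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t value = i + 1;
        std::memcpy(buffer.data() + offset + i * sizeof value, &value, sizeof value);
    }
    return sum_u64(buffer.data() + offset, count);
}
```

Calling `sum_at_offset(0, n)` and `sum_at_offset(1, n)` with a large `n` and timing each call gives an aligned and an unaligned run over the same amount of data.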
by agner
2021-12-11, 6:05:42
Forum: Agner's CPU blog
Topic: ARMV8 fundamental data type atomicity
Replies: 2
Views: 49899

Re: ARMV8 fundamental data type atomicity

I don't know. ARM is off-topic here.
You may post the question at stackoverflow.com.
by agner
2021-11-06, 16:04:41
Forum: Agner's CPU blog
Topic: Forwardcom: A project towards the ideal computer
Replies: 7
Views: 123367

Re: Forwardcom: A project towards the ideal computer

rejesh_g wrote: "This sounds interesting. Are you still looking for folks to work on this project?" Yes indeed. Please look at https://www.forwardcom.info. The first softcore is working. You may start trying to run the examples in the emulator. If you have the FPGA board you can also try the examples ...
by agner
2021-11-04, 15:42:24
Forum: Agner's CPU blog
Topic: Memory load μops on P4/P4E microarchitecture
Replies: 2
Views: 51651

Re: Memory load μops on P4/P4E microarchitecture

You are probably right. I don't remember much about the NetBurst architecture. Why do you care? NetBurst is long dead.
by agner
2021-11-04, 15:35:23
Forum: Agner's CPU blog
Topic: Clarification on Intel Haswell microarchitecture pipeline
Replies: 2
Views: 53474

Re: Clarification on Intel Haswell microarchitecture pipeline

STA at port 7 calculates the address for a store. The address can be calculated before the data to store is available. The calculated address is passed back to the scheduler where it waits until STD at port 4 needs it. For example: MOV [RAX+RBX],ECX. Here RAX and RBX go to STA to calculate the addre...
by agner
2021-11-01, 12:09:04
Forum: Agner's CPU blog
Topic: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
Replies: 4
Views: 74404

Re: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE

The explanation is that mulss xmm0,xmm1 is using 32 bits of a 128-bit register. The remaining 96 bits of xmm0 are unchanged. This gives you a false dependence that delays the next iteration. Possible solutions: use the whole register (mulps xmm0, xmm1) or clear or reload the register between iterat...
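The partial-register behavior can be made visible with SSE intrinsics. The following is a minimal sketch for x86 or x86-64 only; the `Lanes` struct and the two function names are hypothetical helpers introduced for illustration. It does not measure timing. It only shows that the scalar multiply passes the upper three lanes of its first operand through to the result, which is why the instruction has to wait for the old value of that register, while the packed multiply overwrites all four lanes.

```cpp
#include <xmmintrin.h>  // SSE intrinsics, x86 / x86-64 only

// Four single-precision lanes of one 128-bit register (hypothetical helper).
struct Lanes {
    float v[4];
};

// Scalar multiply (MULSS): lane 0 is a[0] * b[0],
// lanes 1..3 are copied unchanged from the first operand.
Lanes scalar_multiply(Lanes a, Lanes b) {
    __m128 x = _mm_loadu_ps(a.v);
    __m128 y = _mm_loadu_ps(b.v);
    Lanes result;
    _mm_storeu_ps(result.v, _mm_mul_ss(x, y));
    return result;
}

// Packed multiply (MULPS): every lane is overwritten with a[i] * b[i].
Lanes packed_multiply(Lanes a, Lanes b) {
    __m128 x = _mm_loadu_ps(a.v);
    __m128 y = _mm_loadu_ps(b.v);
    Lanes result;
    _mm_storeu_ps(result.v, _mm_mul_ps(x, y));
    return result;
}
```

With `a = {2, 3, 4, 5}` and `b = {10, 10, 10, 10}`, `scalar_multiply` returns `{20, 3, 4, 5}` and `packed_multiply` returns `{20, 30, 40, 50}`.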
by agner
2021-09-20, 5:14:17
Forum: Agner's CPU blog
Topic: SSE replacement for FPREM1
Replies: 1
Views: 45099

Re: SSE replacement for FPREM1

It is complicated to calculate the remainder with reasonable accuracy for large x. The vector class library is doing this in the sin, cos, and tan functions. See the file vectormath_trig.h in https://github.com/vectorclass/version2. If you are using the reduced x for a trigonometric function anyway th...
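For reference, the operation in question is the IEEE 754 remainder, which rounds the quotient to the nearest integer, unlike `fmod`, which truncates it. The sketch below is a scalar illustration using the C++ standard library. It is not an SSE implementation and not the method used in the vector class library. The function `naive_remainder` is a hypothetical name for the obvious one-line formula; it is adequate only for small quotients, because the rounding error in `x / y` grows with the size of x, which is the difficulty mentioned above.

```cpp
#include <cmath>

// IEEE 754 remainder as provided by the standard library:
// x - n*y, where n is x/y rounded to the nearest integer.
double ieee_remainder(double x, double y) {
    return std::remainder(x, y);
}

// Truncating remainder for comparison: the result has the sign of x.
double truncating_remainder(double x, double y) {
    return std::fmod(x, y);
}

// Naive formula (hypothetical helper). It assumes the default
// round-to-nearest mode and is only reliable when x/y is small enough
// that the division and multiplication lose no significant bits.
double naive_remainder(double x, double y) {
    return x - y * std::nearbyint(x / y);
}
```

For x = 11 and y = 3, `ieee_remainder` and `naive_remainder` both return -1, while `truncating_remainder` returns 2.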
by agner
2021-07-01, 18:05:17
Forum: Agner's CPU blog
Topic: Surprising new feature in AMD Ryzen 3000
Replies: 11
Views: 2459368

Re: Surprising new feature in AMD Ryzen 3000

No. It is using temporary registers inside the CPU. It is not using the cache.