Search found 75 matches
- 2022-04-24, 16:56:59
- Forum: Agner's CPU blog
- Topic: Intel's new Chimera: Alder Lake
- Replies: 14
- Views: 644090
Intel's new Chimera: Alder Lake
A chimera is a monster combining parts from different animals, or an organism containing multiple different sets of DNA. I am calling Intel's new Alder Lake processor a chimera because it is a hybrid containing two different kinds of CPU cores with very different designs. The Alder Lake processor co...
- 2021-12-20, 12:28:04
- Forum: Agner's CPU blog
- Topic: INTEL X86,why do align access and non-align access have same performance?
- Replies: 1
- Views: 45269
Re: INTEL X86,why do align access and non-align access have same performance?
Modern high-end microprocessors are able to handle unaligned access quite efficiently in most cases. In your case, the caches, prefetchers, and in/out buffers are handling the alignment issue smoothly while the bottleneck that determines the execution time is the exchange of data between caches and ...
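A minimal sketch, in C, of how one might read a 32-bit value at an arbitrary byte offset when comparing aligned and unaligned loads. The helper name `load_u32_at` is hypothetical and does not come from the thread. It uses `memcpy` because that is the portable way to express a possibly unaligned load; compilers typically turn it into a single load instruction on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value starting at any byte offset.
   memcpy avoids the undefined behavior of dereferencing a misaligned
   uint32_t pointer. The caller must ensure offset + 4 bytes are in bounds. */
static uint32_t load_u32_at(const unsigned char *buf, size_t offset) {
    uint32_t value;
    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Calling `load_u32_at(buf, 4)` and `load_u32_at(buf, 1)` in a timing loop gives an aligned and an unaligned case to compare.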
- 2021-12-11, 6:05:42
- Forum: Agner's CPU blog
- Topic: ARMV8 fundamental data type atomicity
- Replies: 2
- Views: 49905
Re: ARMV8 fundamental data type atomicity
I don't know. ARM is off-topic here.
You may post the question at stackoverflow.com
- 2021-11-06, 16:04:41
- Forum: Agner's CPU blog
- Topic: Forwardcom: A project towards the ideal computer
- Replies: 7
- Views: 123387
Re: Forwardcom: A project towards the ideal computer
rejesh_g wrote: "This sounds interesting. Are you still looking for folks to work on this project?" Yes indeed. Please look at https://www.forwardcom.info . The first softcore is working. You may start trying to run the examples in the emulator. If you have the FPGA board you can also try the examples ...
- 2021-11-04, 15:42:24
- Forum: Agner's CPU blog
- Topic: Memory load μops on P4/P4E microarchitecture
- Replies: 2
- Views: 51659
Re: Memory load μops on P4/P4E microarchitecture
You are probably right. I don't remember much about the NetBurst architecture. Why do you care? NetBurst is long dead.
- 2021-11-04, 15:35:23
- Forum: Agner's CPU blog
- Topic: Clarification on Intel Haswell microarchitecture pipeline
- Replies: 2
- Views: 53481
Re: Clarification on Intel Haswell microarchitecture pipeline
STA at port 7 calculates the address for a store. The address can be calculated before the data to store is available. The calculated address is passed back to the scheduler where it waits until STD at port 4 needs it. For example: MOV [RAX+RBX],ECX. Here RAX and RBX go to STA to calculate the addre...
- 2021-11-01, 12:09:04
- Forum: Agner's CPU blog
- Topic: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
- Replies: 4
- Views: 74410
Re: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
The explanation is that mulss xmm0,xmm1 is using 32 bits of a 128-bit register. The remaining 96 bits of xmm0 are unchanged. This gives you a false dependence that delays the next iteration. Possible solutions: use the whole register (mulps xmm0, xmm1) or clear or reload the register between iterat...
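The difference between the scalar and the packed multiply can be illustrated with SSE intrinsics. This is only a sketch, assuming an x86 compiler that provides `<xmmintrin.h>`. The function names `mul_scalar` and `mul_packed` are hypothetical and not taken from the thread. `_mm_mul_ss` corresponds to mulss and `_mm_mul_ps` to mulps.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics, x86 only */

/* Scalar multiply (mulss): only lane 0 is computed.
   Lanes 1..3 of the result are carried over unchanged from a. */
static void mul_scalar(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ss(va, vb));
}

/* Packed multiply (mulps): all four lanes are computed,
   so the whole destination register is overwritten. */
static void mul_packed(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));
}
```

With a = {2, 3, 4, 5} and b = {10, 20, 30, 40}, `mul_scalar` yields {20, 3, 4, 5}: the upper three lanes are the old contents of the first operand. `mul_packed` yields {20, 60, 120, 200}.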
- 2021-10-04, 10:51:08
- Forum: Agner's CPU blog
- Topic: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
- Replies: 4
- Views: 74410
Re: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
I suggest that you ask this question at stackoverflow.com
- 2021-09-20, 5:14:17
- Forum: Agner's CPU blog
- Topic: SSE replacement for FPREM1
- Replies: 1
- Views: 45103
Re: SSE replacement for FPREM1
It is complicated to calculate the remainder with reasonable accuracy for high x. The vector class library is doing this in the sin, cos, and tan functions. See the file vectormath_trig.h in https://github.com/vectorclass/version2 If you are using the reduced x for a trigonometric function anyway th...
- 2021-07-01, 18:05:17
- Forum: Agner's CPU blog
- Topic: Surprising new feature in AMD Ryzen 3000
- Replies: 11
- Views: 2459588
Re: Surprising new feature in AMD Ryzen 3000
No. It is using temporary registers inside the CPU. It is not using the cache.