Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance. New!
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Intel® Xeon Phi™ Coprocessor code named “Knights Landing” - Application Readiness
By Indraneil Gokhale (Intel) Posted 09/15/2014 0
As part of the application readiness efforts for future Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors (code named Knights Landing), developers are interested in improving two key aspects of their workloads: vectorization/code generation and thread parallelism. This article mainly talks a...
Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode
By Anoop Madhusoodhanan Prabha (Intel) Posted 03/23/2015 0
Introduction: Intel® C++ Compiler 15.0 provides a feature that enables offloading general-purpose compute kernels to processor graphics. This feature makes the processor graphics silicon area available for general-purpose computing. The key idea is to utilize the compute power of both CPU cores and GP...
Intel Cluster Ready FAQ: Software vendors (ISVs)
By Werner Krotz-vogel (Intel) Posted 03/23/2015 0
Q: Why should we join the Intel® Cluster Ready program? A: By offering registered Intel Cluster Ready applications, you can provide the confidence that applications will run as they should, right away, on certified clusters. Participating in the program will help you increase application adoption, e...
Intel Cluster Ready FAQ: Hardware vendors, system integrators, platform suppliers
By Werner Krotz-vogel (Intel) Posted 03/23/2015 0
Q: Why should we join the Intel® Cluster Ready program? A: By offering certified Intel Cluster Ready systems and certified components, you can give customers greater confidence in deploying and running HPC systems. Participating in the program will help you drive HPC adoption, expand your custom...
Subscribe to Intel Developer Zone Articles
Advanced Computer Concepts For The (Not So) Common Chef: Terminology Pt 1
By Taylor Kidd (Intel) Posted on 03/24/15 0
Before we start, I will use the next two blogs to clear up some terminology. If you are familiar with these concepts, I give you permission to jump to the next section.  I suggest any software readers still check out the other blog about threads. There is a lot of confusion, even among us softwar...
Check out the Parallel Universe e-publication
By Mike Pearce (Intel) Posted on 03/18/15 0
The Parallel Universe is a quarterly publication devoted to exploring inroads and innovations in the field of software development, from high performance computing to threading hybrid applications. Issue #20 - Cover story: From Knights Corner to Knights Landing: Prepare for the Next Generation o...
VTune™ Amplifier XE 2015 Update 2 supports driverless hardware event-based sampling with call stack info
By Peter Wang (Intel) Posted on 03/15/15 1
In general, the VTune drivers are built and loaded on the Linux* system automatically during installation of the VTune™ Amplifier XE product, after which hardware PMU event-based sampling can work. Sometimes, however, the VTune drivers fail to build or load for one of the following reasons: 1. There ...
Intel® Xeon Phi™ Coprocessor Developer Training Coming to a City Near You in 2015
By Mike Pearce (Intel) Posted on 03/04/15 0
Intel is offering an updated and expanded series of software developer trainings in parallel programming using the Intel® Xeon Phi™ coprocessor.
Subscribe to Intel Developer Zone Blogs
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1
By kathy-farrel (Intel) 0
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1 - What's New - Webinar, Tuesday, September 17, 9am PDT. Please join us for a technical presentation on the new features found in the recently released Intel® Parallel Studio XE 2013 SP1 and Intel® Cluster Studio XE SP1. This release includes support for compilers and performance analysis on Intel® Xeon Phi™ on Windows*. The technical presentation will briefly cover new features for both C++ and Fortran on Linux*, Windows*, and OS X* operating systems as well as error checking and performance profiling tools. Learn how to efficiently boost your application performance! Not too late! Register now.
COPROCESSOR PHI AND JAVA
By Rafael R. 2
Hi, our university bought a machine with a Xeon Phi coprocessor. The description on the site https://software.intel.com/en-us/articles/intelr-xeon-phitm-coprocessor-... reports that there is no Java support yet. That answer is from 2013 and we are already in 2015. Is there a Java option for coding? Thanks, Rafael
Intel® Xeon Phi™ Coprocessor Developer Training Coming to a City Near You in 2015
By Mike Pearce (Intel) 0
https://software.intel.com/en-us/blogs/2015/03/04/intel-xeon-phi-coprocessor-developer-training-coming-to-a-city-near-you-in-2015
Mixing kernel space and userspace in a new kernel.
By Jog L. 0
Hello, I was thinking of creating an open source kernel (reusing blocks already written in the Linux kernel, obviously). I would like to hear from experts what the dangers are of running in ring 0 if there are no users and no external connections. We are in a situation in which the processor is isolated from the whole world. No one can mess with it. All the processes running on top of it have to register, and they are created and compiled by root using a specific memory range. No process can be launched without root's approval. No human accesses it. The code running inside is reviewed, and we have facilities to ensure that each process can use no memory range other than the one we expect it to use. That is the (restrictive) context. Now, is it possible for such a kernel to exist, or are there limitations that I have not foreseen? The kernel is to be massively specialized, hence the "almost starting from scratch". Thanks for your insights, Jog
linking with two versions of mkl (multi threaded and single threaded) in one application
By Michal K. 3
Hi, is it possible to use both the single-threaded version of the MKL library and the multi-threaded version of MKL in one application? I need the single-threaded version to use with the PLASMA library, yet in another part of my code I need to use MKL PARDISO, for which I need the multi-threaded version. Any help will be greatly appreciated. Cheers, Michal
PCIe 3.0 reference clock jitter tool
By Sonal C. 0
Where can I access the Intel PCIe clock jitter tool?
Memory to CPU (mov) bandwidth limitations
By albus d. 3
(Sorry for my weak English, I am not a native speaker. I am not sure this is the right forum, as it is my first time here. This is a general question about some hardware limits whose technical reason I do not understand and would very much like to know.) We now have parallelised SIMD arithmetic (like 8 float multiplications or divisions in one step), so the theoretical (but also nearly practical) arithmetic bandwidth per core is something like 4 GHz * 8 floats = about 30 GFLOPS per core. But AFAIK we still have quite low RAM-to-CPU bandwidth, at the level of reading or writing 1 or 2 ints or floats per nanosecond; when I test it, this RAM-to-CPU bandwidth is only about 2 GLOP per second per core or something like that. (Both values are rough, but the difference seems to be a physical fact, at least in my experience.) I mean that arithmetic can be parallelised (like 8-way vectorised) but load/store movs are not, so SIMD parallelisation has only a fraction of its potential power. It is extremely crucial to increase this memory bandwidth (muc...
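The arithmetic in the post above can be checked with a short back-of-the-envelope sketch. It only restates the poster's rough figures (4 GHz, 8 floats per SIMD step, 1 or 2 values per nanosecond from RAM). The names and the one-SIMD-instruction-per-cycle assumption are illustrative and are not measurements of any particular processor.

```python
# Back-of-the-envelope comparison using the rough figures quoted in the post.
# All numbers are the poster's estimates, not measurements.

CLOCK_HZ = 4e9          # assumed core clock: 4 GHz
SIMD_LANES = 8          # 8 single-precision floats per SIMD instruction
MEM_VALUES_PER_NS = 2   # poster's upper estimate: 1 or 2 values per nanosecond


def peak_arithmetic_rate(clock_hz, lanes):
    """Float operations per second, assuming one SIMD instruction per cycle."""
    return clock_hz * lanes


def memory_value_rate(values_per_ns):
    """Values per second arriving from RAM at the given per-nanosecond rate."""
    return values_per_ns * 1e9


arith = peak_arithmetic_rate(CLOCK_HZ, SIMD_LANES)
mem = memory_value_rate(MEM_VALUES_PER_NS)
ratio = arith / mem  # float operations available per value loaded from RAM

print(f"peak arithmetic: {arith / 1e9:.0f} G float ops/s per core")
print(f"memory stream:   {mem / 1e9:.0f} G values/s per core")
print(f"float ops per loaded value: {ratio:.0f}")
```

Under these assumptions, a loop has to perform roughly 16 float operations on every value it loads from RAM before the arithmetic units, rather than memory, become the limit.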
speedup problem using openMP in intel fortran
By bohluly 2
Dear all, I have developed a program and unfortunately it has a speedup problem. My program is very big, so I have tried to write a sample similar to it; fortunately this simple program has the same problem as my program. I need your experience and your help if possible. Thanks. I am using VS2010 and Intel FORTRAN XE 2011. Program:
    TYPE var
        REAL(8),POINTER :: A, B, C
    END TYPE var
    REAL(8),POINTER :: A(:), B(:), C(:)
    TYPE(var),POINTER  :: vars(:)
    TYPE(var),POINTER :: varOMP
    REAL*8  t1,t2 ,ai,bi,ci,di,ei,fi
    INTEGER(4) c1,c2
    INTEGER N, CHUNKSIZE, I, id, f , l
    PARAMETER (N=200)
    PARAMETER (CHUNKSIZE=10)
    Allocate (A(N), B(N), C(N),vars(N))
    !     initializations
    DO I = 1, N
        A(N)      =   I * 1.0
        B(N)      =   A(N)
        vars(I)%A =>  A(N)
        vars(I)%B =>  B(N)
        vars(I)%C =>  C(N)
        vars(I)%A = 0.51
        vars(I)%B...
Subscribe to Forums
