Hyper-threading technology



HYPER-THREADING TECHNOLOGY

INTRODUCTION:

Hyper-Threading Technology (HT Technology) was developed by Intel for use in Pentium 4 and Xeon processors. It lets a single processor execute two "threads" of instructions simultaneously, so that the CPU acts as though it were two separate CPUs. It uses additional registers to overlap two instruction streams and achieve a performance gain of approximately 30%. Multithreaded applications take advantage of Hyper-Threaded hardware just as they would on any dual-processor system; however, the performance gain cannot equal that of a true dual-processor system.

DEFINITION:

Hyper-Threading technology is a groundbreaking innovation from Intel that enables multi-threaded server software applications to execute threads in parallel within each processor in a server platform. The Intel® Xeon™ processor family uses Hyper-Threading technology, along with the Intel® NetBurst™ microarchitecture, to increase compute power and throughput for today's Internet, e-Business, and enterprise server applications.

This level of threading technology has never been seen before in a general-purpose microprocessor. Hyper-Threading technology helps increase transaction rates, reduce end-user response times, and enhance business productivity, providing a competitive edge to e-Businesses and the enterprise. The Intel® Xeon™ processor family for servers represents the next leap forward in processor design and performance by being the first Intel® processor family to support thread-level parallelism on a single processor.

HISTORY:

Hyper-Threading technology has its roots in work at Digital Equipment Corporation, but it was brought to market by Intel. Hyper-Threading was first introduced in the Foster MP-based Xeon in 2002. It appeared on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor.

Earlier Intel processors based on the Core microarchitecture do not have Hyper-Threading, because the Core microarchitecture descends from the P6 microarchitecture, which was used from the Pentium Pro through the Pentium III, as well as in the Celeron, Pentium II Xeon and Pentium III Xeon models.

Intel released Nehalem (Core i7) in November 2008, in which Hyper-Threading made a return. The first-generation Nehalem contains 4 cores and runs 8 threads. Since then, 2-core and 6-core models have been released, running 4 and 12 threads respectively. The Intel Atom is an in-order processor with Hyper-Threading, intended for low-power mobile PCs and low-price desktop PCs.

The Itanium 9300 launched with eight threads per processor (2 threads per core) through enhanced Hyper-Threading technology. Poulson, the next-generation Itanium, is scheduled to have additional Hyper-Threading enhancements. The Intel Xeon 5500 server chips also use two-way Hyper-Threading.

ABSTRACT:

Hyper-Threading technology, which brings the concept of simultaneous multithreading to the Intel architecture, was first introduced on the Intel Xeon processor in early 2002 for the server market. In November 2002, Intel launched the technology on the Intel Pentium 4 at clock frequencies of 3.06 GHz and higher, making the technology widely available to the consumer market. This technology signals a new direction in microarchitecture development and fundamentally changes the cost-benefit tradeoffs of microarchitecture design choices.

BENEFITS:

For day-to-day tasks like web browsing, email and word processing, Hyper-Threading won't have much of an impact. Yes, Hyper-Threading is theoretically better at multitasking. However, today's processors are so fast that basic programs are rarely limited by the speed of your processor. The way programs are coded can also be a limitation. You may sometimes find that you have numerous programs open, but only one of your processor cores is being put to much use. That's because the programs are, for whatever reason, not having their work divided among the different cores.

When you're trying to do some heavy lifting, however, Hyper-Threading can be more helpful. The applications most likely to benefit are 3D rendering programs, heavy-duty audio/video transcoding apps, and scientific applications built for maximum multi-threaded performance. But you may also enjoy a performance boost when encoding audio files in iTunes, playing 3D games and zipping/unzipping folders. The boost in performance can be up to 30%, although there will also be situations where Hyper-Threading provides no boost at all.

THREADING IN XP:

1) Download and install Windows XP Service Pack 2.

2) Find the files ntkrnlmp.exe and halmacpi.dll (normally in c:\windows\servicepackfiles) and copy them to your c:\windows\system32 folder. (This assumes your new motherboard has ACPI support. These files should support non-ACPI computers as well, but that has not been tested.)

3) Open boot.ini in your text editor and find the following line:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /NoExecute=OptIn

PERFORMANCE:


Applications that exhibit good threading methods and scale well on multi-

processor servers today are likely to take advantage of Hyper-Threading technology. The

performance increase seen is highly dependent on the nature of the application, the threading

model it uses, as well as system dependencies.

DIFFERENCE BETWEEN I3 I5 I7 PROCESSORS

Here is an i3 i5 i7 comparison which discusses how the three processor lines differ in terms

of features and performance.

INTEL CORE I3 I5 I7 COMPARISON: TECHNICAL FEATURES

All Core i3 processors are dual-core, with clock frequencies ranging from 2.933 GHz to 3.2 GHz. A 4 MB L3 smart cache, 2 x 256 KB of L2 cache and a Direct Media Interface bus, together with the new LGA 1156 socket, make them the best entry-level processors. All these chips are built on a 32 nm process, which allows more transistors to be etched on the silicon. An integrated GPU (Graphics Processing Unit) makes graphics processing even faster. With Intel's Hyper-Threading and virtualization technology enabled, along with HD graphics, these chips are priced at only $133.

As the i3 i5 i7 comparison chart on the Intel web site reveals, the Core i5 line consists of three separate series of processors with dual and quad cores. Both the dual-core and the quad-core models run 4 threads each. Clock frequencies range from 2.4 GHz to 3.33 GHz, and Intel Turbo Boost technology raises them further when needed. With 4 MB to 8 MB of L3 cache, Direct Media Interface, an integrated GPU, the LGA 1156 socket, Intel HD graphics, Intel smart cache technology and Hyper-Threading enabled, these processors cost from $176 to $256. As the comparison shows, i5 chips are faster than i3 chips. They form the mid-level segment.

With the Core i7 line, Intel has fulfilled its dream of creating the 'best processors on the planet'. In the Core i3 vs. i5 vs. i7 comparison, the Core i7 is miles ahead of the rest of the pack. This line consists of quad-core processors with clock frequencies reaching 3.33 GHz, boosted by Intel Turbo Boost. As a quad-core vs. dual-core comparison would show, a greater number of cores can substantially increase computing speed.

With an L3 smart cache ranging from 8 MB to as much as 12 MB, Intel QuickPath Interconnect technology (which can raise data transfer speeds to 25.6 GB/s), an integrated GPU, Hyper-Threading and Intel HD graphics, the Core i7 series processors are arguably the best processors manufactured to date. They are meant for high-end computing applications, web servers and high-end business users. The price of these processors ranges from $200 to as much as $1000.

INTEL CORE I3 I5 I7 COMPARISON: PERFORMANCE

Let us now make an Intel Core i3 vs. i5 vs. i7 performance comparison. Besides being multicore, what makes the i3, i5 and i7 processors computing powerhouses is Hyper-Threading technology combined with the Turbo Boost feature. An integrated GPU and an enhanced L3 cache make graphics processing very fast. As an Intel Core i3 vs. Core 2 Duo comparison would reveal, the i3 processors surpass the computing power offered by the earlier Core 2 Duo series. If you are eyeing an entry-level laptop computer, go for the Core i3 line. It is great for home desktop computers too.

If you are a business user, I suggest the Core i5 line, which handles multitasking even better than the i3 line. With Hyper-Threading enabled and Intel's range of technologies fully operational, the Core i5 line is ideal for business users or for home users who are into intensive gaming. If you want to settle for nothing less than the very best in computing today, go for the high-end Core i7 line. As you must have realized while going through the Core i3, i5 and i7 comparison, the i7 line puts at your fingertips computing power that was once available only to users of supercomputers! Budget-wise, they may be the costliest of the lot, but they offer true value for your money.

BEST COMPUTER PROCESSOR:

The only problem with the Intel Core i7 980X is its price, which puts it beyond the reach of most mortals. However, there is a lot of choice in the lower ranks. As an AMD versus Intel comparison would reveal, the top processors from the other chip giant, AMD, aren't in the same class as Intel's Core i7. AMD is therefore competing with Intel by offering quad-core chips at much lower prices. As the war between AMD and Intel processors rages on, we consumers are the ones who benefit from the competition.

As a core i3 vs. i5 vs. i7 comparison would reveal, there is a lot of choice offered by Intel too,

for users with different levels of requirements in terms of features and price. Here are some

of the best CPUs to look out for, that fall in the medium and low budget category, for you to

choose from.

AMD Phenom II X4 965 Black Edition 3.4 GHz - $195

Intel Core i5 750 2.66 GHz - $196

Intel Pentium Processor G6950 2.80 GHz - $87

AMD Phenom II X2 550 3.1 GHz - $102

SOFTWARE SUPPORT FOR HYPER-THREADING

Hyper-Threading performance advantages will only be realized when using operating systems that support multiple CPUs. In these operating systems, each CPU with Hyper-Threading will be seen as two CPUs. Operating systems that support Hyper-Threading include Microsoft Windows NT 4.0, Microsoft Windows 2000, Microsoft Windows XP Professional, and most Unix variants.

CPUs WHICH SUPPORT HYPER-THREADING

Hyper-Threading is supported by some Intel Xeon and Pentium processors.

HOW DOES THE PROCESSOR WORK?

The computer processor acts as the primary coordinating component of the computer. The CPU fetches programs, data, or other computer functions from RAM (Random Access Memory) when called on by the computer's operating system. The processor then interprets the instructions related to the requested task and executes them in the correct order, exchanging data with the computer's RAM over the system bus.

COMPUTER PROCESSOR LOGIC

At the core of the computer processor is its ability to process machine language code. There are three basic kinds of machine language instruction that the CPU can execute:

- Move data from one location in the computer's memory to another

- Jump to a new set of instructions based on logical operations or choices

- Perform mathematical operations using the Arithmetic Logic Unit (ALU)

To carry out these operations, the processor uses an address bus to send addresses to the computer's memory and a data bus to retrieve information from, or send it to, that memory. A separate control line tells the memory whether the processor is reading or writing a given memory location. The CPU also has a clock, which forms the basis for synchronizing the processor's actions with the rest of the computer. To reach commonly used instructions or data faster than direct RAM access allows, processors also implement various caching schemes.
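The three kinds of instruction listed in this section (move, jump, arithmetic) can be illustrated with a toy interpreter. This is only a sketch: the instruction names MOV, ADD and JNZ, the dictionary used as "memory" and the sample program are all hypothetical and do not correspond to any real instruction set.

```python
def run(program, memory):
    """Execute a tiny, made-up instruction set and return the final memory."""
    pc = 0  # program counter: index of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        if op == "MOV":      # move data from one memory location to another
            dst, src = args
            memory[dst] = memory[src]
        elif op == "ADD":    # arithmetic, the kind of work done by the ALU
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]
        elif op == "JNZ":    # jump to another instruction if a location is non-zero
            addr, target = args
            if memory[addr] != 0:
                pc = target
                continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return memory


# Hypothetical program: add up 3 + 2 + 1 with a loop, then copy the sum.
program = [
    ("ADD", "acc", "acc", "n"),      # acc = acc + n
    ("ADD", "n", "n", "minus_one"),  # n = n - 1
    ("JNZ", "n", 0),                 # loop back while n is not zero
    ("MOV", "result", "acc"),        # copy the sum to "result"
]
final = run(program, {"n": 3, "acc": 0, "minus_one": -1, "result": 0})
print(final["result"])  # 6
```

A real processor does the same fetch, decode and execute cycle in hardware, with the program counter, registers and buses playing the roles of `pc` and `memory` here.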

PROCESSOR MEMORY

The computer processor makes use of read-only memory and random access memory (ROM and RAM respectively). The processor's ROM is permanently programmed with core functions that facilitate the processor's communication with the data bus. On Windows computers this ROM is commonly referred to as the BIOS (Basic Input/Output System), and it is also used to retrieve the boot sector for the computer.


The processor can read from and write to RAM, depending on what the current instructions require. RAM is not designed to save data permanently; it is reset when the computer is turned off or loses power.

THE ROLE OF THE 64 BIT PROCESSOR

Although 64-bit computer processors have been deployed since the early 1990s, they have only been deployed at the consumer level in large numbers in recent years. All of the major processor manufacturers now produce 64-bit processors, which are available for use across different types of operating system. The primary advantage of a 64-bit processor over legacy designs is the significantly expanded address space available to the processor. The previous 32-bit processors were limited to a maximum of two to four gigabytes of effective RAM access. 64-bit processors are also able to provide increased input/output access to hard drives and the computer's video card, which helps to further increase overall system performance.

Early adopters of 64-bit processors don't necessarily see a large improvement in system performance unless they are doing high-demand tasks such as video editing or playing networked 3D video games. This will continue to change as more applications are designed to take advantage of 64-bit processors and their increased memory capacity.
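The address-space limits mentioned in this section follow from simple arithmetic: with n address bits and byte-addressable memory, a processor can name 2^n distinct bytes. The short sketch below works this out. The function name is hypothetical, and the 64-bit figure is only an architectural upper bound, since real processors typically implement fewer physical address bits.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses available with the given address width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184
```

The four-gigabyte result for 32 bits matches the upper end of the "two to four gigabytes" range quoted above; the lower figure reflects the part of the address space that an operating system reserves for itself.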

REQUIREMENTS OF HYPER-THREADING TECHNOLOGY

HT technology requires the following:

1. A processor with built-in HT technology

Not all processors support HT; therefore, before purchasing a computer, make sure that it supports HT Technology. You can identify an HT-enabled processor by checking its specification and CPU logo. Normally, Intel clearly labels processors with built-in HT. Intel processor families that support HT technology include Intel Atom, Xeon, Core i-series, Pentium 4 and Pentium mobile processors.


2. An operating system that supports HT

A single HT-enabled processor appears as two processors to the operating system. However, if the OS doesn't support HT, you can't benefit from this technology even though you have an HT-enabled processor. The OS must recognize that you have an HT-enabled processor so that it will schedule two threads, or sets of instructions, for processing. Windows XP and later operating systems are optimized for HT technology.

3. An HT-compatible chipset

4. An HT-enabled system BIOS. You can enable or disable HT in the system BIOS. Consult your system manual for this.

ADVANTAGES:

Intel claims that threading an application can result in increased performance on a uniprocessor machine or for a multi-processor application. Threads can make a GUI more responsive. They can also facilitate the overlap of I/O and computation. If multiple processors are available, threaded applications may see substantial speedup.

HOW TO SEE THREADING ON YOUR COMPUTER:

1. Click the Start button, right-click My Computer, and then click Properties.

2. Click Hardware and click Device Manager.

3. In the Device Manager window, click the plus (+) sign next to the processor type. If Hyper-Threading is enabled, the processor is listed twice.

TO ENABLE OR DISABLE HYPER-THREADING:

1. Shut down and restart the computer.

2. When the DELL logo appears, press <F2> immediately to enter the system setup program.

If you wait too long and the Microsoft Windows logo appears, continue to wait until you see the Windows desktop. Then shut down your computer through the Start menu and try again.

3. When the system setup program screen appears, highlight CPU Information and press

<Enter>.

4. When the CPU information screen appears, highlight Hyper-Threading and press the

spacebar on the keyboard to select Enable or Disable.

5. Press <ESC> to save the setting and exit the CPU Information screen.

6. Press <ESC> to Save and Exit.

7. When you see the message Save changes and exit now, press <Enter>.

Your computer will restart.

FUNCTIONS AND SYNTAX:

LogicalProcPerPhysicalProc

Syntax: unsigned char LogicalProcPerPhysicalProc (void)

Description: This function returns a byte value that contains the maximum number of logical processors per physical package. This is the maximum number of logical processors that a physical package can handle. To get the number of available logical processors that a program can use, call the function CPUCount and read the value of AvailLogicalNum.

Return Value: Number of logical processors.

CorePerPhysicalProc

Syntax: unsigned char CorePerPhysicalProc (void)

Description: This function returns a byte value that contains the maximum number of cores per physical package. This is the maximum number of cores that a physical package can handle. To get the number of available cores that a program can use, call the function CPUCount and read the value of AvailCoreNum.

Return Value: Maximum number of cores per physical package.

HTSupported

Syntax: unsigned int HTSupported (void)

Description: This function checks if the processor has Hyper-Threading technology built-in.


Return Value: 0 If Hyper-Threading is not built-in.

CPUCount

Syntax: unsigned char CPUCount (unsigned char *AvailLogicalNum, unsigned char *AvailCoreNum, unsigned char *PhysicalNum)
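Functions such as HTSupported and LogicalProcPerPhysicalProc are typically built on the CPUID instruction. The sketch below is not the source of those functions; it only shows, in Python, how the relevant bit fields of CPUID leaf 1 could be decoded once the raw EBX and EDX register values have been obtained by some other means. The function names and the sample register values are hypothetical.

```python
HTT_BIT = 28  # CPUID leaf 1, EDX bit 28: package can hold more than one logical processor


def ht_supported(edx):
    """Return 1 if the HTT feature flag is set in the leaf-1 EDX value, else 0."""
    return (edx >> HTT_BIT) & 1


def logical_per_package(ebx, edx):
    """Maximum number of addressable logical processors per physical package.

    The count is taken from EBX bits 23:16 of CPUID leaf 1 and is only
    meaningful when the HTT flag is set; otherwise the package has one.
    """
    if not ht_supported(edx):
        return 1
    return (ebx >> 16) & 0xFF


# Made-up register values for illustration only.
sample_edx = 1 << HTT_BIT
sample_ebx = 2 << 16
print(ht_supported(sample_edx))                     # 1
print(logical_per_package(sample_ebx, sample_edx))  # 2
```

Note that the HTT flag reports a capability of the package rather than whether Hyper-Threading is currently switched on, which is one reason the description above distinguishes the maximum count from the available count returned by CPUCount.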

DISADVANTAGE:

Threading an existing serial application increases the complexity of

the application, Intel says. Sharing of resources, such as global data, can introduce common

parallel programming errors such as storage conflicts and other race conditions. Debugging

such problems is difficult as they are non-deterministic, and introducing debugging probes,

such as print statements, can mask these errors.
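The kind of shared-data conflict described here is usually avoided by guarding the shared value with a lock. The following is a minimal sketch of that idea, with hypothetical names; it is not taken from Intel's material. Each thread increments a shared counter, and the lock ensures that no update is lost.

```python
import threading


def count_with_lock(num_threads, increments):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread may update the counter at a time
                counter += 1  # read-modify-write on shared data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_lock(4, 1000))  # 4000
```

Without the lock, the read-modify-write on `counter` could interleave between threads and lose updates, and whether it does so on a given run depends on timing, which is what makes such bugs hard to reproduce.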


IMPROVING MULTI-THREADING VALIDATION:

Clearly, with MT-mode bugs making up nearly twice the share of post-silicon bugs (15%) as of pre-silicon bugs (8%), coupled with the high cost of fixing post-silicon MT bugs (full-layer versus metal tapeouts), there is an opportunity for improving pre-silicon validation of future MT-capable processors. Driven by the analysis of pre- and post-silicon MT-mode bugs [2, 3], we are improving pre-silicon validation by doing the following:

· Enhancing the Cluster Test Environments to improve MT-mode functionality checking.

· Increasing the focus on microarchitecture validation of multi-cluster protocols such as SMC, atomic operations and forward progress mechanisms.

· Increasing the use of coverage-based validation techniques to address hardware/microcode.

HYPER-THREADING VS. DUALCORE:

Some Intel processors support Hyper-Threading Technology (HTT), which allows a single processor to execute two threads simultaneously. Programs that are designed to use HTT may run 10% to 30% faster on an HTT-enabled processor than on a similar non-HTT model. A dual-core processor, by contrast, has two physical cores, each with its own thread to run.

How Hyper-Threading works:

The current computing paradigm implies multithreaded calculations. This concerns not only servers, but also workstations and desktop systems. Threads can belong to one application or to different ones, but there is almost always more than one active thread (to verify this, open the Task Manager in Windows 2000/XP and display the number of threads). At the same time, a conventional processor can execute only one thread at a time and must switch between them constantly.

Hyper-Threading technology was first implemented in the Intel Xeon MP processor (Foster MP). Note that the Xeon MP, announced at IDF Spring 2002, uses a core similar to the Pentium 4 Willamette, has a 256 KB L2 cache and a 512 KB or 1 MB L3 cache, and supports 4-processor configurations.


Hyper-Threading support is also available in the processor for workstations, the Intel Xeon (Prestonia core, 512 KB L2 cache), which appeared on the market earlier than the Xeon MP. We have already examined dual-processor configurations on the Intel Xeon, so we are going to look at Hyper-Threading capabilities using these CPUs as the example, both theoretically and practically. In any case, the "usual" Xeon is more convenient than the Xeon MP in 4-processor systems...

Hyper-Threading is based on the principle that at any point in time only a part of the processor's resources is used to execute program code. The unused resources can be put to work as well, for example by executing another application in parallel (or just another thread of the same application). One physical Intel Xeon processor forms two logical processors (LPs), which share the CPU's computational resources. The operating system and applications see two CPUs and can distribute the workload between them, as in a normal dual-processor system.


One of the aims of Hyper-Threading is that when there is only one active thread, it should execute at the same rate as on a conventional CPU. That is why the processor has two main modes: Single-Task (ST) and Multi-Task (MT). In ST mode only one logical processor is active and uses the available resources completely (modes ST0 and ST1); the other LP is stopped by the HALT instruction. When a second thread appears, the second logical processor is enabled (by an interrupt), and the physical CPU switches to MT mode. Halting an unused LP is the responsibility of the OS, which thereby ensures that a single thread executes as fast as it would without Hyper-Threading.

Each of the two LPs has an Architecture State (AS), which includes the state of registers of different types: general-purpose, control, APIC and service registers. Each LP has its own APIC (interrupt controller) and its own set of registers; for their correct operation there is a Register Alias Table (RAT), which tracks the correspondence between the 8 general-purpose IA-32 registers and the 128 registers of the physical CPU (one RAT for each LP).


When two threads are executed, two Next Instruction Pointers are maintained. Most instructions are taken from the Trace Cache (TC), where they are kept in decoded form, and the two active LPs access the TC in turn, alternating between them. When only one LP is active, it does not have to share access to the TC. The Microcode ROM is accessed the same way. The ITLB (Instruction Translation Look-aside Buffer) units, which are used when the required instructions are missing from the instruction cache, are duplicated and deliver instructions for their own threads. The IA-32 Instruction Decode Unit is shared, and when instructions need decoding for both threads, it serves them in turn. The Uop Queue and Allocator units are divided in two and provide half of their entries to each LP. Five schedulers process the queues of decoded instructions (uops) regardless of whether they belong to LP0 or LP1 and deliver instructions to the respective Execution Units, depending on the readiness of the former and the availability of the latter. Caches of all levels (L1/L2 for Xeon, and L3 for Xeon MP) are fully shared between the LPs, though to ensure data integrity, entries in the DTLB (Data Translation Look-aside Buffer) are tagged with the IDs of the logical processors.

Thus, instructions of both logical CPUs can be executed simultaneously using resources of

one physical processor which are divided into 4 classes:

Duplicated;

Fully Shared;

Entry Tagged;

Partitioned depending on the operating mode - ST0/ST1 or MT.
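The behaviour of the last class, partitioned resources, can be sketched in a few lines. This is a simplified model based only on the description in this section: in MT mode each logical processor gets half of the entries, and in ST0 or ST1 mode the single active logical processor gets all of them. The function name and the entry count used in the example are hypothetical.

```python
def entries_available(lp, mode, total_entries):
    """Entries of a partitioned resource usable by logical processor `lp` (0 or 1)."""
    if mode == "MT":
        return total_entries // 2  # both LPs active: each gets half
    if mode == f"ST{lp}":
        return total_entries       # this LP is the only active one
    if mode in ("ST0", "ST1"):
        return 0                   # the other LP is active; this one is halted
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical queue with 64 entries.
print(entries_available(0, "MT", 64))   # 32
print(entries_available(0, "ST0", 64))  # 64
print(entries_available(1, "ST0", 64))  # 0
```

This is why a single thread loses nothing in ST mode, while two threads in MT mode each work with a smaller share of the queues.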

Most applications that run faster on multiprocessor systems can also speed up on a CPU with Hyper-Threading without any modification. But there can be problems: for example, if one of the processes is in a busy-wait loop, it can take up all the resources of the physical CPU, hampering the operation of the second LP. As a result, performance with Hyper-Threading enabled can even drop (by up to 20%). To prevent this, Intel recommends using the PAUSE instruction (introduced in IA-32 with the Pentium 4) instead of empty wait loops. In addition, automatic and semi-automatic code optimization is being worked on; for example, the Intel OpenMP C++/Fortran compilers have achieved great success.

Another of Intel's aims in developing Hyper-Threading technology was to keep the growth in transistor count, die area and power consumption much smaller than the gain in efficiency. Indeed, incorporating Hyper-Threading into the Xeon/Xeon MP increased the die area and power consumption by just 5%. What remains is to estimate the performance gain obtained with it.

Identifying and Monitoring Hyper-threaded CPUs:

It is not possible to effectively detect whether processors are hyper-threaded or dual-core from within an operating system. An external tool is required to perform this task. Intel has an extremely effective tool for this purpose. It can be downloaded from their website either as a completely pre-packaged executable or as a text file of sample code that can be customized. This is an extremely handy tool and can be found at:

http://intel.com/cd/ids/developer/asmo-na/eng/recent/275339.htm

An example of the pre-packaged executable's output is shown below in Figure:


Monitoring hyper-threaded CPUs within Windows operating systems can be

accomplished via PerfMon.exe. As previously mentioned, the OS recognizes a single

Hyper-threaded CPU as two logical processors.

Application of hyper threading:

For the desktop, video editing represents a computationally intensive desktop

application for which HT can offer noticeable performance benefits. Intel describes in a white paper

how, hypothetically, the CPU would read a stream of uncompressed video and would process the

special effects in real time while the processed video stream would then be stored onto a disk in this

application. This problem can be particularly performance sensitive if the special effects have to be

applied to a live video stream, Intel says. The time available to process each frame of video is finite

and should be processed before the next frame arrives.


With threading, a few pieces of information are crucial to the success of the threaded version. If the

special effects to be performed on each pixel of the video frame are complex, for example, then the

function will meet the computationally intensive criteria to which HT applies. Depending on the size

of the video frame, the processing of each frame can be divided into multiple parts and each can be

concurrently processed using threads. This translates into what Intel calls a "data decomposition

problem," which applies to the time allotted for each thread-processing task.

In any threaded design, the first areas targeted are the most processor-time-consuming areas of the code. In the hypothetical video editing example, applying special effects to the video frame is the most time-consuming task, followed by the I/O to read and write a frame. The main thread acts as a master thread and divides the current video frame into four parts in the setup phase. Once the data has been set up, the master thread wakes up the three other threads, and all four threads, including the master, each operate on their own section of the video frame. Once the threads are done processing their share of the data, they wait at a barrier for all threads to complete their sections of the frame. The master then suspends all of the worker threads and writes the processed frame to disk before reading the next available frame from the stream.
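The master/worker scheme described above can be sketched as follows. This is an illustration of the structure only, not Intel's code: the frame is modelled as a flat list of pixel values, the function and variable names are hypothetical, and the "special effect" is an arbitrary per-pixel function. In CPython, threads running pure Python code do not execute in parallel, so the sketch shows the data decomposition and the barrier rather than a real speedup.

```python
import threading

NUM_THREADS = 4  # the master plus three workers, as in the example above


def process_frame(frame, effect):
    """Apply `effect` to every pixel of `frame`, one section per thread."""
    result = [None] * len(frame)
    chunk = (len(frame) + NUM_THREADS - 1) // NUM_THREADS  # section size, rounded up
    barrier = threading.Barrier(NUM_THREADS)

    def work(index):
        start = index * chunk
        end = min(start + chunk, len(frame))
        for i in range(start, end):
            result[i] = effect(frame[i])
        barrier.wait()  # wait until every thread has finished its section

    # Setup phase: the master starts the three other threads...
    workers = [threading.Thread(target=work, args=(i,)) for i in range(1, NUM_THREADS)]
    for w in workers:
        w.start()
    work(0)  # ...and then processes its own section of the frame
    for w in workers:
        w.join()
    return result  # at this point the master could write the frame to disk


print(process_frame(list(range(10)), lambda pixel: pixel * 2))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Each thread writes only to its own slice of `result`, so no lock is needed for the pixel data; the barrier is the only synchronization point.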

Are there any licensing issues?

Each logical processor that is contained within a Hyper-Threading processor

appears to the operating system as an individual processor. This means that tools or services

within Windows that display information about processors, such as the Windows Task

Manager or Windows Performance Monitor, will display processor information for every

logical processor that Windows is utilizing.

Intel’s processor identification methodology has been updated to support the software

identification of Hyper-Threading using the CPUID instruction. Operating system and

application software can use this identification mechanism to detect the presence of Hyper-

Threading processors and to provide support for features such as Hyper-Threading-aware

product licensing. Windows .NET Server supports an API that provides the logical to

physical mapping for the processors in the system. The current Windows operating system

licensing model for Hyper-Threading-enabled systems is to require a processor license for

each physical processor. However, it is important to note that any software product that was

released before the introduction of Hyper-Threading will not support Hyper-Threading

detection and will treat each logical processor as if it were an individual physical processor.


This licensing model applies to all 32-bit versions of Windows XP and Windows .NET

Server. This model delivers the performance benefit of utilizing both logical processors for

each processor that the Windows license supports. The processor limits which result from

this licensing model for 32-bit versions of Windows .NET Server and Windows XP are

shown below.

If seventeen

Hyper-

Threading

processors are

listed by the

BIOS,

Windows .NET Data enter Server will exhaust the 32-processor limit using both logical

processors on the first 16 physical processors listed. The operating system will not use either

logical processor on the seventeenth physical processor. As described earlier, utilizing a

single logical processor on an idle physical Hyper-Threading processor provides better

performance than utilizing the second logical processor on a physical processor that already

has an active logical processor.
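
The arithmetic behind this example can be sketched in a few lines of Python. This is only a simplified model, not the actual Windows logic: it assumes exactly two logical processors per physical processor and that processors are taken in the order the BIOS lists them. The names LIMITS and processors_used are made up for illustration, and the limits are copied from the table in this section.

```python
# Licensing limits for 32-bit Windows, copied from the table in this section:
# version -> (maximum physical processors, maximum logical processors).
LIMITS = {
    "Windows XP Home Edition": (1, 2),
    "Windows XP Professional": (2, 4),
    "Windows .NET Standard Server": (4, 8),
    "Windows .NET Enterprise Server": (8, 16),
    "Windows .NET Datacenter Server": (32, 32),
}

# Assumption: every installed processor has Hyper-Threading enabled.
LOGICAL_PER_PHYSICAL = 2


def processors_used(version, physical_installed):
    """Return (physical, logical) processors used under the simplified model."""
    max_physical, max_logical = LIMITS[version]
    # Both logical processors of each physical processor are used, so the
    # logical limit can run out before the physical limit does.
    physical_used = min(
        physical_installed,
        max_physical,
        max_logical // LOGICAL_PER_PHYSICAL,
    )
    return physical_used, physical_used * LOGICAL_PER_PHYSICAL


if __name__ == "__main__":
    # Seventeen Hyper-Threading processors on Datacenter Server.
    print(processors_used("Windows .NET Datacenter Server", 17))  # (16, 32)
```

Under these assumptions, the seventeenth physical processor is left unused, which matches the case described above.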

As a result, Microsoft’s recommendation for systems that contain more than 16 physical

Hyper-Threading processors is to disable Hyper-Threading at the BIOS before installing or

booting Windows. Because the performance benefit provided by the second logical

processors in a Hyper-Threading system decreases as the number of physical processors in

the system increases, it is not anticipated that the lack of Hyper-Threading support on

systems with more than 16 physical Hyper-Threading processors will have a significant

impact on the performance of the system.

Windows Version                   Maximum Physical    Maximum Logical
                                  Processor Limit     Processor Limit

Windows XP Home Edition           1                   2

Windows XP Professional           2                   4

Windows .NET Standard Server      4                   8

Windows .NET Enterprise Server    8                   16

Windows .NET Datacenter Server    32                  32

Hyper-threading in Linux:


To make use of Hyper-Threading in Linux, you need Hyper-Threading support enabled in the kernel. But how can you find out whether your CPU supports HT? You can get information about the CPU from a running Linux system by looking in /proc. For example, below is the output taken from a Xeon system:

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 3
cpu MHz : 3201.940
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes

Inside the flags section we are looking for the “ht” flag. If it is present, the system supports HT. Here is another sample, taken from a Pentium 4 CPU (unneeded fields removed):

model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
cpu MHz : 3192.092

This system also supports HT. If you don’t see the ht flag, then your system doesn’t support HT. The flag will generally not be present on AMD processors, as this is an Intel technology (this might not be true anymore with newer AMD CPUs). Here is an example from an AMD Opteron system:

model name : AMD Opteron(tm) Processor 242
cpu MHz : 1593.326
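
The check described above can be automated. The following Python sketch parses /proc/cpuinfo-style text and reports, for each logical processor, whether the ht flag is listed and whether "siblings" is larger than "cpu cores" (which is the case in the Xeon output above: 2 siblings, 1 core). The function names and the SAMPLE text are made up for illustration; in particular, the flags line in SAMPLE is a shortened example and is not taken from the systems shown above.

```python
import os

# Illustrative sample only; the flags line is shortened and hypothetical.
SAMPLE = """\
processor : 0
model name : Intel(R) Xeon(TM) CPU 3.20GHz
physical id : 0
siblings : 2
cpu cores : 1
flags : fpu tsc msr sse sse2 ht tm
"""


def parse_cpuinfo(text):
    """Split cpuinfo text into one dict per logical processor."""
    processors = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor entry.
            if current:
                processors.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


def ht_status(entry):
    """Return (ht flag present, siblings greater than cpu cores)."""
    has_flag = "ht" in entry.get("flags", "").split()
    siblings = int(entry.get("siblings", "1"))
    cores = int(entry.get("cpu cores", "1"))
    return has_flag, siblings > cores


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for entry in parse_cpuinfo(text):
        print(entry.get("processor", "?"), ht_status(entry))
```

Comparing siblings with cpu cores is a useful second check, because the flag alone only says what the CPU reports, not whether the extra logical processors are actually in use.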

If your CPU supports HT, you can take advantage of this technology only if HT support is enabled in your running kernel. You can either install a kernel provided by your Linux distribution with HT support (for example, one that has *SMP* in its name) or you can compile your own kernel and include HT support.

Once you are running an HT-enabled kernel, you should normally see the virtual CPU as a regular extra CPU (you will see 2 CPUs on a single-CPU system, 4 CPUs on a dual-processor system, etc.). You can easily check this with:

cat /proc/cpuinfo
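
Instead of reading the output by eye, the count can be taken with a short Python sketch. It counts the "processor" entries (logical CPUs) and the distinct "physical id" values (physical packages) in cpuinfo-style text. The function name count_processors and the TWO_LOGICAL sample are made up for illustration; the sample imitates a single HT processor seen by an HT-enabled kernel.

```python
# Illustrative sample: one physical package exposing two logical CPUs.
TWO_LOGICAL = """\
processor : 0
physical id : 0
siblings : 2

processor : 1
physical id : 0
siblings : 2
"""


def count_processors(cpuinfo_text):
    """Return (logical CPUs, physical packages) found in cpuinfo text."""
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    return logical, len(packages)


if __name__ == "__main__":
    print(count_processors(TWO_LOGICAL))  # (2, 1)
```

On a real system, the same function can be given the contents of /proc/cpuinfo; twice as many logical CPUs as physical packages is what the text above leads us to expect on an HT-enabled kernel.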

If you still see only one CPU even after you have installed an HT-enabled kernel, check that HT is not disabled in the BIOS and that ACPI is enabled in the BIOS.

Threading Compilers:

Intel's new HT compiler tools run the gamut of programming applications, Intel says. Version 7.0 of the Intel C++ and Intel Fortran compilers for Windows and Linux can improve the performance of applications for Intel Itanium 2, Intel Xeon, and Intel Pentium 4 processor-based systems by up to 40% compared to compilers currently available from other vendors, Intel claims.

Specific to HT, the Version 7.0 Intel compilers include an auto-parallelization option that automatically looks for opportunities in an application to create multiple execution threads. They also include enhancements to OpenMP, an open standard that uses high-level directives to simplify the creation and management of multi-threaded application software.

Conclusion:

Whether you want a desktop or a laptop computer, HT technology is available in most computer types, including servers and workstations. HT helps us work faster and get higher performance.
