Discussion:
Another Branch Prediction Attack
Andy K.
2018-03-29 20:59:06 UTC
Date: Thu, 29 Mar 2018 11:23:14 GMT
From: Bruce Schneier
Subject: Another Branch Prediction Attack


Another Branch Prediction Attack

URL: https://www.schneier.com/blog/archives/2018/03/another_branch_.html

When Spectre and Meltdown were first announced earlier this year,
pretty much everyone predicted that there would be many more attacks
targeting branch prediction in microprocessors.

In the new attack, an attacker primes the PHT (pattern history table)
by running branch instructions so that the PHT will always assume a
particular branch is taken or not taken. The victim code then runs and
makes a branch, potentially disturbing the PHT. The attacker then runs
more branch instructions of its own to detect that disturbance to the
PHT; the attacker knows that some branches should be predicted in a
particular direction and tests to see if the victim's code has changed
that prediction.

The researchers looked only at Intel processors, using the attacks to
leak information protected using Intel's SGX (Software Guard
Extensions), a feature found on certain chips to carve out small
sections of encrypted code and data such that even the operating system
(or virtualization software) cannot access it. They also described ways
the attack could be used against address space layout randomization and
to infer data in encryption and image libraries.
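
To make the prime, disturb, and probe sequence described above a bit
more concrete, here is a toy Python sketch. It assumes a single PHT
entry modelled as a 2-bit saturating counter, which is a textbook
simplification and not a description of Intel's actual predictor. The
names TwoBitCounter, branch, and spy_on_victim are made up for
illustration, and the misprediction count stands in for what a real
attacker would have to infer from timing.

```python
class TwoBitCounter:
    """One hypothetical PHT entry: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturating increment on taken, saturating decrement on not taken.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def branch(entry, taken):
    """Run one branch through the entry; return True if it was mispredicted."""
    mispredicted = entry.predict() != taken
    entry.update(taken)
    return mispredicted


def spy_on_victim(victim_taken):
    """Return how many probe branches mispredict after the victim's branch."""
    entry = TwoBitCounter()
    # Prime: three taken branches drive the counter to "strongly taken".
    for _ in range(3):
        branch(entry, True)
    # Victim: one branch whose direction depends on a secret.
    branch(entry, victim_taken)
    # Probe: two not-taken branches, counting mispredictions.
    return sum(branch(entry, False) for _ in range(2))


print(spy_on_victim(True))   # 2 mispredictions: victim branch was taken
print(spy_on_victim(False))  # 1 misprediction: victim branch was not taken
```

In this toy model the probe sees a different number of mispredictions
depending on the direction of the victim's branch, which is the whole
leak. Real predictors are indexed by address and mix several
mechanisms, so treat this only as an illustration of the idea.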
--
AndyK
Rich
2018-03-30 02:38:03 UTC
Post by Andy K.
Date: Thu, 29 Mar 2018 11:23:14 GMT
From: Bruce Schneier
Subject: Another Branch Prediction Attack
Another Branch Prediction Attack
URL: https://www.schneier.com/blog/archives/2018/03/another_branch_.html
When Spectre and Meltdown were first announced earlier this year,
pretty much everyone predicted that there would be many more attacks
targeting branch prediction in microprocessors.
Yup. The "cat's out of the bag" or the "genie is out of the bottle" or
the "horse has left the barn" (or pick your own cliche).

We've not yet seen all the possible ways that side channels can be
exploited. And since the designs to this point were all built without
thought to side channel attack vectors, the chips probably look like
Swiss cheese from a side channel viewpoint.

The only real fix is going to be new chips, with new designs, built
with side channel attacks in mind, and built to mitigate the attacks up
front.

Most of our legacy stuff, yeah, if it was vulnerable, it is going to
stay vulnerable for the most part.
Andy K.
2018-03-30 05:53:32 UTC
On Fri, 30 Mar 2018 02:38:03 -0000 (UTC)
Post by Rich
The only real fix is going to be new chips, with new designs, built
with side channel attacks in mind, and built to mitigate the attacks up
front.
Yup, and for users to swallow the inevitable performance hit that comes
with it.
--
AndyK
Rich
2018-03-30 13:39:52 UTC
Post by Andy K.
On Fri, 30 Mar 2018 02:38:03 -0000 (UTC)
Post by Rich
The only real fix is going to be new chips, with new designs, built
with side channel attacks in mind, and built to mitigate the attacks up
front.
Yup, and for users to swallow the inevitable performance hit that comes
with it.
Some mitigations will incur a performance hit, yes. Others that
currently incur one would (if fixed at the hardware level) effectively
be no performance hit.

For example, one of the huge performance hits comes from the software
fix in Linux, where the TLB is now flushed upon transitioning from user
to kernel space.

The performance hit comes from flushing out everything in the TLB,
including the valid items, requiring them to then be reloaded by the
hardware as further memory accesses occur.
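
As a rough illustration of that cost, here is a toy Python model. It
assumes a tiny, fully associative TLB with LRU replacement and a
made-up workload that touches the same four pages between system
calls; count_tlb_misses and the workload are hypothetical and do not
model any real CPU or the actual Linux patch.

```python
from collections import OrderedDict


def count_tlb_misses(accesses, capacity, flush_on_syscall):
    """Count TLB misses for a sequence of page numbers and "syscall" markers."""
    tlb = OrderedDict()            # page number -> cached translation (LRU order)
    misses = 0
    for item in accesses:
        if item == "syscall":
            if flush_on_syscall:
                tlb.clear()        # the software fix: drop everything, valid or not
            continue
        if item in tlb:
            tlb.move_to_end(item)  # hit: mark as most recently used
        else:
            misses += 1            # miss: the hardware must reload the translation
            tlb[item] = True
            if len(tlb) > capacity:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses


# Hypothetical workload: touch four pages, make a system call, repeat 100 times.
workload = ([0, 1, 2, 3] + ["syscall"]) * 100

print(count_tlb_misses(workload, 8, flush_on_syscall=False))  # 4
print(count_tlb_misses(workload, 8, flush_on_syscall=True))   # 400
```

Without the flush, the four translations are loaded once and reused;
with it, every one of them is reloaded after every system call.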

A revised hardware design where the TLB was also part of the state that
was reset back to where it was before speculative execution began would
remove the need for the kernel to flush the TLB. Removing the need to
flush the TLB removes the performance hit that the software fix incurs.

Now, the extra on-chip memory to save the state of the TLB and restore
it on mis-speculation means that further gains in performance might
not happen at that design change. But that would restore the
performance lost to the software fix (a full TLB flush to fix this is
somewhat like using a sledgehammer to swat a fly: it would work, but it
is significant overkill).
Computer Nerd Kev
2018-03-31 05:01:27 UTC
Post by Andy K.
Another Branch Prediction Attack
URL: https://www.schneier.com/blog/archives/2018/03/another_branch_.html
When Spectre and Meltdown were first announced earlier this year,
pretty much everyone predicted that there would be many more attacks
This attack (and my memory is fading already on the others, but I think
it at least applies to Spectre as well) relies on comparing the
performance of the CPU at executing tasks, with the time taken to
perform branching instructions varying depending on data present in
areas of memory theoretically inaccessible to user code.

Besides the predictive operation of the branching instructions, which
can't be easily changed, and the use of protected memory areas, which
has currently been changed at the expense of general performance,
another key requirement for the attack is an accurate way of
measuring time.

I noticed that Firefox's short-term response to the Spectre
vulnerability was to reduce the availability and resolution of
its available timer functions in JavaScript:
https://blog.mozilla.org/security/2018/01/03/mitigations-landing-new-class-timing-attack/

Taking this a step further, would it be possible to actively reduce
the precision of the hardware timers in the x86 architecture only
when a large number of predictive branching instructions are
executed in short succession?

If additional hardware is able to watch the data read by the CPU
from RAM for such a succession of these branching instructions,
it could trigger random variations in the frequency output by the
timer clock generator. These variations can be of large enough
significance to hide any effect from predictive branching, or at
least make the computation time required to detect it impractical.

Existing x86 CPUs could then still be used within the system, with
the only new limit to their performance being timing accuracy,
when required at the same time that unusually frequent branching
instructions are being executed.

EDIT:
-----
Actually, I'm afraid the Time Stamp Counter introduced with the
Pentium may sabotage this approach:
https://en.wikipedia.org/wiki/Time_Stamp_Counter#Use_in_exploiting_cache_side-channel_attacks
Well bugger, I've typed too much to abandon this post now.
----


Warning, I'm in one of my thinking outside of the box moods so
the following thinking is likely to be rather irrelevant and
pointless...

I then started thinking about how additional hardware could be
added to existing computers so as to offer an equivalent
to the software patches. Some sort of additional connector
sandwiched between the RAM sticks and the RAM sockets on the
motherboard could, maybe, allow additional circuitry to detect
the CPU reading branch instructions by monitoring the data
lines. Though this assumption is made in ignorance of the degree
of optimisation achieved with the x86 architecture and the
effects that adding additional loads on the RAM outputs is
likely to have on stability.

Then one comes to the question of how to change the clock
frequency used for the timer without actually modifying the
circuit used to generate it. It turns out that the prime
factors able to influence a crystal oscillator are heat,
acceleration, magnetic fields, and radiation. Long story
short, ruling out heat due to the thermal mass of the
component, I've concluded that a device using an electromagnet
to attract a lever, to move a small piece of lead, to
uncover a potent radioactive source, should produce enough
varying vibration, electromagnetism, and radiation that
when glued onto the can of the oscillator crystal it
should make it quite giddy.

Unfortunately I later discovered that the 14.318MHz
crystal frequency used for the timers is also synthesised
to generate the clock signals for PCI, USB, and all sorts
of other stuff that probably shouldn't be toyed with. A
shame, until then it was all looking so practical. :)

Note that this doesn't affect the modification I described
earlier of randomly varying the timer clock, because this
could be done after the generation of the other clock
signals.


References:
https://en.wikipedia.org/wiki/Intel_8254
-"Intel 8253". Original PC timer chip - Wikipedia

https://en.wikipedia.org/wiki/High_Precision_Event_Timer
-"High Precision Event Timer". Higher frequency timer added
later to the PC architecture - Wikipedia

https://en.wikipedia.org/wiki/Time_Stamp_Counter
-"Time Stamp Counter". Evil little thing that Intel put
in the Pentium to make sure that the fix I just
described wouldn't work. Or an internal execution
cycle counter - Wikipedia

http://oscilent.com/esupport/TechSupport/ReviewPapers/IntroQuartz/vigtoc.htm
-"Introduction to Quartz Frequency Standards"
-See ch. III "Oscillator Instabilities"
-Influences on precision Crystal Oscillator accuracy
(PC crystals won't be "precision").

http://ieee-uffc.org/extras/learning/Brendel198.html
-"INFLUENCE OF A MAGNETIC FIELD ON QUARTZ CRYSTAL RESONATORS"
- R. BRENDEL
-Again about precision crystals.
--
__ __
#_ < |\| |< _#
Rich
2018-03-31 06:05:56 UTC
Post by Computer Nerd Kev
Post by Andy K.
Another Branch Prediction Attack
URL: https://www.schneier.com/blog/archives/2018/03/another_branch_.html
When Spectre and Meltdown were first announced earlier this year,
pretty much everyone predicted that there would be many more attacks
Taking this a step further, would it be possible to actively reduce
the precision of the hardware timers in the x86 architecture only
when a large number of predictive branching instructions are
executed in short succession?
Except that there is no such thing as a "predictive branching
instruction". All branch instructions are predicted by the CPU. It
has onboard memory that remembers the outcomes of the last X branches
(where X varies depending on the specific CPU variant). So all branches
are receiving "predictive branching" operations, all the time.

Further, simply reducing the timer resolution is not a fix. It is at
best a band-aid because all it does is increase the time necessary to
detect the differences. You'd have to reduce the timing resolution to
a point where it is no longer useful for normal usage (playing videos,
etc.) before you'd reach a point where simply fiddling with the timing
makes the time necessary large enough to be all but impractical.
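
A rough way to see why a coarser timer only slows the attacker down is
the Python sketch below. All the numbers are assumptions picked for
illustration: an operation that takes 100 ns in one case and 140 ns in
the other, a timer that only ticks every 1000 ns, and operations that
start at a random point within a tick. The function names are made up.

```python
import random


def coarse_measure(duration_ns, resolution_ns, rng):
    """Time one operation with a timer that only ticks every resolution_ns."""
    start = rng.uniform(0, resolution_ns)   # operation starts at a random phase
    end = start + duration_ns
    ticks = int(end // resolution_ns) - int(start // resolution_ns)
    return ticks * resolution_ns


def estimate(duration_ns, resolution_ns, samples, rng):
    """Average many coarse measurements to recover the true duration."""
    total = sum(coarse_measure(duration_ns, resolution_ns, rng)
                for _ in range(samples))
    return total / samples


rng = random.Random(1)                      # fixed seed for repeatability
fast = estimate(100, 1000, 100_000, rng)    # assumed 100 ns case
slow = estimate(140, 1000, 100_000, rng)    # assumed 140 ns case
print(f"fast ~ {fast:.1f} ns, slow ~ {slow:.1f} ns")
```

Each single reading is either 0 or 1000 ns and says almost nothing,
but the averages settle near the true durations, so the 40 ns
difference is still recoverable. The price is the number of samples
needed, which is exactly the extra time mentioned above.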
Post by Computer Nerd Kev
If additional hardware is able to watch the data read by the CPU
from RAM for such a succession of these branching instructions,
it could trigger random variations in the frequency output by the
timer clock generator.
Won't help. Modern CPUs don't actually execute anything directly out
of RAM. All RAM sees in today's machines is cache line fills and cache
line flushes. The branch instructions are executed from the onboard
cache. And depending on the architecture, for short loops, even the
caches don't see the execution cycles (because a lower level loop
buffer is actually handling the instruction feed).
Post by Computer Nerd Kev
I then started thinking about how additional hardware could be
added to existing computers so as to offer an equivalent
to the software patches. Some sort of additional connector
sandwiched between the RAM sticks and the RAM sockets on the
motherboard could, maybe, allow additional circuitry to detect
the CPU reading branch instructions by monitoring the data
lines.
Won't work. The cache line fills that this 'extra hardware' sees will
have almost zero correlation to actual branch execution, and in some
instances these cache line fills will occur thousands or tens of
thousands of clock cycles before any execution of any branches in the
lines will occur.
Post by Computer Nerd Kev
I've concluded that a device using an electromagnet to attract a
lever, to move a small piece of lead,
Mechanical device. The rate of reaction of a mechanical device is
going to be on the order of disk seek latency (ms or so). On modern
CPUs you've already blown past a few million CPU clock cycles by the
time your lever and lead can move out of the way.
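
The arithmetic behind "a few million" can be checked with a couple of
lines of Python. The 3 GHz clock and the 1 ms actuation time are
assumed round figures, not measurements, and cycles_elapsed is a
made-up helper name.

```python
def cycles_elapsed(clock_hz, latency_us):
    """CPU clock cycles that pass during latency_us microseconds."""
    return clock_hz * latency_us // 1_000_000


CLOCK_HZ = 3_000_000_000   # assumed 3 GHz core clock
ACTUATOR_US = 1_000        # assumed 1 ms for the lever to move

print(cycles_elapsed(CLOCK_HZ, ACTUATOR_US))  # 3000000
```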
Post by Computer Nerd Kev
to uncover a potent radioactive source,
And now you have to get licences from the govt. to sell the machine
because it comes with a "potent radioactive source" installed, and who
knows what else in the way of regulatory red tape.
Post by Computer Nerd Kev
Unfortunately I later discovered that the 14.318MHz
crystal frequency used for the timers is also synthesised
to generate the clock signals for PCI, USB, and all sorts
of other stuff that probably shouldn't be toyed with. A
shame, until then it was all looking so practical. :)
Yep. Tweaking one clock likely bothers many others that should not
have their frequency changed. They are almost all interconnected.
The single easiest way to keep them all in phase with each other is to
derive them from each other.
Computer Nerd Kev
2018-03-31 22:46:33 UTC
Post by Rich
Post by Computer Nerd Kev
Post by Andy K.
Another Branch Prediction Attack
URL: https://www.schneier.com/blog/archives/2018/03/another_branch_.html
When Spectre and Meltdown were first announced earlier this year,
pretty much everyone predicted that there would be many more attacks
Taking this a step further, would it be possible to actively reduce
the precision of the hardware timers in the x86 architecture only
when a large number of predictive branching instructions are
executed in short succession?
Except that there is no such thing as a "predictive branching
instruction". All branch instructions are predicted by the CPU. It
has onboard memory that remembers the outcomes of the last X branches
(where X varies depending on the specific CPU variant). So all branches
are receiving "predictive branching" operations, all the time.
OK, fair enough. I didn't mean to imply that I was talking about a
small subset of instructions, and I agree the false positive rate
would be high.
Post by Rich
Further, simply reducing the timer resolution is not a fix. It is at
best a band-aid because all it does is increase the time necessary to
detect the differences.
The idea was to be a better solution than the current approach of
software patches.
Post by Rich
You'd have to reduce the timing resolution to
a point where it is no longer useful for normal usage (playing videos,
etc.) before you'd reach a point where simply fiddling with the timing
makes the time necessary large enough to be all but impractical.
The idea is that playing videos wouldn't be enough to set off the
protection mechanism, at least for any significant proportion of
the play time. See below for my new doubts on the practicality
of this.
Post by Rich
Post by Computer Nerd Kev
If additional hardware is able to watch the data read by the CPU
from RAM for such a succession of these branching instructions,
it could trigger random variations in the frequency output by the
timer clock generator.
Won't help. Modern CPUs don't actually execute anything directly out
of RAM. All RAM sees in today's machines is cache line fills and cache
line flushes. The branch instructions are executed from the onboard
cache.
Yes, it's the cache line fills that I was thinking of watching; the
idea is to detect an unusually large proportion of branching
instructions.

Actually, you're right this is silly. Any short "do while" loop (etc.)
is likely to set it off.
Post by Rich
And depending on the architecture, for short loops, even the
caches don't see the execution cycles (because a lower level loop
buffer is actually handling the instruction feed).
Post by Computer Nerd Kev
I then started thinking about how additional hardware could be
added to existing computers so as to offer an equivalent
to the software patches. Some sort of additional connector
sandwiched between the RAM sticks and the RAM sockets on the
motherboard could, maybe, allow additional circuitry to detect
the CPU reading branch instructions by monitoring the data
lines.
Won't work. The cache line fills that this 'extra hardware' sees will
have almost zero correlation to actual branch execution, and in some
instances these cache line fills will occur thousands or tens of
thousands of clock cycles before any execution of any branches in the
lines will occur.
The timer oscillator only runs at 14.3MHz, so it doesn't matter so
much if the protection is enabled/disabled for a few hundred timer
cycles before/after it is required.
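
For a sense of scale, here is the conversion from CPU cycles to timer
ticks as a small Python sketch. The 3 GHz CPU clock and the 30,000
cycle delay are assumed example figures, and timer_ticks is a made-up
helper name; only the 14.318 MHz crystal frequency comes from the
discussion.

```python
TIMER_HZ = 14_318_180      # 14.31818 MHz timer reference crystal
CPU_HZ = 3_000_000_000     # assumed 3 GHz core clock


def timer_ticks(cpu_cycles, cpu_hz=CPU_HZ, timer_hz=TIMER_HZ):
    """Whole timer ticks that elapse while the CPU runs cpu_cycles cycles."""
    return cpu_cycles * timer_hz // cpu_hz


print(timer_ticks(30_000))  # 143
```

Under those assumptions, tens of thousands of CPU cycles amount to
roughly a hundred or so ticks of the 14.318 MHz clock.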
Post by Rich
Post by Computer Nerd Kev
I've concluded that a device using an electromagnet to attract a
lever, to move a small piece of lead,
Mechanical device. The rate of reaction of a mechanical device is
going to be on the order of disk seek latency (ms or so). On modern
CPUs you've already blown past a few million CPU clock cycles by the
time your lever and lead can move out of the way.
Post by Computer Nerd Kev
to uncover a potent radioactive source,
And now you have to get licences from the govt. to sell the machine
because it comes with a "potent radioactive source" installed, and who
knows what else in the way of regulatory red tape.
Yes, I thought that was enough to imply that I was joking. Maybe the
smiley should have been earlier.
Post by Rich
Post by Computer Nerd Kev
Unfortunately I later discovered that the 14.318MHz
crystal frequency used for the timers is also synthesised
to generate the clock signals for PCI, USB, and all sorts
of other stuff that probably shouldn't be toyed with. A
shame, until then it was all looking so practical. :)
Yep. Tweaking one clock likely bothers many others that should not
have their frequency changed. They are almost all interconnected.
The single easiest way to keep them all in phase with each other is to
derive them from each other.
In theory the timer doesn't need to be in phase with anything, it's
for measuring time not gating signals. In practice it makes economic
sense to generate multiple clock signals from one crystal oscillator,
but the buffered timer frequency output from the clock generator
chip can be varied thereafter without worrying the hardware (unless
some sneaky designer decided to use it for some non-timer function
as well, which I don't know for sure but it isn't inevitable).
--
__ __
#_ < |\| |< _#