* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, apic: Print verbose error interrupt reason on apic=debug
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Demacro CONFIG_PARAVIRT cpu accessors
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix mrst sparse complaints
x86: Fix spelling error in the memcpy() source code comment
x86, mpparse: Remove unnecessary variable
- info on the magic SysRq key.
telephony/
- directory with info on telephony (e.g. voice over IP) support.
-uml/
- - directory with information about User Mode Linux.
unicode.txt
- info on the Unicode character/font mapping used in Linux.
unshare.txt
successful, will make the kernel abort a subsequent transition
to a sleep state if any wakeup events are reported after the
write has returned.
+
+What: /sys/power/reserved_size
+Date: May 2011
+Contact: Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+ The /sys/power/reserved_size file allows user space to control
+ the amount of memory reserved for allocations made by device
+ drivers during the "device freeze" stage of hibernation. It can
+ be written a string representing a non-negative integer that
+ will be used as the amount of memory to reserve for allocations
+ made by device drivers' "freeze" callbacks, in bytes.
+
+ Reading from this file will display the current value, which is
+ set to 1 MB by default.
<para>
Whenever an interrupt triggers, the lowlevel arch code calls into
the generic interrupt code by calling desc->handle_irq().
- This highlevel IRQ handling function only uses desc->chip primitives
- referenced by the assigned chip descriptor structure.
+ This highlevel IRQ handling function only uses desc->irq_data.chip
+ primitives referenced by the assigned chip descriptor structure.
</para>
</sect1>
<sect1 id="Highlevel_Driver_API">
<listitem><para>enable_irq()</para></listitem>
<listitem><para>disable_irq_nosync() (SMP only)</para></listitem>
<listitem><para>synchronize_irq() (SMP only)</para></listitem>
- <listitem><para>set_irq_type()</para></listitem>
- <listitem><para>set_irq_wake()</para></listitem>
- <listitem><para>set_irq_data()</para></listitem>
- <listitem><para>set_irq_chip()</para></listitem>
- <listitem><para>set_irq_chip_data()</para></listitem>
+ <listitem><para>irq_set_irq_type()</para></listitem>
+ <listitem><para>irq_set_irq_wake()</para></listitem>
+ <listitem><para>irq_set_handler_data()</para></listitem>
+ <listitem><para>irq_set_chip()</para></listitem>
+ <listitem><para>irq_set_chip_data()</para></listitem>
</itemizedlist>
See the autogenerated function documentation for details.
</para>
<listitem><para>handle_fasteoi_irq</para></listitem>
<listitem><para>handle_simple_irq</para></listitem>
<listitem><para>handle_percpu_irq</para></listitem>
+ <listitem><para>handle_edge_eoi_irq</para></listitem>
+ <listitem><para>handle_bad_irq</para></listitem>
</itemizedlist>
The interrupt flow handlers (either predefined or architecture
specific) are assigned to specific interrupts by the architecture
<programlisting>
default_enable(struct irq_data *data)
{
- desc->chip->irq_unmask(data);
+ desc->irq_data.chip->irq_unmask(data);
}
default_disable(struct irq_data *data)
{
if (!delay_disable(data))
- desc->chip->irq_mask(data);
+ desc->irq_data.chip->irq_mask(data);
}
default_ack(struct irq_data *data)
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-desc->chip->irq_mask();
-handle_IRQ_event(desc->action);
-desc->chip->irq_unmask();
+desc->irq_data.chip->irq_mask_ack();
+handle_irq_event(desc->action);
+desc->irq_data.chip->irq_unmask();
</programlisting>
</para>
</sect3>
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-handle_IRQ_event(desc->action);
-desc->chip->irq_eoi();
+handle_irq_event(desc->action);
+desc->irq_data.chip->irq_eoi();
</programlisting>
</para>
</sect3>
The following control flow is implemented (simplified excerpt):
<programlisting>
if (desc->status & running) {
- desc->chip->irq_mask();
+ desc->irq_data.chip->irq_mask_ack();
desc->status |= pending | masked;
return;
}
-desc->chip->irq_ack();
+desc->irq_data.chip->irq_ack();
desc->status |= running;
do {
if (desc->status & masked)
- desc->chip->irq_unmask();
+ desc->irq_data.chip->irq_unmask();
desc->status &= ~pending;
- handle_IRQ_event(desc->action);
+ handle_irq_event(desc->action);
} while (status & pending);
desc->status &= ~running;
</programlisting>
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-handle_IRQ_event(desc->action);
+handle_irq_event(desc->action);
</programlisting>
</para>
</sect3>
<para>
The following control flow is implemented (simplified excerpt):
<programlisting>
-handle_IRQ_event(desc->action);
-if (desc->chip->irq_eoi)
- desc->chip->irq_eoi();
+if (desc->irq_data.chip->irq_ack)
+ desc->irq_data.chip->irq_ack();
+handle_irq_event(desc->action);
+if (desc->irq_data.chip->irq_eoi)
+ desc->irq_data.chip->irq_eoi();
</programlisting>
</para>
</sect3>
+ <sect3 id="EOI_Edge_IRQ_flow_handler">
+ <title>EOI Edge IRQ flow handler</title>
+ <para>
+ handle_edge_eoi_irq provides an abnomination of the edge
+ handler which is solely used to tame a badly wreckaged
+ irq controller on powerpc/cell.
+ </para>
+ </sect3>
+ <sect3 id="BAD_IRQ_flow_handler">
+ <title>Bad IRQ flow handler</title>
+ <para>
+ handle_bad_irq is used for spurious interrupts which
+ have no real handler assigned..
+ </para>
+ </sect3>
</sect2>
<sect2 id="Quirks_and_optimizations">
<title>Quirks and optimizations</title>
<listitem><para>irq_mask_ack() - Optional, recommended for performance</para></listitem>
<listitem><para>irq_mask()</para></listitem>
<listitem><para>irq_unmask()</para></listitem>
+ <listitem><para>irq_eoi() - Optional, required for eoi flow handlers</para></listitem>
<listitem><para>irq_retrigger() - Optional</para></listitem>
<listitem><para>irq_set_type() - Optional</para></listitem>
<listitem><para>irq_set_wake() - Optional</para></listitem>
<chapter id="doirq">
<title>__do_IRQ entry point</title>
<para>
- The original implementation __do_IRQ() is an alternative entry
- point for all types of interrupts.
+ The original implementation __do_IRQ() was an alternative entry
+ point for all types of interrupts. It not longer exists.
</para>
<para>
This handler turned out to be not suitable for all
interrupt hardware and was therefore reimplemented with split
- functionality for egde/level/simple/percpu interrupts. This is not
+ functionality for edge/level/simple/percpu interrupts. This is not
only a functional optimization. It also shortens code paths for
interrupts.
</para>
- <para>
- To make use of the split implementation, replace the call to
- __do_IRQ by a call to desc->handle_irq() and associate
- the appropriate handler function to desc->handle_irq().
- In most cases the generic handler implementations should
- be sufficient.
- </para>
</chapter>
<chapter id="locking">
<title>Locking on SMP</title>
<para>
The locking of chip registers is up to the architecture that
- defines the chip primitives. There is a chip->lock field that can be used
- for serialization, but the generic layer does not touch it. The per-irq
- structure is protected via desc->lock, by the generic layer.
+ defines the chip primitives. The per-irq structure is
+ protected via desc->lock, by the generic layer.
</para>
</chapter>
<chapter id="structs">
<!ENTITY sub-srggb10 SYSTEM "v4l/pixfmt-srggb10.xml">
<!ENTITY sub-srggb8 SYSTEM "v4l/pixfmt-srggb8.xml">
<!ENTITY sub-y10 SYSTEM "v4l/pixfmt-y10.xml">
+<!ENTITY sub-y12 SYSTEM "v4l/pixfmt-y12.xml">
<!ENTITY sub-pixfmt SYSTEM "v4l/pixfmt.xml">
<!ENTITY sub-cropcap SYSTEM "v4l/vidioc-cropcap.xml">
<!ENTITY sub-dbg-g-register SYSTEM "v4l/vidioc-dbg-g-register.xml">
<varlistentry>
<term><parameter>request</parameter></term>
<listitem>
- <para>MEDIA_IOC_ENUM_LINKS</para>
+ <para>MEDIA_IOC_SETUP_LINK</para>
</listitem>
</varlistentry>
<varlistentry>
--- /dev/null
+<refentry id="V4L2-PIX-FMT-Y12">
+ <refmeta>
+ <refentrytitle>V4L2_PIX_FMT_Y12 ('Y12 ')</refentrytitle>
+ &manvol;
+ </refmeta>
+ <refnamediv>
+ <refname><constant>V4L2_PIX_FMT_Y12</constant></refname>
+ <refpurpose>Grey-scale image</refpurpose>
+ </refnamediv>
+ <refsect1>
+ <title>Description</title>
+
+ <para>This is a grey-scale image with a depth of 12 bits per pixel. Pixels
+are stored in 16-bit words with unused high bits padded with 0. The least
+significant byte is stored at lower memory addresses (little-endian).</para>
+
+ <example>
+ <title><constant>V4L2_PIX_FMT_Y12</constant> 4 × 4
+pixel image</title>
+
+ <formalpara>
+ <title>Byte Order.</title>
+ <para>Each cell is one byte.
+ <informaltable frame="none">
+ <tgroup cols="9" align="center">
+ <colspec align="left" colwidth="2*" />
+ <tbody valign="top">
+ <row>
+ <entry>start + 0:</entry>
+ <entry>Y'<subscript>00low</subscript></entry>
+ <entry>Y'<subscript>00high</subscript></entry>
+ <entry>Y'<subscript>01low</subscript></entry>
+ <entry>Y'<subscript>01high</subscript></entry>
+ <entry>Y'<subscript>02low</subscript></entry>
+ <entry>Y'<subscript>02high</subscript></entry>
+ <entry>Y'<subscript>03low</subscript></entry>
+ <entry>Y'<subscript>03high</subscript></entry>
+ </row>
+ <row>
+ <entry>start + 8:</entry>
+ <entry>Y'<subscript>10low</subscript></entry>
+ <entry>Y'<subscript>10high</subscript></entry>
+ <entry>Y'<subscript>11low</subscript></entry>
+ <entry>Y'<subscript>11high</subscript></entry>
+ <entry>Y'<subscript>12low</subscript></entry>
+ <entry>Y'<subscript>12high</subscript></entry>
+ <entry>Y'<subscript>13low</subscript></entry>
+ <entry>Y'<subscript>13high</subscript></entry>
+ </row>
+ <row>
+ <entry>start + 16:</entry>
+ <entry>Y'<subscript>20low</subscript></entry>
+ <entry>Y'<subscript>20high</subscript></entry>
+ <entry>Y'<subscript>21low</subscript></entry>
+ <entry>Y'<subscript>21high</subscript></entry>
+ <entry>Y'<subscript>22low</subscript></entry>
+ <entry>Y'<subscript>22high</subscript></entry>
+ <entry>Y'<subscript>23low</subscript></entry>
+ <entry>Y'<subscript>23high</subscript></entry>
+ </row>
+ <row>
+ <entry>start + 24:</entry>
+ <entry>Y'<subscript>30low</subscript></entry>
+ <entry>Y'<subscript>30high</subscript></entry>
+ <entry>Y'<subscript>31low</subscript></entry>
+ <entry>Y'<subscript>31high</subscript></entry>
+ <entry>Y'<subscript>32low</subscript></entry>
+ <entry>Y'<subscript>32high</subscript></entry>
+ <entry>Y'<subscript>33low</subscript></entry>
+ <entry>Y'<subscript>33high</subscript></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </para>
+ </formalpara>
+ </example>
+ </refsect1>
+</refentry>
&sub-packed-yuv;
&sub-grey;
&sub-y10;
+ &sub-y12;
&sub-y16;
&sub-yuyv;
&sub-uyvy;
<entry>b<subscript>1</subscript></entry>
<entry>b<subscript>0</subscript></entry>
</row>
+ <row id="V4L2-MBUS-FMT-SGBRG8-1X8">
+ <entry>V4L2_MBUS_FMT_SGBRG8_1X8</entry>
+ <entry>0x3013</entry>
+ <entry></entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>g<subscript>7</subscript></entry>
+ <entry>g<subscript>6</subscript></entry>
+ <entry>g<subscript>5</subscript></entry>
+ <entry>g<subscript>4</subscript></entry>
+ <entry>g<subscript>3</subscript></entry>
+ <entry>g<subscript>2</subscript></entry>
+ <entry>g<subscript>1</subscript></entry>
+ <entry>g<subscript>0</subscript></entry>
+ </row>
<row id="V4L2-MBUS-FMT-SGRBG8-1X8">
<entry>V4L2_MBUS_FMT_SGRBG8_1X8</entry>
<entry>0x3002</entry>
<entry>g<subscript>1</subscript></entry>
<entry>g<subscript>0</subscript></entry>
</row>
+ <row id="V4L2-MBUS-FMT-SRGGB8-1X8">
+ <entry>V4L2_MBUS_FMT_SRGGB8_1X8</entry>
+ <entry>0x3014</entry>
+ <entry></entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>r<subscript>7</subscript></entry>
+ <entry>r<subscript>6</subscript></entry>
+ <entry>r<subscript>5</subscript></entry>
+ <entry>r<subscript>4</subscript></entry>
+ <entry>r<subscript>3</subscript></entry>
+ <entry>r<subscript>2</subscript></entry>
+ <entry>r<subscript>1</subscript></entry>
+ <entry>r<subscript>0</subscript></entry>
+ </row>
<row id="V4L2-MBUS-FMT-SBGGR10-DPCM8-1X8">
<entry>V4L2_MBUS_FMT_SBGGR10_DPCM8_1X8</entry>
<entry>0x300b</entry>
<entry>u<subscript>1</subscript></entry>
<entry>u<subscript>0</subscript></entry>
</row>
+ <row id="V4L2-MBUS-FMT-Y12-1X12">
+ <entry>V4L2_MBUS_FMT_Y12_1X12</entry>
+ <entry>0x2013</entry>
+ <entry></entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>y<subscript>11</subscript></entry>
+ <entry>y<subscript>10</subscript></entry>
+ <entry>y<subscript>9</subscript></entry>
+ <entry>y<subscript>8</subscript></entry>
+ <entry>y<subscript>7</subscript></entry>
+ <entry>y<subscript>6</subscript></entry>
+ <entry>y<subscript>5</subscript></entry>
+ <entry>y<subscript>4</subscript></entry>
+ <entry>y<subscript>3</subscript></entry>
+ <entry>y<subscript>2</subscript></entry>
+ <entry>y<subscript>1</subscript></entry>
+ <entry>y<subscript>0</subscript></entry>
+ </row>
<row id="V4L2-MBUS-FMT-UYVY8-1X16">
<entry>V4L2_MBUS_FMT_UYVY8_1X16</entry>
<entry>0x200f</entry>
tasks # attach a task(thread) and show list of threads
cgroup.procs # show list of processes
cgroup.event_control # an interface for event_fd()
- memory.usage_in_bytes # show current memory(RSS+Cache) usage.
- memory.memsw.usage_in_bytes # show current memory+Swap usage
+ memory.usage_in_bytes # show current res_counter usage for memory
+ (See 5.5 for details)
+ memory.memsw.usage_in_bytes # show current res_counter usage for memory+Swap
+ (See 5.5 for details)
memory.limit_in_bytes # set/show limit of memory usage
memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
memory.failcnt # show the number of memory usage hits limits
You can reset failcnt by writing 0 to failcnt file.
# echo 0 > .../memory.failcnt
+5.5 usage_in_bytes
+
+For efficiency, as other kernel components, memory cgroup uses some optimization
+to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
+method and doesn't show 'exact' value of memory(and swap) usage, it's an fuzz
+value for efficient access. (Of course, when necessary, it's synchronized.)
+If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
+value in memory.stat(see 5.2).
+
6. Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.
----------------------------
-What: The acpi_sleep=s4_nonvs command line option
-When: 2.6.37
-Files: arch/x86/kernel/acpi/sleep.c
-Why: superseded by acpi_sleep=nonvs
-Who: Rafael J. Wysocki <rjw@sisk.pl>
-
-----------------------------
-
What: PCI DMA unmap state API
When: August 2012
Why: PCI DMA unmap state API (include/linux/pci-dma.h) was replaced
entering atomic context, using:
int flex_array_prealloc(struct flex_array *array, unsigned int start,
- unsigned int end, gfp_t flags);
+ unsigned int nr_elements, gfp_t flags);
This function will ensure that memory for the elements indexed in the range
-defined by start and end has been allocated. Thereafter, a
+defined by start and nr_elements has been allocated. Thereafter, a
flex_array_put() call on an element in that range is guaranteed not to
block.
Prefix: 'gl523sm'
Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e
Datasheet:
- * Intel Xeon Processor
- Prefix: - any other - may require 'force_adm1021' parameter
- Addresses scanned: none
- Datasheet: Publicly available at Intel website
* Maxim MAX1617
Prefix: 'max1617'
Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e
ADM1021-clones do faster measurements, but there is really no good reason
for that.
-Xeon support
-------------
-Some Xeon processors have real max1617, adm1021, or compatible chips
-within them, with two temperature sensors.
+Netburst-based Xeon support
+---------------------------
-Other Xeons have chips with only one sensor.
+Some Xeon processors based on the Netburst (early Pentium 4, from 2001 to
+2003) microarchitecture had real MAX1617, ADM1021, or compatible chips
+within them, with two temperature sensors. Other Xeon processors of this
+era (with 400 MHz FSB) had chips with only one temperature sensor.
-If you have a Xeon, and the adm1021 module loads, and both temperatures
-appear valid, then things are good.
+If you have such an old Xeon, and you get two valid temperatures when
+loading the adm1021 module, then things are good.
-If the adm1021 module doesn't load, you should try this:
- modprobe adm1021 force_adm1021=BUS,ADDRESS
- ADDRESS can only be 0x18, 0x1a, 0x29, 0x2b, 0x4c, or 0x4e.
+If nothing happens when loading the adm1021 module, and you are certain
+that your specific Xeon processor model includes compatible sensors, you
+will have to explicitly instantiate the sensor chips from user-space. See
+method 4 in Documentation/i2c/instantiating-devices. Possible slave
+addresses are 0x18, 0x1a, 0x29, 0x2b, 0x4c, or 0x4e. It is likely that
+only temp2 will be correct and temp1 will have to be ignored.
-If you have dual Xeons you may have appear to have two separate
-adm1021-compatible chips, or two single-temperature sensors, at distinct
-addresses.
+Previous generations of the Xeon processor (based on Pentium II/III)
+didn't have these sensors. Next generations of Xeon processors (533 MHz
+FSB and faster) lost them, until the Core-based generation which
+introduced integrated digital thermal sensors. These are supported by
+the coretemp driver.
Addresses scanned: I2C 0x4c and 0x4d
Datasheet: Publicly available at the ON Semiconductor website
http://www.onsemi.com/PowerSolutions/product.do?id=ADT7461
+ * Analog Devices ADT7461A
+ Prefix: 'adt7461a'
+ Addresses scanned: I2C 0x4c and 0x4d
+ Datasheet: Publicly available at the ON Semiconductor website
+ http://www.onsemi.com/PowerSolutions/product.do?id=ADT7461A
+ * ON Semiconductor NCT1008
+ Prefix: 'nct1008'
+ Addresses scanned: I2C 0x4c and 0x4d
+ Datasheet: Publicly available at the ON Semiconductor website
+ http://www.onsemi.com/PowerSolutions/product.do?id=NCT1008
* Maxim MAX6646
Prefix: 'max6646'
Addresses scanned: I2C 0x4d
* ALERT is triggered by open remote sensor.
* SMBus PEC support for Write Byte and Receive Byte transactions.
-ADT7461:
+ADT7461, ADT7461A, NCT1008:
* Extended temperature range (breaks compatibility)
* Lower resolution for remote temperature
Only the local hysteresis can be set from user-space, and the same delta
applies to the remote hysteresis.
-The lm90 driver will not update its values more frequently than every
-other second; reading them more often will do no harm, but will return
-'old' values.
+The lm90 driver will not update its values more frequently than configured with
+the update_interval attribute; reading them more often will do no harm, but will
+return 'old' values.
SMBus Alert Support
-------------------
This driver has basic support for SMBus alert. When an alert is received,
the status register is read and the faulty temperature channel is logged.
-The Analog Devices chips (ADM1032 and ADT7461) do not implement the SMBus
-alert protocol properly so additional care is needed: the ALERT output is
-disabled when an alert is received, and is re-enabled only when the alarm
-is gone. Otherwise the chip would block alerts from other chips in the bus
-as long as the alarm is active.
+The Analog Devices chips (ADM1032, ADT7461 and ADT7461A) and ON
+Semiconductor chips (NCT1008) do not implement the SMBus alert protocol
+properly so additional care is needed: the ALERT output is disabled when
+an alert is received, and is re-enabled only when the alarm is gone.
+Otherwise the chip would block alerts from other chips in the bus as long
+as the alarm is active.
PEC Support
-----------
acpi_sleep= [HW,ACPI] Sleep options
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
- old_ordering, s4_nonvs, sci_force_enable }
+ old_ordering, nonvs, sci_force_enable }
See Documentation/power/video.txt for information on
s3_bios and s3_mode.
s3_beep is for debugging; it makes the PC's speaker beep
+++ /dev/null
-The Definitive KVM (Kernel-based Virtual Machine) API Documentation
-===================================================================
-
-1. General description
-
-The kvm API is a set of ioctls that are issued to control various aspects
-of a virtual machine. The ioctls belong to three classes
-
- - System ioctls: These query and set global attributes which affect the
- whole kvm subsystem. In addition a system ioctl is used to create
- virtual machines
-
- - VM ioctls: These query and set attributes that affect an entire virtual
- machine, for example memory layout. In addition a VM ioctl is used to
- create virtual cpus (vcpus).
-
- Only run VM ioctls from the same process (address space) that was used
- to create the VM.
-
- - vcpu ioctls: These query and set attributes that control the operation
- of a single virtual cpu.
-
- Only run vcpu ioctls from the same thread that was used to create the
- vcpu.
-
-2. File descriptors
-
-The kvm API is centered around file descriptors. An initial
-open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
-can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
-handle will create a VM file descriptor which can be used to issue VM
-ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
-and return a file descriptor pointing to it. Finally, ioctls on a vcpu
-fd can be used to control the vcpu, including the important task of
-actually running guest code.
-
-In general file descriptors can be migrated among processes by means
-of fork() and the SCM_RIGHTS facility of unix domain socket. These
-kinds of tricks are explicitly not supported by kvm. While they will
-not cause harm to the host, their actual behavior is not guaranteed by
-the API. The only supported use is one virtual machine per process,
-and one vcpu per thread.
-
-3. Extensions
-
-As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
-incompatible change are allowed. However, there is an extension
-facility that allows backward-compatible extensions to the API to be
-queried and used.
-
-The extension mechanism is not based on on the Linux version number.
-Instead, kvm defines extension identifiers and a facility to query
-whether a particular extension identifier is available. If it is, a
-set of ioctls is available for application use.
-
-4. API description
-
-This section describes ioctls that can be used to control kvm guests.
-For each ioctl, the following information is provided along with a
-description:
-
- Capability: which KVM extension provides this ioctl. Can be 'basic',
- which means that is will be provided by any kernel that supports
- API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
- means availability needs to be checked with KVM_CHECK_EXTENSION
- (see section 4.4).
-
- Architectures: which instruction set architectures provide this ioctl.
- x86 includes both i386 and x86_64.
-
- Type: system, vm, or vcpu.
-
- Parameters: what parameters are accepted by the ioctl.
-
- Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
- are not detailed, but errors with specific meanings are.
-
-4.1 KVM_GET_API_VERSION
-
-Capability: basic
-Architectures: all
-Type: system ioctl
-Parameters: none
-Returns: the constant KVM_API_VERSION (=12)
-
-This identifies the API version as the stable kvm API. It is not
-expected that this number will change. However, Linux 2.6.20 and
-2.6.21 report earlier versions; these are not documented and not
-supported. Applications should refuse to run if KVM_GET_API_VERSION
-returns a value other than 12. If this check passes, all ioctls
-described as 'basic' will be available.
-
-4.2 KVM_CREATE_VM
-
-Capability: basic
-Architectures: all
-Type: system ioctl
-Parameters: none
-Returns: a VM fd that can be used to control the new virtual machine.
-
-The new VM has no virtual cpus and no memory. An mmap() of a VM fd
-will access the virtual machine's physical address space; offset zero
-corresponds to guest physical address zero. Use of mmap() on a VM fd
-is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
-available.
-
-4.3 KVM_GET_MSR_INDEX_LIST
-
-Capability: basic
-Architectures: x86
-Type: system
-Parameters: struct kvm_msr_list (in/out)
-Returns: 0 on success; -1 on error
-Errors:
- E2BIG: the msr index list is to be to fit in the array specified by
- the user.
-
-struct kvm_msr_list {
- __u32 nmsrs; /* number of msrs in entries */
- __u32 indices[0];
-};
-
-This ioctl returns the guest msrs that are supported. The list varies
-by kvm version and host processor, but does not change otherwise. The
-user fills in the size of the indices array in nmsrs, and in return
-kvm adjusts nmsrs to reflect the actual number of msrs and fills in
-the indices array with their numbers.
-
-Note: if kvm indicates supports MCE (KVM_CAP_MCE), then the MCE bank MSRs are
-not returned in the MSR list, as different vcpus can have a different number
-of banks, as set via the KVM_X86_SETUP_MCE ioctl.
-
-4.4 KVM_CHECK_EXTENSION
-
-Capability: basic
-Architectures: all
-Type: system ioctl
-Parameters: extension identifier (KVM_CAP_*)
-Returns: 0 if unsupported; 1 (or some other positive integer) if supported
-
-The API allows the application to query about extensions to the core
-kvm API. Userspace passes an extension identifier (an integer) and
-receives an integer that describes the extension availability.
-Generally 0 means no and 1 means yes, but some extensions may report
-additional information in the integer return value.
-
-4.5 KVM_GET_VCPU_MMAP_SIZE
-
-Capability: basic
-Architectures: all
-Type: system ioctl
-Parameters: none
-Returns: size of vcpu mmap area, in bytes
-
-The KVM_RUN ioctl (cf.) communicates with userspace via a shared
-memory region. This ioctl returns the size of that region. See the
-KVM_RUN documentation for details.
-
-4.6 KVM_SET_MEMORY_REGION
-
-Capability: basic
-Architectures: all
-Type: vm ioctl
-Parameters: struct kvm_memory_region (in)
-Returns: 0 on success, -1 on error
-
-This ioctl is obsolete and has been removed.
-
-4.7 KVM_CREATE_VCPU
-
-Capability: basic
-Architectures: all
-Type: vm ioctl
-Parameters: vcpu id (apic id on x86)
-Returns: vcpu fd on success, -1 on error
-
-This API adds a vcpu to a virtual machine. The vcpu id is a small integer
-in the range [0, max_vcpus).
-
-4.8 KVM_GET_DIRTY_LOG (vm ioctl)
-
-Capability: basic
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_dirty_log (in/out)
-Returns: 0 on success, -1 on error
-
-/* for KVM_GET_DIRTY_LOG */
-struct kvm_dirty_log {
- __u32 slot;
- __u32 padding;
- union {
- void __user *dirty_bitmap; /* one bit per page */
- __u64 padding;
- };
-};
-
-Given a memory slot, return a bitmap containing any pages dirtied
-since the last call to this ioctl. Bit 0 is the first page in the
-memory slot. Ensure the entire structure is cleared to avoid padding
-issues.
-
-4.9 KVM_SET_MEMORY_ALIAS
-
-Capability: basic
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_memory_alias (in)
-Returns: 0 (success), -1 (error)
-
-This ioctl is obsolete and has been removed.
-
-4.10 KVM_RUN
-
-Capability: basic
-Architectures: all
-Type: vcpu ioctl
-Parameters: none
-Returns: 0 on success, -1 on error
-Errors:
- EINTR: an unmasked signal is pending
-
-This ioctl is used to run a guest virtual cpu. While there are no
-explicit parameters, there is an implicit parameter block that can be
-obtained by mmap()ing the vcpu fd at offset 0, with the size given by
-KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
-kvm_run' (see below).
-
-4.11 KVM_GET_REGS
-
-Capability: basic
-Architectures: all
-Type: vcpu ioctl
-Parameters: struct kvm_regs (out)
-Returns: 0 on success, -1 on error
-
-Reads the general purpose registers from the vcpu.
-
-/* x86 */
-struct kvm_regs {
- /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
- __u64 rax, rbx, rcx, rdx;
- __u64 rsi, rdi, rsp, rbp;
- __u64 r8, r9, r10, r11;
- __u64 r12, r13, r14, r15;
- __u64 rip, rflags;
-};
-
-4.12 KVM_SET_REGS
-
-Capability: basic
-Architectures: all
-Type: vcpu ioctl
-Parameters: struct kvm_regs (in)
-Returns: 0 on success, -1 on error
-
-Writes the general purpose registers into the vcpu.
-
-See KVM_GET_REGS for the data structure.
-
-4.13 KVM_GET_SREGS
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_sregs (out)
-Returns: 0 on success, -1 on error
-
-Reads special registers from the vcpu.
-
-/* x86 */
-struct kvm_sregs {
- struct kvm_segment cs, ds, es, fs, gs, ss;
- struct kvm_segment tr, ldt;
- struct kvm_dtable gdt, idt;
- __u64 cr0, cr2, cr3, cr4, cr8;
- __u64 efer;
- __u64 apic_base;
- __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
-};
-
-interrupt_bitmap is a bitmap of pending external interrupts. At most
-one bit may be set. This interrupt has been acknowledged by the APIC
-but not yet injected into the cpu core.
-
-4.14 KVM_SET_SREGS
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_sregs (in)
-Returns: 0 on success, -1 on error
-
-Writes special registers into the vcpu. See KVM_GET_SREGS for the
-data structures.
-
-4.15 KVM_TRANSLATE
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_translation (in/out)
-Returns: 0 on success, -1 on error
-
-Translates a virtual address according to the vcpu's current address
-translation mode.
-
-struct kvm_translation {
- /* in */
- __u64 linear_address;
-
- /* out */
- __u64 physical_address;
- __u8 valid;
- __u8 writeable;
- __u8 usermode;
- __u8 pad[5];
-};
-
-4.16 KVM_INTERRUPT
-
-Capability: basic
-Architectures: x86, ppc
-Type: vcpu ioctl
-Parameters: struct kvm_interrupt (in)
-Returns: 0 on success, -1 on error
-
-Queues a hardware interrupt vector to be injected. This is only
-useful if in-kernel local APIC or equivalent is not used.
-
-/* for KVM_INTERRUPT */
-struct kvm_interrupt {
- /* in */
- __u32 irq;
-};
-
-X86:
-
-Note 'irq' is an interrupt vector, not an interrupt pin or line.
-
-PPC:
-
-Queues an external interrupt to be injected. This ioctl is overleaded
-with 3 different irq values:
-
-a) KVM_INTERRUPT_SET
-
- This injects an edge type external interrupt into the guest once it's ready
- to receive interrupts. When injected, the interrupt is done.
-
-b) KVM_INTERRUPT_UNSET
-
- This unsets any pending interrupt.
-
- Only available with KVM_CAP_PPC_UNSET_IRQ.
-
-c) KVM_INTERRUPT_SET_LEVEL
-
- This injects a level type external interrupt into the guest context. The
- interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
- is triggered.
-
- Only available with KVM_CAP_PPC_IRQ_LEVEL.
-
-Note that any value for 'irq' other than the ones stated above is invalid
-and incurs unexpected behavior.
-
-4.17 KVM_DEBUG_GUEST
-
-Capability: basic
-Architectures: none
-Type: vcpu ioctl
-Parameters: none)
-Returns: -1 on error
-
-Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
-
-4.18 KVM_GET_MSRS
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_msrs (in/out)
-Returns: 0 on success, -1 on error
-
-Reads model-specific registers from the vcpu. Supported msr indices can
-be obtained using KVM_GET_MSR_INDEX_LIST.
-
-struct kvm_msrs {
- __u32 nmsrs; /* number of msrs in entries */
- __u32 pad;
-
- struct kvm_msr_entry entries[0];
-};
-
-struct kvm_msr_entry {
- __u32 index;
- __u32 reserved;
- __u64 data;
-};
-
-Application code should set the 'nmsrs' member (which indicates the
-size of the entries array) and the 'index' member of each array entry.
-kvm will fill in the 'data' member.
-
-4.19 KVM_SET_MSRS
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_msrs (in)
-Returns: 0 on success, -1 on error
-
-Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
-data structures.
-
-Application code should set the 'nmsrs' member (which indicates the
-size of the entries array), and the 'index' and 'data' members of each
-array entry.
-
-4.20 KVM_SET_CPUID
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_cpuid (in)
-Returns: 0 on success, -1 on error
-
-Defines the vcpu responses to the cpuid instruction. Applications
-should use the KVM_SET_CPUID2 ioctl if available.
-
-
-struct kvm_cpuid_entry {
- __u32 function;
- __u32 eax;
- __u32 ebx;
- __u32 ecx;
- __u32 edx;
- __u32 padding;
-};
-
-/* for KVM_SET_CPUID */
-struct kvm_cpuid {
- __u32 nent;
- __u32 padding;
- struct kvm_cpuid_entry entries[0];
-};
-
-4.21 KVM_SET_SIGNAL_MASK
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_signal_mask (in)
-Returns: 0 on success, -1 on error
-
-Defines which signals are blocked during execution of KVM_RUN. This
-signal mask temporarily overrides the threads signal mask. Any
-unblocked signal received (except SIGKILL and SIGSTOP, which retain
-their traditional behaviour) will cause KVM_RUN to return with -EINTR.
-
-Note the signal will only be delivered if not blocked by the original
-signal mask.
-
-/* for KVM_SET_SIGNAL_MASK */
-struct kvm_signal_mask {
- __u32 len;
- __u8 sigset[0];
-};
-
-4.22 KVM_GET_FPU
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_fpu (out)
-Returns: 0 on success, -1 on error
-
-Reads the floating point state from the vcpu.
-
-/* for KVM_GET_FPU and KVM_SET_FPU */
-struct kvm_fpu {
- __u8 fpr[8][16];
- __u16 fcw;
- __u16 fsw;
- __u8 ftwx; /* in fxsave format */
- __u8 pad1;
- __u16 last_opcode;
- __u64 last_ip;
- __u64 last_dp;
- __u8 xmm[16][16];
- __u32 mxcsr;
- __u32 pad2;
-};
-
-4.23 KVM_SET_FPU
-
-Capability: basic
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_fpu (in)
-Returns: 0 on success, -1 on error
-
-Writes the floating point state to the vcpu.
-
-/* for KVM_GET_FPU and KVM_SET_FPU */
-struct kvm_fpu {
- __u8 fpr[8][16];
- __u16 fcw;
- __u16 fsw;
- __u8 ftwx; /* in fxsave format */
- __u8 pad1;
- __u16 last_opcode;
- __u64 last_ip;
- __u64 last_dp;
- __u8 xmm[16][16];
- __u32 mxcsr;
- __u32 pad2;
-};
-
-4.24 KVM_CREATE_IRQCHIP
-
-Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
-Type: vm ioctl
-Parameters: none
-Returns: 0 on success, -1 on error
-
-Creates an interrupt controller model in the kernel. On x86, creates a virtual
-ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
-local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC. On ia64, a IOSAPIC is created.
-
-4.25 KVM_IRQ_LINE
-
-Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
-Type: vm ioctl
-Parameters: struct kvm_irq_level
-Returns: 0 on success, -1 on error
-
-Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
-
-struct kvm_irq_level {
- union {
- __u32 irq; /* GSI */
- __s32 status; /* not used for KVM_IRQ_LEVEL */
- };
- __u32 level; /* 0 or 1 */
-};
-
-4.26 KVM_GET_IRQCHIP
-
-Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
-Type: vm ioctl
-Parameters: struct kvm_irqchip (in/out)
-Returns: 0 on success, -1 on error
-
-Reads the state of a kernel interrupt controller created with
-KVM_CREATE_IRQCHIP into a buffer provided by the caller.
-
-struct kvm_irqchip {
- __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
- __u32 pad;
- union {
- char dummy[512]; /* reserving space */
- struct kvm_pic_state pic;
- struct kvm_ioapic_state ioapic;
- } chip;
-};
-
-4.27 KVM_SET_IRQCHIP
-
-Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
-Type: vm ioctl
-Parameters: struct kvm_irqchip (in)
-Returns: 0 on success, -1 on error
-
-Sets the state of a kernel interrupt controller created with
-KVM_CREATE_IRQCHIP from a buffer provided by the caller.
-
-struct kvm_irqchip {
- __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
- __u32 pad;
- union {
- char dummy[512]; /* reserving space */
- struct kvm_pic_state pic;
- struct kvm_ioapic_state ioapic;
- } chip;
-};
-
-4.28 KVM_XEN_HVM_CONFIG
-
-Capability: KVM_CAP_XEN_HVM
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_xen_hvm_config (in)
-Returns: 0 on success, -1 on error
-
-Sets the MSR that the Xen HVM guest uses to initialize its hypercall
-page, and provides the starting address and size of the hypercall
-blobs in userspace. When the guest writes the MSR, kvm copies one
-page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
-memory.
-
-struct kvm_xen_hvm_config {
- __u32 flags;
- __u32 msr;
- __u64 blob_addr_32;
- __u64 blob_addr_64;
- __u8 blob_size_32;
- __u8 blob_size_64;
- __u8 pad2[30];
-};
-
-4.29 KVM_GET_CLOCK
-
-Capability: KVM_CAP_ADJUST_CLOCK
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_clock_data (out)
-Returns: 0 on success, -1 on error
-
-Gets the current timestamp of kvmclock as seen by the current guest. In
-conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
-such as migration.
-
-struct kvm_clock_data {
- __u64 clock; /* kvmclock current value */
- __u32 flags;
- __u32 pad[9];
-};
-
-4.30 KVM_SET_CLOCK
-
-Capability: KVM_CAP_ADJUST_CLOCK
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_clock_data (in)
-Returns: 0 on success, -1 on error
-
-Sets the current timestamp of kvmclock to the value specified in its parameter.
-In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
-such as migration.
-
-struct kvm_clock_data {
- __u64 clock; /* kvmclock current value */
- __u32 flags;
- __u32 pad[9];
-};
-
-4.31 KVM_GET_VCPU_EVENTS
-
-Capability: KVM_CAP_VCPU_EVENTS
-Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_vcpu_event (out)
-Returns: 0 on success, -1 on error
-
-Gets currently pending exceptions, interrupts, and NMIs as well as related
-states of the vcpu.
-
-struct kvm_vcpu_events {
- struct {
- __u8 injected;
- __u8 nr;
- __u8 has_error_code;
- __u8 pad;
- __u32 error_code;
- } exception;
- struct {
- __u8 injected;
- __u8 nr;
- __u8 soft;
- __u8 shadow;
- } interrupt;
- struct {
- __u8 injected;
- __u8 pending;
- __u8 masked;
- __u8 pad;
- } nmi;
- __u32 sipi_vector;
- __u32 flags;
-};
-
-KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
-interrupt.shadow contains a valid state. Otherwise, this field is undefined.
-
-4.32 KVM_SET_VCPU_EVENTS
-
-Capability: KVM_CAP_VCPU_EVENTS
-Extended by: KVM_CAP_INTR_SHADOW
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_vcpu_event (in)
-Returns: 0 on success, -1 on error
-
-Set pending exceptions, interrupts, and NMIs as well as related states of the
-vcpu.
-
-See KVM_GET_VCPU_EVENTS for the data structure.
-
-Fields that may be modified asynchronously by running VCPUs can be excluded
-from the update. These fields are nmi.pending and sipi_vector. Keep the
-corresponding bits in the flags field cleared to suppress overwriting the
-current in-kernel state. The bits are:
-
-KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
-KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
-
-If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
-the flags field to signal that interrupt.shadow contains a valid state and
-shall be written into the VCPU.
-
-4.33 KVM_GET_DEBUGREGS
-
-Capability: KVM_CAP_DEBUGREGS
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_debugregs (out)
-Returns: 0 on success, -1 on error
-
-Reads debug registers from the vcpu.
-
-struct kvm_debugregs {
- __u64 db[4];
- __u64 dr6;
- __u64 dr7;
- __u64 flags;
- __u64 reserved[9];
-};
-
-4.34 KVM_SET_DEBUGREGS
-
-Capability: KVM_CAP_DEBUGREGS
-Architectures: x86
-Type: vm ioctl
-Parameters: struct kvm_debugregs (in)
-Returns: 0 on success, -1 on error
-
-Writes debug registers into the vcpu.
-
-See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
-yet and must be cleared on entry.
-
-4.35 KVM_SET_USER_MEMORY_REGION
-
-Capability: KVM_CAP_USER_MEM
-Architectures: all
-Type: vm ioctl
-Parameters: struct kvm_userspace_memory_region (in)
-Returns: 0 on success, -1 on error
-
-struct kvm_userspace_memory_region {
- __u32 slot;
- __u32 flags;
- __u64 guest_phys_addr;
- __u64 memory_size; /* bytes */
- __u64 userspace_addr; /* start of the userspace allocated memory */
-};
-
-/* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES 1UL
-
-This ioctl allows the user to create or modify a guest physical memory
-slot. When changing an existing slot, it may be moved in the guest
-physical memory space, or its flags may be modified. It may not be
-resized. Slots may not overlap in guest physical address space.
-
-Memory for the region is taken starting at the address denoted by the
-field userspace_addr, which must point at user addressable memory for
-the entire memory slot size. Any object may back this memory, including
-anonymous memory, ordinary files, and hugetlbfs.
-
-It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
-be identical. This allows large pages in the guest to be backed by large
-pages in the host.
-
-The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
-instructs kvm to keep track of writes to memory within the slot. See
-the KVM_GET_DIRTY_LOG ioctl.
-
-When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
-region are automatically reflected into the guest. For example, an mmap()
-that affects the region will be made visible immediately. Another example
-is madvise(MADV_DROP).
-
-It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
-The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
-allocation and is deprecated.
-
-4.36 KVM_SET_TSS_ADDR
-
-Capability: KVM_CAP_SET_TSS_ADDR
-Architectures: x86
-Type: vm ioctl
-Parameters: unsigned long tss_address (in)
-Returns: 0 on success, -1 on error
-
-This ioctl defines the physical address of a three-page region in the guest
-physical address space. The region must be within the first 4GB of the
-guest physical address space and must not conflict with any memory slot
-or any mmio address. The guest may malfunction if it accesses this memory
-region.
-
-This ioctl is required on Intel-based hosts. This is needed on Intel hardware
-because of a quirk in the virtualization implementation (see the internals
-documentation when it pops into existence).
-
-4.37 KVM_ENABLE_CAP
-
-Capability: KVM_CAP_ENABLE_CAP
-Architectures: ppc
-Type: vcpu ioctl
-Parameters: struct kvm_enable_cap (in)
-Returns: 0 on success; -1 on error
-
-+Not all extensions are enabled by default. Using this ioctl the application
-can enable an extension, making it available to the guest.
-
-On systems that do not support this ioctl, it always fails. On systems that
-do support it, it only works for extensions that are supported for enablement.
-
-To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
-be used.
-
-struct kvm_enable_cap {
- /* in */
- __u32 cap;
-
-The capability that is supposed to get enabled.
-
- __u32 flags;
-
-A bitfield indicating future enhancements. Has to be 0 for now.
-
- __u64 args[4];
-
-Arguments for enabling a feature. If a feature needs initial values to
-function properly, this is the place to put them.
-
- __u8 pad[64];
-};
-
-4.38 KVM_GET_MP_STATE
-
-Capability: KVM_CAP_MP_STATE
-Architectures: x86, ia64
-Type: vcpu ioctl
-Parameters: struct kvm_mp_state (out)
-Returns: 0 on success; -1 on error
-
-struct kvm_mp_state {
- __u32 mp_state;
-};
-
-Returns the vcpu's current "multiprocessing state" (though also valid on
-uniprocessor guests).
-
-Possible values are:
-
- - KVM_MP_STATE_RUNNABLE: the vcpu is currently running
- - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP)
- which has not yet received an INIT signal
- - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is
- now ready for a SIPI
- - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and
- is waiting for an interrupt
- - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector
- accessible via KVM_GET_VCPU_EVENTS)
-
-This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
-irqchip, the multiprocessing state must be maintained by userspace.
-
-4.39 KVM_SET_MP_STATE
-
-Capability: KVM_CAP_MP_STATE
-Architectures: x86, ia64
-Type: vcpu ioctl
-Parameters: struct kvm_mp_state (in)
-Returns: 0 on success; -1 on error
-
-Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
-arguments.
-
-This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
-irqchip, the multiprocessing state must be maintained by userspace.
-
-4.40 KVM_SET_IDENTITY_MAP_ADDR
-
-Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
-Architectures: x86
-Type: vm ioctl
-Parameters: unsigned long identity (in)
-Returns: 0 on success, -1 on error
-
-This ioctl defines the physical address of a one-page region in the guest
-physical address space. The region must be within the first 4GB of the
-guest physical address space and must not conflict with any memory slot
-or any mmio address. The guest may malfunction if it accesses this memory
-region.
-
-This ioctl is required on Intel-based hosts. This is needed on Intel hardware
-because of a quirk in the virtualization implementation (see the internals
-documentation when it pops into existence).
-
-4.41 KVM_SET_BOOT_CPU_ID
-
-Capability: KVM_CAP_SET_BOOT_CPU_ID
-Architectures: x86, ia64
-Type: vm ioctl
-Parameters: unsigned long vcpu_id
-Returns: 0 on success, -1 on error
-
-Define which vcpu is the Bootstrap Processor (BSP). Values are the same
-as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
-is vcpu 0.
-
-4.42 KVM_GET_XSAVE
-
-Capability: KVM_CAP_XSAVE
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_xsave (out)
-Returns: 0 on success, -1 on error
-
-struct kvm_xsave {
- __u32 region[1024];
-};
-
-This ioctl would copy current vcpu's xsave struct to the userspace.
-
-4.43 KVM_SET_XSAVE
-
-Capability: KVM_CAP_XSAVE
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_xsave (in)
-Returns: 0 on success, -1 on error
-
-struct kvm_xsave {
- __u32 region[1024];
-};
-
-This ioctl would copy userspace's xsave struct to the kernel.
-
-4.44 KVM_GET_XCRS
-
-Capability: KVM_CAP_XCRS
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_xcrs (out)
-Returns: 0 on success, -1 on error
-
-struct kvm_xcr {
- __u32 xcr;
- __u32 reserved;
- __u64 value;
-};
-
-struct kvm_xcrs {
- __u32 nr_xcrs;
- __u32 flags;
- struct kvm_xcr xcrs[KVM_MAX_XCRS];
- __u64 padding[16];
-};
-
-This ioctl would copy current vcpu's xcrs to the userspace.
-
-4.45 KVM_SET_XCRS
-
-Capability: KVM_CAP_XCRS
-Architectures: x86
-Type: vcpu ioctl
-Parameters: struct kvm_xcrs (in)
-Returns: 0 on success, -1 on error
-
-struct kvm_xcr {
- __u32 xcr;
- __u32 reserved;
- __u64 value;
-};
-
-struct kvm_xcrs {
- __u32 nr_xcrs;
- __u32 flags;
- struct kvm_xcr xcrs[KVM_MAX_XCRS];
- __u64 padding[16];
-};
-
-This ioctl would set vcpu's xcr to the value userspace specified.
-
-4.46 KVM_GET_SUPPORTED_CPUID
-
-Capability: KVM_CAP_EXT_CPUID
-Architectures: x86
-Type: system ioctl
-Parameters: struct kvm_cpuid2 (in/out)
-Returns: 0 on success, -1 on error
-
-struct kvm_cpuid2 {
- __u32 nent;
- __u32 padding;
- struct kvm_cpuid_entry2 entries[0];
-};
-
-#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
-#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
-#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
-
-struct kvm_cpuid_entry2 {
- __u32 function;
- __u32 index;
- __u32 flags;
- __u32 eax;
- __u32 ebx;
- __u32 ecx;
- __u32 edx;
- __u32 padding[3];
-};
-
-This ioctl returns x86 cpuid features which are supported by both the hardware
-and kvm. Userspace can use the information returned by this ioctl to
-construct cpuid information (for KVM_SET_CPUID2) that is consistent with
-hardware, kernel, and userspace capabilities, and with user requirements (for
-example, the user may wish to constrain cpuid to emulate older hardware,
-or for feature consistency across a cluster).
-
-Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
-with the 'nent' field indicating the number of entries in the variable-size
-array 'entries'. If the number of entries is too low to describe the cpu
-capabilities, an error (E2BIG) is returned. If the number is too high,
-the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
-number is just right, the 'nent' field is adjusted to the number of valid
-entries in the 'entries' array, which is then filled.
-
-The entries returned are the host cpuid as returned by the cpuid instruction,
-with unknown or unsupported features masked out. Some features (for example,
-x2apic), may not be present in the host cpu, but are exposed by kvm if it can
-emulate them efficiently. The fields in each entry are defined as follows:
-
- function: the eax value used to obtain the entry
- index: the ecx value used to obtain the entry (for entries that are
- affected by ecx)
- flags: an OR of zero or more of the following:
- KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
- if the index field is valid
- KVM_CPUID_FLAG_STATEFUL_FUNC:
- if cpuid for this function returns different values for successive
- invocations; there will be several entries with the same function,
- all with this flag set
- KVM_CPUID_FLAG_STATE_READ_NEXT:
- for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
- the first entry to be read by a cpu
- eax, ebx, ecx, edx: the values returned by the cpuid instruction for
- this function/index combination
-
-4.47 KVM_PPC_GET_PVINFO
-
-Capability: KVM_CAP_PPC_GET_PVINFO
-Architectures: ppc
-Type: vm ioctl
-Parameters: struct kvm_ppc_pvinfo (out)
-Returns: 0 on success, !0 on error
-
-struct kvm_ppc_pvinfo {
- __u32 flags;
- __u32 hcall[4];
- __u8 pad[108];
-};
-
-This ioctl fetches PV specific information that need to be passed to the guest
-using the device tree or other means from vm context.
-
-For now the only implemented piece of information distributed here is an array
-of 4 instructions that make up a hypercall.
-
-If any additional field gets added to this structure later on, a bit for that
-additional piece of information will be set in the flags bitmap.
-
-4.48 KVM_ASSIGN_PCI_DEVICE
-
-Capability: KVM_CAP_DEVICE_ASSIGNMENT
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_pci_dev (in)
-Returns: 0 on success, -1 on error
-
-Assigns a host PCI device to the VM.
-
-struct kvm_assigned_pci_dev {
- __u32 assigned_dev_id;
- __u32 busnr;
- __u32 devfn;
- __u32 flags;
- __u32 segnr;
- union {
- __u32 reserved[11];
- };
-};
-
-The PCI device is specified by the triple segnr, busnr, and devfn.
-Identification in succeeding service requests is done via assigned_dev_id. The
-following flags are specified:
-
-/* Depends on KVM_CAP_IOMMU */
-#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
-
-4.49 KVM_DEASSIGN_PCI_DEVICE
-
-Capability: KVM_CAP_DEVICE_DEASSIGNMENT
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_pci_dev (in)
-Returns: 0 on success, -1 on error
-
-Ends PCI device assignment, releasing all associated resources.
-
-See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is
-used in kvm_assigned_pci_dev to identify the device.
-
-4.50 KVM_ASSIGN_DEV_IRQ
-
-Capability: KVM_CAP_ASSIGN_DEV_IRQ
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_irq (in)
-Returns: 0 on success, -1 on error
-
-Assigns an IRQ to a passed-through device.
-
-struct kvm_assigned_irq {
- __u32 assigned_dev_id;
- __u32 host_irq;
- __u32 guest_irq;
- __u32 flags;
- union {
- struct {
- __u32 addr_lo;
- __u32 addr_hi;
- __u32 data;
- } guest_msi;
- __u32 reserved[12];
- };
-};
-
-The following flags are defined:
-
-#define KVM_DEV_IRQ_HOST_INTX (1 << 0)
-#define KVM_DEV_IRQ_HOST_MSI (1 << 1)
-#define KVM_DEV_IRQ_HOST_MSIX (1 << 2)
-
-#define KVM_DEV_IRQ_GUEST_INTX (1 << 8)
-#define KVM_DEV_IRQ_GUEST_MSI (1 << 9)
-#define KVM_DEV_IRQ_GUEST_MSIX (1 << 10)
-
-It is not valid to specify multiple types per host or guest IRQ. However, the
-IRQ type of host and guest can differ or can even be null.
-
-4.51 KVM_DEASSIGN_DEV_IRQ
-
-Capability: KVM_CAP_ASSIGN_DEV_IRQ
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_irq (in)
-Returns: 0 on success, -1 on error
-
-Ends an IRQ assignment to a passed-through device.
-
-See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
-by assigned_dev_id, flags must correspond to the IRQ type specified on
-KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
-
-4.52 KVM_SET_GSI_ROUTING
-
-Capability: KVM_CAP_IRQ_ROUTING
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_irq_routing (in)
-Returns: 0 on success, -1 on error
-
-Sets the GSI routing table entries, overwriting any previously set entries.
-
-struct kvm_irq_routing {
- __u32 nr;
- __u32 flags;
- struct kvm_irq_routing_entry entries[0];
-};
-
-No flags are specified so far, the corresponding field must be set to zero.
-
-struct kvm_irq_routing_entry {
- __u32 gsi;
- __u32 type;
- __u32 flags;
- __u32 pad;
- union {
- struct kvm_irq_routing_irqchip irqchip;
- struct kvm_irq_routing_msi msi;
- __u32 pad[8];
- } u;
-};
-
-/* gsi routing entry types */
-#define KVM_IRQ_ROUTING_IRQCHIP 1
-#define KVM_IRQ_ROUTING_MSI 2
-
-No flags are specified so far, the corresponding field must be set to zero.
-
-struct kvm_irq_routing_irqchip {
- __u32 irqchip;
- __u32 pin;
-};
-
-struct kvm_irq_routing_msi {
- __u32 address_lo;
- __u32 address_hi;
- __u32 data;
- __u32 pad;
-};
-
-4.53 KVM_ASSIGN_SET_MSIX_NR
-
-Capability: KVM_CAP_DEVICE_MSIX
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_msix_nr (in)
-Returns: 0 on success, -1 on error
-
-Set the number of MSI-X interrupts for an assigned device. This service can
-only be called once in the lifetime of an assigned device.
-
-struct kvm_assigned_msix_nr {
- __u32 assigned_dev_id;
- __u16 entry_nr;
- __u16 padding;
-};
-
-#define KVM_MAX_MSIX_PER_DEV 256
-
-4.54 KVM_ASSIGN_SET_MSIX_ENTRY
-
-Capability: KVM_CAP_DEVICE_MSIX
-Architectures: x86 ia64
-Type: vm ioctl
-Parameters: struct kvm_assigned_msix_entry (in)
-Returns: 0 on success, -1 on error
-
-Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
-the GSI vector to zero means disabling the interrupt.
-
-struct kvm_assigned_msix_entry {
- __u32 assigned_dev_id;
- __u32 gsi;
- __u16 entry; /* The index of entry in the MSI-X table */
- __u16 padding[3];
-};
-
-5. The kvm_run structure
-
-Application code obtains a pointer to the kvm_run structure by
-mmap()ing a vcpu fd. From that point, application code can control
-execution by changing fields in kvm_run prior to calling the KVM_RUN
-ioctl, and obtain information about the reason KVM_RUN returned by
-looking up structure members.
-
-struct kvm_run {
- /* in */
- __u8 request_interrupt_window;
-
-Request that KVM_RUN return when it becomes possible to inject external
-interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
-
- __u8 padding1[7];
-
- /* out */
- __u32 exit_reason;
-
-When KVM_RUN has returned successfully (return value 0), this informs
-application code why KVM_RUN has returned. Allowable values for this
-field are detailed below.
-
- __u8 ready_for_interrupt_injection;
-
-If request_interrupt_window has been specified, this field indicates
-an interrupt can be injected now with KVM_INTERRUPT.
-
- __u8 if_flag;
-
-The value of the current interrupt flag. Only valid if in-kernel
-local APIC is not used.
-
- __u8 padding2[2];
-
- /* in (pre_kvm_run), out (post_kvm_run) */
- __u64 cr8;
-
-The value of the cr8 register. Only valid if in-kernel local APIC is
-not used. Both input and output.
-
- __u64 apic_base;
-
-The value of the APIC BASE msr. Only valid if in-kernel local
-APIC is not used. Both input and output.
-
- union {
- /* KVM_EXIT_UNKNOWN */
- struct {
- __u64 hardware_exit_reason;
- } hw;
-
-If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
-reasons. Further architecture-specific information is available in
-hardware_exit_reason.
-
- /* KVM_EXIT_FAIL_ENTRY */
- struct {
- __u64 hardware_entry_failure_reason;
- } fail_entry;
-
-If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
-to unknown reasons. Further architecture-specific information is
-available in hardware_entry_failure_reason.
-
- /* KVM_EXIT_EXCEPTION */
- struct {
- __u32 exception;
- __u32 error_code;
- } ex;
-
-Unused.
-
- /* KVM_EXIT_IO */
- struct {
-#define KVM_EXIT_IO_IN 0
-#define KVM_EXIT_IO_OUT 1
- __u8 direction;
- __u8 size; /* bytes */
- __u16 port;
- __u32 count;
- __u64 data_offset; /* relative to kvm_run start */
- } io;
-
-If exit_reason is KVM_EXIT_IO, then the vcpu has
-executed a port I/O instruction which could not be satisfied by kvm.
-data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
-where kvm expects application code to place the data for the next
-KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
-
- struct {
- struct kvm_debug_exit_arch arch;
- } debug;
-
-Unused.
-
- /* KVM_EXIT_MMIO */
- struct {
- __u64 phys_addr;
- __u8 data[8];
- __u32 len;
- __u8 is_write;
- } mmio;
-
-If exit_reason is KVM_EXIT_MMIO, then the vcpu has
-executed a memory-mapped I/O instruction which could not be satisfied
-by kvm. The 'data' member contains the written data if 'is_write' is
-true, and should be filled by application code otherwise.
-
-NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
-operations are complete (and guest state is consistent) only after userspace
-has re-entered the kernel with KVM_RUN. The kernel side will first finish
-incomplete operations and then check for pending signals. Userspace
-can re-enter the guest with an unmasked signal pending to complete
-pending operations.
-
- /* KVM_EXIT_HYPERCALL */
- struct {
- __u64 nr;
- __u64 args[6];
- __u64 ret;
- __u32 longmode;
- __u32 pad;
- } hypercall;
-
-Unused. This was once used for 'hypercall to userspace'. To implement
-such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
-Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
-
- /* KVM_EXIT_TPR_ACCESS */
- struct {
- __u64 rip;
- __u32 is_write;
- __u32 pad;
- } tpr_access;
-
-To be documented (KVM_TPR_ACCESS_REPORTING).
-
- /* KVM_EXIT_S390_SIEIC */
- struct {
- __u8 icptcode;
- __u64 mask; /* psw upper half */
- __u64 addr; /* psw lower half */
- __u16 ipa;
- __u32 ipb;
- } s390_sieic;
-
-s390 specific.
-
- /* KVM_EXIT_S390_RESET */
-#define KVM_S390_RESET_POR 1
-#define KVM_S390_RESET_CLEAR 2
-#define KVM_S390_RESET_SUBSYSTEM 4
-#define KVM_S390_RESET_CPU_INIT 8
-#define KVM_S390_RESET_IPL 16
- __u64 s390_reset_flags;
-
-s390 specific.
-
- /* KVM_EXIT_DCR */
- struct {
- __u32 dcrn;
- __u32 data;
- __u8 is_write;
- } dcr;
-
-powerpc specific.
-
- /* KVM_EXIT_OSI */
- struct {
- __u64 gprs[32];
- } osi;
-
-MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
-hypercalls and exit with this exit struct that contains all the guest gprs.
-
-If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
-Userspace can now handle the hypercall and when it's done modify the gprs as
-necessary. Upon guest entry all guest GPRs will then be replaced by the values
-in this struct.
-
- /* Fix the size of the union. */
- char padding[256];
- };
-};
+++ /dev/null
-KVM CPUID bits
-Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
-=====================================================
-
-A guest running on a kvm host, can check some of its features using
-cpuid. This is not always guaranteed to work, since userspace can
-mask-out some, or even all KVM-related cpuid features before launching
-a guest.
-
-KVM cpuid functions are:
-
-function: KVM_CPUID_SIGNATURE (0x40000000)
-returns : eax = 0,
- ebx = 0x4b4d564b,
- ecx = 0x564b4d56,
- edx = 0x4d.
-Note that this value in ebx, ecx and edx corresponds to the string "KVMKVMKVM".
-This function queries the presence of KVM cpuid leafs.
-
-
-function: define KVM_CPUID_FEATURES (0x40000001)
-returns : ebx, ecx, edx = 0
- eax = and OR'ed group of (1 << flag), where each flags is:
-
-
-flag || value || meaning
-=============================================================================
-KVM_FEATURE_CLOCKSOURCE || 0 || kvmclock available at msrs
- || || 0x11 and 0x12.
-------------------------------------------------------------------------------
-KVM_FEATURE_NOP_IO_DELAY || 1 || not necessary to perform delays
- || || on PIO operations.
-------------------------------------------------------------------------------
-KVM_FEATURE_MMU_OP || 2 || deprecated.
-------------------------------------------------------------------------------
-KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
- || || 0x4b564d00 and 0x4b564d01
-------------------------------------------------------------------------------
-KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by
- || || writing to msr 0x4b564d02
-------------------------------------------------------------------------------
-KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
- || || per-cpu warps are expected in
- || || kvmclock.
-------------------------------------------------------------------------------
+++ /dev/null
-KVM Lock Overview
-=================
-
-1. Acquisition Orders
----------------------
-
-(to be written)
-
-2. Reference
-------------
-
-Name: kvm_lock
-Type: raw_spinlock
-Arch: any
-Protects: - vm_list
- - hardware virtualization enable/disable
-Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
- migration.
-
-Name: kvm_arch::tsc_write_lock
-Type: raw_spinlock
-Arch: x86
-Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
- - tsc offset in vmcb
-Comment: 'raw' because updating the tsc offsets must not be preempted.
+++ /dev/null
-The x86 kvm shadow mmu
-======================
-
-The mmu (in arch/x86/kvm, files mmu.[ch] and paging_tmpl.h) is responsible
-for presenting a standard x86 mmu to the guest, while translating guest
-physical addresses to host physical addresses.
-
-The mmu code attempts to satisfy the following requirements:
-
-- correctness: the guest should not be able to determine that it is running
- on an emulated mmu except for timing (we attempt to comply
- with the specification, not emulate the characteristics of
- a particular implementation such as tlb size)
-- security: the guest must not be able to touch host memory not assigned
- to it
-- performance: minimize the performance penalty imposed by the mmu
-- scaling: need to scale to large memory and large vcpu guests
-- hardware: support the full range of x86 virtualization hardware
-- integration: Linux memory management code must be in control of guest memory
- so that swapping, page migration, page merging, transparent
- hugepages, and similar features work without change
-- dirty tracking: report writes to guest memory to enable live migration
- and framebuffer-based displays
-- footprint: keep the amount of pinned kernel memory low (most memory
- should be shrinkable)
-- reliability: avoid multipage or GFP_ATOMIC allocations
-
-Acronyms
-========
-
-pfn host page frame number
-hpa host physical address
-hva host virtual address
-gfn guest frame number
-gpa guest physical address
-gva guest virtual address
-ngpa nested guest physical address
-ngva nested guest virtual address
-pte page table entry (used also to refer generically to paging structure
- entries)
-gpte guest pte (referring to gfns)
-spte shadow pte (referring to pfns)
-tdp two dimensional paging (vendor neutral term for NPT and EPT)
-
-Virtual and real hardware supported
-===================================
-
-The mmu supports first-generation mmu hardware, which allows an atomic switch
-of the current paging mode and cr3 during guest entry, as well as
-two-dimensional paging (AMD's NPT and Intel's EPT). The emulated hardware
-it exposes is the traditional 2/3/4 level x86 mmu, with support for global
-pages, pae, pse, pse36, cr0.wp, and 1GB pages. Work is in progress to support
-exposing NPT capable hardware on NPT capable hosts.
-
-Translation
-===========
-
-The primary job of the mmu is to program the processor's mmu to translate
-addresses for the guest. Different translations are required at different
-times:
-
-- when guest paging is disabled, we translate guest physical addresses to
- host physical addresses (gpa->hpa)
-- when guest paging is enabled, we translate guest virtual addresses, to
- guest physical addresses, to host physical addresses (gva->gpa->hpa)
-- when the guest launches a guest of its own, we translate nested guest
- virtual addresses, to nested guest physical addresses, to guest physical
- addresses, to host physical addresses (ngva->ngpa->gpa->hpa)
-
-The primary challenge is to encode between 1 and 3 translations into hardware
-that support only 1 (traditional) and 2 (tdp) translations. When the
-number of required translations matches the hardware, the mmu operates in
-direct mode; otherwise it operates in shadow mode (see below).
-
-Memory
-======
-
-Guest memory (gpa) is part of the user address space of the process that is
-using kvm. Userspace defines the translation between guest addresses and user
-addresses (gpa->hva); note that two gpas may alias to the same hva, but not
-vice versa.
-
-These hvas may be backed using any method available to the host: anonymous
-memory, file backed memory, and device memory. Memory might be paged by the
-host at any time.
-
-Events
-======
-
-The mmu is driven by events, some from the guest, some from the host.
-
-Guest generated events:
-- writes to control registers (especially cr3)
-- invlpg/invlpga instruction execution
-- access to missing or protected translations
-
-Host generated events:
-- changes in the gpa->hpa translation (either through gpa->hva changes or
- through hva->hpa changes)
-- memory pressure (the shrinker)
-
-Shadow pages
-============
-
-The principal data structure is the shadow page, 'struct kvm_mmu_page'. A
-shadow page contains 512 sptes, which can be either leaf or nonleaf sptes. A
-shadow page may contain a mix of leaf and nonleaf sptes.
-
-A nonleaf spte allows the hardware mmu to reach the leaf pages and
-is not related to a translation directly. It points to other shadow pages.
-
-A leaf spte corresponds to either one or two translations encoded into
-one paging structure entry. These are always the lowest level of the
-translation stack, with optional higher level translations left to NPT/EPT.
-Leaf ptes point at guest pages.
-
-The following table shows translations encoded by leaf ptes, with higher-level
-translations in parentheses:
-
- Non-nested guests:
- nonpaging: gpa->hpa
- paging: gva->gpa->hpa
- paging, tdp: (gva->)gpa->hpa
- Nested guests:
- non-tdp: ngva->gpa->hpa (*)
- tdp: (ngva->)ngpa->gpa->hpa
-
-(*) the guest hypervisor will encode the ngva->gpa translation into its page
- tables if npt is not present
-
-Shadow pages contain the following information:
- role.level:
- The level in the shadow paging hierarchy that this shadow page belongs to.
- 1=4k sptes, 2=2M sptes, 3=1G sptes, etc.
- role.direct:
- If set, leaf sptes reachable from this page are for a linear range.
- Examples include real mode translation, large guest pages backed by small
- host pages, and gpa->hpa translations when NPT or EPT is active.
- The linear range starts at (gfn << PAGE_SHIFT) and its size is determined
- by role.level (2MB for first level, 1GB for second level, 0.5TB for third
- level, 256TB for fourth level)
- If clear, this page corresponds to a guest page table denoted by the gfn
- field.
- role.quadrant:
- When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit
- sptes. That means a guest page table contains more ptes than the host,
- so multiple shadow pages are needed to shadow one guest page.
- For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the
- first or second 512-gpte block in the guest page table. For second-level
- page tables, each 32-bit gpte is converted to two 64-bit sptes
- (since each first-level guest page is shadowed by two first-level
- shadow pages) so role.quadrant takes values in the range 0..3. Each
- quadrant maps 1GB virtual address space.
- role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
- role.invalid:
- The page is invalid and should not be used. It is a root page that is
- currently pinned (by a cpu hardware register pointing to it); once it is
- unpinned it will be destroyed.
- role.cr4_pae:
- Contains the value of cr4.pae for which the page is valid (e.g. whether
- 32-bit or 64-bit gptes are in use).
- role.nxe:
- Contains the value of efer.nxe for which the page is valid.
- role.cr0_wp:
- Contains the value of cr0.wp for which the page is valid.
- gfn:
- Either the guest page table containing the translations shadowed by this
- page, or the base page frame for linear translations. See role.direct.
- spt:
- A pageful of 64-bit sptes containing the translations for this page.
- Accessed by both kvm and hardware.
- The page pointed to by spt will have its page->private pointing back
- at the shadow page structure.
- sptes in spt point either at guest pages, or at lower-level shadow pages.
- Specifically, if sp1 and sp2 are shadow pages, then sp1->spt[n] may point
- at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
- The spt array forms a DAG structure with the shadow page as a node, and
- guest pages as leaves.
- gfns:
- An array of 512 guest frame numbers, one for each present pte. Used to
- perform a reverse map from a pte to a gfn. When role.direct is set, any
- element of this array can be calculated from the gfn field when used, in
- this case, the array of gfns is not allocated. See role.direct and gfn.
- slot_bitmap:
- A bitmap containing one bit per memory slot. If the page contains a pte
- mapping a page from memory slot n, then bit n of slot_bitmap will be set
- (if a page is aliased among several slots, then it is not guaranteed that
- all slots will be marked).
- Used during dirty logging to avoid scanning a shadow page if none if its
- pages need tracking.
- root_count:
- A counter keeping track of how many hardware registers (guest cr3 or
- pdptrs) are now pointing at the page. While this counter is nonzero, the
- page cannot be destroyed. See role.invalid.
- multimapped:
- Whether there exist multiple sptes pointing at this page.
- parent_pte/parent_ptes:
- If multimapped is zero, parent_pte points at the single spte that points at
- this page's spt. Otherwise, parent_ptes points at a data structure
- with a list of parent_ptes.
- unsync:
- If true, then the translations in this page may not match the guest's
- translation. This is equivalent to the state of the tlb when a pte is
- changed but before the tlb entry is flushed. Accordingly, unsync ptes
- are synchronized when the guest executes invlpg or flushes its tlb by
- other means. Valid for leaf pages.
- unsync_children:
- How many sptes in the page point at pages that are unsync (or have
- unsynchronized children).
- unsync_child_bitmap:
- A bitmap indicating which sptes in spt point (directly or indirectly) at
- pages that may be unsynchronized. Used to quickly locate all unsychronized
- pages reachable from a given page.
-
-Reverse map
-===========
-
-The mmu maintains a reverse mapping whereby all ptes mapping a page can be
-reached given its gfn. This is used, for example, when swapping out a page.
-
-Synchronized and unsynchronized pages
-=====================================
-
-The guest uses two events to synchronize its tlb and page tables: tlb flushes
-and page invalidations (invlpg).
-
-A tlb flush means that we need to synchronize all sptes reachable from the
-guest's cr3. This is expensive, so we keep all guest page tables write
-protected, and synchronize sptes to gptes when a gpte is written.
-
-A special case is when a guest page table is reachable from the current
-guest cr3. In this case, the guest is obliged to issue an invlpg instruction
-before using the translation. We take advantage of that by removing write
-protection from the guest page, and allowing the guest to modify it freely.
-We synchronize modified gptes when the guest invokes invlpg. This reduces
-the amount of emulation we have to do when the guest modifies multiple gptes,
-or when the a guest page is no longer used as a page table and is used for
-random guest data.
-
-As a side effect we have to resynchronize all reachable unsynchronized shadow
-pages on a tlb flush.
-
-
-Reaction to events
-==================
-
-- guest page fault (or npt page fault, or ept violation)
-
-This is the most complicated event. The cause of a page fault can be:
-
- - a true guest fault (the guest translation won't allow the access) (*)
- - access to a missing translation
- - access to a protected translation
- - when logging dirty pages, memory is write protected
- - synchronized shadow pages are write protected (*)
- - access to untranslatable memory (mmio)
-
- (*) not applicable in direct mode
-
-Handling a page fault is performed as follows:
-
- - if needed, walk the guest page tables to determine the guest translation
- (gva->gpa or ngpa->gpa)
- - if permissions are insufficient, reflect the fault back to the guest
- - determine the host page
- - if this is an mmio request, there is no host page; call the emulator
- to emulate the instruction instead
- - walk the shadow page table to find the spte for the translation,
- instantiating missing intermediate page tables as necessary
- - try to unsynchronize the page
- - if successful, we can let the guest continue and modify the gpte
- - emulate the instruction
- - if failed, unshadow the page and let the guest continue
- - update any translations that were modified by the instruction
-
-invlpg handling:
-
- - walk the shadow page hierarchy and drop affected translations
- - try to reinstantiate the indicated translation in the hope that the
- guest will use it in the near future
-
-Guest control register updates:
-
-- mov to cr3
- - look up new shadow roots
- - synchronize newly reachable shadow pages
-
-- mov to cr0/cr4/efer
- - set up mmu context for new paging mode
- - look up new shadow roots
- - synchronize newly reachable shadow pages
-
-Host translation updates:
-
- - mmu notifier called with updated hva
- - look up affected sptes through reverse map
- - drop (or update) translations
-
-Emulating cr0.wp
-================
-
-If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
-works for the guest kernel, not guest guest userspace. When the guest
-cr0.wp=1, this does not present a problem. However when the guest cr0.wp=0,
-we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte (the
-semantics require allowing any guest kernel access plus user read access).
-
-We handle this by mapping the permissions to two possible sptes, depending
-on fault type:
-
-- kernel write fault: spte.u=0, spte.w=1 (allows full kernel access,
- disallows user access)
-- read fault: spte.u=1, spte.w=0 (allows full read access, disallows kernel
- write access)
-
-(user write faults generate a #PF)
-
-Large pages
-===========
-
-The mmu supports all combinations of large and small guest and host pages.
-Supported page sizes include 4k, 2M, 4M, and 1G. 4M pages are treated as
-two separate 2M pages, on both guest and host, since the mmu always uses PAE
-paging.
-
-To instantiate a large spte, four constraints must be satisfied:
-
-- the spte must point to a large host page
-- the guest pte must be a large pte of at least equivalent size (if tdp is
- enabled, there is no guest pte and this condition is satisified)
-- if the spte will be writeable, the large page frame may not overlap any
- write-protected pages
-- the guest page must be wholly contained by a single memory slot
-
-To check the last two conditions, the mmu maintains a ->write_count set of
-arrays for each memory slot and large page size. Every write protected page
-causes its write_count to be incremented, thus preventing instantiation of
-a large spte. The frames at the end of an unaligned memory slot have
-artificically inflated ->write_counts so they can never be instantiated.
-
-Further reading
-===============
-
-- NPT presentation from KVM Forum 2008
- http://www.linux-kvm.org/wiki/images/c/c8/KvmForum2008%24kdf2008_21.pdf
-
+++ /dev/null
-KVM-specific MSRs.
-Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
-=====================================================
-
-KVM makes use of some custom MSRs to service some requests.
-
-Custom MSRs have a range reserved for them, that goes from
-0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
-but they are deprecated and their use is discouraged.
-
-Custom MSR list
---------
-
-The current supported Custom MSR list is:
-
-MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00
-
- data: 4-byte alignment physical address of a memory area which must be
- in guest RAM. This memory is expected to hold a copy of the following
- structure:
-
- struct pvclock_wall_clock {
- u32 version;
- u32 sec;
- u32 nsec;
- } __attribute__((__packed__));
-
- whose data will be filled in by the hypervisor. The hypervisor is only
- guaranteed to update this data at the moment of MSR write.
- Users that want to reliably query this information more than once have
- to write more than once to this MSR. Fields have the following meanings:
-
- version: guest has to check version before and after grabbing
- time information and check that they are both equal and even.
- An odd version indicates an in-progress update.
-
- sec: number of seconds for wallclock.
-
- nsec: number of nanoseconds for wallclock.
-
- Note that although MSRs are per-CPU entities, the effect of this
- particular MSR is global.
-
- Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
- leaf prior to usage.
-
-MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01
-
- data: 4-byte aligned physical address of a memory area which must be in
- guest RAM, plus an enable bit in bit 0. This memory is expected to hold
- a copy of the following structure:
-
- struct pvclock_vcpu_time_info {
- u32 version;
- u32 pad0;
- u64 tsc_timestamp;
- u64 system_time;
- u32 tsc_to_system_mul;
- s8 tsc_shift;
- u8 flags;
- u8 pad[2];
- } __attribute__((__packed__)); /* 32 bytes */
-
- whose data will be filled in by the hypervisor periodically. Only one
- write, or registration, is needed for each VCPU. The interval between
- updates of this structure is arbitrary and implementation-dependent.
- The hypervisor may update this structure at any time it sees fit until
- anything with bit0 == 0 is written to it.
-
- Fields have the following meanings:
-
- version: guest has to check version before and after grabbing
- time information and check that they are both equal and even.
- An odd version indicates an in-progress update.
-
- tsc_timestamp: the tsc value at the current VCPU at the time
- of the update of this structure. Guests can subtract this value
- from current tsc to derive a notion of elapsed time since the
- structure update.
-
- system_time: a host notion of monotonic time, including sleep
- time at the time this structure was last updated. Unit is
- nanoseconds.
-
- tsc_to_system_mul: a function of the tsc frequency. One has
- to multiply any tsc-related quantity by this value to get
- a value in nanoseconds, besides dividing by 2^tsc_shift
-
- tsc_shift: cycle to nanosecond divider, as a power of two, to
- allow for shift rights. One has to shift right any tsc-related
- quantity by this value to get a value in nanoseconds, besides
- multiplying by tsc_to_system_mul.
-
- With this information, guests can derive per-CPU time by
- doing:
-
- time = (current_tsc - tsc_timestamp)
- time = (time * tsc_to_system_mul) >> tsc_shift
- time = time + system_time
-
- flags: bits in this field indicate extended capabilities
- coordinated between the guest and the hypervisor. Availability
- of specific flags has to be checked in 0x40000001 cpuid leaf.
- Current flags are:
-
- flag bit | cpuid bit | meaning
- -------------------------------------------------------------
- | | time measures taken across
- 0 | 24 | multiple cpus are guaranteed to
- | | be monotonic
- -------------------------------------------------------------
-
- Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
- leaf prior to usage.
-
-
-MSR_KVM_WALL_CLOCK: 0x11
-
- data and functioning: same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.
-
- This MSR falls outside the reserved KVM range and may be removed in the
- future. Its usage is deprecated.
-
- Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
- leaf prior to usage.
-
-MSR_KVM_SYSTEM_TIME: 0x12
-
- data and functioning: same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.
-
- This MSR falls outside the reserved KVM range and may be removed in the
- future. Its usage is deprecated.
-
- Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
- leaf prior to usage.
-
- The suggested algorithm for detecting kvmclock presence is then:
-
- if (!kvm_para_available()) /* refer to cpuid.txt */
- return NON_PRESENT;
-
- flags = cpuid_eax(0x40000001);
- if (flags & 3) {
- msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
- msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
- return PRESENT;
- } else if (flags & 0) {
- msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
- msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
- return PRESENT;
- } else
- return NON_PRESENT;
-
-MSR_KVM_ASYNC_PF_EN: 0x4b564d02
- data: Bits 63-6 hold 64-byte aligned physical address of a
- 64 byte memory area which must be in guest RAM and must be
- zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
- when asynchronous page faults are enabled on the vcpu 0 when
- disabled. Bit 2 is 1 if asynchronous page faults can be injected
- when vcpu is in cpl == 0.
-
- First 4 byte of 64 byte memory location will be written to by
- the hypervisor at the time of asynchronous page fault (APF)
- injection to indicate type of asynchronous page fault. Value
- of 1 means that the page referred to by the page fault is not
- present. Value 2 means that the page is now available. Disabling
- interrupt inhibits APFs. Guest must not enable interrupt
- before the reason is read, or it may be overwritten by another
- APF. Since APF uses the same exception vector as regular page
- fault guest must reset the reason to 0 before it does
- something that can generate normal page fault. If during page
- fault APF reason is 0 it means that this is regular page
- fault.
-
- During delivery of type 1 APF cr2 contains a token that will
- be used to notify a guest when missing page becomes
- available. When page becomes available type 2 APF is sent with
- cr2 set to the token associated with the page. There is special
- kind of token 0xffffffff which tells vcpu that it should wake
- up all processes waiting for APFs and no individual type 2 APFs
- will be sent.
-
- If APF is disabled while there are outstanding APFs, they will
- not be delivered.
-
- Currently type 2 APF will be always delivered on the same vcpu as
- type 1 was, but guest should not rely on that.
+++ /dev/null
-The PPC KVM paravirtual interface
-=================================
-
-The basic execution principle by which KVM on PowerPC works is to run all kernel
-space code in PR=1 which is user space. This way we trap all privileged
-instructions and can emulate them accordingly.
-
-Unfortunately that is also the downfall. There are quite some privileged
-instructions that needlessly return us to the hypervisor even though they
-could be handled differently.
-
-This is what the PPC PV interface helps with. It takes privileged instructions
-and transforms them into unprivileged ones with some help from the hypervisor.
-This cuts down virtualization costs by about 50% on some of my benchmarks.
-
-The code for that interface can be found in arch/powerpc/kernel/kvm*
-
-Querying for existence
-======================
-
-To find out if we're running on KVM or not, we leverage the device tree. When
-Linux is running on KVM, a node /hypervisor exists. That node contains a
-compatible property with the value "linux,kvm".
-
-Once you determined you're running under a PV capable KVM, you can now use
-hypercalls as described below.
-
-KVM hypercalls
-==============
-
-Inside the device tree's /hypervisor node there's a property called
-'hypercall-instructions'. This property contains at most 4 opcodes that make
-up the hypercall. To call a hypercall, just call these instructions.
-
-The parameters are as follows:
-
- Register IN OUT
-
- r0 - volatile
- r3 1st parameter Return code
- r4 2nd parameter 1st output value
- r5 3rd parameter 2nd output value
- r6 4th parameter 3rd output value
- r7 5th parameter 4th output value
- r8 6th parameter 5th output value
- r9 7th parameter 6th output value
- r10 8th parameter 7th output value
- r11 hypercall number 8th output value
- r12 - volatile
-
-Hypercall definitions are shared in generic code, so the same hypercall numbers
-apply for x86 and powerpc alike with the exception that each KVM hypercall
-also needs to be ORed with the KVM vendor code which is (42 << 16).
-
-Return codes can be as follows:
-
- Code Meaning
-
- 0 Success
- 12 Hypercall not implemented
- <0 Error
-
-The magic page
-==============
-
-To enable communication between the hypervisor and guest there is a new shared
-page that contains parts of supervisor visible register state. The guest can
-map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
-
-With this hypercall issued the guest always gets the magic page mapped at the
-desired location in effective and physical address space. For now, we always
-map the page to -4096. This way we can access it using absolute load and store
-functions. The following instruction reads the first field of the magic page:
-
- ld rX, -4096(0)
-
-The interface is designed to be extensible should there be need later to add
-additional registers to the magic page. If you add fields to the magic page,
-also define a new hypercall feature to indicate that the host can give you more
-registers. Only if the host supports the additional features, make use of them.
-
-The magic page has the following layout as described in
-arch/powerpc/include/asm/kvm_para.h:
-
-struct kvm_vcpu_arch_shared {
- __u64 scratch1;
- __u64 scratch2;
- __u64 scratch3;
- __u64 critical; /* Guest may not get interrupts if == r1 */
- __u64 sprg0;
- __u64 sprg1;
- __u64 sprg2;
- __u64 sprg3;
- __u64 srr0;
- __u64 srr1;
- __u64 dar;
- __u64 msr;
- __u32 dsisr;
- __u32 int_pending; /* Tells the guest if we have an interrupt */
-};
-
-Additions to the page must only occur at the end. Struct fields are always 32
-or 64 bit aligned, depending on them being 32 or 64 bit wide respectively.
-
-Magic page features
-===================
-
-When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
-a second return value is passed to the guest. This second return value contains
-a bitmap of available features inside the magic page.
-
-The following enhancements to the magic page are currently available:
-
- KVM_MAGIC_FEAT_SR Maps SR registers r/w in the magic page
-
-For enhanced features in the magic page, please check for the existence of the
-feature before using them!
-
-MSR bits
-========
-
-The MSR contains bits that require hypervisor intervention and bits that do
-not require direct hypervisor intervention because they only get interpreted
-when entering the guest or don't have any impact on the hypervisor's behavior.
-
-The following bits are safe to be set inside the guest:
-
- MSR_EE
- MSR_RI
- MSR_CR
- MSR_ME
-
-If any other bit changes in the MSR, please still use mtmsr(d).
-
-Patched instructions
-====================
-
-The "ld" and "std" instructions are transormed to "lwz" and "stw" instructions
-respectively on 32 bit systems with an added offset of 4 to accommodate for big
-endianness.
-
-The following is a list of mapping the Linux kernel performs when running as
-guest. Implementing any of those mappings is optional, as the instruction traps
-also act on the shared page. So calling privileged instructions still works as
-before.
-
-From To
-==== ==
-
-mfmsr rX ld rX, magic_page->msr
-mfsprg rX, 0 ld rX, magic_page->sprg0
-mfsprg rX, 1 ld rX, magic_page->sprg1
-mfsprg rX, 2 ld rX, magic_page->sprg2
-mfsprg rX, 3 ld rX, magic_page->sprg3
-mfsrr0 rX ld rX, magic_page->srr0
-mfsrr1 rX ld rX, magic_page->srr1
-mfdar rX ld rX, magic_page->dar
-mfdsisr rX lwz rX, magic_page->dsisr
-
-mtmsr rX std rX, magic_page->msr
-mtsprg 0, rX std rX, magic_page->sprg0
-mtsprg 1, rX std rX, magic_page->sprg1
-mtsprg 2, rX std rX, magic_page->sprg2
-mtsprg 3, rX std rX, magic_page->sprg3
-mtsrr0 rX std rX, magic_page->srr0
-mtsrr1 rX std rX, magic_page->srr1
-mtdar rX std rX, magic_page->dar
-mtdsisr rX stw rX, magic_page->dsisr
-
-tlbsync nop
-
-mtmsrd rX, 0 b <special mtmsr section>
-mtmsr rX b <special mtmsr section>
-
-mtmsrd rX, 1 b <special mtmsrd section>
-
-[Book3S only]
-mtsrin rX, rY b <special mtsrin section>
-
-[BookE only]
-wrteei [0|1] b <special wrteei section>
-
-
-Some instructions require more logic to determine what's going on than a load
-or store instruction can deliver. To enable patching of those, we keep some
-RAM around where we can live translate instructions to. What happens is the
-following:
-
- 1) copy emulation code to memory
- 2) patch that code to fit the emulated instruction
- 3) patch that code to return to the original pc + 4
- 4) patch the original instruction to branch to the new code
-
-That way we can inject an arbitrary amount of code as replacement for a single
-instruction. This allows us to check for pending interrupts when setting EE=1
-for example.
+++ /dev/null
-Review checklist for kvm patches
-================================
-
-1. The patch must follow Documentation/CodingStyle and
- Documentation/SubmittingPatches.
-
-2. Patches should be against kvm.git master branch.
-
-3. If the patch introduces or modifies a new userspace API:
- - the API must be documented in Documentation/kvm/api.txt
- - the API must be discoverable using KVM_CHECK_EXTENSION
-
-4. New state must include support for save/restore.
-
-5. New features must default to off (userspace should explicitly request them).
- Performance improvements can and should default to on.
-
-6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2
-
-7. Emulator changes should be accompanied by unit tests for qemu-kvm.git
- kvm/test directory.
-
-8. Changes should be vendor neutral when possible. Changes to common code
- are better than duplicating changes to vendor code.
-
-9. Similarly, prefer changes to arch independent code than to arch dependent
- code.
-
-10. User/kernel interfaces and guest/host interfaces must be 64-bit clean
- (all variables and sizes naturally aligned on 64-bit; use specific types
- only - u64 rather than ulong).
-
-11. New guest visible features must either be documented in a hardware manual
- or be accompanied by documentation.
-
-12. Features must be robust against reset and kexec - for example, shared
- host/guest memory must be unshared to prevent the host from writing to
- guest memory that the guest has not reserved for this purpose.
+++ /dev/null
-
- Timekeeping Virtualization for X86-Based Architectures
-
- Zachary Amsden <zamsden@redhat.com>
- Copyright (c) 2010, Red Hat. All rights reserved.
-
-1) Overview
-2) Timing Devices
-3) TSC Hardware
-4) Virtualization Problems
-
-=========================================================================
-
-1) Overview
-
-One of the most complicated parts of the X86 platform, and specifically,
-the virtualization of this platform is the plethora of timing devices available
-and the complexity of emulating those devices. In addition, virtualization of
-time introduces a new set of challenges because it introduces a multiplexed
-division of time beyond the control of the guest CPU.
-
-First, we will describe the various timekeeping hardware available, then
-present some of the problems which arise and solutions available, giving
-specific recommendations for certain classes of KVM guests.
-
-The purpose of this document is to collect data and information relevant to
-timekeeping which may be difficult to find elsewhere, specifically,
-information relevant to KVM and hardware-based virtualization.
-
-=========================================================================
-
-2) Timing Devices
-
-First we discuss the basic hardware devices available. TSC and the related
-KVM clock are special enough to warrant a full exposition and are described in
-the following section.
-
-2.1) i8254 - PIT
-
-One of the first timer devices available is the programmable interrupt timer,
-or PIT. The PIT has a fixed frequency 1.193182 MHz base clock and three
-channels which can be programmed to deliver periodic or one-shot interrupts.
-These three channels can be configured in different modes and have individual
-counters. Channel 1 and 2 were not available for general use in the original
-IBM PC, and historically were connected to control RAM refresh and the PC
-speaker. Now the PIT is typically integrated as part of an emulated chipset
-and a separate physical PIT is not used.
-
-The PIT uses I/O ports 0x40 - 0x43. Access to the 16-bit counters is done
-using single or multiple byte access to the I/O ports. There are 6 modes
-available, but not all modes are available to all timers, as only timer 2
-has a connected gate input, required for modes 1 and 5. The gate line is
-controlled by port 61h, bit 0, as illustrated in the following diagram.
-
- -------------- ----------------
-| | | |
-| 1.1932 MHz |---------->| CLOCK OUT | ---------> IRQ 0
-| Clock | | | |
- -------------- | +->| GATE TIMER 0 |
- | ----------------
- |
- | ----------------
- | | |
- |------>| CLOCK OUT | ---------> 66.3 KHZ DRAM
- | | | (aka /dev/null)
- | +->| GATE TIMER 1 |
- | ----------------
- |
- | ----------------
- | | |
- |------>| CLOCK OUT | ---------> Port 61h, bit 5
- | | |
-Port 61h, bit 0 ---------->| GATE TIMER 2 | \_.---- ____
- ---------------- _| )--|LPF|---Speaker
- / *---- \___/
-Port 61h, bit 1 -----------------------------------/
-
-The timer modes are now described.
-
-Mode 0: Single Timeout. This is a one-shot software timeout that counts down
- when the gate is high (always true for timers 0 and 1). When the count
- reaches zero, the output goes high.
-
-Mode 1: Triggered One-shot. The output is initially set high. When the gate
- line is set high, a countdown is initiated (which does not stop if the gate is
- lowered), during which the output is set low. When the count reaches zero,
- the output goes high.
-
-Mode 2: Rate Generator. The output is initially set high. When the countdown
- reaches 1, the output goes low for one count and then returns high. The value
- is reloaded and the countdown automatically resumes. If the gate line goes
- low, the count is halted. If the output is low when the gate is lowered, the
- output automatically goes high (this only affects timer 2).
-
-Mode 3: Square Wave. This generates a high / low square wave. The count
- determines the length of the pulse, which alternates between high and low
- when zero is reached. The count only proceeds when gate is high and is
- automatically reloaded on reaching zero. The count is decremented twice at
- each clock to generate a full high / low cycle at the full periodic rate.
- If the count is even, the clock remains high for N/2 counts and low for N/2
- counts; if the clock is odd, the clock is high for (N+1)/2 counts and low
- for (N-1)/2 counts. Only even values are latched by the counter, so odd
- values are not observed when reading. This is the intended mode for timer 2,
- which generates sine-like tones by low-pass filtering the square wave output.
-
-Mode 4: Software Strobe. After programming this mode and loading the counter,
- the output remains high until the counter reaches zero. Then the output
- goes low for 1 clock cycle and returns high. The counter is not reloaded.
- Counting only occurs when gate is high.
-
-Mode 5: Hardware Strobe. After programming and loading the counter, the
- output remains high. When the gate is raised, a countdown is initiated
- (which does not stop if the gate is lowered). When the counter reaches zero,
- the output goes low for 1 clock cycle and then returns high. The counter is
- not reloaded.
-
-In addition to normal binary counting, the PIT supports BCD counting. The
-command port, 0x43 is used to set the counter and mode for each of the three
-timers.
-
-PIT commands, issued to port 0x43, using the following bit encoding:
-
-Bit 7-4: Command (See table below)
-Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined)
-Bit 0 : Binary (0) / BCD (1)
-
-Command table:
-
-0000 - Latch Timer 0 count for port 0x40
- sample and hold the count to be read in port 0x40;
- additional commands ignored until counter is read;
- mode bits ignored.
-
-0001 - Set Timer 0 LSB mode for port 0x40
- set timer to read LSB only and force MSB to zero;
- mode bits set timer mode
-
-0010 - Set Timer 0 MSB mode for port 0x40
- set timer to read MSB only and force LSB to zero;
- mode bits set timer mode
-
-0011 - Set Timer 0 16-bit mode for port 0x40
- set timer to read / write LSB first, then MSB;
- mode bits set timer mode
-
-0100 - Latch Timer 1 count for port 0x41 - as described above
-0101 - Set Timer 1 LSB mode for port 0x41 - as described above
-0110 - Set Timer 1 MSB mode for port 0x41 - as described above
-0111 - Set Timer 1 16-bit mode for port 0x41 - as described above
-
-1000 - Latch Timer 2 count for port 0x42 - as described above
-1001 - Set Timer 2 LSB mode for port 0x42 - as described above
-1010 - Set Timer 2 MSB mode for port 0x42 - as described above
-1011 - Set Timer 2 16-bit mode for port 0x42 as described above
-
-1101 - General counter latch
- Latch combination of counters into corresponding ports
- Bit 3 = Counter 2
- Bit 2 = Counter 1
- Bit 1 = Counter 0
- Bit 0 = Unused
-
-1110 - Latch timer status
- Latch combination of counter mode into corresponding ports
- Bit 3 = Counter 2
- Bit 2 = Counter 1
- Bit 1 = Counter 0
-
- The output of ports 0x40-0x42 following this command will be:
-
- Bit 7 = Output pin
- Bit 6 = Count loaded (0 if timer has expired)
- Bit 5-4 = Read / Write mode
- 01 = MSB only
- 10 = LSB only
- 11 = LSB / MSB (16-bit)
- Bit 3-1 = Mode
- Bit 0 = Binary (0) / BCD mode (1)
-
-2.2) RTC
-
-The second device which was available in the original PC was the MC146818 real
-time clock. The original device is now obsolete, and usually emulated by the
-system chipset, sometimes by an HPET and some frankenstein IRQ routing.
-
-The RTC is accessed through CMOS variables, which uses an index register to
-control which bytes are read. Since there is only one index register, read
-of the CMOS and read of the RTC require lock protection (in addition, it is
-dangerous to allow userspace utilities such as hwclock to have direct RTC
-access, as they could corrupt kernel reads and writes of CMOS memory).
-
-The RTC generates an interrupt which is usually routed to IRQ 8. The interrupt
-can function as a periodic timer, an additional once a day alarm, and can issue
-interrupts after an update of the CMOS registers by the MC146818 is complete.
-The type of interrupt is signalled in the RTC status registers.
-
-The RTC will update the current time fields by battery power even while the
-system is off. The current time fields should not be read while an update is
-in progress, as indicated in the status register.
-
-The clock uses a 32.768kHz crystal, so bits 6-4 of register A should be
-programmed to a 32kHz divider if the RTC is to count seconds.
-
-This is the RAM map originally used for the RTC/CMOS:
-
-Location Size Description
-------------------------------------------
-00h byte Current second (BCD)
-01h byte Seconds alarm (BCD)
-02h byte Current minute (BCD)
-03h byte Minutes alarm (BCD)
-04h byte Current hour (BCD)
-05h byte Hours alarm (BCD)
-06h byte Current day of week (BCD)
-07h byte Current day of month (BCD)
-08h byte Current month (BCD)
-09h byte Current year (BCD)
-0Ah byte Register A
- bit 7 = Update in progress
- bit 6-4 = Divider for clock
- 000 = 4.194 MHz
- 001 = 1.049 MHz
- 010 = 32 kHz
- 10X = test modes
- 110 = reset / disable
- 111 = reset / disable
- bit 3-0 = Rate selection for periodic interrupt
- 000 = periodic timer disabled
- 001 = 3.90625 uS
- 010 = 7.8125 uS
- 011 = .122070 mS
- 100 = .244141 mS
- ...
- 1101 = 125 mS
- 1110 = 250 mS
- 1111 = 500 mS
-0Bh byte Register B
- bit 7 = Run (0) / Halt (1)
- bit 6 = Periodic interrupt enable
- bit 5 = Alarm interrupt enable
- bit 4 = Update-ended interrupt enable
- bit 3 = Square wave interrupt enable
- bit 2 = BCD calendar (0) / Binary (1)
- bit 1 = 12-hour mode (0) / 24-hour mode (1)
- bit 0 = 0 (DST off) / 1 (DST enabled)
-OCh byte Register C (read only)
- bit 7 = interrupt request flag (IRQF)
- bit 6 = periodic interrupt flag (PF)
- bit 5 = alarm interrupt flag (AF)
- bit 4 = update interrupt flag (UF)
- bit 3-0 = reserved
-ODh byte Register D (read only)
- bit 7 = RTC has power
- bit 6-0 = reserved
-32h byte Current century BCD (*)
- (*) location vendor specific and now determined from ACPI global tables
-
-2.3) APIC
-
-On Pentium and later processors, an on-board timer is available to each CPU
-as part of the Advanced Programmable Interrupt Controller. The APIC is
-accessed through memory-mapped registers and provides interrupt service to each
-CPU, used for IPIs and local timer interrupts.
-
-Although in theory the APIC is a safe and stable source for local interrupts,
-in practice, many bugs and glitches have occurred due to the special nature of
-the APIC CPU-local memory-mapped hardware. Beware that CPU errata may affect
-the use of the APIC and that workarounds may be required. In addition, some of
-these workarounds pose unique constraints for virtualization - requiring either
-extra overhead incurred from extra reads of memory-mapped I/O or additional
-functionality that may be more computationally expensive to implement.
-
-Since the APIC is documented quite well in the Intel and AMD manuals, we will
-avoid repetition of the detail here. It should be pointed out that the APIC
-timer is programmed through the LVT (local vector timer) register, is capable
-of one-shot or periodic operation, and is based on the bus clock divided down
-by the programmable divider register.
-
-2.4) HPET
-
-HPET is quite complex, and was originally intended to replace the PIT / RTC
-support of the X86 PC. It remains to be seen whether that will be the case, as
-the de facto standard of PC hardware is to emulate these older devices. Some
-systems designated as legacy free may support only the HPET as a hardware timer
-device.
-
-The HPET spec is rather loose and vague, requiring at least 3 hardware timers,
-but allowing implementation freedom to support many more. It also imposes no
-fixed rate on the timer frequency, but does impose some extremal values on
-frequency, error and slew.
-
-In general, the HPET is recommended as a high precision (compared to PIT /RTC)
-time source which is independent of local variation (as there is only one HPET
-in any given system). The HPET is also memory-mapped, and its presence is
-indicated through ACPI tables by the BIOS.
-
-Detailed specification of the HPET is beyond the current scope of this
-document, as it is also very well documented elsewhere.
-
-2.5) Offboard Timers
-
-Several cards, both proprietary (watchdog boards) and commonplace (e1000) have
-timing chips built into the cards which may have registers which are accessible
-to kernel or user drivers. To the author's knowledge, using these to generate
-a clocksource for a Linux or other kernel has not yet been attempted and is in
-general frowned upon as not playing by the agreed rules of the game. Such a
-timer device would require additional support to be virtualized properly and is
-not considered important at this time as no known operating system does this.
-
-=========================================================================
-
-3) TSC Hardware
-
-The TSC or time stamp counter is relatively simple in theory; it counts
-instruction cycles issued by the processor, which can be used as a measure of
-time. In practice, due to a number of problems, it is the most complicated
-timekeeping device to use.
-
-The TSC is represented internally as a 64-bit MSR which can be read with the
-RDMSR, RDTSC, or RDTSCP (when available) instructions. In the past, hardware
-limitations made it possible to write the TSC, but generally on old hardware it
-was only possible to write the low 32-bits of the 64-bit counter, and the upper
-32-bits of the counter were cleared. Now, however, on Intel processors family
-0Fh, for models 3, 4 and 6, and family 06h, models e and f, this restriction
-has been lifted and all 64-bits are writable. On AMD systems, the ability to
-write the TSC MSR is not an architectural guarantee.
-
-The TSC is accessible from CPL-0 and conditionally, for CPL > 0 software by
-means of the CR4.TSD bit, which when enabled, disables CPL > 0 TSC access.
-
-Some vendors have implemented an additional instruction, RDTSCP, which returns
-atomically not just the TSC, but an indicator which corresponds to the
-processor number. This can be used to index into an array of TSC variables to
-determine offset information in SMP systems where TSCs are not synchronized.
-The presence of this instruction must be determined by consulting CPUID feature
-bits.
-
-Both VMX and SVM provide extension fields in the virtualization hardware which
-allows the guest visible TSC to be offset by a constant. Newer implementations
-promise to allow the TSC to additionally be scaled, but this hardware is not
-yet widely available.
-
-3.1) TSC synchronization
-
-The TSC is a CPU-local clock in most implementations. This means, on SMP
-platforms, the TSCs of different CPUs may start at different times depending
-on when the CPUs are powered on. Generally, CPUs on the same die will share
-the same clock, however, this is not always the case.
-
-The BIOS may attempt to resynchronize the TSCs during the poweron process and
-the operating system or other system software may attempt to do this as well.
-Several hardware limitations make the problem worse - if it is not possible to
-write the full 64-bits of the TSC, it may be impossible to match the TSC in
-newly arriving CPUs to that of the rest of the system, resulting in
-unsynchronized TSCs. This may be done by BIOS or system software, but in
-practice, getting a perfectly synchronized TSC will not be possible unless all
-values are read from the same clock, which generally only is possible on single
-socket systems or those with special hardware support.
-
-3.2) TSC and CPU hotplug
-
-As touched on already, CPUs which arrive later than the boot time of the system
-may not have a TSC value that is synchronized with the rest of the system.
-Either system software, BIOS, or SMM code may actually try to establish the TSC
-to a value matching the rest of the system, but a perfect match is usually not
-a guarantee. This can have the effect of bringing a system from a state where
-TSC is synchronized back to a state where TSC synchronization flaws, however
-small, may be exposed to the OS and any virtualization environment.
-
-3.3) TSC and multi-socket / NUMA
-
-Multi-socket systems, especially large multi-socket systems are likely to have
-individual clocksources rather than a single, universally distributed clock.
-Since these clocks are driven by different crystals, they will not have
-perfectly matched frequency, and temperature and electrical variations will
-cause the CPU clocks, and thus the TSCs to drift over time. Depending on the
-exact clock and bus design, the drift may or may not be fixed in absolute
-error, and may accumulate over time.
-
-In addition, very large systems may deliberately slew the clocks of individual
-cores. This technique, known as spread-spectrum clocking, reduces EMI at the
-clock frequency and harmonics of it, which may be required to pass FCC
-standards for telecommunications and computer equipment.
-
-It is recommended not to trust the TSCs to remain synchronized on NUMA or
-multiple socket systems for these reasons.
-
-3.4) TSC and C-states
-
-C-states, or idling states of the processor, especially C1E and deeper sleep
-states may be problematic for TSC as well. The TSC may stop advancing in such
-a state, resulting in a TSC which is behind that of other CPUs when execution
-is resumed. Such CPUs must be detected and flagged by the operating system
-based on CPU and chipset identifications.
-
-The TSC in such a case may be corrected by catching it up to a known external
-clocksource.
-
-3.5) TSC frequency change / P-states
-
-To make things slightly more interesting, some CPUs may change frequency. They
-may or may not run the TSC at the same rate, and because the frequency change
-may be staggered or slewed, at some points in time, the TSC rate may not be
-known other than falling within a range of values. In this case, the TSC will
-not be a stable time source, and must be calibrated against a known, stable,
-external clock to be a usable source of time.
-
-Whether the TSC runs at a constant rate or scales with the P-state is model
-dependent and must be determined by inspecting CPUID, chipset or vendor
-specific MSR fields.
-
-In addition, some vendors have known bugs where the P-state is actually
-compensated for properly during normal operation, but when the processor is
-inactive, the P-state may be raised temporarily to service cache misses from
-other processors. In such cases, the TSC on halted CPUs could advance faster
-than that of non-halted processors. AMD Turion processors are known to have
-this problem.
-
-3.6) TSC and STPCLK / T-states
-
-External signals given to the processor may also have the effect of stopping
-the TSC. This is typically done for thermal emergency power control to prevent
-an overheating condition, and typically, there is no way to detect that this
-condition has happened.
-
-3.7) TSC virtualization - VMX
-
-VMX provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
-instructions, which is enough for full virtualization of TSC in any manner. In
-addition, VMX allows passing through the host TSC plus an additional TSC_OFFSET
-field specified in the VMCS. Special instructions must be used to read and
-write the VMCS field.
-
-3.8) TSC virtualization - SVM
-
-SVM provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
-instructions, which is enough for full virtualization of TSC in any manner. In
-addition, SVM allows passing through the host TSC plus an additional offset
-field specified in the SVM control block.
-
-3.9) TSC feature bits in Linux
-
-In summary, there is no way to guarantee the TSC remains in perfect
-synchronization unless it is explicitly guaranteed by the architecture. Even
-if so, the TSCs in multi-sockets or NUMA systems may still run independently
-despite being locally consistent.
-
-The following feature bits are used by Linux to signal various TSC attributes,
-but they can only be taken to be meaningful for UP or single node systems.
-
-X86_FEATURE_TSC : The TSC is available in hardware
-X86_FEATURE_RDTSCP : The RDTSCP instruction is available
-X86_FEATURE_CONSTANT_TSC : The TSC rate is unchanged with P-states
-X86_FEATURE_NONSTOP_TSC : The TSC does not stop in C-states
-X86_FEATURE_TSC_RELIABLE : TSC sync checks are skipped (VMware)
-
-4) Virtualization Problems
-
-Timekeeping is especially problematic for virtualization because a number of
-challenges arise. The most obvious problem is that time is now shared between
-the host and, potentially, a number of virtual machines. Thus the virtual
-operating system does not run with 100% usage of the CPU, despite the fact that
-it may very well make that assumption. It may expect it to remain true to very
-exacting bounds when interrupt sources are disabled, but in reality only its
-virtual interrupt sources are disabled, and the machine may still be preempted
-at any time. This causes problems as the passage of real time, the injection
-of machine interrupts and the associated clock sources are no longer completely
-synchronized with real time.
-
-This same problem can occur on native harware to a degree, as SMM mode may
-steal cycles from the naturally on X86 systems when SMM mode is used by the
-BIOS, but not in such an extreme fashion. However, the fact that SMM mode may
-cause similar problems to virtualization makes it a good justification for
-solving many of these problems on bare metal.
-
-4.1) Interrupt clocking
-
-One of the most immediate problems that occurs with legacy operating systems
-is that the system timekeeping routines are often designed to keep track of
-time by counting periodic interrupts. These interrupts may come from the PIT
-or the RTC, but the problem is the same: the host virtualization engine may not
-be able to deliver the proper number of interrupts per second, and so guest
-time may fall behind. This is especially problematic if a high interrupt rate
-is selected, such as 1000 HZ, which is unfortunately the default for many Linux
-guests.
-
-There are three approaches to solving this problem; first, it may be possible
-to simply ignore it. Guests which have a separate time source for tracking
-'wall clock' or 'real time' may not need any adjustment of their interrupts to
-maintain proper time. If this is not sufficient, it may be necessary to inject
-additional interrupts into the guest in order to increase the effective
-interrupt rate. This approach leads to complications in extreme conditions,
-where host load or guest lag is too much to compensate for, and thus another
-solution to the problem has risen: the guest may need to become aware of lost
-ticks and compensate for them internally. Although promising in theory, the
-implementation of this policy in Linux has been extremely error prone, and a
-number of buggy variants of lost tick compensation are distributed across
-commonly used Linux systems.
-
-Windows uses periodic RTC clocking as a means of keeping time internally, and
-thus requires interrupt slewing to keep proper time. It does use a low enough
-rate (ed: is it 18.2 Hz?) however that it has not yet been a problem in
-practice.
-
-4.2) TSC sampling and serialization
-
-As the highest precision time source available, the cycle counter of the CPU
-has aroused much interest from developers. As explained above, this timer has
-many problems unique to its nature as a local, potentially unstable and
-potentially unsynchronized source. One issue which is not unique to the TSC,
-but is highlighted because of its very precise nature is sampling delay. By
-definition, the counter, once read is already old. However, it is also
-possible for the counter to be read ahead of the actual use of the result.
-This is a consequence of the superscalar execution of the instruction stream,
-which may execute instructions out of order. Such execution is called
-non-serialized. Forcing serialized execution is necessary for precise
-measurement with the TSC, and requires a serializing instruction, such as CPUID
-or an MSR read.
-
-Since CPUID may actually be virtualized by a trap and emulate mechanism, this
-serialization can pose a performance issue for hardware virtualization. An
-accurate time stamp counter reading may therefore not always be available, and
-it may be necessary for an implementation to guard against "backwards" reads of
-the TSC as seen from other CPUs, even in an otherwise perfectly synchronized
-system.
-
-4.3) Timespec aliasing
-
-Additionally, this lack of serialization from the TSC poses another challenge
-when using results of the TSC when measured against another time source. As
-the TSC is much higher precision, many possible values of the TSC may be read
-while another clock is still expressing the same value.
-
-That is, you may read (T,T+10) while external clock C maintains the same value.
-Due to non-serialized reads, you may actually end up with a range which
-fluctuates - from (T-1.. T+10). Thus, any time calculated from a TSC, but
-calibrated against an external value may have a range of valid values.
-Re-calibrating this computation may actually cause time, as computed after the
-calibration, to go backwards, compared with time computed before the
-calibration.
-
-This problem is particularly pronounced with an internal time source in Linux,
-the kernel time, which is expressed in the theoretically high resolution
-timespec - but which advances in much larger granularity intervals, sometimes
-at the rate of jiffies, and possibly in catchup modes, at a much larger step.
-
-This aliasing requires care in the computation and recalibration of kvmclock
-and any other values derived from TSC computation (such as TSC virtualization
-itself).
-
-4.4) Migration
-
-Migration of a virtual machine raises problems for timekeeping in two ways.
-First, the migration itself may take time, during which interrupts cannot be
-delivered, and after which, the guest time may need to be caught up. NTP may
-be able to help to some degree here, as the clock correction required is
-typically small enough to fall in the NTP-correctable window.
-
-An additional concern is that timers based off the TSC (or HPET, if the raw bus
-clock is exposed) may now be running at different rates, requiring compensation
-in some way in the hypervisor by virtualizing these timers. In addition,
-migrating to a faster machine may preclude the use of a passthrough TSC, as a
-faster clock cannot be made visible to a guest without the potential of time
-advancing faster than usual. A slower clock is less of a problem, as it can
-always be caught up to the original rate. KVM clock avoids these problems by
-simply storing multipliers and offsets against the TSC for the guest to convert
-back into nanosecond resolution values.
-
-4.5) Scheduling
-
-Since scheduling may be based on precise timing and firing of interrupts, the
-scheduling algorithms of an operating system may be adversely affected by
-virtualization. In theory, the effect is random and should be universally
-distributed, but in contrived as well as real scenarios (guest device access,
-causes of virtualization exits, possible context switch), this may not always
-be the case. The effect of this has not been well studied.
-
-In an attempt to work around this, several implementations have provided a
-paravirtualized scheduler clock, which reveals the true amount of CPU time for
-which a virtual machine has been running.
-
-4.6) Watchdogs
-
-Watchdog timers, such as the lock detector in Linux may fire accidentally when
-running under hardware virtualization due to timer interrupts being delayed or
-misinterpretation of the passage of real time. Usually, these warnings are
-spurious and can be ignored, but in some circumstances it may be necessary to
-disable such detection.
-
-4.7) Delays and precision timing
-
-Precise timing and delays may not be possible in a virtualized system. This
-can happen if the system is controlling physical hardware, or issues delays to
-compensate for slower I/O to and from devices. The first issue is not solvable
-in general for a virtualized system; hardware control software can't be
-adequately virtualized without a full real-time operating system, which would
-require an RT aware virtualization platform.
-
-The second issue may cause performance problems, but this is unlikely to be a
-significant issue. In many cases these delays may be eliminated through
-configuration or paravirtualization.
-
-4.8) Covert channels and leaks
-
-In addition to the above problems, time information will inevitably leak to the
-guest about the host in anything but a perfect implementation of virtualized
-time. This may allow the guest to infer the presence of a hypervisor (as in a
-red-pill type detection), and it may allow information to leak between guests
-by using CPU utilization itself as a signalling channel. Preventing such
-problems would require completely isolated virtual time which may not track
-real time any longer. This may be useful in certain security or QA contexts,
-but in general isn't recommended for real-world deployment scenarios.
+++ /dev/null
-# This creates the demonstration utility "lguest" which runs a Linux guest.
-# Missing headers? Add "-I../../include -I../../arch/x86/include"
-CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE
-
-all: lguest
-
-clean:
- rm -f lguest
+++ /dev/null
-#! /bin/sh
-
-set -e
-
-PREFIX=$1
-shift
-
-trap 'rm -r $TMPDIR' 0
-TMPDIR=`mktemp -d`
-
-exec 3>/dev/null
-for f; do
- while IFS="
-" read -r LINE; do
- case "$LINE" in
- *$PREFIX:[0-9]*:\**)
- NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
- if [ -f $TMPDIR/$NUM ]; then
- echo "$TMPDIR/$NUM already exits prior to $f"
- exit 1
- fi
- exec 3>>$TMPDIR/$NUM
- echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
- /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3
- ;;
- *$PREFIX:[0-9]*)
- NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
- if [ -f $TMPDIR/$NUM ]; then
- echo "$TMPDIR/$NUM already exits prior to $f"
- exit 1
- fi
- exec 3>>$TMPDIR/$NUM
- echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
- /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3
- ;;
- *:\**)
- /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3
- echo >&3
- exec 3>/dev/null
- ;;
- *)
- /bin/echo "$LINE" >&3
- ;;
- esac
- done < $f
- echo >&3
- exec 3>/dev/null
-done
-
-LASTFILE=""
-for f in $TMPDIR/*; do
- if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then
- LASTFILE=$(cat $TMPDIR/.$(basename $f) )
- echo "[ $LASTFILE ]"
- fi
- cat $f
-done
-
+++ /dev/null
-/*P:100
- * This is the Launcher code, a simple program which lays out the "physical"
- * memory for the new Guest by mapping the kernel image and the virtual
- * devices, then opens /dev/lguest to tell the kernel about the Guest and
- * control it.
-:*/
-#define _LARGEFILE64_SOURCE
-#define _GNU_SOURCE
-#include <stdio.h>
-#include <string.h>
-#include <unistd.h>
-#include <err.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <elf.h>
-#include <sys/mman.h>
-#include <sys/param.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/wait.h>
-#include <sys/eventfd.h>
-#include <fcntl.h>
-#include <stdbool.h>
-#include <errno.h>
-#include <ctype.h>
-#include <sys/socket.h>
-#include <sys/ioctl.h>
-#include <sys/time.h>
-#include <time.h>
-#include <netinet/in.h>
-#include <net/if.h>
-#include <linux/sockios.h>
-#include <linux/if_tun.h>
-#include <sys/uio.h>
-#include <termios.h>
-#include <getopt.h>
-#include <assert.h>
-#include <sched.h>
-#include <limits.h>
-#include <stddef.h>
-#include <signal.h>
-#include <pwd.h>
-#include <grp.h>
-
-#include <linux/virtio_config.h>
-#include <linux/virtio_net.h>
-#include <linux/virtio_blk.h>
-#include <linux/virtio_console.h>
-#include <linux/virtio_rng.h>
-#include <linux/virtio_ring.h>
-#include <asm/bootparam.h>
-#include "../../include/linux/lguest_launcher.h"
-/*L:110
- * We can ignore the 42 include files we need for this program, but I do want
- * to draw attention to the use of kernel-style types.
- *
- * As Linus said, "C is a Spartan language, and so should your naming be." I
- * like these abbreviations, so we define them here. Note that u64 is always
- * unsigned long long, which works on all Linux systems: this means that we can
- * use %llu in printf for any u64.
- */
-typedef unsigned long long u64;
-typedef uint32_t u32;
-typedef uint16_t u16;
-typedef uint8_t u8;
-/*:*/
-
-#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
-#define BRIDGE_PFX "bridge:"
-#ifndef SIOCBRADDIF
-#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
-#endif
-/* We can have up to 256 pages for devices. */
-#define DEVICE_PAGES 256
-/* This will occupy 3 pages: it must be a power of 2. */
-#define VIRTQUEUE_NUM 256
-
-/*L:120
- * verbose is both a global flag and a macro. The C preprocessor allows
- * this, and although I wouldn't recommend it, it works quite nicely here.
- */
-static bool verbose;
-#define verbose(args...) \
- do { if (verbose) printf(args); } while(0)
-/*:*/
-
-/* The pointer to the start of guest memory. */
-static void *guest_base;
-/* The maximum guest physical address allowed, and maximum possible. */
-static unsigned long guest_limit, guest_max;
-/* The /dev/lguest file descriptor. */
-static int lguest_fd;
-
-/* a per-cpu variable indicating whose vcpu is currently running */
-static unsigned int __thread cpu_id;
-
-/* This is our list of devices. */
-struct device_list {
- /* Counter to assign interrupt numbers. */
- unsigned int next_irq;
-
- /* Counter to print out convenient device numbers. */
- unsigned int device_num;
-
- /* The descriptor page for the devices. */
- u8 *descpage;
-
- /* A single linked list of devices. */
- struct device *dev;
- /* And a pointer to the last device for easy append. */
- struct device *lastdev;
-};
-
-/* The list of Guest devices, based on command line arguments. */
-static struct device_list devices;
-
-/* The device structure describes a single device. */
-struct device {
- /* The linked-list pointer. */
- struct device *next;
-
- /* The device's descriptor, as mapped into the Guest. */
- struct lguest_device_desc *desc;
-
- /* We can't trust desc values once Guest has booted: we use these. */
- unsigned int feature_len;
- unsigned int num_vq;
-
- /* The name of this device, for --verbose. */
- const char *name;
-
- /* Any queues attached to this device */
- struct virtqueue *vq;
-
- /* Is it operational */
- bool running;
-
- /* Does Guest want an intrrupt on empty? */
- bool irq_on_empty;
-
- /* Device-specific data. */
- void *priv;
-};
-
-/* The virtqueue structure describes a queue attached to a device. */
-struct virtqueue {
- struct virtqueue *next;
-
- /* Which device owns me. */
- struct device *dev;
-
- /* The configuration for this queue. */
- struct lguest_vqconfig config;
-
- /* The actual ring of buffers. */
- struct vring vring;
-
- /* Last available index we saw. */
- u16 last_avail_idx;
-
- /* How many are used since we sent last irq? */
- unsigned int pending_used;
-
- /* Eventfd where Guest notifications arrive. */
- int eventfd;
-
- /* Function for the thread which is servicing this virtqueue. */
- void (*service)(struct virtqueue *vq);
- pid_t thread;
-};
-
-/* Remember the arguments to the program so we can "reboot" */
-static char **main_args;
-
-/* The original tty settings to restore on exit. */
-static struct termios orig_term;
-
-/*
- * We have to be careful with barriers: our devices are all run in separate
- * threads and so we need to make sure that changes visible to the Guest happen
- * in precise order.
- */
-#define wmb() __asm__ __volatile__("" : : : "memory")
-#define mb() __asm__ __volatile__("" : : : "memory")
-
-/*
- * Convert an iovec element to the given type.
- *
- * This is a fairly ugly trick: we need to know the size of the type and
- * alignment requirement to check the pointer is kosher. It's also nice to
- * have the name of the type in case we report failure.
- *
- * Typing those three things all the time is cumbersome and error prone, so we
- * have a macro which sets them all up and passes to the real function.
- */
-#define convert(iov, type) \
- ((type *)_convert((iov), sizeof(type), __alignof__(type), #type))
-
-static void *_convert(struct iovec *iov, size_t size, size_t align,
- const char *name)
-{
- if (iov->iov_len != size)
- errx(1, "Bad iovec size %zu for %s", iov->iov_len, name);
- if ((unsigned long)iov->iov_base % align != 0)
- errx(1, "Bad alignment %p for %s", iov->iov_base, name);
- return iov->iov_base;
-}
-
-/* Wrapper for the last available index. Makes it easier to change. */
-#define lg_last_avail(vq) ((vq)->last_avail_idx)
-
-/*
- * The virtio configuration space is defined to be little-endian. x86 is
- * little-endian too, but it's nice to be explicit so we have these helpers.
- */
-#define cpu_to_le16(v16) (v16)
-#define cpu_to_le32(v32) (v32)
-#define cpu_to_le64(v64) (v64)
-#define le16_to_cpu(v16) (v16)
-#define le32_to_cpu(v32) (v32)
-#define le64_to_cpu(v64) (v64)
-
-/* Is this iovec empty? */
-static bool iov_empty(const struct iovec iov[], unsigned int num_iov)
-{
- unsigned int i;
-
- for (i = 0; i < num_iov; i++)
- if (iov[i].iov_len)
- return false;
- return true;
-}
-
-/* Take len bytes from the front of this iovec. */
-static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
-{
- unsigned int i;
-
- for (i = 0; i < num_iov; i++) {
- unsigned int used;
-
- used = iov[i].iov_len < len ? iov[i].iov_len : len;
- iov[i].iov_base += used;
- iov[i].iov_len -= used;
- len -= used;
- }
- assert(len == 0);
-}
-
-/* The device virtqueue descriptors are followed by feature bitmasks. */
-static u8 *get_feature_bits(struct device *dev)
-{
- return (u8 *)(dev->desc + 1)
- + dev->num_vq * sizeof(struct lguest_vqconfig);
-}
-
-/*L:100
- * The Launcher code itself takes us out into userspace, that scary place where
- * pointers run wild and free! Unfortunately, like most userspace programs,
- * it's quite boring (which is why everyone likes to hack on the kernel!).
- * Perhaps if you make up an Lguest Drinking Game at this point, it will get
- * you through this section. Or, maybe not.
- *
- * The Launcher sets up a big chunk of memory to be the Guest's "physical"
- * memory and stores it in "guest_base". In other words, Guest physical ==
- * Launcher virtual with an offset.
- *
- * This can be tough to get your head around, but usually it just means that we
- * use these trivial conversion functions when the Guest gives us its
- * "physical" addresses:
- */
-static void *from_guest_phys(unsigned long addr)
-{
- return guest_base + addr;
-}
-
-static unsigned long to_guest_phys(const void *addr)
-{
- return (addr - guest_base);
-}
-
-/*L:130
- * Loading the Kernel.
- *
- * We start with couple of simple helper routines. open_or_die() avoids
- * error-checking code cluttering the callers:
- */
-static int open_or_die(const char *name, int flags)
-{
- int fd = open(name, flags);
- if (fd < 0)
- err(1, "Failed to open %s", name);
- return fd;
-}
-
-/* map_zeroed_pages() takes a number of pages. */
-static void *map_zeroed_pages(unsigned int num)
-{
- int fd = open_or_die("/dev/zero", O_RDONLY);
- void *addr;
-
- /*
- * We use a private mapping (ie. if we write to the page, it will be
- * copied). We allocate an extra two pages PROT_NONE to act as guard
- * pages against read/write attempts that exceed allocated space.
- */
- addr = mmap(NULL, getpagesize() * (num+2),
- PROT_NONE, MAP_PRIVATE, fd, 0);
-
- if (addr == MAP_FAILED)
- err(1, "Mmapping %u pages of /dev/zero", num);
-
- if (mprotect(addr + getpagesize(), getpagesize() * num,
- PROT_READ|PROT_WRITE) == -1)
- err(1, "mprotect rw %u pages failed", num);
-
- /*
- * One neat mmap feature is that you can close the fd, and it
- * stays mapped.
- */
- close(fd);
-
- /* Return address after PROT_NONE page */
- return addr + getpagesize();
-}
-
-/* Get some more pages for a device. */
-static void *get_pages(unsigned int num)
-{
- void *addr = from_guest_phys(guest_limit);
-
- guest_limit += num * getpagesize();
- if (guest_limit > guest_max)
- errx(1, "Not enough memory for devices");
- return addr;
-}
-
-/*
- * This routine is used to load the kernel or initrd. It tries mmap, but if
- * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
- * it falls back to reading the memory in.
- */
-static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
-{
- ssize_t r;
-
- /*
- * We map writable even though for some segments are marked read-only.
- * The kernel really wants to be writable: it patches its own
- * instructions.
- *
- * MAP_PRIVATE means that the page won't be copied until a write is
- * done to it. This allows us to share untouched memory between
- * Guests.
- */
- if (mmap(addr, len, PROT_READ|PROT_WRITE,
- MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
- return;
-
- /* pread does a seek and a read in one shot: saves a few lines. */
- r = pread(fd, addr, len, offset);
- if (r != len)
- err(1, "Reading offset %lu len %lu gave %zi", offset, len, r);
-}
-
-/*
- * This routine takes an open vmlinux image, which is in ELF, and maps it into
- * the Guest memory. ELF = Embedded Linking Format, which is the format used
- * by all modern binaries on Linux including the kernel.
- *
- * The ELF headers give *two* addresses: a physical address, and a virtual
- * address. We use the physical address; the Guest will map itself to the
- * virtual address.
- *
- * We return the starting address.
- */
-static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
-{
- Elf32_Phdr phdr[ehdr->e_phnum];
- unsigned int i;
-
- /*
- * Sanity checks on the main ELF header: an x86 executable with a
- * reasonable number of correctly-sized program headers.
- */
- if (ehdr->e_type != ET_EXEC
- || ehdr->e_machine != EM_386
- || ehdr->e_phentsize != sizeof(Elf32_Phdr)
- || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
- errx(1, "Malformed elf header");
-
- /*
- * An ELF executable contains an ELF header and a number of "program"
- * headers which indicate which parts ("segments") of the program to
- * load where.
- */
-
- /* We read in all the program headers at once: */
- if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
- err(1, "Seeking to program headers");
- if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
- err(1, "Reading program headers");
-
- /*
- * Try all the headers: there are usually only three. A read-only one,
- * a read-write one, and a "note" section which we don't load.
- */
- for (i = 0; i < ehdr->e_phnum; i++) {
- /* If this isn't a loadable segment, we ignore it */
- if (phdr[i].p_type != PT_LOAD)
- continue;
-
- verbose("Section %i: size %i addr %p\n",
- i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
-
- /* We map this section of the file at its physical address. */
- map_at(elf_fd, from_guest_phys(phdr[i].p_paddr),
- phdr[i].p_offset, phdr[i].p_filesz);
- }
-
- /* The entry point is given in the ELF header. */
- return ehdr->e_entry;
-}
-
-/*L:150
- * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed
- * to jump into it and it will unpack itself. We used to have to perform some
- * hairy magic because the unpacking code scared me.
- *
- * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote
- * a small patch to jump over the tricky bits in the Guest, so now we just read
- * the funky header so we know where in the file to load, and away we go!
- */
-static unsigned long load_bzimage(int fd)
-{
- struct boot_params boot;
- int r;
- /* Modern bzImages get loaded at 1M. */
- void *p = from_guest_phys(0x100000);
-
- /*
- * Go back to the start of the file and read the header. It should be
- * a Linux boot header (see Documentation/x86/i386/boot.txt)
- */
- lseek(fd, 0, SEEK_SET);
- read(fd, &boot, sizeof(boot));
-
- /* Inside the setup_hdr, we expect the magic "HdrS" */
- if (memcmp(&boot.hdr.header, "HdrS", 4) != 0)
- errx(1, "This doesn't look like a bzImage to me");
-
- /* Skip over the extra sectors of the header. */
- lseek(fd, (boot.hdr.setup_sects+1) * 512, SEEK_SET);
-
- /* Now read everything into memory. in nice big chunks. */
- while ((r = read(fd, p, 65536)) > 0)
- p += r;
-
- /* Finally, code32_start tells us where to enter the kernel. */
- return boot.hdr.code32_start;
-}
-
-/*L:140
- * Loading the kernel is easy when it's a "vmlinux", but most kernels
- * come wrapped up in the self-decompressing "bzImage" format. With a little
- * work, we can load those, too.
- */
-static unsigned long load_kernel(int fd)
-{
- Elf32_Ehdr hdr;
-
- /* Read in the first few bytes. */
- if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr))
- err(1, "Reading kernel");
-
- /* If it's an ELF file, it starts with "\177ELF" */
- if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0)
- return map_elf(fd, &hdr);
-
- /* Otherwise we assume it's a bzImage, and try to load it. */
- return load_bzimage(fd);
-}
-
-/*
- * This is a trivial little helper to align pages. Andi Kleen hated it because
- * it calls getpagesize() twice: "it's dumb code."
- *
- * Kernel guys get really het up about optimization, even when it's not
- * necessary. I leave this code as a reaction against that.
- */
-static inline unsigned long page_align(unsigned long addr)
-{
- /* Add upwards and truncate downwards. */
- return ((addr + getpagesize()-1) & ~(getpagesize()-1));
-}
-
-/*L:180
- * An "initial ram disk" is a disk image loaded into memory along with the
- * kernel which the kernel can use to boot from without needing any drivers.
- * Most distributions now use this as standard: the initrd contains the code to
- * load the appropriate driver modules for the current machine.
- *
- * Importantly, James Morris works for RedHat, and Fedora uses initrds for its
- * kernels. He sent me this (and tells me when I break it).
- */
-static unsigned long load_initrd(const char *name, unsigned long mem)
-{
- int ifd;
- struct stat st;
- unsigned long len;
-
- ifd = open_or_die(name, O_RDONLY);
- /* fstat() is needed to get the file size. */
- if (fstat(ifd, &st) < 0)
- err(1, "fstat() on initrd '%s'", name);
-
- /*
- * We map the initrd at the top of memory, but mmap wants it to be
- * page-aligned, so we round the size up for that.
- */
- len = page_align(st.st_size);
- map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
- /*
- * Once a file is mapped, you can close the file descriptor. It's a
- * little odd, but quite useful.
- */
- close(ifd);
- verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len);
-
- /* We return the initrd size. */
- return len;
-}
-/*:*/
-
-/*
- * Simple routine to roll all the commandline arguments together with spaces
- * between them.
- */
-static void concat(char *dst, char *args[])
-{
- unsigned int i, len = 0;
-
- for (i = 0; args[i]; i++) {
- if (i) {
- strcat(dst+len, " ");
- len++;
- }
- strcpy(dst+len, args[i]);
- len += strlen(args[i]);
- }
- /* In case it's empty. */
- dst[len] = '\0';
-}
-
-/*L:185
- * This is where we actually tell the kernel to initialize the Guest. We
- * saw the arguments it expects when we looked at initialize() in lguest_user.c:
- * the base of Guest "physical" memory, the top physical page to allow and the
- * entry point for the Guest.
- */
-static void tell_kernel(unsigned long start)
-{
- unsigned long args[] = { LHREQ_INITIALIZE,
- (unsigned long)guest_base,
- guest_limit / getpagesize(), start };
- verbose("Guest: %p - %p (%#lx)\n",
- guest_base, guest_base + guest_limit, guest_limit);
- lguest_fd = open_or_die("/dev/lguest", O_RDWR);
- if (write(lguest_fd, args, sizeof(args)) < 0)
- err(1, "Writing to /dev/lguest");
-}
-/*:*/
-
-/*L:200
- * Device Handling.
- *
- * When the Guest gives us a buffer, it sends an array of addresses and sizes.
- * We need to make sure it's not trying to reach into the Launcher itself, so
- * we have a convenient routine which checks it and exits with an error message
- * if something funny is going on:
- */
-static void *_check_pointer(unsigned long addr, unsigned int size,
- unsigned int line)
-{
- /*
- * Check if the requested address and size exceeds the allocated memory,
- * or addr + size wraps around.
- */
- if ((addr + size) > guest_limit || (addr + size) < addr)
- errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
- /*
- * We return a pointer for the caller's convenience, now we know it's
- * safe to use.
- */
- return from_guest_phys(addr);
-}
-/* A macro which transparently hands the line number to the real function. */
-#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
-
-/*
- * Each buffer in the virtqueues is actually a chain of descriptors. This
- * function returns the next descriptor in the chain, or vq->vring.num if we're
- * at the end.
- */
-static unsigned next_desc(struct vring_desc *desc,
- unsigned int i, unsigned int max)
-{
- unsigned int next;
-
- /* If this descriptor says it doesn't chain, we're done. */
- if (!(desc[i].flags & VRING_DESC_F_NEXT))
- return max;
-
- /* Check they're not leading us off end of descriptors. */
- next = desc[i].next;
- /* Make sure compiler knows to grab that: we don't want it changing! */
- wmb();
-
- if (next >= max)
- errx(1, "Desc next is %u", next);
-
- return next;
-}
-
-/*
- * This actually sends the interrupt for this virtqueue, if we've used a
- * buffer.
- */
-static void trigger_irq(struct virtqueue *vq)
-{
- unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
-
- /* Don't inform them if nothing used. */
- if (!vq->pending_used)
- return;
- vq->pending_used = 0;
-
- /* If they don't want an interrupt, don't send one... */
- if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) {
- /* ... unless they've asked us to force one on empty. */
- if (!vq->dev->irq_on_empty
- || lg_last_avail(vq) != vq->vring.avail->idx)
- return;
- }
-
- /* Send the Guest an interrupt tell them we used something up. */
- if (write(lguest_fd, buf, sizeof(buf)) != 0)
- err(1, "Triggering irq %i", vq->config.irq);
-}
-
-/*
- * This looks in the virtqueue for the first available buffer, and converts
- * it to an iovec for convenient access. Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function waits if necessary, and returns the descriptor number found.
- */
-static unsigned wait_for_vq_desc(struct virtqueue *vq,
- struct iovec iov[],
- unsigned int *out_num, unsigned int *in_num)
-{
- unsigned int i, head, max;
- struct vring_desc *desc;
- u16 last_avail = lg_last_avail(vq);
-
- /* There's nothing available? */
- while (last_avail == vq->vring.avail->idx) {
- u64 event;
-
- /*
- * Since we're about to sleep, now is a good time to tell the
- * Guest about what we've used up to now.
- */
- trigger_irq(vq);
-
- /* OK, now we need to know about added descriptors. */
- vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
-
- /*
- * They could have slipped one in as we were doing that: make
- * sure it's written, then check again.
- */
- mb();
- if (last_avail != vq->vring.avail->idx) {
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- break;
- }
-
- /* Nothing new? Wait for eventfd to tell us they refilled. */
- if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event))
- errx(1, "Event read failed?");
-
- /* We don't need to be notified again. */
- vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
- }
-
- /* Check it isn't doing very strange things with descriptor numbers. */
- if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
- errx(1, "Guest moved used index from %u to %u",
- last_avail, vq->vring.avail->idx);
-
- /*
- * Grab the next descriptor number they're advertising, and increment
- * the index we've seen.
- */
- head = vq->vring.avail->ring[last_avail % vq->vring.num];
- lg_last_avail(vq)++;
-
- /* If their number is silly, that's a fatal mistake. */
- if (head >= vq->vring.num)
- errx(1, "Guest says index %u is available", head);
-
- /* When we start there are none of either input nor output. */
- *out_num = *in_num = 0;
-
- max = vq->vring.num;
- desc = vq->vring.desc;
- i = head;
-
- /*
- * If this is an indirect entry, then this buffer contains a descriptor
- * table which we handle as if it's any normal descriptor chain.
- */
- if (desc[i].flags & VRING_DESC_F_INDIRECT) {
- if (desc[i].len % sizeof(struct vring_desc))
- errx(1, "Invalid size for indirect buffer table");
-
- max = desc[i].len / sizeof(struct vring_desc);
- desc = check_pointer(desc[i].addr, desc[i].len);
- i = 0;
- }
-
- do {
- /* Grab the first descriptor, and check it's OK. */
- iov[*out_num + *in_num].iov_len = desc[i].len;
- iov[*out_num + *in_num].iov_base
- = check_pointer(desc[i].addr, desc[i].len);
- /* If this is an input descriptor, increment that count. */
- if (desc[i].flags & VRING_DESC_F_WRITE)
- (*in_num)++;
- else {
- /*
- * If it's an output descriptor, they're all supposed
- * to come before any input descriptors.
- */
- if (*in_num)
- errx(1, "Descriptor has out after in");
- (*out_num)++;
- }
-
- /* If we've got too many, that implies a descriptor loop. */
- if (*out_num + *in_num > max)
- errx(1, "Looped descriptor");
- } while ((i = next_desc(desc, i, max)) != max);
-
- return head;
-}
-
-/*
- * After we've used one of their buffers, we tell the Guest about it. Sometime
- * later we'll want to send them an interrupt using trigger_irq(); note that
- * wait_for_vq_desc() does that for us if it has to wait.
- */
-static void add_used(struct virtqueue *vq, unsigned int head, int len)
-{
- struct vring_used_elem *used;
-
- /*
- * The virtqueue contains a ring of used buffers. Get a pointer to the
- * next entry in that used ring.
- */
- used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num];
- used->id = head;
- used->len = len;
- /* Make sure buffer is written before we update index. */
- wmb();
- vq->vring.used->idx++;
- vq->pending_used++;
-}
-
-/* And here's the combo meal deal. Supersize me! */
-static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
-{
- add_used(vq, head, len);
- trigger_irq(vq);
-}
-
-/*
- * The Console
- *
- * We associate some data with the console for our exit hack.
- */
-struct console_abort {
- /* How many times have they hit ^C? */
- int count;
- /* When did they start? */
- struct timeval start;
-};
-
-/* This is the routine which handles console input (ie. stdin). */
-static void console_input(struct virtqueue *vq)
-{
- int len;
- unsigned int head, in_num, out_num;
- struct console_abort *abort = vq->dev->priv;
- struct iovec iov[vq->vring.num];
-
- /* Make sure there's a descriptor available. */
- head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
- if (out_num)
- errx(1, "Output buffers in console in queue?");
-
- /* Read into it. This is where we usually wait. */
- len = readv(STDIN_FILENO, iov, in_num);
- if (len <= 0) {
- /* Ran out of input? */
- warnx("Failed to get console input, ignoring console.");
- /*
- * For simplicity, dying threads kill the whole Launcher. So
- * just nap here.
- */
- for (;;)
- pause();
- }
-
- /* Tell the Guest we used a buffer. */
- add_used_and_trigger(vq, head, len);
-
- /*
- * Three ^C within one second? Exit.
- *
- * This is such a hack, but works surprisingly well. Each ^C has to
- * be in a buffer by itself, so they can't be too fast. But we check
- * that we get three within about a second, so they can't be too
- * slow.
- */
- if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
- abort->count = 0;
- return;
- }
-
- abort->count++;
- if (abort->count == 1)
- gettimeofday(&abort->start, NULL);
- else if (abort->count == 3) {
- struct timeval now;
- gettimeofday(&now, NULL);
- /* Kill all Launcher processes with SIGINT, like normal ^C */
- if (now.tv_sec <= abort->start.tv_sec+1)
- kill(0, SIGINT);
- abort->count = 0;
- }
-}
-
-/* This is the routine which handles console output (ie. stdout). */
-static void console_output(struct virtqueue *vq)
-{
- unsigned int head, out, in;
- struct iovec iov[vq->vring.num];
-
- /* We usually wait in here, for the Guest to give us something. */
- head = wait_for_vq_desc(vq, iov, &out, &in);
- if (in)
- errx(1, "Input buffers in console output queue?");
-
- /* writev can return a partial write, so we loop here. */
- while (!iov_empty(iov, out)) {
- int len = writev(STDOUT_FILENO, iov, out);
- if (len <= 0)
- err(1, "Write to stdout gave %i", len);
- iov_consume(iov, out, len);
- }
-
- /*
- * We're finished with that buffer: if we're going to sleep,
- * wait_for_vq_desc() will prod the Guest with an interrupt.
- */
- add_used(vq, head, 0);
-}
-
-/*
- * The Network
- *
- * Handling output for network is also simple: we get all the output buffers
- * and write them to /dev/net/tun.
- */
-struct net_info {
- int tunfd;
-};
-
-static void net_output(struct virtqueue *vq)
-{
- struct net_info *net_info = vq->dev->priv;
- unsigned int head, out, in;
- struct iovec iov[vq->vring.num];
-
- /* We usually wait in here for the Guest to give us a packet. */
- head = wait_for_vq_desc(vq, iov, &out, &in);
- if (in)
- errx(1, "Input buffers in net output queue?");
- /*
- * Send the whole thing through to /dev/net/tun. It expects the exact
- * same format: what a coincidence!
- */
- if (writev(net_info->tunfd, iov, out) < 0)
- errx(1, "Write to tun failed?");
-
- /*
- * Done with that one; wait_for_vq_desc() will send the interrupt if
- * all packets are processed.
- */
- add_used(vq, head, 0);
-}
-
-/*
- * Handling network input is a bit trickier, because I've tried to optimize it.
- *
- * First we have a helper routine which tells is if from this file descriptor
- * (ie. the /dev/net/tun device) will block:
- */
-static bool will_block(int fd)
-{
- fd_set fdset;
- struct timeval zero = { 0, 0 };
- FD_ZERO(&fdset);
- FD_SET(fd, &fdset);
- return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
-}
-
-/*
- * This handles packets coming in from the tun device to our Guest. Like all
- * service routines, it gets called again as soon as it returns, so you don't
- * see a while(1) loop here.
- */
-static void net_input(struct virtqueue *vq)
-{
- int len;
- unsigned int head, out, in;
- struct iovec iov[vq->vring.num];
- struct net_info *net_info = vq->dev->priv;
-
- /*
- * Get a descriptor to write an incoming packet into. This will also
- * send an interrupt if they're out of descriptors.
- */
- head = wait_for_vq_desc(vq, iov, &out, &in);
- if (out)
- errx(1, "Output buffers in net input queue?");
-
- /*
- * If it looks like we'll block reading from the tun device, send them
- * an interrupt.
- */
- if (vq->pending_used && will_block(net_info->tunfd))
- trigger_irq(vq);
-
- /*
- * Read in the packet. This is where we normally wait (when there's no
- * incoming network traffic).
- */
- len = readv(net_info->tunfd, iov, in);
- if (len <= 0)
- err(1, "Failed to read from tun.");
-
- /*
- * Mark that packet buffer as used, but don't interrupt here. We want
- * to wait until we've done as much work as we can.
- */
- add_used(vq, head, len);
-}
-/*:*/
-
-/* This is the helper to create threads: run the service routine in a loop. */
-static int do_thread(void *_vq)
-{
- struct virtqueue *vq = _vq;
-
- for (;;)
- vq->service(vq);
- return 0;
-}
-
-/*
- * When a child dies, we kill our entire process group with SIGTERM. This
- * also has the side effect that the shell restores the console for us!
- */
-static void kill_launcher(int signal)
-{
- kill(0, SIGTERM);
-}
-
-static void reset_device(struct device *dev)
-{
- struct virtqueue *vq;
-
- verbose("Resetting device %s\n", dev->name);
-
- /* Clear any features they've acked. */
- memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len);
-
- /* We're going to be explicitly killing threads, so ignore them. */
- signal(SIGCHLD, SIG_IGN);
-
- /* Zero out the virtqueues, get rid of their threads */
- for (vq = dev->vq; vq; vq = vq->next) {
- if (vq->thread != (pid_t)-1) {
- kill(vq->thread, SIGTERM);
- waitpid(vq->thread, NULL, 0);
- vq->thread = (pid_t)-1;
- }
- memset(vq->vring.desc, 0,
- vring_size(vq->config.num, LGUEST_VRING_ALIGN));
- lg_last_avail(vq) = 0;
- }
- dev->running = false;
-
- /* Now we care if threads die. */
- signal(SIGCHLD, (void *)kill_launcher);
-}
-
-/*L:216
- * This actually creates the thread which services the virtqueue for a device.
- */
-static void create_thread(struct virtqueue *vq)
-{
- /*
- * Create stack for thread. Since the stack grows upwards, we point
- * the stack pointer to the end of this region.
- */
- char *stack = malloc(32768);
- unsigned long args[] = { LHREQ_EVENTFD,
- vq->config.pfn*getpagesize(), 0 };
-
- /* Create a zero-initialized eventfd. */
- vq->eventfd = eventfd(0, 0);
- if (vq->eventfd < 0)
- err(1, "Creating eventfd");
- args[2] = vq->eventfd;
-
- /*
- * Attach an eventfd to this virtqueue: it will go off when the Guest
- * does an LHCALL_NOTIFY for this vq.
- */
- if (write(lguest_fd, &args, sizeof(args)) != 0)
- err(1, "Attaching eventfd");
-
- /*
- * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so
- * we get a signal if it dies.
- */
- vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
- if (vq->thread == (pid_t)-1)
- err(1, "Creating clone");
-
- /* We close our local copy now the child has it. */
- close(vq->eventfd);
-}
-
-static bool accepted_feature(struct device *dev, unsigned int bit)
-{
- const u8 *features = get_feature_bits(dev) + dev->feature_len;
-
- if (dev->feature_len < bit / CHAR_BIT)
- return false;
- return features[bit / CHAR_BIT] & (1 << (bit % CHAR_BIT));
-}
-
-static void start_device(struct device *dev)
-{
- unsigned int i;
- struct virtqueue *vq;
-
- verbose("Device %s OK: offered", dev->name);
- for (i = 0; i < dev->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)[i]);
- verbose(", accepted");
- for (i = 0; i < dev->feature_len; i++)
- verbose(" %02x", get_feature_bits(dev)
- [dev->feature_len+i]);
-
- dev->irq_on_empty = accepted_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
-
- for (vq = dev->vq; vq; vq = vq->next) {
- if (vq->service)
- create_thread(vq);
- }
- dev->running = true;
-}
-
-static void cleanup_devices(void)
-{
- struct device *dev;
-
- for (dev = devices.dev; dev; dev = dev->next)
- reset_device(dev);
-
- /* If we saved off the original terminal settings, restore them now. */
- if (orig_term.c_lflag & (ISIG|ICANON|ECHO))
- tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
-}
-
-/* When the Guest tells us they updated the status field, we handle it. */
-static void update_device_status(struct device *dev)
-{
- /* A zero status is a reset, otherwise it's a set of flags. */
- if (dev->desc->status == 0)
- reset_device(dev);
- else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
- warnx("Device %s configuration FAILED", dev->name);
- if (dev->running)
- reset_device(dev);
- } else if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
- if (!dev->running)
- start_device(dev);
- }
-}
-
-/*L:215
- * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In
- * particular, it's used to notify us of device status changes during boot.
- */
-static void handle_output(unsigned long addr)
-{
- struct device *i;
-
- /* Check each device. */
- for (i = devices.dev; i; i = i->next) {
- struct virtqueue *vq;
-
- /*
- * Notifications to device descriptors mean they updated the
- * device status.
- */
- if (from_guest_phys(addr) == i->desc) {
- update_device_status(i);
- return;
- }
-
- /*
- * Devices *can* be used before status is set to DRIVER_OK.
- * The original plan was that they would never do this: they
- * would always finish setting up their status bits before
- * actually touching the virtqueues. In practice, we allowed
- * them to, and they do (eg. the disk probes for partition
- * tables as part of initialization).
- *
- * If we see this, we start the device: once it's running, we
- * expect the device to catch all the notifications.
- */
- for (vq = i->vq; vq; vq = vq->next) {
- if (addr != vq->config.pfn*getpagesize())
- continue;
- if (i->running)
- errx(1, "Notification on running %s", i->name);
- /* This just calls create_thread() for each virtqueue */
- start_device(i);
- return;
- }
- }
-
- /*
- * Early console write is done using notify on a nul-terminated string
- * in Guest memory. It's also great for hacking debugging messages
- * into a Guest.
- */
- if (addr >= guest_limit)
- errx(1, "Bad NOTIFY %#lx", addr);
-
- write(STDOUT_FILENO, from_guest_phys(addr),
- strnlen(from_guest_phys(addr), guest_limit - addr));
-}
-
-/*L:190
- * Device Setup
- *
- * All devices need a descriptor so the Guest knows it exists, and a "struct
- * device" so the Launcher can keep track of it. We have common helper
- * routines to allocate and manage them.
- */
-
-/*
- * The layout of the device page is a "struct lguest_device_desc" followed by a
- * number of virtqueue descriptors, then two sets of feature bits, then an
- * array of configuration bytes. This routine returns the configuration
- * pointer.
- */
-static u8 *device_config(const struct device *dev)
-{
- return (void *)(dev->desc + 1)
- + dev->num_vq * sizeof(struct lguest_vqconfig)
- + dev->feature_len * 2;
-}
-
-/*
- * This routine allocates a new "struct lguest_device_desc" from descriptor
- * table page just above the Guest's normal memory. It returns a pointer to
- * that descriptor.
- */
-static struct lguest_device_desc *new_dev_desc(u16 type)
-{
- struct lguest_device_desc d = { .type = type };
- void *p;
-
- /* Figure out where the next device config is, based on the last one. */
- if (devices.lastdev)
- p = device_config(devices.lastdev)
- + devices.lastdev->desc->config_len;
- else
- p = devices.descpage;
-
- /* We only have one page for all the descriptors. */
- if (p + sizeof(d) > (void *)devices.descpage + getpagesize())
- errx(1, "Too many devices");
-
- /* p might not be aligned, so we memcpy in. */
- return memcpy(p, &d, sizeof(d));
-}
-
-/*
- * Each device descriptor is followed by the description of its virtqueues. We
- * specify how many descriptors the virtqueue is to have.
- */
-static void add_virtqueue(struct device *dev, unsigned int num_descs,
- void (*service)(struct virtqueue *))
-{
- unsigned int pages;
- struct virtqueue **i, *vq = malloc(sizeof(*vq));
- void *p;
-
- /* First we need some memory for this virtqueue. */
- pages = (vring_size(num_descs, LGUEST_VRING_ALIGN) + getpagesize() - 1)
- / getpagesize();
- p = get_pages(pages);
-
- /* Initialize the virtqueue */
- vq->next = NULL;
- vq->last_avail_idx = 0;
- vq->dev = dev;
-
- /*
- * This is the routine the service thread will run, and its Process ID
- * once it's running.
- */
- vq->service = service;
- vq->thread = (pid_t)-1;
-
- /* Initialize the configuration. */
- vq->config.num = num_descs;
- vq->config.irq = devices.next_irq++;
- vq->config.pfn = to_guest_phys(p) / getpagesize();
-
- /* Initialize the vring. */
- vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN);
-
- /*
- * Append virtqueue to this device's descriptor. We use
- * device_config() to get the end of the device's current virtqueues;
- * we check that we haven't added any config or feature information
- * yet, otherwise we'd be overwriting them.
- */
- assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
- memcpy(device_config(dev), &vq->config, sizeof(vq->config));
- dev->num_vq++;
- dev->desc->num_vq++;
-
- verbose("Virtqueue page %#lx\n", to_guest_phys(p));
-
- /*
- * Add to tail of list, so dev->vq is first vq, dev->vq->next is
- * second.
- */
- for (i = &dev->vq; *i; i = &(*i)->next);
- *i = vq;
-}
-
-/*
- * The first half of the feature bitmask is for us to advertise features. The
- * second half is for the Guest to accept features.
- */
-static void add_feature(struct device *dev, unsigned bit)
-{
- u8 *features = get_feature_bits(dev);
-
- /* We can't extend the feature bits once we've added config bytes */
- if (dev->desc->feature_len <= bit / CHAR_BIT) {
- assert(dev->desc->config_len == 0);
- dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1;
- }
-
- features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
-}
-
-/*
- * This routine sets the configuration fields for an existing device's
- * descriptor. It only works for the last device, but that's OK because that's
- * how we use it.
- */
-static void set_config(struct device *dev, unsigned len, const void *conf)
-{
- /* Check we haven't overflowed our single page. */
- if (device_config(dev) + len > devices.descpage + getpagesize())
- errx(1, "Too many devices");
-
- /* Copy in the config information, and store the length. */
- memcpy(device_config(dev), conf, len);
- dev->desc->config_len = len;
-
- /* Size must fit in config_len field (8 bits)! */
- assert(dev->desc->config_len == len);
-}
-
-/*
- * This routine does all the creation and setup of a new device, including
- * calling new_dev_desc() to allocate the descriptor and device memory. We
- * don't actually start the service threads until later.
- *
- * See what I mean about userspace being boring?
- */
-static struct device *new_device(const char *name, u16 type)
-{
- struct device *dev = malloc(sizeof(*dev));
-
- /* Now we populate the fields one at a time. */
- dev->desc = new_dev_desc(type);
- dev->name = name;
- dev->vq = NULL;
- dev->feature_len = 0;
- dev->num_vq = 0;
- dev->running = false;
-
- /*
- * Append to device list. Prepending to a single-linked list is
- * easier, but the user expects the devices to be arranged on the bus
- * in command-line order. The first network device on the command line
- * is eth0, the first block device /dev/vda, etc.
- */
- if (devices.lastdev)
- devices.lastdev->next = dev;
- else
- devices.dev = dev;
- devices.lastdev = dev;
-
- return dev;
-}
-
-/*
- * Our first setup routine is the console. It's a fairly simple device, but
- * UNIX tty handling makes it uglier than it could be.
- */
-static void setup_console(void)
-{
- struct device *dev;
-
- /* If we can save the initial standard input settings... */
- if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
- struct termios term = orig_term;
- /*
- * Then we turn off echo, line buffering and ^C etc: We want a
- * raw input stream to the Guest.
- */
- term.c_lflag &= ~(ISIG|ICANON|ECHO);
- tcsetattr(STDIN_FILENO, TCSANOW, &term);
- }
-
- dev = new_device("console", VIRTIO_ID_CONSOLE);
-
- /* We store the console state in dev->priv, and initialize it. */
- dev->priv = malloc(sizeof(struct console_abort));
- ((struct console_abort *)dev->priv)->count = 0;
-
- /*
- * The console needs two virtqueues: the input then the output. When
- * they put something the input queue, we make sure we're listening to
- * stdin. When they put something in the output queue, we write it to
- * stdout.
- */
- add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
- add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
-
- verbose("device %u: console\n", ++devices.device_num);
-}
-/*:*/
-
-/*M:010
- * Inter-guest networking is an interesting area. Simplest is to have a
- * --sharenet=<name> option which opens or creates a named pipe. This can be
- * used to send packets to another guest in a 1:1 manner.
- *
- * More sopisticated is to use one of the tools developed for project like UML
- * to do networking.
- *
- * Faster is to do virtio bonding in kernel. Doing this 1:1 would be
- * completely generic ("here's my vring, attach to your vring") and would work
- * for any traffic. Of course, namespace and permissions issues need to be
- * dealt with. A more sophisticated "multi-channel" virtio_net.c could hide
- * multiple inter-guest channels behind one interface, although it would
- * require some manner of hotplugging new virtio channels.
- *
- * Finally, we could implement a virtio network switch in the kernel.
-:*/
-
-static u32 str2ip(const char *ipaddr)
-{
- unsigned int b[4];
-
- if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
- errx(1, "Failed to parse IP address '%s'", ipaddr);
- return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
-}
-
-static void str2mac(const char *macaddr, unsigned char mac[6])
-{
- unsigned int m[6];
- if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x",
- &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
- errx(1, "Failed to parse mac address '%s'", macaddr);
- mac[0] = m[0];
- mac[1] = m[1];
- mac[2] = m[2];
- mac[3] = m[3];
- mac[4] = m[4];
- mac[5] = m[5];
-}
-
-/*
- * This code is "adapted" from libbridge: it attaches the Host end of the
- * network device to the bridge device specified by the command line.
- *
- * This is yet another James Morris contribution (I'm an IP-level guy, so I
- * dislike bridging), and I just try not to break it.
- */
-static void add_to_bridge(int fd, const char *if_name, const char *br_name)
-{
- int ifidx;
- struct ifreq ifr;
-
- if (!*br_name)
- errx(1, "must specify bridge name");
-
- ifidx = if_nametoindex(if_name);
- if (!ifidx)
- errx(1, "interface %s does not exist!", if_name);
-
- strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
- ifr.ifr_name[IFNAMSIZ-1] = '\0';
- ifr.ifr_ifindex = ifidx;
- if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
- err(1, "can't add %s to bridge %s", if_name, br_name);
-}
-
-/*
- * This sets up the Host end of the network device with an IP address, brings
- * it up so packets will flow, the copies the MAC address into the hwaddr
- * pointer.
- */
-static void configure_device(int fd, const char *tapif, u32 ipaddr)
-{
- struct ifreq ifr;
- struct sockaddr_in sin;
-
- memset(&ifr, 0, sizeof(ifr));
- strcpy(ifr.ifr_name, tapif);
-
- /* Don't read these incantations. Just cut & paste them like I did! */
- sin.sin_family = AF_INET;
- sin.sin_addr.s_addr = htonl(ipaddr);
- memcpy(&ifr.ifr_addr, &sin, sizeof(sin));
- if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
- err(1, "Setting %s interface address", tapif);
- ifr.ifr_flags = IFF_UP;
- if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
- err(1, "Bringing interface %s up", tapif);
-}
-
-static int get_tun_device(char tapif[IFNAMSIZ])
-{
- struct ifreq ifr;
- int netfd;
-
- /* Start with this zeroed. Messy but sure. */
- memset(&ifr, 0, sizeof(ifr));
-
- /*
- * We open the /dev/net/tun device and tell it we want a tap device. A
- * tap device is like a tun device, only somehow different. To tell
- * the truth, I completely blundered my way through this code, but it
- * works now!
- */
- netfd = open_or_die("/dev/net/tun", O_RDWR);
- ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
- strcpy(ifr.ifr_name, "tap%d");
- if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
- err(1, "configuring /dev/net/tun");
-
- if (ioctl(netfd, TUNSETOFFLOAD,
- TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
- err(1, "Could not set features for tun device");
-
- /*
- * We don't need checksums calculated for packets coming in this
- * device: trust us!
- */
- ioctl(netfd, TUNSETNOCSUM, 1);
-
- memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
- return netfd;
-}
-
-/*L:195
- * Our network is a Host<->Guest network. This can either use bridging or
- * routing, but the principle is the same: it uses the "tun" device to inject
- * packets into the Host as if they came in from a normal network card. We
- * just shunt packets between the Guest and the tun device.
- */
-static void setup_tun_net(char *arg)
-{
- struct device *dev;
- struct net_info *net_info = malloc(sizeof(*net_info));
- int ipfd;
- u32 ip = INADDR_ANY;
- bool bridging = false;
- char tapif[IFNAMSIZ], *p;
- struct virtio_net_config conf;
-
- net_info->tunfd = get_tun_device(tapif);
-
- /* First we create a new network device. */
- dev = new_device("net", VIRTIO_ID_NET);
- dev->priv = net_info;
-
- /* Network devices need a recv and a send queue, just like console. */
- add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
- add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
-
- /*
- * We need a socket to perform the magic network ioctls to bring up the
- * tap interface, connect to the bridge etc. Any socket will do!
- */
- ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
- if (ipfd < 0)
- err(1, "opening IP socket");
-
- /* If the command line was --tunnet=bridge:<name> do bridging. */
- if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
- arg += strlen(BRIDGE_PFX);
- bridging = true;
- }
-
- /* A mac address may follow the bridge name or IP address */
- p = strchr(arg, ':');
- if (p) {
- str2mac(p+1, conf.mac);
- add_feature(dev, VIRTIO_NET_F_MAC);
- *p = '\0';
- }
-
- /* arg is now either an IP address or a bridge name */
- if (bridging)
- add_to_bridge(ipfd, tapif, arg);
- else
- ip = str2ip(arg);
-
- /* Set up the tun device. */
- configure_device(ipfd, tapif, ip);
-
- add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
- /* Expect Guest to handle everything except UFO */
- add_feature(dev, VIRTIO_NET_F_CSUM);
- add_feature(dev, VIRTIO_NET_F_GUEST_CSUM);
- add_feature(dev, VIRTIO_NET_F_GUEST_TSO4);
- add_feature(dev, VIRTIO_NET_F_GUEST_TSO6);
- add_feature(dev, VIRTIO_NET_F_GUEST_ECN);
- add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
- add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
- add_feature(dev, VIRTIO_NET_F_HOST_ECN);
- /* We handle indirect ring entries */
- add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
- set_config(dev, sizeof(conf), &conf);
-
- /* We don't need the socket any more; setup is done. */
- close(ipfd);
-
- devices.device_num++;
-
- if (bridging)
- verbose("device %u: tun %s attached to bridge: %s\n",
- devices.device_num, tapif, arg);
- else
- verbose("device %u: tun %s: %s\n",
- devices.device_num, tapif, arg);
-}
-/*:*/
-
-/* This hangs off device->priv. */
-struct vblk_info {
- /* The size of the file. */
- off64_t len;
-
- /* The file descriptor for the file. */
- int fd;
-
-};
-
-/*L:210
- * The Disk
- *
- * The disk only has one virtqueue, so it only has one thread. It is really
- * simple: the Guest asks for a block number and we read or write that position
- * in the file.
- *
- * Before we serviced each virtqueue in a separate thread, that was unacceptably
- * slow: the Guest waits until the read is finished before running anything
- * else, even if it could have been doing useful work.
- *
- * We could have used async I/O, except it's reputed to suck so hard that
- * characters actually go missing from your code when you try to use it.
- */
-static void blk_request(struct virtqueue *vq)
-{
- struct vblk_info *vblk = vq->dev->priv;
- unsigned int head, out_num, in_num, wlen;
- int ret;
- u8 *in;
- struct virtio_blk_outhdr *out;
- struct iovec iov[vq->vring.num];
- off64_t off;
-
- /*
- * Get the next request, where we normally wait. It triggers the
- * interrupt to acknowledge previously serviced requests (if any).
- */
- head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
-
- /*
- * Every block request should contain at least one output buffer
- * (detailing the location on disk and the type of request) and one
- * input buffer (to hold the result).
- */
- if (out_num == 0 || in_num == 0)
- errx(1, "Bad virtblk cmd %u out=%u in=%u",
- head, out_num, in_num);
-
- out = convert(&iov[0], struct virtio_blk_outhdr);
- in = convert(&iov[out_num+in_num-1], u8);
- /*
- * For historical reasons, block operations are expressed in 512 byte
- * "sectors".
- */
- off = out->sector * 512;
-
- /*
- * In general the virtio block driver is allowed to try SCSI commands.
- * It'd be nice if we supported eject, for example, but we don't.
- */
- if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
- fprintf(stderr, "Scsi commands unsupported\n");
- *in = VIRTIO_BLK_S_UNSUPP;
- wlen = sizeof(*in);
- } else if (out->type & VIRTIO_BLK_T_OUT) {
- /*
- * Write
- *
- * Move to the right location in the block file. This can fail
- * if they try to write past end.
- */
- if (lseek64(vblk->fd, off, SEEK_SET) != off)
- err(1, "Bad seek to sector %llu", out->sector);
-
- ret = writev(vblk->fd, iov+1, out_num-1);
- verbose("WRITE to sector %llu: %i\n", out->sector, ret);
-
- /*
- * Grr... Now we know how long the descriptor they sent was, we
- * make sure they didn't try to write over the end of the block
- * file (possibly extending it).
- */
- if (ret > 0 && off + ret > vblk->len) {
- /* Trim it back to the correct length */
- ftruncate64(vblk->fd, vblk->len);
- /* Die, bad Guest, die. */
- errx(1, "Write past end %llu+%u", off, ret);
- }
-
- wlen = sizeof(*in);
- *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
- } else if (out->type & VIRTIO_BLK_T_FLUSH) {
- /* Flush */
- ret = fdatasync(vblk->fd);
- verbose("FLUSH fdatasync: %i\n", ret);
- wlen = sizeof(*in);
- *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
- } else {
- /*
- * Read
- *
- * Move to the right location in the block file. This can fail
- * if they try to read past end.
- */
- if (lseek64(vblk->fd, off, SEEK_SET) != off)
- err(1, "Bad seek to sector %llu", out->sector);
-
- ret = readv(vblk->fd, iov+1, in_num-1);
- verbose("READ from sector %llu: %i\n", out->sector, ret);
- if (ret >= 0) {
- wlen = sizeof(*in) + ret;
- *in = VIRTIO_BLK_S_OK;
- } else {
- wlen = sizeof(*in);
- *in = VIRTIO_BLK_S_IOERR;
- }
- }
-
- /* Finished that request. */
- add_used(vq, head, wlen);
-}
-
-/*L:198 This actually sets up a virtual block device. */
-static void setup_block_file(const char *filename)
-{
- struct device *dev;
- struct vblk_info *vblk;
- struct virtio_blk_config conf;
-
- /* Creat the device. */
- dev = new_device("block", VIRTIO_ID_BLOCK);
-
- /* The device has one virtqueue, where the Guest places requests. */
- add_virtqueue(dev, VIRTQUEUE_NUM, blk_request);
-
- /* Allocate the room for our own bookkeeping */
- vblk = dev->priv = malloc(sizeof(*vblk));
-
- /* First we open the file and store the length. */
- vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE);
- vblk->len = lseek64(vblk->fd, 0, SEEK_END);
-
- /* We support FLUSH. */
- add_feature(dev, VIRTIO_BLK_F_FLUSH);
-
- /* Tell Guest how many sectors this device has. */
- conf.capacity = cpu_to_le64(vblk->len / 512);
-
- /*
- * Tell Guest not to put in too many descriptors at once: two are used
- * for the in and out elements.
- */
- add_feature(dev, VIRTIO_BLK_F_SEG_MAX);
- conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2);
-
- /* Don't try to put whole struct: we have 8 bit limit. */
- set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf);
-
- verbose("device %u: virtblock %llu sectors\n",
- ++devices.device_num, le64_to_cpu(conf.capacity));
-}
-
-/*L:211
- * Our random number generator device reads from /dev/random into the Guest's
- * input buffers. The usual case is that the Guest doesn't want random numbers
- * and so has no buffers although /dev/random is still readable, whereas
- * console is the reverse.
- *
- * The same logic applies, however.
- */
-struct rng_info {
- int rfd;
-};
-
-static void rng_input(struct virtqueue *vq)
-{
- int len;
- unsigned int head, in_num, out_num, totlen = 0;
- struct rng_info *rng_info = vq->dev->priv;
- struct iovec iov[vq->vring.num];
-
- /* First we need a buffer from the Guests's virtqueue. */
- head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
- if (out_num)
- errx(1, "Output buffers in rng?");
-
- /*
- * Just like the console write, we loop to cover the whole iovec.
- * In this case, short reads actually happen quite a bit.
- */
- while (!iov_empty(iov, in_num)) {
- len = readv(rng_info->rfd, iov, in_num);
- if (len <= 0)
- err(1, "Read from /dev/random gave %i", len);
- iov_consume(iov, in_num, len);
- totlen += len;
- }
-
- /* Tell the Guest about the new input. */
- add_used(vq, head, totlen);
-}
-
-/*L:199
- * This creates a "hardware" random number device for the Guest.
- */
-static void setup_rng(void)
-{
- struct device *dev;
- struct rng_info *rng_info = malloc(sizeof(*rng_info));
-
- /* Our device's privat info simply contains the /dev/random fd. */
- rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
-
- /* Create the new device. */
- dev = new_device("rng", VIRTIO_ID_RNG);
- dev->priv = rng_info;
-
- /* The device has one virtqueue, where the Guest places inbufs. */
- add_virtqueue(dev, VIRTQUEUE_NUM, rng_input);
-
- verbose("device %u: rng\n", devices.device_num++);
-}
-/* That's the end of device setup. */
-
-/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */
-static void __attribute__((noreturn)) restart_guest(void)
-{
- unsigned int i;
-
- /*
- * Since we don't track all open fds, we simply close everything beyond
- * stderr.
- */
- for (i = 3; i < FD_SETSIZE; i++)
- close(i);
-
- /* Reset all the devices (kills all threads). */
- cleanup_devices();
-
- execv(main_args[0], main_args);
- err(1, "Could not exec %s", main_args[0]);
-}
-
-/*L:220
- * Finally we reach the core of the Launcher which runs the Guest, serves
- * its input and output, and finally, lays it to rest.
- */
-static void __attribute__((noreturn)) run_guest(void)
-{
- for (;;) {
- unsigned long notify_addr;
- int readval;
-
- /* We read from the /dev/lguest device to run the Guest. */
- readval = pread(lguest_fd, ¬ify_addr,
- sizeof(notify_addr), cpu_id);
-
- /* One unsigned long means the Guest did HCALL_NOTIFY */
- if (readval == sizeof(notify_addr)) {
- verbose("Notify on address %#lx\n", notify_addr);
- handle_output(notify_addr);
- /* ENOENT means the Guest died. Reading tells us why. */
- } else if (errno == ENOENT) {
- char reason[1024] = { 0 };
- pread(lguest_fd, reason, sizeof(reason)-1, cpu_id);
- errx(1, "%s", reason);
- /* ERESTART means that we need to reboot the guest */
- } else if (errno == ERESTART) {
- restart_guest();
- /* Anything else means a bug or incompatible change. */
- } else
- err(1, "Running guest failed");
- }
-}
-/*L:240
- * This is the end of the Launcher. The good news: we are over halfway
- * through! The bad news: the most fiendish part of the code still lies ahead
- * of us.
- *
- * Are you ready? Take a deep breath and join me in the core of the Host, in
- * "make Host".
-:*/
-
-static struct option opts[] = {
- { "verbose", 0, NULL, 'v' },
- { "tunnet", 1, NULL, 't' },
- { "block", 1, NULL, 'b' },
- { "rng", 0, NULL, 'r' },
- { "initrd", 1, NULL, 'i' },
- { "username", 1, NULL, 'u' },
- { "chroot", 1, NULL, 'c' },
- { NULL },
-};
-static void usage(void)
-{
- errx(1, "Usage: lguest [--verbose] "
- "[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n"
- "|--block=<filename>|--initrd=<filename>]...\n"
- "<mem-in-mb> vmlinux [args...]");
-}
-
-/*L:105 The main routine is where the real work begins: */
-int main(int argc, char *argv[])
-{
- /* Memory, code startpoint and size of the (optional) initrd. */
- unsigned long mem = 0, start, initrd_size = 0;
- /* Two temporaries. */
- int i, c;
- /* The boot information for the Guest. */
- struct boot_params *boot;
- /* If they specify an initrd file to load. */
- const char *initrd_name = NULL;
-
- /* Password structure for initgroups/setres[gu]id */
- struct passwd *user_details = NULL;
-
- /* Directory to chroot to */
- char *chroot_path = NULL;
-
- /* Save the args: we "reboot" by execing ourselves again. */
- main_args = argv;
-
- /*
- * First we initialize the device list. We keep a pointer to the last
- * device, and the next interrupt number to use for devices (1:
- * remember that 0 is used by the timer).
- */
- devices.lastdev = NULL;
- devices.next_irq = 1;
-
- /* We're CPU 0. In fact, that's the only CPU possible right now. */
- cpu_id = 0;
-
- /*
- * We need to know how much memory so we can set up the device
- * descriptor and memory pages for the devices as we parse the command
- * line. So we quickly look through the arguments to find the amount
- * of memory now.
- */
- for (i = 1; i < argc; i++) {
- if (argv[i][0] != '-') {
- mem = atoi(argv[i]) * 1024 * 1024;
- /*
- * We start by mapping anonymous pages over all of
- * guest-physical memory range. This fills it with 0,
- * and ensures that the Guest won't be killed when it
- * tries to access it.
- */
- guest_base = map_zeroed_pages(mem / getpagesize()
- + DEVICE_PAGES);
- guest_limit = mem;
- guest_max = mem + DEVICE_PAGES*getpagesize();
- devices.descpage = get_pages(1);
- break;
- }
- }
-
- /* The options are fairly straight-forward */
- while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) {
- switch (c) {
- case 'v':
- verbose = true;
- break;
- case 't':
- setup_tun_net(optarg);
- break;
- case 'b':
- setup_block_file(optarg);
- break;
- case 'r':
- setup_rng();
- break;
- case 'i':
- initrd_name = optarg;
- break;
- case 'u':
- user_details = getpwnam(optarg);
- if (!user_details)
- err(1, "getpwnam failed, incorrect username?");
- break;
- case 'c':
- chroot_path = optarg;
- break;
- default:
- warnx("Unknown argument %s", argv[optind]);
- usage();
- }
- }
- /*
- * After the other arguments we expect memory and kernel image name,
- * followed by command line arguments for the kernel.
- */
- if (optind + 2 > argc)
- usage();
-
- verbose("Guest base is at %p\n", guest_base);
-
- /* We always have a console device */
- setup_console();
-
- /* Now we load the kernel */
- start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
-
- /* Boot information is stashed at physical address 0 */
- boot = from_guest_phys(0);
-
- /* Map the initrd image if requested (at top of physical memory) */
- if (initrd_name) {
- initrd_size = load_initrd(initrd_name, mem);
- /*
- * These are the location in the Linux boot header where the
- * start and size of the initrd are expected to be found.
- */
- boot->hdr.ramdisk_image = mem - initrd_size;
- boot->hdr.ramdisk_size = initrd_size;
- /* The bootloader type 0xFF means "unknown"; that's OK. */
- boot->hdr.type_of_loader = 0xFF;
- }
-
- /*
- * The Linux boot header contains an "E820" memory map: ours is a
- * simple, single region.
- */
- boot->e820_entries = 1;
- boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM });
- /*
- * The boot header contains a command line pointer: we put the command
- * line after the boot header.
- */
- boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1);
- /* We use a simple helper to copy the arguments separated by spaces. */
- concat((char *)(boot + 1), argv+optind+2);
-
- /* Boot protocol version: 2.07 supports the fields for lguest. */
- boot->hdr.version = 0x207;
-
- /* The hardware_subarch value of "1" tells the Guest it's an lguest. */
- boot->hdr.hardware_subarch = 1;
-
- /* Tell the entry path not to try to reload segment registers. */
- boot->hdr.loadflags |= KEEP_SEGMENTS;
-
- /*
- * We tell the kernel to initialize the Guest: this returns the open
- * /dev/lguest file descriptor.
- */
- tell_kernel(start);
-
- /* Ensure that we terminate if a device-servicing child dies. */
- signal(SIGCHLD, kill_launcher);
-
- /* If we exit via err(), this kills all the threads, restores tty. */
- atexit(cleanup_devices);
-
- /* If requested, chroot to a directory */
- if (chroot_path) {
- if (chroot(chroot_path) != 0)
- err(1, "chroot(\"%s\") failed", chroot_path);
-
- if (chdir("/") != 0)
- err(1, "chdir(\"/\") failed");
-
- verbose("chroot done\n");
- }
-
- /* If requested, drop privileges */
- if (user_details) {
- uid_t u;
- gid_t g;
-
- u = user_details->pw_uid;
- g = user_details->pw_gid;
-
- if (initgroups(user_details->pw_name, g) != 0)
- err(1, "initgroups failed");
-
- if (setresgid(g, g, g) != 0)
- err(1, "setresgid failed");
-
- if (setresuid(u, u, u) != 0)
- err(1, "setresuid failed");
-
- verbose("Dropping privileges completed\n");
- }
-
- /* Finally, run the Guest. This doesn't return. */
- run_guest();
-}
-/*:*/
-
-/*M:999
- * Mastery is done: you now know everything I do.
- *
- * But surely you have seen code, features and bugs in your wanderings which
- * you now yearn to attack? That is the real game, and I look forward to you
- * patching and forking lguest into the Your-Name-Here-visor.
- *
- * Farewell, and good coding!
- * Rusty Russell.
- */
+++ /dev/null
- __
- (___()'`; Rusty's Remarkably Unreliable Guide to Lguest
- /, /` - or, A Young Coder's Illustrated Hypervisor
- \\"--\\ http://lguest.ozlabs.org
-
-Lguest is designed to be a minimal 32-bit x86 hypervisor for the Linux kernel,
-for Linux developers and users to experiment with virtualization with the
-minimum of complexity. Nonetheless, it should have sufficient features to
-make it useful for specific tasks, and, of course, you are encouraged to fork
-and enhance it (see drivers/lguest/README).
-
-Features:
-
-- Kernel module which runs in a normal kernel.
-- Simple I/O model for communication.
-- Simple program to create new guests.
-- Logo contains cute puppies: http://lguest.ozlabs.org
-
-Developer features:
-
-- Fun to hack on.
-- No ABI: being tied to a specific kernel anyway, you can change anything.
-- Many opportunities for improvement or feature implementation.
-
-Running Lguest:
-
-- The easiest way to run lguest is to use same kernel as guest and host.
- You can configure them differently, but usually it's easiest not to.
-
- You will need to configure your kernel with the following options:
-
- "General setup":
- "Prompt for development and/or incomplete code/drivers" = Y
- (CONFIG_EXPERIMENTAL=y)
-
- "Processor type and features":
- "Paravirtualized guest support" = Y
- "Lguest guest support" = Y
- "High Memory Support" = off/4GB
- "Alignment value to which kernel should be aligned" = 0x100000
- (CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
- CONFIG_PHYSICAL_ALIGN=0x100000)
-
- "Device Drivers":
- "Block devices"
- "Virtio block driver (EXPERIMENTAL)" = M/Y
- "Network device support"
- "Universal TUN/TAP device driver support" = M/Y
- "Virtio network driver (EXPERIMENTAL)" = M/Y
- (CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m and CONFIG_TUN=m)
-
- "Virtualization"
- "Linux hypervisor example code" = M/Y
- (CONFIG_LGUEST=m)
-
-- A tool called "lguest" is available in this directory: type "make"
- to build it. If you didn't build your kernel in-tree, use "make
- O=<builddir>".
-
-- Create or find a root disk image. There are several useful ones
- around, such as the xm-test tiny root image at
- http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img
-
- For more serious work, I usually use a distribution ISO image and
- install it under qemu, then make multiple copies:
-
- dd if=/dev/zero of=rootfile bs=1M count=2048
- qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d
-
- Make sure that you install a getty on /dev/hvc0 if you want to log in on the
- console!
-
-- "modprobe lg" if you built it as a module.
-
-- Run an lguest as root:
-
- Documentation/lguest/lguest 64 vmlinux --tunnet=192.168.19.1 --block=rootfile root=/dev/vda
-
- Explanation:
- 64: the amount of memory to use, in MB.
-
- vmlinux: the kernel image found in the top of your build directory. You
- can also use a standard bzImage.
-
- --tunnet=192.168.19.1: configures a "tap" device for networking with this
- IP address.
-
- --block=rootfile: a file or block device which becomes /dev/vda
- inside the guest.
-
- root=/dev/vda: this (and anything else on the command line) are
- kernel boot parameters.
-
-- Configuring networking. I usually have the host masquerade, using
- "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 >
- /proc/sys/net/ipv4/ip_forward". In this example, I would configure
- eth0 inside the guest at 192.168.19.2.
-
- Another method is to bridge the tap device to an external interface
- using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest
- to obtain an IP address. The bridge needs to be configured first:
- this option simply adds the tap interface to it.
-
- A simple example on my system:
-
- ifconfig eth0 0.0.0.0
- brctl addbr lg0
- ifconfig lg0 up
- brctl addif lg0 eth0
- dhclient lg0
-
- Then use --tunnet=bridge:lg0 when launching the guest.
-
- See:
-
- http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
-
- for general information on how to get bridging to work.
-
-- Random number generation. Using the --rng option will provide a
- /dev/hwrng in the guest that will read from the host's /dev/random.
- Use this option in conjunction with rng-tools (see ../hw_random.txt)
- to provide entropy to the guest kernel's /dev/random.
-
-There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest
-
-Good luck!
-Rusty Russell rusty@rustcorp.com.au.
time.) Unlike the other suspend-related phases, during the prepare
phase the device tree is traversed top-down.
- The prepare phase uses only a bus callback. After the callback method
- returns, no new children may be registered below the device. The method
- may also prepare the device or driver in some way for the upcoming
- system power transition, but it should not put the device into a
- low-power state.
+ In addition to that, if device drivers need to allocate additional
+ memory to be able to hadle device suspend correctly, that should be
+ done in the prepare phase.
+
+ After the prepare callback method returns, no new children may be
+ registered below the device. The method may also prepare the device or
+ driver in some way for the upcoming system power transition (for
+ example, by allocating additional memory required for this purpose), but
+ it should not put the device into a low-power state.
2. The suspend methods should quiesce the device to stop it from performing
I/O. They also may save the device registers and put it into the
Suspend notifiers
- (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
-
-There are some operations that device drivers may want to carry out in their
-.suspend() routines, but shouldn't, because they can cause the hibernation or
-suspend to fail. For example, a driver may want to allocate a substantial amount
-of memory (like 50 MB) in .suspend(), but that shouldn't be done after the
-swsusp's memory shrinker has run.
-
-Also, there may be some operations, that subsystems want to carry out before a
-hibernation/suspend or after a restore/resume, requiring the system to be fully
-functional, so the drivers' .suspend() and .resume() routines are not suitable
-for this purpose. For example, device drivers may want to upload firmware to
-their devices after a restore from a hibernation image, but they cannot do it by
-calling request_firmware() from their .resume() routines (user land processes
-are frozen at this point). The solution may be to load the firmware into
-memory before processes are frozen and upload it from there in the .resume()
-routine. Of course, a hibernation notifier may be used for this purpose.
-
-The subsystems that have such needs can register suspend notifiers that will be
-called upon the following events by the suspend core:
+ (C) 2007-2011 Rafael J. Wysocki <rjw@sisk.pl>, GPL
+
+There are some operations that subsystems or drivers may want to carry out
+before hibernation/suspend or after restore/resume, but they require the system
+to be fully functional, so the drivers' and subsystems' .suspend() and .resume()
+or even .prepare() and .complete() callbacks are not suitable for this purpose.
+For example, device drivers may want to upload firmware to their devices after
+resume/restore, but they cannot do it by calling request_firmware() from their
+.resume() or .complete() routines (user land processes are frozen at these
+points). The solution may be to load the firmware into memory before processes
+are frozen and upload it from there in the .resume() routine.
+A suspend/hibernation notifier may be used for this purpose.
+
+The subsystems or drivers having such needs can register suspend notifiers that
+will be called upon the following events by the PM core:
PM_HIBERNATION_PREPARE The system is going to hibernate or suspend, tasks will
be frozen immediately.
PM_POST_HIBERNATION The system memory state has been restored from a
- hibernation image or an error occurred during the
- hibernation. Device drivers' .resume() callbacks have
+ hibernation image or an error occurred during
+ hibernation. Device drivers' restore callbacks have
been executed and tasks have been thawed.
PM_RESTORE_PREPARE The system is going to restore a hibernation image.
- If all goes well the restored kernel will issue a
+ If all goes well, the restored kernel will issue a
PM_POST_HIBERNATION notification.
-PM_POST_RESTORE An error occurred during the hibernation restore.
- Device drivers' .resume() callbacks have been executed
+PM_POST_RESTORE An error occurred during restore from hibernation.
+ Device drivers' restore callbacks have been executed
and tasks have been thawed.
-PM_SUSPEND_PREPARE The system is preparing for a suspend.
+PM_SUSPEND_PREPARE The system is preparing for suspend.
PM_POST_SUSPEND The system has just resumed or an error occurred during
- the suspend. Device drivers' .resume() callbacks have
- been executed and tasks have been thawed.
+ suspend. Device drivers' resume callbacks have been
+ executed and tasks have been thawed.
It is generally assumed that whatever the notifiers do for
PM_HIBERNATION_PREPARE, should be undone for PM_POST_HIBERNATION. Analogously,
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
field:int common_pid; offset:4; size:4; signed:1;
- field:int common_lock_depth; offset:8; size:4; signed:1;
field:unsigned long __probe_ip; offset:12; size:4; signed:0;
field:int __probe_nargs; offset:16; size:4; signed:1;
+++ /dev/null
- User Mode Linux HOWTO
- User Mode Linux Core Team
- Mon Nov 18 14:16:16 EST 2002
-
- This document describes the use and abuse of Jeff Dike's User Mode
- Linux: a port of the Linux kernel as a normal Intel Linux process.
- ______________________________________________________________________
-
- Table of Contents
-
- 1. Introduction
-
- 1.1 How is User Mode Linux Different?
- 1.2 Why Would I Want User Mode Linux?
-
- 2. Compiling the kernel and modules
-
- 2.1 Compiling the kernel
- 2.2 Compiling and installing kernel modules
- 2.3 Compiling and installing uml_utilities
-
- 3. Running UML and logging in
-
- 3.1 Running UML
- 3.2 Logging in
- 3.3 Examples
-
- 4. UML on 2G/2G hosts
-
- 4.1 Introduction
- 4.2 The problem
- 4.3 The solution
-
- 5. Setting up serial lines and consoles
-
- 5.1 Specifying the device
- 5.2 Specifying the channel
- 5.3 Examples
-
- 6. Setting up the network
-
- 6.1 General setup
- 6.2 Userspace daemons
- 6.3 Specifying ethernet addresses
- 6.4 UML interface setup
- 6.5 Multicast
- 6.6 TUN/TAP with the uml_net helper
- 6.7 TUN/TAP with a preconfigured tap device
- 6.8 Ethertap
- 6.9 The switch daemon
- 6.10 Slip
- 6.11 Slirp
- 6.12 pcap
- 6.13 Setting up the host yourself
-
- 7. Sharing Filesystems between Virtual Machines
-
- 7.1 A warning
- 7.2 Using layered block devices
- 7.3 Note!
- 7.4 Another warning
- 7.5 uml_moo : Merging a COW file with its backing file
-
- 8. Creating filesystems
-
- 8.1 Create the filesystem file
- 8.2 Assign the file to a UML device
- 8.3 Creating and mounting the filesystem
-
- 9. Host file access
-
- 9.1 Using hostfs
- 9.2 hostfs as the root filesystem
- 9.3 Building hostfs
-
- 10. The Management Console
- 10.1 version
- 10.2 halt and reboot
- 10.3 config
- 10.4 remove
- 10.5 sysrq
- 10.6 help
- 10.7 cad
- 10.8 stop
- 10.9 go
-
- 11. Kernel debugging
-
- 11.1 Starting the kernel under gdb
- 11.2 Examining sleeping processes
- 11.3 Running ddd on UML
- 11.4 Debugging modules
- 11.5 Attaching gdb to the kernel
- 11.6 Using alternate debuggers
-
- 12. Kernel debugging examples
-
- 12.1 The case of the hung fsck
- 12.2 Episode 2: The case of the hung fsck
-
- 13. What to do when UML doesn't work
-
- 13.1 Strange compilation errors when you build from source
- 13.2 (obsolete)
- 13.3 A variety of panics and hangs with /tmp on a reiserfs filesystem
- 13.4 The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'
- 13.5 UML doesn't work when /tmp is an NFS filesystem
- 13.6 UML hangs on boot when compiled with gprof support
- 13.7 syslogd dies with a SIGTERM on startup
- 13.8 TUN/TAP networking doesn't work on a 2.4 host
- 13.9 You can network to the host but not to other machines on the net
- 13.10 I have no root and I want to scream
- 13.11 UML build conflict between ptrace.h and ucontext.h
- 13.12 The UML BogoMips is exactly half the host's BogoMips
- 13.13 When you run UML, it immediately segfaults
- 13.14 xterms appear, then immediately disappear
- 13.15 Any other panic, hang, or strange behavior
-
- 14. Diagnosing Problems
-
- 14.1 Case 1 : Normal kernel panics
- 14.2 Case 2 : Tracing thread panics
- 14.3 Case 3 : Tracing thread panics caused by other threads
- 14.4 Case 4 : Hangs
-
- 15. Thanks
-
- 15.1 Code and Documentation
- 15.2 Flushing out bugs
- 15.3 Buglets and clean-ups
- 15.4 Case Studies
- 15.5 Other contributions
-
-
- ______________________________________________________________________
-
- 1\b1.\b. I\bIn\bnt\btr\bro\bod\bdu\buc\bct\bti\bio\bon\bn
-
- Welcome to User Mode Linux. It's going to be fun.
-
-
-
- 1\b1.\b.1\b1.\b. H\bHo\bow\bw i\bis\bs U\bUs\bse\ber\br M\bMo\bod\bde\be L\bLi\bin\bnu\bux\bx D\bDi\bif\bff\bfe\ber\bre\ben\bnt\bt?\b?
-
- Normally, the Linux Kernel talks straight to your hardware (video
- card, keyboard, hard drives, etc), and any programs which run ask the
- kernel to operate the hardware, like so:
-
-
-
- +-----------+-----------+----+
- | Process 1 | Process 2 | ...|
- +-----------+-----------+----+
- | Linux Kernel |
- +----------------------------+
- | Hardware |
- +----------------------------+
-
-
-
-
- The User Mode Linux Kernel is different; instead of talking to the
- hardware, it talks to a `real' Linux kernel (called the `host kernel'
- from now on), like any other program. Programs can then run inside
- User-Mode Linux as if they were running under a normal kernel, like
- so:
-
-
-
- +----------------+
- | Process 2 | ...|
- +-----------+----------------+
- | Process 1 | User-Mode Linux|
- +----------------------------+
- | Linux Kernel |
- +----------------------------+
- | Hardware |
- +----------------------------+
-
-
-
-
-
- 1\b1.\b.2\b2.\b. W\bWh\bhy\by W\bWo\bou\bul\bld\bd I\bI W\bWa\ban\bnt\bt U\bUs\bse\ber\br M\bMo\bod\bde\be L\bLi\bin\bnu\bux\bx?\b?
-
-
- 1. If User Mode Linux crashes, your host kernel is still fine.
-
- 2. You can run a usermode kernel as a non-root user.
-
- 3. You can debug the User Mode Linux like any normal process.
-
- 4. You can run gprof (profiling) and gcov (coverage testing).
-
- 5. You can play with your kernel without breaking things.
-
- 6. You can use it as a sandbox for testing new apps.
-
- 7. You can try new development kernels safely.
-
- 8. You can run different distributions simultaneously.
-
- 9. It's extremely fun.
-
-
-
-
-
- 2\b2.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl a\ban\bnd\bd m\bmo\bod\bdu\bul\ble\bes\bs
-
-
-
-
- 2\b2.\b.1\b1.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl
-
-
- Compiling the user mode kernel is just like compiling any other
- kernel. Let's go through the steps, using 2.4.0-prerelease (current
- as of this writing) as an example:
-
-
- 1. Download the latest UML patch from
-
- the download page <http://user-mode-linux.sourceforge.net/
-
- In this example, the file is uml-patch-2.4.0-prerelease.bz2.
-
-
- 2. Download the matching kernel from your favourite kernel mirror,
- such as:
-
- ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2
- <ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2>
- .
-
-
- 3. Make a directory and unpack the kernel into it.
-
-
-
- host%
- mkdir ~/uml
-
-
-
-
-
-
- host%
- cd ~/uml
-
-
-
-
-
-
- host%
- tar -xzvf linux-2.4.0-prerelease.tar.bz2
-
-
-
-
-
-
- 4. Apply the patch using
-
-
-
- host%
- cd ~/uml/linux
-
-
-
- host%
- bzcat uml-patch-2.4.0-prerelease.bz2 | patch -p1
-
-
-
-
-
-
- 5. Run your favorite config; `make xconfig ARCH=um' is the most
- convenient. `make config ARCH=um' and 'make menuconfig ARCH=um'
- will work as well. The defaults will give you a useful kernel. If
- you want to change something, go ahead, it probably won't hurt
- anything.
-
-
- Note: If the host is configured with a 2G/2G address space split
- rather than the usual 3G/1G split, then the packaged UML binaries
- will not run. They will immediately segfault. See ``UML on 2G/2G
- hosts'' for the scoop on running UML on your system.
-
-
-
- 6. Finish with `make linux ARCH=um': the result is a file called
- `linux' in the top directory of your source tree.
-
- Make sure that you don't build this kernel in /usr/src/linux. On some
- distributions, /usr/include/asm is a link into this pool. The user-
- mode build changes the other end of that link, and things that include
- <asm/anything.h> stop compiling.
-
- The sources are also available from cvs at the project's cvs page,
- which has directions on getting the sources. You can also browse the
- CVS pool from there.
-
- If you get the CVS sources, you will have to check them out into an
- empty directory. You will then have to copy each file into the
- corresponding directory in the appropriate kernel pool.
-
- If you don't have the latest kernel pool, you can get the
- corresponding user-mode sources with
-
-
- host% cvs co -r v_2_3_x linux
-
-
-
-
- where 'x' is the version in your pool. Note that you will not get the
- bug fixes and enhancements that have gone into subsequent releases.
-
-
- 2\b2.\b.2\b2.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg a\ban\bnd\bd i\bin\bns\bst\bta\bal\bll\bli\bin\bng\bg k\bke\ber\brn\bne\bel\bl m\bmo\bod\bdu\bul\ble\bes\bs
-
- UML modules are built in the same way as the native kernel (with the
- exception of the 'ARCH=um' that you always need for UML):
-
-
- host% make modules ARCH=um
-
-
-
-
- Any modules that you want to load into this kernel need to be built in
- the user-mode pool. Modules from the native kernel won't work.
-
- You can install them by using ftp or something to copy them into the
- virtual machine and dropping them into /lib/modules/`uname -r`.
-
- You can also get the kernel build process to install them as follows:
-
- 1. with the kernel not booted, mount the root filesystem in the top
- level of the kernel pool:
-
-
- host% mount root_fs mnt -o loop
-
-
-
-
-
-
- 2. run
-
-
- host%
- make modules_install INSTALL_MOD_PATH=`pwd`/mnt ARCH=um
-
-
-
-
-
-
- 3. unmount the filesystem
-
-
- host% umount mnt
-
-
-
-
-
-
- 4. boot the kernel on it
-
-
- When the system is booted, you can use insmod as usual to get the
- modules into the kernel. A number of things have been loaded into UML
- as modules, especially filesystems and network protocols and filters,
- so most symbols which need to be exported probably already are.
- However, if you do find symbols that need exporting, let us
- <http://user-mode-linux.sourceforge.net/> know, and
- they'll be "taken care of".
-
-
-
- 2\b2.\b.3\b3.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg a\ban\bnd\bd i\bin\bns\bst\bta\bal\bll\bli\bin\bng\bg u\bum\bml\bl_\b_u\but\bti\bil\bli\bit\bti\bie\bes\bs
-
- Many features of the UML kernel require a user-space helper program,
- so a uml_utilities package is distributed separately from the kernel
- patch which provides these helpers. Included within this is:
-
- +\bo port-helper - Used by consoles which connect to xterms or ports
-
- +\bo tunctl - Configuration tool to create and delete tap devices
-
- +\bo uml_net - Setuid binary for automatic tap device configuration
-
- +\bo uml_switch - User-space virtual switch required for daemon
- transport
-
- The uml_utilities tree is compiled with:
-
-
- host#
- make && make install
-
-
-
-
- Note that UML kernel patches may require a specific version of the
- uml_utilities distribution. If you don't keep up with the mailing
- lists, ensure that you have the latest release of uml_utilities if you
- are experiencing problems with your UML kernel, particularly when
- dealing with consoles or command-line switches to the helper programs
-
-
-
-
-
-
-
-
- 3\b3.\b. R\bRu\bun\bnn\bni\bin\bng\bg U\bUM\bML\bL a\ban\bnd\bd l\blo\bog\bgg\bgi\bin\bng\bg i\bin\bn
-
-
-
- 3\b3.\b.1\b1.\b. R\bRu\bun\bnn\bni\bin\bng\bg U\bUM\bML\bL
-
- It runs on 2.2.15 or later, and all 2.4 kernels.
-
-
- Booting UML is straightforward. Simply run 'linux': it will try to
- mount the file `root_fs' in the current directory. You do not need to
- run it as root. If your root filesystem is not named `root_fs', then
- you need to put a `ubd0=root_fs_whatever' switch on the linux command
- line.
-
-
- You will need a filesystem to boot UML from. There are a number
- available for download from here <http://user-mode-
- linux.sourceforge.net/> . There are also several tools
- <http://user-mode-linux.sourceforge.net/> which can be
- used to generate UML-compatible filesystem images from media.
- The kernel will boot up and present you with a login prompt.
-
-
- Note: If the host is configured with a 2G/2G address space split
- rather than the usual 3G/1G split, then the packaged UML binaries will
- not run. They will immediately segfault. See ``UML on 2G/2G hosts''
- for the scoop on running UML on your system.
-
-
-
- 3\b3.\b.2\b2.\b. L\bLo\bog\bgg\bgi\bin\bng\bg i\bin\bn
-
-
-
- The prepackaged filesystems have a root account with password 'root'
- and a user account with password 'user'. The login banner will
- generally tell you how to log in. So, you log in and you will find
- yourself inside a little virtual machine. Our filesystems have a
- variety of commands and utilities installed (and it is fairly easy to
- add more), so you will have a lot of tools with which to poke around
- the system.
-
- There are a couple of other ways to log in:
-
- +\bo On a virtual console
-
-
-
- Each virtual console that is configured (i.e. the device exists in
- /dev and /etc/inittab runs a getty on it) will come up in its own
- xterm. If you get tired of the xterms, read ``Setting up serial
- lines and consoles'' to see how to attach the consoles to
- something else, like host ptys.
-
-
-
- +\bo Over the serial line
-
-
- In the boot output, find a line that looks like:
-
-
-
- serial line 0 assigned pty /dev/ptyp1
-
-
-
-
- Attach your favorite terminal program to the corresponding tty. I.e.
- for minicom, the command would be
-
-
- host% minicom -o -p /dev/ttyp1
-
-
-
-
-
-
- +\bo Over the net
-
-
- If the network is running, then you can telnet to the virtual
- machine and log in to it. See ``Setting up the network'' to learn
- about setting up a virtual network.
-
- When you're done using it, run halt, and the kernel will bring itself
- down and the process will exit.
-
-
- 3\b3.\b.3\b3.\b. E\bEx\bxa\bam\bmp\bpl\ble\bes\bs
-
- Here are some examples of UML in action:
-
- +\bo A login session <http://user-mode-linux.sourceforge.net/login.html>
-
- +\bo A virtual network <http://user-mode-linux.sourceforge.net/net.html>
-
-
-
-
-
-
-
- 4\b4.\b. U\bUM\bML\bL o\bon\bn 2\b2G\bG/\b/2\b2G\bG h\bho\bos\bst\bts\bs
-
-
-
-
- 4\b4.\b.1\b1.\b. I\bIn\bnt\btr\bro\bod\bdu\buc\bct\bti\bio\bon\bn
-
-
- Most Linux machines are configured so that the kernel occupies the
- upper 1G (0xc0000000 - 0xffffffff) of the 4G address space and
- processes use the lower 3G (0x00000000 - 0xbfffffff). However, some
- machine are configured with a 2G/2G split, with the kernel occupying
- the upper 2G (0x80000000 - 0xffffffff) and processes using the lower
- 2G (0x00000000 - 0x7fffffff).
-
-
-
-
- 4\b4.\b.2\b2.\b. T\bTh\bhe\be p\bpr\bro\bob\bbl\ble\bem\bm
-
-
- The prebuilt UML binaries on this site will not run on 2G/2G hosts
- because UML occupies the upper .5G of the 3G process address space
- (0xa0000000 - 0xbfffffff). Obviously, on 2G/2G hosts, this is right
- in the middle of the kernel address space, so UML won't even load - it
- will immediately segfault.
-
-
-
-
- 4\b4.\b.3\b3.\b. T\bTh\bhe\be s\bso\bol\blu\but\bti\bio\bon\bn
-
-
- The fix for this is to rebuild UML from source after enabling
- CONFIG_HOST_2G_2G (under 'General Setup'). This will cause UML to
- load itself in the top .5G of that smaller process address space,
- where it will run fine. See ``Compiling the kernel and modules'' if
- you need help building UML from source.
-
-
-
-
-
-
-
-
-
-
- 5\b5.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp s\bse\ber\bri\bia\bal\bl l\bli\bin\bne\bes\bs a\ban\bnd\bd c\bco\bon\bns\bso\bol\ble\bes\bs
-
-
- It is possible to attach UML serial lines and consoles to many types
- of host I/O channels by specifying them on the command line.
-
-
- You can attach them to host ptys, ttys, file descriptors, and ports.
- This allows you to do things like
-
- +\bo have a UML console appear on an unused host console,
-
- +\bo hook two virtual machines together by having one attach to a pty
- and having the other attach to the corresponding tty
-
- +\bo make a virtual machine accessible from the net by attaching a
- console to a port on the host.
-
-
- The general format of the command line option is device=channel.
-
-
-
- 5\b5.\b.1\b1.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg t\bth\bhe\be d\bde\bev\bvi\bic\bce\be
-
- Devices are specified with "con" or "ssl" (console or serial line,
- respectively), optionally with a device number if you are talking
- about a specific device.
-
-
- Using just "con" or "ssl" describes all of the consoles or serial
- lines. If you want to talk about console #3 or serial line #10, they
- would be "con3" and "ssl10", respectively.
-
-
- A specific device name will override a less general "con=" or "ssl=".
- So, for example, you can assign a pty to each of the serial lines
- except for the first two like this:
-
-
- ssl=pty ssl0=tty:/dev/tty0 ssl1=tty:/dev/tty1
-
-
-
-
- The specificity of the device name is all that matters; order on the
- command line is irrelevant.
-
-
-
- 5\b5.\b.2\b2.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg t\bth\bhe\be c\bch\bha\ban\bnn\bne\bel\bl
-
- There are a number of different types of channels to attach a UML
- device to, each with a different way of specifying exactly what to
- attach to.
-
- +\bo pseudo-terminals - device=pty pts terminals - device=pts
-
-
- This will cause UML to allocate a free host pseudo-terminal for the
- device. The terminal that it got will be announced in the boot
- log. You access it by attaching a terminal program to the
- corresponding tty:
-
- +\bo screen /dev/pts/n
-
- +\bo screen /dev/ttyxx
-
- +\bo minicom -o -p /dev/ttyxx - minicom seems not able to handle pts
- devices
-
- +\bo kermit - start it up, 'open' the device, then 'connect'
-
-
-
-
-
- +\bo terminals - device=tty:tty device file
-
-
- This will make UML attach the device to the specified tty (i.e
-
-
- con1=tty:/dev/tty3
-
-
-
-
- will attach UML's console 1 to the host's /dev/tty3). If the tty that
- you specify is the slave end of a tty/pty pair, something else must
- have already opened the corresponding pty in order for this to work.
-
-
-
-
-
- +\bo xterms - device=xterm
-
-
- UML will run an xterm and the device will be attached to it.
-
-
-
-
-
- +\bo Port - device=port:port number
-
-
- This will attach the UML devices to the specified host port.
- Attaching console 1 to the host's port 9000 would be done like
- this:
-
-
- con1=port:9000
-
-
-
-
- Attaching all the serial lines to that port would be done similarly:
-
-
- ssl=port:9000
-
-
-
-
- You access these devices by telnetting to that port. Each active tel-
- net session gets a different device. If there are more telnets to a
- port than UML devices attached to it, then the extra telnet sessions
- will block until an existing telnet detaches, or until another device
- becomes active (i.e. by being activated in /etc/inittab).
-
- This channel has the advantage that you can both attach multiple UML
- devices to it and know how to access them without reading the UML boot
- log. It is also unique in allowing access to a UML from remote
- machines without requiring that the UML be networked. This could be
- useful in allowing public access to UMLs because they would be
- accessible from the net, but wouldn't need any kind of network
- filtering or access control because they would have no network access.
-
-
- If you attach the main console to a portal, then the UML boot will
- appear to hang. In reality, it's waiting for a telnet to connect, at
- which point the boot will proceed.
-
-
-
-
-
- +\bo already-existing file descriptors - device=file descriptor
-
-
- If you set up a file descriptor on the UML command line, you can
- attach a UML device to it. This is most commonly used to put the
- main console back on stdin and stdout after assigning all the other
- consoles to something else:
-
-
- con0=fd:0,fd:1 con=pts
-
-
-
-
-
-
-
-
- +\bo Nothing - device=null
-
-
- This allows the device to be opened, in contrast to 'none', but
- reads will block, and writes will succeed and the data will be
- thrown out.
-
-
-
-
-
- +\bo None - device=none
-
-
- This causes the device to disappear.
-
-
-
- You can also specify different input and output channels for a device
- by putting a comma between them:
-
-
- ssl3=tty:/dev/tty2,xterm
-
-
-
-
- will cause serial line 3 to accept input on the host's /dev/tty3 and
- display output on an xterm. That's a silly example - the most common
- use of this syntax is to reattach the main console to stdin and stdout
- as shown above.
-
-
- If you decide to move the main console away from stdin/stdout, the
- initial boot output will appear in the terminal that you're running
- UML in. However, once the console driver has been officially
- initialized, then the boot output will start appearing wherever you
- specified that console 0 should be. That device will receive all
- subsequent output.
-
-
-
- 5\b5.\b.3\b3.\b. E\bEx\bxa\bam\bmp\bpl\ble\bes\bs
-
- There are a number of interesting things you can do with this
- capability.
-
-
- First, this is how you get rid of those bleeding console xterms by
- attaching them to host ptys:
-
-
- con=pty con0=fd:0,fd:1
-
-
-
-
- This will make a UML console take over an unused host virtual console,
- so that when you switch to it, you will see the UML login prompt
- rather than the host login prompt:
-
-
- con1=tty:/dev/tty6
-
-
-
-
- You can attach two virtual machines together with what amounts to a
- serial line as follows:
-
- Run one UML with a serial line attached to a pty -
-
-
- ssl1=pty
-
-
-
-
- Look at the boot log to see what pty it got (this example will assume
- that it got /dev/ptyp1).
-
- Boot the other UML with a serial line attached to the corresponding
- tty -
-
-
- ssl1=tty:/dev/ttyp1
-
-
-
-
- Log in, make sure that it has no getty on that serial line, attach a
- terminal program like minicom to it, and you should see the login
- prompt of the other virtual machine.
-
-
- 6\b6.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp t\bth\bhe\be n\bne\bet\btw\bwo\bor\brk\bk
-
-
-
- This page describes how to set up the various transports and to
- provide a UML instance with network access to the host, other machines
- on the local net, and the rest of the net.
-
-
- As of 2.4.5, UML networking has been completely redone to make it much
- easier to set up, fix bugs, and add new features.
-
-
- There is a new helper, uml_net, which does the host setup that
- requires root privileges.
-
-
- There are currently five transport types available for a UML virtual
- machine to exchange packets with other hosts:
-
- +\bo ethertap
-
- +\bo TUN/TAP
-
- +\bo Multicast
-
- +\bo a switch daemon
-
- +\bo slip
-
- +\bo slirp
-
- +\bo pcap
-
- The TUN/TAP, ethertap, slip, and slirp transports allow a UML
- instance to exchange packets with the host. They may be directed
- to the host or the host may just act as a router to provide access
- to other physical or virtual machines.
-
-
- The pcap transport is a synthetic read-only interface, using the
- libpcap binary to collect packets from interfaces on the host and
- filter them. This is useful for building preconfigured traffic
- monitors or sniffers.
-
-
- The daemon and multicast transports provide a completely virtual
- network to other virtual machines. This network is completely
- disconnected from the physical network unless one of the virtual
- machines on it is acting as a gateway.
-
-
- With so many host transports, which one should you use? Here's when
- you should use each one:
-
- +\bo ethertap - if you want access to the host networking and it is
- running 2.2
-
- +\bo TUN/TAP - if you want access to the host networking and it is
- running 2.4. Also, the TUN/TAP transport is able to use a
- preconfigured device, allowing it to avoid using the setuid uml_net
- helper, which is a security advantage.
-
- +\bo Multicast - if you want a purely virtual network and you don't want
- to set up anything but the UML
-
- +\bo a switch daemon - if you want a purely virtual network and you
- don't mind running the daemon in order to get somewhat better
- performance
-
- +\bo slip - there is no particular reason to run the slip backend unless
- ethertap and TUN/TAP are just not available for some reason
-
- +\bo slirp - if you don't have root access on the host to setup
- networking, or if you don't want to allocate an IP to your UML
-
- +\bo pcap - not much use for actual network connectivity, but great for
- monitoring traffic on the host
-
- Ethertap is available on 2.4 and works fine. TUN/TAP is preferred
- to it because it has better performance and ethertap is officially
- considered obsolete in 2.4. Also, the root helper only needs to
- run occasionally for TUN/TAP, rather than handling every packet, as
- it does with ethertap. This is a slight security advantage since
- it provides fewer opportunities for a nasty UML user to somehow
- exploit the helper's root privileges.
-
-
- 6\b6.\b.1\b1.\b. G\bGe\ben\bne\ber\bra\bal\bl s\bse\bet\btu\bup\bp
-
- First, you must have the virtual network enabled in your UML. If are
- running a prebuilt kernel from this site, everything is already
- enabled. If you build the kernel yourself, under the "Network device
- support" menu, enable "Network device support", and then the three
- transports.
-
-
- The next step is to provide a network device to the virtual machine.
- This is done by describing it on the kernel command line.
-
- The general format is
-
-
- eth <n> = <transport> , <transport args>
-
-
-
-
- For example, a virtual ethernet device may be attached to a host
- ethertap device as follows:
-
-
- eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
-
-
-
-
- This sets up eth0 inside the virtual machine to attach itself to the
- host /dev/tap0, assigns it an ethernet address, and assigns the host
- tap0 interface an IP address.
-
-
-
- Note that the IP address you assign to the host end of the tap device
- must be different than the IP you assign to the eth device inside UML.
- If you are short on IPs and don't want to consume two per UML, then
- you can reuse the host's eth IP address for the host ends of the tap
- devices. Internally, the UMLs must still get unique IPs for their eth
- devices. You can also give the UMLs non-routable IPs (192.168.x.x or
- 10.x.x.x) and have the host masquerade them. This will let outgoing
- connections work, but incoming connections won't without more work,
- such as port forwarding from the host.
- Also note that when you configure the host side of an interface, it is
- only acting as a gateway. It will respond to pings sent to it
- locally, but is not useful to do that since it's a host interface.
- You are not talking to the UML when you ping that interface and get a
- response.
-
-
- You can also add devices to a UML and remove them at runtime. See the
- ``The Management Console'' page for details.
-
-
- The sections below describe this in more detail.
-
-
- Once you've decided how you're going to set up the devices, you boot
- UML, log in, configure the UML side of the devices, and set up routes
- to the outside world. At that point, you will be able to talk to any
- other machines, physical or virtual, on the net.
-
-
- If ifconfig inside UML fails and the network refuses to come up, run
- tell you what went wrong.
-
-
-
- 6\b6.\b.2\b2.\b. U\bUs\bse\ber\brs\bsp\bpa\bac\bce\be d\bda\bae\bem\bmo\bon\bns\bs
-
- You will likely need the setuid helper, or the switch daemon, or both.
- They are both installed with the RPM and deb, so if you've installed
- either, you can skip the rest of this section.
-
-
- If not, then you need to check them out of CVS, build them, and
- install them. The helper is uml_net, in CVS /tools/uml_net, and the
- daemon is uml_switch, in CVS /tools/uml_router. They are both built
- with a plain 'make'. Both need to be installed in a directory that's
- in your path - /usr/bin is recommend. On top of that, uml_net needs
- to be setuid root.
-
-
-
- 6\b6.\b.3\b3.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg e\bet\bth\bhe\ber\brn\bne\bet\bt a\bad\bdd\bdr\bre\bes\bss\bse\bes\bs
-
- Below, you will see that the TUN/TAP, ethertap, and daemon interfaces
- allow you to specify hardware addresses for the virtual ethernet
- devices. This is generally not necessary. If you don't have a
- specific reason to do it, you probably shouldn't. If one is not
- specified on the command line, the driver will assign one based on the
- device IP address. It will provide the address fe:fd:nn:nn:nn:nn
- where nn.nn.nn.nn is the device IP address. This is nearly always
- sufficient to guarantee a unique hardware address for the device. A
- couple of exceptions are:
-
- +\bo Another set of virtual ethernet devices are on the same network and
- they are assigned hardware addresses using a different scheme which
- may conflict with the UML IP address-based scheme
-
- +\bo You aren't going to use the device for IP networking, so you don't
- assign the device an IP address
-
- If you let the driver provide the hardware address, you should make
- sure that the device IP address is known before the interface is
- brought up. So, inside UML, this will guarantee that:
-
-
-
- UML#
- ifconfig eth0 192.168.0.250 up
-
-
-
-
- If you decide to assign the hardware address yourself, make sure that
- the first byte of the address is even. Addresses with an odd first
- byte are broadcast addresses, which you don't want assigned to a
- device.
-
-
-
- 6\b6.\b.4\b4.\b. U\bUM\bML\bL i\bin\bnt\bte\ber\brf\bfa\bac\bce\be s\bse\bet\btu\bup\bp
-
- Once the network devices have been described on the command line, you
- should boot UML and log in.
-
-
- The first thing to do is bring the interface up:
-
-
- UML# ifconfig ethn ip-address up
-
-
-
-
- You should be able to ping the host at this point.
-
-
- To reach the rest of the world, you should set a default route to the
- host:
-
-
- UML# route add default gw host ip
-
-
-
-
- Again, with host ip of 192.168.0.4:
-
-
- UML# route add default gw 192.168.0.4
-
-
-
-
- This page used to recommend setting a network route to your local net.
- This is wrong, because it will cause UML to try to figure out hardware
- addresses of the local machines by arping on the interface to the
- host. Since that interface is basically a single strand of ethernet
- with two nodes on it (UML and the host) and arp requests don't cross
- networks, they will fail to elicit any responses. So, what you want
- is for UML to just blindly throw all packets at the host and let it
- figure out what to do with them, which is what leaving out the network
- route and adding the default route does.
-
-
- Note: If you can't communicate with other hosts on your physical
- ethernet, it's probably because of a network route that's
- automatically set up. If you run 'route -n' and see a route that
- looks like this:
-
-
-
-
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
-
-
-
-
- with a mask that's not 255.255.255.255, then replace it with a route
- to your host:
-
-
- UML#
- route del -net 192.168.0.0 dev eth0 netmask 255.255.255.0
-
-
-
-
-
-
- UML#
- route add -host 192.168.0.4 dev eth0
-
-
-
-
- This, plus the default route to the host, will allow UML to exchange
- packets with any machine on your ethernet.
-
-
-
- 6\b6.\b.5\b5.\b. M\bMu\bul\blt\bti\bic\bca\bas\bst\bt
-
- The simplest way to set up a virtual network between multiple UMLs is
- to use the mcast transport. This was written by Harald Welte and is
- present in UML version 2.4.5-5um and later. Your system must have
- multicast enabled in the kernel and there must be a multicast-capable
- network device on the host. Normally, this is eth0, but if there is
- no ethernet card on the host, then you will likely get strange error
- messages when you bring the device up inside UML.
-
-
- To use it, run two UMLs with
-
-
- eth0=mcast
-
-
-
-
- on their command lines. Log in, configure the ethernet device in each
- machine with different IP addresses:
-
-
- UML1# ifconfig eth0 192.168.0.254
-
-
-
-
-
-
- UML2# ifconfig eth0 192.168.0.253
-
-
-
-
- and they should be able to talk to each other.
-
- The full set of command line options for this transport are
-
-
-
- ethn=mcast,ethernet address,multicast
- address,multicast port,ttl
-
-
-
-
- Harald's original README is here <http://user-mode-linux.source-
- forge.net/> and explains these in detail, as well as
- some other issues.
-
-
-
- 6\b6.\b.6\b6.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP w\bwi\bit\bth\bh t\bth\bhe\be u\bum\bml\bl_\b_n\bne\bet\bt h\bhe\bel\blp\bpe\ber\br
-
- TUN/TAP is the preferred mechanism on 2.4 to exchange packets with the
- host. The TUN/TAP backend has been in UML since 2.4.9-3um.
-
-
- The easiest way to get up and running is to let the setuid uml_net
- helper do the host setup for you. This involves insmod-ing the tun.o
- module if necessary, configuring the device, and setting up IP
- forwarding, routing, and proxy arp. If you are new to UML networking,
- do this first. If you're concerned about the security implications of
- the setuid helper, use it to get up and running, then read the next
- section to see how to have UML use a preconfigured tap device, which
- avoids the use of uml_net.
-
-
- If you specify an IP address for the host side of the device, the
- uml_net helper will do all necessary setup on the host - the only
- requirement is that TUN/TAP be available, either built in to the host
- kernel or as the tun.o module.
-
- The format of the command line switch to attach a device to a TUN/TAP
- device is
-
-
- eth <n> =tuntap,,, <IP address>
-
-
-
-
- For example, this argument will attach the UML's eth0 to the next
- available tap device and assign an ethernet address to it based on its
- IP address
-
-
- eth0=tuntap,,,192.168.0.254
-
-
-
-
-
-
- Note that the IP address that must be used for the eth device inside
- UML is fixed by the routing and proxy arp that is set up on the
- TUN/TAP device on the host. You can use a different one, but it won't
- work because reply packets won't reach the UML. This is a feature.
- It prevents a nasty UML user from doing things like setting the UML IP
- to the same as the network's nameserver or mail server.
-
-
- There are a couple potential problems with running the TUN/TAP
- transport on a 2.4 host kernel
-
- +\bo TUN/TAP seems not to work on 2.4.3 and earlier. Upgrade the host
- kernel or use the ethertap transport.
-
- +\bo With an upgraded kernel, TUN/TAP may fail with
-
-
- File descriptor in bad state
-
-
-
-
- This is due to a header mismatch between the upgraded kernel and the
- kernel that was originally installed on the machine. The fix is to
- make sure that /usr/src/linux points to the headers for the running
- kernel.
-
- These were pointed out by Tim Robinson <timro at trkr dot net> in
- <http://www.geocrawler.com/> name="this uml-
- user post"> .
-
-
-
- 6\b6.\b.7\b7.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP w\bwi\bit\bth\bh a\ba p\bpr\bre\bec\bco\bon\bnf\bfi\big\bgu\bur\bre\bed\bd t\bta\bap\bp d\bde\bev\bvi\bic\bce\be
-
- If you prefer not to have UML use uml_net (which is somewhat
- insecure), with UML 2.4.17-11, you can set up a TUN/TAP device
- beforehand. The setup needs to be done as root, but once that's done,
- there is no need for root assistance. Setting up the device is done
- as follows:
-
- +\bo Create the device with tunctl (available from the UML utilities
- tarball)
-
-
-
-
- host# tunctl -u uid
-
-
-
-
- where uid is the user id or username that UML will be run as. This
- will tell you what device was created.
-
- +\bo Configure the device IP (change IP addresses and device name to
- suit)
-
-
-
-
- host# ifconfig tap0 192.168.0.254 up
-
-
-
-
-
- +\bo Set up routing and arping if desired - this is my recipe, there are
- other ways of doing the same thing
-
-
- host#
- bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
-
- host#
- route add -host 192.168.0.253 dev tap0
-
-
-
-
-
-
- host#
- bash -c 'echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp'
-
-
-
-
-
-
- host#
- arp -Ds 192.168.0.253 eth0 pub
-
-
-
-
- Note that this must be done every time the host boots - this configu-
- ration is not stored across host reboots. So, it's probably a good
- idea to stick it in an rc file. An even better idea would be a little
- utility which reads the information from a config file and sets up
- devices at boot time.
-
- +\bo Rather than using up two IPs and ARPing for one of them, you can
- also provide direct access to your LAN by the UML by using a
- bridge.
-
-
- host#
- brctl addbr br0
-
-
-
-
-
-
- host#
- ifconfig eth0 0.0.0.0 promisc up
-
-
-
-
-
-
- host#
- ifconfig tap0 0.0.0.0 promisc up
-
-
-
-
-
-
- host#
- ifconfig br0 192.168.0.1 netmask 255.255.255.0 up
-
-
-
-
-
-
-
- host#
- brctl stp br0 off
-
-
-
-
-
-
- host#
- brctl setfd br0 1
-
-
-
-
-
-
- host#
- brctl sethello br0 1
-
-
-
-
-
-
- host#
- brctl addif br0 eth0
-
-
-
-
-
-
- host#
- brctl addif br0 tap0
-
-
-
-
- Note that 'br0' should be setup using ifconfig with the existing IP
- address of eth0, as eth0 no longer has its own IP.
-
- +\bo
-
-
- Also, the /dev/net/tun device must be writable by the user running
- UML in order for the UML to use the device that's been configured
- for it. The simplest thing to do is
-
-
- host# chmod 666 /dev/net/tun
-
-
-
-
- Making it world-writable looks bad, but it seems not to be
- exploitable as a security hole. However, it does allow anyone to cre-
- ate useless tap devices (useless because they can't configure them),
- which is a DOS attack. A somewhat more secure alternative would to be
- to create a group containing all the users who have preconfigured tap
- devices and chgrp /dev/net/tun to that group with mode 664 or 660.
-
-
- +\bo Once the device is set up, run UML with 'eth0=tuntap,device name'
- (i.e. 'eth0=tuntap,tap0') on the command line (or do it with the
- mconsole config command).
-
- +\bo Bring the eth device up in UML and you're in business.
-
- If you don't want that tap device any more, you can make it non-
- persistent with
-
-
- host# tunctl -d tap device
-
-
-
-
- Finally, tunctl has a -b (for brief mode) switch which causes it to
- output only the name of the tap device it created. This makes it
- suitable for capture by a script:
-
-
- host# TAP=`tunctl -u 1000 -b`
-
-
-
-
-
-
- 6\b6.\b.8\b8.\b. E\bEt\bth\bhe\ber\brt\bta\bap\bp
-
- Ethertap is the general mechanism on 2.2 for userspace processes to
- exchange packets with the kernel.
-
-
-
- To use this transport, you need to describe the virtual network device
- on the UML command line. The general format for this is
-
-
- eth <n> =ethertap, <device> , <ethernet address> , <tap IP address>
-
-
-
-
- So, the previous example
-
-
- eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
-
-
-
-
- attaches the UML eth0 device to the host /dev/tap0, assigns it the
- ethernet address fe:fd:0:0:0:1, and assigns the IP address
- 192.168.0.254 to the tap device.
-
-
-
- The tap device is mandatory, but the others are optional. If the
- ethernet address is omitted, one will be assigned to it.
-
-
- The presence of the tap IP address will cause the helper to run and do
- whatever host setup is needed to allow the virtual machine to
- communicate with the outside world. If you're not sure you know what
- you're doing, this is the way to go.
-
-
- If it is absent, then you must configure the tap device and whatever
- arping and routing you will need on the host. However, even in this
- case, the uml_net helper still needs to be in your path and it must be
- setuid root if you're not running UML as root. This is because the
- tap device doesn't support SIGIO, which UML needs in order to use
- something as a source of input. So, the helper is used as a
- convenient asynchronous IO thread.
-
- If you're using the uml_net helper, you can ignore the following host
- setup - uml_net will do it for you. You just need to make sure you
- have ethertap available, either built in to the host kernel or
- available as a module.
-
-
- If you want to set things up yourself, you need to make sure that the
- appropriate /dev entry exists. If it doesn't, become root and create
- it as follows:
-
-
- mknod /dev/tap <minor> c 36 <minor> + 16
-
-
-
-
- For example, this is how to create /dev/tap0:
-
-
- mknod /dev/tap0 c 36 0 + 16
-
-
-
-
- You also need to make sure that the host kernel has ethertap support.
- If ethertap is enabled as a module, you apparently need to insmod
- ethertap once for each ethertap device you want to enable. So,
-
-
- host#
- insmod ethertap
-
-
-
-
- will give you the tap0 interface. To get the tap1 interface, you need
- to run
-
-
- host#
- insmod ethertap unit=1 -o ethertap1
-
-
-
-
-
-
-
- 6\b6.\b.9\b9.\b. T\bTh\bhe\be s\bsw\bwi\bit\btc\bch\bh d\bda\bae\bem\bmo\bon\bn
-
- N\bNo\bot\bte\be: This is the daemon formerly known as uml_router, but which was
- renamed so the network weenies of the world would stop growling at me.
-
-
- The switch daemon, uml_switch, provides a mechanism for creating a
- totally virtual network. By default, it provides no connection to the
- host network (but see -tap, below).
-
-
- The first thing you need to do is run the daemon. Running it with no
- arguments will make it listen on a default pair of unix domain
- sockets.
-
-
- If you want it to listen on a different pair of sockets, use
-
-
- -unix control socket data socket
-
-
-
-
-
- If you want it to act as a hub rather than a switch, use
-
-
- -hub
-
-
-
-
-
- If you want the switch to be connected to host networking (allowing
- the umls to get access to the outside world through the host), use
-
-
- -tap tap0
-
-
-
-
-
- Note that the tap device must be preconfigured (see "TUN/TAP with a
- preconfigured tap device", above). If you're using a different tap
- device than tap0, specify that instead of tap0.
-
-
- uml_switch can be backgrounded as follows
-
-
- host%
- uml_switch [ options ] < /dev/null > /dev/null
-
-
-
-
- The reason it doesn't background by default is that it listens to
- stdin for EOF. When it sees that, it exits.
-
-
- The general format of the kernel command line switch is
-
-
-
- ethn=daemon,ethernet address,socket
- type,control socket,data socket
-
-
-
-
- You can leave off everything except the 'daemon'. You only need to
- specify the ethernet address if the one that will be assigned to it
- isn't acceptable for some reason. The rest of the arguments describe
- how to communicate with the daemon. You should only specify them if
- you told the daemon to use different sockets than the default. So, if
- you ran the daemon with no arguments, running the UML on the same
- machine with
- eth0=daemon
-
-
-
-
- will cause the eth0 driver to attach itself to the daemon correctly.
-
-
-
- 6\b6.\b.1\b10\b0.\b. S\bSl\bli\bip\bp
-
- Slip is another, less general, mechanism for a process to communicate
- with the host networking. In contrast to the ethertap interface,
- which exchanges ethernet frames with the host and can be used to
- transport any higher-level protocol, it can only be used to transport
- IP.
-
-
- The general format of the command line switch is
-
-
-
- ethn=slip,slip IP
-
-
-
-
- The slip IP argument is the IP address that will be assigned to the
- host end of the slip device. If it is specified, the helper will run
- and will set up the host so that the virtual machine can reach it and
- the rest of the network.
-
-
- There are some oddities with this interface that you should be aware
- of. You should only specify one slip device on a given virtual
- machine, and its name inside UML will be 'umn', not 'eth0' or whatever
- you specified on the command line. These problems will be fixed at
- some point.
-
-
-
- 6\b6.\b.1\b11\b1.\b. S\bSl\bli\bir\brp\bp
-
- slirp uses an external program, usually /usr/bin/slirp, to provide IP
- only networking connectivity through the host. This is similar to IP
- masquerading with a firewall, although the translation is performed in
- user-space, rather than by the kernel. As slirp does not set up any
- interfaces on the host, or changes routing, slirp does not require
- root access or setuid binaries on the host.
-
-
- The general format of the command line switch for slirp is:
-
-
-
- ethn=slirp,ethernet address,slirp path
-
-
-
-
- The ethernet address is optional, as UML will set up the interface
- with an ethernet address based upon the initial IP address of the
- interface. The slirp path is generally /usr/bin/slirp, although it
- will depend on distribution.
-
-
- The slirp program can have a number of options passed to the command
- line and we can't add them to the UML command line, as they will be
- parsed incorrectly. Instead, a wrapper shell script can be written or
- the options inserted into the /.slirprc file. More information on
- all of the slirp options can be found in its man pages.
-
-
- The eth0 interface on UML should be set up with the IP 10.2.0.15,
- although you can use anything as long as it is not used by a network
- you will be connecting to. The default route on UML should be set to
- use
-
-
- UML#
- route add default dev eth0
-
-
-
-
- slirp provides a number of useful IP addresses which can be used by
- UML, such as 10.0.2.3 which is an alias for the DNS server specified
- in /etc/resolv.conf on the host or the IP given in the 'dns' option
- for slirp.
-
-
- Even with a baudrate setting higher than 115200, the slirp connection
- is limited to 115200. If you need it to go faster, the slirp binary
- needs to be compiled with FULL_BOLT defined in config.h.
-
-
-
- 6\b6.\b.1\b12\b2.\b. p\bpc\bca\bap\bp
-
- The pcap transport is attached to a UML ethernet device on the command
- line or with uml_mconsole with the following syntax:
-
-
-
- ethn=pcap,host interface,filter
- expression,option1,option2
-
-
-
-
- The expression and options are optional.
-
-
- The interface is whatever network device on the host you want to
- sniff. The expression is a pcap filter expression, which is also what
- tcpdump uses, so if you know how to specify tcpdump filters, you will
- use the same expressions here. The options are up to two of
- 'promisc', control whether pcap puts the host interface into
- promiscuous mode. 'optimize' and 'nooptimize' control whether the pcap
- expression optimizer is used.
-
-
- Example:
-
-
-
- eth0=pcap,eth0,tcp
-
- eth1=pcap,eth0,!tcp
-
-
-
- will cause the UML eth0 to emit all tcp packets on the host eth0 and
- the UML eth1 to emit all non-tcp packets on the host eth0.
-
-
-
- 6\b6.\b.1\b13\b3.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp t\bth\bhe\be h\bho\bos\bst\bt y\byo\bou\bur\brs\bse\bel\blf\bf
-
- If you don't specify an address for the host side of the ethertap or
- slip device, UML won't do any setup on the host. So this is what is
- needed to get things working (the examples use a host-side IP of
- 192.168.0.251 and a UML-side IP of 192.168.0.250 - adjust to suit your
- own network):
-
- +\bo The device needs to be configured with its IP address. Tap devices
- are also configured with an mtu of 1484. Slip devices are
- configured with a point-to-point address pointing at the UML ip
- address.
-
-
- host# ifconfig tap0 arp mtu 1484 192.168.0.251 up
-
-
-
-
-
-
- host#
- ifconfig sl0 192.168.0.251 pointopoint 192.168.0.250 up
-
-
-
-
-
- +\bo If a tap device is being set up, a route is set to the UML IP.
-
-
- UML# route add -host 192.168.0.250 gw 192.168.0.251
-
-
-
-
-
- +\bo To allow other hosts on your network to see the virtual machine,
- proxy arp is set up for it.
-
-
- host# arp -Ds 192.168.0.250 eth0 pub
-
-
-
-
-
- +\bo Finally, the host is set up to route packets.
-
-
- host# echo 1 > /proc/sys/net/ipv4/ip_forward
-
-
-
-
-
-
-
-
-
-
- 7\b7.\b. S\bSh\bha\bar\bri\bin\bng\bg F\bFi\bil\ble\bes\bsy\bys\bst\bte\bem\bms\bs b\bbe\bet\btw\bwe\bee\ben\bn V\bVi\bir\brt\btu\bua\bal\bl M\bMa\bac\bch\bhi\bin\bne\bes\bs
-
-
-
-
- 7\b7.\b.1\b1.\b. A\bA w\bwa\bar\brn\bni\bin\bng\bg
-
- Don't attempt to share filesystems simply by booting two UMLs from the
- same file. That's the same thing as booting two physical machines
- from a shared disk. It will result in filesystem corruption.
-
-
-
- 7\b7.\b.2\b2.\b. U\bUs\bsi\bin\bng\bg l\bla\bay\bye\ber\bre\bed\bd b\bbl\blo\boc\bck\bk d\bde\bev\bvi\bic\bce\bes\bs
-
- The way to share a filesystem between two virtual machines is to use
- the copy-on-write (COW) layering capability of the ubd block driver.
- As of 2.4.6-2um, the driver supports layering a read-write private
- device over a read-only shared device. A machine's writes are stored
- in the private device, while reads come from either device - the
- private one if the requested block is valid in it, the shared one if
- not. Using this scheme, the majority of data which is unchanged is
- shared between an arbitrary number of virtual machines, each of which
- has a much smaller file containing the changes that it has made. With
- a large number of UMLs booting from a large root filesystem, this
- leads to a huge disk space saving. It will also help performance,
- since the host will be able to cache the shared data using a much
- smaller amount of memory, so UML disk requests will be served from the
- host's memory rather than its disks.
-
-
-
-
- To add a copy-on-write layer to an existing block device file, simply
- add the name of the COW file to the appropriate ubd switch:
-
-
- ubd0=root_fs_cow,root_fs_debian_22
-
-
-
-
- where 'root_fs_cow' is the private COW file and 'root_fs_debian_22' is
- the existing shared filesystem. The COW file need not exist. If it
- doesn't, the driver will create and initialize it. Once the COW file
- has been initialized, it can be used on its own on the command line:
-
-
- ubd0=root_fs_cow
-
-
-
-
- The name of the backing file is stored in the COW file header, so it
- would be redundant to continue specifying it on the command line.
-
-
-
- 7\b7.\b.3\b3.\b. N\bNo\bot\bte\be!\b!
-
- When checking the size of the COW file in order to see the gobs of
- space that you're saving, make sure you use 'ls -ls' to see the actual
- disk consumption rather than the length of the file. The COW file is
- sparse, so the length will be very different from the disk usage.
- Here is a 'ls -l' of a COW file and backing file from one boot and
- shutdown:
- host% ls -l cow.debian debian2.2
- -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
- -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
-
-
-
-
- Doesn't look like much saved space, does it? Well, here's 'ls -ls':
-
-
- host% ls -ls cow.debian debian2.2
- 880 -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
- 525832 -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
-
-
-
-
- Now, you can see that the COW file has less than a meg of disk, rather
- than 492 meg.
-
-
-
- 7\b7.\b.4\b4.\b. A\bAn\bno\bot\bth\bhe\ber\br w\bwa\bar\brn\bni\bin\bng\bg
-
- Once a filesystem is being used as a readonly backing file for a COW
- file, do not boot directly from it or modify it in any way. Doing so
- will invalidate any COW files that are using it. The mtime and size
- of the backing file are stored in the COW file header at its creation,
- and they must continue to match. If they don't, the driver will
- refuse to use the COW file.
-
-
-
-
- If you attempt to evade this restriction by changing either the
- backing file or the COW header by hand, you will get a corrupted
- filesystem.
-
-
-
-
- Among other things, this means that upgrading the distribution in a
- backing file and expecting that all of the COW files using it will see
- the upgrade will not work.
-
-
-
-
- 7\b7.\b.5\b5.\b. u\bum\bml\bl_\b_m\bmo\boo\bo :\b: M\bMe\ber\brg\bgi\bin\bng\bg a\ba C\bCO\bOW\bW f\bfi\bil\ble\be w\bwi\bit\bth\bh i\bit\bts\bs b\bba\bac\bck\bki\bin\bng\bg f\bfi\bil\ble\be
-
- Depending on how you use UML and COW devices, it may be advisable to
- merge the changes in the COW file into the backing file every once in
- a while.
-
-
-
-
- The utility that does this is uml_moo. Its usage is
-
-
- host% uml_moo COW file new backing file
-
-
-
-
- There's no need to specify the backing file since that information is
- already in the COW file header. If you're paranoid, boot the new
- merged file, and if you're happy with it, move it over the old backing
- file.
-
-
-
-
- uml_moo creates a new backing file by default as a safety measure. It
- also has a destructive merge option which will merge the COW file
- directly into its current backing file. This is really only usable
- when the backing file only has one COW file associated with it. If
- there are multiple COWs associated with a backing file, a -d merge of
- one of them will invalidate all of the others. However, it is
- convenient if you're short of disk space, and it should also be
- noticeably faster than a non-destructive merge.
-
-
-
-
- uml_moo is installed with the UML deb and RPM. If you didn't install
- UML from one of those packages, you can also get it from the UML
- utilities <http://user-mode-linux.sourceforge.net/
- utilities> tar file in tools/moo.
-
-
-
-
-
-
-
-
- 8\b8.\b. C\bCr\bre\bea\bat\bti\bin\bng\bg f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bms\bs
-
-
- You may want to create and mount new UML filesystems, either because
- your root filesystem isn't large enough or because you want to use a
- filesystem other than ext2.
-
-
- This was written on the occasion of reiserfs being included in the
- 2.4.1 kernel pool, and therefore the 2.4.1 UML, so the examples will
- talk about reiserfs. This information is generic, and the examples
- should be easy to translate to the filesystem of your choice.
-
-
- 8\b8.\b.1\b1.\b. C\bCr\bre\bea\bat\bte\be t\bth\bhe\be f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm f\bfi\bil\ble\be
-
- dd is your friend. All you need to do is tell dd to create an empty
- file of the appropriate size. I usually make it sparse to save time
- and to avoid allocating disk space until it's actually used. For
- example, the following command will create a sparse 100 meg file full
- of zeroes.
-
-
- host%
- dd if=/dev/zero of=new_filesystem seek=100 count=1 bs=1M
-
-
-
-
-
-
- 8\b8.\b.2\b2.\b. A\bAs\bss\bsi\big\bgn\bn t\bth\bhe\be f\bfi\bil\ble\be t\bto\bo a\ba U\bUM\bML\bL d\bde\bev\bvi\bic\bce\be
-
- Add an argument like the following to the UML command line:
-
- ubd4=new_filesystem
-
-
-
-
- making sure that you use an unassigned ubd device number.
-
-
-
- 8\b8.\b.3\b3.\b. C\bCr\bre\bea\bat\bti\bin\bng\bg a\ban\bnd\bd m\bmo\bou\bun\bnt\bti\bin\bng\bg t\bth\bhe\be f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
-
- Make sure that the filesystem is available, either by being built into
- the kernel, or available as a module, then boot up UML and log in. If
- the root filesystem doesn't have the filesystem utilities (mkfs, fsck,
- etc), then get them into UML by way of the net or hostfs.
-
-
- Make the new filesystem on the device assigned to the new file:
-
-
- host# mkreiserfs /dev/ubd/4
-
-
- <----------- MKREISERFSv2 ----------->
-
- ReiserFS version 3.6.25
- Block size 4096 bytes
- Block count 25856
- Used blocks 8212
- Journal - 8192 blocks (18-8209), journal header is in block 8210
- Bitmaps: 17
- Root block 8211
- Hash function "r5"
- ATTENTION: ALL DATA WILL BE LOST ON '/dev/ubd/4'! (y/n)y
- journal size 8192 (from 18)
- Initializing journal - 0%....20%....40%....60%....80%....100%
- Syncing..done.
-
-
-
-
- Now, mount it:
-
-
- UML#
- mount /dev/ubd/4 /mnt
-
-
-
-
- and you're in business.
-
-
-
-
-
-
-
-
-
- 9\b9.\b. H\bHo\bos\bst\bt f\bfi\bil\ble\be a\bac\bcc\bce\bes\bss\bs
-
-
- If you want to access files on the host machine from inside UML, you
- can treat it as a separate machine and either nfs mount directories
- from the host or copy files into the virtual machine with scp or rcp.
- However, since UML is running on the host, it can access those
- files just like any other process and make them available inside the
- virtual machine without needing to use the network.
-
-
- This is now possible with the hostfs virtual filesystem. With it, you
- can mount a host directory into the UML filesystem and access the
- files contained in it just as you would on the host.
-
-
- 9\b9.\b.1\b1.\b. U\bUs\bsi\bin\bng\bg h\bho\bos\bst\btf\bfs\bs
-
- To begin with, make sure that hostfs is available inside the virtual
- machine with
-
-
- UML# cat /proc/filesystems
-
-
-
- . hostfs should be listed. If it's not, either rebuild the kernel
- with hostfs configured into it or make sure that hostfs is built as a
- module and available inside the virtual machine, and insmod it.
-
-
- Now all you need to do is run mount:
-
-
- UML# mount none /mnt/host -t hostfs
-
-
-
-
- will mount the host's / on the virtual machine's /mnt/host.
-
-
- If you don't want to mount the host root directory, then you can
- specify a subdirectory to mount with the -o switch to mount:
-
-
- UML# mount none /mnt/home -t hostfs -o /home
-
-
-
-
- will mount the hosts's /home on the virtual machine's /mnt/home.
-
-
-
- 9\b9.\b.2\b2.\b. h\bho\bos\bst\btf\bfs\bs a\bas\bs t\bth\bhe\be r\bro\boo\bot\bt f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
-
- It's possible to boot from a directory hierarchy on the host using
- hostfs rather than using the standard filesystem in a file.
-
- To start, you need that hierarchy. The easiest way is to loop mount
- an existing root_fs file:
-
-
- host# mount root_fs uml_root_dir -o loop
-
-
-
-
- You need to change the filesystem type of / in etc/fstab to be
- 'hostfs', so that line looks like this:
-
- /dev/ubd/0 / hostfs defaults 1 1
-
-
-
-
- Then you need to chown to yourself all the files in that directory
- that are owned by root. This worked for me:
-
-
- host# find . -uid 0 -exec chown jdike {} \;
-
-
-
-
- Next, make sure that your UML kernel has hostfs compiled in, not as a
- module. Then run UML with the boot device pointing at that directory:
-
-
- ubd0=/path/to/uml/root/directory
-
-
-
-
- UML should then boot as it does normally.
-
-
- 9\b9.\b.3\b3.\b. B\bBu\bui\bil\bld\bdi\bin\bng\bg h\bho\bos\bst\btf\bfs\bs
-
- If you need to build hostfs because it's not in your kernel, you have
- two choices:
-
-
-
- +\bo Compiling hostfs into the kernel:
-
-
- Reconfigure the kernel and set the 'Host filesystem' option under
-
-
- +\bo Compiling hostfs as a module:
-
-
- Reconfigure the kernel and set the 'Host filesystem' option under
- be in arch/um/fs/hostfs/hostfs.o. Install that in
- /lib/modules/`uname -r`/fs in the virtual machine, boot it up, and
-
-
- UML# insmod hostfs
-
-
-
-
-
-
-
-
-
-
-
-
- 1\b10\b0.\b. T\bTh\bhe\be M\bMa\ban\bna\bag\bge\bem\bme\ben\bnt\bt C\bCo\bon\bns\bso\bol\ble\be
-
-
-
- The UML management console is a low-level interface to the kernel,
- somewhat like the i386 SysRq interface. Since there is a full-blown
- operating system under UML, there is much greater flexibility possible
- than with the SysRq mechanism.
-
-
- There are a number of things you can do with the mconsole interface:
-
- +\bo get the kernel version
-
- +\bo add and remove devices
-
- +\bo halt or reboot the machine
-
- +\bo Send SysRq commands
-
- +\bo Pause and resume the UML
-
-
- You need the mconsole client (uml_mconsole) which is present in CVS
- (/tools/mconsole) in 2.4.5-9um and later, and will be in the RPM in
- 2.4.6.
-
-
- You also need CONFIG_MCONSOLE (under 'General Setup') enabled in UML.
- When you boot UML, you'll see a line like:
-
-
- mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
-
-
-
-
- If you specify a unique machine id one the UML command line, i.e.
-
-
- umid=debian
-
-
-
-
- you'll see this
-
-
- mconsole initialized on /home/jdike/.uml/debian/mconsole
-
-
-
-
- That file is the socket that uml_mconsole will use to communicate with
- UML. Run it with either the umid or the full path as its argument:
-
-
- host% uml_mconsole debian
-
-
-
-
- or
-
-
- host% uml_mconsole /home/jdike/.uml/debian/mconsole
-
-
-
-
- You'll get a prompt, at which you can run one of these commands:
-
- +\bo version
-
- +\bo halt
-
- +\bo reboot
-
- +\bo config
-
- +\bo remove
-
- +\bo sysrq
-
- +\bo help
-
- +\bo cad
-
- +\bo stop
-
- +\bo go
-
-
- 1\b10\b0.\b.1\b1.\b. v\bve\ber\brs\bsi\bio\bon\bn
-
- This takes no arguments. It prints the UML version.
-
-
- (mconsole) version
- OK Linux usermode 2.4.5-9um #1 Wed Jun 20 22:47:08 EDT 2001 i686
-
-
-
-
- There are a couple actual uses for this. It's a simple no-op which
- can be used to check that a UML is running. It's also a way of
- sending an interrupt to the UML. This is sometimes useful on SMP
- hosts, where there's a bug which causes signals to UML to be lost,
- often causing it to appear to hang. Sending such a UML the mconsole
- version command is a good way to 'wake it up' before networking has
- been enabled, as it does not do anything to the function of the UML.
-
-
-
- 1\b10\b0.\b.2\b2.\b. h\bha\bal\blt\bt a\ban\bnd\bd r\bre\beb\bbo\boo\bot\bt
-
- These take no arguments. They shut the machine down immediately, with
- no syncing of disks and no clean shutdown of userspace. So, they are
- pretty close to crashing the machine.
-
-
- (mconsole) halt
- OK
-
-
-
-
-
-
- 1\b10\b0.\b.3\b3.\b. c\bco\bon\bnf\bfi\big\bg
-
- "config" adds a new device to the virtual machine. Currently the ubd
- and network drivers support this. It takes one argument, which is the
- device to add, with the same syntax as the kernel command line.
-
-
-
-
- (mconsole)
- config ubd3=/home/jdike/incoming/roots/root_fs_debian22
-
- OK
- (mconsole) config eth1=mcast
- OK
-
-
-
-
-
-
- 1\b10\b0.\b.4\b4.\b. r\bre\bem\bmo\bov\bve\be
-
- "remove" deletes a device from the system. Its argument is just the
- name of the device to be removed. The device must be idle in whatever
- sense the driver considers necessary. In the case of the ubd driver,
- the removed block device must not be mounted, swapped on, or otherwise
- open, and in the case of the network driver, the device must be down.
-
-
- (mconsole) remove ubd3
- OK
- (mconsole) remove eth1
- OK
-
-
-
-
-
-
- 1\b10\b0.\b.5\b5.\b. s\bsy\bys\bsr\brq\bq
-
- This takes one argument, which is a single letter. It calls the
- generic kernel's SysRq driver, which does whatever is called for by
- that argument. See the SysRq documentation in Documentation/sysrq.txt
- in your favorite kernel tree to see what letters are valid and what
- they do.
-
-
-
- 1\b10\b0.\b.6\b6.\b. h\bhe\bel\blp\bp
-
- "help" returns a string listing the valid commands and what each one
- does.
-
-
-
- 1\b10\b0.\b.7\b7.\b. c\bca\bad\bd
-
- This invokes the Ctl-Alt-Del action on init. What exactly this ends
- up doing is up to /etc/inittab. Normally, it reboots the machine.
- With UML, this is usually not desired, so if a halt would be better,
- then find the section of inittab that looks like this
-
-
- # What to do when CTRL-ALT-DEL is pressed.
- ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -r now
-
-
-
-
- and change the command to halt.
-
-
-
- 1\b10\b0.\b.8\b8.\b. s\bst\bto\bop\bp
-
- This puts the UML in a loop reading mconsole requests until a 'go'
- mconsole command is received. This is very useful for making backups
- of UML filesystems, as the UML can be stopped, then synced via 'sysrq
- s', so that everything is written to the filesystem. You can then copy
- the filesystem and then send the UML 'go' via mconsole.
-
-
- Note that a UML running with more than one CPU will have problems
- after you send the 'stop' command, as only one CPU will be held in a
- mconsole loop and all others will continue as normal. This is a bug,
- and will be fixed.
-
-
-
- 1\b10\b0.\b.9\b9.\b. g\bgo\bo
-
- This resumes a UML after being paused by a 'stop' command. Note that
- when the UML has resumed, TCP connections may have timed out and if
- the UML is paused for a long period of time, crond might go a little
- crazy, running all the jobs it didn't do earlier.
-
-
-
-
-
-
-
-
- 1\b11\b1.\b. K\bKe\ber\brn\bne\bel\bl d\bde\beb\bbu\bug\bgg\bgi\bin\bng\bg
-
-
- N\bNo\bot\bte\be:\b: The interface that makes debugging, as described here, possible
- is present in 2.4.0-test6 kernels and later.
-
-
- Since the user-mode kernel runs as a normal Linux process, it is
- possible to debug it with gdb almost like any other process. It is
- slightly different because the kernel's threads are already being
- ptraced for system call interception, so gdb can't ptrace them.
- However, a mechanism has been added to work around that problem.
-
-
- In order to debug the kernel, you need build it from source. See
- ``Compiling the kernel and modules'' for information on doing that.
- Make sure that you enable CONFIG_DEBUGSYM and CONFIG_PT_PROXY during
- the config. These will compile the kernel with -g, and enable the
- ptrace proxy so that gdb works with UML, respectively.
-
-
-
-
- 1\b11\b1.\b.1\b1.\b. S\bSt\bta\bar\brt\bti\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl u\bun\bnd\bde\ber\br g\bgd\bdb\bb
-
- You can have the kernel running under the control of gdb from the
- beginning by putting 'debug' on the command line. You will get an
- xterm with gdb running inside it. The kernel will send some commands
- to gdb which will leave it stopped at the beginning of start_kernel.
- At this point, you can get things going with 'next', 'step', or
- 'cont'.
-
-
- There is a transcript of a debugging session here <debug-
- session.html> , with breakpoints being set in the scheduler and in an
- interrupt handler.
- 1\b11\b1.\b.2\b2.\b. E\bEx\bxa\bam\bmi\bin\bni\bin\bng\bg s\bsl\ble\bee\bep\bpi\bin\bng\bg p\bpr\bro\boc\bce\bes\bss\bse\bes\bs
-
- Not every bug is evident in the currently running process. Sometimes,
- processes hang in the kernel when they shouldn't because they've
- deadlocked on a semaphore or something similar. In this case, when
- you ^C gdb and get a backtrace, you will see the idle thread, which
- isn't very relevant.
-
-
- What you want is the stack of whatever process is sleeping when it
- shouldn't be. You need to figure out which process that is, which is
- generally fairly easy. Then you need to get its host process id,
- which you can do either by looking at ps on the host or at
- task.thread.extern_pid in gdb.
-
-
- Now what you do is this:
-
- +\bo detach from the current thread
-
-
- (UML gdb) det
-
-
-
-
-
- +\bo attach to the thread you are interested in
-
-
- (UML gdb) att <host pid>
-
-
-
-
-
- +\bo look at its stack and anything else of interest
-
-
- (UML gdb) bt
-
-
-
-
- Note that you can't do anything at this point that requires that a
- process execute, e.g. calling a function
-
- +\bo when you're done looking at that process, reattach to the current
- thread and continue it
-
-
- (UML gdb)
- att 1
-
-
-
-
-
-
- (UML gdb)
- c
-
-
-
-
- Here, specifying any pid which is not the process id of a UML thread
- will cause gdb to reattach to the current thread. I commonly use 1,
- but any other invalid pid would work.
-
-
-
- 1\b11\b1.\b.3\b3.\b. R\bRu\bun\bnn\bni\bin\bng\bg d\bdd\bdd\bd o\bon\bn U\bUM\bML\bL
-
- ddd works on UML, but requires a special kludge. The process goes
- like this:
-
- +\bo Start ddd
-
-
- host% ddd linux
-
-
-
-
-
- +\bo With ps, get the pid of the gdb that ddd started. You can ask the
- gdb to tell you, but for some reason that confuses things and
- causes a hang.
-
- +\bo run UML with 'debug=parent gdb-pid=<pid>' added to the command line
- - it will just sit there after you hit return
-
- +\bo type 'att 1' to the ddd gdb and you will see something like
-
-
- 0xa013dc51 in __kill ()
-
-
- (gdb)
-
-
-
-
-
- +\bo At this point, type 'c', UML will boot up, and you can use ddd just
- as you do on any other process.
-
-
-
- 1\b11\b1.\b.4\b4.\b. D\bDe\beb\bbu\bug\bgg\bgi\bin\bng\bg m\bmo\bod\bdu\bul\ble\bes\bs
-
- gdb has support for debugging code which is dynamically loaded into
- the process. This support is what is needed to debug kernel modules
- under UML.
-
-
- Using that support is somewhat complicated. You have to tell gdb what
- object file you just loaded into UML and where in memory it is. Then,
- it can read the symbol table, and figure out where all the symbols are
- from the load address that you provided. It gets more interesting
- when you load the module again (i.e. after an rmmod). You have to
- tell gdb to forget about all its symbols, including the main UML ones
- for some reason, then load then all back in again.
-
-
- There's an easy way and a hard way to do this. The easy way is to use
- the umlgdb expect script written by Chandan Kudige. It basically
- automates the process for you.
-
-
- First, you must tell it where your modules are. There is a list in
- the script that looks like this:
- set MODULE_PATHS {
- "fat" "/usr/src/uml/linux-2.4.18/fs/fat/fat.o"
- "isofs" "/usr/src/uml/linux-2.4.18/fs/isofs/isofs.o"
- "minix" "/usr/src/uml/linux-2.4.18/fs/minix/minix.o"
- }
-
-
-
-
- You change that to list the names and paths of the modules that you
- are going to debug. Then you run it from the toplevel directory of
- your UML pool and it basically tells you what to do:
-
-
-
-
- ******** GDB pid is 21903 ********
- Start UML as: ./linux <kernel switches> debug gdb-pid=21903
-
-
-
- GNU gdb 5.0rh-5 Red Hat Linux 7.1
- Copyright 2001 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
- (gdb) b sys_init_module
- Breakpoint 1 at 0xa0011923: file module.c, line 349.
- (gdb) att 1
-
-
-
-
- After you run UML and it sits there doing nothing, you hit return at
- the 'att 1' and continue it:
-
-
- Attaching to program: /home/jdike/linux/2.4/um/./linux, process 1
- 0xa00f4221 in __kill ()
- (UML gdb) c
- Continuing.
-
-
-
-
- At this point, you debug normally. When you insmod something, the
- expect magic will kick in and you'll see something like:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- *** Module hostfs loaded ***
- Breakpoint 1, sys_init_module (name_user=0x805abb0 "hostfs",
- mod_user=0x8070e00) at module.c:349
- 349 char *name, *n_name, *name_tmp = NULL;
- (UML gdb) finish
- Run till exit from #0 sys_init_module (name_user=0x805abb0 "hostfs",
- mod_user=0x8070e00) at module.c:349
- 0xa00e2e23 in execute_syscall (r=0xa8140284) at syscall_kern.c:411
- 411 else res = EXECUTE_SYSCALL(syscall, regs);
- Value returned is $1 = 0
- (UML gdb)
- p/x (int)module_list + module_list->size_of_struct
-
- $2 = 0xa9021054
- (UML gdb) symbol-file ./linux
- Load new symbol table from "./linux"? (y or n) y
- Reading symbols from ./linux...
- done.
- (UML gdb)
- add-symbol-file /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o 0xa9021054
-
- add symbol table from file "/home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o" at
- .text_addr = 0xa9021054
- (y or n) y
-
- Reading symbols from /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o...
- done.
- (UML gdb) p *module_list
- $1 = {size_of_struct = 84, next = 0xa0178720, name = 0xa9022de0 "hostfs",
- size = 9016, uc = {usecount = {counter = 0}, pad = 0}, flags = 1,
- nsyms = 57, ndeps = 0, syms = 0xa9023170, deps = 0x0, refs = 0x0,
- init = 0xa90221f0 <init_hostfs>, cleanup = 0xa902222c <exit_hostfs>,
- ex_table_start = 0x0, ex_table_end = 0x0, persist_start = 0x0,
- persist_end = 0x0, can_unload = 0, runsize = 0, kallsyms_start = 0x0,
- kallsyms_end = 0x0,
- archdata_start = 0x1b855 <Address 0x1b855 out of bounds>,
- archdata_end = 0xe5890000 <Address 0xe5890000 out of bounds>,
- kernel_data = 0xf689c35d <Address 0xf689c35d out of bounds>}
- >> Finished loading symbols for hostfs ...
-
-
-
-
- That's the easy way. It's highly recommended. The hard way is
- described below in case you're interested in what's going on.
-
-
- Boot the kernel under the debugger and load the module with insmod or
- modprobe. With gdb, do:
-
-
- (UML gdb) p module_list
-
-
-
-
- This is a list of modules that have been loaded into the kernel, with
- the most recently loaded module first. Normally, the module you want
- is at module_list. If it's not, walk down the next links, looking at
- the name fields until find the module you want to debug. Take the
- address of that structure, and add module.size_of_struct (which in
- 2.4.10 kernels is 96 (0x60)) to it. Gdb can make this hard addition
- for you :-):
-
-
-
- (UML gdb)
- printf "%#x\n", (int)module_list module_list->size_of_struct
-
-
-
-
- The offset from the module start occasionally changes (before 2.4.0,
- it was module.size_of_struct + 4), so it's a good idea to check the
- init and cleanup addresses once in a while, as describe below. Now
- do:
-
-
- (UML gdb)
- add-symbol-file /path/to/module/on/host that_address
-
-
-
-
- Tell gdb you really want to do it, and you're in business.
-
-
- If there's any doubt that you got the offset right, like breakpoints
- appear not to work, or they're appearing in the wrong place, you can
- check it by looking at the module structure. The init and cleanup
- fields should look like:
-
-
- init = 0x588066b0 <init_hostfs>, cleanup = 0x588066c0 <exit_hostfs>
-
-
-
-
- with no offsets on the symbol names. If the names are right, but they
- are offset, then the offset tells you how much you need to add to the
- address you gave to add-symbol-file.
-
-
- When you want to load in a new version of the module, you need to get
- gdb to forget about the old one. The only way I've found to do that
- is to tell gdb to forget about all symbols that it knows about:
-
-
- (UML gdb) symbol-file
-
-
-
-
- Then reload the symbols from the kernel binary:
-
-
- (UML gdb) symbol-file /path/to/kernel
-
-
-
-
- and repeat the process above. You'll also need to re-enable break-
- points. They were disabled when you dumped all the symbols because
- gdb couldn't figure out where they should go.
-
-
-
- 1\b11\b1.\b.5\b5.\b. A\bAt\btt\bta\bac\bch\bhi\bin\bng\bg g\bgd\bdb\bb t\bto\bo t\bth\bhe\be k\bke\ber\brn\bne\bel\bl
-
- If you don't have the kernel running under gdb, you can attach gdb to
- it later by sending the tracing thread a SIGUSR1. The first line of
- the console output identifies its pid:
- tracing thread pid = 20093
-
-
-
-
- When you send it the signal:
-
-
- host% kill -USR1 20093
-
-
-
-
- you will get an xterm with gdb running in it.
-
-
- If you have the mconsole compiled into UML, then the mconsole client
- can be used to start gdb:
-
-
- (mconsole) (mconsole) config gdb=xterm
-
-
-
-
- will fire up an xterm with gdb running in it.
-
-
-
- 1\b11\b1.\b.6\b6.\b. U\bUs\bsi\bin\bng\bg a\bal\blt\bte\ber\brn\bna\bat\bte\be d\bde\beb\bbu\bug\bgg\bge\ber\brs\bs
-
- UML has support for attaching to an already running debugger rather
- than starting gdb itself. This is present in CVS as of 17 Apr 2001.
- I sent it to Alan for inclusion in the ac tree, and it will be in my
- 2.4.4 release.
-
-
- This is useful when gdb is a subprocess of some UI, such as emacs or
- ddd. It can also be used to run debuggers other than gdb on UML.
- Below is an example of using strace as an alternate debugger.
-
-
- To do this, you need to get the pid of the debugger and pass it in
- with the
-
-
- If you are using gdb under some UI, then tell it to 'att 1', and
- you'll find yourself attached to UML.
-
-
- If you are using something other than gdb as your debugger, then
- you'll need to get it to do the equivalent of 'att 1' if it doesn't do
- it automatically.
-
-
- An example of an alternate debugger is strace. You can strace the
- actual kernel as follows:
-
- +\bo Run the following in a shell
-
-
- host%
- sh -c 'echo pid=$$; echo -n hit return; read x; exec strace -p 1 -o strace.out'
-
-
-
- +\bo Run UML with 'debug' and 'gdb-pid=<pid>' with the pid printed out
- by the previous command
-
- +\bo Hit return in the shell, and UML will start running, and strace
- output will start accumulating in the output file.
-
- Note that this is different from running
-
-
- host% strace ./linux
-
-
-
-
- That will strace only the main UML thread, the tracing thread, which
- doesn't do any of the actual kernel work. It just oversees the vir-
- tual machine. In contrast, using strace as described above will show
- you the low-level activity of the virtual machine.
-
-
-
-
-
- 1\b12\b2.\b. K\bKe\ber\brn\bne\bel\bl d\bde\beb\bbu\bug\bgg\bgi\bin\bng\bg e\bex\bxa\bam\bmp\bpl\ble\bes\bs
-
- 1\b12\b2.\b.1\b1.\b. T\bTh\bhe\be c\bca\bas\bse\be o\bof\bf t\bth\bhe\be h\bhu\bun\bng\bg f\bfs\bsc\bck\bk
-
- When booting up the kernel, fsck failed, and dropped me into a shell
- to fix things up. I ran fsck -y, which hung:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Setting hostname uml [ OK ]
- Checking root filesystem
- /dev/fhd0 was not cleanly unmounted, check forced.
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
-
- /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
- (i.e., without -a or -p options)
- [ FAILED ]
-
- *** An error occurred during the file system check.
- *** Dropping you to a shell; the system will reboot
- *** when you leave the shell.
- Give root password for maintenance
- (or type Control-D for normal startup):
-
- [root@uml /root]# fsck -y /dev/fhd0
- fsck -y /dev/fhd0
- Parallelizing fsck version 1.14 (9-Jan-1999)
- e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
- /dev/fhd0 contains a file system with errors, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
-
- Inode 19780, i_blocks is 1548, should be 540. Fix? yes
-
- Pass 2: Checking directory structure
- Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
-
- Directory inode 11858, block 0, offset 0: directory corrupted
- Salvage? yes
-
- Missing '.' in directory inode 11858.
- Fix? yes
-
- Missing '..' in directory inode 11858.
- Fix? yes
-
-
-
-
-
- The standard drill in this sort of situation is to fire up gdb on the
- signal thread, which, in this case, was pid 1935. In another window,
- I run gdb and attach pid 1935.
-
-
-
-
- ~/linux/2.3.26/um 1016: gdb linux
- GNU gdb 4.17.0.11 with Linux support
- Copyright 1998 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
-
- (gdb) att 1935
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1935
- 0x100756d9 in __wait4 ()
-
-
-
-
-
-
- Let's see what's currently running:
-
-
-
- (gdb) p current_task.pid
- $1 = 0
-
-
-
-
-
- It's the idle thread, which means that fsck went to sleep for some
- reason and never woke up.
-
-
- Let's guess that the last process in the process list is fsck:
-
-
-
- (gdb) p current_task.prev_task.comm
- $13 = "fsck.ext2\000\000\000\000\000\000"
-
-
-
-
-
- It is, so let's see what it thinks it's up to:
-
-
-
- (gdb) p current_task.prev_task.thread
- $14 = {extern_pid = 1980, tracing = 0, want_tracing = 0, forking = 0,
- kernel_stack_page = 0, signal_stack = 1342627840, syscall = {id = 4, args = {
- 3, 134973440, 1024, 0, 1024}, have_result = 0, result = 50590720},
- request = {op = 2, u = {exec = {ip = 1350467584, sp = 2952789424}, fork = {
- regs = {1350467584, 2952789424, 0 <repeats 15 times>}, sigstack = 0,
- pid = 0}, switch_to = 0x507e8000, thread = {proc = 0x507e8000,
- arg = 0xaffffdb0, flags = 0, new_pid = 0}, input_request = {
- op = 1350467584, fd = -1342177872, proc = 0, pid = 0}}}}
-
-
-
-
-
- The interesting things here are the fact that its .thread.syscall.id
- is __NR_write (see the big switch in arch/um/kernel/syscall_kern.c or
- the defines in include/asm-um/arch/unistd.h), and that it never
- returned. Also, its .request.op is OP_SWITCH (see
- arch/um/include/user_util.h). These mean that it went into a write,
- and, for some reason, called schedule().
-
-
- The fact that it never returned from write means that its stack should
- be fairly interesting. Its pid is 1980 (.thread.extern_pid). That
- process is being ptraced by the signal thread, so it must be detached
- before gdb can attach it:
-
-
-
-
-
-
-
-
-
-
- (gdb) call detach(1980)
-
- Program received signal SIGSEGV, Segmentation fault.
- <function called from gdb>
- The program being debugged stopped while in a function called from GDB.
- When the function (detach) is done executing, GDB will silently
- stop (instead of continuing to evaluate the expression containing
- the function call).
- (gdb) call detach(1980)
- $15 = 0
-
-
-
-
-
- The first detach segfaults for some reason, and the second one
- succeeds.
-
-
- Now I detach from the signal thread, attach to the fsck thread, and
- look at its stack:
-
-
- (gdb) det
- Detaching from program: /home/dike/linux/2.3.26/um/linux Pid 1935
- (gdb) att 1980
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1980
- 0x10070451 in __kill ()
- (gdb) bt
- #0 0x10070451 in __kill ()
- #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
- #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
- at process_kern.c:156
- #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
- at process_kern.c:161
- #4 0x10001d12 in schedule () at sched.c:777
- #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
- #6 0x1006aa10 in __down_failed () at semaphore.c:157
- #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
- #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- #9 <signal handler called>
- #10 0x10155404 in errno ()
- #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
- #12 0x1006c5d8 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- #13 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- #14 <signal handler called>
- #15 0xc0fd in ?? ()
- #16 0x10016647 in sys_write (fd=3,
- buf=0x80b8800 <Address 0x80b8800 out of bounds>, count=1024)
- at read_write.c:159
- #17 0x1006d5b3 in execute_syscall (syscall=4, args=0x5006ef08)
- at syscall_kern.c:254
- #18 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
- #19 <signal handler called>
- #20 0x400dc8b0 in ?? ()
-
-
-
-
-
- The interesting things here are :
-
- +\bo There are two segfaults on this stack (frames 9 and 14)
-
- +\bo The first faulting address (frame 11) is 0x50000800
-
- (gdb) p (void *)1342179328
- $16 = (void *) 0x50000800
-
-
-
-
-
- The initial faulting address is interesting because it is on the idle
- thread's stack. I had been seeing the idle thread segfault for no
- apparent reason, and the cause looked like stack corruption. In hopes
- of catching the culprit in the act, I had turned off all protections
- to that stack while the idle thread wasn't running. This apparently
- tripped that trap.
-
-
- However, the more immediate problem is that second segfault and I'm
- going to concentrate on that. First, I want to see where the fault
- happened, so I have to go look at the sigcontent struct in frame 8:
-
-
-
- (gdb) up
- #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
- 30 kill(pid, SIGUSR1);
- (gdb)
- #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
- at process_kern.c:156
- 156 usr1_pid(getpid());
- (gdb)
- #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
- at process_kern.c:161
- 161 _switch_to(prev, next);
- (gdb)
- #4 0x10001d12 in schedule () at sched.c:777
- 777 switch_to(prev, next, prev);
- (gdb)
- #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
- 71 schedule();
- (gdb)
- #6 0x1006aa10 in __down_failed () at semaphore.c:157
- 157 }
- (gdb)
- #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
- 174 segv(sc->cr2, sc->err & 2);
- (gdb)
- #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
- 182 segv_handler(sc);
- (gdb) p *sc
- Cannot access memory at address 0x0.
-
-
-
-
- That's not very useful, so I'll try a more manual method:
-
-
- (gdb) p *((struct sigcontext *) (&sig + 1))
- $19 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
- __dsh = 0, edi = 1342179328, esi = 1350378548, ebp = 1342630440,
- esp = 1342630420, ebx = 1348150624, edx = 1280, ecx = 0, eax = 0,
- trapno = 14, err = 4, eip = 268480945, cs = 35, __csh = 0, eflags = 66118,
- esp_at_signal = 1342630420, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
- cr2 = 1280}
-
-
-
- The ip is in handle_mm_fault:
-
-
- (gdb) p (void *)268480945
- $20 = (void *) 0x1000b1b1
- (gdb) i sym $20
- handle_mm_fault + 57 in section .text
-
-
-
-
-
- Specifically, it's in pte_alloc:
-
-
- (gdb) i line *$20
- Line 124 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1b1 <handle_mm_fault+57>
- and ends at 0x1000b1b7 <handle_mm_fault+63>.
-
-
-
-
-
- To find where in handle_mm_fault this is, I'll jump forward in the
- code until I see an address in that procedure:
-
-
-
- (gdb) i line *0x1000b1c0
- Line 126 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1b7 <handle_mm_fault+63>
- and ends at 0x1000b1c3 <handle_mm_fault+75>.
- (gdb) i line *0x1000b1d0
- Line 131 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1d0 <handle_mm_fault+88>
- and ends at 0x1000b1da <handle_mm_fault+98>.
- (gdb) i line *0x1000b1e0
- Line 61 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1da <handle_mm_fault+98>
- and ends at 0x1000b1e1 <handle_mm_fault+105>.
- (gdb) i line *0x1000b1f0
- Line 134 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b1f0 <handle_mm_fault+120>
- and ends at 0x1000b200 <handle_mm_fault+136>.
- (gdb) i line *0x1000b200
- Line 135 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b200 <handle_mm_fault+136>
- and ends at 0x1000b208 <handle_mm_fault+144>.
- (gdb) i line *0x1000b210
- Line 139 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
- starts at address 0x1000b210 <handle_mm_fault+152>
- and ends at 0x1000b219 <handle_mm_fault+161>.
- (gdb) i line *0x1000b220
- Line 1168 of "memory.c" starts at address 0x1000b21e <handle_mm_fault+166>
- and ends at 0x1000b222 <handle_mm_fault+170>.
-
-
-
-
-
- Something is apparently wrong with the page tables or vma_structs, so
- lets go back to frame 11 and have a look at them:
-
-
-
- #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
- 50 handle_mm_fault(current, vma, address, is_write);
- (gdb) call pgd_offset_proc(vma->vm_mm, address)
- $22 = (pgd_t *) 0x80a548c
-
-
-
-
-
- That's pretty bogus. Page tables aren't supposed to be in process
- text or data areas. Let's see what's in the vma:
-
-
- (gdb) p *vma
- $23 = {vm_mm = 0x507d2434, vm_start = 0, vm_end = 134512640,
- vm_next = 0x80a4f8c, vm_page_prot = {pgprot = 0}, vm_flags = 31200,
- vm_avl_height = 2058, vm_avl_left = 0x80a8c94, vm_avl_right = 0x80d1000,
- vm_next_share = 0xaffffdb0, vm_pprev_share = 0xaffffe63,
- vm_ops = 0xaffffe7a, vm_pgoff = 2952789626, vm_file = 0xafffffec,
- vm_private_data = 0x62}
- (gdb) p *vma.vm_mm
- $24 = {mmap = 0x507d2434, mmap_avl = 0x0, mmap_cache = 0x8048000,
- pgd = 0x80a4f8c, mm_users = {counter = 0}, mm_count = {counter = 134904288},
- map_count = 134909076, mmap_sem = {count = {counter = 135073792},
- sleepers = -1342177872, wait = {lock = <optimized out or zero length>,
- task_list = {next = 0xaffffe63, prev = 0xaffffe7a},
- __magic = -1342177670, __creator = -1342177300}, __magic = 98},
- page_table_lock = {}, context = 138, start_code = 0, end_code = 0,
- start_data = 0, end_data = 0, start_brk = 0, brk = 0, start_stack = 0,
- arg_start = 0, arg_end = 0, env_start = 0, env_end = 0, rss = 1350381536,
- total_vm = 0, locked_vm = 0, def_flags = 0, cpu_vm_mask = 0, swap_cnt = 0,
- swap_address = 0, segments = 0x0}
-
-
-
-
-
- This also pretty bogus. With all of the 0x80xxxxx and 0xaffffxxx
- addresses, this is looking like a stack was plonked down on top of
- these structures. Maybe it's a stack overflow from the next page:
-
-
-
- (gdb) p vma
- $25 = (struct vm_area_struct *) 0x507d2434
-
-
-
-
-
- That's towards the lower quarter of the page, so that would have to
- have been pretty heavy stack overflow:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- (gdb) x/100x $25
- 0x507d2434: 0x507d2434 0x00000000 0x08048000 0x080a4f8c
- 0x507d2444: 0x00000000 0x080a79e0 0x080a8c94 0x080d1000
- 0x507d2454: 0xaffffdb0 0xaffffe63 0xaffffe7a 0xaffffe7a
- 0x507d2464: 0xafffffec 0x00000062 0x0000008a 0x00000000
- 0x507d2474: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2484: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2494: 0x00000000 0x00000000 0x507d2fe0 0x00000000
- 0x507d24a4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24b4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24c4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24d4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24e4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d24f4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2504: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2514: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2524: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2534: 0x00000000 0x00000000 0x507d25dc 0x00000000
- 0x507d2544: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2554: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2564: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2574: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2584: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d2594: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d25a4: 0x00000000 0x00000000 0x00000000 0x00000000
- 0x507d25b4: 0x00000000 0x00000000 0x00000000 0x00000000
-
-
-
-
-
- It's not stack overflow. The only "stack-like" piece of this data is
- the vma_struct itself.
-
-
- At this point, I don't see any avenues to pursue, so I just have to
- admit that I have no idea what's going on. What I will do, though, is
- stick a trap on the segfault handler which will stop if it sees any
- writes to the idle thread's stack. That was the thing that happened
- first, and it may be that if I can catch it immediately, what's going
- on will be somewhat clearer.
-
-
- 1\b12\b2.\b.2\b2.\b. E\bEp\bpi\bis\bso\bod\bde\be 2\b2:\b: T\bTh\bhe\be c\bca\bas\bse\be o\bof\bf t\bth\bhe\be h\bhu\bun\bng\bg f\bfs\bsc\bck\bk
-
- After setting a trap in the SEGV handler for accesses to the signal
- thread's stack, I reran the kernel.
-
-
- fsck hung again, this time by hitting the trap:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Setting hostname uml [ OK ]
- Checking root filesystem
- /dev/fhd0 contains a file system with errors, check forced.
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
-
- /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
- (i.e., without -a or -p options)
- [ FAILED ]
-
- *** An error occurred during the file system check.
- *** Dropping you to a shell; the system will reboot
- *** when you leave the shell.
- Give root password for maintenance
- (or type Control-D for normal startup):
-
- [root@uml /root]# fsck -y /dev/fhd0
- fsck -y /dev/fhd0
- Parallelizing fsck version 1.14 (9-Jan-1999)
- e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
- /dev/fhd0 contains a file system with errors, check forced.
- Pass 1: Checking inodes, blocks, and sizes
- Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
-
- Pass 2: Checking directory structure
- Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
-
- Directory inode 11858, block 0, offset 0: directory corrupted
- Salvage? yes
-
- Missing '.' in directory inode 11858.
- Fix? yes
-
- Missing '..' in directory inode 11858.
- Fix? yes
-
- Untested (4127) [100fe44c]: trap_kern.c line 31
-
-
-
-
-
- I need to get the signal thread to detach from pid 4127 so that I can
- attach to it with gdb. This is done by sending it a SIGUSR1, which is
- caught by the signal thread, which detaches the process:
-
-
- kill -USR1 4127
-
-
-
-
-
- Now I can run gdb on it:
-
-
-
-
-
-
-
-
-
-
-
-
-
- ~/linux/2.3.26/um 1034: gdb linux
- GNU gdb 4.17.0.11 with Linux support
- Copyright 1998 Free Software Foundation, Inc.
- GDB is free software, covered by the GNU General Public License, and you are
- welcome to change it and/or distribute copies of it under certain conditions.
- Type "show copying" to see the conditions.
- There is absolutely no warranty for GDB. Type "show warranty" for details.
- This GDB was configured as "i386-redhat-linux"...
- (gdb) att 4127
- Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 4127
- 0x10075891 in __libc_nanosleep ()
-
-
-
-
-
- The backtrace shows that it was in a write and that the fault address
- (address in frame 3) is 0x50000800, which is right in the middle of
- the signal thread's stack page:
-
-
- (gdb) bt
- #0 0x10075891 in __libc_nanosleep ()
- #1 0x1007584d in __sleep (seconds=1000000)
- at ../sysdeps/unix/sysv/linux/sleep.c:78
- #2 0x1006ce9a in stop () at user_util.c:191
- #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
- #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- #5 0x1006c63c in kern_segv_handler (sig=11) at trap_user.c:182
- #6 <signal handler called>
- #7 0xc0fd in ?? ()
- #8 0x10016647 in sys_write (fd=3, buf=0x80b8800 "R.", count=1024)
- at read_write.c:159
- #9 0x1006d603 in execute_syscall (syscall=4, args=0x5006ef08)
- at syscall_kern.c:254
- #10 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
- #11 <signal handler called>
- #12 0x400dc8b0 in ?? ()
- #13 <signal handler called>
- #14 0x400dc8b0 in ?? ()
- #15 0x80545fd in ?? ()
- #16 0x804daae in ?? ()
- #17 0x8054334 in ?? ()
- #18 0x804d23e in ?? ()
- #19 0x8049632 in ?? ()
- #20 0x80491d2 in ?? ()
- #21 0x80596b5 in ?? ()
- (gdb) p (void *)1342179328
- $3 = (void *) 0x50000800
-
-
-
-
-
- Going up the stack to the segv_handler frame and looking at where in
- the code the access happened shows that it happened near line 110 of
- block_dev.c:
-
-
-
-
-
-
-
-
-
- (gdb) up
- #1 0x1007584d in __sleep (seconds=1000000)
- at ../sysdeps/unix/sysv/linux/sleep.c:78
- ../sysdeps/unix/sysv/linux/sleep.c:78: No such file or directory.
- (gdb)
- #2 0x1006ce9a in stop () at user_util.c:191
- 191 while(1) sleep(1000000);
- (gdb)
- #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
- 31 KERN_UNTESTED();
- (gdb)
- #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
- 174 segv(sc->cr2, sc->err & 2);
- (gdb) p *sc
- $1 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
- __dsh = 0, edi = 1342179328, esi = 134973440, ebp = 1342631484,
- esp = 1342630864, ebx = 256, edx = 0, ecx = 256, eax = 1024, trapno = 14,
- err = 6, eip = 268550834, cs = 35, __csh = 0, eflags = 66070,
- esp_at_signal = 1342630864, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
- cr2 = 1342179328}
- (gdb) p (void *)268550834
- $2 = (void *) 0x1001c2b2
- (gdb) i sym $2
- block_write + 1090 in section .text
- (gdb) i line *$2
- Line 209 of "/home/dike/linux/2.3.26/um/include/asm/arch/string.h"
- starts at address 0x1001c2a1 <block_write+1073>
- and ends at 0x1001c2bf <block_write+1103>.
- (gdb) i line *0x1001c2c0
- Line 110 of "block_dev.c" starts at address 0x1001c2bf <block_write+1103>
- and ends at 0x1001c2e3 <block_write+1139>.
-
-
-
-
-
- Looking at the source shows that the fault happened during a call to
- copy_to_user to copy the data into the kernel:
-
-
- 107 count -= chars;
- 108 copy_from_user(p,buf,chars);
- 109 p += chars;
- 110 buf += chars;
-
-
-
-
-
- p is the pointer which must contain 0x50000800, since buf contains
- 0x80b8800 (frame 8 above). It is defined as:
-
-
- p = offset + bh->b_data;
-
-
-
-
-
- I need to figure out what bh is, and it just so happens that bh is
- passed as an argument to mark_buffer_uptodate and mark_buffer_dirty a
- few lines later, so I do a little disassembly:
-
-
-
-
- (gdb) disas 0x1001c2bf 0x1001c2e0
- Dump of assembler code from 0x1001c2bf to 0x1001c2d0:
- 0x1001c2bf <block_write+1103>: addl %eax,0xc(%ebp)
- 0x1001c2c2 <block_write+1106>: movl 0xfffffdd4(%ebp),%edx
- 0x1001c2c8 <block_write+1112>: btsl $0x0,0x18(%edx)
- 0x1001c2cd <block_write+1117>: btsl $0x1,0x18(%edx)
- 0x1001c2d2 <block_write+1122>: sbbl %ecx,%ecx
- 0x1001c2d4 <block_write+1124>: testl %ecx,%ecx
- 0x1001c2d6 <block_write+1126>: jne 0x1001c2e3 <block_write+1139>
- 0x1001c2d8 <block_write+1128>: pushl $0x0
- 0x1001c2da <block_write+1130>: pushl %edx
- 0x1001c2db <block_write+1131>: call 0x1001819c <__mark_buffer_dirty>
- End of assembler dump.
-
-
-
-
-
- At that point, bh is in %edx (address 0x1001c2da), which is calculated
- at 0x1001c2c2 as %ebp + 0xfffffdd4, so I figure exactly what that is,
- taking %ebp from the sigcontext_struct above:
-
-
- (gdb) p (void *)1342631484
- $5 = (void *) 0x5006ee3c
- (gdb) p 0x5006ee3c+0xfffffdd4
- $6 = 1342630928
- (gdb) p (void *)$6
- $7 = (void *) 0x5006ec10
- (gdb) p *((void **)$7)
- $8 = (void *) 0x50100200
-
-
-
-
-
- Now, I look at the structure to see what's in it, and particularly,
- what its b_data field contains:
-
-
- (gdb) p *((struct buffer_head *)0x50100200)
- $13 = {b_next = 0x50289380, b_blocknr = 49405, b_size = 1024, b_list = 0,
- b_dev = 15872, b_count = {counter = 1}, b_rdev = 15872, b_state = 24,
- b_flushtime = 0, b_next_free = 0x501001a0, b_prev_free = 0x50100260,
- b_this_page = 0x501001a0, b_reqnext = 0x0, b_pprev = 0x507fcf58,
- b_data = 0x50000800 "", b_page = 0x50004000,
- b_end_io = 0x10017f60 <end_buffer_io_sync>, b_dev_id = 0x0,
- b_rsector = 98810, b_wait = {lock = <optimized out or zero length>,
- task_list = {next = 0x50100248, prev = 0x50100248}, __magic = 1343226448,
- __creator = 0}, b_kiobuf = 0x0}
-
-
-
-
-
- The b_data field is indeed 0x50000800, so the question becomes how
- that happened. The rest of the structure looks fine, so this probably
- is not a case of data corruption. It happened on purpose somehow.
-
-
- The b_page field is a pointer to the page_struct representing the
- 0x50000000 page. Looking at it shows the kernel's idea of the state
- of that page:
-
-
-
- (gdb) p *$13.b_page
- $17 = {list = {next = 0x50004a5c, prev = 0x100c5174}, mapping = 0x0,
- index = 0, next_hash = 0x0, count = {counter = 1}, flags = 132, lru = {
- next = 0x50008460, prev = 0x50019350}, wait = {
- lock = <optimized out or zero length>, task_list = {next = 0x50004024,
- prev = 0x50004024}, __magic = 1342193708, __creator = 0},
- pprev_hash = 0x0, buffers = 0x501002c0, virtual = 1342177280,
- zone = 0x100c5160}
-
-
-
-
-
- Some sanity-checking: the virtual field shows the "virtual" address of
- this page, which in this kernel is the same as its "physical" address,
- and the page_struct itself should be mem_map[0], since it represents
- the first page of memory:
-
-
-
- (gdb) p (void *)1342177280
- $18 = (void *) 0x50000000
- (gdb) p mem_map
- $19 = (mem_map_t *) 0x50004000
-
-
-
-
-
- These check out fine.
-
-
- Now to check out the page_struct itself. In particular, the flags
- field shows whether the page is considered free or not:
-
-
- (gdb) p (void *)132
- $21 = (void *) 0x84
-
-
-
-
-
- The "reserved" bit is the high bit, which is definitely not set, so
- the kernel considers the signal stack page to be free and available to
- be used.
-
-
- At this point, I jump to conclusions and start looking at my early
- boot code, because that's where that page is supposed to be reserved.
-
-
- In my setup_arch procedure, I have the following code which looks just
- fine:
-
-
-
- bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn);
- free_bootmem(__pa(low_physmem) + bootmap_size, high_physmem - low_physmem);
-
-
-
-
-
- Two stack pages have already been allocated, and low_physmem points to
- the third page, which is the beginning of free memory.
- The init_bootmem call declares the entire memory to the boot memory
- manager, which marks it all reserved. The free_bootmem call frees up
- all of it, except for the first two pages. This looks correct to me.
-
-
- So, I decide to see init_bootmem run and make sure that it is marking
- those first two pages as reserved. I never get that far.
-
-
- Stepping into init_bootmem, and looking at bootmem_map before looking
- at what it contains shows the following:
-
-
-
- (gdb) p bootmem_map
- $3 = (void *) 0x50000000
-
-
-
-
-
- Aha! The light dawns. That first page is doing double duty as a
- stack and as the boot memory map. The last thing that the boot memory
- manager does is to free the pages used by its memory map, so this page
- is getting freed even its marked as reserved.
-
-
- The fix was to initialize the boot memory manager before allocating
- those two stack pages, and then allocate them through the boot memory
- manager. After doing this, and fixing a couple of subsequent buglets,
- the stack corruption problem disappeared.
-
-
-
-
-
- 1\b13\b3.\b. W\bWh\bha\bat\bt t\bto\bo d\bdo\bo w\bwh\bhe\ben\bn U\bUM\bML\bL d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk
-
-
-
-
- 1\b13\b3.\b.1\b1.\b. S\bSt\btr\bra\ban\bng\bge\be c\bco\bom\bmp\bpi\bil\bla\bat\bti\bio\bon\bn e\ber\brr\bro\bor\brs\bs w\bwh\bhe\ben\bn y\byo\bou\bu b\bbu\bui\bil\bld\bd f\bfr\bro\bom\bm s\bso\bou\bur\brc\bce\be
-
- As of test11, it is necessary to have "ARCH=um" in the environment or
- on the make command line for all steps in building UML, including
- clean, distclean, or mrproper, config, menuconfig, or xconfig, dep,
- and linux. If you forget for any of them, the i386 build seems to
- contaminate the UML build. If this happens, start from scratch with
-
-
- host%
- make mrproper ARCH=um
-
-
-
-
- and repeat the build process with ARCH=um on all the steps.
-
-
- See ``Compiling the kernel and modules'' for more details.
-
-
- Another cause of strange compilation errors is building UML in
- /usr/src/linux. If you do this, the first thing you need to do is
- clean up the mess you made. The /usr/src/linux/asm link will now
- point to /usr/src/linux/asm-um. Make it point back to
- /usr/src/linux/asm-i386. Then, move your UML pool someplace else and
- build it there. Also see below, where a more specific set of symptoms
- is described.
-
-
-
- 1\b13\b3.\b.3\b3.\b. A\bA v\bva\bar\bri\bie\bet\bty\by o\bof\bf p\bpa\ban\bni\bic\bcs\bs a\ban\bnd\bd h\bha\ban\bng\bgs\bs w\bwi\bit\bth\bh /\b/t\btm\bmp\bp o\bon\bn a\ba r\bre\bei\bis\bse\ber\brf\bfs\bs f\bfi\bil\ble\bes\bsy\bys\bs-\b-
- t\bte\bem\bm
-
- I saw this on reiserfs 3.5.21 and it seems to be fixed in 3.5.27.
- Panics preceded by
-
-
- Detaching pid nnnn
-
-
-
- are diagnostic of this problem. This is a reiserfs bug which causes a
- thread to occasionally read stale data from a mmapped page shared with
- another thread. The fix is to upgrade the filesystem or to have /tmp
- be an ext2 filesystem.
-
-
-
- 1\b13\b3.\b.4\b4.\b. T\bTh\bhe\be c\bco\bom\bmp\bpi\bil\ble\be f\bfa\bai\bil\bls\bs w\bwi\bit\bth\bh e\ber\brr\bro\bor\brs\bs a\bab\bbo\bou\but\bt c\bco\bon\bnf\bfl\bli\bic\bct\bti\bin\bng\bg t\bty\byp\bpe\bes\bs f\bfo\bor\br
- '\b'o\bop\bpe\ben\bn'\b',\b, '\b'd\bdu\bup\bp'\b',\b, a\ban\bnd\bd '\b'w\bwa\bai\bit\btp\bpi\bid\bd'\b'
-
- This happens when you build in /usr/src/linux. The UML build makes
- the include/asm link point to include/asm-um. /usr/include/asm points
- to /usr/src/linux/include/asm, so when that link gets moved, files
- which need to include the asm-i386 versions of headers get the
- incompatible asm-um versions. The fix is to move the include/asm link
- back to include/asm-i386 and to do UML builds someplace else.
-
-
-
- 1\b13\b3.\b.5\b5.\b. U\bUM\bML\bL d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk w\bwh\bhe\ben\bn /\b/t\btm\bmp\bp i\bis\bs a\ban\bn N\bNF\bFS\bS f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
-
- This seems to be a similar situation with the ReiserFS problem above.
- Some versions of NFS seems not to handle mmap correctly, which UML
- depends on. The workaround is have /tmp be a non-NFS directory.
-
-
- 1\b13\b3.\b.6\b6.\b. U\bUM\bML\bL h\bha\ban\bng\bgs\bs o\bon\bn b\bbo\boo\bot\bt w\bwh\bhe\ben\bn c\bco\bom\bmp\bpi\bil\ble\bed\bd w\bwi\bit\bth\bh g\bgp\bpr\bro\bof\bf s\bsu\bup\bpp\bpo\bor\brt\bt
-
- If you build UML with gprof support and, early in the boot, it does
- this
-
-
- kernel BUG at page_alloc.c:100!
-
-
-
-
- you have a buggy gcc. You can work around the problem by removing
- UM_FASTCALL from CFLAGS in arch/um/Makefile-i386. This will open up
- another bug, but that one is fairly hard to reproduce.
-
-
-
- 1\b13\b3.\b.7\b7.\b. s\bsy\bys\bsl\blo\bog\bgd\bd d\bdi\bie\bes\bs w\bwi\bit\bth\bh a\ba S\bSI\bIG\bGT\bTE\bER\bRM\bM o\bon\bn s\bst\bta\bar\brt\btu\bup\bp
-
- The exact boot error depends on the distribution that you're booting,
- but Debian produces this:
-
-
- /etc/rc2.d/S10sysklogd: line 49: 93 Terminated
- start-stop-daemon --start --quiet --exec /sbin/syslogd -- $SYSLOGD
-
-
-
-
- This is a syslogd bug. There's a race between a parent process
- installing a signal handler and its child sending the signal. See
- this uml-devel post <http://www.geocrawler.com/lists/3/Source-
- Forge/709/0/6612801> for the details.
-
-
-
- 1\b13\b3.\b.8\b8.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP n\bne\bet\btw\bwo\bor\brk\bki\bin\bng\bg d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk o\bon\bn a\ba 2\b2.\b.4\b4 h\bho\bos\bst\bt
-
- There are a couple of problems which were
- <http://www.geocrawler.com/lists/3/SourceForge/597/0/> name="pointed
- out"> by Tim Robinson <timro at trkr dot net>
-
- +\bo It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier.
- The fix is to upgrade to something more recent and then read the
- next item.
-
- +\bo If you see
-
-
- File descriptor in bad state
-
-
-
- when you bring up the device inside UML, you have a header mismatch
- between the original kernel and the upgraded one. Make /usr/src/linux
- point at the new headers. This will only be a problem if you build
- uml_net yourself.
-
-
-
- 1\b13\b3.\b.9\b9.\b. Y\bYo\bou\bu c\bca\ban\bn n\bne\bet\btw\bwo\bor\brk\bk t\bto\bo t\bth\bhe\be h\bho\bos\bst\bt b\bbu\but\bt n\bno\bot\bt t\bto\bo o\bot\bth\bhe\ber\br m\bma\bac\bch\bhi\bin\bne\bes\bs o\bon\bn t\bth\bhe\be
- n\bne\bet\bt
-
- If you can connect to the host, and the host can connect to UML, but
- you cannot connect to any other machines, then you may need to enable
- IP Masquerading on the host. Usually this is only experienced when
- using private IP addresses (192.168.x.x or 10.x.x.x) for host/UML
- networking, rather than the public address space that your host is
- connected to. UML does not enable IP Masquerading, so you will need
- to create a static rule to enable it:
-
-
- host%
- iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
-
-
-
-
- Replace eth0 with the interface that you use to talk to the rest of
- the world.
-
-
- Documentation on IP Masquerading, and SNAT, can be found at
- www.netfilter.org <http://www.netfilter.org> .
-
-
- If you can reach the local net, but not the outside Internet, then
- that is usually a routing problem. The UML needs a default route:
-
-
- UML#
- route add default gw gateway IP
-
-
-
-
- The gateway IP can be any machine on the local net that knows how to
- reach the outside world. Usually, this is the host or the local net-
- work's gateway.
-
-
- Occasionally, we hear from someone who can reach some machines, but
- not others on the same net, or who can reach some ports on other
- machines, but not others. These are usually caused by strange
- firewalling somewhere between the UML and the other box. You track
- this down by running tcpdump on every interface the packets travel
- over and see where they disappear. When you find a machine that takes
- the packets in, but does not send them onward, that's the culprit.
-
-
-
- 1\b13\b3.\b.1\b10\b0.\b. I\bI h\bha\bav\bve\be n\bno\bo r\bro\boo\bot\bt a\ban\bnd\bd I\bI w\bwa\ban\bnt\bt t\bto\bo s\bsc\bcr\bre\bea\bam\bm
-
- Thanks to Birgit Wahlich for telling me about this strange one. It
- turns out that there's a limit of six environment variables on the
- kernel command line. When that limit is reached or exceeded, argument
- processing stops, which means that the 'root=' argument that UML
- usually adds is not seen. So, the filesystem has no idea what the
- root device is, so it panics.
-
-
- The fix is to put less stuff on the command line. Glomming all your
- setup variables into one is probably the best way to go.
-
-
-
- 1\b13\b3.\b.1\b11\b1.\b. U\bUM\bML\bL b\bbu\bui\bil\bld\bd c\bco\bon\bnf\bfl\bli\bic\bct\bt b\bbe\bet\btw\bwe\bee\ben\bn p\bpt\btr\bra\bac\bce\be.\b.h\bh a\ban\bnd\bd u\buc\bco\bon\bnt\bte\bex\bxt\bt.\b.h\bh
-
- On some older systems, /usr/include/asm/ptrace.h and
- /usr/include/sys/ucontext.h define the same names. So, when they're
- included together, the defines from one completely mess up the parsing
- of the other, producing errors like:
- /usr/include/sys/ucontext.h:47: parse error before
- `10'
-
-
-
-
- plus a pile of warnings.
-
-
- This is a libc botch, which has since been fixed, and I don't see any
- way around it besides upgrading.
-
-
-
- 1\b13\b3.\b.1\b12\b2.\b. T\bTh\bhe\be U\bUM\bML\bL B\bBo\bog\bgo\boM\bMi\bip\bps\bs i\bis\bs e\bex\bxa\bac\bct\btl\bly\by h\bha\bal\blf\bf t\bth\bhe\be h\bho\bos\bst\bt'\b's\bs B\bBo\bog\bgo\boM\bMi\bip\bps\bs
-
- On i386 kernels, there are two ways of running the loop that is used
- to calculate the BogoMips rating, using the TSC if it's there or using
- a one-instruction loop. The TSC produces twice the BogoMips as the
- loop. UML uses the loop, since it has nothing resembling a TSC, and
- will get almost exactly the same BogoMips as a host using the loop.
- However, on a host with a TSC, its BogoMips will be double the loop
- BogoMips, and therefore double the UML BogoMips.
-
-
-
- 1\b13\b3.\b.1\b13\b3.\b. W\bWh\bhe\ben\bn y\byo\bou\bu r\bru\bun\bn U\bUM\bML\bL,\b, i\bit\bt i\bim\bmm\bme\bed\bdi\bia\bat\bte\bel\bly\by s\bse\beg\bgf\bfa\bau\bul\blt\bts\bs
-
- If the host is configured with the 2G/2G address space split, that's
- why. See ``UML on 2G/2G hosts'' for the details on getting UML to
- run on your host.
-
-
-
- 1\b13\b3.\b.1\b14\b4.\b. x\bxt\bte\ber\brm\bms\bs a\bap\bpp\bpe\bea\bar\br,\b, t\bth\bhe\ben\bn i\bim\bmm\bme\bed\bdi\bia\bat\bte\bel\bly\by d\bdi\bis\bsa\bap\bpp\bpe\bea\bar\br
-
- If you're running an up to date kernel with an old release of
- uml_utilities, the port-helper program will not work properly, so
- xterms will exit straight after they appear. The solution is to
- upgrade to the latest release of uml_utilities. Usually this problem
- occurs when you have installed a packaged release of UML then compiled
- your own development kernel without upgrading the uml_utilities from
- the source distribution.
-
-
-
- 1\b13\b3.\b.1\b15\b5.\b. A\bAn\bny\by o\bot\bth\bhe\ber\br p\bpa\ban\bni\bic\bc,\b, h\bha\ban\bng\bg,\b, o\bor\br s\bst\btr\bra\ban\bng\bge\be b\bbe\beh\bha\bav\bvi\bio\bor\br
-
- If you're seeing truly strange behavior, such as hangs or panics that
- happen in random places, or you try running the debugger to see what's
- happening and it acts strangely, then it could be a problem in the
- host kernel. If you're not running a stock Linus or -ac kernel, then
- try that. An early version of the preemption patch and a 2.4.10 SuSE
- kernel have caused very strange problems in UML.
-
-
- Otherwise, let me know about it. Send a message to one of the UML
- mailing lists - either the developer list - user-mode-linux-devel at
- lists dot sourceforge dot net (subscription info) or the user list -
- user-mode-linux-user at lists dot sourceforge do net (subscription
- info), whichever you prefer. Don't assume that everyone knows about
- it and that a fix is imminent.
-
-
- If you want to be super-helpful, read ``Diagnosing Problems'' and
- follow the instructions contained therein.
- 1\b14\b4.\b. D\bDi\bia\bag\bgn\bno\bos\bsi\bin\bng\bg P\bPr\bro\bob\bbl\ble\bem\bms\bs
-
-
- If you get UML to crash, hang, or otherwise misbehave, you should
- report this on one of the project mailing lists, either the developer
- list - user-mode-linux-devel at lists dot sourceforge dot net
- (subscription info) or the user list - user-mode-linux-user at lists
- dot sourceforge dot net (subscription info). When you do, it is
- likely that I will want more information. So, it would be helpful to
- read the stuff below, do whatever is applicable in your case, and
- report the results to the list.
-
-
- For any diagnosis, you're going to need to build a debugging kernel.
- The binaries from this site aren't debuggable. If you haven't done
- this before, read about ``Compiling the kernel and modules'' and
- ``Kernel debugging'' UML first.
-
-
- 1\b14\b4.\b.1\b1.\b. C\bCa\bas\bse\be 1\b1 :\b: N\bNo\bor\brm\bma\bal\bl k\bke\ber\brn\bne\bel\bl p\bpa\ban\bni\bic\bcs\bs
-
- The most common case is for a normal thread to panic. To debug this,
- you will need to run it under the debugger (add 'debug' to the command
- line). An xterm will start up with gdb running inside it. Continue
- it when it stops in start_kernel and make it crash. Now ^C gdb and
-
-
- If the panic was a "Kernel mode fault", then there will be a segv
- frame on the stack and I'm going to want some more information. The
- stack might look something like this:
-
-
- (UML gdb) backtrace
- #0 0x1009bf76 in __sigprocmask (how=1, set=0x5f347940, oset=0x0)
- at ../sysdeps/unix/sysv/linux/sigprocmask.c:49
- #1 0x10091411 in change_sig (signal=10, on=1) at process.c:218
- #2 0x10094785 in timer_handler (sig=26) at time_kern.c:32
- #3 0x1009bf38 in __restore ()
- at ../sysdeps/unix/sysv/linux/i386/sigaction.c:125
- #4 0x1009534c in segv (address=8, ip=268849158, is_write=2, is_user=0)
- at trap_kern.c:66
- #5 0x10095c04 in segv_handler (sig=11) at trap_user.c:285
- #6 0x1009bf38 in __restore ()
-
-
-
-
- I'm going to want to see the symbol and line information for the value
- of ip in the segv frame. In this case, you would do the following:
-
-
- (UML gdb) i sym 268849158
-
-
-
-
- and
-
-
- (UML gdb) i line *268849158
-
-
-
-
- The reason for this is the __restore frame right above the segv_han-
- dler frame is hiding the frame that actually segfaulted. So, I have
- to get that information from the faulting ip.
-
-
- 1\b14\b4.\b.2\b2.\b. C\bCa\bas\bse\be 2\b2 :\b: T\bTr\bra\bac\bci\bin\bng\bg t\bth\bhr\bre\bea\bad\bd p\bpa\ban\bni\bic\bcs\bs
-
- The less common and more painful case is when the tracing thread
- panics. In this case, the kernel debugger will be useless because it
- needs a healthy tracing thread in order to work. The first thing to
- do is get a backtrace from the tracing thread. This is done by
- figuring out what its pid is, firing up gdb, and attaching it to that
- pid. You can figure out the tracing thread pid by looking at the
- first line of the console output, which will look like this:
-
-
- tracing thread pid = 15851
-
-
-
-
- or by running ps on the host and finding the line that looks like
- this:
-
-
- jdike 15851 4.5 0.4 132568 1104 pts/0 S 21:34 0:05 ./linux [(tracing thread)]
-
-
-
-
- If the panic was 'segfault in signals', then follow the instructions
- above for collecting information about the location of the seg fault.
-
-
- If the tracing thread flaked out all by itself, then send that
- backtrace in and wait for our crack debugging team to fix the problem.
-
-
- 1\b14\b4.\b.3\b3.\b. C\bCa\bas\bse\be 3\b3 :\b: T\bTr\bra\bac\bci\bin\bng\bg t\bth\bhr\bre\bea\bad\bd p\bpa\ban\bni\bic\bcs\bs c\bca\bau\bus\bse\bed\bd b\bby\by o\bot\bth\bhe\ber\br t\bth\bhr\bre\bea\bad\bds\bs
-
- However, there are cases where the misbehavior of another thread
- caused the problem. The most common panic of this type is:
-
-
- wait_for_stop failed to wait for <pid> to stop with <signal number>
-
-
-
-
- In this case, you'll need to get a backtrace from the process men-
- tioned in the panic, which is complicated by the fact that the kernel
- debugger is defunct and without some fancy footwork, another gdb can't
- attach to it. So, this is how the fancy footwork goes:
-
- In a shell:
-
-
- host% kill -STOP pid
-
-
-
-
- Run gdb on the tracing thread as described in case 2 and do:
-
-
- (host gdb) call detach(pid)
-
-
- If you get a segfault, do it again. It always works the second time.
-
- Detach from the tracing thread and attach to that other thread:
-
-
- (host gdb) detach
-
-
-
-
-
-
- (host gdb) attach pid
-
-
-
-
- If gdb hangs when attaching to that process, go back to a shell and
- do:
-
-
- host%
- kill -CONT pid
-
-
-
-
- And then get the backtrace:
-
-
- (host gdb) backtrace
-
-
-
-
-
- 1\b14\b4.\b.4\b4.\b. C\bCa\bas\bse\be 4\b4 :\b: H\bHa\ban\bng\bgs\bs
-
- Hangs seem to be fairly rare, but they sometimes happen. When a hang
- happens, we need a backtrace from the offending process. Run the
- kernel debugger as described in case 1 and get a backtrace. If the
- current process is not the idle thread, then send in the backtrace.
- You can tell that it's the idle thread if the stack looks like this:
-
-
- #0 0x100b1401 in __libc_nanosleep ()
- #1 0x100a2885 in idle_sleep (secs=10) at time.c:122
- #2 0x100a546f in do_idle () at process_kern.c:445
- #3 0x100a5508 in cpu_idle () at process_kern.c:471
- #4 0x100ec18f in start_kernel () at init/main.c:592
- #5 0x100a3e10 in start_kernel_proc (unused=0x0) at um_arch.c:71
- #6 0x100a383f in signal_tramp (arg=0x100a3dd8) at trap_user.c:50
-
-
-
-
- If this is the case, then some other process is at fault, and went to
- sleep when it shouldn't have. Run ps on the host and figure out which
- process should not have gone to sleep and stayed asleep. Then attach
- to it with gdb and get a backtrace as described in case 3.
-
-
-
-
-
-
- 1\b15\b5.\b. T\bTh\bha\ban\bnk\bks\bs
-
-
- A number of people have helped this project in various ways, and this
- page gives recognition where recognition is due.
-
-
- If you're listed here and you would prefer a real link on your name,
- or no link at all, instead of the despammed email address pseudo-link,
- let me know.
-
-
- If you're not listed here and you think maybe you should be, please
- let me know that as well. I try to get everyone, but sometimes my
- bookkeeping lapses and I forget about contributions.
-
-
- 1\b15\b5.\b.1\b1.\b. C\bCo\bod\bde\be a\ban\bnd\bd D\bDo\boc\bcu\bum\bme\ben\bnt\bta\bat\bti\bio\bon\bn
-
- Rusty Russell <rusty at linuxcare.com.au> -
-
- +\bo wrote the HOWTO <http://user-mode-
- linux.sourceforge.net/UserModeLinux-HOWTO.html>
-
- +\bo prodded me into making this project official and putting it on
- SourceForge
-
- +\bo came up with the way cool UML logo <http://user-mode-
- linux.sourceforge.net/uml-small.png>
-
- +\bo redid the config process
-
-
- Peter Moulder <reiter at netspace.net.au> - Fixed my config and build
- processes, and added some useful code to the block driver
-
-
- Bill Stearns <wstearns at pobox.com> -
-
- +\bo HOWTO updates
-
- +\bo lots of bug reports
-
- +\bo lots of testing
-
- +\bo dedicated a box (uml.ists.dartmouth.edu) to support UML development
-
- +\bo wrote the mkrootfs script, which allows bootable filesystems of
- RPM-based distributions to be cranked out
-
- +\bo cranked out a large number of filesystems with said script
-
-
- Jim Leu <jleu at mindspring.com> - Wrote the virtual ethernet driver
- and associated usermode tools
-
- Lars Brinkhoff <http://lars.nocrew.org/> - Contributed the ptrace
- proxy from his own project <http://a386.nocrew.org/> to allow easier
- kernel debugging
-
-
- Andrea Arcangeli <andrea at suse.de> - Redid some of the early boot
- code so that it would work on machines with Large File Support
-
-
- Chris Emerson <http://www.chiark.greenend.org.uk/~cemerson/> - Did
- the first UML port to Linux/ppc
-
-
- Harald Welte <laforge at gnumonks.org> - Wrote the multicast
- transport for the network driver
-
-
- Jorgen Cederlof - Added special file support to hostfs
-
-
- Greg Lonnon <glonnon at ridgerun dot com> - Changed the ubd driver
- to allow it to layer a COW file on a shared read-only filesystem and
- wrote the iomem emulation support
-
-
- Henrik Nordstrom <http://hem.passagen.se/hno/> - Provided a variety
- of patches, fixes, and clues
-
-
- Lennert Buytenhek - Contributed various patches, a rewrite of the
- network driver, the first implementation of the mconsole driver, and
- did the bulk of the work needed to get SMP working again.
-
-
- Yon Uriarte - Fixed the TUN/TAP network backend while I slept.
-
-
- Adam Heath - Made a bunch of nice cleanups to the initialization code,
- plus various other small patches.
-
-
- Matt Zimmerman - Matt volunteered to be the UML Debian maintainer and
- is doing a real nice job of it. He also noticed and fixed a number of
- actually and potentially exploitable security holes in uml_net. Plus
- the occasional patch. I like patches.
-
-
- James McMechan - James seems to have taken over maintenance of the ubd
- driver and is doing a nice job of it.
-
-
- Chandan Kudige - wrote the umlgdb script which automates the reloading
- of module symbols.
-
-
- Steve Schmidtke - wrote the UML slirp transport and hostaudio drivers,
- enabling UML processes to access audio devices on the host. He also
- submitted patches for the slip transport and lots of other things.
-
-
- David Coulson <http://davidcoulson.net> -
-
- +\bo Set up the usermodelinux.org <http://usermodelinux.org> site,
- which is a great way of keeping the UML user community on top of
- UML goings-on.
-
- +\bo Site documentation and updates
-
- +\bo Nifty little UML management daemon UMLd
- <http://uml.openconsultancy.com/umld/>
-
- +\bo Lots of testing and bug reports
-
-
-
-
- 1\b15\b5.\b.2\b2.\b. F\bFl\blu\bus\bsh\bhi\bin\bng\bg o\bou\but\bt b\bbu\bug\bgs\bs
-
-
-
- +\bo Yuri Pudgorodsky
-
- +\bo Gerald Britton
-
- +\bo Ian Wehrman
-
- +\bo Gord Lamb
-
- +\bo Eugene Koontz
-
- +\bo John H. Hartman
-
- +\bo Anders Karlsson
-
- +\bo Daniel Phillips
-
- +\bo John Fremlin
-
- +\bo Rainer Burgstaller
-
- +\bo James Stevenson
-
- +\bo Matt Clay
-
- +\bo Cliff Jefferies
-
- +\bo Geoff Hoff
-
- +\bo Lennert Buytenhek
-
- +\bo Al Viro
-
- +\bo Frank Klingenhoefer
-
- +\bo Livio Baldini Soares
-
- +\bo Jon Burgess
-
- +\bo Petru Paler
-
- +\bo Paul
-
- +\bo Chris Reahard
-
- +\bo Sverker Nilsson
-
- +\bo Gong Su
-
- +\bo johan verrept
-
- +\bo Bjorn Eriksson
-
- +\bo Lorenzo Allegrucci
-
- +\bo Muli Ben-Yehuda
-
- +\bo David Mansfield
-
- +\bo Howard Goff
-
- +\bo Mike Anderson
-
- +\bo John Byrne
-
- +\bo Sapan J. Batia
-
- +\bo Iris Huang
-
- +\bo Jan Hudec
-
- +\bo Voluspa
-
-
-
-
- 1\b15\b5.\b.3\b3.\b. B\bBu\bug\bgl\ble\bet\bts\bs a\ban\bnd\bd c\bcl\ble\bea\ban\bn-\b-u\bup\bps\bs
-
-
-
- +\bo Dave Zarzycki
-
- +\bo Adam Lazur
-
- +\bo Boria Feigin
-
- +\bo Brian J. Murrell
-
- +\bo JS
-
- +\bo Roman Zippel
-
- +\bo Wil Cooley
-
- +\bo Ayelet Shemesh
-
- +\bo Will Dyson
-
- +\bo Sverker Nilsson
-
- +\bo dvorak
-
- +\bo v.naga srinivas
-
- +\bo Shlomi Fish
-
- +\bo Roger Binns
-
- +\bo johan verrept
-
- +\bo MrChuoi
-
- +\bo Peter Cleve
-
- +\bo Vincent Guffens
-
- +\bo Nathan Scott
-
- +\bo Patrick Caulfield
-
- +\bo jbearce
-
- +\bo Catalin Marinas
-
- +\bo Shane Spencer
-
- +\bo Zou Min
-
-
- +\bo Ryan Boder
-
- +\bo Lorenzo Colitti
-
- +\bo Gwendal Grignou
-
- +\bo Andre' Breiler
-
- +\bo Tsutomu Yasuda
-
-
-
- 1\b15\b5.\b.4\b4.\b. C\bCa\bas\bse\be S\bSt\btu\bud\bdi\bie\bes\bs
-
-
- +\bo Jon Wright
-
- +\bo William McEwan
-
- +\bo Michael Richardson
-
-
-
- 1\b15\b5.\b.5\b5.\b. O\bOt\bth\bhe\ber\br c\bco\bon\bnt\btr\bri\bib\bbu\but\bti\bio\bon\bns\bs
-
-
- Bill Carr <Bill.Carr at compaq.com> made the Red Hat mkrootfs script
- work with RH 6.2.
-
- Michael Jennings <mikejen at hevanet.com> sent in some material which
- is now gracing the top of the index page <http://user-mode-
- linux.sourceforge.net/> of this site.
-
- SGI <http://www.sgi.com> (and more specifically Ralf Baechle <ralf at
- uni-koblenz.de> ) gave me an account on oss.sgi.com
- <http://www.oss.sgi.com> . The bandwidth there made it possible to
- produce most of the filesystems available on the project download
- page.
-
- Laurent Bonnaud <Laurent.Bonnaud at inpg.fr> took the old grotty
- Debian filesystem that I've been distributing and updated it to 2.2.
- It is now available by itself here.
-
- Rik van Riel gave me some ftp space on ftp.nl.linux.org so I can make
- releases even when Sourceforge is broken.
-
- Rodrigo de Castro looked at my broken pte code and told me what was
- wrong with it, letting me fix a long-standing (several weeks) and
- serious set of bugs.
-
- Chris Reahard built a specialized root filesystem for running a DNS
- server jailed inside UML. It's available from the download
- <http://user-mode-linux.sourceforge.net/dl-sf.html> page in the Jail
- Filesystems section.
-
-
-
-
-
-
-
-
-
-
-
-
-1'-
In the above chart minuses and slashes represent "real" data amounts, points and
-accents represent "useful" data, basically, CEU scaled amd cropped output,
+accents represent "useful" data, basically, CEU scaled and cropped output,
mapped back onto the client's source plane.
Such a configuration can be produced by user requests:
1. Calculate current sensor scales:
- scale_s = ((3') - (3)) / ((2') - (2))
+ scale_s = ((2') - (2)) / ((3') - (3))
2. Calculate "effective" input crop (sensor subwindow) - CEU crop scaled back at
current sensor scales onto input window - this is user S_CROP:
4. Calculate sensor output window by applying combined scales to real input
window:
- width_s_out = ((2') - (2)) / scale_comb
+ width_s_out = ((7') - (7)) = ((2') - (2)) / scale_comb
5. Apply iterative sensor S_FMT for sensor output window.
--- /dev/null
+Virtualization support in the Linux kernel.
+
+00-INDEX
+ - this file.
+kvm/
+ - Kernel Virtual Machine. See also http://linux-kvm.org
+lguest/
+ - Extremely simple hypervisor for experimental/educational use.
+uml/
+ - User Mode Linux, builds/runs Linux kernel as a userspace program.
--- /dev/null
+The Definitive KVM (Kernel-based Virtual Machine) API Documentation
+===================================================================
+
+1. General description
+
+The kvm API is a set of ioctls that are issued to control various aspects
+of a virtual machine. The ioctls belong to three classes
+
+ - System ioctls: These query and set global attributes which affect the
+ whole kvm subsystem. In addition a system ioctl is used to create
+ virtual machines
+
+ - VM ioctls: These query and set attributes that affect an entire virtual
+ machine, for example memory layout. In addition a VM ioctl is used to
+ create virtual cpus (vcpus).
+
+ Only run VM ioctls from the same process (address space) that was used
+ to create the VM.
+
+ - vcpu ioctls: These query and set attributes that control the operation
+ of a single virtual cpu.
+
+ Only run vcpu ioctls from the same thread that was used to create the
+ vcpu.
+
+2. File descriptors
+
+The kvm API is centered around file descriptors. An initial
+open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
+can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
+handle will create a VM file descriptor which can be used to issue VM
+ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
+and return a file descriptor pointing to it. Finally, ioctls on a vcpu
+fd can be used to control the vcpu, including the important task of
+actually running guest code.
+
+In general file descriptors can be migrated among processes by means
+of fork() and the SCM_RIGHTS facility of unix domain socket. These
+kinds of tricks are explicitly not supported by kvm. While they will
+not cause harm to the host, their actual behavior is not guaranteed by
+the API. The only supported use is one virtual machine per process,
+and one vcpu per thread.
+
+3. Extensions
+
+As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
+incompatible change are allowed. However, there is an extension
+facility that allows backward-compatible extensions to the API to be
+queried and used.
+
+The extension mechanism is not based on on the Linux version number.
+Instead, kvm defines extension identifiers and a facility to query
+whether a particular extension identifier is available. If it is, a
+set of ioctls is available for application use.
+
+4. API description
+
+This section describes ioctls that can be used to control kvm guests.
+For each ioctl, the following information is provided along with a
+description:
+
+ Capability: which KVM extension provides this ioctl. Can be 'basic',
+ which means that is will be provided by any kernel that supports
+ API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which
+ means availability needs to be checked with KVM_CHECK_EXTENSION
+ (see section 4.4).
+
+ Architectures: which instruction set architectures provide this ioctl.
+ x86 includes both i386 and x86_64.
+
+ Type: system, vm, or vcpu.
+
+ Parameters: what parameters are accepted by the ioctl.
+
+ Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
+ are not detailed, but errors with specific meanings are.
+
+4.1 KVM_GET_API_VERSION
+
+Capability: basic
+Architectures: all
+Type: system ioctl
+Parameters: none
+Returns: the constant KVM_API_VERSION (=12)
+
+This identifies the API version as the stable kvm API. It is not
+expected that this number will change. However, Linux 2.6.20 and
+2.6.21 report earlier versions; these are not documented and not
+supported. Applications should refuse to run if KVM_GET_API_VERSION
+returns a value other than 12. If this check passes, all ioctls
+described as 'basic' will be available.
+
+4.2 KVM_CREATE_VM
+
+Capability: basic
+Architectures: all
+Type: system ioctl
+Parameters: none
+Returns: a VM fd that can be used to control the new virtual machine.
+
+The new VM has no virtual cpus and no memory. An mmap() of a VM fd
+will access the virtual machine's physical address space; offset zero
+corresponds to guest physical address zero. Use of mmap() on a VM fd
+is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
+available.
+
+4.3 KVM_GET_MSR_INDEX_LIST
+
+Capability: basic
+Architectures: x86
+Type: system
+Parameters: struct kvm_msr_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+ E2BIG: the msr index list is to be to fit in the array specified by
+ the user.
+
+struct kvm_msr_list {
+ __u32 nmsrs; /* number of msrs in entries */
+ __u32 indices[0];
+};
+
+This ioctl returns the guest msrs that are supported. The list varies
+by kvm version and host processor, but does not change otherwise. The
+user fills in the size of the indices array in nmsrs, and in return
+kvm adjusts nmsrs to reflect the actual number of msrs and fills in
+the indices array with their numbers.
+
+Note: if kvm indicates supports MCE (KVM_CAP_MCE), then the MCE bank MSRs are
+not returned in the MSR list, as different vcpus can have a different number
+of banks, as set via the KVM_X86_SETUP_MCE ioctl.
+
+4.4 KVM_CHECK_EXTENSION
+
+Capability: basic
+Architectures: all
+Type: system ioctl
+Parameters: extension identifier (KVM_CAP_*)
+Returns: 0 if unsupported; 1 (or some other positive integer) if supported
+
+The API allows the application to query about extensions to the core
+kvm API. Userspace passes an extension identifier (an integer) and
+receives an integer that describes the extension availability.
+Generally 0 means no and 1 means yes, but some extensions may report
+additional information in the integer return value.
+
+4.5 KVM_GET_VCPU_MMAP_SIZE
+
+Capability: basic
+Architectures: all
+Type: system ioctl
+Parameters: none
+Returns: size of vcpu mmap area, in bytes
+
+The KVM_RUN ioctl (cf.) communicates with userspace via a shared
+memory region. This ioctl returns the size of that region. See the
+KVM_RUN documentation for details.
+
+4.6 KVM_SET_MEMORY_REGION
+
+Capability: basic
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_memory_region (in)
+Returns: 0 on success, -1 on error
+
+This ioctl is obsolete and has been removed.
+
+4.7 KVM_CREATE_VCPU
+
+Capability: basic
+Architectures: all
+Type: vm ioctl
+Parameters: vcpu id (apic id on x86)
+Returns: vcpu fd on success, -1 on error
+
+This API adds a vcpu to a virtual machine. The vcpu id is a small integer
+in the range [0, max_vcpus).
+
+4.8 KVM_GET_DIRTY_LOG (vm ioctl)
+
+Capability: basic
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+/* for KVM_GET_DIRTY_LOG */
+struct kvm_dirty_log {
+ __u32 slot;
+ __u32 padding;
+ union {
+ void __user *dirty_bitmap; /* one bit per page */
+ __u64 padding;
+ };
+};
+
+Given a memory slot, return a bitmap containing any pages dirtied
+since the last call to this ioctl. Bit 0 is the first page in the
+memory slot. Ensure the entire structure is cleared to avoid padding
+issues.
+
+4.9 KVM_SET_MEMORY_ALIAS
+
+Capability: basic
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_memory_alias (in)
+Returns: 0 (success), -1 (error)
+
+This ioctl is obsolete and has been removed.
+
+4.10 KVM_RUN
+
+Capability: basic
+Architectures: all
+Type: vcpu ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+Errors:
+ EINTR: an unmasked signal is pending
+
+This ioctl is used to run a guest virtual cpu. While there are no
+explicit parameters, there is an implicit parameter block that can be
+obtained by mmap()ing the vcpu fd at offset 0, with the size given by
+KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
+kvm_run' (see below).
+
+4.11 KVM_GET_REGS
+
+Capability: basic
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_regs (out)
+Returns: 0 on success, -1 on error
+
+Reads the general purpose registers from the vcpu.
+
+/* x86 */
+struct kvm_regs {
+ /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
+ __u64 rax, rbx, rcx, rdx;
+ __u64 rsi, rdi, rsp, rbp;
+ __u64 r8, r9, r10, r11;
+ __u64 r12, r13, r14, r15;
+ __u64 rip, rflags;
+};
+
+4.12 KVM_SET_REGS
+
+Capability: basic
+Architectures: all
+Type: vcpu ioctl
+Parameters: struct kvm_regs (in)
+Returns: 0 on success, -1 on error
+
+Writes the general purpose registers into the vcpu.
+
+See KVM_GET_REGS for the data structure.
+
+4.13 KVM_GET_SREGS
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_sregs (out)
+Returns: 0 on success, -1 on error
+
+Reads special registers from the vcpu.
+
+/* x86 */
+struct kvm_sregs {
+ struct kvm_segment cs, ds, es, fs, gs, ss;
+ struct kvm_segment tr, ldt;
+ struct kvm_dtable gdt, idt;
+ __u64 cr0, cr2, cr3, cr4, cr8;
+ __u64 efer;
+ __u64 apic_base;
+ __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
+};
+
+interrupt_bitmap is a bitmap of pending external interrupts. At most
+one bit may be set. This interrupt has been acknowledged by the APIC
+but not yet injected into the cpu core.
+
+4.14 KVM_SET_SREGS
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_sregs (in)
+Returns: 0 on success, -1 on error
+
+Writes special registers into the vcpu. See KVM_GET_SREGS for the
+data structures.
+
+4.15 KVM_TRANSLATE
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_translation (in/out)
+Returns: 0 on success, -1 on error
+
+Translates a virtual address according to the vcpu's current address
+translation mode.
+
+struct kvm_translation {
+ /* in */
+ __u64 linear_address;
+
+ /* out */
+ __u64 physical_address;
+ __u8 valid;
+ __u8 writeable;
+ __u8 usermode;
+ __u8 pad[5];
+};
+
+4.16 KVM_INTERRUPT
+
+Capability: basic
+Architectures: x86, ppc
+Type: vcpu ioctl
+Parameters: struct kvm_interrupt (in)
+Returns: 0 on success, -1 on error
+
+Queues a hardware interrupt vector to be injected. This is only
+useful if in-kernel local APIC or equivalent is not used.
+
+/* for KVM_INTERRUPT */
+struct kvm_interrupt {
+ /* in */
+ __u32 irq;
+};
+
+X86:
+
+Note 'irq' is an interrupt vector, not an interrupt pin or line.
+
+PPC:
+
+Queues an external interrupt to be injected. This ioctl is overleaded
+with 3 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+ This injects an edge type external interrupt into the guest once it's ready
+ to receive interrupts. When injected, the interrupt is done.
+
+b) KVM_INTERRUPT_UNSET
+
+ This unsets any pending interrupt.
+
+ Only available with KVM_CAP_PPC_UNSET_IRQ.
+
+c) KVM_INTERRUPT_SET_LEVEL
+
+ This injects a level type external interrupt into the guest context. The
+ interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
+ is triggered.
+
+ Only available with KVM_CAP_PPC_IRQ_LEVEL.
+
+Note that any value for 'irq' other than the ones stated above is invalid
+and incurs unexpected behavior.
+
+4.17 KVM_DEBUG_GUEST
+
+Capability: basic
+Architectures: none
+Type: vcpu ioctl
+Parameters: none)
+Returns: -1 on error
+
+Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead.
+
+4.18 KVM_GET_MSRS
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_msrs (in/out)
+Returns: 0 on success, -1 on error
+
+Reads model-specific registers from the vcpu. Supported msr indices can
+be obtained using KVM_GET_MSR_INDEX_LIST.
+
+struct kvm_msrs {
+ __u32 nmsrs; /* number of msrs in entries */
+ __u32 pad;
+
+ struct kvm_msr_entry entries[0];
+};
+
+struct kvm_msr_entry {
+ __u32 index;
+ __u32 reserved;
+ __u64 data;
+};
+
+Application code should set the 'nmsrs' member (which indicates the
+size of the entries array) and the 'index' member of each array entry.
+kvm will fill in the 'data' member.
+
+4.19 KVM_SET_MSRS
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_msrs (in)
+Returns: 0 on success, -1 on error
+
+Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
+data structures.
+
+Application code should set the 'nmsrs' member (which indicates the
+size of the entries array), and the 'index' and 'data' members of each
+array entry.
+
+4.20 KVM_SET_CPUID
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_cpuid (in)
+Returns: 0 on success, -1 on error
+
+Defines the vcpu responses to the cpuid instruction. Applications
+should use the KVM_SET_CPUID2 ioctl if available.
+
+
+struct kvm_cpuid_entry {
+ __u32 function;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding;
+};
+
+/* for KVM_SET_CPUID */
+struct kvm_cpuid {
+ __u32 nent;
+ __u32 padding;
+ struct kvm_cpuid_entry entries[0];
+};
+
+4.21 KVM_SET_SIGNAL_MASK
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_signal_mask (in)
+Returns: 0 on success, -1 on error
+
+Defines which signals are blocked during execution of KVM_RUN. This
+signal mask temporarily overrides the threads signal mask. Any
+unblocked signal received (except SIGKILL and SIGSTOP, which retain
+their traditional behaviour) will cause KVM_RUN to return with -EINTR.
+
+Note the signal will only be delivered if not blocked by the original
+signal mask.
+
+/* for KVM_SET_SIGNAL_MASK */
+struct kvm_signal_mask {
+ __u32 len;
+ __u8 sigset[0];
+};
+
+4.22 KVM_GET_FPU
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_fpu (out)
+Returns: 0 on success, -1 on error
+
+Reads the floating point state from the vcpu.
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+ __u8 fpr[8][16];
+ __u16 fcw;
+ __u16 fsw;
+ __u8 ftwx; /* in fxsave format */
+ __u8 pad1;
+ __u16 last_opcode;
+ __u64 last_ip;
+ __u64 last_dp;
+ __u8 xmm[16][16];
+ __u32 mxcsr;
+ __u32 pad2;
+};
+
+4.23 KVM_SET_FPU
+
+Capability: basic
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_fpu (in)
+Returns: 0 on success, -1 on error
+
+Writes the floating point state to the vcpu.
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+ __u8 fpr[8][16];
+ __u16 fcw;
+ __u16 fsw;
+ __u8 ftwx; /* in fxsave format */
+ __u8 pad1;
+ __u16 last_opcode;
+ __u64 last_ip;
+ __u64 last_dp;
+ __u8 xmm[16][16];
+ __u32 mxcsr;
+ __u32 pad2;
+};
+
+4.24 KVM_CREATE_IRQCHIP
+
+Capability: KVM_CAP_IRQCHIP
+Architectures: x86, ia64
+Type: vm ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+
+Creates an interrupt controller model in the kernel. On x86, creates a virtual
+ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
+local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
+only go to the IOAPIC. On ia64, a IOSAPIC is created.
+
+4.25 KVM_IRQ_LINE
+
+Capability: KVM_CAP_IRQCHIP
+Architectures: x86, ia64
+Type: vm ioctl
+Parameters: struct kvm_irq_level
+Returns: 0 on success, -1 on error
+
+Sets the level of a GSI input to the interrupt controller model in the kernel.
+Requires that an interrupt controller model has been previously created with
+KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level
+to be set to 1 and then back to 0.
+
+struct kvm_irq_level {
+ union {
+ __u32 irq; /* GSI */
+ __s32 status; /* not used for KVM_IRQ_LEVEL */
+ };
+ __u32 level; /* 0 or 1 */
+};
+
+4.26 KVM_GET_IRQCHIP
+
+Capability: KVM_CAP_IRQCHIP
+Architectures: x86, ia64
+Type: vm ioctl
+Parameters: struct kvm_irqchip (in/out)
+Returns: 0 on success, -1 on error
+
+Reads the state of a kernel interrupt controller created with
+KVM_CREATE_IRQCHIP into a buffer provided by the caller.
+
+struct kvm_irqchip {
+ __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
+ __u32 pad;
+ union {
+ char dummy[512]; /* reserving space */
+ struct kvm_pic_state pic;
+ struct kvm_ioapic_state ioapic;
+ } chip;
+};
+
+4.27 KVM_SET_IRQCHIP
+
+Capability: KVM_CAP_IRQCHIP
+Architectures: x86, ia64
+Type: vm ioctl
+Parameters: struct kvm_irqchip (in)
+Returns: 0 on success, -1 on error
+
+Sets the state of a kernel interrupt controller created with
+KVM_CREATE_IRQCHIP from a buffer provided by the caller.
+
+struct kvm_irqchip {
+ __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
+ __u32 pad;
+ union {
+ char dummy[512]; /* reserving space */
+ struct kvm_pic_state pic;
+ struct kvm_ioapic_state ioapic;
+ } chip;
+};
+
+4.28 KVM_XEN_HVM_CONFIG
+
+Capability: KVM_CAP_XEN_HVM
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_xen_hvm_config (in)
+Returns: 0 on success, -1 on error
+
+Sets the MSR that the Xen HVM guest uses to initialize its hypercall
+page, and provides the starting address and size of the hypercall
+blobs in userspace. When the guest writes the MSR, kvm copies one
+page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
+memory.
+
+struct kvm_xen_hvm_config {
+ __u32 flags;
+ __u32 msr;
+ __u64 blob_addr_32;
+ __u64 blob_addr_64;
+ __u8 blob_size_32;
+ __u8 blob_size_64;
+ __u8 pad2[30];
+};
+
+4.29 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
+4.30 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the value specified in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
+such as migration.
+
+struct kvm_clock_data {
+ __u64 clock; /* kvmclock current value */
+ __u32 flags;
+ __u32 pad[9];
+};
+
+4.31 KVM_GET_VCPU_EVENTS
+
+Capability: KVM_CAP_VCPU_EVENTS
+Extended by: KVM_CAP_INTR_SHADOW
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_vcpu_event (out)
+Returns: 0 on success, -1 on error
+
+Gets currently pending exceptions, interrupts, and NMIs as well as related
+states of the vcpu.
+
+struct kvm_vcpu_events {
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 has_error_code;
+ __u8 pad;
+ __u32 error_code;
+ } exception;
+ struct {
+ __u8 injected;
+ __u8 nr;
+ __u8 soft;
+ __u8 shadow;
+ } interrupt;
+ struct {
+ __u8 injected;
+ __u8 pending;
+ __u8 masked;
+ __u8 pad;
+ } nmi;
+ __u32 sipi_vector;
+ __u32 flags;
+};
+
+KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
+interrupt.shadow contains a valid state. Otherwise, this field is undefined.
+
+4.32 KVM_SET_VCPU_EVENTS
+
+Capability: KVM_CAP_VCPU_EVENTS
+Extended by: KVM_CAP_INTR_SHADOW
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_vcpu_event (in)
+Returns: 0 on success, -1 on error
+
+Set pending exceptions, interrupts, and NMIs as well as related states of the
+vcpu.
+
+See KVM_GET_VCPU_EVENTS for the data structure.
+
+Fields that may be modified asynchronously by running VCPUs can be excluded
+from the update. These fields are nmi.pending and sipi_vector. Keep the
+corresponding bits in the flags field cleared to suppress overwriting the
+current in-kernel state. The bits are:
+
+KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
+KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
+
+If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
+the flags field to signal that interrupt.shadow contains a valid state and
+shall be written into the VCPU.
+
+4.33 KVM_GET_DEBUGREGS
+
+Capability: KVM_CAP_DEBUGREGS
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_debugregs (out)
+Returns: 0 on success, -1 on error
+
+Reads debug registers from the vcpu.
+
+struct kvm_debugregs {
+ __u64 db[4];
+ __u64 dr6;
+ __u64 dr7;
+ __u64 flags;
+ __u64 reserved[9];
+};
+
+4.34 KVM_SET_DEBUGREGS
+
+Capability: KVM_CAP_DEBUGREGS
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_debugregs (in)
+Returns: 0 on success, -1 on error
+
+Writes debug registers into the vcpu.
+
+See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
+yet and must be cleared on entry.
+
+4.35 KVM_SET_USER_MEMORY_REGION
+
+Capability: KVM_CAP_USER_MEM
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_userspace_memory_region (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_userspace_memory_region {
+ __u32 slot;
+ __u32 flags;
+ __u64 guest_phys_addr;
+ __u64 memory_size; /* bytes */
+ __u64 userspace_addr; /* start of the userspace allocated memory */
+};
+
+/* for kvm_memory_region::flags */
+#define KVM_MEM_LOG_DIRTY_PAGES 1UL
+
+This ioctl allows the user to create or modify a guest physical memory
+slot. When changing an existing slot, it may be moved in the guest
+physical memory space, or its flags may be modified. It may not be
+resized. Slots may not overlap in guest physical address space.
+
+Memory for the region is taken starting at the address denoted by the
+field userspace_addr, which must point at user addressable memory for
+the entire memory slot size. Any object may back this memory, including
+anonymous memory, ordinary files, and hugetlbfs.
+
+It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
+be identical. This allows large pages in the guest to be backed by large
+pages in the host.
+
+The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
+instructs kvm to keep track of writes to memory within the slot. See
+the KVM_GET_DIRTY_LOG ioctl.
+
+When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
+region are automatically reflected into the guest. For example, an mmap()
+that affects the region will be made visible immediately. Another example
+is madvise(MADV_DROP).
+
+It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
+The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
+allocation and is deprecated.
+
+4.36 KVM_SET_TSS_ADDR
+
+Capability: KVM_CAP_SET_TSS_ADDR
+Architectures: x86
+Type: vm ioctl
+Parameters: unsigned long tss_address (in)
+Returns: 0 on success, -1 on error
+
+This ioctl defines the physical address of a three-page region in the guest
+physical address space. The region must be within the first 4GB of the
+guest physical address space and must not conflict with any memory slot
+or any mmio address. The guest may malfunction if it accesses this memory
+region.
+
+This ioctl is required on Intel-based hosts. This is needed on Intel hardware
+because of a quirk in the virtualization implementation (see the internals
+documentation when it pops into existence).
+
+4.37 KVM_ENABLE_CAP
+
+Capability: KVM_CAP_ENABLE_CAP
+Architectures: ppc
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
++Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+
+To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
+be used.
+
+struct kvm_enable_cap {
+ /* in */
+ __u32 cap;
+
+The capability that is supposed to get enabled.
+
+ __u32 flags;
+
+A bitfield indicating future enhancements. Has to be 0 for now.
+
+ __u64 args[4];
+
+Arguments for enabling a feature. If a feature needs initial values to
+function properly, this is the place to put them.
+
+ __u8 pad[64];
+};
+
+4.38 KVM_GET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (out)
+Returns: 0 on success; -1 on error
+
+struct kvm_mp_state {
+ __u32 mp_state;
+};
+
+Returns the vcpu's current "multiprocessing state" (though also valid on
+uniprocessor guests).
+
+Possible values are:
+
+ - KVM_MP_STATE_RUNNABLE: the vcpu is currently running
+ - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP)
+ which has not yet received an INIT signal
+ - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is
+ now ready for a SIPI
+ - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and
+ is waiting for an interrupt
+ - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector
+ accessible via KVM_GET_VCPU_EVENTS)
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
+4.39 KVM_SET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (in)
+Returns: 0 on success; -1 on error
+
+Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
+arguments.
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
+4.40 KVM_SET_IDENTITY_MAP_ADDR
+
+Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
+Architectures: x86
+Type: vm ioctl
+Parameters: unsigned long identity (in)
+Returns: 0 on success, -1 on error
+
+This ioctl defines the physical address of a one-page region in the guest
+physical address space. The region must be within the first 4GB of the
+guest physical address space and must not conflict with any memory slot
+or any mmio address. The guest may malfunction if it accesses this memory
+region.
+
+This ioctl is required on Intel-based hosts. This is needed on Intel hardware
+because of a quirk in the virtualization implementation (see the internals
+documentation when it pops into existence).
+
+4.41 KVM_SET_BOOT_CPU_ID
+
+Capability: KVM_CAP_SET_BOOT_CPU_ID
+Architectures: x86, ia64
+Type: vm ioctl
+Parameters: unsigned long vcpu_id
+Returns: 0 on success, -1 on error
+
+Define which vcpu is the Bootstrap Processor (BSP). Values are the same
+as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default
+is vcpu 0.
+
+4.42 KVM_GET_XSAVE
+
+Capability: KVM_CAP_XSAVE
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_xsave (out)
+Returns: 0 on success, -1 on error
+
+struct kvm_xsave {
+ __u32 region[1024];
+};
+
+This ioctl would copy current vcpu's xsave struct to the userspace.
+
+4.43 KVM_SET_XSAVE
+
+Capability: KVM_CAP_XSAVE
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_xsave (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_xsave {
+ __u32 region[1024];
+};
+
+This ioctl would copy userspace's xsave struct to the kernel.
+
+4.44 KVM_GET_XCRS
+
+Capability: KVM_CAP_XCRS
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_xcrs (out)
+Returns: 0 on success, -1 on error
+
+struct kvm_xcr {
+ __u32 xcr;
+ __u32 reserved;
+ __u64 value;
+};
+
+struct kvm_xcrs {
+ __u32 nr_xcrs;
+ __u32 flags;
+ struct kvm_xcr xcrs[KVM_MAX_XCRS];
+ __u64 padding[16];
+};
+
+This ioctl would copy current vcpu's xcrs to the userspace.
+
+4.45 KVM_SET_XCRS
+
+Capability: KVM_CAP_XCRS
+Architectures: x86
+Type: vcpu ioctl
+Parameters: struct kvm_xcrs (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_xcr {
+ __u32 xcr;
+ __u32 reserved;
+ __u64 value;
+};
+
+struct kvm_xcrs {
+ __u32 nr_xcrs;
+ __u32 flags;
+ struct kvm_xcr xcrs[KVM_MAX_XCRS];
+ __u64 padding[16];
+};
+
+This ioctl would set vcpu's xcr to the value userspace specified.
+
+4.46 KVM_GET_SUPPORTED_CPUID
+
+Capability: KVM_CAP_EXT_CPUID
+Architectures: x86
+Type: system ioctl
+Parameters: struct kvm_cpuid2 (in/out)
+Returns: 0 on success, -1 on error
+
+struct kvm_cpuid2 {
+ __u32 nent;
+ __u32 padding;
+ struct kvm_cpuid_entry2 entries[0];
+};
+
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
+#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
+#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
+
+struct kvm_cpuid_entry2 {
+ __u32 function;
+ __u32 index;
+ __u32 flags;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding[3];
+};
+
+This ioctl returns x86 cpuid features which are supported by both the hardware
+and kvm. Userspace can use the information returned by this ioctl to
+construct cpuid information (for KVM_SET_CPUID2) that is consistent with
+hardware, kernel, and userspace capabilities, and with user requirements (for
+example, the user may wish to constrain cpuid to emulate older hardware,
+or for feature consistency across a cluster).
+
+Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
+with the 'nent' field indicating the number of entries in the variable-size
+array 'entries'. If the number of entries is too low to describe the cpu
+capabilities, an error (E2BIG) is returned. If the number is too high,
+the 'nent' field is adjusted and an error (ENOMEM) is returned. If the
+number is just right, the 'nent' field is adjusted to the number of valid
+entries in the 'entries' array, which is then filled.
+
+The entries returned are the host cpuid as returned by the cpuid instruction,
+with unknown or unsupported features masked out. Some features (for example,
+x2apic), may not be present in the host cpu, but are exposed by kvm if it can
+emulate them efficiently. The fields in each entry are defined as follows:
+
+ function: the eax value used to obtain the entry
+ index: the ecx value used to obtain the entry (for entries that are
+ affected by ecx)
+ flags: an OR of zero or more of the following:
+ KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
+ if the index field is valid
+ KVM_CPUID_FLAG_STATEFUL_FUNC:
+ if cpuid for this function returns different values for successive
+ invocations; there will be several entries with the same function,
+ all with this flag set
+ KVM_CPUID_FLAG_STATE_READ_NEXT:
+ for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
+ the first entry to be read by a cpu
+ eax, ebx, ecx, edx: the values returned by the cpuid instruction for
+ this function/index combination
+
+4.47 KVM_PPC_GET_PVINFO
+
+Capability: KVM_CAP_PPC_GET_PVINFO
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_ppc_pvinfo (out)
+Returns: 0 on success, !0 on error
+
+struct kvm_ppc_pvinfo {
+ __u32 flags;
+ __u32 hcall[4];
+ __u8 pad[108];
+};
+
+This ioctl fetches PV specific information that need to be passed to the guest
+using the device tree or other means from vm context.
+
+For now the only implemented piece of information distributed here is an array
+of 4 instructions that make up a hypercall.
+
+If any additional field gets added to this structure later on, a bit for that
+additional piece of information will be set in the flags bitmap.
+
+4.48 KVM_ASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_ASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Assigns a host PCI device to the VM.
+
+struct kvm_assigned_pci_dev {
+ __u32 assigned_dev_id;
+ __u32 busnr;
+ __u32 devfn;
+ __u32 flags;
+ __u32 segnr;
+ union {
+ __u32 reserved[11];
+ };
+};
+
+The PCI device is specified by the triple segnr, busnr, and devfn.
+Identification in succeeding service requests is done via assigned_dev_id. The
+following flags are specified:
+
+/* Depends on KVM_CAP_IOMMU */
+#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
+
+4.49 KVM_DEASSIGN_PCI_DEVICE
+
+Capability: KVM_CAP_DEVICE_DEASSIGNMENT
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Ends PCI device assignment, releasing all associated resources.
+
+See KVM_CAP_DEVICE_ASSIGNMENT for the data structure. Only assigned_dev_id is
+used in kvm_assigned_pci_dev to identify the device.
+
+4.50 KVM_ASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Assigns an IRQ to a passed-through device.
+
+struct kvm_assigned_irq {
+ __u32 assigned_dev_id;
+ __u32 host_irq;
+ __u32 guest_irq;
+ __u32 flags;
+ union {
+ struct {
+ __u32 addr_lo;
+ __u32 addr_hi;
+ __u32 data;
+ } guest_msi;
+ __u32 reserved[12];
+ };
+};
+
+The following flags are defined:
+
+#define KVM_DEV_IRQ_HOST_INTX (1 << 0)
+#define KVM_DEV_IRQ_HOST_MSI (1 << 1)
+#define KVM_DEV_IRQ_HOST_MSIX (1 << 2)
+
+#define KVM_DEV_IRQ_GUEST_INTX (1 << 8)
+#define KVM_DEV_IRQ_GUEST_MSI (1 << 9)
+#define KVM_DEV_IRQ_GUEST_MSIX (1 << 10)
+
+It is not valid to specify multiple types per host or guest IRQ. However, the
+IRQ type of host and guest can differ or can even be null.
+
+4.51 KVM_DEASSIGN_DEV_IRQ
+
+Capability: KVM_CAP_ASSIGN_DEV_IRQ
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_irq (in)
+Returns: 0 on success, -1 on error
+
+Ends an IRQ assignment to a passed-through device.
+
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id, flags must correspond to the IRQ type specified on
+KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
+
+4.52 KVM_SET_GSI_ROUTING
+
+Capability: KVM_CAP_IRQ_ROUTING
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_irq_routing (in)
+Returns: 0 on success, -1 on error
+
+Sets the GSI routing table entries, overwriting any previously set entries.
+
+struct kvm_irq_routing {
+ __u32 nr;
+ __u32 flags;
+ struct kvm_irq_routing_entry entries[0];
+};
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_entry {
+ __u32 gsi;
+ __u32 type;
+ __u32 flags;
+ __u32 pad;
+ union {
+ struct kvm_irq_routing_irqchip irqchip;
+ struct kvm_irq_routing_msi msi;
+ __u32 pad[8];
+ } u;
+};
+
+/* gsi routing entry types */
+#define KVM_IRQ_ROUTING_IRQCHIP 1
+#define KVM_IRQ_ROUTING_MSI 2
+
+No flags are specified so far, the corresponding field must be set to zero.
+
+struct kvm_irq_routing_irqchip {
+ __u32 irqchip;
+ __u32 pin;
+};
+
+struct kvm_irq_routing_msi {
+ __u32 address_lo;
+ __u32 address_hi;
+ __u32 data;
+ __u32 pad;
+};
+
+4.53 KVM_ASSIGN_SET_MSIX_NR
+
+Capability: KVM_CAP_DEVICE_MSIX
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_nr (in)
+Returns: 0 on success, -1 on error
+
+Set the number of MSI-X interrupts for an assigned device. This service can
+only be called once in the lifetime of an assigned device.
+
+struct kvm_assigned_msix_nr {
+ __u32 assigned_dev_id;
+ __u16 entry_nr;
+ __u16 padding;
+};
+
+#define KVM_MAX_MSIX_PER_DEV 256
+
+4.54 KVM_ASSIGN_SET_MSIX_ENTRY
+
+Capability: KVM_CAP_DEVICE_MSIX
+Architectures: x86 ia64
+Type: vm ioctl
+Parameters: struct kvm_assigned_msix_entry (in)
+Returns: 0 on success, -1 on error
+
+Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
+the GSI vector to zero means disabling the interrupt.
+
+struct kvm_assigned_msix_entry {
+ __u32 assigned_dev_id;
+ __u32 gsi;
+ __u16 entry; /* The index of entry in the MSI-X table */
+ __u16 padding[3];
+};
+
+5. The kvm_run structure
+
+Application code obtains a pointer to the kvm_run structure by
+mmap()ing a vcpu fd. From that point, application code can control
+execution by changing fields in kvm_run prior to calling the KVM_RUN
+ioctl, and obtain information about the reason KVM_RUN returned by
+looking up structure members.
+
+struct kvm_run {
+ /* in */
+ __u8 request_interrupt_window;
+
+Request that KVM_RUN return when it becomes possible to inject external
+interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
+
+ __u8 padding1[7];
+
+ /* out */
+ __u32 exit_reason;
+
+When KVM_RUN has returned successfully (return value 0), this informs
+application code why KVM_RUN has returned. Allowable values for this
+field are detailed below.
+
+ __u8 ready_for_interrupt_injection;
+
+If request_interrupt_window has been specified, this field indicates
+an interrupt can be injected now with KVM_INTERRUPT.
+
+ __u8 if_flag;
+
+The value of the current interrupt flag. Only valid if in-kernel
+local APIC is not used.
+
+ __u8 padding2[2];
+
+ /* in (pre_kvm_run), out (post_kvm_run) */
+ __u64 cr8;
+
+The value of the cr8 register. Only valid if in-kernel local APIC is
+not used. Both input and output.
+
+ __u64 apic_base;
+
+The value of the APIC BASE msr. Only valid if in-kernel local
+APIC is not used. Both input and output.
+
+ union {
+ /* KVM_EXIT_UNKNOWN */
+ struct {
+ __u64 hardware_exit_reason;
+ } hw;
+
+If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
+reasons. Further architecture-specific information is available in
+hardware_exit_reason.
+
+ /* KVM_EXIT_FAIL_ENTRY */
+ struct {
+ __u64 hardware_entry_failure_reason;
+ } fail_entry;
+
+If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
+to unknown reasons. Further architecture-specific information is
+available in hardware_entry_failure_reason.
+
+ /* KVM_EXIT_EXCEPTION */
+ struct {
+ __u32 exception;
+ __u32 error_code;
+ } ex;
+
+Unused.
+
+ /* KVM_EXIT_IO */
+ struct {
+#define KVM_EXIT_IO_IN 0
+#define KVM_EXIT_IO_OUT 1
+ __u8 direction;
+ __u8 size; /* bytes */
+ __u16 port;
+ __u32 count;
+ __u64 data_offset; /* relative to kvm_run start */
+ } io;
+
+If exit_reason is KVM_EXIT_IO, then the vcpu has
+executed a port I/O instruction which could not be satisfied by kvm.
+data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
+where kvm expects application code to place the data for the next
+KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
+
+ struct {
+ struct kvm_debug_exit_arch arch;
+ } debug;
+
+Unused.
+
+ /* KVM_EXIT_MMIO */
+ struct {
+ __u64 phys_addr;
+ __u8 data[8];
+ __u32 len;
+ __u8 is_write;
+ } mmio;
+
+If exit_reason is KVM_EXIT_MMIO, then the vcpu has
+executed a memory-mapped I/O instruction which could not be satisfied
+by kvm. The 'data' member contains the written data if 'is_write' is
+true, and should be filled by application code otherwise.
+
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
+operations are complete (and guest state is consistent) only after userspace
+has re-entered the kernel with KVM_RUN. The kernel side will first finish
+incomplete operations and then check for pending signals. Userspace
+can re-enter the guest with an unmasked signal pending to complete
+pending operations.
+
+ /* KVM_EXIT_HYPERCALL */
+ struct {
+ __u64 nr;
+ __u64 args[6];
+ __u64 ret;
+ __u32 longmode;
+ __u32 pad;
+ } hypercall;
+
+Unused. This was once used for 'hypercall to userspace'. To implement
+such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
+Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
+
+ /* KVM_EXIT_TPR_ACCESS */
+ struct {
+ __u64 rip;
+ __u32 is_write;
+ __u32 pad;
+ } tpr_access;
+
+To be documented (KVM_TPR_ACCESS_REPORTING).
+
+ /* KVM_EXIT_S390_SIEIC */
+ struct {
+ __u8 icptcode;
+ __u64 mask; /* psw upper half */
+ __u64 addr; /* psw lower half */
+ __u16 ipa;
+ __u32 ipb;
+ } s390_sieic;
+
+s390 specific.
+
+ /* KVM_EXIT_S390_RESET */
+#define KVM_S390_RESET_POR 1
+#define KVM_S390_RESET_CLEAR 2
+#define KVM_S390_RESET_SUBSYSTEM 4
+#define KVM_S390_RESET_CPU_INIT 8
+#define KVM_S390_RESET_IPL 16
+ __u64 s390_reset_flags;
+
+s390 specific.
+
+ /* KVM_EXIT_DCR */
+ struct {
+ __u32 dcrn;
+ __u32 data;
+ __u8 is_write;
+ } dcr;
+
+powerpc specific.
+
+ /* KVM_EXIT_OSI */
+ struct {
+ __u64 gprs[32];
+ } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
+ /* Fix the size of the union. */
+ char padding[256];
+ };
+};
--- /dev/null
+KVM CPUID bits
+Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
+=====================================================
+
+A guest running on a kvm host, can check some of its features using
+cpuid. This is not always guaranteed to work, since userspace can
+mask-out some, or even all KVM-related cpuid features before launching
+a guest.
+
+KVM cpuid functions are:
+
+function: KVM_CPUID_SIGNATURE (0x40000000)
+returns : eax = 0,
+ ebx = 0x4b4d564b,
+ ecx = 0x564b4d56,
+ edx = 0x4d.
+Note that this value in ebx, ecx and edx corresponds to the string "KVMKVMKVM".
+This function queries the presence of KVM cpuid leafs.
+
+
+function: define KVM_CPUID_FEATURES (0x40000001)
+returns : ebx, ecx, edx = 0
+ eax = and OR'ed group of (1 << flag), where each flags is:
+
+
+flag || value || meaning
+=============================================================================
+KVM_FEATURE_CLOCKSOURCE || 0 || kvmclock available at msrs
+ || || 0x11 and 0x12.
+------------------------------------------------------------------------------
+KVM_FEATURE_NOP_IO_DELAY || 1 || not necessary to perform delays
+ || || on PIO operations.
+------------------------------------------------------------------------------
+KVM_FEATURE_MMU_OP || 2 || deprecated.
+------------------------------------------------------------------------------
+KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
+ || || 0x4b564d00 and 0x4b564d01
+------------------------------------------------------------------------------
+KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by
+ || || writing to msr 0x4b564d02
+------------------------------------------------------------------------------
+KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
+ || || per-cpu warps are expected in
+ || || kvmclock.
+------------------------------------------------------------------------------
--- /dev/null
+KVM Lock Overview
+=================
+
+1. Acquisition Orders
+---------------------
+
+(to be written)
+
+2. Reference
+------------
+
+Name: kvm_lock
+Type: raw_spinlock
+Arch: any
+Protects: - vm_list
+ - hardware virtualization enable/disable
+Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
+ migration.
+
+Name: kvm_arch::tsc_write_lock
+Type: raw_spinlock
+Arch: x86
+Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
+ - tsc offset in vmcb
+Comment: 'raw' because updating the tsc offsets must not be preempted.
--- /dev/null
+The x86 kvm shadow mmu
+======================
+
+The mmu (in arch/x86/kvm, files mmu.[ch] and paging_tmpl.h) is responsible
+for presenting a standard x86 mmu to the guest, while translating guest
+physical addresses to host physical addresses.
+
+The mmu code attempts to satisfy the following requirements:
+
+- correctness: the guest should not be able to determine that it is running
+ on an emulated mmu except for timing (we attempt to comply
+ with the specification, not emulate the characteristics of
+ a particular implementation such as tlb size)
+- security: the guest must not be able to touch host memory not assigned
+ to it
+- performance: minimize the performance penalty imposed by the mmu
+- scaling: need to scale to large memory and large vcpu guests
+- hardware: support the full range of x86 virtualization hardware
+- integration: Linux memory management code must be in control of guest memory
+ so that swapping, page migration, page merging, transparent
+ hugepages, and similar features work without change
+- dirty tracking: report writes to guest memory to enable live migration
+ and framebuffer-based displays
+- footprint: keep the amount of pinned kernel memory low (most memory
+ should be shrinkable)
+- reliability: avoid multipage or GFP_ATOMIC allocations
+
+Acronyms
+========
+
+pfn host page frame number
+hpa host physical address
+hva host virtual address
+gfn guest frame number
+gpa guest physical address
+gva guest virtual address
+ngpa nested guest physical address
+ngva nested guest virtual address
+pte page table entry (used also to refer generically to paging structure
+ entries)
+gpte guest pte (referring to gfns)
+spte shadow pte (referring to pfns)
+tdp two dimensional paging (vendor neutral term for NPT and EPT)
+
+Virtual and real hardware supported
+===================================
+
+The mmu supports first-generation mmu hardware, which allows an atomic switch
+of the current paging mode and cr3 during guest entry, as well as
+two-dimensional paging (AMD's NPT and Intel's EPT). The emulated hardware
+it exposes is the traditional 2/3/4 level x86 mmu, with support for global
+pages, pae, pse, pse36, cr0.wp, and 1GB pages. Work is in progress to support
+exposing NPT capable hardware on NPT capable hosts.
+
+Translation
+===========
+
+The primary job of the mmu is to program the processor's mmu to translate
+addresses for the guest. Different translations are required at different
+times:
+
+- when guest paging is disabled, we translate guest physical addresses to
+ host physical addresses (gpa->hpa)
+- when guest paging is enabled, we translate guest virtual addresses, to
+ guest physical addresses, to host physical addresses (gva->gpa->hpa)
+- when the guest launches a guest of its own, we translate nested guest
+ virtual addresses, to nested guest physical addresses, to guest physical
+ addresses, to host physical addresses (ngva->ngpa->gpa->hpa)
+
+The primary challenge is to encode between 1 and 3 translations into hardware
+that support only 1 (traditional) and 2 (tdp) translations. When the
+number of required translations matches the hardware, the mmu operates in
+direct mode; otherwise it operates in shadow mode (see below).
+
+Memory
+======
+
+Guest memory (gpa) is part of the user address space of the process that is
+using kvm. Userspace defines the translation between guest addresses and user
+addresses (gpa->hva); note that two gpas may alias to the same hva, but not
+vice versa.
+
+These hvas may be backed using any method available to the host: anonymous
+memory, file backed memory, and device memory. Memory might be paged by the
+host at any time.
+
+Events
+======
+
+The mmu is driven by events, some from the guest, some from the host.
+
+Guest generated events:
+- writes to control registers (especially cr3)
+- invlpg/invlpga instruction execution
+- access to missing or protected translations
+
+Host generated events:
+- changes in the gpa->hpa translation (either through gpa->hva changes or
+ through hva->hpa changes)
+- memory pressure (the shrinker)
+
+Shadow pages
+============
+
+The principal data structure is the shadow page, 'struct kvm_mmu_page'. A
+shadow page contains 512 sptes, which can be either leaf or nonleaf sptes. A
+shadow page may contain a mix of leaf and nonleaf sptes.
+
+A nonleaf spte allows the hardware mmu to reach the leaf pages and
+is not related to a translation directly. It points to other shadow pages.
+
+A leaf spte corresponds to either one or two translations encoded into
+one paging structure entry. These are always the lowest level of the
+translation stack, with optional higher level translations left to NPT/EPT.
+Leaf ptes point at guest pages.
+
+The following table shows translations encoded by leaf ptes, with higher-level
+translations in parentheses:
+
+ Non-nested guests:
+ nonpaging: gpa->hpa
+ paging: gva->gpa->hpa
+ paging, tdp: (gva->)gpa->hpa
+ Nested guests:
+ non-tdp: ngva->gpa->hpa (*)
+ tdp: (ngva->)ngpa->gpa->hpa
+
+(*) the guest hypervisor will encode the ngva->gpa translation into its page
+ tables if npt is not present
+
+Shadow pages contain the following information:
+ role.level:
+ The level in the shadow paging hierarchy that this shadow page belongs to.
+ 1=4k sptes, 2=2M sptes, 3=1G sptes, etc.
+ role.direct:
+ If set, leaf sptes reachable from this page are for a linear range.
+ Examples include real mode translation, large guest pages backed by small
+ host pages, and gpa->hpa translations when NPT or EPT is active.
+ The linear range starts at (gfn << PAGE_SHIFT) and its size is determined
+ by role.level (2MB for first level, 1GB for second level, 0.5TB for third
+ level, 256TB for fourth level)
+ If clear, this page corresponds to a guest page table denoted by the gfn
+ field.
+ role.quadrant:
+ When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit
+ sptes. That means a guest page table contains more ptes than the host,
+ so multiple shadow pages are needed to shadow one guest page.
+ For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the
+ first or second 512-gpte block in the guest page table. For second-level
+ page tables, each 32-bit gpte is converted to two 64-bit sptes
+ (since each first-level guest page is shadowed by two first-level
+ shadow pages) so role.quadrant takes values in the range 0..3. Each
+ quadrant maps 1GB virtual address space.
+ role.access:
+ Inherited guest access permissions in the form uwx. Note execute
+ permission is positive, not negative.
+ role.invalid:
+ The page is invalid and should not be used. It is a root page that is
+ currently pinned (by a cpu hardware register pointing to it); once it is
+ unpinned it will be destroyed.
+ role.cr4_pae:
+ Contains the value of cr4.pae for which the page is valid (e.g. whether
+ 32-bit or 64-bit gptes are in use).
+ role.nxe:
+ Contains the value of efer.nxe for which the page is valid.
+ role.cr0_wp:
+ Contains the value of cr0.wp for which the page is valid.
+ gfn:
+ Either the guest page table containing the translations shadowed by this
+ page, or the base page frame for linear translations. See role.direct.
+ spt:
+ A pageful of 64-bit sptes containing the translations for this page.
+ Accessed by both kvm and hardware.
+ The page pointed to by spt will have its page->private pointing back
+ at the shadow page structure.
+ sptes in spt point either at guest pages, or at lower-level shadow pages.
+ Specifically, if sp1 and sp2 are shadow pages, then sp1->spt[n] may point
+ at __pa(sp2->spt). sp2 will point back at sp1 through parent_pte.
+ The spt array forms a DAG structure with the shadow page as a node, and
+ guest pages as leaves.
+ gfns:
+ An array of 512 guest frame numbers, one for each present pte. Used to
+ perform a reverse map from a pte to a gfn. When role.direct is set, any
+ element of this array can be calculated from the gfn field when used, in
+ this case, the array of gfns is not allocated. See role.direct and gfn.
+ slot_bitmap:
+ A bitmap containing one bit per memory slot. If the page contains a pte
+ mapping a page from memory slot n, then bit n of slot_bitmap will be set
+ (if a page is aliased among several slots, then it is not guaranteed that
+ all slots will be marked).
+ Used during dirty logging to avoid scanning a shadow page if none if its
+ pages need tracking.
+ root_count:
+ A counter keeping track of how many hardware registers (guest cr3 or
+ pdptrs) are now pointing at the page. While this counter is nonzero, the
+ page cannot be destroyed. See role.invalid.
+ multimapped:
+ Whether there exist multiple sptes pointing at this page.
+ parent_pte/parent_ptes:
+ If multimapped is zero, parent_pte points at the single spte that points at
+ this page's spt. Otherwise, parent_ptes points at a data structure
+ with a list of parent_ptes.
+ unsync:
+ If true, then the translations in this page may not match the guest's
+ translation. This is equivalent to the state of the tlb when a pte is
+ changed but before the tlb entry is flushed. Accordingly, unsync ptes
+ are synchronized when the guest executes invlpg or flushes its tlb by
+ other means. Valid for leaf pages.
+ unsync_children:
+ How many sptes in the page point at pages that are unsync (or have
+ unsynchronized children).
+ unsync_child_bitmap:
+ A bitmap indicating which sptes in spt point (directly or indirectly) at
+ pages that may be unsynchronized. Used to quickly locate all unsychronized
+ pages reachable from a given page.
+
+Reverse map
+===========
+
+The mmu maintains a reverse mapping whereby all ptes mapping a page can be
+reached given its gfn. This is used, for example, when swapping out a page.
+
+Synchronized and unsynchronized pages
+=====================================
+
+The guest uses two events to synchronize its tlb and page tables: tlb flushes
+and page invalidations (invlpg).
+
+A tlb flush means that we need to synchronize all sptes reachable from the
+guest's cr3. This is expensive, so we keep all guest page tables write
+protected, and synchronize sptes to gptes when a gpte is written.
+
+A special case is when a guest page table is reachable from the current
+guest cr3. In this case, the guest is obliged to issue an invlpg instruction
+before using the translation. We take advantage of that by removing write
+protection from the guest page, and allowing the guest to modify it freely.
+We synchronize modified gptes when the guest invokes invlpg. This reduces
+the amount of emulation we have to do when the guest modifies multiple gptes,
+or when the a guest page is no longer used as a page table and is used for
+random guest data.
+
+As a side effect we have to resynchronize all reachable unsynchronized shadow
+pages on a tlb flush.
+
+
+Reaction to events
+==================
+
+- guest page fault (or npt page fault, or ept violation)
+
+This is the most complicated event. The cause of a page fault can be:
+
+ - a true guest fault (the guest translation won't allow the access) (*)
+ - access to a missing translation
+ - access to a protected translation
+ - when logging dirty pages, memory is write protected
+ - synchronized shadow pages are write protected (*)
+ - access to untranslatable memory (mmio)
+
+ (*) not applicable in direct mode
+
+Handling a page fault is performed as follows:
+
+ - if needed, walk the guest page tables to determine the guest translation
+ (gva->gpa or ngpa->gpa)
+ - if permissions are insufficient, reflect the fault back to the guest
+ - determine the host page
+ - if this is an mmio request, there is no host page; call the emulator
+ to emulate the instruction instead
+ - walk the shadow page table to find the spte for the translation,
+ instantiating missing intermediate page tables as necessary
+ - try to unsynchronize the page
+ - if successful, we can let the guest continue and modify the gpte
+ - emulate the instruction
+ - if failed, unshadow the page and let the guest continue
+ - update any translations that were modified by the instruction
+
+invlpg handling:
+
+ - walk the shadow page hierarchy and drop affected translations
+ - try to reinstantiate the indicated translation in the hope that the
+ guest will use it in the near future
+
+Guest control register updates:
+
+- mov to cr3
+ - look up new shadow roots
+ - synchronize newly reachable shadow pages
+
+- mov to cr0/cr4/efer
+ - set up mmu context for new paging mode
+ - look up new shadow roots
+ - synchronize newly reachable shadow pages
+
+Host translation updates:
+
+ - mmu notifier called with updated hva
+ - look up affected sptes through reverse map
+ - drop (or update) translations
+
+Emulating cr0.wp
+================
+
+If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
+works for the guest kernel, not guest guest userspace. When the guest
+cr0.wp=1, this does not present a problem. However when the guest cr0.wp=0,
+we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte (the
+semantics require allowing any guest kernel access plus user read access).
+
+We handle this by mapping the permissions to two possible sptes, depending
+on fault type:
+
+- kernel write fault: spte.u=0, spte.w=1 (allows full kernel access,
+ disallows user access)
+- read fault: spte.u=1, spte.w=0 (allows full read access, disallows kernel
+ write access)
+
+(user write faults generate a #PF)
+
+Large pages
+===========
+
+The mmu supports all combinations of large and small guest and host pages.
+Supported page sizes include 4k, 2M, 4M, and 1G. 4M pages are treated as
+two separate 2M pages, on both guest and host, since the mmu always uses PAE
+paging.
+
+To instantiate a large spte, four constraints must be satisfied:
+
+- the spte must point to a large host page
+- the guest pte must be a large pte of at least equivalent size (if tdp is
+ enabled, there is no guest pte and this condition is satisified)
+- if the spte will be writeable, the large page frame may not overlap any
+ write-protected pages
+- the guest page must be wholly contained by a single memory slot
+
+To check the last two conditions, the mmu maintains a ->write_count set of
+arrays for each memory slot and large page size. Every write protected page
+causes its write_count to be incremented, thus preventing instantiation of
+a large spte. The frames at the end of an unaligned memory slot have
+artificically inflated ->write_counts so they can never be instantiated.
+
+Further reading
+===============
+
+- NPT presentation from KVM Forum 2008
+ http://www.linux-kvm.org/wiki/images/c/c8/KvmForum2008%24kdf2008_21.pdf
+
--- /dev/null
+KVM-specific MSRs.
+Glauber Costa <glommer@redhat.com>, Red Hat Inc, 2010
+=====================================================
+
+KVM makes use of some custom MSRs to service some requests.
+
+Custom MSRs have a range reserved for them, that goes from
+0x4b564d00 to 0x4b564dff. There are MSRs outside this area,
+but they are deprecated and their use is discouraged.
+
+Custom MSR list
+--------
+
+The current supported Custom MSR list is:
+
+MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00
+
+ data: 4-byte alignment physical address of a memory area which must be
+ in guest RAM. This memory is expected to hold a copy of the following
+ structure:
+
+ struct pvclock_wall_clock {
+ u32 version;
+ u32 sec;
+ u32 nsec;
+ } __attribute__((__packed__));
+
+ whose data will be filled in by the hypervisor. The hypervisor is only
+ guaranteed to update this data at the moment of MSR write.
+ Users that want to reliably query this information more than once have
+ to write more than once to this MSR. Fields have the following meanings:
+
+ version: guest has to check version before and after grabbing
+ time information and check that they are both equal and even.
+ An odd version indicates an in-progress update.
+
+ sec: number of seconds for wallclock.
+
+ nsec: number of nanoseconds for wallclock.
+
+ Note that although MSRs are per-CPU entities, the effect of this
+ particular MSR is global.
+
+ Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
+ leaf prior to usage.
+
+MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01
+
+ data: 4-byte aligned physical address of a memory area which must be in
+ guest RAM, plus an enable bit in bit 0. This memory is expected to hold
+ a copy of the following structure:
+
+ struct pvclock_vcpu_time_info {
+ u32 version;
+ u32 pad0;
+ u64 tsc_timestamp;
+ u64 system_time;
+ u32 tsc_to_system_mul;
+ s8 tsc_shift;
+ u8 flags;
+ u8 pad[2];
+ } __attribute__((__packed__)); /* 32 bytes */
+
+ whose data will be filled in by the hypervisor periodically. Only one
+ write, or registration, is needed for each VCPU. The interval between
+ updates of this structure is arbitrary and implementation-dependent.
+ The hypervisor may update this structure at any time it sees fit until
+ anything with bit0 == 0 is written to it.
+
+ Fields have the following meanings:
+
+ version: guest has to check version before and after grabbing
+ time information and check that they are both equal and even.
+ An odd version indicates an in-progress update.
+
+ tsc_timestamp: the tsc value at the current VCPU at the time
+ of the update of this structure. Guests can subtract this value
+ from current tsc to derive a notion of elapsed time since the
+ structure update.
+
+ system_time: a host notion of monotonic time, including sleep
+ time at the time this structure was last updated. Unit is
+ nanoseconds.
+
+ tsc_to_system_mul: a function of the tsc frequency. One has
+ to multiply any tsc-related quantity by this value to get
+ a value in nanoseconds, besides dividing by 2^tsc_shift
+
+ tsc_shift: cycle to nanosecond divider, as a power of two, to
+ allow for shift rights. One has to shift right any tsc-related
+ quantity by this value to get a value in nanoseconds, besides
+ multiplying by tsc_to_system_mul.
+
+ With this information, guests can derive per-CPU time by
+ doing:
+
+ time = (current_tsc - tsc_timestamp)
+ time = (time * tsc_to_system_mul) >> tsc_shift
+ time = time + system_time
+
+ flags: bits in this field indicate extended capabilities
+ coordinated between the guest and the hypervisor. Availability
+ of specific flags has to be checked in 0x40000001 cpuid leaf.
+ Current flags are:
+
+ flag bit | cpuid bit | meaning
+ -------------------------------------------------------------
+ | | time measures taken across
+ 0 | 24 | multiple cpus are guaranteed to
+ | | be monotonic
+ -------------------------------------------------------------
+
+ Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid
+ leaf prior to usage.
+
+
+MSR_KVM_WALL_CLOCK: 0x11
+
+ data and functioning: same as MSR_KVM_WALL_CLOCK_NEW. Use that instead.
+
+ This MSR falls outside the reserved KVM range and may be removed in the
+ future. Its usage is deprecated.
+
+ Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
+ leaf prior to usage.
+
+MSR_KVM_SYSTEM_TIME: 0x12
+
+ data and functioning: same as MSR_KVM_SYSTEM_TIME_NEW. Use that instead.
+
+ This MSR falls outside the reserved KVM range and may be removed in the
+ future. Its usage is deprecated.
+
+ Availability of this MSR must be checked via bit 0 in 0x4000001 cpuid
+ leaf prior to usage.
+
+ The suggested algorithm for detecting kvmclock presence is then:
+
+ if (!kvm_para_available()) /* refer to cpuid.txt */
+ return NON_PRESENT;
+
+ flags = cpuid_eax(0x40000001);
+ if (flags & 3) {
+ msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+ msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+ return PRESENT;
+ } else if (flags & 0) {
+ msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+ msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
+ return PRESENT;
+ } else
+ return NON_PRESENT;
+
+MSR_KVM_ASYNC_PF_EN: 0x4b564d02
+ data: Bits 63-6 hold 64-byte aligned physical address of a
+ 64 byte memory area which must be in guest RAM and must be
+ zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
+ when asynchronous page faults are enabled on the vcpu 0 when
+ disabled. Bit 2 is 1 if asynchronous page faults can be injected
+ when vcpu is in cpl == 0.
+
+ First 4 byte of 64 byte memory location will be written to by
+ the hypervisor at the time of asynchronous page fault (APF)
+ injection to indicate type of asynchronous page fault. Value
+ of 1 means that the page referred to by the page fault is not
+ present. Value 2 means that the page is now available. Disabling
+ interrupt inhibits APFs. Guest must not enable interrupt
+ before the reason is read, or it may be overwritten by another
+ APF. Since APF uses the same exception vector as regular page
+ fault guest must reset the reason to 0 before it does
+ something that can generate normal page fault. If during page
+ fault APF reason is 0 it means that this is regular page
+ fault.
+
+ During delivery of type 1 APF cr2 contains a token that will
+ be used to notify a guest when missing page becomes
+ available. When page becomes available type 2 APF is sent with
+ cr2 set to the token associated with the page. There is special
+ kind of token 0xffffffff which tells vcpu that it should wake
+ up all processes waiting for APFs and no individual type 2 APFs
+ will be sent.
+
+ If APF is disabled while there are outstanding APFs, they will
+ not be delivered.
+
+ Currently type 2 APF will be always delivered on the same vcpu as
+ type 1 was, but guest should not rely on that.
--- /dev/null
+The PPC KVM paravirtual interface
+=================================
+
+The basic execution principle by which KVM on PowerPC works is to run all kernel
+space code in PR=1 which is user space. This way we trap all privileged
+instructions and can emulate them accordingly.
+
+Unfortunately that is also the downfall. There are quite some privileged
+instructions that needlessly return us to the hypervisor even though they
+could be handled differently.
+
+This is what the PPC PV interface helps with. It takes privileged instructions
+and transforms them into unprivileged ones with some help from the hypervisor.
+This cuts down virtualization costs by about 50% on some of my benchmarks.
+
+The code for that interface can be found in arch/powerpc/kernel/kvm*
+
+Querying for existence
+======================
+
+To find out if we're running on KVM or not, we leverage the device tree. When
+Linux is running on KVM, a node /hypervisor exists. That node contains a
+compatible property with the value "linux,kvm".
+
+Once you determined you're running under a PV capable KVM, you can now use
+hypercalls as described below.
+
+KVM hypercalls
+==============
+
+Inside the device tree's /hypervisor node there's a property called
+'hypercall-instructions'. This property contains at most 4 opcodes that make
+up the hypercall. To call a hypercall, just call these instructions.
+
+The parameters are as follows:
+
+ Register IN OUT
+
+ r0 - volatile
+ r3 1st parameter Return code
+ r4 2nd parameter 1st output value
+ r5 3rd parameter 2nd output value
+ r6 4th parameter 3rd output value
+ r7 5th parameter 4th output value
+ r8 6th parameter 5th output value
+ r9 7th parameter 6th output value
+ r10 8th parameter 7th output value
+ r11 hypercall number 8th output value
+ r12 - volatile
+
+Hypercall definitions are shared in generic code, so the same hypercall numbers
+apply for x86 and powerpc alike with the exception that each KVM hypercall
+also needs to be ORed with the KVM vendor code which is (42 << 16).
+
+Return codes can be as follows:
+
+ Code Meaning
+
+ 0 Success
+ 12 Hypercall not implemented
+ <0 Error
+
+The magic page
+==============
+
+To enable communication between the hypervisor and guest there is a new shared
+page that contains parts of supervisor visible register state. The guest can
+map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
+
+With this hypercall issued the guest always gets the magic page mapped at the
+desired location in effective and physical address space. For now, we always
+map the page to -4096. This way we can access it using absolute load and store
+functions. The following instruction reads the first field of the magic page:
+
+ ld rX, -4096(0)
+
+The interface is designed to be extensible should there be need later to add
+additional registers to the magic page. If you add fields to the magic page,
+also define a new hypercall feature to indicate that the host can give you more
+registers. Only if the host supports the additional features, make use of them.
+
+The magic page has the following layout as described in
+arch/powerpc/include/asm/kvm_para.h:
+
+struct kvm_vcpu_arch_shared {
+ __u64 scratch1;
+ __u64 scratch2;
+ __u64 scratch3;
+ __u64 critical; /* Guest may not get interrupts if == r1 */
+ __u64 sprg0;
+ __u64 sprg1;
+ __u64 sprg2;
+ __u64 sprg3;
+ __u64 srr0;
+ __u64 srr1;
+ __u64 dar;
+ __u64 msr;
+ __u32 dsisr;
+ __u32 int_pending; /* Tells the guest if we have an interrupt */
+};
+
+Additions to the page must only occur at the end. Struct fields are always 32
+or 64 bit aligned, depending on them being 32 or 64 bit wide respectively.
+
+Magic page features
+===================
+
+When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
+a second return value is passed to the guest. This second return value contains
+a bitmap of available features inside the magic page.
+
+The following enhancements to the magic page are currently available:
+
+ KVM_MAGIC_FEAT_SR Maps SR registers r/w in the magic page
+
+For enhanced features in the magic page, please check for the existence of the
+feature before using them!
+
+MSR bits
+========
+
+The MSR contains bits that require hypervisor intervention and bits that do
+not require direct hypervisor intervention because they only get interpreted
+when entering the guest or don't have any impact on the hypervisor's behavior.
+
+The following bits are safe to be set inside the guest:
+
+ MSR_EE
+ MSR_RI
+ MSR_CR
+ MSR_ME
+
+If any other bit changes in the MSR, please still use mtmsr(d).
+
+Patched instructions
+====================
+
+The "ld" and "std" instructions are transormed to "lwz" and "stw" instructions
+respectively on 32 bit systems with an added offset of 4 to accommodate for big
+endianness.
+
+The following is a list of mapping the Linux kernel performs when running as
+guest. Implementing any of those mappings is optional, as the instruction traps
+also act on the shared page. So calling privileged instructions still works as
+before.
+
+From To
+==== ==
+
+mfmsr rX ld rX, magic_page->msr
+mfsprg rX, 0 ld rX, magic_page->sprg0
+mfsprg rX, 1 ld rX, magic_page->sprg1
+mfsprg rX, 2 ld rX, magic_page->sprg2
+mfsprg rX, 3 ld rX, magic_page->sprg3
+mfsrr0 rX ld rX, magic_page->srr0
+mfsrr1 rX ld rX, magic_page->srr1
+mfdar rX ld rX, magic_page->dar
+mfdsisr rX lwz rX, magic_page->dsisr
+
+mtmsr rX std rX, magic_page->msr
+mtsprg 0, rX std rX, magic_page->sprg0
+mtsprg 1, rX std rX, magic_page->sprg1
+mtsprg 2, rX std rX, magic_page->sprg2
+mtsprg 3, rX std rX, magic_page->sprg3
+mtsrr0 rX std rX, magic_page->srr0
+mtsrr1 rX std rX, magic_page->srr1
+mtdar rX std rX, magic_page->dar
+mtdsisr rX stw rX, magic_page->dsisr
+
+tlbsync nop
+
+mtmsrd rX, 0 b <special mtmsr section>
+mtmsr rX b <special mtmsr section>
+
+mtmsrd rX, 1 b <special mtmsrd section>
+
+[Book3S only]
+mtsrin rX, rY b <special mtsrin section>
+
+[BookE only]
+wrteei [0|1] b <special wrteei section>
+
+
+Some instructions require more logic to determine what's going on than a load
+or store instruction can deliver. To enable patching of those, we keep some
+RAM around where we can live translate instructions to. What happens is the
+following:
+
+ 1) copy emulation code to memory
+ 2) patch that code to fit the emulated instruction
+ 3) patch that code to return to the original pc + 4
+ 4) patch the original instruction to branch to the new code
+
+That way we can inject an arbitrary amount of code as replacement for a single
+instruction. This allows us to check for pending interrupts when setting EE=1
+for example.
--- /dev/null
+Review checklist for kvm patches
+================================
+
+1. The patch must follow Documentation/CodingStyle and
+ Documentation/SubmittingPatches.
+
+2. Patches should be against kvm.git master branch.
+
+3. If the patch introduces or modifies a new userspace API:
+ - the API must be documented in Documentation/virtual/kvm/api.txt
+ - the API must be discoverable using KVM_CHECK_EXTENSION
+
+4. New state must include support for save/restore.
+
+5. New features must default to off (userspace should explicitly request them).
+ Performance improvements can and should default to on.
+
+6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2
+
+7. Emulator changes should be accompanied by unit tests for qemu-kvm.git
+ kvm/test directory.
+
+8. Changes should be vendor neutral when possible. Changes to common code
+ are better than duplicating changes to vendor code.
+
+9. Similarly, prefer changes to arch independent code than to arch dependent
+ code.
+
+10. User/kernel interfaces and guest/host interfaces must be 64-bit clean
+ (all variables and sizes naturally aligned on 64-bit; use specific types
+ only - u64 rather than ulong).
+
+11. New guest visible features must either be documented in a hardware manual
+ or be accompanied by documentation.
+
+12. Features must be robust against reset and kexec - for example, shared
+ host/guest memory must be unshared to prevent the host from writing to
+ guest memory that the guest has not reserved for this purpose.
--- /dev/null
+
+ Timekeeping Virtualization for X86-Based Architectures
+
+ Zachary Amsden <zamsden@redhat.com>
+ Copyright (c) 2010, Red Hat. All rights reserved.
+
+1) Overview
+2) Timing Devices
+3) TSC Hardware
+4) Virtualization Problems
+
+=========================================================================
+
+1) Overview
+
+One of the most complicated parts of the X86 platform, and specifically,
+the virtualization of this platform is the plethora of timing devices available
+and the complexity of emulating those devices. In addition, virtualization of
+time introduces a new set of challenges because it introduces a multiplexed
+division of time beyond the control of the guest CPU.
+
+First, we will describe the various timekeeping hardware available, then
+present some of the problems which arise and solutions available, giving
+specific recommendations for certain classes of KVM guests.
+
+The purpose of this document is to collect data and information relevant to
+timekeeping which may be difficult to find elsewhere, specifically,
+information relevant to KVM and hardware-based virtualization.
+
+=========================================================================
+
+2) Timing Devices
+
+First we discuss the basic hardware devices available. TSC and the related
+KVM clock are special enough to warrant a full exposition and are described in
+the following section.
+
+2.1) i8254 - PIT
+
+One of the first timer devices available is the programmable interrupt timer,
+or PIT. The PIT has a fixed frequency 1.193182 MHz base clock and three
+channels which can be programmed to deliver periodic or one-shot interrupts.
+These three channels can be configured in different modes and have individual
+counters. Channel 1 and 2 were not available for general use in the original
+IBM PC, and historically were connected to control RAM refresh and the PC
+speaker. Now the PIT is typically integrated as part of an emulated chipset
+and a separate physical PIT is not used.
+
+The PIT uses I/O ports 0x40 - 0x43. Access to the 16-bit counters is done
+using single or multiple byte access to the I/O ports. There are 6 modes
+available, but not all modes are available to all timers, as only timer 2
+has a connected gate input, required for modes 1 and 5. The gate line is
+controlled by port 61h, bit 0, as illustrated in the following diagram.
+
+ -------------- ----------------
+| | | |
+| 1.1932 MHz |---------->| CLOCK OUT | ---------> IRQ 0
+| Clock | | | |
+ -------------- | +->| GATE TIMER 0 |
+ | ----------------
+ |
+ | ----------------
+ | | |
+ |------>| CLOCK OUT | ---------> 66.3 KHZ DRAM
+ | | | (aka /dev/null)
+ | +->| GATE TIMER 1 |
+ | ----------------
+ |
+ | ----------------
+ | | |
+ |------>| CLOCK OUT | ---------> Port 61h, bit 5
+ | | |
+Port 61h, bit 0 ---------->| GATE TIMER 2 | \_.---- ____
+ ---------------- _| )--|LPF|---Speaker
+ / *---- \___/
+Port 61h, bit 1 -----------------------------------/
+
+The timer modes are now described.
+
+Mode 0: Single Timeout. This is a one-shot software timeout that counts down
+ when the gate is high (always true for timers 0 and 1). When the count
+ reaches zero, the output goes high.
+
+Mode 1: Triggered One-shot. The output is initially set high. When the gate
+ line is set high, a countdown is initiated (which does not stop if the gate is
+ lowered), during which the output is set low. When the count reaches zero,
+ the output goes high.
+
+Mode 2: Rate Generator. The output is initially set high. When the countdown
+ reaches 1, the output goes low for one count and then returns high. The value
+ is reloaded and the countdown automatically resumes. If the gate line goes
+ low, the count is halted. If the output is low when the gate is lowered, the
+ output automatically goes high (this only affects timer 2).
+
+Mode 3: Square Wave. This generates a high / low square wave. The count
+ determines the length of the pulse, which alternates between high and low
+ when zero is reached. The count only proceeds when gate is high and is
+ automatically reloaded on reaching zero. The count is decremented twice at
+ each clock to generate a full high / low cycle at the full periodic rate.
+ If the count is even, the clock remains high for N/2 counts and low for N/2
+ counts; if the clock is odd, the clock is high for (N+1)/2 counts and low
+ for (N-1)/2 counts. Only even values are latched by the counter, so odd
+ values are not observed when reading. This is the intended mode for timer 2,
+ which generates sine-like tones by low-pass filtering the square wave output.
+
+Mode 4: Software Strobe. After programming this mode and loading the counter,
+ the output remains high until the counter reaches zero. Then the output
+ goes low for 1 clock cycle and returns high. The counter is not reloaded.
+ Counting only occurs when gate is high.
+
+Mode 5: Hardware Strobe. After programming and loading the counter, the
+ output remains high. When the gate is raised, a countdown is initiated
+ (which does not stop if the gate is lowered). When the counter reaches zero,
+ the output goes low for 1 clock cycle and then returns high. The counter is
+ not reloaded.
+
+In addition to normal binary counting, the PIT supports BCD counting. The
+command port, 0x43 is used to set the counter and mode for each of the three
+timers.
+
+PIT commands, issued to port 0x43, using the following bit encoding:
+
+Bit 7-4: Command (See table below)
+Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined)
+Bit 0 : Binary (0) / BCD (1)
+
+Command table:
+
+0000 - Latch Timer 0 count for port 0x40
+ sample and hold the count to be read in port 0x40;
+ additional commands ignored until counter is read;
+ mode bits ignored.
+
+0001 - Set Timer 0 LSB mode for port 0x40
+ set timer to read LSB only and force MSB to zero;
+ mode bits set timer mode
+
+0010 - Set Timer 0 MSB mode for port 0x40
+ set timer to read MSB only and force LSB to zero;
+ mode bits set timer mode
+
+0011 - Set Timer 0 16-bit mode for port 0x40
+ set timer to read / write LSB first, then MSB;
+ mode bits set timer mode
+
+0100 - Latch Timer 1 count for port 0x41 - as described above
+0101 - Set Timer 1 LSB mode for port 0x41 - as described above
+0110 - Set Timer 1 MSB mode for port 0x41 - as described above
+0111 - Set Timer 1 16-bit mode for port 0x41 - as described above
+
+1000 - Latch Timer 2 count for port 0x42 - as described above
+1001 - Set Timer 2 LSB mode for port 0x42 - as described above
+1010 - Set Timer 2 MSB mode for port 0x42 - as described above
+1011 - Set Timer 2 16-bit mode for port 0x42 as described above
+
+1101 - General counter latch
+ Latch combination of counters into corresponding ports
+ Bit 3 = Counter 2
+ Bit 2 = Counter 1
+ Bit 1 = Counter 0
+ Bit 0 = Unused
+
+1110 - Latch timer status
+ Latch combination of counter mode into corresponding ports
+ Bit 3 = Counter 2
+ Bit 2 = Counter 1
+ Bit 1 = Counter 0
+
+ The output of ports 0x40-0x42 following this command will be:
+
+ Bit 7 = Output pin
+ Bit 6 = Count loaded (0 if timer has expired)
+ Bit 5-4 = Read / Write mode
+ 01 = MSB only
+ 10 = LSB only
+ 11 = LSB / MSB (16-bit)
+ Bit 3-1 = Mode
+ Bit 0 = Binary (0) / BCD mode (1)
+
+2.2) RTC
+
+The second device which was available in the original PC was the MC146818 real
+time clock. The original device is now obsolete, and usually emulated by the
+system chipset, sometimes by an HPET and some frankenstein IRQ routing.
+
+The RTC is accessed through CMOS variables, which uses an index register to
+control which bytes are read. Since there is only one index register, read
+of the CMOS and read of the RTC require lock protection (in addition, it is
+dangerous to allow userspace utilities such as hwclock to have direct RTC
+access, as they could corrupt kernel reads and writes of CMOS memory).
+
+The RTC generates an interrupt which is usually routed to IRQ 8. The interrupt
+can function as a periodic timer, an additional once a day alarm, and can issue
+interrupts after an update of the CMOS registers by the MC146818 is complete.
+The type of interrupt is signalled in the RTC status registers.
+
+The RTC will update the current time fields by battery power even while the
+system is off. The current time fields should not be read while an update is
+in progress, as indicated in the status register.
+
+The clock uses a 32.768kHz crystal, so bits 6-4 of register A should be
+programmed to a 32kHz divider if the RTC is to count seconds.
+
+This is the RAM map originally used for the RTC/CMOS:
+
+Location Size Description
+------------------------------------------
+00h byte Current second (BCD)
+01h byte Seconds alarm (BCD)
+02h byte Current minute (BCD)
+03h byte Minutes alarm (BCD)
+04h byte Current hour (BCD)
+05h byte Hours alarm (BCD)
+06h byte Current day of week (BCD)
+07h byte Current day of month (BCD)
+08h byte Current month (BCD)
+09h byte Current year (BCD)
+0Ah byte Register A
+ bit 7 = Update in progress
+ bit 6-4 = Divider for clock
+ 000 = 4.194 MHz
+ 001 = 1.049 MHz
+ 010 = 32 kHz
+ 10X = test modes
+ 110 = reset / disable
+ 111 = reset / disable
+ bit 3-0 = Rate selection for periodic interrupt
+ 000 = periodic timer disabled
+ 001 = 3.90625 uS
+ 010 = 7.8125 uS
+ 011 = .122070 mS
+ 100 = .244141 mS
+ ...
+ 1101 = 125 mS
+ 1110 = 250 mS
+ 1111 = 500 mS
+0Bh byte Register B
+ bit 7 = Run (0) / Halt (1)
+ bit 6 = Periodic interrupt enable
+ bit 5 = Alarm interrupt enable
+ bit 4 = Update-ended interrupt enable
+ bit 3 = Square wave interrupt enable
+ bit 2 = BCD calendar (0) / Binary (1)
+ bit 1 = 12-hour mode (0) / 24-hour mode (1)
+ bit 0 = 0 (DST off) / 1 (DST enabled)
+OCh byte Register C (read only)
+ bit 7 = interrupt request flag (IRQF)
+ bit 6 = periodic interrupt flag (PF)
+ bit 5 = alarm interrupt flag (AF)
+ bit 4 = update interrupt flag (UF)
+ bit 3-0 = reserved
+ODh byte Register D (read only)
+ bit 7 = RTC has power
+ bit 6-0 = reserved
+32h byte Current century BCD (*)
+ (*) location vendor specific and now determined from ACPI global tables
+
+2.3) APIC
+
+On Pentium and later processors, an on-board timer is available to each CPU
+as part of the Advanced Programmable Interrupt Controller. The APIC is
+accessed through memory-mapped registers and provides interrupt service to each
+CPU, used for IPIs and local timer interrupts.
+
+Although in theory the APIC is a safe and stable source for local interrupts,
+in practice, many bugs and glitches have occurred due to the special nature of
+the APIC CPU-local memory-mapped hardware. Beware that CPU errata may affect
+the use of the APIC and that workarounds may be required. In addition, some of
+these workarounds pose unique constraints for virtualization - requiring either
+extra overhead incurred from extra reads of memory-mapped I/O or additional
+functionality that may be more computationally expensive to implement.
+
+Since the APIC is documented quite well in the Intel and AMD manuals, we will
+avoid repetition of the detail here. It should be pointed out that the APIC
+timer is programmed through the LVT (local vector timer) register, is capable
+of one-shot or periodic operation, and is based on the bus clock divided down
+by the programmable divider register.
+
+2.4) HPET
+
+HPET is quite complex, and was originally intended to replace the PIT / RTC
+support of the X86 PC. It remains to be seen whether that will be the case, as
+the de facto standard of PC hardware is to emulate these older devices. Some
+systems designated as legacy free may support only the HPET as a hardware timer
+device.
+
+The HPET spec is rather loose and vague, requiring at least 3 hardware timers,
+but allowing implementation freedom to support many more. It also imposes no
+fixed rate on the timer frequency, but does impose some extremal values on
+frequency, error and slew.
+
+In general, the HPET is recommended as a high precision (compared to PIT /RTC)
+time source which is independent of local variation (as there is only one HPET
+in any given system). The HPET is also memory-mapped, and its presence is
+indicated through ACPI tables by the BIOS.
+
+Detailed specification of the HPET is beyond the current scope of this
+document, as it is also very well documented elsewhere.
+
+2.5) Offboard Timers
+
+Several cards, both proprietary (watchdog boards) and commonplace (e1000) have
+timing chips built into the cards which may have registers which are accessible
+to kernel or user drivers. To the author's knowledge, using these to generate
+a clocksource for a Linux or other kernel has not yet been attempted and is in
+general frowned upon as not playing by the agreed rules of the game. Such a
+timer device would require additional support to be virtualized properly and is
+not considered important at this time as no known operating system does this.
+
+=========================================================================
+
+3) TSC Hardware
+
+The TSC or time stamp counter is relatively simple in theory; it counts
+instruction cycles issued by the processor, which can be used as a measure of
+time. In practice, due to a number of problems, it is the most complicated
+timekeeping device to use.
+
+The TSC is represented internally as a 64-bit MSR which can be read with the
+RDMSR, RDTSC, or RDTSCP (when available) instructions. In the past, hardware
+limitations made it possible to write the TSC, but generally on old hardware it
+was only possible to write the low 32-bits of the 64-bit counter, and the upper
+32-bits of the counter were cleared. Now, however, on Intel processors family
+0Fh, for models 3, 4 and 6, and family 06h, models e and f, this restriction
+has been lifted and all 64-bits are writable. On AMD systems, the ability to
+write the TSC MSR is not an architectural guarantee.
+
+The TSC is accessible from CPL-0 and conditionally, for CPL > 0 software by
+means of the CR4.TSD bit, which when enabled, disables CPL > 0 TSC access.
+
+Some vendors have implemented an additional instruction, RDTSCP, which returns
+atomically not just the TSC, but an indicator which corresponds to the
+processor number. This can be used to index into an array of TSC variables to
+determine offset information in SMP systems where TSCs are not synchronized.
+The presence of this instruction must be determined by consulting CPUID feature
+bits.
+
+Both VMX and SVM provide extension fields in the virtualization hardware which
+allows the guest visible TSC to be offset by a constant. Newer implementations
+promise to allow the TSC to additionally be scaled, but this hardware is not
+yet widely available.
+
+3.1) TSC synchronization
+
+The TSC is a CPU-local clock in most implementations. This means, on SMP
+platforms, the TSCs of different CPUs may start at different times depending
+on when the CPUs are powered on. Generally, CPUs on the same die will share
+the same clock, however, this is not always the case.
+
+The BIOS may attempt to resynchronize the TSCs during the poweron process and
+the operating system or other system software may attempt to do this as well.
+Several hardware limitations make the problem worse - if it is not possible to
+write the full 64-bits of the TSC, it may be impossible to match the TSC in
+newly arriving CPUs to that of the rest of the system, resulting in
+unsynchronized TSCs. This may be done by BIOS or system software, but in
+practice, getting a perfectly synchronized TSC will not be possible unless all
+values are read from the same clock, which generally only is possible on single
+socket systems or those with special hardware support.
+
+3.2) TSC and CPU hotplug
+
+As touched on already, CPUs which arrive later than the boot time of the system
+may not have a TSC value that is synchronized with the rest of the system.
+Either system software, BIOS, or SMM code may actually try to establish the TSC
+to a value matching the rest of the system, but a perfect match is usually not
+a guarantee. This can have the effect of bringing a system from a state where
+TSC is synchronized back to a state where TSC synchronization flaws, however
+small, may be exposed to the OS and any virtualization environment.
+
+3.3) TSC and multi-socket / NUMA
+
+Multi-socket systems, especially large multi-socket systems are likely to have
+individual clocksources rather than a single, universally distributed clock.
+Since these clocks are driven by different crystals, they will not have
+perfectly matched frequency, and temperature and electrical variations will
+cause the CPU clocks, and thus the TSCs to drift over time. Depending on the
+exact clock and bus design, the drift may or may not be fixed in absolute
+error, and may accumulate over time.
+
+In addition, very large systems may deliberately slew the clocks of individual
+cores. This technique, known as spread-spectrum clocking, reduces EMI at the
+clock frequency and harmonics of it, which may be required to pass FCC
+standards for telecommunications and computer equipment.
+
+It is recommended not to trust the TSCs to remain synchronized on NUMA or
+multiple socket systems for these reasons.
+
+3.4) TSC and C-states
+
+C-states, or idling states of the processor, especially C1E and deeper sleep
+states may be problematic for TSC as well. The TSC may stop advancing in such
+a state, resulting in a TSC which is behind that of other CPUs when execution
+is resumed. Such CPUs must be detected and flagged by the operating system
+based on CPU and chipset identifications.
+
+The TSC in such a case may be corrected by catching it up to a known external
+clocksource.
+
+3.5) TSC frequency change / P-states
+
+To make things slightly more interesting, some CPUs may change frequency. They
+may or may not run the TSC at the same rate, and because the frequency change
+may be staggered or slewed, at some points in time, the TSC rate may not be
+known other than falling within a range of values. In this case, the TSC will
+not be a stable time source, and must be calibrated against a known, stable,
+external clock to be a usable source of time.
+
+Whether the TSC runs at a constant rate or scales with the P-state is model
+dependent and must be determined by inspecting CPUID, chipset or vendor
+specific MSR fields.
+
+In addition, some vendors have known bugs where the P-state is actually
+compensated for properly during normal operation, but when the processor is
+inactive, the P-state may be raised temporarily to service cache misses from
+other processors. In such cases, the TSC on halted CPUs could advance faster
+than that of non-halted processors. AMD Turion processors are known to have
+this problem.
+
+3.6) TSC and STPCLK / T-states
+
+External signals given to the processor may also have the effect of stopping
+the TSC. This is typically done for thermal emergency power control to prevent
+an overheating condition, and typically, there is no way to detect that this
+condition has happened.
+
+3.7) TSC virtualization - VMX
+
+VMX provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
+instructions, which is enough for full virtualization of TSC in any manner. In
+addition, VMX allows passing through the host TSC plus an additional TSC_OFFSET
+field specified in the VMCS. Special instructions must be used to read and
+write the VMCS field.
+
+3.8) TSC virtualization - SVM
+
+SVM provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
+instructions, which is enough for full virtualization of TSC in any manner. In
+addition, SVM allows passing through the host TSC plus an additional offset
+field specified in the SVM control block.
+
+3.9) TSC feature bits in Linux
+
+In summary, there is no way to guarantee the TSC remains in perfect
+synchronization unless it is explicitly guaranteed by the architecture. Even
+if so, the TSCs in multi-sockets or NUMA systems may still run independently
+despite being locally consistent.
+
+The following feature bits are used by Linux to signal various TSC attributes,
+but they can only be taken to be meaningful for UP or single node systems.
+
+X86_FEATURE_TSC : The TSC is available in hardware
+X86_FEATURE_RDTSCP : The RDTSCP instruction is available
+X86_FEATURE_CONSTANT_TSC : The TSC rate is unchanged with P-states
+X86_FEATURE_NONSTOP_TSC : The TSC does not stop in C-states
+X86_FEATURE_TSC_RELIABLE : TSC sync checks are skipped (VMware)
+
+4) Virtualization Problems
+
+Timekeeping is especially problematic for virtualization because a number of
+challenges arise. The most obvious problem is that time is now shared between
+the host and, potentially, a number of virtual machines. Thus the virtual
+operating system does not run with 100% usage of the CPU, despite the fact that
+it may very well make that assumption. It may expect it to remain true to very
+exacting bounds when interrupt sources are disabled, but in reality only its
+virtual interrupt sources are disabled, and the machine may still be preempted
+at any time. This causes problems as the passage of real time, the injection
+of machine interrupts and the associated clock sources are no longer completely
+synchronized with real time.
+
+This same problem can occur on native harware to a degree, as SMM mode may
+steal cycles from the naturally on X86 systems when SMM mode is used by the
+BIOS, but not in such an extreme fashion. However, the fact that SMM mode may
+cause similar problems to virtualization makes it a good justification for
+solving many of these problems on bare metal.
+
+4.1) Interrupt clocking
+
+One of the most immediate problems that occurs with legacy operating systems
+is that the system timekeeping routines are often designed to keep track of
+time by counting periodic interrupts. These interrupts may come from the PIT
+or the RTC, but the problem is the same: the host virtualization engine may not
+be able to deliver the proper number of interrupts per second, and so guest
+time may fall behind. This is especially problematic if a high interrupt rate
+is selected, such as 1000 HZ, which is unfortunately the default for many Linux
+guests.
+
+There are three approaches to solving this problem; first, it may be possible
+to simply ignore it. Guests which have a separate time source for tracking
+'wall clock' or 'real time' may not need any adjustment of their interrupts to
+maintain proper time. If this is not sufficient, it may be necessary to inject
+additional interrupts into the guest in order to increase the effective
+interrupt rate. This approach leads to complications in extreme conditions,
+where host load or guest lag is too much to compensate for, and thus another
+solution to the problem has risen: the guest may need to become aware of lost
+ticks and compensate for them internally. Although promising in theory, the
+implementation of this policy in Linux has been extremely error prone, and a
+number of buggy variants of lost tick compensation are distributed across
+commonly used Linux systems.
+
+Windows uses periodic RTC clocking as a means of keeping time internally, and
+thus requires interrupt slewing to keep proper time. It does use a low enough
+rate (ed: is it 18.2 Hz?) however that it has not yet been a problem in
+practice.
+
+4.2) TSC sampling and serialization
+
+As the highest precision time source available, the cycle counter of the CPU
+has aroused much interest from developers. As explained above, this timer has
+many problems unique to its nature as a local, potentially unstable and
+potentially unsynchronized source. One issue which is not unique to the TSC,
+but is highlighted because of its very precise nature is sampling delay. By
+definition, the counter, once read is already old. However, it is also
+possible for the counter to be read ahead of the actual use of the result.
+This is a consequence of the superscalar execution of the instruction stream,
+which may execute instructions out of order. Such execution is called
+non-serialized. Forcing serialized execution is necessary for precise
+measurement with the TSC, and requires a serializing instruction, such as CPUID
+or an MSR read.
+
+Since CPUID may actually be virtualized by a trap and emulate mechanism, this
+serialization can pose a performance issue for hardware virtualization. An
+accurate time stamp counter reading may therefore not always be available, and
+it may be necessary for an implementation to guard against "backwards" reads of
+the TSC as seen from other CPUs, even in an otherwise perfectly synchronized
+system.
+
+4.3) Timespec aliasing
+
+Additionally, this lack of serialization from the TSC poses another challenge
+when using results of the TSC when measured against another time source. As
+the TSC is much higher precision, many possible values of the TSC may be read
+while another clock is still expressing the same value.
+
+That is, you may read (T,T+10) while external clock C maintains the same value.
+Due to non-serialized reads, you may actually end up with a range which
+fluctuates - from (T-1.. T+10). Thus, any time calculated from a TSC, but
+calibrated against an external value may have a range of valid values.
+Re-calibrating this computation may actually cause time, as computed after the
+calibration, to go backwards, compared with time computed before the
+calibration.
+
+This problem is particularly pronounced with an internal time source in Linux,
+the kernel time, which is expressed in the theoretically high resolution
+timespec - but which advances in much larger granularity intervals, sometimes
+at the rate of jiffies, and possibly in catchup modes, at a much larger step.
+
+This aliasing requires care in the computation and recalibration of kvmclock
+and any other values derived from TSC computation (such as TSC virtualization
+itself).
+
+4.4) Migration
+
+Migration of a virtual machine raises problems for timekeeping in two ways.
+First, the migration itself may take time, during which interrupts cannot be
+delivered, and after which, the guest time may need to be caught up. NTP may
+be able to help to some degree here, as the clock correction required is
+typically small enough to fall in the NTP-correctable window.
+
+An additional concern is that timers based off the TSC (or HPET, if the raw bus
+clock is exposed) may now be running at different rates, requiring compensation
+in some way in the hypervisor by virtualizing these timers. In addition,
+migrating to a faster machine may preclude the use of a passthrough TSC, as a
+faster clock cannot be made visible to a guest without the potential of time
+advancing faster than usual. A slower clock is less of a problem, as it can
+always be caught up to the original rate. KVM clock avoids these problems by
+simply storing multipliers and offsets against the TSC for the guest to convert
+back into nanosecond resolution values.
+
+4.5) Scheduling
+
+Since scheduling may be based on precise timing and firing of interrupts, the
+scheduling algorithms of an operating system may be adversely affected by
+virtualization. In theory, the effect is random and should be universally
+distributed, but in contrived as well as real scenarios (guest device access,
+causes of virtualization exits, possible context switch), this may not always
+be the case. The effect of this has not been well studied.
+
+In an attempt to work around this, several implementations have provided a
+paravirtualized scheduler clock, which reveals the true amount of CPU time for
+which a virtual machine has been running.
+
+4.6) Watchdogs
+
+Watchdog timers, such as the lock detector in Linux may fire accidentally when
+running under hardware virtualization due to timer interrupts being delayed or
+misinterpretation of the passage of real time. Usually, these warnings are
+spurious and can be ignored, but in some circumstances it may be necessary to
+disable such detection.
+
+4.7) Delays and precision timing
+
+Precise timing and delays may not be possible in a virtualized system. This
+can happen if the system is controlling physical hardware, or issues delays to
+compensate for slower I/O to and from devices. The first issue is not solvable
+in general for a virtualized system; hardware control software can't be
+adequately virtualized without a full real-time operating system, which would
+require an RT aware virtualization platform.
+
+The second issue may cause performance problems, but this is unlikely to be a
+significant issue. In many cases these delays may be eliminated through
+configuration or paravirtualization.
+
+4.8) Covert channels and leaks
+
+In addition to the above problems, time information will inevitably leak to the
+guest about the host in anything but a perfect implementation of virtualized
+time. This may allow the guest to infer the presence of a hypervisor (as in a
+red-pill type detection), and it may allow information to leak between guests
+by using CPU utilization itself as a signalling channel. Preventing such
+problems would require completely isolated virtual time which may not track
+real time any longer. This may be useful in certain security or QA contexts,
+but in general isn't recommended for real-world deployment scenarios.
--- /dev/null
+# This creates the demonstration utility "lguest" which runs a Linux guest.
+# Missing headers? Add "-I../../include -I../../arch/x86/include"
+CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE
+
+all: lguest
+
+clean:
+ rm -f lguest
--- /dev/null
+#! /bin/sh
+
+set -e
+
+PREFIX=$1
+shift
+
+trap 'rm -r $TMPDIR' 0
+TMPDIR=`mktemp -d`
+
+exec 3>/dev/null
+for f; do
+ while IFS="
+" read -r LINE; do
+ case "$LINE" in
+ *$PREFIX:[0-9]*:\**)
+ NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
+ if [ -f $TMPDIR/$NUM ]; then
+ echo "$TMPDIR/$NUM already exits prior to $f"
+ exit 1
+ fi
+ exec 3>>$TMPDIR/$NUM
+ echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
+ /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3
+ ;;
+ *$PREFIX:[0-9]*)
+ NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
+ if [ -f $TMPDIR/$NUM ]; then
+ echo "$TMPDIR/$NUM already exits prior to $f"
+ exit 1
+ fi
+ exec 3>>$TMPDIR/$NUM
+ echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
+ /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3
+ ;;
+ *:\**)
+ /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3
+ echo >&3
+ exec 3>/dev/null
+ ;;
+ *)
+ /bin/echo "$LINE" >&3
+ ;;
+ esac
+ done < $f
+ echo >&3
+ exec 3>/dev/null
+done
+
+LASTFILE=""
+for f in $TMPDIR/*; do
+ if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then
+ LASTFILE=$(cat $TMPDIR/.$(basename $f) )
+ echo "[ $LASTFILE ]"
+ fi
+ cat $f
+done
+
--- /dev/null
+/*P:100
+ * This is the Launcher code, a simple program which lays out the "physical"
+ * memory for the new Guest by mapping the kernel image and the virtual
+ * devices, then opens /dev/lguest to tell the kernel about the Guest and
+ * control it.
+:*/
+#define _LARGEFILE64_SOURCE
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <err.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <elf.h>
+#include <sys/mman.h>
+#include <sys/param.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <sys/eventfd.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <ctype.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <time.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <linux/sockios.h>
+#include <linux/if_tun.h>
+#include <sys/uio.h>
+#include <termios.h>
+#include <getopt.h>
+#include <assert.h>
+#include <sched.h>
+#include <limits.h>
+#include <stddef.h>
+#include <signal.h>
+#include <pwd.h>
+#include <grp.h>
+
+#include <linux/virtio_config.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_blk.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_rng.h>
+#include <linux/virtio_ring.h>
+#include <asm/bootparam.h>
+#include "../../include/linux/lguest_launcher.h"
+/*L:110
+ * We can ignore the 42 include files we need for this program, but I do want
+ * to draw attention to the use of kernel-style types.
+ *
+ * As Linus said, "C is a Spartan language, and so should your naming be." I
+ * like these abbreviations, so we define them here. Note that u64 is always
+ * unsigned long long, which works on all Linux systems: this means that we can
+ * use %llu in printf for any u64.
+ */
+typedef unsigned long long u64;
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+/*:*/
+
+#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
+#define BRIDGE_PFX "bridge:"
+#ifndef SIOCBRADDIF
+#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
+#endif
+/* We can have up to 256 pages for devices. */
+#define DEVICE_PAGES 256
+/* This will occupy 3 pages: it must be a power of 2. */
+#define VIRTQUEUE_NUM 256
+
+/*L:120
+ * verbose is both a global flag and a macro. The C preprocessor allows
+ * this, and although I wouldn't recommend it, it works quite nicely here.
+ */
+static bool verbose;
+#define verbose(args...) \
+ do { if (verbose) printf(args); } while(0)
+/*:*/
+
+/* The pointer to the start of guest memory. */
+static void *guest_base;
+/* The maximum guest physical address allowed, and maximum possible. */
+static unsigned long guest_limit, guest_max;
+/* The /dev/lguest file descriptor. */
+static int lguest_fd;
+
+/* a per-cpu variable indicating whose vcpu is currently running */
+static unsigned int __thread cpu_id;
+
+/* This is our list of devices. */
+struct device_list {
+ /* Counter to assign interrupt numbers. */
+ unsigned int next_irq;
+
+ /* Counter to print out convenient device numbers. */
+ unsigned int device_num;
+
+ /* The descriptor page for the devices. */
+ u8 *descpage;
+
+ /* A single linked list of devices. */
+ struct device *dev;
+ /* And a pointer to the last device for easy append. */
+ struct device *lastdev;
+};
+
+/* The list of Guest devices, based on command line arguments. */
+static struct device_list devices;
+
+/* The device structure describes a single device. */
+struct device {
+ /* The linked-list pointer. */
+ struct device *next;
+
+ /* The device's descriptor, as mapped into the Guest. */
+ struct lguest_device_desc *desc;
+
+ /* We can't trust desc values once Guest has booted: we use these. */
+ unsigned int feature_len;
+ unsigned int num_vq;
+
+ /* The name of this device, for --verbose. */
+ const char *name;
+
+ /* Any queues attached to this device */
+ struct virtqueue *vq;
+
+ /* Is it operational */
+ bool running;
+
+ /* Does Guest want an intrrupt on empty? */
+ bool irq_on_empty;
+
+ /* Device-specific data. */
+ void *priv;
+};
+
+/* The virtqueue structure describes a queue attached to a device. */
+struct virtqueue {
+ struct virtqueue *next;
+
+ /* Which device owns me. */
+ struct device *dev;
+
+ /* The configuration for this queue. */
+ struct lguest_vqconfig config;
+
+ /* The actual ring of buffers. */
+ struct vring vring;
+
+ /* Last available index we saw. */
+ u16 last_avail_idx;
+
+ /* How many are used since we sent last irq? */
+ unsigned int pending_used;
+
+ /* Eventfd where Guest notifications arrive. */
+ int eventfd;
+
+ /* Function for the thread which is servicing this virtqueue. */
+ void (*service)(struct virtqueue *vq);
+ pid_t thread;
+};
+
+/* Remember the arguments to the program so we can "reboot" */
+static char **main_args;
+
+/* The original tty settings to restore on exit. */
+static struct termios orig_term;
+
+/*
+ * We have to be careful with barriers: our devices are all run in separate
+ * threads and so we need to make sure that changes visible to the Guest happen
+ * in precise order.
+ */
+#define wmb() __asm__ __volatile__("" : : : "memory")
+#define mb() __asm__ __volatile__("" : : : "memory")
+
+/*
+ * Convert an iovec element to the given type.
+ *
+ * This is a fairly ugly trick: we need to know the size of the type and
+ * alignment requirement to check the pointer is kosher. It's also nice to
+ * have the name of the type in case we report failure.
+ *
+ * Typing those three things all the time is cumbersome and error prone, so we
+ * have a macro which sets them all up and passes to the real function.
+ */
+#define convert(iov, type) \
+ ((type *)_convert((iov), sizeof(type), __alignof__(type), #type))
+
+static void *_convert(struct iovec *iov, size_t size, size_t align,
+ const char *name)
+{
+ if (iov->iov_len != size)
+ errx(1, "Bad iovec size %zu for %s", iov->iov_len, name);
+ if ((unsigned long)iov->iov_base % align != 0)
+ errx(1, "Bad alignment %p for %s", iov->iov_base, name);
+ return iov->iov_base;
+}
+
+/* Wrapper for the last available index. Makes it easier to change. */
+#define lg_last_avail(vq) ((vq)->last_avail_idx)
+
+/*
+ * The virtio configuration space is defined to be little-endian. x86 is
+ * little-endian too, but it's nice to be explicit so we have these helpers.
+ */
+#define cpu_to_le16(v16) (v16)
+#define cpu_to_le32(v32) (v32)
+#define cpu_to_le64(v64) (v64)
+#define le16_to_cpu(v16) (v16)
+#define le32_to_cpu(v32) (v32)
+#define le64_to_cpu(v64) (v64)
+
+/* Is this iovec empty? */
+static bool iov_empty(const struct iovec iov[], unsigned int num_iov)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_iov; i++)
+ if (iov[i].iov_len)
+ return false;
+ return true;
+}
+
+/* Take len bytes from the front of this iovec. */
+static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_iov; i++) {
+ unsigned int used;
+
+ used = iov[i].iov_len < len ? iov[i].iov_len : len;
+ iov[i].iov_base += used;
+ iov[i].iov_len -= used;
+ len -= used;
+ }
+ assert(len == 0);
+}
+
+/* The device virtqueue descriptors are followed by feature bitmasks. */
+static u8 *get_feature_bits(struct device *dev)
+{
+ return (u8 *)(dev->desc + 1)
+ + dev->num_vq * sizeof(struct lguest_vqconfig);
+}
+
+/*L:100
+ * The Launcher code itself takes us out into userspace, that scary place where
+ * pointers run wild and free! Unfortunately, like most userspace programs,
+ * it's quite boring (which is why everyone likes to hack on the kernel!).
+ * Perhaps if you make up an Lguest Drinking Game at this point, it will get
+ * you through this section. Or, maybe not.
+ *
+ * The Launcher sets up a big chunk of memory to be the Guest's "physical"
+ * memory and stores it in "guest_base". In other words, Guest physical ==
+ * Launcher virtual with an offset.
+ *
+ * This can be tough to get your head around, but usually it just means that we
+ * use these trivial conversion functions when the Guest gives us its
+ * "physical" addresses:
+ */
+static void *from_guest_phys(unsigned long addr)
+{
+ return guest_base + addr;
+}
+
+static unsigned long to_guest_phys(const void *addr)
+{
+ return (addr - guest_base);
+}
+
+/*L:130
+ * Loading the Kernel.
+ *
+ * We start with couple of simple helper routines. open_or_die() avoids
+ * error-checking code cluttering the callers:
+ */
+static int open_or_die(const char *name, int flags)
+{
+ int fd = open(name, flags);
+ if (fd < 0)
+ err(1, "Failed to open %s", name);
+ return fd;
+}
+
+/* map_zeroed_pages() takes a number of pages. */
+static void *map_zeroed_pages(unsigned int num)
+{
+ int fd = open_or_die("/dev/zero", O_RDONLY);
+ void *addr;
+
+ /*
+ * We use a private mapping (ie. if we write to the page, it will be
+ * copied). We allocate an extra two pages PROT_NONE to act as guard
+ * pages against read/write attempts that exceed allocated space.
+ */
+ addr = mmap(NULL, getpagesize() * (num+2),
+ PROT_NONE, MAP_PRIVATE, fd, 0);
+
+ if (addr == MAP_FAILED)
+ err(1, "Mmapping %u pages of /dev/zero", num);
+
+ if (mprotect(addr + getpagesize(), getpagesize() * num,
+ PROT_READ|PROT_WRITE) == -1)
+ err(1, "mprotect rw %u pages failed", num);
+
+ /*
+ * One neat mmap feature is that you can close the fd, and it
+ * stays mapped.
+ */
+ close(fd);
+
+ /* Return address after PROT_NONE page */
+ return addr + getpagesize();
+}
+
+/* Get some more pages for a device. */
+static void *get_pages(unsigned int num)
+{
+ void *addr = from_guest_phys(guest_limit);
+
+ guest_limit += num * getpagesize();
+ if (guest_limit > guest_max)
+ errx(1, "Not enough memory for devices");
+ return addr;
+}
+
+/*
+ * This routine is used to load the kernel or initrd. It tries mmap, but if
+ * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
+ * it falls back to reading the memory in.
+ */
+static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
+{
+ ssize_t r;
+
+ /*
+ * We map writable even though for some segments are marked read-only.
+ * The kernel really wants to be writable: it patches its own
+ * instructions.
+ *
+ * MAP_PRIVATE means that the page won't be copied until a write is
+ * done to it. This allows us to share untouched memory between
+ * Guests.
+ */
+ if (mmap(addr, len, PROT_READ|PROT_WRITE,
+ MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
+ return;
+
+ /* pread does a seek and a read in one shot: saves a few lines. */
+ r = pread(fd, addr, len, offset);
+ if (r != len)
+ err(1, "Reading offset %lu len %lu gave %zi", offset, len, r);
+}
+
+/*
+ * This routine takes an open vmlinux image, which is in ELF, and maps it into
+ * the Guest memory. ELF = Embedded Linking Format, which is the format used
+ * by all modern binaries on Linux including the kernel.
+ *
+ * The ELF headers give *two* addresses: a physical address, and a virtual
+ * address. We use the physical address; the Guest will map itself to the
+ * virtual address.
+ *
+ * We return the starting address.
+ */
+static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
+{
+ Elf32_Phdr phdr[ehdr->e_phnum];
+ unsigned int i;
+
+ /*
+ * Sanity checks on the main ELF header: an x86 executable with a
+ * reasonable number of correctly-sized program headers.
+ */
+ if (ehdr->e_type != ET_EXEC
+ || ehdr->e_machine != EM_386
+ || ehdr->e_phentsize != sizeof(Elf32_Phdr)
+ || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
+ errx(1, "Malformed elf header");
+
+ /*
+ * An ELF executable contains an ELF header and a number of "program"
+ * headers which indicate which parts ("segments") of the program to
+ * load where.
+ */
+
+ /* We read in all the program headers at once: */
+ if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
+ err(1, "Seeking to program headers");
+ if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
+ err(1, "Reading program headers");
+
+ /*
+ * Try all the headers: there are usually only three. A read-only one,
+ * a read-write one, and a "note" section which we don't load.
+ */
+ for (i = 0; i < ehdr->e_phnum; i++) {
+ /* If this isn't a loadable segment, we ignore it */
+ if (phdr[i].p_type != PT_LOAD)
+ continue;
+
+ verbose("Section %i: size %i addr %p\n",
+ i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
+
+ /* We map this section of the file at its physical address. */
+ map_at(elf_fd, from_guest_phys(phdr[i].p_paddr),
+ phdr[i].p_offset, phdr[i].p_filesz);
+ }
+
+ /* The entry point is given in the ELF header. */
+ return ehdr->e_entry;
+}
+
+/*L:150
+ * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed
+ * to jump into it and it will unpack itself. We used to have to perform some
+ * hairy magic because the unpacking code scared me.
+ *
+ * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote
+ * a small patch to jump over the tricky bits in the Guest, so now we just read
+ * the funky header so we know where in the file to load, and away we go!
+ */
+static unsigned long load_bzimage(int fd)
+{
+ struct boot_params boot;
+ int r;
+ /* Modern bzImages get loaded at 1M. */
+ void *p = from_guest_phys(0x100000);
+
+ /*
+ * Go back to the start of the file and read the header. It should be
+ * a Linux boot header (see Documentation/x86/i386/boot.txt)
+ */
+ lseek(fd, 0, SEEK_SET);
+ read(fd, &boot, sizeof(boot));
+
+ /* Inside the setup_hdr, we expect the magic "HdrS" */
+ if (memcmp(&boot.hdr.header, "HdrS", 4) != 0)
+ errx(1, "This doesn't look like a bzImage to me");
+
+ /* Skip over the extra sectors of the header. */
+ lseek(fd, (boot.hdr.setup_sects+1) * 512, SEEK_SET);
+
+ /* Now read everything into memory. in nice big chunks. */
+ while ((r = read(fd, p, 65536)) > 0)
+ p += r;
+
+ /* Finally, code32_start tells us where to enter the kernel. */
+ return boot.hdr.code32_start;
+}
+
+/*L:140
+ * Loading the kernel is easy when it's a "vmlinux", but most kernels
+ * come wrapped up in the self-decompressing "bzImage" format. With a little
+ * work, we can load those, too.
+ */
+static unsigned long load_kernel(int fd)
+{
+ Elf32_Ehdr hdr;
+
+ /* Read in the first few bytes. */
+ if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr))
+ err(1, "Reading kernel");
+
+ /* If it's an ELF file, it starts with "\177ELF" */
+ if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0)
+ return map_elf(fd, &hdr);
+
+ /* Otherwise we assume it's a bzImage, and try to load it. */
+ return load_bzimage(fd);
+}
+
+/*
+ * This is a trivial little helper to align pages. Andi Kleen hated it because
+ * it calls getpagesize() twice: "it's dumb code."
+ *
+ * Kernel guys get really het up about optimization, even when it's not
+ * necessary. I leave this code as a reaction against that.
+ */
+static inline unsigned long page_align(unsigned long addr)
+{
+ /* Add upwards and truncate downwards. */
+ return ((addr + getpagesize()-1) & ~(getpagesize()-1));
+}
+
+/*L:180
+ * An "initial ram disk" is a disk image loaded into memory along with the
+ * kernel which the kernel can use to boot from without needing any drivers.
+ * Most distributions now use this as standard: the initrd contains the code to
+ * load the appropriate driver modules for the current machine.
+ *
+ * Importantly, James Morris works for RedHat, and Fedora uses initrds for its
+ * kernels. He sent me this (and tells me when I break it).
+ */
+static unsigned long load_initrd(const char *name, unsigned long mem)
+{
+ int ifd;
+ struct stat st;
+ unsigned long len;
+
+ ifd = open_or_die(name, O_RDONLY);
+ /* fstat() is needed to get the file size. */
+ if (fstat(ifd, &st) < 0)
+ err(1, "fstat() on initrd '%s'", name);
+
+ /*
+ * We map the initrd at the top of memory, but mmap wants it to be
+ * page-aligned, so we round the size up for that.
+ */
+ len = page_align(st.st_size);
+ map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
+ /*
+ * Once a file is mapped, you can close the file descriptor. It's a
+ * little odd, but quite useful.
+ */
+ close(ifd);
+ verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len);
+
+ /* We return the initrd size. */
+ return len;
+}
+/*:*/
+
+/*
+ * Simple routine to roll all the commandline arguments together with spaces
+ * between them.
+ */
+static void concat(char *dst, char *args[])
+{
+ unsigned int i, len = 0;
+
+ for (i = 0; args[i]; i++) {
+ if (i) {
+ strcat(dst+len, " ");
+ len++;
+ }
+ strcpy(dst+len, args[i]);
+ len += strlen(args[i]);
+ }
+ /* In case it's empty. */
+ dst[len] = '\0';
+}
+
+/*L:185
+ * This is where we actually tell the kernel to initialize the Guest. We
+ * saw the arguments it expects when we looked at initialize() in lguest_user.c:
+ * the base of Guest "physical" memory, the top physical page to allow and the
+ * entry point for the Guest.
+ */
+static void tell_kernel(unsigned long start)
+{
+ unsigned long args[] = { LHREQ_INITIALIZE,
+ (unsigned long)guest_base,
+ guest_limit / getpagesize(), start };
+ verbose("Guest: %p - %p (%#lx)\n",
+ guest_base, guest_base + guest_limit, guest_limit);
+ lguest_fd = open_or_die("/dev/lguest", O_RDWR);
+ if (write(lguest_fd, args, sizeof(args)) < 0)
+ err(1, "Writing to /dev/lguest");
+}
+/*:*/
+
+/*L:200
+ * Device Handling.
+ *
+ * When the Guest gives us a buffer, it sends an array of addresses and sizes.
+ * We need to make sure it's not trying to reach into the Launcher itself, so
+ * we have a convenient routine which checks it and exits with an error message
+ * if something funny is going on:
+ */
+static void *_check_pointer(unsigned long addr, unsigned int size,
+ unsigned int line)
+{
+ /*
+ * Check if the requested address and size exceeds the allocated memory,
+ * or addr + size wraps around.
+ */
+ if ((addr + size) > guest_limit || (addr + size) < addr)
+ errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
+ /*
+ * We return a pointer for the caller's convenience, now we know it's
+ * safe to use.
+ */
+ return from_guest_phys(addr);
+}
+/* A macro which transparently hands the line number to the real function. */
+#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
+
+/*
+ * Each buffer in the virtqueues is actually a chain of descriptors. This
+ * function returns the next descriptor in the chain, or vq->vring.num if we're
+ * at the end.
+ */
+static unsigned next_desc(struct vring_desc *desc,
+ unsigned int i, unsigned int max)
+{
+ unsigned int next;
+
+ /* If this descriptor says it doesn't chain, we're done. */
+ if (!(desc[i].flags & VRING_DESC_F_NEXT))
+ return max;
+
+ /* Check they're not leading us off end of descriptors. */
+ next = desc[i].next;
+ /* Make sure compiler knows to grab that: we don't want it changing! */
+ wmb();
+
+ if (next >= max)
+ errx(1, "Desc next is %u", next);
+
+ return next;
+}
+
+/*
+ * This actually sends the interrupt for this virtqueue, if we've used a
+ * buffer.
+ */
+static void trigger_irq(struct virtqueue *vq)
+{
+ unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
+
+ /* Don't inform them if nothing used. */
+ if (!vq->pending_used)
+ return;
+ vq->pending_used = 0;
+
+ /* If they don't want an interrupt, don't send one... */
+ if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) {
+ /* ... unless they've asked us to force one on empty. */
+ if (!vq->dev->irq_on_empty
+ || lg_last_avail(vq) != vq->vring.avail->idx)
+ return;
+ }
+
+ /* Send the Guest an interrupt tell them we used something up. */
+ if (write(lguest_fd, buf, sizeof(buf)) != 0)
+ err(1, "Triggering irq %i", vq->config.irq);
+}
+
+/*
+ * This looks in the virtqueue for the first available buffer, and converts
+ * it to an iovec for convenient access. Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function waits if necessary, and returns the descriptor number found.
+ */
+static unsigned wait_for_vq_desc(struct virtqueue *vq,
+ struct iovec iov[],
+ unsigned int *out_num, unsigned int *in_num)
+{
+ unsigned int i, head, max;
+ struct vring_desc *desc;
+ u16 last_avail = lg_last_avail(vq);
+
+ /* There's nothing available? */
+ while (last_avail == vq->vring.avail->idx) {
+ u64 event;
+
+ /*
+ * Since we're about to sleep, now is a good time to tell the
+ * Guest about what we've used up to now.
+ */
+ trigger_irq(vq);
+
+ /* OK, now we need to know about added descriptors. */
+ vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+
+ /*
+ * They could have slipped one in as we were doing that: make
+ * sure it's written, then check again.
+ */
+ mb();
+ if (last_avail != vq->vring.avail->idx) {
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ break;
+ }
+
+ /* Nothing new? Wait for eventfd to tell us they refilled. */
+ if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event))
+ errx(1, "Event read failed?");
+
+ /* We don't need to be notified again. */
+ vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
+ }
+
+ /* Check it isn't doing very strange things with descriptor numbers. */
+ if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
+ errx(1, "Guest moved used index from %u to %u",
+ last_avail, vq->vring.avail->idx);
+
+ /*
+ * Grab the next descriptor number they're advertising, and increment
+ * the index we've seen.
+ */
+ head = vq->vring.avail->ring[last_avail % vq->vring.num];
+ lg_last_avail(vq)++;
+
+ /* If their number is silly, that's a fatal mistake. */
+ if (head >= vq->vring.num)
+ errx(1, "Guest says index %u is available", head);
+
+ /* When we start there are none of either input nor output. */
+ *out_num = *in_num = 0;
+
+ max = vq->vring.num;
+ desc = vq->vring.desc;
+ i = head;
+
+ /*
+ * If this is an indirect entry, then this buffer contains a descriptor
+ * table which we handle as if it's any normal descriptor chain.
+ */
+ if (desc[i].flags & VRING_DESC_F_INDIRECT) {
+ if (desc[i].len % sizeof(struct vring_desc))
+ errx(1, "Invalid size for indirect buffer table");
+
+ max = desc[i].len / sizeof(struct vring_desc);
+ desc = check_pointer(desc[i].addr, desc[i].len);
+ i = 0;
+ }
+
+ do {
+ /* Grab the first descriptor, and check it's OK. */
+ iov[*out_num + *in_num].iov_len = desc[i].len;
+ iov[*out_num + *in_num].iov_base
+ = check_pointer(desc[i].addr, desc[i].len);
+ /* If this is an input descriptor, increment that count. */
+ if (desc[i].flags & VRING_DESC_F_WRITE)
+ (*in_num)++;
+ else {
+ /*
+ * If it's an output descriptor, they're all supposed
+ * to come before any input descriptors.
+ */
+ if (*in_num)
+ errx(1, "Descriptor has out after in");
+ (*out_num)++;
+ }
+
+ /* If we've got too many, that implies a descriptor loop. */
+ if (*out_num + *in_num > max)
+ errx(1, "Looped descriptor");
+ } while ((i = next_desc(desc, i, max)) != max);
+
+ return head;
+}
+
+/*
+ * After we've used one of their buffers, we tell the Guest about it. Sometime
+ * later we'll want to send them an interrupt using trigger_irq(); note that
+ * wait_for_vq_desc() does that for us if it has to wait.
+ */
+static void add_used(struct virtqueue *vq, unsigned int head, int len)
+{
+ struct vring_used_elem *used;
+
+ /*
+ * The virtqueue contains a ring of used buffers. Get a pointer to the
+ * next entry in that used ring.
+ */
+ used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num];
+ used->id = head;
+ used->len = len;
+ /* Make sure buffer is written before we update index. */
+ wmb();
+ vq->vring.used->idx++;
+ vq->pending_used++;
+}
+
+/* And here's the combo meal deal. Supersize me! */
+static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
+{
+ add_used(vq, head, len);
+ trigger_irq(vq);
+}
+
+/*
+ * The Console
+ *
+ * We associate some data with the console for our exit hack.
+ */
+struct console_abort {
+ /* How many times have they hit ^C? */
+ int count;
+ /* When did they start? */
+ struct timeval start;
+};
+
+/* This is the routine which handles console input (ie. stdin). */
+static void console_input(struct virtqueue *vq)
+{
+ int len;
+ unsigned int head, in_num, out_num;
+ struct console_abort *abort = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
+
+ /* Make sure there's a descriptor available. */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
+ if (out_num)
+ errx(1, "Output buffers in console in queue?");
+
+ /* Read into it. This is where we usually wait. */
+ len = readv(STDIN_FILENO, iov, in_num);
+ if (len <= 0) {
+ /* Ran out of input? */
+ warnx("Failed to get console input, ignoring console.");
+ /*
+ * For simplicity, dying threads kill the whole Launcher. So
+ * just nap here.
+ */
+ for (;;)
+ pause();
+ }
+
+ /* Tell the Guest we used a buffer. */
+ add_used_and_trigger(vq, head, len);
+
+ /*
+ * Three ^C within one second? Exit.
+ *
+ * This is such a hack, but works surprisingly well. Each ^C has to
+ * be in a buffer by itself, so they can't be too fast. But we check
+ * that we get three within about a second, so they can't be too
+ * slow.
+ */
+ if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
+ abort->count = 0;
+ return;
+ }
+
+ abort->count++;
+ if (abort->count == 1)
+ gettimeofday(&abort->start, NULL);
+ else if (abort->count == 3) {
+ struct timeval now;
+ gettimeofday(&now, NULL);
+ /* Kill all Launcher processes with SIGINT, like normal ^C */
+ if (now.tv_sec <= abort->start.tv_sec+1)
+ kill(0, SIGINT);
+ abort->count = 0;
+ }
+}
+
+/* This is the routine which handles console output (ie. stdout). */
+static void console_output(struct virtqueue *vq)
+{
+ unsigned int head, out, in;
+ struct iovec iov[vq->vring.num];
+
+ /* We usually wait in here, for the Guest to give us something. */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in console output queue?");
+
+ /* writev can return a partial write, so we loop here. */
+ while (!iov_empty(iov, out)) {
+ int len = writev(STDOUT_FILENO, iov, out);
+ if (len <= 0)
+ err(1, "Write to stdout gave %i", len);
+ iov_consume(iov, out, len);
+ }
+
+ /*
+ * We're finished with that buffer: if we're going to sleep,
+ * wait_for_vq_desc() will prod the Guest with an interrupt.
+ */
+ add_used(vq, head, 0);
+}
+
+/*
+ * The Network
+ *
+ * Handling output for network is also simple: we get all the output buffers
+ * and write them to /dev/net/tun.
+ */
+struct net_info {
+ int tunfd;
+};
+
+static void net_output(struct virtqueue *vq)
+{
+ struct net_info *net_info = vq->dev->priv;
+ unsigned int head, out, in;
+ struct iovec iov[vq->vring.num];
+
+ /* We usually wait in here for the Guest to give us a packet. */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (in)
+ errx(1, "Input buffers in net output queue?");
+ /*
+ * Send the whole thing through to /dev/net/tun. It expects the exact
+ * same format: what a coincidence!
+ */
+ if (writev(net_info->tunfd, iov, out) < 0)
+ errx(1, "Write to tun failed?");
+
+ /*
+ * Done with that one; wait_for_vq_desc() will send the interrupt if
+ * all packets are processed.
+ */
+ add_used(vq, head, 0);
+}
+
+/*
+ * Handling network input is a bit trickier, because I've tried to optimize it.
+ *
+ * First we have a helper routine which tells is if from this file descriptor
+ * (ie. the /dev/net/tun device) will block:
+ */
+static bool will_block(int fd)
+{
+ fd_set fdset;
+ struct timeval zero = { 0, 0 };
+ FD_ZERO(&fdset);
+ FD_SET(fd, &fdset);
+ return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
+}
+
+/*
+ * This handles packets coming in from the tun device to our Guest. Like all
+ * service routines, it gets called again as soon as it returns, so you don't
+ * see a while(1) loop here.
+ */
+static void net_input(struct virtqueue *vq)
+{
+ int len;
+ unsigned int head, out, in;
+ struct iovec iov[vq->vring.num];
+ struct net_info *net_info = vq->dev->priv;
+
+ /*
+ * Get a descriptor to write an incoming packet into. This will also
+ * send an interrupt if they're out of descriptors.
+ */
+ head = wait_for_vq_desc(vq, iov, &out, &in);
+ if (out)
+ errx(1, "Output buffers in net input queue?");
+
+ /*
+ * If it looks like we'll block reading from the tun device, send them
+ * an interrupt.
+ */
+ if (vq->pending_used && will_block(net_info->tunfd))
+ trigger_irq(vq);
+
+ /*
+ * Read in the packet. This is where we normally wait (when there's no
+ * incoming network traffic).
+ */
+ len = readv(net_info->tunfd, iov, in);
+ if (len <= 0)
+ err(1, "Failed to read from tun.");
+
+ /*
+ * Mark that packet buffer as used, but don't interrupt here. We want
+ * to wait until we've done as much work as we can.
+ */
+ add_used(vq, head, len);
+}
+/*:*/
+
+/* This is the helper to create threads: run the service routine in a loop. */
+static int do_thread(void *_vq)
+{
+ struct virtqueue *vq = _vq;
+
+ for (;;)
+ vq->service(vq);
+ return 0;
+}
+
+/*
+ * When a child dies, we kill our entire process group with SIGTERM. This
+ * also has the side effect that the shell restores the console for us!
+ */
+static void kill_launcher(int signal)
+{
+ kill(0, SIGTERM);
+}
+
+static void reset_device(struct device *dev)
+{
+ struct virtqueue *vq;
+
+ verbose("Resetting device %s\n", dev->name);
+
+ /* Clear any features they've acked. */
+ memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len);
+
+ /* We're going to be explicitly killing threads, so ignore them. */
+ signal(SIGCHLD, SIG_IGN);
+
+ /* Zero out the virtqueues, get rid of their threads */
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->thread != (pid_t)-1) {
+ kill(vq->thread, SIGTERM);
+ waitpid(vq->thread, NULL, 0);
+ vq->thread = (pid_t)-1;
+ }
+ memset(vq->vring.desc, 0,
+ vring_size(vq->config.num, LGUEST_VRING_ALIGN));
+ lg_last_avail(vq) = 0;
+ }
+ dev->running = false;
+
+ /* Now we care if threads die. */
+ signal(SIGCHLD, (void *)kill_launcher);
+}
+
+/*L:216
+ * This actually creates the thread which services the virtqueue for a device.
+ */
+static void create_thread(struct virtqueue *vq)
+{
+ /*
+ * Create stack for thread. Since the stack grows upwards, we point
+ * the stack pointer to the end of this region.
+ */
+ char *stack = malloc(32768);
+ unsigned long args[] = { LHREQ_EVENTFD,
+ vq->config.pfn*getpagesize(), 0 };
+
+ /* Create a zero-initialized eventfd. */
+ vq->eventfd = eventfd(0, 0);
+ if (vq->eventfd < 0)
+ err(1, "Creating eventfd");
+ args[2] = vq->eventfd;
+
+ /*
+ * Attach an eventfd to this virtqueue: it will go off when the Guest
+ * does an LHCALL_NOTIFY for this vq.
+ */
+ if (write(lguest_fd, &args, sizeof(args)) != 0)
+ err(1, "Attaching eventfd");
+
+ /*
+ * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so
+ * we get a signal if it dies.
+ */
+ vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
+ if (vq->thread == (pid_t)-1)
+ err(1, "Creating clone");
+
+ /* We close our local copy now the child has it. */
+ close(vq->eventfd);
+}
+
+static bool accepted_feature(struct device *dev, unsigned int bit)
+{
+ const u8 *features = get_feature_bits(dev) + dev->feature_len;
+
+ if (dev->feature_len < bit / CHAR_BIT)
+ return false;
+ return features[bit / CHAR_BIT] & (1 << (bit % CHAR_BIT));
+}
+
+static void start_device(struct device *dev)
+{
+ unsigned int i;
+ struct virtqueue *vq;
+
+ verbose("Device %s OK: offered", dev->name);
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)[i]);
+ verbose(", accepted");
+ for (i = 0; i < dev->feature_len; i++)
+ verbose(" %02x", get_feature_bits(dev)
+ [dev->feature_len+i]);
+
+ dev->irq_on_empty = accepted_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
+
+ for (vq = dev->vq; vq; vq = vq->next) {
+ if (vq->service)
+ create_thread(vq);
+ }
+ dev->running = true;
+}
+
+static void cleanup_devices(void)
+{
+ struct device *dev;
+
+ for (dev = devices.dev; dev; dev = dev->next)
+ reset_device(dev);
+
+ /* If we saved off the original terminal settings, restore them now. */
+ if (orig_term.c_lflag & (ISIG|ICANON|ECHO))
+ tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
+}
+
+/* When the Guest tells us they updated the status field, we handle it. */
+static void update_device_status(struct device *dev)
+{
+ /* A zero status is a reset, otherwise it's a set of flags. */
+ if (dev->desc->status == 0)
+ reset_device(dev);
+ else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
+ warnx("Device %s configuration FAILED", dev->name);
+ if (dev->running)
+ reset_device(dev);
+ } else if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
+ if (!dev->running)
+ start_device(dev);
+ }
+}
+
+/*L:215
+ * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In
+ * particular, it's used to notify us of device status changes during boot.
+ */
+static void handle_output(unsigned long addr)
+{
+ struct device *i;
+
+ /* Check each device. */
+ for (i = devices.dev; i; i = i->next) {
+ struct virtqueue *vq;
+
+ /*
+ * Notifications to device descriptors mean they updated the
+ * device status.
+ */
+ if (from_guest_phys(addr) == i->desc) {
+ update_device_status(i);
+ return;
+ }
+
+ /*
+ * Devices *can* be used before status is set to DRIVER_OK.
+ * The original plan was that they would never do this: they
+ * would always finish setting up their status bits before
+ * actually touching the virtqueues. In practice, we allowed
+ * them to, and they do (eg. the disk probes for partition
+ * tables as part of initialization).
+ *
+ * If we see this, we start the device: once it's running, we
+ * expect the device to catch all the notifications.
+ */
+ for (vq = i->vq; vq; vq = vq->next) {
+ if (addr != vq->config.pfn*getpagesize())
+ continue;
+ if (i->running)
+ errx(1, "Notification on running %s", i->name);
+ /* This just calls create_thread() for each virtqueue */
+ start_device(i);
+ return;
+ }
+ }
+
+ /*
+ * Early console write is done using notify on a nul-terminated string
+ * in Guest memory. It's also great for hacking debugging messages
+ * into a Guest.
+ */
+ if (addr >= guest_limit)
+ errx(1, "Bad NOTIFY %#lx", addr);
+
+ write(STDOUT_FILENO, from_guest_phys(addr),
+ strnlen(from_guest_phys(addr), guest_limit - addr));
+}
+
+/*L:190
+ * Device Setup
+ *
+ * All devices need a descriptor so the Guest knows it exists, and a "struct
+ * device" so the Launcher can keep track of it. We have common helper
+ * routines to allocate and manage them.
+ */
+
+/*
+ * The layout of the device page is a "struct lguest_device_desc" followed by a
+ * number of virtqueue descriptors, then two sets of feature bits, then an
+ * array of configuration bytes. This routine returns the configuration
+ * pointer.
+ */
+static u8 *device_config(const struct device *dev)
+{
+ return (void *)(dev->desc + 1)
+ + dev->num_vq * sizeof(struct lguest_vqconfig)
+ + dev->feature_len * 2;
+}
+
+/*
+ * This routine allocates a new "struct lguest_device_desc" from descriptor
+ * table page just above the Guest's normal memory. It returns a pointer to
+ * that descriptor.
+ */
+static struct lguest_device_desc *new_dev_desc(u16 type)
+{
+ struct lguest_device_desc d = { .type = type };
+ void *p;
+
+ /* Figure out where the next device config is, based on the last one. */
+ if (devices.lastdev)
+ p = device_config(devices.lastdev)
+ + devices.lastdev->desc->config_len;
+ else
+ p = devices.descpage;
+
+ /* We only have one page for all the descriptors. */
+ if (p + sizeof(d) > (void *)devices.descpage + getpagesize())
+ errx(1, "Too many devices");
+
+ /* p might not be aligned, so we memcpy in. */
+ return memcpy(p, &d, sizeof(d));
+}
+
+/*
+ * Each device descriptor is followed by the description of its virtqueues. We
+ * specify how many descriptors the virtqueue is to have.
+ */
+static void add_virtqueue(struct device *dev, unsigned int num_descs,
+ void (*service)(struct virtqueue *))
+{
+ unsigned int pages;
+ struct virtqueue **i, *vq = malloc(sizeof(*vq));
+ void *p;
+
+ /* First we need some memory for this virtqueue. */
+ pages = (vring_size(num_descs, LGUEST_VRING_ALIGN) + getpagesize() - 1)
+ / getpagesize();
+ p = get_pages(pages);
+
+ /* Initialize the virtqueue */
+ vq->next = NULL;
+ vq->last_avail_idx = 0;
+ vq->dev = dev;
+
+ /*
+ * This is the routine the service thread will run, and its Process ID
+ * once it's running.
+ */
+ vq->service = service;
+ vq->thread = (pid_t)-1;
+
+ /* Initialize the configuration. */
+ vq->config.num = num_descs;
+ vq->config.irq = devices.next_irq++;
+ vq->config.pfn = to_guest_phys(p) / getpagesize();
+
+ /* Initialize the vring. */
+ vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN);
+
+ /*
+ * Append virtqueue to this device's descriptor. We use
+ * device_config() to get the end of the device's current virtqueues;
+ * we check that we haven't added any config or feature information
+ * yet, otherwise we'd be overwriting them.
+ */
+ assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
+ memcpy(device_config(dev), &vq->config, sizeof(vq->config));
+ dev->num_vq++;
+ dev->desc->num_vq++;
+
+ verbose("Virtqueue page %#lx\n", to_guest_phys(p));
+
+ /*
+ * Add to tail of list, so dev->vq is first vq, dev->vq->next is
+ * second.
+ */
+ for (i = &dev->vq; *i; i = &(*i)->next);
+ *i = vq;
+}
+
+/*
+ * The first half of the feature bitmask is for us to advertise features. The
+ * second half is for the Guest to accept features.
+ */
+static void add_feature(struct device *dev, unsigned bit)
+{
+ u8 *features = get_feature_bits(dev);
+
+ /* We can't extend the feature bits once we've added config bytes */
+ if (dev->desc->feature_len <= bit / CHAR_BIT) {
+ assert(dev->desc->config_len == 0);
+ dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1;
+ }
+
+ features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
+}
+
+/*
+ * This routine sets the configuration fields for an existing device's
+ * descriptor. It only works for the last device, but that's OK because that's
+ * how we use it.
+ */
+static void set_config(struct device *dev, unsigned len, const void *conf)
+{
+ /* Check we haven't overflowed our single page. */
+ if (device_config(dev) + len > devices.descpage + getpagesize())
+ errx(1, "Too many devices");
+
+ /* Copy in the config information, and store the length. */
+ memcpy(device_config(dev), conf, len);
+ dev->desc->config_len = len;
+
+ /* Size must fit in config_len field (8 bits)! */
+ assert(dev->desc->config_len == len);
+}
+
+/*
+ * This routine does all the creation and setup of a new device, including
+ * calling new_dev_desc() to allocate the descriptor and device memory. We
+ * don't actually start the service threads until later.
+ *
+ * See what I mean about userspace being boring?
+ */
+static struct device *new_device(const char *name, u16 type)
+{
+ struct device *dev = malloc(sizeof(*dev));
+
+ /* Now we populate the fields one at a time. */
+ dev->desc = new_dev_desc(type);
+ dev->name = name;
+ dev->vq = NULL;
+ dev->feature_len = 0;
+ dev->num_vq = 0;
+ dev->running = false;
+
+ /*
+ * Append to device list. Prepending to a single-linked list is
+ * easier, but the user expects the devices to be arranged on the bus
+ * in command-line order. The first network device on the command line
+ * is eth0, the first block device /dev/vda, etc.
+ */
+ if (devices.lastdev)
+ devices.lastdev->next = dev;
+ else
+ devices.dev = dev;
+ devices.lastdev = dev;
+
+ return dev;
+}
+
+/*
+ * Our first setup routine is the console. It's a fairly simple device, but
+ * UNIX tty handling makes it uglier than it could be.
+ */
+static void setup_console(void)
+{
+ struct device *dev;
+
+ /* If we can save the initial standard input settings... */
+ if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
+ struct termios term = orig_term;
+ /*
+ * Then we turn off echo, line buffering and ^C etc: We want a
+ * raw input stream to the Guest.
+ */
+ term.c_lflag &= ~(ISIG|ICANON|ECHO);
+ tcsetattr(STDIN_FILENO, TCSANOW, &term);
+ }
+
+ dev = new_device("console", VIRTIO_ID_CONSOLE);
+
+ /* We store the console state in dev->priv, and initialize it. */
+ dev->priv = malloc(sizeof(struct console_abort));
+ ((struct console_abort *)dev->priv)->count = 0;
+
+ /*
+ * The console needs two virtqueues: the input then the output. When
+ * they put something the input queue, we make sure we're listening to
+ * stdin. When they put something in the output queue, we write it to
+ * stdout.
+ */
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
+
+ verbose("device %u: console\n", ++devices.device_num);
+}
+/*:*/
+
+/*M:010
+ * Inter-guest networking is an interesting area. Simplest is to have a
+ * --sharenet=<name> option which opens or creates a named pipe. This can be
+ * used to send packets to another guest in a 1:1 manner.
+ *
+ * More sopisticated is to use one of the tools developed for project like UML
+ * to do networking.
+ *
+ * Faster is to do virtio bonding in kernel. Doing this 1:1 would be
+ * completely generic ("here's my vring, attach to your vring") and would work
+ * for any traffic. Of course, namespace and permissions issues need to be
+ * dealt with. A more sophisticated "multi-channel" virtio_net.c could hide
+ * multiple inter-guest channels behind one interface, although it would
+ * require some manner of hotplugging new virtio channels.
+ *
+ * Finally, we could implement a virtio network switch in the kernel.
+:*/
+
+static u32 str2ip(const char *ipaddr)
+{
+ unsigned int b[4];
+
+ if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
+ errx(1, "Failed to parse IP address '%s'", ipaddr);
+ return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
+}
+
+static void str2mac(const char *macaddr, unsigned char mac[6])
+{
+ unsigned int m[6];
+ if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x",
+ &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
+ errx(1, "Failed to parse mac address '%s'", macaddr);
+ mac[0] = m[0];
+ mac[1] = m[1];
+ mac[2] = m[2];
+ mac[3] = m[3];
+ mac[4] = m[4];
+ mac[5] = m[5];
+}
+
+/*
+ * This code is "adapted" from libbridge: it attaches the Host end of the
+ * network device to the bridge device specified by the command line.
+ *
+ * This is yet another James Morris contribution (I'm an IP-level guy, so I
+ * dislike bridging), and I just try not to break it.
+ */
+static void add_to_bridge(int fd, const char *if_name, const char *br_name)
+{
+ int ifidx;
+ struct ifreq ifr;
+
+ if (!*br_name)
+ errx(1, "must specify bridge name");
+
+ ifidx = if_nametoindex(if_name);
+ if (!ifidx)
+ errx(1, "interface %s does not exist!", if_name);
+
+ strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
+ ifr.ifr_name[IFNAMSIZ-1] = '\0';
+ ifr.ifr_ifindex = ifidx;
+ if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
+ err(1, "can't add %s to bridge %s", if_name, br_name);
+}
+
+/*
+ * This sets up the Host end of the network device with an IP address, brings
+ * it up so packets will flow, the copies the MAC address into the hwaddr
+ * pointer.
+ */
+static void configure_device(int fd, const char *tapif, u32 ipaddr)
+{
+ struct ifreq ifr;
+ struct sockaddr_in sin;
+
+ memset(&ifr, 0, sizeof(ifr));
+ strcpy(ifr.ifr_name, tapif);
+
+ /* Don't read these incantations. Just cut & paste them like I did! */
+ sin.sin_family = AF_INET;
+ sin.sin_addr.s_addr = htonl(ipaddr);
+ memcpy(&ifr.ifr_addr, &sin, sizeof(sin));
+ if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
+ err(1, "Setting %s interface address", tapif);
+ ifr.ifr_flags = IFF_UP;
+ if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
+ err(1, "Bringing interface %s up", tapif);
+}
+
+static int get_tun_device(char tapif[IFNAMSIZ])
+{
+ struct ifreq ifr;
+ int netfd;
+
+ /* Start with this zeroed. Messy but sure. */
+ memset(&ifr, 0, sizeof(ifr));
+
+ /*
+ * We open the /dev/net/tun device and tell it we want a tap device. A
+ * tap device is like a tun device, only somehow different. To tell
+ * the truth, I completely blundered my way through this code, but it
+ * works now!
+ */
+ netfd = open_or_die("/dev/net/tun", O_RDWR);
+ ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+ strcpy(ifr.ifr_name, "tap%d");
+ if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
+ err(1, "configuring /dev/net/tun");
+
+ if (ioctl(netfd, TUNSETOFFLOAD,
+ TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
+ err(1, "Could not set features for tun device");
+
+ /*
+ * We don't need checksums calculated for packets coming in this
+ * device: trust us!
+ */
+ ioctl(netfd, TUNSETNOCSUM, 1);
+
+ memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
+ return netfd;
+}
+
+/*L:195
+ * Our network is a Host<->Guest network. This can either use bridging or
+ * routing, but the principle is the same: it uses the "tun" device to inject
+ * packets into the Host as if they came in from a normal network card. We
+ * just shunt packets between the Guest and the tun device.
+ */
+static void setup_tun_net(char *arg)
+{
+ struct device *dev;
+ struct net_info *net_info = malloc(sizeof(*net_info));
+ int ipfd;
+ u32 ip = INADDR_ANY;
+ bool bridging = false;
+ char tapif[IFNAMSIZ], *p;
+ struct virtio_net_config conf;
+
+ net_info->tunfd = get_tun_device(tapif);
+
+ /* First we create a new network device. */
+ dev = new_device("net", VIRTIO_ID_NET);
+ dev->priv = net_info;
+
+ /* Network devices need a recv and a send queue, just like console. */
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
+ add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
+
+ /*
+ * We need a socket to perform the magic network ioctls to bring up the
+ * tap interface, connect to the bridge etc. Any socket will do!
+ */
+ ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+ if (ipfd < 0)
+ err(1, "opening IP socket");
+
+ /* If the command line was --tunnet=bridge:<name> do bridging. */
+ if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
+ arg += strlen(BRIDGE_PFX);
+ bridging = true;
+ }
+
+ /* A mac address may follow the bridge name or IP address */
+ p = strchr(arg, ':');
+ if (p) {
+ str2mac(p+1, conf.mac);
+ add_feature(dev, VIRTIO_NET_F_MAC);
+ *p = '\0';
+ }
+
+ /* arg is now either an IP address or a bridge name */
+ if (bridging)
+ add_to_bridge(ipfd, tapif, arg);
+ else
+ ip = str2ip(arg);
+
+ /* Set up the tun device. */
+ configure_device(ipfd, tapif, ip);
+
+ add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
+ /* Expect Guest to handle everything except UFO */
+ add_feature(dev, VIRTIO_NET_F_CSUM);
+ add_feature(dev, VIRTIO_NET_F_GUEST_CSUM);
+ add_feature(dev, VIRTIO_NET_F_GUEST_TSO4);
+ add_feature(dev, VIRTIO_NET_F_GUEST_TSO6);
+ add_feature(dev, VIRTIO_NET_F_GUEST_ECN);
+ add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
+ add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
+ add_feature(dev, VIRTIO_NET_F_HOST_ECN);
+ /* We handle indirect ring entries */
+ add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
+ set_config(dev, sizeof(conf), &conf);
+
+ /* We don't need the socket any more; setup is done. */
+ close(ipfd);
+
+ devices.device_num++;
+
+ if (bridging)
+ verbose("device %u: tun %s attached to bridge: %s\n",
+ devices.device_num, tapif, arg);
+ else
+ verbose("device %u: tun %s: %s\n",
+ devices.device_num, tapif, arg);
+}
+/*:*/
+
+/* This hangs off device->priv. */
+struct vblk_info {
+ /* The size of the file. */
+ off64_t len;
+
+ /* The file descriptor for the file. */
+ int fd;
+
+};
+
+/*L:210
+ * The Disk
+ *
+ * The disk only has one virtqueue, so it only has one thread. It is really
+ * simple: the Guest asks for a block number and we read or write that position
+ * in the file.
+ *
+ * Before we serviced each virtqueue in a separate thread, that was unacceptably
+ * slow: the Guest waits until the read is finished before running anything
+ * else, even if it could have been doing useful work.
+ *
+ * We could have used async I/O, except it's reputed to suck so hard that
+ * characters actually go missing from your code when you try to use it.
+ */
+static void blk_request(struct virtqueue *vq)
+{
+ struct vblk_info *vblk = vq->dev->priv;
+ unsigned int head, out_num, in_num, wlen;
+ int ret;
+ u8 *in;
+ struct virtio_blk_outhdr *out;
+ struct iovec iov[vq->vring.num];
+ off64_t off;
+
+ /*
+ * Get the next request, where we normally wait. It triggers the
+ * interrupt to acknowledge previously serviced requests (if any).
+ */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
+
+ /*
+ * Every block request should contain at least one output buffer
+ * (detailing the location on disk and the type of request) and one
+ * input buffer (to hold the result).
+ */
+ if (out_num == 0 || in_num == 0)
+ errx(1, "Bad virtblk cmd %u out=%u in=%u",
+ head, out_num, in_num);
+
+ out = convert(&iov[0], struct virtio_blk_outhdr);
+ in = convert(&iov[out_num+in_num-1], u8);
+ /*
+ * For historical reasons, block operations are expressed in 512 byte
+ * "sectors".
+ */
+ off = out->sector * 512;
+
+ /*
+ * In general the virtio block driver is allowed to try SCSI commands.
+ * It'd be nice if we supported eject, for example, but we don't.
+ */
+ if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
+ fprintf(stderr, "Scsi commands unsupported\n");
+ *in = VIRTIO_BLK_S_UNSUPP;
+ wlen = sizeof(*in);
+ } else if (out->type & VIRTIO_BLK_T_OUT) {
+ /*
+ * Write
+ *
+ * Move to the right location in the block file. This can fail
+ * if they try to write past end.
+ */
+ if (lseek64(vblk->fd, off, SEEK_SET) != off)
+ err(1, "Bad seek to sector %llu", out->sector);
+
+ ret = writev(vblk->fd, iov+1, out_num-1);
+ verbose("WRITE to sector %llu: %i\n", out->sector, ret);
+
+ /*
+ * Grr... Now we know how long the descriptor they sent was, we
+ * make sure they didn't try to write over the end of the block
+ * file (possibly extending it).
+ */
+ if (ret > 0 && off + ret > vblk->len) {
+ /* Trim it back to the correct length */
+ ftruncate64(vblk->fd, vblk->len);
+ /* Die, bad Guest, die. */
+ errx(1, "Write past end %llu+%u", off, ret);
+ }
+
+ wlen = sizeof(*in);
+ *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
+ } else if (out->type & VIRTIO_BLK_T_FLUSH) {
+ /* Flush */
+ ret = fdatasync(vblk->fd);
+ verbose("FLUSH fdatasync: %i\n", ret);
+ wlen = sizeof(*in);
+ *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
+ } else {
+ /*
+ * Read
+ *
+ * Move to the right location in the block file. This can fail
+ * if they try to read past end.
+ */
+ if (lseek64(vblk->fd, off, SEEK_SET) != off)
+ err(1, "Bad seek to sector %llu", out->sector);
+
+ ret = readv(vblk->fd, iov+1, in_num-1);
+ verbose("READ from sector %llu: %i\n", out->sector, ret);
+ if (ret >= 0) {
+ wlen = sizeof(*in) + ret;
+ *in = VIRTIO_BLK_S_OK;
+ } else {
+ wlen = sizeof(*in);
+ *in = VIRTIO_BLK_S_IOERR;
+ }
+ }
+
+ /* Finished that request. */
+ add_used(vq, head, wlen);
+}
+
+/*L:198 This actually sets up a virtual block device. */
+static void setup_block_file(const char *filename)
+{
+ struct device *dev;
+ struct vblk_info *vblk;
+ struct virtio_blk_config conf;
+
+ /* Creat the device. */
+ dev = new_device("block", VIRTIO_ID_BLOCK);
+
+ /* The device has one virtqueue, where the Guest places requests. */
+ add_virtqueue(dev, VIRTQUEUE_NUM, blk_request);
+
+ /* Allocate the room for our own bookkeeping */
+ vblk = dev->priv = malloc(sizeof(*vblk));
+
+ /* First we open the file and store the length. */
+ vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE);
+ vblk->len = lseek64(vblk->fd, 0, SEEK_END);
+
+ /* We support FLUSH. */
+ add_feature(dev, VIRTIO_BLK_F_FLUSH);
+
+ /* Tell Guest how many sectors this device has. */
+ conf.capacity = cpu_to_le64(vblk->len / 512);
+
+ /*
+ * Tell Guest not to put in too many descriptors at once: two are used
+ * for the in and out elements.
+ */
+ add_feature(dev, VIRTIO_BLK_F_SEG_MAX);
+ conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2);
+
+ /* Don't try to put whole struct: we have 8 bit limit. */
+ set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf);
+
+ verbose("device %u: virtblock %llu sectors\n",
+ ++devices.device_num, le64_to_cpu(conf.capacity));
+}
+
+/*L:211
+ * Our random number generator device reads from /dev/random into the Guest's
+ * input buffers. The usual case is that the Guest doesn't want random numbers
+ * and so has no buffers although /dev/random is still readable, whereas
+ * console is the reverse.
+ *
+ * The same logic applies, however.
+ */
+struct rng_info {
+ int rfd;
+};
+
+static void rng_input(struct virtqueue *vq)
+{
+ int len;
+ unsigned int head, in_num, out_num, totlen = 0;
+ struct rng_info *rng_info = vq->dev->priv;
+ struct iovec iov[vq->vring.num];
+
+ /* First we need a buffer from the Guests's virtqueue. */
+ head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
+ if (out_num)
+ errx(1, "Output buffers in rng?");
+
+ /*
+ * Just like the console write, we loop to cover the whole iovec.
+ * In this case, short reads actually happen quite a bit.
+ */
+ while (!iov_empty(iov, in_num)) {
+ len = readv(rng_info->rfd, iov, in_num);
+ if (len <= 0)
+ err(1, "Read from /dev/random gave %i", len);
+ iov_consume(iov, in_num, len);
+ totlen += len;
+ }
+
+ /* Tell the Guest about the new input. */
+ add_used(vq, head, totlen);
+}
+
+/*L:199
+ * This creates a "hardware" random number device for the Guest.
+ */
+static void setup_rng(void)
+{
+ struct device *dev;
+ struct rng_info *rng_info = malloc(sizeof(*rng_info));
+
+ /* Our device's privat info simply contains the /dev/random fd. */
+ rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
+
+ /* Create the new device. */
+ dev = new_device("rng", VIRTIO_ID_RNG);
+ dev->priv = rng_info;
+
+ /* The device has one virtqueue, where the Guest places inbufs. */
+ add_virtqueue(dev, VIRTQUEUE_NUM, rng_input);
+
+ verbose("device %u: rng\n", devices.device_num++);
+}
+/* That's the end of device setup. */
+
+/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */
+static void __attribute__((noreturn)) restart_guest(void)
+{
+ unsigned int i;
+
+ /*
+ * Since we don't track all open fds, we simply close everything beyond
+ * stderr.
+ */
+ for (i = 3; i < FD_SETSIZE; i++)
+ close(i);
+
+ /* Reset all the devices (kills all threads). */
+ cleanup_devices();
+
+ execv(main_args[0], main_args);
+ err(1, "Could not exec %s", main_args[0]);
+}
+
+/*L:220
+ * Finally we reach the core of the Launcher which runs the Guest, serves
+ * its input and output, and finally, lays it to rest.
+ */
+static void __attribute__((noreturn)) run_guest(void)
+{
+ for (;;) {
+ unsigned long notify_addr;
+ int readval;
+
+ /* We read from the /dev/lguest device to run the Guest. */
+ readval = pread(lguest_fd, ¬ify_addr,
+ sizeof(notify_addr), cpu_id);
+
+ /* One unsigned long means the Guest did HCALL_NOTIFY */
+ if (readval == sizeof(notify_addr)) {
+ verbose("Notify on address %#lx\n", notify_addr);
+ handle_output(notify_addr);
+ /* ENOENT means the Guest died. Reading tells us why. */
+ } else if (errno == ENOENT) {
+ char reason[1024] = { 0 };
+ pread(lguest_fd, reason, sizeof(reason)-1, cpu_id);
+ errx(1, "%s", reason);
+ /* ERESTART means that we need to reboot the guest */
+ } else if (errno == ERESTART) {
+ restart_guest();
+ /* Anything else means a bug or incompatible change. */
+ } else
+ err(1, "Running guest failed");
+ }
+}
+/*L:240
+ * This is the end of the Launcher. The good news: we are over halfway
+ * through! The bad news: the most fiendish part of the code still lies ahead
+ * of us.
+ *
+ * Are you ready? Take a deep breath and join me in the core of the Host, in
+ * "make Host".
+:*/
+
+static struct option opts[] = {
+ { "verbose", 0, NULL, 'v' },
+ { "tunnet", 1, NULL, 't' },
+ { "block", 1, NULL, 'b' },
+ { "rng", 0, NULL, 'r' },
+ { "initrd", 1, NULL, 'i' },
+ { "username", 1, NULL, 'u' },
+ { "chroot", 1, NULL, 'c' },
+ { NULL },
+};
+static void usage(void)
+{
+ errx(1, "Usage: lguest [--verbose] "
+ "[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n"
+ "|--block=<filename>|--initrd=<filename>]...\n"
+ "<mem-in-mb> vmlinux [args...]");
+}
+
+/*L:105 The main routine is where the real work begins: */
+int main(int argc, char *argv[])
+{
+ /* Memory, code startpoint and size of the (optional) initrd. */
+ unsigned long mem = 0, start, initrd_size = 0;
+ /* Two temporaries. */
+ int i, c;
+ /* The boot information for the Guest. */
+ struct boot_params *boot;
+ /* If they specify an initrd file to load. */
+ const char *initrd_name = NULL;
+
+ /* Password structure for initgroups/setres[gu]id */
+ struct passwd *user_details = NULL;
+
+ /* Directory to chroot to */
+ char *chroot_path = NULL;
+
+ /* Save the args: we "reboot" by execing ourselves again. */
+ main_args = argv;
+
+ /*
+ * First we initialize the device list. We keep a pointer to the last
+ * device, and the next interrupt number to use for devices (1:
+ * remember that 0 is used by the timer).
+ */
+ devices.lastdev = NULL;
+ devices.next_irq = 1;
+
+ /* We're CPU 0. In fact, that's the only CPU possible right now. */
+ cpu_id = 0;
+
+ /*
+ * We need to know how much memory so we can set up the device
+ * descriptor and memory pages for the devices as we parse the command
+ * line. So we quickly look through the arguments to find the amount
+ * of memory now.
+ */
+ for (i = 1; i < argc; i++) {
+ if (argv[i][0] != '-') {
+ mem = atoi(argv[i]) * 1024 * 1024;
+ /*
+ * We start by mapping anonymous pages over all of
+ * guest-physical memory range. This fills it with 0,
+ * and ensures that the Guest won't be killed when it
+ * tries to access it.
+ */
+ guest_base = map_zeroed_pages(mem / getpagesize()
+ + DEVICE_PAGES);
+ guest_limit = mem;
+ guest_max = mem + DEVICE_PAGES*getpagesize();
+ devices.descpage = get_pages(1);
+ break;
+ }
+ }
+
+ /* The options are fairly straight-forward */
+ while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) {
+ switch (c) {
+ case 'v':
+ verbose = true;
+ break;
+ case 't':
+ setup_tun_net(optarg);
+ break;
+ case 'b':
+ setup_block_file(optarg);
+ break;
+ case 'r':
+ setup_rng();
+ break;
+ case 'i':
+ initrd_name = optarg;
+ break;
+ case 'u':
+ user_details = getpwnam(optarg);
+ if (!user_details)
+ err(1, "getpwnam failed, incorrect username?");
+ break;
+ case 'c':
+ chroot_path = optarg;
+ break;
+ default:
+ warnx("Unknown argument %s", argv[optind]);
+ usage();
+ }
+ }
+ /*
+ * After the other arguments we expect memory and kernel image name,
+ * followed by command line arguments for the kernel.
+ */
+ if (optind + 2 > argc)
+ usage();
+
+ verbose("Guest base is at %p\n", guest_base);
+
+ /* We always have a console device */
+ setup_console();
+
+ /* Now we load the kernel */
+ start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
+
+ /* Boot information is stashed at physical address 0 */
+ boot = from_guest_phys(0);
+
+ /* Map the initrd image if requested (at top of physical memory) */
+ if (initrd_name) {
+ initrd_size = load_initrd(initrd_name, mem);
+ /*
+ * These are the location in the Linux boot header where the
+ * start and size of the initrd are expected to be found.
+ */
+ boot->hdr.ramdisk_image = mem - initrd_size;
+ boot->hdr.ramdisk_size = initrd_size;
+ /* The bootloader type 0xFF means "unknown"; that's OK. */
+ boot->hdr.type_of_loader = 0xFF;
+ }
+
+ /*
+ * The Linux boot header contains an "E820" memory map: ours is a
+ * simple, single region.
+ */
+ boot->e820_entries = 1;
+ boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM });
+ /*
+ * The boot header contains a command line pointer: we put the command
+ * line after the boot header.
+ */
+ boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1);
+ /* We use a simple helper to copy the arguments separated by spaces. */
+ concat((char *)(boot + 1), argv+optind+2);
+
+ /* Boot protocol version: 2.07 supports the fields for lguest. */
+ boot->hdr.version = 0x207;
+
+ /* The hardware_subarch value of "1" tells the Guest it's an lguest. */
+ boot->hdr.hardware_subarch = 1;
+
+ /* Tell the entry path not to try to reload segment registers. */
+ boot->hdr.loadflags |= KEEP_SEGMENTS;
+
+ /*
+ * We tell the kernel to initialize the Guest: this returns the open
+ * /dev/lguest file descriptor.
+ */
+ tell_kernel(start);
+
+ /* Ensure that we terminate if a device-servicing child dies. */
+ signal(SIGCHLD, kill_launcher);
+
+ /* If we exit via err(), this kills all the threads, restores tty. */
+ atexit(cleanup_devices);
+
+ /* If requested, chroot to a directory */
+ if (chroot_path) {
+ if (chroot(chroot_path) != 0)
+ err(1, "chroot(\"%s\") failed", chroot_path);
+
+ if (chdir("/") != 0)
+ err(1, "chdir(\"/\") failed");
+
+ verbose("chroot done\n");
+ }
+
+ /* If requested, drop privileges */
+ if (user_details) {
+ uid_t u;
+ gid_t g;
+
+ u = user_details->pw_uid;
+ g = user_details->pw_gid;
+
+ if (initgroups(user_details->pw_name, g) != 0)
+ err(1, "initgroups failed");
+
+ if (setresgid(g, g, g) != 0)
+ err(1, "setresgid failed");
+
+ if (setresuid(u, u, u) != 0)
+ err(1, "setresuid failed");
+
+ verbose("Dropping privileges completed\n");
+ }
+
+ /* Finally, run the Guest. This doesn't return. */
+ run_guest();
+}
+/*:*/
+
+/*M:999
+ * Mastery is done: you now know everything I do.
+ *
+ * But surely you have seen code, features and bugs in your wanderings which
+ * you now yearn to attack? That is the real game, and I look forward to you
+ * patching and forking lguest into the Your-Name-Here-visor.
+ *
+ * Farewell, and good coding!
+ * Rusty Russell.
+ */
--- /dev/null
+ __
+ (___()'`; Rusty's Remarkably Unreliable Guide to Lguest
+ /, /` - or, A Young Coder's Illustrated Hypervisor
+ \\"--\\ http://lguest.ozlabs.org
+
+Lguest is designed to be a minimal 32-bit x86 hypervisor for the Linux kernel,
+for Linux developers and users to experiment with virtualization with the
+minimum of complexity. Nonetheless, it should have sufficient features to
+make it useful for specific tasks, and, of course, you are encouraged to fork
+and enhance it (see drivers/lguest/README).
+
+Features:
+
+- Kernel module which runs in a normal kernel.
+- Simple I/O model for communication.
+- Simple program to create new guests.
+- Logo contains cute puppies: http://lguest.ozlabs.org
+
+Developer features:
+
+- Fun to hack on.
+- No ABI: being tied to a specific kernel anyway, you can change anything.
+- Many opportunities for improvement or feature implementation.
+
+Running Lguest:
+
+- The easiest way to run lguest is to use same kernel as guest and host.
+ You can configure them differently, but usually it's easiest not to.
+
+ You will need to configure your kernel with the following options:
+
+ "General setup":
+ "Prompt for development and/or incomplete code/drivers" = Y
+ (CONFIG_EXPERIMENTAL=y)
+
+ "Processor type and features":
+ "Paravirtualized guest support" = Y
+ "Lguest guest support" = Y
+ "High Memory Support" = off/4GB
+ "Alignment value to which kernel should be aligned" = 0x100000
+ (CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
+ CONFIG_PHYSICAL_ALIGN=0x100000)
+
+ "Device Drivers":
+ "Block devices"
+ "Virtio block driver (EXPERIMENTAL)" = M/Y
+ "Network device support"
+ "Universal TUN/TAP device driver support" = M/Y
+ "Virtio network driver (EXPERIMENTAL)" = M/Y
+ (CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m and CONFIG_TUN=m)
+
+ "Virtualization"
+ "Linux hypervisor example code" = M/Y
+ (CONFIG_LGUEST=m)
+
+- A tool called "lguest" is available in this directory: type "make"
+ to build it. If you didn't build your kernel in-tree, use "make
+ O=<builddir>".
+
+- Create or find a root disk image. There are several useful ones
+ around, such as the xm-test tiny root image at
+ http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img
+
+ For more serious work, I usually use a distribution ISO image and
+ install it under qemu, then make multiple copies:
+
+ dd if=/dev/zero of=rootfile bs=1M count=2048
+ qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d
+
+ Make sure that you install a getty on /dev/hvc0 if you want to log in on the
+ console!
+
+- "modprobe lg" if you built it as a module.
+
+- Run an lguest as root:
+
+ Documentation/virtual/lguest/lguest 64 vmlinux --tunnet=192.168.19.1 \
+ --block=rootfile root=/dev/vda
+
+ Explanation:
+ 64: the amount of memory to use, in MB.
+
+ vmlinux: the kernel image found in the top of your build directory. You
+ can also use a standard bzImage.
+
+ --tunnet=192.168.19.1: configures a "tap" device for networking with this
+ IP address.
+
+ --block=rootfile: a file or block device which becomes /dev/vda
+ inside the guest.
+
+ root=/dev/vda: this (and anything else on the command line) are
+ kernel boot parameters.
+
+- Configuring networking. I usually have the host masquerade, using
+ "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 >
+ /proc/sys/net/ipv4/ip_forward". In this example, I would configure
+ eth0 inside the guest at 192.168.19.2.
+
+ Another method is to bridge the tap device to an external interface
+ using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest
+ to obtain an IP address. The bridge needs to be configured first:
+ this option simply adds the tap interface to it.
+
+ A simple example on my system:
+
+ ifconfig eth0 0.0.0.0
+ brctl addbr lg0
+ ifconfig lg0 up
+ brctl addif lg0 eth0
+ dhclient lg0
+
+ Then use --tunnet=bridge:lg0 when launching the guest.
+
+ See:
+
+ http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
+
+ for general information on how to get bridging to work.
+
+- Random number generation. Using the --rng option will provide a
+ /dev/hwrng in the guest that will read from the host's /dev/random.
+ Use this option in conjunction with rng-tools (see ../hw_random.txt)
+ to provide entropy to the guest kernel's /dev/random.
+
+There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest
+
+Good luck!
+Rusty Russell rusty@rustcorp.com.au.
--- /dev/null
+ User Mode Linux HOWTO
+ User Mode Linux Core Team
+ Mon Nov 18 14:16:16 EST 2002
+
+ This document describes the use and abuse of Jeff Dike's User Mode
+ Linux: a port of the Linux kernel as a normal Intel Linux process.
+ ______________________________________________________________________
+
+ Table of Contents
+
+ 1. Introduction
+
+ 1.1 How is User Mode Linux Different?
+ 1.2 Why Would I Want User Mode Linux?
+
+ 2. Compiling the kernel and modules
+
+ 2.1 Compiling the kernel
+ 2.2 Compiling and installing kernel modules
+ 2.3 Compiling and installing uml_utilities
+
+ 3. Running UML and logging in
+
+ 3.1 Running UML
+ 3.2 Logging in
+ 3.3 Examples
+
+ 4. UML on 2G/2G hosts
+
+ 4.1 Introduction
+ 4.2 The problem
+ 4.3 The solution
+
+ 5. Setting up serial lines and consoles
+
+ 5.1 Specifying the device
+ 5.2 Specifying the channel
+ 5.3 Examples
+
+ 6. Setting up the network
+
+ 6.1 General setup
+ 6.2 Userspace daemons
+ 6.3 Specifying ethernet addresses
+ 6.4 UML interface setup
+ 6.5 Multicast
+ 6.6 TUN/TAP with the uml_net helper
+ 6.7 TUN/TAP with a preconfigured tap device
+ 6.8 Ethertap
+ 6.9 The switch daemon
+ 6.10 Slip
+ 6.11 Slirp
+ 6.12 pcap
+ 6.13 Setting up the host yourself
+
+ 7. Sharing Filesystems between Virtual Machines
+
+ 7.1 A warning
+ 7.2 Using layered block devices
+ 7.3 Note!
+ 7.4 Another warning
+ 7.5 uml_moo : Merging a COW file with its backing file
+
+ 8. Creating filesystems
+
+ 8.1 Create the filesystem file
+ 8.2 Assign the file to a UML device
+ 8.3 Creating and mounting the filesystem
+
+ 9. Host file access
+
+ 9.1 Using hostfs
+ 9.2 hostfs as the root filesystem
+ 9.3 Building hostfs
+
+ 10. The Management Console
+ 10.1 version
+ 10.2 halt and reboot
+ 10.3 config
+ 10.4 remove
+ 10.5 sysrq
+ 10.6 help
+ 10.7 cad
+ 10.8 stop
+ 10.9 go
+
+ 11. Kernel debugging
+
+ 11.1 Starting the kernel under gdb
+ 11.2 Examining sleeping processes
+ 11.3 Running ddd on UML
+ 11.4 Debugging modules
+ 11.5 Attaching gdb to the kernel
+ 11.6 Using alternate debuggers
+
+ 12. Kernel debugging examples
+
+ 12.1 The case of the hung fsck
+ 12.2 Episode 2: The case of the hung fsck
+
+ 13. What to do when UML doesn't work
+
+ 13.1 Strange compilation errors when you build from source
+ 13.2 (obsolete)
+ 13.3 A variety of panics and hangs with /tmp on a reiserfs filesystem
+ 13.4 The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'
+ 13.5 UML doesn't work when /tmp is an NFS filesystem
+ 13.6 UML hangs on boot when compiled with gprof support
+ 13.7 syslogd dies with a SIGTERM on startup
+ 13.8 TUN/TAP networking doesn't work on a 2.4 host
+ 13.9 You can network to the host but not to other machines on the net
+ 13.10 I have no root and I want to scream
+ 13.11 UML build conflict between ptrace.h and ucontext.h
+ 13.12 The UML BogoMips is exactly half the host's BogoMips
+ 13.13 When you run UML, it immediately segfaults
+ 13.14 xterms appear, then immediately disappear
+ 13.15 Any other panic, hang, or strange behavior
+
+ 14. Diagnosing Problems
+
+ 14.1 Case 1 : Normal kernel panics
+ 14.2 Case 2 : Tracing thread panics
+ 14.3 Case 3 : Tracing thread panics caused by other threads
+ 14.4 Case 4 : Hangs
+
+ 15. Thanks
+
+ 15.1 Code and Documentation
+ 15.2 Flushing out bugs
+ 15.3 Buglets and clean-ups
+ 15.4 Case Studies
+ 15.5 Other contributions
+
+
+ ______________________________________________________________________
+
+ 1\b1.\b. I\bIn\bnt\btr\bro\bod\bdu\buc\bct\bti\bio\bon\bn
+
+ Welcome to User Mode Linux. It's going to be fun.
+
+
+
+ 1\b1.\b.1\b1.\b. H\bHo\bow\bw i\bis\bs U\bUs\bse\ber\br M\bMo\bod\bde\be L\bLi\bin\bnu\bux\bx D\bDi\bif\bff\bfe\ber\bre\ben\bnt\bt?\b?
+
+ Normally, the Linux Kernel talks straight to your hardware (video
+ card, keyboard, hard drives, etc), and any programs which run ask the
+ kernel to operate the hardware, like so:
+
+
+
+ +-----------+-----------+----+
+ | Process 1 | Process 2 | ...|
+ +-----------+-----------+----+
+ | Linux Kernel |
+ +----------------------------+
+ | Hardware |
+ +----------------------------+
+
+
+
+
+ The User Mode Linux Kernel is different; instead of talking to the
+ hardware, it talks to a `real' Linux kernel (called the `host kernel'
+ from now on), like any other program. Programs can then run inside
+ User-Mode Linux as if they were running under a normal kernel, like
+ so:
+
+
+
+ +----------------+
+ | Process 2 | ...|
+ +-----------+----------------+
+ | Process 1 | User-Mode Linux|
+ +----------------------------+
+ | Linux Kernel |
+ +----------------------------+
+ | Hardware |
+ +----------------------------+
+
+
+
+
+
+ 1\b1.\b.2\b2.\b. W\bWh\bhy\by W\bWo\bou\bul\bld\bd I\bI W\bWa\ban\bnt\bt U\bUs\bse\ber\br M\bMo\bod\bde\be L\bLi\bin\bnu\bux\bx?\b?
+
+
+ 1. If User Mode Linux crashes, your host kernel is still fine.
+
+ 2. You can run a usermode kernel as a non-root user.
+
+ 3. You can debug the User Mode Linux like any normal process.
+
+ 4. You can run gprof (profiling) and gcov (coverage testing).
+
+ 5. You can play with your kernel without breaking things.
+
+ 6. You can use it as a sandbox for testing new apps.
+
+ 7. You can try new development kernels safely.
+
+ 8. You can run different distributions simultaneously.
+
+ 9. It's extremely fun.
+
+
+
+
+
+ 2\b2.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl a\ban\bnd\bd m\bmo\bod\bdu\bul\ble\bes\bs
+
+
+
+
+ 2\b2.\b.1\b1.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl
+
+
+ Compiling the user mode kernel is just like compiling any other
+ kernel. Let's go through the steps, using 2.4.0-prerelease (current
+ as of this writing) as an example:
+
+
+ 1. Download the latest UML patch from
+
+ the download page <http://user-mode-linux.sourceforge.net/
+
+ In this example, the file is uml-patch-2.4.0-prerelease.bz2.
+
+
+ 2. Download the matching kernel from your favourite kernel mirror,
+ such as:
+
+ ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2
+ <ftp://ftp.ca.kernel.org/pub/kernel/v2.4/linux-2.4.0-prerelease.tar.bz2>
+ .
+
+
+ 3. Make a directory and unpack the kernel into it.
+
+
+
+ host%
+ mkdir ~/uml
+
+
+
+
+
+
+ host%
+ cd ~/uml
+
+
+
+
+
+
+ host%
+ tar -xzvf linux-2.4.0-prerelease.tar.bz2
+
+
+
+
+
+
+ 4. Apply the patch using
+
+
+
+ host%
+ cd ~/uml/linux
+
+
+
+ host%
+ bzcat uml-patch-2.4.0-prerelease.bz2 | patch -p1
+
+
+
+
+
+
+ 5. Run your favorite config; `make xconfig ARCH=um' is the most
+ convenient. `make config ARCH=um' and 'make menuconfig ARCH=um'
+ will work as well. The defaults will give you a useful kernel. If
+ you want to change something, go ahead, it probably won't hurt
+ anything.
+
+
+ Note: If the host is configured with a 2G/2G address space split
+ rather than the usual 3G/1G split, then the packaged UML binaries
+ will not run. They will immediately segfault. See ``UML on 2G/2G
+ hosts'' for the scoop on running UML on your system.
+
+
+
+ 6. Finish with `make linux ARCH=um': the result is a file called
+ `linux' in the top directory of your source tree.
+
+ Make sure that you don't build this kernel in /usr/src/linux. On some
+ distributions, /usr/include/asm is a link into this pool. The user-
+ mode build changes the other end of that link, and things that include
+ <asm/anything.h> stop compiling.
+
+ The sources are also available from cvs at the project's cvs page,
+ which has directions on getting the sources. You can also browse the
+ CVS pool from there.
+
+ If you get the CVS sources, you will have to check them out into an
+ empty directory. You will then have to copy each file into the
+ corresponding directory in the appropriate kernel pool.
+
+ If you don't have the latest kernel pool, you can get the
+ corresponding user-mode sources with
+
+
+ host% cvs co -r v_2_3_x linux
+
+
+
+
+ where 'x' is the version in your pool. Note that you will not get the
+ bug fixes and enhancements that have gone into subsequent releases.
+
+
+ 2\b2.\b.2\b2.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg a\ban\bnd\bd i\bin\bns\bst\bta\bal\bll\bli\bin\bng\bg k\bke\ber\brn\bne\bel\bl m\bmo\bod\bdu\bul\ble\bes\bs
+
+ UML modules are built in the same way as the native kernel (with the
+ exception of the 'ARCH=um' that you always need for UML):
+
+
+ host% make modules ARCH=um
+
+
+
+
+ Any modules that you want to load into this kernel need to be built in
+ the user-mode pool. Modules from the native kernel won't work.
+
+ You can install them by using ftp or something to copy them into the
+ virtual machine and dropping them into /lib/modules/`uname -r`.
+
+ You can also get the kernel build process to install them as follows:
+
+ 1. with the kernel not booted, mount the root filesystem in the top
+ level of the kernel pool:
+
+
+ host% mount root_fs mnt -o loop
+
+
+
+
+
+
+ 2. run
+
+
+ host%
+ make modules_install INSTALL_MOD_PATH=`pwd`/mnt ARCH=um
+
+
+
+
+
+
+ 3. unmount the filesystem
+
+
+ host% umount mnt
+
+
+
+
+
+
+ 4. boot the kernel on it
+
+
+ When the system is booted, you can use insmod as usual to get the
+ modules into the kernel. A number of things have been loaded into UML
+ as modules, especially filesystems and network protocols and filters,
+ so most symbols which need to be exported probably already are.
+ However, if you do find symbols that need exporting, let us
+ <http://user-mode-linux.sourceforge.net/> know, and
+ they'll be "taken care of".
+
+
+
+ 2\b2.\b.3\b3.\b. C\bCo\bom\bmp\bpi\bil\bli\bin\bng\bg a\ban\bnd\bd i\bin\bns\bst\bta\bal\bll\bli\bin\bng\bg u\bum\bml\bl_\b_u\but\bti\bil\bli\bit\bti\bie\bes\bs
+
+ Many features of the UML kernel require a user-space helper program,
+ so a uml_utilities package is distributed separately from the kernel
+ patch which provides these helpers. Included within this is:
+
+ +\bo port-helper - Used by consoles which connect to xterms or ports
+
+ +\bo tunctl - Configuration tool to create and delete tap devices
+
+ +\bo uml_net - Setuid binary for automatic tap device configuration
+
+ +\bo uml_switch - User-space virtual switch required for daemon
+ transport
+
+ The uml_utilities tree is compiled with:
+
+
+ host#
+ make && make install
+
+
+
+
+ Note that UML kernel patches may require a specific version of the
+ uml_utilities distribution. If you don't keep up with the mailing
+ lists, ensure that you have the latest release of uml_utilities if you
+ are experiencing problems with your UML kernel, particularly when
+ dealing with consoles or command-line switches to the helper programs
+
+
+
+
+
+
+
+
+ 3\b3.\b. R\bRu\bun\bnn\bni\bin\bng\bg U\bUM\bML\bL a\ban\bnd\bd l\blo\bog\bgg\bgi\bin\bng\bg i\bin\bn
+
+
+
+ 3\b3.\b.1\b1.\b. R\bRu\bun\bnn\bni\bin\bng\bg U\bUM\bML\bL
+
+ It runs on 2.2.15 or later, and all 2.4 kernels.
+
+
+ Booting UML is straightforward. Simply run 'linux': it will try to
+ mount the file `root_fs' in the current directory. You do not need to
+ run it as root. If your root filesystem is not named `root_fs', then
+ you need to put a `ubd0=root_fs_whatever' switch on the linux command
+ line.
+
+
+ You will need a filesystem to boot UML from. There are a number
+ available for download from here <http://user-mode-
+ linux.sourceforge.net/> . There are also several tools
+ <http://user-mode-linux.sourceforge.net/> which can be
+ used to generate UML-compatible filesystem images from media.
+ The kernel will boot up and present you with a login prompt.
+
+
+ Note: If the host is configured with a 2G/2G address space split
+ rather than the usual 3G/1G split, then the packaged UML binaries will
+ not run. They will immediately segfault. See ``UML on 2G/2G hosts''
+ for the scoop on running UML on your system.
+
+
+
+ 3\b3.\b.2\b2.\b. L\bLo\bog\bgg\bgi\bin\bng\bg i\bin\bn
+
+
+
+ The prepackaged filesystems have a root account with password 'root'
+ and a user account with password 'user'. The login banner will
+ generally tell you how to log in. So, you log in and you will find
+ yourself inside a little virtual machine. Our filesystems have a
+ variety of commands and utilities installed (and it is fairly easy to
+ add more), so you will have a lot of tools with which to poke around
+ the system.
+
+ There are a couple of other ways to log in:
+
+ +\bo On a virtual console
+
+
+
+ Each virtual console that is configured (i.e. the device exists in
+ /dev and /etc/inittab runs a getty on it) will come up in its own
+ xterm. If you get tired of the xterms, read ``Setting up serial
+ lines and consoles'' to see how to attach the consoles to
+ something else, like host ptys.
+
+
+
+ +\bo Over the serial line
+
+
+ In the boot output, find a line that looks like:
+
+
+
+ serial line 0 assigned pty /dev/ptyp1
+
+
+
+
+ Attach your favorite terminal program to the corresponding tty. I.e.
+ for minicom, the command would be
+
+
+ host% minicom -o -p /dev/ttyp1
+
+
+
+
+
+
+ +\bo Over the net
+
+
+ If the network is running, then you can telnet to the virtual
+ machine and log in to it. See ``Setting up the network'' to learn
+ about setting up a virtual network.
+
+ When you're done using it, run halt, and the kernel will bring itself
+ down and the process will exit.
+
+
+ 3\b3.\b.3\b3.\b. E\bEx\bxa\bam\bmp\bpl\ble\bes\bs
+
+ Here are some examples of UML in action:
+
+ +\bo A login session <http://user-mode-linux.sourceforge.net/login.html>
+
+ +\bo A virtual network <http://user-mode-linux.sourceforge.net/net.html>
+
+
+
+
+
+
+
+ 4\b4.\b. U\bUM\bML\bL o\bon\bn 2\b2G\bG/\b/2\b2G\bG h\bho\bos\bst\bts\bs
+
+
+
+
+ 4\b4.\b.1\b1.\b. I\bIn\bnt\btr\bro\bod\bdu\buc\bct\bti\bio\bon\bn
+
+
+ Most Linux machines are configured so that the kernel occupies the
+ upper 1G (0xc0000000 - 0xffffffff) of the 4G address space and
+ processes use the lower 3G (0x00000000 - 0xbfffffff). However, some
+ machine are configured with a 2G/2G split, with the kernel occupying
+ the upper 2G (0x80000000 - 0xffffffff) and processes using the lower
+ 2G (0x00000000 - 0x7fffffff).
+
+
+
+
+ 4\b4.\b.2\b2.\b. T\bTh\bhe\be p\bpr\bro\bob\bbl\ble\bem\bm
+
+
+ The prebuilt UML binaries on this site will not run on 2G/2G hosts
+ because UML occupies the upper .5G of the 3G process address space
+ (0xa0000000 - 0xbfffffff). Obviously, on 2G/2G hosts, this is right
+ in the middle of the kernel address space, so UML won't even load - it
+ will immediately segfault.
+
+
+
+
+ 4\b4.\b.3\b3.\b. T\bTh\bhe\be s\bso\bol\blu\but\bti\bio\bon\bn
+
+
+ The fix for this is to rebuild UML from source after enabling
+ CONFIG_HOST_2G_2G (under 'General Setup'). This will cause UML to
+ load itself in the top .5G of that smaller process address space,
+ where it will run fine. See ``Compiling the kernel and modules'' if
+ you need help building UML from source.
+
+
+
+
+
+
+
+
+
+
+ 5\b5.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp s\bse\ber\bri\bia\bal\bl l\bli\bin\bne\bes\bs a\ban\bnd\bd c\bco\bon\bns\bso\bol\ble\bes\bs
+
+
+ It is possible to attach UML serial lines and consoles to many types
+ of host I/O channels by specifying them on the command line.
+
+
+ You can attach them to host ptys, ttys, file descriptors, and ports.
+ This allows you to do things like
+
+ +\bo have a UML console appear on an unused host console,
+
+ +\bo hook two virtual machines together by having one attach to a pty
+ and having the other attach to the corresponding tty
+
+ +\bo make a virtual machine accessible from the net by attaching a
+ console to a port on the host.
+
+
+ The general format of the command line option is device=channel.
+
+
+
+ 5\b5.\b.1\b1.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg t\bth\bhe\be d\bde\bev\bvi\bic\bce\be
+
+ Devices are specified with "con" or "ssl" (console or serial line,
+ respectively), optionally with a device number if you are talking
+ about a specific device.
+
+
+ Using just "con" or "ssl" describes all of the consoles or serial
+ lines. If you want to talk about console #3 or serial line #10, they
+ would be "con3" and "ssl10", respectively.
+
+
+ A specific device name will override a less general "con=" or "ssl=".
+ So, for example, you can assign a pty to each of the serial lines
+ except for the first two like this:
+
+
+ ssl=pty ssl0=tty:/dev/tty0 ssl1=tty:/dev/tty1
+
+
+
+
+ The specificity of the device name is all that matters; order on the
+ command line is irrelevant.
+
+
+
+ 5\b5.\b.2\b2.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg t\bth\bhe\be c\bch\bha\ban\bnn\bne\bel\bl
+
+ There are a number of different types of channels to attach a UML
+ device to, each with a different way of specifying exactly what to
+ attach to.
+
+ +\bo pseudo-terminals - device=pty pts terminals - device=pts
+
+
+ This will cause UML to allocate a free host pseudo-terminal for the
+ device. The terminal that it got will be announced in the boot
+ log. You access it by attaching a terminal program to the
+ corresponding tty:
+
+ +\bo screen /dev/pts/n
+
+ +\bo screen /dev/ttyxx
+
+ +\bo minicom -o -p /dev/ttyxx - minicom seems not able to handle pts
+ devices
+
+ +\bo kermit - start it up, 'open' the device, then 'connect'
+
+
+
+
+
+ +\bo terminals - device=tty:tty device file
+
+
+ This will make UML attach the device to the specified tty (i.e
+
+
+ con1=tty:/dev/tty3
+
+
+
+
+ will attach UML's console 1 to the host's /dev/tty3). If the tty that
+ you specify is the slave end of a tty/pty pair, something else must
+ have already opened the corresponding pty in order for this to work.
+
+
+
+
+
+ +\bo xterms - device=xterm
+
+
+ UML will run an xterm and the device will be attached to it.
+
+
+
+
+
+ +\bo Port - device=port:port number
+
+
+ This will attach the UML devices to the specified host port.
+ Attaching console 1 to the host's port 9000 would be done like
+ this:
+
+
+ con1=port:9000
+
+
+
+
+ Attaching all the serial lines to that port would be done similarly:
+
+
+ ssl=port:9000
+
+
+
+
+ You access these devices by telnetting to that port. Each active tel-
+ net session gets a different device. If there are more telnets to a
+ port than UML devices attached to it, then the extra telnet sessions
+ will block until an existing telnet detaches, or until another device
+ becomes active (i.e. by being activated in /etc/inittab).
+
+ This channel has the advantage that you can both attach multiple UML
+ devices to it and know how to access them without reading the UML boot
+ log. It is also unique in allowing access to a UML from remote
+ machines without requiring that the UML be networked. This could be
+ useful in allowing public access to UMLs because they would be
+ accessible from the net, but wouldn't need any kind of network
+ filtering or access control because they would have no network access.
+
+
+ If you attach the main console to a portal, then the UML boot will
+ appear to hang. In reality, it's waiting for a telnet to connect, at
+ which point the boot will proceed.
+
+
+
+
+
+ +\bo already-existing file descriptors - device=file descriptor
+
+
+ If you set up a file descriptor on the UML command line, you can
+ attach a UML device to it. This is most commonly used to put the
+ main console back on stdin and stdout after assigning all the other
+ consoles to something else:
+
+
+ con0=fd:0,fd:1 con=pts
+
+
+
+
+
+
+
+
+ +\bo Nothing - device=null
+
+
+ This allows the device to be opened, in contrast to 'none', but
+ reads will block, and writes will succeed and the data will be
+ thrown out.
+
+
+
+
+
+ +\bo None - device=none
+
+
+ This causes the device to disappear.
+
+
+
+ You can also specify different input and output channels for a device
+ by putting a comma between them:
+
+
+ ssl3=tty:/dev/tty2,xterm
+
+
+
+
+ will cause serial line 3 to accept input on the host's /dev/tty3 and
+ display output on an xterm. That's a silly example - the most common
+ use of this syntax is to reattach the main console to stdin and stdout
+ as shown above.
+
+
+ If you decide to move the main console away from stdin/stdout, the
+ initial boot output will appear in the terminal that you're running
+ UML in. However, once the console driver has been officially
+ initialized, then the boot output will start appearing wherever you
+ specified that console 0 should be. That device will receive all
+ subsequent output.
+
+
+
+ 5\b5.\b.3\b3.\b. E\bEx\bxa\bam\bmp\bpl\ble\bes\bs
+
+ There are a number of interesting things you can do with this
+ capability.
+
+
+ First, this is how you get rid of those bleeding console xterms by
+ attaching them to host ptys:
+
+
+ con=pty con0=fd:0,fd:1
+
+
+
+
+ This will make a UML console take over an unused host virtual console,
+ so that when you switch to it, you will see the UML login prompt
+ rather than the host login prompt:
+
+
+ con1=tty:/dev/tty6
+
+
+
+
+ You can attach two virtual machines together with what amounts to a
+ serial line as follows:
+
+ Run one UML with a serial line attached to a pty -
+
+
+ ssl1=pty
+
+
+
+
+ Look at the boot log to see what pty it got (this example will assume
+ that it got /dev/ptyp1).
+
+ Boot the other UML with a serial line attached to the corresponding
+ tty -
+
+
+ ssl1=tty:/dev/ttyp1
+
+
+
+
+ Log in, make sure that it has no getty on that serial line, attach a
+ terminal program like minicom to it, and you should see the login
+ prompt of the other virtual machine.
+
+
+ 6\b6.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp t\bth\bhe\be n\bne\bet\btw\bwo\bor\brk\bk
+
+
+
+ This page describes how to set up the various transports and to
+ provide a UML instance with network access to the host, other machines
+ on the local net, and the rest of the net.
+
+
+ As of 2.4.5, UML networking has been completely redone to make it much
+ easier to set up, fix bugs, and add new features.
+
+
+ There is a new helper, uml_net, which does the host setup that
+ requires root privileges.
+
+
+ There are currently five transport types available for a UML virtual
+ machine to exchange packets with other hosts:
+
+ +\bo ethertap
+
+ +\bo TUN/TAP
+
+ +\bo Multicast
+
+ +\bo a switch daemon
+
+ +\bo slip
+
+ +\bo slirp
+
+ +\bo pcap
+
+ The TUN/TAP, ethertap, slip, and slirp transports allow a UML
+ instance to exchange packets with the host. They may be directed
+ to the host or the host may just act as a router to provide access
+ to other physical or virtual machines.
+
+
+ The pcap transport is a synthetic read-only interface, using the
+ libpcap binary to collect packets from interfaces on the host and
+ filter them. This is useful for building preconfigured traffic
+ monitors or sniffers.
+
+
+ The daemon and multicast transports provide a completely virtual
+ network to other virtual machines. This network is completely
+ disconnected from the physical network unless one of the virtual
+ machines on it is acting as a gateway.
+
+
+ With so many host transports, which one should you use? Here's when
+ you should use each one:
+
+ +\bo ethertap - if you want access to the host networking and it is
+ running 2.2
+
+ +\bo TUN/TAP - if you want access to the host networking and it is
+ running 2.4. Also, the TUN/TAP transport is able to use a
+ preconfigured device, allowing it to avoid using the setuid uml_net
+ helper, which is a security advantage.
+
+ +\bo Multicast - if you want a purely virtual network and you don't want
+ to set up anything but the UML
+
+ +\bo a switch daemon - if you want a purely virtual network and you
+ don't mind running the daemon in order to get somewhat better
+ performance
+
+ +\bo slip - there is no particular reason to run the slip backend unless
+ ethertap and TUN/TAP are just not available for some reason
+
+ +\bo slirp - if you don't have root access on the host to setup
+ networking, or if you don't want to allocate an IP to your UML
+
+ +\bo pcap - not much use for actual network connectivity, but great for
+ monitoring traffic on the host
+
+ Ethertap is available on 2.4 and works fine. TUN/TAP is preferred
+ to it because it has better performance and ethertap is officially
+ considered obsolete in 2.4. Also, the root helper only needs to
+ run occasionally for TUN/TAP, rather than handling every packet, as
+ it does with ethertap. This is a slight security advantage since
+ it provides fewer opportunities for a nasty UML user to somehow
+ exploit the helper's root privileges.
+
+
+ 6\b6.\b.1\b1.\b. G\bGe\ben\bne\ber\bra\bal\bl s\bse\bet\btu\bup\bp
+
+ First, you must have the virtual network enabled in your UML. If are
+ running a prebuilt kernel from this site, everything is already
+ enabled. If you build the kernel yourself, under the "Network device
+ support" menu, enable "Network device support", and then the three
+ transports.
+
+
+ The next step is to provide a network device to the virtual machine.
+ This is done by describing it on the kernel command line.
+
+ The general format is
+
+
+ eth <n> = <transport> , <transport args>
+
+
+
+
+ For example, a virtual ethernet device may be attached to a host
+ ethertap device as follows:
+
+
+ eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
+
+
+
+
+ This sets up eth0 inside the virtual machine to attach itself to the
+ host /dev/tap0, assigns it an ethernet address, and assigns the host
+ tap0 interface an IP address.
+
+
+
+ Note that the IP address you assign to the host end of the tap device
+ must be different than the IP you assign to the eth device inside UML.
+ If you are short on IPs and don't want to consume two per UML, then
+ you can reuse the host's eth IP address for the host ends of the tap
+ devices. Internally, the UMLs must still get unique IPs for their eth
+ devices. You can also give the UMLs non-routable IPs (192.168.x.x or
+ 10.x.x.x) and have the host masquerade them. This will let outgoing
+ connections work, but incoming connections won't without more work,
+ such as port forwarding from the host.
+ Also note that when you configure the host side of an interface, it is
+ only acting as a gateway. It will respond to pings sent to it
+ locally, but is not useful to do that since it's a host interface.
+ You are not talking to the UML when you ping that interface and get a
+ response.
+
+
+ You can also add devices to a UML and remove them at runtime. See the
+ ``The Management Console'' page for details.
+
+
+ The sections below describe this in more detail.
+
+
+ Once you've decided how you're going to set up the devices, you boot
+ UML, log in, configure the UML side of the devices, and set up routes
+ to the outside world. At that point, you will be able to talk to any
+ other machines, physical or virtual, on the net.
+
+
+ If ifconfig inside UML fails and the network refuses to come up, run
+ tell you what went wrong.
+
+
+
+ 6\b6.\b.2\b2.\b. U\bUs\bse\ber\brs\bsp\bpa\bac\bce\be d\bda\bae\bem\bmo\bon\bns\bs
+
+ You will likely need the setuid helper, or the switch daemon, or both.
+ They are both installed with the RPM and deb, so if you've installed
+ either, you can skip the rest of this section.
+
+
+ If not, then you need to check them out of CVS, build them, and
+ install them. The helper is uml_net, in CVS /tools/uml_net, and the
+ daemon is uml_switch, in CVS /tools/uml_router. They are both built
+ with a plain 'make'. Both need to be installed in a directory that's
+ in your path - /usr/bin is recommend. On top of that, uml_net needs
+ to be setuid root.
+
+
+
+ 6\b6.\b.3\b3.\b. S\bSp\bpe\bec\bci\bif\bfy\byi\bin\bng\bg e\bet\bth\bhe\ber\brn\bne\bet\bt a\bad\bdd\bdr\bre\bes\bss\bse\bes\bs
+
+ Below, you will see that the TUN/TAP, ethertap, and daemon interfaces
+ allow you to specify hardware addresses for the virtual ethernet
+ devices. This is generally not necessary. If you don't have a
+ specific reason to do it, you probably shouldn't. If one is not
+ specified on the command line, the driver will assign one based on the
+ device IP address. It will provide the address fe:fd:nn:nn:nn:nn
+ where nn.nn.nn.nn is the device IP address. This is nearly always
+ sufficient to guarantee a unique hardware address for the device. A
+ couple of exceptions are:
+
+ +\bo Another set of virtual ethernet devices are on the same network and
+ they are assigned hardware addresses using a different scheme which
+ may conflict with the UML IP address-based scheme
+
+ +\bo You aren't going to use the device for IP networking, so you don't
+ assign the device an IP address
+
+ If you let the driver provide the hardware address, you should make
+ sure that the device IP address is known before the interface is
+ brought up. So, inside UML, this will guarantee that:
+
+
+
+ UML#
+ ifconfig eth0 192.168.0.250 up
+
+
+
+
+ If you decide to assign the hardware address yourself, make sure that
+ the first byte of the address is even. Addresses with an odd first
+ byte are broadcast addresses, which you don't want assigned to a
+ device.
+
+
+
+ 6\b6.\b.4\b4.\b. U\bUM\bML\bL i\bin\bnt\bte\ber\brf\bfa\bac\bce\be s\bse\bet\btu\bup\bp
+
+ Once the network devices have been described on the command line, you
+ should boot UML and log in.
+
+
+ The first thing to do is bring the interface up:
+
+
+ UML# ifconfig ethn ip-address up
+
+
+
+
+ You should be able to ping the host at this point.
+
+
+ To reach the rest of the world, you should set a default route to the
+ host:
+
+
+ UML# route add default gw host ip
+
+
+
+
+ Again, with host ip of 192.168.0.4:
+
+
+ UML# route add default gw 192.168.0.4
+
+
+
+
+ This page used to recommend setting a network route to your local net.
+ This is wrong, because it will cause UML to try to figure out hardware
+ addresses of the local machines by arping on the interface to the
+ host. Since that interface is basically a single strand of ethernet
+ with two nodes on it (UML and the host) and arp requests don't cross
+ networks, they will fail to elicit any responses. So, what you want
+ is for UML to just blindly throw all packets at the host and let it
+ figure out what to do with them, which is what leaving out the network
+ route and adding the default route does.
+
+
+ Note: If you can't communicate with other hosts on your physical
+ ethernet, it's probably because of a network route that's
+ automatically set up. If you run 'route -n' and see a route that
+ looks like this:
+
+
+
+
+ Destination Gateway Genmask Flags Metric Ref Use Iface
+ 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
+
+
+
+
+ with a mask that's not 255.255.255.255, then replace it with a route
+ to your host:
+
+
+ UML#
+ route del -net 192.168.0.0 dev eth0 netmask 255.255.255.0
+
+
+
+
+
+
+ UML#
+ route add -host 192.168.0.4 dev eth0
+
+
+
+
+ This, plus the default route to the host, will allow UML to exchange
+ packets with any machine on your ethernet.
+
+
+
+ 6\b6.\b.5\b5.\b. M\bMu\bul\blt\bti\bic\bca\bas\bst\bt
+
+ The simplest way to set up a virtual network between multiple UMLs is
+ to use the mcast transport. This was written by Harald Welte and is
+ present in UML version 2.4.5-5um and later. Your system must have
+ multicast enabled in the kernel and there must be a multicast-capable
+ network device on the host. Normally, this is eth0, but if there is
+ no ethernet card on the host, then you will likely get strange error
+ messages when you bring the device up inside UML.
+
+
+ To use it, run two UMLs with
+
+
+ eth0=mcast
+
+
+
+
+ on their command lines. Log in, configure the ethernet device in each
+ machine with different IP addresses:
+
+
+ UML1# ifconfig eth0 192.168.0.254
+
+
+
+
+
+
+ UML2# ifconfig eth0 192.168.0.253
+
+
+
+
+ and they should be able to talk to each other.
+
+ The full set of command line options for this transport are
+
+
+
+ ethn=mcast,ethernet address,multicast
+ address,multicast port,ttl
+
+
+
+
+ Harald's original README is here <http://user-mode-linux.source-
+ forge.net/> and explains these in detail, as well as
+ some other issues.
+
+
+
+ 6\b6.\b.6\b6.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP w\bwi\bit\bth\bh t\bth\bhe\be u\bum\bml\bl_\b_n\bne\bet\bt h\bhe\bel\blp\bpe\ber\br
+
+ TUN/TAP is the preferred mechanism on 2.4 to exchange packets with the
+ host. The TUN/TAP backend has been in UML since 2.4.9-3um.
+
+
+ The easiest way to get up and running is to let the setuid uml_net
+ helper do the host setup for you. This involves insmod-ing the tun.o
+ module if necessary, configuring the device, and setting up IP
+ forwarding, routing, and proxy arp. If you are new to UML networking,
+ do this first. If you're concerned about the security implications of
+ the setuid helper, use it to get up and running, then read the next
+ section to see how to have UML use a preconfigured tap device, which
+ avoids the use of uml_net.
+
+
+ If you specify an IP address for the host side of the device, the
+ uml_net helper will do all necessary setup on the host - the only
+ requirement is that TUN/TAP be available, either built in to the host
+ kernel or as the tun.o module.
+
+ The format of the command line switch to attach a device to a TUN/TAP
+ device is
+
+
+ eth <n> =tuntap,,, <IP address>
+
+
+
+
+ For example, this argument will attach the UML's eth0 to the next
+ available tap device and assign an ethernet address to it based on its
+ IP address
+
+
+ eth0=tuntap,,,192.168.0.254
+
+
+
+
+
+
+ Note that the IP address that must be used for the eth device inside
+ UML is fixed by the routing and proxy arp that is set up on the
+ TUN/TAP device on the host. You can use a different one, but it won't
+ work because reply packets won't reach the UML. This is a feature.
+ It prevents a nasty UML user from doing things like setting the UML IP
+ to the same as the network's nameserver or mail server.
+
+
+ There are a couple potential problems with running the TUN/TAP
+ transport on a 2.4 host kernel
+
+ +\bo TUN/TAP seems not to work on 2.4.3 and earlier. Upgrade the host
+ kernel or use the ethertap transport.
+
+ +\bo With an upgraded kernel, TUN/TAP may fail with
+
+
+ File descriptor in bad state
+
+
+
+
+ This is due to a header mismatch between the upgraded kernel and the
+ kernel that was originally installed on the machine. The fix is to
+ make sure that /usr/src/linux points to the headers for the running
+ kernel.
+
+ These were pointed out by Tim Robinson <timro at trkr dot net> in
+ <http://www.geocrawler.com/> name="this uml-
+ user post"> .
+
+
+
+ 6\b6.\b.7\b7.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP w\bwi\bit\bth\bh a\ba p\bpr\bre\bec\bco\bon\bnf\bfi\big\bgu\bur\bre\bed\bd t\bta\bap\bp d\bde\bev\bvi\bic\bce\be
+
+ If you prefer not to have UML use uml_net (which is somewhat
+ insecure), with UML 2.4.17-11, you can set up a TUN/TAP device
+ beforehand. The setup needs to be done as root, but once that's done,
+ there is no need for root assistance. Setting up the device is done
+ as follows:
+
+ +\bo Create the device with tunctl (available from the UML utilities
+ tarball)
+
+
+
+
+ host# tunctl -u uid
+
+
+
+
+ where uid is the user id or username that UML will be run as. This
+ will tell you what device was created.
+
+ +\bo Configure the device IP (change IP addresses and device name to
+ suit)
+
+
+
+
+ host# ifconfig tap0 192.168.0.254 up
+
+
+
+
+
+ +\bo Set up routing and arping if desired - this is my recipe, there are
+ other ways of doing the same thing
+
+
+ host#
+ bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
+
+ host#
+ route add -host 192.168.0.253 dev tap0
+
+
+
+
+
+
+ host#
+ bash -c 'echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp'
+
+
+
+
+
+
+ host#
+ arp -Ds 192.168.0.253 eth0 pub
+
+
+
+
+ Note that this must be done every time the host boots - this configu-
+ ration is not stored across host reboots. So, it's probably a good
+ idea to stick it in an rc file. An even better idea would be a little
+ utility which reads the information from a config file and sets up
+ devices at boot time.
+
+ +\bo Rather than using up two IPs and ARPing for one of them, you can
+ also provide direct access to your LAN by the UML by using a
+ bridge.
+
+
+ host#
+ brctl addbr br0
+
+
+
+
+
+
+ host#
+ ifconfig eth0 0.0.0.0 promisc up
+
+
+
+
+
+
+ host#
+ ifconfig tap0 0.0.0.0 promisc up
+
+
+
+
+
+
+ host#
+ ifconfig br0 192.168.0.1 netmask 255.255.255.0 up
+
+
+
+
+
+
+
+ host#
+ brctl stp br0 off
+
+
+
+
+
+
+ host#
+ brctl setfd br0 1
+
+
+
+
+
+
+ host#
+ brctl sethello br0 1
+
+
+
+
+
+
+ host#
+ brctl addif br0 eth0
+
+
+
+
+
+
+ host#
+ brctl addif br0 tap0
+
+
+
+
+ Note that 'br0' should be setup using ifconfig with the existing IP
+ address of eth0, as eth0 no longer has its own IP.
+
+ +\bo
+
+
+ Also, the /dev/net/tun device must be writable by the user running
+ UML in order for the UML to use the device that's been configured
+ for it. The simplest thing to do is
+
+
+ host# chmod 666 /dev/net/tun
+
+
+
+
+ Making it world-writable looks bad, but it seems not to be
+ exploitable as a security hole. However, it does allow anyone to cre-
+ ate useless tap devices (useless because they can't configure them),
+ which is a DOS attack. A somewhat more secure alternative would to be
+ to create a group containing all the users who have preconfigured tap
+ devices and chgrp /dev/net/tun to that group with mode 664 or 660.
+
+
+ +\bo Once the device is set up, run UML with 'eth0=tuntap,device name'
+ (i.e. 'eth0=tuntap,tap0') on the command line (or do it with the
+ mconsole config command).
+
+ +\bo Bring the eth device up in UML and you're in business.
+
+ If you don't want that tap device any more, you can make it non-
+ persistent with
+
+
+ host# tunctl -d tap device
+
+
+
+
+ Finally, tunctl has a -b (for brief mode) switch which causes it to
+ output only the name of the tap device it created. This makes it
+ suitable for capture by a script:
+
+
+ host# TAP=`tunctl -u 1000 -b`
+
+
+
+
+
+
+ 6\b6.\b.8\b8.\b. E\bEt\bth\bhe\ber\brt\bta\bap\bp
+
+ Ethertap is the general mechanism on 2.2 for userspace processes to
+ exchange packets with the kernel.
+
+
+
+ To use this transport, you need to describe the virtual network device
+ on the UML command line. The general format for this is
+
+
+ eth <n> =ethertap, <device> , <ethernet address> , <tap IP address>
+
+
+
+
+ So, the previous example
+
+
+ eth0=ethertap,tap0,fe:fd:0:0:0:1,192.168.0.254
+
+
+
+
+ attaches the UML eth0 device to the host /dev/tap0, assigns it the
+ ethernet address fe:fd:0:0:0:1, and assigns the IP address
+ 192.168.0.254 to the tap device.
+
+
+
+ The tap device is mandatory, but the others are optional. If the
+ ethernet address is omitted, one will be assigned to it.
+
+
+ The presence of the tap IP address will cause the helper to run and do
+ whatever host setup is needed to allow the virtual machine to
+ communicate with the outside world. If you're not sure you know what
+ you're doing, this is the way to go.
+
+
+ If it is absent, then you must configure the tap device and whatever
+ arping and routing you will need on the host. However, even in this
+ case, the uml_net helper still needs to be in your path and it must be
+ setuid root if you're not running UML as root. This is because the
+ tap device doesn't support SIGIO, which UML needs in order to use
+ something as a source of input. So, the helper is used as a
+ convenient asynchronous IO thread.
+
+ If you're using the uml_net helper, you can ignore the following host
+ setup - uml_net will do it for you. You just need to make sure you
+ have ethertap available, either built in to the host kernel or
+ available as a module.
+
+
+ If you want to set things up yourself, you need to make sure that the
+ appropriate /dev entry exists. If it doesn't, become root and create
+ it as follows:
+
+
+ mknod /dev/tap <minor> c 36 <minor> + 16
+
+
+
+
+ For example, this is how to create /dev/tap0:
+
+
+ mknod /dev/tap0 c 36 0 + 16
+
+
+
+
+ You also need to make sure that the host kernel has ethertap support.
+ If ethertap is enabled as a module, you apparently need to insmod
+ ethertap once for each ethertap device you want to enable. So,
+
+
+ host#
+ insmod ethertap
+
+
+
+
+ will give you the tap0 interface. To get the tap1 interface, you need
+ to run
+
+
+ host#
+ insmod ethertap unit=1 -o ethertap1
+
+
+
+
+
+
+
+ 6\b6.\b.9\b9.\b. T\bTh\bhe\be s\bsw\bwi\bit\btc\bch\bh d\bda\bae\bem\bmo\bon\bn
+
+ N\bNo\bot\bte\be: This is the daemon formerly known as uml_router, but which was
+ renamed so the network weenies of the world would stop growling at me.
+
+
+ The switch daemon, uml_switch, provides a mechanism for creating a
+ totally virtual network. By default, it provides no connection to the
+ host network (but see -tap, below).
+
+
+ The first thing you need to do is run the daemon. Running it with no
+ arguments will make it listen on a default pair of unix domain
+ sockets.
+
+
+ If you want it to listen on a different pair of sockets, use
+
+
+ -unix control socket data socket
+
+
+
+
+
+ If you want it to act as a hub rather than a switch, use
+
+
+ -hub
+
+
+
+
+
+ If you want the switch to be connected to host networking (allowing
+ the umls to get access to the outside world through the host), use
+
+
+ -tap tap0
+
+
+
+
+
+ Note that the tap device must be preconfigured (see "TUN/TAP with a
+ preconfigured tap device", above). If you're using a different tap
+ device than tap0, specify that instead of tap0.
+
+
+ uml_switch can be backgrounded as follows
+
+
+ host%
+ uml_switch [ options ] < /dev/null > /dev/null
+
+
+
+
+ The reason it doesn't background by default is that it listens to
+ stdin for EOF. When it sees that, it exits.
+
+
+ The general format of the kernel command line switch is
+
+
+
+ ethn=daemon,ethernet address,socket
+ type,control socket,data socket
+
+
+
+
+ You can leave off everything except the 'daemon'. You only need to
+ specify the ethernet address if the one that will be assigned to it
+ isn't acceptable for some reason. The rest of the arguments describe
+ how to communicate with the daemon. You should only specify them if
+ you told the daemon to use different sockets than the default. So, if
+ you ran the daemon with no arguments, running the UML on the same
+ machine with
+ eth0=daemon
+
+
+
+
+ will cause the eth0 driver to attach itself to the daemon correctly.
+
+
+
+ 6\b6.\b.1\b10\b0.\b. S\bSl\bli\bip\bp
+
+ Slip is another, less general, mechanism for a process to communicate
+ with the host networking. In contrast to the ethertap interface,
+ which exchanges ethernet frames with the host and can be used to
+ transport any higher-level protocol, it can only be used to transport
+ IP.
+
+
+ The general format of the command line switch is
+
+
+
+ ethn=slip,slip IP
+
+
+
+
+ The slip IP argument is the IP address that will be assigned to the
+ host end of the slip device. If it is specified, the helper will run
+ and will set up the host so that the virtual machine can reach it and
+ the rest of the network.
+
+
+ There are some oddities with this interface that you should be aware
+ of. You should only specify one slip device on a given virtual
+ machine, and its name inside UML will be 'umn', not 'eth0' or whatever
+ you specified on the command line. These problems will be fixed at
+ some point.
+
+
+
+ 6\b6.\b.1\b11\b1.\b. S\bSl\bli\bir\brp\bp
+
+ slirp uses an external program, usually /usr/bin/slirp, to provide IP
+ only networking connectivity through the host. This is similar to IP
+ masquerading with a firewall, although the translation is performed in
+ user-space, rather than by the kernel. As slirp does not set up any
+ interfaces on the host, or changes routing, slirp does not require
+ root access or setuid binaries on the host.
+
+
+ The general format of the command line switch for slirp is:
+
+
+
+ ethn=slirp,ethernet address,slirp path
+
+
+
+
+ The ethernet address is optional, as UML will set up the interface
+ with an ethernet address based upon the initial IP address of the
+ interface. The slirp path is generally /usr/bin/slirp, although it
+ will depend on distribution.
+
+
+ The slirp program can have a number of options passed to the command
+ line and we can't add them to the UML command line, as they will be
+ parsed incorrectly. Instead, a wrapper shell script can be written or
+ the options inserted into the /.slirprc file. More information on
+ all of the slirp options can be found in its man pages.
+
+
+ The eth0 interface on UML should be set up with the IP 10.2.0.15,
+ although you can use anything as long as it is not used by a network
+ you will be connecting to. The default route on UML should be set to
+ use
+
+
+ UML#
+ route add default dev eth0
+
+
+
+
+ slirp provides a number of useful IP addresses which can be used by
+ UML, such as 10.0.2.3 which is an alias for the DNS server specified
+ in /etc/resolv.conf on the host or the IP given in the 'dns' option
+ for slirp.
+
+
+ Even with a baudrate setting higher than 115200, the slirp connection
+ is limited to 115200. If you need it to go faster, the slirp binary
+ needs to be compiled with FULL_BOLT defined in config.h.
+
+
+
+ 6\b6.\b.1\b12\b2.\b. p\bpc\bca\bap\bp
+
+ The pcap transport is attached to a UML ethernet device on the command
+ line or with uml_mconsole with the following syntax:
+
+
+
+ ethn=pcap,host interface,filter
+ expression,option1,option2
+
+
+
+
+ The expression and options are optional.
+
+
+ The interface is whatever network device on the host you want to
+ sniff. The expression is a pcap filter expression, which is also what
+ tcpdump uses, so if you know how to specify tcpdump filters, you will
+ use the same expressions here. The options are up to two of
+ 'promisc', control whether pcap puts the host interface into
+ promiscuous mode. 'optimize' and 'nooptimize' control whether the pcap
+ expression optimizer is used.
+
+
+ Example:
+
+
+
+ eth0=pcap,eth0,tcp
+
+ eth1=pcap,eth0,!tcp
+
+
+
+ will cause the UML eth0 to emit all tcp packets on the host eth0 and
+ the UML eth1 to emit all non-tcp packets on the host eth0.
+
+
+
+ 6\b6.\b.1\b13\b3.\b. S\bSe\bet\btt\bti\bin\bng\bg u\bup\bp t\bth\bhe\be h\bho\bos\bst\bt y\byo\bou\bur\brs\bse\bel\blf\bf
+
+ If you don't specify an address for the host side of the ethertap or
+ slip device, UML won't do any setup on the host. So this is what is
+ needed to get things working (the examples use a host-side IP of
+ 192.168.0.251 and a UML-side IP of 192.168.0.250 - adjust to suit your
+ own network):
+
+ +\bo The device needs to be configured with its IP address. Tap devices
+ are also configured with an mtu of 1484. Slip devices are
+ configured with a point-to-point address pointing at the UML ip
+ address.
+
+
+ host# ifconfig tap0 arp mtu 1484 192.168.0.251 up
+
+
+
+
+
+
+ host#
+ ifconfig sl0 192.168.0.251 pointopoint 192.168.0.250 up
+
+
+
+
+
+ +\bo If a tap device is being set up, a route is set to the UML IP.
+
+
+ UML# route add -host 192.168.0.250 gw 192.168.0.251
+
+
+
+
+
+ +\bo To allow other hosts on your network to see the virtual machine,
+ proxy arp is set up for it.
+
+
+ host# arp -Ds 192.168.0.250 eth0 pub
+
+
+
+
+
+ +\bo Finally, the host is set up to route packets.
+
+
+ host# echo 1 > /proc/sys/net/ipv4/ip_forward
+
+
+
+
+
+
+
+
+
+
+ 7\b7.\b. S\bSh\bha\bar\bri\bin\bng\bg F\bFi\bil\ble\bes\bsy\bys\bst\bte\bem\bms\bs b\bbe\bet\btw\bwe\bee\ben\bn V\bVi\bir\brt\btu\bua\bal\bl M\bMa\bac\bch\bhi\bin\bne\bes\bs
+
+
+
+
+ 7\b7.\b.1\b1.\b. A\bA w\bwa\bar\brn\bni\bin\bng\bg
+
+ Don't attempt to share filesystems simply by booting two UMLs from the
+ same file. That's the same thing as booting two physical machines
+ from a shared disk. It will result in filesystem corruption.
+
+
+
+ 7\b7.\b.2\b2.\b. U\bUs\bsi\bin\bng\bg l\bla\bay\bye\ber\bre\bed\bd b\bbl\blo\boc\bck\bk d\bde\bev\bvi\bic\bce\bes\bs
+
+ The way to share a filesystem between two virtual machines is to use
+ the copy-on-write (COW) layering capability of the ubd block driver.
+ As of 2.4.6-2um, the driver supports layering a read-write private
+ device over a read-only shared device. A machine's writes are stored
+ in the private device, while reads come from either device - the
+ private one if the requested block is valid in it, the shared one if
+ not. Using this scheme, the majority of data which is unchanged is
+ shared between an arbitrary number of virtual machines, each of which
+ has a much smaller file containing the changes that it has made. With
+ a large number of UMLs booting from a large root filesystem, this
+ leads to a huge disk space saving. It will also help performance,
+ since the host will be able to cache the shared data using a much
+ smaller amount of memory, so UML disk requests will be served from the
+ host's memory rather than its disks.
+
+
+
+
+ To add a copy-on-write layer to an existing block device file, simply
+ add the name of the COW file to the appropriate ubd switch:
+
+
+ ubd0=root_fs_cow,root_fs_debian_22
+
+
+
+
+ where 'root_fs_cow' is the private COW file and 'root_fs_debian_22' is
+ the existing shared filesystem. The COW file need not exist. If it
+ doesn't, the driver will create and initialize it. Once the COW file
+ has been initialized, it can be used on its own on the command line:
+
+
+ ubd0=root_fs_cow
+
+
+
+
+ The name of the backing file is stored in the COW file header, so it
+ would be redundant to continue specifying it on the command line.
+
+
+
+ 7\b7.\b.3\b3.\b. N\bNo\bot\bte\be!\b!
+
+ When checking the size of the COW file in order to see the gobs of
+ space that you're saving, make sure you use 'ls -ls' to see the actual
+ disk consumption rather than the length of the file. The COW file is
+ sparse, so the length will be very different from the disk usage.
+ Here is a 'ls -l' of a COW file and backing file from one boot and
+ shutdown:
+ host% ls -l cow.debian debian2.2
+ -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
+ -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
+
+
+
+
+ Doesn't look like much saved space, does it? Well, here's 'ls -ls':
+
+
+ host% ls -ls cow.debian debian2.2
+ 880 -rw-r--r-- 1 jdike jdike 492504064 Aug 6 21:16 cow.debian
+ 525832 -rwxrw-rw- 1 jdike jdike 537919488 Aug 6 20:42 debian2.2
+
+
+
+
+ Now, you can see that the COW file has less than a meg of disk, rather
+ than 492 meg.
+
+
+
+ 7\b7.\b.4\b4.\b. A\bAn\bno\bot\bth\bhe\ber\br w\bwa\bar\brn\bni\bin\bng\bg
+
+ Once a filesystem is being used as a readonly backing file for a COW
+ file, do not boot directly from it or modify it in any way. Doing so
+ will invalidate any COW files that are using it. The mtime and size
+ of the backing file are stored in the COW file header at its creation,
+ and they must continue to match. If they don't, the driver will
+ refuse to use the COW file.
+
+
+
+
+ If you attempt to evade this restriction by changing either the
+ backing file or the COW header by hand, you will get a corrupted
+ filesystem.
+
+
+
+
+ Among other things, this means that upgrading the distribution in a
+ backing file and expecting that all of the COW files using it will see
+ the upgrade will not work.
+
+
+
+
+ 7\b7.\b.5\b5.\b. u\bum\bml\bl_\b_m\bmo\boo\bo :\b: M\bMe\ber\brg\bgi\bin\bng\bg a\ba C\bCO\bOW\bW f\bfi\bil\ble\be w\bwi\bit\bth\bh i\bit\bts\bs b\bba\bac\bck\bki\bin\bng\bg f\bfi\bil\ble\be
+
+ Depending on how you use UML and COW devices, it may be advisable to
+ merge the changes in the COW file into the backing file every once in
+ a while.
+
+
+
+
+ The utility that does this is uml_moo. Its usage is
+
+
+ host% uml_moo COW file new backing file
+
+
+
+
+ There's no need to specify the backing file since that information is
+ already in the COW file header. If you're paranoid, boot the new
+ merged file, and if you're happy with it, move it over the old backing
+ file.
+
+
+
+
+ uml_moo creates a new backing file by default as a safety measure. It
+ also has a destructive merge option which will merge the COW file
+ directly into its current backing file. This is really only usable
+ when the backing file only has one COW file associated with it. If
+ there are multiple COWs associated with a backing file, a -d merge of
+ one of them will invalidate all of the others. However, it is
+ convenient if you're short of disk space, and it should also be
+ noticeably faster than a non-destructive merge.
+
+
+
+
+ uml_moo is installed with the UML deb and RPM. If you didn't install
+ UML from one of those packages, you can also get it from the UML
+ utilities <http://user-mode-linux.sourceforge.net/
+ utilities> tar file in tools/moo.
+
+
+
+
+
+
+
+
+ 8\b8.\b. C\bCr\bre\bea\bat\bti\bin\bng\bg f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bms\bs
+
+
+ You may want to create and mount new UML filesystems, either because
+ your root filesystem isn't large enough or because you want to use a
+ filesystem other than ext2.
+
+
+ This was written on the occasion of reiserfs being included in the
+ 2.4.1 kernel pool, and therefore the 2.4.1 UML, so the examples will
+ talk about reiserfs. This information is generic, and the examples
+ should be easy to translate to the filesystem of your choice.
+
+
+ 8\b8.\b.1\b1.\b. C\bCr\bre\bea\bat\bte\be t\bth\bhe\be f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm f\bfi\bil\ble\be
+
+ dd is your friend. All you need to do is tell dd to create an empty
+ file of the appropriate size. I usually make it sparse to save time
+ and to avoid allocating disk space until it's actually used. For
+ example, the following command will create a sparse 100 meg file full
+ of zeroes.
+
+
+ host%
+ dd if=/dev/zero of=new_filesystem seek=100 count=1 bs=1M
+
+
+
+
+
+
+ 8\b8.\b.2\b2.\b. A\bAs\bss\bsi\big\bgn\bn t\bth\bhe\be f\bfi\bil\ble\be t\bto\bo a\ba U\bUM\bML\bL d\bde\bev\bvi\bic\bce\be
+
+ Add an argument like the following to the UML command line:
+
+ ubd4=new_filesystem
+
+
+
+
+ making sure that you use an unassigned ubd device number.
+
+
+
+ 8\b8.\b.3\b3.\b. C\bCr\bre\bea\bat\bti\bin\bng\bg a\ban\bnd\bd m\bmo\bou\bun\bnt\bti\bin\bng\bg t\bth\bhe\be f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
+
+ Make sure that the filesystem is available, either by being built into
+ the kernel, or available as a module, then boot up UML and log in. If
+ the root filesystem doesn't have the filesystem utilities (mkfs, fsck,
+ etc), then get them into UML by way of the net or hostfs.
+
+
+ Make the new filesystem on the device assigned to the new file:
+
+
+ host# mkreiserfs /dev/ubd/4
+
+
+ <----------- MKREISERFSv2 ----------->
+
+ ReiserFS version 3.6.25
+ Block size 4096 bytes
+ Block count 25856
+ Used blocks 8212
+ Journal - 8192 blocks (18-8209), journal header is in block 8210
+ Bitmaps: 17
+ Root block 8211
+ Hash function "r5"
+ ATTENTION: ALL DATA WILL BE LOST ON '/dev/ubd/4'! (y/n)y
+ journal size 8192 (from 18)
+ Initializing journal - 0%....20%....40%....60%....80%....100%
+ Syncing..done.
+
+
+
+
+ Now, mount it:
+
+
+ UML#
+ mount /dev/ubd/4 /mnt
+
+
+
+
+ and you're in business.
+
+
+
+
+
+
+
+
+
+ 9\b9.\b. H\bHo\bos\bst\bt f\bfi\bil\ble\be a\bac\bcc\bce\bes\bss\bs
+
+
+ If you want to access files on the host machine from inside UML, you
+ can treat it as a separate machine and either nfs mount directories
+ from the host or copy files into the virtual machine with scp or rcp.
+ However, since UML is running on the host, it can access those
+ files just like any other process and make them available inside the
+ virtual machine without needing to use the network.
+
+
+ This is now possible with the hostfs virtual filesystem. With it, you
+ can mount a host directory into the UML filesystem and access the
+ files contained in it just as you would on the host.
+
+
+ 9\b9.\b.1\b1.\b. U\bUs\bsi\bin\bng\bg h\bho\bos\bst\btf\bfs\bs
+
+ To begin with, make sure that hostfs is available inside the virtual
+ machine with
+
+
+ UML# cat /proc/filesystems
+
+
+
+ . hostfs should be listed. If it's not, either rebuild the kernel
+ with hostfs configured into it or make sure that hostfs is built as a
+ module and available inside the virtual machine, and insmod it.
+
+
+ Now all you need to do is run mount:
+
+
+ UML# mount none /mnt/host -t hostfs
+
+
+
+
+ will mount the host's / on the virtual machine's /mnt/host.
+
+
+ If you don't want to mount the host root directory, then you can
+ specify a subdirectory to mount with the -o switch to mount:
+
+
+ UML# mount none /mnt/home -t hostfs -o /home
+
+
+
+
+ will mount the hosts's /home on the virtual machine's /mnt/home.
+
+
+
+ 9\b9.\b.2\b2.\b. h\bho\bos\bst\btf\bfs\bs a\bas\bs t\bth\bhe\be r\bro\boo\bot\bt f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
+
+ It's possible to boot from a directory hierarchy on the host using
+ hostfs rather than using the standard filesystem in a file.
+
+ To start, you need that hierarchy. The easiest way is to loop mount
+ an existing root_fs file:
+
+
+ host# mount root_fs uml_root_dir -o loop
+
+
+
+
+ You need to change the filesystem type of / in etc/fstab to be
+ 'hostfs', so that line looks like this:
+
+ /dev/ubd/0 / hostfs defaults 1 1
+
+
+
+
+ Then you need to chown to yourself all the files in that directory
+ that are owned by root. This worked for me:
+
+
+ host# find . -uid 0 -exec chown jdike {} \;
+
+
+
+
+ Next, make sure that your UML kernel has hostfs compiled in, not as a
+ module. Then run UML with the boot device pointing at that directory:
+
+
+ ubd0=/path/to/uml/root/directory
+
+
+
+
+ UML should then boot as it does normally.
+
+
+ 9\b9.\b.3\b3.\b. B\bBu\bui\bil\bld\bdi\bin\bng\bg h\bho\bos\bst\btf\bfs\bs
+
+ If you need to build hostfs because it's not in your kernel, you have
+ two choices:
+
+
+
+ +\bo Compiling hostfs into the kernel:
+
+
+ Reconfigure the kernel and set the 'Host filesystem' option under
+
+
+ +\bo Compiling hostfs as a module:
+
+
+ Reconfigure the kernel and set the 'Host filesystem' option under
+ be in arch/um/fs/hostfs/hostfs.o. Install that in
+ /lib/modules/`uname -r`/fs in the virtual machine, boot it up, and
+
+
+ UML# insmod hostfs
+
+
+
+
+
+
+
+
+
+
+
+
+ 1\b10\b0.\b. T\bTh\bhe\be M\bMa\ban\bna\bag\bge\bem\bme\ben\bnt\bt C\bCo\bon\bns\bso\bol\ble\be
+
+
+
+ The UML management console is a low-level interface to the kernel,
+ somewhat like the i386 SysRq interface. Since there is a full-blown
+ operating system under UML, there is much greater flexibility possible
+ than with the SysRq mechanism.
+
+
+ There are a number of things you can do with the mconsole interface:
+
+ +\bo get the kernel version
+
+ +\bo add and remove devices
+
+ +\bo halt or reboot the machine
+
+ +\bo Send SysRq commands
+
+ +\bo Pause and resume the UML
+
+
+ You need the mconsole client (uml_mconsole) which is present in CVS
+ (/tools/mconsole) in 2.4.5-9um and later, and will be in the RPM in
+ 2.4.6.
+
+
+ You also need CONFIG_MCONSOLE (under 'General Setup') enabled in UML.
+ When you boot UML, you'll see a line like:
+
+
+ mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
+
+
+
+
+ If you specify a unique machine id one the UML command line, i.e.
+
+
+ umid=debian
+
+
+
+
+ you'll see this
+
+
+ mconsole initialized on /home/jdike/.uml/debian/mconsole
+
+
+
+
+ That file is the socket that uml_mconsole will use to communicate with
+ UML. Run it with either the umid or the full path as its argument:
+
+
+ host% uml_mconsole debian
+
+
+
+
+ or
+
+
+ host% uml_mconsole /home/jdike/.uml/debian/mconsole
+
+
+
+
+ You'll get a prompt, at which you can run one of these commands:
+
+ +\bo version
+
+ +\bo halt
+
+ +\bo reboot
+
+ +\bo config
+
+ +\bo remove
+
+ +\bo sysrq
+
+ +\bo help
+
+ +\bo cad
+
+ +\bo stop
+
+ +\bo go
+
+
+ 1\b10\b0.\b.1\b1.\b. v\bve\ber\brs\bsi\bio\bon\bn
+
+ This takes no arguments. It prints the UML version.
+
+
+ (mconsole) version
+ OK Linux usermode 2.4.5-9um #1 Wed Jun 20 22:47:08 EDT 2001 i686
+
+
+
+
+ There are a couple actual uses for this. It's a simple no-op which
+ can be used to check that a UML is running. It's also a way of
+ sending an interrupt to the UML. This is sometimes useful on SMP
+ hosts, where there's a bug which causes signals to UML to be lost,
+ often causing it to appear to hang. Sending such a UML the mconsole
+ version command is a good way to 'wake it up' before networking has
+ been enabled, as it does not do anything to the function of the UML.
+
+
+
+ 1\b10\b0.\b.2\b2.\b. h\bha\bal\blt\bt a\ban\bnd\bd r\bre\beb\bbo\boo\bot\bt
+
+ These take no arguments. They shut the machine down immediately, with
+ no syncing of disks and no clean shutdown of userspace. So, they are
+ pretty close to crashing the machine.
+
+
+ (mconsole) halt
+ OK
+
+
+
+
+
+
+ 1\b10\b0.\b.3\b3.\b. c\bco\bon\bnf\bfi\big\bg
+
+ "config" adds a new device to the virtual machine. Currently the ubd
+ and network drivers support this. It takes one argument, which is the
+ device to add, with the same syntax as the kernel command line.
+
+
+
+
+ (mconsole)
+ config ubd3=/home/jdike/incoming/roots/root_fs_debian22
+
+ OK
+ (mconsole) config eth1=mcast
+ OK
+
+
+
+
+
+
+ 1\b10\b0.\b.4\b4.\b. r\bre\bem\bmo\bov\bve\be
+
+ "remove" deletes a device from the system. Its argument is just the
+ name of the device to be removed. The device must be idle in whatever
+ sense the driver considers necessary. In the case of the ubd driver,
+ the removed block device must not be mounted, swapped on, or otherwise
+ open, and in the case of the network driver, the device must be down.
+
+
+ (mconsole) remove ubd3
+ OK
+ (mconsole) remove eth1
+ OK
+
+
+
+
+
+
+ 1\b10\b0.\b.5\b5.\b. s\bsy\bys\bsr\brq\bq
+
+ This takes one argument, which is a single letter. It calls the
+ generic kernel's SysRq driver, which does whatever is called for by
+ that argument. See the SysRq documentation in Documentation/sysrq.txt
+ in your favorite kernel tree to see what letters are valid and what
+ they do.
+
+
+
+ 1\b10\b0.\b.6\b6.\b. h\bhe\bel\blp\bp
+
+ "help" returns a string listing the valid commands and what each one
+ does.
+
+
+
+ 1\b10\b0.\b.7\b7.\b. c\bca\bad\bd
+
+ This invokes the Ctl-Alt-Del action on init. What exactly this ends
+ up doing is up to /etc/inittab. Normally, it reboots the machine.
+ With UML, this is usually not desired, so if a halt would be better,
+ then find the section of inittab that looks like this
+
+
+ # What to do when CTRL-ALT-DEL is pressed.
+ ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -r now
+
+
+
+
+ and change the command to halt.
+
+
+
+ 1\b10\b0.\b.8\b8.\b. s\bst\bto\bop\bp
+
+ This puts the UML in a loop reading mconsole requests until a 'go'
+ mconsole command is received. This is very useful for making backups
+ of UML filesystems, as the UML can be stopped, then synced via 'sysrq
+ s', so that everything is written to the filesystem. You can then copy
+ the filesystem and then send the UML 'go' via mconsole.
+
+
+ Note that a UML running with more than one CPU will have problems
+ after you send the 'stop' command, as only one CPU will be held in a
+ mconsole loop and all others will continue as normal. This is a bug,
+ and will be fixed.
+
+
+
+ 1\b10\b0.\b.9\b9.\b. g\bgo\bo
+
+ This resumes a UML after being paused by a 'stop' command. Note that
+ when the UML has resumed, TCP connections may have timed out and if
+ the UML is paused for a long period of time, crond might go a little
+ crazy, running all the jobs it didn't do earlier.
+
+
+
+
+
+
+
+
+ 1\b11\b1.\b. K\bKe\ber\brn\bne\bel\bl d\bde\beb\bbu\bug\bgg\bgi\bin\bng\bg
+
+
+ N\bNo\bot\bte\be:\b: The interface that makes debugging, as described here, possible
+ is present in 2.4.0-test6 kernels and later.
+
+
+ Since the user-mode kernel runs as a normal Linux process, it is
+ possible to debug it with gdb almost like any other process. It is
+ slightly different because the kernel's threads are already being
+ ptraced for system call interception, so gdb can't ptrace them.
+ However, a mechanism has been added to work around that problem.
+
+
+ In order to debug the kernel, you need build it from source. See
+ ``Compiling the kernel and modules'' for information on doing that.
+ Make sure that you enable CONFIG_DEBUGSYM and CONFIG_PT_PROXY during
+ the config. These will compile the kernel with -g, and enable the
+ ptrace proxy so that gdb works with UML, respectively.
+
+
+
+
+ 1\b11\b1.\b.1\b1.\b. S\bSt\bta\bar\brt\bti\bin\bng\bg t\bth\bhe\be k\bke\ber\brn\bne\bel\bl u\bun\bnd\bde\ber\br g\bgd\bdb\bb
+
+ You can have the kernel running under the control of gdb from the
+ beginning by putting 'debug' on the command line. You will get an
+ xterm with gdb running inside it. The kernel will send some commands
+ to gdb which will leave it stopped at the beginning of start_kernel.
+ At this point, you can get things going with 'next', 'step', or
+ 'cont'.
+
+
+ There is a transcript of a debugging session here <debug-
+ session.html> , with breakpoints being set in the scheduler and in an
+ interrupt handler.
+ 1\b11\b1.\b.2\b2.\b. E\bEx\bxa\bam\bmi\bin\bni\bin\bng\bg s\bsl\ble\bee\bep\bpi\bin\bng\bg p\bpr\bro\boc\bce\bes\bss\bse\bes\bs
+
+ Not every bug is evident in the currently running process. Sometimes,
+ processes hang in the kernel when they shouldn't because they've
+ deadlocked on a semaphore or something similar. In this case, when
+ you ^C gdb and get a backtrace, you will see the idle thread, which
+ isn't very relevant.
+
+
+ What you want is the stack of whatever process is sleeping when it
+ shouldn't be. You need to figure out which process that is, which is
+ generally fairly easy. Then you need to get its host process id,
+ which you can do either by looking at ps on the host or at
+ task.thread.extern_pid in gdb.
+
+
+ Now what you do is this:
+
+ +\bo detach from the current thread
+
+
+ (UML gdb) det
+
+
+
+
+
+ +\bo attach to the thread you are interested in
+
+
+ (UML gdb) att <host pid>
+
+
+
+
+
+ +\bo look at its stack and anything else of interest
+
+
+ (UML gdb) bt
+
+
+
+
+ Note that you can't do anything at this point that requires that a
+ process execute, e.g. calling a function
+
+ +\bo when you're done looking at that process, reattach to the current
+ thread and continue it
+
+
+ (UML gdb)
+ att 1
+
+
+
+
+
+
+ (UML gdb)
+ c
+
+
+
+
+ Here, specifying any pid which is not the process id of a UML thread
+ will cause gdb to reattach to the current thread. I commonly use 1,
+ but any other invalid pid would work.
+
+
+
+ 1\b11\b1.\b.3\b3.\b. R\bRu\bun\bnn\bni\bin\bng\bg d\bdd\bdd\bd o\bon\bn U\bUM\bML\bL
+
+ ddd works on UML, but requires a special kludge. The process goes
+ like this:
+
+ +\bo Start ddd
+
+
+ host% ddd linux
+
+
+
+
+
+ +\bo With ps, get the pid of the gdb that ddd started. You can ask the
+ gdb to tell you, but for some reason that confuses things and
+ causes a hang.
+
+ +\bo run UML with 'debug=parent gdb-pid=<pid>' added to the command line
+ - it will just sit there after you hit return
+
+ +\bo type 'att 1' to the ddd gdb and you will see something like
+
+
+ 0xa013dc51 in __kill ()
+
+
+ (gdb)
+
+
+
+
+
+ +\bo At this point, type 'c', UML will boot up, and you can use ddd just
+ as you do on any other process.
+
+
+
+ 1\b11\b1.\b.4\b4.\b. D\bDe\beb\bbu\bug\bgg\bgi\bin\bng\bg m\bmo\bod\bdu\bul\ble\bes\bs
+
+ gdb has support for debugging code which is dynamically loaded into
+ the process. This support is what is needed to debug kernel modules
+ under UML.
+
+
+ Using that support is somewhat complicated. You have to tell gdb what
+ object file you just loaded into UML and where in memory it is. Then,
+ it can read the symbol table, and figure out where all the symbols are
+ from the load address that you provided. It gets more interesting
+ when you load the module again (i.e. after an rmmod). You have to
+ tell gdb to forget about all its symbols, including the main UML ones
+ for some reason, then load then all back in again.
+
+
+ There's an easy way and a hard way to do this. The easy way is to use
+ the umlgdb expect script written by Chandan Kudige. It basically
+ automates the process for you.
+
+
+ First, you must tell it where your modules are. There is a list in
+ the script that looks like this:
+ set MODULE_PATHS {
+ "fat" "/usr/src/uml/linux-2.4.18/fs/fat/fat.o"
+ "isofs" "/usr/src/uml/linux-2.4.18/fs/isofs/isofs.o"
+ "minix" "/usr/src/uml/linux-2.4.18/fs/minix/minix.o"
+ }
+
+
+
+
+ You change that to list the names and paths of the modules that you
+ are going to debug. Then you run it from the toplevel directory of
+ your UML pool and it basically tells you what to do:
+
+
+
+
+ ******** GDB pid is 21903 ********
+ Start UML as: ./linux <kernel switches> debug gdb-pid=21903
+
+
+
+ GNU gdb 5.0rh-5 Red Hat Linux 7.1
+ Copyright 2001 Free Software Foundation, Inc.
+ GDB is free software, covered by the GNU General Public License, and you are
+ welcome to change it and/or distribute copies of it under certain conditions.
+ Type "show copying" to see the conditions.
+ There is absolutely no warranty for GDB. Type "show warranty" for details.
+ This GDB was configured as "i386-redhat-linux"...
+ (gdb) b sys_init_module
+ Breakpoint 1 at 0xa0011923: file module.c, line 349.
+ (gdb) att 1
+
+
+
+
+ After you run UML and it sits there doing nothing, you hit return at
+ the 'att 1' and continue it:
+
+
+ Attaching to program: /home/jdike/linux/2.4/um/./linux, process 1
+ 0xa00f4221 in __kill ()
+ (UML gdb) c
+ Continuing.
+
+
+
+
+ At this point, you debug normally. When you insmod something, the
+ expect magic will kick in and you'll see something like:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ *** Module hostfs loaded ***
+ Breakpoint 1, sys_init_module (name_user=0x805abb0 "hostfs",
+ mod_user=0x8070e00) at module.c:349
+ 349 char *name, *n_name, *name_tmp = NULL;
+ (UML gdb) finish
+ Run till exit from #0 sys_init_module (name_user=0x805abb0 "hostfs",
+ mod_user=0x8070e00) at module.c:349
+ 0xa00e2e23 in execute_syscall (r=0xa8140284) at syscall_kern.c:411
+ 411 else res = EXECUTE_SYSCALL(syscall, regs);
+ Value returned is $1 = 0
+ (UML gdb)
+ p/x (int)module_list + module_list->size_of_struct
+
+ $2 = 0xa9021054
+ (UML gdb) symbol-file ./linux
+ Load new symbol table from "./linux"? (y or n) y
+ Reading symbols from ./linux...
+ done.
+ (UML gdb)
+ add-symbol-file /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o 0xa9021054
+
+ add symbol table from file "/home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o" at
+ .text_addr = 0xa9021054
+ (y or n) y
+
+ Reading symbols from /home/jdike/linux/2.4/um/arch/um/fs/hostfs/hostfs.o...
+ done.
+ (UML gdb) p *module_list
+ $1 = {size_of_struct = 84, next = 0xa0178720, name = 0xa9022de0 "hostfs",
+ size = 9016, uc = {usecount = {counter = 0}, pad = 0}, flags = 1,
+ nsyms = 57, ndeps = 0, syms = 0xa9023170, deps = 0x0, refs = 0x0,
+ init = 0xa90221f0 <init_hostfs>, cleanup = 0xa902222c <exit_hostfs>,
+ ex_table_start = 0x0, ex_table_end = 0x0, persist_start = 0x0,
+ persist_end = 0x0, can_unload = 0, runsize = 0, kallsyms_start = 0x0,
+ kallsyms_end = 0x0,
+ archdata_start = 0x1b855 <Address 0x1b855 out of bounds>,
+ archdata_end = 0xe5890000 <Address 0xe5890000 out of bounds>,
+ kernel_data = 0xf689c35d <Address 0xf689c35d out of bounds>}
+ >> Finished loading symbols for hostfs ...
+
+
+
+
+ That's the easy way. It's highly recommended. The hard way is
+ described below in case you're interested in what's going on.
+
+
+ Boot the kernel under the debugger and load the module with insmod or
+ modprobe. With gdb, do:
+
+
+ (UML gdb) p module_list
+
+
+
+
+ This is a list of modules that have been loaded into the kernel, with
+ the most recently loaded module first. Normally, the module you want
+ is at module_list. If it's not, walk down the next links, looking at
+ the name fields until find the module you want to debug. Take the
+ address of that structure, and add module.size_of_struct (which in
+ 2.4.10 kernels is 96 (0x60)) to it. Gdb can make this hard addition
+ for you :-):
+
+
+
+ (UML gdb)
+ printf "%#x\n", (int)module_list module_list->size_of_struct
+
+
+
+
+ The offset from the module start occasionally changes (before 2.4.0,
+ it was module.size_of_struct + 4), so it's a good idea to check the
+ init and cleanup addresses once in a while, as describe below. Now
+ do:
+
+
+ (UML gdb)
+ add-symbol-file /path/to/module/on/host that_address
+
+
+
+
+ Tell gdb you really want to do it, and you're in business.
+
+
+ If there's any doubt that you got the offset right, like breakpoints
+ appear not to work, or they're appearing in the wrong place, you can
+ check it by looking at the module structure. The init and cleanup
+ fields should look like:
+
+
+ init = 0x588066b0 <init_hostfs>, cleanup = 0x588066c0 <exit_hostfs>
+
+
+
+
+ with no offsets on the symbol names. If the names are right, but they
+ are offset, then the offset tells you how much you need to add to the
+ address you gave to add-symbol-file.
+
+
+ When you want to load in a new version of the module, you need to get
+ gdb to forget about the old one. The only way I've found to do that
+ is to tell gdb to forget about all symbols that it knows about:
+
+
+ (UML gdb) symbol-file
+
+
+
+
+ Then reload the symbols from the kernel binary:
+
+
+ (UML gdb) symbol-file /path/to/kernel
+
+
+
+
+ and repeat the process above. You'll also need to re-enable break-
+ points. They were disabled when you dumped all the symbols because
+ gdb couldn't figure out where they should go.
+
+
+
+ 1\b11\b1.\b.5\b5.\b. A\bAt\btt\bta\bac\bch\bhi\bin\bng\bg g\bgd\bdb\bb t\bto\bo t\bth\bhe\be k\bke\ber\brn\bne\bel\bl
+
+ If you don't have the kernel running under gdb, you can attach gdb to
+ it later by sending the tracing thread a SIGUSR1. The first line of
+ the console output identifies its pid:
+ tracing thread pid = 20093
+
+
+
+
+ When you send it the signal:
+
+
+ host% kill -USR1 20093
+
+
+
+
+ you will get an xterm with gdb running in it.
+
+
+ If you have the mconsole compiled into UML, then the mconsole client
+ can be used to start gdb:
+
+
+ (mconsole) (mconsole) config gdb=xterm
+
+
+
+
+ will fire up an xterm with gdb running in it.
+
+
+
+ 1\b11\b1.\b.6\b6.\b. U\bUs\bsi\bin\bng\bg a\bal\blt\bte\ber\brn\bna\bat\bte\be d\bde\beb\bbu\bug\bgg\bge\ber\brs\bs
+
+ UML has support for attaching to an already running debugger rather
+ than starting gdb itself. This is present in CVS as of 17 Apr 2001.
+ I sent it to Alan for inclusion in the ac tree, and it will be in my
+ 2.4.4 release.
+
+
+ This is useful when gdb is a subprocess of some UI, such as emacs or
+ ddd. It can also be used to run debuggers other than gdb on UML.
+ Below is an example of using strace as an alternate debugger.
+
+
+ To do this, you need to get the pid of the debugger and pass it in
+ with the
+
+
+ If you are using gdb under some UI, then tell it to 'att 1', and
+ you'll find yourself attached to UML.
+
+
+ If you are using something other than gdb as your debugger, then
+ you'll need to get it to do the equivalent of 'att 1' if it doesn't do
+ it automatically.
+
+
+ An example of an alternate debugger is strace. You can strace the
+ actual kernel as follows:
+
+ +\bo Run the following in a shell
+
+
+ host%
+ sh -c 'echo pid=$$; echo -n hit return; read x; exec strace -p 1 -o strace.out'
+
+
+
+ +\bo Run UML with 'debug' and 'gdb-pid=<pid>' with the pid printed out
+ by the previous command
+
+ +\bo Hit return in the shell, and UML will start running, and strace
+ output will start accumulating in the output file.
+
+ Note that this is different from running
+
+
+ host% strace ./linux
+
+
+
+
+ That will strace only the main UML thread, the tracing thread, which
+ doesn't do any of the actual kernel work. It just oversees the vir-
+ tual machine. In contrast, using strace as described above will show
+ you the low-level activity of the virtual machine.
+
+
+
+
+
+ 1\b12\b2.\b. K\bKe\ber\brn\bne\bel\bl d\bde\beb\bbu\bug\bgg\bgi\bin\bng\bg e\bex\bxa\bam\bmp\bpl\ble\bes\bs
+
+ 1\b12\b2.\b.1\b1.\b. T\bTh\bhe\be c\bca\bas\bse\be o\bof\bf t\bth\bhe\be h\bhu\bun\bng\bg f\bfs\bsc\bck\bk
+
+ When booting up the kernel, fsck failed, and dropped me into a shell
+ to fix things up. I ran fsck -y, which hung:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Setting hostname uml [ OK ]
+ Checking root filesystem
+ /dev/fhd0 was not cleanly unmounted, check forced.
+ Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
+
+ /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
+ (i.e., without -a or -p options)
+ [ FAILED ]
+
+ *** An error occurred during the file system check.
+ *** Dropping you to a shell; the system will reboot
+ *** when you leave the shell.
+ Give root password for maintenance
+ (or type Control-D for normal startup):
+
+ [root@uml /root]# fsck -y /dev/fhd0
+ fsck -y /dev/fhd0
+ Parallelizing fsck version 1.14 (9-Jan-1999)
+ e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
+ /dev/fhd0 contains a file system with errors, check forced.
+ Pass 1: Checking inodes, blocks, and sizes
+ Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
+
+ Inode 19780, i_blocks is 1548, should be 540. Fix? yes
+
+ Pass 2: Checking directory structure
+ Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
+
+ Directory inode 11858, block 0, offset 0: directory corrupted
+ Salvage? yes
+
+ Missing '.' in directory inode 11858.
+ Fix? yes
+
+ Missing '..' in directory inode 11858.
+ Fix? yes
+
+
+
+
+
+ The standard drill in this sort of situation is to fire up gdb on the
+ signal thread, which, in this case, was pid 1935. In another window,
+ I run gdb and attach pid 1935.
+
+
+
+
+ ~/linux/2.3.26/um 1016: gdb linux
+ GNU gdb 4.17.0.11 with Linux support
+ Copyright 1998 Free Software Foundation, Inc.
+ GDB is free software, covered by the GNU General Public License, and you are
+ welcome to change it and/or distribute copies of it under certain conditions.
+ Type "show copying" to see the conditions.
+ There is absolutely no warranty for GDB. Type "show warranty" for details.
+ This GDB was configured as "i386-redhat-linux"...
+
+ (gdb) att 1935
+ Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1935
+ 0x100756d9 in __wait4 ()
+
+
+
+
+
+
+ Let's see what's currently running:
+
+
+
+ (gdb) p current_task.pid
+ $1 = 0
+
+
+
+
+
+ It's the idle thread, which means that fsck went to sleep for some
+ reason and never woke up.
+
+
+ Let's guess that the last process in the process list is fsck:
+
+
+
+ (gdb) p current_task.prev_task.comm
+ $13 = "fsck.ext2\000\000\000\000\000\000"
+
+
+
+
+
+ It is, so let's see what it thinks it's up to:
+
+
+
+ (gdb) p current_task.prev_task.thread
+ $14 = {extern_pid = 1980, tracing = 0, want_tracing = 0, forking = 0,
+ kernel_stack_page = 0, signal_stack = 1342627840, syscall = {id = 4, args = {
+ 3, 134973440, 1024, 0, 1024}, have_result = 0, result = 50590720},
+ request = {op = 2, u = {exec = {ip = 1350467584, sp = 2952789424}, fork = {
+ regs = {1350467584, 2952789424, 0 <repeats 15 times>}, sigstack = 0,
+ pid = 0}, switch_to = 0x507e8000, thread = {proc = 0x507e8000,
+ arg = 0xaffffdb0, flags = 0, new_pid = 0}, input_request = {
+ op = 1350467584, fd = -1342177872, proc = 0, pid = 0}}}}
+
+
+
+
+
+ The interesting things here are the fact that its .thread.syscall.id
+ is __NR_write (see the big switch in arch/um/kernel/syscall_kern.c or
+ the defines in include/asm-um/arch/unistd.h), and that it never
+ returned. Also, its .request.op is OP_SWITCH (see
+ arch/um/include/user_util.h). These mean that it went into a write,
+ and, for some reason, called schedule().
+
+
+ The fact that it never returned from write means that its stack should
+ be fairly interesting. Its pid is 1980 (.thread.extern_pid). That
+ process is being ptraced by the signal thread, so it must be detached
+ before gdb can attach it:
+
+
+
+
+
+
+
+
+
+
+ (gdb) call detach(1980)
+
+ Program received signal SIGSEGV, Segmentation fault.
+ <function called from gdb>
+ The program being debugged stopped while in a function called from GDB.
+ When the function (detach) is done executing, GDB will silently
+ stop (instead of continuing to evaluate the expression containing
+ the function call).
+ (gdb) call detach(1980)
+ $15 = 0
+
+
+
+
+
+ The first detach segfaults for some reason, and the second one
+ succeeds.
+
+
+ Now I detach from the signal thread, attach to the fsck thread, and
+ look at its stack:
+
+
+ (gdb) det
+ Detaching from program: /home/dike/linux/2.3.26/um/linux Pid 1935
+ (gdb) att 1980
+ Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 1980
+ 0x10070451 in __kill ()
+ (gdb) bt
+ #0 0x10070451 in __kill ()
+ #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
+ #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
+ at process_kern.c:156
+ #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
+ at process_kern.c:161
+ #4 0x10001d12 in schedule () at sched.c:777
+ #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
+ #6 0x1006aa10 in __down_failed () at semaphore.c:157
+ #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
+ #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
+ #9 <signal handler called>
+ #10 0x10155404 in errno ()
+ #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
+ #12 0x1006c5d8 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
+ #13 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
+ #14 <signal handler called>
+ #15 0xc0fd in ?? ()
+ #16 0x10016647 in sys_write (fd=3,
+ buf=0x80b8800 <Address 0x80b8800 out of bounds>, count=1024)
+ at read_write.c:159
+ #17 0x1006d5b3 in execute_syscall (syscall=4, args=0x5006ef08)
+ at syscall_kern.c:254
+ #18 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
+ #19 <signal handler called>
+ #20 0x400dc8b0 in ?? ()
+
+
+
+
+
+ The interesting things here are :
+
+ +\bo There are two segfaults on this stack (frames 9 and 14)
+
+ +\bo The first faulting address (frame 11) is 0x50000800
+
+ (gdb) p (void *)1342179328
+ $16 = (void *) 0x50000800
+
+
+
+
+
+ The initial faulting address is interesting because it is on the idle
+ thread's stack. I had been seeing the idle thread segfault for no
+ apparent reason, and the cause looked like stack corruption. In hopes
+ of catching the culprit in the act, I had turned off all protections
+ to that stack while the idle thread wasn't running. This apparently
+ tripped that trap.
+
+
+ However, the more immediate problem is that second segfault and I'm
+ going to concentrate on that. First, I want to see where the fault
+ happened, so I have to go look at the sigcontent struct in frame 8:
+
+
+
+ (gdb) up
+ #1 0x10068ccd in usr1_pid (pid=1980) at process.c:30
+ 30 kill(pid, SIGUSR1);
+ (gdb)
+ #2 0x1006a03f in _switch_to (prev=0x50072000, next=0x507e8000)
+ at process_kern.c:156
+ 156 usr1_pid(getpid());
+ (gdb)
+ #3 0x1006a052 in switch_to (prev=0x50072000, next=0x507e8000, last=0x50072000)
+ at process_kern.c:161
+ 161 _switch_to(prev, next);
+ (gdb)
+ #4 0x10001d12 in schedule () at sched.c:777
+ 777 switch_to(prev, next, prev);
+ (gdb)
+ #5 0x1006a744 in __down (sem=0x507d241c) at semaphore.c:71
+ 71 schedule();
+ (gdb)
+ #6 0x1006aa10 in __down_failed () at semaphore.c:157
+ 157 }
+ (gdb)
+ #7 0x1006c5d8 in segv_handler (sc=0x5006e940) at trap_user.c:174
+ 174 segv(sc->cr2, sc->err & 2);
+ (gdb)
+ #8 0x1006c5ec in kern_segv_handler (sig=11) at trap_user.c:182
+ 182 segv_handler(sc);
+ (gdb) p *sc
+ Cannot access memory at address 0x0.
+
+
+
+
+ That's not very useful, so I'll try a more manual method:
+
+
+ (gdb) p *((struct sigcontext *) (&sig + 1))
+ $19 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
+ __dsh = 0, edi = 1342179328, esi = 1350378548, ebp = 1342630440,
+ esp = 1342630420, ebx = 1348150624, edx = 1280, ecx = 0, eax = 0,
+ trapno = 14, err = 4, eip = 268480945, cs = 35, __csh = 0, eflags = 66118,
+ esp_at_signal = 1342630420, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
+ cr2 = 1280}
+
+
+
+ The ip is in handle_mm_fault:
+
+
+ (gdb) p (void *)268480945
+ $20 = (void *) 0x1000b1b1
+ (gdb) i sym $20
+ handle_mm_fault + 57 in section .text
+
+
+
+
+
+ Specifically, it's in pte_alloc:
+
+
+ (gdb) i line *$20
+ Line 124 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b1b1 <handle_mm_fault+57>
+ and ends at 0x1000b1b7 <handle_mm_fault+63>.
+
+
+
+
+
+ To find where in handle_mm_fault this is, I'll jump forward in the
+ code until I see an address in that procedure:
+
+
+
+ (gdb) i line *0x1000b1c0
+ Line 126 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b1b7 <handle_mm_fault+63>
+ and ends at 0x1000b1c3 <handle_mm_fault+75>.
+ (gdb) i line *0x1000b1d0
+ Line 131 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b1d0 <handle_mm_fault+88>
+ and ends at 0x1000b1da <handle_mm_fault+98>.
+ (gdb) i line *0x1000b1e0
+ Line 61 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b1da <handle_mm_fault+98>
+ and ends at 0x1000b1e1 <handle_mm_fault+105>.
+ (gdb) i line *0x1000b1f0
+ Line 134 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b1f0 <handle_mm_fault+120>
+ and ends at 0x1000b200 <handle_mm_fault+136>.
+ (gdb) i line *0x1000b200
+ Line 135 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b200 <handle_mm_fault+136>
+ and ends at 0x1000b208 <handle_mm_fault+144>.
+ (gdb) i line *0x1000b210
+ Line 139 of "/home/dike/linux/2.3.26/um/include/asm/pgalloc.h"
+ starts at address 0x1000b210 <handle_mm_fault+152>
+ and ends at 0x1000b219 <handle_mm_fault+161>.
+ (gdb) i line *0x1000b220
+ Line 1168 of "memory.c" starts at address 0x1000b21e <handle_mm_fault+166>
+ and ends at 0x1000b222 <handle_mm_fault+170>.
+
+
+
+
+
+ Something is apparently wrong with the page tables or vma_structs, so
+ lets go back to frame 11 and have a look at them:
+
+
+
+ #11 0x1006c0aa in segv (address=1342179328, is_write=2) at trap_kern.c:50
+ 50 handle_mm_fault(current, vma, address, is_write);
+ (gdb) call pgd_offset_proc(vma->vm_mm, address)
+ $22 = (pgd_t *) 0x80a548c
+
+
+
+
+
+ That's pretty bogus. Page tables aren't supposed to be in process
+ text or data areas. Let's see what's in the vma:
+
+
+ (gdb) p *vma
+ $23 = {vm_mm = 0x507d2434, vm_start = 0, vm_end = 134512640,
+ vm_next = 0x80a4f8c, vm_page_prot = {pgprot = 0}, vm_flags = 31200,
+ vm_avl_height = 2058, vm_avl_left = 0x80a8c94, vm_avl_right = 0x80d1000,
+ vm_next_share = 0xaffffdb0, vm_pprev_share = 0xaffffe63,
+ vm_ops = 0xaffffe7a, vm_pgoff = 2952789626, vm_file = 0xafffffec,
+ vm_private_data = 0x62}
+ (gdb) p *vma.vm_mm
+ $24 = {mmap = 0x507d2434, mmap_avl = 0x0, mmap_cache = 0x8048000,
+ pgd = 0x80a4f8c, mm_users = {counter = 0}, mm_count = {counter = 134904288},
+ map_count = 134909076, mmap_sem = {count = {counter = 135073792},
+ sleepers = -1342177872, wait = {lock = <optimized out or zero length>,
+ task_list = {next = 0xaffffe63, prev = 0xaffffe7a},
+ __magic = -1342177670, __creator = -1342177300}, __magic = 98},
+ page_table_lock = {}, context = 138, start_code = 0, end_code = 0,
+ start_data = 0, end_data = 0, start_brk = 0, brk = 0, start_stack = 0,
+ arg_start = 0, arg_end = 0, env_start = 0, env_end = 0, rss = 1350381536,
+ total_vm = 0, locked_vm = 0, def_flags = 0, cpu_vm_mask = 0, swap_cnt = 0,
+ swap_address = 0, segments = 0x0}
+
+
+
+
+
+ This also pretty bogus. With all of the 0x80xxxxx and 0xaffffxxx
+ addresses, this is looking like a stack was plonked down on top of
+ these structures. Maybe it's a stack overflow from the next page:
+
+
+
+ (gdb) p vma
+ $25 = (struct vm_area_struct *) 0x507d2434
+
+
+
+
+
+ That's towards the lower quarter of the page, so that would have to
+ have been pretty heavy stack overflow:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ (gdb) x/100x $25
+ 0x507d2434: 0x507d2434 0x00000000 0x08048000 0x080a4f8c
+ 0x507d2444: 0x00000000 0x080a79e0 0x080a8c94 0x080d1000
+ 0x507d2454: 0xaffffdb0 0xaffffe63 0xaffffe7a 0xaffffe7a
+ 0x507d2464: 0xafffffec 0x00000062 0x0000008a 0x00000000
+ 0x507d2474: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2484: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2494: 0x00000000 0x00000000 0x507d2fe0 0x00000000
+ 0x507d24a4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d24b4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d24c4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d24d4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d24e4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d24f4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2504: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2514: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2524: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2534: 0x00000000 0x00000000 0x507d25dc 0x00000000
+ 0x507d2544: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2554: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2564: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2574: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2584: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d2594: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d25a4: 0x00000000 0x00000000 0x00000000 0x00000000
+ 0x507d25b4: 0x00000000 0x00000000 0x00000000 0x00000000
+
+
+
+
+
+ It's not stack overflow. The only "stack-like" piece of this data is
+ the vma_struct itself.
+
+
+ At this point, I don't see any avenues to pursue, so I just have to
+ admit that I have no idea what's going on. What I will do, though, is
+ stick a trap on the segfault handler which will stop if it sees any
+ writes to the idle thread's stack. That was the thing that happened
+ first, and it may be that if I can catch it immediately, what's going
+ on will be somewhat clearer.
+
+
+ 1\b12\b2.\b.2\b2.\b. E\bEp\bpi\bis\bso\bod\bde\be 2\b2:\b: T\bTh\bhe\be c\bca\bas\bse\be o\bof\bf t\bth\bhe\be h\bhu\bun\bng\bg f\bfs\bsc\bck\bk
+
+ After setting a trap in the SEGV handler for accesses to the signal
+ thread's stack, I reran the kernel.
+
+
+ fsck hung again, this time by hitting the trap:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Setting hostname uml [ OK ]
+ Checking root filesystem
+ /dev/fhd0 contains a file system with errors, check forced.
+ Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780.
+
+ /dev/fhd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
+ (i.e., without -a or -p options)
+ [ FAILED ]
+
+ *** An error occurred during the file system check.
+ *** Dropping you to a shell; the system will reboot
+ *** when you leave the shell.
+ Give root password for maintenance
+ (or type Control-D for normal startup):
+
+ [root@uml /root]# fsck -y /dev/fhd0
+ fsck -y /dev/fhd0
+ Parallelizing fsck version 1.14 (9-Jan-1999)
+ e2fsck 1.14, 9-Jan-1999 for EXT2 FS 0.5b, 95/08/09
+ /dev/fhd0 contains a file system with errors, check forced.
+ Pass 1: Checking inodes, blocks, and sizes
+ Error reading block 86894 (Attempt to read block from filesystem resulted in short read) while reading indirect blocks of inode 19780. Ignore error? yes
+
+ Pass 2: Checking directory structure
+ Error reading block 49405 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
+
+ Directory inode 11858, block 0, offset 0: directory corrupted
+ Salvage? yes
+
+ Missing '.' in directory inode 11858.
+ Fix? yes
+
+ Missing '..' in directory inode 11858.
+ Fix? yes
+
+ Untested (4127) [100fe44c]: trap_kern.c line 31
+
+
+
+
+
+ I need to get the signal thread to detach from pid 4127 so that I can
+ attach to it with gdb. This is done by sending it a SIGUSR1, which is
+ caught by the signal thread, which detaches the process:
+
+
+ kill -USR1 4127
+
+
+
+
+
+ Now I can run gdb on it:
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ~/linux/2.3.26/um 1034: gdb linux
+ GNU gdb 4.17.0.11 with Linux support
+ Copyright 1998 Free Software Foundation, Inc.
+ GDB is free software, covered by the GNU General Public License, and you are
+ welcome to change it and/or distribute copies of it under certain conditions.
+ Type "show copying" to see the conditions.
+ There is absolutely no warranty for GDB. Type "show warranty" for details.
+ This GDB was configured as "i386-redhat-linux"...
+ (gdb) att 4127
+ Attaching to program `/home/dike/linux/2.3.26/um/linux', Pid 4127
+ 0x10075891 in __libc_nanosleep ()
+
+
+
+
+
+ The backtrace shows that it was in a write and that the fault address
+ (address in frame 3) is 0x50000800, which is right in the middle of
+ the signal thread's stack page:
+
+
+ (gdb) bt
+ #0 0x10075891 in __libc_nanosleep ()
+ #1 0x1007584d in __sleep (seconds=1000000)
+ at ../sysdeps/unix/sysv/linux/sleep.c:78
+ #2 0x1006ce9a in stop () at user_util.c:191
+ #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
+ #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
+ #5 0x1006c63c in kern_segv_handler (sig=11) at trap_user.c:182
+ #6 <signal handler called>
+ #7 0xc0fd in ?? ()
+ #8 0x10016647 in sys_write (fd=3, buf=0x80b8800 "R.", count=1024)
+ at read_write.c:159
+ #9 0x1006d603 in execute_syscall (syscall=4, args=0x5006ef08)
+ at syscall_kern.c:254
+ #10 0x1006af87 in really_do_syscall (sig=12) at syscall_user.c:35
+ #11 <signal handler called>
+ #12 0x400dc8b0 in ?? ()
+ #13 <signal handler called>
+ #14 0x400dc8b0 in ?? ()
+ #15 0x80545fd in ?? ()
+ #16 0x804daae in ?? ()
+ #17 0x8054334 in ?? ()
+ #18 0x804d23e in ?? ()
+ #19 0x8049632 in ?? ()
+ #20 0x80491d2 in ?? ()
+ #21 0x80596b5 in ?? ()
+ (gdb) p (void *)1342179328
+ $3 = (void *) 0x50000800
+
+
+
+
+
+ Going up the stack to the segv_handler frame and looking at where in
+ the code the access happened shows that it happened near line 110 of
+ block_dev.c:
+
+
+
+
+
+
+
+
+
+ (gdb) up
+ #1 0x1007584d in __sleep (seconds=1000000)
+ at ../sysdeps/unix/sysv/linux/sleep.c:78
+ ../sysdeps/unix/sysv/linux/sleep.c:78: No such file or directory.
+ (gdb)
+ #2 0x1006ce9a in stop () at user_util.c:191
+ 191 while(1) sleep(1000000);
+ (gdb)
+ #3 0x1006bf88 in segv (address=1342179328, is_write=2) at trap_kern.c:31
+ 31 KERN_UNTESTED();
+ (gdb)
+ #4 0x1006c628 in segv_handler (sc=0x5006eaf8) at trap_user.c:174
+ 174 segv(sc->cr2, sc->err & 2);
+ (gdb) p *sc
+ $1 = {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
+ __dsh = 0, edi = 1342179328, esi = 134973440, ebp = 1342631484,
+ esp = 1342630864, ebx = 256, edx = 0, ecx = 256, eax = 1024, trapno = 14,
+ err = 6, eip = 268550834, cs = 35, __csh = 0, eflags = 66070,
+ esp_at_signal = 1342630864, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0,
+ cr2 = 1342179328}
+ (gdb) p (void *)268550834
+ $2 = (void *) 0x1001c2b2
+ (gdb) i sym $2
+ block_write + 1090 in section .text
+ (gdb) i line *$2
+ Line 209 of "/home/dike/linux/2.3.26/um/include/asm/arch/string.h"
+ starts at address 0x1001c2a1 <block_write+1073>
+ and ends at 0x1001c2bf <block_write+1103>.
+ (gdb) i line *0x1001c2c0
+ Line 110 of "block_dev.c" starts at address 0x1001c2bf <block_write+1103>
+ and ends at 0x1001c2e3 <block_write+1139>.
+
+
+
+
+
+ Looking at the source shows that the fault happened during a call to
+ copy_to_user to copy the data into the kernel:
+
+
+ 107 count -= chars;
+ 108 copy_from_user(p,buf,chars);
+ 109 p += chars;
+ 110 buf += chars;
+
+
+
+
+
+ p is the pointer which must contain 0x50000800, since buf contains
+ 0x80b8800 (frame 8 above). It is defined as:
+
+
+ p = offset + bh->b_data;
+
+
+
+
+
+ I need to figure out what bh is, and it just so happens that bh is
+ passed as an argument to mark_buffer_uptodate and mark_buffer_dirty a
+ few lines later, so I do a little disassembly:
+
+
+
+
+ (gdb) disas 0x1001c2bf 0x1001c2e0
+ Dump of assembler code from 0x1001c2bf to 0x1001c2d0:
+ 0x1001c2bf <block_write+1103>: addl %eax,0xc(%ebp)
+ 0x1001c2c2 <block_write+1106>: movl 0xfffffdd4(%ebp),%edx
+ 0x1001c2c8 <block_write+1112>: btsl $0x0,0x18(%edx)
+ 0x1001c2cd <block_write+1117>: btsl $0x1,0x18(%edx)
+ 0x1001c2d2 <block_write+1122>: sbbl %ecx,%ecx
+ 0x1001c2d4 <block_write+1124>: testl %ecx,%ecx
+ 0x1001c2d6 <block_write+1126>: jne 0x1001c2e3 <block_write+1139>
+ 0x1001c2d8 <block_write+1128>: pushl $0x0
+ 0x1001c2da <block_write+1130>: pushl %edx
+ 0x1001c2db <block_write+1131>: call 0x1001819c <__mark_buffer_dirty>
+ End of assembler dump.
+
+
+
+
+
+ At that point, bh is in %edx (address 0x1001c2da), which is calculated
+ at 0x1001c2c2 as %ebp + 0xfffffdd4, so I figure exactly what that is,
+ taking %ebp from the sigcontext_struct above:
+
+
+ (gdb) p (void *)1342631484
+ $5 = (void *) 0x5006ee3c
+ (gdb) p 0x5006ee3c+0xfffffdd4
+ $6 = 1342630928
+ (gdb) p (void *)$6
+ $7 = (void *) 0x5006ec10
+ (gdb) p *((void **)$7)
+ $8 = (void *) 0x50100200
+
+
+
+
+
+ Now, I look at the structure to see what's in it, and particularly,
+ what its b_data field contains:
+
+
+ (gdb) p *((struct buffer_head *)0x50100200)
+ $13 = {b_next = 0x50289380, b_blocknr = 49405, b_size = 1024, b_list = 0,
+ b_dev = 15872, b_count = {counter = 1}, b_rdev = 15872, b_state = 24,
+ b_flushtime = 0, b_next_free = 0x501001a0, b_prev_free = 0x50100260,
+ b_this_page = 0x501001a0, b_reqnext = 0x0, b_pprev = 0x507fcf58,
+ b_data = 0x50000800 "", b_page = 0x50004000,
+ b_end_io = 0x10017f60 <end_buffer_io_sync>, b_dev_id = 0x0,
+ b_rsector = 98810, b_wait = {lock = <optimized out or zero length>,
+ task_list = {next = 0x50100248, prev = 0x50100248}, __magic = 1343226448,
+ __creator = 0}, b_kiobuf = 0x0}
+
+
+
+
+
+ The b_data field is indeed 0x50000800, so the question becomes how
+ that happened. The rest of the structure looks fine, so this probably
+ is not a case of data corruption. It happened on purpose somehow.
+
+
+ The b_page field is a pointer to the page_struct representing the
+ 0x50000000 page. Looking at it shows the kernel's idea of the state
+ of that page:
+
+
+
+ (gdb) p *$13.b_page
+ $17 = {list = {next = 0x50004a5c, prev = 0x100c5174}, mapping = 0x0,
+ index = 0, next_hash = 0x0, count = {counter = 1}, flags = 132, lru = {
+ next = 0x50008460, prev = 0x50019350}, wait = {
+ lock = <optimized out or zero length>, task_list = {next = 0x50004024,
+ prev = 0x50004024}, __magic = 1342193708, __creator = 0},
+ pprev_hash = 0x0, buffers = 0x501002c0, virtual = 1342177280,
+ zone = 0x100c5160}
+
+
+
+
+
+ Some sanity-checking: the virtual field shows the "virtual" address of
+ this page, which in this kernel is the same as its "physical" address,
+ and the page_struct itself should be mem_map[0], since it represents
+ the first page of memory:
+
+
+
+ (gdb) p (void *)1342177280
+ $18 = (void *) 0x50000000
+ (gdb) p mem_map
+ $19 = (mem_map_t *) 0x50004000
+
+
+
+
+
+ These check out fine.
+
+
+ Now to check out the page_struct itself. In particular, the flags
+ field shows whether the page is considered free or not:
+
+
+ (gdb) p (void *)132
+ $21 = (void *) 0x84
+
+
+
+
+
+ The "reserved" bit is the high bit, which is definitely not set, so
+ the kernel considers the signal stack page to be free and available to
+ be used.
+
+
+ At this point, I jump to conclusions and start looking at my early
+ boot code, because that's where that page is supposed to be reserved.
+
+
+ In my setup_arch procedure, I have the following code which looks just
+ fine:
+
+
+
+ bootmap_size = init_bootmem(start_pfn, end_pfn - start_pfn);
+ free_bootmem(__pa(low_physmem) + bootmap_size, high_physmem - low_physmem);
+
+
+
+
+
+ Two stack pages have already been allocated, and low_physmem points to
+ the third page, which is the beginning of free memory.
+ The init_bootmem call declares the entire memory to the boot memory
+ manager, which marks it all reserved. The free_bootmem call frees up
+ all of it, except for the first two pages. This looks correct to me.
+
+
+ So, I decide to see init_bootmem run and make sure that it is marking
+ those first two pages as reserved. I never get that far.
+
+
+ Stepping into init_bootmem, and looking at bootmem_map before looking
+ at what it contains shows the following:
+
+
+
+ (gdb) p bootmem_map
+ $3 = (void *) 0x50000000
+
+
+
+
+
+ Aha! The light dawns. That first page is doing double duty as a
+ stack and as the boot memory map. The last thing that the boot memory
+ manager does is to free the pages used by its memory map, so this page
+ is getting freed even its marked as reserved.
+
+
+ The fix was to initialize the boot memory manager before allocating
+ those two stack pages, and then allocate them through the boot memory
+ manager. After doing this, and fixing a couple of subsequent buglets,
+ the stack corruption problem disappeared.
+
+
+
+
+
+ 1\b13\b3.\b. W\bWh\bha\bat\bt t\bto\bo d\bdo\bo w\bwh\bhe\ben\bn U\bUM\bML\bL d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk
+
+
+
+
+ 1\b13\b3.\b.1\b1.\b. S\bSt\btr\bra\ban\bng\bge\be c\bco\bom\bmp\bpi\bil\bla\bat\bti\bio\bon\bn e\ber\brr\bro\bor\brs\bs w\bwh\bhe\ben\bn y\byo\bou\bu b\bbu\bui\bil\bld\bd f\bfr\bro\bom\bm s\bso\bou\bur\brc\bce\be
+
+ As of test11, it is necessary to have "ARCH=um" in the environment or
+ on the make command line for all steps in building UML, including
+ clean, distclean, or mrproper, config, menuconfig, or xconfig, dep,
+ and linux. If you forget for any of them, the i386 build seems to
+ contaminate the UML build. If this happens, start from scratch with
+
+
+ host%
+ make mrproper ARCH=um
+
+
+
+
+ and repeat the build process with ARCH=um on all the steps.
+
+
+ See ``Compiling the kernel and modules'' for more details.
+
+
+ Another cause of strange compilation errors is building UML in
+ /usr/src/linux. If you do this, the first thing you need to do is
+ clean up the mess you made. The /usr/src/linux/asm link will now
+ point to /usr/src/linux/asm-um. Make it point back to
+ /usr/src/linux/asm-i386. Then, move your UML pool someplace else and
+ build it there. Also see below, where a more specific set of symptoms
+ is described.
+
+
+
+ 1\b13\b3.\b.3\b3.\b. A\bA v\bva\bar\bri\bie\bet\bty\by o\bof\bf p\bpa\ban\bni\bic\bcs\bs a\ban\bnd\bd h\bha\ban\bng\bgs\bs w\bwi\bit\bth\bh /\b/t\btm\bmp\bp o\bon\bn a\ba r\bre\bei\bis\bse\ber\brf\bfs\bs f\bfi\bil\ble\bes\bsy\bys\bs-\b-
+ t\bte\bem\bm
+
+ I saw this on reiserfs 3.5.21 and it seems to be fixed in 3.5.27.
+ Panics preceded by
+
+
+ Detaching pid nnnn
+
+
+
+ are diagnostic of this problem. This is a reiserfs bug which causes a
+ thread to occasionally read stale data from a mmapped page shared with
+ another thread. The fix is to upgrade the filesystem or to have /tmp
+ be an ext2 filesystem.
+
+
+
+ 1\b13\b3.\b.4\b4.\b. T\bTh\bhe\be c\bco\bom\bmp\bpi\bil\ble\be f\bfa\bai\bil\bls\bs w\bwi\bit\bth\bh e\ber\brr\bro\bor\brs\bs a\bab\bbo\bou\but\bt c\bco\bon\bnf\bfl\bli\bic\bct\bti\bin\bng\bg t\bty\byp\bpe\bes\bs f\bfo\bor\br
+ '\b'o\bop\bpe\ben\bn'\b',\b, '\b'd\bdu\bup\bp'\b',\b, a\ban\bnd\bd '\b'w\bwa\bai\bit\btp\bpi\bid\bd'\b'
+
+ This happens when you build in /usr/src/linux. The UML build makes
+ the include/asm link point to include/asm-um. /usr/include/asm points
+ to /usr/src/linux/include/asm, so when that link gets moved, files
+ which need to include the asm-i386 versions of headers get the
+ incompatible asm-um versions. The fix is to move the include/asm link
+ back to include/asm-i386 and to do UML builds someplace else.
+
+
+
+ 1\b13\b3.\b.5\b5.\b. U\bUM\bML\bL d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk w\bwh\bhe\ben\bn /\b/t\btm\bmp\bp i\bis\bs a\ban\bn N\bNF\bFS\bS f\bfi\bil\ble\bes\bsy\bys\bst\bte\bem\bm
+
+ This seems to be a similar situation with the ReiserFS problem above.
+ Some versions of NFS seems not to handle mmap correctly, which UML
+ depends on. The workaround is have /tmp be a non-NFS directory.
+
+
+ 1\b13\b3.\b.6\b6.\b. U\bUM\bML\bL h\bha\ban\bng\bgs\bs o\bon\bn b\bbo\boo\bot\bt w\bwh\bhe\ben\bn c\bco\bom\bmp\bpi\bil\ble\bed\bd w\bwi\bit\bth\bh g\bgp\bpr\bro\bof\bf s\bsu\bup\bpp\bpo\bor\brt\bt
+
+ If you build UML with gprof support and, early in the boot, it does
+ this
+
+
+ kernel BUG at page_alloc.c:100!
+
+
+
+
+ you have a buggy gcc. You can work around the problem by removing
+ UM_FASTCALL from CFLAGS in arch/um/Makefile-i386. This will open up
+ another bug, but that one is fairly hard to reproduce.
+
+
+
+ 1\b13\b3.\b.7\b7.\b. s\bsy\bys\bsl\blo\bog\bgd\bd d\bdi\bie\bes\bs w\bwi\bit\bth\bh a\ba S\bSI\bIG\bGT\bTE\bER\bRM\bM o\bon\bn s\bst\bta\bar\brt\btu\bup\bp
+
+ The exact boot error depends on the distribution that you're booting,
+ but Debian produces this:
+
+
+ /etc/rc2.d/S10sysklogd: line 49: 93 Terminated
+ start-stop-daemon --start --quiet --exec /sbin/syslogd -- $SYSLOGD
+
+
+
+
+ This is a syslogd bug. There's a race between a parent process
+ installing a signal handler and its child sending the signal. See
+ this uml-devel post <http://www.geocrawler.com/lists/3/Source-
+ Forge/709/0/6612801> for the details.
+
+
+
+ 1\b13\b3.\b.8\b8.\b. T\bTU\bUN\bN/\b/T\bTA\bAP\bP n\bne\bet\btw\bwo\bor\brk\bki\bin\bng\bg d\bdo\boe\bes\bsn\bn'\b't\bt w\bwo\bor\brk\bk o\bon\bn a\ba 2\b2.\b.4\b4 h\bho\bos\bst\bt
+
+ There are a couple of problems which were
+ <http://www.geocrawler.com/lists/3/SourceForge/597/0/> name="pointed
+ out"> by Tim Robinson <timro at trkr dot net>
+
+ +\bo It doesn't work on hosts running 2.4.7 (or thereabouts) or earlier.
+ The fix is to upgrade to something more recent and then read the
+ next item.
+
+ +\bo If you see
+
+
+ File descriptor in bad state
+
+
+
+ when you bring up the device inside UML, you have a header mismatch
+ between the original kernel and the upgraded one. Make /usr/src/linux
+ point at the new headers. This will only be a problem if you build
+ uml_net yourself.
+
+
+
+ 1\b13\b3.\b.9\b9.\b. Y\bYo\bou\bu c\bca\ban\bn n\bne\bet\btw\bwo\bor\brk\bk t\bto\bo t\bth\bhe\be h\bho\bos\bst\bt b\bbu\but\bt n\bno\bot\bt t\bto\bo o\bot\bth\bhe\ber\br m\bma\bac\bch\bhi\bin\bne\bes\bs o\bon\bn t\bth\bhe\be
+ n\bne\bet\bt
+
+ If you can connect to the host, and the host can connect to UML, but
+ you cannot connect to any other machines, then you may need to enable
+ IP Masquerading on the host. Usually this is only experienced when
+ using private IP addresses (192.168.x.x or 10.x.x.x) for host/UML
+ networking, rather than the public address space that your host is
+ connected to. UML does not enable IP Masquerading, so you will need
+ to create a static rule to enable it:
+
+
+ host%
+ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
+
+
+
+
+ Replace eth0 with the interface that you use to talk to the rest of
+ the world.
+
+
+ Documentation on IP Masquerading, and SNAT, can be found at
+ www.netfilter.org <http://www.netfilter.org> .
+
+
+ If you can reach the local net, but not the outside Internet, then
+ that is usually a routing problem. The UML needs a default route:
+
+
+ UML#
+ route add default gw gateway IP
+
+
+
+
+ The gateway IP can be any machine on the local net that knows how to
+ reach the outside world. Usually, this is the host or the local net-
+ work's gateway.
+
+
+ Occasionally, we hear from someone who can reach some machines, but
+ not others on the same net, or who can reach some ports on other
+ machines, but not others. These are usually caused by strange
+ firewalling somewhere between the UML and the other box. You track
+ this down by running tcpdump on every interface the packets travel
+ over and see where they disappear. When you find a machine that takes
+ the packets in, but does not send them onward, that's the culprit.
+
+
+
+ 1\b13\b3.\b.1\b10\b0.\b. I\bI h\bha\bav\bve\be n\bno\bo r\bro\boo\bot\bt a\ban\bnd\bd I\bI w\bwa\ban\bnt\bt t\bto\bo s\bsc\bcr\bre\bea\bam\bm
+
+ Thanks to Birgit Wahlich for telling me about this strange one. It
+ turns out that there's a limit of six environment variables on the
+ kernel command line. When that limit is reached or exceeded, argument
+ processing stops, which means that the 'root=' argument that UML
+ usually adds is not seen. So, the filesystem has no idea what the
+ root device is, so it panics.
+
+
+ The fix is to put less stuff on the command line. Glomming all your
+ setup variables into one is probably the best way to go.
+
+
+
+ 1\b13\b3.\b.1\b11\b1.\b. U\bUM\bML\bL b\bbu\bui\bil\bld\bd c\bco\bon\bnf\bfl\bli\bic\bct\bt b\bbe\bet\btw\bwe\bee\ben\bn p\bpt\btr\bra\bac\bce\be.\b.h\bh a\ban\bnd\bd u\buc\bco\bon\bnt\bte\bex\bxt\bt.\b.h\bh
+
+ On some older systems, /usr/include/asm/ptrace.h and
+ /usr/include/sys/ucontext.h define the same names. So, when they're
+ included together, the defines from one completely mess up the parsing
+ of the other, producing errors like:
+ /usr/include/sys/ucontext.h:47: parse error before
+ `10'
+
+
+
+
+ plus a pile of warnings.
+
+
+ This is a libc botch, which has since been fixed, and I don't see any
+ way around it besides upgrading.
+
+
+
+ 1\b13\b3.\b.1\b12\b2.\b. T\bTh\bhe\be U\bUM\bML\bL B\bBo\bog\bgo\boM\bMi\bip\bps\bs i\bis\bs e\bex\bxa\bac\bct\btl\bly\by h\bha\bal\blf\bf t\bth\bhe\be h\bho\bos\bst\bt'\b's\bs B\bBo\bog\bgo\boM\bMi\bip\bps\bs
+
+ On i386 kernels, there are two ways of running the loop that is used
+ to calculate the BogoMips rating, using the TSC if it's there or using
+ a one-instruction loop. The TSC produces twice the BogoMips as the
+ loop. UML uses the loop, since it has nothing resembling a TSC, and
+ will get almost exactly the same BogoMips as a host using the loop.
+ However, on a host with a TSC, its BogoMips will be double the loop
+ BogoMips, and therefore double the UML BogoMips.
+
+
+
+ 1\b13\b3.\b.1\b13\b3.\b. W\bWh\bhe\ben\bn y\byo\bou\bu r\bru\bun\bn U\bUM\bML\bL,\b, i\bit\bt i\bim\bmm\bme\bed\bdi\bia\bat\bte\bel\bly\by s\bse\beg\bgf\bfa\bau\bul\blt\bts\bs
+
+ If the host is configured with the 2G/2G address space split, that's
+ why. See ``UML on 2G/2G hosts'' for the details on getting UML to
+ run on your host.
+
+
+
+ 1\b13\b3.\b.1\b14\b4.\b. x\bxt\bte\ber\brm\bms\bs a\bap\bpp\bpe\bea\bar\br,\b, t\bth\bhe\ben\bn i\bim\bmm\bme\bed\bdi\bia\bat\bte\bel\bly\by d\bdi\bis\bsa\bap\bpp\bpe\bea\bar\br
+
+ If you're running an up to date kernel with an old release of
+ uml_utilities, the port-helper program will not work properly, so
+ xterms will exit straight after they appear. The solution is to
+ upgrade to the latest release of uml_utilities. Usually this problem
+ occurs when you have installed a packaged release of UML then compiled
+ your own development kernel without upgrading the uml_utilities from
+ the source distribution.
+
+
+
+ 1\b13\b3.\b.1\b15\b5.\b. A\bAn\bny\by o\bot\bth\bhe\ber\br p\bpa\ban\bni\bic\bc,\b, h\bha\ban\bng\bg,\b, o\bor\br s\bst\btr\bra\ban\bng\bge\be b\bbe\beh\bha\bav\bvi\bio\bor\br
+
+ If you're seeing truly strange behavior, such as hangs or panics that
+ happen in random places, or you try running the debugger to see what's
+ happening and it acts strangely, then it could be a problem in the
+ host kernel. If you're not running a stock Linus or -ac kernel, then
+ try that. An early version of the preemption patch and a 2.4.10 SuSE
+ kernel have caused very strange problems in UML.
+
+
+ Otherwise, let me know about it. Send a message to one of the UML
+ mailing lists - either the developer list - user-mode-linux-devel at
+ lists dot sourceforge dot net (subscription info) or the user list -
+ user-mode-linux-user at lists dot sourceforge do net (subscription
+ info), whichever you prefer. Don't assume that everyone knows about
+ it and that a fix is imminent.
+
+
+ If you want to be super-helpful, read ``Diagnosing Problems'' and
+ follow the instructions contained therein.
+ 1\b14\b4.\b. D\bDi\bia\bag\bgn\bno\bos\bsi\bin\bng\bg P\bPr\bro\bob\bbl\ble\bem\bms\bs
+
+
+ If you get UML to crash, hang, or otherwise misbehave, you should
+ report this on one of the project mailing lists, either the developer
+ list - user-mode-linux-devel at lists dot sourceforge dot net
+ (subscription info) or the user list - user-mode-linux-user at lists
+ dot sourceforge dot net (subscription info). When you do, it is
+ likely that I will want more information. So, it would be helpful to
+ read the stuff below, do whatever is applicable in your case, and
+ report the results to the list.
+
+
+ For any diagnosis, you're going to need to build a debugging kernel.
+ The binaries from this site aren't debuggable. If you haven't done
+ this before, read about ``Compiling the kernel and modules'' and
+ ``Kernel debugging'' UML first.
+
+
+ 1\b14\b4.\b.1\b1.\b. C\bCa\bas\bse\be 1\b1 :\b: N\bNo\bor\brm\bma\bal\bl k\bke\ber\brn\bne\bel\bl p\bpa\ban\bni\bic\bcs\bs
+
+ The most common case is for a normal thread to panic. To debug this,
+ you will need to run it under the debugger (add 'debug' to the command
+ line). An xterm will start up with gdb running inside it. Continue
+ it when it stops in start_kernel and make it crash. Now ^C gdb and
+
+
+ If the panic was a "Kernel mode fault", then there will be a segv
+ frame on the stack and I'm going to want some more information. The
+ stack might look something like this:
+
+
+ (UML gdb) backtrace
+ #0 0x1009bf76 in __sigprocmask (how=1, set=0x5f347940, oset=0x0)
+ at ../sysdeps/unix/sysv/linux/sigprocmask.c:49
+ #1 0x10091411 in change_sig (signal=10, on=1) at process.c:218
+ #2 0x10094785 in timer_handler (sig=26) at time_kern.c:32
+ #3 0x1009bf38 in __restore ()
+ at ../sysdeps/unix/sysv/linux/i386/sigaction.c:125
+ #4 0x1009534c in segv (address=8, ip=268849158, is_write=2, is_user=0)
+ at trap_kern.c:66
+ #5 0x10095c04 in segv_handler (sig=11) at trap_user.c:285
+ #6 0x1009bf38 in __restore ()
+
+
+
+
+ I'm going to want to see the symbol and line information for the value
+ of ip in the segv frame. In this case, you would do the following:
+
+
+ (UML gdb) i sym 268849158
+
+
+
+
+ and
+
+
+ (UML gdb) i line *268849158
+
+
+
+
+ The reason for this is the __restore frame right above the segv_han-
+ dler frame is hiding the frame that actually segfaulted. So, I have
+ to get that information from the faulting ip.
+
+
+ 1\b14\b4.\b.2\b2.\b. C\bCa\bas\bse\be 2\b2 :\b: T\bTr\bra\bac\bci\bin\bng\bg t\bth\bhr\bre\bea\bad\bd p\bpa\ban\bni\bic\bcs\bs
+
+ The less common and more painful case is when the tracing thread
+ panics. In this case, the kernel debugger will be useless because it
+ needs a healthy tracing thread in order to work. The first thing to
+ do is get a backtrace from the tracing thread. This is done by
+ figuring out what its pid is, firing up gdb, and attaching it to that
+ pid. You can figure out the tracing thread pid by looking at the
+ first line of the console output, which will look like this:
+
+
+ tracing thread pid = 15851
+
+
+
+
+ or by running ps on the host and finding the line that looks like
+ this:
+
+
+ jdike 15851 4.5 0.4 132568 1104 pts/0 S 21:34 0:05 ./linux [(tracing thread)]
+
+
+
+
+ If the panic was 'segfault in signals', then follow the instructions
+ above for collecting information about the location of the seg fault.
+
+
+ If the tracing thread flaked out all by itself, then send that
+ backtrace in and wait for our crack debugging team to fix the problem.
+
+
+ 1\b14\b4.\b.3\b3.\b. C\bCa\bas\bse\be 3\b3 :\b: T\bTr\bra\bac\bci\bin\bng\bg t\bth\bhr\bre\bea\bad\bd p\bpa\ban\bni\bic\bcs\bs c\bca\bau\bus\bse\bed\bd b\bby\by o\bot\bth\bhe\ber\br t\bth\bhr\bre\bea\bad\bds\bs
+
+ However, there are cases where the misbehavior of another thread
+ caused the problem. The most common panic of this type is:
+
+
+ wait_for_stop failed to wait for <pid> to stop with <signal number>
+
+
+
+
+ In this case, you'll need to get a backtrace from the process men-
+ tioned in the panic, which is complicated by the fact that the kernel
+ debugger is defunct and without some fancy footwork, another gdb can't
+ attach to it. So, this is how the fancy footwork goes:
+
+ In a shell:
+
+
+ host% kill -STOP pid
+
+
+
+
+ Run gdb on the tracing thread as described in case 2 and do:
+
+
+ (host gdb) call detach(pid)
+
+
+ If you get a segfault, do it again. It always works the second time.
+
+ Detach from the tracing thread and attach to that other thread:
+
+
+ (host gdb) detach
+
+
+
+
+
+
+ (host gdb) attach pid
+
+
+
+
+ If gdb hangs when attaching to that process, go back to a shell and
+ do:
+
+
+ host%
+ kill -CONT pid
+
+
+
+
+ And then get the backtrace:
+
+
+ (host gdb) backtrace
+
+
+
+
+
+ 1\b14\b4.\b.4\b4.\b. C\bCa\bas\bse\be 4\b4 :\b: H\bHa\ban\bng\bgs\bs
+
+ Hangs seem to be fairly rare, but they sometimes happen. When a hang
+ happens, we need a backtrace from the offending process. Run the
+ kernel debugger as described in case 1 and get a backtrace. If the
+ current process is not the idle thread, then send in the backtrace.
+ You can tell that it's the idle thread if the stack looks like this:
+
+
+ #0 0x100b1401 in __libc_nanosleep ()
+ #1 0x100a2885 in idle_sleep (secs=10) at time.c:122
+ #2 0x100a546f in do_idle () at process_kern.c:445
+ #3 0x100a5508 in cpu_idle () at process_kern.c:471
+ #4 0x100ec18f in start_kernel () at init/main.c:592
+ #5 0x100a3e10 in start_kernel_proc (unused=0x0) at um_arch.c:71
+ #6 0x100a383f in signal_tramp (arg=0x100a3dd8) at trap_user.c:50
+
+
+
+
+ If this is the case, then some other process is at fault, and went to
+ sleep when it shouldn't have. Run ps on the host and figure out which
+ process should not have gone to sleep and stayed asleep. Then attach
+ to it with gdb and get a backtrace as described in case 3.
+
+
+
+
+
+
+ 1\b15\b5.\b. T\bTh\bha\ban\bnk\bks\bs
+
+
+ A number of people have helped this project in various ways, and this
+ page gives recognition where recognition is due.
+
+
+ If you're listed here and you would prefer a real link on your name,
+ or no link at all, instead of the despammed email address pseudo-link,
+ let me know.
+
+
+ If you're not listed here and you think maybe you should be, please
+ let me know that as well. I try to get everyone, but sometimes my
+ bookkeeping lapses and I forget about contributions.
+
+
+ 1\b15\b5.\b.1\b1.\b. C\bCo\bod\bde\be a\ban\bnd\bd D\bDo\boc\bcu\bum\bme\ben\bnt\bta\bat\bti\bio\bon\bn
+
+ Rusty Russell <rusty at linuxcare.com.au> -
+
+ +\bo wrote the HOWTO <http://user-mode-
+ linux.sourceforge.net/UserModeLinux-HOWTO.html>
+
+ +\bo prodded me into making this project official and putting it on
+ SourceForge
+
+ +\bo came up with the way cool UML logo <http://user-mode-
+ linux.sourceforge.net/uml-small.png>
+
+ +\bo redid the config process
+
+
+ Peter Moulder <reiter at netspace.net.au> - Fixed my config and build
+ processes, and added some useful code to the block driver
+
+
+ Bill Stearns <wstearns at pobox.com> -
+
+ +\bo HOWTO updates
+
+ +\bo lots of bug reports
+
+ +\bo lots of testing
+
+ +\bo dedicated a box (uml.ists.dartmouth.edu) to support UML development
+
+ +\bo wrote the mkrootfs script, which allows bootable filesystems of
+ RPM-based distributions to be cranked out
+
+ +\bo cranked out a large number of filesystems with said script
+
+
+ Jim Leu <jleu at mindspring.com> - Wrote the virtual ethernet driver
+ and associated usermode tools
+
+ Lars Brinkhoff <http://lars.nocrew.org/> - Contributed the ptrace
+ proxy from his own project <http://a386.nocrew.org/> to allow easier
+ kernel debugging
+
+
+ Andrea Arcangeli <andrea at suse.de> - Redid some of the early boot
+ code so that it would work on machines with Large File Support
+
+
+ Chris Emerson <http://www.chiark.greenend.org.uk/~cemerson/> - Did
+ the first UML port to Linux/ppc
+
+
+ Harald Welte <laforge at gnumonks.org> - Wrote the multicast
+ transport for the network driver
+
+
+ Jorgen Cederlof - Added special file support to hostfs
+
+
+ Greg Lonnon <glonnon at ridgerun dot com> - Changed the ubd driver
+ to allow it to layer a COW file on a shared read-only filesystem and
+ wrote the iomem emulation support
+
+
+ Henrik Nordstrom <http://hem.passagen.se/hno/> - Provided a variety
+ of patches, fixes, and clues
+
+
+ Lennert Buytenhek - Contributed various patches, a rewrite of the
+ network driver, the first implementation of the mconsole driver, and
+ did the bulk of the work needed to get SMP working again.
+
+
+ Yon Uriarte - Fixed the TUN/TAP network backend while I slept.
+
+
+ Adam Heath - Made a bunch of nice cleanups to the initialization code,
+ plus various other small patches.
+
+
+ Matt Zimmerman - Matt volunteered to be the UML Debian maintainer and
+ is doing a real nice job of it. He also noticed and fixed a number of
+ actually and potentially exploitable security holes in uml_net. Plus
+ the occasional patch. I like patches.
+
+
+ James McMechan - James seems to have taken over maintenance of the ubd
+ driver and is doing a nice job of it.
+
+
+ Chandan Kudige - wrote the umlgdb script which automates the reloading
+ of module symbols.
+
+
+ Steve Schmidtke - wrote the UML slirp transport and hostaudio drivers,
+ enabling UML processes to access audio devices on the host. He also
+ submitted patches for the slip transport and lots of other things.
+
+
+ David Coulson <http://davidcoulson.net> -
+
+ +\bo Set up the usermodelinux.org <http://usermodelinux.org> site,
+ which is a great way of keeping the UML user community on top of
+ UML goings-on.
+
+ +\bo Site documentation and updates
+
+ +\bo Nifty little UML management daemon UMLd
+ <http://uml.openconsultancy.com/umld/>
+
+ +\bo Lots of testing and bug reports
+
+
+
+
+ 1\b15\b5.\b.2\b2.\b. F\bFl\blu\bus\bsh\bhi\bin\bng\bg o\bou\but\bt b\bbu\bug\bgs\bs
+
+
+
+ +\bo Yuri Pudgorodsky
+
+ +\bo Gerald Britton
+
+ +\bo Ian Wehrman
+
+ +\bo Gord Lamb
+
+ +\bo Eugene Koontz
+
+ +\bo John H. Hartman
+
+ +\bo Anders Karlsson
+
+ +\bo Daniel Phillips
+
+ +\bo John Fremlin
+
+ +\bo Rainer Burgstaller
+
+ +\bo James Stevenson
+
+ +\bo Matt Clay
+
+ +\bo Cliff Jefferies
+
+ +\bo Geoff Hoff
+
+ +\bo Lennert Buytenhek
+
+ +\bo Al Viro
+
+ +\bo Frank Klingenhoefer
+
+ +\bo Livio Baldini Soares
+
+ +\bo Jon Burgess
+
+ +\bo Petru Paler
+
+ +\bo Paul
+
+ +\bo Chris Reahard
+
+ +\bo Sverker Nilsson
+
+ +\bo Gong Su
+
+ +\bo johan verrept
+
+ +\bo Bjorn Eriksson
+
+ +\bo Lorenzo Allegrucci
+
+ +\bo Muli Ben-Yehuda
+
+ +\bo David Mansfield
+
+ +\bo Howard Goff
+
+ +\bo Mike Anderson
+
+ +\bo John Byrne
+
+ +\bo Sapan J. Batia
+
+ +\bo Iris Huang
+
+ +\bo Jan Hudec
+
+ +\bo Voluspa
+
+
+
+
+ 1\b15\b5.\b.3\b3.\b. B\bBu\bug\bgl\ble\bet\bts\bs a\ban\bnd\bd c\bcl\ble\bea\ban\bn-\b-u\bup\bps\bs
+
+
+
+ +\bo Dave Zarzycki
+
+ +\bo Adam Lazur
+
+ +\bo Boria Feigin
+
+ +\bo Brian J. Murrell
+
+ +\bo JS
+
+ +\bo Roman Zippel
+
+ +\bo Wil Cooley
+
+ +\bo Ayelet Shemesh
+
+ +\bo Will Dyson
+
+ +\bo Sverker Nilsson
+
+ +\bo dvorak
+
+ +\bo v.naga srinivas
+
+ +\bo Shlomi Fish
+
+ +\bo Roger Binns
+
+ +\bo johan verrept
+
+ +\bo MrChuoi
+
+ +\bo Peter Cleve
+
+ +\bo Vincent Guffens
+
+ +\bo Nathan Scott
+
+ +\bo Patrick Caulfield
+
+ +\bo jbearce
+
+ +\bo Catalin Marinas
+
+ +\bo Shane Spencer
+
+ +\bo Zou Min
+
+
+ +\bo Ryan Boder
+
+ +\bo Lorenzo Colitti
+
+ +\bo Gwendal Grignou
+
+ +\bo Andre' Breiler
+
+ +\bo Tsutomu Yasuda
+
+
+
+ 1\b15\b5.\b.4\b4.\b. C\bCa\bas\bse\be S\bSt\btu\bud\bdi\bie\bes\bs
+
+
+ +\bo Jon Wright
+
+ +\bo William McEwan
+
+ +\bo Michael Richardson
+
+
+
+ 1\b15\b5.\b.5\b5.\b. O\bOt\bth\bhe\ber\br c\bco\bon\bnt\btr\bri\bib\bbu\but\bti\bio\bon\bns\bs
+
+
+ Bill Carr <Bill.Carr at compaq.com> made the Red Hat mkrootfs script
+ work with RH 6.2.
+
+ Michael Jennings <mikejen at hevanet.com> sent in some material which
+ is now gracing the top of the index page <http://user-mode-
+ linux.sourceforge.net/> of this site.
+
+ SGI <http://www.sgi.com> (and more specifically Ralf Baechle <ralf at
+ uni-koblenz.de> ) gave me an account on oss.sgi.com
+ <http://www.oss.sgi.com> . The bandwidth there made it possible to
+ produce most of the filesystems available on the project download
+ page.
+
+ Laurent Bonnaud <Laurent.Bonnaud at inpg.fr> took the old grotty
+ Debian filesystem that I've been distributing and updated it to 2.2.
+ It is now available by itself here.
+
+ Rik van Riel gave me some ftp space on ftp.nl.linux.org so I can make
+ releases even when Sourceforge is broken.
+
+ Rodrigo de Castro looked at my broken pte code and told me what was
+ wrong with it, letting me fix a long-standing (several weeks) and
+ serious set of bugs.
+
+ Chris Reahard built a specialized root filesystem for running a DNS
+ server jailed inside UML. It's available from the download
+ <http://user-mode-linux.sourceforge.net/dl-sf.html> page in the Jail
+ Filesystems section.
+
+
+
+
+
+
+
+
+
+
+
+
4. Application Programming Interface (API)
5. Example Execution Scenarios
6. Guidelines
+7. Debugging
1. Introduction
* Unless work items are expected to consume a huge amount of CPU
cycles, using a bound wq is usually beneficial due to the increased
level of locality in wq operations and work item execution.
+
+
+7. Debugging
+
+Because the work functions are executed by generic worker threads
+there are a few tricks needed to shed some light on misbehaving
+workqueue users.
+
+Worker threads show up in the process list as:
+
+root 5671 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/0:1]
+root 5672 0.0 0.0 0 0 ? S 12:07 0:00 [kworker/1:2]
+root 5673 0.0 0.0 0 0 ? S 12:12 0:00 [kworker/0:0]
+root 5674 0.0 0.0 0 0 ? S 12:13 0:00 [kworker/1:0]
+
+If kworkers are going crazy (using too much cpu), there are two types
+of possible problems:
+
+ 1. Something beeing scheduled in rapid succession
+ 2. A single work item that consumes lots of cpu cycles
+
+The first one can be tracked using tracing:
+
+ $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
+ $ cat /sys/kernel/debug/tracing/trace_pipe > out.txt
+ (wait a few secs)
+ ^C
+
+If something is busy looping on work queueing, it would be dominating
+the output and the offender can be determined with the work item
+function.
+
+For the second type of problems it should be possible to just check
+the stack trace of the offending worker thread.
+
+ $ cat /proc/THE_OFFENDING_KWORKER/stack
+
+The work item's function should be trivially visible in the stack
+trace.
(e.g. because you have < 3 GB memory).
Kernel boot message: "PCI-DMA: Disabling IOMMU"
- 2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
+ 2. <arch/x86/kernel/amd_gart_64.c>: AMD GART based hardware IOMMU.
Kernel boot message: "PCI-DMA: using GART IOMMU"
3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
F: sound/oss/aedsp16.c
AFFS FILE SYSTEM
-M: Roman Zippel <zippel@linux-m68k.org>
-S: Maintained
+L: linux-fsdevel@vger.kernel.org
+S: Orphan
F: Documentation/filesystems/affs.txt
F: fs/affs/
S: Maintained
F: arch/arm/mach-s3c64xx/
-ARM/S5P ARM ARCHITECTURES
+ARM/S5P EXYNOS ARM ARCHITECTURES
M: Kukjin Kim <kgene.kim@samsung.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
L: linux-samsung-soc@vger.kernel.org (moderated for non-subscribers)
S: Maintained
F: arch/arm/mach-s5p*/
+F: arch/arm/mach-exynos*/
ARM/SAMSUNG MOBILE MACHINE SUPPORT
M: Kyungmin Park <kyungmin.park@samsung.com>
M: Grant Likely <grant.likely@secretlab.ca>
S: Maintained
T: git git://git.secretlab.ca/git/linux-2.6.git
-F: Documentation/gpio/gpio.txt
+F: Documentation/gpio.txt
F: drivers/gpio/
F: include/linux/gpio*
+GRE DEMULTIPLEXER DRIVER
+M: Dmitry Kozlov <xeb@mail.ru>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/ipv4/gre.c
+F: include/net/gre.h
+
GRETH 10/100/1G Ethernet MAC device driver
M: Kristoffer Glembo <kristoffer@gaisler.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/greth*
-HARD DRIVE ACTIVE PROTECTION SYSTEM (HDAPS) DRIVER
-M: Frank Seidel <frank@f-seidel.de>
-L: platform-driver-x86@vger.kernel.org
-W: http://www.kernel.org/pub/linux/kernel/people/fseidel/hdaps/
-S: Maintained
-F: drivers/platform/x86/hdaps.c
-
-HWPOISON MEMORY FAILURE HANDLING
-M: Andi Kleen <andi@firstfloor.org>
-L: linux-mm@kvack.org
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git hwpoison
-S: Maintained
-F: mm/memory-failure.c
-F: mm/hwpoison-inject.c
-
-HYPERVISOR VIRTUAL CONSOLE DRIVER
-L: linuxppc-dev@lists.ozlabs.org
-S: Odd Fixes
-F: drivers/tty/hvc/
-
-iSCSI BOOT FIRMWARE TABLE (iBFT) DRIVER
-M: Peter Jones <pjones@redhat.com>
-M: Konrad Rzeszutek Wilk <konrad@kernel.org>
-S: Maintained
-F: drivers/firmware/iscsi_ibft*
-
GSPCA FINEPIX SUBDRIVER
M: Frank Zago <frank@zago.net>
L: linux-media@vger.kernel.org
S: Maintained
F: drivers/media/video/gspca/
+HARD DRIVE ACTIVE PROTECTION SYSTEM (HDAPS) DRIVER
+M: Frank Seidel <frank@f-seidel.de>
+L: platform-driver-x86@vger.kernel.org
+W: http://www.kernel.org/pub/linux/kernel/people/fseidel/hdaps/
+S: Maintained
+F: drivers/platform/x86/hdaps.c
+
+HWPOISON MEMORY FAILURE HANDLING
+M: Andi Kleen <andi@firstfloor.org>
+L: linux-mm@kvack.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git hwpoison
+S: Maintained
+F: mm/memory-failure.c
+F: mm/hwpoison-inject.c
+
+HYPERVISOR VIRTUAL CONSOLE DRIVER
+L: linuxppc-dev@lists.ozlabs.org
+S: Odd Fixes
+F: drivers/tty/hvc/
+
HARDWARE MONITORING
M: Jean Delvare <khali@linux-fr.org>
M: Guenter Roeck <guenter.roeck@ericsson.com>
F: include/linux/cciss_ioctl.h
HFS FILESYSTEM
-M: Roman Zippel <zippel@linux-m68k.org>
-S: Maintained
+L: linux-fsdevel@vger.kernel.org
+S: Orphan
F: Documentation/filesystems/hfs.txt
F: fs/hfs/
F: drivers/pnp/isapnp/
F: include/linux/isapnp.h
+iSCSI BOOT FIRMWARE TABLE (iBFT) DRIVER
+M: Peter Jones <pjones@redhat.com>
+M: Konrad Rzeszutek Wilk <konrad@kernel.org>
+S: Maintained
+F: drivers/firmware/iscsi_ibft*
+
ISCSI
M: Mike Christie <michaelc@cs.wisc.edu>
L: open-iscsi@googlegroups.com
L: lguest@lists.ozlabs.org
W: http://lguest.ozlabs.org/
S: Odd Fixes
-F: Documentation/lguest/
+F: Documentation/virtual/lguest/
F: arch/x86/lguest/
F: drivers/lguest/
F: include/linux/lguest*.h
M68K ARCHITECTURE
M: Geert Uytterhoeven <geert@linux-m68k.org>
-M: Roman Zippel <zippel@linux-m68k.org>
L: linux-m68k@lists.linux-m68k.org
W: http://www.linux-m68k.org/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k.git
F: drivers/pps/
F: include/linux/pps*.h
+PPTP DRIVER
+M: Dmitry Kozlov <xeb@mail.ru>
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/pptp.c
+W: http://sourceforge.net/projects/accel-pptp
+
PREEMPTIBLE KERNEL
M: Robert Love <rml@tech9.net>
L: kpreempt-tech@lists.sourceforge.net
F: drivers/usb/host/uhci*
USB "USBNET" DRIVER FRAMEWORK
-M: David Brownell <dbrownell@users.sourceforge.net>
+M: Oliver Neukum <oneukum@suse.de>
L: netdev@vger.kernel.org
W: http://www.linux-usb.org/usbnet
S: Maintained
L: user-mode-linux-user@lists.sourceforge.net
W: http://user-mode-linux.sourceforge.net
S: Maintained
-F: Documentation/uml/
+F: Documentation/virtual/uml/
F: arch/um/
F: fs/hostfs/
F: fs/hppfs/
S: Maintained
F: drivers/platform/x86
+XEN HYPERVISOR INTERFACE
+M: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
+M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+L: xen-devel@lists.xensource.com (moderated for non-subscribers)
+L: virtualization@lists.linux-foundation.org
+S: Supported
+F: arch/x86/xen/
+F: drivers/*/xen-*front.c
+F: drivers/xen/
+F: arch/x86/include/asm/xen/
+F: include/xen/
+
XEN NETWORK BACKEND DRIVER
M: Ian Campbell <ian.campbell@citrix.com>
L: xen-devel@lists.xensource.com (moderated for non-subscribers)
F: arch/x86/xen/*swiotlb*
F: drivers/xen/*swiotlb*
-XEN HYPERVISOR INTERFACE
-M: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-L: xen-devel@lists.xensource.com (moderated for non-subscribers)
-L: virtualization@lists.linux-foundation.org
-S: Supported
-F: arch/x86/xen/
-F: drivers/*/xen-*front.c
-F: drivers/xen/
-F: arch/x86/include/asm/xen/
-F: include/xen/
-
XFS FILESYSTEM
P: Silicon Graphics Inc
M: Alex Elder <aelder@sgi.com>
S: Maintained
F: drivers/tty/serial/zs.*
-GRE DEMULTIPLEXER DRIVER
-M: Dmitry Kozlov <xeb@mail.ru>
-L: netdev@vger.kernel.org
-S: Maintained
-F: net/ipv4/gre.c
-F: include/net/gre.h
-
-PPTP DRIVER
-M: Dmitry Kozlov <xeb@mail.ru>
-L: netdev@vger.kernel.org
-S: Maintained
-F: drivers/net/pptp.c
-W: http://sourceforge.net/projects/accel-pptp
-
THE REST
M: Linus Torvalds <torvalds@linux-foundation.org>
L: linux-kernel@vger.kernel.org
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 39
-EXTRAVERSION = -rc4
+EXTRAVERSION =
NAME = Flesh-Eating Bats with Fangs
# *DOCUMENTATION*
@echo ' make C=1 [targets] Check all c source with $$CHECK (sparse by default)'
@echo ' make C=2 [targets] Force check of all c source with $$CHECK'
@echo ' make W=1 [targets] Enable extra gcc checks'
+ @echo ' make RECORDMCOUNT_WARN=1 [targets] Warn about ignored mcount sections'
@echo ''
@echo 'Execute "make" or "make all" to build all targets marked with [*] '
@echo 'For further info see the ./README file'
#define __NR_fanotify_init 494
#define __NR_fanotify_mark 495
#define __NR_prlimit64 496
+#define __NR_name_to_handle_at 497
+#define __NR_open_by_handle_at 498
+#define __NR_clock_adjtime 499
+#define __NR_syncfs 500
#ifdef __KERNEL__
-#define NR_SYSCALLS 497
+#define NR_SYSCALLS 501
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
switch (which) {
case IPI_RESCHEDULE:
- /* Reschedule callback. Everything to be done
- is done by the interrupt return path. */
+ scheduler_ipi();
break;
case IPI_CALL_FUNC:
.quad sys_ni_syscall /* sys_timerfd */
.quad sys_eventfd
.quad sys_recvmmsg
- .quad sys_fallocate /* 480 */
+ .quad sys_fallocate /* 480 */
.quad sys_timerfd_create
.quad sys_timerfd_settime
.quad sys_timerfd_gettime
.quad sys_signalfd4
- .quad sys_eventfd2 /* 485 */
+ .quad sys_eventfd2 /* 485 */
.quad sys_epoll_create1
.quad sys_dup3
.quad sys_pipe2
.quad sys_inotify_init1
- .quad sys_preadv /* 490 */
+ .quad sys_preadv /* 490 */
.quad sys_pwritev
.quad sys_rt_tgsigqueueinfo
.quad sys_perf_event_open
.quad sys_fanotify_init
- .quad sys_fanotify_mark /* 495 */
+ .quad sys_fanotify_mark /* 495 */
.quad sys_prlimit64
+ .quad sys_name_to_handle_at
+ .quad sys_open_by_handle_at
+ .quad sys_clock_adjtime
+ .quad sys_syncfs /* 500 */
.size sys_call_table, . - sys_call_table
.type sys_call_table, @object
static inline void register_rpcc_clocksource(long cycle_freq)
{
- clocksource_calc_mult_shift(&clocksource_rpcc, cycle_freq, 4);
- clocksource_register(&clocksource_rpcc);
+ clocksource_register_hz(&clocksource_rpcc, cycle_freq);
}
#else /* !CONFIG_SMP */
static inline void register_rpcc_clocksource(long cycle_freq)
ZBSSADDR := $(CONFIG_ZBOOT_ROM_BSS)
else
ZTEXTADDR := 0
-ZBSSADDR := ALIGN(4)
+ZBSSADDR := ALIGN(8)
endif
SEDFLAGS = s/TEXT_START/$(ZTEXTADDR)/;s/BSS_START/$(ZBSSADDR)/
bl cache_on
restart: adr r0, LC0
- ldmia r0, {r1, r2, r3, r5, r6, r9, r11, r12}
- ldr sp, [r0, #32]
+ ldmia r0, {r1, r2, r3, r6, r9, r11, r12}
+ ldr sp, [r0, #28]
/*
* We might be running at a different address. We need
* to fix up various pointers.
*/
sub r0, r0, r1 @ calculate the delta offset
- add r5, r5, r0 @ _start
add r6, r6, r0 @ _edata
#ifndef CONFIG_ZBOOT_ROM
/*
* Check to see if we will overwrite ourselves.
* r4 = final kernel address
- * r5 = start of this image
* r9 = size of decompressed image
* r10 = end of this image, including bss/stack/malloc space if non XIP
* We basically want:
- * r4 >= r10 -> OK
- * r4 + image length <= r5 -> OK
+ * r4 - 16k page directory >= r10 -> OK
+ * r4 + image length <= current position (pc) -> OK
*/
+ add r10, r10, #16384
cmp r4, r10
bhs wont_overwrite
add r10, r4, r9
- cmp r10, r5
+ ARM( cmp r10, pc )
+ THUMB( mov lr, pc )
+ THUMB( cmp r10, lr )
bls wont_overwrite
/*
* Relocate ourselves past the end of the decompressed kernel.
- * r5 = start of this image
* r6 = _edata
* r10 = end of the decompressed kernel
* Because we always copy ahead, we need to do it from the end and go
* backward in case the source and destination overlap.
*/
- /* Round up to next 256-byte boundary. */
- add r10, r10, #256
+ /*
+ * Bump to the next 256-byte boundary with the size of
+ * the relocation code added. This avoids overwriting
+ * ourself when the offset is small.
+ */
+ add r10, r10, #((reloc_code_end - restart + 256) & ~255)
bic r10, r10, #255
+ /* Get start of code we want to copy and align it down. */
+ adr r5, restart
+ bic r5, r5, #31
+
sub r9, r6, r5 @ size to copy
add r9, r9, #31 @ rounded up to a multiple
bic r9, r9, #31 @ ... of 32 bytes
/* Preserve offset to relocated code. */
sub r6, r9, r6
+#ifndef CONFIG_ZBOOT_ROM
+ /* cache_clean_flush may use the stack, so relocate it */
+ add sp, sp, r6
+#endif
+
bl cache_clean_flush
adr r0, BSYM(restart)
LC0: .word LC0 @ r1
.word __bss_start @ r2
.word _end @ r3
- .word _start @ r5
.word _edata @ r6
.word _image_size @ r9
.word _got_start @ r11
#endif
.ltorg
+reloc_code_end:
.align
.section ".stack", "aw", %nobits
.bss : { *(.bss) }
_end = .;
+ . = ALIGN(8); /* the stack must be 64-bit aligned */
.stack : { *(.stack) }
.stab 0 : { *(.stab) }
#include <linux/init.h>
#include <linux/list.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/device.h>
#include <linux/amba/bus.h>
#include <asm/mach/irq.h>
#include <asm/hardware/vic.h>
-#if defined(CONFIG_PM)
+#ifdef CONFIG_PM
/**
* struct vic_device - VIC PM device
- * @sysdev: The system device which is registered.
* @irq: The IRQ number for the base of the VIC.
* @base: The register base for the VIC.
* @resume_sources: A bitmask of interrupts for resume.
* @protect: Save for VIC_PROTECT.
*/
struct vic_device {
- struct sys_device sysdev;
-
void __iomem *base;
int irq;
u32 resume_sources;
static struct vic_device vic_devices[CONFIG_ARM_VIC_NR];
static int vic_id;
-
-static inline struct vic_device *to_vic(struct sys_device *sys)
-{
- return container_of(sys, struct vic_device, sysdev);
-}
#endif /* CONFIG_PM */
/**
writel(32, base + VIC_PL190_DEF_VECT_ADDR);
}
-#if defined(CONFIG_PM)
-static int vic_class_resume(struct sys_device *dev)
+#ifdef CONFIG_PM
+static void resume_one_vic(struct vic_device *vic)
{
- struct vic_device *vic = to_vic(dev);
void __iomem *base = vic->base;
printk(KERN_DEBUG "%s: resuming vic at %p\n", __func__, base);
writel(vic->soft_int, base + VIC_INT_SOFT);
writel(~vic->soft_int, base + VIC_INT_SOFT_CLEAR);
+}
- return 0;
+static void vic_resume(void)
+{
+ int id;
+
+ for (id = vic_id - 1; id >= 0; id--)
+ resume_one_vic(vic_devices + id);
}
-static int vic_class_suspend(struct sys_device *dev, pm_message_t state)
+static void suspend_one_vic(struct vic_device *vic)
{
- struct vic_device *vic = to_vic(dev);
void __iomem *base = vic->base;
printk(KERN_DEBUG "%s: suspending vic at %p\n", __func__, base);
writel(vic->resume_irqs, base + VIC_INT_ENABLE);
writel(~vic->resume_irqs, base + VIC_INT_ENABLE_CLEAR);
+}
+
+static int vic_suspend(void)
+{
+ int id;
+
+ for (id = 0; id < vic_id; id++)
+ suspend_one_vic(vic_devices + id);
return 0;
}
-struct sysdev_class vic_class = {
- .name = "vic",
- .suspend = vic_class_suspend,
- .resume = vic_class_resume,
+struct syscore_ops vic_syscore_ops = {
+ .suspend = vic_suspend,
+ .resume = vic_resume,
};
/**
*/
static int __init vic_pm_init(void)
{
- struct vic_device *dev = vic_devices;
- int err;
- int id;
-
- if (vic_id == 0)
- return 0;
-
- err = sysdev_class_register(&vic_class);
- if (err) {
- printk(KERN_ERR "%s: cannot register class\n", __func__);
- return err;
- }
-
- for (id = 0; id < vic_id; id++, dev++) {
- dev->sysdev.id = id;
- dev->sysdev.cls = &vic_class;
-
- err = sysdev_register(&dev->sysdev);
- if (err) {
- printk(KERN_ERR "%s: failed to register device\n",
- __func__);
- return err;
- }
- }
+ if (vic_id > 0)
+ register_syscore_ops(&vic_syscore_ops);
return 0;
}
--- /dev/null
+CONFIG_EXPERIMENTAL=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_EMBEDDED=y
+# CONFIG_HOTPLUG is not set
+# CONFIG_ELF_CORE is not set
+# CONFIG_FUTEX is not set
+# CONFIG_TIMERFD is not set
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_SLAB=y
+# CONFIG_LBDAF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+# CONFIG_MMU is not set
+CONFIG_ARCH_AT91=y
+CONFIG_ARCH_AT91X40=y
+CONFIG_MACH_AT91EB01=y
+CONFIG_AT91_EARLY_USART0=y
+CONFIG_CPU_ARM7TDMI=y
+CONFIG_SET_MEM_PARAM=y
+CONFIG_DRAM_BASE=0x01000000
+CONFIG_DRAM_SIZE=0x00400000
+CONFIG_FLASH_MEM_BASE=0x01400000
+CONFIG_PROCESSOR_ID=0x14000040
+CONFIG_ZBOOT_ROM_TEXT=0x0
+CONFIG_ZBOOT_ROM_BSS=0x0
+CONFIG_BINFMT_FLAT=y
+# CONFIG_SUSPEND is not set
+# CONFIG_FW_LOADER is not set
+CONFIG_MTD=y
+CONFIG_MTD_PARTITIONS=y
+CONFIG_MTD_CHAR=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_RAM=y
+CONFIG_MTD_ROM=y
+CONFIG_BLK_DEV_RAM=y
+# CONFIG_INPUT is not set
+# CONFIG_SERIO is not set
+# CONFIG_VT is not set
+# CONFIG_DEVKMEM is not set
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+# CONFIG_USB_SUPPORT is not set
+CONFIG_EXT2_FS=y
+# CONFIG_DNOTIFY is not set
+CONFIG_ROMFS_FS=y
+# CONFIG_ENABLE_MUST_CHECK is not set
--- /dev/null
+#ifndef __ASMARM_I8253_H
+#define __ASMARM_I8253_H
+
+/* i8253A PIT registers */
+#define PIT_MODE 0x43
+#define PIT_CH0 0x40
+
+#define PIT_LATCH ((PIT_TICK_RATE + HZ / 2) / HZ)
+
+extern raw_spinlock_t i8253_lock;
+
+#define outb_pit outb_p
+#define inb_pit inb_p
+
+#endif
struct kprobe;
typedef void (kprobe_insn_handler_t)(struct kprobe *, struct pt_regs *);
+typedef unsigned long (kprobe_check_cc)(unsigned long);
+
/* Architecture specific copy of original instruction. */
struct arch_specific_insn {
kprobe_opcode_t *insn;
kprobe_insn_handler_t *insn_handler;
+ kprobe_check_cc *insn_check_cc;
};
struct prev_kprobe {
* timer interrupt which may be pending.
*/
struct sys_timer {
- struct sys_device dev;
void (*init)(void);
void (*suspend)(void);
void (*resume)(void);
#include <mach/barriers.h>
#elif defined(CONFIG_ARM_DMA_MEM_BUFFERABLE) || defined(CONFIG_SMP)
#define mb() do { dsb(); outer_sync(); } while (0)
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() mb()
#else
#include <asm/memory.h>
*
* *) If the PC is written to by the instruction, the
* instruction must be fully simulated in software.
- * If it is a conditional instruction, the handler
- * will use insn[0] to copy its condition code to
- * set r0 to 1 and insn[1] to "mov pc, lr" to return.
*
* *) Otherwise, a modified form of the instruction is
* directly executed. Its handler calls the
#define branch_displacement(insn) sign_extend(((insn) & 0xffffff) << 2, 25)
+#define is_r15(insn, bitpos) (((insn) & (0xf << bitpos)) == (0xf << bitpos))
+
+/*
+ * Test if load/store instructions writeback the address register.
+ * if P (bit 24) == 0 or W (bit 21) == 1
+ */
+#define is_writeback(insn) ((insn ^ 0x01000000) & 0x01200000)
+
#define PSR_fs (PSR_f|PSR_s)
#define KPROBE_RETURN_INSTRUCTION 0xe1a0f00e /* mov pc, lr */
-#define SET_R0_TRUE_INSTRUCTION 0xe3a00001 /* mov r0, #1 */
-
-#define truecc_insn(insn) (((insn) & 0xf0000000) | \
- (SET_R0_TRUE_INSTRUCTION & 0x0fffffff))
typedef long (insn_0arg_fn_t)(void);
typedef long (insn_1arg_fn_t)(long);
static void __kprobes simulate_bbl(struct kprobe *p, struct pt_regs *regs)
{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
long iaddr = (long)p->addr;
int disp = branch_displacement(insn);
- if (!insnslot_1arg_rflags(0, regs->ARM_cpsr, i_fn))
- return;
-
if (insn & (1 << 24))
regs->ARM_lr = iaddr + 4;
static void __kprobes simulate_blx2bx(struct kprobe *p, struct pt_regs *regs)
{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
int rm = insn & 0xf;
long rmv = regs->uregs[rm];
- if (!insnslot_1arg_rflags(0, regs->ARM_cpsr, i_fn))
- return;
-
if (insn & (1 << 5))
regs->ARM_lr = (long)p->addr + 4;
regs->ARM_cpsr |= PSR_T_BIT;
}
+static void __kprobes simulate_mrs(struct kprobe *p, struct pt_regs *regs)
+{
+ kprobe_opcode_t insn = p->opcode;
+ int rd = (insn >> 12) & 0xf;
+ unsigned long mask = 0xf8ff03df; /* Mask out execution state */
+ regs->uregs[rd] = regs->ARM_cpsr & mask;
+}
+
static void __kprobes simulate_ldm1stm1(struct kprobe *p, struct pt_regs *regs)
{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
int rn = (insn >> 16) & 0xf;
int lbit = insn & (1 << 20);
int reg_bit_vector;
int reg_count;
- if (!insnslot_1arg_rflags(0, regs->ARM_cpsr, i_fn))
- return;
-
reg_count = 0;
reg_bit_vector = insn & 0xffff;
while (reg_bit_vector) {
static void __kprobes simulate_stm1_pc(struct kprobe *p, struct pt_regs *regs)
{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
-
- if (!insnslot_1arg_rflags(0, regs->ARM_cpsr, i_fn))
- return;
-
regs->ARM_pc = (long)p->addr + str_pc_offset;
simulate_ldm1stm1(p, regs);
regs->ARM_pc = (long)p->addr + 4;
regs->uregs[12] = regs->uregs[13];
}
-static void __kprobes emulate_ldcstc(struct kprobe *p, struct pt_regs *regs)
-{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
- kprobe_opcode_t insn = p->opcode;
- int rn = (insn >> 16) & 0xf;
- long rnv = regs->uregs[rn];
-
- /* Save Rn in case of writeback. */
- regs->uregs[rn] = insnslot_1arg_rflags(rnv, regs->ARM_cpsr, i_fn);
-}
-
static void __kprobes emulate_ldrd(struct kprobe *p, struct pt_regs *regs)
{
insn_2arg_fn_t *i_fn = (insn_2arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
+ long ppc = (long)p->addr + 8;
int rd = (insn >> 12) & 0xf;
int rn = (insn >> 16) & 0xf;
int rm = insn & 0xf; /* rm may be invalid, don't care. */
+ long rmv = (rm == 15) ? ppc : regs->uregs[rm];
+ long rnv = (rn == 15) ? ppc : regs->uregs[rn];
/* Not following the C calling convention here, so need asm(). */
__asm__ __volatile__ (
"str r0, %[rn] \n\t" /* in case of writeback */
"str r2, %[rd0] \n\t"
"str r3, %[rd1] \n\t"
- : [rn] "+m" (regs->uregs[rn]),
+ : [rn] "+m" (rnv),
[rd0] "=m" (regs->uregs[rd]),
[rd1] "=m" (regs->uregs[rd+1])
- : [rm] "m" (regs->uregs[rm]),
+ : [rm] "m" (rmv),
[cpsr] "r" (regs->ARM_cpsr),
[i_fn] "r" (i_fn)
: "r0", "r1", "r2", "r3", "lr", "cc"
);
+ if (is_writeback(insn))
+ regs->uregs[rn] = rnv;
}
static void __kprobes emulate_strd(struct kprobe *p, struct pt_regs *regs)
{
insn_4arg_fn_t *i_fn = (insn_4arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
+ long ppc = (long)p->addr + 8;
int rd = (insn >> 12) & 0xf;
int rn = (insn >> 16) & 0xf;
int rm = insn & 0xf;
- long rnv = regs->uregs[rn];
- long rmv = regs->uregs[rm]; /* rm/rmv may be invalid, don't care. */
+ long rnv = (rn == 15) ? ppc : regs->uregs[rn];
+ /* rm/rmv may be invalid, don't care. */
+ long rmv = (rm == 15) ? ppc : regs->uregs[rm];
+ long rnv_wb;
- regs->uregs[rn] = insnslot_4arg_rflags(rnv, rmv, regs->uregs[rd],
+ rnv_wb = insnslot_4arg_rflags(rnv, rmv, regs->uregs[rd],
regs->uregs[rd+1],
regs->ARM_cpsr, i_fn);
+ if (is_writeback(insn))
+ regs->uregs[rn] = rnv_wb;
}
static void __kprobes emulate_ldr(struct kprobe *p, struct pt_regs *regs)
regs->uregs[rn] = rnv_wb; /* Save Rn in case of writeback. */
}
-static void __kprobes emulate_mrrc(struct kprobe *p, struct pt_regs *regs)
-{
- insn_llret_0arg_fn_t *i_fn = (insn_llret_0arg_fn_t *)&p->ainsn.insn[0];
- kprobe_opcode_t insn = p->opcode;
- union reg_pair fnr;
- int rd = (insn >> 12) & 0xf;
- int rn = (insn >> 16) & 0xf;
-
- fnr.dr = insnslot_llret_0arg_rflags(regs->ARM_cpsr, i_fn);
- regs->uregs[rn] = fnr.r0;
- regs->uregs[rd] = fnr.r1;
-}
-
-static void __kprobes emulate_mcrr(struct kprobe *p, struct pt_regs *regs)
-{
- insn_2arg_fn_t *i_fn = (insn_2arg_fn_t *)&p->ainsn.insn[0];
- kprobe_opcode_t insn = p->opcode;
- int rd = (insn >> 12) & 0xf;
- int rn = (insn >> 16) & 0xf;
- long rnv = regs->uregs[rn];
- long rdv = regs->uregs[rd];
-
- insnslot_2arg_rflags(rnv, rdv, regs->ARM_cpsr, i_fn);
-}
-
static void __kprobes emulate_sat(struct kprobe *p, struct pt_regs *regs)
{
insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
insnslot_0arg_rflags(regs->ARM_cpsr, i_fn);
}
-static void __kprobes emulate_rd12(struct kprobe *p, struct pt_regs *regs)
+static void __kprobes emulate_nop(struct kprobe *p, struct pt_regs *regs)
{
- insn_0arg_fn_t *i_fn = (insn_0arg_fn_t *)&p->ainsn.insn[0];
- kprobe_opcode_t insn = p->opcode;
- int rd = (insn >> 12) & 0xf;
-
- regs->uregs[rd] = insnslot_0arg_rflags(regs->ARM_cpsr, i_fn);
}
-static void __kprobes emulate_ird12(struct kprobe *p, struct pt_regs *regs)
+static void __kprobes
+emulate_rd12_modify(struct kprobe *p, struct pt_regs *regs)
{
insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
- int ird = (insn >> 12) & 0xf;
+ int rd = (insn >> 12) & 0xf;
+ long rdv = regs->uregs[rd];
- insnslot_1arg_rflags(regs->uregs[ird], regs->ARM_cpsr, i_fn);
+ regs->uregs[rd] = insnslot_1arg_rflags(rdv, regs->ARM_cpsr, i_fn);
}
-static void __kprobes emulate_rn16(struct kprobe *p, struct pt_regs *regs)
+static void __kprobes
+emulate_rd12rn0_modify(struct kprobe *p, struct pt_regs *regs)
{
- insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
+ insn_2arg_fn_t *i_fn = (insn_2arg_fn_t *)&p->ainsn.insn[0];
kprobe_opcode_t insn = p->opcode;
- int rn = (insn >> 16) & 0xf;
+ int rd = (insn >> 12) & 0xf;
+ int rn = insn & 0xf;
+ long rdv = regs->uregs[rd];
long rnv = regs->uregs[rn];
- insnslot_1arg_rflags(rnv, regs->ARM_cpsr, i_fn);
+ regs->uregs[rd] = insnslot_2arg_rflags(rdv, rnv, regs->ARM_cpsr, i_fn);
}
static void __kprobes emulate_rd12rm0(struct kprobe *p, struct pt_regs *regs)
regs->uregs[rd] = insnslot_1arg_rwflags(rnv, ®s->ARM_cpsr, i_fn);
}
+static void __kprobes
+emulate_alu_tests_imm(struct kprobe *p, struct pt_regs *regs)
+{
+ insn_1arg_fn_t *i_fn = (insn_1arg_fn_t *)&p->ainsn.insn[0];
+ kprobe_opcode_t insn = p->opcode;
+ int rn = (insn >> 16) & 0xf;
+ long rnv = (rn == 15) ? (long)p->addr + 8 : regs->uregs[rn];
+
+ insnslot_1arg_rwflags(rnv, ®s->ARM_cpsr, i_fn);
+}
+
static void __kprobes
emulate_alu_rflags(struct kprobe *p, struct pt_regs *regs)
{
insnslot_3arg_rwflags(rnv, rmv, rsv, ®s->ARM_cpsr, i_fn);
}
+static void __kprobes
+emulate_alu_tests(struct kprobe *p, struct pt_regs *regs)
+{
+ insn_3arg_fn_t *i_fn = (insn_3arg_fn_t *)&p->ainsn.insn[0];
+ kprobe_opcode_t insn = p->opcode;
+ long ppc = (long)p->addr + 8;
+ int rn = (insn >> 16) & 0xf;
+ int rs = (insn >> 8) & 0xf; /* rs/rsv may be invalid, don't care. */
+ int rm = insn & 0xf;
+ long rnv = (rn == 15) ? ppc : regs->uregs[rn];
+ long rmv = (rm == 15) ? ppc : regs->uregs[rm];
+ long rsv = regs->uregs[rs];
+
+ insnslot_3arg_rwflags(rnv, rmv, rsv, ®s->ARM_cpsr, i_fn);
+}
+
static enum kprobe_insn __kprobes
prep_emulate_ldr_str(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
- int ibit = (insn & (1 << 26)) ? 25 : 22;
+ int not_imm = (insn & (1 << 26)) ? (insn & (1 << 25))
+ : (~insn & (1 << 22));
+
+ if (is_writeback(insn) && is_r15(insn, 16))
+ return INSN_REJECTED; /* Writeback to PC */
insn &= 0xfff00fff;
insn |= 0x00001000; /* Rn = r0, Rd = r1 */
- if (insn & (1 << ibit)) {
+ if (not_imm) {
insn &= ~0xf;
insn |= 2; /* Rm = r2 */
}
}
static enum kprobe_insn __kprobes
-prep_emulate_rd12rm0(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+prep_emulate_rd12_modify(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
- insn &= 0xffff0ff0; /* Rd = r0, Rm = r0 */
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
+
+ insn &= 0xffff0fff; /* Rd = r0 */
asi->insn[0] = insn;
- asi->insn_handler = emulate_rd12rm0;
+ asi->insn_handler = emulate_rd12_modify;
return INSN_GOOD;
}
static enum kprobe_insn __kprobes
-prep_emulate_rd12(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+prep_emulate_rd12rn0_modify(kprobe_opcode_t insn,
+ struct arch_specific_insn *asi)
{
- insn &= 0xffff0fff; /* Rd = r0 */
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
+
+ insn &= 0xffff0ff0; /* Rd = r0 */
+ insn |= 0x00000001; /* Rn = r1 */
+ asi->insn[0] = insn;
+ asi->insn_handler = emulate_rd12rn0_modify;
+ return INSN_GOOD;
+}
+
+static enum kprobe_insn __kprobes
+prep_emulate_rd12rm0(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+{
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
+
+ insn &= 0xffff0ff0; /* Rd = r0, Rm = r0 */
asi->insn[0] = insn;
- asi->insn_handler = emulate_rd12;
+ asi->insn_handler = emulate_rd12rm0;
return INSN_GOOD;
}
prep_emulate_rd12rn16rm0_wflags(kprobe_opcode_t insn,
struct arch_specific_insn *asi)
{
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
+
insn &= 0xfff00ff0; /* Rd = r0, Rn = r0 */
insn |= 0x00000001; /* Rm = r1 */
asi->insn[0] = insn;
prep_emulate_rd16rs8rm0_wflags(kprobe_opcode_t insn,
struct arch_specific_insn *asi)
{
+ if (is_r15(insn, 16))
+ return INSN_REJECTED; /* Rd is PC */
+
insn &= 0xfff0f0f0; /* Rd = r0, Rs = r0 */
insn |= 0x00000001; /* Rm = r1 */
asi->insn[0] = insn;
prep_emulate_rd16rn12rs8rm0_wflags(kprobe_opcode_t insn,
struct arch_specific_insn *asi)
{
+ if (is_r15(insn, 16))
+ return INSN_REJECTED; /* Rd is PC */
+
insn &= 0xfff000f0; /* Rd = r0, Rn = r0 */
insn |= 0x00000102; /* Rs = r1, Rm = r2 */
asi->insn[0] = insn;
prep_emulate_rdhi16rdlo12rs8rm0_wflags(kprobe_opcode_t insn,
struct arch_specific_insn *asi)
{
+ if (is_r15(insn, 16) || is_r15(insn, 12))
+ return INSN_REJECTED; /* RdHi or RdLo is PC */
+
insn &= 0xfff000f0; /* RdHi = r0, RdLo = r1 */
insn |= 0x00001203; /* Rs = r2, Rm = r3 */
asi->insn[0] = insn;
static enum kprobe_insn __kprobes
space_1111(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
- /* CPS mmod == 1 : 1111 0001 0000 xx10 xxxx xxxx xx0x xxxx */
- /* RFE : 1111 100x x0x1 xxxx xxxx 1010 xxxx xxxx */
- /* SRS : 1111 100x x1x0 1101 xxxx 0101 xxxx xxxx */
- if ((insn & 0xfff30020) == 0xf1020000 ||
- (insn & 0xfe500f00) == 0xf8100a00 ||
- (insn & 0xfe5f0f00) == 0xf84d0500)
- return INSN_REJECTED;
-
- /* PLD : 1111 01x1 x101 xxxx xxxx xxxx xxxx xxxx : */
- if ((insn & 0xfd700000) == 0xf4500000) {
- insn &= 0xfff0ffff; /* Rn = r0 */
- asi->insn[0] = insn;
- asi->insn_handler = emulate_rn16;
- return INSN_GOOD;
+ /* memory hint : 1111 0100 x001 xxxx xxxx xxxx xxxx xxxx : */
+ /* PLDI : 1111 0100 x101 xxxx xxxx xxxx xxxx xxxx : */
+ /* PLDW : 1111 0101 x001 xxxx xxxx xxxx xxxx xxxx : */
+ /* PLD : 1111 0101 x101 xxxx xxxx xxxx xxxx xxxx : */
+ if ((insn & 0xfe300000) == 0xf4100000) {
+ asi->insn_handler = emulate_nop;
+ return INSN_GOOD_NO_SLOT;
}
/* BLX(1) : 1111 101x xxxx xxxx xxxx xxxx xxxx xxxx : */
return INSN_GOOD_NO_SLOT;
}
- /* SETEND : 1111 0001 0000 0001 xxxx xxxx 0000 xxxx */
- /* CDP2 : 1111 1110 xxxx xxxx xxxx xxxx xxx0 xxxx */
- if ((insn & 0xffff00f0) == 0xf1010000 ||
- (insn & 0xff000010) == 0xfe000000) {
- asi->insn[0] = insn;
- asi->insn_handler = emulate_none;
- return INSN_GOOD;
- }
+ /* CPS : 1111 0001 0000 xxx0 xxxx xxxx xx0x xxxx */
+ /* SETEND: 1111 0001 0000 0001 xxxx xxxx 0000 xxxx */
+ /* SRS : 1111 100x x1x0 xxxx xxxx xxxx xxxx xxxx */
+ /* RFE : 1111 100x x0x1 xxxx xxxx xxxx xxxx xxxx */
+
+ /* Coprocessor instructions... */
/* MCRR2 : 1111 1100 0100 xxxx xxxx xxxx xxxx xxxx : (Rd != Rn) */
/* MRRC2 : 1111 1100 0101 xxxx xxxx xxxx xxxx xxxx : (Rd != Rn) */
- if ((insn & 0xffe00000) == 0xfc400000) {
- insn &= 0xfff00fff; /* Rn = r0 */
- insn |= 0x00001000; /* Rd = r1 */
- asi->insn[0] = insn;
- asi->insn_handler =
- (insn & (1 << 20)) ? emulate_mrrc : emulate_mcrr;
- return INSN_GOOD;
- }
+ /* LDC2 : 1111 110x xxx1 xxxx xxxx xxxx xxxx xxxx */
+ /* STC2 : 1111 110x xxx0 xxxx xxxx xxxx xxxx xxxx */
+ /* CDP2 : 1111 1110 xxxx xxxx xxxx xxxx xxx0 xxxx */
+ /* MCR2 : 1111 1110 xxx0 xxxx xxxx xxxx xxx1 xxxx */
+ /* MRC2 : 1111 1110 xxx1 xxxx xxxx xxxx xxx1 xxxx */
- /* LDC2 : 1111 110x xxx1 xxxx xxxx xxxx xxxx xxxx */
- /* STC2 : 1111 110x xxx0 xxxx xxxx xxxx xxxx xxxx */
- if ((insn & 0xfe000000) == 0xfc000000) {
- insn &= 0xfff0ffff; /* Rn = r0 */
- asi->insn[0] = insn;
- asi->insn_handler = emulate_ldcstc;
- return INSN_GOOD;
- }
-
- /* MCR2 : 1111 1110 xxx0 xxxx xxxx xxxx xxx1 xxxx */
- /* MRC2 : 1111 1110 xxx1 xxxx xxxx xxxx xxx1 xxxx */
- insn &= 0xffff0fff; /* Rd = r0 */
- asi->insn[0] = insn;
- asi->insn_handler = (insn & (1 << 20)) ? emulate_rd12 : emulate_ird12;
- return INSN_GOOD;
+ return INSN_REJECTED;
}
static enum kprobe_insn __kprobes
/* cccc 0001 0xx0 xxxx xxxx xxxx xxxx xxx0 xxxx */
if ((insn & 0x0f900010) == 0x01000000) {
- /* BXJ : cccc 0001 0010 xxxx xxxx xxxx 0010 xxxx */
- /* MSR : cccc 0001 0x10 xxxx xxxx xxxx 0000 xxxx */
- if ((insn & 0x0ff000f0) == 0x01200020 ||
- (insn & 0x0fb000f0) == 0x01200000)
- return INSN_REJECTED;
-
- /* MRS : cccc 0001 0x00 xxxx xxxx xxxx 0000 xxxx */
- if ((insn & 0x0fb00010) == 0x01000000)
- return prep_emulate_rd12(insn, asi);
+ /* MRS cpsr : cccc 0001 0000 xxxx xxxx xxxx 0000 xxxx */
+ if ((insn & 0x0ff000f0) == 0x01000000) {
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
+ asi->insn_handler = simulate_mrs;
+ return INSN_GOOD_NO_SLOT;
+ }
/* SMLALxy : cccc 0001 0100 xxxx xxxx xxxx 1xx0 xxxx */
if ((insn & 0x0ff00090) == 0x01400080)
- return prep_emulate_rdhi16rdlo12rs8rm0_wflags(insn, asi);
+ return prep_emulate_rdhi16rdlo12rs8rm0_wflags(insn,
+ asi);
/* SMULWy : cccc 0001 0010 xxxx xxxx xxxx 1x10 xxxx */
/* SMULxy : cccc 0001 0110 xxxx xxxx xxxx 1xx0 xxxx */
return prep_emulate_rd16rs8rm0_wflags(insn, asi);
/* SMLAxy : cccc 0001 0000 xxxx xxxx xxxx 1xx0 xxxx : Q */
- /* SMLAWy : cccc 0001 0010 xxxx xxxx xxxx 0x00 xxxx : Q */
- return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
+ /* SMLAWy : cccc 0001 0010 xxxx xxxx xxxx 1x00 xxxx : Q */
+ if ((insn & 0x0ff00090) == 0x01000080 ||
+ (insn & 0x0ff000b0) == 0x01200080)
+ return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
+
+ /* BXJ : cccc 0001 0010 xxxx xxxx xxxx 0010 xxxx */
+ /* MSR : cccc 0001 0x10 xxxx xxxx xxxx 0000 xxxx */
+ /* MRS spsr : cccc 0001 0100 xxxx xxxx xxxx 0000 xxxx */
+ /* Other instruction encodings aren't yet defined */
+ return INSN_REJECTED;
}
/* cccc 0001 0xx0 xxxx xxxx xxxx xxxx 0xx1 xxxx */
else if ((insn & 0x0f900090) == 0x01000010) {
- /* BKPT : 1110 0001 0010 xxxx xxxx xxxx 0111 xxxx */
- if ((insn & 0xfff000f0) == 0xe1200070)
- return INSN_REJECTED;
-
/* BLX(2) : cccc 0001 0010 xxxx xxxx xxxx 0011 xxxx */
/* BX : cccc 0001 0010 xxxx xxxx xxxx 0001 xxxx */
if ((insn & 0x0ff000d0) == 0x01200010) {
- asi->insn[0] = truecc_insn(insn);
+ if ((insn & 0x0ff000ff) == 0x0120003f)
+ return INSN_REJECTED; /* BLX pc */
asi->insn_handler = simulate_blx2bx;
- return INSN_GOOD;
+ return INSN_GOOD_NO_SLOT;
}
/* CLZ : cccc 0001 0110 xxxx xxxx xxxx 0001 xxxx */
/* QSUB : cccc 0001 0010 xxxx xxxx xxxx 0101 xxxx :Q */
/* QDADD : cccc 0001 0100 xxxx xxxx xxxx 0101 xxxx :Q */
/* QDSUB : cccc 0001 0110 xxxx xxxx xxxx 0101 xxxx :Q */
- return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+ if ((insn & 0x0f9000f0) == 0x01000050)
+ return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+
+ /* BKPT : 1110 0001 0010 xxxx xxxx xxxx 0111 xxxx */
+ /* SMC : cccc 0001 0110 xxxx xxxx xxxx 0111 xxxx */
+
+ /* Other instruction encodings aren't yet defined */
+ return INSN_REJECTED;
}
/* cccc 0000 xxxx xxxx xxxx xxxx xxxx 1001 xxxx */
- else if ((insn & 0x0f000090) == 0x00000090) {
+ else if ((insn & 0x0f0000f0) == 0x00000090) {
/* MUL : cccc 0000 0000 xxxx xxxx xxxx 1001 xxxx : */
/* MULS : cccc 0000 0001 xxxx xxxx xxxx 1001 xxxx :cc */
/* MLA : cccc 0000 0010 xxxx xxxx xxxx 1001 xxxx : */
/* MLAS : cccc 0000 0011 xxxx xxxx xxxx 1001 xxxx :cc */
/* UMAAL : cccc 0000 0100 xxxx xxxx xxxx 1001 xxxx : */
+ /* undef : cccc 0000 0101 xxxx xxxx xxxx 1001 xxxx : */
+ /* MLS : cccc 0000 0110 xxxx xxxx xxxx 1001 xxxx : */
+ /* undef : cccc 0000 0111 xxxx xxxx xxxx 1001 xxxx : */
/* UMULL : cccc 0000 1000 xxxx xxxx xxxx 1001 xxxx : */
/* UMULLS : cccc 0000 1001 xxxx xxxx xxxx 1001 xxxx :cc */
/* UMLAL : cccc 0000 1010 xxxx xxxx xxxx 1001 xxxx : */
/* SMULLS : cccc 0000 1101 xxxx xxxx xxxx 1001 xxxx :cc */
/* SMLAL : cccc 0000 1110 xxxx xxxx xxxx 1001 xxxx : */
/* SMLALS : cccc 0000 1111 xxxx xxxx xxxx 1001 xxxx :cc */
- if ((insn & 0x0fe000f0) == 0x00000090) {
- return prep_emulate_rd16rs8rm0_wflags(insn, asi);
- } else if ((insn & 0x0fe000f0) == 0x00200090) {
- return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
- } else {
- return prep_emulate_rdhi16rdlo12rs8rm0_wflags(insn, asi);
- }
+ if ((insn & 0x00d00000) == 0x00500000)
+ return INSN_REJECTED;
+ else if ((insn & 0x00e00000) == 0x00000000)
+ return prep_emulate_rd16rs8rm0_wflags(insn, asi);
+ else if ((insn & 0x00a00000) == 0x00200000)
+ return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
+ else
+ return prep_emulate_rdhi16rdlo12rs8rm0_wflags(insn,
+ asi);
}
/* cccc 000x xxxx xxxx xxxx xxxx xxxx 1xx1 xxxx */
/* SWP : cccc 0001 0000 xxxx xxxx xxxx 1001 xxxx */
/* SWPB : cccc 0001 0100 xxxx xxxx xxxx 1001 xxxx */
- /* LDRD : cccc 000x xxx0 xxxx xxxx xxxx 1101 xxxx */
- /* STRD : cccc 000x xxx0 xxxx xxxx xxxx 1111 xxxx */
+ /* ??? : cccc 0001 0x01 xxxx xxxx xxxx 1001 xxxx */
+ /* ??? : cccc 0001 0x10 xxxx xxxx xxxx 1001 xxxx */
+ /* ??? : cccc 0001 0x11 xxxx xxxx xxxx 1001 xxxx */
/* STREX : cccc 0001 1000 xxxx xxxx xxxx 1001 xxxx */
/* LDREX : cccc 0001 1001 xxxx xxxx xxxx 1001 xxxx */
+ /* STREXD: cccc 0001 1010 xxxx xxxx xxxx 1001 xxxx */
+ /* LDREXD: cccc 0001 1011 xxxx xxxx xxxx 1001 xxxx */
+ /* STREXB: cccc 0001 1100 xxxx xxxx xxxx 1001 xxxx */
+ /* LDREXB: cccc 0001 1101 xxxx xxxx xxxx 1001 xxxx */
+ /* STREXH: cccc 0001 1110 xxxx xxxx xxxx 1001 xxxx */
+ /* LDREXH: cccc 0001 1111 xxxx xxxx xxxx 1001 xxxx */
+
+ /* LDRD : cccc 000x xxx0 xxxx xxxx xxxx 1101 xxxx */
+ /* STRD : cccc 000x xxx0 xxxx xxxx xxxx 1111 xxxx */
/* LDRH : cccc 000x xxx1 xxxx xxxx xxxx 1011 xxxx */
/* STRH : cccc 000x xxx0 xxxx xxxx xxxx 1011 xxxx */
/* LDRSB : cccc 000x xxx1 xxxx xxxx xxxx 1101 xxxx */
/* LDRSH : cccc 000x xxx1 xxxx xxxx xxxx 1111 xxxx */
- if ((insn & 0x0fb000f0) == 0x01000090) {
- /* SWP/SWPB */
- return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+ if ((insn & 0x0f0000f0) == 0x01000090) {
+ if ((insn & 0x0fb000f0) == 0x01000090) {
+ /* SWP/SWPB */
+ return prep_emulate_rd12rn16rm0_wflags(insn,
+ asi);
+ } else {
+ /* STREX/LDREX variants and unallocaed space */
+ return INSN_REJECTED;
+ }
+
} else if ((insn & 0x0e1000d0) == 0x00000d0) {
/* STRD/LDRD */
+ if ((insn & 0x0000e000) == 0x0000e000)
+ return INSN_REJECTED; /* Rd is LR or PC */
+ if (is_writeback(insn) && is_r15(insn, 16))
+ return INSN_REJECTED; /* Writeback to PC */
+
insn &= 0xfff00fff;
insn |= 0x00002000; /* Rn = r0, Rd = r2 */
- if (insn & (1 << 22)) {
- /* I bit */
+ if (!(insn & (1 << 22))) {
+ /* Register index */
insn &= ~0xf;
insn |= 1; /* Rm = r1 */
}
return INSN_GOOD;
}
+ /* LDRH/STRH/LDRSB/LDRSH */
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
return prep_emulate_ldr_str(insn, asi);
}
/*
* ALU op with S bit and Rd == 15 :
- * cccc 000x xxx1 xxxx 1111 xxxx xxxx xxxx
+ * cccc 000x xxx1 xxxx 1111 xxxx xxxx xxxx
*/
if ((insn & 0x0e10f000) == 0x0010f000)
return INSN_REJECTED;
insn |= 0x00000200; /* Rs = r2 */
}
asi->insn[0] = insn;
- asi->insn_handler = (insn & (1 << 20)) ? /* S-bit */
+
+ if ((insn & 0x0f900000) == 0x01100000) {
+ /*
+ * TST : cccc 0001 0001 xxxx xxxx xxxx xxxx xxxx
+ * TEQ : cccc 0001 0011 xxxx xxxx xxxx xxxx xxxx
+ * CMP : cccc 0001 0101 xxxx xxxx xxxx xxxx xxxx
+ * CMN : cccc 0001 0111 xxxx xxxx xxxx xxxx xxxx
+ */
+ asi->insn_handler = emulate_alu_tests;
+ } else {
+ /* ALU ops which write to Rd */
+ asi->insn_handler = (insn & (1 << 20)) ? /* S-bit */
emulate_alu_rwflags : emulate_alu_rflags;
+ }
return INSN_GOOD;
}
static enum kprobe_insn __kprobes
space_cccc_001x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
+ /* MOVW : cccc 0011 0000 xxxx xxxx xxxx xxxx xxxx */
+ /* MOVT : cccc 0011 0100 xxxx xxxx xxxx xxxx xxxx */
+ if ((insn & 0x0fb00000) == 0x03000000)
+ return prep_emulate_rd12_modify(insn, asi);
+
+ /* hints : cccc 0011 0010 0000 xxxx xxxx xxxx xxxx */
+ if ((insn & 0x0fff0000) == 0x03200000) {
+ unsigned op2 = insn & 0x000000ff;
+ if (op2 == 0x01 || op2 == 0x04) {
+ /* YIELD : cccc 0011 0010 0000 xxxx xxxx 0000 0001 */
+ /* SEV : cccc 0011 0010 0000 xxxx xxxx 0000 0100 */
+ asi->insn[0] = insn;
+ asi->insn_handler = emulate_none;
+ return INSN_GOOD;
+ } else if (op2 <= 0x03) {
+ /* NOP : cccc 0011 0010 0000 xxxx xxxx 0000 0000 */
+ /* WFE : cccc 0011 0010 0000 xxxx xxxx 0000 0010 */
+ /* WFI : cccc 0011 0010 0000 xxxx xxxx 0000 0011 */
+ /*
+ * We make WFE and WFI true NOPs to avoid stalls due
+ * to missing events whilst processing the probe.
+ */
+ asi->insn_handler = emulate_nop;
+ return INSN_GOOD_NO_SLOT;
+ }
+ /* For DBG and unallocated hints it's safest to reject them */
+ return INSN_REJECTED;
+ }
+
/*
* MSR : cccc 0011 0x10 xxxx xxxx xxxx xxxx xxxx
- * Undef : cccc 0011 0100 xxxx xxxx xxxx xxxx xxxx
* ALU op with S bit and Rd == 15 :
* cccc 001x xxx1 xxxx 1111 xxxx xxxx xxxx
*/
if ((insn & 0x0fb00000) == 0x03200000 || /* MSR */
- (insn & 0x0ff00000) == 0x03400000 || /* Undef */
(insn & 0x0e10f000) == 0x0210f000) /* ALU s-bit, R15 */
return INSN_REJECTED;
* *S (bit 20) updates condition codes
* ADC/SBC/RSC reads the C flag
*/
- insn &= 0xffff0fff; /* Rd = r0 */
+ insn &= 0xfff00fff; /* Rn = r0 and Rd = r0 */
asi->insn[0] = insn;
- asi->insn_handler = (insn & (1 << 20)) ? /* S-bit */
+
+ if ((insn & 0x0f900000) == 0x03100000) {
+ /*
+ * TST : cccc 0011 0001 xxxx xxxx xxxx xxxx xxxx
+ * TEQ : cccc 0011 0011 xxxx xxxx xxxx xxxx xxxx
+ * CMP : cccc 0011 0101 xxxx xxxx xxxx xxxx xxxx
+ * CMN : cccc 0011 0111 xxxx xxxx xxxx xxxx xxxx
+ */
+ asi->insn_handler = emulate_alu_tests_imm;
+ } else {
+ /* ALU ops which write to Rd */
+ asi->insn_handler = (insn & (1 << 20)) ? /* S-bit */
emulate_alu_imm_rwflags : emulate_alu_imm_rflags;
+ }
return INSN_GOOD;
}
{
/* SEL : cccc 0110 1000 xxxx xxxx xxxx 1011 xxxx GE: !!! */
if ((insn & 0x0ff000f0) == 0x068000b0) {
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
insn &= 0xfff00ff0; /* Rd = r0, Rn = r0 */
insn |= 0x00000001; /* Rm = r1 */
asi->insn[0] = insn;
/* USAT16 : cccc 0110 1110 xxxx xxxx xxxx 0011 xxxx :Q */
if ((insn & 0x0fa00030) == 0x06a00010 ||
(insn & 0x0fb000f0) == 0x06a00030) {
+ if (is_r15(insn, 12))
+ return INSN_REJECTED; /* Rd is PC */
insn &= 0xffff0ff0; /* Rd = r0, Rm = r0 */
asi->insn[0] = insn;
asi->insn_handler = emulate_sat;
/* REV : cccc 0110 1011 xxxx xxxx xxxx 0011 xxxx */
/* REV16 : cccc 0110 1011 xxxx xxxx xxxx 1011 xxxx */
+ /* RBIT : cccc 0110 1111 xxxx xxxx xxxx 0011 xxxx */
/* REVSH : cccc 0110 1111 xxxx xxxx xxxx 1011 xxxx */
if ((insn & 0x0ff00070) == 0x06b00030 ||
- (insn & 0x0ff000f0) == 0x06f000b0)
+ (insn & 0x0ff00070) == 0x06f00030)
return prep_emulate_rd12rm0(insn, asi);
+ /* ??? : cccc 0110 0000 xxxx xxxx xxxx xxx1 xxxx : */
/* SADD16 : cccc 0110 0001 xxxx xxxx xxxx 0001 xxxx :GE */
/* SADDSUBX : cccc 0110 0001 xxxx xxxx xxxx 0011 xxxx :GE */
/* SSUBADDX : cccc 0110 0001 xxxx xxxx xxxx 0101 xxxx :GE */
/* SSUB16 : cccc 0110 0001 xxxx xxxx xxxx 0111 xxxx :GE */
/* SADD8 : cccc 0110 0001 xxxx xxxx xxxx 1001 xxxx :GE */
+ /* ??? : cccc 0110 0001 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0001 xxxx xxxx xxxx 1101 xxxx : */
/* SSUB8 : cccc 0110 0001 xxxx xxxx xxxx 1111 xxxx :GE */
/* QADD16 : cccc 0110 0010 xxxx xxxx xxxx 0001 xxxx : */
/* QADDSUBX : cccc 0110 0010 xxxx xxxx xxxx 0011 xxxx : */
/* QSUBADDX : cccc 0110 0010 xxxx xxxx xxxx 0101 xxxx : */
/* QSUB16 : cccc 0110 0010 xxxx xxxx xxxx 0111 xxxx : */
/* QADD8 : cccc 0110 0010 xxxx xxxx xxxx 1001 xxxx : */
+ /* ??? : cccc 0110 0010 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0010 xxxx xxxx xxxx 1101 xxxx : */
/* QSUB8 : cccc 0110 0010 xxxx xxxx xxxx 1111 xxxx : */
/* SHADD16 : cccc 0110 0011 xxxx xxxx xxxx 0001 xxxx : */
/* SHADDSUBX : cccc 0110 0011 xxxx xxxx xxxx 0011 xxxx : */
/* SHSUBADDX : cccc 0110 0011 xxxx xxxx xxxx 0101 xxxx : */
/* SHSUB16 : cccc 0110 0011 xxxx xxxx xxxx 0111 xxxx : */
/* SHADD8 : cccc 0110 0011 xxxx xxxx xxxx 1001 xxxx : */
+ /* ??? : cccc 0110 0011 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0011 xxxx xxxx xxxx 1101 xxxx : */
/* SHSUB8 : cccc 0110 0011 xxxx xxxx xxxx 1111 xxxx : */
+ /* ??? : cccc 0110 0100 xxxx xxxx xxxx xxx1 xxxx : */
/* UADD16 : cccc 0110 0101 xxxx xxxx xxxx 0001 xxxx :GE */
/* UADDSUBX : cccc 0110 0101 xxxx xxxx xxxx 0011 xxxx :GE */
/* USUBADDX : cccc 0110 0101 xxxx xxxx xxxx 0101 xxxx :GE */
/* USUB16 : cccc 0110 0101 xxxx xxxx xxxx 0111 xxxx :GE */
/* UADD8 : cccc 0110 0101 xxxx xxxx xxxx 1001 xxxx :GE */
+ /* ??? : cccc 0110 0101 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0101 xxxx xxxx xxxx 1101 xxxx : */
/* USUB8 : cccc 0110 0101 xxxx xxxx xxxx 1111 xxxx :GE */
/* UQADD16 : cccc 0110 0110 xxxx xxxx xxxx 0001 xxxx : */
/* UQADDSUBX : cccc 0110 0110 xxxx xxxx xxxx 0011 xxxx : */
/* UQSUBADDX : cccc 0110 0110 xxxx xxxx xxxx 0101 xxxx : */
/* UQSUB16 : cccc 0110 0110 xxxx xxxx xxxx 0111 xxxx : */
/* UQADD8 : cccc 0110 0110 xxxx xxxx xxxx 1001 xxxx : */
+ /* ??? : cccc 0110 0110 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0110 xxxx xxxx xxxx 1101 xxxx : */
/* UQSUB8 : cccc 0110 0110 xxxx xxxx xxxx 1111 xxxx : */
/* UHADD16 : cccc 0110 0111 xxxx xxxx xxxx 0001 xxxx : */
/* UHADDSUBX : cccc 0110 0111 xxxx xxxx xxxx 0011 xxxx : */
/* UHSUBADDX : cccc 0110 0111 xxxx xxxx xxxx 0101 xxxx : */
/* UHSUB16 : cccc 0110 0111 xxxx xxxx xxxx 0111 xxxx : */
/* UHADD8 : cccc 0110 0111 xxxx xxxx xxxx 1001 xxxx : */
+ /* ??? : cccc 0110 0111 xxxx xxxx xxxx 1011 xxxx : */
+ /* ??? : cccc 0110 0111 xxxx xxxx xxxx 1101 xxxx : */
/* UHSUB8 : cccc 0110 0111 xxxx xxxx xxxx 1111 xxxx : */
+ if ((insn & 0x0f800010) == 0x06000010) {
+ if ((insn & 0x00300000) == 0x00000000 ||
+ (insn & 0x000000e0) == 0x000000a0 ||
+ (insn & 0x000000e0) == 0x000000c0)
+ return INSN_REJECTED; /* Unallocated space */
+ return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+ }
+
/* PKHBT : cccc 0110 1000 xxxx xxxx xxxx x001 xxxx : */
/* PKHTB : cccc 0110 1000 xxxx xxxx xxxx x101 xxxx : */
+ if ((insn & 0x0ff00030) == 0x06800010)
+ return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+
/* SXTAB16 : cccc 0110 1000 xxxx xxxx xxxx 0111 xxxx : */
- /* SXTB : cccc 0110 1010 xxxx xxxx xxxx 0111 xxxx : */
+ /* SXTB16 : cccc 0110 1000 1111 xxxx xxxx 0111 xxxx : */
+ /* ??? : cccc 0110 1001 xxxx xxxx xxxx 0111 xxxx : */
/* SXTAB : cccc 0110 1010 xxxx xxxx xxxx 0111 xxxx : */
+ /* SXTB : cccc 0110 1010 1111 xxxx xxxx 0111 xxxx : */
/* SXTAH : cccc 0110 1011 xxxx xxxx xxxx 0111 xxxx : */
+ /* SXTH : cccc 0110 1011 1111 xxxx xxxx 0111 xxxx : */
/* UXTAB16 : cccc 0110 1100 xxxx xxxx xxxx 0111 xxxx : */
+ /* UXTB16 : cccc 0110 1100 1111 xxxx xxxx 0111 xxxx : */
+ /* ??? : cccc 0110 1101 xxxx xxxx xxxx 0111 xxxx : */
/* UXTAB : cccc 0110 1110 xxxx xxxx xxxx 0111 xxxx : */
+ /* UXTB : cccc 0110 1110 1111 xxxx xxxx 0111 xxxx : */
/* UXTAH : cccc 0110 1111 xxxx xxxx xxxx 0111 xxxx : */
- return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+ /* UXTH : cccc 0110 1111 1111 xxxx xxxx 0111 xxxx : */
+ if ((insn & 0x0f8000f0) == 0x06800070) {
+ if ((insn & 0x00300000) == 0x00100000)
+ return INSN_REJECTED; /* Unallocated space */
+
+ if ((insn & 0x000f0000) == 0x000f0000)
+ return prep_emulate_rd12rm0(insn, asi);
+ else
+ return prep_emulate_rd12rn16rm0_wflags(insn, asi);
+ }
+
+ /* Other instruction encodings aren't yet defined */
+ return INSN_REJECTED;
}
static enum kprobe_insn __kprobes
if ((insn & 0x0ff000f0) == 0x03f000f0)
return INSN_REJECTED;
- /* USADA8 : cccc 0111 1000 xxxx xxxx xxxx 0001 xxxx */
- /* USAD8 : cccc 0111 1000 xxxx 1111 xxxx 0001 xxxx */
- if ((insn & 0x0ff000f0) == 0x07800010)
- return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
-
/* SMLALD : cccc 0111 0100 xxxx xxxx xxxx 00x1 xxxx */
/* SMLSLD : cccc 0111 0100 xxxx xxxx xxxx 01x1 xxxx */
if ((insn & 0x0ff00090) == 0x07400010)
return prep_emulate_rdhi16rdlo12rs8rm0_wflags(insn, asi);
/* SMLAD : cccc 0111 0000 xxxx xxxx xxxx 00x1 xxxx :Q */
+ /* SMUAD : cccc 0111 0000 xxxx 1111 xxxx 00x1 xxxx :Q */
/* SMLSD : cccc 0111 0000 xxxx xxxx xxxx 01x1 xxxx :Q */
+ /* SMUSD : cccc 0111 0000 xxxx 1111 xxxx 01x1 xxxx : */
/* SMMLA : cccc 0111 0101 xxxx xxxx xxxx 00x1 xxxx : */
- /* SMMLS : cccc 0111 0101 xxxx xxxx xxxx 11x1 xxxx : */
+ /* SMMUL : cccc 0111 0101 xxxx 1111 xxxx 00x1 xxxx : */
+ /* USADA8 : cccc 0111 1000 xxxx xxxx xxxx 0001 xxxx : */
+ /* USAD8 : cccc 0111 1000 xxxx 1111 xxxx 0001 xxxx : */
if ((insn & 0x0ff00090) == 0x07000010 ||
(insn & 0x0ff000d0) == 0x07500010 ||
- (insn & 0x0ff000d0) == 0x075000d0)
+ (insn & 0x0ff000f0) == 0x07800010) {
+
+ if ((insn & 0x0000f000) == 0x0000f000)
+ return prep_emulate_rd16rs8rm0_wflags(insn, asi);
+ else
+ return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
+ }
+
+ /* SMMLS : cccc 0111 0101 xxxx xxxx xxxx 11x1 xxxx : */
+ if ((insn & 0x0ff000d0) == 0x075000d0)
return prep_emulate_rd16rn12rs8rm0_wflags(insn, asi);
- /* SMUSD : cccc 0111 0000 xxxx xxxx xxxx 01x1 xxxx : */
- /* SMUAD : cccc 0111 0000 xxxx 1111 xxxx 00x1 xxxx :Q */
- /* SMMUL : cccc 0111 0101 xxxx 1111 xxxx 00x1 xxxx : */
- return prep_emulate_rd16rs8rm0_wflags(insn, asi);
+ /* SBFX : cccc 0111 101x xxxx xxxx xxxx x101 xxxx : */
+ /* UBFX : cccc 0111 111x xxxx xxxx xxxx x101 xxxx : */
+ if ((insn & 0x0fa00070) == 0x07a00050)
+ return prep_emulate_rd12rm0(insn, asi);
+
+ /* BFI : cccc 0111 110x xxxx xxxx xxxx x001 xxxx : */
+ /* BFC : cccc 0111 110x xxxx xxxx xxxx x001 1111 : */
+ if ((insn & 0x0fe00070) == 0x07c00010) {
+
+ if ((insn & 0x0000000f) == 0x0000000f)
+ return prep_emulate_rd12_modify(insn, asi);
+ else
+ return prep_emulate_rd12rn0_modify(insn, asi);
+ }
+
+ return INSN_REJECTED;
}
static enum kprobe_insn __kprobes
/* STRB : cccc 01xx x1x0 xxxx xxxx xxxx xxxx xxxx */
/* STRBT : cccc 01x0 x110 xxxx xxxx xxxx xxxx xxxx */
/* STRT : cccc 01x0 x010 xxxx xxxx xxxx xxxx xxxx */
+
+ if ((insn & 0x00500000) == 0x00500000 && is_r15(insn, 12))
+ return INSN_REJECTED; /* LDRB into PC */
+
return prep_emulate_ldr_str(insn, asi);
}
/* LDM(1) : cccc 100x x0x1 xxxx xxxx xxxx xxxx xxxx */
/* STM(1) : cccc 100x x0x0 xxxx xxxx xxxx xxxx xxxx */
- asi->insn[0] = truecc_insn(insn);
asi->insn_handler = ((insn & 0x108000) == 0x008000) ? /* STM & R15 */
simulate_stm1_pc : simulate_ldm1stm1;
- return INSN_GOOD;
+ return INSN_GOOD_NO_SLOT;
}
static enum kprobe_insn __kprobes
{
/* B : cccc 1010 xxxx xxxx xxxx xxxx xxxx xxxx */
/* BL : cccc 1011 xxxx xxxx xxxx xxxx xxxx xxxx */
- asi->insn[0] = truecc_insn(insn);
asi->insn_handler = simulate_bbl;
- return INSN_GOOD;
+ return INSN_GOOD_NO_SLOT;
}
static enum kprobe_insn __kprobes
-space_cccc_1100_010x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+space_cccc_11xx(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
+ /* Coprocessor instructions... */
/* MCRR : cccc 1100 0100 xxxx xxxx xxxx xxxx xxxx : (Rd!=Rn) */
/* MRRC : cccc 1100 0101 xxxx xxxx xxxx xxxx xxxx : (Rd!=Rn) */
- insn &= 0xfff00fff;
- insn |= 0x00001000; /* Rn = r0, Rd = r1 */
- asi->insn[0] = insn;
- asi->insn_handler = (insn & (1 << 20)) ? emulate_mrrc : emulate_mcrr;
- return INSN_GOOD;
+ /* LDC : cccc 110x xxx1 xxxx xxxx xxxx xxxx xxxx */
+ /* STC : cccc 110x xxx0 xxxx xxxx xxxx xxxx xxxx */
+ /* CDP : cccc 1110 xxxx xxxx xxxx xxxx xxx0 xxxx */
+ /* MCR : cccc 1110 xxx0 xxxx xxxx xxxx xxx1 xxxx */
+ /* MRC : cccc 1110 xxx1 xxxx xxxx xxxx xxx1 xxxx */
+
+ /* SVC : cccc 1111 xxxx xxxx xxxx xxxx xxxx xxxx */
+
+ return INSN_REJECTED;
}
-static enum kprobe_insn __kprobes
-space_cccc_110x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+static unsigned long __kprobes __check_eq(unsigned long cpsr)
{
- /* LDC : cccc 110x xxx1 xxxx xxxx xxxx xxxx xxxx */
- /* STC : cccc 110x xxx0 xxxx xxxx xxxx xxxx xxxx */
- insn &= 0xfff0ffff; /* Rn = r0 */
- asi->insn[0] = insn;
- asi->insn_handler = emulate_ldcstc;
- return INSN_GOOD;
+ return cpsr & PSR_Z_BIT;
}
-static enum kprobe_insn __kprobes
-space_cccc_111x(kprobe_opcode_t insn, struct arch_specific_insn *asi)
+static unsigned long __kprobes __check_ne(unsigned long cpsr)
{
- /* BKPT : 1110 0001 0010 xxxx xxxx xxxx 0111 xxxx */
- /* SWI : cccc 1111 xxxx xxxx xxxx xxxx xxxx xxxx */
- if ((insn & 0xfff000f0) == 0xe1200070 ||
- (insn & 0x0f000000) == 0x0f000000)
- return INSN_REJECTED;
+ return (~cpsr) & PSR_Z_BIT;
+}
- /* CDP : cccc 1110 xxxx xxxx xxxx xxxx xxx0 xxxx */
- if ((insn & 0x0f000010) == 0x0e000000) {
- asi->insn[0] = insn;
- asi->insn_handler = emulate_none;
- return INSN_GOOD;
- }
+static unsigned long __kprobes __check_cs(unsigned long cpsr)
+{
+ return cpsr & PSR_C_BIT;
+}
- /* MCR : cccc 1110 xxx0 xxxx xxxx xxxx xxx1 xxxx */
- /* MRC : cccc 1110 xxx1 xxxx xxxx xxxx xxx1 xxxx */
- insn &= 0xffff0fff; /* Rd = r0 */
- asi->insn[0] = insn;
- asi->insn_handler = (insn & (1 << 20)) ? emulate_rd12 : emulate_ird12;
- return INSN_GOOD;
+static unsigned long __kprobes __check_cc(unsigned long cpsr)
+{
+ return (~cpsr) & PSR_C_BIT;
+}
+
+static unsigned long __kprobes __check_mi(unsigned long cpsr)
+{
+ return cpsr & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_pl(unsigned long cpsr)
+{
+ return (~cpsr) & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_vs(unsigned long cpsr)
+{
+ return cpsr & PSR_V_BIT;
+}
+
+static unsigned long __kprobes __check_vc(unsigned long cpsr)
+{
+ return (~cpsr) & PSR_V_BIT;
+}
+
+static unsigned long __kprobes __check_hi(unsigned long cpsr)
+{
+ cpsr &= ~(cpsr >> 1); /* PSR_C_BIT &= ~PSR_Z_BIT */
+ return cpsr & PSR_C_BIT;
}
+static unsigned long __kprobes __check_ls(unsigned long cpsr)
+{
+ cpsr &= ~(cpsr >> 1); /* PSR_C_BIT &= ~PSR_Z_BIT */
+ return (~cpsr) & PSR_C_BIT;
+}
+
+static unsigned long __kprobes __check_ge(unsigned long cpsr)
+{
+ cpsr ^= (cpsr << 3); /* PSR_N_BIT ^= PSR_V_BIT */
+ return (~cpsr) & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_lt(unsigned long cpsr)
+{
+ cpsr ^= (cpsr << 3); /* PSR_N_BIT ^= PSR_V_BIT */
+ return cpsr & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_gt(unsigned long cpsr)
+{
+ unsigned long temp = cpsr ^ (cpsr << 3); /* PSR_N_BIT ^= PSR_V_BIT */
+ temp |= (cpsr << 1); /* PSR_N_BIT |= PSR_Z_BIT */
+ return (~temp) & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_le(unsigned long cpsr)
+{
+ unsigned long temp = cpsr ^ (cpsr << 3); /* PSR_N_BIT ^= PSR_V_BIT */
+ temp |= (cpsr << 1); /* PSR_N_BIT |= PSR_Z_BIT */
+ return temp & PSR_N_BIT;
+}
+
+static unsigned long __kprobes __check_al(unsigned long cpsr)
+{
+ return true;
+}
+
+static kprobe_check_cc * const condition_checks[16] = {
+ &__check_eq, &__check_ne, &__check_cs, &__check_cc,
+ &__check_mi, &__check_pl, &__check_vs, &__check_vc,
+ &__check_hi, &__check_ls, &__check_ge, &__check_lt,
+ &__check_gt, &__check_le, &__check_al, &__check_al
+};
+
/* Return:
* INSN_REJECTED If instruction is one not allowed to kprobe,
* INSN_GOOD If instruction is supported and uses instruction slot,
enum kprobe_insn __kprobes
arm_kprobe_decode_insn(kprobe_opcode_t insn, struct arch_specific_insn *asi)
{
+ asi->insn_check_cc = condition_checks[insn>>28];
asi->insn[1] = KPROBE_RETURN_INSTRUCTION;
- if ((insn & 0xf0000000) == 0xf0000000) {
+ if ((insn & 0xf0000000) == 0xf0000000)
return space_1111(insn, asi);
- } else if ((insn & 0x0e000000) == 0x00000000) {
+ else if ((insn & 0x0e000000) == 0x00000000)
return space_cccc_000x(insn, asi);
- } else if ((insn & 0x0e000000) == 0x02000000) {
+ else if ((insn & 0x0e000000) == 0x02000000)
return space_cccc_001x(insn, asi);
- } else if ((insn & 0x0f000010) == 0x06000010) {
+ else if ((insn & 0x0f000010) == 0x06000010)
return space_cccc_0110__1(insn, asi);
- } else if ((insn & 0x0f000010) == 0x07000010) {
+ else if ((insn & 0x0f000010) == 0x07000010)
return space_cccc_0111__1(insn, asi);
- } else if ((insn & 0x0c000000) == 0x04000000) {
+ else if ((insn & 0x0c000000) == 0x04000000)
return space_cccc_01xx(insn, asi);
- } else if ((insn & 0x0e000000) == 0x08000000) {
+ else if ((insn & 0x0e000000) == 0x08000000)
return space_cccc_100x(insn, asi);
- } else if ((insn & 0x0e000000) == 0x0a000000) {
+ else if ((insn & 0x0e000000) == 0x0a000000)
return space_cccc_101x(insn, asi);
- } else if ((insn & 0x0fe00000) == 0x0c400000) {
-
- return space_cccc_1100_010x(insn, asi);
-
- } else if ((insn & 0x0e000000) == 0x0c000000) {
-
- return space_cccc_110x(insn, asi);
-
- }
-
- return space_cccc_111x(insn, asi);
+ return space_cccc_11xx(insn, asi);
}
void __init arm_kprobe_decode_init(void)
{
find_str_pc_offset();
}
-
-
-/*
- * All ARM instructions listed below.
- *
- * Instructions and their general purpose registers are given.
- * If a particular register may not use R15, it is prefixed with a "!".
- * If marked with a "*" means the value returned by reading R15
- * is implementation defined.
- *
- * ADC/ADD/AND/BIC/CMN/CMP/EOR/MOV/MVN/ORR/RSB/RSC/SBC/SUB/TEQ
- * TST: Rd, Rn, Rm, !Rs
- * BX: Rm
- * BLX(2): !Rm
- * BX: Rm (R15 legal, but discouraged)
- * BXJ: !Rm,
- * CLZ: !Rd, !Rm
- * CPY: Rd, Rm
- * LDC/2,STC/2 immediate offset & unindex: Rn
- * LDC/2,STC/2 immediate pre/post-indexed: !Rn
- * LDM(1/3): !Rn, register_list
- * LDM(2): !Rn, !register_list
- * LDR,STR,PLD immediate offset: Rd, Rn
- * LDR,STR,PLD register offset: Rd, Rn, !Rm
- * LDR,STR,PLD scaled register offset: Rd, !Rn, !Rm
- * LDR,STR immediate pre/post-indexed: Rd, !Rn
- * LDR,STR register pre/post-indexed: Rd, !Rn, !Rm
- * LDR,STR scaled register pre/post-indexed: Rd, !Rn, !Rm
- * LDRB,STRB immediate offset: !Rd, Rn
- * LDRB,STRB register offset: !Rd, Rn, !Rm
- * LDRB,STRB scaled register offset: !Rd, !Rn, !Rm
- * LDRB,STRB immediate pre/post-indexed: !Rd, !Rn
- * LDRB,STRB register pre/post-indexed: !Rd, !Rn, !Rm
- * LDRB,STRB scaled register pre/post-indexed: !Rd, !Rn, !Rm
- * LDRT,LDRBT,STRBT immediate pre/post-indexed: !Rd, !Rn
- * LDRT,LDRBT,STRBT register pre/post-indexed: !Rd, !Rn, !Rm
- * LDRT,LDRBT,STRBT scaled register pre/post-indexed: !Rd, !Rn, !Rm
- * LDRH/SH/SB/D,STRH/SH/SB/D immediate offset: !Rd, Rn
- * LDRH/SH/SB/D,STRH/SH/SB/D register offset: !Rd, Rn, !Rm
- * LDRH/SH/SB/D,STRH/SH/SB/D immediate pre/post-indexed: !Rd, !Rn
- * LDRH/SH/SB/D,STRH/SH/SB/D register pre/post-indexed: !Rd, !Rn, !Rm
- * LDREX: !Rd, !Rn
- * MCR/2: !Rd
- * MCRR/2,MRRC/2: !Rd, !Rn
- * MLA: !Rd, !Rn, !Rm, !Rs
- * MOV: Rd
- * MRC/2: !Rd (if Rd==15, only changes cond codes, not the register)
- * MRS,MSR: !Rd
- * MUL: !Rd, !Rm, !Rs
- * PKH{BT,TB}: !Rd, !Rn, !Rm
- * QDADD,[U]QADD/16/8/SUBX: !Rd, !Rm, !Rn
- * QDSUB,[U]QSUB/16/8/ADDX: !Rd, !Rm, !Rn
- * REV/16/SH: !Rd, !Rm
- * RFE: !Rn
- * {S,U}[H]ADD{16,8,SUBX},{S,U}[H]SUB{16,8,ADDX}: !Rd, !Rn, !Rm
- * SEL: !Rd, !Rn, !Rm
- * SMLA<x><y>,SMLA{D,W<y>},SMLSD,SMML{A,S}: !Rd, !Rn, !Rm, !Rs
- * SMLAL<x><y>,SMLA{D,LD},SMLSLD,SMMULL,SMULW<y>: !RdHi, !RdLo, !Rm, !Rs
- * SMMUL,SMUAD,SMUL<x><y>,SMUSD: !Rd, !Rm, !Rs
- * SSAT/16: !Rd, !Rm
- * STM(1/2): !Rn, register_list* (R15 in reg list not recommended)
- * STRT immediate pre/post-indexed: Rd*, !Rn
- * STRT register pre/post-indexed: Rd*, !Rn, !Rm
- * STRT scaled register pre/post-indexed: Rd*, !Rn, !Rm
- * STREX: !Rd, !Rn, !Rm
- * SWP/B: !Rd, !Rn, !Rm
- * {S,U}XTA{B,B16,H}: !Rd, !Rn, !Rm
- * {S,U}XT{B,B16,H}: !Rd, !Rm
- * UM{AA,LA,UL}L: !RdHi, !RdLo, !Rm, !Rs
- * USA{D8,A8,T,T16}: !Rd, !Rm, !Rs
- *
- * May transfer control by writing R15 (possible mode changes or alternate
- * mode accesses marked by "*"):
- * ALU op (* with s-bit), B, BL, BKPT, BLX(1/2), BX, BXJ, CPS*, CPY,
- * LDM(1), LDM(2/3)*, LDR, MOV, RFE*, SWI*
- *
- * Instructions that do not take general registers, nor transfer control:
- * CDP/2, SETEND, SRS*
- */
struct kprobe_ctlblk *kcb)
{
regs->ARM_pc += 4;
- p->ainsn.insn_handler(p, regs);
+ if (p->ainsn.insn_check_cc(regs->ARM_cpsr))
+ p->ainsn.insn_handler(p, regs);
}
/*
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/leds.h>
static SYSDEV_ATTR(event, 0200, NULL, leds_store);
-static int leds_suspend(struct sys_device *dev, pm_message_t state)
+static struct sysdev_class leds_sysclass = {
+ .name = "leds",
+};
+
+static struct sys_device leds_device = {
+ .id = 0,
+ .cls = &leds_sysclass,
+};
+
+static int leds_suspend(void)
{
leds_event(led_stop);
return 0;
}
-static int leds_resume(struct sys_device *dev)
+static void leds_resume(void)
{
leds_event(led_start);
- return 0;
}
-static int leds_shutdown(struct sys_device *dev)
+static void leds_shutdown(void)
{
leds_event(led_halted);
- return 0;
}
-static struct sysdev_class leds_sysclass = {
- .name = "leds",
+static struct syscore_ops leds_syscore_ops = {
.shutdown = leds_shutdown,
.suspend = leds_suspend,
.resume = leds_resume,
};
-static struct sys_device leds_device = {
- .id = 0,
- .cls = &leds_sysclass,
-};
-
static int __init leds_init(void)
{
int ret;
ret = sysdev_register(&leds_device);
if (ret == 0)
ret = sysdev_create_file(&leds_device, &attr_event);
+ if (ret == 0)
+ register_syscore_ops(&leds_syscore_ops);
return ret;
}
tail = (struct frame_tail __user *)regs->ARM_fp - 1;
- while (tail && !((unsigned long)tail & 0x3))
+ while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
+ tail && !((unsigned long)tail & 0x3))
tail = user_backtrace(tail, entry);
}
#ifdef CONFIG_HAVE_HW_BREAKPOINT
case PTRACE_GETHBPREGS:
+ if (ptrace_get_breakpoints(child) < 0)
+ return -ESRCH;
+
ret = ptrace_gethbpregs(child, addr,
(unsigned long __user *)data);
+ ptrace_put_breakpoints(child);
break;
case PTRACE_SETHBPREGS:
+ if (ptrace_get_breakpoints(child) < 0)
+ return -ESRCH;
+
ret = ptrace_sethbpregs(child, addr,
(unsigned long __user *)data);
+ ptrace_put_breakpoints(child);
break;
#endif
return err;
}
-static inline void setup_syscall_restart(struct pt_regs *regs)
-{
- regs->ARM_r0 = regs->ARM_ORIG_r0;
- regs->ARM_pc -= thumb_mode(regs) ? 2 : 4;
-}
-
/*
* OK, we're invoking a handler
*/
static int
handle_signal(unsigned long sig, struct k_sigaction *ka,
siginfo_t *info, sigset_t *oldset,
- struct pt_regs * regs, int syscall)
+ struct pt_regs * regs)
{
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = current;
int usig = sig;
int ret;
- /*
- * If we were from a system call, check for system call restarting...
- */
- if (syscall) {
- switch (regs->ARM_r0) {
- case -ERESTART_RESTARTBLOCK:
- case -ERESTARTNOHAND:
- regs->ARM_r0 = -EINTR;
- break;
- case -ERESTARTSYS:
- if (!(ka->sa.sa_flags & SA_RESTART)) {
- regs->ARM_r0 = -EINTR;
- break;
- }
- /* fallthrough */
- case -ERESTARTNOINTR:
- setup_syscall_restart(regs);
- }
- }
-
/*
* translate the signal
*/
*/
static void do_signal(struct pt_regs *regs, int syscall)
{
+ unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
struct k_sigaction ka;
siginfo_t info;
int signr;
if (!user_mode(regs))
return;
+ /*
+ * If we were from a system call, check for system call restarting...
+ */
+ if (syscall) {
+ continue_addr = regs->ARM_pc;
+ restart_addr = continue_addr - (thumb_mode(regs) ? 2 : 4);
+ retval = regs->ARM_r0;
+
+ /*
+ * Prepare for system call restart. We do this here so that a
+ * debugger will see the already changed PSW.
+ */
+ switch (retval) {
+ case -ERESTARTNOHAND:
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ regs->ARM_r0 = regs->ARM_ORIG_r0;
+ regs->ARM_pc = restart_addr;
+ break;
+ case -ERESTART_RESTARTBLOCK:
+ regs->ARM_r0 = -EINTR;
+ break;
+ }
+ }
+
if (try_to_freeze())
goto no_signal;
+ /*
+ * Get the signal to deliver. When running under ptrace, at this
+ * point the debugger may change all our registers ...
+ */
signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
sigset_t *oldset;
+ /*
+ * Depending on the signal settings we may need to revert the
+ * decision to restart the system call. But skip this if a
+ * debugger has chosen to restart at a different PC.
+ */
+ if (regs->ARM_pc == restart_addr) {
+ if (retval == -ERESTARTNOHAND
+ || (retval == -ERESTARTSYS
+ && !(ka.sa.sa_flags & SA_RESTART))) {
+ regs->ARM_r0 = -EINTR;
+ regs->ARM_pc = continue_addr;
+ }
+ }
+
if (test_thread_flag(TIF_RESTORE_SIGMASK))
oldset = ¤t->saved_sigmask;
else
oldset = ¤t->blocked;
- if (handle_signal(signr, &ka, &info, oldset, regs, syscall) == 0) {
+ if (handle_signal(signr, &ka, &info, oldset, regs) == 0) {
/*
* A signal was successfully delivered; the saved
* sigmask will have been stored in the signal frame,
}
no_signal:
- /*
- * No signal to deliver to the process - restart the syscall.
- */
if (syscall) {
- if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) {
+ /*
+ * Handle restarting a different system call. As above,
+ * if a debugger has chosen to restart at a different PC,
+ * ignore the restart.
+ */
+ if (retval == -ERESTART_RESTARTBLOCK
+ && regs->ARM_pc == continue_addr) {
if (thumb_mode(regs)) {
regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
regs->ARM_pc -= 2;
#endif
}
}
- if (regs->ARM_r0 == -ERESTARTNOHAND ||
- regs->ARM_r0 == -ERESTARTSYS ||
- regs->ARM_r0 == -ERESTARTNOINTR) {
- setup_syscall_restart(regs);
- }
/* If there's no signal to deliver, we just put the saved sigmask
* back.
{
}
-static void broadcast_timer_setup(struct clock_event_device *evt)
+static void __cpuinit broadcast_timer_setup(struct clock_event_device *evt)
{
evt->name = "dummy_timer";
evt->features = CLOCK_EVT_FEAT_ONESHOT |
break;
case IPI_RESCHEDULE:
- /*
- * nothing more to do - eveything is
- * done on the interrupt return path
- */
+ scheduler_ipi();
break;
case IPI_CALL_FUNC:
long err;
int i;
- if (nsops < 1)
+ if (nsops < 1 || nsops > SEMOPM)
return -EINVAL;
sops = kmalloc(sizeof(*sops) * nsops, GFP_KERNEL);
if (!sops)
#include <linux/timex.h>
#include <linux/errno.h>
#include <linux/profile.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/timer.h>
#include <linux/irq.h>
#endif
#if defined(CONFIG_PM) && !defined(CONFIG_GENERIC_CLOCKEVENTS)
-static int timer_suspend(struct sys_device *dev, pm_message_t state)
+static int timer_suspend(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->suspend != NULL)
- timer->suspend();
+ if (system_timer->suspend)
+ system_timer->suspend();
return 0;
}
-static int timer_resume(struct sys_device *dev)
+static void timer_resume(void)
{
- struct sys_timer *timer = container_of(dev, struct sys_timer, dev);
-
- if (timer->resume != NULL)
- timer->resume();
-
- return 0;
+ if (system_timer->resume)
+ system_timer->resume();
}
#else
#define timer_suspend NULL
#define timer_resume NULL
#endif
-static struct sysdev_class timer_sysclass = {
- .name = "timer",
+static struct syscore_ops timer_syscore_ops = {
.suspend = timer_suspend,
.resume = timer_resume,
};
-static int __init timer_init_sysfs(void)
+static int __init timer_init_syscore_ops(void)
{
- int ret = sysdev_class_register(&timer_sysclass);
- if (ret == 0) {
- system_timer->dev.cls = &timer_sysclass;
- ret = sysdev_register(&system_timer->dev);
- }
+ register_syscore_ops(&timer_syscore_ops);
- return ret;
+ return 0;
}
-device_initcall(timer_init_sysfs);
+device_initcall(timer_init_syscore_ops);
void __init time_init(void)
{
select CPU_ARM926T
select GENERIC_CLOCKEVENTS
select HAVE_FB_ATMEL
+ select HAVE_NET_MACB
config ARCH_AT572D940HF
bool "AT572D940HF"
#include <mach/board.h>
#include "generic.h"
+static void __init at91eb01_init_irq(void)
+{
+ at91x40_init_interrupts(NULL);
+}
+
static void __init at91eb01_map_io(void)
{
at91x40_initialize(40000000);
MACHINE_START(AT91EB01, "Atmel AT91 EB01")
/* Maintainer: Greg Ungerer <gerg@snapgear.com> */
.timer = &at91x40_timer,
- .init_irq = at91x40_init_interrupts,
+ .init_irq = at91eb01_init_irq,
.map_io = at91eb01_map_io,
MACHINE_END
#define ARCH_ID_AT91SAM9G45 0x819b05a0
#define ARCH_ID_AT91SAM9G45MRL 0x819b05a2 /* aka 9G45-ES2 & non ES lots */
#define ARCH_ID_AT91SAM9G45ES 0x819b05a1 /* 9G45-ES (Engineering Sample) */
+#define ARCH_ID_AT91SAM9X5 0x819a05a0
#define ARCH_ID_AT91CAP9 0x039A03A0
#define ARCH_ID_AT91SAM9XE128 0x329973a0
#define ARCH_EXID_AT91SAM9G46 0x00000003
#define ARCH_EXID_AT91SAM9G45 0x00000004
+#define ARCH_EXID_AT91SAM9G15 0x00000000
+#define ARCH_EXID_AT91SAM9G35 0x00000001
+#define ARCH_EXID_AT91SAM9X35 0x00000002
+#define ARCH_EXID_AT91SAM9G25 0x00000003
+#define ARCH_EXID_AT91SAM9X25 0x00000004
+
static inline unsigned long at91_exid_identify(void)
{
return at91_sys_read(AT91_DBGU_EXID);
#define cpu_is_at91sam9m11() (0)
#endif
+#ifdef CONFIG_ARCH_AT91SAM9X5
+#define cpu_is_at91sam9x5() (at91_cpu_identify() == ARCH_ID_AT91SAM9X5)
+#define cpu_is_at91sam9g15() (cpu_is_at91sam9x5() && \
+ (at91_exid_identify() == ARCH_EXID_AT91SAM9G15))
+#define cpu_is_at91sam9g35() (cpu_is_at91sam9x5() && \
+ (at91_exid_identify() == ARCH_EXID_AT91SAM9G35))
+#define cpu_is_at91sam9x35() (cpu_is_at91sam9x5() && \
+ (at91_exid_identify() == ARCH_EXID_AT91SAM9X35))
+#define cpu_is_at91sam9g25() (cpu_is_at91sam9x5() && \
+ (at91_exid_identify() == ARCH_EXID_AT91SAM9G25))
+#define cpu_is_at91sam9x25() (cpu_is_at91sam9x5() && \
+ (at91_exid_identify() == ARCH_EXID_AT91SAM9X25))
+#else
+#define cpu_is_at91sam9x5() (0)
+#define cpu_is_at91sam9g15() (0)
+#define cpu_is_at91sam9g35() (0)
+#define cpu_is_at91sam9x35() (0)
+#define cpu_is_at91sam9g25() (0)
+#define cpu_is_at91sam9x25() (0)
+#endif
+
#ifdef CONFIG_ARCH_AT91CAP9
#define cpu_is_at91cap9() (at91_cpu_identify() == ARCH_ID_AT91CAP9)
#define cpu_is_at91cap9_revB() (at91cap9_rev_identify() == ARCH_REVISION_CAP9_B)
depends on ARCH_DAVINCI_DM644x
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Configure this option to specify the whether the board used
for development is a DM644x EVM
depends on ARCH_DAVINCI_DM644x
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Say Y here to select the Lyrtech Small Form Factor
Software Defined Radio (SFFSDR) board.
select MACH_DAVINCI_DM6467TEVM
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Configure this option to specify the whether the board used
for development is a DM6467 EVM
depends on ARCH_DAVINCI_DM365
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Configure this option to specify whether the board used
for development is a DM365 EVM
select GPIO_PCF857X
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Say Y here to select the TI DA830/OMAP-L137/AM17x Evaluation Module.
depends on ARCH_DAVINCI_DA850
select MISC_DEVICES
select EEPROM_AT24
+ select I2C
help
Say Y here to select the Critical Link MityDSP-L138/MityARM-1808
System on Module. Information on this SoM may be found at
#include <mach/mux.h>
#include <mach/spi.h>
-#define MITYOMAPL138_PHY_ID "0:03"
+#define MITYOMAPL138_PHY_ID ""
#define FACTORY_CONFIG_MAGIC 0x012C0138
#define FACTORY_CONFIG_VERSION 0x00010001
static struct platform_device mityomapl138_nandflash_device = {
.name = "davinci_nand",
- .id = 0,
+ .id = 1,
.dev = {
.platform_data = &mityomapl138_nandflash_data,
},
if (freqs.old == freqs.new)
return ret;
- cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER,
- dev_driver_string(cpufreq.dev),
- "transition: %u --> %u\n", freqs.old, freqs.new);
+ dev_dbg(&cpufreq.dev, "transition: %u --> %u\n", freqs.old, freqs.new);
ret = cpufreq_frequency_table_target(policy, pdata->freq_table,
freqs.new, relation, &idx);
#define DA8XX_GPIO_BASE 0x01e26000
#define DA8XX_I2C1_BASE 0x01e28000
#define DA8XX_SPI0_BASE 0x01c41000
-#define DA8XX_SPI1_BASE 0x01f0e000
+#define DA830_SPI1_BASE 0x01e12000
+#define DA850_SPI1_BASE 0x01f0e000
#define DA8XX_EMAC_CTRL_REG_OFFSET 0x3000
#define DA8XX_EMAC_MOD_REG_OFFSET 0x2000
static struct resource da8xx_spi1_resources[] = {
[0] = {
- .start = DA8XX_SPI1_BASE,
- .end = DA8XX_SPI1_BASE + SZ_4K - 1,
+ .start = DA830_SPI1_BASE,
+ .end = DA830_SPI1_BASE + SZ_4K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
da8xx_spi_pdata[instance].num_chipselect = len;
+ if (instance == 1 && cpu_is_davinci_da850()) {
+ da8xx_spi1_resources[0].start = DA850_SPI1_BASE;
+ da8xx_spi1_resources[0].end = DA850_SPI1_BASE + SZ_4K - 1;
+ }
+
return platform_device_register(&da8xx_spi_device[instance]);
}
.name = "timer2",
.parent = &pll1_aux_clk,
.lpsc = DAVINCI_LPSC_TIMER2,
- .usecount = 1, /* REVISIT: why can't' this be disabled? */
+ .usecount = 1, /* REVISIT: why can't this be disabled? */
};
static struct clk timer3_clk = {
.name = "timer2",
.parent = &pll1_aux_clk,
.lpsc = DAVINCI_LPSC_TIMER2,
- .usecount = 1, /* REVISIT: why can't' this be disabled? */
+ .usecount = 1, /* REVISIT: why can't this be disabled? */
};
static struct clk_lookup dm644x_clks[] = {
#define UART_SHIFT 2
+#define davinci_uart_v2p(x) ((x) - PAGE_OFFSET + PLAT_PHYS_OFFSET)
+#define davinci_uart_p2v(x) ((x) - PLAT_PHYS_OFFSET + PAGE_OFFSET)
+
.pushsection .data
davinci_uart_phys: .word 0
davinci_uart_virt: .word 0
/* Use davinci_uart_phys/virt if already configured */
10: mrc p15, 0, \rp, c1, c0
tst \rp, #1 @ MMU enabled?
- ldreq \rp, =__virt_to_phys(davinci_uart_phys)
+ ldreq \rp, =davinci_uart_v2p(davinci_uart_phys)
ldrne \rp, =davinci_uart_phys
add \rv, \rp, #4 @ davinci_uart_virt
ldr \rp, [\rp, #0]
tst \rp, #1 @ MMU enabled?
/* Copy uart phys address from decompressor uart info */
- ldreq \rv, =__virt_to_phys(davinci_uart_phys)
+ ldreq \rv, =davinci_uart_v2p(davinci_uart_phys)
ldrne \rv, =davinci_uart_phys
ldreq \rp, =DAVINCI_UART_INFO
- ldrne \rp, =__phys_to_virt(DAVINCI_UART_INFO)
+ ldrne \rp, =davinci_uart_p2v(DAVINCI_UART_INFO)
ldr \rp, [\rp, #0]
str \rp, [\rv]
/* Copy uart virt address from decompressor uart info */
- ldreq \rv, =__virt_to_phys(davinci_uart_virt)
+ ldreq \rv, =davinci_uart_v2p(davinci_uart_virt)
ldrne \rv, =davinci_uart_virt
ldreq \rp, =DAVINCI_UART_INFO
- ldrne \rp, =__phys_to_virt(DAVINCI_UART_INFO)
+ ldrne \rp, =davinci_uart_p2v(DAVINCI_UART_INFO)
ldr \rp, [\rp, #4]
str \rp, [\rv]
*
* This area sits just below the page tables (see arch/arm/kernel/head.S).
*/
-#define DAVINCI_UART_INFO (PHYS_OFFSET + 0x3ff8)
+#define DAVINCI_UART_INFO (PLAT_PHYS_OFFSET + 0x3ff8)
#define DAVINCI_UART0_BASE (IO_PHYS + 0x20000)
#define DAVINCI_UART1_BASE (IO_PHYS + 0x20400)
#include <linux/init.h>
#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <asm/cacheflush.h>
flush_cache_all();
}
-static int exynos4_pm_resume(struct sys_device *dev)
+static struct sysdev_driver exynos4_pm_driver = {
+ .add = exynos4_pm_add,
+};
+
+static __init int exynos4_pm_drvinit(void)
+{
+ unsigned int tmp;
+
+ s3c_pm_init();
+
+ /* All wakeup disable */
+
+ tmp = __raw_readl(S5P_WAKEUP_MASK);
+ tmp |= ((0xFF << 8) | (0x1F << 1));
+ __raw_writel(tmp, S5P_WAKEUP_MASK);
+
+ return sysdev_driver_register(&exynos4_sysclass, &exynos4_pm_driver);
+}
+arch_initcall(exynos4_pm_drvinit);
+
+static void exynos4_pm_resume(void)
{
/* For release retention */
/* enable L2X0*/
writel_relaxed(1, S5P_VA_L2CC + L2X0_CTRL);
#endif
-
- return 0;
}
-static struct sysdev_driver exynos4_pm_driver = {
- .add = exynos4_pm_add,
+static struct syscore_ops exynos4_pm_syscore_ops = {
.resume = exynos4_pm_resume,
};
-static __init int exynos4_pm_drvinit(void)
+static __init int exynos4_pm_syscore_init(void)
{
- unsigned int tmp;
-
- s3c_pm_init();
-
- /* All wakeup disable */
-
- tmp = __raw_readl(S5P_WAKEUP_MASK);
- tmp |= ((0xFF << 8) | (0x1F << 1));
- __raw_writel(tmp, S5P_WAKEUP_MASK);
-
- return sysdev_driver_register(&exynos4_sysclass, &exynos4_pm_driver);
+ register_syscore_ops(&exynos4_pm_syscore_ops);
+ return 0;
}
-arch_initcall(exynos4_pm_drvinit);
+arch_initcall(exynos4_pm_syscore_init);
config ARCH_CATS
bool "CATS"
+ select CLKSRC_I8253
select FOOTBRIDGE_HOST
select ISA
select ISA_DMA
config ARCH_NETWINDER
bool "NetWinder"
+ select CLKSRC_I8253
select FOOTBRIDGE_HOST
select ISA
select ISA_DMA
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/io.h>
+#include <linux/spinlock.h>
#include <linux/timex.h>
#include <asm/irq.h>
-
+#include <asm/i8253.h>
#include <asm/mach/time.h>
#include "common.h"
-#define PIT_MODE 0x43
-#define PIT_CH0 0x40
-
-#define PIT_LATCH ((PIT_TICK_RATE + HZ / 2) / HZ)
-
-static cycle_t pit_read(struct clocksource *cs)
-{
- unsigned long flags;
- static int old_count;
- static u32 old_jifs;
- int count;
- u32 jifs;
-
- raw_local_irq_save(flags);
-
- jifs = jiffies;
- outb_p(0x00, PIT_MODE); /* latch the count */
- count = inb_p(PIT_CH0); /* read the latched count */
- count |= inb_p(PIT_CH0) << 8;
-
- if (count > old_count && jifs == old_jifs)
- count = old_count;
-
- old_count = count;
- old_jifs = jifs;
-
- raw_local_irq_restore(flags);
-
- count = (PIT_LATCH - 1) - count;
-
- return (cycle_t)(jifs * PIT_LATCH) + count;
-}
-
-static struct clocksource pit_cs = {
- .name = "pit",
- .rating = 110,
- .read = pit_read,
- .mask = CLOCKSOURCE_MASK(32),
-};
+DEFINE_RAW_SPINLOCK(i8253_lock);
static void pit_set_mode(enum clock_event_mode mode,
struct clock_event_device *evt)
pit_ce.max_delta_ns = clockevent_delta2ns(0x7fff, &pit_ce);
pit_ce.min_delta_ns = clockevent_delta2ns(0x000f, &pit_ce);
- clocksource_register_hz(&pit_cs, PIT_TICK_RATE);
+ clocksource_i8253_init();
setup_irq(pit_ce.irq, &pit_timer_irq);
clockevents_register_device(&pit_ce);
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/string.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/amba/bus.h>
#include <linux/amba/kmi.h>
#include <linux/clocksource.h>
#ifdef CONFIG_PM
static unsigned long ic_irq_enable;
-static int irq_suspend(struct sys_device *dev, pm_message_t state)
+static int irq_suspend(void)
{
ic_irq_enable = readl(VA_IC_BASE + IRQ_ENABLE);
return 0;
}
-static int irq_resume(struct sys_device *dev)
+static void irq_resume(void)
{
/* disable all irq sources */
writel(-1, VA_CMIC_BASE + IRQ_ENABLE_CLEAR);
writel(-1, VA_IC_BASE + FIQ_ENABLE_CLEAR);
writel(ic_irq_enable, VA_IC_BASE + IRQ_ENABLE_SET);
- return 0;
}
#else
#define irq_suspend NULL
#define irq_resume NULL
#endif
-static struct sysdev_class irq_class = {
- .name = "irq",
+static struct syscore_ops irq_syscore_ops = {
.suspend = irq_suspend,
.resume = irq_resume,
};
-static struct sys_device irq_device = {
- .id = 0,
- .cls = &irq_class,
-};
-
-static int __init irq_init_sysfs(void)
+static int __init irq_syscore_init(void)
{
- int ret = sysdev_class_register(&irq_class);
- if (ret == 0)
- ret = sysdev_register(&irq_device);
- return ret;
+ register_syscore_ops(&irq_syscore_ops);
+
+ return 0;
}
-device_initcall(irq_init_sysfs);
+device_initcall(irq_syscore_init);
/*
* Flash handling.
.workaround = FLS_USB2_WORKAROUND_ENGCM09152,
};
+static int vpr200_usbh_init(struct platform_device *pdev)
+{
+ return mx35_initialize_usb_hw(pdev->id,
+ MXC_EHCI_INTERFACE_SINGLE_UNI | MXC_EHCI_INTERNAL_PHY);
+}
+
/* USB HOST config */
static const struct mxc_usbh_platform_data usb_host_pdata __initconst = {
- .portsc = MXC_EHCI_MODE_SERIAL,
- .flags = MXC_EHCI_INTERFACE_SINGLE_UNI |
- MXC_EHCI_INTERNAL_PHY,
+ .init = vpr200_usbh_init,
+ .portsc = MXC_EHCI_MODE_SERIAL,
};
static struct platform_device *devices[] __initdata = {
.wakeup = wake, \
}
-static const struct gpio_keys_button loco_buttons[] __initconst = {
+static struct gpio_keys_button loco_buttons[] = {
GPIO_BUTTON(MX53_LOCO_POWER, KEY_POWER, 1, "power", 0),
GPIO_BUTTON(MX53_LOCO_UI1, KEY_VOLUMEUP, 1, "volume-up", 0),
GPIO_BUTTON(MX53_LOCO_UI2, KEY_VOLUMEDOWN, 1, "volume-down", 0),
unsigned long diff, parent_rate, calc_rate; \
int i; \
\
- parent_rate = clk_get_rate(clk->parent); \
div_max = BM_CLKCTRL_##dr##_DIV >> BP_CLKCTRL_##dr##_DIV; \
bm_busy = BM_CLKCTRL_##dr##_BUSY; \
\
if (clk->parent == &ref_xtal_clk) { \
+ parent_rate = clk_get_rate(clk->parent); \
div = DIV_ROUND_UP(parent_rate, rate); \
if (clk == &cpu_clk) { \
div_max = BM_CLKCTRL_CPU_DIV_XTAL >> \
if (div == 0 || div > div_max) \
return -EINVAL; \
} else { \
+ /* \
+ * hack alert: this block modifies clk->parent, too, \
+ * so the base to use it the grand parent. \
+ */ \
+ parent_rate = clk_get_rate(clk->parent->parent); \
rate >>= PARENT_RATE_SHIFT; \
parent_rate >>= PARENT_RATE_SHIFT; \
diff = parent_rate; \
#ifdef CONFIG_PM_RUNTIME
static int omap1_pm_runtime_suspend(struct device *dev)
{
- struct clk *iclk, *fclk;
- int ret = 0;
+ int ret;
dev_dbg(dev, "%s\n", __func__);
ret = pm_generic_runtime_suspend(dev);
+ if (ret)
+ return ret;
- fclk = clk_get(dev, "fck");
- if (!IS_ERR(fclk)) {
- clk_disable(fclk);
- clk_put(fclk);
- }
-
- iclk = clk_get(dev, "ick");
- if (!IS_ERR(iclk)) {
- clk_disable(iclk);
- clk_put(iclk);
+ ret = pm_runtime_clk_suspend(dev);
+ if (ret) {
+ pm_generic_runtime_resume(dev);
+ return ret;
}
return 0;
-};
+}
static int omap1_pm_runtime_resume(struct device *dev)
{
- struct clk *iclk, *fclk;
-
dev_dbg(dev, "%s\n", __func__);
- iclk = clk_get(dev, "ick");
- if (!IS_ERR(iclk)) {
- clk_enable(iclk);
- clk_put(iclk);
- }
+ pm_runtime_clk_resume(dev);
+ return pm_generic_runtime_resume(dev);
+}
- fclk = clk_get(dev, "fck");
- if (!IS_ERR(fclk)) {
- clk_enable(fclk);
- clk_put(fclk);
- }
+static struct dev_power_domain default_power_domain = {
+ .ops = {
+ .runtime_suspend = omap1_pm_runtime_suspend,
+ .runtime_resume = omap1_pm_runtime_resume,
+ USE_PLATFORM_PM_SLEEP_OPS
+ },
+};
- return pm_generic_runtime_resume(dev);
+static struct pm_clk_notifier_block platform_bus_notifier = {
+ .pwr_domain = &default_power_domain,
+ .con_ids = { "ick", "fck", NULL, },
};
static int __init omap1_pm_runtime_init(void)
{
- const struct dev_pm_ops *pm;
- struct dev_pm_ops *omap_pm;
-
if (!cpu_class_is_omap1())
return -ENODEV;
- pm = platform_bus_get_pm_ops();
- if (!pm) {
- pr_err("%s: unable to get dev_pm_ops from platform_bus\n",
- __func__);
- return -ENODEV;
- }
-
- omap_pm = kmemdup(pm, sizeof(struct dev_pm_ops), GFP_KERNEL);
- if (!omap_pm) {
- pr_err("%s: unable to alloc memory for new dev_pm_ops\n",
- __func__);
- return -ENOMEM;
- }
-
- omap_pm->runtime_suspend = omap1_pm_runtime_suspend;
- omap_pm->runtime_resume = omap1_pm_runtime_resume;
-
- platform_bus_set_pm_ops(omap_pm);
+ pm_runtime_clk_add_notifier(&platform_bus_type, &platform_bus_notifier);
return 0;
}
# Power Management
ifeq ($(CONFIG_PM),y)
obj-$(CONFIG_ARCH_OMAP2) += pm24xx.o
-obj-$(CONFIG_ARCH_OMAP2) += sleep24xx.o pm_bus.o
+obj-$(CONFIG_ARCH_OMAP2) += sleep24xx.o
obj-$(CONFIG_ARCH_OMAP3) += pm34xx.o sleep34xx.o \
- cpuidle34xx.o pm_bus.o
-obj-$(CONFIG_ARCH_OMAP4) += pm44xx.o pm_bus.o
+ cpuidle34xx.o
+obj-$(CONFIG_ARCH_OMAP4) += pm44xx.o
obj-$(CONFIG_PM_DEBUG) += pm-debug.o
obj-$(CONFIG_OMAP_SMARTREFLEX) += sr_device.o smartreflex.o
obj-$(CONFIG_OMAP_SMARTREFLEX_CLASS3) += smartreflex-class3.o
AFLAGS_sleep24xx.o :=-Wa,-march=armv6
-AFLAGS_sleep34xx.o :=-Wa,-march=armv7-a
+AFLAGS_sleep34xx.o :=-Wa,-march=armv7-a$(plus_sec)
ifeq ($(CONFIG_PM_VERBOSE),y)
CFLAGS_pm_bus.o += -DDEBUG
static void __init rx51_map_io(void)
{
omap2_set_globals_3xxx();
- rx51_video_mem_init();
omap34xx_map_common_io();
}
+static void __init rx51_reserve(void)
+{
+ rx51_video_mem_init();
+ omap_reserve();
+}
+
MACHINE_START(NOKIA_RX51, "Nokia RX-51 board")
/* Maintainer: Lauri Leukkunen <lauri.leukkunen@nokia.com> */
.boot_params = 0x80000100,
- .reserve = omap_reserve,
+ .reserve = rx51_reserve,
.map_io = rx51_map_io,
.init_early = rx51_init_early,
.init_irq = omap_init_irq,
sdrc_cs0->rfr_ctrl, sdrc_cs0->actim_ctrla,
sdrc_cs0->actim_ctrlb, sdrc_cs0->mr,
0, 0, 0, 0);
+ clk->rate = rate;
return 0;
}
CLK(NULL, "dsp_fck", &dsp_fck, CK_443X),
CLK("omapdss_dss", "sys_clk", &dss_sys_clk, CK_443X),
CLK("omapdss_dss", "tv_clk", &dss_tv_clk, CK_443X),
- CLK("omapdss_dss", "dss_clk", &dss_dss_clk, CK_443X),
CLK("omapdss_dss", "video_clk", &dss_48mhz_clk, CK_443X),
- CLK("omapdss_dss", "fck", &dss_fck, CK_443X),
- /*
- * On OMAP4, DSS ick is a dummy clock; this is needed for compatibility
- * with OMAP2/3.
- */
- CLK("omapdss_dss", "ick", &dummy_ck, CK_443X),
+ CLK("omapdss_dss", "fck", &dss_dss_clk, CK_443X),
+ CLK("omapdss_dss", "ick", &dss_fck, CK_443X),
CLK(NULL, "efuse_ctrl_cust_fck", &efuse_ctrl_cust_fck, CK_443X),
CLK(NULL, "emif1_fck", &emif1_fck, CK_443X),
CLK(NULL, "emif2_fck", &emif2_fck, CK_443X),
u32 per_cm_clksel;
u32 emu_cm_clksel;
u32 emu_cm_clkstctrl;
+ u32 pll_cm_autoidle;
u32 pll_cm_autoidle2;
u32 pll_cm_clksel4;
u32 pll_cm_clksel5;
omap2_cm_read_mod_reg(OMAP3430_EMU_MOD, CM_CLKSEL1);
cm_context.emu_cm_clkstctrl =
omap2_cm_read_mod_reg(OMAP3430_EMU_MOD, OMAP2_CM_CLKSTCTRL);
+ /*
+ * As per erratum i671, ROM code does not respect the PER DPLL
+ * programming scheme if CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL == 1.
+ * In this case, even though this register has been saved in
+ * scratchpad contents, we need to restore AUTO_PERIPH_DPLL
+ * by ourselves. So, we need to save it anyway.
+ */
+ cm_context.pll_cm_autoidle =
+ omap2_cm_read_mod_reg(PLL_MOD, CM_AUTOIDLE);
cm_context.pll_cm_autoidle2 =
omap2_cm_read_mod_reg(PLL_MOD, CM_AUTOIDLE2);
cm_context.pll_cm_clksel4 =
CM_CLKSEL1);
omap2_cm_write_mod_reg(cm_context.emu_cm_clkstctrl, OMAP3430_EMU_MOD,
OMAP2_CM_CLKSTCTRL);
+ /*
+ * As per erratum i671, ROM code does not respect the PER DPLL
+ * programming scheme if CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL == 1.
+ * In this case, we need to restore AUTO_PERIPH_DPLL by ourselves.
+ */
+ omap2_cm_write_mod_reg(cm_context.pll_cm_autoidle, PLL_MOD,
+ CM_AUTOIDLE);
omap2_cm_write_mod_reg(cm_context.pll_cm_autoidle2, PLL_MOD,
CM_AUTOIDLE2);
omap2_cm_write_mod_reg(cm_context.pll_cm_clksel4, PLL_MOD,
omap2_cm_read_mod_reg(WKUP_MOD, CM_CLKSEL);
prcm_block_contents.cm_clken_pll =
omap2_cm_read_mod_reg(PLL_MOD, CM_CLKEN);
+ /*
+ * As per erratum i671, ROM code does not respect the PER DPLL
+ * programming scheme if CM_AUTOIDLE_PLL..AUTO_PERIPH_DPLL == 1.
+ * Then, in anycase, clear these bits to avoid extra latencies.
+ */
prcm_block_contents.cm_autoidle_pll =
- omap2_cm_read_mod_reg(PLL_MOD, OMAP3430_CM_AUTOIDLE_PLL);
+ omap2_cm_read_mod_reg(PLL_MOD, CM_AUTOIDLE) &
+ ~OMAP3430_AUTO_PERIPH_DPLL_MASK;
prcm_block_contents.cm_clksel1_pll =
omap2_cm_read_mod_reg(PLL_MOD, OMAP3430_CM_CLKSEL1_PLL);
prcm_block_contents.cm_clksel2_pll =
static struct omap_hwmod omap2420_gpio1_hwmod = {
.name = "gpio1",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap242x_gpio1_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap242x_gpio1_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2420_gpio2_hwmod = {
.name = "gpio2",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap242x_gpio2_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap242x_gpio2_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2420_gpio3_hwmod = {
.name = "gpio3",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap242x_gpio3_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap242x_gpio3_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2420_gpio4_hwmod = {
.name = "gpio4",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap242x_gpio4_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap242x_gpio4_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod_addr_space omap2420_dma_system_addrs[] = {
{
.pa_start = 0x48056000,
- .pa_end = 0x4a0560ff,
+ .pa_end = 0x48056fff,
.flags = ADDR_TYPE_RT
},
};
static struct omap_hwmod omap2430_gpio1_hwmod = {
.name = "gpio1",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap243x_gpio1_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap243x_gpio1_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2430_gpio2_hwmod = {
.name = "gpio2",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap243x_gpio2_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap243x_gpio2_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2430_gpio3_hwmod = {
.name = "gpio3",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap243x_gpio3_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap243x_gpio3_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2430_gpio4_hwmod = {
.name = "gpio4",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap243x_gpio4_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap243x_gpio4_irqs),
.main_clk = "gpios_fck",
static struct omap_hwmod omap2430_gpio5_hwmod = {
.name = "gpio5",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap243x_gpio5_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap243x_gpio5_irqs),
.main_clk = "gpio5_fck",
static struct omap_hwmod_addr_space omap2430_dma_system_addrs[] = {
{
.pa_start = 0x48056000,
- .pa_end = 0x4a0560ff,
+ .pa_end = 0x48056fff,
.flags = ADDR_TYPE_RT
},
};
static struct omap_hwmod omap3xxx_gpio1_hwmod = {
.name = "gpio1",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio1_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio1_irqs),
.main_clk = "gpio1_ick",
static struct omap_hwmod omap3xxx_gpio2_hwmod = {
.name = "gpio2",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio2_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio2_irqs),
.main_clk = "gpio2_ick",
static struct omap_hwmod omap3xxx_gpio3_hwmod = {
.name = "gpio3",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio3_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio3_irqs),
.main_clk = "gpio3_ick",
static struct omap_hwmod omap3xxx_gpio4_hwmod = {
.name = "gpio4",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio4_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio4_irqs),
.main_clk = "gpio4_ick",
static struct omap_hwmod omap3xxx_gpio5_hwmod = {
.name = "gpio5",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio5_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio5_irqs),
.main_clk = "gpio5_ick",
static struct omap_hwmod omap3xxx_gpio6_hwmod = {
.name = "gpio6",
+ .flags = HWMOD_CONTROL_OPT_CLKS_IN_RESET,
.mpu_irqs = omap3xxx_gpio6_irqs,
.mpu_irqs_cnt = ARRAY_SIZE(omap3xxx_gpio6_irqs),
.main_clk = "gpio6_ick",
static struct omap_hwmod_addr_space omap3xxx_dma_system_addrs[] = {
{
.pa_start = 0x48056000,
- .pa_end = 0x4a0560ff,
+ .pa_end = 0x48056fff,
.flags = ADDR_TYPE_RT
},
};
static struct omap_hwmod_addr_space omap44xx_dma_system_addrs[] = {
{
.pa_start = 0x4a056000,
- .pa_end = 0x4a0560ff,
+ .pa_end = 0x4a056fff,
.flags = ADDR_TYPE_RT
},
};
/* No timeout error for debug sources */
}
- base = ((l3->rt) + (*(omap3_l3_bases[int_type] + err_source)));
-
/* identify the error source */
for (err_source = 0; !(status & (1 << err_source)); err_source++)
;
+
+ base = l3->rt + *(omap3_l3_bases[int_type] + err_source);
error = omap3_l3_readll(base, L3_ERROR_LOG);
if (error) {
if (cpu_is_omap44xx()) {
_init_omap_device("l3_main_1", &l3_dev);
_init_omap_device("dsp", &dsp_dev);
+ _init_omap_device("iva", &iva_dev);
} else {
_init_omap_device("l3_main", &l3_dev);
}
+++ /dev/null
-/*
- * Runtime PM support code for OMAP
- *
- * Author: Kevin Hilman, Deep Root Systems, LLC
- *
- * Copyright (C) 2010 Texas Instruments, Inc.
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
- */
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/io.h>
-#include <linux/pm_runtime.h>
-#include <linux/platform_device.h>
-#include <linux/mutex.h>
-
-#include <plat/omap_device.h>
-#include <plat/omap-pm.h>
-
-#ifdef CONFIG_PM_RUNTIME
-static int omap_pm_runtime_suspend(struct device *dev)
-{
- struct platform_device *pdev = to_platform_device(dev);
- int r, ret = 0;
-
- dev_dbg(dev, "%s\n", __func__);
-
- ret = pm_generic_runtime_suspend(dev);
-
- if (!ret && dev->parent == &omap_device_parent) {
- r = omap_device_idle(pdev);
- WARN_ON(r);
- }
-
- return ret;
-};
-
-static int omap_pm_runtime_resume(struct device *dev)
-{
- struct platform_device *pdev = to_platform_device(dev);
- int r;
-
- dev_dbg(dev, "%s\n", __func__);
-
- if (dev->parent == &omap_device_parent) {
- r = omap_device_enable(pdev);
- WARN_ON(r);
- }
-
- return pm_generic_runtime_resume(dev);
-};
-#else
-#define omap_pm_runtime_suspend NULL
-#define omap_pm_runtime_resume NULL
-#endif /* CONFIG_PM_RUNTIME */
-
-static int __init omap_pm_runtime_init(void)
-{
- const struct dev_pm_ops *pm;
- struct dev_pm_ops *omap_pm;
-
- pm = platform_bus_get_pm_ops();
- if (!pm) {
- pr_err("%s: unable to get dev_pm_ops from platform_bus\n",
- __func__);
- return -ENODEV;
- }
-
- omap_pm = kmemdup(pm, sizeof(struct dev_pm_ops), GFP_KERNEL);
- if (!omap_pm) {
- pr_err("%s: unable to alloc memory for new dev_pm_ops\n",
- __func__);
- return -ENOMEM;
- }
-
- omap_pm->runtime_suspend = omap_pm_runtime_suspend;
- omap_pm->runtime_resume = omap_pm_runtime_resume;
-
- platform_bus_set_pm_ops(omap_pm);
-
- return 0;
-}
-core_initcall(omap_pm_runtime_init);
sys_clk_speed /= 1000;
/* Generic voltage parameters */
- vdd->curr_volt = 1200000;
vdd->volt_scale = vp_forceupdate_scale_voltage;
vdd->vp_enabled = false;
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/bitops.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/pxa2xx-regs.h>
#ifdef CONFIG_PM
static uint32_t saved_cken;
-static int pxa2xx_clock_suspend(struct sys_device *d, pm_message_t state)
+static int pxa2xx_clock_suspend(void)
{
saved_cken = CKEN;
return 0;
}
-static int pxa2xx_clock_resume(struct sys_device *d)
+static void pxa2xx_clock_resume(void)
{
CKEN = saved_cken;
- return 0;
}
#else
#define pxa2xx_clock_suspend NULL
#define pxa2xx_clock_resume NULL
#endif
-struct sysdev_class pxa2xx_clock_sysclass = {
- .name = "pxa2xx-clock",
+struct syscore_ops pxa2xx_clock_syscore_ops = {
.suspend = pxa2xx_clock_suspend,
.resume = pxa2xx_clock_resume,
};
-
-static int __init pxa2xx_clock_init(void)
-{
- if (cpu_is_pxa2xx())
- return sysdev_class_register(&pxa2xx_clock_sysclass);
- return 0;
-}
-postcore_initcall(pxa2xx_clock_init);
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/io.h>
+#include <linux/syscore_ops.h>
#include <mach/smemc.h>
#include <mach/pxa3xx-regs.h>
static uint32_t cken[2];
static uint32_t accr;
-static int pxa3xx_clock_suspend(struct sys_device *d, pm_message_t state)
+static int pxa3xx_clock_suspend(void)
{
cken[0] = CKENA;
cken[1] = CKENB;
return 0;
}
-static int pxa3xx_clock_resume(struct sys_device *d)
+static void pxa3xx_clock_resume(void)
{
ACCR = accr;
CKENA = cken[0];
CKENB = cken[1];
- return 0;
}
#else
#define pxa3xx_clock_suspend NULL
#define pxa3xx_clock_resume NULL
#endif
-struct sysdev_class pxa3xx_clock_sysclass = {
- .name = "pxa3xx-clock",
+struct syscore_ops pxa3xx_clock_syscore_ops = {
.suspend = pxa3xx_clock_suspend,
.resume = pxa3xx_clock_resume,
};
-
-static int __init pxa3xx_clock_init(void)
-{
- if (cpu_is_pxa3xx() || cpu_is_pxa95x())
- return sysdev_class_register(&pxa3xx_clock_sysclass);
- return 0;
-}
-postcore_initcall(pxa3xx_clock_init);
#include <linux/clkdev.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
struct clkops {
void (*enable)(struct clk *);
void clk_pxa2xx_cken_enable(struct clk *clk);
void clk_pxa2xx_cken_disable(struct clk *clk);
-extern struct sysdev_class pxa2xx_clock_sysclass;
+extern struct syscore_ops pxa2xx_clock_syscore_ops;
#if defined(CONFIG_PXA3xx) || defined(CONFIG_PXA95x)
#define DEFINE_PXA3_CKEN(_name, _cken, _rate, _delay) \
extern void clk_pxa3xx_cken_enable(struct clk *);
extern void clk_pxa3xx_cken_disable(struct clk *);
-extern struct sysdev_class pxa3xx_clock_sysclass;
+extern struct syscore_ops pxa3xx_clock_syscore_ops;
+
#endif
*/
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
#include <linux/irq.h>
#include <linux/gpio.h>
#include <linux/delay.h>
*/
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/irq.h>
#include <linux/gpio.h>
#ifdef CONFIG_PM
static unsigned long sleep_save_msc[10];
-static int cmx2xx_suspend(struct sys_device *dev, pm_message_t state)
+static int cmx2xx_suspend(void)
{
cmx2xx_pci_suspend();
return 0;
}
-static int cmx2xx_resume(struct sys_device *dev)
+static void cmx2xx_resume(void)
{
cmx2xx_pci_resume();
__raw_writel(sleep_save_msc[0], MSC0);
__raw_writel(sleep_save_msc[1], MSC1);
__raw_writel(sleep_save_msc[2], MSC2);
-
- return 0;
}
-static struct sysdev_class cmx2xx_pm_sysclass = {
- .name = "pm",
+static struct syscore_ops cmx2xx_pm_syscore_ops = {
.resume = cmx2xx_resume,
.suspend = cmx2xx_suspend,
};
-static struct sys_device cmx2xx_pm_device = {
- .cls = &cmx2xx_pm_sysclass,
-};
-
static int __init cmx2xx_pm_init(void)
{
- int error;
- error = sysdev_class_register(&cmx2xx_pm_sysclass);
- if (error == 0)
- error = sysdev_register(&cmx2xx_pm_device);
- return error;
+ register_syscore_ops(&cmx2xx_pm_syscore_ops);
+
+ return 0;
}
#else
static int __init cmx2xx_pm_init(void) { return 0; }
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
#include <linux/interrupt.h>
#include <linux/gpio.h>
#include <asm/mach-types.h>
#include <linux/platform_device.h>
#include <linux/pwm_backlight.h>
#include <linux/i2c/pxa-i2c.h>
-#include <linux/sysdev.h>
#include <asm/irq.h>
#include <asm/mach-types.h>
#include <linux/mtd/partitions.h>
#include <linux/mtd/physmap.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
#include <linux/ucb1400.h>
#include <asm/mach/arch.h>
#define pxa3xx_get_clk_frequency_khz(x) (0)
#endif
-extern struct sysdev_class pxa_irq_sysclass;
-extern struct sysdev_class pxa_gpio_sysclass;
-extern struct sysdev_class pxa2xx_mfp_sysclass;
-extern struct sysdev_class pxa3xx_mfp_sysclass;
+extern struct syscore_ops pxa_irq_syscore_ops;
+extern struct syscore_ops pxa_gpio_syscore_ops;
+extern struct syscore_ops pxa2xx_mfp_syscore_ops;
+extern struct syscore_ops pxa3xx_mfp_syscore_ops;
void __init pxa_set_ffuart_info(void *info);
void __init pxa_set_btuart_info(void *info);
static struct regulator_init_data bq24022_init_data = {
.constraints = {
.max_uA = 500000,
- .valid_ops_mask = REGULATOR_CHANGE_CURRENT,
+ .valid_ops_mask = REGULATOR_CHANGE_CURRENT|REGULATOR_CHANGE_STATUS,
},
.num_consumer_supplies = ARRAY_SIZE(bq24022_consumers),
.consumer_supplies = bq24022_consumers,
#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <linux/irq.h>
static unsigned long saved_icmr[MAX_INTERNAL_IRQS/32];
static unsigned long saved_ipr[MAX_INTERNAL_IRQS];
-static int pxa_irq_suspend(struct sys_device *dev, pm_message_t state)
+static int pxa_irq_suspend(void)
{
int i;
return 0;
}
-static int pxa_irq_resume(struct sys_device *dev)
+static void pxa_irq_resume(void)
{
int i;
__raw_writel(saved_ipr[i], IRQ_BASE + IPR(i));
__raw_writel(1, IRQ_BASE + ICCR);
- return 0;
}
#else
#define pxa_irq_suspend NULL
#define pxa_irq_resume NULL
#endif
-struct sysdev_class pxa_irq_sysclass = {
- .name = "irq",
+struct syscore_ops pxa_irq_syscore_ops = {
.suspend = pxa_irq_suspend,
.resume = pxa_irq_resume,
};
-
-static int __init pxa_irq_init(void)
-{
- return sysdev_class_register(&pxa_irq_sysclass);
-}
-
-core_initcall(pxa_irq_init);
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/bitops.h>
#ifdef CONFIG_PM
-static int lpd270_irq_resume(struct sys_device *dev)
+static void lpd270_irq_resume(void)
{
__raw_writew(lpd270_irq_enabled, LPD270_INT_MASK);
- return 0;
}
-static struct sysdev_class lpd270_irq_sysclass = {
- .name = "cpld_irq",
+static struct syscore_ops lpd270_irq_syscore_ops = {
.resume = lpd270_irq_resume,
};
-static struct sys_device lpd270_irq_device = {
- .cls = &lpd270_irq_sysclass,
-};
-
static int __init lpd270_irq_device_init(void)
{
- int ret = -ENODEV;
if (machine_is_logicpd_pxa270()) {
- ret = sysdev_class_register(&lpd270_irq_sysclass);
- if (ret == 0)
- ret = sysdev_register(&lpd270_irq_device);
+ register_syscore_ops(&lpd270_irq_syscore_ops);
+ return 0;
}
- return ret;
+ return -ENODEV;
}
device_initcall(lpd270_irq_device_init);
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/major.h>
#include <linux/fb.h>
#include <linux/interrupt.h>
#ifdef CONFIG_PM
-static int lubbock_irq_resume(struct sys_device *dev)
+static void lubbock_irq_resume(void)
{
LUB_IRQ_MASK_EN = lubbock_irq_enabled;
- return 0;
}
-static struct sysdev_class lubbock_irq_sysclass = {
- .name = "cpld_irq",
+static struct syscore_ops lubbock_irq_syscore_ops = {
.resume = lubbock_irq_resume,
};
-static struct sys_device lubbock_irq_device = {
- .cls = &lubbock_irq_sysclass,
-};
-
static int __init lubbock_irq_device_init(void)
{
- int ret = -ENODEV;
-
if (machine_is_lubbock()) {
- ret = sysdev_class_register(&lubbock_irq_sysclass);
- if (ret == 0)
- ret = sysdev_register(&lubbock_irq_device);
+ register_syscore_ops(&lubbock_irq_syscore_ops);
+ return 0;
}
- return ret;
+ return -ENODEV;
}
device_initcall(lubbock_irq_device_init);
static struct regulator_init_data bq24022_init_data = {
.constraints = {
.max_uA = 500000,
- .valid_ops_mask = REGULATOR_CHANGE_CURRENT,
+ .valid_ops_mask = REGULATOR_CHANGE_CURRENT | REGULATOR_CHANGE_STATUS,
},
.num_consumer_supplies = ARRAY_SIZE(bq24022_consumers),
.consumer_supplies = bq24022_consumers,
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/bitops.h>
#ifdef CONFIG_PM
-static int mainstone_irq_resume(struct sys_device *dev)
+static void mainstone_irq_resume(void)
{
MST_INTMSKENA = mainstone_irq_enabled;
- return 0;
}
-static struct sysdev_class mainstone_irq_sysclass = {
- .name = "cpld_irq",
+static struct syscore_ops mainstone_irq_syscore_ops = {
.resume = mainstone_irq_resume,
};
-static struct sys_device mainstone_irq_device = {
- .cls = &mainstone_irq_sysclass,
-};
-
static int __init mainstone_irq_device_init(void)
{
- int ret = -ENODEV;
+ if (machine_is_mainstone())
+ register_syscore_ops(&mainstone_irq_syscore_ops);
- if (machine_is_mainstone()) {
- ret = sysdev_class_register(&mainstone_irq_sysclass);
- if (ret == 0)
- ret = sysdev_register(&mainstone_irq_device);
- }
- return ret;
+ return 0;
}
device_initcall(mainstone_irq_device_init);
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/gpio.h>
#include <mach/pxa2xx-regs.h>
static unsigned long saved_gpdr[4];
static unsigned long saved_pgsr[4];
-static int pxa2xx_mfp_suspend(struct sys_device *d, pm_message_t state)
+static int pxa2xx_mfp_suspend(void)
{
int i;
return 0;
}
-static int pxa2xx_mfp_resume(struct sys_device *d)
+static void pxa2xx_mfp_resume(void)
{
int i;
PGSR(i) = saved_pgsr[i];
}
PSSR = PSSR_RDH | PSSR_PH;
- return 0;
}
#else
#define pxa2xx_mfp_suspend NULL
#define pxa2xx_mfp_resume NULL
#endif
-struct sysdev_class pxa2xx_mfp_sysclass = {
- .name = "mfp",
+struct syscore_ops pxa2xx_mfp_syscore_ops = {
.suspend = pxa2xx_mfp_suspend,
.resume = pxa2xx_mfp_resume,
};
for (i = 0; i <= gpio_to_bank(pxa_last_gpio); i++)
gpdr_lpm[i] = GPDR(i * 32);
- return sysdev_class_register(&pxa2xx_mfp_sysclass);
+ return 0;
}
postcore_initcall(pxa2xx_mfp_init);
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/hardware.h>
#include <mach/mfp-pxa3xx.h>
* a pull-down mode if they're an active low chip select, and we're
* just entering standby.
*/
-static int pxa3xx_mfp_suspend(struct sys_device *d, pm_message_t state)
+static int pxa3xx_mfp_suspend(void)
{
mfp_config_lpm();
return 0;
}
-static int pxa3xx_mfp_resume(struct sys_device *d)
+static void pxa3xx_mfp_resume(void)
{
mfp_config_run();
* preserve them here in case they will be referenced later
*/
ASCR &= ~(ASCR_RDH | ASCR_D1S | ASCR_D2S | ASCR_D3S);
- return 0;
}
#else
#define pxa3xx_mfp_suspend NULL
#define pxa3xx_mfp_resume NULL
#endif
-struct sysdev_class pxa3xx_mfp_sysclass = {
- .name = "mfp",
+struct syscore_ops pxa3xx_mfp_syscore_ops = {
.suspend = pxa3xx_mfp_suspend,
- .resume = pxa3xx_mfp_resume,
+ .resume = pxa3xx_mfp_resume,
};
-
-static int __init mfp_init_devicefs(void)
-{
- if (cpu_is_pxa3xx())
- return sysdev_class_register(&pxa3xx_mfp_sysclass);
-
- return 0;
-}
-postcore_initcall(mfp_init_devicefs);
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/input.h>
#include <linux/delay.h>
#include <linux/gpio_keys.h>
}
-static int mioa701_sys_suspend(struct sys_device *sysdev, pm_message_t state)
+static int mioa701_sys_suspend(void)
{
int i = 0, is_bt_on;
u32 *mem_resume_vector = phys_to_virt(RESUME_VECTOR_ADDR);
return 0;
}
-static int mioa701_sys_resume(struct sys_device *sysdev)
+static void mioa701_sys_resume(void)
{
int i = 0;
u32 *mem_resume_vector = phys_to_virt(RESUME_VECTOR_ADDR);
*mem_resume_enabler = save_buffer[i++];
*mem_resume_bt = save_buffer[i++];
*mem_resume_unknown = save_buffer[i++];
-
- return 0;
}
-static struct sysdev_class mioa701_sysclass = {
- .name = "mioa701",
-};
-
-static struct sys_device sysdev_bootstrap = {
- .cls = &mioa701_sysclass,
-};
-
-static struct sysdev_driver driver_bootstrap = {
- .suspend = &mioa701_sys_suspend,
- .resume = &mioa701_sys_resume,
+static struct syscore_ops mioa701_syscore_ops = {
+ .suspend = mioa701_sys_suspend,
+ .resume = mioa701_sys_resume,
};
static int __init bootstrap_init(void)
{
- int rc;
int save_size = mioa701_bootstrap_lg + (sizeof(u32) * 3);
- rc = sysdev_class_register(&mioa701_sysclass);
- if (rc) {
- printk(KERN_ERR "Failed registering mioa701 sys class\n");
- return -ENODEV;
- }
- rc = sysdev_register(&sysdev_bootstrap);
- if (rc) {
- printk(KERN_ERR "Failed registering mioa701 sys device\n");
- return -ENODEV;
- }
- rc = sysdev_driver_register(&mioa701_sysclass, &driver_bootstrap);
- if (rc) {
- printk(KERN_ERR "Failed registering PMU sys driver\n");
- return -ENODEV;
- }
+ register_syscore_ops(&mioa701_syscore_ops);
save_buffer = kmalloc(save_size, GFP_KERNEL);
if (!save_buffer)
static void bootstrap_exit(void)
{
kfree(save_buffer);
- sysdev_driver_unregister(&mioa701_sysclass, &driver_bootstrap);
- sysdev_unregister(&sysdev_bootstrap);
- sysdev_class_unregister(&mioa701_sysclass);
+ unregister_syscore_ops(&mioa701_syscore_ops);
printk(KERN_CRIT "Unregistering mioa701 suspend will hang next"
"resume !!!\n");
#include <linux/gpio.h>
#include <linux/wm97xx.h>
#include <linux/power_supply.h>
-#include <linux/sysdev.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/partitions.h>
#include <linux/mtd/physmap.h>
#include <linux/pwm_backlight.h>
#include <linux/gpio.h>
#include <linux/power_supply.h>
-#include <linux/sysdev.h>
#include <linux/w1-gpio.h>
#include <asm/mach-types.h>
*/
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/delay.h>
#include <linux/irq.h>
#include <linux/gpio_keys.h>
static unsigned long store_ptr;
-/* sys_device for Palm Zire 72 PM */
+/* syscore_ops for Palm Zire 72 PM */
-static int palmz72_pm_suspend(struct sys_device *dev, pm_message_t msg)
+static int palmz72_pm_suspend(void)
{
/* setup the resume_info struct for the original bootloader */
palmz72_resume_info.resume_addr = (u32) cpu_resume;
return 0;
}
-static int palmz72_pm_resume(struct sys_device *dev)
+static void palmz72_pm_resume(void)
{
*PALMZ72_SAVE_DWORD = store_ptr;
- return 0;
}
-static struct sysdev_class palmz72_pm_sysclass = {
- .name = "palmz72_pm",
+static struct syscore_ops palmz72_pm_syscore_ops = {
.suspend = palmz72_pm_suspend,
.resume = palmz72_pm_resume,
};
-static struct sys_device palmz72_pm_device = {
- .cls = &palmz72_pm_sysclass,
-};
-
static int __init palmz72_pm_init(void)
{
- int ret = -ENODEV;
if (machine_is_palmz72()) {
- ret = sysdev_class_register(&palmz72_pm_sysclass);
- if (ret == 0)
- ret = sysdev_register(&palmz72_pm_device);
+ register_syscore_ops(&palmz72_pm_syscore_ops);
+ return 0;
}
- return ret;
+ return -ENODEV;
}
device_initcall(palmz72_pm_init);
#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/suspend.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/irq.h>
#include <asm/mach/map.h>
&pxa_device_asoc_platform,
};
-static struct sys_device pxa25x_sysdev[] = {
- {
- .cls = &pxa_irq_sysclass,
- }, {
- .cls = &pxa2xx_mfp_sysclass,
- }, {
- .cls = &pxa_gpio_sysclass,
- }, {
- .cls = &pxa2xx_clock_sysclass,
- }
-};
-
static int __init pxa25x_init(void)
{
- int i, ret = 0;
+ int ret = 0;
if (cpu_is_pxa25x()) {
pxa25x_init_pm();
- for (i = 0; i < ARRAY_SIZE(pxa25x_sysdev); i++) {
- ret = sysdev_register(&pxa25x_sysdev[i]);
- if (ret)
- pr_err("failed to register sysdev[%d]\n", i);
- }
+ register_syscore_ops(&pxa_irq_syscore_ops);
+ register_syscore_ops(&pxa2xx_mfp_syscore_ops);
+ register_syscore_ops(&pxa_gpio_syscore_ops);
+ register_syscore_ops(&pxa2xx_clock_syscore_ops);
ret = platform_add_devices(pxa25x_devices,
ARRAY_SIZE(pxa25x_devices));
#include <linux/init.h>
#include <linux/suspend.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <linux/irq.h>
#include <linux/i2c/pxa-i2c.h>
&pxa27x_device_pwm1,
};
-static struct sys_device pxa27x_sysdev[] = {
- {
- .cls = &pxa_irq_sysclass,
- }, {
- .cls = &pxa2xx_mfp_sysclass,
- }, {
- .cls = &pxa_gpio_sysclass,
- }, {
- .cls = &pxa2xx_clock_sysclass,
- }
-};
-
static int __init pxa27x_init(void)
{
- int i, ret = 0;
+ int ret = 0;
if (cpu_is_pxa27x()) {
pxa27x_init_pm();
- for (i = 0; i < ARRAY_SIZE(pxa27x_sysdev); i++) {
- ret = sysdev_register(&pxa27x_sysdev[i]);
- if (ret)
- pr_err("failed to register sysdev[%d]\n", i);
- }
+ register_syscore_ops(&pxa_irq_syscore_ops);
+ register_syscore_ops(&pxa2xx_mfp_syscore_ops);
+ register_syscore_ops(&pxa_gpio_syscore_ops);
+ register_syscore_ops(&pxa2xx_clock_syscore_ops);
ret = platform_add_devices(devices, ARRAY_SIZE(devices));
}
#include <linux/platform_device.h>
#include <linux/irq.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/i2c/pxa-i2c.h>
#include <asm/mach/map.h>
&pxa27x_device_pwm1,
};
-static struct sys_device pxa3xx_sysdev[] = {
- {
- .cls = &pxa_irq_sysclass,
- }, {
- .cls = &pxa3xx_mfp_sysclass,
- }, {
- .cls = &pxa_gpio_sysclass,
- }, {
- .cls = &pxa3xx_clock_sysclass,
- }
-};
-
static int __init pxa3xx_init(void)
{
- int i, ret = 0;
+ int ret = 0;
if (cpu_is_pxa3xx()) {
pxa3xx_init_pm();
- for (i = 0; i < ARRAY_SIZE(pxa3xx_sysdev); i++) {
- ret = sysdev_register(&pxa3xx_sysdev[i]);
- if (ret)
- pr_err("failed to register sysdev[%d]\n", i);
- }
+ register_syscore_ops(&pxa_irq_syscore_ops);
+ register_syscore_ops(&pxa3xx_mfp_syscore_ops);
+ register_syscore_ops(&pxa_gpio_syscore_ops);
+ register_syscore_ops(&pxa3xx_clock_syscore_ops);
ret = platform_add_devices(devices, ARRAY_SIZE(devices));
}
#include <linux/i2c/pxa-i2c.h>
#include <linux/irq.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/hardware.h>
#include <mach/gpio.h>
&pxa27x_device_pwm1,
};
-static struct sys_device pxa95x_sysdev[] = {
- {
- .cls = &pxa_irq_sysclass,
- }, {
- .cls = &pxa_gpio_sysclass,
- }, {
- .cls = &pxa3xx_clock_sysclass,
- }
-};
-
static int __init pxa95x_init(void)
{
int ret = 0, i;
if ((ret = pxa_init_dma(IRQ_DMA, 32)))
return ret;
- for (i = 0; i < ARRAY_SIZE(pxa95x_sysdev); i++) {
- ret = sysdev_register(&pxa95x_sysdev[i]);
- if (ret)
- pr_err("failed to register sysdev[%d]\n", i);
- }
+ register_syscore_ops(&pxa_irq_syscore_ops);
+ register_syscore_ops(&pxa_gpio_syscore_ops);
+ register_syscore_ops(&pxa3xx_clock_syscore_ops);
ret = platform_add_devices(devices, ARRAY_SIZE(devices));
}
#include <linux/init.h>
#include <linux/kernel.h>
-#include <linux/sysdev.h>
#include <linux/platform_device.h>
#include <linux/interrupt.h>
#include <linux/gpio.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/hardware.h>
#include <mach/smemc.h>
static unsigned long sxcnfg, memclkcfg;
static unsigned long csadrcfg[4];
-static int pxa3xx_smemc_suspend(struct sys_device *dev, pm_message_t state)
+static int pxa3xx_smemc_suspend(void)
{
msc[0] = __raw_readl(MSC0);
msc[1] = __raw_readl(MSC1);
return 0;
}
-static int pxa3xx_smemc_resume(struct sys_device *dev)
+static void pxa3xx_smemc_resume(void)
{
__raw_writel(msc[0], MSC0);
__raw_writel(msc[1], MSC1);
__raw_writel(csadrcfg[1], CSADRCFG1);
__raw_writel(csadrcfg[2], CSADRCFG2);
__raw_writel(csadrcfg[3], CSADRCFG3);
-
- return 0;
}
-static struct sysdev_class smemc_sysclass = {
- .name = "smemc",
+static struct syscore_ops smemc_syscore_ops = {
.suspend = pxa3xx_smemc_suspend,
.resume = pxa3xx_smemc_resume,
};
-static struct sys_device smemc_sysdev = {
- .id = 0,
- .cls = &smemc_sysclass,
-};
-
static int __init smemc_init(void)
{
- int ret = 0;
+ if (cpu_is_pxa3xx())
+ register_syscore_ops(&smemc_syscore_ops);
- if (cpu_is_pxa3xx()) {
- ret = sysdev_class_register(&smemc_sysclass);
- if (ret)
- return ret;
-
- ret = sysdev_register(&smemc_sysdev);
- }
-
- return ret;
+ return 0;
}
subsys_initcall(smemc_init);
#endif
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/bitops.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/partitions.h>
#include <linux/mtd/physmap.h>
+#include <linux/syscore_ops.h>
#include <mach/pxa25x.h>
#include <mach/audio.h>
return v1;
}
-/* CPU sysdev */
-static int viper_cpu_suspend(struct sys_device *sysdev, pm_message_t state)
+/* CPU system core operations. */
+static int viper_cpu_suspend(void)
{
viper_icr_set_bit(VIPER_ICR_R_DIS);
return 0;
}
-static int viper_cpu_resume(struct sys_device *sysdev)
+static void viper_cpu_resume(void)
{
viper_icr_clear_bit(VIPER_ICR_R_DIS);
- return 0;
}
-static struct sysdev_driver viper_cpu_sysdev_driver = {
+static struct syscore_ops viper_cpu_syscore_ops = {
.suspend = viper_cpu_suspend,
.resume = viper_cpu_resume,
};
viper_init_vcore_gpios();
viper_init_cpufreq();
- sysdev_driver_register(&cpu_sysdev_class, &viper_cpu_sysdev_driver);
+ register_syscore_ops(&viper_cpu_syscore_ops);
if (version) {
pr_info("viper: hardware v%di%d detected. "
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
-#include <linux/sysdev.h>
#include <linux/usb/gpio_vbus.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/partitions.h>
* operation to deadlock the system.
*/
#define mb() dsb()
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() mb()
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/ioport.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <plat/cpu.h>
#include <plat/pm.h>
-static int s3c2410_irq_add(struct sys_device *sysdev)
-{
- return 0;
-}
-
-static struct sysdev_driver s3c2410_irq_driver = {
- .add = s3c2410_irq_add,
+struct syscore_ops s3c24xx_irq_syscore_ops = {
.suspend = s3c24xx_irq_suspend,
.resume = s3c24xx_irq_resume,
};
-
-static int __init s3c2410_irq_init(void)
-{
- return sysdev_driver_register(&s3c2410_sysclass, &s3c2410_irq_driver);
-}
-
-arch_initcall(s3c2410_irq_init);
-
-static struct sysdev_driver s3c2410a_irq_driver = {
- .add = s3c2410_irq_add,
- .suspend = s3c24xx_irq_suspend,
- .resume = s3c24xx_irq_resume,
-};
-
-static int __init s3c2410a_irq_init(void)
-{
- return sysdev_driver_register(&s3c2410a_sysclass, &s3c2410a_irq_driver);
-}
-
-arch_initcall(s3c2410a_irq_init);
#include <linux/timer.h>
#include <linux/init.h>
#include <linux/gpio.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
#include <linux/dm9000.h>
/* NAND Flash on BAST board */
#ifdef CONFIG_PM
-static int bast_pm_suspend(struct sys_device *sd, pm_message_t state)
+static int bast_pm_suspend(void)
{
/* ensure that an nRESET is not generated on resume. */
gpio_direction_output(S3C2410_GPA(21), 1);
return 0;
}
-static int bast_pm_resume(struct sys_device *sd)
+static void bast_pm_resume(void)
{
s3c_gpio_cfgpin(S3C2410_GPA(21), S3C2410_GPA21_nRSTOUT);
- return 0;
}
#else
#define bast_pm_resume NULL
#endif
-static struct sysdev_class bast_pm_sysclass = {
- .name = "mach-bast",
+static struct syscore_ops bast_pm_syscore_ops = {
.suspend = bast_pm_suspend,
.resume = bast_pm_resume,
};
-static struct sys_device bast_pm_sysdev = {
- .cls = &bast_pm_sysclass,
-};
-
static int smartmedia_map[] = { 0 };
static int chip0_map[] = { 1 };
static int chip1_map[] = { 2 };
static void __init bast_init(void)
{
- sysdev_class_register(&bast_pm_sysclass);
- sysdev_register(&bast_pm_sysdev);
+ register_syscore_ops(&bast_pm_syscore_ops);
s3c_i2c0_set_platdata(&bast_i2c_info);
s3c_nand_set_platdata(&bast_nand_info);
#include <linux/errno.h>
#include <linux/time.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/gpio.h>
#include <linux/io.h>
}
}
-static int s3c2410_pm_resume(struct sys_device *dev)
+static void s3c2410_pm_resume(void)
{
unsigned long tmp;
if ( machine_is_aml_m5900() )
s3c2410_gpio_setpin(S3C2410_GPF(2), 0);
-
- return 0;
}
+struct syscore_ops s3c2410_pm_syscore_ops = {
+ .resume = s3c2410_pm_resume,
+};
+
static int s3c2410_pm_add(struct sys_device *dev)
{
pm_cpu_prep = s3c2410_pm_prepare;
#if defined(CONFIG_CPU_S3C2410)
static struct sysdev_driver s3c2410_pm_driver = {
.add = s3c2410_pm_add,
- .resume = s3c2410_pm_resume,
};
/* register ourselves */
static struct sysdev_driver s3c2410a_pm_driver = {
.add = s3c2410_pm_add,
- .resume = s3c2410_pm_resume,
};
static int __init s3c2410a_pm_drvinit(void)
#if defined(CONFIG_CPU_S3C2440)
static struct sysdev_driver s3c2440_pm_driver = {
.add = s3c2410_pm_add,
- .resume = s3c2410_pm_resume,
};
static int __init s3c2440_pm_drvinit(void)
#if defined(CONFIG_CPU_S3C2442)
static struct sysdev_driver s3c2442_pm_driver = {
.add = s3c2410_pm_add,
- .resume = s3c2410_pm_resume,
};
static int __init s3c2442_pm_drvinit(void)
#include <linux/gpio.h>
#include <linux/clk.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
#include <linux/io.h>
#include <plat/devs.h>
#include <plat/clock.h>
#include <plat/pll.h>
+#include <plat/pm.h>
#include <plat/gpio-core.h>
#include <plat/gpio-cfg.h>
{
printk("S3C2410: Initialising architecture\n");
+ register_syscore_ops(&s3c2410_pm_syscore_ops);
+ register_syscore_ops(&s3c24xx_irq_syscore_ops);
+
return sysdev_register(&s3c2410_sysdev);
}
static struct sysdev_driver s3c2412_irq_driver = {
.add = s3c2412_irq_add,
- .suspend = s3c24xx_irq_suspend,
- .resume = s3c24xx_irq_resume,
};
static int s3c2412_irq_init(void)
#include <linux/timer.h>
#include <linux/init.h>
#include <linux/gpio.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
#include <linux/i2c.h>
/* Jive power management device */
#ifdef CONFIG_PM
-static int jive_pm_suspend(struct sys_device *sd, pm_message_t state)
+static int jive_pm_suspend(void)
{
/* Write the magic value u-boot uses to check for resume into
* the INFORM0 register, and ensure INFORM1 is set to the
return 0;
}
-static int jive_pm_resume(struct sys_device *sd)
+static void jive_pm_resume(void)
{
__raw_writel(0x0, S3C2412_INFORM0);
- return 0;
}
#else
#define jive_pm_resume NULL
#endif
-static struct sysdev_class jive_pm_sysclass = {
- .name = "jive-pm",
+static struct syscore_ops jive_pm_syscore_ops = {
.suspend = jive_pm_suspend,
.resume = jive_pm_resume,
};
-static struct sys_device jive_pm_sysdev = {
- .cls = &jive_pm_sysclass,
-};
-
static void __init jive_map_io(void)
{
s3c24xx_init_io(jive_iodesc, ARRAY_SIZE(jive_iodesc));
static void __init jive_machine_init(void)
{
- /* register system devices for managing low level suspend */
+ /* register system core operations for managing low level suspend */
- sysdev_class_register(&jive_pm_sysclass);
- sysdev_register(&jive_pm_sysdev);
+ register_syscore_ops(&jive_pm_syscore_ops);
/* write our sleep configurations for the IO. Pull down all unused
* IO, ensure that we have turned off all peripherals we do not
#include <linux/timer.h>
#include <linux/init.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/platform_device.h>
#include <linux/io.h>
SAVE_ITEM(S3C2413_GPJSLPCON),
};
-static int s3c2412_pm_suspend(struct sys_device *dev, pm_message_t state)
+static struct sysdev_driver s3c2412_pm_driver = {
+ .add = s3c2412_pm_add,
+};
+
+static __init int s3c2412_pm_init(void)
+{
+ return sysdev_driver_register(&s3c2412_sysclass, &s3c2412_pm_driver);
+}
+
+arch_initcall(s3c2412_pm_init);
+
+static int s3c2412_pm_suspend(void)
{
s3c_pm_do_save(s3c2412_sleep, ARRAY_SIZE(s3c2412_sleep));
return 0;
}
-static int s3c2412_pm_resume(struct sys_device *dev)
+static void s3c2412_pm_resume(void)
{
unsigned long tmp;
__raw_writel(tmp, S3C2412_PWRCFG);
s3c_pm_do_restore(s3c2412_sleep, ARRAY_SIZE(s3c2412_sleep));
- return 0;
}
-static struct sysdev_driver s3c2412_pm_driver = {
- .add = s3c2412_pm_add,
+struct syscore_ops s3c2412_pm_syscore_ops = {
.suspend = s3c2412_pm_suspend,
.resume = s3c2412_pm_resume,
};
-
-static __init int s3c2412_pm_init(void)
-{
- return sysdev_driver_register(&s3c2412_sysclass, &s3c2412_pm_driver);
-}
-
-arch_initcall(s3c2412_pm_init);
#include <linux/clk.h>
#include <linux/delay.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
#include <linux/io.h>
{
printk("S3C2412: Initialising architecture\n");
+ register_syscore_ops(&s3c2412_pm_syscore_ops);
+ register_syscore_ops(&s3c24xx_irq_syscore_ops);
+
return sysdev_register(&s3c2412_sysdev);
}
static struct sysdev_driver s3c2416_irq_driver = {
.add = s3c2416_irq_add,
- .suspend = s3c24xx_irq_suspend,
- .resume = s3c24xx_irq_resume,
};
static int __init s3c2416_irq_init(void)
*/
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <asm/cacheflush.h>
return 0;
}
-static int s3c2416_pm_suspend(struct sys_device *dev, pm_message_t state)
+static struct sysdev_driver s3c2416_pm_driver = {
+ .add = s3c2416_pm_add,
+};
+
+static __init int s3c2416_pm_init(void)
{
- return 0;
+ return sysdev_driver_register(&s3c2416_sysclass, &s3c2416_pm_driver);
}
-static int s3c2416_pm_resume(struct sys_device *dev)
+arch_initcall(s3c2416_pm_init);
+
+
+static void s3c2416_pm_resume(void)
{
/* unset the return-from-sleep amd inform flags */
__raw_writel(0x0, S3C2443_PWRMODE);
__raw_writel(0x0, S3C2412_INFORM0);
__raw_writel(0x0, S3C2412_INFORM1);
-
- return 0;
}
-static struct sysdev_driver s3c2416_pm_driver = {
- .add = s3c2416_pm_add,
- .suspend = s3c2416_pm_suspend,
+struct syscore_ops s3c2416_pm_syscore_ops = {
.resume = s3c2416_pm_resume,
};
-
-static __init int s3c2416_pm_init(void)
-{
- return sysdev_driver_register(&s3c2416_sysclass, &s3c2416_pm_driver);
-}
-
-arch_initcall(s3c2416_pm_init);
#include <linux/platform_device.h>
#include <linux/serial_core.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/clk.h>
#include <linux/io.h>
#include <plat/devs.h>
#include <plat/cpu.h>
#include <plat/sdhci.h>
+#include <plat/pm.h>
#include <plat/iic-core.h>
#include <plat/fb-core.h>
s3c_fb_setname("s3c2443-fb");
+ register_syscore_ops(&s3c2416_pm_syscore_ops);
+ register_syscore_ops(&s3c24xx_irq_syscore_ops);
+
return sysdev_register(&s3c2416_sysdev);
}
#include <linux/init.h>
#include <linux/gpio.h>
#include <linux/device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/serial_core.h>
#include <linux/clk.h>
#include <linux/i2c.h>
#ifdef CONFIG_PM
static unsigned char pm_osiris_ctrl0;
-static int osiris_pm_suspend(struct sys_device *sd, pm_message_t state)
+static int osiris_pm_suspend(void)
{
unsigned int tmp;
return 0;
}
-static int osiris_pm_resume(struct sys_device *sd)
+static void osiris_pm_resume(void)
{
if (pm_osiris_ctrl0 & OSIRIS_CTRL0_FIX8)
__raw_writeb(OSIRIS_CTRL1_FIX8, OSIRIS_VA_CTRL1);
__raw_writeb(pm_osiris_ctrl0, OSIRIS_VA_CTRL0);
s3c_gpio_cfgpin(S3C2410_GPA(21), S3C2410_GPA21_nRSTOUT);
-
- return 0;
}
#else
#define osiris_pm_resume NULL
#endif
-static struct sysdev_class osiris_pm_sysclass = {
- .name = "mach-osiris",
+static struct syscore_ops osiris_pm_syscore_ops = {
.suspend = osiris_pm_suspend,
.resume = osiris_pm_resume,
};
-static struct sys_device osiris_pm_sysdev = {
- .cls = &osiris_pm_sysclass,
-};
-
/* Link for DVS driver to TPS65011 */
static void osiris_tps_release(struct device *dev)
static void __init osiris_init(void)
{
- sysdev_class_register(&osiris_pm_sysclass);
- sysdev_register(&osiris_pm_sysdev);
+ register_syscore_ops(&osiris_pm_syscore_ops);
s3c_i2c0_set_platdata(NULL);
s3c_nand_set_platdata(&osiris_nand_info);
#include <linux/platform_device.h>
#include <linux/serial_core.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/gpio.h>
#include <linux/clk.h>
#include <linux/io.h>
#include <plat/devs.h>
#include <plat/cpu.h>
#include <plat/s3c244x.h>
+#include <plat/pm.h>
#include <plat/gpio-core.h>
#include <plat/gpio-cfg.h>
s3c_device_wdt.resource[1].start = IRQ_S3C2440_WDT;
s3c_device_wdt.resource[1].end = IRQ_S3C2440_WDT;
+ /* register suspend/resume handlers */
+
+ register_syscore_ops(&s3c2410_pm_syscore_ops);
+ register_syscore_ops(&s3c244x_pm_syscore_ops);
+ register_syscore_ops(&s3c24xx_irq_syscore_ops);
+
/* register our system device for everything else */
return sysdev_register(&s3c2440_sysdev);
#include <linux/err.h>
#include <linux/device.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/mutex.h>
#include <plat/clock.h>
#include <plat/cpu.h>
#include <plat/s3c244x.h>
+#include <plat/pm.h>
#include <plat/gpio-core.h>
#include <plat/gpio-cfg.h>
{
printk("S3C2442: Initialising architecture\n");
+ register_syscore_ops(&s3c2410_pm_syscore_ops);
+ register_syscore_ops(&s3c244x_pm_syscore_ops);
+ register_syscore_ops(&s3c24xx_irq_syscore_ops);
+
return sysdev_register(&s3c2442_sysdev);
}
static struct sysdev_driver s3c2440_irq_driver = {
.add = s3c244x_irq_add,
- .suspend = s3c24xx_irq_suspend,
- .resume = s3c24xx_irq_resume,
};
static int s3c2440_irq_init(void)
static struct sysdev_driver s3c2442_irq_driver = {
.add = s3c244x_irq_add,
- .suspend = s3c24xx_irq_suspend,
- .resume = s3c24xx_irq_resume,
};
#include <linux/serial_core.h>
#include <linux/platform_device.h>
#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/clk.h>
#include <linux/io.h>
s3c2410_baseclk_add();
}
-#ifdef CONFIG_PM
-
-static struct sleep_save s3c244x_sleep[] = {
- SAVE_ITEM(S3C2440_DSC0),
- SAVE_ITEM(S3C2440_DSC1),
- SAVE_ITEM(S3C2440_GPJDAT),
- SAVE_ITEM(S3C2440_GPJCON),
- SAVE_ITEM(S3C2440_GPJUP)
-};
-
-static int s3c244x_suspend(struct sys_device *dev, pm_message_t state)
-{
- s3c_pm_do_save(s3c244x_sleep, ARRAY_SIZE(s3c244x_sleep));
- return 0;
-}
-
-static int s3c244x_resume(struct sys_device *dev)
-{
- s3c_pm_do_restore(s3c244x_sleep, ARRAY_SIZE(s3c244x_sleep));
- return 0;
-}
-
-#else
-#define s3c244x_suspend NULL
-#define s3c244x_resume NULL
-#endif
-
/* Since the S3C2442 and S3C2440 share items, put both sysclasses here */
struct sysdev_class s3c2440_sysclass = {
.name = "s3c2440-core",
- .suspend = s3c244x_suspend,
- .resume = s3c244x_resume
};
struct sysdev_class s3c2442_sysclass = {
.name = "s3c2442-core",
- .suspend = s3c244x_suspend,
- .resume = s3c244x_resume
};
/* need to register class before we actually register the device, and
}
core_initcall(s3c2442_core_init);
+
+
+#ifdef CONFIG_PM
+static struct sleep_save s3c244x_sleep[] = {
+ SAVE_ITEM(S3C2440_DSC0),
+ SAVE_ITEM(S3C2440_DSC1),
+ SAVE_ITEM(S3C2440_GPJDAT),
+ SAVE_ITEM(S3C2440_GPJCON),
+ SAVE_ITEM(S3C2440_GPJUP)
+};
+
+static int s3c244x_suspend(void)
+{
+ s3c_pm_do_save(s3c244x_sleep, ARRAY_SIZE(s3c244x_sleep));
+ return 0;
+}
+
+static void s3c244x_resume(void)
+{
+ s3c_pm_do_restore(s3c244x_sleep, ARRAY_SIZE(s3c244x_sleep));
+}
+#else
+#define s3c244x_suspend NULL
+#define s3c244x_resume NULL
+#endif
+
+struct syscore_ops s3c244x_pm_syscore_ops = {
+ .suspend = s3c244x_suspend,
+ .resume = s3c244x_resume,
+};
*/
#include <linux/kernel.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/interrupt.h>
#include <linux/serial_core.h>
#include <linux/irq.h>
static u32 irq_uart_mask[CONFIG_SERIAL_SAMSUNG_UARTS];
-static int s3c64xx_irq_pm_suspend(struct sys_device *dev, pm_message_t state)
+static int s3c64xx_irq_pm_suspend(void)
{
struct irq_grp_save *grp = eint_grp_save;
int i;
return 0;
}
-static int s3c64xx_irq_pm_resume(struct sys_device *dev)
+static void s3c64xx_irq_pm_resume(void)
{
struct irq_grp_save *grp = eint_grp_save;
int i;
}
S3C_PMDBG("%s: IRQ configuration restored\n", __func__);
- return 0;
}
-static struct sysdev_driver s3c64xx_irq_driver = {
+struct syscore_ops s3c64xx_irq_syscore_ops = {
.suspend = s3c64xx_irq_pm_suspend,
.resume = s3c64xx_irq_pm_resume,
};
-static int __init s3c64xx_irq_pm_init(void)
+static __init int s3c64xx_syscore_init(void)
{
- return sysdev_driver_register(&s3c64xx_sysclass, &s3c64xx_irq_driver);
-}
+ register_syscore_ops(&s3c64xx_irq_syscore_ops);
-arch_initcall(s3c64xx_irq_pm_init);
+ return 0;
+}
+core_initcall(s3c64xx_syscore_init);
#include <linux/init.h>
#include <linux/suspend.h>
+#include <linux/syscore_ops.h>
#include <linux/io.h>
#include <plat/cpu.h>
return 0;
}
-static int s5pv210_pm_resume(struct sys_device *dev)
+static struct sysdev_driver s5pv210_pm_driver = {
+ .add = s5pv210_pm_add,
+};
+
+static __init int s5pv210_pm_drvinit(void)
+{
+ return sysdev_driver_register(&s5pv210_sysclass, &s5pv210_pm_driver);
+}
+arch_initcall(s5pv210_pm_drvinit);
+
+static void s5pv210_pm_resume(void)
{
u32 tmp;
__raw_writel(tmp , S5P_OTHERS);
s3c_pm_do_restore_core(s5pv210_core_save, ARRAY_SIZE(s5pv210_core_save));
-
- return 0;
}
-static struct sysdev_driver s5pv210_pm_driver = {
- .add = s5pv210_pm_add,
+static struct syscore_ops s5pv210_pm_syscore_ops = {
.resume = s5pv210_pm_resume,
};
-static __init int s5pv210_pm_drvinit(void)
+static __init int s5pv210_pm_syscore_init(void)
{
- return sysdev_driver_register(&s5pv210_sysclass, &s5pv210_pm_driver);
+ register_syscore_ops(&s5pv210_pm_syscore_ops);
+ return 0;
}
-arch_initcall(s5pv210_pm_drvinit);
+arch_initcall(s5pv210_pm_syscore_init);
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/ioport.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <mach/hardware.h>
#include <asm/mach/irq.h>
unsigned int iccr;
} sa1100irq_state;
-static int sa1100irq_suspend(struct sys_device *dev, pm_message_t state)
+static int sa1100irq_suspend(void)
{
struct sa1100irq_state *st = &sa1100irq_state;
return 0;
}
-static int sa1100irq_resume(struct sys_device *dev)
+static void sa1100irq_resume(void)
{
struct sa1100irq_state *st = &sa1100irq_state;
ICMR = st->icmr;
}
- return 0;
}
-static struct sysdev_class sa1100irq_sysclass = {
- .name = "sa11x0-irq",
+static struct syscore_ops sa1100irq_syscore_ops = {
.suspend = sa1100irq_suspend,
.resume = sa1100irq_resume,
};
-static struct sys_device sa1100irq_device = {
- .id = 0,
- .cls = &sa1100irq_sysclass,
-};
-
static int __init sa1100irq_init_devicefs(void)
{
- sysdev_class_register(&sa1100irq_sysclass);
- return sysdev_register(&sa1100irq_device);
+ register_syscore_ops(&sa1100irq_syscore_ops);
+ return 0;
}
device_initcall(sa1100irq_init_devicefs);
#include <linux/clk.h>
#include <linux/sh_clk.h>
#include <linux/bitmap.h>
+#include <linux/slab.h>
#ifdef CONFIG_PM_RUNTIME
-#define BIT_ONCE 0
-#define BIT_ACTIVE 1
-#define BIT_CLK_ENABLED 2
-struct pm_runtime_data {
- unsigned long flags;
- struct clk *clk;
-};
-
-static void __devres_release(struct device *dev, void *res)
-{
- struct pm_runtime_data *prd = res;
-
- dev_dbg(dev, "__devres_release()\n");
-
- if (test_bit(BIT_CLK_ENABLED, &prd->flags))
- clk_disable(prd->clk);
-
- if (test_bit(BIT_ACTIVE, &prd->flags))
- clk_put(prd->clk);
-}
-
-static struct pm_runtime_data *__to_prd(struct device *dev)
-{
- return devres_find(dev, __devres_release, NULL, NULL);
-}
-
-static void platform_pm_runtime_init(struct device *dev,
- struct pm_runtime_data *prd)
-{
- if (prd && !test_and_set_bit(BIT_ONCE, &prd->flags)) {
- prd->clk = clk_get(dev, NULL);
- if (!IS_ERR(prd->clk)) {
- set_bit(BIT_ACTIVE, &prd->flags);
- dev_info(dev, "clocks managed by runtime pm\n");
- }
- }
-}
-
-static void platform_pm_runtime_bug(struct device *dev,
- struct pm_runtime_data *prd)
-{
- if (prd && !test_and_set_bit(BIT_ONCE, &prd->flags))
- dev_err(dev, "runtime pm suspend before resume\n");
-}
-
-int platform_pm_runtime_suspend(struct device *dev)
-{
- struct pm_runtime_data *prd = __to_prd(dev);
-
- dev_dbg(dev, "platform_pm_runtime_suspend()\n");
-
- platform_pm_runtime_bug(dev, prd);
-
- if (prd && test_bit(BIT_ACTIVE, &prd->flags)) {
- clk_disable(prd->clk);
- clear_bit(BIT_CLK_ENABLED, &prd->flags);
- }
-
- return 0;
-}
-
-int platform_pm_runtime_resume(struct device *dev)
-{
- struct pm_runtime_data *prd = __to_prd(dev);
-
- dev_dbg(dev, "platform_pm_runtime_resume()\n");
-
- platform_pm_runtime_init(dev, prd);
-
- if (prd && test_bit(BIT_ACTIVE, &prd->flags)) {
- clk_enable(prd->clk);
- set_bit(BIT_CLK_ENABLED, &prd->flags);
- }
-
- return 0;
-}
-
-int platform_pm_runtime_idle(struct device *dev)
+static int default_platform_runtime_idle(struct device *dev)
{
/* suspend synchronously to disable clocks immediately */
return pm_runtime_suspend(dev);
}
-static int platform_bus_notify(struct notifier_block *nb,
- unsigned long action, void *data)
-{
- struct device *dev = data;
- struct pm_runtime_data *prd;
-
- dev_dbg(dev, "platform_bus_notify() %ld !\n", action);
-
- if (action == BUS_NOTIFY_BIND_DRIVER) {
- prd = devres_alloc(__devres_release, sizeof(*prd), GFP_KERNEL);
- if (prd)
- devres_add(dev, prd);
- else
- dev_err(dev, "unable to alloc memory for runtime pm\n");
- }
-
- return 0;
-}
-
-#else /* CONFIG_PM_RUNTIME */
-
-static int platform_bus_notify(struct notifier_block *nb,
- unsigned long action, void *data)
-{
- struct device *dev = data;
- struct clk *clk;
+static struct dev_power_domain default_power_domain = {
+ .ops = {
+ .runtime_suspend = pm_runtime_clk_suspend,
+ .runtime_resume = pm_runtime_clk_resume,
+ .runtime_idle = default_platform_runtime_idle,
+ USE_PLATFORM_PM_SLEEP_OPS
+ },
+};
- dev_dbg(dev, "platform_bus_notify() %ld !\n", action);
+#define DEFAULT_PWR_DOMAIN_PTR (&default_power_domain)
- switch (action) {
- case BUS_NOTIFY_BIND_DRIVER:
- clk = clk_get(dev, NULL);
- if (!IS_ERR(clk)) {
- clk_enable(clk);
- clk_put(clk);
- dev_info(dev, "runtime pm disabled, clock forced on\n");
- }
- break;
- case BUS_NOTIFY_UNBOUND_DRIVER:
- clk = clk_get(dev, NULL);
- if (!IS_ERR(clk)) {
- clk_disable(clk);
- clk_put(clk);
- dev_info(dev, "runtime pm disabled, clock forced off\n");
- }
- break;
- }
+#else
- return 0;
-}
+#define DEFAULT_PWR_DOMAIN_PTR NULL
#endif /* CONFIG_PM_RUNTIME */
-static struct notifier_block platform_bus_notifier = {
- .notifier_call = platform_bus_notify
+static struct pm_clk_notifier_block platform_bus_notifier = {
+ .pwr_domain = DEFAULT_PWR_DOMAIN_PTR,
+ .con_ids = { NULL, },
};
static int __init sh_pm_runtime_init(void)
{
- bus_register_notifier(&platform_bus_type, &platform_bus_notifier);
+ pm_runtime_clk_add_notifier(&platform_bus_type, &platform_bus_notifier);
return 0;
}
core_initcall(sh_pm_runtime_init);
#include <asm/outercache.h>
-#define rmb() dmb()
+#define rmb() dsb()
#define wmb() do { dsb(); outer_sync(); } while (0)
#define mb() wmb()
.irq = NOMADIK_GPIO_TO_IRQ(217),
.platform_data = &mop500_tc35892_data,
},
-};
-
-/* I2C0 devices only available prior to HREFv60 */
-static struct i2c_board_info __initdata mop500_i2c0_old_devices[] = {
+ /* I2C0 devices only available prior to HREFv60 */
{
I2C_BOARD_INFO("tps61052", 0x33),
.platform_data = &mop500_tps61052_data,
},
};
+#define NUM_PRE_V60_I2C0_DEVICES 1
+
static struct i2c_board_info __initdata mop500_i2c2_devices[] = {
{
/* lp5521 LED driver, 1st device */
static void __init mop500_init_machine(void)
{
+ int i2c0_devs;
+
/*
* The HREFv60 board removed a GPIO expander and routed
* all these GPIO pins to the internal GPIO controller
platform_device_register(&ab8500_device);
- i2c_register_board_info(0, mop500_i2c0_devices,
- ARRAY_SIZE(mop500_i2c0_devices));
- if (!machine_is_hrefv60())
- i2c_register_board_info(0, mop500_i2c0_old_devices,
- ARRAY_SIZE(mop500_i2c0_old_devices));
+ i2c0_devs = ARRAY_SIZE(mop500_i2c0_devices);
+ if (machine_is_hrefv60())
+ i2c0_devs -= NUM_PRE_V60_I2C0_DEVICES;
+
+ i2c_register_board_info(0, mop500_i2c0_devices, i2c0_devs);
i2c_register_board_info(2, mop500_i2c2_devices,
ARRAY_SIZE(mop500_i2c2_devices));
}
* Convert start_pfn/end_pfn to a struct page pointer.
*/
start_pg = pfn_to_page(start_pfn - 1) + 1;
- end_pg = pfn_to_page(end_pfn);
+ end_pg = pfn_to_page(end_pfn - 1) + 1;
/*
* Convert to physical addresses, and
bank_start = bank_pfn_start(bank);
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * Take care not to free memmap entries that don't exist
+ * due to SPARSEMEM sections which aren't present.
+ */
+ bank_start = min(bank_start,
+ ALIGN(prev_bank_end, PAGES_PER_SECTION));
+#endif
/*
* If we had a previous bank, and there is a space
* between the current bank and the previous, free it.
*/
prev_bank_end = ALIGN(bank_pfn_end(bank), MAX_ORDER_NR_PAGES);
}
+
+#ifdef CONFIG_SPARSEMEM
+ if (!IS_ALIGNED(prev_bank_end, PAGES_PER_SECTION))
+ free_memmap(prev_bank_end,
+ ALIGN(prev_bank_end, PAGES_PER_SECTION));
+#endif
}
static void __init free_highpages(void)
teq r2, #DMA_TO_DEVICE
beq xscale_dma_clean_range
b xscale_dma_flush_range
-ENDPROC(xscsale_dma_a0_map_area)
+ENDPROC(xscale_dma_a0_map_area)
/*
* dma_unmap_area(start, size, dir)
return 0;
}
+/*
+ * This lock class tells lockdep that GPIO irqs are in a different
+ * category than their parents, so it won't report false recursion.
+ */
+static struct lock_class_key gpio_lock_class;
+
int __init mxc_gpio_init(struct mxc_gpio_port *port, int cnt)
{
int i, j;
__raw_writel(~0, port[i].base + GPIO_ISR);
for (j = port[i].virtual_irq_start;
j < port[i].virtual_irq_start + 32; j++) {
+ irq_set_lockdep_class(j, &gpio_lock_class);
irq_set_chip_and_handler(j, &gpio_irq_chip,
handle_level_irq);
set_irq_flags(j, IRQF_VALID);
1:
@ return from FIQ
subs pc, lr, #4
+
+ .align
imx_ssi_fiq_base:
.word 0x0
imx_ssi_fiq_rx_buffer:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/err.h>
#include <linux/clk.h>
#include <linux/io.h>
.resume_noirq = omap_mpuio_resume_noirq,
};
-/* use platform_driver for this, now that there's no longer any
- * point to sys_device (other than not disturbing old code).
- */
+/* use platform_driver for this. */
static struct platform_driver omap_mpuio_driver = {
.driver = {
.name = "mpuio",
}
#if defined(CONFIG_ARCH_OMAP16XX) || defined(CONFIG_ARCH_OMAP2PLUS)
-static int omap_gpio_suspend(struct sys_device *dev, pm_message_t mesg)
+static int omap_gpio_suspend(void)
{
int i;
return 0;
}
-static int omap_gpio_resume(struct sys_device *dev)
+static void omap_gpio_resume(void)
{
int i;
if (!cpu_class_is_omap2() && !cpu_is_omap16xx())
- return 0;
+ return;
for (i = 0; i < gpio_bank_count; i++) {
struct gpio_bank *bank = &gpio_bank[i];
__raw_writel(bank->saved_wakeup, wake_set);
spin_unlock_irqrestore(&bank->lock, flags);
}
-
- return 0;
}
-static struct sysdev_class omap_gpio_sysclass = {
- .name = "gpio",
+static struct syscore_ops omap_gpio_syscore_ops = {
.suspend = omap_gpio_suspend,
.resume = omap_gpio_resume,
};
-static struct sys_device omap_gpio_device = {
- .id = 0,
- .cls = &omap_gpio_sysclass,
-};
-
#endif
#ifdef CONFIG_ARCH_OMAP2PLUS
static int __init omap_gpio_sysinit(void)
{
- int ret = 0;
-
mpuio_init();
#if defined(CONFIG_ARCH_OMAP16XX) || defined(CONFIG_ARCH_OMAP2PLUS)
- if (cpu_is_omap16xx() || cpu_class_is_omap2()) {
- if (ret == 0) {
- ret = sysdev_class_register(&omap_gpio_sysclass);
- if (ret == 0)
- ret = sysdev_register(&omap_gpio_device);
- }
- }
+ if (cpu_is_omap16xx() || cpu_class_is_omap2())
+ register_syscore_ops(&omap_gpio_syscore_ops);
#endif
- return ret;
+ return 0;
}
arch_initcall(omap_gpio_sysinit);
clk_enable(obj->clk);
errs = iommu_report_fault(obj, &da);
clk_disable(obj->clk);
+ if (errs == 0)
+ return IRQ_HANDLED;
/* Fault callback or TLB/PTE Dynamic loading */
if (obj->isr && !obj->isr(obj, da, errs, obj->isr_priv))
return 0;
}
+static int _od_runtime_suspend(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+
+ return omap_device_idle(pdev);
+}
+
+static int _od_runtime_resume(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+
+ return omap_device_enable(pdev);
+}
+
+static struct dev_power_domain omap_device_power_domain = {
+ .ops = {
+ .runtime_suspend = _od_runtime_suspend,
+ .runtime_resume = _od_runtime_resume,
+ USE_PLATFORM_PM_SLEEP_OPS
+ }
+};
+
/**
* omap_device_register - register an omap_device with one omap_hwmod
* @od: struct omap_device * to register
pr_debug("omap_device: %s: registering\n", od->pdev.name);
od->pdev.dev.parent = &omap_device_parent;
+ od->pdev.dev.pwr_domain = &omap_device_power_domain;
return platform_device_register(&od->pdev);
}
#include <linux/init.h>
#include <linux/irq.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/slab.h>
#include <mach/gpio.h>
}
#ifdef CONFIG_PM
-static int pxa_gpio_suspend(struct sys_device *dev, pm_message_t state)
+static int pxa_gpio_suspend(void)
{
struct pxa_gpio_chip *c;
int gpio;
return 0;
}
-static int pxa_gpio_resume(struct sys_device *dev)
+static void pxa_gpio_resume(void)
{
struct pxa_gpio_chip *c;
int gpio;
__raw_writel(c->saved_gfer, c->regbase + GFER_OFFSET);
__raw_writel(c->saved_gpdr, c->regbase + GPDR_OFFSET);
}
- return 0;
}
#else
#define pxa_gpio_suspend NULL
#define pxa_gpio_resume NULL
#endif
-struct sysdev_class pxa_gpio_sysclass = {
- .name = "gpio",
+struct syscore_ops pxa_gpio_syscore_ops = {
.suspend = pxa_gpio_suspend,
.resume = pxa_gpio_resume,
};
-
-static int __init pxa_gpio_init(void)
-{
- return sysdev_class_register(&pxa_gpio_sysclass);
-}
-
-core_initcall(pxa_gpio_init);
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/io.h>
-#include <linux/sysdev.h>
#include <plat/mfp.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/io.h>
EXPORT_SYMBOL(s3c2410_dma_getposition);
-static inline struct s3c2410_dma_chan *to_dma_chan(struct sys_device *dev)
-{
- return container_of(dev, struct s3c2410_dma_chan, dev);
-}
-
-/* system device class */
+/* system core operations */
#ifdef CONFIG_PM
-static int s3c2410_dma_suspend(struct sys_device *dev, pm_message_t state)
+static void s3c2410_dma_suspend_chan(s3c2410_dma_chan *cp)
{
- struct s3c2410_dma_chan *cp = to_dma_chan(dev);
-
printk(KERN_DEBUG "suspending dma channel %d\n", cp->number);
if (dma_rdreg(cp, S3C2410_DMA_DMASKTRIG) & S3C2410_DMASKTRIG_ON) {
s3c2410_dma_dostop(cp);
}
+}
+
+static int s3c2410_dma_suspend(void)
+{
+ struct s3c2410_dma_chan *cp = s3c2410_chans;
+ int channel;
+
+ for (channel = 0; channel < dma_channels; cp++, channel++)
+ s3c2410_dma_suspend_chan(cp);
return 0;
}
-static int s3c2410_dma_resume(struct sys_device *dev)
+static void s3c2410_dma_resume_chan(struct s3c2410_dma_chan *cp)
{
- struct s3c2410_dma_chan *cp = to_dma_chan(dev);
unsigned int no = cp->number | DMACH_LOW_LEVEL;
/* restore channel's hardware configuration */
return 0;
}
+static void s3c2410_dma_resume(void)
+{
+ struct s3c2410_dma_chan *cp = s3c2410_chans + dma_channels - 1;
+ int channel;
+
+ for (channel = dma_channels - 1; channel >= 0; cp++, channel--)
+ s3c2410_dma_resume_chan(cp);
+}
+
#else
#define s3c2410_dma_suspend NULL
#define s3c2410_dma_resume NULL
#endif /* CONFIG_PM */
-struct sysdev_class dma_sysclass = {
- .name = "s3c24xx-dma",
+struct syscore_ops dma_syscore_ops = {
.suspend = s3c2410_dma_suspend,
.resume = s3c2410_dma_resume,
};
/* initialisation code */
-static int __init s3c24xx_dma_sysclass_init(void)
+static int __init s3c24xx_dma_syscore_init(void)
{
- int ret = sysdev_class_register(&dma_sysclass);
-
- if (ret != 0)
- printk(KERN_ERR "dma sysclass registration failed\n");
-
- return ret;
-}
-
-core_initcall(s3c24xx_dma_sysclass_init);
-
-static int __init s3c24xx_dma_sysdev_register(void)
-{
- struct s3c2410_dma_chan *cp = s3c2410_chans;
- int channel, ret;
-
- for (channel = 0; channel < dma_channels; cp++, channel++) {
- cp->dev.cls = &dma_sysclass;
- cp->dev.id = channel;
- ret = sysdev_register(&cp->dev);
-
- if (ret) {
- printk(KERN_ERR "error registering dev for dma %d\n",
- channel);
- return ret;
- }
- }
+ register_syscore_ops(&dma_syscore_ops);
return 0;
}
-late_initcall(s3c24xx_dma_sysdev_register);
+late_initcall(s3c24xx_dma_syscore_init);
int __init s3c24xx_dma_init(unsigned int channels, unsigned int irq,
unsigned int stride)
#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
#include <linux/irq.h>
#include <plat/cpu.h>
static unsigned long save_eintflt[4];
static unsigned long save_eintmask;
-int s3c24xx_irq_suspend(struct sys_device *dev, pm_message_t state)
+int s3c24xx_irq_suspend(void)
{
unsigned int i;
return 0;
}
-int s3c24xx_irq_resume(struct sys_device *dev)
+void s3c24xx_irq_resume(void)
{
unsigned int i;
s3c_pm_do_restore(irq_save, ARRAY_SIZE(irq_save));
__raw_writel(save_eintmask, S3C24XX_EINTMASK);
-
- return 0;
}
#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
#include <plat/cpu.h>
#include <plat/irqs.h>
SAVE_ITEM(S5P_EINT_MASK(3)),
};
-int s3c24xx_irq_suspend(struct sys_device *dev, pm_message_t state)
+int s3c24xx_irq_suspend(void)
{
s3c_pm_do_save(eint_save, ARRAY_SIZE(eint_save));
return 0;
}
-int s3c24xx_irq_resume(struct sys_device *dev)
+void s3c24xx_irq_resume(void)
{
s3c_pm_do_restore(eint_save, ARRAY_SIZE(eint_save));
-
- return 0;
}
struct sys_timer;
extern struct sys_timer s3c24xx_timer;
+extern struct syscore_ops s3c2410_pm_syscore_ops;
+extern struct syscore_ops s3c2412_pm_syscore_ops;
+extern struct syscore_ops s3c2416_pm_syscore_ops;
+extern struct syscore_ops s3c244x_pm_syscore_ops;
+extern struct syscore_ops s3c64xx_irq_syscore_ops;
+
/* system device classes */
extern struct sysdev_class s3c2410_sysclass;
#ifdef CONFIG_PM
extern int s3c_irqext_wake(struct irq_data *data, unsigned int state);
-extern int s3c24xx_irq_suspend(struct sys_device *dev, pm_message_t state);
-extern int s3c24xx_irq_resume(struct sys_device *dev);
+extern int s3c24xx_irq_suspend(void);
+extern void s3c24xx_irq_resume(void);
#else
#define s3c_irqext_wake NULL
#define s3c24xx_irq_suspend NULL
#define s3c24xx_irq_resume NULL
#endif
+extern struct syscore_ops s3c24xx_irq_syscore_ops;
+
/* PM debug functions */
#ifdef CONFIG_SAMSUNG_PM_DEBUG
}
#ifdef CONFIG_PM
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
-static int vfp_pm_suspend(struct sys_device *dev, pm_message_t state)
+static int vfp_pm_suspend(void)
{
struct thread_info *ti = current_thread_info();
u32 fpexc = fmrx(FPEXC);
return 0;
}
-static int vfp_pm_resume(struct sys_device *dev)
+static void vfp_pm_resume(void)
{
/* ensure we have access to the vfp */
vfp_enable(NULL);
/* and disable it to ensure the next usage restores the state */
fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
-
- return 0;
}
-static struct sysdev_class vfp_pm_sysclass = {
- .name = "vfp",
+static struct syscore_ops vfp_pm_syscore_ops = {
.suspend = vfp_pm_suspend,
.resume = vfp_pm_resume,
};
-static struct sys_device vfp_pm_sysdev = {
- .cls = &vfp_pm_sysclass,
-};
-
static void vfp_pm_init(void)
{
- sysdev_class_register(&vfp_pm_sysclass);
- sysdev_register(&vfp_pm_sysdev);
+ register_syscore_ops(&vfp_pm_syscore_ops);
}
-
#else
static inline void vfp_pm_init(void) { }
#endif /* CONFIG_PM */
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/platform_device.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/io.h>
struct intc {
void __iomem *regs;
struct irq_chip chip;
- struct sys_device sysdev;
#ifdef CONFIG_PM
unsigned long suspend_ipr;
unsigned long saved_ipr[64];
intc0.suspend_ipr = offset;
}
-static int intc_suspend(struct sys_device *sdev, pm_message_t state)
+static int intc_suspend(void)
{
- struct intc *intc = container_of(sdev, struct intc, sysdev);
int i;
if (unlikely(!irqs_disabled())) {
return -EINVAL;
}
- if (unlikely(!intc->suspend_ipr)) {
+ if (unlikely(!intc0.suspend_ipr)) {
pr_err("intc_suspend: suspend_ipr not initialized\n");
return -EINVAL;
}
for (i = 0; i < 64; i++) {
- intc->saved_ipr[i] = intc_readl(intc, INTPR0 + 4 * i);
- intc_writel(intc, INTPR0 + 4 * i, intc->suspend_ipr);
+ intc0.saved_ipr[i] = intc_readl(&intc0, INTPR0 + 4 * i);
+ intc_writel(&intc0, INTPR0 + 4 * i, intc0.suspend_ipr);
}
return 0;
}
-static int intc_resume(struct sys_device *sdev)
+static int intc_resume(void)
{
- struct intc *intc = container_of(sdev, struct intc, sysdev);
int i;
- WARN_ON(!irqs_disabled());
-
for (i = 0; i < 64; i++)
- intc_writel(intc, INTPR0 + 4 * i, intc->saved_ipr[i]);
+ intc_writel(&intc0, INTPR0 + 4 * i, intc0.saved_ipr[i]);
return 0;
}
#define intc_resume NULL
#endif
-static struct sysdev_class intc_class = {
- .name = "intc",
+static struct syscore_ops intc_syscore_ops = {
.suspend = intc_suspend,
.resume = intc_resume,
};
-static int __init intc_init_sysdev(void)
+static int __init intc_init_syscore(void)
{
- int ret;
-
- ret = sysdev_class_register(&intc_class);
- if (ret)
- return ret;
+ register_syscore_ops(&intc_syscore_ops);
- intc0.sysdev.id = 0;
- intc0.sysdev.cls = &intc_class;
- ret = sysdev_register(&intc0.sysdev);
-
- return ret;
+ return 0;
}
-device_initcall(intc_init_sysdev);
+device_initcall(intc_init_syscore);
unsigned long intc_get_pending(unsigned int group)
{
#include <linux/bitops.h>
#include <linux/hardirq.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/pm.h>
#include <linux/nmi.h>
#include <linux/smp.h>
/* Suspend/resume support */
#ifdef CONFIG_PM
-static int nmi_wdt_suspend(struct sys_device *dev, pm_message_t state)
+static int nmi_wdt_suspend(void)
{
nmi_wdt_stop();
return 0;
}
-static int nmi_wdt_resume(struct sys_device *dev)
+static void nmi_wdt_resume(void)
{
if (nmi_active)
nmi_wdt_start();
- return 0;
}
-static struct sysdev_class nmi_sysclass = {
- .name = DRV_NAME,
+static struct syscore_ops nmi_syscore_ops = {
.resume = nmi_wdt_resume,
.suspend = nmi_wdt_suspend,
};
-static struct sys_device device_nmi_wdt = {
- .id = 0,
- .cls = &nmi_sysclass,
-};
-
-static int __init init_nmi_wdt_sysfs(void)
+static int __init init_nmi_wdt_syscore(void)
{
- int error;
-
- if (!nmi_active)
- return 0;
+ if (nmi_active)
+ register_syscore_ops(&nmi_syscore_ops);
- error = sysdev_class_register(&nmi_sysclass);
- if (!error)
- error = sysdev_register(&device_nmi_wdt);
- return error;
+ return 0;
}
-late_initcall(init_nmi_wdt_sysfs);
+late_initcall(init_nmi_wdt_syscore);
#endif /* CONFIG_PM */
#include <asm/gptimers.h>
#include <asm/nmi.h>
-/* Accelerators for sched_clock()
- * convert from cycles(64bits) => nanoseconds (64bits)
- * basic equation:
- * ns = cycles / (freq / ns_per_sec)
- * ns = cycles * (ns_per_sec / freq)
- * ns = cycles * (10^9 / (cpu_khz * 10^3))
- * ns = cycles * (10^6 / cpu_khz)
- *
- * Then we use scaling math (suggested by george@mvista.com) to get:
- * ns = cycles * (10^6 * SC / cpu_khz) / SC
- * ns = cycles * cyc2ns_scale / SC
- *
- * And since SC is a constant power of two, we can convert the div
- * into a shift.
- *
- * We can use khz divisor instead of mhz to keep a better precision, since
- * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits.
- * (mathieu.desnoyers@polymtl.ca)
- *
- * -johnstul@us.ibm.com "math is hard, lets go shopping!"
- */
-
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
#if defined(CONFIG_CYCLES_CLOCKSOURCE)
.rating = 400,
.read = bfin_read_cycles,
.mask = CLOCKSOURCE_MASK(64),
- .shift = CYC2NS_SCALE_FACTOR,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
static int __init bfin_cs_cycles_init(void)
{
- bfin_cs_cycles.mult = \
- clocksource_hz2mult(get_cclk(), bfin_cs_cycles.shift);
-
- if (clocksource_register(&bfin_cs_cycles))
+ if (clocksource_register_hz(&bfin_cs_cycles, get_cclk()))
panic("failed to register clocksource");
return 0;
.rating = 350,
.read = bfin_read_gptimer0,
.mask = CLOCKSOURCE_MASK(32),
- .shift = CYC2NS_SCALE_FACTOR,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
{
setup_gptimer0();
- bfin_cs_gptimer0.mult = \
- clocksource_hz2mult(get_sclk(), bfin_cs_gptimer0.shift);
-
- if (clocksource_register(&bfin_cs_gptimer0))
+ if (clocksource_register_hz(&bfin_cs_gptimer0, get_sclk()))
panic("failed to register clocksource");
return 0;
#define DRIVER_NAME "bfin dpmc"
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, DRIVER_NAME, msg)
-
struct bfin_dpmc_platform_data *pdata;
/**
while (msg_queue->count) {
msg = &msg_queue->ipi_message[msg_queue->head];
switch (msg->type) {
+ case BFIN_IPI_RESCHEDULE:
+ scheduler_ipi();
+ break;
case BFIN_IPI_CALL_FUNC:
spin_unlock_irqrestore(&msg_queue->lock, flags);
ipi_call_function(cpu, msg);
ipi = REG_RD(intr_vect, irq_regs[smp_processor_id()], rw_ipi);
+ if (ipi.vector & IPI_SCHEDULE) {
+ scheduler_ipi();
+ }
if (ipi.vector & IPI_CALL) {
- func(info);
+ func(info);
}
if (ipi.vector & IPI_FLUSH_TLB) {
- if (flush_mm == FLUSH_ALL)
- __flush_tlb_all();
- else if (flush_vma == FLUSH_ALL)
+ if (flush_mm == FLUSH_ALL)
+ __flush_tlb_all();
+ else if (flush_vma == FLUSH_ALL)
__flush_tlb_mm(flush_mm);
- else
+ else
__flush_tlb_page(flush_vma, flush_addr);
}
#include <linux/acpi.h>
#include <acpi/processor.h>
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "acpi-cpufreq", msg)
-
MODULE_AUTHOR("Venkatesh Pallipadi");
MODULE_DESCRIPTION("ACPI Processor P-States Driver");
MODULE_LICENSE("GPL");
{
s64 retval;
- dprintk("processor_set_pstate\n");
+ pr_debug("processor_set_pstate\n");
retval = ia64_pal_set_pstate((u64)value);
if (retval) {
- dprintk("Failed to set freq to 0x%x, with error 0x%lx\n",
+ pr_debug("Failed to set freq to 0x%x, with error 0x%lx\n",
value, retval);
return -ENODEV;
}
u64 pstate_index = 0;
s64 retval;
- dprintk("processor_get_pstate\n");
+ pr_debug("processor_get_pstate\n");
retval = ia64_pal_get_pstate(&pstate_index,
PAL_GET_PSTATE_TYPE_INSTANT);
*value = (u32) pstate_index;
if (retval)
- dprintk("Failed to get current freq with "
+ pr_debug("Failed to get current freq with "
"error 0x%lx, idx 0x%x\n", retval, *value);
return (int)retval;
{
unsigned long i;
- dprintk("extract_clock\n");
+ pr_debug("extract_clock\n");
for (i = 0; i < data->acpi_data.state_count; i++) {
if (value == data->acpi_data.states[i].status)
cpumask_t saved_mask;
unsigned long clock_freq;
- dprintk("processor_get_freq\n");
+ pr_debug("processor_get_freq\n");
saved_mask = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask_of(cpu));
cpumask_t saved_mask;
int retval;
- dprintk("processor_set_freq\n");
+ pr_debug("processor_set_freq\n");
saved_mask = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask_of(cpu));
if (state == data->acpi_data.state) {
if (unlikely(data->resume)) {
- dprintk("Called after resume, resetting to P%d\n", state);
+ pr_debug("Called after resume, resetting to P%d\n", state);
data->resume = 0;
} else {
- dprintk("Already at target state (P%d)\n", state);
+ pr_debug("Already at target state (P%d)\n", state);
retval = 0;
goto migrate_end;
}
}
- dprintk("Transitioning from P%d to P%d\n",
+ pr_debug("Transitioning from P%d to P%d\n",
data->acpi_data.state, state);
/* cpufreq frequency struct */
value = (u32) data->acpi_data.states[state].control;
- dprintk("Transitioning to state: 0x%08x\n", value);
+ pr_debug("Transitioning to state: 0x%08x\n", value);
ret = processor_set_pstate(value);
if (ret) {
{
struct cpufreq_acpi_io *data = acpi_io_data[cpu];
- dprintk("acpi_cpufreq_get\n");
+ pr_debug("acpi_cpufreq_get\n");
return processor_get_freq(data, cpu);
}
unsigned int next_state = 0;
unsigned int result = 0;
- dprintk("acpi_cpufreq_setpolicy\n");
+ pr_debug("acpi_cpufreq_setpolicy\n");
result = cpufreq_frequency_table_target(policy,
data->freq_table, target_freq, relation, &next_state);
unsigned int result = 0;
struct cpufreq_acpi_io *data = acpi_io_data[policy->cpu];
- dprintk("acpi_cpufreq_verify\n");
+ pr_debug("acpi_cpufreq_verify\n");
result = cpufreq_frequency_table_verify(policy,
data->freq_table);
struct cpufreq_acpi_io *data;
unsigned int result = 0;
- dprintk("acpi_cpufreq_cpu_init\n");
+ pr_debug("acpi_cpufreq_cpu_init\n");
data = kzalloc(sizeof(struct cpufreq_acpi_io), GFP_KERNEL);
if (!data)
/* capability check */
if (data->acpi_data.state_count <= 1) {
- dprintk("No P-States\n");
+ pr_debug("No P-States\n");
result = -ENODEV;
goto err_unreg;
}
ACPI_ADR_SPACE_FIXED_HARDWARE) ||
(data->acpi_data.status_register.space_id !=
ACPI_ADR_SPACE_FIXED_HARDWARE)) {
- dprintk("Unsupported address space [%d, %d]\n",
+ pr_debug("Unsupported address space [%d, %d]\n",
(u32) (data->acpi_data.control_register.space_id),
(u32) (data->acpi_data.status_register.space_id));
result = -ENODEV;
"activated.\n", cpu);
for (i = 0; i < data->acpi_data.state_count; i++)
- dprintk(" %cP%d: %d MHz, %d mW, %d uS, %d uS, 0x%x 0x%x\n",
+ pr_debug(" %cP%d: %d MHz, %d mW, %d uS, %d uS, 0x%x 0x%x\n",
(i == data->acpi_data.state?'*':' '), i,
(u32) data->acpi_data.states[i].core_frequency,
(u32) data->acpi_data.states[i].power,
{
struct cpufreq_acpi_io *data = acpi_io_data[policy->cpu];
- dprintk("acpi_cpufreq_cpu_exit\n");
+ pr_debug("acpi_cpufreq_cpu_exit\n");
if (data) {
cpufreq_frequency_table_put_attr(policy->cpu);
static int __init
acpi_cpufreq_init (void)
{
- dprintk("acpi_cpufreq_init\n");
+ pr_debug("acpi_cpufreq_init\n");
return cpufreq_register_driver(&acpi_cpufreq_driver);
}
static void __exit
acpi_cpufreq_exit (void)
{
- dprintk("acpi_cpufreq_exit\n");
+ pr_debug("acpi_cpufreq_exit\n");
cpufreq_unregister_driver(&acpi_cpufreq_driver);
return;
.rating = 300,
.read = read_cyclone,
.mask = (1LL << 40) - 1,
- .mult = 0, /*to be calculated*/
- .shift = 16,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
/* initialize last tick */
cyclone_mc = cyclone_timer;
clocksource_cyclone.fsys_mmio = cyclone_timer;
- clocksource_cyclone.mult = clocksource_hz2mult(CYCLONE_TIMER_FREQ,
- clocksource_cyclone.shift);
- clocksource_register(&clocksource_cyclone);
+ clocksource_register_hz(&clocksource_cyclone, CYCLONE_TIMER_FREQ);
return 0;
}
#include <linux/irq.h>
#include <linux/ratelimit.h>
#include <linux/acpi.h>
+#include <linux/sched.h>
#include <asm/delay.h>
#include <asm/intrinsics.h>
smp_local_flush_tlb();
kstat_incr_irqs_this_cpu(irq, desc);
} else if (unlikely(IS_RESCHEDULE(vector))) {
+ scheduler_ipi();
kstat_incr_irqs_this_cpu(irq, desc);
} else {
ia64_setreg(_IA64_REG_CR_TPR, vector);
.rating = 350,
.read = itc_get_cycles,
.mask = CLOCKSOURCE_MASK(64),
- .mult = 0, /*to be calculated*/
- .shift = 16,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
#ifdef CONFIG_PARAVIRT
.resume = paravirt_clocksource_resume,
ia64_cpu_local_tick();
if (!itc_clocksource) {
- /* Sort out mult/shift values: */
- clocksource_itc.mult =
- clocksource_hz2mult(local_cpu_data->itc_freq,
- clocksource_itc.shift);
- clocksource_register(&clocksource_itc);
+ clocksource_register_hz(&clocksource_itc,
+ local_cpu_data->itc_freq);
itc_clocksource = &clocksource_itc;
}
}
.rating = 450,
.read = read_sn2,
.mask = (1LL << 55) - 1,
- .mult = 0,
- .shift = 10,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
void __init sn_timer_init(void)
{
clocksource_sn2.fsys_mmio = RTC_COUNTER_ADDR;
- clocksource_sn2.mult = clocksource_hz2mult(sn_rtc_cycles_per_second,
- clocksource_sn2.shift);
- clocksource_register(&clocksource_sn2);
+ clocksource_register_hz(&clocksource_sn2, sn_rtc_cycles_per_second);
ia64_udelay = &ia64_sn_udelay;
}
static int xen_slab_ready;
#ifdef CONFIG_SMP
+#include <linux/sched.h>
+
/* Dummy stub. Though we may check XEN_RESCHEDULE_VECTOR before __do_IRQ,
* it ends up to issue several memory accesses upon percpu data and
* thus adds unnecessary traffic to other paths.
static irqreturn_t
xen_dummy_handler(int irq, void *dev_id)
{
+ return IRQ_HANDLED;
+}
+static irqreturn_t
+xen_resched_handler(int irq, void *dev_id)
+{
+ scheduler_ipi();
return IRQ_HANDLED;
}
};
static struct irqaction xen_resched_irqaction = {
- .handler = xen_dummy_handler,
+ .handler = xen_resched_handler,
.flags = IRQF_DISABLED,
.name = "resched"
};
*
* Description: This routine executes on CPU which received
* 'RESCHEDULE_IPI'.
- * Rescheduling is processed at the exit of interrupt
- * operation.
*
* Born on Date: 2002.02.05
*
*==========================================================================*/
void smp_reschedule_interrupt(void)
{
- /* nothing to do */
+ scheduler_ipi();
}
/*==========================================================================*
/* Hook for MIDI serial driver */
void (*atari_MIDI_interrupt_hook) (void);
-/* Hook for mouse driver */
-void (*atari_mouse_interrupt_hook) (char *);
/* Hook for keyboard inputdev driver */
void (*atari_input_keyboard_interrupt_hook) (unsigned char, char);
/* Hook for mouse inputdev driver */
void (*atari_input_mouse_interrupt_hook) (char *);
-EXPORT_SYMBOL(atari_mouse_interrupt_hook);
EXPORT_SYMBOL(atari_input_keyboard_interrupt_hook);
EXPORT_SYMBOL(atari_input_mouse_interrupt_hook);
kb_state.buf[kb_state.len++] = scancode;
if (kb_state.len == 3) {
kb_state.state = KEYBOARD;
- if (atari_mouse_interrupt_hook)
- atari_mouse_interrupt_hook(kb_state.buf);
+ if (atari_input_mouse_interrupt_hook)
+ atari_input_mouse_interrupt_hook(kb_state.buf);
}
break;
kb_state.len = 0;
error = request_irq(IRQ_MFP_ACIA, atari_keyboard_interrupt,
- IRQ_TYPE_SLOW, "keyboard/mouse/MIDI",
+ IRQ_TYPE_SLOW, "keyboard,mouse,MIDI",
atari_keyboard_interrupt);
if (error)
return error;
{
stdma_isr = NULL;
if (request_irq(IRQ_MFP_FDC, stdma_int, IRQ_TYPE_SLOW | IRQF_SHARED,
- "ST-DMA: floppy/ACSI/IDE/Falcon-SCSI", stdma_int))
+ "ST-DMA floppy,ACSI,IDE,Falcon-SCSI", stdma_int))
pr_err("Couldn't register ST-DMA interrupt\n");
}
/* Hook for MIDI serial driver */
extern void (*atari_MIDI_interrupt_hook) (void);
-/* Hook for mouse driver */
-extern void (*atari_mouse_interrupt_hook) (char *);
/* Hook for keyboard inputdev driver */
extern void (*atari_input_keyboard_interrupt_hook) (unsigned char, char);
/* Hook for mouse inputdev driver */
{
const unsigned long *p = vaddr;
int res = 32;
+ unsigned int words;
unsigned long num;
if (!size)
return 0;
- size = (size + 31) >> 5;
+ words = (size + 31) >> 5;
while (!(num = ~*p++)) {
- if (!--size)
+ if (!--words)
goto out;
}
: "=d" (res) : "d" (num & -num));
res ^= 31;
out:
- return ((long)p - (long)vaddr - 4) * 8 + res;
+ res += ((long)p - (long)vaddr - 4) * 8;
+ return res < size ? res : size;
}
static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
/* Look for zero in first longword */
__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
: "=d" (res) : "d" (num & -num));
- if (res < 32)
- return offset + (res ^ 31);
+ if (res < 32) {
+ offset += res ^ 31;
+ return offset < size ? offset : size;
+ }
offset += 32;
+
+ if (offset >= size)
+ return size;
}
/* No zero yet, search remaining full bytes for a zero */
- res = find_first_zero_bit(p, size - ((long)p - (long)vaddr) * 8);
- return offset + res;
+ return offset + find_first_zero_bit(p, size - offset);
}
static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
{
const unsigned long *p = vaddr;
int res = 32;
+ unsigned int words;
unsigned long num;
if (!size)
return 0;
- size = (size + 31) >> 5;
+ words = (size + 31) >> 5;
while (!(num = *p++)) {
- if (!--size)
+ if (!--words)
goto out;
}
: "=d" (res) : "d" (num & -num));
res ^= 31;
out:
- return ((long)p - (long)vaddr - 4) * 8 + res;
+ res += ((long)p - (long)vaddr - 4) * 8;
+ return res < size ? res : size;
}
static inline int find_next_bit(const unsigned long *vaddr, int size,
/* Look for one in first longword */
__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
: "=d" (res) : "d" (num & -num));
- if (res < 32)
- return offset + (res ^ 31);
+ if (res < 32) {
+ offset += res ^ 31;
+ return offset < size ? offset : size;
+ }
offset += 32;
+
+ if (offset >= size)
+ return size;
}
/* No one yet, search remaining full bytes for a one */
- res = find_first_bit(p, size - ((long)p - (long)vaddr) * 8);
- return offset + res;
+ return offset + find_first_bit(p, size - offset);
}
/*
static inline int find_first_zero_bit_le(const void *vaddr, unsigned size)
{
const unsigned long *p = vaddr, *addr = vaddr;
- int res;
+ int res = 0;
+ unsigned int words;
if (!size)
return 0;
- size = (size >> 5) + ((size & 31) > 0);
- while (*p++ == ~0UL)
- {
- if (--size == 0)
- return (p - addr) << 5;
+ words = (size >> 5) + ((size & 31) > 0);
+ while (*p++ == ~0UL) {
+ if (--words == 0)
+ goto out;
}
--p;
for (res = 0; res < 32; res++)
if (!test_bit_le(res, p))
break;
- return (p - addr) * 32 + res;
+out:
+ res += (p - addr) * 32;
+ return res < size ? res : size;
}
static inline unsigned long find_next_zero_bit_le(const void *addr,
offset -= bit;
/* Look for zero in first longword */
for (res = bit; res < 32; res++)
- if (!test_bit_le(res, p))
- return offset + res;
+ if (!test_bit_le(res, p)) {
+ offset += res;
+ return offset < size ? offset : size;
+ }
p++;
offset += 32;
+
+ if (offset >= size)
+ return size;
}
/* No zero yet, search remaining full bytes for a zero */
return offset + find_first_zero_bit_le(p, size - offset);
static inline int find_first_bit_le(const void *vaddr, unsigned size)
{
const unsigned long *p = vaddr, *addr = vaddr;
- int res;
+ int res = 0;
+ unsigned int words;
if (!size)
return 0;
- size = (size >> 5) + ((size & 31) > 0);
+ words = (size >> 5) + ((size & 31) > 0);
while (*p++ == 0UL) {
- if (--size == 0)
- return (p - addr) << 5;
+ if (--words == 0)
+ goto out;
}
--p;
for (res = 0; res < 32; res++)
if (test_bit_le(res, p))
break;
- return (p - addr) * 32 + res;
+out:
+ res += (p - addr) * 32;
+ return res < size ? res : size;
}
static inline unsigned long find_next_bit_le(const void *addr,
offset -= bit;
/* Look for one in first longword */
for (res = bit; res < 32; res++)
- if (test_bit_le(res, p))
- return offset + res;
+ if (test_bit_le(res, p)) {
+ offset += res;
+ return offset < size ? offset : size;
+ }
p++;
offset += 32;
+
+ if (offset >= size)
+ return size;
}
/* No set bit yet, search remaining full bytes for a set bit */
return offset + find_first_bit_le(p, size - offset);
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_chown 16
-#define __NR_break 17
+/*#define __NR_break 17*/
#define __NR_oldstat 18
#define __NR_lseek 19
#define __NR_getpid 20
#define __NR_oldfstat 28
#define __NR_pause 29
#define __NR_utime 30
-#define __NR_stty 31
-#define __NR_gtty 32
+/*#define __NR_stty 31*/
+/*#define __NR_gtty 32*/
#define __NR_access 33
#define __NR_nice 34
-#define __NR_ftime 35
+/*#define __NR_ftime 35*/
#define __NR_sync 36
#define __NR_kill 37
#define __NR_rename 38
#define __NR_dup 41
#define __NR_pipe 42
#define __NR_times 43
-#define __NR_prof 44
+/*#define __NR_prof 44*/
#define __NR_brk 45
#define __NR_setgid 46
#define __NR_getgid 47
#define __NR_getegid 50
#define __NR_acct 51
#define __NR_umount2 52
-#define __NR_lock 53
+/*#define __NR_lock 53*/
#define __NR_ioctl 54
#define __NR_fcntl 55
-#define __NR_mpx 56
+/*#define __NR_mpx 56*/
#define __NR_setpgid 57
-#define __NR_ulimit 58
-#define __NR_oldolduname 59
+/*#define __NR_ulimit 58*/
+/*#define __NR_oldolduname 59*/
#define __NR_umask 60
#define __NR_chroot 61
#define __NR_ustat 62
#define __NR_fchown 95
#define __NR_getpriority 96
#define __NR_setpriority 97
-#define __NR_profil 98
+/*#define __NR_profil 98*/
#define __NR_statfs 99
#define __NR_fstatfs 100
-#define __NR_ioperm 101
+/*#define __NR_ioperm 101*/
#define __NR_socketcall 102
#define __NR_syslog 103
#define __NR_setitimer 104
#define __NR_stat 106
#define __NR_lstat 107
#define __NR_fstat 108
-#define __NR_olduname 109
-#define __NR_iopl /* 110 */ not supported
+/*#define __NR_olduname 109*/
+/*#define __NR_iopl 110*/ /* not supported */
#define __NR_vhangup 111
-#define __NR_idle /* 112 */ Obsolete
-#define __NR_vm86 /* 113 */ not supported
+/*#define __NR_idle 112*/ /* Obsolete */
+/*#define __NR_vm86 113*/ /* not supported */
#define __NR_wait4 114
#define __NR_swapoff 115
#define __NR_sysinfo 116
#define __NR_adjtimex 124
#define __NR_mprotect 125
#define __NR_sigprocmask 126
-#define __NR_create_module 127
+/*#define __NR_create_module 127*/
#define __NR_init_module 128
#define __NR_delete_module 129
-#define __NR_get_kernel_syms 130
+/*#define __NR_get_kernel_syms 130*/
#define __NR_quotactl 131
#define __NR_getpgid 132
#define __NR_fchdir 133
#define __NR_bdflush 134
#define __NR_sysfs 135
#define __NR_personality 136
-#define __NR_afs_syscall 137 /* Syscall for Andrew File System */
+/*#define __NR_afs_syscall 137*/ /* Syscall for Andrew File System */
#define __NR_setfsuid 138
#define __NR_setfsgid 139
#define __NR__llseek 140
#define __NR_setresuid 164
#define __NR_getresuid 165
#define __NR_getpagesize 166
-#define __NR_query_module 167
+/*#define __NR_query_module 167*/
#define __NR_poll 168
#define __NR_nfsservctl 169
#define __NR_setresgid 170
#define __NR_capset 185
#define __NR_sigaltstack 186
#define __NR_sendfile 187
-#define __NR_getpmsg 188 /* some people actually want streams */
-#define __NR_putpmsg 189 /* some people actually want streams */
+/*#define __NR_getpmsg 188*/ /* some people actually want streams */
+/*#define __NR_putpmsg 189*/ /* some people actually want streams */
#define __NR_vfork 190
#define __NR_ugetrlimit 191
#define __NR_mmap2 192
#define __NR_setfsuid32 215
#define __NR_setfsgid32 216
#define __NR_pivot_root 217
+/* 218*/
+/* 219*/
#define __NR_getdents64 220
#define __NR_gettid 221
#define __NR_tkill 222
#define __NR_mq_notify 275
#define __NR_mq_getsetattr 276
#define __NR_waitid 277
-#define __NR_vserver 278
+/*#define __NR_vserver 278*/
#define __NR_add_key 279
#define __NR_request_key 280
#define __NR_keyctl 281
extra-y += vmlinux.lds
obj-y := entry.o process.o traps.o ints.o signal.o ptrace.o module.o \
- sys_m68k.o time.o setup.o m68k_ksyms.o devres.o
+ sys_m68k.o time.o setup.o m68k_ksyms.o devres.o syscalltable.o
devres-y = ../../../kernel/irq/devres.o
rts
-.data
-ALIGN
-sys_call_table:
- .long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */
- .long sys_exit
- .long sys_fork
- .long sys_read
- .long sys_write
- .long sys_open /* 5 */
- .long sys_close
- .long sys_waitpid
- .long sys_creat
- .long sys_link
- .long sys_unlink /* 10 */
- .long sys_execve
- .long sys_chdir
- .long sys_time
- .long sys_mknod
- .long sys_chmod /* 15 */
- .long sys_chown16
- .long sys_ni_syscall /* old break syscall holder */
- .long sys_stat
- .long sys_lseek
- .long sys_getpid /* 20 */
- .long sys_mount
- .long sys_oldumount
- .long sys_setuid16
- .long sys_getuid16
- .long sys_stime /* 25 */
- .long sys_ptrace
- .long sys_alarm
- .long sys_fstat
- .long sys_pause
- .long sys_utime /* 30 */
- .long sys_ni_syscall /* old stty syscall holder */
- .long sys_ni_syscall /* old gtty syscall holder */
- .long sys_access
- .long sys_nice
- .long sys_ni_syscall /* 35 */ /* old ftime syscall holder */
- .long sys_sync
- .long sys_kill
- .long sys_rename
- .long sys_mkdir
- .long sys_rmdir /* 40 */
- .long sys_dup
- .long sys_pipe
- .long sys_times
- .long sys_ni_syscall /* old prof syscall holder */
- .long sys_brk /* 45 */
- .long sys_setgid16
- .long sys_getgid16
- .long sys_signal
- .long sys_geteuid16
- .long sys_getegid16 /* 50 */
- .long sys_acct
- .long sys_umount /* recycled never used phys() */
- .long sys_ni_syscall /* old lock syscall holder */
- .long sys_ioctl
- .long sys_fcntl /* 55 */
- .long sys_ni_syscall /* old mpx syscall holder */
- .long sys_setpgid
- .long sys_ni_syscall /* old ulimit syscall holder */
- .long sys_ni_syscall
- .long sys_umask /* 60 */
- .long sys_chroot
- .long sys_ustat
- .long sys_dup2
- .long sys_getppid
- .long sys_getpgrp /* 65 */
- .long sys_setsid
- .long sys_sigaction
- .long sys_sgetmask
- .long sys_ssetmask
- .long sys_setreuid16 /* 70 */
- .long sys_setregid16
- .long sys_sigsuspend
- .long sys_sigpending
- .long sys_sethostname
- .long sys_setrlimit /* 75 */
- .long sys_old_getrlimit
- .long sys_getrusage
- .long sys_gettimeofday
- .long sys_settimeofday
- .long sys_getgroups16 /* 80 */
- .long sys_setgroups16
- .long sys_old_select
- .long sys_symlink
- .long sys_lstat
- .long sys_readlink /* 85 */
- .long sys_uselib
- .long sys_swapon
- .long sys_reboot
- .long sys_old_readdir
- .long sys_old_mmap /* 90 */
- .long sys_munmap
- .long sys_truncate
- .long sys_ftruncate
- .long sys_fchmod
- .long sys_fchown16 /* 95 */
- .long sys_getpriority
- .long sys_setpriority
- .long sys_ni_syscall /* old profil syscall holder */
- .long sys_statfs
- .long sys_fstatfs /* 100 */
- .long sys_ni_syscall /* ioperm for i386 */
- .long sys_socketcall
- .long sys_syslog
- .long sys_setitimer
- .long sys_getitimer /* 105 */
- .long sys_newstat
- .long sys_newlstat
- .long sys_newfstat
- .long sys_ni_syscall
- .long sys_ni_syscall /* 110 */ /* iopl for i386 */
- .long sys_vhangup
- .long sys_ni_syscall /* obsolete idle() syscall */
- .long sys_ni_syscall /* vm86old for i386 */
- .long sys_wait4
- .long sys_swapoff /* 115 */
- .long sys_sysinfo
- .long sys_ipc
- .long sys_fsync
- .long sys_sigreturn
- .long sys_clone /* 120 */
- .long sys_setdomainname
- .long sys_newuname
- .long sys_cacheflush /* modify_ldt for i386 */
- .long sys_adjtimex
- .long sys_mprotect /* 125 */
- .long sys_sigprocmask
- .long sys_ni_syscall /* old "create_module" */
- .long sys_init_module
- .long sys_delete_module
- .long sys_ni_syscall /* 130 - old "get_kernel_syms" */
- .long sys_quotactl
- .long sys_getpgid
- .long sys_fchdir
- .long sys_bdflush
- .long sys_sysfs /* 135 */
- .long sys_personality
- .long sys_ni_syscall /* for afs_syscall */
- .long sys_setfsuid16
- .long sys_setfsgid16
- .long sys_llseek /* 140 */
- .long sys_getdents
- .long sys_select
- .long sys_flock
- .long sys_msync
- .long sys_readv /* 145 */
- .long sys_writev
- .long sys_getsid
- .long sys_fdatasync
- .long sys_sysctl
- .long sys_mlock /* 150 */
- .long sys_munlock
- .long sys_mlockall
- .long sys_munlockall
- .long sys_sched_setparam
- .long sys_sched_getparam /* 155 */
- .long sys_sched_setscheduler
- .long sys_sched_getscheduler
- .long sys_sched_yield
- .long sys_sched_get_priority_max
- .long sys_sched_get_priority_min /* 160 */
- .long sys_sched_rr_get_interval
- .long sys_nanosleep
- .long sys_mremap
- .long sys_setresuid16
- .long sys_getresuid16 /* 165 */
- .long sys_getpagesize
- .long sys_ni_syscall /* old sys_query_module */
- .long sys_poll
- .long sys_nfsservctl
- .long sys_setresgid16 /* 170 */
- .long sys_getresgid16
- .long sys_prctl
- .long sys_rt_sigreturn
- .long sys_rt_sigaction
- .long sys_rt_sigprocmask /* 175 */
- .long sys_rt_sigpending
- .long sys_rt_sigtimedwait
- .long sys_rt_sigqueueinfo
- .long sys_rt_sigsuspend
- .long sys_pread64 /* 180 */
- .long sys_pwrite64
- .long sys_lchown16;
- .long sys_getcwd
- .long sys_capget
- .long sys_capset /* 185 */
- .long sys_sigaltstack
- .long sys_sendfile
- .long sys_ni_syscall /* streams1 */
- .long sys_ni_syscall /* streams2 */
- .long sys_vfork /* 190 */
- .long sys_getrlimit
- .long sys_mmap2
- .long sys_truncate64
- .long sys_ftruncate64
- .long sys_stat64 /* 195 */
- .long sys_lstat64
- .long sys_fstat64
- .long sys_chown
- .long sys_getuid
- .long sys_getgid /* 200 */
- .long sys_geteuid
- .long sys_getegid
- .long sys_setreuid
- .long sys_setregid
- .long sys_getgroups /* 205 */
- .long sys_setgroups
- .long sys_fchown
- .long sys_setresuid
- .long sys_getresuid
- .long sys_setresgid /* 210 */
- .long sys_getresgid
- .long sys_lchown
- .long sys_setuid
- .long sys_setgid
- .long sys_setfsuid /* 215 */
- .long sys_setfsgid
- .long sys_pivot_root
- .long sys_ni_syscall
- .long sys_ni_syscall
- .long sys_getdents64 /* 220 */
- .long sys_gettid
- .long sys_tkill
- .long sys_setxattr
- .long sys_lsetxattr
- .long sys_fsetxattr /* 225 */
- .long sys_getxattr
- .long sys_lgetxattr
- .long sys_fgetxattr
- .long sys_listxattr
- .long sys_llistxattr /* 230 */
- .long sys_flistxattr
- .long sys_removexattr
- .long sys_lremovexattr
- .long sys_fremovexattr
- .long sys_futex /* 235 */
- .long sys_sendfile64
- .long sys_mincore
- .long sys_madvise
- .long sys_fcntl64
- .long sys_readahead /* 240 */
- .long sys_io_setup
- .long sys_io_destroy
- .long sys_io_getevents
- .long sys_io_submit
- .long sys_io_cancel /* 245 */
- .long sys_fadvise64
- .long sys_exit_group
- .long sys_lookup_dcookie
- .long sys_epoll_create
- .long sys_epoll_ctl /* 250 */
- .long sys_epoll_wait
- .long sys_remap_file_pages
- .long sys_set_tid_address
- .long sys_timer_create
- .long sys_timer_settime /* 255 */
- .long sys_timer_gettime
- .long sys_timer_getoverrun
- .long sys_timer_delete
- .long sys_clock_settime
- .long sys_clock_gettime /* 260 */
- .long sys_clock_getres
- .long sys_clock_nanosleep
- .long sys_statfs64
- .long sys_fstatfs64
- .long sys_tgkill /* 265 */
- .long sys_utimes
- .long sys_fadvise64_64
- .long sys_mbind
- .long sys_get_mempolicy
- .long sys_set_mempolicy /* 270 */
- .long sys_mq_open
- .long sys_mq_unlink
- .long sys_mq_timedsend
- .long sys_mq_timedreceive
- .long sys_mq_notify /* 275 */
- .long sys_mq_getsetattr
- .long sys_waitid
- .long sys_ni_syscall /* for sys_vserver */
- .long sys_add_key
- .long sys_request_key /* 280 */
- .long sys_keyctl
- .long sys_ioprio_set
- .long sys_ioprio_get
- .long sys_inotify_init
- .long sys_inotify_add_watch /* 285 */
- .long sys_inotify_rm_watch
- .long sys_migrate_pages
- .long sys_openat
- .long sys_mkdirat
- .long sys_mknodat /* 290 */
- .long sys_fchownat
- .long sys_futimesat
- .long sys_fstatat64
- .long sys_unlinkat
- .long sys_renameat /* 295 */
- .long sys_linkat
- .long sys_symlinkat
- .long sys_readlinkat
- .long sys_fchmodat
- .long sys_faccessat /* 300 */
- .long sys_ni_syscall /* Reserved for pselect6 */
- .long sys_ni_syscall /* Reserved for ppoll */
- .long sys_unshare
- .long sys_set_robust_list
- .long sys_get_robust_list /* 305 */
- .long sys_splice
- .long sys_sync_file_range
- .long sys_tee
- .long sys_vmsplice
- .long sys_move_pages /* 310 */
- .long sys_sched_setaffinity
- .long sys_sched_getaffinity
- .long sys_kexec_load
- .long sys_getcpu
- .long sys_epoll_pwait /* 315 */
- .long sys_utimensat
- .long sys_signalfd
- .long sys_timerfd_create
- .long sys_eventfd
- .long sys_fallocate /* 320 */
- .long sys_timerfd_settime
- .long sys_timerfd_gettime
- .long sys_signalfd4
- .long sys_eventfd2
- .long sys_epoll_create1 /* 325 */
- .long sys_dup3
- .long sys_pipe2
- .long sys_inotify_init1
- .long sys_preadv
- .long sys_pwritev /* 330 */
- .long sys_rt_tgsigqueueinfo
- .long sys_perf_event_open
- .long sys_get_thread_area
- .long sys_set_thread_area
- .long sys_atomic_cmpxchg_32 /* 335 */
- .long sys_atomic_barrier
- .long sys_fanotify_init
- .long sys_fanotify_mark
- .long sys_prlimit64
- .long sys_name_to_handle_at /* 340 */
- .long sys_open_by_handle_at
- .long sys_clock_adjtime
- .long sys_syncfs
-
/*
- * linux/arch/m68knommu/kernel/syscalltable.S
- *
* Copyright (C) 2002, Greg Ungerer (gerg@snapgear.com)
*
* Based on older entry.S files, the following copyrights apply:
* Kenneth Albanowski <kjahds@kjahds.com>,
* Copyright (C) 2000 Lineo Inc. (www.lineo.com)
* Copyright (C) 1991, 1992 Linus Torvalds
+ *
+ * Linux/m68k support by Hamish Macdonald
*/
#include <linux/sys.h>
#include <linux/linkage.h>
-#include <asm/unistd.h>
-.text
+#ifndef CONFIG_MMU
+#define sys_mmap2 sys_mmap_pgoff
+#endif
+
+.section .rodata
ALIGN
ENTRY(sys_call_table)
- .long sys_restart_syscall /* 0 - old "setup()" system call */
+ .long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */
.long sys_exit
.long sys_fork
.long sys_read
.long sys_write
- .long sys_open /* 5 */
+ .long sys_open /* 5 */
.long sys_close
.long sys_waitpid
.long sys_creat
.long sys_link
- .long sys_unlink /* 10 */
+ .long sys_unlink /* 10 */
.long sys_execve
.long sys_chdir
.long sys_time
.long sys_mknod
- .long sys_chmod /* 15 */
+ .long sys_chmod /* 15 */
.long sys_chown16
- .long sys_ni_syscall /* old break syscall holder */
+ .long sys_ni_syscall /* old break syscall holder */
.long sys_stat
.long sys_lseek
- .long sys_getpid /* 20 */
+ .long sys_getpid /* 20 */
.long sys_mount
.long sys_oldumount
.long sys_setuid16
.long sys_getuid16
- .long sys_stime /* 25 */
+ .long sys_stime /* 25 */
.long sys_ptrace
.long sys_alarm
.long sys_fstat
.long sys_pause
- .long sys_utime /* 30 */
- .long sys_ni_syscall /* old stty syscall holder */
- .long sys_ni_syscall /* old gtty syscall holder */
+ .long sys_utime /* 30 */
+ .long sys_ni_syscall /* old stty syscall holder */
+ .long sys_ni_syscall /* old gtty syscall holder */
.long sys_access
.long sys_nice
- .long sys_ni_syscall /* 35 */ /* old ftime syscall holder */
+ .long sys_ni_syscall /* 35 - old ftime syscall holder */
.long sys_sync
.long sys_kill
.long sys_rename
.long sys_mkdir
- .long sys_rmdir /* 40 */
+ .long sys_rmdir /* 40 */
.long sys_dup
.long sys_pipe
.long sys_times
- .long sys_ni_syscall /* old prof syscall holder */
- .long sys_brk /* 45 */
+ .long sys_ni_syscall /* old prof syscall holder */
+ .long sys_brk /* 45 */
.long sys_setgid16
.long sys_getgid16
.long sys_signal
.long sys_geteuid16
- .long sys_getegid16 /* 50 */
+ .long sys_getegid16 /* 50 */
.long sys_acct
- .long sys_umount /* recycled never used phys() */
- .long sys_ni_syscall /* old lock syscall holder */
+ .long sys_umount /* recycled never used phys() */
+ .long sys_ni_syscall /* old lock syscall holder */
.long sys_ioctl
- .long sys_fcntl /* 55 */
- .long sys_ni_syscall /* old mpx syscall holder */
+ .long sys_fcntl /* 55 */
+ .long sys_ni_syscall /* old mpx syscall holder */
.long sys_setpgid
- .long sys_ni_syscall /* old ulimit syscall holder */
+ .long sys_ni_syscall /* old ulimit syscall holder */
.long sys_ni_syscall
- .long sys_umask /* 60 */
+ .long sys_umask /* 60 */
.long sys_chroot
.long sys_ustat
.long sys_dup2
.long sys_getppid
- .long sys_getpgrp /* 65 */
+ .long sys_getpgrp /* 65 */
.long sys_setsid
.long sys_sigaction
.long sys_sgetmask
.long sys_ssetmask
- .long sys_setreuid16 /* 70 */
+ .long sys_setreuid16 /* 70 */
.long sys_setregid16
.long sys_sigsuspend
.long sys_sigpending
.long sys_sethostname
- .long sys_setrlimit /* 75 */
+ .long sys_setrlimit /* 75 */
.long sys_old_getrlimit
.long sys_getrusage
.long sys_gettimeofday
.long sys_settimeofday
- .long sys_getgroups16 /* 80 */
+ .long sys_getgroups16 /* 80 */
.long sys_setgroups16
.long sys_old_select
.long sys_symlink
.long sys_lstat
- .long sys_readlink /* 85 */
+ .long sys_readlink /* 85 */
.long sys_uselib
- .long sys_ni_syscall /* sys_swapon */
+ .long sys_swapon
.long sys_reboot
.long sys_old_readdir
- .long sys_old_mmap /* 90 */
+ .long sys_old_mmap /* 90 */
.long sys_munmap
.long sys_truncate
.long sys_ftruncate
.long sys_fchmod
- .long sys_fchown16 /* 95 */
+ .long sys_fchown16 /* 95 */
.long sys_getpriority
.long sys_setpriority
- .long sys_ni_syscall /* old profil syscall holder */
+ .long sys_ni_syscall /* old profil syscall holder */
.long sys_statfs
- .long sys_fstatfs /* 100 */
- .long sys_ni_syscall /* ioperm for i386 */
+ .long sys_fstatfs /* 100 */
+ .long sys_ni_syscall /* ioperm for i386 */
.long sys_socketcall
.long sys_syslog
.long sys_setitimer
- .long sys_getitimer /* 105 */
+ .long sys_getitimer /* 105 */
.long sys_newstat
.long sys_newlstat
.long sys_newfstat
.long sys_ni_syscall
- .long sys_ni_syscall /* iopl for i386 */ /* 110 */
+ .long sys_ni_syscall /* 110 - iopl for i386 */
.long sys_vhangup
- .long sys_ni_syscall /* obsolete idle() syscall */
- .long sys_ni_syscall /* vm86old for i386 */
+ .long sys_ni_syscall /* obsolete idle() syscall */
+ .long sys_ni_syscall /* vm86old for i386 */
.long sys_wait4
- .long sys_ni_syscall /* 115 */ /* sys_swapoff */
+ .long sys_swapoff /* 115 */
.long sys_sysinfo
.long sys_ipc
.long sys_fsync
.long sys_sigreturn
- .long sys_clone /* 120 */
+ .long sys_clone /* 120 */
.long sys_setdomainname
.long sys_newuname
- .long sys_cacheflush /* modify_ldt for i386 */
+ .long sys_cacheflush /* modify_ldt for i386 */
.long sys_adjtimex
- .long sys_ni_syscall /* 125 */ /* sys_mprotect */
+ .long sys_mprotect /* 125 */
.long sys_sigprocmask
- .long sys_ni_syscall /* old "creat_module" */
+ .long sys_ni_syscall /* old "create_module" */
.long sys_init_module
.long sys_delete_module
- .long sys_ni_syscall /* 130: old "get_kernel_syms" */
+ .long sys_ni_syscall /* 130 - old "get_kernel_syms" */
.long sys_quotactl
.long sys_getpgid
.long sys_fchdir
.long sys_bdflush
- .long sys_sysfs /* 135 */
+ .long sys_sysfs /* 135 */
.long sys_personality
- .long sys_ni_syscall /* for afs_syscall */
+ .long sys_ni_syscall /* for afs_syscall */
.long sys_setfsuid16
.long sys_setfsgid16
- .long sys_llseek /* 140 */
+ .long sys_llseek /* 140 */
.long sys_getdents
.long sys_select
.long sys_flock
- .long sys_ni_syscall /* sys_msync */
- .long sys_readv /* 145 */
+ .long sys_msync
+ .long sys_readv /* 145 */
.long sys_writev
.long sys_getsid
.long sys_fdatasync
.long sys_sysctl
- .long sys_ni_syscall /* 150 */ /* sys_mlock */
- .long sys_ni_syscall /* sys_munlock */
- .long sys_ni_syscall /* sys_mlockall */
- .long sys_ni_syscall /* sys_munlockall */
+ .long sys_mlock /* 150 */
+ .long sys_munlock
+ .long sys_mlockall
+ .long sys_munlockall
.long sys_sched_setparam
- .long sys_sched_getparam /* 155 */
+ .long sys_sched_getparam /* 155 */
.long sys_sched_setscheduler
.long sys_sched_getscheduler
.long sys_sched_yield
.long sys_sched_get_priority_min /* 160 */
.long sys_sched_rr_get_interval
.long sys_nanosleep
- .long sys_ni_syscall /* sys_mremap */
+ .long sys_mremap
.long sys_setresuid16
- .long sys_getresuid16 /* 165 */
- .long sys_getpagesize /* sys_getpagesize */
- .long sys_ni_syscall /* old "query_module" */
+ .long sys_getresuid16 /* 165 */
+ .long sys_getpagesize
+ .long sys_ni_syscall /* old "query_module" */
.long sys_poll
- .long sys_ni_syscall /* sys_nfsservctl */
- .long sys_setresgid16 /* 170 */
+ .long sys_nfsservctl
+ .long sys_setresgid16 /* 170 */
.long sys_getresgid16
.long sys_prctl
.long sys_rt_sigreturn
.long sys_rt_sigaction
- .long sys_rt_sigprocmask /* 175 */
+ .long sys_rt_sigprocmask /* 175 */
.long sys_rt_sigpending
.long sys_rt_sigtimedwait
.long sys_rt_sigqueueinfo
.long sys_rt_sigsuspend
- .long sys_pread64 /* 180 */
+ .long sys_pread64 /* 180 */
.long sys_pwrite64
.long sys_lchown16
.long sys_getcwd
.long sys_capget
- .long sys_capset /* 185 */
+ .long sys_capset /* 185 */
.long sys_sigaltstack
.long sys_sendfile
- .long sys_ni_syscall /* streams1 */
- .long sys_ni_syscall /* streams2 */
- .long sys_vfork /* 190 */
+ .long sys_ni_syscall /* streams1 */
+ .long sys_ni_syscall /* streams2 */
+ .long sys_vfork /* 190 */
.long sys_getrlimit
- .long sys_mmap_pgoff
+ .long sys_mmap2
.long sys_truncate64
.long sys_ftruncate64
- .long sys_stat64 /* 195 */
+ .long sys_stat64 /* 195 */
.long sys_lstat64
.long sys_fstat64
.long sys_chown
.long sys_getuid
- .long sys_getgid /* 200 */
+ .long sys_getgid /* 200 */
.long sys_geteuid
.long sys_getegid
.long sys_setreuid
.long sys_setregid
- .long sys_getgroups /* 205 */
+ .long sys_getgroups /* 205 */
.long sys_setgroups
.long sys_fchown
.long sys_setresuid
.long sys_getresuid
- .long sys_setresgid /* 210 */
+ .long sys_setresgid /* 210 */
.long sys_getresgid
.long sys_lchown
.long sys_setuid
.long sys_setgid
- .long sys_setfsuid /* 215 */
+ .long sys_setfsuid /* 215 */
.long sys_setfsgid
.long sys_pivot_root
.long sys_ni_syscall
.long sys_ni_syscall
- .long sys_getdents64 /* 220 */
+ .long sys_getdents64 /* 220 */
.long sys_gettid
.long sys_tkill
.long sys_setxattr
.long sys_lsetxattr
- .long sys_fsetxattr /* 225 */
+ .long sys_fsetxattr /* 225 */
.long sys_getxattr
.long sys_lgetxattr
.long sys_fgetxattr
.long sys_listxattr
- .long sys_llistxattr /* 230 */
+ .long sys_llistxattr /* 230 */
.long sys_flistxattr
.long sys_removexattr
.long sys_lremovexattr
.long sys_fremovexattr
- .long sys_futex /* 235 */
+ .long sys_futex /* 235 */
.long sys_sendfile64
- .long sys_ni_syscall /* sys_mincore */
- .long sys_ni_syscall /* sys_madvise */
+ .long sys_mincore
+ .long sys_madvise
.long sys_fcntl64
- .long sys_readahead /* 240 */
+ .long sys_readahead /* 240 */
.long sys_io_setup
.long sys_io_destroy
.long sys_io_getevents
.long sys_io_submit
- .long sys_io_cancel /* 245 */
+ .long sys_io_cancel /* 245 */
.long sys_fadvise64
.long sys_exit_group
.long sys_lookup_dcookie
.long sys_epoll_create
- .long sys_epoll_ctl /* 250 */
+ .long sys_epoll_ctl /* 250 */
.long sys_epoll_wait
- .long sys_ni_syscall /* sys_remap_file_pages */
+ .long sys_remap_file_pages
.long sys_set_tid_address
.long sys_timer_create
- .long sys_timer_settime /* 255 */
+ .long sys_timer_settime /* 255 */
.long sys_timer_gettime
.long sys_timer_getoverrun
.long sys_timer_delete
.long sys_clock_settime
- .long sys_clock_gettime /* 260 */
+ .long sys_clock_gettime /* 260 */
.long sys_clock_getres
.long sys_clock_nanosleep
.long sys_statfs64
.long sys_fstatfs64
- .long sys_tgkill /* 265 */
+ .long sys_tgkill /* 265 */
.long sys_utimes
.long sys_fadvise64_64
- .long sys_mbind
+ .long sys_mbind
.long sys_get_mempolicy
- .long sys_set_mempolicy /* 270 */
+ .long sys_set_mempolicy /* 270 */
.long sys_mq_open
.long sys_mq_unlink
.long sys_mq_timedsend
.long sys_mq_timedreceive
- .long sys_mq_notify /* 275 */
+ .long sys_mq_notify /* 275 */
.long sys_mq_getsetattr
.long sys_waitid
- .long sys_ni_syscall /* for sys_vserver */
+ .long sys_ni_syscall /* for sys_vserver */
.long sys_add_key
- .long sys_request_key /* 280 */
+ .long sys_request_key /* 280 */
.long sys_keyctl
.long sys_ioprio_set
.long sys_ioprio_get
.long sys_readlinkat
.long sys_fchmodat
.long sys_faccessat /* 300 */
- .long sys_ni_syscall /* Reserved for pselect6 */
- .long sys_ni_syscall /* Reserved for ppoll */
+ .long sys_pselect6
+ .long sys_ppoll
.long sys_unshare
.long sys_set_robust_list
.long sys_get_robust_list /* 305 */
.long sys_clock_adjtime
.long sys_syncfs
- .rept NR_syscalls-(.-sys_call_table)/4
- .long sys_ni_syscall
- .endr
-
zones_size[ZONE_DMA] = m68k_memory[i].size >> PAGE_SHIFT;
free_area_init_node(i, zones_size,
m68k_memory[i].addr >> PAGE_SHIFT, NULL);
+ if (node_present_pages(i))
+ node_set_state(i, N_NORMAL_MEMORY);
}
}
.rating = 300,
.read = microblaze_read,
.mask = CLOCKSOURCE_MASK(32),
- .shift = 8, /* I can shift it */
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
static int __init microblaze_clocksource_init(void)
{
- clocksource_microblaze.mult =
- clocksource_hz2mult(timer_clock_freq,
- clocksource_microblaze.shift);
- if (clocksource_register(&clocksource_microblaze))
+ if (clocksource_register_hz(&clocksource_microblaze, timer_clock_freq))
panic("failed to register clocksource");
/* stop timer1 */
platforms += emma
platforms += jazz
platforms += jz4740
+platforms += lantiq
platforms += lasat
platforms += loongson
platforms += mipssim
select HAVE_PWM
select HAVE_CLK
+config LANTIQ
+ bool "Lantiq based platforms"
+ select DMA_NONCOHERENT
+ select IRQ_CPU
+ select CEVT_R4K
+ select CSRC_R4K
+ select SYS_HAS_CPU_MIPS32_R1
+ select SYS_HAS_CPU_MIPS32_R2
+ select SYS_SUPPORTS_BIG_ENDIAN
+ select SYS_SUPPORTS_32BIT_KERNEL
+ select SYS_SUPPORTS_MULTITHREADING
+ select SYS_HAS_EARLY_PRINTK
+ select ARCH_REQUIRE_GPIOLIB
+ select SWAP_IO_SPACE
+ select BOOT_RAW
+ select HAVE_CLK
+ select MIPS_MACHINE
+
config LASAT
bool "LASAT Networks platforms"
select CEVT_R4K
Hikari
Say Y here for most Octeon reference boards.
+config NLM_XLR_BOARD
+ bool "Netlogic XLR/XLS based systems"
+ depends on EXPERIMENTAL
+ select BOOT_ELF32
+ select NLM_COMMON
+ select NLM_XLR
+ select SYS_HAS_CPU_XLR
+ select SYS_SUPPORTS_SMP
+ select HW_HAS_PCI
+ select SWAP_IO_SPACE
+ select SYS_SUPPORTS_32BIT_KERNEL
+ select SYS_SUPPORTS_64BIT_KERNEL
+ select 64BIT_PHYS_ADDR
+ select SYS_SUPPORTS_BIG_ENDIAN
+ select SYS_SUPPORTS_HIGHMEM
+ select DMA_COHERENT
+ select NR_CPUS_DEFAULT_32
+ select CEVT_R4K
+ select CSRC_R4K
+ select IRQ_CPU
+ select ZONE_DMA if 64BIT
+ select SYNC_R4K
+ select SYS_HAS_EARLY_PRINTK
+ help
+ Support for systems based on Netlogic XLR and XLS processors.
+ Say Y here if you have a XLR or XLS based board.
+
endchoice
source "arch/mips/alchemy/Kconfig"
source "arch/mips/bcm63xx/Kconfig"
source "arch/mips/jazz/Kconfig"
source "arch/mips/jz4740/Kconfig"
+source "arch/mips/lantiq/Kconfig"
source "arch/mips/lasat/Kconfig"
source "arch/mips/pmc-sierra/Kconfig"
source "arch/mips/powertv/Kconfig"
source "arch/mips/vr41xx/Kconfig"
source "arch/mips/cavium-octeon/Kconfig"
source "arch/mips/loongson/Kconfig"
+source "arch/mips/netlogic/Kconfig"
endmenu
config IRQ_GIC
bool
-config IRQ_CPU_OCTEON
- bool
-
config MIPS_BOARDS_GEN
bool
config CPU_CAVIUM_OCTEON
bool "Cavium Octeon processor"
depends on SYS_HAS_CPU_CAVIUM_OCTEON
- select IRQ_CPU
- select IRQ_CPU_OCTEON
select CPU_HAS_PREFETCH
select CPU_SUPPORTS_64BIT_KERNEL
select SYS_SUPPORTS_SMP
help
Broadcom BMIPS5000 processors.
+config CPU_XLR
+ bool "Netlogic XLR SoC"
+ depends on SYS_HAS_CPU_XLR
+ select CPU_SUPPORTS_32BIT_KERNEL
+ select CPU_SUPPORTS_64BIT_KERNEL
+ select CPU_SUPPORTS_HIGHMEM
+ select WEAK_ORDERING
+ select WEAK_REORDERING_BEYOND_LLSC
+ select CPU_SUPPORTS_HUGEPAGES
+ help
+ Netlogic Microsystems XLR/XLS processors.
endchoice
if CPU_LOONGSON2F
config SYS_HAS_CPU_BMIPS5000
bool
+config SYS_HAS_CPU_XLR
+ bool
+
#
# CPU may reorder R->R, R->W, W->R, W->W
# Reordering beyond LL and SC is handled in WEAK_REORDERING_BEYOND_LLSC
config I8253
bool
+ select CLKSRC_I8253
select MIPS_EXTERNAL_TIMER
config ZONE_DMA32
#
include $(srctree)/arch/mips/Kbuild.platforms
+#
+# NETLOGIC SOC Common (common)
+#
+cflags-$(CONFIG_NLM_COMMON) += -I$(srctree)/arch/mips/include/asm/mach-netlogic
+cflags-$(CONFIG_NLM_COMMON) += -I$(srctree)/arch/mips/include/asm/netlogic
+
+#
+# NETLOGIC XLR/XLS SoC, Simulator and boards
+#
+core-$(CONFIG_NLM_XLR) += arch/mips/netlogic/xlr/
+load-$(CONFIG_NLM_XLR_BOARD) += 0xffffffff84000000
+
cflags-y += -I$(srctree)/arch/mips/include/asm/mach-generic
drivers-$(CONFIG_PCI) += arch/mips/pci/
#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/mach-au1x00/au1000.h>
#include <asm/mach-au1x00/au1xxx_dbdma.h>
/* I couldn't find a macro that did this... */
#define ALIGN_ADDR(x, a) ((((u32)(x)) + (a-1)) & ~(a-1))
-static dbdma_global_t *dbdma_gptr = (dbdma_global_t *)DDMA_GLOBAL_BASE;
+static dbdma_global_t *dbdma_gptr =
+ (dbdma_global_t *)KSEG1ADDR(AU1550_DBDMA_CONF_PHYS_ADDR);
static int dbdma_initialized;
static dbdev_tab_t dbdev_tab[] = {
if (ctp != NULL) {
memset(ctp, 0, sizeof(chan_tab_t));
ctp->chan_index = chan = i;
- dcp = DDMA_CHANNEL_BASE;
+ dcp = KSEG1ADDR(AU1550_DBDMA_PHYS_ADDR);
dcp += (0x0100 * chan);
ctp->chan_ptr = (au1x_dma_chan_t *)dcp;
cp = (au1x_dma_chan_t *)dcp;
}
-struct alchemy_dbdma_sysdev {
- struct sys_device sysdev;
- u32 pm_regs[NUM_DBDMA_CHANS + 1][6];
-};
+static unsigned long alchemy_dbdma_pm_data[NUM_DBDMA_CHANS + 1][6];
-static int alchemy_dbdma_suspend(struct sys_device *dev,
- pm_message_t state)
+static int alchemy_dbdma_suspend(void)
{
- struct alchemy_dbdma_sysdev *sdev =
- container_of(dev, struct alchemy_dbdma_sysdev, sysdev);
int i;
- u32 addr;
+ void __iomem *addr;
- addr = DDMA_GLOBAL_BASE;
- sdev->pm_regs[0][0] = au_readl(addr + 0x00);
- sdev->pm_regs[0][1] = au_readl(addr + 0x04);
- sdev->pm_regs[0][2] = au_readl(addr + 0x08);
- sdev->pm_regs[0][3] = au_readl(addr + 0x0c);
+ addr = (void __iomem *)KSEG1ADDR(AU1550_DBDMA_CONF_PHYS_ADDR);
+ alchemy_dbdma_pm_data[0][0] = __raw_readl(addr + 0x00);
+ alchemy_dbdma_pm_data[0][1] = __raw_readl(addr + 0x04);
+ alchemy_dbdma_pm_data[0][2] = __raw_readl(addr + 0x08);
+ alchemy_dbdma_pm_data[0][3] = __raw_readl(addr + 0x0c);
/* save channel configurations */
- for (i = 1, addr = DDMA_CHANNEL_BASE; i <= NUM_DBDMA_CHANS; i++) {
- sdev->pm_regs[i][0] = au_readl(addr + 0x00);
- sdev->pm_regs[i][1] = au_readl(addr + 0x04);
- sdev->pm_regs[i][2] = au_readl(addr + 0x08);
- sdev->pm_regs[i][3] = au_readl(addr + 0x0c);
- sdev->pm_regs[i][4] = au_readl(addr + 0x10);
- sdev->pm_regs[i][5] = au_readl(addr + 0x14);
+ addr = (void __iomem *)KSEG1ADDR(AU1550_DBDMA_PHYS_ADDR);
+ for (i = 1; i <= NUM_DBDMA_CHANS; i++) {
+ alchemy_dbdma_pm_data[i][0] = __raw_readl(addr + 0x00);
+ alchemy_dbdma_pm_data[i][1] = __raw_readl(addr + 0x04);
+ alchemy_dbdma_pm_data[i][2] = __raw_readl(addr + 0x08);
+ alchemy_dbdma_pm_data[i][3] = __raw_readl(addr + 0x0c);
+ alchemy_dbdma_pm_data[i][4] = __raw_readl(addr + 0x10);
+ alchemy_dbdma_pm_data[i][5] = __raw_readl(addr + 0x14);
/* halt channel */
- au_writel(sdev->pm_regs[i][0] & ~1, addr + 0x00);
- au_sync();
- while (!(au_readl(addr + 0x14) & 1))
- au_sync();
+ __raw_writel(alchemy_dbdma_pm_data[i][0] & ~1, addr + 0x00);
+ wmb();
+ while (!(__raw_readl(addr + 0x14) & 1))
+ wmb();
addr += 0x100; /* next channel base */
}
/* disable channel interrupts */
- au_writel(0, DDMA_GLOBAL_BASE + 0x0c);
- au_sync();
+ addr = (void __iomem *)KSEG1ADDR(AU1550_DBDMA_CONF_PHYS_ADDR);
+ __raw_writel(0, addr + 0x0c);
+ wmb();
return 0;
}
-static int alchemy_dbdma_resume(struct sys_device *dev)
+static void alchemy_dbdma_resume(void)
{
- struct alchemy_dbdma_sysdev *sdev =
- container_of(dev, struct alchemy_dbdma_sysdev, sysdev);
int i;
- u32 addr;
+ void __iomem *addr;
- addr = DDMA_GLOBAL_BASE;
- au_writel(sdev->pm_regs[0][0], addr + 0x00);
- au_writel(sdev->pm_regs[0][1], addr + 0x04);
- au_writel(sdev->pm_regs[0][2], addr + 0x08);
- au_writel(sdev->pm_regs[0][3], addr + 0x0c);
+ addr = (void __iomem *)KSEG1ADDR(AU1550_DBDMA_CONF_PHYS_ADDR);
+ __raw_writel(alchemy_dbdma_pm_data[0][0], addr + 0x00);
+ __raw_writel(alchemy_dbdma_pm_data[0][1], addr + 0x04);
+ __raw_writel(alchemy_dbdma_pm_data[0][2], addr + 0x08);
+ __raw_writel(alchemy_dbdma_pm_data[0][3], addr + 0x0c);
/* restore channel configurations */
- for (i = 1, addr = DDMA_CHANNEL_BASE; i <= NUM_DBDMA_CHANS; i++) {
- au_writel(sdev->pm_regs[i][0], addr + 0x00);
- au_writel(sdev->pm_regs[i][1], addr + 0x04);
- au_writel(sdev->pm_regs[i][2], addr + 0x08);
- au_writel(sdev->pm_regs[i][3], addr + 0x0c);
- au_writel(sdev->pm_regs[i][4], addr + 0x10);
- au_writel(sdev->pm_regs[i][5], addr + 0x14);
- au_sync();
+ addr = (void __iomem *)KSEG1ADDR(AU1550_DBDMA_PHYS_ADDR);
+ for (i = 1; i <= NUM_DBDMA_CHANS; i++) {
+ __raw_writel(alchemy_dbdma_pm_data[i][0], addr + 0x00);
+ __raw_writel(alchemy_dbdma_pm_data[i][1], addr + 0x04);
+ __raw_writel(alchemy_dbdma_pm_data[i][2], addr + 0x08);
+ __raw_writel(alchemy_dbdma_pm_data[i][3], addr + 0x0c);
+ __raw_writel(alchemy_dbdma_pm_data[i][4], addr + 0x10);
+ __raw_writel(alchemy_dbdma_pm_data[i][5], addr + 0x14);
+ wmb();
addr += 0x100; /* next channel base */
}
-
- return 0;
}
-static struct sysdev_class alchemy_dbdma_sysdev_class = {
- .name = "dbdma",
+static struct syscore_ops alchemy_dbdma_syscore_ops = {
.suspend = alchemy_dbdma_suspend,
.resume = alchemy_dbdma_resume,
};
-static int __init alchemy_dbdma_sysdev_init(void)
-{
- struct alchemy_dbdma_sysdev *sdev;
- int ret;
-
- ret = sysdev_class_register(&alchemy_dbdma_sysdev_class);
- if (ret)
- return ret;
-
- sdev = kzalloc(sizeof(struct alchemy_dbdma_sysdev), GFP_KERNEL);
- if (!sdev)
- return -ENOMEM;
-
- sdev->sysdev.id = -1;
- sdev->sysdev.cls = &alchemy_dbdma_sysdev_class;
- ret = sysdev_register(&sdev->sysdev);
- if (ret)
- kfree(sdev);
-
- return ret;
-}
-
static int __init au1xxx_dbdma_init(void)
{
int irq_nr, ret;
else {
dbdma_initialized = 1;
printk(KERN_INFO "Alchemy DBDMA initialized\n");
- ret = alchemy_dbdma_sysdev_init();
- if (ret) {
- printk(KERN_ERR "DBDMA PM init failed\n");
- ret = 0;
- }
+ register_syscore_ops(&alchemy_dbdma_syscore_ops);
}
return ret;
* returned from request_dma.
*/
+/* DMA Channel register block spacing */
+#define DMA_CHANNEL_LEN 0x00000100
+
DEFINE_SPINLOCK(au1000_dma_spin_lock);
struct dma_chan au1000_dma_table[NUM_AU1000_DMA_CHANNELS] = {
unsigned int fifo_addr;
unsigned int dma_mode;
} dma_dev_table[DMA_NUM_DEV] = {
- {UART0_ADDR + UART_TX, 0},
- {UART0_ADDR + UART_RX, 0},
- {0, 0},
- {0, 0},
- {AC97C_DATA, DMA_DW16 }, /* coherent */
- {AC97C_DATA, DMA_DR | DMA_DW16 }, /* coherent */
- {UART3_ADDR + UART_TX, DMA_DW8 | DMA_NC},
- {UART3_ADDR + UART_RX, DMA_DR | DMA_DW8 | DMA_NC},
- {USBD_EP0RD, DMA_DR | DMA_DW8 | DMA_NC},
- {USBD_EP0WR, DMA_DW8 | DMA_NC},
- {USBD_EP2WR, DMA_DW8 | DMA_NC},
- {USBD_EP3WR, DMA_DW8 | DMA_NC},
- {USBD_EP4RD, DMA_DR | DMA_DW8 | DMA_NC},
- {USBD_EP5RD, DMA_DR | DMA_DW8 | DMA_NC},
- {I2S_DATA, DMA_DW32 | DMA_NC},
- {I2S_DATA, DMA_DR | DMA_DW32 | DMA_NC}
+ { AU1000_UART0_PHYS_ADDR + 0x04, DMA_DW8 }, /* UART0_TX */
+ { AU1000_UART0_PHYS_ADDR + 0x00, DMA_DW8 | DMA_DR }, /* UART0_RX */
+ { 0, 0 }, /* DMA_REQ0 */
+ { 0, 0 }, /* DMA_REQ1 */
+ { AU1000_AC97_PHYS_ADDR + 0x08, DMA_DW16 }, /* AC97 TX c */
+ { AU1000_AC97_PHYS_ADDR + 0x08, DMA_DW16 | DMA_DR }, /* AC97 RX c */
+ { AU1000_UART3_PHYS_ADDR + 0x04, DMA_DW8 | DMA_NC }, /* UART3_TX */
+ { AU1000_UART3_PHYS_ADDR + 0x00, DMA_DW8 | DMA_NC | DMA_DR }, /* UART3_RX */
+ { AU1000_USBD_PHYS_ADDR + 0x00, DMA_DW8 | DMA_NC | DMA_DR }, /* EP0RD */
+ { AU1000_USBD_PHYS_ADDR + 0x04, DMA_DW8 | DMA_NC }, /* EP0WR */
+ { AU1000_USBD_PHYS_ADDR + 0x08, DMA_DW8 | DMA_NC }, /* EP2WR */
+ { AU1000_USBD_PHYS_ADDR + 0x0c, DMA_DW8 | DMA_NC }, /* EP3WR */
+ { AU1000_USBD_PHYS_ADDR + 0x10, DMA_DW8 | DMA_NC | DMA_DR }, /* EP4RD */
+ { AU1000_USBD_PHYS_ADDR + 0x14, DMA_DW8 | DMA_NC | DMA_DR }, /* EP5RD */
+ /* on Au1500, these 2 are DMA_REQ2/3 (GPIO208/209) instead! */
+ { AU1000_I2S_PHYS_ADDR + 0x00, DMA_DW32 | DMA_NC}, /* I2S TX */
+ { AU1000_I2S_PHYS_ADDR + 0x00, DMA_DW32 | DMA_NC | DMA_DR}, /* I2S RX */
};
int au1000_dma_read_proc(char *buf, char **start, off_t fpos,
/* Device FIFO addresses and default DMA modes - 2nd bank */
static const struct dma_dev dma_dev_table_bank2[DMA_NUM_DEV_BANK2] = {
- { SD0_XMIT_FIFO, DMA_DS | DMA_DW8 }, /* coherent */
- { SD0_RECV_FIFO, DMA_DS | DMA_DR | DMA_DW8 }, /* coherent */
- { SD1_XMIT_FIFO, DMA_DS | DMA_DW8 }, /* coherent */
- { SD1_RECV_FIFO, DMA_DS | DMA_DR | DMA_DW8 } /* coherent */
+ { AU1100_SD0_PHYS_ADDR + 0x00, DMA_DS | DMA_DW8 }, /* coherent */
+ { AU1100_SD0_PHYS_ADDR + 0x04, DMA_DS | DMA_DW8 | DMA_DR }, /* coherent */
+ { AU1100_SD1_PHYS_ADDR + 0x00, DMA_DS | DMA_DW8 }, /* coherent */
+ { AU1100_SD1_PHYS_ADDR + 0x04, DMA_DS | DMA_DW8 | DMA_DR } /* coherent */
};
void dump_au1000_dma_channel(unsigned int dmanr)
}
/* fill it in */
- chan->io = DMA_CHANNEL_BASE + i * DMA_CHANNEL_LEN;
+ chan->io = KSEG1ADDR(AU1000_DMA_PHYS_ADDR) + i * DMA_CHANNEL_LEN;
chan->dev_id = dev_id;
chan->dev_str = dev_str;
chan->fifo_addr = dev->fifo_addr;
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/slab.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <asm/irq_cpu.h>
#include <asm/mipsregs.h>
#include <asm/mach-pb1x00/pb1000.h>
#endif
+/* Interrupt Controller register offsets */
+#define IC_CFG0RD 0x40
+#define IC_CFG0SET 0x40
+#define IC_CFG0CLR 0x44
+#define IC_CFG1RD 0x48
+#define IC_CFG1SET 0x48
+#define IC_CFG1CLR 0x4C
+#define IC_CFG2RD 0x50
+#define IC_CFG2SET 0x50
+#define IC_CFG2CLR 0x54
+#define IC_REQ0INT 0x54
+#define IC_SRCRD 0x58
+#define IC_SRCSET 0x58
+#define IC_SRCCLR 0x5C
+#define IC_REQ1INT 0x5C
+#define IC_ASSIGNRD 0x60
+#define IC_ASSIGNSET 0x60
+#define IC_ASSIGNCLR 0x64
+#define IC_WAKERD 0x68
+#define IC_WAKESET 0x68
+#define IC_WAKECLR 0x6C
+#define IC_MASKRD 0x70
+#define IC_MASKSET 0x70
+#define IC_MASKCLR 0x74
+#define IC_RISINGRD 0x78
+#define IC_RISINGCLR 0x78
+#define IC_FALLINGRD 0x7C
+#define IC_FALLINGCLR 0x7C
+#define IC_TESTBIT 0x80
+
static int au1x_ic_settype(struct irq_data *d, unsigned int flow_type);
/* NOTE on interrupt priorities: The original writers of this code said:
static void au1x_ic0_unmask(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC0_INT_BASE;
- au_writel(1 << bit, IC0_MASKSET);
- au_writel(1 << bit, IC0_WAKESET);
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
+
+ __raw_writel(1 << bit, base + IC_MASKSET);
+ __raw_writel(1 << bit, base + IC_WAKESET);
+ wmb();
}
static void au1x_ic1_unmask(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC1_INT_BASE;
- au_writel(1 << bit, IC1_MASKSET);
- au_writel(1 << bit, IC1_WAKESET);
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
+
+ __raw_writel(1 << bit, base + IC_MASKSET);
+ __raw_writel(1 << bit, base + IC_WAKESET);
/* very hacky. does the pb1000 cpld auto-disable this int?
* nowhere in the current kernel sources is it disabled. --mlau
*/
#if defined(CONFIG_MIPS_PB1000)
if (d->irq == AU1000_GPIO15_INT)
- au_writel(0x4000, PB1000_MDR); /* enable int */
+ __raw_writel(0x4000, (void __iomem *)PB1000_MDR); /* enable int */
#endif
- au_sync();
+ wmb();
}
static void au1x_ic0_mask(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC0_INT_BASE;
- au_writel(1 << bit, IC0_MASKCLR);
- au_writel(1 << bit, IC0_WAKECLR);
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
+
+ __raw_writel(1 << bit, base + IC_MASKCLR);
+ __raw_writel(1 << bit, base + IC_WAKECLR);
+ wmb();
}
static void au1x_ic1_mask(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC1_INT_BASE;
- au_writel(1 << bit, IC1_MASKCLR);
- au_writel(1 << bit, IC1_WAKECLR);
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
+
+ __raw_writel(1 << bit, base + IC_MASKCLR);
+ __raw_writel(1 << bit, base + IC_WAKECLR);
+ wmb();
}
static void au1x_ic0_ack(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC0_INT_BASE;
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
/*
* This may assume that we don't get interrupts from
* both edges at once, or if we do, that we don't care.
*/
- au_writel(1 << bit, IC0_FALLINGCLR);
- au_writel(1 << bit, IC0_RISINGCLR);
- au_sync();
+ __raw_writel(1 << bit, base + IC_FALLINGCLR);
+ __raw_writel(1 << bit, base + IC_RISINGCLR);
+ wmb();
}
static void au1x_ic1_ack(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC1_INT_BASE;
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
/*
* This may assume that we don't get interrupts from
* both edges at once, or if we do, that we don't care.
*/
- au_writel(1 << bit, IC1_FALLINGCLR);
- au_writel(1 << bit, IC1_RISINGCLR);
- au_sync();
+ __raw_writel(1 << bit, base + IC_FALLINGCLR);
+ __raw_writel(1 << bit, base + IC_RISINGCLR);
+ wmb();
}
static void au1x_ic0_maskack(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC0_INT_BASE;
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
- au_writel(1 << bit, IC0_WAKECLR);
- au_writel(1 << bit, IC0_MASKCLR);
- au_writel(1 << bit, IC0_RISINGCLR);
- au_writel(1 << bit, IC0_FALLINGCLR);
- au_sync();
+ __raw_writel(1 << bit, base + IC_WAKECLR);
+ __raw_writel(1 << bit, base + IC_MASKCLR);
+ __raw_writel(1 << bit, base + IC_RISINGCLR);
+ __raw_writel(1 << bit, base + IC_FALLINGCLR);
+ wmb();
}
static void au1x_ic1_maskack(struct irq_data *d)
{
unsigned int bit = d->irq - AU1000_INTC1_INT_BASE;
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
- au_writel(1 << bit, IC1_WAKECLR);
- au_writel(1 << bit, IC1_MASKCLR);
- au_writel(1 << bit, IC1_RISINGCLR);
- au_writel(1 << bit, IC1_FALLINGCLR);
- au_sync();
+ __raw_writel(1 << bit, base + IC_WAKECLR);
+ __raw_writel(1 << bit, base + IC_MASKCLR);
+ __raw_writel(1 << bit, base + IC_RISINGCLR);
+ __raw_writel(1 << bit, base + IC_FALLINGCLR);
+ wmb();
}
static int au1x_ic1_setwake(struct irq_data *d, unsigned int on)
return -EINVAL;
local_irq_save(flags);
- wakemsk = au_readl(SYS_WAKEMSK);
+ wakemsk = __raw_readl((void __iomem *)SYS_WAKEMSK);
if (on)
wakemsk |= 1 << bit;
else
wakemsk &= ~(1 << bit);
- au_writel(wakemsk, SYS_WAKEMSK);
- au_sync();
+ __raw_writel(wakemsk, (void __iomem *)SYS_WAKEMSK);
+ wmb();
local_irq_restore(flags);
return 0;
static int au1x_ic_settype(struct irq_data *d, unsigned int flow_type)
{
struct irq_chip *chip;
- unsigned long icr[6];
- unsigned int bit, ic, irq = d->irq;
+ unsigned int bit, irq = d->irq;
irq_flow_handler_t handler = NULL;
unsigned char *name = NULL;
+ void __iomem *base;
int ret;
if (irq >= AU1000_INTC1_INT_BASE) {
bit = irq - AU1000_INTC1_INT_BASE;
chip = &au1x_ic1_chip;
- ic = 1;
+ base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
} else {
bit = irq - AU1000_INTC0_INT_BASE;
chip = &au1x_ic0_chip;
- ic = 0;
+ base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
}
if (bit > 31)
return -EINVAL;
- icr[0] = ic ? IC1_CFG0SET : IC0_CFG0SET;
- icr[1] = ic ? IC1_CFG1SET : IC0_CFG1SET;
- icr[2] = ic ? IC1_CFG2SET : IC0_CFG2SET;
- icr[3] = ic ? IC1_CFG0CLR : IC0_CFG0CLR;
- icr[4] = ic ? IC1_CFG1CLR : IC0_CFG1CLR;
- icr[5] = ic ? IC1_CFG2CLR : IC0_CFG2CLR;
-
ret = 0;
switch (flow_type) { /* cfgregs 2:1:0 */
case IRQ_TYPE_EDGE_RISING: /* 0:0:1 */
- au_writel(1 << bit, icr[5]);
- au_writel(1 << bit, icr[4]);
- au_writel(1 << bit, icr[0]);
+ __raw_writel(1 << bit, base + IC_CFG2CLR);
+ __raw_writel(1 << bit, base + IC_CFG1CLR);
+ __raw_writel(1 << bit, base + IC_CFG0SET);
handler = handle_edge_irq;
name = "riseedge";
break;
case IRQ_TYPE_EDGE_FALLING: /* 0:1:0 */
- au_writel(1 << bit, icr[5]);
- au_writel(1 << bit, icr[1]);
- au_writel(1 << bit, icr[3]);
+ __raw_writel(1 << bit, base + IC_CFG2CLR);
+ __raw_writel(1 << bit, base + IC_CFG1SET);
+ __raw_writel(1 << bit, base + IC_CFG0CLR);
handler = handle_edge_irq;
name = "falledge";
break;
case IRQ_TYPE_EDGE_BOTH: /* 0:1:1 */
- au_writel(1 << bit, icr[5]);
- au_writel(1 << bit, icr[1]);
- au_writel(1 << bit, icr[0]);
+ __raw_writel(1 << bit, base + IC_CFG2CLR);
+ __raw_writel(1 << bit, base + IC_CFG1SET);
+ __raw_writel(1 << bit, base + IC_CFG0SET);
handler = handle_edge_irq;
name = "bothedge";
break;
case IRQ_TYPE_LEVEL_HIGH: /* 1:0:1 */
- au_writel(1 << bit, icr[2]);
- au_writel(1 << bit, icr[4]);
- au_writel(1 << bit, icr[0]);
+ __raw_writel(1 << bit, base + IC_CFG2SET);
+ __raw_writel(1 << bit, base + IC_CFG1CLR);
+ __raw_writel(1 << bit, base + IC_CFG0SET);
handler = handle_level_irq;
name = "hilevel";
break;
case IRQ_TYPE_LEVEL_LOW: /* 1:1:0 */
- au_writel(1 << bit, icr[2]);
- au_writel(1 << bit, icr[1]);
- au_writel(1 << bit, icr[3]);
+ __raw_writel(1 << bit, base + IC_CFG2SET);
+ __raw_writel(1 << bit, base + IC_CFG1SET);
+ __raw_writel(1 << bit, base + IC_CFG0CLR);
handler = handle_level_irq;
name = "lowlevel";
break;
case IRQ_TYPE_NONE: /* 0:0:0 */
- au_writel(1 << bit, icr[5]);
- au_writel(1 << bit, icr[4]);
- au_writel(1 << bit, icr[3]);
+ __raw_writel(1 << bit, base + IC_CFG2CLR);
+ __raw_writel(1 << bit, base + IC_CFG1CLR);
+ __raw_writel(1 << bit, base + IC_CFG0CLR);
break;
default:
ret = -EINVAL;
}
__irq_set_chip_handler_name_locked(d->irq, chip, handler, name);
- au_sync();
+ wmb();
return ret;
}
off = MIPS_CPU_IRQ_BASE + 7;
goto handle;
} else if (pending & CAUSEF_IP2) {
- s = IC0_REQ0INT;
+ s = KSEG1ADDR(AU1000_IC0_PHYS_ADDR) + IC_REQ0INT;
off = AU1000_INTC0_INT_BASE;
} else if (pending & CAUSEF_IP3) {
- s = IC0_REQ1INT;
+ s = KSEG1ADDR(AU1000_IC0_PHYS_ADDR) + IC_REQ1INT;
off = AU1000_INTC0_INT_BASE;
} else if (pending & CAUSEF_IP4) {
- s = IC1_REQ0INT;
+ s = KSEG1ADDR(AU1000_IC1_PHYS_ADDR) + IC_REQ0INT;
off = AU1000_INTC1_INT_BASE;
} else if (pending & CAUSEF_IP5) {
- s = IC1_REQ1INT;
+ s = KSEG1ADDR(AU1000_IC1_PHYS_ADDR) + IC_REQ1INT;
off = AU1000_INTC1_INT_BASE;
} else
goto spurious;
- s = au_readl(s);
+ s = __raw_readl((void __iomem *)s);
if (unlikely(!s)) {
spurious:
spurious_interrupt();
do_IRQ(off);
}
+
+static inline void ic_init(void __iomem *base)
+{
+ /* initialize interrupt controller to a safe state */
+ __raw_writel(0xffffffff, base + IC_CFG0CLR);
+ __raw_writel(0xffffffff, base + IC_CFG1CLR);
+ __raw_writel(0xffffffff, base + IC_CFG2CLR);
+ __raw_writel(0xffffffff, base + IC_MASKCLR);
+ __raw_writel(0xffffffff, base + IC_ASSIGNCLR);
+ __raw_writel(0xffffffff, base + IC_WAKECLR);
+ __raw_writel(0xffffffff, base + IC_SRCSET);
+ __raw_writel(0xffffffff, base + IC_FALLINGCLR);
+ __raw_writel(0xffffffff, base + IC_RISINGCLR);
+ __raw_writel(0x00000000, base + IC_TESTBIT);
+ wmb();
+}
+
static void __init au1000_init_irq(struct au1xxx_irqmap *map)
{
unsigned int bit, irq_nr;
- int i;
-
- /*
- * Initialize interrupt controllers to a safe state.
- */
- au_writel(0xffffffff, IC0_CFG0CLR);
- au_writel(0xffffffff, IC0_CFG1CLR);
- au_writel(0xffffffff, IC0_CFG2CLR);
- au_writel(0xffffffff, IC0_MASKCLR);
- au_writel(0xffffffff, IC0_ASSIGNCLR);
- au_writel(0xffffffff, IC0_WAKECLR);
- au_writel(0xffffffff, IC0_SRCSET);
- au_writel(0xffffffff, IC0_FALLINGCLR);
- au_writel(0xffffffff, IC0_RISINGCLR);
- au_writel(0x00000000, IC0_TESTBIT);
-
- au_writel(0xffffffff, IC1_CFG0CLR);
- au_writel(0xffffffff, IC1_CFG1CLR);
- au_writel(0xffffffff, IC1_CFG2CLR);
- au_writel(0xffffffff, IC1_MASKCLR);
- au_writel(0xffffffff, IC1_ASSIGNCLR);
- au_writel(0xffffffff, IC1_WAKECLR);
- au_writel(0xffffffff, IC1_SRCSET);
- au_writel(0xffffffff, IC1_FALLINGCLR);
- au_writel(0xffffffff, IC1_RISINGCLR);
- au_writel(0x00000000, IC1_TESTBIT);
+ void __iomem *base;
+ ic_init((void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR));
+ ic_init((void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR));
mips_cpu_irq_init();
/* register all 64 possible IC0+IC1 irq sources as type "none".
* Use set_irq_type() to set edge/level behaviour at runtime.
*/
- for (i = AU1000_INTC0_INT_BASE;
- (i < AU1000_INTC0_INT_BASE + 32); i++)
- au1x_ic_settype(irq_get_irq_data(i), IRQ_TYPE_NONE);
+ for (irq_nr = AU1000_INTC0_INT_BASE;
+ (irq_nr < AU1000_INTC0_INT_BASE + 32); irq_nr++)
+ au1x_ic_settype(irq_get_irq_data(irq_nr), IRQ_TYPE_NONE);
- for (i = AU1000_INTC1_INT_BASE;
- (i < AU1000_INTC1_INT_BASE + 32); i++)
- au1x_ic_settype(irq_get_irq_data(i), IRQ_TYPE_NONE);
+ for (irq_nr = AU1000_INTC1_INT_BASE;
+ (irq_nr < AU1000_INTC1_INT_BASE + 32); irq_nr++)
+ au1x_ic_settype(irq_get_irq_data(irq_nr), IRQ_TYPE_NONE);
/*
* Initialize IC0, which is fixed per processor.
if (irq_nr >= AU1000_INTC1_INT_BASE) {
bit = irq_nr - AU1000_INTC1_INT_BASE;
- if (map->im_request)
- au_writel(1 << bit, IC1_ASSIGNSET);
+ base = (void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR);
} else {
bit = irq_nr - AU1000_INTC0_INT_BASE;
- if (map->im_request)
- au_writel(1 << bit, IC0_ASSIGNSET);
+ base = (void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR);
}
+ if (map->im_request)
+ __raw_writel(1 << bit, base + IC_ASSIGNSET);
au1x_ic_settype(irq_get_irq_data(irq_nr), map->im_type);
++map;
}
}
-struct alchemy_ic_sysdev {
- struct sys_device sysdev;
- void __iomem *base;
- unsigned long pmdata[7];
-};
-static int alchemy_ic_suspend(struct sys_device *dev, pm_message_t state)
-{
- struct alchemy_ic_sysdev *icdev =
- container_of(dev, struct alchemy_ic_sysdev, sysdev);
+static unsigned long alchemy_ic_pmdata[7 * 2];
- icdev->pmdata[0] = __raw_readl(icdev->base + IC_CFG0RD);
- icdev->pmdata[1] = __raw_readl(icdev->base + IC_CFG1RD);
- icdev->pmdata[2] = __raw_readl(icdev->base + IC_CFG2RD);
- icdev->pmdata[3] = __raw_readl(icdev->base + IC_SRCRD);
- icdev->pmdata[4] = __raw_readl(icdev->base + IC_ASSIGNRD);
- icdev->pmdata[5] = __raw_readl(icdev->base + IC_WAKERD);
- icdev->pmdata[6] = __raw_readl(icdev->base + IC_MASKRD);
-
- return 0;
+static inline void alchemy_ic_suspend_one(void __iomem *base, unsigned long *d)
+{
+ d[0] = __raw_readl(base + IC_CFG0RD);
+ d[1] = __raw_readl(base + IC_CFG1RD);
+ d[2] = __raw_readl(base + IC_CFG2RD);
+ d[3] = __raw_readl(base + IC_SRCRD);
+ d[4] = __raw_readl(base + IC_ASSIGNRD);
+ d[5] = __raw_readl(base + IC_WAKERD);
+ d[6] = __raw_readl(base + IC_MASKRD);
+ ic_init(base); /* shut it up too while at it */
}
-static int alchemy_ic_resume(struct sys_device *dev)
+static inline void alchemy_ic_resume_one(void __iomem *base, unsigned long *d)
{
- struct alchemy_ic_sysdev *icdev =
- container_of(dev, struct alchemy_ic_sysdev, sysdev);
-
- __raw_writel(0xffffffff, icdev->base + IC_MASKCLR);
- __raw_writel(0xffffffff, icdev->base + IC_CFG0CLR);
- __raw_writel(0xffffffff, icdev->base + IC_CFG1CLR);
- __raw_writel(0xffffffff, icdev->base + IC_CFG2CLR);
- __raw_writel(0xffffffff, icdev->base + IC_SRCCLR);
- __raw_writel(0xffffffff, icdev->base + IC_ASSIGNCLR);
- __raw_writel(0xffffffff, icdev->base + IC_WAKECLR);
- __raw_writel(0xffffffff, icdev->base + IC_RISINGCLR);
- __raw_writel(0xffffffff, icdev->base + IC_FALLINGCLR);
- __raw_writel(0x00000000, icdev->base + IC_TESTBIT);
- wmb();
- __raw_writel(icdev->pmdata[0], icdev->base + IC_CFG0SET);
- __raw_writel(icdev->pmdata[1], icdev->base + IC_CFG1SET);
- __raw_writel(icdev->pmdata[2], icdev->base + IC_CFG2SET);
- __raw_writel(icdev->pmdata[3], icdev->base + IC_SRCSET);
- __raw_writel(icdev->pmdata[4], icdev->base + IC_ASSIGNSET);
- __raw_writel(icdev->pmdata[5], icdev->base + IC_WAKESET);
+ ic_init(base);
+
+ __raw_writel(d[0], base + IC_CFG0SET);
+ __raw_writel(d[1], base + IC_CFG1SET);
+ __raw_writel(d[2], base + IC_CFG2SET);
+ __raw_writel(d[3], base + IC_SRCSET);
+ __raw_writel(d[4], base + IC_ASSIGNSET);
+ __raw_writel(d[5], base + IC_WAKESET);
wmb();
- __raw_writel(icdev->pmdata[6], icdev->base + IC_MASKSET);
+ __raw_writel(d[6], base + IC_MASKSET);
wmb();
+}
+static int alchemy_ic_suspend(void)
+{
+ alchemy_ic_suspend_one((void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR),
+ alchemy_ic_pmdata);
+ alchemy_ic_suspend_one((void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR),
+ &alchemy_ic_pmdata[7]);
return 0;
}
-static struct sysdev_class alchemy_ic_sysdev_class = {
- .name = "ic",
+static void alchemy_ic_resume(void)
+{
+ alchemy_ic_resume_one((void __iomem *)KSEG1ADDR(AU1000_IC1_PHYS_ADDR),
+ &alchemy_ic_pmdata[7]);
+ alchemy_ic_resume_one((void __iomem *)KSEG1ADDR(AU1000_IC0_PHYS_ADDR),
+ alchemy_ic_pmdata);
+}
+
+static struct syscore_ops alchemy_ic_syscore_ops = {
.suspend = alchemy_ic_suspend,
.resume = alchemy_ic_resume,
};
-static int __init alchemy_ic_sysdev_init(void)
+static int __init alchemy_ic_pm_init(void)
{
- struct alchemy_ic_sysdev *icdev;
- unsigned long icbase[2] = { IC0_PHYS_ADDR, IC1_PHYS_ADDR };
- int err, i;
-
- err = sysdev_class_register(&alchemy_ic_sysdev_class);
- if (err)
- return err;
-
- for (i = 0; i < 2; i++) {
- icdev = kzalloc(sizeof(struct alchemy_ic_sysdev), GFP_KERNEL);
- if (!icdev)
- return -ENOMEM;
-
- icdev->base = ioremap(icbase[i], 0x1000);
-
- icdev->sysdev.id = i;
- icdev->sysdev.cls = &alchemy_ic_sysdev_class;
- err = sysdev_register(&icdev->sysdev);
- if (err) {
- kfree(icdev);
- return err;
- }
- }
-
+ register_syscore_ops(&alchemy_ic_syscore_ops);
return 0;
}
-device_initcall(alchemy_ic_sysdev_init);
+device_initcall(alchemy_ic_pm_init);
#include <linux/dma-mapping.h>
#include <linux/etherdevice.h>
+#include <linux/init.h>
#include <linux/platform_device.h>
#include <linux/serial_8250.h>
-#include <linux/init.h>
+#include <linux/slab.h>
#include <asm/mach-au1x00/au1xxx.h>
#include <asm/mach-au1x00/au1xxx_dbdma.h>
#ifdef CONFIG_SERIAL_8250
switch (state) {
case 0:
- if ((__raw_readl(port->membase + UART_MOD_CNTRL) & 3) != 3) {
- /* power-on sequence as suggested in the databooks */
- __raw_writel(0, port->membase + UART_MOD_CNTRL);
- wmb();
- __raw_writel(1, port->membase + UART_MOD_CNTRL);
- wmb();
- }
- __raw_writel(3, port->membase + UART_MOD_CNTRL); /* full on */
- wmb();
+ alchemy_uart_enable(CPHYSADDR(port->membase));
serial8250_do_pm(port, state, old_state);
break;
case 3: /* power off */
serial8250_do_pm(port, state, old_state);
- __raw_writel(0, port->membase + UART_MOD_CNTRL);
- wmb();
+ alchemy_uart_disable(CPHYSADDR(port->membase));
break;
default:
serial8250_do_pm(port, state, old_state);
.pm = alchemy_8250_pm, \
}
-static struct plat_serial8250_port au1x00_uart_data[] = {
-#if defined(CONFIG_SOC_AU1000)
- PORT(UART0_PHYS_ADDR, AU1000_UART0_INT),
- PORT(UART1_PHYS_ADDR, AU1000_UART1_INT),
- PORT(UART2_PHYS_ADDR, AU1000_UART2_INT),
- PORT(UART3_PHYS_ADDR, AU1000_UART3_INT),
-#elif defined(CONFIG_SOC_AU1500)
- PORT(UART0_PHYS_ADDR, AU1500_UART0_INT),
- PORT(UART3_PHYS_ADDR, AU1500_UART3_INT),
-#elif defined(CONFIG_SOC_AU1100)
- PORT(UART0_PHYS_ADDR, AU1100_UART0_INT),
- PORT(UART1_PHYS_ADDR, AU1100_UART1_INT),
- PORT(UART3_PHYS_ADDR, AU1100_UART3_INT),
-#elif defined(CONFIG_SOC_AU1550)
- PORT(UART0_PHYS_ADDR, AU1550_UART0_INT),
- PORT(UART1_PHYS_ADDR, AU1550_UART1_INT),
- PORT(UART3_PHYS_ADDR, AU1550_UART3_INT),
-#elif defined(CONFIG_SOC_AU1200)
- PORT(UART0_PHYS_ADDR, AU1200_UART0_INT),
- PORT(UART1_PHYS_ADDR, AU1200_UART1_INT),
-#endif
- { },
+static struct plat_serial8250_port au1x00_uart_data[][4] __initdata = {
+ [ALCHEMY_CPU_AU1000] = {
+ PORT(AU1000_UART0_PHYS_ADDR, AU1000_UART0_INT),
+ PORT(AU1000_UART1_PHYS_ADDR, AU1000_UART1_INT),
+ PORT(AU1000_UART2_PHYS_ADDR, AU1000_UART2_INT),
+ PORT(AU1000_UART3_PHYS_ADDR, AU1000_UART3_INT),
+ },
+ [ALCHEMY_CPU_AU1500] = {
+ PORT(AU1000_UART0_PHYS_ADDR, AU1500_UART0_INT),
+ PORT(AU1000_UART3_PHYS_ADDR, AU1500_UART3_INT),
+ },
+ [ALCHEMY_CPU_AU1100] = {
+ PORT(AU1000_UART0_PHYS_ADDR, AU1100_UART0_INT),
+ PORT(AU1000_UART1_PHYS_ADDR, AU1100_UART1_INT),
+ PORT(AU1000_UART3_PHYS_ADDR, AU1100_UART3_INT),
+ },
+ [ALCHEMY_CPU_AU1550] = {
+ PORT(AU1000_UART0_PHYS_ADDR, AU1550_UART0_INT),
+ PORT(AU1000_UART1_PHYS_ADDR, AU1550_UART1_INT),
+ PORT(AU1000_UART3_PHYS_ADDR, AU1550_UART3_INT),
+ },
+ [ALCHEMY_CPU_AU1200] = {
+ PORT(AU1000_UART0_PHYS_ADDR, AU1200_UART0_INT),
+ PORT(AU1000_UART1_PHYS_ADDR, AU1200_UART1_INT),
+ },
};
static struct platform_device au1xx0_uart_device = {
.name = "serial8250",
.id = PLAT8250_DEV_AU1X00,
- .dev = {
- .platform_data = au1x00_uart_data,
- },
};
+static void __init alchemy_setup_uarts(int ctype)
+{
+ unsigned int uartclk = get_au1x00_uart_baud_base() * 16;
+ int s = sizeof(struct plat_serial8250_port);
+ int c = alchemy_get_uarts(ctype);
+ struct plat_serial8250_port *ports;
+
+ ports = kzalloc(s * (c + 1), GFP_KERNEL);
+ if (!ports) {
+ printk(KERN_INFO "Alchemy: no memory for UART data\n");
+ return;
+ }
+ memcpy(ports, au1x00_uart_data[ctype], s * c);
+ au1xx0_uart_device.dev.platform_data = ports;
+
+ /* Fill up uartclk. */
+ for (s = 0; s < c; s++)
+ ports[s].uartclk = uartclk;
+ if (platform_device_register(&au1xx0_uart_device))
+ printk(KERN_INFO "Alchemy: failed to register UARTs\n");
+}
+
/* OHCI (USB full speed host controller) */
static struct resource au1xxx_usb_ohci_resources[] = {
[0] = {
static struct resource au1200_mmc0_resources[] = {
[0] = {
- .start = SD0_PHYS_ADDR,
- .end = SD0_PHYS_ADDR + 0x7ffff,
+ .start = AU1100_SD0_PHYS_ADDR,
+ .end = AU1100_SD0_PHYS_ADDR + 0xfff,
.flags = IORESOURCE_MEM,
},
[1] = {
#ifndef CONFIG_MIPS_DB1200
static struct resource au1200_mmc1_resources[] = {
[0] = {
- .start = SD1_PHYS_ADDR,
- .end = SD1_PHYS_ADDR + 0x7ffff,
+ .start = AU1100_SD1_PHYS_ADDR,
+ .end = AU1100_SD1_PHYS_ADDR + 0xfff,
.flags = IORESOURCE_MEM,
},
[1] = {
#endif
/* Macro to help defining the Ethernet MAC resources */
+#define MAC_RES_COUNT 3 /* MAC regs base, MAC enable reg, MAC INT */
#define MAC_RES(_base, _enable, _irq) \
{ \
- .start = CPHYSADDR(_base), \
- .end = CPHYSADDR(_base + 0xffff), \
+ .start = _base, \
+ .end = _base + 0xffff, \
.flags = IORESOURCE_MEM, \
}, \
{ \
- .start = CPHYSADDR(_enable), \
- .end = CPHYSADDR(_enable + 0x3), \
+ .start = _enable, \
+ .end = _enable + 0x3, \
.flags = IORESOURCE_MEM, \
}, \
{ \
.flags = IORESOURCE_IRQ \
}
-static struct resource au1xxx_eth0_resources[] = {
-#if defined(CONFIG_SOC_AU1000)
- MAC_RES(AU1000_ETH0_BASE, AU1000_MAC0_ENABLE, AU1000_MAC0_DMA_INT),
-#elif defined(CONFIG_SOC_AU1100)
- MAC_RES(AU1100_ETH0_BASE, AU1100_MAC0_ENABLE, AU1100_MAC0_DMA_INT),
-#elif defined(CONFIG_SOC_AU1550)
- MAC_RES(AU1550_ETH0_BASE, AU1550_MAC0_ENABLE, AU1550_MAC0_DMA_INT),
-#elif defined(CONFIG_SOC_AU1500)
- MAC_RES(AU1500_ETH0_BASE, AU1500_MAC0_ENABLE, AU1500_MAC0_DMA_INT),
-#endif
+static struct resource au1xxx_eth0_resources[][MAC_RES_COUNT] __initdata = {
+ [ALCHEMY_CPU_AU1000] = {
+ MAC_RES(AU1000_MAC0_PHYS_ADDR,
+ AU1000_MACEN_PHYS_ADDR,
+ AU1000_MAC0_DMA_INT)
+ },
+ [ALCHEMY_CPU_AU1500] = {
+ MAC_RES(AU1500_MAC0_PHYS_ADDR,
+ AU1500_MACEN_PHYS_ADDR,
+ AU1500_MAC0_DMA_INT)
+ },
+ [ALCHEMY_CPU_AU1100] = {
+ MAC_RES(AU1000_MAC0_PHYS_ADDR,
+ AU1000_MACEN_PHYS_ADDR,
+ AU1100_MAC0_DMA_INT)
+ },
+ [ALCHEMY_CPU_AU1550] = {
+ MAC_RES(AU1000_MAC0_PHYS_ADDR,
+ AU1000_MACEN_PHYS_ADDR,
+ AU1550_MAC0_DMA_INT)
+ },
};
-
static struct au1000_eth_platform_data au1xxx_eth0_platform_data = {
.phy1_search_mac0 = 1,
};
static struct platform_device au1xxx_eth0_device = {
.name = "au1000-eth",
.id = 0,
- .num_resources = ARRAY_SIZE(au1xxx_eth0_resources),
- .resource = au1xxx_eth0_resources,
+ .num_resources = MAC_RES_COUNT,
.dev.platform_data = &au1xxx_eth0_platform_data,
};
-#ifndef CONFIG_SOC_AU1100
-static struct resource au1xxx_eth1_resources[] = {
-#if defined(CONFIG_SOC_AU1000)
- MAC_RES(AU1000_ETH1_BASE, AU1000_MAC1_ENABLE, AU1000_MAC1_DMA_INT),
-#elif defined(CONFIG_SOC_AU1550)
- MAC_RES(AU1550_ETH1_BASE, AU1550_MAC1_ENABLE, AU1550_MAC1_DMA_INT),
-#elif defined(CONFIG_SOC_AU1500)
- MAC_RES(AU1500_ETH1_BASE, AU1500_MAC1_ENABLE, AU1500_MAC1_DMA_INT),
-#endif
+static struct resource au1xxx_eth1_resources[][MAC_RES_COUNT] __initdata = {
+ [ALCHEMY_CPU_AU1000] = {
+ MAC_RES(AU1000_MAC1_PHYS_ADDR,
+ AU1000_MACEN_PHYS_ADDR + 4,
+ AU1000_MAC1_DMA_INT)
+ },
+ [ALCHEMY_CPU_AU1500] = {
+ MAC_RES(AU1500_MAC1_PHYS_ADDR,
+ AU1500_MACEN_PHYS_ADDR + 4,
+ AU1500_MAC1_DMA_INT)
+ },
+ [ALCHEMY_CPU_AU1550] = {
+ MAC_RES(AU1000_MAC1_PHYS_ADDR,
+ AU1000_MACEN_PHYS_ADDR + 4,
+ AU1550_MAC1_DMA_INT)
+ },
};
static struct au1000_eth_platform_data au1xxx_eth1_platform_data = {
static struct platform_device au1xxx_eth1_device = {
.name = "au1000-eth",
.id = 1,
- .num_resources = ARRAY_SIZE(au1xxx_eth1_resources),
- .resource = au1xxx_eth1_resources,
+ .num_resources = MAC_RES_COUNT,
.dev.platform_data = &au1xxx_eth1_platform_data,
};
-#endif
void __init au1xxx_override_eth_cfg(unsigned int port,
struct au1000_eth_platform_data *eth_data)
if (port == 0)
memcpy(&au1xxx_eth0_platform_data, eth_data,
sizeof(struct au1000_eth_platform_data));
-#ifndef CONFIG_SOC_AU1100
else
memcpy(&au1xxx_eth1_platform_data, eth_data,
sizeof(struct au1000_eth_platform_data));
-#endif
+}
+
+static void __init alchemy_setup_macs(int ctype)
+{
+ int ret, i;
+ unsigned char ethaddr[6];
+ struct resource *macres;
+
+ /* Handle 1st MAC */
+ if (alchemy_get_macs(ctype) < 1)
+ return;
+
+ macres = kmalloc(sizeof(struct resource) * MAC_RES_COUNT, GFP_KERNEL);
+ if (!macres) {
+ printk(KERN_INFO "Alchemy: no memory for MAC0 resources\n");
+ return;
+ }
+ memcpy(macres, au1xxx_eth0_resources[ctype],
+ sizeof(struct resource) * MAC_RES_COUNT);
+ au1xxx_eth0_device.resource = macres;
+
+ i = prom_get_ethernet_addr(ethaddr);
+ if (!i && !is_valid_ether_addr(au1xxx_eth0_platform_data.mac))
+ memcpy(au1xxx_eth0_platform_data.mac, ethaddr, 6);
+
+ ret = platform_device_register(&au1xxx_eth0_device);
+ if (!ret)
+ printk(KERN_INFO "Alchemy: failed to register MAC0\n");
+
+
+ /* Handle 2nd MAC */
+ if (alchemy_get_macs(ctype) < 2)
+ return;
+
+ macres = kmalloc(sizeof(struct resource) * MAC_RES_COUNT, GFP_KERNEL);
+ if (!macres) {
+ printk(KERN_INFO "Alchemy: no memory for MAC1 resources\n");
+ return;
+ }
+ memcpy(macres, au1xxx_eth1_resources[ctype],
+ sizeof(struct resource) * MAC_RES_COUNT);
+ au1xxx_eth1_device.resource = macres;
+
+ ethaddr[5] += 1; /* next addr for 2nd MAC */
+ if (!i && !is_valid_ether_addr(au1xxx_eth1_platform_data.mac))
+ memcpy(au1xxx_eth1_platform_data.mac, ethaddr, 6);
+
+ /* Register second MAC if enabled in pinfunc */
+ if (!(au_readl(SYS_PINFUNC) & (u32)SYS_PF_NI2)) {
+ ret = platform_device_register(&au1xxx_eth1_device);
+ if (ret)
+ printk(KERN_INFO "Alchemy: failed to register MAC1\n");
+ }
}
static struct platform_device *au1xxx_platform_devices[] __initdata = {
- &au1xx0_uart_device,
&au1xxx_usb_ohci_device,
#ifdef CONFIG_FB_AU1100
&au1100_lcd_device,
#ifdef SMBUS_PSC_BASE
&pbdb_smbus_device,
#endif
- &au1xxx_eth0_device,
};
static int __init au1xxx_platform_init(void)
{
- unsigned int uartclk = get_au1x00_uart_baud_base() * 16;
- int err, i;
- unsigned char ethaddr[6];
+ int err, ctype = alchemy_get_cputype();
- /* Fill up uartclk. */
- for (i = 0; au1x00_uart_data[i].flags; i++)
- au1x00_uart_data[i].uartclk = uartclk;
-
- /* use firmware-provided mac addr if available and necessary */
- i = prom_get_ethernet_addr(ethaddr);
- if (!i && !is_valid_ether_addr(au1xxx_eth0_platform_data.mac))
- memcpy(au1xxx_eth0_platform_data.mac, ethaddr, 6);
+ alchemy_setup_uarts(ctype);
+ alchemy_setup_macs(ctype);
err = platform_add_devices(au1xxx_platform_devices,
ARRAY_SIZE(au1xxx_platform_devices));
-#ifndef CONFIG_SOC_AU1100
- ethaddr[5] += 1; /* next addr for 2nd MAC */
- if (!i && !is_valid_ether_addr(au1xxx_eth1_platform_data.mac))
- memcpy(au1xxx_eth1_platform_data.mac, ethaddr, 6);
-
- /* Register second MAC if enabled in pinfunc */
- if (!err && !(au_readl(SYS_PINFUNC) & (u32)SYS_PF_NI2))
- err = platform_device_register(&au1xxx_eth1_device);
-#endif
-
return err;
}
/* this is faster than wasting cycles trying to approximate it */
preset_lpj = (est_freq >> 1) / HZ;
- board_setup(); /* board specific setup */
-
if (au1xxx_cpu_needs_config_od())
/* Various early Au1xx0 errata corrected by this */
set_c0_config(1 << 19); /* Set Config[OD] */
/* Clear to obtain best system bus performance */
clear_c0_config(1 << 19); /* Clear Config[OD] */
+ board_setup(); /* board specific setup */
+
/* IO/MEM resources. */
set_io_port_base(0);
ioport_resource.start = IOPORT_RESOURCE_START;
goto cntr_err;
/* register counter1 clocksource and event device */
- clocksource_set_clock(&au1x_counter1_clocksource, 32768);
- clocksource_register(&au1x_counter1_clocksource);
+ clocksource_register_hz(&au1x_counter1_clocksource, 32768);
cd->shift = 32;
cd->mult = div_sc(32768, NSEC_PER_SEC, cd->shift);
unsigned long freq0, clksrc, div, pfc;
unsigned short whoami;
+ /* Set Config[OD] (disable overlapping bus transaction):
+ * This gets rid of a _lot_ of spurious interrupts (especially
+ * wrt. IDE); but incurs ~10% performance hit in some
+ * cpu-bound applications.
+ */
+ set_c0_config(1 << 19);
+
bcsr_init(DB1200_BCSR_PHYS_ADDR,
DB1200_BCSR_PHYS_ADDR + DB1200_BCSR_HEXLED_OFS);
void __init board_setup(void)
{
unsigned long bcsr1, bcsr2;
- u32 pin_func;
bcsr1 = DB1000_BCSR_PHYS_ADDR;
bcsr2 = DB1000_BCSR_PHYS_ADDR + DB1000_BCSR_HEXLED_OFS;
- pin_func = 0;
-
#ifdef CONFIG_MIPS_DB1000
printk(KERN_INFO "AMD Alchemy Au1000/Db1000 Board\n");
#endif
/* Not valid for Au1550 */
#if defined(CONFIG_IRDA) && \
(defined(CONFIG_SOC_AU1000) || defined(CONFIG_SOC_AU1100))
- /* Set IRFIRSEL instead of GPIO15 */
- pin_func = au_readl(SYS_PINFUNC) | SYS_PF_IRF;
- au_writel(pin_func, SYS_PINFUNC);
- /* Power off until the driver is in use */
- bcsr_mod(BCSR_RESETS, BCSR_RESETS_IRDA_MODE_MASK,
- BCSR_RESETS_IRDA_MODE_OFF);
+ {
+ u32 pin_func;
+
+ /* Set IRFIRSEL instead of GPIO15 */
+ pin_func = au_readl(SYS_PINFUNC) | SYS_PF_IRF;
+ au_writel(pin_func, SYS_PINFUNC);
+ /* Power off until the driver is in use */
+ bcsr_mod(BCSR_RESETS, BCSR_RESETS_IRDA_MODE_MASK,
+ BCSR_RESETS_IRDA_MODE_OFF);
+ }
#endif
bcsr_write(BCSR_PCMCIA, 0); /* turn off PCMCIA power */
alchemy_gpio1_input_enable();
#ifdef CONFIG_MIPS_MIRAGE
- /* GPIO[20] is output */
- alchemy_gpio_direction_output(20, 0);
+ {
+ u32 pin_func;
- /* Set GPIO[210:208] instead of SSI_0 */
- pin_func = au_readl(SYS_PINFUNC) | SYS_PF_S0;
+ /* GPIO[20] is output */
+ alchemy_gpio_direction_output(20, 0);
- /* Set GPIO[215:211] for LEDs */
- pin_func |= 5 << 2;
+ /* Set GPIO[210:208] instead of SSI_0 */
+ pin_func = au_readl(SYS_PINFUNC) | SYS_PF_S0;
- /* Set GPIO[214:213] for more LEDs */
- pin_func |= 5 << 12;
+ /* Set GPIO[215:211] for LEDs */
+ pin_func |= 5 << 2;
- /* Set GPIO[207:200] instead of PCMCIA/LCD */
- pin_func |= SYS_PF_LCD | SYS_PF_PC;
- au_writel(pin_func, SYS_PINFUNC);
+ /* Set GPIO[214:213] for more LEDs */
+ pin_func |= 5 << 12;
- /*
- * Enable speaker amplifier. This should
- * be part of the audio driver.
- */
- alchemy_gpio_direction_output(209, 1);
+ /* Set GPIO[207:200] instead of PCMCIA/LCD */
+ pin_func |= SYS_PF_LCD | SYS_PF_PC;
+ au_writel(pin_func, SYS_PINFUNC);
- pm_power_off = mirage_power_off;
- _machine_halt = mirage_power_off;
- _machine_restart = (void(*)(char *))mips_softreset;
+ /*
+ * Enable speaker amplifier. This should
+ * be part of the audio driver.
+ */
+ alchemy_gpio_direction_output(209, 1);
+
+ pm_power_off = mirage_power_off;
+ _machine_halt = mirage_power_off;
+ _machine_restart = (void(*)(char *))mips_softreset;
+ }
#endif
#ifdef CONFIG_MIPS_BOSPORUS
/* Set AUX clock to 12 MHz * 8 = 96 MHz */
au_writel(8, SYS_AUXPLL);
- au_writel(0, SYS_PINSTATERD);
+ alchemy_gpio1_input_enable();
udelay(100);
#if defined(CONFIG_USB_OHCI_HCD) || defined(CONFIG_USB_OHCI_HCD_MODULE)
sys_clksrc = sys_freqctrl = pin_func = 0;
/* Set AUX clock to 12 MHz * 8 = 96 MHz */
au_writel(8, SYS_AUXPLL);
- au_writel(0, SYS_PINSTATERD);
+ alchemy_gpio1_input_enable();
udelay(100);
/* GPIO201 is input for PCMCIA card detect */
void prom_putchar(unsigned char c)
{
- alchemy_uart_putchar(UART0_PHYS_ADDR, c);
+ alchemy_uart_putchar(AU1000_UART0_PHYS_ADDR, c);
}
#include <prom.h>
-#define UART1_ADDR KSEG1ADDR(UART1_PHYS_ADDR)
-#define UART3_ADDR KSEG1ADDR(UART3_PHYS_ADDR)
-
char irq_tab_alchemy[][5] __initdata = {
[0] = { -1, AU1500_PCI_INTA, AU1500_PCI_INTB, 0xff, 0xff },
};
void __init board_setup(void)
{
- printk(KERN_INFO "Tarpeze ITS GPR board\n");
+ printk(KERN_INFO "Trapeze ITS GPR board\n");
pm_power_off = gpr_power_off;
_machine_halt = gpr_power_off;
_machine_restart = gpr_reset;
- /* Enable UART3 */
- au_writel(0x1, UART3_ADDR + UART_MOD_CNTRL);/* clock enable (CE) */
- au_writel(0x3, UART3_ADDR + UART_MOD_CNTRL); /* CE and "enable" */
- /* Enable UART1 */
- au_writel(0x1, UART1_ADDR + UART_MOD_CNTRL); /* clock enable (CE) */
- au_writel(0x3, UART1_ADDR + UART_MOD_CNTRL); /* CE and "enable" */
+ /* Enable UART1/3 */
+ alchemy_uart_enable(AU1000_UART3_PHYS_ADDR);
+ alchemy_uart_enable(AU1000_UART1_PHYS_ADDR);
/* Take away Reset of UMTS-card */
alchemy_gpio_direction_output(215, 1);
void prom_putchar(unsigned char c)
{
- alchemy_uart_putchar(UART0_PHYS_ADDR, c);
+ alchemy_uart_putchar(AU1000_UART0_PHYS_ADDR, c);
}
au_writel(SYS_PF_NI2, SYS_PINFUNC);
/* Initialize GPIO */
- au_writel(0xFFFFFFFF, SYS_TRIOUTCLR);
+ au_writel(~0, KSEG1ADDR(AU1000_SYS_PHYS_ADDR) + SYS_TRIOUTCLR);
alchemy_gpio_direction_output(0, 0); /* Disable M66EN (PCI 66MHz) */
alchemy_gpio_direction_output(3, 1); /* Disable PCI CLKRUN# */
alchemy_gpio_direction_output(1, 1); /* Enable EXT_IO3 */
void prom_putchar(unsigned char c)
{
- alchemy_uart_putchar(UART0_PHYS_ADDR, c);
+ alchemy_uart_putchar(AU1000_UART0_PHYS_ADDR, c);
}
static struct resource mtx1_wdt_res[] = {
[0] = {
- .start = 15,
- .end = 15,
+ .start = 215,
+ .end = 215,
.name = "mtx1-wdt-gpio",
.flags = IORESOURCE_IRQ,
}
au_writel(pin_func, SYS_PINFUNC);
/* Enable UART */
- au_writel(0x01, UART3_ADDR + UART_MOD_CNTRL); /* clock enable (CE) */
- mdelay(10);
- au_writel(0x03, UART3_ADDR + UART_MOD_CNTRL); /* CE and "enable" */
- mdelay(10);
-
- /* Enable DTR = USB power up */
- au_writel(0x01, UART3_ADDR + UART_MCR); /* UART_MCR_DTR is 0x01??? */
+ alchemy_uart_enable(AU1000_UART3_PHYS_ADDR);
+ /* Enable DTR (MCR bit 0) = USB power up */
+ __raw_writel(1, (void __iomem *)KSEG1ADDR(AU1000_UART3_PHYS_ADDR + 0x18));
+ wmb();
#ifdef CONFIG_PCI
#if defined(__MIPSEB__)
prom_init_cmdline();
memsize_str = prom_getenv("memsize");
- if (!memsize_str)
+ if (!memsize_str || strict_strtoul(memsize_str, 0, &memsize))
memsize = 0x04000000;
- else
- strict_strtoul(memsize_str, 0, &memsize);
+
add_memory_region(0, memsize, BOOT_MEM_RAM);
}
void prom_putchar(unsigned char c)
{
- alchemy_uart_putchar(UART0_PHYS_ADDR, c);
+ alchemy_uart_putchar(AU1000_UART0_PHYS_ADDR, c);
}
size = 0x1f;
}
- gpch->regs = ioremap_nocache(AR7_REGS_GPIO,
- AR7_REGS_GPIO + 0x10);
-
+ gpch->regs = ioremap_nocache(AR7_REGS_GPIO, size);
if (!gpch->regs) {
printk(KERN_ERR "%s: failed to ioremap regs\n",
gpch->chip.label);
*
* Copyright (C) 2005 Broadcom Corporation
* Copyright (C) 2006 Felix Fietkau <nbd@openwrt.org>
+ * Copyright (C) 2010-2011 Hauke Mehrtens <hauke@hauke-m.de>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
static char nvram_buf[NVRAM_SPACE];
/* Probe for NVRAM header */
-static void __init early_nvram_init(void)
+static void early_nvram_init(void)
{
struct ssb_mipscore *mcore = &ssb_bcm47xx.mipscore;
struct nvram_header *header;
* Copyright (C) 2006 Felix Fietkau <nbd@openwrt.org>
* Copyright (C) 2006 Michael Buesch <mb@bu3sch.de>
* Copyright (C) 2010 Waldemar Brodkorb <wbx@openadk.org>
+ * Copyright (C) 2010-2011 Hauke Mehrtens <hauke@hauke-m.de>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
}
#define READ_FROM_NVRAM(_outvar, name, buf) \
- if (nvram_getenv(name, buf, sizeof(buf)) >= 0)\
+ if (nvram_getprefix(prefix, name, buf, sizeof(buf)) >= 0)\
sprom->_outvar = simple_strtoul(buf, NULL, 0);
-static void bcm47xx_fill_sprom(struct ssb_sprom *sprom)
+#define READ_FROM_NVRAM2(_outvar, name1, name2, buf) \
+ if (nvram_getprefix(prefix, name1, buf, sizeof(buf)) >= 0 || \
+ nvram_getprefix(prefix, name2, buf, sizeof(buf)) >= 0)\
+ sprom->_outvar = simple_strtoul(buf, NULL, 0);
+
+static inline int nvram_getprefix(const char *prefix, char *name,
+ char *buf, int len)
+{
+ if (prefix) {
+ char key[100];
+
+ snprintf(key, sizeof(key), "%s%s", prefix, name);
+ return nvram_getenv(key, buf, len);
+ }
+
+ return nvram_getenv(name, buf, len);
+}
+
+static u32 nvram_getu32(const char *name, char *buf, int len)
+{
+ int rv;
+ char key[100];
+ u16 var0, var1;
+
+ snprintf(key, sizeof(key), "%s0", name);
+ rv = nvram_getenv(key, buf, len);
+ /* return 0 here so this looks like unset */
+ if (rv < 0)
+ return 0;
+ var0 = simple_strtoul(buf, NULL, 0);
+
+ snprintf(key, sizeof(key), "%s1", name);
+ rv = nvram_getenv(key, buf, len);
+ if (rv < 0)
+ return 0;
+ var1 = simple_strtoul(buf, NULL, 0);
+ return var1 << 16 | var0;
+}
+
+static void bcm47xx_fill_sprom(struct ssb_sprom *sprom, const char *prefix)
{
char buf[100];
u32 boardflags;
sprom->revision = 1; /* Fallback: Old hardware does not define this. */
READ_FROM_NVRAM(revision, "sromrev", buf);
- if (nvram_getenv("il0macaddr", buf, sizeof(buf)) >= 0)
+ if (nvram_getprefix(prefix, "il0macaddr", buf, sizeof(buf)) >= 0 ||
+ nvram_getprefix(prefix, "macaddr", buf, sizeof(buf)) >= 0)
nvram_parse_macaddr(buf, sprom->il0mac);
- if (nvram_getenv("et0macaddr", buf, sizeof(buf)) >= 0)
+ if (nvram_getprefix(prefix, "et0macaddr", buf, sizeof(buf)) >= 0)
nvram_parse_macaddr(buf, sprom->et0mac);
- if (nvram_getenv("et1macaddr", buf, sizeof(buf)) >= 0)
+ if (nvram_getprefix(prefix, "et1macaddr", buf, sizeof(buf)) >= 0)
nvram_parse_macaddr(buf, sprom->et1mac);
READ_FROM_NVRAM(et0phyaddr, "et0phyaddr", buf);
READ_FROM_NVRAM(et1phyaddr, "et1phyaddr", buf);
READ_FROM_NVRAM(pa1hib0, "pa1hib0", buf);
READ_FROM_NVRAM(pa1hib2, "pa1hib1", buf);
READ_FROM_NVRAM(pa1hib1, "pa1hib2", buf);
- READ_FROM_NVRAM(gpio0, "wl0gpio0", buf);
- READ_FROM_NVRAM(gpio1, "wl0gpio1", buf);
- READ_FROM_NVRAM(gpio2, "wl0gpio2", buf);
- READ_FROM_NVRAM(gpio3, "wl0gpio3", buf);
- READ_FROM_NVRAM(maxpwr_bg, "pa0maxpwr", buf);
- READ_FROM_NVRAM(maxpwr_al, "pa1lomaxpwr", buf);
- READ_FROM_NVRAM(maxpwr_a, "pa1maxpwr", buf);
- READ_FROM_NVRAM(maxpwr_ah, "pa1himaxpwr", buf);
- READ_FROM_NVRAM(itssi_a, "pa1itssit", buf);
- READ_FROM_NVRAM(itssi_bg, "pa0itssit", buf);
+ READ_FROM_NVRAM2(gpio0, "ledbh0", "wl0gpio0", buf);
+ READ_FROM_NVRAM2(gpio1, "ledbh1", "wl0gpio1", buf);
+ READ_FROM_NVRAM2(gpio2, "ledbh2", "wl0gpio2", buf);
+ READ_FROM_NVRAM2(gpio3, "ledbh3", "wl0gpio3", buf);
+ READ_FROM_NVRAM2(maxpwr_bg, "maxp2ga0", "pa0maxpwr", buf);
+ READ_FROM_NVRAM2(maxpwr_al, "maxp5gla0", "pa1lomaxpwr", buf);
+ READ_FROM_NVRAM2(maxpwr_a, "maxp5ga0", "pa1maxpwr", buf);
+ READ_FROM_NVRAM2(maxpwr_ah, "maxp5gha0", "pa1himaxpwr", buf);
+ READ_FROM_NVRAM2(itssi_bg, "itt5ga0", "pa0itssit", buf);
+ READ_FROM_NVRAM2(itssi_a, "itt2ga0", "pa1itssit", buf);
READ_FROM_NVRAM(tri2g, "tri2g", buf);
READ_FROM_NVRAM(tri5gl, "tri5gl", buf);
READ_FROM_NVRAM(tri5g, "tri5g", buf);
READ_FROM_NVRAM(tri5gh, "tri5gh", buf);
+ READ_FROM_NVRAM(txpid2g[0], "txpid2ga0", buf);
+ READ_FROM_NVRAM(txpid2g[1], "txpid2ga1", buf);
+ READ_FROM_NVRAM(txpid2g[2], "txpid2ga2", buf);
+ READ_FROM_NVRAM(txpid2g[3], "txpid2ga3", buf);
+ READ_FROM_NVRAM(txpid5g[0], "txpid5ga0", buf);
+ READ_FROM_NVRAM(txpid5g[1], "txpid5ga1", buf);
+ READ_FROM_NVRAM(txpid5g[2], "txpid5ga2", buf);
+ READ_FROM_NVRAM(txpid5g[3], "txpid5ga3", buf);
+ READ_FROM_NVRAM(txpid5gl[0], "txpid5gla0", buf);
+ READ_FROM_NVRAM(txpid5gl[1], "txpid5gla1", buf);
+ READ_FROM_NVRAM(txpid5gl[2], "txpid5gla2", buf);
+ READ_FROM_NVRAM(txpid5gl[3], "txpid5gla3", buf);
+ READ_FROM_NVRAM(txpid5gh[0], "txpid5gha0", buf);
+ READ_FROM_NVRAM(txpid5gh[1], "txpid5gha1", buf);
+ READ_FROM_NVRAM(txpid5gh[2], "txpid5gha2", buf);
+ READ_FROM_NVRAM(txpid5gh[3], "txpid5gha3", buf);
READ_FROM_NVRAM(rxpo2g, "rxpo2g", buf);
READ_FROM_NVRAM(rxpo5g, "rxpo5g", buf);
READ_FROM_NVRAM(rssisav2g, "rssisav2g", buf);
READ_FROM_NVRAM(rssismf5g, "rssismf5g", buf);
READ_FROM_NVRAM(bxa5g, "bxa5g", buf);
READ_FROM_NVRAM(cck2gpo, "cck2gpo", buf);
- READ_FROM_NVRAM(ofdm2gpo, "ofdm2gpo", buf);
- READ_FROM_NVRAM(ofdm5glpo, "ofdm5glpo", buf);
- READ_FROM_NVRAM(ofdm5gpo, "ofdm5gpo", buf);
- READ_FROM_NVRAM(ofdm5ghpo, "ofdm5ghpo", buf);
- if (nvram_getenv("boardflags", buf, sizeof(buf)) >= 0) {
+ sprom->ofdm2gpo = nvram_getu32("ofdm2gpo", buf, sizeof(buf));
+ sprom->ofdm5glpo = nvram_getu32("ofdm5glpo", buf, sizeof(buf));
+ sprom->ofdm5gpo = nvram_getu32("ofdm5gpo", buf, sizeof(buf));
+ sprom->ofdm5ghpo = nvram_getu32("ofdm5ghpo", buf, sizeof(buf));
+
+ READ_FROM_NVRAM(antenna_gain.ghz24.a0, "ag0", buf);
+ READ_FROM_NVRAM(antenna_gain.ghz24.a1, "ag1", buf);
+ READ_FROM_NVRAM(antenna_gain.ghz24.a2, "ag2", buf);
+ READ_FROM_NVRAM(antenna_gain.ghz24.a3, "ag3", buf);
+ memcpy(&sprom->antenna_gain.ghz5, &sprom->antenna_gain.ghz24,
+ sizeof(sprom->antenna_gain.ghz5));
+
+ if (nvram_getprefix(prefix, "boardflags", buf, sizeof(buf)) >= 0) {
boardflags = simple_strtoul(buf, NULL, 0);
if (boardflags) {
sprom->boardflags_lo = (boardflags & 0x0000FFFFU);
sprom->boardflags_hi = (boardflags & 0xFFFF0000U) >> 16;
}
}
- if (nvram_getenv("boardflags2", buf, sizeof(buf)) >= 0) {
+ if (nvram_getprefix(prefix, "boardflags2", buf, sizeof(buf)) >= 0) {
boardflags = simple_strtoul(buf, NULL, 0);
if (boardflags) {
sprom->boardflags2_lo = (boardflags & 0x0000FFFFU);
}
}
+int bcm47xx_get_sprom(struct ssb_bus *bus, struct ssb_sprom *out)
+{
+ char prefix[10];
+
+ if (bus->bustype == SSB_BUSTYPE_PCI) {
+ snprintf(prefix, sizeof(prefix), "pci/%u/%u/",
+ bus->host_pci->bus->number + 1,
+ PCI_SLOT(bus->host_pci->devfn));
+ bcm47xx_fill_sprom(out, prefix);
+ return 0;
+ } else {
+ printk(KERN_WARNING "bcm47xx: unable to fill SPROM for given bustype.\n");
+ return -EINVAL;
+ }
+}
+
static int bcm47xx_get_invariants(struct ssb_bus *bus,
struct ssb_init_invariants *iv)
{
if (nvram_getenv("boardrev", buf, sizeof(buf)) >= 0)
iv->boardinfo.rev = (u16)simple_strtoul(buf, NULL, 0);
- bcm47xx_fill_sprom(&iv->sprom);
+ bcm47xx_fill_sprom(&iv->sprom, NULL);
if (nvram_getenv("cardbus", buf, sizeof(buf)) >= 0)
iv->has_cardbus_slot = !!simple_strtoul(buf, NULL, 10);
char buf[100];
struct ssb_mipscore *mcore;
+ err = ssb_arch_register_fallback_sprom(&bcm47xx_get_sprom);
+ if (err)
+ printk(KERN_WARNING "bcm47xx: someone else already registered"
+ " a ssb SPROM callback handler (err %d)\n", err);
+
err = ssb_bus_ssbbus_register(&ssb_bcm47xx, SSB_ENUM_BASE,
bcm47xx_get_invariants);
if (err)
.boardflags_lo = 0x2848,
.boardflags_hi = 0x0000,
};
+
+int bcm63xx_get_fallback_sprom(struct ssb_bus *bus, struct ssb_sprom *out)
+{
+ if (bus->bustype == SSB_BUSTYPE_PCI) {
+ memcpy(out, &bcm63xx_sprom, sizeof(struct ssb_sprom));
+ return 0;
+ } else {
+ printk(KERN_ERR PFX "unable to fill SPROM for given bustype.\n");
+ return -EINVAL;
+ }
+}
#endif
/*
if (!board_get_mac_address(bcm63xx_sprom.il0mac)) {
memcpy(bcm63xx_sprom.et0mac, bcm63xx_sprom.il0mac, ETH_ALEN);
memcpy(bcm63xx_sprom.et1mac, bcm63xx_sprom.il0mac, ETH_ALEN);
- if (ssb_arch_set_fallback_sprom(&bcm63xx_sprom) < 0)
- printk(KERN_ERR "failed to register fallback SPROM\n");
+ if (ssb_arch_register_fallback_sprom(
+ &bcm63xx_get_fallback_sprom) < 0)
+ printk(KERN_ERR PFX "failed to register fallback SPROM\n");
}
#endif
}
int main(int argc, char *argv[])
{
+ unsigned long long vmlinux_size, vmlinux_load_addr, vmlinuz_load_addr;
struct stat sb;
- uint64_t vmlinux_size, vmlinux_load_addr, vmlinuz_load_addr;
if (argc != 3) {
fprintf(stderr, "Usage: %s <pathname> <vmlinux_load_addr>\n",
void putc(char c)
{
/* all current (Jan. 2010) in-kernel boards */
- alchemy_uart_putchar(UART0_PHYS_ADDR, c);
+ alchemy_uart_putchar(AU1000_UART0_PHYS_ADDR, c);
}
-config CAVIUM_OCTEON_SPECIFIC_OPTIONS
- bool "Enable Octeon specific options"
- depends on CPU_CAVIUM_OCTEON
- default "y"
+if CPU_CAVIUM_OCTEON
config CAVIUM_CN63XXP1
bool "Enable CN63XXP1 errata worarounds"
- depends on CAVIUM_OCTEON_SPECIFIC_OPTIONS
default "n"
help
The CN63XXP1 chip requires build time workarounds to
config CAVIUM_OCTEON_2ND_KERNEL
bool "Build the kernel to be used as a 2nd kernel on the same chip"
- depends on CAVIUM_OCTEON_SPECIFIC_OPTIONS
default "n"
help
This option configures this kernel to be linked at a different
config CAVIUM_OCTEON_HW_FIX_UNALIGNED
bool "Enable hardware fixups of unaligned loads and stores"
- depends on CAVIUM_OCTEON_SPECIFIC_OPTIONS
default "y"
help
Configure the Octeon hardware to automatically fix unaligned loads
config CAVIUM_OCTEON_CVMSEG_SIZE
int "Number of L1 cache lines reserved for CVMSEG memory"
- depends on CAVIUM_OCTEON_SPECIFIC_OPTIONS
range 0 54
default 1
help
config CAVIUM_OCTEON_LOCK_L2
bool "Lock often used kernel code in the L2"
- depends on CAVIUM_OCTEON_SPECIFIC_OPTIONS
default "y"
help
Enable locking parts of the kernel into the L2 cache.
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_STATIC
- depends on CPU_CAVIUM_OCTEON
config CAVIUM_OCTEON_HELPER
def_bool y
config SWIOTLB
def_bool y
- depends on CPU_CAVIUM_OCTEON
select IOMMU_HELPER
select NEED_SG_DMA_LENGTH
+
+
+endif # CPU_CAVIUM_OCTEON
void __init plat_time_init(void)
{
clocksource_mips.rating = 300;
- clocksource_set_clock(&clocksource_mips, octeon_get_clock_rate());
- clocksource_register(&clocksource_mips);
+ clocksource_register_hz(&clocksource_mips, octeon_get_clock_rate());
}
static u64 octeon_udelay_factor;
union octeon_cvmemctl cvmmemctl;
union cvmx_iob_fau_timeout fau_timeout;
union cvmx_pow_nw_tim nm_tim;
- uint64_t cvmctl;
/* Get the current settings for CP0_CVMMEMCTL_REG */
cvmmemctl.u64 = read_c0_cvmmemctl();
CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE,
CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE * 128);
- /* Move the performance counter interrupts to IRQ 6 */
- cvmctl = read_c0_cvmctl();
- cvmctl &= ~(7 << 7);
- cvmctl |= 6 << 7;
- write_c0_cvmctl(cvmctl);
-
/* Set a default for the hardware timeouts */
fau_timeout.u64 = 0;
fau_timeout.s.tout_val = 0xfff;
uint64_t action;
/* Load the mailbox register to figure out what we're supposed to do */
- action = cvmx_read_csr(CVMX_CIU_MBOX_CLRX(coreid));
+ action = cvmx_read_csr(CVMX_CIU_MBOX_CLRX(coreid)) & 0xffff;
/* Clear the mailbox to clear the interrupt */
cvmx_write_csr(CVMX_CIU_MBOX_CLRX(coreid), action);
if (action & SMP_CALL_FUNCTION)
smp_call_function_interrupt();
+ if (action & SMP_RESCHEDULE_YOURSELF)
+ scheduler_ipi();
/* Check if we've been told to flush the icache */
if (action & SMP_ICACHE_FLUSH)
if (labi->labi_signature != LABI_SIGNATURE)
panic("The bootloader version on this board is incorrect.");
#endif
-
- cvmx_write_csr(CVMX_CIU_MBOX_CLRX(cvmx_get_core_num()), 0xffffffff);
+ /*
+ * Only the low order mailbox bits are used for IPIs, leave
+ * the other bits alone.
+ */
+ cvmx_write_csr(CVMX_CIU_MBOX_CLRX(cvmx_get_core_num()), 0xffff);
if (request_irq(OCTEON_IRQ_MBOX0, mailbox_interrupt, IRQF_DISABLED,
- "mailbox0", mailbox_interrupt)) {
+ "SMP-IPI", mailbox_interrupt)) {
panic("Cannot request_irq(OCTEON_IRQ_MBOX0)\n");
}
- if (request_irq(OCTEON_IRQ_MBOX1, mailbox_interrupt, IRQF_DISABLED,
- "mailbox1", mailbox_interrupt)) {
- panic("Cannot request_irq(OCTEON_IRQ_MBOX1)\n");
- }
}
/**
CONFIG_NET_EMATCH=y
CONFIG_NET_CLS_ACT=y
CONFIG_BT=m
-CONFIG_BT_L2CAP=m
-CONFIG_BT_SCO=m
+CONFIG_BT_L2CAP=y
+CONFIG_BT_SCO=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_USB_GADGET=m
CONFIG_USB_GADGET_M66592=y
CONFIG_MMC=m
-CONFIG_LEDS_CLASS=m
+CONFIG_LEDS_CLASS=y
CONFIG_STAGING=y
# CONFIG_STAGING_EXCLUDE_BUILD is not set
CONFIG_FB_SM7XX=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_HID=m
-CONFIG_LEDS_CLASS=m
+CONFIG_LEDS_CLASS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_IDE_DISK=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_VLSI_FIR=m
CONFIG_MCS_FIR=m
CONFIG_BT=m
-CONFIG_BT_L2CAP=m
-CONFIG_BT_SCO=m
+CONFIG_BT_L2CAP=y
+CONFIG_BT_SCO=y
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
--- /dev/null
+CONFIG_NLM_XLR_BOARD=y
+CONFIG_HIGHMEM=y
+CONFIG_KSM=y
+CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
+CONFIG_SMP=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_PREEMPT_VOLUNTARY=y
+CONFIG_KEXEC=y
+CONFIG_EXPERIMENTAL=y
+CONFIG_CROSS_COMPILE="mips64-unknown-linux-gnu-"
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_AUDIT=y
+CONFIG_NAMESPACES=y
+CONFIG_SCHED_AUTOGROUP=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_INITRAMFS_SOURCE="usr/dev_file_list usr/rootfs"
+CONFIG_RD_BZIP2=y
+CONFIG_RD_LZMA=y
+CONFIG_INITRAMFS_COMPRESSION_GZIP=y
+# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
+CONFIG_EXPERT=y
+CONFIG_KALLSYMS_ALL=y
+# CONFIG_ELF_CORE is not set
+# CONFIG_PCSPKR_PLATFORM is not set
+# CONFIG_PERF_EVENTS is not set
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PROFILING=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_BLK_DEV_INTEGRITY=y
+CONFIG_BINFMT_MISC=m
+CONFIG_PM_RUNTIME=y
+CONFIG_PM_DEBUG=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=m
+CONFIG_NET_KEY=m
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_ROUTE_MULTIPATH=y
+CONFIG_IP_ROUTE_VERBOSE=y
+CONFIG_NET_IPIP=m
+CONFIG_IP_MROUTE=y
+CONFIG_IP_PIMSM_V1=y
+CONFIG_IP_PIMSM_V2=y
+CONFIG_SYN_COOKIES=y
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_XFRM_MODE_TRANSPORT=m
+CONFIG_INET_XFRM_MODE_TUNNEL=m
+CONFIG_INET_XFRM_MODE_BEET=m
+CONFIG_TCP_CONG_ADVANCED=y
+CONFIG_TCP_CONG_HSTCP=m
+CONFIG_TCP_CONG_HYBLA=m
+CONFIG_TCP_CONG_SCALABLE=m
+CONFIG_TCP_CONG_LP=m
+CONFIG_TCP_CONG_VENO=m
+CONFIG_TCP_CONG_YEAH=m
+CONFIG_TCP_CONG_ILLINOIS=m
+CONFIG_TCP_MD5SIG=y
+CONFIG_IPV6=y
+CONFIG_IPV6_PRIVACY=y
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_INET6_XFRM_MODE_TRANSPORT=m
+CONFIG_INET6_XFRM_MODE_TUNNEL=m
+CONFIG_INET6_XFRM_MODE_BEET=m
+CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
+CONFIG_IPV6_SIT=m
+CONFIG_IPV6_TUNNEL=m
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETLABEL=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=m
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CT_PROTO_UDPLITE=m
+CONFIG_NF_CONNTRACK_AMANDA=m
+CONFIG_NF_CONNTRACK_FTP=m
+CONFIG_NF_CONNTRACK_H323=m
+CONFIG_NF_CONNTRACK_IRC=m
+CONFIG_NF_CONNTRACK_NETBIOS_NS=m
+CONFIG_NF_CONNTRACK_PPTP=m
+CONFIG_NF_CONNTRACK_SANE=m
+CONFIG_NF_CONNTRACK_SIP=m
+CONFIG_NF_CONNTRACK_TFTP=m
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NETFILTER_TPROXY=m
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
+CONFIG_NETFILTER_XT_TARGET_DSCP=m
+CONFIG_NETFILTER_XT_TARGET_MARK=m
+CONFIG_NETFILTER_XT_TARGET_NFLOG=m
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
+CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
+CONFIG_NETFILTER_XT_TARGET_TPROXY=m
+CONFIG_NETFILTER_XT_TARGET_TRACE=m
+CONFIG_NETFILTER_XT_TARGET_SECMARK=m
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
+CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
+CONFIG_NETFILTER_XT_MATCH_COMMENT=m
+CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
+CONFIG_NETFILTER_XT_MATCH_DSCP=m
+CONFIG_NETFILTER_XT_MATCH_ESP=m
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
+CONFIG_NETFILTER_XT_MATCH_HELPER=m
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
+CONFIG_NETFILTER_XT_MATCH_LENGTH=m
+CONFIG_NETFILTER_XT_MATCH_LIMIT=m
+CONFIG_NETFILTER_XT_MATCH_MAC=m
+CONFIG_NETFILTER_XT_MATCH_MARK=m
+CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
+CONFIG_NETFILTER_XT_MATCH_OSF=m
+CONFIG_NETFILTER_XT_MATCH_OWNER=m
+CONFIG_NETFILTER_XT_MATCH_POLICY=m
+CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
+CONFIG_NETFILTER_XT_MATCH_QUOTA=m
+CONFIG_NETFILTER_XT_MATCH_RATEEST=m
+CONFIG_NETFILTER_XT_MATCH_REALM=m
+CONFIG_NETFILTER_XT_MATCH_RECENT=m
+CONFIG_NETFILTER_XT_MATCH_SOCKET=m
+CONFIG_NETFILTER_XT_MATCH_STATE=m
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
+CONFIG_NETFILTER_XT_MATCH_STRING=m
+CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
+CONFIG_NETFILTER_XT_MATCH_TIME=m
+CONFIG_NETFILTER_XT_MATCH_U32=m
+CONFIG_IP_VS=m
+CONFIG_IP_VS_IPV6=y
+CONFIG_IP_VS_PROTO_TCP=y
+CONFIG_IP_VS_PROTO_UDP=y
+CONFIG_IP_VS_PROTO_ESP=y
+CONFIG_IP_VS_PROTO_AH=y
+CONFIG_IP_VS_RR=m
+CONFIG_IP_VS_WRR=m
+CONFIG_IP_VS_LC=m
+CONFIG_IP_VS_WLC=m
+CONFIG_IP_VS_LBLC=m
+CONFIG_IP_VS_LBLCR=m
+CONFIG_IP_VS_DH=m
+CONFIG_IP_VS_SH=m
+CONFIG_IP_VS_SED=m
+CONFIG_IP_VS_NQ=m
+CONFIG_IP_VS_FTP=m
+CONFIG_NF_CONNTRACK_IPV4=m
+CONFIG_IP_NF_QUEUE=m
+CONFIG_IP_NF_IPTABLES=m
+CONFIG_IP_NF_MATCH_AH=m
+CONFIG_IP_NF_MATCH_ECN=m
+CONFIG_IP_NF_MATCH_TTL=m
+CONFIG_IP_NF_FILTER=m
+CONFIG_IP_NF_TARGET_REJECT=m
+CONFIG_IP_NF_TARGET_LOG=m
+CONFIG_IP_NF_TARGET_ULOG=m
+CONFIG_NF_NAT=m
+CONFIG_IP_NF_TARGET_MASQUERADE=m
+CONFIG_IP_NF_TARGET_NETMAP=m
+CONFIG_IP_NF_TARGET_REDIRECT=m
+CONFIG_IP_NF_MANGLE=m
+CONFIG_IP_NF_TARGET_CLUSTERIP=m
+CONFIG_IP_NF_TARGET_ECN=m
+CONFIG_IP_NF_TARGET_TTL=m
+CONFIG_IP_NF_RAW=m
+CONFIG_IP_NF_SECURITY=m
+CONFIG_IP_NF_ARPTABLES=m
+CONFIG_IP_NF_ARPFILTER=m
+CONFIG_IP_NF_ARP_MANGLE=m
+CONFIG_NF_CONNTRACK_IPV6=m
+CONFIG_IP6_NF_QUEUE=m
+CONFIG_IP6_NF_IPTABLES=m
+CONFIG_IP6_NF_MATCH_AH=m
+CONFIG_IP6_NF_MATCH_EUI64=m
+CONFIG_IP6_NF_MATCH_FRAG=m
+CONFIG_IP6_NF_MATCH_OPTS=m
+CONFIG_IP6_NF_MATCH_HL=m
+CONFIG_IP6_NF_MATCH_IPV6HEADER=m
+CONFIG_IP6_NF_MATCH_MH=m
+CONFIG_IP6_NF_MATCH_RT=m
+CONFIG_IP6_NF_TARGET_HL=m
+CONFIG_IP6_NF_TARGET_LOG=m
+CONFIG_IP6_NF_FILTER=m
+CONFIG_IP6_NF_TARGET_REJECT=m
+CONFIG_IP6_NF_MANGLE=m
+CONFIG_IP6_NF_RAW=m
+CONFIG_IP6_NF_SECURITY=m
+CONFIG_DECNET_NF_GRABULATOR=m
+CONFIG_BRIDGE_NF_EBTABLES=m
+CONFIG_BRIDGE_EBT_BROUTE=m
+CONFIG_BRIDGE_EBT_T_FILTER=m
+CONFIG_BRIDGE_EBT_T_NAT=m
+CONFIG_BRIDGE_EBT_802_3=m
+CONFIG_BRIDGE_EBT_AMONG=m
+CONFIG_BRIDGE_EBT_ARP=m
+CONFIG_BRIDGE_EBT_IP=m
+CONFIG_BRIDGE_EBT_IP6=m
+CONFIG_BRIDGE_EBT_LIMIT=m
+CONFIG_BRIDGE_EBT_MARK=m
+CONFIG_BRIDGE_EBT_PKTTYPE=m
+CONFIG_BRIDGE_EBT_STP=m
+CONFIG_BRIDGE_EBT_VLAN=m
+CONFIG_BRIDGE_EBT_ARPREPLY=m
+CONFIG_BRIDGE_EBT_DNAT=m
+CONFIG_BRIDGE_EBT_MARK_T=m
+CONFIG_BRIDGE_EBT_REDIRECT=m
+CONFIG_BRIDGE_EBT_SNAT=m
+CONFIG_BRIDGE_EBT_LOG=m
+CONFIG_BRIDGE_EBT_ULOG=m
+CONFIG_BRIDGE_EBT_NFLOG=m
+CONFIG_IP_DCCP=m
+CONFIG_RDS=m
+CONFIG_RDS_TCP=m
+CONFIG_TIPC=m
+CONFIG_ATM=m
+CONFIG_ATM_CLIP=m
+CONFIG_ATM_LANE=m
+CONFIG_ATM_MPOA=m
+CONFIG_ATM_BR2684=m
+CONFIG_BRIDGE=m
+CONFIG_VLAN_8021Q=m
+CONFIG_VLAN_8021Q_GVRP=y
+CONFIG_DECNET=m
+CONFIG_LLC2=m
+CONFIG_IPX=m
+CONFIG_ATALK=m
+CONFIG_DEV_APPLETALK=m
+CONFIG_IPDDP=m
+CONFIG_IPDDP_ENCAP=y
+CONFIG_IPDDP_DECAP=y
+CONFIG_X25=m
+CONFIG_LAPB=m
+CONFIG_ECONET=m
+CONFIG_ECONET_AUNUDP=y
+CONFIG_ECONET_NATIVE=y
+CONFIG_WAN_ROUTER=m
+CONFIG_PHONET=m
+CONFIG_IEEE802154=m
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_CBQ=m
+CONFIG_NET_SCH_HTB=m
+CONFIG_NET_SCH_HFSC=m
+CONFIG_NET_SCH_ATM=m
+CONFIG_NET_SCH_PRIO=m
+CONFIG_NET_SCH_MULTIQ=m
+CONFIG_NET_SCH_RED=m
+CONFIG_NET_SCH_SFQ=m
+CONFIG_NET_SCH_TEQL=m
+CONFIG_NET_SCH_TBF=m
+CONFIG_NET_SCH_GRED=m
+CONFIG_NET_SCH_DSMARK=m
+CONFIG_NET_SCH_NETEM=m
+CONFIG_NET_SCH_DRR=m
+CONFIG_NET_SCH_INGRESS=m
+CONFIG_NET_CLS_BASIC=m
+CONFIG_NET_CLS_TCINDEX=m
+CONFIG_NET_CLS_ROUTE4=m
+CONFIG_NET_CLS_FW=m
+CONFIG_NET_CLS_U32=m
+CONFIG_CLS_U32_MARK=y
+CONFIG_NET_CLS_RSVP=m
+CONFIG_NET_CLS_RSVP6=m
+CONFIG_NET_CLS_FLOW=m
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_CMP=m
+CONFIG_NET_EMATCH_NBYTE=m
+CONFIG_NET_EMATCH_U32=m
+CONFIG_NET_EMATCH_META=m
+CONFIG_NET_EMATCH_TEXT=m
+CONFIG_NET_CLS_ACT=y
+CONFIG_NET_ACT_POLICE=m
+CONFIG_NET_ACT_GACT=m
+CONFIG_GACT_PROB=y
+CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
+CONFIG_NET_ACT_NAT=m
+CONFIG_NET_ACT_PEDIT=m
+CONFIG_NET_ACT_SIMP=m
+CONFIG_NET_ACT_SKBEDIT=m
+CONFIG_DCB=y
+CONFIG_NET_PKTGEN=m
+# CONFIG_WIRELESS is not set
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+# CONFIG_STANDALONE is not set
+CONFIG_CONNECTOR=y
+CONFIG_MTD=m
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_CRYPTOLOOP=m
+CONFIG_BLK_DEV_NBD=m
+CONFIG_BLK_DEV_OSD=m
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=65536
+CONFIG_CDROM_PKTCDVD=y
+CONFIG_MISC_DEVICES=y
+CONFIG_RAID_ATTRS=m
+CONFIG_SCSI=y
+CONFIG_SCSI_TGT=m
+CONFIG_BLK_DEV_SD=y
+CONFIG_CHR_DEV_ST=m
+CONFIG_CHR_DEV_OSST=m
+CONFIG_BLK_DEV_SR=y
+CONFIG_CHR_DEV_SG=y
+CONFIG_CHR_DEV_SCH=m
+CONFIG_SCSI_MULTI_LUN=y
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_LOGGING=y
+CONFIG_SCSI_SCAN_ASYNC=y
+CONFIG_SCSI_SPI_ATTRS=m
+CONFIG_SCSI_FC_TGT_ATTRS=y
+CONFIG_SCSI_SAS_LIBSAS=m
+CONFIG_SCSI_SRP_ATTRS=m
+CONFIG_SCSI_SRP_TGT_ATTRS=y
+CONFIG_ISCSI_TCP=m
+CONFIG_LIBFCOE=m
+CONFIG_SCSI_DEBUG=m
+CONFIG_SCSI_DH=y
+CONFIG_SCSI_DH_RDAC=m
+CONFIG_SCSI_DH_HP_SW=m
+CONFIG_SCSI_DH_EMC=m
+CONFIG_SCSI_DH_ALUA=m
+CONFIG_SCSI_OSD_INITIATOR=m
+CONFIG_SCSI_OSD_ULD=m
+# CONFIG_INPUT_MOUSEDEV is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_EVBUG=m
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_SERIO_I8042 is not set
+CONFIG_SERIO_SERPORT=m
+CONFIG_SERIO_LIBPS2=y
+CONFIG_SERIO_RAW=m
+CONFIG_VT_HW_CONSOLE_BINDING=y
+CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
+CONFIG_LEGACY_PTY_COUNT=0
+CONFIG_SERIAL_NONSTANDARD=y
+CONFIG_N_HDLC=m
+# CONFIG_DEVKMEM is not set
+CONFIG_STALDRV=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_8250_NR_UARTS=48
+CONFIG_SERIAL_8250_EXTENDED=y
+CONFIG_SERIAL_8250_MANY_PORTS=y
+CONFIG_SERIAL_8250_SHARE_IRQ=y
+CONFIG_SERIAL_8250_RSA=y
+CONFIG_HW_RANDOM=y
+CONFIG_HW_RANDOM_TIMERIOMEM=m
+CONFIG_RAW_DRIVER=m
+# CONFIG_HWMON is not set
+# CONFIG_VGA_CONSOLE is not set
+# CONFIG_HID_SUPPORT is not set
+# CONFIG_USB_SUPPORT is not set
+CONFIG_UIO=y
+CONFIG_UIO_PDRV=m
+CONFIG_UIO_PDRV_GENIRQ=m
+CONFIG_EXT2_FS=y
+CONFIG_EXT2_FS_XATTR=y
+CONFIG_EXT2_FS_POSIX_ACL=y
+CONFIG_EXT2_FS_SECURITY=y
+CONFIG_EXT3_FS=y
+CONFIG_EXT3_FS_POSIX_ACL=y
+CONFIG_EXT3_FS_SECURITY=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_GFS2_FS=m
+CONFIG_GFS2_FS_LOCKING_DLM=y
+CONFIG_OCFS2_FS=m
+CONFIG_BTRFS_FS=m
+CONFIG_BTRFS_FS_POSIX_ACL=y
+CONFIG_NILFS2_FS=m
+CONFIG_QUOTA_NETLINK_INTERFACE=y
+# CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_QFMT_V1=m
+CONFIG_QFMT_V2=m
+CONFIG_AUTOFS4_FS=m
+CONFIG_FUSE_FS=y
+CONFIG_CUSE=m
+CONFIG_FSCACHE=m
+CONFIG_FSCACHE_STATS=y
+CONFIG_FSCACHE_HISTOGRAM=y
+CONFIG_CACHEFILES=m
+CONFIG_ISO9660_FS=m
+CONFIG_JOLIET=y
+CONFIG_ZISOFS=y
+CONFIG_UDF_FS=m
+CONFIG_MSDOS_FS=m
+CONFIG_VFAT_FS=m
+CONFIG_NTFS_FS=m
+CONFIG_PROC_KCORE=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_ADFS_FS=m
+CONFIG_AFFS_FS=m
+CONFIG_ECRYPT_FS=y
+CONFIG_HFS_FS=m
+CONFIG_HFSPLUS_FS=m
+CONFIG_BEFS_FS=m
+CONFIG_BFS_FS=m
+CONFIG_EFS_FS=m
+CONFIG_CRAMFS=m
+CONFIG_SQUASHFS=m
+CONFIG_VXFS_FS=m
+CONFIG_MINIX_FS=m
+CONFIG_OMFS_FS=m
+CONFIG_HPFS_FS=m
+CONFIG_QNX4FS_FS=m
+CONFIG_ROMFS_FS=m
+CONFIG_SYSV_FS=m
+CONFIG_UFS_FS=m
+CONFIG_EXOFS_FS=m
+CONFIG_NFS_FS=m
+CONFIG_NFS_V3=y
+CONFIG_NFS_V3_ACL=y
+CONFIG_NFS_V4=y
+CONFIG_NFS_FSCACHE=y
+CONFIG_NFSD=m
+CONFIG_NFSD_V3_ACL=y
+CONFIG_NFSD_V4=y
+CONFIG_CIFS=m
+CONFIG_CIFS_WEAK_PW_HASH=y
+CONFIG_CIFS_UPCALL=y
+CONFIG_CIFS_XATTR=y
+CONFIG_CIFS_POSIX=y
+CONFIG_CIFS_DFS_UPCALL=y
+CONFIG_CIFS_EXPERIMENTAL=y
+CONFIG_NCP_FS=m
+CONFIG_NCPFS_PACKET_SIGNING=y
+CONFIG_NCPFS_IOCTL_LOCKING=y
+CONFIG_NCPFS_STRONG=y
+CONFIG_NCPFS_NFS_NS=y
+CONFIG_NCPFS_OS2_NS=y
+CONFIG_NCPFS_NLS=y
+CONFIG_NCPFS_EXTRAS=y
+CONFIG_CODA_FS=m
+CONFIG_AFS_FS=m
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_ACORN_PARTITION=y
+CONFIG_ACORN_PARTITION_ICS=y
+CONFIG_ACORN_PARTITION_RISCIX=y
+CONFIG_OSF_PARTITION=y
+CONFIG_AMIGA_PARTITION=y
+CONFIG_ATARI_PARTITION=y
+CONFIG_MAC_PARTITION=y
+CONFIG_BSD_DISKLABEL=y
+CONFIG_MINIX_SUBPARTITION=y
+CONFIG_SOLARIS_X86_PARTITION=y
+CONFIG_UNIXWARE_DISKLABEL=y
+CONFIG_LDM_PARTITION=y
+CONFIG_SGI_PARTITION=y
+CONFIG_ULTRIX_PARTITION=y
+CONFIG_SUN_PARTITION=y
+CONFIG_KARMA_PARTITION=y
+CONFIG_EFI_PARTITION=y
+CONFIG_SYSV68_PARTITION=y
+CONFIG_NLS=y
+CONFIG_NLS_DEFAULT="cp437"
+CONFIG_NLS_CODEPAGE_437=m
+CONFIG_NLS_CODEPAGE_737=m
+CONFIG_NLS_CODEPAGE_775=m
+CONFIG_NLS_CODEPAGE_850=m
+CONFIG_NLS_CODEPAGE_852=m
+CONFIG_NLS_CODEPAGE_855=m
+CONFIG_NLS_CODEPAGE_857=m
+CONFIG_NLS_CODEPAGE_860=m
+CONFIG_NLS_CODEPAGE_861=m
+CONFIG_NLS_CODEPAGE_862=m
+CONFIG_NLS_CODEPAGE_863=m
+CONFIG_NLS_CODEPAGE_864=m
+CONFIG_NLS_CODEPAGE_865=m
+CONFIG_NLS_CODEPAGE_866=m
+CONFIG_NLS_CODEPAGE_869=m
+CONFIG_NLS_CODEPAGE_936=m
+CONFIG_NLS_CODEPAGE_950=m
+CONFIG_NLS_CODEPAGE_932=m
+CONFIG_NLS_CODEPAGE_949=m
+CONFIG_NLS_CODEPAGE_874=m
+CONFIG_NLS_ISO8859_8=m
+CONFIG_NLS_CODEPAGE_1250=m
+CONFIG_NLS_CODEPAGE_1251=m
+CONFIG_NLS_ASCII=m
+CONFIG_NLS_ISO8859_1=m
+CONFIG_NLS_ISO8859_2=m
+CONFIG_NLS_ISO8859_3=m
+CONFIG_NLS_ISO8859_4=m
+CONFIG_NLS_ISO8859_5=m
+CONFIG_NLS_ISO8859_6=m
+CONFIG_NLS_ISO8859_7=m
+CONFIG_NLS_ISO8859_9=m
+CONFIG_NLS_ISO8859_13=m
+CONFIG_NLS_ISO8859_14=m
+CONFIG_NLS_ISO8859_15=m
+CONFIG_NLS_KOI8_R=m
+CONFIG_NLS_KOI8_U=m
+CONFIG_PRINTK_TIME=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_UNUSED_SYMBOLS=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DETECT_HUNG_TASK=y
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_SYSCTL_SYSCALL_CHECK=y
+CONFIG_SCHED_TRACER=y
+CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_KGDB=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_LSM_MMAP_MIN_ADDR=0
+CONFIG_SECURITY_SELINUX=y
+CONFIG_SECURITY_SELINUX_BOOTPARAM=y
+CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=0
+CONFIG_SECURITY_SELINUX_DISABLE=y
+CONFIG_SECURITY_SMACK=y
+CONFIG_SECURITY_TOMOYO=y
+CONFIG_CRYPTO_NULL=m
+CONFIG_CRYPTO_CRYPTD=m
+CONFIG_CRYPTO_TEST=m
+CONFIG_CRYPTO_CCM=m
+CONFIG_CRYPTO_GCM=m
+CONFIG_CRYPTO_CTS=m
+CONFIG_CRYPTO_LRW=m
+CONFIG_CRYPTO_PCBC=m
+CONFIG_CRYPTO_XTS=m
+CONFIG_CRYPTO_HMAC=y
+CONFIG_CRYPTO_XCBC=m
+CONFIG_CRYPTO_VMAC=m
+CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_RMD128=m
+CONFIG_CRYPTO_RMD160=m
+CONFIG_CRYPTO_RMD256=m
+CONFIG_CRYPTO_RMD320=m
+CONFIG_CRYPTO_SHA256=m
+CONFIG_CRYPTO_SHA512=m
+CONFIG_CRYPTO_TGR192=m
+CONFIG_CRYPTO_WP512=m
+CONFIG_CRYPTO_ANUBIS=m
+CONFIG_CRYPTO_BLOWFISH=m
+CONFIG_CRYPTO_CAMELLIA=m
+CONFIG_CRYPTO_CAST5=m
+CONFIG_CRYPTO_CAST6=m
+CONFIG_CRYPTO_FCRYPT=m
+CONFIG_CRYPTO_KHAZAD=m
+CONFIG_CRYPTO_SALSA20=m
+CONFIG_CRYPTO_SEED=m
+CONFIG_CRYPTO_SERPENT=m
+CONFIG_CRYPTO_TEA=m
+CONFIG_CRYPTO_TWOFISH=m
+CONFIG_CRYPTO_ZLIB=m
+CONFIG_CRYPTO_LZO=m
+CONFIG_CRC_CCITT=m
+CONFIG_CRC7=m
#define SMP_CACHE_SHIFT L1_CACHE_SHIFT
#define SMP_CACHE_BYTES L1_CACHE_BYTES
-#define __read_mostly __attribute__((__section__(".data.read_mostly")))
+#define __read_mostly __attribute__((__section__(".data..read_mostly")))
#endif /* _ASM_CACHE_H */
#ifndef __ASM_CEVT_R4K_H
#define __ASM_CEVT_R4K_H
+#include <linux/clockchips.h>
+#include <asm/time.h>
+
DECLARE_PER_CPU(struct clock_event_device, mips_clockevent_device);
void mips_event_handler(struct clock_event_device *dev);
#define PRID_COMP_TOSHIBA 0x070000
#define PRID_COMP_LSI 0x080000
#define PRID_COMP_LEXRA 0x0b0000
+#define PRID_COMP_NETLOGIC 0x0c0000
#define PRID_COMP_CAVIUM 0x0d0000
#define PRID_COMP_INGENIC 0xd00000
#define PRID_IMP_JZRISC 0x0200
+/*
+ * These are the PRID's for when 23:16 == PRID_COMP_NETLOGIC
+ */
+#define PRID_IMP_NETLOGIC_XLR732 0x0000
+#define PRID_IMP_NETLOGIC_XLR716 0x0200
+#define PRID_IMP_NETLOGIC_XLR532 0x0900
+#define PRID_IMP_NETLOGIC_XLR308 0x0600
+#define PRID_IMP_NETLOGIC_XLR532C 0x0800
+#define PRID_IMP_NETLOGIC_XLR516C 0x0a00
+#define PRID_IMP_NETLOGIC_XLR508C 0x0b00
+#define PRID_IMP_NETLOGIC_XLR308C 0x0f00
+#define PRID_IMP_NETLOGIC_XLS608 0x8000
+#define PRID_IMP_NETLOGIC_XLS408 0x8800
+#define PRID_IMP_NETLOGIC_XLS404 0x8c00
+#define PRID_IMP_NETLOGIC_XLS208 0x8e00
+#define PRID_IMP_NETLOGIC_XLS204 0x8f00
+#define PRID_IMP_NETLOGIC_XLS108 0xce00
+#define PRID_IMP_NETLOGIC_XLS104 0xcf00
+#define PRID_IMP_NETLOGIC_XLS616B 0x4000
+#define PRID_IMP_NETLOGIC_XLS608B 0x4a00
+#define PRID_IMP_NETLOGIC_XLS416B 0x4400
+#define PRID_IMP_NETLOGIC_XLS412B 0x4c00
+#define PRID_IMP_NETLOGIC_XLS408B 0x4e00
+#define PRID_IMP_NETLOGIC_XLS404B 0x4f00
+
/*
* Definitions for 7:0 on legacy processors
*/
*/
CPU_5KC, CPU_20KC, CPU_25KF, CPU_SB1, CPU_SB1A, CPU_LOONGSON2,
CPU_CAVIUM_OCTEON, CPU_CAVIUM_OCTEON_PLUS, CPU_CAVIUM_OCTEON2,
+ CPU_XLR,
CPU_LAST
};
#include <asm/cache.h>
#include <asm-generic/dma-coherent.h>
+#ifndef CONFIG_SGI_IP27 /* Kludge to fix 2.6.39 build for IP27 */
#include <dma-coherence.h>
+#endif
extern struct dma_map_ops *mips_dma_map_ops;
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
+ flush_tlb_mm(vma->vm_mm);
}
static inline int huge_pte_none(pte_t pte)
#define PIT_CH0 0x40
#define PIT_CH2 0x42
+#define PIT_LATCH LATCH
+
extern raw_spinlock_t i8253_lock;
extern void setup_pit_timer(void);
+#define inb_pit inb_p
+#define outb_pit outb_p
+
#endif /* __ASM_I8253_H */
#define WORD_INSN ".word"
#endif
-#define JUMP_LABEL(key, label) \
- do { \
- asm goto("1:\tnop\n\t" \
- "nop\n\t" \
- ".pushsection __jump_table, \"a\"\n\t" \
- WORD_INSN " 1b, %l[" #label "], %0\n\t" \
- ".popsection\n\t" \
- : : "i" (key) : : label); \
- } while (0)
-
+static __always_inline bool arch_static_branch(struct jump_label_key *key)
+{
+ asm goto("1:\tnop\n\t"
+ "nop\n\t"
+ ".pushsection __jump_table, \"aw\"\n\t"
+ WORD_INSN " 1b, %l[l_yes], %0\n\t"
+ ".popsection\n\t"
+ : : "i" (key) : : l_yes);
+ return false;
+l_yes:
+ return true;
+}
#endif /* __KERNEL__ */
return ALCHEMY_CPU_UNKNOWN;
}
+/* return number of uarts on a given cputype */
+static inline int alchemy_get_uarts(int type)
+{
+ switch (type) {
+ case ALCHEMY_CPU_AU1000:
+ return 4;
+ case ALCHEMY_CPU_AU1500:
+ case ALCHEMY_CPU_AU1200:
+ return 2;
+ case ALCHEMY_CPU_AU1100:
+ case ALCHEMY_CPU_AU1550:
+ return 3;
+ }
+ return 0;
+}
+
+/* enable an UART block if it isn't already */
+static inline void alchemy_uart_enable(u32 uart_phys)
+{
+ void __iomem *addr = (void __iomem *)KSEG1ADDR(uart_phys);
+
+ /* reset, enable clock, deassert reset */
+ if ((__raw_readl(addr + 0x100) & 3) != 3) {
+ __raw_writel(0, addr + 0x100);
+ wmb();
+ __raw_writel(1, addr + 0x100);
+ wmb();
+ }
+ __raw_writel(3, addr + 0x100);
+ wmb();
+}
+
+static inline void alchemy_uart_disable(u32 uart_phys)
+{
+ void __iomem *addr = (void __iomem *)KSEG1ADDR(uart_phys);
+ __raw_writel(0, addr + 0x100); /* UART_MOD_CNTRL */
+ wmb();
+}
+
static inline void alchemy_uart_putchar(u32 uart_phys, u8 c)
{
void __iomem *base = (void __iomem *)KSEG1ADDR(uart_phys);
wmb();
}
+/* return number of ethernet MACs on a given cputype */
+static inline int alchemy_get_macs(int type)
+{
+ switch (type) {
+ case ALCHEMY_CPU_AU1000:
+ case ALCHEMY_CPU_AU1500:
+ case ALCHEMY_CPU_AU1550:
+ return 2;
+ case ALCHEMY_CPU_AU1100:
+ return 1;
+ }
+ return 0;
+}
+
/* arch/mips/au1000/common/clocks.c */
extern void set_au1x00_speed(unsigned int new_freq);
extern unsigned int get_au1x00_speed(void);
/*
* Physical base addresses for integrated peripherals
+ * 0..au1000 1..au1500 2..au1100 3..au1550 4..au1200
*/
+#define AU1000_AC97_PHYS_ADDR 0x10000000 /* 012 */
+#define AU1000_USBD_PHYS_ADDR 0x10200000 /* 0123 */
+#define AU1000_IC0_PHYS_ADDR 0x10400000 /* 01234 */
+#define AU1000_MAC0_PHYS_ADDR 0x10500000 /* 023 */
+#define AU1000_MAC1_PHYS_ADDR 0x10510000 /* 023 */
+#define AU1000_MACEN_PHYS_ADDR 0x10520000 /* 023 */
+#define AU1100_SD0_PHYS_ADDR 0x10600000 /* 24 */
+#define AU1100_SD1_PHYS_ADDR 0x10680000 /* 24 */
+#define AU1000_I2S_PHYS_ADDR 0x11000000 /* 02 */
+#define AU1500_MAC0_PHYS_ADDR 0x11500000 /* 1 */
+#define AU1500_MAC1_PHYS_ADDR 0x11510000 /* 1 */
+#define AU1500_MACEN_PHYS_ADDR 0x11520000 /* 1 */
+#define AU1000_UART0_PHYS_ADDR 0x11100000 /* 01234 */
+#define AU1000_UART1_PHYS_ADDR 0x11200000 /* 0234 */
+#define AU1000_UART2_PHYS_ADDR 0x11300000 /* 0 */
+#define AU1000_UART3_PHYS_ADDR 0x11400000 /* 0123 */
+#define AU1500_GPIO2_PHYS_ADDR 0x11700000 /* 1234 */
+#define AU1000_IC1_PHYS_ADDR 0x11800000 /* 01234 */
+#define AU1000_SYS_PHYS_ADDR 0x11900000 /* 01234 */
+#define AU1000_DMA_PHYS_ADDR 0x14002000 /* 012 */
+#define AU1550_DBDMA_PHYS_ADDR 0x14002000 /* 34 */
+#define AU1550_DBDMA_CONF_PHYS_ADDR 0x14003000 /* 34 */
+#define AU1000_MACDMA0_PHYS_ADDR 0x14004000 /* 0123 */
+#define AU1000_MACDMA1_PHYS_ADDR 0x14004200 /* 0123 */
+
+
#ifdef CONFIG_SOC_AU1000
#define MEM_PHYS_ADDR 0x14000000
#define STATIC_MEM_PHYS_ADDR 0x14001000
-#define DMA0_PHYS_ADDR 0x14002000
-#define DMA1_PHYS_ADDR 0x14002100
-#define DMA2_PHYS_ADDR 0x14002200
-#define DMA3_PHYS_ADDR 0x14002300
-#define DMA4_PHYS_ADDR 0x14002400
-#define DMA5_PHYS_ADDR 0x14002500
-#define DMA6_PHYS_ADDR 0x14002600
-#define DMA7_PHYS_ADDR 0x14002700
-#define IC0_PHYS_ADDR 0x10400000
-#define IC1_PHYS_ADDR 0x11800000
-#define AC97_PHYS_ADDR 0x10000000
#define USBH_PHYS_ADDR 0x10100000
-#define USBD_PHYS_ADDR 0x10200000
#define IRDA_PHYS_ADDR 0x10300000
-#define MAC0_PHYS_ADDR 0x10500000
-#define MAC1_PHYS_ADDR 0x10510000
-#define MACEN_PHYS_ADDR 0x10520000
-#define MACDMA0_PHYS_ADDR 0x14004000
-#define MACDMA1_PHYS_ADDR 0x14004200
-#define I2S_PHYS_ADDR 0x11000000
-#define UART0_PHYS_ADDR 0x11100000
-#define UART1_PHYS_ADDR 0x11200000
-#define UART2_PHYS_ADDR 0x11300000
-#define UART3_PHYS_ADDR 0x11400000
#define SSI0_PHYS_ADDR 0x11600000
#define SSI1_PHYS_ADDR 0x11680000
-#define SYS_PHYS_ADDR 0x11900000
#define PCMCIA_IO_PHYS_ADDR 0xF00000000ULL
#define PCMCIA_ATTR_PHYS_ADDR 0xF40000000ULL
#define PCMCIA_MEM_PHYS_ADDR 0xF80000000ULL
#ifdef CONFIG_SOC_AU1500
#define MEM_PHYS_ADDR 0x14000000
#define STATIC_MEM_PHYS_ADDR 0x14001000
-#define DMA0_PHYS_ADDR 0x14002000
-#define DMA1_PHYS_ADDR 0x14002100
-#define DMA2_PHYS_ADDR 0x14002200
-#define DMA3_PHYS_ADDR 0x14002300
-#define DMA4_PHYS_ADDR 0x14002400
-#define DMA5_PHYS_ADDR 0x14002500
-#define DMA6_PHYS_ADDR 0x14002600
-#define DMA7_PHYS_ADDR 0x14002700
-#define IC0_PHYS_ADDR 0x10400000
-#define IC1_PHYS_ADDR 0x11800000
-#define AC97_PHYS_ADDR 0x10000000
#define USBH_PHYS_ADDR 0x10100000
-#define USBD_PHYS_ADDR 0x10200000
#define PCI_PHYS_ADDR 0x14005000
-#define MAC0_PHYS_ADDR 0x11500000
-#define MAC1_PHYS_ADDR 0x11510000
-#define MACEN_PHYS_ADDR 0x11520000
-#define MACDMA0_PHYS_ADDR 0x14004000
-#define MACDMA1_PHYS_ADDR 0x14004200
-#define I2S_PHYS_ADDR 0x11000000
-#define UART0_PHYS_ADDR 0x11100000
-#define UART3_PHYS_ADDR 0x11400000
-#define GPIO2_PHYS_ADDR 0x11700000
-#define SYS_PHYS_ADDR 0x11900000
#define PCI_MEM_PHYS_ADDR 0x400000000ULL
#define PCI_IO_PHYS_ADDR 0x500000000ULL
#define PCI_CONFIG0_PHYS_ADDR 0x600000000ULL
#ifdef CONFIG_SOC_AU1100
#define MEM_PHYS_ADDR 0x14000000
#define STATIC_MEM_PHYS_ADDR 0x14001000
-#define DMA0_PHYS_ADDR 0x14002000
-#define DMA1_PHYS_ADDR 0x14002100
-#define DMA2_PHYS_ADDR 0x14002200
-#define DMA3_PHYS_ADDR 0x14002300
-#define DMA4_PHYS_ADDR 0x14002400
-#define DMA5_PHYS_ADDR 0x14002500
-#define DMA6_PHYS_ADDR 0x14002600
-#define DMA7_PHYS_ADDR 0x14002700
-#define IC0_PHYS_ADDR 0x10400000
-#define SD0_PHYS_ADDR 0x10600000
-#define SD1_PHYS_ADDR 0x10680000
-#define IC1_PHYS_ADDR 0x11800000
-#define AC97_PHYS_ADDR 0x10000000
#define USBH_PHYS_ADDR 0x10100000
-#define USBD_PHYS_ADDR 0x10200000
#define IRDA_PHYS_ADDR 0x10300000
-#define MAC0_PHYS_ADDR 0x10500000
-#define MACEN_PHYS_ADDR 0x10520000
-#define MACDMA0_PHYS_ADDR 0x14004000
-#define MACDMA1_PHYS_ADDR 0x14004200
-#define I2S_PHYS_ADDR 0x11000000
-#define UART0_PHYS_ADDR 0x11100000
-#define UART1_PHYS_ADDR 0x11200000
-#define UART3_PHYS_ADDR 0x11400000
#define SSI0_PHYS_ADDR 0x11600000
#define SSI1_PHYS_ADDR 0x11680000
-#define GPIO2_PHYS_ADDR 0x11700000
-#define SYS_PHYS_ADDR 0x11900000
#define LCD_PHYS_ADDR 0x15000000
#define PCMCIA_IO_PHYS_ADDR 0xF00000000ULL
#define PCMCIA_ATTR_PHYS_ADDR 0xF40000000ULL
#ifdef CONFIG_SOC_AU1550
#define MEM_PHYS_ADDR 0x14000000
#define STATIC_MEM_PHYS_ADDR 0x14001000
-#define IC0_PHYS_ADDR 0x10400000
-#define IC1_PHYS_ADDR 0x11800000
#define USBH_PHYS_ADDR 0x14020000
-#define USBD_PHYS_ADDR 0x10200000
#define PCI_PHYS_ADDR 0x14005000
-#define MAC0_PHYS_ADDR 0x10500000
-#define MAC1_PHYS_ADDR 0x10510000
-#define MACEN_PHYS_ADDR 0x10520000
-#define MACDMA0_PHYS_ADDR 0x14004000
-#define MACDMA1_PHYS_ADDR 0x14004200
-#define UART0_PHYS_ADDR 0x11100000
-#define UART1_PHYS_ADDR 0x11200000
-#define UART3_PHYS_ADDR 0x11400000
-#define GPIO2_PHYS_ADDR 0x11700000
-#define SYS_PHYS_ADDR 0x11900000
-#define DDMA_PHYS_ADDR 0x14002000
#define PE_PHYS_ADDR 0x14008000
#define PSC0_PHYS_ADDR 0x11A00000
#define PSC1_PHYS_ADDR 0x11B00000
#define STATIC_MEM_PHYS_ADDR 0x14001000
#define AES_PHYS_ADDR 0x10300000
#define CIM_PHYS_ADDR 0x14004000
-#define IC0_PHYS_ADDR 0x10400000
-#define IC1_PHYS_ADDR 0x11800000
#define USBM_PHYS_ADDR 0x14020000
#define USBH_PHYS_ADDR 0x14020100
-#define UART0_PHYS_ADDR 0x11100000
-#define UART1_PHYS_ADDR 0x11200000
-#define GPIO2_PHYS_ADDR 0x11700000
-#define SYS_PHYS_ADDR 0x11900000
-#define DDMA_PHYS_ADDR 0x14002000
#define PSC0_PHYS_ADDR 0x11A00000
#define PSC1_PHYS_ADDR 0x11B00000
-#define SD0_PHYS_ADDR 0x10600000
-#define SD1_PHYS_ADDR 0x10680000
#define LCD_PHYS_ADDR 0x15000000
#define SWCNT_PHYS_ADDR 0x1110010C
#define MAEFE_PHYS_ADDR 0x14012000
#endif
-/* Interrupt Controller register offsets */
-#define IC_CFG0RD 0x40
-#define IC_CFG0SET 0x40
-#define IC_CFG0CLR 0x44
-#define IC_CFG1RD 0x48
-#define IC_CFG1SET 0x48
-#define IC_CFG1CLR 0x4C
-#define IC_CFG2RD 0x50
-#define IC_CFG2SET 0x50
-#define IC_CFG2CLR 0x54
-#define IC_REQ0INT 0x54
-#define IC_SRCRD 0x58
-#define IC_SRCSET 0x58
-#define IC_SRCCLR 0x5C
-#define IC_REQ1INT 0x5C
-#define IC_ASSIGNRD 0x60
-#define IC_ASSIGNSET 0x60
-#define IC_ASSIGNCLR 0x64
-#define IC_WAKERD 0x68
-#define IC_WAKESET 0x68
-#define IC_WAKECLR 0x6C
-#define IC_MASKRD 0x70
-#define IC_MASKSET 0x70
-#define IC_MASKCLR 0x74
-#define IC_RISINGRD 0x78
-#define IC_RISINGCLR 0x78
-#define IC_FALLINGRD 0x7C
-#define IC_FALLINGCLR 0x7C
-#define IC_TESTBIT 0x80
-
-
-/* Interrupt Controller 0 */
-#define IC0_CFG0RD 0xB0400040
-#define IC0_CFG0SET 0xB0400040
-#define IC0_CFG0CLR 0xB0400044
-
-#define IC0_CFG1RD 0xB0400048
-#define IC0_CFG1SET 0xB0400048
-#define IC0_CFG1CLR 0xB040004C
-
-#define IC0_CFG2RD 0xB0400050
-#define IC0_CFG2SET 0xB0400050
-#define IC0_CFG2CLR 0xB0400054
-
-#define IC0_REQ0INT 0xB0400054
-#define IC0_SRCRD 0xB0400058
-#define IC0_SRCSET 0xB0400058
-#define IC0_SRCCLR 0xB040005C
-#define IC0_REQ1INT 0xB040005C
-
-#define IC0_ASSIGNRD 0xB0400060
-#define IC0_ASSIGNSET 0xB0400060
-#define IC0_ASSIGNCLR 0xB0400064
-
-#define IC0_WAKERD 0xB0400068
-#define IC0_WAKESET 0xB0400068
-#define IC0_WAKECLR 0xB040006C
-
-#define IC0_MASKRD 0xB0400070
-#define IC0_MASKSET 0xB0400070
-#define IC0_MASKCLR 0xB0400074
-
-#define IC0_RISINGRD 0xB0400078
-#define IC0_RISINGCLR 0xB0400078
-#define IC0_FALLINGRD 0xB040007C
-#define IC0_FALLINGCLR 0xB040007C
-
-#define IC0_TESTBIT 0xB0400080
-
-/* Interrupt Controller 1 */
-#define IC1_CFG0RD 0xB1800040
-#define IC1_CFG0SET 0xB1800040
-#define IC1_CFG0CLR 0xB1800044
-
-#define IC1_CFG1RD 0xB1800048
-#define IC1_CFG1SET 0xB1800048
-#define IC1_CFG1CLR 0xB180004C
-
-#define IC1_CFG2RD 0xB1800050
-#define IC1_CFG2SET 0xB1800050
-#define IC1_CFG2CLR 0xB1800054
-
-#define IC1_REQ0INT 0xB1800054
-#define IC1_SRCRD 0xB1800058
-#define IC1_SRCSET 0xB1800058
-#define IC1_SRCCLR 0xB180005C
-#define IC1_REQ1INT 0xB180005C
-
-#define IC1_ASSIGNRD 0xB1800060
-#define IC1_ASSIGNSET 0xB1800060
-#define IC1_ASSIGNCLR 0xB1800064
-
-#define IC1_WAKERD 0xB1800068
-#define IC1_WAKESET 0xB1800068
-#define IC1_WAKECLR 0xB180006C
-
-#define IC1_MASKRD 0xB1800070
-#define IC1_MASKSET 0xB1800070
-#define IC1_MASKCLR 0xB1800074
-
-#define IC1_RISINGRD 0xB1800078
-#define IC1_RISINGCLR 0xB1800078
-#define IC1_FALLINGRD 0xB180007C
-#define IC1_FALLINGCLR 0xB180007C
-
-#define IC1_TESTBIT 0xB1800080
/* Au1000 */
#ifdef CONFIG_SOC_AU1000
-#define UART0_ADDR 0xB1100000
-#define UART3_ADDR 0xB1400000
-
#define USB_OHCI_BASE 0x10100000 /* phys addr for ioremap */
#define USB_HOST_CONFIG 0xB017FFFC
#define FOR_PLATFORM_C_USB_HOST_INT AU1000_USB_HOST_INT
-
-#define AU1000_ETH0_BASE 0xB0500000
-#define AU1000_ETH1_BASE 0xB0510000
-#define AU1000_MAC0_ENABLE 0xB0520000
-#define AU1000_MAC1_ENABLE 0xB0520004
-#define NUM_ETH_INTERFACES 2
#endif /* CONFIG_SOC_AU1000 */
/* Au1500 */
#ifdef CONFIG_SOC_AU1500
-#define UART0_ADDR 0xB1100000
-#define UART3_ADDR 0xB1400000
-
#define USB_OHCI_BASE 0x10100000 /* phys addr for ioremap */
#define USB_HOST_CONFIG 0xB017fffc
#define FOR_PLATFORM_C_USB_HOST_INT AU1500_USB_HOST_INT
-
-#define AU1500_ETH0_BASE 0xB1500000
-#define AU1500_ETH1_BASE 0xB1510000
-#define AU1500_MAC0_ENABLE 0xB1520000
-#define AU1500_MAC1_ENABLE 0xB1520004
-#define NUM_ETH_INTERFACES 2
#endif /* CONFIG_SOC_AU1500 */
/* Au1100 */
#ifdef CONFIG_SOC_AU1100
-#define UART0_ADDR 0xB1100000
-#define UART3_ADDR 0xB1400000
-
#define USB_OHCI_BASE 0x10100000 /* phys addr for ioremap */
#define USB_HOST_CONFIG 0xB017FFFC
#define FOR_PLATFORM_C_USB_HOST_INT AU1100_USB_HOST_INT
-
-#define AU1100_ETH0_BASE 0xB0500000
-#define AU1100_MAC0_ENABLE 0xB0520000
-#define NUM_ETH_INTERFACES 1
#endif /* CONFIG_SOC_AU1100 */
#ifdef CONFIG_SOC_AU1550
-#define UART0_ADDR 0xB1100000
#define USB_OHCI_BASE 0x14020000 /* phys addr for ioremap */
#define USB_OHCI_LEN 0x00060000
#define USB_HOST_CONFIG 0xB4027ffc
#define FOR_PLATFORM_C_USB_HOST_INT AU1550_USB_HOST_INT
-
-#define AU1550_ETH0_BASE 0xB0500000
-#define AU1550_ETH1_BASE 0xB0510000
-#define AU1550_MAC0_ENABLE 0xB0520000
-#define AU1550_MAC1_ENABLE 0xB0520004
-#define NUM_ETH_INTERFACES 2
#endif /* CONFIG_SOC_AU1550 */
#ifdef CONFIG_SOC_AU1200
-#define UART0_ADDR 0xB1100000
-
#define USB_UOC_BASE 0x14020020
#define USB_UOC_LEN 0x20
#define USB_OHCI_BASE 0x14020100
#define SYS_PINFUNC_S1B (1 << 2)
#endif
-#define SYS_TRIOUTRD 0xB1900100
-#define SYS_TRIOUTCLR 0xB1900100
-#define SYS_OUTPUTRD 0xB1900108
-#define SYS_OUTPUTSET 0xB1900108
-#define SYS_OUTPUTCLR 0xB190010C
-#define SYS_PINSTATERD 0xB1900110
-#define SYS_PININPUTEN 0xB1900110
-
-/* GPIO2, Au1500, Au1550 only */
-#define GPIO2_BASE 0xB1700000
-#define GPIO2_DIR (GPIO2_BASE + 0)
-#define GPIO2_OUTPUT (GPIO2_BASE + 8)
-#define GPIO2_PINSTATE (GPIO2_BASE + 0xC)
-#define GPIO2_INTENABLE (GPIO2_BASE + 0x10)
-#define GPIO2_ENABLE (GPIO2_BASE + 0x14)
-
/* Power Management */
#define SYS_SCRATCH0 0xB1900018
#define SYS_SCRATCH1 0xB190001C
# define AC97C_RS (1 << 1)
# define AC97C_CE (1 << 0)
-/* Secure Digital (SD) Controller */
-#define SD0_XMIT_FIFO 0xB0600000
-#define SD0_RECV_FIFO 0xB0600004
-#define SD1_XMIT_FIFO 0xB0680000
-#define SD1_RECV_FIFO 0xB0680004
-
#if defined(CONFIG_SOC_AU1500) || defined(CONFIG_SOC_AU1550)
/* Au1500 PCI Controller */
#define Au1500_CFG_BASE 0xB4005000 /* virtual, KSEG1 addr */
#define NUM_AU1000_DMA_CHANNELS 8
-/* DMA Channel Base Addresses */
-#define DMA_CHANNEL_BASE 0xB4002000
-#define DMA_CHANNEL_LEN 0x00000100
-
/* DMA Channel Register Offsets */
#define DMA_MODE_SET 0x00000000
#define DMA_MODE_READ DMA_MODE_SET
#ifndef _LANGUAGE_ASSEMBLY
-/*
- * The DMA base addresses.
- * The channels are every 256 bytes (0x0100) from the channel 0 base.
- * Interrupt status/enable is bits 15:0 for channels 15 to zero.
- */
-#define DDMA_GLOBAL_BASE 0xb4003000
-#define DDMA_CHANNEL_BASE 0xb4002000
-
typedef volatile struct dbdma_global {
u32 ddma_config;
u32 ddma_intstat;
#define MAKE_IRQ(intc, off) (AU1000_INTC##intc##_INT_BASE + (off))
+/* GPIO1 registers within SYS_ area */
+#define SYS_TRIOUTRD 0x100
+#define SYS_TRIOUTCLR 0x100
+#define SYS_OUTPUTRD 0x108
+#define SYS_OUTPUTSET 0x108
+#define SYS_OUTPUTCLR 0x10C
+#define SYS_PINSTATERD 0x110
+#define SYS_PININPUTEN 0x110
+
+/* register offsets within GPIO2 block */
+#define GPIO2_DIR 0x00
+#define GPIO2_OUTPUT 0x08
+#define GPIO2_PINSTATE 0x0C
+#define GPIO2_INTENABLE 0x10
+#define GPIO2_ENABLE 0x14
+
+struct gpio;
static inline int au1000_gpio1_to_irq(int gpio)
{
*/
static inline void alchemy_gpio1_set_value(int gpio, int v)
{
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
unsigned long mask = 1 << (gpio - ALCHEMY_GPIO1_BASE);
unsigned long r = v ? SYS_OUTPUTSET : SYS_OUTPUTCLR;
- au_writel(mask, r);
- au_sync();
+ __raw_writel(mask, base + r);
+ wmb();
}
static inline int alchemy_gpio1_get_value(int gpio)
{
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
unsigned long mask = 1 << (gpio - ALCHEMY_GPIO1_BASE);
- return au_readl(SYS_PINSTATERD) & mask;
+ return __raw_readl(base + SYS_PINSTATERD) & mask;
}
static inline int alchemy_gpio1_direction_input(int gpio)
{
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
unsigned long mask = 1 << (gpio - ALCHEMY_GPIO1_BASE);
- au_writel(mask, SYS_TRIOUTCLR);
- au_sync();
+ __raw_writel(mask, base + SYS_TRIOUTCLR);
+ wmb();
return 0;
}
*/
static inline void __alchemy_gpio2_mod_dir(int gpio, int to_out)
{
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
unsigned long mask = 1 << (gpio - ALCHEMY_GPIO2_BASE);
- unsigned long d = au_readl(GPIO2_DIR);
+ unsigned long d = __raw_readl(base + GPIO2_DIR);
+
if (to_out)
d |= mask;
else
d &= ~mask;
- au_writel(d, GPIO2_DIR);
- au_sync();
+ __raw_writel(d, base + GPIO2_DIR);
+ wmb();
}
static inline void alchemy_gpio2_set_value(int gpio, int v)
{
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
unsigned long mask;
mask = ((v) ? 0x00010001 : 0x00010000) << (gpio - ALCHEMY_GPIO2_BASE);
- au_writel(mask, GPIO2_OUTPUT);
- au_sync();
+ __raw_writel(mask, base + GPIO2_OUTPUT);
+ wmb();
}
static inline int alchemy_gpio2_get_value(int gpio)
{
- return au_readl(GPIO2_PINSTATE) & (1 << (gpio - ALCHEMY_GPIO2_BASE));
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
+ return __raw_readl(base + GPIO2_PINSTATE) & (1 << (gpio - ALCHEMY_GPIO2_BASE));
}
static inline int alchemy_gpio2_direction_input(int gpio)
*/
static inline void alchemy_gpio1_input_enable(void)
{
- au_writel(0, SYS_PININPUTEN); /* the write op is key */
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
+ __raw_writel(0, base + SYS_PININPUTEN); /* the write op is key */
+ wmb();
}
/* GPIO2 shared interrupts and control */
static inline void __alchemy_gpio2_mod_int(int gpio2, int en)
{
- unsigned long r = au_readl(GPIO2_INTENABLE);
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
+ unsigned long r = __raw_readl(base + GPIO2_INTENABLE);
if (en)
r |= 1 << gpio2;
else
r &= ~(1 << gpio2);
- au_writel(r, GPIO2_INTENABLE);
- au_sync();
+ __raw_writel(r, base + GPIO2_INTENABLE);
+ wmb();
}
/**
*/
static inline void alchemy_gpio2_enable(void)
{
- au_writel(3, GPIO2_ENABLE); /* reset, clock enabled */
- au_sync();
- au_writel(1, GPIO2_ENABLE); /* clock enabled */
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
+ __raw_writel(3, base + GPIO2_ENABLE); /* reset, clock enabled */
+ wmb();
+ __raw_writel(1, base + GPIO2_ENABLE); /* clock enabled */
+ wmb();
}
/**
*/
static inline void alchemy_gpio2_disable(void)
{
- au_writel(2, GPIO2_ENABLE); /* reset, clock disabled */
- au_sync();
+ void __iomem *base = (void __iomem *)KSEG1ADDR(AU1500_GPIO2_PHYS_ADDR);
+ __raw_writel(2, base + GPIO2_ENABLE); /* reset, clock disabled */
+ wmb();
}
/**********************************************************************/
alchemy_gpio_set_value(gpio, v);
}
+static inline int gpio_get_value_cansleep(unsigned gpio)
+{
+ return gpio_get_value(gpio);
+}
+
+static inline void gpio_set_value_cansleep(unsigned gpio, int value)
+{
+ gpio_set_value(gpio, value);
+}
+
static inline int gpio_is_valid(int gpio)
{
return alchemy_gpio_is_valid(gpio);
return 0;
}
+static inline int gpio_request_one(unsigned gpio,
+ unsigned long flags, const char *label)
+{
+ return 0;
+}
+
+static inline int gpio_request_array(struct gpio *array, size_t num)
+{
+ return 0;
+}
+
static inline void gpio_free(unsigned gpio)
{
}
+static inline void gpio_free_array(struct gpio *array, size_t num)
+{
+}
+
+static inline int gpio_set_debounce(unsigned gpio, unsigned debounce)
+{
+ return -ENOSYS;
+}
+
+static inline int gpio_export(unsigned gpio, bool direction_may_change)
+{
+ return -ENOSYS;
+}
+
+static inline int gpio_export_link(struct device *dev, const char *name,
+ unsigned gpio)
+{
+ return -ENOSYS;
+}
+
+static inline int gpio_sysfs_set_active_low(unsigned gpio, int value)
+{
+ return -ENOSYS;
+}
+
+static inline void gpio_unexport(unsigned gpio)
+{
+}
+
#endif /* !CONFIG_ALCHEMY_GPIO_INDIRECT */
static inline void nvram_parse_macaddr(char *buf, u8 *macaddr)
{
- sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &macaddr[0], &macaddr[1],
- &macaddr[2], &macaddr[3], &macaddr[4], &macaddr[5]);
+ if (strchr(buf, ':'))
+ sscanf(buf, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &macaddr[0],
+ &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
+ &macaddr[5]);
+ else if (strchr(buf, '-'))
+ sscanf(buf, "%hhx-%hhx-%hhx-%hhx-%hhx-%hhx", &macaddr[0],
+ &macaddr[1], &macaddr[2], &macaddr[3], &macaddr[4],
+ &macaddr[5]);
+ else
+ printk(KERN_WARNING "Can not parse mac address: %s\n", buf);
}
#endif
char kernel_crc[CRC_LEN];
/* 228-235: Unused at present */
char reserved1[8];
- /* 236-239: CRC32 of header excluding tagVersion */
+ /* 236-239: CRC32 of header excluding last 20 bytes */
char header_crc[CRC_LEN];
/* 240-255: Unused at present */
char reserved2[16];
# CN30XX Disable instruction prefetching
or v0, v0, 0x2000
skip:
+ # First clear off CvmCtl[IPPCI] bit and move the performance
+ # counters interrupt to IRQ 6
+ li v1, ~(7 << 7)
+ and v0, v0, v1
+ ori v0, v0, (6 << 7)
# Write the cavium control register
dmtc0 v0, CP0_CVMCTL_REG
sync
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+#ifndef _LANTIQ_H__
+#define _LANTIQ_H__
+
+#include <linux/irq.h>
+
+/* generic reg access functions */
+#define ltq_r32(reg) __raw_readl(reg)
+#define ltq_w32(val, reg) __raw_writel(val, reg)
+#define ltq_w32_mask(clear, set, reg) \
+ ltq_w32((ltq_r32(reg) & ~(clear)) | (set), reg)
+#define ltq_r8(reg) __raw_readb(reg)
+#define ltq_w8(val, reg) __raw_writeb(val, reg)
+
+/* register access macros for EBU and CGU */
+#define ltq_ebu_w32(x, y) ltq_w32((x), ltq_ebu_membase + (y))
+#define ltq_ebu_r32(x) ltq_r32(ltq_ebu_membase + (x))
+#define ltq_cgu_w32(x, y) ltq_w32((x), ltq_cgu_membase + (y))
+#define ltq_cgu_r32(x) ltq_r32(ltq_cgu_membase + (x))
+
+extern __iomem void *ltq_ebu_membase;
+extern __iomem void *ltq_cgu_membase;
+
+extern unsigned int ltq_get_cpu_ver(void);
+extern unsigned int ltq_get_soc_type(void);
+
+/* clock speeds */
+#define CLOCK_60M 60000000
+#define CLOCK_83M 83333333
+#define CLOCK_111M 111111111
+#define CLOCK_133M 133333333
+#define CLOCK_167M 166666667
+#define CLOCK_200M 200000000
+#define CLOCK_266M 266666666
+#define CLOCK_333M 333333333
+#define CLOCK_400M 400000000
+
+/* spinlock all ebu i/o */
+extern spinlock_t ebu_lock;
+
+/* some irq helpers */
+extern void ltq_disable_irq(struct irq_data *data);
+extern void ltq_mask_and_ack_irq(struct irq_data *data);
+extern void ltq_enable_irq(struct irq_data *data);
+
+/* find out what caused the last cpu reset */
+extern int ltq_reset_cause(void);
+#define LTQ_RST_CAUSE_WDTRST 0x20
+
+#define IOPORT_RESOURCE_START 0x10000000
+#define IOPORT_RESOURCE_END 0xffffffff
+#define IOMEM_RESOURCE_START 0x10000000
+#define IOMEM_RESOURCE_END 0xffffffff
+#define LTQ_FLASH_START 0x10000000
+#define LTQ_FLASH_MAX 0x04000000
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LANTIQ_PLATFORM_H__
+#define _LANTIQ_PLATFORM_H__
+
+#include <linux/mtd/partitions.h>
+#include <linux/socket.h>
+
+/* struct used to pass info to the pci core */
+enum {
+ PCI_CLOCK_INT = 0,
+ PCI_CLOCK_EXT
+};
+
+#define PCI_EXIN0 0x0001
+#define PCI_EXIN1 0x0002
+#define PCI_EXIN2 0x0004
+#define PCI_EXIN3 0x0008
+#define PCI_EXIN4 0x0010
+#define PCI_EXIN5 0x0020
+#define PCI_EXIN_MAX 6
+
+#define PCI_GNT1 0x0040
+#define PCI_GNT2 0x0080
+#define PCI_GNT3 0x0100
+#define PCI_GNT4 0x0200
+
+#define PCI_REQ1 0x0400
+#define PCI_REQ2 0x0800
+#define PCI_REQ3 0x1000
+#define PCI_REQ4 0x2000
+#define PCI_REQ_SHIFT 10
+#define PCI_REQ_MASK 0xf
+
+struct ltq_pci_data {
+ int clock;
+ int gpio;
+ int irq[16];
+};
+
+/* struct used to pass info to network drivers */
+struct ltq_eth_data {
+ struct sockaddr mac;
+ int mii_mode;
+};
+
+#endif
--- /dev/null
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ */
+#ifndef __ASM_MIPS_MACH_LANTIQ_WAR_H
+#define __ASM_MIPS_MACH_LANTIQ_WAR_H
+
+#define R4600_V1_INDEX_ICACHEOP_WAR 0
+#define R4600_V1_HIT_CACHEOP_WAR 0
+#define R4600_V2_HIT_CACHEOP_WAR 0
+#define R5432_CP0_INTERRUPT_WAR 0
+#define BCM1250_M3_WAR 0
+#define SIBYTE_1956_WAR 0
+#define MIPS4K_ICACHE_REFILL_WAR 0
+#define MIPS_CACHE_SYNC_WAR 0
+#define TX49XX_ICACHE_INDEX_INV_WAR 0
+#define RM9000_CDEX_SMP_WAR 0
+#define ICACHE_REFILLS_WORKAROUND_WAR 0
+#define R10000_LLSC_WAR 0
+#define MIPS34K_MISSED_ITLB_WAR 0
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef __LANTIQ_IRQ_H
+#define __LANTIQ_IRQ_H
+
+#include <lantiq_irq.h>
+
+#define NR_IRQS 256
+
+#include_next <irq.h>
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LANTIQ_XWAY_IRQ_H__
+#define _LANTIQ_XWAY_IRQ_H__
+
+#define INT_NUM_IRQ0 8
+#define INT_NUM_IM0_IRL0 (INT_NUM_IRQ0 + 0)
+#define INT_NUM_IM1_IRL0 (INT_NUM_IRQ0 + 32)
+#define INT_NUM_IM2_IRL0 (INT_NUM_IRQ0 + 64)
+#define INT_NUM_IM3_IRL0 (INT_NUM_IRQ0 + 96)
+#define INT_NUM_IM4_IRL0 (INT_NUM_IRQ0 + 128)
+#define INT_NUM_IM_OFFSET (INT_NUM_IM1_IRL0 - INT_NUM_IM0_IRL0)
+
+#define LTQ_ASC_TIR(x) (INT_NUM_IM3_IRL0 + (x * 8))
+#define LTQ_ASC_RIR(x) (INT_NUM_IM3_IRL0 + (x * 8) + 1)
+#define LTQ_ASC_EIR(x) (INT_NUM_IM3_IRL0 + (x * 8) + 2)
+
+#define LTQ_ASC_ASE_TIR INT_NUM_IM2_IRL0
+#define LTQ_ASC_ASE_RIR (INT_NUM_IM2_IRL0 + 2)
+#define LTQ_ASC_ASE_EIR (INT_NUM_IM2_IRL0 + 3)
+
+#define LTQ_SSC_TIR (INT_NUM_IM0_IRL0 + 15)
+#define LTQ_SSC_RIR (INT_NUM_IM0_IRL0 + 14)
+#define LTQ_SSC_EIR (INT_NUM_IM0_IRL0 + 16)
+
+#define LTQ_MEI_DYING_GASP_INT (INT_NUM_IM1_IRL0 + 21)
+#define LTQ_MEI_INT (INT_NUM_IM1_IRL0 + 23)
+
+#define LTQ_TIMER6_INT (INT_NUM_IM1_IRL0 + 23)
+#define LTQ_USB_INT (INT_NUM_IM1_IRL0 + 22)
+#define LTQ_USB_OC_INT (INT_NUM_IM4_IRL0 + 23)
+
+#define MIPS_CPU_TIMER_IRQ 7
+
+#define LTQ_DMA_CH0_INT (INT_NUM_IM2_IRL0)
+#define LTQ_DMA_CH1_INT (INT_NUM_IM2_IRL0 + 1)
+#define LTQ_DMA_CH2_INT (INT_NUM_IM2_IRL0 + 2)
+#define LTQ_DMA_CH3_INT (INT_NUM_IM2_IRL0 + 3)
+#define LTQ_DMA_CH4_INT (INT_NUM_IM2_IRL0 + 4)
+#define LTQ_DMA_CH5_INT (INT_NUM_IM2_IRL0 + 5)
+#define LTQ_DMA_CH6_INT (INT_NUM_IM2_IRL0 + 6)
+#define LTQ_DMA_CH7_INT (INT_NUM_IM2_IRL0 + 7)
+#define LTQ_DMA_CH8_INT (INT_NUM_IM2_IRL0 + 8)
+#define LTQ_DMA_CH9_INT (INT_NUM_IM2_IRL0 + 9)
+#define LTQ_DMA_CH10_INT (INT_NUM_IM2_IRL0 + 10)
+#define LTQ_DMA_CH11_INT (INT_NUM_IM2_IRL0 + 11)
+#define LTQ_DMA_CH12_INT (INT_NUM_IM2_IRL0 + 25)
+#define LTQ_DMA_CH13_INT (INT_NUM_IM2_IRL0 + 26)
+#define LTQ_DMA_CH14_INT (INT_NUM_IM2_IRL0 + 27)
+#define LTQ_DMA_CH15_INT (INT_NUM_IM2_IRL0 + 28)
+#define LTQ_DMA_CH16_INT (INT_NUM_IM2_IRL0 + 29)
+#define LTQ_DMA_CH17_INT (INT_NUM_IM2_IRL0 + 30)
+#define LTQ_DMA_CH18_INT (INT_NUM_IM2_IRL0 + 16)
+#define LTQ_DMA_CH19_INT (INT_NUM_IM2_IRL0 + 21)
+
+#define LTQ_PPE_MBOX_INT (INT_NUM_IM2_IRL0 + 24)
+
+#define INT_NUM_IM4_IRL14 (INT_NUM_IM4_IRL0 + 14)
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_XWAY_H__
+#define _LTQ_XWAY_H__
+
+#ifdef CONFIG_SOC_TYPE_XWAY
+
+#include <lantiq.h>
+
+/* Chip IDs */
+#define SOC_ID_DANUBE1 0x129
+#define SOC_ID_DANUBE2 0x12B
+#define SOC_ID_TWINPASS 0x12D
+#define SOC_ID_AMAZON_SE 0x152
+#define SOC_ID_ARX188 0x16C
+#define SOC_ID_ARX168 0x16D
+#define SOC_ID_ARX182 0x16F
+
+/* SoC Types */
+#define SOC_TYPE_DANUBE 0x01
+#define SOC_TYPE_TWINPASS 0x02
+#define SOC_TYPE_AR9 0x03
+#define SOC_TYPE_VR9 0x04
+#define SOC_TYPE_AMAZON_SE 0x05
+
+/* ASC0/1 - serial port */
+#define LTQ_ASC0_BASE_ADDR 0x1E100400
+#define LTQ_ASC1_BASE_ADDR 0x1E100C00
+#define LTQ_ASC_SIZE 0x400
+
+/* RCU - reset control unit */
+#define LTQ_RCU_BASE_ADDR 0x1F203000
+#define LTQ_RCU_SIZE 0x1000
+
+/* GPTU - general purpose timer unit */
+#define LTQ_GPTU_BASE_ADDR 0x18000300
+#define LTQ_GPTU_SIZE 0x100
+
+/* EBU - external bus unit */
+#define LTQ_EBU_GPIO_START 0x14000000
+#define LTQ_EBU_GPIO_SIZE 0x1000
+
+#define LTQ_EBU_BASE_ADDR 0x1E105300
+#define LTQ_EBU_SIZE 0x100
+
+#define LTQ_EBU_BUSCON0 0x0060
+#define LTQ_EBU_PCC_CON 0x0090
+#define LTQ_EBU_PCC_IEN 0x00A4
+#define LTQ_EBU_PCC_ISTAT 0x00A0
+#define LTQ_EBU_BUSCON1 0x0064
+#define LTQ_EBU_ADDRSEL1 0x0024
+#define EBU_WRDIS 0x80000000
+
+/* CGU - clock generation unit */
+#define LTQ_CGU_BASE_ADDR 0x1F103000
+#define LTQ_CGU_SIZE 0x1000
+
+/* ICU - interrupt control unit */
+#define LTQ_ICU_BASE_ADDR 0x1F880200
+#define LTQ_ICU_SIZE 0x100
+
+/* EIU - external interrupt unit */
+#define LTQ_EIU_BASE_ADDR 0x1F101000
+#define LTQ_EIU_SIZE 0x1000
+
+/* PMU - power management unit */
+#define LTQ_PMU_BASE_ADDR 0x1F102000
+#define LTQ_PMU_SIZE 0x1000
+
+#define PMU_DMA 0x0020
+#define PMU_USB 0x8041
+#define PMU_LED 0x0800
+#define PMU_GPT 0x1000
+#define PMU_PPE 0x2000
+#define PMU_FPI 0x4000
+#define PMU_SWITCH 0x10000000
+
+/* ETOP - ethernet */
+#define LTQ_ETOP_BASE_ADDR 0x1E180000
+#define LTQ_ETOP_SIZE 0x40000
+
+/* DMA */
+#define LTQ_DMA_BASE_ADDR 0x1E104100
+#define LTQ_DMA_SIZE 0x800
+
+/* PCI */
+#define PCI_CR_BASE_ADDR 0x1E105400
+#define PCI_CR_SIZE 0x400
+
+/* WDT */
+#define LTQ_WDT_BASE_ADDR 0x1F8803F0
+#define LTQ_WDT_SIZE 0x10
+
+/* STP - serial to parallel conversion unit */
+#define LTQ_STP_BASE_ADDR 0x1E100BB0
+#define LTQ_STP_SIZE 0x40
+
+/* GPIO */
+#define LTQ_GPIO0_BASE_ADDR 0x1E100B10
+#define LTQ_GPIO1_BASE_ADDR 0x1E100B40
+#define LTQ_GPIO2_BASE_ADDR 0x1E100B70
+#define LTQ_GPIO_SIZE 0x30
+
+/* SSC */
+#define LTQ_SSC_BASE_ADDR 0x1e100800
+#define LTQ_SSC_SIZE 0x100
+
+/* MEI - dsl core */
+#define LTQ_MEI_BASE_ADDR 0x1E116000
+
+/* DEU - data encryption unit */
+#define LTQ_DEU_BASE_ADDR 0x1E103100
+
+/* MPS - multi processor unit (voice) */
+#define LTQ_MPS_BASE_ADDR (KSEG1 + 0x1F107000)
+#define LTQ_MPS_CHIPID ((u32 *)(LTQ_MPS_BASE_ADDR + 0x0344))
+
+/* request a non-gpio and set the PIO config */
+extern int ltq_gpio_request(unsigned int pin, unsigned int alt0,
+ unsigned int alt1, unsigned int dir, const char *name);
+extern void ltq_pmu_enable(unsigned int module);
+extern void ltq_pmu_disable(unsigned int module);
+
+static inline int ltq_is_ar9(void)
+{
+ return (ltq_get_soc_type() == SOC_TYPE_AR9);
+}
+
+static inline int ltq_is_vr9(void)
+{
+ return (ltq_get_soc_type() == SOC_TYPE_VR9);
+}
+
+#endif /* CONFIG_SOC_TYPE_XWAY */
+#endif /* _LTQ_XWAY_H__ */
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef LTQ_DMA_H__
+#define LTQ_DMA_H__
+
+#define LTQ_DESC_SIZE 0x08 /* each descriptor is 64bit */
+#define LTQ_DESC_NUM 0x40 /* 64 descriptors / channel */
+
+#define LTQ_DMA_OWN BIT(31) /* owner bit */
+#define LTQ_DMA_C BIT(30) /* complete bit */
+#define LTQ_DMA_SOP BIT(29) /* start of packet */
+#define LTQ_DMA_EOP BIT(28) /* end of packet */
+#define LTQ_DMA_TX_OFFSET(x) ((x & 0x1f) << 23) /* data bytes offset */
+#define LTQ_DMA_RX_OFFSET(x) ((x & 0x7) << 23) /* data bytes offset */
+#define LTQ_DMA_SIZE_MASK (0xffff) /* the size field is 16 bit */
+
+struct ltq_dma_desc {
+ u32 ctl;
+ u32 addr;
+};
+
+struct ltq_dma_channel {
+ int nr; /* the channel number */
+ int irq; /* the mapped irq */
+ int desc; /* the current descriptor */
+ struct ltq_dma_desc *desc_base; /* the descriptor base */
+ int phys; /* physical addr */
+};
+
+enum {
+ DMA_PORT_ETOP = 0,
+ DMA_PORT_DEU,
+};
+
+extern void ltq_dma_enable_irq(struct ltq_dma_channel *ch);
+extern void ltq_dma_disable_irq(struct ltq_dma_channel *ch);
+extern void ltq_dma_ack_irq(struct ltq_dma_channel *ch);
+extern void ltq_dma_open(struct ltq_dma_channel *ch);
+extern void ltq_dma_close(struct ltq_dma_channel *ch);
+extern void ltq_dma_alloc_tx(struct ltq_dma_channel *ch);
+extern void ltq_dma_alloc_rx(struct ltq_dma_channel *ch);
+extern void ltq_dma_free(struct ltq_dma_channel *ch);
+extern void ltq_dma_init_port(int p);
+
+#endif
--- /dev/null
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Netlogic Microsystems
+ * Copyright (C) 2003 Ralf Baechle
+ */
+#ifndef __ASM_MACH_NETLOGIC_CPU_FEATURE_OVERRIDES_H
+#define __ASM_MACH_NETLOGIC_CPU_FEATURE_OVERRIDES_H
+
+#define cpu_has_4kex 1
+#define cpu_has_4k_cache 1
+#define cpu_has_watch 1
+#define cpu_has_mips16 0
+#define cpu_has_counter 1
+#define cpu_has_divec 1
+#define cpu_has_vce 0
+#define cpu_has_cache_cdex_p 0
+#define cpu_has_cache_cdex_s 0
+#define cpu_has_prefetch 1
+#define cpu_has_mcheck 1
+#define cpu_has_ejtag 1
+
+#define cpu_has_llsc 1
+#define cpu_has_vtag_icache 0
+#define cpu_has_dc_aliases 0
+#define cpu_has_ic_fills_f_dc 0
+#define cpu_has_dsp 0
+#define cpu_has_mipsmt 0
+#define cpu_has_userlocal 0
+#define cpu_icache_snoops_remote_store 0
+
+#define cpu_has_nofpuex 0
+#define cpu_has_64bits 1
+
+#define cpu_has_mips32r1 1
+#define cpu_has_mips32r2 0
+#define cpu_has_mips64r1 1
+#define cpu_has_mips64r2 0
+
+#define cpu_has_inclusive_pcaches 0
+
+#define cpu_dcache_line_size() 32
+#define cpu_icache_line_size() 32
+
+#endif /* __ASM_MACH_NETLOGIC_CPU_FEATURE_OVERRIDES_H */
--- /dev/null
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Netlogic Microsystems.
+ */
+#ifndef __ASM_NETLOGIC_IRQ_H
+#define __ASM_NETLOGIC_IRQ_H
+
+#define NR_IRQS 64
+#define MIPS_CPU_IRQ_BASE 0
+
+#endif /* __ASM_NETLOGIC_IRQ_H */
--- /dev/null
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Netlogic Microsystems.
+ * Copyright (C) 2002, 2004, 2007 by Ralf Baechle <ralf@linux-mips.org>
+ */
+#ifndef __ASM_MIPS_MACH_NLM_WAR_H
+#define __ASM_MIPS_MACH_NLM_WAR_H
+
+#define R4600_V1_INDEX_ICACHEOP_WAR 0
+#define R4600_V1_HIT_CACHEOP_WAR 0
+#define R4600_V2_HIT_CACHEOP_WAR 0
+#define R5432_CP0_INTERRUPT_WAR 0
+#define BCM1250_M3_WAR 0
+#define SIBYTE_1956_WAR 0
+#define MIPS4K_ICACHE_REFILL_WAR 0
+#define MIPS_CACHE_SYNC_WAR 0
+#define TX49XX_ICACHE_INDEX_INV_WAR 0
+#define RM9000_CDEX_SMP_WAR 0
+#define ICACHE_REFILLS_WORKAROUND_WAR 0
+#define R10000_LLSC_WAR 0
+#define MIPS34K_MISSED_ITLB_WAR 0
+
+#endif /* __ASM_MIPS_MACH_NLM_WAR_H */
#define MODULE_PROC_FAMILY "LOONGSON2 "
#elif defined CONFIG_CPU_CAVIUM_OCTEON
#define MODULE_PROC_FAMILY "OCTEON "
+#elif defined CONFIG_CPU_XLR
+#define MODULE_PROC_FAMILY "XLR "
#else
#error MODULE_PROC_FAMILY undefined for your processor configuration
#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_INTERRUPT_H
+#define _ASM_NLM_INTERRUPT_H
+
+/* Defines for the IRQ numbers */
+
+#define IRQ_IPI_SMP_FUNCTION 3
+#define IRQ_IPI_SMP_RESCHEDULE 4
+#define IRQ_MSGRING 6
+#define IRQ_TIMER 7
+
+#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_MIPS_EXTS_H
+#define _ASM_NLM_MIPS_EXTS_H
+
+/*
+ * XLR and XLP interrupt request and interrupt mask registers
+ */
+#define read_c0_eirr() __read_64bit_c0_register($9, 6)
+#define read_c0_eimr() __read_64bit_c0_register($9, 7)
+#define write_c0_eirr(val) __write_64bit_c0_register($9, 6, val)
+
+/*
+ * Writing EIMR in 32 bit is a special case, the lower 8 bit of the
+ * EIMR is shadowed in the status register, so we cannot save and
+ * restore status register for split read.
+ */
+#define write_c0_eimr(val) \
+do { \
+ if (sizeof(unsigned long) == 4) { \
+ unsigned long __flags; \
+ \
+ local_irq_save(__flags); \
+ __asm__ __volatile__( \
+ ".set\tmips64\n\t" \
+ "dsll\t%L0, %L0, 32\n\t" \
+ "dsrl\t%L0, %L0, 32\n\t" \
+ "dsll\t%M0, %M0, 32\n\t" \
+ "or\t%L0, %L0, %M0\n\t" \
+ "dmtc0\t%L0, $9, 7\n\t" \
+ ".set\tmips0" \
+ : : "r" (val)); \
+ __flags = (__flags & 0xffff00ff) | (((val) & 0xff) << 8);\
+ local_irq_restore(__flags); \
+ } else \
+ __write_64bit_c0_register($9, 7, (val)); \
+} while (0)
+
+static inline int hard_smp_processor_id(void)
+{
+ return __read_32bit_c0_register($15, 1) & 0x3ff;
+}
+
+#endif /*_ASM_NLM_MIPS_EXTS_H */
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NETLOGIC_BOOTINFO_H
+#define _ASM_NETLOGIC_BOOTINFO_H
+
+struct psb_info {
+ uint64_t boot_level;
+ uint64_t io_base;
+ uint64_t output_device;
+ uint64_t uart_print;
+ uint64_t led_output;
+ uint64_t init;
+ uint64_t exit;
+ uint64_t warm_reset;
+ uint64_t wakeup;
+ uint64_t online_cpu_map;
+ uint64_t master_reentry_sp;
+ uint64_t master_reentry_gp;
+ uint64_t master_reentry_fn;
+ uint64_t slave_reentry_fn;
+ uint64_t magic_dword;
+ uint64_t uart_putchar;
+ uint64_t size;
+ uint64_t uart_getchar;
+ uint64_t nmi_handler;
+ uint64_t psb_version;
+ uint64_t mac_addr;
+ uint64_t cpu_frequency;
+ uint64_t board_version;
+ uint64_t malloc;
+ uint64_t free;
+ uint64_t global_shmem_addr;
+ uint64_t global_shmem_size;
+ uint64_t psb_os_cpu_map;
+ uint64_t userapp_cpu_map;
+ uint64_t wakeup_os;
+ uint64_t psb_mem_map;
+ uint64_t board_major_version;
+ uint64_t board_minor_version;
+ uint64_t board_manf_revision;
+ uint64_t board_serial_number;
+ uint64_t psb_physaddr_map;
+ uint64_t xlr_loaderip_config;
+ uint64_t bldr_envp;
+ uint64_t avail_mem_map;
+};
+
+enum {
+ NETLOGIC_IO_SPACE = 0x10,
+ PCIX_IO_SPACE,
+ PCIX_CFG_SPACE,
+ PCIX_MEMORY_SPACE,
+ HT_IO_SPACE,
+ HT_CFG_SPACE,
+ HT_MEMORY_SPACE,
+ SRAM_SPACE,
+ FLASH_CONTROLLER_SPACE
+};
+
+#define NLM_MAX_ARGS 64
+#define NLM_MAX_ENVS 32
+
+/* This is what netlboot passes and linux boot_mem_map is subtly different */
+#define NLM_BOOT_MEM_MAP_MAX 32
+struct nlm_boot_mem_map {
+ int nr_map;
+ struct nlm_boot_mem_map_entry {
+ uint64_t addr; /* start of memory segment */
+ uint64_t size; /* size of memory segment */
+ uint32_t type; /* type of memory segment */
+ } map[NLM_BOOT_MEM_MAP_MAX];
+};
+
+/* Pointer to saved boot loader info */
+extern struct psb_info nlm_prom_info;
+
+#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_GPIO_H
+#define _ASM_NLM_GPIO_H
+
+#define NETLOGIC_GPIO_INT_EN_REG 0
+#define NETLOGIC_GPIO_INPUT_INVERSION_REG 1
+#define NETLOGIC_GPIO_IO_DIR_REG 2
+#define NETLOGIC_GPIO_IO_DATA_WR_REG 3
+#define NETLOGIC_GPIO_IO_DATA_RD_REG 4
+
+#define NETLOGIC_GPIO_SWRESET_REG 8
+#define NETLOGIC_GPIO_DRAM1_CNTRL_REG 9
+#define NETLOGIC_GPIO_DRAM1_RATIO_REG 10
+#define NETLOGIC_GPIO_DRAM1_RESET_REG 11
+#define NETLOGIC_GPIO_DRAM1_STATUS_REG 12
+#define NETLOGIC_GPIO_DRAM2_CNTRL_REG 13
+#define NETLOGIC_GPIO_DRAM2_RATIO_REG 14
+#define NETLOGIC_GPIO_DRAM2_RESET_REG 15
+#define NETLOGIC_GPIO_DRAM2_STATUS_REG 16
+
+#define NETLOGIC_GPIO_PWRON_RESET_CFG_REG 21
+#define NETLOGIC_GPIO_BIST_ALL_GO_STATUS_REG 24
+#define NETLOGIC_GPIO_BIST_CPU_GO_STATUS_REG 25
+#define NETLOGIC_GPIO_BIST_DEV_GO_STATUS_REG 26
+
+#define NETLOGIC_GPIO_FUSE_BANK_REG 35
+#define NETLOGIC_GPIO_CPU_RESET_REG 40
+#define NETLOGIC_GPIO_RNG_REG 43
+
+#define NETLOGIC_PWRON_RESET_PCMCIA_BOOT 17
+#define NETLOGIC_GPIO_LED_BITMAP 0x1700000
+#define NETLOGIC_GPIO_LED_0_SHIFT 20
+#define NETLOGIC_GPIO_LED_1_SHIFT 24
+
+#define NETLOGIC_GPIO_LED_OUTPUT_CODE_RESET 0x01
+#define NETLOGIC_GPIO_LED_OUTPUT_CODE_HARD_RESET 0x02
+#define NETLOGIC_GPIO_LED_OUTPUT_CODE_SOFT_RESET 0x03
+#define NETLOGIC_GPIO_LED_OUTPUT_CODE_MAIN 0x04
+
+#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_IOMAP_H
+#define _ASM_NLM_IOMAP_H
+
+#define DEFAULT_NETLOGIC_IO_BASE CKSEG1ADDR(0x1ef00000)
+#define NETLOGIC_IO_DDR2_CHN0_OFFSET 0x01000
+#define NETLOGIC_IO_DDR2_CHN1_OFFSET 0x02000
+#define NETLOGIC_IO_DDR2_CHN2_OFFSET 0x03000
+#define NETLOGIC_IO_DDR2_CHN3_OFFSET 0x04000
+#define NETLOGIC_IO_PIC_OFFSET 0x08000
+#define NETLOGIC_IO_UART_0_OFFSET 0x14000
+#define NETLOGIC_IO_UART_1_OFFSET 0x15100
+
+#define NETLOGIC_IO_SIZE 0x1000
+
+#define NETLOGIC_IO_BRIDGE_OFFSET 0x00000
+
+#define NETLOGIC_IO_RLD2_CHN0_OFFSET 0x05000
+#define NETLOGIC_IO_RLD2_CHN1_OFFSET 0x06000
+
+#define NETLOGIC_IO_SRAM_OFFSET 0x07000
+
+#define NETLOGIC_IO_PCIX_OFFSET 0x09000
+#define NETLOGIC_IO_HT_OFFSET 0x0A000
+
+#define NETLOGIC_IO_SECURITY_OFFSET 0x0B000
+
+#define NETLOGIC_IO_GMAC_0_OFFSET 0x0C000
+#define NETLOGIC_IO_GMAC_1_OFFSET 0x0D000
+#define NETLOGIC_IO_GMAC_2_OFFSET 0x0E000
+#define NETLOGIC_IO_GMAC_3_OFFSET 0x0F000
+
+/* XLS devices */
+#define NETLOGIC_IO_GMAC_4_OFFSET 0x20000
+#define NETLOGIC_IO_GMAC_5_OFFSET 0x21000
+#define NETLOGIC_IO_GMAC_6_OFFSET 0x22000
+#define NETLOGIC_IO_GMAC_7_OFFSET 0x23000
+
+#define NETLOGIC_IO_PCIE_0_OFFSET 0x1E000
+#define NETLOGIC_IO_PCIE_1_OFFSET 0x1F000
+#define NETLOGIC_IO_SRIO_0_OFFSET 0x1E000
+#define NETLOGIC_IO_SRIO_1_OFFSET 0x1F000
+
+#define NETLOGIC_IO_USB_0_OFFSET 0x24000
+#define NETLOGIC_IO_USB_1_OFFSET 0x25000
+
+#define NETLOGIC_IO_COMP_OFFSET 0x1D000
+/* end XLS devices */
+
+/* XLR devices */
+#define NETLOGIC_IO_SPI4_0_OFFSET 0x10000
+#define NETLOGIC_IO_XGMAC_0_OFFSET 0x11000
+#define NETLOGIC_IO_SPI4_1_OFFSET 0x12000
+#define NETLOGIC_IO_XGMAC_1_OFFSET 0x13000
+/* end XLR devices */
+
+#define NETLOGIC_IO_I2C_0_OFFSET 0x16000
+#define NETLOGIC_IO_I2C_1_OFFSET 0x17000
+
+#define NETLOGIC_IO_GPIO_OFFSET 0x18000
+#define NETLOGIC_IO_FLASH_OFFSET 0x19000
+#define NETLOGIC_IO_TB_OFFSET 0x1C000
+
+#define NETLOGIC_CPLD_OFFSET KSEG1ADDR(0x1d840000)
+
+/*
+ * Base Address (Virtual) of the PCI Config address space
+ * For now, choose 256M phys in kseg1 = 0xA0000000 + (1<<28)
+ * Config space spans 256 (num of buses) * 256 (num functions) * 256 bytes
+ * ie 1<<24 = 16M
+ */
+#define DEFAULT_PCI_CONFIG_BASE 0x18000000
+#define DEFAULT_HT_TYPE0_CFG_BASE 0x16000000
+#define DEFAULT_HT_TYPE1_CFG_BASE 0x17000000
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+typedef volatile __u32 nlm_reg_t;
+extern unsigned long netlogic_io_base;
+
+/* FIXME read once in write_reg */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define netlogic_read_reg(base, offset) ((base)[(offset)])
+#define netlogic_write_reg(base, offset, value) ((base)[(offset)] = (value))
+#else
+#define netlogic_read_reg(base, offset) (be32_to_cpu((base)[(offset)]))
+#define netlogic_write_reg(base, offset, value) \
+ ((base)[(offset)] = cpu_to_be32((value)))
+#endif
+
+#define netlogic_read_reg_le32(base, offset) (le32_to_cpu((base)[(offset)]))
+#define netlogic_write_reg_le32(base, offset, value) \
+ ((base)[(offset)] = cpu_to_le32((value)))
+#define netlogic_io_mmio(offset) ((nlm_reg_t *)(netlogic_io_base+(offset)))
+#endif /* __ASSEMBLY__ */
+#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_XLR_PIC_H
+#define _ASM_NLM_XLR_PIC_H
+
+#define PIC_CLKS_PER_SEC 66666666ULL
+/* PIC hardware interrupt numbers */
+#define PIC_IRT_WD_INDEX 0
+#define PIC_IRT_TIMER_0_INDEX 1
+#define PIC_IRT_TIMER_1_INDEX 2
+#define PIC_IRT_TIMER_2_INDEX 3
+#define PIC_IRT_TIMER_3_INDEX 4
+#define PIC_IRT_TIMER_4_INDEX 5
+#define PIC_IRT_TIMER_5_INDEX 6
+#define PIC_IRT_TIMER_6_INDEX 7
+#define PIC_IRT_TIMER_7_INDEX 8
+#define PIC_IRT_CLOCK_INDEX PIC_IRT_TIMER_7_INDEX
+#define PIC_IRT_UART_0_INDEX 9
+#define PIC_IRT_UART_1_INDEX 10
+#define PIC_IRT_I2C_0_INDEX 11
+#define PIC_IRT_I2C_1_INDEX 12
+#define PIC_IRT_PCMCIA_INDEX 13
+#define PIC_IRT_GPIO_INDEX 14
+#define PIC_IRT_HYPER_INDEX 15
+#define PIC_IRT_PCIX_INDEX 16
+/* XLS */
+#define PIC_IRT_CDE_INDEX 15
+#define PIC_IRT_BRIDGE_TB_XLS_INDEX 16
+/* XLS */
+#define PIC_IRT_GMAC0_INDEX 17
+#define PIC_IRT_GMAC1_INDEX 18
+#define PIC_IRT_GMAC2_INDEX 19
+#define PIC_IRT_GMAC3_INDEX 20
+#define PIC_IRT_XGS0_INDEX 21
+#define PIC_IRT_XGS1_INDEX 22
+#define PIC_IRT_HYPER_FATAL_INDEX 23
+#define PIC_IRT_PCIX_FATAL_INDEX 24
+#define PIC_IRT_BRIDGE_AERR_INDEX 25
+#define PIC_IRT_BRIDGE_BERR_INDEX 26
+#define PIC_IRT_BRIDGE_TB_XLR_INDEX 27
+#define PIC_IRT_BRIDGE_AERR_NMI_INDEX 28
+/* XLS */
+#define PIC_IRT_GMAC4_INDEX 21
+#define PIC_IRT_GMAC5_INDEX 22
+#define PIC_IRT_GMAC6_INDEX 23
+#define PIC_IRT_GMAC7_INDEX 24
+#define PIC_IRT_BRIDGE_ERR_INDEX 25
+#define PIC_IRT_PCIE_LINK0_INDEX 26
+#define PIC_IRT_PCIE_LINK1_INDEX 27
+#define PIC_IRT_PCIE_LINK2_INDEX 23
+#define PIC_IRT_PCIE_LINK3_INDEX 24
+#define PIC_IRT_PCIE_XLSB0_LINK2_INDEX 28
+#define PIC_IRT_PCIE_XLSB0_LINK3_INDEX 29
+#define PIC_IRT_SRIO_LINK0_INDEX 26
+#define PIC_IRT_SRIO_LINK1_INDEX 27
+#define PIC_IRT_SRIO_LINK2_INDEX 28
+#define PIC_IRT_SRIO_LINK3_INDEX 29
+#define PIC_IRT_PCIE_INT_INDEX 28
+#define PIC_IRT_PCIE_FATAL_INDEX 29
+#define PIC_IRT_GPIO_B_INDEX 30
+#define PIC_IRT_USB_INDEX 31
+/* XLS */
+#define PIC_NUM_IRTS 32
+
+
+#define PIC_CLOCK_TIMER 7
+
+/* PIC Registers */
+#define PIC_CTRL 0x00
+#define PIC_IPI 0x04
+#define PIC_INT_ACK 0x06
+
+#define WD_MAX_VAL_0 0x08
+#define WD_MAX_VAL_1 0x09
+#define WD_MASK_0 0x0a
+#define WD_MASK_1 0x0b
+#define WD_HEARBEAT_0 0x0c
+#define WD_HEARBEAT_1 0x0d
+
+#define PIC_IRT_0_BASE 0x40
+#define PIC_IRT_1_BASE 0x80
+#define PIC_TIMER_MAXVAL_0_BASE 0x100
+#define PIC_TIMER_MAXVAL_1_BASE 0x110
+#define PIC_TIMER_COUNT_0_BASE 0x120
+#define PIC_TIMER_COUNT_1_BASE 0x130
+
+#define PIC_IRT_0(picintr) (PIC_IRT_0_BASE + (picintr))
+#define PIC_IRT_1(picintr) (PIC_IRT_1_BASE + (picintr))
+
+#define PIC_TIMER_MAXVAL_0(i) (PIC_TIMER_MAXVAL_0_BASE + (i))
+#define PIC_TIMER_MAXVAL_1(i) (PIC_TIMER_MAXVAL_1_BASE + (i))
+#define PIC_TIMER_COUNT_0(i) (PIC_TIMER_COUNT_0_BASE + (i))
+#define PIC_TIMER_COUNT_1(i) (PIC_TIMER_COUNT_0_BASE + (i))
+
+/*
+ * Mapping between hardware interrupt numbers and IRQs on CPU
+ * we use a simple scheme to map PIC interrupts 0-31 to IRQs
+ * 8-39. This leaves the IRQ 0-7 for cpu interrupts like
+ * count/compare and FMN
+ */
+#define PIC_IRQ_BASE 8
+#define PIC_INTR_TO_IRQ(i) (PIC_IRQ_BASE + (i))
+#define PIC_IRQ_TO_INTR(i) ((i) - PIC_IRQ_BASE)
+
+#define PIC_IRT_FIRST_IRQ PIC_IRQ_BASE
+#define PIC_WD_IRQ PIC_INTR_TO_IRQ(PIC_IRT_WD_INDEX)
+#define PIC_TIMER_0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_0_INDEX)
+#define PIC_TIMER_1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_1_INDEX)
+#define PIC_TIMER_2_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_2_INDEX)
+#define PIC_TIMER_3_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_3_INDEX)
+#define PIC_TIMER_4_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_4_INDEX)
+#define PIC_TIMER_5_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_5_INDEX)
+#define PIC_TIMER_6_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_6_INDEX)
+#define PIC_TIMER_7_IRQ PIC_INTR_TO_IRQ(PIC_IRT_TIMER_7_INDEX)
+#define PIC_CLOCK_IRQ (PIC_TIMER_7_IRQ)
+#define PIC_UART_0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_UART_0_INDEX)
+#define PIC_UART_1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_UART_1_INDEX)
+#define PIC_I2C_0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_I2C_0_INDEX)
+#define PIC_I2C_1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_I2C_1_INDEX)
+#define PIC_PCMCIA_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCMCIA_INDEX)
+#define PIC_GPIO_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GPIO_INDEX)
+#define PIC_HYPER_IRQ PIC_INTR_TO_IRQ(PIC_IRT_HYPER_INDEX)
+#define PIC_PCIX_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIX_INDEX)
+/* XLS */
+#define PIC_CDE_IRQ PIC_INTR_TO_IRQ(PIC_IRT_CDE_INDEX)
+#define PIC_BRIDGE_TB_XLS_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_TB_XLS_INDEX)
+/* end XLS */
+#define PIC_GMAC_0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC0_INDEX)
+#define PIC_GMAC_1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC1_INDEX)
+#define PIC_GMAC_2_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC2_INDEX)
+#define PIC_GMAC_3_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC3_INDEX)
+#define PIC_XGS_0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_XGS0_INDEX)
+#define PIC_XGS_1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_XGS1_INDEX)
+#define PIC_HYPER_FATAL_IRQ PIC_INTR_TO_IRQ(PIC_IRT_HYPER_FATAL_INDEX)
+#define PIC_PCIX_FATAL_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIX_FATAL_INDEX)
+#define PIC_BRIDGE_AERR_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_AERR_INDEX)
+#define PIC_BRIDGE_BERR_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_BERR_INDEX)
+#define PIC_BRIDGE_TB_XLR_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_TB_XLR_INDEX)
+#define PIC_BRIDGE_AERR_NMI_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_AERR_NMI_INDEX)
+/* XLS defines */
+#define PIC_GMAC_4_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC4_INDEX)
+#define PIC_GMAC_5_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC5_INDEX)
+#define PIC_GMAC_6_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC6_INDEX)
+#define PIC_GMAC_7_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GMAC7_INDEX)
+#define PIC_BRIDGE_ERR_IRQ PIC_INTR_TO_IRQ(PIC_IRT_BRIDGE_ERR_INDEX)
+#define PIC_PCIE_LINK0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_LINK0_INDEX)
+#define PIC_PCIE_LINK1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_LINK1_INDEX)
+#define PIC_PCIE_LINK2_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_LINK2_INDEX)
+#define PIC_PCIE_LINK3_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_LINK3_INDEX)
+#define PIC_PCIE_XLSB0_LINK2_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_XLSB0_LINK2_INDEX)
+#define PIC_PCIE_XLSB0_LINK3_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_XLSB0_LINK3_INDEX)
+#define PIC_SRIO_LINK0_IRQ PIC_INTR_TO_IRQ(PIC_IRT_SRIO_LINK0_INDEX)
+#define PIC_SRIO_LINK1_IRQ PIC_INTR_TO_IRQ(PIC_IRT_SRIO_LINK1_INDEX)
+#define PIC_SRIO_LINK2_IRQ PIC_INTR_TO_IRQ(PIC_IRT_SRIO_LINK2_INDEX)
+#define PIC_SRIO_LINK3_IRQ PIC_INTR_TO_IRQ(PIC_IRT_SRIO_LINK3_INDEX)
+#define PIC_PCIE_INT_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_INT__INDEX)
+#define PIC_PCIE_FATAL_IRQ PIC_INTR_TO_IRQ(PIC_IRT_PCIE_FATAL_INDEX)
+#define PIC_GPIO_B_IRQ PIC_INTR_TO_IRQ(PIC_IRT_GPIO_B_INDEX)
+#define PIC_USB_IRQ PIC_INTR_TO_IRQ(PIC_IRT_USB_INDEX)
+#define PIC_IRT_LAST_IRQ PIC_USB_IRQ
+/* end XLS */
+
+#ifndef __ASSEMBLY__
+static inline void pic_send_ipi(u32 ipi)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+
+ netlogic_write_reg(mmio, PIC_IPI, ipi);
+}
+
+static inline u32 pic_read_control(void)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+
+ return netlogic_read_reg(mmio, PIC_CTRL);
+}
+
+static inline void pic_write_control(u32 control)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+
+ netlogic_write_reg(mmio, PIC_CTRL, control);
+}
+
+static inline void pic_update_control(u32 control)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+
+ netlogic_write_reg(mmio, PIC_CTRL,
+ (control | netlogic_read_reg(mmio, PIC_CTRL)));
+}
+
+#define PIC_IRQ_IS_EDGE_TRIGGERED(irq) (((irq) >= PIC_TIMER_0_IRQ) && \
+ ((irq) <= PIC_TIMER_7_IRQ))
+#define PIC_IRQ_IS_IRT(irq) (((irq) >= PIC_IRT_FIRST_IRQ) && \
+ ((irq) <= PIC_IRT_LAST_IRQ))
+#endif
+
+#endif /* _ASM_NLM_XLR_PIC_H */
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ASM_NLM_XLR_H
+#define _ASM_NLM_XLR_H
+
+/* Platform UART functions */
+struct uart_port;
+unsigned int nlm_xlr_uart_in(struct uart_port *, int);
+void nlm_xlr_uart_out(struct uart_port *, int, int);
+
+/* SMP support functions */
+struct irq_desc;
+void nlm_smp_function_ipi_handler(unsigned int irq, struct irq_desc *desc);
+void nlm_smp_resched_ipi_handler(unsigned int irq, struct irq_desc *desc);
+int nlm_wakeup_secondary_cpus(u32 wakeup_mask);
+void nlm_smp_irq_init(void);
+void nlm_boot_smp_nmi(void);
+void prom_pre_boot_secondary_cpus(void);
+
+extern struct plat_smp_ops nlm_smp_ops;
+extern unsigned long nlm_common_ebase;
+
+/* XLS B silicon "Rook" */
+static inline unsigned int nlm_chip_is_xls_b(void)
+{
+ uint32_t prid = read_c0_prid();
+
+ return ((prid & 0xf000) == 0x4000);
+}
+
+/*
+ * XLR chip types
+ */
+ /* The XLS product line has chip versions 0x[48c]? */
+static inline unsigned int nlm_chip_is_xls(void)
+{
+ uint32_t prid = read_c0_prid();
+
+ return ((prid & 0xf000) == 0x8000 || (prid & 0xf000) == 0x4000 ||
+ (prid & 0xf000) == 0xc000);
+}
+
+#endif /* _ASM_NLM_XLR_H */
#define instruction_pointer(regs) ((regs)->cp0_epc)
#define profile_pc(regs) instruction_pointer(regs)
-extern asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit);
+extern asmlinkage void syscall_trace_enter(struct pt_regs *regs);
+extern asmlinkage void syscall_trace_leave(struct pt_regs *regs);
extern NORET_TYPE void die(const char *, struct pt_regs *) ATTRIB_NORET;
#define _TIF_FPUBOUND (1<<TIF_FPUBOUND)
#define _TIF_LOAD_WATCH (1<<TIF_LOAD_WATCH)
+/* work to do in syscall_trace_leave() */
+#define _TIF_WORK_SYSCALL_EXIT (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT)
+
/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK (0x0000ffef & \
~(_TIF_SECCOMP | _TIF_SYSCALL_AUDIT))
#endif
}
-static inline void clocksource_set_clock(struct clocksource *cs,
- unsigned int clock)
-{
- clocksource_calc_mult_shift(cs, clock, 4);
-}
-
static inline void clockevent_set_clock(struct clock_event_device *cd,
unsigned int clock)
{
*/
int vdma_remap(unsigned long laddr, unsigned long paddr, unsigned long size)
{
- int first, pages, npages;
+ int first, pages;
if (laddr > 0xffffff) {
if (vdma_debug)
return -EINVAL; /* invalid physical address */
}
- npages = pages =
- (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
+ pages = (((paddr & (VDMA_PAGESIZE - 1)) + size) >> 12) + 1;
first = laddr >> 12;
if (vdma_debug)
printk("vdma_remap: first=%x, pages=%x\n", first, pages);
static void jz4740_dma_chan_irq(struct jz4740_dma_chan *dma)
{
- uint32_t status;
-
- status = jz4740_dma_read(JZ_REG_DMA_STATUS_CTRL(dma->id));
+ (void) jz4740_dma_read(JZ_REG_DMA_STATUS_CTRL(dma->id));
jz4740_dma_write_mask(JZ_REG_DMA_STATUS_CTRL(dma->id), 0,
JZ_DMA_STATUS_CTRL_ENABLE | JZ_DMA_STATUS_CTRL_TRANSFER_DONE);
/*
* Copyright (C) 2009-2010, Lars-Peter Clausen <lars@metafoo.de>
+ * Copyright (C) 2011, Maarten ter Huurne <maarten@treewalker.org>
* JZ4740 setup code
*
* This program is free software; you can redistribute it and/or modify it
*/
#include <linux/init.h>
+#include <linux/io.h>
#include <linux/kernel.h>
+#include <asm/bootinfo.h>
+
+#include <asm/mach-jz4740/base.h>
+
#include "reset.h"
+
+#define JZ4740_EMC_SDRAM_CTRL 0x80
+
+
+static void __init jz4740_detect_mem(void)
+{
+ void __iomem *jz_emc_base;
+ u32 ctrl, bus, bank, rows, cols;
+ phys_t size;
+
+ jz_emc_base = ioremap(JZ4740_EMC_BASE_ADDR, 0x100);
+ ctrl = readl(jz_emc_base + JZ4740_EMC_SDRAM_CTRL);
+ bus = 2 - ((ctrl >> 31) & 1);
+ bank = 1 + ((ctrl >> 19) & 1);
+ cols = 8 + ((ctrl >> 26) & 7);
+ rows = 11 + ((ctrl >> 20) & 3);
+ printk(KERN_DEBUG
+ "SDRAM preconfigured: bus:%u bank:%u rows:%u cols:%u\n",
+ bus, bank, rows, cols);
+ iounmap(jz_emc_base);
+
+ size = 1 << (bus + bank + cols + rows);
+ add_memory_region(0, size, BOOT_MEM_RAM);
+}
+
void __init plat_mem_setup(void)
{
jz4740_reset_init();
+ jz4740_detect_mem();
}
const char *get_system_type(void)
static struct clock_event_device jz4740_clockevent = {
.name = "jz4740-timer",
- .features = CLOCK_EVT_FEAT_PERIODIC,
+ .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
.set_next_event = jz4740_clockevent_set_next,
.set_mode = jz4740_clockevent_set_mode,
.rating = 200,
clockevents_register_device(&jz4740_clockevent);
- clocksource_set_clock(&jz4740_clocksource, clk_rate);
- ret = clocksource_register(&jz4740_clocksource);
+ ret = clocksource_register_hz(&jz4740_clocksource, clk_rate);
if (ret)
printk(KERN_ERR "Failed to register clocksource: %d\n", ret);
{
writel(BIT(16), jz4740_timer_base + JZ_REG_TIMER_STOP_CLEAR);
}
+EXPORT_SYMBOL_GPL(jz4740_timer_enable_watchdog);
void jz4740_timer_disable_watchdog(void)
{
writel(BIT(16), jz4740_timer_base + JZ_REG_TIMER_STOP_SET);
}
+EXPORT_SYMBOL_GPL(jz4740_timer_disable_watchdog);
void __init jz4740_timer_init(void)
{
obj-$(CONFIG_CPU_TX49XX) += r4k_fpu.o r4k_switch.o
obj-$(CONFIG_CPU_VR41XX) += r4k_fpu.o r4k_switch.o
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += octeon_switch.o
+obj-$(CONFIG_CPU_XLR) += r4k_fpu.o r4k_switch.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SMP_UP) += smp-up.o
{
struct txx9_tmr_reg __iomem *tmrptr;
- clocksource_set_clock(&txx9_clocksource.cs, TIMER_CLK(imbusclk));
- clocksource_register(&txx9_clocksource.cs);
+ clocksource_register_hz(&txx9_clocksource.cs, TIMER_CLK(imbusclk));
tmrptr = ioremap(baseaddr, sizeof(struct txx9_tmr_reg));
__raw_writel(TCR_BASE, &tmrptr->tcr);
#endif
}
+static inline void set_elf_platform(int cpu, const char *plat)
+{
+ if (cpu == 0)
+ __elf_platform = plat;
+}
+
/*
* Get the FPU Implementation/Revision.
*/
case PRID_IMP_LOONGSON2:
c->cputype = CPU_LOONGSON2;
__cpu_name[cpu] = "ICT Loongson-2";
+
+ switch (c->processor_id & PRID_REV_MASK) {
+ case PRID_REV_LOONGSON2E:
+ set_elf_platform(cpu, "loongson2e");
+ break;
+ case PRID_REV_LOONGSON2F:
+ set_elf_platform(cpu, "loongson2f");
+ break;
+ }
+
c->isa_level = MIPS_CPU_ISA_III;
c->options = R4K_OPTS |
MIPS_CPU_FPU | MIPS_CPU_LLSC |
case PRID_IMP_BMIPS32_REV8:
c->cputype = CPU_BMIPS32;
__cpu_name[cpu] = "Broadcom BMIPS32";
+ set_elf_platform(cpu, "bmips32");
break;
case PRID_IMP_BMIPS3300:
case PRID_IMP_BMIPS3300_ALT:
case PRID_IMP_BMIPS3300_BUG:
c->cputype = CPU_BMIPS3300;
__cpu_name[cpu] = "Broadcom BMIPS3300";
+ set_elf_platform(cpu, "bmips3300");
break;
case PRID_IMP_BMIPS43XX: {
int rev = c->processor_id & 0xff;
rev <= PRID_REV_BMIPS4380_HI) {
c->cputype = CPU_BMIPS4380;
__cpu_name[cpu] = "Broadcom BMIPS4380";
+ set_elf_platform(cpu, "bmips4380");
} else {
c->cputype = CPU_BMIPS4350;
__cpu_name[cpu] = "Broadcom BMIPS4350";
+ set_elf_platform(cpu, "bmips4350");
}
break;
}
case PRID_IMP_BMIPS5000:
c->cputype = CPU_BMIPS5000;
__cpu_name[cpu] = "Broadcom BMIPS5000";
+ set_elf_platform(cpu, "bmips5000");
c->options |= MIPS_CPU_ULRI;
break;
}
c->cputype = CPU_CAVIUM_OCTEON_PLUS;
__cpu_name[cpu] = "Cavium Octeon+";
platform:
- if (cpu == 0)
- __elf_platform = "octeon";
+ set_elf_platform(cpu, "octeon");
break;
case PRID_IMP_CAVIUM_CN63XX:
c->cputype = CPU_CAVIUM_OCTEON2;
__cpu_name[cpu] = "Cavium Octeon II";
- if (cpu == 0)
- __elf_platform = "octeon2";
+ set_elf_platform(cpu, "octeon2");
break;
default:
printk(KERN_INFO "Unknown Octeon chip!\n");
}
}
+static inline void cpu_probe_netlogic(struct cpuinfo_mips *c, int cpu)
+{
+ decode_configs(c);
+
+ c->options = (MIPS_CPU_TLB |
+ MIPS_CPU_4KEX |
+ MIPS_CPU_COUNTER |
+ MIPS_CPU_DIVEC |
+ MIPS_CPU_WATCH |
+ MIPS_CPU_EJTAG |
+ MIPS_CPU_LLSC);
+
+ switch (c->processor_id & 0xff00) {
+ case PRID_IMP_NETLOGIC_XLR732:
+ case PRID_IMP_NETLOGIC_XLR716:
+ case PRID_IMP_NETLOGIC_XLR532:
+ case PRID_IMP_NETLOGIC_XLR308:
+ case PRID_IMP_NETLOGIC_XLR532C:
+ case PRID_IMP_NETLOGIC_XLR516C:
+ case PRID_IMP_NETLOGIC_XLR508C:
+ case PRID_IMP_NETLOGIC_XLR308C:
+ c->cputype = CPU_XLR;
+ __cpu_name[cpu] = "Netlogic XLR";
+ break;
+
+ case PRID_IMP_NETLOGIC_XLS608:
+ case PRID_IMP_NETLOGIC_XLS408:
+ case PRID_IMP_NETLOGIC_XLS404:
+ case PRID_IMP_NETLOGIC_XLS208:
+ case PRID_IMP_NETLOGIC_XLS204:
+ case PRID_IMP_NETLOGIC_XLS108:
+ case PRID_IMP_NETLOGIC_XLS104:
+ case PRID_IMP_NETLOGIC_XLS616B:
+ case PRID_IMP_NETLOGIC_XLS608B:
+ case PRID_IMP_NETLOGIC_XLS416B:
+ case PRID_IMP_NETLOGIC_XLS412B:
+ case PRID_IMP_NETLOGIC_XLS408B:
+ case PRID_IMP_NETLOGIC_XLS404B:
+ c->cputype = CPU_XLR;
+ __cpu_name[cpu] = "Netlogic XLS";
+ break;
+
+ default:
+ printk(KERN_INFO "Unknown Netlogic chip id [%02x]!\n",
+ c->processor_id);
+ c->cputype = CPU_XLR;
+ break;
+ }
+
+ c->isa_level = MIPS_CPU_ISA_M64R1;
+ c->tlbsize = ((read_c0_config1() >> 25) & 0x3f) + 1;
+}
+
#ifdef CONFIG_64BIT
/* For use by uaccess.h */
u64 __ua_limit;
case PRID_COMP_INGENIC:
cpu_probe_ingenic(c, cpu);
break;
+ case PRID_COMP_NETLOGIC:
+ cpu_probe_netlogic(c, cpu);
+ break;
}
BUG_ON(!__cpu_name[cpu]);
plldiv = G_BCM1480_SYS_PLL_DIV(__raw_readq(IOADDR(A_SCD_SYSTEM_CFG)));
zbbus = ((plldiv >> 1) * 50000000) + ((plldiv & 1) * 25000000);
- clocksource_set_clock(cs, zbbus);
- clocksource_register(cs);
+ clocksource_register_hz(cs, zbbus);
}
printk(KERN_INFO "I/O ASIC clock frequency %dHz\n", freq);
clocksource_dec.rating = 200 + freq / 10000000;
- clocksource_set_clock(&clocksource_dec, freq);
-
- clocksource_register(&clocksource_dec);
+ clocksource_register_hz(&clocksource_dec, freq);
}
clocksource_mips.rating = 200 + mips_hpt_frequency / 10000000;
- clocksource_set_clock(&clocksource_mips, mips_hpt_frequency);
-
- clocksource_register(&clocksource_mips);
+ clocksource_register_hz(&clocksource_mips, mips_hpt_frequency);
}
/**
/**
* powertv_tim_c_clocksource_init - set up a clock source for the TIM_C clock
*
- * The hard part here is coming up with a constant k and shift s such that
- * the 48-bit TIM_C value multiplied by k doesn't overflow and that value,
- * when shifted right by s, yields the corresponding number of nanoseconds.
* We know that TIM_C counts at 27 MHz/8, so each cycle corresponds to
- * 1 / (27,000,000/8) seconds. Multiply that by a billion and you get the
- * number of nanoseconds. Since the TIM_C value has 48 bits and the math is
- * done in 64 bits, avoiding an overflow means that k must be less than
- * 64 - 48 = 16 bits.
+ * 1 / (27,000,000/8) seconds.
*/
static void __init powertv_tim_c_clocksource_init(void)
{
- int prescale;
- unsigned long dividend;
- unsigned long k;
- int s;
- const int max_k_bits = (64 - 48) - 1;
- const unsigned long billion = 1000000000;
const unsigned long counts_per_second = 27000000 / 8;
- prescale = BITS_PER_LONG - ilog2(billion) - 1;
- dividend = billion << prescale;
- k = dividend / counts_per_second;
- s = ilog2(k) - max_k_bits;
-
- if (s < 0)
- s = prescale;
-
- else {
- k >>= s;
- s += prescale;
- }
-
- clocksource_tim_c.mult = k;
- clocksource_tim_c.shift = s;
clocksource_tim_c.rating = 200;
- clocksource_register(&clocksource_tim_c);
+ clocksource_register_hz(&clocksource_tim_c, counts_per_second);
tim_c = (struct tim_c *) asic_reg_addr(tim_ch);
}
/* Calculate a somewhat reasonable rating value */
clocksource_mips.rating = 200 + mips_hpt_frequency / 10000000;
- clocksource_set_clock(&clocksource_mips, mips_hpt_frequency);
-
- clocksource_register(&clocksource_mips);
+ clocksource_register_hz(&clocksource_mips, mips_hpt_frequency);
return 0;
}
IOADDR(A_SCD_TIMER_REGISTER(SB1250_HPT_NUM,
R_SCD_TIMER_CFG)));
- clocksource_set_clock(cs, V_SCD_TIMER_FREQ);
- clocksource_register(cs);
+ clocksource_register_hz(cs, V_SCD_TIMER_FREQ);
}
FEXPORT(syscall_exit_work_partial)
SAVE_STATIC
syscall_exit_work:
- li t0, _TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT
+ li t0, _TIF_WORK_SYSCALL_EXIT
and t0, a2 # a2 is preloaded with TI_FLAGS
beqz t0, work_pending # trace bit set?
- local_irq_enable # could let do_syscall_trace()
+ local_irq_enable # could let syscall_trace_leave()
# call schedule() instead
move a0, sp
- li a1, 1
- jal do_syscall_trace
+ jal syscall_trace_leave
b resume_userspace
#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_MIPS_MT)
#define JAL 0x0c000000 /* jump & link: ip --> ra, jump to target */
#define ADDR_MASK 0x03ffffff /* op_code|addr : 31...26|25 ....0 */
+#define JUMP_RANGE_MASK ((1UL << 28) - 1)
#define INSN_NOP 0x00000000 /* nop */
#define INSN_JAL(addr) \
/* jal (ftrace_caller + 8), jump over the first two instruction */
buf = (u32 *)&insn_jal_ftrace_caller;
- uasm_i_jal(&buf, (FTRACE_ADDR + 8));
+ uasm_i_jal(&buf, (FTRACE_ADDR + 8) & JUMP_RANGE_MASK);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* j ftrace_graph_caller */
buf = (u32 *)&insn_j_ftrace_graph_caller;
- uasm_i_j(&buf, (unsigned long)ftrace_graph_caller);
+ uasm_i_j(&buf, (unsigned long)ftrace_graph_caller & JUMP_RANGE_MASK);
#endif
}
setup_irq(0, &irq0);
}
-/*
- * Since the PIT overflows every tick, its not very useful
- * to just read by itself. So use jiffies to emulate a free
- * running counter:
- */
-static cycle_t pit_read(struct clocksource *cs)
-{
- unsigned long flags;
- int count;
- u32 jifs;
- static int old_count;
- static u32 old_jifs;
-
- raw_spin_lock_irqsave(&i8253_lock, flags);
- /*
- * Although our caller may have the read side of xtime_lock,
- * this is now a seqlock, and we are cheating in this routine
- * by having side effects on state that we cannot undo if
- * there is a collision on the seqlock and our caller has to
- * retry. (Namely, old_jifs and old_count.) So we must treat
- * jiffies as volatile despite the lock. We read jiffies
- * before latching the timer count to guarantee that although
- * the jiffies value might be older than the count (that is,
- * the counter may underflow between the last point where
- * jiffies was incremented and the point where we latch the
- * count), it cannot be newer.
- */
- jifs = jiffies;
- outb_p(0x00, PIT_MODE); /* latch the count ASAP */
- count = inb_p(PIT_CH0); /* read the latched count */
- count |= inb_p(PIT_CH0) << 8;
-
- /* VIA686a test code... reset the latch if count > max + 1 */
- if (count > LATCH) {
- outb_p(0x34, PIT_MODE);
- outb_p(LATCH & 0xff, PIT_CH0);
- outb(LATCH >> 8, PIT_CH0);
- count = LATCH - 1;
- }
-
- /*
- * It's possible for count to appear to go the wrong way for a
- * couple of reasons:
- *
- * 1. The timer counter underflows, but we haven't handled the
- * resulting interrupt and incremented jiffies yet.
- * 2. Hardware problem with the timer, not giving us continuous time,
- * the counter does small "jumps" upwards on some Pentium systems,
- * (see c't 95/10 page 335 for Neptun bug.)
- *
- * Previous attempts to handle these cases intelligently were
- * buggy, so we just do the simple thing now.
- */
- if (count > old_count && jifs == old_jifs) {
- count = old_count;
- }
- old_count = count;
- old_jifs = jifs;
-
- raw_spin_unlock_irqrestore(&i8253_lock, flags);
-
- count = (LATCH - 1) - count;
-
- return (cycle_t)(jifs * LATCH) + count;
-}
-
-static struct clocksource clocksource_pit = {
- .name = "pit",
- .rating = 110,
- .read = pit_read,
- .mask = CLOCKSOURCE_MASK(32),
- .mult = 0,
- .shift = 20,
-};
-
static int __init init_pit_clocksource(void)
{
if (num_possible_cpus() > 1) /* PIT does not scale! */
return 0;
- clocksource_pit.mult = clocksource_hz2mult(CLOCK_TICK_RATE, 20);
- return clocksource_register(&clocksource_pit);
+ return clocksource_i8253_init();
}
arch_initcall(init_pit_clocksource);
* Notification of system call entry/exit
* - triggered by current->work.syscall_trace
*/
-asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit)
+asmlinkage void syscall_trace_enter(struct pt_regs *regs)
{
/* do the secure computing check first */
- if (!entryexit)
- secure_computing(regs->regs[2]);
-
- if (unlikely(current->audit_context) && entryexit)
- audit_syscall_exit(AUDITSC_RESULT(regs->regs[2]),
- regs->regs[2]);
+ secure_computing(regs->regs[2]);
if (!(current->ptrace & PT_PTRACED))
goto out;
}
out:
- if (unlikely(current->audit_context) && !entryexit)
+ if (unlikely(current->audit_context))
audit_syscall_entry(audit_arch(), regs->regs[2],
regs->regs[4], regs->regs[5],
regs->regs[6], regs->regs[7]);
}
+
+/*
+ * Notification of system call entry/exit
+ * - triggered by current->work.syscall_trace
+ */
+asmlinkage void syscall_trace_leave(struct pt_regs *regs)
+{
+ if (unlikely(current->audit_context))
+ audit_syscall_exit(AUDITSC_RESULT(regs->regs[7]),
+ -regs->regs[2]);
+
+ if (!(current->ptrace & PT_PTRACED))
+ return;
+
+ if (!test_thread_flag(TIF_SYSCALL_TRACE))
+ return;
+
+ /* The 0x80 provides a way for the tracing parent to distinguish
+ between a syscall stop and SIGTRAP delivery */
+ ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD) ?
+ 0x80 : 0));
+
+ /*
+ * this isn't the same as continuing with a signal, but it will do
+ * for normal use. strace only continues with a signal if the
+ * stopping signal is not SIGTRAP. -brl
+ */
+ if (current->exit_code) {
+ send_sig(current->exit_code, current, 1);
+ current->exit_code = 0;
+ }
+}
SAVE_STATIC
move s0, t2
move a0, sp
- li a1, 0
- jal do_syscall_trace
+ jal syscall_trace_enter
move t0, s0
RESTORE_STATIC
sys sys_ioprio_get 2 /* 4315 */
sys sys_utimensat 4
sys sys_signalfd 3
- sys sys_ni_syscall 0
+ sys sys_ni_syscall 0 /* was timerfd */
sys sys_eventfd 1
sys sys_fallocate 6 /* 4320 */
sys sys_timerfd_create 2
SAVE_STATIC
move s0, t2
move a0, sp
- li a1, 0
- jal do_syscall_trace
+ jal syscall_trace_enter
move t0, s0
RESTORE_STATIC
PTR sys_ioprio_get
PTR sys_utimensat /* 5275 */
PTR sys_signalfd
- PTR sys_ni_syscall
+ PTR sys_ni_syscall /* was timerfd */
PTR sys_eventfd
PTR sys_fallocate
PTR sys_timerfd_create /* 5280 */
SAVE_STATIC
move s0, t2
move a0, sp
- li a1, 0
- jal do_syscall_trace
+ jal syscall_trace_enter
move t0, s0
RESTORE_STATIC
PTR sys_ioprio_get
PTR compat_sys_utimensat
PTR compat_sys_signalfd /* 6280 */
- PTR sys_ni_syscall
+ PTR sys_ni_syscall /* was timerfd */
PTR sys_eventfd
PTR sys_fallocate
PTR sys_timerfd_create
move s0, t2 # Save syscall pointer
move a0, sp
- li a1, 0
- jal do_syscall_trace
+ jal syscall_trace_enter
move t0, s0
RESTORE_STATIC
PTR sys_ioprio_get /* 4315 */
PTR compat_sys_utimensat
PTR compat_sys_signalfd
- PTR sys_ni_syscall
+ PTR sys_ni_syscall /* was timerfd */
PTR sys_eventfd
PTR sys32_fallocate /* 4320 */
PTR sys_timerfd_create
static void ipi_resched_interrupt(void)
{
- /* Return from interrupt should be enough to cause scheduler check */
+ scheduler_ipi();
}
static void ipi_call_interrupt(void)
#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/linkage.h>
-#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/smp.h>
-#include <linux/mman.h>
#include <linux/ptrace.h>
-#include <linux/sched.h>
#include <linux/string.h>
#include <linux/syscalls.h>
#include <linux/file.h>
#include <linux/msg.h>
#include <linux/shm.h>
#include <linux/compiler.h>
-#include <linux/module.h>
#include <linux/ipc.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
-#include <linux/random.h>
#include <linux/elf.h>
#include <asm/asm.h>
return res;
}
-unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
-
-EXPORT_SYMBOL(shm_align_mask);
-
-#define COLOUR_ALIGN(addr,pgoff) \
- ((((addr) + shm_align_mask) & ~shm_align_mask) + \
- (((pgoff) << PAGE_SHIFT) & shm_align_mask))
-
-unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
-{
- struct vm_area_struct * vmm;
- int do_color_align;
- unsigned long task_size;
-
-#ifdef CONFIG_32BIT
- task_size = TASK_SIZE;
-#else /* Must be CONFIG_64BIT*/
- task_size = test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE;
-#endif
-
- if (len > task_size)
- return -ENOMEM;
-
- if (flags & MAP_FIXED) {
- /* Even MAP_FIXED mappings must reside within task_size. */
- if (task_size - len < addr)
- return -EINVAL;
-
- /*
- * We do not accept a shared mapping if it would violate
- * cache aliasing constraints.
- */
- if ((flags & MAP_SHARED) &&
- ((addr - (pgoff << PAGE_SHIFT)) & shm_align_mask))
- return -EINVAL;
- return addr;
- }
-
- do_color_align = 0;
- if (filp || (flags & MAP_SHARED))
- do_color_align = 1;
- if (addr) {
- if (do_color_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
- vmm = find_vma(current->mm, addr);
- if (task_size - len >= addr &&
- (!vmm || addr + len <= vmm->vm_start))
- return addr;
- }
- addr = current->mm->mmap_base;
- if (do_color_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
-
- for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
- /* At this point: (!vmm || addr < vmm->vm_end). */
- if (task_size - len < addr)
- return -ENOMEM;
- if (!vmm || addr + len <= vmm->vm_start)
- return addr;
- addr = vmm->vm_end;
- if (do_color_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- }
-}
-
-void arch_pick_mmap_layout(struct mm_struct *mm)
-{
- unsigned long random_factor = 0UL;
-
- if (current->flags & PF_RANDOMIZE) {
- random_factor = get_random_int();
- random_factor = random_factor << PAGE_SHIFT;
- if (TASK_IS_32BIT_ADDR)
- random_factor &= 0xfffffful;
- else
- random_factor &= 0xffffffful;
- }
-
- mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
- mm->unmap_area = arch_unmap_area;
-}
-
-static inline unsigned long brk_rnd(void)
-{
- unsigned long rnd = get_random_int();
-
- rnd = rnd << PAGE_SHIFT;
- /* 8MB for 32bit, 256MB for 64bit */
- if (TASK_IS_32BIT_ADDR)
- rnd = rnd & 0x7ffffful;
- else
- rnd = rnd & 0xffffffful;
-
- return rnd;
-}
-
-unsigned long arch_randomize_brk(struct mm_struct *mm)
-{
- unsigned long base = mm->brk;
- unsigned long ret;
-
- ret = PAGE_ALIGN(base + brk_rnd());
-
- if (ret < mm->brk)
- return mm->brk;
-
- return ret;
-}
-
SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long,
fd, off_t, offset)
unsigned long dvpret = dvpe();
#endif /* CONFIG_MIPS_MT_SMTC */
- notify_die(DIE_OOPS, str, regs, 0, regs_to_trapnr(regs), SIGSEGV);
+ if (notify_die(DIE_OOPS, str, regs, 0, regs_to_trapnr(regs), SIGSEGV) == NOTIFY_STOP)
+ sig = 0;
console_verbose();
spin_lock_irq(&die_lock);
mips_mt_regdump(dvpret);
#endif /* CONFIG_MIPS_MT_SMTC */
- if (notify_die(DIE_OOPS, str, regs, 0, regs_to_trapnr(regs), SIGSEGV) == NOTIFY_STOP)
- sig = 0;
-
printk("%s[#%d]:\n", str, ++die_counter);
show_registers(regs);
add_taint(TAINT_DIE);
RODATA
/* writeable */
+ _sdata = .; /* Start of data section */
.data : { /* Data */
. = . + DATAOFFSET; /* for CONFIG_MAPPED_KERNEL */
INIT_TASK_DATA(PAGE_SIZE)
NOSAVE_DATA
CACHELINE_ALIGNED_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)
+ READ_MOSTLY_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)
DATA_DATA
CONSTRUCTORS
}
--- /dev/null
+if LANTIQ
+
+config SOC_TYPE_XWAY
+ bool
+ default n
+
+choice
+ prompt "SoC Type"
+ default SOC_XWAY
+
+config SOC_AMAZON_SE
+ bool "Amazon SE"
+ select SOC_TYPE_XWAY
+
+config SOC_XWAY
+ bool "XWAY"
+ select SOC_TYPE_XWAY
+ select HW_HAS_PCI
+endchoice
+
+source "arch/mips/lantiq/xway/Kconfig"
+
+endif
--- /dev/null
+# Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License version 2 as published
+# by the Free Software Foundation.
+
+obj-y := irq.o setup.o clk.o prom.o devices.o
+
+obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+
+obj-$(CONFIG_SOC_TYPE_XWAY) += xway/
--- /dev/null
+#
+# Lantiq
+#
+
+platform-$(CONFIG_LANTIQ) += lantiq/
+cflags-$(CONFIG_LANTIQ) += -I$(srctree)/arch/mips/include/asm/mach-lantiq
+load-$(CONFIG_LANTIQ) = 0xffffffff80002000
+cflags-$(CONFIG_SOC_TYPE_XWAY) += -I$(srctree)/arch/mips/include/asm/mach-lantiq/xway
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 Thomas Langer <thomas.langer@lantiq.com>
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/list.h>
+
+#include <asm/time.h>
+#include <asm/irq.h>
+#include <asm/div64.h>
+
+#include <lantiq_soc.h>
+
+#include "clk.h"
+
+struct clk {
+ const char *name;
+ unsigned long rate;
+ unsigned long (*get_rate) (void);
+};
+
+static struct clk *cpu_clk;
+static int cpu_clk_cnt;
+
+/* lantiq socs have 3 static clocks */
+static struct clk cpu_clk_generic[] = {
+ {
+ .name = "cpu",
+ .get_rate = ltq_get_cpu_hz,
+ }, {
+ .name = "fpi",
+ .get_rate = ltq_get_fpi_hz,
+ }, {
+ .name = "io",
+ .get_rate = ltq_get_io_region_clock,
+ },
+};
+
+static struct resource ltq_cgu_resource = {
+ .name = "cgu",
+ .start = LTQ_CGU_BASE_ADDR,
+ .end = LTQ_CGU_BASE_ADDR + LTQ_CGU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+/* remapped clock register range */
+void __iomem *ltq_cgu_membase;
+
+void clk_init(void)
+{
+ cpu_clk = cpu_clk_generic;
+ cpu_clk_cnt = ARRAY_SIZE(cpu_clk_generic);
+}
+
+static inline int clk_good(struct clk *clk)
+{
+ return clk && !IS_ERR(clk);
+}
+
+unsigned long clk_get_rate(struct clk *clk)
+{
+ if (unlikely(!clk_good(clk)))
+ return 0;
+
+ if (clk->rate != 0)
+ return clk->rate;
+
+ if (clk->get_rate != NULL)
+ return clk->get_rate();
+
+ return 0;
+}
+EXPORT_SYMBOL(clk_get_rate);
+
+struct clk *clk_get(struct device *dev, const char *id)
+{
+ int i;
+
+ for (i = 0; i < cpu_clk_cnt; i++)
+ if (!strcmp(id, cpu_clk[i].name))
+ return &cpu_clk[i];
+ BUG();
+ return ERR_PTR(-ENOENT);
+}
+EXPORT_SYMBOL(clk_get);
+
+void clk_put(struct clk *clk)
+{
+ /* not used */
+}
+EXPORT_SYMBOL(clk_put);
+
+static inline u32 ltq_get_counter_resolution(void)
+{
+ u32 res;
+
+ __asm__ __volatile__(
+ ".set push\n"
+ ".set mips32r2\n"
+ "rdhwr %0, $3\n"
+ ".set pop\n"
+ : "=&r" (res)
+ : /* no input */
+ : "memory");
+
+ return res;
+}
+
+void __init plat_time_init(void)
+{
+ struct clk *clk;
+
+ if (insert_resource(&iomem_resource, <q_cgu_resource) < 0)
+ panic("Failed to insert cgu memory\n");
+
+ if (request_mem_region(ltq_cgu_resource.start,
+ resource_size(<q_cgu_resource), "cgu") < 0)
+ panic("Failed to request cgu memory\n");
+
+ ltq_cgu_membase = ioremap_nocache(ltq_cgu_resource.start,
+ resource_size(<q_cgu_resource));
+ if (!ltq_cgu_membase) {
+ pr_err("Failed to remap cgu memory\n");
+ unreachable();
+ }
+ clk = clk_get(0, "cpu");
+ mips_hpt_frequency = clk_get_rate(clk) / ltq_get_counter_resolution();
+ write_c0_compare(read_c0_count());
+ clk_put(clk);
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_CLK_H__
+#define _LTQ_CLK_H__
+
+extern void clk_init(void);
+
+extern unsigned long ltq_get_cpu_hz(void);
+extern unsigned long ltq_get_fpi_hz(void);
+extern unsigned long ltq_get_io_region_clock(void);
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/reboot.h>
+#include <linux/platform_device.h>
+#include <linux/leds.h>
+#include <linux/etherdevice.h>
+#include <linux/reboot.h>
+#include <linux/time.h>
+#include <linux/io.h>
+#include <linux/gpio.h>
+#include <linux/leds.h>
+
+#include <asm/bootinfo.h>
+#include <asm/irq.h>
+
+#include <lantiq_soc.h>
+
+#include "devices.h"
+
+/* nor flash */
+static struct resource ltq_nor_resource = {
+ .name = "nor",
+ .start = LTQ_FLASH_START,
+ .end = LTQ_FLASH_START + LTQ_FLASH_MAX - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static struct platform_device ltq_nor = {
+ .name = "ltq_nor",
+ .resource = <q_nor_resource,
+ .num_resources = 1,
+};
+
+void __init ltq_register_nor(struct physmap_flash_data *data)
+{
+ ltq_nor.dev.platform_data = data;
+ platform_device_register(<q_nor);
+}
+
+/* watchdog */
+static struct resource ltq_wdt_resource = {
+ .name = "watchdog",
+ .start = LTQ_WDT_BASE_ADDR,
+ .end = LTQ_WDT_BASE_ADDR + LTQ_WDT_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+void __init ltq_register_wdt(void)
+{
+ platform_device_register_simple("ltq_wdt", 0, <q_wdt_resource, 1);
+}
+
+/* asc ports */
+static struct resource ltq_asc0_resources[] = {
+ {
+ .name = "asc0",
+ .start = LTQ_ASC0_BASE_ADDR,
+ .end = LTQ_ASC0_BASE_ADDR + LTQ_ASC_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ },
+ IRQ_RES(tx, LTQ_ASC_TIR(0)),
+ IRQ_RES(rx, LTQ_ASC_RIR(0)),
+ IRQ_RES(err, LTQ_ASC_EIR(0)),
+};
+
+static struct resource ltq_asc1_resources[] = {
+ {
+ .name = "asc1",
+ .start = LTQ_ASC1_BASE_ADDR,
+ .end = LTQ_ASC1_BASE_ADDR + LTQ_ASC_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ },
+ IRQ_RES(tx, LTQ_ASC_TIR(1)),
+ IRQ_RES(rx, LTQ_ASC_RIR(1)),
+ IRQ_RES(err, LTQ_ASC_EIR(1)),
+};
+
+void __init ltq_register_asc(int port)
+{
+ switch (port) {
+ case 0:
+ platform_device_register_simple("ltq_asc", 0,
+ ltq_asc0_resources, ARRAY_SIZE(ltq_asc0_resources));
+ break;
+ case 1:
+ platform_device_register_simple("ltq_asc", 1,
+ ltq_asc1_resources, ARRAY_SIZE(ltq_asc1_resources));
+ break;
+ default:
+ break;
+ }
+}
+
+#ifdef CONFIG_PCI
+/* pci */
+static struct platform_device ltq_pci = {
+ .name = "ltq_pci",
+ .num_resources = 0,
+};
+
+void __init ltq_register_pci(struct ltq_pci_data *data)
+{
+ ltq_pci.dev.platform_data = data;
+ platform_device_register(<q_pci);
+}
+#else
+void __init ltq_register_pci(struct ltq_pci_data *data)
+{
+ pr_err("kernel is compiled without PCI support\n");
+}
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_DEVICES_H__
+#define _LTQ_DEVICES_H__
+
+#include <lantiq_platform.h>
+#include <linux/mtd/physmap.h>
+
+#define IRQ_RES(resname, irq) \
+ {.name = #resname, .start = (irq), .flags = IORESOURCE_IRQ}
+
+extern void ltq_register_nor(struct physmap_flash_data *data);
+extern void ltq_register_wdt(void);
+extern void ltq_register_asc(int port);
+extern void ltq_register_pci(struct ltq_pci_data *data);
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/cpu.h>
+
+#include <lantiq.h>
+#include <lantiq_soc.h>
+
+/* no ioremap possible at this early stage, lets use KSEG1 instead */
+#define LTQ_ASC_BASE KSEG1ADDR(LTQ_ASC1_BASE_ADDR)
+#define ASC_BUF 1024
+#define LTQ_ASC_FSTAT ((u32 *)(LTQ_ASC_BASE + 0x0048))
+#define LTQ_ASC_TBUF ((u32 *)(LTQ_ASC_BASE + 0x0020))
+#define TXMASK 0x3F00
+#define TXOFFSET 8
+
+void prom_putchar(char c)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ do { } while ((ltq_r32(LTQ_ASC_FSTAT) & TXMASK) >> TXOFFSET);
+ if (c == '\n')
+ ltq_w32('\r', LTQ_ASC_TBUF);
+ ltq_w32(c, LTQ_ASC_TBUF);
+ local_irq_restore(flags);
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ * Copyright (C) 2010 Thomas Langer <thomas.langer@lantiq.com>
+ */
+
+#include <linux/interrupt.h>
+#include <linux/ioport.h>
+
+#include <asm/bootinfo.h>
+#include <asm/irq_cpu.h>
+
+#include <lantiq_soc.h>
+#include <irq.h>
+
+/* register definitions */
+#define LTQ_ICU_IM0_ISR 0x0000
+#define LTQ_ICU_IM0_IER 0x0008
+#define LTQ_ICU_IM0_IOSR 0x0010
+#define LTQ_ICU_IM0_IRSR 0x0018
+#define LTQ_ICU_IM0_IMR 0x0020
+#define LTQ_ICU_IM1_ISR 0x0028
+#define LTQ_ICU_OFFSET (LTQ_ICU_IM1_ISR - LTQ_ICU_IM0_ISR)
+
+#define LTQ_EIU_EXIN_C 0x0000
+#define LTQ_EIU_EXIN_INIC 0x0004
+#define LTQ_EIU_EXIN_INEN 0x000C
+
+/* irq numbers used by the external interrupt unit (EIU) */
+#define LTQ_EIU_IR0 (INT_NUM_IM4_IRL0 + 30)
+#define LTQ_EIU_IR1 (INT_NUM_IM3_IRL0 + 31)
+#define LTQ_EIU_IR2 (INT_NUM_IM1_IRL0 + 26)
+#define LTQ_EIU_IR3 INT_NUM_IM1_IRL0
+#define LTQ_EIU_IR4 (INT_NUM_IM1_IRL0 + 1)
+#define LTQ_EIU_IR5 (INT_NUM_IM1_IRL0 + 2)
+#define LTQ_EIU_IR6 (INT_NUM_IM2_IRL0 + 30)
+
+#define MAX_EIU 6
+
+/* irqs generated by device attached to the EBU need to be acked in
+ * a special manner
+ */
+#define LTQ_ICU_EBU_IRQ 22
+
+#define ltq_icu_w32(x, y) ltq_w32((x), ltq_icu_membase + (y))
+#define ltq_icu_r32(x) ltq_r32(ltq_icu_membase + (x))
+
+#define ltq_eiu_w32(x, y) ltq_w32((x), ltq_eiu_membase + (y))
+#define ltq_eiu_r32(x) ltq_r32(ltq_eiu_membase + (x))
+
+static unsigned short ltq_eiu_irq[MAX_EIU] = {
+ LTQ_EIU_IR0,
+ LTQ_EIU_IR1,
+ LTQ_EIU_IR2,
+ LTQ_EIU_IR3,
+ LTQ_EIU_IR4,
+ LTQ_EIU_IR5,
+};
+
+static struct resource ltq_icu_resource = {
+ .name = "icu",
+ .start = LTQ_ICU_BASE_ADDR,
+ .end = LTQ_ICU_BASE_ADDR + LTQ_ICU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static struct resource ltq_eiu_resource = {
+ .name = "eiu",
+ .start = LTQ_EIU_BASE_ADDR,
+ .end = LTQ_EIU_BASE_ADDR + LTQ_ICU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static void __iomem *ltq_icu_membase;
+static void __iomem *ltq_eiu_membase;
+
+void ltq_disable_irq(struct irq_data *d)
+{
+ u32 ier = LTQ_ICU_IM0_IER;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ ier += LTQ_ICU_OFFSET * (irq_nr / INT_NUM_IM_OFFSET);
+ irq_nr %= INT_NUM_IM_OFFSET;
+ ltq_icu_w32(ltq_icu_r32(ier) & ~(1 << irq_nr), ier);
+}
+
+void ltq_mask_and_ack_irq(struct irq_data *d)
+{
+ u32 ier = LTQ_ICU_IM0_IER;
+ u32 isr = LTQ_ICU_IM0_ISR;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ ier += LTQ_ICU_OFFSET * (irq_nr / INT_NUM_IM_OFFSET);
+ isr += LTQ_ICU_OFFSET * (irq_nr / INT_NUM_IM_OFFSET);
+ irq_nr %= INT_NUM_IM_OFFSET;
+ ltq_icu_w32(ltq_icu_r32(ier) & ~(1 << irq_nr), ier);
+ ltq_icu_w32((1 << irq_nr), isr);
+}
+
+static void ltq_ack_irq(struct irq_data *d)
+{
+ u32 isr = LTQ_ICU_IM0_ISR;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ isr += LTQ_ICU_OFFSET * (irq_nr / INT_NUM_IM_OFFSET);
+ irq_nr %= INT_NUM_IM_OFFSET;
+ ltq_icu_w32((1 << irq_nr), isr);
+}
+
+void ltq_enable_irq(struct irq_data *d)
+{
+ u32 ier = LTQ_ICU_IM0_IER;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ ier += LTQ_ICU_OFFSET * (irq_nr / INT_NUM_IM_OFFSET);
+ irq_nr %= INT_NUM_IM_OFFSET;
+ ltq_icu_w32(ltq_icu_r32(ier) | (1 << irq_nr), ier);
+}
+
+static unsigned int ltq_startup_eiu_irq(struct irq_data *d)
+{
+ int i;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ ltq_enable_irq(d);
+ for (i = 0; i < MAX_EIU; i++) {
+ if (irq_nr == ltq_eiu_irq[i]) {
+ /* low level - we should really handle set_type */
+ ltq_eiu_w32(ltq_eiu_r32(LTQ_EIU_EXIN_C) |
+ (0x6 << (i * 4)), LTQ_EIU_EXIN_C);
+ /* clear all pending */
+ ltq_eiu_w32(ltq_eiu_r32(LTQ_EIU_EXIN_INIC) & ~(1 << i),
+ LTQ_EIU_EXIN_INIC);
+ /* enable */
+ ltq_eiu_w32(ltq_eiu_r32(LTQ_EIU_EXIN_INEN) | (1 << i),
+ LTQ_EIU_EXIN_INEN);
+ break;
+ }
+ }
+
+ return 0;
+}
+
+static void ltq_shutdown_eiu_irq(struct irq_data *d)
+{
+ int i;
+ int irq_nr = d->irq - INT_NUM_IRQ0;
+
+ ltq_disable_irq(d);
+ for (i = 0; i < MAX_EIU; i++) {
+ if (irq_nr == ltq_eiu_irq[i]) {
+ /* disable */
+ ltq_eiu_w32(ltq_eiu_r32(LTQ_EIU_EXIN_INEN) & ~(1 << i),
+ LTQ_EIU_EXIN_INEN);
+ break;
+ }
+ }
+}
+
+static struct irq_chip ltq_irq_type = {
+ "icu",
+ .irq_enable = ltq_enable_irq,
+ .irq_disable = ltq_disable_irq,
+ .irq_unmask = ltq_enable_irq,
+ .irq_ack = ltq_ack_irq,
+ .irq_mask = ltq_disable_irq,
+ .irq_mask_ack = ltq_mask_and_ack_irq,
+};
+
+static struct irq_chip ltq_eiu_type = {
+ "eiu",
+ .irq_startup = ltq_startup_eiu_irq,
+ .irq_shutdown = ltq_shutdown_eiu_irq,
+ .irq_enable = ltq_enable_irq,
+ .irq_disable = ltq_disable_irq,
+ .irq_unmask = ltq_enable_irq,
+ .irq_ack = ltq_ack_irq,
+ .irq_mask = ltq_disable_irq,
+ .irq_mask_ack = ltq_mask_and_ack_irq,
+};
+
+static void ltq_hw_irqdispatch(int module)
+{
+ u32 irq;
+
+ irq = ltq_icu_r32(LTQ_ICU_IM0_IOSR + (module * LTQ_ICU_OFFSET));
+ if (irq == 0)
+ return;
+
+ /* silicon bug causes only the msb set to 1 to be valid. all
+ * other bits might be bogus
+ */
+ irq = __fls(irq);
+ do_IRQ((int)irq + INT_NUM_IM0_IRL0 + (INT_NUM_IM_OFFSET * module));
+
+ /* if this is a EBU irq, we need to ack it or get a deadlock */
+ if ((irq == LTQ_ICU_EBU_IRQ) && (module == 0))
+ ltq_ebu_w32(ltq_ebu_r32(LTQ_EBU_PCC_ISTAT) | 0x10,
+ LTQ_EBU_PCC_ISTAT);
+}
+
+#define DEFINE_HWx_IRQDISPATCH(x) \
+ static void ltq_hw ## x ## _irqdispatch(void) \
+ { \
+ ltq_hw_irqdispatch(x); \
+ }
+DEFINE_HWx_IRQDISPATCH(0)
+DEFINE_HWx_IRQDISPATCH(1)
+DEFINE_HWx_IRQDISPATCH(2)
+DEFINE_HWx_IRQDISPATCH(3)
+DEFINE_HWx_IRQDISPATCH(4)
+
+static void ltq_hw5_irqdispatch(void)
+{
+ do_IRQ(MIPS_CPU_TIMER_IRQ);
+}
+
+asmlinkage void plat_irq_dispatch(void)
+{
+ unsigned int pending = read_c0_status() & read_c0_cause() & ST0_IM;
+ unsigned int i;
+
+ if (pending & CAUSEF_IP7) {
+ do_IRQ(MIPS_CPU_TIMER_IRQ);
+ goto out;
+ } else {
+ for (i = 0; i < 5; i++) {
+ if (pending & (CAUSEF_IP2 << i)) {
+ ltq_hw_irqdispatch(i);
+ goto out;
+ }
+ }
+ }
+ pr_alert("Spurious IRQ: CAUSE=0x%08x\n", read_c0_status());
+
+out:
+ return;
+}
+
+static struct irqaction cascade = {
+ .handler = no_action,
+ .flags = IRQF_DISABLED,
+ .name = "cascade",
+};
+
+void __init arch_init_irq(void)
+{
+ int i;
+
+ if (insert_resource(&iomem_resource, <q_icu_resource) < 0)
+ panic("Failed to insert icu memory\n");
+
+ if (request_mem_region(ltq_icu_resource.start,
+ resource_size(<q_icu_resource), "icu") < 0)
+ panic("Failed to request icu memory\n");
+
+ ltq_icu_membase = ioremap_nocache(ltq_icu_resource.start,
+ resource_size(<q_icu_resource));
+ if (!ltq_icu_membase)
+ panic("Failed to remap icu memory\n");
+
+ if (insert_resource(&iomem_resource, <q_eiu_resource) < 0)
+ panic("Failed to insert eiu memory\n");
+
+ if (request_mem_region(ltq_eiu_resource.start,
+ resource_size(<q_eiu_resource), "eiu") < 0)
+ panic("Failed to request eiu memory\n");
+
+ ltq_eiu_membase = ioremap_nocache(ltq_eiu_resource.start,
+ resource_size(<q_eiu_resource));
+ if (!ltq_eiu_membase)
+ panic("Failed to remap eiu memory\n");
+
+ /* make sure all irqs are turned off by default */
+ for (i = 0; i < 5; i++)
+ ltq_icu_w32(0, LTQ_ICU_IM0_IER + (i * LTQ_ICU_OFFSET));
+
+ /* clear all possibly pending interrupts */
+ ltq_icu_w32(~0, LTQ_ICU_IM0_ISR + (i * LTQ_ICU_OFFSET));
+
+ mips_cpu_irq_init();
+
+ for (i = 2; i <= 6; i++)
+ setup_irq(i, &cascade);
+
+ if (cpu_has_vint) {
+ pr_info("Setting up vectored interrupts\n");
+ set_vi_handler(2, ltq_hw0_irqdispatch);
+ set_vi_handler(3, ltq_hw1_irqdispatch);
+ set_vi_handler(4, ltq_hw2_irqdispatch);
+ set_vi_handler(5, ltq_hw3_irqdispatch);
+ set_vi_handler(6, ltq_hw4_irqdispatch);
+ set_vi_handler(7, ltq_hw5_irqdispatch);
+ }
+
+ for (i = INT_NUM_IRQ0;
+ i <= (INT_NUM_IRQ0 + (5 * INT_NUM_IM_OFFSET)); i++)
+ if ((i == LTQ_EIU_IR0) || (i == LTQ_EIU_IR1) ||
+ (i == LTQ_EIU_IR2))
+ irq_set_chip_and_handler(i, <q_eiu_type,
+ handle_level_irq);
+ /* EIU3-5 only exist on ar9 and vr9 */
+ else if (((i == LTQ_EIU_IR3) || (i == LTQ_EIU_IR4) ||
+ (i == LTQ_EIU_IR5)) && (ltq_is_ar9() || ltq_is_vr9()))
+ irq_set_chip_and_handler(i, <q_eiu_type,
+ handle_level_irq);
+ else
+ irq_set_chip_and_handler(i, <q_irq_type,
+ handle_level_irq);
+
+#if !defined(CONFIG_MIPS_MT_SMP) && !defined(CONFIG_MIPS_MT_SMTC)
+ set_c0_status(IE_IRQ0 | IE_IRQ1 | IE_IRQ2 |
+ IE_IRQ3 | IE_IRQ4 | IE_IRQ5);
+#else
+ set_c0_status(IE_SW0 | IE_SW1 | IE_IRQ0 | IE_IRQ1 |
+ IE_IRQ2 | IE_IRQ3 | IE_IRQ4 | IE_IRQ5);
+#endif
+}
+
+unsigned int __cpuinit get_c0_compare_int(void)
+{
+ return CP0_LEGACY_COMPARE_IRQ;
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LANTIQ_MACH_H__
+#define _LANTIQ_MACH_H__
+
+#include <asm/mips_machine.h>
+
+enum lantiq_mach_type {
+ LTQ_MACH_GENERIC = 0,
+ LTQ_MACH_EASY50712, /* Danube evaluation board */
+ LTQ_MACH_EASY50601, /* Amazon SE evaluation board */
+};
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/module.h>
+#include <linux/clk.h>
+#include <asm/bootinfo.h>
+#include <asm/time.h>
+
+#include <lantiq.h>
+
+#include "prom.h"
+#include "clk.h"
+
+static struct ltq_soc_info soc_info;
+
+unsigned int ltq_get_cpu_ver(void)
+{
+ return soc_info.rev;
+}
+EXPORT_SYMBOL(ltq_get_cpu_ver);
+
+unsigned int ltq_get_soc_type(void)
+{
+ return soc_info.type;
+}
+EXPORT_SYMBOL(ltq_get_soc_type);
+
+const char *get_system_type(void)
+{
+ return soc_info.sys_type;
+}
+
+void prom_free_prom_memory(void)
+{
+}
+
+static void __init prom_init_cmdline(void)
+{
+ int argc = fw_arg0;
+ char **argv = (char **) KSEG1ADDR(fw_arg1);
+ int i;
+
+ for (i = 0; i < argc; i++) {
+ char *p = (char *) KSEG1ADDR(argv[i]);
+
+ if (p && *p) {
+ strlcat(arcs_cmdline, p, sizeof(arcs_cmdline));
+ strlcat(arcs_cmdline, " ", sizeof(arcs_cmdline));
+ }
+ }
+}
+
+void __init prom_init(void)
+{
+ struct clk *clk;
+
+ ltq_soc_detect(&soc_info);
+ clk_init();
+ clk = clk_get(0, "cpu");
+ snprintf(soc_info.sys_type, LTQ_SYS_TYPE_LEN - 1, "%s rev1.%d",
+ soc_info.name, soc_info.rev);
+ clk_put(clk);
+ soc_info.sys_type[LTQ_SYS_TYPE_LEN - 1] = '\0';
+ pr_info("SoC: %s\n", soc_info.sys_type);
+ prom_init_cmdline();
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_PROM_H__
+#define _LTQ_PROM_H__
+
+#define LTQ_SYS_TYPE_LEN 0x100
+
+struct ltq_soc_info {
+ unsigned char *name;
+ unsigned int rev;
+ unsigned int partnum;
+ unsigned int type;
+ unsigned char sys_type[LTQ_SYS_TYPE_LEN];
+};
+
+extern void ltq_soc_detect(struct ltq_soc_info *i);
+extern void ltq_soc_setup(void);
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+#include <asm/bootinfo.h>
+
+#include <lantiq_soc.h>
+
+#include "machtypes.h"
+#include "devices.h"
+#include "prom.h"
+
+void __init plat_mem_setup(void)
+{
+ /* assume 16M as default incase uboot fails to pass proper ramsize */
+ unsigned long memsize = 16;
+ char **envp = (char **) KSEG1ADDR(fw_arg2);
+
+ ioport_resource.start = IOPORT_RESOURCE_START;
+ ioport_resource.end = IOPORT_RESOURCE_END;
+ iomem_resource.start = IOMEM_RESOURCE_START;
+ iomem_resource.end = IOMEM_RESOURCE_END;
+
+ set_io_port_base((unsigned long) KSEG1);
+
+ while (*envp) {
+ char *e = (char *)KSEG1ADDR(*envp);
+ if (!strncmp(e, "memsize=", 8)) {
+ e += 8;
+ if (strict_strtoul(e, 0, &memsize))
+ pr_warn("bad memsize specified\n");
+ }
+ envp++;
+ }
+ memsize *= 1024 * 1024;
+ add_memory_region(0x00000000, memsize, BOOT_MEM_RAM);
+}
+
+static int __init
+lantiq_setup(void)
+{
+ ltq_soc_setup();
+ mips_machine_setup();
+ return 0;
+}
+
+arch_initcall(lantiq_setup);
+
+static void __init
+lantiq_generic_init(void)
+{
+ /* Nothing to do */
+}
+
+MIPS_MACHINE(LTQ_MACH_GENERIC,
+ "Generic",
+ "Generic Lantiq based board",
+ lantiq_generic_init);
--- /dev/null
+if SOC_XWAY
+
+menu "MIPS Machine"
+
+config LANTIQ_MACH_EASY50712
+ bool "Easy50712 - Danube"
+ default y
+
+endmenu
+
+endif
+
+if SOC_AMAZON_SE
+
+menu "MIPS Machine"
+
+config LANTIQ_MACH_EASY50601
+ bool "Easy50601 - Amazon SE"
+ default y
+
+endmenu
+
+endif
--- /dev/null
+obj-y := pmu.o ebu.o reset.o gpio.o gpio_stp.o gpio_ebu.o devices.o dma.o
+
+obj-$(CONFIG_SOC_XWAY) += clk-xway.o prom-xway.o setup-xway.o
+obj-$(CONFIG_SOC_AMAZON_SE) += clk-ase.o prom-ase.o setup-ase.o
+
+obj-$(CONFIG_LANTIQ_MACH_EASY50712) += mach-easy50712.o
+obj-$(CONFIG_LANTIQ_MACH_EASY50601) += mach-easy50601.o
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/clk.h>
+
+#include <asm/time.h>
+#include <asm/irq.h>
+#include <asm/div64.h>
+
+#include <lantiq_soc.h>
+
+/* cgu registers */
+#define LTQ_CGU_SYS 0x0010
+
+unsigned int ltq_get_io_region_clock(void)
+{
+ return CLOCK_133M;
+}
+EXPORT_SYMBOL(ltq_get_io_region_clock);
+
+unsigned int ltq_get_fpi_bus_clock(int fpi)
+{
+ return CLOCK_133M;
+}
+EXPORT_SYMBOL(ltq_get_fpi_bus_clock);
+
+unsigned int ltq_get_cpu_hz(void)
+{
+ if (ltq_cgu_r32(LTQ_CGU_SYS) & (1 << 5))
+ return CLOCK_266M;
+ else
+ return CLOCK_133M;
+}
+EXPORT_SYMBOL(ltq_get_cpu_hz);
+
+unsigned int ltq_get_fpi_hz(void)
+{
+ return CLOCK_133M;
+}
+EXPORT_SYMBOL(ltq_get_fpi_hz);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/clk.h>
+
+#include <asm/time.h>
+#include <asm/irq.h>
+#include <asm/div64.h>
+
+#include <lantiq_soc.h>
+
+static unsigned int ltq_ram_clocks[] = {
+ CLOCK_167M, CLOCK_133M, CLOCK_111M, CLOCK_83M };
+#define DDR_HZ ltq_ram_clocks[ltq_cgu_r32(LTQ_CGU_SYS) & 0x3]
+
+#define BASIC_FREQUENCY_1 35328000
+#define BASIC_FREQUENCY_2 36000000
+#define BASIS_REQUENCY_USB 12000000
+
+#define GET_BITS(x, msb, lsb) \
+ (((x) & ((1 << ((msb) + 1)) - 1)) >> (lsb))
+
+#define LTQ_CGU_PLL0_CFG 0x0004
+#define LTQ_CGU_PLL1_CFG 0x0008
+#define LTQ_CGU_PLL2_CFG 0x000C
+#define LTQ_CGU_SYS 0x0010
+#define LTQ_CGU_UPDATE 0x0014
+#define LTQ_CGU_IF_CLK 0x0018
+#define LTQ_CGU_OSC_CON 0x001C
+#define LTQ_CGU_SMD 0x0020
+#define LTQ_CGU_CT1SR 0x0028
+#define LTQ_CGU_CT2SR 0x002C
+#define LTQ_CGU_PCMCR 0x0030
+#define LTQ_CGU_PCI_CR 0x0034
+#define LTQ_CGU_PD_PC 0x0038
+#define LTQ_CGU_FMR 0x003C
+
+#define CGU_PLL0_PHASE_DIVIDER_ENABLE \
+ (ltq_cgu_r32(LTQ_CGU_PLL0_CFG) & (1 << 31))
+#define CGU_PLL0_BYPASS \
+ (ltq_cgu_r32(LTQ_CGU_PLL0_CFG) & (1 << 30))
+#define CGU_PLL0_CFG_DSMSEL \
+ (ltq_cgu_r32(LTQ_CGU_PLL0_CFG) & (1 << 28))
+#define CGU_PLL0_CFG_FRAC_EN \
+ (ltq_cgu_r32(LTQ_CGU_PLL0_CFG) & (1 << 27))
+#define CGU_PLL1_SRC \
+ (ltq_cgu_r32(LTQ_CGU_PLL1_CFG) & (1 << 31))
+#define CGU_PLL2_PHASE_DIVIDER_ENABLE \
+ (ltq_cgu_r32(LTQ_CGU_PLL2_CFG) & (1 << 20))
+#define CGU_SYS_FPI_SEL (1 << 6)
+#define CGU_SYS_DDR_SEL 0x3
+#define CGU_PLL0_SRC (1 << 29)
+
+#define CGU_PLL0_CFG_PLLK GET_BITS(ltq_cgu_r32(LTQ_CGU_PLL0_CFG), 26, 17)
+#define CGU_PLL0_CFG_PLLN GET_BITS(ltq_cgu_r32(LTQ_CGU_PLL0_CFG), 12, 6)
+#define CGU_PLL0_CFG_PLLM GET_BITS(ltq_cgu_r32(LTQ_CGU_PLL0_CFG), 5, 2)
+#define CGU_PLL2_SRC GET_BITS(ltq_cgu_r32(LTQ_CGU_PLL2_CFG), 18, 17)
+#define CGU_PLL2_CFG_INPUT_DIV GET_BITS(ltq_cgu_r32(LTQ_CGU_PLL2_CFG), 16, 13)
+
+static unsigned int ltq_get_pll0_fdiv(void);
+
+static inline unsigned int get_input_clock(int pll)
+{
+ switch (pll) {
+ case 0:
+ if (ltq_cgu_r32(LTQ_CGU_PLL0_CFG) & CGU_PLL0_SRC)
+ return BASIS_REQUENCY_USB;
+ else if (CGU_PLL0_PHASE_DIVIDER_ENABLE)
+ return BASIC_FREQUENCY_1;
+ else
+ return BASIC_FREQUENCY_2;
+ case 1:
+ if (CGU_PLL1_SRC)
+ return BASIS_REQUENCY_USB;
+ else if (CGU_PLL0_PHASE_DIVIDER_ENABLE)
+ return BASIC_FREQUENCY_1;
+ else
+ return BASIC_FREQUENCY_2;
+ case 2:
+ switch (CGU_PLL2_SRC) {
+ case 0:
+ return ltq_get_pll0_fdiv();
+ case 1:
+ return CGU_PLL2_PHASE_DIVIDER_ENABLE ?
+ BASIC_FREQUENCY_1 :
+ BASIC_FREQUENCY_2;
+ case 2:
+ return BASIS_REQUENCY_USB;
+ }
+ default:
+ return 0;
+ }
+}
+
+static inline unsigned int cal_dsm(int pll, unsigned int num, unsigned int den)
+{
+ u64 res, clock = get_input_clock(pll);
+
+ res = num * clock;
+ do_div(res, den);
+ return res;
+}
+
+static inline unsigned int mash_dsm(int pll, unsigned int M, unsigned int N,
+ unsigned int K)
+{
+ unsigned int num = ((N + 1) << 10) + K;
+ unsigned int den = (M + 1) << 10;
+
+ return cal_dsm(pll, num, den);
+}
+
+static inline unsigned int ssff_dsm_1(int pll, unsigned int M, unsigned int N,
+ unsigned int K)
+{
+ unsigned int num = ((N + 1) << 11) + K + 512;
+ unsigned int den = (M + 1) << 11;
+
+ return cal_dsm(pll, num, den);
+}
+
+static inline unsigned int ssff_dsm_2(int pll, unsigned int M, unsigned int N,
+ unsigned int K)
+{
+ unsigned int num = K >= 512 ?
+ ((N + 1) << 12) + K - 512 : ((N + 1) << 12) + K + 3584;
+ unsigned int den = (M + 1) << 12;
+
+ return cal_dsm(pll, num, den);
+}
+
+static inline unsigned int dsm(int pll, unsigned int M, unsigned int N,
+ unsigned int K, unsigned int dsmsel, unsigned int phase_div_en)
+{
+ if (!dsmsel)
+ return mash_dsm(pll, M, N, K);
+ else if (!phase_div_en)
+ return mash_dsm(pll, M, N, K);
+ else
+ return ssff_dsm_2(pll, M, N, K);
+}
+
+static inline unsigned int ltq_get_pll0_fosc(void)
+{
+ if (CGU_PLL0_BYPASS)
+ return get_input_clock(0);
+ else
+ return !CGU_PLL0_CFG_FRAC_EN
+ ? dsm(0, CGU_PLL0_CFG_PLLM, CGU_PLL0_CFG_PLLN, 0,
+ CGU_PLL0_CFG_DSMSEL,
+ CGU_PLL0_PHASE_DIVIDER_ENABLE)
+ : dsm(0, CGU_PLL0_CFG_PLLM, CGU_PLL0_CFG_PLLN,
+ CGU_PLL0_CFG_PLLK, CGU_PLL0_CFG_DSMSEL,
+ CGU_PLL0_PHASE_DIVIDER_ENABLE);
+}
+
+static unsigned int ltq_get_pll0_fdiv(void)
+{
+ unsigned int div = CGU_PLL2_CFG_INPUT_DIV + 1;
+
+ return (ltq_get_pll0_fosc() + (div >> 1)) / div;
+}
+
+unsigned int ltq_get_io_region_clock(void)
+{
+ unsigned int ret = ltq_get_pll0_fosc();
+
+ switch (ltq_cgu_r32(LTQ_CGU_PLL2_CFG) & CGU_SYS_DDR_SEL) {
+ default:
+ case 0:
+ return (ret + 1) / 2;
+ case 1:
+ return (ret * 2 + 2) / 5;
+ case 2:
+ return (ret + 1) / 3;
+ case 3:
+ return (ret + 2) / 4;
+ }
+}
+EXPORT_SYMBOL(ltq_get_io_region_clock);
+
+unsigned int ltq_get_fpi_bus_clock(int fpi)
+{
+ unsigned int ret = ltq_get_io_region_clock();
+
+ if ((fpi == 2) && (ltq_cgu_r32(LTQ_CGU_SYS) & CGU_SYS_FPI_SEL))
+ ret >>= 1;
+ return ret;
+}
+EXPORT_SYMBOL(ltq_get_fpi_bus_clock);
+
+unsigned int ltq_get_cpu_hz(void)
+{
+ switch (ltq_cgu_r32(LTQ_CGU_SYS) & 0xc) {
+ case 0:
+ return CLOCK_333M;
+ case 4:
+ return DDR_HZ;
+ case 8:
+ return DDR_HZ << 1;
+ default:
+ return DDR_HZ >> 1;
+ }
+}
+EXPORT_SYMBOL(ltq_get_cpu_hz);
+
+unsigned int ltq_get_fpi_hz(void)
+{
+ unsigned int ddr_clock = DDR_HZ;
+
+ if (ltq_cgu_r32(LTQ_CGU_SYS) & 0x40)
+ return ddr_clock >> 1;
+ return ddr_clock;
+}
+EXPORT_SYMBOL(ltq_get_fpi_hz);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/mtd/physmap.h>
+#include <linux/kernel.h>
+#include <linux/reboot.h>
+#include <linux/platform_device.h>
+#include <linux/leds.h>
+#include <linux/etherdevice.h>
+#include <linux/reboot.h>
+#include <linux/time.h>
+#include <linux/io.h>
+#include <linux/gpio.h>
+#include <linux/leds.h>
+
+#include <asm/bootinfo.h>
+#include <asm/irq.h>
+
+#include <lantiq_soc.h>
+#include <lantiq_irq.h>
+#include <lantiq_platform.h>
+
+#include "devices.h"
+
+/* gpio */
+static struct resource ltq_gpio_resource[] = {
+ {
+ .name = "gpio0",
+ .start = LTQ_GPIO0_BASE_ADDR,
+ .end = LTQ_GPIO0_BASE_ADDR + LTQ_GPIO_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ }, {
+ .name = "gpio1",
+ .start = LTQ_GPIO1_BASE_ADDR,
+ .end = LTQ_GPIO1_BASE_ADDR + LTQ_GPIO_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ }, {
+ .name = "gpio2",
+ .start = LTQ_GPIO2_BASE_ADDR,
+ .end = LTQ_GPIO2_BASE_ADDR + LTQ_GPIO_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ }
+};
+
+void __init ltq_register_gpio(void)
+{
+ platform_device_register_simple("ltq_gpio", 0,
+ <q_gpio_resource[0], 1);
+ platform_device_register_simple("ltq_gpio", 1,
+ <q_gpio_resource[1], 1);
+
+ /* AR9 and VR9 have an extra gpio block */
+ if (ltq_is_ar9() || ltq_is_vr9()) {
+ platform_device_register_simple("ltq_gpio", 2,
+ <q_gpio_resource[2], 1);
+ }
+}
+
+/* serial to parallel conversion */
+static struct resource ltq_stp_resource = {
+ .name = "stp",
+ .start = LTQ_STP_BASE_ADDR,
+ .end = LTQ_STP_BASE_ADDR + LTQ_STP_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+void __init ltq_register_gpio_stp(void)
+{
+ platform_device_register_simple("ltq_stp", 0, <q_stp_resource, 1);
+}
+
+/* asc ports - amazon se has its own serial mapping */
+static struct resource ltq_ase_asc_resources[] = {
+ {
+ .name = "asc0",
+ .start = LTQ_ASC1_BASE_ADDR,
+ .end = LTQ_ASC1_BASE_ADDR + LTQ_ASC_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+ },
+ IRQ_RES(tx, LTQ_ASC_ASE_TIR),
+ IRQ_RES(rx, LTQ_ASC_ASE_RIR),
+ IRQ_RES(err, LTQ_ASC_ASE_EIR),
+};
+
+void __init ltq_register_ase_asc(void)
+{
+ platform_device_register_simple("ltq_asc", 0,
+ ltq_ase_asc_resources, ARRAY_SIZE(ltq_ase_asc_resources));
+}
+
+/* ethernet */
+static struct resource ltq_etop_resources = {
+ .name = "etop",
+ .start = LTQ_ETOP_BASE_ADDR,
+ .end = LTQ_ETOP_BASE_ADDR + LTQ_ETOP_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static struct platform_device ltq_etop = {
+ .name = "ltq_etop",
+ .resource = <q_etop_resources,
+ .num_resources = 1,
+};
+
+void __init
+ltq_register_etop(struct ltq_eth_data *eth)
+{
+ if (eth) {
+ ltq_etop.dev.platform_data = eth;
+ platform_device_register(<q_etop);
+ }
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_DEVICES_XWAY_H__
+#define _LTQ_DEVICES_XWAY_H__
+
+#include "../devices.h"
+#include <linux/phy.h>
+
+extern void ltq_register_gpio(void);
+extern void ltq_register_gpio_stp(void);
+extern void ltq_register_ase_asc(void);
+extern void ltq_register_etop(struct ltq_eth_data *eth);
+
+#endif
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/dma-mapping.h>
+
+#include <lantiq_soc.h>
+#include <xway_dma.h>
+
+#define LTQ_DMA_CTRL 0x10
+#define LTQ_DMA_CPOLL 0x14
+#define LTQ_DMA_CS 0x18
+#define LTQ_DMA_CCTRL 0x1C
+#define LTQ_DMA_CDBA 0x20
+#define LTQ_DMA_CDLEN 0x24
+#define LTQ_DMA_CIS 0x28
+#define LTQ_DMA_CIE 0x2C
+#define LTQ_DMA_PS 0x40
+#define LTQ_DMA_PCTRL 0x44
+#define LTQ_DMA_IRNEN 0xf4
+
+#define DMA_DESCPT BIT(3) /* descriptor complete irq */
+#define DMA_TX BIT(8) /* TX channel direction */
+#define DMA_CHAN_ON BIT(0) /* channel on / off bit */
+#define DMA_PDEN BIT(6) /* enable packet drop */
+#define DMA_CHAN_RST BIT(1) /* channel on / off bit */
+#define DMA_RESET BIT(0) /* channel on / off bit */
+#define DMA_IRQ_ACK 0x7e /* IRQ status register */
+#define DMA_POLL BIT(31) /* turn on channel polling */
+#define DMA_CLK_DIV4 BIT(6) /* polling clock divider */
+#define DMA_2W_BURST BIT(1) /* 2 word burst length */
+#define DMA_MAX_CHANNEL 20 /* the soc has 20 channels */
+#define DMA_ETOP_ENDIANESS (0xf << 8) /* endianess swap etop channels */
+#define DMA_WEIGHT (BIT(17) | BIT(16)) /* default channel wheight */
+
+#define ltq_dma_r32(x) ltq_r32(ltq_dma_membase + (x))
+#define ltq_dma_w32(x, y) ltq_w32(x, ltq_dma_membase + (y))
+#define ltq_dma_w32_mask(x, y, z) ltq_w32_mask(x, y, \
+ ltq_dma_membase + (z))
+
+static struct resource ltq_dma_resource = {
+ .name = "dma",
+ .start = LTQ_DMA_BASE_ADDR,
+ .end = LTQ_DMA_BASE_ADDR + LTQ_DMA_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static void __iomem *ltq_dma_membase;
+
+void
+ltq_dma_enable_irq(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32_mask(0, 1 << ch->nr, LTQ_DMA_IRNEN);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_enable_irq);
+
+void
+ltq_dma_disable_irq(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32_mask(1 << ch->nr, 0, LTQ_DMA_IRNEN);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_disable_irq);
+
+void
+ltq_dma_ack_irq(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32(DMA_IRQ_ACK, LTQ_DMA_CIS);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_ack_irq);
+
+void
+ltq_dma_open(struct ltq_dma_channel *ch)
+{
+ unsigned long flag;
+
+ local_irq_save(flag);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32_mask(0, DMA_CHAN_ON, LTQ_DMA_CCTRL);
+ ltq_dma_enable_irq(ch);
+ local_irq_restore(flag);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_open);
+
+void
+ltq_dma_close(struct ltq_dma_channel *ch)
+{
+ unsigned long flag;
+
+ local_irq_save(flag);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32_mask(DMA_CHAN_ON, 0, LTQ_DMA_CCTRL);
+ ltq_dma_disable_irq(ch);
+ local_irq_restore(flag);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_close);
+
+static void
+ltq_dma_alloc(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ ch->desc = 0;
+ ch->desc_base = dma_alloc_coherent(NULL,
+ LTQ_DESC_NUM * LTQ_DESC_SIZE,
+ &ch->phys, GFP_ATOMIC);
+ memset(ch->desc_base, 0, LTQ_DESC_NUM * LTQ_DESC_SIZE);
+
+ local_irq_save(flags);
+ ltq_dma_w32(ch->nr, LTQ_DMA_CS);
+ ltq_dma_w32(ch->phys, LTQ_DMA_CDBA);
+ ltq_dma_w32(LTQ_DESC_NUM, LTQ_DMA_CDLEN);
+ ltq_dma_w32_mask(DMA_CHAN_ON, 0, LTQ_DMA_CCTRL);
+ wmb();
+ ltq_dma_w32_mask(0, DMA_CHAN_RST, LTQ_DMA_CCTRL);
+ while (ltq_dma_r32(LTQ_DMA_CCTRL) & DMA_CHAN_RST)
+ ;
+ local_irq_restore(flags);
+}
+
+void
+ltq_dma_alloc_tx(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ ltq_dma_alloc(ch);
+
+ local_irq_save(flags);
+ ltq_dma_w32(DMA_DESCPT, LTQ_DMA_CIE);
+ ltq_dma_w32_mask(0, 1 << ch->nr, LTQ_DMA_IRNEN);
+ ltq_dma_w32(DMA_WEIGHT | DMA_TX, LTQ_DMA_CCTRL);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_alloc_tx);
+
+void
+ltq_dma_alloc_rx(struct ltq_dma_channel *ch)
+{
+ unsigned long flags;
+
+ ltq_dma_alloc(ch);
+
+ local_irq_save(flags);
+ ltq_dma_w32(DMA_DESCPT, LTQ_DMA_CIE);
+ ltq_dma_w32_mask(0, 1 << ch->nr, LTQ_DMA_IRNEN);
+ ltq_dma_w32(DMA_WEIGHT, LTQ_DMA_CCTRL);
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_alloc_rx);
+
+void
+ltq_dma_free(struct ltq_dma_channel *ch)
+{
+ if (!ch->desc_base)
+ return;
+ ltq_dma_close(ch);
+ dma_free_coherent(NULL, LTQ_DESC_NUM * LTQ_DESC_SIZE,
+ ch->desc_base, ch->phys);
+}
+EXPORT_SYMBOL_GPL(ltq_dma_free);
+
+void
+ltq_dma_init_port(int p)
+{
+ ltq_dma_w32(p, LTQ_DMA_PS);
+ switch (p) {
+ case DMA_PORT_ETOP:
+ /*
+ * Tell the DMA engine to swap the endianess of data frames and
+ * drop packets if the channel arbitration fails.
+ */
+ ltq_dma_w32_mask(0, DMA_ETOP_ENDIANESS | DMA_PDEN,
+ LTQ_DMA_PCTRL);
+ break;
+
+ case DMA_PORT_DEU:
+ ltq_dma_w32((DMA_2W_BURST << 4) | (DMA_2W_BURST << 2),
+ LTQ_DMA_PCTRL);
+ break;
+
+ default:
+ break;
+ }
+}
+EXPORT_SYMBOL_GPL(ltq_dma_init_port);
+
+int __init
+ltq_dma_init(void)
+{
+ int i;
+
+ /* insert and request the memory region */
+ if (insert_resource(&iomem_resource, <q_dma_resource) < 0)
+ panic("Failed to insert dma memory\n");
+
+ if (request_mem_region(ltq_dma_resource.start,
+ resource_size(<q_dma_resource), "dma") < 0)
+ panic("Failed to request dma memory\n");
+
+ /* remap dma register range */
+ ltq_dma_membase = ioremap_nocache(ltq_dma_resource.start,
+ resource_size(<q_dma_resource));
+ if (!ltq_dma_membase)
+ panic("Failed to remap dma memory\n");
+
+ /* power up and reset the dma engine */
+ ltq_pmu_enable(PMU_DMA);
+ ltq_dma_w32_mask(0, DMA_RESET, LTQ_DMA_CTRL);
+
+ /* disable all interrupts */
+ ltq_dma_w32(0, LTQ_DMA_IRNEN);
+
+ /* reset/configure each channel */
+ for (i = 0; i < DMA_MAX_CHANNEL; i++) {
+ ltq_dma_w32(i, LTQ_DMA_CS);
+ ltq_dma_w32(DMA_CHAN_RST, LTQ_DMA_CCTRL);
+ ltq_dma_w32(DMA_POLL | DMA_CLK_DIV4, LTQ_DMA_CPOLL);
+ ltq_dma_w32_mask(DMA_CHAN_ON, 0, LTQ_DMA_CCTRL);
+ }
+ return 0;
+}
+
+postcore_initcall(ltq_dma_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * EBU - the external bus unit attaches PCI, NOR and NAND
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/ioport.h>
+
+#include <lantiq_soc.h>
+
+/* all access to the ebu must be locked */
+DEFINE_SPINLOCK(ebu_lock);
+EXPORT_SYMBOL_GPL(ebu_lock);
+
+static struct resource ltq_ebu_resource = {
+ .name = "ebu",
+ .start = LTQ_EBU_BASE_ADDR,
+ .end = LTQ_EBU_BASE_ADDR + LTQ_EBU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+/* remapped base addr of the clock unit and external bus unit */
+void __iomem *ltq_ebu_membase;
+
+static int __init lantiq_ebu_init(void)
+{
+ /* insert and request the memory region */
+ if (insert_resource(&iomem_resource, <q_ebu_resource) < 0)
+ panic("Failed to insert ebu memory\n");
+
+ if (request_mem_region(ltq_ebu_resource.start,
+ resource_size(<q_ebu_resource), "ebu") < 0)
+ panic("Failed to request ebu memory\n");
+
+ /* remap ebu register range */
+ ltq_ebu_membase = ioremap_nocache(ltq_ebu_resource.start,
+ resource_size(<q_ebu_resource));
+ if (!ltq_ebu_membase)
+ panic("Failed to remap ebu memory\n");
+
+ /* make sure to unprotect the memory region where flash is located */
+ ltq_ebu_w32(ltq_ebu_r32(LTQ_EBU_BUSCON0) & ~EBU_WRDIS, LTQ_EBU_BUSCON0);
+ return 0;
+}
+
+postcore_initcall(lantiq_ebu_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/gpio.h>
+#include <linux/ioport.h>
+#include <linux/io.h>
+
+#include <lantiq_soc.h>
+
+#define LTQ_GPIO_OUT 0x00
+#define LTQ_GPIO_IN 0x04
+#define LTQ_GPIO_DIR 0x08
+#define LTQ_GPIO_ALTSEL0 0x0C
+#define LTQ_GPIO_ALTSEL1 0x10
+#define LTQ_GPIO_OD 0x14
+
+#define PINS_PER_PORT 16
+#define MAX_PORTS 3
+
+#define ltq_gpio_getbit(m, r, p) (!!(ltq_r32(m + r) & (1 << p)))
+#define ltq_gpio_setbit(m, r, p) ltq_w32_mask(0, (1 << p), m + r)
+#define ltq_gpio_clearbit(m, r, p) ltq_w32_mask((1 << p), 0, m + r)
+
+struct ltq_gpio {
+ void __iomem *membase;
+ struct gpio_chip chip;
+};
+
+static struct ltq_gpio ltq_gpio_port[MAX_PORTS];
+
+int gpio_to_irq(unsigned int gpio)
+{
+ return -EINVAL;
+}
+EXPORT_SYMBOL(gpio_to_irq);
+
+int irq_to_gpio(unsigned int gpio)
+{
+ return -EINVAL;
+}
+EXPORT_SYMBOL(irq_to_gpio);
+
+int ltq_gpio_request(unsigned int pin, unsigned int alt0,
+ unsigned int alt1, unsigned int dir, const char *name)
+{
+ int id = 0;
+
+ if (pin >= (MAX_PORTS * PINS_PER_PORT))
+ return -EINVAL;
+ if (gpio_request(pin, name)) {
+ pr_err("failed to setup lantiq gpio: %s\n", name);
+ return -EBUSY;
+ }
+ if (dir)
+ gpio_direction_output(pin, 1);
+ else
+ gpio_direction_input(pin);
+ while (pin >= PINS_PER_PORT) {
+ pin -= PINS_PER_PORT;
+ id++;
+ }
+ if (alt0)
+ ltq_gpio_setbit(ltq_gpio_port[id].membase,
+ LTQ_GPIO_ALTSEL0, pin);
+ else
+ ltq_gpio_clearbit(ltq_gpio_port[id].membase,
+ LTQ_GPIO_ALTSEL0, pin);
+ if (alt1)
+ ltq_gpio_setbit(ltq_gpio_port[id].membase,
+ LTQ_GPIO_ALTSEL1, pin);
+ else
+ ltq_gpio_clearbit(ltq_gpio_port[id].membase,
+ LTQ_GPIO_ALTSEL1, pin);
+ return 0;
+}
+EXPORT_SYMBOL(ltq_gpio_request);
+
+static void ltq_gpio_set(struct gpio_chip *chip, unsigned int offset, int value)
+{
+ struct ltq_gpio *ltq_gpio = container_of(chip, struct ltq_gpio, chip);
+
+ if (value)
+ ltq_gpio_setbit(ltq_gpio->membase, LTQ_GPIO_OUT, offset);
+ else
+ ltq_gpio_clearbit(ltq_gpio->membase, LTQ_GPIO_OUT, offset);
+}
+
+static int ltq_gpio_get(struct gpio_chip *chip, unsigned int offset)
+{
+ struct ltq_gpio *ltq_gpio = container_of(chip, struct ltq_gpio, chip);
+
+ return ltq_gpio_getbit(ltq_gpio->membase, LTQ_GPIO_IN, offset);
+}
+
+static int ltq_gpio_direction_input(struct gpio_chip *chip, unsigned int offset)
+{
+ struct ltq_gpio *ltq_gpio = container_of(chip, struct ltq_gpio, chip);
+
+ ltq_gpio_clearbit(ltq_gpio->membase, LTQ_GPIO_OD, offset);
+ ltq_gpio_clearbit(ltq_gpio->membase, LTQ_GPIO_DIR, offset);
+
+ return 0;
+}
+
+static int ltq_gpio_direction_output(struct gpio_chip *chip,
+ unsigned int offset, int value)
+{
+ struct ltq_gpio *ltq_gpio = container_of(chip, struct ltq_gpio, chip);
+
+ ltq_gpio_setbit(ltq_gpio->membase, LTQ_GPIO_OD, offset);
+ ltq_gpio_setbit(ltq_gpio->membase, LTQ_GPIO_DIR, offset);
+ ltq_gpio_set(chip, offset, value);
+
+ return 0;
+}
+
+static int ltq_gpio_req(struct gpio_chip *chip, unsigned offset)
+{
+ struct ltq_gpio *ltq_gpio = container_of(chip, struct ltq_gpio, chip);
+
+ ltq_gpio_clearbit(ltq_gpio->membase, LTQ_GPIO_ALTSEL0, offset);
+ ltq_gpio_clearbit(ltq_gpio->membase, LTQ_GPIO_ALTSEL1, offset);
+ return 0;
+}
+
+static int ltq_gpio_probe(struct platform_device *pdev)
+{
+ struct resource *res;
+
+ if (pdev->id >= MAX_PORTS) {
+ dev_err(&pdev->dev, "invalid gpio port %d\n",
+ pdev->id);
+ return -EINVAL;
+ }
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(&pdev->dev, "failed to get memory for gpio port %d\n",
+ pdev->id);
+ return -ENOENT;
+ }
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ resource_size(res), dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev,
+ "failed to request memory for gpio port %d\n",
+ pdev->id);
+ return -EBUSY;
+ }
+ ltq_gpio_port[pdev->id].membase = devm_ioremap_nocache(&pdev->dev,
+ res->start, resource_size(res));
+ if (!ltq_gpio_port[pdev->id].membase) {
+ dev_err(&pdev->dev, "failed to remap memory for gpio port %d\n",
+ pdev->id);
+ return -ENOMEM;
+ }
+ ltq_gpio_port[pdev->id].chip.label = "ltq_gpio";
+ ltq_gpio_port[pdev->id].chip.direction_input = ltq_gpio_direction_input;
+ ltq_gpio_port[pdev->id].chip.direction_output =
+ ltq_gpio_direction_output;
+ ltq_gpio_port[pdev->id].chip.get = ltq_gpio_get;
+ ltq_gpio_port[pdev->id].chip.set = ltq_gpio_set;
+ ltq_gpio_port[pdev->id].chip.request = ltq_gpio_req;
+ ltq_gpio_port[pdev->id].chip.base = PINS_PER_PORT * pdev->id;
+ ltq_gpio_port[pdev->id].chip.ngpio = PINS_PER_PORT;
+ platform_set_drvdata(pdev, <q_gpio_port[pdev->id]);
+ return gpiochip_add(<q_gpio_port[pdev->id].chip);
+}
+
+static struct platform_driver
+ltq_gpio_driver = {
+ .probe = ltq_gpio_probe,
+ .driver = {
+ .name = "ltq_gpio",
+ .owner = THIS_MODULE,
+ },
+};
+
+int __init ltq_gpio_init(void)
+{
+ int ret = platform_driver_register(<q_gpio_driver);
+
+ if (ret)
+ pr_info("ltq_gpio : Error registering platfom driver!");
+ return ret;
+}
+
+postcore_initcall(ltq_gpio_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/platform_device.h>
+#include <linux/mutex.h>
+#include <linux/gpio.h>
+#include <linux/io.h>
+
+#include <lantiq_soc.h>
+
+/*
+ * By attaching hardware latches to the EBU it is possible to create output
+ * only gpios. This driver configures a special memory address, which when
+ * written to outputs 16 bit to the latches.
+ */
+
+#define LTQ_EBU_BUSCON 0x1e7ff /* 16 bit access, slowest timing */
+#define LTQ_EBU_WP 0x80000000 /* write protect bit */
+
+/* we keep a shadow value of the last value written to the ebu */
+static int ltq_ebu_gpio_shadow = 0x0;
+static void __iomem *ltq_ebu_gpio_membase;
+
+static void ltq_ebu_apply(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&ebu_lock, flags);
+ ltq_ebu_w32(LTQ_EBU_BUSCON, LTQ_EBU_BUSCON1);
+ *((__u16 *)ltq_ebu_gpio_membase) = ltq_ebu_gpio_shadow;
+ ltq_ebu_w32(LTQ_EBU_BUSCON | LTQ_EBU_WP, LTQ_EBU_BUSCON1);
+ spin_unlock_irqrestore(&ebu_lock, flags);
+}
+
+static void ltq_ebu_set(struct gpio_chip *chip, unsigned offset, int value)
+{
+ if (value)
+ ltq_ebu_gpio_shadow |= (1 << offset);
+ else
+ ltq_ebu_gpio_shadow &= ~(1 << offset);
+ ltq_ebu_apply();
+}
+
+static int ltq_ebu_direction_output(struct gpio_chip *chip, unsigned offset,
+ int value)
+{
+ ltq_ebu_set(chip, offset, value);
+
+ return 0;
+}
+
+static struct gpio_chip ltq_ebu_chip = {
+ .label = "ltq_ebu",
+ .direction_output = ltq_ebu_direction_output,
+ .set = ltq_ebu_set,
+ .base = 72,
+ .ngpio = 16,
+ .can_sleep = 1,
+ .owner = THIS_MODULE,
+};
+
+static int ltq_ebu_probe(struct platform_device *pdev)
+{
+ int ret = 0;
+ struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+
+ if (!res) {
+ dev_err(&pdev->dev, "failed to get memory resource\n");
+ return -ENOENT;
+ }
+
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ resource_size(res), dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev, "failed to request memory resource\n");
+ return -EBUSY;
+ }
+
+ ltq_ebu_gpio_membase = devm_ioremap_nocache(&pdev->dev, res->start,
+ resource_size(res));
+ if (!ltq_ebu_gpio_membase) {
+ dev_err(&pdev->dev, "Failed to ioremap mem region\n");
+ return -ENOMEM;
+ }
+
+ /* grab the default shadow value passed form the platform code */
+ ltq_ebu_gpio_shadow = (unsigned int) pdev->dev.platform_data;
+
+ /* tell the ebu controller which memory address we will be using */
+ ltq_ebu_w32(pdev->resource->start | 0x1, LTQ_EBU_ADDRSEL1);
+
+ /* write protect the region */
+ ltq_ebu_w32(LTQ_EBU_BUSCON | LTQ_EBU_WP, LTQ_EBU_BUSCON1);
+
+ ret = gpiochip_add(<q_ebu_chip);
+ if (!ret)
+ ltq_ebu_apply();
+ return ret;
+}
+
+static struct platform_driver ltq_ebu_driver = {
+ .probe = ltq_ebu_probe,
+ .driver = {
+ .name = "ltq_ebu",
+ .owner = THIS_MODULE,
+ },
+};
+
+static int __init ltq_ebu_init(void)
+{
+ int ret = platform_driver_register(<q_ebu_driver);
+
+ if (ret)
+ pr_info("ltq_ebu : Error registering platfom driver!");
+ return ret;
+}
+
+postcore_initcall(ltq_ebu_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2007 John Crispin <blogic@openwrt.org>
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/platform_device.h>
+#include <linux/mutex.h>
+#include <linux/io.h>
+#include <linux/gpio.h>
+
+#include <lantiq_soc.h>
+
+#define LTQ_STP_CON0 0x00
+#define LTQ_STP_CON1 0x04
+#define LTQ_STP_CPU0 0x08
+#define LTQ_STP_CPU1 0x0C
+#define LTQ_STP_AR 0x10
+
+#define LTQ_STP_CON_SWU (1 << 31)
+#define LTQ_STP_2HZ 0
+#define LTQ_STP_4HZ (1 << 23)
+#define LTQ_STP_8HZ (2 << 23)
+#define LTQ_STP_10HZ (3 << 23)
+#define LTQ_STP_SPEED_MASK (0xf << 23)
+#define LTQ_STP_UPD_FPI (1 << 31)
+#define LTQ_STP_UPD_MASK (3 << 30)
+#define LTQ_STP_ADSL_SRC (3 << 24)
+
+#define LTQ_STP_GROUP0 (1 << 0)
+
+#define LTQ_STP_RISING 0
+#define LTQ_STP_FALLING (1 << 26)
+#define LTQ_STP_EDGE_MASK (1 << 26)
+
+#define ltq_stp_r32(reg) __raw_readl(ltq_stp_membase + reg)
+#define ltq_stp_w32(val, reg) __raw_writel(val, ltq_stp_membase + reg)
+#define ltq_stp_w32_mask(clear, set, reg) \
+ ltq_w32((ltq_r32(ltq_stp_membase + reg) & ~(clear)) | (set), \
+ ltq_stp_membase + (reg))
+
+static int ltq_stp_shadow = 0xffff;
+static void __iomem *ltq_stp_membase;
+
+static void ltq_stp_set(struct gpio_chip *chip, unsigned offset, int value)
+{
+ if (value)
+ ltq_stp_shadow |= (1 << offset);
+ else
+ ltq_stp_shadow &= ~(1 << offset);
+ ltq_stp_w32(ltq_stp_shadow, LTQ_STP_CPU0);
+}
+
+static int ltq_stp_direction_output(struct gpio_chip *chip, unsigned offset,
+ int value)
+{
+ ltq_stp_set(chip, offset, value);
+
+ return 0;
+}
+
+static struct gpio_chip ltq_stp_chip = {
+ .label = "ltq_stp",
+ .direction_output = ltq_stp_direction_output,
+ .set = ltq_stp_set,
+ .base = 48,
+ .ngpio = 24,
+ .can_sleep = 1,
+ .owner = THIS_MODULE,
+};
+
+static int ltq_stp_hw_init(void)
+{
+ /* the 3 pins used to control the external stp */
+ ltq_gpio_request(4, 1, 0, 1, "stp-st");
+ ltq_gpio_request(5, 1, 0, 1, "stp-d");
+ ltq_gpio_request(6, 1, 0, 1, "stp-sh");
+
+ /* sane defaults */
+ ltq_stp_w32(0, LTQ_STP_AR);
+ ltq_stp_w32(0, LTQ_STP_CPU0);
+ ltq_stp_w32(0, LTQ_STP_CPU1);
+ ltq_stp_w32(LTQ_STP_CON_SWU, LTQ_STP_CON0);
+ ltq_stp_w32(0, LTQ_STP_CON1);
+
+ /* rising or falling edge */
+ ltq_stp_w32_mask(LTQ_STP_EDGE_MASK, LTQ_STP_FALLING, LTQ_STP_CON0);
+
+ /* per default stp 15-0 are set */
+ ltq_stp_w32_mask(0, LTQ_STP_GROUP0, LTQ_STP_CON1);
+
+ /* stp are update periodically by the FPI bus */
+ ltq_stp_w32_mask(LTQ_STP_UPD_MASK, LTQ_STP_UPD_FPI, LTQ_STP_CON1);
+
+ /* set stp update speed */
+ ltq_stp_w32_mask(LTQ_STP_SPEED_MASK, LTQ_STP_8HZ, LTQ_STP_CON1);
+
+ /* tell the hardware that pin (led) 0 and 1 are controlled
+ * by the dsl arc
+ */
+ ltq_stp_w32_mask(0, LTQ_STP_ADSL_SRC, LTQ_STP_CON0);
+
+ ltq_pmu_enable(PMU_LED);
+ return 0;
+}
+
+static int __devinit ltq_stp_probe(struct platform_device *pdev)
+{
+ struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ int ret = 0;
+
+ if (!res)
+ return -ENOENT;
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ resource_size(res), dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev, "failed to request STP memory\n");
+ return -EBUSY;
+ }
+ ltq_stp_membase = devm_ioremap_nocache(&pdev->dev, res->start,
+ resource_size(res));
+ if (!ltq_stp_membase) {
+ dev_err(&pdev->dev, "failed to remap STP memory\n");
+ return -ENOMEM;
+ }
+ ret = gpiochip_add(<q_stp_chip);
+ if (!ret)
+ ret = ltq_stp_hw_init();
+
+ return ret;
+}
+
+static struct platform_driver ltq_stp_driver = {
+ .probe = ltq_stp_probe,
+ .driver = {
+ .name = "ltq_stp",
+ .owner = THIS_MODULE,
+ },
+};
+
+int __init ltq_stp_init(void)
+{
+ int ret = platform_driver_register(<q_stp_driver);
+
+ if (ret)
+ pr_info("ltq_stp: error registering platfom driver");
+ return ret;
+}
+
+postcore_initcall(ltq_stp_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/partitions.h>
+#include <linux/mtd/physmap.h>
+#include <linux/input.h>
+
+#include <lantiq.h>
+
+#include "../machtypes.h"
+#include "devices.h"
+
+static struct mtd_partition easy50601_partitions[] = {
+ {
+ .name = "uboot",
+ .offset = 0x0,
+ .size = 0x10000,
+ },
+ {
+ .name = "uboot_env",
+ .offset = 0x10000,
+ .size = 0x10000,
+ },
+ {
+ .name = "linux",
+ .offset = 0x20000,
+ .size = 0xE0000,
+ },
+ {
+ .name = "rootfs",
+ .offset = 0x100000,
+ .size = 0x300000,
+ },
+};
+
+static struct physmap_flash_data easy50601_flash_data = {
+ .nr_parts = ARRAY_SIZE(easy50601_partitions),
+ .parts = easy50601_partitions,
+};
+
+static void __init easy50601_init(void)
+{
+ ltq_register_nor(&easy50601_flash_data);
+}
+
+MIPS_MACHINE(LTQ_MACH_EASY50601,
+ "EASY50601",
+ "EASY50601 Eval Board",
+ easy50601_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/partitions.h>
+#include <linux/mtd/physmap.h>
+#include <linux/input.h>
+#include <linux/phy.h>
+
+#include <lantiq_soc.h>
+#include <irq.h>
+
+#include "../machtypes.h"
+#include "devices.h"
+
+static struct mtd_partition easy50712_partitions[] = {
+ {
+ .name = "uboot",
+ .offset = 0x0,
+ .size = 0x10000,
+ },
+ {
+ .name = "uboot_env",
+ .offset = 0x10000,
+ .size = 0x10000,
+ },
+ {
+ .name = "linux",
+ .offset = 0x20000,
+ .size = 0xe0000,
+ },
+ {
+ .name = "rootfs",
+ .offset = 0x100000,
+ .size = 0x300000,
+ },
+};
+
+static struct physmap_flash_data easy50712_flash_data = {
+ .nr_parts = ARRAY_SIZE(easy50712_partitions),
+ .parts = easy50712_partitions,
+};
+
+static struct ltq_pci_data ltq_pci_data = {
+ .clock = PCI_CLOCK_INT,
+ .gpio = PCI_GNT1 | PCI_REQ1,
+ .irq = {
+ [14] = INT_NUM_IM0_IRL0 + 22,
+ },
+};
+
+static struct ltq_eth_data ltq_eth_data = {
+ .mii_mode = PHY_INTERFACE_MODE_MII,
+};
+
+static void __init easy50712_init(void)
+{
+ ltq_register_gpio_stp();
+ ltq_register_nor(&easy50712_flash_data);
+ ltq_register_pci(<q_pci_data);
+ ltq_register_etop(<q_eth_data);
+}
+
+MIPS_MACHINE(LTQ_MACH_EASY50712,
+ "EASY50712",
+ "EASY50712 Eval Board",
+ easy50712_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/ioport.h>
+
+#include <lantiq_soc.h>
+
+/* PMU - the power management unit allows us to turn part of the core
+ * on and off
+ */
+
+/* the enable / disable registers */
+#define LTQ_PMU_PWDCR 0x1C
+#define LTQ_PMU_PWDSR 0x20
+
+#define ltq_pmu_w32(x, y) ltq_w32((x), ltq_pmu_membase + (y))
+#define ltq_pmu_r32(x) ltq_r32(ltq_pmu_membase + (x))
+
+static struct resource ltq_pmu_resource = {
+ .name = "pmu",
+ .start = LTQ_PMU_BASE_ADDR,
+ .end = LTQ_PMU_BASE_ADDR + LTQ_PMU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+static void __iomem *ltq_pmu_membase;
+
+void ltq_pmu_enable(unsigned int module)
+{
+ int err = 1000000;
+
+ ltq_pmu_w32(ltq_pmu_r32(LTQ_PMU_PWDCR) & ~module, LTQ_PMU_PWDCR);
+ do {} while (--err && (ltq_pmu_r32(LTQ_PMU_PWDSR) & module));
+
+ if (!err)
+ panic("activating PMU module failed!\n");
+}
+EXPORT_SYMBOL(ltq_pmu_enable);
+
+void ltq_pmu_disable(unsigned int module)
+{
+ ltq_pmu_w32(ltq_pmu_r32(LTQ_PMU_PWDCR) | module, LTQ_PMU_PWDCR);
+}
+EXPORT_SYMBOL(ltq_pmu_disable);
+
+int __init ltq_pmu_init(void)
+{
+ if (insert_resource(&iomem_resource, <q_pmu_resource) < 0)
+ panic("Failed to insert pmu memory\n");
+
+ if (request_mem_region(ltq_pmu_resource.start,
+ resource_size(<q_pmu_resource), "pmu") < 0)
+ panic("Failed to request pmu memory\n");
+
+ ltq_pmu_membase = ioremap_nocache(ltq_pmu_resource.start,
+ resource_size(<q_pmu_resource));
+ if (!ltq_pmu_membase)
+ panic("Failed to remap pmu memory\n");
+ return 0;
+}
+
+core_initcall(ltq_pmu_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/module.h>
+#include <linux/clk.h>
+#include <asm/bootinfo.h>
+#include <asm/time.h>
+
+#include <lantiq_soc.h>
+
+#include "../prom.h"
+
+#define SOC_AMAZON_SE "Amazon_SE"
+
+#define PART_SHIFT 12
+#define PART_MASK 0x0FFFFFFF
+#define REV_SHIFT 28
+#define REV_MASK 0xF0000000
+
+void __init ltq_soc_detect(struct ltq_soc_info *i)
+{
+ i->partnum = (ltq_r32(LTQ_MPS_CHIPID) & PART_MASK) >> PART_SHIFT;
+ i->rev = (ltq_r32(LTQ_MPS_CHIPID) & REV_MASK) >> REV_SHIFT;
+ switch (i->partnum) {
+ case SOC_ID_AMAZON_SE:
+ i->name = SOC_AMAZON_SE;
+ i->type = SOC_TYPE_AMAZON_SE;
+ break;
+
+ default:
+ unreachable();
+ break;
+ }
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/module.h>
+#include <linux/clk.h>
+#include <asm/bootinfo.h>
+#include <asm/time.h>
+
+#include <lantiq_soc.h>
+
+#include "../prom.h"
+
+#define SOC_DANUBE "Danube"
+#define SOC_TWINPASS "Twinpass"
+#define SOC_AR9 "AR9"
+
+#define PART_SHIFT 12
+#define PART_MASK 0x0FFFFFFF
+#define REV_SHIFT 28
+#define REV_MASK 0xF0000000
+
+void __init ltq_soc_detect(struct ltq_soc_info *i)
+{
+ i->partnum = (ltq_r32(LTQ_MPS_CHIPID) & PART_MASK) >> PART_SHIFT;
+ i->rev = (ltq_r32(LTQ_MPS_CHIPID) & REV_MASK) >> REV_SHIFT;
+ switch (i->partnum) {
+ case SOC_ID_DANUBE1:
+ case SOC_ID_DANUBE2:
+ i->name = SOC_DANUBE;
+ i->type = SOC_TYPE_DANUBE;
+ break;
+
+ case SOC_ID_TWINPASS:
+ i->name = SOC_TWINPASS;
+ i->type = SOC_TYPE_DANUBE;
+ break;
+
+ case SOC_ID_ARX188:
+ case SOC_ID_ARX168:
+ case SOC_ID_ARX182:
+ i->name = SOC_AR9;
+ i->type = SOC_TYPE_AR9;
+ break;
+
+ default:
+ unreachable();
+ break;
+ }
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+#include <linux/pm.h>
+#include <linux/module.h>
+#include <asm/reboot.h>
+
+#include <lantiq_soc.h>
+
+#define ltq_rcu_w32(x, y) ltq_w32((x), ltq_rcu_membase + (y))
+#define ltq_rcu_r32(x) ltq_r32(ltq_rcu_membase + (x))
+
+/* register definitions */
+#define LTQ_RCU_RST 0x0010
+#define LTQ_RCU_RST_ALL 0x40000000
+
+#define LTQ_RCU_RST_STAT 0x0014
+#define LTQ_RCU_STAT_SHIFT 26
+
+static struct resource ltq_rcu_resource = {
+ .name = "rcu",
+ .start = LTQ_RCU_BASE_ADDR,
+ .end = LTQ_RCU_BASE_ADDR + LTQ_RCU_SIZE - 1,
+ .flags = IORESOURCE_MEM,
+};
+
+/* remapped base addr of the reset control unit */
+static void __iomem *ltq_rcu_membase;
+
+/* This function is used by the watchdog driver */
+int ltq_reset_cause(void)
+{
+ u32 val = ltq_rcu_r32(LTQ_RCU_RST_STAT);
+ return val >> LTQ_RCU_STAT_SHIFT;
+}
+EXPORT_SYMBOL_GPL(ltq_reset_cause);
+
+static void ltq_machine_restart(char *command)
+{
+ pr_notice("System restart\n");
+ local_irq_disable();
+ ltq_rcu_w32(ltq_rcu_r32(LTQ_RCU_RST) | LTQ_RCU_RST_ALL, LTQ_RCU_RST);
+ unreachable();
+}
+
+static void ltq_machine_halt(void)
+{
+ pr_notice("System halted.\n");
+ local_irq_disable();
+ unreachable();
+}
+
+static void ltq_machine_power_off(void)
+{
+ pr_notice("Please turn off the power now.\n");
+ local_irq_disable();
+ unreachable();
+}
+
+static int __init mips_reboot_setup(void)
+{
+ /* insert and request the memory region */
+ if (insert_resource(&iomem_resource, <q_rcu_resource) < 0)
+ panic("Failed to insert rcu memory\n");
+
+ if (request_mem_region(ltq_rcu_resource.start,
+ resource_size(<q_rcu_resource), "rcu") < 0)
+ panic("Failed to request rcu memory\n");
+
+ /* remap rcu register range */
+ ltq_rcu_membase = ioremap_nocache(ltq_rcu_resource.start,
+ resource_size(<q_rcu_resource));
+ if (!ltq_rcu_membase)
+ panic("Failed to remap rcu memory\n");
+
+ _machine_restart = ltq_machine_restart;
+ _machine_halt = ltq_machine_halt;
+ pm_power_off = ltq_machine_power_off;
+
+ return 0;
+}
+
+arch_initcall(mips_reboot_setup);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#include <lantiq_soc.h>
+
+#include "../prom.h"
+#include "devices.h"
+
+void __init ltq_soc_setup(void)
+{
+ ltq_register_ase_asc();
+ ltq_register_gpio();
+ ltq_register_wdt();
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#include <lantiq_soc.h>
+
+#include "../prom.h"
+#include "devices.h"
+
+void __init ltq_soc_setup(void)
+{
+ ltq_register_asc(0);
+ ltq_register_asc(1);
+ ltq_register_gpio();
+ ltq_register_wdt();
+}
obj-$(CONFIG_CPU_TX49XX) += dump_tlb.o
obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += dump_tlb.o
+obj-$(CONFIG_CPU_XLR) += dump_tlb.o
# libgcc-style stuff needed in the kernel
obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
.rating = 120, /* Functional for real use, but not desired */
.read = mfgpt_read,
.mask = CLOCKSOURCE_MASK(32),
- .mult = 0,
- .shift = 22,
};
int __init init_mfgpt_clocksource(void)
if (num_possible_cpus() > 1) /* MFGPT does not scale! */
return 0;
- clocksource_mfgpt.mult = clocksource_hz2mult(MFGPT_TICK_RATE, 22);
- return clocksource_register(&clocksource_mfgpt);
+ return clocksource_register_hz(&clocksource_mfgpt, MFGPT_TICK_RATE);
}
arch_initcall(init_mfgpt_clocksource);
#define parse_even_earlier(res, option, p) \
do { \
- int ret; \
+ unsigned int tmp __maybe_unused; \
+ \
if (strncmp(option, (char *)p, strlen(option)) == 0) \
- ret = strict_strtol((char *)p + strlen(option"="), 10, &res); \
+ tmp = strict_strtol((char *)p + strlen(option"="), 10, &res); \
} while (0)
void __init prom_init_env(void)
#
obj-y += cache.o dma-default.o extable.o fault.o \
- init.o tlbex.o tlbex-fault.o uasm.o page.o
+ init.o mmap.o tlbex.o tlbex-fault.o uasm.o \
+ page.o
obj-$(CONFIG_32BIT) += ioremap.o pgtable-32.o
obj-$(CONFIG_64BIT) += pgtable-64.o
obj-$(CONFIG_CPU_TX49XX) += c-r4k.o cex-gen.o tlb-r4k.o
obj-$(CONFIG_CPU_VR41XX) += c-r4k.o cex-gen.o tlb-r4k.o
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += c-octeon.o cex-oct.o tlb-r4k.o
+obj-$(CONFIG_CPU_XLR) += c-r4k.o tlb-r4k.o cex-gen.o
obj-$(CONFIG_IP22_CPU_SCACHE) += sc-ip22.o
obj-$(CONFIG_R5000_CPU_SCACHE) += sc-r5k.o
case CPU_25KF:
case CPU_SB1:
case CPU_SB1A:
+ case CPU_XLR:
c->dcache.flags |= MIPS_CACHE_PINDEX;
break;
unsigned long flags, addr, begin, end, pow2;
unsigned int config = read_c0_config();
struct cpuinfo_mips *c = ¤t_cpu_data;
- int tmp;
if (config & CONF_SC)
return 0;
/* Now search for the wrap around point. */
pow2 = (128 * 1024);
- tmp = 0;
for (addr = begin + (128 * 1024); addr < end; addr = begin + pow2) {
cache_op(Index_Load_Tag_SD, addr);
__asm__ __volatile__("nop; nop; nop; nop;"); /* hazard... */
--- /dev/null
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2011 Wind River Systems,
+ * written by Ralf Baechle <ralf@linux-mips.org>
+ */
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+
+unsigned long shm_align_mask = PAGE_SIZE - 1; /* Sane caches */
+
+EXPORT_SYMBOL(shm_align_mask);
+
+#define COLOUR_ALIGN(addr,pgoff) \
+ ((((addr) + shm_align_mask) & ~shm_align_mask) + \
+ (((pgoff) << PAGE_SHIFT) & shm_align_mask))
+
+unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ struct vm_area_struct * vmm;
+ int do_color_align;
+
+ if (len > TASK_SIZE)
+ return -ENOMEM;
+
+ if (flags & MAP_FIXED) {
+ /* Even MAP_FIXED mappings must reside within TASK_SIZE. */
+ if (TASK_SIZE - len < addr)
+ return -EINVAL;
+
+ /*
+ * We do not accept a shared mapping if it would violate
+ * cache aliasing constraints.
+ */
+ if ((flags & MAP_SHARED) &&
+ ((addr - (pgoff << PAGE_SHIFT)) & shm_align_mask))
+ return -EINVAL;
+ return addr;
+ }
+
+ do_color_align = 0;
+ if (filp || (flags & MAP_SHARED))
+ do_color_align = 1;
+ if (addr) {
+ if (do_color_align)
+ addr = COLOUR_ALIGN(addr, pgoff);
+ else
+ addr = PAGE_ALIGN(addr);
+ vmm = find_vma(current->mm, addr);
+ if (TASK_SIZE - len >= addr &&
+ (!vmm || addr + len <= vmm->vm_start))
+ return addr;
+ }
+ addr = current->mm->mmap_base;
+ if (do_color_align)
+ addr = COLOUR_ALIGN(addr, pgoff);
+ else
+ addr = PAGE_ALIGN(addr);
+
+ for (vmm = find_vma(current->mm, addr); ; vmm = vmm->vm_next) {
+ /* At this point: (!vmm || addr < vmm->vm_end). */
+ if (TASK_SIZE - len < addr)
+ return -ENOMEM;
+ if (!vmm || addr + len <= vmm->vm_start)
+ return addr;
+ addr = vmm->vm_end;
+ if (do_color_align)
+ addr = COLOUR_ALIGN(addr, pgoff);
+ }
+}
+
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+ unsigned long random_factor = 0UL;
+
+ if (current->flags & PF_RANDOMIZE) {
+ random_factor = get_random_int();
+ random_factor = random_factor << PAGE_SHIFT;
+ if (TASK_IS_32BIT_ADDR)
+ random_factor &= 0xfffffful;
+ else
+ random_factor &= 0xffffffful;
+ }
+
+ mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
+ mm->get_unmapped_area = arch_get_unmapped_area;
+ mm->unmap_area = arch_unmap_area;
+}
+
+static inline unsigned long brk_rnd(void)
+{
+ unsigned long rnd = get_random_int();
+
+ rnd = rnd << PAGE_SHIFT;
+ /* 8MB for 32bit, 256MB for 64bit */
+ if (TASK_IS_32BIT_ADDR)
+ rnd = rnd & 0x7ffffful;
+ else
+ rnd = rnd & 0xffffffful;
+
+ return rnd;
+}
+
+unsigned long arch_randomize_brk(struct mm_struct *mm)
+{
+ unsigned long base = mm->brk;
+ unsigned long ret;
+
+ ret = PAGE_ALIGN(base + brk_rnd());
+
+ if (ret < mm->brk)
+ return mm->brk;
+
+ return ret;
+}
case CPU_5KC:
case CPU_TX49XX:
case CPU_PR4450:
+ case CPU_XLR:
uasm_i_nop(p);
tlbw(p);
break;
struct uasm_reloc *r = relocs;
u32 *f;
unsigned int final_len;
- struct mips_huge_tlb_info htlb_info;
- enum vmalloc64_mode vmalloc_mode;
+ struct mips_huge_tlb_info htlb_info __maybe_unused;
+ enum vmalloc64_mode vmalloc_mode __maybe_unused;
memset(tlb_handler, 0, sizeof(tlb_handler));
memset(labels, 0, sizeof(labels));
void __init prom_init(void)
{
- int result;
-
prom_argc = fw_arg0;
_prom_argv = (int *) fw_arg1;
_prom_envp = (int *) fw_arg2;
#ifdef CONFIG_SERIAL_8250_CONSOLE
console_config();
#endif
- /* Early detection of CMP support */
- result = gcmp_probe(GCMP_BASE_ADDR, GCMP_ADDRSPACE_SZ);
-
#ifdef CONFIG_MIPS_CMP
- if (result)
+ /* Early detection of CMP support */
+ if (gcmp_probe(GCMP_BASE_ADDR, GCMP_ADDRSPACE_SZ))
register_smp_ops(&cmp_smp_ops);
+ else
#endif
#ifdef CONFIG_MIPS_MT_SMP
-#ifdef CONFIG_MIPS_CMP
- if (!result)
register_smp_ops(&vsmp_smp_ops);
-#else
- register_smp_ops(&vsmp_smp_ops);
-#endif
#endif
#ifdef CONFIG_MIPS_MT_SMTC
register_smp_ops(&msmtc_smp_ops);
static inline int mips_pcibios_iack(void)
{
int irq;
- u32 dummy;
/*
* Determine highest priority pending interrupt by performing
BONITO_PCIMAP_CFG = 0x20000;
/* Flush Bonito register block */
- dummy = BONITO_PCIMAP_CFG;
+ (void) BONITO_PCIMAP_CFG;
iob(); /* sync */
irq = __raw_readl((u32 *)_pcictrl_bonito_pcicfg);
static irqreturn_t ipi_resched_interrupt(int irq, void *dev_id)
{
+ scheduler_ipi();
+
return IRQ_HANDLED;
}
--- /dev/null
+config NLM_COMMON
+ bool
+
+config NLM_XLR
+ bool
--- /dev/null
+obj-y += setup.o platform.o irq.o setup.o time.o
+obj-$(CONFIG_SMP) += smp.o smpboot.o
+obj-$(CONFIG_EARLY_PRINTK) += xlr_console.o
+
+EXTRA_CFLAGS += -Werror
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/interrupt.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+
+#include <asm/mipsregs.h>
+
+#include <asm/netlogic/xlr/iomap.h>
+#include <asm/netlogic/xlr/pic.h>
+#include <asm/netlogic/xlr/xlr.h>
+
+#include <asm/netlogic/interrupt.h>
+#include <asm/netlogic/mips-extns.h>
+
+static u64 nlm_irq_mask;
+static DEFINE_SPINLOCK(nlm_pic_lock);
+
+static void xlr_pic_enable(struct irq_data *d)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+ unsigned long flags;
+ nlm_reg_t reg;
+ int irq = d->irq;
+
+ WARN(!PIC_IRQ_IS_IRT(irq), "Bad irq %d", irq);
+
+ spin_lock_irqsave(&nlm_pic_lock, flags);
+ reg = netlogic_read_reg(mmio, PIC_IRT_1_BASE + irq - PIC_IRQ_BASE);
+ netlogic_write_reg(mmio, PIC_IRT_1_BASE + irq - PIC_IRQ_BASE,
+ reg | (1 << 6) | (1 << 30) | (1 << 31));
+ spin_unlock_irqrestore(&nlm_pic_lock, flags);
+}
+
+static void xlr_pic_mask(struct irq_data *d)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+ unsigned long flags;
+ nlm_reg_t reg;
+ int irq = d->irq;
+
+ WARN(!PIC_IRQ_IS_IRT(irq), "Bad irq %d", irq);
+
+ spin_lock_irqsave(&nlm_pic_lock, flags);
+ reg = netlogic_read_reg(mmio, PIC_IRT_1_BASE + irq - PIC_IRQ_BASE);
+ netlogic_write_reg(mmio, PIC_IRT_1_BASE + irq - PIC_IRQ_BASE,
+ reg | (1 << 6) | (1 << 30) | (0 << 31));
+ spin_unlock_irqrestore(&nlm_pic_lock, flags);
+}
+
+#ifdef CONFIG_PCI
+/* Extra ACK needed for XLR on chip PCI controller */
+static void xlr_pci_ack(struct irq_data *d)
+{
+ nlm_reg_t *pci_mmio = netlogic_io_mmio(NETLOGIC_IO_PCIX_OFFSET);
+
+ netlogic_read_reg(pci_mmio, (0x140 >> 2));
+}
+
+/* Extra ACK needed for XLS on chip PCIe controller */
+static void xls_pcie_ack(struct irq_data *d)
+{
+ nlm_reg_t *pcie_mmio_le = netlogic_io_mmio(NETLOGIC_IO_PCIE_1_OFFSET);
+
+ switch (d->irq) {
+ case PIC_PCIE_LINK0_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x90 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_LINK1_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x94 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_LINK2_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x190 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_LINK3_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x194 >> 2), 0xffffffff);
+ break;
+ }
+}
+
+/* For XLS B silicon, the 3,4 PCI interrupts are different */
+static void xls_pcie_ack_b(struct irq_data *d)
+{
+ nlm_reg_t *pcie_mmio_le = netlogic_io_mmio(NETLOGIC_IO_PCIE_1_OFFSET);
+
+ switch (d->irq) {
+ case PIC_PCIE_LINK0_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x90 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_LINK1_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x94 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_XLSB0_LINK2_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x190 >> 2), 0xffffffff);
+ break;
+ case PIC_PCIE_XLSB0_LINK3_IRQ:
+ netlogic_write_reg(pcie_mmio_le, (0x194 >> 2), 0xffffffff);
+ break;
+ }
+}
+#endif
+
+static void xlr_pic_ack(struct irq_data *d)
+{
+ unsigned long flags;
+ nlm_reg_t *mmio;
+ int irq = d->irq;
+ void *hd = irq_data_get_irq_handler_data(d);
+
+ WARN(!PIC_IRQ_IS_IRT(irq), "Bad irq %d", irq);
+
+ if (hd) {
+ void (*extra_ack)(void *) = hd;
+ extra_ack(d);
+ }
+ mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+ spin_lock_irqsave(&nlm_pic_lock, flags);
+ netlogic_write_reg(mmio, PIC_INT_ACK, (1 << (irq - PIC_IRQ_BASE)));
+ spin_unlock_irqrestore(&nlm_pic_lock, flags);
+}
+
+/*
+ * This chip definition handles interrupts routed thru the XLR
+ * hardware PIC, currently IRQs 8-39 are mapped to hardware intr
+ * 0-31 wired the XLR PIC
+ */
+static struct irq_chip xlr_pic = {
+ .name = "XLR-PIC",
+ .irq_enable = xlr_pic_enable,
+ .irq_mask = xlr_pic_mask,
+ .irq_ack = xlr_pic_ack,
+};
+
+static void rsvd_irq_handler(struct irq_data *d)
+{
+ WARN(d->irq >= PIC_IRQ_BASE, "Bad irq %d", d->irq);
+}
+
+/*
+ * Chip definition for CPU originated interrupts(timer, msg) and
+ * IPIs
+ */
+struct irq_chip nlm_cpu_intr = {
+ .name = "XLR-CPU-INTR",
+ .irq_enable = rsvd_irq_handler,
+ .irq_mask = rsvd_irq_handler,
+ .irq_ack = rsvd_irq_handler,
+};
+
+void __init init_xlr_irqs(void)
+{
+ nlm_reg_t *mmio = netlogic_io_mmio(NETLOGIC_IO_PIC_OFFSET);
+ uint32_t thread_mask = 1;
+ int level, i;
+
+ pr_info("Interrupt thread mask [%x]\n", thread_mask);
+ for (i = 0; i < PIC_NUM_IRTS; i++) {
+ level = PIC_IRQ_IS_EDGE_TRIGGERED(i);
+
+ /* Bind all PIC irqs to boot cpu */
+ netlogic_write_reg(mmio, PIC_IRT_0_BASE + i, thread_mask);
+
+ /*
+ * Use local scheduling and high polarity for all IRTs
+ * Invalidate all IRTs, by default
+ */
+ netlogic_write_reg(mmio, PIC_IRT_1_BASE + i,
+ (level << 30) | (1 << 6) | (PIC_IRQ_BASE + i));
+ }
+
+ /* Make all IRQs as level triggered by default */
+ for (i = 0; i < NR_IRQS; i++) {
+ if (PIC_IRQ_IS_IRT(i))
+ irq_set_chip_and_handler(i, &xlr_pic, handle_level_irq);
+ else
+ irq_set_chip_and_handler(i, &nlm_cpu_intr,
+ handle_level_irq);
+ }
+#ifdef CONFIG_SMP
+ irq_set_chip_and_handler(IRQ_IPI_SMP_FUNCTION, &nlm_cpu_intr,
+ nlm_smp_function_ipi_handler);
+ irq_set_chip_and_handler(IRQ_IPI_SMP_RESCHEDULE, &nlm_cpu_intr,
+ nlm_smp_resched_ipi_handler);
+ nlm_irq_mask |=
+ ((1ULL << IRQ_IPI_SMP_FUNCTION) | (1ULL << IRQ_IPI_SMP_RESCHEDULE));
+#endif
+
+#ifdef CONFIG_PCI
+ /*
+ * For PCI interrupts, we need to ack the PIC controller too, overload
+ * irq handler data to do this
+ */
+ if (nlm_chip_is_xls()) {
+ if (nlm_chip_is_xls_b()) {
+ irq_set_handler_data(PIC_PCIE_LINK0_IRQ,
+ xls_pcie_ack_b);
+ irq_set_handler_data(PIC_PCIE_LINK1_IRQ,
+ xls_pcie_ack_b);
+ irq_set_handler_data(PIC_PCIE_XLSB0_LINK2_IRQ,
+ xls_pcie_ack_b);
+ irq_set_handler_data(PIC_PCIE_XLSB0_LINK3_IRQ,
+ xls_pcie_ack_b);
+ } else {
+ irq_set_handler_data(PIC_PCIE_LINK0_IRQ, xls_pcie_ack);
+ irq_set_handler_data(PIC_PCIE_LINK1_IRQ, xls_pcie_ack);
+ irq_set_handler_data(PIC_PCIE_LINK2_IRQ, xls_pcie_ack);
+ irq_set_handler_data(PIC_PCIE_LINK3_IRQ, xls_pcie_ack);
+ }
+ } else {
+ /* XLR PCI controller ACK */
+ irq_set_handler_data(PIC_PCIE_XLSB0_LINK3_IRQ, xlr_pci_ack);
+ }
+#endif
+ /* unmask all PIC related interrupts. If no handler is installed by the
+ * drivers, it'll just ack the interrupt and return
+ */
+ for (i = PIC_IRT_FIRST_IRQ; i <= PIC_IRT_LAST_IRQ; i++)
+ nlm_irq_mask |= (1ULL << i);
+
+ nlm_irq_mask |= (1ULL << IRQ_TIMER);
+}
+
+void __init arch_init_irq(void)
+{
+ /* Initialize the irq descriptors */
+ init_xlr_irqs();
+ write_c0_eimr(nlm_irq_mask);
+}
+
+void __cpuinit nlm_smp_irq_init(void)
+{
+ /* set interrupt mask for non-zero cpus */
+ write_c0_eimr(nlm_irq_mask);
+}
+
+asmlinkage void plat_irq_dispatch(void)
+{
+ uint64_t eirr;
+ int i;
+
+ eirr = read_c0_eirr() & read_c0_eimr();
+ if (!eirr)
+ return;
+
+ /* no need of EIRR here, writing compare clears interrupt */
+ if (eirr & (1 << IRQ_TIMER)) {
+ do_IRQ(IRQ_TIMER);
+ return;
+ }
+
+ /* use dcltz: optimize below code */
+ for (i = 63; i != -1; i--) {
+ if (eirr & (1ULL << i))
+ break;
+ }
+ if (i == -1) {
+ pr_err("no interrupt !!\n");
+ return;
+ }
+
+ /* Ack eirr */
+ write_c0_eirr(1ULL << i);
+
+ do_IRQ(i);
+}
--- /dev/null
+/*
+ * Copyright 2011, Netlogic Microsystems.
+ * Copyright 2004, Matt Porter <mporter@kernel.crashing.org>
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/resource.h>
+#include <linux/serial_8250.h>
+#include <linux/serial_reg.h>
+
+#include <asm/netlogic/xlr/iomap.h>
+#include <asm/netlogic/xlr/pic.h>
+#include <asm/netlogic/xlr/xlr.h>
+
+unsigned int nlm_xlr_uart_in(struct uart_port *p, int offset)
+{
+ nlm_reg_t *mmio;
+ unsigned int value;
+
+ /* XLR uart does not need any mapping of regs */
+ mmio = (nlm_reg_t *)(p->membase + (offset << p->regshift));
+ value = netlogic_read_reg(mmio, 0);
+
+ /* See XLR/XLS errata */
+ if (offset == UART_MSR)
+ value ^= 0xF0;
+ else if (offset == UART_MCR)
+ value ^= 0x3;
+
+ return value;
+}
+
+void nlm_xlr_uart_out(struct uart_port *p, int offset, int value)
+{
+ nlm_reg_t *mmio;
+
+ /* XLR uart does not need any mapping of regs */
+ mmio = (nlm_reg_t *)(p->membase + (offset << p->regshift));
+
+ /* See XLR/XLS errata */
+ if (offset == UART_MSR)
+ value ^= 0xF0;
+ else if (offset == UART_MCR)
+ value ^= 0x3;
+
+ netlogic_write_reg(mmio, 0, value);
+}
+
+#define PORT(_irq) \
+ { \
+ .irq = _irq, \
+ .regshift = 2, \
+ .iotype = UPIO_MEM32, \
+ .flags = (UPF_SKIP_TEST | \
+ UPF_FIXED_TYPE | UPF_BOOT_AUTOCONF),\
+ .uartclk = PIC_CLKS_PER_SEC, \
+ .type = PORT_16550A, \
+ .serial_in = nlm_xlr_uart_in, \
+ .serial_out = nlm_xlr_uart_out, \
+ }
+
+static struct plat_serial8250_port xlr_uart_data[] = {
+ PORT(PIC_UART_0_IRQ),
+ PORT(PIC_UART_1_IRQ),
+ {},
+};
+
+static struct platform_device uart_device = {
+ .name = "serial8250",
+ .id = PLAT8250_DEV_PLATFORM,
+ .dev = {
+ .platform_data = xlr_uart_data,
+ },
+};
+
+static int __init nlm_uart_init(void)
+{
+ nlm_reg_t *mmio;
+
+ mmio = netlogic_io_mmio(NETLOGIC_IO_UART_0_OFFSET);
+ xlr_uart_data[0].membase = (void __iomem *)mmio;
+ xlr_uart_data[0].mapbase = CPHYSADDR((unsigned long)mmio);
+
+ mmio = netlogic_io_mmio(NETLOGIC_IO_UART_1_OFFSET);
+ xlr_uart_data[1].membase = (void __iomem *)mmio;
+ xlr_uart_data[1].mapbase = CPHYSADDR((unsigned long)mmio);
+
+ return platform_device_register(&uart_device);
+}
+
+arch_initcall(nlm_uart_init);
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/serial_8250.h>
+#include <linux/pm.h>
+
+#include <asm/reboot.h>
+#include <asm/time.h>
+#include <asm/bootinfo.h>
+#include <asm/smp-ops.h>
+
+#include <asm/netlogic/interrupt.h>
+#include <asm/netlogic/psb-bootinfo.h>
+
+#include <asm/netlogic/xlr/xlr.h>
+#include <asm/netlogic/xlr/iomap.h>
+#include <asm/netlogic/xlr/pic.h>
+#include <asm/netlogic/xlr/gpio.h>
+
+unsigned long netlogic_io_base = (unsigned long)(DEFAULT_NETLOGIC_IO_BASE);
+unsigned long nlm_common_ebase = 0x0;
+struct psb_info nlm_prom_info;
+
+static void nlm_early_serial_setup(void)
+{
+ struct uart_port s;
+ nlm_reg_t *uart_base;
+
+ uart_base = netlogic_io_mmio(NETLOGIC_IO_UART_0_OFFSET);
+ memset(&s, 0, sizeof(s));
+ s.flags = ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST;
+ s.iotype = UPIO_MEM32;
+ s.regshift = 2;
+ s.irq = PIC_UART_0_IRQ;
+ s.uartclk = PIC_CLKS_PER_SEC;
+ s.serial_in = nlm_xlr_uart_in;
+ s.serial_out = nlm_xlr_uart_out;
+ s.mapbase = (unsigned long)uart_base;
+ s.membase = (unsigned char __iomem *)uart_base;
+ early_serial_setup(&s);
+}
+
+static void nlm_linux_exit(void)
+{
+ nlm_reg_t *mmio;
+
+ mmio = netlogic_io_mmio(NETLOGIC_IO_GPIO_OFFSET);
+ /* trigger a chip reset by writing 1 to GPIO_SWRESET_REG */
+ netlogic_write_reg(mmio, NETLOGIC_GPIO_SWRESET_REG, 1);
+ for ( ; ; )
+ cpu_wait();
+}
+
+void __init plat_mem_setup(void)
+{
+ panic_timeout = 5;
+ _machine_restart = (void (*)(char *))nlm_linux_exit;
+ _machine_halt = nlm_linux_exit;
+ pm_power_off = nlm_linux_exit;
+}
+
+const char *get_system_type(void)
+{
+ return "Netlogic XLR/XLS Series";
+}
+
+void __init prom_free_prom_memory(void)
+{
+ /* Nothing yet */
+}
+
+static void build_arcs_cmdline(int *argv)
+{
+ int i, remain, len;
+ char *arg;
+
+ remain = sizeof(arcs_cmdline) - 1;
+ arcs_cmdline[0] = '\0';
+ for (i = 0; argv[i] != 0; i++) {
+ arg = (char *)(long)argv[i];
+ len = strlen(arg);
+ if (len + 1 > remain)
+ break;
+ strcat(arcs_cmdline, arg);
+ strcat(arcs_cmdline, " ");
+ remain -= len + 1;
+ }
+
+ /* Add the default options here */
+ if ((strstr(arcs_cmdline, "console=")) == NULL) {
+ arg = "console=ttyS0,38400 ";
+ len = strlen(arg);
+ if (len > remain)
+ goto fail;
+ strcat(arcs_cmdline, arg);
+ remain -= len;
+ }
+#ifdef CONFIG_BLK_DEV_INITRD
+ if ((strstr(arcs_cmdline, "rdinit=")) == NULL) {
+ arg = "rdinit=/sbin/init ";
+ len = strlen(arg);
+ if (len > remain)
+ goto fail;
+ strcat(arcs_cmdline, arg);
+ remain -= len;
+ }
+#endif
+ return;
+fail:
+ panic("Cannot add %s, command line too big!", arg);
+}
+
+static void prom_add_memory(void)
+{
+ struct nlm_boot_mem_map *bootm;
+ u64 start, size;
+ u64 pref_backup = 512; /* avoid pref walking beyond end */
+ int i;
+
+ bootm = (void *)(long)nlm_prom_info.psb_mem_map;
+ for (i = 0; i < bootm->nr_map; i++) {
+ if (bootm->map[i].type != BOOT_MEM_RAM)
+ continue;
+ start = bootm->map[i].addr;
+ size = bootm->map[i].size;
+
+ /* Work around for using bootloader mem */
+ if (i == 0 && start == 0 && size == 0x0c000000)
+ size = 0x0ff00000;
+
+ add_memory_region(start, size - pref_backup, BOOT_MEM_RAM);
+ }
+}
+
+void __init prom_init(void)
+{
+ int *argv, *envp; /* passed as 32 bit ptrs */
+ struct psb_info *prom_infop;
+
+ /* truncate to 32 bit and sign extend all args */
+ argv = (int *)(long)(int)fw_arg1;
+ envp = (int *)(long)(int)fw_arg2;
+ prom_infop = (struct psb_info *)(long)(int)fw_arg3;
+
+ nlm_prom_info = *prom_infop;
+
+ nlm_early_serial_setup();
+ build_arcs_cmdline(argv);
+ nlm_common_ebase = read_c0_ebase() & (~((1 << 12) - 1));
+ prom_add_memory();
+
+#ifdef CONFIG_SMP
+ nlm_wakeup_secondary_cpus(nlm_prom_info.online_cpu_map);
+ register_smp_ops(&nlm_smp_ops);
+#endif
+}
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/irq.h>
+
+#include <asm/mmu_context.h>
+
+#include <asm/netlogic/interrupt.h>
+#include <asm/netlogic/mips-extns.h>
+
+#include <asm/netlogic/xlr/iomap.h>
+#include <asm/netlogic/xlr/pic.h>
+#include <asm/netlogic/xlr/xlr.h>
+
+void core_send_ipi(int logical_cpu, unsigned int action)
+{
+ int cpu = cpu_logical_map(logical_cpu);
+ u32 tid = cpu & 0x3;
+ u32 pid = (cpu >> 2) & 0x07;
+ u32 ipi = (tid << 16) | (pid << 20);
+
+ if (action & SMP_CALL_FUNCTION)
+ ipi |= IRQ_IPI_SMP_FUNCTION;
+ else if (action & SMP_RESCHEDULE_YOURSELF)
+ ipi |= IRQ_IPI_SMP_RESCHEDULE;
+ else
+ return;
+
+ pic_send_ipi(ipi);
+}
+
+void nlm_send_ipi_single(int cpu, unsigned int action)
+{
+ core_send_ipi(cpu, action);
+}
+
+void nlm_send_ipi_mask(const struct cpumask *mask, unsigned int action)
+{
+ int cpu;
+
+ for_each_cpu(cpu, mask) {
+ core_send_ipi(cpu, action);
+ }
+}
+
+/* IRQ_IPI_SMP_FUNCTION Handler */
+void nlm_smp_function_ipi_handler(unsigned int irq, struct irq_desc *desc)
+{
+ smp_call_function_interrupt();
+}
+
+/* IRQ_IPI_SMP_RESCHEDULE handler */
+void nlm_smp_resched_ipi_handler(unsigned int irq, struct irq_desc *desc)
+{
+ set_need_resched();
+}
+
+void nlm_common_ipi_handler(int irq, struct pt_regs *regs)
+{
+ if (irq == IRQ_IPI_SMP_FUNCTION) {
+ smp_call_function_interrupt();
+ } else {
+ /* Announce that we are for reschduling */
+ set_need_resched();
+ }
+}
+
+/*
+ * Called before going into mips code, early cpu init
+ */
+void nlm_early_init_secondary(void)
+{
+ write_c0_ebase((uint32_t)nlm_common_ebase);
+ /* TLB partition here later */
+}
+
+/*
+ * Code to run on secondary just after probing the CPU
+ */
+static void __cpuinit nlm_init_secondary(void)
+{
+ nlm_smp_irq_init();
+}
+
+void nlm_smp_finish(void)
+{
+#ifdef notyet
+ nlm_common_msgring_cpu_init();
+#endif
+}
+
+void nlm_cpus_done(void)
+{
+}
+
+/*
+ * Boot all other cpus in the system, initialize them, and bring them into
+ * the boot function
+ */
+int nlm_cpu_unblock[NR_CPUS];
+int nlm_cpu_ready[NR_CPUS];
+unsigned long nlm_next_gp;
+unsigned long nlm_next_sp;
+cpumask_t phys_cpu_present_map;
+
+void nlm_boot_secondary(int logical_cpu, struct task_struct *idle)
+{
+ unsigned long gp = (unsigned long)task_thread_info(idle);
+ unsigned long sp = (unsigned long)__KSTK_TOS(idle);
+ int cpu = cpu_logical_map(logical_cpu);
+
+ nlm_next_sp = sp;
+ nlm_next_gp = gp;
+
+ /* barrier */
+ __sync();
+ nlm_cpu_unblock[cpu] = 1;
+}
+
+void __init nlm_smp_setup(void)
+{
+ unsigned int boot_cpu;
+ int num_cpus, i;
+
+ boot_cpu = hard_smp_processor_id();
+ cpus_clear(phys_cpu_present_map);
+
+ cpu_set(boot_cpu, phys_cpu_present_map);
+ __cpu_number_map[boot_cpu] = 0;
+ __cpu_logical_map[0] = boot_cpu;
+ cpu_set(0, cpu_possible_map);
+
+ num_cpus = 1;
+ for (i = 0; i < NR_CPUS; i++) {
+ if (nlm_cpu_ready[i]) {
+ cpu_set(i, phys_cpu_present_map);
+ __cpu_number_map[i] = num_cpus;
+ __cpu_logical_map[num_cpus] = i;
+ cpu_set(num_cpus, cpu_possible_map);
+ ++num_cpus;
+ }
+ }
+
+ pr_info("Phys CPU present map: %lx, possible map %lx\n",
+ (unsigned long)phys_cpu_present_map.bits[0],
+ (unsigned long)cpu_possible_map.bits[0]);
+
+ pr_info("Detected %i Slave CPU(s)\n", num_cpus);
+}
+
+void nlm_prepare_cpus(unsigned int max_cpus)
+{
+}
+
+struct plat_smp_ops nlm_smp_ops = {
+ .send_ipi_single = nlm_send_ipi_single,
+ .send_ipi_mask = nlm_send_ipi_mask,
+ .init_secondary = nlm_init_secondary,
+ .smp_finish = nlm_smp_finish,
+ .cpus_done = nlm_cpus_done,
+ .boot_secondary = nlm_boot_secondary,
+ .smp_setup = nlm_smp_setup,
+ .prepare_cpus = nlm_prepare_cpus,
+};
+
+unsigned long secondary_entry_point;
+
+int nlm_wakeup_secondary_cpus(u32 wakeup_mask)
+{
+ unsigned int tid, pid, ipi, i, boot_cpu;
+ void *reset_vec;
+
+ secondary_entry_point = (unsigned long)prom_pre_boot_secondary_cpus;
+ reset_vec = (void *)CKSEG1ADDR(0x1fc00000);
+ memcpy(reset_vec, nlm_boot_smp_nmi, 0x80);
+ boot_cpu = hard_smp_processor_id();
+
+ for (i = 0; i < NR_CPUS; i++) {
+ if (i == boot_cpu)
+ continue;
+ if (wakeup_mask & (1u << i)) {
+ tid = i & 0x3;
+ pid = (i >> 2) & 0x7;
+ ipi = (tid << 16) | (pid << 20) | (1 << 8);
+ pic_send_ipi(ipi);
+ }
+ }
+
+ return 0;
+}
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/regdef.h>
+#include <asm/mipsregs.h>
+
+
+/* Don't jump to linux function from Bootloader stack. Change it
+ * here. Kernel might allocate bootloader memory before all the CPUs are
+ * brought up (eg: Inode cache region) and we better don't overwrite this
+ * memory
+ */
+NESTED(prom_pre_boot_secondary_cpus, 16, sp)
+ .set mips64
+ mfc0 t0, $15, 1 # read ebase
+ andi t0, 0x1f # t0 has the processor_id()
+ sll t0, 2 # offset in cpu array
+
+ PTR_LA t1, nlm_cpu_ready # mark CPU ready
+ PTR_ADDU t1, t0
+ li t2, 1
+ sw t2, 0(t1)
+
+ PTR_LA t1, nlm_cpu_unblock
+ PTR_ADDU t1, t0
+1: lw t2, 0(t1) # wait till unblocked
+ beqz t2, 1b
+ nop
+
+ PTR_LA t1, nlm_next_sp
+ PTR_L sp, 0(t1)
+ PTR_LA t1, nlm_next_gp
+ PTR_L gp, 0(t1)
+
+ PTR_LA t0, nlm_early_init_secondary
+ jalr t0
+ nop
+
+ PTR_LA t0, smp_bootstrap
+ jr t0
+ nop
+END(prom_pre_boot_secondary_cpus)
+
+NESTED(nlm_boot_smp_nmi, 0, sp)
+ .set push
+ .set noat
+ .set mips64
+ .set noreorder
+
+ /* Clear the NMI and BEV bits */
+ MFC0 k0, CP0_STATUS
+ li k1, 0xffb7ffff
+ and k0, k0, k1
+ MTC0 k0, CP0_STATUS
+
+ PTR_LA k1, secondary_entry_point
+ PTR_L k0, 0(k1)
+ jr k0
+ nop
+ .set pop
+END(nlm_boot_smp_nmi)
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/init.h>
+
+#include <asm/time.h>
+#include <asm/netlogic/interrupt.h>
+#include <asm/netlogic/psb-bootinfo.h>
+
+unsigned int __cpuinit get_c0_compare_int(void)
+{
+ return IRQ_TIMER;
+}
+
+void __init plat_time_init(void)
+{
+ mips_hpt_frequency = nlm_prom_info.cpu_frequency;
+ pr_info("MIPS counter frequency [%ld]\n",
+ (unsigned long)mips_hpt_frequency);
+}
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/types.h>
+#include <asm/netlogic/xlr/iomap.h>
+
+void prom_putchar(char c)
+{
+ nlm_reg_t *mmio;
+
+ mmio = netlogic_io_mmio(NETLOGIC_IO_UART_0_OFFSET);
+ while (netlogic_read_reg(mmio, 0x5) == 0)
+ ;
+ netlogic_write_reg(mmio, 0x0, c);
+}
obj-$(CONFIG_SIBYTE_BCM112X) += fixup-sb1250.o pci-sb1250.o
obj-$(CONFIG_SIBYTE_BCM1x80) += pci-bcm1480.o pci-bcm1480ht.o
obj-$(CONFIG_SNI_RM) += fixup-sni.o ops-sni.o
+obj-$(CONFIG_SOC_XWAY) += pci-lantiq.o ops-lantiq.o
obj-$(CONFIG_TANBAC_TB0219) += fixup-tb0219.o
obj-$(CONFIG_TANBAC_TB0226) += fixup-tb0226.o
obj-$(CONFIG_TANBAC_TB0287) += fixup-tb0287.o
obj-$(CONFIG_WR_PPMC) += fixup-wrppmc.o
obj-$(CONFIG_MIKROTIK_RB532) += pci-rc32434.o ops-rc32434.o fixup-rc32434.o
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += pci-octeon.o pcie-octeon.o
+obj-$(CONFIG_NLM_XLR) += pci-xlr.o
ifdef CONFIG_PCI_MSI
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += msi-octeon.o
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/mm.h>
+#include <asm/addrspace.h>
+#include <linux/vmalloc.h>
+
+#include <lantiq_soc.h>
+
+#include "pci-lantiq.h"
+
+#define LTQ_PCI_CFG_BUSNUM_SHF 16
+#define LTQ_PCI_CFG_DEVNUM_SHF 11
+#define LTQ_PCI_CFG_FUNNUM_SHF 8
+
+#define PCI_ACCESS_READ 0
+#define PCI_ACCESS_WRITE 1
+
+static int ltq_pci_config_access(unsigned char access_type, struct pci_bus *bus,
+ unsigned int devfn, unsigned int where, u32 *data)
+{
+ unsigned long cfg_base;
+ unsigned long flags;
+ u32 temp;
+
+ /* we support slot from 0 to 15 dev_fn & 0x68 (AD29) is the
+ SoC itself */
+ if ((bus->number != 0) || ((devfn & 0xf8) > 0x78)
+ || ((devfn & 0xf8) == 0) || ((devfn & 0xf8) == 0x68))
+ return 1;
+
+ spin_lock_irqsave(&ebu_lock, flags);
+
+ cfg_base = (unsigned long) ltq_pci_mapped_cfg;
+ cfg_base |= (bus->number << LTQ_PCI_CFG_BUSNUM_SHF) | (devfn <<
+ LTQ_PCI_CFG_FUNNUM_SHF) | (where & ~0x3);
+
+ /* Perform access */
+ if (access_type == PCI_ACCESS_WRITE) {
+ ltq_w32(swab32(*data), ((u32 *)cfg_base));
+ } else {
+ *data = ltq_r32(((u32 *)(cfg_base)));
+ *data = swab32(*data);
+ }
+ wmb();
+
+ /* clean possible Master abort */
+ cfg_base = (unsigned long) ltq_pci_mapped_cfg;
+ cfg_base |= (0x0 << LTQ_PCI_CFG_FUNNUM_SHF) + 4;
+ temp = ltq_r32(((u32 *)(cfg_base)));
+ temp = swab32(temp);
+ cfg_base = (unsigned long) ltq_pci_mapped_cfg;
+ cfg_base |= (0x68 << LTQ_PCI_CFG_FUNNUM_SHF) + 4;
+ ltq_w32(temp, ((u32 *)cfg_base));
+
+ spin_unlock_irqrestore(&ebu_lock, flags);
+
+ if (((*data) == 0xffffffff) && (access_type == PCI_ACCESS_READ))
+ return 1;
+
+ return 0;
+}
+
+int ltq_pci_read_config_dword(struct pci_bus *bus, unsigned int devfn,
+ int where, int size, u32 *val)
+{
+ u32 data = 0;
+
+ if (ltq_pci_config_access(PCI_ACCESS_READ, bus, devfn, where, &data))
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ if (size == 1)
+ *val = (data >> ((where & 3) << 3)) & 0xff;
+ else if (size == 2)
+ *val = (data >> ((where & 3) << 3)) & 0xffff;
+ else
+ *val = data;
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+int ltq_pci_write_config_dword(struct pci_bus *bus, unsigned int devfn,
+ int where, int size, u32 val)
+{
+ u32 data = 0;
+
+ if (size == 4) {
+ data = val;
+ } else {
+ if (ltq_pci_config_access(PCI_ACCESS_READ, bus,
+ devfn, where, &data))
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ if (size == 1)
+ data = (data & ~(0xff << ((where & 3) << 3))) |
+ (val << ((where & 3) << 3));
+ else if (size == 2)
+ data = (data & ~(0xffff << ((where & 3) << 3))) |
+ (val << ((where & 3) << 3));
+ }
+
+ if (ltq_pci_config_access(PCI_ACCESS_WRITE, bus, devfn, where, &data))
+ return PCIBIOS_DEVICE_NOT_FOUND;
+
+ return PCIBIOS_SUCCESSFUL;
+}
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/platform_device.h>
+
+#include <asm/pci.h>
+#include <asm/gpio.h>
+#include <asm/addrspace.h>
+
+#include <lantiq_soc.h>
+#include <lantiq_irq.h>
+#include <lantiq_platform.h>
+
+#include "pci-lantiq.h"
+
+#define LTQ_PCI_CFG_BASE 0x17000000
+#define LTQ_PCI_CFG_SIZE 0x00008000
+#define LTQ_PCI_MEM_BASE 0x18000000
+#define LTQ_PCI_MEM_SIZE 0x02000000
+#define LTQ_PCI_IO_BASE 0x1AE00000
+#define LTQ_PCI_IO_SIZE 0x00200000
+
+#define PCI_CR_FCI_ADDR_MAP0 0x00C0
+#define PCI_CR_FCI_ADDR_MAP1 0x00C4
+#define PCI_CR_FCI_ADDR_MAP2 0x00C8
+#define PCI_CR_FCI_ADDR_MAP3 0x00CC
+#define PCI_CR_FCI_ADDR_MAP4 0x00D0
+#define PCI_CR_FCI_ADDR_MAP5 0x00D4
+#define PCI_CR_FCI_ADDR_MAP6 0x00D8
+#define PCI_CR_FCI_ADDR_MAP7 0x00DC
+#define PCI_CR_CLK_CTRL 0x0000
+#define PCI_CR_PCI_MOD 0x0030
+#define PCI_CR_PC_ARB 0x0080
+#define PCI_CR_FCI_ADDR_MAP11hg 0x00E4
+#define PCI_CR_BAR11MASK 0x0044
+#define PCI_CR_BAR12MASK 0x0048
+#define PCI_CR_BAR13MASK 0x004C
+#define PCI_CS_BASE_ADDR1 0x0010
+#define PCI_CR_PCI_ADDR_MAP11 0x0064
+#define PCI_CR_FCI_BURST_LENGTH 0x00E8
+#define PCI_CR_PCI_EOI 0x002C
+#define PCI_CS_STS_CMD 0x0004
+
+#define PCI_MASTER0_REQ_MASK_2BITS 8
+#define PCI_MASTER1_REQ_MASK_2BITS 10
+#define PCI_MASTER2_REQ_MASK_2BITS 12
+#define INTERNAL_ARB_ENABLE_BIT 0
+
+#define LTQ_CGU_IFCCR 0x0018
+#define LTQ_CGU_PCICR 0x0034
+
+#define ltq_pci_w32(x, y) ltq_w32((x), ltq_pci_membase + (y))
+#define ltq_pci_r32(x) ltq_r32(ltq_pci_membase + (x))
+
+#define ltq_pci_cfg_w32(x, y) ltq_w32((x), ltq_pci_mapped_cfg + (y))
+#define ltq_pci_cfg_r32(x) ltq_r32(ltq_pci_mapped_cfg + (x))
+
+struct ltq_pci_gpio_map {
+ int pin;
+ int alt0;
+ int alt1;
+ int dir;
+ char *name;
+};
+
+/* the pci core can make use of the following gpios */
+static struct ltq_pci_gpio_map ltq_pci_gpio_map[] = {
+ { 0, 1, 0, 0, "pci-exin0" },
+ { 1, 1, 0, 0, "pci-exin1" },
+ { 2, 1, 0, 0, "pci-exin2" },
+ { 39, 1, 0, 0, "pci-exin3" },
+ { 10, 1, 0, 0, "pci-exin4" },
+ { 9, 1, 0, 0, "pci-exin5" },
+ { 30, 1, 0, 1, "pci-gnt1" },
+ { 23, 1, 0, 1, "pci-gnt2" },
+ { 19, 1, 0, 1, "pci-gnt3" },
+ { 38, 1, 0, 1, "pci-gnt4" },
+ { 29, 1, 0, 0, "pci-req1" },
+ { 31, 1, 0, 0, "pci-req2" },
+ { 3, 1, 0, 0, "pci-req3" },
+ { 37, 1, 0, 0, "pci-req4" },
+};
+
+__iomem void *ltq_pci_mapped_cfg;
+static __iomem void *ltq_pci_membase;
+
+int (*ltqpci_plat_dev_init)(struct pci_dev *dev) = NULL;
+
+/* Since the PCI REQ pins can be reused for other functionality, make it
+ possible to exclude those from interpretation by the PCI controller */
+static int ltq_pci_req_mask = 0xf;
+
+static int *ltq_pci_irq_map;
+
+struct pci_ops ltq_pci_ops = {
+ .read = ltq_pci_read_config_dword,
+ .write = ltq_pci_write_config_dword
+};
+
+static struct resource pci_io_resource = {
+ .name = "pci io space",
+ .start = LTQ_PCI_IO_BASE,
+ .end = LTQ_PCI_IO_BASE + LTQ_PCI_IO_SIZE - 1,
+ .flags = IORESOURCE_IO
+};
+
+static struct resource pci_mem_resource = {
+ .name = "pci memory space",
+ .start = LTQ_PCI_MEM_BASE,
+ .end = LTQ_PCI_MEM_BASE + LTQ_PCI_MEM_SIZE - 1,
+ .flags = IORESOURCE_MEM
+};
+
+static struct pci_controller ltq_pci_controller = {
+ .pci_ops = <q_pci_ops,
+ .mem_resource = &pci_mem_resource,
+ .mem_offset = 0x00000000UL,
+ .io_resource = &pci_io_resource,
+ .io_offset = 0x00000000UL,
+};
+
+int pcibios_plat_dev_init(struct pci_dev *dev)
+{
+ if (ltqpci_plat_dev_init)
+ return ltqpci_plat_dev_init(dev);
+
+ return 0;
+}
+
+static u32 ltq_calc_bar11mask(void)
+{
+ u32 mem, bar11mask;
+
+ /* BAR11MASK value depends on available memory on system. */
+ mem = num_physpages * PAGE_SIZE;
+ bar11mask = (0x0ffffff0 & ~((1 << (fls(mem) - 1)) - 1)) | 8;
+
+ return bar11mask;
+}
+
+static void ltq_pci_setup_gpio(int gpio)
+{
+ int i;
+ for (i = 0; i < ARRAY_SIZE(ltq_pci_gpio_map); i++) {
+ if (gpio & (1 << i)) {
+ ltq_gpio_request(ltq_pci_gpio_map[i].pin,
+ ltq_pci_gpio_map[i].alt0,
+ ltq_pci_gpio_map[i].alt1,
+ ltq_pci_gpio_map[i].dir,
+ ltq_pci_gpio_map[i].name);
+ }
+ }
+ ltq_gpio_request(21, 0, 0, 1, "pci-reset");
+ ltq_pci_req_mask = (gpio >> PCI_REQ_SHIFT) & PCI_REQ_MASK;
+}
+
+static int __devinit ltq_pci_startup(struct ltq_pci_data *conf)
+{
+ u32 temp_buffer;
+
+ /* set clock to 33Mhz */
+ ltq_cgu_w32(ltq_cgu_r32(LTQ_CGU_IFCCR) & ~0xf00000, LTQ_CGU_IFCCR);
+ ltq_cgu_w32(ltq_cgu_r32(LTQ_CGU_IFCCR) | 0x800000, LTQ_CGU_IFCCR);
+
+ /* external or internal clock ? */
+ if (conf->clock) {
+ ltq_cgu_w32(ltq_cgu_r32(LTQ_CGU_IFCCR) & ~(1 << 16),
+ LTQ_CGU_IFCCR);
+ ltq_cgu_w32((1 << 30), LTQ_CGU_PCICR);
+ } else {
+ ltq_cgu_w32(ltq_cgu_r32(LTQ_CGU_IFCCR) | (1 << 16),
+ LTQ_CGU_IFCCR);
+ ltq_cgu_w32((1 << 31) | (1 << 30), LTQ_CGU_PCICR);
+ }
+
+ /* setup pci clock and gpis used by pci */
+ ltq_pci_setup_gpio(conf->gpio);
+
+ /* enable auto-switching between PCI and EBU */
+ ltq_pci_w32(0xa, PCI_CR_CLK_CTRL);
+
+ /* busy, i.e. configuration is not done, PCI access has to be retried */
+ ltq_pci_w32(ltq_pci_r32(PCI_CR_PCI_MOD) & ~(1 << 24), PCI_CR_PCI_MOD);
+ wmb();
+ /* BUS Master/IO/MEM access */
+ ltq_pci_cfg_w32(ltq_pci_cfg_r32(PCI_CS_STS_CMD) | 7, PCI_CS_STS_CMD);
+
+ /* enable external 2 PCI masters */
+ temp_buffer = ltq_pci_r32(PCI_CR_PC_ARB);
+ temp_buffer &= (~(ltq_pci_req_mask << 16));
+ /* enable internal arbiter */
+ temp_buffer |= (1 << INTERNAL_ARB_ENABLE_BIT);
+ /* enable internal PCI master reqest */
+ temp_buffer &= (~(3 << PCI_MASTER0_REQ_MASK_2BITS));
+
+ /* enable EBU request */
+ temp_buffer &= (~(3 << PCI_MASTER1_REQ_MASK_2BITS));
+
+ /* enable all external masters request */
+ temp_buffer &= (~(3 << PCI_MASTER2_REQ_MASK_2BITS));
+ ltq_pci_w32(temp_buffer, PCI_CR_PC_ARB);
+ wmb();
+
+ /* setup BAR memory regions */
+ ltq_pci_w32(0x18000000, PCI_CR_FCI_ADDR_MAP0);
+ ltq_pci_w32(0x18400000, PCI_CR_FCI_ADDR_MAP1);
+ ltq_pci_w32(0x18800000, PCI_CR_FCI_ADDR_MAP2);
+ ltq_pci_w32(0x18c00000, PCI_CR_FCI_ADDR_MAP3);
+ ltq_pci_w32(0x19000000, PCI_CR_FCI_ADDR_MAP4);
+ ltq_pci_w32(0x19400000, PCI_CR_FCI_ADDR_MAP5);
+ ltq_pci_w32(0x19800000, PCI_CR_FCI_ADDR_MAP6);
+ ltq_pci_w32(0x19c00000, PCI_CR_FCI_ADDR_MAP7);
+ ltq_pci_w32(0x1ae00000, PCI_CR_FCI_ADDR_MAP11hg);
+ ltq_pci_w32(ltq_calc_bar11mask(), PCI_CR_BAR11MASK);
+ ltq_pci_w32(0, PCI_CR_PCI_ADDR_MAP11);
+ ltq_pci_w32(0, PCI_CS_BASE_ADDR1);
+ /* both TX and RX endian swap are enabled */
+ ltq_pci_w32(ltq_pci_r32(PCI_CR_PCI_EOI) | 3, PCI_CR_PCI_EOI);
+ wmb();
+ ltq_pci_w32(ltq_pci_r32(PCI_CR_BAR12MASK) | 0x80000000,
+ PCI_CR_BAR12MASK);
+ ltq_pci_w32(ltq_pci_r32(PCI_CR_BAR13MASK) | 0x80000000,
+ PCI_CR_BAR13MASK);
+ /*use 8 dw burst length */
+ ltq_pci_w32(0x303, PCI_CR_FCI_BURST_LENGTH);
+ ltq_pci_w32(ltq_pci_r32(PCI_CR_PCI_MOD) | (1 << 24), PCI_CR_PCI_MOD);
+ wmb();
+
+ /* setup irq line */
+ ltq_ebu_w32(ltq_ebu_r32(LTQ_EBU_PCC_CON) | 0xc, LTQ_EBU_PCC_CON);
+ ltq_ebu_w32(ltq_ebu_r32(LTQ_EBU_PCC_IEN) | 0x10, LTQ_EBU_PCC_IEN);
+
+ /* toggle reset pin */
+ __gpio_set_value(21, 0);
+ wmb();
+ mdelay(1);
+ __gpio_set_value(21, 1);
+ return 0;
+}
+
+int __init pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+ if (ltq_pci_irq_map[slot])
+ return ltq_pci_irq_map[slot];
+ printk(KERN_ERR "lq_pci: trying to map irq for unknown slot %d\n",
+ slot);
+
+ return 0;
+}
+
+static int __devinit ltq_pci_probe(struct platform_device *pdev)
+{
+ struct ltq_pci_data *ltq_pci_data =
+ (struct ltq_pci_data *) pdev->dev.platform_data;
+ pci_probe_only = 0;
+ ltq_pci_irq_map = ltq_pci_data->irq;
+ ltq_pci_membase = ioremap_nocache(PCI_CR_BASE_ADDR, PCI_CR_SIZE);
+ ltq_pci_mapped_cfg =
+ ioremap_nocache(LTQ_PCI_CFG_BASE, LTQ_PCI_CFG_BASE);
+ ltq_pci_controller.io_map_base =
+ (unsigned long)ioremap(LTQ_PCI_IO_BASE, LTQ_PCI_IO_SIZE - 1);
+ ltq_pci_startup(ltq_pci_data);
+ register_pci_controller(<q_pci_controller);
+
+ return 0;
+}
+
+static struct platform_driver
+ltq_pci_driver = {
+ .probe = ltq_pci_probe,
+ .driver = {
+ .name = "ltq_pci",
+ .owner = THIS_MODULE,
+ },
+};
+
+int __init pcibios_init(void)
+{
+ int ret = platform_driver_register(<q_pci_driver);
+ if (ret)
+ printk(KERN_INFO "ltq_pci: Error registering platfom driver!");
+ return ret;
+}
+
+arch_initcall(pcibios_init);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#ifndef _LTQ_PCI_H__
+#define _LTQ_PCI_H__
+
+extern __iomem void *ltq_pci_mapped_cfg;
+extern int ltq_pci_read_config_dword(struct pci_bus *bus,
+ unsigned int devfn, int where, int size, u32 *val);
+extern int ltq_pci_write_config_dword(struct pci_bus *bus,
+ unsigned int devfn, int where, int size, u32 val);
+
+#endif
--- /dev/null
+/*
+ * Copyright 2003-2011 NetLogic Microsystems, Inc. (NetLogic). All rights
+ * reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the NetLogic
+ * license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NETLOGIC ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL NETLOGIC OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/console.h>
+
+#include <asm/io.h>
+
+#include <asm/netlogic/interrupt.h>
+#include <asm/netlogic/xlr/iomap.h>
+#include <asm/netlogic/xlr/pic.h>
+#include <asm/netlogic/xlr/xlr.h>
+
+static void *pci_config_base;
+
+#define pci_cfg_addr(bus, devfn, off) (((bus) << 16) | ((devfn) << 8) | (off))
+
+/* PCI ops */
+static inline u32 pci_cfg_read_32bit(struct pci_bus *bus, unsigned int devfn,
+ int where)
+{
+ u32 data;
+ u32 *cfgaddr;
+
+ cfgaddr = (u32 *)(pci_config_base +
+ pci_cfg_addr(bus->number, devfn, where & ~3));
+ data = *cfgaddr;
+ return cpu_to_le32(data);
+}
+
+static inline void pci_cfg_write_32bit(struct pci_bus *bus, unsigned int devfn,
+ int where, u32 data)
+{
+ u32 *cfgaddr;
+
+ cfgaddr = (u32 *)(pci_config_base +
+ pci_cfg_addr(bus->number, devfn, where & ~3));
+ *cfgaddr = cpu_to_le32(data);
+}
+
+static int nlm_pcibios_read(struct pci_bus *bus, unsigned int devfn,
+ int where, int size, u32 *val)
+{
+ u32 data;
+
+ if ((size == 2) && (where & 1))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+ else if ((size == 4) && (where & 3))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+
+ data = pci_cfg_read_32bit(bus, devfn, where);
+
+ if (size == 1)
+ *val = (data >> ((where & 3) << 3)) & 0xff;
+ else if (size == 2)
+ *val = (data >> ((where & 3) << 3)) & 0xffff;
+ else
+ *val = data;
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+
+static int nlm_pcibios_write(struct pci_bus *bus, unsigned int devfn,
+ int where, int size, u32 val)
+{
+ u32 data;
+
+ if ((size == 2) && (where & 1))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+ else if ((size == 4) && (where & 3))
+ return PCIBIOS_BAD_REGISTER_NUMBER;
+
+ data = pci_cfg_read_32bit(bus, devfn, where);
+
+ if (size == 1)
+ data = (data & ~(0xff << ((where & 3) << 3))) |
+ (val << ((where & 3) << 3));
+ else if (size == 2)
+ data = (data & ~(0xffff << ((where & 3) << 3))) |
+ (val << ((where & 3) << 3));
+ else
+ data = val;
+
+ pci_cfg_write_32bit(bus, devfn, where, data);
+
+ return PCIBIOS_SUCCESSFUL;
+}
+
+struct pci_ops nlm_pci_ops = {
+ .read = nlm_pcibios_read,
+ .write = nlm_pcibios_write
+};
+
+static struct resource nlm_pci_mem_resource = {
+ .name = "XLR PCI MEM",
+ .start = 0xd0000000UL, /* 256MB PCI mem @ 0xd000_0000 */
+ .end = 0xdfffffffUL,
+ .flags = IORESOURCE_MEM,
+};
+
+static struct resource nlm_pci_io_resource = {
+ .name = "XLR IO MEM",
+ .start = 0x10000000UL, /* 16MB PCI IO @ 0x1000_0000 */
+ .end = 0x100fffffUL,
+ .flags = IORESOURCE_IO,
+};
+
+struct pci_controller nlm_pci_controller = {
+ .index = 0,
+ .pci_ops = &nlm_pci_ops,
+ .mem_resource = &nlm_pci_mem_resource,
+ .mem_offset = 0x00000000UL,
+ .io_resource = &nlm_pci_io_resource,
+ .io_offset = 0x00000000UL,
+};
+
+int __init pcibios_map_irq(const struct pci_dev *dev, u8 slot, u8 pin)
+{
+ if (!nlm_chip_is_xls())
+ return PIC_PCIX_IRQ; /* for XLR just one IRQ*/
+
+ /*
+ * For XLS PCIe, there is an IRQ per Link, find out which
+ * link the device is on to assign interrupts
+ */
+ if (dev->bus->self == NULL)
+ return 0;
+
+ switch (dev->bus->self->devfn) {
+ case 0x0:
+ return PIC_PCIE_LINK0_IRQ;
+ case 0x8:
+ return PIC_PCIE_LINK1_IRQ;
+ case 0x10:
+ if (nlm_chip_is_xls_b())
+ return PIC_PCIE_XLSB0_LINK2_IRQ;
+ else
+ return PIC_PCIE_LINK2_IRQ;
+ case 0x18:
+ if (nlm_chip_is_xls_b())
+ return PIC_PCIE_XLSB0_LINK3_IRQ;
+ else
+ return PIC_PCIE_LINK3_IRQ;
+ }
+ WARN(1, "Unexpected devfn %d\n", dev->bus->self->devfn);
+ return 0;
+}
+
+/* Do platform specific device initialization at pci_enable_device() time */
+int pcibios_plat_dev_init(struct pci_dev *dev)
+{
+ return 0;
+}
+
+static int __init pcibios_init(void)
+{
+ /* PSB assigns PCI resources */
+ pci_probe_only = 1;
+ pci_config_base = ioremap(DEFAULT_PCI_CONFIG_BASE, 16 << 20);
+
+ /* Extend IO port for memory mapped io */
+ ioport_resource.start = 0;
+ ioport_resource.end = ~0;
+
+ set_io_port_base(CKSEG1);
+ nlm_pci_controller.io_map_base = CKSEG1;
+
+ pr_info("Registering XLR/XLS PCIX/PCIE Controller.\n");
+ register_pci_controller(&nlm_pci_controller);
+
+ return 0;
+}
+
+arch_initcall(pcibios_init);
+
+struct pci_fixup pcibios_fixups[] = {
+ {0}
+};
static struct irq_chip msp_per_irq_controller = {
.name = "MSP_PER",
- .irq_enable = unmask_per_irq.
+ .irq_enable = unmask_per_irq,
.irq_disable = mask_per_irq,
.irq_ack = msp_per_irq_ack,
#ifdef CONFIG_SMP
if (status & 0x2)
smp_call_function_interrupt();
+ if (status & 0x4)
+ scheduler_ipi();
break;
case 1:
if (status & 0x2)
smp_call_function_interrupt();
+ if (status & 0x4)
+ scheduler_ipi();
break;
}
}
0:
PTR_L t1, PBE_ADDRESS(t0) /* source */
PTR_L t2, PBE_ORIG_ADDRESS(t0) /* destination */
- PTR_ADDIU t3, t1, PAGE_SIZE
+ PTR_ADDU t3, t1, PAGE_SIZE
1:
REG_L t8, (t1)
REG_S t8, (t2)
struct resource *r;
r = rb532_gpio_reg0_res;
- rb532_gpio_chip->regbase = ioremap_nocache(r->start, r->end - r->start);
+ rb532_gpio_chip->regbase = ioremap_nocache(r->start, resource_size(r));
if (!rb532_gpio_chip->regbase) {
printk(KERN_ERR "rb532: cannot remap GPIO register 0\n");
*/
static int __init sgiseeq_devinit(void)
{
- unsigned int tmp;
+ unsigned int pbdma __maybe_unused;
int res, i;
eth0_pd.hpc = hpc3c0;
/* Second HPC is missing? */
if (ip22_is_fullhouse() ||
- get_dbe(tmp, (unsigned int *)&hpc3c1->pbdma[1]))
+ get_dbe(pbdma, (unsigned int *)&hpc3c1->pbdma[1]))
return 0;
sgimc->giopar |= SGIMC_GIOPAR_MASTEREXP1 | SGIMC_GIOPAR_EXP164 |
static unsigned long dosample(void)
{
u32 ct0, ct1;
- u8 msb, lsb;
+ u8 msb;
/* Start the counter. */
sgint->tcword = (SGINT_TCWORD_CNT2 | SGINT_TCWORD_CALL |
/* Latch and spin until top byte of counter2 is zero */
do {
writeb(SGINT_TCWORD_CNT2 | SGINT_TCWORD_CLAT, &sgint->tcword);
- lsb = readb(&sgint->tcnt2);
+ (void) readb(&sgint->tcnt2);
msb = readb(&sgint->tcnt2);
ct1 = read_c0_count();
} while (msb);
unsigned long xtalk_addr, size_t size)
{
nasid_t nasid = COMPACT_TO_NASID_NODEID(cnode);
- volatile hubreg_t junk;
unsigned i;
/* use small-window mapping if possible */
* after we write it.
*/
IIO_ITTE_PUT(nasid, i, HUB_PIO_MAP_TO_MEM, widget, xtalk_addr);
- junk = HUB_L(IIO_ITTE_GET(nasid, i));
+ (void) HUB_L(IIO_ITTE_GET(nasid, i));
return NODE_BWIN_BASE(nasid, widget) + (xtalk_addr % BWIN_SIZE);
}
#ifdef CONFIG_SMP
if (pend0 & (1UL << CPU_RESCHED_A_IRQ)) {
LOCAL_HUB_CLR_INTR(CPU_RESCHED_A_IRQ);
+ scheduler_ipi();
} else if (pend0 & (1UL << CPU_RESCHED_B_IRQ)) {
LOCAL_HUB_CLR_INTR(CPU_RESCHED_B_IRQ);
+ scheduler_ipi();
} else if (pend0 & (1UL << CPU_CALL_A_IRQ)) {
LOCAL_HUB_CLR_INTR(CPU_CALL_A_IRQ);
smp_call_function_interrupt();
static __init void set_ktext_source(nasid_t client_nasid, nasid_t server_nasid)
{
- cnodeid_t client_cnode;
kern_vars_t *kvp;
- client_cnode = NASID_TO_COMPACT_NODEID(client_nasid);
-
kvp = &hub_data(client_nasid)->kern_vars;
KERN_VARS_ADDR(client_nasid) = (unsigned long)kvp;
static void rt_set_mode(enum clock_event_mode mode,
struct clock_event_device *evt)
{
- switch (mode) {
- case CLOCK_EVT_MODE_ONESHOT:
- /* The only mode supported */
- break;
-
- case CLOCK_EVT_MODE_PERIODIC:
- case CLOCK_EVT_MODE_UNUSED:
- case CLOCK_EVT_MODE_SHUTDOWN:
- case CLOCK_EVT_MODE_RESUME:
- /* Nothing to do */
- break;
- }
+ /* Nothing to do ... */
}
int rt_timer_irq;
{
struct clocksource *cs = &hub_rt_clocksource;
- clocksource_set_clock(cs, CYCLES_PER_SEC);
- clocksource_register(cs);
+ clocksource_register_hz(cs, CYCLES_PER_SEC);
}
void __init plat_time_init(void)
#include <linux/delay.h>
#include <linux/smp.h>
#include <linux/kernel_stat.h>
+#include <linux/sched.h>
#include <asm/mmu_context.h>
#include <asm/io.h>
/* Clear the mailbox to clear the interrupt */
__raw_writeq(((u64)action)<<48, mailbox_0_clear_regs[cpu]);
- /*
- * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
- * interrupt will do the reschedule for us
- */
+ if (action & SMP_RESCHEDULE_YOURSELF)
+ scheduler_ipi();
if (action & SMP_CALL_FUNCTION)
smp_call_function_interrupt();
#include <linux/interrupt.h>
#include <linux/smp.h>
#include <linux/kernel_stat.h>
+#include <linux/sched.h>
#include <asm/mmu_context.h>
#include <asm/io.h>
/* Clear the mailbox to clear the interrupt */
____raw_writeq(((u64)action) << 48, mailbox_clear_regs[cpu]);
- /*
- * Nothing to do for SMP_RESCHEDULE_YOURSELF; returning from the
- * interrupt will do the reschedule for us
- */
+ if (action & SMP_RESCHEDULE_YOURSELF)
+ scheduler_ipi();
if (action & SMP_CALL_FUNCTION)
smp_call_function_interrupt();
static __init unsigned long dosample(void)
{
u32 ct0, ct1;
- volatile u8 msb, lsb;
+ volatile u8 msb;
/* Start the counter. */
outb_p(0x34, 0x43);
/* Latch and spin until top byte of counter0 is zero */
do {
outb(0x00, 0x43);
- lsb = inb(0x40);
+ (void) inb(0x40);
msb = inb(0x40);
ct1 = read_c0_count();
} while (msb);
* @irq: The interrupt number.
* @dev_id: The device ID.
*
- * We need do nothing here, since the scheduling will be effected on our way
- * back through entry.S.
- *
* Returns IRQ_HANDLED to indicate we handled the interrupt successfully.
*/
static irqreturn_t smp_reschedule_interrupt(int irq, void *dev_id)
{
- /* do nothing */
+ scheduler_ipi();
return IRQ_HANDLED;
}
case IPI_RESCHEDULE:
smp_debug(100, KERN_DEBUG "CPU%d IPI_RESCHEDULE\n", this_cpu);
- /*
- * Reschedule callback. Everything to be
- * done is done by the interrupt return path.
- */
+ scheduler_ipi();
break;
case IPI_CALL_FUNC:
}
memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
- for (i = 0; i < npmem_ranges; i++)
+ for (i = 0; i < npmem_ranges; i++) {
+ node_set_state(i, N_NORMAL_MEMORY);
node_set_online(i);
+ }
#endif
/*
uint fec_addr_low; /* lower 32 bits of station address */
ushort fec_addr_high; /* upper 16 bits of station address */
ushort res1; /* reserved */
- uint fec_hash_table_high; /* upper 32-bits of hash table */
- uint fec_hash_table_low; /* lower 32-bits of hash table */
+ uint fec_grp_hash_table_high; /* upper 32-bits of hash table */
+ uint fec_grp_hash_table_low; /* lower 32-bits of hash table */
uint fec_r_des_start; /* beginning of Rx descriptor ring */
uint fec_x_des_start; /* beginning of Tx descriptor ring */
uint fec_r_buff_size; /* Rx buffer size */
#ifdef __KERNEL__
#include <linux/irq.h>
-#include <linux/sysdev.h>
#include <asm/dcr.h>
#include <asm/msi_bitmap.h>
/* link */
struct mpic *next;
- struct sys_device sysdev;
-
#ifdef CONFIG_PM
struct mpic_irq_save *save_data;
#endif
*
* Obviously, the GART is not cache coherent and so any change to it
* must be flushed to memory (or maybe just make the GART space non
- * cachable). AGP memory itself does't seem to be cache coherent neither.
+ * cachable). AGP memory itself doesn't seem to be cache coherent neither.
*
* In order to invalidate the GART (which is probably necessary to inval
* the bridge internal TLBs), the following sequence has to be written,
if (data && !(data & DABR_TRANSLATION))
return -EIO;
#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ if (ptrace_get_breakpoints(task) < 0)
+ return -ESRCH;
+
bp = thread->ptrace_bps[0];
if ((!data) || !(data & (DABR_DATA_WRITE | DABR_DATA_READ))) {
if (bp) {
unregister_hw_breakpoint(bp);
thread->ptrace_bps[0] = NULL;
}
+ ptrace_put_breakpoints(task);
return 0;
}
if (bp) {
(DABR_DATA_WRITE | DABR_DATA_READ),
&attr.bp_type);
ret = modify_user_hw_breakpoint(bp, &attr);
- if (ret)
+ if (ret) {
+ ptrace_put_breakpoints(task);
return ret;
+ }
thread->ptrace_bps[0] = bp;
+ ptrace_put_breakpoints(task);
thread->dabr = data;
return 0;
}
ptrace_triggered, task);
if (IS_ERR(bp)) {
thread->ptrace_bps[0] = NULL;
+ ptrace_put_breakpoints(task);
return PTR_ERR(bp);
}
+ ptrace_put_breakpoints(task);
+
#endif /* CONFIG_HAVE_HW_BREAKPOINT */
/* Move contents to the DABR register */
generic_smp_call_function_interrupt();
break;
case PPC_MSG_RESCHEDULE:
- /* we notice need_resched on exit */
+ scheduler_ipi();
break;
case PPC_MSG_CALL_FUNC_SINGLE:
generic_smp_call_function_single_interrupt();
static irqreturn_t reschedule_action(int irq, void *data)
{
- /* we just need the return path side effect of checking need_resched */
+ scheduler_ipi();
return IRQ_HANDLED;
}
.end = mpc83xx_suspend_end,
};
+static struct of_device_id pmc_match[];
static int pmc_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct device_node *np = ofdev->dev.of_node;
struct resource res;
struct pmc_type *type;
int ret = 0;
- if (!ofdev->dev.of_match)
+ match = of_match_device(pmc_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
- type = ofdev->dev.of_match->data;
+ type = match->data;
if (!of_device_is_available(np))
return -ENODEV;
#include <linux/io.h>
#include <linux/mutex.h>
#include <linux/linux_logo.h>
+#include <linux/syscore_ops.h>
#include <asm/spu.h>
#include <asm/spu_priv1.h>
#include <asm/spu_csa.h>
}
EXPORT_SYMBOL_GPL(spu_init_channels);
-static int spu_shutdown(struct sys_device *sysdev)
-{
- struct spu *spu = container_of(sysdev, struct spu, sysdev);
-
- spu_free_irqs(spu);
- spu_destroy_spu(spu);
- return 0;
-}
-
static struct sysdev_class spu_sysdev_class = {
.name = "spu",
- .shutdown = spu_shutdown,
};
int spu_add_sysdev_attr(struct sysdev_attribute *attr)
}
#endif
+static void spu_shutdown(void)
+{
+ struct spu *spu;
+
+ mutex_lock(&spu_full_list_mutex);
+ list_for_each_entry(spu, &spu_full_list, full_list) {
+ spu_free_irqs(spu);
+ spu_destroy_spu(spu);
+ }
+ mutex_unlock(&spu_full_list_mutex);
+}
+
+static struct syscore_ops spu_syscore_ops = {
+ .shutdown = spu_shutdown,
+};
+
static int __init init_spu_base(void)
{
int i, ret = 0;
crash_register_spus(&spu_full_list);
mutex_unlock(&spu_full_list_mutex);
spu_add_sysdev_attr(&attr_stat);
+ register_syscore_ops(&spu_syscore_ops);
spu_init_affinity();
#include <linux/signal.h>
#include <linux/pci.h>
#include <linux/interrupt.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/adb.h>
#include <linux/pmu.h>
#include <linux/module.h>
return viaint;
}
-static int pmacpic_suspend(struct sys_device *sysdev, pm_message_t state)
+static int pmacpic_suspend(void)
{
int viaint = pmacpic_find_viaint();
return 0;
}
-static int pmacpic_resume(struct sys_device *sysdev)
+static void pmacpic_resume(void)
{
int i;
for (i = 0; i < max_real_irqs; ++i)
if (test_bit(i, sleep_save_mask))
pmac_unmask_irq(irq_get_irq_data(i));
-
- return 0;
}
-#endif /* CONFIG_PM && CONFIG_PPC32 */
-
-static struct sysdev_class pmacpic_sysclass = {
- .name = "pmac_pic",
+static struct syscore_ops pmacpic_syscore_ops = {
+ .suspend = pmacpic_suspend,
+ .resume = pmacpic_resume,
};
-static struct sys_device device_pmacpic = {
- .id = 0,
- .cls = &pmacpic_sysclass,
-};
-
-static struct sysdev_driver driver_pmacpic = {
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
- .suspend = &pmacpic_suspend,
- .resume = &pmacpic_resume,
-#endif /* CONFIG_PM && CONFIG_PPC32 */
-};
-
-static int __init init_pmacpic_sysfs(void)
+static int __init init_pmacpic_syscore(void)
{
-#ifdef CONFIG_PPC32
- if (max_irqs == 0)
- return -ENODEV;
-#endif
- printk(KERN_DEBUG "Registering pmac pic with sysfs...\n");
- sysdev_class_register(&pmacpic_sysclass);
- sysdev_register(&device_pmacpic);
- sysdev_driver_register(&pmacpic_sysclass, &driver_pmacpic);
+ register_syscore_ops(&pmacpic_syscore_ops);
return 0;
}
-machine_subsys_initcall(powermac, init_pmacpic_sysfs);
+machine_subsys_initcall(powermac, init_pmacpic_syscore);
+
+#endif /* CONFIG_PM && CONFIG_PPC32 */
return 0;
}
+static const struct of_device_id fsl_of_msi_ids[];
static int __devinit fsl_of_msi_probe(struct platform_device *dev)
{
+ const struct of_device_id *match;
struct fsl_msi *msi;
struct resource res;
int err, i, j, irq_index, count;
u32 offset;
static const u32 all_avail[] = { 0, NR_MSI_IRQS };
- if (!dev->dev.of_match)
+ match = of_match_device(fsl_of_msi_ids, &dev->dev);
+ if (!match)
return -EINVAL;
- features = dev->dev.of_match->data;
+ features = match->data;
printk(KERN_DEBUG "Setting up Freescale MSI support\n");
#include <linux/stddef.h>
#include <linux/sched.h>
#include <linux/signal.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/device.h>
#include <linux/bootmem.h>
#include <linux/spinlock.h>
u32 sercr;
} ipic_saved_state;
-static int ipic_suspend(struct sys_device *sdev, pm_message_t state)
+static int ipic_suspend(void)
{
struct ipic *ipic = primary_ipic;
return 0;
}
-static int ipic_resume(struct sys_device *sdev)
+static void ipic_resume(void)
{
struct ipic *ipic = primary_ipic;
ipic_write(ipic->regs, IPIC_SECNR, ipic_saved_state.secnr);
ipic_write(ipic->regs, IPIC_SERMR, ipic_saved_state.sermr);
ipic_write(ipic->regs, IPIC_SERCR, ipic_saved_state.sercr);
-
- return 0;
}
#else
#define ipic_suspend NULL
#define ipic_resume NULL
#endif
-static struct sysdev_class ipic_sysclass = {
- .name = "ipic",
+static struct syscore_ops ipic_syscore_ops = {
.suspend = ipic_suspend,
.resume = ipic_resume,
};
-static struct sys_device device_ipic = {
- .id = 0,
- .cls = &ipic_sysclass,
-};
-
-static int __init init_ipic_sysfs(void)
+static int __init init_ipic_syscore(void)
{
- int rc;
-
if (!primary_ipic || !primary_ipic->regs)
return -ENODEV;
- printk(KERN_DEBUG "Registering ipic with sysfs...\n");
- rc = sysdev_class_register(&ipic_sysclass);
- if (rc) {
- printk(KERN_ERR "Failed registering ipic sys class\n");
- return -ENODEV;
- }
- rc = sysdev_register(&device_ipic);
- if (rc) {
- printk(KERN_ERR "Failed registering ipic sys device\n");
- return -ENODEV;
- }
+ printk(KERN_DEBUG "Registering ipic system core operations\n");
+ register_syscore_ops(&ipic_syscore_ops);
+
return 0;
}
-subsys_initcall(init_ipic_sysfs);
+subsys_initcall(init_ipic_syscore);
#include <linux/spinlock.h>
#include <linux/pci.h>
#include <linux/slab.h>
+#include <linux/syscore_ops.h>
#include <asm/ptrace.h>
#include <asm/signal.h>
#endif /* CONFIG_SMP */
#ifdef CONFIG_PM
-static int mpic_suspend(struct sys_device *dev, pm_message_t state)
+static void mpic_suspend_one(struct mpic *mpic)
{
- struct mpic *mpic = container_of(dev, struct mpic, sysdev);
int i;
for (i = 0; i < mpic->num_sources; i++) {
mpic->save_data[i].dest =
mpic_irq_read(i, MPIC_INFO(IRQ_DESTINATION));
}
+}
+
+static int mpic_suspend(void)
+{
+ struct mpic *mpic = mpics;
+
+ while (mpic) {
+ mpic_suspend_one(mpic);
+ mpic = mpic->next;
+ }
return 0;
}
-static int mpic_resume(struct sys_device *dev)
+static void mpic_resume_one(struct mpic *mpic)
{
- struct mpic *mpic = container_of(dev, struct mpic, sysdev);
int i;
for (i = 0; i < mpic->num_sources; i++) {
}
#endif
} /* end for loop */
+}
- return 0;
+static void mpic_resume(void)
+{
+ struct mpic *mpic = mpics;
+
+ while (mpic) {
+ mpic_resume_one(mpic);
+ mpic = mpic->next;
+ }
}
-#endif
-static struct sysdev_class mpic_sysclass = {
-#ifdef CONFIG_PM
+static struct syscore_ops mpic_syscore_ops = {
.resume = mpic_resume,
.suspend = mpic_suspend,
-#endif
- .name = "mpic",
};
static int mpic_init_sys(void)
{
- struct mpic *mpic = mpics;
- int error, id = 0;
-
- error = sysdev_class_register(&mpic_sysclass);
-
- while (mpic && !error) {
- mpic->sysdev.cls = &mpic_sysclass;
- mpic->sysdev.id = id++;
- error = sysdev_register(&mpic->sysdev);
- mpic = mpic->next;
- }
- return error;
+ register_syscore_ops(&mpic_syscore_ops);
+ return 0;
}
device_initcall(mpic_init_sys);
+#endif
select HAVE_KERNEL_XZ
select HAVE_GET_USER_PAGES_FAST
select HAVE_ARCH_MUTEX_CPU_RELAX
+ select HAVE_ARCH_JUMP_LABEL if !MARCH_G5
select ARCH_INLINE_SPIN_TRYLOCK
select ARCH_INLINE_SPIN_TRYLOCK_BH
select ARCH_INLINE_SPIN_LOCK
/* Add the entropy */
while (nbytes >= 8) {
- *((__u64 *)parm_block) ^= *((__u64 *)buf+i*8);
+ *((__u64 *)parm_block) ^= *((__u64 *)(buf+i));
prng_add_entropy();
i += 8;
nbytes -= 8;
int set_memory_ro(unsigned long addr, int numpages);
int set_memory_rw(unsigned long addr, int numpages);
int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_x(unsigned long addr, int numpages);
#endif /* _S390_CACHEFLUSH_H */
#define _ASM_S390_DIAG_H
/*
- * Diagnose 10: Release pages
+ * Diagnose 10: Release page range
*/
-extern void diag10(unsigned long addr);
+static inline void diag10_range(unsigned long start_pfn, unsigned long num_pfn)
+{
+ unsigned long start_addr, end_addr;
+
+ start_addr = start_pfn << PAGE_SHIFT;
+ end_addr = (start_pfn + num_pfn - 1) << PAGE_SHIFT;
+
+ asm volatile(
+ "0: diag %0,%1,0x10\n"
+ "1:\n"
+ EX_TABLE(0b, 1b)
+ EX_TABLE(1b, 1b)
+ : : "a" (start_addr), "a" (end_addr));
+}
/*
* Diagnose 14: Input spool file manipulation
#ifdef CONFIG_64BIT
#define MCOUNT_INSN_SIZE 12
-#define MCOUNT_OFFSET 8
#else
#define MCOUNT_INSN_SIZE 20
-#define MCOUNT_OFFSET 4
#endif
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
- return addr - MCOUNT_OFFSET;
+ return addr;
}
#endif /* __ASSEMBLY__ */
--- /dev/null
+#ifndef _ASM_S390_JUMP_LABEL_H
+#define _ASM_S390_JUMP_LABEL_H
+
+#include <linux/types.h>
+
+#define JUMP_LABEL_NOP_SIZE 6
+
+#ifdef CONFIG_64BIT
+#define ASM_PTR ".quad"
+#define ASM_ALIGN ".balign 8"
+#else
+#define ASM_PTR ".long"
+#define ASM_ALIGN ".balign 4"
+#endif
+
+static __always_inline bool arch_static_branch(struct jump_label_key *key)
+{
+ asm goto("0: brcl 0,0\n"
+ ".pushsection __jump_table, \"aw\"\n"
+ ASM_ALIGN "\n"
+ ASM_PTR " 0b, %l[label], %0\n"
+ ".popsection\n"
+ : : "X" (key) : : label);
+ return false;
+label:
+ return true;
+}
+
+typedef unsigned long jump_label_t;
+
+struct jump_entry {
+ jump_label_t code;
+ jump_label_t target;
+ jump_label_t key;
+};
+
+#endif
#ifdef CONFIG_64BIT
mm->context.asce_bits |= _ASCE_TYPE_REGION3;
#endif
- if (current->mm->context.alloc_pgste) {
+ if (current->mm && current->mm->context.alloc_pgste) {
/*
* alloc_pgste indicates, that any NEW context will be created
* with extended page tables. The old context is unchanged. The
obj-y := bitmap.o traps.o time.o process.o base.o early.o setup.o \
processor.o sys_s390.o ptrace.o signal.o cpcmd.o ebcdic.o \
s390_ext.o debug.o irq.o ipl.o dis.o diag.o mem_detect.o \
- vdso.o vtime.o sysinfo.o nmi.o sclp.o
+ vdso.o vtime.o sysinfo.o nmi.o sclp.o jump_label.o
obj-y += $(if $(CONFIG_64BIT),entry64.o,entry.o)
obj-y += $(if $(CONFIG_64BIT),reipl64.o,reipl.o)
#include <linux/module.h>
#include <asm/diag.h>
-/*
- * Diagnose 10: Release pages
- */
-void diag10(unsigned long addr)
-{
- if (addr >= 0x7ff00000)
- return;
- asm volatile(
-#ifdef CONFIG_64BIT
- " sam31\n"
- " diag %0,%0,0x10\n"
- "0: sam64\n"
-#else
- " diag %0,%0,0x10\n"
- "0:\n"
-#endif
- EX_TABLE(0b, 0b)
- : : "a" (addr));
-}
-EXPORT_SYMBOL(diag10);
-
/*
* Diagnose 14: Input spool file manipulation
*/
{ "rp", 0x77, INSTR_S_RD },
{ "stcke", 0x78, INSTR_S_RD },
{ "sacf", 0x79, INSTR_S_RD },
+ { "spp", 0x80, INSTR_S_RD },
{ "stsi", 0x7d, INSTR_S_RD },
{ "srnm", 0x99, INSTR_S_RD },
{ "stfpc", 0x9c, INSTR_S_RD },
stosm __SF_EMPTY(%r15),0x04 # now we can turn dat on
basr %r14,0
l %r14,restart_addr-.(%r14)
- br %r14 # branch to start_secondary
+ basr %r14,%r14 # branch to start_secondary
restart_addr:
.long start_secondary
.align 8
mvc __LC_SYSTEM_TIMER(8),__TI_system_timer(%r1)
xc __LC_STEAL_TIMER(8),__LC_STEAL_TIMER
stosm __SF_EMPTY(%r15),0x04 # now we can turn dat on
- jg start_secondary
+ brasl %r14,start_secondary
.align 8
restart_vtime:
.long 0x7fffffff,0xffffffff
--- /dev/null
+/*
+ * Jump label s390 support
+ *
+ * Copyright IBM Corp. 2011
+ * Author(s): Jan Glauber <jang@linux.vnet.ibm.com>
+ */
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/stop_machine.h>
+#include <linux/jump_label.h>
+#include <asm/ipl.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+struct insn {
+ u16 opcode;
+ s32 offset;
+} __packed;
+
+struct insn_args {
+ unsigned long *target;
+ struct insn *insn;
+ ssize_t size;
+};
+
+static int __arch_jump_label_transform(void *data)
+{
+ struct insn_args *args = data;
+ int rc;
+
+ rc = probe_kernel_write(args->target, args->insn, args->size);
+ WARN_ON_ONCE(rc < 0);
+ return 0;
+}
+
+void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type)
+{
+ struct insn_args args;
+ struct insn insn;
+
+ if (type == JUMP_LABEL_ENABLE) {
+ /* brcl 15,offset */
+ insn.opcode = 0xc0f4;
+ insn.offset = (entry->target - entry->code) >> 1;
+ } else {
+ /* brcl 0,0 */
+ insn.opcode = 0xc004;
+ insn.offset = 0;
+ }
+
+ args.target = (void *) entry->code;
+ args.insn = &insn;
+ args.size = JUMP_LABEL_NOP_SIZE;
+
+ stop_machine(__arch_jump_label_transform, &args, NULL);
+}
+
+#endif
kstat_cpu(smp_processor_id()).irqs[EXTINT_IPI]++;
/*
* handle bit signal external calls
- *
- * For the ec_schedule signal we have to do nothing. All the work
- * is done automatically when we return from the interrupt.
*/
bits = xchg(&S390_lowcore.ext_call_fast, 0);
+ if (test_bit(ec_schedule, &bits))
+ scheduler_ipi();
+
if (test_bit(ec_call_function, &bits))
generic_smp_call_function_interrupt();
tm __TI_flags+7(%r2),_TIF_EXIT_SIE
jz 0f
larl %r2,sie_exit # work pending, leave sie
- stg %r2,__LC_RETURN_PSW+8
+ stg %r2,SPI_PSW+8(0,%r15)
br %r14
0: larl %r2,sie_reenter # re-enter with guest id
- stg %r2,__LC_RETURN_PSW+8
+ stg %r2,SPI_PSW+8(0,%r15)
1: br %r14
/*
} else
free_page((unsigned long) npa);
}
- diag10(addr);
+ diag10_range(addr >> PAGE_SHIFT, 1);
pa->pages[pa->index++] = addr;
(*counter)++;
spin_unlock(&cmm_lock);
struct task_struct *tsk;
__u16 subcode;
- kstat_cpu(smp_processor_id()).irqs[EXTINT_PFL]++;
/*
* Get the external interruption subcode & pfault
* initial/completion signal bit. VM stores this
subcode = ext_int_code >> 16;
if ((subcode & 0xff00) != __SUBCODE_MASK)
return;
+ kstat_cpu(smp_processor_id()).irqs[EXTINT_PFL]++;
/*
* Get the token (= address of the task structure of the affected task).
*/
#ifdef CONFIG_64BIT
- tsk = *(struct task_struct **) param64;
+ tsk = (struct task_struct *) param64;
#else
- tsk = *(struct task_struct **) param32;
+ tsk = (struct task_struct *) param32;
#endif
if (subcode & 0x0080) {
WARN_ON_ONCE(1);
continue;
}
- ptep = pte_offset_kernel(pmdp, addr + i * PAGE_SIZE);
+ ptep = pte_offset_kernel(pmdp, addr);
pte = *ptep;
pte = set(pte);
- ptep_invalidate(&init_mm, addr + i * PAGE_SIZE, ptep);
+ ptep_invalidate(&init_mm, addr, ptep);
*ptep = pte;
+ addr += PAGE_SIZE;
}
}
return 0;
}
EXPORT_SYMBOL_GPL(set_memory_nx);
+
+int set_memory_x(unsigned long addr, int numpages)
+{
+ return 0;
+}
return rc;
}
-long hwsampler_query_min_interval(void)
+unsigned long hwsampler_query_min_interval(void)
{
- if (min_sampler_rate)
- return min_sampler_rate;
- else
- return -EINVAL;
+ return min_sampler_rate;
}
-long hwsampler_query_max_interval(void)
+unsigned long hwsampler_query_max_interval(void)
{
- if (max_sampler_rate)
- return max_sampler_rate;
- else
- return -EINVAL;
+ return max_sampler_rate;
}
unsigned long hwsampler_get_sample_overflow_count(unsigned int cpu)
int hwsampler_shutdown(void);
int hwsampler_allocate(unsigned long sdbt, unsigned long sdb);
int hwsampler_deallocate(void);
-long hwsampler_query_min_interval(void);
-long hwsampler_query_max_interval(void);
+unsigned long hwsampler_query_min_interval(void);
+unsigned long hwsampler_query_max_interval(void);
int hwsampler_start_all(unsigned long interval);
int hwsampler_stop_all(void);
int hwsampler_deactivate(unsigned int cpu);
* create hwsampler files only if hwsampler_setup() succeeds.
*/
oprofile_min_interval = hwsampler_query_min_interval();
- if (oprofile_min_interval < 0) {
- oprofile_min_interval = 0;
+ if (oprofile_min_interval == 0)
return -ENODEV;
- }
oprofile_max_interval = hwsampler_query_max_interval();
- if (oprofile_max_interval < 0) {
- oprofile_max_interval = 0;
+ if (oprofile_max_interval == 0)
return -ENODEV;
- }
if (oprofile_timer_init(ops))
return -ENODEV;
select RTC_LIB
select GENERIC_ATOMIC64
select GENERIC_IRQ_SHOW
- select ARCH_NO_SYSDEV_OPS
help
The SuperH is a RISC processor targeted for use in embedded systems
and consumer electronics; it was also used in the Sega Dreamcast
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
-CONFIG_PM_VERBOSE=y
CONFIG_PM_RUNTIME=y
CONFIG_CPU_IDLE=y
CONFIG_NET=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
-CONFIG_PM_VERBOSE=y
CONFIG_PM_RUNTIME=y
CONFIG_CPU_IDLE=y
CONFIG_NET=y
queue_work(pm_wq, &hwblk_work);
}
-int platform_pm_runtime_suspend(struct device *dev)
+static int default_platform_runtime_suspend(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
struct pdev_archdata *ad = &pdev->archdata;
int hwblk = ad->hwblk_id;
int ret = 0;
- dev_dbg(dev, "platform_pm_runtime_suspend() [%d]\n", hwblk);
+ dev_dbg(dev, "%s() [%d]\n", __func__, hwblk);
/* ignore off-chip platform devices */
if (!hwblk)
mutex_unlock(&ad->mutex);
out:
- dev_dbg(dev, "platform_pm_runtime_suspend() [%d] returns %d\n",
- hwblk, ret);
+ dev_dbg(dev, "%s() [%d] returns %d\n",
+ __func__, hwblk, ret);
return ret;
}
-int platform_pm_runtime_resume(struct device *dev)
+static int default_platform_runtime_resume(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
struct pdev_archdata *ad = &pdev->archdata;
int hwblk = ad->hwblk_id;
int ret = 0;
- dev_dbg(dev, "platform_pm_runtime_resume() [%d]\n", hwblk);
+ dev_dbg(dev, "%s() [%d]\n", __func__, hwblk);
/* ignore off-chip platform devices */
if (!hwblk)
*/
mutex_unlock(&ad->mutex);
out:
- dev_dbg(dev, "platform_pm_runtime_resume() [%d] returns %d\n",
- hwblk, ret);
+ dev_dbg(dev, "%s() [%d] returns %d\n",
+ __func__, hwblk, ret);
return ret;
}
-int platform_pm_runtime_idle(struct device *dev)
+static int default_platform_runtime_idle(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
int hwblk = pdev->archdata.hwblk_id;
int ret = 0;
- dev_dbg(dev, "platform_pm_runtime_idle() [%d]\n", hwblk);
+ dev_dbg(dev, "%s() [%d]\n", __func__, hwblk);
/* ignore off-chip platform devices */
if (!hwblk)
/* suspend synchronously to disable clocks immediately */
ret = pm_runtime_suspend(dev);
out:
- dev_dbg(dev, "platform_pm_runtime_idle() [%d] done!\n", hwblk);
+ dev_dbg(dev, "%s() [%d] done!\n", __func__, hwblk);
return ret;
}
+static struct dev_power_domain default_power_domain = {
+ .ops = {
+ .runtime_suspend = default_platform_runtime_suspend,
+ .runtime_resume = default_platform_runtime_resume,
+ .runtime_idle = default_platform_runtime_idle,
+ USE_PLATFORM_PM_SLEEP_OPS
+ },
+};
+
static int platform_bus_notify(struct notifier_block *nb,
unsigned long action, void *data)
{
hwblk_disable(hwblk_info, hwblk);
/* make sure driver re-inits itself once */
__set_bit(PDEV_ARCHDATA_FLAG_INIT, &pdev->archdata.flags);
+ dev->pwr_domain = &default_power_domain;
break;
/* TODO: add BUS_NOTIFY_BIND_DRIVER and increase idle count */
case BUS_NOTIFY_BOUND_DRIVER:
__set_bit(PDEV_ARCHDATA_FLAG_INIT, &pdev->archdata.flags);
break;
case BUS_NOTIFY_DEL_DEVICE:
+ dev->pwr_domain = NULL;
break;
}
return 0;
set_tsk_thread_flag(child, TIF_SINGLESTEP);
+ if (ptrace_get_breakpoints(child) < 0)
+ return;
+
set_single_step(child, pc);
+ ptrace_put_breakpoints(child);
}
void user_disable_single_step(struct task_struct *child)
#include <linux/module.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
+#include <linux/sched.h>
#include <asm/atomic.h>
#include <asm/processor.h>
#include <asm/system.h>
generic_smp_call_function_interrupt();
break;
case SMP_MSG_RESCHEDULE:
+ scheduler_ipi();
break;
case SMP_MSG_FUNCTION_SINGLE:
generic_smp_call_function_single_interrupt();
#define JUMP_LABEL_NOP_SIZE 4
-#define JUMP_LABEL(key, label) \
- do { \
- asm goto("1:\n\t" \
- "nop\n\t" \
- "nop\n\t" \
- ".pushsection __jump_table, \"a\"\n\t"\
- ".align 4\n\t" \
- ".word 1b, %l[" #label "], %c0\n\t" \
- ".popsection \n\t" \
- : : "i" (key) : : label);\
- } while (0)
+static __always_inline bool arch_static_branch(struct jump_label_key *key)
+{
+ asm goto("1:\n\t"
+ "nop\n\t"
+ "nop\n\t"
+ ".pushsection __jump_table, \"aw\"\n\t"
+ ".align 4\n\t"
+ ".word 1b, %l[l_yes], %c0\n\t"
+ ".popsection \n\t"
+ : : "i" (key) : : l_yes);
+ return false;
+l_yes:
+ return true;
+}
#endif /* __KERNEL__ */
#define smt_capable() (sparc64_multi_core)
#endif /* CONFIG_SMP */
-#define cpu_coregroup_mask(cpu) (&cpu_core_map[cpu])
+extern cpumask_t cpu_core_map[NR_CPUS];
+static inline const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+ return &cpu_core_map[cpu];
+}
#endif /* _ASM_SPARC64_TOPOLOGY_H */
return 0;
}
-static struct of_device_id __initdata apc_match[] = {
+static struct of_device_id apc_match[] = {
{
.name = APC_OBPNAME,
},
sabre_scan_bus(pbm, &op->dev);
}
+static const struct of_device_id sabre_match[];
static int __devinit sabre_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
const struct linux_prom64_registers *pr_regs;
struct device_node *dp = op->dev.of_node;
struct pci_pbm_info *pbm;
const u32 *vdma;
u64 clear_irq;
- hummingbird_p = op->dev.of_match && (op->dev.of_match->data != NULL);
+ match = of_match_device(sabre_match, &op->dev);
+ hummingbird_p = match && (match->data != NULL);
if (!hummingbird_p) {
struct device_node *cpu_dp;
return err;
}
+static const struct of_device_id schizo_match[];
static int __devinit schizo_probe(struct platform_device *op)
{
- if (!op->dev.of_match)
+ const struct of_device_id *match;
+
+ match = of_match_device(schizo_match, &op->dev);
+ if (!match)
return -EINVAL;
- return __schizo_init(op, (unsigned long) op->dev.of_match->data);
+ return __schizo_init(op, (unsigned long)match->data);
}
/* The ordering of this table is very important. Some Tomatillo
return 0;
}
-static struct of_device_id __initdata pmc_match[] = {
+static struct of_device_id pmc_match[] = {
{
.name = PMC_OBPNAME,
},
void __cpuinit smp_store_cpu_info(int id)
{
int cpu_node;
+ int mid;
cpu_data(id).udelay_val = loops_per_jiffy;
cpu_data(id).clock_tick = prom_getintdefault(cpu_node,
"clock-frequency", 0);
cpu_data(id).prom_node = cpu_node;
- cpu_data(id).mid = cpu_get_hwmid(cpu_node);
+ mid = cpu_get_hwmid(cpu_node);
- if (cpu_data(id).mid < 0)
- panic("No MID found for CPU%d at node 0x%08d", id, cpu_node);
+ if (mid < 0) {
+ printk(KERN_NOTICE "No MID found for CPU%d at node 0x%08d", id, cpu_node);
+ mid = 0;
+ }
+ cpu_data(id).mid = mid;
}
void __init smp_cpus_done(unsigned int max_cpus)
void smp_send_reschedule(int cpu)
{
- /* See sparc64 */
+ /*
+ * XXX missing reschedule IPI, see scheduler_ipi()
+ */
}
void smp_send_stop(void)
void __irq_entry smp_receive_signal_client(int irq, struct pt_regs *regs)
{
clear_softint(1 << irq);
+ scheduler_ipi();
}
/* This is a nop because we capture all other cpus
return 0;
}
-static struct of_device_id __initdata clock_match[] = {
+static struct of_device_id clock_match[] = {
{
.name = "eeprom",
},
/* Also, handle the alignment code out of band. */
cc_dword_align:
- cmp %g1, 6
- bl,a ccte
+ cmp %g1, 16
+ bge 1f
+ srl %g1, 1, %o3
+2: cmp %o3, 0
+ be,a ccte
andcc %g1, 0xf, %o3
- andcc %o0, 0x1, %g0
+ andcc %o3, %o0, %g0 ! Check %o0 only (%o1 has the same last 2 bits)
+ be,a 2b
+ srl %o3, 1, %o3
+1: andcc %o0, 0x1, %g0
bne ccslow
andcc %o0, 0x2, %g0
be 1f
/* Called when smp_send_reschedule() triggers IRQ_RESCHEDULE. */
static irqreturn_t handle_reschedule_ipi(int irq, void *token)
{
- /*
- * Nothing to do here; when we return from interrupt, the
- * rescheduling will occur there. But do bump the interrupt
- * profiler count in the meantime.
- */
__get_cpu_var(irq_stat).irq_resched_count++;
+ scheduler_ipi();
return IRQ_HANDLED;
}
config HPPFS
tristate "HoneyPot ProcFS (EXPERIMENTAL)"
- depends on EXPERIMENTAL
+ depends on EXPERIMENTAL && PROC_FS
help
hppfs (HoneyPot ProcFS) is a filesystem which allows UML /proc
entries to be overridden, removed, or fabricated from the host.
{
struct thread_info *ti;
unsigned long mask = THREAD_SIZE - 1;
- ti = (struct thread_info *) (((unsigned long) &ti) & ~mask);
+ void *p;
+
+ asm volatile ("" : "=r" (p) : "0" (&ti));
+ ti = (struct thread_info *) (((unsigned long)p) & ~mask);
return ti;
}
break;
case 'R':
- set_tsk_need_resched(current);
+ scheduler_ipi();
break;
case 'S':
#include <stdio.h>
#include <stdlib.h>
+#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
host.release, host.version, host.machine);
}
+/*
+ * We cannot use glibc's abort(). It makes use of tgkill() which
+ * has no effect within UML's kernel threads.
+ * After that glibc would execute an invalid instruction to kill
+ * the calling process and UML crashes with SIGSEGV.
+ */
+static inline void __attribute__ ((noreturn)) uml_abort(void)
+{
+ sigset_t sig;
+
+ fflush(NULL);
+
+ if (!sigemptyset(&sig) && !sigaddset(&sig, SIGABRT))
+ sigprocmask(SIG_UNBLOCK, &sig, 0);
+
+ for (;;)
+ if (kill(getpid(), SIGABRT) < 0)
+ exit(127);
+}
+
void os_dump_core(void)
{
int pid;
while ((pid = waitpid(-1, NULL, WNOHANG | __WALL)) > 0)
os_kill_ptraced_process(pid, 0);
- abort();
+ uml_abort();
}
obj-y = bug.o bugs.o checksum.o delay.o fault.o ksyms.o ldt.o ptrace.o \
ptrace_user.o setjmp.o signal.o stub.o stub_segv.o syscalls.o sysrq.o \
- sys_call_table.o tls.o
+ sys_call_table.o tls.o atomic64_cx8_32.o
obj-$(CONFIG_BINFMT_ELF) += elfcore.o
--- /dev/null
+/*
+ * atomic64_t for 586+
+ *
+ * Copied from arch/x86/lib/atomic64_cx8_32.S
+ *
+ * Copyright © 2010 Luca Barbieri
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/dwarf2.h>
+
+.macro SAVE reg
+ pushl_cfi %\reg
+ CFI_REL_OFFSET \reg, 0
+.endm
+
+.macro RESTORE reg
+ popl_cfi %\reg
+ CFI_RESTORE \reg
+.endm
+
+.macro read64 reg
+ movl %ebx, %eax
+ movl %ecx, %edx
+/* we need LOCK_PREFIX since otherwise cmpxchg8b always does the write */
+ LOCK_PREFIX
+ cmpxchg8b (\reg)
+.endm
+
+ENTRY(atomic64_read_cx8)
+ CFI_STARTPROC
+
+ read64 %ecx
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_read_cx8)
+
+ENTRY(atomic64_set_cx8)
+ CFI_STARTPROC
+
+1:
+/* we don't need LOCK_PREFIX since aligned 64-bit writes
+ * are atomic on 586 and newer */
+ cmpxchg8b (%esi)
+ jne 1b
+
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_set_cx8)
+
+ENTRY(atomic64_xchg_cx8)
+ CFI_STARTPROC
+
+ movl %ebx, %eax
+ movl %ecx, %edx
+1:
+ LOCK_PREFIX
+ cmpxchg8b (%esi)
+ jne 1b
+
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_xchg_cx8)
+
+.macro addsub_return func ins insc
+ENTRY(atomic64_\func\()_return_cx8)
+ CFI_STARTPROC
+ SAVE ebp
+ SAVE ebx
+ SAVE esi
+ SAVE edi
+
+ movl %eax, %esi
+ movl %edx, %edi
+ movl %ecx, %ebp
+
+ read64 %ebp
+1:
+ movl %eax, %ebx
+ movl %edx, %ecx
+ \ins\()l %esi, %ebx
+ \insc\()l %edi, %ecx
+ LOCK_PREFIX
+ cmpxchg8b (%ebp)
+ jne 1b
+
+10:
+ movl %ebx, %eax
+ movl %ecx, %edx
+ RESTORE edi
+ RESTORE esi
+ RESTORE ebx
+ RESTORE ebp
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_\func\()_return_cx8)
+.endm
+
+addsub_return add add adc
+addsub_return sub sub sbb
+
+.macro incdec_return func ins insc
+ENTRY(atomic64_\func\()_return_cx8)
+ CFI_STARTPROC
+ SAVE ebx
+
+ read64 %esi
+1:
+ movl %eax, %ebx
+ movl %edx, %ecx
+ \ins\()l $1, %ebx
+ \insc\()l $0, %ecx
+ LOCK_PREFIX
+ cmpxchg8b (%esi)
+ jne 1b
+
+10:
+ movl %ebx, %eax
+ movl %ecx, %edx
+ RESTORE ebx
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_\func\()_return_cx8)
+.endm
+
+incdec_return inc add adc
+incdec_return dec sub sbb
+
+ENTRY(atomic64_dec_if_positive_cx8)
+ CFI_STARTPROC
+ SAVE ebx
+
+ read64 %esi
+1:
+ movl %eax, %ebx
+ movl %edx, %ecx
+ subl $1, %ebx
+ sbb $0, %ecx
+ js 2f
+ LOCK_PREFIX
+ cmpxchg8b (%esi)
+ jne 1b
+
+2:
+ movl %ebx, %eax
+ movl %ecx, %edx
+ RESTORE ebx
+ ret
+ CFI_ENDPROC
+ENDPROC(atomic64_dec_if_positive_cx8)
+
+ENTRY(atomic64_add_unless_cx8)
+ CFI_STARTPROC
+ SAVE ebp
+ SAVE ebx
+/* these just push these two parameters on the stack */
+ SAVE edi
+ SAVE esi
+
+ movl %ecx, %ebp
+ movl %eax, %esi
+ movl %edx, %edi
+
+ read64 %ebp
+1:
+ cmpl %eax, 0(%esp)
+ je 4f
+2:
+ movl %eax, %ebx
+ movl %edx, %ecx
+ addl %esi, %ebx
+ adcl %edi, %ecx
+ LOCK_PREFIX
+ cmpxchg8b (%ebp)
+ jne 1b
+
+ movl $1, %eax
+3:
+ addl $8, %esp
+ CFI_ADJUST_CFA_OFFSET -8
+ RESTORE ebx
+ RESTORE ebp
+ ret
+4:
+ cmpl %edx, 4(%esp)
+ jne 2b
+ xorl %eax, %eax
+ jmp 3b
+ CFI_ENDPROC
+ENDPROC(atomic64_add_unless_cx8)
+
+ENTRY(atomic64_inc_not_zero_cx8)
+ CFI_STARTPROC
+ SAVE ebx
+
+ read64 %esi
+1:
+ testl %eax, %eax
+ je 4f
+2:
+ movl %eax, %ebx
+ movl %edx, %ecx
+ addl $1, %ebx
+ adcl $0, %ecx
+ LOCK_PREFIX
+ cmpxchg8b (%esi)
+ jne 1b
+
+ movl $1, %eax
+3:
+ RESTORE ebx
+ ret
+4:
+ testl %edx, %edx
+ jne 2b
+ jmp 3b
+ CFI_ENDPROC
+ENDPROC(atomic64_inc_not_zero_cx8)
#include <linux/list.h>
#include <linux/kallsyms.h>
#include <linux/proc_fs.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
#include <linux/gpio.h>
#include <asm/system.h>
unsigned int iccr;
} puv3_irq_state;
-static int puv3_irq_suspend(struct sys_device *dev, pm_message_t state)
+static int puv3_irq_suspend(void)
{
struct puv3_irq_state *st = &puv3_irq_state;
return 0;
}
-static int puv3_irq_resume(struct sys_device *dev)
+static void puv3_irq_resume(void)
{
struct puv3_irq_state *st = &puv3_irq_state;
writel(st->icmr, INTC_ICMR);
}
- return 0;
}
-static struct sysdev_class puv3_irq_sysclass = {
- .name = "pkunity-irq",
+static struct syscore_ops puv3_irq_syscore_ops = {
.suspend = puv3_irq_suspend,
.resume = puv3_irq_resume,
};
-static struct sys_device puv3_irq_device = {
- .id = 0,
- .cls = &puv3_irq_sysclass,
-};
-
-static int __init puv3_irq_init_devicefs(void)
+static int __init puv3_irq_init_syscore(void)
{
- sysdev_class_register(&puv3_irq_sysclass);
- return sysdev_register(&puv3_irq_device);
+ register_syscore_ops(&puv3_irq_syscore_ops);
+ return 0;
}
-device_initcall(puv3_irq_init_devicefs);
+device_initcall(puv3_irq_init_syscore);
void __init init_IRQ(void)
{
config X86_32
def_bool !64BIT
+ select CLKSRC_I8253
config X86_64
def_bool 64BIT
select GENERIC_IRQ_SHOW
select IRQ_FORCED_THREADING
select USE_GENERIC_SMP_HELPERS if SMP
- select ARCH_NO_SYSDEV_OPS
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
bool "AMD IOMMU support"
select SWIOTLB
select PCI_MSI
+ select PCI_IOV
depends on X86_64 && PCI && ACPI
---help---
With this option you can enable support for AMD IOMMU hardware in
endif # APM
-source "arch/x86/kernel/cpu/cpufreq/Kconfig"
+source "drivers/cpufreq/Kconfig"
source "drivers/cpuidle/Kconfig"
if (oreg.ax > 15*1024) {
return -1; /* Bogus! */
} else if (oreg.ax == 15*1024) {
- boot_params.alt_mem_k = (oreg.dx << 6) + oreg.ax;
+ boot_params.alt_mem_k = (oreg.bx << 6) + oreg.ax;
} else {
/*
* This ignores memory above 16MB if we have a memory
.endm
#endif
+.macro altinstruction_entry orig alt feature orig_len alt_len
+ .align 8
+ .quad \orig
+ .quad \alt
+ .word \feature
+ .byte \orig_len
+ .byte \alt_len
+.endm
+
#endif /* __ASSEMBLY__ */
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
-#include <linux/jump_label.h>
#include <asm/asm.h>
/*
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
extern void text_poke_smp_batch(struct text_poke_param *params, int n);
-#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_JUMP_LABEL)
#define IDEAL_NOP_SIZE_5 5
extern unsigned char ideal_nop5[IDEAL_NOP_SIZE_5];
extern void arch_init_ideal_nop5(void);
#ifndef _ASM_X86_AMD_IOMMU_PROTO_H
#define _ASM_X86_AMD_IOMMU_PROTO_H
-struct amd_iommu;
+#include <asm/amd_iommu_types.h>
extern int amd_iommu_init_dma_ops(void);
extern int amd_iommu_init_passthrough(void);
+extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
-extern void amd_iommu_flush_all_domains(void);
-extern void amd_iommu_flush_all_devices(void);
extern void amd_iommu_apply_erratum_63(u16 devid);
extern void amd_iommu_reset_cmd_buffer(struct amd_iommu *iommu);
extern int amd_iommu_init_devices(void);
(pdev->device == PCI_DEVICE_ID_RD890_IOMMU);
}
+static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
+{
+ if (!(iommu->cap & (1 << IOMMU_CAP_EFR)))
+ return false;
+
+ return !!(iommu->features & f);
+}
+
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
#define MMIO_CONTROL_OFFSET 0x0018
#define MMIO_EXCL_BASE_OFFSET 0x0020
#define MMIO_EXCL_LIMIT_OFFSET 0x0028
+#define MMIO_EXT_FEATURES 0x0030
#define MMIO_CMD_HEAD_OFFSET 0x2000
#define MMIO_CMD_TAIL_OFFSET 0x2008
#define MMIO_EVT_HEAD_OFFSET 0x2010
#define MMIO_EVT_TAIL_OFFSET 0x2018
#define MMIO_STATUS_OFFSET 0x2020
+
+/* Extended Feature Bits */
+#define FEATURE_PREFETCH (1ULL<<0)
+#define FEATURE_PPR (1ULL<<1)
+#define FEATURE_X2APIC (1ULL<<2)
+#define FEATURE_NX (1ULL<<3)
+#define FEATURE_GT (1ULL<<4)
+#define FEATURE_IA (1ULL<<6)
+#define FEATURE_GA (1ULL<<7)
+#define FEATURE_HE (1ULL<<8)
+#define FEATURE_PC (1ULL<<9)
+
/* MMIO status bits */
#define MMIO_STATUS_COM_WAIT_INT_MASK 0x04
/* command specific defines */
#define CMD_COMPL_WAIT 0x01
#define CMD_INV_DEV_ENTRY 0x02
-#define CMD_INV_IOMMU_PAGES 0x03
+#define CMD_INV_IOMMU_PAGES 0x03
+#define CMD_INV_IOTLB_PAGES 0x04
+#define CMD_INV_ALL 0x08
#define CMD_COMPL_WAIT_STORE_MASK 0x01
#define CMD_COMPL_WAIT_INT_MASK 0x02
#define IOMMU_PTE_IR (1ULL << 61)
#define IOMMU_PTE_IW (1ULL << 62)
+#define DTE_FLAG_IOTLB 0x01
+
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
/* IOMMU capabilities */
#define IOMMU_CAP_IOTLB 24
#define IOMMU_CAP_NPCACHE 26
+#define IOMMU_CAP_EFR 27
#define MAX_DOMAIN_ID 65536
/* global flag if IOMMUs cache non-present entries */
extern bool amd_iommu_np_cache;
+/* Only true if all IOMMUs support device IOTLBs */
+extern bool amd_iommu_iotlb_sup;
/*
* Make iterating over all IOMMUs easier
/* flags read from acpi table */
u8 acpi_flags;
+ /* Extended features */
+ u64 features;
+
/*
* Capability pointer. There could be more than one IOMMU per PCI
* device function if there are more than one AMD IOMMU capability
/* if one, we need to send a completion wait command */
bool need_sync;
- /* becomes true if a command buffer reset is running */
- bool reset_in_progress;
-
/* default dma_ops domain for that IOMMU */
struct dma_ops_domain *default_dom;
#define APIC_DEST_LOGICAL 0x00800
#define APIC_DEST_PHYSICAL 0x00000
#define APIC_DM_FIXED 0x00000
+#define APIC_DM_FIXED_MASK 0x00700
#define APIC_DM_LOWEST 0x00100
#define APIC_DM_SMI 0x00200
#define APIC_DM_REMRD 0x00300
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
#define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
+#define X86_FEATURE_ERMS (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/*
- * call mcount is "e8 <4 byte offset>"
- * The addr points to the 4 byte offset and the caller of this
- * function wants the pointer to e8. Simply subtract one.
+ * addr is the address of the mcount call instruction.
+ * recordmcount does the necessary offset calculation.
*/
- return addr - 1;
+ return addr;
}
#ifdef CONFIG_DYNAMIC_FTRACE
#define PIT_CH0 0x40
#define PIT_CH2 0x42
+#define PIT_LATCH LATCH
+
extern raw_spinlock_t i8253_lock;
extern struct clock_event_device *global_clock_event;
extern void ioapic_and_gsi_init(void);
extern void ioapic_insert_resources(void);
-int io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr);
+int io_apic_setup_irq_pin_once(unsigned int irq, int node, struct io_apic_irq_attr *attr);
extern struct IO_APIC_route_entry **alloc_ioapic_entries(void);
extern void free_ioapic_entries(struct IO_APIC_route_entry **ioapic_entries);
#include <linux/types.h>
#include <asm/nops.h>
+#include <asm/asm.h>
#define JUMP_LABEL_NOP_SIZE 5
-# define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
-
-# define JUMP_LABEL(key, label) \
- do { \
- asm goto("1:" \
- JUMP_LABEL_INITIAL_NOP \
- ".pushsection __jump_table, \"aw\" \n\t"\
- _ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
- ".popsection \n\t" \
- : : "i" (key) : : label); \
- } while (0)
+#define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
+
+static __always_inline bool arch_static_branch(struct jump_label_key *key)
+{
+ asm goto("1:"
+ JUMP_LABEL_INITIAL_NOP
+ ".pushsection __jump_table, \"aw\" \n\t"
+ _ASM_ALIGN "\n\t"
+ _ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+ ".popsection \n\t"
+ : : "i" (key) : : l_yes);
+ return false;
+l_yes:
+ return true;
+}
#endif /* __KERNEL__ */
/* Install a pte for a particular vaddr in kernel space. */
void set_pte_vaddr(unsigned long vaddr, pte_t pte);
+extern void native_pagetable_reserve(u64 start, u64 end);
#ifdef CONFIG_X86_32
extern void native_pagetable_setup_start(pgd_t *base);
extern void native_pagetable_setup_done(pgd_t *base);
* executable.)
*/
#define RESERVE_BRK(name,sz) \
- static void __section(.discard.text) __used \
+ static void __section(.discard.text) __used notrace \
__brk_reservation_fn_##name##__(void) { \
asm volatile ( \
".pushsection .brk_reservation,\"aw\",@nobits;" \
/* Generic stack tracer with callbacks */
struct stacktrace_ops {
- void (*warning)(void *data, char *msg);
- /* msg must contain %s for the symbol */
- void (*warning_symbol)(void *data, char *msg, unsigned long symbol);
void (*address)(void *data, unsigned long address, int reliable);
/* On negative return stop dumping */
int (*stack)(void *data, char *name);
* Returns 0 if the range is valid, nonzero otherwise.
*
* This is equivalent to the following test:
- * (u33)addr + (u33)size >= (u33)current->addr_limit.seg (u65 for x86_64)
+ * (u33)addr + (u33)size > (u33)current->addr_limit.seg (u65 for x86_64)
*
* This needs 33-bit (65-bit for x86_64) arithmetic. We have a carry...
*/
/* after this # consecutive successes, bump up the throttle if it was lowered */
#define COMPLETE_THRESHOLD 5
+#define UV_LB_SUBNODEID 0x10
+
/*
* number of entries in the destination side payload queue
*/
* The distribution specification (32 bytes) is interpreted as a 256-bit
* distribution vector. Adjacent bits correspond to consecutive even numbered
* nodeIDs. The result of adding the index of a given bit to the 15-bit
- * 'base_dest_nodeid' field of the header corresponds to the
+ * 'base_dest_nasid' field of the header corresponds to the
* destination nodeID associated with that specified bit.
*/
struct bau_target_uvhubmask {
struct bau_msg_header {
unsigned int dest_subnodeid:6; /* must be 0x10, for the LB */
/* bits 5:0 */
- unsigned int base_dest_nodeid:15; /* nasid of the */
+ unsigned int base_dest_nasid:15; /* nasid of the */
/* bits 20:6 */ /* first bit in uvhub map */
unsigned int command:8; /* message type */
/* bits 28:21 */
unsigned long d_rcanceled; /* number of messages canceled by resets */
};
+struct hub_and_pnode {
+ short uvhub;
+ short pnode;
+};
/*
* one per-cpu; to locate the software tables
*/
int baudisabled;
int set_bau_off;
short cpu;
+ short osnode;
short uvhub_cpu;
short uvhub;
short cpus_in_socket;
short cpus_in_uvhub;
+ short partition_base_pnode;
unsigned short message_number;
unsigned short uvhub_quiesce;
short socket_acknowledge_count[DEST_Q_SIZE];
int congested_period;
cycles_t period_time;
long period_requests;
+ struct hub_and_pnode *target_hub_and_pnode;
};
static inline int bau_uvhub_isset(int uvhub, struct bau_target_uvhubmask *dstp)
{
return constant_test_bit(uvhub, &dstp->bits[0]);
}
-static inline void bau_uvhub_set(int uvhub, struct bau_target_uvhubmask *dstp)
+static inline void bau_uvhub_set(int pnode, struct bau_target_uvhubmask *dstp)
{
- __set_bit(uvhub, &dstp->bits[0]);
+ __set_bit(pnode, &dstp->bits[0]);
}
static inline void bau_uvhubs_clear(struct bau_target_uvhubmask *dstp,
int nbits)
unsigned short nr_online_cpus;
unsigned short pnode;
short memory_nid;
+ spinlock_t nmi_lock;
+ unsigned long nmi_count;
};
extern struct uv_blade_info *uv_blade_info;
extern short *uv_node_to_blade;
*
* SGI UV MMR definitions
*
- * Copyright (C) 2007-2010 Silicon Graphics, Inc. All rights reserved.
+ * Copyright (C) 2007-2011 Silicon Graphics, Inc. All rights reserved.
*/
#ifndef _ASM_X86_UV_UV_MMRS_H
} s;
};
+/* ========================================================================= */
+/* UVH_SCRATCH5 */
+/* ========================================================================= */
+#define UVH_SCRATCH5 0x2d0200UL
+#define UVH_SCRATCH5_32 0x00778
+
+#define UVH_SCRATCH5_SCRATCH5_SHFT 0
+#define UVH_SCRATCH5_SCRATCH5_MASK 0xffffffffffffffffUL
+union uvh_scratch5_u {
+ unsigned long v;
+ struct uvh_scratch5_s {
+ unsigned long scratch5 : 64; /* RW, W1CS */
+ } s;
+};
#endif /* __ASM_UV_MMRS_X86_H__ */
void (*banner)(void);
};
+/**
+ * struct x86_init_mapping - platform specific initial kernel pagetable setup
+ * @pagetable_reserve: reserve a range of addresses for kernel pagetable usage
+ *
+ * For more details on the purpose of this hook, look in
+ * init_memory_mapping and the commit that added it.
+ */
+struct x86_init_mapping {
+ void (*pagetable_reserve)(u64 start, u64 end);
+};
+
/**
* struct x86_init_paging - platform specific paging functions
* @pagetable_setup_start: platform specific pre paging_init() call
struct x86_init_mpparse mpparse;
struct x86_init_irqs irqs;
struct x86_init_oem oem;
+ struct x86_init_mapping mapping;
struct x86_init_paging paging;
struct x86_init_timers timers;
struct x86_init_iommu iommu;
extern unsigned long set_phys_range_identity(unsigned long pfn_s,
unsigned long pfn_e);
-extern int m2p_add_override(unsigned long mfn, struct page *page);
-extern int m2p_remove_override(struct page *page);
+extern int m2p_add_override(unsigned long mfn, struct page *page,
+ bool clear_pte);
+extern int m2p_remove_override(struct page *page, bool clear_pte);
extern struct page *m2p_find_override(unsigned long mfn);
extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
#endif
#if defined(CONFIG_XEN_DOM0)
void __init xen_setup_pirqs(void);
+int xen_find_device_domain_owner(struct pci_dev *dev);
+int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
+int xen_unregister_device_domain_owner(struct pci_dev *dev);
#else
static inline void __init xen_setup_pirqs(void)
{
}
+static inline int xen_find_device_domain_owner(struct pci_dev *dev)
+{
+ return -1;
+}
+static inline int xen_register_device_domain_owner(struct pci_dev *dev,
+ uint16_t domain)
+{
+ return -1;
+}
+static inline int xen_unregister_device_domain_owner(struct pci_dev *dev)
+{
+ return -1;
+}
#endif
#if defined(CONFIG_PCI_MSI)
ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_AUDIT) += audit_64.o
- obj-$(CONFIG_GART_IOMMU) += pci-gart_64.o aperture_64.o
+ obj-$(CONFIG_GART_IOMMU) += amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
obj-$(CONFIG_AMD_IOMMU) += amd_iommu_init.o amd_iommu.o
#ifdef CONFIG_HIBERNATION
if (strncmp(str, "s4_nohwsig", 10) == 0)
acpi_no_s4_hw_signature();
- if (strncmp(str, "s4_nonvs", 8) == 0) {
- pr_warning("ACPI: acpi_sleep=s4_nonvs is deprecated, "
- "please use acpi_sleep=nonvs instead");
- acpi_nvs_nosave();
- }
#endif
if (strncmp(str, "nonvs", 5) == 0)
acpi_nvs_nosave();
u8 insnbuf[MAX_PATCH_LEN];
DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
+ /*
+ * The scan order should be from start to end. A later scanned
+ * alternative code can overwrite a previous scanned alternative code.
+ * Some kernel functions (e.g. memcpy, memset, etc) use this order to
+ * patch code.
+ *
+ * So be careful if you want to change the scan order to any other
+ * order.
+ */
for (a = start; a < end; a++) {
u8 *instr = a->instr;
BUG_ON(a->replacementlen > a->instrlen);
__stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
}
-#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_JUMP_LABEL)
#ifdef CONFIG_X86_64
unsigned char ideal_nop5[5] = { 0x66, 0x66, 0x66, 0x66, 0x90 };
--- /dev/null
+/*
+ * Dynamic DMA mapping support for AMD Hammer.
+ *
+ * Use the integrated AGP GART in the Hammer northbridge as an IOMMU for PCI.
+ * This allows to use PCI devices that only support 32bit addresses on systems
+ * with more than 4GB.
+ *
+ * See Documentation/PCI/PCI-DMA-mapping.txt for the interface specification.
+ *
+ * Copyright 2002 Andi Kleen, SuSE Labs.
+ * Subject to the GNU General Public License v2 only.
+ */
+
+#include <linux/types.h>
+#include <linux/ctype.h>
+#include <linux/agp_backend.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/string.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+#include <linux/module.h>
+#include <linux/topology.h>
+#include <linux/interrupt.h>
+#include <linux/bitmap.h>
+#include <linux/kdebug.h>
+#include <linux/scatterlist.h>
+#include <linux/iommu-helper.h>
+#include <linux/syscore_ops.h>
+#include <linux/io.h>
+#include <linux/gfp.h>
+#include <asm/atomic.h>
+#include <asm/mtrr.h>
+#include <asm/pgtable.h>
+#include <asm/proto.h>
+#include <asm/iommu.h>
+#include <asm/gart.h>
+#include <asm/cacheflush.h>
+#include <asm/swiotlb.h>
+#include <asm/dma.h>
+#include <asm/amd_nb.h>
+#include <asm/x86_init.h>
+#include <asm/iommu_table.h>
+
+static unsigned long iommu_bus_base; /* GART remapping area (physical) */
+static unsigned long iommu_size; /* size of remapping area bytes */
+static unsigned long iommu_pages; /* .. and in pages */
+
+static u32 *iommu_gatt_base; /* Remapping table */
+
+static dma_addr_t bad_dma_addr;
+
+/*
+ * If this is disabled the IOMMU will use an optimized flushing strategy
+ * of only flushing when an mapping is reused. With it true the GART is
+ * flushed for every mapping. Problem is that doing the lazy flush seems
+ * to trigger bugs with some popular PCI cards, in particular 3ware (but
+ * has been also also seen with Qlogic at least).
+ */
+static int iommu_fullflush = 1;
+
+/* Allocation bitmap for the remapping area: */
+static DEFINE_SPINLOCK(iommu_bitmap_lock);
+/* Guarded by iommu_bitmap_lock: */
+static unsigned long *iommu_gart_bitmap;
+
+static u32 gart_unmapped_entry;
+
+#define GPTE_VALID 1
+#define GPTE_COHERENT 2
+#define GPTE_ENCODE(x) \
+ (((x) & 0xfffff000) | (((x) >> 32) << 4) | GPTE_VALID | GPTE_COHERENT)
+#define GPTE_DECODE(x) (((x) & 0xfffff000) | (((u64)(x) & 0xff0) << 28))
+
+#define EMERGENCY_PAGES 32 /* = 128KB */
+
+#ifdef CONFIG_AGP
+#define AGPEXTERN extern
+#else
+#define AGPEXTERN
+#endif
+
+/* GART can only remap to physical addresses < 1TB */
+#define GART_MAX_PHYS_ADDR (1ULL << 40)
+
+/* backdoor interface to AGP driver */
+AGPEXTERN int agp_memory_reserved;
+AGPEXTERN __u32 *agp_gatt_table;
+
+static unsigned long next_bit; /* protected by iommu_bitmap_lock */
+static bool need_flush; /* global flush state. set for each gart wrap */
+
+static unsigned long alloc_iommu(struct device *dev, int size,
+ unsigned long align_mask)
+{
+ unsigned long offset, flags;
+ unsigned long boundary_size;
+ unsigned long base_index;
+
+ base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev),
+ PAGE_SIZE) >> PAGE_SHIFT;
+ boundary_size = ALIGN((u64)dma_get_seg_boundary(dev) + 1,
+ PAGE_SIZE) >> PAGE_SHIFT;
+
+ spin_lock_irqsave(&iommu_bitmap_lock, flags);
+ offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, next_bit,
+ size, base_index, boundary_size, align_mask);
+ if (offset == -1) {
+ need_flush = true;
+ offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, 0,
+ size, base_index, boundary_size,
+ align_mask);
+ }
+ if (offset != -1) {
+ next_bit = offset+size;
+ if (next_bit >= iommu_pages) {
+ next_bit = 0;
+ need_flush = true;
+ }
+ }
+ if (iommu_fullflush)
+ need_flush = true;
+ spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
+
+ return offset;
+}
+
+static void free_iommu(unsigned long offset, int size)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu_bitmap_lock, flags);
+ bitmap_clear(iommu_gart_bitmap, offset, size);
+ if (offset >= next_bit)
+ next_bit = offset + size;
+ spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
+}
+
+/*
+ * Use global flush state to avoid races with multiple flushers.
+ */
+static void flush_gart(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu_bitmap_lock, flags);
+ if (need_flush) {
+ amd_flush_garts();
+ need_flush = false;
+ }
+ spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
+}
+
+#ifdef CONFIG_IOMMU_LEAK
+/* Debugging aid for drivers that don't free their IOMMU tables */
+static int leak_trace;
+static int iommu_leak_pages = 20;
+
+static void dump_leak(void)
+{
+ static int dump;
+
+ if (dump)
+ return;
+ dump = 1;
+
+ show_stack(NULL, NULL);
+ debug_dma_dump_mappings(NULL);
+}
+#endif
+
+static void iommu_full(struct device *dev, size_t size, int dir)
+{
+ /*
+ * Ran out of IOMMU space for this operation. This is very bad.
+ * Unfortunately the drivers cannot handle this operation properly.
+ * Return some non mapped prereserved space in the aperture and
+ * let the Northbridge deal with it. This will result in garbage
+ * in the IO operation. When the size exceeds the prereserved space
+ * memory corruption will occur or random memory will be DMAed
+ * out. Hopefully no network devices use single mappings that big.
+ */
+
+ dev_err(dev, "PCI-DMA: Out of IOMMU space for %lu bytes\n", size);
+
+ if (size > PAGE_SIZE*EMERGENCY_PAGES) {
+ if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL)
+ panic("PCI-DMA: Memory would be corrupted\n");
+ if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL)
+ panic(KERN_ERR
+ "PCI-DMA: Random memory would be DMAed\n");
+ }
+#ifdef CONFIG_IOMMU_LEAK
+ dump_leak();
+#endif
+}
+
+static inline int
+need_iommu(struct device *dev, unsigned long addr, size_t size)
+{
+ return force_iommu || !dma_capable(dev, addr, size);
+}
+
+static inline int
+nonforced_iommu(struct device *dev, unsigned long addr, size_t size)
+{
+ return !dma_capable(dev, addr, size);
+}
+
+/* Map a single continuous physical area into the IOMMU.
+ * Caller needs to check if the iommu is needed and flush.
+ */
+static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
+ size_t size, int dir, unsigned long align_mask)
+{
+ unsigned long npages = iommu_num_pages(phys_mem, size, PAGE_SIZE);
+ unsigned long iommu_page;
+ int i;
+
+ if (unlikely(phys_mem + size > GART_MAX_PHYS_ADDR))
+ return bad_dma_addr;
+
+ iommu_page = alloc_iommu(dev, npages, align_mask);
+ if (iommu_page == -1) {
+ if (!nonforced_iommu(dev, phys_mem, size))
+ return phys_mem;
+ if (panic_on_overflow)
+ panic("dma_map_area overflow %lu bytes\n", size);
+ iommu_full(dev, size, dir);
+ return bad_dma_addr;
+ }
+
+ for (i = 0; i < npages; i++) {
+ iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
+ phys_mem += PAGE_SIZE;
+ }
+ return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
+}
+
+/* Map a single area into the IOMMU */
+static dma_addr_t gart_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ unsigned long bus;
+ phys_addr_t paddr = page_to_phys(page) + offset;
+
+ if (!dev)
+ dev = &x86_dma_fallback_dev;
+
+ if (!need_iommu(dev, paddr, size))
+ return paddr;
+
+ bus = dma_map_area(dev, paddr, size, dir, 0);
+ flush_gart();
+
+ return bus;
+}
+
+/*
+ * Free a DMA mapping.
+ */
+static void gart_unmap_page(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ unsigned long iommu_page;
+ int npages;
+ int i;
+
+ if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE ||
+ dma_addr >= iommu_bus_base + iommu_size)
+ return;
+
+ iommu_page = (dma_addr - iommu_bus_base)>>PAGE_SHIFT;
+ npages = iommu_num_pages(dma_addr, size, PAGE_SIZE);
+ for (i = 0; i < npages; i++) {
+ iommu_gatt_base[iommu_page + i] = gart_unmapped_entry;
+ }
+ free_iommu(iommu_page, npages);
+}
+
+/*
+ * Wrapper for pci_unmap_single working with scatterlists.
+ */
+static void gart_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+ struct scatterlist *s;
+ int i;
+
+ for_each_sg(sg, s, nents, i) {
+ if (!s->dma_length || !s->length)
+ break;
+ gart_unmap_page(dev, s->dma_address, s->dma_length, dir, NULL);
+ }
+}
+
+/* Fallback for dma_map_sg in case of overflow */
+static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg,
+ int nents, int dir)
+{
+ struct scatterlist *s;
+ int i;
+
+#ifdef CONFIG_IOMMU_DEBUG
+ pr_debug("dma_map_sg overflow\n");
+#endif
+
+ for_each_sg(sg, s, nents, i) {
+ unsigned long addr = sg_phys(s);
+
+ if (nonforced_iommu(dev, addr, s->length)) {
+ addr = dma_map_area(dev, addr, s->length, dir, 0);
+ if (addr == bad_dma_addr) {
+ if (i > 0)
+ gart_unmap_sg(dev, sg, i, dir, NULL);
+ nents = 0;
+ sg[0].dma_length = 0;
+ break;
+ }
+ }
+ s->dma_address = addr;
+ s->dma_length = s->length;
+ }
+ flush_gart();
+
+ return nents;
+}
+
+/* Map multiple scatterlist entries continuous into the first. */
+static int __dma_map_cont(struct device *dev, struct scatterlist *start,
+ int nelems, struct scatterlist *sout,
+ unsigned long pages)
+{
+ unsigned long iommu_start = alloc_iommu(dev, pages, 0);
+ unsigned long iommu_page = iommu_start;
+ struct scatterlist *s;
+ int i;
+
+ if (iommu_start == -1)
+ return -1;
+
+ for_each_sg(start, s, nelems, i) {
+ unsigned long pages, addr;
+ unsigned long phys_addr = s->dma_address;
+
+ BUG_ON(s != start && s->offset);
+ if (s == start) {
+ sout->dma_address = iommu_bus_base;
+ sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
+ sout->dma_length = s->length;
+ } else {
+ sout->dma_length += s->length;
+ }
+
+ addr = phys_addr;
+ pages = iommu_num_pages(s->offset, s->length, PAGE_SIZE);
+ while (pages--) {
+ iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
+ addr += PAGE_SIZE;
+ iommu_page++;
+ }
+ }
+ BUG_ON(iommu_page - iommu_start != pages);
+
+ return 0;
+}
+
+static inline int
+dma_map_cont(struct device *dev, struct scatterlist *start, int nelems,
+ struct scatterlist *sout, unsigned long pages, int need)
+{
+ if (!need) {
+ BUG_ON(nelems != 1);
+ sout->dma_address = start->dma_address;
+ sout->dma_length = start->length;
+ return 0;
+ }
+ return __dma_map_cont(dev, start, nelems, sout, pages);
+}
+
+/*
+ * DMA map all entries in a scatterlist.
+ * Merge chunks that have page aligned sizes into a continuous mapping.
+ */
+static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+ enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+ struct scatterlist *s, *ps, *start_sg, *sgmap;
+ int need = 0, nextneed, i, out, start;
+ unsigned long pages = 0;
+ unsigned int seg_size;
+ unsigned int max_seg_size;
+
+ if (nents == 0)
+ return 0;
+
+ if (!dev)
+ dev = &x86_dma_fallback_dev;
+
+ out = 0;
+ start = 0;
+ start_sg = sg;
+ sgmap = sg;
+ seg_size = 0;
+ max_seg_size = dma_get_max_seg_size(dev);
+ ps = NULL; /* shut up gcc */
+
+ for_each_sg(sg, s, nents, i) {
+ dma_addr_t addr = sg_phys(s);
+
+ s->dma_address = addr;
+ BUG_ON(s->length == 0);
+
+ nextneed = need_iommu(dev, addr, s->length);
+
+ /* Handle the previous not yet processed entries */
+ if (i > start) {
+ /*
+ * Can only merge when the last chunk ends on a
+ * page boundary and the new one doesn't have an
+ * offset.
+ */
+ if (!iommu_merge || !nextneed || !need || s->offset ||
+ (s->length + seg_size > max_seg_size) ||
+ (ps->offset + ps->length) % PAGE_SIZE) {
+ if (dma_map_cont(dev, start_sg, i - start,
+ sgmap, pages, need) < 0)
+ goto error;
+ out++;
+
+ seg_size = 0;
+ sgmap = sg_next(sgmap);
+ pages = 0;
+ start = i;
+ start_sg = s;
+ }
+ }
+
+ seg_size += s->length;
+ need = nextneed;
+ pages += iommu_num_pages(s->offset, s->length, PAGE_SIZE);
+ ps = s;
+ }
+ if (dma_map_cont(dev, start_sg, i - start, sgmap, pages, need) < 0)
+ goto error;
+ out++;
+ flush_gart();
+ if (out < nents) {
+ sgmap = sg_next(sgmap);
+ sgmap->dma_length = 0;
+ }
+ return out;
+
+error:
+ flush_gart();
+ gart_unmap_sg(dev, sg, out, dir, NULL);
+
+ /* When it was forced or merged try again in a dumb way */
+ if (force_iommu || iommu_merge) {
+ out = dma_map_sg_nonforce(dev, sg, nents, dir);
+ if (out > 0)
+ return out;
+ }
+ if (panic_on_overflow)
+ panic("dma_map_sg: overflow on %lu pages\n", pages);
+
+ iommu_full(dev, pages << PAGE_SHIFT, dir);
+ for_each_sg(sg, s, nents, i)
+ s->dma_address = bad_dma_addr;
+ return 0;
+}
+
+/* allocate and map a coherent mapping */
+static void *
+gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
+ gfp_t flag)
+{
+ dma_addr_t paddr;
+ unsigned long align_mask;
+ struct page *page;
+
+ if (force_iommu && !(flag & GFP_DMA)) {
+ flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
+ page = alloc_pages(flag | __GFP_ZERO, get_order(size));
+ if (!page)
+ return NULL;
+
+ align_mask = (1UL << get_order(size)) - 1;
+ paddr = dma_map_area(dev, page_to_phys(page), size,
+ DMA_BIDIRECTIONAL, align_mask);
+
+ flush_gart();
+ if (paddr != bad_dma_addr) {
+ *dma_addr = paddr;
+ return page_address(page);
+ }
+ __free_pages(page, get_order(size));
+ } else
+ return dma_generic_alloc_coherent(dev, size, dma_addr, flag);
+
+ return NULL;
+}
+
+/* free a coherent mapping */
+static void
+gart_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_addr)
+{
+ gart_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL, NULL);
+ free_pages((unsigned long)vaddr, get_order(size));
+}
+
+static int gart_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+ return (dma_addr == bad_dma_addr);
+}
+
+static int no_agp;
+
+static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size)
+{
+ unsigned long a;
+
+ if (!iommu_size) {
+ iommu_size = aper_size;
+ if (!no_agp)
+ iommu_size /= 2;
+ }
+
+ a = aper + iommu_size;
+ iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;
+
+ if (iommu_size < 64*1024*1024) {
+ pr_warning(
+ "PCI-DMA: Warning: Small IOMMU %luMB."
+ " Consider increasing the AGP aperture in BIOS\n",
+ iommu_size >> 20);
+ }
+
+ return iommu_size;
+}
+
+static __init unsigned read_aperture(struct pci_dev *dev, u32 *size)
+{
+ unsigned aper_size = 0, aper_base_32, aper_order;
+ u64 aper_base;
+
+ pci_read_config_dword(dev, AMD64_GARTAPERTUREBASE, &aper_base_32);
+ pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &aper_order);
+ aper_order = (aper_order >> 1) & 7;
+
+ aper_base = aper_base_32 & 0x7fff;
+ aper_base <<= 25;
+
+ aper_size = (32 * 1024 * 1024) << aper_order;
+ if (aper_base + aper_size > 0x100000000UL || !aper_size)
+ aper_base = 0;
+
+ *size = aper_size;
+ return aper_base;
+}
+
+static void enable_gart_translations(void)
+{
+ int i;
+
+ if (!amd_nb_has_feature(AMD_NB_GART))
+ return;
+
+ for (i = 0; i < amd_nb_num(); i++) {
+ struct pci_dev *dev = node_to_amd_nb(i)->misc;
+
+ enable_gart_translation(dev, __pa(agp_gatt_table));
+ }
+
+ /* Flush the GART-TLB to remove stale entries */
+ amd_flush_garts();
+}
+
+/*
+ * If fix_up_north_bridges is set, the north bridges have to be fixed up on
+ * resume in the same way as they are handled in gart_iommu_hole_init().
+ */
+static bool fix_up_north_bridges;
+static u32 aperture_order;
+static u32 aperture_alloc;
+
+void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
+{
+ fix_up_north_bridges = true;
+ aperture_order = aper_order;
+ aperture_alloc = aper_alloc;
+}
+
+static void gart_fixup_northbridges(void)
+{
+ int i;
+
+ if (!fix_up_north_bridges)
+ return;
+
+ if (!amd_nb_has_feature(AMD_NB_GART))
+ return;
+
+ pr_info("PCI-DMA: Restoring GART aperture settings\n");
+
+ for (i = 0; i < amd_nb_num(); i++) {
+ struct pci_dev *dev = node_to_amd_nb(i)->misc;
+
+ /*
+ * Don't enable translations just yet. That is the next
+ * step. Restore the pre-suspend aperture settings.
+ */
+ gart_set_size_and_enable(dev, aperture_order);
+ pci_write_config_dword(dev, AMD64_GARTAPERTUREBASE, aperture_alloc >> 25);
+ }
+}
+
+static void gart_resume(void)
+{
+ pr_info("PCI-DMA: Resuming GART IOMMU\n");
+
+ gart_fixup_northbridges();
+
+ enable_gart_translations();
+}
+
+static struct syscore_ops gart_syscore_ops = {
+ .resume = gart_resume,
+
+};
+
+/*
+ * Private Northbridge GATT initialization in case we cannot use the
+ * AGP driver for some reason.
+ */
+static __init int init_amd_gatt(struct agp_kern_info *info)
+{
+ unsigned aper_size, gatt_size, new_aper_size;
+ unsigned aper_base, new_aper_base;
+ struct pci_dev *dev;
+ void *gatt;
+ int i;
+
+ pr_info("PCI-DMA: Disabling AGP.\n");
+
+ aper_size = aper_base = info->aper_size = 0;
+ dev = NULL;
+ for (i = 0; i < amd_nb_num(); i++) {
+ dev = node_to_amd_nb(i)->misc;
+ new_aper_base = read_aperture(dev, &new_aper_size);
+ if (!new_aper_base)
+ goto nommu;
+
+ if (!aper_base) {
+ aper_size = new_aper_size;
+ aper_base = new_aper_base;
+ }
+ if (aper_size != new_aper_size || aper_base != new_aper_base)
+ goto nommu;
+ }
+ if (!aper_base)
+ goto nommu;
+
+ info->aper_base = aper_base;
+ info->aper_size = aper_size >> 20;
+
+ gatt_size = (aper_size >> PAGE_SHIFT) * sizeof(u32);
+ gatt = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(gatt_size));
+ if (!gatt)
+ panic("Cannot allocate GATT table");
+ if (set_memory_uc((unsigned long)gatt, gatt_size >> PAGE_SHIFT))
+ panic("Could not set GART PTEs to uncacheable pages");
+
+ agp_gatt_table = gatt;
+
+ register_syscore_ops(&gart_syscore_ops);
+
+ flush_gart();
+
+ pr_info("PCI-DMA: aperture base @ %x size %u KB\n",
+ aper_base, aper_size>>10);
+
+ return 0;
+
+ nommu:
+ /* Should not happen anymore */
+ pr_warning("PCI-DMA: More than 4GB of RAM and no IOMMU\n"
+ "falling back to iommu=soft.\n");
+ return -1;
+}
+
+static struct dma_map_ops gart_dma_ops = {
+ .map_sg = gart_map_sg,
+ .unmap_sg = gart_unmap_sg,
+ .map_page = gart_map_page,
+ .unmap_page = gart_unmap_page,
+ .alloc_coherent = gart_alloc_coherent,
+ .free_coherent = gart_free_coherent,
+ .mapping_error = gart_mapping_error,
+};
+
+static void gart_iommu_shutdown(void)
+{
+ struct pci_dev *dev;
+ int i;
+
+ /* don't shutdown it if there is AGP installed */
+ if (!no_agp)
+ return;
+
+ if (!amd_nb_has_feature(AMD_NB_GART))
+ return;
+
+ for (i = 0; i < amd_nb_num(); i++) {
+ u32 ctl;
+
+ dev = node_to_amd_nb(i)->misc;
+ pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &ctl);
+
+ ctl &= ~GARTEN;
+
+ pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
+ }
+}
+
+int __init gart_iommu_init(void)
+{
+ struct agp_kern_info info;
+ unsigned long iommu_start;
+ unsigned long aper_base, aper_size;
+ unsigned long start_pfn, end_pfn;
+ unsigned long scratch;
+ long i;
+
+ if (!amd_nb_has_feature(AMD_NB_GART))
+ return 0;
+
+#ifndef CONFIG_AGP_AMD64
+ no_agp = 1;
+#else
+ /* Makefile puts PCI initialization via subsys_initcall first. */
+ /* Add other AMD AGP bridge drivers here */
+ no_agp = no_agp ||
+ (agp_amd64_init() < 0) ||
+ (agp_copy_info(agp_bridge, &info) < 0);
+#endif
+
+ if (no_iommu ||
+ (!force_iommu && max_pfn <= MAX_DMA32_PFN) ||
+ !gart_iommu_aperture ||
+ (no_agp && init_amd_gatt(&info) < 0)) {
+ if (max_pfn > MAX_DMA32_PFN) {
+ pr_warning("More than 4GB of memory but GART IOMMU not available.\n");
+ pr_warning("falling back to iommu=soft.\n");
+ }
+ return 0;
+ }
+
+ /* need to map that range */
+ aper_size = info.aper_size << 20;
+ aper_base = info.aper_base;
+ end_pfn = (aper_base>>PAGE_SHIFT) + (aper_size>>PAGE_SHIFT);
+
+ if (end_pfn > max_low_pfn_mapped) {
+ start_pfn = (aper_base>>PAGE_SHIFT);
+ init_memory_mapping(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);
+ }
+
+ pr_info("PCI-DMA: using GART IOMMU.\n");
+ iommu_size = check_iommu_size(info.aper_base, aper_size);
+ iommu_pages = iommu_size >> PAGE_SHIFT;
+
+ iommu_gart_bitmap = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ get_order(iommu_pages/8));
+ if (!iommu_gart_bitmap)
+ panic("Cannot allocate iommu bitmap\n");
+
+#ifdef CONFIG_IOMMU_LEAK
+ if (leak_trace) {
+ int ret;
+
+ ret = dma_debug_resize_entries(iommu_pages);
+ if (ret)
+ pr_debug("PCI-DMA: Cannot trace all the entries\n");
+ }
+#endif
+
+ /*
+ * Out of IOMMU space handling.
+ * Reserve some invalid pages at the beginning of the GART.
+ */
+ bitmap_set(iommu_gart_bitmap, 0, EMERGENCY_PAGES);
+
+ pr_info("PCI-DMA: Reserving %luMB of IOMMU area in the AGP aperture\n",
+ iommu_size >> 20);
+
+ agp_memory_reserved = iommu_size;
+ iommu_start = aper_size - iommu_size;
+ iommu_bus_base = info.aper_base + iommu_start;
+ bad_dma_addr = iommu_bus_base;
+ iommu_gatt_base = agp_gatt_table + (iommu_start>>PAGE_SHIFT);
+
+ /*
+ * Unmap the IOMMU part of the GART. The alias of the page is
+ * always mapped with cache enabled and there is no full cache
+ * coherency across the GART remapping. The unmapping avoids
+ * automatic prefetches from the CPU allocating cache lines in
+ * there. All CPU accesses are done via the direct mapping to
+ * the backing memory. The GART address is only used by PCI
+ * devices.
+ */
+ set_memory_np((unsigned long)__va(iommu_bus_base),
+ iommu_size >> PAGE_SHIFT);
+ /*
+ * Tricky. The GART table remaps the physical memory range,
+ * so the CPU wont notice potential aliases and if the memory
+ * is remapped to UC later on, we might surprise the PCI devices
+ * with a stray writeout of a cacheline. So play it sure and
+ * do an explicit, full-scale wbinvd() _after_ having marked all
+ * the pages as Not-Present:
+ */
+ wbinvd();
+
+ /*
+ * Now all caches are flushed and we can safely enable
+ * GART hardware. Doing it early leaves the possibility
+ * of stale cache entries that can lead to GART PTE
+ * errors.
+ */
+ enable_gart_translations();
+
+ /*
+ * Try to workaround a bug (thanks to BenH):
+ * Set unmapped entries to a scratch page instead of 0.
+ * Any prefetches that hit unmapped entries won't get an bus abort
+ * then. (P2P bridge may be prefetching on DMA reads).
+ */
+ scratch = get_zeroed_page(GFP_KERNEL);
+ if (!scratch)
+ panic("Cannot allocate iommu scratch page");
+ gart_unmapped_entry = GPTE_ENCODE(__pa(scratch));
+ for (i = EMERGENCY_PAGES; i < iommu_pages; i++)
+ iommu_gatt_base[i] = gart_unmapped_entry;
+
+ flush_gart();
+ dma_ops = &gart_dma_ops;
+ x86_platform.iommu_shutdown = gart_iommu_shutdown;
+ swiotlb = 0;
+
+ return 0;
+}
+
+void __init gart_parse_options(char *p)
+{
+ int arg;
+
+#ifdef CONFIG_IOMMU_LEAK
+ if (!strncmp(p, "leak", 4)) {
+ leak_trace = 1;
+ p += 4;
+ if (*p == '=')
+ ++p;
+ if (isdigit(*p) && get_option(&p, &arg))
+ iommu_leak_pages = arg;
+ }
+#endif
+ if (isdigit(*p) && get_option(&p, &arg))
+ iommu_size = arg;
+ if (!strncmp(p, "fullflush", 9))
+ iommu_fullflush = 1;
+ if (!strncmp(p, "nofullflush", 11))
+ iommu_fullflush = 0;
+ if (!strncmp(p, "noagp", 5))
+ no_agp = 1;
+ if (!strncmp(p, "noaperture", 10))
+ fix_aperture = 0;
+ /* duplicated from pci-dma.c */
+ if (!strncmp(p, "force", 5))
+ gart_iommu_aperture_allowed = 1;
+ if (!strncmp(p, "allowed", 7))
+ gart_iommu_aperture_allowed = 1;
+ if (!strncmp(p, "memaper", 7)) {
+ fallback_aper_force = 1;
+ p += 7;
+ if (*p == '=') {
+ ++p;
+ if (get_option(&p, &arg))
+ fallback_aper_order = arg;
+ }
+ }
+}
+IOMMU_INIT_POST(gart_iommu_hole_init);
*/
#include <linux/pci.h>
+#include <linux/pci-ats.h>
#include <linux/bitmap.h>
#include <linux/slab.h>
#include <linux/debugfs.h>
#include <linux/dma-mapping.h>
#include <linux/iommu-helper.h>
#include <linux/iommu.h>
+#include <linux/delay.h>
#include <asm/proto.h>
#include <asm/iommu.h>
#include <asm/gart.h>
#define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
-#define EXIT_LOOP_COUNT 10000000
+#define LOOP_TIMEOUT 100000
static DEFINE_RWLOCK(amd_iommu_devtable_lock);
u32 data[4];
};
-static void reset_iommu_command_buffer(struct amd_iommu *iommu);
static void update_domain(struct protection_domain *domain);
/****************************************************************************
break;
case EVENT_TYPE_ILL_CMD:
printk("ILLEGAL_COMMAND_ERROR address=0x%016llx]\n", address);
- iommu->reset_in_progress = true;
- reset_iommu_command_buffer(iommu);
dump_command(address);
break;
case EVENT_TYPE_CMD_HARD_ERR:
spin_unlock_irqrestore(&iommu->lock, flags);
}
-irqreturn_t amd_iommu_int_handler(int irq, void *data)
+irqreturn_t amd_iommu_int_thread(int irq, void *data)
{
struct amd_iommu *iommu;
return IRQ_HANDLED;
}
+irqreturn_t amd_iommu_int_handler(int irq, void *data)
+{
+ return IRQ_WAKE_THREAD;
+}
+
/****************************************************************************
*
* IOMMU command queuing functions
*
****************************************************************************/
-/*
- * Writes the command to the IOMMUs command buffer and informs the
- * hardware about the new command. Must be called with iommu->lock held.
- */
-static int __iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd)
+static int wait_on_sem(volatile u64 *sem)
+{
+ int i = 0;
+
+ while (*sem == 0 && i < LOOP_TIMEOUT) {
+ udelay(1);
+ i += 1;
+ }
+
+ if (i == LOOP_TIMEOUT) {
+ pr_alert("AMD-Vi: Completion-Wait loop timed out\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static void copy_cmd_to_buffer(struct amd_iommu *iommu,
+ struct iommu_cmd *cmd,
+ u32 tail)
{
- u32 tail, head;
u8 *target;
- WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
- tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
target = iommu->cmd_buf + tail;
- memcpy_toio(target, cmd, sizeof(*cmd));
- tail = (tail + sizeof(*cmd)) % iommu->cmd_buf_size;
- head = readl(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
- if (tail == head)
- return -ENOMEM;
+ tail = (tail + sizeof(*cmd)) % iommu->cmd_buf_size;
+
+ /* Copy command to buffer */
+ memcpy(target, cmd, sizeof(*cmd));
+
+ /* Tell the IOMMU about it */
writel(tail, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+}
- return 0;
+static void build_completion_wait(struct iommu_cmd *cmd, u64 address)
+{
+ WARN_ON(address & 0x7ULL);
+
+ memset(cmd, 0, sizeof(*cmd));
+ cmd->data[0] = lower_32_bits(__pa(address)) | CMD_COMPL_WAIT_STORE_MASK;
+ cmd->data[1] = upper_32_bits(__pa(address));
+ cmd->data[2] = 1;
+ CMD_SET_TYPE(cmd, CMD_COMPL_WAIT);
+}
+
+static void build_inv_dte(struct iommu_cmd *cmd, u16 devid)
+{
+ memset(cmd, 0, sizeof(*cmd));
+ cmd->data[0] = devid;
+ CMD_SET_TYPE(cmd, CMD_INV_DEV_ENTRY);
+}
+
+static void build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
+ size_t size, u16 domid, int pde)
+{
+ u64 pages;
+ int s;
+
+ pages = iommu_num_pages(address, size, PAGE_SIZE);
+ s = 0;
+
+ if (pages > 1) {
+ /*
+ * If we have to flush more than one page, flush all
+ * TLB entries for this domain
+ */
+ address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
+ s = 1;
+ }
+
+ address &= PAGE_MASK;
+
+ memset(cmd, 0, sizeof(*cmd));
+ cmd->data[1] |= domid;
+ cmd->data[2] = lower_32_bits(address);
+ cmd->data[3] = upper_32_bits(address);
+ CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
+ if (s) /* size bit - we flush more than one 4kb page */
+ cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
+ if (pde) /* PDE bit - we wan't flush everything not only the PTEs */
+ cmd->data[2] |= CMD_INV_IOMMU_PAGES_PDE_MASK;
+}
+
+static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
+ u64 address, size_t size)
+{
+ u64 pages;
+ int s;
+
+ pages = iommu_num_pages(address, size, PAGE_SIZE);
+ s = 0;
+
+ if (pages > 1) {
+ /*
+ * If we have to flush more than one page, flush all
+ * TLB entries for this domain
+ */
+ address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
+ s = 1;
+ }
+
+ address &= PAGE_MASK;
+
+ memset(cmd, 0, sizeof(*cmd));
+ cmd->data[0] = devid;
+ cmd->data[0] |= (qdep & 0xff) << 24;
+ cmd->data[1] = devid;
+ cmd->data[2] = lower_32_bits(address);
+ cmd->data[3] = upper_32_bits(address);
+ CMD_SET_TYPE(cmd, CMD_INV_IOTLB_PAGES);
+ if (s)
+ cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
+}
+
+static void build_inv_all(struct iommu_cmd *cmd)
+{
+ memset(cmd, 0, sizeof(*cmd));
+ CMD_SET_TYPE(cmd, CMD_INV_ALL);
}
/*
- * General queuing function for commands. Takes iommu->lock and calls
- * __iommu_queue_command().
+ * Writes the command to the IOMMUs command buffer and informs the
+ * hardware about the new command.
*/
static int iommu_queue_command(struct amd_iommu *iommu, struct iommu_cmd *cmd)
{
+ u32 left, tail, head, next_tail;
unsigned long flags;
- int ret;
+ WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+
+again:
spin_lock_irqsave(&iommu->lock, flags);
- ret = __iommu_queue_command(iommu, cmd);
- if (!ret)
- iommu->need_sync = true;
- spin_unlock_irqrestore(&iommu->lock, flags);
- return ret;
-}
+ head = readl(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
+ tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+ next_tail = (tail + sizeof(*cmd)) % iommu->cmd_buf_size;
+ left = (head - next_tail) % iommu->cmd_buf_size;
-/*
- * This function waits until an IOMMU has completed a completion
- * wait command
- */
-static void __iommu_wait_for_completion(struct amd_iommu *iommu)
-{
- int ready = 0;
- unsigned status = 0;
- unsigned long i = 0;
+ if (left <= 2) {
+ struct iommu_cmd sync_cmd;
+ volatile u64 sem = 0;
+ int ret;
- INC_STATS_COUNTER(compl_wait);
+ build_completion_wait(&sync_cmd, (u64)&sem);
+ copy_cmd_to_buffer(iommu, &sync_cmd, tail);
- while (!ready && (i < EXIT_LOOP_COUNT)) {
- ++i;
- /* wait for the bit to become one */
- status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
- ready = status & MMIO_STATUS_COM_WAIT_INT_MASK;
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+ if ((ret = wait_on_sem(&sem)) != 0)
+ return ret;
+
+ goto again;
}
- /* set bit back to zero */
- status &= ~MMIO_STATUS_COM_WAIT_INT_MASK;
- writel(status, iommu->mmio_base + MMIO_STATUS_OFFSET);
+ copy_cmd_to_buffer(iommu, cmd, tail);
+
+ /* We need to sync now to make sure all commands are processed */
+ iommu->need_sync = true;
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
- if (unlikely(i == EXIT_LOOP_COUNT))
- iommu->reset_in_progress = true;
+ return 0;
}
/*
* This function queues a completion wait command into the command
* buffer of an IOMMU
*/
-static int __iommu_completion_wait(struct amd_iommu *iommu)
+static int iommu_completion_wait(struct amd_iommu *iommu)
{
struct iommu_cmd cmd;
+ volatile u64 sem = 0;
+ int ret;
- memset(&cmd, 0, sizeof(cmd));
- cmd.data[0] = CMD_COMPL_WAIT_INT_MASK;
- CMD_SET_TYPE(&cmd, CMD_COMPL_WAIT);
+ if (!iommu->need_sync)
+ return 0;
- return __iommu_queue_command(iommu, &cmd);
+ build_completion_wait(&cmd, (u64)&sem);
+
+ ret = iommu_queue_command(iommu, &cmd);
+ if (ret)
+ return ret;
+
+ return wait_on_sem(&sem);
}
-/*
- * This function is called whenever we need to ensure that the IOMMU has
- * completed execution of all commands we sent. It sends a
- * COMPLETION_WAIT command and waits for it to finish. The IOMMU informs
- * us about that by writing a value to a physical address we pass with
- * the command.
- */
-static int iommu_completion_wait(struct amd_iommu *iommu)
+static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
{
- int ret = 0;
- unsigned long flags;
+ struct iommu_cmd cmd;
- spin_lock_irqsave(&iommu->lock, flags);
+ build_inv_dte(&cmd, devid);
- if (!iommu->need_sync)
- goto out;
+ return iommu_queue_command(iommu, &cmd);
+}
- ret = __iommu_completion_wait(iommu);
+static void iommu_flush_dte_all(struct amd_iommu *iommu)
+{
+ u32 devid;
- iommu->need_sync = false;
+ for (devid = 0; devid <= 0xffff; ++devid)
+ iommu_flush_dte(iommu, devid);
- if (ret)
- goto out;
-
- __iommu_wait_for_completion(iommu);
+ iommu_completion_wait(iommu);
+}
-out:
- spin_unlock_irqrestore(&iommu->lock, flags);
+/*
+ * This function uses heavy locking and may disable irqs for some time. But
+ * this is no issue because it is only called during resume.
+ */
+static void iommu_flush_tlb_all(struct amd_iommu *iommu)
+{
+ u32 dom_id;
- if (iommu->reset_in_progress)
- reset_iommu_command_buffer(iommu);
+ for (dom_id = 0; dom_id <= 0xffff; ++dom_id) {
+ struct iommu_cmd cmd;
+ build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+ dom_id, 1);
+ iommu_queue_command(iommu, &cmd);
+ }
- return 0;
+ iommu_completion_wait(iommu);
}
-static void iommu_flush_complete(struct protection_domain *domain)
+static void iommu_flush_all(struct amd_iommu *iommu)
{
- int i;
+ struct iommu_cmd cmd;
- for (i = 0; i < amd_iommus_present; ++i) {
- if (!domain->dev_iommu[i])
- continue;
+ build_inv_all(&cmd);
- /*
- * Devices of this domain are behind this IOMMU
- * We need to wait for completion of all commands.
- */
- iommu_completion_wait(amd_iommus[i]);
+ iommu_queue_command(iommu, &cmd);
+ iommu_completion_wait(iommu);
+}
+
+void iommu_flush_all_caches(struct amd_iommu *iommu)
+{
+ if (iommu_feature(iommu, FEATURE_IA)) {
+ iommu_flush_all(iommu);
+ } else {
+ iommu_flush_dte_all(iommu);
+ iommu_flush_tlb_all(iommu);
}
}
/*
- * Command send function for invalidating a device table entry
+ * Command send function for flushing on-device TLB
*/
-static int iommu_flush_device(struct device *dev)
+static int device_flush_iotlb(struct device *dev, u64 address, size_t size)
{
+ struct pci_dev *pdev = to_pci_dev(dev);
struct amd_iommu *iommu;
struct iommu_cmd cmd;
u16 devid;
+ int qdep;
+ qdep = pci_ats_queue_depth(pdev);
devid = get_device_id(dev);
iommu = amd_iommu_rlookup_table[devid];
- /* Build command */
- memset(&cmd, 0, sizeof(cmd));
- CMD_SET_TYPE(&cmd, CMD_INV_DEV_ENTRY);
- cmd.data[0] = devid;
+ build_inv_iotlb_pages(&cmd, devid, qdep, address, size);
return iommu_queue_command(iommu, &cmd);
}
-static void __iommu_build_inv_iommu_pages(struct iommu_cmd *cmd, u64 address,
- u16 domid, int pde, int s)
-{
- memset(cmd, 0, sizeof(*cmd));
- address &= PAGE_MASK;
- CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
- cmd->data[1] |= domid;
- cmd->data[2] = lower_32_bits(address);
- cmd->data[3] = upper_32_bits(address);
- if (s) /* size bit - we flush more than one 4kb page */
- cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
- if (pde) /* PDE bit - we wan't flush everything not only the PTEs */
- cmd->data[2] |= CMD_INV_IOMMU_PAGES_PDE_MASK;
-}
-
/*
- * Generic command send function for invalidaing TLB entries
+ * Command send function for invalidating a device table entry
*/
-static int iommu_queue_inv_iommu_pages(struct amd_iommu *iommu,
- u64 address, u16 domid, int pde, int s)
+static int device_flush_dte(struct device *dev)
{
- struct iommu_cmd cmd;
+ struct amd_iommu *iommu;
+ struct pci_dev *pdev;
+ u16 devid;
int ret;
- __iommu_build_inv_iommu_pages(&cmd, address, domid, pde, s);
+ pdev = to_pci_dev(dev);
+ devid = get_device_id(dev);
+ iommu = amd_iommu_rlookup_table[devid];
- ret = iommu_queue_command(iommu, &cmd);
+ ret = iommu_flush_dte(iommu, devid);
+ if (ret)
+ return ret;
+
+ if (pci_ats_enabled(pdev))
+ ret = device_flush_iotlb(dev, 0, ~0UL);
return ret;
}
* It invalidates a single PTE if the range to flush is within a single
* page. Otherwise it flushes the whole TLB of the IOMMU.
*/
-static void __iommu_flush_pages(struct protection_domain *domain,
- u64 address, size_t size, int pde)
+static void __domain_flush_pages(struct protection_domain *domain,
+ u64 address, size_t size, int pde)
{
- int s = 0, i;
- unsigned long pages = iommu_num_pages(address, size, PAGE_SIZE);
-
- address &= PAGE_MASK;
-
- if (pages > 1) {
- /*
- * If we have to flush more than one page, flush all
- * TLB entries for this domain
- */
- address = CMD_INV_IOMMU_ALL_PAGES_ADDRESS;
- s = 1;
- }
+ struct iommu_dev_data *dev_data;
+ struct iommu_cmd cmd;
+ int ret = 0, i;
+ build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
for (i = 0; i < amd_iommus_present; ++i) {
if (!domain->dev_iommu[i])
* Devices of this domain are behind this IOMMU
* We need a TLB flush
*/
- iommu_queue_inv_iommu_pages(amd_iommus[i], address,
- domain->id, pde, s);
+ ret |= iommu_queue_command(amd_iommus[i], &cmd);
+ }
+
+ list_for_each_entry(dev_data, &domain->dev_list, list) {
+ struct pci_dev *pdev = to_pci_dev(dev_data->dev);
+
+ if (!pci_ats_enabled(pdev))
+ continue;
+
+ ret |= device_flush_iotlb(dev_data->dev, address, size);
}
- return;
+ WARN_ON(ret);
}
-static void iommu_flush_pages(struct protection_domain *domain,
- u64 address, size_t size)
+static void domain_flush_pages(struct protection_domain *domain,
+ u64 address, size_t size)
{
- __iommu_flush_pages(domain, address, size, 0);
+ __domain_flush_pages(domain, address, size, 0);
}
/* Flush the whole IO/TLB for a given protection domain */
-static void iommu_flush_tlb(struct protection_domain *domain)
+static void domain_flush_tlb(struct protection_domain *domain)
{
- __iommu_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0);
+ __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 0);
}
/* Flush the whole IO/TLB for a given protection domain - including PDE */
-static void iommu_flush_tlb_pde(struct protection_domain *domain)
+static void domain_flush_tlb_pde(struct protection_domain *domain)
{
- __iommu_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1);
-}
-
-
-/*
- * This function flushes the DTEs for all devices in domain
- */
-static void iommu_flush_domain_devices(struct protection_domain *domain)
-{
- struct iommu_dev_data *dev_data;
- unsigned long flags;
-
- spin_lock_irqsave(&domain->lock, flags);
-
- list_for_each_entry(dev_data, &domain->dev_list, list)
- iommu_flush_device(dev_data->dev);
-
- spin_unlock_irqrestore(&domain->lock, flags);
+ __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1);
}
-static void iommu_flush_all_domain_devices(void)
+static void domain_flush_complete(struct protection_domain *domain)
{
- struct protection_domain *domain;
- unsigned long flags;
+ int i;
- spin_lock_irqsave(&amd_iommu_pd_lock, flags);
+ for (i = 0; i < amd_iommus_present; ++i) {
+ if (!domain->dev_iommu[i])
+ continue;
- list_for_each_entry(domain, &amd_iommu_pd_list, list) {
- iommu_flush_domain_devices(domain);
- iommu_flush_complete(domain);
+ /*
+ * Devices of this domain are behind this IOMMU
+ * We need to wait for completion of all commands.
+ */
+ iommu_completion_wait(amd_iommus[i]);
}
-
- spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
}
-void amd_iommu_flush_all_devices(void)
-{
- iommu_flush_all_domain_devices();
-}
/*
- * This function uses heavy locking and may disable irqs for some time. But
- * this is no issue because it is only called during resume.
+ * This function flushes the DTEs for all devices in domain
*/
-void amd_iommu_flush_all_domains(void)
+static void domain_flush_devices(struct protection_domain *domain)
{
- struct protection_domain *domain;
+ struct iommu_dev_data *dev_data;
unsigned long flags;
- spin_lock_irqsave(&amd_iommu_pd_lock, flags);
-
- list_for_each_entry(domain, &amd_iommu_pd_list, list) {
- spin_lock(&domain->lock);
- iommu_flush_tlb_pde(domain);
- iommu_flush_complete(domain);
- spin_unlock(&domain->lock);
- }
-
- spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
-}
-
-static void reset_iommu_command_buffer(struct amd_iommu *iommu)
-{
- pr_err("AMD-Vi: Resetting IOMMU command buffer\n");
-
- if (iommu->reset_in_progress)
- panic("AMD-Vi: ILLEGAL_COMMAND_ERROR while resetting command buffer\n");
+ spin_lock_irqsave(&domain->lock, flags);
- amd_iommu_reset_cmd_buffer(iommu);
- amd_iommu_flush_all_devices();
- amd_iommu_flush_all_domains();
+ list_for_each_entry(dev_data, &domain->dev_list, list)
+ device_flush_dte(dev_data->dev);
- iommu->reset_in_progress = false;
+ spin_unlock_irqrestore(&domain->lock, flags);
}
/****************************************************************************
return domain->flags & PD_DMA_OPS_MASK;
}
-static void set_dte_entry(u16 devid, struct protection_domain *domain)
+static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
{
u64 pte_root = virt_to_phys(domain->pt_root);
+ u32 flags = 0;
pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= IOMMU_PTE_IR | IOMMU_PTE_IW | IOMMU_PTE_P | IOMMU_PTE_TV;
- amd_iommu_dev_table[devid].data[2] = domain->id;
- amd_iommu_dev_table[devid].data[1] = upper_32_bits(pte_root);
- amd_iommu_dev_table[devid].data[0] = lower_32_bits(pte_root);
+ if (ats)
+ flags |= DTE_FLAG_IOTLB;
+
+ amd_iommu_dev_table[devid].data[3] |= flags;
+ amd_iommu_dev_table[devid].data[2] = domain->id;
+ amd_iommu_dev_table[devid].data[1] = upper_32_bits(pte_root);
+ amd_iommu_dev_table[devid].data[0] = lower_32_bits(pte_root);
}
static void clear_dte_entry(u16 devid)
{
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu;
+ struct pci_dev *pdev;
+ bool ats = false;
u16 devid;
devid = get_device_id(dev);
iommu = amd_iommu_rlookup_table[devid];
dev_data = get_dev_data(dev);
+ pdev = to_pci_dev(dev);
+
+ if (amd_iommu_iotlb_sup)
+ ats = pci_ats_enabled(pdev);
/* Update data structures */
dev_data->domain = domain;
list_add(&dev_data->list, &domain->dev_list);
- set_dte_entry(devid, domain);
+ set_dte_entry(devid, domain, ats);
/* Do reference counting */
domain->dev_iommu[iommu->index] += 1;
domain->dev_cnt += 1;
/* Flush the DTE entry */
- iommu_flush_device(dev);
+ device_flush_dte(dev);
}
static void do_detach(struct device *dev)
{
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu;
+ struct pci_dev *pdev;
u16 devid;
devid = get_device_id(dev);
iommu = amd_iommu_rlookup_table[devid];
dev_data = get_dev_data(dev);
+ pdev = to_pci_dev(dev);
/* decrease reference counters */
dev_data->domain->dev_iommu[iommu->index] -= 1;
clear_dte_entry(devid);
/* Flush the DTE entry */
- iommu_flush_device(dev);
+ device_flush_dte(dev);
}
/*
static int attach_device(struct device *dev,
struct protection_domain *domain)
{
+ struct pci_dev *pdev = to_pci_dev(dev);
unsigned long flags;
int ret;
+ if (amd_iommu_iotlb_sup)
+ pci_enable_ats(pdev, PAGE_SHIFT);
+
write_lock_irqsave(&amd_iommu_devtable_lock, flags);
ret = __attach_device(dev, domain);
write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
* left the caches in the IOMMU dirty. So we have to flush
* here to evict all dirty stuff.
*/
- iommu_flush_tlb_pde(domain);
+ domain_flush_tlb_pde(domain);
return ret;
}
*/
static void detach_device(struct device *dev)
{
+ struct pci_dev *pdev = to_pci_dev(dev);
unsigned long flags;
/* lock device table */
write_lock_irqsave(&amd_iommu_devtable_lock, flags);
__detach_device(dev);
write_unlock_irqrestore(&amd_iommu_devtable_lock, flags);
+
+ if (amd_iommu_iotlb_sup && pci_ats_enabled(pdev))
+ pci_disable_ats(pdev);
}
/*
goto out;
}
- iommu_flush_device(dev);
+ device_flush_dte(dev);
iommu_completion_wait(iommu);
out:
struct iommu_dev_data *dev_data;
list_for_each_entry(dev_data, &domain->dev_list, list) {
+ struct pci_dev *pdev = to_pci_dev(dev_data->dev);
u16 devid = get_device_id(dev_data->dev);
- set_dte_entry(devid, domain);
+ set_dte_entry(devid, domain, pci_ats_enabled(pdev));
}
}
return;
update_device_table(domain);
- iommu_flush_domain_devices(domain);
- iommu_flush_tlb_pde(domain);
+
+ domain_flush_devices(domain);
+ domain_flush_tlb_pde(domain);
domain->updated = false;
}
ADD_STATS_COUNTER(alloced_io_mem, size);
if (unlikely(dma_dom->need_flush && !amd_iommu_unmap_flush)) {
- iommu_flush_tlb(&dma_dom->domain);
+ domain_flush_tlb(&dma_dom->domain);
dma_dom->need_flush = false;
} else if (unlikely(amd_iommu_np_cache))
- iommu_flush_pages(&dma_dom->domain, address, size);
+ domain_flush_pages(&dma_dom->domain, address, size);
out:
return address;
dma_ops_free_addresses(dma_dom, dma_addr, pages);
if (amd_iommu_unmap_flush || dma_dom->need_flush) {
- iommu_flush_pages(&dma_dom->domain, flush_addr, size);
+ domain_flush_pages(&dma_dom->domain, flush_addr, size);
dma_dom->need_flush = false;
}
}
if (addr == DMA_ERROR_CODE)
goto out;
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
out:
spin_unlock_irqrestore(&domain->lock, flags);
__unmap_single(domain->priv, dma_addr, size, dir);
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
spin_unlock_irqrestore(&domain->lock, flags);
}
goto unmap;
}
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
out:
spin_unlock_irqrestore(&domain->lock, flags);
s->dma_address = s->dma_length = 0;
}
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
spin_unlock_irqrestore(&domain->lock, flags);
}
goto out_free;
}
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
spin_unlock_irqrestore(&domain->lock, flags);
__unmap_single(domain->priv, dma_addr, size, DMA_BIDIRECTIONAL);
- iommu_flush_complete(domain);
+ domain_flush_complete(domain);
spin_unlock_irqrestore(&domain->lock, flags);
if (!iommu)
return;
- iommu_flush_device(dev);
+ device_flush_dte(dev);
iommu_completion_wait(iommu);
}
unmap_size = iommu_unmap_page(domain, iova, page_size);
mutex_unlock(&domain->api_lock);
- iommu_flush_tlb_pde(domain);
+ domain_flush_tlb_pde(domain);
return get_order(unmap_size);
}
/* IOMMUs have a non-present cache? */
bool amd_iommu_np_cache __read_mostly;
+bool amd_iommu_iotlb_sup __read_mostly = true;
/*
* The ACPI table parsing functions set this variable on an error
static u32 alias_table_size; /* size of the alias table */
static u32 rlookup_table_size; /* size if the rlookup table */
+/*
+ * This function flushes all internal caches of
+ * the IOMMU used by this driver.
+ */
+extern void iommu_flush_all_caches(struct amd_iommu *iommu);
+
static inline void update_last_devid(u16 devid)
{
if (devid > amd_iommu_last_bdf)
/* Function to enable the hardware */
static void iommu_enable(struct amd_iommu *iommu)
{
- printk(KERN_INFO "AMD-Vi: Enabling IOMMU at %s cap 0x%hx\n",
+ static const char * const feat_str[] = {
+ "PreF", "PPR", "X2APIC", "NX", "GT", "[5]",
+ "IA", "GA", "HE", "PC", NULL
+ };
+ int i;
+
+ printk(KERN_INFO "AMD-Vi: Enabling IOMMU at %s cap 0x%hx",
dev_name(&iommu->dev->dev), iommu->cap_ptr);
+ if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
+ printk(KERN_CONT " extended features: ");
+ for (i = 0; feat_str[i]; ++i)
+ if (iommu_feature(iommu, (1ULL << i)))
+ printk(KERN_CONT " %s", feat_str[i]);
+ }
+ printk(KERN_CONT "\n");
+
iommu_feature_enable(iommu, CONTROL_IOMMU_EN);
}
static void __init init_iommu_from_pci(struct amd_iommu *iommu)
{
int cap_ptr = iommu->cap_ptr;
- u32 range, misc;
+ u32 range, misc, low, high;
int i, j;
pci_read_config_dword(iommu->dev, cap_ptr + MMIO_CAP_HDR_OFFSET,
MMIO_GET_LD(range));
iommu->evt_msi_num = MMIO_MSI_NUM(misc);
+ if (!(iommu->cap & (1 << IOMMU_CAP_IOTLB)))
+ amd_iommu_iotlb_sup = false;
+
+ /* read extended feature bits */
+ low = readl(iommu->mmio_base + MMIO_EXT_FEATURES);
+ high = readl(iommu->mmio_base + MMIO_EXT_FEATURES + 4);
+
+ iommu->features = ((u64)high << 32) | low;
+
if (!is_rd890_iommu(iommu->dev))
return;
if (pci_enable_msi(iommu->dev))
return 1;
- r = request_irq(iommu->dev->irq, amd_iommu_int_handler,
- IRQF_SAMPLE_RANDOM,
- "AMD-Vi",
- NULL);
+ r = request_threaded_irq(iommu->dev->irq,
+ amd_iommu_int_handler,
+ amd_iommu_int_thread,
+ 0, "AMD-Vi",
+ iommu->dev);
if (r) {
pci_disable_msi(iommu->dev);
iommu_set_exclusion_range(iommu);
iommu_init_msi(iommu);
iommu_enable(iommu);
+ iommu_flush_all_caches(iommu);
}
}
* we have to flush after the IOMMUs are enabled because a
* disabled IOMMU will never execute the commands we send
*/
- amd_iommu_flush_all_devices();
- amd_iommu_flush_all_domains();
+ for_each_iommu(iommu)
+ iommu_flush_all_caches(iommu);
}
static int amd_iommu_suspend(void)
.rating = APBT_CLOCKSOURCE_RATING,
.read = apbt_read_clocksource,
.mask = APBT_MASK,
- .shift = APBT_SHIFT,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = apbt_restart_clocksource,
};
if (t1 == apbt_read_clocksource(&clocksource_apbt))
panic("APBT counter not counting. APBT disabled\n");
- /*
- * initialize and register APBT clocksource
- * convert that to ns/clock cycle
- * mult = (ns/c) * 2^APBT_SHIFT
- */
- clocksource_apbt.mult = div_sc(MSEC_PER_SEC,
- (unsigned long) apbt_freq, APBT_SHIFT);
- clocksource_register(&clocksource_apbt);
+ clocksource_register_khz(&clocksource_apbt, (u32)apbt_freq*1000);
return 0;
}
*/
void smp_error_interrupt(struct pt_regs *regs)
{
- u32 v, v1;
+ u32 v0, v1;
+ u32 i = 0;
+ static const char * const error_interrupt_reason[] = {
+ "Send CS error", /* APIC Error Bit 0 */
+ "Receive CS error", /* APIC Error Bit 1 */
+ "Send accept error", /* APIC Error Bit 2 */
+ "Receive accept error", /* APIC Error Bit 3 */
+ "Redirectable IPI", /* APIC Error Bit 4 */
+ "Send illegal vector", /* APIC Error Bit 5 */
+ "Received illegal vector", /* APIC Error Bit 6 */
+ "Illegal register address", /* APIC Error Bit 7 */
+ };
exit_idle();
irq_enter();
/* First tickle the hardware, only then report what went on. -- REW */
- v = apic_read(APIC_ESR);
+ v0 = apic_read(APIC_ESR);
apic_write(APIC_ESR, 0);
v1 = apic_read(APIC_ESR);
ack_APIC_irq();
atomic_inc(&irq_err_count);
- /*
- * Here is what the APIC error bits mean:
- * 0: Send CS error
- * 1: Receive CS error
- * 2: Send accept error
- * 3: Receive accept error
- * 4: Reserved
- * 5: Send illegal vector
- * 6: Received illegal vector
- * 7: Illegal register address
- */
- pr_debug("APIC error on CPU%d: %02x(%02x)\n",
- smp_processor_id(), v , v1);
+ apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
+ smp_processor_id(), v0 , v1);
+
+ v1 = v1 & 0xff;
+ while (v1) {
+ if (v1 & 0x1)
+ apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
+ i++;
+ v1 >>= 1;
+ };
+
+ apic_printk(APIC_DEBUG, KERN_CONT "\n");
+
irq_exit();
}
}
early_param("noapic", parse_noapic);
-static int io_apic_setup_irq_pin_once(unsigned int irq, int node,
- struct io_apic_irq_attr *attr);
+static int io_apic_setup_irq_pin(unsigned int irq, int node,
+ struct io_apic_irq_attr *attr);
/* Will be called in mpparse/acpi/sfi codes for saving IRQ info */
void mp_save_irq(struct mpc_intsrc *m)
}
#endif /* CONFIG_HT_IRQ */
-int
+static int
io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr)
{
struct irq_cfg *cfg = alloc_irq_and_cfg_at(irq, node);
return ret;
}
-static int io_apic_setup_irq_pin_once(unsigned int irq, int node,
- struct io_apic_irq_attr *attr)
+int io_apic_setup_irq_pin_once(unsigned int irq, int node,
+ struct io_apic_irq_attr *attr)
{
unsigned int id = attr->ioapic, pin = attr->ioapic_pin;
int ret;
#include <asm/smp.h>
#include <asm/x86_init.h>
#include <asm/emergency-restart.h>
+#include <asm/nmi.h>
+
+/* BMC sets a bit this MMR non-zero before sending an NMI */
+#define UVH_NMI_MMR UVH_SCRATCH5
+#define UVH_NMI_MMR_CLEAR (UVH_NMI_MMR + 8)
+#define UV_NMI_PENDING_MASK (1UL << 63)
+DEFINE_PER_CPU(unsigned long, cpu_last_nmi_count);
DEFINE_PER_CPU(int, x2apic_extra_bits);
*/
int uv_handle_nmi(struct notifier_block *self, unsigned long reason, void *data)
{
+ unsigned long real_uv_nmi;
+ int bid;
+
if (reason != DIE_NMIUNKNOWN)
return NOTIFY_OK;
if (in_crash_kexec)
/* do nothing if entering the crash kernel */
return NOTIFY_OK;
+
/*
- * Use a lock so only one cpu prints at a time
- * to prevent intermixed output.
+ * Each blade has an MMR that indicates when an NMI has been sent
+ * to cpus on the blade. If an NMI is detected, atomically
+ * clear the MMR and update a per-blade NMI count used to
+ * cause each cpu on the blade to notice a new NMI.
+ */
+ bid = uv_numa_blade_id();
+ real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
+
+ if (unlikely(real_uv_nmi)) {
+ spin_lock(&uv_blade_info[bid].nmi_lock);
+ real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
+ if (real_uv_nmi) {
+ uv_blade_info[bid].nmi_count++;
+ uv_write_local_mmr(UVH_NMI_MMR_CLEAR, UV_NMI_PENDING_MASK);
+ }
+ spin_unlock(&uv_blade_info[bid].nmi_lock);
+ }
+
+ if (likely(__get_cpu_var(cpu_last_nmi_count) == uv_blade_info[bid].nmi_count))
+ return NOTIFY_DONE;
+
+ __get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
+
+ /*
+ * Use a lock so only one cpu prints at a time.
+ * This prevents intermixed output.
*/
spin_lock(&uv_nmi_lock);
- pr_info("NMI stack dump cpu %u:\n", smp_processor_id());
+ pr_info("UV NMI stack dump cpu %u:\n", smp_processor_id());
dump_stack();
spin_unlock(&uv_nmi_lock);
}
static struct notifier_block uv_dump_stack_nmi_nb = {
- .notifier_call = uv_handle_nmi
+ .notifier_call = uv_handle_nmi,
+ .priority = NMI_LOCAL_LOW_PRIOR - 1,
};
void uv_register_nmi_notifier(void)
printk(KERN_DEBUG "UV: Found %d blades\n", uv_num_possible_blades());
bytes = sizeof(struct uv_blade_info) * uv_num_possible_blades();
- uv_blade_info = kmalloc(bytes, GFP_KERNEL);
+ uv_blade_info = kzalloc(bytes, GFP_KERNEL);
BUG_ON(!uv_blade_info);
+
for (blade = 0; blade < uv_num_possible_blades(); blade++)
uv_blade_info[blade].memory_nid = -1;
uv_blade_info[blade].pnode = pnode;
uv_blade_info[blade].nr_possible_cpus = 0;
uv_blade_info[blade].nr_online_cpus = 0;
+ spin_lock_init(&uv_blade_info[blade].nmi_lock);
max_pnode = max(pnode, max_pnode);
blade++;
}
#include <linux/kthread.h>
#include <linux/jiffies.h>
#include <linux/acpi.h>
+#include <linux/syscore_ops.h>
#include <asm/system.h>
#include <asm/uaccess.h>
dpm_suspend_noirq(PMSG_SUSPEND);
local_irq_disable();
- sysdev_suspend(PMSG_SUSPEND);
+ syscore_suspend();
local_irq_enable();
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
- sysdev_resume();
+ syscore_resume();
local_irq_enable();
dpm_resume_noirq(PMSG_RESUME);
dpm_suspend_noirq(PMSG_SUSPEND);
local_irq_disable();
- sysdev_suspend(PMSG_SUSPEND);
+ syscore_suspend();
local_irq_enable();
err = set_system_power_state(APM_STATE_STANDBY);
apm_error("standby", err);
local_irq_disable();
- sysdev_resume();
+ syscore_resume();
local_irq_enable();
dpm_resume_noirq(PMSG_RESUME);
obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
-obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
#endif
/* As a rule processors have APIC timer running in deep C states */
- if (c->x86 >= 0xf && !cpu_has_amd_erratum(amd_erratum_400))
+ if (c->x86 > 0xf && !cpu_has_amd_erratum(amd_erratum_400))
set_cpu_cap(c, X86_FEATURE_ARAT);
/*
cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
- if (eax > 0)
- c->x86_capability[9] = ebx;
+ c->x86_capability[9] = ebx;
}
/* AMD-defined flags: level 0x80000001 */
+++ /dev/null
-#
-# CPU Frequency scaling
-#
-
-menu "CPU Frequency scaling"
-
-source "drivers/cpufreq/Kconfig"
-
-if CPU_FREQ
-
-comment "CPUFreq processor drivers"
-
-config X86_PCC_CPUFREQ
- tristate "Processor Clocking Control interface driver"
- depends on ACPI && ACPI_PROCESSOR
- help
- This driver adds support for the PCC interface.
-
- For details, take a look at:
- <file:Documentation/cpu-freq/pcc-cpufreq.txt>.
-
- To compile this driver as a module, choose M here: the
- module will be called pcc-cpufreq.
-
- If in doubt, say N.
-
-config X86_ACPI_CPUFREQ
- tristate "ACPI Processor P-States driver"
- select CPU_FREQ_TABLE
- depends on ACPI_PROCESSOR
- help
- This driver adds a CPUFreq driver which utilizes the ACPI
- Processor Performance States.
- This driver also supports Intel Enhanced Speedstep.
-
- To compile this driver as a module, choose M here: the
- module will be called acpi-cpufreq.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config ELAN_CPUFREQ
- tristate "AMD Elan SC400 and SC410"
- select CPU_FREQ_TABLE
- depends on X86_ELAN
- ---help---
- This adds the CPUFreq driver for AMD Elan SC400 and SC410
- processors.
-
- You need to specify the processor maximum speed as boot
- parameter: elanfreq=maxspeed (in kHz) or as module
- parameter "max_freq".
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config SC520_CPUFREQ
- tristate "AMD Elan SC520"
- select CPU_FREQ_TABLE
- depends on X86_ELAN
- ---help---
- This adds the CPUFreq driver for AMD Elan SC520 processor.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-
-config X86_POWERNOW_K6
- tristate "AMD Mobile K6-2/K6-3 PowerNow!"
- select CPU_FREQ_TABLE
- depends on X86_32
- help
- This adds the CPUFreq driver for mobile AMD K6-2+ and mobile
- AMD K6-3+ processors.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_POWERNOW_K7
- tristate "AMD Mobile Athlon/Duron PowerNow!"
- select CPU_FREQ_TABLE
- depends on X86_32
- help
- This adds the CPUFreq driver for mobile AMD K7 mobile processors.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_POWERNOW_K7_ACPI
- bool
- depends on X86_POWERNOW_K7 && ACPI_PROCESSOR
- depends on !(X86_POWERNOW_K7 = y && ACPI_PROCESSOR = m)
- depends on X86_32
- default y
-
-config X86_POWERNOW_K8
- tristate "AMD Opteron/Athlon64 PowerNow!"
- select CPU_FREQ_TABLE
- depends on ACPI && ACPI_PROCESSOR
- help
- This adds the CPUFreq driver for K8/K10 Opteron/Athlon64 processors.
-
- To compile this driver as a module, choose M here: the
- module will be called powernow-k8.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
-config X86_GX_SUSPMOD
- tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
- depends on X86_32 && PCI
- help
- This add the CPUFreq driver for NatSemi Geode processors which
- support suspend modulation.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_SPEEDSTEP_CENTRINO
- tristate "Intel Enhanced SpeedStep (deprecated)"
- select CPU_FREQ_TABLE
- select X86_SPEEDSTEP_CENTRINO_TABLE if X86_32
- depends on X86_32 || (X86_64 && ACPI_PROCESSOR)
- help
- This is deprecated and this functionality is now merged into
- acpi_cpufreq (X86_ACPI_CPUFREQ). Use that driver instead of
- speedstep_centrino.
- This adds the CPUFreq driver for Enhanced SpeedStep enabled
- mobile CPUs. This means Intel Pentium M (Centrino) CPUs
- or 64bit enabled Intel Xeons.
-
- To compile this driver as a module, choose M here: the
- module will be called speedstep-centrino.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_SPEEDSTEP_CENTRINO_TABLE
- bool "Built-in tables for Banias CPUs"
- depends on X86_32 && X86_SPEEDSTEP_CENTRINO
- default y
- help
- Use built-in tables for Banias CPUs if ACPI encoding
- is not available.
-
- If in doubt, say N.
-
-config X86_SPEEDSTEP_ICH
- tristate "Intel Speedstep on ICH-M chipsets (ioport interface)"
- select CPU_FREQ_TABLE
- depends on X86_32
- help
- This adds the CPUFreq driver for certain mobile Intel Pentium III
- (Coppermine), all mobile Intel Pentium III-M (Tualatin) and all
- mobile Intel Pentium 4 P4-M on systems which have an Intel ICH2,
- ICH3 or ICH4 southbridge.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_SPEEDSTEP_SMI
- tristate "Intel SpeedStep on 440BX/ZX/MX chipsets (SMI interface)"
- select CPU_FREQ_TABLE
- depends on X86_32 && EXPERIMENTAL
- help
- This adds the CPUFreq driver for certain mobile Intel Pentium III
- (Coppermine), all mobile Intel Pentium III-M (Tualatin)
- on systems which have an Intel 440BX/ZX/MX southbridge.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_P4_CLOCKMOD
- tristate "Intel Pentium 4 clock modulation"
- select CPU_FREQ_TABLE
- help
- This adds the CPUFreq driver for Intel Pentium 4 / XEON
- processors. When enabled it will lower CPU temperature by skipping
- clocks.
-
- This driver should be only used in exceptional
- circumstances when very low power is needed because it causes severe
- slowdowns and noticeable latencies. Normally Speedstep should be used
- instead.
-
- To compile this driver as a module, choose M here: the
- module will be called p4-clockmod.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- Unless you are absolutely sure say N.
-
-config X86_CPUFREQ_NFORCE2
- tristate "nVidia nForce2 FSB changing"
- depends on X86_32 && EXPERIMENTAL
- help
- This adds the CPUFreq driver for FSB changing on nVidia nForce2
- platforms.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_LONGRUN
- tristate "Transmeta LongRun"
- depends on X86_32
- help
- This adds the CPUFreq driver for Transmeta Crusoe and Efficeon processors
- which support LongRun.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_LONGHAUL
- tristate "VIA Cyrix III Longhaul"
- select CPU_FREQ_TABLE
- depends on X86_32 && ACPI_PROCESSOR
- help
- This adds the CPUFreq driver for VIA Samuel/CyrixIII,
- VIA Cyrix Samuel/C3, VIA Cyrix Ezra and VIA Cyrix Ezra-T
- processors.
-
- For details, take a look at <file:Documentation/cpu-freq/>.
-
- If in doubt, say N.
-
-config X86_E_POWERSAVER
- tristate "VIA C7 Enhanced PowerSaver (DANGEROUS)"
- select CPU_FREQ_TABLE
- depends on X86_32 && EXPERIMENTAL
- help
- This adds the CPUFreq driver for VIA C7 processors. However, this driver
- does not have any safeguards to prevent operating the CPU out of spec
- and is thus considered dangerous. Please use the regular ACPI cpufreq
- driver, enabled by CONFIG_X86_ACPI_CPUFREQ.
-
- If in doubt, say N.
-
-comment "shared options"
-
-config X86_SPEEDSTEP_LIB
- tristate
- default (X86_SPEEDSTEP_ICH || X86_SPEEDSTEP_SMI || X86_P4_CLOCKMOD)
-
-config X86_SPEEDSTEP_RELAXED_CAP_CHECK
- bool "Relaxed speedstep capability checks"
- depends on X86_32 && (X86_SPEEDSTEP_SMI || X86_SPEEDSTEP_ICH)
- help
- Don't perform all checks for a speedstep capable system which would
- normally be done. Some ancient or strange systems, though speedstep
- capable, don't always indicate that they are speedstep capable. This
- option lets the probing code bypass some of those checks if the
- parameter "relaxed_check=1" is passed to the module.
-
-endif # CPU_FREQ
-
-endmenu
+++ /dev/null
-# Link order matters. K8 is preferred to ACPI because of firmware bugs in early
-# K8 systems. ACPI is preferred to all other hardware-specific drivers.
-# speedstep-* is preferred over p4-clockmod.
-
-obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o mperf.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o mperf.o
-obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o
-obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o
-obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o
-obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
-obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
-obj-$(CONFIG_ELAN_CPUFREQ) += elanfreq.o
-obj-$(CONFIG_SC520_CPUFREQ) += sc520_freq.o
-obj-$(CONFIG_X86_LONGRUN) += longrun.o
-obj-$(CONFIG_X86_GX_SUSPMOD) += gx-suspmod.o
-obj-$(CONFIG_X86_SPEEDSTEP_ICH) += speedstep-ich.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
-obj-$(CONFIG_X86_SPEEDSTEP_SMI) += speedstep-smi.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
-obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
+++ /dev/null
-/*
- * acpi-cpufreq.c - ACPI Processor P-States Driver
- *
- * Copyright (C) 2001, 2002 Andy Grover <andrew.grover@intel.com>
- * Copyright (C) 2001, 2002 Paul Diefenbaugh <paul.s.diefenbaugh@intel.com>
- * Copyright (C) 2002 - 2004 Dominik Brodowski <linux@brodo.de>
- * Copyright (C) 2006 Denis Sadykov <denis.m.sadykov@intel.com>
- *
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
- *
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/smp.h>
-#include <linux/sched.h>
-#include <linux/cpufreq.h>
-#include <linux/compiler.h>
-#include <linux/dmi.h>
-#include <linux/slab.h>
-
-#include <linux/acpi.h>
-#include <linux/io.h>
-#include <linux/delay.h>
-#include <linux/uaccess.h>
-
-#include <acpi/processor.h>
-
-#include <asm/msr.h>
-#include <asm/processor.h>
-#include <asm/cpufeature.h>
-#include "mperf.h"
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "acpi-cpufreq", msg)
-
-MODULE_AUTHOR("Paul Diefenbaugh, Dominik Brodowski");
-MODULE_DESCRIPTION("ACPI Processor P-States Driver");
-MODULE_LICENSE("GPL");
-
-enum {
- UNDEFINED_CAPABLE = 0,
- SYSTEM_INTEL_MSR_CAPABLE,
- SYSTEM_IO_CAPABLE,
-};
-
-#define INTEL_MSR_RANGE (0xffff)
-
-struct acpi_cpufreq_data {
- struct acpi_processor_performance *acpi_data;
- struct cpufreq_frequency_table *freq_table;
- unsigned int resume;
- unsigned int cpu_feature;
-};
-
-static DEFINE_PER_CPU(struct acpi_cpufreq_data *, acfreq_data);
-
-/* acpi_perf_data is a pointer to percpu data. */
-static struct acpi_processor_performance __percpu *acpi_perf_data;
-
-static struct cpufreq_driver acpi_cpufreq_driver;
-
-static unsigned int acpi_pstate_strict;
-
-static int check_est_cpu(unsigned int cpuid)
-{
- struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
-
- return cpu_has(cpu, X86_FEATURE_EST);
-}
-
-static unsigned extract_io(u32 value, struct acpi_cpufreq_data *data)
-{
- struct acpi_processor_performance *perf;
- int i;
-
- perf = data->acpi_data;
-
- for (i = 0; i < perf->state_count; i++) {
- if (value == perf->states[i].status)
- return data->freq_table[i].frequency;
- }
- return 0;
-}
-
-static unsigned extract_msr(u32 msr, struct acpi_cpufreq_data *data)
-{
- int i;
- struct acpi_processor_performance *perf;
-
- msr &= INTEL_MSR_RANGE;
- perf = data->acpi_data;
-
- for (i = 0; data->freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
- if (msr == perf->states[data->freq_table[i].index].status)
- return data->freq_table[i].frequency;
- }
- return data->freq_table[0].frequency;
-}
-
-static unsigned extract_freq(u32 val, struct acpi_cpufreq_data *data)
-{
- switch (data->cpu_feature) {
- case SYSTEM_INTEL_MSR_CAPABLE:
- return extract_msr(val, data);
- case SYSTEM_IO_CAPABLE:
- return extract_io(val, data);
- default:
- return 0;
- }
-}
-
-struct msr_addr {
- u32 reg;
-};
-
-struct io_addr {
- u16 port;
- u8 bit_width;
-};
-
-struct drv_cmd {
- unsigned int type;
- const struct cpumask *mask;
- union {
- struct msr_addr msr;
- struct io_addr io;
- } addr;
- u32 val;
-};
-
-/* Called via smp_call_function_single(), on the target CPU */
-static void do_drv_read(void *_cmd)
-{
- struct drv_cmd *cmd = _cmd;
- u32 h;
-
- switch (cmd->type) {
- case SYSTEM_INTEL_MSR_CAPABLE:
- rdmsr(cmd->addr.msr.reg, cmd->val, h);
- break;
- case SYSTEM_IO_CAPABLE:
- acpi_os_read_port((acpi_io_address)cmd->addr.io.port,
- &cmd->val,
- (u32)cmd->addr.io.bit_width);
- break;
- default:
- break;
- }
-}
-
-/* Called via smp_call_function_many(), on the target CPUs */
-static void do_drv_write(void *_cmd)
-{
- struct drv_cmd *cmd = _cmd;
- u32 lo, hi;
-
- switch (cmd->type) {
- case SYSTEM_INTEL_MSR_CAPABLE:
- rdmsr(cmd->addr.msr.reg, lo, hi);
- lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
- wrmsr(cmd->addr.msr.reg, lo, hi);
- break;
- case SYSTEM_IO_CAPABLE:
- acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
- cmd->val,
- (u32)cmd->addr.io.bit_width);
- break;
- default:
- break;
- }
-}
-
-static void drv_read(struct drv_cmd *cmd)
-{
- int err;
- cmd->val = 0;
-
- err = smp_call_function_any(cmd->mask, do_drv_read, cmd, 1);
- WARN_ON_ONCE(err); /* smp_call_function_any() was buggy? */
-}
-
-static void drv_write(struct drv_cmd *cmd)
-{
- int this_cpu;
-
- this_cpu = get_cpu();
- if (cpumask_test_cpu(this_cpu, cmd->mask))
- do_drv_write(cmd);
- smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
- put_cpu();
-}
-
-static u32 get_cur_val(const struct cpumask *mask)
-{
- struct acpi_processor_performance *perf;
- struct drv_cmd cmd;
-
- if (unlikely(cpumask_empty(mask)))
- return 0;
-
- switch (per_cpu(acfreq_data, cpumask_first(mask))->cpu_feature) {
- case SYSTEM_INTEL_MSR_CAPABLE:
- cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
- cmd.addr.msr.reg = MSR_IA32_PERF_STATUS;
- break;
- case SYSTEM_IO_CAPABLE:
- cmd.type = SYSTEM_IO_CAPABLE;
- perf = per_cpu(acfreq_data, cpumask_first(mask))->acpi_data;
- cmd.addr.io.port = perf->control_register.address;
- cmd.addr.io.bit_width = perf->control_register.bit_width;
- break;
- default:
- return 0;
- }
-
- cmd.mask = mask;
- drv_read(&cmd);
-
- dprintk("get_cur_val = %u\n", cmd.val);
-
- return cmd.val;
-}
-
-static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
-{
- struct acpi_cpufreq_data *data = per_cpu(acfreq_data, cpu);
- unsigned int freq;
- unsigned int cached_freq;
-
- dprintk("get_cur_freq_on_cpu (%d)\n", cpu);
-
- if (unlikely(data == NULL ||
- data->acpi_data == NULL || data->freq_table == NULL)) {
- return 0;
- }
-
- cached_freq = data->freq_table[data->acpi_data->state].frequency;
- freq = extract_freq(get_cur_val(cpumask_of(cpu)), data);
- if (freq != cached_freq) {
- /*
- * The dreaded BIOS frequency change behind our back.
- * Force set the frequency on next target call.
- */
- data->resume = 1;
- }
-
- dprintk("cur freq = %u\n", freq);
-
- return freq;
-}
-
-static unsigned int check_freqs(const struct cpumask *mask, unsigned int freq,
- struct acpi_cpufreq_data *data)
-{
- unsigned int cur_freq;
- unsigned int i;
-
- for (i = 0; i < 100; i++) {
- cur_freq = extract_freq(get_cur_val(mask), data);
- if (cur_freq == freq)
- return 1;
- udelay(10);
- }
- return 0;
-}
-
-static int acpi_cpufreq_target(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
-{
- struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
- struct acpi_processor_performance *perf;
- struct cpufreq_freqs freqs;
- struct drv_cmd cmd;
- unsigned int next_state = 0; /* Index into freq_table */
- unsigned int next_perf_state = 0; /* Index into perf table */
- unsigned int i;
- int result = 0;
-
- dprintk("acpi_cpufreq_target %d (%d)\n", target_freq, policy->cpu);
-
- if (unlikely(data == NULL ||
- data->acpi_data == NULL || data->freq_table == NULL)) {
- return -ENODEV;
- }
-
- perf = data->acpi_data;
- result = cpufreq_frequency_table_target(policy,
- data->freq_table,
- target_freq,
- relation, &next_state);
- if (unlikely(result)) {
- result = -ENODEV;
- goto out;
- }
-
- next_perf_state = data->freq_table[next_state].index;
- if (perf->state == next_perf_state) {
- if (unlikely(data->resume)) {
- dprintk("Called after resume, resetting to P%d\n",
- next_perf_state);
- data->resume = 0;
- } else {
- dprintk("Already at target state (P%d)\n",
- next_perf_state);
- goto out;
- }
- }
-
- switch (data->cpu_feature) {
- case SYSTEM_INTEL_MSR_CAPABLE:
- cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
- cmd.addr.msr.reg = MSR_IA32_PERF_CTL;
- cmd.val = (u32) perf->states[next_perf_state].control;
- break;
- case SYSTEM_IO_CAPABLE:
- cmd.type = SYSTEM_IO_CAPABLE;
- cmd.addr.io.port = perf->control_register.address;
- cmd.addr.io.bit_width = perf->control_register.bit_width;
- cmd.val = (u32) perf->states[next_perf_state].control;
- break;
- default:
- result = -ENODEV;
- goto out;
- }
-
- /* cpufreq holds the hotplug lock, so we are safe from here on */
- if (policy->shared_type != CPUFREQ_SHARED_TYPE_ANY)
- cmd.mask = policy->cpus;
- else
- cmd.mask = cpumask_of(policy->cpu);
-
- freqs.old = perf->states[perf->state].core_frequency * 1000;
- freqs.new = data->freq_table[next_state].frequency;
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- }
-
- drv_write(&cmd);
-
- if (acpi_pstate_strict) {
- if (!check_freqs(cmd.mask, freqs.new, data)) {
- dprintk("acpi_cpufreq_target failed (%d)\n",
- policy->cpu);
- result = -EAGAIN;
- goto out;
- }
- }
-
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
- perf->state = next_perf_state;
-
-out:
- return result;
-}
-
-static int acpi_cpufreq_verify(struct cpufreq_policy *policy)
-{
- struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
-
- dprintk("acpi_cpufreq_verify\n");
-
- return cpufreq_frequency_table_verify(policy, data->freq_table);
-}
-
-static unsigned long
-acpi_cpufreq_guess_freq(struct acpi_cpufreq_data *data, unsigned int cpu)
-{
- struct acpi_processor_performance *perf = data->acpi_data;
-
- if (cpu_khz) {
- /* search the closest match to cpu_khz */
- unsigned int i;
- unsigned long freq;
- unsigned long freqn = perf->states[0].core_frequency * 1000;
-
- for (i = 0; i < (perf->state_count-1); i++) {
- freq = freqn;
- freqn = perf->states[i+1].core_frequency * 1000;
- if ((2 * cpu_khz) > (freqn + freq)) {
- perf->state = i;
- return freq;
- }
- }
- perf->state = perf->state_count-1;
- return freqn;
- } else {
- /* assume CPU is at P0... */
- perf->state = 0;
- return perf->states[0].core_frequency * 1000;
- }
-}
-
-static void free_acpi_perf_data(void)
-{
- unsigned int i;
-
- /* Freeing a NULL pointer is OK, and alloc_percpu zeroes. */
- for_each_possible_cpu(i)
- free_cpumask_var(per_cpu_ptr(acpi_perf_data, i)
- ->shared_cpu_map);
- free_percpu(acpi_perf_data);
-}
-
-/*
- * acpi_cpufreq_early_init - initialize ACPI P-States library
- *
- * Initialize the ACPI P-States library (drivers/acpi/processor_perflib.c)
- * in order to determine correct frequency and voltage pairings. We can
- * do _PDC and _PSD and find out the processor dependency for the
- * actual init that will happen later...
- */
-static int __init acpi_cpufreq_early_init(void)
-{
- unsigned int i;
- dprintk("acpi_cpufreq_early_init\n");
-
- acpi_perf_data = alloc_percpu(struct acpi_processor_performance);
- if (!acpi_perf_data) {
- dprintk("Memory allocation error for acpi_perf_data.\n");
- return -ENOMEM;
- }
- for_each_possible_cpu(i) {
- if (!zalloc_cpumask_var_node(
- &per_cpu_ptr(acpi_perf_data, i)->shared_cpu_map,
- GFP_KERNEL, cpu_to_node(i))) {
-
- /* Freeing a NULL pointer is OK: alloc_percpu zeroes. */
- free_acpi_perf_data();
- return -ENOMEM;
- }
- }
-
- /* Do initialization in ACPI core */
- acpi_processor_preregister_performance(acpi_perf_data);
- return 0;
-}
-
-#ifdef CONFIG_SMP
-/*
- * Some BIOSes do SW_ANY coordination internally, either set it up in hw
- * or do it in BIOS firmware and won't inform about it to OS. If not
- * detected, this has a side effect of making CPU run at a different speed
- * than OS intended it to run at. Detect it and handle it cleanly.
- */
-static int bios_with_sw_any_bug;
-
-static int sw_any_bug_found(const struct dmi_system_id *d)
-{
- bios_with_sw_any_bug = 1;
- return 0;
-}
-
-static const struct dmi_system_id sw_any_bug_dmi_table[] = {
- {
- .callback = sw_any_bug_found,
- .ident = "Supermicro Server X6DLP",
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "Supermicro"),
- DMI_MATCH(DMI_BIOS_VERSION, "080010"),
- DMI_MATCH(DMI_PRODUCT_NAME, "X6DLP"),
- },
- },
- { }
-};
-
-static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c)
-{
- /* Intel Xeon Processor 7100 Series Specification Update
- * http://www.intel.com/Assets/PDF/specupdate/314554.pdf
- * AL30: A Machine Check Exception (MCE) Occurring during an
- * Enhanced Intel SpeedStep Technology Ratio Change May Cause
- * Both Processor Cores to Lock Up. */
- if (c->x86_vendor == X86_VENDOR_INTEL) {
- if ((c->x86 == 15) &&
- (c->x86_model == 6) &&
- (c->x86_mask == 8)) {
- printk(KERN_INFO "acpi-cpufreq: Intel(R) "
- "Xeon(R) 7100 Errata AL30, processors may "
- "lock up on frequency changes: disabling "
- "acpi-cpufreq.\n");
- return -ENODEV;
- }
- }
- return 0;
-}
-#endif
-
-static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int i;
- unsigned int valid_states = 0;
- unsigned int cpu = policy->cpu;
- struct acpi_cpufreq_data *data;
- unsigned int result = 0;
- struct cpuinfo_x86 *c = &cpu_data(policy->cpu);
- struct acpi_processor_performance *perf;
-#ifdef CONFIG_SMP
- static int blacklisted;
-#endif
-
- dprintk("acpi_cpufreq_cpu_init\n");
-
-#ifdef CONFIG_SMP
- if (blacklisted)
- return blacklisted;
- blacklisted = acpi_cpufreq_blacklist(c);
- if (blacklisted)
- return blacklisted;
-#endif
-
- data = kzalloc(sizeof(struct acpi_cpufreq_data), GFP_KERNEL);
- if (!data)
- return -ENOMEM;
-
- data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
- per_cpu(acfreq_data, cpu) = data;
-
- if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
- acpi_cpufreq_driver.flags |= CPUFREQ_CONST_LOOPS;
-
- result = acpi_processor_register_performance(data->acpi_data, cpu);
- if (result)
- goto err_free;
-
- perf = data->acpi_data;
- policy->shared_type = perf->shared_type;
-
- /*
- * Will let policy->cpus know about dependency only when software
- * coordination is required.
- */
- if (policy->shared_type == CPUFREQ_SHARED_TYPE_ALL ||
- policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
- cpumask_copy(policy->cpus, perf->shared_cpu_map);
- }
- cpumask_copy(policy->related_cpus, perf->shared_cpu_map);
-
-#ifdef CONFIG_SMP
- dmi_check_system(sw_any_bug_dmi_table);
- if (bios_with_sw_any_bug && cpumask_weight(policy->cpus) == 1) {
- policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
- cpumask_copy(policy->cpus, cpu_core_mask(cpu));
- }
-#endif
-
- /* capability check */
- if (perf->state_count <= 1) {
- dprintk("No P-States\n");
- result = -ENODEV;
- goto err_unreg;
- }
-
- if (perf->control_register.space_id != perf->status_register.space_id) {
- result = -ENODEV;
- goto err_unreg;
- }
-
- switch (perf->control_register.space_id) {
- case ACPI_ADR_SPACE_SYSTEM_IO:
- dprintk("SYSTEM IO addr space\n");
- data->cpu_feature = SYSTEM_IO_CAPABLE;
- break;
- case ACPI_ADR_SPACE_FIXED_HARDWARE:
- dprintk("HARDWARE addr space\n");
- if (!check_est_cpu(cpu)) {
- result = -ENODEV;
- goto err_unreg;
- }
- data->cpu_feature = SYSTEM_INTEL_MSR_CAPABLE;
- break;
- default:
- dprintk("Unknown addr space %d\n",
- (u32) (perf->control_register.space_id));
- result = -ENODEV;
- goto err_unreg;
- }
-
- data->freq_table = kmalloc(sizeof(struct cpufreq_frequency_table) *
- (perf->state_count+1), GFP_KERNEL);
- if (!data->freq_table) {
- result = -ENOMEM;
- goto err_unreg;
- }
-
- /* detect transition latency */
- policy->cpuinfo.transition_latency = 0;
- for (i = 0; i < perf->state_count; i++) {
- if ((perf->states[i].transition_latency * 1000) >
- policy->cpuinfo.transition_latency)
- policy->cpuinfo.transition_latency =
- perf->states[i].transition_latency * 1000;
- }
-
- /* Check for high latency (>20uS) from buggy BIOSes, like on T42 */
- if (perf->control_register.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE &&
- policy->cpuinfo.transition_latency > 20 * 1000) {
- policy->cpuinfo.transition_latency = 20 * 1000;
- printk_once(KERN_INFO
- "P-state transition latency capped at 20 uS\n");
- }
-
- /* table init */
- for (i = 0; i < perf->state_count; i++) {
- if (i > 0 && perf->states[i].core_frequency >=
- data->freq_table[valid_states-1].frequency / 1000)
- continue;
-
- data->freq_table[valid_states].index = i;
- data->freq_table[valid_states].frequency =
- perf->states[i].core_frequency * 1000;
- valid_states++;
- }
- data->freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
- perf->state = 0;
-
- result = cpufreq_frequency_table_cpuinfo(policy, data->freq_table);
- if (result)
- goto err_freqfree;
-
- if (perf->states[0].core_frequency * 1000 != policy->cpuinfo.max_freq)
- printk(KERN_WARNING FW_WARN "P-state 0 is not max freq\n");
-
- switch (perf->control_register.space_id) {
- case ACPI_ADR_SPACE_SYSTEM_IO:
- /* Current speed is unknown and not detectable by IO port */
- policy->cur = acpi_cpufreq_guess_freq(data, policy->cpu);
- break;
- case ACPI_ADR_SPACE_FIXED_HARDWARE:
- acpi_cpufreq_driver.get = get_cur_freq_on_cpu;
- policy->cur = get_cur_freq_on_cpu(cpu);
- break;
- default:
- break;
- }
-
- /* notify BIOS that we exist */
- acpi_processor_notify_smm(THIS_MODULE);
-
- /* Check for APERF/MPERF support in hardware */
- if (cpu_has(c, X86_FEATURE_APERFMPERF))
- acpi_cpufreq_driver.getavg = cpufreq_get_measured_perf;
-
- dprintk("CPU%u - ACPI performance management activated.\n", cpu);
- for (i = 0; i < perf->state_count; i++)
- dprintk(" %cP%d: %d MHz, %d mW, %d uS\n",
- (i == perf->state ? '*' : ' '), i,
- (u32) perf->states[i].core_frequency,
- (u32) perf->states[i].power,
- (u32) perf->states[i].transition_latency);
-
- cpufreq_frequency_table_get_attr(data->freq_table, policy->cpu);
-
- /*
- * the first call to ->target() should result in us actually
- * writing something to the appropriate registers.
- */
- data->resume = 1;
-
- return result;
-
-err_freqfree:
- kfree(data->freq_table);
-err_unreg:
- acpi_processor_unregister_performance(perf, cpu);
-err_free:
- kfree(data);
- per_cpu(acfreq_data, cpu) = NULL;
-
- return result;
-}
-
-static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
-{
- struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
-
- dprintk("acpi_cpufreq_cpu_exit\n");
-
- if (data) {
- cpufreq_frequency_table_put_attr(policy->cpu);
- per_cpu(acfreq_data, policy->cpu) = NULL;
- acpi_processor_unregister_performance(data->acpi_data,
- policy->cpu);
- kfree(data->freq_table);
- kfree(data);
- }
-
- return 0;
-}
-
-static int acpi_cpufreq_resume(struct cpufreq_policy *policy)
-{
- struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
-
- dprintk("acpi_cpufreq_resume\n");
-
- data->resume = 1;
-
- return 0;
-}
-
-static struct freq_attr *acpi_cpufreq_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver acpi_cpufreq_driver = {
- .verify = acpi_cpufreq_verify,
- .target = acpi_cpufreq_target,
- .bios_limit = acpi_processor_get_bios_limit,
- .init = acpi_cpufreq_cpu_init,
- .exit = acpi_cpufreq_cpu_exit,
- .resume = acpi_cpufreq_resume,
- .name = "acpi-cpufreq",
- .owner = THIS_MODULE,
- .attr = acpi_cpufreq_attr,
-};
-
-static int __init acpi_cpufreq_init(void)
-{
- int ret;
-
- if (acpi_disabled)
- return 0;
-
- dprintk("acpi_cpufreq_init\n");
-
- ret = acpi_cpufreq_early_init();
- if (ret)
- return ret;
-
- ret = cpufreq_register_driver(&acpi_cpufreq_driver);
- if (ret)
- free_acpi_perf_data();
-
- return ret;
-}
-
-static void __exit acpi_cpufreq_exit(void)
-{
- dprintk("acpi_cpufreq_exit\n");
-
- cpufreq_unregister_driver(&acpi_cpufreq_driver);
-
- free_percpu(acpi_perf_data);
-}
-
-module_param(acpi_pstate_strict, uint, 0644);
-MODULE_PARM_DESC(acpi_pstate_strict,
- "value 0 or non-zero. non-zero -> strict ACPI checks are "
- "performed during frequency changes.");
-
-late_initcall(acpi_cpufreq_init);
-module_exit(acpi_cpufreq_exit);
-
-MODULE_ALIAS("acpi");
+++ /dev/null
-/*
- * (C) 2004-2006 Sebastian Witt <se.witt@gmx.net>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- * Based upon reverse engineered information
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/pci.h>
-#include <linux/delay.h>
-
-#define NFORCE2_XTAL 25
-#define NFORCE2_BOOTFSB 0x48
-#define NFORCE2_PLLENABLE 0xa8
-#define NFORCE2_PLLREG 0xa4
-#define NFORCE2_PLLADR 0xa0
-#define NFORCE2_PLL(mul, div) (0x100000 | (mul << 8) | div)
-
-#define NFORCE2_MIN_FSB 50
-#define NFORCE2_SAFE_DISTANCE 50
-
-/* Delay in ms between FSB changes */
-/* #define NFORCE2_DELAY 10 */
-
-/*
- * nforce2_chipset:
- * FSB is changed using the chipset
- */
-static struct pci_dev *nforce2_dev;
-
-/* fid:
- * multiplier * 10
- */
-static int fid;
-
-/* min_fsb, max_fsb:
- * minimum and maximum FSB (= FSB at boot time)
- */
-static int min_fsb;
-static int max_fsb;
-
-MODULE_AUTHOR("Sebastian Witt <se.witt@gmx.net>");
-MODULE_DESCRIPTION("nForce2 FSB changing cpufreq driver");
-MODULE_LICENSE("GPL");
-
-module_param(fid, int, 0444);
-module_param(min_fsb, int, 0444);
-
-MODULE_PARM_DESC(fid, "CPU multiplier to use (11.5 = 115)");
-MODULE_PARM_DESC(min_fsb,
- "Minimum FSB to use, if not defined: current FSB - 50");
-
-#define PFX "cpufreq-nforce2: "
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "cpufreq-nforce2", msg)
-
-/**
- * nforce2_calc_fsb - calculate FSB
- * @pll: PLL value
- *
- * Calculates FSB from PLL value
- */
-static int nforce2_calc_fsb(int pll)
-{
- unsigned char mul, div;
-
- mul = (pll >> 8) & 0xff;
- div = pll & 0xff;
-
- if (div > 0)
- return NFORCE2_XTAL * mul / div;
-
- return 0;
-}
-
-/**
- * nforce2_calc_pll - calculate PLL value
- * @fsb: FSB
- *
- * Calculate PLL value for given FSB
- */
-static int nforce2_calc_pll(unsigned int fsb)
-{
- unsigned char xmul, xdiv;
- unsigned char mul = 0, div = 0;
- int tried = 0;
-
- /* Try to calculate multiplier and divider up to 4 times */
- while (((mul == 0) || (div == 0)) && (tried <= 3)) {
- for (xdiv = 2; xdiv <= 0x80; xdiv++)
- for (xmul = 1; xmul <= 0xfe; xmul++)
- if (nforce2_calc_fsb(NFORCE2_PLL(xmul, xdiv)) ==
- fsb + tried) {
- mul = xmul;
- div = xdiv;
- }
- tried++;
- }
-
- if ((mul == 0) || (div == 0))
- return -1;
-
- return NFORCE2_PLL(mul, div);
-}
-
-/**
- * nforce2_write_pll - write PLL value to chipset
- * @pll: PLL value
- *
- * Writes new FSB PLL value to chipset
- */
-static void nforce2_write_pll(int pll)
-{
- int temp;
-
- /* Set the pll addr. to 0x00 */
- pci_write_config_dword(nforce2_dev, NFORCE2_PLLADR, 0);
-
- /* Now write the value in all 64 registers */
- for (temp = 0; temp <= 0x3f; temp++)
- pci_write_config_dword(nforce2_dev, NFORCE2_PLLREG, pll);
-
- return;
-}
-
-/**
- * nforce2_fsb_read - Read FSB
- *
- * Read FSB from chipset
- * If bootfsb != 0, return FSB at boot-time
- */
-static unsigned int nforce2_fsb_read(int bootfsb)
-{
- struct pci_dev *nforce2_sub5;
- u32 fsb, temp = 0;
-
- /* Get chipset boot FSB from subdevice 5 (FSB at boot-time) */
- nforce2_sub5 = pci_get_subsys(PCI_VENDOR_ID_NVIDIA, 0x01EF,
- PCI_ANY_ID, PCI_ANY_ID, NULL);
- if (!nforce2_sub5)
- return 0;
-
- pci_read_config_dword(nforce2_sub5, NFORCE2_BOOTFSB, &fsb);
- fsb /= 1000000;
-
- /* Check if PLL register is already set */
- pci_read_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8 *)&temp);
-
- if (bootfsb || !temp)
- return fsb;
-
- /* Use PLL register FSB value */
- pci_read_config_dword(nforce2_dev, NFORCE2_PLLREG, &temp);
- fsb = nforce2_calc_fsb(temp);
-
- return fsb;
-}
-
-/**
- * nforce2_set_fsb - set new FSB
- * @fsb: New FSB
- *
- * Sets new FSB
- */
-static int nforce2_set_fsb(unsigned int fsb)
-{
- u32 temp = 0;
- unsigned int tfsb;
- int diff;
- int pll = 0;
-
- if ((fsb > max_fsb) || (fsb < NFORCE2_MIN_FSB)) {
- printk(KERN_ERR PFX "FSB %d is out of range!\n", fsb);
- return -EINVAL;
- }
-
- tfsb = nforce2_fsb_read(0);
- if (!tfsb) {
- printk(KERN_ERR PFX "Error while reading the FSB\n");
- return -EINVAL;
- }
-
- /* First write? Then set actual value */
- pci_read_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8 *)&temp);
- if (!temp) {
- pll = nforce2_calc_pll(tfsb);
-
- if (pll < 0)
- return -EINVAL;
-
- nforce2_write_pll(pll);
- }
-
- /* Enable write access */
- temp = 0x01;
- pci_write_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8)temp);
-
- diff = tfsb - fsb;
-
- if (!diff)
- return 0;
-
- while ((tfsb != fsb) && (tfsb <= max_fsb) && (tfsb >= min_fsb)) {
- if (diff < 0)
- tfsb++;
- else
- tfsb--;
-
- /* Calculate the PLL reg. value */
- pll = nforce2_calc_pll(tfsb);
- if (pll == -1)
- return -EINVAL;
-
- nforce2_write_pll(pll);
-#ifdef NFORCE2_DELAY
- mdelay(NFORCE2_DELAY);
-#endif
- }
-
- temp = 0x40;
- pci_write_config_byte(nforce2_dev, NFORCE2_PLLADR, (u8)temp);
-
- return 0;
-}
-
-/**
- * nforce2_get - get the CPU frequency
- * @cpu: CPU number
- *
- * Returns the CPU frequency
- */
-static unsigned int nforce2_get(unsigned int cpu)
-{
- if (cpu)
- return 0;
- return nforce2_fsb_read(0) * fid * 100;
-}
-
-/**
- * nforce2_target - set a new CPUFreq policy
- * @policy: new policy
- * @target_freq: the target frequency
- * @relation: how that frequency relates to achieved frequency
- * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
- *
- * Sets a new CPUFreq policy.
- */
-static int nforce2_target(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
-{
-/* unsigned long flags; */
- struct cpufreq_freqs freqs;
- unsigned int target_fsb;
-
- if ((target_freq > policy->max) || (target_freq < policy->min))
- return -EINVAL;
-
- target_fsb = target_freq / (fid * 100);
-
- freqs.old = nforce2_get(policy->cpu);
- freqs.new = target_fsb * fid * 100;
- freqs.cpu = 0; /* Only one CPU on nForce2 platforms */
-
- if (freqs.old == freqs.new)
- return 0;
-
- dprintk("Old CPU frequency %d kHz, new %d kHz\n",
- freqs.old, freqs.new);
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- /* Disable IRQs */
- /* local_irq_save(flags); */
-
- if (nforce2_set_fsb(target_fsb) < 0)
- printk(KERN_ERR PFX "Changing FSB to %d failed\n",
- target_fsb);
- else
- dprintk("Changed FSB successfully to %d\n",
- target_fsb);
-
- /* Enable IRQs */
- /* local_irq_restore(flags); */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
- return 0;
-}
-
-/**
- * nforce2_verify - verifies a new CPUFreq policy
- * @policy: new policy
- */
-static int nforce2_verify(struct cpufreq_policy *policy)
-{
- unsigned int fsb_pol_max;
-
- fsb_pol_max = policy->max / (fid * 100);
-
- if (policy->min < (fsb_pol_max * fid * 100))
- policy->max = (fsb_pol_max + 1) * fid * 100;
-
- cpufreq_verify_within_limits(policy,
- policy->cpuinfo.min_freq,
- policy->cpuinfo.max_freq);
- return 0;
-}
-
-static int nforce2_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int fsb;
- unsigned int rfid;
-
- /* capability check */
- if (policy->cpu != 0)
- return -ENODEV;
-
- /* Get current FSB */
- fsb = nforce2_fsb_read(0);
-
- if (!fsb)
- return -EIO;
-
- /* FIX: Get FID from CPU */
- if (!fid) {
- if (!cpu_khz) {
- printk(KERN_WARNING PFX
- "cpu_khz not set, can't calculate multiplier!\n");
- return -ENODEV;
- }
-
- fid = cpu_khz / (fsb * 100);
- rfid = fid % 5;
-
- if (rfid) {
- if (rfid > 2)
- fid += 5 - rfid;
- else
- fid -= rfid;
- }
- }
-
- printk(KERN_INFO PFX "FSB currently at %i MHz, FID %d.%d\n", fsb,
- fid / 10, fid % 10);
-
- /* Set maximum FSB to FSB at boot time */
- max_fsb = nforce2_fsb_read(1);
-
- if (!max_fsb)
- return -EIO;
-
- if (!min_fsb)
- min_fsb = max_fsb - NFORCE2_SAFE_DISTANCE;
-
- if (min_fsb < NFORCE2_MIN_FSB)
- min_fsb = NFORCE2_MIN_FSB;
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.min_freq = min_fsb * fid * 100;
- policy->cpuinfo.max_freq = max_fsb * fid * 100;
- policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
- policy->cur = nforce2_get(policy->cpu);
- policy->min = policy->cpuinfo.min_freq;
- policy->max = policy->cpuinfo.max_freq;
-
- return 0;
-}
-
-static int nforce2_cpu_exit(struct cpufreq_policy *policy)
-{
- return 0;
-}
-
-static struct cpufreq_driver nforce2_driver = {
- .name = "nforce2",
- .verify = nforce2_verify,
- .target = nforce2_target,
- .get = nforce2_get,
- .init = nforce2_cpu_init,
- .exit = nforce2_cpu_exit,
- .owner = THIS_MODULE,
-};
-
-/**
- * nforce2_detect_chipset - detect the Southbridge which contains FSB PLL logic
- *
- * Detects nForce2 A2 and C1 stepping
- *
- */
-static int nforce2_detect_chipset(void)
-{
- nforce2_dev = pci_get_subsys(PCI_VENDOR_ID_NVIDIA,
- PCI_DEVICE_ID_NVIDIA_NFORCE2,
- PCI_ANY_ID, PCI_ANY_ID, NULL);
-
- if (nforce2_dev == NULL)
- return -ENODEV;
-
- printk(KERN_INFO PFX "Detected nForce2 chipset revision %X\n",
- nforce2_dev->revision);
- printk(KERN_INFO PFX
- "FSB changing is maybe unstable and can lead to "
- "crashes and data loss.\n");
-
- return 0;
-}
-
-/**
- * nforce2_init - initializes the nForce2 CPUFreq driver
- *
- * Initializes the nForce2 FSB support. Returns -ENODEV on unsupported
- * devices, -EINVAL on problems during initiatization, and zero on
- * success.
- */
-static int __init nforce2_init(void)
-{
- /* TODO: do we need to detect the processor? */
-
- /* detect chipset */
- if (nforce2_detect_chipset()) {
- printk(KERN_INFO PFX "No nForce2 chipset.\n");
- return -ENODEV;
- }
-
- return cpufreq_register_driver(&nforce2_driver);
-}
-
-/**
- * nforce2_exit - unregisters cpufreq module
- *
- * Unregisters nForce2 FSB change support.
- */
-static void __exit nforce2_exit(void)
-{
- cpufreq_unregister_driver(&nforce2_driver);
-}
-
-module_init(nforce2_init);
-module_exit(nforce2_exit);
-
+++ /dev/null
-/*
- * Based on documentation provided by Dave Jones. Thanks!
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/ioport.h>
-#include <linux/slab.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-#include <linux/delay.h>
-
-#include <asm/msr.h>
-#include <asm/tsc.h>
-
-#define EPS_BRAND_C7M 0
-#define EPS_BRAND_C7 1
-#define EPS_BRAND_EDEN 2
-#define EPS_BRAND_C3 3
-#define EPS_BRAND_C7D 4
-
-struct eps_cpu_data {
- u32 fsb;
- struct cpufreq_frequency_table freq_table[];
-};
-
-static struct eps_cpu_data *eps_cpu[NR_CPUS];
-
-
-static unsigned int eps_get(unsigned int cpu)
-{
- struct eps_cpu_data *centaur;
- u32 lo, hi;
-
- if (cpu)
- return 0;
- centaur = eps_cpu[cpu];
- if (centaur == NULL)
- return 0;
-
- /* Return current frequency */
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- return centaur->fsb * ((lo >> 8) & 0xff);
-}
-
-static int eps_set_state(struct eps_cpu_data *centaur,
- unsigned int cpu,
- u32 dest_state)
-{
- struct cpufreq_freqs freqs;
- u32 lo, hi;
- int err = 0;
- int i;
-
- freqs.old = eps_get(cpu);
- freqs.new = centaur->fsb * ((dest_state >> 8) & 0xff);
- freqs.cpu = cpu;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- /* Wait while CPU is busy */
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- i = 0;
- while (lo & ((1 << 16) | (1 << 17))) {
- udelay(16);
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- i++;
- if (unlikely(i > 64)) {
- err = -ENODEV;
- goto postchange;
- }
- }
- /* Set new multiplier and voltage */
- wrmsr(MSR_IA32_PERF_CTL, dest_state & 0xffff, 0);
- /* Wait until transition end */
- i = 0;
- do {
- udelay(16);
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- i++;
- if (unlikely(i > 64)) {
- err = -ENODEV;
- goto postchange;
- }
- } while (lo & ((1 << 16) | (1 << 17)));
-
- /* Return current frequency */
-postchange:
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- freqs.new = centaur->fsb * ((lo >> 8) & 0xff);
-
-#ifdef DEBUG
- {
- u8 current_multiplier, current_voltage;
-
- /* Print voltage and multiplier */
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- current_voltage = lo & 0xff;
- printk(KERN_INFO "eps: Current voltage = %dmV\n",
- current_voltage * 16 + 700);
- current_multiplier = (lo >> 8) & 0xff;
- printk(KERN_INFO "eps: Current multiplier = %d\n",
- current_multiplier);
- }
-#endif
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- return err;
-}
-
-static int eps_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- struct eps_cpu_data *centaur;
- unsigned int newstate = 0;
- unsigned int cpu = policy->cpu;
- unsigned int dest_state;
- int ret;
-
- if (unlikely(eps_cpu[cpu] == NULL))
- return -ENODEV;
- centaur = eps_cpu[cpu];
-
- if (unlikely(cpufreq_frequency_table_target(policy,
- &eps_cpu[cpu]->freq_table[0],
- target_freq,
- relation,
- &newstate))) {
- return -EINVAL;
- }
-
- /* Make frequency transition */
- dest_state = centaur->freq_table[newstate].index & 0xffff;
- ret = eps_set_state(centaur, cpu, dest_state);
- if (ret)
- printk(KERN_ERR "eps: Timeout!\n");
- return ret;
-}
-
-static int eps_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy,
- &eps_cpu[policy->cpu]->freq_table[0]);
-}
-
-static int eps_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int i;
- u32 lo, hi;
- u64 val;
- u8 current_multiplier, current_voltage;
- u8 max_multiplier, max_voltage;
- u8 min_multiplier, min_voltage;
- u8 brand = 0;
- u32 fsb;
- struct eps_cpu_data *centaur;
- struct cpuinfo_x86 *c = &cpu_data(0);
- struct cpufreq_frequency_table *f_table;
- int k, step, voltage;
- int ret;
- int states;
-
- if (policy->cpu != 0)
- return -ENODEV;
-
- /* Check brand */
- printk(KERN_INFO "eps: Detected VIA ");
-
- switch (c->x86_model) {
- case 10:
- rdmsr(0x1153, lo, hi);
- brand = (((lo >> 2) ^ lo) >> 18) & 3;
- printk(KERN_CONT "Model A ");
- break;
- case 13:
- rdmsr(0x1154, lo, hi);
- brand = (((lo >> 4) ^ (lo >> 2))) & 0x000000ff;
- printk(KERN_CONT "Model D ");
- break;
- }
-
- switch (brand) {
- case EPS_BRAND_C7M:
- printk(KERN_CONT "C7-M\n");
- break;
- case EPS_BRAND_C7:
- printk(KERN_CONT "C7\n");
- break;
- case EPS_BRAND_EDEN:
- printk(KERN_CONT "Eden\n");
- break;
- case EPS_BRAND_C7D:
- printk(KERN_CONT "C7-D\n");
- break;
- case EPS_BRAND_C3:
- printk(KERN_CONT "C3\n");
- return -ENODEV;
- break;
- }
- /* Enable Enhanced PowerSaver */
- rdmsrl(MSR_IA32_MISC_ENABLE, val);
- if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
- val |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
- wrmsrl(MSR_IA32_MISC_ENABLE, val);
- /* Can be locked at 0 */
- rdmsrl(MSR_IA32_MISC_ENABLE, val);
- if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
- printk(KERN_INFO "eps: Can't enable Enhanced PowerSaver\n");
- return -ENODEV;
- }
- }
-
- /* Print voltage and multiplier */
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- current_voltage = lo & 0xff;
- printk(KERN_INFO "eps: Current voltage = %dmV\n",
- current_voltage * 16 + 700);
- current_multiplier = (lo >> 8) & 0xff;
- printk(KERN_INFO "eps: Current multiplier = %d\n", current_multiplier);
-
- /* Print limits */
- max_voltage = hi & 0xff;
- printk(KERN_INFO "eps: Highest voltage = %dmV\n",
- max_voltage * 16 + 700);
- max_multiplier = (hi >> 8) & 0xff;
- printk(KERN_INFO "eps: Highest multiplier = %d\n", max_multiplier);
- min_voltage = (hi >> 16) & 0xff;
- printk(KERN_INFO "eps: Lowest voltage = %dmV\n",
- min_voltage * 16 + 700);
- min_multiplier = (hi >> 24) & 0xff;
- printk(KERN_INFO "eps: Lowest multiplier = %d\n", min_multiplier);
-
- /* Sanity checks */
- if (current_multiplier == 0 || max_multiplier == 0
- || min_multiplier == 0)
- return -EINVAL;
- if (current_multiplier > max_multiplier
- || max_multiplier <= min_multiplier)
- return -EINVAL;
- if (current_voltage > 0x1f || max_voltage > 0x1f)
- return -EINVAL;
- if (max_voltage < min_voltage)
- return -EINVAL;
-
- /* Calc FSB speed */
- fsb = cpu_khz / current_multiplier;
- /* Calc number of p-states supported */
- if (brand == EPS_BRAND_C7M)
- states = max_multiplier - min_multiplier + 1;
- else
- states = 2;
-
- /* Allocate private data and frequency table for current cpu */
- centaur = kzalloc(sizeof(struct eps_cpu_data)
- + (states + 1) * sizeof(struct cpufreq_frequency_table),
- GFP_KERNEL);
- if (!centaur)
- return -ENOMEM;
- eps_cpu[0] = centaur;
-
- /* Copy basic values */
- centaur->fsb = fsb;
-
- /* Fill frequency and MSR value table */
- f_table = ¢aur->freq_table[0];
- if (brand != EPS_BRAND_C7M) {
- f_table[0].frequency = fsb * min_multiplier;
- f_table[0].index = (min_multiplier << 8) | min_voltage;
- f_table[1].frequency = fsb * max_multiplier;
- f_table[1].index = (max_multiplier << 8) | max_voltage;
- f_table[2].frequency = CPUFREQ_TABLE_END;
- } else {
- k = 0;
- step = ((max_voltage - min_voltage) * 256)
- / (max_multiplier - min_multiplier);
- for (i = min_multiplier; i <= max_multiplier; i++) {
- voltage = (k * step) / 256 + min_voltage;
- f_table[k].frequency = fsb * i;
- f_table[k].index = (i << 8) | voltage;
- k++;
- }
- f_table[k].frequency = CPUFREQ_TABLE_END;
- }
-
- policy->cpuinfo.transition_latency = 140000; /* 844mV -> 700mV in ns */
- policy->cur = fsb * current_multiplier;
-
- ret = cpufreq_frequency_table_cpuinfo(policy, ¢aur->freq_table[0]);
- if (ret) {
- kfree(centaur);
- return ret;
- }
-
- cpufreq_frequency_table_get_attr(¢aur->freq_table[0], policy->cpu);
- return 0;
-}
-
-static int eps_cpu_exit(struct cpufreq_policy *policy)
-{
- unsigned int cpu = policy->cpu;
- struct eps_cpu_data *centaur;
- u32 lo, hi;
-
- if (eps_cpu[cpu] == NULL)
- return -ENODEV;
- centaur = eps_cpu[cpu];
-
- /* Get max frequency */
- rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
- /* Set max frequency */
- eps_set_state(centaur, cpu, hi & 0xffff);
- /* Bye */
- cpufreq_frequency_table_put_attr(policy->cpu);
- kfree(eps_cpu[cpu]);
- eps_cpu[cpu] = NULL;
- return 0;
-}
-
-static struct freq_attr *eps_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver eps_driver = {
- .verify = eps_verify,
- .target = eps_target,
- .init = eps_cpu_init,
- .exit = eps_cpu_exit,
- .get = eps_get,
- .name = "e_powersaver",
- .owner = THIS_MODULE,
- .attr = eps_attr,
-};
-
-static int __init eps_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- /* This driver will work only on Centaur C7 processors with
- * Enhanced SpeedStep/PowerSaver registers */
- if (c->x86_vendor != X86_VENDOR_CENTAUR
- || c->x86 != 6 || c->x86_model < 10)
- return -ENODEV;
- if (!cpu_has(c, X86_FEATURE_EST))
- return -ENODEV;
-
- if (cpufreq_register_driver(&eps_driver))
- return -EINVAL;
- return 0;
-}
-
-static void __exit eps_exit(void)
-{
- cpufreq_unregister_driver(&eps_driver);
-}
-
-MODULE_AUTHOR("Rafal Bilski <rafalbilski@interia.pl>");
-MODULE_DESCRIPTION("Enhanced PowerSaver driver for VIA C7 CPU's.");
-MODULE_LICENSE("GPL");
-
-module_init(eps_init);
-module_exit(eps_exit);
+++ /dev/null
-/*
- * elanfreq: cpufreq driver for the AMD ELAN family
- *
- * (c) Copyright 2002 Robert Schwebel <r.schwebel@pengutronix.de>
- *
- * Parts of this code are (c) Sven Geggus <sven@geggus.net>
- *
- * All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * 2002-02-13: - initial revision for 2.4.18-pre9 by Robert Schwebel
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-
-#include <linux/delay.h>
-#include <linux/cpufreq.h>
-
-#include <asm/msr.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-
-#define REG_CSCIR 0x22 /* Chip Setup and Control Index Register */
-#define REG_CSCDR 0x23 /* Chip Setup and Control Data Register */
-
-/* Module parameter */
-static int max_freq;
-
-struct s_elan_multiplier {
- int clock; /* frequency in kHz */
- int val40h; /* PMU Force Mode register */
- int val80h; /* CPU Clock Speed Register */
-};
-
-/*
- * It is important that the frequencies
- * are listed in ascending order here!
- */
-static struct s_elan_multiplier elan_multiplier[] = {
- {1000, 0x02, 0x18},
- {2000, 0x02, 0x10},
- {4000, 0x02, 0x08},
- {8000, 0x00, 0x00},
- {16000, 0x00, 0x02},
- {33000, 0x00, 0x04},
- {66000, 0x01, 0x04},
- {99000, 0x01, 0x05}
-};
-
-static struct cpufreq_frequency_table elanfreq_table[] = {
- {0, 1000},
- {1, 2000},
- {2, 4000},
- {3, 8000},
- {4, 16000},
- {5, 33000},
- {6, 66000},
- {7, 99000},
- {0, CPUFREQ_TABLE_END},
-};
-
-
-/**
- * elanfreq_get_cpu_frequency: determine current cpu speed
- *
- * Finds out at which frequency the CPU of the Elan SOC runs
- * at the moment. Frequencies from 1 to 33 MHz are generated
- * the normal way, 66 and 99 MHz are called "Hyperspeed Mode"
- * and have the rest of the chip running with 33 MHz.
- */
-
-static unsigned int elanfreq_get_cpu_frequency(unsigned int cpu)
-{
- u8 clockspeed_reg; /* Clock Speed Register */
-
- local_irq_disable();
- outb_p(0x80, REG_CSCIR);
- clockspeed_reg = inb_p(REG_CSCDR);
- local_irq_enable();
-
- if ((clockspeed_reg & 0xE0) == 0xE0)
- return 0;
-
- /* Are we in CPU clock multiplied mode (66/99 MHz)? */
- if ((clockspeed_reg & 0xE0) == 0xC0) {
- if ((clockspeed_reg & 0x01) == 0)
- return 66000;
- else
- return 99000;
- }
-
- /* 33 MHz is not 32 MHz... */
- if ((clockspeed_reg & 0xE0) == 0xA0)
- return 33000;
-
- return (1<<((clockspeed_reg & 0xE0) >> 5)) * 1000;
-}
-
-
-/**
- * elanfreq_set_cpu_frequency: Change the CPU core frequency
- * @cpu: cpu number
- * @freq: frequency in kHz
- *
- * This function takes a frequency value and changes the CPU frequency
- * according to this. Note that the frequency has to be checked by
- * elanfreq_validatespeed() for correctness!
- *
- * There is no return value.
- */
-
-static void elanfreq_set_cpu_state(unsigned int state)
-{
- struct cpufreq_freqs freqs;
-
- freqs.old = elanfreq_get_cpu_frequency(0);
- freqs.new = elan_multiplier[state].clock;
- freqs.cpu = 0; /* elanfreq.c is UP only driver */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- printk(KERN_INFO "elanfreq: attempting to set frequency to %i kHz\n",
- elan_multiplier[state].clock);
-
-
- /*
- * Access to the Elan's internal registers is indexed via
- * 0x22: Chip Setup & Control Register Index Register (CSCI)
- * 0x23: Chip Setup & Control Register Data Register (CSCD)
- *
- */
-
- /*
- * 0x40 is the Power Management Unit's Force Mode Register.
- * Bit 6 enables Hyperspeed Mode (66/100 MHz core frequency)
- */
-
- local_irq_disable();
- outb_p(0x40, REG_CSCIR); /* Disable hyperspeed mode */
- outb_p(0x00, REG_CSCDR);
- local_irq_enable(); /* wait till internal pipelines and */
- udelay(1000); /* buffers have cleaned up */
-
- local_irq_disable();
-
- /* now, set the CPU clock speed register (0x80) */
- outb_p(0x80, REG_CSCIR);
- outb_p(elan_multiplier[state].val80h, REG_CSCDR);
-
- /* now, the hyperspeed bit in PMU Force Mode Register (0x40) */
- outb_p(0x40, REG_CSCIR);
- outb_p(elan_multiplier[state].val40h, REG_CSCDR);
- udelay(10000);
- local_irq_enable();
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-};
-
-
-/**
- * elanfreq_validatespeed: test if frequency range is valid
- * @policy: the policy to validate
- *
- * This function checks if a given frequency range in kHz is valid
- * for the hardware supported by the driver.
- */
-
-static int elanfreq_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &elanfreq_table[0]);
-}
-
-static int elanfreq_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = 0;
-
- if (cpufreq_frequency_table_target(policy, &elanfreq_table[0],
- target_freq, relation, &newstate))
- return -EINVAL;
-
- elanfreq_set_cpu_state(newstate);
-
- return 0;
-}
-
-
-/*
- * Module init and exit code
- */
-
-static int elanfreq_cpu_init(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- unsigned int i;
- int result;
-
- /* capability check */
- if ((c->x86_vendor != X86_VENDOR_AMD) ||
- (c->x86 != 4) || (c->x86_model != 10))
- return -ENODEV;
-
- /* max freq */
- if (!max_freq)
- max_freq = elanfreq_get_cpu_frequency(0);
-
- /* table init */
- for (i = 0; (elanfreq_table[i].frequency != CPUFREQ_TABLE_END); i++) {
- if (elanfreq_table[i].frequency > max_freq)
- elanfreq_table[i].frequency = CPUFREQ_ENTRY_INVALID;
- }
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
- policy->cur = elanfreq_get_cpu_frequency(0);
-
- result = cpufreq_frequency_table_cpuinfo(policy, elanfreq_table);
- if (result)
- return result;
-
- cpufreq_frequency_table_get_attr(elanfreq_table, policy->cpu);
- return 0;
-}
-
-
-static int elanfreq_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-
-#ifndef MODULE
-/**
- * elanfreq_setup - elanfreq command line parameter parsing
- *
- * elanfreq command line parameter. Use:
- * elanfreq=66000
- * to set the maximum CPU frequency to 66 MHz. Note that in
- * case you do not give this boot parameter, the maximum
- * frequency will fall back to _current_ CPU frequency which
- * might be lower. If you build this as a module, use the
- * max_freq module parameter instead.
- */
-static int __init elanfreq_setup(char *str)
-{
- max_freq = simple_strtoul(str, &str, 0);
- printk(KERN_WARNING "You're using the deprecated elanfreq command line option. Use elanfreq.max_freq instead, please!\n");
- return 1;
-}
-__setup("elanfreq=", elanfreq_setup);
-#endif
-
-
-static struct freq_attr *elanfreq_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-
-static struct cpufreq_driver elanfreq_driver = {
- .get = elanfreq_get_cpu_frequency,
- .verify = elanfreq_verify,
- .target = elanfreq_target,
- .init = elanfreq_cpu_init,
- .exit = elanfreq_cpu_exit,
- .name = "elanfreq",
- .owner = THIS_MODULE,
- .attr = elanfreq_attr,
-};
-
-
-static int __init elanfreq_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- /* Test if we have the right hardware */
- if ((c->x86_vendor != X86_VENDOR_AMD) ||
- (c->x86 != 4) || (c->x86_model != 10)) {
- printk(KERN_INFO "elanfreq: error: no Elan processor found!\n");
- return -ENODEV;
- }
- return cpufreq_register_driver(&elanfreq_driver);
-}
-
-
-static void __exit elanfreq_exit(void)
-{
- cpufreq_unregister_driver(&elanfreq_driver);
-}
-
-
-module_param(max_freq, int, 0444);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Robert Schwebel <r.schwebel@pengutronix.de>, "
- "Sven Geggus <sven@geggus.net>");
-MODULE_DESCRIPTION("cpufreq driver for AMD's Elan CPUs");
-
-module_init(elanfreq_init);
-module_exit(elanfreq_exit);
+++ /dev/null
-/*
- * Cyrix MediaGX and NatSemi Geode Suspend Modulation
- * (C) 2002 Zwane Mwaikambo <zwane@commfireservices.com>
- * (C) 2002 Hiroshi Miura <miura@da-cha.org>
- * All Rights Reserved
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * version 2 as published by the Free Software Foundation
- *
- * The author(s) of this software shall not be held liable for damages
- * of any nature resulting due to the use of this software. This
- * software is provided AS-IS with no warranties.
- *
- * Theoretical note:
- *
- * (see Geode(tm) CS5530 manual (rev.4.1) page.56)
- *
- * CPU frequency control on NatSemi Geode GX1/GXLV processor and CS55x0
- * are based on Suspend Modulation.
- *
- * Suspend Modulation works by asserting and de-asserting the SUSP# pin
- * to CPU(GX1/GXLV) for configurable durations. When asserting SUSP#
- * the CPU enters an idle state. GX1 stops its core clock when SUSP# is
- * asserted then power consumption is reduced.
- *
- * Suspend Modulation's OFF/ON duration are configurable
- * with 'Suspend Modulation OFF Count Register'
- * and 'Suspend Modulation ON Count Register'.
- * These registers are 8bit counters that represent the number of
- * 32us intervals which the SUSP# pin is asserted(ON)/de-asserted(OFF)
- * to the processor.
- *
- * These counters define a ratio which is the effective frequency
- * of operation of the system.
- *
- * OFF Count
- * F_eff = Fgx * ----------------------
- * OFF Count + ON Count
- *
- * 0 <= On Count, Off Count <= 255
- *
- * From these limits, we can get register values
- *
- * off_duration + on_duration <= MAX_DURATION
- * on_duration = off_duration * (stock_freq - freq) / freq
- *
- * off_duration = (freq * DURATION) / stock_freq
- * on_duration = DURATION - off_duration
- *
- *
- *---------------------------------------------------------------------------
- *
- * ChangeLog:
- * Dec. 12, 2003 Hiroshi Miura <miura@da-cha.org>
- * - fix on/off register mistake
- * - fix cpu_khz calc when it stops cpu modulation.
- *
- * Dec. 11, 2002 Hiroshi Miura <miura@da-cha.org>
- * - rewrite for Cyrix MediaGX Cx5510/5520 and
- * NatSemi Geode Cs5530(A).
- *
- * Jul. ??, 2002 Zwane Mwaikambo <zwane@commfireservices.com>
- * - cs5530_mod patch for 2.4.19-rc1.
- *
- *---------------------------------------------------------------------------
- *
- * Todo
- * Test on machines with 5510, 5530, 5530A
- */
-
-/************************************************************************
- * Suspend Modulation - Definitions *
- ************************************************************************/
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/smp.h>
-#include <linux/cpufreq.h>
-#include <linux/pci.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-
-#include <asm/processor-cyrix.h>
-
-/* PCI config registers, all at F0 */
-#define PCI_PMER1 0x80 /* power management enable register 1 */
-#define PCI_PMER2 0x81 /* power management enable register 2 */
-#define PCI_PMER3 0x82 /* power management enable register 3 */
-#define PCI_IRQTC 0x8c /* irq speedup timer counter register:typical 2 to 4ms */
-#define PCI_VIDTC 0x8d /* video speedup timer counter register: typical 50 to 100ms */
-#define PCI_MODOFF 0x94 /* suspend modulation OFF counter register, 1 = 32us */
-#define PCI_MODON 0x95 /* suspend modulation ON counter register */
-#define PCI_SUSCFG 0x96 /* suspend configuration register */
-
-/* PMER1 bits */
-#define GPM (1<<0) /* global power management */
-#define GIT (1<<1) /* globally enable PM device idle timers */
-#define GTR (1<<2) /* globally enable IO traps */
-#define IRQ_SPDUP (1<<3) /* disable clock throttle during interrupt handling */
-#define VID_SPDUP (1<<4) /* disable clock throttle during vga video handling */
-
-/* SUSCFG bits */
-#define SUSMOD (1<<0) /* enable/disable suspend modulation */
-/* the below is supported only with cs5530 (after rev.1.2)/cs5530A */
-#define SMISPDUP (1<<1) /* select how SMI re-enable suspend modulation: */
- /* IRQTC timer or read SMI speedup disable reg.(F1BAR[08-09h]) */
-#define SUSCFG (1<<2) /* enable powering down a GXLV processor. "Special 3Volt Suspend" mode */
-/* the below is supported only with cs5530A */
-#define PWRSVE_ISA (1<<3) /* stop ISA clock */
-#define PWRSVE (1<<4) /* active idle */
-
-struct gxfreq_params {
- u8 on_duration;
- u8 off_duration;
- u8 pci_suscfg;
- u8 pci_pmer1;
- u8 pci_pmer2;
- struct pci_dev *cs55x0;
-};
-
-static struct gxfreq_params *gx_params;
-static int stock_freq;
-
-/* PCI bus clock - defaults to 30.000 if cpu_khz is not available */
-static int pci_busclk;
-module_param(pci_busclk, int, 0444);
-
-/* maximum duration for which the cpu may be suspended
- * (32us * MAX_DURATION). If no parameter is given, this defaults
- * to 255.
- * Note that this leads to a maximum of 8 ms(!) where the CPU clock
- * is suspended -- processing power is just 0.39% of what it used to be,
- * though. 781.25 kHz(!) for a 200 MHz processor -- wow. */
-static int max_duration = 255;
-module_param(max_duration, int, 0444);
-
-/* For the default policy, we want at least some processing power
- * - let's say 5%. (min = maxfreq / POLICY_MIN_DIV)
- */
-#define POLICY_MIN_DIV 20
-
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "gx-suspmod", msg)
-
-/**
- * we can detect a core multipiler from dir0_lsb
- * from GX1 datasheet p.56,
- * MULT[3:0]:
- * 0000 = SYSCLK multiplied by 4 (test only)
- * 0001 = SYSCLK multiplied by 10
- * 0010 = SYSCLK multiplied by 4
- * 0011 = SYSCLK multiplied by 6
- * 0100 = SYSCLK multiplied by 9
- * 0101 = SYSCLK multiplied by 5
- * 0110 = SYSCLK multiplied by 7
- * 0111 = SYSCLK multiplied by 8
- * of 33.3MHz
- **/
-static int gx_freq_mult[16] = {
- 4, 10, 4, 6, 9, 5, 7, 8,
- 0, 0, 0, 0, 0, 0, 0, 0
-};
-
-
-/****************************************************************
- * Low Level chipset interface *
- ****************************************************************/
-static struct pci_device_id gx_chipset_tbl[] __initdata = {
- { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5530_LEGACY), },
- { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5520), },
- { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5510), },
- { 0, },
-};
-
-static void gx_write_byte(int reg, int value)
-{
- pci_write_config_byte(gx_params->cs55x0, reg, value);
-}
-
-/**
- * gx_detect_chipset:
- *
- **/
-static __init struct pci_dev *gx_detect_chipset(void)
-{
- struct pci_dev *gx_pci = NULL;
-
- /* check if CPU is a MediaGX or a Geode. */
- if ((boot_cpu_data.x86_vendor != X86_VENDOR_NSC) &&
- (boot_cpu_data.x86_vendor != X86_VENDOR_CYRIX)) {
- dprintk("error: no MediaGX/Geode processor found!\n");
- return NULL;
- }
-
- /* detect which companion chip is used */
- for_each_pci_dev(gx_pci) {
- if ((pci_match_id(gx_chipset_tbl, gx_pci)) != NULL)
- return gx_pci;
- }
-
- dprintk("error: no supported chipset found!\n");
- return NULL;
-}
-
-/**
- * gx_get_cpuspeed:
- *
- * Finds out at which efficient frequency the Cyrix MediaGX/NatSemi
- * Geode CPU runs.
- */
-static unsigned int gx_get_cpuspeed(unsigned int cpu)
-{
- if ((gx_params->pci_suscfg & SUSMOD) == 0)
- return stock_freq;
-
- return (stock_freq * gx_params->off_duration)
- / (gx_params->on_duration + gx_params->off_duration);
-}
-
-/**
- * gx_validate_speed:
- * determine current cpu speed
- *
- **/
-
-static unsigned int gx_validate_speed(unsigned int khz, u8 *on_duration,
- u8 *off_duration)
-{
- unsigned int i;
- u8 tmp_on, tmp_off;
- int old_tmp_freq = stock_freq;
- int tmp_freq;
-
- *off_duration = 1;
- *on_duration = 0;
-
- for (i = max_duration; i > 0; i--) {
- tmp_off = ((khz * i) / stock_freq) & 0xff;
- tmp_on = i - tmp_off;
- tmp_freq = (stock_freq * tmp_off) / i;
- /* if this relation is closer to khz, use this. If it's equal,
- * prefer it, too - lower latency */
- if (abs(tmp_freq - khz) <= abs(old_tmp_freq - khz)) {
- *on_duration = tmp_on;
- *off_duration = tmp_off;
- old_tmp_freq = tmp_freq;
- }
- }
-
- return old_tmp_freq;
-}
-
-
-/**
- * gx_set_cpuspeed:
- * set cpu speed in khz.
- **/
-
-static void gx_set_cpuspeed(unsigned int khz)
-{
- u8 suscfg, pmer1;
- unsigned int new_khz;
- unsigned long flags;
- struct cpufreq_freqs freqs;
-
- freqs.cpu = 0;
- freqs.old = gx_get_cpuspeed(0);
-
- new_khz = gx_validate_speed(khz, &gx_params->on_duration,
- &gx_params->off_duration);
-
- freqs.new = new_khz;
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- local_irq_save(flags);
-
-
-
- if (new_khz != stock_freq) {
- /* if new khz == 100% of CPU speed, it is special case */
- switch (gx_params->cs55x0->device) {
- case PCI_DEVICE_ID_CYRIX_5530_LEGACY:
- pmer1 = gx_params->pci_pmer1 | IRQ_SPDUP | VID_SPDUP;
- /* FIXME: need to test other values -- Zwane,Miura */
- /* typical 2 to 4ms */
- gx_write_byte(PCI_IRQTC, 4);
- /* typical 50 to 100ms */
- gx_write_byte(PCI_VIDTC, 100);
- gx_write_byte(PCI_PMER1, pmer1);
-
- if (gx_params->cs55x0->revision < 0x10) {
- /* CS5530(rev 1.2, 1.3) */
- suscfg = gx_params->pci_suscfg|SUSMOD;
- } else {
- /* CS5530A,B.. */
- suscfg = gx_params->pci_suscfg|SUSMOD|PWRSVE;
- }
- break;
- case PCI_DEVICE_ID_CYRIX_5520:
- case PCI_DEVICE_ID_CYRIX_5510:
- suscfg = gx_params->pci_suscfg | SUSMOD;
- break;
- default:
- local_irq_restore(flags);
- dprintk("fatal: try to set unknown chipset.\n");
- return;
- }
- } else {
- suscfg = gx_params->pci_suscfg & ~(SUSMOD);
- gx_params->off_duration = 0;
- gx_params->on_duration = 0;
- dprintk("suspend modulation disabled: cpu runs 100%% speed.\n");
- }
-
- gx_write_byte(PCI_MODOFF, gx_params->off_duration);
- gx_write_byte(PCI_MODON, gx_params->on_duration);
-
- gx_write_byte(PCI_SUSCFG, suscfg);
- pci_read_config_byte(gx_params->cs55x0, PCI_SUSCFG, &suscfg);
-
- local_irq_restore(flags);
-
- gx_params->pci_suscfg = suscfg;
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
- dprintk("suspend modulation w/ duration of ON:%d us, OFF:%d us\n",
- gx_params->on_duration * 32, gx_params->off_duration * 32);
- dprintk("suspend modulation w/ clock speed: %d kHz.\n", freqs.new);
-}
-
-/****************************************************************
- * High level functions *
- ****************************************************************/
-
-/*
- * cpufreq_gx_verify: test if frequency range is valid
- *
- * This function checks if a given frequency range in kHz is valid
- * for the hardware supported by the driver.
- */
-
-static int cpufreq_gx_verify(struct cpufreq_policy *policy)
-{
- unsigned int tmp_freq = 0;
- u8 tmp1, tmp2;
-
- if (!stock_freq || !policy)
- return -EINVAL;
-
- policy->cpu = 0;
- cpufreq_verify_within_limits(policy, (stock_freq / max_duration),
- stock_freq);
-
- /* it needs to be assured that at least one supported frequency is
- * within policy->min and policy->max. If it is not, policy->max
- * needs to be increased until one freuqency is supported.
- * policy->min may not be decreased, though. This way we guarantee a
- * specific processing capacity.
- */
- tmp_freq = gx_validate_speed(policy->min, &tmp1, &tmp2);
- if (tmp_freq < policy->min)
- tmp_freq += stock_freq / max_duration;
- policy->min = tmp_freq;
- if (policy->min > policy->max)
- policy->max = tmp_freq;
- tmp_freq = gx_validate_speed(policy->max, &tmp1, &tmp2);
- if (tmp_freq > policy->max)
- tmp_freq -= stock_freq / max_duration;
- policy->max = tmp_freq;
- if (policy->max < policy->min)
- policy->max = policy->min;
- cpufreq_verify_within_limits(policy, (stock_freq / max_duration),
- stock_freq);
-
- return 0;
-}
-
-/*
- * cpufreq_gx_target:
- *
- */
-static int cpufreq_gx_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- u8 tmp1, tmp2;
- unsigned int tmp_freq;
-
- if (!stock_freq || !policy)
- return -EINVAL;
-
- policy->cpu = 0;
-
- tmp_freq = gx_validate_speed(target_freq, &tmp1, &tmp2);
- while (tmp_freq < policy->min) {
- tmp_freq += stock_freq / max_duration;
- tmp_freq = gx_validate_speed(tmp_freq, &tmp1, &tmp2);
- }
- while (tmp_freq > policy->max) {
- tmp_freq -= stock_freq / max_duration;
- tmp_freq = gx_validate_speed(tmp_freq, &tmp1, &tmp2);
- }
-
- gx_set_cpuspeed(tmp_freq);
-
- return 0;
-}
-
-static int cpufreq_gx_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int maxfreq, curfreq;
-
- if (!policy || policy->cpu != 0)
- return -ENODEV;
-
- /* determine maximum frequency */
- if (pci_busclk)
- maxfreq = pci_busclk * gx_freq_mult[getCx86(CX86_DIR1) & 0x0f];
- else if (cpu_khz)
- maxfreq = cpu_khz;
- else
- maxfreq = 30000 * gx_freq_mult[getCx86(CX86_DIR1) & 0x0f];
-
- stock_freq = maxfreq;
- curfreq = gx_get_cpuspeed(0);
-
- dprintk("cpu max frequency is %d.\n", maxfreq);
- dprintk("cpu current frequency is %dkHz.\n", curfreq);
-
- /* setup basic struct for cpufreq API */
- policy->cpu = 0;
-
- if (max_duration < POLICY_MIN_DIV)
- policy->min = maxfreq / max_duration;
- else
- policy->min = maxfreq / POLICY_MIN_DIV;
- policy->max = maxfreq;
- policy->cur = curfreq;
- policy->cpuinfo.min_freq = maxfreq / max_duration;
- policy->cpuinfo.max_freq = maxfreq;
- policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
-
- return 0;
-}
-
-/*
- * cpufreq_gx_init:
- * MediaGX/Geode GX initialize cpufreq driver
- */
-static struct cpufreq_driver gx_suspmod_driver = {
- .get = gx_get_cpuspeed,
- .verify = cpufreq_gx_verify,
- .target = cpufreq_gx_target,
- .init = cpufreq_gx_cpu_init,
- .name = "gx-suspmod",
- .owner = THIS_MODULE,
-};
-
-static int __init cpufreq_gx_init(void)
-{
- int ret;
- struct gxfreq_params *params;
- struct pci_dev *gx_pci;
-
- /* Test if we have the right hardware */
- gx_pci = gx_detect_chipset();
- if (gx_pci == NULL)
- return -ENODEV;
-
- /* check whether module parameters are sane */
- if (max_duration > 0xff)
- max_duration = 0xff;
-
- dprintk("geode suspend modulation available.\n");
-
- params = kzalloc(sizeof(struct gxfreq_params), GFP_KERNEL);
- if (params == NULL)
- return -ENOMEM;
-
- params->cs55x0 = gx_pci;
- gx_params = params;
-
- /* keep cs55x0 configurations */
- pci_read_config_byte(params->cs55x0, PCI_SUSCFG, &(params->pci_suscfg));
- pci_read_config_byte(params->cs55x0, PCI_PMER1, &(params->pci_pmer1));
- pci_read_config_byte(params->cs55x0, PCI_PMER2, &(params->pci_pmer2));
- pci_read_config_byte(params->cs55x0, PCI_MODON, &(params->on_duration));
- pci_read_config_byte(params->cs55x0, PCI_MODOFF,
- &(params->off_duration));
-
- ret = cpufreq_register_driver(&gx_suspmod_driver);
- if (ret) {
- kfree(params);
- return ret; /* register error! */
- }
-
- return 0;
-}
-
-static void __exit cpufreq_gx_exit(void)
-{
- cpufreq_unregister_driver(&gx_suspmod_driver);
- pci_dev_put(gx_params->cs55x0);
- kfree(gx_params);
-}
-
-MODULE_AUTHOR("Hiroshi Miura <miura@da-cha.org>");
-MODULE_DESCRIPTION("Cpufreq driver for Cyrix MediaGX and NatSemi Geode");
-MODULE_LICENSE("GPL");
-
-module_init(cpufreq_gx_init);
-module_exit(cpufreq_gx_exit);
-
+++ /dev/null
-/*
- * (C) 2001-2004 Dave Jones. <davej@redhat.com>
- * (C) 2002 Padraig Brady. <padraig@antefacto.com>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- * Based upon datasheets & sample CPUs kindly provided by VIA.
- *
- * VIA have currently 3 different versions of Longhaul.
- * Version 1 (Longhaul) uses the BCR2 MSR at 0x1147.
- * It is present only in Samuel 1 (C5A), Samuel 2 (C5B) stepping 0.
- * Version 2 of longhaul is backward compatible with v1, but adds
- * LONGHAUL MSR for purpose of both frequency and voltage scaling.
- * Present in Samuel 2 (steppings 1-7 only) (C5B), and Ezra (C5C).
- * Version 3 of longhaul got renamed to Powersaver and redesigned
- * to use only the POWERSAVER MSR at 0x110a.
- * It is present in Ezra-T (C5M), Nehemiah (C5X) and above.
- * It's pretty much the same feature wise to longhaul v2, though
- * there is provision for scaling FSB too, but this doesn't work
- * too well in practice so we don't even try to use this.
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/pci.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/delay.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-#include <linux/acpi.h>
-
-#include <asm/msr.h>
-#include <acpi/processor.h>
-
-#include "longhaul.h"
-
-#define PFX "longhaul: "
-
-#define TYPE_LONGHAUL_V1 1
-#define TYPE_LONGHAUL_V2 2
-#define TYPE_POWERSAVER 3
-
-#define CPU_SAMUEL 1
-#define CPU_SAMUEL2 2
-#define CPU_EZRA 3
-#define CPU_EZRA_T 4
-#define CPU_NEHEMIAH 5
-#define CPU_NEHEMIAH_C 6
-
-/* Flags */
-#define USE_ACPI_C3 (1 << 1)
-#define USE_NORTHBRIDGE (1 << 2)
-
-static int cpu_model;
-static unsigned int numscales = 16;
-static unsigned int fsb;
-
-static const struct mV_pos *vrm_mV_table;
-static const unsigned char *mV_vrm_table;
-
-static unsigned int highest_speed, lowest_speed; /* kHz */
-static unsigned int minmult, maxmult;
-static int can_scale_voltage;
-static struct acpi_processor *pr;
-static struct acpi_processor_cx *cx;
-static u32 acpi_regs_addr;
-static u8 longhaul_flags;
-static unsigned int longhaul_index;
-
-/* Module parameters */
-static int scale_voltage;
-static int disable_acpi_c3;
-static int revid_errata;
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "longhaul", msg)
-
-
-/* Clock ratios multiplied by 10 */
-static int mults[32];
-static int eblcr[32];
-static int longhaul_version;
-static struct cpufreq_frequency_table *longhaul_table;
-
-#ifdef CONFIG_CPU_FREQ_DEBUG
-static char speedbuffer[8];
-
-static char *print_speed(int speed)
-{
- if (speed < 1000) {
- snprintf(speedbuffer, sizeof(speedbuffer), "%dMHz", speed);
- return speedbuffer;
- }
-
- if (speed%1000 == 0)
- snprintf(speedbuffer, sizeof(speedbuffer),
- "%dGHz", speed/1000);
- else
- snprintf(speedbuffer, sizeof(speedbuffer),
- "%d.%dGHz", speed/1000, (speed%1000)/100);
-
- return speedbuffer;
-}
-#endif
-
-
-static unsigned int calc_speed(int mult)
-{
- int khz;
- khz = (mult/10)*fsb;
- if (mult%10)
- khz += fsb/2;
- khz *= 1000;
- return khz;
-}
-
-
-static int longhaul_get_cpu_mult(void)
-{
- unsigned long invalue = 0, lo, hi;
-
- rdmsr(MSR_IA32_EBL_CR_POWERON, lo, hi);
- invalue = (lo & (1<<22|1<<23|1<<24|1<<25))>>22;
- if (longhaul_version == TYPE_LONGHAUL_V2 ||
- longhaul_version == TYPE_POWERSAVER) {
- if (lo & (1<<27))
- invalue += 16;
- }
- return eblcr[invalue];
-}
-
-/* For processor with BCR2 MSR */
-
-static void do_longhaul1(unsigned int mults_index)
-{
- union msr_bcr2 bcr2;
-
- rdmsrl(MSR_VIA_BCR2, bcr2.val);
- /* Enable software clock multiplier */
- bcr2.bits.ESOFTBF = 1;
- bcr2.bits.CLOCKMUL = mults_index & 0xff;
-
- /* Sync to timer tick */
- safe_halt();
- /* Change frequency on next halt or sleep */
- wrmsrl(MSR_VIA_BCR2, bcr2.val);
- /* Invoke transition */
- ACPI_FLUSH_CPU_CACHE();
- halt();
-
- /* Disable software clock multiplier */
- local_irq_disable();
- rdmsrl(MSR_VIA_BCR2, bcr2.val);
- bcr2.bits.ESOFTBF = 0;
- wrmsrl(MSR_VIA_BCR2, bcr2.val);
-}
-
-/* For processor with Longhaul MSR */
-
-static void do_powersaver(int cx_address, unsigned int mults_index,
- unsigned int dir)
-{
- union msr_longhaul longhaul;
- u32 t;
-
- rdmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- /* Setup new frequency */
- if (!revid_errata)
- longhaul.bits.RevisionKey = longhaul.bits.RevisionID;
- else
- longhaul.bits.RevisionKey = 0;
- longhaul.bits.SoftBusRatio = mults_index & 0xf;
- longhaul.bits.SoftBusRatio4 = (mults_index & 0x10) >> 4;
- /* Setup new voltage */
- if (can_scale_voltage)
- longhaul.bits.SoftVID = (mults_index >> 8) & 0x1f;
- /* Sync to timer tick */
- safe_halt();
- /* Raise voltage if necessary */
- if (can_scale_voltage && dir) {
- longhaul.bits.EnableSoftVID = 1;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- /* Change voltage */
- if (!cx_address) {
- ACPI_FLUSH_CPU_CACHE();
- halt();
- } else {
- ACPI_FLUSH_CPU_CACHE();
- /* Invoke C3 */
- inb(cx_address);
- /* Dummy op - must do something useless after P_LVL3
- * read */
- t = inl(acpi_gbl_FADT.xpm_timer_block.address);
- }
- longhaul.bits.EnableSoftVID = 0;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- }
-
- /* Change frequency on next halt or sleep */
- longhaul.bits.EnableSoftBusRatio = 1;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- if (!cx_address) {
- ACPI_FLUSH_CPU_CACHE();
- halt();
- } else {
- ACPI_FLUSH_CPU_CACHE();
- /* Invoke C3 */
- inb(cx_address);
- /* Dummy op - must do something useless after P_LVL3 read */
- t = inl(acpi_gbl_FADT.xpm_timer_block.address);
- }
- /* Disable bus ratio bit */
- longhaul.bits.EnableSoftBusRatio = 0;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
-
- /* Reduce voltage if necessary */
- if (can_scale_voltage && !dir) {
- longhaul.bits.EnableSoftVID = 1;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- /* Change voltage */
- if (!cx_address) {
- ACPI_FLUSH_CPU_CACHE();
- halt();
- } else {
- ACPI_FLUSH_CPU_CACHE();
- /* Invoke C3 */
- inb(cx_address);
- /* Dummy op - must do something useless after P_LVL3
- * read */
- t = inl(acpi_gbl_FADT.xpm_timer_block.address);
- }
- longhaul.bits.EnableSoftVID = 0;
- wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- }
-}
-
-/**
- * longhaul_set_cpu_frequency()
- * @mults_index : bitpattern of the new multiplier.
- *
- * Sets a new clock ratio.
- */
-
-static void longhaul_setstate(unsigned int table_index)
-{
- unsigned int mults_index;
- int speed, mult;
- struct cpufreq_freqs freqs;
- unsigned long flags;
- unsigned int pic1_mask, pic2_mask;
- u16 bm_status = 0;
- u32 bm_timeout = 1000;
- unsigned int dir = 0;
-
- mults_index = longhaul_table[table_index].index;
- /* Safety precautions */
- mult = mults[mults_index & 0x1f];
- if (mult == -1)
- return;
- speed = calc_speed(mult);
- if ((speed > highest_speed) || (speed < lowest_speed))
- return;
- /* Voltage transition before frequency transition? */
- if (can_scale_voltage && longhaul_index < table_index)
- dir = 1;
-
- freqs.old = calc_speed(longhaul_get_cpu_mult());
- freqs.new = speed;
- freqs.cpu = 0; /* longhaul.c is UP only driver */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- dprintk("Setting to FSB:%dMHz Mult:%d.%dx (%s)\n",
- fsb, mult/10, mult%10, print_speed(speed/1000));
-retry_loop:
- preempt_disable();
- local_irq_save(flags);
-
- pic2_mask = inb(0xA1);
- pic1_mask = inb(0x21); /* works on C3. save mask. */
- outb(0xFF, 0xA1); /* Overkill */
- outb(0xFE, 0x21); /* TMR0 only */
-
- /* Wait while PCI bus is busy. */
- if (acpi_regs_addr && (longhaul_flags & USE_NORTHBRIDGE
- || ((pr != NULL) && pr->flags.bm_control))) {
- bm_status = inw(acpi_regs_addr);
- bm_status &= 1 << 4;
- while (bm_status && bm_timeout) {
- outw(1 << 4, acpi_regs_addr);
- bm_timeout--;
- bm_status = inw(acpi_regs_addr);
- bm_status &= 1 << 4;
- }
- }
-
- if (longhaul_flags & USE_NORTHBRIDGE) {
- /* Disable AGP and PCI arbiters */
- outb(3, 0x22);
- } else if ((pr != NULL) && pr->flags.bm_control) {
- /* Disable bus master arbitration */
- acpi_write_bit_register(ACPI_BITREG_ARB_DISABLE, 1);
- }
- switch (longhaul_version) {
-
- /*
- * Longhaul v1. (Samuel[C5A] and Samuel2 stepping 0[C5B])
- * Software controlled multipliers only.
- */
- case TYPE_LONGHAUL_V1:
- do_longhaul1(mults_index);
- break;
-
- /*
- * Longhaul v2 appears in Samuel2 Steppings 1->7 [C5B] and Ezra [C5C]
- *
- * Longhaul v3 (aka Powersaver). (Ezra-T [C5M] & Nehemiah [C5N])
- * Nehemiah can do FSB scaling too, but this has never been proven
- * to work in practice.
- */
- case TYPE_LONGHAUL_V2:
- case TYPE_POWERSAVER:
- if (longhaul_flags & USE_ACPI_C3) {
- /* Don't allow wakeup */
- acpi_write_bit_register(ACPI_BITREG_BUS_MASTER_RLD, 0);
- do_powersaver(cx->address, mults_index, dir);
- } else {
- do_powersaver(0, mults_index, dir);
- }
- break;
- }
-
- if (longhaul_flags & USE_NORTHBRIDGE) {
- /* Enable arbiters */
- outb(0, 0x22);
- } else if ((pr != NULL) && pr->flags.bm_control) {
- /* Enable bus master arbitration */
- acpi_write_bit_register(ACPI_BITREG_ARB_DISABLE, 0);
- }
- outb(pic2_mask, 0xA1); /* restore mask */
- outb(pic1_mask, 0x21);
-
- local_irq_restore(flags);
- preempt_enable();
-
- freqs.new = calc_speed(longhaul_get_cpu_mult());
- /* Check if requested frequency is set. */
- if (unlikely(freqs.new != speed)) {
- printk(KERN_INFO PFX "Failed to set requested frequency!\n");
- /* Revision ID = 1 but processor is expecting revision key
- * equal to 0. Jumpers at the bottom of processor will change
- * multiplier and FSB, but will not change bits in Longhaul
- * MSR nor enable voltage scaling. */
- if (!revid_errata) {
- printk(KERN_INFO PFX "Enabling \"Ignore Revision ID\" "
- "option.\n");
- revid_errata = 1;
- msleep(200);
- goto retry_loop;
- }
- /* Why ACPI C3 sometimes doesn't work is a mystery for me.
- * But it does happen. Processor is entering ACPI C3 state,
- * but it doesn't change frequency. I tried poking various
- * bits in northbridge registers, but without success. */
- if (longhaul_flags & USE_ACPI_C3) {
- printk(KERN_INFO PFX "Disabling ACPI C3 support.\n");
- longhaul_flags &= ~USE_ACPI_C3;
- if (revid_errata) {
- printk(KERN_INFO PFX "Disabling \"Ignore "
- "Revision ID\" option.\n");
- revid_errata = 0;
- }
- msleep(200);
- goto retry_loop;
- }
- /* This shouldn't happen. Longhaul ver. 2 was reported not
- * working on processors without voltage scaling, but with
- * RevID = 1. RevID errata will make things right. Just
- * to be 100% sure. */
- if (longhaul_version == TYPE_LONGHAUL_V2) {
- printk(KERN_INFO PFX "Switching to Longhaul ver. 1\n");
- longhaul_version = TYPE_LONGHAUL_V1;
- msleep(200);
- goto retry_loop;
- }
- }
- /* Report true CPU frequency */
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
- if (!bm_timeout)
- printk(KERN_INFO PFX "Warning: Timeout while waiting for "
- "idle PCI bus.\n");
-}
-
-/*
- * Centaur decided to make life a little more tricky.
- * Only longhaul v1 is allowed to read EBLCR BSEL[0:1].
- * Samuel2 and above have to try and guess what the FSB is.
- * We do this by assuming we booted at maximum multiplier, and interpolate
- * between that value multiplied by possible FSBs and cpu_mhz which
- * was calculated at boot time. Really ugly, but no other way to do this.
- */
-
-#define ROUNDING 0xf
-
-static int guess_fsb(int mult)
-{
- int speed = cpu_khz / 1000;
- int i;
- int speeds[] = { 666, 1000, 1333, 2000 };
- int f_max, f_min;
-
- for (i = 0; i < 4; i++) {
- f_max = ((speeds[i] * mult) + 50) / 100;
- f_max += (ROUNDING / 2);
- f_min = f_max - ROUNDING;
- if ((speed <= f_max) && (speed >= f_min))
- return speeds[i] / 10;
- }
- return 0;
-}
-
-
-static int __cpuinit longhaul_get_ranges(void)
-{
- unsigned int i, j, k = 0;
- unsigned int ratio;
- int mult;
-
- /* Get current frequency */
- mult = longhaul_get_cpu_mult();
- if (mult == -1) {
- printk(KERN_INFO PFX "Invalid (reserved) multiplier!\n");
- return -EINVAL;
- }
- fsb = guess_fsb(mult);
- if (fsb == 0) {
- printk(KERN_INFO PFX "Invalid (reserved) FSB!\n");
- return -EINVAL;
- }
- /* Get max multiplier - as we always did.
- * Longhaul MSR is useful only when voltage scaling is enabled.
- * C3 is booting at max anyway. */
- maxmult = mult;
- /* Get min multiplier */
- switch (cpu_model) {
- case CPU_NEHEMIAH:
- minmult = 50;
- break;
- case CPU_NEHEMIAH_C:
- minmult = 40;
- break;
- default:
- minmult = 30;
- break;
- }
-
- dprintk("MinMult:%d.%dx MaxMult:%d.%dx\n",
- minmult/10, minmult%10, maxmult/10, maxmult%10);
-
- highest_speed = calc_speed(maxmult);
- lowest_speed = calc_speed(minmult);
- dprintk("FSB:%dMHz Lowest speed: %s Highest speed:%s\n", fsb,
- print_speed(lowest_speed/1000),
- print_speed(highest_speed/1000));
-
- if (lowest_speed == highest_speed) {
- printk(KERN_INFO PFX "highestspeed == lowest, aborting.\n");
- return -EINVAL;
- }
- if (lowest_speed > highest_speed) {
- printk(KERN_INFO PFX "nonsense! lowest (%d > %d) !\n",
- lowest_speed, highest_speed);
- return -EINVAL;
- }
-
- longhaul_table = kmalloc((numscales + 1) * sizeof(*longhaul_table),
- GFP_KERNEL);
- if (!longhaul_table)
- return -ENOMEM;
-
- for (j = 0; j < numscales; j++) {
- ratio = mults[j];
- if (ratio == -1)
- continue;
- if (ratio > maxmult || ratio < minmult)
- continue;
- longhaul_table[k].frequency = calc_speed(ratio);
- longhaul_table[k].index = j;
- k++;
- }
- if (k <= 1) {
- kfree(longhaul_table);
- return -ENODEV;
- }
- /* Sort */
- for (j = 0; j < k - 1; j++) {
- unsigned int min_f, min_i;
- min_f = longhaul_table[j].frequency;
- min_i = j;
- for (i = j + 1; i < k; i++) {
- if (longhaul_table[i].frequency < min_f) {
- min_f = longhaul_table[i].frequency;
- min_i = i;
- }
- }
- if (min_i != j) {
- swap(longhaul_table[j].frequency,
- longhaul_table[min_i].frequency);
- swap(longhaul_table[j].index,
- longhaul_table[min_i].index);
- }
- }
-
- longhaul_table[k].frequency = CPUFREQ_TABLE_END;
-
- /* Find index we are running on */
- for (j = 0; j < k; j++) {
- if (mults[longhaul_table[j].index & 0x1f] == mult) {
- longhaul_index = j;
- break;
- }
- }
- return 0;
-}
-
-
-static void __cpuinit longhaul_setup_voltagescaling(void)
-{
- union msr_longhaul longhaul;
- struct mV_pos minvid, maxvid, vid;
- unsigned int j, speed, pos, kHz_step, numvscales;
- int min_vid_speed;
-
- rdmsrl(MSR_VIA_LONGHAUL, longhaul.val);
- if (!(longhaul.bits.RevisionID & 1)) {
- printk(KERN_INFO PFX "Voltage scaling not supported by CPU.\n");
- return;
- }
-
- if (!longhaul.bits.VRMRev) {
- printk(KERN_INFO PFX "VRM 8.5\n");
- vrm_mV_table = &vrm85_mV[0];
- mV_vrm_table = &mV_vrm85[0];
- } else {
- printk(KERN_INFO PFX "Mobile VRM\n");
- if (cpu_model < CPU_NEHEMIAH)
- return;
- vrm_mV_table = &mobilevrm_mV[0];
- mV_vrm_table = &mV_mobilevrm[0];
- }
-
- minvid = vrm_mV_table[longhaul.bits.MinimumVID];
- maxvid = vrm_mV_table[longhaul.bits.MaximumVID];
-
- if (minvid.mV == 0 || maxvid.mV == 0 || minvid.mV > maxvid.mV) {
- printk(KERN_INFO PFX "Bogus values Min:%d.%03d Max:%d.%03d. "
- "Voltage scaling disabled.\n",
- minvid.mV/1000, minvid.mV%1000,
- maxvid.mV/1000, maxvid.mV%1000);
- return;
- }
-
- if (minvid.mV == maxvid.mV) {
- printk(KERN_INFO PFX "Claims to support voltage scaling but "
- "min & max are both %d.%03d. "
- "Voltage scaling disabled\n",
- maxvid.mV/1000, maxvid.mV%1000);
- return;
- }
-
- /* How many voltage steps*/
- numvscales = maxvid.pos - minvid.pos + 1;
- printk(KERN_INFO PFX
- "Max VID=%d.%03d "
- "Min VID=%d.%03d, "
- "%d possible voltage scales\n",
- maxvid.mV/1000, maxvid.mV%1000,
- minvid.mV/1000, minvid.mV%1000,
- numvscales);
-
- /* Calculate max frequency at min voltage */
- j = longhaul.bits.MinMHzBR;
- if (longhaul.bits.MinMHzBR4)
- j += 16;
- min_vid_speed = eblcr[j];
- if (min_vid_speed == -1)
- return;
- switch (longhaul.bits.MinMHzFSB) {
- case 0:
- min_vid_speed *= 13333;
- break;
- case 1:
- min_vid_speed *= 10000;
- break;
- case 3:
- min_vid_speed *= 6666;
- break;
- default:
- return;
- break;
- }
- if (min_vid_speed >= highest_speed)
- return;
- /* Calculate kHz for one voltage step */
- kHz_step = (highest_speed - min_vid_speed) / numvscales;
-
- j = 0;
- while (longhaul_table[j].frequency != CPUFREQ_TABLE_END) {
- speed = longhaul_table[j].frequency;
- if (speed > min_vid_speed)
- pos = (speed - min_vid_speed) / kHz_step + minvid.pos;
- else
- pos = minvid.pos;
- longhaul_table[j].index |= mV_vrm_table[pos] << 8;
- vid = vrm_mV_table[mV_vrm_table[pos]];
- printk(KERN_INFO PFX "f: %d kHz, index: %d, vid: %d mV\n",
- speed, j, vid.mV);
- j++;
- }
-
- can_scale_voltage = 1;
- printk(KERN_INFO PFX "Voltage scaling enabled.\n");
-}
-
-
-static int longhaul_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, longhaul_table);
-}
-
-
-static int longhaul_target(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
-{
- unsigned int table_index = 0;
- unsigned int i;
- unsigned int dir = 0;
- u8 vid, current_vid;
-
- if (cpufreq_frequency_table_target(policy, longhaul_table, target_freq,
- relation, &table_index))
- return -EINVAL;
-
- /* Don't set same frequency again */
- if (longhaul_index == table_index)
- return 0;
-
- if (!can_scale_voltage)
- longhaul_setstate(table_index);
- else {
- /* On test system voltage transitions exceeding single
- * step up or down were turning motherboard off. Both
- * "ondemand" and "userspace" are unsafe. C7 is doing
- * this in hardware, C3 is old and we need to do this
- * in software. */
- i = longhaul_index;
- current_vid = (longhaul_table[longhaul_index].index >> 8);
- current_vid &= 0x1f;
- if (table_index > longhaul_index)
- dir = 1;
- while (i != table_index) {
- vid = (longhaul_table[i].index >> 8) & 0x1f;
- if (vid != current_vid) {
- longhaul_setstate(i);
- current_vid = vid;
- msleep(200);
- }
- if (dir)
- i++;
- else
- i--;
- }
- longhaul_setstate(table_index);
- }
- longhaul_index = table_index;
- return 0;
-}
-
-
-static unsigned int longhaul_get(unsigned int cpu)
-{
- if (cpu)
- return 0;
- return calc_speed(longhaul_get_cpu_mult());
-}
-
-static acpi_status longhaul_walk_callback(acpi_handle obj_handle,
- u32 nesting_level,
- void *context, void **return_value)
-{
- struct acpi_device *d;
-
- if (acpi_bus_get_device(obj_handle, &d))
- return 0;
-
- *return_value = acpi_driver_data(d);
- return 1;
-}
-
-/* VIA don't support PM2 reg, but have something similar */
-static int enable_arbiter_disable(void)
-{
- struct pci_dev *dev;
- int status = 1;
- int reg;
- u8 pci_cmd;
-
- /* Find PLE133 host bridge */
- reg = 0x78;
- dev = pci_get_device(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8601_0,
- NULL);
- /* Find PM133/VT8605 host bridge */
- if (dev == NULL)
- dev = pci_get_device(PCI_VENDOR_ID_VIA,
- PCI_DEVICE_ID_VIA_8605_0, NULL);
- /* Find CLE266 host bridge */
- if (dev == NULL) {
- reg = 0x76;
- dev = pci_get_device(PCI_VENDOR_ID_VIA,
- PCI_DEVICE_ID_VIA_862X_0, NULL);
- /* Find CN400 V-Link host bridge */
- if (dev == NULL)
- dev = pci_get_device(PCI_VENDOR_ID_VIA, 0x7259, NULL);
- }
- if (dev != NULL) {
- /* Enable access to port 0x22 */
- pci_read_config_byte(dev, reg, &pci_cmd);
- if (!(pci_cmd & 1<<7)) {
- pci_cmd |= 1<<7;
- pci_write_config_byte(dev, reg, pci_cmd);
- pci_read_config_byte(dev, reg, &pci_cmd);
- if (!(pci_cmd & 1<<7)) {
- printk(KERN_ERR PFX
- "Can't enable access to port 0x22.\n");
- status = 0;
- }
- }
- pci_dev_put(dev);
- return status;
- }
- return 0;
-}
-
-static int longhaul_setup_southbridge(void)
-{
- struct pci_dev *dev;
- u8 pci_cmd;
-
- /* Find VT8235 southbridge */
- dev = pci_get_device(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8235, NULL);
- if (dev == NULL)
- /* Find VT8237 southbridge */
- dev = pci_get_device(PCI_VENDOR_ID_VIA,
- PCI_DEVICE_ID_VIA_8237, NULL);
- if (dev != NULL) {
- /* Set transition time to max */
- pci_read_config_byte(dev, 0xec, &pci_cmd);
- pci_cmd &= ~(1 << 2);
- pci_write_config_byte(dev, 0xec, pci_cmd);
- pci_read_config_byte(dev, 0xe4, &pci_cmd);
- pci_cmd &= ~(1 << 7);
- pci_write_config_byte(dev, 0xe4, pci_cmd);
- pci_read_config_byte(dev, 0xe5, &pci_cmd);
- pci_cmd |= 1 << 7;
- pci_write_config_byte(dev, 0xe5, pci_cmd);
- /* Get address of ACPI registers block*/
- pci_read_config_byte(dev, 0x81, &pci_cmd);
- if (pci_cmd & 1 << 7) {
- pci_read_config_dword(dev, 0x88, &acpi_regs_addr);
- acpi_regs_addr &= 0xff00;
- printk(KERN_INFO PFX "ACPI I/O at 0x%x\n",
- acpi_regs_addr);
- }
-
- pci_dev_put(dev);
- return 1;
- }
- return 0;
-}
-
-static int __cpuinit longhaul_cpu_init(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- char *cpuname = NULL;
- int ret;
- u32 lo, hi;
-
- /* Check what we have on this motherboard */
- switch (c->x86_model) {
- case 6:
- cpu_model = CPU_SAMUEL;
- cpuname = "C3 'Samuel' [C5A]";
- longhaul_version = TYPE_LONGHAUL_V1;
- memcpy(mults, samuel1_mults, sizeof(samuel1_mults));
- memcpy(eblcr, samuel1_eblcr, sizeof(samuel1_eblcr));
- break;
-
- case 7:
- switch (c->x86_mask) {
- case 0:
- longhaul_version = TYPE_LONGHAUL_V1;
- cpu_model = CPU_SAMUEL2;
- cpuname = "C3 'Samuel 2' [C5B]";
- /* Note, this is not a typo, early Samuel2's had
- * Samuel1 ratios. */
- memcpy(mults, samuel1_mults, sizeof(samuel1_mults));
- memcpy(eblcr, samuel2_eblcr, sizeof(samuel2_eblcr));
- break;
- case 1 ... 15:
- longhaul_version = TYPE_LONGHAUL_V2;
- if (c->x86_mask < 8) {
- cpu_model = CPU_SAMUEL2;
- cpuname = "C3 'Samuel 2' [C5B]";
- } else {
- cpu_model = CPU_EZRA;
- cpuname = "C3 'Ezra' [C5C]";
- }
- memcpy(mults, ezra_mults, sizeof(ezra_mults));
- memcpy(eblcr, ezra_eblcr, sizeof(ezra_eblcr));
- break;
- }
- break;
-
- case 8:
- cpu_model = CPU_EZRA_T;
- cpuname = "C3 'Ezra-T' [C5M]";
- longhaul_version = TYPE_POWERSAVER;
- numscales = 32;
- memcpy(mults, ezrat_mults, sizeof(ezrat_mults));
- memcpy(eblcr, ezrat_eblcr, sizeof(ezrat_eblcr));
- break;
-
- case 9:
- longhaul_version = TYPE_POWERSAVER;
- numscales = 32;
- memcpy(mults, nehemiah_mults, sizeof(nehemiah_mults));
- memcpy(eblcr, nehemiah_eblcr, sizeof(nehemiah_eblcr));
- switch (c->x86_mask) {
- case 0 ... 1:
- cpu_model = CPU_NEHEMIAH;
- cpuname = "C3 'Nehemiah A' [C5XLOE]";
- break;
- case 2 ... 4:
- cpu_model = CPU_NEHEMIAH;
- cpuname = "C3 'Nehemiah B' [C5XLOH]";
- break;
- case 5 ... 15:
- cpu_model = CPU_NEHEMIAH_C;
- cpuname = "C3 'Nehemiah C' [C5P]";
- break;
- }
- break;
-
- default:
- cpuname = "Unknown";
- break;
- }
- /* Check Longhaul ver. 2 */
- if (longhaul_version == TYPE_LONGHAUL_V2) {
- rdmsr(MSR_VIA_LONGHAUL, lo, hi);
- if (lo == 0 && hi == 0)
- /* Looks like MSR isn't present */
- longhaul_version = TYPE_LONGHAUL_V1;
- }
-
- printk(KERN_INFO PFX "VIA %s CPU detected. ", cpuname);
- switch (longhaul_version) {
- case TYPE_LONGHAUL_V1:
- case TYPE_LONGHAUL_V2:
- printk(KERN_CONT "Longhaul v%d supported.\n", longhaul_version);
- break;
- case TYPE_POWERSAVER:
- printk(KERN_CONT "Powersaver supported.\n");
- break;
- };
-
- /* Doesn't hurt */
- longhaul_setup_southbridge();
-
- /* Find ACPI data for processor */
- acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
- ACPI_UINT32_MAX, &longhaul_walk_callback, NULL,
- NULL, (void *)&pr);
-
- /* Check ACPI support for C3 state */
- if (pr != NULL && longhaul_version == TYPE_POWERSAVER) {
- cx = &pr->power.states[ACPI_STATE_C3];
- if (cx->address > 0 && cx->latency <= 1000)
- longhaul_flags |= USE_ACPI_C3;
- }
- /* Disable if it isn't working */
- if (disable_acpi_c3)
- longhaul_flags &= ~USE_ACPI_C3;
- /* Check if northbridge is friendly */
- if (enable_arbiter_disable())
- longhaul_flags |= USE_NORTHBRIDGE;
-
- /* Check ACPI support for bus master arbiter disable */
- if (!(longhaul_flags & USE_ACPI_C3
- || longhaul_flags & USE_NORTHBRIDGE)
- && ((pr == NULL) || !(pr->flags.bm_control))) {
- printk(KERN_ERR PFX
- "No ACPI support. Unsupported northbridge.\n");
- return -ENODEV;
- }
-
- if (longhaul_flags & USE_NORTHBRIDGE)
- printk(KERN_INFO PFX "Using northbridge support.\n");
- if (longhaul_flags & USE_ACPI_C3)
- printk(KERN_INFO PFX "Using ACPI support.\n");
-
- ret = longhaul_get_ranges();
- if (ret != 0)
- return ret;
-
- if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
- longhaul_setup_voltagescaling();
-
- policy->cpuinfo.transition_latency = 200000; /* nsec */
- policy->cur = calc_speed(longhaul_get_cpu_mult());
-
- ret = cpufreq_frequency_table_cpuinfo(policy, longhaul_table);
- if (ret)
- return ret;
-
- cpufreq_frequency_table_get_attr(longhaul_table, policy->cpu);
-
- return 0;
-}
-
-static int __devexit longhaul_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-static struct freq_attr *longhaul_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver longhaul_driver = {
- .verify = longhaul_verify,
- .target = longhaul_target,
- .get = longhaul_get,
- .init = longhaul_cpu_init,
- .exit = __devexit_p(longhaul_cpu_exit),
- .name = "longhaul",
- .owner = THIS_MODULE,
- .attr = longhaul_attr,
-};
-
-
-static int __init longhaul_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- if (c->x86_vendor != X86_VENDOR_CENTAUR || c->x86 != 6)
- return -ENODEV;
-
-#ifdef CONFIG_SMP
- if (num_online_cpus() > 1) {
- printk(KERN_ERR PFX "More than 1 CPU detected, "
- "longhaul disabled.\n");
- return -ENODEV;
- }
-#endif
-#ifdef CONFIG_X86_IO_APIC
- if (cpu_has_apic) {
- printk(KERN_ERR PFX "APIC detected. Longhaul is currently "
- "broken in this configuration.\n");
- return -ENODEV;
- }
-#endif
- switch (c->x86_model) {
- case 6 ... 9:
- return cpufreq_register_driver(&longhaul_driver);
- case 10:
- printk(KERN_ERR PFX "Use acpi-cpufreq driver for VIA C7\n");
- default:
- ;
- }
-
- return -ENODEV;
-}
-
-
-static void __exit longhaul_exit(void)
-{
- int i;
-
- for (i = 0; i < numscales; i++) {
- if (mults[i] == maxmult) {
- longhaul_setstate(i);
- break;
- }
- }
-
- cpufreq_unregister_driver(&longhaul_driver);
- kfree(longhaul_table);
-}
-
-/* Even if BIOS is exporting ACPI C3 state, and it is used
- * with success when CPU is idle, this state doesn't
- * trigger frequency transition in some cases. */
-module_param(disable_acpi_c3, int, 0644);
-MODULE_PARM_DESC(disable_acpi_c3, "Don't use ACPI C3 support");
-/* Change CPU voltage with frequency. Very useful to save
- * power, but most VIA C3 processors aren't supporting it. */
-module_param(scale_voltage, int, 0644);
-MODULE_PARM_DESC(scale_voltage, "Scale voltage of processor");
-/* Force revision key to 0 for processors which doesn't
- * support voltage scaling, but are introducing itself as
- * such. */
-module_param(revid_errata, int, 0644);
-MODULE_PARM_DESC(revid_errata, "Ignore CPU Revision ID");
-
-MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
-MODULE_DESCRIPTION("Longhaul driver for VIA Cyrix processors.");
-MODULE_LICENSE("GPL");
-
-late_initcall(longhaul_init);
-module_exit(longhaul_exit);
+++ /dev/null
-/*
- * longhaul.h
- * (C) 2003 Dave Jones.
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * VIA-specific information
- */
-
-union msr_bcr2 {
- struct {
- unsigned Reseved:19, // 18:0
- ESOFTBF:1, // 19
- Reserved2:3, // 22:20
- CLOCKMUL:4, // 26:23
- Reserved3:5; // 31:27
- } bits;
- unsigned long val;
-};
-
-union msr_longhaul {
- struct {
- unsigned RevisionID:4, // 3:0
- RevisionKey:4, // 7:4
- EnableSoftBusRatio:1, // 8
- EnableSoftVID:1, // 9
- EnableSoftBSEL:1, // 10
- Reserved:3, // 11:13
- SoftBusRatio4:1, // 14
- VRMRev:1, // 15
- SoftBusRatio:4, // 19:16
- SoftVID:5, // 24:20
- Reserved2:3, // 27:25
- SoftBSEL:2, // 29:28
- Reserved3:2, // 31:30
- MaxMHzBR:4, // 35:32
- MaximumVID:5, // 40:36
- MaxMHzFSB:2, // 42:41
- MaxMHzBR4:1, // 43
- Reserved4:4, // 47:44
- MinMHzBR:4, // 51:48
- MinimumVID:5, // 56:52
- MinMHzFSB:2, // 58:57
- MinMHzBR4:1, // 59
- Reserved5:4; // 63:60
- } bits;
- unsigned long long val;
-};
-
-/*
- * Clock ratio tables. Div/Mod by 10 to get ratio.
- * The eblcr values specify the ratio read from the CPU.
- * The mults values specify what to write to the CPU.
- */
-
-/*
- * VIA C3 Samuel 1 & Samuel 2 (stepping 0)
- */
-static const int __cpuinitdata samuel1_mults[16] = {
- -1, /* 0000 -> RESERVED */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- -1, /* 0011 -> RESERVED */
- -1, /* 0100 -> RESERVED */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 55, /* 0111 -> 5.5x */
- 60, /* 1000 -> 6.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 50, /* 1011 -> 5.0x */
- 65, /* 1100 -> 6.5x */
- 75, /* 1101 -> 7.5x */
- -1, /* 1110 -> RESERVED */
- -1, /* 1111 -> RESERVED */
-};
-
-static const int __cpuinitdata samuel1_eblcr[16] = {
- 50, /* 0000 -> RESERVED */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- -1, /* 0011 -> RESERVED */
- 55, /* 0100 -> 5.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- -1, /* 0111 -> RESERVED */
- -1, /* 1000 -> RESERVED */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 60, /* 1011 -> 6.0x */
- -1, /* 1100 -> RESERVED */
- 75, /* 1101 -> 7.5x */
- -1, /* 1110 -> RESERVED */
- 65, /* 1111 -> 6.5x */
-};
-
-/*
- * VIA C3 Samuel2 Stepping 1->15
- */
-static const int __cpuinitdata samuel2_eblcr[16] = {
- 50, /* 0000 -> 5.0x */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- 100, /* 0011 -> 10.0x */
- 55, /* 0100 -> 5.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 110, /* 0111 -> 11.0x */
- 90, /* 1000 -> 9.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 60, /* 1011 -> 6.0x */
- 120, /* 1100 -> 12.0x */
- 75, /* 1101 -> 7.5x */
- 130, /* 1110 -> 13.0x */
- 65, /* 1111 -> 6.5x */
-};
-
-/*
- * VIA C3 Ezra
- */
-static const int __cpuinitdata ezra_mults[16] = {
- 100, /* 0000 -> 10.0x */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- 90, /* 0011 -> 9.0x */
- 95, /* 0100 -> 9.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 55, /* 0111 -> 5.5x */
- 60, /* 1000 -> 6.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 50, /* 1011 -> 5.0x */
- 65, /* 1100 -> 6.5x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 120, /* 1111 -> 12.0x */
-};
-
-static const int __cpuinitdata ezra_eblcr[16] = {
- 50, /* 0000 -> 5.0x */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- 100, /* 0011 -> 10.0x */
- 55, /* 0100 -> 5.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 95, /* 0111 -> 9.5x */
- 90, /* 1000 -> 9.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 60, /* 1011 -> 6.0x */
- 120, /* 1100 -> 12.0x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 65, /* 1111 -> 6.5x */
-};
-
-/*
- * VIA C3 (Ezra-T) [C5M].
- */
-static const int __cpuinitdata ezrat_mults[32] = {
- 100, /* 0000 -> 10.0x */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- 90, /* 0011 -> 9.0x */
- 95, /* 0100 -> 9.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 55, /* 0111 -> 5.5x */
- 60, /* 1000 -> 6.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 50, /* 1011 -> 5.0x */
- 65, /* 1100 -> 6.5x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 120, /* 1111 -> 12.0x */
-
- -1, /* 0000 -> RESERVED (10.0x) */
- 110, /* 0001 -> 11.0x */
- -1, /* 0010 -> 12.0x */
- -1, /* 0011 -> RESERVED (9.0x)*/
- 105, /* 0100 -> 10.5x */
- 115, /* 0101 -> 11.5x */
- 125, /* 0110 -> 12.5x */
- 135, /* 0111 -> 13.5x */
- 140, /* 1000 -> 14.0x */
- 150, /* 1001 -> 15.0x */
- 160, /* 1010 -> 16.0x */
- 130, /* 1011 -> 13.0x */
- 145, /* 1100 -> 14.5x */
- 155, /* 1101 -> 15.5x */
- -1, /* 1110 -> RESERVED (13.0x) */
- -1, /* 1111 -> RESERVED (12.0x) */
-};
-
-static const int __cpuinitdata ezrat_eblcr[32] = {
- 50, /* 0000 -> 5.0x */
- 30, /* 0001 -> 3.0x */
- 40, /* 0010 -> 4.0x */
- 100, /* 0011 -> 10.0x */
- 55, /* 0100 -> 5.5x */
- 35, /* 0101 -> 3.5x */
- 45, /* 0110 -> 4.5x */
- 95, /* 0111 -> 9.5x */
- 90, /* 1000 -> 9.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 60, /* 1011 -> 6.0x */
- 120, /* 1100 -> 12.0x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 65, /* 1111 -> 6.5x */
-
- -1, /* 0000 -> RESERVED (9.0x) */
- 110, /* 0001 -> 11.0x */
- 120, /* 0010 -> 12.0x */
- -1, /* 0011 -> RESERVED (10.0x)*/
- 135, /* 0100 -> 13.5x */
- 115, /* 0101 -> 11.5x */
- 125, /* 0110 -> 12.5x */
- 105, /* 0111 -> 10.5x */
- 130, /* 1000 -> 13.0x */
- 150, /* 1001 -> 15.0x */
- 160, /* 1010 -> 16.0x */
- 140, /* 1011 -> 14.0x */
- -1, /* 1100 -> RESERVED (12.0x) */
- 155, /* 1101 -> 15.5x */
- -1, /* 1110 -> RESERVED (13.0x) */
- 145, /* 1111 -> 14.5x */
-};
-
-/*
- * VIA C3 Nehemiah */
-
-static const int __cpuinitdata nehemiah_mults[32] = {
- 100, /* 0000 -> 10.0x */
- -1, /* 0001 -> 16.0x */
- 40, /* 0010 -> 4.0x */
- 90, /* 0011 -> 9.0x */
- 95, /* 0100 -> 9.5x */
- -1, /* 0101 -> RESERVED */
- 45, /* 0110 -> 4.5x */
- 55, /* 0111 -> 5.5x */
- 60, /* 1000 -> 6.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 50, /* 1011 -> 5.0x */
- 65, /* 1100 -> 6.5x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 120, /* 1111 -> 12.0x */
- -1, /* 0000 -> 10.0x */
- 110, /* 0001 -> 11.0x */
- -1, /* 0010 -> 12.0x */
- -1, /* 0011 -> 9.0x */
- 105, /* 0100 -> 10.5x */
- 115, /* 0101 -> 11.5x */
- 125, /* 0110 -> 12.5x */
- 135, /* 0111 -> 13.5x */
- 140, /* 1000 -> 14.0x */
- 150, /* 1001 -> 15.0x */
- 160, /* 1010 -> 16.0x */
- 130, /* 1011 -> 13.0x */
- 145, /* 1100 -> 14.5x */
- 155, /* 1101 -> 15.5x */
- -1, /* 1110 -> RESERVED (13.0x) */
- -1, /* 1111 -> 12.0x */
-};
-
-static const int __cpuinitdata nehemiah_eblcr[32] = {
- 50, /* 0000 -> 5.0x */
- 160, /* 0001 -> 16.0x */
- 40, /* 0010 -> 4.0x */
- 100, /* 0011 -> 10.0x */
- 55, /* 0100 -> 5.5x */
- -1, /* 0101 -> RESERVED */
- 45, /* 0110 -> 4.5x */
- 95, /* 0111 -> 9.5x */
- 90, /* 1000 -> 9.0x */
- 70, /* 1001 -> 7.0x */
- 80, /* 1010 -> 8.0x */
- 60, /* 1011 -> 6.0x */
- 120, /* 1100 -> 12.0x */
- 75, /* 1101 -> 7.5x */
- 85, /* 1110 -> 8.5x */
- 65, /* 1111 -> 6.5x */
- 90, /* 0000 -> 9.0x */
- 110, /* 0001 -> 11.0x */
- 120, /* 0010 -> 12.0x */
- 100, /* 0011 -> 10.0x */
- 135, /* 0100 -> 13.5x */
- 115, /* 0101 -> 11.5x */
- 125, /* 0110 -> 12.5x */
- 105, /* 0111 -> 10.5x */
- 130, /* 1000 -> 13.0x */
- 150, /* 1001 -> 15.0x */
- 160, /* 1010 -> 16.0x */
- 140, /* 1011 -> 14.0x */
- 120, /* 1100 -> 12.0x */
- 155, /* 1101 -> 15.5x */
- -1, /* 1110 -> RESERVED (13.0x) */
- 145 /* 1111 -> 14.5x */
-};
-
-/*
- * Voltage scales. Div/Mod by 1000 to get actual voltage.
- * Which scale to use depends on the VRM type in use.
- */
-
-struct mV_pos {
- unsigned short mV;
- unsigned short pos;
-};
-
-static const struct mV_pos __cpuinitdata vrm85_mV[32] = {
- {1250, 8}, {1200, 6}, {1150, 4}, {1100, 2},
- {1050, 0}, {1800, 30}, {1750, 28}, {1700, 26},
- {1650, 24}, {1600, 22}, {1550, 20}, {1500, 18},
- {1450, 16}, {1400, 14}, {1350, 12}, {1300, 10},
- {1275, 9}, {1225, 7}, {1175, 5}, {1125, 3},
- {1075, 1}, {1825, 31}, {1775, 29}, {1725, 27},
- {1675, 25}, {1625, 23}, {1575, 21}, {1525, 19},
- {1475, 17}, {1425, 15}, {1375, 13}, {1325, 11}
-};
-
-static const unsigned char __cpuinitdata mV_vrm85[32] = {
- 0x04, 0x14, 0x03, 0x13, 0x02, 0x12, 0x01, 0x11,
- 0x00, 0x10, 0x0f, 0x1f, 0x0e, 0x1e, 0x0d, 0x1d,
- 0x0c, 0x1c, 0x0b, 0x1b, 0x0a, 0x1a, 0x09, 0x19,
- 0x08, 0x18, 0x07, 0x17, 0x06, 0x16, 0x05, 0x15
-};
-
-static const struct mV_pos __cpuinitdata mobilevrm_mV[32] = {
- {1750, 31}, {1700, 30}, {1650, 29}, {1600, 28},
- {1550, 27}, {1500, 26}, {1450, 25}, {1400, 24},
- {1350, 23}, {1300, 22}, {1250, 21}, {1200, 20},
- {1150, 19}, {1100, 18}, {1050, 17}, {1000, 16},
- {975, 15}, {950, 14}, {925, 13}, {900, 12},
- {875, 11}, {850, 10}, {825, 9}, {800, 8},
- {775, 7}, {750, 6}, {725, 5}, {700, 4},
- {675, 3}, {650, 2}, {625, 1}, {600, 0}
-};
-
-static const unsigned char __cpuinitdata mV_mobilevrm[32] = {
- 0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19, 0x18,
- 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
- 0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08,
- 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
-};
-
+++ /dev/null
-/*
- * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/timex.h>
-
-#include <asm/msr.h>
-#include <asm/processor.h>
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "longrun", msg)
-
-static struct cpufreq_driver longrun_driver;
-
-/**
- * longrun_{low,high}_freq is needed for the conversion of cpufreq kHz
- * values into per cent values. In TMTA microcode, the following is valid:
- * performance_pctg = (current_freq - low_freq)/(high_freq - low_freq)
- */
-static unsigned int longrun_low_freq, longrun_high_freq;
-
-
-/**
- * longrun_get_policy - get the current LongRun policy
- * @policy: struct cpufreq_policy where current policy is written into
- *
- * Reads the current LongRun policy by access to MSR_TMTA_LONGRUN_FLAGS
- * and MSR_TMTA_LONGRUN_CTRL
- */
-static void __cpuinit longrun_get_policy(struct cpufreq_policy *policy)
-{
- u32 msr_lo, msr_hi;
-
- rdmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
- dprintk("longrun flags are %x - %x\n", msr_lo, msr_hi);
- if (msr_lo & 0x01)
- policy->policy = CPUFREQ_POLICY_PERFORMANCE;
- else
- policy->policy = CPUFREQ_POLICY_POWERSAVE;
-
- rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
- dprintk("longrun ctrl is %x - %x\n", msr_lo, msr_hi);
- msr_lo &= 0x0000007F;
- msr_hi &= 0x0000007F;
-
- if (longrun_high_freq <= longrun_low_freq) {
- /* Assume degenerate Longrun table */
- policy->min = policy->max = longrun_high_freq;
- } else {
- policy->min = longrun_low_freq + msr_lo *
- ((longrun_high_freq - longrun_low_freq) / 100);
- policy->max = longrun_low_freq + msr_hi *
- ((longrun_high_freq - longrun_low_freq) / 100);
- }
- policy->cpu = 0;
-}
-
-
-/**
- * longrun_set_policy - sets a new CPUFreq policy
- * @policy: new policy
- *
- * Sets a new CPUFreq policy on LongRun-capable processors. This function
- * has to be called with cpufreq_driver locked.
- */
-static int longrun_set_policy(struct cpufreq_policy *policy)
-{
- u32 msr_lo, msr_hi;
- u32 pctg_lo, pctg_hi;
-
- if (!policy)
- return -EINVAL;
-
- if (longrun_high_freq <= longrun_low_freq) {
- /* Assume degenerate Longrun table */
- pctg_lo = pctg_hi = 100;
- } else {
- pctg_lo = (policy->min - longrun_low_freq) /
- ((longrun_high_freq - longrun_low_freq) / 100);
- pctg_hi = (policy->max - longrun_low_freq) /
- ((longrun_high_freq - longrun_low_freq) / 100);
- }
-
- if (pctg_hi > 100)
- pctg_hi = 100;
- if (pctg_lo > pctg_hi)
- pctg_lo = pctg_hi;
-
- /* performance or economy mode */
- rdmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
- msr_lo &= 0xFFFFFFFE;
- switch (policy->policy) {
- case CPUFREQ_POLICY_PERFORMANCE:
- msr_lo |= 0x00000001;
- break;
- case CPUFREQ_POLICY_POWERSAVE:
- break;
- }
- wrmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
-
- /* lower and upper boundary */
- rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
- msr_lo &= 0xFFFFFF80;
- msr_hi &= 0xFFFFFF80;
- msr_lo |= pctg_lo;
- msr_hi |= pctg_hi;
- wrmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
-
- return 0;
-}
-
-
-/**
- * longrun_verify_poliy - verifies a new CPUFreq policy
- * @policy: the policy to verify
- *
- * Validates a new CPUFreq policy. This function has to be called with
- * cpufreq_driver locked.
- */
-static int longrun_verify_policy(struct cpufreq_policy *policy)
-{
- if (!policy)
- return -EINVAL;
-
- policy->cpu = 0;
- cpufreq_verify_within_limits(policy,
- policy->cpuinfo.min_freq,
- policy->cpuinfo.max_freq);
-
- if ((policy->policy != CPUFREQ_POLICY_POWERSAVE) &&
- (policy->policy != CPUFREQ_POLICY_PERFORMANCE))
- return -EINVAL;
-
- return 0;
-}
-
-static unsigned int longrun_get(unsigned int cpu)
-{
- u32 eax, ebx, ecx, edx;
-
- if (cpu)
- return 0;
-
- cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
- dprintk("cpuid eax is %u\n", eax);
-
- return eax * 1000;
-}
-
-/**
- * longrun_determine_freqs - determines the lowest and highest possible core frequency
- * @low_freq: an int to put the lowest frequency into
- * @high_freq: an int to put the highest frequency into
- *
- * Determines the lowest and highest possible core frequencies on this CPU.
- * This is necessary to calculate the performance percentage according to
- * TMTA rules:
- * performance_pctg = (target_freq - low_freq)/(high_freq - low_freq)
- */
-static int __cpuinit longrun_determine_freqs(unsigned int *low_freq,
- unsigned int *high_freq)
-{
- u32 msr_lo, msr_hi;
- u32 save_lo, save_hi;
- u32 eax, ebx, ecx, edx;
- u32 try_hi;
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- if (!low_freq || !high_freq)
- return -EINVAL;
-
- if (cpu_has(c, X86_FEATURE_LRTI)) {
- /* if the LongRun Table Interface is present, the
- * detection is a bit easier:
- * For minimum frequency, read out the maximum
- * level (msr_hi), write that into "currently
- * selected level", and read out the frequency.
- * For maximum frequency, read out level zero.
- */
- /* minimum */
- rdmsr(MSR_TMTA_LRTI_READOUT, msr_lo, msr_hi);
- wrmsr(MSR_TMTA_LRTI_READOUT, msr_hi, msr_hi);
- rdmsr(MSR_TMTA_LRTI_VOLT_MHZ, msr_lo, msr_hi);
- *low_freq = msr_lo * 1000; /* to kHz */
-
- /* maximum */
- wrmsr(MSR_TMTA_LRTI_READOUT, 0, msr_hi);
- rdmsr(MSR_TMTA_LRTI_VOLT_MHZ, msr_lo, msr_hi);
- *high_freq = msr_lo * 1000; /* to kHz */
-
- dprintk("longrun table interface told %u - %u kHz\n",
- *low_freq, *high_freq);
-
- if (*low_freq > *high_freq)
- *low_freq = *high_freq;
- return 0;
- }
-
- /* set the upper border to the value determined during TSC init */
- *high_freq = (cpu_khz / 1000);
- *high_freq = *high_freq * 1000;
- dprintk("high frequency is %u kHz\n", *high_freq);
-
- /* get current borders */
- rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
- save_lo = msr_lo & 0x0000007F;
- save_hi = msr_hi & 0x0000007F;
-
- /* if current perf_pctg is larger than 90%, we need to decrease the
- * upper limit to make the calculation more accurate.
- */
- cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
- /* try decreasing in 10% steps, some processors react only
- * on some barrier values */
- for (try_hi = 80; try_hi > 0 && ecx > 90; try_hi -= 10) {
- /* set to 0 to try_hi perf_pctg */
- msr_lo &= 0xFFFFFF80;
- msr_hi &= 0xFFFFFF80;
- msr_hi |= try_hi;
- wrmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
-
- /* read out current core MHz and current perf_pctg */
- cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
-
- /* restore values */
- wrmsr(MSR_TMTA_LONGRUN_CTRL, save_lo, save_hi);
- }
- dprintk("percentage is %u %%, freq is %u MHz\n", ecx, eax);
-
- /* performance_pctg = (current_freq - low_freq)/(high_freq - low_freq)
- * eqals
- * low_freq * (1 - perf_pctg) = (cur_freq - high_freq * perf_pctg)
- *
- * high_freq * perf_pctg is stored tempoarily into "ebx".
- */
- ebx = (((cpu_khz / 1000) * ecx) / 100); /* to MHz */
-
- if ((ecx > 95) || (ecx == 0) || (eax < ebx))
- return -EIO;
-
- edx = ((eax - ebx) * 100) / (100 - ecx);
- *low_freq = edx * 1000; /* back to kHz */
-
- dprintk("low frequency is %u kHz\n", *low_freq);
-
- if (*low_freq > *high_freq)
- *low_freq = *high_freq;
-
- return 0;
-}
-
-
-static int __cpuinit longrun_cpu_init(struct cpufreq_policy *policy)
-{
- int result = 0;
-
- /* capability check */
- if (policy->cpu != 0)
- return -ENODEV;
-
- /* detect low and high frequency */
- result = longrun_determine_freqs(&longrun_low_freq, &longrun_high_freq);
- if (result)
- return result;
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.min_freq = longrun_low_freq;
- policy->cpuinfo.max_freq = longrun_high_freq;
- policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
- longrun_get_policy(policy);
-
- return 0;
-}
-
-
-static struct cpufreq_driver longrun_driver = {
- .flags = CPUFREQ_CONST_LOOPS,
- .verify = longrun_verify_policy,
- .setpolicy = longrun_set_policy,
- .get = longrun_get,
- .init = longrun_cpu_init,
- .name = "longrun",
- .owner = THIS_MODULE,
-};
-
-
-/**
- * longrun_init - initializes the Transmeta Crusoe LongRun CPUFreq driver
- *
- * Initializes the LongRun support.
- */
-static int __init longrun_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- if (c->x86_vendor != X86_VENDOR_TRANSMETA ||
- !cpu_has(c, X86_FEATURE_LONGRUN))
- return -ENODEV;
-
- return cpufreq_register_driver(&longrun_driver);
-}
-
-
-/**
- * longrun_exit - unregisters LongRun support
- */
-static void __exit longrun_exit(void)
-{
- cpufreq_unregister_driver(&longrun_driver);
-}
-
-
-MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("LongRun driver for Transmeta Crusoe and "
- "Efficeon processors.");
-MODULE_LICENSE("GPL");
-
-module_init(longrun_init);
-module_exit(longrun_exit);
+++ /dev/null
-#include <linux/kernel.h>
-#include <linux/smp.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/slab.h>
-
-#include "mperf.h"
-
-static DEFINE_PER_CPU(struct aperfmperf, acfreq_old_perf);
-
-/* Called via smp_call_function_single(), on the target CPU */
-static void read_measured_perf_ctrs(void *_cur)
-{
- struct aperfmperf *am = _cur;
-
- get_aperfmperf(am);
-}
-
-/*
- * Return the measured active (C0) frequency on this CPU since last call
- * to this function.
- * Input: cpu number
- * Return: Average CPU frequency in terms of max frequency (zero on error)
- *
- * We use IA32_MPERF and IA32_APERF MSRs to get the measured performance
- * over a period of time, while CPU is in C0 state.
- * IA32_MPERF counts at the rate of max advertised frequency
- * IA32_APERF counts at the rate of actual CPU frequency
- * Only IA32_APERF/IA32_MPERF ratio is architecturally defined and
- * no meaning should be associated with absolute values of these MSRs.
- */
-unsigned int cpufreq_get_measured_perf(struct cpufreq_policy *policy,
- unsigned int cpu)
-{
- struct aperfmperf perf;
- unsigned long ratio;
- unsigned int retval;
-
- if (smp_call_function_single(cpu, read_measured_perf_ctrs, &perf, 1))
- return 0;
-
- ratio = calc_aperfmperf_ratio(&per_cpu(acfreq_old_perf, cpu), &perf);
- per_cpu(acfreq_old_perf, cpu) = perf;
-
- retval = (policy->cpuinfo.max_freq * ratio) >> APERFMPERF_SHIFT;
-
- return retval;
-}
-EXPORT_SYMBOL_GPL(cpufreq_get_measured_perf);
-MODULE_LICENSE("GPL");
+++ /dev/null
-/*
- * (c) 2010 Advanced Micro Devices, Inc.
- * Your use of this code is subject to the terms and conditions of the
- * GNU general public license version 2. See "COPYING" or
- * http://www.gnu.org/licenses/gpl.html
- */
-
-unsigned int cpufreq_get_measured_perf(struct cpufreq_policy *policy,
- unsigned int cpu);
+++ /dev/null
-/*
- * Pentium 4/Xeon CPU on demand clock modulation/speed scaling
- * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- * (C) 2002 Zwane Mwaikambo <zwane@commfireservices.com>
- * (C) 2002 Arjan van de Ven <arjanv@redhat.com>
- * (C) 2002 Tora T. Engstad
- * All Rights Reserved
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * The author(s) of this software shall not be held liable for damages
- * of any nature resulting due to the use of this software. This
- * software is provided AS-IS with no warranties.
- *
- * Date Errata Description
- * 20020525 N44, O17 12.5% or 25% DC causes lockup
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/smp.h>
-#include <linux/cpufreq.h>
-#include <linux/cpumask.h>
-#include <linux/timex.h>
-
-#include <asm/processor.h>
-#include <asm/msr.h>
-#include <asm/timer.h>
-
-#include "speedstep-lib.h"
-
-#define PFX "p4-clockmod: "
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "p4-clockmod", msg)
-
-/*
- * Duty Cycle (3bits), note DC_DISABLE is not specified in
- * intel docs i just use it to mean disable
- */
-enum {
- DC_RESV, DC_DFLT, DC_25PT, DC_38PT, DC_50PT,
- DC_64PT, DC_75PT, DC_88PT, DC_DISABLE
-};
-
-#define DC_ENTRIES 8
-
-
-static int has_N44_O17_errata[NR_CPUS];
-static unsigned int stock_freq;
-static struct cpufreq_driver p4clockmod_driver;
-static unsigned int cpufreq_p4_get(unsigned int cpu);
-
-static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
-{
- u32 l, h;
-
- if (!cpu_online(cpu) ||
- (newstate > DC_DISABLE) || (newstate == DC_RESV))
- return -EINVAL;
-
- rdmsr_on_cpu(cpu, MSR_IA32_THERM_STATUS, &l, &h);
-
- if (l & 0x01)
- dprintk("CPU#%d currently thermal throttled\n", cpu);
-
- if (has_N44_O17_errata[cpu] &&
- (newstate == DC_25PT || newstate == DC_DFLT))
- newstate = DC_38PT;
-
- rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
- if (newstate == DC_DISABLE) {
- dprintk("CPU#%d disabling modulation\n", cpu);
- wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l & ~(1<<4), h);
- } else {
- dprintk("CPU#%d setting duty cycle to %d%%\n",
- cpu, ((125 * newstate) / 10));
- /* bits 63 - 5 : reserved
- * bit 4 : enable/disable
- * bits 3-1 : duty cycle
- * bit 0 : reserved
- */
- l = (l & ~14);
- l = l | (1<<4) | ((newstate & 0x7)<<1);
- wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
- }
-
- return 0;
-}
-
-
-static struct cpufreq_frequency_table p4clockmod_table[] = {
- {DC_RESV, CPUFREQ_ENTRY_INVALID},
- {DC_DFLT, 0},
- {DC_25PT, 0},
- {DC_38PT, 0},
- {DC_50PT, 0},
- {DC_64PT, 0},
- {DC_75PT, 0},
- {DC_88PT, 0},
- {DC_DISABLE, 0},
- {DC_RESV, CPUFREQ_TABLE_END},
-};
-
-
-static int cpufreq_p4_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = DC_RESV;
- struct cpufreq_freqs freqs;
- int i;
-
- if (cpufreq_frequency_table_target(policy, &p4clockmod_table[0],
- target_freq, relation, &newstate))
- return -EINVAL;
-
- freqs.old = cpufreq_p4_get(policy->cpu);
- freqs.new = stock_freq * p4clockmod_table[newstate].index / 8;
-
- if (freqs.new == freqs.old)
- return 0;
-
- /* notifiers */
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- }
-
- /* run on each logical CPU,
- * see section 13.15.3 of IA32 Intel Architecture Software
- * Developer's Manual, Volume 3
- */
- for_each_cpu(i, policy->cpus)
- cpufreq_p4_setdc(i, p4clockmod_table[newstate].index);
-
- /* notifiers */
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
-
- return 0;
-}
-
-
-static int cpufreq_p4_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &p4clockmod_table[0]);
-}
-
-
-static unsigned int cpufreq_p4_get_frequency(struct cpuinfo_x86 *c)
-{
- if (c->x86 == 0x06) {
- if (cpu_has(c, X86_FEATURE_EST))
- printk_once(KERN_WARNING PFX "Warning: EST-capable "
- "CPU detected. The acpi-cpufreq module offers "
- "voltage scaling in addition to frequency "
- "scaling. You should use that instead of "
- "p4-clockmod, if possible.\n");
- switch (c->x86_model) {
- case 0x0E: /* Core */
- case 0x0F: /* Core Duo */
- case 0x16: /* Celeron Core */
- case 0x1C: /* Atom */
- p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
- return speedstep_get_frequency(SPEEDSTEP_CPU_PCORE);
- case 0x0D: /* Pentium M (Dothan) */
- p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
- /* fall through */
- case 0x09: /* Pentium M (Banias) */
- return speedstep_get_frequency(SPEEDSTEP_CPU_PM);
- }
- }
-
- if (c->x86 != 0xF)
- return 0;
-
- /* on P-4s, the TSC runs with constant frequency independent whether
- * throttling is active or not. */
- p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
-
- if (speedstep_detect_processor() == SPEEDSTEP_CPU_P4M) {
- printk(KERN_WARNING PFX "Warning: Pentium 4-M detected. "
- "The speedstep-ich or acpi cpufreq modules offer "
- "voltage scaling in addition of frequency scaling. "
- "You should use either one instead of p4-clockmod, "
- "if possible.\n");
- return speedstep_get_frequency(SPEEDSTEP_CPU_P4M);
- }
-
- return speedstep_get_frequency(SPEEDSTEP_CPU_P4D);
-}
-
-
-
-static int cpufreq_p4_cpu_init(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *c = &cpu_data(policy->cpu);
- int cpuid = 0;
- unsigned int i;
-
-#ifdef CONFIG_SMP
- cpumask_copy(policy->cpus, cpu_sibling_mask(policy->cpu));
-#endif
-
- /* Errata workaround */
- cpuid = (c->x86 << 8) | (c->x86_model << 4) | c->x86_mask;
- switch (cpuid) {
- case 0x0f07:
- case 0x0f0a:
- case 0x0f11:
- case 0x0f12:
- has_N44_O17_errata[policy->cpu] = 1;
- dprintk("has errata -- disabling low frequencies\n");
- }
-
- if (speedstep_detect_processor() == SPEEDSTEP_CPU_P4D &&
- c->x86_model < 2) {
- /* switch to maximum frequency and measure result */
- cpufreq_p4_setdc(policy->cpu, DC_DISABLE);
- recalibrate_cpu_khz();
- }
- /* get max frequency */
- stock_freq = cpufreq_p4_get_frequency(c);
- if (!stock_freq)
- return -EINVAL;
-
- /* table init */
- for (i = 1; (p4clockmod_table[i].frequency != CPUFREQ_TABLE_END); i++) {
- if ((i < 2) && (has_N44_O17_errata[policy->cpu]))
- p4clockmod_table[i].frequency = CPUFREQ_ENTRY_INVALID;
- else
- p4clockmod_table[i].frequency = (stock_freq * i)/8;
- }
- cpufreq_frequency_table_get_attr(p4clockmod_table, policy->cpu);
-
- /* cpuinfo and default policy values */
-
- /* the transition latency is set to be 1 higher than the maximum
- * transition latency of the ondemand governor */
- policy->cpuinfo.transition_latency = 10000001;
- policy->cur = stock_freq;
-
- return cpufreq_frequency_table_cpuinfo(policy, &p4clockmod_table[0]);
-}
-
-
-static int cpufreq_p4_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-static unsigned int cpufreq_p4_get(unsigned int cpu)
-{
- u32 l, h;
-
- rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
-
- if (l & 0x10) {
- l = l >> 1;
- l &= 0x7;
- } else
- l = DC_DISABLE;
-
- if (l != DC_DISABLE)
- return stock_freq * l / 8;
-
- return stock_freq;
-}
-
-static struct freq_attr *p4clockmod_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver p4clockmod_driver = {
- .verify = cpufreq_p4_verify,
- .target = cpufreq_p4_target,
- .init = cpufreq_p4_cpu_init,
- .exit = cpufreq_p4_cpu_exit,
- .get = cpufreq_p4_get,
- .name = "p4-clockmod",
- .owner = THIS_MODULE,
- .attr = p4clockmod_attr,
-};
-
-
-static int __init cpufreq_p4_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- int ret;
-
- /*
- * THERM_CONTROL is architectural for IA32 now, so
- * we can rely on the capability checks
- */
- if (c->x86_vendor != X86_VENDOR_INTEL)
- return -ENODEV;
-
- if (!test_cpu_cap(c, X86_FEATURE_ACPI) ||
- !test_cpu_cap(c, X86_FEATURE_ACC))
- return -ENODEV;
-
- ret = cpufreq_register_driver(&p4clockmod_driver);
- if (!ret)
- printk(KERN_INFO PFX "P4/Xeon(TM) CPU On-Demand Clock "
- "Modulation available\n");
-
- return ret;
-}
-
-
-static void __exit cpufreq_p4_exit(void)
-{
- cpufreq_unregister_driver(&p4clockmod_driver);
-}
-
-
-MODULE_AUTHOR("Zwane Mwaikambo <zwane@commfireservices.com>");
-MODULE_DESCRIPTION("cpufreq driver for Pentium(TM) 4/Xeon(TM)");
-MODULE_LICENSE("GPL");
-
-late_initcall(cpufreq_p4_init);
-module_exit(cpufreq_p4_exit);
+++ /dev/null
-/*
- * pcc-cpufreq.c - Processor Clocking Control firmware cpufreq interface
- *
- * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com>
- * Copyright (C) 2009 Hewlett-Packard Development Company, L.P.
- * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com>
- *
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON
- * INFRINGEMENT. See the GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write to the Free Software Foundation, Inc.,
- * 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/smp.h>
-#include <linux/sched.h>
-#include <linux/cpufreq.h>
-#include <linux/compiler.h>
-#include <linux/slab.h>
-
-#include <linux/acpi.h>
-#include <linux/io.h>
-#include <linux/spinlock.h>
-#include <linux/uaccess.h>
-
-#include <acpi/processor.h>
-
-#define PCC_VERSION "1.00.00"
-#define POLL_LOOPS 300
-
-#define CMD_COMPLETE 0x1
-#define CMD_GET_FREQ 0x0
-#define CMD_SET_FREQ 0x1
-
-#define BUF_SZ 4
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "pcc-cpufreq", msg)
-
-struct pcc_register_resource {
- u8 descriptor;
- u16 length;
- u8 space_id;
- u8 bit_width;
- u8 bit_offset;
- u8 access_size;
- u64 address;
-} __attribute__ ((packed));
-
-struct pcc_memory_resource {
- u8 descriptor;
- u16 length;
- u8 space_id;
- u8 resource_usage;
- u8 type_specific;
- u64 granularity;
- u64 minimum;
- u64 maximum;
- u64 translation_offset;
- u64 address_length;
-} __attribute__ ((packed));
-
-static struct cpufreq_driver pcc_cpufreq_driver;
-
-struct pcc_header {
- u32 signature;
- u16 length;
- u8 major;
- u8 minor;
- u32 features;
- u16 command;
- u16 status;
- u32 latency;
- u32 minimum_time;
- u32 maximum_time;
- u32 nominal;
- u32 throttled_frequency;
- u32 minimum_frequency;
-};
-
-static void __iomem *pcch_virt_addr;
-static struct pcc_header __iomem *pcch_hdr;
-
-static DEFINE_SPINLOCK(pcc_lock);
-
-static struct acpi_generic_address doorbell;
-
-static u64 doorbell_preserve;
-static u64 doorbell_write;
-
-static u8 OSC_UUID[16] = {0x63, 0x9B, 0x2C, 0x9F, 0x70, 0x91, 0x49, 0x1f,
- 0xBB, 0x4F, 0xA5, 0x98, 0x2F, 0xA1, 0xB5, 0x46};
-
-struct pcc_cpu {
- u32 input_offset;
- u32 output_offset;
-};
-
-static struct pcc_cpu __percpu *pcc_cpu_info;
-
-static int pcc_cpufreq_verify(struct cpufreq_policy *policy)
-{
- cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
- policy->cpuinfo.max_freq);
- return 0;
-}
-
-static inline void pcc_cmd(void)
-{
- u64 doorbell_value;
- int i;
-
- acpi_read(&doorbell_value, &doorbell);
- acpi_write((doorbell_value & doorbell_preserve) | doorbell_write,
- &doorbell);
-
- for (i = 0; i < POLL_LOOPS; i++) {
- if (ioread16(&pcch_hdr->status) & CMD_COMPLETE)
- break;
- }
-}
-
-static inline void pcc_clear_mapping(void)
-{
- if (pcch_virt_addr)
- iounmap(pcch_virt_addr);
- pcch_virt_addr = NULL;
-}
-
-static unsigned int pcc_get_freq(unsigned int cpu)
-{
- struct pcc_cpu *pcc_cpu_data;
- unsigned int curr_freq;
- unsigned int freq_limit;
- u16 status;
- u32 input_buffer;
- u32 output_buffer;
-
- spin_lock(&pcc_lock);
-
- dprintk("get: get_freq for CPU %d\n", cpu);
- pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
-
- input_buffer = 0x1;
- iowrite32(input_buffer,
- (pcch_virt_addr + pcc_cpu_data->input_offset));
- iowrite16(CMD_GET_FREQ, &pcch_hdr->command);
-
- pcc_cmd();
-
- output_buffer =
- ioread32(pcch_virt_addr + pcc_cpu_data->output_offset);
-
- /* Clear the input buffer - we are done with the current command */
- memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
-
- status = ioread16(&pcch_hdr->status);
- if (status != CMD_COMPLETE) {
- dprintk("get: FAILED: for CPU %d, status is %d\n",
- cpu, status);
- goto cmd_incomplete;
- }
- iowrite16(0, &pcch_hdr->status);
- curr_freq = (((ioread32(&pcch_hdr->nominal) * (output_buffer & 0xff))
- / 100) * 1000);
-
- dprintk("get: SUCCESS: (virtual) output_offset for cpu %d is "
- "0x%x, contains a value of: 0x%x. Speed is: %d MHz\n",
- cpu, (pcch_virt_addr + pcc_cpu_data->output_offset),
- output_buffer, curr_freq);
-
- freq_limit = (output_buffer >> 8) & 0xff;
- if (freq_limit != 0xff) {
- dprintk("get: frequency for cpu %d is being temporarily"
- " capped at %d\n", cpu, curr_freq);
- }
-
- spin_unlock(&pcc_lock);
- return curr_freq;
-
-cmd_incomplete:
- iowrite16(0, &pcch_hdr->status);
- spin_unlock(&pcc_lock);
- return 0;
-}
-
-static int pcc_cpufreq_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- struct pcc_cpu *pcc_cpu_data;
- struct cpufreq_freqs freqs;
- u16 status;
- u32 input_buffer;
- int cpu;
-
- spin_lock(&pcc_lock);
- cpu = policy->cpu;
- pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
-
- dprintk("target: CPU %d should go to target freq: %d "
- "(virtual) input_offset is 0x%x\n",
- cpu, target_freq,
- (pcch_virt_addr + pcc_cpu_data->input_offset));
-
- freqs.new = target_freq;
- freqs.cpu = cpu;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- input_buffer = 0x1 | (((target_freq * 100)
- / (ioread32(&pcch_hdr->nominal) * 1000)) << 8);
- iowrite32(input_buffer,
- (pcch_virt_addr + pcc_cpu_data->input_offset));
- iowrite16(CMD_SET_FREQ, &pcch_hdr->command);
-
- pcc_cmd();
-
- /* Clear the input buffer - we are done with the current command */
- memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
-
- status = ioread16(&pcch_hdr->status);
- if (status != CMD_COMPLETE) {
- dprintk("target: FAILED for cpu %d, with status: 0x%x\n",
- cpu, status);
- goto cmd_incomplete;
- }
- iowrite16(0, &pcch_hdr->status);
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- dprintk("target: was SUCCESSFUL for cpu %d\n", cpu);
- spin_unlock(&pcc_lock);
-
- return 0;
-
-cmd_incomplete:
- iowrite16(0, &pcch_hdr->status);
- spin_unlock(&pcc_lock);
- return -EINVAL;
-}
-
-static int pcc_get_offset(int cpu)
-{
- acpi_status status;
- struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
- union acpi_object *pccp, *offset;
- struct pcc_cpu *pcc_cpu_data;
- struct acpi_processor *pr;
- int ret = 0;
-
- pr = per_cpu(processors, cpu);
- pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
-
- status = acpi_evaluate_object(pr->handle, "PCCP", NULL, &buffer);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- pccp = buffer.pointer;
- if (!pccp || pccp->type != ACPI_TYPE_PACKAGE) {
- ret = -ENODEV;
- goto out_free;
- };
-
- offset = &(pccp->package.elements[0]);
- if (!offset || offset->type != ACPI_TYPE_INTEGER) {
- ret = -ENODEV;
- goto out_free;
- }
-
- pcc_cpu_data->input_offset = offset->integer.value;
-
- offset = &(pccp->package.elements[1]);
- if (!offset || offset->type != ACPI_TYPE_INTEGER) {
- ret = -ENODEV;
- goto out_free;
- }
-
- pcc_cpu_data->output_offset = offset->integer.value;
-
- memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
- memset_io((pcch_virt_addr + pcc_cpu_data->output_offset), 0, BUF_SZ);
-
- dprintk("pcc_get_offset: for CPU %d: pcc_cpu_data "
- "input_offset: 0x%x, pcc_cpu_data output_offset: 0x%x\n",
- cpu, pcc_cpu_data->input_offset, pcc_cpu_data->output_offset);
-out_free:
- kfree(buffer.pointer);
- return ret;
-}
-
-static int __init pcc_cpufreq_do_osc(acpi_handle *handle)
-{
- acpi_status status;
- struct acpi_object_list input;
- struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
- union acpi_object in_params[4];
- union acpi_object *out_obj;
- u32 capabilities[2];
- u32 errors;
- u32 supported;
- int ret = 0;
-
- input.count = 4;
- input.pointer = in_params;
- in_params[0].type = ACPI_TYPE_BUFFER;
- in_params[0].buffer.length = 16;
- in_params[0].buffer.pointer = OSC_UUID;
- in_params[1].type = ACPI_TYPE_INTEGER;
- in_params[1].integer.value = 1;
- in_params[2].type = ACPI_TYPE_INTEGER;
- in_params[2].integer.value = 2;
- in_params[3].type = ACPI_TYPE_BUFFER;
- in_params[3].buffer.length = 8;
- in_params[3].buffer.pointer = (u8 *)&capabilities;
-
- capabilities[0] = OSC_QUERY_ENABLE;
- capabilities[1] = 0x1;
-
- status = acpi_evaluate_object(*handle, "_OSC", &input, &output);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- if (!output.length)
- return -ENODEV;
-
- out_obj = output.pointer;
- if (out_obj->type != ACPI_TYPE_BUFFER) {
- ret = -ENODEV;
- goto out_free;
- }
-
- errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
- if (errors) {
- ret = -ENODEV;
- goto out_free;
- }
-
- supported = *((u32 *)(out_obj->buffer.pointer + 4));
- if (!(supported & 0x1)) {
- ret = -ENODEV;
- goto out_free;
- }
-
- kfree(output.pointer);
- capabilities[0] = 0x0;
- capabilities[1] = 0x1;
-
- status = acpi_evaluate_object(*handle, "_OSC", &input, &output);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- if (!output.length)
- return -ENODEV;
-
- out_obj = output.pointer;
- if (out_obj->type != ACPI_TYPE_BUFFER) {
- ret = -ENODEV;
- goto out_free;
- }
-
- errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
- if (errors) {
- ret = -ENODEV;
- goto out_free;
- }
-
- supported = *((u32 *)(out_obj->buffer.pointer + 4));
- if (!(supported & 0x1)) {
- ret = -ENODEV;
- goto out_free;
- }
-
-out_free:
- kfree(output.pointer);
- return ret;
-}
-
-static int __init pcc_cpufreq_probe(void)
-{
- acpi_status status;
- struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
- struct pcc_memory_resource *mem_resource;
- struct pcc_register_resource *reg_resource;
- union acpi_object *out_obj, *member;
- acpi_handle handle, osc_handle, pcch_handle;
- int ret = 0;
-
- status = acpi_get_handle(NULL, "\\_SB", &handle);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- status = acpi_get_handle(handle, "PCCH", &pcch_handle);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- status = acpi_get_handle(handle, "_OSC", &osc_handle);
- if (ACPI_SUCCESS(status)) {
- ret = pcc_cpufreq_do_osc(&osc_handle);
- if (ret)
- dprintk("probe: _OSC evaluation did not succeed\n");
- /* Firmware's use of _OSC is optional */
- ret = 0;
- }
-
- status = acpi_evaluate_object(handle, "PCCH", NULL, &output);
- if (ACPI_FAILURE(status))
- return -ENODEV;
-
- out_obj = output.pointer;
- if (out_obj->type != ACPI_TYPE_PACKAGE) {
- ret = -ENODEV;
- goto out_free;
- }
-
- member = &out_obj->package.elements[0];
- if (member->type != ACPI_TYPE_BUFFER) {
- ret = -ENODEV;
- goto out_free;
- }
-
- mem_resource = (struct pcc_memory_resource *)member->buffer.pointer;
-
- dprintk("probe: mem_resource descriptor: 0x%x,"
- " length: %d, space_id: %d, resource_usage: %d,"
- " type_specific: %d, granularity: 0x%llx,"
- " minimum: 0x%llx, maximum: 0x%llx,"
- " translation_offset: 0x%llx, address_length: 0x%llx\n",
- mem_resource->descriptor, mem_resource->length,
- mem_resource->space_id, mem_resource->resource_usage,
- mem_resource->type_specific, mem_resource->granularity,
- mem_resource->minimum, mem_resource->maximum,
- mem_resource->translation_offset,
- mem_resource->address_length);
-
- if (mem_resource->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
- ret = -ENODEV;
- goto out_free;
- }
-
- pcch_virt_addr = ioremap_nocache(mem_resource->minimum,
- mem_resource->address_length);
- if (pcch_virt_addr == NULL) {
- dprintk("probe: could not map shared mem region\n");
- goto out_free;
- }
- pcch_hdr = pcch_virt_addr;
-
- dprintk("probe: PCCH header (virtual) addr: 0x%p\n", pcch_hdr);
- dprintk("probe: PCCH header is at physical address: 0x%llx,"
- " signature: 0x%x, length: %d bytes, major: %d, minor: %d,"
- " supported features: 0x%x, command field: 0x%x,"
- " status field: 0x%x, nominal latency: %d us\n",
- mem_resource->minimum, ioread32(&pcch_hdr->signature),
- ioread16(&pcch_hdr->length), ioread8(&pcch_hdr->major),
- ioread8(&pcch_hdr->minor), ioread32(&pcch_hdr->features),
- ioread16(&pcch_hdr->command), ioread16(&pcch_hdr->status),
- ioread32(&pcch_hdr->latency));
-
- dprintk("probe: min time between commands: %d us,"
- " max time between commands: %d us,"
- " nominal CPU frequency: %d MHz,"
- " minimum CPU frequency: %d MHz,"
- " minimum CPU frequency without throttling: %d MHz\n",
- ioread32(&pcch_hdr->minimum_time),
- ioread32(&pcch_hdr->maximum_time),
- ioread32(&pcch_hdr->nominal),
- ioread32(&pcch_hdr->throttled_frequency),
- ioread32(&pcch_hdr->minimum_frequency));
-
- member = &out_obj->package.elements[1];
- if (member->type != ACPI_TYPE_BUFFER) {
- ret = -ENODEV;
- goto pcch_free;
- }
-
- reg_resource = (struct pcc_register_resource *)member->buffer.pointer;
-
- doorbell.space_id = reg_resource->space_id;
- doorbell.bit_width = reg_resource->bit_width;
- doorbell.bit_offset = reg_resource->bit_offset;
- doorbell.access_width = 64;
- doorbell.address = reg_resource->address;
-
- dprintk("probe: doorbell: space_id is %d, bit_width is %d, "
- "bit_offset is %d, access_width is %d, address is 0x%llx\n",
- doorbell.space_id, doorbell.bit_width, doorbell.bit_offset,
- doorbell.access_width, reg_resource->address);
-
- member = &out_obj->package.elements[2];
- if (member->type != ACPI_TYPE_INTEGER) {
- ret = -ENODEV;
- goto pcch_free;
- }
-
- doorbell_preserve = member->integer.value;
-
- member = &out_obj->package.elements[3];
- if (member->type != ACPI_TYPE_INTEGER) {
- ret = -ENODEV;
- goto pcch_free;
- }
-
- doorbell_write = member->integer.value;
-
- dprintk("probe: doorbell_preserve: 0x%llx,"
- " doorbell_write: 0x%llx\n",
- doorbell_preserve, doorbell_write);
-
- pcc_cpu_info = alloc_percpu(struct pcc_cpu);
- if (!pcc_cpu_info) {
- ret = -ENOMEM;
- goto pcch_free;
- }
-
- printk(KERN_DEBUG "pcc-cpufreq: (v%s) driver loaded with frequency"
- " limits: %d MHz, %d MHz\n", PCC_VERSION,
- ioread32(&pcch_hdr->minimum_frequency),
- ioread32(&pcch_hdr->nominal));
- kfree(output.pointer);
- return ret;
-pcch_free:
- pcc_clear_mapping();
-out_free:
- kfree(output.pointer);
- return ret;
-}
-
-static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int cpu = policy->cpu;
- unsigned int result = 0;
-
- if (!pcch_virt_addr) {
- result = -1;
- goto out;
- }
-
- result = pcc_get_offset(cpu);
- if (result) {
- dprintk("init: PCCP evaluation failed\n");
- goto out;
- }
-
- policy->max = policy->cpuinfo.max_freq =
- ioread32(&pcch_hdr->nominal) * 1000;
- policy->min = policy->cpuinfo.min_freq =
- ioread32(&pcch_hdr->minimum_frequency) * 1000;
- policy->cur = pcc_get_freq(cpu);
-
- if (!policy->cur) {
- dprintk("init: Unable to get current CPU frequency\n");
- result = -EINVAL;
- goto out;
- }
-
- dprintk("init: policy->max is %d, policy->min is %d\n",
- policy->max, policy->min);
-out:
- return result;
-}
-
-static int pcc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
-{
- return 0;
-}
-
-static struct cpufreq_driver pcc_cpufreq_driver = {
- .flags = CPUFREQ_CONST_LOOPS,
- .get = pcc_get_freq,
- .verify = pcc_cpufreq_verify,
- .target = pcc_cpufreq_target,
- .init = pcc_cpufreq_cpu_init,
- .exit = pcc_cpufreq_cpu_exit,
- .name = "pcc-cpufreq",
- .owner = THIS_MODULE,
-};
-
-static int __init pcc_cpufreq_init(void)
-{
- int ret;
-
- if (acpi_disabled)
- return 0;
-
- ret = pcc_cpufreq_probe();
- if (ret) {
- dprintk("pcc_cpufreq_init: PCCH evaluation failed\n");
- return ret;
- }
-
- ret = cpufreq_register_driver(&pcc_cpufreq_driver);
-
- return ret;
-}
-
-static void __exit pcc_cpufreq_exit(void)
-{
- cpufreq_unregister_driver(&pcc_cpufreq_driver);
-
- pcc_clear_mapping();
-
- free_percpu(pcc_cpu_info);
-}
-
-MODULE_AUTHOR("Matthew Garrett, Naga Chumbalkar");
-MODULE_VERSION(PCC_VERSION);
-MODULE_DESCRIPTION("Processor Clocking Control interface driver");
-MODULE_LICENSE("GPL");
-
-late_initcall(pcc_cpufreq_init);
-module_exit(pcc_cpufreq_exit);
+++ /dev/null
-/*
- * This file was based upon code in Powertweak Linux (http://powertweak.sf.net)
- * (C) 2000-2003 Dave Jones, Arjan van de Ven, Janne Pänkälä,
- * Dominik Brodowski.
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/ioport.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-
-#include <asm/msr.h>
-
-#define POWERNOW_IOPORT 0xfff0 /* it doesn't matter where, as long
- as it is unused */
-
-#define PFX "powernow-k6: "
-static unsigned int busfreq; /* FSB, in 10 kHz */
-static unsigned int max_multiplier;
-
-
-/* Clock ratio multiplied by 10 - see table 27 in AMD#23446 */
-static struct cpufreq_frequency_table clock_ratio[] = {
- {45, /* 000 -> 4.5x */ 0},
- {50, /* 001 -> 5.0x */ 0},
- {40, /* 010 -> 4.0x */ 0},
- {55, /* 011 -> 5.5x */ 0},
- {20, /* 100 -> 2.0x */ 0},
- {30, /* 101 -> 3.0x */ 0},
- {60, /* 110 -> 6.0x */ 0},
- {35, /* 111 -> 3.5x */ 0},
- {0, CPUFREQ_TABLE_END}
-};
-
-
-/**
- * powernow_k6_get_cpu_multiplier - returns the current FSB multiplier
- *
- * Returns the current setting of the frequency multiplier. Core clock
- * speed is frequency of the Front-Side Bus multiplied with this value.
- */
-static int powernow_k6_get_cpu_multiplier(void)
-{
- u64 invalue = 0;
- u32 msrval;
-
- msrval = POWERNOW_IOPORT + 0x1;
- wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
- invalue = inl(POWERNOW_IOPORT + 0x8);
- msrval = POWERNOW_IOPORT + 0x0;
- wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
-
- return clock_ratio[(invalue >> 5)&7].index;
-}
-
-
-/**
- * powernow_k6_set_state - set the PowerNow! multiplier
- * @best_i: clock_ratio[best_i] is the target multiplier
- *
- * Tries to change the PowerNow! multiplier
- */
-static void powernow_k6_set_state(unsigned int best_i)
-{
- unsigned long outvalue = 0, invalue = 0;
- unsigned long msrval;
- struct cpufreq_freqs freqs;
-
- if (clock_ratio[best_i].index > max_multiplier) {
- printk(KERN_ERR PFX "invalid target frequency\n");
- return;
- }
-
- freqs.old = busfreq * powernow_k6_get_cpu_multiplier();
- freqs.new = busfreq * clock_ratio[best_i].index;
- freqs.cpu = 0; /* powernow-k6.c is UP only driver */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- /* we now need to transform best_i to the BVC format, see AMD#23446 */
-
- outvalue = (1<<12) | (1<<10) | (1<<9) | (best_i<<5);
-
- msrval = POWERNOW_IOPORT + 0x1;
- wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
- invalue = inl(POWERNOW_IOPORT + 0x8);
- invalue = invalue & 0xf;
- outvalue = outvalue | invalue;
- outl(outvalue , (POWERNOW_IOPORT + 0x8));
- msrval = POWERNOW_IOPORT + 0x0;
- wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
- return;
-}
-
-
-/**
- * powernow_k6_verify - verifies a new CPUfreq policy
- * @policy: new policy
- *
- * Policy must be within lowest and highest possible CPU Frequency,
- * and at least one possible state must be within min and max.
- */
-static int powernow_k6_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &clock_ratio[0]);
-}
-
-
-/**
- * powernow_k6_setpolicy - sets a new CPUFreq policy
- * @policy: new policy
- * @target_freq: the target frequency
- * @relation: how that frequency relates to achieved frequency
- * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
- *
- * sets a new CPUFreq policy
- */
-static int powernow_k6_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = 0;
-
- if (cpufreq_frequency_table_target(policy, &clock_ratio[0],
- target_freq, relation, &newstate))
- return -EINVAL;
-
- powernow_k6_set_state(newstate);
-
- return 0;
-}
-
-
-static int powernow_k6_cpu_init(struct cpufreq_policy *policy)
-{
- unsigned int i, f;
- int result;
-
- if (policy->cpu != 0)
- return -ENODEV;
-
- /* get frequencies */
- max_multiplier = powernow_k6_get_cpu_multiplier();
- busfreq = cpu_khz / max_multiplier;
-
- /* table init */
- for (i = 0; (clock_ratio[i].frequency != CPUFREQ_TABLE_END); i++) {
- f = clock_ratio[i].index;
- if (f > max_multiplier)
- clock_ratio[i].frequency = CPUFREQ_ENTRY_INVALID;
- else
- clock_ratio[i].frequency = busfreq * f;
- }
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.transition_latency = 200000;
- policy->cur = busfreq * max_multiplier;
-
- result = cpufreq_frequency_table_cpuinfo(policy, clock_ratio);
- if (result)
- return result;
-
- cpufreq_frequency_table_get_attr(clock_ratio, policy->cpu);
-
- return 0;
-}
-
-
-static int powernow_k6_cpu_exit(struct cpufreq_policy *policy)
-{
- unsigned int i;
- for (i = 0; i < 8; i++) {
- if (i == max_multiplier)
- powernow_k6_set_state(i);
- }
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-static unsigned int powernow_k6_get(unsigned int cpu)
-{
- unsigned int ret;
- ret = (busfreq * powernow_k6_get_cpu_multiplier());
- return ret;
-}
-
-static struct freq_attr *powernow_k6_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver powernow_k6_driver = {
- .verify = powernow_k6_verify,
- .target = powernow_k6_target,
- .init = powernow_k6_cpu_init,
- .exit = powernow_k6_cpu_exit,
- .get = powernow_k6_get,
- .name = "powernow-k6",
- .owner = THIS_MODULE,
- .attr = powernow_k6_attr,
-};
-
-
-/**
- * powernow_k6_init - initializes the k6 PowerNow! CPUFreq driver
- *
- * Initializes the K6 PowerNow! support. Returns -ENODEV on unsupported
- * devices, -EINVAL or -ENOMEM on problems during initiatization, and zero
- * on success.
- */
-static int __init powernow_k6_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
-
- if ((c->x86_vendor != X86_VENDOR_AMD) || (c->x86 != 5) ||
- ((c->x86_model != 12) && (c->x86_model != 13)))
- return -ENODEV;
-
- if (!request_region(POWERNOW_IOPORT, 16, "PowerNow!")) {
- printk(KERN_INFO PFX "PowerNow IOPORT region already used.\n");
- return -EIO;
- }
-
- if (cpufreq_register_driver(&powernow_k6_driver)) {
- release_region(POWERNOW_IOPORT, 16);
- return -EINVAL;
- }
-
- return 0;
-}
-
-
-/**
- * powernow_k6_exit - unregisters AMD K6-2+/3+ PowerNow! support
- *
- * Unregisters AMD K6-2+ / K6-3+ PowerNow! support.
- */
-static void __exit powernow_k6_exit(void)
-{
- cpufreq_unregister_driver(&powernow_k6_driver);
- release_region(POWERNOW_IOPORT, 16);
-}
-
-
-MODULE_AUTHOR("Arjan van de Ven, Dave Jones <davej@redhat.com>, "
- "Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("PowerNow! driver for AMD K6-2+ / K6-3+ processors.");
-MODULE_LICENSE("GPL");
-
-module_init(powernow_k6_init);
-module_exit(powernow_k6_exit);
+++ /dev/null
-/*
- * AMD K7 Powernow driver.
- * (C) 2003 Dave Jones on behalf of SuSE Labs.
- * (C) 2003-2004 Dave Jones <davej@redhat.com>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- * Based upon datasheets & sample CPUs kindly provided by AMD.
- *
- * Errata 5:
- * CPU may fail to execute a FID/VID change in presence of interrupt.
- * - We cli/sti on stepping A0 CPUs around the FID/VID transition.
- * Errata 15:
- * CPU with half frequency multipliers may hang upon wakeup from disconnect.
- * - We disable half multipliers if ACPI is used on A0 stepping CPUs.
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/dmi.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-
-#include <asm/timer.h> /* Needed for recalibrate_cpu_khz() */
-#include <asm/msr.h>
-#include <asm/system.h>
-
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
-#include <linux/acpi.h>
-#include <acpi/processor.h>
-#endif
-
-#include "powernow-k7.h"
-
-#define PFX "powernow: "
-
-
-struct psb_s {
- u8 signature[10];
- u8 tableversion;
- u8 flags;
- u16 settlingtime;
- u8 reserved1;
- u8 numpst;
-};
-
-struct pst_s {
- u32 cpuid;
- u8 fsbspeed;
- u8 maxfid;
- u8 startvid;
- u8 numpstates;
-};
-
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
-union powernow_acpi_control_t {
- struct {
- unsigned long fid:5,
- vid:5,
- sgtc:20,
- res1:2;
- } bits;
- unsigned long val;
-};
-#endif
-
-#ifdef CONFIG_CPU_FREQ_DEBUG
-/* divide by 1000 to get VCore voltage in V. */
-static const int mobile_vid_table[32] = {
- 2000, 1950, 1900, 1850, 1800, 1750, 1700, 1650,
- 1600, 1550, 1500, 1450, 1400, 1350, 1300, 0,
- 1275, 1250, 1225, 1200, 1175, 1150, 1125, 1100,
- 1075, 1050, 1025, 1000, 975, 950, 925, 0,
-};
-#endif
-
-/* divide by 10 to get FID. */
-static const int fid_codes[32] = {
- 110, 115, 120, 125, 50, 55, 60, 65,
- 70, 75, 80, 85, 90, 95, 100, 105,
- 30, 190, 40, 200, 130, 135, 140, 210,
- 150, 225, 160, 165, 170, 180, -1, -1,
-};
-
-/* This parameter is used in order to force ACPI instead of legacy method for
- * configuration purpose.
- */
-
-static int acpi_force;
-
-static struct cpufreq_frequency_table *powernow_table;
-
-static unsigned int can_scale_bus;
-static unsigned int can_scale_vid;
-static unsigned int minimum_speed = -1;
-static unsigned int maximum_speed;
-static unsigned int number_scales;
-static unsigned int fsb;
-static unsigned int latency;
-static char have_a0;
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "powernow-k7", msg)
-
-static int check_fsb(unsigned int fsbspeed)
-{
- int delta;
- unsigned int f = fsb / 1000;
-
- delta = (fsbspeed > f) ? fsbspeed - f : f - fsbspeed;
- return delta < 5;
-}
-
-static int check_powernow(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- unsigned int maxei, eax, ebx, ecx, edx;
-
- if ((c->x86_vendor != X86_VENDOR_AMD) || (c->x86 != 6)) {
-#ifdef MODULE
- printk(KERN_INFO PFX "This module only works with "
- "AMD K7 CPUs\n");
-#endif
- return 0;
- }
-
- /* Get maximum capabilities */
- maxei = cpuid_eax(0x80000000);
- if (maxei < 0x80000007) { /* Any powernow info ? */
-#ifdef MODULE
- printk(KERN_INFO PFX "No powernow capabilities detected\n");
-#endif
- return 0;
- }
-
- if ((c->x86_model == 6) && (c->x86_mask == 0)) {
- printk(KERN_INFO PFX "K7 660[A0] core detected, "
- "enabling errata workarounds\n");
- have_a0 = 1;
- }
-
- cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
-
- /* Check we can actually do something before we say anything.*/
- if (!(edx & (1 << 1 | 1 << 2)))
- return 0;
-
- printk(KERN_INFO PFX "PowerNOW! Technology present. Can scale: ");
-
- if (edx & 1 << 1) {
- printk("frequency");
- can_scale_bus = 1;
- }
-
- if ((edx & (1 << 1 | 1 << 2)) == 0x6)
- printk(" and ");
-
- if (edx & 1 << 2) {
- printk("voltage");
- can_scale_vid = 1;
- }
-
- printk(".\n");
- return 1;
-}
-
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
-static void invalidate_entry(unsigned int entry)
-{
- powernow_table[entry].frequency = CPUFREQ_ENTRY_INVALID;
-}
-#endif
-
-static int get_ranges(unsigned char *pst)
-{
- unsigned int j;
- unsigned int speed;
- u8 fid, vid;
-
- powernow_table = kzalloc((sizeof(struct cpufreq_frequency_table) *
- (number_scales + 1)), GFP_KERNEL);
- if (!powernow_table)
- return -ENOMEM;
-
- for (j = 0 ; j < number_scales; j++) {
- fid = *pst++;
-
- powernow_table[j].frequency = (fsb * fid_codes[fid]) / 10;
- powernow_table[j].index = fid; /* lower 8 bits */
-
- speed = powernow_table[j].frequency;
-
- if ((fid_codes[fid] % 10) == 5) {
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
- if (have_a0 == 1)
- invalidate_entry(j);
-#endif
- }
-
- if (speed < minimum_speed)
- minimum_speed = speed;
- if (speed > maximum_speed)
- maximum_speed = speed;
-
- vid = *pst++;
- powernow_table[j].index |= (vid << 8); /* upper 8 bits */
-
- dprintk(" FID: 0x%x (%d.%dx [%dMHz]) "
- "VID: 0x%x (%d.%03dV)\n", fid, fid_codes[fid] / 10,
- fid_codes[fid] % 10, speed/1000, vid,
- mobile_vid_table[vid]/1000,
- mobile_vid_table[vid]%1000);
- }
- powernow_table[number_scales].frequency = CPUFREQ_TABLE_END;
- powernow_table[number_scales].index = 0;
-
- return 0;
-}
-
-
-static void change_FID(int fid)
-{
- union msr_fidvidctl fidvidctl;
-
- rdmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
- if (fidvidctl.bits.FID != fid) {
- fidvidctl.bits.SGTC = latency;
- fidvidctl.bits.FID = fid;
- fidvidctl.bits.VIDC = 0;
- fidvidctl.bits.FIDC = 1;
- wrmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
- }
-}
-
-
-static void change_VID(int vid)
-{
- union msr_fidvidctl fidvidctl;
-
- rdmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
- if (fidvidctl.bits.VID != vid) {
- fidvidctl.bits.SGTC = latency;
- fidvidctl.bits.VID = vid;
- fidvidctl.bits.FIDC = 0;
- fidvidctl.bits.VIDC = 1;
- wrmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
- }
-}
-
-
-static void change_speed(unsigned int index)
-{
- u8 fid, vid;
- struct cpufreq_freqs freqs;
- union msr_fidvidstatus fidvidstatus;
- int cfid;
-
- /* fid are the lower 8 bits of the index we stored into
- * the cpufreq frequency table in powernow_decode_bios,
- * vid are the upper 8 bits.
- */
-
- fid = powernow_table[index].index & 0xFF;
- vid = (powernow_table[index].index & 0xFF00) >> 8;
-
- freqs.cpu = 0;
-
- rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
- cfid = fidvidstatus.bits.CFID;
- freqs.old = fsb * fid_codes[cfid] / 10;
-
- freqs.new = powernow_table[index].frequency;
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- /* Now do the magic poking into the MSRs. */
-
- if (have_a0 == 1) /* A0 errata 5 */
- local_irq_disable();
-
- if (freqs.old > freqs.new) {
- /* Going down, so change FID first */
- change_FID(fid);
- change_VID(vid);
- } else {
- /* Going up, so change VID first */
- change_VID(vid);
- change_FID(fid);
- }
-
-
- if (have_a0 == 1)
- local_irq_enable();
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-}
-
-
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
-
-static struct acpi_processor_performance *acpi_processor_perf;
-
-static int powernow_acpi_init(void)
-{
- int i;
- int retval = 0;
- union powernow_acpi_control_t pc;
-
- if (acpi_processor_perf != NULL && powernow_table != NULL) {
- retval = -EINVAL;
- goto err0;
- }
-
- acpi_processor_perf = kzalloc(sizeof(struct acpi_processor_performance),
- GFP_KERNEL);
- if (!acpi_processor_perf) {
- retval = -ENOMEM;
- goto err0;
- }
-
- if (!zalloc_cpumask_var(&acpi_processor_perf->shared_cpu_map,
- GFP_KERNEL)) {
- retval = -ENOMEM;
- goto err05;
- }
-
- if (acpi_processor_register_performance(acpi_processor_perf, 0)) {
- retval = -EIO;
- goto err1;
- }
-
- if (acpi_processor_perf->control_register.space_id !=
- ACPI_ADR_SPACE_FIXED_HARDWARE) {
- retval = -ENODEV;
- goto err2;
- }
-
- if (acpi_processor_perf->status_register.space_id !=
- ACPI_ADR_SPACE_FIXED_HARDWARE) {
- retval = -ENODEV;
- goto err2;
- }
-
- number_scales = acpi_processor_perf->state_count;
-
- if (number_scales < 2) {
- retval = -ENODEV;
- goto err2;
- }
-
- powernow_table = kzalloc((sizeof(struct cpufreq_frequency_table) *
- (number_scales + 1)), GFP_KERNEL);
- if (!powernow_table) {
- retval = -ENOMEM;
- goto err2;
- }
-
- pc.val = (unsigned long) acpi_processor_perf->states[0].control;
- for (i = 0; i < number_scales; i++) {
- u8 fid, vid;
- struct acpi_processor_px *state =
- &acpi_processor_perf->states[i];
- unsigned int speed, speed_mhz;
-
- pc.val = (unsigned long) state->control;
- dprintk("acpi: P%d: %d MHz %d mW %d uS control %08x SGTC %d\n",
- i,
- (u32) state->core_frequency,
- (u32) state->power,
- (u32) state->transition_latency,
- (u32) state->control,
- pc.bits.sgtc);
-
- vid = pc.bits.vid;
- fid = pc.bits.fid;
-
- powernow_table[i].frequency = fsb * fid_codes[fid] / 10;
- powernow_table[i].index = fid; /* lower 8 bits */
- powernow_table[i].index |= (vid << 8); /* upper 8 bits */
-
- speed = powernow_table[i].frequency;
- speed_mhz = speed / 1000;
-
- /* processor_perflib will multiply the MHz value by 1000 to
- * get a KHz value (e.g. 1266000). However, powernow-k7 works
- * with true KHz values (e.g. 1266768). To ensure that all
- * powernow frequencies are available, we must ensure that
- * ACPI doesn't restrict them, so we round up the MHz value
- * to ensure that perflib's computed KHz value is greater than
- * or equal to powernow's KHz value.
- */
- if (speed % 1000 > 0)
- speed_mhz++;
-
- if ((fid_codes[fid] % 10) == 5) {
- if (have_a0 == 1)
- invalidate_entry(i);
- }
-
- dprintk(" FID: 0x%x (%d.%dx [%dMHz]) "
- "VID: 0x%x (%d.%03dV)\n", fid, fid_codes[fid] / 10,
- fid_codes[fid] % 10, speed_mhz, vid,
- mobile_vid_table[vid]/1000,
- mobile_vid_table[vid]%1000);
-
- if (state->core_frequency != speed_mhz) {
- state->core_frequency = speed_mhz;
- dprintk(" Corrected ACPI frequency to %d\n",
- speed_mhz);
- }
-
- if (latency < pc.bits.sgtc)
- latency = pc.bits.sgtc;
-
- if (speed < minimum_speed)
- minimum_speed = speed;
- if (speed > maximum_speed)
- maximum_speed = speed;
- }
-
- powernow_table[i].frequency = CPUFREQ_TABLE_END;
- powernow_table[i].index = 0;
-
- /* notify BIOS that we exist */
- acpi_processor_notify_smm(THIS_MODULE);
-
- return 0;
-
-err2:
- acpi_processor_unregister_performance(acpi_processor_perf, 0);
-err1:
- free_cpumask_var(acpi_processor_perf->shared_cpu_map);
-err05:
- kfree(acpi_processor_perf);
-err0:
- printk(KERN_WARNING PFX "ACPI perflib can not be used on "
- "this platform\n");
- acpi_processor_perf = NULL;
- return retval;
-}
-#else
-static int powernow_acpi_init(void)
-{
- printk(KERN_INFO PFX "no support for ACPI processor found."
- " Please recompile your kernel with ACPI processor\n");
- return -EINVAL;
-}
-#endif
-
-static void print_pst_entry(struct pst_s *pst, unsigned int j)
-{
- dprintk("PST:%d (@%p)\n", j, pst);
- dprintk(" cpuid: 0x%x fsb: %d maxFID: 0x%x startvid: 0x%x\n",
- pst->cpuid, pst->fsbspeed, pst->maxfid, pst->startvid);
-}
-
-static int powernow_decode_bios(int maxfid, int startvid)
-{
- struct psb_s *psb;
- struct pst_s *pst;
- unsigned int i, j;
- unsigned char *p;
- unsigned int etuple;
- unsigned int ret;
-
- etuple = cpuid_eax(0x80000001);
-
- for (i = 0xC0000; i < 0xffff0 ; i += 16) {
-
- p = phys_to_virt(i);
-
- if (memcmp(p, "AMDK7PNOW!", 10) == 0) {
- dprintk("Found PSB header at %p\n", p);
- psb = (struct psb_s *) p;
- dprintk("Table version: 0x%x\n", psb->tableversion);
- if (psb->tableversion != 0x12) {
- printk(KERN_INFO PFX "Sorry, only v1.2 tables"
- " supported right now\n");
- return -ENODEV;
- }
-
- dprintk("Flags: 0x%x\n", psb->flags);
- if ((psb->flags & 1) == 0)
- dprintk("Mobile voltage regulator\n");
- else
- dprintk("Desktop voltage regulator\n");
-
- latency = psb->settlingtime;
- if (latency < 100) {
- printk(KERN_INFO PFX "BIOS set settling time "
- "to %d microseconds. "
- "Should be at least 100. "
- "Correcting.\n", latency);
- latency = 100;
- }
- dprintk("Settling Time: %d microseconds.\n",
- psb->settlingtime);
- dprintk("Has %d PST tables. (Only dumping ones "
- "relevant to this CPU).\n",
- psb->numpst);
-
- p += sizeof(struct psb_s);
-
- pst = (struct pst_s *) p;
-
- for (j = 0; j < psb->numpst; j++) {
- pst = (struct pst_s *) p;
- number_scales = pst->numpstates;
-
- if ((etuple == pst->cpuid) &&
- check_fsb(pst->fsbspeed) &&
- (maxfid == pst->maxfid) &&
- (startvid == pst->startvid)) {
- print_pst_entry(pst, j);
- p = (char *)pst + sizeof(struct pst_s);
- ret = get_ranges(p);
- return ret;
- } else {
- unsigned int k;
- p = (char *)pst + sizeof(struct pst_s);
- for (k = 0; k < number_scales; k++)
- p += 2;
- }
- }
- printk(KERN_INFO PFX "No PST tables match this cpuid "
- "(0x%x)\n", etuple);
- printk(KERN_INFO PFX "This is indicative of a broken "
- "BIOS.\n");
-
- return -EINVAL;
- }
- p++;
- }
-
- return -ENODEV;
-}
-
-
-static int powernow_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate;
-
- if (cpufreq_frequency_table_target(policy, powernow_table, target_freq,
- relation, &newstate))
- return -EINVAL;
-
- change_speed(newstate);
-
- return 0;
-}
-
-
-static int powernow_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, powernow_table);
-}
-
-/*
- * We use the fact that the bus frequency is somehow
- * a multiple of 100000/3 khz, then we compute sgtc according
- * to this multiple.
- * That way, we match more how AMD thinks all of that work.
- * We will then get the same kind of behaviour already tested under
- * the "well-known" other OS.
- */
-static int __cpuinit fixup_sgtc(void)
-{
- unsigned int sgtc;
- unsigned int m;
-
- m = fsb / 3333;
- if ((m % 10) >= 5)
- m += 5;
-
- m /= 10;
-
- sgtc = 100 * m * latency;
- sgtc = sgtc / 3;
- if (sgtc > 0xfffff) {
- printk(KERN_WARNING PFX "SGTC too large %d\n", sgtc);
- sgtc = 0xfffff;
- }
- return sgtc;
-}
-
-static unsigned int powernow_get(unsigned int cpu)
-{
- union msr_fidvidstatus fidvidstatus;
- unsigned int cfid;
-
- if (cpu)
- return 0;
- rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
- cfid = fidvidstatus.bits.CFID;
-
- return fsb * fid_codes[cfid] / 10;
-}
-
-
-static int __cpuinit acer_cpufreq_pst(const struct dmi_system_id *d)
-{
- printk(KERN_WARNING PFX
- "%s laptop with broken PST tables in BIOS detected.\n",
- d->ident);
- printk(KERN_WARNING PFX
- "You need to downgrade to 3A21 (09/09/2002), or try a newer "
- "BIOS than 3A71 (01/20/2003)\n");
- printk(KERN_WARNING PFX
- "cpufreq scaling has been disabled as a result of this.\n");
- return 0;
-}
-
-/*
- * Some Athlon laptops have really fucked PST tables.
- * A BIOS update is all that can save them.
- * Mention this, and disable cpufreq.
- */
-static struct dmi_system_id __cpuinitdata powernow_dmi_table[] = {
- {
- .callback = acer_cpufreq_pst,
- .ident = "Acer Aspire",
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "Insyde Software"),
- DMI_MATCH(DMI_BIOS_VERSION, "3A71"),
- },
- },
- { }
-};
-
-static int __cpuinit powernow_cpu_init(struct cpufreq_policy *policy)
-{
- union msr_fidvidstatus fidvidstatus;
- int result;
-
- if (policy->cpu != 0)
- return -ENODEV;
-
- rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
-
- recalibrate_cpu_khz();
-
- fsb = (10 * cpu_khz) / fid_codes[fidvidstatus.bits.CFID];
- if (!fsb) {
- printk(KERN_WARNING PFX "can not determine bus frequency\n");
- return -EINVAL;
- }
- dprintk("FSB: %3dMHz\n", fsb/1000);
-
- if (dmi_check_system(powernow_dmi_table) || acpi_force) {
- printk(KERN_INFO PFX "PSB/PST known to be broken. "
- "Trying ACPI instead\n");
- result = powernow_acpi_init();
- } else {
- result = powernow_decode_bios(fidvidstatus.bits.MFID,
- fidvidstatus.bits.SVID);
- if (result) {
- printk(KERN_INFO PFX "Trying ACPI perflib\n");
- maximum_speed = 0;
- minimum_speed = -1;
- latency = 0;
- result = powernow_acpi_init();
- if (result) {
- printk(KERN_INFO PFX
- "ACPI and legacy methods failed\n");
- }
- } else {
- /* SGTC use the bus clock as timer */
- latency = fixup_sgtc();
- printk(KERN_INFO PFX "SGTC: %d\n", latency);
- }
- }
-
- if (result)
- return result;
-
- printk(KERN_INFO PFX "Minimum speed %d MHz. Maximum speed %d MHz.\n",
- minimum_speed/1000, maximum_speed/1000);
-
- policy->cpuinfo.transition_latency =
- cpufreq_scale(2000000UL, fsb, latency);
-
- policy->cur = powernow_get(0);
-
- cpufreq_frequency_table_get_attr(powernow_table, policy->cpu);
-
- return cpufreq_frequency_table_cpuinfo(policy, powernow_table);
-}
-
-static int powernow_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
-
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
- if (acpi_processor_perf) {
- acpi_processor_unregister_performance(acpi_processor_perf, 0);
- free_cpumask_var(acpi_processor_perf->shared_cpu_map);
- kfree(acpi_processor_perf);
- }
-#endif
-
- kfree(powernow_table);
- return 0;
-}
-
-static struct freq_attr *powernow_table_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver powernow_driver = {
- .verify = powernow_verify,
- .target = powernow_target,
- .get = powernow_get,
-#ifdef CONFIG_X86_POWERNOW_K7_ACPI
- .bios_limit = acpi_processor_get_bios_limit,
-#endif
- .init = powernow_cpu_init,
- .exit = powernow_cpu_exit,
- .name = "powernow-k7",
- .owner = THIS_MODULE,
- .attr = powernow_table_attr,
-};
-
-static int __init powernow_init(void)
-{
- if (check_powernow() == 0)
- return -ENODEV;
- return cpufreq_register_driver(&powernow_driver);
-}
-
-
-static void __exit powernow_exit(void)
-{
- cpufreq_unregister_driver(&powernow_driver);
-}
-
-module_param(acpi_force, int, 0444);
-MODULE_PARM_DESC(acpi_force, "Force ACPI to be used.");
-
-MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
-MODULE_DESCRIPTION("Powernow driver for AMD K7 processors.");
-MODULE_LICENSE("GPL");
-
-late_initcall(powernow_init);
-module_exit(powernow_exit);
-
+++ /dev/null
-/*
- * (C) 2003 Dave Jones.
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * AMD-specific information
- *
- */
-
-union msr_fidvidctl {
- struct {
- unsigned FID:5, // 4:0
- reserved1:3, // 7:5
- VID:5, // 12:8
- reserved2:3, // 15:13
- FIDC:1, // 16
- VIDC:1, // 17
- reserved3:2, // 19:18
- FIDCHGRATIO:1, // 20
- reserved4:11, // 31-21
- SGTC:20, // 32:51
- reserved5:12; // 63:52
- } bits;
- unsigned long long val;
-};
-
-union msr_fidvidstatus {
- struct {
- unsigned CFID:5, // 4:0
- reserved1:3, // 7:5
- SFID:5, // 12:8
- reserved2:3, // 15:13
- MFID:5, // 20:16
- reserved3:11, // 31:21
- CVID:5, // 36:32
- reserved4:3, // 39:37
- SVID:5, // 44:40
- reserved5:3, // 47:45
- MVID:5, // 52:48
- reserved6:11; // 63:53
- } bits;
- unsigned long long val;
-};
+++ /dev/null
-/*
- * (c) 2003-2010 Advanced Micro Devices, Inc.
- * Your use of this code is subject to the terms and conditions of the
- * GNU general public license version 2. See "COPYING" or
- * http://www.gnu.org/licenses/gpl.html
- *
- * Support : mark.langsdorf@amd.com
- *
- * Based on the powernow-k7.c module written by Dave Jones.
- * (C) 2003 Dave Jones on behalf of SuSE Labs
- * (C) 2004 Dominik Brodowski <linux@brodo.de>
- * (C) 2004 Pavel Machek <pavel@ucw.cz>
- * Licensed under the terms of the GNU GPL License version 2.
- * Based upon datasheets & sample CPUs kindly provided by AMD.
- *
- * Valuable input gratefully received from Dave Jones, Pavel Machek,
- * Dominik Brodowski, Jacob Shin, and others.
- * Originally developed by Paul Devriendt.
- * Processor information obtained from Chapter 9 (Power and Thermal Management)
- * of the "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
- * Opteron Processors" available for download from www.amd.com
- *
- * Tables for specific CPUs can be inferred from
- * http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf
- */
-
-#include <linux/kernel.h>
-#include <linux/smp.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/cpumask.h>
-#include <linux/sched.h> /* for current / set_cpus_allowed() */
-#include <linux/io.h>
-#include <linux/delay.h>
-
-#include <asm/msr.h>
-
-#include <linux/acpi.h>
-#include <linux/mutex.h>
-#include <acpi/processor.h>
-
-#define PFX "powernow-k8: "
-#define VERSION "version 2.20.00"
-#include "powernow-k8.h"
-#include "mperf.h"
-
-/* serialize freq changes */
-static DEFINE_MUTEX(fidvid_mutex);
-
-static DEFINE_PER_CPU(struct powernow_k8_data *, powernow_data);
-
-static int cpu_family = CPU_OPTERON;
-
-/* core performance boost */
-static bool cpb_capable, cpb_enabled;
-static struct msr __percpu *msrs;
-
-static struct cpufreq_driver cpufreq_amd64_driver;
-
-#ifndef CONFIG_SMP
-static inline const struct cpumask *cpu_core_mask(int cpu)
-{
- return cpumask_of(0);
-}
-#endif
-
-/* Return a frequency in MHz, given an input fid */
-static u32 find_freq_from_fid(u32 fid)
-{
- return 800 + (fid * 100);
-}
-
-/* Return a frequency in KHz, given an input fid */
-static u32 find_khz_freq_from_fid(u32 fid)
-{
- return 1000 * find_freq_from_fid(fid);
-}
-
-static u32 find_khz_freq_from_pstate(struct cpufreq_frequency_table *data,
- u32 pstate)
-{
- return data[pstate].frequency;
-}
-
-/* Return the vco fid for an input fid
- *
- * Each "low" fid has corresponding "high" fid, and you can get to "low" fids
- * only from corresponding high fids. This returns "high" fid corresponding to
- * "low" one.
- */
-static u32 convert_fid_to_vco_fid(u32 fid)
-{
- if (fid < HI_FID_TABLE_BOTTOM)
- return 8 + (2 * fid);
- else
- return fid;
-}
-
-/*
- * Return 1 if the pending bit is set. Unless we just instructed the processor
- * to transition to a new state, seeing this bit set is really bad news.
- */
-static int pending_bit_stuck(void)
-{
- u32 lo, hi;
-
- if (cpu_family == CPU_HW_PSTATE)
- return 0;
-
- rdmsr(MSR_FIDVID_STATUS, lo, hi);
- return lo & MSR_S_LO_CHANGE_PENDING ? 1 : 0;
-}
-
-/*
- * Update the global current fid / vid values from the status msr.
- * Returns 1 on error.
- */
-static int query_current_values_with_pending_wait(struct powernow_k8_data *data)
-{
- u32 lo, hi;
- u32 i = 0;
-
- if (cpu_family == CPU_HW_PSTATE) {
- rdmsr(MSR_PSTATE_STATUS, lo, hi);
- i = lo & HW_PSTATE_MASK;
- data->currpstate = i;
-
- /*
- * a workaround for family 11h erratum 311 might cause
- * an "out-of-range Pstate if the core is in Pstate-0
- */
- if ((boot_cpu_data.x86 == 0x11) && (i >= data->numps))
- data->currpstate = HW_PSTATE_0;
-
- return 0;
- }
- do {
- if (i++ > 10000) {
- dprintk("detected change pending stuck\n");
- return 1;
- }
- rdmsr(MSR_FIDVID_STATUS, lo, hi);
- } while (lo & MSR_S_LO_CHANGE_PENDING);
-
- data->currvid = hi & MSR_S_HI_CURRENT_VID;
- data->currfid = lo & MSR_S_LO_CURRENT_FID;
-
- return 0;
-}
-
-/* the isochronous relief time */
-static void count_off_irt(struct powernow_k8_data *data)
-{
- udelay((1 << data->irt) * 10);
- return;
-}
-
-/* the voltage stabilization time */
-static void count_off_vst(struct powernow_k8_data *data)
-{
- udelay(data->vstable * VST_UNITS_20US);
- return;
-}
-
-/* need to init the control msr to a safe value (for each cpu) */
-static void fidvid_msr_init(void)
-{
- u32 lo, hi;
- u8 fid, vid;
-
- rdmsr(MSR_FIDVID_STATUS, lo, hi);
- vid = hi & MSR_S_HI_CURRENT_VID;
- fid = lo & MSR_S_LO_CURRENT_FID;
- lo = fid | (vid << MSR_C_LO_VID_SHIFT);
- hi = MSR_C_HI_STP_GNT_BENIGN;
- dprintk("cpu%d, init lo 0x%x, hi 0x%x\n", smp_processor_id(), lo, hi);
- wrmsr(MSR_FIDVID_CTL, lo, hi);
-}
-
-/* write the new fid value along with the other control fields to the msr */
-static int write_new_fid(struct powernow_k8_data *data, u32 fid)
-{
- u32 lo;
- u32 savevid = data->currvid;
- u32 i = 0;
-
- if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
- printk(KERN_ERR PFX "internal error - overflow on fid write\n");
- return 1;
- }
-
- lo = fid;
- lo |= (data->currvid << MSR_C_LO_VID_SHIFT);
- lo |= MSR_C_LO_INIT_FID_VID;
-
- dprintk("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
- fid, lo, data->plllock * PLL_LOCK_CONVERSION);
-
- do {
- wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
- if (i++ > 100) {
- printk(KERN_ERR PFX
- "Hardware error - pending bit very stuck - "
- "no further pstate changes possible\n");
- return 1;
- }
- } while (query_current_values_with_pending_wait(data));
-
- count_off_irt(data);
-
- if (savevid != data->currvid) {
- printk(KERN_ERR PFX
- "vid change on fid trans, old 0x%x, new 0x%x\n",
- savevid, data->currvid);
- return 1;
- }
-
- if (fid != data->currfid) {
- printk(KERN_ERR PFX
- "fid trans failed, fid 0x%x, curr 0x%x\n", fid,
- data->currfid);
- return 1;
- }
-
- return 0;
-}
-
-/* Write a new vid to the hardware */
-static int write_new_vid(struct powernow_k8_data *data, u32 vid)
-{
- u32 lo;
- u32 savefid = data->currfid;
- int i = 0;
-
- if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
- printk(KERN_ERR PFX "internal error - overflow on vid write\n");
- return 1;
- }
-
- lo = data->currfid;
- lo |= (vid << MSR_C_LO_VID_SHIFT);
- lo |= MSR_C_LO_INIT_FID_VID;
-
- dprintk("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
- vid, lo, STOP_GRANT_5NS);
-
- do {
- wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
- if (i++ > 100) {
- printk(KERN_ERR PFX "internal error - pending bit "
- "very stuck - no further pstate "
- "changes possible\n");
- return 1;
- }
- } while (query_current_values_with_pending_wait(data));
-
- if (savefid != data->currfid) {
- printk(KERN_ERR PFX "fid changed on vid trans, old "
- "0x%x new 0x%x\n",
- savefid, data->currfid);
- return 1;
- }
-
- if (vid != data->currvid) {
- printk(KERN_ERR PFX "vid trans failed, vid 0x%x, "
- "curr 0x%x\n",
- vid, data->currvid);
- return 1;
- }
-
- return 0;
-}
-
-/*
- * Reduce the vid by the max of step or reqvid.
- * Decreasing vid codes represent increasing voltages:
- * vid of 0 is 1.550V, vid of 0x1e is 0.800V, vid of VID_OFF is off.
- */
-static int decrease_vid_code_by_step(struct powernow_k8_data *data,
- u32 reqvid, u32 step)
-{
- if ((data->currvid - reqvid) > step)
- reqvid = data->currvid - step;
-
- if (write_new_vid(data, reqvid))
- return 1;
-
- count_off_vst(data);
-
- return 0;
-}
-
-/* Change hardware pstate by single MSR write */
-static int transition_pstate(struct powernow_k8_data *data, u32 pstate)
-{
- wrmsr(MSR_PSTATE_CTRL, pstate, 0);
- data->currpstate = pstate;
- return 0;
-}
-
-/* Change Opteron/Athlon64 fid and vid, by the 3 phases. */
-static int transition_fid_vid(struct powernow_k8_data *data,
- u32 reqfid, u32 reqvid)
-{
- if (core_voltage_pre_transition(data, reqvid, reqfid))
- return 1;
-
- if (core_frequency_transition(data, reqfid))
- return 1;
-
- if (core_voltage_post_transition(data, reqvid))
- return 1;
-
- if (query_current_values_with_pending_wait(data))
- return 1;
-
- if ((reqfid != data->currfid) || (reqvid != data->currvid)) {
- printk(KERN_ERR PFX "failed (cpu%d): req 0x%x 0x%x, "
- "curr 0x%x 0x%x\n",
- smp_processor_id(),
- reqfid, reqvid, data->currfid, data->currvid);
- return 1;
- }
-
- dprintk("transitioned (cpu%d): new fid 0x%x, vid 0x%x\n",
- smp_processor_id(), data->currfid, data->currvid);
-
- return 0;
-}
-
-/* Phase 1 - core voltage transition ... setup voltage */
-static int core_voltage_pre_transition(struct powernow_k8_data *data,
- u32 reqvid, u32 reqfid)
-{
- u32 rvosteps = data->rvo;
- u32 savefid = data->currfid;
- u32 maxvid, lo, rvomult = 1;
-
- dprintk("ph1 (cpu%d): start, currfid 0x%x, currvid 0x%x, "
- "reqvid 0x%x, rvo 0x%x\n",
- smp_processor_id(),
- data->currfid, data->currvid, reqvid, data->rvo);
-
- if ((savefid < LO_FID_TABLE_TOP) && (reqfid < LO_FID_TABLE_TOP))
- rvomult = 2;
- rvosteps *= rvomult;
- rdmsr(MSR_FIDVID_STATUS, lo, maxvid);
- maxvid = 0x1f & (maxvid >> 16);
- dprintk("ph1 maxvid=0x%x\n", maxvid);
- if (reqvid < maxvid) /* lower numbers are higher voltages */
- reqvid = maxvid;
-
- while (data->currvid > reqvid) {
- dprintk("ph1: curr 0x%x, req vid 0x%x\n",
- data->currvid, reqvid);
- if (decrease_vid_code_by_step(data, reqvid, data->vidmvs))
- return 1;
- }
-
- while ((rvosteps > 0) &&
- ((rvomult * data->rvo + data->currvid) > reqvid)) {
- if (data->currvid == maxvid) {
- rvosteps = 0;
- } else {
- dprintk("ph1: changing vid for rvo, req 0x%x\n",
- data->currvid - 1);
- if (decrease_vid_code_by_step(data, data->currvid-1, 1))
- return 1;
- rvosteps--;
- }
- }
-
- if (query_current_values_with_pending_wait(data))
- return 1;
-
- if (savefid != data->currfid) {
- printk(KERN_ERR PFX "ph1 err, currfid changed 0x%x\n",
- data->currfid);
- return 1;
- }
-
- dprintk("ph1 complete, currfid 0x%x, currvid 0x%x\n",
- data->currfid, data->currvid);
-
- return 0;
-}
-
-/* Phase 2 - core frequency transition */
-static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid)
-{
- u32 vcoreqfid, vcocurrfid, vcofiddiff;
- u32 fid_interval, savevid = data->currvid;
-
- if (data->currfid == reqfid) {
- printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n",
- data->currfid);
- return 0;
- }
-
- dprintk("ph2 (cpu%d): starting, currfid 0x%x, currvid 0x%x, "
- "reqfid 0x%x\n",
- smp_processor_id(),
- data->currfid, data->currvid, reqfid);
-
- vcoreqfid = convert_fid_to_vco_fid(reqfid);
- vcocurrfid = convert_fid_to_vco_fid(data->currfid);
- vcofiddiff = vcocurrfid > vcoreqfid ? vcocurrfid - vcoreqfid
- : vcoreqfid - vcocurrfid;
-
- if ((reqfid <= LO_FID_TABLE_TOP) && (data->currfid <= LO_FID_TABLE_TOP))
- vcofiddiff = 0;
-
- while (vcofiddiff > 2) {
- (data->currfid & 1) ? (fid_interval = 1) : (fid_interval = 2);
-
- if (reqfid > data->currfid) {
- if (data->currfid > LO_FID_TABLE_TOP) {
- if (write_new_fid(data,
- data->currfid + fid_interval))
- return 1;
- } else {
- if (write_new_fid
- (data,
- 2 + convert_fid_to_vco_fid(data->currfid)))
- return 1;
- }
- } else {
- if (write_new_fid(data, data->currfid - fid_interval))
- return 1;
- }
-
- vcocurrfid = convert_fid_to_vco_fid(data->currfid);
- vcofiddiff = vcocurrfid > vcoreqfid ? vcocurrfid - vcoreqfid
- : vcoreqfid - vcocurrfid;
- }
-
- if (write_new_fid(data, reqfid))
- return 1;
-
- if (query_current_values_with_pending_wait(data))
- return 1;
-
- if (data->currfid != reqfid) {
- printk(KERN_ERR PFX
- "ph2: mismatch, failed fid transition, "
- "curr 0x%x, req 0x%x\n",
- data->currfid, reqfid);
- return 1;
- }
-
- if (savevid != data->currvid) {
- printk(KERN_ERR PFX "ph2: vid changed, save 0x%x, curr 0x%x\n",
- savevid, data->currvid);
- return 1;
- }
-
- dprintk("ph2 complete, currfid 0x%x, currvid 0x%x\n",
- data->currfid, data->currvid);
-
- return 0;
-}
-
-/* Phase 3 - core voltage transition flow ... jump to the final vid. */
-static int core_voltage_post_transition(struct powernow_k8_data *data,
- u32 reqvid)
-{
- u32 savefid = data->currfid;
- u32 savereqvid = reqvid;
-
- dprintk("ph3 (cpu%d): starting, currfid 0x%x, currvid 0x%x\n",
- smp_processor_id(),
- data->currfid, data->currvid);
-
- if (reqvid != data->currvid) {
- if (write_new_vid(data, reqvid))
- return 1;
-
- if (savefid != data->currfid) {
- printk(KERN_ERR PFX
- "ph3: bad fid change, save 0x%x, curr 0x%x\n",
- savefid, data->currfid);
- return 1;
- }
-
- if (data->currvid != reqvid) {
- printk(KERN_ERR PFX
- "ph3: failed vid transition\n, "
- "req 0x%x, curr 0x%x",
- reqvid, data->currvid);
- return 1;
- }
- }
-
- if (query_current_values_with_pending_wait(data))
- return 1;
-
- if (savereqvid != data->currvid) {
- dprintk("ph3 failed, currvid 0x%x\n", data->currvid);
- return 1;
- }
-
- if (savefid != data->currfid) {
- dprintk("ph3 failed, currfid changed 0x%x\n",
- data->currfid);
- return 1;
- }
-
- dprintk("ph3 complete, currfid 0x%x, currvid 0x%x\n",
- data->currfid, data->currvid);
-
- return 0;
-}
-
-static void check_supported_cpu(void *_rc)
-{
- u32 eax, ebx, ecx, edx;
- int *rc = _rc;
-
- *rc = -ENODEV;
-
- if (__this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_AMD)
- return;
-
- eax = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
- if (((eax & CPUID_XFAM) != CPUID_XFAM_K8) &&
- ((eax & CPUID_XFAM) < CPUID_XFAM_10H))
- return;
-
- if ((eax & CPUID_XFAM) == CPUID_XFAM_K8) {
- if (((eax & CPUID_USE_XFAM_XMOD) != CPUID_USE_XFAM_XMOD) ||
- ((eax & CPUID_XMOD) > CPUID_XMOD_REV_MASK)) {
- printk(KERN_INFO PFX
- "Processor cpuid %x not supported\n", eax);
- return;
- }
-
- eax = cpuid_eax(CPUID_GET_MAX_CAPABILITIES);
- if (eax < CPUID_FREQ_VOLT_CAPABILITIES) {
- printk(KERN_INFO PFX
- "No frequency change capabilities detected\n");
- return;
- }
-
- cpuid(CPUID_FREQ_VOLT_CAPABILITIES, &eax, &ebx, &ecx, &edx);
- if ((edx & P_STATE_TRANSITION_CAPABLE)
- != P_STATE_TRANSITION_CAPABLE) {
- printk(KERN_INFO PFX
- "Power state transitions not supported\n");
- return;
- }
- } else { /* must be a HW Pstate capable processor */
- cpuid(CPUID_FREQ_VOLT_CAPABILITIES, &eax, &ebx, &ecx, &edx);
- if ((edx & USE_HW_PSTATE) == USE_HW_PSTATE)
- cpu_family = CPU_HW_PSTATE;
- else
- return;
- }
-
- *rc = 0;
-}
-
-static int check_pst_table(struct powernow_k8_data *data, struct pst_s *pst,
- u8 maxvid)
-{
- unsigned int j;
- u8 lastfid = 0xff;
-
- for (j = 0; j < data->numps; j++) {
- if (pst[j].vid > LEAST_VID) {
- printk(KERN_ERR FW_BUG PFX "vid %d invalid : 0x%x\n",
- j, pst[j].vid);
- return -EINVAL;
- }
- if (pst[j].vid < data->rvo) {
- /* vid + rvo >= 0 */
- printk(KERN_ERR FW_BUG PFX "0 vid exceeded with pstate"
- " %d\n", j);
- return -ENODEV;
- }
- if (pst[j].vid < maxvid + data->rvo) {
- /* vid + rvo >= maxvid */
- printk(KERN_ERR FW_BUG PFX "maxvid exceeded with pstate"
- " %d\n", j);
- return -ENODEV;
- }
- if (pst[j].fid > MAX_FID) {
- printk(KERN_ERR FW_BUG PFX "maxfid exceeded with pstate"
- " %d\n", j);
- return -ENODEV;
- }
- if (j && (pst[j].fid < HI_FID_TABLE_BOTTOM)) {
- /* Only first fid is allowed to be in "low" range */
- printk(KERN_ERR FW_BUG PFX "two low fids - %d : "
- "0x%x\n", j, pst[j].fid);
- return -EINVAL;
- }
- if (pst[j].fid < lastfid)
- lastfid = pst[j].fid;
- }
- if (lastfid & 1) {
- printk(KERN_ERR FW_BUG PFX "lastfid invalid\n");
- return -EINVAL;
- }
- if (lastfid > LO_FID_TABLE_TOP)
- printk(KERN_INFO FW_BUG PFX
- "first fid not from lo freq table\n");
-
- return 0;
-}
-
-static void invalidate_entry(struct cpufreq_frequency_table *powernow_table,
- unsigned int entry)
-{
- powernow_table[entry].frequency = CPUFREQ_ENTRY_INVALID;
-}
-
-static void print_basics(struct powernow_k8_data *data)
-{
- int j;
- for (j = 0; j < data->numps; j++) {
- if (data->powernow_table[j].frequency !=
- CPUFREQ_ENTRY_INVALID) {
- if (cpu_family == CPU_HW_PSTATE) {
- printk(KERN_INFO PFX
- " %d : pstate %d (%d MHz)\n", j,
- data->powernow_table[j].index,
- data->powernow_table[j].frequency/1000);
- } else {
- printk(KERN_INFO PFX
- "fid 0x%x (%d MHz), vid 0x%x\n",
- data->powernow_table[j].index & 0xff,
- data->powernow_table[j].frequency/1000,
- data->powernow_table[j].index >> 8);
- }
- }
- }
- if (data->batps)
- printk(KERN_INFO PFX "Only %d pstates on battery\n",
- data->batps);
-}
-
-static u32 freq_from_fid_did(u32 fid, u32 did)
-{
- u32 mhz = 0;
-
- if (boot_cpu_data.x86 == 0x10)
- mhz = (100 * (fid + 0x10)) >> did;
- else if (boot_cpu_data.x86 == 0x11)
- mhz = (100 * (fid + 8)) >> did;
- else
- BUG();
-
- return mhz * 1000;
-}
-
-static int fill_powernow_table(struct powernow_k8_data *data,
- struct pst_s *pst, u8 maxvid)
-{
- struct cpufreq_frequency_table *powernow_table;
- unsigned int j;
-
- if (data->batps) {
- /* use ACPI support to get full speed on mains power */
- printk(KERN_WARNING PFX
- "Only %d pstates usable (use ACPI driver for full "
- "range\n", data->batps);
- data->numps = data->batps;
- }
-
- for (j = 1; j < data->numps; j++) {
- if (pst[j-1].fid >= pst[j].fid) {
- printk(KERN_ERR PFX "PST out of sequence\n");
- return -EINVAL;
- }
- }
-
- if (data->numps < 2) {
- printk(KERN_ERR PFX "no p states to transition\n");
- return -ENODEV;
- }
-
- if (check_pst_table(data, pst, maxvid))
- return -EINVAL;
-
- powernow_table = kmalloc((sizeof(struct cpufreq_frequency_table)
- * (data->numps + 1)), GFP_KERNEL);
- if (!powernow_table) {
- printk(KERN_ERR PFX "powernow_table memory alloc failure\n");
- return -ENOMEM;
- }
-
- for (j = 0; j < data->numps; j++) {
- int freq;
- powernow_table[j].index = pst[j].fid; /* lower 8 bits */
- powernow_table[j].index |= (pst[j].vid << 8); /* upper 8 bits */
- freq = find_khz_freq_from_fid(pst[j].fid);
- powernow_table[j].frequency = freq;
- }
- powernow_table[data->numps].frequency = CPUFREQ_TABLE_END;
- powernow_table[data->numps].index = 0;
-
- if (query_current_values_with_pending_wait(data)) {
- kfree(powernow_table);
- return -EIO;
- }
-
- dprintk("cfid 0x%x, cvid 0x%x\n", data->currfid, data->currvid);
- data->powernow_table = powernow_table;
- if (cpumask_first(cpu_core_mask(data->cpu)) == data->cpu)
- print_basics(data);
-
- for (j = 0; j < data->numps; j++)
- if ((pst[j].fid == data->currfid) &&
- (pst[j].vid == data->currvid))
- return 0;
-
- dprintk("currfid/vid do not match PST, ignoring\n");
- return 0;
-}
-
-/* Find and validate the PSB/PST table in BIOS. */
-static int find_psb_table(struct powernow_k8_data *data)
-{
- struct psb_s *psb;
- unsigned int i;
- u32 mvs;
- u8 maxvid;
- u32 cpst = 0;
- u32 thiscpuid;
-
- for (i = 0xc0000; i < 0xffff0; i += 0x10) {
- /* Scan BIOS looking for the signature. */
- /* It can not be at ffff0 - it is too big. */
-
- psb = phys_to_virt(i);
- if (memcmp(psb, PSB_ID_STRING, PSB_ID_STRING_LEN) != 0)
- continue;
-
- dprintk("found PSB header at 0x%p\n", psb);
-
- dprintk("table vers: 0x%x\n", psb->tableversion);
- if (psb->tableversion != PSB_VERSION_1_4) {
- printk(KERN_ERR FW_BUG PFX "PSB table is not v1.4\n");
- return -ENODEV;
- }
-
- dprintk("flags: 0x%x\n", psb->flags1);
- if (psb->flags1) {
- printk(KERN_ERR FW_BUG PFX "unknown flags\n");
- return -ENODEV;
- }
-
- data->vstable = psb->vstable;
- dprintk("voltage stabilization time: %d(*20us)\n",
- data->vstable);
-
- dprintk("flags2: 0x%x\n", psb->flags2);
- data->rvo = psb->flags2 & 3;
- data->irt = ((psb->flags2) >> 2) & 3;
- mvs = ((psb->flags2) >> 4) & 3;
- data->vidmvs = 1 << mvs;
- data->batps = ((psb->flags2) >> 6) & 3;
-
- dprintk("ramp voltage offset: %d\n", data->rvo);
- dprintk("isochronous relief time: %d\n", data->irt);
- dprintk("maximum voltage step: %d - 0x%x\n", mvs, data->vidmvs);
-
- dprintk("numpst: 0x%x\n", psb->num_tables);
- cpst = psb->num_tables;
- if ((psb->cpuid == 0x00000fc0) ||
- (psb->cpuid == 0x00000fe0)) {
- thiscpuid = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
- if ((thiscpuid == 0x00000fc0) ||
- (thiscpuid == 0x00000fe0))
- cpst = 1;
- }
- if (cpst != 1) {
- printk(KERN_ERR FW_BUG PFX "numpst must be 1\n");
- return -ENODEV;
- }
-
- data->plllock = psb->plllocktime;
- dprintk("plllocktime: 0x%x (units 1us)\n", psb->plllocktime);
- dprintk("maxfid: 0x%x\n", psb->maxfid);
- dprintk("maxvid: 0x%x\n", psb->maxvid);
- maxvid = psb->maxvid;
-
- data->numps = psb->numps;
- dprintk("numpstates: 0x%x\n", data->numps);
- return fill_powernow_table(data,
- (struct pst_s *)(psb+1), maxvid);
- }
- /*
- * If you see this message, complain to BIOS manufacturer. If
- * he tells you "we do not support Linux" or some similar
- * nonsense, remember that Windows 2000 uses the same legacy
- * mechanism that the old Linux PSB driver uses. Tell them it
- * is broken with Windows 2000.
- *
- * The reference to the AMD documentation is chapter 9 in the
- * BIOS and Kernel Developer's Guide, which is available on
- * www.amd.com
- */
- printk(KERN_ERR FW_BUG PFX "No PSB or ACPI _PSS objects\n");
- printk(KERN_ERR PFX "Make sure that your BIOS is up to date"
- " and Cool'N'Quiet support is enabled in BIOS setup\n");
- return -ENODEV;
-}
-
-static void powernow_k8_acpi_pst_values(struct powernow_k8_data *data,
- unsigned int index)
-{
- u64 control;
-
- if (!data->acpi_data.state_count || (cpu_family == CPU_HW_PSTATE))
- return;
-
- control = data->acpi_data.states[index].control;
- data->irt = (control >> IRT_SHIFT) & IRT_MASK;
- data->rvo = (control >> RVO_SHIFT) & RVO_MASK;
- data->exttype = (control >> EXT_TYPE_SHIFT) & EXT_TYPE_MASK;
- data->plllock = (control >> PLL_L_SHIFT) & PLL_L_MASK;
- data->vidmvs = 1 << ((control >> MVS_SHIFT) & MVS_MASK);
- data->vstable = (control >> VST_SHIFT) & VST_MASK;
-}
-
-static int powernow_k8_cpu_init_acpi(struct powernow_k8_data *data)
-{
- struct cpufreq_frequency_table *powernow_table;
- int ret_val = -ENODEV;
- u64 control, status;
-
- if (acpi_processor_register_performance(&data->acpi_data, data->cpu)) {
- dprintk("register performance failed: bad ACPI data\n");
- return -EIO;
- }
-
- /* verify the data contained in the ACPI structures */
- if (data->acpi_data.state_count <= 1) {
- dprintk("No ACPI P-States\n");
- goto err_out;
- }
-
- control = data->acpi_data.control_register.space_id;
- status = data->acpi_data.status_register.space_id;
-
- if ((control != ACPI_ADR_SPACE_FIXED_HARDWARE) ||
- (status != ACPI_ADR_SPACE_FIXED_HARDWARE)) {
- dprintk("Invalid control/status registers (%x - %x)\n",
- control, status);
- goto err_out;
- }
-
- /* fill in data->powernow_table */
- powernow_table = kmalloc((sizeof(struct cpufreq_frequency_table)
- * (data->acpi_data.state_count + 1)), GFP_KERNEL);
- if (!powernow_table) {
- dprintk("powernow_table memory alloc failure\n");
- goto err_out;
- }
-
- /* fill in data */
- data->numps = data->acpi_data.state_count;
- powernow_k8_acpi_pst_values(data, 0);
-
- if (cpu_family == CPU_HW_PSTATE)
- ret_val = fill_powernow_table_pstate(data, powernow_table);
- else
- ret_val = fill_powernow_table_fidvid(data, powernow_table);
- if (ret_val)
- goto err_out_mem;
-
- powernow_table[data->acpi_data.state_count].frequency =
- CPUFREQ_TABLE_END;
- powernow_table[data->acpi_data.state_count].index = 0;
- data->powernow_table = powernow_table;
-
- if (cpumask_first(cpu_core_mask(data->cpu)) == data->cpu)
- print_basics(data);
-
- /* notify BIOS that we exist */
- acpi_processor_notify_smm(THIS_MODULE);
-
- if (!zalloc_cpumask_var(&data->acpi_data.shared_cpu_map, GFP_KERNEL)) {
- printk(KERN_ERR PFX
- "unable to alloc powernow_k8_data cpumask\n");
- ret_val = -ENOMEM;
- goto err_out_mem;
- }
-
- return 0;
-
-err_out_mem:
- kfree(powernow_table);
-
-err_out:
- acpi_processor_unregister_performance(&data->acpi_data, data->cpu);
-
- /* data->acpi_data.state_count informs us at ->exit()
- * whether ACPI was used */
- data->acpi_data.state_count = 0;
-
- return ret_val;
-}
-
-static int fill_powernow_table_pstate(struct powernow_k8_data *data,
- struct cpufreq_frequency_table *powernow_table)
-{
- int i;
- u32 hi = 0, lo = 0;
- rdmsr(MSR_PSTATE_CUR_LIMIT, lo, hi);
- data->max_hw_pstate = (lo & HW_PSTATE_MAX_MASK) >> HW_PSTATE_MAX_SHIFT;
-
- for (i = 0; i < data->acpi_data.state_count; i++) {
- u32 index;
-
- index = data->acpi_data.states[i].control & HW_PSTATE_MASK;
- if (index > data->max_hw_pstate) {
- printk(KERN_ERR PFX "invalid pstate %d - "
- "bad value %d.\n", i, index);
- printk(KERN_ERR PFX "Please report to BIOS "
- "manufacturer\n");
- invalidate_entry(powernow_table, i);
- continue;
- }
- rdmsr(MSR_PSTATE_DEF_BASE + index, lo, hi);
- if (!(hi & HW_PSTATE_VALID_MASK)) {
- dprintk("invalid pstate %d, ignoring\n", index);
- invalidate_entry(powernow_table, i);
- continue;
- }
-
- powernow_table[i].index = index;
-
- /* Frequency may be rounded for these */
- if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
- || boot_cpu_data.x86 == 0x11) {
- powernow_table[i].frequency =
- freq_from_fid_did(lo & 0x3f, (lo >> 6) & 7);
- } else
- powernow_table[i].frequency =
- data->acpi_data.states[i].core_frequency * 1000;
- }
- return 0;
-}
-
-static int fill_powernow_table_fidvid(struct powernow_k8_data *data,
- struct cpufreq_frequency_table *powernow_table)
-{
- int i;
-
- for (i = 0; i < data->acpi_data.state_count; i++) {
- u32 fid;
- u32 vid;
- u32 freq, index;
- u64 status, control;
-
- if (data->exttype) {
- status = data->acpi_data.states[i].status;
- fid = status & EXT_FID_MASK;
- vid = (status >> VID_SHIFT) & EXT_VID_MASK;
- } else {
- control = data->acpi_data.states[i].control;
- fid = control & FID_MASK;
- vid = (control >> VID_SHIFT) & VID_MASK;
- }
-
- dprintk(" %d : fid 0x%x, vid 0x%x\n", i, fid, vid);
-
- index = fid | (vid<<8);
- powernow_table[i].index = index;
-
- freq = find_khz_freq_from_fid(fid);
- powernow_table[i].frequency = freq;
-
- /* verify frequency is OK */
- if ((freq > (MAX_FREQ * 1000)) || (freq < (MIN_FREQ * 1000))) {
- dprintk("invalid freq %u kHz, ignoring\n", freq);
- invalidate_entry(powernow_table, i);
- continue;
- }
-
- /* verify voltage is OK -
- * BIOSs are using "off" to indicate invalid */
- if (vid == VID_OFF) {
- dprintk("invalid vid %u, ignoring\n", vid);
- invalidate_entry(powernow_table, i);
- continue;
- }
-
- if (freq != (data->acpi_data.states[i].core_frequency * 1000)) {
- printk(KERN_INFO PFX "invalid freq entries "
- "%u kHz vs. %u kHz\n", freq,
- (unsigned int)
- (data->acpi_data.states[i].core_frequency
- * 1000));
- invalidate_entry(powernow_table, i);
- continue;
- }
- }
- return 0;
-}
-
-static void powernow_k8_cpu_exit_acpi(struct powernow_k8_data *data)
-{
- if (data->acpi_data.state_count)
- acpi_processor_unregister_performance(&data->acpi_data,
- data->cpu);
- free_cpumask_var(data->acpi_data.shared_cpu_map);
-}
-
-static int get_transition_latency(struct powernow_k8_data *data)
-{
- int max_latency = 0;
- int i;
- for (i = 0; i < data->acpi_data.state_count; i++) {
- int cur_latency = data->acpi_data.states[i].transition_latency
- + data->acpi_data.states[i].bus_master_latency;
- if (cur_latency > max_latency)
- max_latency = cur_latency;
- }
- if (max_latency == 0) {
- /*
- * Fam 11h and later may return 0 as transition latency. This
- * is intended and means "very fast". While cpufreq core and
- * governors currently can handle that gracefully, better set it
- * to 1 to avoid problems in the future.
- */
- if (boot_cpu_data.x86 < 0x11)
- printk(KERN_ERR FW_WARN PFX "Invalid zero transition "
- "latency\n");
- max_latency = 1;
- }
- /* value in usecs, needs to be in nanoseconds */
- return 1000 * max_latency;
-}
-
-/* Take a frequency, and issue the fid/vid transition command */
-static int transition_frequency_fidvid(struct powernow_k8_data *data,
- unsigned int index)
-{
- u32 fid = 0;
- u32 vid = 0;
- int res, i;
- struct cpufreq_freqs freqs;
-
- dprintk("cpu %d transition to index %u\n", smp_processor_id(), index);
-
- /* fid/vid correctness check for k8 */
- /* fid are the lower 8 bits of the index we stored into
- * the cpufreq frequency table in find_psb_table, vid
- * are the upper 8 bits.
- */
- fid = data->powernow_table[index].index & 0xFF;
- vid = (data->powernow_table[index].index & 0xFF00) >> 8;
-
- dprintk("table matched fid 0x%x, giving vid 0x%x\n", fid, vid);
-
- if (query_current_values_with_pending_wait(data))
- return 1;
-
- if ((data->currvid == vid) && (data->currfid == fid)) {
- dprintk("target matches current values (fid 0x%x, vid 0x%x)\n",
- fid, vid);
- return 0;
- }
-
- dprintk("cpu %d, changing to fid 0x%x, vid 0x%x\n",
- smp_processor_id(), fid, vid);
- freqs.old = find_khz_freq_from_fid(data->currfid);
- freqs.new = find_khz_freq_from_fid(fid);
-
- for_each_cpu(i, data->available_cores) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- }
-
- res = transition_fid_vid(data, fid, vid);
- freqs.new = find_khz_freq_from_fid(data->currfid);
-
- for_each_cpu(i, data->available_cores) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
- return res;
-}
-
-/* Take a frequency, and issue the hardware pstate transition command */
-static int transition_frequency_pstate(struct powernow_k8_data *data,
- unsigned int index)
-{
- u32 pstate = 0;
- int res, i;
- struct cpufreq_freqs freqs;
-
- dprintk("cpu %d transition to index %u\n", smp_processor_id(), index);
-
- /* get MSR index for hardware pstate transition */
- pstate = index & HW_PSTATE_MASK;
- if (pstate > data->max_hw_pstate)
- return 0;
- freqs.old = find_khz_freq_from_pstate(data->powernow_table,
- data->currpstate);
- freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
-
- for_each_cpu(i, data->available_cores) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- }
-
- res = transition_pstate(data, pstate);
- freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
-
- for_each_cpu(i, data->available_cores) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
- return res;
-}
-
-/* Driver entry point to switch to the target frequency */
-static int powernowk8_target(struct cpufreq_policy *pol,
- unsigned targfreq, unsigned relation)
-{
- cpumask_var_t oldmask;
- struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
- u32 checkfid;
- u32 checkvid;
- unsigned int newstate;
- int ret = -EIO;
-
- if (!data)
- return -EINVAL;
-
- checkfid = data->currfid;
- checkvid = data->currvid;
-
- /* only run on specific CPU from here on. */
- /* This is poor form: use a workqueue or smp_call_function_single */
- if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
- return -ENOMEM;
-
- cpumask_copy(oldmask, tsk_cpus_allowed(current));
- set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
-
- if (smp_processor_id() != pol->cpu) {
- printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
- goto err_out;
- }
-
- if (pending_bit_stuck()) {
- printk(KERN_ERR PFX "failing targ, change pending bit set\n");
- goto err_out;
- }
-
- dprintk("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
- pol->cpu, targfreq, pol->min, pol->max, relation);
-
- if (query_current_values_with_pending_wait(data))
- goto err_out;
-
- if (cpu_family != CPU_HW_PSTATE) {
- dprintk("targ: curr fid 0x%x, vid 0x%x\n",
- data->currfid, data->currvid);
-
- if ((checkvid != data->currvid) ||
- (checkfid != data->currfid)) {
- printk(KERN_INFO PFX
- "error - out of sync, fix 0x%x 0x%x, "
- "vid 0x%x 0x%x\n",
- checkfid, data->currfid,
- checkvid, data->currvid);
- }
- }
-
- if (cpufreq_frequency_table_target(pol, data->powernow_table,
- targfreq, relation, &newstate))
- goto err_out;
-
- mutex_lock(&fidvid_mutex);
-
- powernow_k8_acpi_pst_values(data, newstate);
-
- if (cpu_family == CPU_HW_PSTATE)
- ret = transition_frequency_pstate(data, newstate);
- else
- ret = transition_frequency_fidvid(data, newstate);
- if (ret) {
- printk(KERN_ERR PFX "transition frequency failed\n");
- ret = 1;
- mutex_unlock(&fidvid_mutex);
- goto err_out;
- }
- mutex_unlock(&fidvid_mutex);
-
- if (cpu_family == CPU_HW_PSTATE)
- pol->cur = find_khz_freq_from_pstate(data->powernow_table,
- newstate);
- else
- pol->cur = find_khz_freq_from_fid(data->currfid);
- ret = 0;
-
-err_out:
- set_cpus_allowed_ptr(current, oldmask);
- free_cpumask_var(oldmask);
- return ret;
-}
-
-/* Driver entry point to verify the policy and range of frequencies */
-static int powernowk8_verify(struct cpufreq_policy *pol)
-{
- struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
-
- if (!data)
- return -EINVAL;
-
- return cpufreq_frequency_table_verify(pol, data->powernow_table);
-}
-
-struct init_on_cpu {
- struct powernow_k8_data *data;
- int rc;
-};
-
-static void __cpuinit powernowk8_cpu_init_on_cpu(void *_init_on_cpu)
-{
- struct init_on_cpu *init_on_cpu = _init_on_cpu;
-
- if (pending_bit_stuck()) {
- printk(KERN_ERR PFX "failing init, change pending bit set\n");
- init_on_cpu->rc = -ENODEV;
- return;
- }
-
- if (query_current_values_with_pending_wait(init_on_cpu->data)) {
- init_on_cpu->rc = -ENODEV;
- return;
- }
-
- if (cpu_family == CPU_OPTERON)
- fidvid_msr_init();
-
- init_on_cpu->rc = 0;
-}
-
-/* per CPU init entry point to the driver */
-static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
-{
- static const char ACPI_PSS_BIOS_BUG_MSG[] =
- KERN_ERR FW_BUG PFX "No compatible ACPI _PSS objects found.\n"
- FW_BUG PFX "Try again with latest BIOS.\n";
- struct powernow_k8_data *data;
- struct init_on_cpu init_on_cpu;
- int rc;
- struct cpuinfo_x86 *c = &cpu_data(pol->cpu);
-
- if (!cpu_online(pol->cpu))
- return -ENODEV;
-
- smp_call_function_single(pol->cpu, check_supported_cpu, &rc, 1);
- if (rc)
- return -ENODEV;
-
- data = kzalloc(sizeof(struct powernow_k8_data), GFP_KERNEL);
- if (!data) {
- printk(KERN_ERR PFX "unable to alloc powernow_k8_data");
- return -ENOMEM;
- }
-
- data->cpu = pol->cpu;
- data->currpstate = HW_PSTATE_INVALID;
-
- if (powernow_k8_cpu_init_acpi(data)) {
- /*
- * Use the PSB BIOS structure. This is only available on
- * an UP version, and is deprecated by AMD.
- */
- if (num_online_cpus() != 1) {
- printk_once(ACPI_PSS_BIOS_BUG_MSG);
- goto err_out;
- }
- if (pol->cpu != 0) {
- printk(KERN_ERR FW_BUG PFX "No ACPI _PSS objects for "
- "CPU other than CPU0. Complain to your BIOS "
- "vendor.\n");
- goto err_out;
- }
- rc = find_psb_table(data);
- if (rc)
- goto err_out;
-
- /* Take a crude guess here.
- * That guess was in microseconds, so multiply with 1000 */
- pol->cpuinfo.transition_latency = (
- ((data->rvo + 8) * data->vstable * VST_UNITS_20US) +
- ((1 << data->irt) * 30)) * 1000;
- } else /* ACPI _PSS objects available */
- pol->cpuinfo.transition_latency = get_transition_latency(data);
-
- /* only run on specific CPU from here on */
- init_on_cpu.data = data;
- smp_call_function_single(data->cpu, powernowk8_cpu_init_on_cpu,
- &init_on_cpu, 1);
- rc = init_on_cpu.rc;
- if (rc != 0)
- goto err_out_exit_acpi;
-
- if (cpu_family == CPU_HW_PSTATE)
- cpumask_copy(pol->cpus, cpumask_of(pol->cpu));
- else
- cpumask_copy(pol->cpus, cpu_core_mask(pol->cpu));
- data->available_cores = pol->cpus;
-
- if (cpu_family == CPU_HW_PSTATE)
- pol->cur = find_khz_freq_from_pstate(data->powernow_table,
- data->currpstate);
- else
- pol->cur = find_khz_freq_from_fid(data->currfid);
- dprintk("policy current frequency %d kHz\n", pol->cur);
-
- /* min/max the cpu is capable of */
- if (cpufreq_frequency_table_cpuinfo(pol, data->powernow_table)) {
- printk(KERN_ERR FW_BUG PFX "invalid powernow_table\n");
- powernow_k8_cpu_exit_acpi(data);
- kfree(data->powernow_table);
- kfree(data);
- return -EINVAL;
- }
-
- /* Check for APERF/MPERF support in hardware */
- if (cpu_has(c, X86_FEATURE_APERFMPERF))
- cpufreq_amd64_driver.getavg = cpufreq_get_measured_perf;
-
- cpufreq_frequency_table_get_attr(data->powernow_table, pol->cpu);
-
- if (cpu_family == CPU_HW_PSTATE)
- dprintk("cpu_init done, current pstate 0x%x\n",
- data->currpstate);
- else
- dprintk("cpu_init done, current fid 0x%x, vid 0x%x\n",
- data->currfid, data->currvid);
-
- per_cpu(powernow_data, pol->cpu) = data;
-
- return 0;
-
-err_out_exit_acpi:
- powernow_k8_cpu_exit_acpi(data);
-
-err_out:
- kfree(data);
- return -ENODEV;
-}
-
-static int __devexit powernowk8_cpu_exit(struct cpufreq_policy *pol)
-{
- struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
-
- if (!data)
- return -EINVAL;
-
- powernow_k8_cpu_exit_acpi(data);
-
- cpufreq_frequency_table_put_attr(pol->cpu);
-
- kfree(data->powernow_table);
- kfree(data);
- per_cpu(powernow_data, pol->cpu) = NULL;
-
- return 0;
-}
-
-static void query_values_on_cpu(void *_err)
-{
- int *err = _err;
- struct powernow_k8_data *data = __this_cpu_read(powernow_data);
-
- *err = query_current_values_with_pending_wait(data);
-}
-
-static unsigned int powernowk8_get(unsigned int cpu)
-{
- struct powernow_k8_data *data = per_cpu(powernow_data, cpu);
- unsigned int khz = 0;
- int err;
-
- if (!data)
- return 0;
-
- smp_call_function_single(cpu, query_values_on_cpu, &err, true);
- if (err)
- goto out;
-
- if (cpu_family == CPU_HW_PSTATE)
- khz = find_khz_freq_from_pstate(data->powernow_table,
- data->currpstate);
- else
- khz = find_khz_freq_from_fid(data->currfid);
-
-
-out:
- return khz;
-}
-
-static void _cpb_toggle_msrs(bool t)
-{
- int cpu;
-
- get_online_cpus();
-
- rdmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
-
- for_each_cpu(cpu, cpu_online_mask) {
- struct msr *reg = per_cpu_ptr(msrs, cpu);
- if (t)
- reg->l &= ~BIT(25);
- else
- reg->l |= BIT(25);
- }
- wrmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
-
- put_online_cpus();
-}
-
-/*
- * Switch on/off core performance boosting.
- *
- * 0=disable
- * 1=enable.
- */
-static void cpb_toggle(bool t)
-{
- if (!cpb_capable)
- return;
-
- if (t && !cpb_enabled) {
- cpb_enabled = true;
- _cpb_toggle_msrs(t);
- printk(KERN_INFO PFX "Core Boosting enabled.\n");
- } else if (!t && cpb_enabled) {
- cpb_enabled = false;
- _cpb_toggle_msrs(t);
- printk(KERN_INFO PFX "Core Boosting disabled.\n");
- }
-}
-
-static ssize_t store_cpb(struct cpufreq_policy *policy, const char *buf,
- size_t count)
-{
- int ret = -EINVAL;
- unsigned long val = 0;
-
- ret = strict_strtoul(buf, 10, &val);
- if (!ret && (val == 0 || val == 1) && cpb_capable)
- cpb_toggle(val);
- else
- return -EINVAL;
-
- return count;
-}
-
-static ssize_t show_cpb(struct cpufreq_policy *policy, char *buf)
-{
- return sprintf(buf, "%u\n", cpb_enabled);
-}
-
-#define define_one_rw(_name) \
-static struct freq_attr _name = \
-__ATTR(_name, 0644, show_##_name, store_##_name)
-
-define_one_rw(cpb);
-
-static struct freq_attr *powernow_k8_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- &cpb,
- NULL,
-};
-
-static struct cpufreq_driver cpufreq_amd64_driver = {
- .verify = powernowk8_verify,
- .target = powernowk8_target,
- .bios_limit = acpi_processor_get_bios_limit,
- .init = powernowk8_cpu_init,
- .exit = __devexit_p(powernowk8_cpu_exit),
- .get = powernowk8_get,
- .name = "powernow-k8",
- .owner = THIS_MODULE,
- .attr = powernow_k8_attr,
-};
-
-/*
- * Clear the boost-disable flag on the CPU_DOWN path so that this cpu
- * cannot block the remaining ones from boosting. On the CPU_UP path we
- * simply keep the boost-disable flag in sync with the current global
- * state.
- */
-static int cpb_notify(struct notifier_block *nb, unsigned long action,
- void *hcpu)
-{
- unsigned cpu = (long)hcpu;
- u32 lo, hi;
-
- switch (action) {
- case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
-
- if (!cpb_enabled) {
- rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
- lo |= BIT(25);
- wrmsr_on_cpu(cpu, MSR_K7_HWCR, lo, hi);
- }
- break;
-
- case CPU_DOWN_PREPARE:
- case CPU_DOWN_PREPARE_FROZEN:
- rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
- lo &= ~BIT(25);
- wrmsr_on_cpu(cpu, MSR_K7_HWCR, lo, hi);
- break;
-
- default:
- break;
- }
-
- return NOTIFY_OK;
-}
-
-static struct notifier_block cpb_nb = {
- .notifier_call = cpb_notify,
-};
-
-/* driver entry point for init */
-static int __cpuinit powernowk8_init(void)
-{
- unsigned int i, supported_cpus = 0, cpu;
- int rv;
-
- for_each_online_cpu(i) {
- int rc;
- smp_call_function_single(i, check_supported_cpu, &rc, 1);
- if (rc == 0)
- supported_cpus++;
- }
-
- if (supported_cpus != num_online_cpus())
- return -ENODEV;
-
- printk(KERN_INFO PFX "Found %d %s (%d cpu cores) (" VERSION ")\n",
- num_online_nodes(), boot_cpu_data.x86_model_id, supported_cpus);
-
- if (boot_cpu_has(X86_FEATURE_CPB)) {
-
- cpb_capable = true;
-
- msrs = msrs_alloc();
- if (!msrs) {
- printk(KERN_ERR "%s: Error allocating msrs!\n", __func__);
- return -ENOMEM;
- }
-
- register_cpu_notifier(&cpb_nb);
-
- rdmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
-
- for_each_cpu(cpu, cpu_online_mask) {
- struct msr *reg = per_cpu_ptr(msrs, cpu);
- cpb_enabled |= !(!!(reg->l & BIT(25)));
- }
-
- printk(KERN_INFO PFX "Core Performance Boosting: %s.\n",
- (cpb_enabled ? "on" : "off"));
- }
-
- rv = cpufreq_register_driver(&cpufreq_amd64_driver);
- if (rv < 0 && boot_cpu_has(X86_FEATURE_CPB)) {
- unregister_cpu_notifier(&cpb_nb);
- msrs_free(msrs);
- msrs = NULL;
- }
- return rv;
-}
-
-/* driver entry point for term */
-static void __exit powernowk8_exit(void)
-{
- dprintk("exit\n");
-
- if (boot_cpu_has(X86_FEATURE_CPB)) {
- msrs_free(msrs);
- msrs = NULL;
-
- unregister_cpu_notifier(&cpb_nb);
- }
-
- cpufreq_unregister_driver(&cpufreq_amd64_driver);
-}
-
-MODULE_AUTHOR("Paul Devriendt <paul.devriendt@amd.com> and "
- "Mark Langsdorf <mark.langsdorf@amd.com>");
-MODULE_DESCRIPTION("AMD Athlon 64 and Opteron processor frequency driver.");
-MODULE_LICENSE("GPL");
-
-late_initcall(powernowk8_init);
-module_exit(powernowk8_exit);
+++ /dev/null
-/*
- * (c) 2003-2006 Advanced Micro Devices, Inc.
- * Your use of this code is subject to the terms and conditions of the
- * GNU general public license version 2. See "COPYING" or
- * http://www.gnu.org/licenses/gpl.html
- */
-
-enum pstate {
- HW_PSTATE_INVALID = 0xff,
- HW_PSTATE_0 = 0,
- HW_PSTATE_1 = 1,
- HW_PSTATE_2 = 2,
- HW_PSTATE_3 = 3,
- HW_PSTATE_4 = 4,
- HW_PSTATE_5 = 5,
- HW_PSTATE_6 = 6,
- HW_PSTATE_7 = 7,
-};
-
-struct powernow_k8_data {
- unsigned int cpu;
-
- u32 numps; /* number of p-states */
- u32 batps; /* number of p-states supported on battery */
- u32 max_hw_pstate; /* maximum legal hardware pstate */
-
- /* these values are constant when the PSB is used to determine
- * vid/fid pairings, but are modified during the ->target() call
- * when ACPI is used */
- u32 rvo; /* ramp voltage offset */
- u32 irt; /* isochronous relief time */
- u32 vidmvs; /* usable value calculated from mvs */
- u32 vstable; /* voltage stabilization time, units 20 us */
- u32 plllock; /* pll lock time, units 1 us */
- u32 exttype; /* extended interface = 1 */
-
- /* keep track of the current fid / vid or pstate */
- u32 currvid;
- u32 currfid;
- enum pstate currpstate;
-
- /* the powernow_table includes all frequency and vid/fid pairings:
- * fid are the lower 8 bits of the index, vid are the upper 8 bits.
- * frequency is in kHz */
- struct cpufreq_frequency_table *powernow_table;
-
- /* the acpi table needs to be kept. it's only available if ACPI was
- * used to determine valid frequency/vid/fid states */
- struct acpi_processor_performance acpi_data;
-
- /* we need to keep track of associated cores, but let cpufreq
- * handle hotplug events - so just point at cpufreq pol->cpus
- * structure */
- struct cpumask *available_cores;
-};
-
-/* processor's cpuid instruction support */
-#define CPUID_PROCESSOR_SIGNATURE 1 /* function 1 */
-#define CPUID_XFAM 0x0ff00000 /* extended family */
-#define CPUID_XFAM_K8 0
-#define CPUID_XMOD 0x000f0000 /* extended model */
-#define CPUID_XMOD_REV_MASK 0x000c0000
-#define CPUID_XFAM_10H 0x00100000 /* family 0x10 */
-#define CPUID_USE_XFAM_XMOD 0x00000f00
-#define CPUID_GET_MAX_CAPABILITIES 0x80000000
-#define CPUID_FREQ_VOLT_CAPABILITIES 0x80000007
-#define P_STATE_TRANSITION_CAPABLE 6
-
-/* Model Specific Registers for p-state transitions. MSRs are 64-bit. For */
-/* writes (wrmsr - opcode 0f 30), the register number is placed in ecx, and */
-/* the value to write is placed in edx:eax. For reads (rdmsr - opcode 0f 32), */
-/* the register number is placed in ecx, and the data is returned in edx:eax. */
-
-#define MSR_FIDVID_CTL 0xc0010041
-#define MSR_FIDVID_STATUS 0xc0010042
-
-/* Field definitions within the FID VID Low Control MSR : */
-#define MSR_C_LO_INIT_FID_VID 0x00010000
-#define MSR_C_LO_NEW_VID 0x00003f00
-#define MSR_C_LO_NEW_FID 0x0000003f
-#define MSR_C_LO_VID_SHIFT 8
-
-/* Field definitions within the FID VID High Control MSR : */
-#define MSR_C_HI_STP_GNT_TO 0x000fffff
-
-/* Field definitions within the FID VID Low Status MSR : */
-#define MSR_S_LO_CHANGE_PENDING 0x80000000 /* cleared when completed */
-#define MSR_S_LO_MAX_RAMP_VID 0x3f000000
-#define MSR_S_LO_MAX_FID 0x003f0000
-#define MSR_S_LO_START_FID 0x00003f00
-#define MSR_S_LO_CURRENT_FID 0x0000003f
-
-/* Field definitions within the FID VID High Status MSR : */
-#define MSR_S_HI_MIN_WORKING_VID 0x3f000000
-#define MSR_S_HI_MAX_WORKING_VID 0x003f0000
-#define MSR_S_HI_START_VID 0x00003f00
-#define MSR_S_HI_CURRENT_VID 0x0000003f
-#define MSR_C_HI_STP_GNT_BENIGN 0x00000001
-
-
-/* Hardware Pstate _PSS and MSR definitions */
-#define USE_HW_PSTATE 0x00000080
-#define HW_PSTATE_MASK 0x00000007
-#define HW_PSTATE_VALID_MASK 0x80000000
-#define HW_PSTATE_MAX_MASK 0x000000f0
-#define HW_PSTATE_MAX_SHIFT 4
-#define MSR_PSTATE_DEF_BASE 0xc0010064 /* base of Pstate MSRs */
-#define MSR_PSTATE_STATUS 0xc0010063 /* Pstate Status MSR */
-#define MSR_PSTATE_CTRL 0xc0010062 /* Pstate control MSR */
-#define MSR_PSTATE_CUR_LIMIT 0xc0010061 /* pstate current limit MSR */
-
-/* define the two driver architectures */
-#define CPU_OPTERON 0
-#define CPU_HW_PSTATE 1
-
-
-/*
- * There are restrictions frequencies have to follow:
- * - only 1 entry in the low fid table ( <=1.4GHz )
- * - lowest entry in the high fid table must be >= 2 * the entry in the
- * low fid table
- * - lowest entry in the high fid table must be a <= 200MHz + 2 * the entry
- * in the low fid table
- * - the parts can only step at <= 200 MHz intervals, odd fid values are
- * supported in revision G and later revisions.
- * - lowest frequency must be >= interprocessor hypertransport link speed
- * (only applies to MP systems obviously)
- */
-
-/* fids (frequency identifiers) are arranged in 2 tables - lo and hi */
-#define LO_FID_TABLE_TOP 7 /* fid values marking the boundary */
-#define HI_FID_TABLE_BOTTOM 8 /* between the low and high tables */
-
-#define LO_VCOFREQ_TABLE_TOP 1400 /* corresponding vco frequency values */
-#define HI_VCOFREQ_TABLE_BOTTOM 1600
-
-#define MIN_FREQ_RESOLUTION 200 /* fids jump by 2 matching freq jumps by 200 */
-
-#define MAX_FID 0x2a /* Spec only gives FID values as far as 5 GHz */
-#define LEAST_VID 0x3e /* Lowest (numerically highest) useful vid value */
-
-#define MIN_FREQ 800 /* Min and max freqs, per spec */
-#define MAX_FREQ 5000
-
-#define INVALID_FID_MASK 0xffffffc0 /* not a valid fid if these bits are set */
-#define INVALID_VID_MASK 0xffffffc0 /* not a valid vid if these bits are set */
-
-#define VID_OFF 0x3f
-
-#define STOP_GRANT_5NS 1 /* min poss memory access latency for voltage change */
-
-#define PLL_LOCK_CONVERSION (1000/5) /* ms to ns, then divide by clock period */
-
-#define MAXIMUM_VID_STEPS 1 /* Current cpus only allow a single step of 25mV */
-#define VST_UNITS_20US 20 /* Voltage Stabilization Time is in units of 20us */
-
-/*
- * Most values of interest are encoded in a single field of the _PSS
- * entries: the "control" value.
- */
-
-#define IRT_SHIFT 30
-#define RVO_SHIFT 28
-#define EXT_TYPE_SHIFT 27
-#define PLL_L_SHIFT 20
-#define MVS_SHIFT 18
-#define VST_SHIFT 11
-#define VID_SHIFT 6
-#define IRT_MASK 3
-#define RVO_MASK 3
-#define EXT_TYPE_MASK 1
-#define PLL_L_MASK 0x7f
-#define MVS_MASK 3
-#define VST_MASK 0x7f
-#define VID_MASK 0x1f
-#define FID_MASK 0x1f
-#define EXT_VID_MASK 0x3f
-#define EXT_FID_MASK 0x3f
-
-
-/*
- * Version 1.4 of the PSB table. This table is constructed by BIOS and is
- * to tell the OS's power management driver which VIDs and FIDs are
- * supported by this particular processor.
- * If the data in the PSB / PST is wrong, then this driver will program the
- * wrong values into hardware, which is very likely to lead to a crash.
- */
-
-#define PSB_ID_STRING "AMDK7PNOW!"
-#define PSB_ID_STRING_LEN 10
-
-#define PSB_VERSION_1_4 0x14
-
-struct psb_s {
- u8 signature[10];
- u8 tableversion;
- u8 flags1;
- u16 vstable;
- u8 flags2;
- u8 num_tables;
- u32 cpuid;
- u8 plllocktime;
- u8 maxfid;
- u8 maxvid;
- u8 numps;
-};
-
-/* Pairs of fid/vid values are appended to the version 1.4 PSB table. */
-struct pst_s {
- u8 fid;
- u8 vid;
-};
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "powernow-k8", msg)
-
-static int core_voltage_pre_transition(struct powernow_k8_data *data,
- u32 reqvid, u32 regfid);
-static int core_voltage_post_transition(struct powernow_k8_data *data, u32 reqvid);
-static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid);
-
-static void powernow_k8_acpi_pst_values(struct powernow_k8_data *data, unsigned int index);
-
-static int fill_powernow_table_pstate(struct powernow_k8_data *data, struct cpufreq_frequency_table *powernow_table);
-static int fill_powernow_table_fidvid(struct powernow_k8_data *data, struct cpufreq_frequency_table *powernow_table);
+++ /dev/null
-/*
- * sc520_freq.c: cpufreq driver for the AMD Elan sc520
- *
- * Copyright (C) 2005 Sean Young <sean@mess.org>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * Based on elanfreq.c
- *
- * 2005-03-30: - initial revision
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-
-#include <linux/delay.h>
-#include <linux/cpufreq.h>
-#include <linux/timex.h>
-#include <linux/io.h>
-
-#include <asm/msr.h>
-
-#define MMCR_BASE 0xfffef000 /* The default base address */
-#define OFFS_CPUCTL 0x2 /* CPU Control Register */
-
-static __u8 __iomem *cpuctl;
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "sc520_freq", msg)
-#define PFX "sc520_freq: "
-
-static struct cpufreq_frequency_table sc520_freq_table[] = {
- {0x01, 100000},
- {0x02, 133000},
- {0, CPUFREQ_TABLE_END},
-};
-
-static unsigned int sc520_freq_get_cpu_frequency(unsigned int cpu)
-{
- u8 clockspeed_reg = *cpuctl;
-
- switch (clockspeed_reg & 0x03) {
- default:
- printk(KERN_ERR PFX "error: cpuctl register has unexpected "
- "value %02x\n", clockspeed_reg);
- case 0x01:
- return 100000;
- case 0x02:
- return 133000;
- }
-}
-
-static void sc520_freq_set_cpu_state(unsigned int state)
-{
-
- struct cpufreq_freqs freqs;
- u8 clockspeed_reg;
-
- freqs.old = sc520_freq_get_cpu_frequency(0);
- freqs.new = sc520_freq_table[state].frequency;
- freqs.cpu = 0; /* AMD Elan is UP */
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
- dprintk("attempting to set frequency to %i kHz\n",
- sc520_freq_table[state].frequency);
-
- local_irq_disable();
-
- clockspeed_reg = *cpuctl & ~0x03;
- *cpuctl = clockspeed_reg | sc520_freq_table[state].index;
-
- local_irq_enable();
-
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-};
-
-static int sc520_freq_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &sc520_freq_table[0]);
-}
-
-static int sc520_freq_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = 0;
-
- if (cpufreq_frequency_table_target(policy, sc520_freq_table,
- target_freq, relation, &newstate))
- return -EINVAL;
-
- sc520_freq_set_cpu_state(newstate);
-
- return 0;
-}
-
-
-/*
- * Module init and exit code
- */
-
-static int sc520_freq_cpu_init(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- int result;
-
- /* capability check */
- if (c->x86_vendor != X86_VENDOR_AMD ||
- c->x86 != 4 || c->x86_model != 9)
- return -ENODEV;
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.transition_latency = 1000000; /* 1ms */
- policy->cur = sc520_freq_get_cpu_frequency(0);
-
- result = cpufreq_frequency_table_cpuinfo(policy, sc520_freq_table);
- if (result)
- return result;
-
- cpufreq_frequency_table_get_attr(sc520_freq_table, policy->cpu);
-
- return 0;
-}
-
-
-static int sc520_freq_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-
-static struct freq_attr *sc520_freq_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-
-static struct cpufreq_driver sc520_freq_driver = {
- .get = sc520_freq_get_cpu_frequency,
- .verify = sc520_freq_verify,
- .target = sc520_freq_target,
- .init = sc520_freq_cpu_init,
- .exit = sc520_freq_cpu_exit,
- .name = "sc520_freq",
- .owner = THIS_MODULE,
- .attr = sc520_freq_attr,
-};
-
-
-static int __init sc520_freq_init(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- int err;
-
- /* Test if we have the right hardware */
- if (c->x86_vendor != X86_VENDOR_AMD ||
- c->x86 != 4 || c->x86_model != 9) {
- dprintk("no Elan SC520 processor found!\n");
- return -ENODEV;
- }
- cpuctl = ioremap((unsigned long)(MMCR_BASE + OFFS_CPUCTL), 1);
- if (!cpuctl) {
- printk(KERN_ERR "sc520_freq: error: failed to remap memory\n");
- return -ENOMEM;
- }
-
- err = cpufreq_register_driver(&sc520_freq_driver);
- if (err)
- iounmap(cpuctl);
-
- return err;
-}
-
-
-static void __exit sc520_freq_exit(void)
-{
- cpufreq_unregister_driver(&sc520_freq_driver);
- iounmap(cpuctl);
-}
-
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Sean Young <sean@mess.org>");
-MODULE_DESCRIPTION("cpufreq driver for AMD's Elan sc520 CPU");
-
-module_init(sc520_freq_init);
-module_exit(sc520_freq_exit);
-
+++ /dev/null
-/*
- * cpufreq driver for Enhanced SpeedStep, as found in Intel's Pentium
- * M (part of the Centrino chipset).
- *
- * Since the original Pentium M, most new Intel CPUs support Enhanced
- * SpeedStep.
- *
- * Despite the "SpeedStep" in the name, this is almost entirely unlike
- * traditional SpeedStep.
- *
- * Modelled on speedstep.c
- *
- * Copyright (C) 2003 Jeremy Fitzhardinge <jeremy@goop.org>
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/sched.h> /* current */
-#include <linux/delay.h>
-#include <linux/compiler.h>
-#include <linux/gfp.h>
-
-#include <asm/msr.h>
-#include <asm/processor.h>
-#include <asm/cpufeature.h>
-
-#define PFX "speedstep-centrino: "
-#define MAINTAINER "cpufreq@vger.kernel.org"
-
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "speedstep-centrino", msg)
-
-#define INTEL_MSR_RANGE (0xffff)
-
-struct cpu_id
-{
- __u8 x86; /* CPU family */
- __u8 x86_model; /* model */
- __u8 x86_mask; /* stepping */
-};
-
-enum {
- CPU_BANIAS,
- CPU_DOTHAN_A1,
- CPU_DOTHAN_A2,
- CPU_DOTHAN_B0,
- CPU_MP4HT_D0,
- CPU_MP4HT_E0,
-};
-
-static const struct cpu_id cpu_ids[] = {
- [CPU_BANIAS] = { 6, 9, 5 },
- [CPU_DOTHAN_A1] = { 6, 13, 1 },
- [CPU_DOTHAN_A2] = { 6, 13, 2 },
- [CPU_DOTHAN_B0] = { 6, 13, 6 },
- [CPU_MP4HT_D0] = {15, 3, 4 },
- [CPU_MP4HT_E0] = {15, 4, 1 },
-};
-#define N_IDS ARRAY_SIZE(cpu_ids)
-
-struct cpu_model
-{
- const struct cpu_id *cpu_id;
- const char *model_name;
- unsigned max_freq; /* max clock in kHz */
-
- struct cpufreq_frequency_table *op_points; /* clock/voltage pairs */
-};
-static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c,
- const struct cpu_id *x);
-
-/* Operating points for current CPU */
-static DEFINE_PER_CPU(struct cpu_model *, centrino_model);
-static DEFINE_PER_CPU(const struct cpu_id *, centrino_cpu);
-
-static struct cpufreq_driver centrino_driver;
-
-#ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE
-
-/* Computes the correct form for IA32_PERF_CTL MSR for a particular
- frequency/voltage operating point; frequency in MHz, volts in mV.
- This is stored as "index" in the structure. */
-#define OP(mhz, mv) \
- { \
- .frequency = (mhz) * 1000, \
- .index = (((mhz)/100) << 8) | ((mv - 700) / 16) \
- }
-
-/*
- * These voltage tables were derived from the Intel Pentium M
- * datasheet, document 25261202.pdf, Table 5. I have verified they
- * are consistent with my IBM ThinkPad X31, which has a 1.3GHz Pentium
- * M.
- */
-
-/* Ultra Low Voltage Intel Pentium M processor 900MHz (Banias) */
-static struct cpufreq_frequency_table banias_900[] =
-{
- OP(600, 844),
- OP(800, 988),
- OP(900, 1004),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Ultra Low Voltage Intel Pentium M processor 1000MHz (Banias) */
-static struct cpufreq_frequency_table banias_1000[] =
-{
- OP(600, 844),
- OP(800, 972),
- OP(900, 988),
- OP(1000, 1004),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Low Voltage Intel Pentium M processor 1.10GHz (Banias) */
-static struct cpufreq_frequency_table banias_1100[] =
-{
- OP( 600, 956),
- OP( 800, 1020),
- OP( 900, 1100),
- OP(1000, 1164),
- OP(1100, 1180),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-
-/* Low Voltage Intel Pentium M processor 1.20GHz (Banias) */
-static struct cpufreq_frequency_table banias_1200[] =
-{
- OP( 600, 956),
- OP( 800, 1004),
- OP( 900, 1020),
- OP(1000, 1100),
- OP(1100, 1164),
- OP(1200, 1180),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Intel Pentium M processor 1.30GHz (Banias) */
-static struct cpufreq_frequency_table banias_1300[] =
-{
- OP( 600, 956),
- OP( 800, 1260),
- OP(1000, 1292),
- OP(1200, 1356),
- OP(1300, 1388),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Intel Pentium M processor 1.40GHz (Banias) */
-static struct cpufreq_frequency_table banias_1400[] =
-{
- OP( 600, 956),
- OP( 800, 1180),
- OP(1000, 1308),
- OP(1200, 1436),
- OP(1400, 1484),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Intel Pentium M processor 1.50GHz (Banias) */
-static struct cpufreq_frequency_table banias_1500[] =
-{
- OP( 600, 956),
- OP( 800, 1116),
- OP(1000, 1228),
- OP(1200, 1356),
- OP(1400, 1452),
- OP(1500, 1484),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Intel Pentium M processor 1.60GHz (Banias) */
-static struct cpufreq_frequency_table banias_1600[] =
-{
- OP( 600, 956),
- OP( 800, 1036),
- OP(1000, 1164),
- OP(1200, 1276),
- OP(1400, 1420),
- OP(1600, 1484),
- { .frequency = CPUFREQ_TABLE_END }
-};
-
-/* Intel Pentium M processor 1.70GHz (Banias) */
-static struct cpufreq_frequency_table banias_1700[] =
-{
- OP( 600, 956),
- OP( 800, 1004),
- OP(1000, 1116),
- OP(1200, 1228),
- OP(1400, 1308),
- OP(1700, 1484),
- { .frequency = CPUFREQ_TABLE_END }
-};
-#undef OP
-
-#define _BANIAS(cpuid, max, name) \
-{ .cpu_id = cpuid, \
- .model_name = "Intel(R) Pentium(R) M processor " name "MHz", \
- .max_freq = (max)*1000, \
- .op_points = banias_##max, \
-}
-#define BANIAS(max) _BANIAS(&cpu_ids[CPU_BANIAS], max, #max)
-
-/* CPU models, their operating frequency range, and freq/voltage
- operating points */
-static struct cpu_model models[] =
-{
- _BANIAS(&cpu_ids[CPU_BANIAS], 900, " 900"),
- BANIAS(1000),
- BANIAS(1100),
- BANIAS(1200),
- BANIAS(1300),
- BANIAS(1400),
- BANIAS(1500),
- BANIAS(1600),
- BANIAS(1700),
-
- /* NULL model_name is a wildcard */
- { &cpu_ids[CPU_DOTHAN_A1], NULL, 0, NULL },
- { &cpu_ids[CPU_DOTHAN_A2], NULL, 0, NULL },
- { &cpu_ids[CPU_DOTHAN_B0], NULL, 0, NULL },
- { &cpu_ids[CPU_MP4HT_D0], NULL, 0, NULL },
- { &cpu_ids[CPU_MP4HT_E0], NULL, 0, NULL },
-
- { NULL, }
-};
-#undef _BANIAS
-#undef BANIAS
-
-static int centrino_cpu_init_table(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *cpu = &cpu_data(policy->cpu);
- struct cpu_model *model;
-
- for(model = models; model->cpu_id != NULL; model++)
- if (centrino_verify_cpu_id(cpu, model->cpu_id) &&
- (model->model_name == NULL ||
- strcmp(cpu->x86_model_id, model->model_name) == 0))
- break;
-
- if (model->cpu_id == NULL) {
- /* No match at all */
- dprintk("no support for CPU model \"%s\": "
- "send /proc/cpuinfo to " MAINTAINER "\n",
- cpu->x86_model_id);
- return -ENOENT;
- }
-
- if (model->op_points == NULL) {
- /* Matched a non-match */
- dprintk("no table support for CPU model \"%s\"\n",
- cpu->x86_model_id);
- dprintk("try using the acpi-cpufreq driver\n");
- return -ENOENT;
- }
-
- per_cpu(centrino_model, policy->cpu) = model;
-
- dprintk("found \"%s\": max frequency: %dkHz\n",
- model->model_name, model->max_freq);
-
- return 0;
-}
-
-#else
-static inline int centrino_cpu_init_table(struct cpufreq_policy *policy)
-{
- return -ENODEV;
-}
-#endif /* CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE */
-
-static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c,
- const struct cpu_id *x)
-{
- if ((c->x86 == x->x86) &&
- (c->x86_model == x->x86_model) &&
- (c->x86_mask == x->x86_mask))
- return 1;
- return 0;
-}
-
-/* To be called only after centrino_model is initialized */
-static unsigned extract_clock(unsigned msr, unsigned int cpu, int failsafe)
-{
- int i;
-
- /*
- * Extract clock in kHz from PERF_CTL value
- * for centrino, as some DSDTs are buggy.
- * Ideally, this can be done using the acpi_data structure.
- */
- if ((per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_BANIAS]) ||
- (per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_DOTHAN_A1]) ||
- (per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_DOTHAN_B0])) {
- msr = (msr >> 8) & 0xff;
- return msr * 100000;
- }
-
- if ((!per_cpu(centrino_model, cpu)) ||
- (!per_cpu(centrino_model, cpu)->op_points))
- return 0;
-
- msr &= 0xffff;
- for (i = 0;
- per_cpu(centrino_model, cpu)->op_points[i].frequency
- != CPUFREQ_TABLE_END;
- i++) {
- if (msr == per_cpu(centrino_model, cpu)->op_points[i].index)
- return per_cpu(centrino_model, cpu)->
- op_points[i].frequency;
- }
- if (failsafe)
- return per_cpu(centrino_model, cpu)->op_points[i-1].frequency;
- else
- return 0;
-}
-
-/* Return the current CPU frequency in kHz */
-static unsigned int get_cur_freq(unsigned int cpu)
-{
- unsigned l, h;
- unsigned clock_freq;
-
- rdmsr_on_cpu(cpu, MSR_IA32_PERF_STATUS, &l, &h);
- clock_freq = extract_clock(l, cpu, 0);
-
- if (unlikely(clock_freq == 0)) {
- /*
- * On some CPUs, we can see transient MSR values (which are
- * not present in _PSS), while CPU is doing some automatic
- * P-state transition (like TM2). Get the last freq set
- * in PERF_CTL.
- */
- rdmsr_on_cpu(cpu, MSR_IA32_PERF_CTL, &l, &h);
- clock_freq = extract_clock(l, cpu, 1);
- }
- return clock_freq;
-}
-
-
-static int centrino_cpu_init(struct cpufreq_policy *policy)
-{
- struct cpuinfo_x86 *cpu = &cpu_data(policy->cpu);
- unsigned freq;
- unsigned l, h;
- int ret;
- int i;
-
- /* Only Intel makes Enhanced Speedstep-capable CPUs */
- if (cpu->x86_vendor != X86_VENDOR_INTEL ||
- !cpu_has(cpu, X86_FEATURE_EST))
- return -ENODEV;
-
- if (cpu_has(cpu, X86_FEATURE_CONSTANT_TSC))
- centrino_driver.flags |= CPUFREQ_CONST_LOOPS;
-
- if (policy->cpu != 0)
- return -ENODEV;
-
- for (i = 0; i < N_IDS; i++)
- if (centrino_verify_cpu_id(cpu, &cpu_ids[i]))
- break;
-
- if (i != N_IDS)
- per_cpu(centrino_cpu, policy->cpu) = &cpu_ids[i];
-
- if (!per_cpu(centrino_cpu, policy->cpu)) {
- dprintk("found unsupported CPU with "
- "Enhanced SpeedStep: send /proc/cpuinfo to "
- MAINTAINER "\n");
- return -ENODEV;
- }
-
- if (centrino_cpu_init_table(policy)) {
- return -ENODEV;
- }
-
- /* Check to see if Enhanced SpeedStep is enabled, and try to
- enable it if not. */
- rdmsr(MSR_IA32_MISC_ENABLE, l, h);
-
- if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
- l |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
- dprintk("trying to enable Enhanced SpeedStep (%x)\n", l);
- wrmsr(MSR_IA32_MISC_ENABLE, l, h);
-
- /* check to see if it stuck */
- rdmsr(MSR_IA32_MISC_ENABLE, l, h);
- if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
- printk(KERN_INFO PFX
- "couldn't enable Enhanced SpeedStep\n");
- return -ENODEV;
- }
- }
-
- freq = get_cur_freq(policy->cpu);
- policy->cpuinfo.transition_latency = 10000;
- /* 10uS transition latency */
- policy->cur = freq;
-
- dprintk("centrino_cpu_init: cur=%dkHz\n", policy->cur);
-
- ret = cpufreq_frequency_table_cpuinfo(policy,
- per_cpu(centrino_model, policy->cpu)->op_points);
- if (ret)
- return (ret);
-
- cpufreq_frequency_table_get_attr(
- per_cpu(centrino_model, policy->cpu)->op_points, policy->cpu);
-
- return 0;
-}
-
-static int centrino_cpu_exit(struct cpufreq_policy *policy)
-{
- unsigned int cpu = policy->cpu;
-
- if (!per_cpu(centrino_model, cpu))
- return -ENODEV;
-
- cpufreq_frequency_table_put_attr(cpu);
-
- per_cpu(centrino_model, cpu) = NULL;
-
- return 0;
-}
-
-/**
- * centrino_verify - verifies a new CPUFreq policy
- * @policy: new policy
- *
- * Limit must be within this model's frequency range at least one
- * border included.
- */
-static int centrino_verify (struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy,
- per_cpu(centrino_model, policy->cpu)->op_points);
-}
-
-/**
- * centrino_setpolicy - set a new CPUFreq policy
- * @policy: new policy
- * @target_freq: the target frequency
- * @relation: how that frequency relates to achieved frequency
- * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
- *
- * Sets a new CPUFreq policy.
- */
-static int centrino_target (struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = 0;
- unsigned int msr, oldmsr = 0, h = 0, cpu = policy->cpu;
- struct cpufreq_freqs freqs;
- int retval = 0;
- unsigned int j, k, first_cpu, tmp;
- cpumask_var_t covered_cpus;
-
- if (unlikely(!zalloc_cpumask_var(&covered_cpus, GFP_KERNEL)))
- return -ENOMEM;
-
- if (unlikely(per_cpu(centrino_model, cpu) == NULL)) {
- retval = -ENODEV;
- goto out;
- }
-
- if (unlikely(cpufreq_frequency_table_target(policy,
- per_cpu(centrino_model, cpu)->op_points,
- target_freq,
- relation,
- &newstate))) {
- retval = -EINVAL;
- goto out;
- }
-
- first_cpu = 1;
- for_each_cpu(j, policy->cpus) {
- int good_cpu;
-
- /* cpufreq holds the hotplug lock, so we are safe here */
- if (!cpu_online(j))
- continue;
-
- /*
- * Support for SMP systems.
- * Make sure we are running on CPU that wants to change freq
- */
- if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
- good_cpu = cpumask_any_and(policy->cpus,
- cpu_online_mask);
- else
- good_cpu = j;
-
- if (good_cpu >= nr_cpu_ids) {
- dprintk("couldn't limit to CPUs in this domain\n");
- retval = -EAGAIN;
- if (first_cpu) {
- /* We haven't started the transition yet. */
- goto out;
- }
- break;
- }
-
- msr = per_cpu(centrino_model, cpu)->op_points[newstate].index;
-
- if (first_cpu) {
- rdmsr_on_cpu(good_cpu, MSR_IA32_PERF_CTL, &oldmsr, &h);
- if (msr == (oldmsr & 0xffff)) {
- dprintk("no change needed - msr was and needs "
- "to be %x\n", oldmsr);
- retval = 0;
- goto out;
- }
-
- freqs.old = extract_clock(oldmsr, cpu, 0);
- freqs.new = extract_clock(msr, cpu, 0);
-
- dprintk("target=%dkHz old=%d new=%d msr=%04x\n",
- target_freq, freqs.old, freqs.new, msr);
-
- for_each_cpu(k, policy->cpus) {
- if (!cpu_online(k))
- continue;
- freqs.cpu = k;
- cpufreq_notify_transition(&freqs,
- CPUFREQ_PRECHANGE);
- }
-
- first_cpu = 0;
- /* all but 16 LSB are reserved, treat them with care */
- oldmsr &= ~0xffff;
- msr &= 0xffff;
- oldmsr |= msr;
- }
-
- wrmsr_on_cpu(good_cpu, MSR_IA32_PERF_CTL, oldmsr, h);
- if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
- break;
-
- cpumask_set_cpu(j, covered_cpus);
- }
-
- for_each_cpu(k, policy->cpus) {
- if (!cpu_online(k))
- continue;
- freqs.cpu = k;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
-
- if (unlikely(retval)) {
- /*
- * We have failed halfway through the frequency change.
- * We have sent callbacks to policy->cpus and
- * MSRs have already been written on coverd_cpus.
- * Best effort undo..
- */
-
- for_each_cpu(j, covered_cpus)
- wrmsr_on_cpu(j, MSR_IA32_PERF_CTL, oldmsr, h);
-
- tmp = freqs.new;
- freqs.new = freqs.old;
- freqs.old = tmp;
- for_each_cpu(j, policy->cpus) {
- if (!cpu_online(j))
- continue;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
- }
- retval = 0;
-
-out:
- free_cpumask_var(covered_cpus);
- return retval;
-}
-
-static struct freq_attr* centrino_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver centrino_driver = {
- .name = "centrino", /* should be speedstep-centrino,
- but there's a 16 char limit */
- .init = centrino_cpu_init,
- .exit = centrino_cpu_exit,
- .verify = centrino_verify,
- .target = centrino_target,
- .get = get_cur_freq,
- .attr = centrino_attr,
- .owner = THIS_MODULE,
-};
-
-
-/**
- * centrino_init - initializes the Enhanced SpeedStep CPUFreq driver
- *
- * Initializes the Enhanced SpeedStep support. Returns -ENODEV on
- * unsupported devices, -ENOENT if there's no voltage table for this
- * particular CPU model, -EINVAL on problems during initiatization,
- * and zero on success.
- *
- * This is quite picky. Not only does the CPU have to advertise the
- * "est" flag in the cpuid capability flags, we look for a specific
- * CPU model and stepping, and we need to have the exact model name in
- * our voltage tables. That is, be paranoid about not releasing
- * someone's valuable magic smoke.
- */
-static int __init centrino_init(void)
-{
- struct cpuinfo_x86 *cpu = &cpu_data(0);
-
- if (!cpu_has(cpu, X86_FEATURE_EST))
- return -ENODEV;
-
- return cpufreq_register_driver(¢rino_driver);
-}
-
-static void __exit centrino_exit(void)
-{
- cpufreq_unregister_driver(¢rino_driver);
-}
-
-MODULE_AUTHOR ("Jeremy Fitzhardinge <jeremy@goop.org>");
-MODULE_DESCRIPTION ("Enhanced SpeedStep driver for Intel Pentium M processors.");
-MODULE_LICENSE ("GPL");
-
-late_initcall(centrino_init);
-module_exit(centrino_exit);
+++ /dev/null
-/*
- * (C) 2001 Dave Jones, Arjan van de ven.
- * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- * Based upon reverse engineered information, and on Intel documentation
- * for chipsets ICH2-M and ICH3-M.
- *
- * Many thanks to Ducrot Bruno for finding and fixing the last
- * "missing link" for ICH2-M/ICH3-M support, and to Thomas Winkler
- * for extensive testing.
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-
-/*********************************************************************
- * SPEEDSTEP - DEFINITIONS *
- *********************************************************************/
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/pci.h>
-#include <linux/sched.h>
-
-#include "speedstep-lib.h"
-
-
-/* speedstep_chipset:
- * It is necessary to know which chipset is used. As accesses to
- * this device occur at various places in this module, we need a
- * static struct pci_dev * pointing to that device.
- */
-static struct pci_dev *speedstep_chipset_dev;
-
-
-/* speedstep_processor
- */
-static enum speedstep_processor speedstep_processor;
-
-static u32 pmbase;
-
-/*
- * There are only two frequency states for each processor. Values
- * are in kHz for the time being.
- */
-static struct cpufreq_frequency_table speedstep_freqs[] = {
- {SPEEDSTEP_HIGH, 0},
- {SPEEDSTEP_LOW, 0},
- {0, CPUFREQ_TABLE_END},
-};
-
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "speedstep-ich", msg)
-
-
-/**
- * speedstep_find_register - read the PMBASE address
- *
- * Returns: -ENODEV if no register could be found
- */
-static int speedstep_find_register(void)
-{
- if (!speedstep_chipset_dev)
- return -ENODEV;
-
- /* get PMBASE */
- pci_read_config_dword(speedstep_chipset_dev, 0x40, &pmbase);
- if (!(pmbase & 0x01)) {
- printk(KERN_ERR "speedstep-ich: could not find speedstep register\n");
- return -ENODEV;
- }
-
- pmbase &= 0xFFFFFFFE;
- if (!pmbase) {
- printk(KERN_ERR "speedstep-ich: could not find speedstep register\n");
- return -ENODEV;
- }
-
- dprintk("pmbase is 0x%x\n", pmbase);
- return 0;
-}
-
-/**
- * speedstep_set_state - set the SpeedStep state
- * @state: new processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
- *
- * Tries to change the SpeedStep state. Can be called from
- * smp_call_function_single.
- */
-static void speedstep_set_state(unsigned int state)
-{
- u8 pm2_blk;
- u8 value;
- unsigned long flags;
-
- if (state > 0x1)
- return;
-
- /* Disable IRQs */
- local_irq_save(flags);
-
- /* read state */
- value = inb(pmbase + 0x50);
-
- dprintk("read at pmbase 0x%x + 0x50 returned 0x%x\n", pmbase, value);
-
- /* write new state */
- value &= 0xFE;
- value |= state;
-
- dprintk("writing 0x%x to pmbase 0x%x + 0x50\n", value, pmbase);
-
- /* Disable bus master arbitration */
- pm2_blk = inb(pmbase + 0x20);
- pm2_blk |= 0x01;
- outb(pm2_blk, (pmbase + 0x20));
-
- /* Actual transition */
- outb(value, (pmbase + 0x50));
-
- /* Restore bus master arbitration */
- pm2_blk &= 0xfe;
- outb(pm2_blk, (pmbase + 0x20));
-
- /* check if transition was successful */
- value = inb(pmbase + 0x50);
-
- /* Enable IRQs */
- local_irq_restore(flags);
-
- dprintk("read at pmbase 0x%x + 0x50 returned 0x%x\n", pmbase, value);
-
- if (state == (value & 0x1))
- dprintk("change to %u MHz succeeded\n",
- speedstep_get_frequency(speedstep_processor) / 1000);
- else
- printk(KERN_ERR "cpufreq: change failed - I/O error\n");
-
- return;
-}
-
-/* Wrapper for smp_call_function_single. */
-static void _speedstep_set_state(void *_state)
-{
- speedstep_set_state(*(unsigned int *)_state);
-}
-
-/**
- * speedstep_activate - activate SpeedStep control in the chipset
- *
- * Tries to activate the SpeedStep status and control registers.
- * Returns -EINVAL on an unsupported chipset, and zero on success.
- */
-static int speedstep_activate(void)
-{
- u16 value = 0;
-
- if (!speedstep_chipset_dev)
- return -EINVAL;
-
- pci_read_config_word(speedstep_chipset_dev, 0x00A0, &value);
- if (!(value & 0x08)) {
- value |= 0x08;
- dprintk("activating SpeedStep (TM) registers\n");
- pci_write_config_word(speedstep_chipset_dev, 0x00A0, value);
- }
-
- return 0;
-}
-
-
-/**
- * speedstep_detect_chipset - detect the Southbridge which contains SpeedStep logic
- *
- * Detects ICH2-M, ICH3-M and ICH4-M so far. The pci_dev points to
- * the LPC bridge / PM module which contains all power-management
- * functions. Returns the SPEEDSTEP_CHIPSET_-number for the detected
- * chipset, or zero on failure.
- */
-static unsigned int speedstep_detect_chipset(void)
-{
- speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
- PCI_DEVICE_ID_INTEL_82801DB_12,
- PCI_ANY_ID, PCI_ANY_ID,
- NULL);
- if (speedstep_chipset_dev)
- return 4; /* 4-M */
-
- speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
- PCI_DEVICE_ID_INTEL_82801CA_12,
- PCI_ANY_ID, PCI_ANY_ID,
- NULL);
- if (speedstep_chipset_dev)
- return 3; /* 3-M */
-
-
- speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
- PCI_DEVICE_ID_INTEL_82801BA_10,
- PCI_ANY_ID, PCI_ANY_ID,
- NULL);
- if (speedstep_chipset_dev) {
- /* speedstep.c causes lockups on Dell Inspirons 8000 and
- * 8100 which use a pretty old revision of the 82815
- * host brige. Abort on these systems.
- */
- static struct pci_dev *hostbridge;
-
- hostbridge = pci_get_subsys(PCI_VENDOR_ID_INTEL,
- PCI_DEVICE_ID_INTEL_82815_MC,
- PCI_ANY_ID, PCI_ANY_ID,
- NULL);
-
- if (!hostbridge)
- return 2; /* 2-M */
-
- if (hostbridge->revision < 5) {
- dprintk("hostbridge does not support speedstep\n");
- speedstep_chipset_dev = NULL;
- pci_dev_put(hostbridge);
- return 0;
- }
-
- pci_dev_put(hostbridge);
- return 2; /* 2-M */
- }
-
- return 0;
-}
-
-static void get_freq_data(void *_speed)
-{
- unsigned int *speed = _speed;
-
- *speed = speedstep_get_frequency(speedstep_processor);
-}
-
-static unsigned int speedstep_get(unsigned int cpu)
-{
- unsigned int speed;
-
- /* You're supposed to ensure CPU is online. */
- if (smp_call_function_single(cpu, get_freq_data, &speed, 1) != 0)
- BUG();
-
- dprintk("detected %u kHz as current frequency\n", speed);
- return speed;
-}
-
-/**
- * speedstep_target - set a new CPUFreq policy
- * @policy: new policy
- * @target_freq: the target frequency
- * @relation: how that frequency relates to achieved frequency
- * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
- *
- * Sets a new CPUFreq policy.
- */
-static int speedstep_target(struct cpufreq_policy *policy,
- unsigned int target_freq,
- unsigned int relation)
-{
- unsigned int newstate = 0, policy_cpu;
- struct cpufreq_freqs freqs;
- int i;
-
- if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0],
- target_freq, relation, &newstate))
- return -EINVAL;
-
- policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
- freqs.old = speedstep_get(policy_cpu);
- freqs.new = speedstep_freqs[newstate].frequency;
- freqs.cpu = policy->cpu;
-
- dprintk("transiting from %u to %u kHz\n", freqs.old, freqs.new);
-
- /* no transition necessary */
- if (freqs.old == freqs.new)
- return 0;
-
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- }
-
- smp_call_function_single(policy_cpu, _speedstep_set_state, &newstate,
- true);
-
- for_each_cpu(i, policy->cpus) {
- freqs.cpu = i;
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
- }
-
- return 0;
-}
-
-
-/**
- * speedstep_verify - verifies a new CPUFreq policy
- * @policy: new policy
- *
- * Limit must be within speedstep_low_freq and speedstep_high_freq, with
- * at least one border included.
- */
-static int speedstep_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &speedstep_freqs[0]);
-}
-
-struct get_freqs {
- struct cpufreq_policy *policy;
- int ret;
-};
-
-static void get_freqs_on_cpu(void *_get_freqs)
-{
- struct get_freqs *get_freqs = _get_freqs;
-
- get_freqs->ret =
- speedstep_get_freqs(speedstep_processor,
- &speedstep_freqs[SPEEDSTEP_LOW].frequency,
- &speedstep_freqs[SPEEDSTEP_HIGH].frequency,
- &get_freqs->policy->cpuinfo.transition_latency,
- &speedstep_set_state);
-}
-
-static int speedstep_cpu_init(struct cpufreq_policy *policy)
-{
- int result;
- unsigned int policy_cpu, speed;
- struct get_freqs gf;
-
- /* only run on CPU to be set, or on its sibling */
-#ifdef CONFIG_SMP
- cpumask_copy(policy->cpus, cpu_sibling_mask(policy->cpu));
-#endif
- policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
-
- /* detect low and high frequency and transition latency */
- gf.policy = policy;
- smp_call_function_single(policy_cpu, get_freqs_on_cpu, &gf, 1);
- if (gf.ret)
- return gf.ret;
-
- /* get current speed setting */
- speed = speedstep_get(policy_cpu);
- if (!speed)
- return -EIO;
-
- dprintk("currently at %s speed setting - %i MHz\n",
- (speed == speedstep_freqs[SPEEDSTEP_LOW].frequency)
- ? "low" : "high",
- (speed / 1000));
-
- /* cpuinfo and default policy values */
- policy->cur = speed;
-
- result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
- if (result)
- return result;
-
- cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
-
- return 0;
-}
-
-
-static int speedstep_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-static struct freq_attr *speedstep_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-
-static struct cpufreq_driver speedstep_driver = {
- .name = "speedstep-ich",
- .verify = speedstep_verify,
- .target = speedstep_target,
- .init = speedstep_cpu_init,
- .exit = speedstep_cpu_exit,
- .get = speedstep_get,
- .owner = THIS_MODULE,
- .attr = speedstep_attr,
-};
-
-
-/**
- * speedstep_init - initializes the SpeedStep CPUFreq driver
- *
- * Initializes the SpeedStep support. Returns -ENODEV on unsupported
- * devices, -EINVAL on problems during initiatization, and zero on
- * success.
- */
-static int __init speedstep_init(void)
-{
- /* detect processor */
- speedstep_processor = speedstep_detect_processor();
- if (!speedstep_processor) {
- dprintk("Intel(R) SpeedStep(TM) capable processor "
- "not found\n");
- return -ENODEV;
- }
-
- /* detect chipset */
- if (!speedstep_detect_chipset()) {
- dprintk("Intel(R) SpeedStep(TM) for this chipset not "
- "(yet) available.\n");
- return -ENODEV;
- }
-
- /* activate speedstep support */
- if (speedstep_activate()) {
- pci_dev_put(speedstep_chipset_dev);
- return -EINVAL;
- }
-
- if (speedstep_find_register())
- return -ENODEV;
-
- return cpufreq_register_driver(&speedstep_driver);
-}
-
-
-/**
- * speedstep_exit - unregisters SpeedStep support
- *
- * Unregisters SpeedStep support.
- */
-static void __exit speedstep_exit(void)
-{
- pci_dev_put(speedstep_chipset_dev);
- cpufreq_unregister_driver(&speedstep_driver);
-}
-
-
-MODULE_AUTHOR("Dave Jones <davej@redhat.com>, "
- "Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("Speedstep driver for Intel mobile processors on chipsets "
- "with ICH-M southbridges.");
-MODULE_LICENSE("GPL");
-
-module_init(speedstep_init);
-module_exit(speedstep_exit);
+++ /dev/null
-/*
- * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * Library for common functions for Intel SpeedStep v.1 and v.2 support
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-
-#include <asm/msr.h>
-#include <asm/tsc.h>
-#include "speedstep-lib.h"
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "speedstep-lib", msg)
-
-#define PFX "speedstep-lib: "
-
-#ifdef CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK
-static int relaxed_check;
-#else
-#define relaxed_check 0
-#endif
-
-/*********************************************************************
- * GET PROCESSOR CORE SPEED IN KHZ *
- *********************************************************************/
-
-static unsigned int pentium3_get_frequency(enum speedstep_processor processor)
-{
- /* See table 14 of p3_ds.pdf and table 22 of 29834003.pdf */
- struct {
- unsigned int ratio; /* Frequency Multiplier (x10) */
- u8 bitmap; /* power on configuration bits
- [27, 25:22] (in MSR 0x2a) */
- } msr_decode_mult[] = {
- { 30, 0x01 },
- { 35, 0x05 },
- { 40, 0x02 },
- { 45, 0x06 },
- { 50, 0x00 },
- { 55, 0x04 },
- { 60, 0x0b },
- { 65, 0x0f },
- { 70, 0x09 },
- { 75, 0x0d },
- { 80, 0x0a },
- { 85, 0x26 },
- { 90, 0x20 },
- { 100, 0x2b },
- { 0, 0xff } /* error or unknown value */
- };
-
- /* PIII(-M) FSB settings: see table b1-b of 24547206.pdf */
- struct {
- unsigned int value; /* Front Side Bus speed in MHz */
- u8 bitmap; /* power on configuration bits [18: 19]
- (in MSR 0x2a) */
- } msr_decode_fsb[] = {
- { 66, 0x0 },
- { 100, 0x2 },
- { 133, 0x1 },
- { 0, 0xff}
- };
-
- u32 msr_lo, msr_tmp;
- int i = 0, j = 0;
-
- /* read MSR 0x2a - we only need the low 32 bits */
- rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
- dprintk("P3 - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n", msr_lo, msr_tmp);
- msr_tmp = msr_lo;
-
- /* decode the FSB */
- msr_tmp &= 0x00c0000;
- msr_tmp >>= 18;
- while (msr_tmp != msr_decode_fsb[i].bitmap) {
- if (msr_decode_fsb[i].bitmap == 0xff)
- return 0;
- i++;
- }
-
- /* decode the multiplier */
- if (processor == SPEEDSTEP_CPU_PIII_C_EARLY) {
- dprintk("workaround for early PIIIs\n");
- msr_lo &= 0x03c00000;
- } else
- msr_lo &= 0x0bc00000;
- msr_lo >>= 22;
- while (msr_lo != msr_decode_mult[j].bitmap) {
- if (msr_decode_mult[j].bitmap == 0xff)
- return 0;
- j++;
- }
-
- dprintk("speed is %u\n",
- (msr_decode_mult[j].ratio * msr_decode_fsb[i].value * 100));
-
- return msr_decode_mult[j].ratio * msr_decode_fsb[i].value * 100;
-}
-
-
-static unsigned int pentiumM_get_frequency(void)
-{
- u32 msr_lo, msr_tmp;
-
- rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
- dprintk("PM - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n", msr_lo, msr_tmp);
-
- /* see table B-2 of 24547212.pdf */
- if (msr_lo & 0x00040000) {
- printk(KERN_DEBUG PFX "PM - invalid FSB: 0x%x 0x%x\n",
- msr_lo, msr_tmp);
- return 0;
- }
-
- msr_tmp = (msr_lo >> 22) & 0x1f;
- dprintk("bits 22-26 are 0x%x, speed is %u\n",
- msr_tmp, (msr_tmp * 100 * 1000));
-
- return msr_tmp * 100 * 1000;
-}
-
-static unsigned int pentium_core_get_frequency(void)
-{
- u32 fsb = 0;
- u32 msr_lo, msr_tmp;
- int ret;
-
- rdmsr(MSR_FSB_FREQ, msr_lo, msr_tmp);
- /* see table B-2 of 25366920.pdf */
- switch (msr_lo & 0x07) {
- case 5:
- fsb = 100000;
- break;
- case 1:
- fsb = 133333;
- break;
- case 3:
- fsb = 166667;
- break;
- case 2:
- fsb = 200000;
- break;
- case 0:
- fsb = 266667;
- break;
- case 4:
- fsb = 333333;
- break;
- default:
- printk(KERN_ERR "PCORE - MSR_FSB_FREQ undefined value");
- }
-
- rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
- dprintk("PCORE - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n",
- msr_lo, msr_tmp);
-
- msr_tmp = (msr_lo >> 22) & 0x1f;
- dprintk("bits 22-26 are 0x%x, speed is %u\n",
- msr_tmp, (msr_tmp * fsb));
-
- ret = (msr_tmp * fsb);
- return ret;
-}
-
-
-static unsigned int pentium4_get_frequency(void)
-{
- struct cpuinfo_x86 *c = &boot_cpu_data;
- u32 msr_lo, msr_hi, mult;
- unsigned int fsb = 0;
- unsigned int ret;
- u8 fsb_code;
-
- /* Pentium 4 Model 0 and 1 do not have the Core Clock Frequency
- * to System Bus Frequency Ratio Field in the Processor Frequency
- * Configuration Register of the MSR. Therefore the current
- * frequency cannot be calculated and has to be measured.
- */
- if (c->x86_model < 2)
- return cpu_khz;
-
- rdmsr(0x2c, msr_lo, msr_hi);
-
- dprintk("P4 - MSR_EBC_FREQUENCY_ID: 0x%x 0x%x\n", msr_lo, msr_hi);
-
- /* decode the FSB: see IA-32 Intel (C) Architecture Software
- * Developer's Manual, Volume 3: System Prgramming Guide,
- * revision #12 in Table B-1: MSRs in the Pentium 4 and
- * Intel Xeon Processors, on page B-4 and B-5.
- */
- fsb_code = (msr_lo >> 16) & 0x7;
- switch (fsb_code) {
- case 0:
- fsb = 100 * 1000;
- break;
- case 1:
- fsb = 13333 * 10;
- break;
- case 2:
- fsb = 200 * 1000;
- break;
- }
-
- if (!fsb)
- printk(KERN_DEBUG PFX "couldn't detect FSB speed. "
- "Please send an e-mail to <linux@brodo.de>\n");
-
- /* Multiplier. */
- mult = msr_lo >> 24;
-
- dprintk("P4 - FSB %u kHz; Multiplier %u; Speed %u kHz\n",
- fsb, mult, (fsb * mult));
-
- ret = (fsb * mult);
- return ret;
-}
-
-
-/* Warning: may get called from smp_call_function_single. */
-unsigned int speedstep_get_frequency(enum speedstep_processor processor)
-{
- switch (processor) {
- case SPEEDSTEP_CPU_PCORE:
- return pentium_core_get_frequency();
- case SPEEDSTEP_CPU_PM:
- return pentiumM_get_frequency();
- case SPEEDSTEP_CPU_P4D:
- case SPEEDSTEP_CPU_P4M:
- return pentium4_get_frequency();
- case SPEEDSTEP_CPU_PIII_T:
- case SPEEDSTEP_CPU_PIII_C:
- case SPEEDSTEP_CPU_PIII_C_EARLY:
- return pentium3_get_frequency(processor);
- default:
- return 0;
- };
- return 0;
-}
-EXPORT_SYMBOL_GPL(speedstep_get_frequency);
-
-
-/*********************************************************************
- * DETECT SPEEDSTEP-CAPABLE PROCESSOR *
- *********************************************************************/
-
-unsigned int speedstep_detect_processor(void)
-{
- struct cpuinfo_x86 *c = &cpu_data(0);
- u32 ebx, msr_lo, msr_hi;
-
- dprintk("x86: %x, model: %x\n", c->x86, c->x86_model);
-
- if ((c->x86_vendor != X86_VENDOR_INTEL) ||
- ((c->x86 != 6) && (c->x86 != 0xF)))
- return 0;
-
- if (c->x86 == 0xF) {
- /* Intel Mobile Pentium 4-M
- * or Intel Mobile Pentium 4 with 533 MHz FSB */
- if (c->x86_model != 2)
- return 0;
-
- ebx = cpuid_ebx(0x00000001);
- ebx &= 0x000000FF;
-
- dprintk("ebx value is %x, x86_mask is %x\n", ebx, c->x86_mask);
-
- switch (c->x86_mask) {
- case 4:
- /*
- * B-stepping [M-P4-M]
- * sample has ebx = 0x0f, production has 0x0e.
- */
- if ((ebx == 0x0e) || (ebx == 0x0f))
- return SPEEDSTEP_CPU_P4M;
- break;
- case 7:
- /*
- * C-stepping [M-P4-M]
- * needs to have ebx=0x0e, else it's a celeron:
- * cf. 25130917.pdf / page 7, footnote 5 even
- * though 25072120.pdf / page 7 doesn't say
- * samples are only of B-stepping...
- */
- if (ebx == 0x0e)
- return SPEEDSTEP_CPU_P4M;
- break;
- case 9:
- /*
- * D-stepping [M-P4-M or M-P4/533]
- *
- * this is totally strange: CPUID 0x0F29 is
- * used by M-P4-M, M-P4/533 and(!) Celeron CPUs.
- * The latter need to be sorted out as they don't
- * support speedstep.
- * Celerons with CPUID 0x0F29 may have either
- * ebx=0x8 or 0xf -- 25130917.pdf doesn't say anything
- * specific.
- * M-P4-Ms may have either ebx=0xe or 0xf [see above]
- * M-P4/533 have either ebx=0xe or 0xf. [25317607.pdf]
- * also, M-P4M HTs have ebx=0x8, too
- * For now, they are distinguished by the model_id
- * string
- */
- if ((ebx == 0x0e) ||
- (strstr(c->x86_model_id,
- "Mobile Intel(R) Pentium(R) 4") != NULL))
- return SPEEDSTEP_CPU_P4M;
- break;
- default:
- break;
- }
- return 0;
- }
-
- switch (c->x86_model) {
- case 0x0B: /* Intel PIII [Tualatin] */
- /* cpuid_ebx(1) is 0x04 for desktop PIII,
- * 0x06 for mobile PIII-M */
- ebx = cpuid_ebx(0x00000001);
- dprintk("ebx is %x\n", ebx);
-
- ebx &= 0x000000FF;
-
- if (ebx != 0x06)
- return 0;
-
- /* So far all PIII-M processors support SpeedStep. See
- * Intel's 24540640.pdf of June 2003
- */
- return SPEEDSTEP_CPU_PIII_T;
-
- case 0x08: /* Intel PIII [Coppermine] */
-
- /* all mobile PIII Coppermines have FSB 100 MHz
- * ==> sort out a few desktop PIIIs. */
- rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_hi);
- dprintk("Coppermine: MSR_IA32_EBL_CR_POWERON is 0x%x, 0x%x\n",
- msr_lo, msr_hi);
- msr_lo &= 0x00c0000;
- if (msr_lo != 0x0080000)
- return 0;
-
- /*
- * If the processor is a mobile version,
- * platform ID has bit 50 set
- * it has SpeedStep technology if either
- * bit 56 or 57 is set
- */
- rdmsr(MSR_IA32_PLATFORM_ID, msr_lo, msr_hi);
- dprintk("Coppermine: MSR_IA32_PLATFORM ID is 0x%x, 0x%x\n",
- msr_lo, msr_hi);
- if ((msr_hi & (1<<18)) &&
- (relaxed_check ? 1 : (msr_hi & (3<<24)))) {
- if (c->x86_mask == 0x01) {
- dprintk("early PIII version\n");
- return SPEEDSTEP_CPU_PIII_C_EARLY;
- } else
- return SPEEDSTEP_CPU_PIII_C;
- }
-
- default:
- return 0;
- }
-}
-EXPORT_SYMBOL_GPL(speedstep_detect_processor);
-
-
-/*********************************************************************
- * DETECT SPEEDSTEP SPEEDS *
- *********************************************************************/
-
-unsigned int speedstep_get_freqs(enum speedstep_processor processor,
- unsigned int *low_speed,
- unsigned int *high_speed,
- unsigned int *transition_latency,
- void (*set_state) (unsigned int state))
-{
- unsigned int prev_speed;
- unsigned int ret = 0;
- unsigned long flags;
- struct timeval tv1, tv2;
-
- if ((!processor) || (!low_speed) || (!high_speed) || (!set_state))
- return -EINVAL;
-
- dprintk("trying to determine both speeds\n");
-
- /* get current speed */
- prev_speed = speedstep_get_frequency(processor);
- if (!prev_speed)
- return -EIO;
-
- dprintk("previous speed is %u\n", prev_speed);
-
- local_irq_save(flags);
-
- /* switch to low state */
- set_state(SPEEDSTEP_LOW);
- *low_speed = speedstep_get_frequency(processor);
- if (!*low_speed) {
- ret = -EIO;
- goto out;
- }
-
- dprintk("low speed is %u\n", *low_speed);
-
- /* start latency measurement */
- if (transition_latency)
- do_gettimeofday(&tv1);
-
- /* switch to high state */
- set_state(SPEEDSTEP_HIGH);
-
- /* end latency measurement */
- if (transition_latency)
- do_gettimeofday(&tv2);
-
- *high_speed = speedstep_get_frequency(processor);
- if (!*high_speed) {
- ret = -EIO;
- goto out;
- }
-
- dprintk("high speed is %u\n", *high_speed);
-
- if (*low_speed == *high_speed) {
- ret = -ENODEV;
- goto out;
- }
-
- /* switch to previous state, if necessary */
- if (*high_speed != prev_speed)
- set_state(SPEEDSTEP_LOW);
-
- if (transition_latency) {
- *transition_latency = (tv2.tv_sec - tv1.tv_sec) * USEC_PER_SEC +
- tv2.tv_usec - tv1.tv_usec;
- dprintk("transition latency is %u uSec\n", *transition_latency);
-
- /* convert uSec to nSec and add 20% for safety reasons */
- *transition_latency *= 1200;
-
- /* check if the latency measurement is too high or too low
- * and set it to a safe value (500uSec) in that case
- */
- if (*transition_latency > 10000000 ||
- *transition_latency < 50000) {
- printk(KERN_WARNING PFX "frequency transition "
- "measured seems out of range (%u "
- "nSec), falling back to a safe one of"
- "%u nSec.\n",
- *transition_latency, 500000);
- *transition_latency = 500000;
- }
- }
-
-out:
- local_irq_restore(flags);
- return ret;
-}
-EXPORT_SYMBOL_GPL(speedstep_get_freqs);
-
-#ifdef CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK
-module_param(relaxed_check, int, 0444);
-MODULE_PARM_DESC(relaxed_check,
- "Don't do all checks for speedstep capability.");
-#endif
-
-MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
-MODULE_DESCRIPTION("Library for Intel SpeedStep 1 or 2 cpufreq drivers.");
-MODULE_LICENSE("GPL");
+++ /dev/null
-/*
- * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- * Library for common functions for Intel SpeedStep v.1 and v.2 support
- *
- * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
- */
-
-
-
-/* processors */
-enum speedstep_processor {
- SPEEDSTEP_CPU_PIII_C_EARLY = 0x00000001, /* Coppermine core */
- SPEEDSTEP_CPU_PIII_C = 0x00000002, /* Coppermine core */
- SPEEDSTEP_CPU_PIII_T = 0x00000003, /* Tualatin core */
- SPEEDSTEP_CPU_P4M = 0x00000004, /* P4-M */
-/* the following processors are not speedstep-capable and are not auto-detected
- * in speedstep_detect_processor(). However, their speed can be detected using
- * the speedstep_get_frequency() call. */
- SPEEDSTEP_CPU_PM = 0xFFFFFF03, /* Pentium M */
- SPEEDSTEP_CPU_P4D = 0xFFFFFF04, /* desktop P4 */
- SPEEDSTEP_CPU_PCORE = 0xFFFFFF05, /* Core */
-};
-
-/* speedstep states -- only two of them */
-
-#define SPEEDSTEP_HIGH 0x00000000
-#define SPEEDSTEP_LOW 0x00000001
-
-
-/* detect a speedstep-capable processor */
-extern enum speedstep_processor speedstep_detect_processor(void);
-
-/* detect the current speed (in khz) of the processor */
-extern unsigned int speedstep_get_frequency(enum speedstep_processor processor);
-
-
-/* detect the low and high speeds of the processor. The callback
- * set_state"'s first argument is either SPEEDSTEP_HIGH or
- * SPEEDSTEP_LOW; the second argument is zero so that no
- * cpufreq_notify_transition calls are initiated.
- */
-extern unsigned int speedstep_get_freqs(enum speedstep_processor processor,
- unsigned int *low_speed,
- unsigned int *high_speed,
- unsigned int *transition_latency,
- void (*set_state) (unsigned int state));
+++ /dev/null
-/*
- * Intel SpeedStep SMI driver.
- *
- * (C) 2003 Hiroshi Miura <miura@da-cha.org>
- *
- * Licensed under the terms of the GNU GPL License version 2.
- *
- */
-
-
-/*********************************************************************
- * SPEEDSTEP - DEFINITIONS *
- *********************************************************************/
-
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/moduleparam.h>
-#include <linux/init.h>
-#include <linux/cpufreq.h>
-#include <linux/delay.h>
-#include <linux/io.h>
-#include <asm/ist.h>
-
-#include "speedstep-lib.h"
-
-/* speedstep system management interface port/command.
- *
- * These parameters are got from IST-SMI BIOS call.
- * If user gives it, these are used.
- *
- */
-static int smi_port;
-static int smi_cmd;
-static unsigned int smi_sig;
-
-/* info about the processor */
-static enum speedstep_processor speedstep_processor;
-
-/*
- * There are only two frequency states for each processor. Values
- * are in kHz for the time being.
- */
-static struct cpufreq_frequency_table speedstep_freqs[] = {
- {SPEEDSTEP_HIGH, 0},
- {SPEEDSTEP_LOW, 0},
- {0, CPUFREQ_TABLE_END},
-};
-
-#define GET_SPEEDSTEP_OWNER 0
-#define GET_SPEEDSTEP_STATE 1
-#define SET_SPEEDSTEP_STATE 2
-#define GET_SPEEDSTEP_FREQS 4
-
-/* how often shall the SMI call be tried if it failed, e.g. because
- * of DMA activity going on? */
-#define SMI_TRIES 5
-
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, \
- "speedstep-smi", msg)
-
-/**
- * speedstep_smi_ownership
- */
-static int speedstep_smi_ownership(void)
-{
- u32 command, result, magic, dummy;
- u32 function = GET_SPEEDSTEP_OWNER;
- unsigned char magic_data[] = "Copyright (c) 1999 Intel Corporation";
-
- command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
- magic = virt_to_phys(magic_data);
-
- dprintk("trying to obtain ownership with command %x at port %x\n",
- command, smi_port);
-
- __asm__ __volatile__(
- "push %%ebp\n"
- "out %%al, (%%dx)\n"
- "pop %%ebp\n"
- : "=D" (result),
- "=a" (dummy), "=b" (dummy), "=c" (dummy), "=d" (dummy),
- "=S" (dummy)
- : "a" (command), "b" (function), "c" (0), "d" (smi_port),
- "D" (0), "S" (magic)
- : "memory"
- );
-
- dprintk("result is %x\n", result);
-
- return result;
-}
-
-/**
- * speedstep_smi_get_freqs - get SpeedStep preferred & current freq.
- * @low: the low frequency value is placed here
- * @high: the high frequency value is placed here
- *
- * Only available on later SpeedStep-enabled systems, returns false results or
- * even hangs [cf. bugme.osdl.org # 1422] on earlier systems. Empirical testing
- * shows that the latter occurs if !(ist_info.event & 0xFFFF).
- */
-static int speedstep_smi_get_freqs(unsigned int *low, unsigned int *high)
-{
- u32 command, result = 0, edi, high_mhz, low_mhz, dummy;
- u32 state = 0;
- u32 function = GET_SPEEDSTEP_FREQS;
-
- if (!(ist_info.event & 0xFFFF)) {
- dprintk("bug #1422 -- can't read freqs from BIOS\n");
- return -ENODEV;
- }
-
- command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
-
- dprintk("trying to determine frequencies with command %x at port %x\n",
- command, smi_port);
-
- __asm__ __volatile__(
- "push %%ebp\n"
- "out %%al, (%%dx)\n"
- "pop %%ebp"
- : "=a" (result),
- "=b" (high_mhz),
- "=c" (low_mhz),
- "=d" (state), "=D" (edi), "=S" (dummy)
- : "a" (command),
- "b" (function),
- "c" (state),
- "d" (smi_port), "S" (0), "D" (0)
- );
-
- dprintk("result %x, low_freq %u, high_freq %u\n",
- result, low_mhz, high_mhz);
-
- /* abort if results are obviously incorrect... */
- if ((high_mhz + low_mhz) < 600)
- return -EINVAL;
-
- *high = high_mhz * 1000;
- *low = low_mhz * 1000;
-
- return result;
-}
-
-/**
- * speedstep_get_state - set the SpeedStep state
- * @state: processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
- *
- */
-static int speedstep_get_state(void)
-{
- u32 function = GET_SPEEDSTEP_STATE;
- u32 result, state, edi, command, dummy;
-
- command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
-
- dprintk("trying to determine current setting with command %x "
- "at port %x\n", command, smi_port);
-
- __asm__ __volatile__(
- "push %%ebp\n"
- "out %%al, (%%dx)\n"
- "pop %%ebp\n"
- : "=a" (result),
- "=b" (state), "=D" (edi),
- "=c" (dummy), "=d" (dummy), "=S" (dummy)
- : "a" (command), "b" (function), "c" (0),
- "d" (smi_port), "S" (0), "D" (0)
- );
-
- dprintk("state is %x, result is %x\n", state, result);
-
- return state & 1;
-}
-
-
-/**
- * speedstep_set_state - set the SpeedStep state
- * @state: new processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
- *
- */
-static void speedstep_set_state(unsigned int state)
-{
- unsigned int result = 0, command, new_state, dummy;
- unsigned long flags;
- unsigned int function = SET_SPEEDSTEP_STATE;
- unsigned int retry = 0;
-
- if (state > 0x1)
- return;
-
- /* Disable IRQs */
- local_irq_save(flags);
-
- command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
-
- dprintk("trying to set frequency to state %u "
- "with command %x at port %x\n",
- state, command, smi_port);
-
- do {
- if (retry) {
- dprintk("retry %u, previous result %u, waiting...\n",
- retry, result);
- mdelay(retry * 50);
- }
- retry++;
- __asm__ __volatile__(
- "push %%ebp\n"
- "out %%al, (%%dx)\n"
- "pop %%ebp"
- : "=b" (new_state), "=D" (result),
- "=c" (dummy), "=a" (dummy),
- "=d" (dummy), "=S" (dummy)
- : "a" (command), "b" (function), "c" (state),
- "d" (smi_port), "S" (0), "D" (0)
- );
- } while ((new_state != state) && (retry <= SMI_TRIES));
-
- /* enable IRQs */
- local_irq_restore(flags);
-
- if (new_state == state)
- dprintk("change to %u MHz succeeded after %u tries "
- "with result %u\n",
- (speedstep_freqs[new_state].frequency / 1000),
- retry, result);
- else
- printk(KERN_ERR "cpufreq: change to state %u "
- "failed with new_state %u and result %u\n",
- state, new_state, result);
-
- return;
-}
-
-
-/**
- * speedstep_target - set a new CPUFreq policy
- * @policy: new policy
- * @target_freq: new freq
- * @relation:
- *
- * Sets a new CPUFreq policy/freq.
- */
-static int speedstep_target(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
-{
- unsigned int newstate = 0;
- struct cpufreq_freqs freqs;
-
- if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0],
- target_freq, relation, &newstate))
- return -EINVAL;
-
- freqs.old = speedstep_freqs[speedstep_get_state()].frequency;
- freqs.new = speedstep_freqs[newstate].frequency;
- freqs.cpu = 0; /* speedstep.c is UP only driver */
-
- if (freqs.old == freqs.new)
- return 0;
-
- cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
- speedstep_set_state(newstate);
- cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
- return 0;
-}
-
-
-/**
- * speedstep_verify - verifies a new CPUFreq policy
- * @policy: new policy
- *
- * Limit must be within speedstep_low_freq and speedstep_high_freq, with
- * at least one border included.
- */
-static int speedstep_verify(struct cpufreq_policy *policy)
-{
- return cpufreq_frequency_table_verify(policy, &speedstep_freqs[0]);
-}
-
-
-static int speedstep_cpu_init(struct cpufreq_policy *policy)
-{
- int result;
- unsigned int speed, state;
- unsigned int *low, *high;
-
- /* capability check */
- if (policy->cpu != 0)
- return -ENODEV;
-
- result = speedstep_smi_ownership();
- if (result) {
- dprintk("fails in acquiring ownership of a SMI interface.\n");
- return -EINVAL;
- }
-
- /* detect low and high frequency */
- low = &speedstep_freqs[SPEEDSTEP_LOW].frequency;
- high = &speedstep_freqs[SPEEDSTEP_HIGH].frequency;
-
- result = speedstep_smi_get_freqs(low, high);
- if (result) {
- /* fall back to speedstep_lib.c dection mechanism:
- * try both states out */
- dprintk("could not detect low and high frequencies "
- "by SMI call.\n");
- result = speedstep_get_freqs(speedstep_processor,
- low, high,
- NULL,
- &speedstep_set_state);
-
- if (result) {
- dprintk("could not detect two different speeds"
- " -- aborting.\n");
- return result;
- } else
- dprintk("workaround worked.\n");
- }
-
- /* get current speed setting */
- state = speedstep_get_state();
- speed = speedstep_freqs[state].frequency;
-
- dprintk("currently at %s speed setting - %i MHz\n",
- (speed == speedstep_freqs[SPEEDSTEP_LOW].frequency)
- ? "low" : "high",
- (speed / 1000));
-
- /* cpuinfo and default policy values */
- policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
- policy->cur = speed;
-
- result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
- if (result)
- return result;
-
- cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
-
- return 0;
-}
-
-static int speedstep_cpu_exit(struct cpufreq_policy *policy)
-{
- cpufreq_frequency_table_put_attr(policy->cpu);
- return 0;
-}
-
-static unsigned int speedstep_get(unsigned int cpu)
-{
- if (cpu)
- return -ENODEV;
- return speedstep_get_frequency(speedstep_processor);
-}
-
-
-static int speedstep_resume(struct cpufreq_policy *policy)
-{
- int result = speedstep_smi_ownership();
-
- if (result)
- dprintk("fails in re-acquiring ownership of a SMI interface.\n");
-
- return result;
-}
-
-static struct freq_attr *speedstep_attr[] = {
- &cpufreq_freq_attr_scaling_available_freqs,
- NULL,
-};
-
-static struct cpufreq_driver speedstep_driver = {
- .name = "speedstep-smi",
- .verify = speedstep_verify,
- .target = speedstep_target,
- .init = speedstep_cpu_init,
- .exit = speedstep_cpu_exit,
- .get = speedstep_get,
- .resume = speedstep_resume,
- .owner = THIS_MODULE,
- .attr = speedstep_attr,
-};
-
-/**
- * speedstep_init - initializes the SpeedStep CPUFreq driver
- *
- * Initializes the SpeedStep support. Returns -ENODEV on unsupported
- * BIOS, -EINVAL on problems during initiatization, and zero on
- * success.
- */
-static int __init speedstep_init(void)
-{
- speedstep_processor = speedstep_detect_processor();
-
- switch (speedstep_processor) {
- case SPEEDSTEP_CPU_PIII_T:
- case SPEEDSTEP_CPU_PIII_C:
- case SPEEDSTEP_CPU_PIII_C_EARLY:
- break;
- default:
- speedstep_processor = 0;
- }
-
- if (!speedstep_processor) {
- dprintk("No supported Intel CPU detected.\n");
- return -ENODEV;
- }
-
- dprintk("signature:0x%.8lx, command:0x%.8lx, "
- "event:0x%.8lx, perf_level:0x%.8lx.\n",
- ist_info.signature, ist_info.command,
- ist_info.event, ist_info.perf_level);
-
- /* Error if no IST-SMI BIOS or no PARM
- sig= 'ISGE' aka 'Intel Speedstep Gate E' */
- if ((ist_info.signature != 0x47534943) && (
- (smi_port == 0) || (smi_cmd == 0)))
- return -ENODEV;
-
- if (smi_sig == 1)
- smi_sig = 0x47534943;
- else
- smi_sig = ist_info.signature;
-
- /* setup smi_port from MODLULE_PARM or BIOS */
- if ((smi_port > 0xff) || (smi_port < 0))
- return -EINVAL;
- else if (smi_port == 0)
- smi_port = ist_info.command & 0xff;
-
- if ((smi_cmd > 0xff) || (smi_cmd < 0))
- return -EINVAL;
- else if (smi_cmd == 0)
- smi_cmd = (ist_info.command >> 16) & 0xff;
-
- return cpufreq_register_driver(&speedstep_driver);
-}
-
-
-/**
- * speedstep_exit - unregisters SpeedStep support
- *
- * Unregisters SpeedStep support.
- */
-static void __exit speedstep_exit(void)
-{
- cpufreq_unregister_driver(&speedstep_driver);
-}
-
-module_param(smi_port, int, 0444);
-module_param(smi_cmd, int, 0444);
-module_param(smi_sig, uint, 0444);
-
-MODULE_PARM_DESC(smi_port, "Override the BIOS-given IST port with this value "
- "-- Intel's default setting is 0xb2");
-MODULE_PARM_DESC(smi_cmd, "Override the BIOS-given IST command with this value "
- "-- Intel's default setting is 0x82");
-MODULE_PARM_DESC(smi_sig, "Set to 1 to fake the IST signature when using the "
- "SMI interface.");
-
-MODULE_AUTHOR("Hiroshi Miura");
-MODULE_DESCRIPTION("Speedstep driver for IST applet SMI interface.");
-MODULE_LICENSE("GPL");
-
-module_init(speedstep_init);
-module_exit(speedstep_exit);
static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
{
+ u64 misc_enable;
+
/* Unmask CPUID levels if masked: */
if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
- u64 misc_enable;
-
rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
if (misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID) {
* (model 2) with the same problem.
*/
if (c->x86 == 15) {
- u64 misc_enable;
-
rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) {
}
}
#endif
+
+ /*
+ * If fast string is not enabled in IA32_MISC_ENABLE for any reason,
+ * clear the fast string and enhanced fast string CPU capabilities.
+ */
+ if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
+ rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+ if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
+ printk(KERN_INFO "Disabled fast string operations\n");
+ setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
+ setup_clear_cpu_cap(X86_FEATURE_ERMS);
+ }
+ }
}
#ifdef CONFIG_X86_32
out_free:
if (b) {
kobject_put(&b->kobj);
+ list_del(&b->miscj);
kfree(b);
}
return err;
*/
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
+ h = lvtthmr_init;
/*
* The initial value of thermal LVT entries on all APs always reads
* 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
* sequence to them and LVT registers are reset to 0s except for
* the mask bits which are set to 1s when APs receive INIT IPI.
- * Always restore the value that BIOS has programmed on AP based on
- * BSP's info we saved since BIOS is always setting the same value
- * for all threads/cores
+ * If BIOS takes over the thermal interrupt and sets its interrupt
+ * delivery mode to SMI (not fixed), it restores the value that the
+ * BIOS has programmed on AP based on BSP's info we saved since BIOS
+ * is always setting the same value for all threads/cores.
*/
- apic_write(APIC_LVTTHMR, lvtthmr_init);
+ if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+ apic_write(APIC_LVTTHMR, lvtthmr_init);
- h = lvtthmr_init;
if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
printk(KERN_DEBUG
#include <asm/nmi.h>
#include <asm/compat.h>
#include <asm/smp.h>
+#include <asm/alternative.h>
#if 0
#undef wrmsrl
return new_raw_count;
}
-/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
static inline int x86_pmu_addr_offset(int index)
{
- if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
- return index << 1;
- return index;
+ int offset;
+
+ /* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
+ alternative_io(ASM_NOP2,
+ "shll $1, %%eax",
+ X86_FEATURE_PERFCTR_CORE,
+ "=a" (offset),
+ "a" (index));
+
+ return offset;
}
static inline unsigned int x86_pmu_config_addr(int index)
/*
* Branch tracing:
*/
- if ((attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS) &&
- (hwc->sample_period == 1)) {
+ if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
+ !attr->freq && hwc->sample_period == 1) {
/* BTS is not supported by this architecture. */
if (!x86_pmu.bts_active)
return -EOPNOTSUPP;
cpuc = &__get_cpu_var(cpu_hw_events);
+ /*
+ * Some chipsets need to unmask the LVTPC in a particular spot
+ * inside the nmi handler. As a result, the unmasking was pushed
+ * into all the nmi handlers.
+ *
+ * This generic handler doesn't seem to have any issues where the
+ * unmasking occurs so it was left at the top.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask)) {
/*
return NOTIFY_DONE;
}
- apic_write(APIC_LVTPC, APIC_DM_NMI);
-
handled = x86_pmu.handle_irq(args->regs);
if (!handled)
return NOTIFY_DONE;
* callchain support
*/
-static void
-backtrace_warning_symbol(void *data, char *msg, unsigned long symbol)
-{
- /* Ignore warnings */
-}
-
-static void backtrace_warning(void *data, char *msg)
-{
- /* Ignore warnings */
-}
-
static int backtrace_stack(void *data, char *name)
{
return 0;
}
static const struct stacktrace_ops backtrace_ops = {
- .warning = backtrace_warning,
- .warning_symbol = backtrace_warning_symbol,
.stack = backtrace_stack,
.address = backtrace_address,
.walk_stack = print_context_stack_bp,
*/
static const u64 amd_perfmon_event_map[] =
{
- [PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
- [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
- [PERF_COUNT_HW_CACHE_REFERENCES] = 0x0080,
- [PERF_COUNT_HW_CACHE_MISSES] = 0x0081,
- [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2,
- [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3,
+ [PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
+ [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
+ [PERF_COUNT_HW_CACHE_REFERENCES] = 0x0080,
+ [PERF_COUNT_HW_CACHE_MISSES] = 0x0081,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2,
+ [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3,
+ [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00d0, /* "Decoder empty" event */
+ [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x00d1, /* "Dispatch stalls" event */
};
static u64 amd_pmu_event_map(int hw_event)
/*
* Intel PerfMon, used on Core and later.
*/
-static const u64 intel_perfmon_event_map[] =
+static u64 intel_perfmon_event_map[PERF_COUNT_HW_MAX] __read_mostly =
{
[PERF_COUNT_HW_CPU_CYCLES] = 0x003c,
[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
[PERF_COUNT_HW_BUS_CYCLES] = 0x013c,
};
-static struct event_constraint intel_core_event_constraints[] =
+static struct event_constraint intel_core_event_constraints[] __read_mostly =
{
INTEL_EVENT_CONSTRAINT(0x11, 0x2), /* FP_ASSIST */
INTEL_EVENT_CONSTRAINT(0x12, 0x2), /* MUL */
EVENT_CONSTRAINT_END
};
-static struct event_constraint intel_core2_event_constraints[] =
+static struct event_constraint intel_core2_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
EVENT_CONSTRAINT_END
};
-static struct event_constraint intel_nehalem_event_constraints[] =
+static struct event_constraint intel_nehalem_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
EVENT_CONSTRAINT_END
};
-static struct extra_reg intel_nehalem_extra_regs[] =
+static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff),
EVENT_EXTRA_END
};
-static struct event_constraint intel_nehalem_percore_constraints[] =
+static struct event_constraint intel_nehalem_percore_constraints[] __read_mostly =
{
INTEL_EVENT_CONSTRAINT(0xb7, 0),
EVENT_CONSTRAINT_END
};
-static struct event_constraint intel_westmere_event_constraints[] =
+static struct event_constraint intel_westmere_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
EVENT_CONSTRAINT_END
};
-static struct event_constraint intel_snb_event_constraints[] =
+static struct event_constraint intel_snb_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
EVENT_CONSTRAINT_END
};
-static struct extra_reg intel_westmere_extra_regs[] =
+static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff),
EVENT_EXTRA_END
};
-static struct event_constraint intel_westmere_percore_constraints[] =
+static struct event_constraint intel_westmere_percore_constraints[] __read_mostly =
{
INTEL_EVENT_CONSTRAINT(0xb7, 0),
INTEL_EVENT_CONSTRAINT(0xbb, 0),
EVENT_CONSTRAINT_END
};
-static struct event_constraint intel_gen_event_constraints[] =
+static struct event_constraint intel_gen_event_constraints[] __read_mostly =
{
FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
},
},
[ C(LL ) ] = {
- /*
- * TBD: Need Off-core Response Performance Monitoring support
- */
[ C(OP_READ) ] = {
- /* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+ /* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
- /* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
- [ C(RESULT_MISS) ] = 0x01bb,
+ /* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+ [ C(RESULT_MISS) ] = 0x01b7,
},
[ C(OP_WRITE) ] = {
- /* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+ /* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
- /* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
- [ C(RESULT_MISS) ] = 0x01bb,
+ /* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+ [ C(RESULT_MISS) ] = 0x01b7,
},
[ C(OP_PREFETCH) ] = {
- /* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+ /* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
- /* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
- [ C(RESULT_MISS) ] = 0x01bb,
+ /* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+ [ C(RESULT_MISS) ] = 0x01b7,
},
},
[ C(DTLB) ] = {
},
[ C(LL ) ] = {
[ C(OP_READ) ] = {
- /* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+ /* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
- /* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
- [ C(RESULT_MISS) ] = 0x01bb,
+ /* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+ [ C(RESULT_MISS) ] = 0x01b7,
},
/*
* Use RFO, not WRITEBACK, because a write miss would typically occur
* on RFO.
*/
[ C(OP_WRITE) ] = {
- /* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
- [ C(RESULT_ACCESS) ] = 0x01bb,
- /* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+ /* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+ [ C(RESULT_ACCESS) ] = 0x01b7,
+ /* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
[ C(RESULT_MISS) ] = 0x01b7,
},
[ C(OP_PREFETCH) ] = {
- /* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+ /* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
[ C(RESULT_ACCESS) ] = 0x01b7,
- /* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
- [ C(RESULT_MISS) ] = 0x01bb,
+ /* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+ [ C(RESULT_MISS) ] = 0x01b7,
},
},
[ C(DTLB) ] = {
};
/*
- * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
+ * Nehalem/Westmere MSR_OFFCORE_RESPONSE bits;
+ * See IA32 SDM Vol 3B 30.6.1.3
*/
-#define DMND_DATA_RD (1 << 0)
-#define DMND_RFO (1 << 1)
-#define DMND_WB (1 << 3)
-#define PF_DATA_RD (1 << 4)
-#define PF_DATA_RFO (1 << 5)
-#define RESP_UNCORE_HIT (1 << 8)
-#define RESP_MISS (0xf600) /* non uncore hit */
+#define NHM_DMND_DATA_RD (1 << 0)
+#define NHM_DMND_RFO (1 << 1)
+#define NHM_DMND_IFETCH (1 << 2)
+#define NHM_DMND_WB (1 << 3)
+#define NHM_PF_DATA_RD (1 << 4)
+#define NHM_PF_DATA_RFO (1 << 5)
+#define NHM_PF_IFETCH (1 << 6)
+#define NHM_OFFCORE_OTHER (1 << 7)
+#define NHM_UNCORE_HIT (1 << 8)
+#define NHM_OTHER_CORE_HIT_SNP (1 << 9)
+#define NHM_OTHER_CORE_HITM (1 << 10)
+ /* reserved */
+#define NHM_REMOTE_CACHE_FWD (1 << 12)
+#define NHM_REMOTE_DRAM (1 << 13)
+#define NHM_LOCAL_DRAM (1 << 14)
+#define NHM_NON_DRAM (1 << 15)
+
+#define NHM_ALL_DRAM (NHM_REMOTE_DRAM|NHM_LOCAL_DRAM)
+
+#define NHM_DMND_READ (NHM_DMND_DATA_RD)
+#define NHM_DMND_WRITE (NHM_DMND_RFO|NHM_DMND_WB)
+#define NHM_DMND_PREFETCH (NHM_PF_DATA_RD|NHM_PF_DATA_RFO)
+
+#define NHM_L3_HIT (NHM_UNCORE_HIT|NHM_OTHER_CORE_HIT_SNP|NHM_OTHER_CORE_HITM)
+#define NHM_L3_MISS (NHM_NON_DRAM|NHM_ALL_DRAM|NHM_REMOTE_CACHE_FWD)
+#define NHM_L3_ACCESS (NHM_L3_HIT|NHM_L3_MISS)
static __initconst const u64 nehalem_hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_MAX]
{
[ C(LL ) ] = {
[ C(OP_READ) ] = {
- [ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
- [ C(RESULT_MISS) ] = DMND_DATA_RD|RESP_MISS,
+ [ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_L3_ACCESS,
+ [ C(RESULT_MISS) ] = NHM_DMND_READ|NHM_L3_MISS,
},
[ C(OP_WRITE) ] = {
- [ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
- [ C(RESULT_MISS) ] = DMND_RFO|DMND_WB|RESP_MISS,
+ [ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_L3_ACCESS,
+ [ C(RESULT_MISS) ] = NHM_DMND_WRITE|NHM_L3_MISS,
},
[ C(OP_PREFETCH) ] = {
- [ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
- [ C(RESULT_MISS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+ [ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_L3_ACCESS,
+ [ C(RESULT_MISS) ] = NHM_DMND_PREFETCH|NHM_L3_MISS,
},
}
};
cpuc = &__get_cpu_var(cpu_hw_events);
+ /*
+ * Some chipsets need to unmask the LVTPC in a particular spot
+ * inside the nmi handler. As a result, the unmasking was pushed
+ * into all the nmi handlers.
+ *
+ * This handler doesn't seem to have any issues with the unmasking
+ * so it was left at the top.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
struct hw_perf_event *hwc = &event->hw;
unsigned int hw_event, bts_event;
+ if (event->attr.freq)
+ return NULL;
+
hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
* AJ106 could possibly be worked around by not allowing LBR
* usage from PEBS, including the fixup.
* AJ68 could possibly be worked around by always programming
- * a pebs_event_reset[0] value and coping with the lost events.
+ * a pebs_event_reset[0] value and coping with the lost events.
*
* But taken together it might just make sense to not enable PEBS on
* these chips.
x86_pmu.percore_constraints = intel_nehalem_percore_constraints;
x86_pmu.enable_all = intel_pmu_nhm_enable_all;
x86_pmu.extra_regs = intel_nehalem_extra_regs;
+
+ /* UOPS_ISSUED.STALLED_CYCLES */
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x180010e;
+ /* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x1803fb1;
+
+ if (ebx & 0x40) {
+ /*
+ * Erratum AAJ80 detected, we work it around by using
+ * the BR_MISP_EXEC.ANY event. This will over-count
+ * branch-misses, but it's still much better than the
+ * architectural event which is often completely bogus:
+ */
+ intel_perfmon_event_map[PERF_COUNT_HW_BRANCH_MISSES] = 0x7f89;
+
+ pr_cont("erratum AAJ80 worked around, ");
+ }
pr_cont("Nehalem events, ");
break;
x86_pmu.enable_all = intel_pmu_nhm_enable_all;
x86_pmu.pebs_constraints = intel_westmere_pebs_event_constraints;
x86_pmu.extra_regs = intel_westmere_extra_regs;
+
+ /* UOPS_ISSUED.STALLED_CYCLES */
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x180010e;
+ /* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x1803fb1;
+
pr_cont("Westmere events, ");
break;
x86_pmu.event_constraints = intel_snb_event_constraints;
x86_pmu.pebs_constraints = intel_snb_pebs_events;
+
+ /* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x180010e;
+ /* UOPS_DISPATCHED.THREAD,c=1,i=1 to count stall cycles*/
+ intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x18001b1;
+
pr_cont("SandyBridge events, ");
break;
.opcode = P4_OPCODE(P4_EVENT_MISPRED_BRANCH_RETIRED),
.escr_msr = { MSR_P4_CRU_ESCR0, MSR_P4_CRU_ESCR1 },
.escr_emask =
- P4_ESCR_EMASK_BIT(P4_EVENT_MISPRED_BRANCH_RETIRED, NBOGUS),
+ P4_ESCR_EMASK_BIT(P4_EVENT_MISPRED_BRANCH_RETIRED, NBOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_X87_ASSIST] = {
int idx, handled = 0;
u64 val;
- data.addr = 0;
- data.raw = NULL;
+ perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
x86_pmu_stop(event, 0);
}
- if (handled) {
- /* p4 quirk: unmask it again */
- apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+ if (handled)
inc_irq_stat(apic_perf_irqs);
- }
+
+ /*
+ * When dealing with the unmasking of the LVTPC on P4 perf hw, it has
+ * been observed that the OVF bit flag has to be cleared first _before_
+ * the LVTPC can be unmasked.
+ *
+ * The reason is the NMI line will continue to be asserted while the OVF
+ * bit is set. This causes a second NMI to generate if the LVTPC is
+ * unmasked before the OVF bit is cleared, leading to unknown NMI
+ * messages.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
return handled;
}
{
unsigned int low, high;
- /* If we get stripped -- indexig fails */
+ /* If we get stripped -- indexing fails */
BUILD_BUG_ON(ARCH_P4_MAX_CCCR > X86_PMC_MAX_GENERIC);
rdmsr(MSR_IA32_MISC_ENABLE, low, high);
set_io_apic_irq_attr(&attr, idx, line, it->trigger, it->polarity);
- return io_apic_setup_irq_pin(*out_hwirq, cpu_to_node(0), &attr);
+ return io_apic_setup_irq_pin_once(*out_hwirq, cpu_to_node(0), &attr);
}
static void __init ioapic_add_ofnode(struct device_node *np)
}
EXPORT_SYMBOL_GPL(print_context_stack_bp);
-
-static void
-print_trace_warning_symbol(void *data, char *msg, unsigned long symbol)
-{
- printk(data);
- print_symbol(msg, symbol);
- printk("\n");
-}
-
-static void print_trace_warning(void *data, char *msg)
-{
- printk("%s%s\n", (char *)data, msg);
-}
-
static int print_trace_stack(void *data, char *name)
{
printk("%s <%s> ", (char *)data, name);
}
static const struct stacktrace_ops print_trace_ops = {
- .warning = print_trace_warning,
- .warning_symbol = print_trace_warning_symbol,
.stack = print_trace_stack,
.address = print_trace_address,
.walk_stack = print_context_stack,
/*
* Common hpet info
*/
-static unsigned long hpet_period;
+static unsigned long hpet_freq;
static void hpet_legacy_set_mode(enum clock_event_mode mode,
struct clock_event_device *evt);
.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
.set_mode = hpet_legacy_set_mode,
.set_next_event = hpet_legacy_next_event,
- .shift = 32,
.irq = 0,
.rating = 50,
};
/* Start HPET legacy interrupts */
hpet_enable_legacy_int();
- /*
- * The mult factor is defined as (include/linux/clockchips.h)
- * mult/2^shift = cyc/ns (in contrast to ns/cyc in clocksource.h)
- * hpet_period is in units of femtoseconds (per cycle), so
- * mult/2^shift = cyc/ns = 10^6/hpet_period
- * mult = (10^6 * 2^shift)/hpet_period
- * mult = (FSEC_PER_NSEC << hpet_clockevent.shift)/hpet_period
- */
- hpet_clockevent.mult = div_sc((unsigned long) FSEC_PER_NSEC,
- hpet_period, hpet_clockevent.shift);
- /* Calculate the min / max delta */
- hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
- &hpet_clockevent);
- /* Setup minimum reprogramming delta. */
- hpet_clockevent.min_delta_ns = clockevent_delta2ns(HPET_MIN_PROG_DELTA,
- &hpet_clockevent);
-
/*
* Start hpet with the boot cpu mask and make it
* global after the IO_APIC has been initialized.
*/
hpet_clockevent.cpumask = cpumask_of(smp_processor_id());
- clockevents_register_device(&hpet_clockevent);
+ clockevents_config_and_register(&hpet_clockevent, hpet_freq,
+ HPET_MIN_PROG_DELTA, 0x7FFFFFFF);
global_clock_event = &hpet_clockevent;
printk(KERN_DEBUG "hpet clockevent registered\n");
}
static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
{
struct clock_event_device *evt = &hdev->evt;
- uint64_t hpet_freq;
WARN_ON(cpu != smp_processor_id());
if (!(hdev->flags & HPET_DEV_VALID))
evt->set_mode = hpet_msi_set_mode;
evt->set_next_event = hpet_msi_next_event;
- evt->shift = 32;
-
- /*
- * The period is a femto seconds value. We need to calculate the
- * scaled math multiplication factor for nanosecond to hpet tick
- * conversion.
- */
- hpet_freq = FSEC_PER_SEC;
- do_div(hpet_freq, hpet_period);
- evt->mult = div_sc((unsigned long) hpet_freq,
- NSEC_PER_SEC, evt->shift);
- /* Calculate the max delta */
- evt->max_delta_ns = clockevent_delta2ns(0x7FFFFFFF, evt);
- /* 5 usec minimum reprogramming delta. */
- evt->min_delta_ns = 5000;
-
evt->cpumask = cpumask_of(hdev->cpu);
- clockevents_register_device(evt);
+
+ clockevents_config_and_register(evt, hpet_freq, HPET_MIN_PROG_DELTA,
+ 0x7FFFFFFF);
}
#ifdef CONFIG_HPET
static int hpet_clocksource_register(void)
{
u64 start, now;
- u64 hpet_freq;
cycle_t t1;
/* Start the counter */
return -ENODEV;
}
- /*
- * The definition of mult is (include/linux/clocksource.h)
- * mult/2^shift = ns/cyc and hpet_period is in units of fsec/cyc
- * so we first need to convert hpet_period to ns/cyc units:
- * mult/2^shift = ns/cyc = hpet_period/10^6
- * mult = (hpet_period * 2^shift)/10^6
- * mult = (hpet_period << shift)/FSEC_PER_NSEC
- */
-
- /* Need to convert hpet_period (fsec/cyc) to cyc/sec:
- *
- * cyc/sec = FSEC_PER_SEC/hpet_period(fsec/cyc)
- * cyc/sec = (FSEC_PER_NSEC * NSEC_PER_SEC)/hpet_period
- */
- hpet_freq = FSEC_PER_SEC;
- do_div(hpet_freq, hpet_period);
clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);
-
return 0;
}
*/
int __init hpet_enable(void)
{
+ unsigned long hpet_period;
unsigned int id;
+ u64 freq;
int i;
if (!is_hpet_capable())
if (hpet_period < HPET_MIN_PERIOD || hpet_period > HPET_MAX_PERIOD)
goto out_nohpet;
+ /*
+ * The period is a femto seconds value. Convert it to a
+ * frequency.
+ */
+ freq = FSEC_PER_SEC;
+ do_div(freq, hpet_period);
+ hpet_freq = freq;
+
/*
* Read the HPET ID register to retrieve the IRQ routing
* information and the number of channels
.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
.set_mode = init_pit_timer,
.set_next_event = pit_next_event,
- .shift = 32,
.irq = 0,
};
* IO_APIC has been initialized.
*/
pit_ce.cpumask = cpumask_of(smp_processor_id());
- pit_ce.mult = div_sc(CLOCK_TICK_RATE, NSEC_PER_SEC, pit_ce.shift);
- pit_ce.max_delta_ns = clockevent_delta2ns(0x7FFF, &pit_ce);
- pit_ce.min_delta_ns = clockevent_delta2ns(0xF, &pit_ce);
- clockevents_register_device(&pit_ce);
+ clockevents_config_and_register(&pit_ce, CLOCK_TICK_RATE, 0xF, 0x7FFF);
global_clock_event = &pit_ce;
}
#ifndef CONFIG_X86_64
-/*
- * Since the PIT overflows every tick, its not very useful
- * to just read by itself. So use jiffies to emulate a free
- * running counter:
- */
-static cycle_t pit_read(struct clocksource *cs)
-{
- static int old_count;
- static u32 old_jifs;
- unsigned long flags;
- int count;
- u32 jifs;
-
- raw_spin_lock_irqsave(&i8253_lock, flags);
- /*
- * Although our caller may have the read side of xtime_lock,
- * this is now a seqlock, and we are cheating in this routine
- * by having side effects on state that we cannot undo if
- * there is a collision on the seqlock and our caller has to
- * retry. (Namely, old_jifs and old_count.) So we must treat
- * jiffies as volatile despite the lock. We read jiffies
- * before latching the timer count to guarantee that although
- * the jiffies value might be older than the count (that is,
- * the counter may underflow between the last point where
- * jiffies was incremented and the point where we latch the
- * count), it cannot be newer.
- */
- jifs = jiffies;
- outb_pit(0x00, PIT_MODE); /* latch the count ASAP */
- count = inb_pit(PIT_CH0); /* read the latched count */
- count |= inb_pit(PIT_CH0) << 8;
-
- /* VIA686a test code... reset the latch if count > max + 1 */
- if (count > LATCH) {
- outb_pit(0x34, PIT_MODE);
- outb_pit(LATCH & 0xff, PIT_CH0);
- outb_pit(LATCH >> 8, PIT_CH0);
- count = LATCH - 1;
- }
-
- /*
- * It's possible for count to appear to go the wrong way for a
- * couple of reasons:
- *
- * 1. The timer counter underflows, but we haven't handled the
- * resulting interrupt and incremented jiffies yet.
- * 2. Hardware problem with the timer, not giving us continuous time,
- * the counter does small "jumps" upwards on some Pentium systems,
- * (see c't 95/10 page 335 for Neptun bug.)
- *
- * Previous attempts to handle these cases intelligently were
- * buggy, so we just do the simple thing now.
- */
- if (count > old_count && jifs == old_jifs)
- count = old_count;
-
- old_count = count;
- old_jifs = jifs;
-
- raw_spin_unlock_irqrestore(&i8253_lock, flags);
-
- count = (LATCH - 1) - count;
-
- return (cycle_t)(jifs * LATCH) + count;
-}
-
-static struct clocksource pit_cs = {
- .name = "pit",
- .rating = 110,
- .read = pit_read,
- .mask = CLOCKSOURCE_MASK(32),
- .mult = 0,
- .shift = 20,
-};
-
static int __init init_pit_clocksource(void)
{
/*
pit_ce.mode != CLOCK_EVT_MODE_PERIODIC)
return 0;
- pit_cs.mult = clocksource_hz2mult(CLOCK_TICK_RATE, pit_cs.shift);
-
- return clocksource_register(&pit_cs);
+ return clocksource_i8253_init();
}
arch_initcall(init_pit_clocksource);
-
#endif /* !CONFIG_X86_64 */
struct pt_regs *regs)
{
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ unsigned long flags;
/* This is possible if op is under delayed unoptimizing */
if (kprobe_disabled(&op->kp))
return;
- preempt_disable();
+ local_irq_save(flags);
if (kprobe_running()) {
kprobes_inc_nmissed_count(&op->kp);
} else {
opt_pre_handler(&op->kp, regs);
__this_cpu_write(current_kprobe, NULL);
}
- preempt_enable_no_resched();
+ local_irq_restore(flags);
}
static int __kprobes copy_optimized_instructions(u8 *dest, u8 *src)
#include <asm/x86_init.h>
#include <asm/reboot.h>
-#define KVM_SCALE 22
-
static int kvmclock = 1;
static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
.read = kvm_clock_get_cycles,
.rating = 400,
.mask = CLOCKSOURCE_MASK(64),
- .mult = 1 << KVM_SCALE,
- .shift = KVM_SCALE,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
machine_ops.crash_shutdown = kvm_crash_shutdown;
#endif
kvm_get_preset_lpj();
- clocksource_register(&kvm_clock);
+ clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
pv_info.paravirt_enabled = 1;
pv_info.name = "KVM";
#include <linux/bug.h>
#include <linux/mm.h>
#include <linux/gfp.h>
+#include <linux/jump_label.h>
#include <asm/system.h>
#include <asm/page.h>
static int
check_slot(unsigned long mpc_new_phys, unsigned long mpc_new_length, int count)
{
- int ret = 0;
-
if (!mpc_new_phys || count <= mpc_new_length) {
WARN(1, "update_mptable: No spare slots (length: %x)\n", count);
return -1;
}
- return ret;
+ return 0;
}
#else /* CONFIG_X86_IO_APIC */
static
+++ /dev/null
-/*
- * Dynamic DMA mapping support for AMD Hammer.
- *
- * Use the integrated AGP GART in the Hammer northbridge as an IOMMU for PCI.
- * This allows to use PCI devices that only support 32bit addresses on systems
- * with more than 4GB.
- *
- * See Documentation/PCI/PCI-DMA-mapping.txt for the interface specification.
- *
- * Copyright 2002 Andi Kleen, SuSE Labs.
- * Subject to the GNU General Public License v2 only.
- */
-
-#include <linux/types.h>
-#include <linux/ctype.h>
-#include <linux/agp_backend.h>
-#include <linux/init.h>
-#include <linux/mm.h>
-#include <linux/sched.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/pci.h>
-#include <linux/module.h>
-#include <linux/topology.h>
-#include <linux/interrupt.h>
-#include <linux/bitmap.h>
-#include <linux/kdebug.h>
-#include <linux/scatterlist.h>
-#include <linux/iommu-helper.h>
-#include <linux/syscore_ops.h>
-#include <linux/io.h>
-#include <linux/gfp.h>
-#include <asm/atomic.h>
-#include <asm/mtrr.h>
-#include <asm/pgtable.h>
-#include <asm/proto.h>
-#include <asm/iommu.h>
-#include <asm/gart.h>
-#include <asm/cacheflush.h>
-#include <asm/swiotlb.h>
-#include <asm/dma.h>
-#include <asm/amd_nb.h>
-#include <asm/x86_init.h>
-#include <asm/iommu_table.h>
-
-static unsigned long iommu_bus_base; /* GART remapping area (physical) */
-static unsigned long iommu_size; /* size of remapping area bytes */
-static unsigned long iommu_pages; /* .. and in pages */
-
-static u32 *iommu_gatt_base; /* Remapping table */
-
-static dma_addr_t bad_dma_addr;
-
-/*
- * If this is disabled the IOMMU will use an optimized flushing strategy
- * of only flushing when an mapping is reused. With it true the GART is
- * flushed for every mapping. Problem is that doing the lazy flush seems
- * to trigger bugs with some popular PCI cards, in particular 3ware (but
- * has been also also seen with Qlogic at least).
- */
-static int iommu_fullflush = 1;
-
-/* Allocation bitmap for the remapping area: */
-static DEFINE_SPINLOCK(iommu_bitmap_lock);
-/* Guarded by iommu_bitmap_lock: */
-static unsigned long *iommu_gart_bitmap;
-
-static u32 gart_unmapped_entry;
-
-#define GPTE_VALID 1
-#define GPTE_COHERENT 2
-#define GPTE_ENCODE(x) \
- (((x) & 0xfffff000) | (((x) >> 32) << 4) | GPTE_VALID | GPTE_COHERENT)
-#define GPTE_DECODE(x) (((x) & 0xfffff000) | (((u64)(x) & 0xff0) << 28))
-
-#define EMERGENCY_PAGES 32 /* = 128KB */
-
-#ifdef CONFIG_AGP
-#define AGPEXTERN extern
-#else
-#define AGPEXTERN
-#endif
-
-/* GART can only remap to physical addresses < 1TB */
-#define GART_MAX_PHYS_ADDR (1ULL << 40)
-
-/* backdoor interface to AGP driver */
-AGPEXTERN int agp_memory_reserved;
-AGPEXTERN __u32 *agp_gatt_table;
-
-static unsigned long next_bit; /* protected by iommu_bitmap_lock */
-static bool need_flush; /* global flush state. set for each gart wrap */
-
-static unsigned long alloc_iommu(struct device *dev, int size,
- unsigned long align_mask)
-{
- unsigned long offset, flags;
- unsigned long boundary_size;
- unsigned long base_index;
-
- base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev),
- PAGE_SIZE) >> PAGE_SHIFT;
- boundary_size = ALIGN((u64)dma_get_seg_boundary(dev) + 1,
- PAGE_SIZE) >> PAGE_SHIFT;
-
- spin_lock_irqsave(&iommu_bitmap_lock, flags);
- offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, next_bit,
- size, base_index, boundary_size, align_mask);
- if (offset == -1) {
- need_flush = true;
- offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, 0,
- size, base_index, boundary_size,
- align_mask);
- }
- if (offset != -1) {
- next_bit = offset+size;
- if (next_bit >= iommu_pages) {
- next_bit = 0;
- need_flush = true;
- }
- }
- if (iommu_fullflush)
- need_flush = true;
- spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
-
- return offset;
-}
-
-static void free_iommu(unsigned long offset, int size)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&iommu_bitmap_lock, flags);
- bitmap_clear(iommu_gart_bitmap, offset, size);
- if (offset >= next_bit)
- next_bit = offset + size;
- spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
-}
-
-/*
- * Use global flush state to avoid races with multiple flushers.
- */
-static void flush_gart(void)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&iommu_bitmap_lock, flags);
- if (need_flush) {
- amd_flush_garts();
- need_flush = false;
- }
- spin_unlock_irqrestore(&iommu_bitmap_lock, flags);
-}
-
-#ifdef CONFIG_IOMMU_LEAK
-/* Debugging aid for drivers that don't free their IOMMU tables */
-static int leak_trace;
-static int iommu_leak_pages = 20;
-
-static void dump_leak(void)
-{
- static int dump;
-
- if (dump)
- return;
- dump = 1;
-
- show_stack(NULL, NULL);
- debug_dma_dump_mappings(NULL);
-}
-#endif
-
-static void iommu_full(struct device *dev, size_t size, int dir)
-{
- /*
- * Ran out of IOMMU space for this operation. This is very bad.
- * Unfortunately the drivers cannot handle this operation properly.
- * Return some non mapped prereserved space in the aperture and
- * let the Northbridge deal with it. This will result in garbage
- * in the IO operation. When the size exceeds the prereserved space
- * memory corruption will occur or random memory will be DMAed
- * out. Hopefully no network devices use single mappings that big.
- */
-
- dev_err(dev, "PCI-DMA: Out of IOMMU space for %lu bytes\n", size);
-
- if (size > PAGE_SIZE*EMERGENCY_PAGES) {
- if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL)
- panic("PCI-DMA: Memory would be corrupted\n");
- if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL)
- panic(KERN_ERR
- "PCI-DMA: Random memory would be DMAed\n");
- }
-#ifdef CONFIG_IOMMU_LEAK
- dump_leak();
-#endif
-}
-
-static inline int
-need_iommu(struct device *dev, unsigned long addr, size_t size)
-{
- return force_iommu || !dma_capable(dev, addr, size);
-}
-
-static inline int
-nonforced_iommu(struct device *dev, unsigned long addr, size_t size)
-{
- return !dma_capable(dev, addr, size);
-}
-
-/* Map a single continuous physical area into the IOMMU.
- * Caller needs to check if the iommu is needed and flush.
- */
-static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
- size_t size, int dir, unsigned long align_mask)
-{
- unsigned long npages = iommu_num_pages(phys_mem, size, PAGE_SIZE);
- unsigned long iommu_page;
- int i;
-
- if (unlikely(phys_mem + size > GART_MAX_PHYS_ADDR))
- return bad_dma_addr;
-
- iommu_page = alloc_iommu(dev, npages, align_mask);
- if (iommu_page == -1) {
- if (!nonforced_iommu(dev, phys_mem, size))
- return phys_mem;
- if (panic_on_overflow)
- panic("dma_map_area overflow %lu bytes\n", size);
- iommu_full(dev, size, dir);
- return bad_dma_addr;
- }
-
- for (i = 0; i < npages; i++) {
- iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
- phys_mem += PAGE_SIZE;
- }
- return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
-}
-
-/* Map a single area into the IOMMU */
-static dma_addr_t gart_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- struct dma_attrs *attrs)
-{
- unsigned long bus;
- phys_addr_t paddr = page_to_phys(page) + offset;
-
- if (!dev)
- dev = &x86_dma_fallback_dev;
-
- if (!need_iommu(dev, paddr, size))
- return paddr;
-
- bus = dma_map_area(dev, paddr, size, dir, 0);
- flush_gart();
-
- return bus;
-}
-
-/*
- * Free a DMA mapping.
- */
-static void gart_unmap_page(struct device *dev, dma_addr_t dma_addr,
- size_t size, enum dma_data_direction dir,
- struct dma_attrs *attrs)
-{
- unsigned long iommu_page;
- int npages;
- int i;
-
- if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE ||
- dma_addr >= iommu_bus_base + iommu_size)
- return;
-
- iommu_page = (dma_addr - iommu_bus_base)>>PAGE_SHIFT;
- npages = iommu_num_pages(dma_addr, size, PAGE_SIZE);
- for (i = 0; i < npages; i++) {
- iommu_gatt_base[iommu_page + i] = gart_unmapped_entry;
- }
- free_iommu(iommu_page, npages);
-}
-
-/*
- * Wrapper for pci_unmap_single working with scatterlists.
- */
-static void gart_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
- enum dma_data_direction dir, struct dma_attrs *attrs)
-{
- struct scatterlist *s;
- int i;
-
- for_each_sg(sg, s, nents, i) {
- if (!s->dma_length || !s->length)
- break;
- gart_unmap_page(dev, s->dma_address, s->dma_length, dir, NULL);
- }
-}
-
-/* Fallback for dma_map_sg in case of overflow */
-static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg,
- int nents, int dir)
-{
- struct scatterlist *s;
- int i;
-
-#ifdef CONFIG_IOMMU_DEBUG
- pr_debug("dma_map_sg overflow\n");
-#endif
-
- for_each_sg(sg, s, nents, i) {
- unsigned long addr = sg_phys(s);
-
- if (nonforced_iommu(dev, addr, s->length)) {
- addr = dma_map_area(dev, addr, s->length, dir, 0);
- if (addr == bad_dma_addr) {
- if (i > 0)
- gart_unmap_sg(dev, sg, i, dir, NULL);
- nents = 0;
- sg[0].dma_length = 0;
- break;
- }
- }
- s->dma_address = addr;
- s->dma_length = s->length;
- }
- flush_gart();
-
- return nents;
-}
-
-/* Map multiple scatterlist entries continuous into the first. */
-static int __dma_map_cont(struct device *dev, struct scatterlist *start,
- int nelems, struct scatterlist *sout,
- unsigned long pages)
-{
- unsigned long iommu_start = alloc_iommu(dev, pages, 0);
- unsigned long iommu_page = iommu_start;
- struct scatterlist *s;
- int i;
-
- if (iommu_start == -1)
- return -1;
-
- for_each_sg(start, s, nelems, i) {
- unsigned long pages, addr;
- unsigned long phys_addr = s->dma_address;
-
- BUG_ON(s != start && s->offset);
- if (s == start) {
- sout->dma_address = iommu_bus_base;
- sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
- sout->dma_length = s->length;
- } else {
- sout->dma_length += s->length;
- }
-
- addr = phys_addr;
- pages = iommu_num_pages(s->offset, s->length, PAGE_SIZE);
- while (pages--) {
- iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
- addr += PAGE_SIZE;
- iommu_page++;
- }
- }
- BUG_ON(iommu_page - iommu_start != pages);
-
- return 0;
-}
-
-static inline int
-dma_map_cont(struct device *dev, struct scatterlist *start, int nelems,
- struct scatterlist *sout, unsigned long pages, int need)
-{
- if (!need) {
- BUG_ON(nelems != 1);
- sout->dma_address = start->dma_address;
- sout->dma_length = start->length;
- return 0;
- }
- return __dma_map_cont(dev, start, nelems, sout, pages);
-}
-
-/*
- * DMA map all entries in a scatterlist.
- * Merge chunks that have page aligned sizes into a continuous mapping.
- */
-static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
- enum dma_data_direction dir, struct dma_attrs *attrs)
-{
- struct scatterlist *s, *ps, *start_sg, *sgmap;
- int need = 0, nextneed, i, out, start;
- unsigned long pages = 0;
- unsigned int seg_size;
- unsigned int max_seg_size;
-
- if (nents == 0)
- return 0;
-
- if (!dev)
- dev = &x86_dma_fallback_dev;
-
- out = 0;
- start = 0;
- start_sg = sg;
- sgmap = sg;
- seg_size = 0;
- max_seg_size = dma_get_max_seg_size(dev);
- ps = NULL; /* shut up gcc */
-
- for_each_sg(sg, s, nents, i) {
- dma_addr_t addr = sg_phys(s);
-
- s->dma_address = addr;
- BUG_ON(s->length == 0);
-
- nextneed = need_iommu(dev, addr, s->length);
-
- /* Handle the previous not yet processed entries */
- if (i > start) {
- /*
- * Can only merge when the last chunk ends on a
- * page boundary and the new one doesn't have an
- * offset.
- */
- if (!iommu_merge || !nextneed || !need || s->offset ||
- (s->length + seg_size > max_seg_size) ||
- (ps->offset + ps->length) % PAGE_SIZE) {
- if (dma_map_cont(dev, start_sg, i - start,
- sgmap, pages, need) < 0)
- goto error;
- out++;
-
- seg_size = 0;
- sgmap = sg_next(sgmap);
- pages = 0;
- start = i;
- start_sg = s;
- }
- }
-
- seg_size += s->length;
- need = nextneed;
- pages += iommu_num_pages(s->offset, s->length, PAGE_SIZE);
- ps = s;
- }
- if (dma_map_cont(dev, start_sg, i - start, sgmap, pages, need) < 0)
- goto error;
- out++;
- flush_gart();
- if (out < nents) {
- sgmap = sg_next(sgmap);
- sgmap->dma_length = 0;
- }
- return out;
-
-error:
- flush_gart();
- gart_unmap_sg(dev, sg, out, dir, NULL);
-
- /* When it was forced or merged try again in a dumb way */
- if (force_iommu || iommu_merge) {
- out = dma_map_sg_nonforce(dev, sg, nents, dir);
- if (out > 0)
- return out;
- }
- if (panic_on_overflow)
- panic("dma_map_sg: overflow on %lu pages\n", pages);
-
- iommu_full(dev, pages << PAGE_SHIFT, dir);
- for_each_sg(sg, s, nents, i)
- s->dma_address = bad_dma_addr;
- return 0;
-}
-
-/* allocate and map a coherent mapping */
-static void *
-gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr,
- gfp_t flag)
-{
- dma_addr_t paddr;
- unsigned long align_mask;
- struct page *page;
-
- if (force_iommu && !(flag & GFP_DMA)) {
- flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32);
- page = alloc_pages(flag | __GFP_ZERO, get_order(size));
- if (!page)
- return NULL;
-
- align_mask = (1UL << get_order(size)) - 1;
- paddr = dma_map_area(dev, page_to_phys(page), size,
- DMA_BIDIRECTIONAL, align_mask);
-
- flush_gart();
- if (paddr != bad_dma_addr) {
- *dma_addr = paddr;
- return page_address(page);
- }
- __free_pages(page, get_order(size));
- } else
- return dma_generic_alloc_coherent(dev, size, dma_addr, flag);
-
- return NULL;
-}
-
-/* free a coherent mapping */
-static void
-gart_free_coherent(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_addr)
-{
- gart_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL, NULL);
- free_pages((unsigned long)vaddr, get_order(size));
-}
-
-static int gart_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
- return (dma_addr == bad_dma_addr);
-}
-
-static int no_agp;
-
-static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size)
-{
- unsigned long a;
-
- if (!iommu_size) {
- iommu_size = aper_size;
- if (!no_agp)
- iommu_size /= 2;
- }
-
- a = aper + iommu_size;
- iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;
-
- if (iommu_size < 64*1024*1024) {
- pr_warning(
- "PCI-DMA: Warning: Small IOMMU %luMB."
- " Consider increasing the AGP aperture in BIOS\n",
- iommu_size >> 20);
- }
-
- return iommu_size;
-}
-
-static __init unsigned read_aperture(struct pci_dev *dev, u32 *size)
-{
- unsigned aper_size = 0, aper_base_32, aper_order;
- u64 aper_base;
-
- pci_read_config_dword(dev, AMD64_GARTAPERTUREBASE, &aper_base_32);
- pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &aper_order);
- aper_order = (aper_order >> 1) & 7;
-
- aper_base = aper_base_32 & 0x7fff;
- aper_base <<= 25;
-
- aper_size = (32 * 1024 * 1024) << aper_order;
- if (aper_base + aper_size > 0x100000000UL || !aper_size)
- aper_base = 0;
-
- *size = aper_size;
- return aper_base;
-}
-
-static void enable_gart_translations(void)
-{
- int i;
-
- if (!amd_nb_has_feature(AMD_NB_GART))
- return;
-
- for (i = 0; i < amd_nb_num(); i++) {
- struct pci_dev *dev = node_to_amd_nb(i)->misc;
-
- enable_gart_translation(dev, __pa(agp_gatt_table));
- }
-
- /* Flush the GART-TLB to remove stale entries */
- amd_flush_garts();
-}
-
-/*
- * If fix_up_north_bridges is set, the north bridges have to be fixed up on
- * resume in the same way as they are handled in gart_iommu_hole_init().
- */
-static bool fix_up_north_bridges;
-static u32 aperture_order;
-static u32 aperture_alloc;
-
-void set_up_gart_resume(u32 aper_order, u32 aper_alloc)
-{
- fix_up_north_bridges = true;
- aperture_order = aper_order;
- aperture_alloc = aper_alloc;
-}
-
-static void gart_fixup_northbridges(void)
-{
- int i;
-
- if (!fix_up_north_bridges)
- return;
-
- if (!amd_nb_has_feature(AMD_NB_GART))
- return;
-
- pr_info("PCI-DMA: Restoring GART aperture settings\n");
-
- for (i = 0; i < amd_nb_num(); i++) {
- struct pci_dev *dev = node_to_amd_nb(i)->misc;
-
- /*
- * Don't enable translations just yet. That is the next
- * step. Restore the pre-suspend aperture settings.
- */
- gart_set_size_and_enable(dev, aperture_order);
- pci_write_config_dword(dev, AMD64_GARTAPERTUREBASE, aperture_alloc >> 25);
- }
-}
-
-static void gart_resume(void)
-{
- pr_info("PCI-DMA: Resuming GART IOMMU\n");
-
- gart_fixup_northbridges();
-
- enable_gart_translations();
-}
-
-static struct syscore_ops gart_syscore_ops = {
- .resume = gart_resume,
-
-};
-
-/*
- * Private Northbridge GATT initialization in case we cannot use the
- * AGP driver for some reason.
- */
-static __init int init_amd_gatt(struct agp_kern_info *info)
-{
- unsigned aper_size, gatt_size, new_aper_size;
- unsigned aper_base, new_aper_base;
- struct pci_dev *dev;
- void *gatt;
- int i;
-
- pr_info("PCI-DMA: Disabling AGP.\n");
-
- aper_size = aper_base = info->aper_size = 0;
- dev = NULL;
- for (i = 0; i < amd_nb_num(); i++) {
- dev = node_to_amd_nb(i)->misc;
- new_aper_base = read_aperture(dev, &new_aper_size);
- if (!new_aper_base)
- goto nommu;
-
- if (!aper_base) {
- aper_size = new_aper_size;
- aper_base = new_aper_base;
- }
- if (aper_size != new_aper_size || aper_base != new_aper_base)
- goto nommu;
- }
- if (!aper_base)
- goto nommu;
-
- info->aper_base = aper_base;
- info->aper_size = aper_size >> 20;
-
- gatt_size = (aper_size >> PAGE_SHIFT) * sizeof(u32);
- gatt = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
- get_order(gatt_size));
- if (!gatt)
- panic("Cannot allocate GATT table");
- if (set_memory_uc((unsigned long)gatt, gatt_size >> PAGE_SHIFT))
- panic("Could not set GART PTEs to uncacheable pages");
-
- agp_gatt_table = gatt;
-
- register_syscore_ops(&gart_syscore_ops);
-
- flush_gart();
-
- pr_info("PCI-DMA: aperture base @ %x size %u KB\n",
- aper_base, aper_size>>10);
-
- return 0;
-
- nommu:
- /* Should not happen anymore */
- pr_warning("PCI-DMA: More than 4GB of RAM and no IOMMU\n"
- "falling back to iommu=soft.\n");
- return -1;
-}
-
-static struct dma_map_ops gart_dma_ops = {
- .map_sg = gart_map_sg,
- .unmap_sg = gart_unmap_sg,
- .map_page = gart_map_page,
- .unmap_page = gart_unmap_page,
- .alloc_coherent = gart_alloc_coherent,
- .free_coherent = gart_free_coherent,
- .mapping_error = gart_mapping_error,
-};
-
-static void gart_iommu_shutdown(void)
-{
- struct pci_dev *dev;
- int i;
-
- /* don't shutdown it if there is AGP installed */
- if (!no_agp)
- return;
-
- if (!amd_nb_has_feature(AMD_NB_GART))
- return;
-
- for (i = 0; i < amd_nb_num(); i++) {
- u32 ctl;
-
- dev = node_to_amd_nb(i)->misc;
- pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &ctl);
-
- ctl &= ~GARTEN;
-
- pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
- }
-}
-
-int __init gart_iommu_init(void)
-{
- struct agp_kern_info info;
- unsigned long iommu_start;
- unsigned long aper_base, aper_size;
- unsigned long start_pfn, end_pfn;
- unsigned long scratch;
- long i;
-
- if (!amd_nb_has_feature(AMD_NB_GART))
- return 0;
-
-#ifndef CONFIG_AGP_AMD64
- no_agp = 1;
-#else
- /* Makefile puts PCI initialization via subsys_initcall first. */
- /* Add other AMD AGP bridge drivers here */
- no_agp = no_agp ||
- (agp_amd64_init() < 0) ||
- (agp_copy_info(agp_bridge, &info) < 0);
-#endif
-
- if (no_iommu ||
- (!force_iommu && max_pfn <= MAX_DMA32_PFN) ||
- !gart_iommu_aperture ||
- (no_agp && init_amd_gatt(&info) < 0)) {
- if (max_pfn > MAX_DMA32_PFN) {
- pr_warning("More than 4GB of memory but GART IOMMU not available.\n");
- pr_warning("falling back to iommu=soft.\n");
- }
- return 0;
- }
-
- /* need to map that range */
- aper_size = info.aper_size << 20;
- aper_base = info.aper_base;
- end_pfn = (aper_base>>PAGE_SHIFT) + (aper_size>>PAGE_SHIFT);
-
- if (end_pfn > max_low_pfn_mapped) {
- start_pfn = (aper_base>>PAGE_SHIFT);
- init_memory_mapping(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);
- }
-
- pr_info("PCI-DMA: using GART IOMMU.\n");
- iommu_size = check_iommu_size(info.aper_base, aper_size);
- iommu_pages = iommu_size >> PAGE_SHIFT;
-
- iommu_gart_bitmap = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
- get_order(iommu_pages/8));
- if (!iommu_gart_bitmap)
- panic("Cannot allocate iommu bitmap\n");
-
-#ifdef CONFIG_IOMMU_LEAK
- if (leak_trace) {
- int ret;
-
- ret = dma_debug_resize_entries(iommu_pages);
- if (ret)
- pr_debug("PCI-DMA: Cannot trace all the entries\n");
- }
-#endif
-
- /*
- * Out of IOMMU space handling.
- * Reserve some invalid pages at the beginning of the GART.
- */
- bitmap_set(iommu_gart_bitmap, 0, EMERGENCY_PAGES);
-
- pr_info("PCI-DMA: Reserving %luMB of IOMMU area in the AGP aperture\n",
- iommu_size >> 20);
-
- agp_memory_reserved = iommu_size;
- iommu_start = aper_size - iommu_size;
- iommu_bus_base = info.aper_base + iommu_start;
- bad_dma_addr = iommu_bus_base;
- iommu_gatt_base = agp_gatt_table + (iommu_start>>PAGE_SHIFT);
-
- /*
- * Unmap the IOMMU part of the GART. The alias of the page is
- * always mapped with cache enabled and there is no full cache
- * coherency across the GART remapping. The unmapping avoids
- * automatic prefetches from the CPU allocating cache lines in
- * there. All CPU accesses are done via the direct mapping to
- * the backing memory. The GART address is only used by PCI
- * devices.
- */
- set_memory_np((unsigned long)__va(iommu_bus_base),
- iommu_size >> PAGE_SHIFT);
- /*
- * Tricky. The GART table remaps the physical memory range,
- * so the CPU wont notice potential aliases and if the memory
- * is remapped to UC later on, we might surprise the PCI devices
- * with a stray writeout of a cacheline. So play it sure and
- * do an explicit, full-scale wbinvd() _after_ having marked all
- * the pages as Not-Present:
- */
- wbinvd();
-
- /*
- * Now all caches are flushed and we can safely enable
- * GART hardware. Doing it early leaves the possibility
- * of stale cache entries that can lead to GART PTE
- * errors.
- */
- enable_gart_translations();
-
- /*
- * Try to workaround a bug (thanks to BenH):
- * Set unmapped entries to a scratch page instead of 0.
- * Any prefetches that hit unmapped entries won't get an bus abort
- * then. (P2P bridge may be prefetching on DMA reads).
- */
- scratch = get_zeroed_page(GFP_KERNEL);
- if (!scratch)
- panic("Cannot allocate iommu scratch page");
- gart_unmapped_entry = GPTE_ENCODE(__pa(scratch));
- for (i = EMERGENCY_PAGES; i < iommu_pages; i++)
- iommu_gatt_base[i] = gart_unmapped_entry;
-
- flush_gart();
- dma_ops = &gart_dma_ops;
- x86_platform.iommu_shutdown = gart_iommu_shutdown;
- swiotlb = 0;
-
- return 0;
-}
-
-void __init gart_parse_options(char *p)
-{
- int arg;
-
-#ifdef CONFIG_IOMMU_LEAK
- if (!strncmp(p, "leak", 4)) {
- leak_trace = 1;
- p += 4;
- if (*p == '=')
- ++p;
- if (isdigit(*p) && get_option(&p, &arg))
- iommu_leak_pages = arg;
- }
-#endif
- if (isdigit(*p) && get_option(&p, &arg))
- iommu_size = arg;
- if (!strncmp(p, "fullflush", 9))
- iommu_fullflush = 1;
- if (!strncmp(p, "nofullflush", 11))
- iommu_fullflush = 0;
- if (!strncmp(p, "noagp", 5))
- no_agp = 1;
- if (!strncmp(p, "noaperture", 10))
- fix_aperture = 0;
- /* duplicated from pci-dma.c */
- if (!strncmp(p, "force", 5))
- gart_iommu_aperture_allowed = 1;
- if (!strncmp(p, "allowed", 7))
- gart_iommu_aperture_allowed = 1;
- if (!strncmp(p, "memaper", 7)) {
- fallback_aper_force = 1;
- p += 7;
- if (*p == '=') {
- ++p;
- if (get_option(&p, &arg))
- fallback_aper_order = arg;
- }
- }
-}
-IOMMU_INIT_POST(gart_iommu_hole_init);
struct iommu_table_entry *finish)
{
struct iommu_table_entry *p, *q, *x;
- char sym_p[KSYM_SYMBOL_LEN];
- char sym_q[KSYM_SYMBOL_LEN];
/* Simple cyclic dependency checker. */
for (p = start; p < finish; p++) {
q = find_dependents_of(start, finish, p);
x = find_dependents_of(start, finish, q);
if (p == x) {
- sprint_symbol(sym_p, (unsigned long)p->detect);
- sprint_symbol(sym_q, (unsigned long)q->detect);
-
- printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %s depends" \
- " on %s and vice-versa. BREAKING IT.\n",
- sym_p, sym_q);
+ printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %pS depends on %pS and vice-versa. BREAKING IT.\n",
+ p->detect, q->detect);
/* Heavy handed way..*/
x->depend = 0;
}
for (p = start; p < finish; p++) {
q = find_dependents_of(p, finish, p);
if (q && q > p) {
- sprint_symbol(sym_p, (unsigned long)p->detect);
- sprint_symbol(sym_q, (unsigned long)q->detect);
-
- printk(KERN_ERR "EXECUTION ORDER INVALID! %s "\
- "should be called before %s!\n",
- sym_p, sym_q);
+ printk(KERN_ERR "EXECUTION ORDER INVALID! %pS should be called before %pS!\n",
+ p->detect, q->detect);
}
}
}
unsigned len, type;
struct perf_event *bp;
+ if (ptrace_get_breakpoints(tsk) < 0)
+ return -ESRCH;
+
data &= ~DR_CONTROL_RESERVED;
old_dr7 = ptrace_get_dr7(thread->ptrace_bps);
restore:
}
goto restore;
}
+
+ ptrace_put_breakpoints(tsk);
+
return ((orig_ret < 0) ? orig_ret : rc);
}
if (n < HBP_NUM) {
struct perf_event *bp;
+
+ if (ptrace_get_breakpoints(tsk) < 0)
+ return -ESRCH;
+
bp = thread->ptrace_bps[n];
if (!bp)
- return 0;
- val = bp->hw.info.address;
+ val = 0;
+ else
+ val = bp->hw.info.address;
+
+ ptrace_put_breakpoints(tsk);
} else if (n == 6) {
val = thread->debugreg6;
} else if (n == 7) {
struct perf_event *bp;
struct thread_struct *t = &tsk->thread;
struct perf_event_attr attr;
+ int err = 0;
+
+ if (ptrace_get_breakpoints(tsk) < 0)
+ return -ESRCH;
if (!t->ptrace_bps[nr]) {
ptrace_breakpoint_init(&attr);
* writing for the user. And anyway this is the previous
* behaviour.
*/
- if (IS_ERR(bp))
- return PTR_ERR(bp);
+ if (IS_ERR(bp)) {
+ err = PTR_ERR(bp);
+ goto put;
+ }
t->ptrace_bps[nr] = bp;
} else {
- int err;
-
bp = t->ptrace_bps[nr];
attr = bp->attr;
attr.bp_addr = addr;
err = modify_user_hw_breakpoint(bp, &attr);
- if (err)
- return err;
}
-
- return 0;
+put:
+ ptrace_put_breakpoints(tsk);
+ return err;
}
/*
/* Get our own relocated address */
call 1f
1: popl %ebx
- subl $1b, %ebx
+ subl $(1b - r_base), %ebx
/* Compute the equivalent real-mode segment */
movl %ebx, %ecx
shrl $4, %ecx
/* Patch post-real-mode segment jump */
- movw dispatch_table(%ebx,%eax,2),%ax
- movw %ax, 101f(%ebx)
- movw %cx, 102f(%ebx)
+ movw (dispatch_table - r_base)(%ebx,%eax,2),%ax
+ movw %ax, (101f - r_base)(%ebx)
+ movw %cx, (102f - r_base)(%ebx)
/* Set up the IDT for real mode. */
- lidtl machine_real_restart_idt(%ebx)
+ lidtl (machine_real_restart_idt - r_base)(%ebx)
/*
* Set up a GDT from which we can load segment descriptors for real
* mode. The GDT is not used in real mode; it is just needed here to
* prepare the descriptors.
*/
- lgdtl machine_real_restart_gdt(%ebx)
+ lgdtl (machine_real_restart_gdt - r_base)(%ebx)
/*
* Load the data segment registers with 16-bit compatible values
}
/*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
*/
void smp_reschedule_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
inc_irq_stat(irq_resched_count);
+ scheduler_ipi();
/*
* KVM uses this interrupt to force a cpu out of guest mode
*/
#include <linux/uaccess.h>
#include <asm/stacktrace.h>
-static void save_stack_warning(void *data, char *msg)
-{
-}
-
-static void
-save_stack_warning_symbol(void *data, char *msg, unsigned long symbol)
-{
-}
-
static int save_stack_stack(void *data, char *name)
{
return 0;
}
static const struct stacktrace_ops save_stack_ops = {
- .warning = save_stack_warning,
- .warning_symbol = save_stack_warning_symbol,
.stack = save_stack_stack,
.address = save_stack_address,
.walk_stack = print_context_stack,
};
static const struct stacktrace_ops save_stack_ops_nosched = {
- .warning = save_stack_warning,
- .warning_symbol = save_stack_warning_symbol,
.stack = save_stack_stack,
.address = save_stack_address_nosched,
.walk_stack = print_context_stack,
.banner = default_banner,
},
+ .mapping = {
+ .pagetable_reserve = native_pagetable_reserve,
+ },
+
.paging = {
.pagetable_setup_start = native_pagetable_setup_start,
.pagetable_setup_done = native_pagetable_setup_done,
* kernel and insert a module (lg.ko) which allows us to run other Linux
* kernels the same way we'd run processes. We call the first kernel the Host,
* and the others the Guests. The program which sets up and configures Guests
- * (such as the example in Documentation/lguest/lguest.c) is called the
+ * (such as the example in Documentation/virtual/lguest/lguest.c) is called the
* Launcher.
*
* Secondly, we only run specially modified Guests, not normal kernels: setting
.rating = 200,
.read = lguest_clock_read,
.mask = CLOCKSOURCE_MASK(64),
- .mult = 1 << 22,
- .shift = 22,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
/* Set up the timer interrupt (0) to go to our simple timer routine */
irq_set_handler(0, lguest_time_irq);
- clocksource_register(&lguest_clock);
+ clocksource_register_hz(&lguest_clock, NSEC_PER_SEC);
/* We can't set cpumask in the initializer: damn C limitations! Set it
* here and register our timer device. */
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
/*
* Zero a page.
CFI_ENDPROC
ENDPROC(clear_page_c)
+ENTRY(clear_page_c_e)
+ CFI_STARTPROC
+ movl $4096,%ecx
+ xorl %eax,%eax
+ rep stosb
+ ret
+ CFI_ENDPROC
+ENDPROC(clear_page_c_e)
+
ENTRY(clear_page)
CFI_STARTPROC
xorl %eax,%eax
.Lclear_page_end:
ENDPROC(clear_page)
- /* Some CPUs run faster using the string instructions.
- It is also a lot simpler. Use this when possible */
+ /*
+ * Some CPUs support enhanced REP MOVSB/STOSB instructions.
+ * It is recommended to use this when possible.
+ * If enhanced REP MOVSB/STOSB is not available, try to use fast string.
+ * Otherwise, use original function.
+ *
+ */
#include <asm/cpufeature.h>
.section .altinstr_replacement,"ax"
1: .byte 0xeb /* jmp <disp8> */
.byte (clear_page_c - clear_page) - (2f - 1b) /* offset */
-2:
+2: .byte 0xeb /* jmp <disp8> */
+ .byte (clear_page_c_e - clear_page) - (3f - 2b) /* offset */
+3:
.previous
.section .altinstructions,"a"
- .align 8
- .quad clear_page
- .quad 1b
- .word X86_FEATURE_REP_GOOD
- .byte .Lclear_page_end - clear_page
- .byte 2b - 1b
+ altinstruction_entry clear_page,1b,X86_FEATURE_REP_GOOD,\
+ .Lclear_page_end-clear_page, 2b-1b
+ altinstruction_entry clear_page,2b,X86_FEATURE_ERMS, \
+ .Lclear_page_end-clear_page,3b-2b
.previous
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
- .macro ALTERNATIVE_JUMP feature,orig,alt
+/*
+ * By placing feature2 after feature1 in altinstructions section, we logically
+ * implement:
+ * If CPU has feature2, jmp to alt2 is used
+ * else if CPU has feature1, jmp to alt1 is used
+ * else jmp to orig is used.
+ */
+ .macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
0:
.byte 0xe9 /* 32bit jump */
.long \orig-1f /* by default jump to orig */
1:
.section .altinstr_replacement,"ax"
2: .byte 0xe9 /* near jump with 32bit immediate */
- .long \alt-1b /* offset */ /* or alternatively to alt */
+ .long \alt1-1b /* offset */ /* or alternatively to alt1 */
+3: .byte 0xe9 /* near jump with 32bit immediate */
+ .long \alt2-1b /* offset */ /* or alternatively to alt2 */
.previous
+
.section .altinstructions,"a"
- .align 8
- .quad 0b
- .quad 2b
- .word \feature /* when feature is set */
- .byte 5
- .byte 5
+ altinstruction_entry 0b,2b,\feature1,5,5
+ altinstruction_entry 0b,3b,\feature2,5,5
.previous
.endm
addq %rdx,%rcx
jc bad_to_user
cmpq TI_addr_limit(%rax),%rcx
- jae bad_to_user
- ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ ja bad_to_user
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
+ copy_user_generic_unrolled,copy_user_generic_string, \
+ copy_user_enhanced_fast_string
CFI_ENDPROC
ENDPROC(_copy_to_user)
addq %rdx,%rcx
jc bad_from_user
cmpq TI_addr_limit(%rax),%rcx
- jae bad_from_user
- ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,copy_user_generic_unrolled,copy_user_generic_string
+ ja bad_from_user
+ ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS, \
+ copy_user_generic_unrolled,copy_user_generic_string, \
+ copy_user_enhanced_fast_string
CFI_ENDPROC
ENDPROC(_copy_from_user)
.previous
CFI_ENDPROC
ENDPROC(copy_user_generic_string)
+
+/*
+ * Some CPUs are adding enhanced REP MOVSB/STOSB instructions.
+ * It's recommended to use enhanced REP MOVSB/STOSB if it's enabled.
+ *
+ * Input:
+ * rdi destination
+ * rsi source
+ * rdx count
+ *
+ * Output:
+ * eax uncopied bytes or 0 if successful.
+ */
+ENTRY(copy_user_enhanced_fast_string)
+ CFI_STARTPROC
+ andl %edx,%edx
+ jz 2f
+ movl %edx,%ecx
+1: rep
+ movsb
+2: xorl %eax,%eax
+ ret
+
+ .section .fixup,"ax"
+12: movl %ecx,%edx /* ecx is zerorest also */
+ jmp copy_user_handle_tail
+ .previous
+
+ .section __ex_table,"a"
+ .align 8
+ .quad 1b,12b
+ .previous
+ CFI_ENDPROC
+ENDPROC(copy_user_enhanced_fast_string)
#include <asm/cpufeature.h>
#include <asm/dwarf2.h>
+#include <asm/alternative-asm.h>
/*
* memcpy - Copy a memory block.
.Lmemcpy_e:
.previous
+/*
+ * memcpy_c_e() - enhanced fast string memcpy. This is faster and simpler than
+ * memcpy_c. Use memcpy_c_e when possible.
+ *
+ * This gets patched over the unrolled variant (below) via the
+ * alternative instructions framework:
+ */
+ .section .altinstr_replacement, "ax", @progbits
+.Lmemcpy_c_e:
+ movq %rdi, %rax
+
+ movl %edx, %ecx
+ rep movsb
+ ret
+.Lmemcpy_e_e:
+ .previous
+
ENTRY(__memcpy)
ENTRY(memcpy)
CFI_STARTPROC
jb .Lhandle_tail
/*
- * We check whether memory false dependece could occur,
+ * We check whether memory false dependence could occur,
* then jump to corresponding copy mode.
*/
cmp %dil, %sil
ENDPROC(__memcpy)
/*
- * Some CPUs run faster using the string copy instructions.
- * It is also a lot simpler. Use this when possible:
- */
-
- .section .altinstructions, "a"
- .align 8
- .quad memcpy
- .quad .Lmemcpy_c
- .word X86_FEATURE_REP_GOOD
-
- /*
+ * Some CPUs are adding enhanced REP MOVSB/STOSB feature
+ * If the feature is supported, memcpy_c_e() is the first choice.
+ * If enhanced rep movsb copy is not available, use fast string copy
+ * memcpy_c() when possible. This is faster and code is simpler than
+ * original memcpy().
+ * Otherwise, original memcpy() is used.
+ * In .altinstructions section, ERMS feature is placed after REG_GOOD
+ * feature to implement the right patch order.
+ *
* Replace only beginning, memcpy is used to apply alternatives,
* so it is silly to overwrite itself with nops - reboot is the
* only outcome...
*/
- .byte .Lmemcpy_e - .Lmemcpy_c
- .byte .Lmemcpy_e - .Lmemcpy_c
+ .section .altinstructions, "a"
+ altinstruction_entry memcpy,.Lmemcpy_c,X86_FEATURE_REP_GOOD,\
+ .Lmemcpy_e-.Lmemcpy_c,.Lmemcpy_e-.Lmemcpy_c
+ altinstruction_entry memcpy,.Lmemcpy_c_e,X86_FEATURE_ERMS, \
+ .Lmemcpy_e_e-.Lmemcpy_c_e,.Lmemcpy_e_e-.Lmemcpy_c_e
.previous
#define _STRING_C
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
#undef memmove
*/
ENTRY(memmove)
CFI_STARTPROC
+
/* Handle more 32bytes in loop */
mov %rdi, %rax
cmp $0x20, %rdx
/* Decide forward/backward copy mode */
cmp %rdi, %rsi
- jb 2f
+ jge .Lmemmove_begin_forward
+ mov %rsi, %r8
+ add %rdx, %r8
+ cmp %rdi, %r8
+ jg 2f
+.Lmemmove_begin_forward:
/*
* movsq instruction have many startup latency
* so we handle small size by general register.
rep movsq
movq %r11, (%r10)
jmp 13f
+.Lmemmove_end_forward:
+
/*
* Handle data backward by movsq.
*/
13:
retq
CFI_ENDPROC
+
+ .section .altinstr_replacement,"ax"
+.Lmemmove_begin_forward_efs:
+ /* Forward moving data. */
+ movq %rdx, %rcx
+ rep movsb
+ retq
+.Lmemmove_end_forward_efs:
+ .previous
+
+ .section .altinstructions,"a"
+ .align 8
+ .quad .Lmemmove_begin_forward
+ .quad .Lmemmove_begin_forward_efs
+ .word X86_FEATURE_ERMS
+ .byte .Lmemmove_end_forward-.Lmemmove_begin_forward
+ .byte .Lmemmove_end_forward_efs-.Lmemmove_begin_forward_efs
+ .previous
ENDPROC(memmove)
#include <linux/linkage.h>
#include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative-asm.h>
/*
- * ISO C memset - set a memory block to a byte value.
+ * ISO C memset - set a memory block to a byte value. This function uses fast
+ * string to get better performance than the original function. The code is
+ * simpler and shorter than the orignal function as well.
*
* rdi destination
* rsi value (char)
.Lmemset_e:
.previous
+/*
+ * ISO C memset - set a memory block to a byte value. This function uses
+ * enhanced rep stosb to override the fast string function.
+ * The code is simpler and shorter than the fast string function as well.
+ *
+ * rdi destination
+ * rsi value (char)
+ * rdx count (bytes)
+ *
+ * rax original destination
+ */
+ .section .altinstr_replacement, "ax", @progbits
+.Lmemset_c_e:
+ movq %rdi,%r9
+ movb %sil,%al
+ movl %edx,%ecx
+ rep stosb
+ movq %r9,%rax
+ ret
+.Lmemset_e_e:
+ .previous
+
ENTRY(memset)
ENTRY(__memset)
CFI_STARTPROC
ENDPROC(memset)
ENDPROC(__memset)
- /* Some CPUs run faster using the string instructions.
- It is also a lot simpler. Use this when possible */
-
-#include <asm/cpufeature.h>
-
+ /* Some CPUs support enhanced REP MOVSB/STOSB feature.
+ * It is recommended to use this when possible.
+ *
+ * If enhanced REP MOVSB/STOSB feature is not available, use fast string
+ * instructions.
+ *
+ * Otherwise, use original memset function.
+ *
+ * In .altinstructions section, ERMS feature is placed after REG_GOOD
+ * feature to implement the right patch order.
+ */
.section .altinstructions,"a"
- .align 8
- .quad memset
- .quad .Lmemset_c
- .word X86_FEATURE_REP_GOOD
- .byte .Lfinal - memset
- .byte .Lmemset_e - .Lmemset_c
+ altinstruction_entry memset,.Lmemset_c,X86_FEATURE_REP_GOOD,\
+ .Lfinal-memset,.Lmemset_e-.Lmemset_c
+ altinstruction_entry memset,.Lmemset_c_e,X86_FEATURE_ERMS, \
+ .Lfinal-memset,.Lmemset_e_e-.Lmemset_c_e
.previous
end, pgt_buf_start << PAGE_SHIFT, pgt_buf_top << PAGE_SHIFT);
}
+void __init native_pagetable_reserve(u64 start, u64 end)
+{
+ memblock_x86_reserve_range(start, end, "PGTABLE");
+}
+
struct map_range {
unsigned long start;
unsigned long end;
__flush_tlb_all();
+ /*
+ * Reserve the kernel pagetable pages we used (pgt_buf_start -
+ * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
+ * so that they can be reused for other purposes.
+ *
+ * On native it just means calling memblock_x86_reserve_range, on Xen it
+ * also means marking RW the pagetable pages that we allocated before
+ * but that haven't been used.
+ *
+ * In fact on xen we mark RO the whole range pgt_buf_start -
+ * pgt_buf_top, because we have to make sure that when
+ * init_memory_mapping reaches the pagetable pages area, it maps
+ * RO all the pagetable pages, including the ones that are beyond
+ * pgt_buf_end at that time.
+ */
if (!after_bootmem && pgt_buf_end > pgt_buf_start)
- memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
- pgt_buf_end << PAGE_SHIFT, "PGTABLE");
+ x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
+ PFN_PHYS(pgt_buf_end));
if (!after_bootmem)
early_memtest(start, end);
bi->end = min(bi->end, high);
/* and there's no empty block */
- if (bi->start == bi->end) {
+ if (bi->start >= bi->end) {
numa_remove_memblk_from(i--, mi);
continue;
}
#include <asm/stacktrace.h>
#include <linux/compat.h>
-static void backtrace_warning_symbol(void *data, char *msg,
- unsigned long symbol)
-{
- /* Ignore warnings */
-}
-
-static void backtrace_warning(void *data, char *msg)
-{
- /* Ignore warnings */
-}
-
static int backtrace_stack(void *data, char *name)
{
/* Yes, we want all stacks */
}
static struct stacktrace_ops backtrace_ops = {
- .warning = backtrace_warning,
- .warning_symbol = backtrace_warning_symbol,
.stack = backtrace_stack,
.address = backtrace_address,
.walk_stack = print_context_stack,
}
irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq, 0,
(type == PCI_CAP_ID_MSIX) ?
- "msi-x" : "msi");
+ "msi-x" : "msi",
+ DOMID_SELF);
if (irq < 0)
goto error;
dev_dbg(&dev->dev,
irq = xen_bind_pirq_msi_to_irq(dev, msidesc, v[i], 0,
(type == PCI_CAP_ID_MSIX) ?
"pcifront-msi-x" :
- "pcifront-msi");
+ "pcifront-msi",
+ DOMID_SELF);
if (irq < 0)
goto free;
i++;
list_for_each_entry(msidesc, &dev->msi_list, list) {
struct physdev_map_pirq map_irq;
+ domid_t domid;
+
+ domid = ret = xen_find_device_domain_owner(dev);
+ /* N.B. Casting int's -ENODEV to uint16_t results in 0xFFED,
+ * hence check ret value for < 0. */
+ if (ret < 0)
+ domid = DOMID_SELF;
memset(&map_irq, 0, sizeof(map_irq));
- map_irq.domid = DOMID_SELF;
+ map_irq.domid = domid;
map_irq.type = MAP_PIRQ_TYPE_MSI;
map_irq.index = -1;
map_irq.pirq = -1;
ret = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
if (ret) {
- dev_warn(&dev->dev, "xen map irq failed %d\n", ret);
+ dev_warn(&dev->dev, "xen map irq failed %d for %d domain\n",
+ ret, domid);
goto out;
}
ret = xen_bind_pirq_msi_to_irq(dev, msidesc,
map_irq.pirq, map_irq.index,
(type == PCI_CAP_ID_MSIX) ?
- "msi-x" : "msi");
+ "msi-x" : "msi",
+ domid);
if (ret < 0)
goto out;
}
}
}
#endif
+
+#ifdef CONFIG_XEN_DOM0
+struct xen_device_domain_owner {
+ domid_t domain;
+ struct pci_dev *dev;
+ struct list_head list;
+};
+
+static DEFINE_SPINLOCK(dev_domain_list_spinlock);
+static struct list_head dev_domain_list = LIST_HEAD_INIT(dev_domain_list);
+
+static struct xen_device_domain_owner *find_device(struct pci_dev *dev)
+{
+ struct xen_device_domain_owner *owner;
+
+ list_for_each_entry(owner, &dev_domain_list, list) {
+ if (owner->dev == dev)
+ return owner;
+ }
+ return NULL;
+}
+
+int xen_find_device_domain_owner(struct pci_dev *dev)
+{
+ struct xen_device_domain_owner *owner;
+ int domain = -ENODEV;
+
+ spin_lock(&dev_domain_list_spinlock);
+ owner = find_device(dev);
+ if (owner)
+ domain = owner->domain;
+ spin_unlock(&dev_domain_list_spinlock);
+ return domain;
+}
+EXPORT_SYMBOL_GPL(xen_find_device_domain_owner);
+
+int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain)
+{
+ struct xen_device_domain_owner *owner;
+
+ owner = kzalloc(sizeof(struct xen_device_domain_owner), GFP_KERNEL);
+ if (!owner)
+ return -ENODEV;
+
+ spin_lock(&dev_domain_list_spinlock);
+ if (find_device(dev)) {
+ spin_unlock(&dev_domain_list_spinlock);
+ kfree(owner);
+ return -EEXIST;
+ }
+ owner->domain = domain;
+ owner->dev = dev;
+ list_add_tail(&owner->list, &dev_domain_list);
+ spin_unlock(&dev_domain_list_spinlock);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xen_register_device_domain_owner);
+
+int xen_unregister_device_domain_owner(struct pci_dev *dev)
+{
+ struct xen_device_domain_owner *owner;
+
+ spin_lock(&dev_domain_list_spinlock);
+ owner = find_device(dev);
+ if (!owner) {
+ spin_unlock(&dev_domain_list_spinlock);
+ return -ENODEV;
+ }
+ list_del(&owner->list);
+ spin_unlock(&dev_domain_list_spinlock);
+ kfree(owner);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xen_unregister_device_domain_owner);
+#endif
"pciclass0c03";
reg = <0x16800 0x0 0x0 0x0 0x0>;
- interrupts = <22 3>;
+ interrupts = <22 1>;
};
usb@d,1 {
"pciclass0c03";
reg = <0x16900 0x0 0x0 0x0 0x0>;
- interrupts = <22 3>;
+ interrupts = <22 1>;
};
sata@e,0 {
"pciclass0106";
reg = <0x17000 0x0 0x0 0x0 0x0>;
- interrupts = <23 3>;
+ interrupts = <23 1>;
};
flash@f,0 {
return 0;
}
-void __init mrst_time_init(void)
+static void __init mrst_time_init(void)
{
sfi_table_parse(SFI_SIG_MTMR, NULL, NULL, sfi_parse_mtmr);
switch (mrst_timer_options) {
apbt_time_init();
}
-void __cpuinit mrst_arch_setup(void)
+static void __cpuinit mrst_arch_setup(void)
{
if (boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x27)
__mrst_cpu_chip = MRST_CPU_CHIP_PENWELL;
struct mm_struct *mm,
unsigned long va, unsigned int cpu)
{
- int tcpu;
- int uvhub;
int locals = 0;
int remotes = 0;
int hubs = 0;
+ int tcpu;
+ int tpnode;
struct bau_desc *bau_desc;
struct cpumask *flush_mask;
struct ptc_stats *stat;
struct bau_control *bcp;
struct bau_control *tbcp;
+ struct hub_and_pnode *hpp;
/* kernel was booted 'nobau' */
if (nobau)
bau_desc += UV_ITEMS_PER_DESCRIPTOR * bcp->uvhub_cpu;
bau_uvhubs_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);
- /* cpu statistics */
for_each_cpu(tcpu, flush_mask) {
- uvhub = uv_cpu_to_blade_id(tcpu);
- bau_uvhub_set(uvhub, &bau_desc->distribution);
- if (uvhub == bcp->uvhub)
+ /*
+ * The distribution vector is a bit map of pnodes, relative
+ * to the partition base pnode (and the partition base nasid
+ * in the header).
+ * Translate cpu to pnode and hub using an array stored
+ * in local memory.
+ */
+ hpp = &bcp->socket_master->target_hub_and_pnode[tcpu];
+ tpnode = hpp->pnode - bcp->partition_base_pnode;
+ bau_uvhub_set(tpnode, &bau_desc->distribution);
+ if (hpp->uvhub == bcp->uvhub)
locals++;
else
remotes++;
* an interrupt, but causes an error message to be returned to
* the sender.
*/
-static void uv_enable_timeouts(void)
+static void __init uv_enable_timeouts(void)
{
int uvhub;
int nuvhubs;
}
/*
- * initialize the sending side's sending buffers
+ * Initialize the sending side's sending buffers.
*/
static void
-uv_activation_descriptor_init(int node, int pnode)
+uv_activation_descriptor_init(int node, int pnode, int base_pnode)
{
int i;
int cpu;
n = pa >> uv_nshift;
m = pa & uv_mmask;
+ /* the 14-bit pnode */
uv_write_global_mmr64(pnode, UVH_LB_BAU_SB_DESCRIPTOR_BASE,
(n << UV_DESC_BASE_PNODE_SHIFT | m));
-
/*
- * initializing all 8 (UV_ITEMS_PER_DESCRIPTOR) descriptors for each
+ * Initializing all 8 (UV_ITEMS_PER_DESCRIPTOR) descriptors for each
* cpu even though we only use the first one; one descriptor can
* describe a broadcast to 256 uv hubs.
*/
memset(bd2, 0, sizeof(struct bau_desc));
bd2->header.sw_ack_flag = 1;
/*
- * base_dest_nodeid is the nasid of the first uvhub
- * in the partition. The bit map will indicate uvhub numbers,
- * which are 0-N in a partition. Pnodes are unique system-wide.
+ * The base_dest_nasid set in the message header is the nasid
+ * of the first uvhub in the partition. The bit map will
+ * indicate destination pnode numbers relative to that base.
+ * They may not be consecutive if nasid striding is being used.
*/
- bd2->header.base_dest_nodeid = UV_PNODE_TO_NASID(uv_partition_base_pnode);
- bd2->header.dest_subnodeid = 0x10; /* the LB */
+ bd2->header.base_dest_nasid = UV_PNODE_TO_NASID(base_pnode);
+ bd2->header.dest_subnodeid = UV_LB_SUBNODEID;
bd2->header.command = UV_NET_ENDPOINT_INTD;
bd2->header.int_both = 1;
/*
/*
* Initialization of each UV hub's structures
*/
-static void __init uv_init_uvhub(int uvhub, int vector)
+static void __init uv_init_uvhub(int uvhub, int vector, int base_pnode)
{
int node;
int pnode;
node = uvhub_to_first_node(uvhub);
pnode = uv_blade_to_pnode(uvhub);
- uv_activation_descriptor_init(node, pnode);
+ uv_activation_descriptor_init(node, pnode, base_pnode);
uv_payload_queue_init(node, pnode);
/*
- * the below initialization can't be in firmware because the
- * messaging IRQ will be determined by the OS
+ * The below initialization can't be in firmware because the
+ * messaging IRQ will be determined by the OS.
*/
apicid = uvhub_to_first_apicid(uvhub) | uv_apicid_hibits;
uv_write_global_mmr64(pnode, UVH_BAU_DATA_CONFIG,
/*
* initialize the bau_control structure for each cpu
*/
-static int __init uv_init_per_cpu(int nuvhubs)
+static int __init uv_init_per_cpu(int nuvhubs, int base_part_pnode)
{
int i;
int cpu;
+ int tcpu;
int pnode;
int uvhub;
int have_hmaster;
bcp = &per_cpu(bau_control, cpu);
memset(bcp, 0, sizeof(struct bau_control));
pnode = uv_cpu_hub_info(cpu)->pnode;
+ if ((pnode - base_part_pnode) >= UV_DISTRIBUTION_SIZE) {
+ printk(KERN_EMERG
+ "cpu %d pnode %d-%d beyond %d; BAU disabled\n",
+ cpu, pnode, base_part_pnode,
+ UV_DISTRIBUTION_SIZE);
+ return 1;
+ }
+ bcp->osnode = cpu_to_node(cpu);
+ bcp->partition_base_pnode = uv_partition_base_pnode;
uvhub = uv_cpu_hub_info(cpu)->numa_blade_id;
*(uvhub_mask + (uvhub/8)) |= (1 << (uvhub%8));
bdp = &uvhub_descs[uvhub];
bdp->pnode = pnode;
/* kludge: 'assuming' one node per socket, and assuming that
disabling a socket just leaves a gap in node numbers */
- socket = (cpu_to_node(cpu) & 1);
+ socket = bcp->osnode & 1;
bdp->socket_mask |= (1 << socket);
sdp = &bdp->socket[socket];
sdp->cpu_number[sdp->num_cpus] = cpu;
nextsocket:
socket++;
socket_mask = (socket_mask >> 1);
+ /* each socket gets a local array of pnodes/hubs */
+ bcp = smaster;
+ bcp->target_hub_and_pnode = kmalloc_node(
+ sizeof(struct hub_and_pnode) *
+ num_possible_cpus(), GFP_KERNEL, bcp->osnode);
+ memset(bcp->target_hub_and_pnode, 0,
+ sizeof(struct hub_and_pnode) *
+ num_possible_cpus());
+ for_each_present_cpu(tcpu) {
+ bcp->target_hub_and_pnode[tcpu].pnode =
+ uv_cpu_hub_info(tcpu)->pnode;
+ bcp->target_hub_and_pnode[tcpu].uvhub =
+ uv_cpu_hub_info(tcpu)->numa_blade_id;
+ }
}
}
kfree(uvhub_descs);
spin_lock_init(&disable_lock);
congested_cycles = microsec_2_cycles(congested_response_us);
- if (uv_init_per_cpu(nuvhubs)) {
- nobau = 1;
- return 0;
- }
-
uv_partition_base_pnode = 0x7fffffff;
- for (uvhub = 0; uvhub < nuvhubs; uvhub++)
+ for (uvhub = 0; uvhub < nuvhubs; uvhub++) {
if (uv_blade_nr_possible_cpus(uvhub) &&
(uv_blade_to_pnode(uvhub) < uv_partition_base_pnode))
uv_partition_base_pnode = uv_blade_to_pnode(uvhub);
+ }
+
+ if (uv_init_per_cpu(nuvhubs, uv_partition_base_pnode)) {
+ nobau = 1;
+ return 0;
+ }
vector = UV_BAU_MESSAGE;
for_each_possible_blade(uvhub)
if (uv_blade_nr_possible_cpus(uvhub))
- uv_init_uvhub(uvhub, vector);
+ uv_init_uvhub(uvhub, vector, uv_partition_base_pnode);
uv_enable_timeouts();
alloc_intr_gate(vector, uv_bau_message_intr1);
.rating = 400,
.read = uv_read_rtc,
.mask = (cycle_t)UVH_RTC_REAL_TIME_CLOCK_MASK,
- .shift = 10,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
if (!is_uv_system())
return -ENODEV;
- clocksource_uv.mult = clocksource_hz2mult(sn_rtc_cycles_per_second,
- clocksource_uv.shift);
-
/* If single blade, prefer tsc */
if (uv_num_possible_blades() == 1)
clocksource_uv.rating = 250;
- rc = clocksource_register(&clocksource_uv);
+ rc = clocksource_register_hz(&clocksource_uv, sn_rtc_cycles_per_second);
if (rc)
printk(KERN_INFO "UV RTC clocksource failed rc %d\n", rc);
else
*dx &= maskedx;
}
-static __init void xen_init_cpuid_mask(void)
+static void __init xen_init_cpuid_mask(void)
{
unsigned int ax, bx, cx, dx;
unsigned int xsave_mask;
/*
* load_gdt for early boot, when the gdt is only mapped once
*/
-static __init void xen_load_gdt_boot(const struct desc_ptr *dtr)
+static void __init xen_load_gdt_boot(const struct desc_ptr *dtr)
{
unsigned long va = dtr->address;
unsigned int size = dtr->size + 1;
* Version of write_gdt_entry for use at early boot-time needed to
* update an entry as simply as possible.
*/
-static __init void xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
+static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry,
const void *desc, int type)
{
switch (type) {
return ret;
}
-static const struct pv_info xen_info __initdata = {
+static const struct pv_info xen_info __initconst = {
.paravirt_enabled = 1,
.shared_kernel_pmd = 0,
.name = "Xen",
};
-static const struct pv_init_ops xen_init_ops __initdata = {
+static const struct pv_init_ops xen_init_ops __initconst = {
.patch = xen_patch,
};
-static const struct pv_cpu_ops xen_cpu_ops __initdata = {
+static const struct pv_cpu_ops xen_cpu_ops __initconst = {
.cpuid = xen_cpuid,
.set_debugreg = xen_set_debugreg,
.end_context_switch = xen_end_context_switch,
};
-static const struct pv_apic_ops xen_apic_ops __initdata = {
+static const struct pv_apic_ops xen_apic_ops __initconst = {
#ifdef CONFIG_X86_LOCAL_APIC
.startup_ipi_hook = paravirt_nop,
#endif
return 0;
}
-static const struct machine_ops __initdata xen_machine_ops = {
+static const struct machine_ops xen_machine_ops __initconst = {
.restart = xen_restart,
.halt = xen_machine_halt,
.power_off = xen_machine_halt,
return NOTIFY_OK;
}
-static struct notifier_block __cpuinitdata xen_hvm_cpu_notifier = {
+static struct notifier_block xen_hvm_cpu_notifier __cpuinitdata = {
.notifier_call = xen_hvm_cpu_notify,
};
}
EXPORT_SYMBOL_GPL(xen_hvm_need_lapic);
-const __refconst struct hypervisor_x86 x86_hyper_xen_hvm = {
+const struct hypervisor_x86 x86_hyper_xen_hvm __refconst = {
.name = "Xen HVM",
.detect = xen_hvm_platform,
.init_platform = xen_hvm_guest_init,
xen_safe_halt();
}
-static const struct pv_irq_ops xen_irq_ops __initdata = {
+static const struct pv_irq_ops xen_irq_ops __initconst = {
.save_fl = PV_CALLEE_SAVE(xen_save_fl),
.restore_fl = PV_CALLEE_SAVE(xen_restore_fl),
.irq_disable = PV_CALLEE_SAVE(xen_irq_disable),
* that's before we have page structures to store the bits. So do all
* the book-keeping now.
*/
-static __init int xen_mark_pinned(struct mm_struct *mm, struct page *page,
+static int __init xen_mark_pinned(struct mm_struct *mm, struct page *page,
enum pt_level level)
{
SetPagePinned(page);
active_mm = percpu_read(cpu_tlbstate.active_mm);
- if (active_mm == mm)
+ if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
leave_mm(smp_processor_id());
/* If this cpu still has a stale cr3 reference, then make sure
spin_unlock(&mm->page_table_lock);
}
-static __init void xen_pagetable_setup_start(pgd_t *base)
+static void __init xen_pagetable_setup_start(pgd_t *base)
{
}
+static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
+{
+ /* reserve the range used */
+ native_pagetable_reserve(start, end);
+
+ /* set as RW the rest */
+ printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
+ PFN_PHYS(pgt_buf_top));
+ while (end < PFN_PHYS(pgt_buf_top)) {
+ make_lowmem_page_readwrite(__va(end));
+ end += PAGE_SIZE;
+ }
+}
+
static void xen_post_allocator_init(void);
-static __init void xen_pagetable_setup_done(pgd_t *base)
+static void __init xen_pagetable_setup_done(pgd_t *base)
{
xen_setup_shared_info();
xen_post_allocator_init();
}
#ifdef CONFIG_X86_32
-static __init pte_t mask_rw_pte(pte_t *ptep, pte_t pte)
+static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
{
/* If there's an existing pte, then don't allow _PAGE_RW to be set */
if (pte_val_ma(*ptep) & _PAGE_PRESENT)
return pte;
}
#else /* CONFIG_X86_64 */
-static __init pte_t mask_rw_pte(pte_t *ptep, pte_t pte)
+static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
{
unsigned long pfn = pte_pfn(pte);
* it is RO.
*/
if (((!is_early_ioremap_ptep(ptep) &&
- pfn >= pgt_buf_start && pfn < pgt_buf_end)) ||
+ pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
pte = pte_wrprotect(pte);
/* Init-time set_pte while constructing initial pagetables, which
doesn't allow RO pagetable pages to be remapped RW */
-static __init void xen_set_pte_init(pte_t *ptep, pte_t pte)
+static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
{
pte = mask_rw_pte(ptep, pte);
/* Early in boot, while setting up the initial pagetable, assume
everything is pinned. */
-static __init void xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)
+static void __init xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)
{
#ifdef CONFIG_FLATMEM
BUG_ON(mem_map); /* should only be used early */
}
/* Used for pmd and pud */
-static __init void xen_alloc_pmd_init(struct mm_struct *mm, unsigned long pfn)
+static void __init xen_alloc_pmd_init(struct mm_struct *mm, unsigned long pfn)
{
#ifdef CONFIG_FLATMEM
BUG_ON(mem_map); /* should only be used early */
/* Early release_pte assumes that all pts are pinned, since there's
only init_mm and anything attached to that is pinned. */
-static __init void xen_release_pte_init(unsigned long pfn)
+static void __init xen_release_pte_init(unsigned long pfn)
{
pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);
make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
}
-static __init void xen_release_pmd_init(unsigned long pfn)
+static void __init xen_release_pmd_init(unsigned long pfn)
{
make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
}
BUG();
}
-static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
+static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
{
unsigned pmdidx, pteidx;
unsigned ident_pte;
* of the physical mapping once some sort of allocator has been set
* up.
*/
-__init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
+pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
unsigned long max_pfn)
{
pud_t *l3;
static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
static RESERVE_BRK_ARRAY(pmd_t, swapper_kernel_pmd, PTRS_PER_PMD);
-static __init void xen_write_cr3_init(unsigned long cr3)
+static void __init xen_write_cr3_init(unsigned long cr3)
{
unsigned long pfn = PFN_DOWN(__pa(swapper_pg_dir));
pv_mmu_ops.write_cr3 = &xen_write_cr3;
}
-__init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
+pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
unsigned long max_pfn)
{
pmd_t *kernel_pmd;
#endif
}
-__init void xen_ident_map_ISA(void)
+void __init xen_ident_map_ISA(void)
{
unsigned long pa;
xen_flush_tlb();
}
-static __init void xen_post_allocator_init(void)
+static void __init xen_post_allocator_init(void)
{
#ifdef CONFIG_XEN_DEBUG
pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte_debug);
preempt_enable();
}
-static const struct pv_mmu_ops xen_mmu_ops __initdata = {
+static const struct pv_mmu_ops xen_mmu_ops __initconst = {
.read_cr2 = xen_read_cr2,
.write_cr2 = xen_write_cr2,
void __init xen_init_mmu_ops(void)
{
+ x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;
/* Boundary cross-over for the edges: */
if (idx) {
unsigned long *p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ unsigned long *mid_mfn_p;
p2m_init(p2m);
p2m_top[topidx][mididx] = p2m;
+ /* For save/restore we need to MFN of the P2M saved */
+
+ mid_mfn_p = p2m_top_mfn_p[topidx];
+ WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
+ "P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
+ topidx, mididx);
+ mid_mfn_p[mididx] = virt_to_mfn(p2m);
+
}
return idx != 0;
}
pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
{
unsigned topidx = p2m_top_index(pfn);
- if (p2m_top[topidx] == p2m_mid_missing) {
- unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ unsigned long *mid_mfn_p;
+ unsigned long **mid;
+
+ mid = p2m_top[topidx];
+ mid_mfn_p = p2m_top_mfn_p[topidx];
+ if (mid == p2m_mid_missing) {
+ mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(mid);
p2m_top[topidx] = mid;
+
+ BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
+ }
+ /* And the save/restore P2M tables.. */
+ if (mid_mfn_p == p2m_mid_missing_mfn) {
+ mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_mid_mfn_init(mid_mfn_p);
+
+ p2m_top_mfn_p[topidx] = mid_mfn_p;
+ p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
+ /* Note: we don't set mid_mfn_p[midix] here,
+ * look in __early_alloc_p2m */
}
}
}
/* Add an MFN override for a particular page */
-int m2p_add_override(unsigned long mfn, struct page *page)
+int m2p_add_override(unsigned long mfn, struct page *page, bool clear_pte)
{
unsigned long flags;
unsigned long pfn;
if (!PageHighMem(page)) {
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
-
if (WARN(ptep == NULL || level != PG_LEVEL_4K,
"m2p_add_override: pfn %lx not mapped", pfn))
return -EINVAL;
if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
return -ENOMEM;
- if (!PageHighMem(page))
+ if (clear_pte && !PageHighMem(page))
/* Just zap old mapping for now */
pte_clear(&init_mm, address, ptep);
-
spin_lock_irqsave(&m2p_override_lock, flags);
list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]);
spin_unlock_irqrestore(&m2p_override_lock, flags);
return 0;
}
-
-int m2p_remove_override(struct page *page)
+EXPORT_SYMBOL_GPL(m2p_add_override);
+int m2p_remove_override(struct page *page, bool clear_pte)
{
unsigned long flags;
unsigned long mfn;
spin_unlock_irqrestore(&m2p_override_lock, flags);
set_phys_to_machine(pfn, page->index);
- if (!PageHighMem(page))
+ if (clear_pte && !PageHighMem(page))
set_pte_at(&init_mm, address, ptep,
pfn_pte(pfn, PAGE_KERNEL));
/* No tlb flush necessary because the caller already
return 0;
}
+EXPORT_SYMBOL_GPL(m2p_remove_override);
struct page *m2p_find_override(unsigned long mfn)
{
*/
#define EXTRA_MEM_RATIO (10)
-static __init void xen_add_extra_mem(unsigned long pages)
+static void __init xen_add_extra_mem(unsigned long pages)
{
unsigned long pfn;
if (last > end)
continue;
- if (entry->type == E820_RAM) {
+ if ((entry->type == E820_RAM) || (entry->type == E820_UNUSABLE)) {
if (start > start_pci)
identity += set_phys_range_identity(
PFN_UP(start_pci), PFN_DOWN(start));
memcpy(map_raw, map, sizeof(map));
e820.nr_map = 0;
+#ifdef CONFIG_X86_32
+ xen_extra_mem_start = mem_end;
+#else
xen_extra_mem_start = max((1ULL << 32), mem_end);
+#endif
for (i = 0; i < memmap.nr_entries; i++) {
unsigned long long end;
#endif
}
-static __cpuinit int register_callback(unsigned type, const void *func)
+static int __cpuinit register_callback(unsigned type, const void *func)
{
struct callback_register callback = {
.type = type,
static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
/*
- * Reschedule call back. Nothing to do,
- * all the work is done automatically when
- * we return from the interrupt.
+ * Reschedule call back.
*/
static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
{
inc_irq_stat(irq_resched_count);
+ scheduler_ipi();
return IRQ_HANDLED;
}
-static __cpuinit void cpu_bringup(void)
+static void __cpuinit cpu_bringup(void)
{
int cpu = smp_processor_id();
wmb(); /* make sure everything is out */
}
-static __cpuinit void cpu_bringup_and_idle(void)
+static void __cpuinit cpu_bringup_and_idle(void)
{
cpu_bringup();
cpu_idle();
}
}
-static __cpuinit int
+static int __cpuinit
cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
{
struct vcpu_guest_context *ctxt;
return IRQ_HANDLED;
}
-static const struct smp_ops xen_smp_ops __initdata = {
+static const struct smp_ops xen_smp_ops __initconst = {
.smp_prepare_boot_cpu = xen_smp_prepare_boot_cpu,
.smp_prepare_cpus = xen_smp_prepare_cpus,
.smp_cpus_done = xen_smp_cpus_done,
#include "xen-ops.h"
-#define XEN_SHIFT 22
-
/* Xen may fire a timer up to this many ns early */
#define TIMER_SLOP 100000
#define NS_PER_TICK (1000000000LL / HZ)
.rating = 400,
.read = xen_clocksource_get_cycles,
.mask = ~0,
- .mult = 1<<XEN_SHIFT, /* time directly in nanoseconds */
- .shift = XEN_SHIFT,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
}
}
-static const struct pv_time_ops xen_time_ops __initdata = {
+static const struct pv_time_ops xen_time_ops __initconst = {
.sched_clock = xen_clocksource_read,
};
-static __init void xen_time_init(void)
+static void __init xen_time_init(void)
{
int cpu = smp_processor_id();
struct timespec tp;
- clocksource_register(&xen_clocksource);
+ clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);
if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, cpu, NULL) == 0) {
/* Successfully turned off 100Hz tick, so we have the
xen_setup_cpu_clockevents();
}
-__init void xen_init_time_ops(void)
+void __init xen_init_time_ops(void)
{
pv_time_ops = xen_time_ops;
xen_setup_cpu_clockevents();
}
-__init void xen_hvm_init_time_ops(void)
+void __init xen_hvm_init_time_ops(void)
{
/* vector callback is needed otherwise we cannot receive interrupts
* on cpu > 0 and at this point we don't know how many cpus are
#ifdef CONFIG_PARAVIRT_SPINLOCKS
void __init xen_init_spinlocks(void);
-__cpuinit void xen_init_lock_cpu(int cpu);
+void __cpuinit xen_init_lock_cpu(int cpu);
void xen_uninit_lock_cpu(int cpu);
#else
static inline void xen_init_spinlocks(void)
}
EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup);
+struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk)
+{
+ return container_of(task_subsys_state(tsk, blkio_subsys_id),
+ struct blkio_cgroup, css);
+}
+EXPORT_SYMBOL_GPL(task_blkio_cgroup);
+
static inline void
blkio_update_group_weight(struct blkio_group *blkg, unsigned int weight)
{
#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
+extern struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
struct blkio_group *blkg, void *key, dev_t dev,
enum blkio_policy_id plid);
struct cgroup;
static inline struct blkio_cgroup *
cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
+static inline struct blkio_cgroup *
+task_blkio_cgroup(struct task_struct *tsk) { return NULL; }
static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
struct blkio_group *blkg, void *key, dev_t dev,
*/
void blk_run_queue_async(struct request_queue *q)
{
- if (likely(!blk_queue_stopped(q)))
+ if (likely(!blk_queue_stopped(q))) {
+ __cancel_delayed_work(&q->delay_work);
queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
+ }
}
EXPORT_SYMBOL(blk_run_queue_async);
}
static struct throtl_grp * throtl_find_alloc_tg(struct throtl_data *td,
- struct cgroup *cgroup)
+ struct blkio_cgroup *blkcg)
{
- struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
struct throtl_grp *tg = NULL;
void *key = td;
struct backing_dev_info *bdi = &td->queue->backing_dev_info;
static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
{
- struct cgroup *cgroup;
struct throtl_grp *tg = NULL;
+ struct blkio_cgroup *blkcg;
rcu_read_lock();
- cgroup = task_cgroup(current, blkio_subsys_id);
- tg = throtl_find_alloc_tg(td, cgroup);
+ blkcg = task_blkio_cgroup(current);
+ tg = throtl_find_alloc_tg(td, blkcg);
if (!tg)
tg = &td->root_tg;
rcu_read_unlock();
cfqg->needs_update = true;
}
-static struct cfq_group *
-cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
+static struct cfq_group * cfq_find_alloc_cfqg(struct cfq_data *cfqd,
+ struct blkio_cgroup *blkcg, int create)
{
- struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup);
struct cfq_group *cfqg = NULL;
void *key = cfqd;
int i, j;
*/
static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
{
- struct cgroup *cgroup;
+ struct blkio_cgroup *blkcg;
struct cfq_group *cfqg = NULL;
rcu_read_lock();
- cgroup = task_cgroup(current, blkio_subsys_id);
- cfqg = cfq_find_alloc_cfqg(cfqd, cgroup, create);
+ blkcg = task_blkio_cgroup(current);
+ cfqg = cfq_find_alloc_cfqg(cfqd, blkcg, create);
if (!cfqg && create)
cfqg = &cfqd->root_group;
rcu_read_unlock();
source "drivers/clk/Kconfig"
source "drivers/hwspinlock/Kconfig"
+
+source "drivers/clocksource/Kconfig"
+
endmenu
static DEFINE_MUTEX(performance_mutex);
-/* Use cpufreq debug layer for _PPC changes. */
-#define cpufreq_printk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, \
- "cpufreq-core", msg)
-
/*
* _PPC support is implemented as a CPUfreq policy notifier:
* This means each time a CPUfreq driver registered also with
return -ENODEV;
}
- cpufreq_printk("CPU %d: _PPC is %d - frequency %s limited\n", pr->id,
+ pr_debug("CPU %d: _PPC is %d - frequency %s limited\n", pr->id,
(int)ppc, ppc ? "" : "not");
pr->performance_platform_limit = (int)ppc;
if (ACPI_SUCCESS(status))
device->flags.lockable = 1;
+ /* Power resources cannot be power manageable. */
+ if (device->device_type == ACPI_BUS_TYPE_POWER)
+ return 0;
+
/* Presence of _PS0|_PR0 indicates 'power manageable' */
status = acpi_get_handle(device->handle, "_PS0", &temp);
if (ACPI_FAILURE(status))
{
AHCI_HFLAGS (AHCI_HFLAG_NO_FPDMA_AA | AHCI_HFLAG_NO_PMP |
AHCI_HFLAG_YES_NCQ),
- .flags = AHCI_FLAG_COMMON,
+ .flags = AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
.pio_mask = ATA_PIO4,
.udma_mask = ATA_UDMA6,
.port_ops = &ahci_ops,
{ PCI_VDEVICE(INTEL, 0x1d06), board_ahci }, /* PBG RAID */
{ PCI_VDEVICE(INTEL, 0x2826), board_ahci }, /* PBG RAID */
{ PCI_VDEVICE(INTEL, 0x2323), board_ahci }, /* DH89xxCC AHCI */
+ { PCI_VDEVICE(INTEL, 0x1e02), board_ahci }, /* Panther Point AHCI */
+ { PCI_VDEVICE(INTEL, 0x1e03), board_ahci }, /* Panther Point AHCI */
+ { PCI_VDEVICE(INTEL, 0x1e04), board_ahci }, /* Panther Point RAID */
+ { PCI_VDEVICE(INTEL, 0x1e05), board_ahci }, /* Panther Point RAID */
+ { PCI_VDEVICE(INTEL, 0x1e06), board_ahci }, /* Panther Point RAID */
+ { PCI_VDEVICE(INTEL, 0x1e07), board_ahci }, /* Panther Point RAID */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
EM_CTL_ALHD = (1 << 26), /* Activity LED */
EM_CTL_XMT = (1 << 25), /* Transmit Only */
EM_CTL_SMB = (1 << 24), /* Single Message Buffer */
+ EM_CTL_SGPIO = (1 << 19), /* SGPIO messages supported */
+ EM_CTL_SES = (1 << 18), /* SES-2 messages supported */
+ EM_CTL_SAFTE = (1 << 17), /* SAF-TE messages supported */
+ EM_CTL_LED = (1 << 16), /* LED messages supported */
/* em message type */
EM_MSG_TYPE_LED = (1 << 0), /* LED */
{ 0x8086, 0x1d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata },
/* SATA Controller IDE (PBG) */
{ 0x8086, 0x1d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+ /* SATA Controller IDE (Panther Point) */
+ { 0x8086, 0x1e00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata },
+ /* SATA Controller IDE (Panther Point) */
+ { 0x8086, 0x1e01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata },
+ /* SATA Controller IDE (Panther Point) */
+ { 0x8086, 0x1e08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+ /* SATA Controller IDE (Panther Point) */
+ { 0x8086, 0x1e09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
{ } /* terminate list */
};
static ssize_t ahci_store_em_buffer(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t size);
+static ssize_t ahci_show_em_supported(struct device *dev,
+ struct device_attribute *attr, char *buf);
static DEVICE_ATTR(ahci_host_caps, S_IRUGO, ahci_show_host_caps, NULL);
static DEVICE_ATTR(ahci_host_cap2, S_IRUGO, ahci_show_host_cap2, NULL);
static DEVICE_ATTR(ahci_port_cmd, S_IRUGO, ahci_show_port_cmd, NULL);
static DEVICE_ATTR(em_buffer, S_IWUSR | S_IRUGO,
ahci_read_em_buffer, ahci_store_em_buffer);
+static DEVICE_ATTR(em_message_supported, S_IRUGO, ahci_show_em_supported, NULL);
struct device_attribute *ahci_shost_attrs[] = {
&dev_attr_link_power_management_policy,
&dev_attr_ahci_host_version,
&dev_attr_ahci_port_cmd,
&dev_attr_em_buffer,
+ &dev_attr_em_message_supported,
NULL
};
EXPORT_SYMBOL_GPL(ahci_shost_attrs);
return size;
}
+static ssize_t ahci_show_em_supported(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct Scsi_Host *shost = class_to_shost(dev);
+ struct ata_port *ap = ata_shost_to_port(shost);
+ struct ahci_host_priv *hpriv = ap->host->private_data;
+ void __iomem *mmio = hpriv->mmio;
+ u32 em_ctl;
+
+ em_ctl = readl(mmio + HOST_EM_CTL);
+
+ return sprintf(buf, "%s%s%s%s\n",
+ em_ctl & EM_CTL_LED ? "led " : "",
+ em_ctl & EM_CTL_SAFTE ? "saf-te " : "",
+ em_ctl & EM_CTL_SES ? "ses-2 " : "",
+ em_ctl & EM_CTL_SGPIO ? "sgpio " : "");
+}
+
/**
* ahci_save_initial_config - Save and fixup initial config values
* @dev: target AHCI device
ahci_enable_fbs(ap);
pp->intr_mask |= PORT_IRQ_BAD_PMP;
- writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
+
+ /*
+ * We must not change the port interrupt mask register if the
+ * port is marked frozen, the value in pp->intr_mask will be
+ * restored later when the port is thawed.
+ *
+ * Note that during initialization, the port is marked as
+ * frozen since the irq handler is not yet registered.
+ */
+ if (!(ap->pflags & ATA_PFLAG_FROZEN))
+ writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
}
static void ahci_pmp_detach(struct ata_port *ap)
writel(cmd, port_mmio + PORT_CMD);
pp->intr_mask &= ~PORT_IRQ_BAD_PMP;
- writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
+
+ /* see comment above in ahci_pmp_attach() */
+ if (!(ap->pflags & ATA_PFLAG_FROZEN))
+ writel(pp->intr_mask, port_mmio + PORT_IRQ_MASK);
}
int ahci_port_resume(struct ata_port *ap)
*/
{ "PIONEER DVD-RW DVRTD08", "1.00", ATA_HORKAGE_NOSETXFER },
{ "PIONEER DVD-RW DVR-212D", "1.28", ATA_HORKAGE_NOSETXFER },
+ { "PIONEER DVD-RW DVR-216D", "1.08", ATA_HORKAGE_NOSETXFER },
/* End Marker */
{ }
if (!ap)
return NULL;
- ap->pflags |= ATA_PFLAG_INITIALIZING;
+ ap->pflags |= ATA_PFLAG_INITIALIZING | ATA_PFLAG_FROZEN;
ap->lock = &host->lock;
ap->print_id = -1;
ap->host = host;
struct ata_eh_context *ehc = &link->eh_context;
struct ata_device *dev, *link_dev = NULL, *lpm_dev = NULL;
enum ata_lpm_policy old_policy = link->lpm_policy;
+ bool no_dipm = link->ap->flags & ATA_FLAG_NO_DIPM;
unsigned int hints = ATA_LPM_EMPTY | ATA_LPM_HIPM;
unsigned int err_mask;
int rc;
*/
ata_for_each_dev(dev, link, ENABLED) {
bool hipm = ata_id_has_hipm(dev->id);
- bool dipm = ata_id_has_dipm(dev->id);
+ bool dipm = ata_id_has_dipm(dev->id) && !no_dipm;
/* find the first enabled and LPM enabled devices */
if (!link_dev)
/* host config updated, enable DIPM if transitioning to MIN_POWER */
ata_for_each_dev(dev, link, ENABLED) {
- if (policy == ATA_LPM_MIN_POWER && ata_id_has_dipm(dev->id)) {
+ if (policy == ATA_LPM_MIN_POWER && !no_dipm &&
+ ata_id_has_dipm(dev->id)) {
err_mask = ata_dev_set_feature(dev,
SETFEATURES_SATA_ENABLE, SATA_DIPM);
if (err_mask && err_mask != AC_ERR_DEV) {
#define DRV_NAME "pata_at91"
-#define DRV_VERSION "0.1"
+#define DRV_VERSION "0.2"
#define CF_IDE_OFFSET 0x00c00000
#define CF_ALT_IDE_OFFSET 0x00e00000
#define CF_IDE_RES_SIZE 0x08
+#define NCS_RD_PULSE_LIMIT 0x3f /* maximal value for pulse bitfields */
struct at91_ide_info {
unsigned long mode;
void __iomem *alt_addr;
};
-static const struct ata_timing initial_timing =
- {XFER_PIO_0, 70, 290, 240, 600, 165, 150, 600, 0};
+static const struct ata_timing initial_timing = {
+ .mode = XFER_PIO_0,
+ .setup = 70,
+ .act8b = 290,
+ .rec8b = 240,
+ .cyc8b = 600,
+ .active = 165,
+ .recover = 150,
+ .dmack_hold = 0,
+ .cycle = 600,
+ .udma = 0
+};
static unsigned long calc_mck_cycles(unsigned long ns, unsigned long mck_hz)
{
/* (CS0, CS1, DIR, OE) <= (CFCE1, CFCE2, CFRNW, NCSX) timings */
ncs_read_setup = 1;
ncs_read_pulse = read_cycle - 2;
+ if (ncs_read_pulse > NCS_RD_PULSE_LIMIT) {
+ ncs_read_pulse = NCS_RD_PULSE_LIMIT;
+ dev_warn(dev, "ncs_read_pulse limited to maximal value %lu\n",
+ ncs_read_pulse);
+ }
/* Write timings same as read timings */
write_cycle = read_cycle;
}
#ifdef CONFIG_SBUS
+static const struct of_device_id fore200e_sba_match[];
static int __devinit fore200e_sba_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
const struct fore200e_bus *bus;
struct fore200e *fore200e;
static int index = 0;
int err;
- if (!op->dev.of_match)
+ match = of_match_device(fore200e_sba_match, &op->dev);
+ if (!match)
return -EINVAL;
- bus = op->dev.of_match->data;
+ bus = match->data;
fore200e = kzalloc(sizeof(struct fore200e), GFP_KERNEL);
if (!fore200e)
bool
default n
-config ARCH_NO_SYSDEV_OPS
- bool
- ---help---
- To be selected by architectures that don't use sysdev class or
- sysdev driver power management (suspend/resume) and shutdown
- operations.
-
endmenu
return drv->bus->match ? drv->bus->match(dev, drv) : 1;
}
-extern void sysdev_shutdown(void);
-
extern char *make_class_name(const char *name, struct kobject *kobj);
extern int devres_release_all(struct device *dev);
drv = dev->driver;
if (drv) {
- pm_runtime_get_noresume(dev);
- pm_runtime_barrier(dev);
+ pm_runtime_get_sync(dev);
driver_sysfs_remove(dev);
BUS_NOTIFY_UNBIND_DRIVER,
dev);
+ pm_runtime_put_sync(dev);
+
if (dev->bus && dev->bus->remove)
dev->bus->remove(dev);
else if (drv->remove)
BUS_NOTIFY_UNBOUND_DRIVER,
dev);
- pm_runtime_put_sync(dev);
}
}
if (!firmware_p)
return -EINVAL;
+ if (WARN_ON(usermodehelper_is_disabled())) {
+ dev_err(device, "firmware: %s will not be loaded\n", name);
+ return -EBUSY;
+ }
+
*firmware_p = firmware = kzalloc(sizeof(*firmware), GFP_KERNEL);
if (!firmware) {
dev_err(device, "%s: kmalloc(struct firmware) failed\n",
return ret;
}
-static int platform_pm_prepare(struct device *dev)
+int platform_pm_prepare(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static void platform_pm_complete(struct device *dev)
+void platform_pm_complete(struct device *dev)
{
struct device_driver *drv = dev->driver;
drv->pm->complete(dev);
}
-#else /* !CONFIG_PM_SLEEP */
-
-#define platform_pm_prepare NULL
-#define platform_pm_complete NULL
-
-#endif /* !CONFIG_PM_SLEEP */
+#endif /* CONFIG_PM_SLEEP */
#ifdef CONFIG_SUSPEND
-int __weak platform_pm_suspend(struct device *dev)
+int platform_pm_suspend(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-int __weak platform_pm_suspend_noirq(struct device *dev)
+int platform_pm_suspend_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-int __weak platform_pm_resume(struct device *dev)
+int platform_pm_resume(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-int __weak platform_pm_resume_noirq(struct device *dev)
+int platform_pm_resume_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-#else /* !CONFIG_SUSPEND */
-
-#define platform_pm_suspend NULL
-#define platform_pm_resume NULL
-#define platform_pm_suspend_noirq NULL
-#define platform_pm_resume_noirq NULL
-
-#endif /* !CONFIG_SUSPEND */
+#endif /* CONFIG_SUSPEND */
#ifdef CONFIG_HIBERNATE_CALLBACKS
-static int platform_pm_freeze(struct device *dev)
+int platform_pm_freeze(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_freeze_noirq(struct device *dev)
+int platform_pm_freeze_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_thaw(struct device *dev)
+int platform_pm_thaw(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_thaw_noirq(struct device *dev)
+int platform_pm_thaw_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_poweroff(struct device *dev)
+int platform_pm_poweroff(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_poweroff_noirq(struct device *dev)
+int platform_pm_poweroff_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_restore(struct device *dev)
+int platform_pm_restore(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-static int platform_pm_restore_noirq(struct device *dev)
+int platform_pm_restore_noirq(struct device *dev)
{
struct device_driver *drv = dev->driver;
int ret = 0;
return ret;
}
-#else /* !CONFIG_HIBERNATE_CALLBACKS */
-
-#define platform_pm_freeze NULL
-#define platform_pm_thaw NULL
-#define platform_pm_poweroff NULL
-#define platform_pm_restore NULL
-#define platform_pm_freeze_noirq NULL
-#define platform_pm_thaw_noirq NULL
-#define platform_pm_poweroff_noirq NULL
-#define platform_pm_restore_noirq NULL
-
-#endif /* !CONFIG_HIBERNATE_CALLBACKS */
-
-#ifdef CONFIG_PM_RUNTIME
-
-int __weak platform_pm_runtime_suspend(struct device *dev)
-{
- return pm_generic_runtime_suspend(dev);
-};
-
-int __weak platform_pm_runtime_resume(struct device *dev)
-{
- return pm_generic_runtime_resume(dev);
-};
-
-int __weak platform_pm_runtime_idle(struct device *dev)
-{
- return pm_generic_runtime_idle(dev);
-};
-
-#else /* !CONFIG_PM_RUNTIME */
-
-#define platform_pm_runtime_suspend NULL
-#define platform_pm_runtime_resume NULL
-#define platform_pm_runtime_idle NULL
-
-#endif /* !CONFIG_PM_RUNTIME */
+#endif /* CONFIG_HIBERNATE_CALLBACKS */
static const struct dev_pm_ops platform_dev_pm_ops = {
- .prepare = platform_pm_prepare,
- .complete = platform_pm_complete,
- .suspend = platform_pm_suspend,
- .resume = platform_pm_resume,
- .freeze = platform_pm_freeze,
- .thaw = platform_pm_thaw,
- .poweroff = platform_pm_poweroff,
- .restore = platform_pm_restore,
- .suspend_noirq = platform_pm_suspend_noirq,
- .resume_noirq = platform_pm_resume_noirq,
- .freeze_noirq = platform_pm_freeze_noirq,
- .thaw_noirq = platform_pm_thaw_noirq,
- .poweroff_noirq = platform_pm_poweroff_noirq,
- .restore_noirq = platform_pm_restore_noirq,
- .runtime_suspend = platform_pm_runtime_suspend,
- .runtime_resume = platform_pm_runtime_resume,
- .runtime_idle = platform_pm_runtime_idle,
+ .runtime_suspend = pm_generic_runtime_suspend,
+ .runtime_resume = pm_generic_runtime_resume,
+ .runtime_idle = pm_generic_runtime_idle,
+ USE_PLATFORM_PM_SLEEP_OPS
};
struct bus_type platform_bus_type = {
};
EXPORT_SYMBOL_GPL(platform_bus_type);
-/**
- * platform_bus_get_pm_ops() - return pointer to busses dev_pm_ops
- *
- * This function can be used by platform code to get the current
- * set of dev_pm_ops functions used by the platform_bus_type.
- */
-const struct dev_pm_ops * __init platform_bus_get_pm_ops(void)
-{
- return platform_bus_type.pm;
-}
-
-/**
- * platform_bus_set_pm_ops() - update dev_pm_ops for the platform_bus_type
- *
- * @pm: pointer to new dev_pm_ops struct to be used for platform_bus_type
- *
- * Platform code can override the dev_pm_ops methods of
- * platform_bus_type by using this function. It is expected that
- * platform code will first do a platform_bus_get_pm_ops(), then
- * kmemdup it, then customize selected methods and pass a pointer to
- * the new struct dev_pm_ops to this function.
- *
- * Since platform-specific code is customizing methods for *all*
- * devices (not just platform-specific devices) it is expected that
- * any custom overrides of these functions will keep existing behavior
- * and simply extend it. For example, any customization of the
- * runtime PM methods should continue to call the pm_generic_*
- * functions as the default ones do in addition to the
- * platform-specific behavior.
- */
-void __init platform_bus_set_pm_ops(const struct dev_pm_ops *pm)
-{
- platform_bus_type.pm = pm;
-}
-
int __init platform_bus_init(void)
{
int error;
obj-$(CONFIG_PM_RUNTIME) += runtime.o
obj-$(CONFIG_PM_TRACE_RTC) += trace.o
obj-$(CONFIG_PM_OPP) += opp.o
+obj-$(CONFIG_HAVE_CLK) += clock_ops.o
-ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
-ccflags-$(CONFIG_PM_VERBOSE) += -DDEBUG
+ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
\ No newline at end of file
--- /dev/null
+/*
+ * drivers/base/power/clock_ops.c - Generic clock manipulation PM callbacks
+ *
+ * Copyright (c) 2011 Rafael J. Wysocki <rjw@sisk.pl>, Renesas Electronics Corp.
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/pm.h>
+#include <linux/pm_runtime.h>
+#include <linux/clk.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+
+#ifdef CONFIG_PM_RUNTIME
+
+struct pm_runtime_clk_data {
+ struct list_head clock_list;
+ struct mutex lock;
+};
+
+enum pce_status {
+ PCE_STATUS_NONE = 0,
+ PCE_STATUS_ACQUIRED,
+ PCE_STATUS_ENABLED,
+ PCE_STATUS_ERROR,
+};
+
+struct pm_clock_entry {
+ struct list_head node;
+ char *con_id;
+ struct clk *clk;
+ enum pce_status status;
+};
+
+static struct pm_runtime_clk_data *__to_prd(struct device *dev)
+{
+ return dev ? dev->power.subsys_data : NULL;
+}
+
+/**
+ * pm_runtime_clk_add - Start using a device clock for runtime PM.
+ * @dev: Device whose clock is going to be used for runtime PM.
+ * @con_id: Connection ID of the clock.
+ *
+ * Add the clock represented by @con_id to the list of clocks used for
+ * the runtime PM of @dev.
+ */
+int pm_runtime_clk_add(struct device *dev, const char *con_id)
+{
+ struct pm_runtime_clk_data *prd = __to_prd(dev);
+ struct pm_clock_entry *ce;
+
+ if (!prd)
+ return -EINVAL;
+
+ ce = kzalloc(sizeof(*ce), GFP_KERNEL);
+ if (!ce) {
+ dev_err(dev, "Not enough memory for clock entry.\n");
+ return -ENOMEM;
+ }
+
+ if (con_id) {
+ ce->con_id = kstrdup(con_id, GFP_KERNEL);
+ if (!ce->con_id) {
+ dev_err(dev,
+ "Not enough memory for clock connection ID.\n");
+ kfree(ce);
+ return -ENOMEM;
+ }
+ }
+
+ mutex_lock(&prd->lock);
+ list_add_tail(&ce->node, &prd->clock_list);
+ mutex_unlock(&prd->lock);
+ return 0;
+}
+
+/**
+ * __pm_runtime_clk_remove - Destroy runtime PM clock entry.
+ * @ce: Runtime PM clock entry to destroy.
+ *
+ * This routine must be called under the mutex protecting the runtime PM list
+ * of clocks corresponding the the @ce's device.
+ */
+static void __pm_runtime_clk_remove(struct pm_clock_entry *ce)
+{
+ if (!ce)
+ return;
+
+ list_del(&ce->node);
+
+ if (ce->status < PCE_STATUS_ERROR) {
+ if (ce->status == PCE_STATUS_ENABLED)
+ clk_disable(ce->clk);
+
+ if (ce->status >= PCE_STATUS_ACQUIRED)
+ clk_put(ce->clk);
+ }
+
+ if (ce->con_id)
+ kfree(ce->con_id);
+
+ kfree(ce);
+}
+
+/**
+ * pm_runtime_clk_remove - Stop using a device clock for runtime PM.
+ * @dev: Device whose clock should not be used for runtime PM any more.
+ * @con_id: Connection ID of the clock.
+ *
+ * Remove the clock represented by @con_id from the list of clocks used for
+ * the runtime PM of @dev.
+ */
+void pm_runtime_clk_remove(struct device *dev, const char *con_id)
+{
+ struct pm_runtime_clk_data *prd = __to_prd(dev);
+ struct pm_clock_entry *ce;
+
+ if (!prd)
+ return;
+
+ mutex_lock(&prd->lock);
+
+ list_for_each_entry(ce, &prd->clock_list, node) {
+ if (!con_id && !ce->con_id) {
+ __pm_runtime_clk_remove(ce);
+ break;
+ } else if (!con_id || !ce->con_id) {
+ continue;
+ } else if (!strcmp(con_id, ce->con_id)) {
+ __pm_runtime_clk_remove(ce);
+ break;
+ }
+ }
+
+ mutex_unlock(&prd->lock);
+}
+
+/**
+ * pm_runtime_clk_init - Initialize a device's list of runtime PM clocks.
+ * @dev: Device to initialize the list of runtime PM clocks for.
+ *
+ * Allocate a struct pm_runtime_clk_data object, initialize its lock member and
+ * make the @dev's power.subsys_data field point to it.
+ */
+int pm_runtime_clk_init(struct device *dev)
+{
+ struct pm_runtime_clk_data *prd;
+
+ prd = kzalloc(sizeof(*prd), GFP_KERNEL);
+ if (!prd) {
+ dev_err(dev, "Not enough memory fo runtime PM data.\n");
+ return -ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&prd->clock_list);
+ mutex_init(&prd->lock);
+ dev->power.subsys_data = prd;
+ return 0;
+}
+
+/**
+ * pm_runtime_clk_destroy - Destroy a device's list of runtime PM clocks.
+ * @dev: Device to destroy the list of runtime PM clocks for.
+ *
+ * Clear the @dev's power.subsys_data field, remove the list of clock entries
+ * from the struct pm_runtime_clk_data object pointed to by it before and free
+ * that object.
+ */
+void pm_runtime_clk_destroy(struct device *dev)
+{
+ struct pm_runtime_clk_data *prd = __to_prd(dev);
+ struct pm_clock_entry *ce, *c;
+
+ if (!prd)
+ return;
+
+ dev->power.subsys_data = NULL;
+
+ mutex_lock(&prd->lock);
+
+ list_for_each_entry_safe_reverse(ce, c, &prd->clock_list, node)
+ __pm_runtime_clk_remove(ce);
+
+ mutex_unlock(&prd->lock);
+
+ kfree(prd);
+}
+
+/**
+ * pm_runtime_clk_acquire - Acquire a device clock.
+ * @dev: Device whose clock is to be acquired.
+ * @con_id: Connection ID of the clock.
+ */
+static void pm_runtime_clk_acquire(struct device *dev,
+ struct pm_clock_entry *ce)
+{
+ ce->clk = clk_get(dev, ce->con_id);
+ if (IS_ERR(ce->clk)) {
+ ce->status = PCE_STATUS_ERROR;
+ } else {
+ ce->status = PCE_STATUS_ACQUIRED;
+ dev_dbg(dev, "Clock %s managed by runtime PM.\n", ce->con_id);
+ }
+}
+
+/**
+ * pm_runtime_clk_suspend - Disable clocks in a device's runtime PM clock list.
+ * @dev: Device to disable the clocks for.
+ */
+int pm_runtime_clk_suspend(struct device *dev)
+{
+ struct pm_runtime_clk_data *prd = __to_prd(dev);
+ struct pm_clock_entry *ce;
+
+ dev_dbg(dev, "%s()\n", __func__);
+
+ if (!prd)
+ return 0;
+
+ mutex_lock(&prd->lock);
+
+ list_for_each_entry_reverse(ce, &prd->clock_list, node) {
+ if (ce->status == PCE_STATUS_NONE)
+ pm_runtime_clk_acquire(dev, ce);
+
+ if (ce->status < PCE_STATUS_ERROR) {
+ clk_disable(ce->clk);
+ ce->status = PCE_STATUS_ACQUIRED;
+ }
+ }
+
+ mutex_unlock(&prd->lock);
+
+ return 0;
+}
+
+/**
+ * pm_runtime_clk_resume - Enable clocks in a device's runtime PM clock list.
+ * @dev: Device to enable the clocks for.
+ */
+int pm_runtime_clk_resume(struct device *dev)
+{
+ struct pm_runtime_clk_data *prd = __to_prd(dev);
+ struct pm_clock_entry *ce;
+
+ dev_dbg(dev, "%s()\n", __func__);
+
+ if (!prd)
+ return 0;
+
+ mutex_lock(&prd->lock);
+
+ list_for_each_entry(ce, &prd->clock_list, node) {
+ if (ce->status == PCE_STATUS_NONE)
+ pm_runtime_clk_acquire(dev, ce);
+
+ if (ce->status < PCE_STATUS_ERROR) {
+ clk_enable(ce->clk);
+ ce->status = PCE_STATUS_ENABLED;
+ }
+ }
+
+ mutex_unlock(&prd->lock);
+
+ return 0;
+}
+
+/**
+ * pm_runtime_clk_notify - Notify routine for device addition and removal.
+ * @nb: Notifier block object this function is a member of.
+ * @action: Operation being carried out by the caller.
+ * @data: Device the routine is being run for.
+ *
+ * For this function to work, @nb must be a member of an object of type
+ * struct pm_clk_notifier_block containing all of the requisite data.
+ * Specifically, the pwr_domain member of that object is copied to the device's
+ * pwr_domain field and its con_ids member is used to populate the device's list
+ * of runtime PM clocks, depending on @action.
+ *
+ * If the device's pwr_domain field is already populated with a value different
+ * from the one stored in the struct pm_clk_notifier_block object, the function
+ * does nothing.
+ */
+static int pm_runtime_clk_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct pm_clk_notifier_block *clknb;
+ struct device *dev = data;
+ char *con_id;
+ int error;
+
+ dev_dbg(dev, "%s() %ld\n", __func__, action);
+
+ clknb = container_of(nb, struct pm_clk_notifier_block, nb);
+
+ switch (action) {
+ case BUS_NOTIFY_ADD_DEVICE:
+ if (dev->pwr_domain)
+ break;
+
+ error = pm_runtime_clk_init(dev);
+ if (error)
+ break;
+
+ dev->pwr_domain = clknb->pwr_domain;
+ if (clknb->con_ids[0]) {
+ for (con_id = clknb->con_ids[0]; *con_id; con_id++)
+ pm_runtime_clk_add(dev, con_id);
+ } else {
+ pm_runtime_clk_add(dev, NULL);
+ }
+
+ break;
+ case BUS_NOTIFY_DEL_DEVICE:
+ if (dev->pwr_domain != clknb->pwr_domain)
+ break;
+
+ dev->pwr_domain = NULL;
+ pm_runtime_clk_destroy(dev);
+ break;
+ }
+
+ return 0;
+}
+
+#else /* !CONFIG_PM_RUNTIME */
+
+/**
+ * enable_clock - Enable a device clock.
+ * @dev: Device whose clock is to be enabled.
+ * @con_id: Connection ID of the clock.
+ */
+static void enable_clock(struct device *dev, const char *con_id)
+{
+ struct clk *clk;
+
+ clk = clk_get(dev, con_id);
+ if (!IS_ERR(clk)) {
+ clk_enable(clk);
+ clk_put(clk);
+ dev_info(dev, "Runtime PM disabled, clock forced on.\n");
+ }
+}
+
+/**
+ * disable_clock - Disable a device clock.
+ * @dev: Device whose clock is to be disabled.
+ * @con_id: Connection ID of the clock.
+ */
+static void disable_clock(struct device *dev, const char *con_id)
+{
+ struct clk *clk;
+
+ clk = clk_get(dev, con_id);
+ if (!IS_ERR(clk)) {
+ clk_disable(clk);
+ clk_put(clk);
+ dev_info(dev, "Runtime PM disabled, clock forced off.\n");
+ }
+}
+
+/**
+ * pm_runtime_clk_notify - Notify routine for device addition and removal.
+ * @nb: Notifier block object this function is a member of.
+ * @action: Operation being carried out by the caller.
+ * @data: Device the routine is being run for.
+ *
+ * For this function to work, @nb must be a member of an object of type
+ * struct pm_clk_notifier_block containing all of the requisite data.
+ * Specifically, the con_ids member of that object is used to enable or disable
+ * the device's clocks, depending on @action.
+ */
+static int pm_runtime_clk_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct pm_clk_notifier_block *clknb;
+ struct device *dev = data;
+ char *con_id;
+
+ dev_dbg(dev, "%s() %ld\n", __func__, action);
+
+ clknb = container_of(nb, struct pm_clk_notifier_block, nb);
+
+ switch (action) {
+ case BUS_NOTIFY_ADD_DEVICE:
+ if (clknb->con_ids[0]) {
+ for (con_id = clknb->con_ids[0]; *con_id; con_id++)
+ enable_clock(dev, con_id);
+ } else {
+ enable_clock(dev, NULL);
+ }
+ break;
+ case BUS_NOTIFY_DEL_DEVICE:
+ if (clknb->con_ids[0]) {
+ for (con_id = clknb->con_ids[0]; *con_id; con_id++)
+ disable_clock(dev, con_id);
+ } else {
+ disable_clock(dev, NULL);
+ }
+ break;
+ }
+
+ return 0;
+}
+
+#endif /* !CONFIG_PM_RUNTIME */
+
+/**
+ * pm_runtime_clk_add_notifier - Add bus type notifier for runtime PM clocks.
+ * @bus: Bus type to add the notifier to.
+ * @clknb: Notifier to be added to the given bus type.
+ *
+ * The nb member of @clknb is not expected to be initialized and its
+ * notifier_call member will be replaced with pm_runtime_clk_notify(). However,
+ * the remaining members of @clknb should be populated prior to calling this
+ * routine.
+ */
+void pm_runtime_clk_add_notifier(struct bus_type *bus,
+ struct pm_clk_notifier_block *clknb)
+{
+ if (!bus || !clknb)
+ return;
+
+ clknb->nb.notifier_call = pm_runtime_clk_notify;
+ bus_register_notifier(bus, &clknb->nb);
+}
#endif /* CONFIG_PM_RUNTIME */
#ifdef CONFIG_PM_SLEEP
+/**
+ * pm_generic_prepare - Generic routine preparing a device for power transition.
+ * @dev: Device to prepare.
+ *
+ * Prepare a device for a system-wide power transition.
+ */
+int pm_generic_prepare(struct device *dev)
+{
+ struct device_driver *drv = dev->driver;
+ int ret = 0;
+
+ if (drv && drv->pm && drv->pm->prepare)
+ ret = drv->pm->prepare(dev);
+
+ return ret;
+}
+
/**
* __pm_generic_call - Generic suspend/freeze/poweroff/thaw subsystem callback.
* @dev: Device to handle.
return __pm_generic_resume(dev, PM_EVENT_RESTORE);
}
EXPORT_SYMBOL_GPL(pm_generic_restore);
+
+/**
+ * pm_generic_complete - Generic routine competing a device power transition.
+ * @dev: Device to handle.
+ *
+ * Complete a device power transition during a system-wide power transition.
+ */
+void pm_generic_complete(struct device *dev)
+{
+ struct device_driver *drv = dev->driver;
+
+ if (drv && drv->pm && drv->pm->complete)
+ drv->pm->complete(dev);
+
+ /*
+ * Let runtime PM try to suspend devices that haven't been in use before
+ * going into the system-wide sleep state we're resuming from.
+ */
+ pm_runtime_idle(dev);
+}
#endif /* CONFIG_PM_SLEEP */
struct dev_pm_ops generic_subsys_pm_ops = {
#ifdef CONFIG_PM_SLEEP
+ .prepare = pm_generic_prepare,
.suspend = pm_generic_suspend,
.resume = pm_generic_resume,
.freeze = pm_generic_freeze,
.thaw = pm_generic_thaw,
.poweroff = pm_generic_poweroff,
.restore = pm_generic_restore,
+ .complete = pm_generic_complete,
#endif
#ifdef CONFIG_PM_RUNTIME
.runtime_suspend = pm_generic_runtime_suspend,
dev->power.wakeup = NULL;
spin_lock_init(&dev->power.lock);
pm_runtime_init(dev);
+ INIT_LIST_HEAD(&dev->power.entry);
}
/**
if (dev->pwr_domain) {
pm_dev_dbg(dev, state, "EARLY power domain ");
- pm_noirq_op(dev, &dev->pwr_domain->ops, state);
- }
-
- if (dev->type && dev->type->pm) {
+ error = pm_noirq_op(dev, &dev->pwr_domain->ops, state);
+ } else if (dev->type && dev->type->pm) {
pm_dev_dbg(dev, state, "EARLY type ");
error = pm_noirq_op(dev, dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
if (dev->pwr_domain) {
pm_dev_dbg(dev, state, "power domain ");
- pm_op(dev, &dev->pwr_domain->ops, state);
+ error = pm_op(dev, &dev->pwr_domain->ops, state);
+ goto End;
}
if (dev->type && dev->type->pm) {
* Execute the appropriate "resume" callback for all devices whose status
* indicates that they are suspended.
*/
-static void dpm_resume(pm_message_t state)
+void dpm_resume(pm_message_t state)
{
struct device *dev;
ktime_t starttime = ktime_get();
+ might_sleep();
+
mutex_lock(&dpm_list_mtx);
pm_transition = state;
async_error = 0;
{
device_lock(dev);
- if (dev->pwr_domain && dev->pwr_domain->ops.complete) {
+ if (dev->pwr_domain) {
pm_dev_dbg(dev, state, "completing power domain ");
- dev->pwr_domain->ops.complete(dev);
- }
-
- if (dev->type && dev->type->pm) {
+ if (dev->pwr_domain->ops.complete)
+ dev->pwr_domain->ops.complete(dev);
+ } else if (dev->type && dev->type->pm) {
pm_dev_dbg(dev, state, "completing type ");
if (dev->type->pm->complete)
dev->type->pm->complete(dev);
* Execute the ->complete() callbacks for all devices whose PM status is not
* DPM_ON (this allows new devices to be registered).
*/
-static void dpm_complete(pm_message_t state)
+void dpm_complete(pm_message_t state)
{
struct list_head list;
+ might_sleep();
+
INIT_LIST_HEAD(&list);
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_prepared_list)) {
*/
void dpm_resume_end(pm_message_t state)
{
- might_sleep();
dpm_resume(state);
dpm_complete(state);
}
{
int error;
- if (dev->type && dev->type->pm) {
+ if (dev->pwr_domain) {
+ pm_dev_dbg(dev, state, "LATE power domain ");
+ error = pm_noirq_op(dev, &dev->pwr_domain->ops, state);
+ if (error)
+ return error;
+ } else if (dev->type && dev->type->pm) {
pm_dev_dbg(dev, state, "LATE type ");
error = pm_noirq_op(dev, dev->type->pm, state);
if (error)
return error;
}
- if (dev->pwr_domain) {
- pm_dev_dbg(dev, state, "LATE power domain ");
- pm_noirq_op(dev, &dev->pwr_domain->ops, state);
- }
-
return 0;
}
goto End;
}
+ if (dev->pwr_domain) {
+ pm_dev_dbg(dev, state, "power domain ");
+ error = pm_op(dev, &dev->pwr_domain->ops, state);
+ goto End;
+ }
+
if (dev->type && dev->type->pm) {
pm_dev_dbg(dev, state, "type ");
error = pm_op(dev, dev->type->pm, state);
- goto Domain;
+ goto End;
}
if (dev->class) {
if (dev->class->pm) {
pm_dev_dbg(dev, state, "class ");
error = pm_op(dev, dev->class->pm, state);
- goto Domain;
+ goto End;
} else if (dev->class->suspend) {
pm_dev_dbg(dev, state, "legacy class ");
error = legacy_suspend(dev, state, dev->class->suspend);
- goto Domain;
+ goto End;
}
}
}
}
- Domain:
- if (!error && dev->pwr_domain) {
- pm_dev_dbg(dev, state, "power domain ");
- pm_op(dev, &dev->pwr_domain->ops, state);
- }
-
End:
device_unlock(dev);
complete_all(&dev->power.completion);
* dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
* @state: PM transition of the system being carried out.
*/
-static int dpm_suspend(pm_message_t state)
+int dpm_suspend(pm_message_t state)
{
ktime_t starttime = ktime_get();
int error = 0;
+ might_sleep();
+
mutex_lock(&dpm_list_mtx);
pm_transition = state;
async_error = 0;
device_lock(dev);
- if (dev->type && dev->type->pm) {
+ if (dev->pwr_domain) {
+ pm_dev_dbg(dev, state, "preparing power domain ");
+ if (dev->pwr_domain->ops.prepare)
+ error = dev->pwr_domain->ops.prepare(dev);
+ suspend_report_result(dev->pwr_domain->ops.prepare, error);
+ if (error)
+ goto End;
+ } else if (dev->type && dev->type->pm) {
pm_dev_dbg(dev, state, "preparing type ");
if (dev->type->pm->prepare)
error = dev->type->pm->prepare(dev);
if (dev->bus->pm->prepare)
error = dev->bus->pm->prepare(dev);
suspend_report_result(dev->bus->pm->prepare, error);
- if (error)
- goto End;
- }
-
- if (dev->pwr_domain && dev->pwr_domain->ops.prepare) {
- pm_dev_dbg(dev, state, "preparing power domain ");
- dev->pwr_domain->ops.prepare(dev);
}
End:
*
* Execute the ->prepare() callback(s) for all devices.
*/
-static int dpm_prepare(pm_message_t state)
+int dpm_prepare(pm_message_t state)
{
int error = 0;
+ might_sleep();
+
mutex_lock(&dpm_list_mtx);
while (!list_empty(&dpm_list)) {
struct device *dev = to_device(dpm_list.next);
{
int error;
- might_sleep();
error = dpm_prepare(state);
if (!error)
error = dpm_suspend(state);
static int rpm_idle(struct device *dev, int rpmflags)
{
int (*callback)(struct device *);
- int (*domain_callback)(struct device *);
int retval;
retval = rpm_check_suspend_allowed(dev);
dev->power.idle_notification = true;
- if (dev->type && dev->type->pm)
+ if (dev->pwr_domain)
+ callback = dev->pwr_domain->ops.runtime_idle;
+ else if (dev->type && dev->type->pm)
callback = dev->type->pm->runtime_idle;
else if (dev->class && dev->class->pm)
callback = dev->class->pm->runtime_idle;
else
callback = NULL;
- if (dev->pwr_domain)
- domain_callback = dev->pwr_domain->ops.runtime_idle;
- else
- domain_callback = NULL;
-
- if (callback || domain_callback) {
+ if (callback) {
spin_unlock_irq(&dev->power.lock);
- if (domain_callback)
- retval = domain_callback(dev);
-
- if (!retval && callback)
- callback(dev);
+ callback(dev);
spin_lock_irq(&dev->power.lock);
}
__update_runtime_status(dev, RPM_SUSPENDING);
- if (dev->type && dev->type->pm)
+ if (dev->pwr_domain)
+ callback = dev->pwr_domain->ops.runtime_suspend;
+ else if (dev->type && dev->type->pm)
callback = dev->type->pm->runtime_suspend;
else if (dev->class && dev->class->pm)
callback = dev->class->pm->runtime_suspend;
else
pm_runtime_cancel_pending(dev);
} else {
- if (dev->pwr_domain)
- rpm_callback(dev->pwr_domain->ops.runtime_suspend, dev);
no_callback:
__update_runtime_status(dev, RPM_SUSPENDED);
pm_runtime_deactivate_timer(dev);
__update_runtime_status(dev, RPM_RESUMING);
if (dev->pwr_domain)
- rpm_callback(dev->pwr_domain->ops.runtime_resume, dev);
-
- if (dev->type && dev->type->pm)
+ callback = dev->pwr_domain->ops.runtime_resume;
+ else if (dev->type && dev->type->pm)
callback = dev->type->pm->runtime_resume;
else if (dev->class && dev->class->pm)
callback = dev->class->pm->runtime_resume;
static DEVICE_ATTR(autosuspend_delay_ms, 0644, autosuspend_delay_ms_show,
autosuspend_delay_ms_store);
-#endif
+#endif /* CONFIG_PM_RUNTIME */
+#ifdef CONFIG_PM_SLEEP
static ssize_t
wake_show(struct device * dev, struct device_attribute *attr, char * buf)
{
static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store);
-#ifdef CONFIG_PM_SLEEP
static ssize_t wakeup_count_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
spin_lock_irq(&events_lock);
list_add_rcu(&ws->entry, &wakeup_sources);
spin_unlock_irq(&events_lock);
- synchronize_rcu();
}
EXPORT_SYMBOL_GPL(wakeup_source_add);
if (!!dev->power.can_wakeup == !!capable)
return;
- if (device_is_registered(dev)) {
+ if (device_is_registered(dev) && !list_empty(&dev->power.entry)) {
if (capable) {
if (wakeup_sysfs_add(dev))
return;
kobject_put(&sysdev->kobj);
}
-
-#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
-/**
- * sysdev_shutdown - Shut down all system devices.
- *
- * Loop over each class of system devices, and the devices in each
- * of those classes. For each device, we call the shutdown method for
- * each driver registered for the device - the auxiliaries,
- * and the class driver.
- *
- * Note: The list is iterated in reverse order, so that we shut down
- * child devices before we shut down their parents. The list ordering
- * is guaranteed by virtue of the fact that child devices are registered
- * after their parents.
- */
-void sysdev_shutdown(void)
-{
- struct sysdev_class *cls;
-
- pr_debug("Shutting Down System Devices\n");
-
- mutex_lock(&sysdev_drivers_lock);
- list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
- struct sys_device *sysdev;
-
- pr_debug("Shutting down type '%s':\n",
- kobject_name(&cls->kset.kobj));
-
- list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
- struct sysdev_driver *drv;
- pr_debug(" %s\n", kobject_name(&sysdev->kobj));
-
- /* Call auxiliary drivers first */
- list_for_each_entry(drv, &cls->drivers, entry) {
- if (drv->shutdown)
- drv->shutdown(sysdev);
- }
-
- /* Now call the generic one */
- if (cls->shutdown)
- cls->shutdown(sysdev);
- }
- }
- mutex_unlock(&sysdev_drivers_lock);
-}
-
-static void __sysdev_resume(struct sys_device *dev)
-{
- struct sysdev_class *cls = dev->cls;
- struct sysdev_driver *drv;
-
- /* First, call the class-specific one */
- if (cls->resume)
- cls->resume(dev);
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled after %pF\n", cls->resume);
-
- /* Call auxiliary drivers next. */
- list_for_each_entry(drv, &cls->drivers, entry) {
- if (drv->resume)
- drv->resume(dev);
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled after %pF\n", drv->resume);
- }
-}
-
-/**
- * sysdev_suspend - Suspend all system devices.
- * @state: Power state to enter.
- *
- * We perform an almost identical operation as sysdev_shutdown()
- * above, though calling ->suspend() instead. Interrupts are disabled
- * when this called. Devices are responsible for both saving state and
- * quiescing or powering down the device.
- *
- * This is only called by the device PM core, so we let them handle
- * all synchronization.
- */
-int sysdev_suspend(pm_message_t state)
-{
- struct sysdev_class *cls;
- struct sys_device *sysdev, *err_dev;
- struct sysdev_driver *drv, *err_drv;
- int ret;
-
- pr_debug("Checking wake-up interrupts\n");
-
- /* Return error code if there are any wake-up interrupts pending */
- ret = check_wakeup_irqs();
- if (ret)
- return ret;
-
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled while suspending system devices\n");
-
- pr_debug("Suspending System Devices\n");
-
- list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
- pr_debug("Suspending type '%s':\n",
- kobject_name(&cls->kset.kobj));
-
- list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
- pr_debug(" %s\n", kobject_name(&sysdev->kobj));
-
- /* Call auxiliary drivers first */
- list_for_each_entry(drv, &cls->drivers, entry) {
- if (drv->suspend) {
- ret = drv->suspend(sysdev, state);
- if (ret)
- goto aux_driver;
- }
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled after %pF\n",
- drv->suspend);
- }
-
- /* Now call the generic one */
- if (cls->suspend) {
- ret = cls->suspend(sysdev, state);
- if (ret)
- goto cls_driver;
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled after %pF\n",
- cls->suspend);
- }
- }
- }
- return 0;
- /* resume current sysdev */
-cls_driver:
- drv = NULL;
- printk(KERN_ERR "Class suspend failed for %s: %d\n",
- kobject_name(&sysdev->kobj), ret);
-
-aux_driver:
- if (drv)
- printk(KERN_ERR "Class driver suspend failed for %s: %d\n",
- kobject_name(&sysdev->kobj), ret);
- list_for_each_entry(err_drv, &cls->drivers, entry) {
- if (err_drv == drv)
- break;
- if (err_drv->resume)
- err_drv->resume(sysdev);
- }
-
- /* resume other sysdevs in current class */
- list_for_each_entry(err_dev, &cls->kset.list, kobj.entry) {
- if (err_dev == sysdev)
- break;
- pr_debug(" %s\n", kobject_name(&err_dev->kobj));
- __sysdev_resume(err_dev);
- }
-
- /* resume other classes */
- list_for_each_entry_continue(cls, &system_kset->list, kset.kobj.entry) {
- list_for_each_entry(err_dev, &cls->kset.list, kobj.entry) {
- pr_debug(" %s\n", kobject_name(&err_dev->kobj));
- __sysdev_resume(err_dev);
- }
- }
- return ret;
-}
-EXPORT_SYMBOL_GPL(sysdev_suspend);
-
-/**
- * sysdev_resume - Bring system devices back to life.
- *
- * Similar to sysdev_suspend(), but we iterate the list forwards
- * to guarantee that parent devices are resumed before their children.
- *
- * Note: Interrupts are disabled when called.
- */
-int sysdev_resume(void)
-{
- struct sysdev_class *cls;
-
- WARN_ONCE(!irqs_disabled(),
- "Interrupts enabled while resuming system devices\n");
-
- pr_debug("Resuming System Devices\n");
-
- list_for_each_entry(cls, &system_kset->list, kset.kobj.entry) {
- struct sys_device *sysdev;
-
- pr_debug("Resuming type '%s':\n",
- kobject_name(&cls->kset.kobj));
-
- list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
- pr_debug(" %s\n", kobject_name(&sysdev->kobj));
-
- __sysdev_resume(sysdev);
- }
- }
- return 0;
-}
-EXPORT_SYMBOL_GPL(sysdev_resume);
-#endif /* CONFIG_ARCH_NO_SYSDEV_OPS */
+EXPORT_SYMBOL_GPL(sysdev_register);
+EXPORT_SYMBOL_GPL(sysdev_unregister);
int __init system_bus_init(void)
{
return 0;
}
-EXPORT_SYMBOL_GPL(sysdev_register);
-EXPORT_SYMBOL_GPL(sysdev_unregister);
-
#define to_ext_attr(x) container_of(x, struct sysdev_ext_attribute, attr)
ssize_t sysdev_store_ulong(struct sys_device *sysdev,
return ret;
}
+EXPORT_SYMBOL_GPL(syscore_suspend);
/**
* syscore_resume - Execute all the registered system core resume callbacks.
"Interrupts enabled after %pF\n", ops->resume);
}
}
+EXPORT_SYMBOL_GPL(syscore_resume);
#endif /* CONFIG_PM_SLEEP */
/**
disk->major = MajorNumber;
disk->first_minor = n << DAC960_MaxPartitionsBits;
disk->fops = &DAC960_BlockDeviceOperations;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
}
/*
Indicate the Block Device Registration completed successfully,
disk->major = FLOPPY_MAJOR;
disk->first_minor = drive;
disk->fops = &floppy_fops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
sprintf(disk->disk_name, "fd%d", drive);
disk->private_data = &unit[drive];
set_capacity(disk, 880*2);
unit[i].disk->first_minor = i;
sprintf(unit[i].disk->disk_name, "fd%d", i);
unit[i].disk->fops = &floppy_fops;
- unit[i].disk->events = DISK_EVENT_MEDIA_CHANGE;
unit[i].disk->private_data = &unit[i];
unit[i].disk->queue = blk_init_queue(do_fd_request,
&ataflop_lock);
disks[dr]->major = FLOPPY_MAJOR;
disks[dr]->first_minor = TOMINOR(dr);
disks[dr]->fops = &floppy_fops;
- disks[dr]->events = DISK_EVENT_MEDIA_CHANGE;
sprintf(disks[dr]->disk_name, "fd%d", dr);
init_timer(&motor_off_timer[dr]);
disk->first_minor = unit;
strcpy(disk->disk_name, cd->name); /* umm... */
disk->fops = &pcd_bdops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
}
}
p->fops = &pd_fops;
p->major = major;
p->first_minor = (disk - pd) << PD_BITS;
- p->events = DISK_EVENT_MEDIA_CHANGE;
disk->gd = p;
p->private_data = disk;
p->queue = pd_queue;
disk->first_minor = unit;
strcpy(disk->disk_name, pf->name);
disk->fops = &pf_fops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
if (!(*drives[unit])[D_PRT])
pf_drive_count++;
}
struct list_head node;
};
+struct rbd_req_coll;
+
/*
* a single io request
*/
struct bio *bio; /* cloned bio */
struct page **pages; /* list of used pages */
u64 len;
+ int coll_index;
+ struct rbd_req_coll *coll;
+};
+
+struct rbd_req_status {
+ int done;
+ int rc;
+ u64 bytes;
+};
+
+/*
+ * a collection of requests
+ */
+struct rbd_req_coll {
+ int total;
+ int num_done;
+ struct kref kref;
+ struct rbd_req_status status[0];
};
struct rbd_snap {
rbd_dev->client = NULL;
}
+/*
+ * Destroy requests collection
+ */
+static void rbd_coll_release(struct kref *kref)
+{
+ struct rbd_req_coll *coll =
+ container_of(kref, struct rbd_req_coll, kref);
+
+ dout("rbd_coll_release %p\n", coll);
+ kfree(coll);
+}
/*
* Create a new header structure, translate header format from the on-disk
return len;
}
+static int rbd_get_num_segments(struct rbd_image_header *header,
+ u64 ofs, u64 len)
+{
+ u64 start_seg = ofs >> header->obj_order;
+ u64 end_seg = (ofs + len - 1) >> header->obj_order;
+ return end_seg - start_seg + 1;
+}
+
/*
* bio helpers
*/
kfree(ops);
}
+static void rbd_coll_end_req_index(struct request *rq,
+ struct rbd_req_coll *coll,
+ int index,
+ int ret, u64 len)
+{
+ struct request_queue *q;
+ int min, max, i;
+
+ dout("rbd_coll_end_req_index %p index %d ret %d len %lld\n",
+ coll, index, ret, len);
+
+ if (!rq)
+ return;
+
+ if (!coll) {
+ blk_end_request(rq, ret, len);
+ return;
+ }
+
+ q = rq->q;
+
+ spin_lock_irq(q->queue_lock);
+ coll->status[index].done = 1;
+ coll->status[index].rc = ret;
+ coll->status[index].bytes = len;
+ max = min = coll->num_done;
+ while (max < coll->total && coll->status[max].done)
+ max++;
+
+ for (i = min; i<max; i++) {
+ __blk_end_request(rq, coll->status[i].rc,
+ coll->status[i].bytes);
+ coll->num_done++;
+ kref_put(&coll->kref, rbd_coll_release);
+ }
+ spin_unlock_irq(q->queue_lock);
+}
+
+static void rbd_coll_end_req(struct rbd_request *req,
+ int ret, u64 len)
+{
+ rbd_coll_end_req_index(req->rq, req->coll, req->coll_index, ret, len);
+}
+
/*
* Send ceph osd request
*/
int flags,
struct ceph_osd_req_op *ops,
int num_reply,
+ struct rbd_req_coll *coll,
+ int coll_index,
void (*rbd_cb)(struct ceph_osd_request *req,
struct ceph_msg *msg),
struct ceph_osd_request **linger_req,
struct ceph_osd_request_head *reqhead;
struct rbd_image_header *header = &dev->header;
- ret = -ENOMEM;
req_data = kzalloc(sizeof(*req_data), GFP_NOIO);
- if (!req_data)
- goto done;
+ if (!req_data) {
+ if (coll)
+ rbd_coll_end_req_index(rq, coll, coll_index,
+ -ENOMEM, len);
+ return -ENOMEM;
+ }
- dout("rbd_do_request len=%lld ofs=%lld\n", len, ofs);
+ if (coll) {
+ req_data->coll = coll;
+ req_data->coll_index = coll_index;
+ }
+
+ dout("rbd_do_request obj=%s ofs=%lld len=%lld\n", obj, len, ofs);
down_read(&header->snap_rwsem);
ops,
false,
GFP_NOIO, pages, bio);
- if (IS_ERR(req)) {
+ if (!req) {
up_read(&header->snap_rwsem);
- ret = PTR_ERR(req);
+ ret = -ENOMEM;
goto done_pages;
}
ret = ceph_osdc_wait_request(&dev->client->osdc, req);
if (ver)
*ver = le64_to_cpu(req->r_reassert_version.version);
- dout("reassert_ver=%lld\n", le64_to_cpu(req->r_reassert_version.version));
+ dout("reassert_ver=%lld\n",
+ le64_to_cpu(req->r_reassert_version.version));
ceph_osdc_put_request(req);
}
return ret;
bio_chain_put(req_data->bio);
ceph_osdc_put_request(req);
done_pages:
+ rbd_coll_end_req(req_data, ret, len);
kfree(req_data);
-done:
- if (rq)
- blk_end_request(rq, ret, len);
return ret;
}
bytes = req_data->len;
}
- blk_end_request(req_data->rq, rc, bytes);
+ rbd_coll_end_req(req_data, rc, bytes);
if (req_data->bio)
bio_chain_put(req_data->bio);
flags,
ops,
2,
+ NULL, 0,
NULL,
linger_req, ver);
if (ret < 0)
u64 snapid,
int opcode, int flags, int num_reply,
u64 ofs, u64 len,
- struct bio *bio)
+ struct bio *bio,
+ struct rbd_req_coll *coll,
+ int coll_index)
{
char *seg_name;
u64 seg_ofs;
flags,
ops,
num_reply,
+ coll, coll_index,
rbd_req_cb, 0, NULL);
+
+ rbd_destroy_ops(ops);
done:
kfree(seg_name);
return ret;
struct rbd_device *rbd_dev,
struct ceph_snap_context *snapc,
u64 ofs, u64 len,
- struct bio *bio)
+ struct bio *bio,
+ struct rbd_req_coll *coll,
+ int coll_index)
{
return rbd_do_op(rq, rbd_dev, snapc, CEPH_NOSNAP,
CEPH_OSD_OP_WRITE,
CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
2,
- ofs, len, bio);
+ ofs, len, bio, coll, coll_index);
}
/*
struct rbd_device *rbd_dev,
u64 snapid,
u64 ofs, u64 len,
- struct bio *bio)
+ struct bio *bio,
+ struct rbd_req_coll *coll,
+ int coll_index)
{
return rbd_do_op(rq, rbd_dev, NULL,
(snapid ? snapid : CEPH_NOSNAP),
CEPH_OSD_OP_READ,
CEPH_OSD_FLAG_READ,
2,
- ofs, len, bio);
+ ofs, len, bio, coll, coll_index);
}
/*
{
struct ceph_osd_req_op *ops;
struct page **pages = NULL;
- int ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_NOTIFY_ACK, 0);
+ int ret;
+
+ ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_NOTIFY_ACK, 0);
if (ret < 0)
return ret;
CEPH_OSD_FLAG_READ,
ops,
1,
+ NULL, 0,
rbd_simple_req_cb, 0, NULL);
rbd_destroy_ops(ops);
return ret;
}
+static struct rbd_req_coll *rbd_alloc_coll(int num_reqs)
+{
+ struct rbd_req_coll *coll =
+ kzalloc(sizeof(struct rbd_req_coll) +
+ sizeof(struct rbd_req_status) * num_reqs,
+ GFP_ATOMIC);
+
+ if (!coll)
+ return NULL;
+ coll->total = num_reqs;
+ kref_init(&coll->kref);
+ return coll;
+}
+
/*
* block device queue callback
*/
bool do_write;
int size, op_size = 0;
u64 ofs;
+ int num_segs, cur_seg = 0;
+ struct rbd_req_coll *coll;
/* peek at request from block layer */
if (!rq)
do_write ? "write" : "read",
size, blk_rq_pos(rq) * 512ULL);
+ num_segs = rbd_get_num_segments(&rbd_dev->header, ofs, size);
+ coll = rbd_alloc_coll(num_segs);
+ if (!coll) {
+ spin_lock_irq(q->queue_lock);
+ __blk_end_request_all(rq, -ENOMEM);
+ goto next;
+ }
+
do {
/* a bio clone to be passed down to OSD req */
dout("rq->bio->bi_vcnt=%d\n", rq->bio->bi_vcnt);
rbd_dev->header.block_name,
ofs, size,
NULL, NULL);
+ kref_get(&coll->kref);
bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
op_size, GFP_ATOMIC);
if (!bio) {
- spin_lock_irq(q->queue_lock);
- __blk_end_request_all(rq, -ENOMEM);
- goto next;
+ rbd_coll_end_req_index(rq, coll, cur_seg,
+ -ENOMEM, op_size);
+ goto next_seg;
}
+
/* init OSD command: write or read */
if (do_write)
rbd_req_write(rq, rbd_dev,
rbd_dev->header.snapc,
ofs,
- op_size, bio);
+ op_size, bio,
+ coll, cur_seg);
else
rbd_req_read(rq, rbd_dev,
cur_snap_id(rbd_dev),
ofs,
- op_size, bio);
+ op_size, bio,
+ coll, cur_seg);
+next_seg:
size -= op_size;
ofs += op_size;
+ cur_seg++;
rq_bio = next_bio;
} while (size > 0);
+ kref_put(&coll->kref, rbd_coll_release);
if (bp)
bio_pair_release(bp);
-
spin_lock_irq(q->queue_lock);
next:
rq = blk_fetch_request(q);
swd->unit[drive].disk->first_minor = drive;
sprintf(swd->unit[drive].disk->disk_name, "fd%d", drive);
swd->unit[drive].disk->fops = &floppy_fops;
- swd->unit[drive].disk->events = DISK_EVENT_MEDIA_CHANGE;
swd->unit[drive].disk->private_data = &swd->unit[drive];
swd->unit[drive].disk->queue = swd->queue;
set_capacity(swd->unit[drive].disk, 2880);
disk->major = FLOPPY_MAJOR;
disk->first_minor = i;
disk->fops = &floppy_fops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
disk->private_data = &floppy_states[i];
disk->queue = swim3_queue;
disk->flags |= GENHD_FL_REMOVABLE;
disk->major = UB_MAJOR;
disk->first_minor = lun->id * UB_PARTS_PER_LUN;
disk->fops = &ub_bd_fops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
disk->private_data = lun;
disk->driverfs_dev = &sc->intf->dev;
ace->gd->major = ace_major;
ace->gd->first_minor = ace->id * ACE_NUM_MINORS;
ace->gd->fops = &ace_fops;
- ace->gd->events = DISK_EVENT_MEDIA_CHANGE;
ace->gd->queue = ace->queue;
ace->gd->private_data = ace;
snprintf(ace->gd->disk_name, 32, "xs%c", ace->id + 'a');
cdinfo(CD_OPEN, "entering cdrom_open\n");
+ /* open is event synchronization point, check events first */
+ check_disk_change(bdev);
+
/* if this was a O_NONBLOCK open and we should honor the flags,
* do a quick open without drive/disc integrity checks. */
cdi->use_count++;
cdinfo(CD_OPEN, "Use count for \"/dev/%s\" now %d\n",
cdi->name, cdi->use_count);
- /* Do this on open. Don't wait for mount, because they might
- not be mounting, but opening with O_NONBLOCK */
- check_disk_change(bdev);
return 0;
err_release:
if (CDROM_CAN(CDC_LOCK) && cdi->options & CDO_LOCK) {
goto probe_fail_cdrom_register;
}
gd.disk->fops = &gdrom_bdops;
- gd.disk->events = DISK_EVENT_MEDIA_CHANGE;
/* latch on to the interrupt */
err = gdrom_set_interrupt_handlers();
if (err)
gendisk->queue = q;
gendisk->fops = &viocd_fops;
gendisk->flags = GENHD_FL_CD|GENHD_FL_REMOVABLE;
- gendisk->events = DISK_EVENT_MEDIA_CHANGE;
set_capacity(gendisk, 0);
gendisk->private_data = d;
d->viocd_disk = gendisk;
.rating = 250,
.read = read_hpet,
.mask = CLOCKSOURCE_MASK(64),
- .mult = 0, /* to be calculated */
- .shift = 10,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
static struct clocksource *hpet_clocksource;
if (!hpet_clocksource) {
hpet_mctr = (void __iomem *)&hpetp->hp_hpet->hpet_mc;
CLKSRC_FSYS_MMIO_SET(clocksource_hpet.fsys_mmio, hpet_mctr);
- clocksource_hpet.mult = clocksource_hz2mult(hpetp->hp_tick_freq,
- clocksource_hpet.shift);
- clocksource_register(&clocksource_hpet);
+ clocksource_register_hz(&clocksource_hpet, hpetp->hp_tick_freq);
hpetp->hp_clocksource = &clocksource_hpet;
hpet_clocksource = &clocksource_hpet;
}
pr_info("%s", version);
}
+static const struct of_device_id n2rng_match[];
static int __devinit n2rng_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
int victoria_falls;
int err = -ENOMEM;
struct n2rng *np;
- if (!op->dev.of_match)
+ match = of_match_device(n2rng_match, &op->dev);
+ if (!match)
return -EINVAL;
- victoria_falls = (op->dev.of_match->data != NULL);
+ victoria_falls = (match->data != NULL);
n2rng_driver_version();
np = kzalloc(sizeof(*np), GFP_KERNEL);
};
#endif /* CONFIG_PCI */
+static struct of_device_id ipmi_match[];
static int __devinit ipmi_probe(struct platform_device *dev)
{
#ifdef CONFIG_OF
+ const struct of_device_id *match;
struct smi_info *info;
struct resource resource;
const __be32 *regsize, *regspacing, *regshift;
dev_info(&dev->dev, "probing via device tree\n");
- if (!dev->dev.of_match)
+ match = of_match_device(ipmi_match, &dev->dev);
+ if (!match)
return -EINVAL;
ret = of_address_to_resource(np, 0, &resource);
return -ENOMEM;
}
- info->si_type = (enum si_type) dev->dev.of_match->data;
+ info->si_type = (enum si_type) match->data;
info->addr_source = SI_DEVICETREE;
info->irq_setup = std_irq_setup;
}
#ifdef CONFIG_OF
-static int __devinit hwicap_of_probe(struct platform_device *op)
+static int __devinit hwicap_of_probe(struct platform_device *op,
+ const struct hwicap_driver_config *config)
{
struct resource res;
const unsigned int *id;
const char *family;
int rc;
- const struct hwicap_driver_config *config = op->dev.of_match->data;
const struct config_registers *regs;
regs);
}
#else
-static inline int hwicap_of_probe(struct platform_device *op)
+static inline int hwicap_of_probe(struct platform_device *op,
+ const struct hwicap_driver_config *config)
{
return -EINVAL;
}
#endif /* CONFIG_OF */
+static const struct of_device_id __devinitconst hwicap_of_match[];
static int __devinit hwicap_drv_probe(struct platform_device *pdev)
{
+ const struct of_device_id *match;
struct resource *res;
const struct config_registers *regs;
const char *family;
- if (pdev->dev.of_match)
- return hwicap_of_probe(pdev);
+ match = of_match_device(hwicap_of_match, &pdev->dev);
+ if (match)
+ return hwicap_of_probe(pdev, match->data);
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
* Then we take the most specific entry - with the following
* order of precedence: dev+con > dev only > con only.
*/
-static struct clk *clk_find(const char *dev_id, const char *con_id)
+static struct clk_lookup *clk_find(const char *dev_id, const char *con_id)
{
- struct clk_lookup *p;
- struct clk *clk = NULL;
+ struct clk_lookup *p, *cl = NULL;
int match, best = 0;
list_for_each_entry(p, &clocks, node) {
}
if (match > best) {
- clk = p->clk;
+ cl = p;
if (match != 3)
best = match;
else
break;
}
}
- return clk;
+ return cl;
}
struct clk *clk_get_sys(const char *dev_id, const char *con_id)
{
- struct clk *clk;
+ struct clk_lookup *cl;
mutex_lock(&clocks_mutex);
- clk = clk_find(dev_id, con_id);
- if (clk && !__clk_get(clk))
- clk = NULL;
+ cl = clk_find(dev_id, con_id);
+ if (cl && !__clk_get(cl->clk))
+ cl = NULL;
mutex_unlock(&clocks_mutex);
- return clk ? clk : ERR_PTR(-ENOENT);
+ return cl ? cl->clk : ERR_PTR(-ENOENT);
}
EXPORT_SYMBOL(clk_get_sys);
--- /dev/null
+config CLKSRC_I8253
+ bool
obj-$(CONFIG_SH_TIMER_CMT) += sh_cmt.o
obj-$(CONFIG_SH_TIMER_MTU2) += sh_mtu2.o
obj-$(CONFIG_SH_TIMER_TMU) += sh_tmu.o
+obj-$(CONFIG_CLKSRC_I8253) += i8253.o
.rating = 250,
.read = read_cyclone,
.mask = CYCLONE_TIMER_MASK,
- .mult = 10,
- .shift = 0,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
}
cyclone_ptr = cyclone_timer;
- /* sort out mult/shift values: */
- clocksource_cyclone.shift = 22;
- clocksource_cyclone.mult = clocksource_hz2mult(CYCLONE_TIMER_FREQ,
- clocksource_cyclone.shift);
-
- return clocksource_register(&clocksource_cyclone);
+ return clocksource_register_hz(&clocksource_cyclone,
+ CYCLONE_TIMER_FREQ);
}
arch_initcall(init_cyclone_clocksource);
--- /dev/null
+/*
+ * i8253 PIT clocksource
+ */
+#include <linux/clocksource.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/spinlock.h>
+#include <linux/timex.h>
+
+#include <asm/i8253.h>
+
+/*
+ * Since the PIT overflows every tick, its not very useful
+ * to just read by itself. So use jiffies to emulate a free
+ * running counter:
+ */
+static cycle_t i8253_read(struct clocksource *cs)
+{
+ static int old_count;
+ static u32 old_jifs;
+ unsigned long flags;
+ int count;
+ u32 jifs;
+
+ raw_spin_lock_irqsave(&i8253_lock, flags);
+ /*
+ * Although our caller may have the read side of xtime_lock,
+ * this is now a seqlock, and we are cheating in this routine
+ * by having side effects on state that we cannot undo if
+ * there is a collision on the seqlock and our caller has to
+ * retry. (Namely, old_jifs and old_count.) So we must treat
+ * jiffies as volatile despite the lock. We read jiffies
+ * before latching the timer count to guarantee that although
+ * the jiffies value might be older than the count (that is,
+ * the counter may underflow between the last point where
+ * jiffies was incremented and the point where we latch the
+ * count), it cannot be newer.
+ */
+ jifs = jiffies;
+ outb_pit(0x00, PIT_MODE); /* latch the count ASAP */
+ count = inb_pit(PIT_CH0); /* read the latched count */
+ count |= inb_pit(PIT_CH0) << 8;
+
+ /* VIA686a test code... reset the latch if count > max + 1 */
+ if (count > LATCH) {
+ outb_pit(0x34, PIT_MODE);
+ outb_pit(PIT_LATCH & 0xff, PIT_CH0);
+ outb_pit(PIT_LATCH >> 8, PIT_CH0);
+ count = PIT_LATCH - 1;
+ }
+
+ /*
+ * It's possible for count to appear to go the wrong way for a
+ * couple of reasons:
+ *
+ * 1. The timer counter underflows, but we haven't handled the
+ * resulting interrupt and incremented jiffies yet.
+ * 2. Hardware problem with the timer, not giving us continuous time,
+ * the counter does small "jumps" upwards on some Pentium systems,
+ * (see c't 95/10 page 335 for Neptun bug.)
+ *
+ * Previous attempts to handle these cases intelligently were
+ * buggy, so we just do the simple thing now.
+ */
+ if (count > old_count && jifs == old_jifs)
+ count = old_count;
+
+ old_count = count;
+ old_jifs = jifs;
+
+ raw_spin_unlock_irqrestore(&i8253_lock, flags);
+
+ count = (PIT_LATCH - 1) - count;
+
+ return (cycle_t)(jifs * PIT_LATCH) + count;
+}
+
+static struct clocksource i8253_cs = {
+ .name = "pit",
+ .rating = 110,
+ .read = i8253_read,
+ .mask = CLOCKSOURCE_MASK(32),
+};
+
+int __init clocksource_i8253_init(void)
+{
+ return clocksource_register_hz(&i8253_cs, PIT_TICK_RATE);
+}
+menu "CPU Frequency scaling"
+
config CPU_FREQ
bool "CPU Frequency scaling"
help
config CPU_FREQ_TABLE
tristate
-config CPU_FREQ_DEBUG
- bool "Enable CPUfreq debugging"
- help
- Say Y here to enable CPUfreq subsystem (including drivers)
- debugging. You will need to activate it via the kernel
- command line by passing
- cpufreq.debug=<value>
-
- To get <value>, add
- 1 to activate CPUfreq core debugging,
- 2 to activate CPUfreq drivers debugging, and
- 4 to activate CPUfreq governor debugging
-
config CPU_FREQ_STAT
tristate "CPU frequency translation statistics"
select CPU_FREQ_TABLE
If in doubt, say N.
-endif # CPU_FREQ
+menu "x86 CPU frequency scaling drivers"
+depends on X86
+source "drivers/cpufreq/Kconfig.x86"
+endmenu
+
+endif
+endmenu
--- /dev/null
+#
+# x86 CPU Frequency scaling drivers
+#
+
+config X86_PCC_CPUFREQ
+ tristate "Processor Clocking Control interface driver"
+ depends on ACPI && ACPI_PROCESSOR
+ help
+ This driver adds support for the PCC interface.
+
+ For details, take a look at:
+ <file:Documentation/cpu-freq/pcc-cpufreq.txt>.
+
+ To compile this driver as a module, choose M here: the
+ module will be called pcc-cpufreq.
+
+ If in doubt, say N.
+
+config X86_ACPI_CPUFREQ
+ tristate "ACPI Processor P-States driver"
+ select CPU_FREQ_TABLE
+ depends on ACPI_PROCESSOR
+ help
+ This driver adds a CPUFreq driver which utilizes the ACPI
+ Processor Performance States.
+ This driver also supports Intel Enhanced Speedstep.
+
+ To compile this driver as a module, choose M here: the
+ module will be called acpi-cpufreq.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config ELAN_CPUFREQ
+ tristate "AMD Elan SC400 and SC410"
+ select CPU_FREQ_TABLE
+ depends on X86_ELAN
+ ---help---
+ This adds the CPUFreq driver for AMD Elan SC400 and SC410
+ processors.
+
+ You need to specify the processor maximum speed as boot
+ parameter: elanfreq=maxspeed (in kHz) or as module
+ parameter "max_freq".
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config SC520_CPUFREQ
+ tristate "AMD Elan SC520"
+ select CPU_FREQ_TABLE
+ depends on X86_ELAN
+ ---help---
+ This adds the CPUFreq driver for AMD Elan SC520 processor.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+
+config X86_POWERNOW_K6
+ tristate "AMD Mobile K6-2/K6-3 PowerNow!"
+ select CPU_FREQ_TABLE
+ depends on X86_32
+ help
+ This adds the CPUFreq driver for mobile AMD K6-2+ and mobile
+ AMD K6-3+ processors.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_POWERNOW_K7
+ tristate "AMD Mobile Athlon/Duron PowerNow!"
+ select CPU_FREQ_TABLE
+ depends on X86_32
+ help
+ This adds the CPUFreq driver for mobile AMD K7 mobile processors.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_POWERNOW_K7_ACPI
+ bool
+ depends on X86_POWERNOW_K7 && ACPI_PROCESSOR
+ depends on !(X86_POWERNOW_K7 = y && ACPI_PROCESSOR = m)
+ depends on X86_32
+ default y
+
+config X86_POWERNOW_K8
+ tristate "AMD Opteron/Athlon64 PowerNow!"
+ select CPU_FREQ_TABLE
+ depends on ACPI && ACPI_PROCESSOR
+ help
+ This adds the CPUFreq driver for K8/K10 Opteron/Athlon64 processors.
+
+ To compile this driver as a module, choose M here: the
+ module will be called powernow-k8.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+config X86_GX_SUSPMOD
+ tristate "Cyrix MediaGX/NatSemi Geode Suspend Modulation"
+ depends on X86_32 && PCI
+ help
+ This add the CPUFreq driver for NatSemi Geode processors which
+ support suspend modulation.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_SPEEDSTEP_CENTRINO
+ tristate "Intel Enhanced SpeedStep (deprecated)"
+ select CPU_FREQ_TABLE
+ select X86_SPEEDSTEP_CENTRINO_TABLE if X86_32
+ depends on X86_32 || (X86_64 && ACPI_PROCESSOR)
+ help
+ This is deprecated and this functionality is now merged into
+ acpi_cpufreq (X86_ACPI_CPUFREQ). Use that driver instead of
+ speedstep_centrino.
+ This adds the CPUFreq driver for Enhanced SpeedStep enabled
+ mobile CPUs. This means Intel Pentium M (Centrino) CPUs
+ or 64bit enabled Intel Xeons.
+
+ To compile this driver as a module, choose M here: the
+ module will be called speedstep-centrino.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_SPEEDSTEP_CENTRINO_TABLE
+ bool "Built-in tables for Banias CPUs"
+ depends on X86_32 && X86_SPEEDSTEP_CENTRINO
+ default y
+ help
+ Use built-in tables for Banias CPUs if ACPI encoding
+ is not available.
+
+ If in doubt, say N.
+
+config X86_SPEEDSTEP_ICH
+ tristate "Intel Speedstep on ICH-M chipsets (ioport interface)"
+ select CPU_FREQ_TABLE
+ depends on X86_32
+ help
+ This adds the CPUFreq driver for certain mobile Intel Pentium III
+ (Coppermine), all mobile Intel Pentium III-M (Tualatin) and all
+ mobile Intel Pentium 4 P4-M on systems which have an Intel ICH2,
+ ICH3 or ICH4 southbridge.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_SPEEDSTEP_SMI
+ tristate "Intel SpeedStep on 440BX/ZX/MX chipsets (SMI interface)"
+ select CPU_FREQ_TABLE
+ depends on X86_32 && EXPERIMENTAL
+ help
+ This adds the CPUFreq driver for certain mobile Intel Pentium III
+ (Coppermine), all mobile Intel Pentium III-M (Tualatin)
+ on systems which have an Intel 440BX/ZX/MX southbridge.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_P4_CLOCKMOD
+ tristate "Intel Pentium 4 clock modulation"
+ select CPU_FREQ_TABLE
+ help
+ This adds the CPUFreq driver for Intel Pentium 4 / XEON
+ processors. When enabled it will lower CPU temperature by skipping
+ clocks.
+
+ This driver should be only used in exceptional
+ circumstances when very low power is needed because it causes severe
+ slowdowns and noticeable latencies. Normally Speedstep should be used
+ instead.
+
+ To compile this driver as a module, choose M here: the
+ module will be called p4-clockmod.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ Unless you are absolutely sure say N.
+
+config X86_CPUFREQ_NFORCE2
+ tristate "nVidia nForce2 FSB changing"
+ depends on X86_32 && EXPERIMENTAL
+ help
+ This adds the CPUFreq driver for FSB changing on nVidia nForce2
+ platforms.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_LONGRUN
+ tristate "Transmeta LongRun"
+ depends on X86_32
+ help
+ This adds the CPUFreq driver for Transmeta Crusoe and Efficeon processors
+ which support LongRun.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_LONGHAUL
+ tristate "VIA Cyrix III Longhaul"
+ select CPU_FREQ_TABLE
+ depends on X86_32 && ACPI_PROCESSOR
+ help
+ This adds the CPUFreq driver for VIA Samuel/CyrixIII,
+ VIA Cyrix Samuel/C3, VIA Cyrix Ezra and VIA Cyrix Ezra-T
+ processors.
+
+ For details, take a look at <file:Documentation/cpu-freq/>.
+
+ If in doubt, say N.
+
+config X86_E_POWERSAVER
+ tristate "VIA C7 Enhanced PowerSaver (DANGEROUS)"
+ select CPU_FREQ_TABLE
+ depends on X86_32 && EXPERIMENTAL
+ help
+ This adds the CPUFreq driver for VIA C7 processors. However, this driver
+ does not have any safeguards to prevent operating the CPU out of spec
+ and is thus considered dangerous. Please use the regular ACPI cpufreq
+ driver, enabled by CONFIG_X86_ACPI_CPUFREQ.
+
+ If in doubt, say N.
+
+comment "shared options"
+
+config X86_SPEEDSTEP_LIB
+ tristate
+ default (X86_SPEEDSTEP_ICH || X86_SPEEDSTEP_SMI || X86_P4_CLOCKMOD)
+
+config X86_SPEEDSTEP_RELAXED_CAP_CHECK
+ bool "Relaxed speedstep capability checks"
+ depends on X86_32 && (X86_SPEEDSTEP_SMI || X86_SPEEDSTEP_ICH)
+ help
+ Don't perform all checks for a speedstep capable system which would
+ normally be done. Some ancient or strange systems, though speedstep
+ capable, don't always indicate that they are speedstep capable. This
+ option lets the probing code bypass some of those checks if the
+ parameter "relaxed_check=1" is passed to the module.
+
# CPUfreq cross-arch helpers
obj-$(CONFIG_CPU_FREQ_TABLE) += freq_table.o
+##################################################################################d
+# x86 drivers.
+# Link order matters. K8 is preferred to ACPI because of firmware bugs in early
+# K8 systems. ACPI is preferred to all other hardware-specific drivers.
+# speedstep-* is preferred over p4-clockmod.
+
+obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o mperf.o
+obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o mperf.o
+obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o
+obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o
+obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o
+obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
+obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
+obj-$(CONFIG_ELAN_CPUFREQ) += elanfreq.o
+obj-$(CONFIG_SC520_CPUFREQ) += sc520_freq.o
+obj-$(CONFIG_X86_LONGRUN) += longrun.o
+obj-$(CONFIG_X86_GX_SUSPMOD) += gx-suspmod.o
+obj-$(CONFIG_X86_SPEEDSTEP_ICH) += speedstep-ich.o
+obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
+obj-$(CONFIG_X86_SPEEDSTEP_SMI) += speedstep-smi.o
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
+obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
+obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
+
+##################################################################################d
+
--- /dev/null
+/*
+ * acpi-cpufreq.c - ACPI Processor P-States Driver
+ *
+ * Copyright (C) 2001, 2002 Andy Grover <andrew.grover@intel.com>
+ * Copyright (C) 2001, 2002 Paul Diefenbaugh <paul.s.diefenbaugh@intel.com>
+ * Copyright (C) 2002 - 2004 Dominik Brodowski <linux@brodo.de>
+ * Copyright (C) 2006 Denis Sadykov <denis.m.sadykov@intel.com>
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/compiler.h>
+#include <linux/dmi.h>
+#include <linux/slab.h>
+
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/uaccess.h>
+
+#include <acpi/processor.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include "mperf.h"
+
+MODULE_AUTHOR("Paul Diefenbaugh, Dominik Brodowski");
+MODULE_DESCRIPTION("ACPI Processor P-States Driver");
+MODULE_LICENSE("GPL");
+
+enum {
+ UNDEFINED_CAPABLE = 0,
+ SYSTEM_INTEL_MSR_CAPABLE,
+ SYSTEM_IO_CAPABLE,
+};
+
+#define INTEL_MSR_RANGE (0xffff)
+
+struct acpi_cpufreq_data {
+ struct acpi_processor_performance *acpi_data;
+ struct cpufreq_frequency_table *freq_table;
+ unsigned int resume;
+ unsigned int cpu_feature;
+};
+
+static DEFINE_PER_CPU(struct acpi_cpufreq_data *, acfreq_data);
+
+/* acpi_perf_data is a pointer to percpu data. */
+static struct acpi_processor_performance __percpu *acpi_perf_data;
+
+static struct cpufreq_driver acpi_cpufreq_driver;
+
+static unsigned int acpi_pstate_strict;
+
+static int check_est_cpu(unsigned int cpuid)
+{
+ struct cpuinfo_x86 *cpu = &cpu_data(cpuid);
+
+ return cpu_has(cpu, X86_FEATURE_EST);
+}
+
+static unsigned extract_io(u32 value, struct acpi_cpufreq_data *data)
+{
+ struct acpi_processor_performance *perf;
+ int i;
+
+ perf = data->acpi_data;
+
+ for (i = 0; i < perf->state_count; i++) {
+ if (value == perf->states[i].status)
+ return data->freq_table[i].frequency;
+ }
+ return 0;
+}
+
+static unsigned extract_msr(u32 msr, struct acpi_cpufreq_data *data)
+{
+ int i;
+ struct acpi_processor_performance *perf;
+
+ msr &= INTEL_MSR_RANGE;
+ perf = data->acpi_data;
+
+ for (i = 0; data->freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
+ if (msr == perf->states[data->freq_table[i].index].status)
+ return data->freq_table[i].frequency;
+ }
+ return data->freq_table[0].frequency;
+}
+
+static unsigned extract_freq(u32 val, struct acpi_cpufreq_data *data)
+{
+ switch (data->cpu_feature) {
+ case SYSTEM_INTEL_MSR_CAPABLE:
+ return extract_msr(val, data);
+ case SYSTEM_IO_CAPABLE:
+ return extract_io(val, data);
+ default:
+ return 0;
+ }
+}
+
+struct msr_addr {
+ u32 reg;
+};
+
+struct io_addr {
+ u16 port;
+ u8 bit_width;
+};
+
+struct drv_cmd {
+ unsigned int type;
+ const struct cpumask *mask;
+ union {
+ struct msr_addr msr;
+ struct io_addr io;
+ } addr;
+ u32 val;
+};
+
+/* Called via smp_call_function_single(), on the target CPU */
+static void do_drv_read(void *_cmd)
+{
+ struct drv_cmd *cmd = _cmd;
+ u32 h;
+
+ switch (cmd->type) {
+ case SYSTEM_INTEL_MSR_CAPABLE:
+ rdmsr(cmd->addr.msr.reg, cmd->val, h);
+ break;
+ case SYSTEM_IO_CAPABLE:
+ acpi_os_read_port((acpi_io_address)cmd->addr.io.port,
+ &cmd->val,
+ (u32)cmd->addr.io.bit_width);
+ break;
+ default:
+ break;
+ }
+}
+
+/* Called via smp_call_function_many(), on the target CPUs */
+static void do_drv_write(void *_cmd)
+{
+ struct drv_cmd *cmd = _cmd;
+ u32 lo, hi;
+
+ switch (cmd->type) {
+ case SYSTEM_INTEL_MSR_CAPABLE:
+ rdmsr(cmd->addr.msr.reg, lo, hi);
+ lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
+ wrmsr(cmd->addr.msr.reg, lo, hi);
+ break;
+ case SYSTEM_IO_CAPABLE:
+ acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
+ cmd->val,
+ (u32)cmd->addr.io.bit_width);
+ break;
+ default:
+ break;
+ }
+}
+
+static void drv_read(struct drv_cmd *cmd)
+{
+ int err;
+ cmd->val = 0;
+
+ err = smp_call_function_any(cmd->mask, do_drv_read, cmd, 1);
+ WARN_ON_ONCE(err); /* smp_call_function_any() was buggy? */
+}
+
+static void drv_write(struct drv_cmd *cmd)
+{
+ int this_cpu;
+
+ this_cpu = get_cpu();
+ if (cpumask_test_cpu(this_cpu, cmd->mask))
+ do_drv_write(cmd);
+ smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
+ put_cpu();
+}
+
+static u32 get_cur_val(const struct cpumask *mask)
+{
+ struct acpi_processor_performance *perf;
+ struct drv_cmd cmd;
+
+ if (unlikely(cpumask_empty(mask)))
+ return 0;
+
+ switch (per_cpu(acfreq_data, cpumask_first(mask))->cpu_feature) {
+ case SYSTEM_INTEL_MSR_CAPABLE:
+ cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
+ cmd.addr.msr.reg = MSR_IA32_PERF_STATUS;
+ break;
+ case SYSTEM_IO_CAPABLE:
+ cmd.type = SYSTEM_IO_CAPABLE;
+ perf = per_cpu(acfreq_data, cpumask_first(mask))->acpi_data;
+ cmd.addr.io.port = perf->control_register.address;
+ cmd.addr.io.bit_width = perf->control_register.bit_width;
+ break;
+ default:
+ return 0;
+ }
+
+ cmd.mask = mask;
+ drv_read(&cmd);
+
+ pr_debug("get_cur_val = %u\n", cmd.val);
+
+ return cmd.val;
+}
+
+static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
+{
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, cpu);
+ unsigned int freq;
+ unsigned int cached_freq;
+
+ pr_debug("get_cur_freq_on_cpu (%d)\n", cpu);
+
+ if (unlikely(data == NULL ||
+ data->acpi_data == NULL || data->freq_table == NULL)) {
+ return 0;
+ }
+
+ cached_freq = data->freq_table[data->acpi_data->state].frequency;
+ freq = extract_freq(get_cur_val(cpumask_of(cpu)), data);
+ if (freq != cached_freq) {
+ /*
+ * The dreaded BIOS frequency change behind our back.
+ * Force set the frequency on next target call.
+ */
+ data->resume = 1;
+ }
+
+ pr_debug("cur freq = %u\n", freq);
+
+ return freq;
+}
+
+static unsigned int check_freqs(const struct cpumask *mask, unsigned int freq,
+ struct acpi_cpufreq_data *data)
+{
+ unsigned int cur_freq;
+ unsigned int i;
+
+ for (i = 0; i < 100; i++) {
+ cur_freq = extract_freq(get_cur_val(mask), data);
+ if (cur_freq == freq)
+ return 1;
+ udelay(10);
+ }
+ return 0;
+}
+
+static int acpi_cpufreq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq, unsigned int relation)
+{
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+ struct acpi_processor_performance *perf;
+ struct cpufreq_freqs freqs;
+ struct drv_cmd cmd;
+ unsigned int next_state = 0; /* Index into freq_table */
+ unsigned int next_perf_state = 0; /* Index into perf table */
+ unsigned int i;
+ int result = 0;
+
+ pr_debug("acpi_cpufreq_target %d (%d)\n", target_freq, policy->cpu);
+
+ if (unlikely(data == NULL ||
+ data->acpi_data == NULL || data->freq_table == NULL)) {
+ return -ENODEV;
+ }
+
+ perf = data->acpi_data;
+ result = cpufreq_frequency_table_target(policy,
+ data->freq_table,
+ target_freq,
+ relation, &next_state);
+ if (unlikely(result)) {
+ result = -ENODEV;
+ goto out;
+ }
+
+ next_perf_state = data->freq_table[next_state].index;
+ if (perf->state == next_perf_state) {
+ if (unlikely(data->resume)) {
+ pr_debug("Called after resume, resetting to P%d\n",
+ next_perf_state);
+ data->resume = 0;
+ } else {
+ pr_debug("Already at target state (P%d)\n",
+ next_perf_state);
+ goto out;
+ }
+ }
+
+ switch (data->cpu_feature) {
+ case SYSTEM_INTEL_MSR_CAPABLE:
+ cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
+ cmd.addr.msr.reg = MSR_IA32_PERF_CTL;
+ cmd.val = (u32) perf->states[next_perf_state].control;
+ break;
+ case SYSTEM_IO_CAPABLE:
+ cmd.type = SYSTEM_IO_CAPABLE;
+ cmd.addr.io.port = perf->control_register.address;
+ cmd.addr.io.bit_width = perf->control_register.bit_width;
+ cmd.val = (u32) perf->states[next_perf_state].control;
+ break;
+ default:
+ result = -ENODEV;
+ goto out;
+ }
+
+ /* cpufreq holds the hotplug lock, so we are safe from here on */
+ if (policy->shared_type != CPUFREQ_SHARED_TYPE_ANY)
+ cmd.mask = policy->cpus;
+ else
+ cmd.mask = cpumask_of(policy->cpu);
+
+ freqs.old = perf->states[perf->state].core_frequency * 1000;
+ freqs.new = data->freq_table[next_state].frequency;
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ }
+
+ drv_write(&cmd);
+
+ if (acpi_pstate_strict) {
+ if (!check_freqs(cmd.mask, freqs.new, data)) {
+ pr_debug("acpi_cpufreq_target failed (%d)\n",
+ policy->cpu);
+ result = -EAGAIN;
+ goto out;
+ }
+ }
+
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+ perf->state = next_perf_state;
+
+out:
+ return result;
+}
+
+static int acpi_cpufreq_verify(struct cpufreq_policy *policy)
+{
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+
+ pr_debug("acpi_cpufreq_verify\n");
+
+ return cpufreq_frequency_table_verify(policy, data->freq_table);
+}
+
+static unsigned long
+acpi_cpufreq_guess_freq(struct acpi_cpufreq_data *data, unsigned int cpu)
+{
+ struct acpi_processor_performance *perf = data->acpi_data;
+
+ if (cpu_khz) {
+ /* search the closest match to cpu_khz */
+ unsigned int i;
+ unsigned long freq;
+ unsigned long freqn = perf->states[0].core_frequency * 1000;
+
+ for (i = 0; i < (perf->state_count-1); i++) {
+ freq = freqn;
+ freqn = perf->states[i+1].core_frequency * 1000;
+ if ((2 * cpu_khz) > (freqn + freq)) {
+ perf->state = i;
+ return freq;
+ }
+ }
+ perf->state = perf->state_count-1;
+ return freqn;
+ } else {
+ /* assume CPU is at P0... */
+ perf->state = 0;
+ return perf->states[0].core_frequency * 1000;
+ }
+}
+
+static void free_acpi_perf_data(void)
+{
+ unsigned int i;
+
+ /* Freeing a NULL pointer is OK, and alloc_percpu zeroes. */
+ for_each_possible_cpu(i)
+ free_cpumask_var(per_cpu_ptr(acpi_perf_data, i)
+ ->shared_cpu_map);
+ free_percpu(acpi_perf_data);
+}
+
+/*
+ * acpi_cpufreq_early_init - initialize ACPI P-States library
+ *
+ * Initialize the ACPI P-States library (drivers/acpi/processor_perflib.c)
+ * in order to determine correct frequency and voltage pairings. We can
+ * do _PDC and _PSD and find out the processor dependency for the
+ * actual init that will happen later...
+ */
+static int __init acpi_cpufreq_early_init(void)
+{
+ unsigned int i;
+ pr_debug("acpi_cpufreq_early_init\n");
+
+ acpi_perf_data = alloc_percpu(struct acpi_processor_performance);
+ if (!acpi_perf_data) {
+ pr_debug("Memory allocation error for acpi_perf_data.\n");
+ return -ENOMEM;
+ }
+ for_each_possible_cpu(i) {
+ if (!zalloc_cpumask_var_node(
+ &per_cpu_ptr(acpi_perf_data, i)->shared_cpu_map,
+ GFP_KERNEL, cpu_to_node(i))) {
+
+ /* Freeing a NULL pointer is OK: alloc_percpu zeroes. */
+ free_acpi_perf_data();
+ return -ENOMEM;
+ }
+ }
+
+ /* Do initialization in ACPI core */
+ acpi_processor_preregister_performance(acpi_perf_data);
+ return 0;
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Some BIOSes do SW_ANY coordination internally, either set it up in hw
+ * or do it in BIOS firmware and won't inform about it to OS. If not
+ * detected, this has a side effect of making CPU run at a different speed
+ * than OS intended it to run at. Detect it and handle it cleanly.
+ */
+static int bios_with_sw_any_bug;
+
+static int sw_any_bug_found(const struct dmi_system_id *d)
+{
+ bios_with_sw_any_bug = 1;
+ return 0;
+}
+
+static const struct dmi_system_id sw_any_bug_dmi_table[] = {
+ {
+ .callback = sw_any_bug_found,
+ .ident = "Supermicro Server X6DLP",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Supermicro"),
+ DMI_MATCH(DMI_BIOS_VERSION, "080010"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "X6DLP"),
+ },
+ },
+ { }
+};
+
+static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c)
+{
+ /* Intel Xeon Processor 7100 Series Specification Update
+ * http://www.intel.com/Assets/PDF/specupdate/314554.pdf
+ * AL30: A Machine Check Exception (MCE) Occurring during an
+ * Enhanced Intel SpeedStep Technology Ratio Change May Cause
+ * Both Processor Cores to Lock Up. */
+ if (c->x86_vendor == X86_VENDOR_INTEL) {
+ if ((c->x86 == 15) &&
+ (c->x86_model == 6) &&
+ (c->x86_mask == 8)) {
+ printk(KERN_INFO "acpi-cpufreq: Intel(R) "
+ "Xeon(R) 7100 Errata AL30, processors may "
+ "lock up on frequency changes: disabling "
+ "acpi-cpufreq.\n");
+ return -ENODEV;
+ }
+ }
+ return 0;
+}
+#endif
+
+static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int i;
+ unsigned int valid_states = 0;
+ unsigned int cpu = policy->cpu;
+ struct acpi_cpufreq_data *data;
+ unsigned int result = 0;
+ struct cpuinfo_x86 *c = &cpu_data(policy->cpu);
+ struct acpi_processor_performance *perf;
+#ifdef CONFIG_SMP
+ static int blacklisted;
+#endif
+
+ pr_debug("acpi_cpufreq_cpu_init\n");
+
+#ifdef CONFIG_SMP
+ if (blacklisted)
+ return blacklisted;
+ blacklisted = acpi_cpufreq_blacklist(c);
+ if (blacklisted)
+ return blacklisted;
+#endif
+
+ data = kzalloc(sizeof(struct acpi_cpufreq_data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
+ per_cpu(acfreq_data, cpu) = data;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
+ acpi_cpufreq_driver.flags |= CPUFREQ_CONST_LOOPS;
+
+ result = acpi_processor_register_performance(data->acpi_data, cpu);
+ if (result)
+ goto err_free;
+
+ perf = data->acpi_data;
+ policy->shared_type = perf->shared_type;
+
+ /*
+ * Will let policy->cpus know about dependency only when software
+ * coordination is required.
+ */
+ if (policy->shared_type == CPUFREQ_SHARED_TYPE_ALL ||
+ policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
+ cpumask_copy(policy->cpus, perf->shared_cpu_map);
+ }
+ cpumask_copy(policy->related_cpus, perf->shared_cpu_map);
+
+#ifdef CONFIG_SMP
+ dmi_check_system(sw_any_bug_dmi_table);
+ if (bios_with_sw_any_bug && cpumask_weight(policy->cpus) == 1) {
+ policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
+ cpumask_copy(policy->cpus, cpu_core_mask(cpu));
+ }
+#endif
+
+ /* capability check */
+ if (perf->state_count <= 1) {
+ pr_debug("No P-States\n");
+ result = -ENODEV;
+ goto err_unreg;
+ }
+
+ if (perf->control_register.space_id != perf->status_register.space_id) {
+ result = -ENODEV;
+ goto err_unreg;
+ }
+
+ switch (perf->control_register.space_id) {
+ case ACPI_ADR_SPACE_SYSTEM_IO:
+ pr_debug("SYSTEM IO addr space\n");
+ data->cpu_feature = SYSTEM_IO_CAPABLE;
+ break;
+ case ACPI_ADR_SPACE_FIXED_HARDWARE:
+ pr_debug("HARDWARE addr space\n");
+ if (!check_est_cpu(cpu)) {
+ result = -ENODEV;
+ goto err_unreg;
+ }
+ data->cpu_feature = SYSTEM_INTEL_MSR_CAPABLE;
+ break;
+ default:
+ pr_debug("Unknown addr space %d\n",
+ (u32) (perf->control_register.space_id));
+ result = -ENODEV;
+ goto err_unreg;
+ }
+
+ data->freq_table = kmalloc(sizeof(struct cpufreq_frequency_table) *
+ (perf->state_count+1), GFP_KERNEL);
+ if (!data->freq_table) {
+ result = -ENOMEM;
+ goto err_unreg;
+ }
+
+ /* detect transition latency */
+ policy->cpuinfo.transition_latency = 0;
+ for (i = 0; i < perf->state_count; i++) {
+ if ((perf->states[i].transition_latency * 1000) >
+ policy->cpuinfo.transition_latency)
+ policy->cpuinfo.transition_latency =
+ perf->states[i].transition_latency * 1000;
+ }
+
+ /* Check for high latency (>20uS) from buggy BIOSes, like on T42 */
+ if (perf->control_register.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE &&
+ policy->cpuinfo.transition_latency > 20 * 1000) {
+ policy->cpuinfo.transition_latency = 20 * 1000;
+ printk_once(KERN_INFO
+ "P-state transition latency capped at 20 uS\n");
+ }
+
+ /* table init */
+ for (i = 0; i < perf->state_count; i++) {
+ if (i > 0 && perf->states[i].core_frequency >=
+ data->freq_table[valid_states-1].frequency / 1000)
+ continue;
+
+ data->freq_table[valid_states].index = i;
+ data->freq_table[valid_states].frequency =
+ perf->states[i].core_frequency * 1000;
+ valid_states++;
+ }
+ data->freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
+ perf->state = 0;
+
+ result = cpufreq_frequency_table_cpuinfo(policy, data->freq_table);
+ if (result)
+ goto err_freqfree;
+
+ if (perf->states[0].core_frequency * 1000 != policy->cpuinfo.max_freq)
+ printk(KERN_WARNING FW_WARN "P-state 0 is not max freq\n");
+
+ switch (perf->control_register.space_id) {
+ case ACPI_ADR_SPACE_SYSTEM_IO:
+ /* Current speed is unknown and not detectable by IO port */
+ policy->cur = acpi_cpufreq_guess_freq(data, policy->cpu);
+ break;
+ case ACPI_ADR_SPACE_FIXED_HARDWARE:
+ acpi_cpufreq_driver.get = get_cur_freq_on_cpu;
+ policy->cur = get_cur_freq_on_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ /* notify BIOS that we exist */
+ acpi_processor_notify_smm(THIS_MODULE);
+
+ /* Check for APERF/MPERF support in hardware */
+ if (cpu_has(c, X86_FEATURE_APERFMPERF))
+ acpi_cpufreq_driver.getavg = cpufreq_get_measured_perf;
+
+ pr_debug("CPU%u - ACPI performance management activated.\n", cpu);
+ for (i = 0; i < perf->state_count; i++)
+ pr_debug(" %cP%d: %d MHz, %d mW, %d uS\n",
+ (i == perf->state ? '*' : ' '), i,
+ (u32) perf->states[i].core_frequency,
+ (u32) perf->states[i].power,
+ (u32) perf->states[i].transition_latency);
+
+ cpufreq_frequency_table_get_attr(data->freq_table, policy->cpu);
+
+ /*
+ * the first call to ->target() should result in us actually
+ * writing something to the appropriate registers.
+ */
+ data->resume = 1;
+
+ return result;
+
+err_freqfree:
+ kfree(data->freq_table);
+err_unreg:
+ acpi_processor_unregister_performance(perf, cpu);
+err_free:
+ kfree(data);
+ per_cpu(acfreq_data, cpu) = NULL;
+
+ return result;
+}
+
+static int acpi_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+
+ pr_debug("acpi_cpufreq_cpu_exit\n");
+
+ if (data) {
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ per_cpu(acfreq_data, policy->cpu) = NULL;
+ acpi_processor_unregister_performance(data->acpi_data,
+ policy->cpu);
+ kfree(data->freq_table);
+ kfree(data);
+ }
+
+ return 0;
+}
+
+static int acpi_cpufreq_resume(struct cpufreq_policy *policy)
+{
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+
+ pr_debug("acpi_cpufreq_resume\n");
+
+ data->resume = 1;
+
+ return 0;
+}
+
+static struct freq_attr *acpi_cpufreq_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver acpi_cpufreq_driver = {
+ .verify = acpi_cpufreq_verify,
+ .target = acpi_cpufreq_target,
+ .bios_limit = acpi_processor_get_bios_limit,
+ .init = acpi_cpufreq_cpu_init,
+ .exit = acpi_cpufreq_cpu_exit,
+ .resume = acpi_cpufreq_resume,
+ .name = "acpi-cpufreq",
+ .owner = THIS_MODULE,
+ .attr = acpi_cpufreq_attr,
+};
+
+static int __init acpi_cpufreq_init(void)
+{
+ int ret;
+
+ if (acpi_disabled)
+ return 0;
+
+ pr_debug("acpi_cpufreq_init\n");
+
+ ret = acpi_cpufreq_early_init();
+ if (ret)
+ return ret;
+
+ ret = cpufreq_register_driver(&acpi_cpufreq_driver);
+ if (ret)
+ free_acpi_perf_data();
+
+ return ret;
+}
+
+static void __exit acpi_cpufreq_exit(void)
+{
+ pr_debug("acpi_cpufreq_exit\n");
+
+ cpufreq_unregister_driver(&acpi_cpufreq_driver);
+
+ free_percpu(acpi_perf_data);
+}
+
+module_param(acpi_pstate_strict, uint, 0644);
+MODULE_PARM_DESC(acpi_pstate_strict,
+ "value 0 or non-zero. non-zero -> strict ACPI checks are "
+ "performed during frequency changes.");
+
+late_initcall(acpi_cpufreq_init);
+module_exit(acpi_cpufreq_exit);
+
+MODULE_ALIAS("acpi");
--- /dev/null
+/*
+ * (C) 2004-2006 Sebastian Witt <se.witt@gmx.net>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ * Based upon reverse engineered information
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+
+#define NFORCE2_XTAL 25
+#define NFORCE2_BOOTFSB 0x48
+#define NFORCE2_PLLENABLE 0xa8
+#define NFORCE2_PLLREG 0xa4
+#define NFORCE2_PLLADR 0xa0
+#define NFORCE2_PLL(mul, div) (0x100000 | (mul << 8) | div)
+
+#define NFORCE2_MIN_FSB 50
+#define NFORCE2_SAFE_DISTANCE 50
+
+/* Delay in ms between FSB changes */
+/* #define NFORCE2_DELAY 10 */
+
+/*
+ * nforce2_chipset:
+ * FSB is changed using the chipset
+ */
+static struct pci_dev *nforce2_dev;
+
+/* fid:
+ * multiplier * 10
+ */
+static int fid;
+
+/* min_fsb, max_fsb:
+ * minimum and maximum FSB (= FSB at boot time)
+ */
+static int min_fsb;
+static int max_fsb;
+
+MODULE_AUTHOR("Sebastian Witt <se.witt@gmx.net>");
+MODULE_DESCRIPTION("nForce2 FSB changing cpufreq driver");
+MODULE_LICENSE("GPL");
+
+module_param(fid, int, 0444);
+module_param(min_fsb, int, 0444);
+
+MODULE_PARM_DESC(fid, "CPU multiplier to use (11.5 = 115)");
+MODULE_PARM_DESC(min_fsb,
+ "Minimum FSB to use, if not defined: current FSB - 50");
+
+#define PFX "cpufreq-nforce2: "
+
+/**
+ * nforce2_calc_fsb - calculate FSB
+ * @pll: PLL value
+ *
+ * Calculates FSB from PLL value
+ */
+static int nforce2_calc_fsb(int pll)
+{
+ unsigned char mul, div;
+
+ mul = (pll >> 8) & 0xff;
+ div = pll & 0xff;
+
+ if (div > 0)
+ return NFORCE2_XTAL * mul / div;
+
+ return 0;
+}
+
+/**
+ * nforce2_calc_pll - calculate PLL value
+ * @fsb: FSB
+ *
+ * Calculate PLL value for given FSB
+ */
+static int nforce2_calc_pll(unsigned int fsb)
+{
+ unsigned char xmul, xdiv;
+ unsigned char mul = 0, div = 0;
+ int tried = 0;
+
+ /* Try to calculate multiplier and divider up to 4 times */
+ while (((mul == 0) || (div == 0)) && (tried <= 3)) {
+ for (xdiv = 2; xdiv <= 0x80; xdiv++)
+ for (xmul = 1; xmul <= 0xfe; xmul++)
+ if (nforce2_calc_fsb(NFORCE2_PLL(xmul, xdiv)) ==
+ fsb + tried) {
+ mul = xmul;
+ div = xdiv;
+ }
+ tried++;
+ }
+
+ if ((mul == 0) || (div == 0))
+ return -1;
+
+ return NFORCE2_PLL(mul, div);
+}
+
+/**
+ * nforce2_write_pll - write PLL value to chipset
+ * @pll: PLL value
+ *
+ * Writes new FSB PLL value to chipset
+ */
+static void nforce2_write_pll(int pll)
+{
+ int temp;
+
+ /* Set the pll addr. to 0x00 */
+ pci_write_config_dword(nforce2_dev, NFORCE2_PLLADR, 0);
+
+ /* Now write the value in all 64 registers */
+ for (temp = 0; temp <= 0x3f; temp++)
+ pci_write_config_dword(nforce2_dev, NFORCE2_PLLREG, pll);
+
+ return;
+}
+
+/**
+ * nforce2_fsb_read - Read FSB
+ *
+ * Read FSB from chipset
+ * If bootfsb != 0, return FSB at boot-time
+ */
+static unsigned int nforce2_fsb_read(int bootfsb)
+{
+ struct pci_dev *nforce2_sub5;
+ u32 fsb, temp = 0;
+
+ /* Get chipset boot FSB from subdevice 5 (FSB at boot-time) */
+ nforce2_sub5 = pci_get_subsys(PCI_VENDOR_ID_NVIDIA, 0x01EF,
+ PCI_ANY_ID, PCI_ANY_ID, NULL);
+ if (!nforce2_sub5)
+ return 0;
+
+ pci_read_config_dword(nforce2_sub5, NFORCE2_BOOTFSB, &fsb);
+ fsb /= 1000000;
+
+ /* Check if PLL register is already set */
+ pci_read_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8 *)&temp);
+
+ if (bootfsb || !temp)
+ return fsb;
+
+ /* Use PLL register FSB value */
+ pci_read_config_dword(nforce2_dev, NFORCE2_PLLREG, &temp);
+ fsb = nforce2_calc_fsb(temp);
+
+ return fsb;
+}
+
+/**
+ * nforce2_set_fsb - set new FSB
+ * @fsb: New FSB
+ *
+ * Sets new FSB
+ */
+static int nforce2_set_fsb(unsigned int fsb)
+{
+ u32 temp = 0;
+ unsigned int tfsb;
+ int diff;
+ int pll = 0;
+
+ if ((fsb > max_fsb) || (fsb < NFORCE2_MIN_FSB)) {
+ printk(KERN_ERR PFX "FSB %d is out of range!\n", fsb);
+ return -EINVAL;
+ }
+
+ tfsb = nforce2_fsb_read(0);
+ if (!tfsb) {
+ printk(KERN_ERR PFX "Error while reading the FSB\n");
+ return -EINVAL;
+ }
+
+ /* First write? Then set actual value */
+ pci_read_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8 *)&temp);
+ if (!temp) {
+ pll = nforce2_calc_pll(tfsb);
+
+ if (pll < 0)
+ return -EINVAL;
+
+ nforce2_write_pll(pll);
+ }
+
+ /* Enable write access */
+ temp = 0x01;
+ pci_write_config_byte(nforce2_dev, NFORCE2_PLLENABLE, (u8)temp);
+
+ diff = tfsb - fsb;
+
+ if (!diff)
+ return 0;
+
+ while ((tfsb != fsb) && (tfsb <= max_fsb) && (tfsb >= min_fsb)) {
+ if (diff < 0)
+ tfsb++;
+ else
+ tfsb--;
+
+ /* Calculate the PLL reg. value */
+ pll = nforce2_calc_pll(tfsb);
+ if (pll == -1)
+ return -EINVAL;
+
+ nforce2_write_pll(pll);
+#ifdef NFORCE2_DELAY
+ mdelay(NFORCE2_DELAY);
+#endif
+ }
+
+ temp = 0x40;
+ pci_write_config_byte(nforce2_dev, NFORCE2_PLLADR, (u8)temp);
+
+ return 0;
+}
+
+/**
+ * nforce2_get - get the CPU frequency
+ * @cpu: CPU number
+ *
+ * Returns the CPU frequency
+ */
+static unsigned int nforce2_get(unsigned int cpu)
+{
+ if (cpu)
+ return 0;
+ return nforce2_fsb_read(0) * fid * 100;
+}
+
+/**
+ * nforce2_target - set a new CPUFreq policy
+ * @policy: new policy
+ * @target_freq: the target frequency
+ * @relation: how that frequency relates to achieved frequency
+ * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
+ *
+ * Sets a new CPUFreq policy.
+ */
+static int nforce2_target(struct cpufreq_policy *policy,
+ unsigned int target_freq, unsigned int relation)
+{
+/* unsigned long flags; */
+ struct cpufreq_freqs freqs;
+ unsigned int target_fsb;
+
+ if ((target_freq > policy->max) || (target_freq < policy->min))
+ return -EINVAL;
+
+ target_fsb = target_freq / (fid * 100);
+
+ freqs.old = nforce2_get(policy->cpu);
+ freqs.new = target_fsb * fid * 100;
+ freqs.cpu = 0; /* Only one CPU on nForce2 platforms */
+
+ if (freqs.old == freqs.new)
+ return 0;
+
+ pr_debug("Old CPU frequency %d kHz, new %d kHz\n",
+ freqs.old, freqs.new);
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ /* Disable IRQs */
+ /* local_irq_save(flags); */
+
+ if (nforce2_set_fsb(target_fsb) < 0)
+ printk(KERN_ERR PFX "Changing FSB to %d failed\n",
+ target_fsb);
+ else
+ pr_debug("Changed FSB successfully to %d\n",
+ target_fsb);
+
+ /* Enable IRQs */
+ /* local_irq_restore(flags); */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+ return 0;
+}
+
+/**
+ * nforce2_verify - verifies a new CPUFreq policy
+ * @policy: new policy
+ */
+static int nforce2_verify(struct cpufreq_policy *policy)
+{
+ unsigned int fsb_pol_max;
+
+ fsb_pol_max = policy->max / (fid * 100);
+
+ if (policy->min < (fsb_pol_max * fid * 100))
+ policy->max = (fsb_pol_max + 1) * fid * 100;
+
+ cpufreq_verify_within_limits(policy,
+ policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);
+ return 0;
+}
+
+static int nforce2_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int fsb;
+ unsigned int rfid;
+
+ /* capability check */
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ /* Get current FSB */
+ fsb = nforce2_fsb_read(0);
+
+ if (!fsb)
+ return -EIO;
+
+ /* FIX: Get FID from CPU */
+ if (!fid) {
+ if (!cpu_khz) {
+ printk(KERN_WARNING PFX
+ "cpu_khz not set, can't calculate multiplier!\n");
+ return -ENODEV;
+ }
+
+ fid = cpu_khz / (fsb * 100);
+ rfid = fid % 5;
+
+ if (rfid) {
+ if (rfid > 2)
+ fid += 5 - rfid;
+ else
+ fid -= rfid;
+ }
+ }
+
+ printk(KERN_INFO PFX "FSB currently at %i MHz, FID %d.%d\n", fsb,
+ fid / 10, fid % 10);
+
+ /* Set maximum FSB to FSB at boot time */
+ max_fsb = nforce2_fsb_read(1);
+
+ if (!max_fsb)
+ return -EIO;
+
+ if (!min_fsb)
+ min_fsb = max_fsb - NFORCE2_SAFE_DISTANCE;
+
+ if (min_fsb < NFORCE2_MIN_FSB)
+ min_fsb = NFORCE2_MIN_FSB;
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.min_freq = min_fsb * fid * 100;
+ policy->cpuinfo.max_freq = max_fsb * fid * 100;
+ policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
+ policy->cur = nforce2_get(policy->cpu);
+ policy->min = policy->cpuinfo.min_freq;
+ policy->max = policy->cpuinfo.max_freq;
+
+ return 0;
+}
+
+static int nforce2_cpu_exit(struct cpufreq_policy *policy)
+{
+ return 0;
+}
+
+static struct cpufreq_driver nforce2_driver = {
+ .name = "nforce2",
+ .verify = nforce2_verify,
+ .target = nforce2_target,
+ .get = nforce2_get,
+ .init = nforce2_cpu_init,
+ .exit = nforce2_cpu_exit,
+ .owner = THIS_MODULE,
+};
+
+/**
+ * nforce2_detect_chipset - detect the Southbridge which contains FSB PLL logic
+ *
+ * Detects nForce2 A2 and C1 stepping
+ *
+ */
+static int nforce2_detect_chipset(void)
+{
+ nforce2_dev = pci_get_subsys(PCI_VENDOR_ID_NVIDIA,
+ PCI_DEVICE_ID_NVIDIA_NFORCE2,
+ PCI_ANY_ID, PCI_ANY_ID, NULL);
+
+ if (nforce2_dev == NULL)
+ return -ENODEV;
+
+ printk(KERN_INFO PFX "Detected nForce2 chipset revision %X\n",
+ nforce2_dev->revision);
+ printk(KERN_INFO PFX
+ "FSB changing is maybe unstable and can lead to "
+ "crashes and data loss.\n");
+
+ return 0;
+}
+
+/**
+ * nforce2_init - initializes the nForce2 CPUFreq driver
+ *
+ * Initializes the nForce2 FSB support. Returns -ENODEV on unsupported
+ * devices, -EINVAL on problems during initiatization, and zero on
+ * success.
+ */
+static int __init nforce2_init(void)
+{
+ /* TODO: do we need to detect the processor? */
+
+ /* detect chipset */
+ if (nforce2_detect_chipset()) {
+ printk(KERN_INFO PFX "No nForce2 chipset.\n");
+ return -ENODEV;
+ }
+
+ return cpufreq_register_driver(&nforce2_driver);
+}
+
+/**
+ * nforce2_exit - unregisters cpufreq module
+ *
+ * Unregisters nForce2 FSB change support.
+ */
+static void __exit nforce2_exit(void)
+{
+ cpufreq_unregister_driver(&nforce2_driver);
+}
+
+module_init(nforce2_init);
+module_exit(nforce2_exit);
+
#include <trace/events/power.h>
-#define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, \
- "cpufreq-core", msg)
-
/**
* The "cpufreq driver" - the arch- or hardware-dependent low
* level driver of CPUFreq support, and its spinlock. This lock
EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
-/*********************************************************************
- * UNIFIED DEBUG HELPERS *
- *********************************************************************/
-#ifdef CONFIG_CPU_FREQ_DEBUG
-
-/* what part(s) of the CPUfreq subsystem are debugged? */
-static unsigned int debug;
-
-/* is the debug output ratelimit'ed using printk_ratelimit? User can
- * set or modify this value.
- */
-static unsigned int debug_ratelimit = 1;
-
-/* is the printk_ratelimit'ing enabled? It's enabled after a successful
- * loading of a cpufreq driver, temporarily disabled when a new policy
- * is set, and disabled upon cpufreq driver removal
- */
-static unsigned int disable_ratelimit = 1;
-static DEFINE_SPINLOCK(disable_ratelimit_lock);
-
-static void cpufreq_debug_enable_ratelimit(void)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&disable_ratelimit_lock, flags);
- if (disable_ratelimit)
- disable_ratelimit--;
- spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
-}
-
-static void cpufreq_debug_disable_ratelimit(void)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&disable_ratelimit_lock, flags);
- disable_ratelimit++;
- spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
-}
-
-void cpufreq_debug_printk(unsigned int type, const char *prefix,
- const char *fmt, ...)
-{
- char s[256];
- va_list args;
- unsigned int len;
- unsigned long flags;
-
- WARN_ON(!prefix);
- if (type & debug) {
- spin_lock_irqsave(&disable_ratelimit_lock, flags);
- if (!disable_ratelimit && debug_ratelimit
- && !printk_ratelimit()) {
- spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
- return;
- }
- spin_unlock_irqrestore(&disable_ratelimit_lock, flags);
-
- len = snprintf(s, 256, KERN_DEBUG "%s: ", prefix);
-
- va_start(args, fmt);
- len += vsnprintf(&s[len], (256 - len), fmt, args);
- va_end(args);
-
- printk(s);
-
- WARN_ON(len < 5);
- }
-}
-EXPORT_SYMBOL(cpufreq_debug_printk);
-
-
-module_param(debug, uint, 0644);
-MODULE_PARM_DESC(debug, "CPUfreq debugging: add 1 to debug core,"
- " 2 to debug drivers, and 4 to debug governors.");
-
-module_param(debug_ratelimit, uint, 0644);
-MODULE_PARM_DESC(debug_ratelimit, "CPUfreq debugging:"
- " set to 0 to disable ratelimiting.");
-
-#else /* !CONFIG_CPU_FREQ_DEBUG */
-
-static inline void cpufreq_debug_enable_ratelimit(void) { return; }
-static inline void cpufreq_debug_disable_ratelimit(void) { return; }
-
-#endif /* CONFIG_CPU_FREQ_DEBUG */
-
-
/*********************************************************************
* EXTERNALLY AFFECTING FREQUENCY CHANGES *
*********************************************************************/
if (!l_p_j_ref_freq) {
l_p_j_ref = loops_per_jiffy;
l_p_j_ref_freq = ci->old;
- dprintk("saving %lu as reference value for loops_per_jiffy; "
+ pr_debug("saving %lu as reference value for loops_per_jiffy; "
"freq is %u kHz\n", l_p_j_ref, l_p_j_ref_freq);
}
if ((val == CPUFREQ_PRECHANGE && ci->old < ci->new) ||
(val == CPUFREQ_RESUMECHANGE || val == CPUFREQ_SUSPENDCHANGE)) {
loops_per_jiffy = cpufreq_scale(l_p_j_ref, l_p_j_ref_freq,
ci->new);
- dprintk("scaling loops_per_jiffy to %lu "
+ pr_debug("scaling loops_per_jiffy to %lu "
"for frequency %u kHz\n", loops_per_jiffy, ci->new);
}
}
BUG_ON(irqs_disabled());
freqs->flags = cpufreq_driver->flags;
- dprintk("notification %u of frequency transition to %u kHz\n",
+ pr_debug("notification %u of frequency transition to %u kHz\n",
state, freqs->new);
policy = per_cpu(cpufreq_cpu_data, freqs->cpu);
if (!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
if ((policy) && (policy->cpu == freqs->cpu) &&
(policy->cur) && (policy->cur != freqs->old)) {
- dprintk("Warning: CPU frequency is"
+ pr_debug("Warning: CPU frequency is"
" %u, cpufreq assumed %u kHz.\n",
freqs->old, policy->cur);
freqs->old = policy->cur;
case CPUFREQ_POSTCHANGE:
adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
- dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
+ pr_debug("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
(unsigned long)freqs->cpu);
trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
trace_cpu_frequency(freqs->new, freqs->cpu);
t = __find_governor(str_governor);
if (t == NULL) {
- char *name = kasprintf(GFP_KERNEL, "cpufreq_%s",
- str_governor);
-
- if (name) {
- int ret;
+ int ret;
- mutex_unlock(&cpufreq_governor_mutex);
- ret = request_module("%s", name);
- mutex_lock(&cpufreq_governor_mutex);
+ mutex_unlock(&cpufreq_governor_mutex);
+ ret = request_module("cpufreq_%s", str_governor);
+ mutex_lock(&cpufreq_governor_mutex);
- if (ret == 0)
- t = __find_governor(str_governor);
- }
-
- kfree(name);
+ if (ret == 0)
+ t = __find_governor(str_governor);
}
if (t != NULL) {
static void cpufreq_sysfs_release(struct kobject *kobj)
{
struct cpufreq_policy *policy = to_policy(kobj);
- dprintk("last reference is dropped\n");
+ pr_debug("last reference is dropped\n");
complete(&policy->kobj_unregister);
}
gov = __find_governor(per_cpu(cpufreq_cpu_governor, cpu));
if (gov) {
policy->governor = gov;
- dprintk("Restoring governor %s for cpu %d\n",
+ pr_debug("Restoring governor %s for cpu %d\n",
policy->governor->name, cpu);
}
#endif
per_cpu(cpufreq_cpu_data, cpu) = managed_policy;
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
- dprintk("CPU already managed, adding link\n");
+ pr_debug("CPU already managed, adding link\n");
ret = sysfs_create_link(&sys_dev->kobj,
&managed_policy->kobj,
"cpufreq");
if (!cpu_online(j))
continue;
- dprintk("CPU %u already managed, adding link\n", j);
+ pr_debug("CPU %u already managed, adding link\n", j);
managed_policy = cpufreq_cpu_get(cpu);
cpu_sys_dev = get_cpu_sysdev(j);
ret = sysfs_create_link(&cpu_sys_dev->kobj, &policy->kobj,
policy->user_policy.governor = policy->governor;
if (ret) {
- dprintk("setting policy failed\n");
+ pr_debug("setting policy failed\n");
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
}
if (cpu_is_offline(cpu))
return 0;
- cpufreq_debug_disable_ratelimit();
- dprintk("adding CPU %u\n", cpu);
+ pr_debug("adding CPU %u\n", cpu);
#ifdef CONFIG_SMP
/* check whether a different CPU already registered this
policy = cpufreq_cpu_get(cpu);
if (unlikely(policy)) {
cpufreq_cpu_put(policy);
- cpufreq_debug_enable_ratelimit();
return 0;
}
#endif
*/
ret = cpufreq_driver->init(policy);
if (ret) {
- dprintk("initialization failed\n");
+ pr_debug("initialization failed\n");
goto err_unlock_policy;
}
policy->user_policy.min = policy->min;
kobject_uevent(&policy->kobj, KOBJ_ADD);
module_put(cpufreq_driver->owner);
- dprintk("initialization complete\n");
- cpufreq_debug_enable_ratelimit();
+ pr_debug("initialization complete\n");
return 0;
nomem_out:
module_put(cpufreq_driver->owner);
module_out:
- cpufreq_debug_enable_ratelimit();
return ret;
}
unsigned int j;
#endif
- cpufreq_debug_disable_ratelimit();
- dprintk("unregistering CPU %u\n", cpu);
+ pr_debug("unregistering CPU %u\n", cpu);
spin_lock_irqsave(&cpufreq_driver_lock, flags);
data = per_cpu(cpufreq_cpu_data, cpu);
if (!data) {
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
- cpufreq_debug_enable_ratelimit();
unlock_policy_rwsem_write(cpu);
return -EINVAL;
}
* only need to unlink, put and exit
*/
if (unlikely(cpu != data->cpu)) {
- dprintk("removing link\n");
+ pr_debug("removing link\n");
cpumask_clear_cpu(cpu, data->cpus);
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
kobj = &sys_dev->kobj;
cpufreq_cpu_put(data);
- cpufreq_debug_enable_ratelimit();
unlock_policy_rwsem_write(cpu);
sysfs_remove_link(kobj, "cpufreq");
return 0;
for_each_cpu(j, data->cpus) {
if (j == cpu)
continue;
- dprintk("removing link for cpu %u\n", j);
+ pr_debug("removing link for cpu %u\n", j);
#ifdef CONFIG_HOTPLUG_CPU
strncpy(per_cpu(cpufreq_cpu_governor, j),
data->governor->name, CPUFREQ_NAME_LEN);
* not referenced anymore by anybody before we proceed with
* unloading.
*/
- dprintk("waiting for dropping of refcount\n");
+ pr_debug("waiting for dropping of refcount\n");
wait_for_completion(cmp);
- dprintk("wait complete\n");
+ pr_debug("wait complete\n");
lock_policy_rwsem_write(cpu);
if (cpufreq_driver->exit)
cpufreq_driver->exit(data);
unlock_policy_rwsem_write(cpu);
+#ifdef CONFIG_HOTPLUG_CPU
+ /* when the CPU which is the parent of the kobj is hotplugged
+ * offline, check for siblings, and create cpufreq sysfs interface
+ * and symlinks
+ */
+ if (unlikely(cpumask_weight(data->cpus) > 1)) {
+ /* first sibling now owns the new sysfs dir */
+ cpumask_clear_cpu(cpu, data->cpus);
+ cpufreq_add_dev(get_cpu_sysdev(cpumask_first(data->cpus)));
+
+ /* finally remove our own symlink */
+ lock_policy_rwsem_write(cpu);
+ __cpufreq_remove_dev(sys_dev);
+ }
+#endif
+
free_cpumask_var(data->related_cpus);
free_cpumask_var(data->cpus);
kfree(data);
- per_cpu(cpufreq_cpu_data, cpu) = NULL;
- cpufreq_debug_enable_ratelimit();
return 0;
}
struct cpufreq_policy *policy =
container_of(work, struct cpufreq_policy, update);
unsigned int cpu = policy->cpu;
- dprintk("handle_update for cpu %u called\n", cpu);
+ pr_debug("handle_update for cpu %u called\n", cpu);
cpufreq_update_policy(cpu);
}
{
struct cpufreq_freqs freqs;
- dprintk("Warning: CPU frequency out of sync: cpufreq and timing "
+ pr_debug("Warning: CPU frequency out of sync: cpufreq and timing "
"core thinks of %u, is %u kHz.\n", old_freq, new_freq);
freqs.cpu = cpu;
int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
- dprintk("suspending cpu %u\n", cpu);
+ pr_debug("suspending cpu %u\n", cpu);
/* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
int cpu = smp_processor_id();
struct cpufreq_policy *cpu_policy;
- dprintk("resuming cpu %u\n", cpu);
+ pr_debug("resuming cpu %u\n", cpu);
/* If there's no policy for the boot CPU, we have nothing to do. */
cpu_policy = cpufreq_cpu_get(cpu);
{
int retval = -EINVAL;
- dprintk("target for CPU %u: %u kHz, relation %u\n", policy->cpu,
+ pr_debug("target for CPU %u: %u kHz, relation %u\n", policy->cpu,
target_freq, relation);
if (cpu_online(policy->cpu) && cpufreq_driver->target)
retval = cpufreq_driver->target(policy, target_freq, relation);
if (!try_module_get(policy->governor->owner))
return -EINVAL;
- dprintk("__cpufreq_governor for CPU %u, event %u\n",
+ pr_debug("__cpufreq_governor for CPU %u, event %u\n",
policy->cpu, event);
ret = policy->governor->governor(policy, event);
{
int ret = 0;
- cpufreq_debug_disable_ratelimit();
- dprintk("setting new policy for CPU %u: %u - %u kHz\n", policy->cpu,
+ pr_debug("setting new policy for CPU %u: %u - %u kHz\n", policy->cpu,
policy->min, policy->max);
memcpy(&policy->cpuinfo, &data->cpuinfo,
data->min = policy->min;
data->max = policy->max;
- dprintk("new min and max freqs are %u - %u kHz\n",
+ pr_debug("new min and max freqs are %u - %u kHz\n",
data->min, data->max);
if (cpufreq_driver->setpolicy) {
data->policy = policy->policy;
- dprintk("setting range\n");
+ pr_debug("setting range\n");
ret = cpufreq_driver->setpolicy(policy);
} else {
if (policy->governor != data->governor) {
/* save old, working values */
struct cpufreq_governor *old_gov = data->governor;
- dprintk("governor switch\n");
+ pr_debug("governor switch\n");
/* end old governor */
if (data->governor)
data->governor = policy->governor;
if (__cpufreq_governor(data, CPUFREQ_GOV_START)) {
/* new governor failed, so re-start old one */
- dprintk("starting governor %s failed\n",
+ pr_debug("starting governor %s failed\n",
data->governor->name);
if (old_gov) {
data->governor = old_gov;
}
/* might be a policy change, too, so fall through */
}
- dprintk("governor: change or update limits\n");
+ pr_debug("governor: change or update limits\n");
__cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
}
error_out:
- cpufreq_debug_enable_ratelimit();
return ret;
}
goto fail;
}
- dprintk("updating policy for CPU %u\n", cpu);
+ pr_debug("updating policy for CPU %u\n", cpu);
memcpy(&policy, data, sizeof(struct cpufreq_policy));
policy.min = data->user_policy.min;
policy.max = data->user_policy.max;
if (cpufreq_driver->get) {
policy.cur = cpufreq_driver->get(cpu);
if (!data->cur) {
- dprintk("Driver did not initialize current freq");
+ pr_debug("Driver did not initialize current freq");
data->cur = policy.cur;
} else {
if (data->cur != policy.cur)
((!driver_data->setpolicy) && (!driver_data->target)))
return -EINVAL;
- dprintk("trying to register driver %s\n", driver_data->name);
+ pr_debug("trying to register driver %s\n", driver_data->name);
if (driver_data->setpolicy)
driver_data->flags |= CPUFREQ_CONST_LOOPS;
/* if all ->init() calls failed, unregister */
if (ret) {
- dprintk("no CPU initialized for driver %s\n",
+ pr_debug("no CPU initialized for driver %s\n",
driver_data->name);
goto err_sysdev_unreg;
}
}
register_hotcpu_notifier(&cpufreq_cpu_notifier);
- dprintk("driver %s up and running\n", driver_data->name);
- cpufreq_debug_enable_ratelimit();
+ pr_debug("driver %s up and running\n", driver_data->name);
return 0;
err_sysdev_unreg:
{
unsigned long flags;
- cpufreq_debug_disable_ratelimit();
-
- if (!cpufreq_driver || (driver != cpufreq_driver)) {
- cpufreq_debug_enable_ratelimit();
+ if (!cpufreq_driver || (driver != cpufreq_driver))
return -EINVAL;
- }
- dprintk("unregistering driver %s\n", driver->name);
+ pr_debug("unregistering driver %s\n", driver->name);
sysdev_driver_unregister(&cpu_sysdev_class, &cpufreq_sysdev_driver);
unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
#include <linux/cpufreq.h>
#include <linux/init.h>
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "performance", msg)
-
static int cpufreq_governor_performance(struct cpufreq_policy *policy,
unsigned int event)
switch (event) {
case CPUFREQ_GOV_START:
case CPUFREQ_GOV_LIMITS:
- dprintk("setting to %u kHz because of event %u\n",
+ pr_debug("setting to %u kHz because of event %u\n",
policy->max, event);
__cpufreq_driver_target(policy, policy->max,
CPUFREQ_RELATION_H);
#include <linux/cpufreq.h>
#include <linux/init.h>
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "powersave", msg)
-
static int cpufreq_governor_powersave(struct cpufreq_policy *policy,
unsigned int event)
{
switch (event) {
case CPUFREQ_GOV_START:
case CPUFREQ_GOV_LIMITS:
- dprintk("setting to %u kHz because of event %u\n",
+ pr_debug("setting to %u kHz because of event %u\n",
policy->min, event);
__cpufreq_driver_target(policy, policy->min,
CPUFREQ_RELATION_L);
return -1;
}
+/* should be called late in the CPU removal sequence so that the stats
+ * memory is still available in case someone tries to use it.
+ */
static void cpufreq_stats_free_table(unsigned int cpu)
{
struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table, cpu);
- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
- if (policy && policy->cpu == cpu)
- sysfs_remove_group(&policy->kobj, &stats_attr_group);
if (stat) {
kfree(stat->time_in_state);
kfree(stat);
}
per_cpu(cpufreq_stats_table, cpu) = NULL;
+}
+
+/* must be called early in the CPU removal sequence (before
+ * cpufreq_remove_dev) so that policy is still valid.
+ */
+static void cpufreq_stats_free_sysfs(unsigned int cpu)
+{
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ if (policy && policy->cpu == cpu)
+ sysfs_remove_group(&policy->kobj, &stats_attr_group);
if (policy)
cpufreq_cpu_put(policy);
}
case CPU_ONLINE_FROZEN:
cpufreq_update_policy(cpu);
break;
+ case CPU_DOWN_PREPARE:
+ cpufreq_stats_free_sysfs(cpu);
+ break;
case CPU_DEAD:
case CPU_DEAD_FROZEN:
cpufreq_stats_free_table(cpu);
return NOTIFY_OK;
}
-static struct notifier_block cpufreq_stat_cpu_notifier __refdata =
-{
+/* priority=1 so this will get called before cpufreq_remove_dev */
+static struct notifier_block cpufreq_stat_cpu_notifier __refdata = {
.notifier_call = cpufreq_stat_cpu_callback,
+ .priority = 1,
};
static struct notifier_block notifier_policy_block = {
static DEFINE_MUTEX(userspace_mutex);
static int cpus_using_userspace_governor;
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_GOVERNOR, "userspace", msg)
-
/* keep track of frequency transitions */
static int
userspace_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
if (!per_cpu(cpu_is_managed, freq->cpu))
return 0;
- dprintk("saving cpu_cur_freq of cpu %u to be %u kHz\n",
+ pr_debug("saving cpu_cur_freq of cpu %u to be %u kHz\n",
freq->cpu, freq->new);
per_cpu(cpu_cur_freq, freq->cpu) = freq->new;
{
int ret = -EINVAL;
- dprintk("cpufreq_set for cpu %u, freq %u kHz\n", policy->cpu, freq);
+ pr_debug("cpufreq_set for cpu %u, freq %u kHz\n", policy->cpu, freq);
mutex_lock(&userspace_mutex);
if (!per_cpu(cpu_is_managed, policy->cpu))
per_cpu(cpu_max_freq, cpu) = policy->max;
per_cpu(cpu_cur_freq, cpu) = policy->cur;
per_cpu(cpu_set_freq, cpu) = policy->cur;
- dprintk("managing cpu %u started "
+ pr_debug("managing cpu %u started "
"(%u - %u kHz, currently %u kHz)\n",
cpu,
per_cpu(cpu_min_freq, cpu),
per_cpu(cpu_min_freq, cpu) = 0;
per_cpu(cpu_max_freq, cpu) = 0;
per_cpu(cpu_set_freq, cpu) = 0;
- dprintk("managing cpu %u stopped\n", cpu);
+ pr_debug("managing cpu %u stopped\n", cpu);
mutex_unlock(&userspace_mutex);
break;
case CPUFREQ_GOV_LIMITS:
mutex_lock(&userspace_mutex);
- dprintk("limit event for cpu %u: %u - %u kHz, "
+ pr_debug("limit event for cpu %u: %u - %u kHz, "
"currently %u kHz, last set to %u kHz\n",
cpu, policy->min, policy->max,
per_cpu(cpu_cur_freq, cpu),
--- /dev/null
+/*
+ * Based on documentation provided by Dave Jones. Thanks!
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/ioport.h>
+#include <linux/slab.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+
+#include <asm/msr.h>
+#include <asm/tsc.h>
+
+#define EPS_BRAND_C7M 0
+#define EPS_BRAND_C7 1
+#define EPS_BRAND_EDEN 2
+#define EPS_BRAND_C3 3
+#define EPS_BRAND_C7D 4
+
+struct eps_cpu_data {
+ u32 fsb;
+ struct cpufreq_frequency_table freq_table[];
+};
+
+static struct eps_cpu_data *eps_cpu[NR_CPUS];
+
+
+static unsigned int eps_get(unsigned int cpu)
+{
+ struct eps_cpu_data *centaur;
+ u32 lo, hi;
+
+ if (cpu)
+ return 0;
+ centaur = eps_cpu[cpu];
+ if (centaur == NULL)
+ return 0;
+
+ /* Return current frequency */
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ return centaur->fsb * ((lo >> 8) & 0xff);
+}
+
+static int eps_set_state(struct eps_cpu_data *centaur,
+ unsigned int cpu,
+ u32 dest_state)
+{
+ struct cpufreq_freqs freqs;
+ u32 lo, hi;
+ int err = 0;
+ int i;
+
+ freqs.old = eps_get(cpu);
+ freqs.new = centaur->fsb * ((dest_state >> 8) & 0xff);
+ freqs.cpu = cpu;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ /* Wait while CPU is busy */
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ i = 0;
+ while (lo & ((1 << 16) | (1 << 17))) {
+ udelay(16);
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ i++;
+ if (unlikely(i > 64)) {
+ err = -ENODEV;
+ goto postchange;
+ }
+ }
+ /* Set new multiplier and voltage */
+ wrmsr(MSR_IA32_PERF_CTL, dest_state & 0xffff, 0);
+ /* Wait until transition end */
+ i = 0;
+ do {
+ udelay(16);
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ i++;
+ if (unlikely(i > 64)) {
+ err = -ENODEV;
+ goto postchange;
+ }
+ } while (lo & ((1 << 16) | (1 << 17)));
+
+ /* Return current frequency */
+postchange:
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ freqs.new = centaur->fsb * ((lo >> 8) & 0xff);
+
+#ifdef DEBUG
+ {
+ u8 current_multiplier, current_voltage;
+
+ /* Print voltage and multiplier */
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ current_voltage = lo & 0xff;
+ printk(KERN_INFO "eps: Current voltage = %dmV\n",
+ current_voltage * 16 + 700);
+ current_multiplier = (lo >> 8) & 0xff;
+ printk(KERN_INFO "eps: Current multiplier = %d\n",
+ current_multiplier);
+ }
+#endif
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ return err;
+}
+
+static int eps_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ struct eps_cpu_data *centaur;
+ unsigned int newstate = 0;
+ unsigned int cpu = policy->cpu;
+ unsigned int dest_state;
+ int ret;
+
+ if (unlikely(eps_cpu[cpu] == NULL))
+ return -ENODEV;
+ centaur = eps_cpu[cpu];
+
+ if (unlikely(cpufreq_frequency_table_target(policy,
+ &eps_cpu[cpu]->freq_table[0],
+ target_freq,
+ relation,
+ &newstate))) {
+ return -EINVAL;
+ }
+
+ /* Make frequency transition */
+ dest_state = centaur->freq_table[newstate].index & 0xffff;
+ ret = eps_set_state(centaur, cpu, dest_state);
+ if (ret)
+ printk(KERN_ERR "eps: Timeout!\n");
+ return ret;
+}
+
+static int eps_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy,
+ &eps_cpu[policy->cpu]->freq_table[0]);
+}
+
+static int eps_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int i;
+ u32 lo, hi;
+ u64 val;
+ u8 current_multiplier, current_voltage;
+ u8 max_multiplier, max_voltage;
+ u8 min_multiplier, min_voltage;
+ u8 brand = 0;
+ u32 fsb;
+ struct eps_cpu_data *centaur;
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ struct cpufreq_frequency_table *f_table;
+ int k, step, voltage;
+ int ret;
+ int states;
+
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ /* Check brand */
+ printk(KERN_INFO "eps: Detected VIA ");
+
+ switch (c->x86_model) {
+ case 10:
+ rdmsr(0x1153, lo, hi);
+ brand = (((lo >> 2) ^ lo) >> 18) & 3;
+ printk(KERN_CONT "Model A ");
+ break;
+ case 13:
+ rdmsr(0x1154, lo, hi);
+ brand = (((lo >> 4) ^ (lo >> 2))) & 0x000000ff;
+ printk(KERN_CONT "Model D ");
+ break;
+ }
+
+ switch (brand) {
+ case EPS_BRAND_C7M:
+ printk(KERN_CONT "C7-M\n");
+ break;
+ case EPS_BRAND_C7:
+ printk(KERN_CONT "C7\n");
+ break;
+ case EPS_BRAND_EDEN:
+ printk(KERN_CONT "Eden\n");
+ break;
+ case EPS_BRAND_C7D:
+ printk(KERN_CONT "C7-D\n");
+ break;
+ case EPS_BRAND_C3:
+ printk(KERN_CONT "C3\n");
+ return -ENODEV;
+ break;
+ }
+ /* Enable Enhanced PowerSaver */
+ rdmsrl(MSR_IA32_MISC_ENABLE, val);
+ if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+ val |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
+ wrmsrl(MSR_IA32_MISC_ENABLE, val);
+ /* Can be locked at 0 */
+ rdmsrl(MSR_IA32_MISC_ENABLE, val);
+ if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+ printk(KERN_INFO "eps: Can't enable Enhanced PowerSaver\n");
+ return -ENODEV;
+ }
+ }
+
+ /* Print voltage and multiplier */
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ current_voltage = lo & 0xff;
+ printk(KERN_INFO "eps: Current voltage = %dmV\n",
+ current_voltage * 16 + 700);
+ current_multiplier = (lo >> 8) & 0xff;
+ printk(KERN_INFO "eps: Current multiplier = %d\n", current_multiplier);
+
+ /* Print limits */
+ max_voltage = hi & 0xff;
+ printk(KERN_INFO "eps: Highest voltage = %dmV\n",
+ max_voltage * 16 + 700);
+ max_multiplier = (hi >> 8) & 0xff;
+ printk(KERN_INFO "eps: Highest multiplier = %d\n", max_multiplier);
+ min_voltage = (hi >> 16) & 0xff;
+ printk(KERN_INFO "eps: Lowest voltage = %dmV\n",
+ min_voltage * 16 + 700);
+ min_multiplier = (hi >> 24) & 0xff;
+ printk(KERN_INFO "eps: Lowest multiplier = %d\n", min_multiplier);
+
+ /* Sanity checks */
+ if (current_multiplier == 0 || max_multiplier == 0
+ || min_multiplier == 0)
+ return -EINVAL;
+ if (current_multiplier > max_multiplier
+ || max_multiplier <= min_multiplier)
+ return -EINVAL;
+ if (current_voltage > 0x1f || max_voltage > 0x1f)
+ return -EINVAL;
+ if (max_voltage < min_voltage)
+ return -EINVAL;
+
+ /* Calc FSB speed */
+ fsb = cpu_khz / current_multiplier;
+ /* Calc number of p-states supported */
+ if (brand == EPS_BRAND_C7M)
+ states = max_multiplier - min_multiplier + 1;
+ else
+ states = 2;
+
+ /* Allocate private data and frequency table for current cpu */
+ centaur = kzalloc(sizeof(struct eps_cpu_data)
+ + (states + 1) * sizeof(struct cpufreq_frequency_table),
+ GFP_KERNEL);
+ if (!centaur)
+ return -ENOMEM;
+ eps_cpu[0] = centaur;
+
+ /* Copy basic values */
+ centaur->fsb = fsb;
+
+ /* Fill frequency and MSR value table */
+ f_table = ¢aur->freq_table[0];
+ if (brand != EPS_BRAND_C7M) {
+ f_table[0].frequency = fsb * min_multiplier;
+ f_table[0].index = (min_multiplier << 8) | min_voltage;
+ f_table[1].frequency = fsb * max_multiplier;
+ f_table[1].index = (max_multiplier << 8) | max_voltage;
+ f_table[2].frequency = CPUFREQ_TABLE_END;
+ } else {
+ k = 0;
+ step = ((max_voltage - min_voltage) * 256)
+ / (max_multiplier - min_multiplier);
+ for (i = min_multiplier; i <= max_multiplier; i++) {
+ voltage = (k * step) / 256 + min_voltage;
+ f_table[k].frequency = fsb * i;
+ f_table[k].index = (i << 8) | voltage;
+ k++;
+ }
+ f_table[k].frequency = CPUFREQ_TABLE_END;
+ }
+
+ policy->cpuinfo.transition_latency = 140000; /* 844mV -> 700mV in ns */
+ policy->cur = fsb * current_multiplier;
+
+ ret = cpufreq_frequency_table_cpuinfo(policy, ¢aur->freq_table[0]);
+ if (ret) {
+ kfree(centaur);
+ return ret;
+ }
+
+ cpufreq_frequency_table_get_attr(¢aur->freq_table[0], policy->cpu);
+ return 0;
+}
+
+static int eps_cpu_exit(struct cpufreq_policy *policy)
+{
+ unsigned int cpu = policy->cpu;
+ struct eps_cpu_data *centaur;
+ u32 lo, hi;
+
+ if (eps_cpu[cpu] == NULL)
+ return -ENODEV;
+ centaur = eps_cpu[cpu];
+
+ /* Get max frequency */
+ rdmsr(MSR_IA32_PERF_STATUS, lo, hi);
+ /* Set max frequency */
+ eps_set_state(centaur, cpu, hi & 0xffff);
+ /* Bye */
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ kfree(eps_cpu[cpu]);
+ eps_cpu[cpu] = NULL;
+ return 0;
+}
+
+static struct freq_attr *eps_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver eps_driver = {
+ .verify = eps_verify,
+ .target = eps_target,
+ .init = eps_cpu_init,
+ .exit = eps_cpu_exit,
+ .get = eps_get,
+ .name = "e_powersaver",
+ .owner = THIS_MODULE,
+ .attr = eps_attr,
+};
+
+static int __init eps_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ /* This driver will work only on Centaur C7 processors with
+ * Enhanced SpeedStep/PowerSaver registers */
+ if (c->x86_vendor != X86_VENDOR_CENTAUR
+ || c->x86 != 6 || c->x86_model < 10)
+ return -ENODEV;
+ if (!cpu_has(c, X86_FEATURE_EST))
+ return -ENODEV;
+
+ if (cpufreq_register_driver(&eps_driver))
+ return -EINVAL;
+ return 0;
+}
+
+static void __exit eps_exit(void)
+{
+ cpufreq_unregister_driver(&eps_driver);
+}
+
+MODULE_AUTHOR("Rafal Bilski <rafalbilski@interia.pl>");
+MODULE_DESCRIPTION("Enhanced PowerSaver driver for VIA C7 CPU's.");
+MODULE_LICENSE("GPL");
+
+module_init(eps_init);
+module_exit(eps_exit);
--- /dev/null
+/*
+ * elanfreq: cpufreq driver for the AMD ELAN family
+ *
+ * (c) Copyright 2002 Robert Schwebel <r.schwebel@pengutronix.de>
+ *
+ * Parts of this code are (c) Sven Geggus <sven@geggus.net>
+ *
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * 2002-02-13: - initial revision for 2.4.18-pre9 by Robert Schwebel
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include <linux/delay.h>
+#include <linux/cpufreq.h>
+
+#include <asm/msr.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+
+#define REG_CSCIR 0x22 /* Chip Setup and Control Index Register */
+#define REG_CSCDR 0x23 /* Chip Setup and Control Data Register */
+
+/* Module parameter */
+static int max_freq;
+
+struct s_elan_multiplier {
+ int clock; /* frequency in kHz */
+ int val40h; /* PMU Force Mode register */
+ int val80h; /* CPU Clock Speed Register */
+};
+
+/*
+ * It is important that the frequencies
+ * are listed in ascending order here!
+ */
+static struct s_elan_multiplier elan_multiplier[] = {
+ {1000, 0x02, 0x18},
+ {2000, 0x02, 0x10},
+ {4000, 0x02, 0x08},
+ {8000, 0x00, 0x00},
+ {16000, 0x00, 0x02},
+ {33000, 0x00, 0x04},
+ {66000, 0x01, 0x04},
+ {99000, 0x01, 0x05}
+};
+
+static struct cpufreq_frequency_table elanfreq_table[] = {
+ {0, 1000},
+ {1, 2000},
+ {2, 4000},
+ {3, 8000},
+ {4, 16000},
+ {5, 33000},
+ {6, 66000},
+ {7, 99000},
+ {0, CPUFREQ_TABLE_END},
+};
+
+
+/**
+ * elanfreq_get_cpu_frequency: determine current cpu speed
+ *
+ * Finds out at which frequency the CPU of the Elan SOC runs
+ * at the moment. Frequencies from 1 to 33 MHz are generated
+ * the normal way, 66 and 99 MHz are called "Hyperspeed Mode"
+ * and have the rest of the chip running with 33 MHz.
+ */
+
+static unsigned int elanfreq_get_cpu_frequency(unsigned int cpu)
+{
+ u8 clockspeed_reg; /* Clock Speed Register */
+
+ local_irq_disable();
+ outb_p(0x80, REG_CSCIR);
+ clockspeed_reg = inb_p(REG_CSCDR);
+ local_irq_enable();
+
+ if ((clockspeed_reg & 0xE0) == 0xE0)
+ return 0;
+
+ /* Are we in CPU clock multiplied mode (66/99 MHz)? */
+ if ((clockspeed_reg & 0xE0) == 0xC0) {
+ if ((clockspeed_reg & 0x01) == 0)
+ return 66000;
+ else
+ return 99000;
+ }
+
+ /* 33 MHz is not 32 MHz... */
+ if ((clockspeed_reg & 0xE0) == 0xA0)
+ return 33000;
+
+ return (1<<((clockspeed_reg & 0xE0) >> 5)) * 1000;
+}
+
+
+/**
+ * elanfreq_set_cpu_frequency: Change the CPU core frequency
+ * @cpu: cpu number
+ * @freq: frequency in kHz
+ *
+ * This function takes a frequency value and changes the CPU frequency
+ * according to this. Note that the frequency has to be checked by
+ * elanfreq_validatespeed() for correctness!
+ *
+ * There is no return value.
+ */
+
+static void elanfreq_set_cpu_state(unsigned int state)
+{
+ struct cpufreq_freqs freqs;
+
+ freqs.old = elanfreq_get_cpu_frequency(0);
+ freqs.new = elan_multiplier[state].clock;
+ freqs.cpu = 0; /* elanfreq.c is UP only driver */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ printk(KERN_INFO "elanfreq: attempting to set frequency to %i kHz\n",
+ elan_multiplier[state].clock);
+
+
+ /*
+ * Access to the Elan's internal registers is indexed via
+ * 0x22: Chip Setup & Control Register Index Register (CSCI)
+ * 0x23: Chip Setup & Control Register Data Register (CSCD)
+ *
+ */
+
+ /*
+ * 0x40 is the Power Management Unit's Force Mode Register.
+ * Bit 6 enables Hyperspeed Mode (66/100 MHz core frequency)
+ */
+
+ local_irq_disable();
+ outb_p(0x40, REG_CSCIR); /* Disable hyperspeed mode */
+ outb_p(0x00, REG_CSCDR);
+ local_irq_enable(); /* wait till internal pipelines and */
+ udelay(1000); /* buffers have cleaned up */
+
+ local_irq_disable();
+
+ /* now, set the CPU clock speed register (0x80) */
+ outb_p(0x80, REG_CSCIR);
+ outb_p(elan_multiplier[state].val80h, REG_CSCDR);
+
+ /* now, the hyperspeed bit in PMU Force Mode Register (0x40) */
+ outb_p(0x40, REG_CSCIR);
+ outb_p(elan_multiplier[state].val40h, REG_CSCDR);
+ udelay(10000);
+ local_irq_enable();
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+};
+
+
+/**
+ * elanfreq_validatespeed: test if frequency range is valid
+ * @policy: the policy to validate
+ *
+ * This function checks if a given frequency range in kHz is valid
+ * for the hardware supported by the driver.
+ */
+
+static int elanfreq_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &elanfreq_table[0]);
+}
+
+static int elanfreq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = 0;
+
+ if (cpufreq_frequency_table_target(policy, &elanfreq_table[0],
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ elanfreq_set_cpu_state(newstate);
+
+ return 0;
+}
+
+
+/*
+ * Module init and exit code
+ */
+
+static int elanfreq_cpu_init(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ unsigned int i;
+ int result;
+
+ /* capability check */
+ if ((c->x86_vendor != X86_VENDOR_AMD) ||
+ (c->x86 != 4) || (c->x86_model != 10))
+ return -ENODEV;
+
+ /* max freq */
+ if (!max_freq)
+ max_freq = elanfreq_get_cpu_frequency(0);
+
+ /* table init */
+ for (i = 0; (elanfreq_table[i].frequency != CPUFREQ_TABLE_END); i++) {
+ if (elanfreq_table[i].frequency > max_freq)
+ elanfreq_table[i].frequency = CPUFREQ_ENTRY_INVALID;
+ }
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
+ policy->cur = elanfreq_get_cpu_frequency(0);
+
+ result = cpufreq_frequency_table_cpuinfo(policy, elanfreq_table);
+ if (result)
+ return result;
+
+ cpufreq_frequency_table_get_attr(elanfreq_table, policy->cpu);
+ return 0;
+}
+
+
+static int elanfreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+
+#ifndef MODULE
+/**
+ * elanfreq_setup - elanfreq command line parameter parsing
+ *
+ * elanfreq command line parameter. Use:
+ * elanfreq=66000
+ * to set the maximum CPU frequency to 66 MHz. Note that in
+ * case you do not give this boot parameter, the maximum
+ * frequency will fall back to _current_ CPU frequency which
+ * might be lower. If you build this as a module, use the
+ * max_freq module parameter instead.
+ */
+static int __init elanfreq_setup(char *str)
+{
+ max_freq = simple_strtoul(str, &str, 0);
+ printk(KERN_WARNING "You're using the deprecated elanfreq command line option. Use elanfreq.max_freq instead, please!\n");
+ return 1;
+}
+__setup("elanfreq=", elanfreq_setup);
+#endif
+
+
+static struct freq_attr *elanfreq_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+
+static struct cpufreq_driver elanfreq_driver = {
+ .get = elanfreq_get_cpu_frequency,
+ .verify = elanfreq_verify,
+ .target = elanfreq_target,
+ .init = elanfreq_cpu_init,
+ .exit = elanfreq_cpu_exit,
+ .name = "elanfreq",
+ .owner = THIS_MODULE,
+ .attr = elanfreq_attr,
+};
+
+
+static int __init elanfreq_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ /* Test if we have the right hardware */
+ if ((c->x86_vendor != X86_VENDOR_AMD) ||
+ (c->x86 != 4) || (c->x86_model != 10)) {
+ printk(KERN_INFO "elanfreq: error: no Elan processor found!\n");
+ return -ENODEV;
+ }
+ return cpufreq_register_driver(&elanfreq_driver);
+}
+
+
+static void __exit elanfreq_exit(void)
+{
+ cpufreq_unregister_driver(&elanfreq_driver);
+}
+
+
+module_param(max_freq, int, 0444);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Robert Schwebel <r.schwebel@pengutronix.de>, "
+ "Sven Geggus <sven@geggus.net>");
+MODULE_DESCRIPTION("cpufreq driver for AMD's Elan CPUs");
+
+module_init(elanfreq_init);
+module_exit(elanfreq_exit);
#include <linux/init.h>
#include <linux/cpufreq.h>
-#define dprintk(msg...) \
- cpufreq_debug_printk(CPUFREQ_DEBUG_CORE, "freq-table", msg)
-
/*********************************************************************
* FREQUENCY TABLE HELPERS *
*********************************************************************/
for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
unsigned int freq = table[i].frequency;
if (freq == CPUFREQ_ENTRY_INVALID) {
- dprintk("table entry %u is invalid, skipping\n", i);
+ pr_debug("table entry %u is invalid, skipping\n", i);
continue;
}
- dprintk("table entry %u: %u kHz, %u index\n",
+ pr_debug("table entry %u: %u kHz, %u index\n",
i, freq, table[i].index);
if (freq < min_freq)
min_freq = freq;
unsigned int i;
unsigned int count = 0;
- dprintk("request for verification of policy (%u - %u kHz) for cpu %u\n",
+ pr_debug("request for verification of policy (%u - %u kHz) for cpu %u\n",
policy->min, policy->max, policy->cpu);
if (!cpu_online(policy->cpu))
cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
policy->cpuinfo.max_freq);
- dprintk("verification lead to (%u - %u kHz) for cpu %u\n",
+ pr_debug("verification lead to (%u - %u kHz) for cpu %u\n",
policy->min, policy->max, policy->cpu);
return 0;
};
unsigned int i;
- dprintk("request for target %u kHz (relation: %u) for cpu %u\n",
+ pr_debug("request for target %u kHz (relation: %u) for cpu %u\n",
target_freq, relation, policy->cpu);
switch (relation) {
} else
*index = optimal.index;
- dprintk("target is %u (%u kHz, %u)\n", *index, table[*index].frequency,
+ pr_debug("target is %u (%u kHz, %u)\n", *index, table[*index].frequency,
table[*index].index);
return 0;
void cpufreq_frequency_table_get_attr(struct cpufreq_frequency_table *table,
unsigned int cpu)
{
- dprintk("setting show_table for cpu %u to %p\n", cpu, table);
+ pr_debug("setting show_table for cpu %u to %p\n", cpu, table);
per_cpu(cpufreq_show_table, cpu) = table;
}
EXPORT_SYMBOL_GPL(cpufreq_frequency_table_get_attr);
void cpufreq_frequency_table_put_attr(unsigned int cpu)
{
- dprintk("clearing show_table for cpu %u\n", cpu);
+ pr_debug("clearing show_table for cpu %u\n", cpu);
per_cpu(cpufreq_show_table, cpu) = NULL;
}
EXPORT_SYMBOL_GPL(cpufreq_frequency_table_put_attr);
--- /dev/null
+/*
+ * Cyrix MediaGX and NatSemi Geode Suspend Modulation
+ * (C) 2002 Zwane Mwaikambo <zwane@commfireservices.com>
+ * (C) 2002 Hiroshi Miura <miura@da-cha.org>
+ * All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation
+ *
+ * The author(s) of this software shall not be held liable for damages
+ * of any nature resulting due to the use of this software. This
+ * software is provided AS-IS with no warranties.
+ *
+ * Theoretical note:
+ *
+ * (see Geode(tm) CS5530 manual (rev.4.1) page.56)
+ *
+ * CPU frequency control on NatSemi Geode GX1/GXLV processor and CS55x0
+ * are based on Suspend Modulation.
+ *
+ * Suspend Modulation works by asserting and de-asserting the SUSP# pin
+ * to CPU(GX1/GXLV) for configurable durations. When asserting SUSP#
+ * the CPU enters an idle state. GX1 stops its core clock when SUSP# is
+ * asserted then power consumption is reduced.
+ *
+ * Suspend Modulation's OFF/ON duration are configurable
+ * with 'Suspend Modulation OFF Count Register'
+ * and 'Suspend Modulation ON Count Register'.
+ * These registers are 8bit counters that represent the number of
+ * 32us intervals which the SUSP# pin is asserted(ON)/de-asserted(OFF)
+ * to the processor.
+ *
+ * These counters define a ratio which is the effective frequency
+ * of operation of the system.
+ *
+ * OFF Count
+ * F_eff = Fgx * ----------------------
+ * OFF Count + ON Count
+ *
+ * 0 <= On Count, Off Count <= 255
+ *
+ * From these limits, we can get register values
+ *
+ * off_duration + on_duration <= MAX_DURATION
+ * on_duration = off_duration * (stock_freq - freq) / freq
+ *
+ * off_duration = (freq * DURATION) / stock_freq
+ * on_duration = DURATION - off_duration
+ *
+ *
+ *---------------------------------------------------------------------------
+ *
+ * ChangeLog:
+ * Dec. 12, 2003 Hiroshi Miura <miura@da-cha.org>
+ * - fix on/off register mistake
+ * - fix cpu_khz calc when it stops cpu modulation.
+ *
+ * Dec. 11, 2002 Hiroshi Miura <miura@da-cha.org>
+ * - rewrite for Cyrix MediaGX Cx5510/5520 and
+ * NatSemi Geode Cs5530(A).
+ *
+ * Jul. ??, 2002 Zwane Mwaikambo <zwane@commfireservices.com>
+ * - cs5530_mod patch for 2.4.19-rc1.
+ *
+ *---------------------------------------------------------------------------
+ *
+ * Todo
+ * Test on machines with 5510, 5530, 5530A
+ */
+
+/************************************************************************
+ * Suspend Modulation - Definitions *
+ ************************************************************************/
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/cpufreq.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+
+#include <asm/processor-cyrix.h>
+
+/* PCI config registers, all at F0 */
+#define PCI_PMER1 0x80 /* power management enable register 1 */
+#define PCI_PMER2 0x81 /* power management enable register 2 */
+#define PCI_PMER3 0x82 /* power management enable register 3 */
+#define PCI_IRQTC 0x8c /* irq speedup timer counter register:typical 2 to 4ms */
+#define PCI_VIDTC 0x8d /* video speedup timer counter register: typical 50 to 100ms */
+#define PCI_MODOFF 0x94 /* suspend modulation OFF counter register, 1 = 32us */
+#define PCI_MODON 0x95 /* suspend modulation ON counter register */
+#define PCI_SUSCFG 0x96 /* suspend configuration register */
+
+/* PMER1 bits */
+#define GPM (1<<0) /* global power management */
+#define GIT (1<<1) /* globally enable PM device idle timers */
+#define GTR (1<<2) /* globally enable IO traps */
+#define IRQ_SPDUP (1<<3) /* disable clock throttle during interrupt handling */
+#define VID_SPDUP (1<<4) /* disable clock throttle during vga video handling */
+
+/* SUSCFG bits */
+#define SUSMOD (1<<0) /* enable/disable suspend modulation */
+/* the below is supported only with cs5530 (after rev.1.2)/cs5530A */
+#define SMISPDUP (1<<1) /* select how SMI re-enable suspend modulation: */
+ /* IRQTC timer or read SMI speedup disable reg.(F1BAR[08-09h]) */
+#define SUSCFG (1<<2) /* enable powering down a GXLV processor. "Special 3Volt Suspend" mode */
+/* the below is supported only with cs5530A */
+#define PWRSVE_ISA (1<<3) /* stop ISA clock */
+#define PWRSVE (1<<4) /* active idle */
+
+struct gxfreq_params {
+ u8 on_duration;
+ u8 off_duration;
+ u8 pci_suscfg;
+ u8 pci_pmer1;
+ u8 pci_pmer2;
+ struct pci_dev *cs55x0;
+};
+
+static struct gxfreq_params *gx_params;
+static int stock_freq;
+
+/* PCI bus clock - defaults to 30.000 if cpu_khz is not available */
+static int pci_busclk;
+module_param(pci_busclk, int, 0444);
+
+/* maximum duration for which the cpu may be suspended
+ * (32us * MAX_DURATION). If no parameter is given, this defaults
+ * to 255.
+ * Note that this leads to a maximum of 8 ms(!) where the CPU clock
+ * is suspended -- processing power is just 0.39% of what it used to be,
+ * though. 781.25 kHz(!) for a 200 MHz processor -- wow. */
+static int max_duration = 255;
+module_param(max_duration, int, 0444);
+
+/* For the default policy, we want at least some processing power
+ * - let's say 5%. (min = maxfreq / POLICY_MIN_DIV)
+ */
+#define POLICY_MIN_DIV 20
+
+
+/**
+ * we can detect a core multipiler from dir0_lsb
+ * from GX1 datasheet p.56,
+ * MULT[3:0]:
+ * 0000 = SYSCLK multiplied by 4 (test only)
+ * 0001 = SYSCLK multiplied by 10
+ * 0010 = SYSCLK multiplied by 4
+ * 0011 = SYSCLK multiplied by 6
+ * 0100 = SYSCLK multiplied by 9
+ * 0101 = SYSCLK multiplied by 5
+ * 0110 = SYSCLK multiplied by 7
+ * 0111 = SYSCLK multiplied by 8
+ * of 33.3MHz
+ **/
+static int gx_freq_mult[16] = {
+ 4, 10, 4, 6, 9, 5, 7, 8,
+ 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+
+/****************************************************************
+ * Low Level chipset interface *
+ ****************************************************************/
+static struct pci_device_id gx_chipset_tbl[] __initdata = {
+ { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5530_LEGACY), },
+ { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5520), },
+ { PCI_VDEVICE(CYRIX, PCI_DEVICE_ID_CYRIX_5510), },
+ { 0, },
+};
+
+static void gx_write_byte(int reg, int value)
+{
+ pci_write_config_byte(gx_params->cs55x0, reg, value);
+}
+
+/**
+ * gx_detect_chipset:
+ *
+ **/
+static __init struct pci_dev *gx_detect_chipset(void)
+{
+ struct pci_dev *gx_pci = NULL;
+
+ /* check if CPU is a MediaGX or a Geode. */
+ if ((boot_cpu_data.x86_vendor != X86_VENDOR_NSC) &&
+ (boot_cpu_data.x86_vendor != X86_VENDOR_CYRIX)) {
+ pr_debug("error: no MediaGX/Geode processor found!\n");
+ return NULL;
+ }
+
+ /* detect which companion chip is used */
+ for_each_pci_dev(gx_pci) {
+ if ((pci_match_id(gx_chipset_tbl, gx_pci)) != NULL)
+ return gx_pci;
+ }
+
+ pr_debug("error: no supported chipset found!\n");
+ return NULL;
+}
+
+/**
+ * gx_get_cpuspeed:
+ *
+ * Finds out at which efficient frequency the Cyrix MediaGX/NatSemi
+ * Geode CPU runs.
+ */
+static unsigned int gx_get_cpuspeed(unsigned int cpu)
+{
+ if ((gx_params->pci_suscfg & SUSMOD) == 0)
+ return stock_freq;
+
+ return (stock_freq * gx_params->off_duration)
+ / (gx_params->on_duration + gx_params->off_duration);
+}
+
+/**
+ * gx_validate_speed:
+ * determine current cpu speed
+ *
+ **/
+
+static unsigned int gx_validate_speed(unsigned int khz, u8 *on_duration,
+ u8 *off_duration)
+{
+ unsigned int i;
+ u8 tmp_on, tmp_off;
+ int old_tmp_freq = stock_freq;
+ int tmp_freq;
+
+ *off_duration = 1;
+ *on_duration = 0;
+
+ for (i = max_duration; i > 0; i--) {
+ tmp_off = ((khz * i) / stock_freq) & 0xff;
+ tmp_on = i - tmp_off;
+ tmp_freq = (stock_freq * tmp_off) / i;
+ /* if this relation is closer to khz, use this. If it's equal,
+ * prefer it, too - lower latency */
+ if (abs(tmp_freq - khz) <= abs(old_tmp_freq - khz)) {
+ *on_duration = tmp_on;
+ *off_duration = tmp_off;
+ old_tmp_freq = tmp_freq;
+ }
+ }
+
+ return old_tmp_freq;
+}
+
+
+/**
+ * gx_set_cpuspeed:
+ * set cpu speed in khz.
+ **/
+
+static void gx_set_cpuspeed(unsigned int khz)
+{
+ u8 suscfg, pmer1;
+ unsigned int new_khz;
+ unsigned long flags;
+ struct cpufreq_freqs freqs;
+
+ freqs.cpu = 0;
+ freqs.old = gx_get_cpuspeed(0);
+
+ new_khz = gx_validate_speed(khz, &gx_params->on_duration,
+ &gx_params->off_duration);
+
+ freqs.new = new_khz;
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ local_irq_save(flags);
+
+
+
+ if (new_khz != stock_freq) {
+ /* if new khz == 100% of CPU speed, it is special case */
+ switch (gx_params->cs55x0->device) {
+ case PCI_DEVICE_ID_CYRIX_5530_LEGACY:
+ pmer1 = gx_params->pci_pmer1 | IRQ_SPDUP | VID_SPDUP;
+ /* FIXME: need to test other values -- Zwane,Miura */
+ /* typical 2 to 4ms */
+ gx_write_byte(PCI_IRQTC, 4);
+ /* typical 50 to 100ms */
+ gx_write_byte(PCI_VIDTC, 100);
+ gx_write_byte(PCI_PMER1, pmer1);
+
+ if (gx_params->cs55x0->revision < 0x10) {
+ /* CS5530(rev 1.2, 1.3) */
+ suscfg = gx_params->pci_suscfg|SUSMOD;
+ } else {
+ /* CS5530A,B.. */
+ suscfg = gx_params->pci_suscfg|SUSMOD|PWRSVE;
+ }
+ break;
+ case PCI_DEVICE_ID_CYRIX_5520:
+ case PCI_DEVICE_ID_CYRIX_5510:
+ suscfg = gx_params->pci_suscfg | SUSMOD;
+ break;
+ default:
+ local_irq_restore(flags);
+ pr_debug("fatal: try to set unknown chipset.\n");
+ return;
+ }
+ } else {
+ suscfg = gx_params->pci_suscfg & ~(SUSMOD);
+ gx_params->off_duration = 0;
+ gx_params->on_duration = 0;
+ pr_debug("suspend modulation disabled: cpu runs 100%% speed.\n");
+ }
+
+ gx_write_byte(PCI_MODOFF, gx_params->off_duration);
+ gx_write_byte(PCI_MODON, gx_params->on_duration);
+
+ gx_write_byte(PCI_SUSCFG, suscfg);
+ pci_read_config_byte(gx_params->cs55x0, PCI_SUSCFG, &suscfg);
+
+ local_irq_restore(flags);
+
+ gx_params->pci_suscfg = suscfg;
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+ pr_debug("suspend modulation w/ duration of ON:%d us, OFF:%d us\n",
+ gx_params->on_duration * 32, gx_params->off_duration * 32);
+ pr_debug("suspend modulation w/ clock speed: %d kHz.\n", freqs.new);
+}
+
+/****************************************************************
+ * High level functions *
+ ****************************************************************/
+
+/*
+ * cpufreq_gx_verify: test if frequency range is valid
+ *
+ * This function checks if a given frequency range in kHz is valid
+ * for the hardware supported by the driver.
+ */
+
+static int cpufreq_gx_verify(struct cpufreq_policy *policy)
+{
+ unsigned int tmp_freq = 0;
+ u8 tmp1, tmp2;
+
+ if (!stock_freq || !policy)
+ return -EINVAL;
+
+ policy->cpu = 0;
+ cpufreq_verify_within_limits(policy, (stock_freq / max_duration),
+ stock_freq);
+
+ /* it needs to be assured that at least one supported frequency is
+ * within policy->min and policy->max. If it is not, policy->max
+ * needs to be increased until one freuqency is supported.
+ * policy->min may not be decreased, though. This way we guarantee a
+ * specific processing capacity.
+ */
+ tmp_freq = gx_validate_speed(policy->min, &tmp1, &tmp2);
+ if (tmp_freq < policy->min)
+ tmp_freq += stock_freq / max_duration;
+ policy->min = tmp_freq;
+ if (policy->min > policy->max)
+ policy->max = tmp_freq;
+ tmp_freq = gx_validate_speed(policy->max, &tmp1, &tmp2);
+ if (tmp_freq > policy->max)
+ tmp_freq -= stock_freq / max_duration;
+ policy->max = tmp_freq;
+ if (policy->max < policy->min)
+ policy->max = policy->min;
+ cpufreq_verify_within_limits(policy, (stock_freq / max_duration),
+ stock_freq);
+
+ return 0;
+}
+
+/*
+ * cpufreq_gx_target:
+ *
+ */
+static int cpufreq_gx_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ u8 tmp1, tmp2;
+ unsigned int tmp_freq;
+
+ if (!stock_freq || !policy)
+ return -EINVAL;
+
+ policy->cpu = 0;
+
+ tmp_freq = gx_validate_speed(target_freq, &tmp1, &tmp2);
+ while (tmp_freq < policy->min) {
+ tmp_freq += stock_freq / max_duration;
+ tmp_freq = gx_validate_speed(tmp_freq, &tmp1, &tmp2);
+ }
+ while (tmp_freq > policy->max) {
+ tmp_freq -= stock_freq / max_duration;
+ tmp_freq = gx_validate_speed(tmp_freq, &tmp1, &tmp2);
+ }
+
+ gx_set_cpuspeed(tmp_freq);
+
+ return 0;
+}
+
+static int cpufreq_gx_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int maxfreq, curfreq;
+
+ if (!policy || policy->cpu != 0)
+ return -ENODEV;
+
+ /* determine maximum frequency */
+ if (pci_busclk)
+ maxfreq = pci_busclk * gx_freq_mult[getCx86(CX86_DIR1) & 0x0f];
+ else if (cpu_khz)
+ maxfreq = cpu_khz;
+ else
+ maxfreq = 30000 * gx_freq_mult[getCx86(CX86_DIR1) & 0x0f];
+
+ stock_freq = maxfreq;
+ curfreq = gx_get_cpuspeed(0);
+
+ pr_debug("cpu max frequency is %d.\n", maxfreq);
+ pr_debug("cpu current frequency is %dkHz.\n", curfreq);
+
+ /* setup basic struct for cpufreq API */
+ policy->cpu = 0;
+
+ if (max_duration < POLICY_MIN_DIV)
+ policy->min = maxfreq / max_duration;
+ else
+ policy->min = maxfreq / POLICY_MIN_DIV;
+ policy->max = maxfreq;
+ policy->cur = curfreq;
+ policy->cpuinfo.min_freq = maxfreq / max_duration;
+ policy->cpuinfo.max_freq = maxfreq;
+ policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
+
+ return 0;
+}
+
+/*
+ * cpufreq_gx_init:
+ * MediaGX/Geode GX initialize cpufreq driver
+ */
+static struct cpufreq_driver gx_suspmod_driver = {
+ .get = gx_get_cpuspeed,
+ .verify = cpufreq_gx_verify,
+ .target = cpufreq_gx_target,
+ .init = cpufreq_gx_cpu_init,
+ .name = "gx-suspmod",
+ .owner = THIS_MODULE,
+};
+
+static int __init cpufreq_gx_init(void)
+{
+ int ret;
+ struct gxfreq_params *params;
+ struct pci_dev *gx_pci;
+
+ /* Test if we have the right hardware */
+ gx_pci = gx_detect_chipset();
+ if (gx_pci == NULL)
+ return -ENODEV;
+
+ /* check whether module parameters are sane */
+ if (max_duration > 0xff)
+ max_duration = 0xff;
+
+ pr_debug("geode suspend modulation available.\n");
+
+ params = kzalloc(sizeof(struct gxfreq_params), GFP_KERNEL);
+ if (params == NULL)
+ return -ENOMEM;
+
+ params->cs55x0 = gx_pci;
+ gx_params = params;
+
+ /* keep cs55x0 configurations */
+ pci_read_config_byte(params->cs55x0, PCI_SUSCFG, &(params->pci_suscfg));
+ pci_read_config_byte(params->cs55x0, PCI_PMER1, &(params->pci_pmer1));
+ pci_read_config_byte(params->cs55x0, PCI_PMER2, &(params->pci_pmer2));
+ pci_read_config_byte(params->cs55x0, PCI_MODON, &(params->on_duration));
+ pci_read_config_byte(params->cs55x0, PCI_MODOFF,
+ &(params->off_duration));
+
+ ret = cpufreq_register_driver(&gx_suspmod_driver);
+ if (ret) {
+ kfree(params);
+ return ret; /* register error! */
+ }
+
+ return 0;
+}
+
+static void __exit cpufreq_gx_exit(void)
+{
+ cpufreq_unregister_driver(&gx_suspmod_driver);
+ pci_dev_put(gx_params->cs55x0);
+ kfree(gx_params);
+}
+
+MODULE_AUTHOR("Hiroshi Miura <miura@da-cha.org>");
+MODULE_DESCRIPTION("Cpufreq driver for Cyrix MediaGX and NatSemi Geode");
+MODULE_LICENSE("GPL");
+
+module_init(cpufreq_gx_init);
+module_exit(cpufreq_gx_exit);
+
--- /dev/null
+/*
+ * (C) 2001-2004 Dave Jones. <davej@redhat.com>
+ * (C) 2002 Padraig Brady. <padraig@antefacto.com>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ * Based upon datasheets & sample CPUs kindly provided by VIA.
+ *
+ * VIA have currently 3 different versions of Longhaul.
+ * Version 1 (Longhaul) uses the BCR2 MSR at 0x1147.
+ * It is present only in Samuel 1 (C5A), Samuel 2 (C5B) stepping 0.
+ * Version 2 of longhaul is backward compatible with v1, but adds
+ * LONGHAUL MSR for purpose of both frequency and voltage scaling.
+ * Present in Samuel 2 (steppings 1-7 only) (C5B), and Ezra (C5C).
+ * Version 3 of longhaul got renamed to Powersaver and redesigned
+ * to use only the POWERSAVER MSR at 0x110a.
+ * It is present in Ezra-T (C5M), Nehemiah (C5X) and above.
+ * It's pretty much the same feature wise to longhaul v2, though
+ * there is provision for scaling FSB too, but this doesn't work
+ * too well in practice so we don't even try to use this.
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/delay.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+#include <linux/acpi.h>
+
+#include <asm/msr.h>
+#include <acpi/processor.h>
+
+#include "longhaul.h"
+
+#define PFX "longhaul: "
+
+#define TYPE_LONGHAUL_V1 1
+#define TYPE_LONGHAUL_V2 2
+#define TYPE_POWERSAVER 3
+
+#define CPU_SAMUEL 1
+#define CPU_SAMUEL2 2
+#define CPU_EZRA 3
+#define CPU_EZRA_T 4
+#define CPU_NEHEMIAH 5
+#define CPU_NEHEMIAH_C 6
+
+/* Flags */
+#define USE_ACPI_C3 (1 << 1)
+#define USE_NORTHBRIDGE (1 << 2)
+
+static int cpu_model;
+static unsigned int numscales = 16;
+static unsigned int fsb;
+
+static const struct mV_pos *vrm_mV_table;
+static const unsigned char *mV_vrm_table;
+
+static unsigned int highest_speed, lowest_speed; /* kHz */
+static unsigned int minmult, maxmult;
+static int can_scale_voltage;
+static struct acpi_processor *pr;
+static struct acpi_processor_cx *cx;
+static u32 acpi_regs_addr;
+static u8 longhaul_flags;
+static unsigned int longhaul_index;
+
+/* Module parameters */
+static int scale_voltage;
+static int disable_acpi_c3;
+static int revid_errata;
+
+
+/* Clock ratios multiplied by 10 */
+static int mults[32];
+static int eblcr[32];
+static int longhaul_version;
+static struct cpufreq_frequency_table *longhaul_table;
+
+static char speedbuffer[8];
+
+static char *print_speed(int speed)
+{
+ if (speed < 1000) {
+ snprintf(speedbuffer, sizeof(speedbuffer), "%dMHz", speed);
+ return speedbuffer;
+ }
+
+ if (speed%1000 == 0)
+ snprintf(speedbuffer, sizeof(speedbuffer),
+ "%dGHz", speed/1000);
+ else
+ snprintf(speedbuffer, sizeof(speedbuffer),
+ "%d.%dGHz", speed/1000, (speed%1000)/100);
+
+ return speedbuffer;
+}
+
+
+static unsigned int calc_speed(int mult)
+{
+ int khz;
+ khz = (mult/10)*fsb;
+ if (mult%10)
+ khz += fsb/2;
+ khz *= 1000;
+ return khz;
+}
+
+
+static int longhaul_get_cpu_mult(void)
+{
+ unsigned long invalue = 0, lo, hi;
+
+ rdmsr(MSR_IA32_EBL_CR_POWERON, lo, hi);
+ invalue = (lo & (1<<22|1<<23|1<<24|1<<25))>>22;
+ if (longhaul_version == TYPE_LONGHAUL_V2 ||
+ longhaul_version == TYPE_POWERSAVER) {
+ if (lo & (1<<27))
+ invalue += 16;
+ }
+ return eblcr[invalue];
+}
+
+/* For processor with BCR2 MSR */
+
+static void do_longhaul1(unsigned int mults_index)
+{
+ union msr_bcr2 bcr2;
+
+ rdmsrl(MSR_VIA_BCR2, bcr2.val);
+ /* Enable software clock multiplier */
+ bcr2.bits.ESOFTBF = 1;
+ bcr2.bits.CLOCKMUL = mults_index & 0xff;
+
+ /* Sync to timer tick */
+ safe_halt();
+ /* Change frequency on next halt or sleep */
+ wrmsrl(MSR_VIA_BCR2, bcr2.val);
+ /* Invoke transition */
+ ACPI_FLUSH_CPU_CACHE();
+ halt();
+
+ /* Disable software clock multiplier */
+ local_irq_disable();
+ rdmsrl(MSR_VIA_BCR2, bcr2.val);
+ bcr2.bits.ESOFTBF = 0;
+ wrmsrl(MSR_VIA_BCR2, bcr2.val);
+}
+
+/* For processor with Longhaul MSR */
+
+static void do_powersaver(int cx_address, unsigned int mults_index,
+ unsigned int dir)
+{
+ union msr_longhaul longhaul;
+ u32 t;
+
+ rdmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ /* Setup new frequency */
+ if (!revid_errata)
+ longhaul.bits.RevisionKey = longhaul.bits.RevisionID;
+ else
+ longhaul.bits.RevisionKey = 0;
+ longhaul.bits.SoftBusRatio = mults_index & 0xf;
+ longhaul.bits.SoftBusRatio4 = (mults_index & 0x10) >> 4;
+ /* Setup new voltage */
+ if (can_scale_voltage)
+ longhaul.bits.SoftVID = (mults_index >> 8) & 0x1f;
+ /* Sync to timer tick */
+ safe_halt();
+ /* Raise voltage if necessary */
+ if (can_scale_voltage && dir) {
+ longhaul.bits.EnableSoftVID = 1;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ /* Change voltage */
+ if (!cx_address) {
+ ACPI_FLUSH_CPU_CACHE();
+ halt();
+ } else {
+ ACPI_FLUSH_CPU_CACHE();
+ /* Invoke C3 */
+ inb(cx_address);
+ /* Dummy op - must do something useless after P_LVL3
+ * read */
+ t = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ }
+ longhaul.bits.EnableSoftVID = 0;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ }
+
+ /* Change frequency on next halt or sleep */
+ longhaul.bits.EnableSoftBusRatio = 1;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ if (!cx_address) {
+ ACPI_FLUSH_CPU_CACHE();
+ halt();
+ } else {
+ ACPI_FLUSH_CPU_CACHE();
+ /* Invoke C3 */
+ inb(cx_address);
+ /* Dummy op - must do something useless after P_LVL3 read */
+ t = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ }
+ /* Disable bus ratio bit */
+ longhaul.bits.EnableSoftBusRatio = 0;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+
+ /* Reduce voltage if necessary */
+ if (can_scale_voltage && !dir) {
+ longhaul.bits.EnableSoftVID = 1;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ /* Change voltage */
+ if (!cx_address) {
+ ACPI_FLUSH_CPU_CACHE();
+ halt();
+ } else {
+ ACPI_FLUSH_CPU_CACHE();
+ /* Invoke C3 */
+ inb(cx_address);
+ /* Dummy op - must do something useless after P_LVL3
+ * read */
+ t = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ }
+ longhaul.bits.EnableSoftVID = 0;
+ wrmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ }
+}
+
+/**
+ * longhaul_set_cpu_frequency()
+ * @mults_index : bitpattern of the new multiplier.
+ *
+ * Sets a new clock ratio.
+ */
+
+static void longhaul_setstate(unsigned int table_index)
+{
+ unsigned int mults_index;
+ int speed, mult;
+ struct cpufreq_freqs freqs;
+ unsigned long flags;
+ unsigned int pic1_mask, pic2_mask;
+ u16 bm_status = 0;
+ u32 bm_timeout = 1000;
+ unsigned int dir = 0;
+
+ mults_index = longhaul_table[table_index].index;
+ /* Safety precautions */
+ mult = mults[mults_index & 0x1f];
+ if (mult == -1)
+ return;
+ speed = calc_speed(mult);
+ if ((speed > highest_speed) || (speed < lowest_speed))
+ return;
+ /* Voltage transition before frequency transition? */
+ if (can_scale_voltage && longhaul_index < table_index)
+ dir = 1;
+
+ freqs.old = calc_speed(longhaul_get_cpu_mult());
+ freqs.new = speed;
+ freqs.cpu = 0; /* longhaul.c is UP only driver */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ pr_debug("Setting to FSB:%dMHz Mult:%d.%dx (%s)\n",
+ fsb, mult/10, mult%10, print_speed(speed/1000));
+retry_loop:
+ preempt_disable();
+ local_irq_save(flags);
+
+ pic2_mask = inb(0xA1);
+ pic1_mask = inb(0x21); /* works on C3. save mask. */
+ outb(0xFF, 0xA1); /* Overkill */
+ outb(0xFE, 0x21); /* TMR0 only */
+
+ /* Wait while PCI bus is busy. */
+ if (acpi_regs_addr && (longhaul_flags & USE_NORTHBRIDGE
+ || ((pr != NULL) && pr->flags.bm_control))) {
+ bm_status = inw(acpi_regs_addr);
+ bm_status &= 1 << 4;
+ while (bm_status && bm_timeout) {
+ outw(1 << 4, acpi_regs_addr);
+ bm_timeout--;
+ bm_status = inw(acpi_regs_addr);
+ bm_status &= 1 << 4;
+ }
+ }
+
+ if (longhaul_flags & USE_NORTHBRIDGE) {
+ /* Disable AGP and PCI arbiters */
+ outb(3, 0x22);
+ } else if ((pr != NULL) && pr->flags.bm_control) {
+ /* Disable bus master arbitration */
+ acpi_write_bit_register(ACPI_BITREG_ARB_DISABLE, 1);
+ }
+ switch (longhaul_version) {
+
+ /*
+ * Longhaul v1. (Samuel[C5A] and Samuel2 stepping 0[C5B])
+ * Software controlled multipliers only.
+ */
+ case TYPE_LONGHAUL_V1:
+ do_longhaul1(mults_index);
+ break;
+
+ /*
+ * Longhaul v2 appears in Samuel2 Steppings 1->7 [C5B] and Ezra [C5C]
+ *
+ * Longhaul v3 (aka Powersaver). (Ezra-T [C5M] & Nehemiah [C5N])
+ * Nehemiah can do FSB scaling too, but this has never been proven
+ * to work in practice.
+ */
+ case TYPE_LONGHAUL_V2:
+ case TYPE_POWERSAVER:
+ if (longhaul_flags & USE_ACPI_C3) {
+ /* Don't allow wakeup */
+ acpi_write_bit_register(ACPI_BITREG_BUS_MASTER_RLD, 0);
+ do_powersaver(cx->address, mults_index, dir);
+ } else {
+ do_powersaver(0, mults_index, dir);
+ }
+ break;
+ }
+
+ if (longhaul_flags & USE_NORTHBRIDGE) {
+ /* Enable arbiters */
+ outb(0, 0x22);
+ } else if ((pr != NULL) && pr->flags.bm_control) {
+ /* Enable bus master arbitration */
+ acpi_write_bit_register(ACPI_BITREG_ARB_DISABLE, 0);
+ }
+ outb(pic2_mask, 0xA1); /* restore mask */
+ outb(pic1_mask, 0x21);
+
+ local_irq_restore(flags);
+ preempt_enable();
+
+ freqs.new = calc_speed(longhaul_get_cpu_mult());
+ /* Check if requested frequency is set. */
+ if (unlikely(freqs.new != speed)) {
+ printk(KERN_INFO PFX "Failed to set requested frequency!\n");
+ /* Revision ID = 1 but processor is expecting revision key
+ * equal to 0. Jumpers at the bottom of processor will change
+ * multiplier and FSB, but will not change bits in Longhaul
+ * MSR nor enable voltage scaling. */
+ if (!revid_errata) {
+ printk(KERN_INFO PFX "Enabling \"Ignore Revision ID\" "
+ "option.\n");
+ revid_errata = 1;
+ msleep(200);
+ goto retry_loop;
+ }
+ /* Why ACPI C3 sometimes doesn't work is a mystery for me.
+ * But it does happen. Processor is entering ACPI C3 state,
+ * but it doesn't change frequency. I tried poking various
+ * bits in northbridge registers, but without success. */
+ if (longhaul_flags & USE_ACPI_C3) {
+ printk(KERN_INFO PFX "Disabling ACPI C3 support.\n");
+ longhaul_flags &= ~USE_ACPI_C3;
+ if (revid_errata) {
+ printk(KERN_INFO PFX "Disabling \"Ignore "
+ "Revision ID\" option.\n");
+ revid_errata = 0;
+ }
+ msleep(200);
+ goto retry_loop;
+ }
+ /* This shouldn't happen. Longhaul ver. 2 was reported not
+ * working on processors without voltage scaling, but with
+ * RevID = 1. RevID errata will make things right. Just
+ * to be 100% sure. */
+ if (longhaul_version == TYPE_LONGHAUL_V2) {
+ printk(KERN_INFO PFX "Switching to Longhaul ver. 1\n");
+ longhaul_version = TYPE_LONGHAUL_V1;
+ msleep(200);
+ goto retry_loop;
+ }
+ }
+ /* Report true CPU frequency */
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+ if (!bm_timeout)
+ printk(KERN_INFO PFX "Warning: Timeout while waiting for "
+ "idle PCI bus.\n");
+}
+
+/*
+ * Centaur decided to make life a little more tricky.
+ * Only longhaul v1 is allowed to read EBLCR BSEL[0:1].
+ * Samuel2 and above have to try and guess what the FSB is.
+ * We do this by assuming we booted at maximum multiplier, and interpolate
+ * between that value multiplied by possible FSBs and cpu_mhz which
+ * was calculated at boot time. Really ugly, but no other way to do this.
+ */
+
+#define ROUNDING 0xf
+
+static int guess_fsb(int mult)
+{
+ int speed = cpu_khz / 1000;
+ int i;
+ int speeds[] = { 666, 1000, 1333, 2000 };
+ int f_max, f_min;
+
+ for (i = 0; i < 4; i++) {
+ f_max = ((speeds[i] * mult) + 50) / 100;
+ f_max += (ROUNDING / 2);
+ f_min = f_max - ROUNDING;
+ if ((speed <= f_max) && (speed >= f_min))
+ return speeds[i] / 10;
+ }
+ return 0;
+}
+
+
+static int __cpuinit longhaul_get_ranges(void)
+{
+ unsigned int i, j, k = 0;
+ unsigned int ratio;
+ int mult;
+
+ /* Get current frequency */
+ mult = longhaul_get_cpu_mult();
+ if (mult == -1) {
+ printk(KERN_INFO PFX "Invalid (reserved) multiplier!\n");
+ return -EINVAL;
+ }
+ fsb = guess_fsb(mult);
+ if (fsb == 0) {
+ printk(KERN_INFO PFX "Invalid (reserved) FSB!\n");
+ return -EINVAL;
+ }
+ /* Get max multiplier - as we always did.
+ * Longhaul MSR is useful only when voltage scaling is enabled.
+ * C3 is booting at max anyway. */
+ maxmult = mult;
+ /* Get min multiplier */
+ switch (cpu_model) {
+ case CPU_NEHEMIAH:
+ minmult = 50;
+ break;
+ case CPU_NEHEMIAH_C:
+ minmult = 40;
+ break;
+ default:
+ minmult = 30;
+ break;
+ }
+
+ pr_debug("MinMult:%d.%dx MaxMult:%d.%dx\n",
+ minmult/10, minmult%10, maxmult/10, maxmult%10);
+
+ highest_speed = calc_speed(maxmult);
+ lowest_speed = calc_speed(minmult);
+ pr_debug("FSB:%dMHz Lowest speed: %s Highest speed:%s\n", fsb,
+ print_speed(lowest_speed/1000),
+ print_speed(highest_speed/1000));
+
+ if (lowest_speed == highest_speed) {
+ printk(KERN_INFO PFX "highestspeed == lowest, aborting.\n");
+ return -EINVAL;
+ }
+ if (lowest_speed > highest_speed) {
+ printk(KERN_INFO PFX "nonsense! lowest (%d > %d) !\n",
+ lowest_speed, highest_speed);
+ return -EINVAL;
+ }
+
+ longhaul_table = kmalloc((numscales + 1) * sizeof(*longhaul_table),
+ GFP_KERNEL);
+ if (!longhaul_table)
+ return -ENOMEM;
+
+ for (j = 0; j < numscales; j++) {
+ ratio = mults[j];
+ if (ratio == -1)
+ continue;
+ if (ratio > maxmult || ratio < minmult)
+ continue;
+ longhaul_table[k].frequency = calc_speed(ratio);
+ longhaul_table[k].index = j;
+ k++;
+ }
+ if (k <= 1) {
+ kfree(longhaul_table);
+ return -ENODEV;
+ }
+ /* Sort */
+ for (j = 0; j < k - 1; j++) {
+ unsigned int min_f, min_i;
+ min_f = longhaul_table[j].frequency;
+ min_i = j;
+ for (i = j + 1; i < k; i++) {
+ if (longhaul_table[i].frequency < min_f) {
+ min_f = longhaul_table[i].frequency;
+ min_i = i;
+ }
+ }
+ if (min_i != j) {
+ swap(longhaul_table[j].frequency,
+ longhaul_table[min_i].frequency);
+ swap(longhaul_table[j].index,
+ longhaul_table[min_i].index);
+ }
+ }
+
+ longhaul_table[k].frequency = CPUFREQ_TABLE_END;
+
+ /* Find index we are running on */
+ for (j = 0; j < k; j++) {
+ if (mults[longhaul_table[j].index & 0x1f] == mult) {
+ longhaul_index = j;
+ break;
+ }
+ }
+ return 0;
+}
+
+
+static void __cpuinit longhaul_setup_voltagescaling(void)
+{
+ union msr_longhaul longhaul;
+ struct mV_pos minvid, maxvid, vid;
+ unsigned int j, speed, pos, kHz_step, numvscales;
+ int min_vid_speed;
+
+ rdmsrl(MSR_VIA_LONGHAUL, longhaul.val);
+ if (!(longhaul.bits.RevisionID & 1)) {
+ printk(KERN_INFO PFX "Voltage scaling not supported by CPU.\n");
+ return;
+ }
+
+ if (!longhaul.bits.VRMRev) {
+ printk(KERN_INFO PFX "VRM 8.5\n");
+ vrm_mV_table = &vrm85_mV[0];
+ mV_vrm_table = &mV_vrm85[0];
+ } else {
+ printk(KERN_INFO PFX "Mobile VRM\n");
+ if (cpu_model < CPU_NEHEMIAH)
+ return;
+ vrm_mV_table = &mobilevrm_mV[0];
+ mV_vrm_table = &mV_mobilevrm[0];
+ }
+
+ minvid = vrm_mV_table[longhaul.bits.MinimumVID];
+ maxvid = vrm_mV_table[longhaul.bits.MaximumVID];
+
+ if (minvid.mV == 0 || maxvid.mV == 0 || minvid.mV > maxvid.mV) {
+ printk(KERN_INFO PFX "Bogus values Min:%d.%03d Max:%d.%03d. "
+ "Voltage scaling disabled.\n",
+ minvid.mV/1000, minvid.mV%1000,
+ maxvid.mV/1000, maxvid.mV%1000);
+ return;
+ }
+
+ if (minvid.mV == maxvid.mV) {
+ printk(KERN_INFO PFX "Claims to support voltage scaling but "
+ "min & max are both %d.%03d. "
+ "Voltage scaling disabled\n",
+ maxvid.mV/1000, maxvid.mV%1000);
+ return;
+ }
+
+ /* How many voltage steps*/
+ numvscales = maxvid.pos - minvid.pos + 1;
+ printk(KERN_INFO PFX
+ "Max VID=%d.%03d "
+ "Min VID=%d.%03d, "
+ "%d possible voltage scales\n",
+ maxvid.mV/1000, maxvid.mV%1000,
+ minvid.mV/1000, minvid.mV%1000,
+ numvscales);
+
+ /* Calculate max frequency at min voltage */
+ j = longhaul.bits.MinMHzBR;
+ if (longhaul.bits.MinMHzBR4)
+ j += 16;
+ min_vid_speed = eblcr[j];
+ if (min_vid_speed == -1)
+ return;
+ switch (longhaul.bits.MinMHzFSB) {
+ case 0:
+ min_vid_speed *= 13333;
+ break;
+ case 1:
+ min_vid_speed *= 10000;
+ break;
+ case 3:
+ min_vid_speed *= 6666;
+ break;
+ default:
+ return;
+ break;
+ }
+ if (min_vid_speed >= highest_speed)
+ return;
+ /* Calculate kHz for one voltage step */
+ kHz_step = (highest_speed - min_vid_speed) / numvscales;
+
+ j = 0;
+ while (longhaul_table[j].frequency != CPUFREQ_TABLE_END) {
+ speed = longhaul_table[j].frequency;
+ if (speed > min_vid_speed)
+ pos = (speed - min_vid_speed) / kHz_step + minvid.pos;
+ else
+ pos = minvid.pos;
+ longhaul_table[j].index |= mV_vrm_table[pos] << 8;
+ vid = vrm_mV_table[mV_vrm_table[pos]];
+ printk(KERN_INFO PFX "f: %d kHz, index: %d, vid: %d mV\n",
+ speed, j, vid.mV);
+ j++;
+ }
+
+ can_scale_voltage = 1;
+ printk(KERN_INFO PFX "Voltage scaling enabled.\n");
+}
+
+
+static int longhaul_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, longhaul_table);
+}
+
+
+static int longhaul_target(struct cpufreq_policy *policy,
+ unsigned int target_freq, unsigned int relation)
+{
+ unsigned int table_index = 0;
+ unsigned int i;
+ unsigned int dir = 0;
+ u8 vid, current_vid;
+
+ if (cpufreq_frequency_table_target(policy, longhaul_table, target_freq,
+ relation, &table_index))
+ return -EINVAL;
+
+ /* Don't set same frequency again */
+ if (longhaul_index == table_index)
+ return 0;
+
+ if (!can_scale_voltage)
+ longhaul_setstate(table_index);
+ else {
+ /* On test system voltage transitions exceeding single
+ * step up or down were turning motherboard off. Both
+ * "ondemand" and "userspace" are unsafe. C7 is doing
+ * this in hardware, C3 is old and we need to do this
+ * in software. */
+ i = longhaul_index;
+ current_vid = (longhaul_table[longhaul_index].index >> 8);
+ current_vid &= 0x1f;
+ if (table_index > longhaul_index)
+ dir = 1;
+ while (i != table_index) {
+ vid = (longhaul_table[i].index >> 8) & 0x1f;
+ if (vid != current_vid) {
+ longhaul_setstate(i);
+ current_vid = vid;
+ msleep(200);
+ }
+ if (dir)
+ i++;
+ else
+ i--;
+ }
+ longhaul_setstate(table_index);
+ }
+ longhaul_index = table_index;
+ return 0;
+}
+
+
+static unsigned int longhaul_get(unsigned int cpu)
+{
+ if (cpu)
+ return 0;
+ return calc_speed(longhaul_get_cpu_mult());
+}
+
+static acpi_status longhaul_walk_callback(acpi_handle obj_handle,
+ u32 nesting_level,
+ void *context, void **return_value)
+{
+ struct acpi_device *d;
+
+ if (acpi_bus_get_device(obj_handle, &d))
+ return 0;
+
+ *return_value = acpi_driver_data(d);
+ return 1;
+}
+
+/* VIA don't support PM2 reg, but have something similar */
+static int enable_arbiter_disable(void)
+{
+ struct pci_dev *dev;
+ int status = 1;
+ int reg;
+ u8 pci_cmd;
+
+ /* Find PLE133 host bridge */
+ reg = 0x78;
+ dev = pci_get_device(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8601_0,
+ NULL);
+ /* Find PM133/VT8605 host bridge */
+ if (dev == NULL)
+ dev = pci_get_device(PCI_VENDOR_ID_VIA,
+ PCI_DEVICE_ID_VIA_8605_0, NULL);
+ /* Find CLE266 host bridge */
+ if (dev == NULL) {
+ reg = 0x76;
+ dev = pci_get_device(PCI_VENDOR_ID_VIA,
+ PCI_DEVICE_ID_VIA_862X_0, NULL);
+ /* Find CN400 V-Link host bridge */
+ if (dev == NULL)
+ dev = pci_get_device(PCI_VENDOR_ID_VIA, 0x7259, NULL);
+ }
+ if (dev != NULL) {
+ /* Enable access to port 0x22 */
+ pci_read_config_byte(dev, reg, &pci_cmd);
+ if (!(pci_cmd & 1<<7)) {
+ pci_cmd |= 1<<7;
+ pci_write_config_byte(dev, reg, pci_cmd);
+ pci_read_config_byte(dev, reg, &pci_cmd);
+ if (!(pci_cmd & 1<<7)) {
+ printk(KERN_ERR PFX
+ "Can't enable access to port 0x22.\n");
+ status = 0;
+ }
+ }
+ pci_dev_put(dev);
+ return status;
+ }
+ return 0;
+}
+
+static int longhaul_setup_southbridge(void)
+{
+ struct pci_dev *dev;
+ u8 pci_cmd;
+
+ /* Find VT8235 southbridge */
+ dev = pci_get_device(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8235, NULL);
+ if (dev == NULL)
+ /* Find VT8237 southbridge */
+ dev = pci_get_device(PCI_VENDOR_ID_VIA,
+ PCI_DEVICE_ID_VIA_8237, NULL);
+ if (dev != NULL) {
+ /* Set transition time to max */
+ pci_read_config_byte(dev, 0xec, &pci_cmd);
+ pci_cmd &= ~(1 << 2);
+ pci_write_config_byte(dev, 0xec, pci_cmd);
+ pci_read_config_byte(dev, 0xe4, &pci_cmd);
+ pci_cmd &= ~(1 << 7);
+ pci_write_config_byte(dev, 0xe4, pci_cmd);
+ pci_read_config_byte(dev, 0xe5, &pci_cmd);
+ pci_cmd |= 1 << 7;
+ pci_write_config_byte(dev, 0xe5, pci_cmd);
+ /* Get address of ACPI registers block*/
+ pci_read_config_byte(dev, 0x81, &pci_cmd);
+ if (pci_cmd & 1 << 7) {
+ pci_read_config_dword(dev, 0x88, &acpi_regs_addr);
+ acpi_regs_addr &= 0xff00;
+ printk(KERN_INFO PFX "ACPI I/O at 0x%x\n",
+ acpi_regs_addr);
+ }
+
+ pci_dev_put(dev);
+ return 1;
+ }
+ return 0;
+}
+
+static int __cpuinit longhaul_cpu_init(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ char *cpuname = NULL;
+ int ret;
+ u32 lo, hi;
+
+ /* Check what we have on this motherboard */
+ switch (c->x86_model) {
+ case 6:
+ cpu_model = CPU_SAMUEL;
+ cpuname = "C3 'Samuel' [C5A]";
+ longhaul_version = TYPE_LONGHAUL_V1;
+ memcpy(mults, samuel1_mults, sizeof(samuel1_mults));
+ memcpy(eblcr, samuel1_eblcr, sizeof(samuel1_eblcr));
+ break;
+
+ case 7:
+ switch (c->x86_mask) {
+ case 0:
+ longhaul_version = TYPE_LONGHAUL_V1;
+ cpu_model = CPU_SAMUEL2;
+ cpuname = "C3 'Samuel 2' [C5B]";
+ /* Note, this is not a typo, early Samuel2's had
+ * Samuel1 ratios. */
+ memcpy(mults, samuel1_mults, sizeof(samuel1_mults));
+ memcpy(eblcr, samuel2_eblcr, sizeof(samuel2_eblcr));
+ break;
+ case 1 ... 15:
+ longhaul_version = TYPE_LONGHAUL_V2;
+ if (c->x86_mask < 8) {
+ cpu_model = CPU_SAMUEL2;
+ cpuname = "C3 'Samuel 2' [C5B]";
+ } else {
+ cpu_model = CPU_EZRA;
+ cpuname = "C3 'Ezra' [C5C]";
+ }
+ memcpy(mults, ezra_mults, sizeof(ezra_mults));
+ memcpy(eblcr, ezra_eblcr, sizeof(ezra_eblcr));
+ break;
+ }
+ break;
+
+ case 8:
+ cpu_model = CPU_EZRA_T;
+ cpuname = "C3 'Ezra-T' [C5M]";
+ longhaul_version = TYPE_POWERSAVER;
+ numscales = 32;
+ memcpy(mults, ezrat_mults, sizeof(ezrat_mults));
+ memcpy(eblcr, ezrat_eblcr, sizeof(ezrat_eblcr));
+ break;
+
+ case 9:
+ longhaul_version = TYPE_POWERSAVER;
+ numscales = 32;
+ memcpy(mults, nehemiah_mults, sizeof(nehemiah_mults));
+ memcpy(eblcr, nehemiah_eblcr, sizeof(nehemiah_eblcr));
+ switch (c->x86_mask) {
+ case 0 ... 1:
+ cpu_model = CPU_NEHEMIAH;
+ cpuname = "C3 'Nehemiah A' [C5XLOE]";
+ break;
+ case 2 ... 4:
+ cpu_model = CPU_NEHEMIAH;
+ cpuname = "C3 'Nehemiah B' [C5XLOH]";
+ break;
+ case 5 ... 15:
+ cpu_model = CPU_NEHEMIAH_C;
+ cpuname = "C3 'Nehemiah C' [C5P]";
+ break;
+ }
+ break;
+
+ default:
+ cpuname = "Unknown";
+ break;
+ }
+ /* Check Longhaul ver. 2 */
+ if (longhaul_version == TYPE_LONGHAUL_V2) {
+ rdmsr(MSR_VIA_LONGHAUL, lo, hi);
+ if (lo == 0 && hi == 0)
+ /* Looks like MSR isn't present */
+ longhaul_version = TYPE_LONGHAUL_V1;
+ }
+
+ printk(KERN_INFO PFX "VIA %s CPU detected. ", cpuname);
+ switch (longhaul_version) {
+ case TYPE_LONGHAUL_V1:
+ case TYPE_LONGHAUL_V2:
+ printk(KERN_CONT "Longhaul v%d supported.\n", longhaul_version);
+ break;
+ case TYPE_POWERSAVER:
+ printk(KERN_CONT "Powersaver supported.\n");
+ break;
+ };
+
+ /* Doesn't hurt */
+ longhaul_setup_southbridge();
+
+ /* Find ACPI data for processor */
+ acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+ ACPI_UINT32_MAX, &longhaul_walk_callback, NULL,
+ NULL, (void *)&pr);
+
+ /* Check ACPI support for C3 state */
+ if (pr != NULL && longhaul_version == TYPE_POWERSAVER) {
+ cx = &pr->power.states[ACPI_STATE_C3];
+ if (cx->address > 0 && cx->latency <= 1000)
+ longhaul_flags |= USE_ACPI_C3;
+ }
+ /* Disable if it isn't working */
+ if (disable_acpi_c3)
+ longhaul_flags &= ~USE_ACPI_C3;
+ /* Check if northbridge is friendly */
+ if (enable_arbiter_disable())
+ longhaul_flags |= USE_NORTHBRIDGE;
+
+ /* Check ACPI support for bus master arbiter disable */
+ if (!(longhaul_flags & USE_ACPI_C3
+ || longhaul_flags & USE_NORTHBRIDGE)
+ && ((pr == NULL) || !(pr->flags.bm_control))) {
+ printk(KERN_ERR PFX
+ "No ACPI support. Unsupported northbridge.\n");
+ return -ENODEV;
+ }
+
+ if (longhaul_flags & USE_NORTHBRIDGE)
+ printk(KERN_INFO PFX "Using northbridge support.\n");
+ if (longhaul_flags & USE_ACPI_C3)
+ printk(KERN_INFO PFX "Using ACPI support.\n");
+
+ ret = longhaul_get_ranges();
+ if (ret != 0)
+ return ret;
+
+ if ((longhaul_version != TYPE_LONGHAUL_V1) && (scale_voltage != 0))
+ longhaul_setup_voltagescaling();
+
+ policy->cpuinfo.transition_latency = 200000; /* nsec */
+ policy->cur = calc_speed(longhaul_get_cpu_mult());
+
+ ret = cpufreq_frequency_table_cpuinfo(policy, longhaul_table);
+ if (ret)
+ return ret;
+
+ cpufreq_frequency_table_get_attr(longhaul_table, policy->cpu);
+
+ return 0;
+}
+
+static int __devexit longhaul_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+static struct freq_attr *longhaul_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver longhaul_driver = {
+ .verify = longhaul_verify,
+ .target = longhaul_target,
+ .get = longhaul_get,
+ .init = longhaul_cpu_init,
+ .exit = __devexit_p(longhaul_cpu_exit),
+ .name = "longhaul",
+ .owner = THIS_MODULE,
+ .attr = longhaul_attr,
+};
+
+
+static int __init longhaul_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if (c->x86_vendor != X86_VENDOR_CENTAUR || c->x86 != 6)
+ return -ENODEV;
+
+#ifdef CONFIG_SMP
+ if (num_online_cpus() > 1) {
+ printk(KERN_ERR PFX "More than 1 CPU detected, "
+ "longhaul disabled.\n");
+ return -ENODEV;
+ }
+#endif
+#ifdef CONFIG_X86_IO_APIC
+ if (cpu_has_apic) {
+ printk(KERN_ERR PFX "APIC detected. Longhaul is currently "
+ "broken in this configuration.\n");
+ return -ENODEV;
+ }
+#endif
+ switch (c->x86_model) {
+ case 6 ... 9:
+ return cpufreq_register_driver(&longhaul_driver);
+ case 10:
+ printk(KERN_ERR PFX "Use acpi-cpufreq driver for VIA C7\n");
+ default:
+ ;
+ }
+
+ return -ENODEV;
+}
+
+
+static void __exit longhaul_exit(void)
+{
+ int i;
+
+ for (i = 0; i < numscales; i++) {
+ if (mults[i] == maxmult) {
+ longhaul_setstate(i);
+ break;
+ }
+ }
+
+ cpufreq_unregister_driver(&longhaul_driver);
+ kfree(longhaul_table);
+}
+
+/* Even if BIOS is exporting ACPI C3 state, and it is used
+ * with success when CPU is idle, this state doesn't
+ * trigger frequency transition in some cases. */
+module_param(disable_acpi_c3, int, 0644);
+MODULE_PARM_DESC(disable_acpi_c3, "Don't use ACPI C3 support");
+/* Change CPU voltage with frequency. Very useful to save
+ * power, but most VIA C3 processors aren't supporting it. */
+module_param(scale_voltage, int, 0644);
+MODULE_PARM_DESC(scale_voltage, "Scale voltage of processor");
+/* Force revision key to 0 for processors which doesn't
+ * support voltage scaling, but are introducing itself as
+ * such. */
+module_param(revid_errata, int, 0644);
+MODULE_PARM_DESC(revid_errata, "Ignore CPU Revision ID");
+
+MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
+MODULE_DESCRIPTION("Longhaul driver for VIA Cyrix processors.");
+MODULE_LICENSE("GPL");
+
+late_initcall(longhaul_init);
+module_exit(longhaul_exit);
--- /dev/null
+/*
+ * longhaul.h
+ * (C) 2003 Dave Jones.
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * VIA-specific information
+ */
+
+union msr_bcr2 {
+ struct {
+ unsigned Reseved:19, // 18:0
+ ESOFTBF:1, // 19
+ Reserved2:3, // 22:20
+ CLOCKMUL:4, // 26:23
+ Reserved3:5; // 31:27
+ } bits;
+ unsigned long val;
+};
+
+union msr_longhaul {
+ struct {
+ unsigned RevisionID:4, // 3:0
+ RevisionKey:4, // 7:4
+ EnableSoftBusRatio:1, // 8
+ EnableSoftVID:1, // 9
+ EnableSoftBSEL:1, // 10
+ Reserved:3, // 11:13
+ SoftBusRatio4:1, // 14
+ VRMRev:1, // 15
+ SoftBusRatio:4, // 19:16
+ SoftVID:5, // 24:20
+ Reserved2:3, // 27:25
+ SoftBSEL:2, // 29:28
+ Reserved3:2, // 31:30
+ MaxMHzBR:4, // 35:32
+ MaximumVID:5, // 40:36
+ MaxMHzFSB:2, // 42:41
+ MaxMHzBR4:1, // 43
+ Reserved4:4, // 47:44
+ MinMHzBR:4, // 51:48
+ MinimumVID:5, // 56:52
+ MinMHzFSB:2, // 58:57
+ MinMHzBR4:1, // 59
+ Reserved5:4; // 63:60
+ } bits;
+ unsigned long long val;
+};
+
+/*
+ * Clock ratio tables. Div/Mod by 10 to get ratio.
+ * The eblcr values specify the ratio read from the CPU.
+ * The mults values specify what to write to the CPU.
+ */
+
+/*
+ * VIA C3 Samuel 1 & Samuel 2 (stepping 0)
+ */
+static const int __cpuinitdata samuel1_mults[16] = {
+ -1, /* 0000 -> RESERVED */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ -1, /* 0011 -> RESERVED */
+ -1, /* 0100 -> RESERVED */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 55, /* 0111 -> 5.5x */
+ 60, /* 1000 -> 6.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 50, /* 1011 -> 5.0x */
+ 65, /* 1100 -> 6.5x */
+ 75, /* 1101 -> 7.5x */
+ -1, /* 1110 -> RESERVED */
+ -1, /* 1111 -> RESERVED */
+};
+
+static const int __cpuinitdata samuel1_eblcr[16] = {
+ 50, /* 0000 -> RESERVED */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ -1, /* 0011 -> RESERVED */
+ 55, /* 0100 -> 5.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ -1, /* 0111 -> RESERVED */
+ -1, /* 1000 -> RESERVED */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 60, /* 1011 -> 6.0x */
+ -1, /* 1100 -> RESERVED */
+ 75, /* 1101 -> 7.5x */
+ -1, /* 1110 -> RESERVED */
+ 65, /* 1111 -> 6.5x */
+};
+
+/*
+ * VIA C3 Samuel2 Stepping 1->15
+ */
+static const int __cpuinitdata samuel2_eblcr[16] = {
+ 50, /* 0000 -> 5.0x */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ 100, /* 0011 -> 10.0x */
+ 55, /* 0100 -> 5.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 110, /* 0111 -> 11.0x */
+ 90, /* 1000 -> 9.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 60, /* 1011 -> 6.0x */
+ 120, /* 1100 -> 12.0x */
+ 75, /* 1101 -> 7.5x */
+ 130, /* 1110 -> 13.0x */
+ 65, /* 1111 -> 6.5x */
+};
+
+/*
+ * VIA C3 Ezra
+ */
+static const int __cpuinitdata ezra_mults[16] = {
+ 100, /* 0000 -> 10.0x */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ 90, /* 0011 -> 9.0x */
+ 95, /* 0100 -> 9.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 55, /* 0111 -> 5.5x */
+ 60, /* 1000 -> 6.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 50, /* 1011 -> 5.0x */
+ 65, /* 1100 -> 6.5x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 120, /* 1111 -> 12.0x */
+};
+
+static const int __cpuinitdata ezra_eblcr[16] = {
+ 50, /* 0000 -> 5.0x */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ 100, /* 0011 -> 10.0x */
+ 55, /* 0100 -> 5.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 95, /* 0111 -> 9.5x */
+ 90, /* 1000 -> 9.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 60, /* 1011 -> 6.0x */
+ 120, /* 1100 -> 12.0x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 65, /* 1111 -> 6.5x */
+};
+
+/*
+ * VIA C3 (Ezra-T) [C5M].
+ */
+static const int __cpuinitdata ezrat_mults[32] = {
+ 100, /* 0000 -> 10.0x */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ 90, /* 0011 -> 9.0x */
+ 95, /* 0100 -> 9.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 55, /* 0111 -> 5.5x */
+ 60, /* 1000 -> 6.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 50, /* 1011 -> 5.0x */
+ 65, /* 1100 -> 6.5x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 120, /* 1111 -> 12.0x */
+
+ -1, /* 0000 -> RESERVED (10.0x) */
+ 110, /* 0001 -> 11.0x */
+ -1, /* 0010 -> 12.0x */
+ -1, /* 0011 -> RESERVED (9.0x)*/
+ 105, /* 0100 -> 10.5x */
+ 115, /* 0101 -> 11.5x */
+ 125, /* 0110 -> 12.5x */
+ 135, /* 0111 -> 13.5x */
+ 140, /* 1000 -> 14.0x */
+ 150, /* 1001 -> 15.0x */
+ 160, /* 1010 -> 16.0x */
+ 130, /* 1011 -> 13.0x */
+ 145, /* 1100 -> 14.5x */
+ 155, /* 1101 -> 15.5x */
+ -1, /* 1110 -> RESERVED (13.0x) */
+ -1, /* 1111 -> RESERVED (12.0x) */
+};
+
+static const int __cpuinitdata ezrat_eblcr[32] = {
+ 50, /* 0000 -> 5.0x */
+ 30, /* 0001 -> 3.0x */
+ 40, /* 0010 -> 4.0x */
+ 100, /* 0011 -> 10.0x */
+ 55, /* 0100 -> 5.5x */
+ 35, /* 0101 -> 3.5x */
+ 45, /* 0110 -> 4.5x */
+ 95, /* 0111 -> 9.5x */
+ 90, /* 1000 -> 9.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 60, /* 1011 -> 6.0x */
+ 120, /* 1100 -> 12.0x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 65, /* 1111 -> 6.5x */
+
+ -1, /* 0000 -> RESERVED (9.0x) */
+ 110, /* 0001 -> 11.0x */
+ 120, /* 0010 -> 12.0x */
+ -1, /* 0011 -> RESERVED (10.0x)*/
+ 135, /* 0100 -> 13.5x */
+ 115, /* 0101 -> 11.5x */
+ 125, /* 0110 -> 12.5x */
+ 105, /* 0111 -> 10.5x */
+ 130, /* 1000 -> 13.0x */
+ 150, /* 1001 -> 15.0x */
+ 160, /* 1010 -> 16.0x */
+ 140, /* 1011 -> 14.0x */
+ -1, /* 1100 -> RESERVED (12.0x) */
+ 155, /* 1101 -> 15.5x */
+ -1, /* 1110 -> RESERVED (13.0x) */
+ 145, /* 1111 -> 14.5x */
+};
+
+/*
+ * VIA C3 Nehemiah */
+
+static const int __cpuinitdata nehemiah_mults[32] = {
+ 100, /* 0000 -> 10.0x */
+ -1, /* 0001 -> 16.0x */
+ 40, /* 0010 -> 4.0x */
+ 90, /* 0011 -> 9.0x */
+ 95, /* 0100 -> 9.5x */
+ -1, /* 0101 -> RESERVED */
+ 45, /* 0110 -> 4.5x */
+ 55, /* 0111 -> 5.5x */
+ 60, /* 1000 -> 6.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 50, /* 1011 -> 5.0x */
+ 65, /* 1100 -> 6.5x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 120, /* 1111 -> 12.0x */
+ -1, /* 0000 -> 10.0x */
+ 110, /* 0001 -> 11.0x */
+ -1, /* 0010 -> 12.0x */
+ -1, /* 0011 -> 9.0x */
+ 105, /* 0100 -> 10.5x */
+ 115, /* 0101 -> 11.5x */
+ 125, /* 0110 -> 12.5x */
+ 135, /* 0111 -> 13.5x */
+ 140, /* 1000 -> 14.0x */
+ 150, /* 1001 -> 15.0x */
+ 160, /* 1010 -> 16.0x */
+ 130, /* 1011 -> 13.0x */
+ 145, /* 1100 -> 14.5x */
+ 155, /* 1101 -> 15.5x */
+ -1, /* 1110 -> RESERVED (13.0x) */
+ -1, /* 1111 -> 12.0x */
+};
+
+static const int __cpuinitdata nehemiah_eblcr[32] = {
+ 50, /* 0000 -> 5.0x */
+ 160, /* 0001 -> 16.0x */
+ 40, /* 0010 -> 4.0x */
+ 100, /* 0011 -> 10.0x */
+ 55, /* 0100 -> 5.5x */
+ -1, /* 0101 -> RESERVED */
+ 45, /* 0110 -> 4.5x */
+ 95, /* 0111 -> 9.5x */
+ 90, /* 1000 -> 9.0x */
+ 70, /* 1001 -> 7.0x */
+ 80, /* 1010 -> 8.0x */
+ 60, /* 1011 -> 6.0x */
+ 120, /* 1100 -> 12.0x */
+ 75, /* 1101 -> 7.5x */
+ 85, /* 1110 -> 8.5x */
+ 65, /* 1111 -> 6.5x */
+ 90, /* 0000 -> 9.0x */
+ 110, /* 0001 -> 11.0x */
+ 120, /* 0010 -> 12.0x */
+ 100, /* 0011 -> 10.0x */
+ 135, /* 0100 -> 13.5x */
+ 115, /* 0101 -> 11.5x */
+ 125, /* 0110 -> 12.5x */
+ 105, /* 0111 -> 10.5x */
+ 130, /* 1000 -> 13.0x */
+ 150, /* 1001 -> 15.0x */
+ 160, /* 1010 -> 16.0x */
+ 140, /* 1011 -> 14.0x */
+ 120, /* 1100 -> 12.0x */
+ 155, /* 1101 -> 15.5x */
+ -1, /* 1110 -> RESERVED (13.0x) */
+ 145 /* 1111 -> 14.5x */
+};
+
+/*
+ * Voltage scales. Div/Mod by 1000 to get actual voltage.
+ * Which scale to use depends on the VRM type in use.
+ */
+
+struct mV_pos {
+ unsigned short mV;
+ unsigned short pos;
+};
+
+static const struct mV_pos __cpuinitdata vrm85_mV[32] = {
+ {1250, 8}, {1200, 6}, {1150, 4}, {1100, 2},
+ {1050, 0}, {1800, 30}, {1750, 28}, {1700, 26},
+ {1650, 24}, {1600, 22}, {1550, 20}, {1500, 18},
+ {1450, 16}, {1400, 14}, {1350, 12}, {1300, 10},
+ {1275, 9}, {1225, 7}, {1175, 5}, {1125, 3},
+ {1075, 1}, {1825, 31}, {1775, 29}, {1725, 27},
+ {1675, 25}, {1625, 23}, {1575, 21}, {1525, 19},
+ {1475, 17}, {1425, 15}, {1375, 13}, {1325, 11}
+};
+
+static const unsigned char __cpuinitdata mV_vrm85[32] = {
+ 0x04, 0x14, 0x03, 0x13, 0x02, 0x12, 0x01, 0x11,
+ 0x00, 0x10, 0x0f, 0x1f, 0x0e, 0x1e, 0x0d, 0x1d,
+ 0x0c, 0x1c, 0x0b, 0x1b, 0x0a, 0x1a, 0x09, 0x19,
+ 0x08, 0x18, 0x07, 0x17, 0x06, 0x16, 0x05, 0x15
+};
+
+static const struct mV_pos __cpuinitdata mobilevrm_mV[32] = {
+ {1750, 31}, {1700, 30}, {1650, 29}, {1600, 28},
+ {1550, 27}, {1500, 26}, {1450, 25}, {1400, 24},
+ {1350, 23}, {1300, 22}, {1250, 21}, {1200, 20},
+ {1150, 19}, {1100, 18}, {1050, 17}, {1000, 16},
+ {975, 15}, {950, 14}, {925, 13}, {900, 12},
+ {875, 11}, {850, 10}, {825, 9}, {800, 8},
+ {775, 7}, {750, 6}, {725, 5}, {700, 4},
+ {675, 3}, {650, 2}, {625, 1}, {600, 0}
+};
+
+static const unsigned char __cpuinitdata mV_mobilevrm[32] = {
+ 0x1f, 0x1e, 0x1d, 0x1c, 0x1b, 0x1a, 0x19, 0x18,
+ 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
+ 0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08,
+ 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00
+};
+
--- /dev/null
+/*
+ * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/timex.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+
+static struct cpufreq_driver longrun_driver;
+
+/**
+ * longrun_{low,high}_freq is needed for the conversion of cpufreq kHz
+ * values into per cent values. In TMTA microcode, the following is valid:
+ * performance_pctg = (current_freq - low_freq)/(high_freq - low_freq)
+ */
+static unsigned int longrun_low_freq, longrun_high_freq;
+
+
+/**
+ * longrun_get_policy - get the current LongRun policy
+ * @policy: struct cpufreq_policy where current policy is written into
+ *
+ * Reads the current LongRun policy by access to MSR_TMTA_LONGRUN_FLAGS
+ * and MSR_TMTA_LONGRUN_CTRL
+ */
+static void __cpuinit longrun_get_policy(struct cpufreq_policy *policy)
+{
+ u32 msr_lo, msr_hi;
+
+ rdmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
+ pr_debug("longrun flags are %x - %x\n", msr_lo, msr_hi);
+ if (msr_lo & 0x01)
+ policy->policy = CPUFREQ_POLICY_PERFORMANCE;
+ else
+ policy->policy = CPUFREQ_POLICY_POWERSAVE;
+
+ rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
+ pr_debug("longrun ctrl is %x - %x\n", msr_lo, msr_hi);
+ msr_lo &= 0x0000007F;
+ msr_hi &= 0x0000007F;
+
+ if (longrun_high_freq <= longrun_low_freq) {
+ /* Assume degenerate Longrun table */
+ policy->min = policy->max = longrun_high_freq;
+ } else {
+ policy->min = longrun_low_freq + msr_lo *
+ ((longrun_high_freq - longrun_low_freq) / 100);
+ policy->max = longrun_low_freq + msr_hi *
+ ((longrun_high_freq - longrun_low_freq) / 100);
+ }
+ policy->cpu = 0;
+}
+
+
+/**
+ * longrun_set_policy - sets a new CPUFreq policy
+ * @policy: new policy
+ *
+ * Sets a new CPUFreq policy on LongRun-capable processors. This function
+ * has to be called with cpufreq_driver locked.
+ */
+static int longrun_set_policy(struct cpufreq_policy *policy)
+{
+ u32 msr_lo, msr_hi;
+ u32 pctg_lo, pctg_hi;
+
+ if (!policy)
+ return -EINVAL;
+
+ if (longrun_high_freq <= longrun_low_freq) {
+ /* Assume degenerate Longrun table */
+ pctg_lo = pctg_hi = 100;
+ } else {
+ pctg_lo = (policy->min - longrun_low_freq) /
+ ((longrun_high_freq - longrun_low_freq) / 100);
+ pctg_hi = (policy->max - longrun_low_freq) /
+ ((longrun_high_freq - longrun_low_freq) / 100);
+ }
+
+ if (pctg_hi > 100)
+ pctg_hi = 100;
+ if (pctg_lo > pctg_hi)
+ pctg_lo = pctg_hi;
+
+ /* performance or economy mode */
+ rdmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
+ msr_lo &= 0xFFFFFFFE;
+ switch (policy->policy) {
+ case CPUFREQ_POLICY_PERFORMANCE:
+ msr_lo |= 0x00000001;
+ break;
+ case CPUFREQ_POLICY_POWERSAVE:
+ break;
+ }
+ wrmsr(MSR_TMTA_LONGRUN_FLAGS, msr_lo, msr_hi);
+
+ /* lower and upper boundary */
+ rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
+ msr_lo &= 0xFFFFFF80;
+ msr_hi &= 0xFFFFFF80;
+ msr_lo |= pctg_lo;
+ msr_hi |= pctg_hi;
+ wrmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
+
+ return 0;
+}
+
+
+/**
+ * longrun_verify_poliy - verifies a new CPUFreq policy
+ * @policy: the policy to verify
+ *
+ * Validates a new CPUFreq policy. This function has to be called with
+ * cpufreq_driver locked.
+ */
+static int longrun_verify_policy(struct cpufreq_policy *policy)
+{
+ if (!policy)
+ return -EINVAL;
+
+ policy->cpu = 0;
+ cpufreq_verify_within_limits(policy,
+ policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);
+
+ if ((policy->policy != CPUFREQ_POLICY_POWERSAVE) &&
+ (policy->policy != CPUFREQ_POLICY_PERFORMANCE))
+ return -EINVAL;
+
+ return 0;
+}
+
+static unsigned int longrun_get(unsigned int cpu)
+{
+ u32 eax, ebx, ecx, edx;
+
+ if (cpu)
+ return 0;
+
+ cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
+ pr_debug("cpuid eax is %u\n", eax);
+
+ return eax * 1000;
+}
+
+/**
+ * longrun_determine_freqs - determines the lowest and highest possible core frequency
+ * @low_freq: an int to put the lowest frequency into
+ * @high_freq: an int to put the highest frequency into
+ *
+ * Determines the lowest and highest possible core frequencies on this CPU.
+ * This is necessary to calculate the performance percentage according to
+ * TMTA rules:
+ * performance_pctg = (target_freq - low_freq)/(high_freq - low_freq)
+ */
+static int __cpuinit longrun_determine_freqs(unsigned int *low_freq,
+ unsigned int *high_freq)
+{
+ u32 msr_lo, msr_hi;
+ u32 save_lo, save_hi;
+ u32 eax, ebx, ecx, edx;
+ u32 try_hi;
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if (!low_freq || !high_freq)
+ return -EINVAL;
+
+ if (cpu_has(c, X86_FEATURE_LRTI)) {
+ /* if the LongRun Table Interface is present, the
+ * detection is a bit easier:
+ * For minimum frequency, read out the maximum
+ * level (msr_hi), write that into "currently
+ * selected level", and read out the frequency.
+ * For maximum frequency, read out level zero.
+ */
+ /* minimum */
+ rdmsr(MSR_TMTA_LRTI_READOUT, msr_lo, msr_hi);
+ wrmsr(MSR_TMTA_LRTI_READOUT, msr_hi, msr_hi);
+ rdmsr(MSR_TMTA_LRTI_VOLT_MHZ, msr_lo, msr_hi);
+ *low_freq = msr_lo * 1000; /* to kHz */
+
+ /* maximum */
+ wrmsr(MSR_TMTA_LRTI_READOUT, 0, msr_hi);
+ rdmsr(MSR_TMTA_LRTI_VOLT_MHZ, msr_lo, msr_hi);
+ *high_freq = msr_lo * 1000; /* to kHz */
+
+ pr_debug("longrun table interface told %u - %u kHz\n",
+ *low_freq, *high_freq);
+
+ if (*low_freq > *high_freq)
+ *low_freq = *high_freq;
+ return 0;
+ }
+
+ /* set the upper border to the value determined during TSC init */
+ *high_freq = (cpu_khz / 1000);
+ *high_freq = *high_freq * 1000;
+ pr_debug("high frequency is %u kHz\n", *high_freq);
+
+ /* get current borders */
+ rdmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
+ save_lo = msr_lo & 0x0000007F;
+ save_hi = msr_hi & 0x0000007F;
+
+ /* if current perf_pctg is larger than 90%, we need to decrease the
+ * upper limit to make the calculation more accurate.
+ */
+ cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
+ /* try decreasing in 10% steps, some processors react only
+ * on some barrier values */
+ for (try_hi = 80; try_hi > 0 && ecx > 90; try_hi -= 10) {
+ /* set to 0 to try_hi perf_pctg */
+ msr_lo &= 0xFFFFFF80;
+ msr_hi &= 0xFFFFFF80;
+ msr_hi |= try_hi;
+ wrmsr(MSR_TMTA_LONGRUN_CTRL, msr_lo, msr_hi);
+
+ /* read out current core MHz and current perf_pctg */
+ cpuid(0x80860007, &eax, &ebx, &ecx, &edx);
+
+ /* restore values */
+ wrmsr(MSR_TMTA_LONGRUN_CTRL, save_lo, save_hi);
+ }
+ pr_debug("percentage is %u %%, freq is %u MHz\n", ecx, eax);
+
+ /* performance_pctg = (current_freq - low_freq)/(high_freq - low_freq)
+ * eqals
+ * low_freq * (1 - perf_pctg) = (cur_freq - high_freq * perf_pctg)
+ *
+ * high_freq * perf_pctg is stored tempoarily into "ebx".
+ */
+ ebx = (((cpu_khz / 1000) * ecx) / 100); /* to MHz */
+
+ if ((ecx > 95) || (ecx == 0) || (eax < ebx))
+ return -EIO;
+
+ edx = ((eax - ebx) * 100) / (100 - ecx);
+ *low_freq = edx * 1000; /* back to kHz */
+
+ pr_debug("low frequency is %u kHz\n", *low_freq);
+
+ if (*low_freq > *high_freq)
+ *low_freq = *high_freq;
+
+ return 0;
+}
+
+
+static int __cpuinit longrun_cpu_init(struct cpufreq_policy *policy)
+{
+ int result = 0;
+
+ /* capability check */
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ /* detect low and high frequency */
+ result = longrun_determine_freqs(&longrun_low_freq, &longrun_high_freq);
+ if (result)
+ return result;
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.min_freq = longrun_low_freq;
+ policy->cpuinfo.max_freq = longrun_high_freq;
+ policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
+ longrun_get_policy(policy);
+
+ return 0;
+}
+
+
+static struct cpufreq_driver longrun_driver = {
+ .flags = CPUFREQ_CONST_LOOPS,
+ .verify = longrun_verify_policy,
+ .setpolicy = longrun_set_policy,
+ .get = longrun_get,
+ .init = longrun_cpu_init,
+ .name = "longrun",
+ .owner = THIS_MODULE,
+};
+
+
+/**
+ * longrun_init - initializes the Transmeta Crusoe LongRun CPUFreq driver
+ *
+ * Initializes the LongRun support.
+ */
+static int __init longrun_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if (c->x86_vendor != X86_VENDOR_TRANSMETA ||
+ !cpu_has(c, X86_FEATURE_LONGRUN))
+ return -ENODEV;
+
+ return cpufreq_register_driver(&longrun_driver);
+}
+
+
+/**
+ * longrun_exit - unregisters LongRun support
+ */
+static void __exit longrun_exit(void)
+{
+ cpufreq_unregister_driver(&longrun_driver);
+}
+
+
+MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
+MODULE_DESCRIPTION("LongRun driver for Transmeta Crusoe and "
+ "Efficeon processors.");
+MODULE_LICENSE("GPL");
+
+module_init(longrun_init);
+module_exit(longrun_exit);
--- /dev/null
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/slab.h>
+
+#include "mperf.h"
+
+static DEFINE_PER_CPU(struct aperfmperf, acfreq_old_perf);
+
+/* Called via smp_call_function_single(), on the target CPU */
+static void read_measured_perf_ctrs(void *_cur)
+{
+ struct aperfmperf *am = _cur;
+
+ get_aperfmperf(am);
+}
+
+/*
+ * Return the measured active (C0) frequency on this CPU since last call
+ * to this function.
+ * Input: cpu number
+ * Return: Average CPU frequency in terms of max frequency (zero on error)
+ *
+ * We use IA32_MPERF and IA32_APERF MSRs to get the measured performance
+ * over a period of time, while CPU is in C0 state.
+ * IA32_MPERF counts at the rate of max advertised frequency
+ * IA32_APERF counts at the rate of actual CPU frequency
+ * Only IA32_APERF/IA32_MPERF ratio is architecturally defined and
+ * no meaning should be associated with absolute values of these MSRs.
+ */
+unsigned int cpufreq_get_measured_perf(struct cpufreq_policy *policy,
+ unsigned int cpu)
+{
+ struct aperfmperf perf;
+ unsigned long ratio;
+ unsigned int retval;
+
+ if (smp_call_function_single(cpu, read_measured_perf_ctrs, &perf, 1))
+ return 0;
+
+ ratio = calc_aperfmperf_ratio(&per_cpu(acfreq_old_perf, cpu), &perf);
+ per_cpu(acfreq_old_perf, cpu) = perf;
+
+ retval = (policy->cpuinfo.max_freq * ratio) >> APERFMPERF_SHIFT;
+
+ return retval;
+}
+EXPORT_SYMBOL_GPL(cpufreq_get_measured_perf);
+MODULE_LICENSE("GPL");
--- /dev/null
+/*
+ * (c) 2010 Advanced Micro Devices, Inc.
+ * Your use of this code is subject to the terms and conditions of the
+ * GNU general public license version 2. See "COPYING" or
+ * http://www.gnu.org/licenses/gpl.html
+ */
+
+unsigned int cpufreq_get_measured_perf(struct cpufreq_policy *policy,
+ unsigned int cpu);
--- /dev/null
+/*
+ * Pentium 4/Xeon CPU on demand clock modulation/speed scaling
+ * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
+ * (C) 2002 Zwane Mwaikambo <zwane@commfireservices.com>
+ * (C) 2002 Arjan van de Ven <arjanv@redhat.com>
+ * (C) 2002 Tora T. Engstad
+ * All Rights Reserved
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * The author(s) of this software shall not be held liable for damages
+ * of any nature resulting due to the use of this software. This
+ * software is provided AS-IS with no warranties.
+ *
+ * Date Errata Description
+ * 20020525 N44, O17 12.5% or 25% DC causes lockup
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
+#include <linux/timex.h>
+
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/timer.h>
+
+#include "speedstep-lib.h"
+
+#define PFX "p4-clockmod: "
+
+/*
+ * Duty Cycle (3bits), note DC_DISABLE is not specified in
+ * intel docs i just use it to mean disable
+ */
+enum {
+ DC_RESV, DC_DFLT, DC_25PT, DC_38PT, DC_50PT,
+ DC_64PT, DC_75PT, DC_88PT, DC_DISABLE
+};
+
+#define DC_ENTRIES 8
+
+
+static int has_N44_O17_errata[NR_CPUS];
+static unsigned int stock_freq;
+static struct cpufreq_driver p4clockmod_driver;
+static unsigned int cpufreq_p4_get(unsigned int cpu);
+
+static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
+{
+ u32 l, h;
+
+ if (!cpu_online(cpu) ||
+ (newstate > DC_DISABLE) || (newstate == DC_RESV))
+ return -EINVAL;
+
+ rdmsr_on_cpu(cpu, MSR_IA32_THERM_STATUS, &l, &h);
+
+ if (l & 0x01)
+ pr_debug("CPU#%d currently thermal throttled\n", cpu);
+
+ if (has_N44_O17_errata[cpu] &&
+ (newstate == DC_25PT || newstate == DC_DFLT))
+ newstate = DC_38PT;
+
+ rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
+ if (newstate == DC_DISABLE) {
+ pr_debug("CPU#%d disabling modulation\n", cpu);
+ wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l & ~(1<<4), h);
+ } else {
+ pr_debug("CPU#%d setting duty cycle to %d%%\n",
+ cpu, ((125 * newstate) / 10));
+ /* bits 63 - 5 : reserved
+ * bit 4 : enable/disable
+ * bits 3-1 : duty cycle
+ * bit 0 : reserved
+ */
+ l = (l & ~14);
+ l = l | (1<<4) | ((newstate & 0x7)<<1);
+ wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
+ }
+
+ return 0;
+}
+
+
+static struct cpufreq_frequency_table p4clockmod_table[] = {
+ {DC_RESV, CPUFREQ_ENTRY_INVALID},
+ {DC_DFLT, 0},
+ {DC_25PT, 0},
+ {DC_38PT, 0},
+ {DC_50PT, 0},
+ {DC_64PT, 0},
+ {DC_75PT, 0},
+ {DC_88PT, 0},
+ {DC_DISABLE, 0},
+ {DC_RESV, CPUFREQ_TABLE_END},
+};
+
+
+static int cpufreq_p4_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = DC_RESV;
+ struct cpufreq_freqs freqs;
+ int i;
+
+ if (cpufreq_frequency_table_target(policy, &p4clockmod_table[0],
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ freqs.old = cpufreq_p4_get(policy->cpu);
+ freqs.new = stock_freq * p4clockmod_table[newstate].index / 8;
+
+ if (freqs.new == freqs.old)
+ return 0;
+
+ /* notifiers */
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ }
+
+ /* run on each logical CPU,
+ * see section 13.15.3 of IA32 Intel Architecture Software
+ * Developer's Manual, Volume 3
+ */
+ for_each_cpu(i, policy->cpus)
+ cpufreq_p4_setdc(i, p4clockmod_table[newstate].index);
+
+ /* notifiers */
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+
+ return 0;
+}
+
+
+static int cpufreq_p4_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &p4clockmod_table[0]);
+}
+
+
+static unsigned int cpufreq_p4_get_frequency(struct cpuinfo_x86 *c)
+{
+ if (c->x86 == 0x06) {
+ if (cpu_has(c, X86_FEATURE_EST))
+ printk_once(KERN_WARNING PFX "Warning: EST-capable "
+ "CPU detected. The acpi-cpufreq module offers "
+ "voltage scaling in addition to frequency "
+ "scaling. You should use that instead of "
+ "p4-clockmod, if possible.\n");
+ switch (c->x86_model) {
+ case 0x0E: /* Core */
+ case 0x0F: /* Core Duo */
+ case 0x16: /* Celeron Core */
+ case 0x1C: /* Atom */
+ p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
+ return speedstep_get_frequency(SPEEDSTEP_CPU_PCORE);
+ case 0x0D: /* Pentium M (Dothan) */
+ p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
+ /* fall through */
+ case 0x09: /* Pentium M (Banias) */
+ return speedstep_get_frequency(SPEEDSTEP_CPU_PM);
+ }
+ }
+
+ if (c->x86 != 0xF)
+ return 0;
+
+ /* on P-4s, the TSC runs with constant frequency independent whether
+ * throttling is active or not. */
+ p4clockmod_driver.flags |= CPUFREQ_CONST_LOOPS;
+
+ if (speedstep_detect_processor() == SPEEDSTEP_CPU_P4M) {
+ printk(KERN_WARNING PFX "Warning: Pentium 4-M detected. "
+ "The speedstep-ich or acpi cpufreq modules offer "
+ "voltage scaling in addition of frequency scaling. "
+ "You should use either one instead of p4-clockmod, "
+ "if possible.\n");
+ return speedstep_get_frequency(SPEEDSTEP_CPU_P4M);
+ }
+
+ return speedstep_get_frequency(SPEEDSTEP_CPU_P4D);
+}
+
+
+
+static int cpufreq_p4_cpu_init(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *c = &cpu_data(policy->cpu);
+ int cpuid = 0;
+ unsigned int i;
+
+#ifdef CONFIG_SMP
+ cpumask_copy(policy->cpus, cpu_sibling_mask(policy->cpu));
+#endif
+
+ /* Errata workaround */
+ cpuid = (c->x86 << 8) | (c->x86_model << 4) | c->x86_mask;
+ switch (cpuid) {
+ case 0x0f07:
+ case 0x0f0a:
+ case 0x0f11:
+ case 0x0f12:
+ has_N44_O17_errata[policy->cpu] = 1;
+ pr_debug("has errata -- disabling low frequencies\n");
+ }
+
+ if (speedstep_detect_processor() == SPEEDSTEP_CPU_P4D &&
+ c->x86_model < 2) {
+ /* switch to maximum frequency and measure result */
+ cpufreq_p4_setdc(policy->cpu, DC_DISABLE);
+ recalibrate_cpu_khz();
+ }
+ /* get max frequency */
+ stock_freq = cpufreq_p4_get_frequency(c);
+ if (!stock_freq)
+ return -EINVAL;
+
+ /* table init */
+ for (i = 1; (p4clockmod_table[i].frequency != CPUFREQ_TABLE_END); i++) {
+ if ((i < 2) && (has_N44_O17_errata[policy->cpu]))
+ p4clockmod_table[i].frequency = CPUFREQ_ENTRY_INVALID;
+ else
+ p4clockmod_table[i].frequency = (stock_freq * i)/8;
+ }
+ cpufreq_frequency_table_get_attr(p4clockmod_table, policy->cpu);
+
+ /* cpuinfo and default policy values */
+
+ /* the transition latency is set to be 1 higher than the maximum
+ * transition latency of the ondemand governor */
+ policy->cpuinfo.transition_latency = 10000001;
+ policy->cur = stock_freq;
+
+ return cpufreq_frequency_table_cpuinfo(policy, &p4clockmod_table[0]);
+}
+
+
+static int cpufreq_p4_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+static unsigned int cpufreq_p4_get(unsigned int cpu)
+{
+ u32 l, h;
+
+ rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
+
+ if (l & 0x10) {
+ l = l >> 1;
+ l &= 0x7;
+ } else
+ l = DC_DISABLE;
+
+ if (l != DC_DISABLE)
+ return stock_freq * l / 8;
+
+ return stock_freq;
+}
+
+static struct freq_attr *p4clockmod_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver p4clockmod_driver = {
+ .verify = cpufreq_p4_verify,
+ .target = cpufreq_p4_target,
+ .init = cpufreq_p4_cpu_init,
+ .exit = cpufreq_p4_cpu_exit,
+ .get = cpufreq_p4_get,
+ .name = "p4-clockmod",
+ .owner = THIS_MODULE,
+ .attr = p4clockmod_attr,
+};
+
+
+static int __init cpufreq_p4_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ int ret;
+
+ /*
+ * THERM_CONTROL is architectural for IA32 now, so
+ * we can rely on the capability checks
+ */
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return -ENODEV;
+
+ if (!test_cpu_cap(c, X86_FEATURE_ACPI) ||
+ !test_cpu_cap(c, X86_FEATURE_ACC))
+ return -ENODEV;
+
+ ret = cpufreq_register_driver(&p4clockmod_driver);
+ if (!ret)
+ printk(KERN_INFO PFX "P4/Xeon(TM) CPU On-Demand Clock "
+ "Modulation available\n");
+
+ return ret;
+}
+
+
+static void __exit cpufreq_p4_exit(void)
+{
+ cpufreq_unregister_driver(&p4clockmod_driver);
+}
+
+
+MODULE_AUTHOR("Zwane Mwaikambo <zwane@commfireservices.com>");
+MODULE_DESCRIPTION("cpufreq driver for Pentium(TM) 4/Xeon(TM)");
+MODULE_LICENSE("GPL");
+
+late_initcall(cpufreq_p4_init);
+module_exit(cpufreq_p4_exit);
--- /dev/null
+/*
+ * pcc-cpufreq.c - Processor Clocking Control firmware cpufreq interface
+ *
+ * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com>
+ * Copyright (C) 2009 Hewlett-Packard Development Company, L.P.
+ * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com>
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON
+ * INFRINGEMENT. See the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/compiler.h>
+#include <linux/slab.h>
+
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+
+#include <acpi/processor.h>
+
+#define PCC_VERSION "1.10.00"
+#define POLL_LOOPS 300
+
+#define CMD_COMPLETE 0x1
+#define CMD_GET_FREQ 0x0
+#define CMD_SET_FREQ 0x1
+
+#define BUF_SZ 4
+
+struct pcc_register_resource {
+ u8 descriptor;
+ u16 length;
+ u8 space_id;
+ u8 bit_width;
+ u8 bit_offset;
+ u8 access_size;
+ u64 address;
+} __attribute__ ((packed));
+
+struct pcc_memory_resource {
+ u8 descriptor;
+ u16 length;
+ u8 space_id;
+ u8 resource_usage;
+ u8 type_specific;
+ u64 granularity;
+ u64 minimum;
+ u64 maximum;
+ u64 translation_offset;
+ u64 address_length;
+} __attribute__ ((packed));
+
+static struct cpufreq_driver pcc_cpufreq_driver;
+
+struct pcc_header {
+ u32 signature;
+ u16 length;
+ u8 major;
+ u8 minor;
+ u32 features;
+ u16 command;
+ u16 status;
+ u32 latency;
+ u32 minimum_time;
+ u32 maximum_time;
+ u32 nominal;
+ u32 throttled_frequency;
+ u32 minimum_frequency;
+};
+
+static void __iomem *pcch_virt_addr;
+static struct pcc_header __iomem *pcch_hdr;
+
+static DEFINE_SPINLOCK(pcc_lock);
+
+static struct acpi_generic_address doorbell;
+
+static u64 doorbell_preserve;
+static u64 doorbell_write;
+
+static u8 OSC_UUID[16] = {0x9F, 0x2C, 0x9B, 0x63, 0x91, 0x70, 0x1f, 0x49,
+ 0xBB, 0x4F, 0xA5, 0x98, 0x2F, 0xA1, 0xB5, 0x46};
+
+struct pcc_cpu {
+ u32 input_offset;
+ u32 output_offset;
+};
+
+static struct pcc_cpu __percpu *pcc_cpu_info;
+
+static int pcc_cpufreq_verify(struct cpufreq_policy *policy)
+{
+ cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
+ policy->cpuinfo.max_freq);
+ return 0;
+}
+
+static inline void pcc_cmd(void)
+{
+ u64 doorbell_value;
+ int i;
+
+ acpi_read(&doorbell_value, &doorbell);
+ acpi_write((doorbell_value & doorbell_preserve) | doorbell_write,
+ &doorbell);
+
+ for (i = 0; i < POLL_LOOPS; i++) {
+ if (ioread16(&pcch_hdr->status) & CMD_COMPLETE)
+ break;
+ }
+}
+
+static inline void pcc_clear_mapping(void)
+{
+ if (pcch_virt_addr)
+ iounmap(pcch_virt_addr);
+ pcch_virt_addr = NULL;
+}
+
+static unsigned int pcc_get_freq(unsigned int cpu)
+{
+ struct pcc_cpu *pcc_cpu_data;
+ unsigned int curr_freq;
+ unsigned int freq_limit;
+ u16 status;
+ u32 input_buffer;
+ u32 output_buffer;
+
+ spin_lock(&pcc_lock);
+
+ pr_debug("get: get_freq for CPU %d\n", cpu);
+ pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
+
+ input_buffer = 0x1;
+ iowrite32(input_buffer,
+ (pcch_virt_addr + pcc_cpu_data->input_offset));
+ iowrite16(CMD_GET_FREQ, &pcch_hdr->command);
+
+ pcc_cmd();
+
+ output_buffer =
+ ioread32(pcch_virt_addr + pcc_cpu_data->output_offset);
+
+ /* Clear the input buffer - we are done with the current command */
+ memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
+
+ status = ioread16(&pcch_hdr->status);
+ if (status != CMD_COMPLETE) {
+ pr_debug("get: FAILED: for CPU %d, status is %d\n",
+ cpu, status);
+ goto cmd_incomplete;
+ }
+ iowrite16(0, &pcch_hdr->status);
+ curr_freq = (((ioread32(&pcch_hdr->nominal) * (output_buffer & 0xff))
+ / 100) * 1000);
+
+ pr_debug("get: SUCCESS: (virtual) output_offset for cpu %d is "
+ "0x%p, contains a value of: 0x%x. Speed is: %d MHz\n",
+ cpu, (pcch_virt_addr + pcc_cpu_data->output_offset),
+ output_buffer, curr_freq);
+
+ freq_limit = (output_buffer >> 8) & 0xff;
+ if (freq_limit != 0xff) {
+ pr_debug("get: frequency for cpu %d is being temporarily"
+ " capped at %d\n", cpu, curr_freq);
+ }
+
+ spin_unlock(&pcc_lock);
+ return curr_freq;
+
+cmd_incomplete:
+ iowrite16(0, &pcch_hdr->status);
+ spin_unlock(&pcc_lock);
+ return 0;
+}
+
+static int pcc_cpufreq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ struct pcc_cpu *pcc_cpu_data;
+ struct cpufreq_freqs freqs;
+ u16 status;
+ u32 input_buffer;
+ int cpu;
+
+ spin_lock(&pcc_lock);
+ cpu = policy->cpu;
+ pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
+
+ pr_debug("target: CPU %d should go to target freq: %d "
+ "(virtual) input_offset is 0x%p\n",
+ cpu, target_freq,
+ (pcch_virt_addr + pcc_cpu_data->input_offset));
+
+ freqs.new = target_freq;
+ freqs.cpu = cpu;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ input_buffer = 0x1 | (((target_freq * 100)
+ / (ioread32(&pcch_hdr->nominal) * 1000)) << 8);
+ iowrite32(input_buffer,
+ (pcch_virt_addr + pcc_cpu_data->input_offset));
+ iowrite16(CMD_SET_FREQ, &pcch_hdr->command);
+
+ pcc_cmd();
+
+ /* Clear the input buffer - we are done with the current command */
+ memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
+
+ status = ioread16(&pcch_hdr->status);
+ if (status != CMD_COMPLETE) {
+ pr_debug("target: FAILED for cpu %d, with status: 0x%x\n",
+ cpu, status);
+ goto cmd_incomplete;
+ }
+ iowrite16(0, &pcch_hdr->status);
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ pr_debug("target: was SUCCESSFUL for cpu %d\n", cpu);
+ spin_unlock(&pcc_lock);
+
+ return 0;
+
+cmd_incomplete:
+ iowrite16(0, &pcch_hdr->status);
+ spin_unlock(&pcc_lock);
+ return -EINVAL;
+}
+
+static int pcc_get_offset(int cpu)
+{
+ acpi_status status;
+ struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+ union acpi_object *pccp, *offset;
+ struct pcc_cpu *pcc_cpu_data;
+ struct acpi_processor *pr;
+ int ret = 0;
+
+ pr = per_cpu(processors, cpu);
+ pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
+
+ status = acpi_evaluate_object(pr->handle, "PCCP", NULL, &buffer);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ pccp = buffer.pointer;
+ if (!pccp || pccp->type != ACPI_TYPE_PACKAGE) {
+ ret = -ENODEV;
+ goto out_free;
+ };
+
+ offset = &(pccp->package.elements[0]);
+ if (!offset || offset->type != ACPI_TYPE_INTEGER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ pcc_cpu_data->input_offset = offset->integer.value;
+
+ offset = &(pccp->package.elements[1]);
+ if (!offset || offset->type != ACPI_TYPE_INTEGER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ pcc_cpu_data->output_offset = offset->integer.value;
+
+ memset_io((pcch_virt_addr + pcc_cpu_data->input_offset), 0, BUF_SZ);
+ memset_io((pcch_virt_addr + pcc_cpu_data->output_offset), 0, BUF_SZ);
+
+ pr_debug("pcc_get_offset: for CPU %d: pcc_cpu_data "
+ "input_offset: 0x%x, pcc_cpu_data output_offset: 0x%x\n",
+ cpu, pcc_cpu_data->input_offset, pcc_cpu_data->output_offset);
+out_free:
+ kfree(buffer.pointer);
+ return ret;
+}
+
+static int __init pcc_cpufreq_do_osc(acpi_handle *handle)
+{
+ acpi_status status;
+ struct acpi_object_list input;
+ struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+ union acpi_object in_params[4];
+ union acpi_object *out_obj;
+ u32 capabilities[2];
+ u32 errors;
+ u32 supported;
+ int ret = 0;
+
+ input.count = 4;
+ input.pointer = in_params;
+ in_params[0].type = ACPI_TYPE_BUFFER;
+ in_params[0].buffer.length = 16;
+ in_params[0].buffer.pointer = OSC_UUID;
+ in_params[1].type = ACPI_TYPE_INTEGER;
+ in_params[1].integer.value = 1;
+ in_params[2].type = ACPI_TYPE_INTEGER;
+ in_params[2].integer.value = 2;
+ in_params[3].type = ACPI_TYPE_BUFFER;
+ in_params[3].buffer.length = 8;
+ in_params[3].buffer.pointer = (u8 *)&capabilities;
+
+ capabilities[0] = OSC_QUERY_ENABLE;
+ capabilities[1] = 0x1;
+
+ status = acpi_evaluate_object(*handle, "_OSC", &input, &output);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ if (!output.length)
+ return -ENODEV;
+
+ out_obj = output.pointer;
+ if (out_obj->type != ACPI_TYPE_BUFFER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
+ if (errors) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ supported = *((u32 *)(out_obj->buffer.pointer + 4));
+ if (!(supported & 0x1)) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ kfree(output.pointer);
+ capabilities[0] = 0x0;
+ capabilities[1] = 0x1;
+
+ status = acpi_evaluate_object(*handle, "_OSC", &input, &output);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ if (!output.length)
+ return -ENODEV;
+
+ out_obj = output.pointer;
+ if (out_obj->type != ACPI_TYPE_BUFFER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
+ if (errors) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ supported = *((u32 *)(out_obj->buffer.pointer + 4));
+ if (!(supported & 0x1)) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+out_free:
+ kfree(output.pointer);
+ return ret;
+}
+
+static int __init pcc_cpufreq_probe(void)
+{
+ acpi_status status;
+ struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+ struct pcc_memory_resource *mem_resource;
+ struct pcc_register_resource *reg_resource;
+ union acpi_object *out_obj, *member;
+ acpi_handle handle, osc_handle, pcch_handle;
+ int ret = 0;
+
+ status = acpi_get_handle(NULL, "\\_SB", &handle);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ status = acpi_get_handle(handle, "PCCH", &pcch_handle);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ status = acpi_get_handle(handle, "_OSC", &osc_handle);
+ if (ACPI_SUCCESS(status)) {
+ ret = pcc_cpufreq_do_osc(&osc_handle);
+ if (ret)
+ pr_debug("probe: _OSC evaluation did not succeed\n");
+ /* Firmware's use of _OSC is optional */
+ ret = 0;
+ }
+
+ status = acpi_evaluate_object(handle, "PCCH", NULL, &output);
+ if (ACPI_FAILURE(status))
+ return -ENODEV;
+
+ out_obj = output.pointer;
+ if (out_obj->type != ACPI_TYPE_PACKAGE) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ member = &out_obj->package.elements[0];
+ if (member->type != ACPI_TYPE_BUFFER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ mem_resource = (struct pcc_memory_resource *)member->buffer.pointer;
+
+ pr_debug("probe: mem_resource descriptor: 0x%x,"
+ " length: %d, space_id: %d, resource_usage: %d,"
+ " type_specific: %d, granularity: 0x%llx,"
+ " minimum: 0x%llx, maximum: 0x%llx,"
+ " translation_offset: 0x%llx, address_length: 0x%llx\n",
+ mem_resource->descriptor, mem_resource->length,
+ mem_resource->space_id, mem_resource->resource_usage,
+ mem_resource->type_specific, mem_resource->granularity,
+ mem_resource->minimum, mem_resource->maximum,
+ mem_resource->translation_offset,
+ mem_resource->address_length);
+
+ if (mem_resource->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ pcch_virt_addr = ioremap_nocache(mem_resource->minimum,
+ mem_resource->address_length);
+ if (pcch_virt_addr == NULL) {
+ pr_debug("probe: could not map shared mem region\n");
+ goto out_free;
+ }
+ pcch_hdr = pcch_virt_addr;
+
+ pr_debug("probe: PCCH header (virtual) addr: 0x%p\n", pcch_hdr);
+ pr_debug("probe: PCCH header is at physical address: 0x%llx,"
+ " signature: 0x%x, length: %d bytes, major: %d, minor: %d,"
+ " supported features: 0x%x, command field: 0x%x,"
+ " status field: 0x%x, nominal latency: %d us\n",
+ mem_resource->minimum, ioread32(&pcch_hdr->signature),
+ ioread16(&pcch_hdr->length), ioread8(&pcch_hdr->major),
+ ioread8(&pcch_hdr->minor), ioread32(&pcch_hdr->features),
+ ioread16(&pcch_hdr->command), ioread16(&pcch_hdr->status),
+ ioread32(&pcch_hdr->latency));
+
+ pr_debug("probe: min time between commands: %d us,"
+ " max time between commands: %d us,"
+ " nominal CPU frequency: %d MHz,"
+ " minimum CPU frequency: %d MHz,"
+ " minimum CPU frequency without throttling: %d MHz\n",
+ ioread32(&pcch_hdr->minimum_time),
+ ioread32(&pcch_hdr->maximum_time),
+ ioread32(&pcch_hdr->nominal),
+ ioread32(&pcch_hdr->throttled_frequency),
+ ioread32(&pcch_hdr->minimum_frequency));
+
+ member = &out_obj->package.elements[1];
+ if (member->type != ACPI_TYPE_BUFFER) {
+ ret = -ENODEV;
+ goto pcch_free;
+ }
+
+ reg_resource = (struct pcc_register_resource *)member->buffer.pointer;
+
+ doorbell.space_id = reg_resource->space_id;
+ doorbell.bit_width = reg_resource->bit_width;
+ doorbell.bit_offset = reg_resource->bit_offset;
+ doorbell.access_width = 64;
+ doorbell.address = reg_resource->address;
+
+ pr_debug("probe: doorbell: space_id is %d, bit_width is %d, "
+ "bit_offset is %d, access_width is %d, address is 0x%llx\n",
+ doorbell.space_id, doorbell.bit_width, doorbell.bit_offset,
+ doorbell.access_width, reg_resource->address);
+
+ member = &out_obj->package.elements[2];
+ if (member->type != ACPI_TYPE_INTEGER) {
+ ret = -ENODEV;
+ goto pcch_free;
+ }
+
+ doorbell_preserve = member->integer.value;
+
+ member = &out_obj->package.elements[3];
+ if (member->type != ACPI_TYPE_INTEGER) {
+ ret = -ENODEV;
+ goto pcch_free;
+ }
+
+ doorbell_write = member->integer.value;
+
+ pr_debug("probe: doorbell_preserve: 0x%llx,"
+ " doorbell_write: 0x%llx\n",
+ doorbell_preserve, doorbell_write);
+
+ pcc_cpu_info = alloc_percpu(struct pcc_cpu);
+ if (!pcc_cpu_info) {
+ ret = -ENOMEM;
+ goto pcch_free;
+ }
+
+ printk(KERN_DEBUG "pcc-cpufreq: (v%s) driver loaded with frequency"
+ " limits: %d MHz, %d MHz\n", PCC_VERSION,
+ ioread32(&pcch_hdr->minimum_frequency),
+ ioread32(&pcch_hdr->nominal));
+ kfree(output.pointer);
+ return ret;
+pcch_free:
+ pcc_clear_mapping();
+out_free:
+ kfree(output.pointer);
+ return ret;
+}
+
+static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int cpu = policy->cpu;
+ unsigned int result = 0;
+
+ if (!pcch_virt_addr) {
+ result = -1;
+ goto out;
+ }
+
+ result = pcc_get_offset(cpu);
+ if (result) {
+ pr_debug("init: PCCP evaluation failed\n");
+ goto out;
+ }
+
+ policy->max = policy->cpuinfo.max_freq =
+ ioread32(&pcch_hdr->nominal) * 1000;
+ policy->min = policy->cpuinfo.min_freq =
+ ioread32(&pcch_hdr->minimum_frequency) * 1000;
+ policy->cur = pcc_get_freq(cpu);
+
+ if (!policy->cur) {
+ pr_debug("init: Unable to get current CPU frequency\n");
+ result = -EINVAL;
+ goto out;
+ }
+
+ pr_debug("init: policy->max is %d, policy->min is %d\n",
+ policy->max, policy->min);
+out:
+ return result;
+}
+
+static int pcc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+ return 0;
+}
+
+static struct cpufreq_driver pcc_cpufreq_driver = {
+ .flags = CPUFREQ_CONST_LOOPS,
+ .get = pcc_get_freq,
+ .verify = pcc_cpufreq_verify,
+ .target = pcc_cpufreq_target,
+ .init = pcc_cpufreq_cpu_init,
+ .exit = pcc_cpufreq_cpu_exit,
+ .name = "pcc-cpufreq",
+ .owner = THIS_MODULE,
+};
+
+static int __init pcc_cpufreq_init(void)
+{
+ int ret;
+
+ if (acpi_disabled)
+ return 0;
+
+ ret = pcc_cpufreq_probe();
+ if (ret) {
+ pr_debug("pcc_cpufreq_init: PCCH evaluation failed\n");
+ return ret;
+ }
+
+ ret = cpufreq_register_driver(&pcc_cpufreq_driver);
+
+ return ret;
+}
+
+static void __exit pcc_cpufreq_exit(void)
+{
+ cpufreq_unregister_driver(&pcc_cpufreq_driver);
+
+ pcc_clear_mapping();
+
+ free_percpu(pcc_cpu_info);
+}
+
+MODULE_AUTHOR("Matthew Garrett, Naga Chumbalkar");
+MODULE_VERSION(PCC_VERSION);
+MODULE_DESCRIPTION("Processor Clocking Control interface driver");
+MODULE_LICENSE("GPL");
+
+late_initcall(pcc_cpufreq_init);
+module_exit(pcc_cpufreq_exit);
--- /dev/null
+/*
+ * This file was based upon code in Powertweak Linux (http://powertweak.sf.net)
+ * (C) 2000-2003 Dave Jones, Arjan van de Ven, Janne Pänkälä,
+ * Dominik Brodowski.
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/ioport.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+
+#include <asm/msr.h>
+
+#define POWERNOW_IOPORT 0xfff0 /* it doesn't matter where, as long
+ as it is unused */
+
+#define PFX "powernow-k6: "
+static unsigned int busfreq; /* FSB, in 10 kHz */
+static unsigned int max_multiplier;
+
+
+/* Clock ratio multiplied by 10 - see table 27 in AMD#23446 */
+static struct cpufreq_frequency_table clock_ratio[] = {
+ {45, /* 000 -> 4.5x */ 0},
+ {50, /* 001 -> 5.0x */ 0},
+ {40, /* 010 -> 4.0x */ 0},
+ {55, /* 011 -> 5.5x */ 0},
+ {20, /* 100 -> 2.0x */ 0},
+ {30, /* 101 -> 3.0x */ 0},
+ {60, /* 110 -> 6.0x */ 0},
+ {35, /* 111 -> 3.5x */ 0},
+ {0, CPUFREQ_TABLE_END}
+};
+
+
+/**
+ * powernow_k6_get_cpu_multiplier - returns the current FSB multiplier
+ *
+ * Returns the current setting of the frequency multiplier. Core clock
+ * speed is frequency of the Front-Side Bus multiplied with this value.
+ */
+static int powernow_k6_get_cpu_multiplier(void)
+{
+ u64 invalue = 0;
+ u32 msrval;
+
+ msrval = POWERNOW_IOPORT + 0x1;
+ wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
+ invalue = inl(POWERNOW_IOPORT + 0x8);
+ msrval = POWERNOW_IOPORT + 0x0;
+ wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
+
+ return clock_ratio[(invalue >> 5)&7].index;
+}
+
+
+/**
+ * powernow_k6_set_state - set the PowerNow! multiplier
+ * @best_i: clock_ratio[best_i] is the target multiplier
+ *
+ * Tries to change the PowerNow! multiplier
+ */
+static void powernow_k6_set_state(unsigned int best_i)
+{
+ unsigned long outvalue = 0, invalue = 0;
+ unsigned long msrval;
+ struct cpufreq_freqs freqs;
+
+ if (clock_ratio[best_i].index > max_multiplier) {
+ printk(KERN_ERR PFX "invalid target frequency\n");
+ return;
+ }
+
+ freqs.old = busfreq * powernow_k6_get_cpu_multiplier();
+ freqs.new = busfreq * clock_ratio[best_i].index;
+ freqs.cpu = 0; /* powernow-k6.c is UP only driver */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ /* we now need to transform best_i to the BVC format, see AMD#23446 */
+
+ outvalue = (1<<12) | (1<<10) | (1<<9) | (best_i<<5);
+
+ msrval = POWERNOW_IOPORT + 0x1;
+ wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
+ invalue = inl(POWERNOW_IOPORT + 0x8);
+ invalue = invalue & 0xf;
+ outvalue = outvalue | invalue;
+ outl(outvalue , (POWERNOW_IOPORT + 0x8));
+ msrval = POWERNOW_IOPORT + 0x0;
+ wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+ return;
+}
+
+
+/**
+ * powernow_k6_verify - verifies a new CPUfreq policy
+ * @policy: new policy
+ *
+ * Policy must be within lowest and highest possible CPU Frequency,
+ * and at least one possible state must be within min and max.
+ */
+static int powernow_k6_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &clock_ratio[0]);
+}
+
+
+/**
+ * powernow_k6_setpolicy - sets a new CPUFreq policy
+ * @policy: new policy
+ * @target_freq: the target frequency
+ * @relation: how that frequency relates to achieved frequency
+ * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
+ *
+ * sets a new CPUFreq policy
+ */
+static int powernow_k6_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = 0;
+
+ if (cpufreq_frequency_table_target(policy, &clock_ratio[0],
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ powernow_k6_set_state(newstate);
+
+ return 0;
+}
+
+
+static int powernow_k6_cpu_init(struct cpufreq_policy *policy)
+{
+ unsigned int i, f;
+ int result;
+
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ /* get frequencies */
+ max_multiplier = powernow_k6_get_cpu_multiplier();
+ busfreq = cpu_khz / max_multiplier;
+
+ /* table init */
+ for (i = 0; (clock_ratio[i].frequency != CPUFREQ_TABLE_END); i++) {
+ f = clock_ratio[i].index;
+ if (f > max_multiplier)
+ clock_ratio[i].frequency = CPUFREQ_ENTRY_INVALID;
+ else
+ clock_ratio[i].frequency = busfreq * f;
+ }
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.transition_latency = 200000;
+ policy->cur = busfreq * max_multiplier;
+
+ result = cpufreq_frequency_table_cpuinfo(policy, clock_ratio);
+ if (result)
+ return result;
+
+ cpufreq_frequency_table_get_attr(clock_ratio, policy->cpu);
+
+ return 0;
+}
+
+
+static int powernow_k6_cpu_exit(struct cpufreq_policy *policy)
+{
+ unsigned int i;
+ for (i = 0; i < 8; i++) {
+ if (i == max_multiplier)
+ powernow_k6_set_state(i);
+ }
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+static unsigned int powernow_k6_get(unsigned int cpu)
+{
+ unsigned int ret;
+ ret = (busfreq * powernow_k6_get_cpu_multiplier());
+ return ret;
+}
+
+static struct freq_attr *powernow_k6_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver powernow_k6_driver = {
+ .verify = powernow_k6_verify,
+ .target = powernow_k6_target,
+ .init = powernow_k6_cpu_init,
+ .exit = powernow_k6_cpu_exit,
+ .get = powernow_k6_get,
+ .name = "powernow-k6",
+ .owner = THIS_MODULE,
+ .attr = powernow_k6_attr,
+};
+
+
+/**
+ * powernow_k6_init - initializes the k6 PowerNow! CPUFreq driver
+ *
+ * Initializes the K6 PowerNow! support. Returns -ENODEV on unsupported
+ * devices, -EINVAL or -ENOMEM on problems during initiatization, and zero
+ * on success.
+ */
+static int __init powernow_k6_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if ((c->x86_vendor != X86_VENDOR_AMD) || (c->x86 != 5) ||
+ ((c->x86_model != 12) && (c->x86_model != 13)))
+ return -ENODEV;
+
+ if (!request_region(POWERNOW_IOPORT, 16, "PowerNow!")) {
+ printk(KERN_INFO PFX "PowerNow IOPORT region already used.\n");
+ return -EIO;
+ }
+
+ if (cpufreq_register_driver(&powernow_k6_driver)) {
+ release_region(POWERNOW_IOPORT, 16);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+
+/**
+ * powernow_k6_exit - unregisters AMD K6-2+/3+ PowerNow! support
+ *
+ * Unregisters AMD K6-2+ / K6-3+ PowerNow! support.
+ */
+static void __exit powernow_k6_exit(void)
+{
+ cpufreq_unregister_driver(&powernow_k6_driver);
+ release_region(POWERNOW_IOPORT, 16);
+}
+
+
+MODULE_AUTHOR("Arjan van de Ven, Dave Jones <davej@redhat.com>, "
+ "Dominik Brodowski <linux@brodo.de>");
+MODULE_DESCRIPTION("PowerNow! driver for AMD K6-2+ / K6-3+ processors.");
+MODULE_LICENSE("GPL");
+
+module_init(powernow_k6_init);
+module_exit(powernow_k6_exit);
--- /dev/null
+/*
+ * AMD K7 Powernow driver.
+ * (C) 2003 Dave Jones on behalf of SuSE Labs.
+ * (C) 2003-2004 Dave Jones <davej@redhat.com>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ * Based upon datasheets & sample CPUs kindly provided by AMD.
+ *
+ * Errata 5:
+ * CPU may fail to execute a FID/VID change in presence of interrupt.
+ * - We cli/sti on stepping A0 CPUs around the FID/VID transition.
+ * Errata 15:
+ * CPU with half frequency multipliers may hang upon wakeup from disconnect.
+ * - We disable half multipliers if ACPI is used on A0 stepping CPUs.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/dmi.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+
+#include <asm/timer.h> /* Needed for recalibrate_cpu_khz() */
+#include <asm/msr.h>
+#include <asm/system.h>
+
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+#include <linux/acpi.h>
+#include <acpi/processor.h>
+#endif
+
+#include "powernow-k7.h"
+
+#define PFX "powernow: "
+
+
+struct psb_s {
+ u8 signature[10];
+ u8 tableversion;
+ u8 flags;
+ u16 settlingtime;
+ u8 reserved1;
+ u8 numpst;
+};
+
+struct pst_s {
+ u32 cpuid;
+ u8 fsbspeed;
+ u8 maxfid;
+ u8 startvid;
+ u8 numpstates;
+};
+
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+union powernow_acpi_control_t {
+ struct {
+ unsigned long fid:5,
+ vid:5,
+ sgtc:20,
+ res1:2;
+ } bits;
+ unsigned long val;
+};
+#endif
+
+/* divide by 1000 to get VCore voltage in V. */
+static const int mobile_vid_table[32] = {
+ 2000, 1950, 1900, 1850, 1800, 1750, 1700, 1650,
+ 1600, 1550, 1500, 1450, 1400, 1350, 1300, 0,
+ 1275, 1250, 1225, 1200, 1175, 1150, 1125, 1100,
+ 1075, 1050, 1025, 1000, 975, 950, 925, 0,
+};
+
+/* divide by 10 to get FID. */
+static const int fid_codes[32] = {
+ 110, 115, 120, 125, 50, 55, 60, 65,
+ 70, 75, 80, 85, 90, 95, 100, 105,
+ 30, 190, 40, 200, 130, 135, 140, 210,
+ 150, 225, 160, 165, 170, 180, -1, -1,
+};
+
+/* This parameter is used in order to force ACPI instead of legacy method for
+ * configuration purpose.
+ */
+
+static int acpi_force;
+
+static struct cpufreq_frequency_table *powernow_table;
+
+static unsigned int can_scale_bus;
+static unsigned int can_scale_vid;
+static unsigned int minimum_speed = -1;
+static unsigned int maximum_speed;
+static unsigned int number_scales;
+static unsigned int fsb;
+static unsigned int latency;
+static char have_a0;
+
+static int check_fsb(unsigned int fsbspeed)
+{
+ int delta;
+ unsigned int f = fsb / 1000;
+
+ delta = (fsbspeed > f) ? fsbspeed - f : f - fsbspeed;
+ return delta < 5;
+}
+
+static int check_powernow(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ unsigned int maxei, eax, ebx, ecx, edx;
+
+ if ((c->x86_vendor != X86_VENDOR_AMD) || (c->x86 != 6)) {
+#ifdef MODULE
+ printk(KERN_INFO PFX "This module only works with "
+ "AMD K7 CPUs\n");
+#endif
+ return 0;
+ }
+
+ /* Get maximum capabilities */
+ maxei = cpuid_eax(0x80000000);
+ if (maxei < 0x80000007) { /* Any powernow info ? */
+#ifdef MODULE
+ printk(KERN_INFO PFX "No powernow capabilities detected\n");
+#endif
+ return 0;
+ }
+
+ if ((c->x86_model == 6) && (c->x86_mask == 0)) {
+ printk(KERN_INFO PFX "K7 660[A0] core detected, "
+ "enabling errata workarounds\n");
+ have_a0 = 1;
+ }
+
+ cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
+
+ /* Check we can actually do something before we say anything.*/
+ if (!(edx & (1 << 1 | 1 << 2)))
+ return 0;
+
+ printk(KERN_INFO PFX "PowerNOW! Technology present. Can scale: ");
+
+ if (edx & 1 << 1) {
+ printk("frequency");
+ can_scale_bus = 1;
+ }
+
+ if ((edx & (1 << 1 | 1 << 2)) == 0x6)
+ printk(" and ");
+
+ if (edx & 1 << 2) {
+ printk("voltage");
+ can_scale_vid = 1;
+ }
+
+ printk(".\n");
+ return 1;
+}
+
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+static void invalidate_entry(unsigned int entry)
+{
+ powernow_table[entry].frequency = CPUFREQ_ENTRY_INVALID;
+}
+#endif
+
+static int get_ranges(unsigned char *pst)
+{
+ unsigned int j;
+ unsigned int speed;
+ u8 fid, vid;
+
+ powernow_table = kzalloc((sizeof(struct cpufreq_frequency_table) *
+ (number_scales + 1)), GFP_KERNEL);
+ if (!powernow_table)
+ return -ENOMEM;
+
+ for (j = 0 ; j < number_scales; j++) {
+ fid = *pst++;
+
+ powernow_table[j].frequency = (fsb * fid_codes[fid]) / 10;
+ powernow_table[j].index = fid; /* lower 8 bits */
+
+ speed = powernow_table[j].frequency;
+
+ if ((fid_codes[fid] % 10) == 5) {
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+ if (have_a0 == 1)
+ invalidate_entry(j);
+#endif
+ }
+
+ if (speed < minimum_speed)
+ minimum_speed = speed;
+ if (speed > maximum_speed)
+ maximum_speed = speed;
+
+ vid = *pst++;
+ powernow_table[j].index |= (vid << 8); /* upper 8 bits */
+
+ pr_debug(" FID: 0x%x (%d.%dx [%dMHz]) "
+ "VID: 0x%x (%d.%03dV)\n", fid, fid_codes[fid] / 10,
+ fid_codes[fid] % 10, speed/1000, vid,
+ mobile_vid_table[vid]/1000,
+ mobile_vid_table[vid]%1000);
+ }
+ powernow_table[number_scales].frequency = CPUFREQ_TABLE_END;
+ powernow_table[number_scales].index = 0;
+
+ return 0;
+}
+
+
+static void change_FID(int fid)
+{
+ union msr_fidvidctl fidvidctl;
+
+ rdmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
+ if (fidvidctl.bits.FID != fid) {
+ fidvidctl.bits.SGTC = latency;
+ fidvidctl.bits.FID = fid;
+ fidvidctl.bits.VIDC = 0;
+ fidvidctl.bits.FIDC = 1;
+ wrmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
+ }
+}
+
+
+static void change_VID(int vid)
+{
+ union msr_fidvidctl fidvidctl;
+
+ rdmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
+ if (fidvidctl.bits.VID != vid) {
+ fidvidctl.bits.SGTC = latency;
+ fidvidctl.bits.VID = vid;
+ fidvidctl.bits.FIDC = 0;
+ fidvidctl.bits.VIDC = 1;
+ wrmsrl(MSR_K7_FID_VID_CTL, fidvidctl.val);
+ }
+}
+
+
+static void change_speed(unsigned int index)
+{
+ u8 fid, vid;
+ struct cpufreq_freqs freqs;
+ union msr_fidvidstatus fidvidstatus;
+ int cfid;
+
+ /* fid are the lower 8 bits of the index we stored into
+ * the cpufreq frequency table in powernow_decode_bios,
+ * vid are the upper 8 bits.
+ */
+
+ fid = powernow_table[index].index & 0xFF;
+ vid = (powernow_table[index].index & 0xFF00) >> 8;
+
+ freqs.cpu = 0;
+
+ rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
+ cfid = fidvidstatus.bits.CFID;
+ freqs.old = fsb * fid_codes[cfid] / 10;
+
+ freqs.new = powernow_table[index].frequency;
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ /* Now do the magic poking into the MSRs. */
+
+ if (have_a0 == 1) /* A0 errata 5 */
+ local_irq_disable();
+
+ if (freqs.old > freqs.new) {
+ /* Going down, so change FID first */
+ change_FID(fid);
+ change_VID(vid);
+ } else {
+ /* Going up, so change VID first */
+ change_VID(vid);
+ change_FID(fid);
+ }
+
+
+ if (have_a0 == 1)
+ local_irq_enable();
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+}
+
+
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+
+static struct acpi_processor_performance *acpi_processor_perf;
+
+static int powernow_acpi_init(void)
+{
+ int i;
+ int retval = 0;
+ union powernow_acpi_control_t pc;
+
+ if (acpi_processor_perf != NULL && powernow_table != NULL) {
+ retval = -EINVAL;
+ goto err0;
+ }
+
+ acpi_processor_perf = kzalloc(sizeof(struct acpi_processor_performance),
+ GFP_KERNEL);
+ if (!acpi_processor_perf) {
+ retval = -ENOMEM;
+ goto err0;
+ }
+
+ if (!zalloc_cpumask_var(&acpi_processor_perf->shared_cpu_map,
+ GFP_KERNEL)) {
+ retval = -ENOMEM;
+ goto err05;
+ }
+
+ if (acpi_processor_register_performance(acpi_processor_perf, 0)) {
+ retval = -EIO;
+ goto err1;
+ }
+
+ if (acpi_processor_perf->control_register.space_id !=
+ ACPI_ADR_SPACE_FIXED_HARDWARE) {
+ retval = -ENODEV;
+ goto err2;
+ }
+
+ if (acpi_processor_perf->status_register.space_id !=
+ ACPI_ADR_SPACE_FIXED_HARDWARE) {
+ retval = -ENODEV;
+ goto err2;
+ }
+
+ number_scales = acpi_processor_perf->state_count;
+
+ if (number_scales < 2) {
+ retval = -ENODEV;
+ goto err2;
+ }
+
+ powernow_table = kzalloc((sizeof(struct cpufreq_frequency_table) *
+ (number_scales + 1)), GFP_KERNEL);
+ if (!powernow_table) {
+ retval = -ENOMEM;
+ goto err2;
+ }
+
+ pc.val = (unsigned long) acpi_processor_perf->states[0].control;
+ for (i = 0; i < number_scales; i++) {
+ u8 fid, vid;
+ struct acpi_processor_px *state =
+ &acpi_processor_perf->states[i];
+ unsigned int speed, speed_mhz;
+
+ pc.val = (unsigned long) state->control;
+ pr_debug("acpi: P%d: %d MHz %d mW %d uS control %08x SGTC %d\n",
+ i,
+ (u32) state->core_frequency,
+ (u32) state->power,
+ (u32) state->transition_latency,
+ (u32) state->control,
+ pc.bits.sgtc);
+
+ vid = pc.bits.vid;
+ fid = pc.bits.fid;
+
+ powernow_table[i].frequency = fsb * fid_codes[fid] / 10;
+ powernow_table[i].index = fid; /* lower 8 bits */
+ powernow_table[i].index |= (vid << 8); /* upper 8 bits */
+
+ speed = powernow_table[i].frequency;
+ speed_mhz = speed / 1000;
+
+ /* processor_perflib will multiply the MHz value by 1000 to
+ * get a KHz value (e.g. 1266000). However, powernow-k7 works
+ * with true KHz values (e.g. 1266768). To ensure that all
+ * powernow frequencies are available, we must ensure that
+ * ACPI doesn't restrict them, so we round up the MHz value
+ * to ensure that perflib's computed KHz value is greater than
+ * or equal to powernow's KHz value.
+ */
+ if (speed % 1000 > 0)
+ speed_mhz++;
+
+ if ((fid_codes[fid] % 10) == 5) {
+ if (have_a0 == 1)
+ invalidate_entry(i);
+ }
+
+ pr_debug(" FID: 0x%x (%d.%dx [%dMHz]) "
+ "VID: 0x%x (%d.%03dV)\n", fid, fid_codes[fid] / 10,
+ fid_codes[fid] % 10, speed_mhz, vid,
+ mobile_vid_table[vid]/1000,
+ mobile_vid_table[vid]%1000);
+
+ if (state->core_frequency != speed_mhz) {
+ state->core_frequency = speed_mhz;
+ pr_debug(" Corrected ACPI frequency to %d\n",
+ speed_mhz);
+ }
+
+ if (latency < pc.bits.sgtc)
+ latency = pc.bits.sgtc;
+
+ if (speed < minimum_speed)
+ minimum_speed = speed;
+ if (speed > maximum_speed)
+ maximum_speed = speed;
+ }
+
+ powernow_table[i].frequency = CPUFREQ_TABLE_END;
+ powernow_table[i].index = 0;
+
+ /* notify BIOS that we exist */
+ acpi_processor_notify_smm(THIS_MODULE);
+
+ return 0;
+
+err2:
+ acpi_processor_unregister_performance(acpi_processor_perf, 0);
+err1:
+ free_cpumask_var(acpi_processor_perf->shared_cpu_map);
+err05:
+ kfree(acpi_processor_perf);
+err0:
+ printk(KERN_WARNING PFX "ACPI perflib can not be used on "
+ "this platform\n");
+ acpi_processor_perf = NULL;
+ return retval;
+}
+#else
+static int powernow_acpi_init(void)
+{
+ printk(KERN_INFO PFX "no support for ACPI processor found."
+ " Please recompile your kernel with ACPI processor\n");
+ return -EINVAL;
+}
+#endif
+
+static void print_pst_entry(struct pst_s *pst, unsigned int j)
+{
+ pr_debug("PST:%d (@%p)\n", j, pst);
+ pr_debug(" cpuid: 0x%x fsb: %d maxFID: 0x%x startvid: 0x%x\n",
+ pst->cpuid, pst->fsbspeed, pst->maxfid, pst->startvid);
+}
+
+static int powernow_decode_bios(int maxfid, int startvid)
+{
+ struct psb_s *psb;
+ struct pst_s *pst;
+ unsigned int i, j;
+ unsigned char *p;
+ unsigned int etuple;
+ unsigned int ret;
+
+ etuple = cpuid_eax(0x80000001);
+
+ for (i = 0xC0000; i < 0xffff0 ; i += 16) {
+
+ p = phys_to_virt(i);
+
+ if (memcmp(p, "AMDK7PNOW!", 10) == 0) {
+ pr_debug("Found PSB header at %p\n", p);
+ psb = (struct psb_s *) p;
+ pr_debug("Table version: 0x%x\n", psb->tableversion);
+ if (psb->tableversion != 0x12) {
+ printk(KERN_INFO PFX "Sorry, only v1.2 tables"
+ " supported right now\n");
+ return -ENODEV;
+ }
+
+ pr_debug("Flags: 0x%x\n", psb->flags);
+ if ((psb->flags & 1) == 0)
+ pr_debug("Mobile voltage regulator\n");
+ else
+ pr_debug("Desktop voltage regulator\n");
+
+ latency = psb->settlingtime;
+ if (latency < 100) {
+ printk(KERN_INFO PFX "BIOS set settling time "
+ "to %d microseconds. "
+ "Should be at least 100. "
+ "Correcting.\n", latency);
+ latency = 100;
+ }
+ pr_debug("Settling Time: %d microseconds.\n",
+ psb->settlingtime);
+ pr_debug("Has %d PST tables. (Only dumping ones "
+ "relevant to this CPU).\n",
+ psb->numpst);
+
+ p += sizeof(struct psb_s);
+
+ pst = (struct pst_s *) p;
+
+ for (j = 0; j < psb->numpst; j++) {
+ pst = (struct pst_s *) p;
+ number_scales = pst->numpstates;
+
+ if ((etuple == pst->cpuid) &&
+ check_fsb(pst->fsbspeed) &&
+ (maxfid == pst->maxfid) &&
+ (startvid == pst->startvid)) {
+ print_pst_entry(pst, j);
+ p = (char *)pst + sizeof(struct pst_s);
+ ret = get_ranges(p);
+ return ret;
+ } else {
+ unsigned int k;
+ p = (char *)pst + sizeof(struct pst_s);
+ for (k = 0; k < number_scales; k++)
+ p += 2;
+ }
+ }
+ printk(KERN_INFO PFX "No PST tables match this cpuid "
+ "(0x%x)\n", etuple);
+ printk(KERN_INFO PFX "This is indicative of a broken "
+ "BIOS.\n");
+
+ return -EINVAL;
+ }
+ p++;
+ }
+
+ return -ENODEV;
+}
+
+
+static int powernow_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate;
+
+ if (cpufreq_frequency_table_target(policy, powernow_table, target_freq,
+ relation, &newstate))
+ return -EINVAL;
+
+ change_speed(newstate);
+
+ return 0;
+}
+
+
+static int powernow_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, powernow_table);
+}
+
+/*
+ * We use the fact that the bus frequency is somehow
+ * a multiple of 100000/3 khz, then we compute sgtc according
+ * to this multiple.
+ * That way, we match more how AMD thinks all of that work.
+ * We will then get the same kind of behaviour already tested under
+ * the "well-known" other OS.
+ */
+static int __cpuinit fixup_sgtc(void)
+{
+ unsigned int sgtc;
+ unsigned int m;
+
+ m = fsb / 3333;
+ if ((m % 10) >= 5)
+ m += 5;
+
+ m /= 10;
+
+ sgtc = 100 * m * latency;
+ sgtc = sgtc / 3;
+ if (sgtc > 0xfffff) {
+ printk(KERN_WARNING PFX "SGTC too large %d\n", sgtc);
+ sgtc = 0xfffff;
+ }
+ return sgtc;
+}
+
+static unsigned int powernow_get(unsigned int cpu)
+{
+ union msr_fidvidstatus fidvidstatus;
+ unsigned int cfid;
+
+ if (cpu)
+ return 0;
+ rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
+ cfid = fidvidstatus.bits.CFID;
+
+ return fsb * fid_codes[cfid] / 10;
+}
+
+
+static int __cpuinit acer_cpufreq_pst(const struct dmi_system_id *d)
+{
+ printk(KERN_WARNING PFX
+ "%s laptop with broken PST tables in BIOS detected.\n",
+ d->ident);
+ printk(KERN_WARNING PFX
+ "You need to downgrade to 3A21 (09/09/2002), or try a newer "
+ "BIOS than 3A71 (01/20/2003)\n");
+ printk(KERN_WARNING PFX
+ "cpufreq scaling has been disabled as a result of this.\n");
+ return 0;
+}
+
+/*
+ * Some Athlon laptops have really fucked PST tables.
+ * A BIOS update is all that can save them.
+ * Mention this, and disable cpufreq.
+ */
+static struct dmi_system_id __cpuinitdata powernow_dmi_table[] = {
+ {
+ .callback = acer_cpufreq_pst,
+ .ident = "Acer Aspire",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Insyde Software"),
+ DMI_MATCH(DMI_BIOS_VERSION, "3A71"),
+ },
+ },
+ { }
+};
+
+static int __cpuinit powernow_cpu_init(struct cpufreq_policy *policy)
+{
+ union msr_fidvidstatus fidvidstatus;
+ int result;
+
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ rdmsrl(MSR_K7_FID_VID_STATUS, fidvidstatus.val);
+
+ recalibrate_cpu_khz();
+
+ fsb = (10 * cpu_khz) / fid_codes[fidvidstatus.bits.CFID];
+ if (!fsb) {
+ printk(KERN_WARNING PFX "can not determine bus frequency\n");
+ return -EINVAL;
+ }
+ pr_debug("FSB: %3dMHz\n", fsb/1000);
+
+ if (dmi_check_system(powernow_dmi_table) || acpi_force) {
+ printk(KERN_INFO PFX "PSB/PST known to be broken. "
+ "Trying ACPI instead\n");
+ result = powernow_acpi_init();
+ } else {
+ result = powernow_decode_bios(fidvidstatus.bits.MFID,
+ fidvidstatus.bits.SVID);
+ if (result) {
+ printk(KERN_INFO PFX "Trying ACPI perflib\n");
+ maximum_speed = 0;
+ minimum_speed = -1;
+ latency = 0;
+ result = powernow_acpi_init();
+ if (result) {
+ printk(KERN_INFO PFX
+ "ACPI and legacy methods failed\n");
+ }
+ } else {
+ /* SGTC use the bus clock as timer */
+ latency = fixup_sgtc();
+ printk(KERN_INFO PFX "SGTC: %d\n", latency);
+ }
+ }
+
+ if (result)
+ return result;
+
+ printk(KERN_INFO PFX "Minimum speed %d MHz. Maximum speed %d MHz.\n",
+ minimum_speed/1000, maximum_speed/1000);
+
+ policy->cpuinfo.transition_latency =
+ cpufreq_scale(2000000UL, fsb, latency);
+
+ policy->cur = powernow_get(0);
+
+ cpufreq_frequency_table_get_attr(powernow_table, policy->cpu);
+
+ return cpufreq_frequency_table_cpuinfo(policy, powernow_table);
+}
+
+static int powernow_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+ if (acpi_processor_perf) {
+ acpi_processor_unregister_performance(acpi_processor_perf, 0);
+ free_cpumask_var(acpi_processor_perf->shared_cpu_map);
+ kfree(acpi_processor_perf);
+ }
+#endif
+
+ kfree(powernow_table);
+ return 0;
+}
+
+static struct freq_attr *powernow_table_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver powernow_driver = {
+ .verify = powernow_verify,
+ .target = powernow_target,
+ .get = powernow_get,
+#ifdef CONFIG_X86_POWERNOW_K7_ACPI
+ .bios_limit = acpi_processor_get_bios_limit,
+#endif
+ .init = powernow_cpu_init,
+ .exit = powernow_cpu_exit,
+ .name = "powernow-k7",
+ .owner = THIS_MODULE,
+ .attr = powernow_table_attr,
+};
+
+static int __init powernow_init(void)
+{
+ if (check_powernow() == 0)
+ return -ENODEV;
+ return cpufreq_register_driver(&powernow_driver);
+}
+
+
+static void __exit powernow_exit(void)
+{
+ cpufreq_unregister_driver(&powernow_driver);
+}
+
+module_param(acpi_force, int, 0444);
+MODULE_PARM_DESC(acpi_force, "Force ACPI to be used.");
+
+MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
+MODULE_DESCRIPTION("Powernow driver for AMD K7 processors.");
+MODULE_LICENSE("GPL");
+
+late_initcall(powernow_init);
+module_exit(powernow_exit);
+
--- /dev/null
+/*
+ * (C) 2003 Dave Jones.
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * AMD-specific information
+ *
+ */
+
+union msr_fidvidctl {
+ struct {
+ unsigned FID:5, // 4:0
+ reserved1:3, // 7:5
+ VID:5, // 12:8
+ reserved2:3, // 15:13
+ FIDC:1, // 16
+ VIDC:1, // 17
+ reserved3:2, // 19:18
+ FIDCHGRATIO:1, // 20
+ reserved4:11, // 31-21
+ SGTC:20, // 32:51
+ reserved5:12; // 63:52
+ } bits;
+ unsigned long long val;
+};
+
+union msr_fidvidstatus {
+ struct {
+ unsigned CFID:5, // 4:0
+ reserved1:3, // 7:5
+ SFID:5, // 12:8
+ reserved2:3, // 15:13
+ MFID:5, // 20:16
+ reserved3:11, // 31:21
+ CVID:5, // 36:32
+ reserved4:3, // 39:37
+ SVID:5, // 44:40
+ reserved5:3, // 47:45
+ MVID:5, // 52:48
+ reserved6:11; // 63:53
+ } bits;
+ unsigned long long val;
+};
--- /dev/null
+/*
+ * (c) 2003-2010 Advanced Micro Devices, Inc.
+ * Your use of this code is subject to the terms and conditions of the
+ * GNU general public license version 2. See "COPYING" or
+ * http://www.gnu.org/licenses/gpl.html
+ *
+ * Support : mark.langsdorf@amd.com
+ *
+ * Based on the powernow-k7.c module written by Dave Jones.
+ * (C) 2003 Dave Jones on behalf of SuSE Labs
+ * (C) 2004 Dominik Brodowski <linux@brodo.de>
+ * (C) 2004 Pavel Machek <pavel@ucw.cz>
+ * Licensed under the terms of the GNU GPL License version 2.
+ * Based upon datasheets & sample CPUs kindly provided by AMD.
+ *
+ * Valuable input gratefully received from Dave Jones, Pavel Machek,
+ * Dominik Brodowski, Jacob Shin, and others.
+ * Originally developed by Paul Devriendt.
+ * Processor information obtained from Chapter 9 (Power and Thermal Management)
+ * of the "BIOS and Kernel Developer's Guide for the AMD Athlon 64 and AMD
+ * Opteron Processors" available for download from www.amd.com
+ *
+ * Tables for specific CPUs can be inferred from
+ * http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf
+ */
+
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/cpumask.h>
+#include <linux/sched.h> /* for current / set_cpus_allowed() */
+#include <linux/io.h>
+#include <linux/delay.h>
+
+#include <asm/msr.h>
+
+#include <linux/acpi.h>
+#include <linux/mutex.h>
+#include <acpi/processor.h>
+
+#define PFX "powernow-k8: "
+#define VERSION "version 2.20.00"
+#include "powernow-k8.h"
+#include "mperf.h"
+
+/* serialize freq changes */
+static DEFINE_MUTEX(fidvid_mutex);
+
+static DEFINE_PER_CPU(struct powernow_k8_data *, powernow_data);
+
+static int cpu_family = CPU_OPTERON;
+
+/* core performance boost */
+static bool cpb_capable, cpb_enabled;
+static struct msr __percpu *msrs;
+
+static struct cpufreq_driver cpufreq_amd64_driver;
+
+#ifndef CONFIG_SMP
+static inline const struct cpumask *cpu_core_mask(int cpu)
+{
+ return cpumask_of(0);
+}
+#endif
+
+/* Return a frequency in MHz, given an input fid */
+static u32 find_freq_from_fid(u32 fid)
+{
+ return 800 + (fid * 100);
+}
+
+/* Return a frequency in KHz, given an input fid */
+static u32 find_khz_freq_from_fid(u32 fid)
+{
+ return 1000 * find_freq_from_fid(fid);
+}
+
+static u32 find_khz_freq_from_pstate(struct cpufreq_frequency_table *data,
+ u32 pstate)
+{
+ return data[pstate].frequency;
+}
+
+/* Return the vco fid for an input fid
+ *
+ * Each "low" fid has corresponding "high" fid, and you can get to "low" fids
+ * only from corresponding high fids. This returns "high" fid corresponding to
+ * "low" one.
+ */
+static u32 convert_fid_to_vco_fid(u32 fid)
+{
+ if (fid < HI_FID_TABLE_BOTTOM)
+ return 8 + (2 * fid);
+ else
+ return fid;
+}
+
+/*
+ * Return 1 if the pending bit is set. Unless we just instructed the processor
+ * to transition to a new state, seeing this bit set is really bad news.
+ */
+static int pending_bit_stuck(void)
+{
+ u32 lo, hi;
+
+ if (cpu_family == CPU_HW_PSTATE)
+ return 0;
+
+ rdmsr(MSR_FIDVID_STATUS, lo, hi);
+ return lo & MSR_S_LO_CHANGE_PENDING ? 1 : 0;
+}
+
+/*
+ * Update the global current fid / vid values from the status msr.
+ * Returns 1 on error.
+ */
+static int query_current_values_with_pending_wait(struct powernow_k8_data *data)
+{
+ u32 lo, hi;
+ u32 i = 0;
+
+ if (cpu_family == CPU_HW_PSTATE) {
+ rdmsr(MSR_PSTATE_STATUS, lo, hi);
+ i = lo & HW_PSTATE_MASK;
+ data->currpstate = i;
+
+ /*
+ * a workaround for family 11h erratum 311 might cause
+ * an "out-of-range Pstate if the core is in Pstate-0
+ */
+ if ((boot_cpu_data.x86 == 0x11) && (i >= data->numps))
+ data->currpstate = HW_PSTATE_0;
+
+ return 0;
+ }
+ do {
+ if (i++ > 10000) {
+ pr_debug("detected change pending stuck\n");
+ return 1;
+ }
+ rdmsr(MSR_FIDVID_STATUS, lo, hi);
+ } while (lo & MSR_S_LO_CHANGE_PENDING);
+
+ data->currvid = hi & MSR_S_HI_CURRENT_VID;
+ data->currfid = lo & MSR_S_LO_CURRENT_FID;
+
+ return 0;
+}
+
+/* the isochronous relief time */
+static void count_off_irt(struct powernow_k8_data *data)
+{
+ udelay((1 << data->irt) * 10);
+ return;
+}
+
+/* the voltage stabilization time */
+static void count_off_vst(struct powernow_k8_data *data)
+{
+ udelay(data->vstable * VST_UNITS_20US);
+ return;
+}
+
+/* need to init the control msr to a safe value (for each cpu) */
+static void fidvid_msr_init(void)
+{
+ u32 lo, hi;
+ u8 fid, vid;
+
+ rdmsr(MSR_FIDVID_STATUS, lo, hi);
+ vid = hi & MSR_S_HI_CURRENT_VID;
+ fid = lo & MSR_S_LO_CURRENT_FID;
+ lo = fid | (vid << MSR_C_LO_VID_SHIFT);
+ hi = MSR_C_HI_STP_GNT_BENIGN;
+ pr_debug("cpu%d, init lo 0x%x, hi 0x%x\n", smp_processor_id(), lo, hi);
+ wrmsr(MSR_FIDVID_CTL, lo, hi);
+}
+
+/* write the new fid value along with the other control fields to the msr */
+static int write_new_fid(struct powernow_k8_data *data, u32 fid)
+{
+ u32 lo;
+ u32 savevid = data->currvid;
+ u32 i = 0;
+
+ if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
+ printk(KERN_ERR PFX "internal error - overflow on fid write\n");
+ return 1;
+ }
+
+ lo = fid;
+ lo |= (data->currvid << MSR_C_LO_VID_SHIFT);
+ lo |= MSR_C_LO_INIT_FID_VID;
+
+ pr_debug("writing fid 0x%x, lo 0x%x, hi 0x%x\n",
+ fid, lo, data->plllock * PLL_LOCK_CONVERSION);
+
+ do {
+ wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
+ if (i++ > 100) {
+ printk(KERN_ERR PFX
+ "Hardware error - pending bit very stuck - "
+ "no further pstate changes possible\n");
+ return 1;
+ }
+ } while (query_current_values_with_pending_wait(data));
+
+ count_off_irt(data);
+
+ if (savevid != data->currvid) {
+ printk(KERN_ERR PFX
+ "vid change on fid trans, old 0x%x, new 0x%x\n",
+ savevid, data->currvid);
+ return 1;
+ }
+
+ if (fid != data->currfid) {
+ printk(KERN_ERR PFX
+ "fid trans failed, fid 0x%x, curr 0x%x\n", fid,
+ data->currfid);
+ return 1;
+ }
+
+ return 0;
+}
+
+/* Write a new vid to the hardware */
+static int write_new_vid(struct powernow_k8_data *data, u32 vid)
+{
+ u32 lo;
+ u32 savefid = data->currfid;
+ int i = 0;
+
+ if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
+ printk(KERN_ERR PFX "internal error - overflow on vid write\n");
+ return 1;
+ }
+
+ lo = data->currfid;
+ lo |= (vid << MSR_C_LO_VID_SHIFT);
+ lo |= MSR_C_LO_INIT_FID_VID;
+
+ pr_debug("writing vid 0x%x, lo 0x%x, hi 0x%x\n",
+ vid, lo, STOP_GRANT_5NS);
+
+ do {
+ wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
+ if (i++ > 100) {
+ printk(KERN_ERR PFX "internal error - pending bit "
+ "very stuck - no further pstate "
+ "changes possible\n");
+ return 1;
+ }
+ } while (query_current_values_with_pending_wait(data));
+
+ if (savefid != data->currfid) {
+ printk(KERN_ERR PFX "fid changed on vid trans, old "
+ "0x%x new 0x%x\n",
+ savefid, data->currfid);
+ return 1;
+ }
+
+ if (vid != data->currvid) {
+ printk(KERN_ERR PFX "vid trans failed, vid 0x%x, "
+ "curr 0x%x\n",
+ vid, data->currvid);
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * Reduce the vid by the max of step or reqvid.
+ * Decreasing vid codes represent increasing voltages:
+ * vid of 0 is 1.550V, vid of 0x1e is 0.800V, vid of VID_OFF is off.
+ */
+static int decrease_vid_code_by_step(struct powernow_k8_data *data,
+ u32 reqvid, u32 step)
+{
+ if ((data->currvid - reqvid) > step)
+ reqvid = data->currvid - step;
+
+ if (write_new_vid(data, reqvid))
+ return 1;
+
+ count_off_vst(data);
+
+ return 0;
+}
+
+/* Change hardware pstate by single MSR write */
+static int transition_pstate(struct powernow_k8_data *data, u32 pstate)
+{
+ wrmsr(MSR_PSTATE_CTRL, pstate, 0);
+ data->currpstate = pstate;
+ return 0;
+}
+
+/* Change Opteron/Athlon64 fid and vid, by the 3 phases. */
+static int transition_fid_vid(struct powernow_k8_data *data,
+ u32 reqfid, u32 reqvid)
+{
+ if (core_voltage_pre_transition(data, reqvid, reqfid))
+ return 1;
+
+ if (core_frequency_transition(data, reqfid))
+ return 1;
+
+ if (core_voltage_post_transition(data, reqvid))
+ return 1;
+
+ if (query_current_values_with_pending_wait(data))
+ return 1;
+
+ if ((reqfid != data->currfid) || (reqvid != data->currvid)) {
+ printk(KERN_ERR PFX "failed (cpu%d): req 0x%x 0x%x, "
+ "curr 0x%x 0x%x\n",
+ smp_processor_id(),
+ reqfid, reqvid, data->currfid, data->currvid);
+ return 1;
+ }
+
+ pr_debug("transitioned (cpu%d): new fid 0x%x, vid 0x%x\n",
+ smp_processor_id(), data->currfid, data->currvid);
+
+ return 0;
+}
+
+/* Phase 1 - core voltage transition ... setup voltage */
+static int core_voltage_pre_transition(struct powernow_k8_data *data,
+ u32 reqvid, u32 reqfid)
+{
+ u32 rvosteps = data->rvo;
+ u32 savefid = data->currfid;
+ u32 maxvid, lo, rvomult = 1;
+
+ pr_debug("ph1 (cpu%d): start, currfid 0x%x, currvid 0x%x, "
+ "reqvid 0x%x, rvo 0x%x\n",
+ smp_processor_id(),
+ data->currfid, data->currvid, reqvid, data->rvo);
+
+ if ((savefid < LO_FID_TABLE_TOP) && (reqfid < LO_FID_TABLE_TOP))
+ rvomult = 2;
+ rvosteps *= rvomult;
+ rdmsr(MSR_FIDVID_STATUS, lo, maxvid);
+ maxvid = 0x1f & (maxvid >> 16);
+ pr_debug("ph1 maxvid=0x%x\n", maxvid);
+ if (reqvid < maxvid) /* lower numbers are higher voltages */
+ reqvid = maxvid;
+
+ while (data->currvid > reqvid) {
+ pr_debug("ph1: curr 0x%x, req vid 0x%x\n",
+ data->currvid, reqvid);
+ if (decrease_vid_code_by_step(data, reqvid, data->vidmvs))
+ return 1;
+ }
+
+ while ((rvosteps > 0) &&
+ ((rvomult * data->rvo + data->currvid) > reqvid)) {
+ if (data->currvid == maxvid) {
+ rvosteps = 0;
+ } else {
+ pr_debug("ph1: changing vid for rvo, req 0x%x\n",
+ data->currvid - 1);
+ if (decrease_vid_code_by_step(data, data->currvid-1, 1))
+ return 1;
+ rvosteps--;
+ }
+ }
+
+ if (query_current_values_with_pending_wait(data))
+ return 1;
+
+ if (savefid != data->currfid) {
+ printk(KERN_ERR PFX "ph1 err, currfid changed 0x%x\n",
+ data->currfid);
+ return 1;
+ }
+
+ pr_debug("ph1 complete, currfid 0x%x, currvid 0x%x\n",
+ data->currfid, data->currvid);
+
+ return 0;
+}
+
+/* Phase 2 - core frequency transition */
+static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid)
+{
+ u32 vcoreqfid, vcocurrfid, vcofiddiff;
+ u32 fid_interval, savevid = data->currvid;
+
+ if (data->currfid == reqfid) {
+ printk(KERN_ERR PFX "ph2 null fid transition 0x%x\n",
+ data->currfid);
+ return 0;
+ }
+
+ pr_debug("ph2 (cpu%d): starting, currfid 0x%x, currvid 0x%x, "
+ "reqfid 0x%x\n",
+ smp_processor_id(),
+ data->currfid, data->currvid, reqfid);
+
+ vcoreqfid = convert_fid_to_vco_fid(reqfid);
+ vcocurrfid = convert_fid_to_vco_fid(data->currfid);
+ vcofiddiff = vcocurrfid > vcoreqfid ? vcocurrfid - vcoreqfid
+ : vcoreqfid - vcocurrfid;
+
+ if ((reqfid <= LO_FID_TABLE_TOP) && (data->currfid <= LO_FID_TABLE_TOP))
+ vcofiddiff = 0;
+
+ while (vcofiddiff > 2) {
+ (data->currfid & 1) ? (fid_interval = 1) : (fid_interval = 2);
+
+ if (reqfid > data->currfid) {
+ if (data->currfid > LO_FID_TABLE_TOP) {
+ if (write_new_fid(data,
+ data->currfid + fid_interval))
+ return 1;
+ } else {
+ if (write_new_fid
+ (data,
+ 2 + convert_fid_to_vco_fid(data->currfid)))
+ return 1;
+ }
+ } else {
+ if (write_new_fid(data, data->currfid - fid_interval))
+ return 1;
+ }
+
+ vcocurrfid = convert_fid_to_vco_fid(data->currfid);
+ vcofiddiff = vcocurrfid > vcoreqfid ? vcocurrfid - vcoreqfid
+ : vcoreqfid - vcocurrfid;
+ }
+
+ if (write_new_fid(data, reqfid))
+ return 1;
+
+ if (query_current_values_with_pending_wait(data))
+ return 1;
+
+ if (data->currfid != reqfid) {
+ printk(KERN_ERR PFX
+ "ph2: mismatch, failed fid transition, "
+ "curr 0x%x, req 0x%x\n",
+ data->currfid, reqfid);
+ return 1;
+ }
+
+ if (savevid != data->currvid) {
+ printk(KERN_ERR PFX "ph2: vid changed, save 0x%x, curr 0x%x\n",
+ savevid, data->currvid);
+ return 1;
+ }
+
+ pr_debug("ph2 complete, currfid 0x%x, currvid 0x%x\n",
+ data->currfid, data->currvid);
+
+ return 0;
+}
+
+/* Phase 3 - core voltage transition flow ... jump to the final vid. */
+static int core_voltage_post_transition(struct powernow_k8_data *data,
+ u32 reqvid)
+{
+ u32 savefid = data->currfid;
+ u32 savereqvid = reqvid;
+
+ pr_debug("ph3 (cpu%d): starting, currfid 0x%x, currvid 0x%x\n",
+ smp_processor_id(),
+ data->currfid, data->currvid);
+
+ if (reqvid != data->currvid) {
+ if (write_new_vid(data, reqvid))
+ return 1;
+
+ if (savefid != data->currfid) {
+ printk(KERN_ERR PFX
+ "ph3: bad fid change, save 0x%x, curr 0x%x\n",
+ savefid, data->currfid);
+ return 1;
+ }
+
+ if (data->currvid != reqvid) {
+ printk(KERN_ERR PFX
+ "ph3: failed vid transition\n, "
+ "req 0x%x, curr 0x%x",
+ reqvid, data->currvid);
+ return 1;
+ }
+ }
+
+ if (query_current_values_with_pending_wait(data))
+ return 1;
+
+ if (savereqvid != data->currvid) {
+ pr_debug("ph3 failed, currvid 0x%x\n", data->currvid);
+ return 1;
+ }
+
+ if (savefid != data->currfid) {
+ pr_debug("ph3 failed, currfid changed 0x%x\n",
+ data->currfid);
+ return 1;
+ }
+
+ pr_debug("ph3 complete, currfid 0x%x, currvid 0x%x\n",
+ data->currfid, data->currvid);
+
+ return 0;
+}
+
+static void check_supported_cpu(void *_rc)
+{
+ u32 eax, ebx, ecx, edx;
+ int *rc = _rc;
+
+ *rc = -ENODEV;
+
+ if (__this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_AMD)
+ return;
+
+ eax = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
+ if (((eax & CPUID_XFAM) != CPUID_XFAM_K8) &&
+ ((eax & CPUID_XFAM) < CPUID_XFAM_10H))
+ return;
+
+ if ((eax & CPUID_XFAM) == CPUID_XFAM_K8) {
+ if (((eax & CPUID_USE_XFAM_XMOD) != CPUID_USE_XFAM_XMOD) ||
+ ((eax & CPUID_XMOD) > CPUID_XMOD_REV_MASK)) {
+ printk(KERN_INFO PFX
+ "Processor cpuid %x not supported\n", eax);
+ return;
+ }
+
+ eax = cpuid_eax(CPUID_GET_MAX_CAPABILITIES);
+ if (eax < CPUID_FREQ_VOLT_CAPABILITIES) {
+ printk(KERN_INFO PFX
+ "No frequency change capabilities detected\n");
+ return;
+ }
+
+ cpuid(CPUID_FREQ_VOLT_CAPABILITIES, &eax, &ebx, &ecx, &edx);
+ if ((edx & P_STATE_TRANSITION_CAPABLE)
+ != P_STATE_TRANSITION_CAPABLE) {
+ printk(KERN_INFO PFX
+ "Power state transitions not supported\n");
+ return;
+ }
+ } else { /* must be a HW Pstate capable processor */
+ cpuid(CPUID_FREQ_VOLT_CAPABILITIES, &eax, &ebx, &ecx, &edx);
+ if ((edx & USE_HW_PSTATE) == USE_HW_PSTATE)
+ cpu_family = CPU_HW_PSTATE;
+ else
+ return;
+ }
+
+ *rc = 0;
+}
+
+static int check_pst_table(struct powernow_k8_data *data, struct pst_s *pst,
+ u8 maxvid)
+{
+ unsigned int j;
+ u8 lastfid = 0xff;
+
+ for (j = 0; j < data->numps; j++) {
+ if (pst[j].vid > LEAST_VID) {
+ printk(KERN_ERR FW_BUG PFX "vid %d invalid : 0x%x\n",
+ j, pst[j].vid);
+ return -EINVAL;
+ }
+ if (pst[j].vid < data->rvo) {
+ /* vid + rvo >= 0 */
+ printk(KERN_ERR FW_BUG PFX "0 vid exceeded with pstate"
+ " %d\n", j);
+ return -ENODEV;
+ }
+ if (pst[j].vid < maxvid + data->rvo) {
+ /* vid + rvo >= maxvid */
+ printk(KERN_ERR FW_BUG PFX "maxvid exceeded with pstate"
+ " %d\n", j);
+ return -ENODEV;
+ }
+ if (pst[j].fid > MAX_FID) {
+ printk(KERN_ERR FW_BUG PFX "maxfid exceeded with pstate"
+ " %d\n", j);
+ return -ENODEV;
+ }
+ if (j && (pst[j].fid < HI_FID_TABLE_BOTTOM)) {
+ /* Only first fid is allowed to be in "low" range */
+ printk(KERN_ERR FW_BUG PFX "two low fids - %d : "
+ "0x%x\n", j, pst[j].fid);
+ return -EINVAL;
+ }
+ if (pst[j].fid < lastfid)
+ lastfid = pst[j].fid;
+ }
+ if (lastfid & 1) {
+ printk(KERN_ERR FW_BUG PFX "lastfid invalid\n");
+ return -EINVAL;
+ }
+ if (lastfid > LO_FID_TABLE_TOP)
+ printk(KERN_INFO FW_BUG PFX
+ "first fid not from lo freq table\n");
+
+ return 0;
+}
+
+static void invalidate_entry(struct cpufreq_frequency_table *powernow_table,
+ unsigned int entry)
+{
+ powernow_table[entry].frequency = CPUFREQ_ENTRY_INVALID;
+}
+
+static void print_basics(struct powernow_k8_data *data)
+{
+ int j;
+ for (j = 0; j < data->numps; j++) {
+ if (data->powernow_table[j].frequency !=
+ CPUFREQ_ENTRY_INVALID) {
+ if (cpu_family == CPU_HW_PSTATE) {
+ printk(KERN_INFO PFX
+ " %d : pstate %d (%d MHz)\n", j,
+ data->powernow_table[j].index,
+ data->powernow_table[j].frequency/1000);
+ } else {
+ printk(KERN_INFO PFX
+ "fid 0x%x (%d MHz), vid 0x%x\n",
+ data->powernow_table[j].index & 0xff,
+ data->powernow_table[j].frequency/1000,
+ data->powernow_table[j].index >> 8);
+ }
+ }
+ }
+ if (data->batps)
+ printk(KERN_INFO PFX "Only %d pstates on battery\n",
+ data->batps);
+}
+
+static u32 freq_from_fid_did(u32 fid, u32 did)
+{
+ u32 mhz = 0;
+
+ if (boot_cpu_data.x86 == 0x10)
+ mhz = (100 * (fid + 0x10)) >> did;
+ else if (boot_cpu_data.x86 == 0x11)
+ mhz = (100 * (fid + 8)) >> did;
+ else
+ BUG();
+
+ return mhz * 1000;
+}
+
+static int fill_powernow_table(struct powernow_k8_data *data,
+ struct pst_s *pst, u8 maxvid)
+{
+ struct cpufreq_frequency_table *powernow_table;
+ unsigned int j;
+
+ if (data->batps) {
+ /* use ACPI support to get full speed on mains power */
+ printk(KERN_WARNING PFX
+ "Only %d pstates usable (use ACPI driver for full "
+ "range\n", data->batps);
+ data->numps = data->batps;
+ }
+
+ for (j = 1; j < data->numps; j++) {
+ if (pst[j-1].fid >= pst[j].fid) {
+ printk(KERN_ERR PFX "PST out of sequence\n");
+ return -EINVAL;
+ }
+ }
+
+ if (data->numps < 2) {
+ printk(KERN_ERR PFX "no p states to transition\n");
+ return -ENODEV;
+ }
+
+ if (check_pst_table(data, pst, maxvid))
+ return -EINVAL;
+
+ powernow_table = kmalloc((sizeof(struct cpufreq_frequency_table)
+ * (data->numps + 1)), GFP_KERNEL);
+ if (!powernow_table) {
+ printk(KERN_ERR PFX "powernow_table memory alloc failure\n");
+ return -ENOMEM;
+ }
+
+ for (j = 0; j < data->numps; j++) {
+ int freq;
+ powernow_table[j].index = pst[j].fid; /* lower 8 bits */
+ powernow_table[j].index |= (pst[j].vid << 8); /* upper 8 bits */
+ freq = find_khz_freq_from_fid(pst[j].fid);
+ powernow_table[j].frequency = freq;
+ }
+ powernow_table[data->numps].frequency = CPUFREQ_TABLE_END;
+ powernow_table[data->numps].index = 0;
+
+ if (query_current_values_with_pending_wait(data)) {
+ kfree(powernow_table);
+ return -EIO;
+ }
+
+ pr_debug("cfid 0x%x, cvid 0x%x\n", data->currfid, data->currvid);
+ data->powernow_table = powernow_table;
+ if (cpumask_first(cpu_core_mask(data->cpu)) == data->cpu)
+ print_basics(data);
+
+ for (j = 0; j < data->numps; j++)
+ if ((pst[j].fid == data->currfid) &&
+ (pst[j].vid == data->currvid))
+ return 0;
+
+ pr_debug("currfid/vid do not match PST, ignoring\n");
+ return 0;
+}
+
+/* Find and validate the PSB/PST table in BIOS. */
+static int find_psb_table(struct powernow_k8_data *data)
+{
+ struct psb_s *psb;
+ unsigned int i;
+ u32 mvs;
+ u8 maxvid;
+ u32 cpst = 0;
+ u32 thiscpuid;
+
+ for (i = 0xc0000; i < 0xffff0; i += 0x10) {
+ /* Scan BIOS looking for the signature. */
+ /* It can not be at ffff0 - it is too big. */
+
+ psb = phys_to_virt(i);
+ if (memcmp(psb, PSB_ID_STRING, PSB_ID_STRING_LEN) != 0)
+ continue;
+
+ pr_debug("found PSB header at 0x%p\n", psb);
+
+ pr_debug("table vers: 0x%x\n", psb->tableversion);
+ if (psb->tableversion != PSB_VERSION_1_4) {
+ printk(KERN_ERR FW_BUG PFX "PSB table is not v1.4\n");
+ return -ENODEV;
+ }
+
+ pr_debug("flags: 0x%x\n", psb->flags1);
+ if (psb->flags1) {
+ printk(KERN_ERR FW_BUG PFX "unknown flags\n");
+ return -ENODEV;
+ }
+
+ data->vstable = psb->vstable;
+ pr_debug("voltage stabilization time: %d(*20us)\n",
+ data->vstable);
+
+ pr_debug("flags2: 0x%x\n", psb->flags2);
+ data->rvo = psb->flags2 & 3;
+ data->irt = ((psb->flags2) >> 2) & 3;
+ mvs = ((psb->flags2) >> 4) & 3;
+ data->vidmvs = 1 << mvs;
+ data->batps = ((psb->flags2) >> 6) & 3;
+
+ pr_debug("ramp voltage offset: %d\n", data->rvo);
+ pr_debug("isochronous relief time: %d\n", data->irt);
+ pr_debug("maximum voltage step: %d - 0x%x\n", mvs, data->vidmvs);
+
+ pr_debug("numpst: 0x%x\n", psb->num_tables);
+ cpst = psb->num_tables;
+ if ((psb->cpuid == 0x00000fc0) ||
+ (psb->cpuid == 0x00000fe0)) {
+ thiscpuid = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
+ if ((thiscpuid == 0x00000fc0) ||
+ (thiscpuid == 0x00000fe0))
+ cpst = 1;
+ }
+ if (cpst != 1) {
+ printk(KERN_ERR FW_BUG PFX "numpst must be 1\n");
+ return -ENODEV;
+ }
+
+ data->plllock = psb->plllocktime;
+ pr_debug("plllocktime: 0x%x (units 1us)\n", psb->plllocktime);
+ pr_debug("maxfid: 0x%x\n", psb->maxfid);
+ pr_debug("maxvid: 0x%x\n", psb->maxvid);
+ maxvid = psb->maxvid;
+
+ data->numps = psb->numps;
+ pr_debug("numpstates: 0x%x\n", data->numps);
+ return fill_powernow_table(data,
+ (struct pst_s *)(psb+1), maxvid);
+ }
+ /*
+ * If you see this message, complain to BIOS manufacturer. If
+ * he tells you "we do not support Linux" or some similar
+ * nonsense, remember that Windows 2000 uses the same legacy
+ * mechanism that the old Linux PSB driver uses. Tell them it
+ * is broken with Windows 2000.
+ *
+ * The reference to the AMD documentation is chapter 9 in the
+ * BIOS and Kernel Developer's Guide, which is available on
+ * www.amd.com
+ */
+ printk(KERN_ERR FW_BUG PFX "No PSB or ACPI _PSS objects\n");
+ printk(KERN_ERR PFX "Make sure that your BIOS is up to date"
+ " and Cool'N'Quiet support is enabled in BIOS setup\n");
+ return -ENODEV;
+}
+
+static void powernow_k8_acpi_pst_values(struct powernow_k8_data *data,
+ unsigned int index)
+{
+ u64 control;
+
+ if (!data->acpi_data.state_count || (cpu_family == CPU_HW_PSTATE))
+ return;
+
+ control = data->acpi_data.states[index].control;
+ data->irt = (control >> IRT_SHIFT) & IRT_MASK;
+ data->rvo = (control >> RVO_SHIFT) & RVO_MASK;
+ data->exttype = (control >> EXT_TYPE_SHIFT) & EXT_TYPE_MASK;
+ data->plllock = (control >> PLL_L_SHIFT) & PLL_L_MASK;
+ data->vidmvs = 1 << ((control >> MVS_SHIFT) & MVS_MASK);
+ data->vstable = (control >> VST_SHIFT) & VST_MASK;
+}
+
+static int powernow_k8_cpu_init_acpi(struct powernow_k8_data *data)
+{
+ struct cpufreq_frequency_table *powernow_table;
+ int ret_val = -ENODEV;
+ u64 control, status;
+
+ if (acpi_processor_register_performance(&data->acpi_data, data->cpu)) {
+ pr_debug("register performance failed: bad ACPI data\n");
+ return -EIO;
+ }
+
+ /* verify the data contained in the ACPI structures */
+ if (data->acpi_data.state_count <= 1) {
+ pr_debug("No ACPI P-States\n");
+ goto err_out;
+ }
+
+ control = data->acpi_data.control_register.space_id;
+ status = data->acpi_data.status_register.space_id;
+
+ if ((control != ACPI_ADR_SPACE_FIXED_HARDWARE) ||
+ (status != ACPI_ADR_SPACE_FIXED_HARDWARE)) {
+ pr_debug("Invalid control/status registers (%llx - %llx)\n",
+ control, status);
+ goto err_out;
+ }
+
+ /* fill in data->powernow_table */
+ powernow_table = kmalloc((sizeof(struct cpufreq_frequency_table)
+ * (data->acpi_data.state_count + 1)), GFP_KERNEL);
+ if (!powernow_table) {
+ pr_debug("powernow_table memory alloc failure\n");
+ goto err_out;
+ }
+
+ /* fill in data */
+ data->numps = data->acpi_data.state_count;
+ powernow_k8_acpi_pst_values(data, 0);
+
+ if (cpu_family == CPU_HW_PSTATE)
+ ret_val = fill_powernow_table_pstate(data, powernow_table);
+ else
+ ret_val = fill_powernow_table_fidvid(data, powernow_table);
+ if (ret_val)
+ goto err_out_mem;
+
+ powernow_table[data->acpi_data.state_count].frequency =
+ CPUFREQ_TABLE_END;
+ powernow_table[data->acpi_data.state_count].index = 0;
+ data->powernow_table = powernow_table;
+
+ if (cpumask_first(cpu_core_mask(data->cpu)) == data->cpu)
+ print_basics(data);
+
+ /* notify BIOS that we exist */
+ acpi_processor_notify_smm(THIS_MODULE);
+
+ if (!zalloc_cpumask_var(&data->acpi_data.shared_cpu_map, GFP_KERNEL)) {
+ printk(KERN_ERR PFX
+ "unable to alloc powernow_k8_data cpumask\n");
+ ret_val = -ENOMEM;
+ goto err_out_mem;
+ }
+
+ return 0;
+
+err_out_mem:
+ kfree(powernow_table);
+
+err_out:
+ acpi_processor_unregister_performance(&data->acpi_data, data->cpu);
+
+ /* data->acpi_data.state_count informs us at ->exit()
+ * whether ACPI was used */
+ data->acpi_data.state_count = 0;
+
+ return ret_val;
+}
+
+static int fill_powernow_table_pstate(struct powernow_k8_data *data,
+ struct cpufreq_frequency_table *powernow_table)
+{
+ int i;
+ u32 hi = 0, lo = 0;
+ rdmsr(MSR_PSTATE_CUR_LIMIT, lo, hi);
+ data->max_hw_pstate = (lo & HW_PSTATE_MAX_MASK) >> HW_PSTATE_MAX_SHIFT;
+
+ for (i = 0; i < data->acpi_data.state_count; i++) {
+ u32 index;
+
+ index = data->acpi_data.states[i].control & HW_PSTATE_MASK;
+ if (index > data->max_hw_pstate) {
+ printk(KERN_ERR PFX "invalid pstate %d - "
+ "bad value %d.\n", i, index);
+ printk(KERN_ERR PFX "Please report to BIOS "
+ "manufacturer\n");
+ invalidate_entry(powernow_table, i);
+ continue;
+ }
+ rdmsr(MSR_PSTATE_DEF_BASE + index, lo, hi);
+ if (!(hi & HW_PSTATE_VALID_MASK)) {
+ pr_debug("invalid pstate %d, ignoring\n", index);
+ invalidate_entry(powernow_table, i);
+ continue;
+ }
+
+ powernow_table[i].index = index;
+
+ /* Frequency may be rounded for these */
+ if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
+ || boot_cpu_data.x86 == 0x11) {
+ powernow_table[i].frequency =
+ freq_from_fid_did(lo & 0x3f, (lo >> 6) & 7);
+ } else
+ powernow_table[i].frequency =
+ data->acpi_data.states[i].core_frequency * 1000;
+ }
+ return 0;
+}
+
+static int fill_powernow_table_fidvid(struct powernow_k8_data *data,
+ struct cpufreq_frequency_table *powernow_table)
+{
+ int i;
+
+ for (i = 0; i < data->acpi_data.state_count; i++) {
+ u32 fid;
+ u32 vid;
+ u32 freq, index;
+ u64 status, control;
+
+ if (data->exttype) {
+ status = data->acpi_data.states[i].status;
+ fid = status & EXT_FID_MASK;
+ vid = (status >> VID_SHIFT) & EXT_VID_MASK;
+ } else {
+ control = data->acpi_data.states[i].control;
+ fid = control & FID_MASK;
+ vid = (control >> VID_SHIFT) & VID_MASK;
+ }
+
+ pr_debug(" %d : fid 0x%x, vid 0x%x\n", i, fid, vid);
+
+ index = fid | (vid<<8);
+ powernow_table[i].index = index;
+
+ freq = find_khz_freq_from_fid(fid);
+ powernow_table[i].frequency = freq;
+
+ /* verify frequency is OK */
+ if ((freq > (MAX_FREQ * 1000)) || (freq < (MIN_FREQ * 1000))) {
+ pr_debug("invalid freq %u kHz, ignoring\n", freq);
+ invalidate_entry(powernow_table, i);
+ continue;
+ }
+
+ /* verify voltage is OK -
+ * BIOSs are using "off" to indicate invalid */
+ if (vid == VID_OFF) {
+ pr_debug("invalid vid %u, ignoring\n", vid);
+ invalidate_entry(powernow_table, i);
+ continue;
+ }
+
+ if (freq != (data->acpi_data.states[i].core_frequency * 1000)) {
+ printk(KERN_INFO PFX "invalid freq entries "
+ "%u kHz vs. %u kHz\n", freq,
+ (unsigned int)
+ (data->acpi_data.states[i].core_frequency
+ * 1000));
+ invalidate_entry(powernow_table, i);
+ continue;
+ }
+ }
+ return 0;
+}
+
+static void powernow_k8_cpu_exit_acpi(struct powernow_k8_data *data)
+{
+ if (data->acpi_data.state_count)
+ acpi_processor_unregister_performance(&data->acpi_data,
+ data->cpu);
+ free_cpumask_var(data->acpi_data.shared_cpu_map);
+}
+
+static int get_transition_latency(struct powernow_k8_data *data)
+{
+ int max_latency = 0;
+ int i;
+ for (i = 0; i < data->acpi_data.state_count; i++) {
+ int cur_latency = data->acpi_data.states[i].transition_latency
+ + data->acpi_data.states[i].bus_master_latency;
+ if (cur_latency > max_latency)
+ max_latency = cur_latency;
+ }
+ if (max_latency == 0) {
+ /*
+ * Fam 11h and later may return 0 as transition latency. This
+ * is intended and means "very fast". While cpufreq core and
+ * governors currently can handle that gracefully, better set it
+ * to 1 to avoid problems in the future.
+ */
+ if (boot_cpu_data.x86 < 0x11)
+ printk(KERN_ERR FW_WARN PFX "Invalid zero transition "
+ "latency\n");
+ max_latency = 1;
+ }
+ /* value in usecs, needs to be in nanoseconds */
+ return 1000 * max_latency;
+}
+
+/* Take a frequency, and issue the fid/vid transition command */
+static int transition_frequency_fidvid(struct powernow_k8_data *data,
+ unsigned int index)
+{
+ u32 fid = 0;
+ u32 vid = 0;
+ int res, i;
+ struct cpufreq_freqs freqs;
+
+ pr_debug("cpu %d transition to index %u\n", smp_processor_id(), index);
+
+ /* fid/vid correctness check for k8 */
+ /* fid are the lower 8 bits of the index we stored into
+ * the cpufreq frequency table in find_psb_table, vid
+ * are the upper 8 bits.
+ */
+ fid = data->powernow_table[index].index & 0xFF;
+ vid = (data->powernow_table[index].index & 0xFF00) >> 8;
+
+ pr_debug("table matched fid 0x%x, giving vid 0x%x\n", fid, vid);
+
+ if (query_current_values_with_pending_wait(data))
+ return 1;
+
+ if ((data->currvid == vid) && (data->currfid == fid)) {
+ pr_debug("target matches current values (fid 0x%x, vid 0x%x)\n",
+ fid, vid);
+ return 0;
+ }
+
+ pr_debug("cpu %d, changing to fid 0x%x, vid 0x%x\n",
+ smp_processor_id(), fid, vid);
+ freqs.old = find_khz_freq_from_fid(data->currfid);
+ freqs.new = find_khz_freq_from_fid(fid);
+
+ for_each_cpu(i, data->available_cores) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ }
+
+ res = transition_fid_vid(data, fid, vid);
+ freqs.new = find_khz_freq_from_fid(data->currfid);
+
+ for_each_cpu(i, data->available_cores) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+ return res;
+}
+
+/* Take a frequency, and issue the hardware pstate transition command */
+static int transition_frequency_pstate(struct powernow_k8_data *data,
+ unsigned int index)
+{
+ u32 pstate = 0;
+ int res, i;
+ struct cpufreq_freqs freqs;
+
+ pr_debug("cpu %d transition to index %u\n", smp_processor_id(), index);
+
+ /* get MSR index for hardware pstate transition */
+ pstate = index & HW_PSTATE_MASK;
+ if (pstate > data->max_hw_pstate)
+ return 0;
+ freqs.old = find_khz_freq_from_pstate(data->powernow_table,
+ data->currpstate);
+ freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
+
+ for_each_cpu(i, data->available_cores) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ }
+
+ res = transition_pstate(data, pstate);
+ freqs.new = find_khz_freq_from_pstate(data->powernow_table, pstate);
+
+ for_each_cpu(i, data->available_cores) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+ return res;
+}
+
+/* Driver entry point to switch to the target frequency */
+static int powernowk8_target(struct cpufreq_policy *pol,
+ unsigned targfreq, unsigned relation)
+{
+ cpumask_var_t oldmask;
+ struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
+ u32 checkfid;
+ u32 checkvid;
+ unsigned int newstate;
+ int ret = -EIO;
+
+ if (!data)
+ return -EINVAL;
+
+ checkfid = data->currfid;
+ checkvid = data->currvid;
+
+ /* only run on specific CPU from here on. */
+ /* This is poor form: use a workqueue or smp_call_function_single */
+ if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_copy(oldmask, tsk_cpus_allowed(current));
+ set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
+
+ if (smp_processor_id() != pol->cpu) {
+ printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
+ goto err_out;
+ }
+
+ if (pending_bit_stuck()) {
+ printk(KERN_ERR PFX "failing targ, change pending bit set\n");
+ goto err_out;
+ }
+
+ pr_debug("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
+ pol->cpu, targfreq, pol->min, pol->max, relation);
+
+ if (query_current_values_with_pending_wait(data))
+ goto err_out;
+
+ if (cpu_family != CPU_HW_PSTATE) {
+ pr_debug("targ: curr fid 0x%x, vid 0x%x\n",
+ data->currfid, data->currvid);
+
+ if ((checkvid != data->currvid) ||
+ (checkfid != data->currfid)) {
+ printk(KERN_INFO PFX
+ "error - out of sync, fix 0x%x 0x%x, "
+ "vid 0x%x 0x%x\n",
+ checkfid, data->currfid,
+ checkvid, data->currvid);
+ }
+ }
+
+ if (cpufreq_frequency_table_target(pol, data->powernow_table,
+ targfreq, relation, &newstate))
+ goto err_out;
+
+ mutex_lock(&fidvid_mutex);
+
+ powernow_k8_acpi_pst_values(data, newstate);
+
+ if (cpu_family == CPU_HW_PSTATE)
+ ret = transition_frequency_pstate(data, newstate);
+ else
+ ret = transition_frequency_fidvid(data, newstate);
+ if (ret) {
+ printk(KERN_ERR PFX "transition frequency failed\n");
+ ret = 1;
+ mutex_unlock(&fidvid_mutex);
+ goto err_out;
+ }
+ mutex_unlock(&fidvid_mutex);
+
+ if (cpu_family == CPU_HW_PSTATE)
+ pol->cur = find_khz_freq_from_pstate(data->powernow_table,
+ newstate);
+ else
+ pol->cur = find_khz_freq_from_fid(data->currfid);
+ ret = 0;
+
+err_out:
+ set_cpus_allowed_ptr(current, oldmask);
+ free_cpumask_var(oldmask);
+ return ret;
+}
+
+/* Driver entry point to verify the policy and range of frequencies */
+static int powernowk8_verify(struct cpufreq_policy *pol)
+{
+ struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
+
+ if (!data)
+ return -EINVAL;
+
+ return cpufreq_frequency_table_verify(pol, data->powernow_table);
+}
+
+struct init_on_cpu {
+ struct powernow_k8_data *data;
+ int rc;
+};
+
+static void __cpuinit powernowk8_cpu_init_on_cpu(void *_init_on_cpu)
+{
+ struct init_on_cpu *init_on_cpu = _init_on_cpu;
+
+ if (pending_bit_stuck()) {
+ printk(KERN_ERR PFX "failing init, change pending bit set\n");
+ init_on_cpu->rc = -ENODEV;
+ return;
+ }
+
+ if (query_current_values_with_pending_wait(init_on_cpu->data)) {
+ init_on_cpu->rc = -ENODEV;
+ return;
+ }
+
+ if (cpu_family == CPU_OPTERON)
+ fidvid_msr_init();
+
+ init_on_cpu->rc = 0;
+}
+
+/* per CPU init entry point to the driver */
+static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
+{
+ static const char ACPI_PSS_BIOS_BUG_MSG[] =
+ KERN_ERR FW_BUG PFX "No compatible ACPI _PSS objects found.\n"
+ FW_BUG PFX "Try again with latest BIOS.\n";
+ struct powernow_k8_data *data;
+ struct init_on_cpu init_on_cpu;
+ int rc;
+ struct cpuinfo_x86 *c = &cpu_data(pol->cpu);
+
+ if (!cpu_online(pol->cpu))
+ return -ENODEV;
+
+ smp_call_function_single(pol->cpu, check_supported_cpu, &rc, 1);
+ if (rc)
+ return -ENODEV;
+
+ data = kzalloc(sizeof(struct powernow_k8_data), GFP_KERNEL);
+ if (!data) {
+ printk(KERN_ERR PFX "unable to alloc powernow_k8_data");
+ return -ENOMEM;
+ }
+
+ data->cpu = pol->cpu;
+ data->currpstate = HW_PSTATE_INVALID;
+
+ if (powernow_k8_cpu_init_acpi(data)) {
+ /*
+ * Use the PSB BIOS structure. This is only available on
+ * an UP version, and is deprecated by AMD.
+ */
+ if (num_online_cpus() != 1) {
+ printk_once(ACPI_PSS_BIOS_BUG_MSG);
+ goto err_out;
+ }
+ if (pol->cpu != 0) {
+ printk(KERN_ERR FW_BUG PFX "No ACPI _PSS objects for "
+ "CPU other than CPU0. Complain to your BIOS "
+ "vendor.\n");
+ goto err_out;
+ }
+ rc = find_psb_table(data);
+ if (rc)
+ goto err_out;
+
+ /* Take a crude guess here.
+ * That guess was in microseconds, so multiply with 1000 */
+ pol->cpuinfo.transition_latency = (
+ ((data->rvo + 8) * data->vstable * VST_UNITS_20US) +
+ ((1 << data->irt) * 30)) * 1000;
+ } else /* ACPI _PSS objects available */
+ pol->cpuinfo.transition_latency = get_transition_latency(data);
+
+ /* only run on specific CPU from here on */
+ init_on_cpu.data = data;
+ smp_call_function_single(data->cpu, powernowk8_cpu_init_on_cpu,
+ &init_on_cpu, 1);
+ rc = init_on_cpu.rc;
+ if (rc != 0)
+ goto err_out_exit_acpi;
+
+ if (cpu_family == CPU_HW_PSTATE)
+ cpumask_copy(pol->cpus, cpumask_of(pol->cpu));
+ else
+ cpumask_copy(pol->cpus, cpu_core_mask(pol->cpu));
+ data->available_cores = pol->cpus;
+
+ if (cpu_family == CPU_HW_PSTATE)
+ pol->cur = find_khz_freq_from_pstate(data->powernow_table,
+ data->currpstate);
+ else
+ pol->cur = find_khz_freq_from_fid(data->currfid);
+ pr_debug("policy current frequency %d kHz\n", pol->cur);
+
+ /* min/max the cpu is capable of */
+ if (cpufreq_frequency_table_cpuinfo(pol, data->powernow_table)) {
+ printk(KERN_ERR FW_BUG PFX "invalid powernow_table\n");
+ powernow_k8_cpu_exit_acpi(data);
+ kfree(data->powernow_table);
+ kfree(data);
+ return -EINVAL;
+ }
+
+ /* Check for APERF/MPERF support in hardware */
+ if (cpu_has(c, X86_FEATURE_APERFMPERF))
+ cpufreq_amd64_driver.getavg = cpufreq_get_measured_perf;
+
+ cpufreq_frequency_table_get_attr(data->powernow_table, pol->cpu);
+
+ if (cpu_family == CPU_HW_PSTATE)
+ pr_debug("cpu_init done, current pstate 0x%x\n",
+ data->currpstate);
+ else
+ pr_debug("cpu_init done, current fid 0x%x, vid 0x%x\n",
+ data->currfid, data->currvid);
+
+ per_cpu(powernow_data, pol->cpu) = data;
+
+ return 0;
+
+err_out_exit_acpi:
+ powernow_k8_cpu_exit_acpi(data);
+
+err_out:
+ kfree(data);
+ return -ENODEV;
+}
+
+static int __devexit powernowk8_cpu_exit(struct cpufreq_policy *pol)
+{
+ struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
+
+ if (!data)
+ return -EINVAL;
+
+ powernow_k8_cpu_exit_acpi(data);
+
+ cpufreq_frequency_table_put_attr(pol->cpu);
+
+ kfree(data->powernow_table);
+ kfree(data);
+ per_cpu(powernow_data, pol->cpu) = NULL;
+
+ return 0;
+}
+
+static void query_values_on_cpu(void *_err)
+{
+ int *err = _err;
+ struct powernow_k8_data *data = __this_cpu_read(powernow_data);
+
+ *err = query_current_values_with_pending_wait(data);
+}
+
+static unsigned int powernowk8_get(unsigned int cpu)
+{
+ struct powernow_k8_data *data = per_cpu(powernow_data, cpu);
+ unsigned int khz = 0;
+ int err;
+
+ if (!data)
+ return 0;
+
+ smp_call_function_single(cpu, query_values_on_cpu, &err, true);
+ if (err)
+ goto out;
+
+ if (cpu_family == CPU_HW_PSTATE)
+ khz = find_khz_freq_from_pstate(data->powernow_table,
+ data->currpstate);
+ else
+ khz = find_khz_freq_from_fid(data->currfid);
+
+
+out:
+ return khz;
+}
+
+static void _cpb_toggle_msrs(bool t)
+{
+ int cpu;
+
+ get_online_cpus();
+
+ rdmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
+
+ for_each_cpu(cpu, cpu_online_mask) {
+ struct msr *reg = per_cpu_ptr(msrs, cpu);
+ if (t)
+ reg->l &= ~BIT(25);
+ else
+ reg->l |= BIT(25);
+ }
+ wrmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
+
+ put_online_cpus();
+}
+
+/*
+ * Switch on/off core performance boosting.
+ *
+ * 0=disable
+ * 1=enable.
+ */
+static void cpb_toggle(bool t)
+{
+ if (!cpb_capable)
+ return;
+
+ if (t && !cpb_enabled) {
+ cpb_enabled = true;
+ _cpb_toggle_msrs(t);
+ printk(KERN_INFO PFX "Core Boosting enabled.\n");
+ } else if (!t && cpb_enabled) {
+ cpb_enabled = false;
+ _cpb_toggle_msrs(t);
+ printk(KERN_INFO PFX "Core Boosting disabled.\n");
+ }
+}
+
+static ssize_t store_cpb(struct cpufreq_policy *policy, const char *buf,
+ size_t count)
+{
+ int ret = -EINVAL;
+ unsigned long val = 0;
+
+ ret = strict_strtoul(buf, 10, &val);
+ if (!ret && (val == 0 || val == 1) && cpb_capable)
+ cpb_toggle(val);
+ else
+ return -EINVAL;
+
+ return count;
+}
+
+static ssize_t show_cpb(struct cpufreq_policy *policy, char *buf)
+{
+ return sprintf(buf, "%u\n", cpb_enabled);
+}
+
+#define define_one_rw(_name) \
+static struct freq_attr _name = \
+__ATTR(_name, 0644, show_##_name, store_##_name)
+
+define_one_rw(cpb);
+
+static struct freq_attr *powernow_k8_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ &cpb,
+ NULL,
+};
+
+static struct cpufreq_driver cpufreq_amd64_driver = {
+ .verify = powernowk8_verify,
+ .target = powernowk8_target,
+ .bios_limit = acpi_processor_get_bios_limit,
+ .init = powernowk8_cpu_init,
+ .exit = __devexit_p(powernowk8_cpu_exit),
+ .get = powernowk8_get,
+ .name = "powernow-k8",
+ .owner = THIS_MODULE,
+ .attr = powernow_k8_attr,
+};
+
+/*
+ * Clear the boost-disable flag on the CPU_DOWN path so that this cpu
+ * cannot block the remaining ones from boosting. On the CPU_UP path we
+ * simply keep the boost-disable flag in sync with the current global
+ * state.
+ */
+static int cpb_notify(struct notifier_block *nb, unsigned long action,
+ void *hcpu)
+{
+ unsigned cpu = (long)hcpu;
+ u32 lo, hi;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ case CPU_UP_PREPARE_FROZEN:
+
+ if (!cpb_enabled) {
+ rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
+ lo |= BIT(25);
+ wrmsr_on_cpu(cpu, MSR_K7_HWCR, lo, hi);
+ }
+ break;
+
+ case CPU_DOWN_PREPARE:
+ case CPU_DOWN_PREPARE_FROZEN:
+ rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
+ lo &= ~BIT(25);
+ wrmsr_on_cpu(cpu, MSR_K7_HWCR, lo, hi);
+ break;
+
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cpb_nb = {
+ .notifier_call = cpb_notify,
+};
+
+/* driver entry point for init */
+static int __cpuinit powernowk8_init(void)
+{
+ unsigned int i, supported_cpus = 0, cpu;
+ int rv;
+
+ for_each_online_cpu(i) {
+ int rc;
+ smp_call_function_single(i, check_supported_cpu, &rc, 1);
+ if (rc == 0)
+ supported_cpus++;
+ }
+
+ if (supported_cpus != num_online_cpus())
+ return -ENODEV;
+
+ printk(KERN_INFO PFX "Found %d %s (%d cpu cores) (" VERSION ")\n",
+ num_online_nodes(), boot_cpu_data.x86_model_id, supported_cpus);
+
+ if (boot_cpu_has(X86_FEATURE_CPB)) {
+
+ cpb_capable = true;
+
+ msrs = msrs_alloc();
+ if (!msrs) {
+ printk(KERN_ERR "%s: Error allocating msrs!\n", __func__);
+ return -ENOMEM;
+ }
+
+ register_cpu_notifier(&cpb_nb);
+
+ rdmsr_on_cpus(cpu_online_mask, MSR_K7_HWCR, msrs);
+
+ for_each_cpu(cpu, cpu_online_mask) {
+ struct msr *reg = per_cpu_ptr(msrs, cpu);
+ cpb_enabled |= !(!!(reg->l & BIT(25)));
+ }
+
+ printk(KERN_INFO PFX "Core Performance Boosting: %s.\n",
+ (cpb_enabled ? "on" : "off"));
+ }
+
+ rv = cpufreq_register_driver(&cpufreq_amd64_driver);
+ if (rv < 0 && boot_cpu_has(X86_FEATURE_CPB)) {
+ unregister_cpu_notifier(&cpb_nb);
+ msrs_free(msrs);
+ msrs = NULL;
+ }
+ return rv;
+}
+
+/* driver entry point for term */
+static void __exit powernowk8_exit(void)
+{
+ pr_debug("exit\n");
+
+ if (boot_cpu_has(X86_FEATURE_CPB)) {
+ msrs_free(msrs);
+ msrs = NULL;
+
+ unregister_cpu_notifier(&cpb_nb);
+ }
+
+ cpufreq_unregister_driver(&cpufreq_amd64_driver);
+}
+
+MODULE_AUTHOR("Paul Devriendt <paul.devriendt@amd.com> and "
+ "Mark Langsdorf <mark.langsdorf@amd.com>");
+MODULE_DESCRIPTION("AMD Athlon 64 and Opteron processor frequency driver.");
+MODULE_LICENSE("GPL");
+
+late_initcall(powernowk8_init);
+module_exit(powernowk8_exit);
--- /dev/null
+/*
+ * (c) 2003-2006 Advanced Micro Devices, Inc.
+ * Your use of this code is subject to the terms and conditions of the
+ * GNU general public license version 2. See "COPYING" or
+ * http://www.gnu.org/licenses/gpl.html
+ */
+
+enum pstate {
+ HW_PSTATE_INVALID = 0xff,
+ HW_PSTATE_0 = 0,
+ HW_PSTATE_1 = 1,
+ HW_PSTATE_2 = 2,
+ HW_PSTATE_3 = 3,
+ HW_PSTATE_4 = 4,
+ HW_PSTATE_5 = 5,
+ HW_PSTATE_6 = 6,
+ HW_PSTATE_7 = 7,
+};
+
+struct powernow_k8_data {
+ unsigned int cpu;
+
+ u32 numps; /* number of p-states */
+ u32 batps; /* number of p-states supported on battery */
+ u32 max_hw_pstate; /* maximum legal hardware pstate */
+
+ /* these values are constant when the PSB is used to determine
+ * vid/fid pairings, but are modified during the ->target() call
+ * when ACPI is used */
+ u32 rvo; /* ramp voltage offset */
+ u32 irt; /* isochronous relief time */
+ u32 vidmvs; /* usable value calculated from mvs */
+ u32 vstable; /* voltage stabilization time, units 20 us */
+ u32 plllock; /* pll lock time, units 1 us */
+ u32 exttype; /* extended interface = 1 */
+
+ /* keep track of the current fid / vid or pstate */
+ u32 currvid;
+ u32 currfid;
+ enum pstate currpstate;
+
+ /* the powernow_table includes all frequency and vid/fid pairings:
+ * fid are the lower 8 bits of the index, vid are the upper 8 bits.
+ * frequency is in kHz */
+ struct cpufreq_frequency_table *powernow_table;
+
+ /* the acpi table needs to be kept. it's only available if ACPI was
+ * used to determine valid frequency/vid/fid states */
+ struct acpi_processor_performance acpi_data;
+
+ /* we need to keep track of associated cores, but let cpufreq
+ * handle hotplug events - so just point at cpufreq pol->cpus
+ * structure */
+ struct cpumask *available_cores;
+};
+
+/* processor's cpuid instruction support */
+#define CPUID_PROCESSOR_SIGNATURE 1 /* function 1 */
+#define CPUID_XFAM 0x0ff00000 /* extended family */
+#define CPUID_XFAM_K8 0
+#define CPUID_XMOD 0x000f0000 /* extended model */
+#define CPUID_XMOD_REV_MASK 0x000c0000
+#define CPUID_XFAM_10H 0x00100000 /* family 0x10 */
+#define CPUID_USE_XFAM_XMOD 0x00000f00
+#define CPUID_GET_MAX_CAPABILITIES 0x80000000
+#define CPUID_FREQ_VOLT_CAPABILITIES 0x80000007
+#define P_STATE_TRANSITION_CAPABLE 6
+
+/* Model Specific Registers for p-state transitions. MSRs are 64-bit. For */
+/* writes (wrmsr - opcode 0f 30), the register number is placed in ecx, and */
+/* the value to write is placed in edx:eax. For reads (rdmsr - opcode 0f 32), */
+/* the register number is placed in ecx, and the data is returned in edx:eax. */
+
+#define MSR_FIDVID_CTL 0xc0010041
+#define MSR_FIDVID_STATUS 0xc0010042
+
+/* Field definitions within the FID VID Low Control MSR : */
+#define MSR_C_LO_INIT_FID_VID 0x00010000
+#define MSR_C_LO_NEW_VID 0x00003f00
+#define MSR_C_LO_NEW_FID 0x0000003f
+#define MSR_C_LO_VID_SHIFT 8
+
+/* Field definitions within the FID VID High Control MSR : */
+#define MSR_C_HI_STP_GNT_TO 0x000fffff
+
+/* Field definitions within the FID VID Low Status MSR : */
+#define MSR_S_LO_CHANGE_PENDING 0x80000000 /* cleared when completed */
+#define MSR_S_LO_MAX_RAMP_VID 0x3f000000
+#define MSR_S_LO_MAX_FID 0x003f0000
+#define MSR_S_LO_START_FID 0x00003f00
+#define MSR_S_LO_CURRENT_FID 0x0000003f
+
+/* Field definitions within the FID VID High Status MSR : */
+#define MSR_S_HI_MIN_WORKING_VID 0x3f000000
+#define MSR_S_HI_MAX_WORKING_VID 0x003f0000
+#define MSR_S_HI_START_VID 0x00003f00
+#define MSR_S_HI_CURRENT_VID 0x0000003f
+#define MSR_C_HI_STP_GNT_BENIGN 0x00000001
+
+
+/* Hardware Pstate _PSS and MSR definitions */
+#define USE_HW_PSTATE 0x00000080
+#define HW_PSTATE_MASK 0x00000007
+#define HW_PSTATE_VALID_MASK 0x80000000
+#define HW_PSTATE_MAX_MASK 0x000000f0
+#define HW_PSTATE_MAX_SHIFT 4
+#define MSR_PSTATE_DEF_BASE 0xc0010064 /* base of Pstate MSRs */
+#define MSR_PSTATE_STATUS 0xc0010063 /* Pstate Status MSR */
+#define MSR_PSTATE_CTRL 0xc0010062 /* Pstate control MSR */
+#define MSR_PSTATE_CUR_LIMIT 0xc0010061 /* pstate current limit MSR */
+
+/* define the two driver architectures */
+#define CPU_OPTERON 0
+#define CPU_HW_PSTATE 1
+
+
+/*
+ * There are restrictions frequencies have to follow:
+ * - only 1 entry in the low fid table ( <=1.4GHz )
+ * - lowest entry in the high fid table must be >= 2 * the entry in the
+ * low fid table
+ * - lowest entry in the high fid table must be a <= 200MHz + 2 * the entry
+ * in the low fid table
+ * - the parts can only step at <= 200 MHz intervals, odd fid values are
+ * supported in revision G and later revisions.
+ * - lowest frequency must be >= interprocessor hypertransport link speed
+ * (only applies to MP systems obviously)
+ */
+
+/* fids (frequency identifiers) are arranged in 2 tables - lo and hi */
+#define LO_FID_TABLE_TOP 7 /* fid values marking the boundary */
+#define HI_FID_TABLE_BOTTOM 8 /* between the low and high tables */
+
+#define LO_VCOFREQ_TABLE_TOP 1400 /* corresponding vco frequency values */
+#define HI_VCOFREQ_TABLE_BOTTOM 1600
+
+#define MIN_FREQ_RESOLUTION 200 /* fids jump by 2 matching freq jumps by 200 */
+
+#define MAX_FID 0x2a /* Spec only gives FID values as far as 5 GHz */
+#define LEAST_VID 0x3e /* Lowest (numerically highest) useful vid value */
+
+#define MIN_FREQ 800 /* Min and max freqs, per spec */
+#define MAX_FREQ 5000
+
+#define INVALID_FID_MASK 0xffffffc0 /* not a valid fid if these bits are set */
+#define INVALID_VID_MASK 0xffffffc0 /* not a valid vid if these bits are set */
+
+#define VID_OFF 0x3f
+
+#define STOP_GRANT_5NS 1 /* min poss memory access latency for voltage change */
+
+#define PLL_LOCK_CONVERSION (1000/5) /* ms to ns, then divide by clock period */
+
+#define MAXIMUM_VID_STEPS 1 /* Current cpus only allow a single step of 25mV */
+#define VST_UNITS_20US 20 /* Voltage Stabilization Time is in units of 20us */
+
+/*
+ * Most values of interest are encoded in a single field of the _PSS
+ * entries: the "control" value.
+ */
+
+#define IRT_SHIFT 30
+#define RVO_SHIFT 28
+#define EXT_TYPE_SHIFT 27
+#define PLL_L_SHIFT 20
+#define MVS_SHIFT 18
+#define VST_SHIFT 11
+#define VID_SHIFT 6
+#define IRT_MASK 3
+#define RVO_MASK 3
+#define EXT_TYPE_MASK 1
+#define PLL_L_MASK 0x7f
+#define MVS_MASK 3
+#define VST_MASK 0x7f
+#define VID_MASK 0x1f
+#define FID_MASK 0x1f
+#define EXT_VID_MASK 0x3f
+#define EXT_FID_MASK 0x3f
+
+
+/*
+ * Version 1.4 of the PSB table. This table is constructed by BIOS and is
+ * to tell the OS's power management driver which VIDs and FIDs are
+ * supported by this particular processor.
+ * If the data in the PSB / PST is wrong, then this driver will program the
+ * wrong values into hardware, which is very likely to lead to a crash.
+ */
+
+#define PSB_ID_STRING "AMDK7PNOW!"
+#define PSB_ID_STRING_LEN 10
+
+#define PSB_VERSION_1_4 0x14
+
+struct psb_s {
+ u8 signature[10];
+ u8 tableversion;
+ u8 flags1;
+ u16 vstable;
+ u8 flags2;
+ u8 num_tables;
+ u32 cpuid;
+ u8 plllocktime;
+ u8 maxfid;
+ u8 maxvid;
+ u8 numps;
+};
+
+/* Pairs of fid/vid values are appended to the version 1.4 PSB table. */
+struct pst_s {
+ u8 fid;
+ u8 vid;
+};
+
+static int core_voltage_pre_transition(struct powernow_k8_data *data,
+ u32 reqvid, u32 regfid);
+static int core_voltage_post_transition(struct powernow_k8_data *data, u32 reqvid);
+static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid);
+
+static void powernow_k8_acpi_pst_values(struct powernow_k8_data *data, unsigned int index);
+
+static int fill_powernow_table_pstate(struct powernow_k8_data *data, struct cpufreq_frequency_table *powernow_table);
+static int fill_powernow_table_fidvid(struct powernow_k8_data *data, struct cpufreq_frequency_table *powernow_table);
--- /dev/null
+/*
+ * sc520_freq.c: cpufreq driver for the AMD Elan sc520
+ *
+ * Copyright (C) 2005 Sean Young <sean@mess.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Based on elanfreq.c
+ *
+ * 2005-03-30: - initial revision
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include <linux/delay.h>
+#include <linux/cpufreq.h>
+#include <linux/timex.h>
+#include <linux/io.h>
+
+#include <asm/msr.h>
+
+#define MMCR_BASE 0xfffef000 /* The default base address */
+#define OFFS_CPUCTL 0x2 /* CPU Control Register */
+
+static __u8 __iomem *cpuctl;
+
+#define PFX "sc520_freq: "
+
+static struct cpufreq_frequency_table sc520_freq_table[] = {
+ {0x01, 100000},
+ {0x02, 133000},
+ {0, CPUFREQ_TABLE_END},
+};
+
+static unsigned int sc520_freq_get_cpu_frequency(unsigned int cpu)
+{
+ u8 clockspeed_reg = *cpuctl;
+
+ switch (clockspeed_reg & 0x03) {
+ default:
+ printk(KERN_ERR PFX "error: cpuctl register has unexpected "
+ "value %02x\n", clockspeed_reg);
+ case 0x01:
+ return 100000;
+ case 0x02:
+ return 133000;
+ }
+}
+
+static void sc520_freq_set_cpu_state(unsigned int state)
+{
+
+ struct cpufreq_freqs freqs;
+ u8 clockspeed_reg;
+
+ freqs.old = sc520_freq_get_cpu_frequency(0);
+ freqs.new = sc520_freq_table[state].frequency;
+ freqs.cpu = 0; /* AMD Elan is UP */
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+ pr_debug("attempting to set frequency to %i kHz\n",
+ sc520_freq_table[state].frequency);
+
+ local_irq_disable();
+
+ clockspeed_reg = *cpuctl & ~0x03;
+ *cpuctl = clockspeed_reg | sc520_freq_table[state].index;
+
+ local_irq_enable();
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+};
+
+static int sc520_freq_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &sc520_freq_table[0]);
+}
+
+static int sc520_freq_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = 0;
+
+ if (cpufreq_frequency_table_target(policy, sc520_freq_table,
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ sc520_freq_set_cpu_state(newstate);
+
+ return 0;
+}
+
+
+/*
+ * Module init and exit code
+ */
+
+static int sc520_freq_cpu_init(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ int result;
+
+ /* capability check */
+ if (c->x86_vendor != X86_VENDOR_AMD ||
+ c->x86 != 4 || c->x86_model != 9)
+ return -ENODEV;
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.transition_latency = 1000000; /* 1ms */
+ policy->cur = sc520_freq_get_cpu_frequency(0);
+
+ result = cpufreq_frequency_table_cpuinfo(policy, sc520_freq_table);
+ if (result)
+ return result;
+
+ cpufreq_frequency_table_get_attr(sc520_freq_table, policy->cpu);
+
+ return 0;
+}
+
+
+static int sc520_freq_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+
+static struct freq_attr *sc520_freq_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+
+static struct cpufreq_driver sc520_freq_driver = {
+ .get = sc520_freq_get_cpu_frequency,
+ .verify = sc520_freq_verify,
+ .target = sc520_freq_target,
+ .init = sc520_freq_cpu_init,
+ .exit = sc520_freq_cpu_exit,
+ .name = "sc520_freq",
+ .owner = THIS_MODULE,
+ .attr = sc520_freq_attr,
+};
+
+
+static int __init sc520_freq_init(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ int err;
+
+ /* Test if we have the right hardware */
+ if (c->x86_vendor != X86_VENDOR_AMD ||
+ c->x86 != 4 || c->x86_model != 9) {
+ pr_debug("no Elan SC520 processor found!\n");
+ return -ENODEV;
+ }
+ cpuctl = ioremap((unsigned long)(MMCR_BASE + OFFS_CPUCTL), 1);
+ if (!cpuctl) {
+ printk(KERN_ERR "sc520_freq: error: failed to remap memory\n");
+ return -ENOMEM;
+ }
+
+ err = cpufreq_register_driver(&sc520_freq_driver);
+ if (err)
+ iounmap(cpuctl);
+
+ return err;
+}
+
+
+static void __exit sc520_freq_exit(void)
+{
+ cpufreq_unregister_driver(&sc520_freq_driver);
+ iounmap(cpuctl);
+}
+
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Sean Young <sean@mess.org>");
+MODULE_DESCRIPTION("cpufreq driver for AMD's Elan sc520 CPU");
+
+module_init(sc520_freq_init);
+module_exit(sc520_freq_exit);
+
--- /dev/null
+/*
+ * cpufreq driver for Enhanced SpeedStep, as found in Intel's Pentium
+ * M (part of the Centrino chipset).
+ *
+ * Since the original Pentium M, most new Intel CPUs support Enhanced
+ * SpeedStep.
+ *
+ * Despite the "SpeedStep" in the name, this is almost entirely unlike
+ * traditional SpeedStep.
+ *
+ * Modelled on speedstep.c
+ *
+ * Copyright (C) 2003 Jeremy Fitzhardinge <jeremy@goop.org>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/sched.h> /* current */
+#include <linux/delay.h>
+#include <linux/compiler.h>
+#include <linux/gfp.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+
+#define PFX "speedstep-centrino: "
+#define MAINTAINER "cpufreq@vger.kernel.org"
+
+#define INTEL_MSR_RANGE (0xffff)
+
+struct cpu_id
+{
+ __u8 x86; /* CPU family */
+ __u8 x86_model; /* model */
+ __u8 x86_mask; /* stepping */
+};
+
+enum {
+ CPU_BANIAS,
+ CPU_DOTHAN_A1,
+ CPU_DOTHAN_A2,
+ CPU_DOTHAN_B0,
+ CPU_MP4HT_D0,
+ CPU_MP4HT_E0,
+};
+
+static const struct cpu_id cpu_ids[] = {
+ [CPU_BANIAS] = { 6, 9, 5 },
+ [CPU_DOTHAN_A1] = { 6, 13, 1 },
+ [CPU_DOTHAN_A2] = { 6, 13, 2 },
+ [CPU_DOTHAN_B0] = { 6, 13, 6 },
+ [CPU_MP4HT_D0] = {15, 3, 4 },
+ [CPU_MP4HT_E0] = {15, 4, 1 },
+};
+#define N_IDS ARRAY_SIZE(cpu_ids)
+
+struct cpu_model
+{
+ const struct cpu_id *cpu_id;
+ const char *model_name;
+ unsigned max_freq; /* max clock in kHz */
+
+ struct cpufreq_frequency_table *op_points; /* clock/voltage pairs */
+};
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c,
+ const struct cpu_id *x);
+
+/* Operating points for current CPU */
+static DEFINE_PER_CPU(struct cpu_model *, centrino_model);
+static DEFINE_PER_CPU(const struct cpu_id *, centrino_cpu);
+
+static struct cpufreq_driver centrino_driver;
+
+#ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE
+
+/* Computes the correct form for IA32_PERF_CTL MSR for a particular
+ frequency/voltage operating point; frequency in MHz, volts in mV.
+ This is stored as "index" in the structure. */
+#define OP(mhz, mv) \
+ { \
+ .frequency = (mhz) * 1000, \
+ .index = (((mhz)/100) << 8) | ((mv - 700) / 16) \
+ }
+
+/*
+ * These voltage tables were derived from the Intel Pentium M
+ * datasheet, document 25261202.pdf, Table 5. I have verified they
+ * are consistent with my IBM ThinkPad X31, which has a 1.3GHz Pentium
+ * M.
+ */
+
+/* Ultra Low Voltage Intel Pentium M processor 900MHz (Banias) */
+static struct cpufreq_frequency_table banias_900[] =
+{
+ OP(600, 844),
+ OP(800, 988),
+ OP(900, 1004),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Ultra Low Voltage Intel Pentium M processor 1000MHz (Banias) */
+static struct cpufreq_frequency_table banias_1000[] =
+{
+ OP(600, 844),
+ OP(800, 972),
+ OP(900, 988),
+ OP(1000, 1004),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Low Voltage Intel Pentium M processor 1.10GHz (Banias) */
+static struct cpufreq_frequency_table banias_1100[] =
+{
+ OP( 600, 956),
+ OP( 800, 1020),
+ OP( 900, 1100),
+ OP(1000, 1164),
+ OP(1100, 1180),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+
+/* Low Voltage Intel Pentium M processor 1.20GHz (Banias) */
+static struct cpufreq_frequency_table banias_1200[] =
+{
+ OP( 600, 956),
+ OP( 800, 1004),
+ OP( 900, 1020),
+ OP(1000, 1100),
+ OP(1100, 1164),
+ OP(1200, 1180),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.30GHz (Banias) */
+static struct cpufreq_frequency_table banias_1300[] =
+{
+ OP( 600, 956),
+ OP( 800, 1260),
+ OP(1000, 1292),
+ OP(1200, 1356),
+ OP(1300, 1388),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.40GHz (Banias) */
+static struct cpufreq_frequency_table banias_1400[] =
+{
+ OP( 600, 956),
+ OP( 800, 1180),
+ OP(1000, 1308),
+ OP(1200, 1436),
+ OP(1400, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.50GHz (Banias) */
+static struct cpufreq_frequency_table banias_1500[] =
+{
+ OP( 600, 956),
+ OP( 800, 1116),
+ OP(1000, 1228),
+ OP(1200, 1356),
+ OP(1400, 1452),
+ OP(1500, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.60GHz (Banias) */
+static struct cpufreq_frequency_table banias_1600[] =
+{
+ OP( 600, 956),
+ OP( 800, 1036),
+ OP(1000, 1164),
+ OP(1200, 1276),
+ OP(1400, 1420),
+ OP(1600, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.70GHz (Banias) */
+static struct cpufreq_frequency_table banias_1700[] =
+{
+ OP( 600, 956),
+ OP( 800, 1004),
+ OP(1000, 1116),
+ OP(1200, 1228),
+ OP(1400, 1308),
+ OP(1700, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+#undef OP
+
+#define _BANIAS(cpuid, max, name) \
+{ .cpu_id = cpuid, \
+ .model_name = "Intel(R) Pentium(R) M processor " name "MHz", \
+ .max_freq = (max)*1000, \
+ .op_points = banias_##max, \
+}
+#define BANIAS(max) _BANIAS(&cpu_ids[CPU_BANIAS], max, #max)
+
+/* CPU models, their operating frequency range, and freq/voltage
+ operating points */
+static struct cpu_model models[] =
+{
+ _BANIAS(&cpu_ids[CPU_BANIAS], 900, " 900"),
+ BANIAS(1000),
+ BANIAS(1100),
+ BANIAS(1200),
+ BANIAS(1300),
+ BANIAS(1400),
+ BANIAS(1500),
+ BANIAS(1600),
+ BANIAS(1700),
+
+ /* NULL model_name is a wildcard */
+ { &cpu_ids[CPU_DOTHAN_A1], NULL, 0, NULL },
+ { &cpu_ids[CPU_DOTHAN_A2], NULL, 0, NULL },
+ { &cpu_ids[CPU_DOTHAN_B0], NULL, 0, NULL },
+ { &cpu_ids[CPU_MP4HT_D0], NULL, 0, NULL },
+ { &cpu_ids[CPU_MP4HT_E0], NULL, 0, NULL },
+
+ { NULL, }
+};
+#undef _BANIAS
+#undef BANIAS
+
+static int centrino_cpu_init_table(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *cpu = &cpu_data(policy->cpu);
+ struct cpu_model *model;
+
+ for(model = models; model->cpu_id != NULL; model++)
+ if (centrino_verify_cpu_id(cpu, model->cpu_id) &&
+ (model->model_name == NULL ||
+ strcmp(cpu->x86_model_id, model->model_name) == 0))
+ break;
+
+ if (model->cpu_id == NULL) {
+ /* No match at all */
+ pr_debug("no support for CPU model \"%s\": "
+ "send /proc/cpuinfo to " MAINTAINER "\n",
+ cpu->x86_model_id);
+ return -ENOENT;
+ }
+
+ if (model->op_points == NULL) {
+ /* Matched a non-match */
+ pr_debug("no table support for CPU model \"%s\"\n",
+ cpu->x86_model_id);
+ pr_debug("try using the acpi-cpufreq driver\n");
+ return -ENOENT;
+ }
+
+ per_cpu(centrino_model, policy->cpu) = model;
+
+ pr_debug("found \"%s\": max frequency: %dkHz\n",
+ model->model_name, model->max_freq);
+
+ return 0;
+}
+
+#else
+static inline int centrino_cpu_init_table(struct cpufreq_policy *policy)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE */
+
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c,
+ const struct cpu_id *x)
+{
+ if ((c->x86 == x->x86) &&
+ (c->x86_model == x->x86_model) &&
+ (c->x86_mask == x->x86_mask))
+ return 1;
+ return 0;
+}
+
+/* To be called only after centrino_model is initialized */
+static unsigned extract_clock(unsigned msr, unsigned int cpu, int failsafe)
+{
+ int i;
+
+ /*
+ * Extract clock in kHz from PERF_CTL value
+ * for centrino, as some DSDTs are buggy.
+ * Ideally, this can be done using the acpi_data structure.
+ */
+ if ((per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_BANIAS]) ||
+ (per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_DOTHAN_A1]) ||
+ (per_cpu(centrino_cpu, cpu) == &cpu_ids[CPU_DOTHAN_B0])) {
+ msr = (msr >> 8) & 0xff;
+ return msr * 100000;
+ }
+
+ if ((!per_cpu(centrino_model, cpu)) ||
+ (!per_cpu(centrino_model, cpu)->op_points))
+ return 0;
+
+ msr &= 0xffff;
+ for (i = 0;
+ per_cpu(centrino_model, cpu)->op_points[i].frequency
+ != CPUFREQ_TABLE_END;
+ i++) {
+ if (msr == per_cpu(centrino_model, cpu)->op_points[i].index)
+ return per_cpu(centrino_model, cpu)->
+ op_points[i].frequency;
+ }
+ if (failsafe)
+ return per_cpu(centrino_model, cpu)->op_points[i-1].frequency;
+ else
+ return 0;
+}
+
+/* Return the current CPU frequency in kHz */
+static unsigned int get_cur_freq(unsigned int cpu)
+{
+ unsigned l, h;
+ unsigned clock_freq;
+
+ rdmsr_on_cpu(cpu, MSR_IA32_PERF_STATUS, &l, &h);
+ clock_freq = extract_clock(l, cpu, 0);
+
+ if (unlikely(clock_freq == 0)) {
+ /*
+ * On some CPUs, we can see transient MSR values (which are
+ * not present in _PSS), while CPU is doing some automatic
+ * P-state transition (like TM2). Get the last freq set
+ * in PERF_CTL.
+ */
+ rdmsr_on_cpu(cpu, MSR_IA32_PERF_CTL, &l, &h);
+ clock_freq = extract_clock(l, cpu, 1);
+ }
+ return clock_freq;
+}
+
+
+static int centrino_cpu_init(struct cpufreq_policy *policy)
+{
+ struct cpuinfo_x86 *cpu = &cpu_data(policy->cpu);
+ unsigned freq;
+ unsigned l, h;
+ int ret;
+ int i;
+
+ /* Only Intel makes Enhanced Speedstep-capable CPUs */
+ if (cpu->x86_vendor != X86_VENDOR_INTEL ||
+ !cpu_has(cpu, X86_FEATURE_EST))
+ return -ENODEV;
+
+ if (cpu_has(cpu, X86_FEATURE_CONSTANT_TSC))
+ centrino_driver.flags |= CPUFREQ_CONST_LOOPS;
+
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ for (i = 0; i < N_IDS; i++)
+ if (centrino_verify_cpu_id(cpu, &cpu_ids[i]))
+ break;
+
+ if (i != N_IDS)
+ per_cpu(centrino_cpu, policy->cpu) = &cpu_ids[i];
+
+ if (!per_cpu(centrino_cpu, policy->cpu)) {
+ pr_debug("found unsupported CPU with "
+ "Enhanced SpeedStep: send /proc/cpuinfo to "
+ MAINTAINER "\n");
+ return -ENODEV;
+ }
+
+ if (centrino_cpu_init_table(policy)) {
+ return -ENODEV;
+ }
+
+ /* Check to see if Enhanced SpeedStep is enabled, and try to
+ enable it if not. */
+ rdmsr(MSR_IA32_MISC_ENABLE, l, h);
+
+ if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+ l |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
+ pr_debug("trying to enable Enhanced SpeedStep (%x)\n", l);
+ wrmsr(MSR_IA32_MISC_ENABLE, l, h);
+
+ /* check to see if it stuck */
+ rdmsr(MSR_IA32_MISC_ENABLE, l, h);
+ if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+ printk(KERN_INFO PFX
+ "couldn't enable Enhanced SpeedStep\n");
+ return -ENODEV;
+ }
+ }
+
+ freq = get_cur_freq(policy->cpu);
+ policy->cpuinfo.transition_latency = 10000;
+ /* 10uS transition latency */
+ policy->cur = freq;
+
+ pr_debug("centrino_cpu_init: cur=%dkHz\n", policy->cur);
+
+ ret = cpufreq_frequency_table_cpuinfo(policy,
+ per_cpu(centrino_model, policy->cpu)->op_points);
+ if (ret)
+ return (ret);
+
+ cpufreq_frequency_table_get_attr(
+ per_cpu(centrino_model, policy->cpu)->op_points, policy->cpu);
+
+ return 0;
+}
+
+static int centrino_cpu_exit(struct cpufreq_policy *policy)
+{
+ unsigned int cpu = policy->cpu;
+
+ if (!per_cpu(centrino_model, cpu))
+ return -ENODEV;
+
+ cpufreq_frequency_table_put_attr(cpu);
+
+ per_cpu(centrino_model, cpu) = NULL;
+
+ return 0;
+}
+
+/**
+ * centrino_verify - verifies a new CPUFreq policy
+ * @policy: new policy
+ *
+ * Limit must be within this model's frequency range at least one
+ * border included.
+ */
+static int centrino_verify (struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy,
+ per_cpu(centrino_model, policy->cpu)->op_points);
+}
+
+/**
+ * centrino_setpolicy - set a new CPUFreq policy
+ * @policy: new policy
+ * @target_freq: the target frequency
+ * @relation: how that frequency relates to achieved frequency
+ * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
+ *
+ * Sets a new CPUFreq policy.
+ */
+static int centrino_target (struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = 0;
+ unsigned int msr, oldmsr = 0, h = 0, cpu = policy->cpu;
+ struct cpufreq_freqs freqs;
+ int retval = 0;
+ unsigned int j, k, first_cpu, tmp;
+ cpumask_var_t covered_cpus;
+
+ if (unlikely(!zalloc_cpumask_var(&covered_cpus, GFP_KERNEL)))
+ return -ENOMEM;
+
+ if (unlikely(per_cpu(centrino_model, cpu) == NULL)) {
+ retval = -ENODEV;
+ goto out;
+ }
+
+ if (unlikely(cpufreq_frequency_table_target(policy,
+ per_cpu(centrino_model, cpu)->op_points,
+ target_freq,
+ relation,
+ &newstate))) {
+ retval = -EINVAL;
+ goto out;
+ }
+
+ first_cpu = 1;
+ for_each_cpu(j, policy->cpus) {
+ int good_cpu;
+
+ /* cpufreq holds the hotplug lock, so we are safe here */
+ if (!cpu_online(j))
+ continue;
+
+ /*
+ * Support for SMP systems.
+ * Make sure we are running on CPU that wants to change freq
+ */
+ if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
+ good_cpu = cpumask_any_and(policy->cpus,
+ cpu_online_mask);
+ else
+ good_cpu = j;
+
+ if (good_cpu >= nr_cpu_ids) {
+ pr_debug("couldn't limit to CPUs in this domain\n");
+ retval = -EAGAIN;
+ if (first_cpu) {
+ /* We haven't started the transition yet. */
+ goto out;
+ }
+ break;
+ }
+
+ msr = per_cpu(centrino_model, cpu)->op_points[newstate].index;
+
+ if (first_cpu) {
+ rdmsr_on_cpu(good_cpu, MSR_IA32_PERF_CTL, &oldmsr, &h);
+ if (msr == (oldmsr & 0xffff)) {
+ pr_debug("no change needed - msr was and needs "
+ "to be %x\n", oldmsr);
+ retval = 0;
+ goto out;
+ }
+
+ freqs.old = extract_clock(oldmsr, cpu, 0);
+ freqs.new = extract_clock(msr, cpu, 0);
+
+ pr_debug("target=%dkHz old=%d new=%d msr=%04x\n",
+ target_freq, freqs.old, freqs.new, msr);
+
+ for_each_cpu(k, policy->cpus) {
+ if (!cpu_online(k))
+ continue;
+ freqs.cpu = k;
+ cpufreq_notify_transition(&freqs,
+ CPUFREQ_PRECHANGE);
+ }
+
+ first_cpu = 0;
+ /* all but 16 LSB are reserved, treat them with care */
+ oldmsr &= ~0xffff;
+ msr &= 0xffff;
+ oldmsr |= msr;
+ }
+
+ wrmsr_on_cpu(good_cpu, MSR_IA32_PERF_CTL, oldmsr, h);
+ if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY)
+ break;
+
+ cpumask_set_cpu(j, covered_cpus);
+ }
+
+ for_each_cpu(k, policy->cpus) {
+ if (!cpu_online(k))
+ continue;
+ freqs.cpu = k;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+
+ if (unlikely(retval)) {
+ /*
+ * We have failed halfway through the frequency change.
+ * We have sent callbacks to policy->cpus and
+ * MSRs have already been written on coverd_cpus.
+ * Best effort undo..
+ */
+
+ for_each_cpu(j, covered_cpus)
+ wrmsr_on_cpu(j, MSR_IA32_PERF_CTL, oldmsr, h);
+
+ tmp = freqs.new;
+ freqs.new = freqs.old;
+ freqs.old = tmp;
+ for_each_cpu(j, policy->cpus) {
+ if (!cpu_online(j))
+ continue;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+ }
+ retval = 0;
+
+out:
+ free_cpumask_var(covered_cpus);
+ return retval;
+}
+
+static struct freq_attr* centrino_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver centrino_driver = {
+ .name = "centrino", /* should be speedstep-centrino,
+ but there's a 16 char limit */
+ .init = centrino_cpu_init,
+ .exit = centrino_cpu_exit,
+ .verify = centrino_verify,
+ .target = centrino_target,
+ .get = get_cur_freq,
+ .attr = centrino_attr,
+ .owner = THIS_MODULE,
+};
+
+
+/**
+ * centrino_init - initializes the Enhanced SpeedStep CPUFreq driver
+ *
+ * Initializes the Enhanced SpeedStep support. Returns -ENODEV on
+ * unsupported devices, -ENOENT if there's no voltage table for this
+ * particular CPU model, -EINVAL on problems during initiatization,
+ * and zero on success.
+ *
+ * This is quite picky. Not only does the CPU have to advertise the
+ * "est" flag in the cpuid capability flags, we look for a specific
+ * CPU model and stepping, and we need to have the exact model name in
+ * our voltage tables. That is, be paranoid about not releasing
+ * someone's valuable magic smoke.
+ */
+static int __init centrino_init(void)
+{
+ struct cpuinfo_x86 *cpu = &cpu_data(0);
+
+ if (!cpu_has(cpu, X86_FEATURE_EST))
+ return -ENODEV;
+
+ return cpufreq_register_driver(¢rino_driver);
+}
+
+static void __exit centrino_exit(void)
+{
+ cpufreq_unregister_driver(¢rino_driver);
+}
+
+MODULE_AUTHOR ("Jeremy Fitzhardinge <jeremy@goop.org>");
+MODULE_DESCRIPTION ("Enhanced SpeedStep driver for Intel Pentium M processors.");
+MODULE_LICENSE ("GPL");
+
+late_initcall(centrino_init);
+module_exit(centrino_exit);
--- /dev/null
+/*
+ * (C) 2001 Dave Jones, Arjan van de ven.
+ * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ * Based upon reverse engineered information, and on Intel documentation
+ * for chipsets ICH2-M and ICH3-M.
+ *
+ * Many thanks to Ducrot Bruno for finding and fixing the last
+ * "missing link" for ICH2-M/ICH3-M support, and to Thomas Winkler
+ * for extensive testing.
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+
+/*********************************************************************
+ * SPEEDSTEP - DEFINITIONS *
+ *********************************************************************/
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+
+#include "speedstep-lib.h"
+
+
+/* speedstep_chipset:
+ * It is necessary to know which chipset is used. As accesses to
+ * this device occur at various places in this module, we need a
+ * static struct pci_dev * pointing to that device.
+ */
+static struct pci_dev *speedstep_chipset_dev;
+
+
+/* speedstep_processor
+ */
+static enum speedstep_processor speedstep_processor;
+
+static u32 pmbase;
+
+/*
+ * There are only two frequency states for each processor. Values
+ * are in kHz for the time being.
+ */
+static struct cpufreq_frequency_table speedstep_freqs[] = {
+ {SPEEDSTEP_HIGH, 0},
+ {SPEEDSTEP_LOW, 0},
+ {0, CPUFREQ_TABLE_END},
+};
+
+
+/**
+ * speedstep_find_register - read the PMBASE address
+ *
+ * Returns: -ENODEV if no register could be found
+ */
+static int speedstep_find_register(void)
+{
+ if (!speedstep_chipset_dev)
+ return -ENODEV;
+
+ /* get PMBASE */
+ pci_read_config_dword(speedstep_chipset_dev, 0x40, &pmbase);
+ if (!(pmbase & 0x01)) {
+ printk(KERN_ERR "speedstep-ich: could not find speedstep register\n");
+ return -ENODEV;
+ }
+
+ pmbase &= 0xFFFFFFFE;
+ if (!pmbase) {
+ printk(KERN_ERR "speedstep-ich: could not find speedstep register\n");
+ return -ENODEV;
+ }
+
+ pr_debug("pmbase is 0x%x\n", pmbase);
+ return 0;
+}
+
+/**
+ * speedstep_set_state - set the SpeedStep state
+ * @state: new processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
+ *
+ * Tries to change the SpeedStep state. Can be called from
+ * smp_call_function_single.
+ */
+static void speedstep_set_state(unsigned int state)
+{
+ u8 pm2_blk;
+ u8 value;
+ unsigned long flags;
+
+ if (state > 0x1)
+ return;
+
+ /* Disable IRQs */
+ local_irq_save(flags);
+
+ /* read state */
+ value = inb(pmbase + 0x50);
+
+ pr_debug("read at pmbase 0x%x + 0x50 returned 0x%x\n", pmbase, value);
+
+ /* write new state */
+ value &= 0xFE;
+ value |= state;
+
+ pr_debug("writing 0x%x to pmbase 0x%x + 0x50\n", value, pmbase);
+
+ /* Disable bus master arbitration */
+ pm2_blk = inb(pmbase + 0x20);
+ pm2_blk |= 0x01;
+ outb(pm2_blk, (pmbase + 0x20));
+
+ /* Actual transition */
+ outb(value, (pmbase + 0x50));
+
+ /* Restore bus master arbitration */
+ pm2_blk &= 0xfe;
+ outb(pm2_blk, (pmbase + 0x20));
+
+ /* check if transition was successful */
+ value = inb(pmbase + 0x50);
+
+ /* Enable IRQs */
+ local_irq_restore(flags);
+
+ pr_debug("read at pmbase 0x%x + 0x50 returned 0x%x\n", pmbase, value);
+
+ if (state == (value & 0x1))
+ pr_debug("change to %u MHz succeeded\n",
+ speedstep_get_frequency(speedstep_processor) / 1000);
+ else
+ printk(KERN_ERR "cpufreq: change failed - I/O error\n");
+
+ return;
+}
+
+/* Wrapper for smp_call_function_single. */
+static void _speedstep_set_state(void *_state)
+{
+ speedstep_set_state(*(unsigned int *)_state);
+}
+
+/**
+ * speedstep_activate - activate SpeedStep control in the chipset
+ *
+ * Tries to activate the SpeedStep status and control registers.
+ * Returns -EINVAL on an unsupported chipset, and zero on success.
+ */
+static int speedstep_activate(void)
+{
+ u16 value = 0;
+
+ if (!speedstep_chipset_dev)
+ return -EINVAL;
+
+ pci_read_config_word(speedstep_chipset_dev, 0x00A0, &value);
+ if (!(value & 0x08)) {
+ value |= 0x08;
+ pr_debug("activating SpeedStep (TM) registers\n");
+ pci_write_config_word(speedstep_chipset_dev, 0x00A0, value);
+ }
+
+ return 0;
+}
+
+
+/**
+ * speedstep_detect_chipset - detect the Southbridge which contains SpeedStep logic
+ *
+ * Detects ICH2-M, ICH3-M and ICH4-M so far. The pci_dev points to
+ * the LPC bridge / PM module which contains all power-management
+ * functions. Returns the SPEEDSTEP_CHIPSET_-number for the detected
+ * chipset, or zero on failure.
+ */
+static unsigned int speedstep_detect_chipset(void)
+{
+ speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
+ PCI_DEVICE_ID_INTEL_82801DB_12,
+ PCI_ANY_ID, PCI_ANY_ID,
+ NULL);
+ if (speedstep_chipset_dev)
+ return 4; /* 4-M */
+
+ speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
+ PCI_DEVICE_ID_INTEL_82801CA_12,
+ PCI_ANY_ID, PCI_ANY_ID,
+ NULL);
+ if (speedstep_chipset_dev)
+ return 3; /* 3-M */
+
+
+ speedstep_chipset_dev = pci_get_subsys(PCI_VENDOR_ID_INTEL,
+ PCI_DEVICE_ID_INTEL_82801BA_10,
+ PCI_ANY_ID, PCI_ANY_ID,
+ NULL);
+ if (speedstep_chipset_dev) {
+ /* speedstep.c causes lockups on Dell Inspirons 8000 and
+ * 8100 which use a pretty old revision of the 82815
+ * host brige. Abort on these systems.
+ */
+ static struct pci_dev *hostbridge;
+
+ hostbridge = pci_get_subsys(PCI_VENDOR_ID_INTEL,
+ PCI_DEVICE_ID_INTEL_82815_MC,
+ PCI_ANY_ID, PCI_ANY_ID,
+ NULL);
+
+ if (!hostbridge)
+ return 2; /* 2-M */
+
+ if (hostbridge->revision < 5) {
+ pr_debug("hostbridge does not support speedstep\n");
+ speedstep_chipset_dev = NULL;
+ pci_dev_put(hostbridge);
+ return 0;
+ }
+
+ pci_dev_put(hostbridge);
+ return 2; /* 2-M */
+ }
+
+ return 0;
+}
+
+static void get_freq_data(void *_speed)
+{
+ unsigned int *speed = _speed;
+
+ *speed = speedstep_get_frequency(speedstep_processor);
+}
+
+static unsigned int speedstep_get(unsigned int cpu)
+{
+ unsigned int speed;
+
+ /* You're supposed to ensure CPU is online. */
+ if (smp_call_function_single(cpu, get_freq_data, &speed, 1) != 0)
+ BUG();
+
+ pr_debug("detected %u kHz as current frequency\n", speed);
+ return speed;
+}
+
+/**
+ * speedstep_target - set a new CPUFreq policy
+ * @policy: new policy
+ * @target_freq: the target frequency
+ * @relation: how that frequency relates to achieved frequency
+ * (CPUFREQ_RELATION_L or CPUFREQ_RELATION_H)
+ *
+ * Sets a new CPUFreq policy.
+ */
+static int speedstep_target(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation)
+{
+ unsigned int newstate = 0, policy_cpu;
+ struct cpufreq_freqs freqs;
+ int i;
+
+ if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0],
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
+ freqs.old = speedstep_get(policy_cpu);
+ freqs.new = speedstep_freqs[newstate].frequency;
+ freqs.cpu = policy->cpu;
+
+ pr_debug("transiting from %u to %u kHz\n", freqs.old, freqs.new);
+
+ /* no transition necessary */
+ if (freqs.old == freqs.new)
+ return 0;
+
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ }
+
+ smp_call_function_single(policy_cpu, _speedstep_set_state, &newstate,
+ true);
+
+ for_each_cpu(i, policy->cpus) {
+ freqs.cpu = i;
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+ }
+
+ return 0;
+}
+
+
+/**
+ * speedstep_verify - verifies a new CPUFreq policy
+ * @policy: new policy
+ *
+ * Limit must be within speedstep_low_freq and speedstep_high_freq, with
+ * at least one border included.
+ */
+static int speedstep_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &speedstep_freqs[0]);
+}
+
+struct get_freqs {
+ struct cpufreq_policy *policy;
+ int ret;
+};
+
+static void get_freqs_on_cpu(void *_get_freqs)
+{
+ struct get_freqs *get_freqs = _get_freqs;
+
+ get_freqs->ret =
+ speedstep_get_freqs(speedstep_processor,
+ &speedstep_freqs[SPEEDSTEP_LOW].frequency,
+ &speedstep_freqs[SPEEDSTEP_HIGH].frequency,
+ &get_freqs->policy->cpuinfo.transition_latency,
+ &speedstep_set_state);
+}
+
+static int speedstep_cpu_init(struct cpufreq_policy *policy)
+{
+ int result;
+ unsigned int policy_cpu, speed;
+ struct get_freqs gf;
+
+ /* only run on CPU to be set, or on its sibling */
+#ifdef CONFIG_SMP
+ cpumask_copy(policy->cpus, cpu_sibling_mask(policy->cpu));
+#endif
+ policy_cpu = cpumask_any_and(policy->cpus, cpu_online_mask);
+
+ /* detect low and high frequency and transition latency */
+ gf.policy = policy;
+ smp_call_function_single(policy_cpu, get_freqs_on_cpu, &gf, 1);
+ if (gf.ret)
+ return gf.ret;
+
+ /* get current speed setting */
+ speed = speedstep_get(policy_cpu);
+ if (!speed)
+ return -EIO;
+
+ pr_debug("currently at %s speed setting - %i MHz\n",
+ (speed == speedstep_freqs[SPEEDSTEP_LOW].frequency)
+ ? "low" : "high",
+ (speed / 1000));
+
+ /* cpuinfo and default policy values */
+ policy->cur = speed;
+
+ result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
+ if (result)
+ return result;
+
+ cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
+
+ return 0;
+}
+
+
+static int speedstep_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+static struct freq_attr *speedstep_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+
+static struct cpufreq_driver speedstep_driver = {
+ .name = "speedstep-ich",
+ .verify = speedstep_verify,
+ .target = speedstep_target,
+ .init = speedstep_cpu_init,
+ .exit = speedstep_cpu_exit,
+ .get = speedstep_get,
+ .owner = THIS_MODULE,
+ .attr = speedstep_attr,
+};
+
+
+/**
+ * speedstep_init - initializes the SpeedStep CPUFreq driver
+ *
+ * Initializes the SpeedStep support. Returns -ENODEV on unsupported
+ * devices, -EINVAL on problems during initiatization, and zero on
+ * success.
+ */
+static int __init speedstep_init(void)
+{
+ /* detect processor */
+ speedstep_processor = speedstep_detect_processor();
+ if (!speedstep_processor) {
+ pr_debug("Intel(R) SpeedStep(TM) capable processor "
+ "not found\n");
+ return -ENODEV;
+ }
+
+ /* detect chipset */
+ if (!speedstep_detect_chipset()) {
+ pr_debug("Intel(R) SpeedStep(TM) for this chipset not "
+ "(yet) available.\n");
+ return -ENODEV;
+ }
+
+ /* activate speedstep support */
+ if (speedstep_activate()) {
+ pci_dev_put(speedstep_chipset_dev);
+ return -EINVAL;
+ }
+
+ if (speedstep_find_register())
+ return -ENODEV;
+
+ return cpufreq_register_driver(&speedstep_driver);
+}
+
+
+/**
+ * speedstep_exit - unregisters SpeedStep support
+ *
+ * Unregisters SpeedStep support.
+ */
+static void __exit speedstep_exit(void)
+{
+ pci_dev_put(speedstep_chipset_dev);
+ cpufreq_unregister_driver(&speedstep_driver);
+}
+
+
+MODULE_AUTHOR("Dave Jones <davej@redhat.com>, "
+ "Dominik Brodowski <linux@brodo.de>");
+MODULE_DESCRIPTION("Speedstep driver for Intel mobile processors on chipsets "
+ "with ICH-M southbridges.");
+MODULE_LICENSE("GPL");
+
+module_init(speedstep_init);
+module_exit(speedstep_exit);
--- /dev/null
+/*
+ * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * Library for common functions for Intel SpeedStep v.1 and v.2 support
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+
+#include <asm/msr.h>
+#include <asm/tsc.h>
+#include "speedstep-lib.h"
+
+#define PFX "speedstep-lib: "
+
+#ifdef CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK
+static int relaxed_check;
+#else
+#define relaxed_check 0
+#endif
+
+/*********************************************************************
+ * GET PROCESSOR CORE SPEED IN KHZ *
+ *********************************************************************/
+
+static unsigned int pentium3_get_frequency(enum speedstep_processor processor)
+{
+ /* See table 14 of p3_ds.pdf and table 22 of 29834003.pdf */
+ struct {
+ unsigned int ratio; /* Frequency Multiplier (x10) */
+ u8 bitmap; /* power on configuration bits
+ [27, 25:22] (in MSR 0x2a) */
+ } msr_decode_mult[] = {
+ { 30, 0x01 },
+ { 35, 0x05 },
+ { 40, 0x02 },
+ { 45, 0x06 },
+ { 50, 0x00 },
+ { 55, 0x04 },
+ { 60, 0x0b },
+ { 65, 0x0f },
+ { 70, 0x09 },
+ { 75, 0x0d },
+ { 80, 0x0a },
+ { 85, 0x26 },
+ { 90, 0x20 },
+ { 100, 0x2b },
+ { 0, 0xff } /* error or unknown value */
+ };
+
+ /* PIII(-M) FSB settings: see table b1-b of 24547206.pdf */
+ struct {
+ unsigned int value; /* Front Side Bus speed in MHz */
+ u8 bitmap; /* power on configuration bits [18: 19]
+ (in MSR 0x2a) */
+ } msr_decode_fsb[] = {
+ { 66, 0x0 },
+ { 100, 0x2 },
+ { 133, 0x1 },
+ { 0, 0xff}
+ };
+
+ u32 msr_lo, msr_tmp;
+ int i = 0, j = 0;
+
+ /* read MSR 0x2a - we only need the low 32 bits */
+ rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
+ pr_debug("P3 - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n", msr_lo, msr_tmp);
+ msr_tmp = msr_lo;
+
+ /* decode the FSB */
+ msr_tmp &= 0x00c0000;
+ msr_tmp >>= 18;
+ while (msr_tmp != msr_decode_fsb[i].bitmap) {
+ if (msr_decode_fsb[i].bitmap == 0xff)
+ return 0;
+ i++;
+ }
+
+ /* decode the multiplier */
+ if (processor == SPEEDSTEP_CPU_PIII_C_EARLY) {
+ pr_debug("workaround for early PIIIs\n");
+ msr_lo &= 0x03c00000;
+ } else
+ msr_lo &= 0x0bc00000;
+ msr_lo >>= 22;
+ while (msr_lo != msr_decode_mult[j].bitmap) {
+ if (msr_decode_mult[j].bitmap == 0xff)
+ return 0;
+ j++;
+ }
+
+ pr_debug("speed is %u\n",
+ (msr_decode_mult[j].ratio * msr_decode_fsb[i].value * 100));
+
+ return msr_decode_mult[j].ratio * msr_decode_fsb[i].value * 100;
+}
+
+
+static unsigned int pentiumM_get_frequency(void)
+{
+ u32 msr_lo, msr_tmp;
+
+ rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
+ pr_debug("PM - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n", msr_lo, msr_tmp);
+
+ /* see table B-2 of 24547212.pdf */
+ if (msr_lo & 0x00040000) {
+ printk(KERN_DEBUG PFX "PM - invalid FSB: 0x%x 0x%x\n",
+ msr_lo, msr_tmp);
+ return 0;
+ }
+
+ msr_tmp = (msr_lo >> 22) & 0x1f;
+ pr_debug("bits 22-26 are 0x%x, speed is %u\n",
+ msr_tmp, (msr_tmp * 100 * 1000));
+
+ return msr_tmp * 100 * 1000;
+}
+
+static unsigned int pentium_core_get_frequency(void)
+{
+ u32 fsb = 0;
+ u32 msr_lo, msr_tmp;
+ int ret;
+
+ rdmsr(MSR_FSB_FREQ, msr_lo, msr_tmp);
+ /* see table B-2 of 25366920.pdf */
+ switch (msr_lo & 0x07) {
+ case 5:
+ fsb = 100000;
+ break;
+ case 1:
+ fsb = 133333;
+ break;
+ case 3:
+ fsb = 166667;
+ break;
+ case 2:
+ fsb = 200000;
+ break;
+ case 0:
+ fsb = 266667;
+ break;
+ case 4:
+ fsb = 333333;
+ break;
+ default:
+ printk(KERN_ERR "PCORE - MSR_FSB_FREQ undefined value");
+ }
+
+ rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_tmp);
+ pr_debug("PCORE - MSR_IA32_EBL_CR_POWERON: 0x%x 0x%x\n",
+ msr_lo, msr_tmp);
+
+ msr_tmp = (msr_lo >> 22) & 0x1f;
+ pr_debug("bits 22-26 are 0x%x, speed is %u\n",
+ msr_tmp, (msr_tmp * fsb));
+
+ ret = (msr_tmp * fsb);
+ return ret;
+}
+
+
+static unsigned int pentium4_get_frequency(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+ u32 msr_lo, msr_hi, mult;
+ unsigned int fsb = 0;
+ unsigned int ret;
+ u8 fsb_code;
+
+ /* Pentium 4 Model 0 and 1 do not have the Core Clock Frequency
+ * to System Bus Frequency Ratio Field in the Processor Frequency
+ * Configuration Register of the MSR. Therefore the current
+ * frequency cannot be calculated and has to be measured.
+ */
+ if (c->x86_model < 2)
+ return cpu_khz;
+
+ rdmsr(0x2c, msr_lo, msr_hi);
+
+ pr_debug("P4 - MSR_EBC_FREQUENCY_ID: 0x%x 0x%x\n", msr_lo, msr_hi);
+
+ /* decode the FSB: see IA-32 Intel (C) Architecture Software
+ * Developer's Manual, Volume 3: System Prgramming Guide,
+ * revision #12 in Table B-1: MSRs in the Pentium 4 and
+ * Intel Xeon Processors, on page B-4 and B-5.
+ */
+ fsb_code = (msr_lo >> 16) & 0x7;
+ switch (fsb_code) {
+ case 0:
+ fsb = 100 * 1000;
+ break;
+ case 1:
+ fsb = 13333 * 10;
+ break;
+ case 2:
+ fsb = 200 * 1000;
+ break;
+ }
+
+ if (!fsb)
+ printk(KERN_DEBUG PFX "couldn't detect FSB speed. "
+ "Please send an e-mail to <linux@brodo.de>\n");
+
+ /* Multiplier. */
+ mult = msr_lo >> 24;
+
+ pr_debug("P4 - FSB %u kHz; Multiplier %u; Speed %u kHz\n",
+ fsb, mult, (fsb * mult));
+
+ ret = (fsb * mult);
+ return ret;
+}
+
+
+/* Warning: may get called from smp_call_function_single. */
+unsigned int speedstep_get_frequency(enum speedstep_processor processor)
+{
+ switch (processor) {
+ case SPEEDSTEP_CPU_PCORE:
+ return pentium_core_get_frequency();
+ case SPEEDSTEP_CPU_PM:
+ return pentiumM_get_frequency();
+ case SPEEDSTEP_CPU_P4D:
+ case SPEEDSTEP_CPU_P4M:
+ return pentium4_get_frequency();
+ case SPEEDSTEP_CPU_PIII_T:
+ case SPEEDSTEP_CPU_PIII_C:
+ case SPEEDSTEP_CPU_PIII_C_EARLY:
+ return pentium3_get_frequency(processor);
+ default:
+ return 0;
+ };
+ return 0;
+}
+EXPORT_SYMBOL_GPL(speedstep_get_frequency);
+
+
+/*********************************************************************
+ * DETECT SPEEDSTEP-CAPABLE PROCESSOR *
+ *********************************************************************/
+
+unsigned int speedstep_detect_processor(void)
+{
+ struct cpuinfo_x86 *c = &cpu_data(0);
+ u32 ebx, msr_lo, msr_hi;
+
+ pr_debug("x86: %x, model: %x\n", c->x86, c->x86_model);
+
+ if ((c->x86_vendor != X86_VENDOR_INTEL) ||
+ ((c->x86 != 6) && (c->x86 != 0xF)))
+ return 0;
+
+ if (c->x86 == 0xF) {
+ /* Intel Mobile Pentium 4-M
+ * or Intel Mobile Pentium 4 with 533 MHz FSB */
+ if (c->x86_model != 2)
+ return 0;
+
+ ebx = cpuid_ebx(0x00000001);
+ ebx &= 0x000000FF;
+
+ pr_debug("ebx value is %x, x86_mask is %x\n", ebx, c->x86_mask);
+
+ switch (c->x86_mask) {
+ case 4:
+ /*
+ * B-stepping [M-P4-M]
+ * sample has ebx = 0x0f, production has 0x0e.
+ */
+ if ((ebx == 0x0e) || (ebx == 0x0f))
+ return SPEEDSTEP_CPU_P4M;
+ break;
+ case 7:
+ /*
+ * C-stepping [M-P4-M]
+ * needs to have ebx=0x0e, else it's a celeron:
+ * cf. 25130917.pdf / page 7, footnote 5 even
+ * though 25072120.pdf / page 7 doesn't say
+ * samples are only of B-stepping...
+ */
+ if (ebx == 0x0e)
+ return SPEEDSTEP_CPU_P4M;
+ break;
+ case 9:
+ /*
+ * D-stepping [M-P4-M or M-P4/533]
+ *
+ * this is totally strange: CPUID 0x0F29 is
+ * used by M-P4-M, M-P4/533 and(!) Celeron CPUs.
+ * The latter need to be sorted out as they don't
+ * support speedstep.
+ * Celerons with CPUID 0x0F29 may have either
+ * ebx=0x8 or 0xf -- 25130917.pdf doesn't say anything
+ * specific.
+ * M-P4-Ms may have either ebx=0xe or 0xf [see above]
+ * M-P4/533 have either ebx=0xe or 0xf. [25317607.pdf]
+ * also, M-P4M HTs have ebx=0x8, too
+ * For now, they are distinguished by the model_id
+ * string
+ */
+ if ((ebx == 0x0e) ||
+ (strstr(c->x86_model_id,
+ "Mobile Intel(R) Pentium(R) 4") != NULL))
+ return SPEEDSTEP_CPU_P4M;
+ break;
+ default:
+ break;
+ }
+ return 0;
+ }
+
+ switch (c->x86_model) {
+ case 0x0B: /* Intel PIII [Tualatin] */
+ /* cpuid_ebx(1) is 0x04 for desktop PIII,
+ * 0x06 for mobile PIII-M */
+ ebx = cpuid_ebx(0x00000001);
+ pr_debug("ebx is %x\n", ebx);
+
+ ebx &= 0x000000FF;
+
+ if (ebx != 0x06)
+ return 0;
+
+ /* So far all PIII-M processors support SpeedStep. See
+ * Intel's 24540640.pdf of June 2003
+ */
+ return SPEEDSTEP_CPU_PIII_T;
+
+ case 0x08: /* Intel PIII [Coppermine] */
+
+ /* all mobile PIII Coppermines have FSB 100 MHz
+ * ==> sort out a few desktop PIIIs. */
+ rdmsr(MSR_IA32_EBL_CR_POWERON, msr_lo, msr_hi);
+ pr_debug("Coppermine: MSR_IA32_EBL_CR_POWERON is 0x%x, 0x%x\n",
+ msr_lo, msr_hi);
+ msr_lo &= 0x00c0000;
+ if (msr_lo != 0x0080000)
+ return 0;
+
+ /*
+ * If the processor is a mobile version,
+ * platform ID has bit 50 set
+ * it has SpeedStep technology if either
+ * bit 56 or 57 is set
+ */
+ rdmsr(MSR_IA32_PLATFORM_ID, msr_lo, msr_hi);
+ pr_debug("Coppermine: MSR_IA32_PLATFORM ID is 0x%x, 0x%x\n",
+ msr_lo, msr_hi);
+ if ((msr_hi & (1<<18)) &&
+ (relaxed_check ? 1 : (msr_hi & (3<<24)))) {
+ if (c->x86_mask == 0x01) {
+ pr_debug("early PIII version\n");
+ return SPEEDSTEP_CPU_PIII_C_EARLY;
+ } else
+ return SPEEDSTEP_CPU_PIII_C;
+ }
+
+ default:
+ return 0;
+ }
+}
+EXPORT_SYMBOL_GPL(speedstep_detect_processor);
+
+
+/*********************************************************************
+ * DETECT SPEEDSTEP SPEEDS *
+ *********************************************************************/
+
+unsigned int speedstep_get_freqs(enum speedstep_processor processor,
+ unsigned int *low_speed,
+ unsigned int *high_speed,
+ unsigned int *transition_latency,
+ void (*set_state) (unsigned int state))
+{
+ unsigned int prev_speed;
+ unsigned int ret = 0;
+ unsigned long flags;
+ struct timeval tv1, tv2;
+
+ if ((!processor) || (!low_speed) || (!high_speed) || (!set_state))
+ return -EINVAL;
+
+ pr_debug("trying to determine both speeds\n");
+
+ /* get current speed */
+ prev_speed = speedstep_get_frequency(processor);
+ if (!prev_speed)
+ return -EIO;
+
+ pr_debug("previous speed is %u\n", prev_speed);
+
+ local_irq_save(flags);
+
+ /* switch to low state */
+ set_state(SPEEDSTEP_LOW);
+ *low_speed = speedstep_get_frequency(processor);
+ if (!*low_speed) {
+ ret = -EIO;
+ goto out;
+ }
+
+ pr_debug("low speed is %u\n", *low_speed);
+
+ /* start latency measurement */
+ if (transition_latency)
+ do_gettimeofday(&tv1);
+
+ /* switch to high state */
+ set_state(SPEEDSTEP_HIGH);
+
+ /* end latency measurement */
+ if (transition_latency)
+ do_gettimeofday(&tv2);
+
+ *high_speed = speedstep_get_frequency(processor);
+ if (!*high_speed) {
+ ret = -EIO;
+ goto out;
+ }
+
+ pr_debug("high speed is %u\n", *high_speed);
+
+ if (*low_speed == *high_speed) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ /* switch to previous state, if necessary */
+ if (*high_speed != prev_speed)
+ set_state(SPEEDSTEP_LOW);
+
+ if (transition_latency) {
+ *transition_latency = (tv2.tv_sec - tv1.tv_sec) * USEC_PER_SEC +
+ tv2.tv_usec - tv1.tv_usec;
+ pr_debug("transition latency is %u uSec\n", *transition_latency);
+
+ /* convert uSec to nSec and add 20% for safety reasons */
+ *transition_latency *= 1200;
+
+ /* check if the latency measurement is too high or too low
+ * and set it to a safe value (500uSec) in that case
+ */
+ if (*transition_latency > 10000000 ||
+ *transition_latency < 50000) {
+ printk(KERN_WARNING PFX "frequency transition "
+ "measured seems out of range (%u "
+ "nSec), falling back to a safe one of"
+ "%u nSec.\n",
+ *transition_latency, 500000);
+ *transition_latency = 500000;
+ }
+ }
+
+out:
+ local_irq_restore(flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(speedstep_get_freqs);
+
+#ifdef CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK
+module_param(relaxed_check, int, 0444);
+MODULE_PARM_DESC(relaxed_check,
+ "Don't do all checks for speedstep capability.");
+#endif
+
+MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>");
+MODULE_DESCRIPTION("Library for Intel SpeedStep 1 or 2 cpufreq drivers.");
+MODULE_LICENSE("GPL");
--- /dev/null
+/*
+ * (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ * Library for common functions for Intel SpeedStep v.1 and v.2 support
+ *
+ * BIG FAT DISCLAIMER: Work in progress code. Possibly *dangerous*
+ */
+
+
+
+/* processors */
+enum speedstep_processor {
+ SPEEDSTEP_CPU_PIII_C_EARLY = 0x00000001, /* Coppermine core */
+ SPEEDSTEP_CPU_PIII_C = 0x00000002, /* Coppermine core */
+ SPEEDSTEP_CPU_PIII_T = 0x00000003, /* Tualatin core */
+ SPEEDSTEP_CPU_P4M = 0x00000004, /* P4-M */
+/* the following processors are not speedstep-capable and are not auto-detected
+ * in speedstep_detect_processor(). However, their speed can be detected using
+ * the speedstep_get_frequency() call. */
+ SPEEDSTEP_CPU_PM = 0xFFFFFF03, /* Pentium M */
+ SPEEDSTEP_CPU_P4D = 0xFFFFFF04, /* desktop P4 */
+ SPEEDSTEP_CPU_PCORE = 0xFFFFFF05, /* Core */
+};
+
+/* speedstep states -- only two of them */
+
+#define SPEEDSTEP_HIGH 0x00000000
+#define SPEEDSTEP_LOW 0x00000001
+
+
+/* detect a speedstep-capable processor */
+extern enum speedstep_processor speedstep_detect_processor(void);
+
+/* detect the current speed (in khz) of the processor */
+extern unsigned int speedstep_get_frequency(enum speedstep_processor processor);
+
+
+/* detect the low and high speeds of the processor. The callback
+ * set_state"'s first argument is either SPEEDSTEP_HIGH or
+ * SPEEDSTEP_LOW; the second argument is zero so that no
+ * cpufreq_notify_transition calls are initiated.
+ */
+extern unsigned int speedstep_get_freqs(enum speedstep_processor processor,
+ unsigned int *low_speed,
+ unsigned int *high_speed,
+ unsigned int *transition_latency,
+ void (*set_state) (unsigned int state));
--- /dev/null
+/*
+ * Intel SpeedStep SMI driver.
+ *
+ * (C) 2003 Hiroshi Miura <miura@da-cha.org>
+ *
+ * Licensed under the terms of the GNU GPL License version 2.
+ *
+ */
+
+
+/*********************************************************************
+ * SPEEDSTEP - DEFINITIONS *
+ *********************************************************************/
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <asm/ist.h>
+
+#include "speedstep-lib.h"
+
+/* speedstep system management interface port/command.
+ *
+ * These parameters are got from IST-SMI BIOS call.
+ * If user gives it, these are used.
+ *
+ */
+static int smi_port;
+static int smi_cmd;
+static unsigned int smi_sig;
+
+/* info about the processor */
+static enum speedstep_processor speedstep_processor;
+
+/*
+ * There are only two frequency states for each processor. Values
+ * are in kHz for the time being.
+ */
+static struct cpufreq_frequency_table speedstep_freqs[] = {
+ {SPEEDSTEP_HIGH, 0},
+ {SPEEDSTEP_LOW, 0},
+ {0, CPUFREQ_TABLE_END},
+};
+
+#define GET_SPEEDSTEP_OWNER 0
+#define GET_SPEEDSTEP_STATE 1
+#define SET_SPEEDSTEP_STATE 2
+#define GET_SPEEDSTEP_FREQS 4
+
+/* how often shall the SMI call be tried if it failed, e.g. because
+ * of DMA activity going on? */
+#define SMI_TRIES 5
+
+/**
+ * speedstep_smi_ownership
+ */
+static int speedstep_smi_ownership(void)
+{
+ u32 command, result, magic, dummy;
+ u32 function = GET_SPEEDSTEP_OWNER;
+ unsigned char magic_data[] = "Copyright (c) 1999 Intel Corporation";
+
+ command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
+ magic = virt_to_phys(magic_data);
+
+ pr_debug("trying to obtain ownership with command %x at port %x\n",
+ command, smi_port);
+
+ __asm__ __volatile__(
+ "push %%ebp\n"
+ "out %%al, (%%dx)\n"
+ "pop %%ebp\n"
+ : "=D" (result),
+ "=a" (dummy), "=b" (dummy), "=c" (dummy), "=d" (dummy),
+ "=S" (dummy)
+ : "a" (command), "b" (function), "c" (0), "d" (smi_port),
+ "D" (0), "S" (magic)
+ : "memory"
+ );
+
+ pr_debug("result is %x\n", result);
+
+ return result;
+}
+
+/**
+ * speedstep_smi_get_freqs - get SpeedStep preferred & current freq.
+ * @low: the low frequency value is placed here
+ * @high: the high frequency value is placed here
+ *
+ * Only available on later SpeedStep-enabled systems, returns false results or
+ * even hangs [cf. bugme.osdl.org # 1422] on earlier systems. Empirical testing
+ * shows that the latter occurs if !(ist_info.event & 0xFFFF).
+ */
+static int speedstep_smi_get_freqs(unsigned int *low, unsigned int *high)
+{
+ u32 command, result = 0, edi, high_mhz, low_mhz, dummy;
+ u32 state = 0;
+ u32 function = GET_SPEEDSTEP_FREQS;
+
+ if (!(ist_info.event & 0xFFFF)) {
+ pr_debug("bug #1422 -- can't read freqs from BIOS\n");
+ return -ENODEV;
+ }
+
+ command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
+
+ pr_debug("trying to determine frequencies with command %x at port %x\n",
+ command, smi_port);
+
+ __asm__ __volatile__(
+ "push %%ebp\n"
+ "out %%al, (%%dx)\n"
+ "pop %%ebp"
+ : "=a" (result),
+ "=b" (high_mhz),
+ "=c" (low_mhz),
+ "=d" (state), "=D" (edi), "=S" (dummy)
+ : "a" (command),
+ "b" (function),
+ "c" (state),
+ "d" (smi_port), "S" (0), "D" (0)
+ );
+
+ pr_debug("result %x, low_freq %u, high_freq %u\n",
+ result, low_mhz, high_mhz);
+
+ /* abort if results are obviously incorrect... */
+ if ((high_mhz + low_mhz) < 600)
+ return -EINVAL;
+
+ *high = high_mhz * 1000;
+ *low = low_mhz * 1000;
+
+ return result;
+}
+
+/**
+ * speedstep_get_state - set the SpeedStep state
+ * @state: processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
+ *
+ */
+static int speedstep_get_state(void)
+{
+ u32 function = GET_SPEEDSTEP_STATE;
+ u32 result, state, edi, command, dummy;
+
+ command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
+
+ pr_debug("trying to determine current setting with command %x "
+ "at port %x\n", command, smi_port);
+
+ __asm__ __volatile__(
+ "push %%ebp\n"
+ "out %%al, (%%dx)\n"
+ "pop %%ebp\n"
+ : "=a" (result),
+ "=b" (state), "=D" (edi),
+ "=c" (dummy), "=d" (dummy), "=S" (dummy)
+ : "a" (command), "b" (function), "c" (0),
+ "d" (smi_port), "S" (0), "D" (0)
+ );
+
+ pr_debug("state is %x, result is %x\n", state, result);
+
+ return state & 1;
+}
+
+
+/**
+ * speedstep_set_state - set the SpeedStep state
+ * @state: new processor frequency state (SPEEDSTEP_LOW or SPEEDSTEP_HIGH)
+ *
+ */
+static void speedstep_set_state(unsigned int state)
+{
+ unsigned int result = 0, command, new_state, dummy;
+ unsigned long flags;
+ unsigned int function = SET_SPEEDSTEP_STATE;
+ unsigned int retry = 0;
+
+ if (state > 0x1)
+ return;
+
+ /* Disable IRQs */
+ local_irq_save(flags);
+
+ command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
+
+ pr_debug("trying to set frequency to state %u "
+ "with command %x at port %x\n",
+ state, command, smi_port);
+
+ do {
+ if (retry) {
+ pr_debug("retry %u, previous result %u, waiting...\n",
+ retry, result);
+ mdelay(retry * 50);
+ }
+ retry++;
+ __asm__ __volatile__(
+ "push %%ebp\n"
+ "out %%al, (%%dx)\n"
+ "pop %%ebp"
+ : "=b" (new_state), "=D" (result),
+ "=c" (dummy), "=a" (dummy),
+ "=d" (dummy), "=S" (dummy)
+ : "a" (command), "b" (function), "c" (state),
+ "d" (smi_port), "S" (0), "D" (0)
+ );
+ } while ((new_state != state) && (retry <= SMI_TRIES));
+
+ /* enable IRQs */
+ local_irq_restore(flags);
+
+ if (new_state == state)
+ pr_debug("change to %u MHz succeeded after %u tries "
+ "with result %u\n",
+ (speedstep_freqs[new_state].frequency / 1000),
+ retry, result);
+ else
+ printk(KERN_ERR "cpufreq: change to state %u "
+ "failed with new_state %u and result %u\n",
+ state, new_state, result);
+
+ return;
+}
+
+
+/**
+ * speedstep_target - set a new CPUFreq policy
+ * @policy: new policy
+ * @target_freq: new freq
+ * @relation:
+ *
+ * Sets a new CPUFreq policy/freq.
+ */
+static int speedstep_target(struct cpufreq_policy *policy,
+ unsigned int target_freq, unsigned int relation)
+{
+ unsigned int newstate = 0;
+ struct cpufreq_freqs freqs;
+
+ if (cpufreq_frequency_table_target(policy, &speedstep_freqs[0],
+ target_freq, relation, &newstate))
+ return -EINVAL;
+
+ freqs.old = speedstep_freqs[speedstep_get_state()].frequency;
+ freqs.new = speedstep_freqs[newstate].frequency;
+ freqs.cpu = 0; /* speedstep.c is UP only driver */
+
+ if (freqs.old == freqs.new)
+ return 0;
+
+ cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+ speedstep_set_state(newstate);
+ cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+ return 0;
+}
+
+
+/**
+ * speedstep_verify - verifies a new CPUFreq policy
+ * @policy: new policy
+ *
+ * Limit must be within speedstep_low_freq and speedstep_high_freq, with
+ * at least one border included.
+ */
+static int speedstep_verify(struct cpufreq_policy *policy)
+{
+ return cpufreq_frequency_table_verify(policy, &speedstep_freqs[0]);
+}
+
+
+static int speedstep_cpu_init(struct cpufreq_policy *policy)
+{
+ int result;
+ unsigned int speed, state;
+ unsigned int *low, *high;
+
+ /* capability check */
+ if (policy->cpu != 0)
+ return -ENODEV;
+
+ result = speedstep_smi_ownership();
+ if (result) {
+ pr_debug("fails in acquiring ownership of a SMI interface.\n");
+ return -EINVAL;
+ }
+
+ /* detect low and high frequency */
+ low = &speedstep_freqs[SPEEDSTEP_LOW].frequency;
+ high = &speedstep_freqs[SPEEDSTEP_HIGH].frequency;
+
+ result = speedstep_smi_get_freqs(low, high);
+ if (result) {
+ /* fall back to speedstep_lib.c dection mechanism:
+ * try both states out */
+ pr_debug("could not detect low and high frequencies "
+ "by SMI call.\n");
+ result = speedstep_get_freqs(speedstep_processor,
+ low, high,
+ NULL,
+ &speedstep_set_state);
+
+ if (result) {
+ pr_debug("could not detect two different speeds"
+ " -- aborting.\n");
+ return result;
+ } else
+ pr_debug("workaround worked.\n");
+ }
+
+ /* get current speed setting */
+ state = speedstep_get_state();
+ speed = speedstep_freqs[state].frequency;
+
+ pr_debug("currently at %s speed setting - %i MHz\n",
+ (speed == speedstep_freqs[SPEEDSTEP_LOW].frequency)
+ ? "low" : "high",
+ (speed / 1000));
+
+ /* cpuinfo and default policy values */
+ policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
+ policy->cur = speed;
+
+ result = cpufreq_frequency_table_cpuinfo(policy, speedstep_freqs);
+ if (result)
+ return result;
+
+ cpufreq_frequency_table_get_attr(speedstep_freqs, policy->cpu);
+
+ return 0;
+}
+
+static int speedstep_cpu_exit(struct cpufreq_policy *policy)
+{
+ cpufreq_frequency_table_put_attr(policy->cpu);
+ return 0;
+}
+
+static unsigned int speedstep_get(unsigned int cpu)
+{
+ if (cpu)
+ return -ENODEV;
+ return speedstep_get_frequency(speedstep_processor);
+}
+
+
+static int speedstep_resume(struct cpufreq_policy *policy)
+{
+ int result = speedstep_smi_ownership();
+
+ if (result)
+ pr_debug("fails in re-acquiring ownership of a SMI interface.\n");
+
+ return result;
+}
+
+static struct freq_attr *speedstep_attr[] = {
+ &cpufreq_freq_attr_scaling_available_freqs,
+ NULL,
+};
+
+static struct cpufreq_driver speedstep_driver = {
+ .name = "speedstep-smi",
+ .verify = speedstep_verify,
+ .target = speedstep_target,
+ .init = speedstep_cpu_init,
+ .exit = speedstep_cpu_exit,
+ .get = speedstep_get,
+ .resume = speedstep_resume,
+ .owner = THIS_MODULE,
+ .attr = speedstep_attr,
+};
+
+/**
+ * speedstep_init - initializes the SpeedStep CPUFreq driver
+ *
+ * Initializes the SpeedStep support. Returns -ENODEV on unsupported
+ * BIOS, -EINVAL on problems during initiatization, and zero on
+ * success.
+ */
+static int __init speedstep_init(void)
+{
+ speedstep_processor = speedstep_detect_processor();
+
+ switch (speedstep_processor) {
+ case SPEEDSTEP_CPU_PIII_T:
+ case SPEEDSTEP_CPU_PIII_C:
+ case SPEEDSTEP_CPU_PIII_C_EARLY:
+ break;
+ default:
+ speedstep_processor = 0;
+ }
+
+ if (!speedstep_processor) {
+ pr_debug("No supported Intel CPU detected.\n");
+ return -ENODEV;
+ }
+
+ pr_debug("signature:0x%.8ulx, command:0x%.8ulx, "
+ "event:0x%.8ulx, perf_level:0x%.8ulx.\n",
+ ist_info.signature, ist_info.command,
+ ist_info.event, ist_info.perf_level);
+
+ /* Error if no IST-SMI BIOS or no PARM
+ sig= 'ISGE' aka 'Intel Speedstep Gate E' */
+ if ((ist_info.signature != 0x47534943) && (
+ (smi_port == 0) || (smi_cmd == 0)))
+ return -ENODEV;
+
+ if (smi_sig == 1)
+ smi_sig = 0x47534943;
+ else
+ smi_sig = ist_info.signature;
+
+ /* setup smi_port from MODLULE_PARM or BIOS */
+ if ((smi_port > 0xff) || (smi_port < 0))
+ return -EINVAL;
+ else if (smi_port == 0)
+ smi_port = ist_info.command & 0xff;
+
+ if ((smi_cmd > 0xff) || (smi_cmd < 0))
+ return -EINVAL;
+ else if (smi_cmd == 0)
+ smi_cmd = (ist_info.command >> 16) & 0xff;
+
+ return cpufreq_register_driver(&speedstep_driver);
+}
+
+
+/**
+ * speedstep_exit - unregisters SpeedStep support
+ *
+ * Unregisters SpeedStep support.
+ */
+static void __exit speedstep_exit(void)
+{
+ cpufreq_unregister_driver(&speedstep_driver);
+}
+
+module_param(smi_port, int, 0444);
+module_param(smi_cmd, int, 0444);
+module_param(smi_sig, uint, 0444);
+
+MODULE_PARM_DESC(smi_port, "Override the BIOS-given IST port with this value "
+ "-- Intel's default setting is 0xb2");
+MODULE_PARM_DESC(smi_cmd, "Override the BIOS-given IST command with this value "
+ "-- Intel's default setting is 0x82");
+MODULE_PARM_DESC(smi_sig, "Set to 1 to fake the IST signature when using the "
+ "SMI interface.");
+
+MODULE_AUTHOR("Hiroshi Miura");
+MODULE_DESCRIPTION("Speedstep driver for IST applet SMI interface.");
+MODULE_LICENSE("GPL");
+
+module_init(speedstep_init);
+module_exit(speedstep_exit);
scrubval = scrubval & 0x001F;
- amd64_debug("pci-read, sdram scrub control value: %d\n", scrubval);
-
for (i = 0; i < ARRAY_SIZE(scrubrates); i++) {
if (scrubrates[i].scrubval == scrubval) {
retval = scrubrates[i].bandwidth;
/* On F10h and later ErrAddr is MC4_ADDR[47:1] */
static u64 get_error_address(struct mce *m)
{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+ u64 addr;
u8 start_bit = 1;
u8 end_bit = 47;
- if (boot_cpu_data.x86 == 0xf) {
+ if (c->x86 == 0xf) {
start_bit = 3;
end_bit = 39;
}
- return m->addr & GENMASK(start_bit, end_bit);
+ addr = m->addr & GENMASK(start_bit, end_bit);
+
+ /*
+ * Erratum 637 workaround
+ */
+ if (c->x86 == 0x15) {
+ struct amd64_pvt *pvt;
+ u64 cc6_base, tmp_addr;
+ u32 tmp;
+ u8 mce_nid, intlv_en;
+
+ if ((addr & GENMASK(24, 47)) >> 24 != 0x00fdf7)
+ return addr;
+
+ mce_nid = amd_get_nb_id(m->extcpu);
+ pvt = mcis[mce_nid]->pvt_info;
+
+ amd64_read_pci_cfg(pvt->F1, DRAM_LOCAL_NODE_LIM, &tmp);
+ intlv_en = tmp >> 21 & 0x7;
+
+ /* add [47:27] + 3 trailing bits */
+ cc6_base = (tmp & GENMASK(0, 20)) << 3;
+
+ /* reverse and add DramIntlvEn */
+ cc6_base |= intlv_en ^ 0x7;
+
+ /* pin at [47:24] */
+ cc6_base <<= 24;
+
+ if (!intlv_en)
+ return cc6_base | (addr & GENMASK(0, 23));
+
+ amd64_read_pci_cfg(pvt->F1, DRAM_LOCAL_NODE_BASE, &tmp);
+
+ /* faster log2 */
+ tmp_addr = (addr & GENMASK(12, 23)) << __fls(intlv_en + 1);
+
+ /* OR DramIntlvSel into bits [14:12] */
+ tmp_addr |= (tmp & GENMASK(21, 23)) >> 9;
+
+ /* add remaining [11:0] bits from original MC4_ADDR */
+ tmp_addr |= addr & GENMASK(0, 11);
+
+ return cc6_base | tmp_addr;
+ }
+
+ return addr;
}
static void read_dram_base_limit_regs(struct amd64_pvt *pvt, unsigned range)
{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
int off = range << 3;
amd64_read_pci_cfg(pvt->F1, DRAM_BASE_LO + off, &pvt->ranges[range].base.lo);
amd64_read_pci_cfg(pvt->F1, DRAM_LIMIT_LO + off, &pvt->ranges[range].lim.lo);
- if (boot_cpu_data.x86 == 0xf)
+ if (c->x86 == 0xf)
return;
if (!dram_rw(pvt, range))
amd64_read_pci_cfg(pvt->F1, DRAM_BASE_HI + off, &pvt->ranges[range].base.hi);
amd64_read_pci_cfg(pvt->F1, DRAM_LIMIT_HI + off, &pvt->ranges[range].lim.hi);
+
+ /* Factor in CC6 save area by reading dst node's limit reg */
+ if (c->x86 == 0x15) {
+ struct pci_dev *f1 = NULL;
+ u8 nid = dram_dst_node(pvt, range);
+ u32 llim;
+
+ f1 = pci_get_domain_bus_and_slot(0, 0, PCI_DEVFN(0x18 + nid, 1));
+ if (WARN_ON(!f1))
+ return;
+
+ amd64_read_pci_cfg(f1, DRAM_LOCAL_NODE_LIM, &llim);
+
+ pvt->ranges[range].lim.lo &= GENMASK(0, 15);
+
+ /* {[39:27],111b} */
+ pvt->ranges[range].lim.lo |= ((llim & 0x1fff) << 3 | 0x7) << 16;
+
+ pvt->ranges[range].lim.hi &= GENMASK(0, 7);
+
+ /* [47:40] */
+ pvt->ranges[range].lim.hi |= llim >> 13;
+
+ pci_dev_put(f1);
+ }
}
static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
return -EINVAL;
}
- if (intlv_en &&
- (intlv_sel != ((sys_addr >> 12) & intlv_en))) {
- amd64_warn("Botched intlv bits, en: 0x%x, sel: 0x%x\n",
- intlv_en, intlv_sel);
+ if (intlv_en && (intlv_sel != ((sys_addr >> 12) & intlv_en)))
return -EINVAL;
- }
sys_addr = f1x_swap_interleaved_region(pvt, sys_addr);
#define DCT_CFG_SEL 0x10C
+#define DRAM_LOCAL_NODE_BASE 0x120
+#define DRAM_LOCAL_NODE_LIM 0x124
+
#define DRAM_BASE_HI 0x140
#define DRAM_LIMIT_HI 0x144
return -EINVAL;
new_bw = mci->set_sdram_scrub_rate(mci, bandwidth);
- if (new_bw >= 0) {
- edac_printk(KERN_DEBUG, EDAC_MC, "Scrub rate set to %d\n", new_bw);
- return count;
+ if (new_bw < 0) {
+ edac_printk(KERN_WARNING, EDAC_MC,
+ "Error setting scrub rate to: %lu\n", bandwidth);
+ return -EINVAL;
}
- edac_printk(KERN_DEBUG, EDAC_MC, "Error setting scrub rate to: %lu\n", bandwidth);
- return -EINVAL;
+ return count;
}
/*
return bandwidth;
}
- edac_printk(KERN_DEBUG, EDAC_MC, "Read scrub rate: %d\n", bandwidth);
return sprintf(data, "%d\n", bandwidth);
}
struct ppc4xx_edac_pdata *pdata = NULL;
const struct device_node *np = op->dev.of_node;
- if (op->dev.of_match == NULL)
+ if (of_match_device(ppc4xx_edac_match, &op->dev) == NULL)
return -EINVAL;
/* Initial driver pointers and private data */
{
struct fw_ohci *ohci;
unsigned long flags;
- int ret = -EBUSY;
__be32 *next_config_rom;
dma_addr_t uninitialized_var(next_config_rom_bus);
spin_lock_irqsave(&ohci->lock, flags);
+ /*
+ * If there is not an already pending config_rom update,
+ * push our new allocation into the ohci->next_config_rom
+ * and then mark the local variable as null so that we
+ * won't deallocate the new buffer.
+ *
+ * OTOH, if there is a pending config_rom update, just
+ * use that buffer with the new config_rom data, and
+ * let this routine free the unused DMA allocation.
+ */
+
if (ohci->next_config_rom == NULL) {
ohci->next_config_rom = next_config_rom;
ohci->next_config_rom_bus = next_config_rom_bus;
+ next_config_rom = NULL;
+ }
- copy_config_rom(ohci->next_config_rom, config_rom, length);
+ copy_config_rom(ohci->next_config_rom, config_rom, length);
- ohci->next_header = config_rom[0];
- ohci->next_config_rom[0] = 0;
+ ohci->next_header = config_rom[0];
+ ohci->next_config_rom[0] = 0;
- reg_write(ohci, OHCI1394_ConfigROMmap,
- ohci->next_config_rom_bus);
- ret = 0;
- }
+ reg_write(ohci, OHCI1394_ConfigROMmap, ohci->next_config_rom_bus);
spin_unlock_irqrestore(&ohci->lock, flags);
+ /* If we didn't use the DMA allocation, delete it. */
+ if (next_config_rom != NULL)
+ dma_free_coherent(ohci->card.device, CONFIG_ROM_SIZE,
+ next_config_rom, next_config_rom_bus);
+
/*
* Now initiate a bus reset to have the changes take
* effect. We clean up the old config rom memory and DMA
* controller could need to access it before the bus reset
* takes effect.
*/
- if (ret == 0)
- fw_schedule_bus_reset(&ohci->card, true, true);
- else
- dma_free_coherent(ohci->card.device, CONFIG_ROM_SIZE,
- next_config_rom, next_config_rom_bus);
- return ret;
+ fw_schedule_bus_reset(&ohci->card, true, true);
+
+ return 0;
}
static void ohci_send_request(struct fw_card *card, struct fw_packet *packet)
struct acpi_table_ibft *ibft_addr;
EXPORT_SYMBOL_GPL(ibft_addr);
-#define IBFT_SIGN "iBFT"
+static const struct {
+ char *sign;
+} ibft_signs[] = {
+#ifdef CONFIG_ACPI
+ /*
+ * One spec says "IBFT", the other says "iBFT". We have to check
+ * for both.
+ */
+ { ACPI_SIG_IBFT },
+#endif
+ { "iBFT" },
+ { "BIFT" }, /* Broadcom iSCSI Offload */
+};
+
#define IBFT_SIGN_LEN 4
#define IBFT_START 0x80000 /* 512kB */
#define IBFT_END 0x100000 /* 1MB */
unsigned long pos;
unsigned int len = 0;
void *virt;
+ int i;
for (pos = IBFT_START; pos < IBFT_END; pos += 16) {
/* The table can't be inside the VGA BIOS reserved space,
if (pos == VGA_MEM)
pos += VGA_SIZE;
virt = isa_bus_to_virt(pos);
- if (memcmp(virt, IBFT_SIGN, IBFT_SIGN_LEN) == 0) {
- unsigned long *addr =
- (unsigned long *)isa_bus_to_virt(pos + 4);
- len = *addr;
- /* if the length of the table extends past 1M,
- * the table cannot be valid. */
- if (pos + len <= (IBFT_END-1)) {
- ibft_addr = (struct acpi_table_ibft *)virt;
- break;
+
+ for (i = 0; i < ARRAY_SIZE(ibft_signs); i++) {
+ if (memcmp(virt, ibft_signs[i].sign, IBFT_SIGN_LEN) ==
+ 0) {
+ unsigned long *addr =
+ (unsigned long *)isa_bus_to_virt(pos + 4);
+ len = *addr;
+ /* if the length of the table extends past 1M,
+ * the table cannot be valid. */
+ if (pos + len <= (IBFT_END-1)) {
+ ibft_addr = (struct acpi_table_ibft *)virt;
+ goto done;
+ }
}
}
}
+done:
return len;
}
/*
*/
unsigned long __init find_ibft_region(unsigned long *sizep)
{
-
+ int i;
ibft_addr = NULL;
#ifdef CONFIG_ACPI
- /*
- * One spec says "IBFT", the other says "iBFT". We have to check
- * for both.
- */
- if (!ibft_addr)
- acpi_table_parse(ACPI_SIG_IBFT, acpi_find_ibft);
- if (!ibft_addr)
- acpi_table_parse(IBFT_SIGN, acpi_find_ibft);
+ for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++)
+ acpi_table_parse(ibft_signs[i].sign, acpi_find_ibft);
#endif /* CONFIG_ACPI */
/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
depends on DRM
select FB
select FRAMEBUFFER_CONSOLE if !EXPERT
+ select FRAMEBUFFER_CONSOLE_DETECT_PRIMARY if FRAMEBUFFER_CONSOLE
help
FB and CRTC helpers for KMS drivers.
}
EXPORT_SYMBOL(drm_fb_helper_debug_leave);
+bool drm_fb_helper_restore_fbdev_mode(struct drm_fb_helper *fb_helper)
+{
+ bool error = false;
+ int i, ret;
+ for (i = 0; i < fb_helper->crtc_count; i++) {
+ struct drm_mode_set *mode_set = &fb_helper->crtc_info[i].mode_set;
+ ret = drm_crtc_helper_set_config(mode_set);
+ if (ret)
+ error = true;
+ }
+ return error;
+}
+EXPORT_SYMBOL(drm_fb_helper_restore_fbdev_mode);
+
bool drm_fb_helper_force_kernel_mode(void)
{
- int i = 0;
bool ret, error = false;
struct drm_fb_helper *helper;
return false;
list_for_each_entry(helper, &kernel_fb_helper_list, kernel_fb_list) {
- for (i = 0; i < helper->crtc_count; i++) {
- struct drm_mode_set *mode_set = &helper->crtc_info[i].mode_set;
- ret = drm_crtc_helper_set_config(mode_set);
- if (ret)
- error = true;
- }
+ if (helper->dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+ continue;
+
+ ret = drm_fb_helper_restore_fbdev_mode(helper);
+ if (ret)
+ error = true;
}
return error;
}
}
EXPORT_SYMBOL(drm_fb_helper_initial_config);
-bool drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper)
+/**
+ * drm_fb_helper_hotplug_event - respond to a hotplug notification by
+ * probing all the outputs attached to the fb.
+ * @fb_helper: the drm_fb_helper
+ *
+ * LOCKING:
+ * Called at runtime, must take mode config lock.
+ *
+ * Scan the connectors attached to the fb_helper and try to put together a
+ * setup after *notification of a change in output configuration.
+ *
+ * RETURNS:
+ * 0 on success and a non-zero error code otherwise.
+ */
+int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper)
{
+ struct drm_device *dev = fb_helper->dev;
int count = 0;
u32 max_width, max_height, bpp_sel;
bool bound = false, crtcs_bound = false;
struct drm_crtc *crtc;
if (!fb_helper->fb)
- return false;
+ return 0;
- list_for_each_entry(crtc, &fb_helper->dev->mode_config.crtc_list, head) {
+ mutex_lock(&dev->mode_config.mutex);
+ list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
if (crtc->fb)
crtcs_bound = true;
if (crtc->fb == fb_helper->fb)
if (!bound && crtcs_bound) {
fb_helper->delayed_hotplug = true;
- return false;
+ mutex_unlock(&dev->mode_config.mutex);
+ return 0;
}
DRM_DEBUG_KMS("\n");
count = drm_fb_helper_probe_connector_modes(fb_helper, max_width,
max_height);
drm_setup_crtcs(fb_helper);
+ mutex_unlock(&dev->mode_config.mutex);
return drm_fb_helper_single_fb_probe(fb_helper, bpp_sel);
}
void drm_vblank_off(struct drm_device *dev, int crtc)
{
+ struct drm_pending_vblank_event *e, *t;
+ struct timeval now;
unsigned long irqflags;
+ unsigned int seq;
spin_lock_irqsave(&dev->vbl_lock, irqflags);
vblank_disable_and_save(dev, crtc);
DRM_WAKEUP(&dev->vbl_queue[crtc]);
+
+ /* Send any queued vblank events, lest the natives grow disquiet */
+ seq = drm_vblank_count_and_time(dev, crtc, &now);
+ list_for_each_entry_safe(e, t, &dev->vblank_event_list, base.link) {
+ if (e->pipe != crtc)
+ continue;
+ DRM_DEBUG("Sending premature vblank event on disable: \
+ wanted %d, current %d\n",
+ e->event.sequence, seq);
+
+ e->event.sequence = seq;
+ e->event.tv_sec = now.tv_sec;
+ e->event.tv_usec = now.tv_usec;
+ drm_vblank_put(dev, e->pipe);
+ list_move_tail(&e->base.link, &e->base.file_priv->event_list);
+ wake_up_interruptible(&e->base.file_priv->event_wait);
+ trace_drm_vblank_event_delivered(e->base.pid, e->pipe,
+ e->event.sequence);
+ }
+
spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
}
EXPORT_SYMBOL(drm_vblank_off);
void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
{
list_replace(&old->node_list, &new->node_list);
- list_replace(&old->node_list, &new->hole_stack);
+ list_replace(&old->hole_stack, &new->hole_stack);
new->hole_follows = old->hole_follows;
new->mm = old->mm;
new->start = old->start;
entry->size);
total_used += entry->size;
if (entry->hole_follows) {
- hole_start = drm_mm_hole_node_start(&mm->head_node);
- hole_end = drm_mm_hole_node_end(&mm->head_node);
+ hole_start = drm_mm_hole_node_start(entry);
+ hole_end = drm_mm_hole_node_end(entry);
hole_size = hole_end - hole_start;
seq_printf(m, "0x%08lx-0x%08lx: 0x%08lx: free\n",
hole_start, hole_end, hole_size);
drm_i915_private_t *dev_priv = dev->dev_private;
if (!dev_priv || drm_core_check_feature(dev, DRIVER_MODESET)) {
- drm_fb_helper_restore();
+ intel_fb_restore_mode(dev);
vga_switcheroo_process_delayed_switch();
return;
}
unsigned int i915_powersave = 1;
module_param_named(powersave, i915_powersave, int, 0600);
-unsigned int i915_semaphores = 1;
+unsigned int i915_semaphores = 0;
module_param_named(semaphores, i915_semaphores, int, 0600);
unsigned int i915_enable_rc6 = 0;
intel_clock_t clock;
if ((dpll & DISPLAY_RATE_SELECT_FPA1) == 0)
- fp = FP0(pipe);
+ fp = I915_READ(FP0(pipe));
else
- fp = FP1(pipe);
+ fp = I915_READ(FP1(pipe));
clock.m1 = (fp & FP_M1_DIV_MASK) >> FP_M1_DIV_SHIFT;
if (IS_PINEVIEW(dev)) {
return ERR_PTR(-ENOENT);
intel_fb = kzalloc(sizeof(*intel_fb), GFP_KERNEL);
- if (!intel_fb)
+ if (!intel_fb) {
+ drm_gem_object_unreference_unlocked(&obj->base);
return ERR_PTR(-ENOMEM);
+ }
ret = intel_framebuffer_init(dev, intel_fb, mode_cmd, obj);
if (ret) {
if (!HAS_PCH_CPT(dev) &&
I915_READ(intel_dp->output_reg) & DP_PIPEB_SELECT) {
- struct intel_crtc *intel_crtc = to_intel_crtc(intel_dp->base.base.crtc);
+ struct drm_crtc *crtc = intel_dp->base.base.crtc;
+
/* Hardware workaround: leaving our transcoder select
* set to transcoder B while it's off will prevent the
* corresponding HDMI output on transcoder A.
/* Changes to enable or select take place the vblank
* after being written.
*/
- intel_wait_for_vblank(dev, intel_crtc->pipe);
+ if (crtc == NULL) {
+ /* We can arrive here never having been attached
+ * to a CRTC, for instance, due to inheriting
+ * random state from the BIOS.
+ *
+ * If the pipe is not running, play safe and
+ * wait for the clocks to stabilise before
+ * continuing.
+ */
+ POSTING_READ(intel_dp->output_reg);
+ msleep(50);
+ } else
+ intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe);
}
I915_WRITE(intel_dp->output_reg, DP & ~DP_PORT_EN);
struct drm_file *file_priv);
extern void intel_fb_output_poll_changed(struct drm_device *dev);
+extern void intel_fb_restore_mode(struct drm_device *dev);
#endif /* __INTEL_DRV_H__ */
drm_i915_private_t *dev_priv = dev->dev_private;
drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
}
+
+void intel_fb_restore_mode(struct drm_device *dev)
+{
+ int ret;
+ drm_i915_private_t *dev_priv = dev->dev_private;
+
+ ret = drm_fb_helper_restore_fbdev_mode(&dev_priv->fbdev->helper);
+ if (ret)
+ DRM_DEBUG("failed to restore crtc mode\n");
+}
struct drm_device *dev = dev_priv->dev;
struct drm_connector *connector = dev_priv->int_lvds_connector;
+ if (dev->switch_power_state != DRM_SWITCH_POWER_ON)
+ return NOTIFY_OK;
+
/*
* check and update the status of LVDS connector after receiving
* the LID nofication event.
{
struct drm_nouveau_private *dev_priv = dev->dev_private;
- nouveau_bo_ref(NULL, &dev_priv->vga_ram);
-
ttm_bo_device_release(&dev_priv->ttm.bdev);
nouveau_ttm_global_release(dev_priv);
nvbe->nr_pages = 0;
while (num_pages--) {
- if (dma_addrs[nvbe->nr_pages] != DMA_ERROR_CODE) {
+ /* this code path isn't called and is incorrect anyways */
+ if (0) { /*dma_addrs[nvbe->nr_pages] != DMA_ERROR_CODE)*/
nvbe->pages[nvbe->nr_pages] =
dma_addrs[nvbe->nr_pages];
nvbe->ttm_alloced[nvbe->nr_pages] = true;
engine->mc.takedown(dev);
engine->display.late_takedown(dev);
+ if (dev_priv->vga_ram) {
+ nouveau_bo_unpin(dev_priv->vga_ram);
+ nouveau_bo_ref(NULL, &dev_priv->vga_ram);
+ }
+
mutex_lock(&dev->struct_mutex);
ttm_bo_clean_mm(&dev_priv->ttm.bdev, TTM_PL_VRAM);
ttm_bo_clean_mm(&dev_priv->ttm.bdev, TTM_PL_TT);
SYSTEM_ACCESS_MODE_NOT_IN_SYS |
SYSTEM_APERTURE_UNMAPPED_ACCESS_PASS_THRU |
EFFECTIVE_L1_TLB_SIZE(5) | EFFECTIVE_L1_QUEUE_SIZE(5);
- WREG32(MC_VM_MD_L1_TLB0_CNTL, tmp);
- WREG32(MC_VM_MD_L1_TLB1_CNTL, tmp);
- WREG32(MC_VM_MD_L1_TLB2_CNTL, tmp);
+ if (rdev->flags & RADEON_IS_IGP) {
+ WREG32(FUS_MC_VM_MD_L1_TLB0_CNTL, tmp);
+ WREG32(FUS_MC_VM_MD_L1_TLB1_CNTL, tmp);
+ WREG32(FUS_MC_VM_MD_L1_TLB2_CNTL, tmp);
+ } else {
+ WREG32(MC_VM_MD_L1_TLB0_CNTL, tmp);
+ WREG32(MC_VM_MD_L1_TLB1_CNTL, tmp);
+ WREG32(MC_VM_MD_L1_TLB2_CNTL, tmp);
+ }
WREG32(MC_VM_MB_L1_TLB0_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB1_CNTL, tmp);
WREG32(MC_VM_MB_L1_TLB2_CNTL, tmp);
mc_shared_chmap = RREG32(MC_SHARED_CHMAP);
- mc_arb_ramcfg = RREG32(MC_ARB_RAMCFG);
+ if (rdev->flags & RADEON_IS_IGP)
+ mc_arb_ramcfg = RREG32(FUS_MC_ARB_RAMCFG);
+ else
+ mc_arb_ramcfg = RREG32(MC_ARB_RAMCFG);
switch (rdev->config.evergreen.max_tile_pipes) {
case 1:
rdev->asic->copy = NULL;
dev_warn(rdev->dev, "failed blitter (%d) falling back to memcpy\n", r);
}
- /* XXX: ontario has problems blitting to gart at the moment */
- if (rdev->family == CHIP_PALM) {
- rdev->asic->copy = NULL;
- radeon_ttm_set_active_vram_size(rdev, rdev->mc.visible_vram_size);
- }
/* allocate wb buffer */
r = radeon_wb_init(rdev);
#define BURSTLENGTH_SHIFT 9
#define BURSTLENGTH_MASK 0x00000200
#define CHANSIZE_OVERRIDE (1 << 11)
+#define FUS_MC_ARB_RAMCFG 0x2768
#define MC_VM_AGP_TOP 0x2028
#define MC_VM_AGP_BOT 0x202C
#define MC_VM_AGP_BASE 0x2030
#define MC_VM_MD_L1_TLB0_CNTL 0x2654
#define MC_VM_MD_L1_TLB1_CNTL 0x2658
#define MC_VM_MD_L1_TLB2_CNTL 0x265C
+
+#define FUS_MC_VM_MD_L1_TLB0_CNTL 0x265C
+#define FUS_MC_VM_MD_L1_TLB1_CNTL 0x2660
+#define FUS_MC_VM_MD_L1_TLB2_CNTL 0x2664
+
#define MC_VM_SYSTEM_APERTURE_DEFAULT_ADDR 0x203C
#define MC_VM_SYSTEM_APERTURE_HIGH_ADDR 0x2038
#define MC_VM_SYSTEM_APERTURE_LOW_ADDR 0x2034
cc_rb_backend_disable = RREG32(CC_RB_BACKEND_DISABLE);
cc_gc_shader_pipe_config = RREG32(CC_GC_SHADER_PIPE_CONFIG);
- cgts_tcc_disable = RREG32(CGTS_TCC_DISABLE);
+ cgts_tcc_disable = 0xff000000;
gc_user_rb_backend_disable = RREG32(GC_USER_RB_BACKEND_DISABLE);
gc_user_shader_pipe_config = RREG32(GC_USER_SHADER_PIPE_CONFIG);
cgts_user_tcc_disable = RREG32(CGTS_USER_TCC_DISABLE);
smx_dc_ctl0 = RREG32(SMX_DC_CTL0);
smx_dc_ctl0 &= ~NUMBER_OF_SETS(0x1ff);
- smx_dc_ctl0 |= NUMBER_OF_SETS(rdev->config.evergreen.sx_num_of_sets);
+ smx_dc_ctl0 |= NUMBER_OF_SETS(rdev->config.cayman.sx_num_of_sets);
WREG32(SMX_DC_CTL0, smx_dc_ctl0);
WREG32(SPI_CONFIG_CNTL_1, VTX_DONE_DELAY(4) | CRC_SIMD_ID_WADDR_DISABLE);
WREG32(TA_CNTL_AUX, DISABLE_CUBE_ANISO);
- WREG32(SX_EXPORT_BUFFER_SIZES, (COLOR_BUFFER_SIZE((rdev->config.evergreen.sx_max_export_size / 4) - 1) |
- POSITION_BUFFER_SIZE((rdev->config.evergreen.sx_max_export_pos_size / 4) - 1) |
- SMX_BUFFER_SIZE((rdev->config.evergreen.sx_max_export_smx_size / 4) - 1)));
+ WREG32(SX_EXPORT_BUFFER_SIZES, (COLOR_BUFFER_SIZE((rdev->config.cayman.sx_max_export_size / 4) - 1) |
+ POSITION_BUFFER_SIZE((rdev->config.cayman.sx_max_export_pos_size / 4) - 1) |
+ SMX_BUFFER_SIZE((rdev->config.cayman.sx_max_export_smx_size / 4) - 1)));
- WREG32(PA_SC_FIFO_SIZE, (SC_PRIM_FIFO_SIZE(rdev->config.evergreen.sc_prim_fifo_size) |
- SC_HIZ_TILE_FIFO_SIZE(rdev->config.evergreen.sc_hiz_tile_fifo_size) |
- SC_EARLYZ_TILE_FIFO_SIZE(rdev->config.evergreen.sc_earlyz_tile_fifo_size)));
+ WREG32(PA_SC_FIFO_SIZE, (SC_PRIM_FIFO_SIZE(rdev->config.cayman.sc_prim_fifo_size) |
+ SC_HIZ_TILE_FIFO_SIZE(rdev->config.cayman.sc_hiz_tile_fifo_size) |
+ SC_EARLYZ_TILE_FIFO_SIZE(rdev->config.cayman.sc_earlyz_tile_fifo_size)));
WREG32(VGT_NUM_INSTANCES, 1);
WREG32(CP_PERFMON_CNTL, 0);
- WREG32(SQ_MS_FIFO_SIZES, (CACHE_FIFO_SIZE(16 * rdev->config.evergreen.sq_num_cf_insts) |
+ WREG32(SQ_MS_FIFO_SIZES, (CACHE_FIFO_SIZE(16 * rdev->config.cayman.sq_num_cf_insts) |
FETCH_FIFO_HIWATER(0x4) |
DONE_FIFO_HIWATER(0xe0) |
ALU_UPDATE_FIFO_HIWATER(0x8)));
}
}
- /* Acer laptop (Acer TravelMate 5730G) has an HDMI port
+ /* Acer laptop (Acer TravelMate 5730/5730G) has an HDMI port
* on the laptop and a DVI port on the docking station and
* both share the same encoder, hpd pin, and ddc line.
* So while the bios table is technically correct,
* with different crtcs which isn't possible on the hardware
* side and leaves no crtcs for LVDS or VGA.
*/
- if ((dev->pdev->device == 0x95c4) &&
+ if (((dev->pdev->device == 0x95c4) || (dev->pdev->device == 0x9591)) &&
(dev->pdev->subsystem_vendor == 0x1025) &&
(dev->pdev->subsystem_device == 0x013c)) {
if ((*connector_type == DRM_MODE_CONNECTOR_DVII) &&
ATOM_FAKE_EDID_PATCH_RECORD *fake_edid_record;
ATOM_PANEL_RESOLUTION_PATCH_RECORD *panel_res_record;
bool bad_record = false;
- u8 *record = (u8 *)(mode_info->atom_context->bios +
- data_offset +
- le16_to_cpu(lvds_info->info.usModePatchTableOffset));
+ u8 *record;
+
+ if ((frev == 1) && (crev < 2))
+ /* absolute */
+ record = (u8 *)(mode_info->atom_context->bios +
+ le16_to_cpu(lvds_info->info.usModePatchTableOffset));
+ else
+ /* relative */
+ record = (u8 *)(mode_info->atom_context->bios +
+ data_offset +
+ le16_to_cpu(lvds_info->info.usModePatchTableOffset));
while (*record != ATOM_RECORD_END_TYPE) {
switch (*record) {
case LCD_MODE_PATCH_RECORD_MODE_TYPE:
memcpy((u8 *)edid, (u8 *)&fake_edid_record->ucFakeEDIDString[0],
fake_edid_record->ucFakeEDIDLength);
- if (drm_edid_is_valid(edid))
+ if (drm_edid_is_valid(edid)) {
rdev->mode_info.bios_hardcoded_edid = edid;
- else
+ rdev->mode_info.bios_hardcoded_edid_size = edid_size;
+ } else
kfree(edid);
}
}
#define ATPX_VERSION 0
#define ATPX_GPU_PWR 2
#define ATPX_MUX_SELECT 3
+#define ATPX_I2C_MUX_SELECT 4
+#define ATPX_SWITCH_START 5
+#define ATPX_SWITCH_END 6
#define ATPX_INTEGRATED 0
#define ATPX_DISCRETE 1
return radeon_atpx_execute(handle, ATPX_MUX_SELECT, mux_id);
}
+static int radeon_atpx_switch_i2c_mux(acpi_handle handle, int mux_id)
+{
+ return radeon_atpx_execute(handle, ATPX_I2C_MUX_SELECT, mux_id);
+}
+
+static int radeon_atpx_switch_start(acpi_handle handle, int gpu_id)
+{
+ return radeon_atpx_execute(handle, ATPX_SWITCH_START, gpu_id);
+}
+
+static int radeon_atpx_switch_end(acpi_handle handle, int gpu_id)
+{
+ return radeon_atpx_execute(handle, ATPX_SWITCH_END, gpu_id);
+}
static int radeon_atpx_switchto(enum vga_switcheroo_client_id id)
{
+ int gpu_id;
+
if (id == VGA_SWITCHEROO_IGD)
- radeon_atpx_switch_mux(radeon_atpx_priv.atpx_handle, 0);
+ gpu_id = ATPX_INTEGRATED;
else
- radeon_atpx_switch_mux(radeon_atpx_priv.atpx_handle, 1);
+ gpu_id = ATPX_DISCRETE;
+
+ radeon_atpx_switch_start(radeon_atpx_priv.atpx_handle, gpu_id);
+ radeon_atpx_switch_mux(radeon_atpx_priv.atpx_handle, gpu_id);
+ radeon_atpx_switch_i2c_mux(radeon_atpx_priv.atpx_handle, gpu_id);
+ radeon_atpx_switch_end(radeon_atpx_priv.atpx_handle, gpu_id);
+
return 0;
}
return -EINVAL;
}
- radeon_crtc->cursor_width = width;
- radeon_crtc->cursor_height = height;
-
obj = drm_gem_object_lookup(crtc->dev, file_priv, handle);
if (!obj) {
DRM_ERROR("Cannot find cursor object %x for crtc %d\n", handle, radeon_crtc->crtc_id);
if (ret)
goto fail;
+ radeon_crtc->cursor_width = width;
+ radeon_crtc->cursor_height = height;
+
radeon_lock_cursor(crtc, true);
/* XXX only 27 bit offset for legacy cursor */
radeon_set_cursor(crtc, obj, gpu_addr);
p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
for (i = 0; i < pages; i++, p++) {
- /* On TTM path, we only use the DMA API if TTM_PAGE_FLAG_DMA32
- * is requested. */
- if (dma_addr[i] != DMA_ERROR_CODE) {
+ /* we reverted the patch using dma_addr in TTM for now but this
+ * code stops building on alpha so just comment it out for now */
+ if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */
rdev->gart.ttm_alloced[p] = true;
rdev->gart.pages_addr[p] = dma_addr[i];
} else {
return -EINVAL;
}
break;
+ case RADEON_INFO_NUM_TILE_PIPES:
+ if (rdev->family >= CHIP_CAYMAN)
+ value = rdev->config.cayman.max_tile_pipes;
+ else if (rdev->family >= CHIP_CEDAR)
+ value = rdev->config.evergreen.max_tile_pipes;
+ else if (rdev->family >= CHIP_RV770)
+ value = rdev->config.rv770.max_tile_pipes;
+ else if (rdev->family >= CHIP_R600)
+ value = rdev->config.r600.max_tile_pipes;
+ else {
+ return -EINVAL;
+ }
+ break;
+ case RADEON_INFO_FUSION_GART_WORKING:
+ value = 1;
+ break;
default:
DRM_DEBUG_KMS("Invalid request %d\n", info->request);
return -EINVAL;
0x00008E48 SQ_EX_ALLOC_TABLE_SLOTS
0x00009100 SPI_CONFIG_CNTL
0x0000913C SPI_CONFIG_CNTL_1
+0x00009508 TA_CNTL_AUX
0x00009830 DB_DEBUG
0x00009834 DB_DEBUG2
0x00009838 DB_DEBUG3
0x00008E48 SQ_EX_ALLOC_TABLE_SLOTS
0x00009100 SPI_CONFIG_CNTL
0x0000913C SPI_CONFIG_CNTL_1
+0x00009508 TA_CNTL_AUX
0x00009700 VC_CNTL
0x00009714 VC_ENHANCE
0x00009830 DB_DEBUG
0x00028D0C DB_RENDER_CONTROL
0x00028D10 DB_RENDER_OVERRIDE
0x0002880C DB_SHADER_CONTROL
+0x00028D28 DB_SRESULTS_COMPARE_STATE0
0x00028D2C DB_SRESULTS_COMPARE_STATE1
0x00028430 DB_STENCILREFMASK
0x00028434 DB_STENCILREFMASK_BF
int i;
struct vga_switcheroo_client *active = NULL;
- if (new_client->active == true)
- return 0;
-
for (i = 0; i < VGA_SWITCHEROO_MAX_CLIENTS; i++) {
if (vgasr_priv.clients[i].active == true) {
active = &vgasr_priv.clients[i];
goto out;
}
+ if (client->active == true)
+ goto out;
+
/* okay we want a switch - test if devices are willing to switch */
can_switch = true;
for (i = 0; i < VGA_SWITCHEROO_MAX_CLIENTS; i++) {
help
If you say yes here you get support for Analog Devices ADM1021
and ADM1023 sensor chips and clones: Maxim MAX1617 and MAX1617A,
- Genesys Logic GL523SM, National Semiconductor LM84, TI THMC10,
- and the XEON processor built-in sensor.
+ Genesys Logic GL523SM, National Semiconductor LM84 and TI THMC10.
This driver can also be built as a module. If so, the module
will be called adm1021.
depends on I2C
help
If you say yes here you get support for National Semiconductor LM90,
- LM86, LM89 and LM99, Analog Devices ADM1032 and ADT7461, Maxim
- MAX6646, MAX6647, MAX6648, MAX6649, MAX6657, MAX6658, MAX6659,
- MAX6680, MAX6681, MAX6692, MAX6695, MAX6696, and Winbond/Nuvoton
- W83L771W/G/AWG/ASG sensor chips.
+ LM86, LM89 and LM99, Analog Devices ADM1032, ADT7461, and ADT7461A,
+ Maxim MAX6646, MAX6647, MAX6648, MAX6649, MAX6657, MAX6658, MAX6659,
+ MAX6680, MAX6681, MAX6692, MAX6695, MAX6696, ON Semiconductor NCT1008,
+ and Winbond/Nuvoton W83L771W/G/AWG/ASG sensor chips.
This driver can also be built as a module. If so, the module
will be called lm90.
&sensor_dev_attr_pwm1_auto_pwm_minctl.dev_attr.attr,
&sensor_dev_attr_pwm2_auto_pwm_minctl.dev_attr.attr,
&sensor_dev_attr_pwm3_auto_pwm_minctl.dev_attr.attr,
+ NULL
};
static const struct attribute_group lm85_group_minctl = {
&sensor_dev_attr_temp1_auto_temp_off.dev_attr.attr,
&sensor_dev_attr_temp2_auto_temp_off.dev_attr.attr,
&sensor_dev_attr_temp3_auto_temp_off.dev_attr.attr,
+ NULL
};
static const struct attribute_group lm85_group_temp_off = {
if (data->type != emc6d103s) {
err = sysfs_create_group(&client->dev.kobj, &lm85_group_minctl);
if (err)
- goto err_kfree;
+ goto err_remove_files;
err = sysfs_create_group(&client->dev.kobj,
&lm85_group_temp_off);
if (err)
- goto err_kfree;
+ goto err_remove_files;
}
/* The ADT7463/68 have an optional VRM 10 mode where pin 21 is used
* chips, but support three temperature sensors instead of two. MAX6695
* and MAX6696 only differ in the pinout so they can be treated identically.
*
- * This driver also supports the ADT7461 chip from Analog Devices.
- * It's supported in both compatibility and extended mode. It is mostly
- * compatible with LM90 except for a data format difference for the
- * temperature value registers.
+ * This driver also supports ADT7461 and ADT7461A from Analog Devices as well as
+ * NCT1008 from ON Semiconductor. The chips are supported in both compatibility
+ * and extended mode. They are mostly compatible with LM90 except for a data
+ * format difference for the temperature value registers.
*
* Since the LM90 was the first chipset supported by this driver, most
* comments will refer to this chipset, but are actually general and
* Addresses to scan
* Address is fully defined internally and cannot be changed except for
* MAX6659, MAX6680 and MAX6681.
- * LM86, LM89, LM90, LM99, ADM1032, ADM1032-1, ADT7461, MAX6649, MAX6657,
- * MAX6658 and W83L771 have address 0x4c.
- * ADM1032-2, ADT7461-2, LM89-1, LM99-1 and MAX6646 have address 0x4d.
+ * LM86, LM89, LM90, LM99, ADM1032, ADM1032-1, ADT7461, ADT7461A, MAX6649,
+ * MAX6657, MAX6658, NCT1008 and W83L771 have address 0x4c.
+ * ADM1032-2, ADT7461-2, ADT7461A-2, LM89-1, LM99-1, MAX6646, and NCT1008D
+ * have address 0x4d.
* MAX6647 has address 0x4e.
* MAX6659 can have address 0x4c, 0x4d or 0x4e.
* MAX6680 and MAX6681 can have address 0x18, 0x19, 0x1a, 0x29, 0x2a, 0x2b,
static const struct i2c_device_id lm90_id[] = {
{ "adm1032", adm1032 },
{ "adt7461", adt7461 },
+ { "adt7461a", adt7461 },
{ "lm90", lm90 },
{ "lm86", lm86 },
{ "lm89", lm86 },
{ "max6681", max6680 },
{ "max6695", max6696 },
{ "max6696", max6696 },
+ { "nct1008", adt7461 },
{ "w83l771", w83l771 },
{ }
};
&& (reg_config1 & 0x1B) == 0x00
&& reg_convrate <= 0x0A) {
name = "adt7461";
+ } else
+ if (chip_id == 0x57 /* ADT7461A, NCT1008 */
+ && (reg_config1 & 0x1B) == 0x00
+ && reg_convrate <= 0x0A) {
+ name = "adt7461a";
}
} else
if (man_id == 0x4D) { /* Maxim */
static int __devinit twl4030_madc_hwmon_probe(struct platform_device *pdev)
{
int ret;
- int status;
struct device *hwmon;
ret = sysfs_create_group(&pdev->dev.kobj, &twl4030_madc_group);
hwmon = hwmon_device_register(&pdev->dev);
if (IS_ERR(hwmon)) {
dev_err(&pdev->dev, "hwmon_device_register failed.\n");
- status = PTR_ERR(hwmon);
+ ret = PTR_ERR(hwmon);
goto err_reg;
}
SMBHSTSTS_BUS_ERR | SMBHSTSTS_DEV_ERR | \
SMBHSTSTS_INTR)
+/* Older devices have their ID defined in <linux/pci_ids.h> */
+#define PCI_DEVICE_ID_INTEL_COUGARPOINT_SMBUS 0x1c22
+#define PCI_DEVICE_ID_INTEL_PATSBURG_SMBUS 0x1d22
/* Patsburg also has three 'Integrated Device Function' SMBus controllers */
#define PCI_DEVICE_ID_INTEL_PATSBURG_SMBUS_IDF0 0x1d70
#define PCI_DEVICE_ID_INTEL_PATSBURG_SMBUS_IDF1 0x1d71
#define PCI_DEVICE_ID_INTEL_PATSBURG_SMBUS_IDF2 0x1d72
+#define PCI_DEVICE_ID_INTEL_DH89XXCC_SMBUS 0x2330
+#define PCI_DEVICE_ID_INTEL_5_3400_SERIES_SMBUS 0x3b30
struct i801_priv {
struct i2c_adapter adapter;
.timeout = HZ,
};
+static const struct of_device_id mpc_i2c_of_match[];
static int __devinit fsl_i2c_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
struct mpc_i2c *i2c;
const u32 *prop;
u32 clock = MPC_I2C_CLOCK_LEGACY;
int result = 0;
int plen;
- if (!op->dev.of_match)
+ match = of_match_device(mpc_i2c_of_match, &op->dev);
+ if (!match)
return -EINVAL;
i2c = kzalloc(sizeof(*i2c), GFP_KERNEL);
clock = *prop;
}
- if (op->dev.of_match->data) {
- struct mpc_i2c_data *data = op->dev.of_match->data;
+ if (match->data) {
+ struct mpc_i2c_data *data = match->data;
data->setup(op->dev.of_node, i2c, clock, data->prescaler);
} else {
/* Backwards compatibility */
/* ------------------------------------------------------------------------ *
* i2c-parport.c I2C bus over parallel port *
* ------------------------------------------------------------------------ *
- Copyright (C) 2003-2010 Jean Delvare <khali@linux-fr.org>
+ Copyright (C) 2003-2011 Jean Delvare <khali@linux-fr.org>
Based on older i2c-philips-par.c driver
Copyright (C) 1995-2000 Simon G. Vogl
#include <linux/i2c-algo-bit.h>
#include <linux/i2c-smbus.h>
#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
#include "i2c-parport.h"
/* ----- Device list ------------------------------------------------------ */
struct i2c_algo_bit_data algo_data;
struct i2c_smbus_alert_setup alert_data;
struct i2c_client *ara;
- struct i2c_par *next;
+ struct list_head node;
};
-static struct i2c_par *adapter_list;
+static LIST_HEAD(adapter_list);
+static DEFINE_MUTEX(adapter_list_lock);
/* ----- Low-level parallel port access ----------------------------------- */
}
/* Add the new adapter to the list */
- adapter->next = adapter_list;
- adapter_list = adapter;
+ mutex_lock(&adapter_list_lock);
+ list_add_tail(&adapter->node, &adapter_list);
+ mutex_unlock(&adapter_list_lock);
return;
ERROR1:
static void i2c_parport_detach (struct parport *port)
{
- struct i2c_par *adapter, *prev;
+ struct i2c_par *adapter, *_n;
/* Walk the list */
- for (prev = NULL, adapter = adapter_list; adapter;
- prev = adapter, adapter = adapter->next) {
+ mutex_lock(&adapter_list_lock);
+ list_for_each_entry_safe(adapter, _n, &adapter_list, node) {
if (adapter->pdev->port == port) {
if (adapter->ara) {
parport_disable_irq(port);
parport_release(adapter->pdev);
parport_unregister_device(adapter->pdev);
- if (prev)
- prev->next = adapter->next;
- else
- adapter_list = adapter->next;
+ list_del(&adapter->node);
kfree(adapter);
- return;
}
}
+ mutex_unlock(&adapter_list_lock);
}
static struct parport_driver i2c_parport_driver = {
jiffies, expires);
timer->expires = jiffies + expires;
- timer->data = (unsigned long)&alg_data;
+ timer->data = (unsigned long)alg_data;
add_timer(timer);
}
u32 qp_num;
u8 srq;
u8 tos;
+ u8 reuseaddr;
};
struct cma_multicast {
return cma_zero_addr(addr) || cma_loopback_addr(addr);
}
+static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst)
+{
+ if (src->sa_family != dst->sa_family)
+ return -1;
+
+ switch (src->sa_family) {
+ case AF_INET:
+ return ((struct sockaddr_in *) src)->sin_addr.s_addr !=
+ ((struct sockaddr_in *) dst)->sin_addr.s_addr;
+ default:
+ return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr,
+ &((struct sockaddr_in6 *) dst)->sin6_addr);
+ }
+}
+
static inline __be16 cma_port(struct sockaddr *addr)
{
if (addr->sa_family == AF_INET)
mutex_unlock(&lock);
}
-int rdma_listen(struct rdma_cm_id *id, int backlog)
-{
- struct rdma_id_private *id_priv;
- int ret;
-
- id_priv = container_of(id, struct rdma_id_private, id);
- if (id_priv->state == CMA_IDLE) {
- ((struct sockaddr *) &id->route.addr.src_addr)->sa_family = AF_INET;
- ret = rdma_bind_addr(id, (struct sockaddr *) &id->route.addr.src_addr);
- if (ret)
- return ret;
- }
-
- if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN))
- return -EINVAL;
-
- id_priv->backlog = backlog;
- if (id->device) {
- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- ret = cma_ib_listen(id_priv);
- if (ret)
- goto err;
- break;
- case RDMA_TRANSPORT_IWARP:
- ret = cma_iw_listen(id_priv, backlog);
- if (ret)
- goto err;
- break;
- default:
- ret = -ENOSYS;
- goto err;
- }
- } else
- cma_listen_on_all(id_priv);
-
- return 0;
-err:
- id_priv->backlog = 0;
- cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND);
- return ret;
-}
-EXPORT_SYMBOL(rdma_listen);
-
void rdma_set_service_type(struct rdma_cm_id *id, int tos)
{
struct rdma_id_private *id_priv;
}
EXPORT_SYMBOL(rdma_resolve_addr);
+int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse)
+{
+ struct rdma_id_private *id_priv;
+ unsigned long flags;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ spin_lock_irqsave(&id_priv->lock, flags);
+ if (id_priv->state == CMA_IDLE) {
+ id_priv->reuseaddr = reuse;
+ ret = 0;
+ } else {
+ ret = -EINVAL;
+ }
+ spin_unlock_irqrestore(&id_priv->lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_set_reuseaddr);
+
static void cma_bind_port(struct rdma_bind_list *bind_list,
struct rdma_id_private *id_priv)
{
return -EADDRNOTAVAIL;
}
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+/*
+ * Check that the requested port is available. This is called when trying to
+ * bind to a specific port, or when trying to listen on a bound port. In
+ * the latter case, the provided id_priv may already be on the bind_list, but
+ * we still need to check that it's okay to start listening.
+ */
+static int cma_check_port(struct rdma_bind_list *bind_list,
+ struct rdma_id_private *id_priv, uint8_t reuseaddr)
{
struct rdma_id_private *cur_id;
- struct sockaddr_in *sin, *cur_sin;
- struct rdma_bind_list *bind_list;
+ struct sockaddr *addr, *cur_addr;
struct hlist_node *node;
+
+ addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
+ if (cma_any_addr(addr) && !reuseaddr)
+ return -EADDRNOTAVAIL;
+
+ hlist_for_each_entry(cur_id, node, &bind_list->owners, node) {
+ if (id_priv == cur_id)
+ continue;
+
+ if ((cur_id->state == CMA_LISTEN) ||
+ !reuseaddr || !cur_id->reuseaddr) {
+ cur_addr = (struct sockaddr *) &cur_id->id.route.addr.src_addr;
+ if (cma_any_addr(cur_addr))
+ return -EADDRNOTAVAIL;
+
+ if (!cma_addr_cmp(addr, cur_addr))
+ return -EADDRINUSE;
+ }
+ }
+ return 0;
+}
+
+static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+{
+ struct rdma_bind_list *bind_list;
unsigned short snum;
+ int ret;
- sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
- snum = ntohs(sin->sin_port);
+ snum = ntohs(cma_port((struct sockaddr *) &id_priv->id.route.addr.src_addr));
if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
return -EACCES;
bind_list = idr_find(ps, snum);
- if (!bind_list)
- return cma_alloc_port(ps, id_priv, snum);
-
- /*
- * We don't support binding to any address if anyone is bound to
- * a specific address on the same port.
- */
- if (cma_any_addr((struct sockaddr *) &id_priv->id.route.addr.src_addr))
- return -EADDRNOTAVAIL;
-
- hlist_for_each_entry(cur_id, node, &bind_list->owners, node) {
- if (cma_any_addr((struct sockaddr *) &cur_id->id.route.addr.src_addr))
- return -EADDRNOTAVAIL;
-
- cur_sin = (struct sockaddr_in *) &cur_id->id.route.addr.src_addr;
- if (sin->sin_addr.s_addr == cur_sin->sin_addr.s_addr)
- return -EADDRINUSE;
+ if (!bind_list) {
+ ret = cma_alloc_port(ps, id_priv, snum);
+ } else {
+ ret = cma_check_port(bind_list, id_priv, id_priv->reuseaddr);
+ if (!ret)
+ cma_bind_port(bind_list, id_priv);
}
+ return ret;
+}
- cma_bind_port(bind_list, id_priv);
- return 0;
+static int cma_bind_listen(struct rdma_id_private *id_priv)
+{
+ struct rdma_bind_list *bind_list = id_priv->bind_list;
+ int ret = 0;
+
+ mutex_lock(&lock);
+ if (bind_list->owners.first->next)
+ ret = cma_check_port(bind_list, id_priv, 0);
+ mutex_unlock(&lock);
+ return ret;
}
static int cma_get_port(struct rdma_id_private *id_priv)
return 0;
}
+int rdma_listen(struct rdma_cm_id *id, int backlog)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (id_priv->state == CMA_IDLE) {
+ ((struct sockaddr *) &id->route.addr.src_addr)->sa_family = AF_INET;
+ ret = rdma_bind_addr(id, (struct sockaddr *) &id->route.addr.src_addr);
+ if (ret)
+ return ret;
+ }
+
+ if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN))
+ return -EINVAL;
+
+ if (id_priv->reuseaddr) {
+ ret = cma_bind_listen(id_priv);
+ if (ret)
+ goto err;
+ }
+
+ id_priv->backlog = backlog;
+ if (id->device) {
+ switch (rdma_node_get_transport(id->device->node_type)) {
+ case RDMA_TRANSPORT_IB:
+ ret = cma_ib_listen(id_priv);
+ if (ret)
+ goto err;
+ break;
+ case RDMA_TRANSPORT_IWARP:
+ ret = cma_iw_listen(id_priv, backlog);
+ if (ret)
+ goto err;
+ break;
+ default:
+ ret = -ENOSYS;
+ goto err;
+ }
+ } else
+ cma_listen_on_all(id_priv);
+
+ return 0;
+err:
+ id_priv->backlog = 0;
+ cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_listen);
+
int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
{
struct rdma_id_private *id_priv;
*/
clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
- if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) {
+ if (iw_event->status == 0) {
cm_id_priv->id.local_addr = iw_event->local_addr;
cm_id_priv->id.remote_addr = iw_event->remote_addr;
cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
}
rdma_set_service_type(ctx->cm_id, *((u8 *) optval));
break;
+ case RDMA_OPTION_ID_REUSEADDR:
+ if (optlen != sizeof(int)) {
+ ret = -EINVAL;
+ break;
+ }
+ ret = rdma_set_reuseaddr(ctx->cm_id, *((int *) optval) ? 1 : 0);
+ break;
default:
ret = -ENOSYS;
}
}
PDBG("%s ep %p status %d error %d\n", __func__, ep,
rpl->status, status2errno(rpl->status));
- ep->com.wr_wait.ret = status2errno(rpl->status);
- ep->com.wr_wait.done = 1;
- wake_up(&ep->com.wr_wait.wait);
+ c4iw_wake_up(&ep->com.wr_wait, status2errno(rpl->status));
return 0;
}
struct c4iw_listen_ep *ep = lookup_stid(t, stid);
PDBG("%s ep %p\n", __func__, ep);
- ep->com.wr_wait.ret = status2errno(rpl->status);
- ep->com.wr_wait.done = 1;
- wake_up(&ep->com.wr_wait.wait);
+ c4iw_wake_up(&ep->com.wr_wait, status2errno(rpl->status));
return 0;
}
struct c4iw_qp_attributes attrs;
int disconnect = 1;
int release = 0;
- int closing = 0;
+ int abort = 0;
struct tid_info *t = dev->rdev.lldi.tids;
unsigned int tid = GET_TID(hdr);
* in rdma connection migration (see c4iw_accept_cr()).
*/
__state_set(&ep->com, CLOSING);
- ep->com.wr_wait.done = 1;
- ep->com.wr_wait.ret = -ECONNRESET;
PDBG("waking up ep %p tid %u\n", ep, ep->hwtid);
- wake_up(&ep->com.wr_wait.wait);
+ c4iw_wake_up(&ep->com.wr_wait, -ECONNRESET);
break;
case MPA_REP_SENT:
__state_set(&ep->com, CLOSING);
- ep->com.wr_wait.done = 1;
- ep->com.wr_wait.ret = -ECONNRESET;
PDBG("waking up ep %p tid %u\n", ep, ep->hwtid);
- wake_up(&ep->com.wr_wait.wait);
+ c4iw_wake_up(&ep->com.wr_wait, -ECONNRESET);
break;
case FPDU_MODE:
start_ep_timer(ep);
__state_set(&ep->com, CLOSING);
- closing = 1;
+ attrs.next_state = C4IW_QP_STATE_CLOSING;
+ abort = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
+ C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
peer_close_upcall(ep);
+ disconnect = 1;
break;
case ABORTING:
disconnect = 0;
BUG_ON(1);
}
mutex_unlock(&ep->com.mutex);
- if (closing) {
- attrs.next_state = C4IW_QP_STATE_CLOSING;
- c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
- C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
- }
if (disconnect)
c4iw_ep_disconnect(ep, 0, GFP_KERNEL);
if (release)
/*
* Wake up any threads in rdma_init() or rdma_fini().
*/
- ep->com.wr_wait.done = 1;
- ep->com.wr_wait.ret = -ECONNRESET;
- wake_up(&ep->com.wr_wait.wait);
+ c4iw_wake_up(&ep->com.wr_wait, -ECONNRESET);
mutex_lock(&ep->com.mutex);
switch (ep->com.state) {
ep = lookup_tid(t, tid);
BUG_ON(!ep);
- if (ep->com.qp) {
+ if (ep && ep->com.qp) {
printk(KERN_WARNING MOD "TERM received tid %u qpid %u\n", tid,
ep->com.qp->wq.sq.qid);
attrs.next_state = C4IW_QP_STATE_TERMINATE;
c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
} else
- printk(KERN_WARNING MOD "TERM received tid %u no qp\n", tid);
+ printk(KERN_WARNING MOD "TERM received tid %u no ep/qp\n", tid);
return 0;
}
ret = (int)((be64_to_cpu(rpl->data[0]) >> 8) & 0xff);
wr_waitp = (struct c4iw_wr_wait *)(__force unsigned long) rpl->data[1];
PDBG("%s wr_waitp %p ret %u\n", __func__, wr_waitp, ret);
- if (wr_waitp) {
- if (ret)
- wr_waitp->ret = -ret;
- else
- wr_waitp->ret = 0;
- wr_waitp->done = 1;
- wake_up(&wr_waitp->wait);
- }
+ if (wr_waitp)
+ c4iw_wake_up(wr_waitp, ret ? -ret : 0);
kfree_skb(skb);
break;
case 2:
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
-static LIST_HEAD(dev_list);
+static LIST_HEAD(uld_ctx_list);
static DEFINE_MUTEX(dev_mutex);
static struct dentry *c4iw_debugfs_root;
c4iw_destroy_resource(&rdev->resource);
}
-static void c4iw_remove(struct c4iw_dev *dev)
+struct uld_ctx {
+ struct list_head entry;
+ struct cxgb4_lld_info lldi;
+ struct c4iw_dev *dev;
+};
+
+static void c4iw_remove(struct uld_ctx *ctx)
{
- PDBG("%s c4iw_dev %p\n", __func__, dev);
- list_del(&dev->entry);
- if (dev->registered)
- c4iw_unregister_device(dev);
- c4iw_rdev_close(&dev->rdev);
- idr_destroy(&dev->cqidr);
- idr_destroy(&dev->qpidr);
- idr_destroy(&dev->mmidr);
- iounmap(dev->rdev.oc_mw_kva);
- ib_dealloc_device(&dev->ibdev);
+ PDBG("%s c4iw_dev %p\n", __func__, ctx->dev);
+ c4iw_unregister_device(ctx->dev);
+ c4iw_rdev_close(&ctx->dev->rdev);
+ idr_destroy(&ctx->dev->cqidr);
+ idr_destroy(&ctx->dev->qpidr);
+ idr_destroy(&ctx->dev->mmidr);
+ iounmap(ctx->dev->rdev.oc_mw_kva);
+ ib_dealloc_device(&ctx->dev->ibdev);
+ ctx->dev = NULL;
}
static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
devp = (struct c4iw_dev *)ib_alloc_device(sizeof(*devp));
if (!devp) {
printk(KERN_ERR MOD "Cannot allocate ib device\n");
- return NULL;
+ return ERR_PTR(-ENOMEM);
}
devp->rdev.lldi = *infop;
devp->rdev.oc_mw_kva = ioremap_wc(devp->rdev.oc_mw_pa,
devp->rdev.lldi.vr->ocq.size);
- printk(KERN_INFO MOD "ocq memory: "
+ PDBG(KERN_INFO MOD "ocq memory: "
"hw_start 0x%x size %u mw_pa 0x%lx mw_kva %p\n",
devp->rdev.lldi.vr->ocq.start, devp->rdev.lldi.vr->ocq.size,
devp->rdev.oc_mw_pa, devp->rdev.oc_mw_kva);
- mutex_lock(&dev_mutex);
-
ret = c4iw_rdev_open(&devp->rdev);
if (ret) {
mutex_unlock(&dev_mutex);
printk(KERN_ERR MOD "Unable to open CXIO rdev err %d\n", ret);
ib_dealloc_device(&devp->ibdev);
- return NULL;
+ return ERR_PTR(ret);
}
idr_init(&devp->cqidr);
idr_init(&devp->qpidr);
idr_init(&devp->mmidr);
spin_lock_init(&devp->lock);
- list_add_tail(&devp->entry, &dev_list);
- mutex_unlock(&dev_mutex);
if (c4iw_debugfs_root) {
devp->debugfs_root = debugfs_create_dir(
static void *c4iw_uld_add(const struct cxgb4_lld_info *infop)
{
- struct c4iw_dev *dev;
+ struct uld_ctx *ctx;
static int vers_printed;
int i;
printk(KERN_INFO MOD "Chelsio T4 RDMA Driver - version %s\n",
DRV_VERSION);
- dev = c4iw_alloc(infop);
- if (!dev)
+ ctx = kzalloc(sizeof *ctx, GFP_KERNEL);
+ if (!ctx) {
+ ctx = ERR_PTR(-ENOMEM);
goto out;
+ }
+ ctx->lldi = *infop;
PDBG("%s found device %s nchan %u nrxq %u ntxq %u nports %u\n",
- __func__, pci_name(dev->rdev.lldi.pdev),
- dev->rdev.lldi.nchan, dev->rdev.lldi.nrxq,
- dev->rdev.lldi.ntxq, dev->rdev.lldi.nports);
+ __func__, pci_name(ctx->lldi.pdev),
+ ctx->lldi.nchan, ctx->lldi.nrxq,
+ ctx->lldi.ntxq, ctx->lldi.nports);
+
+ mutex_lock(&dev_mutex);
+ list_add_tail(&ctx->entry, &uld_ctx_list);
+ mutex_unlock(&dev_mutex);
- for (i = 0; i < dev->rdev.lldi.nrxq; i++)
- PDBG("rxqid[%u] %u\n", i, dev->rdev.lldi.rxq_ids[i]);
+ for (i = 0; i < ctx->lldi.nrxq; i++)
+ PDBG("rxqid[%u] %u\n", i, ctx->lldi.rxq_ids[i]);
out:
- return dev;
+ return ctx;
}
static int c4iw_uld_rx_handler(void *handle, const __be64 *rsp,
const struct pkt_gl *gl)
{
- struct c4iw_dev *dev = handle;
+ struct uld_ctx *ctx = handle;
+ struct c4iw_dev *dev = ctx->dev;
struct sk_buff *skb;
const struct cpl_act_establish *rpl;
unsigned int opcode;
static int c4iw_uld_state_change(void *handle, enum cxgb4_state new_state)
{
- struct c4iw_dev *dev = handle;
+ struct uld_ctx *ctx = handle;
PDBG("%s new_state %u\n", __func__, new_state);
switch (new_state) {
case CXGB4_STATE_UP:
- printk(KERN_INFO MOD "%s: Up\n", pci_name(dev->rdev.lldi.pdev));
- if (!dev->registered) {
- int ret;
- ret = c4iw_register_device(dev);
- if (ret)
+ printk(KERN_INFO MOD "%s: Up\n", pci_name(ctx->lldi.pdev));
+ if (!ctx->dev) {
+ int ret = 0;
+
+ ctx->dev = c4iw_alloc(&ctx->lldi);
+ if (!IS_ERR(ctx->dev))
+ ret = c4iw_register_device(ctx->dev);
+ if (IS_ERR(ctx->dev) || ret)
printk(KERN_ERR MOD
"%s: RDMA registration failed: %d\n",
- pci_name(dev->rdev.lldi.pdev), ret);
+ pci_name(ctx->lldi.pdev), ret);
}
break;
case CXGB4_STATE_DOWN:
printk(KERN_INFO MOD "%s: Down\n",
- pci_name(dev->rdev.lldi.pdev));
- if (dev->registered)
- c4iw_unregister_device(dev);
+ pci_name(ctx->lldi.pdev));
+ if (ctx->dev)
+ c4iw_remove(ctx);
break;
case CXGB4_STATE_START_RECOVERY:
printk(KERN_INFO MOD "%s: Fatal Error\n",
- pci_name(dev->rdev.lldi.pdev));
- dev->rdev.flags |= T4_FATAL_ERROR;
- if (dev->registered) {
+ pci_name(ctx->lldi.pdev));
+ if (ctx->dev) {
struct ib_event event;
+ ctx->dev->rdev.flags |= T4_FATAL_ERROR;
memset(&event, 0, sizeof event);
event.event = IB_EVENT_DEVICE_FATAL;
- event.device = &dev->ibdev;
+ event.device = &ctx->dev->ibdev;
ib_dispatch_event(&event);
- c4iw_unregister_device(dev);
+ c4iw_remove(ctx);
}
break;
case CXGB4_STATE_DETACH:
printk(KERN_INFO MOD "%s: Detach\n",
- pci_name(dev->rdev.lldi.pdev));
- mutex_lock(&dev_mutex);
- c4iw_remove(dev);
- mutex_unlock(&dev_mutex);
+ pci_name(ctx->lldi.pdev));
+ if (ctx->dev)
+ c4iw_remove(ctx);
break;
}
return 0;
static void __exit c4iw_exit_module(void)
{
- struct c4iw_dev *dev, *tmp;
+ struct uld_ctx *ctx, *tmp;
mutex_lock(&dev_mutex);
- list_for_each_entry_safe(dev, tmp, &dev_list, entry) {
- c4iw_remove(dev);
+ list_for_each_entry_safe(ctx, tmp, &uld_ctx_list, entry) {
+ if (ctx->dev)
+ c4iw_remove(ctx);
+ kfree(ctx);
}
mutex_unlock(&dev_mutex);
cxgb4_unregister_uld(CXGB4_ULD_RDMA);
#define C4IW_WR_TO (10*HZ)
+enum {
+ REPLY_READY = 0,
+};
+
struct c4iw_wr_wait {
wait_queue_head_t wait;
- int done;
+ unsigned long status;
int ret;
};
static inline void c4iw_init_wr_wait(struct c4iw_wr_wait *wr_waitp)
{
wr_waitp->ret = 0;
- wr_waitp->done = 0;
+ wr_waitp->status = 0;
init_waitqueue_head(&wr_waitp->wait);
}
+static inline void c4iw_wake_up(struct c4iw_wr_wait *wr_waitp, int ret)
+{
+ wr_waitp->ret = ret;
+ set_bit(REPLY_READY, &wr_waitp->status);
+ wake_up(&wr_waitp->wait);
+}
+
static inline int c4iw_wait_for_reply(struct c4iw_rdev *rdev,
struct c4iw_wr_wait *wr_waitp,
u32 hwtid, u32 qpid,
const char *func)
{
unsigned to = C4IW_WR_TO;
- do {
+ int ret;
- wait_event_timeout(wr_waitp->wait, wr_waitp->done, to);
- if (!wr_waitp->done) {
+ do {
+ ret = wait_event_timeout(wr_waitp->wait,
+ test_and_clear_bit(REPLY_READY, &wr_waitp->status), to);
+ if (!ret) {
printk(KERN_ERR MOD "%s - Device %s not responding - "
"tid %u qpid %u\n", func,
pci_name(rdev->lldi.pdev), hwtid, qpid);
+ if (c4iw_fatal_error(rdev)) {
+ wr_waitp->ret = -EIO;
+ break;
+ }
to = to << 2;
}
- } while (!wr_waitp->done);
+ } while (!ret);
if (wr_waitp->ret)
- printk(KERN_WARNING MOD "%s: FW reply %d tid %u qpid %u\n",
- pci_name(rdev->lldi.pdev), wr_waitp->ret, hwtid, qpid);
+ PDBG("%s: FW reply %d tid %u qpid %u\n",
+ pci_name(rdev->lldi.pdev), wr_waitp->ret, hwtid, qpid);
return wr_waitp->ret;
}
-
struct c4iw_dev {
struct ib_device ibdev;
struct c4iw_rdev rdev;
struct idr qpidr;
struct idr mmidr;
spinlock_t lock;
- struct list_head entry;
struct dentry *debugfs_root;
- u8 registered;
};
static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
if (ret)
goto bail2;
}
- dev->registered = 1;
return 0;
bail2:
ib_unregister_device(&dev->ibdev);
c4iw_class_attributes[i]);
ib_unregister_device(&dev->ibdev);
kfree(dev->ibdev.iwcm);
- dev->registered = 0;
return;
}
V_FW_RI_RES_WR_HOSTFCMODE(0) | /* no host cidx updates */
V_FW_RI_RES_WR_CPRIO(0) | /* don't keep in chip cache */
V_FW_RI_RES_WR_PCIECHN(0) | /* set by uP at ri_init time */
- t4_sq_onchip(&wq->sq) ? F_FW_RI_RES_WR_ONCHIP : 0 |
+ (t4_sq_onchip(&wq->sq) ? F_FW_RI_RES_WR_ONCHIP : 0) |
V_FW_RI_RES_WR_IQID(scq->cqid));
res->u.sqrq.dcaen_to_eqsize = cpu_to_be32(
V_FW_RI_RES_WR_DCAEN(0) |
if (ret) {
if (internal)
c4iw_get_ep(&qhp->ep->com);
- disconnect = abort = 1;
goto err;
}
break;
struct ipath_devdata *dd;
unsigned long long addr;
u32 bar0 = 0, bar1 = 0;
- u8 rev;
dd = ipath_alloc_devdata(pdev);
if (IS_ERR(dd)) {
goto bail_regions;
}
- ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev);
- if (ret) {
- ipath_dev_err(dd, "Failed to read PCI revision ID unit "
- "%u: err %d\n", dd->ipath_unit, -ret);
- goto bail_regions; /* shouldn't ever happen */
- }
- dd->ipath_pcirev = rev;
+ dd->ipath_pcirev = pdev->revision;
#if defined(__powerpc__)
/* There isn't a generic way to specify writethrough mappings */
u16 last_ae;
u8 original_hw_tcp_state;
u8 original_ibqp_state;
- enum iw_cm_event_status disconn_status = IW_CM_EVENT_STATUS_OK;
+ int disconn_status = 0;
int issue_disconn = 0;
int issue_close = 0;
int issue_flush = 0;
(last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET))) {
issue_disconn = 1;
if (last_ae == NES_AEQE_AEID_LLP_CONNECTION_RESET)
- disconn_status = IW_CM_EVENT_STATUS_RESET;
+ disconn_status = -ECONNRESET;
}
if (((original_hw_tcp_state == NES_AEQE_TCP_STATE_CLOSED) ||
cm_id->provider_data = nesqp;
/* Send up the close complete event */
cm_event.event = IW_CM_EVENT_CLOSE;
- cm_event.status = IW_CM_EVENT_STATUS_OK;
+ cm_event.status = 0;
cm_event.provider_data = cm_id->provider_data;
cm_event.local_addr = cm_id->local_addr;
cm_event.remote_addr = cm_id->remote_addr;
nes_add_ref(&nesqp->ibqp);
cm_event.event = IW_CM_EVENT_ESTABLISHED;
- cm_event.status = IW_CM_EVENT_STATUS_ACCEPTED;
+ cm_event.status = 0;
cm_event.provider_data = (void *)nesqp;
cm_event.local_addr = cm_id->local_addr;
cm_event.remote_addr = cm_id->remote_addr;
/* notify OF layer we successfully created the requested connection */
cm_event.event = IW_CM_EVENT_CONNECT_REPLY;
- cm_event.status = IW_CM_EVENT_STATUS_ACCEPTED;
+ cm_event.status = 0;
cm_event.provider_data = cm_id->provider_data;
cm_event.local_addr.sin_family = AF_INET;
cm_event.local_addr.sin_port = cm_id->local_addr.sin_port;
nesqp->cm_id = NULL;
/* cm_id->provider_data = NULL; */
cm_event.event = IW_CM_EVENT_DISCONNECT;
- cm_event.status = IW_CM_EVENT_STATUS_RESET;
+ cm_event.status = -ECONNRESET;
cm_event.provider_data = cm_id->provider_data;
cm_event.local_addr = cm_id->local_addr;
cm_event.remote_addr = cm_id->remote_addr;
ret = cm_id->event_handler(cm_id, &cm_event);
atomic_inc(&cm_closes);
cm_event.event = IW_CM_EVENT_CLOSE;
- cm_event.status = IW_CM_EVENT_STATUS_OK;
+ cm_event.status = 0;
cm_event.provider_data = cm_id->provider_data;
cm_event.local_addr = cm_id->local_addr;
cm_event.remote_addr = cm_id->remote_addr;
cm_node, cm_id, jiffies);
cm_event.event = IW_CM_EVENT_CONNECT_REQUEST;
- cm_event.status = IW_CM_EVENT_STATUS_OK;
+ cm_event.status = 0;
cm_event.provider_data = (void *)cm_node;
cm_event.local_addr.sin_family = AF_INET;
(nesqp->ibqp_state == IB_QPS_RTR)) && (nesqp->cm_id)) {
cm_id = nesqp->cm_id;
cm_event.event = IW_CM_EVENT_CONNECT_REPLY;
- cm_event.status = IW_CM_EVENT_STATUS_TIMEOUT;
+ cm_event.status = -ETIMEDOUT;
cm_event.local_addr = cm_id->local_addr;
cm_event.remote_addr = cm_id->remote_addr;
cm_event.private_data = NULL;
/*
* Keep chip from being accessed until we are ready. Use
* writeq() directly, to allow the write even though QIB_PRESENT
- * isn't' set.
+ * isn't set.
*/
dd->flags &= ~(QIB_INITTED | QIB_PRESENT);
dd->int_counter = 0; /* so we check interrupts work again */
/*
* Keep chip from being accessed until we are ready. Use
* writeq() directly, to allow the write even though QIB_PRESENT
- * isn't' set.
+ * isn't set.
*/
dd->flags &= ~(QIB_INITTED | QIB_PRESENT);
dd->int_counter = 0; /* so we check interrupts work again */
/*
* Keep chip from being accessed until we are ready. Use
* writeq() directly, to allow the write even though QIB_PRESENT
- * isn't' set.
+ * isn't set.
*/
dd->flags &= ~(QIB_INITTED | QIB_PRESENT | QIB_BADINTR);
dd->flags |= QIB_DOING_RESET;
ibsd_wr_allchans(ppd, 4, (1 << 10), BMASK(10, 10));
tstart = get_jiffies_64();
while (chan_done &&
- !time_after64(tstart, tstart + msecs_to_jiffies(500))) {
+ !time_after64(get_jiffies_64(),
+ tstart + msecs_to_jiffies(500))) {
msleep(20);
for (chan = 0; chan < SERDES_CHANS; ++chan) {
rxcaldone = ahb_mod(ppd->dd, IBSD(ppd->hw_pidx),
*/
devid = parent->device;
if (devid >= 0x25e2 && devid <= 0x25fa) {
- u8 rev;
-
/* 5000 P/V/X/Z */
- pci_read_config_byte(parent, PCI_REVISION_ID, &rev);
- if (rev <= 0xb2)
+ if (parent->revision <= 0xb2)
bits = 1U << 10;
else
bits = 7U << 10;
return -ENODEV;
// need to init core driver if not already done so
- if (atari_keyb_init())
- return -ENODEV;
+ error = atari_keyb_init();
+ if (error)
+ return error;
atakbd_dev = input_allocate_device();
if (!atakbd_dev)
#endif
/* only relative events get here */
- dx = buf[1];
- dy = -buf[2];
+ dx = buf[1];
+ dy = buf[2];
input_report_rel(atamouse_dev, REL_X, dx);
input_report_rel(atamouse_dev, REL_Y, dy);
- input_report_key(atamouse_dev, BTN_LEFT, buttons & 0x1);
+ input_report_key(atamouse_dev, BTN_LEFT, buttons & 0x4);
input_report_key(atamouse_dev, BTN_MIDDLE, buttons & 0x2);
- input_report_key(atamouse_dev, BTN_RIGHT, buttons & 0x4);
+ input_report_key(atamouse_dev, BTN_RIGHT, buttons & 0x1);
input_sync(atamouse_dev);
static void atamouse_close(struct input_dev *dev)
{
ikbd_mouse_disable();
- atari_mouse_interrupt_hook = NULL;
+ atari_input_mouse_interrupt_hook = NULL;
}
static int __init atamouse_init(void)
if (!MACH_IS_ATARI || !ATARIHW_PRESENT(ST_MFP))
return -ENODEV;
- if (!atari_keyb_init())
- return -ENODEV;
+ error = atari_keyb_init();
+ if (error)
+ return error;
atamouse_dev = input_allocate_device();
if (!atamouse_dev)
u8 command;
u8 ref_off;
u16 scratch;
- __be16 sample;
struct spi_message msg;
struct spi_transfer xfer[6];
+ /*
+ * DMA (thus cache coherency maintenance) requires the
+ * transfer buffers to live in their own cache lines.
+ */
+ __be16 sample ____cacheline_aligned;
};
struct ads7845_ser_req {
u8 command[3];
- u8 pwrdown[3];
- u8 sample[3];
struct spi_message msg;
struct spi_transfer xfer[2];
+ /*
+ * DMA (thus cache coherency maintenance) requires the
+ * transfer buffers to live in their own cache lines.
+ */
+ u8 sample[3] ____cacheline_aligned;
};
static int ads7846_read12_ser(struct device *dev, unsigned command)
unsigned int pd_irq;
bool pressure;
bool pen_down;
+ struct work_struct pd_data_work;
};
+static void wm831x_pd_data_work(struct work_struct *work)
+{
+ struct wm831x_ts *wm831x_ts =
+ container_of(work, struct wm831x_ts, pd_data_work);
+
+ if (wm831x_ts->pen_down) {
+ enable_irq(wm831x_ts->data_irq);
+ dev_dbg(wm831x_ts->wm831x->dev, "IRQ PD->DATA done\n");
+ } else {
+ enable_irq(wm831x_ts->pd_irq);
+ dev_dbg(wm831x_ts->wm831x->dev, "IRQ DATA->PD done\n");
+ }
+}
+
static irqreturn_t wm831x_ts_data_irq(int irq, void *irq_data)
{
struct wm831x_ts *wm831x_ts = irq_data;
}
if (!wm831x_ts->pen_down) {
+ /* Switch from data to pen down */
+ dev_dbg(wm831x->dev, "IRQ DATA->PD\n");
+
disable_irq_nosync(wm831x_ts->data_irq);
/* Don't need data any more */
ABS_PRESSURE, 0);
input_report_key(wm831x_ts->input_dev, BTN_TOUCH, 0);
+
+ schedule_work(&wm831x_ts->pd_data_work);
+ } else {
+ input_report_key(wm831x_ts->input_dev, BTN_TOUCH, 1);
}
input_sync(wm831x_ts->input_dev);
struct wm831x *wm831x = wm831x_ts->wm831x;
int ena = 0;
+ if (wm831x_ts->pen_down)
+ return IRQ_HANDLED;
+
+ disable_irq_nosync(wm831x_ts->pd_irq);
+
/* Start collecting data */
if (wm831x_ts->pressure)
ena |= WM831X_TCH_Z_ENA;
WM831X_TCH_X_ENA | WM831X_TCH_Y_ENA | WM831X_TCH_Z_ENA,
WM831X_TCH_X_ENA | WM831X_TCH_Y_ENA | ena);
- input_report_key(wm831x_ts->input_dev, BTN_TOUCH, 1);
- input_sync(wm831x_ts->input_dev);
-
wm831x_set_bits(wm831x, WM831X_INTERRUPT_STATUS_1,
WM831X_TCHPD_EINT, WM831X_TCHPD_EINT);
wm831x_ts->pen_down = true;
- enable_irq(wm831x_ts->data_irq);
+
+ /* Switch from pen down to data */
+ dev_dbg(wm831x->dev, "IRQ PD->DATA\n");
+ schedule_work(&wm831x_ts->pd_data_work);
return IRQ_HANDLED;
}
struct wm831x_ts *wm831x_ts = input_get_drvdata(idev);
struct wm831x *wm831x = wm831x_ts->wm831x;
+ /* Shut the controller down, disabling all other functionality too */
wm831x_set_bits(wm831x, WM831X_TOUCH_CONTROL_1,
- WM831X_TCH_ENA | WM831X_TCH_CVT_ENA |
- WM831X_TCH_X_ENA | WM831X_TCH_Y_ENA |
- WM831X_TCH_Z_ENA, 0);
+ WM831X_TCH_ENA | WM831X_TCH_X_ENA |
+ WM831X_TCH_Y_ENA | WM831X_TCH_Z_ENA, 0);
- if (wm831x_ts->pen_down)
+ /* Make sure any pending IRQs are done, the above will prevent
+ * new ones firing.
+ */
+ synchronize_irq(wm831x_ts->data_irq);
+ synchronize_irq(wm831x_ts->pd_irq);
+
+ /* Make sure the IRQ completion work is quiesced */
+ flush_work_sync(&wm831x_ts->pd_data_work);
+
+ /* If we ended up with the pen down then make sure we revert back
+ * to pen detection state for the next time we start up.
+ */
+ if (wm831x_ts->pen_down) {
disable_irq(wm831x_ts->data_irq);
+ enable_irq(wm831x_ts->pd_irq);
+ wm831x_ts->pen_down = false;
+ }
}
static __devinit int wm831x_ts_probe(struct platform_device *pdev)
struct wm831x_pdata *core_pdata = dev_get_platdata(pdev->dev.parent);
struct wm831x_touch_pdata *pdata = NULL;
struct input_dev *input_dev;
- int error;
+ int error, irqf;
if (core_pdata)
pdata = core_pdata->touch;
wm831x_ts->wm831x = wm831x;
wm831x_ts->input_dev = input_dev;
+ INIT_WORK(&wm831x_ts->pd_data_work, wm831x_pd_data_work);
/*
* If we have a direct IRQ use it, otherwise use the interrupt
wm831x_set_bits(wm831x, WM831X_TOUCH_CONTROL_1,
WM831X_TCH_RATE_MASK, 6);
+ if (pdata && pdata->data_irqf)
+ irqf = pdata->data_irqf;
+ else
+ irqf = IRQF_TRIGGER_HIGH;
+
error = request_threaded_irq(wm831x_ts->data_irq,
NULL, wm831x_ts_data_irq,
- IRQF_ONESHOT,
+ irqf | IRQF_ONESHOT,
"Touchscreen data", wm831x_ts);
if (error) {
dev_err(&pdev->dev, "Failed to request data IRQ %d: %d\n",
}
disable_irq(wm831x_ts->data_irq);
+ if (pdata && pdata->pd_irqf)
+ irqf = pdata->pd_irqf;
+ else
+ irqf = IRQF_TRIGGER_HIGH;
+
error = request_threaded_irq(wm831x_ts->pd_irq,
NULL, wm831x_ts_pen_down_irq,
- IRQF_ONESHOT,
+ irqf | IRQF_ONESHOT,
"Touchscreen pen down", wm831x_ts);
if (error) {
dev_err(&pdev->dev, "Failed to request pen down IRQ %d: %d\n",
{LM3530_NAME, 0},
{}
};
+MODULE_DEVICE_TABLE(i2c, lm3530_id);
static struct i2c_driver lm3530_i2c_driver = {
.probe = lm3530_probe,
---help---
This is a very simple module which allows you to run
multiple instances of the same Linux kernel, using the
- "lguest" command found in the Documentation/lguest directory.
+ "lguest" command found in the Documentation/virtual/lguest
+ directory.
+
Note that "lguest" is pronounced to rhyme with "fell quest",
- not "rustyvisor". See Documentation/lguest/lguest.txt.
+ not "rustyvisor". See Documentation/virtual/lguest/lguest.txt.
If unsure, say N. If curious, say M. If masochistic, say Y.
Beer:
@for f in Preparation Guest Drivers Launcher Host Switcher Mastery; do echo "{==- $$f -==}"; make -s $$f; done; echo "{==-==}"
Preparation Preparation! Guest Drivers Launcher Host Switcher Mastery:
- @sh ../../Documentation/lguest/extract $(PREFIX) `find ../../* -name '*.[chS]' -wholename '*lguest*'`
+ @sh ../../Documentation/virtual/lguest/extract $(PREFIX) `find ../../* -name '*.[chS]' -wholename '*lguest*'`
Puppy:
@clear
@printf " __ \n (___()'\`;\n /, /\`\n \\\\\\\"--\\\\\\ \n"
if (tda_fail(ret))
goto fail;
- regs[R_MPD] = (0x77 & pd);
-
- switch (priv->mode) {
- case TDA18271_ANALOG:
- regs[R_MPD] &= ~0x08;
- break;
- case TDA18271_DIGITAL:
- regs[R_MPD] |= 0x08;
- break;
- }
+ regs[R_MPD] = (0x7f & pd);
div = ((d * (freq / 1000)) << 7) / 125;
#define RF3 2
u32 rf_default[3];
u32 rf_freq[3];
- u8 prog_cal[3];
- u8 prog_tab[3];
+ s32 prog_cal[3];
+ s32 prog_tab[3];
i = tda18271_lookup_rf_band(fe, &freq, NULL);
return bcal;
tda18271_calc_rf_cal(fe, &rf_freq[rf]);
- prog_tab[rf] = regs[R_EB14];
+ prog_tab[rf] = (s32)regs[R_EB14];
if (1 == bcal)
- prog_cal[rf] = tda18271_calibrate_rf(fe, rf_freq[rf]);
+ prog_cal[rf] =
+ (s32)tda18271_calibrate_rf(fe, rf_freq[rf]);
else
prog_cal[rf] = prog_tab[rf];
switch (rf) {
case RF1:
map[i].rf_a1 = 0;
- map[i].rf_b1 = (s32)(prog_cal[RF1] - prog_tab[RF1]);
+ map[i].rf_b1 = (prog_cal[RF1] - prog_tab[RF1]);
map[i].rf1 = rf_freq[RF1] / 1000;
break;
case RF2:
- dividend = (s32)(prog_cal[RF2] - prog_tab[RF2]) -
- (s32)(prog_cal[RF1] + prog_tab[RF1]);
+ dividend = (prog_cal[RF2] - prog_tab[RF2] -
+ prog_cal[RF1] + prog_tab[RF1]);
divisor = (s32)(rf_freq[RF2] - rf_freq[RF1]) / 1000;
map[i].rf_a1 = (dividend / divisor);
map[i].rf2 = rf_freq[RF2] / 1000;
break;
case RF3:
- dividend = (s32)(prog_cal[RF3] - prog_tab[RF3]) -
- (s32)(prog_cal[RF2] + prog_tab[RF2]);
+ dividend = (prog_cal[RF3] - prog_tab[RF3] -
+ prog_cal[RF2] + prog_tab[RF2]);
divisor = (s32)(rf_freq[RF3] - rf_freq[RF2]) / 1000;
map[i].rf_a2 = (dividend / divisor);
- map[i].rf_b2 = (s32)(prog_cal[RF2] - prog_tab[RF2]);
+ map[i].rf_b2 = (prog_cal[RF2] - prog_tab[RF2]);
map[i].rf3 = rf_freq[RF3] / 1000;
break;
default:
static struct tda18271_map tda18271_rf_band[] = {
{ .rfmax = 47900, .val = 0x00 },
{ .rfmax = 61100, .val = 0x01 },
-/* { .rfmax = 152600, .val = 0x02 }, */
- { .rfmax = 121200, .val = 0x02 },
+ { .rfmax = 152600, .val = 0x02 },
{ .rfmax = 164700, .val = 0x03 },
{ .rfmax = 203500, .val = 0x04 },
{ .rfmax = 457800, .val = 0x05 },
{ .rfmax = 150000, .val = 0xb0 },
{ .rfmax = 151000, .val = 0xb1 },
{ .rfmax = 152000, .val = 0xb7 },
- { .rfmax = 153000, .val = 0xbd },
+ { .rfmax = 152600, .val = 0xbd },
{ .rfmax = 154000, .val = 0x20 },
{ .rfmax = 155000, .val = 0x22 },
{ .rfmax = 156000, .val = 0x24 },
{ .rfmax = 161000, .val = 0x2d },
{ .rfmax = 163000, .val = 0x2e },
{ .rfmax = 164000, .val = 0x2f },
- { .rfmax = 165000, .val = 0x30 },
+ { .rfmax = 164700, .val = 0x30 },
{ .rfmax = 166000, .val = 0x11 },
{ .rfmax = 167000, .val = 0x12 },
{ .rfmax = 168000, .val = 0x13 },
{ .rfmax = 236000, .val = 0x1b },
{ .rfmax = 237000, .val = 0x1c },
{ .rfmax = 240000, .val = 0x1d },
- { .rfmax = 242000, .val = 0x1f },
+ { .rfmax = 242000, .val = 0x1e },
+ { .rfmax = 244000, .val = 0x1f },
{ .rfmax = 247000, .val = 0x20 },
{ .rfmax = 249000, .val = 0x21 },
{ .rfmax = 252000, .val = 0x22 },
{ .rfmax = 453000, .val = 0x93 },
{ .rfmax = 454000, .val = 0x94 },
{ .rfmax = 456000, .val = 0x96 },
- { .rfmax = 457000, .val = 0x98 },
+ { .rfmax = 457800, .val = 0x98 },
{ .rfmax = 461000, .val = 0x11 },
{ .rfmax = 468000, .val = 0x12 },
{ .rfmax = 472000, .val = 0x13 },
DEBSTATUS);
#define DRIVER_VERSION "0.1"
-#define DRIVER_NAME "Technisat/B2C2 FlexCop II/IIb/III Digital TV PCI Driver"
+#define DRIVER_NAME "flexcop-pci"
#define DRIVER_AUTHOR "Patrick Boettcher <patrick.boettcher@desy.de>"
struct flexcop_pci {
select DVB_TDA826X if !DVB_FE_CUSTOMISE
select DVB_STV0288 if !DVB_FE_CUSTOMISE
select DVB_IX2505V if !DVB_FE_CUSTOMISE
+ select DVB_STV0299 if !DVB_FE_CUSTOMISE
+ select DVB_PLL if !DVB_FE_CUSTOMISE
help
Say Y here to support the LME DM04/QQBOX DVB-S USB2.0 .
config DVB_USB_TECHNISAT_USB2
tristate "Technisat DVB-S/S2 USB2.0 support"
depends on DVB_USB
- select DVB_STB0899 if !DVB_FE_CUSTOMISE
- select DVB_STB6100 if !DVB_FE_CUSTOMISE
+ select DVB_STV090x if !DVB_FE_CUSTOMISE
+ select DVB_STV6110x if !DVB_FE_CUSTOMISE
help
Say Y here to support the Technisat USB2 DVB-S/S2 device
.agc1_pt3 = 98,
.agc1_slope1 = 0,
.agc1_slope2 = 167,
- .agc1_pt1 = 98,
+ .agc2_pt1 = 98,
.agc2_pt2 = 255,
.agc2_slope1 = 104,
.agc2_slope2 = 0,
dib0700_set_i2c_speed(adap->dev, 340);
adap->fe = dvb_attach(dib7000p_attach, &adap->dev->i2c_adap, 0x90, &tfe7090pvr_dib7000p_config[0]);
- dib7090_slave_reset(adap->fe);
-
if (adap->fe == NULL)
return -ENODEV;
+ dib7090_slave_reset(adap->fe);
+
return 0;
}
if (dev->ci.en && (io & NGENE_IO_TSOUT)) {
dvb_ca_en50221_init(adapter, dev->ci.en, 0, 1);
set_transfer(chan, 1);
+ chan->dev->channel[2].DataFormatFlags = DF_SWAP32;
set_transfer(&chan->dev->channel[2], 1);
dvb_register_device(adapter, &chan->ci_dev,
&ngene_dvbdev_ci, (void *) chan,
static int __media_entity_setup_link_notify(struct media_link *link, u32 flags)
{
- const u32 mask = MEDIA_LNK_FL_ENABLED;
int ret;
/* Notify both entities. */
return ret;
}
- link->flags = (link->flags & ~mask) | (flags & mask);
+ link->flags = flags;
link->reverse->flags = link->flags;
return 0;
*/
int __media_entity_setup_link(struct media_link *link, u32 flags)
{
+ const u32 mask = MEDIA_LNK_FL_ENABLED;
struct media_device *mdev;
struct media_entity *source, *sink;
int ret = -EBUSY;
if (link == NULL)
return -EINVAL;
+ /* The non-modifiable link flags must not be modified. */
+ if ((link->flags & ~mask) != (flags & ~mask))
+ return -EINVAL;
+
if (link->flags & MEDIA_LNK_FL_IMMUTABLE)
return link->flags == flags ? 0 : -EINVAL;
return 0;
}
-/* !!! not tested, in my card this does't work !!! */
+/* !!! not tested, in my card this doesn't work !!! */
static int fmr2_setvolume(struct fmr2 *dev)
{
int vol[16] = { 0x021, 0x084, 0x090, 0x104,
v4l_info(client, "chip found @ 0x%02x (%s)\n",
client->addr << 1, client->adapter->name);
- state = kmalloc(sizeof(struct saa7706h_state), GFP_KERNEL);
+ state = kzalloc(sizeof(struct saa7706h_state), GFP_KERNEL);
if (state == NULL)
return -ENOMEM;
sd = &state->sd;
v4l_info(client, "chip found @ 0x%02x (%s)\n",
client->addr << 1, client->adapter->name);
- state = kmalloc(sizeof(struct tef6862_state), GFP_KERNEL);
+ state = kzalloc(sizeof(struct tef6862_state), GFP_KERNEL);
if (state == NULL)
return -ENOMEM;
state->freq = TEF6862_LO_FREQ;
#define MOD_AUTHOR "Jarod Wilson <jarod@wilsonet.com>"
#define MOD_DESC "Driver for SoundGraph iMON MultiMedia IR/Display"
#define MOD_NAME "imon"
-#define MOD_VERSION "0.9.2"
+#define MOD_VERSION "0.9.3"
#define DISPLAY_MINOR_BASE 144
#define DEVICE_NAME "lcd%d"
}
/**
- * Sends a packet to the device -- this function must be called
- * with ictx->lock held.
+ * Sends a packet to the device -- this function must be called with
+ * ictx->lock held, or its unlock/lock sequence while waiting for tx
+ * to complete can/will lead to a deadlock.
*/
static int send_packet(struct imon_context *ictx)
{
* the iMON remotes, and those used by the Windows MCE remotes (which is
* really just RC-6), but only one or the other at a time, as the signals
* are decoded onboard the receiver.
+ *
+ * This function gets called two different ways, one way is from
+ * rc_register_device, for initial protocol selection/setup, and the other is
+ * via a userspace-initiated protocol change request, either by direct sysfs
+ * prodding or by something like ir-keytable. In the rc_register_device case,
+ * the imon context lock is already held, but when initiated from userspace,
+ * it is not, so we must acquire it prior to calling send_packet, which
+ * requires that the lock is held.
*/
static int imon_ir_change_protocol(struct rc_dev *rc, u64 rc_type)
{
int retval;
struct imon_context *ictx = rc->priv;
struct device *dev = ictx->dev;
+ bool unlock = false;
unsigned char ir_proto_packet[] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x86 };
memcpy(ictx->usb_tx_buf, &ir_proto_packet, sizeof(ir_proto_packet));
+ if (!mutex_is_locked(&ictx->lock)) {
+ unlock = true;
+ mutex_lock(&ictx->lock);
+ }
+
retval = send_packet(ictx);
if (retval)
goto out;
ictx->pad_mouse = false;
out:
+ if (unlock)
+ mutex_unlock(&ictx->lock);
+
return retval;
}
goto rdev_setup_failed;
}
+ mutex_unlock(&ictx->lock);
return ictx;
rdev_setup_failed:
goto urb_submit_failed;
}
+ mutex_unlock(&ictx->lock);
return ictx;
urb_submit_failed:
usb_set_intfdata(interface, ictx);
if (ifnum == 0) {
+ mutex_lock(&ictx->lock);
+
if (product == 0xffdc && ictx->rf_device) {
sysfs_err = sysfs_create_group(&interface->dev.kobj,
&imon_rf_attr_group);
if (ictx->display_supported)
imon_init_display(ictx, interface);
+
+ mutex_unlock(&ictx->lock);
}
dev_info(dev, "iMON device (%04x:%04x, intf%d) on "
"usb<%d:%d> initialized\n", vendor, product, ifnum,
usbdev->bus->busnum, usbdev->devnum);
- mutex_unlock(&ictx->lock);
mutex_unlock(&driver_lock);
return 0;
#include <linux/io.h>
#include <linux/interrupt.h>
#include <linux/sched.h>
+#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/input.h>
#include <linux/bitops.h>
{ USB_DEVICE(VENDOR_PHILIPS, 0x206c) },
/* Philips/Spinel plus IR transceiver for ASUS */
{ USB_DEVICE(VENDOR_PHILIPS, 0x2088) },
+ /* Philips IR transceiver (Dell branded) */
+ { USB_DEVICE(VENDOR_PHILIPS, 0x2093) },
/* Realtek MCE IR Receiver and card reader */
{ USB_DEVICE(VENDOR_REALTEK, 0x0161),
.driver_info = MULTIFUNCTION },
{
struct rc_dev *rdev = input_get_drvdata(idev);
- rdev->close(rdev);
+ if (rdev)
+ rdev->close(rdev);
}
/* class for /sys/class/rc */
{ RC_TYPE_SONY, "sony" },
{ RC_TYPE_RC5_SZ, "rc-5-sz" },
{ RC_TYPE_LIRC, "lirc" },
+ { RC_TYPE_OTHER, "other" },
};
#define PROTO_NONE "none"
config VIDEO_MX3
tristate "i.MX3x Camera Sensor Interface driver"
depends on VIDEO_DEV && MX3_IPU && SOC_CAMERA
- select VIDEOBUF_DMA_CONTIG
+ select VIDEOBUF2_DMA_CONTIG
select MX3_VIDEO
---help---
This is a v4l2 driver for the i.MX3x Camera Sensor Interface
/* No struct video_device, but can have buffers allocated */
if (type == CX18_ENC_STREAM_TYPE_IDX) {
+ /* If the module params didn't inhibit IDX ... */
if (cx->stream_buffers[type] != 0) {
cx->stream_buffers[type] = 0;
- cx18_stream_free(&cx->streams[type]);
+ /*
+ * Before calling cx18_stream_free(),
+ * check if the IDX stream was actually set up.
+ * Needed, since the cx18_probe() error path
+ * exits through here as well as normal clean up
+ */
+ if (cx->streams[type].buffers != 0)
+ cx18_stream_free(&cx->streams[type]);
}
continue;
}
select DVB_CX24116 if !DVB_FE_CUSTOMISE
select DVB_STV0900 if !DVB_FE_CUSTOMISE
select DVB_DS3000 if !DVB_FE_CUSTOMISE
+ select DVB_STV0367 if !DVB_FE_CUSTOMISE
select MEDIA_TUNER_MT2131 if !MEDIA_TUNER_CUSTOMISE
select MEDIA_TUNER_XC2028 if !MEDIA_TUNER_CUSTOMISE
select MEDIA_TUNER_TDA8290 if !MEDIA_TUNER_CUSTOMISE
for (todo = 32; todo > 0; todo -= bits) {
ev.pulse = samples & 0x80000000 ? false : true;
bits = min(todo, 32U - fls(ev.pulse ? samples : ~samples));
- ev.duration = (bits * NSEC_PER_SEC) / (1000 * ir_samplerate);
+ ev.duration = (bits * (NSEC_PER_SEC / 1000)) / ir_samplerate;
ir_raw_event_store_with_filter(ir->dev, &ev);
samples <<= bits;
}
static int imx074_set_bus_param(struct soc_camera_device *icd,
unsigned long flags)
{
- return -1;
+ return -EINVAL;
}
static struct soc_camera_ops imx074_ops = {
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- state = kmalloc(sizeof(struct m52790_state), GFP_KERNEL);
+ state = kzalloc(sizeof(struct m52790_state), GFP_KERNEL);
if (state == NULL)
return -ENOMEM;
}
switch (xclksel) {
- case 0:
+ case ISP_XCLK_A:
isp_reg_clr_set(isp, OMAP3_ISP_IOMEM_MAIN, ISP_TCTRL_CTRL,
ISPTCTRL_CTRL_DIVA_MASK,
divisor << ISPTCTRL_CTRL_DIVA_SHIFT);
dev_dbg(isp->dev, "isp_set_xclk(): cam_xclka set to %d Hz\n",
currentxclk);
break;
- case 1:
+ case ISP_XCLK_B:
isp_reg_clr_set(isp, OMAP3_ISP_IOMEM_MAIN, ISP_TCTRL_CTRL,
ISPTCTRL_CTRL_DIVB_MASK,
divisor << ISPTCTRL_CTRL_DIVB_SHIFT);
dev_dbg(isp->dev, "isp_set_xclk(): cam_xclkb set to %d Hz\n",
currentxclk);
break;
+ case ISP_XCLK_NONE:
default:
omap3isp_put(isp);
dev_dbg(isp->dev, "ISP_ERR: isp_set_xclk(): Invalid requested "
}
/* Do we go from stable whatever to clock? */
- if (divisor >= 2 && isp->xclk_divisor[xclksel] < 2)
+ if (divisor >= 2 && isp->xclk_divisor[xclksel - 1] < 2)
omap3isp_get(isp);
/* Stopping the clock. */
- else if (divisor < 2 && isp->xclk_divisor[xclksel] >= 2)
+ else if (divisor < 2 && isp->xclk_divisor[xclksel - 1] >= 2)
omap3isp_put(isp);
- isp->xclk_divisor[xclksel] = divisor;
+ isp->xclk_divisor[xclksel - 1] = divisor;
omap3isp_put(isp);
*/
void omap3isp_configure_bridge(struct isp_device *isp,
enum ccdc_input_entity input,
- const struct isp_parallel_platform_data *pdata)
+ const struct isp_parallel_platform_data *pdata,
+ unsigned int shift)
{
u32 ispctrl_val;
switch (input) {
case CCDC_INPUT_PARALLEL:
ispctrl_val |= ISPCTRL_PAR_SER_CLK_SEL_PARALLEL;
- ispctrl_val |= pdata->data_lane_shift << ISPCTRL_SHIFT_SHIFT;
ispctrl_val |= pdata->clk_pol << ISPCTRL_PAR_CLK_POL_SHIFT;
ispctrl_val |= pdata->bridge << ISPCTRL_PAR_BRIDGE_SHIFT;
+ shift += pdata->data_lane_shift * 2;
break;
case CCDC_INPUT_CSI2A:
return;
}
+ ispctrl_val |= ((shift/2) << ISPCTRL_SHIFT_SHIFT) & ISPCTRL_SHIFT_MASK;
+
ispctrl_val &= ~ISPCTRL_SYNC_DETECT_MASK;
ispctrl_val |= ISPCTRL_SYNC_DETECT_VSRISE;
/* Apply power change to connected non-nodes. */
ret = isp_pipeline_pm_power(entity, change);
+ if (ret < 0)
+ entity->use_count -= change;
mutex_unlock(&entity->parent->graph_mutex);
}
}
+ if (failure < 0)
+ isp->needs_reset = true;
+
return failure;
}
* single-shot or continuous mode.
*
* Return 0 if successful, or the return value of the failed video::s_stream
- * operation otherwise.
+ * operation otherwise. The pipeline state is not updated when the operation
+ * fails, except when stopping the pipeline.
*/
int omap3isp_pipeline_set_stream(struct isp_pipeline *pipe,
enum isp_pipeline_stream_state state)
ret = isp_pipeline_disable(pipe);
else
ret = isp_pipeline_enable(pipe, state);
- pipe->stream_state = state;
+
+ if (ret == 0 || state == ISP_PIPELINE_STREAM_STOPPED)
+ pipe->stream_state = state;
return ret;
}
if (--isp->ref_count == 0) {
isp_disable_interrupts(isp);
isp_save_ctx(isp);
+ if (isp->needs_reset) {
+ isp_reset(isp);
+ isp->needs_reset = false;
+ }
isp_disable_clocks(isp);
}
mutex_unlock(&isp->isp_mutex);
/**
* struct isp_parallel_platform_data - Parallel interface platform data
- * @width: Parallel bus width in bits (8, 10, 11 or 12)
* @data_lane_shift: Data lane shifter
* 0 - CAMEXT[13:0] -> CAM[13:0]
* 1 - CAMEXT[13:2] -> CAM[11:0]
* ISPCTRL_PAR_BRIDGE_BENDIAN - Big endian
*/
struct isp_parallel_platform_data {
- unsigned int width;
unsigned int data_lane_shift:2;
unsigned int clk_pol:1;
unsigned int bridge:4;
/* ISP Obj */
spinlock_t stat_lock; /* common lock for statistic drivers */
struct mutex isp_mutex; /* For handling ref_count field */
+ bool needs_reset;
int has_context;
int ref_count;
unsigned int autoidle;
enum isp_pipeline_stream_state state);
void omap3isp_configure_bridge(struct isp_device *isp,
enum ccdc_input_entity input,
- const struct isp_parallel_platform_data *pdata);
+ const struct isp_parallel_platform_data *pdata,
+ unsigned int shift);
-#define ISP_XCLK_NONE -1
-#define ISP_XCLK_A 0
-#define ISP_XCLK_B 1
+#define ISP_XCLK_NONE 0
+#define ISP_XCLK_A 1
+#define ISP_XCLK_B 2
struct isp_device *omap3isp_get(struct isp_device *isp);
void omap3isp_put(struct isp_device *isp);
static const unsigned int ccdc_fmts[] = {
V4L2_MBUS_FMT_Y8_1X8,
+ V4L2_MBUS_FMT_Y10_1X10,
+ V4L2_MBUS_FMT_Y12_1X12,
+ V4L2_MBUS_FMT_SGRBG8_1X8,
+ V4L2_MBUS_FMT_SRGGB8_1X8,
+ V4L2_MBUS_FMT_SBGGR8_1X8,
+ V4L2_MBUS_FMT_SGBRG8_1X8,
V4L2_MBUS_FMT_SGRBG10_1X10,
V4L2_MBUS_FMT_SRGGB10_1X10,
V4L2_MBUS_FMT_SBGGR10_1X10,
struct isp_parallel_platform_data *pdata = NULL;
struct v4l2_subdev *sensor;
struct v4l2_mbus_framefmt *format;
+ const struct isp_format_info *fmt_info;
+ struct v4l2_subdev_format fmt_src;
+ unsigned int depth_out;
+ unsigned int depth_in = 0;
struct media_pad *pad;
unsigned long flags;
+ unsigned int shift;
u32 syn_mode;
u32 ccdc_pattern;
- if (ccdc->input == CCDC_INPUT_PARALLEL) {
- pad = media_entity_remote_source(&ccdc->pads[CCDC_PAD_SINK]);
- sensor = media_entity_to_v4l2_subdev(pad->entity);
+ pad = media_entity_remote_source(&ccdc->pads[CCDC_PAD_SINK]);
+ sensor = media_entity_to_v4l2_subdev(pad->entity);
+ if (ccdc->input == CCDC_INPUT_PARALLEL)
pdata = &((struct isp_v4l2_subdevs_group *)sensor->host_priv)
->bus.parallel;
+
+ /* Compute shift value for lane shifter to configure the bridge. */
+ fmt_src.pad = pad->index;
+ fmt_src.which = V4L2_SUBDEV_FORMAT_ACTIVE;
+ if (!v4l2_subdev_call(sensor, pad, get_fmt, NULL, &fmt_src)) {
+ fmt_info = omap3isp_video_format_info(fmt_src.format.code);
+ depth_in = fmt_info->bpp;
}
- omap3isp_configure_bridge(isp, ccdc->input, pdata);
+ fmt_info = omap3isp_video_format_info
+ (isp->isp_ccdc.formats[CCDC_PAD_SINK].code);
+ depth_out = fmt_info->bpp;
+
+ shift = depth_in - depth_out;
+ omap3isp_configure_bridge(isp, ccdc->input, pdata, shift);
- ccdc->syncif.datsz = pdata ? pdata->width : 10;
+ ccdc->syncif.datsz = depth_out;
ccdc_config_sync_if(ccdc, &ccdc->syncif);
/* CCDC_PAD_SINK */
* @ccdc: Pointer to ISP CCDC device.
* @event: Pointing which event trigger handler
*
- * Return 1 when the event and stopping request combination is satisfyied,
+ * Return 1 when the event and stopping request combination is satisfied,
* zero otherwise.
*/
static int __ccdc_handle_stopping(struct isp_ccdc_device *ccdc, u32 event)
ccdc_set_outaddr(ccdc, buffer->isp_addr);
- /* We now have a buffer queued on the output, restart the pipeline in
+ /* We now have a buffer queued on the output, restart the pipeline
* on the next CCDC interrupt if running in continuous mode (or when
* starting the stream).
*/
* @configs - pointer to update config structure.
* @config - return pointer to appropriate structure field.
* @bit - for which feature to return pointers.
- * Return size of coresponding prev_params member
+ * Return size of corresponding prev_params member
*/
static u32
__preview_get_ptrs(struct prev_params *params, void **param,
up_read(¤t->mm->mmap_sem);
if (ret != buf->npages) {
- buf->npages = ret;
+ buf->npages = ret < 0 ? 0 : ret;
isp_video_buffer_cleanup(buf);
return -EFAULT;
}
* isp_video_buffer_prepare_vm_flags - Get VMA flags for a userspace address
*
* This function locates the VMAs for the buffer's userspace address and checks
- * that their flags match. The onlflag that we need to care for at the moment is
- * VM_PFNMAP.
+ * that their flags match. The only flag that we need to care for at the moment
+ * is VM_PFNMAP.
*
* The buffer vm_flags field is set to the first VMA flags.
*
* iw and ih are the input width and height after cropping. Those equations need
* to be satisfied exactly for the resizer to work correctly.
*
- * Reverting the equations, we can compute the resizing ratios with
+ * The equations can't be easily reverted, as the >> 8 operation is not linear.
+ * In addition, not all input sizes can be achieved for a given output size. To
+ * get the highest input size lower than or equal to the requested input size,
+ * we need to compute the highest resizing ratio that satisfies the following
+ * inequality (taking the 4-tap mode width equation as an example)
+ *
+ * iw >= (32 * sph + (ow - 1) * hrsz + 16) >> 8 - 7
+ *
+ * (where iw is the requested input width) which can be rewritten as
+ *
+ * iw - 7 >= (32 * sph + (ow - 1) * hrsz + 16) >> 8
+ * (iw - 7) << 8 >= 32 * sph + (ow - 1) * hrsz + 16 - b
+ * ((iw - 7) << 8) + b >= 32 * sph + (ow - 1) * hrsz + 16
+ *
+ * where b is the value of the 8 least significant bits of the right hand side
+ * expression of the last inequality. The highest resizing ratio value will be
+ * achieved when b is equal to its maximum value of 255. That resizing ratio
+ * value will still satisfy the original inequality, as b will disappear when
+ * the expression will be shifted right by 8.
+ *
+ * The reverted the equations thus become
*
* - 8-phase, 4-tap mode
- * hrsz = ((iw - 7) * 256 - 16 - 32 * sph) / (ow - 1)
- * vrsz = ((ih - 4) * 256 - 16 - 32 * spv) / (oh - 1)
+ * hrsz = ((iw - 7) * 256 + 255 - 16 - 32 * sph) / (ow - 1)
+ * vrsz = ((ih - 4) * 256 + 255 - 16 - 32 * spv) / (oh - 1)
* - 4-phase, 7-tap mode
- * hrsz = ((iw - 7) * 256 - 32 - 64 * sph) / (ow - 1)
- * vrsz = ((ih - 7) * 256 - 32 - 64 * spv) / (oh - 1)
+ * hrsz = ((iw - 7) * 256 + 255 - 32 - 64 * sph) / (ow - 1)
+ * vrsz = ((ih - 7) * 256 + 255 - 32 - 64 * spv) / (oh - 1)
*
- * The ratios are integer values, and must be rounded down to ensure that the
- * cropped input size is not bigger than the uncropped input size. As the ratio
- * in 7-tap mode is always smaller than the ratio in 4-tap mode, we can use the
- * 7-tap mode equations to compute a ratio approximation.
+ * The ratios are integer values, and are rounded down to ensure that the
+ * cropped input size is not bigger than the uncropped input size.
+ *
+ * As the number of phases/taps, used to select the correct equations to compute
+ * the ratio, depends on the ratio, we start with the 4-tap mode equations to
+ * compute an approximation of the ratio, and switch to the 7-tap mode equations
+ * if the approximation is higher than the ratio threshold.
+ *
+ * As the 7-tap mode equations will return a ratio smaller than or equal to the
+ * 4-tap mode equations, the resulting ratio could become lower than or equal to
+ * the ratio threshold. This 'equations loop' isn't an issue as long as the
+ * correct equations are used to compute the final input size. Starting with the
+ * 4-tap mode equations ensure that, in case of values resulting in a 'ratio
+ * loop', the smallest of the ratio values will be used, never exceeding the
+ * requested input size.
*
* We first clamp the output size according to the hardware capabilitie to avoid
* auto-cropping the input more than required to satisfy the TRM equations. The
unsigned int max_width;
unsigned int max_height;
unsigned int width_alignment;
+ unsigned int width;
+ unsigned int height;
/*
* Clamp the output height based on the hardware capabilities and
max_height = min_t(unsigned int, max_height, MAX_OUT_HEIGHT);
output->height = clamp(output->height, min_height, max_height);
- ratio->vert = ((input->height - 7) * 256 - 32 - 64 * spv)
+ ratio->vert = ((input->height - 4) * 256 + 255 - 16 - 32 * spv)
/ (output->height - 1);
+ if (ratio->vert > MID_RESIZE_VALUE)
+ ratio->vert = ((input->height - 7) * 256 + 255 - 32 - 64 * spv)
+ / (output->height - 1);
ratio->vert = clamp_t(unsigned int, ratio->vert,
MIN_RESIZE_VALUE, MAX_RESIZE_VALUE);
if (ratio->vert <= MID_RESIZE_VALUE) {
upscaled_height = (output->height - 1) * ratio->vert
+ 32 * spv + 16;
- input->height = (upscaled_height >> 8) + 4;
+ height = (upscaled_height >> 8) + 4;
} else {
upscaled_height = (output->height - 1) * ratio->vert
+ 64 * spv + 32;
- input->height = (upscaled_height >> 8) + 7;
+ height = (upscaled_height >> 8) + 7;
}
/*
max_width & ~(width_alignment - 1));
output->width = ALIGN(output->width, width_alignment);
- ratio->horz = ((input->width - 7) * 256 - 32 - 64 * sph)
+ ratio->horz = ((input->width - 7) * 256 + 255 - 16 - 32 * sph)
/ (output->width - 1);
+ if (ratio->horz > MID_RESIZE_VALUE)
+ ratio->horz = ((input->width - 7) * 256 + 255 - 32 - 64 * sph)
+ / (output->width - 1);
ratio->horz = clamp_t(unsigned int, ratio->horz,
MIN_RESIZE_VALUE, MAX_RESIZE_VALUE);
if (ratio->horz <= MID_RESIZE_VALUE) {
upscaled_width = (output->width - 1) * ratio->horz
+ 32 * sph + 16;
- input->width = (upscaled_width >> 8) + 7;
+ width = (upscaled_width >> 8) + 7;
} else {
upscaled_width = (output->width - 1) * ratio->horz
+ 64 * sph + 32;
- input->width = (upscaled_width >> 8) + 7;
+ width = (upscaled_width >> 8) + 7;
}
+
+ /* Center the new crop rectangle. */
+ input->left += (input->width - width) / 2;
+ input->top += (input->height - height) / 2;
+ input->width = width;
+ input->height = height;
}
/*
struct ispstat_generic_config {
/*
* Fields must be in the same order as in:
- * - isph3a_aewb_config
- * - isph3a_af_config
- * - isphist_config
+ * - omap3isp_h3a_aewb_config
+ * - omap3isp_h3a_af_config
+ * - omap3isp_hist_config
*/
u32 buf_size;
u16 config_counter;
static struct isp_format_info formats[] = {
{ V4L2_MBUS_FMT_Y8_1X8, V4L2_MBUS_FMT_Y8_1X8,
- V4L2_MBUS_FMT_Y8_1X8, V4L2_PIX_FMT_GREY, 8, },
+ V4L2_MBUS_FMT_Y8_1X8, V4L2_MBUS_FMT_Y8_1X8,
+ V4L2_PIX_FMT_GREY, 8, },
+ { V4L2_MBUS_FMT_Y10_1X10, V4L2_MBUS_FMT_Y10_1X10,
+ V4L2_MBUS_FMT_Y10_1X10, V4L2_MBUS_FMT_Y8_1X8,
+ V4L2_PIX_FMT_Y10, 10, },
+ { V4L2_MBUS_FMT_Y12_1X12, V4L2_MBUS_FMT_Y10_1X10,
+ V4L2_MBUS_FMT_Y12_1X12, V4L2_MBUS_FMT_Y8_1X8,
+ V4L2_PIX_FMT_Y12, 12, },
+ { V4L2_MBUS_FMT_SBGGR8_1X8, V4L2_MBUS_FMT_SBGGR8_1X8,
+ V4L2_MBUS_FMT_SBGGR8_1X8, V4L2_MBUS_FMT_SBGGR8_1X8,
+ V4L2_PIX_FMT_SBGGR8, 8, },
+ { V4L2_MBUS_FMT_SGBRG8_1X8, V4L2_MBUS_FMT_SGBRG8_1X8,
+ V4L2_MBUS_FMT_SGBRG8_1X8, V4L2_MBUS_FMT_SGBRG8_1X8,
+ V4L2_PIX_FMT_SGBRG8, 8, },
+ { V4L2_MBUS_FMT_SGRBG8_1X8, V4L2_MBUS_FMT_SGRBG8_1X8,
+ V4L2_MBUS_FMT_SGRBG8_1X8, V4L2_MBUS_FMT_SGRBG8_1X8,
+ V4L2_PIX_FMT_SGRBG8, 8, },
+ { V4L2_MBUS_FMT_SRGGB8_1X8, V4L2_MBUS_FMT_SRGGB8_1X8,
+ V4L2_MBUS_FMT_SRGGB8_1X8, V4L2_MBUS_FMT_SRGGB8_1X8,
+ V4L2_PIX_FMT_SRGGB8, 8, },
{ V4L2_MBUS_FMT_SGRBG10_DPCM8_1X8, V4L2_MBUS_FMT_SGRBG10_DPCM8_1X8,
- V4L2_MBUS_FMT_SGRBG10_1X10, V4L2_PIX_FMT_SGRBG10DPCM8, 8, },
+ V4L2_MBUS_FMT_SGRBG10_1X10, 0,
+ V4L2_PIX_FMT_SGRBG10DPCM8, 8, },
{ V4L2_MBUS_FMT_SBGGR10_1X10, V4L2_MBUS_FMT_SBGGR10_1X10,
- V4L2_MBUS_FMT_SBGGR10_1X10, V4L2_PIX_FMT_SBGGR10, 10, },
+ V4L2_MBUS_FMT_SBGGR10_1X10, V4L2_MBUS_FMT_SBGGR8_1X8,
+ V4L2_PIX_FMT_SBGGR10, 10, },
{ V4L2_MBUS_FMT_SGBRG10_1X10, V4L2_MBUS_FMT_SGBRG10_1X10,
- V4L2_MBUS_FMT_SGBRG10_1X10, V4L2_PIX_FMT_SGBRG10, 10, },
+ V4L2_MBUS_FMT_SGBRG10_1X10, V4L2_MBUS_FMT_SGBRG8_1X8,
+ V4L2_PIX_FMT_SGBRG10, 10, },
{ V4L2_MBUS_FMT_SGRBG10_1X10, V4L2_MBUS_FMT_SGRBG10_1X10,
- V4L2_MBUS_FMT_SGRBG10_1X10, V4L2_PIX_FMT_SGRBG10, 10, },
+ V4L2_MBUS_FMT_SGRBG10_1X10, V4L2_MBUS_FMT_SGRBG8_1X8,
+ V4L2_PIX_FMT_SGRBG10, 10, },
{ V4L2_MBUS_FMT_SRGGB10_1X10, V4L2_MBUS_FMT_SRGGB10_1X10,
- V4L2_MBUS_FMT_SRGGB10_1X10, V4L2_PIX_FMT_SRGGB10, 10, },
+ V4L2_MBUS_FMT_SRGGB10_1X10, V4L2_MBUS_FMT_SRGGB8_1X8,
+ V4L2_PIX_FMT_SRGGB10, 10, },
{ V4L2_MBUS_FMT_SBGGR12_1X12, V4L2_MBUS_FMT_SBGGR10_1X10,
- V4L2_MBUS_FMT_SBGGR12_1X12, V4L2_PIX_FMT_SBGGR12, 12, },
+ V4L2_MBUS_FMT_SBGGR12_1X12, V4L2_MBUS_FMT_SBGGR8_1X8,
+ V4L2_PIX_FMT_SBGGR12, 12, },
{ V4L2_MBUS_FMT_SGBRG12_1X12, V4L2_MBUS_FMT_SGBRG10_1X10,
- V4L2_MBUS_FMT_SGBRG12_1X12, V4L2_PIX_FMT_SGBRG12, 12, },
+ V4L2_MBUS_FMT_SGBRG12_1X12, V4L2_MBUS_FMT_SGBRG8_1X8,
+ V4L2_PIX_FMT_SGBRG12, 12, },
{ V4L2_MBUS_FMT_SGRBG12_1X12, V4L2_MBUS_FMT_SGRBG10_1X10,
- V4L2_MBUS_FMT_SGRBG12_1X12, V4L2_PIX_FMT_SGRBG12, 12, },
+ V4L2_MBUS_FMT_SGRBG12_1X12, V4L2_MBUS_FMT_SGRBG8_1X8,
+ V4L2_PIX_FMT_SGRBG12, 12, },
{ V4L2_MBUS_FMT_SRGGB12_1X12, V4L2_MBUS_FMT_SRGGB10_1X10,
- V4L2_MBUS_FMT_SRGGB12_1X12, V4L2_PIX_FMT_SRGGB12, 12, },
+ V4L2_MBUS_FMT_SRGGB12_1X12, V4L2_MBUS_FMT_SRGGB8_1X8,
+ V4L2_PIX_FMT_SRGGB12, 12, },
{ V4L2_MBUS_FMT_UYVY8_1X16, V4L2_MBUS_FMT_UYVY8_1X16,
- V4L2_MBUS_FMT_UYVY8_1X16, V4L2_PIX_FMT_UYVY, 16, },
+ V4L2_MBUS_FMT_UYVY8_1X16, 0,
+ V4L2_PIX_FMT_UYVY, 16, },
{ V4L2_MBUS_FMT_YUYV8_1X16, V4L2_MBUS_FMT_YUYV8_1X16,
- V4L2_MBUS_FMT_YUYV8_1X16, V4L2_PIX_FMT_YUYV, 16, },
+ V4L2_MBUS_FMT_YUYV8_1X16, 0,
+ V4L2_PIX_FMT_YUYV, 16, },
};
const struct isp_format_info *
return NULL;
}
+/*
+ * Decide whether desired output pixel code can be obtained with
+ * the lane shifter by shifting the input pixel code.
+ * @in: input pixelcode to shifter
+ * @out: output pixelcode from shifter
+ * @additional_shift: # of bits the sensor's LSB is offset from CAMEXT[0]
+ *
+ * return true if the combination is possible
+ * return false otherwise
+ */
+static bool isp_video_is_shiftable(enum v4l2_mbus_pixelcode in,
+ enum v4l2_mbus_pixelcode out,
+ unsigned int additional_shift)
+{
+ const struct isp_format_info *in_info, *out_info;
+
+ if (in == out)
+ return true;
+
+ in_info = omap3isp_video_format_info(in);
+ out_info = omap3isp_video_format_info(out);
+
+ if ((in_info->flavor == 0) || (out_info->flavor == 0))
+ return false;
+
+ if (in_info->flavor != out_info->flavor)
+ return false;
+
+ return in_info->bpp - out_info->bpp + additional_shift <= 6;
+}
+
/*
* isp_video_mbus_to_pix - Convert v4l2_mbus_framefmt to v4l2_pix_format
* @video: ISP video instance
return -EPIPE;
while (1) {
+ unsigned int shifter_link;
/* Retrieve the sink format */
pad = &subdev->entity.pads[0];
if (!(pad->flags & MEDIA_PAD_FL_SINK))
return -ENOSPC;
}
+ /* If sink pad is on CCDC, the link has the lane shifter
+ * in the middle of it. */
+ shifter_link = subdev == &isp->isp_ccdc.subdev;
+
/* Retrieve the source format */
pad = media_entity_remote_source(pad);
if (pad == NULL ||
return -EPIPE;
/* Check if the two ends match */
- if (fmt_source.format.code != fmt_sink.format.code ||
- fmt_source.format.width != fmt_sink.format.width ||
+ if (fmt_source.format.width != fmt_sink.format.width ||
fmt_source.format.height != fmt_sink.format.height)
return -EPIPE;
+
+ if (shifter_link) {
+ unsigned int parallel_shift = 0;
+ if (isp->isp_ccdc.input == CCDC_INPUT_PARALLEL) {
+ struct isp_parallel_platform_data *pdata =
+ &((struct isp_v4l2_subdevs_group *)
+ subdev->host_priv)->bus.parallel;
+ parallel_shift = pdata->data_lane_shift * 2;
+ }
+ if (!isp_video_is_shiftable(fmt_source.format.code,
+ fmt_sink.format.code,
+ parallel_shift))
+ return -EPIPE;
+ } else if (fmt_source.format.code != fmt_sink.format.code)
+ return -EPIPE;
}
return 0;
* bits. Identical to @code if the format is 10 bits wide or less.
* @uncompressed: V4L2 media bus format code for the corresponding uncompressed
* format. Identical to @code if the format is not DPCM compressed.
+ * @flavor: V4L2 media bus format code for the same pixel layout but
+ * shifted to be 8 bits per pixel. =0 if format is not shiftable.
* @pixelformat: V4L2 pixel format FCC identifier
* @bpp: Bits per pixel
*/
enum v4l2_mbus_pixelcode code;
enum v4l2_mbus_pixelcode truncated;
enum v4l2_mbus_pixelcode uncompressed;
+ enum v4l2_mbus_pixelcode flavor;
u32 pixelformat;
unsigned int bpp;
};
if (ret)
return ret;
- if (vb2_is_streaming(&fimc->vid_cap.vbq) || fimc_capture_active(fimc))
+ if (vb2_is_busy(&fimc->vid_cap.vbq) || fimc_capture_active(fimc))
return -EBUSY;
frame = &ctx->d_frame;
return -EINVAL;
}
- for (i = 0; i < frame->fmt->colplanes; i++)
- frame->payload[i] = pix->plane_fmt[i].bytesperline * pix->height;
+ for (i = 0; i < frame->fmt->colplanes; i++) {
+ frame->payload[i] =
+ (pix->width * pix->height * frame->fmt->depth[i]) >> 3;
+ }
/* Output DMA frame pixel size and offsets. */
frame->f_width = pix->plane_fmt[0].bytesperline * 8
{
struct fimc_vid_cap *cap = &fimc->vid_cap;
struct fimc_vid_buffer *v_buf;
+ struct timeval *tv;
+ struct timespec ts;
if (!list_empty(&cap->active_buf_q) &&
test_bit(ST_CAPT_RUN, &fimc->state)) {
+ ktime_get_real_ts(&ts);
+
v_buf = active_queue_pop(cap);
+
+ tv = &v_buf->vb.v4l2_buf.timestamp;
+ tv->tv_sec = ts.tv_sec;
+ tv->tv_usec = ts.tv_nsec / NSEC_PER_USEC;
+ v_buf->vb.v4l2_buf.sequence = cap->frame_count++;
+
vb2_buffer_done(&v_buf->vb, VB2_BUF_STATE_DONE);
}
mutex_unlock(&ctx->fimc_dev->lock);
}
-struct vb2_ops fimc_qops = {
+static struct vb2_ops fimc_qops = {
.queue_setup = fimc_queue_setup,
.buf_prepare = fimc_buf_prepare,
.buf_queue = fimc_buf_queue,
pix->num_planes = fmt->memplanes;
pix->colorspace = V4L2_COLORSPACE_JPEG;
- for (i = 0; i < pix->num_planes; ++i) {
- int bpl = pix->plane_fmt[i].bytesperline;
- dbg("[%d] bpl: %d, depth: %d, w: %d, h: %d",
- i, bpl, fmt->depth[i], pix->width, pix->height);
+ for (i = 0; i < pix->num_planes; ++i) {
+ u32 bpl = pix->plane_fmt[i].bytesperline;
+ u32 *sizeimage = &pix->plane_fmt[i].sizeimage;
- if (!bpl || (bpl * 8 / fmt->depth[i]) > pix->width)
- bpl = (pix->width * fmt->depth[0]) >> 3;
+ if (fmt->colplanes > 1 && (bpl == 0 || bpl < pix->width))
+ bpl = pix->width; /* Planar */
- if (!pix->plane_fmt[i].sizeimage)
- pix->plane_fmt[i].sizeimage = pix->height * bpl;
+ if (fmt->colplanes == 1 && /* Packed */
+ (bpl == 0 || ((bpl * 8) / fmt->depth[i]) < pix->width))
+ bpl = (pix->width * fmt->depth[0]) / 8;
- pix->plane_fmt[i].bytesperline = bpl;
+ if (i == 0) /* Same bytesperline for each plane. */
+ mod_x = bpl;
- dbg("[%d]: bpl: %d, sizeimage: %d",
- i, pix->plane_fmt[i].bytesperline,
- pix->plane_fmt[i].sizeimage);
+ pix->plane_fmt[i].bytesperline = mod_x;
+ *sizeimage = (pix->width * pix->height * fmt->depth[i]) / 8;
}
return 0;
vq = v4l2_m2m_get_vq(ctx->m2m_ctx, f->type);
- if (vb2_is_streaming(vq)) {
+ if (vb2_is_busy(vq)) {
v4l2_err(&fimc->m2m.v4l2_dev, "queue (%d) busy\n", f->type);
return -EBUSY;
}
if (!frame->fmt)
return -EINVAL;
- for (i = 0; i < frame->fmt->colplanes; i++)
- frame->payload[i] = pix->plane_fmt[i].bytesperline * pix->height;
+ for (i = 0; i < frame->fmt->colplanes; i++) {
+ frame->payload[i] =
+ (pix->width * pix->height * frame->fmt->depth[i]) / 8;
+ }
frame->f_width = pix->plane_fmt[0].bytesperline * 8 /
frame->fmt->depth[0];
}
/* Image pixel limits, similar across several FIMC HW revisions. */
-static struct fimc_pix_limit s5p_pix_limit[3] = {
+static struct fimc_pix_limit s5p_pix_limit[4] = {
[0] = {
.scaler_en_w = 3264,
.scaler_dis_w = 8192,
.out_rot_en_w = 1280,
.out_rot_dis_w = 1920,
},
+ [3] = {
+ .scaler_en_w = 1920,
+ .scaler_dis_w = 8192,
+ .in_rot_en_h = 1366,
+ .in_rot_dis_w = 8192,
+ .out_rot_en_w = 1366,
+ .out_rot_dis_w = 1920,
+ },
};
static struct samsung_fimc_variant fimc0_variant_s5p = {
.pix_limit = &s5p_pix_limit[2],
};
-static struct samsung_fimc_variant fimc0_variant_s5pv310 = {
+static struct samsung_fimc_variant fimc0_variant_exynos4 = {
.pix_hoff = 1,
.has_inp_rot = 1,
.has_out_rot = 1,
.pix_limit = &s5p_pix_limit[1],
};
-static struct samsung_fimc_variant fimc2_variant_s5pv310 = {
+static struct samsung_fimc_variant fimc2_variant_exynos4 = {
.pix_hoff = 1,
.has_cistatus2 = 1,
.has_mainscaler_ext = 1,
.min_out_pixsize = 16,
.hor_offs_align = 1,
.out_buf_count = 32,
- .pix_limit = &s5p_pix_limit[2],
+ .pix_limit = &s5p_pix_limit[3],
};
/* S5PC100 */
};
/* S5PV310, S5PC210 */
-static struct samsung_fimc_driverdata fimc_drvdata_s5pv310 = {
+static struct samsung_fimc_driverdata fimc_drvdata_exynos4 = {
.variant = {
- [0] = &fimc0_variant_s5pv310,
- [1] = &fimc0_variant_s5pv310,
- [2] = &fimc0_variant_s5pv310,
- [3] = &fimc2_variant_s5pv310,
+ [0] = &fimc0_variant_exynos4,
+ [1] = &fimc0_variant_exynos4,
+ [2] = &fimc0_variant_exynos4,
+ [3] = &fimc2_variant_exynos4,
},
.num_entities = 4,
.lclk_frequency = 166000000UL,
.name = "s5pv210-fimc",
.driver_data = (unsigned long)&fimc_drvdata_s5pv210,
}, {
- .name = "s5pv310-fimc",
- .driver_data = (unsigned long)&fimc_drvdata_s5pv310,
+ .name = "exynos4-fimc",
+ .driver_data = (unsigned long)&fimc_drvdata_exynos4,
},
{},
};
/* Try 2560x1920, 1280x960, 640x480, 320x240 */
mf.width = 2560 >> shift;
mf.height = 1920 >> shift;
- ret = v4l2_device_call_until_err(sd->v4l2_dev, 0, video,
+ ret = v4l2_device_call_until_err(sd->v4l2_dev, (long)icd, video,
s_mbus_fmt, &mf);
if (ret < 0)
return ret;
struct v4l2_cropcap cap;
int ret;
- ret = v4l2_device_call_until_err(sd->v4l2_dev, 0, video,
+ ret = v4l2_device_call_until_err(sd->v4l2_dev, (long)icd, video,
s_mbus_fmt, mf);
if (ret < 0)
return ret;
tmp_h = min(2 * tmp_h, max_height);
mf->width = tmp_w;
mf->height = tmp_h;
- ret = v4l2_device_call_until_err(sd->v4l2_dev, 0, video,
+ ret = v4l2_device_call_until_err(sd->v4l2_dev, (long)icd, video,
s_mbus_fmt, mf);
dev_geo(dev, "Camera scaled to %ux%u\n",
mf->width, mf->height);
mf.code = xlate->code;
mf.colorspace = pix->colorspace;
- ret = v4l2_device_call_until_err(sd->v4l2_dev, 0, video, try_mbus_fmt, &mf);
+ ret = v4l2_device_call_until_err(sd->v4l2_dev, (long)icd, video, try_mbus_fmt, &mf);
if (ret < 0)
return ret;
*/
mf.width = 2560;
mf.height = 1920;
- ret = v4l2_device_call_until_err(sd->v4l2_dev, 0, video,
+ ret = v4l2_device_call_until_err(sd->v4l2_dev, (long)icd, video,
try_mbus_fmt, &mf);
if (ret < 0) {
/* Shouldn't actually happen... */
void __iomem *base;
struct platform_device *pdev;
struct sh_csi2_client_config *client;
+ unsigned long (*query_bus_param)(struct soc_camera_device *);
+ int (*set_bus_param)(struct soc_camera_device *, unsigned long);
};
static int sh_csi2_try_fmt(struct v4l2_subdev *sd,
case BUS_NOTIFY_BOUND_DRIVER:
snprintf(priv->subdev.name, V4L2_SUBDEV_NAME_SIZE, "%s%s",
dev_name(v4l2_dev->dev), ".mipi-csi");
+ priv->subdev.grp_id = (long)icd;
ret = v4l2_device_register_subdev(v4l2_dev, &priv->subdev);
dev_dbg(dev, "%s(%p): ret(register_subdev) = %d\n", __func__, priv, ret);
if (ret < 0)
priv->client = pdata->clients + i;
+ priv->set_bus_param = icd->ops->set_bus_param;
+ priv->query_bus_param = icd->ops->query_bus_param;
icd->ops->set_bus_param = sh_csi2_set_bus_param;
icd->ops->query_bus_param = sh_csi2_query_bus_param;
priv->client = NULL;
/* Driver is about to be unbound */
- icd->ops->set_bus_param = NULL;
- icd->ops->query_bus_param = NULL;
+ icd->ops->set_bus_param = priv->set_bus_param;
+ icd->ops->query_bus_param = priv->query_bus_param;
+ priv->set_bus_param = NULL;
+ priv->query_bus_param = NULL;
v4l2_device_unregister_subdev(&priv->subdev);
}
EXPORT_SYMBOL(soc_camera_apply_sensor_flags);
+#define pixfmtstr(x) (x) & 0xff, ((x) >> 8) & 0xff, ((x) >> 16) & 0xff, \
+ ((x) >> 24) & 0xff
+
+static int soc_camera_try_fmt(struct soc_camera_device *icd,
+ struct v4l2_format *f)
+{
+ struct soc_camera_host *ici = to_soc_camera_host(icd->dev.parent);
+ struct v4l2_pix_format *pix = &f->fmt.pix;
+ int ret;
+
+ dev_dbg(&icd->dev, "TRY_FMT(%c%c%c%c, %ux%u)\n",
+ pixfmtstr(pix->pixelformat), pix->width, pix->height);
+
+ pix->bytesperline = 0;
+ pix->sizeimage = 0;
+
+ ret = ici->ops->try_fmt(icd, f);
+ if (ret < 0)
+ return ret;
+
+ if (!pix->sizeimage) {
+ if (!pix->bytesperline) {
+ const struct soc_camera_format_xlate *xlate;
+
+ xlate = soc_camera_xlate_by_fourcc(icd, pix->pixelformat);
+ if (!xlate)
+ return -EINVAL;
+
+ ret = soc_mbus_bytes_per_line(pix->width,
+ xlate->host_fmt);
+ if (ret > 0)
+ pix->bytesperline = ret;
+ }
+ if (pix->bytesperline)
+ pix->sizeimage = pix->bytesperline * pix->height;
+ }
+
+ return 0;
+}
+
static int soc_camera_try_fmt_vid_cap(struct file *file, void *priv,
struct v4l2_format *f)
{
struct soc_camera_device *icd = file->private_data;
- struct soc_camera_host *ici = to_soc_camera_host(icd->dev.parent);
WARN_ON(priv != file->private_data);
return -EINVAL;
/* limit format to hardware capabilities */
- return ici->ops->try_fmt(icd, f);
+ return soc_camera_try_fmt(icd, f);
}
static int soc_camera_enum_input(struct file *file, void *priv,
icd->user_formats = NULL;
}
-#define pixfmtstr(x) (x) & 0xff, ((x) >> 8) & 0xff, ((x) >> 16) & 0xff, \
- ((x) >> 24) & 0xff
-
/* Called with .vb_lock held, or from the first open(2), see comment there */
static int soc_camera_set_fmt(struct soc_camera_device *icd,
struct v4l2_format *f)
pixfmtstr(pix->pixelformat), pix->width, pix->height);
/* We always call try_fmt() before set_fmt() or set_crop() */
- ret = ici->ops->try_fmt(icd, f);
+ ret = soc_camera_try_fmt(icd, f);
if (ret < 0)
return ret;
{
struct i2c_client *client =
to_i2c_client(to_soc_camera_control(icd));
+ struct i2c_adapter *adap = client->adapter;
dev_set_drvdata(&icd->dev, NULL);
v4l2_device_unregister_subdev(i2c_get_clientdata(client));
i2c_unregister_device(client);
- i2c_put_adapter(client->adapter);
+ i2c_put_adapter(adap);
}
#else
#define soc_camera_init_i2c(icd, icl) (-ENODEV)
}
}
+ sd = soc_camera_to_subdev(icd);
+ sd->grp_id = (long)icd;
+
/* At this point client .probe() should have run already */
ret = soc_camera_init_user_formats(icd);
if (ret < 0)
goto evidstart;
/* Try to improve our guess of a reasonable window format */
- sd = soc_camera_to_subdev(icd);
if (!v4l2_subdev_call(sd, video, g_mbus_fmt, &mf)) {
icd->user_width = mf.width;
icd->user_height = mf.height;
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- sd = kmalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
+ sd = kzalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
if (sd == NULL)
return -ENOMEM;
v4l2_i2c_subdev_init(sd, client, &tda9840_ops);
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- sd = kmalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
+ sd = kzalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
if (sd == NULL)
return -ENOMEM;
v4l2_i2c_subdev_init(sd, client, &tea6415c_ops);
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- sd = kmalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
+ sd = kzalloc(sizeof(struct v4l2_subdev), GFP_KERNEL);
if (sd == NULL)
return -ENOMEM;
v4l2_i2c_subdev_init(sd, client, &tea6420_ops);
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- state = kmalloc(sizeof(struct upd64031a_state), GFP_KERNEL);
+ state = kzalloc(sizeof(struct upd64031a_state), GFP_KERNEL);
if (state == NULL)
return -ENOMEM;
sd = &state->sd;
v4l_info(client, "chip found @ 0x%x (%s)\n",
client->addr << 1, client->adapter->name);
- state = kmalloc(sizeof(struct upd64083_state), GFP_KERNEL);
+ state = kzalloc(sizeof(struct upd64083_state), GFP_KERNEL);
if (state == NULL)
return -ENOMEM;
sd = &state->sd;
video_get(vdev);
mutex_unlock(&videodev_lock);
#if defined(CONFIG_MEDIA_CONTROLLER)
- if (vdev->v4l2_dev && vdev->v4l2_dev->mdev) {
+ if (vdev->v4l2_dev && vdev->v4l2_dev->mdev &&
+ vdev->vfl_type != VFL_TYPE_SUBDEV) {
entity = media_entity_get(&vdev->entity);
if (!entity) {
ret = -EBUSY;
/* decrease the refcount in case of an error */
if (ret) {
#if defined(CONFIG_MEDIA_CONTROLLER)
- if (vdev->v4l2_dev && vdev->v4l2_dev->mdev)
+ if (vdev->v4l2_dev && vdev->v4l2_dev->mdev &&
+ vdev->vfl_type != VFL_TYPE_SUBDEV)
media_entity_put(entity);
#endif
video_put(vdev);
mutex_unlock(vdev->lock);
}
#if defined(CONFIG_MEDIA_CONTROLLER)
- if (vdev->v4l2_dev && vdev->v4l2_dev->mdev)
+ if (vdev->v4l2_dev && vdev->v4l2_dev->mdev &&
+ vdev->vfl_type != VFL_TYPE_SUBDEV)
media_entity_put(&vdev->entity);
#endif
/* decrease the refcount unconditionally since the release()
#if defined(CONFIG_MEDIA_CONTROLLER)
/* Part 5: Register the entity. */
- if (vdev->v4l2_dev && vdev->v4l2_dev->mdev) {
+ if (vdev->v4l2_dev && vdev->v4l2_dev->mdev &&
+ vdev->vfl_type != VFL_TYPE_SUBDEV) {
vdev->entity.type = MEDIA_ENT_T_DEVNODE_V4L;
vdev->entity.name = vdev->name;
vdev->entity.v4l.major = VIDEO_MAJOR;
return;
#if defined(CONFIG_MEDIA_CONTROLLER)
- if (vdev->v4l2_dev && vdev->v4l2_dev->mdev)
+ if (vdev->v4l2_dev && vdev->v4l2_dev->mdev &&
+ vdev->vfl_type != VFL_TYPE_SUBDEV)
media_device_unregister_entity(&vdev->entity);
#endif
sd->v4l2_dev = v4l2_dev;
if (sd->internal_ops && sd->internal_ops->registered) {
err = sd->internal_ops->registered(sd);
- if (err)
+ if (err) {
+ module_put(sd->owner);
return err;
+ }
}
/* This just returns 0 if either of the two args is NULL */
if (err) {
if (sd->internal_ops && sd->internal_ops->unregistered)
sd->internal_ops->unregistered(sd);
+ module_put(sd->owner);
return err;
}
switch (cmd) {
case VIDIOC_QUERYCTRL:
- return v4l2_subdev_queryctrl(sd, arg);
+ return v4l2_queryctrl(sd->ctrl_handler, arg);
case VIDIOC_QUERYMENU:
- return v4l2_subdev_querymenu(sd, arg);
+ return v4l2_querymenu(sd->ctrl_handler, arg);
case VIDIOC_G_CTRL:
- return v4l2_subdev_g_ctrl(sd, arg);
+ return v4l2_g_ctrl(sd->ctrl_handler, arg);
case VIDIOC_S_CTRL:
- return v4l2_subdev_s_ctrl(sd, arg);
+ return v4l2_s_ctrl(sd->ctrl_handler, arg);
case VIDIOC_G_EXT_CTRLS:
- return v4l2_subdev_g_ext_ctrls(sd, arg);
+ return v4l2_g_ext_ctrls(sd->ctrl_handler, arg);
case VIDIOC_S_EXT_CTRLS:
- return v4l2_subdev_s_ext_ctrls(sd, arg);
+ return v4l2_s_ext_ctrls(sd->ctrl_handler, arg);
case VIDIOC_TRY_EXT_CTRLS:
- return v4l2_subdev_try_ext_ctrls(sd, arg);
+ return v4l2_try_ext_ctrls(sd->ctrl_handler, arg);
case VIDIOC_DQEVENT:
if (!(sd->flags & V4L2_SUBDEV_FL_HAS_EVENTS))
#define call_qop(q, op, args...) \
(((q)->ops->op) ? ((q)->ops->op(args)) : 0)
+#define V4L2_BUFFER_STATE_FLAGS (V4L2_BUF_FLAG_MAPPED | V4L2_BUF_FLAG_QUEUED | \
+ V4L2_BUF_FLAG_DONE | V4L2_BUF_FLAG_ERROR)
+
/**
* __vb2_buf_mem_alloc() - allocate video memory for the given buffer
*/
for (plane = 0; plane < vb->num_planes; ++plane) {
mem_priv = call_memop(q, plane, alloc, q->alloc_ctx[plane],
plane_sizes[plane]);
- if (!mem_priv)
+ if (IS_ERR_OR_NULL(mem_priv))
goto free;
/* Associate allocator private data with this plane */
struct vb2_queue *q = vb->vb2_queue;
int ret = 0;
- /* Copy back data such as timestamp, input, etc. */
+ /* Copy back data such as timestamp, flags, input, etc. */
memcpy(b, &vb->v4l2_buf, offsetof(struct v4l2_buffer, m));
b->input = vb->v4l2_buf.input;
b->reserved = vb->v4l2_buf.reserved;
b->m.userptr = vb->v4l2_planes[0].m.userptr;
}
- b->flags = 0;
+ /*
+ * Clear any buffer state related flags.
+ */
+ b->flags &= ~V4L2_BUFFER_STATE_FLAGS;
switch (vb->state) {
case VB2_BUF_STATE_QUEUED:
num_buffers = min_t(unsigned int, req->count, VIDEO_MAX_FRAME);
memset(plane_sizes, 0, sizeof(plane_sizes));
memset(q->alloc_ctx, 0, sizeof(q->alloc_ctx));
+ q->memory = req->memory;
/*
* Ask the driver how many buffers and planes per buffer it requires.
ret = num_buffers;
}
- q->memory = req->memory;
-
/*
* Return the number of successfully allocated buffers
* to the userspace.
vb->v4l2_buf.field = b->field;
vb->v4l2_buf.timestamp = b->timestamp;
+ vb->v4l2_buf.input = b->input;
+ vb->v4l2_buf.flags = b->flags & ~V4L2_BUFFER_STATE_FLAGS;
return 0;
}
GFP_KERNEL);
if (!buf->vaddr) {
dev_err(conf->dev, "dma_alloc_coherent of size %ld failed\n",
- buf->size);
+ size);
kfree(buf);
return ERR_PTR(-ENOMEM);
}
gd->major = I2O_MAJOR;
gd->queue = queue;
gd->fops = &i2o_block_fops;
- gd->events = DISK_EVENT_MEDIA_CHANGE;
gd->private_data = dev;
dev->gd = gd;
int iter, i;
unsigned long flags;
- data->chip->irq_ack(irq_data);
+ data->chip->irq_ack(data);
for (iter = 0 ; iter < MAX_ASIC_ISR_LOOPS; iter++) {
u32 status;
#include <linux/dma-mapping.h>
#include <linux/spinlock.h>
#include <linux/gpio.h>
-#include <linux/regulator/consumer.h>
#include <plat/usb.h>
#define USBHS_DRIVER_NAME "usbhs-omap"
dev_dbg(dev, "starting TI HSUSB Controller\n");
if (!pdata) {
dev_dbg(dev, "missing platform_data\n");
- ret = -ENODEV;
- goto end_enable;
+ return -ENODEV;
}
spin_lock_irqsave(&omap->lock, flags);
gpio_request(pdata->ehci_data->reset_gpio_port[0],
"USB1 PHY reset");
gpio_direction_output
- (pdata->ehci_data->reset_gpio_port[0], 1);
+ (pdata->ehci_data->reset_gpio_port[0], 0);
}
if (gpio_is_valid(pdata->ehci_data->reset_gpio_port[1])) {
gpio_request(pdata->ehci_data->reset_gpio_port[1],
"USB2 PHY reset");
gpio_direction_output
- (pdata->ehci_data->reset_gpio_port[1], 1);
+ (pdata->ehci_data->reset_gpio_port[1], 0);
}
/* Hold the PHY in RESET for enough time till DIR is high */
if (gpio_is_valid(pdata->ehci_data->reset_gpio_port[0]))
gpio_set_value
- (pdata->ehci_data->reset_gpio_port[0], 0);
+ (pdata->ehci_data->reset_gpio_port[0], 1);
if (gpio_is_valid(pdata->ehci_data->reset_gpio_port[1]))
gpio_set_value
- (pdata->ehci_data->reset_gpio_port[1], 0);
+ (pdata->ehci_data->reset_gpio_port[1], 1);
}
end_count:
omap->count++;
- goto end_enable;
+ spin_unlock_irqrestore(&omap->lock, flags);
+ return 0;
err_tll:
if (pdata->ehci_data->phy_reset) {
clk_disable(omap->usbhost_fs_fck);
clk_disable(omap->usbhost_hs_fck);
clk_disable(omap->usbhost_ick);
-
-end_enable:
spin_unlock_irqrestore(&omap->lock, flags);
return ret;
}
if (err)
goto out;
}
- if (tscript->flags & TWL4030_SLEEP_SCRIPT)
+ if (tscript->flags & TWL4030_SLEEP_SCRIPT) {
if (order)
pr_warning("TWL4030: Bad order of scripts (sleep "\
"script before wakeup) Leads to boot"\
"failure on some boards\n");
err = twl4030_config_sleep_sequence(address);
+ }
out:
return err;
}
type = "SD-combo";
if (mmc_card_blockaddr(card))
type = "SDHC-combo";
+ break;
default:
type = "?";
break;
return IRQ_HANDLED;
}
- if (end_command)
+ if (end_command && host->cmd)
mmc_omap_cmd_done(host, host->cmd);
if (host->data != NULL) {
if (transfer_error)
#endif
}
+static const struct of_device_id sdhci_of_match[];
static int __devinit sdhci_of_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct device_node *np = ofdev->dev.of_node;
struct sdhci_of_data *sdhci_of_data;
struct sdhci_host *host;
int size;
int ret;
- if (!ofdev->dev.of_match)
+ match = of_match_device(sdhci_of_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
- sdhci_of_data = ofdev->dev.of_match->data;
+ sdhci_of_data = match->data;
if (!of_device_is_available(np))
return -ENODEV;
host->ioaddr = pci_ioremap_bar(pdev, bar);
if (!host->ioaddr) {
dev_err(&pdev->dev, "failed to remap registers\n");
+ ret = -ENOMEM;
goto release;
}
host = (struct sdhci_host*)param;
+ /*
+ * If this tasklet gets rescheduled while running, it will
+ * be run again afterwards but without any active request.
+ */
+ if (!host->mrq)
+ return;
+
spin_lock_irqsave(&host->lock, flags);
del_timer(&host->timer);
* upon error conditions.
*/
if (!(host->flags & SDHCI_DEVICE_DEAD) &&
- (mrq->cmd->error ||
+ ((mrq->cmd && mrq->cmd->error) ||
(mrq->data && (mrq->data->error ||
(mrq->data->stop && mrq->data->stop->error))) ||
(host->quirks & SDHCI_QUIRK_RESET_AFTER_REQUEST))) {
tmio_mmc_set_clock(host, ios->clock);
/* Power sequence - OFF -> UP -> ON */
- if (ios->power_mode == MMC_POWER_OFF || !ios->clock) {
+ if (ios->power_mode == MMC_POWER_UP) {
+ /* power up SD bus */
+ if (host->set_pwr)
+ host->set_pwr(host->pdev, 1);
+ } else if (ios->power_mode == MMC_POWER_OFF || !ios->clock) {
/* power down SD bus */
if (ios->power_mode == MMC_POWER_OFF && host->set_pwr)
host->set_pwr(host->pdev, 0);
tmio_mmc_clk_stop(host);
- } else if (ios->power_mode == MMC_POWER_UP) {
- /* power up SD bus */
- if (host->set_pwr)
- host->set_pwr(host->pdev, 1);
} else {
/* start bus clock */
tmio_mmc_clk_start(host);
Support for parsing CFE image tag and creating MTD partitions on
Broadcom BCM63xx boards.
+config MTD_LANTIQ
+ tristate "Lantiq SoC NOR support"
+ depends on LANTIQ
+ select MTD_PARTITIONS
+ help
+ Support for NOR flash attached to the Lantiq SoC's External Bus Unit.
+
config MTD_DILNETPC
tristate "CFI Flash device mapped on DIL/Net PC"
depends on X86 && MTD_PARTITIONS && MTD_CFI_INTELEXT && BROKEN
obj-$(CONFIG_MTD_GPIO_ADDR) += gpio-addr-flash.o
obj-$(CONFIG_MTD_BCM963XX) += bcm963xx-flash.o
obj-$(CONFIG_MTD_LATCH_ADDR) += latch-addr-flash.o
+obj-$(CONFIG_MTD_LANTIQ) += lantiq-flash.o
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2004 Liu Peng Infineon IFAP DC COM CPE
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/mtd/mtd.h>
+#include <linux/mtd/map.h>
+#include <linux/mtd/partitions.h>
+#include <linux/mtd/cfi.h>
+#include <linux/platform_device.h>
+#include <linux/mtd/physmap.h>
+
+#include <lantiq_soc.h>
+#include <lantiq_platform.h>
+
+/*
+ * The NOR flash is connected to the same external bus unit (EBU) as PCI.
+ * To make PCI work we need to enable the endianness swapping for the address
+ * written to the EBU. This endianness swapping works for PCI correctly but
+ * fails for attached NOR devices. To workaround this we need to use a complex
+ * map. The workaround involves swapping all addresses whilst probing the chip.
+ * Once probing is complete we stop swapping the addresses but swizzle the
+ * unlock addresses to ensure that access to the NOR device works correctly.
+ */
+
+enum {
+ LTQ_NOR_PROBING,
+ LTQ_NOR_NORMAL
+};
+
+struct ltq_mtd {
+ struct resource *res;
+ struct mtd_info *mtd;
+ struct map_info *map;
+};
+
+static char ltq_map_name[] = "ltq_nor";
+
+static map_word
+ltq_read16(struct map_info *map, unsigned long adr)
+{
+ unsigned long flags;
+ map_word temp;
+
+ if (map->map_priv_1 == LTQ_NOR_PROBING)
+ adr ^= 2;
+ spin_lock_irqsave(&ebu_lock, flags);
+ temp.x[0] = *(u16 *)(map->virt + adr);
+ spin_unlock_irqrestore(&ebu_lock, flags);
+ return temp;
+}
+
+static void
+ltq_write16(struct map_info *map, map_word d, unsigned long adr)
+{
+ unsigned long flags;
+
+ if (map->map_priv_1 == LTQ_NOR_PROBING)
+ adr ^= 2;
+ spin_lock_irqsave(&ebu_lock, flags);
+ *(u16 *)(map->virt + adr) = d.x[0];
+ spin_unlock_irqrestore(&ebu_lock, flags);
+}
+
+/*
+ * The following 2 functions copy data between iomem and a cached memory
+ * section. As memcpy() makes use of pre-fetching we cannot use it here.
+ * The normal alternative of using memcpy_{to,from}io also makes use of
+ * memcpy() on MIPS so it is not applicable either. We are therefore stuck
+ * with having to use our own loop.
+ */
+static void
+ltq_copy_from(struct map_info *map, void *to,
+ unsigned long from, ssize_t len)
+{
+ unsigned char *f = (unsigned char *)map->virt + from;
+ unsigned char *t = (unsigned char *)to;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ebu_lock, flags);
+ while (len--)
+ *t++ = *f++;
+ spin_unlock_irqrestore(&ebu_lock, flags);
+}
+
+static void
+ltq_copy_to(struct map_info *map, unsigned long to,
+ const void *from, ssize_t len)
+{
+ unsigned char *f = (unsigned char *)from;
+ unsigned char *t = (unsigned char *)map->virt + to;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ebu_lock, flags);
+ while (len--)
+ *t++ = *f++;
+ spin_unlock_irqrestore(&ebu_lock, flags);
+}
+
+static const char const *part_probe_types[] = { "cmdlinepart", NULL };
+
+static int __init
+ltq_mtd_probe(struct platform_device *pdev)
+{
+ struct physmap_flash_data *ltq_mtd_data = dev_get_platdata(&pdev->dev);
+ struct ltq_mtd *ltq_mtd;
+ struct mtd_partition *parts;
+ struct resource *res;
+ int nr_parts = 0;
+ struct cfi_private *cfi;
+ int err;
+
+ ltq_mtd = kzalloc(sizeof(struct ltq_mtd), GFP_KERNEL);
+ platform_set_drvdata(pdev, ltq_mtd);
+
+ ltq_mtd->res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!ltq_mtd->res) {
+ dev_err(&pdev->dev, "failed to get memory resource");
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ res = devm_request_mem_region(&pdev->dev, ltq_mtd->res->start,
+ resource_size(ltq_mtd->res), dev_name(&pdev->dev));
+ if (!ltq_mtd->res) {
+ dev_err(&pdev->dev, "failed to request mem resource");
+ err = -EBUSY;
+ goto err_out;
+ }
+
+ ltq_mtd->map = kzalloc(sizeof(struct map_info), GFP_KERNEL);
+ ltq_mtd->map->phys = res->start;
+ ltq_mtd->map->size = resource_size(res);
+ ltq_mtd->map->virt = devm_ioremap_nocache(&pdev->dev,
+ ltq_mtd->map->phys, ltq_mtd->map->size);
+ if (!ltq_mtd->map->virt) {
+ dev_err(&pdev->dev, "failed to ioremap!\n");
+ err = -ENOMEM;
+ goto err_free;
+ }
+
+ ltq_mtd->map->name = ltq_map_name;
+ ltq_mtd->map->bankwidth = 2;
+ ltq_mtd->map->read = ltq_read16;
+ ltq_mtd->map->write = ltq_write16;
+ ltq_mtd->map->copy_from = ltq_copy_from;
+ ltq_mtd->map->copy_to = ltq_copy_to;
+
+ ltq_mtd->map->map_priv_1 = LTQ_NOR_PROBING;
+ ltq_mtd->mtd = do_map_probe("cfi_probe", ltq_mtd->map);
+ ltq_mtd->map->map_priv_1 = LTQ_NOR_NORMAL;
+
+ if (!ltq_mtd->mtd) {
+ dev_err(&pdev->dev, "probing failed\n");
+ err = -ENXIO;
+ goto err_unmap;
+ }
+
+ ltq_mtd->mtd->owner = THIS_MODULE;
+
+ cfi = ltq_mtd->map->fldrv_priv;
+ cfi->addr_unlock1 ^= 1;
+ cfi->addr_unlock2 ^= 1;
+
+ nr_parts = parse_mtd_partitions(ltq_mtd->mtd,
+ part_probe_types, &parts, 0);
+ if (nr_parts > 0) {
+ dev_info(&pdev->dev,
+ "using %d partitions from cmdline", nr_parts);
+ } else {
+ nr_parts = ltq_mtd_data->nr_parts;
+ parts = ltq_mtd_data->parts;
+ }
+
+ err = add_mtd_partitions(ltq_mtd->mtd, parts, nr_parts);
+ if (err) {
+ dev_err(&pdev->dev, "failed to add partitions\n");
+ goto err_destroy;
+ }
+
+ return 0;
+
+err_destroy:
+ map_destroy(ltq_mtd->mtd);
+err_unmap:
+ iounmap(ltq_mtd->map->virt);
+err_free:
+ kfree(ltq_mtd->map);
+err_out:
+ kfree(ltq_mtd);
+ return err;
+}
+
+static int __devexit
+ltq_mtd_remove(struct platform_device *pdev)
+{
+ struct ltq_mtd *ltq_mtd = platform_get_drvdata(pdev);
+
+ if (ltq_mtd) {
+ if (ltq_mtd->mtd) {
+ del_mtd_partitions(ltq_mtd->mtd);
+ map_destroy(ltq_mtd->mtd);
+ }
+ if (ltq_mtd->map->virt)
+ iounmap(ltq_mtd->map->virt);
+ kfree(ltq_mtd->map);
+ kfree(ltq_mtd);
+ }
+ return 0;
+}
+
+static struct platform_driver ltq_mtd_driver = {
+ .remove = __devexit_p(ltq_mtd_remove),
+ .driver = {
+ .name = "ltq_nor",
+ .owner = THIS_MODULE,
+ },
+};
+
+static int __init
+init_ltq_mtd(void)
+{
+ int ret = platform_driver_probe(<q_mtd_driver, ltq_mtd_probe);
+
+ if (ret)
+ pr_err("ltq_nor: error registering platform driver");
+ return ret;
+}
+
+static void __exit
+exit_ltq_mtd(void)
+{
+ platform_driver_unregister(<q_mtd_driver);
+}
+
+module_init(init_ltq_mtd);
+module_exit(exit_ltq_mtd);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("John Crispin <blogic@openwrt.org>");
+MODULE_DESCRIPTION("Lantiq SoC NOR");
}
#endif
+static struct of_device_id of_flash_match[];
static int __devinit of_flash_probe(struct platform_device *dev)
{
#ifdef CONFIG_MTD_PARTITIONS
const char **part_probe_types;
#endif
+ const struct of_device_id *match;
struct device_node *dp = dev->dev.of_node;
struct resource res;
struct of_flash *info;
struct mtd_info **mtd_list = NULL;
resource_size_t res_size;
- if (!dev->dev.of_match)
+ match = of_match_device(of_flash_match, &dev->dev);
+ if (!match)
return -EINVAL;
- probe_type = dev->dev.of_match->data;
+ probe_type = match->data;
reg_tuple_size = (of_n_addr_cells(dp) + of_n_size_cells(dp)) * sizeof(u32);
*/
#include <linux/slab.h>
+#include <linux/gpio.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#ifdef CONFIG_MIPS_PB1550
/* set gpio206 high */
- au_writel(au_readl(GPIO2_DIR) & ~(1 << 6), GPIO2_DIR);
+ gpio_direction_input(206);
boot_swapboot = (au_readl(MEM_STSTAT) & (0x7 << 1)) | ((bcsr_read(BCSR_STATUS) >> 6) & 0x1);
doc200x_hwcontrol(mtd, 0, NAND_CTRL_ALE | NAND_CTRL_CHANGE);
doc200x_hwcontrol(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
- /* We can't' use dev_ready here, but at least we wait for the
+ /* We can't use dev_ready here, but at least we wait for the
* command to complete
*/
udelay(50);
from Faraday. It is used on Faraday A320, Andes AG101 and some
other ARM/NDS32 SoC's.
+config LANTIQ_ETOP
+ tristate "Lantiq SoC ETOP driver"
+ depends on SOC_TYPE_XWAY
+ help
+ Support for the MII0 inside the Lantiq SoC
+
+
source "drivers/net/fs_enet/Kconfig"
source "drivers/net/octeon/Kconfig"
source "drivers/net/stmmac/Kconfig"
config PCH_GBE
- tristate "PCH Gigabit Ethernet"
+ tristate "Intel EG20T PCH / OKI SEMICONDUCTOR ML7223 IOH GbE"
depends on PCI
select MII
---help---
to Gigabit Ethernet.
This driver enables Gigabit Ethernet function.
+ This driver also can be used for OKI SEMICONDUCTOR IOH(Input/
+ Output Hub), ML7223.
+ ML7223 IOH is for MP(Media Phone) use.
+ ML7223 is companion chip for Intel Atom E6xx series.
+ ML7223 is completely compatible for Intel EG20T PCH.
+
endif # NETDEV_1000
#
obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
obj-$(CONFIG_B44) += b44.o
obj-$(CONFIG_FORCEDETH) += forcedeth.o
-obj-$(CONFIG_NE_H8300) += ne-h8300.o 8390.o
+obj-$(CONFIG_NE_H8300) += ne-h8300.o
obj-$(CONFIG_AX88796) += ax88796.o
obj-$(CONFIG_BCM63XX_ENET) += bcm63xx_enet.o
obj-$(CONFIG_FTMAC100) += ftmac100.o
obj-$(CONFIG_LP486E) += lp486e.o
obj-$(CONFIG_ETH16I) += eth16i.o
-obj-$(CONFIG_ZORRO8390) += zorro8390.o 8390.o
+obj-$(CONFIG_ZORRO8390) += zorro8390.o
obj-$(CONFIG_HPLANCE) += hplance.o 7990.o
obj-$(CONFIG_MVME147_NET) += mvme147.o 7990.o
obj-$(CONFIG_EQUALIZER) += eql.o
obj-$(CONFIG_DECLANCE) += declance.o
obj-$(CONFIG_ATARILANCE) += atarilance.o
obj-$(CONFIG_A2065) += a2065.o
-obj-$(CONFIG_HYDRA) += hydra.o 8390.o
+obj-$(CONFIG_HYDRA) += hydra.o
obj-$(CONFIG_ARIADNE) += ariadne.o
obj-$(CONFIG_CS89x0) += cs89x0.o
obj-$(CONFIG_MACSONIC) += macsonic.o
obj-$(CONFIG_ENC28J60) += enc28j60.o
obj-$(CONFIG_ETHOC) += ethoc.o
obj-$(CONFIG_GRETH) += greth.o
+obj-$(CONFIG_LANTIQ_ETOP) += lantiq_etop.o
obj-$(CONFIG_XTENSA_XT2000_SONIC) += xtsonic.o
MODULE_LICENSE("GPL");
MODULE_DEVICE_TABLE(pci, amd8111e_pci_tbl);
module_param_array(speed_duplex, int, NULL, 0);
-MODULE_PARM_DESC(speed_duplex, "Set device speed and duplex modes, 0: Auto Negotitate, 1: 10Mbps Half Duplex, 2: 10Mbps Full Duplex, 3: 100Mbps Half Duplex, 4: 100Mbps Full Duplex");
+MODULE_PARM_DESC(speed_duplex, "Set device speed and duplex modes, 0: Auto Negotiate, 1: 10Mbps Half Duplex, 2: 10Mbps Full Duplex, 3: 100Mbps Half Duplex, 4: 100Mbps Full Duplex");
module_param_array(coalesce, bool, NULL, 0);
MODULE_PARM_DESC(coalesce, "Enable or Disable interrupt coalescing, 1: Enable, 0: Disable");
module_param_array(dynamic_ipg, bool, NULL, 0);
* Read the ethernet address string from the on board rom.
* This is an ascii string...
*/
-static int __init etherh_addr(char *addr, struct expansion_card *ec)
+static int __devinit etherh_addr(char *addr, struct expansion_card *ec)
{
struct in_chunk_dir cd;
char *s;
static u32 etherh_regoffsets[16];
static u32 etherm_regoffsets[16];
-static int __init
+static int __devinit
etherh_probe(struct expansion_card *ec, const struct ecard_id *id)
{
const struct etherh_data *data = id->data;
memaddr == (unsigned short *)0xffe00000) {
/* PAMs card and Riebl on ST use level 5 autovector */
if (request_irq(IRQ_AUTO_5, lance_interrupt, IRQ_TYPE_PRIO,
- "PAM/Riebl-ST Ethernet", dev)) {
+ "PAM,Riebl-ST Ethernet", dev)) {
printk( "Lance: request for irq %d failed\n", IRQ_AUTO_5 );
return 0;
}
#define __AT_TESTING 0x0001
#define __AT_RESETTING 0x0002
#define __AT_DOWN 0x0003
- u8 work_event;
-#define ATL1C_WORK_EVENT_RESET 0x01
-#define ATL1C_WORK_EVENT_LINK_CHANGE 0x02
+ unsigned long work_event;
+#define ATL1C_WORK_EVENT_RESET 0
+#define ATL1C_WORK_EVENT_LINK_CHANGE 1
u32 msg_enable;
bool have_msi;
}
}
- adapter->work_event |= ATL1C_WORK_EVENT_LINK_CHANGE;
+ set_bit(ATL1C_WORK_EVENT_LINK_CHANGE, &adapter->work_event);
schedule_work(&adapter->common_task);
}
adapter = container_of(work, struct atl1c_adapter, common_task);
netdev = adapter->netdev;
- if (adapter->work_event & ATL1C_WORK_EVENT_RESET) {
- adapter->work_event &= ~ATL1C_WORK_EVENT_RESET;
+ if (test_and_clear_bit(ATL1C_WORK_EVENT_RESET, &adapter->work_event)) {
netif_device_detach(netdev);
atl1c_down(adapter);
atl1c_up(adapter);
netif_device_attach(netdev);
- return;
}
- if (adapter->work_event & ATL1C_WORK_EVENT_LINK_CHANGE) {
- adapter->work_event &= ~ATL1C_WORK_EVENT_LINK_CHANGE;
+ if (test_and_clear_bit(ATL1C_WORK_EVENT_LINK_CHANGE,
+ &adapter->work_event))
atl1c_check_link_status(adapter);
- }
- return;
}
struct atl1c_adapter *adapter = netdev_priv(netdev);
/* Do the reset outside of interrupt context */
- adapter->work_event |= ATL1C_WORK_EVENT_RESET;
+ set_bit(ATL1C_WORK_EVENT_RESET, &adapter->work_event);
schedule_work(&adapter->common_task);
}
struct be_rx_compl_info {
u32 rss_hash;
- u16 vid;
+ u16 vlan_tag;
u16 pkt_size;
u16 rxq_idx;
u16 mac_id;
struct be_async_event_grp5_pvid_state *evt)
{
if (evt->enabled)
- adapter->pvid = evt->tag;
+ adapter->pvid = le16_to_cpu(evt->tag);
else
adapter->pvid = 0;
}
kfree_skb(skb);
return;
}
- vlan_hwaccel_receive_skb(skb, adapter->vlan_grp, rxcp->vid);
+ vlan_hwaccel_receive_skb(skb, adapter->vlan_grp,
+ rxcp->vlan_tag);
} else {
netif_receive_skb(skb);
}
if (likely(!rxcp->vlanf))
napi_gro_frags(&eq_obj->napi);
else
- vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp, rxcp->vid);
+ vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp,
+ rxcp->vlan_tag);
}
static void be_parse_rx_compl_v1(struct be_adapter *adapter,
rxcp->pkt_type =
AMAP_GET_BITS(struct amap_eth_rx_compl_v1, cast_enc, compl);
rxcp->vtm = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vtm, compl);
- rxcp->vid = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vlan_tag, compl);
+ rxcp->vlan_tag = AMAP_GET_BITS(struct amap_eth_rx_compl_v1, vlan_tag,
+ compl);
}
static void be_parse_rx_compl_v0(struct be_adapter *adapter,
rxcp->pkt_type =
AMAP_GET_BITS(struct amap_eth_rx_compl_v0, cast_enc, compl);
rxcp->vtm = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vtm, compl);
- rxcp->vid = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vlan_tag, compl);
+ rxcp->vlan_tag = AMAP_GET_BITS(struct amap_eth_rx_compl_v0, vlan_tag,
+ compl);
}
static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
rxcp->vlanf = 0;
if (!lancer_chip(adapter))
- rxcp->vid = swab16(rxcp->vid);
+ rxcp->vlan_tag = swab16(rxcp->vlan_tag);
- if ((adapter->pvid == rxcp->vid) && !adapter->vlan_tag[rxcp->vid])
+ if (((adapter->pvid & VLAN_VID_MASK) ==
+ (rxcp->vlan_tag & VLAN_VID_MASK)) &&
+ !adapter->vlan_tag[rxcp->vlan_tag])
rxcp->vlanf = 0;
/* As the compl has been parsed, reset it; we wont touch it again */
be_detect_dump_ue(adapter);
reschedule:
+ adapter->work_counter++;
schedule_delayed_work(&adapter->work, msecs_to_jiffies(1000));
}
unregister_netdev(dev);
+ del_timer_sync(&bp->timer);
+
if (bp->mips_firmware)
release_firmware(bp->mips_firmware);
if (bp->rv2p_firmware)
static inline u8 bnx2x_set_pbd_csum_e2(struct bnx2x *bp, struct sk_buff *skb,
u32 *parsing_data, u32 xmit_type)
{
- *parsing_data |= ((tcp_hdrlen(skb)/4) <<
- ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW_SHIFT) &
- ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW;
+ *parsing_data |=
+ ((((u8 *)skb_transport_header(skb) - skb->data) >> 1) <<
+ ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W_SHIFT) &
+ ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W;
- *parsing_data |= ((((u8 *)tcp_hdr(skb) - skb->data) / 2) <<
- ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W_SHIFT) &
- ETH_TX_PARSE_BD_E2_TCP_HDR_START_OFFSET_W;
+ if (xmit_type & XMIT_CSUM_TCP) {
+ *parsing_data |= ((tcp_hdrlen(skb) / 4) <<
+ ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW_SHIFT) &
+ ETH_TX_PARSE_BD_E2_TCP_HDR_LENGTH_DW;
- return skb_transport_header(skb) + tcp_hdrlen(skb) - skb->data;
+ return skb_transport_header(skb) + tcp_hdrlen(skb) - skb->data;
+ } else
+ /* We support checksum offload for TCP and UDP only.
+ * No need to pass the UDP header length - it's a constant.
+ */
+ return skb_transport_header(skb) +
+ sizeof(struct udphdr) - skb->data;
}
/**
struct eth_tx_parse_bd_e1x *pbd,
u32 xmit_type)
{
- u8 hlen = (skb_network_header(skb) - skb->data) / 2;
+ u8 hlen = (skb_network_header(skb) - skb->data) >> 1;
/* for now NS flag is not used in Linux */
pbd->global_data =
ETH_TX_PARSE_BD_E1X_LLC_SNAP_EN_SHIFT));
pbd->ip_hlen_w = (skb_transport_header(skb) -
- skb_network_header(skb)) / 2;
+ skb_network_header(skb)) >> 1;
- hlen += pbd->ip_hlen_w + tcp_hdrlen(skb) / 2;
+ hlen += pbd->ip_hlen_w;
+
+ /* We support checksum offload for TCP and UDP only */
+ if (xmit_type & XMIT_CSUM_TCP)
+ hlen += tcp_hdrlen(skb) / 2;
+ else
+ hlen += sizeof(struct udphdr) / 2;
pbd->total_hlen_w = cpu_to_le16(hlen);
hlen = hlen*2;
static int agg_device_up(const struct aggregator *agg)
{
- return (netif_running(agg->slave->dev) &&
- netif_carrier_ok(agg->slave->dev));
+ struct port *port = agg->lag_ports;
+ if (!port)
+ return 0;
+ return (netif_running(port->slave->dev) &&
+ netif_carrier_ok(port->slave->dev));
}
/**
typedef struct mac_addr {
u8 mac_addr_value[ETH_ALEN];
-} mac_addr_t;
+} __packed mac_addr_t;
enum {
BOND_AD_STABLE = 0,
u8 tlv_type_terminator; // = terminator
u8 terminator_length; // = 0
u8 reserved_50[50]; // = 0
-} lacpdu_t;
+} __packed lacpdu_t;
typedef struct lacpdu_header {
struct ethhdr hdr;
struct lacpdu lacpdu;
-} lacpdu_header_t;
+} __packed lacpdu_header_t;
// Marker Protocol Data Unit(PDU) structure(43.5.3.2 in the 802.3ad standard)
typedef struct bond_marker {
u8 tlv_type_terminator; // = 0x00
u8 terminator_length; // = 0x00
u8 reserved_90[90]; // = 0
-} bond_marker_t;
+} __packed bond_marker_t;
typedef struct bond_marker_header {
struct ethhdr hdr;
struct bond_marker marker;
-} bond_marker_header_t;
+} __packed bond_marker_header_t;
#pragma pack()
}
#endif /* CONFIG_PPC_MPC512x */
+static struct of_device_id mpc5xxx_can_table[];
static int __devinit mpc5xxx_can_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct mpc5xxx_can_data *data;
struct device_node *np = ofdev->dev.of_node;
struct net_device *dev;
int irq, mscan_clksrc = 0;
int err = -ENOMEM;
- if (!ofdev->dev.of_match)
+ match = of_match_device(mpc5xxx_can_table, &ofdev->dev);
+ if (!match)
return -EINVAL;
- data = (struct mpc5xxx_can_data *)ofdev->dev.of_match->data;
+ data = match->data;
base = of_iomap(np, 0);
if (!base) {
| (priv->read_reg(priv, REG_ID2) >> 5);
}
+ cf->can_dlc = get_can_dlc(fi & 0x0F);
if (fi & FI_RTR) {
id |= CAN_RTR_FLAG;
} else {
- cf->can_dlc = get_can_dlc(fi & 0x0F);
for (i = 0; i < cf->can_dlc; i++)
cf->data[i] = priv->read_reg(priv, dreg++);
}
/* Done. We have linked the TTY line to a channel. */
rtnl_unlock();
tty->receive_room = 65536; /* We don't flow control */
- return sl->dev->base_addr;
+
+ /* TTY layer expects 0 on success */
+ return 0;
err_free_chan:
sl->tty = NULL;
cmd->duplex = -1;
}
- cmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_1000baseT_Full
- | SUPPORTED_100baseT_Full | SUPPORTED_100baseT_Half
- | SUPPORTED_10baseT_Full | SUPPORTED_10baseT_Half
- | SUPPORTED_Autoneg | SUPPORTED_FIBRE);
-
- cmd->advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_Autoneg
- | ADVERTISED_FIBRE);
+ if (cmd->speed == SPEED_10000) {
+ cmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
+ cmd->advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_FIBRE);
+ cmd->port = PORT_FIBRE;
+ } else {
+ cmd->supported = (SUPPORTED_1000baseT_Full | SUPPORTED_100baseT_Full
+ | SUPPORTED_100baseT_Half | SUPPORTED_10baseT_Full
+ | SUPPORTED_10baseT_Half | SUPPORTED_Autoneg
+ | SUPPORTED_TP);
+ cmd->advertising = (ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg
+ | ADVERTISED_TP);
+ cmd->port = PORT_TP;
+ }
- cmd->port = PORT_FIBRE;
cmd->autoneg = port->autoneg == 1 ? AUTONEG_ENABLE : AUTONEG_DISABLE;
return 0;
netif_start_queue(dev);
}
- init_waitqueue_head(&port->swqe_avail_wq);
- init_waitqueue_head(&port->restart_wq);
-
mutex_unlock(&port->port_lock);
return ret;
if (dev->flags & IFF_UP) {
mutex_lock(&port->port_lock);
- port_napi_enable(port);
ret = ehea_restart_qps(dev);
- check_sqs(port);
- if (!ret)
+ if (!ret) {
+ check_sqs(port);
+ port_napi_enable(port);
netif_wake_queue(dev);
+ } else {
+ netdev_err(dev, "Unable to restart QPS\n");
+ }
mutex_unlock(&port->port_lock);
}
}
INIT_WORK(&port->reset_task, ehea_reset_port);
+ init_waitqueue_head(&port->swqe_avail_wq);
+ init_waitqueue_head(&port->restart_wq);
+
ret = register_netdev(dev);
if (ret) {
pr_err("register_netdev failed. ret=%d\n", ret);
#endif
};
+static struct of_device_id fs_enet_match[];
static int __devinit fs_enet_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct net_device *ndev;
struct fs_enet_private *fep;
struct fs_platform_info *fpi;
const u8 *mac_addr;
int privsize, len, ret = -ENODEV;
- if (!ofdev->dev.of_match)
+ match = of_match_device(fs_enet_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
fpi = kzalloc(sizeof(*fpi), GFP_KERNEL);
if (!fpi)
return -ENOMEM;
- if (!IS_FEC(ofdev->dev.of_match)) {
+ if (!IS_FEC(match)) {
data = of_get_property(ofdev->dev.of_node, "fsl,cpm-command", &len);
if (!data || len != 4)
goto out_free_fpi;
fep->dev = &ofdev->dev;
fep->ndev = ndev;
fep->fpi = fpi;
- fep->ops = ofdev->dev.of_match->data;
+ fep->ops = match->data;
ret = fep->ops->setup_data(ndev);
if (ret)
}
FC(fecp, r_cntrl, FEC_RCNTRL_PROM);
- FW(fecp, hash_table_high, fep->fec.hthi);
- FW(fecp, hash_table_low, fep->fec.htlo);
+ FW(fecp, grp_hash_table_high, fep->fec.hthi);
+ FW(fecp, grp_hash_table_low, fep->fec.htlo);
}
static void set_multicast_list(struct net_device *dev)
/*
* Reset all multicast.
*/
- FW(fecp, hash_table_high, fep->fec.hthi);
- FW(fecp, hash_table_low, fep->fec.htlo);
+ FW(fecp, grp_hash_table_high, fep->fec.hthi);
+ FW(fecp, grp_hash_table_low, fep->fec.htlo);
/*
* Set maximum receive buffer size.
return 0;
}
+static struct of_device_id fs_enet_mdio_fec_match[];
static int __devinit fs_enet_mdio_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct resource res;
struct mii_bus *new_bus;
struct fec_info *fec;
int (*get_bus_freq)(struct device_node *);
int ret = -ENOMEM, clock, speed;
- if (!ofdev->dev.of_match)
+ match = of_match_device(fs_enet_mdio_fec_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
- get_bus_freq = ofdev->dev.of_match->data;
+ get_bus_freq = match->data;
new_bus = mdiobus_alloc();
if (!new_bus)
* that hardware reset completed (what the f*ck).
* We still need to wait for a while.
*/
- usleep_range(500, 1000);
+ udelay(500);
return 0;
}
- usleep_range(1000, 10000);
+ udelay(1000);
}
netdev_err(netdev, "software reset failed\n");
if ((phycr & FTMAC100_PHYCR_MIIRD) == 0)
return phycr & FTMAC100_PHYCR_MIIRDATA;
- usleep_range(100, 1000);
+ udelay(100);
}
netdev_err(netdev, "mdio read timed out\n");
if ((phycr & FTMAC100_PHYCR_MIIWR) == 0)
return;
- usleep_range(100, 1000);
+ udelay(100);
}
netdev_err(netdev, "mdio write timed out\n");
.ndo_open = hydra_open,
.ndo_stop = hydra_close,
- .ndo_start_xmit = ei_start_xmit,
- .ndo_tx_timeout = ei_tx_timeout,
- .ndo_get_stats = ei_get_stats,
- .ndo_set_multicast_list = ei_set_multicast_list,
+ .ndo_start_xmit = __ei_start_xmit,
+ .ndo_tx_timeout = __ei_tx_timeout,
+ .ndo_get_stats = __ei_get_stats,
+ .ndo_set_multicast_list = __ei_set_multicast_list,
.ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = eth_mac_addr,
+ .ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = eth_change_mtu,
#ifdef CONFIG_NET_POLL_CONTROLLER
- .ndo_poll_controller = ei_poll,
+ .ndo_poll_controller = __ei_poll,
#endif
};
0x10, 0x12, 0x14, 0x16, 0x18, 0x1a, 0x1c, 0x1e,
};
- dev = alloc_ei_netdev();
+ dev = ____alloc_ei_netdev(0);
if (!dev)
return -ENOMEM;
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2011 John Crispin <blogic@openwrt.org>
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/uaccess.h>
+#include <linux/in.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/phy.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <linux/skbuff.h>
+#include <linux/mm.h>
+#include <linux/platform_device.h>
+#include <linux/ethtool.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+
+#include <asm/checksum.h>
+
+#include <lantiq_soc.h>
+#include <xway_dma.h>
+#include <lantiq_platform.h>
+
+#define LTQ_ETOP_MDIO 0x11804
+#define MDIO_REQUEST 0x80000000
+#define MDIO_READ 0x40000000
+#define MDIO_ADDR_MASK 0x1f
+#define MDIO_ADDR_OFFSET 0x15
+#define MDIO_REG_MASK 0x1f
+#define MDIO_REG_OFFSET 0x10
+#define MDIO_VAL_MASK 0xffff
+
+#define PPE32_CGEN 0x800
+#define LQ_PPE32_ENET_MAC_CFG 0x1840
+
+#define LTQ_ETOP_ENETS0 0x11850
+#define LTQ_ETOP_MAC_DA0 0x1186C
+#define LTQ_ETOP_MAC_DA1 0x11870
+#define LTQ_ETOP_CFG 0x16020
+#define LTQ_ETOP_IGPLEN 0x16080
+
+#define MAX_DMA_CHAN 0x8
+#define MAX_DMA_CRC_LEN 0x4
+#define MAX_DMA_DATA_LEN 0x600
+
+#define ETOP_FTCU BIT(28)
+#define ETOP_MII_MASK 0xf
+#define ETOP_MII_NORMAL 0xd
+#define ETOP_MII_REVERSE 0xe
+#define ETOP_PLEN_UNDER 0x40
+#define ETOP_CGEN 0x800
+
+/* use 2 static channels for TX/RX */
+#define LTQ_ETOP_TX_CHANNEL 1
+#define LTQ_ETOP_RX_CHANNEL 6
+#define IS_TX(x) (x == LTQ_ETOP_TX_CHANNEL)
+#define IS_RX(x) (x == LTQ_ETOP_RX_CHANNEL)
+
+#define ltq_etop_r32(x) ltq_r32(ltq_etop_membase + (x))
+#define ltq_etop_w32(x, y) ltq_w32(x, ltq_etop_membase + (y))
+#define ltq_etop_w32_mask(x, y, z) \
+ ltq_w32_mask(x, y, ltq_etop_membase + (z))
+
+#define DRV_VERSION "1.0"
+
+static void __iomem *ltq_etop_membase;
+
+struct ltq_etop_chan {
+ int idx;
+ int tx_free;
+ struct net_device *netdev;
+ struct napi_struct napi;
+ struct ltq_dma_channel dma;
+ struct sk_buff *skb[LTQ_DESC_NUM];
+};
+
+struct ltq_etop_priv {
+ struct net_device *netdev;
+ struct ltq_eth_data *pldata;
+ struct resource *res;
+
+ struct mii_bus *mii_bus;
+ struct phy_device *phydev;
+
+ struct ltq_etop_chan ch[MAX_DMA_CHAN];
+ int tx_free[MAX_DMA_CHAN >> 1];
+
+ spinlock_t lock;
+};
+
+static int
+ltq_etop_alloc_skb(struct ltq_etop_chan *ch)
+{
+ ch->skb[ch->dma.desc] = dev_alloc_skb(MAX_DMA_DATA_LEN);
+ if (!ch->skb[ch->dma.desc])
+ return -ENOMEM;
+ ch->dma.desc_base[ch->dma.desc].addr = dma_map_single(NULL,
+ ch->skb[ch->dma.desc]->data, MAX_DMA_DATA_LEN,
+ DMA_FROM_DEVICE);
+ ch->dma.desc_base[ch->dma.desc].addr =
+ CPHYSADDR(ch->skb[ch->dma.desc]->data);
+ ch->dma.desc_base[ch->dma.desc].ctl =
+ LTQ_DMA_OWN | LTQ_DMA_RX_OFFSET(NET_IP_ALIGN) |
+ MAX_DMA_DATA_LEN;
+ skb_reserve(ch->skb[ch->dma.desc], NET_IP_ALIGN);
+ return 0;
+}
+
+static void
+ltq_etop_hw_receive(struct ltq_etop_chan *ch)
+{
+ struct ltq_etop_priv *priv = netdev_priv(ch->netdev);
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+ struct sk_buff *skb = ch->skb[ch->dma.desc];
+ int len = (desc->ctl & LTQ_DMA_SIZE_MASK) - MAX_DMA_CRC_LEN;
+ unsigned long flags;
+
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ltq_etop_alloc_skb(ch)) {
+ netdev_err(ch->netdev,
+ "failed to allocate new rx buffer, stopping DMA\n");
+ ltq_dma_close(&ch->dma);
+ }
+ ch->dma.desc++;
+ ch->dma.desc %= LTQ_DESC_NUM;
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+ skb_put(skb, len);
+ skb->dev = ch->netdev;
+ skb->protocol = eth_type_trans(skb, ch->netdev);
+ netif_receive_skb(skb);
+}
+
+static int
+ltq_etop_poll_rx(struct napi_struct *napi, int budget)
+{
+ struct ltq_etop_chan *ch = container_of(napi,
+ struct ltq_etop_chan, napi);
+ int rx = 0;
+ int complete = 0;
+
+ while ((rx < budget) && !complete) {
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) == LTQ_DMA_C) {
+ ltq_etop_hw_receive(ch);
+ rx++;
+ } else {
+ complete = 1;
+ }
+ }
+ if (complete || !rx) {
+ napi_complete(&ch->napi);
+ ltq_dma_ack_irq(&ch->dma);
+ }
+ return rx;
+}
+
+static int
+ltq_etop_poll_tx(struct napi_struct *napi, int budget)
+{
+ struct ltq_etop_chan *ch =
+ container_of(napi, struct ltq_etop_chan, napi);
+ struct ltq_etop_priv *priv = netdev_priv(ch->netdev);
+ struct netdev_queue *txq =
+ netdev_get_tx_queue(ch->netdev, ch->idx >> 1);
+ unsigned long flags;
+
+ spin_lock_irqsave(&priv->lock, flags);
+ while ((ch->dma.desc_base[ch->tx_free].ctl &
+ (LTQ_DMA_OWN | LTQ_DMA_C)) == LTQ_DMA_C) {
+ dev_kfree_skb_any(ch->skb[ch->tx_free]);
+ ch->skb[ch->tx_free] = NULL;
+ memset(&ch->dma.desc_base[ch->tx_free], 0,
+ sizeof(struct ltq_dma_desc));
+ ch->tx_free++;
+ ch->tx_free %= LTQ_DESC_NUM;
+ }
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+ if (netif_tx_queue_stopped(txq))
+ netif_tx_start_queue(txq);
+ napi_complete(&ch->napi);
+ ltq_dma_ack_irq(&ch->dma);
+ return 1;
+}
+
+static irqreturn_t
+ltq_etop_dma_irq(int irq, void *_priv)
+{
+ struct ltq_etop_priv *priv = _priv;
+ int ch = irq - LTQ_DMA_CH0_INT;
+
+ napi_schedule(&priv->ch[ch].napi);
+ return IRQ_HANDLED;
+}
+
+static void
+ltq_etop_free_channel(struct net_device *dev, struct ltq_etop_chan *ch)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ ltq_dma_free(&ch->dma);
+ if (ch->dma.irq)
+ free_irq(ch->dma.irq, priv);
+ if (IS_RX(ch->idx)) {
+ int desc;
+ for (desc = 0; desc < LTQ_DESC_NUM; desc++)
+ dev_kfree_skb_any(ch->skb[ch->dma.desc]);
+ }
+}
+
+static void
+ltq_etop_hw_exit(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ int i;
+
+ ltq_pmu_disable(PMU_PPE);
+ for (i = 0; i < MAX_DMA_CHAN; i++)
+ if (IS_TX(i) || IS_RX(i))
+ ltq_etop_free_channel(dev, &priv->ch[i]);
+}
+
+static int
+ltq_etop_hw_init(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ int i;
+
+ ltq_pmu_enable(PMU_PPE);
+
+ switch (priv->pldata->mii_mode) {
+ case PHY_INTERFACE_MODE_RMII:
+ ltq_etop_w32_mask(ETOP_MII_MASK,
+ ETOP_MII_REVERSE, LTQ_ETOP_CFG);
+ break;
+
+ case PHY_INTERFACE_MODE_MII:
+ ltq_etop_w32_mask(ETOP_MII_MASK,
+ ETOP_MII_NORMAL, LTQ_ETOP_CFG);
+ break;
+
+ default:
+ netdev_err(dev, "unknown mii mode %d\n",
+ priv->pldata->mii_mode);
+ return -ENOTSUPP;
+ }
+
+ /* enable crc generation */
+ ltq_etop_w32(PPE32_CGEN, LQ_PPE32_ENET_MAC_CFG);
+
+ ltq_dma_init_port(DMA_PORT_ETOP);
+
+ for (i = 0; i < MAX_DMA_CHAN; i++) {
+ int irq = LTQ_DMA_CH0_INT + i;
+ struct ltq_etop_chan *ch = &priv->ch[i];
+
+ ch->idx = ch->dma.nr = i;
+
+ if (IS_TX(i)) {
+ ltq_dma_alloc_tx(&ch->dma);
+ request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
+ "etop_tx", priv);
+ } else if (IS_RX(i)) {
+ ltq_dma_alloc_rx(&ch->dma);
+ for (ch->dma.desc = 0; ch->dma.desc < LTQ_DESC_NUM;
+ ch->dma.desc++)
+ if (ltq_etop_alloc_skb(ch))
+ return -ENOMEM;
+ ch->dma.desc = 0;
+ request_irq(irq, ltq_etop_dma_irq, IRQF_DISABLED,
+ "etop_rx", priv);
+ }
+ ch->dma.irq = irq;
+ }
+ return 0;
+}
+
+static void
+ltq_etop_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
+{
+ strcpy(info->driver, "Lantiq ETOP");
+ strcpy(info->bus_info, "internal");
+ strcpy(info->version, DRV_VERSION);
+}
+
+static int
+ltq_etop_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ return phy_ethtool_gset(priv->phydev, cmd);
+}
+
+static int
+ltq_etop_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ return phy_ethtool_sset(priv->phydev, cmd);
+}
+
+static int
+ltq_etop_nway_reset(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ return phy_start_aneg(priv->phydev);
+}
+
+static const struct ethtool_ops ltq_etop_ethtool_ops = {
+ .get_drvinfo = ltq_etop_get_drvinfo,
+ .get_settings = ltq_etop_get_settings,
+ .set_settings = ltq_etop_set_settings,
+ .nway_reset = ltq_etop_nway_reset,
+};
+
+static int
+ltq_etop_mdio_wr(struct mii_bus *bus, int phy_addr, int phy_reg, u16 phy_data)
+{
+ u32 val = MDIO_REQUEST |
+ ((phy_addr & MDIO_ADDR_MASK) << MDIO_ADDR_OFFSET) |
+ ((phy_reg & MDIO_REG_MASK) << MDIO_REG_OFFSET) |
+ phy_data;
+
+ while (ltq_etop_r32(LTQ_ETOP_MDIO) & MDIO_REQUEST)
+ ;
+ ltq_etop_w32(val, LTQ_ETOP_MDIO);
+ return 0;
+}
+
+static int
+ltq_etop_mdio_rd(struct mii_bus *bus, int phy_addr, int phy_reg)
+{
+ u32 val = MDIO_REQUEST | MDIO_READ |
+ ((phy_addr & MDIO_ADDR_MASK) << MDIO_ADDR_OFFSET) |
+ ((phy_reg & MDIO_REG_MASK) << MDIO_REG_OFFSET);
+
+ while (ltq_etop_r32(LTQ_ETOP_MDIO) & MDIO_REQUEST)
+ ;
+ ltq_etop_w32(val, LTQ_ETOP_MDIO);
+ while (ltq_etop_r32(LTQ_ETOP_MDIO) & MDIO_REQUEST)
+ ;
+ val = ltq_etop_r32(LTQ_ETOP_MDIO) & MDIO_VAL_MASK;
+ return val;
+}
+
+static void
+ltq_etop_mdio_link(struct net_device *dev)
+{
+ /* nothing to do */
+}
+
+static int
+ltq_etop_mdio_probe(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ struct phy_device *phydev = NULL;
+ int phy_addr;
+
+ for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
+ if (priv->mii_bus->phy_map[phy_addr]) {
+ phydev = priv->mii_bus->phy_map[phy_addr];
+ break;
+ }
+ }
+
+ if (!phydev) {
+ netdev_err(dev, "no PHY found\n");
+ return -ENODEV;
+ }
+
+ phydev = phy_connect(dev, dev_name(&phydev->dev), <q_etop_mdio_link,
+ 0, priv->pldata->mii_mode);
+
+ if (IS_ERR(phydev)) {
+ netdev_err(dev, "Could not attach to PHY\n");
+ return PTR_ERR(phydev);
+ }
+
+ phydev->supported &= (SUPPORTED_10baseT_Half
+ | SUPPORTED_10baseT_Full
+ | SUPPORTED_100baseT_Half
+ | SUPPORTED_100baseT_Full
+ | SUPPORTED_Autoneg
+ | SUPPORTED_MII
+ | SUPPORTED_TP);
+
+ phydev->advertising = phydev->supported;
+ priv->phydev = phydev;
+ pr_info("%s: attached PHY [%s] (phy_addr=%s, irq=%d)\n",
+ dev->name, phydev->drv->name,
+ dev_name(&phydev->dev), phydev->irq);
+
+ return 0;
+}
+
+static int
+ltq_etop_mdio_init(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ int i;
+ int err;
+
+ priv->mii_bus = mdiobus_alloc();
+ if (!priv->mii_bus) {
+ netdev_err(dev, "failed to allocate mii bus\n");
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ priv->mii_bus->priv = dev;
+ priv->mii_bus->read = ltq_etop_mdio_rd;
+ priv->mii_bus->write = ltq_etop_mdio_wr;
+ priv->mii_bus->name = "ltq_mii";
+ snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "%x", 0);
+ priv->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+ if (!priv->mii_bus->irq) {
+ err = -ENOMEM;
+ goto err_out_free_mdiobus;
+ }
+
+ for (i = 0; i < PHY_MAX_ADDR; ++i)
+ priv->mii_bus->irq[i] = PHY_POLL;
+
+ if (mdiobus_register(priv->mii_bus)) {
+ err = -ENXIO;
+ goto err_out_free_mdio_irq;
+ }
+
+ if (ltq_etop_mdio_probe(dev)) {
+ err = -ENXIO;
+ goto err_out_unregister_bus;
+ }
+ return 0;
+
+err_out_unregister_bus:
+ mdiobus_unregister(priv->mii_bus);
+err_out_free_mdio_irq:
+ kfree(priv->mii_bus->irq);
+err_out_free_mdiobus:
+ mdiobus_free(priv->mii_bus);
+err_out:
+ return err;
+}
+
+static void
+ltq_etop_mdio_cleanup(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ phy_disconnect(priv->phydev);
+ mdiobus_unregister(priv->mii_bus);
+ kfree(priv->mii_bus->irq);
+ mdiobus_free(priv->mii_bus);
+}
+
+static int
+ltq_etop_open(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ int i;
+
+ for (i = 0; i < MAX_DMA_CHAN; i++) {
+ struct ltq_etop_chan *ch = &priv->ch[i];
+
+ if (!IS_TX(i) && (!IS_RX(i)))
+ continue;
+ ltq_dma_open(&ch->dma);
+ napi_enable(&ch->napi);
+ }
+ phy_start(priv->phydev);
+ netif_tx_start_all_queues(dev);
+ return 0;
+}
+
+static int
+ltq_etop_stop(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ int i;
+
+ netif_tx_stop_all_queues(dev);
+ phy_stop(priv->phydev);
+ for (i = 0; i < MAX_DMA_CHAN; i++) {
+ struct ltq_etop_chan *ch = &priv->ch[i];
+
+ if (!IS_RX(i) && !IS_TX(i))
+ continue;
+ napi_disable(&ch->napi);
+ ltq_dma_close(&ch->dma);
+ }
+ return 0;
+}
+
+static int
+ltq_etop_tx(struct sk_buff *skb, struct net_device *dev)
+{
+ int queue = skb_get_queue_mapping(skb);
+ struct netdev_queue *txq = netdev_get_tx_queue(dev, queue);
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ struct ltq_etop_chan *ch = &priv->ch[(queue << 1) | 1];
+ struct ltq_dma_desc *desc = &ch->dma.desc_base[ch->dma.desc];
+ int len;
+ unsigned long flags;
+ u32 byte_offset;
+
+ len = skb->len < ETH_ZLEN ? ETH_ZLEN : skb->len;
+
+ if ((desc->ctl & (LTQ_DMA_OWN | LTQ_DMA_C)) || ch->skb[ch->dma.desc]) {
+ dev_kfree_skb_any(skb);
+ netdev_err(dev, "tx ring full\n");
+ netif_tx_stop_queue(txq);
+ return NETDEV_TX_BUSY;
+ }
+
+ /* dma needs to start on a 16 byte aligned address */
+ byte_offset = CPHYSADDR(skb->data) % 16;
+ ch->skb[ch->dma.desc] = skb;
+
+ dev->trans_start = jiffies;
+
+ spin_lock_irqsave(&priv->lock, flags);
+ desc->addr = ((unsigned int) dma_map_single(NULL, skb->data, len,
+ DMA_TO_DEVICE)) - byte_offset;
+ wmb();
+ desc->ctl = LTQ_DMA_OWN | LTQ_DMA_SOP | LTQ_DMA_EOP |
+ LTQ_DMA_TX_OFFSET(byte_offset) | (len & LTQ_DMA_SIZE_MASK);
+ ch->dma.desc++;
+ ch->dma.desc %= LTQ_DESC_NUM;
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+ if (ch->dma.desc_base[ch->dma.desc].ctl & LTQ_DMA_OWN)
+ netif_tx_stop_queue(txq);
+
+ return NETDEV_TX_OK;
+}
+
+static int
+ltq_etop_change_mtu(struct net_device *dev, int new_mtu)
+{
+ int ret = eth_change_mtu(dev, new_mtu);
+
+ if (!ret) {
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ unsigned long flags;
+
+ spin_lock_irqsave(&priv->lock, flags);
+ ltq_etop_w32((ETOP_PLEN_UNDER << 16) | new_mtu,
+ LTQ_ETOP_IGPLEN);
+ spin_unlock_irqrestore(&priv->lock, flags);
+ }
+ return ret;
+}
+
+static int
+ltq_etop_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+
+ /* TODO: mii-toll reports "No MII transceiver present!." ?!*/
+ return phy_mii_ioctl(priv->phydev, rq, cmd);
+}
+
+static int
+ltq_etop_set_mac_address(struct net_device *dev, void *p)
+{
+ int ret = eth_mac_addr(dev, p);
+
+ if (!ret) {
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ unsigned long flags;
+
+ /* store the mac for the unicast filter */
+ spin_lock_irqsave(&priv->lock, flags);
+ ltq_etop_w32(*((u32 *)dev->dev_addr), LTQ_ETOP_MAC_DA0);
+ ltq_etop_w32(*((u16 *)&dev->dev_addr[4]) << 16,
+ LTQ_ETOP_MAC_DA1);
+ spin_unlock_irqrestore(&priv->lock, flags);
+ }
+ return ret;
+}
+
+static void
+ltq_etop_set_multicast_list(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ unsigned long flags;
+
+ /* ensure that the unicast filter is not enabled in promiscious mode */
+ spin_lock_irqsave(&priv->lock, flags);
+ if ((dev->flags & IFF_PROMISC) || (dev->flags & IFF_ALLMULTI))
+ ltq_etop_w32_mask(ETOP_FTCU, 0, LTQ_ETOP_ENETS0);
+ else
+ ltq_etop_w32_mask(0, ETOP_FTCU, LTQ_ETOP_ENETS0);
+ spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+static u16
+ltq_etop_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+ /* we are currently only using the first queue */
+ return 0;
+}
+
+static int
+ltq_etop_init(struct net_device *dev)
+{
+ struct ltq_etop_priv *priv = netdev_priv(dev);
+ struct sockaddr mac;
+ int err;
+
+ ether_setup(dev);
+ dev->watchdog_timeo = 10 * HZ;
+ err = ltq_etop_hw_init(dev);
+ if (err)
+ goto err_hw;
+ ltq_etop_change_mtu(dev, 1500);
+
+ memcpy(&mac, &priv->pldata->mac, sizeof(struct sockaddr));
+ if (!is_valid_ether_addr(mac.sa_data)) {
+ pr_warn("etop: invalid MAC, using random\n");
+ random_ether_addr(mac.sa_data);
+ }
+
+ err = ltq_etop_set_mac_address(dev, &mac);
+ if (err)
+ goto err_netdev;
+ ltq_etop_set_multicast_list(dev);
+ err = ltq_etop_mdio_init(dev);
+ if (err)
+ goto err_netdev;
+ return 0;
+
+err_netdev:
+ unregister_netdev(dev);
+ free_netdev(dev);
+err_hw:
+ ltq_etop_hw_exit(dev);
+ return err;
+}
+
+static void
+ltq_etop_tx_timeout(struct net_device *dev)
+{
+ int err;
+
+ ltq_etop_hw_exit(dev);
+ err = ltq_etop_hw_init(dev);
+ if (err)
+ goto err_hw;
+ dev->trans_start = jiffies;
+ netif_wake_queue(dev);
+ return;
+
+err_hw:
+ ltq_etop_hw_exit(dev);
+ netdev_err(dev, "failed to restart etop after TX timeout\n");
+}
+
+static const struct net_device_ops ltq_eth_netdev_ops = {
+ .ndo_open = ltq_etop_open,
+ .ndo_stop = ltq_etop_stop,
+ .ndo_start_xmit = ltq_etop_tx,
+ .ndo_change_mtu = ltq_etop_change_mtu,
+ .ndo_do_ioctl = ltq_etop_ioctl,
+ .ndo_set_mac_address = ltq_etop_set_mac_address,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_set_multicast_list = ltq_etop_set_multicast_list,
+ .ndo_select_queue = ltq_etop_select_queue,
+ .ndo_init = ltq_etop_init,
+ .ndo_tx_timeout = ltq_etop_tx_timeout,
+};
+
+static int __init
+ltq_etop_probe(struct platform_device *pdev)
+{
+ struct net_device *dev;
+ struct ltq_etop_priv *priv;
+ struct resource *res;
+ int err;
+ int i;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(&pdev->dev, "failed to get etop resource\n");
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ resource_size(res), dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev, "failed to request etop resource\n");
+ err = -EBUSY;
+ goto err_out;
+ }
+
+ ltq_etop_membase = devm_ioremap_nocache(&pdev->dev,
+ res->start, resource_size(res));
+ if (!ltq_etop_membase) {
+ dev_err(&pdev->dev, "failed to remap etop engine %d\n",
+ pdev->id);
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ dev = alloc_etherdev_mq(sizeof(struct ltq_etop_priv), 4);
+ strcpy(dev->name, "eth%d");
+ dev->netdev_ops = <q_eth_netdev_ops;
+ dev->ethtool_ops = <q_etop_ethtool_ops;
+ priv = netdev_priv(dev);
+ priv->res = res;
+ priv->pldata = dev_get_platdata(&pdev->dev);
+ priv->netdev = dev;
+ spin_lock_init(&priv->lock);
+
+ for (i = 0; i < MAX_DMA_CHAN; i++) {
+ if (IS_TX(i))
+ netif_napi_add(dev, &priv->ch[i].napi,
+ ltq_etop_poll_tx, 8);
+ else if (IS_RX(i))
+ netif_napi_add(dev, &priv->ch[i].napi,
+ ltq_etop_poll_rx, 32);
+ priv->ch[i].netdev = dev;
+ }
+
+ err = register_netdev(dev);
+ if (err)
+ goto err_free;
+
+ platform_set_drvdata(pdev, dev);
+ return 0;
+
+err_free:
+ kfree(dev);
+err_out:
+ return err;
+}
+
+static int __devexit
+ltq_etop_remove(struct platform_device *pdev)
+{
+ struct net_device *dev = platform_get_drvdata(pdev);
+
+ if (dev) {
+ netif_tx_stop_all_queues(dev);
+ ltq_etop_hw_exit(dev);
+ ltq_etop_mdio_cleanup(dev);
+ unregister_netdev(dev);
+ }
+ return 0;
+}
+
+static struct platform_driver ltq_mii_driver = {
+ .remove = __devexit_p(ltq_etop_remove),
+ .driver = {
+ .name = "ltq_etop",
+ .owner = THIS_MODULE,
+ },
+};
+
+int __init
+init_ltq_etop(void)
+{
+ int ret = platform_driver_probe(<q_mii_driver, ltq_etop_probe);
+
+ if (ret)
+ pr_err("ltq_etop: Error registering platfom driver!");
+ return ret;
+}
+
+static void __exit
+exit_ltq_etop(void)
+{
+ platform_driver_unregister(<q_mii_driver);
+}
+
+module_init(init_ltq_etop);
+module_exit(exit_ltq_etop);
+
+MODULE_AUTHOR("John Crispin <blogic@openwrt.org>");
+MODULE_DESCRIPTION("Lantiq SoC ETOP");
+MODULE_LICENSE("GPL");
result |= ADVERTISED_100baseT_Half;
if (advert & ADVERTISE_100FULL)
result |= ADVERTISED_100baseT_Full;
+ if (advert & ADVERTISE_PAUSE_CAP)
+ result |= ADVERTISED_Pause;
+ if (advert & ADVERTISE_PAUSE_ASYM)
+ result |= ADVERTISED_Asym_Pause;
return result;
}
#ifndef MODULE
struct net_device * __init ne_probe(int unit)
{
- struct net_device *dev = alloc_ei_netdev();
+ struct net_device *dev = ____alloc_ei_netdev(0);
int err;
if (!dev)
.ndo_open = ne_open,
.ndo_stop = ne_close,
- .ndo_start_xmit = ei_start_xmit,
- .ndo_tx_timeout = ei_tx_timeout,
- .ndo_get_stats = ei_get_stats,
- .ndo_set_multicast_list = ei_set_multicast_list,
+ .ndo_start_xmit = __ei_start_xmit,
+ .ndo_tx_timeout = __ei_tx_timeout,
+ .ndo_get_stats = __ei_get_stats,
+ .ndo_set_multicast_list = __ei_set_multicast_list,
.ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = eth_mac_addr,
+ .ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = eth_change_mtu,
#ifdef CONFIG_NET_POLL_CONTROLLER
- .ndo_poll_controller = ei_poll,
+ .ndo_poll_controller = __ei_poll,
#endif
};
int err;
for (this_dev = 0; this_dev < MAX_NE_CARDS; this_dev++) {
- struct net_device *dev = alloc_ei_netdev();
+ struct net_device *dev = ____alloc_ei_netdev(0);
if (!dev)
break;
if (io[this_dev]) {
goto done;
spin_lock_irqsave(&target_list_lock, flags);
+restart:
list_for_each_entry(nt, &target_list, list) {
netconsole_target_get(nt);
if (nt->np.dev == dev) {
* rtnl_lock already held
*/
if (nt->np.dev) {
+ spin_unlock_irqrestore(
+ &target_list_lock,
+ flags);
__netpoll_cleanup(&nt->np);
+ spin_lock_irqsave(&target_list_lock,
+ flags);
dev_put(nt->np.dev);
nt->np.dev = NULL;
+ netconsole_target_put(nt);
+ goto restart;
}
/* Fall through */
case NETDEV_GOING_DOWN:
#define PCH_GBE_COPYBREAK_DEFAULT 256
#define PCH_GBE_PCI_BAR 1
+/* Macros for ML7223 */
+#define PCI_VENDOR_ID_ROHM 0x10db
+#define PCI_DEVICE_ID_ROHM_ML7223_GBE 0x8013
+
#define PCH_GBE_TX_WEIGHT 64
#define PCH_GBE_RX_WEIGHT 64
#define PCH_GBE_RX_BUFFER_WRITE 16
#define PCH_GBE_MAC_RGMII_CTRL_SETTING ( \
PCH_GBE_CHIP_TYPE_INTERNAL | \
- PCH_GBE_RGMII_MODE_RGMII | \
- PCH_GBE_CRS_SEL \
+ PCH_GBE_RGMII_MODE_RGMII \
)
/* Ethertype field values */
/* Write meta date of skb */
skb_put(skb, length);
skb->protocol = eth_type_trans(skb, netdev);
- if ((tcp_ip_status & PCH_GBE_RXD_ACC_STAT_TCPIPOK) ==
- PCH_GBE_RXD_ACC_STAT_TCPIPOK) {
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- } else {
+ if (tcp_ip_status & PCH_GBE_RXD_ACC_STAT_TCPIPOK)
skb->ip_summed = CHECKSUM_NONE;
- }
+ else
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
napi_gro_receive(&adapter->napi, skb);
(*work_done)++;
pr_debug("Receive skb->ip_summed: %d length: %d\n",
.class = (PCI_CLASS_NETWORK_ETHERNET << 8),
.class_mask = (0xFFFF00)
},
+ {.vendor = PCI_VENDOR_ID_ROHM,
+ .device = PCI_DEVICE_ID_ROHM_ML7223_GBE,
+ .subvendor = PCI_ANY_ID,
+ .subdevice = PCI_ANY_ID,
+ .class = (PCI_CLASS_NETWORK_ETHERNET << 8),
+ .class_mask = (0xFFFF00)
+ },
/* required last entry */
{0}
};
};
#undef _R
+static const struct rtl_firmware_info {
+ int mac_version;
+ const char *fw_name;
+} rtl_firmware_infos[] = {
+ { .mac_version = RTL_GIGA_MAC_VER_25, .fw_name = FIRMWARE_8168D_1 },
+ { .mac_version = RTL_GIGA_MAC_VER_26, .fw_name = FIRMWARE_8168D_2 },
+ { .mac_version = RTL_GIGA_MAC_VER_29, .fw_name = FIRMWARE_8105E_1 },
+ { .mac_version = RTL_GIGA_MAC_VER_30, .fw_name = FIRMWARE_8105E_1 }
+};
+
enum cfg_version {
RTL_CFG_0 = 0x00,
RTL_CFG_1,
u32 saved_wolopts;
const struct firmware *fw;
+#define RTL_FIRMWARE_UNKNOWN ERR_PTR(-EAGAIN);
};
MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
static void rtl_release_firmware(struct rtl8169_private *tp)
{
- release_firmware(tp->fw);
- tp->fw = NULL;
+ if (!IS_ERR_OR_NULL(tp->fw))
+ release_firmware(tp->fw);
+ tp->fw = RTL_FIRMWARE_UNKNOWN;
}
-static int rtl_apply_firmware(struct rtl8169_private *tp, const char *fw_name)
+static void rtl_apply_firmware(struct rtl8169_private *tp)
{
- const struct firmware **fw = &tp->fw;
- int rc = !*fw;
-
- if (rc) {
- rc = request_firmware(fw, fw_name, &tp->pci_dev->dev);
- if (rc < 0)
- goto out;
- }
+ const struct firmware *fw = tp->fw;
/* TODO: release firmware once rtl_phy_write_fw signals failures. */
- rtl_phy_write_fw(tp, *fw);
-out:
- return rc;
+ if (!IS_ERR_OR_NULL(fw))
+ rtl_phy_write_fw(tp, fw);
+}
+
+static void rtl_apply_firmware_cond(struct rtl8169_private *tp, u8 reg, u16 val)
+{
+ if (rtl_readphy(tp, reg) != val)
+ netif_warn(tp, hw, tp->dev, "chipset not ready for firmware\n");
+ else
+ rtl_apply_firmware(tp);
}
static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
rtl_writephy(tp, 0x1f, 0x0005);
rtl_writephy(tp, 0x05, 0x001b);
- if ((rtl_readphy(tp, 0x06) != 0xbf00) ||
- (rtl_apply_firmware(tp, FIRMWARE_8168D_1) < 0)) {
- netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
- }
+
+ rtl_apply_firmware_cond(tp, MII_EXPANSION, 0xbf00);
rtl_writephy(tp, 0x1f, 0x0000);
}
rtl_writephy(tp, 0x1f, 0x0005);
rtl_writephy(tp, 0x05, 0x001b);
- if ((rtl_readphy(tp, 0x06) != 0xb300) ||
- (rtl_apply_firmware(tp, FIRMWARE_8168D_2) < 0)) {
- netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
- }
+
+ rtl_apply_firmware_cond(tp, MII_EXPANSION, 0xb300);
rtl_writephy(tp, 0x1f, 0x0000);
}
rtl_writephy(tp, 0x18, 0x0310);
msleep(100);
- if (rtl_apply_firmware(tp, FIRMWARE_8105E_1) < 0)
- netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
+ rtl_apply_firmware(tp);
rtl_writephy_batch(tp, phy_reg_init, ARRAY_SIZE(phy_reg_init));
}
tp->timer.data = (unsigned long) dev;
tp->timer.function = rtl8169_phy_timer;
+ tp->fw = RTL_FIRMWARE_UNKNOWN;
+
rc = register_netdev(dev);
if (rc < 0)
goto err_out_msi_4;
cancel_delayed_work_sync(&tp->task);
- rtl_release_firmware(tp);
-
unregister_netdev(dev);
+ rtl_release_firmware(tp);
+
if (pci_dev_run_wake(pdev))
pm_runtime_get_noresume(&pdev->dev);
pci_set_drvdata(pdev, NULL);
}
+static void rtl_request_firmware(struct rtl8169_private *tp)
+{
+ int i;
+
+ /* Return early if the firmware is already loaded / cached. */
+ if (!IS_ERR(tp->fw))
+ goto out;
+
+ for (i = 0; i < ARRAY_SIZE(rtl_firmware_infos); i++) {
+ const struct rtl_firmware_info *info = rtl_firmware_infos + i;
+
+ if (info->mac_version == tp->mac_version) {
+ const char *name = info->fw_name;
+ int rc;
+
+ rc = request_firmware(&tp->fw, name, &tp->pci_dev->dev);
+ if (rc < 0) {
+ netif_warn(tp, ifup, tp->dev, "unable to load "
+ "firmware patch %s (%d)\n", name, rc);
+ goto out_disable_request_firmware;
+ }
+ goto out;
+ }
+ }
+
+out_disable_request_firmware:
+ tp->fw = NULL;
+out:
+ return;
+}
+
static int rtl8169_open(struct net_device *dev)
{
struct rtl8169_private *tp = netdev_priv(dev);
smp_mb();
+ rtl_request_firmware(tp);
+
retval = request_irq(dev->irq, rtl8169_interrupt,
(tp->features & RTL_FEATURE_MSI) ? 0 : IRQF_SHARED,
dev->name, dev);
if (retval < 0)
- goto err_release_ring_2;
+ goto err_release_fw_2;
napi_enable(&tp->napi);
out:
return retval;
-err_release_ring_2:
+err_release_fw_2:
+ rtl_release_firmware(tp);
rtl8169_rx_clear(tp);
err_free_rx_1:
dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
return &nic_data->mcdi;
}
+static inline void
+efx_mcdi_readd(struct efx_nic *efx, efx_dword_t *value, unsigned reg)
+{
+ struct siena_nic_data *nic_data = efx->nic_data;
+ value->u32[0] = (__force __le32)__raw_readl(nic_data->mcdi_smem + reg);
+}
+
+static inline void
+efx_mcdi_writed(struct efx_nic *efx, const efx_dword_t *value, unsigned reg)
+{
+ struct siena_nic_data *nic_data = efx->nic_data;
+ __raw_writel((__force u32)value->u32[0], nic_data->mcdi_smem + reg);
+}
+
void efx_mcdi_init(struct efx_nic *efx)
{
struct efx_mcdi_iface *mcdi;
const u8 *inbuf, size_t inlen)
{
struct efx_mcdi_iface *mcdi = efx_mcdi(efx);
- unsigned pdu = FR_CZ_MC_TREG_SMEM + MCDI_PDU(efx);
- unsigned doorbell = FR_CZ_MC_TREG_SMEM + MCDI_DOORBELL(efx);
+ unsigned pdu = MCDI_PDU(efx);
+ unsigned doorbell = MCDI_DOORBELL(efx);
unsigned int i;
efx_dword_t hdr;
u32 xflags, seqno;
MCDI_HEADER_SEQ, seqno,
MCDI_HEADER_XFLAGS, xflags);
- efx_writed(efx, &hdr, pdu);
+ efx_mcdi_writed(efx, &hdr, pdu);
- for (i = 0; i < inlen; i += 4) {
- _efx_writed(efx, *((__le32 *)(inbuf + i)), pdu + 4 + i);
- /* use wmb() within loop to inhibit write combining */
- wmb();
- }
+ for (i = 0; i < inlen; i += 4)
+ efx_mcdi_writed(efx, (const efx_dword_t *)(inbuf + i),
+ pdu + 4 + i);
/* ring the doorbell with a distinctive value */
- _efx_writed(efx, (__force __le32) 0x45789abc, doorbell);
- wmb();
+ EFX_POPULATE_DWORD_1(hdr, EFX_DWORD_0, 0x45789abc);
+ efx_mcdi_writed(efx, &hdr, doorbell);
}
static void efx_mcdi_copyout(struct efx_nic *efx, u8 *outbuf, size_t outlen)
{
struct efx_mcdi_iface *mcdi = efx_mcdi(efx);
- unsigned int pdu = FR_CZ_MC_TREG_SMEM + MCDI_PDU(efx);
+ unsigned int pdu = MCDI_PDU(efx);
int i;
BUG_ON(atomic_read(&mcdi->state) == MCDI_STATE_QUIESCENT);
BUG_ON(outlen & 3 || outlen >= 0x100);
for (i = 0; i < outlen; i += 4)
- *((__le32 *)(outbuf + i)) = _efx_readd(efx, pdu + 4 + i);
+ efx_mcdi_readd(efx, (efx_dword_t *)(outbuf + i), pdu + 4 + i);
}
static int efx_mcdi_poll(struct efx_nic *efx)
struct efx_mcdi_iface *mcdi = efx_mcdi(efx);
unsigned int time, finish;
unsigned int respseq, respcmd, error;
- unsigned int pdu = FR_CZ_MC_TREG_SMEM + MCDI_PDU(efx);
+ unsigned int pdu = MCDI_PDU(efx);
unsigned int rc, spins;
efx_dword_t reg;
time = get_seconds();
- rmb();
- efx_readd(efx, ®, pdu);
+ efx_mcdi_readd(efx, ®, pdu);
/* All 1's indicates that shared memory is in reset (and is
* not a valid header). Wait for it to come out reset before
respseq, mcdi->seqno);
rc = EIO;
} else if (error) {
- efx_readd(efx, ®, pdu + 4);
+ efx_mcdi_readd(efx, ®, pdu + 4);
switch (EFX_DWORD_FIELD(reg, EFX_DWORD_0)) {
#define TRANSLATE_ERROR(name) \
case MC_CMD_ERR_ ## name: \
/* Test and clear MC-rebooted flag for this port/function */
int efx_mcdi_poll_reboot(struct efx_nic *efx)
{
- unsigned int addr = FR_CZ_MC_TREG_SMEM + MCDI_REBOOT_FLAG(efx);
+ unsigned int addr = MCDI_REBOOT_FLAG(efx);
efx_dword_t reg;
uint32_t value;
if (efx_nic_rev(efx) < EFX_REV_SIENA_A0)
return false;
- efx_readd(efx, ®, addr);
+ efx_mcdi_readd(efx, ®, addr);
value = EFX_DWORD_FIELD(reg, EFX_DWORD_0);
if (value == 0)
return 0;
EFX_ZERO_DWORD(reg);
- efx_writed(efx, ®, addr);
+ efx_mcdi_writed(efx, ®, addr);
if (value == MC_STATUS_DWORD_ASSERT)
return -EINTR;
size = min_t(size_t, table->step, 16);
+ if (table->offset >= efx->type->mem_map_size) {
+ /* No longer mapped; return dummy data */
+ memcpy(buf, "\xde\xc0\xad\xde", 4);
+ buf += table->rows * size;
+ continue;
+ }
+
for (i = 0; i < table->rows; i++) {
switch (table->step) {
case 4: /* 32-bit register or SRAM */
/**
* struct siena_nic_data - Siena NIC state
* @mcdi: Management-Controller-to-Driver Interface
+ * @mcdi_smem: MCDI shared memory mapping. The mapping is always uncacheable.
* @wol_filter_id: Wake-on-LAN packet filter id
*/
struct siena_nic_data {
struct efx_mcdi_iface mcdi;
+ void __iomem *mcdi_smem;
int wol_filter_id;
};
efx_reado(efx, ®, FR_AZ_CS_DEBUG);
efx->net_dev->dev_id = EFX_OWORD_FIELD(reg, FRF_CZ_CS_PORT_NUM) - 1;
+ /* Initialise MCDI */
+ nic_data->mcdi_smem = ioremap_nocache(efx->membase_phys +
+ FR_CZ_MC_TREG_SMEM,
+ FR_CZ_MC_TREG_SMEM_STEP *
+ FR_CZ_MC_TREG_SMEM_ROWS);
+ if (!nic_data->mcdi_smem) {
+ netif_err(efx, probe, efx->net_dev,
+ "could not map MCDI at %llx+%x\n",
+ (unsigned long long)efx->membase_phys +
+ FR_CZ_MC_TREG_SMEM,
+ FR_CZ_MC_TREG_SMEM_STEP * FR_CZ_MC_TREG_SMEM_ROWS);
+ rc = -ENOMEM;
+ goto fail1;
+ }
efx_mcdi_init(efx);
/* Recover from a failed assertion before probing */
rc = efx_mcdi_handle_assertion(efx);
if (rc)
- goto fail1;
+ goto fail2;
/* Let the BMC know that the driver is now in charge of link and
* filter settings. We must do this before we reset the NIC */
fail3:
efx_mcdi_drv_attach(efx, false, NULL);
fail2:
+ iounmap(nic_data->mcdi_smem);
fail1:
kfree(efx->nic_data);
return rc;
static void siena_remove_nic(struct efx_nic *efx)
{
+ struct siena_nic_data *nic_data = efx->nic_data;
+
efx_nic_free_buffer(efx, &efx->irq_status);
siena_reset_hw(efx, RESET_TYPE_ALL);
efx_mcdi_drv_attach(efx, false, NULL);
/* Tear down the private nic state */
- kfree(efx->nic_data);
+ iounmap(nic_data->mcdi_smem);
+ kfree(nic_data);
efx->nic_data = NULL;
}
.default_mac_ops = &efx_mcdi_mac_operations,
.revision = EFX_REV_SIENA_A0,
- .mem_map_size = (FR_CZ_MC_TREG_SMEM +
- FR_CZ_MC_TREG_SMEM_STEP * FR_CZ_MC_TREG_SMEM_ROWS),
+ .mem_map_size = FR_CZ_MC_TREG_SMEM, /* MC_TREG_SMEM mapped separately */
.txd_ptr_tbl_base = FR_BZ_TX_DESC_PTR_TBL,
.rxd_ptr_tbl_base = FR_BZ_RX_DESC_PTR_TBL,
.buf_tbl_base = FR_BZ_BUF_FULL_TBL,
/* Done. We have linked the TTY line to a channel. */
rtnl_unlock();
tty->receive_room = 65536; /* We don't flow control */
- return sl->dev->base_addr;
+
+ /* TTY layer expects 0 on success */
+ return 0;
err_free_bufs:
sl_free_bufs(sl);
#endif
#ifdef CONFIG_SBUS
+static const struct of_device_id hme_sbus_match[];
static int __devinit hme_sbus_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
struct device_node *dp = op->dev.of_node;
const char *model = of_get_property(dp, "model", NULL);
int is_qfe;
- if (!op->dev.of_match)
+ match = of_match_device(hme_sbus_match, &op->dev);
+ if (!match)
return -EINVAL;
- is_qfe = (op->dev.of_match->data != NULL);
+ is_qfe = (match->data != NULL);
if (!is_qfe && model && !strcmp(model, "SUNW,sbus-qfe"))
is_qfe = 1;
if (val & VCPU_CFGSHDW_ASPM_DBNC)
tp->tg3_flags |= TG3_FLAG_ASPM_WORKAROUND;
if ((val & VCPU_CFGSHDW_WOL_ENABLE) &&
- (val & VCPU_CFGSHDW_WOL_MAGPKT))
+ (val & VCPU_CFGSHDW_WOL_MAGPKT)) {
tp->tg3_flags |= TG3_FLAG_WOL_ENABLE;
+ device_set_wakeup_enable(&tp->pdev->dev, true);
+ }
goto done;
}
tp->tg3_flags &= ~TG3_FLAG_WOL_CAP;
if ((tp->tg3_flags & TG3_FLAG_WOL_CAP) &&
- (nic_cfg & NIC_SRAM_DATA_CFG_WOL_ENABLE))
+ (nic_cfg & NIC_SRAM_DATA_CFG_WOL_ENABLE)) {
tp->tg3_flags |= TG3_FLAG_WOL_ENABLE;
+ device_set_wakeup_enable(&tp->pdev->dev, true);
+ }
if (cfg2 & (1 << 17))
tp->phy_flags |= TG3_PHYFLG_CAPACITIVE_COUPLING;
.manage_power = cdc_manage_power,
};
-static const struct driver_info mbm_info = {
+static const struct driver_info wwan_info = {
.description = "Mobile Broadband Network Device",
.flags = FLAG_WWAN,
.bind = usbnet_cdc_bind,
/*-------------------------------------------------------------------------*/
+#define HUAWEI_VENDOR_ID 0x12D1
static const struct usb_device_id products [] = {
/*
{
USB_DEVICE_AND_INTERFACE_INFO(0x1004, 0x61aa, USB_CLASS_COMM,
USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
- .driver_info = 0,
+ .driver_info = (unsigned long)&wwan_info,
},
/*
}, {
USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_MDLM,
USB_CDC_PROTO_NONE),
- .driver_info = (unsigned long)&mbm_info,
+ .driver_info = (unsigned long)&wwan_info,
+}, {
+ /* Various Huawei modems with a network port like the UMG1831 */
+ .match_flags = USB_DEVICE_ID_MATCH_VENDOR
+ | USB_DEVICE_ID_MATCH_INT_INFO,
+ .idVendor = HUAWEI_VENDOR_ID,
+ .bInterfaceClass = USB_CLASS_COMM,
+ .bInterfaceSubClass = USB_CDC_SUBCLASS_ETHERNET,
+ .bInterfaceProtocol = 255,
+ .driver_info = (unsigned long)&wwan_info,
},
{ }, // END
};
#include <linux/usb/usbnet.h>
#include <linux/usb/cdc.h>
-#define DRIVER_VERSION "7-Feb-2011"
+#define DRIVER_VERSION "23-Apr-2011"
/* CDC NCM subclass 3.2.1 */
#define USB_CDC_NCM_NDP16_LENGTH_MIN 0x10
/* Maximum NTB length */
-#define CDC_NCM_NTB_MAX_SIZE_TX 16384 /* bytes */
+#define CDC_NCM_NTB_MAX_SIZE_TX (16384 + 4) /* bytes, must be short terminated */
#define CDC_NCM_NTB_MAX_SIZE_RX 16384 /* bytes */
/* Minimum value for MaxDatagramSize, ch. 6.2.9 */
#define IPHETH_USBINTF_PROTO 1
#define IPHETH_BUF_SIZE 1516
+#define IPHETH_IP_ALIGN 2 /* padding at front of URB */
#define IPHETH_TX_TIMEOUT (5 * HZ)
#define IPHETH_INTFNUM 2
return;
}
- len = urb->actual_length;
- buf = urb->transfer_buffer;
+ if (urb->actual_length <= IPHETH_IP_ALIGN) {
+ dev->net->stats.rx_length_errors++;
+ return;
+ }
+ len = urb->actual_length - IPHETH_IP_ALIGN;
+ buf = urb->transfer_buffer + IPHETH_IP_ALIGN;
- skb = dev_alloc_skb(NET_IP_ALIGN + len);
+ skb = dev_alloc_skb(len);
if (!skb) {
err("%s: dev_alloc_skb: -ENOMEM", __func__);
dev->net->stats.rx_dropped++;
return;
}
- skb_reserve(skb, NET_IP_ALIGN);
- memcpy(skb_put(skb, len), buf + NET_IP_ALIGN, len - NET_IP_ALIGN);
+ memcpy(skb_put(skb, len), buf, len);
skb->dev = dev->net;
skb->protocol = eth_type_trans(skb, dev->net);
msleep(10);
bmcr = smsc95xx_mdio_read(dev->net, dev->mii.phy_id, MII_BMCR);
timeout++;
- } while ((bmcr & MII_BMCR) && (timeout < 100));
+ } while ((bmcr & BMCR_RESET) && (timeout < 100));
if (timeout >= 100) {
netdev_warn(dev->net, "timeout on PHY Reset");
struct driver_info *info = dev->driver_info;
int retval;
+ clear_bit(EVENT_DEV_OPEN, &dev->flags);
netif_stop_queue (net);
netif_info(dev, ifdown, dev->net,
}
}
+ set_bit(EVENT_DEV_OPEN, &dev->flags);
netif_start_queue (net);
netif_info(dev, ifup, dev->net,
"open: enable queueing (rx %d, tx %d) mtu %d %s framing\n",
if (dev->driver_info->unbind)
dev->driver_info->unbind (dev, intf);
+ usb_kill_urb(dev->interrupt);
+ usb_free_urb(dev->interrupt);
+
free_netdev(net);
usb_put_dev (xdev);
}
int retval;
if (!--dev->suspend_count) {
+ /* resume interrupt URBs */
+ if (dev->interrupt && test_bit(EVENT_DEV_OPEN, &dev->flags))
+ usb_submit_urb(dev->interrupt, GFP_NOIO);
+
spin_lock_irq(&dev->txq.lock);
while ((res = usb_get_from_anchor(&dev->deferred))) {
smp_mb();
clear_bit(EVENT_DEV_ASLEEP, &dev->flags);
spin_unlock_irq(&dev->txq.lock);
- if (!(dev->txq.qlen >= TX_QLEN(dev)))
- netif_start_queue(dev->net);
- tasklet_schedule (&dev->bh);
+
+ if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
+ if (!(dev->txq.qlen >= TX_QLEN(dev)))
+ netif_start_queue(dev->net);
+ tasklet_schedule (&dev->bh);
+ }
}
return 0;
}
if (tb[IFLA_ADDRESS] == NULL)
random_ether_addr(dev->dev_addr);
+ if (tb[IFLA_IFNAME])
+ nla_strlcpy(dev->name, tb[IFLA_IFNAME], IFNAMSIZ);
+ else
+ snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
+
+ if (strchr(dev->name, '%')) {
+ err = dev_alloc_name(dev, dev->name);
+ if (err < 0)
+ goto err_alloc_name;
+ }
+
err = register_netdevice(dev);
if (err < 0)
goto err_register_dev;
err_register_dev:
/* nothing to do */
+err_alloc_name:
err_configure_peer:
unregister_netdevice(peer);
return err;
vmxnet3_process_events(struct vmxnet3_adapter *adapter)
{
int i;
+ unsigned long flags;
u32 events = le32_to_cpu(adapter->shared->ecr);
if (!events)
return;
/* Check if there is an error on xmit/recv queues */
if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
- spin_lock(&adapter->cmd_lock);
+ spin_lock_irqsave(&adapter->cmd_lock, flags);
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
VMXNET3_CMD_GET_QUEUE_STATUS);
- spin_unlock(&adapter->cmd_lock);
+ spin_unlock_irqrestore(&adapter->cmd_lock, flags);
for (i = 0; i < adapter->num_tx_queues; i++)
if (adapter->tqd_start[i].status.stopped)
vmxnet3_alloc_intr_resources(struct vmxnet3_adapter *adapter)
{
u32 cfg;
+ unsigned long flags;
/* intr settings */
- spin_lock(&adapter->cmd_lock);
+ spin_lock_irqsave(&adapter->cmd_lock, flags);
VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
VMXNET3_CMD_GET_CONF_INTR);
cfg = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
- spin_unlock(&adapter->cmd_lock);
+ spin_unlock_irqrestore(&adapter->cmd_lock, flags);
adapter->intr.type = cfg & 0x3;
adapter->intr.mask_mode = (cfg >> 2) & 0x3;
/* toggle the LRO feature*/
netdev->features ^= NETIF_F_LRO;
+ /* Update private LRO flag */
+ adapter->lro = lro_requested;
+
/* update harware LRO capability accordingly */
if (lro_requested)
adapter->shared->devRead.misc.uptFeatures |=
static void ath9k_flush(struct ieee80211_hw *hw, bool drop)
{
struct ath_softc *sc = hw->priv;
+ struct ath_hw *ah = sc->sc_ah;
+ struct ath_common *common = ath9k_hw_common(ah);
int timeout = 200; /* ms */
int i, j;
cancel_delayed_work_sync(&sc->tx_complete_work);
+ if (sc->sc_flags & SC_OP_INVALID) {
+ ath_dbg(common, ATH_DBG_ANY, "Device not present\n");
+ mutex_unlock(&sc->mutex);
+ return;
+ }
+
if (drop)
timeout = 1;
"confusing the DMA engine when we start RX up\n");
ATH_DBG_WARN_ON_ONCE(!stopped);
}
- return stopped || reset;
+ return stopped && !reset;
}
void ath_flushrecv(struct ath_softc *sc)
MODULE_FIRMWARE("b43/ucode13.fw");
MODULE_FIRMWARE("b43/ucode14.fw");
MODULE_FIRMWARE("b43/ucode15.fw");
+MODULE_FIRMWARE("b43/ucode16_mimo.fw");
MODULE_FIRMWARE("b43/ucode5.fw");
MODULE_FIRMWARE("b43/ucode9.fw");
hdr_len = ieee80211_hdrlen(fc);
- /* Find index into station table for destination station */
- sta_id = iwl_legacy_sta_id_or_broadcast(priv, ctx, info->control.sta);
- if (sta_id == IWL_INVALID_STATION) {
- IWL_DEBUG_DROP(priv, "Dropping - INVALID STATION: %pM\n",
- hdr->addr1);
- goto drop_unlock;
+ /* For management frames use broadcast id to do not break aggregation */
+ if (!ieee80211_is_data(fc))
+ sta_id = ctx->bcast_sta_id;
+ else {
+ /* Find index into station table for destination station */
+ sta_id = iwl_legacy_sta_id_or_broadcast(priv, ctx, info->control.sta);
+
+ if (sta_id == IWL_INVALID_STATION) {
+ IWL_DEBUG_DROP(priv, "Dropping - INVALID STATION: %pM\n",
+ hdr->addr1);
+ goto drop_unlock;
+ }
}
IWL_DEBUG_TX(priv, "station Id %d\n", sta_id);
q->read_ptr = iwl_legacy_queue_inc_wrap(q->read_ptr, q->n_bd)) {
tx_info = &txq->txb[txq->q.read_ptr];
- iwl4965_tx_status(priv, tx_info,
- txq_id >= IWL4965_FIRST_AMPDU_QUEUE);
+
+ if (WARN_ON_ONCE(tx_info->skb == NULL))
+ continue;
hdr = (struct ieee80211_hdr *)tx_info->skb->data;
- if (hdr && ieee80211_is_data_qos(hdr->frame_control))
+ if (ieee80211_is_data_qos(hdr->frame_control))
nfreed++;
+
+ iwl4965_tx_status(priv, tx_info,
+ txq_id >= IWL4965_FIRST_AMPDU_QUEUE);
tx_info->skb = NULL;
priv->cfg->ops->lib->txq_free_tfd(priv, txq);
goto set_ch_out;
}
+ if (priv->iw_mode == NL80211_IFTYPE_ADHOC &&
+ !iwl_legacy_is_channel_ibss(ch_info)) {
+ IWL_DEBUG_MAC80211(priv, "leave - not IBSS channel\n");
+ ret = -EINVAL;
+ goto set_ch_out;
+ }
+
spin_lock_irqsave(&priv->lock, flags);
for_each_context(priv, ctx) {
return (!(ch->flags & EEPROM_CHANNEL_ACTIVE)) ? 1 : 0;
}
+static inline int
+iwl_legacy_is_channel_ibss(const struct iwl_channel_info *ch)
+{
+ return (ch->flags & EEPROM_CHANNEL_IBSS) ? 1 : 0;
+}
+
static inline void
__iwl_legacy_free_pages(struct iwl_priv *priv, struct page *page)
{
MODULE_PARM_DESC(led_mode, "0=system default, "
"1=On(RF On)/Off(RF Off), 2=blinking");
+/* Throughput OFF time(ms) ON time (ms)
+ * >300 25 25
+ * >200 to 300 40 40
+ * >100 to 200 55 55
+ * >70 to 100 65 65
+ * >50 to 70 75 75
+ * >20 to 50 85 85
+ * >10 to 20 95 95
+ * >5 to 10 110 110
+ * >1 to 5 130 130
+ * >0 to 1 167 167
+ * <=0 SOLID ON
+ */
static const struct ieee80211_tpt_blink iwl_blink[] = {
- { .throughput = 0 * 1024 - 1, .blink_time = 334 },
+ { .throughput = 0, .blink_time = 334 },
{ .throughput = 1 * 1024 - 1, .blink_time = 260 },
{ .throughput = 5 * 1024 - 1, .blink_time = 220 },
{ .throughput = 10 * 1024 - 1, .blink_time = 190 },
if (priv->blink_on == on && priv->blink_off == off)
return 0;
+ if (off == 0) {
+ /* led is SOLID_ON */
+ on = IWL_LED_SOLID;
+ }
+
IWL_DEBUG_LED(priv, "Led blink time compensation=%u\n",
priv->cfg->base_params->led_compensation);
led_cmd.on = iwl_legacy_blink_compensation(priv, on,
struct iwl_priv *priv = container_of(work, struct iwl_priv,
txpower_work);
+ mutex_lock(&priv->mutex);
+
/* If a scan happened to start before we got here
* then just return; the statistics notification will
* kick off another scheduled work to compensate for
* any temperature delta we missed here. */
if (test_bit(STATUS_EXIT_PENDING, &priv->status) ||
test_bit(STATUS_SCANNING, &priv->status))
- return;
-
- mutex_lock(&priv->mutex);
+ goto out;
/* Regardless of if we are associated, we must reconfigure the
* TX power since frames can be sent on non-radar channels while
/* Update last_temperature to keep is_calib_needed from running
* when it isn't needed... */
priv->last_temperature = priv->temperature;
-
+out:
mutex_unlock(&priv->mutex);
}
struct ieee80211_channel *channel = conf->channel;
const struct iwl_channel_info *ch_info;
int ret = 0;
- bool ht_changed[NUM_IWL_RXON_CTX] = {};
IWL_DEBUG_MAC80211(priv, "changed %#x", changed);
for_each_context(priv, ctx) {
/* Configure HT40 channels */
- if (ctx->ht.enabled != conf_is_ht(conf)) {
+ if (ctx->ht.enabled != conf_is_ht(conf))
ctx->ht.enabled = conf_is_ht(conf);
- ht_changed[ctx->ctxid] = true;
- }
if (ctx->ht.enabled) {
if (conf_is_ht40_minus(conf)) {
if (!memcmp(&ctx->staging, &ctx->active, sizeof(ctx->staging)))
continue;
iwlagn_commit_rxon(priv, ctx);
- if (ht_changed[ctx->ctxid])
- iwlagn_update_qos(priv, ctx);
}
out:
mutex_unlock(&priv->mutex);
hdr_len = ieee80211_hdrlen(fc);
- /* Find index into station table for destination station */
- sta_id = iwl_sta_id_or_broadcast(priv, ctx, info->control.sta);
- if (sta_id == IWL_INVALID_STATION) {
- IWL_DEBUG_DROP(priv, "Dropping - INVALID STATION: %pM\n",
- hdr->addr1);
- goto drop_unlock;
+ /* For management frames use broadcast id to do not break aggregation */
+ if (!ieee80211_is_data(fc))
+ sta_id = ctx->bcast_sta_id;
+ else {
+ /* Find index into station table for destination station */
+ sta_id = iwl_sta_id_or_broadcast(priv, ctx, info->control.sta);
+ if (sta_id == IWL_INVALID_STATION) {
+ IWL_DEBUG_DROP(priv, "Dropping - INVALID STATION: %pM\n",
+ hdr->addr1);
+ goto drop_unlock;
+ }
}
IWL_DEBUG_TX(priv, "station Id %d\n", sta_id);
q->read_ptr = iwl_queue_inc_wrap(q->read_ptr, q->n_bd)) {
tx_info = &txq->txb[txq->q.read_ptr];
- iwlagn_tx_status(priv, tx_info,
- txq_id >= IWLAGN_FIRST_AMPDU_QUEUE);
+
+ if (WARN_ON_ONCE(tx_info->skb == NULL))
+ continue;
hdr = (struct ieee80211_hdr *)tx_info->skb->data;
- if (hdr && ieee80211_is_data_qos(hdr->frame_control))
+ if (ieee80211_is_data_qos(hdr->frame_control))
nfreed++;
+
+ iwlagn_tx_status(priv, tx_info,
+ txq_id >= IWLAGN_FIRST_AMPDU_QUEUE);
tx_info->skb = NULL;
if (priv->cfg->ops->lib->txq_inval_byte_cnt_tbl)
cpu_to_le16(PS_MODE_ACTION_EXIT_PS)) {
lbs_deb_host(
"EXEC_NEXT_CMD: ignore ENTER_PS cmd\n");
- list_del(&cmdnode->list);
spin_lock_irqsave(&priv->driver_lock, flags);
+ list_del(&cmdnode->list);
lbs_complete_command(priv, cmdnode, 0);
spin_unlock_irqrestore(&priv->driver_lock, flags);
(priv->psstate == PS_STATE_PRE_SLEEP)) {
lbs_deb_host(
"EXEC_NEXT_CMD: ignore EXIT_PS cmd in sleep\n");
- list_del(&cmdnode->list);
spin_lock_irqsave(&priv->driver_lock, flags);
+ list_del(&cmdnode->list);
lbs_complete_command(priv, cmdnode, 0);
spin_unlock_irqrestore(&priv->driver_lock, flags);
priv->needtowakeup = 1;
"EXEC_NEXT_CMD: sending EXIT_PS\n");
}
}
+ spin_lock_irqsave(&priv->driver_lock, flags);
list_del(&cmdnode->list);
+ spin_unlock_irqrestore(&priv->driver_lock, flags);
lbs_deb_host("EXEC_NEXT_CMD: sending command 0x%04x\n",
le16_to_cpu(cmd->command));
lbs_submit_command(priv, cmdnode);
board = z->resource.start;
ioaddr = board+cards[i].offset;
- dev = alloc_ei_netdev();
+ dev = ____alloc_ei_netdev(0);
if (!dev)
return -ENOMEM;
if (!request_mem_region(ioaddr, NE_IO_EXTENT*2, DRV_NAME)) {
static const struct net_device_ops zorro8390_netdev_ops = {
.ndo_open = zorro8390_open,
.ndo_stop = zorro8390_close,
- .ndo_start_xmit = ei_start_xmit,
- .ndo_tx_timeout = ei_tx_timeout,
- .ndo_get_stats = ei_get_stats,
- .ndo_set_multicast_list = ei_set_multicast_list,
+ .ndo_start_xmit = __ei_start_xmit,
+ .ndo_tx_timeout = __ei_tx_timeout,
+ .ndo_get_stats = __ei_get_stats,
+ .ndo_set_multicast_list = __ei_set_multicast_list,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = eth_change_mtu,
#ifdef CONFIG_NET_POLL_CONTROLLER
- .ndo_poll_controller = ei_poll,
+ .ndo_poll_controller = __ei_poll,
#endif
};
#include <linux/syscore_ops.h>
#include <linux/tboot.h>
#include <linux/dmi.h>
+#include <linux/pci-ats.h>
#include <asm/cacheflush.h>
#include <asm/iommu.h>
#include "pci.h"
#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/delay.h>
+#include <linux/pci-ats.h>
#include "pci.h"
#define VIRTFN_ID_LEN 16
u8 __iomem *mstate; /* VF Migration State Array */
};
-/* Address Translation Service */
-struct pci_ats {
- int pos; /* capability position */
- int stu; /* Smallest Translation Unit */
- int qdep; /* Invalidate Queue Depth */
- int ref_cnt; /* Physical Function reference count */
- unsigned int is_enabled:1; /* Enable bit is set */
-};
-
#ifdef CONFIG_PCI_IOV
extern int pci_iov_init(struct pci_dev *dev);
extern void pci_iov_release(struct pci_dev *dev);
extern void pci_restore_iov_state(struct pci_dev *dev);
extern int pci_iov_bus_range(struct pci_bus *bus);
-extern int pci_enable_ats(struct pci_dev *dev, int ps);
-extern void pci_disable_ats(struct pci_dev *dev);
-extern int pci_ats_queue_depth(struct pci_dev *dev);
-/**
- * pci_ats_enabled - query the ATS status
- * @dev: the PCI device
- *
- * Returns 1 if ATS capability is enabled, or 0 if not.
- */
-static inline int pci_ats_enabled(struct pci_dev *dev)
-{
- return dev->ats && dev->ats->is_enabled;
-}
#else
static inline int pci_iov_init(struct pci_dev *dev)
{
return 0;
}
-static inline int pci_enable_ats(struct pci_dev *dev, int ps)
-{
- return -ENODEV;
-}
-static inline void pci_disable_ats(struct pci_dev *dev)
-{
-}
-static inline int pci_ats_queue_depth(struct pci_dev *dev)
-{
- return -ENODEV;
-}
-static inline int pci_ats_enabled(struct pci_dev *dev)
-{
- return 0;
-}
#endif /* CONFIG_PCI_IOV */
static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
}
size0 = calculate_iosize(size, min_size, size1,
resource_size(b_res), 4096);
- size1 = !add_size? size0:
+ size1 = (!add_head || (add_head && !add_size)) ? size0 :
calculate_iosize(size, min_size+add_size, size1,
resource_size(b_res), 4096);
if (!size0 && !size1) {
align += aligns[order];
}
size0 = calculate_memsize(size, min_size, 0, resource_size(b_res), min_align);
- size1 = !add_size ? size :
+ size1 = (!add_head || (add_head && !add_size)) ? size0 :
calculate_memsize(size, min_size+add_size, 0,
resource_size(b_res), min_align);
if (!size0 && !size1) {
c = p_dev->function_config;
if (!(c->state & CONFIG_LOCKED)) {
- dev_dbg(&p_dev->dev, "Configuration isn't't locked\n");
+ dev_dbg(&p_dev->dev, "Configuration isn't locked\n");
mutex_unlock(&s->ops_mutex);
return -EACCES;
}
return true;
}
-static void eeepc_rfkill_hotplug(struct eeepc_laptop *eeepc)
+static void eeepc_rfkill_hotplug(struct eeepc_laptop *eeepc, acpi_handle handle)
{
+ struct pci_dev *port;
struct pci_dev *dev;
struct pci_bus *bus;
bool blocked = eeepc_wlan_rfkill_blocked(eeepc);
mutex_lock(&eeepc->hotplug_lock);
if (eeepc->hotplug_slot) {
- bus = pci_find_bus(0, 1);
+ port = acpi_get_pci_dev(handle);
+ if (!port) {
+ pr_warning("Unable to find port\n");
+ goto out_unlock;
+ }
+
+ bus = port->subordinate;
+
if (!bus) {
- pr_warning("Unable to find PCI bus 1?\n");
+ pr_warning("Unable to find PCI bus?\n");
goto out_unlock;
}
pr_err("Unable to read PCI config space?\n");
goto out_unlock;
}
+
absent = (l == 0xffffffff);
if (blocked != absent) {
mutex_unlock(&eeepc->hotplug_lock);
}
+static void eeepc_rfkill_hotplug_update(struct eeepc_laptop *eeepc, char *node)
+{
+ acpi_status status = AE_OK;
+ acpi_handle handle;
+
+ status = acpi_get_handle(NULL, node, &handle);
+
+ if (ACPI_SUCCESS(status))
+ eeepc_rfkill_hotplug(eeepc, handle);
+}
+
static void eeepc_rfkill_notify(acpi_handle handle, u32 event, void *data)
{
struct eeepc_laptop *eeepc = data;
if (event != ACPI_NOTIFY_BUS_CHECK)
return;
- eeepc_rfkill_hotplug(eeepc);
+ eeepc_rfkill_hotplug(eeepc, handle);
}
static int eeepc_register_rfkill_notifier(struct eeepc_laptop *eeepc,
eeepc);
if (ACPI_FAILURE(status))
pr_warning("Failed to register notify on %s\n", node);
+ /*
+ * Refresh pci hotplug in case the rfkill state was
+ * changed during setup.
+ */
+ eeepc_rfkill_hotplug(eeepc, handle);
} else
return -ENODEV;
if (ACPI_FAILURE(status))
pr_err("Error removing rfkill notify handler %s\n",
node);
+ /*
+ * Refresh pci hotplug in case the rfkill
+ * state was changed after
+ * eeepc_unregister_rfkill_notifier()
+ */
+ eeepc_rfkill_hotplug(eeepc, handle);
}
}
rfkill_destroy(eeepc->wlan_rfkill);
eeepc->wlan_rfkill = NULL;
}
- /*
- * Refresh pci hotplug in case the rfkill state was changed after
- * eeepc_unregister_rfkill_notifier()
- */
- eeepc_rfkill_hotplug(eeepc);
+
if (eeepc->hotplug_slot)
pci_hp_deregister(eeepc->hotplug_slot);
eeepc_register_rfkill_notifier(eeepc, "\\_SB.PCI0.P0P5");
eeepc_register_rfkill_notifier(eeepc, "\\_SB.PCI0.P0P6");
eeepc_register_rfkill_notifier(eeepc, "\\_SB.PCI0.P0P7");
- /*
- * Refresh pci hotplug in case the rfkill state was changed during
- * setup.
- */
- eeepc_rfkill_hotplug(eeepc);
exit:
if (result && result != -ENODEV)
struct eeepc_laptop *eeepc = dev_get_drvdata(device);
/* Refresh both wlan rfkill state and pci hotplug */
- if (eeepc->wlan_rfkill)
- eeepc_rfkill_hotplug(eeepc);
+ if (eeepc->wlan_rfkill) {
+ eeepc_rfkill_hotplug_update(eeepc, "\\_SB.PCI0.P0P5");
+ eeepc_rfkill_hotplug_update(eeepc, "\\_SB.PCI0.P0P6");
+ eeepc_rfkill_hotplug_update(eeepc, "\\_SB.PCI0.P0P7");
+ }
if (eeepc->bluetooth_rfkill)
rfkill_set_sw_state(eeepc->bluetooth_rfkill,
/*
* Backlight device
*/
+struct sony_backlight_props {
+ struct backlight_device *dev;
+ int handle;
+ u8 offset;
+ u8 maxlvl;
+};
+struct sony_backlight_props sony_bl_props;
+
static int sony_backlight_update_status(struct backlight_device *bd)
{
return acpi_callsetfunc(sony_nc_acpi_handle, "SBRT",
{
int result;
int *handle = (int *)bl_get_data(bd);
+ struct sony_backlight_props *sdev =
+ (struct sony_backlight_props *)bl_get_data(bd);
- sony_call_snc_handle(*handle, 0x0200, &result);
+ sony_call_snc_handle(sdev->handle, 0x0200, &result);
- return result & 0xff;
+ return (result & 0xff) - sdev->offset;
}
static int sony_nc_update_status_ng(struct backlight_device *bd)
{
int value, result;
int *handle = (int *)bl_get_data(bd);
+ struct sony_backlight_props *sdev =
+ (struct sony_backlight_props *)bl_get_data(bd);
- value = bd->props.brightness;
- sony_call_snc_handle(*handle, 0x0100 | (value << 16), &result);
+ value = bd->props.brightness + sdev->offset;
+ if (sony_call_snc_handle(sdev->handle, 0x0100 | (value << 16), &result))
+ return -EIO;
- return sony_nc_get_brightness_ng(bd);
+ return value;
}
static const struct backlight_ops sony_backlight_ops = {
.update_status = sony_nc_update_status_ng,
.get_brightness = sony_nc_get_brightness_ng,
};
-static int backlight_ng_handle;
-static struct backlight_device *sony_backlight_device;
/*
* New SNC-only Vaios event mapping to driver known keys
&ignore);
}
+static void sony_nc_backlight_ng_read_limits(int handle,
+ struct sony_backlight_props *props)
+{
+ int offset;
+ acpi_status status;
+ u8 brlvl, i;
+ u8 min = 0xff, max = 0x00;
+ struct acpi_object_list params;
+ union acpi_object in_obj;
+ union acpi_object *lvl_enum;
+ struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+
+ props->handle = handle;
+ props->offset = 0;
+ props->maxlvl = 0xff;
+
+ offset = sony_find_snc_handle(handle);
+ if (offset < 0)
+ return;
+
+ /* try to read the boundaries from ACPI tables, if we fail the above
+ * defaults should be reasonable
+ */
+ params.count = 1;
+ params.pointer = &in_obj;
+ in_obj.type = ACPI_TYPE_INTEGER;
+ in_obj.integer.value = offset;
+ status = acpi_evaluate_object(sony_nc_acpi_handle, "SN06", ¶ms,
+ &buffer);
+ if (ACPI_FAILURE(status))
+ return;
+
+ lvl_enum = (union acpi_object *) buffer.pointer;
+ if (!lvl_enum) {
+ pr_err("No SN06 return object.");
+ return;
+ }
+ if (lvl_enum->type != ACPI_TYPE_BUFFER) {
+ pr_err("Invalid SN06 return object 0x%.2x\n",
+ lvl_enum->type);
+ goto out_invalid;
+ }
+
+ /* the buffer lists brightness levels available, brightness levels are
+ * from 0 to 8 in the array, other values are used by ALS control.
+ */
+ for (i = 0; i < 9 && i < lvl_enum->buffer.length; i++) {
+
+ brlvl = *(lvl_enum->buffer.pointer + i);
+ dprintk("Brightness level: %d\n", brlvl);
+
+ if (!brlvl)
+ break;
+
+ if (brlvl > max)
+ max = brlvl;
+ if (brlvl < min)
+ min = brlvl;
+ }
+ props->offset = min;
+ props->maxlvl = max;
+ dprintk("Brightness levels: min=%d max=%d\n", props->offset,
+ props->maxlvl);
+
+out_invalid:
+ kfree(buffer.pointer);
+ return;
+}
+
static void sony_nc_backlight_setup(void)
{
acpi_handle unused;
struct backlight_properties props;
if (sony_find_snc_handle(0x12f) != -1) {
- backlight_ng_handle = 0x12f;
ops = &sony_backlight_ng_ops;
- max_brightness = 0xff;
+ sony_nc_backlight_ng_read_limits(0x12f, &sony_bl_props);
+ max_brightness = sony_bl_props.maxlvl - sony_bl_props.offset;
} else if (sony_find_snc_handle(0x137) != -1) {
- backlight_ng_handle = 0x137;
ops = &sony_backlight_ng_ops;
- max_brightness = 0xff;
+ sony_nc_backlight_ng_read_limits(0x137, &sony_bl_props);
+ max_brightness = sony_bl_props.maxlvl - sony_bl_props.offset;
} else if (ACPI_SUCCESS(acpi_get_handle(sony_nc_acpi_handle, "GBRT",
&unused))) {
memset(&props, 0, sizeof(struct backlight_properties));
props.type = BACKLIGHT_PLATFORM;
props.max_brightness = max_brightness;
- sony_backlight_device = backlight_device_register("sony", NULL,
- &backlight_ng_handle,
- ops, &props);
+ sony_bl_props.dev = backlight_device_register("sony", NULL,
+ &sony_bl_props,
+ ops, &props);
- if (IS_ERR(sony_backlight_device)) {
- pr_warning(DRV_PFX "unable to register backlight device\n");
- sony_backlight_device = NULL;
+ if (IS_ERR(sony_bl_props.dev)) {
+ pr_warn(DRV_PFX "unable to register backlight device\n");
+ sony_bl_props.dev = NULL;
} else
- sony_backlight_device->props.brightness =
- ops->get_brightness(sony_backlight_device);
+ sony_bl_props.dev->props.brightness =
+ ops->get_brightness(sony_bl_props.dev);
}
static void sony_nc_backlight_cleanup(void)
{
- if (sony_backlight_device)
- backlight_device_unregister(sony_backlight_device);
+ if (sony_bl_props.dev)
+ backlight_device_unregister(sony_bl_props.dev);
}
static int sony_nc_add(struct acpi_device *device)
mutex_lock(&spic_dev.lock);
switch (cmd) {
case SONYPI_IOCGBRT:
- if (sony_backlight_device == NULL) {
+ if (sony_bl_props.dev == NULL) {
ret = -EIO;
break;
}
ret = -EFAULT;
break;
case SONYPI_IOCSBRT:
- if (sony_backlight_device == NULL) {
+ if (sony_bl_props.dev == NULL) {
ret = -EIO;
break;
}
break;
}
/* sync the backlight device status */
- sony_backlight_device->props.brightness =
- sony_backlight_get_brightness(sony_backlight_device);
+ sony_bl_props.dev->props.brightness =
+ sony_backlight_get_brightness(sony_bl_props.dev);
break;
case SONYPI_IOCGBAT1CAP:
if (ec_read16(SONYPI_BAT1_FULL, &val16)) {
};
/* ACPI HIDs */
-#define TPACPI_ACPI_HKEY_HID "IBM0068"
+#define TPACPI_ACPI_IBM_HKEY_HID "IBM0068"
+#define TPACPI_ACPI_LENOVO_HKEY_HID "LEN0068"
#define TPACPI_ACPI_EC_HID "PNP0C09"
/* Input IDs */
}
static const struct acpi_device_id ibm_htk_device_ids[] = {
- {TPACPI_ACPI_HKEY_HID, 0},
+ {TPACPI_ACPI_IBM_HKEY_HID, 0},
+ {TPACPI_ACPI_LENOVO_HKEY_HID, 0},
{"", 0},
};
else
table++;
+ if (route_port == RIO_INVALID_ROUTE)
+ route_port = IDT_DEFAULT_ROUTE;
+
rio_mport_write_config_32(mport, destid, hopcount,
LOCAL_RTE_CONF_DESTID_SEL, table);
rdev->rswitch->em_handle = idtg2_em_handler;
rdev->rswitch->sw_sysfs = idtg2_sysfs;
+ if (do_enum) {
+ /* Ensure that default routing is disabled on startup */
+ rio_write_config_32(rdev,
+ RIO_STD_RTE_DEFAULT_PORT, IDT_NO_ROUTE);
+ }
+
return 0;
}
{
u32 result;
+ if (route_port == RIO_INVALID_ROUTE)
+ route_port = CPS_DEFAULT_ROUTE;
+
if (table == RIO_GLOBAL_TABLE) {
rio_mport_write_config_32(mport, destid, hopcount,
RIO_STD_RTE_CONF_DESTID_SEL_CSR, route_destid);
/* set TVAL = ~50us */
rio_write_config_32(rdev,
rdev->phys_efptr + RIO_PORT_LINKTO_CTL_CSR, 0x8e << 8);
+ /* Ensure that default routing is disabled on startup */
+ rio_write_config_32(rdev,
+ RIO_STD_RTE_DEFAULT_PORT, CPS_NO_ROUTE);
}
return 0;
rdev->rswitch->em_init = tsi57x_em_init;
rdev->rswitch->em_handle = tsi57x_em_handler;
+ if (do_enum) {
+ /* Ensure that default routing is disabled on startup */
+ rio_write_config_32(rdev, RIO_STD_RTE_DEFAULT_PORT,
+ RIO_INVALID_ROUTE);
+ }
+
return 0;
}
#
config RTC_LIB
- tristate
+ bool
menuconfig RTC_CLASS
- tristate "Real Time Clock"
+ bool "Real Time Clock"
default n
depends on !S390
select RTC_LIB
be allowed to plug one or more RTCs to your system. You will
probably want to enable one or more of the interfaces below.
- This driver can also be built as a module. If so, the module
- will be called rtc-core.
-
if RTC_CLASS
config RTC_HCTOSYS
* system's wall clock; restore it on resume().
*/
-static struct timespec delta;
static time_t oldtime;
+static struct timespec oldts;
static int rtc_suspend(struct device *dev, pm_message_t mesg)
{
struct rtc_device *rtc = to_rtc_device(dev);
struct rtc_time tm;
- struct timespec ts = current_kernel_time();
if (strcmp(dev_name(&rtc->dev), CONFIG_RTC_HCTOSYS_DEVICE) != 0)
return 0;
rtc_read_time(rtc, &tm);
+ ktime_get_ts(&oldts);
rtc_tm_to_time(&tm, &oldtime);
- /* RTC precision is 1 second; adjust delta for avg 1/2 sec err */
- set_normalized_timespec(&delta,
- ts.tv_sec - oldtime,
- ts.tv_nsec - (NSEC_PER_SEC >> 1));
-
return 0;
}
struct rtc_time tm;
time_t newtime;
struct timespec time;
+ struct timespec newts;
if (strcmp(dev_name(&rtc->dev), CONFIG_RTC_HCTOSYS_DEVICE) != 0)
return 0;
+ ktime_get_ts(&newts);
rtc_read_time(rtc, &tm);
if (rtc_valid_tm(&tm) != 0) {
pr_debug("%s: bogus resume time\n", dev_name(&rtc->dev));
pr_debug("%s: time travel!\n", dev_name(&rtc->dev));
return 0;
}
+ /* calculate the RTC time delta */
+ set_normalized_timespec(&time, newtime - oldtime, 0);
- /* restore wall clock using delta against this RTC;
- * adjust again for avg 1/2 second RTC sampling error
- */
- set_normalized_timespec(&time,
- newtime + delta.tv_sec,
- (NSEC_PER_SEC >> 1) + delta.tv_nsec);
- do_settimeofday(&time);
+ /* subtract kernel time between rtc_suspend to rtc_resume */
+ time = timespec_sub(time, timespec_sub(newts, oldts));
+ timekeeping_inject_sleeptime(&time);
return 0;
}
}
clk_disable(rtap->clk);
+ platform_set_drvdata(pdev, rtap);
rtap->rtc = rtc_device_register("coh901331", &pdev->dev, &coh901331_ops,
THIS_MODULE);
if (IS_ERR(rtap->rtc)) {
goto out_no_rtc;
}
- platform_set_drvdata(pdev, rtap);
-
return 0;
out_no_rtc:
+ platform_set_drvdata(pdev, NULL);
out_no_clk_enable:
clk_put(rtap->clk);
out_no_clk:
goto fail2;
}
+ platform_set_drvdata(pdev, davinci_rtc);
+
davinci_rtc->rtc = rtc_device_register(pdev->name, &pdev->dev,
&davinci_rtc_ops, THIS_MODULE);
if (IS_ERR(davinci_rtc->rtc)) {
rtcss_write(davinci_rtc, PRTCSS_RTC_CCTRL_CAEN, PRTCSS_RTC_CCTRL);
- platform_set_drvdata(pdev, davinci_rtc);
-
device_init_wakeup(&pdev->dev, 0);
return 0;
fail4:
rtc_device_unregister(davinci_rtc->rtc);
fail3:
+ platform_set_drvdata(pdev, NULL);
iounmap(davinci_rtc->base);
fail2:
release_mem_region(davinci_rtc->pbase, davinci_rtc->base_size);
goto out;
}
spin_lock_init(&priv->lock);
+ platform_set_drvdata(pdev, priv);
rtc = rtc_device_register("ds1286", &pdev->dev,
&ds1286_ops, THIS_MODULE);
if (IS_ERR(rtc)) {
goto out;
}
priv->rtc = rtc;
- platform_set_drvdata(pdev, priv);
return 0;
out:
return -ENXIO;
pdev->dev.platform_data = ep93xx_rtc;
+ platform_set_drvdata(pdev, rtc);
rtc = rtc_device_register(pdev->name,
&pdev->dev, &ep93xx_rtc_ops, THIS_MODULE);
goto exit;
}
- platform_set_drvdata(pdev, rtc);
-
err = sysfs_create_group(&pdev->dev.kobj, &ep93xx_rtc_sysfs_files);
if (err)
goto fail;
return 0;
fail:
- platform_set_drvdata(pdev, NULL);
rtc_device_unregister(rtc);
exit:
+ platform_set_drvdata(pdev, NULL);
pdev->dev.platform_data = NULL;
return err;
}
goto exit;
}
+ clientdata->features = id->driver_data;
+ i2c_set_clientdata(client, clientdata);
+
rtc = rtc_device_register(client->name, &client->dev,
&m41t80_rtc_ops, THIS_MODULE);
if (IS_ERR(rtc)) {
}
clientdata->rtc = rtc;
- clientdata->features = id->driver_data;
- i2c_set_clientdata(client, clientdata);
/* Make sure HT (Halt Update) bit is cleared */
rc = i2c_smbus_read_byte_data(client, M41T80_REG_ALARM_HOUR);
goto out_irq;
}
+ dev_set_drvdata(&pdev->dev, info);
+ /* XXX - isn't this redundant? */
+ platform_set_drvdata(pdev, info);
+
info->rtc_dev = rtc_device_register("max8925-rtc", &pdev->dev,
&max8925_rtc_ops, THIS_MODULE);
ret = PTR_ERR(info->rtc_dev);
goto out_rtc;
}
- dev_set_drvdata(&pdev->dev, info);
- platform_set_drvdata(pdev, info);
-
return 0;
out_rtc:
+ platform_set_drvdata(pdev, NULL);
free_irq(chip->irq_base + MAX8925_IRQ_RTC_ALARM0, info);
out_irq:
kfree(info);
info->rtc = max8998->rtc;
info->irq = max8998->irq_base + MAX8998_IRQ_ALARM0;
+ platform_set_drvdata(pdev, info);
+
info->rtc_dev = rtc_device_register("max8998-rtc", &pdev->dev,
&max8998_rtc_ops, THIS_MODULE);
goto out_rtc;
}
- platform_set_drvdata(pdev, info);
-
ret = request_threaded_irq(info->irq, NULL, max8998_rtc_alarm_irq, 0,
"rtc-alarm0", info);
return 0;
out_rtc:
+ platform_set_drvdata(pdev, NULL);
kfree(info);
return ret;
}
if (ret)
goto err_alarm_irq_request;
+ mc13xxx_unlock(mc13xxx);
+
priv->rtc = rtc_device_register(pdev->name,
&pdev->dev, &mc13xxx_rtc_ops, THIS_MODULE);
if (IS_ERR(priv->rtc)) {
ret = PTR_ERR(priv->rtc);
+ mc13xxx_lock(mc13xxx);
+
mc13xxx_irq_free(mc13xxx, MC13XXX_IRQ_TODA, priv);
err_alarm_irq_request:
mc13xxx_irq_free(mc13xxx, MC13XXX_IRQ_RTCRST, priv);
err_reset_irq_request:
+ mc13xxx_unlock(mc13xxx);
+
platform_set_drvdata(pdev, NULL);
kfree(priv);
}
- mc13xxx_unlock(mc13xxx);
-
return ret;
}
error = -ENOMEM;
goto out_free_priv;
}
+ platform_set_drvdata(dev, priv);
rtc = rtc_device_register("rtc-msm6242", &dev->dev, &msm6242_rtc_ops,
THIS_MODULE);
}
priv->rtc = rtc;
- platform_set_drvdata(dev, priv);
return 0;
out_unmap:
+ platform_set_drvdata(dev, NULL);
iounmap(priv->regs);
out_free_priv:
kfree(priv);
goto exit_put_clk;
}
- rtc = rtc_device_register(pdev->name, &pdev->dev, &mxc_rtc_ops,
- THIS_MODULE);
- if (IS_ERR(rtc)) {
- ret = PTR_ERR(rtc);
- goto exit_put_clk;
- }
-
- pdata->rtc = rtc;
platform_set_drvdata(pdev, pdata);
/* Configure and enable the RTC */
pdata->irq = -1;
}
+ rtc = rtc_device_register(pdev->name, &pdev->dev, &mxc_rtc_ops,
+ THIS_MODULE);
+ if (IS_ERR(rtc)) {
+ ret = PTR_ERR(rtc);
+ goto exit_clr_drvdata;
+ }
+
+ pdata->rtc = rtc;
+
return 0;
+exit_clr_drvdata:
+ platform_set_drvdata(pdev, NULL);
exit_put_clk:
clk_disable(pdata->clk);
clk_put(pdata->clk);
pcap_rtc->pcap = dev_get_drvdata(pdev->dev.parent);
+ platform_set_drvdata(pdev, pcap_rtc);
+
pcap_rtc->rtc = rtc_device_register("pcap", &pdev->dev,
&pcap_rtc_ops, THIS_MODULE);
if (IS_ERR(pcap_rtc->rtc)) {
goto fail_rtc;
}
- platform_set_drvdata(pdev, pcap_rtc);
timer_irq = pcap_to_irq(pcap_rtc->pcap, PCAP_IRQ_1HZ);
alarm_irq = pcap_to_irq(pcap_rtc->pcap, PCAP_IRQ_TODA);
fail_timer:
rtc_device_unregister(pcap_rtc->rtc);
fail_rtc:
+ platform_set_drvdata(pdev, NULL);
kfree(pcap_rtc);
return err;
}
spin_lock_init(&priv->lock);
+ platform_set_drvdata(dev, priv);
+
rtc = rtc_device_register("rtc-rp5c01", &dev->dev, &rp5c01_rtc_ops,
THIS_MODULE);
if (IS_ERR(rtc)) {
error = PTR_ERR(rtc);
goto out_unmap;
}
-
priv->rtc = rtc;
- platform_set_drvdata(dev, priv);
error = sysfs_create_bin_file(&dev->dev.kobj, &priv->nvram_attr);
if (error)
out_unregister:
rtc_device_unregister(rtc);
out_unmap:
+ platform_set_drvdata(dev, NULL);
iounmap(priv->regs);
out_free_priv:
kfree(priv);
static void __iomem *s3c_rtc_base;
static int s3c_rtc_alarmno = NO_IRQ;
static int s3c_rtc_tickno = NO_IRQ;
+static bool wake_en;
static enum s3c_cpu_type s3c_rtc_cpu_type;
static DEFINE_SPINLOCK(s3c_rtc_pie_lock);
}
s3c_rtc_enable(pdev, 0);
- if (device_may_wakeup(&pdev->dev))
- enable_irq_wake(s3c_rtc_alarmno);
+ if (device_may_wakeup(&pdev->dev) && !wake_en) {
+ if (enable_irq_wake(s3c_rtc_alarmno) == 0)
+ wake_en = true;
+ else
+ dev_err(&pdev->dev, "enable_irq_wake failed\n");
+ }
return 0;
}
writew(tmp | ticnt_en_save, s3c_rtc_base + S3C2410_RTCCON);
}
- if (device_may_wakeup(&pdev->dev))
+ if (device_may_wakeup(&pdev->dev) && wake_en) {
disable_irq_wake(s3c_rtc_alarmno);
+ wake_en = false;
+ }
return 0;
}
static inline int _dasd_term_running_cqr(struct dasd_device *device)
{
struct dasd_ccw_req *cqr;
+ int rc;
if (list_empty(&device->ccw_queue))
return 0;
cqr = list_entry(device->ccw_queue.next, struct dasd_ccw_req, devlist);
- return device->discipline->term_IO(cqr);
+ rc = device->discipline->term_IO(cqr);
+ if (!rc)
+ /*
+ * CQR terminated because a more important request is pending.
+ * Undo decreasing of retry counter because this is
+ * not an error case.
+ */
+ cqr->retries++;
+ return rc;
}
int dasd_sleep_on_immediatly(struct dasd_ccw_req *cqr)
static int dasd_open(struct block_device *bdev, fmode_t mode)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
struct dasd_device *base;
int rc;
- if (!block)
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
return -ENODEV;
- base = block->base;
- atomic_inc(&block->open_count);
+ atomic_inc(&base->block->open_count);
if (test_bit(DASD_FLAG_OFFLINE, &base->flags)) {
rc = -ENODEV;
goto unlock;
goto out;
}
+ dasd_put_device(base);
return 0;
out:
module_put(base->discipline->owner);
unlock:
- atomic_dec(&block->open_count);
+ atomic_dec(&base->block->open_count);
+ dasd_put_device(base);
return rc;
}
static int dasd_release(struct gendisk *disk, fmode_t mode)
{
- struct dasd_block *block = disk->private_data;
+ struct dasd_device *base;
- atomic_dec(&block->open_count);
- module_put(block->base->discipline->owner);
+ base = dasd_device_from_gendisk(disk);
+ if (!base)
+ return -ENODEV;
+
+ atomic_dec(&base->block->open_count);
+ module_put(base->discipline->owner);
+ dasd_put_device(base);
return 0;
}
*/
static int dasd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
- struct dasd_block *block;
struct dasd_device *base;
- block = bdev->bd_disk->private_data;
- if (!block)
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
return -ENODEV;
- base = block->base;
if (!base->discipline ||
- !base->discipline->fill_geometry)
+ !base->discipline->fill_geometry) {
+ dasd_put_device(base);
return -EINVAL;
-
- base->discipline->fill_geometry(block, geo);
- geo->start = get_start_sect(bdev) >> block->s2b_shift;
+ }
+ base->discipline->fill_geometry(base->block, geo);
+ geo->start = get_start_sect(bdev) >> base->block->s2b_shift;
+ dasd_put_device(base);
return 0;
}
dasd_set_target_state(device, DASD_STATE_NEW);
/* dasd_delete_device destroys the device reference. */
block = device->block;
- device->block = NULL;
dasd_delete_device(device);
/*
* life cycle of block is bound to device, so delete it after
dasd_set_target_state(device, DASD_STATE_NEW);
/* dasd_delete_device destroys the device reference. */
block = device->block;
- device->block = NULL;
dasd_delete_device(device);
/*
* life cycle of block is bound to device, so delete it after
return device;
}
+void dasd_add_link_to_gendisk(struct gendisk *gdp, struct dasd_device *device)
+{
+ struct dasd_devmap *devmap;
+
+ devmap = dasd_find_busid(dev_name(&device->cdev->dev));
+ if (IS_ERR(devmap))
+ return;
+ spin_lock(&dasd_devmap_lock);
+ gdp->private_data = devmap;
+ spin_unlock(&dasd_devmap_lock);
+}
+
+struct dasd_device *dasd_device_from_gendisk(struct gendisk *gdp)
+{
+ struct dasd_device *device;
+ struct dasd_devmap *devmap;
+
+ if (!gdp->private_data)
+ return NULL;
+ device = NULL;
+ spin_lock(&dasd_devmap_lock);
+ devmap = gdp->private_data;
+ if (devmap && devmap->device) {
+ device = devmap->device;
+ dasd_get_device(device);
+ }
+ spin_unlock(&dasd_devmap_lock);
+ return device;
+}
+
/*
* SECTION: files in sysfs
*/
addr_t ip;
int rc;
- kstat_cpu(smp_processor_id()).irqs[EXTINT_DSD]++;
switch (ext_int_code >> 24) {
case DASD_DIAG_CODE_31BIT:
ip = (addr_t) param32;
default:
return;
}
+ kstat_cpu(smp_processor_id()).irqs[EXTINT_DSD]++;
if (!ip) { /* no intparm: unsolicited interrupt */
DBF_EVENT(DBF_NOTICE, "%s", "caught unsolicited "
"interrupt");
return;
/* summary unit check */
- if ((sense[7] == 0x0D) &&
+ if ((sense[27] & DASD_SENSE_BIT_0) && (sense[7] == 0x0D) &&
(scsw_dstat(&irb->scsw) & DEV_STAT_UNIT_CHECK)) {
dasd_alias_handle_summary_unit_check(device, irb);
return;
/* loss of device reservation is handled via base devices only
* as alias devices may be used with several bases
*/
- if (device->block && (sense[7] == 0x3F) &&
+ if (device->block && (sense[27] & DASD_SENSE_BIT_0) &&
+ (sense[7] == 0x3F) &&
(scsw_dstat(&irb->scsw) & DEV_STAT_UNIT_CHECK) &&
test_bit(DASD_FLAG_IS_RESERVED, &device->flags)) {
if (device->features & DASD_FEATURE_FAILONSLCK)
if (base->features & DASD_FEATURE_READONLY ||
test_bit(DASD_FLAG_DEVICE_RO, &base->flags))
set_disk_ro(gdp, 1);
- gdp->private_data = block;
+ dasd_add_link_to_gendisk(gdp, base);
gdp->queue = block->request_queue;
block->gdp = gdp;
set_capacity(block->gdp, 0);
struct dasd_device *dasd_device_from_cdev_locked(struct ccw_device *);
struct dasd_device *dasd_device_from_devindex(int);
+void dasd_add_link_to_gendisk(struct gendisk *, struct dasd_device *);
+struct dasd_device *dasd_device_from_gendisk(struct gendisk *);
+
int dasd_parse(void);
int dasd_busid_known(const char *);
static int
dasd_ioctl_enable(struct block_device *bdev)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
+ struct dasd_device *base;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
- dasd_enable_device(block->base);
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
+ return -ENODEV;
+
+ dasd_enable_device(base);
/* Formatting the dasd device can change the capacity. */
mutex_lock(&bdev->bd_mutex);
- i_size_write(bdev->bd_inode, (loff_t)get_capacity(block->gdp) << 9);
+ i_size_write(bdev->bd_inode,
+ (loff_t)get_capacity(base->block->gdp) << 9);
mutex_unlock(&bdev->bd_mutex);
+ dasd_put_device(base);
return 0;
}
static int
dasd_ioctl_disable(struct block_device *bdev)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
+ struct dasd_device *base;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
+ return -ENODEV;
/*
* Man this is sick. We don't do a real disable but only downgrade
* the device to DASD_STATE_BASIC. The reason is that dasdfmt uses
* using the BIODASDFMT ioctl. Therefore the correct state for the
* device is DASD_STATE_BASIC that allows to do basic i/o.
*/
- dasd_set_target_state(block->base, DASD_STATE_BASIC);
+ dasd_set_target_state(base, DASD_STATE_BASIC);
/*
* Set i_size to zero, since read, write, etc. check against this
* value.
mutex_lock(&bdev->bd_mutex);
i_size_write(bdev->bd_inode, 0);
mutex_unlock(&bdev->bd_mutex);
+ dasd_put_device(base);
return 0;
}
static int
dasd_ioctl_format(struct block_device *bdev, void __user *argp)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
+ struct dasd_device *base;
struct format_data_t fdata;
+ int rc;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (!argp)
return -EINVAL;
-
- if (block->base->features & DASD_FEATURE_READONLY ||
- test_bit(DASD_FLAG_DEVICE_RO, &block->base->flags))
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
+ return -ENODEV;
+ if (base->features & DASD_FEATURE_READONLY ||
+ test_bit(DASD_FLAG_DEVICE_RO, &base->flags)) {
+ dasd_put_device(base);
return -EROFS;
- if (copy_from_user(&fdata, argp, sizeof(struct format_data_t)))
+ }
+ if (copy_from_user(&fdata, argp, sizeof(struct format_data_t))) {
+ dasd_put_device(base);
return -EFAULT;
+ }
if (bdev != bdev->bd_contains) {
pr_warning("%s: The specified DASD is a partition and cannot "
"be formatted\n",
- dev_name(&block->base->cdev->dev));
+ dev_name(&base->cdev->dev));
+ dasd_put_device(base);
return -EINVAL;
}
- return dasd_format(block, &fdata);
+ rc = dasd_format(base->block, &fdata);
+ dasd_put_device(base);
+ return rc;
}
#ifdef CONFIG_DASD_PROFILE
static int
dasd_ioctl_set_ro(struct block_device *bdev, void __user *argp)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
- int intval;
+ struct dasd_device *base;
+ int intval, rc;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
return -EINVAL;
if (get_user(intval, (int __user *)argp))
return -EFAULT;
- if (!intval && test_bit(DASD_FLAG_DEVICE_RO, &block->base->flags))
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
+ return -ENODEV;
+ if (!intval && test_bit(DASD_FLAG_DEVICE_RO, &base->flags)) {
+ dasd_put_device(base);
return -EROFS;
+ }
set_disk_ro(bdev->bd_disk, intval);
- return dasd_set_feature(block->base->cdev, DASD_FEATURE_READONLY, intval);
+ rc = dasd_set_feature(base->cdev, DASD_FEATURE_READONLY, intval);
+ dasd_put_device(base);
+ return rc;
}
static int dasd_ioctl_readall_cmb(struct dasd_block *block, unsigned int cmd,
int dasd_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
- struct dasd_block *block = bdev->bd_disk->private_data;
+ struct dasd_block *block;
+ struct dasd_device *base;
void __user *argp;
+ int rc;
if (is_compat_task())
argp = compat_ptr(arg);
else
argp = (void __user *)arg;
- if (!block)
- return -ENODEV;
-
if ((_IOC_DIR(cmd) != _IOC_NONE) && !arg) {
PRINT_DEBUG("empty data ptr");
return -EINVAL;
}
+ base = dasd_device_from_gendisk(bdev->bd_disk);
+ if (!base)
+ return -ENODEV;
+ block = base->block;
+ rc = 0;
switch (cmd) {
case BIODASDDISABLE:
- return dasd_ioctl_disable(bdev);
+ rc = dasd_ioctl_disable(bdev);
+ break;
case BIODASDENABLE:
- return dasd_ioctl_enable(bdev);
+ rc = dasd_ioctl_enable(bdev);
+ break;
case BIODASDQUIESCE:
- return dasd_ioctl_quiesce(block);
+ rc = dasd_ioctl_quiesce(block);
+ break;
case BIODASDRESUME:
- return dasd_ioctl_resume(block);
+ rc = dasd_ioctl_resume(block);
+ break;
case BIODASDFMT:
- return dasd_ioctl_format(bdev, argp);
+ rc = dasd_ioctl_format(bdev, argp);
+ break;
case BIODASDINFO:
- return dasd_ioctl_information(block, cmd, argp);
+ rc = dasd_ioctl_information(block, cmd, argp);
+ break;
case BIODASDINFO2:
- return dasd_ioctl_information(block, cmd, argp);
+ rc = dasd_ioctl_information(block, cmd, argp);
+ break;
case BIODASDPRRD:
- return dasd_ioctl_read_profile(block, argp);
+ rc = dasd_ioctl_read_profile(block, argp);
+ break;
case BIODASDPRRST:
- return dasd_ioctl_reset_profile(block);
+ rc = dasd_ioctl_reset_profile(block);
+ break;
case BLKROSET:
- return dasd_ioctl_set_ro(bdev, argp);
+ rc = dasd_ioctl_set_ro(bdev, argp);
+ break;
case DASDAPIVER:
- return dasd_ioctl_api_version(argp);
+ rc = dasd_ioctl_api_version(argp);
+ break;
case BIODASDCMFENABLE:
- return enable_cmf(block->base->cdev);
+ rc = enable_cmf(base->cdev);
+ break;
case BIODASDCMFDISABLE:
- return disable_cmf(block->base->cdev);
+ rc = disable_cmf(base->cdev);
+ break;
case BIODASDREADALLCMB:
- return dasd_ioctl_readall_cmb(block, cmd, argp);
+ rc = dasd_ioctl_readall_cmb(block, cmd, argp);
+ break;
default:
/* if the discipline has an ioctl method try it. */
- if (block->base->discipline->ioctl) {
- int rval = block->base->discipline->ioctl(block, cmd, argp);
- if (rval != -ENOIOCTLCMD)
- return rval;
- }
-
- return -EINVAL;
+ if (base->discipline->ioctl) {
+ rc = base->discipline->ioctl(block, cmd, argp);
+ if (rc == -ENOIOCTLCMD)
+ rc = -EINVAL;
+ } else
+ rc = -EINVAL;
}
+ dasd_put_device(base);
+ return rc;
}
return;
new_incr->rn = rn;
new_incr->standby = standby;
+ if (!standby)
+ new_incr->usecount = 1;
last_rn = 0;
prev = &sclp_mem_list;
list_for_each_entry(incr, &sclp_mem_list, list) {
disk->major = tapeblock_major;
disk->first_minor = device->first_minor;
disk->fops = &tapeblock_fops;
- disk->events = DISK_EVENT_MEDIA_CHANGE;
disk->private_data = tape_get_device(device);
disk->queue = blkdat->request_queue;
set_capacity(disk, 0);
q->q_stats.nr_sbals[pos]++;
}
-static void announce_buffer_error(struct qdio_q *q, int count)
+static void process_buffer_error(struct qdio_q *q, int count)
{
+ unsigned char state = (q->is_input_q) ? SLSB_P_INPUT_NOT_INIT :
+ SLSB_P_OUTPUT_NOT_INIT;
+
q->qdio_error |= QDIO_ERROR_SLSB_STATE;
/* special handling for no target buffer empty */
DBF_ERROR("F14:%2x F15:%2x",
q->sbal[q->first_to_check]->element[14].flags & 0xff,
q->sbal[q->first_to_check]->element[15].flags & 0xff);
+
+ /*
+ * Interrupts may be avoided as long as the error is present
+ * so change the buffer state immediately to avoid starvation.
+ */
+ set_buf_states(q, q->first_to_check, state, count);
}
static inline void inbound_primed(struct qdio_q *q, int count)
account_sbals(q, count);
break;
case SLSB_P_INPUT_ERROR:
- announce_buffer_error(q, count);
- /* process the buffer, the upper layer will take care of it */
+ process_buffer_error(q, count);
q->first_to_check = add_buf(q->first_to_check, count);
atomic_sub(count, &q->nr_buf_used);
if (q->irq_ptr->perf_stat_enabled)
account_sbals(q, count);
break;
case SLSB_P_OUTPUT_ERROR:
- announce_buffer_error(q, count);
- /* process the buffer, the upper layer will take care of it */
+ process_buffer_error(q, count);
q->first_to_check = add_buf(q->first_to_check, count);
atomic_sub(count, &q->nr_buf_used);
if (q->irq_ptr->perf_stat_enabled)
u16 subcode;
u32 param;
- kstat_cpu(smp_processor_id()).irqs[EXTINT_VRT]++;
subcode = ext_int_code >> 16;
if ((subcode & 0xff00) != VIRTIO_SUBCODE_64)
return;
+ kstat_cpu(smp_processor_id()).irqs[EXTINT_VRT]++;
/* The LSB might be overloaded, we have to mask it */
vq = (struct virtqueue *)(param64 & ~1UL);
unsigned long flags;
struct scsi_device *sdev;
struct scsi_device_handler *scsi_dh = NULL;
+ struct device *dev = NULL;
spin_lock_irqsave(q->queue_lock, flags);
sdev = q->queuedata;
if (sdev && sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
- if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+ dev = get_device(&sdev->sdev_gendev);
+ if (!scsi_dh || !dev ||
sdev->sdev_state == SDEV_CANCEL ||
sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
if (err) {
if (fn)
fn(data, err);
- return err;
+ goto out;
}
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
- put_device(&sdev->sdev_gendev);
+out:
+ put_device(dev);
return err;
}
EXPORT_SYMBOL_GPL(scsi_dh_activate);
goto out;
}
+ /* Check for overflow and wraparound */
+ if (karg.data_sge_offset * 4 > ioc->request_sz ||
+ karg.data_sge_offset > (UINT_MAX / 4)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
/* copy in request message frame from user */
if (copy_from_user(mpi_request, mf, karg.data_sge_offset*4)) {
printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__, __LINE__,
Mpi2DiagBufferPostReply_t *mpi_reply;
int rc, i;
u8 buffer_type;
- unsigned long timeleft;
+ unsigned long timeleft, request_size, copy_size;
u16 smid;
u16 ioc_status;
u8 issue_reset = 0;
return -ENOMEM;
}
+ request_size = ioc->diag_buffer_sz[buffer_type];
+
if ((karg.starting_offset % 4) || (karg.bytes_to_read % 4)) {
printk(MPT2SAS_ERR_FMT "%s: either the starting_offset "
"or bytes_to_read are not 4 byte aligned\n", ioc->name,
return -EINVAL;
}
+ if (karg.starting_offset > request_size)
+ return -EINVAL;
+
diag_data = (void *)(request_data + karg.starting_offset);
dctlprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: diag_buffer(%p), "
"offset(%d), sz(%d)\n", ioc->name, __func__,
diag_data, karg.starting_offset, karg.bytes_to_read));
+ /* Truncate data on requests that are too large */
+ if ((diag_data + karg.bytes_to_read < diag_data) ||
+ (diag_data + karg.bytes_to_read > request_data + request_size))
+ copy_size = request_size - karg.starting_offset;
+ else
+ copy_size = karg.bytes_to_read;
+
if (copy_to_user((void __user *)uarg->diagnostic_data,
- diag_data, karg.bytes_to_read)) {
+ diag_data, copy_size)) {
printk(MPT2SAS_ERR_FMT "%s: Unable to write "
"mpt_diag_read_buffer_t data @ %p\n", ioc->name,
__func__, diag_data);
rc = -EFAULT;
goto out_free_buffer;
}
+ } else if (request_size < 0) {
+ rc = -EINVAL;
+ goto out_free_buffer;
}
/* check if we have any additional command parameters */
.use_clustering = ENABLE_CLUSTERING,
};
+static const struct of_device_id qpti_match[];
static int __devinit qpti_sbus_probe(struct platform_device *op)
{
+ const struct of_device_id *match;
struct scsi_host_template *tpnt;
struct device_node *dp = op->dev.of_node;
struct Scsi_Host *host;
static int nqptis;
const char *fcode;
- if (!op->dev.of_match)
+ match = of_match_device(qpti_match, &op->dev);
+ if (!match)
return -EINVAL;
- tpnt = op->dev.of_match->data;
+ tpnt = match->data;
/* Sometimes Antares cards come up not completely
* setup, and we get a report of a zero IRQ.
*/
#define SCSI_QUEUE_DELAY 3
-static void scsi_run_queue(struct request_queue *q);
-
/*
* Function: scsi_unprep_request()
*
blk_requeue_request(q, cmd->request);
spin_unlock_irqrestore(q->queue_lock, flags);
- scsi_run_queue(q);
+ kblockd_schedule_work(q, &device->requeue_work);
return 0;
}
static void scsi_run_queue(struct request_queue *q)
{
struct scsi_device *sdev = q->queuedata;
- struct Scsi_Host *shost = sdev->host;
+ struct Scsi_Host *shost;
LIST_HEAD(starved_list);
unsigned long flags;
+ /* if the device is dead, sdev will be NULL, so no queue to run */
+ if (!sdev)
+ return;
+
+ shost = sdev->host;
if (scsi_target(sdev)->single_lun)
scsi_single_lun_run(sdev);
continue;
}
- blk_run_queue_async(sdev->request_queue);
+ spin_unlock(shost->host_lock);
+ spin_lock(sdev->request_queue->queue_lock);
+ __blk_run_queue(sdev->request_queue);
+ spin_unlock(sdev->request_queue->queue_lock);
+ spin_lock(shost->host_lock);
}
/* put any unprocessed entries back */
list_splice(&starved_list, &shost->starved_list);
blk_run_queue(q);
}
+void scsi_requeue_run_queue(struct work_struct *work)
+{
+ struct scsi_device *sdev;
+ struct request_queue *q;
+
+ sdev = container_of(work, struct scsi_device, requeue_work);
+ q = sdev->request_queue;
+ scsi_run_queue(q);
+}
+
/*
* Function: scsi_requeue_command()
*
int display_failure_msg = 1, ret;
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
extern void scsi_evt_thread(struct work_struct *work);
+ extern void scsi_requeue_run_queue(struct work_struct *work);
sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
GFP_ATOMIC);
INIT_LIST_HEAD(&sdev->event_list);
spin_lock_init(&sdev->list_lock);
INIT_WORK(&sdev->event_work, scsi_evt_thread);
+ INIT_WORK(&sdev->requeue_work, scsi_requeue_run_queue);
sdev->sdev_gendev.parent = get_device(&starget->dev);
sdev->sdev_target = starget;
kfree(evt);
}
- if (sdev->request_queue) {
- sdev->request_queue->queuedata = NULL;
- /* user context needed to free queue */
- scsi_free_queue(sdev->request_queue);
- /* temporary expedient, try to catch use of queue lock
- * after free of sdev */
- sdev->request_queue = NULL;
- }
+ /* NULL queue means the device can't be used */
+ sdev->request_queue = NULL;
scsi_target_reap(scsi_target(sdev));
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
+
+ /* cause the request function to reject all I/O requests */
+ sdev->request_queue->queuedata = NULL;
+
+ /* Freeing the queue signals to block that we're done */
+ scsi_free_queue(sdev->request_queue);
put_device(dev);
}
static int ssb_pci_sprom_get(struct ssb_bus *bus,
struct ssb_sprom *sprom)
{
- const struct ssb_sprom *fallback;
int err;
u16 *buf;
if (err) {
/* All CRC attempts failed.
* Maybe there is no SPROM on the device?
- * If we have a fallback, use that. */
- fallback = ssb_get_fallback_sprom();
- if (fallback) {
- memcpy(sprom, fallback, sizeof(*sprom));
+ * Now we ask the arch code if there is some sprom
+ * available for this device in some other storage */
+ err = ssb_fill_sprom_with_fallback(bus, sprom);
+ if (err) {
+ ssb_printk(KERN_WARNING PFX "WARNING: Using"
+ " fallback SPROM failed (err %d)\n",
+ err);
+ } else {
+ ssb_dprintk(KERN_DEBUG PFX "Using SPROM"
+ " revision %d provided by"
+ " platform.\n", sprom->revision);
err = 0;
goto out_free;
}
#include <linux/slab.h>
-static const struct ssb_sprom *fallback_sprom;
+static int(*get_fallback_sprom)(struct ssb_bus *dev, struct ssb_sprom *out);
static int sprom2hex(const u16 *sprom, char *buf, size_t buf_len,
}
/**
- * ssb_arch_set_fallback_sprom - Set a fallback SPROM for use if no SPROM is found.
+ * ssb_arch_register_fallback_sprom - Registers a method providing a
+ * fallback SPROM if no SPROM is found.
*
- * @sprom: The SPROM data structure to register.
+ * @sprom_callback: The callback function.
*
- * With this function the architecture implementation may register a fallback
- * SPROM data structure. The fallback is only used for PCI based SSB devices,
- * where no valid SPROM can be found in the shadow registers.
+ * With this function the architecture implementation may register a
+ * callback handler which fills the SPROM data structure. The fallback is
+ * only used for PCI based SSB devices, where no valid SPROM can be found
+ * in the shadow registers.
*
- * This function is useful for weird architectures that have a half-assed SSB device
- * hardwired to their PCI bus.
+ * This function is useful for weird architectures that have a half-assed
+ * SSB device hardwired to their PCI bus.
*
- * Note that it does only work with PCI attached SSB devices. PCMCIA devices currently
- * don't use this fallback.
- * Architectures must provide the SPROM for native SSB devices anyway,
- * so the fallback also isn't used for native devices.
+ * Note that it does only work with PCI attached SSB devices. PCMCIA
+ * devices currently don't use this fallback.
+ * Architectures must provide the SPROM for native SSB devices anyway, so
+ * the fallback also isn't used for native devices.
*
- * This function is available for architecture code, only. So it is not exported.
+ * This function is available for architecture code, only. So it is not
+ * exported.
*/
-int ssb_arch_set_fallback_sprom(const struct ssb_sprom *sprom)
+int ssb_arch_register_fallback_sprom(int (*sprom_callback)(struct ssb_bus *bus,
+ struct ssb_sprom *out))
{
- if (fallback_sprom)
+ if (get_fallback_sprom)
return -EEXIST;
- fallback_sprom = sprom;
+ get_fallback_sprom = sprom_callback;
return 0;
}
-const struct ssb_sprom *ssb_get_fallback_sprom(void)
+int ssb_fill_sprom_with_fallback(struct ssb_bus *bus, struct ssb_sprom *out)
{
- return fallback_sprom;
+ if (!get_fallback_sprom)
+ return -ENOENT;
+
+ return get_fallback_sprom(bus, out);
}
/* http://bcm-v4.sipsolutions.net/802.11/IsSpromAvailable */
const char *buf, size_t count,
int (*sprom_check_crc)(const u16 *sprom, size_t size),
int (*sprom_write)(struct ssb_bus *bus, const u16 *sprom));
-extern const struct ssb_sprom *ssb_get_fallback_sprom(void);
+extern int ssb_fill_sprom_with_fallback(struct ssb_bus *bus,
+ struct ssb_sprom *out);
/* core.c */
free_netdev(dev);
return NULL;
}
-
-EXPORT_SYMBOL(init_ft1000_card);
-EXPORT_SYMBOL(stop_ft1000_card);
-EXPORT_SYMBOL(flarion_ft1000_cnt);
remove_proc_entry(FT1000_PROC, init_net.proc_net);
unregister_netdevice_notifier(&ft1000_netdev_notifier);
}
-
-EXPORT_SYMBOL(ft1000InitProc);
-EXPORT_SYMBOL(ft1000CleanupProc);
config DRM_PSB
tristate "Intel GMA500 KMS Framebuffer"
- depends on DRM && PCI
+ depends on DRM && PCI && X86
select FB_CFB_COPYAREA
select FB_CFB_FILLRECT
select FB_CFB_IMAGEBLIT
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/pci.h>
+#include <linux/delay.h>
#include <linux/file.h>
#include <asm/mrst.h>
#include <sound/pcm.h>
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/pci.h>
+#include <linux/delay.h>
#include <linux/file.h>
#include "intel_sst.h"
#include "intelmid_snd_control.h"
*/
#include <linux/cs5535.h>
#include <linux/gpio.h>
+#include <linux/delay.h>
#include <asm/olpc.h>
#include "olpc_dcon.h"
return (NDIS_STATUS_FAILURE);
}
}
- /* Drop not U2M frames, can't's drop here because we will drop beacon in this case */
+ /* Drop not U2M frames, can't drop here because we will drop beacon in this case */
/* I am kind of doubting the U2M bit operation */
/* if (pRxD->U2M == 0) */
/* return(NDIS_STATUS_FAILURE); */
DBGPRINT_RAW(RT_DEBUG_ERROR, ("received packet too long\n"));
return NDIS_STATUS_FAILURE;
}
- /* Drop not U2M frames, can't's drop here because we will drop beacon in this case */
+ /* Drop not U2M frames, can't drop here because we will drop beacon in this case */
/* I am kind of doubting the U2M bit operation */
/* if (pRxD->U2M == 0) */
/* return(NDIS_STATUS_FAILURE); */
#define RTSX_STOR "rts_pstor: "
-#if CONFIG_RTS_PSTOR_DEBUG
+#ifdef CONFIG_RTS_PSTOR_DEBUG
#define RTSX_DEBUGP(x...) printk(KERN_DEBUG RTSX_STOR x)
#define RTSX_DEBUGPN(x...) printk(KERN_DEBUG x)
#define RTSX_DEBUGPX(x...) printk(x)
#include <linux/blkdev.h>
#include <linux/kthread.h>
#include <linux/sched.h>
+#include <linux/vmalloc.h>
#include "rtsx.h"
#include "rtsx_transport.h"
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/workqueue.h>
+#include <linux/vmalloc.h>
#include "rtsx.h"
#include "rtsx_transport.h"
#ifdef SUPPORT_OCP
if (CHECK_LUN_MODE(chip, SD_MS_2LUN)) {
- #if CONFIG_RTS_PSTOR_DEBUG
+#ifdef CONFIG_RTS_PSTOR_DEBUG
if (chip->ocp_stat & (SD_OC_NOW | SD_OC_EVER | MS_OC_NOW | MS_OC_EVER)) {
RTSX_DEBUGP("Over current, OCPSTAT is 0x%x\n", chip->ocp_stat);
}
- #endif
+#endif
if (chip->ocp_stat & (SD_OC_NOW | SD_OC_EVER)) {
if (chip->card_exist & SD_CARD) {
#include <linux/blkdev.h>
#include <linux/kthread.h>
#include <linux/sched.h>
+#include <linux/vmalloc.h>
#include "rtsx.h"
#include "rtsx_transport.h"
RTSX_WRITE_REG(chip, SD_VPCLK0_CTL, PHASE_NOT_RESET, PHASE_NOT_RESET);
RTSX_WRITE_REG(chip, CLK_CTL, CHANGE_CLK, 0);
} else {
-#if CONFIG_RTS_PSTOR_DEBUG
+#ifdef CONFIG_RTS_PSTOR_DEBUG
rtsx_read_register(chip, SD_VP_CTL, &val);
RTSX_DEBUGP("SD_VP_CTL: 0x%x\n", val);
rtsx_read_register(chip, SD_DCMPS_CTL, &val);
return STATUS_SUCCESS;
Fail:
-#if CONFIG_RTS_PSTOR_DEBUG
+#ifdef CONFIG_RTS_PSTOR_DEBUG
rtsx_read_register(chip, SD_VP_CTL, &val);
RTSX_DEBUGP("SD_VP_CTL: 0x%x\n", val);
rtsx_read_register(chip, SD_DCMPS_CTL, &val);
#define TRACE_GOTO(chip, label) goto label
#endif
-#if CONFIG_RTS_PSTOR_DEBUG
+#ifdef CONFIG_RTS_PSTOR_DEBUG
static inline void rtsx_dump(u8 *buf, int buf_len)
{
int i;
#include <linux/blkdev.h>
#include <linux/kthread.h>
#include <linux/sched.h>
+#include <linux/vmalloc.h>
#include "rtsx.h"
#include "rtsx_transport.h"
tristate "Softlogic 6x10 MPEG codec cards"
depends on PCI && VIDEO_DEV && SND && I2C
select VIDEOBUF_DMA_SG
+ select SND_PCM
---help---
This driver supports the Softlogic based MPEG-4 and h.264 codec
codec cards.
}
dev->queue->queuedata = dev;
- /* As Linux block layer does't support >4KB hardware sector, */
+ /* As Linux block layer doesn't support >4KB hardware sector, */
/* Here we force report 512 byte hardware sector size to Kernel */
blk_queue_logical_block_size(dev->queue, 512);
* as a temporary for .dllview record construction.
* Allocate storage for the whole table. Add 1 to the section count
* in case a trampoline section is auto-generated as well as the
- * size of the trampoline section name so DLLView does't get lost.
+ * size of the trampoline section name so DLLView doesn't get lost.
*/
siz = sym_count * sizeof(struct local_symbol);
spin_lock_irqsave(&bp->lock, flags);
sx_out(bp, CD186x_CAR, port_No(port));
- /* The Specialix board does't implement the RTS lines.
+ /* The Specialix board doesn't implement the RTS lines.
They are used to set the IRQ level. Don't touch them. */
if (sx_crtscts(tty))
port->MSVR = MSVR_DTR | (sx_in(bp, CD186x_MSVR) & MSVR_RTS);
}
/* kill threads related to this sdev, if v.c. exists */
- kthread_stop(vdev->ud.tcp_rx);
- kthread_stop(vdev->ud.tcp_tx);
+ if (vdev->ud.tcp_rx)
+ kthread_stop(vdev->ud.tcp_rx);
+ if (vdev->ud.tcp_tx)
+ kthread_stop(vdev->ud.tcp_tx);
usbip_uinfo("stop threads\n");
{
memset(vdev, 0, sizeof(*vdev));
- vdev->ud.tcp_rx = kthread_create(vhci_rx_loop, &vdev->ud, "vhci_rx");
- vdev->ud.tcp_tx = kthread_create(vhci_tx_loop, &vdev->ud, "vhci_tx");
-
vdev->ud.side = USBIP_VHCI;
vdev->ud.status = VDEV_ST_NULL;
/* vdev->ud.lock = SPIN_LOCK_UNLOCKED; */
usbip_uerr("create hcd failed\n");
return -ENOMEM;
}
-
+ hcd->has_tt = 1;
/* this is private data for vhci_hcd */
the_controller = hcd_to_vhci(hcd);
#include "vhci.h"
#include <linux/in.h>
+#include <linux/kthread.h>
/* TODO: refine locking ?*/
vdev->ud.tcp_socket = socket;
vdev->ud.status = VDEV_ST_NOTASSIGNED;
- wake_up_process(vdev->ud.tcp_rx);
- wake_up_process(vdev->ud.tcp_tx);
-
spin_unlock(&vdev->ud.lock);
spin_unlock(&the_controller->lock);
/* end the lock */
+ vdev->ud.tcp_rx = kthread_run(vhci_rx_loop, &vdev->ud, "vhci_rx");
+ vdev->ud.tcp_tx = kthread_run(vhci_tx_loop, &vdev->ud, "vhci_tx");
+
rh_port_connect(rhport, speed);
return count;
}
int prism2_set_default_key(struct wiphy *wiphy, struct net_device *dev,
- u8 key_index)
+ u8 key_index, bool unicast, bool multicast)
{
wlandevice_t *wlandev = dev->ml_priv;
help
Support for Console on the NWP serial ports.
+config SERIAL_LANTIQ
+ bool "Lantiq serial driver"
+ depends on LANTIQ
+ select SERIAL_CORE
+ select SERIAL_CORE_CONSOLE
+ help
+ Support for console and UART on Lantiq SoCs.
+
config SERIAL_QE
tristate "Freescale QUICC Engine serial port support"
depends on QUICC_ENGINE
obj-$(CONFIG_SERIAL_PCH_UART) += pch_uart.o
obj-$(CONFIG_SERIAL_MSM_SMD) += msm_smd_tty.o
obj-$(CONFIG_SERIAL_MXS_AUART) += mxs-auart.o
+obj-$(CONFIG_SERIAL_LANTIQ) += lantiq.o
--- /dev/null
+/*
+ * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Copyright (C) 2004 Infineon IFAP DC COM CPE
+ * Copyright (C) 2007 Felix Fietkau <nbd@openwrt.org>
+ * Copyright (C) 2007 John Crispin <blogic@openwrt.org>
+ * Copyright (C) 2010 Thomas Langer, <thomas.langer@lantiq.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ioport.h>
+#include <linux/init.h>
+#include <linux/console.h>
+#include <linux/sysrq.h>
+#include <linux/device.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+#include <linux/serial_core.h>
+#include <linux/serial.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/clk.h>
+
+#include <lantiq_soc.h>
+
+#define PORT_LTQ_ASC 111
+#define MAXPORTS 2
+#define UART_DUMMY_UER_RX 1
+#define DRVNAME "ltq_asc"
+#ifdef __BIG_ENDIAN
+#define LTQ_ASC_TBUF (0x0020 + 3)
+#define LTQ_ASC_RBUF (0x0024 + 3)
+#else
+#define LTQ_ASC_TBUF 0x0020
+#define LTQ_ASC_RBUF 0x0024
+#endif
+#define LTQ_ASC_FSTAT 0x0048
+#define LTQ_ASC_WHBSTATE 0x0018
+#define LTQ_ASC_STATE 0x0014
+#define LTQ_ASC_IRNCR 0x00F8
+#define LTQ_ASC_CLC 0x0000
+#define LTQ_ASC_ID 0x0008
+#define LTQ_ASC_PISEL 0x0004
+#define LTQ_ASC_TXFCON 0x0044
+#define LTQ_ASC_RXFCON 0x0040
+#define LTQ_ASC_CON 0x0010
+#define LTQ_ASC_BG 0x0050
+#define LTQ_ASC_IRNREN 0x00F4
+
+#define ASC_IRNREN_TX 0x1
+#define ASC_IRNREN_RX 0x2
+#define ASC_IRNREN_ERR 0x4
+#define ASC_IRNREN_TX_BUF 0x8
+#define ASC_IRNCR_TIR 0x1
+#define ASC_IRNCR_RIR 0x2
+#define ASC_IRNCR_EIR 0x4
+
+#define ASCOPT_CSIZE 0x3
+#define TXFIFO_FL 1
+#define RXFIFO_FL 1
+#define ASCCLC_DISS 0x2
+#define ASCCLC_RMCMASK 0x0000FF00
+#define ASCCLC_RMCOFFSET 8
+#define ASCCON_M_8ASYNC 0x0
+#define ASCCON_M_7ASYNC 0x2
+#define ASCCON_ODD 0x00000020
+#define ASCCON_STP 0x00000080
+#define ASCCON_BRS 0x00000100
+#define ASCCON_FDE 0x00000200
+#define ASCCON_R 0x00008000
+#define ASCCON_FEN 0x00020000
+#define ASCCON_ROEN 0x00080000
+#define ASCCON_TOEN 0x00100000
+#define ASCSTATE_PE 0x00010000
+#define ASCSTATE_FE 0x00020000
+#define ASCSTATE_ROE 0x00080000
+#define ASCSTATE_ANY (ASCSTATE_ROE|ASCSTATE_PE|ASCSTATE_FE)
+#define ASCWHBSTATE_CLRREN 0x00000001
+#define ASCWHBSTATE_SETREN 0x00000002
+#define ASCWHBSTATE_CLRPE 0x00000004
+#define ASCWHBSTATE_CLRFE 0x00000008
+#define ASCWHBSTATE_CLRROE 0x00000020
+#define ASCTXFCON_TXFEN 0x0001
+#define ASCTXFCON_TXFFLU 0x0002
+#define ASCTXFCON_TXFITLMASK 0x3F00
+#define ASCTXFCON_TXFITLOFF 8
+#define ASCRXFCON_RXFEN 0x0001
+#define ASCRXFCON_RXFFLU 0x0002
+#define ASCRXFCON_RXFITLMASK 0x3F00
+#define ASCRXFCON_RXFITLOFF 8
+#define ASCFSTAT_RXFFLMASK 0x003F
+#define ASCFSTAT_TXFFLMASK 0x3F00
+#define ASCFSTAT_TXFREEMASK 0x3F000000
+#define ASCFSTAT_TXFREEOFF 24
+
+static void lqasc_tx_chars(struct uart_port *port);
+static struct ltq_uart_port *lqasc_port[MAXPORTS];
+static struct uart_driver lqasc_reg;
+static DEFINE_SPINLOCK(ltq_asc_lock);
+
+struct ltq_uart_port {
+ struct uart_port port;
+ struct clk *clk;
+ unsigned int tx_irq;
+ unsigned int rx_irq;
+ unsigned int err_irq;
+};
+
+static inline struct
+ltq_uart_port *to_ltq_uart_port(struct uart_port *port)
+{
+ return container_of(port, struct ltq_uart_port, port);
+}
+
+static void
+lqasc_stop_tx(struct uart_port *port)
+{
+ return;
+}
+
+static void
+lqasc_start_tx(struct uart_port *port)
+{
+ unsigned long flags;
+ spin_lock_irqsave(<q_asc_lock, flags);
+ lqasc_tx_chars(port);
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+ return;
+}
+
+static void
+lqasc_stop_rx(struct uart_port *port)
+{
+ ltq_w32(ASCWHBSTATE_CLRREN, port->membase + LTQ_ASC_WHBSTATE);
+}
+
+static void
+lqasc_enable_ms(struct uart_port *port)
+{
+}
+
+static int
+lqasc_rx_chars(struct uart_port *port)
+{
+ struct tty_struct *tty = tty_port_tty_get(&port->state->port);
+ unsigned int ch = 0, rsr = 0, fifocnt;
+
+ if (!tty) {
+ dev_dbg(port->dev, "%s:tty is busy now", __func__);
+ return -EBUSY;
+ }
+ fifocnt =
+ ltq_r32(port->membase + LTQ_ASC_FSTAT) & ASCFSTAT_RXFFLMASK;
+ while (fifocnt--) {
+ u8 flag = TTY_NORMAL;
+ ch = ltq_r8(port->membase + LTQ_ASC_RBUF);
+ rsr = (ltq_r32(port->membase + LTQ_ASC_STATE)
+ & ASCSTATE_ANY) | UART_DUMMY_UER_RX;
+ tty_flip_buffer_push(tty);
+ port->icount.rx++;
+
+ /*
+ * Note that the error handling code is
+ * out of the main execution path
+ */
+ if (rsr & ASCSTATE_ANY) {
+ if (rsr & ASCSTATE_PE) {
+ port->icount.parity++;
+ ltq_w32_mask(0, ASCWHBSTATE_CLRPE,
+ port->membase + LTQ_ASC_WHBSTATE);
+ } else if (rsr & ASCSTATE_FE) {
+ port->icount.frame++;
+ ltq_w32_mask(0, ASCWHBSTATE_CLRFE,
+ port->membase + LTQ_ASC_WHBSTATE);
+ }
+ if (rsr & ASCSTATE_ROE) {
+ port->icount.overrun++;
+ ltq_w32_mask(0, ASCWHBSTATE_CLRROE,
+ port->membase + LTQ_ASC_WHBSTATE);
+ }
+
+ rsr &= port->read_status_mask;
+
+ if (rsr & ASCSTATE_PE)
+ flag = TTY_PARITY;
+ else if (rsr & ASCSTATE_FE)
+ flag = TTY_FRAME;
+ }
+
+ if ((rsr & port->ignore_status_mask) == 0)
+ tty_insert_flip_char(tty, ch, flag);
+
+ if (rsr & ASCSTATE_ROE)
+ /*
+ * Overrun is special, since it's reported
+ * immediately, and doesn't affect the current
+ * character
+ */
+ tty_insert_flip_char(tty, 0, TTY_OVERRUN);
+ }
+ if (ch != 0)
+ tty_flip_buffer_push(tty);
+ tty_kref_put(tty);
+ return 0;
+}
+
+static void
+lqasc_tx_chars(struct uart_port *port)
+{
+ struct circ_buf *xmit = &port->state->xmit;
+ if (uart_tx_stopped(port)) {
+ lqasc_stop_tx(port);
+ return;
+ }
+
+ while (((ltq_r32(port->membase + LTQ_ASC_FSTAT) &
+ ASCFSTAT_TXFREEMASK) >> ASCFSTAT_TXFREEOFF) != 0) {
+ if (port->x_char) {
+ ltq_w8(port->x_char, port->membase + LTQ_ASC_TBUF);
+ port->icount.tx++;
+ port->x_char = 0;
+ continue;
+ }
+
+ if (uart_circ_empty(xmit))
+ break;
+
+ ltq_w8(port->state->xmit.buf[port->state->xmit.tail],
+ port->membase + LTQ_ASC_TBUF);
+ xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+ port->icount.tx++;
+ }
+
+ if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS)
+ uart_write_wakeup(port);
+}
+
+static irqreturn_t
+lqasc_tx_int(int irq, void *_port)
+{
+ unsigned long flags;
+ struct uart_port *port = (struct uart_port *)_port;
+ spin_lock_irqsave(<q_asc_lock, flags);
+ ltq_w32(ASC_IRNCR_TIR, port->membase + LTQ_ASC_IRNCR);
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+ lqasc_start_tx(port);
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t
+lqasc_err_int(int irq, void *_port)
+{
+ unsigned long flags;
+ struct uart_port *port = (struct uart_port *)_port;
+ spin_lock_irqsave(<q_asc_lock, flags);
+ /* clear any pending interrupts */
+ ltq_w32_mask(0, ASCWHBSTATE_CLRPE | ASCWHBSTATE_CLRFE |
+ ASCWHBSTATE_CLRROE, port->membase + LTQ_ASC_WHBSTATE);
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t
+lqasc_rx_int(int irq, void *_port)
+{
+ unsigned long flags;
+ struct uart_port *port = (struct uart_port *)_port;
+ spin_lock_irqsave(<q_asc_lock, flags);
+ ltq_w32(ASC_IRNCR_RIR, port->membase + LTQ_ASC_IRNCR);
+ lqasc_rx_chars(port);
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+ return IRQ_HANDLED;
+}
+
+static unsigned int
+lqasc_tx_empty(struct uart_port *port)
+{
+ int status;
+ status = ltq_r32(port->membase + LTQ_ASC_FSTAT) & ASCFSTAT_TXFFLMASK;
+ return status ? 0 : TIOCSER_TEMT;
+}
+
+static unsigned int
+lqasc_get_mctrl(struct uart_port *port)
+{
+ return TIOCM_CTS | TIOCM_CAR | TIOCM_DSR;
+}
+
+static void
+lqasc_set_mctrl(struct uart_port *port, u_int mctrl)
+{
+}
+
+static void
+lqasc_break_ctl(struct uart_port *port, int break_state)
+{
+}
+
+static int
+lqasc_startup(struct uart_port *port)
+{
+ struct ltq_uart_port *ltq_port = to_ltq_uart_port(port);
+ int retval;
+
+ port->uartclk = clk_get_rate(ltq_port->clk);
+
+ ltq_w32_mask(ASCCLC_DISS | ASCCLC_RMCMASK, (1 << ASCCLC_RMCOFFSET),
+ port->membase + LTQ_ASC_CLC);
+
+ ltq_w32(0, port->membase + LTQ_ASC_PISEL);
+ ltq_w32(
+ ((TXFIFO_FL << ASCTXFCON_TXFITLOFF) & ASCTXFCON_TXFITLMASK) |
+ ASCTXFCON_TXFEN | ASCTXFCON_TXFFLU,
+ port->membase + LTQ_ASC_TXFCON);
+ ltq_w32(
+ ((RXFIFO_FL << ASCRXFCON_RXFITLOFF) & ASCRXFCON_RXFITLMASK)
+ | ASCRXFCON_RXFEN | ASCRXFCON_RXFFLU,
+ port->membase + LTQ_ASC_RXFCON);
+ /* make sure other settings are written to hardware before
+ * setting enable bits
+ */
+ wmb();
+ ltq_w32_mask(0, ASCCON_M_8ASYNC | ASCCON_FEN | ASCCON_TOEN |
+ ASCCON_ROEN, port->membase + LTQ_ASC_CON);
+
+ retval = request_irq(ltq_port->tx_irq, lqasc_tx_int,
+ IRQF_DISABLED, "asc_tx", port);
+ if (retval) {
+ pr_err("failed to request lqasc_tx_int\n");
+ return retval;
+ }
+
+ retval = request_irq(ltq_port->rx_irq, lqasc_rx_int,
+ IRQF_DISABLED, "asc_rx", port);
+ if (retval) {
+ pr_err("failed to request lqasc_rx_int\n");
+ goto err1;
+ }
+
+ retval = request_irq(ltq_port->err_irq, lqasc_err_int,
+ IRQF_DISABLED, "asc_err", port);
+ if (retval) {
+ pr_err("failed to request lqasc_err_int\n");
+ goto err2;
+ }
+
+ ltq_w32(ASC_IRNREN_RX | ASC_IRNREN_ERR | ASC_IRNREN_TX,
+ port->membase + LTQ_ASC_IRNREN);
+ return 0;
+
+err2:
+ free_irq(ltq_port->rx_irq, port);
+err1:
+ free_irq(ltq_port->tx_irq, port);
+ return retval;
+}
+
+static void
+lqasc_shutdown(struct uart_port *port)
+{
+ struct ltq_uart_port *ltq_port = to_ltq_uart_port(port);
+ free_irq(ltq_port->tx_irq, port);
+ free_irq(ltq_port->rx_irq, port);
+ free_irq(ltq_port->err_irq, port);
+
+ ltq_w32(0, port->membase + LTQ_ASC_CON);
+ ltq_w32_mask(ASCRXFCON_RXFEN, ASCRXFCON_RXFFLU,
+ port->membase + LTQ_ASC_RXFCON);
+ ltq_w32_mask(ASCTXFCON_TXFEN, ASCTXFCON_TXFFLU,
+ port->membase + LTQ_ASC_TXFCON);
+}
+
+static void
+lqasc_set_termios(struct uart_port *port,
+ struct ktermios *new, struct ktermios *old)
+{
+ unsigned int cflag;
+ unsigned int iflag;
+ unsigned int divisor;
+ unsigned int baud;
+ unsigned int con = 0;
+ unsigned long flags;
+
+ cflag = new->c_cflag;
+ iflag = new->c_iflag;
+
+ switch (cflag & CSIZE) {
+ case CS7:
+ con = ASCCON_M_7ASYNC;
+ break;
+
+ case CS5:
+ case CS6:
+ default:
+ new->c_cflag &= ~ CSIZE;
+ new->c_cflag |= CS8;
+ con = ASCCON_M_8ASYNC;
+ break;
+ }
+
+ cflag &= ~CMSPAR; /* Mark/Space parity is not supported */
+
+ if (cflag & CSTOPB)
+ con |= ASCCON_STP;
+
+ if (cflag & PARENB) {
+ if (!(cflag & PARODD))
+ con &= ~ASCCON_ODD;
+ else
+ con |= ASCCON_ODD;
+ }
+
+ port->read_status_mask = ASCSTATE_ROE;
+ if (iflag & INPCK)
+ port->read_status_mask |= ASCSTATE_FE | ASCSTATE_PE;
+
+ port->ignore_status_mask = 0;
+ if (iflag & IGNPAR)
+ port->ignore_status_mask |= ASCSTATE_FE | ASCSTATE_PE;
+
+ if (iflag & IGNBRK) {
+ /*
+ * If we're ignoring parity and break indicators,
+ * ignore overruns too (for real raw support).
+ */
+ if (iflag & IGNPAR)
+ port->ignore_status_mask |= ASCSTATE_ROE;
+ }
+
+ if ((cflag & CREAD) == 0)
+ port->ignore_status_mask |= UART_DUMMY_UER_RX;
+
+ /* set error signals - framing, parity and overrun, enable receiver */
+ con |= ASCCON_FEN | ASCCON_TOEN | ASCCON_ROEN;
+
+ spin_lock_irqsave(<q_asc_lock, flags);
+
+ /* set up CON */
+ ltq_w32_mask(0, con, port->membase + LTQ_ASC_CON);
+
+ /* Set baud rate - take a divider of 2 into account */
+ baud = uart_get_baud_rate(port, new, old, 0, port->uartclk / 16);
+ divisor = uart_get_divisor(port, baud);
+ divisor = divisor / 2 - 1;
+
+ /* disable the baudrate generator */
+ ltq_w32_mask(ASCCON_R, 0, port->membase + LTQ_ASC_CON);
+
+ /* make sure the fractional divider is off */
+ ltq_w32_mask(ASCCON_FDE, 0, port->membase + LTQ_ASC_CON);
+
+ /* set up to use divisor of 2 */
+ ltq_w32_mask(ASCCON_BRS, 0, port->membase + LTQ_ASC_CON);
+
+ /* now we can write the new baudrate into the register */
+ ltq_w32(divisor, port->membase + LTQ_ASC_BG);
+
+ /* turn the baudrate generator back on */
+ ltq_w32_mask(0, ASCCON_R, port->membase + LTQ_ASC_CON);
+
+ /* enable rx */
+ ltq_w32(ASCWHBSTATE_SETREN, port->membase + LTQ_ASC_WHBSTATE);
+
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+
+ /* Don't rewrite B0 */
+ if (tty_termios_baud_rate(new))
+ tty_termios_encode_baud_rate(new, baud, baud);
+}
+
+static const char*
+lqasc_type(struct uart_port *port)
+{
+ if (port->type == PORT_LTQ_ASC)
+ return DRVNAME;
+ else
+ return NULL;
+}
+
+static void
+lqasc_release_port(struct uart_port *port)
+{
+ if (port->flags & UPF_IOREMAP) {
+ iounmap(port->membase);
+ port->membase = NULL;
+ }
+}
+
+static int
+lqasc_request_port(struct uart_port *port)
+{
+ struct platform_device *pdev = to_platform_device(port->dev);
+ struct resource *res;
+ int size;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(&pdev->dev, "cannot obtain I/O memory region");
+ return -ENODEV;
+ }
+ size = resource_size(res);
+
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ size, dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev, "cannot request I/O memory region");
+ return -EBUSY;
+ }
+
+ if (port->flags & UPF_IOREMAP) {
+ port->membase = devm_ioremap_nocache(&pdev->dev,
+ port->mapbase, size);
+ if (port->membase == NULL)
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static void
+lqasc_config_port(struct uart_port *port, int flags)
+{
+ if (flags & UART_CONFIG_TYPE) {
+ port->type = PORT_LTQ_ASC;
+ lqasc_request_port(port);
+ }
+}
+
+static int
+lqasc_verify_port(struct uart_port *port,
+ struct serial_struct *ser)
+{
+ int ret = 0;
+ if (ser->type != PORT_UNKNOWN && ser->type != PORT_LTQ_ASC)
+ ret = -EINVAL;
+ if (ser->irq < 0 || ser->irq >= NR_IRQS)
+ ret = -EINVAL;
+ if (ser->baud_base < 9600)
+ ret = -EINVAL;
+ return ret;
+}
+
+static struct uart_ops lqasc_pops = {
+ .tx_empty = lqasc_tx_empty,
+ .set_mctrl = lqasc_set_mctrl,
+ .get_mctrl = lqasc_get_mctrl,
+ .stop_tx = lqasc_stop_tx,
+ .start_tx = lqasc_start_tx,
+ .stop_rx = lqasc_stop_rx,
+ .enable_ms = lqasc_enable_ms,
+ .break_ctl = lqasc_break_ctl,
+ .startup = lqasc_startup,
+ .shutdown = lqasc_shutdown,
+ .set_termios = lqasc_set_termios,
+ .type = lqasc_type,
+ .release_port = lqasc_release_port,
+ .request_port = lqasc_request_port,
+ .config_port = lqasc_config_port,
+ .verify_port = lqasc_verify_port,
+};
+
+static void
+lqasc_console_putchar(struct uart_port *port, int ch)
+{
+ int fifofree;
+
+ if (!port->membase)
+ return;
+
+ do {
+ fifofree = (ltq_r32(port->membase + LTQ_ASC_FSTAT)
+ & ASCFSTAT_TXFREEMASK) >> ASCFSTAT_TXFREEOFF;
+ } while (fifofree == 0);
+ ltq_w8(ch, port->membase + LTQ_ASC_TBUF);
+}
+
+
+static void
+lqasc_console_write(struct console *co, const char *s, u_int count)
+{
+ struct ltq_uart_port *ltq_port;
+ struct uart_port *port;
+ unsigned long flags;
+
+ if (co->index >= MAXPORTS)
+ return;
+
+ ltq_port = lqasc_port[co->index];
+ if (!ltq_port)
+ return;
+
+ port = <q_port->port;
+
+ spin_lock_irqsave(<q_asc_lock, flags);
+ uart_console_write(port, s, count, lqasc_console_putchar);
+ spin_unlock_irqrestore(<q_asc_lock, flags);
+}
+
+static int __init
+lqasc_console_setup(struct console *co, char *options)
+{
+ struct ltq_uart_port *ltq_port;
+ struct uart_port *port;
+ int baud = 115200;
+ int bits = 8;
+ int parity = 'n';
+ int flow = 'n';
+
+ if (co->index >= MAXPORTS)
+ return -ENODEV;
+
+ ltq_port = lqasc_port[co->index];
+ if (!ltq_port)
+ return -ENODEV;
+
+ port = <q_port->port;
+
+ port->uartclk = clk_get_rate(ltq_port->clk);
+
+ if (options)
+ uart_parse_options(options, &baud, &parity, &bits, &flow);
+ return uart_set_options(port, co, baud, parity, bits, flow);
+}
+
+static struct console lqasc_console = {
+ .name = "ttyLTQ",
+ .write = lqasc_console_write,
+ .device = uart_console_device,
+ .setup = lqasc_console_setup,
+ .flags = CON_PRINTBUFFER,
+ .index = -1,
+ .data = &lqasc_reg,
+};
+
+static int __init
+lqasc_console_init(void)
+{
+ register_console(&lqasc_console);
+ return 0;
+}
+console_initcall(lqasc_console_init);
+
+static struct uart_driver lqasc_reg = {
+ .owner = THIS_MODULE,
+ .driver_name = DRVNAME,
+ .dev_name = "ttyLTQ",
+ .major = 0,
+ .minor = 0,
+ .nr = MAXPORTS,
+ .cons = &lqasc_console,
+};
+
+static int __init
+lqasc_probe(struct platform_device *pdev)
+{
+ struct ltq_uart_port *ltq_port;
+ struct uart_port *port;
+ struct resource *mmres, *irqres;
+ int tx_irq, rx_irq, err_irq;
+ struct clk *clk;
+ int ret;
+
+ mmres = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ irqres = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
+ if (!mmres || !irqres)
+ return -ENODEV;
+
+ if (pdev->id >= MAXPORTS)
+ return -EBUSY;
+
+ if (lqasc_port[pdev->id] != NULL)
+ return -EBUSY;
+
+ clk = clk_get(&pdev->dev, "fpi");
+ if (IS_ERR(clk)) {
+ pr_err("failed to get fpi clk\n");
+ return -ENOENT;
+ }
+
+ tx_irq = platform_get_irq_byname(pdev, "tx");
+ rx_irq = platform_get_irq_byname(pdev, "rx");
+ err_irq = platform_get_irq_byname(pdev, "err");
+ if ((tx_irq < 0) | (rx_irq < 0) | (err_irq < 0))
+ return -ENODEV;
+
+ ltq_port = kzalloc(sizeof(struct ltq_uart_port), GFP_KERNEL);
+ if (!ltq_port)
+ return -ENOMEM;
+
+ port = <q_port->port;
+
+ port->iotype = SERIAL_IO_MEM;
+ port->flags = ASYNC_BOOT_AUTOCONF | UPF_IOREMAP;
+ port->ops = &lqasc_pops;
+ port->fifosize = 16;
+ port->type = PORT_LTQ_ASC,
+ port->line = pdev->id;
+ port->dev = &pdev->dev;
+
+ port->irq = tx_irq; /* unused, just to be backward-compatibe */
+ port->mapbase = mmres->start;
+
+ ltq_port->clk = clk;
+
+ ltq_port->tx_irq = tx_irq;
+ ltq_port->rx_irq = rx_irq;
+ ltq_port->err_irq = err_irq;
+
+ lqasc_port[pdev->id] = ltq_port;
+ platform_set_drvdata(pdev, ltq_port);
+
+ ret = uart_add_one_port(&lqasc_reg, port);
+
+ return ret;
+}
+
+static struct platform_driver lqasc_driver = {
+ .driver = {
+ .name = DRVNAME,
+ .owner = THIS_MODULE,
+ },
+};
+
+int __init
+init_lqasc(void)
+{
+ int ret;
+
+ ret = uart_register_driver(&lqasc_reg);
+ if (ret != 0)
+ return ret;
+
+ ret = platform_driver_probe(&lqasc_driver, lqasc_probe);
+ if (ret != 0)
+ uart_unregister_driver(&lqasc_reg);
+
+ return ret;
+}
+
+module_init(init_lqasc);
+
+MODULE_DESCRIPTION("Lantiq serial port driver");
+MODULE_LICENSE("GPL");
/*
* Try to register a serial port
*/
+static struct of_device_id of_platform_serial_table[];
static int __devinit of_platform_serial_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct of_serial_info *info;
struct uart_port port;
int port_type;
int ret;
- if (!ofdev->dev.of_match)
+ match = of_match_device(of_platform_serial_table, &ofdev->dev);
+ if (!match)
return -EINVAL;
if (of_find_property(ofdev->dev.of_node, "used-by-rtas", NULL))
if (info == NULL)
return -ENOMEM;
- port_type = (unsigned long)ofdev->dev.of_match->data;
+ port_type = (unsigned long)match->data;
ret = of_platform_serial_setup(ofdev, port_type, &port);
if (ret)
goto out;
}
/* Driver probe functions */
+static const struct of_device_id qe_udc_match[];
static int __devinit qe_udc_probe(struct platform_device *ofdev)
{
+ const struct of_device_id *match;
struct device_node *np = ofdev->dev.of_node;
struct qe_ep *ep;
unsigned int ret = 0;
unsigned int i;
const void *prop;
- if (!ofdev->dev.of_match)
+ match = of_match_device(qe_udc_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
prop = of_get_property(np, "mode", NULL);
return -ENOMEM;
}
- udc_controller->soc_type = (unsigned long)ofdev->dev.of_match->data;
+ udc_controller->soc_type = (unsigned long)match->data;
udc_controller->usb_regs = of_iomap(np, 0);
if (!udc_controller->usb_regs) {
ret = -ENOMEM;
#include <linux/slab.h>
#include <linux/usb/ulpi.h>
#include <plat/usb.h>
+#include <linux/regulator/consumer.h>
/* EHCI Register Set */
#define EHCI_INSNREG04 (0xA0)
struct ehci_hcd *omap_ehci;
int ret = -ENODEV;
int irq;
+ int i;
+ char supply[7];
if (usb_disabled())
return -ENODEV;
hcd->rsrc_len = resource_size(res);
hcd->regs = regs;
+ /* get ehci regulator and enable */
+ for (i = 0 ; i < OMAP3_HS_USB_PORTS ; i++) {
+ if (pdata->port_mode[i] != OMAP_EHCI_PORT_MODE_PHY) {
+ pdata->regulator[i] = NULL;
+ continue;
+ }
+ snprintf(supply, sizeof(supply), "hsusb%d", i);
+ pdata->regulator[i] = regulator_get(dev, supply);
+ if (IS_ERR(pdata->regulator[i])) {
+ pdata->regulator[i] = NULL;
+ dev_dbg(dev,
+ "failed to get ehci port%d regulator\n", i);
+ } else {
+ regulator_enable(pdata->regulator[i]);
+ }
+ }
+
ret = omap_usbhs_enable(dev);
if (ret) {
dev_err(dev, "failed to start usbhs with err %d\n", ret);
ints[i].qh = NULL;
ints[i].qtd = NULL;
+ urb->status = status;
isp1760_urb_done(hcd, urb);
if (qtd)
pe(hcd, qh, qtd);
if (t1 != t2)
xhci_writel(xhci, t2, port_array[port_index]);
- if (DEV_HIGHSPEED(t1)) {
+ if (hcd->speed != HCD_USB3) {
/* enable remote wake up for USB 2.0 */
u32 __iomem *addr;
u32 tmp;
temp |= PORT_LINK_STROBE | XDEV_U0;
xhci_writel(xhci, temp, port_array[port_index]);
}
+ /* wait for the port to enter U0 and report port link
+ * state change.
+ */
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ msleep(20);
+ spin_lock_irqsave(&xhci->lock, flags);
+
+ /* Clear PLC */
+ temp = xhci_readl(xhci, port_array[port_index]);
+ if (temp & PORT_PLC) {
+ temp = xhci_port_state_to_neutral(temp);
+ temp |= PORT_PLC;
+ xhci_writel(xhci, temp, port_array[port_index]);
+ }
+
slot_id = xhci_find_slot_id_by_port(hcd,
xhci, port_index + 1);
if (slot_id)
} else
xhci_writel(xhci, temp, port_array[port_index]);
- if (DEV_HIGHSPEED(temp)) {
+ if (hcd->speed != HCD_USB3) {
/* disable remote wake up for USB 2.0 */
u32 __iomem *addr;
u32 tmp;
otg_set_vbus(musb->xceiv, 1);
hcd->self.uses_pio_for_control = 1;
-
- if (musb->xceiv->last_event == USB_EVENT_NONE)
- pm_runtime_put(musb->controller);
-
}
+ if (musb->xceiv->last_event == USB_EVENT_NONE)
+ pm_runtime_put(musb->controller);
return 0;
DBG(4, "VBUS Disconnect\n");
#ifdef CONFIG_USB_GADGET_MUSB_HDRC
- if (is_otg_enabled(musb))
+ if (is_otg_enabled(musb) || is_peripheral_enabled(musb))
if (musb->gadget_driver)
#endif
{
* Author: Michael S. Tsirkin <mst@redhat.com>
*
* Inspiration, some code, and most witty comments come from
- * Documentation/lguest/lguest.c, by Rusty Russell
+ * Documentation/virtual/lguest/lguest.c, by Rusty Russell
*
* This work is licensed under the terms of the GNU GPL, version 2.
*
* have. Allow 1% either way on the nominal for TVs.
*/
#define NR_MONTYPES 6
-static struct fb_monspecs monspecs[NR_MONTYPES] __initdata = {
+static struct fb_monspecs monspecs[NR_MONTYPES] __devinitdata = {
{ /* TV */
.hfmin = 15469,
.hfmax = 15781,
/*
* Everything after here is initialisation!!!
*/
-static struct fb_videomode modedb[] __initdata = {
+static struct fb_videomode modedb[] __devinitdata = {
{ /* 320x256 @ 50Hz */
NULL, 50, 320, 256, 125000, 92, 62, 35, 19, 38, 2,
FB_SYNC_COMP_HIGH_ACT,
}
};
-static struct fb_videomode __initdata
-acornfb_default_mode = {
+static struct fb_videomode acornfb_default_mode __devinitdata = {
.name = NULL,
.refresh = 60,
.xres = 640,
.vmode = FB_VMODE_NONINTERLACED
};
-static void __init acornfb_init_fbinfo(void)
+static void __devinit acornfb_init_fbinfo(void)
{
static int first = 1;
* size can optionally be followed by 'M' or 'K' for
* MB or KB respectively.
*/
-static void __init
-acornfb_parse_mon(char *opt)
+static void __devinit acornfb_parse_mon(char *opt)
{
char *p = opt;
current_par.montype = -1;
}
-static void __init
-acornfb_parse_montype(char *opt)
+static void __devinit acornfb_parse_montype(char *opt)
{
current_par.montype = -2;
}
}
-static void __init
-acornfb_parse_dram(char *opt)
+static void __devinit acornfb_parse_dram(char *opt)
{
unsigned int size;
static struct options {
char *name;
void (*parse)(char *opt);
-} opt_table[] __initdata = {
+} opt_table[] __devinitdata = {
{ "mon", acornfb_parse_mon },
{ "montype", acornfb_parse_montype },
{ "dram", acornfb_parse_dram },
{ NULL, NULL }
};
-int __init
-acornfb_setup(char *options)
+static int __devinit acornfb_setup(char *options)
{
struct options *optp;
char *opt;
* Detect type of monitor connected
* For now, we just assume SVGA
*/
-static int __init
-acornfb_detect_monitortype(void)
+static int __devinit acornfb_detect_monitortype(void)
{
return 4;
}
atafb_ops.fb_setcolreg = &falcon_setcolreg;
error = request_irq(IRQ_AUTO_4, falcon_vbl_switcher,
IRQ_TYPE_PRIO,
- "framebuffer/modeswitch",
+ "framebuffer:modeswitch",
falcon_vbl_switcher);
if (error)
return error;
#define FBPIXMAPSIZE (1024 * 8)
+static DEFINE_MUTEX(registration_lock);
struct fb_info *registered_fb[FB_MAX] __read_mostly;
int num_registered_fb __read_mostly;
+static struct fb_info *get_fb_info(unsigned int idx)
+{
+ struct fb_info *fb_info;
+
+ if (idx >= FB_MAX)
+ return ERR_PTR(-ENODEV);
+
+ mutex_lock(®istration_lock);
+ fb_info = registered_fb[idx];
+ if (fb_info)
+ atomic_inc(&fb_info->count);
+ mutex_unlock(®istration_lock);
+
+ return fb_info;
+}
+
+static void put_fb_info(struct fb_info *fb_info)
+{
+ if (!atomic_dec_and_test(&fb_info->count))
+ return;
+ if (fb_info->fbops->fb_destroy)
+ fb_info->fbops->fb_destroy(fb_info);
+}
+
int lock_fb_info(struct fb_info *info)
{
mutex_lock(&info->lock);
static void *fb_seq_start(struct seq_file *m, loff_t *pos)
{
+ mutex_lock(®istration_lock);
return (*pos < FB_MAX) ? pos : NULL;
}
static void fb_seq_stop(struct seq_file *m, void *v)
{
+ mutex_unlock(®istration_lock);
}
static int fb_seq_show(struct seq_file *m, void *v)
.release = seq_release,
};
-static ssize_t
-fb_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+/*
+ * We hold a reference to the fb_info in file->private_data,
+ * but if the current registered fb has changed, we don't
+ * actually want to use it.
+ *
+ * So look up the fb_info using the inode minor number,
+ * and just verify it against the reference we have.
+ */
+static struct fb_info *file_fb_info(struct file *file)
{
- unsigned long p = *ppos;
struct inode *inode = file->f_path.dentry->d_inode;
int fbidx = iminor(inode);
struct fb_info *info = registered_fb[fbidx];
+
+ if (info != file->private_data)
+ info = NULL;
+ return info;
+}
+
+static ssize_t
+fb_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+ unsigned long p = *ppos;
+ struct fb_info *info = file_fb_info(file);
u8 *buffer, *dst;
u8 __iomem *src;
int c, cnt = 0, err = 0;
fb_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
unsigned long p = *ppos;
- struct inode *inode = file->f_path.dentry->d_inode;
- int fbidx = iminor(inode);
- struct fb_info *info = registered_fb[fbidx];
+ struct fb_info *info = file_fb_info(file);
u8 *buffer, *src;
u8 __iomem *dst;
int c, cnt = 0, err = 0;
static long fb_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int fbidx = iminor(inode);
- struct fb_info *info = registered_fb[fbidx];
+ struct fb_info *info = file_fb_info(file);
+ if (!info)
+ return -ENODEV;
return do_fb_ioctl(info, cmd, arg);
}
static long fb_compat_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
- struct inode *inode = file->f_path.dentry->d_inode;
- int fbidx = iminor(inode);
- struct fb_info *info = registered_fb[fbidx];
- struct fb_ops *fb = info->fbops;
+ struct fb_info *info = file_fb_info(file);
+ struct fb_ops *fb;
long ret = -ENOIOCTLCMD;
+ if (!info)
+ return -ENODEV;
+ fb = info->fbops;
switch(cmd) {
case FBIOGET_VSCREENINFO:
case FBIOPUT_VSCREENINFO:
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
{
- int fbidx = iminor(file->f_path.dentry->d_inode);
- struct fb_info *info = registered_fb[fbidx];
- struct fb_ops *fb = info->fbops;
+ struct fb_info *info = file_fb_info(file);
+ struct fb_ops *fb;
unsigned long off;
unsigned long start;
u32 len;
+ if (!info)
+ return -ENODEV;
if (vma->vm_pgoff > (~0UL >> PAGE_SHIFT))
return -EINVAL;
off = vma->vm_pgoff << PAGE_SHIFT;
+ fb = info->fbops;
if (!fb)
return -ENODEV;
mutex_lock(&info->mm_lock);
struct fb_info *info;
int res = 0;
- if (fbidx >= FB_MAX)
- return -ENODEV;
- info = registered_fb[fbidx];
- if (!info)
+ info = get_fb_info(fbidx);
+ if (!info) {
request_module("fb%d", fbidx);
- info = registered_fb[fbidx];
- if (!info)
- return -ENODEV;
+ info = get_fb_info(fbidx);
+ if (!info)
+ return -ENODEV;
+ }
+ if (IS_ERR(info))
+ return PTR_ERR(info);
+
mutex_lock(&info->lock);
if (!try_module_get(info->fbops->owner)) {
res = -ENODEV;
#endif
out:
mutex_unlock(&info->lock);
+ if (res)
+ put_fb_info(info);
return res;
}
info->fbops->fb_release(info,1);
module_put(info->fbops->owner);
mutex_unlock(&info->lock);
+ put_fb_info(info);
return 0;
}
return false;
}
+static int do_unregister_framebuffer(struct fb_info *fb_info);
+
#define VGA_FB_PHYS 0xA0000
-void remove_conflicting_framebuffers(struct apertures_struct *a,
+static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
const char *name, bool primary)
{
int i;
printk(KERN_INFO "fb: conflicting fb hw usage "
"%s vs %s - removing generic driver\n",
name, registered_fb[i]->fix.id);
- unregister_framebuffer(registered_fb[i]);
+ do_unregister_framebuffer(registered_fb[i]);
}
}
}
-EXPORT_SYMBOL(remove_conflicting_framebuffers);
-/**
- * register_framebuffer - registers a frame buffer device
- * @fb_info: frame buffer info structure
- *
- * Registers a frame buffer device @fb_info.
- *
- * Returns negative errno on error, or zero for success.
- *
- */
-
-int
-register_framebuffer(struct fb_info *fb_info)
+static int do_register_framebuffer(struct fb_info *fb_info)
{
int i;
struct fb_event event;
struct fb_videomode mode;
- if (num_registered_fb == FB_MAX)
- return -ENXIO;
-
if (fb_check_foreignness(fb_info))
return -ENOSYS;
- remove_conflicting_framebuffers(fb_info->apertures, fb_info->fix.id,
+ do_remove_conflicting_framebuffers(fb_info->apertures, fb_info->fix.id,
fb_is_primary_device(fb_info));
+ if (num_registered_fb == FB_MAX)
+ return -ENXIO;
+
num_registered_fb++;
for (i = 0 ; i < FB_MAX; i++)
if (!registered_fb[i])
break;
fb_info->node = i;
+ atomic_set(&fb_info->count, 1);
mutex_init(&fb_info->lock);
mutex_init(&fb_info->mm_lock);
return 0;
}
-
-/**
- * unregister_framebuffer - releases a frame buffer device
- * @fb_info: frame buffer info structure
- *
- * Unregisters a frame buffer device @fb_info.
- *
- * Returns negative errno on error, or zero for success.
- *
- * This function will also notify the framebuffer console
- * to release the driver.
- *
- * This is meant to be called within a driver's module_exit()
- * function. If this is called outside module_exit(), ensure
- * that the driver implements fb_open() and fb_release() to
- * check that no processes are using the device.
- */
-
-int
-unregister_framebuffer(struct fb_info *fb_info)
+static int do_unregister_framebuffer(struct fb_info *fb_info)
{
struct fb_event event;
int i, ret = 0;
i = fb_info->node;
- if (!registered_fb[i]) {
- ret = -EINVAL;
- goto done;
- }
-
+ if (i < 0 || i >= FB_MAX || registered_fb[i] != fb_info)
+ return -EINVAL;
if (!lock_fb_info(fb_info))
return -ENODEV;
ret = fb_notifier_call_chain(FB_EVENT_FB_UNBIND, &event);
unlock_fb_info(fb_info);
- if (ret) {
- ret = -EINVAL;
- goto done;
- }
+ if (ret)
+ return -EINVAL;
if (fb_info->pixmap.addr &&
(fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
kfree(fb_info->pixmap.addr);
fb_destroy_modelist(&fb_info->modelist);
- registered_fb[i]=NULL;
+ registered_fb[i] = NULL;
num_registered_fb--;
fb_cleanup_device(fb_info);
device_destroy(fb_class, MKDEV(FB_MAJOR, i));
fb_notifier_call_chain(FB_EVENT_FB_UNREGISTERED, &event);
/* this may free fb info */
- if (fb_info->fbops->fb_destroy)
- fb_info->fbops->fb_destroy(fb_info);
-done:
+ put_fb_info(fb_info);
+ return 0;
+}
+
+void remove_conflicting_framebuffers(struct apertures_struct *a,
+ const char *name, bool primary)
+{
+ mutex_lock(®istration_lock);
+ do_remove_conflicting_framebuffers(a, name, primary);
+ mutex_unlock(®istration_lock);
+}
+EXPORT_SYMBOL(remove_conflicting_framebuffers);
+
+/**
+ * register_framebuffer - registers a frame buffer device
+ * @fb_info: frame buffer info structure
+ *
+ * Registers a frame buffer device @fb_info.
+ *
+ * Returns negative errno on error, or zero for success.
+ *
+ */
+int
+register_framebuffer(struct fb_info *fb_info)
+{
+ int ret;
+
+ mutex_lock(®istration_lock);
+ ret = do_register_framebuffer(fb_info);
+ mutex_unlock(®istration_lock);
+
+ return ret;
+}
+
+/**
+ * unregister_framebuffer - releases a frame buffer device
+ * @fb_info: frame buffer info structure
+ *
+ * Unregisters a frame buffer device @fb_info.
+ *
+ * Returns negative errno on error, or zero for success.
+ *
+ * This function will also notify the framebuffer console
+ * to release the driver.
+ *
+ * This is meant to be called within a driver's module_exit()
+ * function. If this is called outside module_exit(), ensure
+ * that the driver implements fb_open() and fb_release() to
+ * check that no processes are using the device.
+ */
+int
+unregister_framebuffer(struct fb_info *fb_info)
+{
+ int ret;
+
+ mutex_lock(®istration_lock);
+ ret = do_unregister_framebuffer(fb_info);
+ mutex_unlock(®istration_lock);
+
return ret;
}
To compile this driver as a loadable module, choose M here.
The module will be called bcm63xx_wdt.
+config LANTIQ_WDT
+ tristate "Lantiq SoC watchdog"
+ depends on LANTIQ
+ help
+ Hardware driver for the Lantiq SoC Watchdog Timer.
+
# PARISC Architecture
# POWERPC Architecture
obj-$(CONFIG_TXX9_WDT) += txx9wdt.o
obj-$(CONFIG_OCTEON_WDT) += octeon-wdt.o
octeon-wdt-y := octeon-wdt-main.o octeon-wdt-nmi.o
+obj-$(CONFIG_LANTIQ_WDT) += lantiq_wdt.o
# PARISC Architecture
* document number 324645-001, 324646-001: Cougar Point (CPT)
* document number TBD : Patsburg (PBG)
* document number TBD : DH89xxCC
+ * document number TBD : Panther Point
*/
/*
TCO_PBG1, /* Patsburg */
TCO_PBG2, /* Patsburg */
TCO_DH89XXCC, /* DH89xxCC */
+ TCO_PPT0, /* Panther Point */
+ TCO_PPT1, /* Panther Point */
+ TCO_PPT2, /* Panther Point */
+ TCO_PPT3, /* Panther Point */
+ TCO_PPT4, /* Panther Point */
+ TCO_PPT5, /* Panther Point */
+ TCO_PPT6, /* Panther Point */
+ TCO_PPT7, /* Panther Point */
+ TCO_PPT8, /* Panther Point */
+ TCO_PPT9, /* Panther Point */
+ TCO_PPT10, /* Panther Point */
+ TCO_PPT11, /* Panther Point */
+ TCO_PPT12, /* Panther Point */
+ TCO_PPT13, /* Panther Point */
+ TCO_PPT14, /* Panther Point */
+ TCO_PPT15, /* Panther Point */
+ TCO_PPT16, /* Panther Point */
+ TCO_PPT17, /* Panther Point */
+ TCO_PPT18, /* Panther Point */
+ TCO_PPT19, /* Panther Point */
+ TCO_PPT20, /* Panther Point */
+ TCO_PPT21, /* Panther Point */
+ TCO_PPT22, /* Panther Point */
+ TCO_PPT23, /* Panther Point */
+ TCO_PPT24, /* Panther Point */
+ TCO_PPT25, /* Panther Point */
+ TCO_PPT26, /* Panther Point */
+ TCO_PPT27, /* Panther Point */
+ TCO_PPT28, /* Panther Point */
+ TCO_PPT29, /* Panther Point */
+ TCO_PPT30, /* Panther Point */
+ TCO_PPT31, /* Panther Point */
};
static struct {
{"Patsburg", 2},
{"Patsburg", 2},
{"DH89xxCC", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
+ {"Panther Point", 2},
{NULL, 0}
};
{ ITCO_PCI_DEVICE(0x1d40, TCO_PBG1)},
{ ITCO_PCI_DEVICE(0x1d41, TCO_PBG2)},
{ ITCO_PCI_DEVICE(0x2310, TCO_DH89XXCC)},
+ { ITCO_PCI_DEVICE(0x1e40, TCO_PPT0)},
+ { ITCO_PCI_DEVICE(0x1e41, TCO_PPT1)},
+ { ITCO_PCI_DEVICE(0x1e42, TCO_PPT2)},
+ { ITCO_PCI_DEVICE(0x1e43, TCO_PPT3)},
+ { ITCO_PCI_DEVICE(0x1e44, TCO_PPT4)},
+ { ITCO_PCI_DEVICE(0x1e45, TCO_PPT5)},
+ { ITCO_PCI_DEVICE(0x1e46, TCO_PPT6)},
+ { ITCO_PCI_DEVICE(0x1e47, TCO_PPT7)},
+ { ITCO_PCI_DEVICE(0x1e48, TCO_PPT8)},
+ { ITCO_PCI_DEVICE(0x1e49, TCO_PPT9)},
+ { ITCO_PCI_DEVICE(0x1e4a, TCO_PPT10)},
+ { ITCO_PCI_DEVICE(0x1e4b, TCO_PPT11)},
+ { ITCO_PCI_DEVICE(0x1e4c, TCO_PPT12)},
+ { ITCO_PCI_DEVICE(0x1e4d, TCO_PPT13)},
+ { ITCO_PCI_DEVICE(0x1e4e, TCO_PPT14)},
+ { ITCO_PCI_DEVICE(0x1e4f, TCO_PPT15)},
+ { ITCO_PCI_DEVICE(0x1e50, TCO_PPT16)},
+ { ITCO_PCI_DEVICE(0x1e51, TCO_PPT17)},
+ { ITCO_PCI_DEVICE(0x1e52, TCO_PPT18)},
+ { ITCO_PCI_DEVICE(0x1e53, TCO_PPT19)},
+ { ITCO_PCI_DEVICE(0x1e54, TCO_PPT20)},
+ { ITCO_PCI_DEVICE(0x1e55, TCO_PPT21)},
+ { ITCO_PCI_DEVICE(0x1e56, TCO_PPT22)},
+ { ITCO_PCI_DEVICE(0x1e57, TCO_PPT23)},
+ { ITCO_PCI_DEVICE(0x1e58, TCO_PPT24)},
+ { ITCO_PCI_DEVICE(0x1e59, TCO_PPT25)},
+ { ITCO_PCI_DEVICE(0x1e5a, TCO_PPT26)},
+ { ITCO_PCI_DEVICE(0x1e5b, TCO_PPT27)},
+ { ITCO_PCI_DEVICE(0x1e5c, TCO_PPT28)},
+ { ITCO_PCI_DEVICE(0x1e5d, TCO_PPT29)},
+ { ITCO_PCI_DEVICE(0x1e5e, TCO_PPT30)},
+ { ITCO_PCI_DEVICE(0x1e5f, TCO_PPT31)},
{ 0, }, /* End of list */
};
MODULE_DEVICE_TABLE(pci, iTCO_wdt_pci_tbl);
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * Copyright (C) 2010 John Crispin <blogic@openwrt.org>
+ * Based on EP93xx wdt driver
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/watchdog.h>
+#include <linux/platform_device.h>
+#include <linux/uaccess.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+
+#include <lantiq.h>
+
+/* Section 3.4 of the datasheet
+ * The password sequence protects the WDT control register from unintended
+ * write actions, which might cause malfunction of the WDT.
+ *
+ * essentially the following two magic passwords need to be written to allow
+ * IO access to the WDT core
+ */
+#define LTQ_WDT_PW1 0x00BE0000
+#define LTQ_WDT_PW2 0x00DC0000
+
+#define LTQ_WDT_CR 0x0 /* watchdog control register */
+#define LTQ_WDT_SR 0x8 /* watchdog status register */
+
+#define LTQ_WDT_SR_EN (0x1 << 31) /* enable bit */
+#define LTQ_WDT_SR_PWD (0x3 << 26) /* turn on power */
+#define LTQ_WDT_SR_CLKDIV (0x3 << 24) /* turn on clock and set */
+ /* divider to 0x40000 */
+#define LTQ_WDT_DIVIDER 0x40000
+#define LTQ_MAX_TIMEOUT ((1 << 16) - 1) /* the reload field is 16 bit */
+
+static int nowayout = WATCHDOG_NOWAYOUT;
+
+static void __iomem *ltq_wdt_membase;
+static unsigned long ltq_io_region_clk_rate;
+
+static unsigned long ltq_wdt_bootstatus;
+static unsigned long ltq_wdt_in_use;
+static int ltq_wdt_timeout = 30;
+static int ltq_wdt_ok_to_close;
+
+static void
+ltq_wdt_enable(void)
+{
+ ltq_wdt_timeout = ltq_wdt_timeout *
+ (ltq_io_region_clk_rate / LTQ_WDT_DIVIDER) + 0x1000;
+ if (ltq_wdt_timeout > LTQ_MAX_TIMEOUT)
+ ltq_wdt_timeout = LTQ_MAX_TIMEOUT;
+
+ /* write the first password magic */
+ ltq_w32(LTQ_WDT_PW1, ltq_wdt_membase + LTQ_WDT_CR);
+ /* write the second magic plus the configuration and new timeout */
+ ltq_w32(LTQ_WDT_SR_EN | LTQ_WDT_SR_PWD | LTQ_WDT_SR_CLKDIV |
+ LTQ_WDT_PW2 | ltq_wdt_timeout, ltq_wdt_membase + LTQ_WDT_CR);
+}
+
+static void
+ltq_wdt_disable(void)
+{
+ /* write the first password magic */
+ ltq_w32(LTQ_WDT_PW1, ltq_wdt_membase + LTQ_WDT_CR);
+ /* write the second password magic with no config
+ * this turns the watchdog off
+ */
+ ltq_w32(LTQ_WDT_PW2, ltq_wdt_membase + LTQ_WDT_CR);
+}
+
+static ssize_t
+ltq_wdt_write(struct file *file, const char __user *data,
+ size_t len, loff_t *ppos)
+{
+ if (len) {
+ if (!nowayout) {
+ size_t i;
+
+ ltq_wdt_ok_to_close = 0;
+ for (i = 0; i != len; i++) {
+ char c;
+
+ if (get_user(c, data + i))
+ return -EFAULT;
+ if (c == 'V')
+ ltq_wdt_ok_to_close = 1;
+ else
+ ltq_wdt_ok_to_close = 0;
+ }
+ }
+ ltq_wdt_enable();
+ }
+
+ return len;
+}
+
+static struct watchdog_info ident = {
+ .options = WDIOF_MAGICCLOSE | WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING |
+ WDIOF_CARDRESET,
+ .identity = "ltq_wdt",
+};
+
+static long
+ltq_wdt_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ int ret = -ENOTTY;
+
+ switch (cmd) {
+ case WDIOC_GETSUPPORT:
+ ret = copy_to_user((struct watchdog_info __user *)arg, &ident,
+ sizeof(ident)) ? -EFAULT : 0;
+ break;
+
+ case WDIOC_GETBOOTSTATUS:
+ ret = put_user(ltq_wdt_bootstatus, (int __user *)arg);
+ break;
+
+ case WDIOC_GETSTATUS:
+ ret = put_user(0, (int __user *)arg);
+ break;
+
+ case WDIOC_SETTIMEOUT:
+ ret = get_user(ltq_wdt_timeout, (int __user *)arg);
+ if (!ret)
+ ltq_wdt_enable();
+ /* intentional drop through */
+ case WDIOC_GETTIMEOUT:
+ ret = put_user(ltq_wdt_timeout, (int __user *)arg);
+ break;
+
+ case WDIOC_KEEPALIVE:
+ ltq_wdt_enable();
+ ret = 0;
+ break;
+ }
+ return ret;
+}
+
+static int
+ltq_wdt_open(struct inode *inode, struct file *file)
+{
+ if (test_and_set_bit(0, <q_wdt_in_use))
+ return -EBUSY;
+ ltq_wdt_in_use = 1;
+ ltq_wdt_enable();
+
+ return nonseekable_open(inode, file);
+}
+
+static int
+ltq_wdt_release(struct inode *inode, struct file *file)
+{
+ if (ltq_wdt_ok_to_close)
+ ltq_wdt_disable();
+ else
+ pr_err("ltq_wdt: watchdog closed without warning\n");
+ ltq_wdt_ok_to_close = 0;
+ clear_bit(0, <q_wdt_in_use);
+
+ return 0;
+}
+
+static const struct file_operations ltq_wdt_fops = {
+ .owner = THIS_MODULE,
+ .write = ltq_wdt_write,
+ .unlocked_ioctl = ltq_wdt_ioctl,
+ .open = ltq_wdt_open,
+ .release = ltq_wdt_release,
+ .llseek = no_llseek,
+};
+
+static struct miscdevice ltq_wdt_miscdev = {
+ .minor = WATCHDOG_MINOR,
+ .name = "watchdog",
+ .fops = <q_wdt_fops,
+};
+
+static int __init
+ltq_wdt_probe(struct platform_device *pdev)
+{
+ struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ struct clk *clk;
+
+ if (!res) {
+ dev_err(&pdev->dev, "cannot obtain I/O memory region");
+ return -ENOENT;
+ }
+ res = devm_request_mem_region(&pdev->dev, res->start,
+ resource_size(res), dev_name(&pdev->dev));
+ if (!res) {
+ dev_err(&pdev->dev, "cannot request I/O memory region");
+ return -EBUSY;
+ }
+ ltq_wdt_membase = devm_ioremap_nocache(&pdev->dev, res->start,
+ resource_size(res));
+ if (!ltq_wdt_membase) {
+ dev_err(&pdev->dev, "cannot remap I/O memory region\n");
+ return -ENOMEM;
+ }
+
+ /* we do not need to enable the clock as it is always running */
+ clk = clk_get(&pdev->dev, "io");
+ WARN_ON(!clk);
+ ltq_io_region_clk_rate = clk_get_rate(clk);
+ clk_put(clk);
+
+ if (ltq_reset_cause() == LTQ_RST_CAUSE_WDTRST)
+ ltq_wdt_bootstatus = WDIOF_CARDRESET;
+
+ return misc_register(<q_wdt_miscdev);
+}
+
+static int __devexit
+ltq_wdt_remove(struct platform_device *pdev)
+{
+ misc_deregister(<q_wdt_miscdev);
+
+ if (ltq_wdt_membase)
+ iounmap(ltq_wdt_membase);
+
+ return 0;
+}
+
+
+static struct platform_driver ltq_wdt_driver = {
+ .remove = __devexit_p(ltq_wdt_remove),
+ .driver = {
+ .name = "ltq_wdt",
+ .owner = THIS_MODULE,
+ },
+};
+
+static int __init
+init_ltq_wdt(void)
+{
+ return platform_driver_probe(<q_wdt_driver, ltq_wdt_probe);
+}
+
+static void __exit
+exit_ltq_wdt(void)
+{
+ return platform_driver_unregister(<q_wdt_driver);
+}
+
+module_init(init_ltq_wdt);
+module_exit(exit_ltq_wdt);
+
+module_param(nowayout, int, 0);
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started");
+
+MODULE_AUTHOR("John Crispin <blogic@openwrt.org>");
+MODULE_DESCRIPTION("Lantiq SoC Watchdog");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
.fops = &mpc8xxx_wdt_fops,
};
+static const struct of_device_id mpc8xxx_wdt_match[];
static int __devinit mpc8xxx_wdt_probe(struct platform_device *ofdev)
{
int ret;
+ const struct of_device_id *match;
struct device_node *np = ofdev->dev.of_node;
struct mpc8xxx_wdt_type *wdt_type;
u32 freq = fsl_get_sys_freq();
bool enabled;
- if (!ofdev->dev.of_match)
+ match = of_match_device(mpc8xxx_wdt_match, &ofdev->dev);
+ if (!match)
return -EINVAL;
- wdt_type = ofdev->dev.of_match->data;
+ wdt_type = match->data;
if (!freq || freq == -1)
return -EINVAL;
int default_ticks;
unsigned long inuse;
unsigned gpio;
+ int gstate;
} mtx1_wdt_device;
static void mtx1_wdt_trigger(unsigned long unused)
spin_lock(&mtx1_wdt_device.lock);
if (mtx1_wdt_device.running)
ticks--;
- /*
- * toggle GPIO2_15
- */
- tmp = au_readl(GPIO2_DIR);
- tmp = (tmp & ~(1 << mtx1_wdt_device.gpio)) |
- ((~tmp) & (1 << mtx1_wdt_device.gpio));
- au_writel(tmp, GPIO2_DIR);
+
+ /* toggle wdt gpio */
+ mtx1_wdt_device.gstate = ~mtx1_wdt_device.gstate;
+ if (mtx1_wdt_device.gstate)
+ gpio_direction_output(mtx1_wdt_device.gpio, 1);
+ else
+ gpio_direction_input(mtx1_wdt_device.gpio);
if (mtx1_wdt_device.queue && ticks)
mod_timer(&mtx1_wdt_device.timer, jiffies + MTX1_WDT_INTERVAL);
spin_lock_irqsave(&mtx1_wdt_device.lock, flags);
if (!mtx1_wdt_device.queue) {
mtx1_wdt_device.queue = 1;
- gpio_set_value(mtx1_wdt_device.gpio, 1);
+ mtx1_wdt_device.gstate = 1;
+ gpio_direction_output(mtx1_wdt_device.gpio, 1);
mod_timer(&mtx1_wdt_device.timer, jiffies + MTX1_WDT_INTERVAL);
}
mtx1_wdt_device.running++;
spin_lock_irqsave(&mtx1_wdt_device.lock, flags);
if (mtx1_wdt_device.queue) {
mtx1_wdt_device.queue = 0;
- gpio_set_value(mtx1_wdt_device.gpio, 0);
+ mtx1_wdt_device.gstate = 0;
+ gpio_direction_output(mtx1_wdt_device.gpio, 0);
}
ticks = mtx1_wdt_device.default_ticks;
spin_unlock_irqrestore(&mtx1_wdt_device.lock, flags);
nostackp := $(call cc-option, -fno-stack-protector)
CFLAGS_features.o := $(nostackp)
-obj-$(CONFIG_BLOCK) += biomerge.o
-obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
-obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
-obj-$(CONFIG_XEN_BALLOON) += xen-balloon.o
-obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
-obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
+obj-$(CONFIG_BLOCK) += biomerge.o
+obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
+obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
+obj-$(CONFIG_XEN_BALLOON) += xen-balloon.o
+obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
+obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
-obj-$(CONFIG_XENFS) += xenfs/
+obj-$(CONFIG_XENFS) += xenfs/
obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
-obj-$(CONFIG_XEN_PLATFORM_PCI) += xen-platform-pci.o
-obj-$(CONFIG_SWIOTLB_XEN) += swiotlb-xen.o
-obj-$(CONFIG_XEN_DOM0) += pci.o
+obj-$(CONFIG_XEN_PLATFORM_PCI) += xen-platform-pci.o
+obj-$(CONFIG_SWIOTLB_XEN) += swiotlb-xen.o
+obj-$(CONFIG_XEN_DOM0) += pci.o
-xen-evtchn-y := evtchn.o
+xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
xen-gntalloc-y := gntalloc.o
-xen-platform-pci-y := platform-pci.o
+xen-platform-pci-y := platform-pci.o
if (PageHighMem(page)) {
list_add_tail(&page->lru, &ballooned_pages);
balloon_stats.balloon_high++;
- dec_totalhigh_pages();
} else {
list_add(&page->lru, &ballooned_pages);
balloon_stats.balloon_low++;
static void balloon_append(struct page *page)
{
__balloon_append(page);
+ if (PageHighMem(page))
+ dec_totalhigh_pages();
totalram_pages--;
}
return BP_EAGAIN;
}
-static unsigned long current_target(void)
+static long current_credit(void)
{
unsigned long target = balloon_stats.target_pages;
balloon_stats.balloon_low +
balloon_stats.balloon_high);
- return target;
+ return target - balloon_stats.current_pages;
}
static enum bp_state increase_reservation(unsigned long nr_pages)
set_phys_to_machine(pfn, frame_list[i]);
/* Link back into the page tables if not highmem. */
- if (!xen_hvm_domain() && pfn < max_low_pfn) {
+ if (xen_pv_domain() && !PageHighMem(page)) {
int ret;
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
scrub_page(page);
- if (!xen_hvm_domain() && !PageHighMem(page)) {
+ if (xen_pv_domain() && !PageHighMem(page)) {
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
__pte_ma(0), 0);
mutex_lock(&balloon_mutex);
do {
- credit = current_target() - balloon_stats.current_pages;
+ credit = current_credit();
if (credit > 0)
state = increase_reservation(credit);
}
/* The balloon may be too large now. Shrink it if needed. */
- if (current_target() != balloon_stats.current_pages)
+ if (current_credit())
schedule_delayed_work(&balloon_worker, 0);
mutex_unlock(&balloon_mutex);
static int __init balloon_init(void)
{
- unsigned long pfn, nr_pages, extra_pfn_end;
+ unsigned long pfn, extra_pfn_end;
struct page *page;
if (!xen_domain())
pr_info("xen/balloon: Initialising balloon driver.\n");
- if (xen_pv_domain())
- nr_pages = xen_start_info->nr_pages;
- else
- nr_pages = max_pfn;
- balloon_stats.current_pages = min(nr_pages, max_pfn);
+ balloon_stats.current_pages = xen_pv_domain() ? min(xen_start_info->nr_pages, max_pfn) : max_pfn;
balloon_stats.target_pages = balloon_stats.current_pages;
balloon_stats.balloon_low = 0;
balloon_stats.balloon_high = 0;
pfn < extra_pfn_end;
pfn++) {
page = pfn_to_page(pfn);
- /* totalram_pages doesn't include the boot-time
+ /* totalram_pages and totalhigh_pages do not include the boot-time
balloon extension, so don't subtract from it. */
__balloon_append(page);
}
unsigned short gsi;
unsigned char vector;
unsigned char flags;
+ uint16_t domid;
} pirq;
} u;
};
static struct irq_chip xen_dynamic_chip;
static struct irq_chip xen_percpu_chip;
static struct irq_chip xen_pirq_chip;
+static void enable_dynirq(struct irq_data *data);
+static void disable_dynirq(struct irq_data *data);
/* Get info for IRQ */
static struct irq_info *info_for_irq(unsigned irq)
unsigned short pirq,
unsigned short gsi,
unsigned short vector,
+ uint16_t domid,
unsigned char flags)
{
struct irq_info *info = info_for_irq(irq);
info->u.pirq.pirq = pirq;
info->u.pirq.gsi = gsi;
info->u.pirq.vector = vector;
+ info->u.pirq.domid = domid;
info->u.pirq.flags = flags;
}
irq_free_desc(irq);
}
-static void pirq_unmask_notify(int irq)
-{
- struct physdev_eoi eoi = { .irq = pirq_from_irq(irq) };
-
- if (unlikely(pirq_needs_eoi(irq))) {
- int rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
- WARN_ON(rc);
- }
-}
-
static void pirq_query_unmask(int irq)
{
struct physdev_irq_status_query irq_status;
return desc && desc->action == NULL;
}
+static void eoi_pirq(struct irq_data *data)
+{
+ int evtchn = evtchn_from_irq(data->irq);
+ struct physdev_eoi eoi = { .irq = pirq_from_irq(data->irq) };
+ int rc = 0;
+
+ irq_move_irq(data);
+
+ if (VALID_EVTCHN(evtchn))
+ clear_evtchn(evtchn);
+
+ if (pirq_needs_eoi(data->irq)) {
+ rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
+ WARN_ON(rc);
+ }
+}
+
+static void mask_ack_pirq(struct irq_data *data)
+{
+ disable_dynirq(data);
+ eoi_pirq(data);
+}
+
static unsigned int __startup_pirq(unsigned int irq)
{
struct evtchn_bind_pirq bind_pirq;
out:
unmask_evtchn(evtchn);
- pirq_unmask_notify(irq);
+ eoi_pirq(irq_get_irq_data(irq));
return 0;
}
static void disable_pirq(struct irq_data *data)
{
-}
-
-static void ack_pirq(struct irq_data *data)
-{
- int evtchn = evtchn_from_irq(data->irq);
-
- irq_move_irq(data);
-
- if (VALID_EVTCHN(evtchn)) {
- mask_evtchn(evtchn);
- clear_evtchn(evtchn);
- }
+ disable_dynirq(data);
}
static int find_irq_by_gsi(unsigned gsi)
if (irq < 0)
goto out;
- irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_level_irq,
- name);
-
irq_op.irq = irq;
irq_op.vector = 0;
goto out;
}
- xen_irq_info_pirq_init(irq, 0, pirq, gsi, irq_op.vector,
+ xen_irq_info_pirq_init(irq, 0, pirq, gsi, irq_op.vector, DOMID_SELF,
shareable ? PIRQ_SHAREABLE : 0);
+ pirq_query_unmask(irq);
+ /* We try to use the handler with the appropriate semantic for the
+ * type of interrupt: if the interrupt doesn't need an eoi
+ * (pirq_needs_eoi returns false), we treat it like an edge
+ * triggered interrupt so we use handle_edge_irq.
+ * As a matter of fact this only happens when the corresponding
+ * physical interrupt is edge triggered or an msi.
+ *
+ * On the other hand if the interrupt needs an eoi (pirq_needs_eoi
+ * returns true) we treat it like a level triggered interrupt so we
+ * use handle_fasteoi_irq like the native code does for this kind of
+ * interrupts.
+ * Depending on the Xen version, pirq_needs_eoi might return true
+ * not only for level triggered interrupts but for edge triggered
+ * interrupts too. In any case Xen always honors the eoi mechanism,
+ * not injecting any more pirqs of the same kind if the first one
+ * hasn't received an eoi yet. Therefore using the fasteoi handler
+ * is the right choice either way.
+ */
+ if (pirq_needs_eoi(irq))
+ irq_set_chip_and_handler_name(irq, &xen_pirq_chip,
+ handle_fasteoi_irq, name);
+ else
+ irq_set_chip_and_handler_name(irq, &xen_pirq_chip,
+ handle_edge_irq, name);
+
out:
spin_unlock(&irq_mapping_update_lock);
}
int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc,
- int pirq, int vector, const char *name)
+ int pirq, int vector, const char *name,
+ domid_t domid)
{
int irq, ret;
if (irq == -1)
goto out;
- irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_level_irq,
- name);
+ irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_edge_irq,
+ name);
- xen_irq_info_pirq_init(irq, 0, pirq, 0, vector, 0);
+ xen_irq_info_pirq_init(irq, 0, pirq, 0, vector, domid, 0);
ret = irq_set_msi_desc(irq, msidesc);
if (ret < 0)
goto error_irq;
if (xen_initial_domain()) {
unmap_irq.pirq = info->u.pirq.pirq;
- unmap_irq.domid = DOMID_SELF;
+ unmap_irq.domid = info->u.pirq.domid;
rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
- if (rc) {
+ /* If another domain quits without making the pci_disable_msix
+ * call, the Xen hypervisor takes care of freeing the PIRQs
+ * (free_domain_pirqs).
+ */
+ if ((rc == -ESRCH && info->u.pirq.domid != DOMID_SELF))
+ printk(KERN_INFO "domain %d does not have %d anymore\n",
+ info->u.pirq.domid, info->u.pirq.pirq);
+ else if (rc) {
printk(KERN_WARNING "unmap irq failed %d\n", rc);
goto out;
}
return irq;
}
+
+int xen_pirq_from_irq(unsigned irq)
+{
+ return pirq_from_irq(irq);
+}
+EXPORT_SYMBOL_GPL(xen_pirq_from_irq);
int bind_evtchn_to_irq(unsigned int evtchn)
{
int irq;
goto out;
irq_set_chip_and_handler_name(irq, &xen_dynamic_chip,
- handle_fasteoi_irq, "event");
+ handle_edge_irq, "event");
xen_irq_info_evtchn_init(irq, evtchn);
}
port = (word_idx * BITS_PER_LONG) + bit_idx;
irq = evtchn_to_irq[port];
- mask_evtchn(port);
- clear_evtchn(port);
-
if (irq != -1) {
desc = irq_to_desc(irq);
if (desc)
{
int evtchn = evtchn_from_irq(data->irq);
- irq_move_masked_irq(data);
+ irq_move_irq(data);
if (VALID_EVTCHN(evtchn))
- unmask_evtchn(evtchn);
+ clear_evtchn(evtchn);
+}
+
+static void mask_ack_dynirq(struct irq_data *data)
+{
+ disable_dynirq(data);
+ ack_dynirq(data);
}
static int retrigger_dynirq(struct irq_data *data)
xen_poll_irq_timeout(irq, 0 /* no timeout */);
}
+/* Check whether the IRQ line is shared with other guests. */
+int xen_test_irq_shared(int irq)
+{
+ struct irq_info *info = info_for_irq(irq);
+ struct physdev_irq_status_query irq_status = { .irq = info->u.pirq.pirq };
+
+ if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
+ return 0;
+ return !(irq_status.flags & XENIRQSTAT_shared);
+}
+EXPORT_SYMBOL_GPL(xen_test_irq_shared);
+
void xen_irq_resume(void)
{
unsigned int cpu, evtchn;
.irq_mask = disable_dynirq,
.irq_unmask = enable_dynirq,
- .irq_eoi = ack_dynirq,
+ .irq_ack = ack_dynirq,
+ .irq_mask_ack = mask_ack_dynirq,
+
.irq_set_affinity = set_affinity_irq,
.irq_retrigger = retrigger_dynirq,
};
.irq_startup = startup_pirq,
.irq_shutdown = shutdown_pirq,
-
.irq_enable = enable_pirq,
- .irq_unmask = enable_pirq,
-
.irq_disable = disable_pirq,
- .irq_mask = disable_pirq,
- .irq_ack = ack_pirq,
+ .irq_mask = disable_dynirq,
+ .irq_unmask = enable_dynirq,
+
+ .irq_ack = eoi_pirq,
+ .irq_eoi = eoi_pirq,
+ .irq_mask_ack = mask_ack_pirq,
.irq_set_affinity = set_affinity_irq,
return 0;
}
+static void gntalloc_vma_open(struct vm_area_struct *vma)
+{
+ struct gntalloc_gref *gref = vma->vm_private_data;
+ if (!gref)
+ return;
+
+ spin_lock(&gref_lock);
+ gref->users++;
+ spin_unlock(&gref_lock);
+}
+
static void gntalloc_vma_close(struct vm_area_struct *vma)
{
struct gntalloc_gref *gref = vma->vm_private_data;
}
static struct vm_operations_struct gntalloc_vmops = {
+ .open = gntalloc_vma_open,
.close = gntalloc_vma_close,
};
vma->vm_private_data = gref;
vma->vm_flags |= VM_RESERVED;
- vma->vm_flags |= VM_DONTCOPY;
- vma->vm_flags |= VM_PFNMAP | VM_PFN_AT_MMAP;
vma->vm_ops = &gntalloc_vmops;
/* ------------------------------------------------------------------ */
+static void gntdev_vma_open(struct vm_area_struct *vma)
+{
+ struct grant_map *map = vma->vm_private_data;
+
+ pr_debug("gntdev_vma_open %p\n", vma);
+ atomic_inc(&map->users);
+}
+
static void gntdev_vma_close(struct vm_area_struct *vma)
{
struct grant_map *map = vma->vm_private_data;
- pr_debug("close %p\n", vma);
+ pr_debug("gntdev_vma_close %p\n", vma);
map->vma = NULL;
vma->vm_private_data = NULL;
gntdev_put_map(map);
}
static struct vm_operations_struct gntdev_vmops = {
+ .open = gntdev_vma_open,
.close = gntdev_vma_close,
};
vma->vm_ops = &gntdev_vmops;
- vma->vm_flags |= VM_RESERVED|VM_DONTCOPY|VM_DONTEXPAND|VM_PFNMAP;
+ vma->vm_flags |= VM_RESERVED|VM_DONTEXPAND;
+
+ if (use_ptemod)
+ vma->vm_flags |= VM_DONTCOPY|VM_PFNMAP;
vma->vm_private_data = map;
if (map_ops[i].status)
continue;
- /* m2p override only supported for GNTMAP_contains_pte mappings */
- if (!(map_ops[i].flags & GNTMAP_contains_pte))
- continue;
- pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) +
+ if (map_ops[i].flags & GNTMAP_contains_pte) {
+ pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) +
(map_ops[i].host_addr & ~PAGE_MASK));
- mfn = pte_mfn(*pte);
- ret = m2p_add_override(mfn, pages[i]);
+ mfn = pte_mfn(*pte);
+ } else {
+ /* If you really wanted to do this:
+ * mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
+ *
+ * The reason we do not implement it is b/c on the
+ * unmap path (gnttab_unmap_refs) we have no means of
+ * checking whether the page is !GNTMAP_contains_pte.
+ *
+ * That is without some extra data-structure to carry
+ * the struct page, bool clear_pte, and list_head next
+ * tuples and deal with allocation/delallocation, etc.
+ *
+ * The users of this API set the GNTMAP_contains_pte
+ * flag so lets just return not supported until it
+ * becomes neccessary to implement.
+ */
+ return -EOPNOTSUPP;
+ }
+ ret = m2p_add_override(mfn, pages[i],
+ map_ops[i].flags & GNTMAP_contains_pte);
if (ret)
return ret;
}
return ret;
for (i = 0; i < count; i++) {
- ret = m2p_remove_override(pages[i]);
+ ret = m2p_remove_override(pages[i], true /* clear the PTE */);
if (ret)
return ret;
}
#include <linux/sysrq.h>
#include <linux/stop_machine.h>
#include <linux/freezer.h>
+#include <linux/syscore_ops.h>
#include <xen/xen.h>
#include <xen/xenbus.h>
BUG_ON(!irqs_disabled());
- err = sysdev_suspend(PMSG_FREEZE);
+ err = syscore_suspend();
if (err) {
- printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
+ printk(KERN_ERR "xen_suspend: system core suspend failed: %d\n",
err);
return err;
}
xen_timer_resume();
}
- sysdev_resume();
+ syscore_resume();
return 0;
}
.attrs = xen_compile_attrs,
};
-int __init static xen_compilation_init(void)
+static int __init xen_compilation_init(void)
{
return sysfs_create_group(hypervisor_kobj, &xen_compilation_group);
}
if (!bdev->bd_part)
goto out_clear;
+ ret = 0;
if (disk->fops->open) {
ret = disk->fops->open(bdev, mode);
if (ret == -ERESTARTSYS) {
put_disk(disk);
goto restart;
}
- if (ret)
- goto out_clear;
}
+ /*
+ * If the device is invalidated, rescan partition
+ * if open succeeded or failed with -ENOMEDIUM.
+ * The latter is necessary to prevent ghost
+ * partitions on a removed medium.
+ */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
+ rescan_partitions(disk, bdev);
+ if (ret)
+ goto out_clear;
+
if (!bdev->bd_openers) {
bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
bdi = blk_get_backing_dev_info(bdev);
bdi = &default_backing_dev_info;
bdev_inode_switch_bdi(bdev->bd_inode, bdi);
}
- if (bdev->bd_invalidated)
- rescan_partitions(disk, bdev);
} else {
struct block_device *whole;
whole = bdget_disk(disk, 0);
}
} else {
if (bdev->bd_contains == bdev) {
- if (bdev->bd_disk->fops->open) {
+ ret = 0;
+ if (bdev->bd_disk->fops->open)
ret = bdev->bd_disk->fops->open(bdev, mode);
- if (ret)
- goto out_unlock_bdev;
- }
- if (bdev->bd_invalidated)
+ /* the same as first opener case, read comment there */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
rescan_partitions(bdev->bd_disk, bdev);
+ if (ret)
+ goto out_unlock_bdev;
}
/* only one opener holds refs to the module and disk */
module_put(disk->fops->owner);
if (value) {
acl = posix_acl_from_xattr(value, size);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+
if (acl) {
ret = posix_acl_valid(acl);
if (ret)
goto out;
- } else if (IS_ERR(acl)) {
- return PTR_ERR(acl);
}
}
u64 total_bytes; /* total bytes in the space,
this doesn't take mirrors into account */
u64 bytes_used; /* total bytes used,
- this does't take mirrors into account */
+ this doesn't take mirrors into account */
u64 bytes_pinned; /* total bytes pinned, will be freed when the
transaction finishes */
u64 bytes_reserved; /* total bytes the allocator has reserved for
spin_lock(&delayed_refs->lock);
if (delayed_refs->num_entries == 0) {
+ spin_unlock(&delayed_refs->lock);
printk(KERN_INFO "delayed_refs has NO entry\n");
return ret;
}
u64 group_start = group->key.objectid;
new_extents = kmalloc(sizeof(*new_extents),
GFP_NOFS);
+ if (!new_extents) {
+ ret = -ENOMEM;
+ goto out;
+ }
nr_extents = 1;
ret = get_new_locations(reloc_inode,
extent_key,
int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
{
struct btrfs_space_info *space_info;
+ struct btrfs_super_block *disk_super;
+ u64 features;
+ u64 flags;
+ int mixed = 0;
int ret;
- ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
- &space_info);
- if (ret)
- return ret;
+ disk_super = &fs_info->super_copy;
+ if (!btrfs_super_root(disk_super))
+ return 1;
- ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
- &space_info);
- if (ret)
- return ret;
+ features = btrfs_super_incompat_flags(disk_super);
+ if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
+ mixed = 1;
- ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
- &space_info);
+ flags = BTRFS_BLOCK_GROUP_SYSTEM;
+ ret = update_space_info(fs_info, flags, 0, 0, &space_info);
if (ret)
- return ret;
+ goto out;
+ if (mixed) {
+ flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
+ ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+ } else {
+ flags = BTRFS_BLOCK_GROUP_METADATA;
+ ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+ if (ret)
+ goto out;
+
+ flags = BTRFS_BLOCK_GROUP_DATA;
+ ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+ }
+out:
return ret;
}
prefetchw(&page->flags);
list_del(&page->lru);
if (!add_to_page_cache_lru(page, mapping,
- page->index, GFP_KERNEL)) {
+ page->index, GFP_NOFS)) {
__extent_read_full_page(tree, page, get_extent,
&bio, 0, &bio_flags);
}
while ((node = rb_last(&block_group->free_space_offset)) != NULL) {
info = rb_entry(node, struct btrfs_free_space, offset_index);
- unlink_free_space(block_group, info);
- if (info->bitmap)
- kfree(info->bitmap);
- kmem_cache_free(btrfs_free_space_cachep, info);
+ if (!info->bitmap) {
+ unlink_free_space(block_group, info);
+ kmem_cache_free(btrfs_free_space_cachep, info);
+ } else {
+ free_bitmap(block_group, info);
+ }
+
if (need_resched()) {
spin_unlock(&block_group->tree_lock);
cond_resched();
start = entry->offset;
bytes = min(entry->bytes, end - start);
unlink_free_space(block_group, entry);
- kfree(entry);
+ kmem_cache_free(btrfs_free_space_cachep, entry);
}
spin_unlock(&block_group->tree_lock);
1, 0, NULL, GFP_NOFS);
while (start < end) {
async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
+ BUG_ON(!async_cow);
async_cow->inode = inode;
async_cow->root = root;
async_cow->locked_page = locked_page;
inode = btrfs_new_inode(trans, root, dir, dentry->d_name.name,
dentry->d_name.len, dir->i_ino, objectid,
BTRFS_I(dir)->block_group, mode, &index);
- err = PTR_ERR(inode);
- if (IS_ERR(inode))
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
goto out_unlock;
+ }
err = btrfs_init_inode_security(trans, inode, dir, &dentry->d_name);
if (err) {
inode = btrfs_new_inode(trans, root, dir, dentry->d_name.name,
dentry->d_name.len, dir->i_ino, objectid,
BTRFS_I(dir)->block_group, mode, &index);
- err = PTR_ERR(inode);
- if (IS_ERR(inode))
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
goto out_unlock;
+ }
err = btrfs_init_inode_security(trans, inode, dir, &dentry->d_name);
if (err) {
inline_size = btrfs_file_extent_inline_item_len(leaf,
btrfs_item_nr(leaf, path->slots[0]));
tmp = kmalloc(inline_size, GFP_NOFS);
+ if (!tmp)
+ return -ENOMEM;
ptr = btrfs_file_extent_inline_start(item);
read_extent_buffer(leaf, tmp, ptr, inline_size);
ret = btrfs_map_block(map_tree, READ, start_sector << 9,
&map_length, NULL, 0);
if (ret) {
- bio_put(bio);
+ bio_put(orig_bio);
return -EIO;
}
dentry->d_name.len, dir->i_ino, objectid,
BTRFS_I(dir)->block_group, S_IFLNK|S_IRWXUGO,
&index);
- err = PTR_ERR(inode);
- if (IS_ERR(inode))
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
goto out_unlock;
+ }
err = btrfs_init_inode_security(trans, inode, dir, &dentry->d_name);
if (err) {
iflags |= FS_NOATIME_FL;
if (flags & BTRFS_INODE_DIRSYNC)
iflags |= FS_DIRSYNC_FL;
+ if (flags & BTRFS_INODE_NODATACOW)
+ iflags |= FS_NOCOW_FL;
+
+ if ((flags & BTRFS_INODE_COMPRESS) && !(flags & BTRFS_INODE_NOCOMPRESS))
+ iflags |= FS_COMPR_FL;
+ else if (flags & BTRFS_INODE_NOCOMPRESS)
+ iflags |= FS_NOCOMP_FL;
return iflags;
}
if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
FS_NOATIME_FL | FS_NODUMP_FL | \
FS_SYNC_FL | FS_DIRSYNC_FL | \
- FS_NOCOMP_FL | FS_COMPR_FL | \
- FS_NOCOW_FL | FS_COW_FL))
+ FS_NOCOMP_FL | FS_COMPR_FL |
+ FS_NOCOW_FL))
return -EOPNOTSUPP;
if ((flags & FS_NOCOMP_FL) && (flags & FS_COMPR_FL))
return -EINVAL;
- if ((flags & FS_NOCOW_FL) && (flags & FS_COW_FL))
- return -EINVAL;
-
return 0;
}
ip->flags |= BTRFS_INODE_DIRSYNC;
else
ip->flags &= ~BTRFS_INODE_DIRSYNC;
+ if (flags & FS_NOCOW_FL)
+ ip->flags |= BTRFS_INODE_NODATACOW;
+ else
+ ip->flags &= ~BTRFS_INODE_NODATACOW;
/*
* The COMPRESS flag can only be changed by users, while the NOCOMPRESS
} else if (flags & FS_COMPR_FL) {
ip->flags |= BTRFS_INODE_COMPRESS;
ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
+ } else {
+ ip->flags &= ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
}
- if (flags & FS_NOCOW_FL)
- ip->flags |= BTRFS_INODE_NODATACOW;
- else if (flags & FS_COW_FL)
- ip->flags &= ~BTRFS_INODE_NODATACOW;
trans = btrfs_join_transaction(root, 1);
BUG_ON(IS_ERR(trans));
log = root->log_root;
path = btrfs_alloc_path();
- if (!path)
- return -ENOMEM;
+ if (!path) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
di = btrfs_lookup_dir_item(trans, log, path, dir->i_ino,
name, name_len, -1);
}
fail:
btrfs_free_path(path);
+out_unlock:
mutex_unlock(&BTRFS_I(dir)->log_mutex);
if (ret == -ENOSPC) {
root->fs_info->last_trans_log_full_commit = trans->transid;
unsigned long limit;
unsigned long last_waited = 0;
int force_reg = 0;
+ struct blk_plug plug;
+
+ /*
+ * this function runs all the bios we've collected for
+ * a particular device. We don't want to wander off to
+ * another device without first sending all of these down.
+ * So, setup a plug here and finish it off before we return
+ */
+ blk_start_plug(&plug);
bdi = blk_get_backing_dev_info(device->bdev);
fs_info = device->dev_root->fs_info;
spin_unlock(&device->io_lock);
done:
+ blk_finish_plug(&plug);
return 0;
}
ci->i_truncate_seq,
ci->i_truncate_size,
&inode->i_mtime, true, 1, 0);
+
+ if (!req) {
+ rc = -ENOMEM;
+ unlock_page(page);
+ break;
+ }
+
max_pages = req->r_num_pages;
alloc_page_vec(fsc, req);
used |= CEPH_CAP_FILE_CACHE;
if (ci->i_wr_ref)
used |= CEPH_CAP_FILE_WR;
- if (ci->i_wrbuffer_ref)
+ if (ci->i_wb_ref || ci->i_wrbuffer_ref)
used |= CEPH_CAP_FILE_BUFFER;
return used;
}
}
/*
- * Mark caps dirty. If inode is newly dirty, add to the global dirty
- * list.
+ * Mark caps dirty. If inode is newly dirty, return the dirty flags.
+ * Caller is then responsible for calling __mark_inode_dirty with the
+ * returned flags value.
*/
-void __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
+int __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
{
struct ceph_mds_client *mdsc =
ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
list_add(&ci->i_dirty_item, &mdsc->cap_dirty);
spin_unlock(&mdsc->cap_dirty_lock);
if (ci->i_flushing_caps == 0) {
- igrab(inode);
+ ihold(inode);
dirty |= I_DIRTY_SYNC;
}
}
if (((was | ci->i_flushing_caps) & CEPH_CAP_FILE_BUFFER) &&
(mask & CEPH_CAP_FILE_BUFFER))
dirty |= I_DIRTY_DATASYNC;
- if (dirty)
- __mark_inode_dirty(inode, dirty);
__cap_delay_requeue(mdsc, ci);
+ return dirty;
}
/*
if (got & CEPH_CAP_FILE_WR)
ci->i_wr_ref++;
if (got & CEPH_CAP_FILE_BUFFER) {
- if (ci->i_wrbuffer_ref == 0)
- igrab(&ci->vfs_inode);
- ci->i_wrbuffer_ref++;
- dout("__take_cap_refs %p wrbuffer %d -> %d (?)\n",
- &ci->vfs_inode, ci->i_wrbuffer_ref-1, ci->i_wrbuffer_ref);
+ if (ci->i_wb_ref == 0)
+ ihold(&ci->vfs_inode);
+ ci->i_wb_ref++;
+ dout("__take_cap_refs %p wb %d -> %d (?)\n",
+ &ci->vfs_inode, ci->i_wb_ref-1, ci->i_wb_ref);
}
}
if (--ci->i_rdcache_ref == 0)
last++;
if (had & CEPH_CAP_FILE_BUFFER) {
- if (--ci->i_wrbuffer_ref == 0) {
+ if (--ci->i_wb_ref == 0) {
last++;
put++;
}
- dout("put_cap_refs %p wrbuffer %d -> %d (?)\n",
- inode, ci->i_wrbuffer_ref+1, ci->i_wrbuffer_ref);
+ dout("put_cap_refs %p wb %d -> %d (?)\n",
+ inode, ci->i_wb_ref+1, ci->i_wb_ref);
}
if (had & CEPH_CAP_FILE_WR)
if (--ci->i_wr_ref == 0) {
}
}
if (ret >= 0) {
+ int dirty;
spin_lock(&inode->i_lock);
- __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR);
+ dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR);
spin_unlock(&inode->i_lock);
+ if (dirty)
+ __mark_inode_dirty(inode, dirty);
}
out:
ci->i_rd_ref = 0;
ci->i_rdcache_ref = 0;
ci->i_wr_ref = 0;
+ ci->i_wb_ref = 0;
ci->i_wrbuffer_ref = 0;
ci->i_wrbuffer_ref_head = 0;
ci->i_shared_gen = 0;
int release = 0, dirtied = 0;
int mask = 0;
int err = 0;
+ int inode_dirty_flags = 0;
if (ceph_snap(inode) != CEPH_NOSNAP)
return -EROFS;
dout("setattr %p ATTR_FILE ... hrm!\n", inode);
if (dirtied) {
- __ceph_mark_dirty_caps(ci, dirtied);
+ inode_dirty_flags = __ceph_mark_dirty_caps(ci, dirtied);
inode->i_ctime = CURRENT_TIME;
}
release &= issued;
spin_unlock(&inode->i_lock);
+ if (inode_dirty_flags)
+ __mark_inode_dirty(inode, inode_dirty_flags);
+
if (mask) {
req->r_inode = igrab(inode);
req->r_inode_drop = release;
{
struct ceph_mds_session *s = con->private;
+ dout("mdsc con_put %p (%d)\n", s, atomic_read(&s->s_ref) - 1);
ceph_put_mds_session(s);
- dout("mdsc con_put %p (%d)\n", s, atomic_read(&s->s_ref));
}
/*
up_write(&mdsc->snap_rwsem);
} else {
spin_lock(&mdsc->snap_empty_lock);
- list_add(&mdsc->snap_empty, &realm->empty_item);
+ list_add(&realm->empty_item, &mdsc->snap_empty);
spin_unlock(&mdsc->snap_empty_lock);
}
}
/* held references to caps */
int i_pin_ref;
- int i_rd_ref, i_rdcache_ref, i_wr_ref;
+ int i_rd_ref, i_rdcache_ref, i_wr_ref, i_wb_ref;
int i_wrbuffer_ref, i_wrbuffer_ref_head;
u32 i_shared_gen; /* increment each time we get FILE_SHARED */
u32 i_rdcache_gen; /* incremented each time we get FILE_CACHE. */
{
return ci->i_dirty_caps | ci->i_flushing_caps;
}
-extern void __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask);
+extern int __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask);
extern int ceph_caps_revoking(struct ceph_inode_info *ci, int mask);
extern int __ceph_caps_used(struct ceph_inode_info *ci);
struct ceph_inode_xattr *xattr = NULL;
int issued;
int required_blob_size;
+ int dirty;
if (ceph_snap(inode) != CEPH_NOSNAP)
return -EROFS;
dout("setxattr %p issued %s\n", inode, ceph_cap_string(issued));
err = __set_xattr(ci, newname, name_len, newval,
val_len, 1, 1, 1, &xattr);
- __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL);
+ dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL);
ci->i_xattrs.dirty = true;
inode->i_ctime = CURRENT_TIME;
spin_unlock(&inode->i_lock);
-
+ if (dirty)
+ __mark_inode_dirty(inode, dirty);
return err;
do_sync:
struct ceph_vxattr_cb *vxattrs = ceph_inode_vxattrs(inode);
int issued;
int err;
+ int dirty;
if (ceph_snap(inode) != CEPH_NOSNAP)
return -EROFS;
goto do_sync;
err = __remove_xattr_by_name(ceph_inode(inode), name);
- __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL);
+ dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL);
ci->i_xattrs.dirty = true;
inode->i_ctime = CURRENT_TIME;
spin_unlock(&inode->i_lock);
-
+ if (dirty)
+ __mark_inode_dirty(inode, dirty);
return err;
do_sync:
spin_unlock(&inode->i_lock);
for (i = 0, j = 0; i < srclen; j++) {
src_char = source[i];
+ charlen = 1;
switch (src_char) {
case 0:
put_unaligned(0, &target[j]);
dst_char = cpu_to_le16(0x003f);
charlen = 1;
}
- /*
- * character may take more than one byte in the source
- * string, but will take exactly two bytes in the
- * target string
- */
- i += charlen;
- continue;
}
+ /*
+ * character may take more than one byte in the source string,
+ * but will take exactly two bytes in the target string
+ */
+ i += charlen;
put_unaligned(dst_char, &target[j]);
- i++; /* move to next char in source string */
}
ctoUCS_out:
char *data_area_of_target;
char *data_area_of_buf2;
int remaining;
- __u16 byte_count, total_data_size, total_in_buf, total_in_buf2;
+ unsigned int byte_count, total_in_buf;
+ __u16 total_data_size, total_in_buf2;
total_data_size = get_unaligned_le16(&pSMBt->t2_rsp.TotalDataCount);
remaining = total_data_size - total_in_buf;
if (remaining < 0)
- return -EINVAL;
+ return -EPROTO;
if (remaining == 0) /* nothing to do, ignore */
return 0;
data_area_of_target += total_in_buf;
/* copy second buffer into end of first buffer */
- memcpy(data_area_of_target, data_area_of_buf2, total_in_buf2);
total_in_buf += total_in_buf2;
+ /* is the result too big for the field? */
+ if (total_in_buf > USHRT_MAX)
+ return -EPROTO;
put_unaligned_le16(total_in_buf, &pSMBt->t2_rsp.DataCount);
+
+ /* fix up the BCC */
byte_count = get_bcc_le(pTargetSMB);
byte_count += total_in_buf2;
+ /* is the result too big for the field? */
+ if (byte_count > USHRT_MAX)
+ return -EPROTO;
put_bcc_le(byte_count, pTargetSMB);
byte_count = pTargetSMB->smb_buf_length;
byte_count += total_in_buf2;
-
- /* BB also add check that we are not beyond maximum buffer size */
-
+ /* don't allow buffer to overflow */
+ if (byte_count > CIFSMaxBufSize)
+ return -ENOBUFS;
pTargetSMB->smb_buf_length = byte_count;
+ memcpy(data_area_of_target, data_area_of_buf2, total_in_buf2);
+
if (remaining == total_in_buf2) {
cFYI(1, "found the last secondary response");
return 0; /* we are done */
list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
- if ((mid_entry->mid == smb_buffer->Mid) &&
- (mid_entry->midState == MID_REQUEST_SUBMITTED) &&
- (mid_entry->command == smb_buffer->Command)) {
- if (length == 0 &&
- check2ndT2(smb_buffer, server->maxBuf) > 0) {
- /* We have a multipart transact2 resp */
- isMultiRsp = true;
- if (mid_entry->resp_buf) {
- /* merge response - fix up 1st*/
- if (coalesce_t2(smb_buffer,
- mid_entry->resp_buf)) {
- mid_entry->multiRsp =
- true;
- break;
- } else {
- /* all parts received */
- mid_entry->multiEnd =
- true;
- goto multi_t2_fnd;
- }
+ if (mid_entry->mid != smb_buffer->Mid ||
+ mid_entry->midState != MID_REQUEST_SUBMITTED ||
+ mid_entry->command != smb_buffer->Command) {
+ mid_entry = NULL;
+ continue;
+ }
+
+ if (length == 0 &&
+ check2ndT2(smb_buffer, server->maxBuf) > 0) {
+ /* We have a multipart transact2 resp */
+ isMultiRsp = true;
+ if (mid_entry->resp_buf) {
+ /* merge response - fix up 1st*/
+ length = coalesce_t2(smb_buffer,
+ mid_entry->resp_buf);
+ if (length > 0) {
+ length = 0;
+ mid_entry->multiRsp = true;
+ break;
} else {
- if (!isLargeBuf) {
- cERROR(1, "1st trans2 resp needs bigbuf");
- /* BB maybe we can fix this up, switch
- to already allocated large buffer? */
- } else {
- /* Have first buffer */
- mid_entry->resp_buf =
- smb_buffer;
- mid_entry->largeBuf =
- true;
- bigbuf = NULL;
- }
+ /* all parts received or
+ * packet is malformed
+ */
+ mid_entry->multiEnd = true;
+ goto multi_t2_fnd;
+ }
+ } else {
+ if (!isLargeBuf) {
+ /*
+ * FIXME: switch to already
+ * allocated largebuf?
+ */
+ cERROR(1, "1st trans2 resp "
+ "needs bigbuf");
+ } else {
+ /* Have first buffer */
+ mid_entry->resp_buf =
+ smb_buffer;
+ mid_entry->largeBuf = true;
+ bigbuf = NULL;
}
- break;
}
- mid_entry->resp_buf = smb_buffer;
- mid_entry->largeBuf = isLargeBuf;
+ break;
+ }
+ mid_entry->resp_buf = smb_buffer;
+ mid_entry->largeBuf = isLargeBuf;
multi_t2_fnd:
- if (length == 0)
- mid_entry->midState =
- MID_RESPONSE_RECEIVED;
- else
- mid_entry->midState =
- MID_RESPONSE_MALFORMED;
+ if (length == 0)
+ mid_entry->midState = MID_RESPONSE_RECEIVED;
+ else
+ mid_entry->midState = MID_RESPONSE_MALFORMED;
#ifdef CONFIG_CIFS_STATS2
- mid_entry->when_received = jiffies;
+ mid_entry->when_received = jiffies;
#endif
- list_del_init(&mid_entry->qhead);
- mid_entry->callback(mid_entry);
- break;
- }
- mid_entry = NULL;
+ list_del_init(&mid_entry->qhead);
+ mid_entry->callback(mid_entry);
+ break;
}
spin_unlock(&GlobalMid_Lock);
cifs_parse_mount_options(char *options, const char *devname,
struct smb_vol *vol)
{
- char *value;
- char *data;
+ char *value, *data, *end;
unsigned int temp_len, i, j;
char separator[2];
short int override_uid = -1;
if (!options)
return 1;
+ end = options + strlen(options);
if (strncmp(options, "sep=", 4) == 0) {
if (options[4] != 0) {
separator[0] = options[4];
the only illegal character in a password is null */
if ((value[temp_len] == 0) &&
+ (value + temp_len < end) &&
(value[temp_len+1] == separator[0])) {
/* reinsert comma */
value[temp_len] = separator[0];
0 /* not legacy */, cifs_sb->local_nls,
cifs_sb->mnt_cifs_flags &
CIFS_MOUNT_MAP_SPECIAL_CHR);
+
+ if (rc == -EOPNOTSUPP || rc == -EINVAL)
+ rc = SMBQueryInformation(xid, tcon, full_path, pfile_info,
+ cifs_sb->local_nls, cifs_sb->mnt_cifs_flags &
+ CIFS_MOUNT_MAP_SPECIAL_CHR);
kfree(pfile_info);
return rc;
}
}
static void
-decode_unicode_ssetup(char **pbcc_area, __u16 bleft, struct cifsSesInfo *ses,
+decode_unicode_ssetup(char **pbcc_area, int bleft, struct cifsSesInfo *ses,
const struct nls_table *nls_cp)
{
int len;
cFYI(1, "bleft %d", bleft);
- /*
- * Windows servers do not always double null terminate their final
- * Unicode string. Check to see if there are an uneven number of bytes
- * left. If so, then add an extra NULL pad byte to the end of the
- * response.
- *
- * See section 2.7.2 in "Implementing CIFS" for details
- */
- if (bleft % 2) {
- data[bleft] = 0;
- ++bleft;
- }
-
kfree(ses->serverOS);
ses->serverOS = cifs_strndup_from_ucs(data, bleft, true, nls_cp);
cFYI(1, "serverOS=%s", ses->serverOS);
}
/* BB check if Unicode and decode strings */
- if (smb_buf->Flags2 & SMBFLG2_UNICODE) {
+ if (bytes_remaining == 0) {
+ /* no string area to decode, do nothing */
+ } else if (smb_buf->Flags2 & SMBFLG2_UNICODE) {
/* unicode string area must be word-aligned */
if (((unsigned long) bcc_ptr - (unsigned long) smb_buf) % 2) {
++bcc_ptr;
static void configfs_d_iput(struct dentry * dentry,
struct inode * inode)
{
- struct configfs_dirent * sd = dentry->d_fsdata;
+ struct configfs_dirent *sd = dentry->d_fsdata;
if (sd) {
BUG_ON(sd->s_dentry != dentry);
+ /* Coordinate with configfs_readdir */
+ spin_lock(&configfs_dirent_lock);
sd->s_dentry = NULL;
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
}
iput(inode);
sd = child->d_fsdata;
sd->s_type |= CONFIGFS_USET_DEFAULT;
} else {
- d_delete(child);
+ BUG_ON(child->d_inode);
+ d_drop(child);
dput(child);
}
}
struct configfs_dirent * parent_sd = dentry->d_fsdata;
struct configfs_dirent *cursor = filp->private_data;
struct list_head *p, *q = &cursor->s_sibling;
- ino_t ino;
+ ino_t ino = 0;
int i = filp->f_pos;
switch (i) {
struct configfs_dirent *next;
const char * name;
int len;
+ struct inode *inode = NULL;
next = list_entry(p, struct configfs_dirent,
s_sibling);
name = configfs_get_name(next);
len = strlen(name);
- if (next->s_dentry)
- ino = next->s_dentry->d_inode->i_ino;
- else
+
+ /*
+ * We'll have a dentry and an inode for
+ * PINNED items and for open attribute
+ * files. We lock here to prevent a race
+ * with configfs_d_iput() clearing
+ * s_dentry before calling iput().
+ *
+ * Why do we go to the trouble? If
+ * someone has an attribute file open,
+ * the inode number should match until
+ * they close it. Beyond that, we don't
+ * care.
+ */
+ spin_lock(&configfs_dirent_lock);
+ dentry = next->s_dentry;
+ if (dentry)
+ inode = dentry->d_inode;
+ if (inode)
+ ino = inode->i_ino;
+ spin_unlock(&configfs_dirent_lock);
+ if (!inode)
ino = iunique(configfs_sb, 2);
if (filldir(dirent, name, len, filp->f_pos, ino,
err = configfs_attach_group(sd->s_element, &group->cg_item,
dentry);
if (err) {
- d_delete(dentry);
+ BUG_ON(dentry->d_inode);
+ d_drop(dentry);
dput(dentry);
} else {
spin_lock(&configfs_dirent_lock);
static unsigned int d_hash_mask __read_mostly;
static unsigned int d_hash_shift __read_mostly;
-struct dcache_hash_bucket {
- struct hlist_bl_head head;
-};
-static struct dcache_hash_bucket *dentry_hashtable __read_mostly;
+static struct hlist_bl_head *dentry_hashtable __read_mostly;
-static inline struct dcache_hash_bucket *d_hash(struct dentry *parent,
+static inline struct hlist_bl_head *d_hash(struct dentry *parent,
unsigned long hash)
{
hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
return dentry_hashtable + (hash & D_HASHMASK);
}
-static inline void spin_lock_bucket(struct dcache_hash_bucket *b)
-{
- bit_spin_lock(0, (unsigned long *)&b->head.first);
-}
-
-static inline void spin_unlock_bucket(struct dcache_hash_bucket *b)
-{
- __bit_spin_unlock(0, (unsigned long *)&b->head.first);
-}
-
/* Statistics gathering. */
struct dentry_stat_t dentry_stat = {
.age_limit = 45,
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
- /* if dentry was never inserted into hash, immediate free is OK */
- if (hlist_bl_unhashed(&dentry->d_hash))
+ /* if dentry was never visible to RCU, immediate free is OK */
+ if (!(dentry->d_flags & DCACHE_RCUACCESS))
__d_free(&dentry->d_u.d_rcu);
else
call_rcu(&dentry->d_u.d_rcu, __d_free);
*/
void __d_drop(struct dentry *dentry)
{
- if (!(dentry->d_flags & DCACHE_UNHASHED)) {
- if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) {
- bit_spin_lock(0,
- (unsigned long *)&dentry->d_sb->s_anon.first);
- dentry->d_flags |= DCACHE_UNHASHED;
- hlist_bl_del_init(&dentry->d_hash);
- __bit_spin_unlock(0,
- (unsigned long *)&dentry->d_sb->s_anon.first);
- } else {
- struct dcache_hash_bucket *b;
+ if (!d_unhashed(dentry)) {
+ struct hlist_bl_head *b;
+ if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
+ b = &dentry->d_sb->s_anon;
+ else
b = d_hash(dentry->d_parent, dentry->d_name.hash);
- spin_lock_bucket(b);
- /*
- * We may not actually need to put DCACHE_UNHASHED
- * manipulations under the hash lock, but follow
- * the principle of least surprise.
- */
- dentry->d_flags |= DCACHE_UNHASHED;
- hlist_bl_del_rcu(&dentry->d_hash);
- spin_unlock_bucket(b);
- dentry_rcuwalk_barrier(dentry);
- }
+
+ hlist_bl_lock(b);
+ __hlist_bl_del(&dentry->d_hash);
+ dentry->d_hash.pprev = NULL;
+ hlist_bl_unlock(b);
+
+ dentry_rcuwalk_barrier(dentry);
}
}
EXPORT_SYMBOL(__d_drop);
dname[name->len] = 0;
dentry->d_count = 1;
- dentry->d_flags = DCACHE_UNHASHED;
+ dentry->d_flags = 0;
spin_lock_init(&dentry->d_lock);
seqcount_init(&dentry->d_seq);
dentry->d_inode = NULL;
tmp->d_inode = inode;
tmp->d_flags |= DCACHE_DISCONNECTED;
list_add(&tmp->d_alias, &inode->i_dentry);
- bit_spin_lock(0, (unsigned long *)&tmp->d_sb->s_anon.first);
- tmp->d_flags &= ~DCACHE_UNHASHED;
+ hlist_bl_lock(&tmp->d_sb->s_anon);
hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_anon);
- __bit_spin_unlock(0, (unsigned long *)&tmp->d_sb->s_anon.first);
+ hlist_bl_unlock(&tmp->d_sb->s_anon);
spin_unlock(&tmp->d_lock);
spin_unlock(&inode->i_lock);
security_d_instantiate(tmp, inode);
unsigned int len = name->len;
unsigned int hash = name->hash;
const unsigned char *str = name->name;
- struct dcache_hash_bucket *b = d_hash(parent, hash);
+ struct hlist_bl_head *b = d_hash(parent, hash);
struct hlist_bl_node *node;
struct dentry *dentry;
*
* See Documentation/filesystems/path-lookup.txt for more details.
*/
- hlist_bl_for_each_entry_rcu(dentry, node, &b->head, d_hash) {
+ hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
struct inode *i;
const char *tname;
int tlen;
unsigned int len = name->len;
unsigned int hash = name->hash;
const unsigned char *str = name->name;
- struct dcache_hash_bucket *b = d_hash(parent, hash);
+ struct hlist_bl_head *b = d_hash(parent, hash);
struct hlist_bl_node *node;
struct dentry *found = NULL;
struct dentry *dentry;
*/
rcu_read_lock();
- hlist_bl_for_each_entry_rcu(dentry, node, &b->head, d_hash) {
+ hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
const char *tname;
int tlen;
}
EXPORT_SYMBOL(d_delete);
-static void __d_rehash(struct dentry * entry, struct dcache_hash_bucket *b)
+static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
{
BUG_ON(!d_unhashed(entry));
- spin_lock_bucket(b);
- entry->d_flags &= ~DCACHE_UNHASHED;
- hlist_bl_add_head_rcu(&entry->d_hash, &b->head);
- spin_unlock_bucket(b);
+ hlist_bl_lock(b);
+ entry->d_flags |= DCACHE_RCUACCESS;
+ hlist_bl_add_head_rcu(&entry->d_hash, b);
+ hlist_bl_unlock(b);
}
static void _d_rehash(struct dentry * entry)
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
- sizeof(struct dcache_hash_bucket),
+ sizeof(struct hlist_bl_head),
dhash_entries,
13,
HASH_EARLY,
0);
for (loop = 0; loop < (1 << d_hash_shift); loop++)
- INIT_HLIST_BL_HEAD(&dentry_hashtable[loop].head);
+ INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
}
static void __init dcache_init(void)
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
- sizeof(struct dcache_hash_bucket),
+ sizeof(struct hlist_bl_head),
dhash_entries,
13,
0,
0);
for (loop = 0; loop < (1 << d_hash_shift); loop++)
- INIT_HLIST_BL_HEAD(&dentry_hashtable[loop].head);
+ INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
}
/* SLAB cache for __getname() consumers */
{
char buf[32];
int buf_size;
+ bool bv;
u32 *val = file->private_data;
buf_size = min(count, (sizeof(buf)-1));
if (copy_from_user(buf, user_buf, buf_size))
return -EFAULT;
- switch (buf[0]) {
- case 'y':
- case 'Y':
- case '1':
- *val = 1;
- break;
- case 'n':
- case 'N':
- case '0':
- *val = 0;
- break;
- }
-
+ if (strtobool(buf, &bv) == 0)
+ *val = bv;
+
return count;
}
crypt_stat->metadata_size = ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE;
}
+void ecryptfs_i_size_init(const char *page_virt, struct inode *inode)
+{
+ struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
+ struct ecryptfs_crypt_stat *crypt_stat;
+ u64 file_size;
+
+ crypt_stat = &ecryptfs_inode_to_private(inode)->crypt_stat;
+ mount_crypt_stat =
+ &ecryptfs_superblock_to_private(inode->i_sb)->mount_crypt_stat;
+ if (mount_crypt_stat->flags & ECRYPTFS_ENCRYPTED_VIEW_ENABLED) {
+ file_size = i_size_read(ecryptfs_inode_to_lower(inode));
+ if (crypt_stat->flags & ECRYPTFS_METADATA_IN_XATTR)
+ file_size += crypt_stat->metadata_size;
+ } else
+ file_size = get_unaligned_be64(page_virt);
+ i_size_write(inode, (loff_t)file_size);
+ crypt_stat->flags |= ECRYPTFS_I_SIZE_INITIALIZED;
+}
+
/**
* ecryptfs_read_headers_virt
* @page_virt: The virtual address into which to read the headers
rc = -EINVAL;
goto out;
}
+ if (!(crypt_stat->flags & ECRYPTFS_I_SIZE_INITIALIZED))
+ ecryptfs_i_size_init(page_virt, ecryptfs_dentry->d_inode);
offset += MAGIC_ECRYPTFS_MARKER_SIZE_BYTES;
rc = ecryptfs_process_flags(crypt_stat, (page_virt + offset),
&bytes_read);
#define ECRYPTFS_ENCFN_USE_MOUNT_FNEK 0x00000800
#define ECRYPTFS_ENCFN_USE_FEK 0x00001000
#define ECRYPTFS_UNLINK_SIGS 0x00002000
+#define ECRYPTFS_I_SIZE_INITIALIZED 0x00004000
u32 flags;
unsigned int file_version;
size_t iv_bytes;
struct ecryptfs_inode_info {
struct inode vfs_inode;
struct inode *wii_inode;
+ struct mutex lower_file_mutex;
+ atomic_t lower_file_count;
struct file *lower_file;
struct ecryptfs_crypt_stat crypt_stat;
};
int ecryptfs_interpose(struct dentry *hidden_dentry,
struct dentry *this_dentry, struct super_block *sb,
u32 flags);
+void ecryptfs_i_size_init(const char *page_virt, struct inode *inode);
int ecryptfs_lookup_and_interpose_lower(struct dentry *ecryptfs_dentry,
struct dentry *lower_dentry,
struct inode *ecryptfs_dir_inode);
struct dentry *lower_dentry,
struct vfsmount *lower_mnt,
const struct cred *cred);
-int ecryptfs_init_persistent_file(struct dentry *ecryptfs_dentry);
+int ecryptfs_get_lower_file(struct dentry *ecryptfs_dentry);
+void ecryptfs_put_lower_file(struct inode *inode);
int
ecryptfs_write_tag_70_packet(char *dest, size_t *remaining_bytes,
size_t *packet_size,
| ECRYPTFS_ENCRYPTED);
}
mutex_unlock(&crypt_stat->cs_mutex);
- rc = ecryptfs_init_persistent_file(ecryptfs_dentry);
+ rc = ecryptfs_get_lower_file(ecryptfs_dentry);
if (rc) {
printk(KERN_ERR "%s: Error attempting to initialize "
- "the persistent file for the dentry with name "
+ "the lower file for the dentry with name "
"[%s]; rc = [%d]\n", __func__,
ecryptfs_dentry->d_name.name, rc);
goto out_free;
if ((ecryptfs_inode_to_private(inode)->lower_file->f_flags & O_ACCMODE)
== O_RDONLY && (file->f_flags & O_ACCMODE) != O_RDONLY) {
rc = -EPERM;
- printk(KERN_WARNING "%s: Lower persistent file is RO; eCryptfs "
+ printk(KERN_WARNING "%s: Lower file is RO; eCryptfs "
"file must hence be opened RO\n", __func__);
- goto out_free;
+ goto out_put;
}
ecryptfs_set_file_lower(
file, ecryptfs_inode_to_private(inode)->lower_file);
"Plaintext passthrough mode is not "
"enabled; returning -EIO\n");
mutex_unlock(&crypt_stat->cs_mutex);
- goto out_free;
+ goto out_put;
}
rc = 0;
- crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
+ crypt_stat->flags &= ~(ECRYPTFS_I_SIZE_INITIALIZED
+ | ECRYPTFS_ENCRYPTED);
mutex_unlock(&crypt_stat->cs_mutex);
goto out;
}
"[0x%.16lx] size: [0x%.16llx]\n", inode, inode->i_ino,
(unsigned long long)i_size_read(inode));
goto out;
+out_put:
+ ecryptfs_put_lower_file(inode);
out_free:
kmem_cache_free(ecryptfs_file_info_cache,
ecryptfs_file_to_private(file));
static int ecryptfs_flush(struct file *file, fl_owner_t td)
{
- int rc = 0;
- struct file *lower_file = NULL;
-
- lower_file = ecryptfs_file_to_lower(file);
- if (lower_file->f_op && lower_file->f_op->flush)
- rc = lower_file->f_op->flush(lower_file, td);
- return rc;
+ return file->f_mode & FMODE_WRITE
+ ? filemap_write_and_wait(file->f_mapping) : 0;
}
static int ecryptfs_release(struct inode *inode, struct file *file)
{
+ ecryptfs_put_lower_file(inode);
kmem_cache_free(ecryptfs_file_info_cache,
ecryptfs_file_to_private(file));
return 0;
"context; rc = [%d]\n", rc);
goto out;
}
- rc = ecryptfs_init_persistent_file(ecryptfs_dentry);
+ rc = ecryptfs_get_lower_file(ecryptfs_dentry);
if (rc) {
printk(KERN_ERR "%s: Error attempting to initialize "
- "the persistent file for the dentry with name "
+ "the lower file for the dentry with name "
"[%s]; rc = [%d]\n", __func__,
ecryptfs_dentry->d_name.name, rc);
goto out;
}
rc = ecryptfs_write_metadata(ecryptfs_dentry);
- if (rc) {
+ if (rc)
printk(KERN_ERR "Error writing headers; rc = [%d]\n", rc);
- goto out;
- }
+ ecryptfs_put_lower_file(ecryptfs_dentry->d_inode);
out:
return rc;
}
struct dentry *lower_dir_dentry;
struct vfsmount *lower_mnt;
struct inode *lower_inode;
- struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
struct ecryptfs_crypt_stat *crypt_stat;
char *page_virt = NULL;
- u64 file_size;
- int rc = 0;
+ int put_lower = 0, rc = 0;
lower_dir_dentry = lower_dentry->d_parent;
lower_mnt = mntget(ecryptfs_dentry_to_lower_mnt(
rc = -ENOMEM;
goto out;
}
- rc = ecryptfs_init_persistent_file(ecryptfs_dentry);
+ rc = ecryptfs_get_lower_file(ecryptfs_dentry);
if (rc) {
printk(KERN_ERR "%s: Error attempting to initialize "
- "the persistent file for the dentry with name "
+ "the lower file for the dentry with name "
"[%s]; rc = [%d]\n", __func__,
ecryptfs_dentry->d_name.name, rc);
goto out_free_kmem;
}
+ put_lower = 1;
crypt_stat = &ecryptfs_inode_to_private(
ecryptfs_dentry->d_inode)->crypt_stat;
/* TODO: lock for crypt_stat comparison */
}
crypt_stat->flags |= ECRYPTFS_METADATA_IN_XATTR;
}
- mount_crypt_stat = &ecryptfs_superblock_to_private(
- ecryptfs_dentry->d_sb)->mount_crypt_stat;
- if (mount_crypt_stat->flags & ECRYPTFS_ENCRYPTED_VIEW_ENABLED) {
- if (crypt_stat->flags & ECRYPTFS_METADATA_IN_XATTR)
- file_size = (crypt_stat->metadata_size
- + i_size_read(lower_dentry->d_inode));
- else
- file_size = i_size_read(lower_dentry->d_inode);
- } else {
- file_size = get_unaligned_be64(page_virt);
- }
- i_size_write(ecryptfs_dentry->d_inode, (loff_t)file_size);
+ ecryptfs_i_size_init(page_virt, ecryptfs_dentry->d_inode);
out_free_kmem:
kmem_cache_free(ecryptfs_header_cache_2, page_virt);
goto out;
mntput(lower_mnt);
d_drop(ecryptfs_dentry);
out:
+ if (put_lower)
+ ecryptfs_put_lower_file(ecryptfs_dentry->d_inode);
return rc;
}
dget(lower_dentry);
rc = vfs_rmdir(lower_dir_dentry->d_inode, lower_dentry);
dput(lower_dentry);
- if (!rc)
- d_delete(lower_dentry);
fsstack_copy_attr_times(dir, lower_dir_dentry->d_inode);
dir->i_nlink = lower_dir_dentry->d_inode->i_nlink;
unlock_dir(lower_dir_dentry);
fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode);
out_lock:
unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
- dput(lower_new_dentry->d_parent);
- dput(lower_old_dentry->d_parent);
+ dput(lower_new_dir_dentry);
+ dput(lower_old_dir_dentry);
dput(lower_new_dentry);
dput(lower_old_dentry);
return rc;
if (unlikely((ia->ia_size == i_size))) {
lower_ia->ia_valid &= ~ATTR_SIZE;
- goto out;
+ return 0;
}
+ rc = ecryptfs_get_lower_file(dentry);
+ if (rc)
+ return rc;
crypt_stat = &ecryptfs_inode_to_private(dentry->d_inode)->crypt_stat;
/* Switch on growing or shrinking file */
if (ia->ia_size > i_size) {
lower_ia->ia_valid &= ~ATTR_SIZE;
}
out:
+ ecryptfs_put_lower_file(inode);
return rc;
}
mount_crypt_stat = &ecryptfs_superblock_to_private(
dentry->d_sb)->mount_crypt_stat;
+ rc = ecryptfs_get_lower_file(dentry);
+ if (rc) {
+ mutex_unlock(&crypt_stat->cs_mutex);
+ goto out;
+ }
rc = ecryptfs_read_metadata(dentry);
+ ecryptfs_put_lower_file(inode);
if (rc) {
if (!(mount_crypt_stat->flags
& ECRYPTFS_PLAINTEXT_PASSTHROUGH_ENABLED)) {
goto out;
}
rc = 0;
- crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
+ crypt_stat->flags &= ~(ECRYPTFS_I_SIZE_INITIALIZED
+ | ECRYPTFS_ENCRYPTED);
}
}
mutex_unlock(&crypt_stat->cs_mutex);
+ if (S_ISREG(inode->i_mode)) {
+ rc = filemap_write_and_wait(inode->i_mapping);
+ if (rc)
+ goto out;
+ fsstack_copy_attr_all(inode, lower_inode);
+ }
memcpy(&lower_ia, ia, sizeof(lower_ia));
if (ia->ia_valid & ATTR_FILE)
lower_ia.ia_file = ecryptfs_file_to_lower(ia->ia_file);
* @ignored: ignored
*
* The eCryptfs kernel thread that has the responsibility of getting
- * the lower persistent file with RW permissions.
+ * the lower file with RW permissions.
*
* Returns zero on success; non-zero otherwise
*/
int rc = 0;
/* Corresponding dput() and mntput() are done when the
- * persistent file is fput() when the eCryptfs inode is
- * destroyed. */
+ * lower file is fput() when all eCryptfs files for the inode are
+ * released. */
dget(lower_dentry);
mntget(lower_mnt);
flags |= IS_RDONLY(lower_dentry->d_inode) ? O_RDONLY : O_RDWR;
}
/**
- * ecryptfs_init_persistent_file
+ * ecryptfs_init_lower_file
* @ecryptfs_dentry: Fully initialized eCryptfs dentry object, with
* the lower dentry and the lower mount set
*
* inode. All I/O operations to the lower inode occur through that
* file. When the first eCryptfs dentry that interposes with the first
* lower dentry for that inode is created, this function creates the
- * persistent file struct and associates it with the eCryptfs
- * inode. When the eCryptfs inode is destroyed, the file is closed.
+ * lower file struct and associates it with the eCryptfs
+ * inode. When all eCryptfs files associated with the inode are released, the
+ * file is closed.
*
- * The persistent file will be opened with read/write permissions, if
+ * The lower file will be opened with read/write permissions, if
* possible. Otherwise, it is opened read-only.
*
- * This function does nothing if a lower persistent file is already
+ * This function does nothing if a lower file is already
* associated with the eCryptfs inode.
*
* Returns zero on success; non-zero otherwise
*/
-int ecryptfs_init_persistent_file(struct dentry *ecryptfs_dentry)
+static int ecryptfs_init_lower_file(struct dentry *dentry,
+ struct file **lower_file)
{
const struct cred *cred = current_cred();
- struct ecryptfs_inode_info *inode_info =
- ecryptfs_inode_to_private(ecryptfs_dentry->d_inode);
- int rc = 0;
+ struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+ struct vfsmount *lower_mnt = ecryptfs_dentry_to_lower_mnt(dentry);
+ int rc;
- if (!inode_info->lower_file) {
- struct dentry *lower_dentry;
- struct vfsmount *lower_mnt =
- ecryptfs_dentry_to_lower_mnt(ecryptfs_dentry);
+ rc = ecryptfs_privileged_open(lower_file, lower_dentry, lower_mnt,
+ cred);
+ if (rc) {
+ printk(KERN_ERR "Error opening lower file "
+ "for lower_dentry [0x%p] and lower_mnt [0x%p]; "
+ "rc = [%d]\n", lower_dentry, lower_mnt, rc);
+ (*lower_file) = NULL;
+ }
+ return rc;
+}
- lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
- rc = ecryptfs_privileged_open(&inode_info->lower_file,
- lower_dentry, lower_mnt, cred);
- if (rc) {
- printk(KERN_ERR "Error opening lower persistent file "
- "for lower_dentry [0x%p] and lower_mnt [0x%p]; "
- "rc = [%d]\n", lower_dentry, lower_mnt, rc);
- inode_info->lower_file = NULL;
- }
+int ecryptfs_get_lower_file(struct dentry *dentry)
+{
+ struct ecryptfs_inode_info *inode_info =
+ ecryptfs_inode_to_private(dentry->d_inode);
+ int count, rc = 0;
+
+ mutex_lock(&inode_info->lower_file_mutex);
+ count = atomic_inc_return(&inode_info->lower_file_count);
+ if (WARN_ON_ONCE(count < 1))
+ rc = -EINVAL;
+ else if (count == 1) {
+ rc = ecryptfs_init_lower_file(dentry,
+ &inode_info->lower_file);
+ if (rc)
+ atomic_set(&inode_info->lower_file_count, 0);
}
+ mutex_unlock(&inode_info->lower_file_mutex);
return rc;
}
+void ecryptfs_put_lower_file(struct inode *inode)
+{
+ struct ecryptfs_inode_info *inode_info;
+
+ inode_info = ecryptfs_inode_to_private(inode);
+ if (atomic_dec_and_mutex_lock(&inode_info->lower_file_count,
+ &inode_info->lower_file_mutex)) {
+ fput(inode_info->lower_file);
+ inode_info->lower_file = NULL;
+ mutex_unlock(&inode_info->lower_file_mutex);
+ }
+}
+
static struct inode *ecryptfs_get_inode(struct inode *lower_inode,
struct super_block *sb)
{
if (unlikely(!inode_info))
goto out;
ecryptfs_init_crypt_stat(&inode_info->crypt_stat);
+ mutex_init(&inode_info->lower_file_mutex);
+ atomic_set(&inode_info->lower_file_count, 0);
inode_info->lower_file = NULL;
inode = &inode_info->vfs_inode;
out:
*
* This is used during the final destruction of the inode. All
* allocation of memory related to the inode, including allocated
- * memory in the crypt_stat struct, will be released here. This
- * function also fput()'s the persistent file for the lower inode.
+ * memory in the crypt_stat struct, will be released here.
* There should be no chance that this deallocation will be missed.
*/
static void ecryptfs_destroy_inode(struct inode *inode)
struct ecryptfs_inode_info *inode_info;
inode_info = ecryptfs_inode_to_private(inode);
- if (inode_info->lower_file) {
- struct dentry *lower_dentry =
- inode_info->lower_file->f_dentry;
-
- BUG_ON(!lower_dentry);
- if (lower_dentry->d_inode) {
- fput(inode_info->lower_file);
- inode_info->lower_file = NULL;
- }
- }
+ BUG_ON(inode_info->lower_file);
ecryptfs_destroy_crypt_stat(&inode_info->crypt_stat);
call_rcu(&inode->i_rcu, ecryptfs_i_callback);
}
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/mmzone.h>
#include <linux/time.h>
#include <linux/sched.h>
#include <linux/slab.h>
*/
static DEFINE_PER_CPU(struct fdtable_defer, fdtable_defer_list);
-static inline void *alloc_fdmem(unsigned int size)
+static void *alloc_fdmem(unsigned int size)
{
- void *data;
-
- data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
- if (data != NULL)
- return data;
-
+ /*
+ * Very large allocations can stress page reclaim, so fall back to
+ * vmalloc() if the allocation size will be considered "large" by the VM.
+ */
+ if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
+ void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+ if (data != NULL)
+ return data;
+ }
return vmalloc(size);
}
if (!inode)
return 0;
- if (nd->flags & LOOKUP_RCU)
+ if (nd && (nd->flags & LOOKUP_RCU))
return -ECHILD;
fc = get_fuse_conn(inode);
static inline void spin_lock_bucket(unsigned int hash)
{
- struct hlist_bl_head *bl = &gl_hash_table[hash];
- bit_spin_lock(0, (unsigned long *)bl);
+ hlist_bl_lock(&gl_hash_table[hash]);
}
static inline void spin_unlock_bucket(unsigned int hash)
{
- struct hlist_bl_head *bl = &gl_hash_table[hash];
- __bit_spin_unlock(0, (unsigned long *)bl);
+ hlist_bl_unlock(&gl_hash_table[hash]);
}
static void gfs2_glock_dealloc(struct rcu_head *rcu)
config HPFS_FS
tristate "OS/2 HPFS file system support"
depends on BLOCK
- depends on BROKEN || !PREEMPT
help
OS/2 is IBM's operating system for PC's, the same as Warp, and HPFS
is the file system used for organizing files on OS/2 hard disk
#include "hpfs_fn.h"
-static int hpfs_alloc_if_possible_nolock(struct super_block *s, secno sec);
-
/*
* Check if a sector is allocated in bitmap
* This is really slow. Turned on only if chk==2
static int chk_if_allocated(struct super_block *s, secno sec, char *msg)
{
struct quad_buffer_head qbh;
- unsigned *bmp;
+ u32 *bmp;
if (!(bmp = hpfs_map_bitmap(s, sec >> 14, &qbh, "chk"))) goto fail;
- if ((bmp[(sec & 0x3fff) >> 5] >> (sec & 0x1f)) & 1) {
+ if ((cpu_to_le32(bmp[(sec & 0x3fff) >> 5]) >> (sec & 0x1f)) & 1) {
hpfs_error(s, "sector '%s' - %08x not allocated in bitmap", msg, sec);
goto fail1;
}
if (sec >= hpfs_sb(s)->sb_dirband_start && sec < hpfs_sb(s)->sb_dirband_start + hpfs_sb(s)->sb_dirband_size) {
unsigned ssec = (sec - hpfs_sb(s)->sb_dirband_start) / 4;
if (!(bmp = hpfs_map_dnode_bitmap(s, &qbh))) goto fail;
- if ((bmp[ssec >> 5] >> (ssec & 0x1f)) & 1) {
+ if ((le32_to_cpu(bmp[ssec >> 5]) >> (ssec & 0x1f)) & 1) {
hpfs_error(s, "sector '%s' - %08x not allocated in directory bitmap", msg, sec);
goto fail1;
}
hpfs_error(s, "Bad allocation size: %d", n);
return 0;
}
- lock_super(s);
if (bs != ~0x3fff) {
if (!(bmp = hpfs_map_bitmap(s, near >> 14, &qbh, "aib"))) goto uls;
} else {
ret = bs + nr;
goto rt;
}
- /*if (!tstbits(bmp, nr + n, n + forward)) {
- ret = bs + nr + n;
- goto rt;
- }*/
q = nr + n; b = 0;
while ((a = tstbits(bmp, q, n + forward)) != 0) {
q += a;
goto rt;
}
nr >>= 5;
- /*for (i = nr + 1; i != nr; i++, i &= 0x1ff) {*/
+ /*for (i = nr + 1; i != nr; i++, i &= 0x1ff) */
i = nr;
do {
- if (!bmp[i]) goto cont;
- if (n + forward >= 0x3f && bmp[i] != -1) goto cont;
+ if (!le32_to_cpu(bmp[i])) goto cont;
+ if (n + forward >= 0x3f && le32_to_cpu(bmp[i]) != 0xffffffff) goto cont;
q = i<<5;
if (i > 0) {
- unsigned k = bmp[i-1];
+ unsigned k = le32_to_cpu(bmp[i-1]);
while (k & 0x80000000) {
q--; k <<= 1;
}
} while (i != nr);
rt:
if (ret) {
- if (hpfs_sb(s)->sb_chk && ((ret >> 14) != (bs >> 14) || (bmp[(ret & 0x3fff) >> 5] | ~(((1 << n) - 1) << (ret & 0x1f))) != 0xffffffff)) {
+ if (hpfs_sb(s)->sb_chk && ((ret >> 14) != (bs >> 14) || (le32_to_cpu(bmp[(ret & 0x3fff) >> 5]) | ~(((1 << n) - 1) << (ret & 0x1f))) != 0xffffffff)) {
hpfs_error(s, "Allocation doesn't work! Wanted %d, allocated at %08x", n, ret);
ret = 0;
goto b;
}
- bmp[(ret & 0x3fff) >> 5] &= ~(((1 << n) - 1) << (ret & 0x1f));
+ bmp[(ret & 0x3fff) >> 5] &= cpu_to_le32(~(((1 << n) - 1) << (ret & 0x1f)));
hpfs_mark_4buffers_dirty(&qbh);
}
b:
hpfs_brelse4(&qbh);
uls:
- unlock_super(s);
return ret;
}
* sectors
*/
-secno hpfs_alloc_sector(struct super_block *s, secno near, unsigned n, int forward, int lock)
+secno hpfs_alloc_sector(struct super_block *s, secno near, unsigned n, int forward)
{
secno sec;
int i;
forward = -forward;
f_p = 1;
}
- if (lock) hpfs_lock_creation(s);
n_bmps = (sbi->sb_fs_size + 0x4000 - 1) >> 14;
if (near && near < sbi->sb_fs_size) {
if ((sec = alloc_in_bmp(s, near, n, f_p ? forward : forward/4))) goto ret;
ret:
if (sec && f_p) {
for (i = 0; i < forward; i++) {
- if (!hpfs_alloc_if_possible_nolock(s, sec + i + 1)) {
+ if (!hpfs_alloc_if_possible(s, sec + i + 1)) {
hpfs_error(s, "Prealloc doesn't work! Wanted %d, allocated at %08x, can't allocate %d", forward, sec, i);
sec = 0;
break;
}
}
}
- if (lock) hpfs_unlock_creation(s);
return sec;
}
-static secno alloc_in_dirband(struct super_block *s, secno near, int lock)
+static secno alloc_in_dirband(struct super_block *s, secno near)
{
unsigned nr = near;
secno sec;
nr = sbi->sb_dirband_start + sbi->sb_dirband_size - 4;
nr -= sbi->sb_dirband_start;
nr >>= 2;
- if (lock) hpfs_lock_creation(s);
sec = alloc_in_bmp(s, (~0x3fff) | nr, 1, 0);
- if (lock) hpfs_unlock_creation(s);
if (!sec) return 0;
return ((sec & 0x3fff) << 2) + sbi->sb_dirband_start;
}
/* Alloc sector if it's free */
-static int hpfs_alloc_if_possible_nolock(struct super_block *s, secno sec)
+int hpfs_alloc_if_possible(struct super_block *s, secno sec)
{
struct quad_buffer_head qbh;
- unsigned *bmp;
- lock_super(s);
+ u32 *bmp;
if (!(bmp = hpfs_map_bitmap(s, sec >> 14, &qbh, "aip"))) goto end;
- if (bmp[(sec & 0x3fff) >> 5] & (1 << (sec & 0x1f))) {
- bmp[(sec & 0x3fff) >> 5] &= ~(1 << (sec & 0x1f));
+ if (le32_to_cpu(bmp[(sec & 0x3fff) >> 5]) & (1 << (sec & 0x1f))) {
+ bmp[(sec & 0x3fff) >> 5] &= cpu_to_le32(~(1 << (sec & 0x1f)));
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
- unlock_super(s);
return 1;
}
hpfs_brelse4(&qbh);
end:
- unlock_super(s);
return 0;
}
-int hpfs_alloc_if_possible(struct super_block *s, secno sec)
-{
- int r;
- hpfs_lock_creation(s);
- r = hpfs_alloc_if_possible_nolock(s, sec);
- hpfs_unlock_creation(s);
- return r;
-}
-
/* Free sectors in bitmaps */
void hpfs_free_sectors(struct super_block *s, secno sec, unsigned n)
{
struct quad_buffer_head qbh;
- unsigned *bmp;
+ u32 *bmp;
struct hpfs_sb_info *sbi = hpfs_sb(s);
/*printk("2 - ");*/
if (!n) return;
hpfs_error(s, "Trying to free reserved sector %08x", sec);
return;
}
- lock_super(s);
sbi->sb_max_fwd_alloc += n > 0xffff ? 0xffff : n;
if (sbi->sb_max_fwd_alloc > 0xffffff) sbi->sb_max_fwd_alloc = 0xffffff;
new_map:
if (!(bmp = hpfs_map_bitmap(s, sec >> 14, &qbh, "free"))) {
- unlock_super(s);
return;
}
new_tst:
- if ((bmp[(sec & 0x3fff) >> 5] >> (sec & 0x1f) & 1)) {
+ if ((le32_to_cpu(bmp[(sec & 0x3fff) >> 5]) >> (sec & 0x1f) & 1)) {
hpfs_error(s, "sector %08x not allocated", sec);
hpfs_brelse4(&qbh);
- unlock_super(s);
return;
}
- bmp[(sec & 0x3fff) >> 5] |= 1 << (sec & 0x1f);
+ bmp[(sec & 0x3fff) >> 5] |= cpu_to_le32(1 << (sec & 0x1f));
if (!--n) {
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
- unlock_super(s);
return;
}
if (!(++sec & 0x3fff)) {
int n_bmps = (hpfs_sb(s)->sb_fs_size + 0x4000 - 1) >> 14;
int b = hpfs_sb(s)->sb_c_bitmap & 0x0fffffff;
int i, j;
- unsigned *bmp;
+ u32 *bmp;
struct quad_buffer_head qbh;
if ((bmp = hpfs_map_dnode_bitmap(s, &qbh))) {
for (j = 0; j < 512; j++) {
unsigned k;
- if (!bmp[j]) continue;
- for (k = bmp[j]; k; k >>= 1) if (k & 1) if (!--n) {
+ if (!le32_to_cpu(bmp[j])) continue;
+ for (k = le32_to_cpu(bmp[j]); k; k >>= 1) if (k & 1) if (!--n) {
hpfs_brelse4(&qbh);
return 0;
}
chk_bmp:
if (bmp) {
for (j = 0; j < 512; j++) {
- unsigned k;
- if (!bmp[j]) continue;
+ u32 k;
+ if (!le32_to_cpu(bmp[j])) continue;
for (k = 0xf; k; k <<= 4)
- if ((bmp[j] & k) == k) {
+ if ((le32_to_cpu(bmp[j]) & k) == k) {
if (!--n) {
hpfs_brelse4(&qbh);
return 0;
hpfs_free_sectors(s, dno, 4);
} else {
struct quad_buffer_head qbh;
- unsigned *bmp;
+ u32 *bmp;
unsigned ssec = (dno - hpfs_sb(s)->sb_dirband_start) / 4;
- lock_super(s);
if (!(bmp = hpfs_map_dnode_bitmap(s, &qbh))) {
- unlock_super(s);
return;
}
- bmp[ssec >> 5] |= 1 << (ssec & 0x1f);
+ bmp[ssec >> 5] |= cpu_to_le32(1 << (ssec & 0x1f));
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
- unlock_super(s);
}
}
struct dnode *hpfs_alloc_dnode(struct super_block *s, secno near,
- dnode_secno *dno, struct quad_buffer_head *qbh,
- int lock)
+ dnode_secno *dno, struct quad_buffer_head *qbh)
{
struct dnode *d;
if (hpfs_count_one_bitmap(s, hpfs_sb(s)->sb_dmap) > FREE_DNODES_ADD) {
- if (!(*dno = alloc_in_dirband(s, near, lock)))
- if (!(*dno = hpfs_alloc_sector(s, near, 4, 0, lock))) return NULL;
+ if (!(*dno = alloc_in_dirband(s, near)))
+ if (!(*dno = hpfs_alloc_sector(s, near, 4, 0))) return NULL;
} else {
- if (!(*dno = hpfs_alloc_sector(s, near, 4, 0, lock)))
- if (!(*dno = alloc_in_dirband(s, near, lock))) return NULL;
+ if (!(*dno = hpfs_alloc_sector(s, near, 4, 0)))
+ if (!(*dno = alloc_in_dirband(s, near))) return NULL;
}
if (!(d = hpfs_get_4sectors(s, *dno, qbh))) {
hpfs_free_dnode(s, *dno);
return NULL;
}
memset(d, 0, 2048);
- d->magic = DNODE_MAGIC;
- d->first_free = 52;
+ d->magic = cpu_to_le32(DNODE_MAGIC);
+ d->first_free = cpu_to_le32(52);
d->dirent[0] = 32;
d->dirent[2] = 8;
d->dirent[30] = 1;
d->dirent[31] = 255;
- d->self = *dno;
+ d->self = cpu_to_le32(*dno);
return d;
}
struct buffer_head **bh)
{
struct fnode *f;
- if (!(*fno = hpfs_alloc_sector(s, near, 1, FNODE_ALLOC_FWD, 1))) return NULL;
+ if (!(*fno = hpfs_alloc_sector(s, near, 1, FNODE_ALLOC_FWD))) return NULL;
if (!(f = hpfs_get_sector(s, *fno, bh))) {
hpfs_free_sectors(s, *fno, 1);
return NULL;
}
memset(f, 0, 512);
- f->magic = FNODE_MAGIC;
- f->ea_offs = 0xc4;
+ f->magic = cpu_to_le32(FNODE_MAGIC);
+ f->ea_offs = cpu_to_le16(0xc4);
f->btree.n_free_nodes = 8;
- f->btree.first_free = 8;
+ f->btree.first_free = cpu_to_le16(8);
return f;
}
struct buffer_head **bh)
{
struct anode *a;
- if (!(*ano = hpfs_alloc_sector(s, near, 1, ANODE_ALLOC_FWD, 1))) return NULL;
+ if (!(*ano = hpfs_alloc_sector(s, near, 1, ANODE_ALLOC_FWD))) return NULL;
if (!(a = hpfs_get_sector(s, *ano, bh))) {
hpfs_free_sectors(s, *ano, 1);
return NULL;
}
memset(a, 0, 512);
- a->magic = ANODE_MAGIC;
- a->self = *ano;
+ a->magic = cpu_to_le32(ANODE_MAGIC);
+ a->self = cpu_to_le32(*ano);
a->btree.n_free_nodes = 40;
a->btree.n_used_nodes = 0;
- a->btree.first_free = 8;
+ a->btree.first_free = cpu_to_le16(8);
return a;
}
if (hpfs_sb(s)->sb_chk) if (hpfs_stop_cycles(s, a, &c1, &c2, "hpfs_bplus_lookup")) return -1;
if (btree->internal) {
for (i = 0; i < btree->n_used_nodes; i++)
- if (btree->u.internal[i].file_secno > sec) {
- a = btree->u.internal[i].down;
+ if (le32_to_cpu(btree->u.internal[i].file_secno) > sec) {
+ a = le32_to_cpu(btree->u.internal[i].down);
brelse(bh);
if (!(anode = hpfs_map_anode(s, a, &bh))) return -1;
btree = &anode->btree;
return -1;
}
for (i = 0; i < btree->n_used_nodes; i++)
- if (btree->u.external[i].file_secno <= sec &&
- btree->u.external[i].file_secno + btree->u.external[i].length > sec) {
- a = btree->u.external[i].disk_secno + sec - btree->u.external[i].file_secno;
+ if (le32_to_cpu(btree->u.external[i].file_secno) <= sec &&
+ le32_to_cpu(btree->u.external[i].file_secno) + le32_to_cpu(btree->u.external[i].length) > sec) {
+ a = le32_to_cpu(btree->u.external[i].disk_secno) + sec - le32_to_cpu(btree->u.external[i].file_secno);
if (hpfs_sb(s)->sb_chk) if (hpfs_chk_sectors(s, a, 1, "data")) {
brelse(bh);
return -1;
}
if (inode) {
struct hpfs_inode_info *hpfs_inode = hpfs_i(inode);
- hpfs_inode->i_file_sec = btree->u.external[i].file_secno;
- hpfs_inode->i_disk_sec = btree->u.external[i].disk_secno;
- hpfs_inode->i_n_secs = btree->u.external[i].length;
+ hpfs_inode->i_file_sec = le32_to_cpu(btree->u.external[i].file_secno);
+ hpfs_inode->i_disk_sec = le32_to_cpu(btree->u.external[i].disk_secno);
+ hpfs_inode->i_n_secs = le32_to_cpu(btree->u.external[i].length);
}
brelse(bh);
return a;
return -1;
}
if (btree->internal) {
- a = btree->u.internal[n].down;
- btree->u.internal[n].file_secno = -1;
+ a = le32_to_cpu(btree->u.internal[n].down);
+ btree->u.internal[n].file_secno = cpu_to_le32(-1);
mark_buffer_dirty(bh);
brelse(bh);
if (hpfs_sb(s)->sb_chk)
goto go_down;
}
if (n >= 0) {
- if (btree->u.external[n].file_secno + btree->u.external[n].length != fsecno) {
+ if (le32_to_cpu(btree->u.external[n].file_secno) + le32_to_cpu(btree->u.external[n].length) != fsecno) {
hpfs_error(s, "allocated size %08x, trying to add sector %08x, %cnode %08x",
- btree->u.external[n].file_secno + btree->u.external[n].length, fsecno,
+ le32_to_cpu(btree->u.external[n].file_secno) + le32_to_cpu(btree->u.external[n].length), fsecno,
fnod?'f':'a', node);
brelse(bh);
return -1;
}
- if (hpfs_alloc_if_possible(s, se = btree->u.external[n].disk_secno + btree->u.external[n].length)) {
- btree->u.external[n].length++;
+ if (hpfs_alloc_if_possible(s, se = le32_to_cpu(btree->u.external[n].disk_secno) + le32_to_cpu(btree->u.external[n].length))) {
+ btree->u.external[n].length = cpu_to_le32(le32_to_cpu(btree->u.external[n].length) + 1);
mark_buffer_dirty(bh);
brelse(bh);
return se;
}
se = !fnod ? node : (node + 16384) & ~16383;
}
- if (!(se = hpfs_alloc_sector(s, se, 1, fsecno*ALLOC_M>ALLOC_FWD_MAX ? ALLOC_FWD_MAX : fsecno*ALLOC_M<ALLOC_FWD_MIN ? ALLOC_FWD_MIN : fsecno*ALLOC_M, 1))) {
+ if (!(se = hpfs_alloc_sector(s, se, 1, fsecno*ALLOC_M>ALLOC_FWD_MAX ? ALLOC_FWD_MAX : fsecno*ALLOC_M<ALLOC_FWD_MIN ? ALLOC_FWD_MIN : fsecno*ALLOC_M))) {
brelse(bh);
return -1;
}
- fs = n < 0 ? 0 : btree->u.external[n].file_secno + btree->u.external[n].length;
+ fs = n < 0 ? 0 : le32_to_cpu(btree->u.external[n].file_secno) + le32_to_cpu(btree->u.external[n].length);
if (!btree->n_free_nodes) {
- up = a != node ? anode->up : -1;
+ up = a != node ? le32_to_cpu(anode->up) : -1;
if (!(anode = hpfs_alloc_anode(s, a, &na, &bh1))) {
brelse(bh);
hpfs_free_sectors(s, se, 1);
return -1;
}
if (a == node && fnod) {
- anode->up = node;
+ anode->up = cpu_to_le32(node);
anode->btree.fnode_parent = 1;
anode->btree.n_used_nodes = btree->n_used_nodes;
anode->btree.first_free = btree->first_free;
btree->internal = 1;
btree->n_free_nodes = 11;
btree->n_used_nodes = 1;
- btree->first_free = (char *)&(btree->u.internal[1]) - (char *)btree;
- btree->u.internal[0].file_secno = -1;
- btree->u.internal[0].down = na;
+ btree->first_free = cpu_to_le16((char *)&(btree->u.internal[1]) - (char *)btree);
+ btree->u.internal[0].file_secno = cpu_to_le32(-1);
+ btree->u.internal[0].down = cpu_to_le32(na);
mark_buffer_dirty(bh);
} else if (!(ranode = hpfs_alloc_anode(s, /*a*/0, &ra, &bh2))) {
brelse(bh);
btree = &anode->btree;
}
btree->n_free_nodes--; n = btree->n_used_nodes++;
- btree->first_free += 12;
- btree->u.external[n].disk_secno = se;
- btree->u.external[n].file_secno = fs;
- btree->u.external[n].length = 1;
+ btree->first_free = cpu_to_le16(le16_to_cpu(btree->first_free) + 12);
+ btree->u.external[n].disk_secno = cpu_to_le32(se);
+ btree->u.external[n].file_secno = cpu_to_le32(fs);
+ btree->u.external[n].length = cpu_to_le32(1);
mark_buffer_dirty(bh);
brelse(bh);
if ((a == node && fnod) || na == -1) return se;
c2 = 0;
- while (up != -1) {
+ while (up != (anode_secno)-1) {
struct anode *new_anode;
if (hpfs_sb(s)->sb_chk)
if (hpfs_stop_cycles(s, up, &c1, &c2, "hpfs_add_sector_to_btree #2")) return -1;
}
if (btree->n_free_nodes) {
btree->n_free_nodes--; n = btree->n_used_nodes++;
- btree->first_free += 8;
- btree->u.internal[n].file_secno = -1;
- btree->u.internal[n].down = na;
- btree->u.internal[n-1].file_secno = fs;
+ btree->first_free = cpu_to_le16(le16_to_cpu(btree->first_free) + 8);
+ btree->u.internal[n].file_secno = cpu_to_le32(-1);
+ btree->u.internal[n].down = cpu_to_le32(na);
+ btree->u.internal[n-1].file_secno = cpu_to_le32(fs);
mark_buffer_dirty(bh);
brelse(bh);
brelse(bh2);
hpfs_free_sectors(s, ra, 1);
if ((anode = hpfs_map_anode(s, na, &bh))) {
- anode->up = up;
+ anode->up = cpu_to_le32(up);
anode->btree.fnode_parent = up == node && fnod;
mark_buffer_dirty(bh);
brelse(bh);
}
return se;
}
- up = up != node ? anode->up : -1;
- btree->u.internal[btree->n_used_nodes - 1].file_secno = /*fs*/-1;
+ up = up != node ? le32_to_cpu(anode->up) : -1;
+ btree->u.internal[btree->n_used_nodes - 1].file_secno = cpu_to_le32(/*fs*/-1);
mark_buffer_dirty(bh);
brelse(bh);
a = na;
if ((new_anode = hpfs_alloc_anode(s, a, &na, &bh))) {
anode = new_anode;
- /*anode->up = up != -1 ? up : ra;*/
+ /*anode->up = cpu_to_le32(up != -1 ? up : ra);*/
anode->btree.internal = 1;
anode->btree.n_used_nodes = 1;
anode->btree.n_free_nodes = 59;
- anode->btree.first_free = 16;
- anode->btree.u.internal[0].down = a;
- anode->btree.u.internal[0].file_secno = -1;
+ anode->btree.first_free = cpu_to_le16(16);
+ anode->btree.u.internal[0].down = cpu_to_le32(a);
+ anode->btree.u.internal[0].file_secno = cpu_to_le32(-1);
mark_buffer_dirty(bh);
brelse(bh);
if ((anode = hpfs_map_anode(s, a, &bh))) {
- anode->up = na;
+ anode->up = cpu_to_le32(na);
mark_buffer_dirty(bh);
brelse(bh);
}
} else na = a;
}
if ((anode = hpfs_map_anode(s, na, &bh))) {
- anode->up = node;
+ anode->up = cpu_to_le32(node);
if (fnod) anode->btree.fnode_parent = 1;
mark_buffer_dirty(bh);
brelse(bh);
}
btree = &fnode->btree;
}
- ranode->up = node;
- memcpy(&ranode->btree, btree, btree->first_free);
+ ranode->up = cpu_to_le32(node);
+ memcpy(&ranode->btree, btree, le16_to_cpu(btree->first_free));
if (fnod) ranode->btree.fnode_parent = 1;
ranode->btree.n_free_nodes = (ranode->btree.internal ? 60 : 40) - ranode->btree.n_used_nodes;
if (ranode->btree.internal) for (n = 0; n < ranode->btree.n_used_nodes; n++) {
struct anode *unode;
- if ((unode = hpfs_map_anode(s, ranode->u.internal[n].down, &bh1))) {
- unode->up = ra;
+ if ((unode = hpfs_map_anode(s, le32_to_cpu(ranode->u.internal[n].down), &bh1))) {
+ unode->up = cpu_to_le32(ra);
unode->btree.fnode_parent = 0;
mark_buffer_dirty(bh1);
brelse(bh1);
btree->internal = 1;
btree->n_free_nodes = fnod ? 10 : 58;
btree->n_used_nodes = 2;
- btree->first_free = (char *)&btree->u.internal[2] - (char *)btree;
- btree->u.internal[0].file_secno = fs;
- btree->u.internal[0].down = ra;
- btree->u.internal[1].file_secno = -1;
- btree->u.internal[1].down = na;
+ btree->first_free = cpu_to_le16((char *)&btree->u.internal[2] - (char *)btree);
+ btree->u.internal[0].file_secno = cpu_to_le32(fs);
+ btree->u.internal[0].down = cpu_to_le32(ra);
+ btree->u.internal[1].file_secno = cpu_to_le32(-1);
+ btree->u.internal[1].down = cpu_to_le32(na);
mark_buffer_dirty(bh);
brelse(bh);
mark_buffer_dirty(bh2);
go_down:
d2 = 0;
while (btree1->internal) {
- ano = btree1->u.internal[pos].down;
+ ano = le32_to_cpu(btree1->u.internal[pos].down);
if (level) brelse(bh);
if (hpfs_sb(s)->sb_chk)
if (hpfs_stop_cycles(s, ano, &d1, &d2, "hpfs_remove_btree #1"))
pos = 0;
}
for (i = 0; i < btree1->n_used_nodes; i++)
- hpfs_free_sectors(s, btree1->u.external[i].disk_secno, btree1->u.external[i].length);
+ hpfs_free_sectors(s, le32_to_cpu(btree1->u.external[i].disk_secno), le32_to_cpu(btree1->u.external[i].length));
go_up:
if (!level) return;
brelse(bh);
if (hpfs_stop_cycles(s, ano, &c1, &c2, "hpfs_remove_btree #2")) return;
hpfs_free_sectors(s, ano, 1);
oano = ano;
- ano = anode->up;
+ ano = le32_to_cpu(anode->up);
if (--level) {
if (!(anode = hpfs_map_anode(s, ano, &bh))) return;
btree1 = &anode->btree;
} else btree1 = btree;
for (i = 0; i < btree1->n_used_nodes; i++) {
- if (btree1->u.internal[i].down == oano) {
+ if (le32_to_cpu(btree1->u.internal[i].down) == oano) {
if ((pos = i + 1) < btree1->n_used_nodes)
goto go_down;
else
if (fno) {
btree->n_free_nodes = 8;
btree->n_used_nodes = 0;
- btree->first_free = 8;
+ btree->first_free = cpu_to_le16(8);
btree->internal = 0;
mark_buffer_dirty(bh);
} else hpfs_free_sectors(s, f, 1);
while (btree->internal) {
nodes = btree->n_used_nodes + btree->n_free_nodes;
for (i = 0; i < btree->n_used_nodes; i++)
- if (btree->u.internal[i].file_secno >= secs) goto f;
+ if (le32_to_cpu(btree->u.internal[i].file_secno) >= secs) goto f;
brelse(bh);
hpfs_error(s, "internal btree %08x doesn't end with -1", node);
return;
f:
for (j = i + 1; j < btree->n_used_nodes; j++)
- hpfs_ea_remove(s, btree->u.internal[j].down, 1, 0);
+ hpfs_ea_remove(s, le32_to_cpu(btree->u.internal[j].down), 1, 0);
btree->n_used_nodes = i + 1;
btree->n_free_nodes = nodes - btree->n_used_nodes;
- btree->first_free = 8 + 8 * btree->n_used_nodes;
+ btree->first_free = cpu_to_le16(8 + 8 * btree->n_used_nodes);
mark_buffer_dirty(bh);
- if (btree->u.internal[i].file_secno == secs) {
+ if (btree->u.internal[i].file_secno == cpu_to_le32(secs)) {
brelse(bh);
return;
}
- node = btree->u.internal[i].down;
+ node = le32_to_cpu(btree->u.internal[i].down);
brelse(bh);
if (hpfs_sb(s)->sb_chk)
if (hpfs_stop_cycles(s, node, &c1, &c2, "hpfs_truncate_btree"))
}
nodes = btree->n_used_nodes + btree->n_free_nodes;
for (i = 0; i < btree->n_used_nodes; i++)
- if (btree->u.external[i].file_secno + btree->u.external[i].length >= secs) goto ff;
+ if (le32_to_cpu(btree->u.external[i].file_secno) + le32_to_cpu(btree->u.external[i].length) >= secs) goto ff;
brelse(bh);
return;
ff:
- if (secs <= btree->u.external[i].file_secno) {
+ if (secs <= le32_to_cpu(btree->u.external[i].file_secno)) {
hpfs_error(s, "there is an allocation error in file %08x, sector %08x", f, secs);
if (i) i--;
}
- else if (btree->u.external[i].file_secno + btree->u.external[i].length > secs) {
- hpfs_free_sectors(s, btree->u.external[i].disk_secno + secs -
- btree->u.external[i].file_secno, btree->u.external[i].length
- - secs + btree->u.external[i].file_secno); /* I hope gcc optimizes this :-) */
- btree->u.external[i].length = secs - btree->u.external[i].file_secno;
+ else if (le32_to_cpu(btree->u.external[i].file_secno) + le32_to_cpu(btree->u.external[i].length) > secs) {
+ hpfs_free_sectors(s, le32_to_cpu(btree->u.external[i].disk_secno) + secs -
+ le32_to_cpu(btree->u.external[i].file_secno), le32_to_cpu(btree->u.external[i].length)
+ - secs + le32_to_cpu(btree->u.external[i].file_secno)); /* I hope gcc optimizes this :-) */
+ btree->u.external[i].length = cpu_to_le32(secs - le32_to_cpu(btree->u.external[i].file_secno));
}
for (j = i + 1; j < btree->n_used_nodes; j++)
- hpfs_free_sectors(s, btree->u.external[j].disk_secno, btree->u.external[j].length);
+ hpfs_free_sectors(s, le32_to_cpu(btree->u.external[j].disk_secno), le32_to_cpu(btree->u.external[j].length));
btree->n_used_nodes = i + 1;
btree->n_free_nodes = nodes - btree->n_used_nodes;
- btree->first_free = 8 + 12 * btree->n_used_nodes;
+ btree->first_free = cpu_to_le16(8 + 12 * btree->n_used_nodes);
mark_buffer_dirty(bh);
brelse(bh);
}
struct extended_attribute *ea_end;
if (!(fnode = hpfs_map_fnode(s, fno, &bh))) return;
if (!fnode->dirflag) hpfs_remove_btree(s, &fnode->btree);
- else hpfs_remove_dtree(s, fnode->u.external[0].disk_secno);
+ else hpfs_remove_dtree(s, le32_to_cpu(fnode->u.external[0].disk_secno));
ea_end = fnode_end_ea(fnode);
for (ea = fnode_ea(fnode); ea < ea_end; ea = next_ea(ea))
if (ea->indirect)
hpfs_ea_remove(s, ea_sec(ea), ea->anode, ea_len(ea));
- hpfs_ea_ext_remove(s, fnode->ea_secno, fnode->ea_anode, fnode->ea_size_l);
+ hpfs_ea_ext_remove(s, le32_to_cpu(fnode->ea_secno), fnode->ea_anode, le32_to_cpu(fnode->ea_size_l));
brelse(bh);
hpfs_free_sectors(s, fno, 1);
}
#include <linux/slab.h>
#include "hpfs_fn.h"
-void hpfs_lock_creation(struct super_block *s)
-{
-#ifdef DEBUG_LOCKS
- printk("lock creation\n");
-#endif
- mutex_lock(&hpfs_sb(s)->hpfs_creation_de);
-}
-
-void hpfs_unlock_creation(struct super_block *s)
-{
-#ifdef DEBUG_LOCKS
- printk("unlock creation\n");
-#endif
- mutex_unlock(&hpfs_sb(s)->hpfs_creation_de);
-}
-
/* Map a sector into a buffer and return pointers to it and to the buffer. */
void *hpfs_map_sector(struct super_block *s, unsigned secno, struct buffer_head **bhp,
{
struct buffer_head *bh;
+ hpfs_lock_assert(s);
+
cond_resched();
*bhp = bh = sb_bread(s, secno);
struct buffer_head *bh;
/*return hpfs_map_sector(s, secno, bhp, 0);*/
+ hpfs_lock_assert(s);
+
cond_resched();
if ((*bhp = bh = sb_getblk(s, secno)) != NULL) {
struct buffer_head *bh;
char *data;
+ hpfs_lock_assert(s);
+
cond_resched();
if (secno & 3) {
{
cond_resched();
+ hpfs_lock_assert(s);
+
if (secno & 3) {
printk("HPFS: hpfs_get_4sectors: unaligned read\n");
return NULL;
hpfs_error(inode->i_sb, "not a directory, fnode %08lx",
(unsigned long)inode->i_ino);
}
- if (hpfs_inode->i_dno != fno->u.external[0].disk_secno) {
+ if (hpfs_inode->i_dno != le32_to_cpu(fno->u.external[0].disk_secno)) {
e = 1;
- hpfs_error(inode->i_sb, "corrupted inode: i_dno == %08x, fnode -> dnode == %08x", hpfs_inode->i_dno, fno->u.external[0].disk_secno);
+ hpfs_error(inode->i_sb, "corrupted inode: i_dno == %08x, fnode -> dnode == %08x", hpfs_inode->i_dno, le32_to_cpu(fno->u.external[0].disk_secno));
}
brelse(bh);
if (e) {
goto again;
}
tempname = hpfs_translate_name(inode->i_sb, de->name, de->namelen, lc, de->not_8x3);
- if (filldir(dirent, tempname, de->namelen, old_pos, de->fnode, DT_UNKNOWN) < 0) {
+ if (filldir(dirent, tempname, de->namelen, old_pos, le32_to_cpu(de->fnode), DT_UNKNOWN) < 0) {
filp->f_pos = old_pos;
if (tempname != de->name) kfree(tempname);
hpfs_brelse4(&qbh);
* Get inode number, what we're after.
*/
- ino = de->fnode;
+ ino = le32_to_cpu(de->fnode);
/*
* Go find or make an inode.
hpfs_init_inode(result);
if (de->directory)
hpfs_read_inode(result);
- else if (de->ea_size && hpfs_sb(dir->i_sb)->sb_eas)
+ else if (le32_to_cpu(de->ea_size) && hpfs_sb(dir->i_sb)->sb_eas)
hpfs_read_inode(result);
else {
result->i_mode |= S_IFREG;
hpfs_result = hpfs_i(result);
if (!de->directory) hpfs_result->i_parent_dir = dir->i_ino;
- hpfs_decide_conv(result, name, len);
-
if (de->has_acl || de->has_xtd_perm) if (!(dir->i_sb->s_flags & MS_RDONLY)) {
hpfs_error(result->i_sb, "ACLs or XPERM found. This is probably HPFS386. This driver doesn't support it now. Send me some info on these structures");
goto bail1;
*/
if (!result->i_ctime.tv_sec) {
- if (!(result->i_ctime.tv_sec = local_to_gmt(dir->i_sb, de->creation_date)))
+ if (!(result->i_ctime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(de->creation_date))))
result->i_ctime.tv_sec = 1;
result->i_ctime.tv_nsec = 0;
- result->i_mtime.tv_sec = local_to_gmt(dir->i_sb, de->write_date);
+ result->i_mtime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(de->write_date));
result->i_mtime.tv_nsec = 0;
- result->i_atime.tv_sec = local_to_gmt(dir->i_sb, de->read_date);
+ result->i_atime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(de->read_date));
result->i_atime.tv_nsec = 0;
- hpfs_result->i_ea_size = de->ea_size;
+ hpfs_result->i_ea_size = le32_to_cpu(de->ea_size);
if (!hpfs_result->i_ea_mode && de->read_only)
result->i_mode &= ~0222;
if (!de->directory) {
if (result->i_size == -1) {
- result->i_size = de->file_size;
+ result->i_size = le32_to_cpu(de->file_size);
result->i_data.a_ops = &hpfs_aops;
hpfs_i(result)->mmu_private = result->i_size;
/*
struct hpfs_dirent *de_end = dnode_end_de(d);
int i = 1;
for (de = dnode_first_de(d); de < de_end; de = de_next_de(de)) {
- if (de == fde) return ((loff_t) d->self << 4) | (loff_t)i;
+ if (de == fde) return ((loff_t) le32_to_cpu(d->self) << 4) | (loff_t)i;
i++;
}
printk("HPFS: get_pos: not_found\n");
- return ((loff_t)d->self << 4) | (loff_t)1;
+ return ((loff_t)le32_to_cpu(d->self) << 4) | (loff_t)1;
}
void hpfs_add_pos(struct inode *inode, loff_t *pos)
{
struct hpfs_dirent *de;
if (!(de = dnode_last_de(d))) {
- hpfs_error(s, "set_last_pointer: empty dnode %08x", d->self);
+ hpfs_error(s, "set_last_pointer: empty dnode %08x", le32_to_cpu(d->self));
return;
}
if (hpfs_sb(s)->sb_chk) {
if (de->down) {
hpfs_error(s, "set_last_pointer: dnode %08x has already last pointer %08x",
- d->self, de_down_pointer(de));
+ le32_to_cpu(d->self), de_down_pointer(de));
return;
}
- if (de->length != 32) {
- hpfs_error(s, "set_last_pointer: bad last dirent in dnode %08x", d->self);
+ if (le16_to_cpu(de->length) != 32) {
+ hpfs_error(s, "set_last_pointer: bad last dirent in dnode %08x", le32_to_cpu(d->self));
return;
}
}
if (ptr) {
- if ((d->first_free += 4) > 2048) {
- hpfs_error(s,"set_last_pointer: too long dnode %08x", d->self);
- d->first_free -= 4;
+ d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) + 4);
+ if (le32_to_cpu(d->first_free) > 2048) {
+ hpfs_error(s, "set_last_pointer: too long dnode %08x", le32_to_cpu(d->self));
+ d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) - 4);
return;
}
- de->length = 36;
+ de->length = cpu_to_le16(36);
de->down = 1;
- *(dnode_secno *)((char *)de + 32) = ptr;
+ *(dnode_secno *)((char *)de + 32) = cpu_to_le32(ptr);
}
}
for (de = dnode_first_de(d); de < de_end; de = de_next_de(de)) {
int c = hpfs_compare_names(s, name, namelen, de->name, de->namelen, de->last);
if (!c) {
- hpfs_error(s, "name (%c,%d) already exists in dnode %08x", *name, namelen, d->self);
+ hpfs_error(s, "name (%c,%d) already exists in dnode %08x", *name, namelen, le32_to_cpu(d->self));
return NULL;
}
if (c < 0) break;
memmove((char *)de + d_size, de, (char *)de_end - (char *)de);
memset(de, 0, d_size);
if (down_ptr) {
- *(int *)((char *)de + d_size - 4) = down_ptr;
+ *(dnode_secno *)((char *)de + d_size - 4) = cpu_to_le32(down_ptr);
de->down = 1;
}
- de->length = d_size;
- if (down_ptr) de->down = 1;
+ de->length = cpu_to_le16(d_size);
de->not_8x3 = hpfs_is_name_long(name, namelen);
de->namelen = namelen;
memcpy(de->name, name, namelen);
- d->first_free += d_size;
+ d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) + d_size);
return de;
}
struct hpfs_dirent *de)
{
if (de->last) {
- hpfs_error(s, "attempt to delete last dirent in dnode %08x", d->self);
+ hpfs_error(s, "attempt to delete last dirent in dnode %08x", le32_to_cpu(d->self));
return;
}
- d->first_free -= de->length;
- memmove(de, de_next_de(de), d->first_free + (char *)d - (char *)de);
+ d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) - le16_to_cpu(de->length));
+ memmove(de, de_next_de(de), le32_to_cpu(d->first_free) + (char *)d - (char *)de);
}
static void fix_up_ptrs(struct super_block *s, struct dnode *d)
{
struct hpfs_dirent *de;
struct hpfs_dirent *de_end = dnode_end_de(d);
- dnode_secno dno = d->self;
+ dnode_secno dno = le32_to_cpu(d->self);
for (de = dnode_first_de(d); de < de_end; de = de_next_de(de))
if (de->down) {
struct quad_buffer_head qbh;
struct dnode *dd;
if ((dd = hpfs_map_dnode(s, de_down_pointer(de), &qbh))) {
- if (dd->up != dno || dd->root_dnode) {
- dd->up = dno;
+ if (le32_to_cpu(dd->up) != dno || dd->root_dnode) {
+ dd->up = cpu_to_le32(dno);
dd->root_dnode = 0;
hpfs_mark_4buffers_dirty(&qbh);
}
kfree(nname);
return 1;
}
- if (d->first_free + de_size(namelen, down_ptr) <= 2048) {
+ if (le32_to_cpu(d->first_free) + de_size(namelen, down_ptr) <= 2048) {
loff_t t;
copy_de(de=hpfs_add_de(i->i_sb, d, name, namelen, down_ptr), new_de);
t = get_pos(d, de);
kfree(nname);
return 1;
}
- memcpy(nd, d, d->first_free);
+ memcpy(nd, d, le32_to_cpu(d->first_free));
copy_de(de = hpfs_add_de(i->i_sb, nd, name, namelen, down_ptr), new_de);
for_all_poss(i, hpfs_pos_ins, get_pos(nd, de), 1);
h = ((char *)dnode_last_de(nd) - (char *)nd) / 2 + 10;
- if (!(ad = hpfs_alloc_dnode(i->i_sb, d->up, &adno, &qbh1, 0))) {
+ if (!(ad = hpfs_alloc_dnode(i->i_sb, le32_to_cpu(d->up), &adno, &qbh1))) {
hpfs_error(i->i_sb, "unable to alloc dnode - dnode tree will be corrupted");
hpfs_brelse4(&qbh);
kfree(nd);
down_ptr = adno;
set_last_pointer(i->i_sb, ad, de->down ? de_down_pointer(de) : 0);
de = de_next_de(de);
- memmove((char *)nd + 20, de, nd->first_free + (char *)nd - (char *)de);
- nd->first_free -= (char *)de - (char *)nd - 20;
- memcpy(d, nd, nd->first_free);
+ memmove((char *)nd + 20, de, le32_to_cpu(nd->first_free) + (char *)nd - (char *)de);
+ nd->first_free = cpu_to_le32(le32_to_cpu(nd->first_free) - ((char *)de - (char *)nd - 20));
+ memcpy(d, nd, le32_to_cpu(nd->first_free));
for_all_poss(i, hpfs_pos_del, (loff_t)dno << 4, pos);
fix_up_ptrs(i->i_sb, ad);
if (!d->root_dnode) {
- dno = ad->up = d->up;
+ ad->up = d->up;
+ dno = le32_to_cpu(ad->up);
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
goto go_up;
}
- if (!(rd = hpfs_alloc_dnode(i->i_sb, d->up, &rdno, &qbh2, 0))) {
+ if (!(rd = hpfs_alloc_dnode(i->i_sb, le32_to_cpu(d->up), &rdno, &qbh2))) {
hpfs_error(i->i_sb, "unable to alloc dnode - dnode tree will be corrupted");
hpfs_brelse4(&qbh);
hpfs_brelse4(&qbh1);
i->i_blocks += 4;
rd->root_dnode = 1;
rd->up = d->up;
- if (!(fnode = hpfs_map_fnode(i->i_sb, d->up, &bh))) {
+ if (!(fnode = hpfs_map_fnode(i->i_sb, le32_to_cpu(d->up), &bh))) {
hpfs_free_dnode(i->i_sb, rdno);
hpfs_brelse4(&qbh);
hpfs_brelse4(&qbh1);
kfree(nname);
return 1;
}
- fnode->u.external[0].disk_secno = rdno;
+ fnode->u.external[0].disk_secno = cpu_to_le32(rdno);
mark_buffer_dirty(bh);
brelse(bh);
- d->up = ad->up = hpfs_i(i)->i_dno = rdno;
+ hpfs_i(i)->i_dno = rdno;
+ d->up = ad->up = cpu_to_le32(rdno);
d->root_dnode = ad->root_dnode = 0;
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
int hpfs_add_dirent(struct inode *i,
const unsigned char *name, unsigned namelen,
- struct hpfs_dirent *new_de, int cdepth)
+ struct hpfs_dirent *new_de)
{
struct hpfs_inode_info *hpfs_inode = hpfs_i(i);
struct dnode *d;
}
}
hpfs_brelse4(&qbh);
- if (!cdepth) hpfs_lock_creation(i->i_sb);
if (hpfs_check_free_dnodes(i->i_sb, FREE_DNODES_ADD)) {
c = 1;
goto ret;
i->i_version++;
c = hpfs_add_to_dnode(i, dno, name, namelen, new_de, 0);
ret:
- if (!cdepth) hpfs_unlock_creation(i->i_sb);
return c;
}
return 0;
if (!(dnode = hpfs_map_dnode(i->i_sb, dno, &qbh))) return 0;
if (hpfs_sb(i->i_sb)->sb_chk) {
- if (dnode->up != chk_up) {
+ if (le32_to_cpu(dnode->up) != chk_up) {
hpfs_error(i->i_sb, "move_to_top: up pointer from %08x should be %08x, is %08x",
- dno, chk_up, dnode->up);
+ dno, chk_up, le32_to_cpu(dnode->up));
hpfs_brelse4(&qbh);
return 0;
}
hpfs_brelse4(&qbh);
}
while (!(de = dnode_pre_last_de(dnode))) {
- dnode_secno up = dnode->up;
+ dnode_secno up = le32_to_cpu(dnode->up);
hpfs_brelse4(&qbh);
hpfs_free_dnode(i->i_sb, dno);
i->i_size -= 2048;
hpfs_brelse4(&qbh);
return 0;
}
- dnode->first_free -= 4;
- de->length -= 4;
+ dnode->first_free = cpu_to_le32(le32_to_cpu(dnode->first_free) - 4);
+ de->length = cpu_to_le16(le16_to_cpu(de->length) - 4);
de->down = 0;
hpfs_mark_4buffers_dirty(&qbh);
dno = up;
t = get_pos(dnode, de);
for_all_poss(i, hpfs_pos_subst, t, 4);
for_all_poss(i, hpfs_pos_subst, t + 1, 5);
- if (!(nde = kmalloc(de->length, GFP_NOFS))) {
+ if (!(nde = kmalloc(le16_to_cpu(de->length), GFP_NOFS))) {
hpfs_error(i->i_sb, "out of memory for dirent - directory will be corrupted");
hpfs_brelse4(&qbh);
return 0;
}
- memcpy(nde, de, de->length);
+ memcpy(nde, de, le16_to_cpu(de->length));
ddno = de->down ? de_down_pointer(de) : 0;
hpfs_delete_de(i->i_sb, dnode, de);
set_last_pointer(i->i_sb, dnode, ddno);
try_it_again:
if (hpfs_stop_cycles(i->i_sb, dno, &c1, &c2, "delete_empty_dnode")) return;
if (!(dnode = hpfs_map_dnode(i->i_sb, dno, &qbh))) return;
- if (dnode->first_free > 56) goto end;
- if (dnode->first_free == 52 || dnode->first_free == 56) {
+ if (le32_to_cpu(dnode->first_free) > 56) goto end;
+ if (le32_to_cpu(dnode->first_free) == 52 || le32_to_cpu(dnode->first_free) == 56) {
struct hpfs_dirent *de_end;
int root = dnode->root_dnode;
- up = dnode->up;
+ up = le32_to_cpu(dnode->up);
de = dnode_first_de(dnode);
down = de->down ? de_down_pointer(de) : 0;
if (hpfs_sb(i->i_sb)->sb_chk) if (root && !down) {
return;
}
if ((d1 = hpfs_map_dnode(i->i_sb, down, &qbh1))) {
- d1->up = up;
+ d1->up = cpu_to_le32(up);
d1->root_dnode = 1;
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
}
if ((fnode = hpfs_map_fnode(i->i_sb, up, &bh))) {
- fnode->u.external[0].disk_secno = down;
+ fnode->u.external[0].disk_secno = cpu_to_le32(down);
mark_buffer_dirty(bh);
brelse(bh);
}
for_all_poss(i, hpfs_pos_subst, ((loff_t)dno << 4) | 1, ((loff_t)up << 4) | p);
if (!down) {
de->down = 0;
- de->length -= 4;
- dnode->first_free -= 4;
+ de->length = cpu_to_le16(le16_to_cpu(de->length) - 4);
+ dnode->first_free = cpu_to_le32(le32_to_cpu(dnode->first_free) - 4);
memmove(de_next_de(de), (char *)de_next_de(de) + 4,
- (char *)dnode + dnode->first_free - (char *)de_next_de(de));
+ (char *)dnode + le32_to_cpu(dnode->first_free) - (char *)de_next_de(de));
} else {
struct dnode *d1;
struct quad_buffer_head qbh1;
- *(dnode_secno *) ((void *) de + de->length - 4) = down;
+ *(dnode_secno *) ((void *) de + le16_to_cpu(de->length) - 4) = down;
if ((d1 = hpfs_map_dnode(i->i_sb, down, &qbh1))) {
- d1->up = up;
+ d1->up = cpu_to_le32(up);
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
}
}
} else {
- hpfs_error(i->i_sb, "delete_empty_dnode: dnode %08x, first_free == %03x", dno, dnode->first_free);
+ hpfs_error(i->i_sb, "delete_empty_dnode: dnode %08x, first_free == %03x", dno, le32_to_cpu(dnode->first_free));
goto end;
}
struct quad_buffer_head qbh1;
if (!de_next->down) goto endm;
ndown = de_down_pointer(de_next);
- if (!(de_cp = kmalloc(de->length, GFP_NOFS))) {
+ if (!(de_cp = kmalloc(le16_to_cpu(de->length), GFP_NOFS))) {
printk("HPFS: out of memory for dtree balancing\n");
goto endm;
}
- memcpy(de_cp, de, de->length);
+ memcpy(de_cp, de, le16_to_cpu(de->length));
hpfs_delete_de(i->i_sb, dnode, de);
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
for_all_poss(i, hpfs_pos_subst, ((loff_t)up << 4) | p, 4);
for_all_poss(i, hpfs_pos_del, ((loff_t)up << 4) | p, 1);
if (de_cp->down) if ((d1 = hpfs_map_dnode(i->i_sb, de_down_pointer(de_cp), &qbh1))) {
- d1->up = ndown;
+ d1->up = cpu_to_le32(ndown);
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
}
struct hpfs_dirent *del = dnode_last_de(d1);
dlp = del->down ? de_down_pointer(del) : 0;
if (!dlp && down) {
- if (d1->first_free > 2044) {
+ if (le32_to_cpu(d1->first_free) > 2044) {
if (hpfs_sb(i->i_sb)->sb_chk >= 2) {
printk("HPFS: warning: unbalanced dnode tree, see hpfs.txt 4 more info\n");
printk("HPFS: warning: terminating balancing operation\n");
printk("HPFS: warning: unbalanced dnode tree, see hpfs.txt 4 more info\n");
printk("HPFS: warning: goin'on\n");
}
- del->length += 4;
+ del->length = cpu_to_le16(le16_to_cpu(del->length) + 4);
del->down = 1;
- d1->first_free += 4;
+ d1->first_free = cpu_to_le32(le32_to_cpu(d1->first_free) + 4);
}
if (dlp && !down) {
- del->length -= 4;
+ del->length = cpu_to_le16(le16_to_cpu(del->length) - 4);
del->down = 0;
- d1->first_free -= 4;
+ d1->first_free = cpu_to_le32(le32_to_cpu(d1->first_free) - 4);
} else if (down)
- *(dnode_secno *) ((void *) del + del->length - 4) = down;
+ *(dnode_secno *) ((void *) del + le16_to_cpu(del->length) - 4) = cpu_to_le32(down);
} else goto endm;
- if (!(de_cp = kmalloc(de_prev->length, GFP_NOFS))) {
+ if (!(de_cp = kmalloc(le16_to_cpu(de_prev->length), GFP_NOFS))) {
printk("HPFS: out of memory for dtree balancing\n");
hpfs_brelse4(&qbh1);
goto endm;
}
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
- memcpy(de_cp, de_prev, de_prev->length);
+ memcpy(de_cp, de_prev, le16_to_cpu(de_prev->length));
hpfs_delete_de(i->i_sb, dnode, de_prev);
if (!de_prev->down) {
- de_prev->length += 4;
+ de_prev->length = cpu_to_le16(le16_to_cpu(de_prev->length) + 4);
de_prev->down = 1;
- dnode->first_free += 4;
+ dnode->first_free = cpu_to_le32(le32_to_cpu(dnode->first_free) + 4);
}
- *(dnode_secno *) ((void *) de_prev + de_prev->length - 4) = ndown;
+ *(dnode_secno *) ((void *) de_prev + le16_to_cpu(de_prev->length) - 4) = cpu_to_le32(ndown);
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
for_all_poss(i, hpfs_pos_subst, ((loff_t)up << 4) | (p - 1), 4);
for_all_poss(i, hpfs_pos_subst, ((loff_t)up << 4) | p, ((loff_t)up << 4) | (p - 1));
if (down) if ((d1 = hpfs_map_dnode(i->i_sb, de_down_pointer(de), &qbh1))) {
- d1->up = ndown;
+ d1->up = cpu_to_le32(ndown);
hpfs_mark_4buffers_dirty(&qbh1);
hpfs_brelse4(&qbh1);
}
{
struct dnode *dnode = qbh->data;
dnode_secno down = 0;
- int lock = 0;
loff_t t;
if (de->first || de->last) {
hpfs_error(i->i_sb, "hpfs_remove_dirent: attempt to delete first or last dirent in dnode %08x", dno);
}
if (de->down) down = de_down_pointer(de);
if (depth && (de->down || (de == dnode_first_de(dnode) && de_next_de(de)->last))) {
- lock = 1;
- hpfs_lock_creation(i->i_sb);
if (hpfs_check_free_dnodes(i->i_sb, FREE_DNODES_DEL)) {
hpfs_brelse4(qbh);
- hpfs_unlock_creation(i->i_sb);
return 2;
}
}
dnode_secno a = move_to_top(i, down, dno);
for_all_poss(i, hpfs_pos_subst, 5, t);
if (a) delete_empty_dnode(i, a);
- if (lock) hpfs_unlock_creation(i->i_sb);
return !a;
}
delete_empty_dnode(i, dno);
- if (lock) hpfs_unlock_creation(i->i_sb);
return 0;
}
ptr = 0;
go_up:
if (!(dnode = hpfs_map_dnode(s, dno, &qbh))) return;
- if (hpfs_sb(s)->sb_chk) if (odno && odno != -1 && dnode->up != odno)
- hpfs_error(s, "hpfs_count_dnodes: bad up pointer; dnode %08x, down %08x points to %08x", odno, dno, dnode->up);
+ if (hpfs_sb(s)->sb_chk) if (odno && odno != -1 && le32_to_cpu(dnode->up) != odno)
+ hpfs_error(s, "hpfs_count_dnodes: bad up pointer; dnode %08x, down %08x points to %08x", odno, dno, le32_to_cpu(dnode->up));
de = dnode_first_de(dnode);
if (ptr) while(1) {
if (de->down) if (de_down_pointer(de) == ptr) goto process_de;
if (!de->first && !de->last && n_items) (*n_items)++;
if ((de = de_next_de(de)) < dnode_end_de(dnode)) goto next_de;
ptr = dno;
- dno = dnode->up;
+ dno = le32_to_cpu(dnode->up);
if (dnode->root_dnode) {
hpfs_brelse4(&qbh);
return;
return d;
if (!(de = map_nth_dirent(s, d, 1, &qbh, NULL))) return dno;
if (hpfs_sb(s)->sb_chk)
- if (up && ((struct dnode *)qbh.data)->up != up)
- hpfs_error(s, "hpfs_de_as_down_as_possible: bad up pointer; dnode %08x, down %08x points to %08x", up, d, ((struct dnode *)qbh.data)->up);
+ if (up && le32_to_cpu(((struct dnode *)qbh.data)->up) != up)
+ hpfs_error(s, "hpfs_de_as_down_as_possible: bad up pointer; dnode %08x, down %08x points to %08x", up, d, le32_to_cpu(((struct dnode *)qbh.data)->up));
if (!de->down) {
hpfs_brelse4(&qbh);
return d;
/* Going up */
if (dnode->root_dnode) goto bail;
- if (!(up_dnode = hpfs_map_dnode(inode->i_sb, dnode->up, &qbh0)))
+ if (!(up_dnode = hpfs_map_dnode(inode->i_sb, le32_to_cpu(dnode->up), &qbh0)))
goto bail;
end_up_de = dnode_end_de(up_dnode);
for (up_de = dnode_first_de(up_dnode); up_de < end_up_de;
up_de = de_next_de(up_de)) {
if (!(++c & 077)) hpfs_error(inode->i_sb,
- "map_pos_dirent: pos crossed dnode boundary; dnode = %08x", dnode->up);
+ "map_pos_dirent: pos crossed dnode boundary; dnode = %08x", le32_to_cpu(dnode->up));
if (up_de->down && de_down_pointer(up_de) == dno) {
- *posp = ((loff_t) dnode->up << 4) + c;
+ *posp = ((loff_t) le32_to_cpu(dnode->up) << 4) + c;
hpfs_brelse4(&qbh0);
return de;
}
}
hpfs_error(inode->i_sb, "map_pos_dirent: pointer to dnode %08x not found in parent dnode %08x",
- dno, dnode->up);
+ dno, le32_to_cpu(dnode->up));
hpfs_brelse4(&qbh0);
bail:
/*name2[15] = 0xff;*/
name1len = 15; name2len = 256;
}
- if (!(upf = hpfs_map_fnode(s, f->up, &bh))) {
+ if (!(upf = hpfs_map_fnode(s, le32_to_cpu(f->up), &bh))) {
kfree(name2);
return NULL;
}
if (!upf->dirflag) {
brelse(bh);
- hpfs_error(s, "fnode %08x has non-directory parent %08x", fno, f->up);
+ hpfs_error(s, "fnode %08x has non-directory parent %08x", fno, le32_to_cpu(f->up));
kfree(name2);
return NULL;
}
- dno = upf->u.external[0].disk_secno;
+ dno = le32_to_cpu(upf->u.external[0].disk_secno);
brelse(bh);
go_down:
downd = 0;
return NULL;
}
next_de:
- if (de->fnode == fno) {
+ if (le32_to_cpu(de->fnode) == fno) {
kfree(name2);
return de;
}
goto go_down;
}
f:
- if (de->fnode == fno) {
+ if (le32_to_cpu(de->fnode) == fno) {
kfree(name2);
return de;
}
if ((de = de_next_de(de)) < de_end) goto next_de;
if (d->root_dnode) goto not_found;
downd = dno;
- dno = d->up;
+ dno = le32_to_cpu(d->up);
hpfs_brelse4(qbh);
if (hpfs_sb(s)->sb_chk)
if (hpfs_stop_cycles(s, downd, &d1, &d2, "map_fnode_dirent #2")) {
}
if (hpfs_ea_read(s, a, ano, pos, 4, ex)) return;
if (ea->indirect) {
- if (ea->valuelen != 8) {
+ if (ea_valuelen(ea) != 8) {
hpfs_error(s, "ea->indirect set while ea->valuelen!=8, %s %08x, pos %08x",
ano ? "anode" : "sectors", a, pos);
return;
return;
hpfs_ea_remove(s, ea_sec(ea), ea->anode, ea_len(ea));
}
- pos += ea->namelen + ea->valuelen + 5;
+ pos += ea->namelen + ea_valuelen(ea) + 5;
}
if (!ano) hpfs_free_sectors(s, a, (len+511) >> 9);
else {
unsigned pos;
int ano, len;
secno a;
+ char ex[4 + 255 + 1 + 8];
struct extended_attribute *ea;
struct extended_attribute *ea_end = fnode_end_ea(fnode);
for (ea = fnode_ea(fnode); ea < ea_end; ea = next_ea(ea))
if (!strcmp(ea->name, key)) {
if (ea->indirect)
goto indirect;
- if (ea->valuelen >= size)
+ if (ea_valuelen(ea) >= size)
return -EINVAL;
- memcpy(buf, ea_data(ea), ea->valuelen);
- buf[ea->valuelen] = 0;
+ memcpy(buf, ea_data(ea), ea_valuelen(ea));
+ buf[ea_valuelen(ea)] = 0;
return 0;
}
- a = fnode->ea_secno;
- len = fnode->ea_size_l;
+ a = le32_to_cpu(fnode->ea_secno);
+ len = le32_to_cpu(fnode->ea_size_l);
ano = fnode->ea_anode;
pos = 0;
while (pos < len) {
- char ex[4 + 255 + 1 + 8];
ea = (struct extended_attribute *)ex;
if (pos + 4 > len) {
hpfs_error(s, "EAs don't end correctly, %s %08x, len %08x",
if (!strcmp(ea->name, key)) {
if (ea->indirect)
goto indirect;
- if (ea->valuelen >= size)
+ if (ea_valuelen(ea) >= size)
return -EINVAL;
- if (hpfs_ea_read(s, a, ano, pos + 4 + ea->namelen + 1, ea->valuelen, buf))
+ if (hpfs_ea_read(s, a, ano, pos + 4 + ea->namelen + 1, ea_valuelen(ea), buf))
return -EIO;
- buf[ea->valuelen] = 0;
+ buf[ea_valuelen(ea)] = 0;
return 0;
}
- pos += ea->namelen + ea->valuelen + 5;
+ pos += ea->namelen + ea_valuelen(ea) + 5;
}
return -ENOENT;
indirect:
if (!strcmp(ea->name, key)) {
if (ea->indirect)
return get_indirect_ea(s, ea->anode, ea_sec(ea), *size = ea_len(ea));
- if (!(ret = kmalloc((*size = ea->valuelen) + 1, GFP_NOFS))) {
+ if (!(ret = kmalloc((*size = ea_valuelen(ea)) + 1, GFP_NOFS))) {
printk("HPFS: out of memory for EA\n");
return NULL;
}
- memcpy(ret, ea_data(ea), ea->valuelen);
- ret[ea->valuelen] = 0;
+ memcpy(ret, ea_data(ea), ea_valuelen(ea));
+ ret[ea_valuelen(ea)] = 0;
return ret;
}
- a = fnode->ea_secno;
- len = fnode->ea_size_l;
+ a = le32_to_cpu(fnode->ea_secno);
+ len = le32_to_cpu(fnode->ea_size_l);
ano = fnode->ea_anode;
pos = 0;
while (pos < len) {
if (!strcmp(ea->name, key)) {
if (ea->indirect)
return get_indirect_ea(s, ea->anode, ea_sec(ea), *size = ea_len(ea));
- if (!(ret = kmalloc((*size = ea->valuelen) + 1, GFP_NOFS))) {
+ if (!(ret = kmalloc((*size = ea_valuelen(ea)) + 1, GFP_NOFS))) {
printk("HPFS: out of memory for EA\n");
return NULL;
}
- if (hpfs_ea_read(s, a, ano, pos + 4 + ea->namelen + 1, ea->valuelen, ret)) {
+ if (hpfs_ea_read(s, a, ano, pos + 4 + ea->namelen + 1, ea_valuelen(ea), ret)) {
kfree(ret);
return NULL;
}
- ret[ea->valuelen] = 0;
+ ret[ea_valuelen(ea)] = 0;
return ret;
}
- pos += ea->namelen + ea->valuelen + 5;
+ pos += ea->namelen + ea_valuelen(ea) + 5;
}
return NULL;
}
if (ea->indirect) {
if (ea_len(ea) == size)
set_indirect_ea(s, ea->anode, ea_sec(ea), data, size);
- } else if (ea->valuelen == size) {
+ } else if (ea_valuelen(ea) == size) {
memcpy(ea_data(ea), data, size);
}
return;
}
- a = fnode->ea_secno;
- len = fnode->ea_size_l;
+ a = le32_to_cpu(fnode->ea_secno);
+ len = le32_to_cpu(fnode->ea_size_l);
ano = fnode->ea_anode;
pos = 0;
while (pos < len) {
set_indirect_ea(s, ea->anode, ea_sec(ea), data, size);
}
else {
- if (ea->valuelen == size)
+ if (ea_valuelen(ea) == size)
hpfs_ea_write(s, a, ano, pos + 4 + ea->namelen + 1, size, data);
}
return;
}
- pos += ea->namelen + ea->valuelen + 5;
+ pos += ea->namelen + ea_valuelen(ea) + 5;
}
- if (!fnode->ea_offs) {
- /*if (fnode->ea_size_s) {
+ if (!le16_to_cpu(fnode->ea_offs)) {
+ /*if (le16_to_cpu(fnode->ea_size_s)) {
hpfs_error(s, "fnode %08x: ea_size_s == %03x, ea_offs == 0",
- inode->i_ino, fnode->ea_size_s);
+ inode->i_ino, le16_to_cpu(fnode->ea_size_s));
return;
}*/
- fnode->ea_offs = 0xc4;
+ fnode->ea_offs = cpu_to_le16(0xc4);
}
- if (fnode->ea_offs < 0xc4 || fnode->ea_offs + fnode->acl_size_s + fnode->ea_size_s > 0x200) {
+ if (le16_to_cpu(fnode->ea_offs) < 0xc4 || le16_to_cpu(fnode->ea_offs) + le16_to_cpu(fnode->acl_size_s) + le16_to_cpu(fnode->ea_size_s) > 0x200) {
hpfs_error(s, "fnode %08lx: ea_offs == %03x, ea_size_s == %03x",
(unsigned long)inode->i_ino,
- fnode->ea_offs, fnode->ea_size_s);
+ le32_to_cpu(fnode->ea_offs), le16_to_cpu(fnode->ea_size_s));
return;
}
- if ((fnode->ea_size_s || !fnode->ea_size_l) &&
- fnode->ea_offs + fnode->acl_size_s + fnode->ea_size_s + strlen(key) + size + 5 <= 0x200) {
+ if ((le16_to_cpu(fnode->ea_size_s) || !le32_to_cpu(fnode->ea_size_l)) &&
+ le16_to_cpu(fnode->ea_offs) + le16_to_cpu(fnode->acl_size_s) + le16_to_cpu(fnode->ea_size_s) + strlen(key) + size + 5 <= 0x200) {
ea = fnode_end_ea(fnode);
*(char *)ea = 0;
ea->namelen = strlen(key);
- ea->valuelen = size;
+ ea->valuelen_lo = size;
+ ea->valuelen_hi = size >> 8;
strcpy(ea->name, key);
memcpy(ea_data(ea), data, size);
- fnode->ea_size_s += strlen(key) + size + 5;
+ fnode->ea_size_s = cpu_to_le16(le16_to_cpu(fnode->ea_size_s) + strlen(key) + size + 5);
goto ret;
}
/* Most the code here is 99.9993422% unused. I hope there are no bugs.
But what .. HPFS.IFS has also bugs in ea management. */
- if (fnode->ea_size_s && !fnode->ea_size_l) {
+ if (le16_to_cpu(fnode->ea_size_s) && !le32_to_cpu(fnode->ea_size_l)) {
secno n;
struct buffer_head *bh;
char *data;
- if (!(n = hpfs_alloc_sector(s, fno, 1, 0, 1))) return;
+ if (!(n = hpfs_alloc_sector(s, fno, 1, 0))) return;
if (!(data = hpfs_get_sector(s, n, &bh))) {
hpfs_free_sectors(s, n, 1);
return;
}
- memcpy(data, fnode_ea(fnode), fnode->ea_size_s);
- fnode->ea_size_l = fnode->ea_size_s;
- fnode->ea_size_s = 0;
- fnode->ea_secno = n;
- fnode->ea_anode = 0;
+ memcpy(data, fnode_ea(fnode), le16_to_cpu(fnode->ea_size_s));
+ fnode->ea_size_l = cpu_to_le32(le16_to_cpu(fnode->ea_size_s));
+ fnode->ea_size_s = cpu_to_le16(0);
+ fnode->ea_secno = cpu_to_le32(n);
+ fnode->ea_anode = cpu_to_le32(0);
mark_buffer_dirty(bh);
brelse(bh);
}
- pos = fnode->ea_size_l + 5 + strlen(key) + size;
- len = (fnode->ea_size_l + 511) >> 9;
+ pos = le32_to_cpu(fnode->ea_size_l) + 5 + strlen(key) + size;
+ len = (le32_to_cpu(fnode->ea_size_l) + 511) >> 9;
if (pos >= 30000) goto bail;
while (((pos + 511) >> 9) > len) {
if (!len) {
- if (!(fnode->ea_secno = hpfs_alloc_sector(s, fno, 1, 0, 1)))
- goto bail;
+ secno q = hpfs_alloc_sector(s, fno, 1, 0);
+ if (!q) goto bail;
+ fnode->ea_secno = cpu_to_le32(q);
fnode->ea_anode = 0;
len++;
} else if (!fnode->ea_anode) {
- if (hpfs_alloc_if_possible(s, fnode->ea_secno + len)) {
+ if (hpfs_alloc_if_possible(s, le32_to_cpu(fnode->ea_secno) + len)) {
len++;
} else {
/* Aargh... don't know how to create ea anodes :-( */
anode_secno a_s;
if (!(anode = hpfs_alloc_anode(s, fno, &a_s, &bh)))
goto bail;
- anode->up = fno;
+ anode->up = cpu_to_le32(fno);
anode->btree.fnode_parent = 1;
anode->btree.n_free_nodes--;
anode->btree.n_used_nodes++;
- anode->btree.first_free += 12;
- anode->u.external[0].disk_secno = fnode->ea_secno;
- anode->u.external[0].file_secno = 0;
- anode->u.external[0].length = len;
+ anode->btree.first_free = cpu_to_le16(le16_to_cpu(anode->btree.first_free) + 12);
+ anode->u.external[0].disk_secno = cpu_to_le32(le32_to_cpu(fnode->ea_secno));
+ anode->u.external[0].file_secno = cpu_to_le32(0);
+ anode->u.external[0].length = cpu_to_le32(len);
mark_buffer_dirty(bh);
brelse(bh);
fnode->ea_anode = 1;
- fnode->ea_secno = a_s;*/
+ fnode->ea_secno = cpu_to_le32(a_s);*/
secno new_sec;
int i;
- if (!(new_sec = hpfs_alloc_sector(s, fno, 1, 1 - ((pos + 511) >> 9), 1)))
+ if (!(new_sec = hpfs_alloc_sector(s, fno, 1, 1 - ((pos + 511) >> 9))))
goto bail;
for (i = 0; i < len; i++) {
struct buffer_head *bh1, *bh2;
void *b1, *b2;
- if (!(b1 = hpfs_map_sector(s, fnode->ea_secno + i, &bh1, len - i - 1))) {
+ if (!(b1 = hpfs_map_sector(s, le32_to_cpu(fnode->ea_secno) + i, &bh1, len - i - 1))) {
hpfs_free_sectors(s, new_sec, (pos + 511) >> 9);
goto bail;
}
mark_buffer_dirty(bh2);
brelse(bh2);
}
- hpfs_free_sectors(s, fnode->ea_secno, len);
- fnode->ea_secno = new_sec;
+ hpfs_free_sectors(s, le32_to_cpu(fnode->ea_secno), len);
+ fnode->ea_secno = cpu_to_le32(new_sec);
len = (pos + 511) >> 9;
}
}
if (fnode->ea_anode) {
- if (hpfs_add_sector_to_btree(s, fnode->ea_secno,
+ if (hpfs_add_sector_to_btree(s, le32_to_cpu(fnode->ea_secno),
0, len) != -1) {
len++;
} else {
h[1] = strlen(key);
h[2] = size & 0xff;
h[3] = size >> 8;
- if (hpfs_ea_write(s, fnode->ea_secno, fnode->ea_anode, fnode->ea_size_l, 4, h)) goto bail;
- if (hpfs_ea_write(s, fnode->ea_secno, fnode->ea_anode, fnode->ea_size_l + 4, h[1] + 1, key)) goto bail;
- if (hpfs_ea_write(s, fnode->ea_secno, fnode->ea_anode, fnode->ea_size_l + 5 + h[1], size, data)) goto bail;
- fnode->ea_size_l = pos;
+ if (hpfs_ea_write(s, le32_to_cpu(fnode->ea_secno), fnode->ea_anode, le32_to_cpu(fnode->ea_size_l), 4, h)) goto bail;
+ if (hpfs_ea_write(s, le32_to_cpu(fnode->ea_secno), fnode->ea_anode, le32_to_cpu(fnode->ea_size_l) + 4, h[1] + 1, key)) goto bail;
+ if (hpfs_ea_write(s, le32_to_cpu(fnode->ea_secno), fnode->ea_anode, le32_to_cpu(fnode->ea_size_l) + 5 + h[1], size, data)) goto bail;
+ fnode->ea_size_l = cpu_to_le32(pos);
ret:
hpfs_i(inode)->i_ea_size += 5 + strlen(key) + size;
return;
bail:
- if (fnode->ea_secno)
- if (fnode->ea_anode) hpfs_truncate_btree(s, fnode->ea_secno, 1, (fnode->ea_size_l + 511) >> 9);
- else hpfs_free_sectors(s, fnode->ea_secno + ((fnode->ea_size_l + 511) >> 9), len - ((fnode->ea_size_l + 511) >> 9));
- else fnode->ea_secno = fnode->ea_size_l = 0;
+ if (le32_to_cpu(fnode->ea_secno))
+ if (fnode->ea_anode) hpfs_truncate_btree(s, le32_to_cpu(fnode->ea_secno), 1, (le32_to_cpu(fnode->ea_size_l) + 511) >> 9);
+ else hpfs_free_sectors(s, le32_to_cpu(fnode->ea_secno) + ((le32_to_cpu(fnode->ea_size_l) + 511) >> 9), len - ((le32_to_cpu(fnode->ea_size_l) + 511) >> 9));
+ else fnode->ea_secno = fnode->ea_size_l = cpu_to_le32(0);
}
int hpfs_file_fsync(struct file *file, int datasync)
{
- /*return file_fsync(file, datasync);*/
- return 0; /* Don't fsync :-) */
+ struct inode *inode = file->f_mapping->host;
+ return sync_blockdev(inode->i_sb->s_bdev);
}
/*
static void hpfs_truncate(struct inode *i)
{
if (IS_IMMUTABLE(i)) return /*-EPERM*/;
- hpfs_lock(i->i_sb);
+ hpfs_lock_assert(i->i_sb);
+
hpfs_i(i)->i_n_secs = 0;
i->i_blocks = 1 + ((i->i_size + 511) >> 9);
hpfs_i(i)->mmu_private = i->i_size;
hpfs_truncate_btree(i->i_sb, i->i_ino, 1, ((i->i_size + 511) >> 9));
hpfs_write_inode(i);
hpfs_i(i)->i_n_secs = 0;
- hpfs_unlock(i->i_sb);
}
static int hpfs_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create)
{
+ int r;
secno s;
+ hpfs_lock(inode->i_sb);
s = hpfs_bmap(inode, iblock);
if (s) {
map_bh(bh_result, inode->i_sb, s);
- return 0;
+ goto ret_0;
}
- if (!create) return 0;
+ if (!create) goto ret_0;
if (iblock<<9 != hpfs_i(inode)->mmu_private) {
BUG();
- return -EIO;
+ r = -EIO;
+ goto ret_r;
}
if ((s = hpfs_add_sector_to_btree(inode->i_sb, inode->i_ino, 1, inode->i_blocks - 1)) == -1) {
hpfs_truncate_btree(inode->i_sb, inode->i_ino, 1, inode->i_blocks - 1);
- return -ENOSPC;
+ r = -ENOSPC;
+ goto ret_r;
}
inode->i_blocks++;
hpfs_i(inode)->mmu_private += 512;
set_buffer_new(bh_result);
map_bh(bh_result, inode->i_sb, s);
- return 0;
+ ret_0:
+ r = 0;
+ ret_r:
+ hpfs_unlock(inode->i_sb);
+ return r;
}
static int hpfs_writepage(struct page *page, struct writeback_control *wbc)
ssize_t retval;
retval = do_sync_write(file, buf, count, ppos);
- if (retval > 0)
+ if (retval > 0) {
+ hpfs_lock(file->f_path.dentry->d_sb);
hpfs_i(file->f_path.dentry->d_inode)->i_dirty = 1;
+ hpfs_unlock(file->f_path.dentry->d_sb);
+ }
return retval;
}
For definitive information on HPFS, ask somebody else -- this is guesswork.
There are certain to be many mistakes. */
+#if !defined(__LITTLE_ENDIAN) && !defined(__BIG_ENDIAN)
+#error unknown endian
+#endif
+
/* Notation */
-typedef unsigned secno; /* sector number, partition relative */
+typedef u32 secno; /* sector number, partition relative */
typedef secno dnode_secno; /* sector number of a dnode */
typedef secno fnode_secno; /* sector number of an fnode */
struct hpfs_boot_block
{
- unsigned char jmp[3];
- unsigned char oem_id[8];
- unsigned char bytes_per_sector[2]; /* 512 */
- unsigned char sectors_per_cluster;
- unsigned char n_reserved_sectors[2];
- unsigned char n_fats;
- unsigned char n_rootdir_entries[2];
- unsigned char n_sectors_s[2];
- unsigned char media_byte;
- unsigned short sectors_per_fat;
- unsigned short sectors_per_track;
- unsigned short heads_per_cyl;
- unsigned int n_hidden_sectors;
- unsigned int n_sectors_l; /* size of partition */
- unsigned char drive_number;
- unsigned char mbz;
- unsigned char sig_28h; /* 28h */
- unsigned char vol_serno[4];
- unsigned char vol_label[11];
- unsigned char sig_hpfs[8]; /* "HPFS " */
- unsigned char pad[448];
- unsigned short magic; /* aa55 */
+ u8 jmp[3];
+ u8 oem_id[8];
+ u8 bytes_per_sector[2]; /* 512 */
+ u8 sectors_per_cluster;
+ u8 n_reserved_sectors[2];
+ u8 n_fats;
+ u8 n_rootdir_entries[2];
+ u8 n_sectors_s[2];
+ u8 media_byte;
+ u16 sectors_per_fat;
+ u16 sectors_per_track;
+ u16 heads_per_cyl;
+ u32 n_hidden_sectors;
+ u32 n_sectors_l; /* size of partition */
+ u8 drive_number;
+ u8 mbz;
+ u8 sig_28h; /* 28h */
+ u8 vol_serno[4];
+ u8 vol_label[11];
+ u8 sig_hpfs[8]; /* "HPFS " */
+ u8 pad[448];
+ u16 magic; /* aa55 */
};
struct hpfs_super_block
{
- unsigned magic; /* f995 e849 */
- unsigned magic1; /* fa53 e9c5, more magic? */
- /*unsigned huh202;*/ /* ?? 202 = N. of B. in 1.00390625 S.*/
- char version; /* version of a filesystem usually 2 */
- char funcversion; /* functional version - oldest version
+ u32 magic; /* f995 e849 */
+ u32 magic1; /* fa53 e9c5, more magic? */
+ u8 version; /* version of a filesystem usually 2 */
+ u8 funcversion; /* functional version - oldest version
of filesystem that can understand
this disk */
- unsigned short int zero; /* 0 */
+ u16 zero; /* 0 */
fnode_secno root; /* fnode of root directory */
secno n_sectors; /* size of filesystem */
- unsigned n_badblocks; /* number of bad blocks */
+ u32 n_badblocks; /* number of bad blocks */
secno bitmaps; /* pointers to free space bit maps */
- unsigned zero1; /* 0 */
+ u32 zero1; /* 0 */
secno badblocks; /* bad block list */
- unsigned zero3; /* 0 */
+ u32 zero3; /* 0 */
time32_t last_chkdsk; /* date last checked, 0 if never */
- /*unsigned zero4;*/ /* 0 */
- time32_t last_optimize; /* date last optimized, 0 if never */
+ time32_t last_optimize; /* date last optimized, 0 if never */
secno n_dir_band; /* number of sectors in dir band */
secno dir_band_start; /* first sector in dir band */
secno dir_band_end; /* last sector in dir band */
secno dir_band_bitmap; /* free space map, 1 dnode per bit */
- char volume_name[32]; /* not used */
+ u8 volume_name[32]; /* not used */
secno user_id_table; /* 8 preallocated sectors - user id */
- unsigned zero6[103]; /* 0 */
+ u32 zero6[103]; /* 0 */
};
struct hpfs_spare_block
{
- unsigned magic; /* f991 1849 */
- unsigned magic1; /* fa52 29c5, more magic? */
-
- unsigned dirty: 1; /* 0 clean, 1 "improperly stopped" */
- /*unsigned flag1234: 4;*/ /* unknown flags */
- unsigned sparedir_used: 1; /* spare dirblks used */
- unsigned hotfixes_used: 1; /* hotfixes used */
- unsigned bad_sector: 1; /* bad sector, corrupted disk (???) */
- unsigned bad_bitmap: 1; /* bad bitmap */
- unsigned fast: 1; /* partition was fast formatted */
- unsigned old_wrote: 1; /* old version wrote to partion */
- unsigned old_wrote_1: 1; /* old version wrote to partion (?) */
- unsigned install_dasd_limits: 1; /* HPFS386 flags */
- unsigned resynch_dasd_limits: 1;
- unsigned dasd_limits_operational: 1;
- unsigned multimedia_active: 1;
- unsigned dce_acls_active: 1;
- unsigned dasd_limits_dirty: 1;
- unsigned flag67: 2;
- unsigned char mm_contlgulty;
- unsigned char unused;
+ u32 magic; /* f991 1849 */
+ u32 magic1; /* fa52 29c5, more magic? */
+
+#ifdef __LITTLE_ENDIAN
+ u8 dirty: 1; /* 0 clean, 1 "improperly stopped" */
+ u8 sparedir_used: 1; /* spare dirblks used */
+ u8 hotfixes_used: 1; /* hotfixes used */
+ u8 bad_sector: 1; /* bad sector, corrupted disk (???) */
+ u8 bad_bitmap: 1; /* bad bitmap */
+ u8 fast: 1; /* partition was fast formatted */
+ u8 old_wrote: 1; /* old version wrote to partion */
+ u8 old_wrote_1: 1; /* old version wrote to partion (?) */
+#else
+ u8 old_wrote_1: 1; /* old version wrote to partion (?) */
+ u8 old_wrote: 1; /* old version wrote to partion */
+ u8 fast: 1; /* partition was fast formatted */
+ u8 bad_bitmap: 1; /* bad bitmap */
+ u8 bad_sector: 1; /* bad sector, corrupted disk (???) */
+ u8 hotfixes_used: 1; /* hotfixes used */
+ u8 sparedir_used: 1; /* spare dirblks used */
+ u8 dirty: 1; /* 0 clean, 1 "improperly stopped" */
+#endif
+
+#ifdef __LITTLE_ENDIAN
+ u8 install_dasd_limits: 1; /* HPFS386 flags */
+ u8 resynch_dasd_limits: 1;
+ u8 dasd_limits_operational: 1;
+ u8 multimedia_active: 1;
+ u8 dce_acls_active: 1;
+ u8 dasd_limits_dirty: 1;
+ u8 flag67: 2;
+#else
+ u8 flag67: 2;
+ u8 dasd_limits_dirty: 1;
+ u8 dce_acls_active: 1;
+ u8 multimedia_active: 1;
+ u8 dasd_limits_operational: 1;
+ u8 resynch_dasd_limits: 1;
+ u8 install_dasd_limits: 1; /* HPFS386 flags */
+#endif
+
+ u8 mm_contlgulty;
+ u8 unused;
secno hotfix_map; /* info about remapped bad sectors */
- unsigned n_spares_used; /* number of hotfixes */
- unsigned n_spares; /* number of spares in hotfix map */
- unsigned n_dnode_spares_free; /* spare dnodes unused */
- unsigned n_dnode_spares; /* length of spare_dnodes[] list,
+ u32 n_spares_used; /* number of hotfixes */
+ u32 n_spares; /* number of spares in hotfix map */
+ u32 n_dnode_spares_free; /* spare dnodes unused */
+ u32 n_dnode_spares; /* length of spare_dnodes[] list,
follows in this block*/
secno code_page_dir; /* code page directory block */
- unsigned n_code_pages; /* number of code pages */
- /*unsigned large_numbers[2];*/ /* ?? */
- unsigned super_crc; /* on HPFS386 and LAN Server this is
+ u32 n_code_pages; /* number of code pages */
+ u32 super_crc; /* on HPFS386 and LAN Server this is
checksum of superblock, on normal
OS/2 unused */
- unsigned spare_crc; /* on HPFS386 checksum of spareblock */
- unsigned zero1[15]; /* unused */
+ u32 spare_crc; /* on HPFS386 checksum of spareblock */
+ u32 zero1[15]; /* unused */
dnode_secno spare_dnodes[100]; /* emergency free dnode list */
- unsigned zero2[1]; /* room for more? */
+ u32 zero2[1]; /* room for more? */
};
/* The bad block list is 4 sectors long. The first word must be zero,
struct code_page_directory
{
- unsigned magic; /* 4945 21f7 */
- unsigned n_code_pages; /* number of pointers following */
- unsigned zero1[2];
+ u32 magic; /* 4945 21f7 */
+ u32 n_code_pages; /* number of pointers following */
+ u32 zero1[2];
struct {
- unsigned short ix; /* index */
- unsigned short code_page_number; /* code page number */
- unsigned bounds; /* matches corresponding word
+ u16 ix; /* index */
+ u16 code_page_number; /* code page number */
+ u32 bounds; /* matches corresponding word
in data block */
secno code_page_data; /* sector number of a code_page_data
containing c.p. array */
- unsigned short index; /* index in c.p. array in that sector*/
- unsigned short unknown; /* some unknown value; usually 0;
+ u16 index; /* index in c.p. array in that sector*/
+ u16 unknown; /* some unknown value; usually 0;
2 in Japanese version */
} array[31]; /* unknown length */
};
struct code_page_data
{
- unsigned magic; /* 8945 21f7 */
- unsigned n_used; /* # elements used in c_p_data[] */
- unsigned bounds[3]; /* looks a bit like
+ u32 magic; /* 8945 21f7 */
+ u32 n_used; /* # elements used in c_p_data[] */
+ u32 bounds[3]; /* looks a bit like
(beg1,end1), (beg2,end2)
one byte each */
- unsigned short offs[3]; /* offsets from start of sector
+ u16 offs[3]; /* offsets from start of sector
to start of c_p_data[ix] */
struct {
- unsigned short ix; /* index */
- unsigned short code_page_number; /* code page number */
- unsigned short unknown; /* the same as in cp directory */
- unsigned char map[128]; /* upcase table for chars 80..ff */
- unsigned short zero2;
+ u16 ix; /* index */
+ u16 code_page_number; /* code page number */
+ u16 unknown; /* the same as in cp directory */
+ u8 map[128]; /* upcase table for chars 80..ff */
+ u16 zero2;
} code_page[3];
- unsigned char incognita[78];
+ u8 incognita[78];
};
#define DNODE_MAGIC 0x77e40aae
struct dnode {
- unsigned magic; /* 77e4 0aae */
- unsigned first_free; /* offset from start of dnode to
+ u32 magic; /* 77e4 0aae */
+ u32 first_free; /* offset from start of dnode to
first free dir entry */
- unsigned root_dnode:1; /* Is it root dnode? */
- unsigned increment_me:31; /* some kind of activity counter?
- Neither HPFS.IFS nor CHKDSK cares
+#ifdef __LITTLE_ENDIAN
+ u8 root_dnode: 1; /* Is it root dnode? */
+ u8 increment_me: 7; /* some kind of activity counter? */
+ /* Neither HPFS.IFS nor CHKDSK cares
+ if you change this word */
+#else
+ u8 increment_me: 7; /* some kind of activity counter? */
+ /* Neither HPFS.IFS nor CHKDSK cares
if you change this word */
+ u8 root_dnode: 1; /* Is it root dnode? */
+#endif
+ u8 increment_me2[3];
secno up; /* (root dnode) directory's fnode
(nonroot) parent dnode */
dnode_secno self; /* pointer to this dnode */
- unsigned char dirent[2028]; /* one or more dirents */
+ u8 dirent[2028]; /* one or more dirents */
};
struct hpfs_dirent {
- unsigned short length; /* offset to next dirent */
- unsigned first: 1; /* set on phony ^A^A (".") entry */
- unsigned has_acl: 1;
- unsigned down: 1; /* down pointer present (after name) */
- unsigned last: 1; /* set on phony \377 entry */
- unsigned has_ea: 1; /* entry has EA */
- unsigned has_xtd_perm: 1; /* has extended perm list (???) */
- unsigned has_explicit_acl: 1;
- unsigned has_needea: 1; /* ?? some EA has NEEDEA set
+ u16 length; /* offset to next dirent */
+
+#ifdef __LITTLE_ENDIAN
+ u8 first: 1; /* set on phony ^A^A (".") entry */
+ u8 has_acl: 1;
+ u8 down: 1; /* down pointer present (after name) */
+ u8 last: 1; /* set on phony \377 entry */
+ u8 has_ea: 1; /* entry has EA */
+ u8 has_xtd_perm: 1; /* has extended perm list (???) */
+ u8 has_explicit_acl: 1;
+ u8 has_needea: 1; /* ?? some EA has NEEDEA set
+ I have no idea why this is
+ interesting in a dir entry */
+#else
+ u8 has_needea: 1; /* ?? some EA has NEEDEA set
I have no idea why this is
interesting in a dir entry */
- unsigned read_only: 1; /* dos attrib */
- unsigned hidden: 1; /* dos attrib */
- unsigned system: 1; /* dos attrib */
- unsigned flag11: 1; /* would be volume label dos attrib */
- unsigned directory: 1; /* dos attrib */
- unsigned archive: 1; /* dos attrib */
- unsigned not_8x3: 1; /* name is not 8.3 */
- unsigned flag15: 1;
+ u8 has_explicit_acl: 1;
+ u8 has_xtd_perm: 1; /* has extended perm list (???) */
+ u8 has_ea: 1; /* entry has EA */
+ u8 last: 1; /* set on phony \377 entry */
+ u8 down: 1; /* down pointer present (after name) */
+ u8 has_acl: 1;
+ u8 first: 1; /* set on phony ^A^A (".") entry */
+#endif
+
+#ifdef __LITTLE_ENDIAN
+ u8 read_only: 1; /* dos attrib */
+ u8 hidden: 1; /* dos attrib */
+ u8 system: 1; /* dos attrib */
+ u8 flag11: 1; /* would be volume label dos attrib */
+ u8 directory: 1; /* dos attrib */
+ u8 archive: 1; /* dos attrib */
+ u8 not_8x3: 1; /* name is not 8.3 */
+ u8 flag15: 1;
+#else
+ u8 flag15: 1;
+ u8 not_8x3: 1; /* name is not 8.3 */
+ u8 archive: 1; /* dos attrib */
+ u8 directory: 1; /* dos attrib */
+ u8 flag11: 1; /* would be volume label dos attrib */
+ u8 system: 1; /* dos attrib */
+ u8 hidden: 1; /* dos attrib */
+ u8 read_only: 1; /* dos attrib */
+#endif
+
fnode_secno fnode; /* fnode giving allocation info */
time32_t write_date; /* mtime */
- unsigned file_size; /* file length, bytes */
+ u32 file_size; /* file length, bytes */
time32_t read_date; /* atime */
time32_t creation_date; /* ctime */
- unsigned ea_size; /* total EA length, bytes */
- unsigned char no_of_acls : 3; /* number of ACL's */
- unsigned char reserver : 5;
- unsigned char ix; /* code page index (of filename), see
+ u32 ea_size; /* total EA length, bytes */
+ u8 no_of_acls; /* number of ACL's (low 3 bits) */
+ u8 ix; /* code page index (of filename), see
struct code_page_data */
- unsigned char namelen, name[1]; /* file name */
+ u8 namelen, name[1]; /* file name */
/* dnode_secno down; btree down pointer, if present,
follows name on next word boundary, or maybe it
precedes next dirent, which is on a word boundary. */
struct bplus_leaf_node
{
- unsigned file_secno; /* first file sector in extent */
- unsigned length; /* length, sectors */
+ u32 file_secno; /* first file sector in extent */
+ u32 length; /* length, sectors */
secno disk_secno; /* first corresponding disk sector */
};
struct bplus_internal_node
{
- unsigned file_secno; /* subtree maps sectors < this */
+ u32 file_secno; /* subtree maps sectors < this */
anode_secno down; /* pointer to subtree */
};
struct bplus_header
{
- unsigned hbff: 1; /* high bit of first free entry offset */
- unsigned flag1: 1;
- unsigned flag2: 1;
- unsigned flag3: 1;
- unsigned flag4: 1;
- unsigned fnode_parent: 1; /* ? we're pointed to by an fnode,
+#ifdef __LITTLE_ENDIAN
+ u8 hbff: 1; /* high bit of first free entry offset */
+ u8 flag1234: 4;
+ u8 fnode_parent: 1; /* ? we're pointed to by an fnode,
the data btree or some ea or the
main ea bootage pointer ea_secno */
/* also can get set in fnodes, which
may be a chkdsk glitch or may mean
this bit is irrelevant in fnodes,
or this interpretation is all wet */
- unsigned binary_search: 1; /* suggest binary search (unused) */
- unsigned internal: 1; /* 1 -> (internal) tree of anodes
+ u8 binary_search: 1; /* suggest binary search (unused) */
+ u8 internal: 1; /* 1 -> (internal) tree of anodes
+ 0 -> (leaf) list of extents */
+#else
+ u8 internal: 1; /* 1 -> (internal) tree of anodes
0 -> (leaf) list of extents */
- unsigned char fill[3];
- unsigned char n_free_nodes; /* free nodes in following array */
- unsigned char n_used_nodes; /* used nodes in following array */
- unsigned short first_free; /* offset from start of header to
+ u8 binary_search: 1; /* suggest binary search (unused) */
+ u8 fnode_parent: 1; /* ? we're pointed to by an fnode,
+ the data btree or some ea or the
+ main ea bootage pointer ea_secno */
+ /* also can get set in fnodes, which
+ may be a chkdsk glitch or may mean
+ this bit is irrelevant in fnodes,
+ or this interpretation is all wet */
+ u8 flag1234: 4;
+ u8 hbff: 1; /* high bit of first free entry offset */
+#endif
+ u8 fill[3];
+ u8 n_free_nodes; /* free nodes in following array */
+ u8 n_used_nodes; /* used nodes in following array */
+ u16 first_free; /* offset from start of header to
first free node in array */
union {
struct bplus_internal_node internal[0]; /* (internal) 2-word entries giving
struct fnode
{
- unsigned magic; /* f7e4 0aae */
- unsigned zero1[2]; /* read history */
- unsigned char len, name[15]; /* true length, truncated name */
+ u32 magic; /* f7e4 0aae */
+ u32 zero1[2]; /* read history */
+ u8 len, name[15]; /* true length, truncated name */
fnode_secno up; /* pointer to file's directory fnode */
- /*unsigned zero2[3];*/
secno acl_size_l;
secno acl_secno;
- unsigned short acl_size_s;
- char acl_anode;
- char zero2; /* history bit count */
- unsigned ea_size_l; /* length of disk-resident ea's */
+ u16 acl_size_s;
+ u8 acl_anode;
+ u8 zero2; /* history bit count */
+ u32 ea_size_l; /* length of disk-resident ea's */
secno ea_secno; /* first sector of disk-resident ea's*/
- unsigned short ea_size_s; /* length of fnode-resident ea's */
-
- unsigned flag0: 1;
- unsigned ea_anode: 1; /* 1 -> ea_secno is an anode */
- unsigned flag2: 1;
- unsigned flag3: 1;
- unsigned flag4: 1;
- unsigned flag5: 1;
- unsigned flag6: 1;
- unsigned flag7: 1;
- unsigned dirflag: 1; /* 1 -> directory. first & only extent
+ u16 ea_size_s; /* length of fnode-resident ea's */
+
+#ifdef __LITTLE_ENDIAN
+ u8 flag0: 1;
+ u8 ea_anode: 1; /* 1 -> ea_secno is an anode */
+ u8 flag234567: 6;
+#else
+ u8 flag234567: 6;
+ u8 ea_anode: 1; /* 1 -> ea_secno is an anode */
+ u8 flag0: 1;
+#endif
+
+#ifdef __LITTLE_ENDIAN
+ u8 dirflag: 1; /* 1 -> directory. first & only extent
points to dnode. */
- unsigned flag9: 1;
- unsigned flag10: 1;
- unsigned flag11: 1;
- unsigned flag12: 1;
- unsigned flag13: 1;
- unsigned flag14: 1;
- unsigned flag15: 1;
+ u8 flag9012345: 7;
+#else
+ u8 flag9012345: 7;
+ u8 dirflag: 1; /* 1 -> directory. first & only extent
+ points to dnode. */
+#endif
struct bplus_header btree; /* b+ tree, 8 extents or 12 subtrees */
union {
struct bplus_internal_node internal[12];
} u;
- unsigned file_size; /* file length, bytes */
- unsigned n_needea; /* number of EA's with NEEDEA set */
- char user_id[16]; /* unused */
- unsigned short ea_offs; /* offset from start of fnode
+ u32 file_size; /* file length, bytes */
+ u32 n_needea; /* number of EA's with NEEDEA set */
+ u8 user_id[16]; /* unused */
+ u16 ea_offs; /* offset from start of fnode
to first fnode-resident ea */
- char dasd_limit_treshhold;
- char dasd_limit_delta;
- unsigned dasd_limit;
- unsigned dasd_usage;
- /*unsigned zero5[2];*/
- unsigned char ea[316]; /* zero or more EA's, packed together
+ u8 dasd_limit_treshhold;
+ u8 dasd_limit_delta;
+ u32 dasd_limit;
+ u32 dasd_usage;
+ u8 ea[316]; /* zero or more EA's, packed together
with no alignment padding.
(Do not use this name, get here
via fnode + ea_offs. I think.) */
struct anode
{
- unsigned magic; /* 37e4 0aae */
+ u32 magic; /* 37e4 0aae */
anode_secno self; /* pointer to this anode */
secno up; /* parent anode or fnode */
struct bplus_internal_node internal[60];
} u;
- unsigned fill[3]; /* unused */
+ u32 fill[3]; /* unused */
};
struct extended_attribute
{
- unsigned indirect: 1; /* 1 -> value gives sector number
+#ifdef __LITTLE_ENDIAN
+ u8 indirect: 1; /* 1 -> value gives sector number
where real value starts */
- unsigned anode: 1; /* 1 -> sector is an anode
+ u8 anode: 1; /* 1 -> sector is an anode
+ that points to fragmented value */
+ u8 flag23456: 5;
+ u8 needea: 1; /* required ea */
+#else
+ u8 needea: 1; /* required ea */
+ u8 flag23456: 5;
+ u8 anode: 1; /* 1 -> sector is an anode
that points to fragmented value */
- unsigned flag2: 1;
- unsigned flag3: 1;
- unsigned flag4: 1;
- unsigned flag5: 1;
- unsigned flag6: 1;
- unsigned needea: 1; /* required ea */
- unsigned char namelen; /* length of name, bytes */
- unsigned short valuelen; /* length of value, bytes */
- unsigned char name[0];
+ u8 indirect: 1; /* 1 -> value gives sector number
+ where real value starts */
+#endif
+ u8 namelen; /* length of name, bytes */
+ u8 valuelen_lo; /* length of value, bytes */
+ u8 valuelen_hi; /* length of value, bytes */
+ u8 name[0];
/*
- unsigned char name[namelen]; ascii attrib name
- unsigned char nul; terminating '\0', not counted
- unsigned char value[valuelen]; value, arbitrary
+ u8 name[namelen]; ascii attrib name
+ u8 nul; terminating '\0', not counted
+ u8 value[valuelen]; value, arbitrary
if this.indirect, valuelen is 8 and the value is
- unsigned length; real length of value, bytes
+ u32 length; real length of value, bytes
secno secno; sector address where it starts
if this.anode, the above sector number is the root of an anode tree
which points to the value.
#include <linux/pagemap.h>
#include <linux/buffer_head.h>
#include <linux/slab.h>
+#include <asm/unaligned.h>
#include "hpfs.h"
unsigned i_disk_sec; /* (files) minimalist cache of alloc info */
unsigned i_n_secs; /* (files) minimalist cache of alloc info */
unsigned i_ea_size; /* size of extended attributes */
- unsigned i_conv : 2; /* (files) crlf->newline hackery */
unsigned i_ea_mode : 1; /* file's permission is stored in ea */
unsigned i_ea_uid : 1; /* file's uid is stored in ea */
unsigned i_ea_gid : 1; /* file's gid is stored in ea */
unsigned i_dirty : 1;
- struct mutex i_mutex;
- struct mutex i_parent_mutex;
loff_t **i_rddir_off;
struct inode vfs_inode;
};
struct hpfs_sb_info {
+ struct mutex hpfs_mutex; /* global hpfs lock */
ino_t sb_root; /* inode number of root dir */
unsigned sb_fs_size; /* file system size, sectors */
unsigned sb_bitmaps; /* sector number of bitmap list */
uid_t sb_uid; /* uid from mount options */
gid_t sb_gid; /* gid from mount options */
umode_t sb_mode; /* mode from mount options */
- unsigned sb_conv : 2; /* crlf->newline hackery */
unsigned sb_eas : 2; /* eas: 0-ignore, 1-ro, 2-rw */
unsigned sb_err : 2; /* on errs: 0-cont, 1-ro, 2-panic */
unsigned sb_chk : 2; /* checks: 0-no, 1-normal, 2-strict */
unsigned *sb_bmp_dir; /* main bitmap directory */
unsigned sb_c_bitmap; /* current bitmap */
unsigned sb_max_fwd_alloc; /* max forwad allocation */
- struct mutex hpfs_creation_de; /* when creating dirents, nobody else
- can alloc blocks */
- /*unsigned sb_mounting : 1;*/
int sb_timeshift;
};
-/*
- * conv= options
- */
-
-#define CONV_BINARY 0 /* no conversion */
-#define CONV_TEXT 1 /* crlf->newline */
-#define CONV_AUTO 2 /* decide based on file contents */
-
/* Four 512-byte buffers and the 2k block obtained by concatenating them */
struct quad_buffer_head {
static inline dnode_secno de_down_pointer (struct hpfs_dirent *de)
{
CHKCOND(de->down,("HPFS: de_down_pointer: !de->down\n"));
- return *(dnode_secno *) ((void *) de + de->length - 4);
+ return le32_to_cpu(*(dnode_secno *) ((void *) de + le16_to_cpu(de->length) - 4));
}
/* The first dir entry in a dnode */
static inline struct hpfs_dirent *dnode_end_de (struct dnode *dnode)
{
- CHKCOND(dnode->first_free>=0x14 && dnode->first_free<=0xa00,("HPFS: dnode_end_de: dnode->first_free = %d\n",(int)dnode->first_free));
- return (void *) dnode + dnode->first_free;
+ CHKCOND(le32_to_cpu(dnode->first_free)>=0x14 && le32_to_cpu(dnode->first_free)<=0xa00,("HPFS: dnode_end_de: dnode->first_free = %x\n",(unsigned)le32_to_cpu(dnode->first_free)));
+ return (void *) dnode + le32_to_cpu(dnode->first_free);
}
/* The dir entry after dir entry de */
static inline struct hpfs_dirent *de_next_de (struct hpfs_dirent *de)
{
- CHKCOND(de->length>=0x20 && de->length<0x800,("HPFS: de_next_de: de->length = %d\n",(int)de->length));
- return (void *) de + de->length;
+ CHKCOND(le16_to_cpu(de->length)>=0x20 && le16_to_cpu(de->length)<0x800,("HPFS: de_next_de: de->length = %x\n",(unsigned)le16_to_cpu(de->length)));
+ return (void *) de + le16_to_cpu(de->length);
}
static inline struct extended_attribute *fnode_ea(struct fnode *fnode)
{
- return (struct extended_attribute *)((char *)fnode + fnode->ea_offs + fnode->acl_size_s);
+ return (struct extended_attribute *)((char *)fnode + le16_to_cpu(fnode->ea_offs) + le16_to_cpu(fnode->acl_size_s));
}
static inline struct extended_attribute *fnode_end_ea(struct fnode *fnode)
{
- return (struct extended_attribute *)((char *)fnode + fnode->ea_offs + fnode->acl_size_s + fnode->ea_size_s);
+ return (struct extended_attribute *)((char *)fnode + le16_to_cpu(fnode->ea_offs) + le16_to_cpu(fnode->acl_size_s) + le16_to_cpu(fnode->ea_size_s));
+}
+
+static unsigned ea_valuelen(struct extended_attribute *ea)
+{
+ return ea->valuelen_lo + 256 * ea->valuelen_hi;
}
static inline struct extended_attribute *next_ea(struct extended_attribute *ea)
{
- return (struct extended_attribute *)((char *)ea + 5 + ea->namelen + ea->valuelen);
+ return (struct extended_attribute *)((char *)ea + 5 + ea->namelen + ea_valuelen(ea));
}
static inline secno ea_sec(struct extended_attribute *ea)
{
- return *(secno *)((char *)ea + 9 + ea->namelen);
+ return le32_to_cpu(get_unaligned((secno *)((char *)ea + 9 + ea->namelen)));
}
static inline secno ea_len(struct extended_attribute *ea)
{
- return *(secno *)((char *)ea + 5 + ea->namelen);
+ return le32_to_cpu(get_unaligned((secno *)((char *)ea + 5 + ea->namelen)));
}
static inline char *ea_data(struct extended_attribute *ea)
dst->not_8x3 = n;
}
-static inline unsigned tstbits(unsigned *bmp, unsigned b, unsigned n)
+static inline unsigned tstbits(u32 *bmp, unsigned b, unsigned n)
{
int i;
if ((b >= 0x4000) || (b + n - 1 >= 0x4000)) return n;
- if (!((bmp[(b & 0x3fff) >> 5] >> (b & 0x1f)) & 1)) return 1;
+ if (!((le32_to_cpu(bmp[(b & 0x3fff) >> 5]) >> (b & 0x1f)) & 1)) return 1;
for (i = 1; i < n; i++)
- if (/*b+i < 0x4000 &&*/ !((bmp[((b+i) & 0x3fff) >> 5] >> ((b+i) & 0x1f)) & 1))
+ if (!((le32_to_cpu(bmp[((b+i) & 0x3fff) >> 5]) >> ((b+i) & 0x1f)) & 1))
return i + 1;
return 0;
}
/* alloc.c */
int hpfs_chk_sectors(struct super_block *, secno, int, char *);
-secno hpfs_alloc_sector(struct super_block *, secno, unsigned, int, int);
+secno hpfs_alloc_sector(struct super_block *, secno, unsigned, int);
int hpfs_alloc_if_possible(struct super_block *, secno);
void hpfs_free_sectors(struct super_block *, secno, unsigned);
int hpfs_check_free_dnodes(struct super_block *, int);
void hpfs_free_dnode(struct super_block *, secno);
-struct dnode *hpfs_alloc_dnode(struct super_block *, secno, dnode_secno *, struct quad_buffer_head *, int);
+struct dnode *hpfs_alloc_dnode(struct super_block *, secno, dnode_secno *, struct quad_buffer_head *);
struct fnode *hpfs_alloc_fnode(struct super_block *, secno, fnode_secno *, struct buffer_head **);
struct anode *hpfs_alloc_anode(struct super_block *, secno, anode_secno *, struct buffer_head **);
/* buffer.c */
-void hpfs_lock_creation(struct super_block *);
-void hpfs_unlock_creation(struct super_block *);
void *hpfs_map_sector(struct super_block *, unsigned, struct buffer_head **, int);
void *hpfs_get_sector(struct super_block *, unsigned, struct buffer_head **);
void *hpfs_map_4sectors(struct super_block *, unsigned, struct quad_buffer_head *, int);
struct hpfs_dirent *hpfs_add_de(struct super_block *, struct dnode *,
const unsigned char *, unsigned, secno);
int hpfs_add_dirent(struct inode *, const unsigned char *, unsigned,
- struct hpfs_dirent *, int);
+ struct hpfs_dirent *);
int hpfs_remove_dirent(struct inode *, dnode_secno, struct hpfs_dirent *, struct quad_buffer_head *, int);
void hpfs_count_dnodes(struct super_block *, dnode_secno, int *, int *, int *);
dnode_secno hpfs_de_as_down_as_possible(struct super_block *, dnode_secno dno);
const unsigned char *, unsigned, int);
int hpfs_is_name_long(const unsigned char *, unsigned);
void hpfs_adjust_length(const unsigned char *, unsigned *);
-void hpfs_decide_conv(struct inode *, const unsigned char *, unsigned);
/* namei.c */
/*
* Locking:
*
- * hpfs_lock() is a leftover from the big kernel lock.
- * Right now, these functions are empty and only left
- * for documentation purposes. The file system no longer
- * works on SMP systems, so the lock is not needed
- * any more.
+ * hpfs_lock() locks the whole filesystem. It must be taken
+ * on any method called by the VFS.
*
- * If someone is interested in making it work again, this
- * would be the place to start by adding a per-superblock
- * mutex and fixing all the bugs and performance issues
- * caused by that.
+ * We don't do any per-file locking anymore, it is hard to
+ * review and HPFS is not performance-sensitive anyway.
*/
static inline void hpfs_lock(struct super_block *s)
{
+ struct hpfs_sb_info *sbi = hpfs_sb(s);
+ mutex_lock(&sbi->hpfs_mutex);
}
static inline void hpfs_unlock(struct super_block *s)
{
+ struct hpfs_sb_info *sbi = hpfs_sb(s);
+ mutex_unlock(&sbi->hpfs_mutex);
+}
+
+static inline void hpfs_lock_assert(struct super_block *s)
+{
+ struct hpfs_sb_info *sbi = hpfs_sb(s);
+ WARN_ON(!mutex_is_locked(&sbi->hpfs_mutex));
}
i->i_uid = hpfs_sb(sb)->sb_uid;
i->i_gid = hpfs_sb(sb)->sb_gid;
i->i_mode = hpfs_sb(sb)->sb_mode;
- hpfs_inode->i_conv = hpfs_sb(sb)->sb_conv;
i->i_size = -1;
i->i_blocks = -1;
i->i_mode |= S_IFDIR;
i->i_op = &hpfs_dir_iops;
i->i_fop = &hpfs_dir_ops;
- hpfs_inode->i_parent_dir = fnode->up;
- hpfs_inode->i_dno = fnode->u.external[0].disk_secno;
+ hpfs_inode->i_parent_dir = le32_to_cpu(fnode->up);
+ hpfs_inode->i_dno = le32_to_cpu(fnode->u.external[0].disk_secno);
if (hpfs_sb(sb)->sb_chk >= 2) {
struct buffer_head *bh0;
if (hpfs_map_fnode(sb, hpfs_inode->i_parent_dir, &bh0)) brelse(bh0);
i->i_op = &hpfs_file_iops;
i->i_fop = &hpfs_file_ops;
i->i_nlink = 1;
- i->i_size = fnode->file_size;
+ i->i_size = le32_to_cpu(fnode->file_size);
i->i_blocks = ((i->i_size + 511) >> 9) + 1;
i->i_data.a_ops = &hpfs_aops;
hpfs_i(i)->mmu_private = i->i_size;
static void hpfs_write_inode_ea(struct inode *i, struct fnode *fnode)
{
struct hpfs_inode_info *hpfs_inode = hpfs_i(i);
- /*if (fnode->acl_size_l || fnode->acl_size_s) {
+ /*if (le32_to_cpu(fnode->acl_size_l) || le16_to_cpu(fnode->acl_size_s)) {
Some unknown structures like ACL may be in fnode,
we'd better not overwrite them
hpfs_error(i->i_sb, "fnode %08x has some unknown HPFS386 stuctures", i->i_ino);
kfree(hpfs_inode->i_rddir_off);
hpfs_inode->i_rddir_off = NULL;
}
- mutex_lock(&hpfs_inode->i_parent_mutex);
if (!i->i_nlink) {
- mutex_unlock(&hpfs_inode->i_parent_mutex);
return;
}
parent = iget_locked(i->i_sb, hpfs_inode->i_parent_dir);
hpfs_read_inode(parent);
unlock_new_inode(parent);
}
- mutex_lock(&hpfs_inode->i_mutex);
hpfs_write_inode_nolock(i);
- mutex_unlock(&hpfs_inode->i_mutex);
iput(parent);
- } else {
- mark_inode_dirty(i);
}
- mutex_unlock(&hpfs_inode->i_parent_mutex);
}
void hpfs_write_inode_nolock(struct inode *i)
}
} else de = NULL;
if (S_ISREG(i->i_mode)) {
- fnode->file_size = i->i_size;
- if (de) de->file_size = i->i_size;
+ fnode->file_size = cpu_to_le32(i->i_size);
+ if (de) de->file_size = cpu_to_le32(i->i_size);
} else if (S_ISDIR(i->i_mode)) {
- fnode->file_size = 0;
- if (de) de->file_size = 0;
+ fnode->file_size = cpu_to_le32(0);
+ if (de) de->file_size = cpu_to_le32(0);
}
hpfs_write_inode_ea(i, fnode);
if (de) {
- de->write_date = gmt_to_local(i->i_sb, i->i_mtime.tv_sec);
- de->read_date = gmt_to_local(i->i_sb, i->i_atime.tv_sec);
- de->creation_date = gmt_to_local(i->i_sb, i->i_ctime.tv_sec);
+ de->write_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_mtime.tv_sec));
+ de->read_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_atime.tv_sec));
+ de->creation_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_ctime.tv_sec));
de->read_only = !(i->i_mode & 0222);
- de->ea_size = hpfs_inode->i_ea_size;
+ de->ea_size = cpu_to_le32(hpfs_inode->i_ea_size);
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
}
if (S_ISDIR(i->i_mode)) {
if ((de = map_dirent(i, hpfs_inode->i_dno, "\001\001", 2, NULL, &qbh))) {
- de->write_date = gmt_to_local(i->i_sb, i->i_mtime.tv_sec);
- de->read_date = gmt_to_local(i->i_sb, i->i_atime.tv_sec);
- de->creation_date = gmt_to_local(i->i_sb, i->i_ctime.tv_sec);
+ de->write_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_mtime.tv_sec));
+ de->read_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_atime.tv_sec));
+ de->creation_date = cpu_to_le32(gmt_to_local(i->i_sb, i->i_ctime.tv_sec));
de->read_only = !(i->i_mode & 0222);
- de->ea_size = /*hpfs_inode->i_ea_size*/0;
- de->file_size = 0;
+ de->ea_size = cpu_to_le32(/*hpfs_inode->i_ea_size*/0);
+ de->file_size = cpu_to_le32(0);
hpfs_mark_4buffers_dirty(&qbh);
hpfs_brelse4(&qbh);
} else
hpfs_lock(inode->i_sb);
if (inode->i_ino == hpfs_sb(inode->i_sb)->sb_root)
goto out_unlock;
+ if ((attr->ia_valid & ATTR_UID) && attr->ia_uid >= 0x10000)
+ goto out_unlock;
+ if ((attr->ia_valid & ATTR_GID) && attr->ia_gid >= 0x10000)
+ goto out_unlock;
if ((attr->ia_valid & ATTR_SIZE) && attr->ia_size > inode->i_size)
goto out_unlock;
}
setattr_copy(inode, attr);
- mark_inode_dirty(inode);
hpfs_write_inode(inode);
hpfs_error(s, "hpfs_map_bitmap called with bad parameter: %08x at %s", bmp_block, id);
return NULL;
}
- sec = hpfs_sb(s)->sb_bmp_dir[bmp_block];
+ sec = le32_to_cpu(hpfs_sb(s)->sb_bmp_dir[bmp_block]);
if (!sec || sec > hpfs_sb(s)->sb_fs_size-4) {
hpfs_error(s, "invalid bitmap block pointer %08x -> %08x at %s", bmp_block, sec, id);
return NULL;
struct code_page_data *cpd;
struct code_page_directory *cp = hpfs_map_sector(s, cps, &bh, 0);
if (!cp) return NULL;
- if (cp->magic != CP_DIR_MAGIC) {
- printk("HPFS: Code page directory magic doesn't match (magic = %08x)\n", cp->magic);
+ if (le32_to_cpu(cp->magic) != CP_DIR_MAGIC) {
+ printk("HPFS: Code page directory magic doesn't match (magic = %08x)\n", le32_to_cpu(cp->magic));
brelse(bh);
return NULL;
}
- if (!cp->n_code_pages) {
+ if (!le32_to_cpu(cp->n_code_pages)) {
printk("HPFS: n_code_pages == 0\n");
brelse(bh);
return NULL;
}
- cpds = cp->array[0].code_page_data;
- cpi = cp->array[0].index;
+ cpds = le32_to_cpu(cp->array[0].code_page_data);
+ cpi = le16_to_cpu(cp->array[0].index);
brelse(bh);
if (cpi >= 3) {
}
if (!(cpd = hpfs_map_sector(s, cpds, &bh, 0))) return NULL;
- if ((unsigned)cpd->offs[cpi] > 0x178) {
+ if (le16_to_cpu(cpd->offs[cpi]) > 0x178) {
printk("HPFS: Code page index out of sector\n");
brelse(bh);
return NULL;
}
- ptr = (unsigned char *)cpd + cpd->offs[cpi] + 6;
+ ptr = (unsigned char *)cpd + le16_to_cpu(cpd->offs[cpi]) + 6;
if (!(cp_table = kmalloc(256, GFP_KERNEL))) {
printk("HPFS: out of memory for code page table\n");
brelse(bh);
if (hpfs_sb(s)->sb_chk) {
struct extended_attribute *ea;
struct extended_attribute *ea_end;
- if (fnode->magic != FNODE_MAGIC) {
+ if (le32_to_cpu(fnode->magic) != FNODE_MAGIC) {
hpfs_error(s, "bad magic on fnode %08lx",
(unsigned long)ino);
goto bail;
(unsigned long)ino);
goto bail;
}
- if (fnode->btree.first_free !=
+ if (le16_to_cpu(fnode->btree.first_free) !=
8 + fnode->btree.n_used_nodes * (fnode->btree.internal ? 8 : 12)) {
hpfs_error(s,
"bad first_free pointer in fnode %08lx",
goto bail;
}
}
- if (fnode->ea_size_s && ((signed int)fnode->ea_offs < 0xc4 ||
- (signed int)fnode->ea_offs + fnode->acl_size_s + fnode->ea_size_s > 0x200)) {
+ if (le16_to_cpu(fnode->ea_size_s) && (le16_to_cpu(fnode->ea_offs) < 0xc4 ||
+ le16_to_cpu(fnode->ea_offs) + le16_to_cpu(fnode->acl_size_s) + le16_to_cpu(fnode->ea_size_s) > 0x200)) {
hpfs_error(s,
"bad EA info in fnode %08lx: ea_offs == %04x ea_size_s == %04x",
(unsigned long)ino,
- fnode->ea_offs, fnode->ea_size_s);
+ le16_to_cpu(fnode->ea_offs), le16_to_cpu(fnode->ea_size_s));
goto bail;
}
ea = fnode_ea(fnode);
if (hpfs_sb(s)->sb_chk) if (hpfs_chk_sectors(s, ano, 1, "anode")) return NULL;
if ((anode = hpfs_map_sector(s, ano, bhp, ANODE_RD_AHEAD)))
if (hpfs_sb(s)->sb_chk) {
- if (anode->magic != ANODE_MAGIC || anode->self != ano) {
+ if (le32_to_cpu(anode->magic) != ANODE_MAGIC) {
hpfs_error(s, "bad magic on anode %08x", ano);
goto bail;
}
+ if (le32_to_cpu(anode->self) != ano) {
+ hpfs_error(s, "self pointer invalid on anode %08x", ano);
+ goto bail;
+ }
if ((unsigned)anode->btree.n_used_nodes + (unsigned)anode->btree.n_free_nodes !=
(anode->btree.internal ? 60 : 40)) {
hpfs_error(s, "bad number of nodes in anode %08x", ano);
goto bail;
}
- if (anode->btree.first_free !=
+ if (le16_to_cpu(anode->btree.first_free) !=
8 + anode->btree.n_used_nodes * (anode->btree.internal ? 8 : 12)) {
hpfs_error(s, "bad first_free pointer in anode %08x", ano);
goto bail;
unsigned p, pp = 0;
unsigned char *d = (unsigned char *)dnode;
int b = 0;
- if (dnode->magic != DNODE_MAGIC) {
+ if (le32_to_cpu(dnode->magic) != DNODE_MAGIC) {
hpfs_error(s, "bad magic on dnode %08x", secno);
goto bail;
}
- if (dnode->self != secno)
- hpfs_error(s, "bad self pointer on dnode %08x self = %08x", secno, dnode->self);
+ if (le32_to_cpu(dnode->self) != secno)
+ hpfs_error(s, "bad self pointer on dnode %08x self = %08x", secno, le32_to_cpu(dnode->self));
/* Check dirents - bad dirents would cause infinite
loops or shooting to memory */
- if (dnode->first_free > 2048/* || dnode->first_free < 84*/) {
- hpfs_error(s, "dnode %08x has first_free == %08x", secno, dnode->first_free);
+ if (le32_to_cpu(dnode->first_free) > 2048) {
+ hpfs_error(s, "dnode %08x has first_free == %08x", secno, le32_to_cpu(dnode->first_free));
goto bail;
}
- for (p = 20; p < dnode->first_free; p += d[p] + (d[p+1] << 8)) {
+ for (p = 20; p < le32_to_cpu(dnode->first_free); p += d[p] + (d[p+1] << 8)) {
struct hpfs_dirent *de = (struct hpfs_dirent *)((char *)dnode + p);
- if (de->length > 292 || (de->length < 32) || (de->length & 3) || p + de->length > 2048) {
+ if (le16_to_cpu(de->length) > 292 || (le16_to_cpu(de->length) < 32) || (le16_to_cpu(de->length) & 3) || p + le16_to_cpu(de->length) > 2048) {
hpfs_error(s, "bad dirent size in dnode %08x, dirent %03x, last %03x", secno, p, pp);
goto bail;
}
- if (((31 + de->namelen + de->down*4 + 3) & ~3) != de->length) {
- if (((31 + de->namelen + de->down*4 + 3) & ~3) < de->length && s->s_flags & MS_RDONLY) goto ok;
+ if (((31 + de->namelen + de->down*4 + 3) & ~3) != le16_to_cpu(de->length)) {
+ if (((31 + de->namelen + de->down*4 + 3) & ~3) < le16_to_cpu(de->length) && s->s_flags & MS_RDONLY) goto ok;
hpfs_error(s, "namelen does not match dirent size in dnode %08x, dirent %03x, last %03x", secno, p, pp);
goto bail;
}
pp = p;
}
- if (p != dnode->first_free) {
+ if (p != le32_to_cpu(dnode->first_free)) {
hpfs_error(s, "size on last dirent does not match first_free; dnode %08x", secno);
goto bail;
}
if (!fnode)
return 0;
- dno = fnode->u.external[0].disk_secno;
+ dno = le32_to_cpu(fnode->u.external[0].disk_secno);
brelse(bh);
return dno;
}
#include "hpfs_fn.h"
-static const char *text_postfix[]={
-".ASM", ".BAS", ".BAT", ".C", ".CC", ".CFG", ".CMD", ".CON", ".CPP", ".DEF",
-".DOC", ".DPR", ".ERX", ".H", ".HPP", ".HTM", ".HTML", ".JAVA", ".LOG", ".PAS",
-".RC", ".TEX", ".TXT", ".Y", ""};
-
-static const char *text_prefix[]={
-"AUTOEXEC.", "CHANGES", "COPYING", "CONFIG.", "CREDITS", "FAQ", "FILE_ID.DIZ",
-"MAKEFILE", "READ.ME", "README", "TERMCAP", ""};
-
-void hpfs_decide_conv(struct inode *inode, const unsigned char *name, unsigned len)
-{
- struct hpfs_inode_info *hpfs_inode = hpfs_i(inode);
- int i;
- if (hpfs_inode->i_conv != CONV_AUTO) return;
- for (i = 0; *text_postfix[i]; i++) {
- int l = strlen(text_postfix[i]);
- if (l <= len)
- if (!hpfs_compare_names(inode->i_sb, text_postfix[i], l, name + len - l, l, 0))
- goto text;
- }
- for (i = 0; *text_prefix[i]; i++) {
- int l = strlen(text_prefix[i]);
- if (l <= len)
- if (!hpfs_compare_names(inode->i_sb, text_prefix[i], l, name, l, 0))
- goto text;
- }
- hpfs_inode->i_conv = CONV_BINARY;
- return;
- text:
- hpfs_inode->i_conv = CONV_TEXT;
- return;
-}
-
static inline int not_allowed_char(unsigned char c)
{
return c<' ' || c=='"' || c=='*' || c=='/' || c==':' || c=='<' ||
fnode = hpfs_alloc_fnode(dir->i_sb, hpfs_i(dir)->i_dno, &fno, &bh);
if (!fnode)
goto bail;
- dnode = hpfs_alloc_dnode(dir->i_sb, fno, &dno, &qbh0, 1);
+ dnode = hpfs_alloc_dnode(dir->i_sb, fno, &dno, &qbh0);
if (!dnode)
goto bail1;
memset(&dee, 0, sizeof dee);
if (!(mode & 0222)) dee.read_only = 1;
/*dee.archive = 0;*/
dee.hidden = name[0] == '.';
- dee.fnode = fno;
- dee.creation_date = dee.write_date = dee.read_date = gmt_to_local(dir->i_sb, get_seconds());
+ dee.fnode = cpu_to_le32(fno);
+ dee.creation_date = dee.write_date = dee.read_date = cpu_to_le32(gmt_to_local(dir->i_sb, get_seconds()));
result = new_inode(dir->i_sb);
if (!result)
goto bail2;
result->i_ino = fno;
hpfs_i(result)->i_parent_dir = dir->i_ino;
hpfs_i(result)->i_dno = dno;
- result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, dee.creation_date);
+ result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(dee.creation_date));
result->i_ctime.tv_nsec = 0;
result->i_mtime.tv_nsec = 0;
result->i_atime.tv_nsec = 0;
if (dee.read_only)
result->i_mode &= ~0222;
- mutex_lock(&hpfs_i(dir)->i_mutex);
- r = hpfs_add_dirent(dir, name, len, &dee, 0);
+ r = hpfs_add_dirent(dir, name, len, &dee);
if (r == 1)
goto bail3;
if (r == -1) {
}
fnode->len = len;
memcpy(fnode->name, name, len > 15 ? 15 : len);
- fnode->up = dir->i_ino;
+ fnode->up = cpu_to_le32(dir->i_ino);
fnode->dirflag = 1;
fnode->btree.n_free_nodes = 7;
fnode->btree.n_used_nodes = 1;
- fnode->btree.first_free = 0x14;
- fnode->u.external[0].disk_secno = dno;
- fnode->u.external[0].file_secno = -1;
+ fnode->btree.first_free = cpu_to_le16(0x14);
+ fnode->u.external[0].disk_secno = cpu_to_le32(dno);
+ fnode->u.external[0].file_secno = cpu_to_le32(-1);
dnode->root_dnode = 1;
- dnode->up = fno;
+ dnode->up = cpu_to_le32(fno);
de = hpfs_add_de(dir->i_sb, dnode, "\001\001", 2, 0);
- de->creation_date = de->write_date = de->read_date = gmt_to_local(dir->i_sb, get_seconds());
+ de->creation_date = de->write_date = de->read_date = cpu_to_le32(gmt_to_local(dir->i_sb, get_seconds()));
if (!(mode & 0222)) de->read_only = 1;
de->first = de->directory = 1;
/*de->hidden = de->system = 0;*/
- de->fnode = fno;
+ de->fnode = cpu_to_le32(fno);
mark_buffer_dirty(bh);
brelse(bh);
hpfs_mark_4buffers_dirty(&qbh0);
hpfs_write_inode_nolock(result);
}
d_instantiate(dentry, result);
- mutex_unlock(&hpfs_i(dir)->i_mutex);
hpfs_unlock(dir->i_sb);
return 0;
bail3:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
iput(result);
bail2:
hpfs_brelse4(&qbh0);
if (!(mode & 0222)) dee.read_only = 1;
dee.archive = 1;
dee.hidden = name[0] == '.';
- dee.fnode = fno;
- dee.creation_date = dee.write_date = dee.read_date = gmt_to_local(dir->i_sb, get_seconds());
+ dee.fnode = cpu_to_le32(fno);
+ dee.creation_date = dee.write_date = dee.read_date = cpu_to_le32(gmt_to_local(dir->i_sb, get_seconds()));
result = new_inode(dir->i_sb);
if (!result)
result->i_op = &hpfs_file_iops;
result->i_fop = &hpfs_file_ops;
result->i_nlink = 1;
- hpfs_decide_conv(result, name, len);
hpfs_i(result)->i_parent_dir = dir->i_ino;
- result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, dee.creation_date);
+ result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(dee.creation_date));
result->i_ctime.tv_nsec = 0;
result->i_mtime.tv_nsec = 0;
result->i_atime.tv_nsec = 0;
result->i_data.a_ops = &hpfs_aops;
hpfs_i(result)->mmu_private = 0;
- mutex_lock(&hpfs_i(dir)->i_mutex);
- r = hpfs_add_dirent(dir, name, len, &dee, 0);
+ r = hpfs_add_dirent(dir, name, len, &dee);
if (r == 1)
goto bail2;
if (r == -1) {
}
fnode->len = len;
memcpy(fnode->name, name, len > 15 ? 15 : len);
- fnode->up = dir->i_ino;
+ fnode->up = cpu_to_le32(dir->i_ino);
mark_buffer_dirty(bh);
brelse(bh);
hpfs_write_inode_nolock(result);
}
d_instantiate(dentry, result);
- mutex_unlock(&hpfs_i(dir)->i_mutex);
hpfs_unlock(dir->i_sb);
return 0;
bail2:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
iput(result);
bail1:
brelse(bh);
if (!(mode & 0222)) dee.read_only = 1;
dee.archive = 1;
dee.hidden = name[0] == '.';
- dee.fnode = fno;
- dee.creation_date = dee.write_date = dee.read_date = gmt_to_local(dir->i_sb, get_seconds());
+ dee.fnode = cpu_to_le32(fno);
+ dee.creation_date = dee.write_date = dee.read_date = cpu_to_le32(gmt_to_local(dir->i_sb, get_seconds()));
result = new_inode(dir->i_sb);
if (!result)
hpfs_init_inode(result);
result->i_ino = fno;
hpfs_i(result)->i_parent_dir = dir->i_ino;
- result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, dee.creation_date);
+ result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(dee.creation_date));
result->i_ctime.tv_nsec = 0;
result->i_mtime.tv_nsec = 0;
result->i_atime.tv_nsec = 0;
result->i_blocks = 1;
init_special_inode(result, mode, rdev);
- mutex_lock(&hpfs_i(dir)->i_mutex);
- r = hpfs_add_dirent(dir, name, len, &dee, 0);
+ r = hpfs_add_dirent(dir, name, len, &dee);
if (r == 1)
goto bail2;
if (r == -1) {
}
fnode->len = len;
memcpy(fnode->name, name, len > 15 ? 15 : len);
- fnode->up = dir->i_ino;
+ fnode->up = cpu_to_le32(dir->i_ino);
mark_buffer_dirty(bh);
insert_inode_hash(result);
hpfs_write_inode_nolock(result);
d_instantiate(dentry, result);
- mutex_unlock(&hpfs_i(dir)->i_mutex);
brelse(bh);
hpfs_unlock(dir->i_sb);
return 0;
bail2:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
iput(result);
bail1:
brelse(bh);
memset(&dee, 0, sizeof dee);
dee.archive = 1;
dee.hidden = name[0] == '.';
- dee.fnode = fno;
- dee.creation_date = dee.write_date = dee.read_date = gmt_to_local(dir->i_sb, get_seconds());
+ dee.fnode = cpu_to_le32(fno);
+ dee.creation_date = dee.write_date = dee.read_date = cpu_to_le32(gmt_to_local(dir->i_sb, get_seconds()));
result = new_inode(dir->i_sb);
if (!result)
result->i_ino = fno;
hpfs_init_inode(result);
hpfs_i(result)->i_parent_dir = dir->i_ino;
- result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, dee.creation_date);
+ result->i_ctime.tv_sec = result->i_mtime.tv_sec = result->i_atime.tv_sec = local_to_gmt(dir->i_sb, le32_to_cpu(dee.creation_date));
result->i_ctime.tv_nsec = 0;
result->i_mtime.tv_nsec = 0;
result->i_atime.tv_nsec = 0;
result->i_op = &page_symlink_inode_operations;
result->i_data.a_ops = &hpfs_symlink_aops;
- mutex_lock(&hpfs_i(dir)->i_mutex);
- r = hpfs_add_dirent(dir, name, len, &dee, 0);
+ r = hpfs_add_dirent(dir, name, len, &dee);
if (r == 1)
goto bail2;
if (r == -1) {
}
fnode->len = len;
memcpy(fnode->name, name, len > 15 ? 15 : len);
- fnode->up = dir->i_ino;
+ fnode->up = cpu_to_le32(dir->i_ino);
hpfs_set_ea(result, fnode, "SYMLINK", symlink, strlen(symlink));
mark_buffer_dirty(bh);
brelse(bh);
hpfs_write_inode_nolock(result);
d_instantiate(dentry, result);
- mutex_unlock(&hpfs_i(dir)->i_mutex);
hpfs_unlock(dir->i_sb);
return 0;
bail2:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
iput(result);
bail1:
brelse(bh);
struct hpfs_dirent *de;
struct inode *inode = dentry->d_inode;
dnode_secno dno;
- fnode_secno fno;
int r;
int rep = 0;
int err;
hpfs_lock(dir->i_sb);
hpfs_adjust_length(name, &len);
again:
- mutex_lock(&hpfs_i(inode)->i_parent_mutex);
- mutex_lock(&hpfs_i(dir)->i_mutex);
err = -ENOENT;
de = map_dirent(dir, hpfs_i(dir)->i_dno, name, len, &dno, &qbh);
if (!de)
if (de->directory)
goto out1;
- fno = de->fnode;
r = hpfs_remove_dirent(dir, dno, de, &qbh, 1);
switch (r) {
case 1:
if (rep++)
break;
- mutex_unlock(&hpfs_i(dir)->i_mutex);
- mutex_unlock(&hpfs_i(inode)->i_parent_mutex);
dentry_unhash(dentry);
if (!d_unhashed(dentry)) {
dput(dentry);
out1:
hpfs_brelse4(&qbh);
out:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
- mutex_unlock(&hpfs_i(inode)->i_parent_mutex);
hpfs_unlock(dir->i_sb);
return err;
}
struct hpfs_dirent *de;
struct inode *inode = dentry->d_inode;
dnode_secno dno;
- fnode_secno fno;
int n_items = 0;
int err;
int r;
hpfs_adjust_length(name, &len);
hpfs_lock(dir->i_sb);
- mutex_lock(&hpfs_i(inode)->i_parent_mutex);
- mutex_lock(&hpfs_i(dir)->i_mutex);
err = -ENOENT;
de = map_dirent(dir, hpfs_i(dir)->i_dno, name, len, &dno, &qbh);
if (!de)
if (n_items)
goto out1;
- fno = de->fnode;
r = hpfs_remove_dirent(dir, dno, de, &qbh, 1);
switch (r) {
case 1:
out1:
hpfs_brelse4(&qbh);
out:
- mutex_unlock(&hpfs_i(dir)->i_mutex);
- mutex_unlock(&hpfs_i(inode)->i_parent_mutex);
hpfs_unlock(dir->i_sb);
return err;
}
hpfs_lock(i->i_sb);
/* order doesn't matter, due to VFS exclusion */
- mutex_lock(&hpfs_i(i)->i_parent_mutex);
- if (new_inode)
- mutex_lock(&hpfs_i(new_inode)->i_parent_mutex);
- mutex_lock(&hpfs_i(old_dir)->i_mutex);
- if (new_dir != old_dir)
- mutex_lock(&hpfs_i(new_dir)->i_mutex);
/* Erm? Moving over the empty non-busy directory is perfectly legal */
if (new_inode && S_ISDIR(new_inode->i_mode)) {
if (new_dir == old_dir) hpfs_brelse4(&qbh);
- hpfs_lock_creation(i->i_sb);
- if ((r = hpfs_add_dirent(new_dir, new_name, new_len, &de, 1))) {
- hpfs_unlock_creation(i->i_sb);
+ if ((r = hpfs_add_dirent(new_dir, new_name, new_len, &de))) {
if (r == -1) hpfs_error(new_dir->i_sb, "hpfs_rename: dirent already exists!");
err = r == 1 ? -ENOSPC : -EFSERROR;
if (new_dir != old_dir) hpfs_brelse4(&qbh);
if (new_dir == old_dir)
if (!(dep = map_dirent(old_dir, hpfs_i(old_dir)->i_dno, old_name, old_len, &dno, &qbh))) {
- hpfs_unlock_creation(i->i_sb);
hpfs_error(i->i_sb, "lookup succeeded but map dirent failed at #2");
err = -ENOENT;
goto end1;
}
if ((r = hpfs_remove_dirent(old_dir, dno, dep, &qbh, 0))) {
- hpfs_unlock_creation(i->i_sb);
hpfs_error(i->i_sb, "hpfs_rename: could not remove dirent");
err = r == 2 ? -ENOSPC : -EFSERROR;
goto end1;
}
- hpfs_unlock_creation(i->i_sb);
-
+
end:
hpfs_i(i)->i_parent_dir = new_dir->i_ino;
if (S_ISDIR(i->i_mode)) {
drop_nlink(old_dir);
}
if ((fnode = hpfs_map_fnode(i->i_sb, i->i_ino, &bh))) {
- fnode->up = new_dir->i_ino;
+ fnode->up = cpu_to_le32(new_dir->i_ino);
fnode->len = new_len;
memcpy(fnode->name, new_name, new_len>15?15:new_len);
if (new_len < 15) memset(&fnode->name[new_len], 0, 15 - new_len);
mark_buffer_dirty(bh);
brelse(bh);
}
- hpfs_i(i)->i_conv = hpfs_sb(i->i_sb)->sb_conv;
- hpfs_decide_conv(i, new_name, new_len);
end1:
- if (old_dir != new_dir)
- mutex_unlock(&hpfs_i(new_dir)->i_mutex);
- mutex_unlock(&hpfs_i(old_dir)->i_mutex);
- mutex_unlock(&hpfs_i(i)->i_parent_mutex);
- if (new_inode)
- mutex_unlock(&hpfs_i(new_inode)->i_parent_mutex);
hpfs_unlock(i->i_sb);
return err;
}
/* Mark the filesystem dirty, so that chkdsk checks it when os/2 booted */
-static void mark_dirty(struct super_block *s)
+static void mark_dirty(struct super_block *s, int remount)
{
- if (hpfs_sb(s)->sb_chkdsk && !(s->s_flags & MS_RDONLY)) {
+ if (hpfs_sb(s)->sb_chkdsk && (remount || !(s->s_flags & MS_RDONLY))) {
struct buffer_head *bh;
struct hpfs_spare_block *sb;
if ((sb = hpfs_map_sector(s, 17, &bh, 0))) {
sb->dirty = 1;
sb->old_wrote = 0;
mark_buffer_dirty(bh);
+ sync_dirty_buffer(bh);
brelse(bh);
}
}
struct buffer_head *bh;
struct hpfs_spare_block *sb;
if (s->s_flags & MS_RDONLY) return;
+ sync_blockdev(s->s_bdev);
if ((sb = hpfs_map_sector(s, 17, &bh, 0))) {
sb->dirty = hpfs_sb(s)->sb_chkdsk > 1 - hpfs_sb(s)->sb_was_error;
sb->old_wrote = hpfs_sb(s)->sb_chkdsk >= 2 && !hpfs_sb(s)->sb_was_error;
mark_buffer_dirty(bh);
+ sync_dirty_buffer(bh);
brelse(bh);
}
}
if (!hpfs_sb(s)->sb_was_error) {
if (hpfs_sb(s)->sb_err == 2) {
printk("; crashing the system because you wanted it\n");
- mark_dirty(s);
+ mark_dirty(s, 0);
panic("HPFS panic");
} else if (hpfs_sb(s)->sb_err == 1) {
if (s->s_flags & MS_RDONLY) printk("; already mounted read-only\n");
else {
printk("; remounting read-only\n");
- mark_dirty(s);
+ mark_dirty(s, 0);
s->s_flags |= MS_RDONLY;
}
} else if (s->s_flags & MS_RDONLY) printk("; going on - but anything won't be destroyed because it's read-only\n");
{
struct hpfs_sb_info *sbi = hpfs_sb(s);
+ hpfs_lock(s);
+ unmark_dirty(s);
+ hpfs_unlock(s);
+
kfree(sbi->sb_cp_table);
kfree(sbi->sb_bmp_dir);
- unmark_dirty(s);
s->s_fs_info = NULL;
kfree(sbi);
}
n_bands = (hpfs_sb(s)->sb_fs_size + 0x3fff) >> 14;
count = 0;
for (n = 0; n < n_bands; n++)
- count += hpfs_count_one_bitmap(s, hpfs_sb(s)->sb_bmp_dir[n]);
+ count += hpfs_count_one_bitmap(s, le32_to_cpu(hpfs_sb(s)->sb_bmp_dir[n]));
return count;
}
{
struct hpfs_inode_info *ei = (struct hpfs_inode_info *) foo;
- mutex_init(&ei->i_mutex);
- mutex_init(&ei->i_parent_mutex);
inode_init_once(&ei->vfs_inode);
}
enum {
Opt_help, Opt_uid, Opt_gid, Opt_umask, Opt_case_lower, Opt_case_asis,
- Opt_conv_binary, Opt_conv_text, Opt_conv_auto,
Opt_check_none, Opt_check_normal, Opt_check_strict,
Opt_err_cont, Opt_err_ro, Opt_err_panic,
Opt_eas_no, Opt_eas_ro, Opt_eas_rw,
{Opt_umask, "umask=%o"},
{Opt_case_lower, "case=lower"},
{Opt_case_asis, "case=asis"},
- {Opt_conv_binary, "conv=binary"},
- {Opt_conv_text, "conv=text"},
- {Opt_conv_auto, "conv=auto"},
{Opt_check_none, "check=none"},
{Opt_check_normal, "check=normal"},
{Opt_check_strict, "check=strict"},
};
static int parse_opts(char *opts, uid_t *uid, gid_t *gid, umode_t *umask,
- int *lowercase, int *conv, int *eas, int *chk, int *errs,
+ int *lowercase, int *eas, int *chk, int *errs,
int *chkdsk, int *timeshift)
{
char *p;
case Opt_case_asis:
*lowercase = 0;
break;
- case Opt_conv_binary:
- *conv = CONV_BINARY;
- break;
- case Opt_conv_text:
- *conv = CONV_TEXT;
- break;
- case Opt_conv_auto:
- *conv = CONV_AUTO;
- break;
case Opt_check_none:
*chk = 0;
break;
umask=xxx set mode of files that don't have mode specified in eas\n\
case=lower lowercase all files\n\
case=asis do not lowercase files (default)\n\
- conv=binary do not convert CR/LF -> LF (default)\n\
- conv=auto convert only files with known text extensions\n\
- conv=text convert all files\n\
check=none no fs checks - kernel may crash on corrupted filesystem\n\
check=normal do some checks - it should not crash (default)\n\
check=strict do extra time-consuming checks, used for debugging\n\
uid_t uid;
gid_t gid;
umode_t umask;
- int lowercase, conv, eas, chk, errs, chkdsk, timeshift;
+ int lowercase, eas, chk, errs, chkdsk, timeshift;
int o;
struct hpfs_sb_info *sbi = hpfs_sb(s);
char *new_opts = kstrdup(data, GFP_KERNEL);
lock_super(s);
uid = sbi->sb_uid; gid = sbi->sb_gid;
umask = 0777 & ~sbi->sb_mode;
- lowercase = sbi->sb_lowercase; conv = sbi->sb_conv;
+ lowercase = sbi->sb_lowercase;
eas = sbi->sb_eas; chk = sbi->sb_chk; chkdsk = sbi->sb_chkdsk;
errs = sbi->sb_err; timeshift = sbi->sb_timeshift;
- if (!(o = parse_opts(data, &uid, &gid, &umask, &lowercase, &conv,
+ if (!(o = parse_opts(data, &uid, &gid, &umask, &lowercase,
&eas, &chk, &errs, &chkdsk, ×hift))) {
printk("HPFS: bad mount options.\n");
goto out_err;
sbi->sb_uid = uid; sbi->sb_gid = gid;
sbi->sb_mode = 0777 & ~umask;
- sbi->sb_lowercase = lowercase; sbi->sb_conv = conv;
+ sbi->sb_lowercase = lowercase;
sbi->sb_eas = eas; sbi->sb_chk = chk; sbi->sb_chkdsk = chkdsk;
sbi->sb_err = errs; sbi->sb_timeshift = timeshift;
- if (!(*flags & MS_RDONLY)) mark_dirty(s);
+ if (!(*flags & MS_RDONLY)) mark_dirty(s, 1);
replace_mount_options(s, new_opts);
uid_t uid;
gid_t gid;
umode_t umask;
- int lowercase, conv, eas, chk, errs, chkdsk, timeshift;
+ int lowercase, eas, chk, errs, chkdsk, timeshift;
dnode_secno root_dno;
struct hpfs_dirent *de = NULL;
int o;
- if (num_possible_cpus() > 1) {
- printk(KERN_ERR "HPFS is not SMP safe\n");
- return -EINVAL;
- }
-
save_mount_options(s, options);
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
sbi->sb_bmp_dir = NULL;
sbi->sb_cp_table = NULL;
- mutex_init(&sbi->hpfs_creation_de);
+ mutex_init(&sbi->hpfs_mutex);
+ hpfs_lock(s);
uid = current_uid();
gid = current_gid();
umask = current_umask();
lowercase = 0;
- conv = CONV_BINARY;
eas = 2;
chk = 1;
errs = 1;
chkdsk = 1;
timeshift = 0;
- if (!(o = parse_opts(options, &uid, &gid, &umask, &lowercase, &conv,
+ if (!(o = parse_opts(options, &uid, &gid, &umask, &lowercase,
&eas, &chk, &errs, &chkdsk, ×hift))) {
printk("HPFS: bad mount options.\n");
goto bail0;
if (!(spareblock = hpfs_map_sector(s, 17, &bh2, 0))) goto bail3;
/* Check magics */
- if (/*bootblock->magic != BB_MAGIC
- ||*/ superblock->magic != SB_MAGIC
- || spareblock->magic != SP_MAGIC) {
+ if (/*le16_to_cpu(bootblock->magic) != BB_MAGIC
+ ||*/ le32_to_cpu(superblock->magic) != SB_MAGIC
+ || le32_to_cpu(spareblock->magic) != SP_MAGIC) {
if (!silent) printk("HPFS: Bad magic ... probably not HPFS\n");
goto bail4;
}
s->s_op = &hpfs_sops;
s->s_d_op = &hpfs_dentry_operations;
- sbi->sb_root = superblock->root;
- sbi->sb_fs_size = superblock->n_sectors;
- sbi->sb_bitmaps = superblock->bitmaps;
- sbi->sb_dirband_start = superblock->dir_band_start;
- sbi->sb_dirband_size = superblock->n_dir_band;
- sbi->sb_dmap = superblock->dir_band_bitmap;
+ sbi->sb_root = le32_to_cpu(superblock->root);
+ sbi->sb_fs_size = le32_to_cpu(superblock->n_sectors);
+ sbi->sb_bitmaps = le32_to_cpu(superblock->bitmaps);
+ sbi->sb_dirband_start = le32_to_cpu(superblock->dir_band_start);
+ sbi->sb_dirband_size = le32_to_cpu(superblock->n_dir_band);
+ sbi->sb_dmap = le32_to_cpu(superblock->dir_band_bitmap);
sbi->sb_uid = uid;
sbi->sb_gid = gid;
sbi->sb_mode = 0777 & ~umask;
sbi->sb_n_free = -1;
sbi->sb_n_free_dnodes = -1;
sbi->sb_lowercase = lowercase;
- sbi->sb_conv = conv;
sbi->sb_eas = eas;
sbi->sb_chk = chk;
sbi->sb_chkdsk = chkdsk;
sbi->sb_max_fwd_alloc = 0xffffff;
/* Load bitmap directory */
- if (!(sbi->sb_bmp_dir = hpfs_load_bitmap_directory(s, superblock->bitmaps)))
+ if (!(sbi->sb_bmp_dir = hpfs_load_bitmap_directory(s, le32_to_cpu(superblock->bitmaps))))
goto bail4;
/* Check for general fs errors*/
mark_buffer_dirty(bh2);
}
- if (spareblock->hotfixes_used || spareblock->n_spares_used) {
+ if (le32_to_cpu(spareblock->hotfixes_used) || le32_to_cpu(spareblock->n_spares_used)) {
if (errs >= 2) {
printk("HPFS: Hotfixes not supported here, try chkdsk\n");
- mark_dirty(s);
+ mark_dirty(s, 0);
goto bail4;
}
hpfs_error(s, "hotfixes not supported here, try chkdsk");
if (errs == 0) printk("HPFS: Proceeding, but your filesystem will be probably corrupted by this driver...\n");
else printk("HPFS: This driver may read bad files or crash when operating on disk with hotfixes.\n");
}
- if (spareblock->n_dnode_spares != spareblock->n_dnode_spares_free) {
+ if (le32_to_cpu(spareblock->n_dnode_spares) != le32_to_cpu(spareblock->n_dnode_spares_free)) {
if (errs >= 2) {
printk("HPFS: Spare dnodes used, try chkdsk\n");
- mark_dirty(s);
+ mark_dirty(s, 0);
goto bail4;
}
hpfs_error(s, "warning: spare dnodes used, try chkdsk");
}
if (chk) {
unsigned a;
- if (superblock->dir_band_end - superblock->dir_band_start + 1 != superblock->n_dir_band ||
- superblock->dir_band_end < superblock->dir_band_start || superblock->n_dir_band > 0x4000) {
+ if (le32_to_cpu(superblock->dir_band_end) - le32_to_cpu(superblock->dir_band_start) + 1 != le32_to_cpu(superblock->n_dir_band) ||
+ le32_to_cpu(superblock->dir_band_end) < le32_to_cpu(superblock->dir_band_start) || le32_to_cpu(superblock->n_dir_band) > 0x4000) {
hpfs_error(s, "dir band size mismatch: dir_band_start==%08x, dir_band_end==%08x, n_dir_band==%08x",
- superblock->dir_band_start, superblock->dir_band_end, superblock->n_dir_band);
+ le32_to_cpu(superblock->dir_band_start), le32_to_cpu(superblock->dir_band_end), le32_to_cpu(superblock->n_dir_band));
goto bail4;
}
a = sbi->sb_dirband_size;
sbi->sb_dirband_size = 0;
- if (hpfs_chk_sectors(s, superblock->dir_band_start, superblock->n_dir_band, "dir_band") ||
- hpfs_chk_sectors(s, superblock->dir_band_bitmap, 4, "dir_band_bitmap") ||
- hpfs_chk_sectors(s, superblock->bitmaps, 4, "bitmaps")) {
- mark_dirty(s);
+ if (hpfs_chk_sectors(s, le32_to_cpu(superblock->dir_band_start), le32_to_cpu(superblock->n_dir_band), "dir_band") ||
+ hpfs_chk_sectors(s, le32_to_cpu(superblock->dir_band_bitmap), 4, "dir_band_bitmap") ||
+ hpfs_chk_sectors(s, le32_to_cpu(superblock->bitmaps), 4, "bitmaps")) {
+ mark_dirty(s, 0);
goto bail4;
}
sbi->sb_dirband_size = a;
} else printk("HPFS: You really don't want any checks? You are crazy...\n");
/* Load code page table */
- if (spareblock->n_code_pages)
- if (!(sbi->sb_cp_table = hpfs_load_code_page(s, spareblock->code_page_dir)))
+ if (le32_to_cpu(spareblock->n_code_pages))
+ if (!(sbi->sb_cp_table = hpfs_load_code_page(s, le32_to_cpu(spareblock->code_page_dir))))
printk("HPFS: Warning: code page support is disabled\n");
brelse(bh2);
if (!de)
hpfs_error(s, "unable to find root dir");
else {
- root->i_atime.tv_sec = local_to_gmt(s, de->read_date);
+ root->i_atime.tv_sec = local_to_gmt(s, le32_to_cpu(de->read_date));
root->i_atime.tv_nsec = 0;
- root->i_mtime.tv_sec = local_to_gmt(s, de->write_date);
+ root->i_mtime.tv_sec = local_to_gmt(s, le32_to_cpu(de->write_date));
root->i_mtime.tv_nsec = 0;
- root->i_ctime.tv_sec = local_to_gmt(s, de->creation_date);
+ root->i_ctime.tv_sec = local_to_gmt(s, le32_to_cpu(de->creation_date));
root->i_ctime.tv_nsec = 0;
- hpfs_i(root)->i_ea_size = de->ea_size;
+ hpfs_i(root)->i_ea_size = le16_to_cpu(de->ea_size);
hpfs_i(root)->i_parent_dir = root->i_ino;
if (root->i_size == -1)
root->i_size = 2048;
root->i_blocks = 5;
hpfs_brelse4(&qbh);
}
+ hpfs_unlock(s);
return 0;
bail4: brelse(bh2);
bail2: brelse(bh0);
bail1:
bail0:
+ hpfs_unlock(s);
kfree(sbi->sb_bmp_dir);
kfree(sbi->sb_cp_table);
s->s_fs_info = NULL;
!read_only)
return -EIO;
- mutex_init(&super->s_dirop_mutex);
- mutex_init(&super->s_object_alias_mutex);
- INIT_LIST_HEAD(&super->s_freeing_list);
-
ret = logfs_init_rw(sb);
if (ret)
return ret;
if (!super)
return ERR_PTR(-ENOMEM);
+ mutex_init(&super->s_dirop_mutex);
+ mutex_init(&super->s_object_alias_mutex);
+ INIT_LIST_HEAD(&super->s_freeing_list);
+
if (!devname)
err = logfs_get_sb_bdev(super, type, devname);
else if (strncmp(devname, "mtd", 3))
static int acl_permission_check(struct inode *inode, int mask, unsigned int flags,
int (*check_acl)(struct inode *inode, int mask, unsigned int flags))
{
- umode_t mode = inode->i_mode;
+ unsigned int mode = inode->i_mode;
mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
}
#ifdef CONFIG_NFS_V4
-static rpc_authflavor_t nfs_find_best_sec(struct nfs4_secinfo_flavors *flavors, struct inode *inode)
+static rpc_authflavor_t nfs_find_best_sec(struct nfs4_secinfo_flavors *flavors)
{
struct gss_api_mech *mech;
struct xdr_netobj oid;
}
flavors = page_address(page);
ret = secinfo(parent->d_inode, &dentry->d_name, flavors);
- *flavor = nfs_find_best_sec(flavors, dentry->d_inode);
+ *flavor = nfs_find_best_sec(flavors);
put_page(page);
}
NFS4CLNT_LAYOUTRECALL,
NFS4CLNT_SESSION_RESET,
NFS4CLNT_RECALL_SLOT,
+ NFS4CLNT_LEASE_CONFIRM,
};
enum nfs4_session_state {
case -EKEYEXPIRED:
rpc_delay(task, FILELAYOUT_POLL_RETRY_MAX);
break;
+ case -NFS4ERR_RETRY_UNCACHED_REP:
+ break;
default:
dprintk("%s DS error. Retry through MDS %d\n", __func__,
task->tk_status);
filelayout_check_layout(struct pnfs_layout_hdr *lo,
struct nfs4_filelayout_segment *fl,
struct nfs4_layoutget_res *lgr,
- struct nfs4_deviceid *id)
+ struct nfs4_deviceid *id,
+ gfp_t gfp_flags)
{
struct nfs4_file_layout_dsaddr *dsaddr;
int status = -EINVAL;
/* find and reference the deviceid */
dsaddr = nfs4_fl_find_get_deviceid(id);
if (dsaddr == NULL) {
- dsaddr = get_device_info(lo->plh_inode, id);
+ dsaddr = get_device_info(lo->plh_inode, id, gfp_flags);
if (dsaddr == NULL)
goto out;
}
filelayout_decode_layout(struct pnfs_layout_hdr *flo,
struct nfs4_filelayout_segment *fl,
struct nfs4_layoutget_res *lgr,
- struct nfs4_deviceid *id)
+ struct nfs4_deviceid *id,
+ gfp_t gfp_flags)
{
struct xdr_stream stream;
struct xdr_buf buf = {
dprintk("%s: set_layout_map Begin\n", __func__);
- scratch = alloc_page(GFP_KERNEL);
+ scratch = alloc_page(gfp_flags);
if (!scratch)
return -ENOMEM;
goto out_err;
fl->fh_array = kzalloc(fl->num_fh * sizeof(struct nfs_fh *),
- GFP_KERNEL);
+ gfp_flags);
if (!fl->fh_array)
goto out_err;
for (i = 0; i < fl->num_fh; i++) {
/* Do we want to use a mempool here? */
- fl->fh_array[i] = kmalloc(sizeof(struct nfs_fh), GFP_KERNEL);
+ fl->fh_array[i] = kmalloc(sizeof(struct nfs_fh), gfp_flags);
if (!fl->fh_array[i])
goto out_err_free;
static struct pnfs_layout_segment *
filelayout_alloc_lseg(struct pnfs_layout_hdr *layoutid,
- struct nfs4_layoutget_res *lgr)
+ struct nfs4_layoutget_res *lgr,
+ gfp_t gfp_flags)
{
struct nfs4_filelayout_segment *fl;
int rc;
struct nfs4_deviceid id;
dprintk("--> %s\n", __func__);
- fl = kzalloc(sizeof(*fl), GFP_KERNEL);
+ fl = kzalloc(sizeof(*fl), gfp_flags);
if (!fl)
return NULL;
- rc = filelayout_decode_layout(layoutid, fl, lgr, &id);
- if (rc != 0 || filelayout_check_layout(layoutid, fl, lgr, &id)) {
+ rc = filelayout_decode_layout(layoutid, fl, lgr, &id, gfp_flags);
+ if (rc != 0 || filelayout_check_layout(layoutid, fl, lgr, &id, gfp_flags)) {
_filelayout_free_lseg(fl);
return NULL;
}
int size = (fl->stripe_type == STRIPE_SPARSE) ?
fl->dsaddr->ds_num : fl->dsaddr->stripe_count;
- fl->commit_buckets = kcalloc(size, sizeof(struct list_head), GFP_KERNEL);
+ fl->commit_buckets = kcalloc(size, sizeof(struct list_head), gfp_flags);
if (!fl->commit_buckets) {
filelayout_free_lseg(&fl->generic_hdr);
return NULL;
nfs4_fl_find_get_deviceid(struct nfs4_deviceid *dev_id);
extern void nfs4_fl_put_deviceid(struct nfs4_file_layout_dsaddr *dsaddr);
struct nfs4_file_layout_dsaddr *
-get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id);
+get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags);
#endif /* FS_NFS_NFS4FILELAYOUT_H */
}
static struct nfs4_pnfs_ds *
-nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port)
+nfs4_pnfs_ds_add(struct inode *inode, u32 ip_addr, u32 port, gfp_t gfp_flags)
{
struct nfs4_pnfs_ds *tmp_ds, *ds;
- ds = kzalloc(sizeof(*tmp_ds), GFP_KERNEL);
+ ds = kzalloc(sizeof(*tmp_ds), gfp_flags);
if (!ds)
goto out;
* Currently only support ipv4, and one multi-path address.
*/
static struct nfs4_pnfs_ds *
-decode_and_add_ds(struct xdr_stream *streamp, struct inode *inode)
+decode_and_add_ds(struct xdr_stream *streamp, struct inode *inode, gfp_t gfp_flags)
{
struct nfs4_pnfs_ds *ds = NULL;
char *buf;
rlen);
goto out_err;
}
- buf = kmalloc(rlen + 1, GFP_KERNEL);
+ buf = kmalloc(rlen + 1, gfp_flags);
if (!buf) {
dprintk("%s: Not enough memory\n", __func__);
goto out_err;
sscanf(pstr, "-%d-%d", &tmp[0], &tmp[1]);
port = htons((tmp[0] << 8) | (tmp[1]));
- ds = nfs4_pnfs_ds_add(inode, ip_addr, port);
+ ds = nfs4_pnfs_ds_add(inode, ip_addr, port, gfp_flags);
dprintk("%s: Decoded address and port %s\n", __func__, buf);
out_free:
kfree(buf);
/* Decode opaque device data and return the result */
static struct nfs4_file_layout_dsaddr*
-decode_device(struct inode *ino, struct pnfs_device *pdev)
+decode_device(struct inode *ino, struct pnfs_device *pdev, gfp_t gfp_flags)
{
int i;
u32 cnt, num;
struct page *scratch;
/* set up xdr stream */
- scratch = alloc_page(GFP_KERNEL);
+ scratch = alloc_page(gfp_flags);
if (!scratch)
goto out_err;
}
/* read stripe indices */
- stripe_indices = kcalloc(cnt, sizeof(u8), GFP_KERNEL);
+ stripe_indices = kcalloc(cnt, sizeof(u8), gfp_flags);
if (!stripe_indices)
goto out_err_free_scratch;
dsaddr = kzalloc(sizeof(*dsaddr) +
(sizeof(struct nfs4_pnfs_ds *) * (num - 1)),
- GFP_KERNEL);
+ gfp_flags);
if (!dsaddr)
goto out_err_free_stripe_indices;
for (j = 0; j < mp_count; j++) {
if (j == 0) {
dsaddr->ds_list[i] = decode_and_add_ds(&stream,
- ino);
+ ino, gfp_flags);
if (dsaddr->ds_list[i] == NULL)
goto out_err_free_deviceid;
} else {
* available devices.
*/
static struct nfs4_file_layout_dsaddr *
-decode_and_add_device(struct inode *inode, struct pnfs_device *dev)
+decode_and_add_device(struct inode *inode, struct pnfs_device *dev, gfp_t gfp_flags)
{
struct nfs4_file_layout_dsaddr *d, *new;
long hash;
- new = decode_device(inode, dev);
+ new = decode_device(inode, dev, gfp_flags);
if (!new) {
printk(KERN_WARNING "%s: Could not decode or add device\n",
__func__);
* of available devices, and return it.
*/
struct nfs4_file_layout_dsaddr *
-get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id)
+get_device_info(struct inode *inode, struct nfs4_deviceid *dev_id, gfp_t gfp_flags)
{
struct pnfs_device *pdev = NULL;
u32 max_resp_sz;
dprintk("%s inode %p max_resp_sz %u max_pages %d\n",
__func__, inode, max_resp_sz, max_pages);
- pdev = kzalloc(sizeof(struct pnfs_device), GFP_KERNEL);
+ pdev = kzalloc(sizeof(struct pnfs_device), gfp_flags);
if (pdev == NULL)
return NULL;
- pages = kzalloc(max_pages * sizeof(struct page *), GFP_KERNEL);
+ pages = kzalloc(max_pages * sizeof(struct page *), gfp_flags);
if (pages == NULL) {
kfree(pdev);
return NULL;
}
for (i = 0; i < max_pages; i++) {
- pages[i] = alloc_page(GFP_KERNEL);
+ pages[i] = alloc_page(gfp_flags);
if (!pages[i])
goto out_free;
}
* Found new device, need to decode it and then add it to the
* list of known devices for this mountpoint.
*/
- dsaddr = decode_and_add_device(inode, pdev);
+ dsaddr = decode_and_add_device(inode, pdev, gfp_flags);
out_free:
for (i = 0; i < max_pages; i++)
__free_page(pages[i]);
#include <linux/nfs4.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
+#include <linux/nfs_mount.h>
#include <linux/namei.h>
#include <linux/mount.h>
#include <linux/module.h>
ret = nfs4_delay(server->client, &exception->timeout);
if (ret != 0)
break;
+ case -NFS4ERR_RETRY_UNCACHED_REP:
case -NFS4ERR_OLD_STATEID:
exception->retry = 1;
break;
if (res->sr_status == 1)
res->sr_status = NFS_OK;
- /* -ERESTARTSYS can result in skipping nfs41_sequence_setup */
- if (!res->sr_slot)
+ /* don't increment the sequence number if the task wasn't sent */
+ if (!RPC_WAS_SENT(task))
goto out;
/* Check the SEQUENCE operation status */
struct nfs4_exception exception = { };
int err;
do {
- err = nfs4_handle_exception(server,
- _nfs4_lookup_root(server, fhandle, info),
- &exception);
+ err = _nfs4_lookup_root(server, fhandle, info);
+ switch (err) {
+ case 0:
+ case -NFS4ERR_WRONGSEC:
+ break;
+ default:
+ err = nfs4_handle_exception(server, err, &exception);
+ }
} while (exception.retry);
return err;
}
return ret;
}
-/*
- * get the file handle for the "/" directory on the server
- */
-static int nfs4_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
+static int nfs4_find_root_sec(struct nfs_server *server, struct nfs_fh *fhandle,
struct nfs_fsinfo *info)
{
int i, len, status = 0;
- rpc_authflavor_t flav_array[NFS_MAX_SECFLAVORS + 2];
+ rpc_authflavor_t flav_array[NFS_MAX_SECFLAVORS];
- flav_array[0] = RPC_AUTH_UNIX;
- len = gss_mech_list_pseudoflavors(&flav_array[1]);
- flav_array[1+len] = RPC_AUTH_NULL;
- len += 2;
+ len = gss_mech_list_pseudoflavors(&flav_array[0]);
+ flav_array[len] = RPC_AUTH_NULL;
+ len += 1;
for (i = 0; i < len; i++) {
status = nfs4_lookup_root_sec(server, fhandle, info, flav_array[i]);
- if (status != -EPERM)
- break;
+ if (status == -NFS4ERR_WRONGSEC || status == -EACCES)
+ continue;
+ break;
}
+ /*
+ * -EACCESS could mean that the user doesn't have correct permissions
+ * to access the mount. It could also mean that we tried to mount
+ * with a gss auth flavor, but rpc.gssd isn't running. Either way,
+ * existing mount programs don't handle -EACCES very well so it should
+ * be mapped to -EPERM instead.
+ */
+ if (status == -EACCES)
+ status = -EPERM;
+ return status;
+}
+
+/*
+ * get the file handle for the "/" directory on the server
+ */
+static int nfs4_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
+ struct nfs_fsinfo *info)
+{
+ int status = nfs4_lookup_root(server, fhandle, info);
+ if ((status == -NFS4ERR_WRONGSEC) && !(server->flags & NFS_MOUNT_SECFLAVOUR))
+ /*
+ * A status of -NFS4ERR_WRONGSEC will be mapped to -EPERM
+ * by nfs4_map_errors() as this function exits.
+ */
+ status = nfs4_find_root_sec(server, fhandle, info);
if (status == 0)
status = nfs4_server_capabilities(server, fhandle);
if (status == 0)
rpc_delay(task, NFS4_POLL_RETRY_MAX);
task->tk_status = 0;
return -EAGAIN;
+ case -NFS4ERR_RETRY_UNCACHED_REP:
case -NFS4ERR_OLD_STATEID:
task->tk_status = 0;
return -EAGAIN;
sizeof(setclientid.sc_uaddr), "%s.%u.%u",
clp->cl_ipaddr, port >> 8, port & 255);
- status = rpc_call_sync(clp->cl_rpcclient, &msg, 0);
+ status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (status != -NFS4ERR_CLID_INUSE)
break;
- if (signalled())
+ if (loop != 0) {
+ ++clp->cl_id_uniquifier;
break;
- if (loop++ & 1)
- ssleep(clp->cl_lease_time / HZ + 1);
- else
- if (++clp->cl_id_uniquifier == 0)
- break;
+ }
+ ++loop;
+ ssleep(clp->cl_lease_time / HZ + 1);
}
return status;
}
-static int _nfs4_proc_setclientid_confirm(struct nfs_client *clp,
+int nfs4_proc_setclientid_confirm(struct nfs_client *clp,
struct nfs4_setclientid_res *arg,
struct rpc_cred *cred)
{
int status;
now = jiffies;
- status = rpc_call_sync(clp->cl_rpcclient, &msg, 0);
+ status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (status == 0) {
spin_lock(&clp->cl_lock);
clp->cl_lease_time = fsinfo.lease_time * HZ;
return status;
}
-int nfs4_proc_setclientid_confirm(struct nfs_client *clp,
- struct nfs4_setclientid_res *arg,
- struct rpc_cred *cred)
-{
- long timeout = 0;
- int err;
- do {
- err = _nfs4_proc_setclientid_confirm(clp, arg, cred);
- switch (err) {
- case 0:
- return err;
- case -NFS4ERR_RESOURCE:
- /* The IBM lawyers misread another document! */
- case -NFS4ERR_DELAY:
- err = nfs4_delay(clp->cl_rpcclient, &timeout);
- }
- } while (err == 0);
- return err;
-}
-
struct nfs4_delegreturndata {
struct nfs4_delegreturnargs args;
struct nfs4_delegreturnres res;
init_utsname()->domainname,
clp->cl_rpcclient->cl_auth->au_flavor);
- status = rpc_call_sync(clp->cl_rpcclient, &msg, 0);
+ status = rpc_call_sync(clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (!status)
status = nfs4_check_cl_exchange_flags(clp->cl_exchange_flags);
dprintk("<-- %s status= %d\n", __func__, status);
dprintk("%s Retry: tk_status %d\n", __func__, task->tk_status);
rpc_delay(task, NFS4_POLL_RETRY_MIN);
task->tk_status = 0;
+ /* fall through */
+ case -NFS4ERR_RETRY_UNCACHED_REP:
nfs_restart_rpc(task, data->clp);
return;
}
.rpc_client = clp->cl_rpcclient,
.rpc_message = &msg,
.callback_ops = &nfs4_get_lease_time_ops,
- .callback_data = &data
+ .callback_data = &data,
+ .flags = RPC_TASK_TIMEOUT,
};
int status;
nfs4_init_channel_attrs(&args);
args.flags = (SESSION4_PERSIST | SESSION4_BACK_CHAN);
- status = rpc_call_sync(session->clp->cl_rpcclient, &msg, 0);
+ status = rpc_call_sync(session->clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (!status)
/* Verify the session's negotiated channel_attrs values */
int status;
unsigned *ptr;
struct nfs4_session *session = clp->cl_session;
- long timeout = 0;
- int err;
dprintk("--> %s clp=%p session=%p\n", __func__, clp, session);
- do {
- status = _nfs4_proc_create_session(clp);
- if (status == -NFS4ERR_DELAY) {
- err = nfs4_delay(clp->cl_rpcclient, &timeout);
- if (err)
- status = err;
- }
- } while (status == -NFS4ERR_DELAY);
-
+ status = _nfs4_proc_create_session(clp);
if (status)
goto out;
msg.rpc_argp = session;
msg.rpc_resp = NULL;
msg.rpc_cred = NULL;
- status = rpc_call_sync(session->clp->cl_rpcclient, &msg, 0);
+ status = rpc_call_sync(session->clp->cl_rpcclient, &msg, RPC_TASK_TIMEOUT);
if (status)
printk(KERN_WARNING
break;
case -NFS4ERR_DELAY:
rpc_delay(task, NFS4_POLL_RETRY_MAX);
+ /* fall through */
+ case -NFS4ERR_RETRY_UNCACHED_REP:
return -EAGAIN;
default:
nfs4_schedule_lease_recovery(clp);
int nfs4_init_clientid(struct nfs_client *clp, struct rpc_cred *cred)
{
- struct nfs4_setclientid_res clid;
+ struct nfs4_setclientid_res clid = {
+ .clientid = clp->cl_clientid,
+ .confirm = clp->cl_confirm,
+ };
unsigned short port;
int status;
+ if (test_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state))
+ goto do_confirm;
port = nfs_callback_tcpport;
if (clp->cl_addr.ss_family == AF_INET6)
port = nfs_callback_tcpport6;
status = nfs4_proc_setclientid(clp, NFS4_CALLBACK, port, cred, &clid);
if (status != 0)
goto out;
+ clp->cl_clientid = clid.clientid;
+ clp->cl_confirm = clid.confirm;
+ set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
+do_confirm:
status = nfs4_proc_setclientid_confirm(clp, &clid, cred);
if (status != 0)
goto out;
- clp->cl_clientid = clid.clientid;
+ clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
nfs4_schedule_state_renewal(clp);
out:
return status;
{
int status;
+ if (test_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state))
+ goto do_confirm;
nfs4_begin_drain_session(clp);
status = nfs4_proc_exchange_id(clp, cred);
if (status != 0)
goto out;
+ set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
+do_confirm:
status = nfs4_proc_create_session(clp);
if (status != 0)
goto out;
+ clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
nfs41_setup_state_renewal(clp);
nfs_mark_client_ready(clp, NFS_CS_READY);
out:
*/
static void nfs4_set_lease_expired(struct nfs_client *clp, int status)
{
- if (nfs4_has_session(clp)) {
- switch (status) {
- case -NFS4ERR_DELAY:
- case -NFS4ERR_CLID_INUSE:
- case -EAGAIN:
- break;
+ switch (status) {
+ case -NFS4ERR_CLID_INUSE:
+ case -NFS4ERR_STALE_CLIENTID:
+ clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
+ break;
+ case -NFS4ERR_DELAY:
+ case -ETIMEDOUT:
+ case -EAGAIN:
+ ssleep(1);
+ break;
- case -EKEYEXPIRED:
- nfs4_warn_keyexpired(clp->cl_hostname);
- case -NFS4ERR_NOT_SAME: /* FixMe: implement recovery
- * in nfs4_exchange_id */
- default:
- return;
- }
+ case -EKEYEXPIRED:
+ nfs4_warn_keyexpired(clp->cl_hostname);
+ case -NFS4ERR_NOT_SAME: /* FixMe: implement recovery
+ * in nfs4_exchange_id */
+ default:
+ return;
}
set_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
}
int status = 0;
/* Ensure exclusive access to NFSv4 state */
- for(;;) {
+ do {
if (test_and_clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state)) {
/* We're going to have to re-establish a clientid */
status = nfs4_reclaim_lease(clp);
break;
if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
break;
- }
+ } while (atomic_read(&clp->cl_count) > 1);
return;
out_error:
printk(KERN_WARNING "Error: state manager failed on NFSv4 server %s"
static void encode_readdir(struct xdr_stream *xdr, const struct nfs4_readdir_arg *readdir, struct rpc_rqst *req, struct compound_hdr *hdr)
{
- uint32_t attrs[2] = {0, 0};
+ uint32_t attrs[2] = {
+ FATTR4_WORD0_RDATTR_ERROR,
+ FATTR4_WORD1_MOUNTED_ON_FILEID,
+ };
uint32_t dircount = readdir->count >> 1;
__be32 *p;
if (readdir->plus) {
attrs[0] |= FATTR4_WORD0_TYPE|FATTR4_WORD0_CHANGE|FATTR4_WORD0_SIZE|
- FATTR4_WORD0_FSID|FATTR4_WORD0_FILEHANDLE;
+ FATTR4_WORD0_FSID|FATTR4_WORD0_FILEHANDLE|FATTR4_WORD0_FILEID;
attrs[1] |= FATTR4_WORD1_MODE|FATTR4_WORD1_NUMLINKS|FATTR4_WORD1_OWNER|
FATTR4_WORD1_OWNER_GROUP|FATTR4_WORD1_RAWDEV|
FATTR4_WORD1_SPACE_USED|FATTR4_WORD1_TIME_ACCESS|
FATTR4_WORD1_TIME_METADATA|FATTR4_WORD1_TIME_MODIFY;
dircount >>= 1;
}
- attrs[0] |= FATTR4_WORD0_RDATTR_ERROR|FATTR4_WORD0_FILEID;
- attrs[1] |= FATTR4_WORD1_MOUNTED_ON_FILEID;
- /* Switch to mounted_on_fileid if the server supports it */
- if (readdir->bitmask[1] & FATTR4_WORD1_MOUNTED_ON_FILEID)
- attrs[0] &= ~FATTR4_WORD0_FILEID;
- else
- attrs[1] &= ~FATTR4_WORD1_MOUNTED_ON_FILEID;
+ /* Use mounted_on_fileid only if the server supports it */
+ if (!(readdir->bitmask[1] & FATTR4_WORD1_MOUNTED_ON_FILEID))
+ attrs[0] |= FATTR4_WORD0_FILEID;
p = reserve_space(xdr, 12+NFS4_VERIFIER_SIZE+20);
*p++ = cpu_to_be32(OP_READDIR);
goto out_overflow;
xdr_decode_hyper(p, fileid);
bitmap[1] &= ~FATTR4_WORD1_MOUNTED_ON_FILEID;
- ret = NFS_ATTR_FATTR_FILEID;
+ ret = NFS_ATTR_FATTR_MOUNTED_ON_FILEID;
}
dprintk("%s: fileid=%Lu\n", __func__, (unsigned long long)*fileid);
return ret;
{
int status;
umode_t fmode = 0;
- uint64_t fileid;
uint32_t type;
status = decode_attr_type(xdr, bitmap, &type);
goto xdr_error;
fattr->valid |= status;
- status = decode_attr_mounted_on_fileid(xdr, bitmap, &fileid);
+ status = decode_attr_mounted_on_fileid(xdr, bitmap, &fattr->mounted_on_fileid);
if (status < 0)
goto xdr_error;
- if (status != 0 && !(fattr->valid & status)) {
- fattr->fileid = fileid;
- fattr->valid |= status;
- }
+ fattr->valid |= status;
xdr_error:
dprintk("%s: xdr returned %d\n", __func__, -status);
struct nfs4_secinfo_flavor *sec_flavor;
int status;
__be32 *p;
- int i;
+ int i, num_flavors;
status = decode_op_hdr(xdr, OP_SECINFO);
+ if (status)
+ goto out;
p = xdr_inline_decode(xdr, 4);
if (unlikely(!p))
goto out_overflow;
- res->flavors->num_flavors = be32_to_cpup(p);
- for (i = 0; i < res->flavors->num_flavors; i++) {
+ res->flavors->num_flavors = 0;
+ num_flavors = be32_to_cpup(p);
+
+ for (i = 0; i < num_flavors; i++) {
sec_flavor = &res->flavors->flavors[i];
- if ((char *)&sec_flavor[1] - (char *)res > PAGE_SIZE)
+ if ((char *)&sec_flavor[1] - (char *)res->flavors > PAGE_SIZE)
break;
p = xdr_inline_decode(xdr, 4);
sec_flavor->flavor = be32_to_cpup(p);
if (sec_flavor->flavor == RPC_AUTH_GSS) {
- if (decode_secinfo_gss(xdr, sec_flavor))
- break;
+ status = decode_secinfo_gss(xdr, sec_flavor);
+ if (status)
+ goto out;
}
+ res->flavors->num_flavors++;
}
- return 0;
-
+out:
+ return status;
out_overflow:
print_overflow_msg(__func__, xdr);
return -EIO;
if (decode_getfattr_attrs(xdr, bitmap, entry->fattr, entry->fh,
entry->server, 1) < 0)
goto out_overflow;
- if (entry->fattr->valid & NFS_ATTR_FATTR_FILEID)
+ if (entry->fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID)
+ entry->ino = entry->fattr->mounted_on_fileid;
+ else if (entry->fattr->valid & NFS_ATTR_FATTR_FILEID)
entry->ino = entry->fattr->fileid;
entry->d_type = DT_UNKNOWN;
plh_layouts);
dprintk("%s freeing layout for inode %lu\n", __func__,
lo->plh_inode->i_ino);
+ list_del_init(&lo->plh_layouts);
pnfs_destroy_layout(NFS_I(lo->plh_inode));
}
}
static struct pnfs_layout_segment *
send_layoutget(struct pnfs_layout_hdr *lo,
struct nfs_open_context *ctx,
- u32 iomode)
+ u32 iomode,
+ gfp_t gfp_flags)
{
struct inode *ino = lo->plh_inode;
struct nfs_server *server = NFS_SERVER(ino);
dprintk("--> %s\n", __func__);
BUG_ON(ctx == NULL);
- lgp = kzalloc(sizeof(*lgp), GFP_KERNEL);
+ lgp = kzalloc(sizeof(*lgp), gfp_flags);
if (lgp == NULL)
return NULL;
max_resp_sz = server->nfs_client->cl_session->fc_attrs.max_resp_sz;
max_pages = max_resp_sz >> PAGE_SHIFT;
- pages = kzalloc(max_pages * sizeof(struct page *), GFP_KERNEL);
+ pages = kzalloc(max_pages * sizeof(struct page *), gfp_flags);
if (!pages)
goto out_err_free;
for (i = 0; i < max_pages; i++) {
- pages[i] = alloc_page(GFP_KERNEL);
+ pages[i] = alloc_page(gfp_flags);
if (!pages[i])
goto out_err_free;
}
lgp->args.layout.pages = pages;
lgp->args.layout.pglen = max_pages * PAGE_SIZE;
lgp->lsegpp = &lseg;
+ lgp->gfp_flags = gfp_flags;
/* Synchronously retrieve layout information from server and
* store in lseg.
}
static struct pnfs_layout_hdr *
-alloc_init_layout_hdr(struct inode *ino)
+alloc_init_layout_hdr(struct inode *ino, gfp_t gfp_flags)
{
struct pnfs_layout_hdr *lo;
- lo = kzalloc(sizeof(struct pnfs_layout_hdr), GFP_KERNEL);
+ lo = kzalloc(sizeof(struct pnfs_layout_hdr), gfp_flags);
if (!lo)
return NULL;
atomic_set(&lo->plh_refcount, 1);
}
static struct pnfs_layout_hdr *
-pnfs_find_alloc_layout(struct inode *ino)
+pnfs_find_alloc_layout(struct inode *ino, gfp_t gfp_flags)
{
struct nfs_inode *nfsi = NFS_I(ino);
struct pnfs_layout_hdr *new = NULL;
return nfsi->layout;
}
spin_unlock(&ino->i_lock);
- new = alloc_init_layout_hdr(ino);
+ new = alloc_init_layout_hdr(ino, gfp_flags);
spin_lock(&ino->i_lock);
if (likely(nfsi->layout == NULL)) /* Won the race? */
struct pnfs_layout_segment *
pnfs_update_layout(struct inode *ino,
struct nfs_open_context *ctx,
- enum pnfs_iomode iomode)
+ enum pnfs_iomode iomode,
+ gfp_t gfp_flags)
{
struct nfs_inode *nfsi = NFS_I(ino);
struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;
if (!pnfs_enabled_sb(NFS_SERVER(ino)))
return NULL;
spin_lock(&ino->i_lock);
- lo = pnfs_find_alloc_layout(ino);
+ lo = pnfs_find_alloc_layout(ino, gfp_flags);
if (lo == NULL) {
dprintk("%s ERROR: can't get pnfs_layout_hdr\n", __func__);
goto out_unlock;
spin_unlock(&clp->cl_lock);
}
- lseg = send_layoutget(lo, ctx, iomode);
+ lseg = send_layoutget(lo, ctx, iomode, gfp_flags);
if (!lseg && first) {
spin_lock(&clp->cl_lock);
list_del_init(&lo->plh_layouts);
goto out;
}
/* Inject layout blob into I/O device driver */
- lseg = NFS_SERVER(ino)->pnfs_curr_ld->alloc_lseg(lo, res);
+ lseg = NFS_SERVER(ino)->pnfs_curr_ld->alloc_lseg(lo, res, lgp->gfp_flags);
if (!lseg || IS_ERR(lseg)) {
if (!lseg)
status = -ENOMEM;
/* This is first coelesce call for a series of nfs_pages */
pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
prev->wb_context,
- IOMODE_READ);
+ IOMODE_READ,
+ GFP_KERNEL);
}
return NFS_SERVER(pgio->pg_inode)->pnfs_curr_ld->pg_test(pgio, prev, req);
}
/* This is first coelesce call for a series of nfs_pages */
pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
prev->wb_context,
- IOMODE_RW);
+ IOMODE_RW,
+ GFP_NOFS);
}
return NFS_SERVER(pgio->pg_inode)->pnfs_curr_ld->pg_test(pgio, prev, req);
}
{
struct nfs_inode *nfsi = NFS_I(wdata->inode);
loff_t end_pos = wdata->args.offset + wdata->res.count;
+ bool mark_as_dirty = false;
spin_lock(&nfsi->vfs_inode.i_lock);
if (!test_and_set_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
get_lseg(wdata->lseg);
wdata->lseg->pls_lc_cred =
get_rpccred(wdata->args.context->state->owner->so_cred);
- mark_inode_dirty_sync(wdata->inode);
+ mark_as_dirty = true;
dprintk("%s: Set layoutcommit for inode %lu ",
__func__, wdata->inode->i_ino);
}
if (end_pos > wdata->lseg->pls_end_pos)
wdata->lseg->pls_end_pos = end_pos;
spin_unlock(&nfsi->vfs_inode.i_lock);
+
+ /* if pnfs_layoutcommit_inode() runs between inode locks, the next one
+ * will be a noop because NFS_INO_LAYOUTCOMMIT will not be set */
+ if (mark_as_dirty)
+ mark_inode_dirty_sync(wdata->inode);
}
EXPORT_SYMBOL_GPL(pnfs_set_layoutcommit);
const u32 id;
const char *name;
struct module *owner;
- struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr);
+ struct pnfs_layout_segment * (*alloc_lseg) (struct pnfs_layout_hdr *layoutid, struct nfs4_layoutget_res *lgr, gfp_t gfp_flags);
void (*free_lseg) (struct pnfs_layout_segment *lseg);
/* test for nfs page cache coalescing */
void put_lseg(struct pnfs_layout_segment *lseg);
struct pnfs_layout_segment *
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
- enum pnfs_iomode access_type);
+ enum pnfs_iomode access_type, gfp_t gfp_flags);
void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
void unset_pnfs_layoutdriver(struct nfs_server *);
enum pnfs_try_status pnfs_try_to_write_data(struct nfs_write_data *,
static inline struct pnfs_layout_segment *
pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
- enum pnfs_iomode access_type)
+ enum pnfs_iomode access_type, gfp_t gfp_flags)
{
return NULL;
}
atomic_set(&req->wb_complete, requests);
BUG_ON(desc->pg_lseg != NULL);
- lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ);
+ lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ, GFP_KERNEL);
ClearPageError(page);
offset = 0;
nbytes = desc->pg_count;
}
req = nfs_list_entry(data->pages.next);
if ((!lseg) && list_is_singular(&data->pages))
- lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ);
+ lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_READ, GFP_KERNEL);
ret = nfs_read_rpcsetup(req, data, &nfs_read_full_ops, desc->pg_count,
0, lseg);
return 0;
}
+ mnt->flags |= NFS_MOUNT_SECFLAVOUR;
mnt->auth_flavor_len = 1;
return 1;
}
if (error < 0)
goto out;
+ /*
+ * noac is a special case. It implies -o sync, but that's not
+ * necessarily reflected in the mtab options. do_remount_sb
+ * will clear MS_SYNCHRONOUS if -o sync wasn't specified in the
+ * remount options, so we have to explicitly reset it.
+ */
+ if (data->flags & NFS_MOUNT_NOAC)
+ *flags |= MS_SYNCHRONOUS;
+
/* compare new mount options with old ones */
error = nfs_compare_remount_data(nfss, data);
out:
if (!s->s_root) {
/* initial superblock/root creation */
nfs_fill_super(s, data);
- nfs_fscache_get_super_cookie(
- s, data ? data->fscache_uniq : NULL, NULL);
+ nfs_fscache_get_super_cookie(s, data->fscache_uniq, NULL);
}
mntroot = nfs_get_root(s, mntfh, dev_name);
req = nfs_setup_write_request(ctx, page, offset, count);
if (IS_ERR(req))
return PTR_ERR(req);
- nfs_mark_request_dirty(req);
/* Update file length */
nfs_grow_file(page, offset, count);
nfs_mark_uptodate(page, req->wb_pgbase, req->wb_bytes);
atomic_set(&req->wb_complete, requests);
BUG_ON(desc->pg_lseg);
- lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW);
+ lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW, GFP_NOFS);
ClearPageError(page);
offset = 0;
nbytes = desc->pg_count;
}
req = nfs_list_entry(data->pages.next);
if ((!lseg) && list_is_singular(&data->pages))
- lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW);
+ lseg = pnfs_update_layout(desc->pg_inode, req->wb_context, IOMODE_RW, GFP_NOFS);
if ((desc->pg_ioflags & FLUSH_COND_STABLE) &&
(desc->pg_moreio || NFS_I(desc->pg_inode)->ncommit))
task->tk_pid, task->tk_status);
/* Call the NFS version-specific code */
- if (NFS_PROTO(data->inode)->commit_done(task, data) != 0)
- return;
+ NFS_PROTO(data->inode)->commit_done(task, data);
}
void nfs_commit_release_pages(struct nfs_write_data *data)
unsigned long group, group_offset;
int i, j, n, ret;
- for (i = 0; i < nitems; i += n) {
+ for (i = 0; i < nitems; i = j) {
group = nilfs_palloc_group(inode, entry_nrs[i], &group_offset);
ret = nilfs_palloc_get_desc_block(inode, group, 0, &desc_bh);
if (ret < 0)
/* We want to make sure that nobody is heartbeating on top of us --
* this will help detect an invalid configuration. */
-static int o2hb_check_last_timestamp(struct o2hb_region *reg)
+static void o2hb_check_last_timestamp(struct o2hb_region *reg)
{
- int node_num, ret;
struct o2hb_disk_slot *slot;
struct o2hb_disk_heartbeat_block *hb_block;
+ char *errstr;
- node_num = o2nm_this_node();
-
- ret = 1;
- slot = ®->hr_slots[node_num];
+ slot = ®->hr_slots[o2nm_this_node()];
/* Don't check on our 1st timestamp */
- if (slot->ds_last_time) {
- hb_block = slot->ds_raw_block;
+ if (!slot->ds_last_time)
+ return;
- if (le64_to_cpu(hb_block->hb_seq) != slot->ds_last_time)
- ret = 0;
- }
+ hb_block = slot->ds_raw_block;
+ if (le64_to_cpu(hb_block->hb_seq) == slot->ds_last_time &&
+ le64_to_cpu(hb_block->hb_generation) == slot->ds_last_generation &&
+ hb_block->hb_node == slot->ds_node_num)
+ return;
- return ret;
+#define ERRSTR1 "Another node is heartbeating on device"
+#define ERRSTR2 "Heartbeat generation mismatch on device"
+#define ERRSTR3 "Heartbeat sequence mismatch on device"
+
+ if (hb_block->hb_node != slot->ds_node_num)
+ errstr = ERRSTR1;
+ else if (le64_to_cpu(hb_block->hb_generation) !=
+ slot->ds_last_generation)
+ errstr = ERRSTR2;
+ else
+ errstr = ERRSTR3;
+
+ mlog(ML_ERROR, "%s (%s): expected(%u:0x%llx, 0x%llx), "
+ "ondisk(%u:0x%llx, 0x%llx)\n", errstr, reg->hr_dev_name,
+ slot->ds_node_num, (unsigned long long)slot->ds_last_generation,
+ (unsigned long long)slot->ds_last_time, hb_block->hb_node,
+ (unsigned long long)le64_to_cpu(hb_block->hb_generation),
+ (unsigned long long)le64_to_cpu(hb_block->hb_seq));
}
static inline void o2hb_prepare_block(struct o2hb_region *reg,
/* With an up to date view of the slots, we can check that no
* other node has been improperly configured to heartbeat in
* our slot. */
- if (!o2hb_check_last_timestamp(reg))
- mlog(ML_ERROR, "Device \"%s\": another node is heartbeating "
- "in our slot!\n", reg->hr_dev_name);
+ o2hb_check_last_timestamp(reg);
/* fill in the proper info for our next heartbeat */
o2hb_prepare_block(reg, reg->hr_generation);
}
i = -1;
- while((i = find_next_bit(configured_nodes, O2NM_MAX_NODES, i + 1)) < O2NM_MAX_NODES) {
-
+ while((i = find_next_bit(configured_nodes,
+ O2NM_MAX_NODES, i + 1)) < O2NM_MAX_NODES) {
change |= o2hb_check_slot(reg, ®->hr_slots[i]);
}
struct file *filp = NULL;
struct inode *inode = NULL;
ssize_t ret = -EINVAL;
+ int live_threshold;
if (reg->hr_bdev)
goto out;
* A node is considered live after it has beat LIVE_THRESHOLD
* times. We're not steady until we've given them a chance
* _after_ our first read.
+ * The default threshold is bare minimum so as to limit the delay
+ * during mounts. For global heartbeat, the threshold doubled for the
+ * first region.
*/
- atomic_set(®->hr_steady_iterations, O2HB_LIVE_THRESHOLD + 1);
+ live_threshold = O2HB_LIVE_THRESHOLD;
+ if (o2hb_global_heartbeat_active()) {
+ spin_lock(&o2hb_live_lock);
+ if (o2hb_pop_count(&o2hb_region_bitmap, O2NM_MAX_REGIONS) == 1)
+ live_threshold <<= 1;
+ spin_unlock(&o2hb_live_lock);
+ }
+ atomic_set(®->hr_steady_iterations, live_threshold + 1);
hb_task = kthread_run(o2hb_thread, reg, "o2hb-%s",
reg->hr_item.ci_name);
bytes = blocks_wanted << sb->s_blocksize_bits;
struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
struct ocfs2_inode_info *oi = OCFS2_I(dir);
- struct ocfs2_alloc_context *data_ac;
+ struct ocfs2_alloc_context *data_ac = NULL;
struct ocfs2_alloc_context *meta_ac = NULL;
struct buffer_head *dirdata_bh = NULL;
struct buffer_head *dx_root_bh = NULL;
spin_unlock(&dlm->spinlock);
/* Support for global heartbeat and node info was added in 1.1 */
- if (dlm_protocol.pv_major > 1 || dlm_protocol.pv_minor > 0) {
+ if (dlm->dlm_locking_proto.pv_major > 1 ||
+ dlm->dlm_locking_proto.pv_minor > 0) {
status = dlm_send_nodeinfo(dlm, ctxt->yes_resp_map);
if (status) {
mlog_errno(status);
res->state &= ~DLM_LOCK_RES_MIGRATING;
wake = 1;
spin_unlock(&res->spinlock);
+ if (dlm_is_host_down(ret))
+ dlm_wait_for_node_death(dlm, target,
+ DLM_NODE_DEATH_WAIT_MAX);
goto leave;
}
range = le32_to_cpu(rec->e_cpos) + ocfs2_rec_clusters(el, rec);
if (le32_to_cpu(rec->e_cpos) >= trunc_start) {
+ /*
+ * remove an entire extent record.
+ */
*trunc_cpos = le32_to_cpu(rec->e_cpos);
/*
* Skip holes if any.
*blkno = le64_to_cpu(rec->e_blkno);
*trunc_end = le32_to_cpu(rec->e_cpos);
} else if (range > trunc_start) {
+ /*
+ * remove a partial extent record, which means we're
+ * removing the last extent record.
+ */
*trunc_cpos = trunc_start;
+ /*
+ * skip hole if any.
+ */
+ if (range < *trunc_end)
+ *trunc_end = range;
*trunc_len = *trunc_end - trunc_start;
coff = trunc_start - le32_to_cpu(rec->e_cpos);
*blkno = le64_to_cpu(rec->e_blkno) +
{
struct ocfs2_journal *journal = osb->journal;
+ if (ocfs2_is_hard_readonly(osb))
+ return;
+
/* No need to queue up our truncate_log as regular cleanup will catch
* that */
ocfs2_queue_recovery_completion(journal, osb->slot_num,
__le16 xe_name_offset; /* byte offset from the 1st entry in the
local xattr storage(inode, xattr block or
xattr bucket). */
- __u8 xe_name_len; /* xattr name len, does't include prefix. */
+ __u8 xe_name_len; /* xattr name len, doesn't include prefix. */
__u8 xe_type; /* the low 7 bits indicate the name prefix
* type and the highest bit indicates whether
* the EA is stored in the local storage. */
goto fail;
}
+ /* Check that sizeof_partition_entry has the correct value */
+ if (le32_to_cpu((*gpt)->sizeof_partition_entry) != sizeof(gpt_entry)) {
+ pr_debug("GUID Partitition Entry Size check failed.\n");
+ goto fail;
+ }
+
if (!(*ptes = alloc_read_gpt_entries(state, *gpt)))
goto fail;
int flags = vma->vm_flags;
unsigned long ino = 0;
unsigned long long pgoff = 0;
- unsigned long start;
+ unsigned long start, end;
dev_t dev = 0;
int len;
/* We don't show the stack guard page in /proc/maps */
start = vma->vm_start;
- if (vma->vm_flags & VM_GROWSDOWN)
- if (!vma_stack_continue(vma->vm_prev, vma->vm_start))
- start += PAGE_SIZE;
+ if (stack_guard_page_start(vma, start))
+ start += PAGE_SIZE;
+ end = vma->vm_end;
+ if (stack_guard_page_end(vma, end))
+ end -= PAGE_SIZE;
seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
start,
- vma->vm_end,
+ end,
flags & VM_READ ? 'r' : '-',
flags & VM_WRITE ? 'w' : '-',
flags & VM_EXEC ? 'x' : '-',
spin_unlock(&c->buds_lock);
}
-/**
- * ubifs_create_buds_lists - create journal head buds lists for remount rw.
- * @c: UBIFS file-system description object
- */
-void ubifs_create_buds_lists(struct ubifs_info *c)
-{
- struct rb_node *p;
-
- spin_lock(&c->buds_lock);
- p = rb_first(&c->buds);
- while (p) {
- struct ubifs_bud *bud = rb_entry(p, struct ubifs_bud, rb);
- struct ubifs_jhead *jhead = &c->jheads[bud->jhead];
-
- list_add_tail(&bud->list, &jhead->buds_list);
- p = rb_next(p);
- }
- spin_unlock(&c->buds_lock);
-}
-
/**
* ubifs_add_bud_to_log - add a new bud to the log.
* @c: UBIFS file-system description object
goto out_free;
}
memcpy(c->rcvrd_mst_node, c->mst_node, UBIFS_MST_NODE_SZ);
+
+ /*
+ * We had to recover the master node, which means there was an
+ * unclean reboot. However, it is possible that the master node
+ * is clean at this point, i.e., %UBIFS_MST_DIRTY is not set.
+ * E.g., consider the following chain of events:
+ *
+ * 1. UBIFS was cleanly unmounted, so the master node is clean
+ * 2. UBIFS is being mounted R/W and starts changing the master
+ * node in the first (%UBIFS_MST_LNUM). A power cut happens,
+ * so this LEB ends up with some amount of garbage at the
+ * end.
+ * 3. UBIFS is being mounted R/O. We reach this place and
+ * recover the master node from the second LEB
+ * (%UBIFS_MST_LNUM + 1). But we cannot update the media
+ * because we are being mounted R/O. We have to defer the
+ * operation.
+ * 4. However, this master node (@c->mst_node) is marked as
+ * clean (since the step 1). And if we just return, the
+ * mount code will be confused and won't recover the master
+ * node when it is re-mounter R/W later.
+ *
+ * Thus, to force the recovery by marking the master node as
+ * dirty.
+ */
+ c->mst_node->flags |= cpu_to_le32(UBIFS_MST_DIRTY);
} else {
/* Write the recovered master node */
c->max_sqnum = le64_to_cpu(mst->ch.sqnum) - 1;
* @new_size: truncation new size
* @free: amount of free space in a bud
* @dirty: amount of dirty space in a bud from padding and deletion nodes
+ * @jhead: journal head number of the bud
*
* UBIFS journal replay must compare node sequence numbers, which means it must
* build a tree of node information to insert into the TNC.
struct {
int free;
int dirty;
+ int jhead;
};
};
};
err = PTR_ERR(lp);
goto out;
}
+
+ /* Make sure the journal head points to the latest bud */
+ err = ubifs_wbuf_seek_nolock(&c->jheads[r->jhead].wbuf, r->lnum,
+ c->leb_size - r->free, UBI_SHORTTERM);
+
out:
ubifs_release_lprops(c);
return err;
ubifs_assert(sleb->endpt - offs >= used);
ubifs_assert(sleb->endpt % c->min_io_size == 0);
- if (sleb->endpt + c->min_io_size <= c->leb_size && !c->ro_mount)
- err = ubifs_wbuf_seek_nolock(&c->jheads[jhead].wbuf, lnum,
- sleb->endpt, UBI_SHORTTERM);
-
*dirty = sleb->endpt - offs - used;
*free = c->leb_size - sleb->endpt;
* @sqnum: sequence number
* @free: amount of free space in bud
* @dirty: amount of dirty space from padding and deletion nodes
+ * @jhead: journal head number for the bud
*
* This function inserts a reference node to the replay tree and returns zero
* in case of success or a negative error code in case of failure.
*/
static int insert_ref_node(struct ubifs_info *c, int lnum, int offs,
- unsigned long long sqnum, int free, int dirty)
+ unsigned long long sqnum, int free, int dirty,
+ int jhead)
{
struct rb_node **p = &c->replay_tree.rb_node, *parent = NULL;
struct replay_entry *r;
r->flags = REPLAY_REF;
r->free = free;
r->dirty = dirty;
+ r->jhead = jhead;
rb_link_node(&r->rb, parent, p);
rb_insert_color(&r->rb, &c->replay_tree);
if (err)
return err;
err = insert_ref_node(c, b->bud->lnum, b->bud->start, b->sqnum,
- free, dirty);
+ free, dirty, b->bud->jhead);
if (err)
return err;
}
goto out_free;
}
+ err = alloc_wbufs(c);
+ if (err)
+ goto out_cbuf;
+
sprintf(c->bgt_name, BGT_NAME_PATTERN, c->vi.ubi_num, c->vi.vol_id);
if (!c->ro_mount) {
- err = alloc_wbufs(c);
- if (err)
- goto out_cbuf;
-
/* Create background thread */
c->bgt = kthread_create(ubifs_bg_thread, c, "%s", c->bgt_name);
if (IS_ERR(c->bgt)) {
if (err)
goto out;
- err = alloc_wbufs(c);
- if (err)
- goto out;
-
- ubifs_create_buds_lists(c);
-
/* Create background thread */
c->bgt = kthread_create(ubifs_bg_thread, c, "%s", c->bgt_name);
if (IS_ERR(c->bgt)) {
if (err)
goto out;
+ dbg_gen("re-mounted read-write");
+ c->remounting_rw = 0;
+
if (c->need_recovery) {
c->need_recovery = 0;
ubifs_msg("deferred recovery completed");
+ } else {
+ /*
+ * Do not run the debugging space check if the were doing
+ * recovery, because when we saved the information we had the
+ * file-system in a state where the TNC and lprops has been
+ * modified in memory, but all the I/O operations (including a
+ * commit) were deferred. So the file-system was in
+ * "non-committed" state. Now the file-system is in committed
+ * state, and of course the amount of free space will change
+ * because, for example, the old index size was imprecise.
+ */
+ err = dbg_check_space_info(c);
}
-
- dbg_gen("re-mounted read-write");
- c->remounting_rw = 0;
- err = dbg_check_space_info(c);
mutex_unlock(&c->umount_mutex);
return err;
if (err)
ubifs_ro_mode(c, err);
- free_wbufs(c);
vfree(c->orph_buf);
c->orph_buf = NULL;
kfree(c->write_reserve_buf);
* of the media. For example, there will be dirty inodes if we failed
* to write them back because of I/O errors.
*/
- ubifs_assert(atomic_long_read(&c->dirty_pg_cnt) == 0);
- ubifs_assert(c->budg_idx_growth == 0);
- ubifs_assert(c->budg_dd_growth == 0);
- ubifs_assert(c->budg_data_growth == 0);
+ if (!c->ro_error) {
+ ubifs_assert(atomic_long_read(&c->dirty_pg_cnt) == 0);
+ ubifs_assert(c->budg_idx_growth == 0);
+ ubifs_assert(c->budg_dd_growth == 0);
+ ubifs_assert(c->budg_data_growth == 0);
+ }
/*
* The 'c->umount_lock' prevents races between UBIFS memory shrinker
XFS_LOOKUP_BATCH,
XFS_ICI_RECLAIM_TAG);
if (!nr_found) {
+ done = 1;
rcu_read_unlock();
break;
}
*/
STATIC void
xfs_ail_worker(
- struct work_struct *work)
+ struct work_struct *work)
{
- struct xfs_ail *ailp = container_of(to_delayed_work(work),
+ struct xfs_ail *ailp = container_of(to_delayed_work(work),
struct xfs_ail, xa_work);
- long tout;
- xfs_lsn_t target = ailp->xa_target;
- xfs_lsn_t lsn;
- xfs_log_item_t *lip;
- int flush_log, count, stuck;
- xfs_mount_t *mp = ailp->xa_mount;
+ xfs_mount_t *mp = ailp->xa_mount;
struct xfs_ail_cursor *cur = &ailp->xa_cursors;
- int push_xfsbufd = 0;
+ xfs_log_item_t *lip;
+ xfs_lsn_t lsn;
+ xfs_lsn_t target;
+ long tout = 10;
+ int flush_log = 0;
+ int stuck = 0;
+ int count = 0;
+ int push_xfsbufd = 0;
spin_lock(&ailp->xa_lock);
+ target = ailp->xa_target;
xfs_trans_ail_cursor_init(ailp, cur);
lip = xfs_trans_ail_cursor_first(ailp, cur, ailp->xa_last_pushed_lsn);
if (!lip || XFS_FORCED_SHUTDOWN(mp)) {
*/
xfs_trans_ail_cursor_done(ailp, cur);
spin_unlock(&ailp->xa_lock);
- ailp->xa_last_pushed_lsn = 0;
- return;
+ goto out_done;
}
XFS_STATS_INC(xs_push_ail);
* lots of contention on the AIL lists.
*/
lsn = lip->li_lsn;
- flush_log = stuck = count = 0;
- while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) {
+ while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
int lock_result;
/*
* If we can lock the item without sleeping, unlock the AIL
}
/* assume we have more work to do in a short while */
- tout = 10;
+out_done:
if (!count) {
/* We're past our target or empty, so idle */
ailp->xa_last_pushed_lsn = 0;
/*
- * Check for an updated push target before clearing the
- * XFS_AIL_PUSHING_BIT. If the target changed, we've got more
- * work to do. Wait a bit longer before starting that work.
+ * We clear the XFS_AIL_PUSHING_BIT first before checking
+ * whether the target has changed. If the target has changed,
+ * this pushes the requeue race directly onto the result of the
+ * atomic test/set bit, so we are guaranteed that either the
+ * the pusher that changed the target or ourselves will requeue
+ * the work (but not both).
*/
+ clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
smp_rmb();
- if (ailp->xa_target == target) {
- clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
+ if (XFS_LSN_CMP(ailp->xa_target, target) == 0 ||
+ test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
return;
- }
+
tout = 50;
} else if (XFS_LSN_CMP(lsn, target) >= 0) {
/*
* the XFS_AIL_PUSHING_BIT.
*/
smp_wmb();
- ailp->xa_target = threshold_lsn;
+ xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);
}
STRUCT_ALIGN(); \
*(__tracepoints) \
/* implement dynamic printk debug */ \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__start___jump_table) = .; \
+ *(__jump_table) \
+ VMLINUX_SYMBOL(__stop___jump_table) = .; \
. = ALIGN(8); \
VMLINUX_SYMBOL(__start___verbose) = .; \
*(__verbose) \
\
BUG_TABLE \
\
- JUMP_TABLE \
- \
/* PCI quirks */ \
.pci_fixup : AT(ADDR(.pci_fixup) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_pci_fixups_early) = .; \
/* Kernel symbol table: Normal symbols */ \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab) = .; \
- *(__ksymtab) \
+ *(SORT(___ksymtab+*)) \
VMLINUX_SYMBOL(__stop___ksymtab) = .; \
} \
\
/* Kernel symbol table: GPL-only symbols */ \
__ksymtab_gpl : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_gpl) = .; \
- *(__ksymtab_gpl) \
+ *(SORT(___ksymtab_gpl+*)) \
VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .; \
} \
\
/* Kernel symbol table: Normal unused symbols */ \
__ksymtab_unused : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_unused) = .; \
- *(__ksymtab_unused) \
+ *(SORT(___ksymtab_unused+*)) \
VMLINUX_SYMBOL(__stop___ksymtab_unused) = .; \
} \
\
/* Kernel symbol table: GPL-only unused symbols */ \
__ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .; \
- *(__ksymtab_unused_gpl) \
+ *(SORT(___ksymtab_unused_gpl+*)) \
VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .; \
} \
\
/* Kernel symbol table: GPL-future-only symbols */ \
__ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .; \
- *(__ksymtab_gpl_future) \
+ *(SORT(___ksymtab_gpl_future+*)) \
VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .; \
} \
\
/* Kernel symbol table: Normal symbols */ \
__kcrctab : AT(ADDR(__kcrctab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab) = .; \
- *(__kcrctab) \
+ *(SORT(___kcrctab+*)) \
VMLINUX_SYMBOL(__stop___kcrctab) = .; \
} \
\
/* Kernel symbol table: GPL-only symbols */ \
__kcrctab_gpl : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_gpl) = .; \
- *(__kcrctab_gpl) \
+ *(SORT(___kcrctab_gpl+*)) \
VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .; \
} \
\
/* Kernel symbol table: Normal unused symbols */ \
__kcrctab_unused : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_unused) = .; \
- *(__kcrctab_unused) \
+ *(SORT(___kcrctab_unused+*)) \
VMLINUX_SYMBOL(__stop___kcrctab_unused) = .; \
} \
\
/* Kernel symbol table: GPL-only unused symbols */ \
__kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .; \
- *(__kcrctab_unused_gpl) \
+ *(SORT(___kcrctab_unused_gpl+*)) \
VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .; \
} \
\
/* Kernel symbol table: GPL-future-only symbols */ \
__kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .; \
- *(__kcrctab_gpl_future) \
+ *(SORT(___kcrctab_gpl_future+*)) \
VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .; \
} \
\
#define BUG_TABLE
#endif
-#define JUMP_TABLE \
- . = ALIGN(8); \
- __jump_table : AT(ADDR(__jump_table) - LOAD_OFFSET) { \
- VMLINUX_SYMBOL(__start___jump_table) = .; \
- *(__jump_table) \
- VMLINUX_SYMBOL(__stop___jump_table) = .; \
- }
-
#ifdef CONFIG_PM_TRACE
#define TRACEDATA \
. = ALIGN(4); \
unsigned transp,
struct fb_info *info);
+bool drm_fb_helper_restore_fbdev_mode(struct drm_fb_helper *fb_helper);
void drm_fb_helper_restore(void);
void drm_fb_helper_fill_var(struct fb_info *info, struct drm_fb_helper *fb_helper,
uint32_t fb_width, uint32_t fb_height);
int drm_fb_helper_setcmap(struct fb_cmap *cmap, struct fb_info *info);
-bool drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper);
+int drm_fb_helper_hotplug_event(struct drm_fb_helper *fb_helper);
bool drm_fb_helper_initial_config(struct drm_fb_helper *fb_helper, int bpp_sel);
int drm_fb_helper_single_add_all_connectors(struct drm_fb_helper *fb_helper);
int drm_fb_helper_debug_enter(struct fb_info *info);
}
#define drm_mm_for_each_node(entry, mm) list_for_each_entry(entry, \
&(mm)->head_node.node_list, \
- node_list);
+ node_list)
#define drm_mm_for_each_scanned_node_reverse(entry, n, mm) \
for (entry = (mm)->prev_scanned_node, \
next = entry ? list_entry(entry->node_list.next, \
{0x1002, 0x6719, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CAYMAN|RADEON_NEW_MEMMAP}, \
{0x1002, 0x671c, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CAYMAN|RADEON_NEW_MEMMAP}, \
{0x1002, 0x671d, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CAYMAN|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x671f, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CAYMAN|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6720, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6721, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6722, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6729, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6738, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6739, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x673e, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BARTS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6740, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TURKS|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6741, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TURKS|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6742, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_TURKS|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x688D, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYPRESS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6898, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYPRESS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x6899, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYPRESS|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x689b, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYPRESS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x689c, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_HEMLOCK|RADEON_NEW_MEMMAP}, \
{0x1002, 0x689d, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_HEMLOCK|RADEON_NEW_MEMMAP}, \
{0x1002, 0x689e, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYPRESS|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68b0, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68b8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68b9, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x68ba, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68be, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x68bf, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_JUNIPER|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68c0, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_REDWOOD|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68c1, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_REDWOOD|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x68c7, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_REDWOOD|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
#define RADEON_INFO_WANT_CMASK 0x08 /* get access to CMASK on r300 */
#define RADEON_INFO_CLOCK_CRYSTAL_FREQ 0x09 /* clock crystal frequency */
#define RADEON_INFO_NUM_BACKENDS 0x0a /* DB/backends for r600+ - need for OQ */
+#define RADEON_INFO_NUM_TILE_PIPES 0x0b /* tile pipes for r600+ */
+#define RADEON_INFO_FUSION_GART_WORKING 0x0c /* fusion writes to GTT were broken before this */
struct drm_radeon_info {
uint32_t request;
--- /dev/null
+#ifndef _LINUX_ALARMTIMER_H
+#define _LINUX_ALARMTIMER_H
+
+#include <linux/time.h>
+#include <linux/hrtimer.h>
+#include <linux/timerqueue.h>
+#include <linux/rtc.h>
+
+enum alarmtimer_type {
+ ALARM_REALTIME,
+ ALARM_BOOTTIME,
+
+ ALARM_NUMTYPE,
+};
+
+/**
+ * struct alarm - Alarm timer structure
+ * @node: timerqueue node for adding to the event list this value
+ * also includes the expiration time.
+ * @period: Period for recuring alarms
+ * @function: Function pointer to be executed when the timer fires.
+ * @type: Alarm type (BOOTTIME/REALTIME)
+ * @enabled: Flag that represents if the alarm is set to fire or not
+ * @data: Internal data value.
+ */
+struct alarm {
+ struct timerqueue_node node;
+ ktime_t period;
+ void (*function)(struct alarm *);
+ enum alarmtimer_type type;
+ bool enabled;
+ void *data;
+};
+
+void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
+ void (*function)(struct alarm *));
+void alarm_start(struct alarm *alarm, ktime_t start, ktime_t period);
+void alarm_cancel(struct alarm *alarm);
+
+#endif
preempt_disable();
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
- while (test_bit(bitnum, addr)) {
- preempt_enable();
+ preempt_enable();
+ do {
cpu_relax();
- preempt_disable();
- }
+ } while (test_bit(bitnum, addr));
+ preempt_disable();
}
#endif
__acquire(bitlock);
__alloc_bootmem_nopanic(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
#define alloc_bootmem_node(pgdat, x) \
__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_node_nopanic(pgdat, x) \
+ __alloc_bootmem_node_nopanic(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
#define alloc_bootmem_pages_node(pgdat, x) \
__alloc_bootmem_node(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
#define alloc_bootmem_pages_node_nopanic(pgdat, x) \
--- /dev/null
+#ifndef _LINUX_BSEARCH_H
+#define _LINUX_BSEARCH_H
+
+#include <linux/types.h>
+
+void *bsearch(const void *key, const void *base, size_t num, size_t size,
+ int (*cmp)(const void *key, const void *elt));
+
+#endif /* _LINUX_BSEARCH_H */
#define CAP_SYSLOG 34
-#define CAP_LAST_CAP CAP_SYSLOG
+/* Allow triggering something that will wake the system */
+
+#define CAP_WAKE_ALARM 35
+
+
+#define CAP_LAST_CAP CAP_WAKE_ALARM
#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
extern bool capable(int cap);
extern bool ns_capable(struct user_namespace *ns, int cap);
extern bool task_ns_capable(struct task_struct *t, int cap);
-
-/**
- * nsown_capable - Check superior capability to one's own user_ns
- * @cap: The capability in question
- *
- * Return true if the current task has the given superior capability
- * targeted at its own user namespace.
- */
-static inline bool nsown_capable(int cap)
-{
- return ns_capable(current_user_ns(), cap);
-}
+extern bool nsown_capable(int cap);
/* audit system wants to get cap info from files as well */
extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
/**
* struct clock_event_device - clock event device descriptor
- * @name: ptr to clock event name
- * @features: features
+ * @event_handler: Assigned by the framework to be called by the low
+ * level handler of the event source
+ * @set_next_event: set next event function
+ * @next_event: local storage for the next event in oneshot mode
* @max_delta_ns: maximum delta value in ns
* @min_delta_ns: minimum delta value in ns
* @mult: nanosecond to cycles multiplier
* @shift: nanoseconds to cycles divisor (power of two)
+ * @mode: operating mode assigned by the management code
+ * @features: features
+ * @retries: number of forced programming retries
+ * @set_mode: set mode function
+ * @broadcast: function to broadcast events
+ * @min_delta_ticks: minimum delta value in ticks stored for reconfiguration
+ * @max_delta_ticks: maximum delta value in ticks stored for reconfiguration
+ * @name: ptr to clock event name
* @rating: variable to rate clock event devices
* @irq: IRQ number (only for non CPU local devices)
* @cpumask: cpumask to indicate for which CPUs this device works
- * @set_next_event: set next event function
- * @set_mode: set mode function
- * @event_handler: Assigned by the framework to be called by the low
- * level handler of the event source
- * @broadcast: function to broadcast events
* @list: list head for the management code
- * @mode: operating mode assigned by the management code
- * @next_event: local storage for the next event in oneshot mode
- * @retries: number of forced programming retries
*/
struct clock_event_device {
- const char *name;
- unsigned int features;
+ void (*event_handler)(struct clock_event_device *);
+ int (*set_next_event)(unsigned long evt,
+ struct clock_event_device *);
+ ktime_t next_event;
u64 max_delta_ns;
u64 min_delta_ns;
u32 mult;
u32 shift;
+ enum clock_event_mode mode;
+ unsigned int features;
+ unsigned long retries;
+
+ void (*broadcast)(const struct cpumask *mask);
+ void (*set_mode)(enum clock_event_mode mode,
+ struct clock_event_device *);
+ unsigned long min_delta_ticks;
+ unsigned long max_delta_ticks;
+
+ const char *name;
int rating;
int irq;
const struct cpumask *cpumask;
- int (*set_next_event)(unsigned long evt,
- struct clock_event_device *);
- void (*set_mode)(enum clock_event_mode mode,
- struct clock_event_device *);
- void (*event_handler)(struct clock_event_device *);
- void (*broadcast)(const struct cpumask *mask);
struct list_head list;
- enum clock_event_mode mode;
- ktime_t next_event;
- unsigned long retries;
-};
+} ____cacheline_aligned;
/*
* Calculate a multiplication factor for scaled math, which is used to convert
struct clock_event_device *evt);
extern void clockevents_register_device(struct clock_event_device *dev);
+extern void clockevents_config_and_register(struct clock_event_device *dev,
+ u32 freq, unsigned long min_delta,
+ unsigned long max_delta);
+
+extern int clockevents_update_freq(struct clock_event_device *ce, u32 freq);
+
extern void clockevents_exchange_device(struct clock_event_device *old,
struct clock_event_device *new);
extern void clockevents_set_mode(struct clock_event_device *dev,
*/
struct clocksource {
/*
- * First part of structure is read mostly
+ * Hotpath data, fits in a single cache line when the
+ * clocksource itself is cacheline aligned.
*/
- char *name;
- struct list_head list;
- int rating;
cycle_t (*read)(struct clocksource *cs);
- int (*enable)(struct clocksource *cs);
- void (*disable)(struct clocksource *cs);
+ cycle_t cycle_last;
cycle_t mask;
u32 mult;
u32 shift;
u64 max_idle_ns;
- unsigned long flags;
- cycle_t (*vread)(void);
- void (*suspend)(struct clocksource *cs);
- void (*resume)(struct clocksource *cs);
+
#ifdef CONFIG_IA64
void *fsys_mmio; /* used by fsyscall asm code */
#define CLKSRC_FSYS_MMIO_SET(mmio, addr) ((mmio) = (addr))
#else
#define CLKSRC_FSYS_MMIO_SET(mmio, addr) do { } while (0)
#endif
-
- /*
- * Second part is written at each timer interrupt
- * Keep it in a different cache line to dirty no
- * more than one cache line.
- */
- cycle_t cycle_last ____cacheline_aligned_in_smp;
+ const char *name;
+ struct list_head list;
+ int rating;
+ cycle_t (*vread)(void);
+ int (*enable)(struct clocksource *cs);
+ void (*disable)(struct clocksource *cs);
+ unsigned long flags;
+ void (*suspend)(struct clocksource *cs);
+ void (*resume)(struct clocksource *cs);
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
/* Watchdog related data, used by the framework */
struct list_head wd_list;
cycle_t wd_last;
#endif
-};
+} ____cacheline_aligned;
/*
* Clock source flags bits::
extern void timekeeping_notify(struct clocksource *clock);
+extern int clocksource_i8253_init(void);
+
#endif /* _LINUX_CLOCKSOURCE_H */
*
* Copyright (C) 2001 Russell King
* (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
- *
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
#define CPUFREQ_POLICY_POWERSAVE (1)
#define CPUFREQ_POLICY_PERFORMANCE (2)
-/* Frequency values here are CPU kHz so that hardware which doesn't run
- * with some frequencies can complain without having to guess what per
- * cent / per mille means.
+/* Frequency values here are CPU kHz so that hardware which doesn't run
+ * with some frequencies can complain without having to guess what per
+ * cent / per mille means.
* Maximum transition latency is in nanoseconds - if it's unknown,
* CPUFREQ_ETERNAL shall be used.
*/
struct cpufreq_cpuinfo {
unsigned int max_freq;
unsigned int min_freq;
- unsigned int transition_latency; /* in 10^(-9) s = nanoseconds */
+
+ /* in 10^(-9) s = nanoseconds */
+ unsigned int transition_latency;
};
struct cpufreq_real_policy {
unsigned int min; /* in kHz */
unsigned int max; /* in kHz */
- unsigned int policy; /* see above */
+ unsigned int policy; /* see above */
struct cpufreq_governor *governor; /* see below */
};
unsigned int max; /* in kHz */
unsigned int cur; /* in kHz, only needed if cpufreq
* governors are used */
- unsigned int policy; /* see above */
+ unsigned int policy; /* see above */
struct cpufreq_governor *governor; /* see below */
struct work_struct update; /* if update_policy() needs to be
struct cpufreq_governor {
char name[CPUFREQ_NAME_LEN];
- int (*governor) (struct cpufreq_policy *policy,
+ int (*governor) (struct cpufreq_policy *policy,
unsigned int event);
ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
char *buf);
- int (*store_setspeed) (struct cpufreq_policy *policy,
+ int (*store_setspeed) (struct cpufreq_policy *policy,
unsigned int freq);
unsigned int max_transition_latency; /* HW must be able to switch to
next freq faster than this value in nano secs or we
struct module *owner;
};
-/* pass a target to the cpufreq driver
+/*
+ * Pass a target to the cpufreq driver.
*/
extern int cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
/* flags */
-#define CPUFREQ_STICKY 0x01 /* the driver isn't removed even if
+#define CPUFREQ_STICKY 0x01 /* the driver isn't removed even if
* all ->init() calls failed */
-#define CPUFREQ_CONST_LOOPS 0x02 /* loops_per_jiffy or other kernel
+#define CPUFREQ_CONST_LOOPS 0x02 /* loops_per_jiffy or other kernel
* "constants" aren't affected by
* frequency transitions */
#define CPUFREQ_PM_NO_WARN 0x04 /* don't warn on suspend/resume speed
void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state);
-static inline void cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned int min, unsigned int max)
+static inline void cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned int min, unsigned int max)
{
if (policy->min < min)
policy->min = min;
/* the following 3 funtions are for cpufreq core use only */
struct cpufreq_frequency_table *cpufreq_frequency_get_table(unsigned int cpu);
struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu);
-void cpufreq_cpu_put (struct cpufreq_policy *data);
+void cpufreq_cpu_put(struct cpufreq_policy *data);
/* the following are really really optional */
extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
-void cpufreq_frequency_table_get_attr(struct cpufreq_frequency_table *table,
+void cpufreq_frequency_table_get_attr(struct cpufreq_frequency_table *table,
unsigned int cpu);
void cpufreq_frequency_table_put_attr(unsigned int cpu);
-/*********************************************************************
- * UNIFIED DEBUG HELPERS *
- *********************************************************************/
-
-#define CPUFREQ_DEBUG_CORE 1
-#define CPUFREQ_DEBUG_DRIVER 2
-#define CPUFREQ_DEBUG_GOVERNOR 4
-
-#ifdef CONFIG_CPU_FREQ_DEBUG
-
-extern void cpufreq_debug_printk(unsigned int type, const char *prefix,
- const char *fmt, ...);
-
-#else
-
-#define cpufreq_debug_printk(msg...) do { } while(0)
-
-#endif /* CONFIG_CPU_FREQ_DEBUG */
-
#endif /* _LINUX_CPUFREQ_H */
void *security; /* subjective LSM security */
#endif
struct user_struct *user; /* real user ID subscription */
+ struct user_namespace *user_ns; /* cached user->user_ns */
struct group_info *group_info; /* supplementary groups for euid/fsgid */
struct rcu_head rcu; /* RCU deletion hook */
};
#define current_fsgid() (current_cred_xxx(fsgid))
#define current_cap() (current_cred_xxx(cap_effective))
#define current_user() (current_cred_xxx(user))
-#define _current_user_ns() (current_cred_xxx(user)->user_ns)
#define current_security() (current_cred_xxx(security))
-extern struct user_namespace *current_user_ns(void);
+#ifdef CONFIG_USER_NS
+#define current_user_ns() (current_cred_xxx(user_ns))
+#else
+extern struct user_namespace init_user_ns;
+#define current_user_ns() (&init_user_ns)
+#endif
+
#define current_uid_gid(_uid, _gid) \
do { \
* typically using d_splice_alias. */
#define DCACHE_REFERENCED 0x0008 /* Recently used, don't discard. */
-#define DCACHE_UNHASHED 0x0010
+#define DCACHE_RCUACCESS 0x0010 /* Entry has ever been RCU-visible */
#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020
/* Parent inode is watched by inotify */
static inline int d_unhashed(struct dentry *dentry)
{
- return (dentry->d_flags & DCACHE_UNHASHED);
+ return hlist_bl_unhashed(&dentry->d_hash);
}
static inline int d_unlinked(struct dentry *dentry)
struct dev_archdata archdata;
struct device_node *of_node; /* associated device tree node */
- const struct of_device_id *of_match; /* matching of_device_id from driver */
dev_t devt; /* dev_t, creates the sysfs "dev" */
/* drivers/base/power/shutdown.c */
extern void device_shutdown(void);
-#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
-/* drivers/base/sys.c */
-extern void sysdev_shutdown(void);
-#else
-static inline void sysdev_shutdown(void) { }
-#endif
-
/* debugging and troubleshooting/diagnostic helpers. */
extern const char *dev_driver_string(const struct device *dev);
#ifndef _DYNAMIC_DEBUG_H
#define _DYNAMIC_DEBUG_H
-#include <linux/jump_label.h>
-
/* dynamic_printk_enabled, and dynamic_printk_enabled2 are bitmasks in which
* bit n is set to 1 if any modname hashes into the bucket n, 0 otherwise. They
* use independent hash functions, to reduce the chance of false positives.
#define FBINFO_CAN_FORCE_OUTPUT 0x200000
struct fb_info {
+ atomic_t count;
int node;
int flags;
struct mutex lock; /* Lock for open/release/ioctl funcs */
struct flex_array *flex_array_alloc(int element_size, unsigned int total,
gfp_t flags);
int flex_array_prealloc(struct flex_array *fa, unsigned int start,
- unsigned int end, gfp_t flags);
+ unsigned int nr_elements, gfp_t flags);
void flex_array_free(struct flex_array *fa);
void flex_array_free_parts(struct flex_array *fa);
int flex_array_put(struct flex_array *fa, unsigned int element_nr, void *src,
#define FS_EXTENT_FL 0x00080000 /* Extents */
#define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
-#define FS_COW_FL 0x02000000 /* Cow file */
#define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
+struct ftrace_hash;
+
+enum {
+ FTRACE_OPS_FL_ENABLED = 1 << 0,
+ FTRACE_OPS_FL_GLOBAL = 1 << 1,
+ FTRACE_OPS_FL_DYNAMIC = 1 << 2,
+};
+
struct ftrace_ops {
- ftrace_func_t func;
- struct ftrace_ops *next;
+ ftrace_func_t func;
+ struct ftrace_ops *next;
+ unsigned long flags;
+#ifdef CONFIG_DYNAMIC_FTRACE
+ struct ftrace_hash *notrace_hash;
+ struct ftrace_hash *filter_hash;
+#endif
};
extern int function_trace_stop;
extern int ftrace_text_reserved(void *start, void *end);
enum {
- FTRACE_FL_FREE = (1 << 0),
- FTRACE_FL_FAILED = (1 << 1),
- FTRACE_FL_FILTER = (1 << 2),
- FTRACE_FL_ENABLED = (1 << 3),
- FTRACE_FL_NOTRACE = (1 << 4),
- FTRACE_FL_CONVERTED = (1 << 5),
+ FTRACE_FL_ENABLED = (1 << 30),
+ FTRACE_FL_FREE = (1 << 31),
};
+#define FTRACE_FL_MASK (0x3UL << 30)
+#define FTRACE_REF_MAX ((1 << 30) - 1)
+
struct dyn_ftrace {
union {
unsigned long ip; /* address of mcount call-site */
};
int ftrace_force_update(void);
-void ftrace_set_filter(unsigned char *buf, int len, int reset);
+void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+ int len, int reset);
+void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+ int len, int reset);
+void ftrace_set_global_filter(unsigned char *buf, int len, int reset);
+void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
int register_ftrace_command(struct ftrace_func_command *cmd);
int unregister_ftrace_command(struct ftrace_func_command *cmd);
unsigned char flags;
unsigned char preempt_count;
int pid;
+ int padding;
};
#define FTRACE_MAX_EVENT \
void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
void free_pages_exact(void *virt, size_t size);
+/* This is different from alloc_pages_exact_node !!! */
+void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
#define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
unsigned long end,
long adjust_next)
{
- if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+ if (!vma->anon_vma || vma->vm_ops)
return;
__vma_adjust_trans_huge(vma, start, end, adjust_next);
}
#define __exitused __used
#endif
-#define __exit __section(.exit.text) __exitused __cold
+#define __exit __section(.exit.text) __exitused __cold notrace
/* Used for HOTPLUG */
-#define __devinit __section(.devinit.text) __cold
+#define __devinit __section(.devinit.text) __cold notrace
#define __devinitdata __section(.devinit.data)
#define __devinitconst __section(.devinit.rodata)
-#define __devexit __section(.devexit.text) __exitused __cold
+#define __devexit __section(.devexit.text) __exitused __cold notrace
#define __devexitdata __section(.devexit.data)
#define __devexitconst __section(.devexit.rodata)
/* Used for HOTPLUG_CPU */
-#define __cpuinit __section(.cpuinit.text) __cold
+#define __cpuinit __section(.cpuinit.text) __cold notrace
#define __cpuinitdata __section(.cpuinit.data)
#define __cpuinitconst __section(.cpuinit.rodata)
-#define __cpuexit __section(.cpuexit.text) __exitused __cold
+#define __cpuexit __section(.cpuexit.text) __exitused __cold notrace
#define __cpuexitdata __section(.cpuexit.data)
#define __cpuexitconst __section(.cpuexit.rodata)
/* Used for MEMORY_HOTPLUG */
-#define __meminit __section(.meminit.text) __cold
+#define __meminit __section(.meminit.text) __cold notrace
#define __meminitdata __section(.meminit.data)
#define __meminitconst __section(.meminit.rodata)
-#define __memexit __section(.memexit.text) __exitused __cold
+#define __memexit __section(.memexit.text) __exitused __cold notrace
#define __memexitdata __section(.memexit.data)
#define __memexitconst __section(.memexit.rodata)
.stack = &init_thread_info, \
.usage = ATOMIC_INIT(2), \
.flags = PF_KTHREAD, \
- .lock_depth = -1, \
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
.normal_prio = MAX_PRIO-20, \
* Bits which can be modified via irq_set/clear/modify_status_flags()
* IRQ_LEVEL - Interrupt is level type. Will be also
* updated in the code when the above trigger
- * bits are modified via set_irq_type()
+ * bits are modified via irq_set_irq_type()
* IRQ_PER_CPU - Mark an interrupt PER_CPU. Will protect
* it from affinity setting
* IRQ_NOPROBE - Interrupt cannot be probed by autoprobing
* IRQ_NOREQUEST - Interrupt cannot be requested via
* request_irq()
+ * IRQ_NOTHREAD - Interrupt cannot be threaded
* IRQ_NOAUTOEN - Interrupt is not automatically enabled in
* request/setup_irq()
* IRQ_NO_BALANCING - Interrupt cannot be balanced (affinity set)
IRQ_NO_BALANCING = (1 << 13),
IRQ_MOVE_PCNTXT = (1 << 14),
IRQ_NESTED_THREAD = (1 << 15),
+ IRQ_NOTHREAD = (1 << 16),
};
#define IRQF_MODIFY_MASK \
* struct irq_chip - hardware interrupt chip descriptor
*
* @name: name for /proc/interrupts
- * @startup: deprecated, replaced by irq_startup
- * @shutdown: deprecated, replaced by irq_shutdown
- * @enable: deprecated, replaced by irq_enable
- * @disable: deprecated, replaced by irq_disable
- * @ack: deprecated, replaced by irq_ack
- * @mask: deprecated, replaced by irq_mask
- * @mask_ack: deprecated, replaced by irq_mask_ack
- * @unmask: deprecated, replaced by irq_unmask
- * @eoi: deprecated, replaced by irq_eoi
- * @end: deprecated, will go away with __do_IRQ()
- * @set_affinity: deprecated, replaced by irq_set_affinity
- * @retrigger: deprecated, replaced by irq_retrigger
- * @set_type: deprecated, replaced by irq_set_type
- * @set_wake: deprecated, replaced by irq_wake
- * @bus_lock: deprecated, replaced by irq_bus_lock
- * @bus_sync_unlock: deprecated, replaced by irq_bus_sync_unlock
- *
* @irq_startup: start up the interrupt (defaults to ->enable if NULL)
* @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)
* @irq_enable: enable the interrupt (defaults to chip->unmask if NULL)
* @irq_bus_sync_unlock:function to sync and unlock slow bus (i2c) chips
* @irq_cpu_online: configure an interrupt source for a secondary CPU
* @irq_cpu_offline: un-configure an interrupt source for a secondary CPU
+ * @irq_suspend: function called from core code on suspend once per chip
+ * @irq_resume: function called from core code on resume once per chip
+ * @irq_pm_shutdown: function called from core code on shutdown once per chip
* @irq_print_chip: optional to print special chip info in show_interrupts
* @flags: chip specific flags
*
void (*irq_cpu_online)(struct irq_data *data);
void (*irq_cpu_offline)(struct irq_data *data);
+ void (*irq_suspend)(struct irq_data *data);
+ void (*irq_resume)(struct irq_data *data);
+ void (*irq_pm_shutdown)(struct irq_data *data);
+
void (*irq_print_chip)(struct irq_data *data, struct seq_file *p);
unsigned long flags;
/*
* Set a highlevel chained flow handler for a given IRQ.
* (a chained handler is automatically enabled and set to
- * IRQ_NOREQUEST and IRQ_NOPROBE)
+ * IRQ_NOREQUEST, IRQ_NOPROBE, and IRQ_NOTHREAD)
*/
static inline void
irq_set_chained_handler(unsigned int irq, irq_flow_handler_t handle)
irq_modify_status(irq, IRQ_NOPROBE, 0);
}
+static inline void irq_set_nothread(unsigned int irq)
+{
+ irq_modify_status(irq, 0, IRQ_NOTHREAD);
+}
+
+static inline void irq_set_thread(unsigned int irq)
+{
+ irq_modify_status(irq, IRQ_NOTHREAD, 0);
+}
+
static inline void irq_set_nested_thread(unsigned int irq, bool nest)
{
if (nest)
return irq_reserve_irqs(irq, 1);
}
+#ifndef irq_reg_writel
+# define irq_reg_writel(val, addr) writel(val, addr)
+#endif
+#ifndef irq_reg_readl
+# define irq_reg_readl(addr) readl(addr)
+#endif
+
+/**
+ * struct irq_chip_regs - register offsets for struct irq_gci
+ * @enable: Enable register offset to reg_base
+ * @disable: Disable register offset to reg_base
+ * @mask: Mask register offset to reg_base
+ * @ack: Ack register offset to reg_base
+ * @eoi: Eoi register offset to reg_base
+ * @type: Type configuration register offset to reg_base
+ * @polarity: Polarity configuration register offset to reg_base
+ */
+struct irq_chip_regs {
+ unsigned long enable;
+ unsigned long disable;
+ unsigned long mask;
+ unsigned long ack;
+ unsigned long eoi;
+ unsigned long type;
+ unsigned long polarity;
+};
+
+/**
+ * struct irq_chip_type - Generic interrupt chip instance for a flow type
+ * @chip: The real interrupt chip which provides the callbacks
+ * @regs: Register offsets for this chip
+ * @handler: Flow handler associated with this chip
+ * @type: Chip can handle these flow types
+ *
+ * A irq_generic_chip can have several instances of irq_chip_type when
+ * it requires different functions and register offsets for different
+ * flow types.
+ */
+struct irq_chip_type {
+ struct irq_chip chip;
+ struct irq_chip_regs regs;
+ irq_flow_handler_t handler;
+ u32 type;
+};
+
+/**
+ * struct irq_chip_generic - Generic irq chip data structure
+ * @lock: Lock to protect register and cache data access
+ * @reg_base: Register base address (virtual)
+ * @irq_base: Interrupt base nr for this chip
+ * @irq_cnt: Number of interrupts handled by this chip
+ * @mask_cache: Cached mask register
+ * @type_cache: Cached type register
+ * @polarity_cache: Cached polarity register
+ * @wake_enabled: Interrupt can wakeup from suspend
+ * @wake_active: Interrupt is marked as an wakeup from suspend source
+ * @num_ct: Number of available irq_chip_type instances (usually 1)
+ * @private: Private data for non generic chip callbacks
+ * @list: List head for keeping track of instances
+ * @chip_types: Array of interrupt irq_chip_types
+ *
+ * Note, that irq_chip_generic can have multiple irq_chip_type
+ * implementations which can be associated to a particular irq line of
+ * an irq_chip_generic instance. That allows to share and protect
+ * state in an irq_chip_generic instance when we need to implement
+ * different flow mechanisms (level/edge) for it.
+ */
+struct irq_chip_generic {
+ raw_spinlock_t lock;
+ void __iomem *reg_base;
+ unsigned int irq_base;
+ unsigned int irq_cnt;
+ u32 mask_cache;
+ u32 type_cache;
+ u32 polarity_cache;
+ u32 wake_enabled;
+ u32 wake_active;
+ unsigned int num_ct;
+ void *private;
+ struct list_head list;
+ struct irq_chip_type chip_types[0];
+};
+
+/**
+ * enum irq_gc_flags - Initialization flags for generic irq chips
+ * @IRQ_GC_INIT_MASK_CACHE: Initialize the mask_cache by reading mask reg
+ * @IRQ_GC_INIT_NESTED_LOCK: Set the lock class of the irqs to nested for
+ * irq chips which need to call irq_set_wake() on
+ * the parent irq. Usually GPIO implementations
+ */
+enum irq_gc_flags {
+ IRQ_GC_INIT_MASK_CACHE = 1 << 0,
+ IRQ_GC_INIT_NESTED_LOCK = 1 << 1,
+};
+
+/* Generic chip callback functions */
+void irq_gc_noop(struct irq_data *d);
+void irq_gc_mask_disable_reg(struct irq_data *d);
+void irq_gc_mask_set_bit(struct irq_data *d);
+void irq_gc_mask_clr_bit(struct irq_data *d);
+void irq_gc_unmask_enable_reg(struct irq_data *d);
+void irq_gc_ack(struct irq_data *d);
+void irq_gc_mask_disable_reg_and_ack(struct irq_data *d);
+void irq_gc_eoi(struct irq_data *d);
+int irq_gc_set_wake(struct irq_data *d, unsigned int on);
+
+/* Setup functions for irq_chip_generic */
+struct irq_chip_generic *
+irq_alloc_generic_chip(const char *name, int nr_ct, unsigned int irq_base,
+ void __iomem *reg_base, irq_flow_handler_t handler);
+void irq_setup_generic_chip(struct irq_chip_generic *gc, u32 msk,
+ enum irq_gc_flags flags, unsigned int clr,
+ unsigned int set);
+int irq_setup_alt_chip(struct irq_data *d, unsigned int type);
+void irq_remove_generic_chip(struct irq_chip_generic *gc, u32 msk,
+ unsigned int clr, unsigned int set);
+
+static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
+{
+ return container_of(d->chip, struct irq_chip_type, chip);
+}
+
+#define IRQ_MSK(n) (u32)((n) < 32 ? ((1 << (n)) - 1) : UINT_MAX)
+
+#ifdef CONFIG_SMP
+static inline void irq_gc_lock(struct irq_chip_generic *gc)
+{
+ raw_spin_lock(&gc->lock);
+}
+
+static inline void irq_gc_unlock(struct irq_chip_generic *gc)
+{
+ raw_spin_unlock(&gc->lock);
+}
+#else
+static inline void irq_gc_lock(struct irq_chip_generic *gc) { }
+static inline void irq_gc_unlock(struct irq_chip_generic *gc) { }
+#endif
+
#endif /* CONFIG_GENERIC_HARDIRQS */
#endif /* !CONFIG_S390 */
* @irq_data: per irq and chip data passed down to chip functions
* @timer_rand_state: pointer to timer rand state struct
* @kstat_irqs: irq stats per cpu
- * @handle_irq: highlevel irq-events handler [if NULL, __do_IRQ()]
+ * @handle_irq: highlevel irq-events handler
+ * @preflow_handler: handler called before the flow handler (currently used by sparc)
* @action: the irq action chain
* @status: status information
* @core_internal_state__do_not_mess_with_it: core internal status information
* @depth: disable-depth, for nested irq_disable() calls
- * @wake_depth: enable depth, for multiple set_irq_wake() callers
+ * @wake_depth: enable depth, for multiple irq_set_irq_wake() callers
* @irq_count: stats field to detect stalled irqs
* @last_unhandled: aging timer for unhandled count
* @irqs_unhandled: stats field for spurious unhandled interrupts
* @lock: locking for SMP
+ * @affinity_hint: hint to user space for preferred irq affinity
* @affinity_notify: context for notification of affinity changes
* @pending_mask: pending rebalanced interrupts
* @threads_oneshot: bitfield to handle shared oneshot threads
desc->handle_irq(irq, desc);
}
-static inline void generic_handle_irq(unsigned int irq)
-{
- generic_handle_irq_desc(irq, irq_to_desc(irq));
-}
+int generic_handle_irq(unsigned int irq);
/* Test to see if a driver has successfully requested an irq */
static inline int irq_has_action(unsigned int irq)
#ifndef _LINUX_JUMP_LABEL_H
#define _LINUX_JUMP_LABEL_H
+#include <linux/types.h>
+#include <linux/compiler.h>
+
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+
+struct jump_label_key {
+ atomic_t enabled;
+ struct jump_entry *entries;
+#ifdef CONFIG_MODULES
+ struct jump_label_mod *next;
+#endif
+};
+
# include <asm/jump_label.h>
# define HAVE_JUMP_LABEL
#endif
enum jump_label_type {
+ JUMP_LABEL_DISABLE = 0,
JUMP_LABEL_ENABLE,
- JUMP_LABEL_DISABLE
};
struct module;
#ifdef HAVE_JUMP_LABEL
+#ifdef CONFIG_MODULES
+#define JUMP_LABEL_INIT {{ 0 }, NULL, NULL}
+#else
+#define JUMP_LABEL_INIT {{ 0 }, NULL}
+#endif
+
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+ return arch_static_branch(key);
+}
+
extern struct jump_entry __start___jump_table[];
extern struct jump_entry __stop___jump_table[];
extern void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type);
extern void arch_jump_label_text_poke_early(jump_label_t addr);
-extern void jump_label_update(unsigned long key, enum jump_label_type type);
-extern void jump_label_apply_nops(struct module *mod);
extern int jump_label_text_reserved(void *start, void *end);
+extern void jump_label_inc(struct jump_label_key *key);
+extern void jump_label_dec(struct jump_label_key *key);
+extern bool jump_label_enabled(struct jump_label_key *key);
+extern void jump_label_apply_nops(struct module *mod);
-#define jump_label_enable(key) \
- jump_label_update((unsigned long)key, JUMP_LABEL_ENABLE);
+#else
-#define jump_label_disable(key) \
- jump_label_update((unsigned long)key, JUMP_LABEL_DISABLE);
+#include <asm/atomic.h>
-#else
+#define JUMP_LABEL_INIT {ATOMIC_INIT(0)}
-#define JUMP_LABEL(key, label) \
-do { \
- if (unlikely(*key)) \
- goto label; \
-} while (0)
+struct jump_label_key {
+ atomic_t enabled;
+};
-#define jump_label_enable(cond_var) \
-do { \
- *(cond_var) = 1; \
-} while (0)
+static __always_inline bool static_branch(struct jump_label_key *key)
+{
+ if (unlikely(atomic_read(&key->enabled)))
+ return true;
+ return false;
+}
-#define jump_label_disable(cond_var) \
-do { \
- *(cond_var) = 0; \
-} while (0)
+static inline void jump_label_inc(struct jump_label_key *key)
+{
+ atomic_inc(&key->enabled);
+}
-static inline int jump_label_apply_nops(struct module *mod)
+static inline void jump_label_dec(struct jump_label_key *key)
{
- return 0;
+ atomic_dec(&key->enabled);
}
static inline int jump_label_text_reserved(void *start, void *end)
static inline void jump_label_lock(void) {}
static inline void jump_label_unlock(void) {}
-#endif
+static inline bool jump_label_enabled(struct jump_label_key *key)
+{
+ return !!atomic_read(&key->enabled);
+}
-#define COND_STMT(key, stmt) \
-do { \
- __label__ jl_enabled; \
- JUMP_LABEL(key, jl_enabled); \
- if (0) { \
-jl_enabled: \
- stmt; \
- } \
-} while (0)
+static inline int jump_label_apply_nops(struct module *mod)
+{
+ return 0;
+}
+
+#endif
#endif
+++ /dev/null
-#ifndef _LINUX_JUMP_LABEL_REF_H
-#define _LINUX_JUMP_LABEL_REF_H
-
-#include <linux/jump_label.h>
-#include <asm/atomic.h>
-
-#ifdef HAVE_JUMP_LABEL
-
-static inline void jump_label_inc(atomic_t *key)
-{
- if (atomic_add_return(1, key) == 1)
- jump_label_enable(key);
-}
-
-static inline void jump_label_dec(atomic_t *key)
-{
- if (atomic_dec_and_test(key))
- jump_label_disable(key);
-}
-
-#else /* !HAVE_JUMP_LABEL */
-
-static inline void jump_label_inc(atomic_t *key)
-{
- atomic_inc(key);
-}
-
-static inline void jump_label_dec(atomic_t *key)
-{
- atomic_dec(key);
-}
-
-#undef JUMP_LABEL
-#define JUMP_LABEL(key, label) \
-do { \
- if (unlikely(__builtin_choose_expr( \
- __builtin_types_compatible_p(typeof(key), atomic_t *), \
- atomic_read((atomic_t *)(key)), *(key)))) \
- goto label; \
-} while (0)
-
-#endif /* HAVE_JUMP_LABEL */
-
-#endif /* _LINUX_JUMP_LABEL_REF_H */
extern unsigned long long memparse(const char *ptr, char **retptr);
extern int core_kernel_text(unsigned long addr);
+extern int core_kernel_data(unsigned long addr);
extern int __kernel_text_address(unsigned long addr);
extern int kernel_text_address(unsigned long addr);
extern int func_ptr_is_kernel_text(void *ptr);
extern int usermodehelper_disable(void);
extern void usermodehelper_enable(void);
+extern bool usermodehelper_is_disabled(void);
#endif /* __LINUX_KMOD_H__ */
ATA_DFLAG_ACPI_PENDING = (1 << 5), /* ACPI resume action pending */
ATA_DFLAG_ACPI_FAILED = (1 << 6), /* ACPI on devcfg has failed */
ATA_DFLAG_AN = (1 << 7), /* AN configured */
- ATA_DFLAG_HIPM = (1 << 8), /* device supports HIPM */
- ATA_DFLAG_DIPM = (1 << 9), /* device supports DIPM */
ATA_DFLAG_DMADIR = (1 << 10), /* device requires DMADIR */
ATA_DFLAG_CFG_MASK = (1 << 12) - 1,
* management */
ATA_FLAG_SW_ACTIVITY = (1 << 22), /* driver supports sw activity
* led */
+ ATA_FLAG_NO_DIPM = (1 << 23), /* host not happy with DIPM */
/* bits 24:31 of ap->flags are reserved for LLD specific flags */
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/poison.h>
-#include <linux/prefetch.h>
+#include <linux/const.h>
/*
* Simple doubly linked list implementation.
* @head: the head for your list.
*/
#define list_for_each(pos, head) \
- for (pos = (head)->next; prefetch(pos->next), pos != (head); \
- pos = pos->next)
+ for (pos = (head)->next; pos != (head); pos = pos->next)
/**
* __list_for_each - iterate over a list
* @pos: the &struct list_head to use as a loop cursor.
* @head: the head for your list.
*
- * This variant differs from list_for_each() in that it's the
- * simplest possible list iteration code, no prefetching is done.
- * Use this for code that knows the list to be very short (empty
- * or 1 entry) most of the time.
+ * This variant doesn't differ from list_for_each() any more.
+ * We don't do prefetching in either case.
*/
#define __list_for_each(pos, head) \
for (pos = (head)->next; pos != (head); pos = pos->next)
* @head: the head for your list.
*/
#define list_for_each_prev(pos, head) \
- for (pos = (head)->prev; prefetch(pos->prev), pos != (head); \
- pos = pos->prev)
+ for (pos = (head)->prev; pos != (head); pos = pos->prev)
/**
* list_for_each_safe - iterate over a list safe against removal of list entry
*/
#define list_for_each_prev_safe(pos, n, head) \
for (pos = (head)->prev, n = pos->prev; \
- prefetch(pos->prev), pos != (head); \
+ pos != (head); \
pos = n, n = pos->prev)
/**
*/
#define list_for_each_entry(pos, head, member) \
for (pos = list_entry((head)->next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))
/**
*/
#define list_for_each_entry_reverse(pos, head, member) \
for (pos = list_entry((head)->prev, typeof(*pos), member); \
- prefetch(pos->member.prev), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.prev, typeof(*pos), member))
/**
*/
#define list_for_each_entry_continue(pos, head, member) \
for (pos = list_entry(pos->member.next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))
/**
*/
#define list_for_each_entry_continue_reverse(pos, head, member) \
for (pos = list_entry(pos->member.prev, typeof(*pos), member); \
- prefetch(pos->member.prev), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.prev, typeof(*pos), member))
/**
* Iterate over list of given type, continuing from current position.
*/
#define list_for_each_entry_from(pos, head, member) \
- for (; prefetch(pos->member.next), &pos->member != (head); \
+ for (; &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))
/**
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
#define hlist_for_each(pos, head) \
- for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
- pos = pos->next)
+ for (pos = (head)->first; pos ; pos = pos->next)
#define hlist_for_each_safe(pos, n, head) \
for (pos = (head)->first; pos && ({ n = pos->next; 1; }); \
*/
#define hlist_for_each_entry(tpos, pos, head, member) \
for (pos = (head)->first; \
- pos && ({ prefetch(pos->next); 1;}) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)
*/
#define hlist_for_each_entry_continue(tpos, pos, member) \
for (pos = (pos)->next; \
- pos && ({ prefetch(pos->next); 1;}) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)
* @member: the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_from(tpos, pos, member) \
- for (; pos && ({ prefetch(pos->next); 1;}) && \
+ for (; pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)
#define _LINUX_LIST_BL_H
#include <linux/list.h>
+#include <linux/bit_spinlock.h>
/*
* Special version of lists, where head of the list has a lock in the lowest
}
}
+static inline void hlist_bl_lock(struct hlist_bl_head *b)
+{
+ bit_spin_lock(0, (unsigned long *)b);
+}
+
+static inline void hlist_bl_unlock(struct hlist_bl_head *b)
+{
+ __bit_spin_unlock(0, (unsigned long *)b);
+}
+
/**
* hlist_bl_for_each_entry - iterate over list of given type
* @tpos: the type * to use as a loop cursor.
int rpu; /** Pen down sensitivity resistor divider */
int pressure; /** Report pressure (boolean) */
unsigned int data_irq; /** Touch data ready IRQ */
+ int data_irqf; /** IRQ flags for data ready IRQ */
unsigned int pd_irq; /** Touch pendown detect IRQ */
+ int pd_irqf; /** IRQ flags for pen down IRQ */
};
enum wm831x_watchdog_action {
#define VM_RandomReadHint(v) ((v)->vm_flags & VM_RAND_READ)
/*
- * special vmas that are non-mergable, non-mlock()able
+ * Special vmas that are non-mergable, non-mlock()able.
+ * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
*/
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
int clear_page_dirty_for_io(struct page *page);
/* Is the vma a continuation of the stack vma above it? */
-static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
+static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr)
{
return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
}
+static inline int stack_guard_page_start(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return (vma->vm_flags & VM_GROWSDOWN) &&
+ (vma->vm_start == addr) &&
+ !vma_growsdown(vma->vm_prev, addr);
+}
+
+/* Is the vma a continuation of the stack vma below it? */
+static inline int vma_growsup(struct vm_area_struct *vma, unsigned long addr)
+{
+ return vma && (vma->vm_start == addr) && (vma->vm_flags & VM_GROWSUP);
+}
+
+static inline int stack_guard_page_end(struct vm_area_struct *vma,
+ unsigned long addr)
+{
+ return (vma->vm_flags & VM_GROWSUP) &&
+ (vma->vm_end == addr) &&
+ !vma_growsup(vma->vm_next, addr);
+}
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len);
const char *version;
} __attribute__ ((__aligned__(sizeof(void *))));
+extern ssize_t __modver_version_show(struct module_attribute *,
+ struct module *, char *);
+
struct module_kobject
{
struct kobject kobj;
#define MODULE_VERSION(_version) MODULE_INFO(version, _version)
#else
#define MODULE_VERSION(_version) \
- extern ssize_t __modver_version_show(struct module_attribute *, \
- struct module *, char *); \
- static struct module_version_attribute __modver_version_attr \
- __used \
- __attribute__ ((__section__ ("__modver"),aligned(sizeof(void *)))) \
- = { \
+ static struct module_version_attribute ___modver_attr = { \
.mattr = { \
.attr = { \
.name = "version", \
}, \
.module_name = KBUILD_MODNAME, \
.version = _version, \
- }
+ }; \
+ static const struct module_version_attribute \
+ __used __attribute__ ((__section__ ("__modver"))) \
+ * __moduleparam_const __modver_attr = &___modver_attr
#endif
/* Optional firmware file (or files) needed by the module
extern void *__crc_##sym __attribute__((weak)); \
static const unsigned long __kcrctab_##sym \
__used \
- __attribute__((section("__kcrctab" sec), unused)) \
+ __attribute__((section("___kcrctab" sec "+" #sym), unused)) \
= (unsigned long) &__crc_##sym;
#else
#define __CRC_SYMBOL(sym, sec)
= MODULE_SYMBOL_PREFIX #sym; \
static const struct kernel_symbol __ksymtab_##sym \
__used \
- __attribute__((section("__ksymtab" sec), unused)) \
+ __attribute__((section("___ksymtab" sec "+" #sym), unused)) \
= { (unsigned long)&sym, __kstrtab_##sym }
#define EXPORT_SYMBOL(sym) \
struct module_notes_attrs *notes_attrs;
#endif
+ /* The command line arguments (may be mangled). People like
+ keeping pointers to this stuff */
+ char *args;
+
#ifdef CONFIG_SMP
/* Per-cpu data. */
void __percpu *percpu;
unsigned int percpu_size;
#endif
- /* The command line arguments (may be mangled). People like
- keeping pointers to this stuff */
- char *args;
#ifdef CONFIG_TRACEPOINTS
- struct tracepoint * const *tracepoints_ptrs;
unsigned int num_tracepoints;
+ struct tracepoint * const *tracepoints_ptrs;
#endif
#ifdef HAVE_JUMP_LABEL
struct jump_entry *jump_entries;
unsigned int num_jump_entries;
#endif
#ifdef CONFIG_TRACING
- const char **trace_bprintk_fmt_start;
unsigned int num_trace_bprintk_fmt;
+ const char **trace_bprintk_fmt_start;
#endif
#ifdef CONFIG_EVENT_TRACING
struct ftrace_event_call **trace_events;
unsigned int num_trace_events;
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
- unsigned long *ftrace_callsites;
unsigned int num_ftrace_callsites;
+ unsigned long *ftrace_callsites;
#endif
#ifdef CONFIG_MODULE_UNLOAD
bool warn);
/* Walk the exported symbol table */
-bool each_symbol(bool (*fn)(const struct symsearch *arr, struct module *owner,
- unsigned int symnum, void *data), void *data);
+bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
+ struct module *owner,
+ void *data), void *data);
/* Returns 0 and fills in value, defined and namebuf, or -ERANGE if
symnum out of range. */
struct kparam_array
{
unsigned int max;
+ unsigned int elemsize;
unsigned int *num;
const struct kernel_param_ops *ops;
- unsigned int elemsize;
void *elem;
};
*/
#define module_param_array_named(name, array, type, nump, perm) \
static const struct kparam_array __param_arr_##name \
- = { ARRAY_SIZE(array), nump, ¶m_ops_##type, \
- sizeof(array[0]), array }; \
+ = { .max = ARRAY_SIZE(array), .num = nump, \
+ .ops = ¶m_ops_##type, \
+ .elemsize = sizeof(array[0]), .elem = array }; \
__module_param_call(MODULE_PARAM_PREFIX, name, \
¶m_array_ops, \
.arr = &__param_arr_##name, \
spinlock_t wait_lock;
struct list_head wait_list;
#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
- struct thread_info *owner;
+ struct task_struct *owner;
#endif
#ifdef CONFIG_DEBUG_MUTEXES
const char *name;
#ifdef CONFIG_NFS_V4
u64 cl_clientid; /* constant */
+ nfs4_verifier cl_confirm; /* Clientid verifier */
unsigned long cl_state;
spinlock_t cl_lock;
} du;
struct nfs_fsid fsid;
__u64 fileid;
+ __u64 mounted_on_fileid;
struct timespec atime;
struct timespec mtime;
struct timespec ctime;
#define NFS_ATTR_FATTR_PRECHANGE (1U << 18)
#define NFS_ATTR_FATTR_V4_REFERRAL (1U << 19) /* NFSv4 referral */
#define NFS_ATTR_FATTR_MOUNTPOINT (1U << 20) /* Treat as mountpoint */
+#define NFS_ATTR_FATTR_MOUNTED_ON_FILEID (1U << 21)
#define NFS_ATTR_FATTR (NFS_ATTR_FATTR_TYPE \
| NFS_ATTR_FATTR_MODE \
struct nfs4_layoutget_args args;
struct nfs4_layoutget_res res;
struct pnfs_layout_segment **lsegpp;
+ gfp_t gfp_flags;
};
struct nfs4_getdeviceinfo_args {
static inline int of_driver_match_device(struct device *dev,
const struct device_driver *drv)
{
- dev->of_match = of_match_device(drv->of_match_table, dev);
- return dev->of_match != NULL;
+ return of_match_device(drv->of_match_table, dev) != NULL;
}
extern struct platform_device *of_dev_get(struct platform_device *dev);
static inline void of_device_node_put(struct device *dev) { }
+static inline const struct of_device_id *of_match_device(
+ const struct of_device_id *matches, const struct device *dev)
+{
+ return NULL;
+}
#endif /* CONFIG_OF_DEVICE */
#endif /* _LINUX_OF_DEVICE_H */
--- /dev/null
+#ifndef LINUX_PCI_ATS_H
+#define LINUX_PCI_ATS_H
+
+/* Address Translation Service */
+struct pci_ats {
+ int pos; /* capability position */
+ int stu; /* Smallest Translation Unit */
+ int qdep; /* Invalidate Queue Depth */
+ int ref_cnt; /* Physical Function reference count */
+ unsigned int is_enabled:1; /* Enable bit is set */
+};
+
+#ifdef CONFIG_PCI_IOV
+
+extern int pci_enable_ats(struct pci_dev *dev, int ps);
+extern void pci_disable_ats(struct pci_dev *dev);
+extern int pci_ats_queue_depth(struct pci_dev *dev);
+/**
+ * pci_ats_enabled - query the ATS status
+ * @dev: the PCI device
+ *
+ * Returns 1 if ATS capability is enabled, or 0 if not.
+ */
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+ return dev->ats && dev->ats->is_enabled;
+}
+
+#else /* CONFIG_PCI_IOV */
+
+static inline int pci_enable_ats(struct pci_dev *dev, int ps)
+{
+ return -ENODEV;
+}
+
+static inline void pci_disable_ats(struct pci_dev *dev)
+{
+}
+
+static inline int pci_ats_queue_depth(struct pci_dev *dev)
+{
+ return -ENODEV;
+}
+
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+ return 0;
+}
+
+#endif /* CONFIG_PCI_IOV */
+
+#endif /* LINUX_PCI_ATS_H*/
#define PCI_DEVICE_ID_INTEL_82840_HB 0x1a21
#define PCI_DEVICE_ID_INTEL_82845_HB 0x1a30
#define PCI_DEVICE_ID_INTEL_IOAT 0x1a38
-#define PCI_DEVICE_ID_INTEL_COUGARPOINT_SMBUS 0x1c22
#define PCI_DEVICE_ID_INTEL_COUGARPOINT_LPC_MIN 0x1c41
#define PCI_DEVICE_ID_INTEL_COUGARPOINT_LPC_MAX 0x1c5f
-#define PCI_DEVICE_ID_INTEL_PATSBURG_SMBUS 0x1d22
#define PCI_DEVICE_ID_INTEL_PATSBURG_LPC_0 0x1d40
#define PCI_DEVICE_ID_INTEL_PATSBURG_LPC_1 0x1d41
#define PCI_DEVICE_ID_INTEL_DH89XXCC_LPC_MIN 0x2310
#define PCI_DEVICE_ID_INTEL_DH89XXCC_LPC_MAX 0x231f
-#define PCI_DEVICE_ID_INTEL_DH89XXCC_SMBUS 0x2330
#define PCI_DEVICE_ID_INTEL_82801AA_0 0x2410
#define PCI_DEVICE_ID_INTEL_82801AA_1 0x2411
#define PCI_DEVICE_ID_INTEL_82801AA_3 0x2413
#define PCI_DEVICE_ID_INTEL_ICH10_5 0x3a60
#define PCI_DEVICE_ID_INTEL_5_3400_SERIES_LPC_MIN 0x3b00
#define PCI_DEVICE_ID_INTEL_5_3400_SERIES_LPC_MAX 0x3b1f
-#define PCI_DEVICE_ID_INTEL_5_3400_SERIES_SMBUS 0x3b30
#define PCI_DEVICE_ID_INTEL_IOAT_SNB 0x402f
#define PCI_DEVICE_ID_INTEL_5100_16 0x65f0
#define PCI_DEVICE_ID_INTEL_5100_21 0x65f5
irqsafe_generic_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
# endif
# define irqsafe_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) \
- __pcpu_double_call_return_int(irqsafe_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2))
+ __pcpu_double_call_return_bool(irqsafe_cpu_cmpxchg_double_, (pcp1), (pcp2), (oval1), (oval2), (nval1), (nval2))
#endif
#endif /* __LINUX_PERCPU_H */
* Performance events:
*
* Copyright (C) 2008-2009, Thomas Gleixner <tglx@linutronix.de>
- * Copyright (C) 2008-2009, Red Hat, Inc., Ingo Molnar
- * Copyright (C) 2008-2009, Red Hat, Inc., Peter Zijlstra
+ * Copyright (C) 2008-2011, Red Hat, Inc., Ingo Molnar
+ * Copyright (C) 2008-2011, Red Hat, Inc., Peter Zijlstra
*
* Data type definitions, declarations, prototypes.
*
PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4,
PERF_COUNT_HW_BRANCH_MISSES = 5,
PERF_COUNT_HW_BUS_CYCLES = 6,
+ PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7,
+ PERF_COUNT_HW_STALLED_CYCLES_BACKEND = 8,
PERF_COUNT_HW_MAX, /* non-ABI */
};
PERF_CONTEXT_MAX = (__u64)-4095,
};
-#define PERF_FLAG_FD_NO_GROUP (1U << 0)
-#define PERF_FLAG_FD_OUTPUT (1U << 1)
-#define PERF_FLAG_PID_CGROUP (1U << 2) /* pid=cgroup id, per-cpu mode only */
+#define PERF_FLAG_FD_NO_GROUP (1U << 0)
+#define PERF_FLAG_FD_OUTPUT (1U << 1)
+#define PERF_FLAG_PID_CGROUP (1U << 2) /* pid=cgroup id, per-cpu mode only */
#ifdef __KERNEL__
/*
#endif
struct perf_guest_info_callbacks {
- int (*is_in_guest) (void);
- int (*is_user_mode) (void);
- unsigned long (*get_guest_ip) (void);
+ int (*is_in_guest)(void);
+ int (*is_user_mode)(void);
+ unsigned long (*get_guest_ip)(void);
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT
#include <linux/ftrace.h>
#include <linux/cpu.h>
#include <linux/irq_work.h>
-#include <linux/jump_label_ref.h>
+#include <linux/jump_label.h>
#include <asm/atomic.h>
#include <asm/local.h>
* Start the transaction, after this ->add() doesn't need to
* do schedulability tests.
*/
- void (*start_txn) (struct pmu *pmu); /* optional */
+ void (*start_txn) (struct pmu *pmu); /* optional */
/*
* If ->start_txn() disabled the ->add() schedulability test
* then ->commit_txn() is required to perform one. On success
* the transaction is closed. On error the transaction is kept
* open until ->cancel_txn() is called.
*/
- int (*commit_txn) (struct pmu *pmu); /* optional */
+ int (*commit_txn) (struct pmu *pmu); /* optional */
/*
* Will cancel the transaction, assumes ->del() is called
* for each successful ->add() during the transaction.
*/
- void (*cancel_txn) (struct pmu *pmu); /* optional */
+ void (*cancel_txn) (struct pmu *pmu); /* optional */
};
/**
struct pt_regs *regs);
enum perf_group_flag {
- PERF_GROUP_SOFTWARE = 0x1,
+ PERF_GROUP_SOFTWARE = 0x1,
};
-#define SWEVENT_HLIST_BITS 8
-#define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS)
+#define SWEVENT_HLIST_BITS 8
+#define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS)
struct swevent_hlist {
- struct hlist_head heads[SWEVENT_HLIST_SIZE];
- struct rcu_head rcu_head;
+ struct hlist_head heads[SWEVENT_HLIST_SIZE];
+ struct rcu_head rcu_head;
};
#define PERF_ATTACH_CONTEXT 0x01
* This is a per-cpu dynamically allocated data structure.
*/
struct perf_cgroup_info {
- u64 time;
- u64 timestamp;
+ u64 time;
+ u64 timestamp;
};
struct perf_cgroup {
- struct cgroup_subsys_state css;
- struct perf_cgroup_info *info; /* timing info, one per cpu */
+ struct cgroup_subsys_state css;
+ struct perf_cgroup_info *info; /* timing info, one per cpu */
};
#endif
/*
* Number of contexts where an event can trigger:
- * task, softirq, hardirq, nmi.
+ * task, softirq, hardirq, nmi.
*/
#define PERF_NR_CONTEXTS 4
struct perf_raw_record *raw;
};
-static inline
-void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
+static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
{
data->addr = addr;
data->raw = NULL;
return event->pmu->task_ctx_nr == perf_sw_context;
}
-extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+extern struct jump_label_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
extern void __perf_sw_event(u32, u64, int, struct pt_regs *, u64);
#ifndef perf_arch_fetch_caller_regs
-static inline void
-perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip) { }
+static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip) { }
#endif
/*
{
struct pt_regs hot_regs;
- JUMP_LABEL(&perf_swevent_enabled[event_id], have_event);
- return;
-
-have_event:
- if (!regs) {
- perf_fetch_caller_regs(&hot_regs);
- regs = &hot_regs;
+ if (static_branch(&perf_swevent_enabled[event_id])) {
+ if (!regs) {
+ perf_fetch_caller_regs(&hot_regs);
+ regs = &hot_regs;
+ }
+ __perf_sw_event(event_id, nr, nmi, regs, addr);
}
- __perf_sw_event(event_id, nr, nmi, regs, addr);
}
-extern atomic_t perf_sched_events;
+extern struct jump_label_key perf_sched_events;
static inline void perf_event_task_sched_in(struct task_struct *task)
{
- COND_STMT(&perf_sched_events, __perf_event_task_sched_in(task));
+ if (static_branch(&perf_sched_events))
+ __perf_event_task_sched_in(task);
}
-static inline
-void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
+static inline void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
{
perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
/* Callchains */
DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
-extern void perf_callchain_user(struct perf_callchain_entry *entry,
- struct pt_regs *regs);
-extern void perf_callchain_kernel(struct perf_callchain_entry *entry,
- struct pt_regs *regs);
-
+extern void perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs);
+extern void perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs);
-static inline void
-perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
+static inline void perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
{
if (entry->nr < PERF_MAX_STACK_DEPTH)
entry->ip[entry->nr++] = ip;
extern void perf_bp_event(struct perf_event *event, void *data);
#ifndef perf_misc_flags
-#define perf_misc_flags(regs) (user_mode(regs) ? PERF_RECORD_MISC_USER : \
- PERF_RECORD_MISC_KERNEL)
-#define perf_instruction_pointer(regs) instruction_pointer(regs)
+# define perf_misc_flags(regs) \
+ (user_mode(regs) ? PERF_RECORD_MISC_USER : PERF_RECORD_MISC_KERNEL)
+# define perf_instruction_pointer(regs) instruction_pointer(regs)
#endif
extern int perf_output_begin(struct perf_output_handle *handle,
perf_bp_event(struct perf_event *event, void *data) { }
static inline int perf_register_guest_info_callbacks
-(struct perf_guest_info_callbacks *callbacks) { return 0; }
+(struct perf_guest_info_callbacks *callbacks) { return 0; }
static inline int perf_unregister_guest_info_callbacks
-(struct perf_guest_info_callbacks *callbacks) { return 0; }
+(struct perf_guest_info_callbacks *callbacks) { return 0; }
static inline void perf_event_mmap(struct vm_area_struct *vma) { }
static inline void perf_event_comm(struct task_struct *tsk) { }
static inline void perf_event_task_tick(void) { }
#endif
-#define perf_output_put(handle, x) \
- perf_output_copy((handle), &(x), sizeof(x))
+#define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x))
/*
* This has to have a higher priority than migration_notifier in sched.c.
*/
-#define perf_cpu_notifier(fn) \
-do { \
- static struct notifier_block fn##_nb __cpuinitdata = \
- { .notifier_call = fn, .priority = CPU_PRI_PERF }; \
- fn(&fn##_nb, (unsigned long)CPU_UP_PREPARE, \
- (void *)(unsigned long)smp_processor_id()); \
- fn(&fn##_nb, (unsigned long)CPU_STARTING, \
- (void *)(unsigned long)smp_processor_id()); \
- fn(&fn##_nb, (unsigned long)CPU_ONLINE, \
- (void *)(unsigned long)smp_processor_id()); \
- register_cpu_notifier(&fn##_nb); \
+#define perf_cpu_notifier(fn) \
+do { \
+ static struct notifier_block fn##_nb __cpuinitdata = \
+ { .notifier_call = fn, .priority = CPU_PRI_PERF }; \
+ fn(&fn##_nb, (unsigned long)CPU_UP_PREPARE, \
+ (void *)(unsigned long)smp_processor_id()); \
+ fn(&fn##_nb, (unsigned long)CPU_STARTING, \
+ (void *)(unsigned long)smp_processor_id()); \
+ fn(&fn##_nb, (unsigned long)CPU_ONLINE, \
+ (void *)(unsigned long)smp_processor_id()); \
+ register_cpu_notifier(&fn##_nb); \
} while (0)
#endif /* __KERNEL__ */
struct resource *res, unsigned int n_res,
const void *data, size_t size);
-extern const struct dev_pm_ops * platform_bus_get_pm_ops(void);
-extern void platform_bus_set_pm_ops(const struct dev_pm_ops *pm);
-
/* early platform driver interface */
struct early_platform_driver {
const char *class_str;
}
#endif /* MODULE */
+#ifdef CONFIG_PM_SLEEP
+extern int platform_pm_prepare(struct device *dev);
+extern void platform_pm_complete(struct device *dev);
+#else
+#define platform_pm_prepare NULL
+#define platform_pm_complete NULL
+#endif
+
+#ifdef CONFIG_SUSPEND
+extern int platform_pm_suspend(struct device *dev);
+extern int platform_pm_suspend_noirq(struct device *dev);
+extern int platform_pm_resume(struct device *dev);
+extern int platform_pm_resume_noirq(struct device *dev);
+#else
+#define platform_pm_suspend NULL
+#define platform_pm_resume NULL
+#define platform_pm_suspend_noirq NULL
+#define platform_pm_resume_noirq NULL
+#endif
+
+#ifdef CONFIG_HIBERNATE_CALLBACKS
+extern int platform_pm_freeze(struct device *dev);
+extern int platform_pm_freeze_noirq(struct device *dev);
+extern int platform_pm_thaw(struct device *dev);
+extern int platform_pm_thaw_noirq(struct device *dev);
+extern int platform_pm_poweroff(struct device *dev);
+extern int platform_pm_poweroff_noirq(struct device *dev);
+extern int platform_pm_restore(struct device *dev);
+extern int platform_pm_restore_noirq(struct device *dev);
+#else
+#define platform_pm_freeze NULL
+#define platform_pm_thaw NULL
+#define platform_pm_poweroff NULL
+#define platform_pm_restore NULL
+#define platform_pm_freeze_noirq NULL
+#define platform_pm_thaw_noirq NULL
+#define platform_pm_poweroff_noirq NULL
+#define platform_pm_restore_noirq NULL
+#endif
+
+#ifdef CONFIG_PM_SLEEP
+#define USE_PLATFORM_PM_SLEEP_OPS \
+ .prepare = platform_pm_prepare, \
+ .complete = platform_pm_complete, \
+ .suspend = platform_pm_suspend, \
+ .resume = platform_pm_resume, \
+ .freeze = platform_pm_freeze, \
+ .thaw = platform_pm_thaw, \
+ .poweroff = platform_pm_poweroff, \
+ .restore = platform_pm_restore, \
+ .suspend_noirq = platform_pm_suspend_noirq, \
+ .resume_noirq = platform_pm_resume_noirq, \
+ .freeze_noirq = platform_pm_freeze_noirq, \
+ .thaw_noirq = platform_pm_thaw_noirq, \
+ .poweroff_noirq = platform_pm_poweroff_noirq, \
+ .restore_noirq = platform_pm_restore_noirq,
+#else
+#define USE_PLATFORM_PM_SLEEP_OPS
+#endif
+
#endif /* _PLATFORM_DEVICE_H_ */
unsigned long active_jiffies;
unsigned long suspended_jiffies;
unsigned long accounting_timestamp;
+ void *subsys_data; /* Owned by the subsystem. */
#endif
};
*/
#ifdef CONFIG_PM_SLEEP
-#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
-extern int sysdev_suspend(pm_message_t state);
-extern int sysdev_resume(void);
-#else
-static inline int sysdev_suspend(pm_message_t state) { return 0; }
-static inline int sysdev_resume(void) { return 0; }
-#endif
-
extern void device_pm_lock(void);
extern void dpm_resume_noirq(pm_message_t state);
extern void dpm_resume_end(pm_message_t state);
+extern void dpm_resume(pm_message_t state);
+extern void dpm_complete(pm_message_t state);
extern void device_pm_unlock(void);
extern int dpm_suspend_noirq(pm_message_t state);
extern int dpm_suspend_start(pm_message_t state);
+extern int dpm_suspend(pm_message_t state);
+extern int dpm_prepare(pm_message_t state);
extern void __suspend_report_result(const char *function, void *fn, int ret);
} while (0)
extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
+
+extern int pm_generic_prepare(struct device *dev);
+extern int pm_generic_suspend(struct device *dev);
+extern int pm_generic_resume(struct device *dev);
+extern int pm_generic_freeze(struct device *dev);
+extern int pm_generic_thaw(struct device *dev);
+extern int pm_generic_restore(struct device *dev);
+extern int pm_generic_poweroff(struct device *dev);
+extern void pm_generic_complete(struct device *dev);
+
#else /* !CONFIG_PM_SLEEP */
#define device_pm_lock() do {} while (0)
{
return 0;
}
+
+#define pm_generic_prepare NULL
+#define pm_generic_suspend NULL
+#define pm_generic_resume NULL
+#define pm_generic_freeze NULL
+#define pm_generic_thaw NULL
+#define pm_generic_restore NULL
+#define pm_generic_poweroff NULL
+#define pm_generic_complete NULL
#endif /* !CONFIG_PM_SLEEP */
/* How to reorder dpm_list after device_move() */
DPM_ORDER_DEV_LAST,
};
-extern int pm_generic_suspend(struct device *dev);
-extern int pm_generic_resume(struct device *dev);
-extern int pm_generic_freeze(struct device *dev);
-extern int pm_generic_thaw(struct device *dev);
-extern int pm_generic_restore(struct device *dev);
-extern int pm_generic_poweroff(struct device *dev);
-
#endif /* _LINUX_PM_H */
__pm_runtime_use_autosuspend(dev, false);
}
+struct pm_clk_notifier_block {
+ struct notifier_block nb;
+ struct dev_power_domain *pwr_domain;
+ char *con_ids[];
+};
+
+#ifdef CONFIG_PM_RUNTIME_CLK
+extern int pm_runtime_clk_init(struct device *dev);
+extern void pm_runtime_clk_destroy(struct device *dev);
+extern int pm_runtime_clk_add(struct device *dev, const char *con_id);
+extern void pm_runtime_clk_remove(struct device *dev, const char *con_id);
+extern int pm_runtime_clk_suspend(struct device *dev);
+extern int pm_runtime_clk_resume(struct device *dev);
+#else
+static inline int pm_runtime_clk_init(struct device *dev)
+{
+ return -EINVAL;
+}
+static inline void pm_runtime_clk_destroy(struct device *dev)
+{
+}
+static inline int pm_runtime_clk_add(struct device *dev, const char *con_id)
+{
+ return -EINVAL;
+}
+static inline void pm_runtime_clk_remove(struct device *dev, const char *con_id)
+{
+}
+#define pm_runtime_clock_suspend NULL
+#define pm_runtime_clock_resume NULL
+#endif
+
+#ifdef CONFIG_HAVE_CLK
+extern void pm_runtime_clk_add_notifier(struct bus_type *bus,
+ struct pm_clk_notifier_block *clknb);
+#else
+static inline void pm_runtime_clk_add_notifier(struct bus_type *bus,
+ struct pm_clk_notifier_block *clknb)
+{
+}
+#endif
+
#endif
#include <linux/list.h>
#include <linux/sched.h>
#include <linux/timex.h>
+#include <linux/alarmtimer.h>
union cpu_time_count {
cputime_t cpu;
unsigned long incr;
unsigned long expires;
} mmtimer;
+ struct alarm alarmtimer;
} it;
};
struct proc_dir_entry *parent,const char *dest) {return NULL;}
static inline struct proc_dir_entry *proc_mkdir(const char *name,
struct proc_dir_entry *parent) {return NULL;}
+static inline struct proc_dir_entry *proc_mkdir_mode(const char *name,
+ mode_t mode, struct proc_dir_entry *parent) { return NULL; }
static inline struct proc_dir_entry *create_proc_read_entry(const char *name,
mode_t mode, struct proc_dir_entry *base,
child->ptrace = current->ptrace;
__ptrace_link(child, current->parent);
}
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ atomic_set(&child->ptrace_bp_refcnt, 1);
+#endif
}
/**
unsigned long args[6], unsigned int maxargs,
unsigned long *sp, unsigned long *pc);
-#endif
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+extern int ptrace_get_breakpoints(struct task_struct *tsk);
+extern void ptrace_put_breakpoints(struct task_struct *tsk);
+#else
+static inline void ptrace_put_breakpoints(struct task_struct *tsk) { }
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+#endif /* __KERNEL */
#endif
#define RB_EMPTY_NODE(node) (rb_parent(node) == node)
#define RB_CLEAR_NODE(node) (rb_set_parent(node, node))
+static inline void rb_init_node(struct rb_node *rb)
+{
+ rb->rb_parent_color = 0;
+ rb->rb_right = NULL;
+ rb->rb_left = NULL;
+ RB_CLEAR_NODE(rb);
+}
+
extern void rb_insert_color(struct rb_node *, struct rb_root *);
extern void rb_erase(struct rb_node *, struct rb_root *);
*/
#define list_for_each_entry_rcu(pos, head, member) \
for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
*/
#define list_for_each_continue_rcu(pos, head) \
for ((pos) = rcu_dereference_raw(list_next_rcu(pos)); \
- prefetch((pos)->next), (pos) != (head); \
+ (pos) != (head); \
(pos) = rcu_dereference_raw(list_next_rcu(pos)))
/**
*/
#define list_for_each_entry_continue_rcu(pos, head, member) \
for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
/**
#define __hlist_for_each_rcu(pos, head) \
for (pos = rcu_dereference(hlist_first_rcu(head)); \
- pos && ({ prefetch(pos->next); 1; }); \
+ pos; \
pos = rcu_dereference(hlist_next_rcu(pos)))
/**
*/
#define hlist_for_each_entry_rcu(tpos, pos, head, member) \
for (pos = rcu_dereference_raw(hlist_first_rcu(head)); \
- pos && ({ prefetch(pos->next); 1; }) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_raw(hlist_next_rcu(pos)))
*/
#define hlist_for_each_entry_rcu_bh(tpos, pos, head, member) \
for (pos = rcu_dereference_bh((head)->first); \
- pos && ({ prefetch(pos->next); 1; }) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_bh(pos->next))
*/
#define hlist_for_each_entry_continue_rcu(tpos, pos, member) \
for (pos = rcu_dereference((pos)->next); \
- pos && ({ prefetch(pos->next); 1; }) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference(pos->next))
*/
#define hlist_for_each_entry_continue_rcu_bh(tpos, pos, member) \
for (pos = rcu_dereference_bh((pos)->next); \
- pos && ({ prefetch(pos->next); 1; }) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
pos = rcu_dereference_bh(pos->next))
extern signed long schedule_timeout_killable(signed long timeout);
extern signed long schedule_timeout_uninterruptible(signed long timeout);
asmlinkage void schedule(void);
-extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner);
struct nsproxy;
struct user_namespace;
/* timestamps */
unsigned long long last_arrival,/* when we last ran on a cpu */
last_queued; /* when we were last queued to run */
-#ifdef CONFIG_SCHEDSTATS
- /* BKL stats */
- unsigned int bkl_count;
-#endif
};
#endif /* defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) */
struct sched_group {
struct sched_group *next; /* Must be a circular list */
+ atomic_t ref;
/*
* CPU power of this group, SCHED_LOAD_SCALE being max power for a
* NOTE: this field is variable length. (Allocated dynamically
* by attaching extra space to the end of the structure,
* depending on how many CPUs the kernel has booted up with)
- *
- * It is also be embedded into static data structures at build
- * time. (See 'struct static_sched_group' in kernel/sched.c)
*/
unsigned long cpumask[0];
};
return to_cpumask(sg->cpumask);
}
-enum sched_domain_level {
- SD_LV_NONE = 0,
- SD_LV_SIBLING,
- SD_LV_MC,
- SD_LV_BOOK,
- SD_LV_CPU,
- SD_LV_NODE,
- SD_LV_ALLNODES,
- SD_LV_MAX
-};
-
struct sched_domain_attr {
int relax_domain_level;
};
.relax_domain_level = -1, \
}
+extern int sched_domain_level_max;
+
struct sched_domain {
/* These fields must be setup */
struct sched_domain *parent; /* top domain must be null terminated */
unsigned int forkexec_idx;
unsigned int smt_gain;
int flags; /* See SD_* */
- enum sched_domain_level level;
+ int level;
/* Runtime fields. */
unsigned long last_balance; /* init to jiffies. units in jiffies */
#ifdef CONFIG_SCHED_DEBUG
char *name;
#endif
+ union {
+ void *private; /* used during construction */
+ struct rcu_head rcu; /* used during destruction */
+ };
unsigned int span_weight;
/*
* NOTE: this field is variable length. (Allocated dynamically
* by attaching extra space to the end of the structure,
* depending on how many CPUs the kernel has booted up with)
- *
- * It is also be embedded into static data structures at build
- * time. (See 'struct static_sched_domain' in kernel/sched.c)
*/
unsigned long span[0];
};
#define WF_FORK 0x02 /* child wakeup after fork */
#define ENQUEUE_WAKEUP 1
-#define ENQUEUE_WAKING 2
-#define ENQUEUE_HEAD 4
+#define ENQUEUE_HEAD 2
+#ifdef CONFIG_SMP
+#define ENQUEUE_WAKING 4 /* sched_class::task_waking was called */
+#else
+#define ENQUEUE_WAKING 0
+#endif
#define DEQUEUE_SLEEP 1
void (*put_prev_task) (struct rq *rq, struct task_struct *p);
#ifdef CONFIG_SMP
- int (*select_task_rq)(struct rq *rq, struct task_struct *p,
- int sd_flag, int flags);
+ int (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
void (*post_schedule) (struct rq *this_rq);
- void (*task_waking) (struct rq *this_rq, struct task_struct *task);
+ void (*task_waking) (struct task_struct *task);
void (*task_woken) (struct rq *this_rq, struct task_struct *task);
void (*set_cpus_allowed)(struct task_struct *p,
unsigned int flags; /* per process flags, defined below */
unsigned int ptrace;
- int lock_depth; /* BKL lock depth */
-
#ifdef CONFIG_SMP
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
- int oncpu;
-#endif
+ struct task_struct *wake_entry;
+ int on_cpu;
#endif
+ int on_rq;
int prio, static_prio, normal_prio;
unsigned int rt_priority;
/* Revert to default priority/policy when forking */
unsigned sched_reset_on_fork:1;
+ unsigned sched_contributes_to_load:1;
pid_t pid;
pid_t tgid;
unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
} memcg_batch;
#endif
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ atomic_t ptrace_bp_refcnt;
+#endif
};
/* Future-safe accessor for struct task_struct's cpus_allowed. */
extern int wake_up_state(struct task_struct *tsk, unsigned int state);
extern int wake_up_process(struct task_struct *tsk);
-extern void wake_up_new_task(struct task_struct *tsk,
- unsigned long clone_flags);
+extern void wake_up_new_task(struct task_struct *tsk);
#ifdef CONFIG_SMP
extern void kick_process(struct task_struct *tsk);
#else
static inline void kick_process(struct task_struct *tsk) { }
#endif
-extern void sched_fork(struct task_struct *p, int clone_flags);
+extern void sched_fork(struct task_struct *p);
extern void sched_dead(struct task_struct *p);
extern void proc_caches_init(void);
extern char *get_task_comm(char *to, struct task_struct *tsk);
#ifdef CONFIG_SMP
+void scheduler_ipi(void);
extern unsigned long wait_task_inactive(struct task_struct *, long match_state);
#else
+static inline void scheduler_ipi(void) { }
static inline unsigned long wait_task_inactive(struct task_struct *p,
long match_state)
{
unsigned ret;
repeat:
- ret = sl->sequence;
- smp_rmb();
+ ret = ACCESS_ONCE(sl->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
+ smp_rmb();
return ret;
}
/* Set a fallback SPROM.
* See kdoc at the function definition for complete documentation. */
-extern int ssb_arch_set_fallback_sprom(const struct ssb_sprom *sprom);
+extern int ssb_arch_register_fallback_sprom(
+ int (*sprom_callback)(struct ssb_bus *bus,
+ struct ssb_sprom *out));
/* Suspend a SSB bus.
* Call this from the parent bus suspend routine. */
extern void argv_free(char **argv);
extern bool sysfs_streq(const char *s1, const char *s2);
+extern int strtobool(const char *s, bool *res);
#ifdef CONFIG_BINARY_PRINTF
int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
#define RPC_TASK_KILLED 0x0100 /* task was killed */
#define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */
#define RPC_TASK_SOFTCONN 0x0400 /* Fail if can't connect */
+#define RPC_TASK_SENT 0x0800 /* message was sent */
+#define RPC_TASK_TIMEOUT 0x1000 /* fail with ETIMEDOUT on timeout */
#define RPC_IS_ASYNC(t) ((t)->tk_flags & RPC_TASK_ASYNC)
#define RPC_IS_SWAPPER(t) ((t)->tk_flags & RPC_TASK_SWAPPER)
#define RPC_DO_ROOTOVERRIDE(t) ((t)->tk_flags & RPC_TASK_ROOTCREDS)
#define RPC_ASSASSINATED(t) ((t)->tk_flags & RPC_TASK_KILLED)
-#define RPC_IS_SOFT(t) ((t)->tk_flags & RPC_TASK_SOFT)
+#define RPC_IS_SOFT(t) ((t)->tk_flags & (RPC_TASK_SOFT|RPC_TASK_TIMEOUT))
#define RPC_IS_SOFTCONN(t) ((t)->tk_flags & RPC_TASK_SOFTCONN)
+#define RPC_WAS_SENT(t) ((t)->tk_flags & RPC_TASK_SENT)
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
struct list_head drivers;
struct sysdev_class_attribute **attrs;
struct kset kset;
-#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
- /* Default operations for these types of devices */
- int (*shutdown)(struct sys_device *);
- int (*suspend)(struct sys_device *, pm_message_t state);
- int (*resume)(struct sys_device *);
-#endif
};
struct sysdev_class_attribute {
struct list_head entry;
int (*add)(struct sys_device *);
int (*remove)(struct sys_device *);
-#ifndef CONFIG_ARCH_NO_SYSDEV_OPS
- int (*shutdown)(struct sys_device *);
- int (*suspend)(struct sys_device *, pm_message_t state);
- int (*resume)(struct sys_device *);
-#endif
};
struct timespec get_monotonic_coarse(void);
void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
struct timespec *wtom, struct timespec *sleep);
+void timekeeping_inject_sleeptime(struct timespec *delta);
#define CURRENT_TIME (current_kernel_time())
#define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 })
#define CLOCK_REALTIME_COARSE 5
#define CLOCK_MONOTONIC_COARSE 6
#define CLOCK_BOOTTIME 7
+#define CLOCK_REALTIME_ALARM 8
+#define CLOCK_BOOTTIME_ALARM 9
/*
* The IDs of various hardware clocks:
static inline void timerqueue_init(struct timerqueue_node *node)
{
- RB_CLEAR_NODE(&node->node);
+ rb_init_node(&node->node);
}
static inline void timerqueue_init_head(struct timerqueue_head *head)
struct tracepoint {
const char *name; /* Tracepoint name */
- int state; /* State. */
+ struct jump_label_key key;
void (*regfunc)(void);
void (*unregfunc)(void);
struct tracepoint_func __rcu *funcs;
extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \
{ \
- JUMP_LABEL(&__tracepoint_##name.state, do_trace); \
- return; \
-do_trace: \
+ if (static_branch(&__tracepoint_##name.key)) \
__DO_TRACE(&__tracepoint_##name, \
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
* structures, so we create an array of pointers that will be used for iteration
* on the tracepoints.
*/
-#define DEFINE_TRACE_FN(name, reg, unreg) \
- static const char __tpstrtab_##name[] \
- __attribute__((section("__tracepoints_strings"))) = #name; \
- struct tracepoint __tracepoint_##name \
- __attribute__((section("__tracepoints"))) = \
- { __tpstrtab_##name, 0, reg, unreg, NULL }; \
- static struct tracepoint * const __tracepoint_ptr_##name __used \
- __attribute__((section("__tracepoints_ptrs"))) = \
+#define DEFINE_TRACE_FN(name, reg, unreg) \
+ static const char __tpstrtab_##name[] \
+ __attribute__((section("__tracepoints_strings"))) = #name; \
+ struct tracepoint __tracepoint_##name \
+ __attribute__((section("__tracepoints"))) = \
+ { __tpstrtab_##name, JUMP_LABEL_INIT, reg, unreg, NULL };\
+ static struct tracepoint * const __tracepoint_ptr_##name __used \
+ __attribute__((section("__tracepoints_ptrs"))) = \
&__tracepoint_##name;
#define DEFINE_TRACE(name) \
# define EVENT_RX_PAUSED 5
# define EVENT_DEV_WAKING 6
# define EVENT_DEV_ASLEEP 7
+# define EVENT_DEV_OPEN 8
};
static inline struct usb_driver *driver_of(struct usb_interface *intf)
V4L2_MBUS_FMT_RGB565_2X8_BE = 0x1007,
V4L2_MBUS_FMT_RGB565_2X8_LE = 0x1008,
- /* YUV (including grey) - next is 0x2013 */
+ /* YUV (including grey) - next is 0x2014 */
V4L2_MBUS_FMT_Y8_1X8 = 0x2001,
V4L2_MBUS_FMT_UYVY8_1_5X8 = 0x2002,
V4L2_MBUS_FMT_VYUY8_1_5X8 = 0x2003,
V4L2_MBUS_FMT_Y10_1X10 = 0x200a,
V4L2_MBUS_FMT_YUYV10_2X10 = 0x200b,
V4L2_MBUS_FMT_YVYU10_2X10 = 0x200c,
+ V4L2_MBUS_FMT_Y12_1X12 = 0x2013,
V4L2_MBUS_FMT_UYVY8_1X16 = 0x200f,
V4L2_MBUS_FMT_VYUY8_1X16 = 0x2010,
V4L2_MBUS_FMT_YUYV8_1X16 = 0x2011,
V4L2_MBUS_FMT_YUYV10_1X20 = 0x200d,
V4L2_MBUS_FMT_YVYU10_1X20 = 0x200e,
- /* Bayer - next is 0x3013 */
+ /* Bayer - next is 0x3015 */
V4L2_MBUS_FMT_SBGGR8_1X8 = 0x3001,
+ V4L2_MBUS_FMT_SGBRG8_1X8 = 0x3013,
V4L2_MBUS_FMT_SGRBG8_1X8 = 0x3002,
+ V4L2_MBUS_FMT_SRGGB8_1X8 = 0x3014,
V4L2_MBUS_FMT_SBGGR10_DPCM8_1X8 = 0x300b,
V4L2_MBUS_FMT_SGBRG10_DPCM8_1X8 = 0x300c,
V4L2_MBUS_FMT_SGRBG10_DPCM8_1X8 = 0x3009,
#define V4L2_PIX_FMT_Y4 v4l2_fourcc('Y', '0', '4', ' ') /* 4 Greyscale */
#define V4L2_PIX_FMT_Y6 v4l2_fourcc('Y', '0', '6', ' ') /* 6 Greyscale */
#define V4L2_PIX_FMT_Y10 v4l2_fourcc('Y', '1', '0', ' ') /* 10 Greyscale */
+#define V4L2_PIX_FMT_Y12 v4l2_fourcc('Y', '1', '2', ' ') /* 12 Greyscale */
#define V4L2_PIX_FMT_Y16 v4l2_fourcc('Y', '1', '6', ' ') /* 16 Greyscale */
/* Palette formats */
({ \
struct v4l2_subdev *__sd; \
__v4l2_device_call_subdevs_until_err_p(v4l2_dev, __sd, cond, o, \
- f, args...); \
+ f , ##args); \
})
/* Call the specified callback for all subdevs matching grp_id (if 0, then
return outer;
}
-#define INET_ECN_xmit(sk) do { inet_sk(sk)->tos |= INET_ECN_ECT_0; } while (0)
-#define INET_ECN_dontxmit(sk) \
- do { inet_sk(sk)->tos &= ~INET_ECN_MASK; } while (0)
+static inline void INET_ECN_xmit(struct sock *sk)
+{
+ inet_sk(sk)->tos |= INET_ECN_ECT_0;
+ if (inet6_sk(sk) != NULL)
+ inet6_sk(sk)->tclass |= INET_ECN_ECT_0;
+}
+
+static inline void INET_ECN_dontxmit(struct sock *sk)
+{
+ inet_sk(sk)->tos &= ~INET_ECN_MASK;
+ if (inet6_sk(sk) != NULL)
+ inet6_sk(sk)->tclass &= ~INET_ECN_MASK;
+}
#define IP6_ECN_flow_init(label) do { \
(label) &= ~htonl(INET_ECN_MASK << 20); \
/* IPVS in network namespace */
struct netns_ipvs {
int gen; /* Generation */
+ int enable; /* enable like nf_hooks do */
/*
* Hash table: for real service lookups
*/
atomic_inc(&ctl_cp->n_control);
}
+/*
+ * IPVS netns init & cleanup functions
+ */
+extern int __ip_vs_estimator_init(struct net *net);
+extern int __ip_vs_control_init(struct net *net);
+extern int __ip_vs_protocol_init(struct net *net);
+extern int __ip_vs_app_init(struct net *net);
+extern int __ip_vs_conn_init(struct net *net);
+extern int __ip_vs_sync_init(struct net *net);
+extern void __ip_vs_conn_cleanup(struct net *net);
+extern void __ip_vs_app_cleanup(struct net *net);
+extern void __ip_vs_protocol_cleanup(struct net *net);
+extern void __ip_vs_control_cleanup(struct net *net);
+extern void __ip_vs_estimator_cleanup(struct net *net);
+extern void __ip_vs_sync_cleanup(struct net *net);
+extern void __ip_vs_service_cleanup(struct net *net);
/*
* IPVS application functions
u8 ssap;
u8 ctrl_1;
u8 ctrl_2;
-};
+} __packed;
static inline struct llc_pdu_sn *llc_pdu_sn_hdr(struct sk_buff *skb)
{
u8 dsap;
u8 ssap;
u8 ctrl_1;
-};
+} __packed;
static inline struct llc_pdu_un *llc_pdu_un_hdr(struct sk_buff *skb)
{
u8 fmt_id; /* always 0x81 for LLC */
u8 type; /* different if NULL/non-NULL LSAP */
u8 rw; /* sender receive window */
-};
+} __packed;
/**
* llc_pdu_init_as_xid_cmd - sets bytes 3, 4 & 5 of LLC header as XID
u8 curr_ssv; /* current send state variable val */
u8 curr_rsv; /* current receive state variable */
u8 ind_bits; /* indicator bits set with macro */
-};
+} __packed;
extern void llc_pdu_set_cmd_rsp(struct sk_buff *skb, u8 type);
extern void llc_pdu_set_pf_bit(struct sk_buff *skb, u8 bit_value);
int (*tmpl_sort)(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n);
int (*state_sort)(struct xfrm_state **dst, struct xfrm_state **src, int n);
int (*output)(struct sk_buff *skb);
+ int (*output_finish)(struct sk_buff *skb);
int (*extract_input)(struct xfrm_state *x,
struct sk_buff *skb);
int (*extract_output)(struct xfrm_state *x,
extern int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
extern int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
extern int xfrm4_output(struct sk_buff *skb);
+extern int xfrm4_output_finish(struct sk_buff *skb);
extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
extern int xfrm6_extract_header(struct sk_buff *skb);
extern int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
extern int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
extern int xfrm6_output(struct sk_buff *skb);
+extern int xfrm6_output_finish(struct sk_buff *skb);
extern int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
u8 **prevhdr);
IW_CM_EVENT_CLOSE /* close complete */
};
-enum iw_cm_event_status {
- IW_CM_EVENT_STATUS_OK = 0, /* request successful */
- IW_CM_EVENT_STATUS_ACCEPTED = 0, /* connect request accepted */
- IW_CM_EVENT_STATUS_REJECTED, /* connect request rejected */
- IW_CM_EVENT_STATUS_TIMEOUT, /* the operation timed out */
- IW_CM_EVENT_STATUS_RESET, /* reset from remote peer */
- IW_CM_EVENT_STATUS_EINVAL, /* asynchronous failure for bad parm */
-};
-
struct iw_cm_event {
enum iw_cm_event_type event;
- enum iw_cm_event_status status;
+ int status;
struct sockaddr_in local_addr;
struct sockaddr_in remote_addr;
void *private_data;
*/
void rdma_set_service_type(struct rdma_cm_id *id, int tos);
+/**
+ * rdma_set_reuseaddr - Allow the reuse of local addresses when binding
+ * the rdma_cm_id.
+ * @id: Communication identifier to configure.
+ * @reuse: Value indicating if the bound address is reusable.
+ *
+ * Reuse must be set before an address is bound to the id.
+ */
+int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse);
+
#endif /* RDMA_CM_H */
/* Option details */
enum {
- RDMA_OPTION_ID_TOS = 0,
- RDMA_OPTION_IB_PATH = 1
+ RDMA_OPTION_ID_TOS = 0,
+ RDMA_OPTION_ID_REUSEADDR = 1,
+ RDMA_OPTION_IB_PATH = 1
};
struct rdma_ucm_set_option {
sdev_dev;
struct execute_work ew; /* used to get process context on put */
+ struct work_struct requeue_work;
struct scsi_dh_data *scsi_dh_data;
enum scsi_device_state sdev_state;
*/
#define show_gfp_flags(flags) \
(flags) ? __print_flags(flags, "|", \
+ {(unsigned long)GFP_TRANSHUGE, "GFP_TRANSHUGE"}, \
{(unsigned long)GFP_HIGHUSER_MOVABLE, "GFP_HIGHUSER_MOVABLE"}, \
{(unsigned long)GFP_HIGHUSER, "GFP_HIGHUSER"}, \
{(unsigned long)GFP_USER, "GFP_USER"}, \
{(unsigned long)__GFP_HARDWALL, "GFP_HARDWALL"}, \
{(unsigned long)__GFP_THISNODE, "GFP_THISNODE"}, \
{(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
- {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"} \
+ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
+ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
+ {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \
+ {(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
) : "GFP_NOWAIT"
int xen_allocate_pirq_msi(struct pci_dev *dev, struct msi_desc *msidesc);
/* Bind an PSI pirq to an irq. */
int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc,
- int pirq, int vector, const char *name);
+ int pirq, int vector, const char *name,
+ domid_t domid);
#endif
/* De-allocates the above mentioned physical interrupt. */
/* Return irq from pirq */
int xen_irq_from_pirq(unsigned pirq);
+/* Return the pirq allocated to the irq. */
+int xen_pirq_from_irq(unsigned irq);
+
+/* Determine whether to ignore this IRQ if it is passed to a guest. */
+int xen_test_irq_shared(int irq);
+
#endif /* _XEN_EVENTS_H */
desktop applications. Task group autogeneration is currently based
upon task session.
+config SCHED_TTWU_QUEUE
+ bool
+ depends on !SPARC32
+ default y
+
config MM_OWNER
bool
environments which can tolerate a "non-standard" kernel.
Only use this if you really know what you are doing.
-config EMBEDDED
- bool "Embedded system"
- select EXPERT
- help
- This option should be enabled if compiling the kernel for
- an embedded system so certain expert options are available
- for configuration.
-
config UID16
bool "Enable 16-bit UID system calls" if EXPERT
depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && COMPAT) || UML || (X86_64 && IA32_EMULATION)
by some high performance threaded applications. Disabling
this option saves about 7k.
+config EMBEDDED
+ bool "Embedded system"
+ select EXPERT
+ help
+ This option should be enabled if compiling the kernel for
+ an embedded system so certain expert options are available
+ for configuration.
+
config HAVE_PERF_EVENTS
bool
help
#endif
page_cgroup_init();
enable_debug_pagealloc();
- kmemleak_init();
debug_objects_mem_init();
+ kmemleak_init();
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
CFLAGS_REMOVE_rtmutex-debug.o = -pg
CFLAGS_REMOVE_cgroup-debug.o = -pg
CFLAGS_REMOVE_sched_clock.o = -pg
-CFLAGS_REMOVE_perf_event.o = -pg
CFLAGS_REMOVE_irq_work.o = -pg
endif
obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_SMP) += sched_cpupri.o
obj-$(CONFIG_IRQ_WORK) += irq_work.o
-obj-$(CONFIG_PERF_EVENTS) += perf_event.o
-obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
+
+obj-$(CONFIG_PERF_EVENTS) += events/
+
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
return ns_capable(task_cred_xxx(t, user)->user_ns, cap);
}
EXPORT_SYMBOL(task_ns_capable);
+
+/**
+ * nsown_capable - Check superior capability to one's own user_ns
+ * @cap: The capability in question
+ *
+ * Return true if the current task has the given superior capability
+ * targeted at its own user namespace.
+ */
+bool nsown_capable(int cap)
+{
+ return ns_capable(current_user_ns(), cap);
+}
static int update_relax_domain_level(struct cpuset *cs, s64 val)
{
#ifdef CONFIG_SMP
- if (val < -1 || val >= SD_LV_MAX)
+ if (val < -1 || val >= sched_domain_level_max)
return -EINVAL;
#endif
.cap_effective = CAP_INIT_EFF_SET,
.cap_bset = CAP_INIT_BSET,
.user = INIT_USER,
+ .user_ns = &init_user_ns,
.group_info = &init_groups,
#ifdef CONFIG_KEYS
.tgcred = &init_tgcred,
goto error_put;
}
+ /* cache user_ns in cred. Doesn't need a refcount because it will
+ * stay pinned by cred->user
+ */
+ new->user_ns = new->user->user_ns;
+
#ifdef CONFIG_KEYS
/* new threads get their own thread keyrings if their parent already
* had one */
}
EXPORT_SYMBOL(set_create_files_as);
-struct user_namespace *current_user_ns(void)
-{
- return _current_user_ns();
-}
-EXPORT_SYMBOL(current_user_ns);
-
#ifdef CONFIG_DEBUG_CREDENTIALS
bool creds_are_invalid(const struct cred *cred)
--- /dev/null
+ifdef CONFIG_FUNCTION_TRACER
+CFLAGS_REMOVE_core.o = -pg
+endif
+
+obj-y := core.o
+obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
--- /dev/null
+/*
+ * Performance events core code:
+ *
+ * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
+ * Copyright (C) 2008-2011 Red Hat, Inc., Ingo Molnar
+ * Copyright (C) 2008-2011 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ * Copyright © 2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
+ *
+ * For licensing details see kernel-base/COPYING
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+#include <linux/idr.h>
+#include <linux/file.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/sysfs.h>
+#include <linux/dcache.h>
+#include <linux/percpu.h>
+#include <linux/ptrace.h>
+#include <linux/reboot.h>
+#include <linux/vmstat.h>
+#include <linux/device.h>
+#include <linux/vmalloc.h>
+#include <linux/hardirq.h>
+#include <linux/rculist.h>
+#include <linux/uaccess.h>
+#include <linux/syscalls.h>
+#include <linux/anon_inodes.h>
+#include <linux/kernel_stat.h>
+#include <linux/perf_event.h>
+#include <linux/ftrace_event.h>
+#include <linux/hw_breakpoint.h>
+
+#include <asm/irq_regs.h>
+
+struct remote_function_call {
+ struct task_struct *p;
+ int (*func)(void *info);
+ void *info;
+ int ret;
+};
+
+static void remote_function(void *data)
+{
+ struct remote_function_call *tfc = data;
+ struct task_struct *p = tfc->p;
+
+ if (p) {
+ tfc->ret = -EAGAIN;
+ if (task_cpu(p) != smp_processor_id() || !task_curr(p))
+ return;
+ }
+
+ tfc->ret = tfc->func(tfc->info);
+}
+
+/**
+ * task_function_call - call a function on the cpu on which a task runs
+ * @p: the task to evaluate
+ * @func: the function to be called
+ * @info: the function call argument
+ *
+ * Calls the function @func when the task is currently running. This might
+ * be on the current CPU, which just calls the function directly
+ *
+ * returns: @func return value, or
+ * -ESRCH - when the process isn't running
+ * -EAGAIN - when the process moved away
+ */
+static int
+task_function_call(struct task_struct *p, int (*func) (void *info), void *info)
+{
+ struct remote_function_call data = {
+ .p = p,
+ .func = func,
+ .info = info,
+ .ret = -ESRCH, /* No such (running) process */
+ };
+
+ if (task_curr(p))
+ smp_call_function_single(task_cpu(p), remote_function, &data, 1);
+
+ return data.ret;
+}
+
+/**
+ * cpu_function_call - call a function on the cpu
+ * @func: the function to be called
+ * @info: the function call argument
+ *
+ * Calls the function @func on the remote cpu.
+ *
+ * returns: @func return value or -ENXIO when the cpu is offline
+ */
+static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
+{
+ struct remote_function_call data = {
+ .p = NULL,
+ .func = func,
+ .info = info,
+ .ret = -ENXIO, /* No such CPU */
+ };
+
+ smp_call_function_single(cpu, remote_function, &data, 1);
+
+ return data.ret;
+}
+
+#define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
+ PERF_FLAG_FD_OUTPUT |\
+ PERF_FLAG_PID_CGROUP)
+
+enum event_type_t {
+ EVENT_FLEXIBLE = 0x1,
+ EVENT_PINNED = 0x2,
+ EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
+};
+
+/*
+ * perf_sched_events : >0 events exist
+ * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
+ */
+struct jump_label_key perf_sched_events __read_mostly;
+static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
+
+static atomic_t nr_mmap_events __read_mostly;
+static atomic_t nr_comm_events __read_mostly;
+static atomic_t nr_task_events __read_mostly;
+
+static LIST_HEAD(pmus);
+static DEFINE_MUTEX(pmus_lock);
+static struct srcu_struct pmus_srcu;
+
+/*
+ * perf event paranoia level:
+ * -1 - not paranoid at all
+ * 0 - disallow raw tracepoint access for unpriv
+ * 1 - disallow cpu events for unpriv
+ * 2 - disallow kernel profiling for unpriv
+ */
+int sysctl_perf_event_paranoid __read_mostly = 1;
+
+/* Minimum for 512 kiB + 1 user control page */
+int sysctl_perf_event_mlock __read_mostly = 512 + (PAGE_SIZE / 1024); /* 'free' kiB per user */
+
+/*
+ * max perf event sample rate
+ */
+#define DEFAULT_MAX_SAMPLE_RATE 100000
+int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
+static int max_samples_per_tick __read_mostly =
+ DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
+
+int perf_proc_update_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret = proc_dointvec(table, write, buffer, lenp, ppos);
+
+ if (ret || !write)
+ return ret;
+
+ max_samples_per_tick = DIV_ROUND_UP(sysctl_perf_event_sample_rate, HZ);
+
+ return 0;
+}
+
+static atomic64_t perf_event_id;
+
+static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type);
+
+static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type,
+ struct task_struct *task);
+
+static void update_context_time(struct perf_event_context *ctx);
+static u64 perf_event_time(struct perf_event *event);
+
+void __weak perf_event_print_debug(void) { }
+
+extern __weak const char *perf_pmu_name(void)
+{
+ return "pmu";
+}
+
+static inline u64 perf_clock(void)
+{
+ return local_clock();
+}
+
+static inline struct perf_cpu_context *
+__get_cpu_context(struct perf_event_context *ctx)
+{
+ return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
+}
+
+#ifdef CONFIG_CGROUP_PERF
+
+/*
+ * Must ensure cgroup is pinned (css_get) before calling
+ * this function. In other words, we cannot call this function
+ * if there is no cgroup event for the current CPU context.
+ */
+static inline struct perf_cgroup *
+perf_cgroup_from_task(struct task_struct *task)
+{
+ return container_of(task_subsys_state(task, perf_subsys_id),
+ struct perf_cgroup, css);
+}
+
+static inline bool
+perf_cgroup_match(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+ return !event->cgrp || event->cgrp == cpuctx->cgrp;
+}
+
+static inline void perf_get_cgroup(struct perf_event *event)
+{
+ css_get(&event->cgrp->css);
+}
+
+static inline void perf_put_cgroup(struct perf_event *event)
+{
+ css_put(&event->cgrp->css);
+}
+
+static inline void perf_detach_cgroup(struct perf_event *event)
+{
+ perf_put_cgroup(event);
+ event->cgrp = NULL;
+}
+
+static inline int is_cgroup_event(struct perf_event *event)
+{
+ return event->cgrp != NULL;
+}
+
+static inline u64 perf_cgroup_event_time(struct perf_event *event)
+{
+ struct perf_cgroup_info *t;
+
+ t = per_cpu_ptr(event->cgrp->info, event->cpu);
+ return t->time;
+}
+
+static inline void __update_cgrp_time(struct perf_cgroup *cgrp)
+{
+ struct perf_cgroup_info *info;
+ u64 now;
+
+ now = perf_clock();
+
+ info = this_cpu_ptr(cgrp->info);
+
+ info->time += now - info->timestamp;
+ info->timestamp = now;
+}
+
+static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
+{
+ struct perf_cgroup *cgrp_out = cpuctx->cgrp;
+ if (cgrp_out)
+ __update_cgrp_time(cgrp_out);
+}
+
+static inline void update_cgrp_time_from_event(struct perf_event *event)
+{
+ struct perf_cgroup *cgrp;
+
+ /*
+ * ensure we access cgroup data only when needed and
+ * when we know the cgroup is pinned (css_get)
+ */
+ if (!is_cgroup_event(event))
+ return;
+
+ cgrp = perf_cgroup_from_task(current);
+ /*
+ * Do not update time when cgroup is not active
+ */
+ if (cgrp == event->cgrp)
+ __update_cgrp_time(event->cgrp);
+}
+
+static inline void
+perf_cgroup_set_timestamp(struct task_struct *task,
+ struct perf_event_context *ctx)
+{
+ struct perf_cgroup *cgrp;
+ struct perf_cgroup_info *info;
+
+ /*
+ * ctx->lock held by caller
+ * ensure we do not access cgroup data
+ * unless we have the cgroup pinned (css_get)
+ */
+ if (!task || !ctx->nr_cgroups)
+ return;
+
+ cgrp = perf_cgroup_from_task(task);
+ info = this_cpu_ptr(cgrp->info);
+ info->timestamp = ctx->timestamp;
+}
+
+#define PERF_CGROUP_SWOUT 0x1 /* cgroup switch out every event */
+#define PERF_CGROUP_SWIN 0x2 /* cgroup switch in events based on task */
+
+/*
+ * reschedule events based on the cgroup constraint of task.
+ *
+ * mode SWOUT : schedule out everything
+ * mode SWIN : schedule in based on cgroup for next
+ */
+void perf_cgroup_switch(struct task_struct *task, int mode)
+{
+ struct perf_cpu_context *cpuctx;
+ struct pmu *pmu;
+ unsigned long flags;
+
+ /*
+ * disable interrupts to avoid geting nr_cgroup
+ * changes via __perf_event_disable(). Also
+ * avoids preemption.
+ */
+ local_irq_save(flags);
+
+ /*
+ * we reschedule only in the presence of cgroup
+ * constrained events.
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+
+ cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+ perf_pmu_disable(cpuctx->ctx.pmu);
+
+ /*
+ * perf_cgroup_events says at least one
+ * context on this CPU has cgroup events.
+ *
+ * ctx->nr_cgroups reports the number of cgroup
+ * events for a context.
+ */
+ if (cpuctx->ctx.nr_cgroups > 0) {
+
+ if (mode & PERF_CGROUP_SWOUT) {
+ cpu_ctx_sched_out(cpuctx, EVENT_ALL);
+ /*
+ * must not be done before ctxswout due
+ * to event_filter_match() in event_sched_out()
+ */
+ cpuctx->cgrp = NULL;
+ }
+
+ if (mode & PERF_CGROUP_SWIN) {
+ WARN_ON_ONCE(cpuctx->cgrp);
+ /* set cgrp before ctxsw in to
+ * allow event_filter_match() to not
+ * have to pass task around
+ */
+ cpuctx->cgrp = perf_cgroup_from_task(task);
+ cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
+ }
+ }
+
+ perf_pmu_enable(cpuctx->ctx.pmu);
+ }
+
+ rcu_read_unlock();
+
+ local_irq_restore(flags);
+}
+
+static inline void perf_cgroup_sched_out(struct task_struct *task)
+{
+ perf_cgroup_switch(task, PERF_CGROUP_SWOUT);
+}
+
+static inline void perf_cgroup_sched_in(struct task_struct *task)
+{
+ perf_cgroup_switch(task, PERF_CGROUP_SWIN);
+}
+
+static inline int perf_cgroup_connect(int fd, struct perf_event *event,
+ struct perf_event_attr *attr,
+ struct perf_event *group_leader)
+{
+ struct perf_cgroup *cgrp;
+ struct cgroup_subsys_state *css;
+ struct file *file;
+ int ret = 0, fput_needed;
+
+ file = fget_light(fd, &fput_needed);
+ if (!file)
+ return -EBADF;
+
+ css = cgroup_css_from_dir(file, perf_subsys_id);
+ if (IS_ERR(css)) {
+ ret = PTR_ERR(css);
+ goto out;
+ }
+
+ cgrp = container_of(css, struct perf_cgroup, css);
+ event->cgrp = cgrp;
+
+ /* must be done before we fput() the file */
+ perf_get_cgroup(event);
+
+ /*
+ * all events in a group must monitor
+ * the same cgroup because a task belongs
+ * to only one perf cgroup at a time
+ */
+ if (group_leader && group_leader->cgrp != cgrp) {
+ perf_detach_cgroup(event);
+ ret = -EINVAL;
+ }
+out:
+ fput_light(file, fput_needed);
+ return ret;
+}
+
+static inline void
+perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
+{
+ struct perf_cgroup_info *t;
+ t = per_cpu_ptr(event->cgrp->info, event->cpu);
+ event->shadow_ctx_time = now - t->timestamp;
+}
+
+static inline void
+perf_cgroup_defer_enabled(struct perf_event *event)
+{
+ /*
+ * when the current task's perf cgroup does not match
+ * the event's, we need to remember to call the
+ * perf_mark_enable() function the first time a task with
+ * a matching perf cgroup is scheduled in.
+ */
+ if (is_cgroup_event(event) && !perf_cgroup_match(event))
+ event->cgrp_defer_enabled = 1;
+}
+
+static inline void
+perf_cgroup_mark_enabled(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *sub;
+ u64 tstamp = perf_event_time(event);
+
+ if (!event->cgrp_defer_enabled)
+ return;
+
+ event->cgrp_defer_enabled = 0;
+
+ event->tstamp_enabled = tstamp - event->total_time_enabled;
+ list_for_each_entry(sub, &event->sibling_list, group_entry) {
+ if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
+ sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+ sub->cgrp_defer_enabled = 0;
+ }
+ }
+}
+#else /* !CONFIG_CGROUP_PERF */
+
+static inline bool
+perf_cgroup_match(struct perf_event *event)
+{
+ return true;
+}
+
+static inline void perf_detach_cgroup(struct perf_event *event)
+{}
+
+static inline int is_cgroup_event(struct perf_event *event)
+{
+ return 0;
+}
+
+static inline u64 perf_cgroup_event_cgrp_time(struct perf_event *event)
+{
+ return 0;
+}
+
+static inline void update_cgrp_time_from_event(struct perf_event *event)
+{
+}
+
+static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
+{
+}
+
+static inline void perf_cgroup_sched_out(struct task_struct *task)
+{
+}
+
+static inline void perf_cgroup_sched_in(struct task_struct *task)
+{
+}
+
+static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event,
+ struct perf_event_attr *attr,
+ struct perf_event *group_leader)
+{
+ return -EINVAL;
+}
+
+static inline void
+perf_cgroup_set_timestamp(struct task_struct *task,
+ struct perf_event_context *ctx)
+{
+}
+
+void
+perf_cgroup_switch(struct task_struct *task, struct task_struct *next)
+{
+}
+
+static inline void
+perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
+{
+}
+
+static inline u64 perf_cgroup_event_time(struct perf_event *event)
+{
+ return 0;
+}
+
+static inline void
+perf_cgroup_defer_enabled(struct perf_event *event)
+{
+}
+
+static inline void
+perf_cgroup_mark_enabled(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+}
+#endif
+
+void perf_pmu_disable(struct pmu *pmu)
+{
+ int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ if (!(*count)++)
+ pmu->pmu_disable(pmu);
+}
+
+void perf_pmu_enable(struct pmu *pmu)
+{
+ int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ if (!--(*count))
+ pmu->pmu_enable(pmu);
+}
+
+static DEFINE_PER_CPU(struct list_head, rotation_list);
+
+/*
+ * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * because they're strictly cpu affine and rotate_start is called with IRQs
+ * disabled, while rotate_context is called from IRQ context.
+ */
+static void perf_pmu_rotate_start(struct pmu *pmu)
+{
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+ struct list_head *head = &__get_cpu_var(rotation_list);
+
+ WARN_ON(!irqs_disabled());
+
+ if (list_empty(&cpuctx->rotation_list))
+ list_add(&cpuctx->rotation_list, head);
+}
+
+static void get_ctx(struct perf_event_context *ctx)
+{
+ WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
+}
+
+static void free_ctx(struct rcu_head *head)
+{
+ struct perf_event_context *ctx;
+
+ ctx = container_of(head, struct perf_event_context, rcu_head);
+ kfree(ctx);
+}
+
+static void put_ctx(struct perf_event_context *ctx)
+{
+ if (atomic_dec_and_test(&ctx->refcount)) {
+ if (ctx->parent_ctx)
+ put_ctx(ctx->parent_ctx);
+ if (ctx->task)
+ put_task_struct(ctx->task);
+ call_rcu(&ctx->rcu_head, free_ctx);
+ }
+}
+
+static void unclone_ctx(struct perf_event_context *ctx)
+{
+ if (ctx->parent_ctx) {
+ put_ctx(ctx->parent_ctx);
+ ctx->parent_ctx = NULL;
+ }
+}
+
+static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
+{
+ /*
+ * only top level events have the pid namespace they were created in
+ */
+ if (event->parent)
+ event = event->parent;
+
+ return task_tgid_nr_ns(p, event->ns);
+}
+
+static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
+{
+ /*
+ * only top level events have the pid namespace they were created in
+ */
+ if (event->parent)
+ event = event->parent;
+
+ return task_pid_nr_ns(p, event->ns);
+}
+
+/*
+ * If we inherit events we want to return the parent event id
+ * to userspace.
+ */
+static u64 primary_event_id(struct perf_event *event)
+{
+ u64 id = event->id;
+
+ if (event->parent)
+ id = event->parent->id;
+
+ return id;
+}
+
+/*
+ * Get the perf_event_context for a task and lock it.
+ * This has to cope with with the fact that until it is locked,
+ * the context could get moved to another task.
+ */
+static struct perf_event_context *
+perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
+{
+ struct perf_event_context *ctx;
+
+ rcu_read_lock();
+retry:
+ ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
+ if (ctx) {
+ /*
+ * If this context is a clone of another, it might
+ * get swapped for another underneath us by
+ * perf_event_task_sched_out, though the
+ * rcu_read_lock() protects us from any context
+ * getting freed. Lock the context and check if it
+ * got swapped before we could get the lock, and retry
+ * if so. If we locked the right context, then it
+ * can't get swapped on us any more.
+ */
+ raw_spin_lock_irqsave(&ctx->lock, *flags);
+ if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
+ raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+ goto retry;
+ }
+
+ if (!atomic_inc_not_zero(&ctx->refcount)) {
+ raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+ ctx = NULL;
+ }
+ }
+ rcu_read_unlock();
+ return ctx;
+}
+
+/*
+ * Get the context for a task and increment its pin_count so it
+ * can't get swapped to another task. This also increments its
+ * reference count so that the context can't get freed.
+ */
+static struct perf_event_context *
+perf_pin_task_context(struct task_struct *task, int ctxn)
+{
+ struct perf_event_context *ctx;
+ unsigned long flags;
+
+ ctx = perf_lock_task_context(task, ctxn, &flags);
+ if (ctx) {
+ ++ctx->pin_count;
+ raw_spin_unlock_irqrestore(&ctx->lock, flags);
+ }
+ return ctx;
+}
+
+static void perf_unpin_context(struct perf_event_context *ctx)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&ctx->lock, flags);
+ --ctx->pin_count;
+ raw_spin_unlock_irqrestore(&ctx->lock, flags);
+}
+
+/*
+ * Update the record of the current time in a context.
+ */
+static void update_context_time(struct perf_event_context *ctx)
+{
+ u64 now = perf_clock();
+
+ ctx->time += now - ctx->timestamp;
+ ctx->timestamp = now;
+}
+
+static u64 perf_event_time(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+
+ if (is_cgroup_event(event))
+ return perf_cgroup_event_time(event);
+
+ return ctx ? ctx->time : 0;
+}
+
+/*
+ * Update the total_time_enabled and total_time_running fields for a event.
+ */
+static void update_event_times(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+ u64 run_end;
+
+ if (event->state < PERF_EVENT_STATE_INACTIVE ||
+ event->group_leader->state < PERF_EVENT_STATE_INACTIVE)
+ return;
+ /*
+ * in cgroup mode, time_enabled represents
+ * the time the event was enabled AND active
+ * tasks were in the monitored cgroup. This is
+ * independent of the activity of the context as
+ * there may be a mix of cgroup and non-cgroup events.
+ *
+ * That is why we treat cgroup events differently
+ * here.
+ */
+ if (is_cgroup_event(event))
+ run_end = perf_event_time(event);
+ else if (ctx->is_active)
+ run_end = ctx->time;
+ else
+ run_end = event->tstamp_stopped;
+
+ event->total_time_enabled = run_end - event->tstamp_enabled;
+
+ if (event->state == PERF_EVENT_STATE_INACTIVE)
+ run_end = event->tstamp_stopped;
+ else
+ run_end = perf_event_time(event);
+
+ event->total_time_running = run_end - event->tstamp_running;
+
+}
+
+/*
+ * Update total_time_enabled and total_time_running for all events in a group.
+ */
+static void update_group_times(struct perf_event *leader)
+{
+ struct perf_event *event;
+
+ update_event_times(leader);
+ list_for_each_entry(event, &leader->sibling_list, group_entry)
+ update_event_times(event);
+}
+
+static struct list_head *
+ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
+{
+ if (event->attr.pinned)
+ return &ctx->pinned_groups;
+ else
+ return &ctx->flexible_groups;
+}
+
+/*
+ * Add a event from the lists for its context.
+ * Must be called with ctx->mutex and ctx->lock held.
+ */
+static void
+list_add_event(struct perf_event *event, struct perf_event_context *ctx)
+{
+ WARN_ON_ONCE(event->attach_state & PERF_ATTACH_CONTEXT);
+ event->attach_state |= PERF_ATTACH_CONTEXT;
+
+ /*
+ * If we're a stand alone event or group leader, we go to the context
+ * list, group events are kept attached to the group so that
+ * perf_group_detach can, at all times, locate all siblings.
+ */
+ if (event->group_leader == event) {
+ struct list_head *list;
+
+ if (is_software_event(event))
+ event->group_flags |= PERF_GROUP_SOFTWARE;
+
+ list = ctx_group_list(event, ctx);
+ list_add_tail(&event->group_entry, list);
+ }
+
+ if (is_cgroup_event(event))
+ ctx->nr_cgroups++;
+
+ list_add_rcu(&event->event_entry, &ctx->event_list);
+ if (!ctx->nr_events)
+ perf_pmu_rotate_start(ctx->pmu);
+ ctx->nr_events++;
+ if (event->attr.inherit_stat)
+ ctx->nr_stat++;
+}
+
+/*
+ * Called at perf_event creation and when events are attached/detached from a
+ * group.
+ */
+static void perf_event__read_size(struct perf_event *event)
+{
+ int entry = sizeof(u64); /* value */
+ int size = 0;
+ int nr = 1;
+
+ if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ size += sizeof(u64);
+
+ if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ size += sizeof(u64);
+
+ if (event->attr.read_format & PERF_FORMAT_ID)
+ entry += sizeof(u64);
+
+ if (event->attr.read_format & PERF_FORMAT_GROUP) {
+ nr += event->group_leader->nr_siblings;
+ size += sizeof(u64);
+ }
+
+ size += entry * nr;
+ event->read_size = size;
+}
+
+static void perf_event__header_size(struct perf_event *event)
+{
+ struct perf_sample_data *data;
+ u64 sample_type = event->attr.sample_type;
+ u16 size = 0;
+
+ perf_event__read_size(event);
+
+ if (sample_type & PERF_SAMPLE_IP)
+ size += sizeof(data->ip);
+
+ if (sample_type & PERF_SAMPLE_ADDR)
+ size += sizeof(data->addr);
+
+ if (sample_type & PERF_SAMPLE_PERIOD)
+ size += sizeof(data->period);
+
+ if (sample_type & PERF_SAMPLE_READ)
+ size += event->read_size;
+
+ event->header_size = size;
+}
+
+static void perf_event__id_header_size(struct perf_event *event)
+{
+ struct perf_sample_data *data;
+ u64 sample_type = event->attr.sample_type;
+ u16 size = 0;
+
+ if (sample_type & PERF_SAMPLE_TID)
+ size += sizeof(data->tid_entry);
+
+ if (sample_type & PERF_SAMPLE_TIME)
+ size += sizeof(data->time);
+
+ if (sample_type & PERF_SAMPLE_ID)
+ size += sizeof(data->id);
+
+ if (sample_type & PERF_SAMPLE_STREAM_ID)
+ size += sizeof(data->stream_id);
+
+ if (sample_type & PERF_SAMPLE_CPU)
+ size += sizeof(data->cpu_entry);
+
+ event->id_header_size = size;
+}
+
+static void perf_group_attach(struct perf_event *event)
+{
+ struct perf_event *group_leader = event->group_leader, *pos;
+
+ /*
+ * We can have double attach due to group movement in perf_event_open.
+ */
+ if (event->attach_state & PERF_ATTACH_GROUP)
+ return;
+
+ event->attach_state |= PERF_ATTACH_GROUP;
+
+ if (group_leader == event)
+ return;
+
+ if (group_leader->group_flags & PERF_GROUP_SOFTWARE &&
+ !is_software_event(event))
+ group_leader->group_flags &= ~PERF_GROUP_SOFTWARE;
+
+ list_add_tail(&event->group_entry, &group_leader->sibling_list);
+ group_leader->nr_siblings++;
+
+ perf_event__header_size(group_leader);
+
+ list_for_each_entry(pos, &group_leader->sibling_list, group_entry)
+ perf_event__header_size(pos);
+}
+
+/*
+ * Remove a event from the lists for its context.
+ * Must be called with ctx->mutex and ctx->lock held.
+ */
+static void
+list_del_event(struct perf_event *event, struct perf_event_context *ctx)
+{
+ struct perf_cpu_context *cpuctx;
+ /*
+ * We can have double detach due to exit/hot-unplug + close.
+ */
+ if (!(event->attach_state & PERF_ATTACH_CONTEXT))
+ return;
+
+ event->attach_state &= ~PERF_ATTACH_CONTEXT;
+
+ if (is_cgroup_event(event)) {
+ ctx->nr_cgroups--;
+ cpuctx = __get_cpu_context(ctx);
+ /*
+ * if there are no more cgroup events
+ * then cler cgrp to avoid stale pointer
+ * in update_cgrp_time_from_cpuctx()
+ */
+ if (!ctx->nr_cgroups)
+ cpuctx->cgrp = NULL;
+ }
+
+ ctx->nr_events--;
+ if (event->attr.inherit_stat)
+ ctx->nr_stat--;
+
+ list_del_rcu(&event->event_entry);
+
+ if (event->group_leader == event)
+ list_del_init(&event->group_entry);
+
+ update_group_times(event);
+
+ /*
+ * If event was in error state, then keep it
+ * that way, otherwise bogus counts will be
+ * returned on read(). The only way to get out
+ * of error state is by explicit re-enabling
+ * of the event
+ */
+ if (event->state > PERF_EVENT_STATE_OFF)
+ event->state = PERF_EVENT_STATE_OFF;
+}
+
+static void perf_group_detach(struct perf_event *event)
+{
+ struct perf_event *sibling, *tmp;
+ struct list_head *list = NULL;
+
+ /*
+ * We can have double detach due to exit/hot-unplug + close.
+ */
+ if (!(event->attach_state & PERF_ATTACH_GROUP))
+ return;
+
+ event->attach_state &= ~PERF_ATTACH_GROUP;
+
+ /*
+ * If this is a sibling, remove it from its group.
+ */
+ if (event->group_leader != event) {
+ list_del_init(&event->group_entry);
+ event->group_leader->nr_siblings--;
+ goto out;
+ }
+
+ if (!list_empty(&event->group_entry))
+ list = &event->group_entry;
+
+ /*
+ * If this was a group event with sibling events then
+ * upgrade the siblings to singleton events by adding them
+ * to whatever list we are on.
+ */
+ list_for_each_entry_safe(sibling, tmp, &event->sibling_list, group_entry) {
+ if (list)
+ list_move_tail(&sibling->group_entry, list);
+ sibling->group_leader = sibling;
+
+ /* Inherit group flags from the previous leader */
+ sibling->group_flags = event->group_flags;
+ }
+
+out:
+ perf_event__header_size(event->group_leader);
+
+ list_for_each_entry(tmp, &event->group_leader->sibling_list, group_entry)
+ perf_event__header_size(tmp);
+}
+
+static inline int
+event_filter_match(struct perf_event *event)
+{
+ return (event->cpu == -1 || event->cpu == smp_processor_id())
+ && perf_cgroup_match(event);
+}
+
+static void
+event_sched_out(struct perf_event *event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ u64 tstamp = perf_event_time(event);
+ u64 delta;
+ /*
+ * An event which could not be activated because of
+ * filter mismatch still needs to have its timings
+ * maintained, otherwise bogus information is return
+ * via read() for time_enabled, time_running:
+ */
+ if (event->state == PERF_EVENT_STATE_INACTIVE
+ && !event_filter_match(event)) {
+ delta = tstamp - event->tstamp_stopped;
+ event->tstamp_running += delta;
+ event->tstamp_stopped = tstamp;
+ }
+
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ return;
+
+ event->state = PERF_EVENT_STATE_INACTIVE;
+ if (event->pending_disable) {
+ event->pending_disable = 0;
+ event->state = PERF_EVENT_STATE_OFF;
+ }
+ event->tstamp_stopped = tstamp;
+ event->pmu->del(event, 0);
+ event->oncpu = -1;
+
+ if (!is_software_event(event))
+ cpuctx->active_oncpu--;
+ ctx->nr_active--;
+ if (event->attr.exclusive || !cpuctx->active_oncpu)
+ cpuctx->exclusive = 0;
+}
+
+static void
+group_sched_out(struct perf_event *group_event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *event;
+ int state = group_event->state;
+
+ event_sched_out(group_event, cpuctx, ctx);
+
+ /*
+ * Schedule out siblings (if any):
+ */
+ list_for_each_entry(event, &group_event->sibling_list, group_entry)
+ event_sched_out(event, cpuctx, ctx);
+
+ if (state == PERF_EVENT_STATE_ACTIVE && group_event->attr.exclusive)
+ cpuctx->exclusive = 0;
+}
+
+/*
+ * Cross CPU call to remove a performance event
+ *
+ * We disable the event on the hardware level first. After that we
+ * remove it from the context list.
+ */
+static int __perf_remove_from_context(void *info)
+{
+ struct perf_event *event = info;
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+ raw_spin_lock(&ctx->lock);
+ event_sched_out(event, cpuctx, ctx);
+ list_del_event(event, ctx);
+ raw_spin_unlock(&ctx->lock);
+
+ return 0;
+}
+
+
+/*
+ * Remove the event from a task's (or a CPU's) list of events.
+ *
+ * CPU events are removed with a smp call. For task events we only
+ * call when the task is on a CPU.
+ *
+ * If event->ctx is a cloned context, callers must make sure that
+ * every task struct that event->ctx->task could possibly point to
+ * remains valid. This is OK when called from perf_release since
+ * that only calls us on the top-level context, which can't be a clone.
+ * When called from perf_event_exit_task, it's OK because the
+ * context has been detached from its task.
+ */
+static void perf_remove_from_context(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+ struct task_struct *task = ctx->task;
+
+ lockdep_assert_held(&ctx->mutex);
+
+ if (!task) {
+ /*
+ * Per cpu events are removed via an smp call and
+ * the removal is always successful.
+ */
+ cpu_function_call(event->cpu, __perf_remove_from_context, event);
+ return;
+ }
+
+retry:
+ if (!task_function_call(task, __perf_remove_from_context, event))
+ return;
+
+ raw_spin_lock_irq(&ctx->lock);
+ /*
+ * If we failed to find a running task, but find the context active now
+ * that we've acquired the ctx->lock, retry.
+ */
+ if (ctx->is_active) {
+ raw_spin_unlock_irq(&ctx->lock);
+ goto retry;
+ }
+
+ /*
+ * Since the task isn't running, its safe to remove the event, us
+ * holding the ctx->lock ensures the task won't get scheduled in.
+ */
+ list_del_event(event, ctx);
+ raw_spin_unlock_irq(&ctx->lock);
+}
+
+/*
+ * Cross CPU call to disable a performance event
+ */
+static int __perf_event_disable(void *info)
+{
+ struct perf_event *event = info;
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+ /*
+ * If this is a per-task event, need to check whether this
+ * event's task is the current task on this cpu.
+ *
+ * Can trigger due to concurrent perf_event_context_sched_out()
+ * flipping contexts around.
+ */
+ if (ctx->task && cpuctx->task_ctx != ctx)
+ return -EINVAL;
+
+ raw_spin_lock(&ctx->lock);
+
+ /*
+ * If the event is on, turn it off.
+ * If it is in error state, leave it in error state.
+ */
+ if (event->state >= PERF_EVENT_STATE_INACTIVE) {
+ update_context_time(ctx);
+ update_cgrp_time_from_event(event);
+ update_group_times(event);
+ if (event == event->group_leader)
+ group_sched_out(event, cpuctx, ctx);
+ else
+ event_sched_out(event, cpuctx, ctx);
+ event->state = PERF_EVENT_STATE_OFF;
+ }
+
+ raw_spin_unlock(&ctx->lock);
+
+ return 0;
+}
+
+/*
+ * Disable a event.
+ *
+ * If event->ctx is a cloned context, callers must make sure that
+ * every task struct that event->ctx->task could possibly point to
+ * remains valid. This condition is satisifed when called through
+ * perf_event_for_each_child or perf_event_for_each because they
+ * hold the top-level event's child_mutex, so any descendant that
+ * goes to exit will block in sync_child_event.
+ * When called from perf_pending_event it's OK because event->ctx
+ * is the current context on this CPU and preemption is disabled,
+ * hence we can't get into perf_event_task_sched_out for this context.
+ */
+void perf_event_disable(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+ struct task_struct *task = ctx->task;
+
+ if (!task) {
+ /*
+ * Disable the event on the cpu that it's on
+ */
+ cpu_function_call(event->cpu, __perf_event_disable, event);
+ return;
+ }
+
+retry:
+ if (!task_function_call(task, __perf_event_disable, event))
+ return;
+
+ raw_spin_lock_irq(&ctx->lock);
+ /*
+ * If the event is still active, we need to retry the cross-call.
+ */
+ if (event->state == PERF_EVENT_STATE_ACTIVE) {
+ raw_spin_unlock_irq(&ctx->lock);
+ /*
+ * Reload the task pointer, it might have been changed by
+ * a concurrent perf_event_context_sched_out().
+ */
+ task = ctx->task;
+ goto retry;
+ }
+
+ /*
+ * Since we have the lock this context can't be scheduled
+ * in, so we can change the state safely.
+ */
+ if (event->state == PERF_EVENT_STATE_INACTIVE) {
+ update_group_times(event);
+ event->state = PERF_EVENT_STATE_OFF;
+ }
+ raw_spin_unlock_irq(&ctx->lock);
+}
+
+static void perf_set_shadow_time(struct perf_event *event,
+ struct perf_event_context *ctx,
+ u64 tstamp)
+{
+ /*
+ * use the correct time source for the time snapshot
+ *
+ * We could get by without this by leveraging the
+ * fact that to get to this function, the caller
+ * has most likely already called update_context_time()
+ * and update_cgrp_time_xx() and thus both timestamp
+ * are identical (or very close). Given that tstamp is,
+ * already adjusted for cgroup, we could say that:
+ * tstamp - ctx->timestamp
+ * is equivalent to
+ * tstamp - cgrp->timestamp.
+ *
+ * Then, in perf_output_read(), the calculation would
+ * work with no changes because:
+ * - event is guaranteed scheduled in
+ * - no scheduled out in between
+ * - thus the timestamp would be the same
+ *
+ * But this is a bit hairy.
+ *
+ * So instead, we have an explicit cgroup call to remain
+ * within the time time source all along. We believe it
+ * is cleaner and simpler to understand.
+ */
+ if (is_cgroup_event(event))
+ perf_cgroup_set_shadow_time(event, tstamp);
+ else
+ event->shadow_ctx_time = tstamp - ctx->timestamp;
+}
+
+#define MAX_INTERRUPTS (~0ULL)
+
+static void perf_log_throttle(struct perf_event *event, int enable);
+
+static int
+event_sched_in(struct perf_event *event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ u64 tstamp = perf_event_time(event);
+
+ if (event->state <= PERF_EVENT_STATE_OFF)
+ return 0;
+
+ event->state = PERF_EVENT_STATE_ACTIVE;
+ event->oncpu = smp_processor_id();
+
+ /*
+ * Unthrottle events, since we scheduled we might have missed several
+ * ticks already, also for a heavily scheduling task there is little
+ * guarantee it'll get a tick in a timely manner.
+ */
+ if (unlikely(event->hw.interrupts == MAX_INTERRUPTS)) {
+ perf_log_throttle(event, 1);
+ event->hw.interrupts = 0;
+ }
+
+ /*
+ * The new state must be visible before we turn it on in the hardware:
+ */
+ smp_wmb();
+
+ if (event->pmu->add(event, PERF_EF_START)) {
+ event->state = PERF_EVENT_STATE_INACTIVE;
+ event->oncpu = -1;
+ return -EAGAIN;
+ }
+
+ event->tstamp_running += tstamp - event->tstamp_stopped;
+
+ perf_set_shadow_time(event, ctx, tstamp);
+
+ if (!is_software_event(event))
+ cpuctx->active_oncpu++;
+ ctx->nr_active++;
+
+ if (event->attr.exclusive)
+ cpuctx->exclusive = 1;
+
+ return 0;
+}
+
+static int
+group_sched_in(struct perf_event *group_event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *event, *partial_group = NULL;
+ struct pmu *pmu = group_event->pmu;
+ u64 now = ctx->time;
+ bool simulate = false;
+
+ if (group_event->state == PERF_EVENT_STATE_OFF)
+ return 0;
+
+ pmu->start_txn(pmu);
+
+ if (event_sched_in(group_event, cpuctx, ctx)) {
+ pmu->cancel_txn(pmu);
+ return -EAGAIN;
+ }
+
+ /*
+ * Schedule in siblings as one group (if any):
+ */
+ list_for_each_entry(event, &group_event->sibling_list, group_entry) {
+ if (event_sched_in(event, cpuctx, ctx)) {
+ partial_group = event;
+ goto group_error;
+ }
+ }
+
+ if (!pmu->commit_txn(pmu))
+ return 0;
+
+group_error:
+ /*
+ * Groups can be scheduled in as one unit only, so undo any
+ * partial group before returning:
+ * The events up to the failed event are scheduled out normally,
+ * tstamp_stopped will be updated.
+ *
+ * The failed events and the remaining siblings need to have
+ * their timings updated as if they had gone thru event_sched_in()
+ * and event_sched_out(). This is required to get consistent timings
+ * across the group. This also takes care of the case where the group
+ * could never be scheduled by ensuring tstamp_stopped is set to mark
+ * the time the event was actually stopped, such that time delta
+ * calculation in update_event_times() is correct.
+ */
+ list_for_each_entry(event, &group_event->sibling_list, group_entry) {
+ if (event == partial_group)
+ simulate = true;
+
+ if (simulate) {
+ event->tstamp_running += now - event->tstamp_stopped;
+ event->tstamp_stopped = now;
+ } else {
+ event_sched_out(event, cpuctx, ctx);
+ }
+ }
+ event_sched_out(group_event, cpuctx, ctx);
+
+ pmu->cancel_txn(pmu);
+
+ return -EAGAIN;
+}
+
+/*
+ * Work out whether we can put this event group on the CPU now.
+ */
+static int group_can_go_on(struct perf_event *event,
+ struct perf_cpu_context *cpuctx,
+ int can_add_hw)
+{
+ /*
+ * Groups consisting entirely of software events can always go on.
+ */
+ if (event->group_flags & PERF_GROUP_SOFTWARE)
+ return 1;
+ /*
+ * If an exclusive group is already on, no other hardware
+ * events can go on.
+ */
+ if (cpuctx->exclusive)
+ return 0;
+ /*
+ * If this group is exclusive and there are already
+ * events on the CPU, it can't go on.
+ */
+ if (event->attr.exclusive && cpuctx->active_oncpu)
+ return 0;
+ /*
+ * Otherwise, try to add it if all previous groups were able
+ * to go on.
+ */
+ return can_add_hw;
+}
+
+static void add_event_to_ctx(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ u64 tstamp = perf_event_time(event);
+
+ list_add_event(event, ctx);
+ perf_group_attach(event);
+ event->tstamp_enabled = tstamp;
+ event->tstamp_running = tstamp;
+ event->tstamp_stopped = tstamp;
+}
+
+static void perf_event_context_sched_in(struct perf_event_context *ctx,
+ struct task_struct *tsk);
+
+/*
+ * Cross CPU call to install and enable a performance event
+ *
+ * Must be called with ctx->mutex held
+ */
+static int __perf_install_in_context(void *info)
+{
+ struct perf_event *event = info;
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_event *leader = event->group_leader;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+ int err;
+
+ /*
+ * In case we're installing a new context to an already running task,
+ * could also happen before perf_event_task_sched_in() on architectures
+ * which do context switches with IRQs enabled.
+ */
+ if (ctx->task && !cpuctx->task_ctx)
+ perf_event_context_sched_in(ctx, ctx->task);
+
+ raw_spin_lock(&ctx->lock);
+ ctx->is_active = 1;
+ update_context_time(ctx);
+ /*
+ * update cgrp time only if current cgrp
+ * matches event->cgrp. Must be done before
+ * calling add_event_to_ctx()
+ */
+ update_cgrp_time_from_event(event);
+
+ add_event_to_ctx(event, ctx);
+
+ if (!event_filter_match(event))
+ goto unlock;
+
+ /*
+ * Don't put the event on if it is disabled or if
+ * it is in a group and the group isn't on.
+ */
+ if (event->state != PERF_EVENT_STATE_INACTIVE ||
+ (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE))
+ goto unlock;
+
+ /*
+ * An exclusive event can't go on if there are already active
+ * hardware events, and no hardware event can go on if there
+ * is already an exclusive event on.
+ */
+ if (!group_can_go_on(event, cpuctx, 1))
+ err = -EEXIST;
+ else
+ err = event_sched_in(event, cpuctx, ctx);
+
+ if (err) {
+ /*
+ * This event couldn't go on. If it is in a group
+ * then we have to pull the whole group off.
+ * If the event group is pinned then put it in error state.
+ */
+ if (leader != event)
+ group_sched_out(leader, cpuctx, ctx);
+ if (leader->attr.pinned) {
+ update_group_times(leader);
+ leader->state = PERF_EVENT_STATE_ERROR;
+ }
+ }
+
+unlock:
+ raw_spin_unlock(&ctx->lock);
+
+ return 0;
+}
+
+/*
+ * Attach a performance event to a context
+ *
+ * First we add the event to the list with the hardware enable bit
+ * in event->hw_config cleared.
+ *
+ * If the event is attached to a task which is on a CPU we use a smp
+ * call to enable it in the task context. The task might have been
+ * scheduled away, but we check this in the smp call again.
+ */
+static void
+perf_install_in_context(struct perf_event_context *ctx,
+ struct perf_event *event,
+ int cpu)
+{
+ struct task_struct *task = ctx->task;
+
+ lockdep_assert_held(&ctx->mutex);
+
+ event->ctx = ctx;
+
+ if (!task) {
+ /*
+ * Per cpu events are installed via an smp call and
+ * the install is always successful.
+ */
+ cpu_function_call(cpu, __perf_install_in_context, event);
+ return;
+ }
+
+retry:
+ if (!task_function_call(task, __perf_install_in_context, event))
+ return;
+
+ raw_spin_lock_irq(&ctx->lock);
+ /*
+ * If we failed to find a running task, but find the context active now
+ * that we've acquired the ctx->lock, retry.
+ */
+ if (ctx->is_active) {
+ raw_spin_unlock_irq(&ctx->lock);
+ goto retry;
+ }
+
+ /*
+ * Since the task isn't running, its safe to add the event, us holding
+ * the ctx->lock ensures the task won't get scheduled in.
+ */
+ add_event_to_ctx(event, ctx);
+ raw_spin_unlock_irq(&ctx->lock);
+}
+
+/*
+ * Put a event into inactive state and update time fields.
+ * Enabling the leader of a group effectively enables all
+ * the group members that aren't explicitly disabled, so we
+ * have to update their ->tstamp_enabled also.
+ * Note: this works for group members as well as group leaders
+ * since the non-leader members' sibling_lists will be empty.
+ */
+static void __perf_event_mark_enabled(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *sub;
+ u64 tstamp = perf_event_time(event);
+
+ event->state = PERF_EVENT_STATE_INACTIVE;
+ event->tstamp_enabled = tstamp - event->total_time_enabled;
+ list_for_each_entry(sub, &event->sibling_list, group_entry) {
+ if (sub->state >= PERF_EVENT_STATE_INACTIVE)
+ sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+ }
+}
+
+/*
+ * Cross CPU call to enable a performance event
+ */
+static int __perf_event_enable(void *info)
+{
+ struct perf_event *event = info;
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_event *leader = event->group_leader;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+ int err;
+
+ if (WARN_ON_ONCE(!ctx->is_active))
+ return -EINVAL;
+
+ raw_spin_lock(&ctx->lock);
+ update_context_time(ctx);
+
+ if (event->state >= PERF_EVENT_STATE_INACTIVE)
+ goto unlock;
+
+ /*
+ * set current task's cgroup time reference point
+ */
+ perf_cgroup_set_timestamp(current, ctx);
+
+ __perf_event_mark_enabled(event, ctx);
+
+ if (!event_filter_match(event)) {
+ if (is_cgroup_event(event))
+ perf_cgroup_defer_enabled(event);
+ goto unlock;
+ }
+
+ /*
+ * If the event is in a group and isn't the group leader,
+ * then don't put it on unless the group is on.
+ */
+ if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
+ goto unlock;
+
+ if (!group_can_go_on(event, cpuctx, 1)) {
+ err = -EEXIST;
+ } else {
+ if (event == leader)
+ err = group_sched_in(event, cpuctx, ctx);
+ else
+ err = event_sched_in(event, cpuctx, ctx);
+ }
+
+ if (err) {
+ /*
+ * If this event can't go on and it's part of a
+ * group, then the whole group has to come off.
+ */
+ if (leader != event)
+ group_sched_out(leader, cpuctx, ctx);
+ if (leader->attr.pinned) {
+ update_group_times(leader);
+ leader->state = PERF_EVENT_STATE_ERROR;
+ }
+ }
+
+unlock:
+ raw_spin_unlock(&ctx->lock);
+
+ return 0;
+}
+
+/*
+ * Enable a event.
+ *
+ * If event->ctx is a cloned context, callers must make sure that
+ * every task struct that event->ctx->task could possibly point to
+ * remains valid. This condition is satisfied when called through
+ * perf_event_for_each_child or perf_event_for_each as described
+ * for perf_event_disable.
+ */
+void perf_event_enable(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+ struct task_struct *task = ctx->task;
+
+ if (!task) {
+ /*
+ * Enable the event on the cpu that it's on
+ */
+ cpu_function_call(event->cpu, __perf_event_enable, event);
+ return;
+ }
+
+ raw_spin_lock_irq(&ctx->lock);
+ if (event->state >= PERF_EVENT_STATE_INACTIVE)
+ goto out;
+
+ /*
+ * If the event is in error state, clear that first.
+ * That way, if we see the event in error state below, we
+ * know that it has gone back into error state, as distinct
+ * from the task having been scheduled away before the
+ * cross-call arrived.
+ */
+ if (event->state == PERF_EVENT_STATE_ERROR)
+ event->state = PERF_EVENT_STATE_OFF;
+
+retry:
+ if (!ctx->is_active) {
+ __perf_event_mark_enabled(event, ctx);
+ goto out;
+ }
+
+ raw_spin_unlock_irq(&ctx->lock);
+
+ if (!task_function_call(task, __perf_event_enable, event))
+ return;
+
+ raw_spin_lock_irq(&ctx->lock);
+
+ /*
+ * If the context is active and the event is still off,
+ * we need to retry the cross-call.
+ */
+ if (ctx->is_active && event->state == PERF_EVENT_STATE_OFF) {
+ /*
+ * task could have been flipped by a concurrent
+ * perf_event_context_sched_out()
+ */
+ task = ctx->task;
+ goto retry;
+ }
+
+out:
+ raw_spin_unlock_irq(&ctx->lock);
+}
+
+static int perf_event_refresh(struct perf_event *event, int refresh)
+{
+ /*
+ * not supported on inherited events
+ */
+ if (event->attr.inherit || !is_sampling_event(event))
+ return -EINVAL;
+
+ atomic_add(refresh, &event->event_limit);
+ perf_event_enable(event);
+
+ return 0;
+}
+
+static void ctx_sched_out(struct perf_event_context *ctx,
+ struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type)
+{
+ struct perf_event *event;
+
+ raw_spin_lock(&ctx->lock);
+ perf_pmu_disable(ctx->pmu);
+ ctx->is_active = 0;
+ if (likely(!ctx->nr_events))
+ goto out;
+ update_context_time(ctx);
+ update_cgrp_time_from_cpuctx(cpuctx);
+
+ if (!ctx->nr_active)
+ goto out;
+
+ if (event_type & EVENT_PINNED) {
+ list_for_each_entry(event, &ctx->pinned_groups, group_entry)
+ group_sched_out(event, cpuctx, ctx);
+ }
+
+ if (event_type & EVENT_FLEXIBLE) {
+ list_for_each_entry(event, &ctx->flexible_groups, group_entry)
+ group_sched_out(event, cpuctx, ctx);
+ }
+out:
+ perf_pmu_enable(ctx->pmu);
+ raw_spin_unlock(&ctx->lock);
+}
+
+/*
+ * Test whether two contexts are equivalent, i.e. whether they
+ * have both been cloned from the same version of the same context
+ * and they both have the same number of enabled events.
+ * If the number of enabled events is the same, then the set
+ * of enabled events should be the same, because these are both
+ * inherited contexts, therefore we can't access individual events
+ * in them directly with an fd; we can only enable/disable all
+ * events via prctl, or enable/disable all events in a family
+ * via ioctl, which will have the same effect on both contexts.
+ */
+static int context_equiv(struct perf_event_context *ctx1,
+ struct perf_event_context *ctx2)
+{
+ return ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx
+ && ctx1->parent_gen == ctx2->parent_gen
+ && !ctx1->pin_count && !ctx2->pin_count;
+}
+
+static void __perf_event_sync_stat(struct perf_event *event,
+ struct perf_event *next_event)
+{
+ u64 value;
+
+ if (!event->attr.inherit_stat)
+ return;
+
+ /*
+ * Update the event value, we cannot use perf_event_read()
+ * because we're in the middle of a context switch and have IRQs
+ * disabled, which upsets smp_call_function_single(), however
+ * we know the event must be on the current CPU, therefore we
+ * don't need to use it.
+ */
+ switch (event->state) {
+ case PERF_EVENT_STATE_ACTIVE:
+ event->pmu->read(event);
+ /* fall-through */
+
+ case PERF_EVENT_STATE_INACTIVE:
+ update_event_times(event);
+ break;
+
+ default:
+ break;
+ }
+
+ /*
+ * In order to keep per-task stats reliable we need to flip the event
+ * values when we flip the contexts.
+ */
+ value = local64_read(&next_event->count);
+ value = local64_xchg(&event->count, value);
+ local64_set(&next_event->count, value);
+
+ swap(event->total_time_enabled, next_event->total_time_enabled);
+ swap(event->total_time_running, next_event->total_time_running);
+
+ /*
+ * Since we swizzled the values, update the user visible data too.
+ */
+ perf_event_update_userpage(event);
+ perf_event_update_userpage(next_event);
+}
+
+#define list_next_entry(pos, member) \
+ list_entry(pos->member.next, typeof(*pos), member)
+
+static void perf_event_sync_stat(struct perf_event_context *ctx,
+ struct perf_event_context *next_ctx)
+{
+ struct perf_event *event, *next_event;
+
+ if (!ctx->nr_stat)
+ return;
+
+ update_context_time(ctx);
+
+ event = list_first_entry(&ctx->event_list,
+ struct perf_event, event_entry);
+
+ next_event = list_first_entry(&next_ctx->event_list,
+ struct perf_event, event_entry);
+
+ while (&event->event_entry != &ctx->event_list &&
+ &next_event->event_entry != &next_ctx->event_list) {
+
+ __perf_event_sync_stat(event, next_event);
+
+ event = list_next_entry(event, event_entry);
+ next_event = list_next_entry(next_event, event_entry);
+ }
+}
+
+static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
+ struct task_struct *next)
+{
+ struct perf_event_context *ctx = task->perf_event_ctxp[ctxn];
+ struct perf_event_context *next_ctx;
+ struct perf_event_context *parent;
+ struct perf_cpu_context *cpuctx;
+ int do_switch = 1;
+
+ if (likely(!ctx))
+ return;
+
+ cpuctx = __get_cpu_context(ctx);
+ if (!cpuctx->task_ctx)
+ return;
+
+ rcu_read_lock();
+ parent = rcu_dereference(ctx->parent_ctx);
+ next_ctx = next->perf_event_ctxp[ctxn];
+ if (parent && next_ctx &&
+ rcu_dereference(next_ctx->parent_ctx) == parent) {
+ /*
+ * Looks like the two contexts are clones, so we might be
+ * able to optimize the context switch. We lock both
+ * contexts and check that they are clones under the
+ * lock (including re-checking that neither has been
+ * uncloned in the meantime). It doesn't matter which
+ * order we take the locks because no other cpu could
+ * be trying to lock both of these tasks.
+ */
+ raw_spin_lock(&ctx->lock);
+ raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING);
+ if (context_equiv(ctx, next_ctx)) {
+ /*
+ * XXX do we need a memory barrier of sorts
+ * wrt to rcu_dereference() of perf_event_ctxp
+ */
+ task->perf_event_ctxp[ctxn] = next_ctx;
+ next->perf_event_ctxp[ctxn] = ctx;
+ ctx->task = next;
+ next_ctx->task = task;
+ do_switch = 0;
+
+ perf_event_sync_stat(ctx, next_ctx);
+ }
+ raw_spin_unlock(&next_ctx->lock);
+ raw_spin_unlock(&ctx->lock);
+ }
+ rcu_read_unlock();
+
+ if (do_switch) {
+ ctx_sched_out(ctx, cpuctx, EVENT_ALL);
+ cpuctx->task_ctx = NULL;
+ }
+}
+
+#define for_each_task_context_nr(ctxn) \
+ for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
+
+/*
+ * Called from scheduler to remove the events of the current task,
+ * with interrupts disabled.
+ *
+ * We stop each event and update the event value in event->count.
+ *
+ * This does not protect us against NMI, but disable()
+ * sets the disabled bit in the control field of event _before_
+ * accessing the event control register. If a NMI hits, then it will
+ * not restart the event.
+ */
+void __perf_event_task_sched_out(struct task_struct *task,
+ struct task_struct *next)
+{
+ int ctxn;
+
+ for_each_task_context_nr(ctxn)
+ perf_event_context_sched_out(task, ctxn, next);
+
+ /*
+ * if cgroup events exist on this CPU, then we need
+ * to check if we have to switch out PMU state.
+ * cgroup event are system-wide mode only
+ */
+ if (atomic_read(&__get_cpu_var(perf_cgroup_events)))
+ perf_cgroup_sched_out(task);
+}
+
+static void task_ctx_sched_out(struct perf_event_context *ctx,
+ enum event_type_t event_type)
+{
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+ if (!cpuctx->task_ctx)
+ return;
+
+ if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
+ return;
+
+ ctx_sched_out(ctx, cpuctx, event_type);
+ cpuctx->task_ctx = NULL;
+}
+
+/*
+ * Called with IRQs disabled
+ */
+static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type)
+{
+ ctx_sched_out(&cpuctx->ctx, cpuctx, event_type);
+}
+
+static void
+ctx_pinned_sched_in(struct perf_event_context *ctx,
+ struct perf_cpu_context *cpuctx)
+{
+ struct perf_event *event;
+
+ list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
+ if (event->state <= PERF_EVENT_STATE_OFF)
+ continue;
+ if (!event_filter_match(event))
+ continue;
+
+ /* may need to reset tstamp_enabled */
+ if (is_cgroup_event(event))
+ perf_cgroup_mark_enabled(event, ctx);
+
+ if (group_can_go_on(event, cpuctx, 1))
+ group_sched_in(event, cpuctx, ctx);
+
+ /*
+ * If this pinned group hasn't been scheduled,
+ * put it in error state.
+ */
+ if (event->state == PERF_EVENT_STATE_INACTIVE) {
+ update_group_times(event);
+ event->state = PERF_EVENT_STATE_ERROR;
+ }
+ }
+}
+
+static void
+ctx_flexible_sched_in(struct perf_event_context *ctx,
+ struct perf_cpu_context *cpuctx)
+{
+ struct perf_event *event;
+ int can_add_hw = 1;
+
+ list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
+ /* Ignore events in OFF or ERROR state */
+ if (event->state <= PERF_EVENT_STATE_OFF)
+ continue;
+ /*
+ * Listen to the 'cpu' scheduling filter constraint
+ * of events:
+ */
+ if (!event_filter_match(event))
+ continue;
+
+ /* may need to reset tstamp_enabled */
+ if (is_cgroup_event(event))
+ perf_cgroup_mark_enabled(event, ctx);
+
+ if (group_can_go_on(event, cpuctx, can_add_hw)) {
+ if (group_sched_in(event, cpuctx, ctx))
+ can_add_hw = 0;
+ }
+ }
+}
+
+static void
+ctx_sched_in(struct perf_event_context *ctx,
+ struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type,
+ struct task_struct *task)
+{
+ u64 now;
+
+ raw_spin_lock(&ctx->lock);
+ ctx->is_active = 1;
+ if (likely(!ctx->nr_events))
+ goto out;
+
+ now = perf_clock();
+ ctx->timestamp = now;
+ perf_cgroup_set_timestamp(task, ctx);
+ /*
+ * First go through the list and put on any pinned groups
+ * in order to give them the best chance of going on.
+ */
+ if (event_type & EVENT_PINNED)
+ ctx_pinned_sched_in(ctx, cpuctx);
+
+ /* Then walk through the lower prio flexible groups */
+ if (event_type & EVENT_FLEXIBLE)
+ ctx_flexible_sched_in(ctx, cpuctx);
+
+out:
+ raw_spin_unlock(&ctx->lock);
+}
+
+static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
+ enum event_type_t event_type,
+ struct task_struct *task)
+{
+ struct perf_event_context *ctx = &cpuctx->ctx;
+
+ ctx_sched_in(ctx, cpuctx, event_type, task);
+}
+
+static void task_ctx_sched_in(struct perf_event_context *ctx,
+ enum event_type_t event_type)
+{
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = __get_cpu_context(ctx);
+ if (cpuctx->task_ctx == ctx)
+ return;
+
+ ctx_sched_in(ctx, cpuctx, event_type, NULL);
+ cpuctx->task_ctx = ctx;
+}
+
+static void perf_event_context_sched_in(struct perf_event_context *ctx,
+ struct task_struct *task)
+{
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = __get_cpu_context(ctx);
+ if (cpuctx->task_ctx == ctx)
+ return;
+
+ perf_pmu_disable(ctx->pmu);
+ /*
+ * We want to keep the following priority order:
+ * cpu pinned (that don't need to move), task pinned,
+ * cpu flexible, task flexible.
+ */
+ cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
+
+ ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task);
+ cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
+ ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
+
+ cpuctx->task_ctx = ctx;
+
+ /*
+ * Since these rotations are per-cpu, we need to ensure the
+ * cpu-context we got scheduled on is actually rotating.
+ */
+ perf_pmu_rotate_start(ctx->pmu);
+ perf_pmu_enable(ctx->pmu);
+}
+
+/*
+ * Called from scheduler to add the events of the current task
+ * with interrupts disabled.
+ *
+ * We restore the event value and then enable it.
+ *
+ * This does not protect us against NMI, but enable()
+ * sets the enabled bit in the control field of event _before_
+ * accessing the event control register. If a NMI hits, then it will
+ * keep the event running.
+ */
+void __perf_event_task_sched_in(struct task_struct *task)
+{
+ struct perf_event_context *ctx;
+ int ctxn;
+
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (likely(!ctx))
+ continue;
+
+ perf_event_context_sched_in(ctx, task);
+ }
+ /*
+ * if cgroup events exist on this CPU, then we need
+ * to check if we have to switch in PMU state.
+ * cgroup event are system-wide mode only
+ */
+ if (atomic_read(&__get_cpu_var(perf_cgroup_events)))
+ perf_cgroup_sched_in(task);
+}
+
+static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count)
+{
+ u64 frequency = event->attr.sample_freq;
+ u64 sec = NSEC_PER_SEC;
+ u64 divisor, dividend;
+
+ int count_fls, nsec_fls, frequency_fls, sec_fls;
+
+ count_fls = fls64(count);
+ nsec_fls = fls64(nsec);
+ frequency_fls = fls64(frequency);
+ sec_fls = 30;
+
+ /*
+ * We got @count in @nsec, with a target of sample_freq HZ
+ * the target period becomes:
+ *
+ * @count * 10^9
+ * period = -------------------
+ * @nsec * sample_freq
+ *
+ */
+
+ /*
+ * Reduce accuracy by one bit such that @a and @b converge
+ * to a similar magnitude.
+ */
+#define REDUCE_FLS(a, b) \
+do { \
+ if (a##_fls > b##_fls) { \
+ a >>= 1; \
+ a##_fls--; \
+ } else { \
+ b >>= 1; \
+ b##_fls--; \
+ } \
+} while (0)
+
+ /*
+ * Reduce accuracy until either term fits in a u64, then proceed with
+ * the other, so that finally we can do a u64/u64 division.
+ */
+ while (count_fls + sec_fls > 64 && nsec_fls + frequency_fls > 64) {
+ REDUCE_FLS(nsec, frequency);
+ REDUCE_FLS(sec, count);
+ }
+
+ if (count_fls + sec_fls > 64) {
+ divisor = nsec * frequency;
+
+ while (count_fls + sec_fls > 64) {
+ REDUCE_FLS(count, sec);
+ divisor >>= 1;
+ }
+
+ dividend = count * sec;
+ } else {
+ dividend = count * sec;
+
+ while (nsec_fls + frequency_fls > 64) {
+ REDUCE_FLS(nsec, frequency);
+ dividend >>= 1;
+ }
+
+ divisor = nsec * frequency;
+ }
+
+ if (!divisor)
+ return dividend;
+
+ return div64_u64(dividend, divisor);
+}
+
+static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ s64 period, sample_period;
+ s64 delta;
+
+ period = perf_calculate_period(event, nsec, count);
+
+ delta = (s64)(period - hwc->sample_period);
+ delta = (delta + 7) / 8; /* low pass filter */
+
+ sample_period = hwc->sample_period + delta;
+
+ if (!sample_period)
+ sample_period = 1;
+
+ hwc->sample_period = sample_period;
+
+ if (local64_read(&hwc->period_left) > 8*sample_period) {
+ event->pmu->stop(event, PERF_EF_UPDATE);
+ local64_set(&hwc->period_left, 0);
+ event->pmu->start(event, PERF_EF_RELOAD);
+ }
+}
+
+static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
+{
+ struct perf_event *event;
+ struct hw_perf_event *hwc;
+ u64 interrupts, now;
+ s64 delta;
+
+ raw_spin_lock(&ctx->lock);
+ list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ continue;
+
+ if (!event_filter_match(event))
+ continue;
+
+ hwc = &event->hw;
+
+ interrupts = hwc->interrupts;
+ hwc->interrupts = 0;
+
+ /*
+ * unthrottle events on the tick
+ */
+ if (interrupts == MAX_INTERRUPTS) {
+ perf_log_throttle(event, 1);
+ event->pmu->start(event, 0);
+ }
+
+ if (!event->attr.freq || !event->attr.sample_freq)
+ continue;
+
+ event->pmu->read(event);
+ now = local64_read(&event->count);
+ delta = now - hwc->freq_count_stamp;
+ hwc->freq_count_stamp = now;
+
+ if (delta > 0)
+ perf_adjust_period(event, period, delta);
+ }
+ raw_spin_unlock(&ctx->lock);
+}
+
+/*
+ * Round-robin a context's events:
+ */
+static void rotate_ctx(struct perf_event_context *ctx)
+{
+ raw_spin_lock(&ctx->lock);
+
+ /*
+ * Rotate the first entry last of non-pinned groups. Rotation might be
+ * disabled by the inheritance code.
+ */
+ if (!ctx->rotate_disable)
+ list_rotate_left(&ctx->flexible_groups);
+
+ raw_spin_unlock(&ctx->lock);
+}
+
+/*
+ * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * because they're strictly cpu affine and rotate_start is called with IRQs
+ * disabled, while rotate_context is called from IRQ context.
+ */
+static void perf_rotate_context(struct perf_cpu_context *cpuctx)
+{
+ u64 interval = (u64)cpuctx->jiffies_interval * TICK_NSEC;
+ struct perf_event_context *ctx = NULL;
+ int rotate = 0, remove = 1;
+
+ if (cpuctx->ctx.nr_events) {
+ remove = 0;
+ if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
+ rotate = 1;
+ }
+
+ ctx = cpuctx->task_ctx;
+ if (ctx && ctx->nr_events) {
+ remove = 0;
+ if (ctx->nr_events != ctx->nr_active)
+ rotate = 1;
+ }
+
+ perf_pmu_disable(cpuctx->ctx.pmu);
+ perf_ctx_adjust_freq(&cpuctx->ctx, interval);
+ if (ctx)
+ perf_ctx_adjust_freq(ctx, interval);
+
+ if (!rotate)
+ goto done;
+
+ cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
+ if (ctx)
+ task_ctx_sched_out(ctx, EVENT_FLEXIBLE);
+
+ rotate_ctx(&cpuctx->ctx);
+ if (ctx)
+ rotate_ctx(ctx);
+
+ cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, current);
+ if (ctx)
+ task_ctx_sched_in(ctx, EVENT_FLEXIBLE);
+
+done:
+ if (remove)
+ list_del_init(&cpuctx->rotation_list);
+
+ perf_pmu_enable(cpuctx->ctx.pmu);
+}
+
+void perf_event_task_tick(void)
+{
+ struct list_head *head = &__get_cpu_var(rotation_list);
+ struct perf_cpu_context *cpuctx, *tmp;
+
+ WARN_ON(!irqs_disabled());
+
+ list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
+ if (cpuctx->jiffies_interval == 1 ||
+ !(jiffies % cpuctx->jiffies_interval))
+ perf_rotate_context(cpuctx);
+ }
+}
+
+static int event_enable_on_exec(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ if (!event->attr.enable_on_exec)
+ return 0;
+
+ event->attr.enable_on_exec = 0;
+ if (event->state >= PERF_EVENT_STATE_INACTIVE)
+ return 0;
+
+ __perf_event_mark_enabled(event, ctx);
+
+ return 1;
+}
+
+/*
+ * Enable all of a task's events that have been marked enable-on-exec.
+ * This expects task == current.
+ */
+static void perf_event_enable_on_exec(struct perf_event_context *ctx)
+{
+ struct perf_event *event;
+ unsigned long flags;
+ int enabled = 0;
+ int ret;
+
+ local_irq_save(flags);
+ if (!ctx || !ctx->nr_events)
+ goto out;
+
+ /*
+ * We must ctxsw out cgroup events to avoid conflict
+ * when invoking perf_task_event_sched_in() later on
+ * in this function. Otherwise we end up trying to
+ * ctxswin cgroup events which are already scheduled
+ * in.
+ */
+ perf_cgroup_sched_out(current);
+ task_ctx_sched_out(ctx, EVENT_ALL);
+
+ raw_spin_lock(&ctx->lock);
+
+ list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
+ ret = event_enable_on_exec(event, ctx);
+ if (ret)
+ enabled = 1;
+ }
+
+ list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
+ ret = event_enable_on_exec(event, ctx);
+ if (ret)
+ enabled = 1;
+ }
+
+ /*
+ * Unclone this context if we enabled any event.
+ */
+ if (enabled)
+ unclone_ctx(ctx);
+
+ raw_spin_unlock(&ctx->lock);
+
+ /*
+ * Also calls ctxswin for cgroup events, if any:
+ */
+ perf_event_context_sched_in(ctx, ctx->task);
+out:
+ local_irq_restore(flags);
+}
+
+/*
+ * Cross CPU call to read the hardware event
+ */
+static void __perf_event_read(void *info)
+{
+ struct perf_event *event = info;
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+ /*
+ * If this is a task context, we need to check whether it is
+ * the current task context of this cpu. If not it has been
+ * scheduled out before the smp call arrived. In that case
+ * event->count would have been updated to a recent sample
+ * when the event was scheduled out.
+ */
+ if (ctx->task && cpuctx->task_ctx != ctx)
+ return;
+
+ raw_spin_lock(&ctx->lock);
+ if (ctx->is_active) {
+ update_context_time(ctx);
+ update_cgrp_time_from_event(event);
+ }
+ update_event_times(event);
+ if (event->state == PERF_EVENT_STATE_ACTIVE)
+ event->pmu->read(event);
+ raw_spin_unlock(&ctx->lock);
+}
+
+static inline u64 perf_event_count(struct perf_event *event)
+{
+ return local64_read(&event->count) + atomic64_read(&event->child_count);
+}
+
+static u64 perf_event_read(struct perf_event *event)
+{
+ /*
+ * If event is enabled and currently active on a CPU, update the
+ * value in the event structure:
+ */
+ if (event->state == PERF_EVENT_STATE_ACTIVE) {
+ smp_call_function_single(event->oncpu,
+ __perf_event_read, event, 1);
+ } else if (event->state == PERF_EVENT_STATE_INACTIVE) {
+ struct perf_event_context *ctx = event->ctx;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&ctx->lock, flags);
+ /*
+ * may read while context is not active
+ * (e.g., thread is blocked), in that case
+ * we cannot update context time
+ */
+ if (ctx->is_active) {
+ update_context_time(ctx);
+ update_cgrp_time_from_event(event);
+ }
+ update_event_times(event);
+ raw_spin_unlock_irqrestore(&ctx->lock, flags);
+ }
+
+ return perf_event_count(event);
+}
+
+/*
+ * Callchain support
+ */
+
+struct callchain_cpus_entries {
+ struct rcu_head rcu_head;
+ struct perf_callchain_entry *cpu_entries[0];
+};
+
+static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]);
+static atomic_t nr_callchain_events;
+static DEFINE_MUTEX(callchain_mutex);
+struct callchain_cpus_entries *callchain_cpus_entries;
+
+
+__weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
+{
+}
+
+__weak void perf_callchain_user(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
+{
+}
+
+static void release_callchain_buffers_rcu(struct rcu_head *head)
+{
+ struct callchain_cpus_entries *entries;
+ int cpu;
+
+ entries = container_of(head, struct callchain_cpus_entries, rcu_head);
+
+ for_each_possible_cpu(cpu)
+ kfree(entries->cpu_entries[cpu]);
+
+ kfree(entries);
+}
+
+static void release_callchain_buffers(void)
+{
+ struct callchain_cpus_entries *entries;
+
+ entries = callchain_cpus_entries;
+ rcu_assign_pointer(callchain_cpus_entries, NULL);
+ call_rcu(&entries->rcu_head, release_callchain_buffers_rcu);
+}
+
+static int alloc_callchain_buffers(void)
+{
+ int cpu;
+ int size;
+ struct callchain_cpus_entries *entries;
+
+ /*
+ * We can't use the percpu allocation API for data that can be
+ * accessed from NMI. Use a temporary manual per cpu allocation
+ * until that gets sorted out.
+ */
+ size = offsetof(struct callchain_cpus_entries, cpu_entries[nr_cpu_ids]);
+
+ entries = kzalloc(size, GFP_KERNEL);
+ if (!entries)
+ return -ENOMEM;
+
+ size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
+
+ for_each_possible_cpu(cpu) {
+ entries->cpu_entries[cpu] = kmalloc_node(size, GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (!entries->cpu_entries[cpu])
+ goto fail;
+ }
+
+ rcu_assign_pointer(callchain_cpus_entries, entries);
+
+ return 0;
+
+fail:
+ for_each_possible_cpu(cpu)
+ kfree(entries->cpu_entries[cpu]);
+ kfree(entries);
+
+ return -ENOMEM;
+}
+
+static int get_callchain_buffers(void)
+{
+ int err = 0;
+ int count;
+
+ mutex_lock(&callchain_mutex);
+
+ count = atomic_inc_return(&nr_callchain_events);
+ if (WARN_ON_ONCE(count < 1)) {
+ err = -EINVAL;
+ goto exit;
+ }
+
+ if (count > 1) {
+ /* If the allocation failed, give up */
+ if (!callchain_cpus_entries)
+ err = -ENOMEM;
+ goto exit;
+ }
+
+ err = alloc_callchain_buffers();
+ if (err)
+ release_callchain_buffers();
+exit:
+ mutex_unlock(&callchain_mutex);
+
+ return err;
+}
+
+static void put_callchain_buffers(void)
+{
+ if (atomic_dec_and_mutex_lock(&nr_callchain_events, &callchain_mutex)) {
+ release_callchain_buffers();
+ mutex_unlock(&callchain_mutex);
+ }
+}
+
+static int get_recursion_context(int *recursion)
+{
+ int rctx;
+
+ if (in_nmi())
+ rctx = 3;
+ else if (in_irq())
+ rctx = 2;
+ else if (in_softirq())
+ rctx = 1;
+ else
+ rctx = 0;
+
+ if (recursion[rctx])
+ return -1;
+
+ recursion[rctx]++;
+ barrier();
+
+ return rctx;
+}
+
+static inline void put_recursion_context(int *recursion, int rctx)
+{
+ barrier();
+ recursion[rctx]--;
+}
+
+static struct perf_callchain_entry *get_callchain_entry(int *rctx)
+{
+ int cpu;
+ struct callchain_cpus_entries *entries;
+
+ *rctx = get_recursion_context(__get_cpu_var(callchain_recursion));
+ if (*rctx == -1)
+ return NULL;
+
+ entries = rcu_dereference(callchain_cpus_entries);
+ if (!entries)
+ return NULL;
+
+ cpu = smp_processor_id();
+
+ return &entries->cpu_entries[cpu][*rctx];
+}
+
+static void
+put_callchain_entry(int rctx)
+{
+ put_recursion_context(__get_cpu_var(callchain_recursion), rctx);
+}
+
+static struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+{
+ int rctx;
+ struct perf_callchain_entry *entry;
+
+
+ entry = get_callchain_entry(&rctx);
+ if (rctx == -1)
+ return NULL;
+
+ if (!entry)
+ goto exit_put;
+
+ entry->nr = 0;
+
+ if (!user_mode(regs)) {
+ perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
+ perf_callchain_kernel(entry, regs);
+ if (current->mm)
+ regs = task_pt_regs(current);
+ else
+ regs = NULL;
+ }
+
+ if (regs) {
+ perf_callchain_store(entry, PERF_CONTEXT_USER);
+ perf_callchain_user(entry, regs);
+ }
+
+exit_put:
+ put_callchain_entry(rctx);
+
+ return entry;
+}
+
+/*
+ * Initialize the perf_event context in a task_struct:
+ */
+static void __perf_event_init_context(struct perf_event_context *ctx)
+{
+ raw_spin_lock_init(&ctx->lock);
+ mutex_init(&ctx->mutex);
+ INIT_LIST_HEAD(&ctx->pinned_groups);
+ INIT_LIST_HEAD(&ctx->flexible_groups);
+ INIT_LIST_HEAD(&ctx->event_list);
+ atomic_set(&ctx->refcount, 1);
+}
+
+static struct perf_event_context *
+alloc_perf_context(struct pmu *pmu, struct task_struct *task)
+{
+ struct perf_event_context *ctx;
+
+ ctx = kzalloc(sizeof(struct perf_event_context), GFP_KERNEL);
+ if (!ctx)
+ return NULL;
+
+ __perf_event_init_context(ctx);
+ if (task) {
+ ctx->task = task;
+ get_task_struct(task);
+ }
+ ctx->pmu = pmu;
+
+ return ctx;
+}
+
+static struct task_struct *
+find_lively_task_by_vpid(pid_t vpid)
+{
+ struct task_struct *task;
+ int err;
+
+ rcu_read_lock();
+ if (!vpid)
+ task = current;
+ else
+ task = find_task_by_vpid(vpid);
+ if (task)
+ get_task_struct(task);
+ rcu_read_unlock();
+
+ if (!task)
+ return ERR_PTR(-ESRCH);
+
+ /* Reuse ptrace permission checks for now. */
+ err = -EACCES;
+ if (!ptrace_may_access(task, PTRACE_MODE_READ))
+ goto errout;
+
+ return task;
+errout:
+ put_task_struct(task);
+ return ERR_PTR(err);
+
+}
+
+/*
+ * Returns a matching context with refcount and pincount.
+ */
+static struct perf_event_context *
+find_get_context(struct pmu *pmu, struct task_struct *task, int cpu)
+{
+ struct perf_event_context *ctx;
+ struct perf_cpu_context *cpuctx;
+ unsigned long flags;
+ int ctxn, err;
+
+ if (!task) {
+ /* Must be root to operate on a CPU event: */
+ if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ return ERR_PTR(-EACCES);
+
+ /*
+ * We could be clever and allow to attach a event to an
+ * offline CPU and activate it when the CPU comes up, but
+ * that's for later.
+ */
+ if (!cpu_online(cpu))
+ return ERR_PTR(-ENODEV);
+
+ cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+ ctx = &cpuctx->ctx;
+ get_ctx(ctx);
+ ++ctx->pin_count;
+
+ return ctx;
+ }
+
+ err = -EINVAL;
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto errout;
+
+retry:
+ ctx = perf_lock_task_context(task, ctxn, &flags);
+ if (ctx) {
+ unclone_ctx(ctx);
+ ++ctx->pin_count;
+ raw_spin_unlock_irqrestore(&ctx->lock, flags);
+ }
+
+ if (!ctx) {
+ ctx = alloc_perf_context(pmu, task);
+ err = -ENOMEM;
+ if (!ctx)
+ goto errout;
+
+ get_ctx(ctx);
+
+ err = 0;
+ mutex_lock(&task->perf_event_mutex);
+ /*
+ * If it has already passed perf_event_exit_task().
+ * we must see PF_EXITING, it takes this mutex too.
+ */
+ if (task->flags & PF_EXITING)
+ err = -ESRCH;
+ else if (task->perf_event_ctxp[ctxn])
+ err = -EAGAIN;
+ else {
+ ++ctx->pin_count;
+ rcu_assign_pointer(task->perf_event_ctxp[ctxn], ctx);
+ }
+ mutex_unlock(&task->perf_event_mutex);
+
+ if (unlikely(err)) {
+ put_task_struct(task);
+ kfree(ctx);
+
+ if (err == -EAGAIN)
+ goto retry;
+ goto errout;
+ }
+ }
+
+ return ctx;
+
+errout:
+ return ERR_PTR(err);
+}
+
+static void perf_event_free_filter(struct perf_event *event);
+
+static void free_event_rcu(struct rcu_head *head)
+{
+ struct perf_event *event;
+
+ event = container_of(head, struct perf_event, rcu_head);
+ if (event->ns)
+ put_pid_ns(event->ns);
+ perf_event_free_filter(event);
+ kfree(event);
+}
+
+static void perf_buffer_put(struct perf_buffer *buffer);
+
+static void free_event(struct perf_event *event)
+{
+ irq_work_sync(&event->pending);
+
+ if (!event->parent) {
+ if (event->attach_state & PERF_ATTACH_TASK)
+ jump_label_dec(&perf_sched_events);
+ if (event->attr.mmap || event->attr.mmap_data)
+ atomic_dec(&nr_mmap_events);
+ if (event->attr.comm)
+ atomic_dec(&nr_comm_events);
+ if (event->attr.task)
+ atomic_dec(&nr_task_events);
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ put_callchain_buffers();
+ if (is_cgroup_event(event)) {
+ atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
+ jump_label_dec(&perf_sched_events);
+ }
+ }
+
+ if (event->buffer) {
+ perf_buffer_put(event->buffer);
+ event->buffer = NULL;
+ }
+
+ if (is_cgroup_event(event))
+ perf_detach_cgroup(event);
+
+ if (event->destroy)
+ event->destroy(event);
+
+ if (event->ctx)
+ put_ctx(event->ctx);
+
+ call_rcu(&event->rcu_head, free_event_rcu);
+}
+
+int perf_event_release_kernel(struct perf_event *event)
+{
+ struct perf_event_context *ctx = event->ctx;
+
+ /*
+ * Remove from the PMU, can't get re-enabled since we got
+ * here because the last ref went.
+ */
+ perf_event_disable(event);
+
+ WARN_ON_ONCE(ctx->parent_ctx);
+ /*
+ * There are two ways this annotation is useful:
+ *
+ * 1) there is a lock recursion from perf_event_exit_task
+ * see the comment there.
+ *
+ * 2) there is a lock-inversion with mmap_sem through
+ * perf_event_read_group(), which takes faults while
+ * holding ctx->mutex, however this is called after
+ * the last filedesc died, so there is no possibility
+ * to trigger the AB-BA case.
+ */
+ mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
+ raw_spin_lock_irq(&ctx->lock);
+ perf_group_detach(event);
+ list_del_event(event, ctx);
+ raw_spin_unlock_irq(&ctx->lock);
+ mutex_unlock(&ctx->mutex);
+
+ free_event(event);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_event_release_kernel);
+
+/*
+ * Called when the last reference to the file is gone.
+ */
+static int perf_release(struct inode *inode, struct file *file)
+{
+ struct perf_event *event = file->private_data;
+ struct task_struct *owner;
+
+ file->private_data = NULL;
+
+ rcu_read_lock();
+ owner = ACCESS_ONCE(event->owner);
+ /*
+ * Matches the smp_wmb() in perf_event_exit_task(). If we observe
+ * !owner it means the list deletion is complete and we can indeed
+ * free this event, otherwise we need to serialize on
+ * owner->perf_event_mutex.
+ */
+ smp_read_barrier_depends();
+ if (owner) {
+ /*
+ * Since delayed_put_task_struct() also drops the last
+ * task reference we can safely take a new reference
+ * while holding the rcu_read_lock().
+ */
+ get_task_struct(owner);
+ }
+ rcu_read_unlock();
+
+ if (owner) {
+ mutex_lock(&owner->perf_event_mutex);
+ /*
+ * We have to re-check the event->owner field, if it is cleared
+ * we raced with perf_event_exit_task(), acquiring the mutex
+ * ensured they're done, and we can proceed with freeing the
+ * event.
+ */
+ if (event->owner)
+ list_del_init(&event->owner_entry);
+ mutex_unlock(&owner->perf_event_mutex);
+ put_task_struct(owner);
+ }
+
+ return perf_event_release_kernel(event);
+}
+
+u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
+{
+ struct perf_event *child;
+ u64 total = 0;
+
+ *enabled = 0;
+ *running = 0;
+
+ mutex_lock(&event->child_mutex);
+ total += perf_event_read(event);
+ *enabled += event->total_time_enabled +
+ atomic64_read(&event->child_total_time_enabled);
+ *running += event->total_time_running +
+ atomic64_read(&event->child_total_time_running);
+
+ list_for_each_entry(child, &event->child_list, child_list) {
+ total += perf_event_read(child);
+ *enabled += child->total_time_enabled;
+ *running += child->total_time_running;
+ }
+ mutex_unlock(&event->child_mutex);
+
+ return total;
+}
+EXPORT_SYMBOL_GPL(perf_event_read_value);
+
+static int perf_event_read_group(struct perf_event *event,
+ u64 read_format, char __user *buf)
+{
+ struct perf_event *leader = event->group_leader, *sub;
+ int n = 0, size = 0, ret = -EFAULT;
+ struct perf_event_context *ctx = leader->ctx;
+ u64 values[5];
+ u64 count, enabled, running;
+
+ mutex_lock(&ctx->mutex);
+ count = perf_event_read_value(leader, &enabled, &running);
+
+ values[n++] = 1 + leader->nr_siblings;
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ values[n++] = enabled;
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ values[n++] = running;
+ values[n++] = count;
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(leader);
+
+ size = n * sizeof(u64);
+
+ if (copy_to_user(buf, values, size))
+ goto unlock;
+
+ ret = size;
+
+ list_for_each_entry(sub, &leader->sibling_list, group_entry) {
+ n = 0;
+
+ values[n++] = perf_event_read_value(sub, &enabled, &running);
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(sub);
+
+ size = n * sizeof(u64);
+
+ if (copy_to_user(buf + ret, values, size)) {
+ ret = -EFAULT;
+ goto unlock;
+ }
+
+ ret += size;
+ }
+unlock:
+ mutex_unlock(&ctx->mutex);
+
+ return ret;
+}
+
+static int perf_event_read_one(struct perf_event *event,
+ u64 read_format, char __user *buf)
+{
+ u64 enabled, running;
+ u64 values[4];
+ int n = 0;
+
+ values[n++] = perf_event_read_value(event, &enabled, &running);
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ values[n++] = enabled;
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ values[n++] = running;
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(event);
+
+ if (copy_to_user(buf, values, n * sizeof(u64)))
+ return -EFAULT;
+
+ return n * sizeof(u64);
+}
+
+/*
+ * Read the performance event - simple non blocking version for now
+ */
+static ssize_t
+perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
+{
+ u64 read_format = event->attr.read_format;
+ int ret;
+
+ /*
+ * Return end-of-file for a read on a event that is in
+ * error state (i.e. because it was pinned but it couldn't be
+ * scheduled on to the CPU at some point).
+ */
+ if (event->state == PERF_EVENT_STATE_ERROR)
+ return 0;
+
+ if (count < event->read_size)
+ return -ENOSPC;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+ if (read_format & PERF_FORMAT_GROUP)
+ ret = perf_event_read_group(event, read_format, buf);
+ else
+ ret = perf_event_read_one(event, read_format, buf);
+
+ return ret;
+}
+
+static ssize_t
+perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
+{
+ struct perf_event *event = file->private_data;
+
+ return perf_read_hw(event, buf, count);
+}
+
+static unsigned int perf_poll(struct file *file, poll_table *wait)
+{
+ struct perf_event *event = file->private_data;
+ struct perf_buffer *buffer;
+ unsigned int events = POLL_HUP;
+
+ rcu_read_lock();
+ buffer = rcu_dereference(event->buffer);
+ if (buffer)
+ events = atomic_xchg(&buffer->poll, 0);
+ rcu_read_unlock();
+
+ poll_wait(file, &event->waitq, wait);
+
+ return events;
+}
+
+static void perf_event_reset(struct perf_event *event)
+{
+ (void)perf_event_read(event);
+ local64_set(&event->count, 0);
+ perf_event_update_userpage(event);
+}
+
+/*
+ * Holding the top-level event's child_mutex means that any
+ * descendant process that has inherited this event will block
+ * in sync_child_event if it goes to exit, thus satisfying the
+ * task existence requirements of perf_event_enable/disable.
+ */
+static void perf_event_for_each_child(struct perf_event *event,
+ void (*func)(struct perf_event *))
+{
+ struct perf_event *child;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+ mutex_lock(&event->child_mutex);
+ func(event);
+ list_for_each_entry(child, &event->child_list, child_list)
+ func(child);
+ mutex_unlock(&event->child_mutex);
+}
+
+static void perf_event_for_each(struct perf_event *event,
+ void (*func)(struct perf_event *))
+{
+ struct perf_event_context *ctx = event->ctx;
+ struct perf_event *sibling;
+
+ WARN_ON_ONCE(ctx->parent_ctx);
+ mutex_lock(&ctx->mutex);
+ event = event->group_leader;
+
+ perf_event_for_each_child(event, func);
+ func(event);
+ list_for_each_entry(sibling, &event->sibling_list, group_entry)
+ perf_event_for_each_child(event, func);
+ mutex_unlock(&ctx->mutex);
+}
+
+static int perf_event_period(struct perf_event *event, u64 __user *arg)
+{
+ struct perf_event_context *ctx = event->ctx;
+ int ret = 0;
+ u64 value;
+
+ if (!is_sampling_event(event))
+ return -EINVAL;
+
+ if (copy_from_user(&value, arg, sizeof(value)))
+ return -EFAULT;
+
+ if (!value)
+ return -EINVAL;
+
+ raw_spin_lock_irq(&ctx->lock);
+ if (event->attr.freq) {
+ if (value > sysctl_perf_event_sample_rate) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ event->attr.sample_freq = value;
+ } else {
+ event->attr.sample_period = value;
+ event->hw.sample_period = value;
+ }
+unlock:
+ raw_spin_unlock_irq(&ctx->lock);
+
+ return ret;
+}
+
+static const struct file_operations perf_fops;
+
+static struct perf_event *perf_fget_light(int fd, int *fput_needed)
+{
+ struct file *file;
+
+ file = fget_light(fd, fput_needed);
+ if (!file)
+ return ERR_PTR(-EBADF);
+
+ if (file->f_op != &perf_fops) {
+ fput_light(file, *fput_needed);
+ *fput_needed = 0;
+ return ERR_PTR(-EBADF);
+ }
+
+ return file->private_data;
+}
+
+static int perf_event_set_output(struct perf_event *event,
+ struct perf_event *output_event);
+static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+
+static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct perf_event *event = file->private_data;
+ void (*func)(struct perf_event *);
+ u32 flags = arg;
+
+ switch (cmd) {
+ case PERF_EVENT_IOC_ENABLE:
+ func = perf_event_enable;
+ break;
+ case PERF_EVENT_IOC_DISABLE:
+ func = perf_event_disable;
+ break;
+ case PERF_EVENT_IOC_RESET:
+ func = perf_event_reset;
+ break;
+
+ case PERF_EVENT_IOC_REFRESH:
+ return perf_event_refresh(event, arg);
+
+ case PERF_EVENT_IOC_PERIOD:
+ return perf_event_period(event, (u64 __user *)arg);
+
+ case PERF_EVENT_IOC_SET_OUTPUT:
+ {
+ struct perf_event *output_event = NULL;
+ int fput_needed = 0;
+ int ret;
+
+ if (arg != -1) {
+ output_event = perf_fget_light(arg, &fput_needed);
+ if (IS_ERR(output_event))
+ return PTR_ERR(output_event);
+ }
+
+ ret = perf_event_set_output(event, output_event);
+ if (output_event)
+ fput_light(output_event->filp, fput_needed);
+
+ return ret;
+ }
+
+ case PERF_EVENT_IOC_SET_FILTER:
+ return perf_event_set_filter(event, (void __user *)arg);
+
+ default:
+ return -ENOTTY;
+ }
+
+ if (flags & PERF_IOC_FLAG_GROUP)
+ perf_event_for_each(event, func);
+ else
+ perf_event_for_each_child(event, func);
+
+ return 0;
+}
+
+int perf_event_task_enable(void)
+{
+ struct perf_event *event;
+
+ mutex_lock(¤t->perf_event_mutex);
+ list_for_each_entry(event, ¤t->perf_event_list, owner_entry)
+ perf_event_for_each_child(event, perf_event_enable);
+ mutex_unlock(¤t->perf_event_mutex);
+
+ return 0;
+}
+
+int perf_event_task_disable(void)
+{
+ struct perf_event *event;
+
+ mutex_lock(¤t->perf_event_mutex);
+ list_for_each_entry(event, ¤t->perf_event_list, owner_entry)
+ perf_event_for_each_child(event, perf_event_disable);
+ mutex_unlock(¤t->perf_event_mutex);
+
+ return 0;
+}
+
+#ifndef PERF_EVENT_INDEX_OFFSET
+# define PERF_EVENT_INDEX_OFFSET 0
+#endif
+
+static int perf_event_index(struct perf_event *event)
+{
+ if (event->hw.state & PERF_HES_STOPPED)
+ return 0;
+
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ return 0;
+
+ return event->hw.idx + 1 - PERF_EVENT_INDEX_OFFSET;
+}
+
+/*
+ * Callers need to ensure there can be no nesting of this function, otherwise
+ * the seqlock logic goes bad. We can not serialize this because the arch
+ * code calls this from NMI context.
+ */
+void perf_event_update_userpage(struct perf_event *event)
+{
+ struct perf_event_mmap_page *userpg;
+ struct perf_buffer *buffer;
+
+ rcu_read_lock();
+ buffer = rcu_dereference(event->buffer);
+ if (!buffer)
+ goto unlock;
+
+ userpg = buffer->user_page;
+
+ /*
+ * Disable preemption so as to not let the corresponding user-space
+ * spin too long if we get preempted.
+ */
+ preempt_disable();
+ ++userpg->lock;
+ barrier();
+ userpg->index = perf_event_index(event);
+ userpg->offset = perf_event_count(event);
+ if (event->state == PERF_EVENT_STATE_ACTIVE)
+ userpg->offset -= local64_read(&event->hw.prev_count);
+
+ userpg->time_enabled = event->total_time_enabled +
+ atomic64_read(&event->child_total_time_enabled);
+
+ userpg->time_running = event->total_time_running +
+ atomic64_read(&event->child_total_time_running);
+
+ barrier();
+ ++userpg->lock;
+ preempt_enable();
+unlock:
+ rcu_read_unlock();
+}
+
+static unsigned long perf_data_size(struct perf_buffer *buffer);
+
+static void
+perf_buffer_init(struct perf_buffer *buffer, long watermark, int flags)
+{
+ long max_size = perf_data_size(buffer);
+
+ if (watermark)
+ buffer->watermark = min(max_size, watermark);
+
+ if (!buffer->watermark)
+ buffer->watermark = max_size / 2;
+
+ if (flags & PERF_BUFFER_WRITABLE)
+ buffer->writable = 1;
+
+ atomic_set(&buffer->refcount, 1);
+}
+
+#ifndef CONFIG_PERF_USE_VMALLOC
+
+/*
+ * Back perf_mmap() with regular GFP_KERNEL-0 pages.
+ */
+
+static struct page *
+perf_mmap_to_page(struct perf_buffer *buffer, unsigned long pgoff)
+{
+ if (pgoff > buffer->nr_pages)
+ return NULL;
+
+ if (pgoff == 0)
+ return virt_to_page(buffer->user_page);
+
+ return virt_to_page(buffer->data_pages[pgoff - 1]);
+}
+
+static void *perf_mmap_alloc_page(int cpu)
+{
+ struct page *page;
+ int node;
+
+ node = (cpu == -1) ? cpu : cpu_to_node(cpu);
+ page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
+static struct perf_buffer *
+perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags)
+{
+ struct perf_buffer *buffer;
+ unsigned long size;
+ int i;
+
+ size = sizeof(struct perf_buffer);
+ size += nr_pages * sizeof(void *);
+
+ buffer = kzalloc(size, GFP_KERNEL);
+ if (!buffer)
+ goto fail;
+
+ buffer->user_page = perf_mmap_alloc_page(cpu);
+ if (!buffer->user_page)
+ goto fail_user_page;
+
+ for (i = 0; i < nr_pages; i++) {
+ buffer->data_pages[i] = perf_mmap_alloc_page(cpu);
+ if (!buffer->data_pages[i])
+ goto fail_data_pages;
+ }
+
+ buffer->nr_pages = nr_pages;
+
+ perf_buffer_init(buffer, watermark, flags);
+
+ return buffer;
+
+fail_data_pages:
+ for (i--; i >= 0; i--)
+ free_page((unsigned long)buffer->data_pages[i]);
+
+ free_page((unsigned long)buffer->user_page);
+
+fail_user_page:
+ kfree(buffer);
+
+fail:
+ return NULL;
+}
+
+static void perf_mmap_free_page(unsigned long addr)
+{
+ struct page *page = virt_to_page((void *)addr);
+
+ page->mapping = NULL;
+ __free_page(page);
+}
+
+static void perf_buffer_free(struct perf_buffer *buffer)
+{
+ int i;
+
+ perf_mmap_free_page((unsigned long)buffer->user_page);
+ for (i = 0; i < buffer->nr_pages; i++)
+ perf_mmap_free_page((unsigned long)buffer->data_pages[i]);
+ kfree(buffer);
+}
+
+static inline int page_order(struct perf_buffer *buffer)
+{
+ return 0;
+}
+
+#else
+
+/*
+ * Back perf_mmap() with vmalloc memory.
+ *
+ * Required for architectures that have d-cache aliasing issues.
+ */
+
+static inline int page_order(struct perf_buffer *buffer)
+{
+ return buffer->page_order;
+}
+
+static struct page *
+perf_mmap_to_page(struct perf_buffer *buffer, unsigned long pgoff)
+{
+ if (pgoff > (1UL << page_order(buffer)))
+ return NULL;
+
+ return vmalloc_to_page((void *)buffer->user_page + pgoff * PAGE_SIZE);
+}
+
+static void perf_mmap_unmark_page(void *addr)
+{
+ struct page *page = vmalloc_to_page(addr);
+
+ page->mapping = NULL;
+}
+
+static void perf_buffer_free_work(struct work_struct *work)
+{
+ struct perf_buffer *buffer;
+ void *base;
+ int i, nr;
+
+ buffer = container_of(work, struct perf_buffer, work);
+ nr = 1 << page_order(buffer);
+
+ base = buffer->user_page;
+ for (i = 0; i < nr + 1; i++)
+ perf_mmap_unmark_page(base + (i * PAGE_SIZE));
+
+ vfree(base);
+ kfree(buffer);
+}
+
+static void perf_buffer_free(struct perf_buffer *buffer)
+{
+ schedule_work(&buffer->work);
+}
+
+static struct perf_buffer *
+perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags)
+{
+ struct perf_buffer *buffer;
+ unsigned long size;
+ void *all_buf;
+
+ size = sizeof(struct perf_buffer);
+ size += sizeof(void *);
+
+ buffer = kzalloc(size, GFP_KERNEL);
+ if (!buffer)
+ goto fail;
+
+ INIT_WORK(&buffer->work, perf_buffer_free_work);
+
+ all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
+ if (!all_buf)
+ goto fail_all_buf;
+
+ buffer->user_page = all_buf;
+ buffer->data_pages[0] = all_buf + PAGE_SIZE;
+ buffer->page_order = ilog2(nr_pages);
+ buffer->nr_pages = 1;
+
+ perf_buffer_init(buffer, watermark, flags);
+
+ return buffer;
+
+fail_all_buf:
+ kfree(buffer);
+
+fail:
+ return NULL;
+}
+
+#endif
+
+static unsigned long perf_data_size(struct perf_buffer *buffer)
+{
+ return buffer->nr_pages << (PAGE_SHIFT + page_order(buffer));
+}
+
+static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ struct perf_event *event = vma->vm_file->private_data;
+ struct perf_buffer *buffer;
+ int ret = VM_FAULT_SIGBUS;
+
+ if (vmf->flags & FAULT_FLAG_MKWRITE) {
+ if (vmf->pgoff == 0)
+ ret = 0;
+ return ret;
+ }
+
+ rcu_read_lock();
+ buffer = rcu_dereference(event->buffer);
+ if (!buffer)
+ goto unlock;
+
+ if (vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))
+ goto unlock;
+
+ vmf->page = perf_mmap_to_page(buffer, vmf->pgoff);
+ if (!vmf->page)
+ goto unlock;
+
+ get_page(vmf->page);
+ vmf->page->mapping = vma->vm_file->f_mapping;
+ vmf->page->index = vmf->pgoff;
+
+ ret = 0;
+unlock:
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static void perf_buffer_free_rcu(struct rcu_head *rcu_head)
+{
+ struct perf_buffer *buffer;
+
+ buffer = container_of(rcu_head, struct perf_buffer, rcu_head);
+ perf_buffer_free(buffer);
+}
+
+static struct perf_buffer *perf_buffer_get(struct perf_event *event)
+{
+ struct perf_buffer *buffer;
+
+ rcu_read_lock();
+ buffer = rcu_dereference(event->buffer);
+ if (buffer) {
+ if (!atomic_inc_not_zero(&buffer->refcount))
+ buffer = NULL;
+ }
+ rcu_read_unlock();
+
+ return buffer;
+}
+
+static void perf_buffer_put(struct perf_buffer *buffer)
+{
+ if (!atomic_dec_and_test(&buffer->refcount))
+ return;
+
+ call_rcu(&buffer->rcu_head, perf_buffer_free_rcu);
+}
+
+static void perf_mmap_open(struct vm_area_struct *vma)
+{
+ struct perf_event *event = vma->vm_file->private_data;
+
+ atomic_inc(&event->mmap_count);
+}
+
+static void perf_mmap_close(struct vm_area_struct *vma)
+{
+ struct perf_event *event = vma->vm_file->private_data;
+
+ if (atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) {
+ unsigned long size = perf_data_size(event->buffer);
+ struct user_struct *user = event->mmap_user;
+ struct perf_buffer *buffer = event->buffer;
+
+ atomic_long_sub((size >> PAGE_SHIFT) + 1, &user->locked_vm);
+ vma->vm_mm->locked_vm -= event->mmap_locked;
+ rcu_assign_pointer(event->buffer, NULL);
+ mutex_unlock(&event->mmap_mutex);
+
+ perf_buffer_put(buffer);
+ free_uid(user);
+ }
+}
+
+static const struct vm_operations_struct perf_mmap_vmops = {
+ .open = perf_mmap_open,
+ .close = perf_mmap_close,
+ .fault = perf_mmap_fault,
+ .page_mkwrite = perf_mmap_fault,
+};
+
+static int perf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct perf_event *event = file->private_data;
+ unsigned long user_locked, user_lock_limit;
+ struct user_struct *user = current_user();
+ unsigned long locked, lock_limit;
+ struct perf_buffer *buffer;
+ unsigned long vma_size;
+ unsigned long nr_pages;
+ long user_extra, extra;
+ int ret = 0, flags = 0;
+
+ /*
+ * Don't allow mmap() of inherited per-task counters. This would
+ * create a performance issue due to all children writing to the
+ * same buffer.
+ */
+ if (event->cpu == -1 && event->attr.inherit)
+ return -EINVAL;
+
+ if (!(vma->vm_flags & VM_SHARED))
+ return -EINVAL;
+
+ vma_size = vma->vm_end - vma->vm_start;
+ nr_pages = (vma_size / PAGE_SIZE) - 1;
+
+ /*
+ * If we have buffer pages ensure they're a power-of-two number, so we
+ * can do bitmasks instead of modulo.
+ */
+ if (nr_pages != 0 && !is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ if (vma_size != PAGE_SIZE * (1 + nr_pages))
+ return -EINVAL;
+
+ if (vma->vm_pgoff != 0)
+ return -EINVAL;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+ mutex_lock(&event->mmap_mutex);
+ if (event->buffer) {
+ if (event->buffer->nr_pages == nr_pages)
+ atomic_inc(&event->buffer->refcount);
+ else
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ user_extra = nr_pages + 1;
+ user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
+
+ /*
+ * Increase the limit linearly with more CPUs:
+ */
+ user_lock_limit *= num_online_cpus();
+
+ user_locked = atomic_long_read(&user->locked_vm) + user_extra;
+
+ extra = 0;
+ if (user_locked > user_lock_limit)
+ extra = user_locked - user_lock_limit;
+
+ lock_limit = rlimit(RLIMIT_MEMLOCK);
+ lock_limit >>= PAGE_SHIFT;
+ locked = vma->vm_mm->locked_vm + extra;
+
+ if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
+ !capable(CAP_IPC_LOCK)) {
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ WARN_ON(event->buffer);
+
+ if (vma->vm_flags & VM_WRITE)
+ flags |= PERF_BUFFER_WRITABLE;
+
+ buffer = perf_buffer_alloc(nr_pages, event->attr.wakeup_watermark,
+ event->cpu, flags);
+ if (!buffer) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+ rcu_assign_pointer(event->buffer, buffer);
+
+ atomic_long_add(user_extra, &user->locked_vm);
+ event->mmap_locked = extra;
+ event->mmap_user = get_current_user();
+ vma->vm_mm->locked_vm += event->mmap_locked;
+
+unlock:
+ if (!ret)
+ atomic_inc(&event->mmap_count);
+ mutex_unlock(&event->mmap_mutex);
+
+ vma->vm_flags |= VM_RESERVED;
+ vma->vm_ops = &perf_mmap_vmops;
+
+ return ret;
+}
+
+static int perf_fasync(int fd, struct file *filp, int on)
+{
+ struct inode *inode = filp->f_path.dentry->d_inode;
+ struct perf_event *event = filp->private_data;
+ int retval;
+
+ mutex_lock(&inode->i_mutex);
+ retval = fasync_helper(fd, filp, on, &event->fasync);
+ mutex_unlock(&inode->i_mutex);
+
+ if (retval < 0)
+ return retval;
+
+ return 0;
+}
+
+static const struct file_operations perf_fops = {
+ .llseek = no_llseek,
+ .release = perf_release,
+ .read = perf_read,
+ .poll = perf_poll,
+ .unlocked_ioctl = perf_ioctl,
+ .compat_ioctl = perf_ioctl,
+ .mmap = perf_mmap,
+ .fasync = perf_fasync,
+};
+
+/*
+ * Perf event wakeup
+ *
+ * If there's data, ensure we set the poll() state and publish everything
+ * to user-space before waking everybody up.
+ */
+
+void perf_event_wakeup(struct perf_event *event)
+{
+ wake_up_all(&event->waitq);
+
+ if (event->pending_kill) {
+ kill_fasync(&event->fasync, SIGIO, event->pending_kill);
+ event->pending_kill = 0;
+ }
+}
+
+static void perf_pending_event(struct irq_work *entry)
+{
+ struct perf_event *event = container_of(entry,
+ struct perf_event, pending);
+
+ if (event->pending_disable) {
+ event->pending_disable = 0;
+ __perf_event_disable(event);
+ }
+
+ if (event->pending_wakeup) {
+ event->pending_wakeup = 0;
+ perf_event_wakeup(event);
+ }
+}
+
+/*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = cbs;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
+ * Output
+ */
+static bool perf_output_space(struct perf_buffer *buffer, unsigned long tail,
+ unsigned long offset, unsigned long head)
+{
+ unsigned long mask;
+
+ if (!buffer->writable)
+ return true;
+
+ mask = perf_data_size(buffer) - 1;
+
+ offset = (offset - tail) & mask;
+ head = (head - tail) & mask;
+
+ if ((int)(head - offset) < 0)
+ return false;
+
+ return true;
+}
+
+static void perf_output_wakeup(struct perf_output_handle *handle)
+{
+ atomic_set(&handle->buffer->poll, POLL_IN);
+
+ if (handle->nmi) {
+ handle->event->pending_wakeup = 1;
+ irq_work_queue(&handle->event->pending);
+ } else
+ perf_event_wakeup(handle->event);
+}
+
+/*
+ * We need to ensure a later event_id doesn't publish a head when a former
+ * event isn't done writing. However since we need to deal with NMIs we
+ * cannot fully serialize things.
+ *
+ * We only publish the head (and generate a wakeup) when the outer-most
+ * event completes.
+ */
+static void perf_output_get_handle(struct perf_output_handle *handle)
+{
+ struct perf_buffer *buffer = handle->buffer;
+
+ preempt_disable();
+ local_inc(&buffer->nest);
+ handle->wakeup = local_read(&buffer->wakeup);
+}
+
+static void perf_output_put_handle(struct perf_output_handle *handle)
+{
+ struct perf_buffer *buffer = handle->buffer;
+ unsigned long head;
+
+again:
+ head = local_read(&buffer->head);
+
+ /*
+ * IRQ/NMI can happen here, which means we can miss a head update.
+ */
+
+ if (!local_dec_and_test(&buffer->nest))
+ goto out;
+
+ /*
+ * Publish the known good head. Rely on the full barrier implied
+ * by atomic_dec_and_test() order the buffer->head read and this
+ * write.
+ */
+ buffer->user_page->data_head = head;
+
+ /*
+ * Now check if we missed an update, rely on the (compiler)
+ * barrier in atomic_dec_and_test() to re-read buffer->head.
+ */
+ if (unlikely(head != local_read(&buffer->head))) {
+ local_inc(&buffer->nest);
+ goto again;
+ }
+
+ if (handle->wakeup != local_read(&buffer->wakeup))
+ perf_output_wakeup(handle);
+
+out:
+ preempt_enable();
+}
+
+__always_inline void perf_output_copy(struct perf_output_handle *handle,
+ const void *buf, unsigned int len)
+{
+ do {
+ unsigned long size = min_t(unsigned long, handle->size, len);
+
+ memcpy(handle->addr, buf, size);
+
+ len -= size;
+ handle->addr += size;
+ buf += size;
+ handle->size -= size;
+ if (!handle->size) {
+ struct perf_buffer *buffer = handle->buffer;
+
+ handle->page++;
+ handle->page &= buffer->nr_pages - 1;
+ handle->addr = buffer->data_pages[handle->page];
+ handle->size = PAGE_SIZE << page_order(buffer);
+ }
+ } while (len);
+}
+
+static void __perf_event_header__init_id(struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event)
+{
+ u64 sample_type = event->attr.sample_type;
+
+ data->type = sample_type;
+ header->size += event->id_header_size;
+
+ if (sample_type & PERF_SAMPLE_TID) {
+ /* namespace issues */
+ data->tid_entry.pid = perf_event_pid(event, current);
+ data->tid_entry.tid = perf_event_tid(event, current);
+ }
+
+ if (sample_type & PERF_SAMPLE_TIME)
+ data->time = perf_clock();
+
+ if (sample_type & PERF_SAMPLE_ID)
+ data->id = primary_event_id(event);
+
+ if (sample_type & PERF_SAMPLE_STREAM_ID)
+ data->stream_id = event->id;
+
+ if (sample_type & PERF_SAMPLE_CPU) {
+ data->cpu_entry.cpu = raw_smp_processor_id();
+ data->cpu_entry.reserved = 0;
+ }
+}
+
+static void perf_event_header__init_id(struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event)
+{
+ if (event->attr.sample_id_all)
+ __perf_event_header__init_id(header, data, event);
+}
+
+static void __perf_event__output_id_sample(struct perf_output_handle *handle,
+ struct perf_sample_data *data)
+{
+ u64 sample_type = data->type;
+
+ if (sample_type & PERF_SAMPLE_TID)
+ perf_output_put(handle, data->tid_entry);
+
+ if (sample_type & PERF_SAMPLE_TIME)
+ perf_output_put(handle, data->time);
+
+ if (sample_type & PERF_SAMPLE_ID)
+ perf_output_put(handle, data->id);
+
+ if (sample_type & PERF_SAMPLE_STREAM_ID)
+ perf_output_put(handle, data->stream_id);
+
+ if (sample_type & PERF_SAMPLE_CPU)
+ perf_output_put(handle, data->cpu_entry);
+}
+
+static void perf_event__output_id_sample(struct perf_event *event,
+ struct perf_output_handle *handle,
+ struct perf_sample_data *sample)
+{
+ if (event->attr.sample_id_all)
+ __perf_event__output_id_sample(handle, sample);
+}
+
+int perf_output_begin(struct perf_output_handle *handle,
+ struct perf_event *event, unsigned int size,
+ int nmi, int sample)
+{
+ struct perf_buffer *buffer;
+ unsigned long tail, offset, head;
+ int have_lost;
+ struct perf_sample_data sample_data;
+ struct {
+ struct perf_event_header header;
+ u64 id;
+ u64 lost;
+ } lost_event;
+
+ rcu_read_lock();
+ /*
+ * For inherited events we send all the output towards the parent.
+ */
+ if (event->parent)
+ event = event->parent;
+
+ buffer = rcu_dereference(event->buffer);
+ if (!buffer)
+ goto out;
+
+ handle->buffer = buffer;
+ handle->event = event;
+ handle->nmi = nmi;
+ handle->sample = sample;
+
+ if (!buffer->nr_pages)
+ goto out;
+
+ have_lost = local_read(&buffer->lost);
+ if (have_lost) {
+ lost_event.header.size = sizeof(lost_event);
+ perf_event_header__init_id(&lost_event.header, &sample_data,
+ event);
+ size += lost_event.header.size;
+ }
+
+ perf_output_get_handle(handle);
+
+ do {
+ /*
+ * Userspace could choose to issue a mb() before updating the
+ * tail pointer. So that all reads will be completed before the
+ * write is issued.
+ */
+ tail = ACCESS_ONCE(buffer->user_page->data_tail);
+ smp_rmb();
+ offset = head = local_read(&buffer->head);
+ head += size;
+ if (unlikely(!perf_output_space(buffer, tail, offset, head)))
+ goto fail;
+ } while (local_cmpxchg(&buffer->head, offset, head) != offset);
+
+ if (head - local_read(&buffer->wakeup) > buffer->watermark)
+ local_add(buffer->watermark, &buffer->wakeup);
+
+ handle->page = offset >> (PAGE_SHIFT + page_order(buffer));
+ handle->page &= buffer->nr_pages - 1;
+ handle->size = offset & ((PAGE_SIZE << page_order(buffer)) - 1);
+ handle->addr = buffer->data_pages[handle->page];
+ handle->addr += handle->size;
+ handle->size = (PAGE_SIZE << page_order(buffer)) - handle->size;
+
+ if (have_lost) {
+ lost_event.header.type = PERF_RECORD_LOST;
+ lost_event.header.misc = 0;
+ lost_event.id = event->id;
+ lost_event.lost = local_xchg(&buffer->lost, 0);
+
+ perf_output_put(handle, lost_event);
+ perf_event__output_id_sample(event, handle, &sample_data);
+ }
+
+ return 0;
+
+fail:
+ local_inc(&buffer->lost);
+ perf_output_put_handle(handle);
+out:
+ rcu_read_unlock();
+
+ return -ENOSPC;
+}
+
+void perf_output_end(struct perf_output_handle *handle)
+{
+ struct perf_event *event = handle->event;
+ struct perf_buffer *buffer = handle->buffer;
+
+ int wakeup_events = event->attr.wakeup_events;
+
+ if (handle->sample && wakeup_events) {
+ int events = local_inc_return(&buffer->events);
+ if (events >= wakeup_events) {
+ local_sub(wakeup_events, &buffer->events);
+ local_inc(&buffer->wakeup);
+ }
+ }
+
+ perf_output_put_handle(handle);
+ rcu_read_unlock();
+}
+
+static void perf_output_read_one(struct perf_output_handle *handle,
+ struct perf_event *event,
+ u64 enabled, u64 running)
+{
+ u64 read_format = event->attr.read_format;
+ u64 values[4];
+ int n = 0;
+
+ values[n++] = perf_event_count(event);
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
+ values[n++] = enabled +
+ atomic64_read(&event->child_total_time_enabled);
+ }
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
+ values[n++] = running +
+ atomic64_read(&event->child_total_time_running);
+ }
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(event);
+
+ perf_output_copy(handle, values, n * sizeof(u64));
+}
+
+/*
+ * XXX PERF_FORMAT_GROUP vs inherited events seems difficult.
+ */
+static void perf_output_read_group(struct perf_output_handle *handle,
+ struct perf_event *event,
+ u64 enabled, u64 running)
+{
+ struct perf_event *leader = event->group_leader, *sub;
+ u64 read_format = event->attr.read_format;
+ u64 values[5];
+ int n = 0;
+
+ values[n++] = 1 + leader->nr_siblings;
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+ values[n++] = enabled;
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+ values[n++] = running;
+
+ if (leader != event)
+ leader->pmu->read(leader);
+
+ values[n++] = perf_event_count(leader);
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(leader);
+
+ perf_output_copy(handle, values, n * sizeof(u64));
+
+ list_for_each_entry(sub, &leader->sibling_list, group_entry) {
+ n = 0;
+
+ if (sub != event)
+ sub->pmu->read(sub);
+
+ values[n++] = perf_event_count(sub);
+ if (read_format & PERF_FORMAT_ID)
+ values[n++] = primary_event_id(sub);
+
+ perf_output_copy(handle, values, n * sizeof(u64));
+ }
+}
+
+#define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
+ PERF_FORMAT_TOTAL_TIME_RUNNING)
+
+static void perf_output_read(struct perf_output_handle *handle,
+ struct perf_event *event)
+{
+ u64 enabled = 0, running = 0, now, ctx_time;
+ u64 read_format = event->attr.read_format;
+
+ /*
+ * compute total_time_enabled, total_time_running
+ * based on snapshot values taken when the event
+ * was last scheduled in.
+ *
+ * we cannot simply called update_context_time()
+ * because of locking issue as we are called in
+ * NMI context
+ */
+ if (read_format & PERF_FORMAT_TOTAL_TIMES) {
+ now = perf_clock();
+ ctx_time = event->shadow_ctx_time + now;
+ enabled = ctx_time - event->tstamp_enabled;
+ running = ctx_time - event->tstamp_running;
+ }
+
+ if (event->attr.read_format & PERF_FORMAT_GROUP)
+ perf_output_read_group(handle, event, enabled, running);
+ else
+ perf_output_read_one(handle, event, enabled, running);
+}
+
+void perf_output_sample(struct perf_output_handle *handle,
+ struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event)
+{
+ u64 sample_type = data->type;
+
+ perf_output_put(handle, *header);
+
+ if (sample_type & PERF_SAMPLE_IP)
+ perf_output_put(handle, data->ip);
+
+ if (sample_type & PERF_SAMPLE_TID)
+ perf_output_put(handle, data->tid_entry);
+
+ if (sample_type & PERF_SAMPLE_TIME)
+ perf_output_put(handle, data->time);
+
+ if (sample_type & PERF_SAMPLE_ADDR)
+ perf_output_put(handle, data->addr);
+
+ if (sample_type & PERF_SAMPLE_ID)
+ perf_output_put(handle, data->id);
+
+ if (sample_type & PERF_SAMPLE_STREAM_ID)
+ perf_output_put(handle, data->stream_id);
+
+ if (sample_type & PERF_SAMPLE_CPU)
+ perf_output_put(handle, data->cpu_entry);
+
+ if (sample_type & PERF_SAMPLE_PERIOD)
+ perf_output_put(handle, data->period);
+
+ if (sample_type & PERF_SAMPLE_READ)
+ perf_output_read(handle, event);
+
+ if (sample_type & PERF_SAMPLE_CALLCHAIN) {
+ if (data->callchain) {
+ int size = 1;
+
+ if (data->callchain)
+ size += data->callchain->nr;
+
+ size *= sizeof(u64);
+
+ perf_output_copy(handle, data->callchain, size);
+ } else {
+ u64 nr = 0;
+ perf_output_put(handle, nr);
+ }
+ }
+
+ if (sample_type & PERF_SAMPLE_RAW) {
+ if (data->raw) {
+ perf_output_put(handle, data->raw->size);
+ perf_output_copy(handle, data->raw->data,
+ data->raw->size);
+ } else {
+ struct {
+ u32 size;
+ u32 data;
+ } raw = {
+ .size = sizeof(u32),
+ .data = 0,
+ };
+ perf_output_put(handle, raw);
+ }
+ }
+}
+
+void perf_prepare_sample(struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event,
+ struct pt_regs *regs)
+{
+ u64 sample_type = event->attr.sample_type;
+
+ header->type = PERF_RECORD_SAMPLE;
+ header->size = sizeof(*header) + event->header_size;
+
+ header->misc = 0;
+ header->misc |= perf_misc_flags(regs);
+
+ __perf_event_header__init_id(header, data, event);
+
+ if (sample_type & PERF_SAMPLE_IP)
+ data->ip = perf_instruction_pointer(regs);
+
+ if (sample_type & PERF_SAMPLE_CALLCHAIN) {
+ int size = 1;
+
+ data->callchain = perf_callchain(regs);
+
+ if (data->callchain)
+ size += data->callchain->nr;
+
+ header->size += size * sizeof(u64);
+ }
+
+ if (sample_type & PERF_SAMPLE_RAW) {
+ int size = sizeof(u32);
+
+ if (data->raw)
+ size += data->raw->size;
+ else
+ size += sizeof(u32);
+
+ WARN_ON_ONCE(size & (sizeof(u64)-1));
+ header->size += size;
+ }
+}
+
+static void perf_event_output(struct perf_event *event, int nmi,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct perf_output_handle handle;
+ struct perf_event_header header;
+
+ /* protect the callchain buffers */
+ rcu_read_lock();
+
+ perf_prepare_sample(&header, data, event, regs);
+
+ if (perf_output_begin(&handle, event, header.size, nmi, 1))
+ goto exit;
+
+ perf_output_sample(&handle, &header, data, event);
+
+ perf_output_end(&handle);
+
+exit:
+ rcu_read_unlock();
+}
+
+/*
+ * read event_id
+ */
+
+struct perf_read_event {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+};
+
+static void
+perf_event_read_event(struct perf_event *event,
+ struct task_struct *task)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ struct perf_read_event read_event = {
+ .header = {
+ .type = PERF_RECORD_READ,
+ .misc = 0,
+ .size = sizeof(read_event) + event->read_size,
+ },
+ .pid = perf_event_pid(event, task),
+ .tid = perf_event_tid(event, task),
+ };
+ int ret;
+
+ perf_event_header__init_id(&read_event.header, &sample, event);
+ ret = perf_output_begin(&handle, event, read_event.header.size, 0, 0);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, read_event);
+ perf_output_read(&handle, event);
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+}
+
+/*
+ * task tracking -- fork/exit
+ *
+ * enabled by: attr.comm | attr.mmap | attr.mmap_data | attr.task
+ */
+
+struct perf_task_event {
+ struct task_struct *task;
+ struct perf_event_context *task_ctx;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 ppid;
+ u32 tid;
+ u32 ptid;
+ u64 time;
+ } event_id;
+};
+
+static void perf_event_task_output(struct perf_event *event,
+ struct perf_task_event *task_event)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ struct task_struct *task = task_event->task;
+ int ret, size = task_event->event_id.header.size;
+
+ perf_event_header__init_id(&task_event->event_id.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event,
+ task_event->event_id.header.size, 0, 0);
+ if (ret)
+ goto out;
+
+ task_event->event_id.pid = perf_event_pid(event, task);
+ task_event->event_id.ppid = perf_event_pid(event, current);
+
+ task_event->event_id.tid = perf_event_tid(event, task);
+ task_event->event_id.ptid = perf_event_tid(event, current);
+
+ perf_output_put(&handle, task_event->event_id);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+out:
+ task_event->event_id.header.size = size;
+}
+
+static int perf_event_task_match(struct perf_event *event)
+{
+ if (event->state < PERF_EVENT_STATE_INACTIVE)
+ return 0;
+
+ if (!event_filter_match(event))
+ return 0;
+
+ if (event->attr.comm || event->attr.mmap ||
+ event->attr.mmap_data || event->attr.task)
+ return 1;
+
+ return 0;
+}
+
+static void perf_event_task_ctx(struct perf_event_context *ctx,
+ struct perf_task_event *task_event)
+{
+ struct perf_event *event;
+
+ list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+ if (perf_event_task_match(event))
+ perf_event_task_output(event, task_event);
+ }
+}
+
+static void perf_event_task_event(struct perf_task_event *task_event)
+{
+ struct perf_cpu_context *cpuctx;
+ struct perf_event_context *ctx;
+ struct pmu *pmu;
+ int ctxn;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ if (cpuctx->active_pmu != pmu)
+ goto next;
+ perf_event_task_ctx(&cpuctx->ctx, task_event);
+
+ ctx = task_event->task_ctx;
+ if (!ctx) {
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ }
+ if (ctx)
+ perf_event_task_ctx(ctx, task_event);
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
+ rcu_read_unlock();
+}
+
+static void perf_event_task(struct task_struct *task,
+ struct perf_event_context *task_ctx,
+ int new)
+{
+ struct perf_task_event task_event;
+
+ if (!atomic_read(&nr_comm_events) &&
+ !atomic_read(&nr_mmap_events) &&
+ !atomic_read(&nr_task_events))
+ return;
+
+ task_event = (struct perf_task_event){
+ .task = task,
+ .task_ctx = task_ctx,
+ .event_id = {
+ .header = {
+ .type = new ? PERF_RECORD_FORK : PERF_RECORD_EXIT,
+ .misc = 0,
+ .size = sizeof(task_event.event_id),
+ },
+ /* .pid */
+ /* .ppid */
+ /* .tid */
+ /* .ptid */
+ .time = perf_clock(),
+ },
+ };
+
+ perf_event_task_event(&task_event);
+}
+
+void perf_event_fork(struct task_struct *task)
+{
+ perf_event_task(task, NULL, 1);
+}
+
+/*
+ * comm tracking
+ */
+
+struct perf_comm_event {
+ struct task_struct *task;
+ char *comm;
+ int comm_size;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ } event_id;
+};
+
+static void perf_event_comm_output(struct perf_event *event,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int size = comm_event->event_id.header.size;
+ int ret;
+
+ perf_event_header__init_id(&comm_event->event_id.header, &sample, event);
+ ret = perf_output_begin(&handle, event,
+ comm_event->event_id.header.size, 0, 0);
+
+ if (ret)
+ goto out;
+
+ comm_event->event_id.pid = perf_event_pid(event, comm_event->task);
+ comm_event->event_id.tid = perf_event_tid(event, comm_event->task);
+
+ perf_output_put(&handle, comm_event->event_id);
+ perf_output_copy(&handle, comm_event->comm,
+ comm_event->comm_size);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+out:
+ comm_event->event_id.header.size = size;
+}
+
+static int perf_event_comm_match(struct perf_event *event)
+{
+ if (event->state < PERF_EVENT_STATE_INACTIVE)
+ return 0;
+
+ if (!event_filter_match(event))
+ return 0;
+
+ if (event->attr.comm)
+ return 1;
+
+ return 0;
+}
+
+static void perf_event_comm_ctx(struct perf_event_context *ctx,
+ struct perf_comm_event *comm_event)
+{
+ struct perf_event *event;
+
+ list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+ if (perf_event_comm_match(event))
+ perf_event_comm_output(event, comm_event);
+ }
+}
+
+static void perf_event_comm_event(struct perf_comm_event *comm_event)
+{
+ struct perf_cpu_context *cpuctx;
+ struct perf_event_context *ctx;
+ char comm[TASK_COMM_LEN];
+ unsigned int size;
+ struct pmu *pmu;
+ int ctxn;
+
+ memset(comm, 0, sizeof(comm));
+ strlcpy(comm, comm_event->task->comm, sizeof(comm));
+ size = ALIGN(strlen(comm)+1, sizeof(u64));
+
+ comm_event->comm = comm;
+ comm_event->comm_size = size;
+
+ comm_event->event_id.header.size = sizeof(comm_event->event_id) + size;
+ rcu_read_lock();
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ if (cpuctx->active_pmu != pmu)
+ goto next;
+ perf_event_comm_ctx(&cpuctx->ctx, comm_event);
+
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ if (ctx)
+ perf_event_comm_ctx(ctx, comm_event);
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
+ rcu_read_unlock();
+}
+
+void perf_event_comm(struct task_struct *task)
+{
+ struct perf_comm_event comm_event;
+ struct perf_event_context *ctx;
+ int ctxn;
+
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (!ctx)
+ continue;
+
+ perf_event_enable_on_exec(ctx);
+ }
+
+ if (!atomic_read(&nr_comm_events))
+ return;
+
+ comm_event = (struct perf_comm_event){
+ .task = task,
+ /* .comm */
+ /* .comm_size */
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_COMM,
+ .misc = 0,
+ /* .size */
+ },
+ /* .pid */
+ /* .tid */
+ },
+ };
+
+ perf_event_comm_event(&comm_event);
+}
+
+/*
+ * mmap tracking
+ */
+
+struct perf_mmap_event {
+ struct vm_area_struct *vma;
+
+ const char *file_name;
+ int file_size;
+
+ struct {
+ struct perf_event_header header;
+
+ u32 pid;
+ u32 tid;
+ u64 start;
+ u64 len;
+ u64 pgoff;
+ } event_id;
+};
+
+static void perf_event_mmap_output(struct perf_event *event,
+ struct perf_mmap_event *mmap_event)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int size = mmap_event->event_id.header.size;
+ int ret;
+
+ perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
+ ret = perf_output_begin(&handle, event,
+ mmap_event->event_id.header.size, 0, 0);
+ if (ret)
+ goto out;
+
+ mmap_event->event_id.pid = perf_event_pid(event, current);
+ mmap_event->event_id.tid = perf_event_tid(event, current);
+
+ perf_output_put(&handle, mmap_event->event_id);
+ perf_output_copy(&handle, mmap_event->file_name,
+ mmap_event->file_size);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+out:
+ mmap_event->event_id.header.size = size;
+}
+
+static int perf_event_mmap_match(struct perf_event *event,
+ struct perf_mmap_event *mmap_event,
+ int executable)
+{
+ if (event->state < PERF_EVENT_STATE_INACTIVE)
+ return 0;
+
+ if (!event_filter_match(event))
+ return 0;
+
+ if ((!executable && event->attr.mmap_data) ||
+ (executable && event->attr.mmap))
+ return 1;
+
+ return 0;
+}
+
+static void perf_event_mmap_ctx(struct perf_event_context *ctx,
+ struct perf_mmap_event *mmap_event,
+ int executable)
+{
+ struct perf_event *event;
+
+ list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+ if (perf_event_mmap_match(event, mmap_event, executable))
+ perf_event_mmap_output(event, mmap_event);
+ }
+}
+
+static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
+{
+ struct perf_cpu_context *cpuctx;
+ struct perf_event_context *ctx;
+ struct vm_area_struct *vma = mmap_event->vma;
+ struct file *file = vma->vm_file;
+ unsigned int size;
+ char tmp[16];
+ char *buf = NULL;
+ const char *name;
+ struct pmu *pmu;
+ int ctxn;
+
+ memset(tmp, 0, sizeof(tmp));
+
+ if (file) {
+ /*
+ * d_path works from the end of the buffer backwards, so we
+ * need to add enough zero bytes after the string to handle
+ * the 64bit alignment we do later.
+ */
+ buf = kzalloc(PATH_MAX + sizeof(u64), GFP_KERNEL);
+ if (!buf) {
+ name = strncpy(tmp, "//enomem", sizeof(tmp));
+ goto got_name;
+ }
+ name = d_path(&file->f_path, buf, PATH_MAX);
+ if (IS_ERR(name)) {
+ name = strncpy(tmp, "//toolong", sizeof(tmp));
+ goto got_name;
+ }
+ } else {
+ if (arch_vma_name(mmap_event->vma)) {
+ name = strncpy(tmp, arch_vma_name(mmap_event->vma),
+ sizeof(tmp));
+ goto got_name;
+ }
+
+ if (!vma->vm_mm) {
+ name = strncpy(tmp, "[vdso]", sizeof(tmp));
+ goto got_name;
+ } else if (vma->vm_start <= vma->vm_mm->start_brk &&
+ vma->vm_end >= vma->vm_mm->brk) {
+ name = strncpy(tmp, "[heap]", sizeof(tmp));
+ goto got_name;
+ } else if (vma->vm_start <= vma->vm_mm->start_stack &&
+ vma->vm_end >= vma->vm_mm->start_stack) {
+ name = strncpy(tmp, "[stack]", sizeof(tmp));
+ goto got_name;
+ }
+
+ name = strncpy(tmp, "//anon", sizeof(tmp));
+ goto got_name;
+ }
+
+got_name:
+ size = ALIGN(strlen(name)+1, sizeof(u64));
+
+ mmap_event->file_name = name;
+ mmap_event->file_size = size;
+
+ mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ if (cpuctx->active_pmu != pmu)
+ goto next;
+ perf_event_mmap_ctx(&cpuctx->ctx, mmap_event,
+ vma->vm_flags & VM_EXEC);
+
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ if (ctx) {
+ perf_event_mmap_ctx(ctx, mmap_event,
+ vma->vm_flags & VM_EXEC);
+ }
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
+ rcu_read_unlock();
+
+ kfree(buf);
+}
+
+void perf_event_mmap(struct vm_area_struct *vma)
+{
+ struct perf_mmap_event mmap_event;
+
+ if (!atomic_read(&nr_mmap_events))
+ return;
+
+ mmap_event = (struct perf_mmap_event){
+ .vma = vma,
+ /* .file_name */
+ /* .file_size */
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_MMAP,
+ .misc = PERF_RECORD_MISC_USER,
+ /* .size */
+ },
+ /* .pid */
+ /* .tid */
+ .start = vma->vm_start,
+ .len = vma->vm_end - vma->vm_start,
+ .pgoff = (u64)vma->vm_pgoff << PAGE_SHIFT,
+ },
+ };
+
+ perf_event_mmap_event(&mmap_event);
+}
+
+/*
+ * IRQ throttle logging
+ */
+
+static void perf_log_throttle(struct perf_event *event, int enable)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int ret;
+
+ struct {
+ struct perf_event_header header;
+ u64 time;
+ u64 id;
+ u64 stream_id;
+ } throttle_event = {
+ .header = {
+ .type = PERF_RECORD_THROTTLE,
+ .misc = 0,
+ .size = sizeof(throttle_event),
+ },
+ .time = perf_clock(),
+ .id = primary_event_id(event),
+ .stream_id = event->id,
+ };
+
+ if (enable)
+ throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
+
+ perf_event_header__init_id(&throttle_event.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event,
+ throttle_event.header.size, 1, 0);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, throttle_event);
+ perf_event__output_id_sample(event, &handle, &sample);
+ perf_output_end(&handle);
+}
+
+/*
+ * Generic event overflow handling, sampling.
+ */
+
+static int __perf_event_overflow(struct perf_event *event, int nmi,
+ int throttle, struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ int events = atomic_read(&event->event_limit);
+ struct hw_perf_event *hwc = &event->hw;
+ int ret = 0;
+
+ /*
+ * Non-sampling counters might still use the PMI to fold short
+ * hardware counters, ignore those.
+ */
+ if (unlikely(!is_sampling_event(event)))
+ return 0;
+
+ if (unlikely(hwc->interrupts >= max_samples_per_tick)) {
+ if (throttle) {
+ hwc->interrupts = MAX_INTERRUPTS;
+ perf_log_throttle(event, 0);
+ ret = 1;
+ }
+ } else
+ hwc->interrupts++;
+
+ if (event->attr.freq) {
+ u64 now = perf_clock();
+ s64 delta = now - hwc->freq_time_stamp;
+
+ hwc->freq_time_stamp = now;
+
+ if (delta > 0 && delta < 2*TICK_NSEC)
+ perf_adjust_period(event, delta, hwc->last_period);
+ }
+
+ /*
+ * XXX event_limit might not quite work as expected on inherited
+ * events
+ */
+
+ event->pending_kill = POLL_IN;
+ if (events && atomic_dec_and_test(&event->event_limit)) {
+ ret = 1;
+ event->pending_kill = POLL_HUP;
+ if (nmi) {
+ event->pending_disable = 1;
+ irq_work_queue(&event->pending);
+ } else
+ perf_event_disable(event);
+ }
+
+ if (event->overflow_handler)
+ event->overflow_handler(event, nmi, data, regs);
+ else
+ perf_event_output(event, nmi, data, regs);
+
+ return ret;
+}
+
+int perf_event_overflow(struct perf_event *event, int nmi,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ return __perf_event_overflow(event, nmi, 1, data, regs);
+}
+
+/*
+ * Generic software event infrastructure
+ */
+
+struct swevent_htable {
+ struct swevent_hlist *swevent_hlist;
+ struct mutex hlist_mutex;
+ int hlist_refcount;
+
+ /* Recursion avoidance in each contexts */
+ int recursion[PERF_NR_CONTEXTS];
+};
+
+static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
+
+/*
+ * We directly increment event->count and keep a second value in
+ * event->hw.period_left to count intervals. This period event
+ * is kept in the range [-sample_period, 0] so that we can use the
+ * sign as trigger.
+ */
+
+static u64 perf_swevent_set_period(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ u64 period = hwc->last_period;
+ u64 nr, offset;
+ s64 old, val;
+
+ hwc->last_period = hwc->sample_period;
+
+again:
+ old = val = local64_read(&hwc->period_left);
+ if (val < 0)
+ return 0;
+
+ nr = div64_u64(period + val, period);
+ offset = nr * period;
+ val -= offset;
+ if (local64_cmpxchg(&hwc->period_left, old, val) != old)
+ goto again;
+
+ return nr;
+}
+
+static void perf_swevent_overflow(struct perf_event *event, u64 overflow,
+ int nmi, struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ int throttle = 0;
+
+ data->period = event->hw.last_period;
+ if (!overflow)
+ overflow = perf_swevent_set_period(event);
+
+ if (hwc->interrupts == MAX_INTERRUPTS)
+ return;
+
+ for (; overflow; overflow--) {
+ if (__perf_event_overflow(event, nmi, throttle,
+ data, regs)) {
+ /*
+ * We inhibit the overflow from happening when
+ * hwc->interrupts == MAX_INTERRUPTS.
+ */
+ break;
+ }
+ throttle = 1;
+ }
+}
+
+static void perf_swevent_event(struct perf_event *event, u64 nr,
+ int nmi, struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ local64_add(nr, &event->count);
+
+ if (!regs)
+ return;
+
+ if (!is_sampling_event(event))
+ return;
+
+ if (nr == 1 && hwc->sample_period == 1 && !event->attr.freq)
+ return perf_swevent_overflow(event, 1, nmi, data, regs);
+
+ if (local64_add_negative(nr, &hwc->period_left))
+ return;
+
+ perf_swevent_overflow(event, 0, nmi, data, regs);
+}
+
+static int perf_exclude_event(struct perf_event *event,
+ struct pt_regs *regs)
+{
+ if (event->hw.state & PERF_HES_STOPPED)
+ return 1;
+
+ if (regs) {
+ if (event->attr.exclude_user && user_mode(regs))
+ return 1;
+
+ if (event->attr.exclude_kernel && !user_mode(regs))
+ return 1;
+ }
+
+ return 0;
+}
+
+static int perf_swevent_match(struct perf_event *event,
+ enum perf_type_id type,
+ u32 event_id,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ if (event->attr.type != type)
+ return 0;
+
+ if (event->attr.config != event_id)
+ return 0;
+
+ if (perf_exclude_event(event, regs))
+ return 0;
+
+ return 1;
+}
+
+static inline u64 swevent_hash(u64 type, u32 event_id)
+{
+ u64 val = event_id | (type << 32);
+
+ return hash_64(val, SWEVENT_HLIST_BITS);
+}
+
+static inline struct hlist_head *
+__find_swevent_head(struct swevent_hlist *hlist, u64 type, u32 event_id)
+{
+ u64 hash = swevent_hash(type, event_id);
+
+ return &hlist->heads[hash];
+}
+
+/* For the read side: events when they trigger */
+static inline struct hlist_head *
+find_swevent_head_rcu(struct swevent_htable *swhash, u64 type, u32 event_id)
+{
+ struct swevent_hlist *hlist;
+
+ hlist = rcu_dereference(swhash->swevent_hlist);
+ if (!hlist)
+ return NULL;
+
+ return __find_swevent_head(hlist, type, event_id);
+}
+
+/* For the event head insertion and removal in the hlist */
+static inline struct hlist_head *
+find_swevent_head(struct swevent_htable *swhash, struct perf_event *event)
+{
+ struct swevent_hlist *hlist;
+ u32 event_id = event->attr.config;
+ u64 type = event->attr.type;
+
+ /*
+ * Event scheduling is always serialized against hlist allocation
+ * and release. Which makes the protected version suitable here.
+ * The context lock guarantees that.
+ */
+ hlist = rcu_dereference_protected(swhash->swevent_hlist,
+ lockdep_is_held(&event->ctx->lock));
+ if (!hlist)
+ return NULL;
+
+ return __find_swevent_head(hlist, type, event_id);
+}
+
+static void do_perf_sw_event(enum perf_type_id type, u32 event_id,
+ u64 nr, int nmi,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
+ struct perf_event *event;
+ struct hlist_node *node;
+ struct hlist_head *head;
+
+ rcu_read_lock();
+ head = find_swevent_head_rcu(swhash, type, event_id);
+ if (!head)
+ goto end;
+
+ hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
+ if (perf_swevent_match(event, type, event_id, data, regs))
+ perf_swevent_event(event, nr, nmi, data, regs);
+ }
+end:
+ rcu_read_unlock();
+}
+
+int perf_swevent_get_recursion_context(void)
+{
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
+
+ return get_recursion_context(swhash->recursion);
+}
+EXPORT_SYMBOL_GPL(perf_swevent_get_recursion_context);
+
+inline void perf_swevent_put_recursion_context(int rctx)
+{
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
+
+ put_recursion_context(swhash->recursion, rctx);
+}
+
+void __perf_sw_event(u32 event_id, u64 nr, int nmi,
+ struct pt_regs *regs, u64 addr)
+{
+ struct perf_sample_data data;
+ int rctx;
+
+ preempt_disable_notrace();
+ rctx = perf_swevent_get_recursion_context();
+ if (rctx < 0)
+ return;
+
+ perf_sample_data_init(&data, addr);
+
+ do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, nmi, &data, regs);
+
+ perf_swevent_put_recursion_context(rctx);
+ preempt_enable_notrace();
+}
+
+static void perf_swevent_read(struct perf_event *event)
+{
+}
+
+static int perf_swevent_add(struct perf_event *event, int flags)
+{
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
+ struct hw_perf_event *hwc = &event->hw;
+ struct hlist_head *head;
+
+ if (is_sampling_event(event)) {
+ hwc->last_period = hwc->sample_period;
+ perf_swevent_set_period(event);
+ }
+
+ hwc->state = !(flags & PERF_EF_START);
+
+ head = find_swevent_head(swhash, event);
+ if (WARN_ON_ONCE(!head))
+ return -EINVAL;
+
+ hlist_add_head_rcu(&event->hlist_entry, head);
+
+ return 0;
+}
+
+static void perf_swevent_del(struct perf_event *event, int flags)
+{
+ hlist_del_rcu(&event->hlist_entry);
+}
+
+static void perf_swevent_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+}
+
+static void perf_swevent_stop(struct perf_event *event, int flags)
+{
+ event->hw.state = PERF_HES_STOPPED;
+}
+
+/* Deref the hlist from the update side */
+static inline struct swevent_hlist *
+swevent_hlist_deref(struct swevent_htable *swhash)
+{
+ return rcu_dereference_protected(swhash->swevent_hlist,
+ lockdep_is_held(&swhash->hlist_mutex));
+}
+
+static void swevent_hlist_release_rcu(struct rcu_head *rcu_head)
+{
+ struct swevent_hlist *hlist;
+
+ hlist = container_of(rcu_head, struct swevent_hlist, rcu_head);
+ kfree(hlist);
+}
+
+static void swevent_hlist_release(struct swevent_htable *swhash)
+{
+ struct swevent_hlist *hlist = swevent_hlist_deref(swhash);
+
+ if (!hlist)
+ return;
+
+ rcu_assign_pointer(swhash->swevent_hlist, NULL);
+ call_rcu(&hlist->rcu_head, swevent_hlist_release_rcu);
+}
+
+static void swevent_hlist_put_cpu(struct perf_event *event, int cpu)
+{
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
+
+ mutex_lock(&swhash->hlist_mutex);
+
+ if (!--swhash->hlist_refcount)
+ swevent_hlist_release(swhash);
+
+ mutex_unlock(&swhash->hlist_mutex);
+}
+
+static void swevent_hlist_put(struct perf_event *event)
+{
+ int cpu;
+
+ if (event->cpu != -1) {
+ swevent_hlist_put_cpu(event, event->cpu);
+ return;
+ }
+
+ for_each_possible_cpu(cpu)
+ swevent_hlist_put_cpu(event, cpu);
+}
+
+static int swevent_hlist_get_cpu(struct perf_event *event, int cpu)
+{
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
+ int err = 0;
+
+ mutex_lock(&swhash->hlist_mutex);
+
+ if (!swevent_hlist_deref(swhash) && cpu_online(cpu)) {
+ struct swevent_hlist *hlist;
+
+ hlist = kzalloc(sizeof(*hlist), GFP_KERNEL);
+ if (!hlist) {
+ err = -ENOMEM;
+ goto exit;
+ }
+ rcu_assign_pointer(swhash->swevent_hlist, hlist);
+ }
+ swhash->hlist_refcount++;
+exit:
+ mutex_unlock(&swhash->hlist_mutex);
+
+ return err;
+}
+
+static int swevent_hlist_get(struct perf_event *event)
+{
+ int err;
+ int cpu, failed_cpu;
+
+ if (event->cpu != -1)
+ return swevent_hlist_get_cpu(event, event->cpu);
+
+ get_online_cpus();
+ for_each_possible_cpu(cpu) {
+ err = swevent_hlist_get_cpu(event, cpu);
+ if (err) {
+ failed_cpu = cpu;
+ goto fail;
+ }
+ }
+ put_online_cpus();
+
+ return 0;
+fail:
+ for_each_possible_cpu(cpu) {
+ if (cpu == failed_cpu)
+ break;
+ swevent_hlist_put_cpu(event, cpu);
+ }
+
+ put_online_cpus();
+ return err;
+}
+
+struct jump_label_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
+
+static void sw_perf_event_destroy(struct perf_event *event)
+{
+ u64 event_id = event->attr.config;
+
+ WARN_ON(event->parent);
+
+ jump_label_dec(&perf_swevent_enabled[event_id]);
+ swevent_hlist_put(event);
+}
+
+static int perf_swevent_init(struct perf_event *event)
+{
+ int event_id = event->attr.config;
+
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ switch (event_id) {
+ case PERF_COUNT_SW_CPU_CLOCK:
+ case PERF_COUNT_SW_TASK_CLOCK:
+ return -ENOENT;
+
+ default:
+ break;
+ }
+
+ if (event_id >= PERF_COUNT_SW_MAX)
+ return -ENOENT;
+
+ if (!event->parent) {
+ int err;
+
+ err = swevent_hlist_get(event);
+ if (err)
+ return err;
+
+ jump_label_inc(&perf_swevent_enabled[event_id]);
+ event->destroy = sw_perf_event_destroy;
+ }
+
+ return 0;
+}
+
+static struct pmu perf_swevent = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = perf_swevent_init,
+ .add = perf_swevent_add,
+ .del = perf_swevent_del,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
+ .read = perf_swevent_read,
+};
+
+#ifdef CONFIG_EVENT_TRACING
+
+static int perf_tp_filter_match(struct perf_event *event,
+ struct perf_sample_data *data)
+{
+ void *record = data->raw->data;
+
+ if (likely(!event->filter) || filter_match_preds(event->filter, record))
+ return 1;
+ return 0;
+}
+
+static int perf_tp_event_match(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ if (event->hw.state & PERF_HES_STOPPED)
+ return 0;
+ /*
+ * All tracepoints are from kernel-space.
+ */
+ if (event->attr.exclude_kernel)
+ return 0;
+
+ if (!perf_tp_filter_match(event, data))
+ return 0;
+
+ return 1;
+}
+
+void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
+ struct pt_regs *regs, struct hlist_head *head, int rctx)
+{
+ struct perf_sample_data data;
+ struct perf_event *event;
+ struct hlist_node *node;
+
+ struct perf_raw_record raw = {
+ .size = entry_size,
+ .data = record,
+ };
+
+ perf_sample_data_init(&data, addr);
+ data.raw = &raw;
+
+ hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
+ if (perf_tp_event_match(event, &data, regs))
+ perf_swevent_event(event, count, 1, &data, regs);
+ }
+
+ perf_swevent_put_recursion_context(rctx);
+}
+EXPORT_SYMBOL_GPL(perf_tp_event);
+
+static void tp_perf_event_destroy(struct perf_event *event)
+{
+ perf_trace_destroy(event);
+}
+
+static int perf_tp_event_init(struct perf_event *event)
+{
+ int err;
+
+ if (event->attr.type != PERF_TYPE_TRACEPOINT)
+ return -ENOENT;
+
+ err = perf_trace_init(event);
+ if (err)
+ return err;
+
+ event->destroy = tp_perf_event_destroy;
+
+ return 0;
+}
+
+static struct pmu perf_tracepoint = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = perf_tp_event_init,
+ .add = perf_trace_add,
+ .del = perf_trace_del,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
+ .read = perf_swevent_read,
+};
+
+static inline void perf_tp_register(void)
+{
+ perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
+}
+
+static int perf_event_set_filter(struct perf_event *event, void __user *arg)
+{
+ char *filter_str;
+ int ret;
+
+ if (event->attr.type != PERF_TYPE_TRACEPOINT)
+ return -EINVAL;
+
+ filter_str = strndup_user(arg, PAGE_SIZE);
+ if (IS_ERR(filter_str))
+ return PTR_ERR(filter_str);
+
+ ret = ftrace_profile_set_filter(event, event->attr.config, filter_str);
+
+ kfree(filter_str);
+ return ret;
+}
+
+static void perf_event_free_filter(struct perf_event *event)
+{
+ ftrace_profile_free_filter(event);
+}
+
+#else
+
+static inline void perf_tp_register(void)
+{
+}
+
+static int perf_event_set_filter(struct perf_event *event, void __user *arg)
+{
+ return -ENOENT;
+}
+
+static void perf_event_free_filter(struct perf_event *event)
+{
+}
+
+#endif /* CONFIG_EVENT_TRACING */
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+void perf_bp_event(struct perf_event *bp, void *data)
+{
+ struct perf_sample_data sample;
+ struct pt_regs *regs = data;
+
+ perf_sample_data_init(&sample, bp->attr.bp_addr);
+
+ if (!bp->hw.state && !perf_exclude_event(bp, regs))
+ perf_swevent_event(bp, 1, 1, &sample, regs);
+}
+#endif
+
+/*
+ * hrtimer based swevent callback
+ */
+
+static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
+{
+ enum hrtimer_restart ret = HRTIMER_RESTART;
+ struct perf_sample_data data;
+ struct pt_regs *regs;
+ struct perf_event *event;
+ u64 period;
+
+ event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+
+ if (event->state != PERF_EVENT_STATE_ACTIVE)
+ return HRTIMER_NORESTART;
+
+ event->pmu->read(event);
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+ regs = get_irq_regs();
+
+ if (regs && !perf_exclude_event(event, regs)) {
+ if (!(event->attr.exclude_idle && current->pid == 0))
+ if (perf_event_overflow(event, 0, &data, regs))
+ ret = HRTIMER_NORESTART;
+ }
+
+ period = max_t(u64, 10000, event->hw.sample_period);
+ hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+
+ return ret;
+}
+
+static void perf_swevent_start_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ s64 period;
+
+ if (!is_sampling_event(event))
+ return;
+
+ period = local64_read(&hwc->period_left);
+ if (period) {
+ if (period < 0)
+ period = 10000;
+
+ local64_set(&hwc->period_left, 0);
+ } else {
+ period = max_t(u64, 10000, hwc->sample_period);
+ }
+ __hrtimer_start_range_ns(&hwc->hrtimer,
+ ns_to_ktime(period), 0,
+ HRTIMER_MODE_REL_PINNED, 0);
+}
+
+static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (is_sampling_event(event)) {
+ ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
+ local64_set(&hwc->period_left, ktime_to_ns(remaining));
+
+ hrtimer_cancel(&hwc->hrtimer);
+ }
+}
+
+static void perf_swevent_init_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (!is_sampling_event(event))
+ return;
+
+ hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hwc->hrtimer.function = perf_swevent_hrtimer;
+
+ /*
+ * Since hrtimers have a fixed rate, we can do a static freq->period
+ * mapping and avoid the whole period adjust feedback stuff.
+ */
+ if (event->attr.freq) {
+ long freq = event->attr.sample_freq;
+
+ event->attr.sample_period = NSEC_PER_SEC / freq;
+ hwc->sample_period = event->attr.sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+ event->attr.freq = 0;
+ }
+}
+
+/*
+ * Software event: cpu wall time clock
+ */
+
+static void cpu_clock_event_update(struct perf_event *event)
+{
+ s64 prev;
+ u64 now;
+
+ now = local_clock();
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void cpu_clock_event_start(struct perf_event *event, int flags)
+{
+ local64_set(&event->hw.prev_count, local_clock());
+ perf_swevent_start_hrtimer(event);
+}
+
+static void cpu_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ cpu_clock_event_update(event);
+}
+
+static int cpu_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ cpu_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void cpu_clock_event_del(struct perf_event *event, int flags)
+{
+ cpu_clock_event_stop(event, flags);
+}
+
+static void cpu_clock_event_read(struct perf_event *event)
+{
+ cpu_clock_event_update(event);
+}
+
+static int cpu_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
+ return -ENOENT;
+
+ perf_swevent_init_hrtimer(event);
+
+ return 0;
+}
+
+static struct pmu perf_cpu_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = cpu_clock_event_init,
+ .add = cpu_clock_event_add,
+ .del = cpu_clock_event_del,
+ .start = cpu_clock_event_start,
+ .stop = cpu_clock_event_stop,
+ .read = cpu_clock_event_read,
+};
+
+/*
+ * Software event: task time clock
+ */
+
+static void task_clock_event_update(struct perf_event *event, u64 now)
+{
+ u64 prev;
+ s64 delta;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ delta = now - prev;
+ local64_add(delta, &event->count);
+}
+
+static void task_clock_event_start(struct perf_event *event, int flags)
+{
+ local64_set(&event->hw.prev_count, event->ctx->time);
+ perf_swevent_start_hrtimer(event);
+}
+
+static void task_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ task_clock_event_update(event, event->ctx->time);
+}
+
+static int task_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ task_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void task_clock_event_del(struct perf_event *event, int flags)
+{
+ task_clock_event_stop(event, PERF_EF_UPDATE);
+}
+
+static void task_clock_event_read(struct perf_event *event)
+{
+ u64 now = perf_clock();
+ u64 delta = now - event->ctx->timestamp;
+ u64 time = event->ctx->time + delta;
+
+ task_clock_event_update(event, time);
+}
+
+static int task_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+ return -ENOENT;
+
+ perf_swevent_init_hrtimer(event);
+
+ return 0;
+}
+
+static struct pmu perf_task_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = task_clock_event_init,
+ .add = task_clock_event_add,
+ .del = task_clock_event_del,
+ .start = task_clock_event_start,
+ .stop = task_clock_event_stop,
+ .read = task_clock_event_read,
+};
+
+static void perf_pmu_nop_void(struct pmu *pmu)
+{
+}
+
+static int perf_pmu_nop_int(struct pmu *pmu)
+{
+ return 0;
+}
+
+static void perf_pmu_start_txn(struct pmu *pmu)
+{
+ perf_pmu_disable(pmu);
+}
+
+static int perf_pmu_commit_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+ return 0;
+}
+
+static void perf_pmu_cancel_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+}
+
+/*
+ * Ensures all contexts with the same task_ctx_nr have the same
+ * pmu_cpu_context too.
+ */
+static void *find_pmu_context(int ctxn)
+{
+ struct pmu *pmu;
+
+ if (ctxn < 0)
+ return NULL;
+
+ list_for_each_entry(pmu, &pmus, entry) {
+ if (pmu->task_ctx_nr == ctxn)
+ return pmu->pmu_cpu_context;
+ }
+
+ return NULL;
+}
+
+static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+
+ if (cpuctx->active_pmu == old_pmu)
+ cpuctx->active_pmu = pmu;
+ }
+}
+
+static void free_pmu_context(struct pmu *pmu)
+{
+ struct pmu *i;
+
+ mutex_lock(&pmus_lock);
+ /*
+ * Like a real lame refcount.
+ */
+ list_for_each_entry(i, &pmus, entry) {
+ if (i->pmu_cpu_context == pmu->pmu_cpu_context) {
+ update_pmu_context(i, pmu);
+ goto out;
+ }
+ }
+
+ free_percpu(pmu->pmu_cpu_context);
+out:
+ mutex_unlock(&pmus_lock);
+}
+static struct idr pmu_idr;
+
+static ssize_t
+type_show(struct device *dev, struct device_attribute *attr, char *page)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+
+ return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->type);
+}
+
+static struct device_attribute pmu_dev_attrs[] = {
+ __ATTR_RO(type),
+ __ATTR_NULL,
+};
+
+static int pmu_bus_running;
+static struct bus_type pmu_bus = {
+ .name = "event_source",
+ .dev_attrs = pmu_dev_attrs,
+};
+
+static void pmu_dev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static int pmu_dev_alloc(struct pmu *pmu)
+{
+ int ret = -ENOMEM;
+
+ pmu->dev = kzalloc(sizeof(struct device), GFP_KERNEL);
+ if (!pmu->dev)
+ goto out;
+
+ device_initialize(pmu->dev);
+ ret = dev_set_name(pmu->dev, "%s", pmu->name);
+ if (ret)
+ goto free_dev;
+
+ dev_set_drvdata(pmu->dev, pmu);
+ pmu->dev->bus = &pmu_bus;
+ pmu->dev->release = pmu_dev_release;
+ ret = device_add(pmu->dev);
+ if (ret)
+ goto free_dev;
+
+out:
+ return ret;
+
+free_dev:
+ put_device(pmu->dev);
+ goto out;
+}
+
+static struct lock_class_key cpuctx_mutex;
+
+int perf_pmu_register(struct pmu *pmu, char *name, int type)
+{
+ int cpu, ret;
+
+ mutex_lock(&pmus_lock);
+ ret = -ENOMEM;
+ pmu->pmu_disable_count = alloc_percpu(int);
+ if (!pmu->pmu_disable_count)
+ goto unlock;
+
+ pmu->type = -1;
+ if (!name)
+ goto skip_type;
+ pmu->name = name;
+
+ if (type < 0) {
+ int err = idr_pre_get(&pmu_idr, GFP_KERNEL);
+ if (!err)
+ goto free_pdc;
+
+ err = idr_get_new_above(&pmu_idr, pmu, PERF_TYPE_MAX, &type);
+ if (err) {
+ ret = err;
+ goto free_pdc;
+ }
+ }
+ pmu->type = type;
+
+ if (pmu_bus_running) {
+ ret = pmu_dev_alloc(pmu);
+ if (ret)
+ goto free_idr;
+ }
+
+skip_type:
+ pmu->pmu_cpu_context = find_pmu_context(pmu->task_ctx_nr);
+ if (pmu->pmu_cpu_context)
+ goto got_cpu_context;
+
+ pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
+ if (!pmu->pmu_cpu_context)
+ goto free_dev;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+ __perf_event_init_context(&cpuctx->ctx);
+ lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
+ cpuctx->ctx.type = cpu_context;
+ cpuctx->ctx.pmu = pmu;
+ cpuctx->jiffies_interval = 1;
+ INIT_LIST_HEAD(&cpuctx->rotation_list);
+ cpuctx->active_pmu = pmu;
+ }
+
+got_cpu_context:
+ if (!pmu->start_txn) {
+ if (pmu->pmu_enable) {
+ /*
+ * If we have pmu_enable/pmu_disable calls, install
+ * transaction stubs that use that to try and batch
+ * hardware accesses.
+ */
+ pmu->start_txn = perf_pmu_start_txn;
+ pmu->commit_txn = perf_pmu_commit_txn;
+ pmu->cancel_txn = perf_pmu_cancel_txn;
+ } else {
+ pmu->start_txn = perf_pmu_nop_void;
+ pmu->commit_txn = perf_pmu_nop_int;
+ pmu->cancel_txn = perf_pmu_nop_void;
+ }
+ }
+
+ if (!pmu->pmu_enable) {
+ pmu->pmu_enable = perf_pmu_nop_void;
+ pmu->pmu_disable = perf_pmu_nop_void;
+ }
+
+ list_add_rcu(&pmu->entry, &pmus);
+ ret = 0;
+unlock:
+ mutex_unlock(&pmus_lock);
+
+ return ret;
+
+free_dev:
+ device_del(pmu->dev);
+ put_device(pmu->dev);
+
+free_idr:
+ if (pmu->type >= PERF_TYPE_MAX)
+ idr_remove(&pmu_idr, pmu->type);
+
+free_pdc:
+ free_percpu(pmu->pmu_disable_count);
+ goto unlock;
+}
+
+void perf_pmu_unregister(struct pmu *pmu)
+{
+ mutex_lock(&pmus_lock);
+ list_del_rcu(&pmu->entry);
+ mutex_unlock(&pmus_lock);
+
+ /*
+ * We dereference the pmu list under both SRCU and regular RCU, so
+ * synchronize against both of those.
+ */
+ synchronize_srcu(&pmus_srcu);
+ synchronize_rcu();
+
+ free_percpu(pmu->pmu_disable_count);
+ if (pmu->type >= PERF_TYPE_MAX)
+ idr_remove(&pmu_idr, pmu->type);
+ device_del(pmu->dev);
+ put_device(pmu->dev);
+ free_pmu_context(pmu);
+}
+
+struct pmu *perf_init_event(struct perf_event *event)
+{
+ struct pmu *pmu = NULL;
+ int idx;
+ int ret;
+
+ idx = srcu_read_lock(&pmus_srcu);
+
+ rcu_read_lock();
+ pmu = idr_find(&pmu_idr, event->attr.type);
+ rcu_read_unlock();
+ if (pmu) {
+ ret = pmu->event_init(event);
+ if (ret)
+ pmu = ERR_PTR(ret);
+ goto unlock;
+ }
+
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ ret = pmu->event_init(event);
+ if (!ret)
+ goto unlock;
+
+ if (ret != -ENOENT) {
+ pmu = ERR_PTR(ret);
+ goto unlock;
+ }
+ }
+ pmu = ERR_PTR(-ENOENT);
+unlock:
+ srcu_read_unlock(&pmus_srcu, idx);
+
+ return pmu;
+}
+
+/*
+ * Allocate and initialize a event structure
+ */
+static struct perf_event *
+perf_event_alloc(struct perf_event_attr *attr, int cpu,
+ struct task_struct *task,
+ struct perf_event *group_leader,
+ struct perf_event *parent_event,
+ perf_overflow_handler_t overflow_handler)
+{
+ struct pmu *pmu;
+ struct perf_event *event;
+ struct hw_perf_event *hwc;
+ long err;
+
+ if ((unsigned)cpu >= nr_cpu_ids) {
+ if (!task || cpu != -1)
+ return ERR_PTR(-EINVAL);
+ }
+
+ event = kzalloc(sizeof(*event), GFP_KERNEL);
+ if (!event)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Single events are their own group leaders, with an
+ * empty sibling list:
+ */
+ if (!group_leader)
+ group_leader = event;
+
+ mutex_init(&event->child_mutex);
+ INIT_LIST_HEAD(&event->child_list);
+
+ INIT_LIST_HEAD(&event->group_entry);
+ INIT_LIST_HEAD(&event->event_entry);
+ INIT_LIST_HEAD(&event->sibling_list);
+ init_waitqueue_head(&event->waitq);
+ init_irq_work(&event->pending, perf_pending_event);
+
+ mutex_init(&event->mmap_mutex);
+
+ event->cpu = cpu;
+ event->attr = *attr;
+ event->group_leader = group_leader;
+ event->pmu = NULL;
+ event->oncpu = -1;
+
+ event->parent = parent_event;
+
+ event->ns = get_pid_ns(current->nsproxy->pid_ns);
+ event->id = atomic64_inc_return(&perf_event_id);
+
+ event->state = PERF_EVENT_STATE_INACTIVE;
+
+ if (task) {
+ event->attach_state = PERF_ATTACH_TASK;
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ /*
+ * hw_breakpoint is a bit difficult here..
+ */
+ if (attr->type == PERF_TYPE_BREAKPOINT)
+ event->hw.bp_target = task;
+#endif
+ }
+
+ if (!overflow_handler && parent_event)
+ overflow_handler = parent_event->overflow_handler;
+
+ event->overflow_handler = overflow_handler;
+
+ if (attr->disabled)
+ event->state = PERF_EVENT_STATE_OFF;
+
+ pmu = NULL;
+
+ hwc = &event->hw;
+ hwc->sample_period = attr->sample_period;
+ if (attr->freq && attr->sample_freq)
+ hwc->sample_period = 1;
+ hwc->last_period = hwc->sample_period;
+
+ local64_set(&hwc->period_left, hwc->sample_period);
+
+ /*
+ * we currently do not support PERF_FORMAT_GROUP on inherited events
+ */
+ if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
+ goto done;
+
+ pmu = perf_init_event(event);
+
+done:
+ err = 0;
+ if (!pmu)
+ err = -EINVAL;
+ else if (IS_ERR(pmu))
+ err = PTR_ERR(pmu);
+
+ if (err) {
+ if (event->ns)
+ put_pid_ns(event->ns);
+ kfree(event);
+ return ERR_PTR(err);
+ }
+
+ event->pmu = pmu;
+
+ if (!event->parent) {
+ if (event->attach_state & PERF_ATTACH_TASK)
+ jump_label_inc(&perf_sched_events);
+ if (event->attr.mmap || event->attr.mmap_data)
+ atomic_inc(&nr_mmap_events);
+ if (event->attr.comm)
+ atomic_inc(&nr_comm_events);
+ if (event->attr.task)
+ atomic_inc(&nr_task_events);
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
+ err = get_callchain_buffers();
+ if (err) {
+ free_event(event);
+ return ERR_PTR(err);
+ }
+ }
+ }
+
+ return event;
+}
+
+static int perf_copy_attr(struct perf_event_attr __user *uattr,
+ struct perf_event_attr *attr)
+{
+ u32 size;
+ int ret;
+
+ if (!access_ok(VERIFY_WRITE, uattr, PERF_ATTR_SIZE_VER0))
+ return -EFAULT;
+
+ /*
+ * zero the full structure, so that a short copy will be nice.
+ */
+ memset(attr, 0, sizeof(*attr));
+
+ ret = get_user(size, &uattr->size);
+ if (ret)
+ return ret;
+
+ if (size > PAGE_SIZE) /* silly large */
+ goto err_size;
+
+ if (!size) /* abi compat */
+ size = PERF_ATTR_SIZE_VER0;
+
+ if (size < PERF_ATTR_SIZE_VER0)
+ goto err_size;
+
+ /*
+ * If we're handed a bigger struct than we know of,
+ * ensure all the unknown bits are 0 - i.e. new
+ * user-space does not rely on any kernel feature
+ * extensions we dont know about yet.
+ */
+ if (size > sizeof(*attr)) {
+ unsigned char __user *addr;
+ unsigned char __user *end;
+ unsigned char val;
+
+ addr = (void __user *)uattr + sizeof(*attr);
+ end = (void __user *)uattr + size;
+
+ for (; addr < end; addr++) {
+ ret = get_user(val, addr);
+ if (ret)
+ return ret;
+ if (val)
+ goto err_size;
+ }
+ size = sizeof(*attr);
+ }
+
+ ret = copy_from_user(attr, uattr, size);
+ if (ret)
+ return -EFAULT;
+
+ /*
+ * If the type exists, the corresponding creation will verify
+ * the attr->config.
+ */
+ if (attr->type >= PERF_TYPE_MAX)
+ return -EINVAL;
+
+ if (attr->__reserved_1)
+ return -EINVAL;
+
+ if (attr->sample_type & ~(PERF_SAMPLE_MAX-1))
+ return -EINVAL;
+
+ if (attr->read_format & ~(PERF_FORMAT_MAX-1))
+ return -EINVAL;
+
+out:
+ return ret;
+
+err_size:
+ put_user(sizeof(*attr), &uattr->size);
+ ret = -E2BIG;
+ goto out;
+}
+
+static int
+perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
+{
+ struct perf_buffer *buffer = NULL, *old_buffer = NULL;
+ int ret = -EINVAL;
+
+ if (!output_event)
+ goto set;
+
+ /* don't allow circular references */
+ if (event == output_event)
+ goto out;
+
+ /*
+ * Don't allow cross-cpu buffers
+ */
+ if (output_event->cpu != event->cpu)
+ goto out;
+
+ /*
+ * If its not a per-cpu buffer, it must be the same task.
+ */
+ if (output_event->cpu == -1 && output_event->ctx != event->ctx)
+ goto out;
+
+set:
+ mutex_lock(&event->mmap_mutex);
+ /* Can't redirect output if we've got an active mmap() */
+ if (atomic_read(&event->mmap_count))
+ goto unlock;
+
+ if (output_event) {
+ /* get the buffer we want to redirect to */
+ buffer = perf_buffer_get(output_event);
+ if (!buffer)
+ goto unlock;
+ }
+
+ old_buffer = event->buffer;
+ rcu_assign_pointer(event->buffer, buffer);
+ ret = 0;
+unlock:
+ mutex_unlock(&event->mmap_mutex);
+
+ if (old_buffer)
+ perf_buffer_put(old_buffer);
+out:
+ return ret;
+}
+
+/**
+ * sys_perf_event_open - open a performance event, associate it to a task/cpu
+ *
+ * @attr_uptr: event_id type attributes for monitoring/sampling
+ * @pid: target pid
+ * @cpu: target cpu
+ * @group_fd: group leader event fd
+ */
+SYSCALL_DEFINE5(perf_event_open,
+ struct perf_event_attr __user *, attr_uptr,
+ pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
+{
+ struct perf_event *group_leader = NULL, *output_event = NULL;
+ struct perf_event *event, *sibling;
+ struct perf_event_attr attr;
+ struct perf_event_context *ctx;
+ struct file *event_file = NULL;
+ struct file *group_file = NULL;
+ struct task_struct *task = NULL;
+ struct pmu *pmu;
+ int event_fd;
+ int move_group = 0;
+ int fput_needed = 0;
+ int err;
+
+ /* for future expandability... */
+ if (flags & ~PERF_FLAG_ALL)
+ return -EINVAL;
+
+ err = perf_copy_attr(attr_uptr, &attr);
+ if (err)
+ return err;
+
+ if (!attr.exclude_kernel) {
+ if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
+ }
+
+ if (attr.freq) {
+ if (attr.sample_freq > sysctl_perf_event_sample_rate)
+ return -EINVAL;
+ }
+
+ /*
+ * In cgroup mode, the pid argument is used to pass the fd
+ * opened to the cgroup directory in cgroupfs. The cpu argument
+ * designates the cpu on which to monitor threads from that
+ * cgroup.
+ */
+ if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
+ return -EINVAL;
+
+ event_fd = get_unused_fd_flags(O_RDWR);
+ if (event_fd < 0)
+ return event_fd;
+
+ if (group_fd != -1) {
+ group_leader = perf_fget_light(group_fd, &fput_needed);
+ if (IS_ERR(group_leader)) {
+ err = PTR_ERR(group_leader);
+ goto err_fd;
+ }
+ group_file = group_leader->filp;
+ if (flags & PERF_FLAG_FD_OUTPUT)
+ output_event = group_leader;
+ if (flags & PERF_FLAG_FD_NO_GROUP)
+ group_leader = NULL;
+ }
+
+ if (pid != -1 && !(flags & PERF_FLAG_PID_CGROUP)) {
+ task = find_lively_task_by_vpid(pid);
+ if (IS_ERR(task)) {
+ err = PTR_ERR(task);
+ goto err_group_fd;
+ }
+ }
+
+ event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
+ if (IS_ERR(event)) {
+ err = PTR_ERR(event);
+ goto err_task;
+ }
+
+ if (flags & PERF_FLAG_PID_CGROUP) {
+ err = perf_cgroup_connect(pid, event, &attr, group_leader);
+ if (err)
+ goto err_alloc;
+ /*
+ * one more event:
+ * - that has cgroup constraint on event->cpu
+ * - that may need work on context switch
+ */
+ atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
+ jump_label_inc(&perf_sched_events);
+ }
+
+ /*
+ * Special case software events and allow them to be part of
+ * any hardware group.
+ */
+ pmu = event->pmu;
+
+ if (group_leader &&
+ (is_software_event(event) != is_software_event(group_leader))) {
+ if (is_software_event(event)) {
+ /*
+ * If event and group_leader are not both a software
+ * event, and event is, then group leader is not.
+ *
+ * Allow the addition of software events to !software
+ * groups, this is safe because software events never
+ * fail to schedule.
+ */
+ pmu = group_leader->pmu;
+ } else if (is_software_event(group_leader) &&
+ (group_leader->group_flags & PERF_GROUP_SOFTWARE)) {
+ /*
+ * In case the group is a pure software group, and we
+ * try to add a hardware event, move the whole group to
+ * the hardware context.
+ */
+ move_group = 1;
+ }
+ }
+
+ /*
+ * Get the target context (task or percpu):
+ */
+ ctx = find_get_context(pmu, task, cpu);
+ if (IS_ERR(ctx)) {
+ err = PTR_ERR(ctx);
+ goto err_alloc;
+ }
+
+ if (task) {
+ put_task_struct(task);
+ task = NULL;
+ }
+
+ /*
+ * Look up the group leader (we will attach this event to it):
+ */
+ if (group_leader) {
+ err = -EINVAL;
+
+ /*
+ * Do not allow a recursive hierarchy (this new sibling
+ * becoming part of another group-sibling):
+ */
+ if (group_leader->group_leader != group_leader)
+ goto err_context;
+ /*
+ * Do not allow to attach to a group in a different
+ * task or CPU context:
+ */
+ if (move_group) {
+ if (group_leader->ctx->type != ctx->type)
+ goto err_context;
+ } else {
+ if (group_leader->ctx != ctx)
+ goto err_context;
+ }
+
+ /*
+ * Only a group leader can be exclusive or pinned
+ */
+ if (attr.exclusive || attr.pinned)
+ goto err_context;
+ }
+
+ if (output_event) {
+ err = perf_event_set_output(event, output_event);
+ if (err)
+ goto err_context;
+ }
+
+ event_file = anon_inode_getfile("[perf_event]", &perf_fops, event, O_RDWR);
+ if (IS_ERR(event_file)) {
+ err = PTR_ERR(event_file);
+ goto err_context;
+ }
+
+ if (move_group) {
+ struct perf_event_context *gctx = group_leader->ctx;
+
+ mutex_lock(&gctx->mutex);
+ perf_remove_from_context(group_leader);
+ list_for_each_entry(sibling, &group_leader->sibling_list,
+ group_entry) {
+ perf_remove_from_context(sibling);
+ put_ctx(gctx);
+ }
+ mutex_unlock(&gctx->mutex);
+ put_ctx(gctx);
+ }
+
+ event->filp = event_file;
+ WARN_ON_ONCE(ctx->parent_ctx);
+ mutex_lock(&ctx->mutex);
+
+ if (move_group) {
+ perf_install_in_context(ctx, group_leader, cpu);
+ get_ctx(ctx);
+ list_for_each_entry(sibling, &group_leader->sibling_list,
+ group_entry) {
+ perf_install_in_context(ctx, sibling, cpu);
+ get_ctx(ctx);
+ }
+ }
+
+ perf_install_in_context(ctx, event, cpu);
+ ++ctx->generation;
+ perf_unpin_context(ctx);
+ mutex_unlock(&ctx->mutex);
+
+ event->owner = current;
+
+ mutex_lock(¤t->perf_event_mutex);
+ list_add_tail(&event->owner_entry, ¤t->perf_event_list);
+ mutex_unlock(¤t->perf_event_mutex);
+
+ /*
+ * Precalculate sample_data sizes
+ */
+ perf_event__header_size(event);
+ perf_event__id_header_size(event);
+
+ /*
+ * Drop the reference on the group_event after placing the
+ * new event on the sibling_list. This ensures destruction
+ * of the group leader will find the pointer to itself in
+ * perf_group_detach().
+ */
+ fput_light(group_file, fput_needed);
+ fd_install(event_fd, event_file);
+ return event_fd;
+
+err_context:
+ perf_unpin_context(ctx);
+ put_ctx(ctx);
+err_alloc:
+ free_event(event);
+err_task:
+ if (task)
+ put_task_struct(task);
+err_group_fd:
+ fput_light(group_file, fput_needed);
+err_fd:
+ put_unused_fd(event_fd);
+ return err;
+}
+
+/**
+ * perf_event_create_kernel_counter
+ *
+ * @attr: attributes of the counter to create
+ * @cpu: cpu in which the counter is bound
+ * @task: task to profile (NULL for percpu)
+ */
+struct perf_event *
+perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
+ struct task_struct *task,
+ perf_overflow_handler_t overflow_handler)
+{
+ struct perf_event_context *ctx;
+ struct perf_event *event;
+ int err;
+
+ /*
+ * Get the target context (task or percpu):
+ */
+
+ event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
+ if (IS_ERR(event)) {
+ err = PTR_ERR(event);
+ goto err;
+ }
+
+ ctx = find_get_context(event->pmu, task, cpu);
+ if (IS_ERR(ctx)) {
+ err = PTR_ERR(ctx);
+ goto err_free;
+ }
+
+ event->filp = NULL;
+ WARN_ON_ONCE(ctx->parent_ctx);
+ mutex_lock(&ctx->mutex);
+ perf_install_in_context(ctx, event, cpu);
+ ++ctx->generation;
+ perf_unpin_context(ctx);
+ mutex_unlock(&ctx->mutex);
+
+ return event;
+
+err_free:
+ free_event(event);
+err:
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
+
+static void sync_child_event(struct perf_event *child_event,
+ struct task_struct *child)
+{
+ struct perf_event *parent_event = child_event->parent;
+ u64 child_val;
+
+ if (child_event->attr.inherit_stat)
+ perf_event_read_event(child_event, child);
+
+ child_val = perf_event_count(child_event);
+
+ /*
+ * Add back the child's count to the parent's count:
+ */
+ atomic64_add(child_val, &parent_event->child_count);
+ atomic64_add(child_event->total_time_enabled,
+ &parent_event->child_total_time_enabled);
+ atomic64_add(child_event->total_time_running,
+ &parent_event->child_total_time_running);
+
+ /*
+ * Remove this event from the parent's list
+ */
+ WARN_ON_ONCE(parent_event->ctx->parent_ctx);
+ mutex_lock(&parent_event->child_mutex);
+ list_del_init(&child_event->child_list);
+ mutex_unlock(&parent_event->child_mutex);
+
+ /*
+ * Release the parent event, if this was the last
+ * reference to it.
+ */
+ fput(parent_event->filp);
+}
+
+static void
+__perf_event_exit_task(struct perf_event *child_event,
+ struct perf_event_context *child_ctx,
+ struct task_struct *child)
+{
+ if (child_event->parent) {
+ raw_spin_lock_irq(&child_ctx->lock);
+ perf_group_detach(child_event);
+ raw_spin_unlock_irq(&child_ctx->lock);
+ }
+
+ perf_remove_from_context(child_event);
+
+ /*
+ * It can happen that the parent exits first, and has events
+ * that are still around due to the child reference. These
+ * events need to be zapped.
+ */
+ if (child_event->parent) {
+ sync_child_event(child_event, child);
+ free_event(child_event);
+ }
+}
+
+static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
+{
+ struct perf_event *child_event, *tmp;
+ struct perf_event_context *child_ctx;
+ unsigned long flags;
+
+ if (likely(!child->perf_event_ctxp[ctxn])) {
+ perf_event_task(child, NULL, 0);
+ return;
+ }
+
+ local_irq_save(flags);
+ /*
+ * We can't reschedule here because interrupts are disabled,
+ * and either child is current or it is a task that can't be
+ * scheduled, so we are now safe from rescheduling changing
+ * our context.
+ */
+ child_ctx = rcu_dereference_raw(child->perf_event_ctxp[ctxn]);
+ task_ctx_sched_out(child_ctx, EVENT_ALL);
+
+ /*
+ * Take the context lock here so that if find_get_context is
+ * reading child->perf_event_ctxp, we wait until it has
+ * incremented the context's refcount before we do put_ctx below.
+ */
+ raw_spin_lock(&child_ctx->lock);
+ child->perf_event_ctxp[ctxn] = NULL;
+ /*
+ * If this context is a clone; unclone it so it can't get
+ * swapped to another process while we're removing all
+ * the events from it.
+ */
+ unclone_ctx(child_ctx);
+ update_context_time(child_ctx);
+ raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
+
+ /*
+ * Report the task dead after unscheduling the events so that we
+ * won't get any samples after PERF_RECORD_EXIT. We can however still
+ * get a few PERF_RECORD_READ events.
+ */
+ perf_event_task(child, child_ctx, 0);
+
+ /*
+ * We can recurse on the same lock type through:
+ *
+ * __perf_event_exit_task()
+ * sync_child_event()
+ * fput(parent_event->filp)
+ * perf_release()
+ * mutex_lock(&ctx->mutex)
+ *
+ * But since its the parent context it won't be the same instance.
+ */
+ mutex_lock(&child_ctx->mutex);
+
+again:
+ list_for_each_entry_safe(child_event, tmp, &child_ctx->pinned_groups,
+ group_entry)
+ __perf_event_exit_task(child_event, child_ctx, child);
+
+ list_for_each_entry_safe(child_event, tmp, &child_ctx->flexible_groups,
+ group_entry)
+ __perf_event_exit_task(child_event, child_ctx, child);
+
+ /*
+ * If the last event was a group event, it will have appended all
+ * its siblings to the list, but we obtained 'tmp' before that which
+ * will still point to the list head terminating the iteration.
+ */
+ if (!list_empty(&child_ctx->pinned_groups) ||
+ !list_empty(&child_ctx->flexible_groups))
+ goto again;
+
+ mutex_unlock(&child_ctx->mutex);
+
+ put_ctx(child_ctx);
+}
+
+/*
+ * When a child task exits, feed back event values to parent events.
+ */
+void perf_event_exit_task(struct task_struct *child)
+{
+ struct perf_event *event, *tmp;
+ int ctxn;
+
+ mutex_lock(&child->perf_event_mutex);
+ list_for_each_entry_safe(event, tmp, &child->perf_event_list,
+ owner_entry) {
+ list_del_init(&event->owner_entry);
+
+ /*
+ * Ensure the list deletion is visible before we clear
+ * the owner, closes a race against perf_release() where
+ * we need to serialize on the owner->perf_event_mutex.
+ */
+ smp_wmb();
+ event->owner = NULL;
+ }
+ mutex_unlock(&child->perf_event_mutex);
+
+ for_each_task_context_nr(ctxn)
+ perf_event_exit_task_context(child, ctxn);
+}
+
+static void perf_free_event(struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *parent = event->parent;
+
+ if (WARN_ON_ONCE(!parent))
+ return;
+
+ mutex_lock(&parent->child_mutex);
+ list_del_init(&event->child_list);
+ mutex_unlock(&parent->child_mutex);
+
+ fput(parent->filp);
+
+ perf_group_detach(event);
+ list_del_event(event, ctx);
+ free_event(event);
+}
+
+/*
+ * free an unexposed, unused context as created by inheritance by
+ * perf_event_init_task below, used by fork() in case of fail.
+ */
+void perf_event_free_task(struct task_struct *task)
+{
+ struct perf_event_context *ctx;
+ struct perf_event *event, *tmp;
+ int ctxn;
+
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (!ctx)
+ continue;
+
+ mutex_lock(&ctx->mutex);
+again:
+ list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
+ group_entry)
+ perf_free_event(event, ctx);
+
+ list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
+ group_entry)
+ perf_free_event(event, ctx);
+
+ if (!list_empty(&ctx->pinned_groups) ||
+ !list_empty(&ctx->flexible_groups))
+ goto again;
+
+ mutex_unlock(&ctx->mutex);
+
+ put_ctx(ctx);
+ }
+}
+
+void perf_event_delayed_put(struct task_struct *task)
+{
+ int ctxn;
+
+ for_each_task_context_nr(ctxn)
+ WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
+}
+
+/*
+ * inherit a event from parent task to child task:
+ */
+static struct perf_event *
+inherit_event(struct perf_event *parent_event,
+ struct task_struct *parent,
+ struct perf_event_context *parent_ctx,
+ struct task_struct *child,
+ struct perf_event *group_leader,
+ struct perf_event_context *child_ctx)
+{
+ struct perf_event *child_event;
+ unsigned long flags;
+
+ /*
+ * Instead of creating recursive hierarchies of events,
+ * we link inherited events back to the original parent,
+ * which has a filp for sure, which we use as the reference
+ * count:
+ */
+ if (parent_event->parent)
+ parent_event = parent_event->parent;
+
+ child_event = perf_event_alloc(&parent_event->attr,
+ parent_event->cpu,
+ child,
+ group_leader, parent_event,
+ NULL);
+ if (IS_ERR(child_event))
+ return child_event;
+ get_ctx(child_ctx);
+
+ /*
+ * Make the child state follow the state of the parent event,
+ * not its attr.disabled bit. We hold the parent's mutex,
+ * so we won't race with perf_event_{en, dis}able_family.
+ */
+ if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
+ child_event->state = PERF_EVENT_STATE_INACTIVE;
+ else
+ child_event->state = PERF_EVENT_STATE_OFF;
+
+ if (parent_event->attr.freq) {
+ u64 sample_period = parent_event->hw.sample_period;
+ struct hw_perf_event *hwc = &child_event->hw;
+
+ hwc->sample_period = sample_period;
+ hwc->last_period = sample_period;
+
+ local64_set(&hwc->period_left, sample_period);
+ }
+
+ child_event->ctx = child_ctx;
+ child_event->overflow_handler = parent_event->overflow_handler;
+
+ /*
+ * Precalculate sample_data sizes
+ */
+ perf_event__header_size(child_event);
+ perf_event__id_header_size(child_event);
+
+ /*
+ * Link it up in the child's context:
+ */
+ raw_spin_lock_irqsave(&child_ctx->lock, flags);
+ add_event_to_ctx(child_event, child_ctx);
+ raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
+
+ /*
+ * Get a reference to the parent filp - we will fput it
+ * when the child event exits. This is safe to do because
+ * we are in the parent and we know that the filp still
+ * exists and has a nonzero count:
+ */
+ atomic_long_inc(&parent_event->filp->f_count);
+
+ /*
+ * Link this into the parent event's child list
+ */
+ WARN_ON_ONCE(parent_event->ctx->parent_ctx);
+ mutex_lock(&parent_event->child_mutex);
+ list_add_tail(&child_event->child_list, &parent_event->child_list);
+ mutex_unlock(&parent_event->child_mutex);
+
+ return child_event;
+}
+
+static int inherit_group(struct perf_event *parent_event,
+ struct task_struct *parent,
+ struct perf_event_context *parent_ctx,
+ struct task_struct *child,
+ struct perf_event_context *child_ctx)
+{
+ struct perf_event *leader;
+ struct perf_event *sub;
+ struct perf_event *child_ctr;
+
+ leader = inherit_event(parent_event, parent, parent_ctx,
+ child, NULL, child_ctx);
+ if (IS_ERR(leader))
+ return PTR_ERR(leader);
+ list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
+ child_ctr = inherit_event(sub, parent, parent_ctx,
+ child, leader, child_ctx);
+ if (IS_ERR(child_ctr))
+ return PTR_ERR(child_ctr);
+ }
+ return 0;
+}
+
+static int
+inherit_task_group(struct perf_event *event, struct task_struct *parent,
+ struct perf_event_context *parent_ctx,
+ struct task_struct *child, int ctxn,
+ int *inherited_all)
+{
+ int ret;
+ struct perf_event_context *child_ctx;
+
+ if (!event->attr.inherit) {
+ *inherited_all = 0;
+ return 0;
+ }
+
+ child_ctx = child->perf_event_ctxp[ctxn];
+ if (!child_ctx) {
+ /*
+ * This is executed from the parent task context, so
+ * inherit events that have been marked for cloning.
+ * First allocate and initialize a context for the
+ * child.
+ */
+
+ child_ctx = alloc_perf_context(event->pmu, child);
+ if (!child_ctx)
+ return -ENOMEM;
+
+ child->perf_event_ctxp[ctxn] = child_ctx;
+ }
+
+ ret = inherit_group(event, parent, parent_ctx,
+ child, child_ctx);
+
+ if (ret)
+ *inherited_all = 0;
+
+ return ret;
+}
+
+/*
+ * Initialize the perf_event context in task_struct
+ */
+int perf_event_init_context(struct task_struct *child, int ctxn)
+{
+ struct perf_event_context *child_ctx, *parent_ctx;
+ struct perf_event_context *cloned_ctx;
+ struct perf_event *event;
+ struct task_struct *parent = current;
+ int inherited_all = 1;
+ unsigned long flags;
+ int ret = 0;
+
+ if (likely(!parent->perf_event_ctxp[ctxn]))
+ return 0;
+
+ /*
+ * If the parent's context is a clone, pin it so it won't get
+ * swapped under us.
+ */
+ parent_ctx = perf_pin_task_context(parent, ctxn);
+
+ /*
+ * No need to check if parent_ctx != NULL here; since we saw
+ * it non-NULL earlier, the only reason for it to become NULL
+ * is if we exit, and since we're currently in the middle of
+ * a fork we can't be exiting at the same time.
+ */
+
+ /*
+ * Lock the parent list. No need to lock the child - not PID
+ * hashed yet and not running, so nobody can access it.
+ */
+ mutex_lock(&parent_ctx->mutex);
+
+ /*
+ * We dont have to disable NMIs - we are only looking at
+ * the list, not manipulating it:
+ */
+ list_for_each_entry(event, &parent_ctx->pinned_groups, group_entry) {
+ ret = inherit_task_group(event, parent, parent_ctx,
+ child, ctxn, &inherited_all);
+ if (ret)
+ break;
+ }
+
+ /*
+ * We can't hold ctx->lock when iterating the ->flexible_group list due
+ * to allocations, but we need to prevent rotation because
+ * rotate_ctx() will change the list from interrupt context.
+ */
+ raw_spin_lock_irqsave(&parent_ctx->lock, flags);
+ parent_ctx->rotate_disable = 1;
+ raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
+
+ list_for_each_entry(event, &parent_ctx->flexible_groups, group_entry) {
+ ret = inherit_task_group(event, parent, parent_ctx,
+ child, ctxn, &inherited_all);
+ if (ret)
+ break;
+ }
+
+ raw_spin_lock_irqsave(&parent_ctx->lock, flags);
+ parent_ctx->rotate_disable = 0;
+
+ child_ctx = child->perf_event_ctxp[ctxn];
+
+ if (child_ctx && inherited_all) {
+ /*
+ * Mark the child context as a clone of the parent
+ * context, or of whatever the parent is a clone of.
+ *
+ * Note that if the parent is a clone, the holding of
+ * parent_ctx->lock avoids it from being uncloned.
+ */
+ cloned_ctx = parent_ctx->parent_ctx;
+ if (cloned_ctx) {
+ child_ctx->parent_ctx = cloned_ctx;
+ child_ctx->parent_gen = parent_ctx->parent_gen;
+ } else {
+ child_ctx->parent_ctx = parent_ctx;
+ child_ctx->parent_gen = parent_ctx->generation;
+ }
+ get_ctx(child_ctx->parent_ctx);
+ }
+
+ raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
+ mutex_unlock(&parent_ctx->mutex);
+
+ perf_unpin_context(parent_ctx);
+ put_ctx(parent_ctx);
+
+ return ret;
+}
+
+/*
+ * Initialize the perf_event context in task_struct
+ */
+int perf_event_init_task(struct task_struct *child)
+{
+ int ctxn, ret;
+
+ memset(child->perf_event_ctxp, 0, sizeof(child->perf_event_ctxp));
+ mutex_init(&child->perf_event_mutex);
+ INIT_LIST_HEAD(&child->perf_event_list);
+
+ for_each_task_context_nr(ctxn) {
+ ret = perf_event_init_context(child, ctxn);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __init perf_event_init_all_cpus(void)
+{
+ struct swevent_htable *swhash;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ swhash = &per_cpu(swevent_htable, cpu);
+ mutex_init(&swhash->hlist_mutex);
+ INIT_LIST_HEAD(&per_cpu(rotation_list, cpu));
+ }
+}
+
+static void __cpuinit perf_event_init_cpu(int cpu)
+{
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
+
+ mutex_lock(&swhash->hlist_mutex);
+ if (swhash->hlist_refcount > 0) {
+ struct swevent_hlist *hlist;
+
+ hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
+ WARN_ON(!hlist);
+ rcu_assign_pointer(swhash->swevent_hlist, hlist);
+ }
+ mutex_unlock(&swhash->hlist_mutex);
+}
+
+#if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC
+static void perf_pmu_rotate_stop(struct pmu *pmu)
+{
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+ WARN_ON(!irqs_disabled());
+
+ list_del_init(&cpuctx->rotation_list);
+}
+
+static void __perf_event_exit_context(void *__info)
+{
+ struct perf_event_context *ctx = __info;
+ struct perf_event *event, *tmp;
+
+ perf_pmu_rotate_stop(ctx->pmu);
+
+ list_for_each_entry_safe(event, tmp, &ctx->pinned_groups, group_entry)
+ __perf_remove_from_context(event);
+ list_for_each_entry_safe(event, tmp, &ctx->flexible_groups, group_entry)
+ __perf_remove_from_context(event);
+}
+
+static void perf_event_exit_cpu_context(int cpu)
+{
+ struct perf_event_context *ctx;
+ struct pmu *pmu;
+ int idx;
+
+ idx = srcu_read_lock(&pmus_srcu);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ ctx = &per_cpu_ptr(pmu->pmu_cpu_context, cpu)->ctx;
+
+ mutex_lock(&ctx->mutex);
+ smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+ mutex_unlock(&ctx->mutex);
+ }
+ srcu_read_unlock(&pmus_srcu, idx);
+}
+
+static void perf_event_exit_cpu(int cpu)
+{
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
+
+ mutex_lock(&swhash->hlist_mutex);
+ swevent_hlist_release(swhash);
+ mutex_unlock(&swhash->hlist_mutex);
+
+ perf_event_exit_cpu_context(cpu);
+}
+#else
+static inline void perf_event_exit_cpu(int cpu) { }
+#endif
+
+static int
+perf_reboot(struct notifier_block *notifier, unsigned long val, void *v)
+{
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ perf_event_exit_cpu(cpu);
+
+ return NOTIFY_OK;
+}
+
+/*
+ * Run the perf reboot notifier at the very last possible moment so that
+ * the generic watchdog code runs as long as possible.
+ */
+static struct notifier_block perf_reboot_notifier = {
+ .notifier_call = perf_reboot,
+ .priority = INT_MIN,
+};
+
+static int __cpuinit
+perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
+{
+ unsigned int cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+
+ case CPU_UP_PREPARE:
+ case CPU_DOWN_FAILED:
+ perf_event_init_cpu(cpu);
+ break;
+
+ case CPU_UP_CANCELED:
+ case CPU_DOWN_PREPARE:
+ perf_event_exit_cpu(cpu);
+ break;
+
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+void __init perf_event_init(void)
+{
+ int ret;
+
+ idr_init(&pmu_idr);
+
+ perf_event_init_all_cpus();
+ init_srcu_struct(&pmus_srcu);
+ perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
+ perf_pmu_register(&perf_cpu_clock, NULL, -1);
+ perf_pmu_register(&perf_task_clock, NULL, -1);
+ perf_tp_register();
+ perf_cpu_notifier(perf_cpu_notify);
+ register_reboot_notifier(&perf_reboot_notifier);
+
+ ret = init_hw_breakpoint();
+ WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
+}
+
+static int __init perf_event_sysfs_init(void)
+{
+ struct pmu *pmu;
+ int ret;
+
+ mutex_lock(&pmus_lock);
+
+ ret = bus_register(&pmu_bus);
+ if (ret)
+ goto unlock;
+
+ list_for_each_entry(pmu, &pmus, entry) {
+ if (!pmu->name || pmu->type < 0)
+ continue;
+
+ ret = pmu_dev_alloc(pmu);
+ WARN(ret, "Failed to register pmu: %s, reason %d\n", pmu->name, ret);
+ }
+ pmu_bus_running = 1;
+ ret = 0;
+
+unlock:
+ mutex_unlock(&pmus_lock);
+
+ return ret;
+}
+device_initcall(perf_event_sysfs_init);
+
+#ifdef CONFIG_CGROUP_PERF
+static struct cgroup_subsys_state *perf_cgroup_create(
+ struct cgroup_subsys *ss, struct cgroup *cont)
+{
+ struct perf_cgroup *jc;
+
+ jc = kzalloc(sizeof(*jc), GFP_KERNEL);
+ if (!jc)
+ return ERR_PTR(-ENOMEM);
+
+ jc->info = alloc_percpu(struct perf_cgroup_info);
+ if (!jc->info) {
+ kfree(jc);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return &jc->css;
+}
+
+static void perf_cgroup_destroy(struct cgroup_subsys *ss,
+ struct cgroup *cont)
+{
+ struct perf_cgroup *jc;
+ jc = container_of(cgroup_subsys_state(cont, perf_subsys_id),
+ struct perf_cgroup, css);
+ free_percpu(jc->info);
+ kfree(jc);
+}
+
+static int __perf_cgroup_move(void *info)
+{
+ struct task_struct *task = info;
+ perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN);
+ return 0;
+}
+
+static void perf_cgroup_move(struct task_struct *task)
+{
+ task_function_call(task, __perf_cgroup_move, task);
+}
+
+static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup *old_cgrp, struct task_struct *task,
+ bool threadgroup)
+{
+ perf_cgroup_move(task);
+ if (threadgroup) {
+ struct task_struct *c;
+ rcu_read_lock();
+ list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
+ perf_cgroup_move(c);
+ }
+ rcu_read_unlock();
+ }
+}
+
+static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup *old_cgrp, struct task_struct *task)
+{
+ /*
+ * cgroup_exit() is called in the copy_process() failure path.
+ * Ignore this case since the task hasn't ran yet, this avoids
+ * trying to poke a half freed task state from generic code.
+ */
+ if (!(task->flags & PF_EXITING))
+ return;
+
+ perf_cgroup_move(task);
+}
+
+struct cgroup_subsys perf_subsys = {
+ .name = "perf_event",
+ .subsys_id = perf_subsys_id,
+ .create = perf_cgroup_create,
+ .destroy = perf_cgroup_destroy,
+ .exit = perf_cgroup_exit,
+ .attach = perf_cgroup_attach,
+};
+#endif /* CONFIG_CGROUP_PERF */
--- /dev/null
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ * Copyright (C) IBM Corporation, 2009
+ * Copyright (C) 2009, Frederic Weisbecker <fweisbec@gmail.com>
+ *
+ * Thanks to Ingo Molnar for his many suggestions.
+ *
+ * Authors: Alan Stern <stern@rowland.harvard.edu>
+ * K.Prasad <prasad@linux.vnet.ibm.com>
+ * Frederic Weisbecker <fweisbec@gmail.com>
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ * This file contains the arch-independent routines.
+ */
+
+#include <linux/irqflags.h>
+#include <linux/kallsyms.h>
+#include <linux/notifier.h>
+#include <linux/kprobes.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+
+#include <linux/hw_breakpoint.h>
+
+
+/*
+ * Constraints data
+ */
+
+/* Number of pinned cpu breakpoints in a cpu */
+static DEFINE_PER_CPU(unsigned int, nr_cpu_bp_pinned[TYPE_MAX]);
+
+/* Number of pinned task breakpoints in a cpu */
+static DEFINE_PER_CPU(unsigned int *, nr_task_bp_pinned[TYPE_MAX]);
+
+/* Number of non-pinned cpu/task breakpoints in a cpu */
+static DEFINE_PER_CPU(unsigned int, nr_bp_flexible[TYPE_MAX]);
+
+static int nr_slots[TYPE_MAX];
+
+/* Keep track of the breakpoints attached to tasks */
+static LIST_HEAD(bp_task_head);
+
+static int constraints_initialized;
+
+/* Gather the number of total pinned and un-pinned bp in a cpuset */
+struct bp_busy_slots {
+ unsigned int pinned;
+ unsigned int flexible;
+};
+
+/* Serialize accesses to the above constraints */
+static DEFINE_MUTEX(nr_bp_mutex);
+
+__weak int hw_breakpoint_weight(struct perf_event *bp)
+{
+ return 1;
+}
+
+static inline enum bp_type_idx find_slot_idx(struct perf_event *bp)
+{
+ if (bp->attr.bp_type & HW_BREAKPOINT_RW)
+ return TYPE_DATA;
+
+ return TYPE_INST;
+}
+
+/*
+ * Report the maximum number of pinned breakpoints a task
+ * have in this cpu
+ */
+static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
+{
+ int i;
+ unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+
+ for (i = nr_slots[type] - 1; i >= 0; i--) {
+ if (tsk_pinned[i] > 0)
+ return i + 1;
+ }
+
+ return 0;
+}
+
+/*
+ * Count the number of breakpoints of the same type and same task.
+ * The given event must be not on the list.
+ */
+static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
+{
+ struct task_struct *tsk = bp->hw.bp_target;
+ struct perf_event *iter;
+ int count = 0;
+
+ list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
+ if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
+ count += hw_breakpoint_weight(iter);
+ }
+
+ return count;
+}
+
+/*
+ * Report the number of pinned/un-pinned breakpoints we have in
+ * a given cpu (cpu > -1) or in all of them (cpu = -1).
+ */
+static void
+fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
+ enum bp_type_idx type)
+{
+ int cpu = bp->cpu;
+ struct task_struct *tsk = bp->hw.bp_target;
+
+ if (cpu >= 0) {
+ slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
+ if (!tsk)
+ slots->pinned += max_task_bp_pinned(cpu, type);
+ else
+ slots->pinned += task_bp_pinned(bp, type);
+ slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
+
+ return;
+ }
+
+ for_each_online_cpu(cpu) {
+ unsigned int nr;
+
+ nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
+ if (!tsk)
+ nr += max_task_bp_pinned(cpu, type);
+ else
+ nr += task_bp_pinned(bp, type);
+
+ if (nr > slots->pinned)
+ slots->pinned = nr;
+
+ nr = per_cpu(nr_bp_flexible[type], cpu);
+
+ if (nr > slots->flexible)
+ slots->flexible = nr;
+ }
+}
+
+/*
+ * For now, continue to consider flexible as pinned, until we can
+ * ensure no flexible event can ever be scheduled before a pinned event
+ * in a same cpu.
+ */
+static void
+fetch_this_slot(struct bp_busy_slots *slots, int weight)
+{
+ slots->pinned += weight;
+}
+
+/*
+ * Add a pinned breakpoint for the given task in our constraint table
+ */
+static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
+ enum bp_type_idx type, int weight)
+{
+ unsigned int *tsk_pinned;
+ int old_count = 0;
+ int old_idx = 0;
+ int idx = 0;
+
+ old_count = task_bp_pinned(bp, type);
+ old_idx = old_count - 1;
+ idx = old_idx + weight;
+
+ /* tsk_pinned[n] is the number of tasks having n breakpoints */
+ tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
+ if (enable) {
+ tsk_pinned[idx]++;
+ if (old_count > 0)
+ tsk_pinned[old_idx]--;
+ } else {
+ tsk_pinned[idx]--;
+ if (old_count > 0)
+ tsk_pinned[old_idx]++;
+ }
+}
+
+/*
+ * Add/remove the given breakpoint in our constraint table
+ */
+static void
+toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
+ int weight)
+{
+ int cpu = bp->cpu;
+ struct task_struct *tsk = bp->hw.bp_target;
+
+ /* Pinned counter cpu profiling */
+ if (!tsk) {
+
+ if (enable)
+ per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
+ else
+ per_cpu(nr_cpu_bp_pinned[type], bp->cpu) -= weight;
+ return;
+ }
+
+ /* Pinned counter task profiling */
+
+ if (!enable)
+ list_del(&bp->hw.bp_list);
+
+ if (cpu >= 0) {
+ toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ } else {
+ for_each_online_cpu(cpu)
+ toggle_bp_task_slot(bp, cpu, enable, type, weight);
+ }
+
+ if (enable)
+ list_add_tail(&bp->hw.bp_list, &bp_task_head);
+}
+
+/*
+ * Function to perform processor-specific cleanup during unregistration
+ */
+__weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
+{
+ /*
+ * A weak stub function here for those archs that don't define
+ * it inside arch/.../kernel/hw_breakpoint.c
+ */
+}
+
+/*
+ * Contraints to check before allowing this new breakpoint counter:
+ *
+ * == Non-pinned counter == (Considered as pinned for now)
+ *
+ * - If attached to a single cpu, check:
+ *
+ * (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
+ * + max(per_cpu(nr_task_bp_pinned, cpu)))) < HBP_NUM
+ *
+ * -> If there are already non-pinned counters in this cpu, it means
+ * there is already a free slot for them.
+ * Otherwise, we check that the maximum number of per task
+ * breakpoints (for this cpu) plus the number of per cpu breakpoint
+ * (for this cpu) doesn't cover every registers.
+ *
+ * - If attached to every cpus, check:
+ *
+ * (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
+ * + max(per_cpu(nr_task_bp_pinned, *)))) < HBP_NUM
+ *
+ * -> This is roughly the same, except we check the number of per cpu
+ * bp for every cpu and we keep the max one. Same for the per tasks
+ * breakpoints.
+ *
+ *
+ * == Pinned counter ==
+ *
+ * - If attached to a single cpu, check:
+ *
+ * ((per_cpu(nr_bp_flexible, cpu) > 1) + per_cpu(nr_cpu_bp_pinned, cpu)
+ * + max(per_cpu(nr_task_bp_pinned, cpu))) < HBP_NUM
+ *
+ * -> Same checks as before. But now the nr_bp_flexible, if any, must keep
+ * one register at least (or they will never be fed).
+ *
+ * - If attached to every cpus, check:
+ *
+ * ((per_cpu(nr_bp_flexible, *) > 1) + max(per_cpu(nr_cpu_bp_pinned, *))
+ * + max(per_cpu(nr_task_bp_pinned, *))) < HBP_NUM
+ */
+static int __reserve_bp_slot(struct perf_event *bp)
+{
+ struct bp_busy_slots slots = {0};
+ enum bp_type_idx type;
+ int weight;
+
+ /* We couldn't initialize breakpoint constraints on boot */
+ if (!constraints_initialized)
+ return -ENOMEM;
+
+ /* Basic checks */
+ if (bp->attr.bp_type == HW_BREAKPOINT_EMPTY ||
+ bp->attr.bp_type == HW_BREAKPOINT_INVALID)
+ return -EINVAL;
+
+ type = find_slot_idx(bp);
+ weight = hw_breakpoint_weight(bp);
+
+ fetch_bp_busy_slots(&slots, bp, type);
+ /*
+ * Simulate the addition of this breakpoint to the constraints
+ * and see the result.
+ */
+ fetch_this_slot(&slots, weight);
+
+ /* Flexible counters need to keep at least one slot */
+ if (slots.pinned + (!!slots.flexible) > nr_slots[type])
+ return -ENOSPC;
+
+ toggle_bp_slot(bp, true, type, weight);
+
+ return 0;
+}
+
+int reserve_bp_slot(struct perf_event *bp)
+{
+ int ret;
+
+ mutex_lock(&nr_bp_mutex);
+
+ ret = __reserve_bp_slot(bp);
+
+ mutex_unlock(&nr_bp_mutex);
+
+ return ret;
+}
+
+static void __release_bp_slot(struct perf_event *bp)
+{
+ enum bp_type_idx type;
+ int weight;
+
+ type = find_slot_idx(bp);
+ weight = hw_breakpoint_weight(bp);
+ toggle_bp_slot(bp, false, type, weight);
+}
+
+void release_bp_slot(struct perf_event *bp)
+{
+ mutex_lock(&nr_bp_mutex);
+
+ arch_unregister_hw_breakpoint(bp);
+ __release_bp_slot(bp);
+
+ mutex_unlock(&nr_bp_mutex);
+}
+
+/*
+ * Allow the kernel debugger to reserve breakpoint slots without
+ * taking a lock using the dbg_* variant of for the reserve and
+ * release breakpoint slots.
+ */
+int dbg_reserve_bp_slot(struct perf_event *bp)
+{
+ if (mutex_is_locked(&nr_bp_mutex))
+ return -1;
+
+ return __reserve_bp_slot(bp);
+}
+
+int dbg_release_bp_slot(struct perf_event *bp)
+{
+ if (mutex_is_locked(&nr_bp_mutex))
+ return -1;
+
+ __release_bp_slot(bp);
+
+ return 0;
+}
+
+static int validate_hw_breakpoint(struct perf_event *bp)
+{
+ int ret;
+
+ ret = arch_validate_hwbkpt_settings(bp);
+ if (ret)
+ return ret;
+
+ if (arch_check_bp_in_kernelspace(bp)) {
+ if (bp->attr.exclude_kernel)
+ return -EINVAL;
+ /*
+ * Don't let unprivileged users set a breakpoint in the trap
+ * path to avoid trap recursion attacks.
+ */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+int register_perf_hw_breakpoint(struct perf_event *bp)
+{
+ int ret;
+
+ ret = reserve_bp_slot(bp);
+ if (ret)
+ return ret;
+
+ ret = validate_hw_breakpoint(bp);
+
+ /* if arch_validate_hwbkpt_settings() fails then release bp slot */
+ if (ret)
+ release_bp_slot(bp);
+
+ return ret;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @attr: breakpoint attributes
+ * @triggered: callback to trigger when we hit the breakpoint
+ * @tsk: pointer to 'task_struct' of the process to which the address belongs
+ */
+struct perf_event *
+register_user_hw_breakpoint(struct perf_event_attr *attr,
+ perf_overflow_handler_t triggered,
+ struct task_struct *tsk)
+{
+ return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
+}
+EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
+
+/**
+ * modify_user_hw_breakpoint - modify a user-space hardware breakpoint
+ * @bp: the breakpoint structure to modify
+ * @attr: new breakpoint attributes
+ * @triggered: callback to trigger when we hit the breakpoint
+ * @tsk: pointer to 'task_struct' of the process to which the address belongs
+ */
+int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr)
+{
+ u64 old_addr = bp->attr.bp_addr;
+ u64 old_len = bp->attr.bp_len;
+ int old_type = bp->attr.bp_type;
+ int err = 0;
+
+ perf_event_disable(bp);
+
+ bp->attr.bp_addr = attr->bp_addr;
+ bp->attr.bp_type = attr->bp_type;
+ bp->attr.bp_len = attr->bp_len;
+
+ if (attr->disabled)
+ goto end;
+
+ err = validate_hw_breakpoint(bp);
+ if (!err)
+ perf_event_enable(bp);
+
+ if (err) {
+ bp->attr.bp_addr = old_addr;
+ bp->attr.bp_type = old_type;
+ bp->attr.bp_len = old_len;
+ if (!bp->attr.disabled)
+ perf_event_enable(bp);
+
+ return err;
+ }
+
+end:
+ bp->attr.disabled = attr->disabled;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(modify_user_hw_breakpoint);
+
+/**
+ * unregister_hw_breakpoint - unregister a user-space hardware breakpoint
+ * @bp: the breakpoint structure to unregister
+ */
+void unregister_hw_breakpoint(struct perf_event *bp)
+{
+ if (!bp)
+ return;
+ perf_event_release_kernel(bp);
+}
+EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
+
+/**
+ * register_wide_hw_breakpoint - register a wide breakpoint in the kernel
+ * @attr: breakpoint attributes
+ * @triggered: callback to trigger when we hit the breakpoint
+ *
+ * @return a set of per_cpu pointers to perf events
+ */
+struct perf_event * __percpu *
+register_wide_hw_breakpoint(struct perf_event_attr *attr,
+ perf_overflow_handler_t triggered)
+{
+ struct perf_event * __percpu *cpu_events, **pevent, *bp;
+ long err;
+ int cpu;
+
+ cpu_events = alloc_percpu(typeof(*cpu_events));
+ if (!cpu_events)
+ return (void __percpu __force *)ERR_PTR(-ENOMEM);
+
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ pevent = per_cpu_ptr(cpu_events, cpu);
+ bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
+
+ *pevent = bp;
+
+ if (IS_ERR(bp)) {
+ err = PTR_ERR(bp);
+ goto fail;
+ }
+ }
+ put_online_cpus();
+
+ return cpu_events;
+
+fail:
+ for_each_online_cpu(cpu) {
+ pevent = per_cpu_ptr(cpu_events, cpu);
+ if (IS_ERR(*pevent))
+ break;
+ unregister_hw_breakpoint(*pevent);
+ }
+ put_online_cpus();
+
+ free_percpu(cpu_events);
+ return (void __percpu __force *)ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
+
+/**
+ * unregister_wide_hw_breakpoint - unregister a wide breakpoint in the kernel
+ * @cpu_events: the per cpu set of events to unregister
+ */
+void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events)
+{
+ int cpu;
+ struct perf_event **pevent;
+
+ for_each_possible_cpu(cpu) {
+ pevent = per_cpu_ptr(cpu_events, cpu);
+ unregister_hw_breakpoint(*pevent);
+ }
+ free_percpu(cpu_events);
+}
+EXPORT_SYMBOL_GPL(unregister_wide_hw_breakpoint);
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify,
+ /* we need to be notified first */
+ .priority = 0x7fffffff
+};
+
+static void bp_perf_event_destroy(struct perf_event *event)
+{
+ release_bp_slot(event);
+}
+
+static int hw_breakpoint_event_init(struct perf_event *bp)
+{
+ int err;
+
+ if (bp->attr.type != PERF_TYPE_BREAKPOINT)
+ return -ENOENT;
+
+ err = register_perf_hw_breakpoint(bp);
+ if (err)
+ return err;
+
+ bp->destroy = bp_perf_event_destroy;
+
+ return 0;
+}
+
+static int hw_breakpoint_add(struct perf_event *bp, int flags)
+{
+ if (!(flags & PERF_EF_START))
+ bp->hw.state = PERF_HES_STOPPED;
+
+ return arch_install_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_del(struct perf_event *bp, int flags)
+{
+ arch_uninstall_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_start(struct perf_event *bp, int flags)
+{
+ bp->hw.state = 0;
+}
+
+static void hw_breakpoint_stop(struct perf_event *bp, int flags)
+{
+ bp->hw.state = PERF_HES_STOPPED;
+}
+
+static struct pmu perf_breakpoint = {
+ .task_ctx_nr = perf_sw_context, /* could eventually get its own */
+
+ .event_init = hw_breakpoint_event_init,
+ .add = hw_breakpoint_add,
+ .del = hw_breakpoint_del,
+ .start = hw_breakpoint_start,
+ .stop = hw_breakpoint_stop,
+ .read = hw_breakpoint_pmu_read,
+};
+
+int __init init_hw_breakpoint(void)
+{
+ unsigned int **task_bp_pinned;
+ int cpu, err_cpu;
+ int i;
+
+ for (i = 0; i < TYPE_MAX; i++)
+ nr_slots[i] = hw_breakpoint_slots(i);
+
+ for_each_possible_cpu(cpu) {
+ for (i = 0; i < TYPE_MAX; i++) {
+ task_bp_pinned = &per_cpu(nr_task_bp_pinned[i], cpu);
+ *task_bp_pinned = kzalloc(sizeof(int) * nr_slots[i],
+ GFP_KERNEL);
+ if (!*task_bp_pinned)
+ goto err_alloc;
+ }
+ }
+
+ constraints_initialized = 1;
+
+ perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);
+
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+
+ err_alloc:
+ for_each_possible_cpu(err_cpu) {
+ if (err_cpu == cpu)
+ break;
+ for (i = 0; i < TYPE_MAX; i++)
+ kfree(per_cpu(nr_task_bp_pinned[i], cpu));
+ }
+
+ return -ENOMEM;
+}
+
+
/*
* FIXME: do that only when needed, using sched_exit tracepoint
*/
- flush_ptrace_hw_breakpoint(tsk);
+ ptrace_put_breakpoints(tsk);
exit_notify(tsk, group_dead);
#ifdef CONFIG_NUMA
return 0;
}
+int core_kernel_data(unsigned long addr)
+{
+ if (addr >= (unsigned long)_sdata &&
+ addr < (unsigned long)_edata)
+ return 1;
+ return 0;
+}
+
int __kernel_text_address(unsigned long addr)
{
if (core_kernel_text(addr))
posix_cpu_timers_init(p);
- p->lock_depth = -1; /* -1 = no lock */
do_posix_clock_monotonic_gettime(&p->start_time);
p->real_start_time = p->start_time;
monotonic_to_bootbased(&p->real_start_time);
#endif
/* Perform scheduler related setup. Assign this task to a CPU. */
- sched_fork(p, clone_flags);
+ sched_fork(p);
retval = perf_event_init_task(p);
if (retval)
*/
p->flags &= ~PF_STARTING;
- wake_up_new_task(p, clone_flags);
+ wake_up_new_task(p);
tracehook_report_clone_complete(trace, regs,
clone_flags, nr, p);
{
if (!unlikely(current->flags & PF_NOFREEZE)) {
current->flags |= PF_FROZEN;
- wmb();
+ smp_wmb();
}
clear_freeze_flag(current);
}
* the task as frozen and next clears its TIF_FREEZE.
*/
if (!freezing(p)) {
- rmb();
+ smp_rmb();
if (frozen(p))
return false;
}
};
-static int hrtimer_clock_to_base_table[MAX_CLOCKS];
+static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
+ [CLOCK_REALTIME] = HRTIMER_BASE_REALTIME,
+ [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC,
+ [CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME,
+};
static inline int hrtimer_clockid_to_base(clockid_t clock_id)
{
void __init hrtimers_init(void)
{
- hrtimer_clock_to_base_table[CLOCK_REALTIME] = HRTIMER_BASE_REALTIME;
- hrtimer_clock_to_base_table[CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC;
- hrtimer_clock_to_base_table[CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME;
-
hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
(void *)(long)smp_processor_id());
register_cpu_notifier(&hrtimers_nb);
/*
* Zero means infinite timeout - no checking done:
*/
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
unsigned long __read_mostly sysctl_hung_task_warnings = 10;
+++ /dev/null
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * Copyright (C) 2007 Alan Stern
- * Copyright (C) IBM Corporation, 2009
- * Copyright (C) 2009, Frederic Weisbecker <fweisbec@gmail.com>
- *
- * Thanks to Ingo Molnar for his many suggestions.
- *
- * Authors: Alan Stern <stern@rowland.harvard.edu>
- * K.Prasad <prasad@linux.vnet.ibm.com>
- * Frederic Weisbecker <fweisbec@gmail.com>
- */
-
-/*
- * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
- * using the CPU's debug registers.
- * This file contains the arch-independent routines.
- */
-
-#include <linux/irqflags.h>
-#include <linux/kallsyms.h>
-#include <linux/notifier.h>
-#include <linux/kprobes.h>
-#include <linux/kdebug.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/percpu.h>
-#include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/list.h>
-#include <linux/cpu.h>
-#include <linux/smp.h>
-
-#include <linux/hw_breakpoint.h>
-
-
-/*
- * Constraints data
- */
-
-/* Number of pinned cpu breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_cpu_bp_pinned[TYPE_MAX]);
-
-/* Number of pinned task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int *, nr_task_bp_pinned[TYPE_MAX]);
-
-/* Number of non-pinned cpu/task breakpoints in a cpu */
-static DEFINE_PER_CPU(unsigned int, nr_bp_flexible[TYPE_MAX]);
-
-static int nr_slots[TYPE_MAX];
-
-/* Keep track of the breakpoints attached to tasks */
-static LIST_HEAD(bp_task_head);
-
-static int constraints_initialized;
-
-/* Gather the number of total pinned and un-pinned bp in a cpuset */
-struct bp_busy_slots {
- unsigned int pinned;
- unsigned int flexible;
-};
-
-/* Serialize accesses to the above constraints */
-static DEFINE_MUTEX(nr_bp_mutex);
-
-__weak int hw_breakpoint_weight(struct perf_event *bp)
-{
- return 1;
-}
-
-static inline enum bp_type_idx find_slot_idx(struct perf_event *bp)
-{
- if (bp->attr.bp_type & HW_BREAKPOINT_RW)
- return TYPE_DATA;
-
- return TYPE_INST;
-}
-
-/*
- * Report the maximum number of pinned breakpoints a task
- * have in this cpu
- */
-static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
-{
- int i;
- unsigned int *tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
-
- for (i = nr_slots[type] - 1; i >= 0; i--) {
- if (tsk_pinned[i] > 0)
- return i + 1;
- }
-
- return 0;
-}
-
-/*
- * Count the number of breakpoints of the same type and same task.
- * The given event must be not on the list.
- */
-static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
-{
- struct task_struct *tsk = bp->hw.bp_target;
- struct perf_event *iter;
- int count = 0;
-
- list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
- if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
- count += hw_breakpoint_weight(iter);
- }
-
- return count;
-}
-
-/*
- * Report the number of pinned/un-pinned breakpoints we have in
- * a given cpu (cpu > -1) or in all of them (cpu = -1).
- */
-static void
-fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
- enum bp_type_idx type)
-{
- int cpu = bp->cpu;
- struct task_struct *tsk = bp->hw.bp_target;
-
- if (cpu >= 0) {
- slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
- if (!tsk)
- slots->pinned += max_task_bp_pinned(cpu, type);
- else
- slots->pinned += task_bp_pinned(bp, type);
- slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
-
- return;
- }
-
- for_each_online_cpu(cpu) {
- unsigned int nr;
-
- nr = per_cpu(nr_cpu_bp_pinned[type], cpu);
- if (!tsk)
- nr += max_task_bp_pinned(cpu, type);
- else
- nr += task_bp_pinned(bp, type);
-
- if (nr > slots->pinned)
- slots->pinned = nr;
-
- nr = per_cpu(nr_bp_flexible[type], cpu);
-
- if (nr > slots->flexible)
- slots->flexible = nr;
- }
-}
-
-/*
- * For now, continue to consider flexible as pinned, until we can
- * ensure no flexible event can ever be scheduled before a pinned event
- * in a same cpu.
- */
-static void
-fetch_this_slot(struct bp_busy_slots *slots, int weight)
-{
- slots->pinned += weight;
-}
-
-/*
- * Add a pinned breakpoint for the given task in our constraint table
- */
-static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
- enum bp_type_idx type, int weight)
-{
- unsigned int *tsk_pinned;
- int old_count = 0;
- int old_idx = 0;
- int idx = 0;
-
- old_count = task_bp_pinned(bp, type);
- old_idx = old_count - 1;
- idx = old_idx + weight;
-
- /* tsk_pinned[n] is the number of tasks having n breakpoints */
- tsk_pinned = per_cpu(nr_task_bp_pinned[type], cpu);
- if (enable) {
- tsk_pinned[idx]++;
- if (old_count > 0)
- tsk_pinned[old_idx]--;
- } else {
- tsk_pinned[idx]--;
- if (old_count > 0)
- tsk_pinned[old_idx]++;
- }
-}
-
-/*
- * Add/remove the given breakpoint in our constraint table
- */
-static void
-toggle_bp_slot(struct perf_event *bp, bool enable, enum bp_type_idx type,
- int weight)
-{
- int cpu = bp->cpu;
- struct task_struct *tsk = bp->hw.bp_target;
-
- /* Pinned counter cpu profiling */
- if (!tsk) {
-
- if (enable)
- per_cpu(nr_cpu_bp_pinned[type], bp->cpu) += weight;
- else
- per_cpu(nr_cpu_bp_pinned[type], bp->cpu) -= weight;
- return;
- }
-
- /* Pinned counter task profiling */
-
- if (!enable)
- list_del(&bp->hw.bp_list);
-
- if (cpu >= 0) {
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
- } else {
- for_each_online_cpu(cpu)
- toggle_bp_task_slot(bp, cpu, enable, type, weight);
- }
-
- if (enable)
- list_add_tail(&bp->hw.bp_list, &bp_task_head);
-}
-
-/*
- * Function to perform processor-specific cleanup during unregistration
- */
-__weak void arch_unregister_hw_breakpoint(struct perf_event *bp)
-{
- /*
- * A weak stub function here for those archs that don't define
- * it inside arch/.../kernel/hw_breakpoint.c
- */
-}
-
-/*
- * Contraints to check before allowing this new breakpoint counter:
- *
- * == Non-pinned counter == (Considered as pinned for now)
- *
- * - If attached to a single cpu, check:
- *
- * (per_cpu(nr_bp_flexible, cpu) || (per_cpu(nr_cpu_bp_pinned, cpu)
- * + max(per_cpu(nr_task_bp_pinned, cpu)))) < HBP_NUM
- *
- * -> If there are already non-pinned counters in this cpu, it means
- * there is already a free slot for them.
- * Otherwise, we check that the maximum number of per task
- * breakpoints (for this cpu) plus the number of per cpu breakpoint
- * (for this cpu) doesn't cover every registers.
- *
- * - If attached to every cpus, check:
- *
- * (per_cpu(nr_bp_flexible, *) || (max(per_cpu(nr_cpu_bp_pinned, *))
- * + max(per_cpu(nr_task_bp_pinned, *)))) < HBP_NUM
- *
- * -> This is roughly the same, except we check the number of per cpu
- * bp for every cpu and we keep the max one. Same for the per tasks
- * breakpoints.
- *
- *
- * == Pinned counter ==
- *
- * - If attached to a single cpu, check:
- *
- * ((per_cpu(nr_bp_flexible, cpu) > 1) + per_cpu(nr_cpu_bp_pinned, cpu)
- * + max(per_cpu(nr_task_bp_pinned, cpu))) < HBP_NUM
- *
- * -> Same checks as before. But now the nr_bp_flexible, if any, must keep
- * one register at least (or they will never be fed).
- *
- * - If attached to every cpus, check:
- *
- * ((per_cpu(nr_bp_flexible, *) > 1) + max(per_cpu(nr_cpu_bp_pinned, *))
- * + max(per_cpu(nr_task_bp_pinned, *))) < HBP_NUM
- */
-static int __reserve_bp_slot(struct perf_event *bp)
-{
- struct bp_busy_slots slots = {0};
- enum bp_type_idx type;
- int weight;
-
- /* We couldn't initialize breakpoint constraints on boot */
- if (!constraints_initialized)
- return -ENOMEM;
-
- /* Basic checks */
- if (bp->attr.bp_type == HW_BREAKPOINT_EMPTY ||
- bp->attr.bp_type == HW_BREAKPOINT_INVALID)
- return -EINVAL;
-
- type = find_slot_idx(bp);
- weight = hw_breakpoint_weight(bp);
-
- fetch_bp_busy_slots(&slots, bp, type);
- /*
- * Simulate the addition of this breakpoint to the constraints
- * and see the result.
- */
- fetch_this_slot(&slots, weight);
-
- /* Flexible counters need to keep at least one slot */
- if (slots.pinned + (!!slots.flexible) > nr_slots[type])
- return -ENOSPC;
-
- toggle_bp_slot(bp, true, type, weight);
-
- return 0;
-}
-
-int reserve_bp_slot(struct perf_event *bp)
-{
- int ret;
-
- mutex_lock(&nr_bp_mutex);
-
- ret = __reserve_bp_slot(bp);
-
- mutex_unlock(&nr_bp_mutex);
-
- return ret;
-}
-
-static void __release_bp_slot(struct perf_event *bp)
-{
- enum bp_type_idx type;
- int weight;
-
- type = find_slot_idx(bp);
- weight = hw_breakpoint_weight(bp);
- toggle_bp_slot(bp, false, type, weight);
-}
-
-void release_bp_slot(struct perf_event *bp)
-{
- mutex_lock(&nr_bp_mutex);
-
- arch_unregister_hw_breakpoint(bp);
- __release_bp_slot(bp);
-
- mutex_unlock(&nr_bp_mutex);
-}
-
-/*
- * Allow the kernel debugger to reserve breakpoint slots without
- * taking a lock using the dbg_* variant of for the reserve and
- * release breakpoint slots.
- */
-int dbg_reserve_bp_slot(struct perf_event *bp)
-{
- if (mutex_is_locked(&nr_bp_mutex))
- return -1;
-
- return __reserve_bp_slot(bp);
-}
-
-int dbg_release_bp_slot(struct perf_event *bp)
-{
- if (mutex_is_locked(&nr_bp_mutex))
- return -1;
-
- __release_bp_slot(bp);
-
- return 0;
-}
-
-static int validate_hw_breakpoint(struct perf_event *bp)
-{
- int ret;
-
- ret = arch_validate_hwbkpt_settings(bp);
- if (ret)
- return ret;
-
- if (arch_check_bp_in_kernelspace(bp)) {
- if (bp->attr.exclude_kernel)
- return -EINVAL;
- /*
- * Don't let unprivileged users set a breakpoint in the trap
- * path to avoid trap recursion attacks.
- */
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- }
-
- return 0;
-}
-
-int register_perf_hw_breakpoint(struct perf_event *bp)
-{
- int ret;
-
- ret = reserve_bp_slot(bp);
- if (ret)
- return ret;
-
- ret = validate_hw_breakpoint(bp);
-
- /* if arch_validate_hwbkpt_settings() fails then release bp slot */
- if (ret)
- release_bp_slot(bp);
-
- return ret;
-}
-
-/**
- * register_user_hw_breakpoint - register a hardware breakpoint for user space
- * @attr: breakpoint attributes
- * @triggered: callback to trigger when we hit the breakpoint
- * @tsk: pointer to 'task_struct' of the process to which the address belongs
- */
-struct perf_event *
-register_user_hw_breakpoint(struct perf_event_attr *attr,
- perf_overflow_handler_t triggered,
- struct task_struct *tsk)
-{
- return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
-}
-EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
-
-/**
- * modify_user_hw_breakpoint - modify a user-space hardware breakpoint
- * @bp: the breakpoint structure to modify
- * @attr: new breakpoint attributes
- * @triggered: callback to trigger when we hit the breakpoint
- * @tsk: pointer to 'task_struct' of the process to which the address belongs
- */
-int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr)
-{
- u64 old_addr = bp->attr.bp_addr;
- u64 old_len = bp->attr.bp_len;
- int old_type = bp->attr.bp_type;
- int err = 0;
-
- perf_event_disable(bp);
-
- bp->attr.bp_addr = attr->bp_addr;
- bp->attr.bp_type = attr->bp_type;
- bp->attr.bp_len = attr->bp_len;
-
- if (attr->disabled)
- goto end;
-
- err = validate_hw_breakpoint(bp);
- if (!err)
- perf_event_enable(bp);
-
- if (err) {
- bp->attr.bp_addr = old_addr;
- bp->attr.bp_type = old_type;
- bp->attr.bp_len = old_len;
- if (!bp->attr.disabled)
- perf_event_enable(bp);
-
- return err;
- }
-
-end:
- bp->attr.disabled = attr->disabled;
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(modify_user_hw_breakpoint);
-
-/**
- * unregister_hw_breakpoint - unregister a user-space hardware breakpoint
- * @bp: the breakpoint structure to unregister
- */
-void unregister_hw_breakpoint(struct perf_event *bp)
-{
- if (!bp)
- return;
- perf_event_release_kernel(bp);
-}
-EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
-
-/**
- * register_wide_hw_breakpoint - register a wide breakpoint in the kernel
- * @attr: breakpoint attributes
- * @triggered: callback to trigger when we hit the breakpoint
- *
- * @return a set of per_cpu pointers to perf events
- */
-struct perf_event * __percpu *
-register_wide_hw_breakpoint(struct perf_event_attr *attr,
- perf_overflow_handler_t triggered)
-{
- struct perf_event * __percpu *cpu_events, **pevent, *bp;
- long err;
- int cpu;
-
- cpu_events = alloc_percpu(typeof(*cpu_events));
- if (!cpu_events)
- return (void __percpu __force *)ERR_PTR(-ENOMEM);
-
- get_online_cpus();
- for_each_online_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
- bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
-
- *pevent = bp;
-
- if (IS_ERR(bp)) {
- err = PTR_ERR(bp);
- goto fail;
- }
- }
- put_online_cpus();
-
- return cpu_events;
-
-fail:
- for_each_online_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
- if (IS_ERR(*pevent))
- break;
- unregister_hw_breakpoint(*pevent);
- }
- put_online_cpus();
-
- free_percpu(cpu_events);
- return (void __percpu __force *)ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(register_wide_hw_breakpoint);
-
-/**
- * unregister_wide_hw_breakpoint - unregister a wide breakpoint in the kernel
- * @cpu_events: the per cpu set of events to unregister
- */
-void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events)
-{
- int cpu;
- struct perf_event **pevent;
-
- for_each_possible_cpu(cpu) {
- pevent = per_cpu_ptr(cpu_events, cpu);
- unregister_hw_breakpoint(*pevent);
- }
- free_percpu(cpu_events);
-}
-EXPORT_SYMBOL_GPL(unregister_wide_hw_breakpoint);
-
-static struct notifier_block hw_breakpoint_exceptions_nb = {
- .notifier_call = hw_breakpoint_exceptions_notify,
- /* we need to be notified first */
- .priority = 0x7fffffff
-};
-
-static void bp_perf_event_destroy(struct perf_event *event)
-{
- release_bp_slot(event);
-}
-
-static int hw_breakpoint_event_init(struct perf_event *bp)
-{
- int err;
-
- if (bp->attr.type != PERF_TYPE_BREAKPOINT)
- return -ENOENT;
-
- err = register_perf_hw_breakpoint(bp);
- if (err)
- return err;
-
- bp->destroy = bp_perf_event_destroy;
-
- return 0;
-}
-
-static int hw_breakpoint_add(struct perf_event *bp, int flags)
-{
- if (!(flags & PERF_EF_START))
- bp->hw.state = PERF_HES_STOPPED;
-
- return arch_install_hw_breakpoint(bp);
-}
-
-static void hw_breakpoint_del(struct perf_event *bp, int flags)
-{
- arch_uninstall_hw_breakpoint(bp);
-}
-
-static void hw_breakpoint_start(struct perf_event *bp, int flags)
-{
- bp->hw.state = 0;
-}
-
-static void hw_breakpoint_stop(struct perf_event *bp, int flags)
-{
- bp->hw.state = PERF_HES_STOPPED;
-}
-
-static struct pmu perf_breakpoint = {
- .task_ctx_nr = perf_sw_context, /* could eventually get its own */
-
- .event_init = hw_breakpoint_event_init,
- .add = hw_breakpoint_add,
- .del = hw_breakpoint_del,
- .start = hw_breakpoint_start,
- .stop = hw_breakpoint_stop,
- .read = hw_breakpoint_pmu_read,
-};
-
-int __init init_hw_breakpoint(void)
-{
- unsigned int **task_bp_pinned;
- int cpu, err_cpu;
- int i;
-
- for (i = 0; i < TYPE_MAX; i++)
- nr_slots[i] = hw_breakpoint_slots(i);
-
- for_each_possible_cpu(cpu) {
- for (i = 0; i < TYPE_MAX; i++) {
- task_bp_pinned = &per_cpu(nr_task_bp_pinned[i], cpu);
- *task_bp_pinned = kzalloc(sizeof(int) * nr_slots[i],
- GFP_KERNEL);
- if (!*task_bp_pinned)
- goto err_alloc;
- }
- }
-
- constraints_initialized = 1;
-
- perf_pmu_register(&perf_breakpoint, "breakpoint", PERF_TYPE_BREAKPOINT);
-
- return register_die_notifier(&hw_breakpoint_exceptions_nb);
-
- err_alloc:
- for_each_possible_cpu(err_cpu) {
- if (err_cpu == cpu)
- break;
- for (i = 0; i < TYPE_MAX; i++)
- kfree(per_cpu(nr_task_bp_pinned[i], cpu));
- }
-
- return -ENOMEM;
-}
-
-
config IRQ_EDGE_EOI_HANDLER
bool
+# Generic configurable interrupt chip implementation
+config GENERIC_IRQ_CHIP
+ bool
+
# Support forced irq threading
config IRQ_FORCED_THREADING
bool
obj-y := irqdesc.o handle.o manage.o spurious.o resend.o chip.o dummychip.o devres.o
+obj-$(CONFIG_GENERIC_IRQ_CHIP) += generic-chip.o
obj-$(CONFIG_GENERIC_IRQ_PROBE) += autoprobe.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
out_unlock:
raw_spin_unlock(&desc->lock);
}
+EXPORT_SYMBOL_GPL(handle_simple_irq);
/**
* handle_level_irq - Level type irq handler
if (handle != handle_bad_irq && is_chained) {
irq_settings_set_noprobe(desc);
irq_settings_set_norequest(desc);
+ irq_settings_set_nothread(desc);
irq_startup(desc);
}
out:
irq_put_desc_unlock(desc, flags);
}
+EXPORT_SYMBOL_GPL(irq_modify_status);
/**
* irq_cpu_online - Invoke all irq_cpu_online functions.
P(IRQ_PER_CPU);
P(IRQ_NOPROBE);
P(IRQ_NOREQUEST);
+ P(IRQ_NOTHREAD);
P(IRQ_NOAUTOEN);
PS(IRQS_AUTODETECT);
--- /dev/null
+/*
+ * Library implementing the most common irq chip callback functions
+ *
+ * Copyright (C) 2011, Thomas Gleixner
+ */
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <linux/kernel_stat.h>
+#include <linux/syscore_ops.h>
+
+#include "internals.h"
+
+static LIST_HEAD(gc_list);
+static DEFINE_RAW_SPINLOCK(gc_lock);
+
+static inline struct irq_chip_regs *cur_regs(struct irq_data *d)
+{
+ return &container_of(d->chip, struct irq_chip_type, chip)->regs;
+}
+
+/**
+ * irq_gc_noop - NOOP function
+ * @d: irq_data
+ */
+void irq_gc_noop(struct irq_data *d)
+{
+}
+
+/**
+ * irq_gc_mask_disable_reg - Mask chip via disable register
+ * @d: irq_data
+ *
+ * Chip has separate enable/disable registers instead of a single mask
+ * register.
+ */
+void irq_gc_mask_disable_reg(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->disable);
+ gc->mask_cache &= ~mask;
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_mask_set_mask_bit - Mask chip via setting bit in mask register
+ * @d: irq_data
+ *
+ * Chip has a single mask register. Values of this register are cached
+ * and protected by gc->lock
+ */
+void irq_gc_mask_set_bit(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ gc->mask_cache |= mask;
+ irq_reg_writel(gc->mask_cache, gc->reg_base + cur_regs(d)->mask);
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_mask_set_mask_bit - Mask chip via clearing bit in mask register
+ * @d: irq_data
+ *
+ * Chip has a single mask register. Values of this register are cached
+ * and protected by gc->lock
+ */
+void irq_gc_mask_clr_bit(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ gc->mask_cache &= ~mask;
+ irq_reg_writel(gc->mask_cache, gc->reg_base + cur_regs(d)->mask);
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_unmask_enable_reg - Unmask chip via enable register
+ * @d: irq_data
+ *
+ * Chip has separate enable/disable registers instead of a single mask
+ * register.
+ */
+void irq_gc_unmask_enable_reg(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->enable);
+ gc->mask_cache |= mask;
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_ack - Ack pending interrupt
+ * @d: irq_data
+ */
+void irq_gc_ack(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->ack);
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_mask_disable_reg_and_ack- Mask and ack pending interrupt
+ * @d: irq_data
+ */
+void irq_gc_mask_disable_reg_and_ack(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->mask);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->ack);
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_eoi - EOI interrupt
+ * @d: irq_data
+ */
+void irq_gc_eoi(struct irq_data *d)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ irq_gc_lock(gc);
+ irq_reg_writel(mask, gc->reg_base + cur_regs(d)->eoi);
+ irq_gc_unlock(gc);
+}
+
+/**
+ * irq_gc_set_wake - Set/clr wake bit for an interrupt
+ * @d: irq_data
+ *
+ * For chips where the wake from suspend functionality is not
+ * configured in a separate register and the wakeup active state is
+ * just stored in a bitmask.
+ */
+int irq_gc_set_wake(struct irq_data *d, unsigned int on)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ u32 mask = 1 << (d->irq - gc->irq_base);
+
+ if (!(mask & gc->wake_enabled))
+ return -EINVAL;
+
+ irq_gc_lock(gc);
+ if (on)
+ gc->wake_active |= mask;
+ else
+ gc->wake_active &= ~mask;
+ irq_gc_unlock(gc);
+ return 0;
+}
+
+/**
+ * irq_alloc_generic_chip - Allocate a generic chip and initialize it
+ * @name: Name of the irq chip
+ * @num_ct: Number of irq_chip_type instances associated with this
+ * @irq_base: Interrupt base nr for this chip
+ * @reg_base: Register base address (virtual)
+ * @handler: Default flow handler associated with this chip
+ *
+ * Returns an initialized irq_chip_generic structure. The chip defaults
+ * to the primary (index 0) irq_chip_type and @handler
+ */
+struct irq_chip_generic *
+irq_alloc_generic_chip(const char *name, int num_ct, unsigned int irq_base,
+ void __iomem *reg_base, irq_flow_handler_t handler)
+{
+ struct irq_chip_generic *gc;
+ unsigned long sz = sizeof(*gc) + num_ct * sizeof(struct irq_chip_type);
+
+ gc = kzalloc(sz, GFP_KERNEL);
+ if (gc) {
+ raw_spin_lock_init(&gc->lock);
+ gc->num_ct = num_ct;
+ gc->irq_base = irq_base;
+ gc->reg_base = reg_base;
+ gc->chip_types->chip.name = name;
+ gc->chip_types->handler = handler;
+ }
+ return gc;
+}
+
+/*
+ * Separate lockdep class for interrupt chip which can nest irq_desc
+ * lock.
+ */
+static struct lock_class_key irq_nested_lock_class;
+
+/**
+ * irq_setup_generic_chip - Setup a range of interrupts with a generic chip
+ * @gc: Generic irq chip holding all data
+ * @msk: Bitmask holding the irqs to initialize relative to gc->irq_base
+ * @flags: Flags for initialization
+ * @clr: IRQ_* bits to clear
+ * @set: IRQ_* bits to set
+ *
+ * Set up max. 32 interrupts starting from gc->irq_base. Note, this
+ * initializes all interrupts to the primary irq_chip_type and its
+ * associated handler.
+ */
+void irq_setup_generic_chip(struct irq_chip_generic *gc, u32 msk,
+ enum irq_gc_flags flags, unsigned int clr,
+ unsigned int set)
+{
+ struct irq_chip_type *ct = gc->chip_types;
+ unsigned int i;
+
+ raw_spin_lock(&gc_lock);
+ list_add_tail(&gc->list, &gc_list);
+ raw_spin_unlock(&gc_lock);
+
+ /* Init mask cache ? */
+ if (flags & IRQ_GC_INIT_MASK_CACHE)
+ gc->mask_cache = irq_reg_readl(gc->reg_base + ct->regs.mask);
+
+ for (i = gc->irq_base; msk; msk >>= 1, i++) {
+ if (!msk & 0x01)
+ continue;
+
+ if (flags & IRQ_GC_INIT_NESTED_LOCK)
+ irq_set_lockdep_class(i, &irq_nested_lock_class);
+
+ irq_set_chip_and_handler(i, &ct->chip, ct->handler);
+ irq_set_chip_data(i, gc);
+ irq_modify_status(i, clr, set);
+ }
+ gc->irq_cnt = i - gc->irq_base;
+}
+
+/**
+ * irq_setup_alt_chip - Switch to alternative chip
+ * @d: irq_data for this interrupt
+ * @type Flow type to be initialized
+ *
+ * Only to be called from chip->irq_set_type() callbacks.
+ */
+int irq_setup_alt_chip(struct irq_data *d, unsigned int type)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ struct irq_chip_type *ct = gc->chip_types;
+ unsigned int i;
+
+ for (i = 0; i < gc->num_ct; i++, ct++) {
+ if (ct->type & type) {
+ d->chip = &ct->chip;
+ irq_data_to_desc(d)->handle_irq = ct->handler;
+ return 0;
+ }
+ }
+ return -EINVAL;
+}
+
+/**
+ * irq_remove_generic_chip - Remove a chip
+ * @gc: Generic irq chip holding all data
+ * @msk: Bitmask holding the irqs to initialize relative to gc->irq_base
+ * @clr: IRQ_* bits to clear
+ * @set: IRQ_* bits to set
+ *
+ * Remove up to 32 interrupts starting from gc->irq_base.
+ */
+void irq_remove_generic_chip(struct irq_chip_generic *gc, u32 msk,
+ unsigned int clr, unsigned int set)
+{
+ unsigned int i = gc->irq_base;
+
+ raw_spin_lock(&gc_lock);
+ list_del(&gc->list);
+ raw_spin_unlock(&gc_lock);
+
+ for (; msk; msk >>= 1, i++) {
+ if (!msk & 0x01)
+ continue;
+
+ /* Remove handler first. That will mask the irq line */
+ irq_set_handler(i, NULL);
+ irq_set_chip(i, &no_irq_chip);
+ irq_set_chip_data(i, NULL);
+ irq_modify_status(i, clr, set);
+ }
+}
+
+#ifdef CONFIG_PM
+static int irq_gc_suspend(void)
+{
+ struct irq_chip_generic *gc;
+
+ list_for_each_entry(gc, &gc_list, list) {
+ struct irq_chip_type *ct = gc->chip_types;
+
+ if (ct->chip.irq_suspend)
+ ct->chip.irq_suspend(irq_get_irq_data(gc->irq_base));
+ }
+ return 0;
+}
+
+static void irq_gc_resume(void)
+{
+ struct irq_chip_generic *gc;
+
+ list_for_each_entry(gc, &gc_list, list) {
+ struct irq_chip_type *ct = gc->chip_types;
+
+ if (ct->chip.irq_resume)
+ ct->chip.irq_resume(irq_get_irq_data(gc->irq_base));
+ }
+}
+#else
+#define irq_gc_suspend NULL
+#define irq_gc_resume NULL
+#endif
+
+static void irq_gc_shutdown(void)
+{
+ struct irq_chip_generic *gc;
+
+ list_for_each_entry(gc, &gc_list, list) {
+ struct irq_chip_type *ct = gc->chip_types;
+
+ if (ct->chip.irq_pm_shutdown)
+ ct->chip.irq_pm_shutdown(irq_get_irq_data(gc->irq_base));
+ }
+}
+
+static struct syscore_ops irq_gc_syscore_ops = {
+ .suspend = irq_gc_suspend,
+ .resume = irq_gc_resume,
+ .shutdown = irq_gc_shutdown,
+};
+
+static int __init irq_gc_init_ops(void)
+{
+ register_syscore_ops(&irq_gc_syscore_ops);
+ return 0;
+}
+device_initcall(irq_gc_init_ops);
*/
static struct lock_class_key irq_desc_lock_class;
-#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)
+#if defined(CONFIG_SMP)
static void __init init_irq_default_affinity(void)
{
alloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
#endif /* !CONFIG_SPARSE_IRQ */
+/**
+ * generic_handle_irq - Invoke the handler for a particular irq
+ * @irq: The irq number to handle
+ *
+ */
+int generic_handle_irq(unsigned int irq)
+{
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ if (!desc)
+ return -EINVAL;
+ generic_handle_irq_desc(irq, desc);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(generic_handle_irq);
+
/* Dynamic interrupt handling */
/**
bitmap_clear(allocated_irqs, from, cnt);
mutex_unlock(&sparse_irq_lock);
}
+EXPORT_SYMBOL_GPL(irq_free_descs);
/**
* irq_alloc_descs - allocate and initialize a range of irq descriptors
mutex_unlock(&sparse_irq_lock);
return ret;
}
+EXPORT_SYMBOL_GPL(irq_alloc_descs);
/**
* irq_reserve_irqs - mark irqs allocated
*per_cpu_ptr(desc->kstat_irqs, cpu) : 0;
}
-#ifdef CONFIG_GENERIC_HARDIRQS
unsigned int kstat_irqs(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
sum += *per_cpu_ptr(desc->kstat_irqs, cpu);
return sum;
}
-#endif /* CONFIG_GENERIC_HARDIRQS */
*/
new->handler = irq_nested_primary_handler;
} else {
- irq_setup_forced_threading(new);
+ if (irq_settings_can_thread(desc))
+ irq_setup_forced_threading(new);
}
/*
} else {
seq_printf(p, " %8s", "None");
}
-#ifdef CONFIG_GENIRC_IRQ_SHOW_LEVEL
+#ifdef CONFIG_GENERIC_IRQ_SHOW_LEVEL
seq_printf(p, " %-8s", irqd_is_level_type(&desc->irq_data) ? "Level" : "Edge");
#endif
if (desc->name)
_IRQ_LEVEL = IRQ_LEVEL,
_IRQ_NOPROBE = IRQ_NOPROBE,
_IRQ_NOREQUEST = IRQ_NOREQUEST,
+ _IRQ_NOTHREAD = IRQ_NOTHREAD,
_IRQ_NOAUTOEN = IRQ_NOAUTOEN,
_IRQ_MOVE_PCNTXT = IRQ_MOVE_PCNTXT,
_IRQ_NO_BALANCING = IRQ_NO_BALANCING,
#define IRQ_LEVEL GOT_YOU_MORON
#define IRQ_NOPROBE GOT_YOU_MORON
#define IRQ_NOREQUEST GOT_YOU_MORON
+#define IRQ_NOTHREAD GOT_YOU_MORON
#define IRQ_NOAUTOEN GOT_YOU_MORON
#define IRQ_NESTED_THREAD GOT_YOU_MORON
#undef IRQF_MODIFY_MASK
desc->status_use_accessors |= _IRQ_NOREQUEST;
}
+static inline bool irq_settings_can_thread(struct irq_desc *desc)
+{
+ return !(desc->status_use_accessors & _IRQ_NOTHREAD);
+}
+
+static inline void irq_settings_clr_nothread(struct irq_desc *desc)
+{
+ desc->status_use_accessors &= ~_IRQ_NOTHREAD;
+}
+
+static inline void irq_settings_set_nothread(struct irq_desc *desc)
+{
+ desc->status_use_accessors |= _IRQ_NOTHREAD;
+}
+
static inline bool irq_settings_can_probe(struct irq_desc *desc)
{
return !(desc->status_use_accessors & _IRQ_NOPROBE);
* jump label support
*
* Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
+ * Copyright (C) 2011 Peter Zijlstra <pzijlstr@redhat.com>
*
*/
-#include <linux/jump_label.h>
#include <linux/memory.h>
#include <linux/uaccess.h>
#include <linux/module.h>
#include <linux/list.h>
-#include <linux/jhash.h>
#include <linux/slab.h>
#include <linux/sort.h>
#include <linux/err.h>
+#include <linux/jump_label.h>
#ifdef HAVE_JUMP_LABEL
-#define JUMP_LABEL_HASH_BITS 6
-#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
-static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
-
/* mutex to protect coming/going of the the jump_label table */
static DEFINE_MUTEX(jump_label_mutex);
-struct jump_label_entry {
- struct hlist_node hlist;
- struct jump_entry *table;
- int nr_entries;
- /* hang modules off here */
- struct hlist_head modules;
- unsigned long key;
-};
-
-struct jump_label_module_entry {
- struct hlist_node hlist;
- struct jump_entry *table;
- int nr_entries;
- struct module *mod;
-};
-
void jump_label_lock(void)
{
mutex_lock(&jump_label_mutex);
mutex_unlock(&jump_label_mutex);
}
+bool jump_label_enabled(struct jump_label_key *key)
+{
+ return !!atomic_read(&key->enabled);
+}
+
static int jump_label_cmp(const void *a, const void *b)
{
const struct jump_entry *jea = a;
}
static void
-sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
+jump_label_sort_entries(struct jump_entry *start, struct jump_entry *stop)
{
unsigned long size;
sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
}
-static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
-{
- struct hlist_head *head;
- struct hlist_node *node;
- struct jump_label_entry *e;
- u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
-
- head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (key == e->key)
- return e;
- }
- return NULL;
-}
+static void jump_label_update(struct jump_label_key *key, int enable);
-static struct jump_label_entry *
-add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
+void jump_label_inc(struct jump_label_key *key)
{
- struct hlist_head *head;
- struct jump_label_entry *e;
- u32 hash;
-
- e = get_jump_label_entry(key);
- if (e)
- return ERR_PTR(-EEXIST);
-
- e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
- if (!e)
- return ERR_PTR(-ENOMEM);
-
- hash = jhash((void *)&key, sizeof(jump_label_t), 0);
- head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
- e->key = key;
- e->table = table;
- e->nr_entries = nr_entries;
- INIT_HLIST_HEAD(&(e->modules));
- hlist_add_head(&e->hlist, head);
- return e;
-}
+ if (atomic_inc_not_zero(&key->enabled))
+ return;
-static int
-build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
-{
- struct jump_entry *iter, *iter_begin;
- struct jump_label_entry *entry;
- int count;
-
- sort_jump_label_entries(start, stop);
- iter = start;
- while (iter < stop) {
- entry = get_jump_label_entry(iter->key);
- if (!entry) {
- iter_begin = iter;
- count = 0;
- while ((iter < stop) &&
- (iter->key == iter_begin->key)) {
- iter++;
- count++;
- }
- entry = add_jump_label_entry(iter_begin->key,
- count, iter_begin);
- if (IS_ERR(entry))
- return PTR_ERR(entry);
- } else {
- WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
- return -1;
- }
- }
- return 0;
+ jump_label_lock();
+ if (atomic_add_return(1, &key->enabled) == 1)
+ jump_label_update(key, JUMP_LABEL_ENABLE);
+ jump_label_unlock();
}
-/***
- * jump_label_update - update jump label text
- * @key - key value associated with a a jump label
- * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
- *
- * Will enable/disable the jump for jump label @key, depending on the
- * value of @type.
- *
- */
-
-void jump_label_update(unsigned long key, enum jump_label_type type)
+void jump_label_dec(struct jump_label_key *key)
{
- struct jump_entry *iter;
- struct jump_label_entry *entry;
- struct hlist_node *module_node;
- struct jump_label_module_entry *e_module;
- int count;
+ if (!atomic_dec_and_mutex_lock(&key->enabled, &jump_label_mutex))
+ return;
- jump_label_lock();
- entry = get_jump_label_entry((jump_label_t)key);
- if (entry) {
- count = entry->nr_entries;
- iter = entry->table;
- while (count--) {
- if (kernel_text_address(iter->code))
- arch_jump_label_transform(iter, type);
- iter++;
- }
- /* eanble/disable jump labels in modules */
- hlist_for_each_entry(e_module, module_node, &(entry->modules),
- hlist) {
- count = e_module->nr_entries;
- iter = e_module->table;
- while (count--) {
- if (iter->key &&
- kernel_text_address(iter->code))
- arch_jump_label_transform(iter, type);
- iter++;
- }
- }
- }
+ jump_label_update(key, JUMP_LABEL_DISABLE);
jump_label_unlock();
}
return 0;
}
-#ifdef CONFIG_MODULES
-
-static int module_conflict(void *start, void *end)
+static int __jump_label_text_reserved(struct jump_entry *iter_start,
+ struct jump_entry *iter_stop, void *start, void *end)
{
- struct hlist_head *head;
- struct hlist_node *node, *node_next, *module_node, *module_node_next;
- struct jump_label_entry *e;
- struct jump_label_module_entry *e_module;
struct jump_entry *iter;
- int i, count;
- int conflict = 0;
-
- for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
- head = &jump_label_table[i];
- hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
- hlist_for_each_entry_safe(e_module, module_node,
- module_node_next,
- &(e->modules), hlist) {
- count = e_module->nr_entries;
- iter = e_module->table;
- while (count--) {
- if (addr_conflict(iter, start, end)) {
- conflict = 1;
- goto out;
- }
- iter++;
- }
- }
- }
- }
-out:
- return conflict;
-}
-
-#endif
-
-/***
- * jump_label_text_reserved - check if addr range is reserved
- * @start: start text addr
- * @end: end text addr
- *
- * checks if the text addr located between @start and @end
- * overlaps with any of the jump label patch addresses. Code
- * that wants to modify kernel text should first verify that
- * it does not overlap with any of the jump label addresses.
- * Caller must hold jump_label_mutex.
- *
- * returns 1 if there is an overlap, 0 otherwise
- */
-int jump_label_text_reserved(void *start, void *end)
-{
- struct jump_entry *iter;
- struct jump_entry *iter_start = __start___jump_table;
- struct jump_entry *iter_stop = __start___jump_table;
- int conflict = 0;
iter = iter_start;
while (iter < iter_stop) {
- if (addr_conflict(iter, start, end)) {
- conflict = 1;
- goto out;
- }
+ if (addr_conflict(iter, start, end))
+ return 1;
iter++;
}
- /* now check modules */
-#ifdef CONFIG_MODULES
- conflict = module_conflict(start, end);
-#endif
-out:
- return conflict;
+ return 0;
+}
+
+static void __jump_label_update(struct jump_label_key *key,
+ struct jump_entry *entry, int enable)
+{
+ for (; entry->key == (jump_label_t)(unsigned long)key; entry++) {
+ /*
+ * entry->code set to 0 invalidates module init text sections
+ * kernel_text_address() verifies we are not in core kernel
+ * init code, see jump_label_invalidate_module_init().
+ */
+ if (entry->code && kernel_text_address(entry->code))
+ arch_jump_label_transform(entry, enable);
+ }
}
/*
{
}
-static __init int init_jump_label(void)
+static __init int jump_label_init(void)
{
- int ret;
struct jump_entry *iter_start = __start___jump_table;
struct jump_entry *iter_stop = __stop___jump_table;
+ struct jump_label_key *key = NULL;
struct jump_entry *iter;
jump_label_lock();
- ret = build_jump_label_hashtable(__start___jump_table,
- __stop___jump_table);
- iter = iter_start;
- while (iter < iter_stop) {
+ jump_label_sort_entries(iter_start, iter_stop);
+
+ for (iter = iter_start; iter < iter_stop; iter++) {
arch_jump_label_text_poke_early(iter->code);
- iter++;
+ if (iter->key == (jump_label_t)(unsigned long)key)
+ continue;
+
+ key = (struct jump_label_key *)(unsigned long)iter->key;
+ atomic_set(&key->enabled, 0);
+ key->entries = iter;
+#ifdef CONFIG_MODULES
+ key->next = NULL;
+#endif
}
jump_label_unlock();
- return ret;
+
+ return 0;
}
-early_initcall(init_jump_label);
+early_initcall(jump_label_init);
#ifdef CONFIG_MODULES
-static struct jump_label_module_entry *
-add_jump_label_module_entry(struct jump_label_entry *entry,
- struct jump_entry *iter_begin,
- int count, struct module *mod)
+struct jump_label_mod {
+ struct jump_label_mod *next;
+ struct jump_entry *entries;
+ struct module *mod;
+};
+
+static int __jump_label_mod_text_reserved(void *start, void *end)
+{
+ struct module *mod;
+
+ mod = __module_text_address((unsigned long)start);
+ if (!mod)
+ return 0;
+
+ WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod);
+
+ return __jump_label_text_reserved(mod->jump_entries,
+ mod->jump_entries + mod->num_jump_entries,
+ start, end);
+}
+
+static void __jump_label_mod_update(struct jump_label_key *key, int enable)
+{
+ struct jump_label_mod *mod = key->next;
+
+ while (mod) {
+ __jump_label_update(key, mod->entries, enable);
+ mod = mod->next;
+ }
+}
+
+/***
+ * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
+ * @mod: module to patch
+ *
+ * Allow for run-time selection of the optimal nops. Before the module
+ * loads patch these with arch_get_jump_label_nop(), which is specified by
+ * the arch specific jump label code.
+ */
+void jump_label_apply_nops(struct module *mod)
{
- struct jump_label_module_entry *e;
-
- e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
- if (!e)
- return ERR_PTR(-ENOMEM);
- e->mod = mod;
- e->nr_entries = count;
- e->table = iter_begin;
- hlist_add_head(&e->hlist, &entry->modules);
- return e;
+ struct jump_entry *iter_start = mod->jump_entries;
+ struct jump_entry *iter_stop = iter_start + mod->num_jump_entries;
+ struct jump_entry *iter;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (iter_start == iter_stop)
+ return;
+
+ for (iter = iter_start; iter < iter_stop; iter++)
+ arch_jump_label_text_poke_early(iter->code);
}
-static int add_jump_label_module(struct module *mod)
+static int jump_label_add_module(struct module *mod)
{
- struct jump_entry *iter, *iter_begin;
- struct jump_label_entry *entry;
- struct jump_label_module_entry *module_entry;
- int count;
+ struct jump_entry *iter_start = mod->jump_entries;
+ struct jump_entry *iter_stop = iter_start + mod->num_jump_entries;
+ struct jump_entry *iter;
+ struct jump_label_key *key = NULL;
+ struct jump_label_mod *jlm;
/* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
+ if (iter_start == iter_stop)
return 0;
- sort_jump_label_entries(mod->jump_entries,
- mod->jump_entries + mod->num_jump_entries);
- iter = mod->jump_entries;
- while (iter < mod->jump_entries + mod->num_jump_entries) {
- entry = get_jump_label_entry(iter->key);
- iter_begin = iter;
- count = 0;
- while ((iter < mod->jump_entries + mod->num_jump_entries) &&
- (iter->key == iter_begin->key)) {
- iter++;
- count++;
- }
- if (!entry) {
- entry = add_jump_label_entry(iter_begin->key, 0, NULL);
- if (IS_ERR(entry))
- return PTR_ERR(entry);
+ jump_label_sort_entries(iter_start, iter_stop);
+
+ for (iter = iter_start; iter < iter_stop; iter++) {
+ if (iter->key == (jump_label_t)(unsigned long)key)
+ continue;
+
+ key = (struct jump_label_key *)(unsigned long)iter->key;
+
+ if (__module_address(iter->key) == mod) {
+ atomic_set(&key->enabled, 0);
+ key->entries = iter;
+ key->next = NULL;
+ continue;
}
- module_entry = add_jump_label_module_entry(entry, iter_begin,
- count, mod);
- if (IS_ERR(module_entry))
- return PTR_ERR(module_entry);
+
+ jlm = kzalloc(sizeof(struct jump_label_mod), GFP_KERNEL);
+ if (!jlm)
+ return -ENOMEM;
+
+ jlm->mod = mod;
+ jlm->entries = iter;
+ jlm->next = key->next;
+ key->next = jlm;
+
+ if (jump_label_enabled(key))
+ __jump_label_update(key, iter, JUMP_LABEL_ENABLE);
}
+
return 0;
}
-static void remove_jump_label_module(struct module *mod)
+static void jump_label_del_module(struct module *mod)
{
- struct hlist_head *head;
- struct hlist_node *node, *node_next, *module_node, *module_node_next;
- struct jump_label_entry *e;
- struct jump_label_module_entry *e_module;
- int i;
+ struct jump_entry *iter_start = mod->jump_entries;
+ struct jump_entry *iter_stop = iter_start + mod->num_jump_entries;
+ struct jump_entry *iter;
+ struct jump_label_key *key = NULL;
+ struct jump_label_mod *jlm, **prev;
- /* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
- return;
+ for (iter = iter_start; iter < iter_stop; iter++) {
+ if (iter->key == (jump_label_t)(unsigned long)key)
+ continue;
+
+ key = (struct jump_label_key *)(unsigned long)iter->key;
+
+ if (__module_address(iter->key) == mod)
+ continue;
+
+ prev = &key->next;
+ jlm = key->next;
- for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
- head = &jump_label_table[i];
- hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
- hlist_for_each_entry_safe(e_module, module_node,
- module_node_next,
- &(e->modules), hlist) {
- if (e_module->mod == mod) {
- hlist_del(&e_module->hlist);
- kfree(e_module);
- }
- }
- if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
- hlist_del(&e->hlist);
- kfree(e);
- }
+ while (jlm && jlm->mod != mod) {
+ prev = &jlm->next;
+ jlm = jlm->next;
+ }
+
+ if (jlm) {
+ *prev = jlm->next;
+ kfree(jlm);
}
}
}
-static void remove_jump_label_module_init(struct module *mod)
+static void jump_label_invalidate_module_init(struct module *mod)
{
- struct hlist_head *head;
- struct hlist_node *node, *node_next, *module_node, *module_node_next;
- struct jump_label_entry *e;
- struct jump_label_module_entry *e_module;
+ struct jump_entry *iter_start = mod->jump_entries;
+ struct jump_entry *iter_stop = iter_start + mod->num_jump_entries;
struct jump_entry *iter;
- int i, count;
-
- /* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
- return;
- for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
- head = &jump_label_table[i];
- hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
- hlist_for_each_entry_safe(e_module, module_node,
- module_node_next,
- &(e->modules), hlist) {
- if (e_module->mod != mod)
- continue;
- count = e_module->nr_entries;
- iter = e_module->table;
- while (count--) {
- if (within_module_init(iter->code, mod))
- iter->key = 0;
- iter++;
- }
- }
- }
+ for (iter = iter_start; iter < iter_stop; iter++) {
+ if (within_module_init(iter->code, mod))
+ iter->code = 0;
}
}
switch (val) {
case MODULE_STATE_COMING:
jump_label_lock();
- ret = add_jump_label_module(mod);
+ ret = jump_label_add_module(mod);
if (ret)
- remove_jump_label_module(mod);
+ jump_label_del_module(mod);
jump_label_unlock();
break;
case MODULE_STATE_GOING:
jump_label_lock();
- remove_jump_label_module(mod);
+ jump_label_del_module(mod);
jump_label_unlock();
break;
case MODULE_STATE_LIVE:
jump_label_lock();
- remove_jump_label_module_init(mod);
+ jump_label_invalidate_module_init(mod);
jump_label_unlock();
break;
}
- return ret;
-}
-/***
- * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
- * @mod: module to patch
- *
- * Allow for run-time selection of the optimal nops. Before the module
- * loads patch these with arch_get_jump_label_nop(), which is specified by
- * the arch specific jump label code.
- */
-void jump_label_apply_nops(struct module *mod)
-{
- struct jump_entry *iter;
-
- /* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
- return;
-
- iter = mod->jump_entries;
- while (iter < mod->jump_entries + mod->num_jump_entries) {
- arch_jump_label_text_poke_early(iter->code);
- iter++;
- }
+ return notifier_from_errno(ret);
}
struct notifier_block jump_label_module_nb = {
.notifier_call = jump_label_module_notify,
- .priority = 0,
+ .priority = 1, /* higher than tracepoints */
};
-static __init int init_jump_label_module(void)
+static __init int jump_label_init_module(void)
{
return register_module_notifier(&jump_label_module_nb);
}
-early_initcall(init_jump_label_module);
+early_initcall(jump_label_init_module);
#endif /* CONFIG_MODULES */
+/***
+ * jump_label_text_reserved - check if addr range is reserved
+ * @start: start text addr
+ * @end: end text addr
+ *
+ * checks if the text addr located between @start and @end
+ * overlaps with any of the jump label patch addresses. Code
+ * that wants to modify kernel text should first verify that
+ * it does not overlap with any of the jump label addresses.
+ * Caller must hold jump_label_mutex.
+ *
+ * returns 1 if there is an overlap, 0 otherwise
+ */
+int jump_label_text_reserved(void *start, void *end)
+{
+ int ret = __jump_label_text_reserved(__start___jump_table,
+ __stop___jump_table, start, end);
+
+ if (ret)
+ return ret;
+
+#ifdef CONFIG_MODULES
+ ret = __jump_label_mod_text_reserved(start, end);
+#endif
+ return ret;
+}
+
+static void jump_label_update(struct jump_label_key *key, int enable)
+{
+ struct jump_entry *entry = key->entries;
+
+ /* if there are no users, entry can be NULL */
+ if (entry)
+ __jump_label_update(key, entry, enable);
+
+#ifdef CONFIG_MODULES
+ __jump_label_mod_update(key, enable);
+#endif
+}
+
#endif
#include <linux/vmalloc.h>
#include <linux/swap.h>
#include <linux/kmsg_dump.h>
+#include <linux/syscore_ops.h>
#include <asm/page.h>
#include <asm/uaccess.h>
if (error)
goto Enable_cpus;
local_irq_disable();
- /* Suspend system devices */
- error = sysdev_suspend(PMSG_FREEZE);
+ error = syscore_suspend();
if (error)
goto Enable_irqs;
} else
#ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
- sysdev_resume();
+ syscore_resume();
Enable_irqs:
local_irq_enable();
Enable_cpus:
}
}
-#ifdef CONFIG_PM_SLEEP
/*
* If set, call_usermodehelper_exec() will exit immediately returning -EBUSY
* (used for preventing user land processes from being created after the user
usermodehelper_disabled = 0;
}
+/**
+ * usermodehelper_is_disabled - check if new helpers are allowed to be started
+ */
+bool usermodehelper_is_disabled(void)
+{
+ return usermodehelper_disabled;
+}
+EXPORT_SYMBOL_GPL(usermodehelper_is_disabled);
+
static void helper_lock(void)
{
atomic_inc(&running_helpers);
if (atomic_dec_and_test(&running_helpers))
wake_up(&running_helpers_waitq);
}
-#else /* CONFIG_PM_SLEEP */
-#define usermodehelper_disabled 0
-
-static inline void helper_lock(void) {}
-static inline void helper_unlock(void) {}
-#endif /* CONFIG_PM_SLEEP */
/**
* call_usermodehelper_setup - prepare to call a usermode helper
usage[i] = '\0';
}
+static int __print_lock_name(struct lock_class *class)
+{
+ char str[KSYM_NAME_LEN];
+ const char *name;
+
+ name = class->name;
+ if (!name)
+ name = __get_key_name(class->key, str);
+
+ return printk("%s", name);
+}
+
static void print_lock_name(struct lock_class *class)
{
char str[KSYM_NAME_LEN], usage[LOCK_USAGE_CHARS];
return 0;
}
+static void
+print_circular_lock_scenario(struct held_lock *src,
+ struct held_lock *tgt,
+ struct lock_list *prt)
+{
+ struct lock_class *source = hlock_class(src);
+ struct lock_class *target = hlock_class(tgt);
+ struct lock_class *parent = prt->class;
+
+ /*
+ * A direct locking problem where unsafe_class lock is taken
+ * directly by safe_class lock, then all we need to show
+ * is the deadlock scenario, as it is obvious that the
+ * unsafe lock is taken under the safe lock.
+ *
+ * But if there is a chain instead, where the safe lock takes
+ * an intermediate lock (middle_class) where this lock is
+ * not the same as the safe lock, then the lock chain is
+ * used to describe the problem. Otherwise we would need
+ * to show a different CPU case for each link in the chain
+ * from the safe_class lock to the unsafe_class lock.
+ */
+ if (parent != source) {
+ printk("Chain exists of:\n ");
+ __print_lock_name(source);
+ printk(" --> ");
+ __print_lock_name(parent);
+ printk(" --> ");
+ __print_lock_name(target);
+ printk("\n\n");
+ }
+
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+}
+
/*
* When a circular dependency is detected, print the
* header first:
{
struct task_struct *curr = current;
struct lock_list *parent;
+ struct lock_list *first_parent;
int depth;
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
print_circular_bug_header(target, depth, check_src, check_tgt);
parent = get_lock_parent(target);
+ first_parent = parent;
while (parent) {
print_circular_bug_entry(parent, --depth);
}
printk("\nother info that might help us debug this:\n\n");
+ print_circular_lock_scenario(check_src, check_tgt,
+ first_parent);
+
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
printk("\n");
if (depth == 0 && (entry != root)) {
- printk("lockdep:%s bad BFS generated tree\n", __func__);
+ printk("lockdep:%s bad path found in chain graph\n", __func__);
break;
}
return;
}
+static void
+print_irq_lock_scenario(struct lock_list *safe_entry,
+ struct lock_list *unsafe_entry,
+ struct lock_class *prev_class,
+ struct lock_class *next_class)
+{
+ struct lock_class *safe_class = safe_entry->class;
+ struct lock_class *unsafe_class = unsafe_entry->class;
+ struct lock_class *middle_class = prev_class;
+
+ if (middle_class == safe_class)
+ middle_class = next_class;
+
+ /*
+ * A direct locking problem where unsafe_class lock is taken
+ * directly by safe_class lock, then all we need to show
+ * is the deadlock scenario, as it is obvious that the
+ * unsafe lock is taken under the safe lock.
+ *
+ * But if there is a chain instead, where the safe lock takes
+ * an intermediate lock (middle_class) where this lock is
+ * not the same as the safe lock, then the lock chain is
+ * used to describe the problem. Otherwise we would need
+ * to show a different CPU case for each link in the chain
+ * from the safe_class lock to the unsafe_class lock.
+ */
+ if (middle_class != unsafe_class) {
+ printk("Chain exists of:\n ");
+ __print_lock_name(safe_class);
+ printk(" --> ");
+ __print_lock_name(middle_class);
+ printk(" --> ");
+ __print_lock_name(unsafe_class);
+ printk("\n\n");
+ }
+
+ printk(" Possible interrupt unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(unsafe_class);
+ printk(");\n");
+ printk(" local_irq_disable();\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(middle_class);
+ printk(");\n");
+ printk(" <Interrupt>\n");
+ printk(" lock(");
+ __print_lock_name(safe_class);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+}
+
static int
print_bad_irq_dependency(struct task_struct *curr,
struct lock_list *prev_root,
print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
printk("\nother info that might help us debug this:\n\n");
+ print_irq_lock_scenario(backwards_entry, forwards_entry,
+ hlock_class(prev), hlock_class(next));
+
lockdep_print_held_locks(curr);
printk("\nthe dependencies between %s-irq-safe lock", irqclass);
#endif
+static void
+print_deadlock_scenario(struct held_lock *nxt,
+ struct held_lock *prv)
+{
+ struct lock_class *next = hlock_class(nxt);
+ struct lock_class *prev = hlock_class(prv);
+
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0\n");
+ printk(" ----\n");
+ printk(" lock(");
+ __print_lock_name(prev);
+ printk(");\n");
+ printk(" lock(");
+ __print_lock_name(next);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+ printk(" May be due to missing lock nesting notation\n\n");
+}
+
static int
print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
struct held_lock *next)
print_lock(prev);
printk("\nother info that might help us debug this:\n");
+ print_deadlock_scenario(next, prev);
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
struct list_head *hash_head = chainhashentry(chain_key);
struct lock_chain *chain;
struct held_lock *hlock_curr, *hlock_next;
- int i, j, n, cn;
+ int i, j;
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return 0;
}
i++;
chain->depth = curr->lockdep_depth + 1 - i;
- cn = nr_chain_hlocks;
- while (cn + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS) {
- n = cmpxchg(&nr_chain_hlocks, cn, cn + chain->depth);
- if (n == cn)
- break;
- cn = n;
- }
- if (likely(cn + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
- chain->base = cn;
+ if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+ chain->base = nr_chain_hlocks;
+ nr_chain_hlocks += chain->depth;
for (j = 0; j < chain->depth - 1; j++, i++) {
int lock_id = curr->held_locks[i].class_idx - 1;
chain_hlocks[chain->base + j] = lock_id;
#endif
}
+static void
+print_usage_bug_scenario(struct held_lock *lock)
+{
+ struct lock_class *class = hlock_class(lock);
+
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0\n");
+ printk(" ----\n");
+ printk(" lock(");
+ __print_lock_name(class);
+ printk(");\n");
+ printk(" <Interrupt>\n");
+ printk(" lock(");
+ __print_lock_name(class);
+ printk(");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+}
+
static int
print_usage_bug(struct task_struct *curr, struct held_lock *this,
enum lock_usage_bit prev_bit, enum lock_usage_bit new_bit)
print_irqtrace_events(curr);
printk("\nother info that might help us debug this:\n");
+ print_usage_bug_scenario(this);
+
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
struct held_lock *this, int forwards,
const char *irqclass)
{
+ struct lock_list *entry = other;
+ struct lock_list *middle = NULL;
+ int depth;
+
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
printk("\n\nand interrupts could create inverse lock ordering between them.\n\n");
printk("\nother info that might help us debug this:\n");
+
+ /* Find a middle lock (if one exists) */
+ depth = get_lock_depth(other);
+ do {
+ if (depth == 0 && (entry != root)) {
+ printk("lockdep:%s bad path found in chain graph\n", __func__);
+ break;
+ }
+ middle = entry;
+ entry = get_lock_parent(entry);
+ depth--;
+ } while (entry && entry != root && (depth >= 0));
+ if (forwards)
+ print_irq_lock_scenario(root, other,
+ middle ? middle->class : root->class, other->class);
+ else
+ print_irq_lock_scenario(other, root,
+ middle ? middle->class : other->class, root->class);
+
lockdep_print_held_locks(curr);
printk("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
#include <linux/kmemleak.h>
#include <linux/jump_label.h>
#include <linux/pfn.h>
+#include <linux/bsearch.h>
#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
struct module *owner,
bool (*fn)(const struct symsearch *syms,
struct module *owner,
- unsigned int symnum, void *data),
+ void *data),
void *data)
{
- unsigned int i, j;
+ unsigned int j;
for (j = 0; j < arrsize; j++) {
- for (i = 0; i < arr[j].stop - arr[j].start; i++)
- if (fn(&arr[j], owner, i, data))
- return true;
+ if (fn(&arr[j], owner, data))
+ return true;
}
return false;
}
/* Returns true as soon as fn returns true, otherwise false. */
-bool each_symbol(bool (*fn)(const struct symsearch *arr, struct module *owner,
- unsigned int symnum, void *data), void *data)
+bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
+ struct module *owner,
+ void *data),
+ void *data)
{
struct module *mod;
static const struct symsearch arr[] = {
}
return false;
}
-EXPORT_SYMBOL_GPL(each_symbol);
+EXPORT_SYMBOL_GPL(each_symbol_section);
struct find_symbol_arg {
/* Input */
const struct kernel_symbol *sym;
};
-static bool find_symbol_in_section(const struct symsearch *syms,
- struct module *owner,
- unsigned int symnum, void *data)
+static bool check_symbol(const struct symsearch *syms,
+ struct module *owner,
+ unsigned int symnum, void *data)
{
struct find_symbol_arg *fsa = data;
- if (strcmp(syms->start[symnum].name, fsa->name) != 0)
- return false;
-
if (!fsa->gplok) {
if (syms->licence == GPL_ONLY)
return false;
return true;
}
+static int cmp_name(const void *va, const void *vb)
+{
+ const char *a;
+ const struct kernel_symbol *b;
+ a = va; b = vb;
+ return strcmp(a, b->name);
+}
+
+static bool find_symbol_in_section(const struct symsearch *syms,
+ struct module *owner,
+ void *data)
+{
+ struct find_symbol_arg *fsa = data;
+ struct kernel_symbol *sym;
+
+ sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
+ sizeof(struct kernel_symbol), cmp_name);
+
+ if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
+ return true;
+
+ return false;
+}
+
/* Find a symbol and return it, along with, (optional) crc and
* (optional) module which owns it. Needs preempt disabled or module_mutex. */
const struct kernel_symbol *find_symbol(const char *name,
fsa.gplok = gplok;
fsa.warn = warn;
- if (each_symbol(find_symbol_in_section, &fsa)) {
+ if (each_symbol_section(find_symbol_in_section, &fsa)) {
if (owner)
*owner = fsa.owner;
if (crc)
}
}
-/* Setting memory back to RW+NX before releasing it */
-void unset_section_ro_nx(struct module *mod, void *module_region)
+static void unset_module_core_ro_nx(struct module *mod)
{
- unsigned long total_pages;
-
- if (mod->module_core == module_region) {
- /* Set core as NX+RW */
- total_pages = MOD_NUMBER_OF_PAGES(mod->module_core, mod->core_size);
- set_memory_nx((unsigned long)mod->module_core, total_pages);
- set_memory_rw((unsigned long)mod->module_core, total_pages);
+ set_page_attributes(mod->module_core + mod->core_text_size,
+ mod->module_core + mod->core_size,
+ set_memory_x);
+ set_page_attributes(mod->module_core,
+ mod->module_core + mod->core_ro_size,
+ set_memory_rw);
+}
- } else if (mod->module_init == module_region) {
- /* Set init as NX+RW */
- total_pages = MOD_NUMBER_OF_PAGES(mod->module_init, mod->init_size);
- set_memory_nx((unsigned long)mod->module_init, total_pages);
- set_memory_rw((unsigned long)mod->module_init, total_pages);
- }
+static void unset_module_init_ro_nx(struct module *mod)
+{
+ set_page_attributes(mod->module_init + mod->init_text_size,
+ mod->module_init + mod->init_size,
+ set_memory_x);
+ set_page_attributes(mod->module_init,
+ mod->module_init + mod->init_ro_size,
+ set_memory_rw);
}
/* Iterate through all modules and set each module's text as RW */
-void set_all_modules_text_rw()
+void set_all_modules_text_rw(void)
{
struct module *mod;
}
/* Iterate through all modules and set each module's text as RO */
-void set_all_modules_text_ro()
+void set_all_modules_text_ro(void)
{
struct module *mod;
}
#else
static inline void set_section_ro_nx(void *base, unsigned long text_size, unsigned long ro_size, unsigned long total_size) { }
-static inline void unset_section_ro_nx(struct module *mod, void *module_region) { }
+static void unset_module_core_ro_nx(struct module *mod) { }
+static void unset_module_init_ro_nx(struct module *mod) { }
#endif
/* Free a module, remove from lists, etc. */
destroy_params(mod->kp, mod->num_kp);
/* This may be NULL, but that's OK */
- unset_section_ro_nx(mod, mod->module_init);
+ unset_module_init_ro_nx(mod);
module_free(mod, mod->module_init);
kfree(mod->args);
percpu_modfree(mod);
lockdep_free_key_range(mod->module_core, mod->core_size);
/* Finally, free the core (containing the module structure) */
- unset_section_ro_nx(mod, mod->module_core);
+ unset_module_core_ro_nx(mod);
module_free(mod, mod->module_core);
#ifdef CONFIG_MPU
const struct kernel_symbol *start,
const struct kernel_symbol *stop)
{
- const struct kernel_symbol *ks = start;
- for (; ks < stop; ks++)
- if (strcmp(ks->name, name) == 0)
- return ks;
- return NULL;
+ return bsearch(name, start, stop - start,
+ sizeof(struct kernel_symbol), cmp_name);
}
static int is_exported(const char *name, unsigned long value,
mod->symtab = mod->core_symtab;
mod->strtab = mod->core_strtab;
#endif
- unset_section_ro_nx(mod, mod->module_init);
+ unset_module_init_ro_nx(mod);
module_free(mod, mod->module_init);
mod->module_init = NULL;
mod->init_size = 0;
+ mod->init_ro_size = 0;
mod->init_text_size = 0;
mutex_unlock(&module_mutex);
return;
DEBUG_LOCKS_WARN_ON(lock->magic != lock);
- DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
+ DEBUG_LOCKS_WARN_ON(lock->owner != current);
DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
mutex_clear_owner(lock);
}
static inline void mutex_set_owner(struct mutex *lock)
{
- lock->owner = current_thread_info();
+ lock->owner = current;
}
static inline void mutex_clear_owner(struct mutex *lock)
*/
for (;;) {
- struct thread_info *owner;
-
- /*
- * If we own the BKL, then don't spin. The owner of
- * the mutex might be waiting on us to release the BKL.
- */
- if (unlikely(current->lock_depth >= 0))
- break;
+ struct task_struct *owner;
/*
* If there's an owner, wait for it to either
#ifdef CONFIG_SMP
static inline void mutex_set_owner(struct mutex *lock)
{
- lock->owner = current_thread_info();
+ lock->owner = current;
}
static inline void mutex_clear_owner(struct mutex *lock)
int param_set_bool(const char *val, const struct kernel_param *kp)
{
bool v;
+ int ret;
/* No equals means "set"... */
if (!val) val = "1";
/* One of =[yYnN01] */
- switch (val[0]) {
- case 'y': case 'Y': case '1':
- v = true;
- break;
- case 'n': case 'N': case '0':
- v = false;
- break;
- default:
- return -EINVAL;
- }
+ ret = strtobool(val, &v);
+ if (ret)
+ return ret;
if (kp->flags & KPARAM_ISBOOL)
*(bool *)kp->arg = v;
return sprintf(buf, "%s\n", vattr->version);
}
-extern struct module_version_attribute __start___modver[], __stop___modver[];
+extern const struct module_version_attribute *__start___modver[];
+extern const struct module_version_attribute *__stop___modver[];
static void __init version_sysfs_builtin(void)
{
- const struct module_version_attribute *vattr;
+ const struct module_version_attribute **p;
struct module_kobject *mk;
int err;
- for (vattr = __start___modver; vattr < __stop___modver; vattr++) {
+ for (p = __start___modver; p < __stop___modver; p++) {
+ const struct module_version_attribute *vattr = *p;
+
mk = locate_module_kobject(vattr->module_name);
if (mk) {
err = sysfs_create_file(&mk->kobj, &vattr->mattr.attr);
+++ /dev/null
-/*
- * Performance events core code:
- *
- * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
- * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
- * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
- * Copyright © 2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
- *
- * For licensing details see kernel-base/COPYING
- */
-
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/cpu.h>
-#include <linux/smp.h>
-#include <linux/idr.h>
-#include <linux/file.h>
-#include <linux/poll.h>
-#include <linux/slab.h>
-#include <linux/hash.h>
-#include <linux/sysfs.h>
-#include <linux/dcache.h>
-#include <linux/percpu.h>
-#include <linux/ptrace.h>
-#include <linux/reboot.h>
-#include <linux/vmstat.h>
-#include <linux/device.h>
-#include <linux/vmalloc.h>
-#include <linux/hardirq.h>
-#include <linux/rculist.h>
-#include <linux/uaccess.h>
-#include <linux/syscalls.h>
-#include <linux/anon_inodes.h>
-#include <linux/kernel_stat.h>
-#include <linux/perf_event.h>
-#include <linux/ftrace_event.h>
-#include <linux/hw_breakpoint.h>
-
-#include <asm/irq_regs.h>
-
-struct remote_function_call {
- struct task_struct *p;
- int (*func)(void *info);
- void *info;
- int ret;
-};
-
-static void remote_function(void *data)
-{
- struct remote_function_call *tfc = data;
- struct task_struct *p = tfc->p;
-
- if (p) {
- tfc->ret = -EAGAIN;
- if (task_cpu(p) != smp_processor_id() || !task_curr(p))
- return;
- }
-
- tfc->ret = tfc->func(tfc->info);
-}
-
-/**
- * task_function_call - call a function on the cpu on which a task runs
- * @p: the task to evaluate
- * @func: the function to be called
- * @info: the function call argument
- *
- * Calls the function @func when the task is currently running. This might
- * be on the current CPU, which just calls the function directly
- *
- * returns: @func return value, or
- * -ESRCH - when the process isn't running
- * -EAGAIN - when the process moved away
- */
-static int
-task_function_call(struct task_struct *p, int (*func) (void *info), void *info)
-{
- struct remote_function_call data = {
- .p = p,
- .func = func,
- .info = info,
- .ret = -ESRCH, /* No such (running) process */
- };
-
- if (task_curr(p))
- smp_call_function_single(task_cpu(p), remote_function, &data, 1);
-
- return data.ret;
-}
-
-/**
- * cpu_function_call - call a function on the cpu
- * @func: the function to be called
- * @info: the function call argument
- *
- * Calls the function @func on the remote cpu.
- *
- * returns: @func return value or -ENXIO when the cpu is offline
- */
-static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
-{
- struct remote_function_call data = {
- .p = NULL,
- .func = func,
- .info = info,
- .ret = -ENXIO, /* No such CPU */
- };
-
- smp_call_function_single(cpu, remote_function, &data, 1);
-
- return data.ret;
-}
-
-#define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
- PERF_FLAG_FD_OUTPUT |\
- PERF_FLAG_PID_CGROUP)
-
-enum event_type_t {
- EVENT_FLEXIBLE = 0x1,
- EVENT_PINNED = 0x2,
- EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
-};
-
-/*
- * perf_sched_events : >0 events exist
- * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
- */
-atomic_t perf_sched_events __read_mostly;
-static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
-
-static atomic_t nr_mmap_events __read_mostly;
-static atomic_t nr_comm_events __read_mostly;
-static atomic_t nr_task_events __read_mostly;
-
-static LIST_HEAD(pmus);
-static DEFINE_MUTEX(pmus_lock);
-static struct srcu_struct pmus_srcu;
-
-/*
- * perf event paranoia level:
- * -1 - not paranoid at all
- * 0 - disallow raw tracepoint access for unpriv
- * 1 - disallow cpu events for unpriv
- * 2 - disallow kernel profiling for unpriv
- */
-int sysctl_perf_event_paranoid __read_mostly = 1;
-
-/* Minimum for 512 kiB + 1 user control page */
-int sysctl_perf_event_mlock __read_mostly = 512 + (PAGE_SIZE / 1024); /* 'free' kiB per user */
-
-/*
- * max perf event sample rate
- */
-#define DEFAULT_MAX_SAMPLE_RATE 100000
-int sysctl_perf_event_sample_rate __read_mostly = DEFAULT_MAX_SAMPLE_RATE;
-static int max_samples_per_tick __read_mostly =
- DIV_ROUND_UP(DEFAULT_MAX_SAMPLE_RATE, HZ);
-
-int perf_proc_update_handler(struct ctl_table *table, int write,
- void __user *buffer, size_t *lenp,
- loff_t *ppos)
-{
- int ret = proc_dointvec(table, write, buffer, lenp, ppos);
-
- if (ret || !write)
- return ret;
-
- max_samples_per_tick = DIV_ROUND_UP(sysctl_perf_event_sample_rate, HZ);
-
- return 0;
-}
-
-static atomic64_t perf_event_id;
-
-static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
- enum event_type_t event_type);
-
-static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
- enum event_type_t event_type,
- struct task_struct *task);
-
-static void update_context_time(struct perf_event_context *ctx);
-static u64 perf_event_time(struct perf_event *event);
-
-void __weak perf_event_print_debug(void) { }
-
-extern __weak const char *perf_pmu_name(void)
-{
- return "pmu";
-}
-
-static inline u64 perf_clock(void)
-{
- return local_clock();
-}
-
-static inline struct perf_cpu_context *
-__get_cpu_context(struct perf_event_context *ctx)
-{
- return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
-}
-
-#ifdef CONFIG_CGROUP_PERF
-
-/*
- * Must ensure cgroup is pinned (css_get) before calling
- * this function. In other words, we cannot call this function
- * if there is no cgroup event for the current CPU context.
- */
-static inline struct perf_cgroup *
-perf_cgroup_from_task(struct task_struct *task)
-{
- return container_of(task_subsys_state(task, perf_subsys_id),
- struct perf_cgroup, css);
-}
-
-static inline bool
-perf_cgroup_match(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
- return !event->cgrp || event->cgrp == cpuctx->cgrp;
-}
-
-static inline void perf_get_cgroup(struct perf_event *event)
-{
- css_get(&event->cgrp->css);
-}
-
-static inline void perf_put_cgroup(struct perf_event *event)
-{
- css_put(&event->cgrp->css);
-}
-
-static inline void perf_detach_cgroup(struct perf_event *event)
-{
- perf_put_cgroup(event);
- event->cgrp = NULL;
-}
-
-static inline int is_cgroup_event(struct perf_event *event)
-{
- return event->cgrp != NULL;
-}
-
-static inline u64 perf_cgroup_event_time(struct perf_event *event)
-{
- struct perf_cgroup_info *t;
-
- t = per_cpu_ptr(event->cgrp->info, event->cpu);
- return t->time;
-}
-
-static inline void __update_cgrp_time(struct perf_cgroup *cgrp)
-{
- struct perf_cgroup_info *info;
- u64 now;
-
- now = perf_clock();
-
- info = this_cpu_ptr(cgrp->info);
-
- info->time += now - info->timestamp;
- info->timestamp = now;
-}
-
-static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
-{
- struct perf_cgroup *cgrp_out = cpuctx->cgrp;
- if (cgrp_out)
- __update_cgrp_time(cgrp_out);
-}
-
-static inline void update_cgrp_time_from_event(struct perf_event *event)
-{
- struct perf_cgroup *cgrp;
-
- /*
- * ensure we access cgroup data only when needed and
- * when we know the cgroup is pinned (css_get)
- */
- if (!is_cgroup_event(event))
- return;
-
- cgrp = perf_cgroup_from_task(current);
- /*
- * Do not update time when cgroup is not active
- */
- if (cgrp == event->cgrp)
- __update_cgrp_time(event->cgrp);
-}
-
-static inline void
-perf_cgroup_set_timestamp(struct task_struct *task,
- struct perf_event_context *ctx)
-{
- struct perf_cgroup *cgrp;
- struct perf_cgroup_info *info;
-
- /*
- * ctx->lock held by caller
- * ensure we do not access cgroup data
- * unless we have the cgroup pinned (css_get)
- */
- if (!task || !ctx->nr_cgroups)
- return;
-
- cgrp = perf_cgroup_from_task(task);
- info = this_cpu_ptr(cgrp->info);
- info->timestamp = ctx->timestamp;
-}
-
-#define PERF_CGROUP_SWOUT 0x1 /* cgroup switch out every event */
-#define PERF_CGROUP_SWIN 0x2 /* cgroup switch in events based on task */
-
-/*
- * reschedule events based on the cgroup constraint of task.
- *
- * mode SWOUT : schedule out everything
- * mode SWIN : schedule in based on cgroup for next
- */
-void perf_cgroup_switch(struct task_struct *task, int mode)
-{
- struct perf_cpu_context *cpuctx;
- struct pmu *pmu;
- unsigned long flags;
-
- /*
- * disable interrupts to avoid geting nr_cgroup
- * changes via __perf_event_disable(). Also
- * avoids preemption.
- */
- local_irq_save(flags);
-
- /*
- * we reschedule only in the presence of cgroup
- * constrained events.
- */
- rcu_read_lock();
-
- list_for_each_entry_rcu(pmu, &pmus, entry) {
-
- cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
-
- perf_pmu_disable(cpuctx->ctx.pmu);
-
- /*
- * perf_cgroup_events says at least one
- * context on this CPU has cgroup events.
- *
- * ctx->nr_cgroups reports the number of cgroup
- * events for a context.
- */
- if (cpuctx->ctx.nr_cgroups > 0) {
-
- if (mode & PERF_CGROUP_SWOUT) {
- cpu_ctx_sched_out(cpuctx, EVENT_ALL);
- /*
- * must not be done before ctxswout due
- * to event_filter_match() in event_sched_out()
- */
- cpuctx->cgrp = NULL;
- }
-
- if (mode & PERF_CGROUP_SWIN) {
- WARN_ON_ONCE(cpuctx->cgrp);
- /* set cgrp before ctxsw in to
- * allow event_filter_match() to not
- * have to pass task around
- */
- cpuctx->cgrp = perf_cgroup_from_task(task);
- cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
- }
- }
-
- perf_pmu_enable(cpuctx->ctx.pmu);
- }
-
- rcu_read_unlock();
-
- local_irq_restore(flags);
-}
-
-static inline void perf_cgroup_sched_out(struct task_struct *task)
-{
- perf_cgroup_switch(task, PERF_CGROUP_SWOUT);
-}
-
-static inline void perf_cgroup_sched_in(struct task_struct *task)
-{
- perf_cgroup_switch(task, PERF_CGROUP_SWIN);
-}
-
-static inline int perf_cgroup_connect(int fd, struct perf_event *event,
- struct perf_event_attr *attr,
- struct perf_event *group_leader)
-{
- struct perf_cgroup *cgrp;
- struct cgroup_subsys_state *css;
- struct file *file;
- int ret = 0, fput_needed;
-
- file = fget_light(fd, &fput_needed);
- if (!file)
- return -EBADF;
-
- css = cgroup_css_from_dir(file, perf_subsys_id);
- if (IS_ERR(css)) {
- ret = PTR_ERR(css);
- goto out;
- }
-
- cgrp = container_of(css, struct perf_cgroup, css);
- event->cgrp = cgrp;
-
- /* must be done before we fput() the file */
- perf_get_cgroup(event);
-
- /*
- * all events in a group must monitor
- * the same cgroup because a task belongs
- * to only one perf cgroup at a time
- */
- if (group_leader && group_leader->cgrp != cgrp) {
- perf_detach_cgroup(event);
- ret = -EINVAL;
- }
-out:
- fput_light(file, fput_needed);
- return ret;
-}
-
-static inline void
-perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
-{
- struct perf_cgroup_info *t;
- t = per_cpu_ptr(event->cgrp->info, event->cpu);
- event->shadow_ctx_time = now - t->timestamp;
-}
-
-static inline void
-perf_cgroup_defer_enabled(struct perf_event *event)
-{
- /*
- * when the current task's perf cgroup does not match
- * the event's, we need to remember to call the
- * perf_mark_enable() function the first time a task with
- * a matching perf cgroup is scheduled in.
- */
- if (is_cgroup_event(event) && !perf_cgroup_match(event))
- event->cgrp_defer_enabled = 1;
-}
-
-static inline void
-perf_cgroup_mark_enabled(struct perf_event *event,
- struct perf_event_context *ctx)
-{
- struct perf_event *sub;
- u64 tstamp = perf_event_time(event);
-
- if (!event->cgrp_defer_enabled)
- return;
-
- event->cgrp_defer_enabled = 0;
-
- event->tstamp_enabled = tstamp - event->total_time_enabled;
- list_for_each_entry(sub, &event->sibling_list, group_entry) {
- if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
- sub->tstamp_enabled = tstamp - sub->total_time_enabled;
- sub->cgrp_defer_enabled = 0;
- }
- }
-}
-#else /* !CONFIG_CGROUP_PERF */
-
-static inline bool
-perf_cgroup_match(struct perf_event *event)
-{
- return true;
-}
-
-static inline void perf_detach_cgroup(struct perf_event *event)
-{}
-
-static inline int is_cgroup_event(struct perf_event *event)
-{
- return 0;
-}
-
-static inline u64 perf_cgroup_event_cgrp_time(struct perf_event *event)
-{
- return 0;
-}
-
-static inline void update_cgrp_time_from_event(struct perf_event *event)
-{
-}
-
-static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
-{
-}
-
-static inline void perf_cgroup_sched_out(struct task_struct *task)
-{
-}
-
-static inline void perf_cgroup_sched_in(struct task_struct *task)
-{
-}
-
-static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event,
- struct perf_event_attr *attr,
- struct perf_event *group_leader)
-{
- return -EINVAL;
-}
-
-static inline void
-perf_cgroup_set_timestamp(struct task_struct *task,
- struct perf_event_context *ctx)
-{
-}
-
-void
-perf_cgroup_switch(struct task_struct *task, struct task_struct *next)
-{
-}
-
-static inline void
-perf_cgroup_set_shadow_time(struct perf_event *event, u64 now)
-{
-}
-
-static inline u64 perf_cgroup_event_time(struct perf_event *event)
-{
- return 0;
-}
-
-static inline void
-perf_cgroup_defer_enabled(struct perf_event *event)
-{
-}
-
-static inline void
-perf_cgroup_mark_enabled(struct perf_event *event,
- struct perf_event_context *ctx)
-{
-}
-#endif
-
-void perf_pmu_disable(struct pmu *pmu)
-{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
- if (!(*count)++)
- pmu->pmu_disable(pmu);
-}
-
-void perf_pmu_enable(struct pmu *pmu)
-{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
- if (!--(*count))
- pmu->pmu_enable(pmu);
-}
-
-static DEFINE_PER_CPU(struct list_head, rotation_list);
-
-/*
- * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
- * because they're strictly cpu affine and rotate_start is called with IRQs
- * disabled, while rotate_context is called from IRQ context.
- */
-static void perf_pmu_rotate_start(struct pmu *pmu)
-{
- struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
- struct list_head *head = &__get_cpu_var(rotation_list);
-
- WARN_ON(!irqs_disabled());
-
- if (list_empty(&cpuctx->rotation_list))
- list_add(&cpuctx->rotation_list, head);
-}
-
-static void get_ctx(struct perf_event_context *ctx)
-{
- WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
-}
-
-static void free_ctx(struct rcu_head *head)
-{
- struct perf_event_context *ctx;
-
- ctx = container_of(head, struct perf_event_context, rcu_head);
- kfree(ctx);
-}
-
-static void put_ctx(struct perf_event_context *ctx)
-{
- if (atomic_dec_and_test(&ctx->refcount)) {
- if (ctx->parent_ctx)
- put_ctx(ctx->parent_ctx);
- if (ctx->task)
- put_task_struct(ctx->task);
- call_rcu(&ctx->rcu_head, free_ctx);
- }
-}
-
-static void unclone_ctx(struct perf_event_context *ctx)
-{
- if (ctx->parent_ctx) {
- put_ctx(ctx->parent_ctx);
- ctx->parent_ctx = NULL;
- }
-}
-
-static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
-{
- /*
- * only top level events have the pid namespace they were created in
- */
- if (event->parent)
- event = event->parent;
-
- return task_tgid_nr_ns(p, event->ns);
-}
-
-static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
-{
- /*
- * only top level events have the pid namespace they were created in
- */
- if (event->parent)
- event = event->parent;
-
- return task_pid_nr_ns(p, event->ns);
-}
-
-/*
- * If we inherit events we want to return the parent event id
- * to userspace.
- */
-static u64 primary_event_id(struct perf_event *event)
-{
- u64 id = event->id;
-
- if (event->parent)
- id = event->parent->id;
-
- return id;
-}
-
-/*
- * Get the perf_event_context for a task and lock it.
- * This has to cope with with the fact that until it is locked,
- * the context could get moved to another task.
- */
-static struct perf_event_context *
-perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
-{
- struct perf_event_context *ctx;
-
- rcu_read_lock();
-retry:
- ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
- if (ctx) {
- /*
- * If this context is a clone of another, it might
- * get swapped for another underneath us by
- * perf_event_task_sched_out, though the
- * rcu_read_lock() protects us from any context
- * getting freed. Lock the context and check if it
- * got swapped before we could get the lock, and retry
- * if so. If we locked the right context, then it
- * can't get swapped on us any more.
- */
- raw_spin_lock_irqsave(&ctx->lock, *flags);
- if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
- raw_spin_unlock_irqrestore(&ctx->lock, *flags);
- goto retry;
- }
-
- if (!atomic_inc_not_zero(&ctx->refcount)) {
- raw_spin_unlock_irqrestore(&ctx->lock, *flags);
- ctx = NULL;
- }
- }
- rcu_read_unlock();
- return ctx;
-}
-
-/*
- * Get the context for a task and increment its pin_count so it
- * can't get swapped to another task. This also increments its
- * reference count so that the context can't get freed.
- */
-static struct perf_event_context *
-perf_pin_task_context(struct task_struct *task, int ctxn)
-{
- struct perf_event_context *ctx;
- unsigned long flags;
-
- ctx = perf_lock_task_context(task, ctxn, &flags);
- if (ctx) {
- ++ctx->pin_count;
- raw_spin_unlock_irqrestore(&ctx->lock, flags);
- }
- return ctx;
-}
-
-static void perf_unpin_context(struct perf_event_context *ctx)
-{
- unsigned long flags;
-
- raw_spin_lock_irqsave(&ctx->lock, flags);
- --ctx->pin_count;
- raw_spin_unlock_irqrestore(&ctx->lock, flags);
-}
-
-/*
- * Update the record of the current time in a context.
- */
-static void update_context_time(struct perf_event_context *ctx)
-{
- u64 now = perf_clock();
-
- ctx->time += now - ctx->timestamp;
- ctx->timestamp = now;
-}
-
-static u64 perf_event_time(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
-
- if (is_cgroup_event(event))
- return perf_cgroup_event_time(event);
-
- return ctx ? ctx->time : 0;
-}
-
-/*
- * Update the total_time_enabled and total_time_running fields for a event.
- */
-static void update_event_times(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- u64 run_end;
-
- if (event->state < PERF_EVENT_STATE_INACTIVE ||
- event->group_leader->state < PERF_EVENT_STATE_INACTIVE)
- return;
- /*
- * in cgroup mode, time_enabled represents
- * the time the event was enabled AND active
- * tasks were in the monitored cgroup. This is
- * independent of the activity of the context as
- * there may be a mix of cgroup and non-cgroup events.
- *
- * That is why we treat cgroup events differently
- * here.
- */
- if (is_cgroup_event(event))
- run_end = perf_event_time(event);
- else if (ctx->is_active)
- run_end = ctx->time;
- else
- run_end = event->tstamp_stopped;
-
- event->total_time_enabled = run_end - event->tstamp_enabled;
-
- if (event->state == PERF_EVENT_STATE_INACTIVE)
- run_end = event->tstamp_stopped;
- else
- run_end = perf_event_time(event);
-
- event->total_time_running = run_end - event->tstamp_running;
-
-}
-
-/*
- * Update total_time_enabled and total_time_running for all events in a group.
- */
-static void update_group_times(struct perf_event *leader)
-{
- struct perf_event *event;
-
- update_event_times(leader);
- list_for_each_entry(event, &leader->sibling_list, group_entry)
- update_event_times(event);
-}
-
-static struct list_head *
-ctx_group_list(struct perf_event *event, struct perf_event_context *ctx)
-{
- if (event->attr.pinned)
- return &ctx->pinned_groups;
- else
- return &ctx->flexible_groups;
-}
-
-/*
- * Add a event from the lists for its context.
- * Must be called with ctx->mutex and ctx->lock held.
- */
-static void
-list_add_event(struct perf_event *event, struct perf_event_context *ctx)
-{
- WARN_ON_ONCE(event->attach_state & PERF_ATTACH_CONTEXT);
- event->attach_state |= PERF_ATTACH_CONTEXT;
-
- /*
- * If we're a stand alone event or group leader, we go to the context
- * list, group events are kept attached to the group so that
- * perf_group_detach can, at all times, locate all siblings.
- */
- if (event->group_leader == event) {
- struct list_head *list;
-
- if (is_software_event(event))
- event->group_flags |= PERF_GROUP_SOFTWARE;
-
- list = ctx_group_list(event, ctx);
- list_add_tail(&event->group_entry, list);
- }
-
- if (is_cgroup_event(event))
- ctx->nr_cgroups++;
-
- list_add_rcu(&event->event_entry, &ctx->event_list);
- if (!ctx->nr_events)
- perf_pmu_rotate_start(ctx->pmu);
- ctx->nr_events++;
- if (event->attr.inherit_stat)
- ctx->nr_stat++;
-}
-
-/*
- * Called at perf_event creation and when events are attached/detached from a
- * group.
- */
-static void perf_event__read_size(struct perf_event *event)
-{
- int entry = sizeof(u64); /* value */
- int size = 0;
- int nr = 1;
-
- if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
- size += sizeof(u64);
-
- if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
- size += sizeof(u64);
-
- if (event->attr.read_format & PERF_FORMAT_ID)
- entry += sizeof(u64);
-
- if (event->attr.read_format & PERF_FORMAT_GROUP) {
- nr += event->group_leader->nr_siblings;
- size += sizeof(u64);
- }
-
- size += entry * nr;
- event->read_size = size;
-}
-
-static void perf_event__header_size(struct perf_event *event)
-{
- struct perf_sample_data *data;
- u64 sample_type = event->attr.sample_type;
- u16 size = 0;
-
- perf_event__read_size(event);
-
- if (sample_type & PERF_SAMPLE_IP)
- size += sizeof(data->ip);
-
- if (sample_type & PERF_SAMPLE_ADDR)
- size += sizeof(data->addr);
-
- if (sample_type & PERF_SAMPLE_PERIOD)
- size += sizeof(data->period);
-
- if (sample_type & PERF_SAMPLE_READ)
- size += event->read_size;
-
- event->header_size = size;
-}
-
-static void perf_event__id_header_size(struct perf_event *event)
-{
- struct perf_sample_data *data;
- u64 sample_type = event->attr.sample_type;
- u16 size = 0;
-
- if (sample_type & PERF_SAMPLE_TID)
- size += sizeof(data->tid_entry);
-
- if (sample_type & PERF_SAMPLE_TIME)
- size += sizeof(data->time);
-
- if (sample_type & PERF_SAMPLE_ID)
- size += sizeof(data->id);
-
- if (sample_type & PERF_SAMPLE_STREAM_ID)
- size += sizeof(data->stream_id);
-
- if (sample_type & PERF_SAMPLE_CPU)
- size += sizeof(data->cpu_entry);
-
- event->id_header_size = size;
-}
-
-static void perf_group_attach(struct perf_event *event)
-{
- struct perf_event *group_leader = event->group_leader, *pos;
-
- /*
- * We can have double attach due to group movement in perf_event_open.
- */
- if (event->attach_state & PERF_ATTACH_GROUP)
- return;
-
- event->attach_state |= PERF_ATTACH_GROUP;
-
- if (group_leader == event)
- return;
-
- if (group_leader->group_flags & PERF_GROUP_SOFTWARE &&
- !is_software_event(event))
- group_leader->group_flags &= ~PERF_GROUP_SOFTWARE;
-
- list_add_tail(&event->group_entry, &group_leader->sibling_list);
- group_leader->nr_siblings++;
-
- perf_event__header_size(group_leader);
-
- list_for_each_entry(pos, &group_leader->sibling_list, group_entry)
- perf_event__header_size(pos);
-}
-
-/*
- * Remove a event from the lists for its context.
- * Must be called with ctx->mutex and ctx->lock held.
- */
-static void
-list_del_event(struct perf_event *event, struct perf_event_context *ctx)
-{
- struct perf_cpu_context *cpuctx;
- /*
- * We can have double detach due to exit/hot-unplug + close.
- */
- if (!(event->attach_state & PERF_ATTACH_CONTEXT))
- return;
-
- event->attach_state &= ~PERF_ATTACH_CONTEXT;
-
- if (is_cgroup_event(event)) {
- ctx->nr_cgroups--;
- cpuctx = __get_cpu_context(ctx);
- /*
- * if there are no more cgroup events
- * then cler cgrp to avoid stale pointer
- * in update_cgrp_time_from_cpuctx()
- */
- if (!ctx->nr_cgroups)
- cpuctx->cgrp = NULL;
- }
-
- ctx->nr_events--;
- if (event->attr.inherit_stat)
- ctx->nr_stat--;
-
- list_del_rcu(&event->event_entry);
-
- if (event->group_leader == event)
- list_del_init(&event->group_entry);
-
- update_group_times(event);
-
- /*
- * If event was in error state, then keep it
- * that way, otherwise bogus counts will be
- * returned on read(). The only way to get out
- * of error state is by explicit re-enabling
- * of the event
- */
- if (event->state > PERF_EVENT_STATE_OFF)
- event->state = PERF_EVENT_STATE_OFF;
-}
-
-static void perf_group_detach(struct perf_event *event)
-{
- struct perf_event *sibling, *tmp;
- struct list_head *list = NULL;
-
- /*
- * We can have double detach due to exit/hot-unplug + close.
- */
- if (!(event->attach_state & PERF_ATTACH_GROUP))
- return;
-
- event->attach_state &= ~PERF_ATTACH_GROUP;
-
- /*
- * If this is a sibling, remove it from its group.
- */
- if (event->group_leader != event) {
- list_del_init(&event->group_entry);
- event->group_leader->nr_siblings--;
- goto out;
- }
-
- if (!list_empty(&event->group_entry))
- list = &event->group_entry;
-
- /*
- * If this was a group event with sibling events then
- * upgrade the siblings to singleton events by adding them
- * to whatever list we are on.
- */
- list_for_each_entry_safe(sibling, tmp, &event->sibling_list, group_entry) {
- if (list)
- list_move_tail(&sibling->group_entry, list);
- sibling->group_leader = sibling;
-
- /* Inherit group flags from the previous leader */
- sibling->group_flags = event->group_flags;
- }
-
-out:
- perf_event__header_size(event->group_leader);
-
- list_for_each_entry(tmp, &event->group_leader->sibling_list, group_entry)
- perf_event__header_size(tmp);
-}
-
-static inline int
-event_filter_match(struct perf_event *event)
-{
- return (event->cpu == -1 || event->cpu == smp_processor_id())
- && perf_cgroup_match(event);
-}
-
-static void
-event_sched_out(struct perf_event *event,
- struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- u64 tstamp = perf_event_time(event);
- u64 delta;
- /*
- * An event which could not be activated because of
- * filter mismatch still needs to have its timings
- * maintained, otherwise bogus information is return
- * via read() for time_enabled, time_running:
- */
- if (event->state == PERF_EVENT_STATE_INACTIVE
- && !event_filter_match(event)) {
- delta = tstamp - event->tstamp_stopped;
- event->tstamp_running += delta;
- event->tstamp_stopped = tstamp;
- }
-
- if (event->state != PERF_EVENT_STATE_ACTIVE)
- return;
-
- event->state = PERF_EVENT_STATE_INACTIVE;
- if (event->pending_disable) {
- event->pending_disable = 0;
- event->state = PERF_EVENT_STATE_OFF;
- }
- event->tstamp_stopped = tstamp;
- event->pmu->del(event, 0);
- event->oncpu = -1;
-
- if (!is_software_event(event))
- cpuctx->active_oncpu--;
- ctx->nr_active--;
- if (event->attr.exclusive || !cpuctx->active_oncpu)
- cpuctx->exclusive = 0;
-}
-
-static void
-group_sched_out(struct perf_event *group_event,
- struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- struct perf_event *event;
- int state = group_event->state;
-
- event_sched_out(group_event, cpuctx, ctx);
-
- /*
- * Schedule out siblings (if any):
- */
- list_for_each_entry(event, &group_event->sibling_list, group_entry)
- event_sched_out(event, cpuctx, ctx);
-
- if (state == PERF_EVENT_STATE_ACTIVE && group_event->attr.exclusive)
- cpuctx->exclusive = 0;
-}
-
-/*
- * Cross CPU call to remove a performance event
- *
- * We disable the event on the hardware level first. After that we
- * remove it from the context list.
- */
-static int __perf_remove_from_context(void *info)
-{
- struct perf_event *event = info;
- struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
- raw_spin_lock(&ctx->lock);
- event_sched_out(event, cpuctx, ctx);
- list_del_event(event, ctx);
- raw_spin_unlock(&ctx->lock);
-
- return 0;
-}
-
-
-/*
- * Remove the event from a task's (or a CPU's) list of events.
- *
- * CPU events are removed with a smp call. For task events we only
- * call when the task is on a CPU.
- *
- * If event->ctx is a cloned context, callers must make sure that
- * every task struct that event->ctx->task could possibly point to
- * remains valid. This is OK when called from perf_release since
- * that only calls us on the top-level context, which can't be a clone.
- * When called from perf_event_exit_task, it's OK because the
- * context has been detached from its task.
- */
-static void perf_remove_from_context(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- struct task_struct *task = ctx->task;
-
- lockdep_assert_held(&ctx->mutex);
-
- if (!task) {
- /*
- * Per cpu events are removed via an smp call and
- * the removal is always successful.
- */
- cpu_function_call(event->cpu, __perf_remove_from_context, event);
- return;
- }
-
-retry:
- if (!task_function_call(task, __perf_remove_from_context, event))
- return;
-
- raw_spin_lock_irq(&ctx->lock);
- /*
- * If we failed to find a running task, but find the context active now
- * that we've acquired the ctx->lock, retry.
- */
- if (ctx->is_active) {
- raw_spin_unlock_irq(&ctx->lock);
- goto retry;
- }
-
- /*
- * Since the task isn't running, its safe to remove the event, us
- * holding the ctx->lock ensures the task won't get scheduled in.
- */
- list_del_event(event, ctx);
- raw_spin_unlock_irq(&ctx->lock);
-}
-
-/*
- * Cross CPU call to disable a performance event
- */
-static int __perf_event_disable(void *info)
-{
- struct perf_event *event = info;
- struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
- /*
- * If this is a per-task event, need to check whether this
- * event's task is the current task on this cpu.
- *
- * Can trigger due to concurrent perf_event_context_sched_out()
- * flipping contexts around.
- */
- if (ctx->task && cpuctx->task_ctx != ctx)
- return -EINVAL;
-
- raw_spin_lock(&ctx->lock);
-
- /*
- * If the event is on, turn it off.
- * If it is in error state, leave it in error state.
- */
- if (event->state >= PERF_EVENT_STATE_INACTIVE) {
- update_context_time(ctx);
- update_cgrp_time_from_event(event);
- update_group_times(event);
- if (event == event->group_leader)
- group_sched_out(event, cpuctx, ctx);
- else
- event_sched_out(event, cpuctx, ctx);
- event->state = PERF_EVENT_STATE_OFF;
- }
-
- raw_spin_unlock(&ctx->lock);
-
- return 0;
-}
-
-/*
- * Disable a event.
- *
- * If event->ctx is a cloned context, callers must make sure that
- * every task struct that event->ctx->task could possibly point to
- * remains valid. This condition is satisifed when called through
- * perf_event_for_each_child or perf_event_for_each because they
- * hold the top-level event's child_mutex, so any descendant that
- * goes to exit will block in sync_child_event.
- * When called from perf_pending_event it's OK because event->ctx
- * is the current context on this CPU and preemption is disabled,
- * hence we can't get into perf_event_task_sched_out for this context.
- */
-void perf_event_disable(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- struct task_struct *task = ctx->task;
-
- if (!task) {
- /*
- * Disable the event on the cpu that it's on
- */
- cpu_function_call(event->cpu, __perf_event_disable, event);
- return;
- }
-
-retry:
- if (!task_function_call(task, __perf_event_disable, event))
- return;
-
- raw_spin_lock_irq(&ctx->lock);
- /*
- * If the event is still active, we need to retry the cross-call.
- */
- if (event->state == PERF_EVENT_STATE_ACTIVE) {
- raw_spin_unlock_irq(&ctx->lock);
- /*
- * Reload the task pointer, it might have been changed by
- * a concurrent perf_event_context_sched_out().
- */
- task = ctx->task;
- goto retry;
- }
-
- /*
- * Since we have the lock this context can't be scheduled
- * in, so we can change the state safely.
- */
- if (event->state == PERF_EVENT_STATE_INACTIVE) {
- update_group_times(event);
- event->state = PERF_EVENT_STATE_OFF;
- }
- raw_spin_unlock_irq(&ctx->lock);
-}
-
-static void perf_set_shadow_time(struct perf_event *event,
- struct perf_event_context *ctx,
- u64 tstamp)
-{
- /*
- * use the correct time source for the time snapshot
- *
- * We could get by without this by leveraging the
- * fact that to get to this function, the caller
- * has most likely already called update_context_time()
- * and update_cgrp_time_xx() and thus both timestamp
- * are identical (or very close). Given that tstamp is,
- * already adjusted for cgroup, we could say that:
- * tstamp - ctx->timestamp
- * is equivalent to
- * tstamp - cgrp->timestamp.
- *
- * Then, in perf_output_read(), the calculation would
- * work with no changes because:
- * - event is guaranteed scheduled in
- * - no scheduled out in between
- * - thus the timestamp would be the same
- *
- * But this is a bit hairy.
- *
- * So instead, we have an explicit cgroup call to remain
- * within the time time source all along. We believe it
- * is cleaner and simpler to understand.
- */
- if (is_cgroup_event(event))
- perf_cgroup_set_shadow_time(event, tstamp);
- else
- event->shadow_ctx_time = tstamp - ctx->timestamp;
-}
-
-#define MAX_INTERRUPTS (~0ULL)
-
-static void perf_log_throttle(struct perf_event *event, int enable);
-
-static int
-event_sched_in(struct perf_event *event,
- struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- u64 tstamp = perf_event_time(event);
-
- if (event->state <= PERF_EVENT_STATE_OFF)
- return 0;
-
- event->state = PERF_EVENT_STATE_ACTIVE;
- event->oncpu = smp_processor_id();
-
- /*
- * Unthrottle events, since we scheduled we might have missed several
- * ticks already, also for a heavily scheduling task there is little
- * guarantee it'll get a tick in a timely manner.
- */
- if (unlikely(event->hw.interrupts == MAX_INTERRUPTS)) {
- perf_log_throttle(event, 1);
- event->hw.interrupts = 0;
- }
-
- /*
- * The new state must be visible before we turn it on in the hardware:
- */
- smp_wmb();
-
- if (event->pmu->add(event, PERF_EF_START)) {
- event->state = PERF_EVENT_STATE_INACTIVE;
- event->oncpu = -1;
- return -EAGAIN;
- }
-
- event->tstamp_running += tstamp - event->tstamp_stopped;
-
- perf_set_shadow_time(event, ctx, tstamp);
-
- if (!is_software_event(event))
- cpuctx->active_oncpu++;
- ctx->nr_active++;
-
- if (event->attr.exclusive)
- cpuctx->exclusive = 1;
-
- return 0;
-}
-
-static int
-group_sched_in(struct perf_event *group_event,
- struct perf_cpu_context *cpuctx,
- struct perf_event_context *ctx)
-{
- struct perf_event *event, *partial_group = NULL;
- struct pmu *pmu = group_event->pmu;
- u64 now = ctx->time;
- bool simulate = false;
-
- if (group_event->state == PERF_EVENT_STATE_OFF)
- return 0;
-
- pmu->start_txn(pmu);
-
- if (event_sched_in(group_event, cpuctx, ctx)) {
- pmu->cancel_txn(pmu);
- return -EAGAIN;
- }
-
- /*
- * Schedule in siblings as one group (if any):
- */
- list_for_each_entry(event, &group_event->sibling_list, group_entry) {
- if (event_sched_in(event, cpuctx, ctx)) {
- partial_group = event;
- goto group_error;
- }
- }
-
- if (!pmu->commit_txn(pmu))
- return 0;
-
-group_error:
- /*
- * Groups can be scheduled in as one unit only, so undo any
- * partial group before returning:
- * The events up to the failed event are scheduled out normally,
- * tstamp_stopped will be updated.
- *
- * The failed events and the remaining siblings need to have
- * their timings updated as if they had gone thru event_sched_in()
- * and event_sched_out(). This is required to get consistent timings
- * across the group. This also takes care of the case where the group
- * could never be scheduled by ensuring tstamp_stopped is set to mark
- * the time the event was actually stopped, such that time delta
- * calculation in update_event_times() is correct.
- */
- list_for_each_entry(event, &group_event->sibling_list, group_entry) {
- if (event == partial_group)
- simulate = true;
-
- if (simulate) {
- event->tstamp_running += now - event->tstamp_stopped;
- event->tstamp_stopped = now;
- } else {
- event_sched_out(event, cpuctx, ctx);
- }
- }
- event_sched_out(group_event, cpuctx, ctx);
-
- pmu->cancel_txn(pmu);
-
- return -EAGAIN;
-}
-
-/*
- * Work out whether we can put this event group on the CPU now.
- */
-static int group_can_go_on(struct perf_event *event,
- struct perf_cpu_context *cpuctx,
- int can_add_hw)
-{
- /*
- * Groups consisting entirely of software events can always go on.
- */
- if (event->group_flags & PERF_GROUP_SOFTWARE)
- return 1;
- /*
- * If an exclusive group is already on, no other hardware
- * events can go on.
- */
- if (cpuctx->exclusive)
- return 0;
- /*
- * If this group is exclusive and there are already
- * events on the CPU, it can't go on.
- */
- if (event->attr.exclusive && cpuctx->active_oncpu)
- return 0;
- /*
- * Otherwise, try to add it if all previous groups were able
- * to go on.
- */
- return can_add_hw;
-}
-
-static void add_event_to_ctx(struct perf_event *event,
- struct perf_event_context *ctx)
-{
- u64 tstamp = perf_event_time(event);
-
- list_add_event(event, ctx);
- perf_group_attach(event);
- event->tstamp_enabled = tstamp;
- event->tstamp_running = tstamp;
- event->tstamp_stopped = tstamp;
-}
-
-static void perf_event_context_sched_in(struct perf_event_context *ctx,
- struct task_struct *tsk);
-
-/*
- * Cross CPU call to install and enable a performance event
- *
- * Must be called with ctx->mutex held
- */
-static int __perf_install_in_context(void *info)
-{
- struct perf_event *event = info;
- struct perf_event_context *ctx = event->ctx;
- struct perf_event *leader = event->group_leader;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
- int err;
-
- /*
- * In case we're installing a new context to an already running task,
- * could also happen before perf_event_task_sched_in() on architectures
- * which do context switches with IRQs enabled.
- */
- if (ctx->task && !cpuctx->task_ctx)
- perf_event_context_sched_in(ctx, ctx->task);
-
- raw_spin_lock(&ctx->lock);
- ctx->is_active = 1;
- update_context_time(ctx);
- /*
- * update cgrp time only if current cgrp
- * matches event->cgrp. Must be done before
- * calling add_event_to_ctx()
- */
- update_cgrp_time_from_event(event);
-
- add_event_to_ctx(event, ctx);
-
- if (!event_filter_match(event))
- goto unlock;
-
- /*
- * Don't put the event on if it is disabled or if
- * it is in a group and the group isn't on.
- */
- if (event->state != PERF_EVENT_STATE_INACTIVE ||
- (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE))
- goto unlock;
-
- /*
- * An exclusive event can't go on if there are already active
- * hardware events, and no hardware event can go on if there
- * is already an exclusive event on.
- */
- if (!group_can_go_on(event, cpuctx, 1))
- err = -EEXIST;
- else
- err = event_sched_in(event, cpuctx, ctx);
-
- if (err) {
- /*
- * This event couldn't go on. If it is in a group
- * then we have to pull the whole group off.
- * If the event group is pinned then put it in error state.
- */
- if (leader != event)
- group_sched_out(leader, cpuctx, ctx);
- if (leader->attr.pinned) {
- update_group_times(leader);
- leader->state = PERF_EVENT_STATE_ERROR;
- }
- }
-
-unlock:
- raw_spin_unlock(&ctx->lock);
-
- return 0;
-}
-
-/*
- * Attach a performance event to a context
- *
- * First we add the event to the list with the hardware enable bit
- * in event->hw_config cleared.
- *
- * If the event is attached to a task which is on a CPU we use a smp
- * call to enable it in the task context. The task might have been
- * scheduled away, but we check this in the smp call again.
- */
-static void
-perf_install_in_context(struct perf_event_context *ctx,
- struct perf_event *event,
- int cpu)
-{
- struct task_struct *task = ctx->task;
-
- lockdep_assert_held(&ctx->mutex);
-
- event->ctx = ctx;
-
- if (!task) {
- /*
- * Per cpu events are installed via an smp call and
- * the install is always successful.
- */
- cpu_function_call(cpu, __perf_install_in_context, event);
- return;
- }
-
-retry:
- if (!task_function_call(task, __perf_install_in_context, event))
- return;
-
- raw_spin_lock_irq(&ctx->lock);
- /*
- * If we failed to find a running task, but find the context active now
- * that we've acquired the ctx->lock, retry.
- */
- if (ctx->is_active) {
- raw_spin_unlock_irq(&ctx->lock);
- goto retry;
- }
-
- /*
- * Since the task isn't running, its safe to add the event, us holding
- * the ctx->lock ensures the task won't get scheduled in.
- */
- add_event_to_ctx(event, ctx);
- raw_spin_unlock_irq(&ctx->lock);
-}
-
-/*
- * Put a event into inactive state and update time fields.
- * Enabling the leader of a group effectively enables all
- * the group members that aren't explicitly disabled, so we
- * have to update their ->tstamp_enabled also.
- * Note: this works for group members as well as group leaders
- * since the non-leader members' sibling_lists will be empty.
- */
-static void __perf_event_mark_enabled(struct perf_event *event,
- struct perf_event_context *ctx)
-{
- struct perf_event *sub;
- u64 tstamp = perf_event_time(event);
-
- event->state = PERF_EVENT_STATE_INACTIVE;
- event->tstamp_enabled = tstamp - event->total_time_enabled;
- list_for_each_entry(sub, &event->sibling_list, group_entry) {
- if (sub->state >= PERF_EVENT_STATE_INACTIVE)
- sub->tstamp_enabled = tstamp - sub->total_time_enabled;
- }
-}
-
-/*
- * Cross CPU call to enable a performance event
- */
-static int __perf_event_enable(void *info)
-{
- struct perf_event *event = info;
- struct perf_event_context *ctx = event->ctx;
- struct perf_event *leader = event->group_leader;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
- int err;
-
- if (WARN_ON_ONCE(!ctx->is_active))
- return -EINVAL;
-
- raw_spin_lock(&ctx->lock);
- update_context_time(ctx);
-
- if (event->state >= PERF_EVENT_STATE_INACTIVE)
- goto unlock;
-
- /*
- * set current task's cgroup time reference point
- */
- perf_cgroup_set_timestamp(current, ctx);
-
- __perf_event_mark_enabled(event, ctx);
-
- if (!event_filter_match(event)) {
- if (is_cgroup_event(event))
- perf_cgroup_defer_enabled(event);
- goto unlock;
- }
-
- /*
- * If the event is in a group and isn't the group leader,
- * then don't put it on unless the group is on.
- */
- if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
- goto unlock;
-
- if (!group_can_go_on(event, cpuctx, 1)) {
- err = -EEXIST;
- } else {
- if (event == leader)
- err = group_sched_in(event, cpuctx, ctx);
- else
- err = event_sched_in(event, cpuctx, ctx);
- }
-
- if (err) {
- /*
- * If this event can't go on and it's part of a
- * group, then the whole group has to come off.
- */
- if (leader != event)
- group_sched_out(leader, cpuctx, ctx);
- if (leader->attr.pinned) {
- update_group_times(leader);
- leader->state = PERF_EVENT_STATE_ERROR;
- }
- }
-
-unlock:
- raw_spin_unlock(&ctx->lock);
-
- return 0;
-}
-
-/*
- * Enable a event.
- *
- * If event->ctx is a cloned context, callers must make sure that
- * every task struct that event->ctx->task could possibly point to
- * remains valid. This condition is satisfied when called through
- * perf_event_for_each_child or perf_event_for_each as described
- * for perf_event_disable.
- */
-void perf_event_enable(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
- struct task_struct *task = ctx->task;
-
- if (!task) {
- /*
- * Enable the event on the cpu that it's on
- */
- cpu_function_call(event->cpu, __perf_event_enable, event);
- return;
- }
-
- raw_spin_lock_irq(&ctx->lock);
- if (event->state >= PERF_EVENT_STATE_INACTIVE)
- goto out;
-
- /*
- * If the event is in error state, clear that first.
- * That way, if we see the event in error state below, we
- * know that it has gone back into error state, as distinct
- * from the task having been scheduled away before the
- * cross-call arrived.
- */
- if (event->state == PERF_EVENT_STATE_ERROR)
- event->state = PERF_EVENT_STATE_OFF;
-
-retry:
- if (!ctx->is_active) {
- __perf_event_mark_enabled(event, ctx);
- goto out;
- }
-
- raw_spin_unlock_irq(&ctx->lock);
-
- if (!task_function_call(task, __perf_event_enable, event))
- return;
-
- raw_spin_lock_irq(&ctx->lock);
-
- /*
- * If the context is active and the event is still off,
- * we need to retry the cross-call.
- */
- if (ctx->is_active && event->state == PERF_EVENT_STATE_OFF) {
- /*
- * task could have been flipped by a concurrent
- * perf_event_context_sched_out()
- */
- task = ctx->task;
- goto retry;
- }
-
-out:
- raw_spin_unlock_irq(&ctx->lock);
-}
-
-static int perf_event_refresh(struct perf_event *event, int refresh)
-{
- /*
- * not supported on inherited events
- */
- if (event->attr.inherit || !is_sampling_event(event))
- return -EINVAL;
-
- atomic_add(refresh, &event->event_limit);
- perf_event_enable(event);
-
- return 0;
-}
-
-static void ctx_sched_out(struct perf_event_context *ctx,
- struct perf_cpu_context *cpuctx,
- enum event_type_t event_type)
-{
- struct perf_event *event;
-
- raw_spin_lock(&ctx->lock);
- perf_pmu_disable(ctx->pmu);
- ctx->is_active = 0;
- if (likely(!ctx->nr_events))
- goto out;
- update_context_time(ctx);
- update_cgrp_time_from_cpuctx(cpuctx);
-
- if (!ctx->nr_active)
- goto out;
-
- if (event_type & EVENT_PINNED) {
- list_for_each_entry(event, &ctx->pinned_groups, group_entry)
- group_sched_out(event, cpuctx, ctx);
- }
-
- if (event_type & EVENT_FLEXIBLE) {
- list_for_each_entry(event, &ctx->flexible_groups, group_entry)
- group_sched_out(event, cpuctx, ctx);
- }
-out:
- perf_pmu_enable(ctx->pmu);
- raw_spin_unlock(&ctx->lock);
-}
-
-/*
- * Test whether two contexts are equivalent, i.e. whether they
- * have both been cloned from the same version of the same context
- * and they both have the same number of enabled events.
- * If the number of enabled events is the same, then the set
- * of enabled events should be the same, because these are both
- * inherited contexts, therefore we can't access individual events
- * in them directly with an fd; we can only enable/disable all
- * events via prctl, or enable/disable all events in a family
- * via ioctl, which will have the same effect on both contexts.
- */
-static int context_equiv(struct perf_event_context *ctx1,
- struct perf_event_context *ctx2)
-{
- return ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx
- && ctx1->parent_gen == ctx2->parent_gen
- && !ctx1->pin_count && !ctx2->pin_count;
-}
-
-static void __perf_event_sync_stat(struct perf_event *event,
- struct perf_event *next_event)
-{
- u64 value;
-
- if (!event->attr.inherit_stat)
- return;
-
- /*
- * Update the event value, we cannot use perf_event_read()
- * because we're in the middle of a context switch and have IRQs
- * disabled, which upsets smp_call_function_single(), however
- * we know the event must be on the current CPU, therefore we
- * don't need to use it.
- */
- switch (event->state) {
- case PERF_EVENT_STATE_ACTIVE:
- event->pmu->read(event);
- /* fall-through */
-
- case PERF_EVENT_STATE_INACTIVE:
- update_event_times(event);
- break;
-
- default:
- break;
- }
-
- /*
- * In order to keep per-task stats reliable we need to flip the event
- * values when we flip the contexts.
- */
- value = local64_read(&next_event->count);
- value = local64_xchg(&event->count, value);
- local64_set(&next_event->count, value);
-
- swap(event->total_time_enabled, next_event->total_time_enabled);
- swap(event->total_time_running, next_event->total_time_running);
-
- /*
- * Since we swizzled the values, update the user visible data too.
- */
- perf_event_update_userpage(event);
- perf_event_update_userpage(next_event);
-}
-
-#define list_next_entry(pos, member) \
- list_entry(pos->member.next, typeof(*pos), member)
-
-static void perf_event_sync_stat(struct perf_event_context *ctx,
- struct perf_event_context *next_ctx)
-{
- struct perf_event *event, *next_event;
-
- if (!ctx->nr_stat)
- return;
-
- update_context_time(ctx);
-
- event = list_first_entry(&ctx->event_list,
- struct perf_event, event_entry);
-
- next_event = list_first_entry(&next_ctx->event_list,
- struct perf_event, event_entry);
-
- while (&event->event_entry != &ctx->event_list &&
- &next_event->event_entry != &next_ctx->event_list) {
-
- __perf_event_sync_stat(event, next_event);
-
- event = list_next_entry(event, event_entry);
- next_event = list_next_entry(next_event, event_entry);
- }
-}
-
-static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
- struct task_struct *next)
-{
- struct perf_event_context *ctx = task->perf_event_ctxp[ctxn];
- struct perf_event_context *next_ctx;
- struct perf_event_context *parent;
- struct perf_cpu_context *cpuctx;
- int do_switch = 1;
-
- if (likely(!ctx))
- return;
-
- cpuctx = __get_cpu_context(ctx);
- if (!cpuctx->task_ctx)
- return;
-
- rcu_read_lock();
- parent = rcu_dereference(ctx->parent_ctx);
- next_ctx = next->perf_event_ctxp[ctxn];
- if (parent && next_ctx &&
- rcu_dereference(next_ctx->parent_ctx) == parent) {
- /*
- * Looks like the two contexts are clones, so we might be
- * able to optimize the context switch. We lock both
- * contexts and check that they are clones under the
- * lock (including re-checking that neither has been
- * uncloned in the meantime). It doesn't matter which
- * order we take the locks because no other cpu could
- * be trying to lock both of these tasks.
- */
- raw_spin_lock(&ctx->lock);
- raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING);
- if (context_equiv(ctx, next_ctx)) {
- /*
- * XXX do we need a memory barrier of sorts
- * wrt to rcu_dereference() of perf_event_ctxp
- */
- task->perf_event_ctxp[ctxn] = next_ctx;
- next->perf_event_ctxp[ctxn] = ctx;
- ctx->task = next;
- next_ctx->task = task;
- do_switch = 0;
-
- perf_event_sync_stat(ctx, next_ctx);
- }
- raw_spin_unlock(&next_ctx->lock);
- raw_spin_unlock(&ctx->lock);
- }
- rcu_read_unlock();
-
- if (do_switch) {
- ctx_sched_out(ctx, cpuctx, EVENT_ALL);
- cpuctx->task_ctx = NULL;
- }
-}
-
-#define for_each_task_context_nr(ctxn) \
- for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
-
-/*
- * Called from scheduler to remove the events of the current task,
- * with interrupts disabled.
- *
- * We stop each event and update the event value in event->count.
- *
- * This does not protect us against NMI, but disable()
- * sets the disabled bit in the control field of event _before_
- * accessing the event control register. If a NMI hits, then it will
- * not restart the event.
- */
-void __perf_event_task_sched_out(struct task_struct *task,
- struct task_struct *next)
-{
- int ctxn;
-
- for_each_task_context_nr(ctxn)
- perf_event_context_sched_out(task, ctxn, next);
-
- /*
- * if cgroup events exist on this CPU, then we need
- * to check if we have to switch out PMU state.
- * cgroup event are system-wide mode only
- */
- if (atomic_read(&__get_cpu_var(perf_cgroup_events)))
- perf_cgroup_sched_out(task);
-}
-
-static void task_ctx_sched_out(struct perf_event_context *ctx,
- enum event_type_t event_type)
-{
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
- if (!cpuctx->task_ctx)
- return;
-
- if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
- return;
-
- ctx_sched_out(ctx, cpuctx, event_type);
- cpuctx->task_ctx = NULL;
-}
-
-/*
- * Called with IRQs disabled
- */
-static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
- enum event_type_t event_type)
-{
- ctx_sched_out(&cpuctx->ctx, cpuctx, event_type);
-}
-
-static void
-ctx_pinned_sched_in(struct perf_event_context *ctx,
- struct perf_cpu_context *cpuctx)
-{
- struct perf_event *event;
-
- list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
- if (event->state <= PERF_EVENT_STATE_OFF)
- continue;
- if (!event_filter_match(event))
- continue;
-
- /* may need to reset tstamp_enabled */
- if (is_cgroup_event(event))
- perf_cgroup_mark_enabled(event, ctx);
-
- if (group_can_go_on(event, cpuctx, 1))
- group_sched_in(event, cpuctx, ctx);
-
- /*
- * If this pinned group hasn't been scheduled,
- * put it in error state.
- */
- if (event->state == PERF_EVENT_STATE_INACTIVE) {
- update_group_times(event);
- event->state = PERF_EVENT_STATE_ERROR;
- }
- }
-}
-
-static void
-ctx_flexible_sched_in(struct perf_event_context *ctx,
- struct perf_cpu_context *cpuctx)
-{
- struct perf_event *event;
- int can_add_hw = 1;
-
- list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
- /* Ignore events in OFF or ERROR state */
- if (event->state <= PERF_EVENT_STATE_OFF)
- continue;
- /*
- * Listen to the 'cpu' scheduling filter constraint
- * of events:
- */
- if (!event_filter_match(event))
- continue;
-
- /* may need to reset tstamp_enabled */
- if (is_cgroup_event(event))
- perf_cgroup_mark_enabled(event, ctx);
-
- if (group_can_go_on(event, cpuctx, can_add_hw)) {
- if (group_sched_in(event, cpuctx, ctx))
- can_add_hw = 0;
- }
- }
-}
-
-static void
-ctx_sched_in(struct perf_event_context *ctx,
- struct perf_cpu_context *cpuctx,
- enum event_type_t event_type,
- struct task_struct *task)
-{
- u64 now;
-
- raw_spin_lock(&ctx->lock);
- ctx->is_active = 1;
- if (likely(!ctx->nr_events))
- goto out;
-
- now = perf_clock();
- ctx->timestamp = now;
- perf_cgroup_set_timestamp(task, ctx);
- /*
- * First go through the list and put on any pinned groups
- * in order to give them the best chance of going on.
- */
- if (event_type & EVENT_PINNED)
- ctx_pinned_sched_in(ctx, cpuctx);
-
- /* Then walk through the lower prio flexible groups */
- if (event_type & EVENT_FLEXIBLE)
- ctx_flexible_sched_in(ctx, cpuctx);
-
-out:
- raw_spin_unlock(&ctx->lock);
-}
-
-static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
- enum event_type_t event_type,
- struct task_struct *task)
-{
- struct perf_event_context *ctx = &cpuctx->ctx;
-
- ctx_sched_in(ctx, cpuctx, event_type, task);
-}
-
-static void task_ctx_sched_in(struct perf_event_context *ctx,
- enum event_type_t event_type)
-{
- struct perf_cpu_context *cpuctx;
-
- cpuctx = __get_cpu_context(ctx);
- if (cpuctx->task_ctx == ctx)
- return;
-
- ctx_sched_in(ctx, cpuctx, event_type, NULL);
- cpuctx->task_ctx = ctx;
-}
-
-static void perf_event_context_sched_in(struct perf_event_context *ctx,
- struct task_struct *task)
-{
- struct perf_cpu_context *cpuctx;
-
- cpuctx = __get_cpu_context(ctx);
- if (cpuctx->task_ctx == ctx)
- return;
-
- perf_pmu_disable(ctx->pmu);
- /*
- * We want to keep the following priority order:
- * cpu pinned (that don't need to move), task pinned,
- * cpu flexible, task flexible.
- */
- cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
-
- ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task);
- cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
- ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
-
- cpuctx->task_ctx = ctx;
-
- /*
- * Since these rotations are per-cpu, we need to ensure the
- * cpu-context we got scheduled on is actually rotating.
- */
- perf_pmu_rotate_start(ctx->pmu);
- perf_pmu_enable(ctx->pmu);
-}
-
-/*
- * Called from scheduler to add the events of the current task
- * with interrupts disabled.
- *
- * We restore the event value and then enable it.
- *
- * This does not protect us against NMI, but enable()
- * sets the enabled bit in the control field of event _before_
- * accessing the event control register. If a NMI hits, then it will
- * keep the event running.
- */
-void __perf_event_task_sched_in(struct task_struct *task)
-{
- struct perf_event_context *ctx;
- int ctxn;
-
- for_each_task_context_nr(ctxn) {
- ctx = task->perf_event_ctxp[ctxn];
- if (likely(!ctx))
- continue;
-
- perf_event_context_sched_in(ctx, task);
- }
- /*
- * if cgroup events exist on this CPU, then we need
- * to check if we have to switch in PMU state.
- * cgroup event are system-wide mode only
- */
- if (atomic_read(&__get_cpu_var(perf_cgroup_events)))
- perf_cgroup_sched_in(task);
-}
-
-static u64 perf_calculate_period(struct perf_event *event, u64 nsec, u64 count)
-{
- u64 frequency = event->attr.sample_freq;
- u64 sec = NSEC_PER_SEC;
- u64 divisor, dividend;
-
- int count_fls, nsec_fls, frequency_fls, sec_fls;
-
- count_fls = fls64(count);
- nsec_fls = fls64(nsec);
- frequency_fls = fls64(frequency);
- sec_fls = 30;
-
- /*
- * We got @count in @nsec, with a target of sample_freq HZ
- * the target period becomes:
- *
- * @count * 10^9
- * period = -------------------
- * @nsec * sample_freq
- *
- */
-
- /*
- * Reduce accuracy by one bit such that @a and @b converge
- * to a similar magnitude.
- */
-#define REDUCE_FLS(a, b) \
-do { \
- if (a##_fls > b##_fls) { \
- a >>= 1; \
- a##_fls--; \
- } else { \
- b >>= 1; \
- b##_fls--; \
- } \
-} while (0)
-
- /*
- * Reduce accuracy until either term fits in a u64, then proceed with
- * the other, so that finally we can do a u64/u64 division.
- */
- while (count_fls + sec_fls > 64 && nsec_fls + frequency_fls > 64) {
- REDUCE_FLS(nsec, frequency);
- REDUCE_FLS(sec, count);
- }
-
- if (count_fls + sec_fls > 64) {
- divisor = nsec * frequency;
-
- while (count_fls + sec_fls > 64) {
- REDUCE_FLS(count, sec);
- divisor >>= 1;
- }
-
- dividend = count * sec;
- } else {
- dividend = count * sec;
-
- while (nsec_fls + frequency_fls > 64) {
- REDUCE_FLS(nsec, frequency);
- dividend >>= 1;
- }
-
- divisor = nsec * frequency;
- }
-
- if (!divisor)
- return dividend;
-
- return div64_u64(dividend, divisor);
-}
-
-static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
-{
- struct hw_perf_event *hwc = &event->hw;
- s64 period, sample_period;
- s64 delta;
-
- period = perf_calculate_period(event, nsec, count);
-
- delta = (s64)(period - hwc->sample_period);
- delta = (delta + 7) / 8; /* low pass filter */
-
- sample_period = hwc->sample_period + delta;
-
- if (!sample_period)
- sample_period = 1;
-
- hwc->sample_period = sample_period;
-
- if (local64_read(&hwc->period_left) > 8*sample_period) {
- event->pmu->stop(event, PERF_EF_UPDATE);
- local64_set(&hwc->period_left, 0);
- event->pmu->start(event, PERF_EF_RELOAD);
- }
-}
-
-static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
-{
- struct perf_event *event;
- struct hw_perf_event *hwc;
- u64 interrupts, now;
- s64 delta;
-
- raw_spin_lock(&ctx->lock);
- list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
- if (event->state != PERF_EVENT_STATE_ACTIVE)
- continue;
-
- if (!event_filter_match(event))
- continue;
-
- hwc = &event->hw;
-
- interrupts = hwc->interrupts;
- hwc->interrupts = 0;
-
- /*
- * unthrottle events on the tick
- */
- if (interrupts == MAX_INTERRUPTS) {
- perf_log_throttle(event, 1);
- event->pmu->start(event, 0);
- }
-
- if (!event->attr.freq || !event->attr.sample_freq)
- continue;
-
- event->pmu->read(event);
- now = local64_read(&event->count);
- delta = now - hwc->freq_count_stamp;
- hwc->freq_count_stamp = now;
-
- if (delta > 0)
- perf_adjust_period(event, period, delta);
- }
- raw_spin_unlock(&ctx->lock);
-}
-
-/*
- * Round-robin a context's events:
- */
-static void rotate_ctx(struct perf_event_context *ctx)
-{
- raw_spin_lock(&ctx->lock);
-
- /*
- * Rotate the first entry last of non-pinned groups. Rotation might be
- * disabled by the inheritance code.
- */
- if (!ctx->rotate_disable)
- list_rotate_left(&ctx->flexible_groups);
-
- raw_spin_unlock(&ctx->lock);
-}
-
-/*
- * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
- * because they're strictly cpu affine and rotate_start is called with IRQs
- * disabled, while rotate_context is called from IRQ context.
- */
-static void perf_rotate_context(struct perf_cpu_context *cpuctx)
-{
- u64 interval = (u64)cpuctx->jiffies_interval * TICK_NSEC;
- struct perf_event_context *ctx = NULL;
- int rotate = 0, remove = 1;
-
- if (cpuctx->ctx.nr_events) {
- remove = 0;
- if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
- rotate = 1;
- }
-
- ctx = cpuctx->task_ctx;
- if (ctx && ctx->nr_events) {
- remove = 0;
- if (ctx->nr_events != ctx->nr_active)
- rotate = 1;
- }
-
- perf_pmu_disable(cpuctx->ctx.pmu);
- perf_ctx_adjust_freq(&cpuctx->ctx, interval);
- if (ctx)
- perf_ctx_adjust_freq(ctx, interval);
-
- if (!rotate)
- goto done;
-
- cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
- if (ctx)
- task_ctx_sched_out(ctx, EVENT_FLEXIBLE);
-
- rotate_ctx(&cpuctx->ctx);
- if (ctx)
- rotate_ctx(ctx);
-
- cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, current);
- if (ctx)
- task_ctx_sched_in(ctx, EVENT_FLEXIBLE);
-
-done:
- if (remove)
- list_del_init(&cpuctx->rotation_list);
-
- perf_pmu_enable(cpuctx->ctx.pmu);
-}
-
-void perf_event_task_tick(void)
-{
- struct list_head *head = &__get_cpu_var(rotation_list);
- struct perf_cpu_context *cpuctx, *tmp;
-
- WARN_ON(!irqs_disabled());
-
- list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
- if (cpuctx->jiffies_interval == 1 ||
- !(jiffies % cpuctx->jiffies_interval))
- perf_rotate_context(cpuctx);
- }
-}
-
-static int event_enable_on_exec(struct perf_event *event,
- struct perf_event_context *ctx)
-{
- if (!event->attr.enable_on_exec)
- return 0;
-
- event->attr.enable_on_exec = 0;
- if (event->state >= PERF_EVENT_STATE_INACTIVE)
- return 0;
-
- __perf_event_mark_enabled(event, ctx);
-
- return 1;
-}
-
-/*
- * Enable all of a task's events that have been marked enable-on-exec.
- * This expects task == current.
- */
-static void perf_event_enable_on_exec(struct perf_event_context *ctx)
-{
- struct perf_event *event;
- unsigned long flags;
- int enabled = 0;
- int ret;
-
- local_irq_save(flags);
- if (!ctx || !ctx->nr_events)
- goto out;
-
- /*
- * We must ctxsw out cgroup events to avoid conflict
- * when invoking perf_task_event_sched_in() later on
- * in this function. Otherwise we end up trying to
- * ctxswin cgroup events which are already scheduled
- * in.
- */
- perf_cgroup_sched_out(current);
- task_ctx_sched_out(ctx, EVENT_ALL);
-
- raw_spin_lock(&ctx->lock);
-
- list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
- ret = event_enable_on_exec(event, ctx);
- if (ret)
- enabled = 1;
- }
-
- list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
- ret = event_enable_on_exec(event, ctx);
- if (ret)
- enabled = 1;
- }
-
- /*
- * Unclone this context if we enabled any event.
- */
- if (enabled)
- unclone_ctx(ctx);
-
- raw_spin_unlock(&ctx->lock);
-
- /*
- * Also calls ctxswin for cgroup events, if any:
- */
- perf_event_context_sched_in(ctx, ctx->task);
-out:
- local_irq_restore(flags);
-}
-
-/*
- * Cross CPU call to read the hardware event
- */
-static void __perf_event_read(void *info)
-{
- struct perf_event *event = info;
- struct perf_event_context *ctx = event->ctx;
- struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
- /*
- * If this is a task context, we need to check whether it is
- * the current task context of this cpu. If not it has been
- * scheduled out before the smp call arrived. In that case
- * event->count would have been updated to a recent sample
- * when the event was scheduled out.
- */
- if (ctx->task && cpuctx->task_ctx != ctx)
- return;
-
- raw_spin_lock(&ctx->lock);
- if (ctx->is_active) {
- update_context_time(ctx);
- update_cgrp_time_from_event(event);
- }
- update_event_times(event);
- if (event->state == PERF_EVENT_STATE_ACTIVE)
- event->pmu->read(event);
- raw_spin_unlock(&ctx->lock);
-}
-
-static inline u64 perf_event_count(struct perf_event *event)
-{
- return local64_read(&event->count) + atomic64_read(&event->child_count);
-}
-
-static u64 perf_event_read(struct perf_event *event)
-{
- /*
- * If event is enabled and currently active on a CPU, update the
- * value in the event structure:
- */
- if (event->state == PERF_EVENT_STATE_ACTIVE) {
- smp_call_function_single(event->oncpu,
- __perf_event_read, event, 1);
- } else if (event->state == PERF_EVENT_STATE_INACTIVE) {
- struct perf_event_context *ctx = event->ctx;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&ctx->lock, flags);
- /*
- * may read while context is not active
- * (e.g., thread is blocked), in that case
- * we cannot update context time
- */
- if (ctx->is_active) {
- update_context_time(ctx);
- update_cgrp_time_from_event(event);
- }
- update_event_times(event);
- raw_spin_unlock_irqrestore(&ctx->lock, flags);
- }
-
- return perf_event_count(event);
-}
-
-/*
- * Callchain support
- */
-
-struct callchain_cpus_entries {
- struct rcu_head rcu_head;
- struct perf_callchain_entry *cpu_entries[0];
-};
-
-static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]);
-static atomic_t nr_callchain_events;
-static DEFINE_MUTEX(callchain_mutex);
-struct callchain_cpus_entries *callchain_cpus_entries;
-
-
-__weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
- struct pt_regs *regs)
-{
-}
-
-__weak void perf_callchain_user(struct perf_callchain_entry *entry,
- struct pt_regs *regs)
-{
-}
-
-static void release_callchain_buffers_rcu(struct rcu_head *head)
-{
- struct callchain_cpus_entries *entries;
- int cpu;
-
- entries = container_of(head, struct callchain_cpus_entries, rcu_head);
-
- for_each_possible_cpu(cpu)
- kfree(entries->cpu_entries[cpu]);
-
- kfree(entries);
-}
-
-static void release_callchain_buffers(void)
-{
- struct callchain_cpus_entries *entries;
-
- entries = callchain_cpus_entries;
- rcu_assign_pointer(callchain_cpus_entries, NULL);
- call_rcu(&entries->rcu_head, release_callchain_buffers_rcu);
-}
-
-static int alloc_callchain_buffers(void)
-{
- int cpu;
- int size;
- struct callchain_cpus_entries *entries;
-
- /*
- * We can't use the percpu allocation API for data that can be
- * accessed from NMI. Use a temporary manual per cpu allocation
- * until that gets sorted out.
- */
- size = offsetof(struct callchain_cpus_entries, cpu_entries[nr_cpu_ids]);
-
- entries = kzalloc(size, GFP_KERNEL);
- if (!entries)
- return -ENOMEM;
-
- size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
-
- for_each_possible_cpu(cpu) {
- entries->cpu_entries[cpu] = kmalloc_node(size, GFP_KERNEL,
- cpu_to_node(cpu));
- if (!entries->cpu_entries[cpu])
- goto fail;
- }
-
- rcu_assign_pointer(callchain_cpus_entries, entries);
-
- return 0;
-
-fail:
- for_each_possible_cpu(cpu)
- kfree(entries->cpu_entries[cpu]);
- kfree(entries);
-
- return -ENOMEM;
-}
-
-static int get_callchain_buffers(void)
-{
- int err = 0;
- int count;
-
- mutex_lock(&callchain_mutex);
-
- count = atomic_inc_return(&nr_callchain_events);
- if (WARN_ON_ONCE(count < 1)) {
- err = -EINVAL;
- goto exit;
- }
-
- if (count > 1) {
- /* If the allocation failed, give up */
- if (!callchain_cpus_entries)
- err = -ENOMEM;
- goto exit;
- }
-
- err = alloc_callchain_buffers();
- if (err)
- release_callchain_buffers();
-exit:
- mutex_unlock(&callchain_mutex);
-
- return err;
-}
-
-static void put_callchain_buffers(void)
-{
- if (atomic_dec_and_mutex_lock(&nr_callchain_events, &callchain_mutex)) {
- release_callchain_buffers();
- mutex_unlock(&callchain_mutex);
- }
-}
-
-static int get_recursion_context(int *recursion)
-{
- int rctx;
-
- if (in_nmi())
- rctx = 3;
- else if (in_irq())
- rctx = 2;
- else if (in_softirq())
- rctx = 1;
- else
- rctx = 0;
-
- if (recursion[rctx])
- return -1;
-
- recursion[rctx]++;
- barrier();
-
- return rctx;
-}
-
-static inline void put_recursion_context(int *recursion, int rctx)
-{
- barrier();
- recursion[rctx]--;
-}
-
-static struct perf_callchain_entry *get_callchain_entry(int *rctx)
-{
- int cpu;
- struct callchain_cpus_entries *entries;
-
- *rctx = get_recursion_context(__get_cpu_var(callchain_recursion));
- if (*rctx == -1)
- return NULL;
-
- entries = rcu_dereference(callchain_cpus_entries);
- if (!entries)
- return NULL;
-
- cpu = smp_processor_id();
-
- return &entries->cpu_entries[cpu][*rctx];
-}
-
-static void
-put_callchain_entry(int rctx)
-{
- put_recursion_context(__get_cpu_var(callchain_recursion), rctx);
-}
-
-static struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
-{
- int rctx;
- struct perf_callchain_entry *entry;
-
-
- entry = get_callchain_entry(&rctx);
- if (rctx == -1)
- return NULL;
-
- if (!entry)
- goto exit_put;
-
- entry->nr = 0;
-
- if (!user_mode(regs)) {
- perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
- perf_callchain_kernel(entry, regs);
- if (current->mm)
- regs = task_pt_regs(current);
- else
- regs = NULL;
- }
-
- if (regs) {
- perf_callchain_store(entry, PERF_CONTEXT_USER);
- perf_callchain_user(entry, regs);
- }
-
-exit_put:
- put_callchain_entry(rctx);
-
- return entry;
-}
-
-/*
- * Initialize the perf_event context in a task_struct:
- */
-static void __perf_event_init_context(struct perf_event_context *ctx)
-{
- raw_spin_lock_init(&ctx->lock);
- mutex_init(&ctx->mutex);
- INIT_LIST_HEAD(&ctx->pinned_groups);
- INIT_LIST_HEAD(&ctx->flexible_groups);
- INIT_LIST_HEAD(&ctx->event_list);
- atomic_set(&ctx->refcount, 1);
-}
-
-static struct perf_event_context *
-alloc_perf_context(struct pmu *pmu, struct task_struct *task)
-{
- struct perf_event_context *ctx;
-
- ctx = kzalloc(sizeof(struct perf_event_context), GFP_KERNEL);
- if (!ctx)
- return NULL;
-
- __perf_event_init_context(ctx);
- if (task) {
- ctx->task = task;
- get_task_struct(task);
- }
- ctx->pmu = pmu;
-
- return ctx;
-}
-
-static struct task_struct *
-find_lively_task_by_vpid(pid_t vpid)
-{
- struct task_struct *task;
- int err;
-
- rcu_read_lock();
- if (!vpid)
- task = current;
- else
- task = find_task_by_vpid(vpid);
- if (task)
- get_task_struct(task);
- rcu_read_unlock();
-
- if (!task)
- return ERR_PTR(-ESRCH);
-
- /* Reuse ptrace permission checks for now. */
- err = -EACCES;
- if (!ptrace_may_access(task, PTRACE_MODE_READ))
- goto errout;
-
- return task;
-errout:
- put_task_struct(task);
- return ERR_PTR(err);
-
-}
-
-/*
- * Returns a matching context with refcount and pincount.
- */
-static struct perf_event_context *
-find_get_context(struct pmu *pmu, struct task_struct *task, int cpu)
-{
- struct perf_event_context *ctx;
- struct perf_cpu_context *cpuctx;
- unsigned long flags;
- int ctxn, err;
-
- if (!task) {
- /* Must be root to operate on a CPU event: */
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
- return ERR_PTR(-EACCES);
-
- /*
- * We could be clever and allow to attach a event to an
- * offline CPU and activate it when the CPU comes up, but
- * that's for later.
- */
- if (!cpu_online(cpu))
- return ERR_PTR(-ENODEV);
-
- cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
- ctx = &cpuctx->ctx;
- get_ctx(ctx);
- ++ctx->pin_count;
-
- return ctx;
- }
-
- err = -EINVAL;
- ctxn = pmu->task_ctx_nr;
- if (ctxn < 0)
- goto errout;
-
-retry:
- ctx = perf_lock_task_context(task, ctxn, &flags);
- if (ctx) {
- unclone_ctx(ctx);
- ++ctx->pin_count;
- raw_spin_unlock_irqrestore(&ctx->lock, flags);
- }
-
- if (!ctx) {
- ctx = alloc_perf_context(pmu, task);
- err = -ENOMEM;
- if (!ctx)
- goto errout;
-
- get_ctx(ctx);
-
- err = 0;
- mutex_lock(&task->perf_event_mutex);
- /*
- * If it has already passed perf_event_exit_task().
- * we must see PF_EXITING, it takes this mutex too.
- */
- if (task->flags & PF_EXITING)
- err = -ESRCH;
- else if (task->perf_event_ctxp[ctxn])
- err = -EAGAIN;
- else {
- ++ctx->pin_count;
- rcu_assign_pointer(task->perf_event_ctxp[ctxn], ctx);
- }
- mutex_unlock(&task->perf_event_mutex);
-
- if (unlikely(err)) {
- put_task_struct(task);
- kfree(ctx);
-
- if (err == -EAGAIN)
- goto retry;
- goto errout;
- }
- }
-
- return ctx;
-
-errout:
- return ERR_PTR(err);
-}
-
-static void perf_event_free_filter(struct perf_event *event);
-
-static void free_event_rcu(struct rcu_head *head)
-{
- struct perf_event *event;
-
- event = container_of(head, struct perf_event, rcu_head);
- if (event->ns)
- put_pid_ns(event->ns);
- perf_event_free_filter(event);
- kfree(event);
-}
-
-static void perf_buffer_put(struct perf_buffer *buffer);
-
-static void free_event(struct perf_event *event)
-{
- irq_work_sync(&event->pending);
-
- if (!event->parent) {
- if (event->attach_state & PERF_ATTACH_TASK)
- jump_label_dec(&perf_sched_events);
- if (event->attr.mmap || event->attr.mmap_data)
- atomic_dec(&nr_mmap_events);
- if (event->attr.comm)
- atomic_dec(&nr_comm_events);
- if (event->attr.task)
- atomic_dec(&nr_task_events);
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- if (is_cgroup_event(event)) {
- atomic_dec(&per_cpu(perf_cgroup_events, event->cpu));
- jump_label_dec(&perf_sched_events);
- }
- }
-
- if (event->buffer) {
- perf_buffer_put(event->buffer);
- event->buffer = NULL;
- }
-
- if (is_cgroup_event(event))
- perf_detach_cgroup(event);
-
- if (event->destroy)
- event->destroy(event);
-
- if (event->ctx)
- put_ctx(event->ctx);
-
- call_rcu(&event->rcu_head, free_event_rcu);
-}
-
-int perf_event_release_kernel(struct perf_event *event)
-{
- struct perf_event_context *ctx = event->ctx;
-
- /*
- * Remove from the PMU, can't get re-enabled since we got
- * here because the last ref went.
- */
- perf_event_disable(event);
-
- WARN_ON_ONCE(ctx->parent_ctx);
- /*
- * There are two ways this annotation is useful:
- *
- * 1) there is a lock recursion from perf_event_exit_task
- * see the comment there.
- *
- * 2) there is a lock-inversion with mmap_sem through
- * perf_event_read_group(), which takes faults while
- * holding ctx->mutex, however this is called after
- * the last filedesc died, so there is no possibility
- * to trigger the AB-BA case.
- */
- mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
- raw_spin_lock_irq(&ctx->lock);
- perf_group_detach(event);
- list_del_event(event, ctx);
- raw_spin_unlock_irq(&ctx->lock);
- mutex_unlock(&ctx->mutex);
-
- free_event(event);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(perf_event_release_kernel);
-
-/*
- * Called when the last reference to the file is gone.
- */
-static int perf_release(struct inode *inode, struct file *file)
-{
- struct perf_event *event = file->private_data;
- struct task_struct *owner;
-
- file->private_data = NULL;
-
- rcu_read_lock();
- owner = ACCESS_ONCE(event->owner);
- /*
- * Matches the smp_wmb() in perf_event_exit_task(). If we observe
- * !owner it means the list deletion is complete and we can indeed
- * free this event, otherwise we need to serialize on
- * owner->perf_event_mutex.
- */
- smp_read_barrier_depends();
- if (owner) {
- /*
- * Since delayed_put_task_struct() also drops the last
- * task reference we can safely take a new reference
- * while holding the rcu_read_lock().
- */
- get_task_struct(owner);
- }
- rcu_read_unlock();
-
- if (owner) {
- mutex_lock(&owner->perf_event_mutex);
- /*
- * We have to re-check the event->owner field, if it is cleared
- * we raced with perf_event_exit_task(), acquiring the mutex
- * ensured they're done, and we can proceed with freeing the
- * event.
- */
- if (event->owner)
- list_del_init(&event->owner_entry);
- mutex_unlock(&owner->perf_event_mutex);
- put_task_struct(owner);
- }
-
- return perf_event_release_kernel(event);
-}
-
-u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
-{
- struct perf_event *child;
- u64 total = 0;
-
- *enabled = 0;
- *running = 0;
-
- mutex_lock(&event->child_mutex);
- total += perf_event_read(event);
- *enabled += event->total_time_enabled +
- atomic64_read(&event->child_total_time_enabled);
- *running += event->total_time_running +
- atomic64_read(&event->child_total_time_running);
-
- list_for_each_entry(child, &event->child_list, child_list) {
- total += perf_event_read(child);
- *enabled += child->total_time_enabled;
- *running += child->total_time_running;
- }
- mutex_unlock(&event->child_mutex);
-
- return total;
-}
-EXPORT_SYMBOL_GPL(perf_event_read_value);
-
-static int perf_event_read_group(struct perf_event *event,
- u64 read_format, char __user *buf)
-{
- struct perf_event *leader = event->group_leader, *sub;
- int n = 0, size = 0, ret = -EFAULT;
- struct perf_event_context *ctx = leader->ctx;
- u64 values[5];
- u64 count, enabled, running;
-
- mutex_lock(&ctx->mutex);
- count = perf_event_read_value(leader, &enabled, &running);
-
- values[n++] = 1 + leader->nr_siblings;
- if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
- values[n++] = enabled;
- if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
- values[n++] = running;
- values[n++] = count;
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(leader);
-
- size = n * sizeof(u64);
-
- if (copy_to_user(buf, values, size))
- goto unlock;
-
- ret = size;
-
- list_for_each_entry(sub, &leader->sibling_list, group_entry) {
- n = 0;
-
- values[n++] = perf_event_read_value(sub, &enabled, &running);
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(sub);
-
- size = n * sizeof(u64);
-
- if (copy_to_user(buf + ret, values, size)) {
- ret = -EFAULT;
- goto unlock;
- }
-
- ret += size;
- }
-unlock:
- mutex_unlock(&ctx->mutex);
-
- return ret;
-}
-
-static int perf_event_read_one(struct perf_event *event,
- u64 read_format, char __user *buf)
-{
- u64 enabled, running;
- u64 values[4];
- int n = 0;
-
- values[n++] = perf_event_read_value(event, &enabled, &running);
- if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
- values[n++] = enabled;
- if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
- values[n++] = running;
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(event);
-
- if (copy_to_user(buf, values, n * sizeof(u64)))
- return -EFAULT;
-
- return n * sizeof(u64);
-}
-
-/*
- * Read the performance event - simple non blocking version for now
- */
-static ssize_t
-perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
-{
- u64 read_format = event->attr.read_format;
- int ret;
-
- /*
- * Return end-of-file for a read on a event that is in
- * error state (i.e. because it was pinned but it couldn't be
- * scheduled on to the CPU at some point).
- */
- if (event->state == PERF_EVENT_STATE_ERROR)
- return 0;
-
- if (count < event->read_size)
- return -ENOSPC;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
- if (read_format & PERF_FORMAT_GROUP)
- ret = perf_event_read_group(event, read_format, buf);
- else
- ret = perf_event_read_one(event, read_format, buf);
-
- return ret;
-}
-
-static ssize_t
-perf_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
-{
- struct perf_event *event = file->private_data;
-
- return perf_read_hw(event, buf, count);
-}
-
-static unsigned int perf_poll(struct file *file, poll_table *wait)
-{
- struct perf_event *event = file->private_data;
- struct perf_buffer *buffer;
- unsigned int events = POLL_HUP;
-
- rcu_read_lock();
- buffer = rcu_dereference(event->buffer);
- if (buffer)
- events = atomic_xchg(&buffer->poll, 0);
- rcu_read_unlock();
-
- poll_wait(file, &event->waitq, wait);
-
- return events;
-}
-
-static void perf_event_reset(struct perf_event *event)
-{
- (void)perf_event_read(event);
- local64_set(&event->count, 0);
- perf_event_update_userpage(event);
-}
-
-/*
- * Holding the top-level event's child_mutex means that any
- * descendant process that has inherited this event will block
- * in sync_child_event if it goes to exit, thus satisfying the
- * task existence requirements of perf_event_enable/disable.
- */
-static void perf_event_for_each_child(struct perf_event *event,
- void (*func)(struct perf_event *))
-{
- struct perf_event *child;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
- mutex_lock(&event->child_mutex);
- func(event);
- list_for_each_entry(child, &event->child_list, child_list)
- func(child);
- mutex_unlock(&event->child_mutex);
-}
-
-static void perf_event_for_each(struct perf_event *event,
- void (*func)(struct perf_event *))
-{
- struct perf_event_context *ctx = event->ctx;
- struct perf_event *sibling;
-
- WARN_ON_ONCE(ctx->parent_ctx);
- mutex_lock(&ctx->mutex);
- event = event->group_leader;
-
- perf_event_for_each_child(event, func);
- func(event);
- list_for_each_entry(sibling, &event->sibling_list, group_entry)
- perf_event_for_each_child(event, func);
- mutex_unlock(&ctx->mutex);
-}
-
-static int perf_event_period(struct perf_event *event, u64 __user *arg)
-{
- struct perf_event_context *ctx = event->ctx;
- int ret = 0;
- u64 value;
-
- if (!is_sampling_event(event))
- return -EINVAL;
-
- if (copy_from_user(&value, arg, sizeof(value)))
- return -EFAULT;
-
- if (!value)
- return -EINVAL;
-
- raw_spin_lock_irq(&ctx->lock);
- if (event->attr.freq) {
- if (value > sysctl_perf_event_sample_rate) {
- ret = -EINVAL;
- goto unlock;
- }
-
- event->attr.sample_freq = value;
- } else {
- event->attr.sample_period = value;
- event->hw.sample_period = value;
- }
-unlock:
- raw_spin_unlock_irq(&ctx->lock);
-
- return ret;
-}
-
-static const struct file_operations perf_fops;
-
-static struct perf_event *perf_fget_light(int fd, int *fput_needed)
-{
- struct file *file;
-
- file = fget_light(fd, fput_needed);
- if (!file)
- return ERR_PTR(-EBADF);
-
- if (file->f_op != &perf_fops) {
- fput_light(file, *fput_needed);
- *fput_needed = 0;
- return ERR_PTR(-EBADF);
- }
-
- return file->private_data;
-}
-
-static int perf_event_set_output(struct perf_event *event,
- struct perf_event *output_event);
-static int perf_event_set_filter(struct perf_event *event, void __user *arg);
-
-static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- struct perf_event *event = file->private_data;
- void (*func)(struct perf_event *);
- u32 flags = arg;
-
- switch (cmd) {
- case PERF_EVENT_IOC_ENABLE:
- func = perf_event_enable;
- break;
- case PERF_EVENT_IOC_DISABLE:
- func = perf_event_disable;
- break;
- case PERF_EVENT_IOC_RESET:
- func = perf_event_reset;
- break;
-
- case PERF_EVENT_IOC_REFRESH:
- return perf_event_refresh(event, arg);
-
- case PERF_EVENT_IOC_PERIOD:
- return perf_event_period(event, (u64 __user *)arg);
-
- case PERF_EVENT_IOC_SET_OUTPUT:
- {
- struct perf_event *output_event = NULL;
- int fput_needed = 0;
- int ret;
-
- if (arg != -1) {
- output_event = perf_fget_light(arg, &fput_needed);
- if (IS_ERR(output_event))
- return PTR_ERR(output_event);
- }
-
- ret = perf_event_set_output(event, output_event);
- if (output_event)
- fput_light(output_event->filp, fput_needed);
-
- return ret;
- }
-
- case PERF_EVENT_IOC_SET_FILTER:
- return perf_event_set_filter(event, (void __user *)arg);
-
- default:
- return -ENOTTY;
- }
-
- if (flags & PERF_IOC_FLAG_GROUP)
- perf_event_for_each(event, func);
- else
- perf_event_for_each_child(event, func);
-
- return 0;
-}
-
-int perf_event_task_enable(void)
-{
- struct perf_event *event;
-
- mutex_lock(¤t->perf_event_mutex);
- list_for_each_entry(event, ¤t->perf_event_list, owner_entry)
- perf_event_for_each_child(event, perf_event_enable);
- mutex_unlock(¤t->perf_event_mutex);
-
- return 0;
-}
-
-int perf_event_task_disable(void)
-{
- struct perf_event *event;
-
- mutex_lock(¤t->perf_event_mutex);
- list_for_each_entry(event, ¤t->perf_event_list, owner_entry)
- perf_event_for_each_child(event, perf_event_disable);
- mutex_unlock(¤t->perf_event_mutex);
-
- return 0;
-}
-
-#ifndef PERF_EVENT_INDEX_OFFSET
-# define PERF_EVENT_INDEX_OFFSET 0
-#endif
-
-static int perf_event_index(struct perf_event *event)
-{
- if (event->hw.state & PERF_HES_STOPPED)
- return 0;
-
- if (event->state != PERF_EVENT_STATE_ACTIVE)
- return 0;
-
- return event->hw.idx + 1 - PERF_EVENT_INDEX_OFFSET;
-}
-
-/*
- * Callers need to ensure there can be no nesting of this function, otherwise
- * the seqlock logic goes bad. We can not serialize this because the arch
- * code calls this from NMI context.
- */
-void perf_event_update_userpage(struct perf_event *event)
-{
- struct perf_event_mmap_page *userpg;
- struct perf_buffer *buffer;
-
- rcu_read_lock();
- buffer = rcu_dereference(event->buffer);
- if (!buffer)
- goto unlock;
-
- userpg = buffer->user_page;
-
- /*
- * Disable preemption so as to not let the corresponding user-space
- * spin too long if we get preempted.
- */
- preempt_disable();
- ++userpg->lock;
- barrier();
- userpg->index = perf_event_index(event);
- userpg->offset = perf_event_count(event);
- if (event->state == PERF_EVENT_STATE_ACTIVE)
- userpg->offset -= local64_read(&event->hw.prev_count);
-
- userpg->time_enabled = event->total_time_enabled +
- atomic64_read(&event->child_total_time_enabled);
-
- userpg->time_running = event->total_time_running +
- atomic64_read(&event->child_total_time_running);
-
- barrier();
- ++userpg->lock;
- preempt_enable();
-unlock:
- rcu_read_unlock();
-}
-
-static unsigned long perf_data_size(struct perf_buffer *buffer);
-
-static void
-perf_buffer_init(struct perf_buffer *buffer, long watermark, int flags)
-{
- long max_size = perf_data_size(buffer);
-
- if (watermark)
- buffer->watermark = min(max_size, watermark);
-
- if (!buffer->watermark)
- buffer->watermark = max_size / 2;
-
- if (flags & PERF_BUFFER_WRITABLE)
- buffer->writable = 1;
-
- atomic_set(&buffer->refcount, 1);
-}
-
-#ifndef CONFIG_PERF_USE_VMALLOC
-
-/*
- * Back perf_mmap() with regular GFP_KERNEL-0 pages.
- */
-
-static struct page *
-perf_mmap_to_page(struct perf_buffer *buffer, unsigned long pgoff)
-{
- if (pgoff > buffer->nr_pages)
- return NULL;
-
- if (pgoff == 0)
- return virt_to_page(buffer->user_page);
-
- return virt_to_page(buffer->data_pages[pgoff - 1]);
-}
-
-static void *perf_mmap_alloc_page(int cpu)
-{
- struct page *page;
- int node;
-
- node = (cpu == -1) ? cpu : cpu_to_node(cpu);
- page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
- if (!page)
- return NULL;
-
- return page_address(page);
-}
-
-static struct perf_buffer *
-perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags)
-{
- struct perf_buffer *buffer;
- unsigned long size;
- int i;
-
- size = sizeof(struct perf_buffer);
- size += nr_pages * sizeof(void *);
-
- buffer = kzalloc(size, GFP_KERNEL);
- if (!buffer)
- goto fail;
-
- buffer->user_page = perf_mmap_alloc_page(cpu);
- if (!buffer->user_page)
- goto fail_user_page;
-
- for (i = 0; i < nr_pages; i++) {
- buffer->data_pages[i] = perf_mmap_alloc_page(cpu);
- if (!buffer->data_pages[i])
- goto fail_data_pages;
- }
-
- buffer->nr_pages = nr_pages;
-
- perf_buffer_init(buffer, watermark, flags);
-
- return buffer;
-
-fail_data_pages:
- for (i--; i >= 0; i--)
- free_page((unsigned long)buffer->data_pages[i]);
-
- free_page((unsigned long)buffer->user_page);
-
-fail_user_page:
- kfree(buffer);
-
-fail:
- return NULL;
-}
-
-static void perf_mmap_free_page(unsigned long addr)
-{
- struct page *page = virt_to_page((void *)addr);
-
- page->mapping = NULL;
- __free_page(page);
-}
-
-static void perf_buffer_free(struct perf_buffer *buffer)
-{
- int i;
-
- perf_mmap_free_page((unsigned long)buffer->user_page);
- for (i = 0; i < buffer->nr_pages; i++)
- perf_mmap_free_page((unsigned long)buffer->data_pages[i]);
- kfree(buffer);
-}
-
-static inline int page_order(struct perf_buffer *buffer)
-{
- return 0;
-}
-
-#else
-
-/*
- * Back perf_mmap() with vmalloc memory.
- *
- * Required for architectures that have d-cache aliasing issues.
- */
-
-static inline int page_order(struct perf_buffer *buffer)
-{
- return buffer->page_order;
-}
-
-static struct page *
-perf_mmap_to_page(struct perf_buffer *buffer, unsigned long pgoff)
-{
- if (pgoff > (1UL << page_order(buffer)))
- return NULL;
-
- return vmalloc_to_page((void *)buffer->user_page + pgoff * PAGE_SIZE);
-}
-
-static void perf_mmap_unmark_page(void *addr)
-{
- struct page *page = vmalloc_to_page(addr);
-
- page->mapping = NULL;
-}
-
-static void perf_buffer_free_work(struct work_struct *work)
-{
- struct perf_buffer *buffer;
- void *base;
- int i, nr;
-
- buffer = container_of(work, struct perf_buffer, work);
- nr = 1 << page_order(buffer);
-
- base = buffer->user_page;
- for (i = 0; i < nr + 1; i++)
- perf_mmap_unmark_page(base + (i * PAGE_SIZE));
-
- vfree(base);
- kfree(buffer);
-}
-
-static void perf_buffer_free(struct perf_buffer *buffer)
-{
- schedule_work(&buffer->work);
-}
-
-static struct perf_buffer *
-perf_buffer_alloc(int nr_pages, long watermark, int cpu, int flags)
-{
- struct perf_buffer *buffer;
- unsigned long size;
- void *all_buf;
-
- size = sizeof(struct perf_buffer);
- size += sizeof(void *);
-
- buffer = kzalloc(size, GFP_KERNEL);
- if (!buffer)
- goto fail;
-
- INIT_WORK(&buffer->work, perf_buffer_free_work);
-
- all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
- if (!all_buf)
- goto fail_all_buf;
-
- buffer->user_page = all_buf;
- buffer->data_pages[0] = all_buf + PAGE_SIZE;
- buffer->page_order = ilog2(nr_pages);
- buffer->nr_pages = 1;
-
- perf_buffer_init(buffer, watermark, flags);
-
- return buffer;
-
-fail_all_buf:
- kfree(buffer);
-
-fail:
- return NULL;
-}
-
-#endif
-
-static unsigned long perf_data_size(struct perf_buffer *buffer)
-{
- return buffer->nr_pages << (PAGE_SHIFT + page_order(buffer));
-}
-
-static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
- struct perf_event *event = vma->vm_file->private_data;
- struct perf_buffer *buffer;
- int ret = VM_FAULT_SIGBUS;
-
- if (vmf->flags & FAULT_FLAG_MKWRITE) {
- if (vmf->pgoff == 0)
- ret = 0;
- return ret;
- }
-
- rcu_read_lock();
- buffer = rcu_dereference(event->buffer);
- if (!buffer)
- goto unlock;
-
- if (vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))
- goto unlock;
-
- vmf->page = perf_mmap_to_page(buffer, vmf->pgoff);
- if (!vmf->page)
- goto unlock;
-
- get_page(vmf->page);
- vmf->page->mapping = vma->vm_file->f_mapping;
- vmf->page->index = vmf->pgoff;
-
- ret = 0;
-unlock:
- rcu_read_unlock();
-
- return ret;
-}
-
-static void perf_buffer_free_rcu(struct rcu_head *rcu_head)
-{
- struct perf_buffer *buffer;
-
- buffer = container_of(rcu_head, struct perf_buffer, rcu_head);
- perf_buffer_free(buffer);
-}
-
-static struct perf_buffer *perf_buffer_get(struct perf_event *event)
-{
- struct perf_buffer *buffer;
-
- rcu_read_lock();
- buffer = rcu_dereference(event->buffer);
- if (buffer) {
- if (!atomic_inc_not_zero(&buffer->refcount))
- buffer = NULL;
- }
- rcu_read_unlock();
-
- return buffer;
-}
-
-static void perf_buffer_put(struct perf_buffer *buffer)
-{
- if (!atomic_dec_and_test(&buffer->refcount))
- return;
-
- call_rcu(&buffer->rcu_head, perf_buffer_free_rcu);
-}
-
-static void perf_mmap_open(struct vm_area_struct *vma)
-{
- struct perf_event *event = vma->vm_file->private_data;
-
- atomic_inc(&event->mmap_count);
-}
-
-static void perf_mmap_close(struct vm_area_struct *vma)
-{
- struct perf_event *event = vma->vm_file->private_data;
-
- if (atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) {
- unsigned long size = perf_data_size(event->buffer);
- struct user_struct *user = event->mmap_user;
- struct perf_buffer *buffer = event->buffer;
-
- atomic_long_sub((size >> PAGE_SHIFT) + 1, &user->locked_vm);
- vma->vm_mm->locked_vm -= event->mmap_locked;
- rcu_assign_pointer(event->buffer, NULL);
- mutex_unlock(&event->mmap_mutex);
-
- perf_buffer_put(buffer);
- free_uid(user);
- }
-}
-
-static const struct vm_operations_struct perf_mmap_vmops = {
- .open = perf_mmap_open,
- .close = perf_mmap_close,
- .fault = perf_mmap_fault,
- .page_mkwrite = perf_mmap_fault,
-};
-
-static int perf_mmap(struct file *file, struct vm_area_struct *vma)
-{
- struct perf_event *event = file->private_data;
- unsigned long user_locked, user_lock_limit;
- struct user_struct *user = current_user();
- unsigned long locked, lock_limit;
- struct perf_buffer *buffer;
- unsigned long vma_size;
- unsigned long nr_pages;
- long user_extra, extra;
- int ret = 0, flags = 0;
-
- /*
- * Don't allow mmap() of inherited per-task counters. This would
- * create a performance issue due to all children writing to the
- * same buffer.
- */
- if (event->cpu == -1 && event->attr.inherit)
- return -EINVAL;
-
- if (!(vma->vm_flags & VM_SHARED))
- return -EINVAL;
-
- vma_size = vma->vm_end - vma->vm_start;
- nr_pages = (vma_size / PAGE_SIZE) - 1;
-
- /*
- * If we have buffer pages ensure they're a power-of-two number, so we
- * can do bitmasks instead of modulo.
- */
- if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
-
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
- if (vma->vm_pgoff != 0)
- return -EINVAL;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
- mutex_lock(&event->mmap_mutex);
- if (event->buffer) {
- if (event->buffer->nr_pages == nr_pages)
- atomic_inc(&event->buffer->refcount);
- else
- ret = -EINVAL;
- goto unlock;
- }
-
- user_extra = nr_pages + 1;
- user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
-
- /*
- * Increase the limit linearly with more CPUs:
- */
- user_lock_limit *= num_online_cpus();
-
- user_locked = atomic_long_read(&user->locked_vm) + user_extra;
-
- extra = 0;
- if (user_locked > user_lock_limit)
- extra = user_locked - user_lock_limit;
-
- lock_limit = rlimit(RLIMIT_MEMLOCK);
- lock_limit >>= PAGE_SHIFT;
- locked = vma->vm_mm->locked_vm + extra;
-
- if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
- !capable(CAP_IPC_LOCK)) {
- ret = -EPERM;
- goto unlock;
- }
-
- WARN_ON(event->buffer);
-
- if (vma->vm_flags & VM_WRITE)
- flags |= PERF_BUFFER_WRITABLE;
-
- buffer = perf_buffer_alloc(nr_pages, event->attr.wakeup_watermark,
- event->cpu, flags);
- if (!buffer) {
- ret = -ENOMEM;
- goto unlock;
- }
- rcu_assign_pointer(event->buffer, buffer);
-
- atomic_long_add(user_extra, &user->locked_vm);
- event->mmap_locked = extra;
- event->mmap_user = get_current_user();
- vma->vm_mm->locked_vm += event->mmap_locked;
-
-unlock:
- if (!ret)
- atomic_inc(&event->mmap_count);
- mutex_unlock(&event->mmap_mutex);
-
- vma->vm_flags |= VM_RESERVED;
- vma->vm_ops = &perf_mmap_vmops;
-
- return ret;
-}
-
-static int perf_fasync(int fd, struct file *filp, int on)
-{
- struct inode *inode = filp->f_path.dentry->d_inode;
- struct perf_event *event = filp->private_data;
- int retval;
-
- mutex_lock(&inode->i_mutex);
- retval = fasync_helper(fd, filp, on, &event->fasync);
- mutex_unlock(&inode->i_mutex);
-
- if (retval < 0)
- return retval;
-
- return 0;
-}
-
-static const struct file_operations perf_fops = {
- .llseek = no_llseek,
- .release = perf_release,
- .read = perf_read,
- .poll = perf_poll,
- .unlocked_ioctl = perf_ioctl,
- .compat_ioctl = perf_ioctl,
- .mmap = perf_mmap,
- .fasync = perf_fasync,
-};
-
-/*
- * Perf event wakeup
- *
- * If there's data, ensure we set the poll() state and publish everything
- * to user-space before waking everybody up.
- */
-
-void perf_event_wakeup(struct perf_event *event)
-{
- wake_up_all(&event->waitq);
-
- if (event->pending_kill) {
- kill_fasync(&event->fasync, SIGIO, event->pending_kill);
- event->pending_kill = 0;
- }
-}
-
-static void perf_pending_event(struct irq_work *entry)
-{
- struct perf_event *event = container_of(entry,
- struct perf_event, pending);
-
- if (event->pending_disable) {
- event->pending_disable = 0;
- __perf_event_disable(event);
- }
-
- if (event->pending_wakeup) {
- event->pending_wakeup = 0;
- perf_event_wakeup(event);
- }
-}
-
-/*
- * We assume there is only KVM supporting the callbacks.
- * Later on, we might change it to a list if there is
- * another virtualization implementation supporting the callbacks.
- */
-struct perf_guest_info_callbacks *perf_guest_cbs;
-
-int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
-{
- perf_guest_cbs = cbs;
- return 0;
-}
-EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
-
-int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
-{
- perf_guest_cbs = NULL;
- return 0;
-}
-EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
-
-/*
- * Output
- */
-static bool perf_output_space(struct perf_buffer *buffer, unsigned long tail,
- unsigned long offset, unsigned long head)
-{
- unsigned long mask;
-
- if (!buffer->writable)
- return true;
-
- mask = perf_data_size(buffer) - 1;
-
- offset = (offset - tail) & mask;
- head = (head - tail) & mask;
-
- if ((int)(head - offset) < 0)
- return false;
-
- return true;
-}
-
-static void perf_output_wakeup(struct perf_output_handle *handle)
-{
- atomic_set(&handle->buffer->poll, POLL_IN);
-
- if (handle->nmi) {
- handle->event->pending_wakeup = 1;
- irq_work_queue(&handle->event->pending);
- } else
- perf_event_wakeup(handle->event);
-}
-
-/*
- * We need to ensure a later event_id doesn't publish a head when a former
- * event isn't done writing. However since we need to deal with NMIs we
- * cannot fully serialize things.
- *
- * We only publish the head (and generate a wakeup) when the outer-most
- * event completes.
- */
-static void perf_output_get_handle(struct perf_output_handle *handle)
-{
- struct perf_buffer *buffer = handle->buffer;
-
- preempt_disable();
- local_inc(&buffer->nest);
- handle->wakeup = local_read(&buffer->wakeup);
-}
-
-static void perf_output_put_handle(struct perf_output_handle *handle)
-{
- struct perf_buffer *buffer = handle->buffer;
- unsigned long head;
-
-again:
- head = local_read(&buffer->head);
-
- /*
- * IRQ/NMI can happen here, which means we can miss a head update.
- */
-
- if (!local_dec_and_test(&buffer->nest))
- goto out;
-
- /*
- * Publish the known good head. Rely on the full barrier implied
- * by atomic_dec_and_test() order the buffer->head read and this
- * write.
- */
- buffer->user_page->data_head = head;
-
- /*
- * Now check if we missed an update, rely on the (compiler)
- * barrier in atomic_dec_and_test() to re-read buffer->head.
- */
- if (unlikely(head != local_read(&buffer->head))) {
- local_inc(&buffer->nest);
- goto again;
- }
-
- if (handle->wakeup != local_read(&buffer->wakeup))
- perf_output_wakeup(handle);
-
-out:
- preempt_enable();
-}
-
-__always_inline void perf_output_copy(struct perf_output_handle *handle,
- const void *buf, unsigned int len)
-{
- do {
- unsigned long size = min_t(unsigned long, handle->size, len);
-
- memcpy(handle->addr, buf, size);
-
- len -= size;
- handle->addr += size;
- buf += size;
- handle->size -= size;
- if (!handle->size) {
- struct perf_buffer *buffer = handle->buffer;
-
- handle->page++;
- handle->page &= buffer->nr_pages - 1;
- handle->addr = buffer->data_pages[handle->page];
- handle->size = PAGE_SIZE << page_order(buffer);
- }
- } while (len);
-}
-
-static void __perf_event_header__init_id(struct perf_event_header *header,
- struct perf_sample_data *data,
- struct perf_event *event)
-{
- u64 sample_type = event->attr.sample_type;
-
- data->type = sample_type;
- header->size += event->id_header_size;
-
- if (sample_type & PERF_SAMPLE_TID) {
- /* namespace issues */
- data->tid_entry.pid = perf_event_pid(event, current);
- data->tid_entry.tid = perf_event_tid(event, current);
- }
-
- if (sample_type & PERF_SAMPLE_TIME)
- data->time = perf_clock();
-
- if (sample_type & PERF_SAMPLE_ID)
- data->id = primary_event_id(event);
-
- if (sample_type & PERF_SAMPLE_STREAM_ID)
- data->stream_id = event->id;
-
- if (sample_type & PERF_SAMPLE_CPU) {
- data->cpu_entry.cpu = raw_smp_processor_id();
- data->cpu_entry.reserved = 0;
- }
-}
-
-static void perf_event_header__init_id(struct perf_event_header *header,
- struct perf_sample_data *data,
- struct perf_event *event)
-{
- if (event->attr.sample_id_all)
- __perf_event_header__init_id(header, data, event);
-}
-
-static void __perf_event__output_id_sample(struct perf_output_handle *handle,
- struct perf_sample_data *data)
-{
- u64 sample_type = data->type;
-
- if (sample_type & PERF_SAMPLE_TID)
- perf_output_put(handle, data->tid_entry);
-
- if (sample_type & PERF_SAMPLE_TIME)
- perf_output_put(handle, data->time);
-
- if (sample_type & PERF_SAMPLE_ID)
- perf_output_put(handle, data->id);
-
- if (sample_type & PERF_SAMPLE_STREAM_ID)
- perf_output_put(handle, data->stream_id);
-
- if (sample_type & PERF_SAMPLE_CPU)
- perf_output_put(handle, data->cpu_entry);
-}
-
-static void perf_event__output_id_sample(struct perf_event *event,
- struct perf_output_handle *handle,
- struct perf_sample_data *sample)
-{
- if (event->attr.sample_id_all)
- __perf_event__output_id_sample(handle, sample);
-}
-
-int perf_output_begin(struct perf_output_handle *handle,
- struct perf_event *event, unsigned int size,
- int nmi, int sample)
-{
- struct perf_buffer *buffer;
- unsigned long tail, offset, head;
- int have_lost;
- struct perf_sample_data sample_data;
- struct {
- struct perf_event_header header;
- u64 id;
- u64 lost;
- } lost_event;
-
- rcu_read_lock();
- /*
- * For inherited events we send all the output towards the parent.
- */
- if (event->parent)
- event = event->parent;
-
- buffer = rcu_dereference(event->buffer);
- if (!buffer)
- goto out;
-
- handle->buffer = buffer;
- handle->event = event;
- handle->nmi = nmi;
- handle->sample = sample;
-
- if (!buffer->nr_pages)
- goto out;
-
- have_lost = local_read(&buffer->lost);
- if (have_lost) {
- lost_event.header.size = sizeof(lost_event);
- perf_event_header__init_id(&lost_event.header, &sample_data,
- event);
- size += lost_event.header.size;
- }
-
- perf_output_get_handle(handle);
-
- do {
- /*
- * Userspace could choose to issue a mb() before updating the
- * tail pointer. So that all reads will be completed before the
- * write is issued.
- */
- tail = ACCESS_ONCE(buffer->user_page->data_tail);
- smp_rmb();
- offset = head = local_read(&buffer->head);
- head += size;
- if (unlikely(!perf_output_space(buffer, tail, offset, head)))
- goto fail;
- } while (local_cmpxchg(&buffer->head, offset, head) != offset);
-
- if (head - local_read(&buffer->wakeup) > buffer->watermark)
- local_add(buffer->watermark, &buffer->wakeup);
-
- handle->page = offset >> (PAGE_SHIFT + page_order(buffer));
- handle->page &= buffer->nr_pages - 1;
- handle->size = offset & ((PAGE_SIZE << page_order(buffer)) - 1);
- handle->addr = buffer->data_pages[handle->page];
- handle->addr += handle->size;
- handle->size = (PAGE_SIZE << page_order(buffer)) - handle->size;
-
- if (have_lost) {
- lost_event.header.type = PERF_RECORD_LOST;
- lost_event.header.misc = 0;
- lost_event.id = event->id;
- lost_event.lost = local_xchg(&buffer->lost, 0);
-
- perf_output_put(handle, lost_event);
- perf_event__output_id_sample(event, handle, &sample_data);
- }
-
- return 0;
-
-fail:
- local_inc(&buffer->lost);
- perf_output_put_handle(handle);
-out:
- rcu_read_unlock();
-
- return -ENOSPC;
-}
-
-void perf_output_end(struct perf_output_handle *handle)
-{
- struct perf_event *event = handle->event;
- struct perf_buffer *buffer = handle->buffer;
-
- int wakeup_events = event->attr.wakeup_events;
-
- if (handle->sample && wakeup_events) {
- int events = local_inc_return(&buffer->events);
- if (events >= wakeup_events) {
- local_sub(wakeup_events, &buffer->events);
- local_inc(&buffer->wakeup);
- }
- }
-
- perf_output_put_handle(handle);
- rcu_read_unlock();
-}
-
-static void perf_output_read_one(struct perf_output_handle *handle,
- struct perf_event *event,
- u64 enabled, u64 running)
-{
- u64 read_format = event->attr.read_format;
- u64 values[4];
- int n = 0;
-
- values[n++] = perf_event_count(event);
- if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
- values[n++] = enabled +
- atomic64_read(&event->child_total_time_enabled);
- }
- if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
- values[n++] = running +
- atomic64_read(&event->child_total_time_running);
- }
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(event);
-
- perf_output_copy(handle, values, n * sizeof(u64));
-}
-
-/*
- * XXX PERF_FORMAT_GROUP vs inherited events seems difficult.
- */
-static void perf_output_read_group(struct perf_output_handle *handle,
- struct perf_event *event,
- u64 enabled, u64 running)
-{
- struct perf_event *leader = event->group_leader, *sub;
- u64 read_format = event->attr.read_format;
- u64 values[5];
- int n = 0;
-
- values[n++] = 1 + leader->nr_siblings;
-
- if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
- values[n++] = enabled;
-
- if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
- values[n++] = running;
-
- if (leader != event)
- leader->pmu->read(leader);
-
- values[n++] = perf_event_count(leader);
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(leader);
-
- perf_output_copy(handle, values, n * sizeof(u64));
-
- list_for_each_entry(sub, &leader->sibling_list, group_entry) {
- n = 0;
-
- if (sub != event)
- sub->pmu->read(sub);
-
- values[n++] = perf_event_count(sub);
- if (read_format & PERF_FORMAT_ID)
- values[n++] = primary_event_id(sub);
-
- perf_output_copy(handle, values, n * sizeof(u64));
- }
-}
-
-#define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
- PERF_FORMAT_TOTAL_TIME_RUNNING)
-
-static void perf_output_read(struct perf_output_handle *handle,
- struct perf_event *event)
-{
- u64 enabled = 0, running = 0, now, ctx_time;
- u64 read_format = event->attr.read_format;
-
- /*
- * compute total_time_enabled, total_time_running
- * based on snapshot values taken when the event
- * was last scheduled in.
- *
- * we cannot simply called update_context_time()
- * because of locking issue as we are called in
- * NMI context
- */
- if (read_format & PERF_FORMAT_TOTAL_TIMES) {
- now = perf_clock();
- ctx_time = event->shadow_ctx_time + now;
- enabled = ctx_time - event->tstamp_enabled;
- running = ctx_time - event->tstamp_running;
- }
-
- if (event->attr.read_format & PERF_FORMAT_GROUP)
- perf_output_read_group(handle, event, enabled, running);
- else
- perf_output_read_one(handle, event, enabled, running);
-}
-
-void perf_output_sample(struct perf_output_handle *handle,
- struct perf_event_header *header,
- struct perf_sample_data *data,
- struct perf_event *event)
-{
- u64 sample_type = data->type;
-
- perf_output_put(handle, *header);
-
- if (sample_type & PERF_SAMPLE_IP)
- perf_output_put(handle, data->ip);
-
- if (sample_type & PERF_SAMPLE_TID)
- perf_output_put(handle, data->tid_entry);
-
- if (sample_type & PERF_SAMPLE_TIME)
- perf_output_put(handle, data->time);
-
- if (sample_type & PERF_SAMPLE_ADDR)
- perf_output_put(handle, data->addr);
-
- if (sample_type & PERF_SAMPLE_ID)
- perf_output_put(handle, data->id);
-
- if (sample_type & PERF_SAMPLE_STREAM_ID)
- perf_output_put(handle, data->stream_id);
-
- if (sample_type & PERF_SAMPLE_CPU)
- perf_output_put(handle, data->cpu_entry);
-
- if (sample_type & PERF_SAMPLE_PERIOD)
- perf_output_put(handle, data->period);
-
- if (sample_type & PERF_SAMPLE_READ)
- perf_output_read(handle, event);
-
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- if (data->callchain) {
- int size = 1;
-
- if (data->callchain)
- size += data->callchain->nr;
-
- size *= sizeof(u64);
-
- perf_output_copy(handle, data->callchain, size);
- } else {
- u64 nr = 0;
- perf_output_put(handle, nr);
- }
- }
-
- if (sample_type & PERF_SAMPLE_RAW) {
- if (data->raw) {
- perf_output_put(handle, data->raw->size);
- perf_output_copy(handle, data->raw->data,
- data->raw->size);
- } else {
- struct {
- u32 size;
- u32 data;
- } raw = {
- .size = sizeof(u32),
- .data = 0,
- };
- perf_output_put(handle, raw);
- }
- }
-}
-
-void perf_prepare_sample(struct perf_event_header *header,
- struct perf_sample_data *data,
- struct perf_event *event,
- struct pt_regs *regs)
-{
- u64 sample_type = event->attr.sample_type;
-
- header->type = PERF_RECORD_SAMPLE;
- header->size = sizeof(*header) + event->header_size;
-
- header->misc = 0;
- header->misc |= perf_misc_flags(regs);
-
- __perf_event_header__init_id(header, data, event);
-
- if (sample_type & PERF_SAMPLE_IP)
- data->ip = perf_instruction_pointer(regs);
-
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- int size = 1;
-
- data->callchain = perf_callchain(regs);
-
- if (data->callchain)
- size += data->callchain->nr;
-
- header->size += size * sizeof(u64);
- }
-
- if (sample_type & PERF_SAMPLE_RAW) {
- int size = sizeof(u32);
-
- if (data->raw)
- size += data->raw->size;
- else
- size += sizeof(u32);
-
- WARN_ON_ONCE(size & (sizeof(u64)-1));
- header->size += size;
- }
-}
-
-static void perf_event_output(struct perf_event *event, int nmi,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- struct perf_output_handle handle;
- struct perf_event_header header;
-
- /* protect the callchain buffers */
- rcu_read_lock();
-
- perf_prepare_sample(&header, data, event, regs);
-
- if (perf_output_begin(&handle, event, header.size, nmi, 1))
- goto exit;
-
- perf_output_sample(&handle, &header, data, event);
-
- perf_output_end(&handle);
-
-exit:
- rcu_read_unlock();
-}
-
-/*
- * read event_id
- */
-
-struct perf_read_event {
- struct perf_event_header header;
-
- u32 pid;
- u32 tid;
-};
-
-static void
-perf_event_read_event(struct perf_event *event,
- struct task_struct *task)
-{
- struct perf_output_handle handle;
- struct perf_sample_data sample;
- struct perf_read_event read_event = {
- .header = {
- .type = PERF_RECORD_READ,
- .misc = 0,
- .size = sizeof(read_event) + event->read_size,
- },
- .pid = perf_event_pid(event, task),
- .tid = perf_event_tid(event, task),
- };
- int ret;
-
- perf_event_header__init_id(&read_event.header, &sample, event);
- ret = perf_output_begin(&handle, event, read_event.header.size, 0, 0);
- if (ret)
- return;
-
- perf_output_put(&handle, read_event);
- perf_output_read(&handle, event);
- perf_event__output_id_sample(event, &handle, &sample);
-
- perf_output_end(&handle);
-}
-
-/*
- * task tracking -- fork/exit
- *
- * enabled by: attr.comm | attr.mmap | attr.mmap_data | attr.task
- */
-
-struct perf_task_event {
- struct task_struct *task;
- struct perf_event_context *task_ctx;
-
- struct {
- struct perf_event_header header;
-
- u32 pid;
- u32 ppid;
- u32 tid;
- u32 ptid;
- u64 time;
- } event_id;
-};
-
-static void perf_event_task_output(struct perf_event *event,
- struct perf_task_event *task_event)
-{
- struct perf_output_handle handle;
- struct perf_sample_data sample;
- struct task_struct *task = task_event->task;
- int ret, size = task_event->event_id.header.size;
-
- perf_event_header__init_id(&task_event->event_id.header, &sample, event);
-
- ret = perf_output_begin(&handle, event,
- task_event->event_id.header.size, 0, 0);
- if (ret)
- goto out;
-
- task_event->event_id.pid = perf_event_pid(event, task);
- task_event->event_id.ppid = perf_event_pid(event, current);
-
- task_event->event_id.tid = perf_event_tid(event, task);
- task_event->event_id.ptid = perf_event_tid(event, current);
-
- perf_output_put(&handle, task_event->event_id);
-
- perf_event__output_id_sample(event, &handle, &sample);
-
- perf_output_end(&handle);
-out:
- task_event->event_id.header.size = size;
-}
-
-static int perf_event_task_match(struct perf_event *event)
-{
- if (event->state < PERF_EVENT_STATE_INACTIVE)
- return 0;
-
- if (!event_filter_match(event))
- return 0;
-
- if (event->attr.comm || event->attr.mmap ||
- event->attr.mmap_data || event->attr.task)
- return 1;
-
- return 0;
-}
-
-static void perf_event_task_ctx(struct perf_event_context *ctx,
- struct perf_task_event *task_event)
-{
- struct perf_event *event;
-
- list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
- if (perf_event_task_match(event))
- perf_event_task_output(event, task_event);
- }
-}
-
-static void perf_event_task_event(struct perf_task_event *task_event)
-{
- struct perf_cpu_context *cpuctx;
- struct perf_event_context *ctx;
- struct pmu *pmu;
- int ctxn;
-
- rcu_read_lock();
- list_for_each_entry_rcu(pmu, &pmus, entry) {
- cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
- if (cpuctx->active_pmu != pmu)
- goto next;
- perf_event_task_ctx(&cpuctx->ctx, task_event);
-
- ctx = task_event->task_ctx;
- if (!ctx) {
- ctxn = pmu->task_ctx_nr;
- if (ctxn < 0)
- goto next;
- ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
- }
- if (ctx)
- perf_event_task_ctx(ctx, task_event);
-next:
- put_cpu_ptr(pmu->pmu_cpu_context);
- }
- rcu_read_unlock();
-}
-
-static void perf_event_task(struct task_struct *task,
- struct perf_event_context *task_ctx,
- int new)
-{
- struct perf_task_event task_event;
-
- if (!atomic_read(&nr_comm_events) &&
- !atomic_read(&nr_mmap_events) &&
- !atomic_read(&nr_task_events))
- return;
-
- task_event = (struct perf_task_event){
- .task = task,
- .task_ctx = task_ctx,
- .event_id = {
- .header = {
- .type = new ? PERF_RECORD_FORK : PERF_RECORD_EXIT,
- .misc = 0,
- .size = sizeof(task_event.event_id),
- },
- /* .pid */
- /* .ppid */
- /* .tid */
- /* .ptid */
- .time = perf_clock(),
- },
- };
-
- perf_event_task_event(&task_event);
-}
-
-void perf_event_fork(struct task_struct *task)
-{
- perf_event_task(task, NULL, 1);
-}
-
-/*
- * comm tracking
- */
-
-struct perf_comm_event {
- struct task_struct *task;
- char *comm;
- int comm_size;
-
- struct {
- struct perf_event_header header;
-
- u32 pid;
- u32 tid;
- } event_id;
-};
-
-static void perf_event_comm_output(struct perf_event *event,
- struct perf_comm_event *comm_event)
-{
- struct perf_output_handle handle;
- struct perf_sample_data sample;
- int size = comm_event->event_id.header.size;
- int ret;
-
- perf_event_header__init_id(&comm_event->event_id.header, &sample, event);
- ret = perf_output_begin(&handle, event,
- comm_event->event_id.header.size, 0, 0);
-
- if (ret)
- goto out;
-
- comm_event->event_id.pid = perf_event_pid(event, comm_event->task);
- comm_event->event_id.tid = perf_event_tid(event, comm_event->task);
-
- perf_output_put(&handle, comm_event->event_id);
- perf_output_copy(&handle, comm_event->comm,
- comm_event->comm_size);
-
- perf_event__output_id_sample(event, &handle, &sample);
-
- perf_output_end(&handle);
-out:
- comm_event->event_id.header.size = size;
-}
-
-static int perf_event_comm_match(struct perf_event *event)
-{
- if (event->state < PERF_EVENT_STATE_INACTIVE)
- return 0;
-
- if (!event_filter_match(event))
- return 0;
-
- if (event->attr.comm)
- return 1;
-
- return 0;
-}
-
-static void perf_event_comm_ctx(struct perf_event_context *ctx,
- struct perf_comm_event *comm_event)
-{
- struct perf_event *event;
-
- list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
- if (perf_event_comm_match(event))
- perf_event_comm_output(event, comm_event);
- }
-}
-
-static void perf_event_comm_event(struct perf_comm_event *comm_event)
-{
- struct perf_cpu_context *cpuctx;
- struct perf_event_context *ctx;
- char comm[TASK_COMM_LEN];
- unsigned int size;
- struct pmu *pmu;
- int ctxn;
-
- memset(comm, 0, sizeof(comm));
- strlcpy(comm, comm_event->task->comm, sizeof(comm));
- size = ALIGN(strlen(comm)+1, sizeof(u64));
-
- comm_event->comm = comm;
- comm_event->comm_size = size;
-
- comm_event->event_id.header.size = sizeof(comm_event->event_id) + size;
- rcu_read_lock();
- list_for_each_entry_rcu(pmu, &pmus, entry) {
- cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
- if (cpuctx->active_pmu != pmu)
- goto next;
- perf_event_comm_ctx(&cpuctx->ctx, comm_event);
-
- ctxn = pmu->task_ctx_nr;
- if (ctxn < 0)
- goto next;
-
- ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
- if (ctx)
- perf_event_comm_ctx(ctx, comm_event);
-next:
- put_cpu_ptr(pmu->pmu_cpu_context);
- }
- rcu_read_unlock();
-}
-
-void perf_event_comm(struct task_struct *task)
-{
- struct perf_comm_event comm_event;
- struct perf_event_context *ctx;
- int ctxn;
-
- for_each_task_context_nr(ctxn) {
- ctx = task->perf_event_ctxp[ctxn];
- if (!ctx)
- continue;
-
- perf_event_enable_on_exec(ctx);
- }
-
- if (!atomic_read(&nr_comm_events))
- return;
-
- comm_event = (struct perf_comm_event){
- .task = task,
- /* .comm */
- /* .comm_size */
- .event_id = {
- .header = {
- .type = PERF_RECORD_COMM,
- .misc = 0,
- /* .size */
- },
- /* .pid */
- /* .tid */
- },
- };
-
- perf_event_comm_event(&comm_event);
-}
-
-/*
- * mmap tracking
- */
-
-struct perf_mmap_event {
- struct vm_area_struct *vma;
-
- const char *file_name;
- int file_size;
-
- struct {
- struct perf_event_header header;
-
- u32 pid;
- u32 tid;
- u64 start;
- u64 len;
- u64 pgoff;
- } event_id;
-};
-
-static void perf_event_mmap_output(struct perf_event *event,
- struct perf_mmap_event *mmap_event)
-{
- struct perf_output_handle handle;
- struct perf_sample_data sample;
- int size = mmap_event->event_id.header.size;
- int ret;
-
- perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
- ret = perf_output_begin(&handle, event,
- mmap_event->event_id.header.size, 0, 0);
- if (ret)
- goto out;
-
- mmap_event->event_id.pid = perf_event_pid(event, current);
- mmap_event->event_id.tid = perf_event_tid(event, current);
-
- perf_output_put(&handle, mmap_event->event_id);
- perf_output_copy(&handle, mmap_event->file_name,
- mmap_event->file_size);
-
- perf_event__output_id_sample(event, &handle, &sample);
-
- perf_output_end(&handle);
-out:
- mmap_event->event_id.header.size = size;
-}
-
-static int perf_event_mmap_match(struct perf_event *event,
- struct perf_mmap_event *mmap_event,
- int executable)
-{
- if (event->state < PERF_EVENT_STATE_INACTIVE)
- return 0;
-
- if (!event_filter_match(event))
- return 0;
-
- if ((!executable && event->attr.mmap_data) ||
- (executable && event->attr.mmap))
- return 1;
-
- return 0;
-}
-
-static void perf_event_mmap_ctx(struct perf_event_context *ctx,
- struct perf_mmap_event *mmap_event,
- int executable)
-{
- struct perf_event *event;
-
- list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
- if (perf_event_mmap_match(event, mmap_event, executable))
- perf_event_mmap_output(event, mmap_event);
- }
-}
-
-static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
-{
- struct perf_cpu_context *cpuctx;
- struct perf_event_context *ctx;
- struct vm_area_struct *vma = mmap_event->vma;
- struct file *file = vma->vm_file;
- unsigned int size;
- char tmp[16];
- char *buf = NULL;
- const char *name;
- struct pmu *pmu;
- int ctxn;
-
- memset(tmp, 0, sizeof(tmp));
-
- if (file) {
- /*
- * d_path works from the end of the buffer backwards, so we
- * need to add enough zero bytes after the string to handle
- * the 64bit alignment we do later.
- */
- buf = kzalloc(PATH_MAX + sizeof(u64), GFP_KERNEL);
- if (!buf) {
- name = strncpy(tmp, "//enomem", sizeof(tmp));
- goto got_name;
- }
- name = d_path(&file->f_path, buf, PATH_MAX);
- if (IS_ERR(name)) {
- name = strncpy(tmp, "//toolong", sizeof(tmp));
- goto got_name;
- }
- } else {
- if (arch_vma_name(mmap_event->vma)) {
- name = strncpy(tmp, arch_vma_name(mmap_event->vma),
- sizeof(tmp));
- goto got_name;
- }
-
- if (!vma->vm_mm) {
- name = strncpy(tmp, "[vdso]", sizeof(tmp));
- goto got_name;
- } else if (vma->vm_start <= vma->vm_mm->start_brk &&
- vma->vm_end >= vma->vm_mm->brk) {
- name = strncpy(tmp, "[heap]", sizeof(tmp));
- goto got_name;
- } else if (vma->vm_start <= vma->vm_mm->start_stack &&
- vma->vm_end >= vma->vm_mm->start_stack) {
- name = strncpy(tmp, "[stack]", sizeof(tmp));
- goto got_name;
- }
-
- name = strncpy(tmp, "//anon", sizeof(tmp));
- goto got_name;
- }
-
-got_name:
- size = ALIGN(strlen(name)+1, sizeof(u64));
-
- mmap_event->file_name = name;
- mmap_event->file_size = size;
-
- mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
-
- rcu_read_lock();
- list_for_each_entry_rcu(pmu, &pmus, entry) {
- cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
- if (cpuctx->active_pmu != pmu)
- goto next;
- perf_event_mmap_ctx(&cpuctx->ctx, mmap_event,
- vma->vm_flags & VM_EXEC);
-
- ctxn = pmu->task_ctx_nr;
- if (ctxn < 0)
- goto next;
-
- ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
- if (ctx) {
- perf_event_mmap_ctx(ctx, mmap_event,
- vma->vm_flags & VM_EXEC);
- }
-next:
- put_cpu_ptr(pmu->pmu_cpu_context);
- }
- rcu_read_unlock();
-
- kfree(buf);
-}
-
-void perf_event_mmap(struct vm_area_struct *vma)
-{
- struct perf_mmap_event mmap_event;
-
- if (!atomic_read(&nr_mmap_events))
- return;
-
- mmap_event = (struct perf_mmap_event){
- .vma = vma,
- /* .file_name */
- /* .file_size */
- .event_id = {
- .header = {
- .type = PERF_RECORD_MMAP,
- .misc = PERF_RECORD_MISC_USER,
- /* .size */
- },
- /* .pid */
- /* .tid */
- .start = vma->vm_start,
- .len = vma->vm_end - vma->vm_start,
- .pgoff = (u64)vma->vm_pgoff << PAGE_SHIFT,
- },
- };
-
- perf_event_mmap_event(&mmap_event);
-}
-
-/*
- * IRQ throttle logging
- */
-
-static void perf_log_throttle(struct perf_event *event, int enable)
-{
- struct perf_output_handle handle;
- struct perf_sample_data sample;
- int ret;
-
- struct {
- struct perf_event_header header;
- u64 time;
- u64 id;
- u64 stream_id;
- } throttle_event = {
- .header = {
- .type = PERF_RECORD_THROTTLE,
- .misc = 0,
- .size = sizeof(throttle_event),
- },
- .time = perf_clock(),
- .id = primary_event_id(event),
- .stream_id = event->id,
- };
-
- if (enable)
- throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
-
- perf_event_header__init_id(&throttle_event.header, &sample, event);
-
- ret = perf_output_begin(&handle, event,
- throttle_event.header.size, 1, 0);
- if (ret)
- return;
-
- perf_output_put(&handle, throttle_event);
- perf_event__output_id_sample(event, &handle, &sample);
- perf_output_end(&handle);
-}
-
-/*
- * Generic event overflow handling, sampling.
- */
-
-static int __perf_event_overflow(struct perf_event *event, int nmi,
- int throttle, struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- int events = atomic_read(&event->event_limit);
- struct hw_perf_event *hwc = &event->hw;
- int ret = 0;
-
- /*
- * Non-sampling counters might still use the PMI to fold short
- * hardware counters, ignore those.
- */
- if (unlikely(!is_sampling_event(event)))
- return 0;
-
- if (unlikely(hwc->interrupts >= max_samples_per_tick)) {
- if (throttle) {
- hwc->interrupts = MAX_INTERRUPTS;
- perf_log_throttle(event, 0);
- ret = 1;
- }
- } else
- hwc->interrupts++;
-
- if (event->attr.freq) {
- u64 now = perf_clock();
- s64 delta = now - hwc->freq_time_stamp;
-
- hwc->freq_time_stamp = now;
-
- if (delta > 0 && delta < 2*TICK_NSEC)
- perf_adjust_period(event, delta, hwc->last_period);
- }
-
- /*
- * XXX event_limit might not quite work as expected on inherited
- * events
- */
-
- event->pending_kill = POLL_IN;
- if (events && atomic_dec_and_test(&event->event_limit)) {
- ret = 1;
- event->pending_kill = POLL_HUP;
- if (nmi) {
- event->pending_disable = 1;
- irq_work_queue(&event->pending);
- } else
- perf_event_disable(event);
- }
-
- if (event->overflow_handler)
- event->overflow_handler(event, nmi, data, regs);
- else
- perf_event_output(event, nmi, data, regs);
-
- return ret;
-}
-
-int perf_event_overflow(struct perf_event *event, int nmi,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- return __perf_event_overflow(event, nmi, 1, data, regs);
-}
-
-/*
- * Generic software event infrastructure
- */
-
-struct swevent_htable {
- struct swevent_hlist *swevent_hlist;
- struct mutex hlist_mutex;
- int hlist_refcount;
-
- /* Recursion avoidance in each contexts */
- int recursion[PERF_NR_CONTEXTS];
-};
-
-static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
-
-/*
- * We directly increment event->count and keep a second value in
- * event->hw.period_left to count intervals. This period event
- * is kept in the range [-sample_period, 0] so that we can use the
- * sign as trigger.
- */
-
-static u64 perf_swevent_set_period(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
- u64 period = hwc->last_period;
- u64 nr, offset;
- s64 old, val;
-
- hwc->last_period = hwc->sample_period;
-
-again:
- old = val = local64_read(&hwc->period_left);
- if (val < 0)
- return 0;
-
- nr = div64_u64(period + val, period);
- offset = nr * period;
- val -= offset;
- if (local64_cmpxchg(&hwc->period_left, old, val) != old)
- goto again;
-
- return nr;
-}
-
-static void perf_swevent_overflow(struct perf_event *event, u64 overflow,
- int nmi, struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- struct hw_perf_event *hwc = &event->hw;
- int throttle = 0;
-
- data->period = event->hw.last_period;
- if (!overflow)
- overflow = perf_swevent_set_period(event);
-
- if (hwc->interrupts == MAX_INTERRUPTS)
- return;
-
- for (; overflow; overflow--) {
- if (__perf_event_overflow(event, nmi, throttle,
- data, regs)) {
- /*
- * We inhibit the overflow from happening when
- * hwc->interrupts == MAX_INTERRUPTS.
- */
- break;
- }
- throttle = 1;
- }
-}
-
-static void perf_swevent_event(struct perf_event *event, u64 nr,
- int nmi, struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- struct hw_perf_event *hwc = &event->hw;
-
- local64_add(nr, &event->count);
-
- if (!regs)
- return;
-
- if (!is_sampling_event(event))
- return;
-
- if (nr == 1 && hwc->sample_period == 1 && !event->attr.freq)
- return perf_swevent_overflow(event, 1, nmi, data, regs);
-
- if (local64_add_negative(nr, &hwc->period_left))
- return;
-
- perf_swevent_overflow(event, 0, nmi, data, regs);
-}
-
-static int perf_exclude_event(struct perf_event *event,
- struct pt_regs *regs)
-{
- if (event->hw.state & PERF_HES_STOPPED)
- return 1;
-
- if (regs) {
- if (event->attr.exclude_user && user_mode(regs))
- return 1;
-
- if (event->attr.exclude_kernel && !user_mode(regs))
- return 1;
- }
-
- return 0;
-}
-
-static int perf_swevent_match(struct perf_event *event,
- enum perf_type_id type,
- u32 event_id,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- if (event->attr.type != type)
- return 0;
-
- if (event->attr.config != event_id)
- return 0;
-
- if (perf_exclude_event(event, regs))
- return 0;
-
- return 1;
-}
-
-static inline u64 swevent_hash(u64 type, u32 event_id)
-{
- u64 val = event_id | (type << 32);
-
- return hash_64(val, SWEVENT_HLIST_BITS);
-}
-
-static inline struct hlist_head *
-__find_swevent_head(struct swevent_hlist *hlist, u64 type, u32 event_id)
-{
- u64 hash = swevent_hash(type, event_id);
-
- return &hlist->heads[hash];
-}
-
-/* For the read side: events when they trigger */
-static inline struct hlist_head *
-find_swevent_head_rcu(struct swevent_htable *swhash, u64 type, u32 event_id)
-{
- struct swevent_hlist *hlist;
-
- hlist = rcu_dereference(swhash->swevent_hlist);
- if (!hlist)
- return NULL;
-
- return __find_swevent_head(hlist, type, event_id);
-}
-
-/* For the event head insertion and removal in the hlist */
-static inline struct hlist_head *
-find_swevent_head(struct swevent_htable *swhash, struct perf_event *event)
-{
- struct swevent_hlist *hlist;
- u32 event_id = event->attr.config;
- u64 type = event->attr.type;
-
- /*
- * Event scheduling is always serialized against hlist allocation
- * and release. Which makes the protected version suitable here.
- * The context lock guarantees that.
- */
- hlist = rcu_dereference_protected(swhash->swevent_hlist,
- lockdep_is_held(&event->ctx->lock));
- if (!hlist)
- return NULL;
-
- return __find_swevent_head(hlist, type, event_id);
-}
-
-static void do_perf_sw_event(enum perf_type_id type, u32 event_id,
- u64 nr, int nmi,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
- struct perf_event *event;
- struct hlist_node *node;
- struct hlist_head *head;
-
- rcu_read_lock();
- head = find_swevent_head_rcu(swhash, type, event_id);
- if (!head)
- goto end;
-
- hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
- if (perf_swevent_match(event, type, event_id, data, regs))
- perf_swevent_event(event, nr, nmi, data, regs);
- }
-end:
- rcu_read_unlock();
-}
-
-int perf_swevent_get_recursion_context(void)
-{
- struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
-
- return get_recursion_context(swhash->recursion);
-}
-EXPORT_SYMBOL_GPL(perf_swevent_get_recursion_context);
-
-inline void perf_swevent_put_recursion_context(int rctx)
-{
- struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
-
- put_recursion_context(swhash->recursion, rctx);
-}
-
-void __perf_sw_event(u32 event_id, u64 nr, int nmi,
- struct pt_regs *regs, u64 addr)
-{
- struct perf_sample_data data;
- int rctx;
-
- preempt_disable_notrace();
- rctx = perf_swevent_get_recursion_context();
- if (rctx < 0)
- return;
-
- perf_sample_data_init(&data, addr);
-
- do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, nmi, &data, regs);
-
- perf_swevent_put_recursion_context(rctx);
- preempt_enable_notrace();
-}
-
-static void perf_swevent_read(struct perf_event *event)
-{
-}
-
-static int perf_swevent_add(struct perf_event *event, int flags)
-{
- struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
- struct hw_perf_event *hwc = &event->hw;
- struct hlist_head *head;
-
- if (is_sampling_event(event)) {
- hwc->last_period = hwc->sample_period;
- perf_swevent_set_period(event);
- }
-
- hwc->state = !(flags & PERF_EF_START);
-
- head = find_swevent_head(swhash, event);
- if (WARN_ON_ONCE(!head))
- return -EINVAL;
-
- hlist_add_head_rcu(&event->hlist_entry, head);
-
- return 0;
-}
-
-static void perf_swevent_del(struct perf_event *event, int flags)
-{
- hlist_del_rcu(&event->hlist_entry);
-}
-
-static void perf_swevent_start(struct perf_event *event, int flags)
-{
- event->hw.state = 0;
-}
-
-static void perf_swevent_stop(struct perf_event *event, int flags)
-{
- event->hw.state = PERF_HES_STOPPED;
-}
-
-/* Deref the hlist from the update side */
-static inline struct swevent_hlist *
-swevent_hlist_deref(struct swevent_htable *swhash)
-{
- return rcu_dereference_protected(swhash->swevent_hlist,
- lockdep_is_held(&swhash->hlist_mutex));
-}
-
-static void swevent_hlist_release_rcu(struct rcu_head *rcu_head)
-{
- struct swevent_hlist *hlist;
-
- hlist = container_of(rcu_head, struct swevent_hlist, rcu_head);
- kfree(hlist);
-}
-
-static void swevent_hlist_release(struct swevent_htable *swhash)
-{
- struct swevent_hlist *hlist = swevent_hlist_deref(swhash);
-
- if (!hlist)
- return;
-
- rcu_assign_pointer(swhash->swevent_hlist, NULL);
- call_rcu(&hlist->rcu_head, swevent_hlist_release_rcu);
-}
-
-static void swevent_hlist_put_cpu(struct perf_event *event, int cpu)
-{
- struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
-
- mutex_lock(&swhash->hlist_mutex);
-
- if (!--swhash->hlist_refcount)
- swevent_hlist_release(swhash);
-
- mutex_unlock(&swhash->hlist_mutex);
-}
-
-static void swevent_hlist_put(struct perf_event *event)
-{
- int cpu;
-
- if (event->cpu != -1) {
- swevent_hlist_put_cpu(event, event->cpu);
- return;
- }
-
- for_each_possible_cpu(cpu)
- swevent_hlist_put_cpu(event, cpu);
-}
-
-static int swevent_hlist_get_cpu(struct perf_event *event, int cpu)
-{
- struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
- int err = 0;
-
- mutex_lock(&swhash->hlist_mutex);
-
- if (!swevent_hlist_deref(swhash) && cpu_online(cpu)) {
- struct swevent_hlist *hlist;
-
- hlist = kzalloc(sizeof(*hlist), GFP_KERNEL);
- if (!hlist) {
- err = -ENOMEM;
- goto exit;
- }
- rcu_assign_pointer(swhash->swevent_hlist, hlist);
- }
- swhash->hlist_refcount++;
-exit:
- mutex_unlock(&swhash->hlist_mutex);
-
- return err;
-}
-
-static int swevent_hlist_get(struct perf_event *event)
-{
- int err;
- int cpu, failed_cpu;
-
- if (event->cpu != -1)
- return swevent_hlist_get_cpu(event, event->cpu);
-
- get_online_cpus();
- for_each_possible_cpu(cpu) {
- err = swevent_hlist_get_cpu(event, cpu);
- if (err) {
- failed_cpu = cpu;
- goto fail;
- }
- }
- put_online_cpus();
-
- return 0;
-fail:
- for_each_possible_cpu(cpu) {
- if (cpu == failed_cpu)
- break;
- swevent_hlist_put_cpu(event, cpu);
- }
-
- put_online_cpus();
- return err;
-}
-
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
-
-static void sw_perf_event_destroy(struct perf_event *event)
-{
- u64 event_id = event->attr.config;
-
- WARN_ON(event->parent);
-
- jump_label_dec(&perf_swevent_enabled[event_id]);
- swevent_hlist_put(event);
-}
-
-static int perf_swevent_init(struct perf_event *event)
-{
- int event_id = event->attr.config;
-
- if (event->attr.type != PERF_TYPE_SOFTWARE)
- return -ENOENT;
-
- switch (event_id) {
- case PERF_COUNT_SW_CPU_CLOCK:
- case PERF_COUNT_SW_TASK_CLOCK:
- return -ENOENT;
-
- default:
- break;
- }
-
- if (event_id >= PERF_COUNT_SW_MAX)
- return -ENOENT;
-
- if (!event->parent) {
- int err;
-
- err = swevent_hlist_get(event);
- if (err)
- return err;
-
- jump_label_inc(&perf_swevent_enabled[event_id]);
- event->destroy = sw_perf_event_destroy;
- }
-
- return 0;
-}
-
-static struct pmu perf_swevent = {
- .task_ctx_nr = perf_sw_context,
-
- .event_init = perf_swevent_init,
- .add = perf_swevent_add,
- .del = perf_swevent_del,
- .start = perf_swevent_start,
- .stop = perf_swevent_stop,
- .read = perf_swevent_read,
-};
-
-#ifdef CONFIG_EVENT_TRACING
-
-static int perf_tp_filter_match(struct perf_event *event,
- struct perf_sample_data *data)
-{
- void *record = data->raw->data;
-
- if (likely(!event->filter) || filter_match_preds(event->filter, record))
- return 1;
- return 0;
-}
-
-static int perf_tp_event_match(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- if (event->hw.state & PERF_HES_STOPPED)
- return 0;
- /*
- * All tracepoints are from kernel-space.
- */
- if (event->attr.exclude_kernel)
- return 0;
-
- if (!perf_tp_filter_match(event, data))
- return 0;
-
- return 1;
-}
-
-void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
- struct pt_regs *regs, struct hlist_head *head, int rctx)
-{
- struct perf_sample_data data;
- struct perf_event *event;
- struct hlist_node *node;
-
- struct perf_raw_record raw = {
- .size = entry_size,
- .data = record,
- };
-
- perf_sample_data_init(&data, addr);
- data.raw = &raw;
-
- hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
- if (perf_tp_event_match(event, &data, regs))
- perf_swevent_event(event, count, 1, &data, regs);
- }
-
- perf_swevent_put_recursion_context(rctx);
-}
-EXPORT_SYMBOL_GPL(perf_tp_event);
-
-static void tp_perf_event_destroy(struct perf_event *event)
-{
- perf_trace_destroy(event);
-}
-
-static int perf_tp_event_init(struct perf_event *event)
-{
- int err;
-
- if (event->attr.type != PERF_TYPE_TRACEPOINT)
- return -ENOENT;
-
- err = perf_trace_init(event);
- if (err)
- return err;
-
- event->destroy = tp_perf_event_destroy;
-
- return 0;
-}
-
-static struct pmu perf_tracepoint = {
- .task_ctx_nr = perf_sw_context,
-
- .event_init = perf_tp_event_init,
- .add = perf_trace_add,
- .del = perf_trace_del,
- .start = perf_swevent_start,
- .stop = perf_swevent_stop,
- .read = perf_swevent_read,
-};
-
-static inline void perf_tp_register(void)
-{
- perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
-}
-
-static int perf_event_set_filter(struct perf_event *event, void __user *arg)
-{
- char *filter_str;
- int ret;
-
- if (event->attr.type != PERF_TYPE_TRACEPOINT)
- return -EINVAL;
-
- filter_str = strndup_user(arg, PAGE_SIZE);
- if (IS_ERR(filter_str))
- return PTR_ERR(filter_str);
-
- ret = ftrace_profile_set_filter(event, event->attr.config, filter_str);
-
- kfree(filter_str);
- return ret;
-}
-
-static void perf_event_free_filter(struct perf_event *event)
-{
- ftrace_profile_free_filter(event);
-}
-
-#else
-
-static inline void perf_tp_register(void)
-{
-}
-
-static int perf_event_set_filter(struct perf_event *event, void __user *arg)
-{
- return -ENOENT;
-}
-
-static void perf_event_free_filter(struct perf_event *event)
-{
-}
-
-#endif /* CONFIG_EVENT_TRACING */
-
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
-void perf_bp_event(struct perf_event *bp, void *data)
-{
- struct perf_sample_data sample;
- struct pt_regs *regs = data;
-
- perf_sample_data_init(&sample, bp->attr.bp_addr);
-
- if (!bp->hw.state && !perf_exclude_event(bp, regs))
- perf_swevent_event(bp, 1, 1, &sample, regs);
-}
-#endif
-
-/*
- * hrtimer based swevent callback
- */
-
-static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
-{
- enum hrtimer_restart ret = HRTIMER_RESTART;
- struct perf_sample_data data;
- struct pt_regs *regs;
- struct perf_event *event;
- u64 period;
-
- event = container_of(hrtimer, struct perf_event, hw.hrtimer);
-
- if (event->state != PERF_EVENT_STATE_ACTIVE)
- return HRTIMER_NORESTART;
-
- event->pmu->read(event);
-
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
- regs = get_irq_regs();
-
- if (regs && !perf_exclude_event(event, regs)) {
- if (!(event->attr.exclude_idle && current->pid == 0))
- if (perf_event_overflow(event, 0, &data, regs))
- ret = HRTIMER_NORESTART;
- }
-
- period = max_t(u64, 10000, event->hw.sample_period);
- hrtimer_forward_now(hrtimer, ns_to_ktime(period));
-
- return ret;
-}
-
-static void perf_swevent_start_hrtimer(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
- s64 period;
-
- if (!is_sampling_event(event))
- return;
-
- period = local64_read(&hwc->period_left);
- if (period) {
- if (period < 0)
- period = 10000;
-
- local64_set(&hwc->period_left, 0);
- } else {
- period = max_t(u64, 10000, hwc->sample_period);
- }
- __hrtimer_start_range_ns(&hwc->hrtimer,
- ns_to_ktime(period), 0,
- HRTIMER_MODE_REL_PINNED, 0);
-}
-
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
-
- if (is_sampling_event(event)) {
- ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
- local64_set(&hwc->period_left, ktime_to_ns(remaining));
-
- hrtimer_cancel(&hwc->hrtimer);
- }
-}
-
-static void perf_swevent_init_hrtimer(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
-
- if (!is_sampling_event(event))
- return;
-
- hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
- hwc->hrtimer.function = perf_swevent_hrtimer;
-
- /*
- * Since hrtimers have a fixed rate, we can do a static freq->period
- * mapping and avoid the whole period adjust feedback stuff.
- */
- if (event->attr.freq) {
- long freq = event->attr.sample_freq;
-
- event->attr.sample_period = NSEC_PER_SEC / freq;
- hwc->sample_period = event->attr.sample_period;
- local64_set(&hwc->period_left, hwc->sample_period);
- event->attr.freq = 0;
- }
-}
-
-/*
- * Software event: cpu wall time clock
- */
-
-static void cpu_clock_event_update(struct perf_event *event)
-{
- s64 prev;
- u64 now;
-
- now = local_clock();
- prev = local64_xchg(&event->hw.prev_count, now);
- local64_add(now - prev, &event->count);
-}
-
-static void cpu_clock_event_start(struct perf_event *event, int flags)
-{
- local64_set(&event->hw.prev_count, local_clock());
- perf_swevent_start_hrtimer(event);
-}
-
-static void cpu_clock_event_stop(struct perf_event *event, int flags)
-{
- perf_swevent_cancel_hrtimer(event);
- cpu_clock_event_update(event);
-}
-
-static int cpu_clock_event_add(struct perf_event *event, int flags)
-{
- if (flags & PERF_EF_START)
- cpu_clock_event_start(event, flags);
-
- return 0;
-}
-
-static void cpu_clock_event_del(struct perf_event *event, int flags)
-{
- cpu_clock_event_stop(event, flags);
-}
-
-static void cpu_clock_event_read(struct perf_event *event)
-{
- cpu_clock_event_update(event);
-}
-
-static int cpu_clock_event_init(struct perf_event *event)
-{
- if (event->attr.type != PERF_TYPE_SOFTWARE)
- return -ENOENT;
-
- if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
- return -ENOENT;
-
- perf_swevent_init_hrtimer(event);
-
- return 0;
-}
-
-static struct pmu perf_cpu_clock = {
- .task_ctx_nr = perf_sw_context,
-
- .event_init = cpu_clock_event_init,
- .add = cpu_clock_event_add,
- .del = cpu_clock_event_del,
- .start = cpu_clock_event_start,
- .stop = cpu_clock_event_stop,
- .read = cpu_clock_event_read,
-};
-
-/*
- * Software event: task time clock
- */
-
-static void task_clock_event_update(struct perf_event *event, u64 now)
-{
- u64 prev;
- s64 delta;
-
- prev = local64_xchg(&event->hw.prev_count, now);
- delta = now - prev;
- local64_add(delta, &event->count);
-}
-
-static void task_clock_event_start(struct perf_event *event, int flags)
-{
- local64_set(&event->hw.prev_count, event->ctx->time);
- perf_swevent_start_hrtimer(event);
-}
-
-static void task_clock_event_stop(struct perf_event *event, int flags)
-{
- perf_swevent_cancel_hrtimer(event);
- task_clock_event_update(event, event->ctx->time);
-}
-
-static int task_clock_event_add(struct perf_event *event, int flags)
-{
- if (flags & PERF_EF_START)
- task_clock_event_start(event, flags);
-
- return 0;
-}
-
-static void task_clock_event_del(struct perf_event *event, int flags)
-{
- task_clock_event_stop(event, PERF_EF_UPDATE);
-}
-
-static void task_clock_event_read(struct perf_event *event)
-{
- u64 now = perf_clock();
- u64 delta = now - event->ctx->timestamp;
- u64 time = event->ctx->time + delta;
-
- task_clock_event_update(event, time);
-}
-
-static int task_clock_event_init(struct perf_event *event)
-{
- if (event->attr.type != PERF_TYPE_SOFTWARE)
- return -ENOENT;
-
- if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
- return -ENOENT;
-
- perf_swevent_init_hrtimer(event);
-
- return 0;
-}
-
-static struct pmu perf_task_clock = {
- .task_ctx_nr = perf_sw_context,
-
- .event_init = task_clock_event_init,
- .add = task_clock_event_add,
- .del = task_clock_event_del,
- .start = task_clock_event_start,
- .stop = task_clock_event_stop,
- .read = task_clock_event_read,
-};
-
-static void perf_pmu_nop_void(struct pmu *pmu)
-{
-}
-
-static int perf_pmu_nop_int(struct pmu *pmu)
-{
- return 0;
-}
-
-static void perf_pmu_start_txn(struct pmu *pmu)
-{
- perf_pmu_disable(pmu);
-}
-
-static int perf_pmu_commit_txn(struct pmu *pmu)
-{
- perf_pmu_enable(pmu);
- return 0;
-}
-
-static void perf_pmu_cancel_txn(struct pmu *pmu)
-{
- perf_pmu_enable(pmu);
-}
-
-/*
- * Ensures all contexts with the same task_ctx_nr have the same
- * pmu_cpu_context too.
- */
-static void *find_pmu_context(int ctxn)
-{
- struct pmu *pmu;
-
- if (ctxn < 0)
- return NULL;
-
- list_for_each_entry(pmu, &pmus, entry) {
- if (pmu->task_ctx_nr == ctxn)
- return pmu->pmu_cpu_context;
- }
-
- return NULL;
-}
-
-static void update_pmu_context(struct pmu *pmu, struct pmu *old_pmu)
-{
- int cpu;
-
- for_each_possible_cpu(cpu) {
- struct perf_cpu_context *cpuctx;
-
- cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
-
- if (cpuctx->active_pmu == old_pmu)
- cpuctx->active_pmu = pmu;
- }
-}
-
-static void free_pmu_context(struct pmu *pmu)
-{
- struct pmu *i;
-
- mutex_lock(&pmus_lock);
- /*
- * Like a real lame refcount.
- */
- list_for_each_entry(i, &pmus, entry) {
- if (i->pmu_cpu_context == pmu->pmu_cpu_context) {
- update_pmu_context(i, pmu);
- goto out;
- }
- }
-
- free_percpu(pmu->pmu_cpu_context);
-out:
- mutex_unlock(&pmus_lock);
-}
-static struct idr pmu_idr;
-
-static ssize_t
-type_show(struct device *dev, struct device_attribute *attr, char *page)
-{
- struct pmu *pmu = dev_get_drvdata(dev);
-
- return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->type);
-}
-
-static struct device_attribute pmu_dev_attrs[] = {
- __ATTR_RO(type),
- __ATTR_NULL,
-};
-
-static int pmu_bus_running;
-static struct bus_type pmu_bus = {
- .name = "event_source",
- .dev_attrs = pmu_dev_attrs,
-};
-
-static void pmu_dev_release(struct device *dev)
-{
- kfree(dev);
-}
-
-static int pmu_dev_alloc(struct pmu *pmu)
-{
- int ret = -ENOMEM;
-
- pmu->dev = kzalloc(sizeof(struct device), GFP_KERNEL);
- if (!pmu->dev)
- goto out;
-
- device_initialize(pmu->dev);
- ret = dev_set_name(pmu->dev, "%s", pmu->name);
- if (ret)
- goto free_dev;
-
- dev_set_drvdata(pmu->dev, pmu);
- pmu->dev->bus = &pmu_bus;
- pmu->dev->release = pmu_dev_release;
- ret = device_add(pmu->dev);
- if (ret)
- goto free_dev;
-
-out:
- return ret;
-
-free_dev:
- put_device(pmu->dev);
- goto out;
-}
-
-static struct lock_class_key cpuctx_mutex;
-
-int perf_pmu_register(struct pmu *pmu, char *name, int type)
-{
- int cpu, ret;
-
- mutex_lock(&pmus_lock);
- ret = -ENOMEM;
- pmu->pmu_disable_count = alloc_percpu(int);
- if (!pmu->pmu_disable_count)
- goto unlock;
-
- pmu->type = -1;
- if (!name)
- goto skip_type;
- pmu->name = name;
-
- if (type < 0) {
- int err = idr_pre_get(&pmu_idr, GFP_KERNEL);
- if (!err)
- goto free_pdc;
-
- err = idr_get_new_above(&pmu_idr, pmu, PERF_TYPE_MAX, &type);
- if (err) {
- ret = err;
- goto free_pdc;
- }
- }
- pmu->type = type;
-
- if (pmu_bus_running) {
- ret = pmu_dev_alloc(pmu);
- if (ret)
- goto free_idr;
- }
-
-skip_type:
- pmu->pmu_cpu_context = find_pmu_context(pmu->task_ctx_nr);
- if (pmu->pmu_cpu_context)
- goto got_cpu_context;
-
- pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
- if (!pmu->pmu_cpu_context)
- goto free_dev;
-
- for_each_possible_cpu(cpu) {
- struct perf_cpu_context *cpuctx;
-
- cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
- __perf_event_init_context(&cpuctx->ctx);
- lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
- cpuctx->ctx.type = cpu_context;
- cpuctx->ctx.pmu = pmu;
- cpuctx->jiffies_interval = 1;
- INIT_LIST_HEAD(&cpuctx->rotation_list);
- cpuctx->active_pmu = pmu;
- }
-
-got_cpu_context:
- if (!pmu->start_txn) {
- if (pmu->pmu_enable) {
- /*
- * If we have pmu_enable/pmu_disable calls, install
- * transaction stubs that use that to try and batch
- * hardware accesses.
- */
- pmu->start_txn = perf_pmu_start_txn;
- pmu->commit_txn = perf_pmu_commit_txn;
- pmu->cancel_txn = perf_pmu_cancel_txn;
- } else {
- pmu->start_txn = perf_pmu_nop_void;
- pmu->commit_txn = perf_pmu_nop_int;
- pmu->cancel_txn = perf_pmu_nop_void;
- }
- }
-
- if (!pmu->pmu_enable) {
- pmu->pmu_enable = perf_pmu_nop_void;
- pmu->pmu_disable = perf_pmu_nop_void;
- }
-
- list_add_rcu(&pmu->entry, &pmus);
- ret = 0;
-unlock:
- mutex_unlock(&pmus_lock);
-
- return ret;
-
-free_dev:
- device_del(pmu->dev);
- put_device(pmu->dev);
-
-free_idr:
- if (pmu->type >= PERF_TYPE_MAX)
- idr_remove(&pmu_idr, pmu->type);
-
-free_pdc:
- free_percpu(pmu->pmu_disable_count);
- goto unlock;
-}
-
-void perf_pmu_unregister(struct pmu *pmu)
-{
- mutex_lock(&pmus_lock);
- list_del_rcu(&pmu->entry);
- mutex_unlock(&pmus_lock);
-
- /*
- * We dereference the pmu list under both SRCU and regular RCU, so
- * synchronize against both of those.
- */
- synchronize_srcu(&pmus_srcu);
- synchronize_rcu();
-
- free_percpu(pmu->pmu_disable_count);
- if (pmu->type >= PERF_TYPE_MAX)
- idr_remove(&pmu_idr, pmu->type);
- device_del(pmu->dev);
- put_device(pmu->dev);
- free_pmu_context(pmu);
-}
-
-struct pmu *perf_init_event(struct perf_event *event)
-{
- struct pmu *pmu = NULL;
- int idx;
- int ret;
-
- idx = srcu_read_lock(&pmus_srcu);
-
- rcu_read_lock();
- pmu = idr_find(&pmu_idr, event->attr.type);
- rcu_read_unlock();
- if (pmu) {
- ret = pmu->event_init(event);
- if (ret)
- pmu = ERR_PTR(ret);
- goto unlock;
- }
-
- list_for_each_entry_rcu(pmu, &pmus, entry) {
- ret = pmu->event_init(event);
- if (!ret)
- goto unlock;
-
- if (ret != -ENOENT) {
- pmu = ERR_PTR(ret);
- goto unlock;
- }
- }
- pmu = ERR_PTR(-ENOENT);
-unlock:
- srcu_read_unlock(&pmus_srcu, idx);
-
- return pmu;
-}
-
-/*
- * Allocate and initialize a event structure
- */
-static struct perf_event *
-perf_event_alloc(struct perf_event_attr *attr, int cpu,
- struct task_struct *task,
- struct perf_event *group_leader,
- struct perf_event *parent_event,
- perf_overflow_handler_t overflow_handler)
-{
- struct pmu *pmu;
- struct perf_event *event;
- struct hw_perf_event *hwc;
- long err;
-
- if ((unsigned)cpu >= nr_cpu_ids) {
- if (!task || cpu != -1)
- return ERR_PTR(-EINVAL);
- }
-
- event = kzalloc(sizeof(*event), GFP_KERNEL);
- if (!event)
- return ERR_PTR(-ENOMEM);
-
- /*
- * Single events are their own group leaders, with an
- * empty sibling list:
- */
- if (!group_leader)
- group_leader = event;
-
- mutex_init(&event->child_mutex);
- INIT_LIST_HEAD(&event->child_list);
-
- INIT_LIST_HEAD(&event->group_entry);
- INIT_LIST_HEAD(&event->event_entry);
- INIT_LIST_HEAD(&event->sibling_list);
- init_waitqueue_head(&event->waitq);
- init_irq_work(&event->pending, perf_pending_event);
-
- mutex_init(&event->mmap_mutex);
-
- event->cpu = cpu;
- event->attr = *attr;
- event->group_leader = group_leader;
- event->pmu = NULL;
- event->oncpu = -1;
-
- event->parent = parent_event;
-
- event->ns = get_pid_ns(current->nsproxy->pid_ns);
- event->id = atomic64_inc_return(&perf_event_id);
-
- event->state = PERF_EVENT_STATE_INACTIVE;
-
- if (task) {
- event->attach_state = PERF_ATTACH_TASK;
-#ifdef CONFIG_HAVE_HW_BREAKPOINT
- /*
- * hw_breakpoint is a bit difficult here..
- */
- if (attr->type == PERF_TYPE_BREAKPOINT)
- event->hw.bp_target = task;
-#endif
- }
-
- if (!overflow_handler && parent_event)
- overflow_handler = parent_event->overflow_handler;
-
- event->overflow_handler = overflow_handler;
-
- if (attr->disabled)
- event->state = PERF_EVENT_STATE_OFF;
-
- pmu = NULL;
-
- hwc = &event->hw;
- hwc->sample_period = attr->sample_period;
- if (attr->freq && attr->sample_freq)
- hwc->sample_period = 1;
- hwc->last_period = hwc->sample_period;
-
- local64_set(&hwc->period_left, hwc->sample_period);
-
- /*
- * we currently do not support PERF_FORMAT_GROUP on inherited events
- */
- if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
- goto done;
-
- pmu = perf_init_event(event);
-
-done:
- err = 0;
- if (!pmu)
- err = -EINVAL;
- else if (IS_ERR(pmu))
- err = PTR_ERR(pmu);
-
- if (err) {
- if (event->ns)
- put_pid_ns(event->ns);
- kfree(event);
- return ERR_PTR(err);
- }
-
- event->pmu = pmu;
-
- if (!event->parent) {
- if (event->attach_state & PERF_ATTACH_TASK)
- jump_label_inc(&perf_sched_events);
- if (event->attr.mmap || event->attr.mmap_data)
- atomic_inc(&nr_mmap_events);
- if (event->attr.comm)
- atomic_inc(&nr_comm_events);
- if (event->attr.task)
- atomic_inc(&nr_task_events);
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
- err = get_callchain_buffers();
- if (err) {
- free_event(event);
- return ERR_PTR(err);
- }
- }
- }
-
- return event;
-}
-
-static int perf_copy_attr(struct perf_event_attr __user *uattr,
- struct perf_event_attr *attr)
-{
- u32 size;
- int ret;
-
- if (!access_ok(VERIFY_WRITE, uattr, PERF_ATTR_SIZE_VER0))
- return -EFAULT;
-
- /*
- * zero the full structure, so that a short copy will be nice.
- */
- memset(attr, 0, sizeof(*attr));
-
- ret = get_user(size, &uattr->size);
- if (ret)
- return ret;
-
- if (size > PAGE_SIZE) /* silly large */
- goto err_size;
-
- if (!size) /* abi compat */
- size = PERF_ATTR_SIZE_VER0;
-
- if (size < PERF_ATTR_SIZE_VER0)
- goto err_size;
-
- /*
- * If we're handed a bigger struct than we know of,
- * ensure all the unknown bits are 0 - i.e. new
- * user-space does not rely on any kernel feature
- * extensions we dont know about yet.
- */
- if (size > sizeof(*attr)) {
- unsigned char __user *addr;
- unsigned char __user *end;
- unsigned char val;
-
- addr = (void __user *)uattr + sizeof(*attr);
- end = (void __user *)uattr + size;
-
- for (; addr < end; addr++) {
- ret = get_user(val, addr);
- if (ret)
- return ret;
- if (val)
- goto err_size;
- }
- size = sizeof(*attr);
- }
-
- ret = copy_from_user(attr, uattr, size);
- if (ret)
- return -EFAULT;
-
- /*
- * If the type exists, the corresponding creation will verify
- * the attr->config.
- */
- if (attr->type >= PERF_TYPE_MAX)
- return -EINVAL;
-
- if (attr->__reserved_1)
- return -EINVAL;
-
- if (attr->sample_type & ~(PERF_SAMPLE_MAX-1))
- return -EINVAL;
-
- if (attr->read_format & ~(PERF_FORMAT_MAX-1))
- return -EINVAL;
-
-out:
- return ret;
-
-err_size:
- put_user(sizeof(*attr), &uattr->size);
- ret = -E2BIG;
- goto out;
-}
-
-static int
-perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
-{
- struct perf_buffer *buffer = NULL, *old_buffer = NULL;
- int ret = -EINVAL;
-
- if (!output_event)
- goto set;
-
- /* don't allow circular references */
- if (event == output_event)
- goto out;
-
- /*
- * Don't allow cross-cpu buffers
- */
- if (output_event->cpu != event->cpu)
- goto out;
-
- /*
- * If its not a per-cpu buffer, it must be the same task.
- */
- if (output_event->cpu == -1 && output_event->ctx != event->ctx)
- goto out;
-
-set:
- mutex_lock(&event->mmap_mutex);
- /* Can't redirect output if we've got an active mmap() */
- if (atomic_read(&event->mmap_count))
- goto unlock;
-
- if (output_event) {
- /* get the buffer we want to redirect to */
- buffer = perf_buffer_get(output_event);
- if (!buffer)
- goto unlock;
- }
-
- old_buffer = event->buffer;
- rcu_assign_pointer(event->buffer, buffer);
- ret = 0;
-unlock:
- mutex_unlock(&event->mmap_mutex);
-
- if (old_buffer)
- perf_buffer_put(old_buffer);
-out:
- return ret;
-}
-
-/**
- * sys_perf_event_open - open a performance event, associate it to a task/cpu
- *
- * @attr_uptr: event_id type attributes for monitoring/sampling
- * @pid: target pid
- * @cpu: target cpu
- * @group_fd: group leader event fd
- */
-SYSCALL_DEFINE5(perf_event_open,
- struct perf_event_attr __user *, attr_uptr,
- pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
-{
- struct perf_event *group_leader = NULL, *output_event = NULL;
- struct perf_event *event, *sibling;
- struct perf_event_attr attr;
- struct perf_event_context *ctx;
- struct file *event_file = NULL;
- struct file *group_file = NULL;
- struct task_struct *task = NULL;
- struct pmu *pmu;
- int event_fd;
- int move_group = 0;
- int fput_needed = 0;
- int err;
-
- /* for future expandability... */
- if (flags & ~PERF_FLAG_ALL)
- return -EINVAL;
-
- err = perf_copy_attr(attr_uptr, &attr);
- if (err)
- return err;
-
- if (!attr.exclude_kernel) {
- if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
- return -EACCES;
- }
-
- if (attr.freq) {
- if (attr.sample_freq > sysctl_perf_event_sample_rate)
- return -EINVAL;
- }
-
- /*
- * In cgroup mode, the pid argument is used to pass the fd
- * opened to the cgroup directory in cgroupfs. The cpu argument
- * designates the cpu on which to monitor threads from that
- * cgroup.
- */
- if ((flags & PERF_FLAG_PID_CGROUP) && (pid == -1 || cpu == -1))
- return -EINVAL;
-
- event_fd = get_unused_fd_flags(O_RDWR);
- if (event_fd < 0)
- return event_fd;
-
- if (group_fd != -1) {
- group_leader = perf_fget_light(group_fd, &fput_needed);
- if (IS_ERR(group_leader)) {
- err = PTR_ERR(group_leader);
- goto err_fd;
- }
- group_file = group_leader->filp;
- if (flags & PERF_FLAG_FD_OUTPUT)
- output_event = group_leader;
- if (flags & PERF_FLAG_FD_NO_GROUP)
- group_leader = NULL;
- }
-
- if (pid != -1 && !(flags & PERF_FLAG_PID_CGROUP)) {
- task = find_lively_task_by_vpid(pid);
- if (IS_ERR(task)) {
- err = PTR_ERR(task);
- goto err_group_fd;
- }
- }
-
- event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
- if (IS_ERR(event)) {
- err = PTR_ERR(event);
- goto err_task;
- }
-
- if (flags & PERF_FLAG_PID_CGROUP) {
- err = perf_cgroup_connect(pid, event, &attr, group_leader);
- if (err)
- goto err_alloc;
- /*
- * one more event:
- * - that has cgroup constraint on event->cpu
- * - that may need work on context switch
- */
- atomic_inc(&per_cpu(perf_cgroup_events, event->cpu));
- jump_label_inc(&perf_sched_events);
- }
-
- /*
- * Special case software events and allow them to be part of
- * any hardware group.
- */
- pmu = event->pmu;
-
- if (group_leader &&
- (is_software_event(event) != is_software_event(group_leader))) {
- if (is_software_event(event)) {
- /*
- * If event and group_leader are not both a software
- * event, and event is, then group leader is not.
- *
- * Allow the addition of software events to !software
- * groups, this is safe because software events never
- * fail to schedule.
- */
- pmu = group_leader->pmu;
- } else if (is_software_event(group_leader) &&
- (group_leader->group_flags & PERF_GROUP_SOFTWARE)) {
- /*
- * In case the group is a pure software group, and we
- * try to add a hardware event, move the whole group to
- * the hardware context.
- */
- move_group = 1;
- }
- }
-
- /*
- * Get the target context (task or percpu):
- */
- ctx = find_get_context(pmu, task, cpu);
- if (IS_ERR(ctx)) {
- err = PTR_ERR(ctx);
- goto err_alloc;
- }
-
- if (task) {
- put_task_struct(task);
- task = NULL;
- }
-
- /*
- * Look up the group leader (we will attach this event to it):
- */
- if (group_leader) {
- err = -EINVAL;
-
- /*
- * Do not allow a recursive hierarchy (this new sibling
- * becoming part of another group-sibling):
- */
- if (group_leader->group_leader != group_leader)
- goto err_context;
- /*
- * Do not allow to attach to a group in a different
- * task or CPU context:
- */
- if (move_group) {
- if (group_leader->ctx->type != ctx->type)
- goto err_context;
- } else {
- if (group_leader->ctx != ctx)
- goto err_context;
- }
-
- /*
- * Only a group leader can be exclusive or pinned
- */
- if (attr.exclusive || attr.pinned)
- goto err_context;
- }
-
- if (output_event) {
- err = perf_event_set_output(event, output_event);
- if (err)
- goto err_context;
- }
-
- event_file = anon_inode_getfile("[perf_event]", &perf_fops, event, O_RDWR);
- if (IS_ERR(event_file)) {
- err = PTR_ERR(event_file);
- goto err_context;
- }
-
- if (move_group) {
- struct perf_event_context *gctx = group_leader->ctx;
-
- mutex_lock(&gctx->mutex);
- perf_remove_from_context(group_leader);
- list_for_each_entry(sibling, &group_leader->sibling_list,
- group_entry) {
- perf_remove_from_context(sibling);
- put_ctx(gctx);
- }
- mutex_unlock(&gctx->mutex);
- put_ctx(gctx);
- }
-
- event->filp = event_file;
- WARN_ON_ONCE(ctx->parent_ctx);
- mutex_lock(&ctx->mutex);
-
- if (move_group) {
- perf_install_in_context(ctx, group_leader, cpu);
- get_ctx(ctx);
- list_for_each_entry(sibling, &group_leader->sibling_list,
- group_entry) {
- perf_install_in_context(ctx, sibling, cpu);
- get_ctx(ctx);
- }
- }
-
- perf_install_in_context(ctx, event, cpu);
- ++ctx->generation;
- perf_unpin_context(ctx);
- mutex_unlock(&ctx->mutex);
-
- event->owner = current;
-
- mutex_lock(¤t->perf_event_mutex);
- list_add_tail(&event->owner_entry, ¤t->perf_event_list);
- mutex_unlock(¤t->perf_event_mutex);
-
- /*
- * Precalculate sample_data sizes
- */
- perf_event__header_size(event);
- perf_event__id_header_size(event);
-
- /*
- * Drop the reference on the group_event after placing the
- * new event on the sibling_list. This ensures destruction
- * of the group leader will find the pointer to itself in
- * perf_group_detach().
- */
- fput_light(group_file, fput_needed);
- fd_install(event_fd, event_file);
- return event_fd;
-
-err_context:
- perf_unpin_context(ctx);
- put_ctx(ctx);
-err_alloc:
- free_event(event);
-err_task:
- if (task)
- put_task_struct(task);
-err_group_fd:
- fput_light(group_file, fput_needed);
-err_fd:
- put_unused_fd(event_fd);
- return err;
-}
-
-/**
- * perf_event_create_kernel_counter
- *
- * @attr: attributes of the counter to create
- * @cpu: cpu in which the counter is bound
- * @task: task to profile (NULL for percpu)
- */
-struct perf_event *
-perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
- struct task_struct *task,
- perf_overflow_handler_t overflow_handler)
-{
- struct perf_event_context *ctx;
- struct perf_event *event;
- int err;
-
- /*
- * Get the target context (task or percpu):
- */
-
- event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
- if (IS_ERR(event)) {
- err = PTR_ERR(event);
- goto err;
- }
-
- ctx = find_get_context(event->pmu, task, cpu);
- if (IS_ERR(ctx)) {
- err = PTR_ERR(ctx);
- goto err_free;
- }
-
- event->filp = NULL;
- WARN_ON_ONCE(ctx->parent_ctx);
- mutex_lock(&ctx->mutex);
- perf_install_in_context(ctx, event, cpu);
- ++ctx->generation;
- perf_unpin_context(ctx);
- mutex_unlock(&ctx->mutex);
-
- return event;
-
-err_free:
- free_event(event);
-err:
- return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
-
-static void sync_child_event(struct perf_event *child_event,
- struct task_struct *child)
-{
- struct perf_event *parent_event = child_event->parent;
- u64 child_val;
-
- if (child_event->attr.inherit_stat)
- perf_event_read_event(child_event, child);
-
- child_val = perf_event_count(child_event);
-
- /*
- * Add back the child's count to the parent's count:
- */
- atomic64_add(child_val, &parent_event->child_count);
- atomic64_add(child_event->total_time_enabled,
- &parent_event->child_total_time_enabled);
- atomic64_add(child_event->total_time_running,
- &parent_event->child_total_time_running);
-
- /*
- * Remove this event from the parent's list
- */
- WARN_ON_ONCE(parent_event->ctx->parent_ctx);
- mutex_lock(&parent_event->child_mutex);
- list_del_init(&child_event->child_list);
- mutex_unlock(&parent_event->child_mutex);
-
- /*
- * Release the parent event, if this was the last
- * reference to it.
- */
- fput(parent_event->filp);
-}
-
-static void
-__perf_event_exit_task(struct perf_event *child_event,
- struct perf_event_context *child_ctx,
- struct task_struct *child)
-{
- if (child_event->parent) {
- raw_spin_lock_irq(&child_ctx->lock);
- perf_group_detach(child_event);
- raw_spin_unlock_irq(&child_ctx->lock);
- }
-
- perf_remove_from_context(child_event);
-
- /*
- * It can happen that the parent exits first, and has events
- * that are still around due to the child reference. These
- * events need to be zapped.
- */
- if (child_event->parent) {
- sync_child_event(child_event, child);
- free_event(child_event);
- }
-}
-
-static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
-{
- struct perf_event *child_event, *tmp;
- struct perf_event_context *child_ctx;
- unsigned long flags;
-
- if (likely(!child->perf_event_ctxp[ctxn])) {
- perf_event_task(child, NULL, 0);
- return;
- }
-
- local_irq_save(flags);
- /*
- * We can't reschedule here because interrupts are disabled,
- * and either child is current or it is a task that can't be
- * scheduled, so we are now safe from rescheduling changing
- * our context.
- */
- child_ctx = rcu_dereference_raw(child->perf_event_ctxp[ctxn]);
- task_ctx_sched_out(child_ctx, EVENT_ALL);
-
- /*
- * Take the context lock here so that if find_get_context is
- * reading child->perf_event_ctxp, we wait until it has
- * incremented the context's refcount before we do put_ctx below.
- */
- raw_spin_lock(&child_ctx->lock);
- child->perf_event_ctxp[ctxn] = NULL;
- /*
- * If this context is a clone; unclone it so it can't get
- * swapped to another process while we're removing all
- * the events from it.
- */
- unclone_ctx(child_ctx);
- update_context_time(child_ctx);
- raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
-
- /*
- * Report the task dead after unscheduling the events so that we
- * won't get any samples after PERF_RECORD_EXIT. We can however still
- * get a few PERF_RECORD_READ events.
- */
- perf_event_task(child, child_ctx, 0);
-
- /*
- * We can recurse on the same lock type through:
- *
- * __perf_event_exit_task()
- * sync_child_event()
- * fput(parent_event->filp)
- * perf_release()
- * mutex_lock(&ctx->mutex)
- *
- * But since its the parent context it won't be the same instance.
- */
- mutex_lock(&child_ctx->mutex);
-
-again:
- list_for_each_entry_safe(child_event, tmp, &child_ctx->pinned_groups,
- group_entry)
- __perf_event_exit_task(child_event, child_ctx, child);
-
- list_for_each_entry_safe(child_event, tmp, &child_ctx->flexible_groups,
- group_entry)
- __perf_event_exit_task(child_event, child_ctx, child);
-
- /*
- * If the last event was a group event, it will have appended all
- * its siblings to the list, but we obtained 'tmp' before that which
- * will still point to the list head terminating the iteration.
- */
- if (!list_empty(&child_ctx->pinned_groups) ||
- !list_empty(&child_ctx->flexible_groups))
- goto again;
-
- mutex_unlock(&child_ctx->mutex);
-
- put_ctx(child_ctx);
-}
-
-/*
- * When a child task exits, feed back event values to parent events.
- */
-void perf_event_exit_task(struct task_struct *child)
-{
- struct perf_event *event, *tmp;
- int ctxn;
-
- mutex_lock(&child->perf_event_mutex);
- list_for_each_entry_safe(event, tmp, &child->perf_event_list,
- owner_entry) {
- list_del_init(&event->owner_entry);
-
- /*
- * Ensure the list deletion is visible before we clear
- * the owner, closes a race against perf_release() where
- * we need to serialize on the owner->perf_event_mutex.
- */
- smp_wmb();
- event->owner = NULL;
- }
- mutex_unlock(&child->perf_event_mutex);
-
- for_each_task_context_nr(ctxn)
- perf_event_exit_task_context(child, ctxn);
-}
-
-static void perf_free_event(struct perf_event *event,
- struct perf_event_context *ctx)
-{
- struct perf_event *parent = event->parent;
-
- if (WARN_ON_ONCE(!parent))
- return;
-
- mutex_lock(&parent->child_mutex);
- list_del_init(&event->child_list);
- mutex_unlock(&parent->child_mutex);
-
- fput(parent->filp);
-
- perf_group_detach(event);
- list_del_event(event, ctx);
- free_event(event);
-}
-
-/*
- * free an unexposed, unused context as created by inheritance by
- * perf_event_init_task below, used by fork() in case of fail.
- */
-void perf_event_free_task(struct task_struct *task)
-{
- struct perf_event_context *ctx;
- struct perf_event *event, *tmp;
- int ctxn;
-
- for_each_task_context_nr(ctxn) {
- ctx = task->perf_event_ctxp[ctxn];
- if (!ctx)
- continue;
-
- mutex_lock(&ctx->mutex);
-again:
- list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
- group_entry)
- perf_free_event(event, ctx);
-
- list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
- group_entry)
- perf_free_event(event, ctx);
-
- if (!list_empty(&ctx->pinned_groups) ||
- !list_empty(&ctx->flexible_groups))
- goto again;
-
- mutex_unlock(&ctx->mutex);
-
- put_ctx(ctx);
- }
-}
-
-void perf_event_delayed_put(struct task_struct *task)
-{
- int ctxn;
-
- for_each_task_context_nr(ctxn)
- WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
-}
-
-/*
- * inherit a event from parent task to child task:
- */
-static struct perf_event *
-inherit_event(struct perf_event *parent_event,
- struct task_struct *parent,
- struct perf_event_context *parent_ctx,
- struct task_struct *child,
- struct perf_event *group_leader,
- struct perf_event_context *child_ctx)
-{
- struct perf_event *child_event;
- unsigned long flags;
-
- /*
- * Instead of creating recursive hierarchies of events,
- * we link inherited events back to the original parent,
- * which has a filp for sure, which we use as the reference
- * count:
- */
- if (parent_event->parent)
- parent_event = parent_event->parent;
-
- child_event = perf_event_alloc(&parent_event->attr,
- parent_event->cpu,
- child,
- group_leader, parent_event,
- NULL);
- if (IS_ERR(child_event))
- return child_event;
- get_ctx(child_ctx);
-
- /*
- * Make the child state follow the state of the parent event,
- * not its attr.disabled bit. We hold the parent's mutex,
- * so we won't race with perf_event_{en, dis}able_family.
- */
- if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
- child_event->state = PERF_EVENT_STATE_INACTIVE;
- else
- child_event->state = PERF_EVENT_STATE_OFF;
-
- if (parent_event->attr.freq) {
- u64 sample_period = parent_event->hw.sample_period;
- struct hw_perf_event *hwc = &child_event->hw;
-
- hwc->sample_period = sample_period;
- hwc->last_period = sample_period;
-
- local64_set(&hwc->period_left, sample_period);
- }
-
- child_event->ctx = child_ctx;
- child_event->overflow_handler = parent_event->overflow_handler;
-
- /*
- * Precalculate sample_data sizes
- */
- perf_event__header_size(child_event);
- perf_event__id_header_size(child_event);
-
- /*
- * Link it up in the child's context:
- */
- raw_spin_lock_irqsave(&child_ctx->lock, flags);
- add_event_to_ctx(child_event, child_ctx);
- raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
-
- /*
- * Get a reference to the parent filp - we will fput it
- * when the child event exits. This is safe to do because
- * we are in the parent and we know that the filp still
- * exists and has a nonzero count:
- */
- atomic_long_inc(&parent_event->filp->f_count);
-
- /*
- * Link this into the parent event's child list
- */
- WARN_ON_ONCE(parent_event->ctx->parent_ctx);
- mutex_lock(&parent_event->child_mutex);
- list_add_tail(&child_event->child_list, &parent_event->child_list);
- mutex_unlock(&parent_event->child_mutex);
-
- return child_event;
-}
-
-static int inherit_group(struct perf_event *parent_event,
- struct task_struct *parent,
- struct perf_event_context *parent_ctx,
- struct task_struct *child,
- struct perf_event_context *child_ctx)
-{
- struct perf_event *leader;
- struct perf_event *sub;
- struct perf_event *child_ctr;
-
- leader = inherit_event(parent_event, parent, parent_ctx,
- child, NULL, child_ctx);
- if (IS_ERR(leader))
- return PTR_ERR(leader);
- list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
- child_ctr = inherit_event(sub, parent, parent_ctx,
- child, leader, child_ctx);
- if (IS_ERR(child_ctr))
- return PTR_ERR(child_ctr);
- }
- return 0;
-}
-
-static int
-inherit_task_group(struct perf_event *event, struct task_struct *parent,
- struct perf_event_context *parent_ctx,
- struct task_struct *child, int ctxn,
- int *inherited_all)
-{
- int ret;
- struct perf_event_context *child_ctx;
-
- if (!event->attr.inherit) {
- *inherited_all = 0;
- return 0;
- }
-
- child_ctx = child->perf_event_ctxp[ctxn];
- if (!child_ctx) {
- /*
- * This is executed from the parent task context, so
- * inherit events that have been marked for cloning.
- * First allocate and initialize a context for the
- * child.
- */
-
- child_ctx = alloc_perf_context(event->pmu, child);
- if (!child_ctx)
- return -ENOMEM;
-
- child->perf_event_ctxp[ctxn] = child_ctx;
- }
-
- ret = inherit_group(event, parent, parent_ctx,
- child, child_ctx);
-
- if (ret)
- *inherited_all = 0;
-
- return ret;
-}
-
-/*
- * Initialize the perf_event context in task_struct
- */
-int perf_event_init_context(struct task_struct *child, int ctxn)
-{
- struct perf_event_context *child_ctx, *parent_ctx;
- struct perf_event_context *cloned_ctx;
- struct perf_event *event;
- struct task_struct *parent = current;
- int inherited_all = 1;
- unsigned long flags;
- int ret = 0;
-
- if (likely(!parent->perf_event_ctxp[ctxn]))
- return 0;
-
- /*
- * If the parent's context is a clone, pin it so it won't get
- * swapped under us.
- */
- parent_ctx = perf_pin_task_context(parent, ctxn);
-
- /*
- * No need to check if parent_ctx != NULL here; since we saw
- * it non-NULL earlier, the only reason for it to become NULL
- * is if we exit, and since we're currently in the middle of
- * a fork we can't be exiting at the same time.
- */
-
- /*
- * Lock the parent list. No need to lock the child - not PID
- * hashed yet and not running, so nobody can access it.
- */
- mutex_lock(&parent_ctx->mutex);
-
- /*
- * We dont have to disable NMIs - we are only looking at
- * the list, not manipulating it:
- */
- list_for_each_entry(event, &parent_ctx->pinned_groups, group_entry) {
- ret = inherit_task_group(event, parent, parent_ctx,
- child, ctxn, &inherited_all);
- if (ret)
- break;
- }
-
- /*
- * We can't hold ctx->lock when iterating the ->flexible_group list due
- * to allocations, but we need to prevent rotation because
- * rotate_ctx() will change the list from interrupt context.
- */
- raw_spin_lock_irqsave(&parent_ctx->lock, flags);
- parent_ctx->rotate_disable = 1;
- raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
-
- list_for_each_entry(event, &parent_ctx->flexible_groups, group_entry) {
- ret = inherit_task_group(event, parent, parent_ctx,
- child, ctxn, &inherited_all);
- if (ret)
- break;
- }
-
- raw_spin_lock_irqsave(&parent_ctx->lock, flags);
- parent_ctx->rotate_disable = 0;
-
- child_ctx = child->perf_event_ctxp[ctxn];
-
- if (child_ctx && inherited_all) {
- /*
- * Mark the child context as a clone of the parent
- * context, or of whatever the parent is a clone of.
- *
- * Note that if the parent is a clone, the holding of
- * parent_ctx->lock avoids it from being uncloned.
- */
- cloned_ctx = parent_ctx->parent_ctx;
- if (cloned_ctx) {
- child_ctx->parent_ctx = cloned_ctx;
- child_ctx->parent_gen = parent_ctx->parent_gen;
- } else {
- child_ctx->parent_ctx = parent_ctx;
- child_ctx->parent_gen = parent_ctx->generation;
- }
- get_ctx(child_ctx->parent_ctx);
- }
-
- raw_spin_unlock_irqrestore(&parent_ctx->lock, flags);
- mutex_unlock(&parent_ctx->mutex);
-
- perf_unpin_context(parent_ctx);
- put_ctx(parent_ctx);
-
- return ret;
-}
-
-/*
- * Initialize the perf_event context in task_struct
- */
-int perf_event_init_task(struct task_struct *child)
-{
- int ctxn, ret;
-
- memset(child->perf_event_ctxp, 0, sizeof(child->perf_event_ctxp));
- mutex_init(&child->perf_event_mutex);
- INIT_LIST_HEAD(&child->perf_event_list);
-
- for_each_task_context_nr(ctxn) {
- ret = perf_event_init_context(child, ctxn);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
-static void __init perf_event_init_all_cpus(void)
-{
- struct swevent_htable *swhash;
- int cpu;
-
- for_each_possible_cpu(cpu) {
- swhash = &per_cpu(swevent_htable, cpu);
- mutex_init(&swhash->hlist_mutex);
- INIT_LIST_HEAD(&per_cpu(rotation_list, cpu));
- }
-}
-
-static void __cpuinit perf_event_init_cpu(int cpu)
-{
- struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
-
- mutex_lock(&swhash->hlist_mutex);
- if (swhash->hlist_refcount > 0) {
- struct swevent_hlist *hlist;
-
- hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
- WARN_ON(!hlist);
- rcu_assign_pointer(swhash->swevent_hlist, hlist);
- }
- mutex_unlock(&swhash->hlist_mutex);
-}
-
-#if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC
-static void perf_pmu_rotate_stop(struct pmu *pmu)
-{
- struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
-
- WARN_ON(!irqs_disabled());
-
- list_del_init(&cpuctx->rotation_list);
-}
-
-static void __perf_event_exit_context(void *__info)
-{
- struct perf_event_context *ctx = __info;
- struct perf_event *event, *tmp;
-
- perf_pmu_rotate_stop(ctx->pmu);
-
- list_for_each_entry_safe(event, tmp, &ctx->pinned_groups, group_entry)
- __perf_remove_from_context(event);
- list_for_each_entry_safe(event, tmp, &ctx->flexible_groups, group_entry)
- __perf_remove_from_context(event);
-}
-
-static void perf_event_exit_cpu_context(int cpu)
-{
- struct perf_event_context *ctx;
- struct pmu *pmu;
- int idx;
-
- idx = srcu_read_lock(&pmus_srcu);
- list_for_each_entry_rcu(pmu, &pmus, entry) {
- ctx = &per_cpu_ptr(pmu->pmu_cpu_context, cpu)->ctx;
-
- mutex_lock(&ctx->mutex);
- smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
- mutex_unlock(&ctx->mutex);
- }
- srcu_read_unlock(&pmus_srcu, idx);
-}
-
-static void perf_event_exit_cpu(int cpu)
-{
- struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
-
- mutex_lock(&swhash->hlist_mutex);
- swevent_hlist_release(swhash);
- mutex_unlock(&swhash->hlist_mutex);
-
- perf_event_exit_cpu_context(cpu);
-}
-#else
-static inline void perf_event_exit_cpu(int cpu) { }
-#endif
-
-static int
-perf_reboot(struct notifier_block *notifier, unsigned long val, void *v)
-{
- int cpu;
-
- for_each_online_cpu(cpu)
- perf_event_exit_cpu(cpu);
-
- return NOTIFY_OK;
-}
-
-/*
- * Run the perf reboot notifier at the very last possible moment so that
- * the generic watchdog code runs as long as possible.
- */
-static struct notifier_block perf_reboot_notifier = {
- .notifier_call = perf_reboot,
- .priority = INT_MIN,
-};
-
-static int __cpuinit
-perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
-{
- unsigned int cpu = (long)hcpu;
-
- switch (action & ~CPU_TASKS_FROZEN) {
-
- case CPU_UP_PREPARE:
- case CPU_DOWN_FAILED:
- perf_event_init_cpu(cpu);
- break;
-
- case CPU_UP_CANCELED:
- case CPU_DOWN_PREPARE:
- perf_event_exit_cpu(cpu);
- break;
-
- default:
- break;
- }
-
- return NOTIFY_OK;
-}
-
-void __init perf_event_init(void)
-{
- int ret;
-
- idr_init(&pmu_idr);
-
- perf_event_init_all_cpus();
- init_srcu_struct(&pmus_srcu);
- perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
- perf_pmu_register(&perf_cpu_clock, NULL, -1);
- perf_pmu_register(&perf_task_clock, NULL, -1);
- perf_tp_register();
- perf_cpu_notifier(perf_cpu_notify);
- register_reboot_notifier(&perf_reboot_notifier);
-
- ret = init_hw_breakpoint();
- WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
-}
-
-static int __init perf_event_sysfs_init(void)
-{
- struct pmu *pmu;
- int ret;
-
- mutex_lock(&pmus_lock);
-
- ret = bus_register(&pmu_bus);
- if (ret)
- goto unlock;
-
- list_for_each_entry(pmu, &pmus, entry) {
- if (!pmu->name || pmu->type < 0)
- continue;
-
- ret = pmu_dev_alloc(pmu);
- WARN(ret, "Failed to register pmu: %s, reason %d\n", pmu->name, ret);
- }
- pmu_bus_running = 1;
- ret = 0;
-
-unlock:
- mutex_unlock(&pmus_lock);
-
- return ret;
-}
-device_initcall(perf_event_sysfs_init);
-
-#ifdef CONFIG_CGROUP_PERF
-static struct cgroup_subsys_state *perf_cgroup_create(
- struct cgroup_subsys *ss, struct cgroup *cont)
-{
- struct perf_cgroup *jc;
-
- jc = kzalloc(sizeof(*jc), GFP_KERNEL);
- if (!jc)
- return ERR_PTR(-ENOMEM);
-
- jc->info = alloc_percpu(struct perf_cgroup_info);
- if (!jc->info) {
- kfree(jc);
- return ERR_PTR(-ENOMEM);
- }
-
- return &jc->css;
-}
-
-static void perf_cgroup_destroy(struct cgroup_subsys *ss,
- struct cgroup *cont)
-{
- struct perf_cgroup *jc;
- jc = container_of(cgroup_subsys_state(cont, perf_subsys_id),
- struct perf_cgroup, css);
- free_percpu(jc->info);
- kfree(jc);
-}
-
-static int __perf_cgroup_move(void *info)
-{
- struct task_struct *task = info;
- perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN);
- return 0;
-}
-
-static void perf_cgroup_move(struct task_struct *task)
-{
- task_function_call(task, __perf_cgroup_move, task);
-}
-
-static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct cgroup *old_cgrp, struct task_struct *task,
- bool threadgroup)
-{
- perf_cgroup_move(task);
- if (threadgroup) {
- struct task_struct *c;
- rcu_read_lock();
- list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
- perf_cgroup_move(c);
- }
- rcu_read_unlock();
- }
-}
-
-static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
- struct cgroup *old_cgrp, struct task_struct *task)
-{
- /*
- * cgroup_exit() is called in the copy_process() failure path.
- * Ignore this case since the task hasn't ran yet, this avoids
- * trying to poke a half freed task state from generic code.
- */
- if (!(task->flags & PF_EXITING))
- return;
-
- perf_cgroup_move(task);
-}
-
-struct cgroup_subsys perf_subsys = {
- .name = "perf_event",
- .subsys_id = perf_subsys_id,
- .create = perf_cgroup_create,
- .destroy = perf_cgroup_destroy,
- .exit = perf_cgroup_exit,
- .attach = perf_cgroup_attach,
-};
-#endif /* CONFIG_CGROUP_PERF */
code. This is helpful when debugging and reporting PM bugs, like
suspend support.
-config PM_VERBOSE
- bool "Verbose Power Management debugging"
- depends on PM_DEBUG
- ---help---
- This option enables verbose messages from the Power Management code.
-
config PM_ADVANCED_DEBUG
bool "Extra PM attributes in sysfs for low-level debugging/testing"
depends on PM_DEBUG
representing individual voltage domains and provides SOC
implementations a ready to use framework to manage OPPs.
For more information, read <file:Documentation/power/opp.txt>
+
+config PM_RUNTIME_CLK
+ def_bool y
+ depends on PM_RUNTIME && HAVE_CLK
local_irq_disable();
- error = sysdev_suspend(PMSG_FREEZE);
- if (!error)
- error = syscore_suspend();
+ error = syscore_suspend();
if (error) {
printk(KERN_ERR "PM: Some system devices failed to power down, "
"aborting hibernation\n");
Power_up:
syscore_resume();
- sysdev_resume();
/* NOTE: dpm_resume_noirq() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
int hibernation_snapshot(int platform_mode)
{
+ pm_message_t msg = PMSG_RECOVER;
int error;
error = platform_begin(platform_mode);
if (error)
goto Close;
+ error = dpm_prepare(PMSG_FREEZE);
+ if (error)
+ goto Complete_devices;
+
/* Preallocate image memory before shutting down devices. */
error = hibernate_preallocate_memory();
if (error)
- goto Close;
+ goto Complete_devices;
suspend_console();
pm_restrict_gfp_mask();
- error = dpm_suspend_start(PMSG_FREEZE);
+ error = dpm_suspend(PMSG_FREEZE);
if (error)
goto Recover_platform;
if (error || !in_suspend)
swsusp_free();
- dpm_resume_end(in_suspend ?
- (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
+ msg = in_suspend ? (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE;
+ dpm_resume(msg);
if (error || !in_suspend)
pm_restore_gfp_mask();
resume_console();
+
+ Complete_devices:
+ dpm_complete(msg);
+
Close:
platform_end(platform_mode);
return error;
local_irq_disable();
- error = sysdev_suspend(PMSG_QUIESCE);
- if (!error)
- error = syscore_suspend();
+ error = syscore_suspend();
if (error)
goto Enable_irqs;
touch_softlockup_watchdog();
syscore_resume();
- sysdev_resume();
Enable_irqs:
local_irq_enable();
goto Platform_finish;
local_irq_disable();
- sysdev_suspend(PMSG_HIBERNATE);
syscore_suspend();
if (pm_wakeup_pending()) {
error = -EAGAIN;
Power_up:
syscore_resume();
- sysdev_resume();
local_irq_enable();
enable_nonboot_cpus();
power_attr(image_size);
+static ssize_t reserved_size_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%lu\n", reserved_size);
+}
+
+static ssize_t reserved_size_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ unsigned long size;
+
+ if (sscanf(buf, "%lu", &size) == 1) {
+ reserved_size = size;
+ return n;
+ }
+
+ return -EINVAL;
+}
+
+power_attr(reserved_size);
+
static struct attribute * g[] = {
&disk_attr.attr,
&resume_attr.attr,
&image_size_attr.attr,
+ &reserved_size_attr.attr,
NULL,
};
if (error)
return error;
hibernate_image_size_init();
+ hibernate_reserved_size_init();
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
#ifdef CONFIG_HIBERNATION
/* kernel/power/snapshot.c */
+extern void __init hibernate_reserved_size_init(void);
extern void __init hibernate_image_size_init(void);
#ifdef CONFIG_ARCH_HIBERNATION_HEADER
#else /* !CONFIG_HIBERNATION */
+static inline void hibernate_reserved_size_init(void) {}
static inline void hibernate_image_size_init(void) {}
#endif /* !CONFIG_HIBERNATION */
/* Preferred image size in bytes (default 500 MB) */
extern unsigned long image_size;
+/* Size of memory reserved for drivers (default SPARE_PAGES x PAGE_SIZE) */
+extern unsigned long reserved_size;
extern int in_suspend;
extern dev_t swsusp_resume_device;
extern sector_t swsusp_resume_block;
static void swsusp_set_page_forbidden(struct page *);
static void swsusp_unset_page_forbidden(struct page *);
+/*
+ * Number of bytes to reserve for memory allocations made by device drivers
+ * from their ->freeze() and ->freeze_noirq() callbacks so that they don't
+ * cause image creation to fail (tunable via /sys/power/reserved_size).
+ */
+unsigned long reserved_size;
+
+void __init hibernate_reserved_size_init(void)
+{
+ reserved_size = SPARE_PAGES * PAGE_SIZE;
+}
+
/*
* Preferred image size in bytes (tunable via /sys/power/image_size).
- * When it is set to N, the image creating code will do its best to
- * ensure the image size will not exceed N bytes, but if that is
- * impossible, it will try to create the smallest image possible.
+ * When it is set to N, swsusp will do its best to ensure the image
+ * size will not exceed N bytes, but if that is impossible, it will
+ * try to create the smallest image possible.
*/
unsigned long image_size;
void __init hibernate_image_size_init(void)
{
- image_size = (totalram_pages / 3) * PAGE_SIZE;
+ image_size = ((totalram_pages * 2) / 5) * PAGE_SIZE;
}
/* List of PBEs needed for restoring the pages that were allocated before
* frame in use. We also need a number of page frames to be free during
* hibernation for allocations made while saving the image and for device
* drivers, in case they need to allocate memory from their hibernation
- * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
- * respectively, both of which are rough estimates). To make this happen, we
- * compute the total number of available page frames and allocate at least
+ * callbacks (these two numbers are given by PAGES_FOR_IO (which is a rough
+ * estimate) and reserverd_size divided by PAGE_SIZE (which is tunable through
+ * /sys/power/reserved_size, respectively). To make this happen, we compute the
+ * total number of available page frames and allocate at least
*
- * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2
+ * + 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE)
*
* of them, which corresponds to the maximum size of a hibernation image.
*
count -= totalreserve_pages;
/* Compute the maximum number of saveable pages to leave in memory. */
- max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+ max_size = (count - (size + PAGES_FOR_IO)) / 2
+ - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);
/* Compute the desired number of image pages specified by image_size. */
size = DIV_ROUND_UP(image_size, PAGE_SIZE);
if (size > max_size)
arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled());
- error = sysdev_suspend(PMSG_SUSPEND);
- if (!error)
- error = syscore_suspend();
+ error = syscore_suspend();
if (!error) {
if (!(suspend_test(TEST_CORE) || pm_wakeup_pending())) {
error = suspend_ops->enter(state);
events_check_enabled = false;
}
syscore_resume();
- sysdev_resume();
}
arch_suspend_enable_irqs();
goto Close;
}
suspend_console();
- pm_restrict_gfp_mask();
suspend_test_start();
error = dpm_suspend_start(PMSG_SUSPEND);
if (error) {
if (suspend_test(TEST_DEVICES))
goto Recover_platform;
- suspend_enter(state);
+ error = suspend_enter(state);
Resume_devices:
suspend_test_start();
dpm_resume_end(PMSG_RESUME);
suspend_test_finish("resume devices");
- pm_restore_gfp_mask();
resume_console();
Close:
if (suspend_ops->end)
goto Finish;
pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+ pm_restrict_gfp_mask();
error = suspend_devices_and_enter(state);
+ pm_restore_gfp_mask();
Finish:
pr_debug("PM: Finishing wakeup.\n");
free_basic_memory_bitmaps();
data = filp->private_data;
free_all_swap_pages(data->swap);
- if (data->frozen)
+ if (data->frozen) {
+ pm_restore_gfp_mask();
thaw_processes();
+ }
pm_notifier_call_chain(data->mode == O_RDONLY ?
PM_POST_HIBERNATION : PM_POST_RESTORE);
atomic_inc(&snapshot_device_available);
* PM_HIBERNATION_PREPARE
*/
error = suspend_devices_and_enter(PM_SUSPEND_MEM);
+ data->ready = 0;
break;
case SNAPSHOT_PLATFORM_SUPPORT:
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#include <linux/regset.h>
+#include <linux/hw_breakpoint.h>
/*
return ret;
}
#endif /* CONFIG_COMPAT */
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+int ptrace_get_breakpoints(struct task_struct *tsk)
+{
+ if (atomic_inc_not_zero(&tsk->ptrace_bp_refcnt))
+ return 0;
+
+ return -1;
+}
+
+void ptrace_put_breakpoints(struct task_struct *tsk)
+{
+ if (atomic_dec_and_test(&tsk->ptrace_bp_refcnt))
+ flush_ptrace_hw_breakpoint(tsk);
+}
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
#endif
/*
- * sched_domains_mutex serializes calls to arch_init_sched_domains,
+ * sched_domains_mutex serializes calls to init_sched_domains,
* detach_destroy_domains and partition_sched_domains.
*/
static DEFINE_MUTEX(sched_domains_mutex);
u64 exec_clock;
u64 min_vruntime;
+#ifndef CONFIG_64BIT
+ u64 min_vruntime_copy;
+#endif
struct rb_root tasks_timeline;
struct rb_node *rb_leftmost;
*/
struct sched_entity *curr, *next, *last, *skip;
+#ifdef CONFIG_SCHED_DEBUG
unsigned int nr_spread_over;
+#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
struct rq *rq; /* cpu runqueue to which this cfs_rq is attached */
*/
struct root_domain {
atomic_t refcount;
+ struct rcu_head rcu;
cpumask_var_t span;
cpumask_var_t online;
u64 nohz_stamp;
unsigned char nohz_balance_kick;
#endif
- unsigned int skip_clock_update;
+ int skip_clock_update;
/* capture load from *all* tasks on this cpu: */
struct load_weight load;
unsigned int ttwu_count;
unsigned int ttwu_local;
#endif
+
+#ifdef CONFIG_SMP
+ struct task_struct *wake_list;
+#endif
};
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
#define rcu_dereference_check_sched_domain(p) \
rcu_dereference_check((p), \
- rcu_read_lock_sched_held() || \
+ rcu_read_lock_held() || \
lockdep_is_held(&sched_domains_mutex))
/*
* Return the group to which this tasks belongs.
*
* We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
* holds that lock for each task it moves into the cgroup. Therefore
* by holding that lock, we pin the task to the current cgroup.
*/
struct cgroup_subsys_state *css;
css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
- lockdep_is_held(&task_rq(p)->lock));
+ lockdep_is_held(&p->pi_lock));
tg = container_of(css, struct task_group, css);
return autogroup_task_group(p, tg);
{
s64 delta;
- if (rq->skip_clock_update)
+ if (rq->skip_clock_update > 0)
return;
delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
return rq->curr == p;
}
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
static inline int task_running(struct rq *rq, struct task_struct *p)
{
+#ifdef CONFIG_SMP
+ return p->on_cpu;
+#else
return task_current(rq, p);
+#endif
}
+#ifndef __ARCH_WANT_UNLOCKED_CTXSW
static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
{
+#ifdef CONFIG_SMP
+ /*
+ * We can optimise this out completely for !SMP, because the
+ * SMP rebalancing from interrupt is the only thing that cares
+ * here.
+ */
+ next->on_cpu = 1;
+#endif
}
static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
{
+#ifdef CONFIG_SMP
+ /*
+ * After ->on_cpu is cleared, the task can be moved to a different CPU.
+ * We must ensure this doesn't happen until the switch is completely
+ * finished.
+ */
+ smp_wmb();
+ prev->on_cpu = 0;
+#endif
#ifdef CONFIG_DEBUG_SPINLOCK
/* this is a valid case when another task releases the spinlock */
rq->lock.owner = current;
}
#else /* __ARCH_WANT_UNLOCKED_CTXSW */
-static inline int task_running(struct rq *rq, struct task_struct *p)
-{
-#ifdef CONFIG_SMP
- return p->oncpu;
-#else
- return task_current(rq, p);
-#endif
-}
-
static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
{
#ifdef CONFIG_SMP
* SMP rebalancing from interrupt is the only thing that cares
* here.
*/
- next->oncpu = 1;
+ next->on_cpu = 1;
#endif
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
raw_spin_unlock_irq(&rq->lock);
{
#ifdef CONFIG_SMP
/*
- * After ->oncpu is cleared, the task can be moved to a different CPU.
+ * After ->on_cpu is cleared, the task can be moved to a different CPU.
* We must ensure this doesn't happen until the switch is completely
* finished.
*/
smp_wmb();
- prev->oncpu = 0;
+ prev->on_cpu = 0;
#endif
#ifndef __ARCH_WANT_INTERRUPTS_ON_CTXSW
local_irq_enable();
#endif /* __ARCH_WANT_UNLOCKED_CTXSW */
/*
- * Check whether the task is waking, we use this to synchronize ->cpus_allowed
- * against ttwu().
- */
-static inline int task_is_waking(struct task_struct *p)
-{
- return unlikely(p->state == TASK_WAKING);
-}
-
-/*
- * __task_rq_lock - lock the runqueue a given task resides on.
- * Must be called interrupts disabled.
+ * __task_rq_lock - lock the rq @p resides on.
*/
static inline struct rq *__task_rq_lock(struct task_struct *p)
__acquires(rq->lock)
{
struct rq *rq;
+ lockdep_assert_held(&p->pi_lock);
+
for (;;) {
rq = task_rq(p);
raw_spin_lock(&rq->lock);
}
/*
- * task_rq_lock - lock the runqueue a given task resides on and disable
- * interrupts. Note the ordering: we can safely lookup the task_rq without
- * explicitly disabling preemption.
+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
*/
static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
+ __acquires(p->pi_lock)
__acquires(rq->lock)
{
struct rq *rq;
for (;;) {
- local_irq_save(*flags);
+ raw_spin_lock_irqsave(&p->pi_lock, *flags);
rq = task_rq(p);
raw_spin_lock(&rq->lock);
if (likely(rq == task_rq(p)))
return rq;
- raw_spin_unlock_irqrestore(&rq->lock, *flags);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
}
}
raw_spin_unlock(&rq->lock);
}
-static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)
+static inline void
+task_rq_unlock(struct rq *rq, struct task_struct *p, unsigned long *flags)
__releases(rq->lock)
+ __releases(p->pi_lock)
{
- raw_spin_unlock_irqrestore(&rq->lock, *flags);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
}
/*
int i;
struct sched_domain *sd;
+ rcu_read_lock();
for_each_domain(cpu, sd) {
- for_each_cpu(i, sched_domain_span(sd))
- if (!idle_cpu(i))
- return i;
+ for_each_cpu(i, sched_domain_span(sd)) {
+ if (!idle_cpu(i)) {
+ cpu = i;
+ goto unlock;
+ }
+ }
}
+unlock:
+ rcu_read_unlock();
return cpu;
}
/*
{
u64 tmp;
+ tmp = (u64)delta_exec * weight;
+
if (!lw->inv_weight) {
if (BITS_PER_LONG > 32 && unlikely(lw->weight >= WMULT_CONST))
lw->inv_weight = 1;
else
- lw->inv_weight = 1 + (WMULT_CONST-lw->weight/2)
- / (lw->weight+1);
+ lw->inv_weight = WMULT_CONST / lw->weight;
}
- tmp = (u64)delta_exec * weight;
/*
* Check whether we'd overflow the 64-bit multiplication:
*/
update_rq_clock(rq);
sched_info_queued(p);
p->sched_class->enqueue_task(rq, p, flags);
- p->se.on_rq = 1;
}
static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
update_rq_clock(rq);
sched_info_dequeued(p);
p->sched_class->dequeue_task(rq, p, flags);
- p->se.on_rq = 0;
}
/*
* A queue event has occurred, and we're going to schedule. In
* this case, we can save a useless back to back clock update.
*/
- if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
+ if (rq->curr->on_rq && test_tsk_need_resched(rq->curr))
rq->skip_clock_update = 1;
}
*/
WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+
+#ifdef CONFIG_LOCKDEP
+ WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
+ lockdep_is_held(&task_rq(p)->lock)));
+#endif
#endif
trace_sched_migrate_task(p, new_cpu);
static int migration_cpu_stop(void *data);
-/*
- * The task's runqueue lock must be held.
- * Returns true if you have to wait for migration thread.
- */
-static bool migrate_task(struct task_struct *p, struct rq *rq)
-{
- /*
- * If the task is not on a runqueue (and not running), then
- * the next wake-up will properly place the task.
- */
- return p->se.on_rq || task_running(rq, p);
-}
-
/*
* wait_task_inactive - wait for a thread to unschedule.
*
rq = task_rq_lock(p, &flags);
trace_sched_wait_task(p);
running = task_running(rq, p);
- on_rq = p->se.on_rq;
+ on_rq = p->on_rq;
ncsw = 0;
if (!match_state || p->state == match_state)
ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
/*
* If it changed from the expected state, bail out now.
#ifdef CONFIG_SMP
/*
- * ->cpus_allowed is protected by either TASK_WAKING or rq->lock held.
+ * ->cpus_allowed is protected by both rq->lock and p->pi_lock
*/
static int select_fallback_rq(int cpu, struct task_struct *p)
{
}
/*
- * The caller (fork, wakeup) owns TASK_WAKING, ->cpus_allowed is stable.
+ * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
*/
static inline
-int select_task_rq(struct rq *rq, struct task_struct *p, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
{
- int cpu = p->sched_class->select_task_rq(rq, p, sd_flags, wake_flags);
+ int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
/*
* In order not to call set_task_cpu() on a blocking task we need
}
#endif
-static inline void ttwu_activate(struct task_struct *p, struct rq *rq,
- bool is_sync, bool is_migrate, bool is_local,
- unsigned long en_flags)
+static void
+ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
{
+#ifdef CONFIG_SCHEDSTATS
+ struct rq *rq = this_rq();
+
+#ifdef CONFIG_SMP
+ int this_cpu = smp_processor_id();
+
+ if (cpu == this_cpu) {
+ schedstat_inc(rq, ttwu_local);
+ schedstat_inc(p, se.statistics.nr_wakeups_local);
+ } else {
+ struct sched_domain *sd;
+
+ schedstat_inc(p, se.statistics.nr_wakeups_remote);
+ rcu_read_lock();
+ for_each_domain(this_cpu, sd) {
+ if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
+ schedstat_inc(sd, ttwu_wake_remote);
+ break;
+ }
+ }
+ rcu_read_unlock();
+ }
+#endif /* CONFIG_SMP */
+
+ schedstat_inc(rq, ttwu_count);
schedstat_inc(p, se.statistics.nr_wakeups);
- if (is_sync)
+
+ if (wake_flags & WF_SYNC)
schedstat_inc(p, se.statistics.nr_wakeups_sync);
- if (is_migrate)
+
+ if (cpu != task_cpu(p))
schedstat_inc(p, se.statistics.nr_wakeups_migrate);
- if (is_local)
- schedstat_inc(p, se.statistics.nr_wakeups_local);
- else
- schedstat_inc(p, se.statistics.nr_wakeups_remote);
+#endif /* CONFIG_SCHEDSTATS */
+}
+
+static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
+{
activate_task(rq, p, en_flags);
+ p->on_rq = 1;
+
+ /* if a worker is waking up, notify workqueue */
+ if (p->flags & PF_WQ_WORKER)
+ wq_worker_waking_up(p, cpu_of(rq));
}
-static inline void ttwu_post_activation(struct task_struct *p, struct rq *rq,
- int wake_flags, bool success)
+/*
+ * Mark the task runnable and perform wakeup-preemption.
+ */
+static void
+ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
- trace_sched_wakeup(p, success);
+ trace_sched_wakeup(p, true);
check_preempt_curr(rq, p, wake_flags);
p->state = TASK_RUNNING;
rq->idle_stamp = 0;
}
#endif
- /* if a worker is waking up, notify workqueue */
- if ((p->flags & PF_WQ_WORKER) && success)
- wq_worker_waking_up(p, cpu_of(rq));
+}
+
+static void
+ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
+{
+#ifdef CONFIG_SMP
+ if (p->sched_contributes_to_load)
+ rq->nr_uninterruptible--;
+#endif
+
+ ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
+ ttwu_do_wakeup(rq, p, wake_flags);
+}
+
+/*
+ * Called in case the task @p isn't fully descheduled from its runqueue,
+ * in this case we must do a remote wakeup. Its a 'light' wakeup though,
+ * since all we need to do is flip p->state to TASK_RUNNING, since
+ * the task is still ->on_rq.
+ */
+static int ttwu_remote(struct task_struct *p, int wake_flags)
+{
+ struct rq *rq;
+ int ret = 0;
+
+ rq = __task_rq_lock(p);
+ if (p->on_rq) {
+ ttwu_do_wakeup(rq, p, wake_flags);
+ ret = 1;
+ }
+ __task_rq_unlock(rq);
+
+ return ret;
+}
+
+#ifdef CONFIG_SMP
+static void sched_ttwu_pending(void)
+{
+ struct rq *rq = this_rq();
+ struct task_struct *list = xchg(&rq->wake_list, NULL);
+
+ if (!list)
+ return;
+
+ raw_spin_lock(&rq->lock);
+
+ while (list) {
+ struct task_struct *p = list;
+ list = list->wake_entry;
+ ttwu_do_activate(rq, p, 0);
+ }
+
+ raw_spin_unlock(&rq->lock);
+}
+
+void scheduler_ipi(void)
+{
+ sched_ttwu_pending();
+}
+
+static void ttwu_queue_remote(struct task_struct *p, int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+ struct task_struct *next = rq->wake_list;
+
+ for (;;) {
+ struct task_struct *old = next;
+
+ p->wake_entry = next;
+ next = cmpxchg(&rq->wake_list, old, p);
+ if (next == old)
+ break;
+ }
+
+ if (!next)
+ smp_send_reschedule(cpu);
+}
+#endif
+
+static void ttwu_queue(struct task_struct *p, int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+#if defined(CONFIG_SMP) && defined(CONFIG_SCHED_TTWU_QUEUE)
+ if (sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
+ ttwu_queue_remote(p, cpu);
+ return;
+ }
+#endif
+
+ raw_spin_lock(&rq->lock);
+ ttwu_do_activate(rq, p, 0);
+ raw_spin_unlock(&rq->lock);
}
/**
* Returns %true if @p was woken up, %false if it was already running
* or @state didn't match @p's state.
*/
-static int try_to_wake_up(struct task_struct *p, unsigned int state,
- int wake_flags)
+static int
+try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
{
- int cpu, orig_cpu, this_cpu, success = 0;
unsigned long flags;
- unsigned long en_flags = ENQUEUE_WAKEUP;
- struct rq *rq;
-
- this_cpu = get_cpu();
+ int cpu, success = 0;
smp_wmb();
- rq = task_rq_lock(p, &flags);
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
if (!(p->state & state))
goto out;
- if (p->se.on_rq)
- goto out_running;
-
+ success = 1; /* we're going to change ->state */
cpu = task_cpu(p);
- orig_cpu = cpu;
-#ifdef CONFIG_SMP
- if (unlikely(task_running(rq, p)))
- goto out_activate;
+ if (p->on_rq && ttwu_remote(p, wake_flags))
+ goto stat;
+#ifdef CONFIG_SMP
/*
- * In order to handle concurrent wakeups and release the rq->lock
- * we put the task in TASK_WAKING state.
- *
- * First fix up the nr_uninterruptible count:
+ * If the owning (remote) cpu is still in the middle of schedule() with
+ * this task as prev, wait until its done referencing the task.
*/
- if (task_contributes_to_load(p)) {
- if (likely(cpu_online(orig_cpu)))
- rq->nr_uninterruptible--;
- else
- this_rq()->nr_uninterruptible--;
- }
- p->state = TASK_WAKING;
-
- if (p->sched_class->task_waking) {
- p->sched_class->task_waking(rq, p);
- en_flags |= ENQUEUE_WAKING;
+ while (p->on_cpu) {
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+ /*
+ * If called from interrupt context we could have landed in the
+ * middle of schedule(), in this case we should take care not
+ * to spin on ->on_cpu if p is current, since that would
+ * deadlock.
+ */
+ if (p == current) {
+ ttwu_queue(p, cpu);
+ goto stat;
+ }
+#endif
+ cpu_relax();
}
-
- cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
- if (cpu != orig_cpu)
- set_task_cpu(p, cpu);
- __task_rq_unlock(rq);
-
- rq = cpu_rq(cpu);
- raw_spin_lock(&rq->lock);
-
/*
- * We migrated the task without holding either rq->lock, however
- * since the task is not on the task list itself, nobody else
- * will try and migrate the task, hence the rq should match the
- * cpu we just moved it to.
+ * Pairs with the smp_wmb() in finish_lock_switch().
*/
- WARN_ON(task_cpu(p) != cpu);
- WARN_ON(p->state != TASK_WAKING);
+ smp_rmb();
-#ifdef CONFIG_SCHEDSTATS
- schedstat_inc(rq, ttwu_count);
- if (cpu == this_cpu)
- schedstat_inc(rq, ttwu_local);
- else {
- struct sched_domain *sd;
- for_each_domain(this_cpu, sd) {
- if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
- schedstat_inc(sd, ttwu_wake_remote);
- break;
- }
- }
- }
-#endif /* CONFIG_SCHEDSTATS */
+ p->sched_contributes_to_load = !!task_contributes_to_load(p);
+ p->state = TASK_WAKING;
-out_activate:
+ if (p->sched_class->task_waking)
+ p->sched_class->task_waking(p);
+
+ cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
+ if (task_cpu(p) != cpu)
+ set_task_cpu(p, cpu);
#endif /* CONFIG_SMP */
- ttwu_activate(p, rq, wake_flags & WF_SYNC, orig_cpu != cpu,
- cpu == this_cpu, en_flags);
- success = 1;
-out_running:
- ttwu_post_activation(p, rq, wake_flags, success);
+
+ ttwu_queue(p, cpu);
+stat:
+ ttwu_stat(p, cpu, wake_flags);
out:
- task_rq_unlock(rq, &flags);
- put_cpu();
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
return success;
}
* try_to_wake_up_local - try to wake up a local task with rq lock held
* @p: the thread to be awakened
*
- * Put @p on the run-queue if it's not already there. The caller must
+ * Put @p on the run-queue if it's not already there. The caller must
* ensure that this_rq() is locked, @p is bound to this_rq() and not
- * the current task. this_rq() stays locked over invocation.
+ * the current task.
*/
static void try_to_wake_up_local(struct task_struct *p)
{
struct rq *rq = task_rq(p);
- bool success = false;
BUG_ON(rq != this_rq());
BUG_ON(p == current);
lockdep_assert_held(&rq->lock);
+ if (!raw_spin_trylock(&p->pi_lock)) {
+ raw_spin_unlock(&rq->lock);
+ raw_spin_lock(&p->pi_lock);
+ raw_spin_lock(&rq->lock);
+ }
+
if (!(p->state & TASK_NORMAL))
- return;
+ goto out;
- if (!p->se.on_rq) {
- if (likely(!task_running(rq, p))) {
- schedstat_inc(rq, ttwu_count);
- schedstat_inc(rq, ttwu_local);
- }
- ttwu_activate(p, rq, false, false, true, ENQUEUE_WAKEUP);
- success = true;
- }
- ttwu_post_activation(p, rq, 0, success);
+ if (!p->on_rq)
+ ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+
+ ttwu_do_wakeup(rq, p, 0);
+ ttwu_stat(p, smp_processor_id(), 0);
+out:
+ raw_spin_unlock(&p->pi_lock);
}
/**
*/
static void __sched_fork(struct task_struct *p)
{
+ p->on_rq = 0;
+
+ p->se.on_rq = 0;
p->se.exec_start = 0;
p->se.sum_exec_runtime = 0;
p->se.prev_sum_exec_runtime = 0;
p->se.nr_migrations = 0;
p->se.vruntime = 0;
+ INIT_LIST_HEAD(&p->se.group_node);
#ifdef CONFIG_SCHEDSTATS
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
#endif
INIT_LIST_HEAD(&p->rt.run_list);
- p->se.on_rq = 0;
- INIT_LIST_HEAD(&p->se.group_node);
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
/*
* fork()/clone()-time setup:
*/
-void sched_fork(struct task_struct *p, int clone_flags)
+void sched_fork(struct task_struct *p)
{
+ unsigned long flags;
int cpu = get_cpu();
__sched_fork(p);
*
* Silence PROVE_RCU.
*/
- rcu_read_lock();
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
set_task_cpu(p, cpu);
- rcu_read_unlock();
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
if (likely(sched_info_on()))
memset(&p->sched_info, 0, sizeof(p->sched_info));
#endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
- p->oncpu = 0;
+#if defined(CONFIG_SMP)
+ p->on_cpu = 0;
#endif
#ifdef CONFIG_PREEMPT
/* Want to start with kernel preemption disabled. */
* that must be done for every newly created context, then puts the task
* on the runqueue and wakes it.
*/
-void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
+void wake_up_new_task(struct task_struct *p)
{
unsigned long flags;
struct rq *rq;
- int cpu __maybe_unused = get_cpu();
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
#ifdef CONFIG_SMP
- rq = task_rq_lock(p, &flags);
- p->state = TASK_WAKING;
-
/*
* Fork balancing, do it here and not earlier because:
* - cpus_allowed can change in the fork path
* - any previously selected cpu might disappear through hotplug
- *
- * We set TASK_WAKING so that select_task_rq() can drop rq->lock
- * without people poking at ->cpus_allowed.
*/
- cpu = select_task_rq(rq, p, SD_BALANCE_FORK, 0);
- set_task_cpu(p, cpu);
-
- p->state = TASK_RUNNING;
- task_rq_unlock(rq, &flags);
+ set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
#endif
- rq = task_rq_lock(p, &flags);
+ rq = __task_rq_lock(p);
activate_task(rq, p, 0);
- trace_sched_wakeup_new(p, 1);
+ p->on_rq = 1;
+ trace_sched_wakeup_new(p, true);
check_preempt_curr(rq, p, WF_FORK);
#ifdef CONFIG_SMP
if (p->sched_class->task_woken)
p->sched_class->task_woken(rq, p);
#endif
- task_rq_unlock(rq, &flags);
- put_cpu();
+ task_rq_unlock(rq, p, &flags);
}
#ifdef CONFIG_PREEMPT_NOTIFIERS
{
struct task_struct *p = current;
unsigned long flags;
- struct rq *rq;
int dest_cpu;
- rq = task_rq_lock(p, &flags);
- dest_cpu = p->sched_class->select_task_rq(rq, p, SD_BALANCE_EXEC, 0);
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
+ dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
if (dest_cpu == smp_processor_id())
goto unlock;
- /*
- * select_task_rq() can race against ->cpus_allowed
- */
- if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
- likely(cpu_active(dest_cpu)) && migrate_task(p, rq)) {
+ if (likely(cpu_active(dest_cpu))) {
struct migration_arg arg = { p, dest_cpu };
- task_rq_unlock(rq, &flags);
- stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
return;
}
unlock:
- task_rq_unlock(rq, &flags);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
}
#endif
rq = task_rq_lock(p, &flags);
ns = do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
return ns;
}
rq = task_rq_lock(p, &flags);
ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
return ns;
}
rq = task_rq_lock(p, &flags);
thread_group_cputime(p, &totals);
ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
return ns;
}
/*
* This function gets called by the timer code, with HZ frequency.
* We call it with interrupts disabled.
- *
- * It also gets called by the fork code, when changing the parent's
- * timeslices.
*/
void scheduler_tick(void)
{
profile_hit(SCHED_PROFILING, __builtin_return_address(0));
schedstat_inc(this_rq(), sched_count);
-#ifdef CONFIG_SCHEDSTATS
- if (unlikely(prev->lock_depth >= 0)) {
- schedstat_inc(this_rq(), rq_sched_info.bkl_count);
- schedstat_inc(prev, sched_info.bkl_count);
- }
-#endif
}
static void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- if (prev->se.on_rq)
+ if (prev->on_rq || rq->skip_clock_update < 0)
update_rq_clock(rq);
prev->sched_class->put_prev_task(rq, prev);
}
if (unlikely(signal_pending_state(prev->state, prev))) {
prev->state = TASK_RUNNING;
} else {
+ deactivate_task(rq, prev, DEQUEUE_SLEEP);
+ prev->on_rq = 0;
+
/*
- * If a worker is going to sleep, notify and
- * ask workqueue whether it wants to wake up a
- * task to maintain concurrency. If so, wake
- * up the task.
+ * If a worker went to sleep, notify and ask workqueue
+ * whether it wants to wake up a task to maintain
+ * concurrency.
*/
if (prev->flags & PF_WQ_WORKER) {
struct task_struct *to_wakeup;
if (to_wakeup)
try_to_wake_up_local(to_wakeup);
}
- deactivate_task(rq, prev, DEQUEUE_SLEEP);
/*
- * If we are going to sleep and we have plugged IO queued, make
- * sure to submit it to avoid deadlocks.
+ * If we are going to sleep and we have plugged IO
+ * queued, make sure to submit it to avoid deadlocks.
*/
if (blk_needs_flush_plug(prev)) {
raw_spin_unlock(&rq->lock);
EXPORT_SYMBOL(schedule);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * Look out! "owner" is an entirely speculative pointer
- * access and not reliable.
- */
-int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
-{
- unsigned int cpu;
- struct rq *rq;
- if (!sched_feat(OWNER_SPIN))
- return 0;
+static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
+{
+ bool ret = false;
-#ifdef CONFIG_DEBUG_PAGEALLOC
- /*
- * Need to access the cpu field knowing that
- * DEBUG_PAGEALLOC could have unmapped it if
- * the mutex owner just released it and exited.
- */
- if (probe_kernel_address(&owner->cpu, cpu))
- return 0;
-#else
- cpu = owner->cpu;
-#endif
+ rcu_read_lock();
+ if (lock->owner != owner)
+ goto fail;
/*
- * Even if the access succeeded (likely case),
- * the cpu field may no longer be valid.
+ * Ensure we emit the owner->on_cpu, dereference _after_ checking
+ * lock->owner still matches owner, if that fails, owner might
+ * point to free()d memory, if it still matches, the rcu_read_lock()
+ * ensures the memory stays valid.
*/
- if (cpu >= nr_cpumask_bits)
- return 0;
+ barrier();
- /*
- * We need to validate that we can do a
- * get_cpu() and that we have the percpu area.
- */
- if (!cpu_online(cpu))
- return 0;
+ ret = owner->on_cpu;
+fail:
+ rcu_read_unlock();
- rq = cpu_rq(cpu);
+ return ret;
+}
- for (;;) {
- /*
- * Owner changed, break to re-assess state.
- */
- if (lock->owner != owner) {
- /*
- * If the lock has switched to a different owner,
- * we likely have heavy contention. Return 0 to quit
- * optimistic spinning and not contend further:
- */
- if (lock->owner)
- return 0;
- break;
- }
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+ if (!sched_feat(OWNER_SPIN))
+ return 0;
- /*
- * Is that owner really running on that cpu?
- */
- if (task_thread_info(rq->curr) != owner || need_resched())
+ while (owner_running(lock, owner)) {
+ if (need_resched())
return 0;
arch_mutex_cpu_relax();
}
+ /*
+ * If the owner changed to another task there is likely
+ * heavy contention, stop spinning.
+ */
+ if (lock->owner)
+ return 0;
+
return 1;
}
#endif
*/
void rt_mutex_setprio(struct task_struct *p, int prio)
{
- unsigned long flags;
int oldprio, on_rq, running;
struct rq *rq;
const struct sched_class *prev_class;
BUG_ON(prio < 0 || prio > MAX_PRIO);
- rq = task_rq_lock(p, &flags);
+ rq = __task_rq_lock(p);
trace_sched_pi_setprio(p, prio);
oldprio = p->prio;
prev_class = p->sched_class;
- on_rq = p->se.on_rq;
+ on_rq = p->on_rq;
running = task_current(rq, p);
if (on_rq)
dequeue_task(rq, p, 0);
enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
check_class_changed(rq, p, prev_class, oldprio);
- task_rq_unlock(rq, &flags);
+ __task_rq_unlock(rq);
}
#endif
p->static_prio = NICE_TO_PRIO(nice);
goto out_unlock;
}
- on_rq = p->se.on_rq;
+ on_rq = p->on_rq;
if (on_rq)
dequeue_task(rq, p, 0);
resched_task(rq->curr);
}
out_unlock:
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
}
EXPORT_SYMBOL(set_user_nice);
static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
- BUG_ON(p->se.on_rq);
-
p->policy = policy;
p->rt_priority = prio;
p->normal_prio = normal_prio(p);
/*
* make sure no PI-waiters arrive (or leave) while we are
* changing the priority of the task:
- */
- raw_spin_lock_irqsave(&p->pi_lock, flags);
- /*
+ *
* To be able to change p->policy safely, the appropriate
* runqueue lock must be held.
*/
- rq = __task_rq_lock(p);
+ rq = task_rq_lock(p, &flags);
/*
* Changing the policy of the stop threads its a very bad idea
*/
if (p == rq->stop) {
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
return -EINVAL;
}
if (rt_bandwidth_enabled() && rt_policy(policy) &&
task_group(p)->rt_bandwidth.rt_runtime == 0 &&
!task_group_is_autogroup(task_group(p))) {
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
return -EPERM;
}
}
/* recheck policy now with rq lock held */
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
goto recheck;
}
- on_rq = p->se.on_rq;
+ on_rq = p->on_rq;
running = task_current(rq, p);
if (on_rq)
deactivate_task(rq, p, 0);
activate_task(rq, p, 0);
check_class_changed(rq, p, prev_class, oldprio);
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
rt_mutex_adjust_pi(p);
{
struct task_struct *p;
unsigned long flags;
- struct rq *rq;
int retval;
get_online_cpus();
if (retval)
goto out_unlock;
- rq = task_rq_lock(p, &flags);
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
- task_rq_unlock(rq, &flags);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
out_unlock:
rcu_read_unlock();
rq = task_rq_lock(p, &flags);
time_slice = p->sched_class->get_rr_interval(rq, p);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
rcu_read_unlock();
jiffies_to_timespec(time_slice, &t);
rcu_read_unlock();
rq->curr = rq->idle = idle;
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
- idle->oncpu = 1;
+#if defined(CONFIG_SMP)
+ idle->on_cpu = 1;
#endif
raw_spin_unlock_irqrestore(&rq->lock, flags);
/* Set the preempt count _outside_ the spinlocks! */
-#if defined(CONFIG_PREEMPT)
- task_thread_info(idle)->preempt_count = (idle->lock_depth >= 0);
-#else
task_thread_info(idle)->preempt_count = 0;
-#endif
+
/*
* The idle tasks have their own, simple scheduling class:
*/
unsigned int dest_cpu;
int ret = 0;
- /*
- * Serialize against TASK_WAKING so that ttwu() and wunt() can
- * drop the rq->lock and still rely on ->cpus_allowed.
- */
-again:
- while (task_is_waking(p))
- cpu_relax();
rq = task_rq_lock(p, &flags);
- if (task_is_waking(p)) {
- task_rq_unlock(rq, &flags);
- goto again;
- }
+
+ if (cpumask_equal(&p->cpus_allowed, new_mask))
+ goto out;
if (!cpumask_intersects(new_mask, cpu_active_mask)) {
ret = -EINVAL;
goto out;
}
- if (unlikely((p->flags & PF_THREAD_BOUND) && p != current &&
- !cpumask_equal(&p->cpus_allowed, new_mask))) {
+ if (unlikely((p->flags & PF_THREAD_BOUND) && p != current)) {
ret = -EINVAL;
goto out;
}
goto out;
dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
- if (migrate_task(p, rq)) {
+ if (p->on_rq) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
tlb_migrate_finish(p->mm);
return 0;
}
out:
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
return ret;
}
rq_src = cpu_rq(src_cpu);
rq_dest = cpu_rq(dest_cpu);
+ raw_spin_lock(&p->pi_lock);
double_rq_lock(rq_src, rq_dest);
/* Already moved. */
if (task_cpu(p) != src_cpu)
* If we're not on a rq, the next wake-up will ensure we're
* placed properly.
*/
- if (p->se.on_rq) {
+ if (p->on_rq) {
deactivate_task(rq_src, p, 0);
set_task_cpu(p, dest_cpu);
activate_task(rq_dest, p, 0);
ret = 1;
fail:
double_rq_unlock(rq_src, rq_dest);
+ raw_spin_unlock(&p->pi_lock);
return ret;
}
#ifdef CONFIG_HOTPLUG_CPU
case CPU_DYING:
+ sched_ttwu_pending();
/* Update our root-domain */
raw_spin_lock_irqsave(&rq->lock, flags);
if (rq->rd) {
#ifdef CONFIG_SMP
+static cpumask_var_t sched_domains_tmpmask; /* sched_domains_mutex */
+
#ifdef CONFIG_SCHED_DEBUG
static __read_mostly int sched_domain_debug_enabled;
static void sched_domain_debug(struct sched_domain *sd, int cpu)
{
- cpumask_var_t groupmask;
int level = 0;
if (!sched_domain_debug_enabled)
printk(KERN_DEBUG "CPU%d attaching sched-domain:\n", cpu);
- if (!alloc_cpumask_var(&groupmask, GFP_KERNEL)) {
- printk(KERN_DEBUG "Cannot load-balance (out of memory)\n");
- return;
- }
-
for (;;) {
- if (sched_domain_debug_one(sd, cpu, level, groupmask))
+ if (sched_domain_debug_one(sd, cpu, level, sched_domains_tmpmask))
break;
level++;
sd = sd->parent;
if (!sd)
break;
}
- free_cpumask_var(groupmask);
}
#else /* !CONFIG_SCHED_DEBUG */
# define sched_domain_debug(sd, cpu) do { } while (0)
return 1;
}
-static void free_rootdomain(struct root_domain *rd)
+static void free_rootdomain(struct rcu_head *rcu)
{
- synchronize_sched();
+ struct root_domain *rd = container_of(rcu, struct root_domain, rcu);
cpupri_cleanup(&rd->cpupri);
-
free_cpumask_var(rd->rto_mask);
free_cpumask_var(rd->online);
free_cpumask_var(rd->span);
raw_spin_unlock_irqrestore(&rq->lock, flags);
if (old_rd)
- free_rootdomain(old_rd);
+ call_rcu_sched(&old_rd->rcu, free_rootdomain);
}
static int init_rootdomain(struct root_domain *rd)
return rd;
}
+static void free_sched_domain(struct rcu_head *rcu)
+{
+ struct sched_domain *sd = container_of(rcu, struct sched_domain, rcu);
+ if (atomic_dec_and_test(&sd->groups->ref))
+ kfree(sd->groups);
+ kfree(sd);
+}
+
+static void destroy_sched_domain(struct sched_domain *sd, int cpu)
+{
+ call_rcu(&sd->rcu, free_sched_domain);
+}
+
+static void destroy_sched_domains(struct sched_domain *sd, int cpu)
+{
+ for (; sd; sd = sd->parent)
+ destroy_sched_domain(sd, cpu);
+}
+
/*
* Attach the domain 'sd' to 'cpu' as its base domain. Callers must
* hold the hotplug lock.
struct rq *rq = cpu_rq(cpu);
struct sched_domain *tmp;
- for (tmp = sd; tmp; tmp = tmp->parent)
- tmp->span_weight = cpumask_weight(sched_domain_span(tmp));
-
/* Remove the sched domains which do not contribute to scheduling. */
for (tmp = sd; tmp; ) {
struct sched_domain *parent = tmp->parent;
tmp->parent = parent->parent;
if (parent->parent)
parent->parent->child = tmp;
+ destroy_sched_domain(parent, cpu);
} else
tmp = tmp->parent;
}
if (sd && sd_degenerate(sd)) {
+ tmp = sd;
sd = sd->parent;
+ destroy_sched_domain(tmp, cpu);
if (sd)
sd->child = NULL;
}
sched_domain_debug(sd, cpu);
rq_attach_root(rq, rd);
+ tmp = rq->sd;
rcu_assign_pointer(rq->sd, sd);
+ destroy_sched_domains(tmp, cpu);
}
/* cpus with isolated domains */
__setup("isolcpus=", isolated_cpu_setup);
-/*
- * init_sched_build_groups takes the cpumask we wish to span, and a pointer
- * to a function which identifies what group(along with sched group) a CPU
- * belongs to. The return value of group_fn must be a >= 0 and < nr_cpu_ids
- * (due to the fact that we keep track of groups covered with a struct cpumask).
- *
- * init_sched_build_groups will build a circular linked list of the groups
- * covered by the given span, and will set each group's ->cpumask correctly,
- * and ->cpu_power to 0.
- */
-static void
-init_sched_build_groups(const struct cpumask *span,
- const struct cpumask *cpu_map,
- int (*group_fn)(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg,
- struct cpumask *tmpmask),
- struct cpumask *covered, struct cpumask *tmpmask)
-{
- struct sched_group *first = NULL, *last = NULL;
- int i;
-
- cpumask_clear(covered);
-
- for_each_cpu(i, span) {
- struct sched_group *sg;
- int group = group_fn(i, cpu_map, &sg, tmpmask);
- int j;
-
- if (cpumask_test_cpu(i, covered))
- continue;
-
- cpumask_clear(sched_group_cpus(sg));
- sg->cpu_power = 0;
-
- for_each_cpu(j, span) {
- if (group_fn(j, cpu_map, NULL, tmpmask) != group)
- continue;
-
- cpumask_set_cpu(j, covered);
- cpumask_set_cpu(j, sched_group_cpus(sg));
- }
- if (!first)
- first = sg;
- if (last)
- last->next = sg;
- last = sg;
- }
- last->next = first;
-}
-
#define SD_NODES_PER_DOMAIN 16
#ifdef CONFIG_NUMA
*/
static int find_next_best_node(int node, nodemask_t *used_nodes)
{
- int i, n, val, min_val, best_node = 0;
+ int i, n, val, min_val, best_node = -1;
min_val = INT_MAX;
}
}
- node_set(best_node, *used_nodes);
+ if (best_node != -1)
+ node_set(best_node, *used_nodes);
return best_node;
}
for (i = 1; i < SD_NODES_PER_DOMAIN; i++) {
int next_node = find_next_best_node(node, &used_nodes);
-
+ if (next_node < 0)
+ break;
cpumask_or(span, span, cpumask_of_node(next_node));
}
}
+
+static const struct cpumask *cpu_node_mask(int cpu)
+{
+ lockdep_assert_held(&sched_domains_mutex);
+
+ sched_domain_node_span(cpu_to_node(cpu), sched_domains_tmpmask);
+
+ return sched_domains_tmpmask;
+}
+
+static const struct cpumask *cpu_allnodes_mask(int cpu)
+{
+ return cpu_possible_mask;
+}
#endif /* CONFIG_NUMA */
-int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
+static const struct cpumask *cpu_cpu_mask(int cpu)
+{
+ return cpumask_of_node(cpu_to_node(cpu));
+}
-/*
- * The cpus mask in sched_group and sched_domain hangs off the end.
- *
- * ( See the the comments in include/linux/sched.h:struct sched_group
- * and struct sched_domain. )
- */
-struct static_sched_group {
- struct sched_group sg;
- DECLARE_BITMAP(cpus, CONFIG_NR_CPUS);
-};
+int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
-struct static_sched_domain {
- struct sched_domain sd;
- DECLARE_BITMAP(span, CONFIG_NR_CPUS);
+struct sd_data {
+ struct sched_domain **__percpu sd;
+ struct sched_group **__percpu sg;
};
struct s_data {
-#ifdef CONFIG_NUMA
- int sd_allnodes;
- cpumask_var_t domainspan;
- cpumask_var_t covered;
- cpumask_var_t notcovered;
-#endif
- cpumask_var_t nodemask;
- cpumask_var_t this_sibling_map;
- cpumask_var_t this_core_map;
- cpumask_var_t this_book_map;
- cpumask_var_t send_covered;
- cpumask_var_t tmpmask;
- struct sched_group **sched_group_nodes;
+ struct sched_domain ** __percpu sd;
struct root_domain *rd;
};
enum s_alloc {
- sa_sched_groups = 0,
sa_rootdomain,
- sa_tmpmask,
- sa_send_covered,
- sa_this_book_map,
- sa_this_core_map,
- sa_this_sibling_map,
- sa_nodemask,
- sa_sched_group_nodes,
-#ifdef CONFIG_NUMA
- sa_notcovered,
- sa_covered,
- sa_domainspan,
-#endif
+ sa_sd,
+ sa_sd_storage,
sa_none,
};
-/*
- * SMT sched-domains:
- */
-#ifdef CONFIG_SCHED_SMT
-static DEFINE_PER_CPU(struct static_sched_domain, cpu_domains);
-static DEFINE_PER_CPU(struct static_sched_group, sched_groups);
-
-static int
-cpu_to_cpu_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *unused)
-{
- if (sg)
- *sg = &per_cpu(sched_groups, cpu).sg;
- return cpu;
-}
-#endif /* CONFIG_SCHED_SMT */
+struct sched_domain_topology_level;
-/*
- * multi-core sched-domains:
- */
-#ifdef CONFIG_SCHED_MC
-static DEFINE_PER_CPU(struct static_sched_domain, core_domains);
-static DEFINE_PER_CPU(struct static_sched_group, sched_group_core);
+typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
+typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
-static int
-cpu_to_core_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *mask)
-{
- int group;
-#ifdef CONFIG_SCHED_SMT
- cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
- group = cpumask_first(mask);
-#else
- group = cpu;
-#endif
- if (sg)
- *sg = &per_cpu(sched_group_core, group).sg;
- return group;
-}
-#endif /* CONFIG_SCHED_MC */
+struct sched_domain_topology_level {
+ sched_domain_init_f init;
+ sched_domain_mask_f mask;
+ struct sd_data data;
+};
/*
- * book sched-domains:
+ * Assumes the sched_domain tree is fully constructed
*/
-#ifdef CONFIG_SCHED_BOOK
-static DEFINE_PER_CPU(struct static_sched_domain, book_domains);
-static DEFINE_PER_CPU(struct static_sched_group, sched_group_book);
-
-static int
-cpu_to_book_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *mask)
+static int get_group(int cpu, struct sd_data *sdd, struct sched_group **sg)
{
- int group = cpu;
-#ifdef CONFIG_SCHED_MC
- cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
- group = cpumask_first(mask);
-#elif defined(CONFIG_SCHED_SMT)
- cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
- group = cpumask_first(mask);
-#endif
- if (sg)
- *sg = &per_cpu(sched_group_book, group).sg;
- return group;
-}
-#endif /* CONFIG_SCHED_BOOK */
+ struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
+ struct sched_domain *child = sd->child;
-static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
-static DEFINE_PER_CPU(struct static_sched_group, sched_group_phys);
+ if (child)
+ cpu = cpumask_first(sched_domain_span(child));
-static int
-cpu_to_phys_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *mask)
-{
- int group;
-#ifdef CONFIG_SCHED_BOOK
- cpumask_and(mask, cpu_book_mask(cpu), cpu_map);
- group = cpumask_first(mask);
-#elif defined(CONFIG_SCHED_MC)
- cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
- group = cpumask_first(mask);
-#elif defined(CONFIG_SCHED_SMT)
- cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
- group = cpumask_first(mask);
-#else
- group = cpu;
-#endif
if (sg)
- *sg = &per_cpu(sched_group_phys, group).sg;
- return group;
+ *sg = *per_cpu_ptr(sdd->sg, cpu);
+
+ return cpu;
}
-#ifdef CONFIG_NUMA
/*
- * The init_sched_build_groups can't handle what we want to do with node
- * groups, so roll our own. Now each node has its own list of groups which
- * gets dynamically allocated.
+ * build_sched_groups takes the cpumask we wish to span, and a pointer
+ * to a function which identifies what group(along with sched group) a CPU
+ * belongs to. The return value of group_fn must be a >= 0 and < nr_cpu_ids
+ * (due to the fact that we keep track of groups covered with a struct cpumask).
+ *
+ * build_sched_groups will build a circular linked list of the groups
+ * covered by the given span, and will set each group's ->cpumask correctly,
+ * and ->cpu_power to 0.
*/
-static DEFINE_PER_CPU(struct static_sched_domain, node_domains);
-static struct sched_group ***sched_group_nodes_bycpu;
-
-static DEFINE_PER_CPU(struct static_sched_domain, allnodes_domains);
-static DEFINE_PER_CPU(struct static_sched_group, sched_group_allnodes);
-
-static int cpu_to_allnodes_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg,
- struct cpumask *nodemask)
-{
- int group;
-
- cpumask_and(nodemask, cpumask_of_node(cpu_to_node(cpu)), cpu_map);
- group = cpumask_first(nodemask);
-
- if (sg)
- *sg = &per_cpu(sched_group_allnodes, group).sg;
- return group;
-}
-
-static void init_numa_sched_groups_power(struct sched_group *group_head)
-{
- struct sched_group *sg = group_head;
- int j;
-
- if (!sg)
- return;
- do {
- for_each_cpu(j, sched_group_cpus(sg)) {
- struct sched_domain *sd;
-
- sd = &per_cpu(phys_domains, j).sd;
- if (j != group_first_cpu(sd->groups)) {
- /*
- * Only add "power" once for each
- * physical package.
- */
- continue;
- }
-
- sg->cpu_power += sd->groups->cpu_power;
- }
- sg = sg->next;
- } while (sg != group_head);
-}
-
-static int build_numa_sched_groups(struct s_data *d,
- const struct cpumask *cpu_map, int num)
+static void
+build_sched_groups(struct sched_domain *sd)
{
- struct sched_domain *sd;
- struct sched_group *sg, *prev;
- int n, j;
-
- cpumask_clear(d->covered);
- cpumask_and(d->nodemask, cpumask_of_node(num), cpu_map);
- if (cpumask_empty(d->nodemask)) {
- d->sched_group_nodes[num] = NULL;
- goto out;
- }
-
- sched_domain_node_span(num, d->domainspan);
- cpumask_and(d->domainspan, d->domainspan, cpu_map);
-
- sg = kmalloc_node(sizeof(struct sched_group) + cpumask_size(),
- GFP_KERNEL, num);
- if (!sg) {
- printk(KERN_WARNING "Can not alloc domain group for node %d\n",
- num);
- return -ENOMEM;
- }
- d->sched_group_nodes[num] = sg;
-
- for_each_cpu(j, d->nodemask) {
- sd = &per_cpu(node_domains, j).sd;
- sd->groups = sg;
- }
-
- sg->cpu_power = 0;
- cpumask_copy(sched_group_cpus(sg), d->nodemask);
- sg->next = sg;
- cpumask_or(d->covered, d->covered, d->nodemask);
+ struct sched_group *first = NULL, *last = NULL;
+ struct sd_data *sdd = sd->private;
+ const struct cpumask *span = sched_domain_span(sd);
+ struct cpumask *covered;
+ int i;
- prev = sg;
- for (j = 0; j < nr_node_ids; j++) {
- n = (num + j) % nr_node_ids;
- cpumask_complement(d->notcovered, d->covered);
- cpumask_and(d->tmpmask, d->notcovered, cpu_map);
- cpumask_and(d->tmpmask, d->tmpmask, d->domainspan);
- if (cpumask_empty(d->tmpmask))
- break;
- cpumask_and(d->tmpmask, d->tmpmask, cpumask_of_node(n));
- if (cpumask_empty(d->tmpmask))
- continue;
- sg = kmalloc_node(sizeof(struct sched_group) + cpumask_size(),
- GFP_KERNEL, num);
- if (!sg) {
- printk(KERN_WARNING
- "Can not alloc domain group for node %d\n", j);
- return -ENOMEM;
- }
- sg->cpu_power = 0;
- cpumask_copy(sched_group_cpus(sg), d->tmpmask);
- sg->next = prev->next;
- cpumask_or(d->covered, d->covered, d->tmpmask);
- prev->next = sg;
- prev = sg;
- }
-out:
- return 0;
-}
-#endif /* CONFIG_NUMA */
+ lockdep_assert_held(&sched_domains_mutex);
+ covered = sched_domains_tmpmask;
-#ifdef CONFIG_NUMA
-/* Free memory allocated for various sched_group structures */
-static void free_sched_groups(const struct cpumask *cpu_map,
- struct cpumask *nodemask)
-{
- int cpu, i;
+ cpumask_clear(covered);
- for_each_cpu(cpu, cpu_map) {
- struct sched_group **sched_group_nodes
- = sched_group_nodes_bycpu[cpu];
+ for_each_cpu(i, span) {
+ struct sched_group *sg;
+ int group = get_group(i, sdd, &sg);
+ int j;
- if (!sched_group_nodes)
+ if (cpumask_test_cpu(i, covered))
continue;
- for (i = 0; i < nr_node_ids; i++) {
- struct sched_group *oldsg, *sg = sched_group_nodes[i];
+ cpumask_clear(sched_group_cpus(sg));
+ sg->cpu_power = 0;
- cpumask_and(nodemask, cpumask_of_node(i), cpu_map);
- if (cpumask_empty(nodemask))
+ for_each_cpu(j, span) {
+ if (get_group(j, sdd, NULL) != group)
continue;
- if (sg == NULL)
- continue;
- sg = sg->next;
-next_sg:
- oldsg = sg;
- sg = sg->next;
- kfree(oldsg);
- if (oldsg != sched_group_nodes[i])
- goto next_sg;
+ cpumask_set_cpu(j, covered);
+ cpumask_set_cpu(j, sched_group_cpus(sg));
}
- kfree(sched_group_nodes);
- sched_group_nodes_bycpu[cpu] = NULL;
+
+ if (!first)
+ first = sg;
+ if (last)
+ last->next = sg;
+ last = sg;
}
+ last->next = first;
}
-#else /* !CONFIG_NUMA */
-static void free_sched_groups(const struct cpumask *cpu_map,
- struct cpumask *nodemask)
-{
-}
-#endif /* CONFIG_NUMA */
/*
* Initialize sched groups cpu_power.
*/
static void init_sched_groups_power(int cpu, struct sched_domain *sd)
{
- struct sched_domain *child;
- struct sched_group *group;
- long power;
- int weight;
-
WARN_ON(!sd || !sd->groups);
if (cpu != group_first_cpu(sd->groups))
sd->groups->group_weight = cpumask_weight(sched_group_cpus(sd->groups));
- child = sd->child;
-
- sd->groups->cpu_power = 0;
-
- if (!child) {
- power = SCHED_LOAD_SCALE;
- weight = cpumask_weight(sched_domain_span(sd));
- /*
- * SMT siblings share the power of a single core.
- * Usually multiple threads get a better yield out of
- * that one core than a single thread would have,
- * reflect that in sd->smt_gain.
- */
- if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
- power *= sd->smt_gain;
- power /= weight;
- power >>= SCHED_LOAD_SHIFT;
- }
- sd->groups->cpu_power += power;
- return;
- }
-
- /*
- * Add cpu_power of each child group to this groups cpu_power.
- */
- group = child->groups;
- do {
- sd->groups->cpu_power += group->cpu_power;
- group = group->next;
- } while (group != child->groups);
+ update_group_power(sd, cpu);
}
/*
# define SD_INIT_NAME(sd, type) do { } while (0)
#endif
-#define SD_INIT(sd, type) sd_init_##type(sd)
-
-#define SD_INIT_FUNC(type) \
-static noinline void sd_init_##type(struct sched_domain *sd) \
-{ \
- memset(sd, 0, sizeof(*sd)); \
- *sd = SD_##type##_INIT; \
- sd->level = SD_LV_##type; \
- SD_INIT_NAME(sd, type); \
+#define SD_INIT_FUNC(type) \
+static noinline struct sched_domain * \
+sd_init_##type(struct sched_domain_topology_level *tl, int cpu) \
+{ \
+ struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu); \
+ *sd = SD_##type##_INIT; \
+ SD_INIT_NAME(sd, type); \
+ sd->private = &tl->data; \
+ return sd; \
}
SD_INIT_FUNC(CPU)
#endif
static int default_relax_domain_level = -1;
+int sched_domain_level_max;
static int __init setup_relax_domain_level(char *str)
{
unsigned long val;
val = simple_strtoul(str, NULL, 0);
- if (val < SD_LV_MAX)
+ if (val < sched_domain_level_max)
default_relax_domain_level = val;
return 1;
}
}
+static void __sdt_free(const struct cpumask *cpu_map);
+static int __sdt_alloc(const struct cpumask *cpu_map);
+
static void __free_domain_allocs(struct s_data *d, enum s_alloc what,
const struct cpumask *cpu_map)
{
switch (what) {
- case sa_sched_groups:
- free_sched_groups(cpu_map, d->tmpmask); /* fall through */
- d->sched_group_nodes = NULL;
case sa_rootdomain:
- free_rootdomain(d->rd); /* fall through */
- case sa_tmpmask:
- free_cpumask_var(d->tmpmask); /* fall through */
- case sa_send_covered:
- free_cpumask_var(d->send_covered); /* fall through */
- case sa_this_book_map:
- free_cpumask_var(d->this_book_map); /* fall through */
- case sa_this_core_map:
- free_cpumask_var(d->this_core_map); /* fall through */
- case sa_this_sibling_map:
- free_cpumask_var(d->this_sibling_map); /* fall through */
- case sa_nodemask:
- free_cpumask_var(d->nodemask); /* fall through */
- case sa_sched_group_nodes:
-#ifdef CONFIG_NUMA
- kfree(d->sched_group_nodes); /* fall through */
- case sa_notcovered:
- free_cpumask_var(d->notcovered); /* fall through */
- case sa_covered:
- free_cpumask_var(d->covered); /* fall through */
- case sa_domainspan:
- free_cpumask_var(d->domainspan); /* fall through */
-#endif
+ if (!atomic_read(&d->rd->refcount))
+ free_rootdomain(&d->rd->rcu); /* fall through */
+ case sa_sd:
+ free_percpu(d->sd); /* fall through */
+ case sa_sd_storage:
+ __sdt_free(cpu_map); /* fall through */
case sa_none:
break;
}
static enum s_alloc __visit_domain_allocation_hell(struct s_data *d,
const struct cpumask *cpu_map)
{
-#ifdef CONFIG_NUMA
- if (!alloc_cpumask_var(&d->domainspan, GFP_KERNEL))
- return sa_none;
- if (!alloc_cpumask_var(&d->covered, GFP_KERNEL))
- return sa_domainspan;
- if (!alloc_cpumask_var(&d->notcovered, GFP_KERNEL))
- return sa_covered;
- /* Allocate the per-node list of sched groups */
- d->sched_group_nodes = kcalloc(nr_node_ids,
- sizeof(struct sched_group *), GFP_KERNEL);
- if (!d->sched_group_nodes) {
- printk(KERN_WARNING "Can not alloc sched group node list\n");
- return sa_notcovered;
- }
- sched_group_nodes_bycpu[cpumask_first(cpu_map)] = d->sched_group_nodes;
-#endif
- if (!alloc_cpumask_var(&d->nodemask, GFP_KERNEL))
- return sa_sched_group_nodes;
- if (!alloc_cpumask_var(&d->this_sibling_map, GFP_KERNEL))
- return sa_nodemask;
- if (!alloc_cpumask_var(&d->this_core_map, GFP_KERNEL))
- return sa_this_sibling_map;
- if (!alloc_cpumask_var(&d->this_book_map, GFP_KERNEL))
- return sa_this_core_map;
- if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
- return sa_this_book_map;
- if (!alloc_cpumask_var(&d->tmpmask, GFP_KERNEL))
- return sa_send_covered;
+ memset(d, 0, sizeof(*d));
+
+ if (__sdt_alloc(cpu_map))
+ return sa_sd_storage;
+ d->sd = alloc_percpu(struct sched_domain *);
+ if (!d->sd)
+ return sa_sd_storage;
d->rd = alloc_rootdomain();
- if (!d->rd) {
- printk(KERN_WARNING "Cannot alloc root domain\n");
- return sa_tmpmask;
- }
+ if (!d->rd)
+ return sa_sd;
return sa_rootdomain;
}
-static struct sched_domain *__build_numa_sched_domains(struct s_data *d,
- const struct cpumask *cpu_map, struct sched_domain_attr *attr, int i)
+/*
+ * NULL the sd_data elements we've used to build the sched_domain and
+ * sched_group structure so that the subsequent __free_domain_allocs()
+ * will not free the data we're using.
+ */
+static void claim_allocations(int cpu, struct sched_domain *sd)
{
- struct sched_domain *sd = NULL;
-#ifdef CONFIG_NUMA
- struct sched_domain *parent;
-
- d->sd_allnodes = 0;
- if (cpumask_weight(cpu_map) >
- SD_NODES_PER_DOMAIN * cpumask_weight(d->nodemask)) {
- sd = &per_cpu(allnodes_domains, i).sd;
- SD_INIT(sd, ALLNODES);
- set_domain_attribute(sd, attr);
- cpumask_copy(sched_domain_span(sd), cpu_map);
- cpu_to_allnodes_group(i, cpu_map, &sd->groups, d->tmpmask);
- d->sd_allnodes = 1;
- }
- parent = sd;
-
- sd = &per_cpu(node_domains, i).sd;
- SD_INIT(sd, NODE);
- set_domain_attribute(sd, attr);
- sched_domain_node_span(cpu_to_node(i), sched_domain_span(sd));
- sd->parent = parent;
- if (parent)
- parent->child = sd;
- cpumask_and(sched_domain_span(sd), sched_domain_span(sd), cpu_map);
-#endif
- return sd;
-}
+ struct sd_data *sdd = sd->private;
+ struct sched_group *sg = sd->groups;
-static struct sched_domain *__build_cpu_sched_domain(struct s_data *d,
- const struct cpumask *cpu_map, struct sched_domain_attr *attr,
- struct sched_domain *parent, int i)
-{
- struct sched_domain *sd;
- sd = &per_cpu(phys_domains, i).sd;
- SD_INIT(sd, CPU);
- set_domain_attribute(sd, attr);
- cpumask_copy(sched_domain_span(sd), d->nodemask);
- sd->parent = parent;
- if (parent)
- parent->child = sd;
- cpu_to_phys_group(i, cpu_map, &sd->groups, d->tmpmask);
- return sd;
-}
+ WARN_ON_ONCE(*per_cpu_ptr(sdd->sd, cpu) != sd);
+ *per_cpu_ptr(sdd->sd, cpu) = NULL;
-static struct sched_domain *__build_book_sched_domain(struct s_data *d,
- const struct cpumask *cpu_map, struct sched_domain_attr *attr,
- struct sched_domain *parent, int i)
-{
- struct sched_domain *sd = parent;
-#ifdef CONFIG_SCHED_BOOK
- sd = &per_cpu(book_domains, i).sd;
- SD_INIT(sd, BOOK);
- set_domain_attribute(sd, attr);
- cpumask_and(sched_domain_span(sd), cpu_map, cpu_book_mask(i));
- sd->parent = parent;
- parent->child = sd;
- cpu_to_book_group(i, cpu_map, &sd->groups, d->tmpmask);
-#endif
- return sd;
+ if (cpu == cpumask_first(sched_group_cpus(sg))) {
+ WARN_ON_ONCE(*per_cpu_ptr(sdd->sg, cpu) != sg);
+ *per_cpu_ptr(sdd->sg, cpu) = NULL;
+ }
}
-static struct sched_domain *__build_mc_sched_domain(struct s_data *d,
- const struct cpumask *cpu_map, struct sched_domain_attr *attr,
- struct sched_domain *parent, int i)
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *cpu_smt_mask(int cpu)
{
- struct sched_domain *sd = parent;
-#ifdef CONFIG_SCHED_MC
- sd = &per_cpu(core_domains, i).sd;
- SD_INIT(sd, MC);
- set_domain_attribute(sd, attr);
- cpumask_and(sched_domain_span(sd), cpu_map, cpu_coregroup_mask(i));
- sd->parent = parent;
- parent->child = sd;
- cpu_to_core_group(i, cpu_map, &sd->groups, d->tmpmask);
-#endif
- return sd;
+ return topology_thread_cpumask(cpu);
}
-
-static struct sched_domain *__build_smt_sched_domain(struct s_data *d,
- const struct cpumask *cpu_map, struct sched_domain_attr *attr,
- struct sched_domain *parent, int i)
-{
- struct sched_domain *sd = parent;
-#ifdef CONFIG_SCHED_SMT
- sd = &per_cpu(cpu_domains, i).sd;
- SD_INIT(sd, SIBLING);
- set_domain_attribute(sd, attr);
- cpumask_and(sched_domain_span(sd), cpu_map, topology_thread_cpumask(i));
- sd->parent = parent;
- parent->child = sd;
- cpu_to_cpu_group(i, cpu_map, &sd->groups, d->tmpmask);
#endif
- return sd;
-}
-static void build_sched_groups(struct s_data *d, enum sched_domain_level l,
- const struct cpumask *cpu_map, int cpu)
-{
- switch (l) {
+/*
+ * Topology list, bottom-up.
+ */
+static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
- case SD_LV_SIBLING: /* set up CPU (sibling) groups */
- cpumask_and(d->this_sibling_map, cpu_map,
- topology_thread_cpumask(cpu));
- if (cpu == cpumask_first(d->this_sibling_map))
- init_sched_build_groups(d->this_sibling_map, cpu_map,
- &cpu_to_cpu_group,
- d->send_covered, d->tmpmask);
- break;
+ { sd_init_SIBLING, cpu_smt_mask, },
#endif
#ifdef CONFIG_SCHED_MC
- case SD_LV_MC: /* set up multi-core groups */
- cpumask_and(d->this_core_map, cpu_map, cpu_coregroup_mask(cpu));
- if (cpu == cpumask_first(d->this_core_map))
- init_sched_build_groups(d->this_core_map, cpu_map,
- &cpu_to_core_group,
- d->send_covered, d->tmpmask);
- break;
+ { sd_init_MC, cpu_coregroup_mask, },
#endif
#ifdef CONFIG_SCHED_BOOK
- case SD_LV_BOOK: /* set up book groups */
- cpumask_and(d->this_book_map, cpu_map, cpu_book_mask(cpu));
- if (cpu == cpumask_first(d->this_book_map))
- init_sched_build_groups(d->this_book_map, cpu_map,
- &cpu_to_book_group,
- d->send_covered, d->tmpmask);
- break;
+ { sd_init_BOOK, cpu_book_mask, },
#endif
- case SD_LV_CPU: /* set up physical groups */
- cpumask_and(d->nodemask, cpumask_of_node(cpu), cpu_map);
- if (!cpumask_empty(d->nodemask))
- init_sched_build_groups(d->nodemask, cpu_map,
- &cpu_to_phys_group,
- d->send_covered, d->tmpmask);
- break;
+ { sd_init_CPU, cpu_cpu_mask, },
#ifdef CONFIG_NUMA
- case SD_LV_ALLNODES:
- init_sched_build_groups(cpu_map, cpu_map, &cpu_to_allnodes_group,
- d->send_covered, d->tmpmask);
- break;
+ { sd_init_NODE, cpu_node_mask, },
+ { sd_init_ALLNODES, cpu_allnodes_mask, },
#endif
- default:
- break;
+ { NULL, },
+};
+
+static struct sched_domain_topology_level *sched_domain_topology = default_topology;
+
+static int __sdt_alloc(const struct cpumask *cpu_map)
+{
+ struct sched_domain_topology_level *tl;
+ int j;
+
+ for (tl = sched_domain_topology; tl->init; tl++) {
+ struct sd_data *sdd = &tl->data;
+
+ sdd->sd = alloc_percpu(struct sched_domain *);
+ if (!sdd->sd)
+ return -ENOMEM;
+
+ sdd->sg = alloc_percpu(struct sched_group *);
+ if (!sdd->sg)
+ return -ENOMEM;
+
+ for_each_cpu(j, cpu_map) {
+ struct sched_domain *sd;
+ struct sched_group *sg;
+
+ sd = kzalloc_node(sizeof(struct sched_domain) + cpumask_size(),
+ GFP_KERNEL, cpu_to_node(j));
+ if (!sd)
+ return -ENOMEM;
+
+ *per_cpu_ptr(sdd->sd, j) = sd;
+
+ sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
+ GFP_KERNEL, cpu_to_node(j));
+ if (!sg)
+ return -ENOMEM;
+
+ *per_cpu_ptr(sdd->sg, j) = sg;
+ }
}
+
+ return 0;
+}
+
+static void __sdt_free(const struct cpumask *cpu_map)
+{
+ struct sched_domain_topology_level *tl;
+ int j;
+
+ for (tl = sched_domain_topology; tl->init; tl++) {
+ struct sd_data *sdd = &tl->data;
+
+ for_each_cpu(j, cpu_map) {
+ kfree(*per_cpu_ptr(sdd->sd, j));
+ kfree(*per_cpu_ptr(sdd->sg, j));
+ }
+ free_percpu(sdd->sd);
+ free_percpu(sdd->sg);
+ }
+}
+
+struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
+ struct s_data *d, const struct cpumask *cpu_map,
+ struct sched_domain_attr *attr, struct sched_domain *child,
+ int cpu)
+{
+ struct sched_domain *sd = tl->init(tl, cpu);
+ if (!sd)
+ return child;
+
+ set_domain_attribute(sd, attr);
+ cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
+ if (child) {
+ sd->level = child->level + 1;
+ sched_domain_level_max = max(sched_domain_level_max, sd->level);
+ child->parent = sd;
+ }
+ sd->child = child;
+
+ return sd;
}
/*
* Build sched domains for a given set of cpus and attach the sched domains
* to the individual cpus
*/
-static int __build_sched_domains(const struct cpumask *cpu_map,
- struct sched_domain_attr *attr)
+static int build_sched_domains(const struct cpumask *cpu_map,
+ struct sched_domain_attr *attr)
{
enum s_alloc alloc_state = sa_none;
- struct s_data d;
struct sched_domain *sd;
- int i;
-#ifdef CONFIG_NUMA
- d.sd_allnodes = 0;
-#endif
+ struct s_data d;
+ int i, ret = -ENOMEM;
alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
if (alloc_state != sa_rootdomain)
goto error;
- alloc_state = sa_sched_groups;
-
- /*
- * Set up domains for cpus specified by the cpu_map.
- */
- for_each_cpu(i, cpu_map) {
- cpumask_and(d.nodemask, cpumask_of_node(cpu_to_node(i)),
- cpu_map);
-
- sd = __build_numa_sched_domains(&d, cpu_map, attr, i);
- sd = __build_cpu_sched_domain(&d, cpu_map, attr, sd, i);
- sd = __build_book_sched_domain(&d, cpu_map, attr, sd, i);
- sd = __build_mc_sched_domain(&d, cpu_map, attr, sd, i);
- sd = __build_smt_sched_domain(&d, cpu_map, attr, sd, i);
- }
+ /* Set up domains for cpus specified by the cpu_map. */
for_each_cpu(i, cpu_map) {
- build_sched_groups(&d, SD_LV_SIBLING, cpu_map, i);
- build_sched_groups(&d, SD_LV_BOOK, cpu_map, i);
- build_sched_groups(&d, SD_LV_MC, cpu_map, i);
- }
+ struct sched_domain_topology_level *tl;
- /* Set up physical groups */
- for (i = 0; i < nr_node_ids; i++)
- build_sched_groups(&d, SD_LV_CPU, cpu_map, i);
+ sd = NULL;
+ for (tl = sched_domain_topology; tl->init; tl++)
+ sd = build_sched_domain(tl, &d, cpu_map, attr, sd, i);
-#ifdef CONFIG_NUMA
- /* Set up node groups */
- if (d.sd_allnodes)
- build_sched_groups(&d, SD_LV_ALLNODES, cpu_map, 0);
+ while (sd->child)
+ sd = sd->child;
- for (i = 0; i < nr_node_ids; i++)
- if (build_numa_sched_groups(&d, cpu_map, i))
- goto error;
-#endif
-
- /* Calculate CPU power for physical packages and nodes */
-#ifdef CONFIG_SCHED_SMT
- for_each_cpu(i, cpu_map) {
- sd = &per_cpu(cpu_domains, i).sd;
- init_sched_groups_power(i, sd);
- }
-#endif
-#ifdef CONFIG_SCHED_MC
- for_each_cpu(i, cpu_map) {
- sd = &per_cpu(core_domains, i).sd;
- init_sched_groups_power(i, sd);
+ *per_cpu_ptr(d.sd, i) = sd;
}
-#endif
-#ifdef CONFIG_SCHED_BOOK
- for_each_cpu(i, cpu_map) {
- sd = &per_cpu(book_domains, i).sd;
- init_sched_groups_power(i, sd);
- }
-#endif
+ /* Build the groups for the domains */
for_each_cpu(i, cpu_map) {
- sd = &per_cpu(phys_domains, i).sd;
- init_sched_groups_power(i, sd);
- }
+ for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
+ sd->span_weight = cpumask_weight(sched_domain_span(sd));
+ get_group(i, sd->private, &sd->groups);
+ atomic_inc(&sd->groups->ref);
-#ifdef CONFIG_NUMA
- for (i = 0; i < nr_node_ids; i++)
- init_numa_sched_groups_power(d.sched_group_nodes[i]);
+ if (i != cpumask_first(sched_domain_span(sd)))
+ continue;
- if (d.sd_allnodes) {
- struct sched_group *sg;
+ build_sched_groups(sd);
+ }
+ }
- cpu_to_allnodes_group(cpumask_first(cpu_map), cpu_map, &sg,
- d.tmpmask);
- init_numa_sched_groups_power(sg);
+ /* Calculate CPU power for physical packages and nodes */
+ for (i = nr_cpumask_bits-1; i >= 0; i--) {
+ if (!cpumask_test_cpu(i, cpu_map))
+ continue;
+
+ for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
+ claim_allocations(i, sd);
+ init_sched_groups_power(i, sd);
+ }
}
-#endif
/* Attach the domains */
+ rcu_read_lock();
for_each_cpu(i, cpu_map) {
-#ifdef CONFIG_SCHED_SMT
- sd = &per_cpu(cpu_domains, i).sd;
-#elif defined(CONFIG_SCHED_MC)
- sd = &per_cpu(core_domains, i).sd;
-#elif defined(CONFIG_SCHED_BOOK)
- sd = &per_cpu(book_domains, i).sd;
-#else
- sd = &per_cpu(phys_domains, i).sd;
-#endif
+ sd = *per_cpu_ptr(d.sd, i);
cpu_attach_domain(sd, d.rd, i);
}
+ rcu_read_unlock();
- d.sched_group_nodes = NULL; /* don't free this we still need it */
- __free_domain_allocs(&d, sa_tmpmask, cpu_map);
- return 0;
-
+ ret = 0;
error:
__free_domain_allocs(&d, alloc_state, cpu_map);
- return -ENOMEM;
-}
-
-static int build_sched_domains(const struct cpumask *cpu_map)
-{
- return __build_sched_domains(cpu_map, NULL);
+ return ret;
}
static cpumask_var_t *doms_cur; /* current sched domains */
* For now this just excludes isolated cpus, but could be used to
* exclude other special cases in the future.
*/
-static int arch_init_sched_domains(const struct cpumask *cpu_map)
+static int init_sched_domains(const struct cpumask *cpu_map)
{
int err;
doms_cur = &fallback_doms;
cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
dattr_cur = NULL;
- err = build_sched_domains(doms_cur[0]);
+ err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
return err;
}
-static void arch_destroy_sched_domains(const struct cpumask *cpu_map,
- struct cpumask *tmpmask)
-{
- free_sched_groups(cpu_map, tmpmask);
-}
-
/*
* Detach sched domains from a group of cpus specified in cpu_map
* These cpus will now be attached to the NULL domain
*/
static void detach_destroy_domains(const struct cpumask *cpu_map)
{
- /* Save because hotplug lock held. */
- static DECLARE_BITMAP(tmpmask, CONFIG_NR_CPUS);
int i;
+ rcu_read_lock();
for_each_cpu(i, cpu_map)
cpu_attach_domain(NULL, &def_root_domain, i);
- synchronize_sched();
- arch_destroy_sched_domains(cpu_map, to_cpumask(tmpmask));
+ rcu_read_unlock();
}
/* handle null as "default" */
goto match2;
}
/* no match - add a new doms_new */
- __build_sched_domains(doms_new[i],
- dattr_new ? dattr_new + i : NULL);
+ build_sched_domains(doms_new[i], dattr_new ? dattr_new + i : NULL);
match2:
;
}
}
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-static void arch_reinit_sched_domains(void)
+static void reinit_sched_domains(void)
{
get_online_cpus();
else
sched_mc_power_savings = level;
- arch_reinit_sched_domains();
+ reinit_sched_domains();
return count;
}
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
-#if defined(CONFIG_NUMA)
- sched_group_nodes_bycpu = kzalloc(nr_cpu_ids * sizeof(void **),
- GFP_KERNEL);
- BUG_ON(sched_group_nodes_bycpu == NULL);
-#endif
get_online_cpus();
mutex_lock(&sched_domains_mutex);
- arch_init_sched_domains(cpu_active_mask);
+ init_sched_domains(cpu_active_mask);
cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
if (cpumask_empty(non_isolated_cpus))
cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
/* Allocate the nohz_cpu_mask if CONFIG_CPUMASK_OFFSTACK */
zalloc_cpumask_var(&nohz_cpu_mask, GFP_NOWAIT);
#ifdef CONFIG_SMP
+ zalloc_cpumask_var(&sched_domains_tmpmask, GFP_NOWAIT);
#ifdef CONFIG_NO_HZ
zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
alloc_cpumask_var(&nohz.grp_idle_mask, GFP_NOWAIT);
int old_prio = p->prio;
int on_rq;
- on_rq = p->se.on_rq;
+ on_rq = p->on_rq;
if (on_rq)
deactivate_task(rq, p, 0);
__setscheduler(rq, p, SCHED_NORMAL, 0);
{
struct rt_rq *rt_rq;
struct sched_rt_entity *rt_se;
- struct rq *rq;
int i;
tg->rt_rq = kzalloc(sizeof(rt_rq) * nr_cpu_ids, GFP_KERNEL);
ktime_to_ns(def_rt_bandwidth.rt_period), 0);
for_each_possible_cpu(i) {
- rq = cpu_rq(i);
-
rt_rq = kzalloc_node(sizeof(struct rt_rq),
GFP_KERNEL, cpu_to_node(i));
if (!rt_rq)
rq = task_rq_lock(tsk, &flags);
running = task_current(rq, tsk);
- on_rq = tsk->se.on_rq;
+ on_rq = tsk->on_rq;
if (on_rq)
dequeue_task(rq, tsk, 0);
if (on_rq)
enqueue_task(rq, tsk, 0);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, tsk, &flags);
}
#endif /* CONFIG_CGROUP_SCHED */
read_lock_irqsave(&tasklist_lock, flags);
do_each_thread(g, p) {
- if (!p->se.on_rq || task_cpu(p) != rq_cpu)
+ if (!p->on_rq || task_cpu(p) != rq_cpu)
continue;
print_task(m, rq, p);
P(ttwu_count);
P(ttwu_local);
- SEQ_printf(m, " .%-30s: %d\n", "bkl_count",
- rq->rq_sched_info.bkl_count);
-
#undef P
#undef P64
#endif
P(se.statistics.wait_count);
PN(se.statistics.iowait_sum);
P(se.statistics.iowait_count);
- P(sched_info.bkl_count);
P(se.nr_migrations);
P(se.statistics.nr_migrations_cold);
P(se.statistics.nr_failed_migrations_affine);
}
cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+#ifndef CONFIG_64BIT
+ smp_wmb();
+ cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
+#endif
}
/*
hrtick_update(rq);
}
+static void set_next_buddy(struct sched_entity *se);
+
/*
* The dequeue_task method is called before nr_running is
* decreased. We remove the task from the rbtree and
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
+ int task_sleep = flags & DEQUEUE_SLEEP;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, flags);
/* Don't dequeue parent if it has other entities besides us */
- if (cfs_rq->load.weight)
+ if (cfs_rq->load.weight) {
+ /*
+ * Bias pick_next to pick a task from this cfs_rq, as
+ * p is sleeping when it is within its sched_slice.
+ */
+ if (task_sleep && parent_entity(se))
+ set_next_buddy(parent_entity(se));
break;
+ }
flags |= DEQUEUE_SLEEP;
}
#ifdef CONFIG_SMP
-static void task_waking_fair(struct rq *rq, struct task_struct *p)
+static void task_waking_fair(struct task_struct *p)
{
struct sched_entity *se = &p->se;
struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ u64 min_vruntime;
- se->vruntime -= cfs_rq->min_vruntime;
+#ifndef CONFIG_64BIT
+ u64 min_vruntime_copy;
+
+ do {
+ min_vruntime_copy = cfs_rq->min_vruntime_copy;
+ smp_rmb();
+ min_vruntime = cfs_rq->min_vruntime;
+ } while (min_vruntime != min_vruntime_copy);
+#else
+ min_vruntime = cfs_rq->min_vruntime;
+#endif
+
+ se->vruntime -= min_vruntime;
}
#ifdef CONFIG_FAIR_GROUP_SCHED
/*
* Otherwise, iterate the domains and find an elegible idle cpu.
*/
+ rcu_read_lock();
for_each_domain(target, sd) {
if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
break;
cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
break;
}
+ rcu_read_unlock();
return target;
}
* preempt must be disabled.
*/
static int
-select_task_rq_fair(struct rq *rq, struct task_struct *p, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
{
struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
int cpu = smp_processor_id();
new_cpu = prev_cpu;
}
+ rcu_read_lock();
for_each_domain(cpu, tmp) {
if (!(tmp->flags & SD_LOAD_BALANCE))
continue;
if (affine_sd) {
if (cpu == prev_cpu || wake_affine(affine_sd, p, sync))
- return select_idle_sibling(p, cpu);
- else
- return select_idle_sibling(p, prev_cpu);
+ prev_cpu = cpu;
+
+ new_cpu = select_idle_sibling(p, prev_cpu);
+ goto unlock;
}
while (sd) {
}
/* while loop will break here if sd == NULL */
}
+unlock:
+ rcu_read_unlock();
return new_cpu;
}
* This is especially important for buddies when the leftmost
* task is higher priority than the buddy.
*/
- if (unlikely(se->load.weight != NICE_0_LOAD))
- gran = calc_delta_fair(gran, se);
-
- return gran;
+ return calc_delta_fair(gran, se);
}
/*
static void set_last_buddy(struct sched_entity *se)
{
- if (likely(task_of(se)->policy != SCHED_IDLE)) {
- for_each_sched_entity(se)
- cfs_rq_of(se)->last = se;
- }
+ if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
+ return;
+
+ for_each_sched_entity(se)
+ cfs_rq_of(se)->last = se;
}
static void set_next_buddy(struct sched_entity *se)
{
- if (likely(task_of(se)->policy != SCHED_IDLE)) {
- for_each_sched_entity(se)
- cfs_rq_of(se)->next = se;
- }
+ if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
+ return;
+
+ for_each_sched_entity(se)
+ cfs_rq_of(se)->next = se;
}
static void set_skip_buddy(struct sched_entity *se)
{
- if (likely(task_of(se)->policy != SCHED_IDLE)) {
- for_each_sched_entity(se)
- cfs_rq_of(se)->skip = se;
- }
+ for_each_sched_entity(se)
+ cfs_rq_of(se)->skip = se;
}
/*
struct sched_entity *se = &curr->se, *pse = &p->se;
struct cfs_rq *cfs_rq = task_cfs_rq(curr);
int scale = cfs_rq->nr_running >= sched_nr_latency;
+ int next_buddy_marked = 0;
if (unlikely(se == pse))
return;
- if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK))
+ if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) {
set_next_buddy(pse);
+ next_buddy_marked = 1;
+ }
/*
* We can come here with TIF_NEED_RESCHED already set from new task
update_curr(cfs_rq);
find_matching_se(&se, &pse);
BUG_ON(!pse);
- if (wakeup_preempt_entity(se, pse) == 1)
+ if (wakeup_preempt_entity(se, pse) == 1) {
+ /*
+ * Bias pick_next to pick the sched entity that is
+ * triggering this preemption.
+ */
+ if (!next_buddy_marked)
+ set_next_buddy(pse);
goto preempt;
+ }
return;
balance_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_load_move, struct sched_domain *sd,
enum cpu_idle_type idle, int *all_pinned,
- int *this_best_prio, struct cfs_rq *busiest_cfs_rq)
+ struct cfs_rq *busiest_cfs_rq)
{
int loops = 0, pulled = 0;
long rem_load_move = max_load_move;
*/
if (rem_load_move <= 0)
break;
-
- if (p->prio < *this_best_prio)
- *this_best_prio = p->prio;
}
out:
/*
load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle,
- int *all_pinned, int *this_best_prio)
+ int *all_pinned)
{
long rem_load_move = max_load_move;
int busiest_cpu = cpu_of(busiest);
rem_load = div_u64(rem_load, busiest_h_load + 1);
moved_load = balance_tasks(this_rq, this_cpu, busiest,
- rem_load, sd, idle, all_pinned, this_best_prio,
+ rem_load, sd, idle, all_pinned,
busiest_cfs_rq);
if (!moved_load)
load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle,
- int *all_pinned, int *this_best_prio)
+ int *all_pinned)
{
return balance_tasks(this_rq, this_cpu, busiest,
max_load_move, sd, idle, all_pinned,
- this_best_prio, &busiest->cfs);
+ &busiest->cfs);
}
#endif
int *all_pinned)
{
unsigned long total_load_moved = 0, load_moved;
- int this_best_prio = this_rq->curr->prio;
do {
load_moved = load_balance_fair(this_rq, this_cpu, busiest,
max_load_move - total_load_moved,
- sd, idle, all_pinned, &this_best_prio);
+ sd, idle, all_pinned);
total_load_moved += load_moved;
/*
* Only siblings can have significantly less than SCHED_LOAD_SCALE
*/
- if (sd->level != SD_LV_SIBLING)
+ if (!(sd->flags & SD_SHARE_CPUPOWER))
return 0;
/*
raw_spin_unlock(&this_rq->lock);
update_shares(this_cpu);
+ rcu_read_lock();
for_each_domain(this_cpu, sd) {
unsigned long interval;
int balance = 1;
break;
}
}
+ rcu_read_unlock();
raw_spin_lock(&this_rq->lock);
double_lock_balance(busiest_rq, target_rq);
/* Search for an sd spanning us and the target CPU. */
+ rcu_read_lock();
for_each_domain(target_cpu, sd) {
if ((sd->flags & SD_LOAD_BALANCE) &&
cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
else
schedstat_inc(sd, alb_failed);
}
+ rcu_read_unlock();
double_unlock_balance(busiest_rq, target_rq);
out_unlock:
busiest_rq->active_balance = 0;
{
struct sched_domain *sd;
struct sched_group *ilb_group;
+ int ilb = nr_cpu_ids;
/*
* Have idle load balancer selection from semi-idle packages only
if (cpumask_weight(nohz.idle_cpus_mask) < 2)
goto out_done;
+ rcu_read_lock();
for_each_flag_domain(cpu, sd, SD_POWERSAVINGS_BALANCE) {
ilb_group = sd->groups;
do {
- if (is_semi_idle_group(ilb_group))
- return cpumask_first(nohz.grp_idle_mask);
+ if (is_semi_idle_group(ilb_group)) {
+ ilb = cpumask_first(nohz.grp_idle_mask);
+ goto unlock;
+ }
ilb_group = ilb_group->next;
} while (ilb_group != sd->groups);
}
+unlock:
+ rcu_read_unlock();
out_done:
- return nr_cpu_ids;
+ return ilb;
}
#else /* (CONFIG_SCHED_MC || CONFIG_SCHED_SMT) */
static inline int find_new_ilb(int call_cpu)
update_shares(cpu);
+ rcu_read_lock();
for_each_domain(cpu, sd) {
if (!(sd->flags & SD_LOAD_BALANCE))
continue;
if (!balance)
break;
}
+ rcu_read_unlock();
/*
* next_balance will be updated only when there is a need.
* Decrement CPU power based on irq activity
*/
SCHED_FEAT(NONIRQ_POWER, 1)
+
+/*
+ * Queue remote wakeups on the target CPU and process them
+ * using the scheduler IPI. Reduces rq->lock contention/bounces.
+ */
+SCHED_FEAT(TTWU_QUEUE, 1)
#ifdef CONFIG_SMP
static int
-select_task_rq_idle(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
{
return task_cpu(p); /* IDLE tasks as never migrated */
}
return ktime_to_ns(rt_rq->tg->rt_bandwidth.rt_period);
}
+typedef struct task_group *rt_rq_iter_t;
+
+#define for_each_rt_rq(rt_rq, iter, rq) \
+ for (iter = list_entry_rcu(task_groups.next, typeof(*iter), list); \
+ (&iter->list != &task_groups) && \
+ (rt_rq = iter->rt_rq[cpu_of(rq)]); \
+ iter = list_entry_rcu(iter->list.next, typeof(*iter), list))
+
static inline void list_add_leaf_rt_rq(struct rt_rq *rt_rq)
{
list_add_rcu(&rt_rq->leaf_rt_rq_list,
return ktime_to_ns(def_rt_bandwidth.rt_period);
}
+typedef struct rt_rq *rt_rq_iter_t;
+
+#define for_each_rt_rq(rt_rq, iter, rq) \
+ for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
+
static inline void list_add_leaf_rt_rq(struct rt_rq *rt_rq)
{
}
static void __disable_runtime(struct rq *rq)
{
struct root_domain *rd = rq->rd;
+ rt_rq_iter_t iter;
struct rt_rq *rt_rq;
if (unlikely(!scheduler_running))
return;
- for_each_leaf_rt_rq(rt_rq, rq) {
+ for_each_rt_rq(rt_rq, iter, rq) {
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
s64 want;
int i;
static void __enable_runtime(struct rq *rq)
{
+ rt_rq_iter_t iter;
struct rt_rq *rt_rq;
if (unlikely(!scheduler_running))
/*
* Reset each runqueue's bandwidth settings
*/
- for_each_leaf_rt_rq(rt_rq, rq) {
+ for_each_rt_rq(rt_rq, iter, rq) {
struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
raw_spin_lock(&rt_b->rt_runtime_lock);
if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
rt_rq->rt_throttled = 0;
enqueue = 1;
+
+ /*
+ * Force a clock update if the CPU was idle,
+ * lest wakeup -> unthrottle time accumulate.
+ */
+ if (rt_rq->rt_nr_running && rq->curr == rq->idle)
+ rq->skip_clock_update = -1;
}
if (rt_rq->rt_time || rt_rq->rt_nr_running)
idle = 0;
static int find_lowest_rq(struct task_struct *task);
static int
-select_task_rq_rt(struct rq *rq, struct task_struct *p, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
{
+ struct task_struct *curr;
+ struct rq *rq;
+ int cpu;
+
if (sd_flag != SD_BALANCE_WAKE)
return smp_processor_id();
+ cpu = task_cpu(p);
+ rq = cpu_rq(cpu);
+
+ rcu_read_lock();
+ curr = ACCESS_ONCE(rq->curr); /* unlocked access */
+
/*
- * If the current task is an RT task, then
+ * If the current task on @p's runqueue is an RT task, then
* try to see if we can wake this RT task up on another
* runqueue. Otherwise simply start this RT task
* on its current runqueue.
* lock?
*
* For equal prio tasks, we just let the scheduler sort it out.
+ *
+ * Otherwise, just let it ride on the affined RQ and the
+ * post-schedule router will push the preempted task away
+ *
+ * This test is optimistic, if we get it wrong the load-balancer
+ * will have to sort it out.
*/
- if (unlikely(rt_task(rq->curr)) &&
- (rq->curr->rt.nr_cpus_allowed < 2 ||
- rq->curr->prio < p->prio) &&
+ if (curr && unlikely(rt_task(curr)) &&
+ (curr->rt.nr_cpus_allowed < 2 ||
+ curr->prio < p->prio) &&
(p->rt.nr_cpus_allowed > 1)) {
- int cpu = find_lowest_rq(p);
+ int target = find_lowest_rq(p);
- return (cpu == -1) ? task_cpu(p) : cpu;
+ if (target != -1)
+ cpu = target;
}
+ rcu_read_unlock();
- /*
- * Otherwise, just let it ride on the affined RQ and the
- * post-schedule router will push the preempted task away
- */
- return task_cpu(p);
+ return cpu;
}
static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
* The previous task needs to be made eligible for pushing
* if it is still active
*/
- if (p->se.on_rq && p->rt.nr_cpus_allowed > 1)
+ if (on_rt_rq(&p->rt) && p->rt.nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
}
!cpumask_test_cpu(lowest_rq->cpu,
&task->cpus_allowed) ||
task_running(rq, task) ||
- !task->se.on_rq)) {
+ !task->on_rq)) {
raw_spin_unlock(&lowest_rq->lock);
lowest_rq = NULL;
BUG_ON(task_current(rq, p));
BUG_ON(p->rt.nr_cpus_allowed <= 1);
- BUG_ON(!p->se.on_rq);
+ BUG_ON(!p->on_rq);
BUG_ON(!rt_task(p));
return p;
*/
if (p && (p->prio < this_rq->rt.highest_prio.curr)) {
WARN_ON(p == src_rq->curr);
- WARN_ON(!p->se.on_rq);
+ WARN_ON(!p->on_rq);
/*
* There's a chance that p is higher in priority
* Update the migration status of the RQ if we have an RT task
* which is running AND changing its weight value.
*/
- if (p->se.on_rq && (weight != p->rt.nr_cpus_allowed)) {
+ if (p->on_rq && (weight != p->rt.nr_cpus_allowed)) {
struct rq *rq = task_rq(p);
if (!task_current(rq, p)) {
* we may need to handle the pulling of RT tasks
* now.
*/
- if (p->se.on_rq && !rq->rt.rt_nr_running)
+ if (p->on_rq && !rq->rt.rt_nr_running)
pull_rt_task(rq);
}
* If that current running task is also an RT task
* then see if we can move to another run queue.
*/
- if (p->se.on_rq && rq->curr != p) {
+ if (p->on_rq && rq->curr != p) {
#ifdef CONFIG_SMP
if (rq->rt.overloaded && push_rt_task(rq) &&
/* Don't resched if we changed runqueues */
static void
prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
{
- if (!p->se.on_rq)
+ if (!p->on_rq)
return;
if (rq->curr == p) {
static void print_rt_stats(struct seq_file *m, int cpu)
{
+ rt_rq_iter_t iter;
struct rt_rq *rt_rq;
rcu_read_lock();
- for_each_leaf_rt_rq(rt_rq, cpu_rq(cpu))
+ for_each_rt_rq(rt_rq, iter, cpu_rq(cpu))
print_rt_rq(m, cpu, rt_rq);
rcu_read_unlock();
}
#ifdef CONFIG_SMP
static int
-select_task_rq_stop(struct rq *rq, struct task_struct *p,
- int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int sd_flag, int flags)
{
return task_cpu(p); /* stop tasks as never migrate */
}
{
struct task_struct *stop = rq->stop;
- if (stop && stop->se.on_rq)
+ if (stop && stop->on_rq)
return stop;
return NULL;
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
device_shutdown();
- sysdev_shutdown();
syscore_shutdown();
}
void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
- sysdev_shutdown();
syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
if (pm_power_off_prepare)
pm_power_off_prepare();
disable_nonboot_cpus();
- sysdev_shutdown();
syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o
-obj-y += timeconv.o posix-clock.o
+obj-y += timeconv.o posix-clock.o alarmtimer.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o
--- /dev/null
+/*
+ * Alarmtimer interface
+ *
+ * This interface provides a timer which is similarto hrtimers,
+ * but triggers a RTC alarm if the box is suspend.
+ *
+ * This interface is influenced by the Android RTC Alarm timer
+ * interface.
+ *
+ * Copyright (C) 2010 IBM Corperation
+ *
+ * Author: John Stultz <john.stultz@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/time.h>
+#include <linux/hrtimer.h>
+#include <linux/timerqueue.h>
+#include <linux/rtc.h>
+#include <linux/alarmtimer.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/posix-timers.h>
+#include <linux/workqueue.h>
+#include <linux/freezer.h>
+
+/**
+ * struct alarm_base - Alarm timer bases
+ * @lock: Lock for syncrhonized access to the base
+ * @timerqueue: Timerqueue head managing the list of events
+ * @timer: hrtimer used to schedule events while running
+ * @gettime: Function to read the time correlating to the base
+ * @base_clockid: clockid for the base
+ */
+static struct alarm_base {
+ spinlock_t lock;
+ struct timerqueue_head timerqueue;
+ struct hrtimer timer;
+ ktime_t (*gettime)(void);
+ clockid_t base_clockid;
+} alarm_bases[ALARM_NUMTYPE];
+
+#ifdef CONFIG_RTC_CLASS
+/* rtc timer and device for setting alarm wakeups at suspend */
+static struct rtc_timer rtctimer;
+static struct rtc_device *rtcdev;
+#endif
+
+/* freezer delta & lock used to handle clock_nanosleep triggered wakeups */
+static ktime_t freezer_delta;
+static DEFINE_SPINLOCK(freezer_delta_lock);
+
+
+/**
+ * alarmtimer_enqueue - Adds an alarm timer to an alarm_base timerqueue
+ * @base: pointer to the base where the timer is being run
+ * @alarm: pointer to alarm being enqueued.
+ *
+ * Adds alarm to a alarm_base timerqueue and if necessary sets
+ * an hrtimer to run.
+ *
+ * Must hold base->lock when calling.
+ */
+static void alarmtimer_enqueue(struct alarm_base *base, struct alarm *alarm)
+{
+ timerqueue_add(&base->timerqueue, &alarm->node);
+ if (&alarm->node == timerqueue_getnext(&base->timerqueue)) {
+ hrtimer_try_to_cancel(&base->timer);
+ hrtimer_start(&base->timer, alarm->node.expires,
+ HRTIMER_MODE_ABS);
+ }
+}
+
+/**
+ * alarmtimer_remove - Removes an alarm timer from an alarm_base timerqueue
+ * @base: pointer to the base where the timer is running
+ * @alarm: pointer to alarm being removed
+ *
+ * Removes alarm to a alarm_base timerqueue and if necessary sets
+ * a new timer to run.
+ *
+ * Must hold base->lock when calling.
+ */
+static void alarmtimer_remove(struct alarm_base *base, struct alarm *alarm)
+{
+ struct timerqueue_node *next = timerqueue_getnext(&base->timerqueue);
+
+ timerqueue_del(&base->timerqueue, &alarm->node);
+ if (next == &alarm->node) {
+ hrtimer_try_to_cancel(&base->timer);
+ next = timerqueue_getnext(&base->timerqueue);
+ if (!next)
+ return;
+ hrtimer_start(&base->timer, next->expires, HRTIMER_MODE_ABS);
+ }
+}
+
+
+/**
+ * alarmtimer_fired - Handles alarm hrtimer being fired.
+ * @timer: pointer to hrtimer being run
+ *
+ * When a alarm timer fires, this runs through the timerqueue to
+ * see which alarms expired, and runs those. If there are more alarm
+ * timers queued for the future, we set the hrtimer to fire when
+ * when the next future alarm timer expires.
+ */
+static enum hrtimer_restart alarmtimer_fired(struct hrtimer *timer)
+{
+ struct alarm_base *base = container_of(timer, struct alarm_base, timer);
+ struct timerqueue_node *next;
+ unsigned long flags;
+ ktime_t now;
+ int ret = HRTIMER_NORESTART;
+
+ spin_lock_irqsave(&base->lock, flags);
+ now = base->gettime();
+ while ((next = timerqueue_getnext(&base->timerqueue))) {
+ struct alarm *alarm;
+ ktime_t expired = next->expires;
+
+ if (expired.tv64 >= now.tv64)
+ break;
+
+ alarm = container_of(next, struct alarm, node);
+
+ timerqueue_del(&base->timerqueue, &alarm->node);
+ alarm->enabled = 0;
+ /* Re-add periodic timers */
+ if (alarm->period.tv64) {
+ alarm->node.expires = ktime_add(expired, alarm->period);
+ timerqueue_add(&base->timerqueue, &alarm->node);
+ alarm->enabled = 1;
+ }
+ spin_unlock_irqrestore(&base->lock, flags);
+ if (alarm->function)
+ alarm->function(alarm);
+ spin_lock_irqsave(&base->lock, flags);
+ }
+
+ if (next) {
+ hrtimer_set_expires(&base->timer, next->expires);
+ ret = HRTIMER_RESTART;
+ }
+ spin_unlock_irqrestore(&base->lock, flags);
+
+ return ret;
+
+}
+
+#ifdef CONFIG_RTC_CLASS
+/**
+ * alarmtimer_suspend - Suspend time callback
+ * @dev: unused
+ * @state: unused
+ *
+ * When we are going into suspend, we look through the bases
+ * to see which is the soonest timer to expire. We then
+ * set an rtc timer to fire that far into the future, which
+ * will wake us from suspend.
+ */
+static int alarmtimer_suspend(struct device *dev)
+{
+ struct rtc_time tm;
+ ktime_t min, now;
+ unsigned long flags;
+ int i;
+
+ spin_lock_irqsave(&freezer_delta_lock, flags);
+ min = freezer_delta;
+ freezer_delta = ktime_set(0, 0);
+ spin_unlock_irqrestore(&freezer_delta_lock, flags);
+
+ /* If we have no rtcdev, just return */
+ if (!rtcdev)
+ return 0;
+
+ /* Find the soonest timer to expire*/
+ for (i = 0; i < ALARM_NUMTYPE; i++) {
+ struct alarm_base *base = &alarm_bases[i];
+ struct timerqueue_node *next;
+ ktime_t delta;
+
+ spin_lock_irqsave(&base->lock, flags);
+ next = timerqueue_getnext(&base->timerqueue);
+ spin_unlock_irqrestore(&base->lock, flags);
+ if (!next)
+ continue;
+ delta = ktime_sub(next->expires, base->gettime());
+ if (!min.tv64 || (delta.tv64 < min.tv64))
+ min = delta;
+ }
+ if (min.tv64 == 0)
+ return 0;
+
+ /* XXX - Should we enforce a minimum sleep time? */
+ WARN_ON(min.tv64 < NSEC_PER_SEC);
+
+ /* Setup an rtc timer to fire that far in the future */
+ rtc_timer_cancel(rtcdev, &rtctimer);
+ rtc_read_time(rtcdev, &tm);
+ now = rtc_tm_to_ktime(tm);
+ now = ktime_add(now, min);
+
+ rtc_timer_start(rtcdev, &rtctimer, now, ktime_set(0, 0));
+
+ return 0;
+}
+#else
+static int alarmtimer_suspend(struct device *dev)
+{
+ return 0;
+}
+#endif
+
+static void alarmtimer_freezerset(ktime_t absexp, enum alarmtimer_type type)
+{
+ ktime_t delta;
+ unsigned long flags;
+ struct alarm_base *base = &alarm_bases[type];
+
+ delta = ktime_sub(absexp, base->gettime());
+
+ spin_lock_irqsave(&freezer_delta_lock, flags);
+ if (!freezer_delta.tv64 || (delta.tv64 < freezer_delta.tv64))
+ freezer_delta = delta;
+ spin_unlock_irqrestore(&freezer_delta_lock, flags);
+}
+
+
+/**
+ * alarm_init - Initialize an alarm structure
+ * @alarm: ptr to alarm to be initialized
+ * @type: the type of the alarm
+ * @function: callback that is run when the alarm fires
+ */
+void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
+ void (*function)(struct alarm *))
+{
+ timerqueue_init(&alarm->node);
+ alarm->period = ktime_set(0, 0);
+ alarm->function = function;
+ alarm->type = type;
+ alarm->enabled = 0;
+}
+
+/**
+ * alarm_start - Sets an alarm to fire
+ * @alarm: ptr to alarm to set
+ * @start: time to run the alarm
+ * @period: period at which the alarm will recur
+ */
+void alarm_start(struct alarm *alarm, ktime_t start, ktime_t period)
+{
+ struct alarm_base *base = &alarm_bases[alarm->type];
+ unsigned long flags;
+
+ spin_lock_irqsave(&base->lock, flags);
+ if (alarm->enabled)
+ alarmtimer_remove(base, alarm);
+ alarm->node.expires = start;
+ alarm->period = period;
+ alarmtimer_enqueue(base, alarm);
+ alarm->enabled = 1;
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+/**
+ * alarm_cancel - Tries to cancel an alarm timer
+ * @alarm: ptr to alarm to be canceled
+ */
+void alarm_cancel(struct alarm *alarm)
+{
+ struct alarm_base *base = &alarm_bases[alarm->type];
+ unsigned long flags;
+
+ spin_lock_irqsave(&base->lock, flags);
+ if (alarm->enabled)
+ alarmtimer_remove(base, alarm);
+ alarm->enabled = 0;
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+
+/**
+ * clock2alarm - helper that converts from clockid to alarmtypes
+ * @clockid: clockid.
+ */
+static enum alarmtimer_type clock2alarm(clockid_t clockid)
+{
+ if (clockid == CLOCK_REALTIME_ALARM)
+ return ALARM_REALTIME;
+ if (clockid == CLOCK_BOOTTIME_ALARM)
+ return ALARM_BOOTTIME;
+ return -1;
+}
+
+/**
+ * alarm_handle_timer - Callback for posix timers
+ * @alarm: alarm that fired
+ *
+ * Posix timer callback for expired alarm timers.
+ */
+static void alarm_handle_timer(struct alarm *alarm)
+{
+ struct k_itimer *ptr = container_of(alarm, struct k_itimer,
+ it.alarmtimer);
+ if (posix_timer_event(ptr, 0) != 0)
+ ptr->it_overrun++;
+}
+
+/**
+ * alarm_clock_getres - posix getres interface
+ * @which_clock: clockid
+ * @tp: timespec to fill
+ *
+ * Returns the granularity of underlying alarm base clock
+ */
+static int alarm_clock_getres(const clockid_t which_clock, struct timespec *tp)
+{
+ clockid_t baseid = alarm_bases[clock2alarm(which_clock)].base_clockid;
+
+ return hrtimer_get_res(baseid, tp);
+}
+
+/**
+ * alarm_clock_get - posix clock_get interface
+ * @which_clock: clockid
+ * @tp: timespec to fill.
+ *
+ * Provides the underlying alarm base time.
+ */
+static int alarm_clock_get(clockid_t which_clock, struct timespec *tp)
+{
+ struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
+
+ *tp = ktime_to_timespec(base->gettime());
+ return 0;
+}
+
+/**
+ * alarm_timer_create - posix timer_create interface
+ * @new_timer: k_itimer pointer to manage
+ *
+ * Initializes the k_itimer structure.
+ */
+static int alarm_timer_create(struct k_itimer *new_timer)
+{
+ enum alarmtimer_type type;
+ struct alarm_base *base;
+
+ if (!capable(CAP_WAKE_ALARM))
+ return -EPERM;
+
+ type = clock2alarm(new_timer->it_clock);
+ base = &alarm_bases[type];
+ alarm_init(&new_timer->it.alarmtimer, type, alarm_handle_timer);
+ return 0;
+}
+
+/**
+ * alarm_timer_get - posix timer_get interface
+ * @new_timer: k_itimer pointer
+ * @cur_setting: itimerspec data to fill
+ *
+ * Copies the itimerspec data out from the k_itimer
+ */
+static void alarm_timer_get(struct k_itimer *timr,
+ struct itimerspec *cur_setting)
+{
+ cur_setting->it_interval =
+ ktime_to_timespec(timr->it.alarmtimer.period);
+ cur_setting->it_value =
+ ktime_to_timespec(timr->it.alarmtimer.node.expires);
+ return;
+}
+
+/**
+ * alarm_timer_del - posix timer_del interface
+ * @timr: k_itimer pointer to be deleted
+ *
+ * Cancels any programmed alarms for the given timer.
+ */
+static int alarm_timer_del(struct k_itimer *timr)
+{
+ alarm_cancel(&timr->it.alarmtimer);
+ return 0;
+}
+
+/**
+ * alarm_timer_set - posix timer_set interface
+ * @timr: k_itimer pointer to be deleted
+ * @flags: timer flags
+ * @new_setting: itimerspec to be used
+ * @old_setting: itimerspec being replaced
+ *
+ * Sets the timer to new_setting, and starts the timer.
+ */
+static int alarm_timer_set(struct k_itimer *timr, int flags,
+ struct itimerspec *new_setting,
+ struct itimerspec *old_setting)
+{
+ /* Save old values */
+ old_setting->it_interval =
+ ktime_to_timespec(timr->it.alarmtimer.period);
+ old_setting->it_value =
+ ktime_to_timespec(timr->it.alarmtimer.node.expires);
+
+ /* If the timer was already set, cancel it */
+ alarm_cancel(&timr->it.alarmtimer);
+
+ /* start the timer */
+ alarm_start(&timr->it.alarmtimer,
+ timespec_to_ktime(new_setting->it_value),
+ timespec_to_ktime(new_setting->it_interval));
+ return 0;
+}
+
+/**
+ * alarmtimer_nsleep_wakeup - Wakeup function for alarm_timer_nsleep
+ * @alarm: ptr to alarm that fired
+ *
+ * Wakes up the task that set the alarmtimer
+ */
+static void alarmtimer_nsleep_wakeup(struct alarm *alarm)
+{
+ struct task_struct *task = (struct task_struct *)alarm->data;
+
+ alarm->data = NULL;
+ if (task)
+ wake_up_process(task);
+}
+
+/**
+ * alarmtimer_do_nsleep - Internal alarmtimer nsleep implementation
+ * @alarm: ptr to alarmtimer
+ * @absexp: absolute expiration time
+ *
+ * Sets the alarm timer and sleeps until it is fired or interrupted.
+ */
+static int alarmtimer_do_nsleep(struct alarm *alarm, ktime_t absexp)
+{
+ alarm->data = (void *)current;
+ do {
+ set_current_state(TASK_INTERRUPTIBLE);
+ alarm_start(alarm, absexp, ktime_set(0, 0));
+ if (likely(alarm->data))
+ schedule();
+
+ alarm_cancel(alarm);
+ } while (alarm->data && !signal_pending(current));
+
+ __set_current_state(TASK_RUNNING);
+
+ return (alarm->data == NULL);
+}
+
+
+/**
+ * update_rmtp - Update remaining timespec value
+ * @exp: expiration time
+ * @type: timer type
+ * @rmtp: user pointer to remaining timepsec value
+ *
+ * Helper function that fills in rmtp value with time between
+ * now and the exp value
+ */
+static int update_rmtp(ktime_t exp, enum alarmtimer_type type,
+ struct timespec __user *rmtp)
+{
+ struct timespec rmt;
+ ktime_t rem;
+
+ rem = ktime_sub(exp, alarm_bases[type].gettime());
+
+ if (rem.tv64 <= 0)
+ return 0;
+ rmt = ktime_to_timespec(rem);
+
+ if (copy_to_user(rmtp, &rmt, sizeof(*rmtp)))
+ return -EFAULT;
+
+ return 1;
+
+}
+
+/**
+ * alarm_timer_nsleep_restart - restartblock alarmtimer nsleep
+ * @restart: ptr to restart block
+ *
+ * Handles restarted clock_nanosleep calls
+ */
+static long __sched alarm_timer_nsleep_restart(struct restart_block *restart)
+{
+ enum alarmtimer_type type = restart->nanosleep.index;
+ ktime_t exp;
+ struct timespec __user *rmtp;
+ struct alarm alarm;
+ int ret = 0;
+
+ exp.tv64 = restart->nanosleep.expires;
+ alarm_init(&alarm, type, alarmtimer_nsleep_wakeup);
+
+ if (alarmtimer_do_nsleep(&alarm, exp))
+ goto out;
+
+ if (freezing(current))
+ alarmtimer_freezerset(exp, type);
+
+ rmtp = restart->nanosleep.rmtp;
+ if (rmtp) {
+ ret = update_rmtp(exp, type, rmtp);
+ if (ret <= 0)
+ goto out;
+ }
+
+
+ /* The other values in restart are already filled in */
+ ret = -ERESTART_RESTARTBLOCK;
+out:
+ return ret;
+}
+
+/**
+ * alarm_timer_nsleep - alarmtimer nanosleep
+ * @which_clock: clockid
+ * @flags: determins abstime or relative
+ * @tsreq: requested sleep time (abs or rel)
+ * @rmtp: remaining sleep time saved
+ *
+ * Handles clock_nanosleep calls against _ALARM clockids
+ */
+static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
+ struct timespec *tsreq, struct timespec __user *rmtp)
+{
+ enum alarmtimer_type type = clock2alarm(which_clock);
+ struct alarm alarm;
+ ktime_t exp;
+ int ret = 0;
+ struct restart_block *restart;
+
+ if (!capable(CAP_WAKE_ALARM))
+ return -EPERM;
+
+ alarm_init(&alarm, type, alarmtimer_nsleep_wakeup);
+
+ exp = timespec_to_ktime(*tsreq);
+ /* Convert (if necessary) to absolute time */
+ if (flags != TIMER_ABSTIME) {
+ ktime_t now = alarm_bases[type].gettime();
+ exp = ktime_add(now, exp);
+ }
+
+ if (alarmtimer_do_nsleep(&alarm, exp))
+ goto out;
+
+ if (freezing(current))
+ alarmtimer_freezerset(exp, type);
+
+ /* abs timers don't set remaining time or restart */
+ if (flags == TIMER_ABSTIME) {
+ ret = -ERESTARTNOHAND;
+ goto out;
+ }
+
+ if (rmtp) {
+ ret = update_rmtp(exp, type, rmtp);
+ if (ret <= 0)
+ goto out;
+ }
+
+ restart = ¤t_thread_info()->restart_block;
+ restart->fn = alarm_timer_nsleep_restart;
+ restart->nanosleep.index = type;
+ restart->nanosleep.expires = exp.tv64;
+ restart->nanosleep.rmtp = rmtp;
+ ret = -ERESTART_RESTARTBLOCK;
+
+out:
+ return ret;
+}
+
+
+/* Suspend hook structures */
+static const struct dev_pm_ops alarmtimer_pm_ops = {
+ .suspend = alarmtimer_suspend,
+};
+
+static struct platform_driver alarmtimer_driver = {
+ .driver = {
+ .name = "alarmtimer",
+ .pm = &alarmtimer_pm_ops,
+ }
+};
+
+/**
+ * alarmtimer_init - Initialize alarm timer code
+ *
+ * This function initializes the alarm bases and registers
+ * the posix clock ids.
+ */
+static int __init alarmtimer_init(void)
+{
+ int error = 0;
+ int i;
+ struct k_clock alarm_clock = {
+ .clock_getres = alarm_clock_getres,
+ .clock_get = alarm_clock_get,
+ .timer_create = alarm_timer_create,
+ .timer_set = alarm_timer_set,
+ .timer_del = alarm_timer_del,
+ .timer_get = alarm_timer_get,
+ .nsleep = alarm_timer_nsleep,
+ };
+
+ posix_timers_register_clock(CLOCK_REALTIME_ALARM, &alarm_clock);
+ posix_timers_register_clock(CLOCK_BOOTTIME_ALARM, &alarm_clock);
+
+ /* Initialize alarm bases */
+ alarm_bases[ALARM_REALTIME].base_clockid = CLOCK_REALTIME;
+ alarm_bases[ALARM_REALTIME].gettime = &ktime_get_real;
+ alarm_bases[ALARM_BOOTTIME].base_clockid = CLOCK_BOOTTIME;
+ alarm_bases[ALARM_BOOTTIME].gettime = &ktime_get_boottime;
+ for (i = 0; i < ALARM_NUMTYPE; i++) {
+ timerqueue_init_head(&alarm_bases[i].timerqueue);
+ spin_lock_init(&alarm_bases[i].lock);
+ hrtimer_init(&alarm_bases[i].timer,
+ alarm_bases[i].base_clockid,
+ HRTIMER_MODE_ABS);
+ alarm_bases[i].timer.function = alarmtimer_fired;
+ }
+ error = platform_driver_register(&alarmtimer_driver);
+ platform_device_register_simple("alarmtimer", -1, NULL, 0);
+
+ return error;
+}
+device_initcall(alarmtimer_init);
+
+#ifdef CONFIG_RTC_CLASS
+/**
+ * has_wakealarm - check rtc device has wakealarm ability
+ * @dev: current device
+ * @name_ptr: name to be returned
+ *
+ * This helper function checks to see if the rtc device can wake
+ * from suspend.
+ */
+static int __init has_wakealarm(struct device *dev, void *name_ptr)
+{
+ struct rtc_device *candidate = to_rtc_device(dev);
+
+ if (!candidate->ops->set_alarm)
+ return 0;
+ if (!device_may_wakeup(candidate->dev.parent))
+ return 0;
+
+ *(const char **)name_ptr = dev_name(dev);
+ return 1;
+}
+
+/**
+ * alarmtimer_init_late - Late initializing of alarmtimer code
+ *
+ * This function locates a rtc device to use for wakealarms.
+ * Run as late_initcall to make sure rtc devices have been
+ * registered.
+ */
+static int __init alarmtimer_init_late(void)
+{
+ char *str;
+
+ /* Find an rtc device and init the rtc_timer */
+ class_find_device(rtc_class, NULL, &str, has_wakealarm);
+ if (str)
+ rtcdev = rtc_class_open(str);
+ if (!rtcdev) {
+ printk(KERN_WARNING "No RTC device found, ALARM timers will"
+ " not wake from suspend");
+ }
+ rtc_timer_init(&rtctimer, NULL, NULL);
+
+ return 0;
+}
+#else
+static int __init alarmtimer_init_late(void)
+{
+ printk(KERN_WARNING "Kernel not built with RTC support, ALARM timers"
+ " will not wake from suspend");
+ return 0;
+}
+#endif
+late_initcall(alarmtimer_init_late);
}
EXPORT_SYMBOL_GPL(clockevents_register_device);
+static void clockevents_config(struct clock_event_device *dev,
+ u32 freq)
+{
+ unsigned long sec;
+
+ if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+ return;
+
+ /*
+ * Calculate the maximum number of seconds we can sleep. Limit
+ * to 10 minutes for hardware which can program more than
+ * 32bit ticks so we still get reasonable conversion values.
+ */
+ sec = dev->max_delta_ticks;
+ do_div(sec, freq);
+ if (!sec)
+ sec = 1;
+ else if (sec > 600 && dev->max_delta_ticks > UINT_MAX)
+ sec = 600;
+
+ clockevents_calc_mult_shift(dev, freq, sec);
+ dev->min_delta_ns = clockevent_delta2ns(dev->min_delta_ticks, dev);
+ dev->max_delta_ns = clockevent_delta2ns(dev->max_delta_ticks, dev);
+}
+
+/**
+ * clockevents_config_and_register - Configure and register a clock event device
+ * @dev: device to register
+ * @freq: The clock frequency
+ * @min_delta: The minimum clock ticks to program in oneshot mode
+ * @max_delta: The maximum clock ticks to program in oneshot mode
+ *
+ * min/max_delta can be 0 for devices which do not support oneshot mode.
+ */
+void clockevents_config_and_register(struct clock_event_device *dev,
+ u32 freq, unsigned long min_delta,
+ unsigned long max_delta)
+{
+ dev->min_delta_ticks = min_delta;
+ dev->max_delta_ticks = max_delta;
+ clockevents_config(dev, freq);
+ clockevents_register_device(dev);
+}
+
+/**
+ * clockevents_update_freq - Update frequency and reprogram a clock event device.
+ * @dev: device to modify
+ * @freq: new device frequency
+ *
+ * Reconfigure and reprogram a clock event device in oneshot
+ * mode. Must be called on the cpu for which the device delivers per
+ * cpu timer events with interrupts disabled! Returns 0 on success,
+ * -ETIME when the event is in the past.
+ */
+int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
+{
+ clockevents_config(dev, freq);
+
+ if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
+ return 0;
+
+ return clockevents_program_event(dev, dev->next_event, ktime_get());
+}
+
/*
* Noop handler when we shut down an event device
*/
list_add(&cs->list, entry);
}
-
-/*
- * Maximum time we expect to go between ticks. This includes idle
- * tickless time. It provides the trade off between selecting a
- * mult/shift pair that is very precise but can only handle a short
- * period of time, vs. a mult/shift pair that can handle long periods
- * of time but isn't as precise.
- *
- * This is a subsystem constant, and actual hardware limitations
- * may override it (ie: clocksources that wrap every 3 seconds).
- */
-#define MAX_UPDATE_LENGTH 5 /* Seconds */
-
/**
* __clocksource_updatefreq_scale - Used update clocksource with new freq
* @t: clocksource to be registered
*/
void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
{
+ unsigned long sec;
+
/*
- * Ideally we want to use some of the limits used in
- * clocksource_max_deferment, to provide a more informed
- * MAX_UPDATE_LENGTH. But for now this just gets the
- * register interface working properly.
+ * Calc the maximum number of seconds which we can run before
+ * wrapping around. For clocksources which have a mask > 32bit
+ * we need to limit the max sleep time to have a good
+ * conversion precision. 10 minutes is still a reasonable
+ * amount. That results in a shift value of 24 for a
+ * clocksource with mask >= 40bit and f >= 4GHz. That maps to
+ * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
+ * margin as we do in clocksource_max_deferment()
*/
+ sec = (cs->mask - (cs->mask >> 5));
+ do_div(sec, freq);
+ do_div(sec, scale);
+ if (!sec)
+ sec = 1;
+ else if (sec > 600 && cs->mask > UINT_MAX)
+ sec = 600;
+
clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
- NSEC_PER_SEC/scale,
- MAX_UPDATE_LENGTH*scale);
+ NSEC_PER_SEC / scale, sec * scale);
cs->max_idle_ns = clocksource_max_deferment(cs);
}
EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
/* Add clocksource to the clcoksource list */
mutex_lock(&clocksource_mutex);
clocksource_enqueue(cs);
- clocksource_select();
clocksource_enqueue_watchdog(cs);
+ clocksource_select();
mutex_unlock(&clocksource_mutex);
return 0;
}
mutex_lock(&clocksource_mutex);
clocksource_enqueue(cs);
- clocksource_select();
clocksource_enqueue_watchdog(cs);
+ clocksource_select();
mutex_unlock(&clocksource_mutex);
return 0;
}
*/
void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
{
+ int cpu = smp_processor_id();
+
/* Set it up only once ! */
if (bc->event_handler != tick_handle_oneshot_broadcast) {
int was_periodic = bc->mode == CLOCK_EVT_MODE_PERIODIC;
- int cpu = smp_processor_id();
bc->event_handler = tick_handle_oneshot_broadcast;
clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
tick_broadcast_set_event(tick_next_period, 1);
} else
bc->next_event.tv64 = KTIME_MAX;
+ } else {
+ /*
+ * The first cpu which switches to oneshot mode sets
+ * the bit for all other cpus which are in the general
+ * (periodic) broadcast mask. So the bit is set and
+ * would prevent the first broadcast enter after this
+ * to program the bc device.
+ */
+ tick_broadcast_clear_oneshot(cpu);
}
}
/* time in seconds when suspend began */
static struct timespec timekeeping_suspend_time;
+/**
+ * __timekeeping_inject_sleeptime - Internal function to add sleep interval
+ * @delta: pointer to a timespec delta value
+ *
+ * Takes a timespec offset measuring a suspend interval and properly
+ * adds the sleep offset to the timekeeping variables.
+ */
+static void __timekeeping_inject_sleeptime(struct timespec *delta)
+{
+ xtime = timespec_add(xtime, *delta);
+ wall_to_monotonic = timespec_sub(wall_to_monotonic, *delta);
+ total_sleep_time = timespec_add(total_sleep_time, *delta);
+}
+
+
+/**
+ * timekeeping_inject_sleeptime - Adds suspend interval to timeekeeping values
+ * @delta: pointer to a timespec delta value
+ *
+ * This hook is for architectures that cannot support read_persistent_clock
+ * because their RTC/persistent clock is only accessible when irqs are enabled.
+ *
+ * This function should only be called by rtc_resume(), and allows
+ * a suspend offset to be injected into the timekeeping values.
+ */
+void timekeeping_inject_sleeptime(struct timespec *delta)
+{
+ unsigned long flags;
+ struct timespec ts;
+
+ /* Make sure we don't set the clock twice */
+ read_persistent_clock(&ts);
+ if (!(ts.tv_sec == 0 && ts.tv_nsec == 0))
+ return;
+
+ write_seqlock_irqsave(&xtime_lock, flags);
+ timekeeping_forward_now();
+
+ __timekeeping_inject_sleeptime(delta);
+
+ timekeeper.ntp_error = 0;
+ ntp_clear();
+ update_vsyscall(&xtime, &wall_to_monotonic, timekeeper.clock,
+ timekeeper.mult);
+
+ write_sequnlock_irqrestore(&xtime_lock, flags);
+
+ /* signal hrtimers about time change */
+ clock_was_set();
+}
+
+
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
*
if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
ts = timespec_sub(ts, timekeeping_suspend_time);
- xtime = timespec_add(xtime, ts);
- wall_to_monotonic = timespec_sub(wall_to_monotonic, ts);
- total_sleep_time = timespec_add(total_sleep_time, ts);
+ __timekeeping_inject_sleeptime(&ts);
}
/* re-base the last cycle value */
timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
config FUNCTION_TRACER
bool "Kernel Function Tracer"
depends on HAVE_FUNCTION_TRACER
- select FRAME_POINTER if !ARM_UNWIND && !S390
+ select FRAME_POINTER if !ARM_UNWIND && !S390 && !MICROBLAZE
select KALLSYMS
select GENERIC_TRACER
select CONTEXT_SWITCH_TRACER
#include "trace_stat.h"
#define FTRACE_WARN_ON(cond) \
- do { \
- if (WARN_ON(cond)) \
+ ({ \
+ int ___r = cond; \
+ if (WARN_ON(___r)) \
ftrace_kill(); \
- } while (0)
+ ___r; \
+ })
#define FTRACE_WARN_ON_ONCE(cond) \
- do { \
- if (WARN_ON_ONCE(cond)) \
+ ({ \
+ int ___r = cond; \
+ if (WARN_ON_ONCE(___r)) \
ftrace_kill(); \
- } while (0)
+ ___r; \
+ })
/* hash bits for specific function selection */
#define FTRACE_HASH_BITS 7
#define FTRACE_FUNC_HASHSIZE (1 << FTRACE_HASH_BITS)
+#define FTRACE_HASH_DEFAULT_BITS 10
+#define FTRACE_HASH_MAX_BITS 12
/* ftrace_enabled is a method to turn ftrace on or off */
int ftrace_enabled __read_mostly;
.func = ftrace_stub,
};
-static struct ftrace_ops *ftrace_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_global_list __read_mostly = &ftrace_list_end;
+static struct ftrace_ops *ftrace_ops_list __read_mostly = &ftrace_list_end;
ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
ftrace_func_t __ftrace_trace_function __read_mostly = ftrace_stub;
ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
+static struct ftrace_ops global_ops;
+
+static void
+ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
/*
- * Traverse the ftrace_list, invoking all entries. The reason that we
+ * Traverse the ftrace_global_list, invoking all entries. The reason that we
* can use rcu_dereference_raw() is that elements removed from this list
* are simply leaked, so there is no need to interact with a grace-period
* mechanism. The rcu_dereference_raw() calls are needed to handle
- * concurrent insertions into the ftrace_list.
+ * concurrent insertions into the ftrace_global_list.
*
* Silly Alpha and silly pointer-speculation compiler optimizations!
*/
-static void ftrace_list_func(unsigned long ip, unsigned long parent_ip)
+static void ftrace_global_list_func(unsigned long ip,
+ unsigned long parent_ip)
{
- struct ftrace_ops *op = rcu_dereference_raw(ftrace_list); /*see above*/
+ struct ftrace_ops *op = rcu_dereference_raw(ftrace_global_list); /*see above*/
while (op != &ftrace_list_end) {
op->func(ip, parent_ip);
}
#endif
-static int __register_ftrace_function(struct ftrace_ops *ops)
+static void update_global_ops(void)
{
- ops->next = ftrace_list;
+ ftrace_func_t func;
+
/*
- * We are entering ops into the ftrace_list but another
- * CPU might be walking that list. We need to make sure
- * the ops->next pointer is valid before another CPU sees
- * the ops pointer included into the ftrace_list.
+ * If there's only one function registered, then call that
+ * function directly. Otherwise, we need to iterate over the
+ * registered callers.
*/
- rcu_assign_pointer(ftrace_list, ops);
+ if (ftrace_global_list == &ftrace_list_end ||
+ ftrace_global_list->next == &ftrace_list_end)
+ func = ftrace_global_list->func;
+ else
+ func = ftrace_global_list_func;
- if (ftrace_enabled) {
- ftrace_func_t func;
+ /* If we filter on pids, update to use the pid function */
+ if (!list_empty(&ftrace_pids)) {
+ set_ftrace_pid_function(func);
+ func = ftrace_pid_func;
+ }
- if (ops->next == &ftrace_list_end)
- func = ops->func;
- else
- func = ftrace_list_func;
+ global_ops.func = func;
+}
- if (!list_empty(&ftrace_pids)) {
- set_ftrace_pid_function(func);
- func = ftrace_pid_func;
- }
+static void update_ftrace_function(void)
+{
+ ftrace_func_t func;
+
+ update_global_ops();
+
+ /*
+ * If we are at the end of the list and this ops is
+ * not dynamic, then have the mcount trampoline call
+ * the function directly
+ */
+ if (ftrace_ops_list == &ftrace_list_end ||
+ (ftrace_ops_list->next == &ftrace_list_end &&
+ !(ftrace_ops_list->flags & FTRACE_OPS_FL_DYNAMIC)))
+ func = ftrace_ops_list->func;
+ else
+ func = ftrace_ops_list_func;
- /*
- * For one func, simply call it directly.
- * For more than one func, call the chain.
- */
#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
- ftrace_trace_function = func;
+ ftrace_trace_function = func;
#else
- __ftrace_trace_function = func;
- ftrace_trace_function = ftrace_test_stop_func;
+ __ftrace_trace_function = func;
+ ftrace_trace_function = ftrace_test_stop_func;
#endif
- }
+}
- return 0;
+static void add_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
+{
+ ops->next = *list;
+ /*
+ * We are entering ops into the list but another
+ * CPU might be walking that list. We need to make sure
+ * the ops->next pointer is valid before another CPU sees
+ * the ops pointer included into the list.
+ */
+ rcu_assign_pointer(*list, ops);
}
-static int __unregister_ftrace_function(struct ftrace_ops *ops)
+static int remove_ftrace_ops(struct ftrace_ops **list, struct ftrace_ops *ops)
{
struct ftrace_ops **p;
* If we are removing the last function, then simply point
* to the ftrace_stub.
*/
- if (ftrace_list == ops && ops->next == &ftrace_list_end) {
- ftrace_trace_function = ftrace_stub;
- ftrace_list = &ftrace_list_end;
+ if (*list == ops && ops->next == &ftrace_list_end) {
+ *list = &ftrace_list_end;
return 0;
}
- for (p = &ftrace_list; *p != &ftrace_list_end; p = &(*p)->next)
+ for (p = list; *p != &ftrace_list_end; p = &(*p)->next)
if (*p == ops)
break;
return -1;
*p = (*p)->next;
+ return 0;
+}
- if (ftrace_enabled) {
- /* If we only have one func left, then call that directly */
- if (ftrace_list->next == &ftrace_list_end) {
- ftrace_func_t func = ftrace_list->func;
+static int __register_ftrace_function(struct ftrace_ops *ops)
+{
+ if (ftrace_disabled)
+ return -ENODEV;
- if (!list_empty(&ftrace_pids)) {
- set_ftrace_pid_function(func);
- func = ftrace_pid_func;
- }
-#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
- ftrace_trace_function = func;
-#else
- __ftrace_trace_function = func;
-#endif
- }
- }
+ if (FTRACE_WARN_ON(ops == &global_ops))
+ return -EINVAL;
+
+ if (WARN_ON(ops->flags & FTRACE_OPS_FL_ENABLED))
+ return -EBUSY;
+
+ if (!core_kernel_data((unsigned long)ops))
+ ops->flags |= FTRACE_OPS_FL_DYNAMIC;
+
+ if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
+ int first = ftrace_global_list == &ftrace_list_end;
+ add_ftrace_ops(&ftrace_global_list, ops);
+ ops->flags |= FTRACE_OPS_FL_ENABLED;
+ if (first)
+ add_ftrace_ops(&ftrace_ops_list, &global_ops);
+ } else
+ add_ftrace_ops(&ftrace_ops_list, ops);
+
+ if (ftrace_enabled)
+ update_ftrace_function();
return 0;
}
-static void ftrace_update_pid_func(void)
+static int __unregister_ftrace_function(struct ftrace_ops *ops)
{
- ftrace_func_t func;
+ int ret;
- if (ftrace_trace_function == ftrace_stub)
- return;
+ if (ftrace_disabled)
+ return -ENODEV;
-#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
- func = ftrace_trace_function;
-#else
- func = __ftrace_trace_function;
-#endif
+ if (WARN_ON(!(ops->flags & FTRACE_OPS_FL_ENABLED)))
+ return -EBUSY;
- if (!list_empty(&ftrace_pids)) {
- set_ftrace_pid_function(func);
- func = ftrace_pid_func;
- } else {
- if (func == ftrace_pid_func)
- func = ftrace_pid_function;
- }
+ if (FTRACE_WARN_ON(ops == &global_ops))
+ return -EINVAL;
-#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
- ftrace_trace_function = func;
-#else
- __ftrace_trace_function = func;
-#endif
+ if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
+ ret = remove_ftrace_ops(&ftrace_global_list, ops);
+ if (!ret && ftrace_global_list == &ftrace_list_end)
+ ret = remove_ftrace_ops(&ftrace_ops_list, &global_ops);
+ if (!ret)
+ ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+ } else
+ ret = remove_ftrace_ops(&ftrace_ops_list, ops);
+
+ if (ret < 0)
+ return ret;
+
+ if (ftrace_enabled)
+ update_ftrace_function();
+
+ /*
+ * Dynamic ops may be freed, we must make sure that all
+ * callers are done before leaving this function.
+ */
+ if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
+ synchronize_sched();
+
+ return 0;
+}
+
+static void ftrace_update_pid_func(void)
+{
+ /* Only do something if we are tracing something */
+ if (ftrace_trace_function == ftrace_stub)
+ return;
+
+ update_ftrace_function();
}
#ifdef CONFIG_FUNCTION_PROFILER
FTRACE_START_FUNC_RET = (1 << 3),
FTRACE_STOP_FUNC_RET = (1 << 4),
};
+struct ftrace_func_entry {
+ struct hlist_node hlist;
+ unsigned long ip;
+};
+
+struct ftrace_hash {
+ unsigned long size_bits;
+ struct hlist_head *buckets;
+ unsigned long count;
+ struct rcu_head rcu;
+};
+
+/*
+ * We make these constant because no one should touch them,
+ * but they are used as the default "empty hash", to avoid allocating
+ * it all the time. These are in a read only section such that if
+ * anyone does try to modify it, it will cause an exception.
+ */
+static const struct hlist_head empty_buckets[1];
+static const struct ftrace_hash empty_hash = {
+ .buckets = (struct hlist_head *)empty_buckets,
+};
+#define EMPTY_HASH ((struct ftrace_hash *)&empty_hash)
-static int ftrace_filtered;
+static struct ftrace_ops global_ops = {
+ .func = ftrace_stub,
+ .notrace_hash = EMPTY_HASH,
+ .filter_hash = EMPTY_HASH,
+};
static struct dyn_ftrace *ftrace_new_addrs;
static struct dyn_ftrace *ftrace_free_records;
+static struct ftrace_func_entry *
+ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
+{
+ unsigned long key;
+ struct ftrace_func_entry *entry;
+ struct hlist_head *hhd;
+ struct hlist_node *n;
+
+ if (!hash->count)
+ return NULL;
+
+ if (hash->size_bits > 0)
+ key = hash_long(ip, hash->size_bits);
+ else
+ key = 0;
+
+ hhd = &hash->buckets[key];
+
+ hlist_for_each_entry_rcu(entry, n, hhd, hlist) {
+ if (entry->ip == ip)
+ return entry;
+ }
+ return NULL;
+}
+
+static void __add_hash_entry(struct ftrace_hash *hash,
+ struct ftrace_func_entry *entry)
+{
+ struct hlist_head *hhd;
+ unsigned long key;
+
+ if (hash->size_bits)
+ key = hash_long(entry->ip, hash->size_bits);
+ else
+ key = 0;
+
+ hhd = &hash->buckets[key];
+ hlist_add_head(&entry->hlist, hhd);
+ hash->count++;
+}
+
+static int add_hash_entry(struct ftrace_hash *hash, unsigned long ip)
+{
+ struct ftrace_func_entry *entry;
+
+ entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->ip = ip;
+ __add_hash_entry(hash, entry);
+
+ return 0;
+}
+
+static void
+free_hash_entry(struct ftrace_hash *hash,
+ struct ftrace_func_entry *entry)
+{
+ hlist_del(&entry->hlist);
+ kfree(entry);
+ hash->count--;
+}
+
+static void
+remove_hash_entry(struct ftrace_hash *hash,
+ struct ftrace_func_entry *entry)
+{
+ hlist_del(&entry->hlist);
+ hash->count--;
+}
+
+static void ftrace_hash_clear(struct ftrace_hash *hash)
+{
+ struct hlist_head *hhd;
+ struct hlist_node *tp, *tn;
+ struct ftrace_func_entry *entry;
+ int size = 1 << hash->size_bits;
+ int i;
+
+ if (!hash->count)
+ return;
+
+ for (i = 0; i < size; i++) {
+ hhd = &hash->buckets[i];
+ hlist_for_each_entry_safe(entry, tp, tn, hhd, hlist)
+ free_hash_entry(hash, entry);
+ }
+ FTRACE_WARN_ON(hash->count);
+}
+
+static void free_ftrace_hash(struct ftrace_hash *hash)
+{
+ if (!hash || hash == EMPTY_HASH)
+ return;
+ ftrace_hash_clear(hash);
+ kfree(hash->buckets);
+ kfree(hash);
+}
+
+static void __free_ftrace_hash_rcu(struct rcu_head *rcu)
+{
+ struct ftrace_hash *hash;
+
+ hash = container_of(rcu, struct ftrace_hash, rcu);
+ free_ftrace_hash(hash);
+}
+
+static void free_ftrace_hash_rcu(struct ftrace_hash *hash)
+{
+ if (!hash || hash == EMPTY_HASH)
+ return;
+ call_rcu_sched(&hash->rcu, __free_ftrace_hash_rcu);
+}
+
+static struct ftrace_hash *alloc_ftrace_hash(int size_bits)
+{
+ struct ftrace_hash *hash;
+ int size;
+
+ hash = kzalloc(sizeof(*hash), GFP_KERNEL);
+ if (!hash)
+ return NULL;
+
+ size = 1 << size_bits;
+ hash->buckets = kzalloc(sizeof(*hash->buckets) * size, GFP_KERNEL);
+
+ if (!hash->buckets) {
+ kfree(hash);
+ return NULL;
+ }
+
+ hash->size_bits = size_bits;
+
+ return hash;
+}
+
+static struct ftrace_hash *
+alloc_and_copy_ftrace_hash(int size_bits, struct ftrace_hash *hash)
+{
+ struct ftrace_func_entry *entry;
+ struct ftrace_hash *new_hash;
+ struct hlist_node *tp;
+ int size;
+ int ret;
+ int i;
+
+ new_hash = alloc_ftrace_hash(size_bits);
+ if (!new_hash)
+ return NULL;
+
+ /* Empty hash? */
+ if (!hash || !hash->count)
+ return new_hash;
+
+ size = 1 << hash->size_bits;
+ for (i = 0; i < size; i++) {
+ hlist_for_each_entry(entry, tp, &hash->buckets[i], hlist) {
+ ret = add_hash_entry(new_hash, entry->ip);
+ if (ret < 0)
+ goto free_hash;
+ }
+ }
+
+ FTRACE_WARN_ON(new_hash->count != hash->count);
+
+ return new_hash;
+
+ free_hash:
+ free_ftrace_hash(new_hash);
+ return NULL;
+}
+
+static int
+ftrace_hash_move(struct ftrace_hash **dst, struct ftrace_hash *src)
+{
+ struct ftrace_func_entry *entry;
+ struct hlist_node *tp, *tn;
+ struct hlist_head *hhd;
+ struct ftrace_hash *old_hash;
+ struct ftrace_hash *new_hash;
+ unsigned long key;
+ int size = src->count;
+ int bits = 0;
+ int i;
+
+ /*
+ * If the new source is empty, just free dst and assign it
+ * the empty_hash.
+ */
+ if (!src->count) {
+ free_ftrace_hash_rcu(*dst);
+ rcu_assign_pointer(*dst, EMPTY_HASH);
+ return 0;
+ }
+
+ /*
+ * Make the hash size about 1/2 the # found
+ */
+ for (size /= 2; size; size >>= 1)
+ bits++;
+
+ /* Don't allocate too much */
+ if (bits > FTRACE_HASH_MAX_BITS)
+ bits = FTRACE_HASH_MAX_BITS;
+
+ new_hash = alloc_ftrace_hash(bits);
+ if (!new_hash)
+ return -ENOMEM;
+
+ size = 1 << src->size_bits;
+ for (i = 0; i < size; i++) {
+ hhd = &src->buckets[i];
+ hlist_for_each_entry_safe(entry, tp, tn, hhd, hlist) {
+ if (bits > 0)
+ key = hash_long(entry->ip, bits);
+ else
+ key = 0;
+ remove_hash_entry(src, entry);
+ __add_hash_entry(new_hash, entry);
+ }
+ }
+
+ old_hash = *dst;
+ rcu_assign_pointer(*dst, new_hash);
+ free_ftrace_hash_rcu(old_hash);
+
+ return 0;
+}
+
+/*
+ * Test the hashes for this ops to see if we want to call
+ * the ops->func or not.
+ *
+ * It's a match if the ip is in the ops->filter_hash or
+ * the filter_hash does not exist or is empty,
+ * AND
+ * the ip is not in the ops->notrace_hash.
+ *
+ * This needs to be called with preemption disabled as
+ * the hashes are freed with call_rcu_sched().
+ */
+static int
+ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
+{
+ struct ftrace_hash *filter_hash;
+ struct ftrace_hash *notrace_hash;
+ int ret;
+
+ filter_hash = rcu_dereference_raw(ops->filter_hash);
+ notrace_hash = rcu_dereference_raw(ops->notrace_hash);
+
+ if ((!filter_hash || !filter_hash->count ||
+ ftrace_lookup_ip(filter_hash, ip)) &&
+ (!notrace_hash || !notrace_hash->count ||
+ !ftrace_lookup_ip(notrace_hash, ip)))
+ ret = 1;
+ else
+ ret = 0;
+
+ return ret;
+}
+
/*
* This is a double for. Do not use 'break' to break out of the loop,
* you must use a goto.
} \
}
+static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
+ int filter_hash,
+ bool inc)
+{
+ struct ftrace_hash *hash;
+ struct ftrace_hash *other_hash;
+ struct ftrace_page *pg;
+ struct dyn_ftrace *rec;
+ int count = 0;
+ int all = 0;
+
+ /* Only update if the ops has been registered */
+ if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
+ return;
+
+ /*
+ * In the filter_hash case:
+ * If the count is zero, we update all records.
+ * Otherwise we just update the items in the hash.
+ *
+ * In the notrace_hash case:
+ * We enable the update in the hash.
+ * As disabling notrace means enabling the tracing,
+ * and enabling notrace means disabling, the inc variable
+ * gets inversed.
+ */
+ if (filter_hash) {
+ hash = ops->filter_hash;
+ other_hash = ops->notrace_hash;
+ if (!hash || !hash->count)
+ all = 1;
+ } else {
+ inc = !inc;
+ hash = ops->notrace_hash;
+ other_hash = ops->filter_hash;
+ /*
+ * If the notrace hash has no items,
+ * then there's nothing to do.
+ */
+ if (hash && !hash->count)
+ return;
+ }
+
+ do_for_each_ftrace_rec(pg, rec) {
+ int in_other_hash = 0;
+ int in_hash = 0;
+ int match = 0;
+
+ if (all) {
+ /*
+ * Only the filter_hash affects all records.
+ * Update if the record is not in the notrace hash.
+ */
+ if (!other_hash || !ftrace_lookup_ip(other_hash, rec->ip))
+ match = 1;
+ } else {
+ in_hash = hash && !!ftrace_lookup_ip(hash, rec->ip);
+ in_other_hash = other_hash && !!ftrace_lookup_ip(other_hash, rec->ip);
+
+ /*
+ *
+ */
+ if (filter_hash && in_hash && !in_other_hash)
+ match = 1;
+ else if (!filter_hash && in_hash &&
+ (in_other_hash || !other_hash->count))
+ match = 1;
+ }
+ if (!match)
+ continue;
+
+ if (inc) {
+ rec->flags++;
+ if (FTRACE_WARN_ON((rec->flags & ~FTRACE_FL_MASK) == FTRACE_REF_MAX))
+ return;
+ } else {
+ if (FTRACE_WARN_ON((rec->flags & ~FTRACE_FL_MASK) == 0))
+ return;
+ rec->flags--;
+ }
+ count++;
+ /* Shortcut, if we handled all records, we are done. */
+ if (!all && count == hash->count)
+ return;
+ } while_for_each_ftrace_rec();
+}
+
+static void ftrace_hash_rec_disable(struct ftrace_ops *ops,
+ int filter_hash)
+{
+ __ftrace_hash_rec_update(ops, filter_hash, 0);
+}
+
+static void ftrace_hash_rec_enable(struct ftrace_ops *ops,
+ int filter_hash)
+{
+ __ftrace_hash_rec_update(ops, filter_hash, 1);
+}
+
static void ftrace_free_rec(struct dyn_ftrace *rec)
{
rec->freelist = ftrace_free_records;
ftrace_addr = (unsigned long)FTRACE_ADDR;
/*
- * If this record is not to be traced or we want to disable it,
- * then disable it.
+ * If we are enabling tracing:
+ *
+ * If the record has a ref count, then we need to enable it
+ * because someone is using it.
*
- * If we want to enable it and filtering is off, then enable it.
+ * Otherwise we make sure its disabled.
*
- * If we want to enable it and filtering is on, enable it only if
- * it's filtered
+ * If we are disabling tracing, then disable all records that
+ * are enabled.
*/
- if (enable && !(rec->flags & FTRACE_FL_NOTRACE)) {
- if (!ftrace_filtered || (rec->flags & FTRACE_FL_FILTER))
- flag = FTRACE_FL_ENABLED;
- }
+ if (enable && (rec->flags & ~FTRACE_FL_MASK))
+ flag = FTRACE_FL_ENABLED;
/* If the state of this record hasn't changed, then do nothing */
if ((rec->flags & FTRACE_FL_ENABLED) == flag)
struct ftrace_page *pg;
int failed;
+ if (unlikely(ftrace_disabled))
+ return;
+
do_for_each_ftrace_rec(pg, rec) {
- /*
- * Skip over free records, records that have
- * failed and not converted.
- */
- if (rec->flags & FTRACE_FL_FREE ||
- rec->flags & FTRACE_FL_FAILED ||
- !(rec->flags & FTRACE_FL_CONVERTED))
+ /* Skip over free records */
+ if (rec->flags & FTRACE_FL_FREE)
continue;
failed = __ftrace_replace_code(rec, enable);
if (failed) {
- rec->flags |= FTRACE_FL_FAILED;
ftrace_bug(failed, rec->ip);
/* Stop processing */
return;
ip = rec->ip;
+ if (unlikely(ftrace_disabled))
+ return 0;
+
ret = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
if (ret) {
ftrace_bug(ret, ip);
- rec->flags |= FTRACE_FL_FAILED;
return 0;
}
return 1;
static ftrace_func_t saved_ftrace_func;
static int ftrace_start_up;
+static int global_start_up;
static void ftrace_startup_enable(int command)
{
ftrace_run_update_code(command);
}
-static void ftrace_startup(int command)
+static void ftrace_startup(struct ftrace_ops *ops, int command)
{
+ bool hash_enable = true;
+
if (unlikely(ftrace_disabled))
return;
ftrace_start_up++;
command |= FTRACE_ENABLE_CALLS;
+ /* ops marked global share the filter hashes */
+ if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
+ ops = &global_ops;
+ /* Don't update hash if global is already set */
+ if (global_start_up)
+ hash_enable = false;
+ global_start_up++;
+ }
+
+ ops->flags |= FTRACE_OPS_FL_ENABLED;
+ if (hash_enable)
+ ftrace_hash_rec_enable(ops, 1);
+
ftrace_startup_enable(command);
}
-static void ftrace_shutdown(int command)
+static void ftrace_shutdown(struct ftrace_ops *ops, int command)
{
+ bool hash_disable = true;
+
if (unlikely(ftrace_disabled))
return;
*/
WARN_ON_ONCE(ftrace_start_up < 0);
+ if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
+ ops = &global_ops;
+ global_start_up--;
+ WARN_ON_ONCE(global_start_up < 0);
+ /* Don't update hash if global still has users */
+ if (global_start_up) {
+ WARN_ON_ONCE(!ftrace_start_up);
+ hash_disable = false;
+ }
+ }
+
+ if (hash_disable)
+ ftrace_hash_rec_disable(ops, 1);
+
+ if (ops != &global_ops || !global_start_up)
+ ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+
if (!ftrace_start_up)
command |= FTRACE_DISABLE_CALLS;
*/
if (!ftrace_code_disable(mod, p)) {
ftrace_free_rec(p);
- continue;
+ /* Game over */
+ break;
}
- p->flags |= FTRACE_FL_CONVERTED;
ftrace_update_cnt++;
/*
enum {
FTRACE_ITER_FILTER = (1 << 0),
FTRACE_ITER_NOTRACE = (1 << 1),
- FTRACE_ITER_FAILURES = (1 << 2),
- FTRACE_ITER_PRINTALL = (1 << 3),
- FTRACE_ITER_HASH = (1 << 4),
+ FTRACE_ITER_PRINTALL = (1 << 2),
+ FTRACE_ITER_HASH = (1 << 3),
+ FTRACE_ITER_ENABLED = (1 << 4),
};
#define FTRACE_BUFF_MAX (KSYM_SYMBOL_LEN+4) /* room for wildcards */
struct dyn_ftrace *func;
struct ftrace_func_probe *probe;
struct trace_parser parser;
+ struct ftrace_hash *hash;
+ struct ftrace_ops *ops;
int hidx;
int idx;
unsigned flags;
t_next(struct seq_file *m, void *v, loff_t *pos)
{
struct ftrace_iterator *iter = m->private;
+ struct ftrace_ops *ops = &global_ops;
struct dyn_ftrace *rec = NULL;
+ if (unlikely(ftrace_disabled))
+ return NULL;
+
if (iter->flags & FTRACE_ITER_HASH)
return t_hash_next(m, pos);
rec = &iter->pg->records[iter->idx++];
if ((rec->flags & FTRACE_FL_FREE) ||
- (!(iter->flags & FTRACE_ITER_FAILURES) &&
- (rec->flags & FTRACE_FL_FAILED)) ||
-
- ((iter->flags & FTRACE_ITER_FAILURES) &&
- !(rec->flags & FTRACE_FL_FAILED)) ||
-
((iter->flags & FTRACE_ITER_FILTER) &&
- !(rec->flags & FTRACE_FL_FILTER)) ||
+ !(ftrace_lookup_ip(ops->filter_hash, rec->ip))) ||
((iter->flags & FTRACE_ITER_NOTRACE) &&
- !(rec->flags & FTRACE_FL_NOTRACE))) {
+ !ftrace_lookup_ip(ops->notrace_hash, rec->ip)) ||
+
+ ((iter->flags & FTRACE_ITER_ENABLED) &&
+ !(rec->flags & ~FTRACE_FL_MASK))) {
+
rec = NULL;
goto retry;
}
static void *t_start(struct seq_file *m, loff_t *pos)
{
struct ftrace_iterator *iter = m->private;
+ struct ftrace_ops *ops = &global_ops;
void *p = NULL;
loff_t l;
mutex_lock(&ftrace_lock);
+
+ if (unlikely(ftrace_disabled))
+ return NULL;
+
/*
* If an lseek was done, then reset and start from beginning.
*/
* off, we can short cut and just print out that all
* functions are enabled.
*/
- if (iter->flags & FTRACE_ITER_FILTER && !ftrace_filtered) {
+ if (iter->flags & FTRACE_ITER_FILTER && !ops->filter_hash->count) {
if (*pos > 0)
return t_hash_start(m, pos);
iter->flags |= FTRACE_ITER_PRINTALL;
if (!rec)
return 0;
- seq_printf(m, "%ps\n", (void *)rec->ip);
+ seq_printf(m, "%ps", (void *)rec->ip);
+ if (iter->flags & FTRACE_ITER_ENABLED)
+ seq_printf(m, " (%ld)",
+ rec->flags & ~FTRACE_FL_MASK);
+ seq_printf(m, "\n");
return 0;
}
};
static int
-ftrace_avail_open(struct inode *inode, struct file *file)
+ftrace_avail_open(struct inode *inode, struct file *file)
+{
+ struct ftrace_iterator *iter;
+ int ret;
+
+ if (unlikely(ftrace_disabled))
+ return -ENODEV;
+
+ iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ iter->pg = ftrace_pages_start;
+
+ ret = seq_open(file, &show_ftrace_seq_ops);
+ if (!ret) {
+ struct seq_file *m = file->private_data;
+
+ m->private = iter;
+ } else {
+ kfree(iter);
+ }
+
+ return ret;
+}
+
+static int
+ftrace_enabled_open(struct inode *inode, struct file *file)
{
struct ftrace_iterator *iter;
int ret;
return -ENOMEM;
iter->pg = ftrace_pages_start;
+ iter->flags = FTRACE_ITER_ENABLED;
ret = seq_open(file, &show_ftrace_seq_ops);
if (!ret) {
return ret;
}
-static int
-ftrace_failures_open(struct inode *inode, struct file *file)
-{
- int ret;
- struct seq_file *m;
- struct ftrace_iterator *iter;
-
- ret = ftrace_avail_open(inode, file);
- if (!ret) {
- m = file->private_data;
- iter = m->private;
- iter->flags = FTRACE_ITER_FAILURES;
- }
-
- return ret;
-}
-
-
-static void ftrace_filter_reset(int enable)
+static void ftrace_filter_reset(struct ftrace_hash *hash)
{
- struct ftrace_page *pg;
- struct dyn_ftrace *rec;
- unsigned long type = enable ? FTRACE_FL_FILTER : FTRACE_FL_NOTRACE;
-
mutex_lock(&ftrace_lock);
- if (enable)
- ftrace_filtered = 0;
- do_for_each_ftrace_rec(pg, rec) {
- if (rec->flags & FTRACE_FL_FAILED)
- continue;
- rec->flags &= ~type;
- } while_for_each_ftrace_rec();
+ ftrace_hash_clear(hash);
mutex_unlock(&ftrace_lock);
}
static int
-ftrace_regex_open(struct inode *inode, struct file *file, int enable)
+ftrace_regex_open(struct ftrace_ops *ops, int flag,
+ struct inode *inode, struct file *file)
{
struct ftrace_iterator *iter;
+ struct ftrace_hash *hash;
int ret = 0;
if (unlikely(ftrace_disabled))
return -ENOMEM;
}
+ if (flag & FTRACE_ITER_NOTRACE)
+ hash = ops->notrace_hash;
+ else
+ hash = ops->filter_hash;
+
+ iter->ops = ops;
+ iter->flags = flag;
+
+ if (file->f_mode & FMODE_WRITE) {
+ mutex_lock(&ftrace_lock);
+ iter->hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, hash);
+ mutex_unlock(&ftrace_lock);
+
+ if (!iter->hash) {
+ trace_parser_put(&iter->parser);
+ kfree(iter);
+ return -ENOMEM;
+ }
+ }
+
mutex_lock(&ftrace_regex_lock);
+
if ((file->f_mode & FMODE_WRITE) &&
(file->f_flags & O_TRUNC))
- ftrace_filter_reset(enable);
+ ftrace_filter_reset(iter->hash);
if (file->f_mode & FMODE_READ) {
iter->pg = ftrace_pages_start;
- iter->flags = enable ? FTRACE_ITER_FILTER :
- FTRACE_ITER_NOTRACE;
ret = seq_open(file, &show_ftrace_seq_ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = iter;
} else {
+ /* Failed */
+ free_ftrace_hash(iter->hash);
trace_parser_put(&iter->parser);
kfree(iter);
}
static int
ftrace_filter_open(struct inode *inode, struct file *file)
{
- return ftrace_regex_open(inode, file, 1);
+ return ftrace_regex_open(&global_ops, FTRACE_ITER_FILTER,
+ inode, file);
}
static int
ftrace_notrace_open(struct inode *inode, struct file *file)
{
- return ftrace_regex_open(inode, file, 0);
+ return ftrace_regex_open(&global_ops, FTRACE_ITER_NOTRACE,
+ inode, file);
}
static loff_t
}
static int
-ftrace_match_record(struct dyn_ftrace *rec, char *regex, int len, int type)
+enter_record(struct ftrace_hash *hash, struct dyn_ftrace *rec, int not)
+{
+ struct ftrace_func_entry *entry;
+ int ret = 0;
+
+ entry = ftrace_lookup_ip(hash, rec->ip);
+ if (not) {
+ /* Do nothing if it doesn't exist */
+ if (!entry)
+ return 0;
+
+ free_hash_entry(hash, entry);
+ } else {
+ /* Do nothing if it exists */
+ if (entry)
+ return 0;
+
+ ret = add_hash_entry(hash, rec->ip);
+ }
+ return ret;
+}
+
+static int
+ftrace_match_record(struct dyn_ftrace *rec, char *mod,
+ char *regex, int len, int type)
{
char str[KSYM_SYMBOL_LEN];
+ char *modname;
+
+ kallsyms_lookup(rec->ip, NULL, NULL, &modname, str);
+
+ if (mod) {
+ /* module lookup requires matching the module */
+ if (!modname || strcmp(modname, mod))
+ return 0;
+
+ /* blank search means to match all funcs in the mod */
+ if (!len)
+ return 1;
+ }
- kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
return ftrace_match(str, regex, len, type);
}
-static int ftrace_match_records(char *buff, int len, int enable)
+static int
+match_records(struct ftrace_hash *hash, char *buff,
+ int len, char *mod, int not)
{
- unsigned int search_len;
+ unsigned search_len = 0;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
- unsigned long flag;
- char *search;
- int type;
- int not;
+ int type = MATCH_FULL;
+ char *search = buff;
int found = 0;
+ int ret;
- flag = enable ? FTRACE_FL_FILTER : FTRACE_FL_NOTRACE;
- type = filter_parse_regex(buff, len, &search, ¬);
-
- search_len = strlen(search);
+ if (len) {
+ type = filter_parse_regex(buff, len, &search, ¬);
+ search_len = strlen(search);
+ }
mutex_lock(&ftrace_lock);
- do_for_each_ftrace_rec(pg, rec) {
- if (rec->flags & FTRACE_FL_FAILED)
- continue;
+ if (unlikely(ftrace_disabled))
+ goto out_unlock;
- if (ftrace_match_record(rec, search, search_len, type)) {
- if (not)
- rec->flags &= ~flag;
- else
- rec->flags |= flag;
+ do_for_each_ftrace_rec(pg, rec) {
+
+ if (ftrace_match_record(rec, mod, search, search_len, type)) {
+ ret = enter_record(hash, rec, not);
+ if (ret < 0) {
+ found = ret;
+ goto out_unlock;
+ }
found = 1;
}
- /*
- * Only enable filtering if we have a function that
- * is filtered on.
- */
- if (enable && (rec->flags & FTRACE_FL_FILTER))
- ftrace_filtered = 1;
} while_for_each_ftrace_rec();
+ out_unlock:
mutex_unlock(&ftrace_lock);
return found;
}
static int
-ftrace_match_module_record(struct dyn_ftrace *rec, char *mod,
- char *regex, int len, int type)
+ftrace_match_records(struct ftrace_hash *hash, char *buff, int len)
{
- char str[KSYM_SYMBOL_LEN];
- char *modname;
-
- kallsyms_lookup(rec->ip, NULL, NULL, &modname, str);
-
- if (!modname || strcmp(modname, mod))
- return 0;
-
- /* blank search means to match all funcs in the mod */
- if (len)
- return ftrace_match(str, regex, len, type);
- else
- return 1;
+ return match_records(hash, buff, len, NULL, 0);
}
-static int ftrace_match_module_records(char *buff, char *mod, int enable)
+static int
+ftrace_match_module_records(struct ftrace_hash *hash, char *buff, char *mod)
{
- unsigned search_len = 0;
- struct ftrace_page *pg;
- struct dyn_ftrace *rec;
- int type = MATCH_FULL;
- char *search = buff;
- unsigned long flag;
int not = 0;
- int found = 0;
-
- flag = enable ? FTRACE_FL_FILTER : FTRACE_FL_NOTRACE;
/* blank or '*' mean the same */
if (strcmp(buff, "*") == 0)
not = 1;
}
- if (strlen(buff)) {
- type = filter_parse_regex(buff, strlen(buff), &search, ¬);
- search_len = strlen(search);
- }
-
- mutex_lock(&ftrace_lock);
- do_for_each_ftrace_rec(pg, rec) {
-
- if (rec->flags & FTRACE_FL_FAILED)
- continue;
-
- if (ftrace_match_module_record(rec, mod,
- search, search_len, type)) {
- if (not)
- rec->flags &= ~flag;
- else
- rec->flags |= flag;
- found = 1;
- }
- if (enable && (rec->flags & FTRACE_FL_FILTER))
- ftrace_filtered = 1;
-
- } while_for_each_ftrace_rec();
- mutex_unlock(&ftrace_lock);
-
- return found;
+ return match_records(hash, buff, strlen(buff), mod, not);
}
/*
static int
ftrace_mod_callback(char *func, char *cmd, char *param, int enable)
{
+ struct ftrace_ops *ops = &global_ops;
+ struct ftrace_hash *hash;
char *mod;
+ int ret = -EINVAL;
/*
* cmd == 'mod' because we only registered this func
/* we must have a module name */
if (!param)
- return -EINVAL;
+ return ret;
mod = strsep(¶m, ":");
if (!strlen(mod))
- return -EINVAL;
+ return ret;
- if (ftrace_match_module_records(func, mod, enable))
- return 0;
- return -EINVAL;
+ if (enable)
+ hash = ops->filter_hash;
+ else
+ hash = ops->notrace_hash;
+
+ ret = ftrace_match_module_records(hash, func, mod);
+ if (!ret)
+ ret = -EINVAL;
+ if (ret < 0)
+ return ret;
+
+ return 0;
}
static struct ftrace_func_command ftrace_mod_cmd = {
static void __enable_ftrace_function_probe(void)
{
+ int ret;
int i;
if (ftrace_probe_registered)
if (i == FTRACE_FUNC_HASHSIZE)
return;
- __register_ftrace_function(&trace_probe_ops);
- ftrace_startup(0);
+ ret = __register_ftrace_function(&trace_probe_ops);
+ if (!ret)
+ ftrace_startup(&trace_probe_ops, 0);
+
ftrace_probe_registered = 1;
}
static void __disable_ftrace_function_probe(void)
{
+ int ret;
int i;
if (!ftrace_probe_registered)
}
/* no more funcs left */
- __unregister_ftrace_function(&trace_probe_ops);
- ftrace_shutdown(0);
+ ret = __unregister_ftrace_function(&trace_probe_ops);
+ if (!ret)
+ ftrace_shutdown(&trace_probe_ops, 0);
+
ftrace_probe_registered = 0;
}
return -EINVAL;
mutex_lock(&ftrace_lock);
- do_for_each_ftrace_rec(pg, rec) {
- if (rec->flags & FTRACE_FL_FAILED)
- continue;
+ if (unlikely(ftrace_disabled))
+ goto out_unlock;
+
+ do_for_each_ftrace_rec(pg, rec) {
- if (!ftrace_match_record(rec, search, len, type))
+ if (!ftrace_match_record(rec, NULL, search, len, type))
continue;
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
return ret;
}
-static int ftrace_process_regex(char *buff, int len, int enable)
+static int ftrace_process_regex(struct ftrace_hash *hash,
+ char *buff, int len, int enable)
{
char *func, *command, *next = buff;
struct ftrace_func_command *p;
- int ret = -EINVAL;
+ int ret;
func = strsep(&next, ":");
if (!next) {
- if (ftrace_match_records(func, len, enable))
- return 0;
- return ret;
+ ret = ftrace_match_records(hash, func, len);
+ if (!ret)
+ ret = -EINVAL;
+ if (ret < 0)
+ return ret;
+ return 0;
}
/* command found */
mutex_lock(&ftrace_regex_lock);
+ ret = -ENODEV;
+ if (unlikely(ftrace_disabled))
+ goto out_unlock;
+
if (file->f_mode & FMODE_READ) {
struct seq_file *m = file->private_data;
iter = m->private;
if (read >= 0 && trace_parser_loaded(parser) &&
!trace_parser_cont(parser)) {
- ret = ftrace_process_regex(parser->buffer,
+ ret = ftrace_process_regex(iter->hash, parser->buffer,
parser->idx, enable);
trace_parser_clear(parser);
if (ret)
return ftrace_regex_write(file, ubuf, cnt, ppos, 0);
}
-static void
-ftrace_set_regex(unsigned char *buf, int len, int reset, int enable)
+static int
+ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
+ int reset, int enable)
{
+ struct ftrace_hash **orig_hash;
+ struct ftrace_hash *hash;
+ int ret;
+
+ /* All global ops uses the global ops filters */
+ if (ops->flags & FTRACE_OPS_FL_GLOBAL)
+ ops = &global_ops;
+
if (unlikely(ftrace_disabled))
- return;
+ return -ENODEV;
+
+ if (enable)
+ orig_hash = &ops->filter_hash;
+ else
+ orig_hash = &ops->notrace_hash;
+
+ hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, *orig_hash);
+ if (!hash)
+ return -ENOMEM;
mutex_lock(&ftrace_regex_lock);
if (reset)
- ftrace_filter_reset(enable);
+ ftrace_filter_reset(hash);
if (buf)
- ftrace_match_records(buf, len, enable);
+ ftrace_match_records(hash, buf, len);
+
+ mutex_lock(&ftrace_lock);
+ ret = ftrace_hash_move(orig_hash, hash);
+ mutex_unlock(&ftrace_lock);
+
mutex_unlock(&ftrace_regex_lock);
+
+ free_ftrace_hash(hash);
+ return ret;
+}
+
+/**
+ * ftrace_set_filter - set a function to filter on in ftrace
+ * @ops - the ops to set the filter with
+ * @buf - the string that holds the function filter text.
+ * @len - the length of the string.
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Filters denote which functions should be enabled when tracing is enabled.
+ * If @buf is NULL and reset is set, all functions will be enabled for tracing.
+ */
+void ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
+ int len, int reset)
+{
+ ftrace_set_regex(ops, buf, len, reset, 1);
}
+EXPORT_SYMBOL_GPL(ftrace_set_filter);
+/**
+ * ftrace_set_notrace - set a function to not trace in ftrace
+ * @ops - the ops to set the notrace filter with
+ * @buf - the string that holds the function notrace text.
+ * @len - the length of the string.
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Notrace Filters denote which functions should not be enabled when tracing
+ * is enabled. If @buf is NULL and reset is set, all functions will be enabled
+ * for tracing.
+ */
+void ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
+ int len, int reset)
+{
+ ftrace_set_regex(ops, buf, len, reset, 0);
+}
+EXPORT_SYMBOL_GPL(ftrace_set_notrace);
/**
* ftrace_set_filter - set a function to filter on in ftrace
+ * @ops - the ops to set the filter with
* @buf - the string that holds the function filter text.
* @len - the length of the string.
* @reset - non zero to reset all filters before applying this filter.
* Filters denote which functions should be enabled when tracing is enabled.
* If @buf is NULL and reset is set, all functions will be enabled for tracing.
*/
-void ftrace_set_filter(unsigned char *buf, int len, int reset)
+void ftrace_set_global_filter(unsigned char *buf, int len, int reset)
{
- ftrace_set_regex(buf, len, reset, 1);
+ ftrace_set_regex(&global_ops, buf, len, reset, 1);
}
+EXPORT_SYMBOL_GPL(ftrace_set_global_filter);
/**
* ftrace_set_notrace - set a function to not trace in ftrace
+ * @ops - the ops to set the notrace filter with
* @buf - the string that holds the function notrace text.
* @len - the length of the string.
* @reset - non zero to reset all filters before applying this filter.
* is enabled. If @buf is NULL and reset is set, all functions will be enabled
* for tracing.
*/
-void ftrace_set_notrace(unsigned char *buf, int len, int reset)
+void ftrace_set_global_notrace(unsigned char *buf, int len, int reset)
{
- ftrace_set_regex(buf, len, reset, 0);
+ ftrace_set_regex(&global_ops, buf, len, reset, 0);
}
+EXPORT_SYMBOL_GPL(ftrace_set_global_notrace);
/*
* command line interface to allow users to set filters on boot up.
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-static void __init set_ftrace_early_filter(char *buf, int enable)
+static void __init
+set_ftrace_early_filter(struct ftrace_ops *ops, char *buf, int enable)
{
char *func;
while (buf) {
func = strsep(&buf, ",");
- ftrace_set_regex(func, strlen(func), 0, enable);
+ ftrace_set_regex(ops, func, strlen(func), 0, enable);
}
}
static void __init set_ftrace_early_filters(void)
{
if (ftrace_filter_buf[0])
- set_ftrace_early_filter(ftrace_filter_buf, 1);
+ set_ftrace_early_filter(&global_ops, ftrace_filter_buf, 1);
if (ftrace_notrace_buf[0])
- set_ftrace_early_filter(ftrace_notrace_buf, 0);
+ set_ftrace_early_filter(&global_ops, ftrace_notrace_buf, 0);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (ftrace_graph_buf[0])
set_ftrace_early_graph(ftrace_graph_buf);
}
static int
-ftrace_regex_release(struct inode *inode, struct file *file, int enable)
+ftrace_regex_release(struct inode *inode, struct file *file)
{
struct seq_file *m = (struct seq_file *)file->private_data;
struct ftrace_iterator *iter;
+ struct ftrace_hash **orig_hash;
struct trace_parser *parser;
+ int filter_hash;
+ int ret;
mutex_lock(&ftrace_regex_lock);
if (file->f_mode & FMODE_READ) {
parser = &iter->parser;
if (trace_parser_loaded(parser)) {
parser->buffer[parser->idx] = 0;
- ftrace_match_records(parser->buffer, parser->idx, enable);
+ ftrace_match_records(iter->hash, parser->buffer, parser->idx);
}
- mutex_lock(&ftrace_lock);
- if (ftrace_start_up && ftrace_enabled)
- ftrace_run_update_code(FTRACE_ENABLE_CALLS);
- mutex_unlock(&ftrace_lock);
-
trace_parser_put(parser);
+
+ if (file->f_mode & FMODE_WRITE) {
+ filter_hash = !!(iter->flags & FTRACE_ITER_FILTER);
+
+ if (filter_hash)
+ orig_hash = &iter->ops->filter_hash;
+ else
+ orig_hash = &iter->ops->notrace_hash;
+
+ mutex_lock(&ftrace_lock);
+ /*
+ * Remove the current set, update the hash and add
+ * them back.
+ */
+ ftrace_hash_rec_disable(iter->ops, filter_hash);
+ ret = ftrace_hash_move(orig_hash, iter->hash);
+ if (!ret) {
+ ftrace_hash_rec_enable(iter->ops, filter_hash);
+ if (iter->ops->flags & FTRACE_OPS_FL_ENABLED
+ && ftrace_enabled)
+ ftrace_run_update_code(FTRACE_ENABLE_CALLS);
+ }
+ mutex_unlock(&ftrace_lock);
+ }
+ free_ftrace_hash(iter->hash);
kfree(iter);
mutex_unlock(&ftrace_regex_lock);
return 0;
}
-static int
-ftrace_filter_release(struct inode *inode, struct file *file)
-{
- return ftrace_regex_release(inode, file, 1);
-}
-
-static int
-ftrace_notrace_release(struct inode *inode, struct file *file)
-{
- return ftrace_regex_release(inode, file, 0);
-}
-
static const struct file_operations ftrace_avail_fops = {
.open = ftrace_avail_open,
.read = seq_read,
.release = seq_release_private,
};
-static const struct file_operations ftrace_failures_fops = {
- .open = ftrace_failures_open,
+static const struct file_operations ftrace_enabled_fops = {
+ .open = ftrace_enabled_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release_private,
.read = seq_read,
.write = ftrace_filter_write,
.llseek = ftrace_regex_lseek,
- .release = ftrace_filter_release,
+ .release = ftrace_regex_release,
};
static const struct file_operations ftrace_notrace_fops = {
.read = seq_read,
.write = ftrace_notrace_write,
.llseek = ftrace_regex_lseek,
- .release = ftrace_notrace_release,
+ .release = ftrace_regex_release,
};
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
bool exists;
int i;
- if (ftrace_disabled)
- return -ENODEV;
-
/* decode regex */
type = filter_parse_regex(buffer, strlen(buffer), &search, ¬);
if (!not && *idx >= FTRACE_GRAPH_MAX_FUNCS)
search_len = strlen(search);
mutex_lock(&ftrace_lock);
+
+ if (unlikely(ftrace_disabled)) {
+ mutex_unlock(&ftrace_lock);
+ return -ENODEV;
+ }
+
do_for_each_ftrace_rec(pg, rec) {
- if (rec->flags & (FTRACE_FL_FAILED | FTRACE_FL_FREE))
+ if (rec->flags & FTRACE_FL_FREE)
continue;
- if (ftrace_match_record(rec, search, search_len, type)) {
+ if (ftrace_match_record(rec, NULL, search, search_len, type)) {
/* if it is in the array */
exists = false;
for (i = 0; i < *idx; i++) {
trace_create_file("available_filter_functions", 0444,
d_tracer, NULL, &ftrace_avail_fops);
- trace_create_file("failures", 0444,
- d_tracer, NULL, &ftrace_failures_fops);
+ trace_create_file("enabled_functions", 0444,
+ d_tracer, NULL, &ftrace_enabled_fops);
trace_create_file("set_ftrace_filter", 0644, d_tracer,
NULL, &ftrace_filter_fops);
{
unsigned long *p;
unsigned long addr;
- unsigned long flags;
mutex_lock(&ftrace_lock);
p = start;
ftrace_record_ip(addr);
}
- /* disable interrupts to prevent kstop machine */
- local_irq_save(flags);
ftrace_update_code(mod);
- local_irq_restore(flags);
mutex_unlock(&ftrace_lock);
return 0;
struct dyn_ftrace *rec;
struct ftrace_page *pg;
+ mutex_lock(&ftrace_lock);
+
if (ftrace_disabled)
- return;
+ goto out_unlock;
- mutex_lock(&ftrace_lock);
do_for_each_ftrace_rec(pg, rec) {
if (within_module_core(rec->ip, mod)) {
/*
ftrace_free_rec(rec);
}
} while_for_each_ftrace_rec();
+ out_unlock:
mutex_unlock(&ftrace_lock);
}
#else
+static struct ftrace_ops global_ops = {
+ .func = ftrace_stub,
+};
+
static int __init ftrace_nodyn_init(void)
{
ftrace_enabled = 1;
static inline int ftrace_init_dyn_debugfs(struct dentry *d_tracer) { return 0; }
static inline void ftrace_startup_enable(int command) { }
/* Keep as macros so we do not need to define the commands */
-# define ftrace_startup(command) do { } while (0)
-# define ftrace_shutdown(command) do { } while (0)
+# define ftrace_startup(ops, command) do { } while (0)
+# define ftrace_shutdown(ops, command) do { } while (0)
# define ftrace_startup_sysctl() do { } while (0)
# define ftrace_shutdown_sysctl() do { } while (0)
+
+static inline int
+ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
+{
+ return 1;
+}
+
#endif /* CONFIG_DYNAMIC_FTRACE */
+static void
+ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
+{
+ struct ftrace_ops *op;
+
+ /*
+ * Some of the ops may be dynamically allocated,
+ * they must be freed after a synchronize_sched().
+ */
+ preempt_disable_notrace();
+ op = rcu_dereference_raw(ftrace_ops_list);
+ while (op != &ftrace_list_end) {
+ if (ftrace_ops_test(op, ip))
+ op->func(ip, parent_ip);
+ op = rcu_dereference_raw(op->next);
+ };
+ preempt_enable_notrace();
+}
+
static void clear_ftrace_swapper(void)
{
struct task_struct *p;
*/
int register_ftrace_function(struct ftrace_ops *ops)
{
- int ret;
-
- if (unlikely(ftrace_disabled))
- return -1;
+ int ret = -1;
mutex_lock(&ftrace_lock);
+ if (unlikely(ftrace_disabled))
+ goto out_unlock;
+
ret = __register_ftrace_function(ops);
- ftrace_startup(0);
+ if (!ret)
+ ftrace_startup(ops, 0);
+
+ out_unlock:
mutex_unlock(&ftrace_lock);
return ret;
}
+EXPORT_SYMBOL_GPL(register_ftrace_function);
/**
* unregister_ftrace_function - unregister a function for profiling.
mutex_lock(&ftrace_lock);
ret = __unregister_ftrace_function(ops);
- ftrace_shutdown(0);
+ if (!ret)
+ ftrace_shutdown(ops, 0);
mutex_unlock(&ftrace_lock);
return ret;
}
+EXPORT_SYMBOL_GPL(unregister_ftrace_function);
int
ftrace_enable_sysctl(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
- int ret;
-
- if (unlikely(ftrace_disabled))
- return -ENODEV;
+ int ret = -ENODEV;
mutex_lock(&ftrace_lock);
- ret = proc_dointvec(table, write, buffer, lenp, ppos);
+ if (unlikely(ftrace_disabled))
+ goto out;
+
+ ret = proc_dointvec(table, write, buffer, lenp, ppos);
if (ret || !write || (last_ftrace_enabled == !!ftrace_enabled))
goto out;
ftrace_startup_sysctl();
/* we are starting ftrace again */
- if (ftrace_list != &ftrace_list_end) {
- if (ftrace_list->next == &ftrace_list_end)
- ftrace_trace_function = ftrace_list->func;
+ if (ftrace_ops_list != &ftrace_list_end) {
+ if (ftrace_ops_list->next == &ftrace_list_end)
+ ftrace_trace_function = ftrace_ops_list->func;
else
- ftrace_trace_function = ftrace_list_func;
+ ftrace_trace_function = ftrace_ops_list_func;
}
} else {
ftrace_graph_return = retfunc;
ftrace_graph_entry = entryfunc;
- ftrace_startup(FTRACE_START_FUNC_RET);
+ ftrace_startup(&global_ops, FTRACE_START_FUNC_RET);
out:
mutex_unlock(&ftrace_lock);
ftrace_graph_active--;
ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
ftrace_graph_entry = ftrace_graph_entry_stub;
- ftrace_shutdown(FTRACE_STOP_FUNC_RET);
+ ftrace_shutdown(&global_ops, FTRACE_STOP_FUNC_RET);
unregister_pm_notifier(&ftrace_suspend_notifier);
unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
entry->preempt_count = pc & 0xff;
entry->pid = (tsk) ? tsk->pid : 0;
+ entry->padding = 0;
entry->flags =
#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
{
enum print_line_t ret;
- if (iter->lost_events)
- trace_seq_printf(&iter->seq, "CPU:%d [LOST %lu EVENTS]\n",
- iter->cpu, iter->lost_events);
+ if (iter->lost_events &&
+ !trace_seq_printf(&iter->seq, "CPU:%d [LOST %lu EVENTS]\n",
+ iter->cpu, iter->lost_events))
+ return TRACE_TYPE_PARTIAL_LINE;
if (iter->trace && iter->trace->print_line) {
ret = iter->trace->print_line(iter);
if (iter->seq.len >= cnt)
break;
+
+ /*
+ * Setting the full flag means we reached the trace_seq buffer
+ * size and we should leave by partial output condition above.
+ * One of the trace_seq_* functions is not used properly.
+ */
+ WARN_ONCE(iter->seq.full, "full flag set for trace type %d",
+ iter->ent->type);
}
trace_access_unlock(iter->cpu_file);
trace_event_read_unlock();
extern unsigned long ftrace_update_tot_cnt;
#define DYN_FTRACE_TEST_NAME trace_selftest_dynamic_test_func
extern int DYN_FTRACE_TEST_NAME(void);
+#define DYN_FTRACE_TEST_NAME2 trace_selftest_dynamic_test_func2
+extern int DYN_FTRACE_TEST_NAME2(void);
#endif
extern int ring_buffer_expanded;
__common_field(unsigned char, flags);
__common_field(unsigned char, preempt_count);
__common_field(int, pid);
+ __common_field(int, padding);
return ret;
}
static struct ftrace_ops trace_ops __read_mostly =
{
.func = function_trace_call,
+ .flags = FTRACE_OPS_FL_GLOBAL,
};
static struct ftrace_ops trace_stack_ops __read_mostly =
{
.func = function_stack_trace_call,
+ .flags = FTRACE_OPS_FL_GLOBAL,
};
/* Our two options */
static struct ftrace_ops trace_ops __read_mostly =
{
.func = irqsoff_tracer_call,
+ .flags = FTRACE_OPS_FL_GLOBAL,
};
#endif /* CONFIG_FUNCTION_TRACER */
"common_preempt_count",
"common_pid",
"common_tgid",
- "common_lock_depth",
FIELD_STRING_IP,
FIELD_STRING_RETIP,
FIELD_STRING_FUNC,
enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags,
struct trace_event *event)
{
+ if (!trace_seq_printf(&iter->seq, "type: %d\n", iter->ent->type))
+ return TRACE_TYPE_PARTIAL_LINE;
+
return TRACE_TYPE_HANDLED;
}
struct trace_bprintk_fmt {
struct list_head list;
- char fmt[0];
+ const char *fmt;
};
static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
void hold_module_trace_bprintk_format(const char **start, const char **end)
{
const char **iter;
+ char *fmt;
mutex_lock(&btrace_mutex);
for (iter = start; iter < end; iter++) {
continue;
}
- tb_fmt = kmalloc(offsetof(struct trace_bprintk_fmt, fmt)
- + strlen(*iter) + 1, GFP_KERNEL);
- if (tb_fmt) {
+ tb_fmt = kmalloc(sizeof(*tb_fmt), GFP_KERNEL);
+ if (tb_fmt)
+ fmt = kmalloc(strlen(*iter) + 1, GFP_KERNEL);
+ if (tb_fmt && fmt) {
list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
- strcpy(tb_fmt->fmt, *iter);
+ strcpy(fmt, *iter);
+ tb_fmt->fmt = fmt;
*iter = tb_fmt->fmt;
- } else
+ } else {
+ kfree(tb_fmt);
*iter = NULL;
+ }
}
mutex_unlock(&btrace_mutex);
}
return 0;
}
+/*
+ * The debugfs/tracing/printk_formats file maps the addresses with
+ * the ASCII formats that are used in the bprintk events in the
+ * buffer. For userspace tools to be able to decode the events from
+ * the buffer, they need to be able to map the address with the format.
+ *
+ * The addresses of the bprintk formats are in their own section
+ * __trace_printk_fmt. But for modules we copy them into a link list.
+ * The code to print the formats and their addresses passes around the
+ * address of the fmt string. If the fmt address passed into the seq
+ * functions is within the kernel core __trace_printk_fmt section, then
+ * it simply uses the next pointer in the list.
+ *
+ * When the fmt pointer is outside the kernel core __trace_printk_fmt
+ * section, then we need to read the link list pointers. The trick is
+ * we pass the address of the string to the seq function just like
+ * we do for the kernel core formats. To get back the structure that
+ * holds the format, we simply use containerof() and then go to the
+ * next format in the list.
+ */
+static const char **
+find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
+{
+ struct trace_bprintk_fmt *mod_fmt;
+
+ if (list_empty(&trace_bprintk_fmt_list))
+ return NULL;
+
+ /*
+ * v will point to the address of the fmt record from t_next
+ * v will be NULL from t_start.
+ * If this is the first pointer or called from start
+ * then we need to walk the list.
+ */
+ if (!v || start_index == *pos) {
+ struct trace_bprintk_fmt *p;
+
+ /* search the module list */
+ list_for_each_entry(p, &trace_bprintk_fmt_list, list) {
+ if (start_index == *pos)
+ return &p->fmt;
+ start_index++;
+ }
+ /* pos > index */
+ return NULL;
+ }
+
+ /*
+ * v points to the address of the fmt field in the mod list
+ * structure that holds the module print format.
+ */
+ mod_fmt = container_of(v, typeof(*mod_fmt), fmt);
+ if (mod_fmt->list.next == &trace_bprintk_fmt_list)
+ return NULL;
+
+ mod_fmt = container_of(mod_fmt->list.next, typeof(*mod_fmt), list);
+
+ return &mod_fmt->fmt;
+}
+
+static void format_mod_start(void)
+{
+ mutex_lock(&btrace_mutex);
+}
+
+static void format_mod_stop(void)
+{
+ mutex_unlock(&btrace_mutex);
+}
+
#else /* !CONFIG_MODULES */
__init static int
module_trace_bprintk_format_notify(struct notifier_block *self,
{
return 0;
}
+static inline const char **
+find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
+{
+ return NULL;
+}
+static inline void format_mod_start(void) { }
+static inline void format_mod_stop(void) { }
#endif /* CONFIG_MODULES */
}
EXPORT_SYMBOL_GPL(__ftrace_vprintk);
+static const char **find_next(void *v, loff_t *pos)
+{
+ const char **fmt = v;
+ int start_index;
+
+ if (!fmt)
+ fmt = __start___trace_bprintk_fmt + *pos;
+
+ start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;
+
+ if (*pos < start_index)
+ return fmt;
+
+ return find_next_mod_format(start_index, v, fmt, pos);
+}
+
static void *
t_start(struct seq_file *m, loff_t *pos)
{
- const char **fmt = __start___trace_bprintk_fmt + *pos;
-
- if ((unsigned long)fmt >= (unsigned long)__stop___trace_bprintk_fmt)
- return NULL;
- return fmt;
+ format_mod_start();
+ return find_next(NULL, pos);
}
static void *t_next(struct seq_file *m, void * v, loff_t *pos)
{
(*pos)++;
- return t_start(m, pos);
+ return find_next(v, pos);
}
static int t_show(struct seq_file *m, void *v)
static void t_stop(struct seq_file *m, void *p)
{
+ format_mod_stop();
}
static const struct seq_operations show_format_seq_ops = {
static struct ftrace_ops trace_ops __read_mostly =
{
.func = wakeup_tracer_call,
+ .flags = FTRACE_OPS_FL_GLOBAL,
};
#endif /* CONFIG_FUNCTION_TRACER */
#ifdef CONFIG_DYNAMIC_FTRACE
+static int trace_selftest_test_probe1_cnt;
+static void trace_selftest_test_probe1_func(unsigned long ip,
+ unsigned long pip)
+{
+ trace_selftest_test_probe1_cnt++;
+}
+
+static int trace_selftest_test_probe2_cnt;
+static void trace_selftest_test_probe2_func(unsigned long ip,
+ unsigned long pip)
+{
+ trace_selftest_test_probe2_cnt++;
+}
+
+static int trace_selftest_test_probe3_cnt;
+static void trace_selftest_test_probe3_func(unsigned long ip,
+ unsigned long pip)
+{
+ trace_selftest_test_probe3_cnt++;
+}
+
+static int trace_selftest_test_global_cnt;
+static void trace_selftest_test_global_func(unsigned long ip,
+ unsigned long pip)
+{
+ trace_selftest_test_global_cnt++;
+}
+
+static int trace_selftest_test_dyn_cnt;
+static void trace_selftest_test_dyn_func(unsigned long ip,
+ unsigned long pip)
+{
+ trace_selftest_test_dyn_cnt++;
+}
+
+static struct ftrace_ops test_probe1 = {
+ .func = trace_selftest_test_probe1_func,
+};
+
+static struct ftrace_ops test_probe2 = {
+ .func = trace_selftest_test_probe2_func,
+};
+
+static struct ftrace_ops test_probe3 = {
+ .func = trace_selftest_test_probe3_func,
+};
+
+static struct ftrace_ops test_global = {
+ .func = trace_selftest_test_global_func,
+ .flags = FTRACE_OPS_FL_GLOBAL,
+};
+
+static void print_counts(void)
+{
+ printk("(%d %d %d %d %d) ",
+ trace_selftest_test_probe1_cnt,
+ trace_selftest_test_probe2_cnt,
+ trace_selftest_test_probe3_cnt,
+ trace_selftest_test_global_cnt,
+ trace_selftest_test_dyn_cnt);
+}
+
+static void reset_counts(void)
+{
+ trace_selftest_test_probe1_cnt = 0;
+ trace_selftest_test_probe2_cnt = 0;
+ trace_selftest_test_probe3_cnt = 0;
+ trace_selftest_test_global_cnt = 0;
+ trace_selftest_test_dyn_cnt = 0;
+}
+
+static int trace_selftest_ops(int cnt)
+{
+ int save_ftrace_enabled = ftrace_enabled;
+ struct ftrace_ops *dyn_ops;
+ char *func1_name;
+ char *func2_name;
+ int len1;
+ int len2;
+ int ret = -1;
+
+ printk(KERN_CONT "PASSED\n");
+ pr_info("Testing dynamic ftrace ops #%d: ", cnt);
+
+ ftrace_enabled = 1;
+ reset_counts();
+
+ /* Handle PPC64 '.' name */
+ func1_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+ func2_name = "*" __stringify(DYN_FTRACE_TEST_NAME2);
+ len1 = strlen(func1_name);
+ len2 = strlen(func2_name);
+
+ /*
+ * Probe 1 will trace function 1.
+ * Probe 2 will trace function 2.
+ * Probe 3 will trace functions 1 and 2.
+ */
+ ftrace_set_filter(&test_probe1, func1_name, len1, 1);
+ ftrace_set_filter(&test_probe2, func2_name, len2, 1);
+ ftrace_set_filter(&test_probe3, func1_name, len1, 1);
+ ftrace_set_filter(&test_probe3, func2_name, len2, 0);
+
+ register_ftrace_function(&test_probe1);
+ register_ftrace_function(&test_probe2);
+ register_ftrace_function(&test_probe3);
+ register_ftrace_function(&test_global);
+
+ DYN_FTRACE_TEST_NAME();
+
+ print_counts();
+
+ if (trace_selftest_test_probe1_cnt != 1)
+ goto out;
+ if (trace_selftest_test_probe2_cnt != 0)
+ goto out;
+ if (trace_selftest_test_probe3_cnt != 1)
+ goto out;
+ if (trace_selftest_test_global_cnt == 0)
+ goto out;
+
+ DYN_FTRACE_TEST_NAME2();
+
+ print_counts();
+
+ if (trace_selftest_test_probe1_cnt != 1)
+ goto out;
+ if (trace_selftest_test_probe2_cnt != 1)
+ goto out;
+ if (trace_selftest_test_probe3_cnt != 2)
+ goto out;
+
+ /* Add a dynamic probe */
+ dyn_ops = kzalloc(sizeof(*dyn_ops), GFP_KERNEL);
+ if (!dyn_ops) {
+ printk("MEMORY ERROR ");
+ goto out;
+ }
+
+ dyn_ops->func = trace_selftest_test_dyn_func;
+
+ register_ftrace_function(dyn_ops);
+
+ trace_selftest_test_global_cnt = 0;
+
+ DYN_FTRACE_TEST_NAME();
+
+ print_counts();
+
+ if (trace_selftest_test_probe1_cnt != 2)
+ goto out_free;
+ if (trace_selftest_test_probe2_cnt != 1)
+ goto out_free;
+ if (trace_selftest_test_probe3_cnt != 3)
+ goto out_free;
+ if (trace_selftest_test_global_cnt == 0)
+ goto out;
+ if (trace_selftest_test_dyn_cnt == 0)
+ goto out_free;
+
+ DYN_FTRACE_TEST_NAME2();
+
+ print_counts();
+
+ if (trace_selftest_test_probe1_cnt != 2)
+ goto out_free;
+ if (trace_selftest_test_probe2_cnt != 2)
+ goto out_free;
+ if (trace_selftest_test_probe3_cnt != 4)
+ goto out_free;
+
+ ret = 0;
+ out_free:
+ unregister_ftrace_function(dyn_ops);
+ kfree(dyn_ops);
+
+ out:
+ /* Purposely unregister in the same order */
+ unregister_ftrace_function(&test_probe1);
+ unregister_ftrace_function(&test_probe2);
+ unregister_ftrace_function(&test_probe3);
+ unregister_ftrace_function(&test_global);
+
+ /* Make sure everything is off */
+ reset_counts();
+ DYN_FTRACE_TEST_NAME();
+ DYN_FTRACE_TEST_NAME();
+
+ if (trace_selftest_test_probe1_cnt ||
+ trace_selftest_test_probe2_cnt ||
+ trace_selftest_test_probe3_cnt ||
+ trace_selftest_test_global_cnt ||
+ trace_selftest_test_dyn_cnt)
+ ret = -1;
+
+ ftrace_enabled = save_ftrace_enabled;
+
+ return ret;
+}
+
/* Test dynamic code modification and ftrace filters */
int trace_selftest_startup_dynamic_tracing(struct tracer *trace,
struct trace_array *tr,
func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
/* filter only on our function */
- ftrace_set_filter(func_name, strlen(func_name), 1);
+ ftrace_set_global_filter(func_name, strlen(func_name), 1);
/* enable tracing */
ret = tracer_init(trace, tr);
/* check the trace buffer */
ret = trace_test_buffer(tr, &count);
- trace->reset(tr);
tracing_start();
/* we should only have one item */
if (!ret && count != 1) {
+ trace->reset(tr);
printk(KERN_CONT ".. filter failed count=%ld ..", count);
ret = -1;
goto out;
}
+ /* Test the ops with global tracing running */
+ ret = trace_selftest_ops(1);
+ trace->reset(tr);
+
out:
ftrace_enabled = save_ftrace_enabled;
tracer_enabled = save_tracer_enabled;
/* Enable tracing on all functions again */
- ftrace_set_filter(NULL, 0, 1);
+ ftrace_set_global_filter(NULL, 0, 1);
+
+ /* Test the ops with global tracing off */
+ if (!ret)
+ ret = trace_selftest_ops(2);
return ret;
}
/* used to call mcount */
return 0;
}
+
+int DYN_FTRACE_TEST_NAME2(void)
+{
+ /* used to call mcount */
+ return 0;
+}
static struct ftrace_ops trace_ops __read_mostly =
{
.func = stack_trace_call,
+ .flags = FTRACE_OPS_FL_GLOBAL,
};
static ssize_t
{
WARN_ON(strcmp((*entry)->name, elem->name) != 0);
- if (elem->regfunc && !elem->state && active)
+ if (elem->regfunc && !jump_label_enabled(&elem->key) && active)
elem->regfunc();
- else if (elem->unregfunc && elem->state && !active)
+ else if (elem->unregfunc && jump_label_enabled(&elem->key) && !active)
elem->unregfunc();
/*
* is used.
*/
rcu_assign_pointer(elem->funcs, (*entry)->funcs);
- if (!elem->state && active) {
- jump_label_enable(&elem->state);
- elem->state = active;
- } else if (elem->state && !active) {
- jump_label_disable(&elem->state);
- elem->state = active;
- }
+ if (active && !jump_label_enabled(&elem->key))
+ jump_label_inc(&elem->key);
+ else if (!active && jump_label_enabled(&elem->key))
+ jump_label_dec(&elem->key);
}
/*
*/
static void disable_tracepoint(struct tracepoint *elem)
{
- if (elem->unregfunc && elem->state)
+ if (elem->unregfunc && jump_label_enabled(&elem->key))
elem->unregfunc();
- if (elem->state) {
- jump_label_disable(&elem->state);
- elem->state = 0;
- }
+ if (jump_label_enabled(&elem->key))
+ jump_label_dec(&elem->key);
rcu_assign_pointer(elem->funcs, NULL);
}
p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu);
if (IS_ERR(p)) {
printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
- if (!err)
+ if (!err) {
/* if hardlockup hasn't already set this */
err = PTR_ERR(p);
+ /* and disable the perf event */
+ watchdog_nmi_disable(cpu);
+ }
goto out;
}
kthread_bind(p, cpu);
return true;
spin_unlock_irq(&gcwq->lock);
- /* CPU has come up in between, retry migration */
+ /*
+ * We've raced with CPU hot[un]plug. Give it a breather
+ * and retry migration. cond_resched() is required here;
+ * otherwise, we might deadlock against cpu_stop trying to
+ * bring down the CPU on non-preemptive kernel.
+ */
cpu_relax();
+ cond_resched();
}
}
enabled then all held locks will also be reported. This
feature has negligible overhead.
+config DEFAULT_HUNG_TASK_TIMEOUT
+ int "Default timeout for hung task detection (in seconds)"
+ depends on DETECT_HUNG_TASK
+ default 120
+ help
+ This option controls the default timeout (in seconds) used
+ to determine when a task has become non-responsive and should
+ be considered hung.
+
+ It can be adjusted at runtime via the kernel.hung_task_timeout
+ sysctl or by writing a value to /proc/sys/kernel/hung_task_timeout.
+
+ A timeout of 0 disables the check. The default is two minutes.
+ Keeping the default should be fine in most cases.
+
config BOOTPARAM_HUNG_TASK_PANIC
bool "Panic (Reboot) On Hung Tasks"
depends on DETECT_HUNG_TASK
config DEBUG_KMEMLEAK
bool "Kernel memory leak detector"
depends on DEBUG_KERNEL && EXPERIMENTAL && !MEMORY_HOTPLUG && \
- (X86 || ARM || PPC || S390 || SPARC64 || SUPERH || MICROBLAZE || TILE)
+ (X86 || ARM || PPC || MIPS || S390 || SPARC64 || SUPERH || MICROBLAZE || TILE)
- select DEBUG_FS if SYSFS
+ select DEBUG_FS
select STACKTRACE if STACKTRACE_SUPPORT
select KALLSYMS
select CRC32
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
- string_helpers.o gcd.o lcm.o list_sort.o uuid.o flex_array.o
+ string_helpers.o gcd.o lcm.o list_sort.o uuid.o flex_array.o \
+ bsearch.o
obj-y += kstrtox.o
obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
--- /dev/null
+/*
+ * A generic implementation of binary search for the Linux kernel
+ *
+ * Copyright (C) 2008-2009 Ksplice, Inc.
+ * Author: Tim Abbott <tabbott@ksplice.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; version 2.
+ */
+
+#include <linux/module.h>
+#include <linux/bsearch.h>
+
+/*
+ * bsearch - binary search an array of elements
+ * @key: pointer to item being searched for
+ * @base: pointer to first element to search
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp: pointer to comparison function
+ *
+ * This function does a binary search on the given array. The
+ * contents of the array should already be in ascending sorted order
+ * under the provided comparison function.
+ *
+ * Note that the key need not have the same type as the elements in
+ * the array, e.g. key could be a string and the comparison function
+ * could compare the string with the struct's name field. However, if
+ * the key and elements in the array are of the same type, you can use
+ * the same comparison function for both sort() and bsearch().
+ */
+void *bsearch(const void *key, const void *base, size_t num, size_t size,
+ int (*cmp)(const void *key, const void *elt))
+{
+ size_t start = 0, end = num;
+ int result;
+
+ while (start < end) {
+ size_t mid = start + (end - start) / 2;
+
+ result = cmp(key, base + mid * size);
+ if (result < 0)
+ end = mid;
+ else if (result > 0)
+ start = mid + 1;
+ else
+ return (void *)base + mid * size;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL(bsearch);
return -ENOMEM;
}
-static int device_dma_allocations(struct device *dev)
+static int device_dma_allocations(struct device *dev, struct dma_debug_entry **out_entry)
{
struct dma_debug_entry *entry;
unsigned long flags;
for (i = 0; i < HASH_SIZE; ++i) {
spin_lock(&dma_entry_hash[i].lock);
list_for_each_entry(entry, &dma_entry_hash[i].list, list) {
- if (entry->dev == dev)
+ if (entry->dev == dev) {
count += 1;
+ *out_entry = entry;
+ }
}
spin_unlock(&dma_entry_hash[i].lock);
}
static int dma_debug_device_change(struct notifier_block *nb, unsigned long action, void *data)
{
struct device *dev = data;
+ struct dma_debug_entry *uninitialized_var(entry);
int count;
if (global_disable)
switch (action) {
case BUS_NOTIFY_UNBOUND_DRIVER:
- count = device_dma_allocations(dev);
+ count = device_dma_allocations(dev, &entry);
if (count == 0)
break;
- err_printk(dev, NULL, "DMA-API: device driver has pending "
+ err_printk(dev, entry, "DMA-API: device driver has pending "
"DMA allocations while released from device "
- "[count=%d]\n", count);
+ "[count=%d]\n"
+ "One of leaked entries details: "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped with %s] [mapped as %s]\n",
+ count, entry->dev_addr, entry->size,
+ dir2name[entry->direction], type2name[entry->type]);
break;
default:
break;
/**
* flex_array_prealloc - guarantee that array space exists
- * @fa: the flex array for which to preallocate parts
- * @start: index of first array element for which space is allocated
- * @end: index of last (inclusive) element for which space is allocated
- * @flags: page allocation flags
+ * @fa: the flex array for which to preallocate parts
+ * @start: index of first array element for which space is allocated
+ * @nr_elements: number of elements for which space is allocated
+ * @flags: page allocation flags
*
* This will guarantee that no future calls to flex_array_put()
* will allocate memory. It can be used if you are expecting to
* Locking must be provided by the caller.
*/
int flex_array_prealloc(struct flex_array *fa, unsigned int start,
- unsigned int end, gfp_t flags)
+ unsigned int nr_elements, gfp_t flags)
{
int start_part;
int end_part;
int part_nr;
+ unsigned int end;
struct flex_array_part *part;
- if (start >= fa->total_nr_elements || end >= fa->total_nr_elements)
+ if (!start && !nr_elements)
+ return 0;
+ if (start >= fa->total_nr_elements)
+ return -ENOSPC;
+ if (!nr_elements)
+ return 0;
+
+ end = start + nr_elements - 1;
+
+ if (end >= fa->total_nr_elements)
return -ENOSPC;
if (elements_fit_in_base(fa))
return 0;
int part_nr;
int ret = 0;
+ if (!fa->total_nr_elements)
+ return 0;
if (elements_fit_in_base(fa))
return ret;
for (part_nr = 0; part_nr < FLEX_ARRAY_NR_BASE_PTRS; part_nr++) {
}
EXPORT_SYMBOL(sysfs_streq);
+/**
+ * strtobool - convert common user inputs into boolean values
+ * @s: input string
+ * @res: result
+ *
+ * This routine returns 0 iff the first character is one of 'Yy1Nn0'.
+ * Otherwise it will return -EINVAL. Value pointed to by res is
+ * updated upon finding a match.
+ */
+int strtobool(const char *s, bool *res)
+{
+ switch (s[0]) {
+ case 'y':
+ case 'Y':
+ case '1':
+ *res = true;
+ break;
+ case 'n':
+ case 'N':
+ case '0':
+ *res = false;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(strtobool);
+
#ifndef __HAVE_ARCH_MEMSET
/**
* memset - Fill a region of memory with the given value
return string(buf, end, uuid, spec);
}
-int kptr_restrict = 1;
+int kptr_restrict __read_mostly;
/*
* Show a '%p' thing. A kernel extension is that the '%p' is followed
*/
tmp = b->in[b->in_pos++];
+ if (tmp == 0x00)
+ return XZ_STREAM_END;
+
if (tmp >= 0xE0 || tmp == 0x01) {
s->lzma2.need_props = true;
s->lzma2.need_dict_reset = false;
lzma_reset(s);
}
} else {
- if (tmp == 0x00)
- return XZ_STREAM_END;
-
if (tmp > 0x02)
return XZ_DATA_ERROR;
return ret;
}
+#define VM_NO_THP (VM_SPECIAL|VM_INSERTPAGE|VM_MIXEDMAP|VM_SAO| \
+ VM_HUGETLB|VM_SHARED|VM_MAYSHARE)
+
int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice)
{
/*
* Be somewhat over-protective like KSM for now!
*/
- if (*vm_flags & (VM_HUGEPAGE |
- VM_SHARED | VM_MAYSHARE |
- VM_PFNMAP | VM_IO | VM_DONTEXPAND |
- VM_RESERVED | VM_HUGETLB | VM_INSERTPAGE |
- VM_MIXEDMAP | VM_SAO))
+ if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
return -EINVAL;
*vm_flags &= ~VM_NOHUGEPAGE;
*vm_flags |= VM_HUGEPAGE;
/*
* Be somewhat over-protective like KSM for now!
*/
- if (*vm_flags & (VM_NOHUGEPAGE |
- VM_SHARED | VM_MAYSHARE |
- VM_PFNMAP | VM_IO | VM_DONTEXPAND |
- VM_RESERVED | VM_HUGETLB | VM_INSERTPAGE |
- VM_MIXEDMAP | VM_SAO))
+ if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
return -EINVAL;
*vm_flags &= ~VM_HUGEPAGE;
*vm_flags |= VM_NOHUGEPAGE;
* page fault if needed.
*/
return 0;
- if (vma->vm_file || vma->vm_ops)
+ if (vma->vm_ops)
/* khugepaged not yet working on file or special mappings */
return 0;
- VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+ /*
+ * If is_pfn_mapping() is true is_learn_pfn_mapping() must be
+ * true too, verify it here.
+ */
+ VM_BUG_ON(is_linear_pfn_mapping(vma) || vma->vm_flags & VM_NO_THP);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
if (hstart < hend)
(vma->vm_flags & VM_NOHUGEPAGE))
goto out;
- /* VM_PFNMAP vmas may have vm_ops null but vm_file set */
- if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+ if (!vma->anon_vma || vma->vm_ops)
goto out;
if (is_vma_temporary_stack(vma))
goto out;
- VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+ /*
+ * If is_pfn_mapping() is true is_learn_pfn_mapping() must be
+ * true too, verify it here.
+ */
+ VM_BUG_ON(is_linear_pfn_mapping(vma) || vma->vm_flags & VM_NO_THP);
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
progress++;
continue;
}
- /* VM_PFNMAP vmas may have vm_ops null but vm_file set */
- if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+ if (!vma->anon_vma || vma->vm_ops)
goto skip;
if (is_vma_temporary_stack(vma))
goto skip;
-
- VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+ /*
+ * If is_pfn_mapping() is true is_learn_pfn_mapping()
+ * must be true too, verify it here.
+ */
+ VM_BUG_ON(is_linear_pfn_mapping(vma) ||
+ vma->vm_flags & VM_NO_THP);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
++(*pos);
list_for_each_continue_rcu(n, &object_list) {
- next_obj = list_entry(n, struct kmemleak_object, object_list);
- if (get_object(next_obj))
+ struct kmemleak_object *obj =
+ list_entry(n, struct kmemleak_object, object_list);
+ if (get_object(obj)) {
+ next_obj = obj;
break;
+ }
}
put_object(prev_obj);
*/
mark_page_accessed(page);
}
- if (flags & FOLL_MLOCK) {
+ if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
/*
* The preliminary mapping check is mainly to avoid the
* pointless overhead of lock_page on the ZERO_PAGE
static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
{
- return (vma->vm_flags & VM_GROWSDOWN) &&
- (vma->vm_start == addr) &&
- !vma_stack_continue(vma->vm_prev, addr);
+ return stack_guard_page_start(vma, addr) ||
+ stack_guard_page_end(vma, addr+PAGE_SIZE);
}
/**
continue;
}
- /*
- * If we don't actually want the page itself,
- * and it's the stack guard page, just skip it.
- */
- if (!pages && stack_guard_page(vma, start))
- goto next_page;
-
do {
struct page *page;
unsigned int foll_flags = gup_flags;
int ret;
unsigned int fault_flags = 0;
+ /* For mlock, just skip the stack guard page. */
+ if (foll_flags & FOLL_MLOCK) {
+ if (stack_guard_page(vma, start))
+ goto next_page;
+ }
if (foll_flags & FOLL_WRITE)
fault_flags |= FAULT_FLAG_WRITE;
if (nonblocking)
* run pte_offset_map on the pmd, if an huge pmd could
* materialize from under us from a different thread.
*/
- if (unlikely(__pte_alloc(mm, vma, pmd, address)))
+ if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address))
return VM_FAULT_OOM;
/* if an huge pmd materialized from under us just retry later */
if (unlikely(pmd_trans_huge(*pmd)))
VM_BUG_ON(end > vma->vm_end);
VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
- gup_flags = FOLL_TOUCH;
+ gup_flags = FOLL_TOUCH | FOLL_MLOCK;
/*
* We want to touch writable mappings with a write fault in order
* to break COW, except for shared mappings because these don't COW
if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))
gup_flags |= FOLL_FORCE;
- if (vma->vm_flags & VM_LOCKED)
- gup_flags |= FOLL_MLOCK;
-
return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
NULL, NULL, nonblocking);
}
size = address - vma->vm_start;
grow = (address - vma->vm_end) >> PAGE_SHIFT;
- error = acct_stack_growth(vma, size, grow);
- if (!error) {
- vma->vm_end = address;
- perf_event_mmap(vma);
+ error = -ENOMEM;
+ if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) {
+ error = acct_stack_growth(vma, size, grow);
+ if (!error) {
+ vma->vm_end = address;
+ perf_event_mmap(vma);
+ }
}
}
vma_unlock_anon_vma(vma);
/*
* The baseline for the badness score is the proportion of RAM that each
- * task's rss and swap space use.
+ * task's rss, pagetable and swap space use.
*/
- points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
- totalpages;
+ points = get_mm_rss(p->mm) + p->mm->nr_ptes;
+ points += get_mm_counter(p->mm, MM_SWAPENTS);
+
+ points *= 1000;
+ points /= totalpages;
task_unlock(p);
/*
EXPORT_SYMBOL(free_pages);
+static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
+{
+ if (addr) {
+ unsigned long alloc_end = addr + (PAGE_SIZE << order);
+ unsigned long used = addr + PAGE_ALIGN(size);
+
+ split_page(virt_to_page((void *)addr), order);
+ while (used < alloc_end) {
+ free_page(used);
+ used += PAGE_SIZE;
+ }
+ }
+ return (void *)addr;
+}
+
/**
* alloc_pages_exact - allocate an exact number physically-contiguous pages.
* @size: the number of bytes to allocate
unsigned long addr;
addr = __get_free_pages(gfp_mask, order);
- if (addr) {
- unsigned long alloc_end = addr + (PAGE_SIZE << order);
- unsigned long used = addr + PAGE_ALIGN(size);
-
- split_page(virt_to_page((void *)addr), order);
- while (used < alloc_end) {
- free_page(used);
- used += PAGE_SIZE;
- }
- }
-
- return (void *)addr;
+ return make_alloc_exact(addr, order, size);
}
EXPORT_SYMBOL(alloc_pages_exact);
+/**
+ * alloc_pages_exact_nid - allocate an exact number of physically-contiguous
+ * pages on a node.
+ * @nid: the preferred node ID where memory should be allocated
+ * @size: the number of bytes to allocate
+ * @gfp_mask: GFP flags for the allocation
+ *
+ * Like alloc_pages_exact(), but try to allocate on node nid first before falling
+ * back.
+ * Note this is not alloc_pages_exact_node() which allocates on a specific node,
+ * but is not exact.
+ */
+void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+{
+ unsigned order = get_order(size);
+ struct page *p = alloc_pages_node(nid, gfp_mask, order);
+ if (!p)
+ return NULL;
+ return make_alloc_exact((unsigned long)page_address(p), order, size);
+}
+EXPORT_SYMBOL(alloc_pages_exact_nid);
+
/**
* free_pages_exact - release memory allocated via alloc_pages_exact()
* @virt: the value returned by alloc_pages_exact.
if (!slab_is_available()) {
zone->wait_table = (wait_queue_head_t *)
- alloc_bootmem_node(pgdat, alloc_size);
+ alloc_bootmem_node_nopanic(pgdat, alloc_size);
} else {
/*
* This case means that a zone whose size was 0 gets new memory
unsigned long usemapsize = usemap_size(zonesize);
zone->pageblock_flags = NULL;
if (usemapsize)
- zone->pageblock_flags = alloc_bootmem_node(pgdat, usemapsize);
+ zone->pageblock_flags = alloc_bootmem_node_nopanic(pgdat,
+ usemapsize);
}
#else
static inline void setup_usemap(struct pglist_data *pgdat,
size = (end - start) * sizeof(struct page);
map = alloc_remap(pgdat->node_id, size);
if (!map)
- map = alloc_bootmem_node(pgdat, size);
+ map = alloc_bootmem_node_nopanic(pgdat, size);
pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
}
#ifndef CONFIG_NEED_MULTIPLE_NODES
{
void *addr = NULL;
- addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
+ addr = alloc_pages_exact_nid(nid, size, GFP_KERNEL | __GFP_NOWARN);
if (addr)
return addr;
static int shmem_unuse_inode(struct shmem_inode_info *info, swp_entry_t entry, struct page *page)
{
- struct inode *inode;
+ struct address_space *mapping;
unsigned long idx;
unsigned long size;
unsigned long limit;
if (size > SHMEM_NR_DIRECT)
size = SHMEM_NR_DIRECT;
offset = shmem_find_swp(entry, ptr, ptr+size);
- if (offset >= 0)
+ if (offset >= 0) {
+ shmem_swp_balance_unmap();
goto found;
+ }
if (!info->i_indirect)
goto lost2;
if (size > ENTRIES_PER_PAGE)
size = ENTRIES_PER_PAGE;
offset = shmem_find_swp(entry, ptr, ptr+size);
- shmem_swp_unmap(ptr);
if (offset >= 0) {
shmem_dir_unmap(dir);
goto found;
}
+ shmem_swp_unmap(ptr);
}
}
lost1:
return 0;
found:
idx += offset;
- inode = igrab(&info->vfs_inode);
- spin_unlock(&info->lock);
+ ptr += offset;
/*
* Move _head_ to start search for next from here.
*/
if (shmem_swaplist.next != &info->swaplist)
list_move_tail(&shmem_swaplist, &info->swaplist);
- mutex_unlock(&shmem_swaplist_mutex);
- error = 1;
- if (!inode)
- goto out;
/*
- * Charge page using GFP_KERNEL while we can wait.
- * Charged back to the user(not to caller) when swap account is used.
- * add_to_page_cache() will be called with GFP_NOWAIT.
+ * We rely on shmem_swaplist_mutex, not only to protect the swaplist,
+ * but also to hold up shmem_evict_inode(): so inode cannot be freed
+ * beneath us (pagelock doesn't help until the page is in pagecache).
*/
- error = mem_cgroup_cache_charge(page, current->mm, GFP_KERNEL);
- if (error)
- goto out;
- error = radix_tree_preload(GFP_KERNEL);
- if (error) {
- mem_cgroup_uncharge_cache_page(page);
- goto out;
- }
- error = 1;
-
- spin_lock(&info->lock);
- ptr = shmem_swp_entry(info, idx, NULL);
- if (ptr && ptr->val == entry.val) {
- error = add_to_page_cache_locked(page, inode->i_mapping,
- idx, GFP_NOWAIT);
- /* does mem_cgroup_uncharge_cache_page on error */
- } else /* we must compensate for our precharge above */
- mem_cgroup_uncharge_cache_page(page);
+ mapping = info->vfs_inode.i_mapping;
+ error = add_to_page_cache_locked(page, mapping, idx, GFP_NOWAIT);
+ /* which does mem_cgroup_uncharge_cache_page on error */
if (error == -EEXIST) {
- struct page *filepage = find_get_page(inode->i_mapping, idx);
+ struct page *filepage = find_get_page(mapping, idx);
error = 1;
if (filepage) {
/*
swap_free(entry);
error = 1; /* not an error, but entry was found */
}
- if (ptr)
- shmem_swp_unmap(ptr);
+ shmem_swp_unmap(ptr);
spin_unlock(&info->lock);
- radix_tree_preload_end();
-out:
- unlock_page(page);
- page_cache_release(page);
- iput(inode); /* allows for NULL */
return error;
}
struct list_head *p, *next;
struct shmem_inode_info *info;
int found = 0;
+ int error;
+
+ /*
+ * Charge page using GFP_KERNEL while we can wait, before taking
+ * the shmem_swaplist_mutex which might hold up shmem_writepage().
+ * Charged back to the user (not to caller) when swap account is used.
+ * add_to_page_cache() will be called with GFP_NOWAIT.
+ */
+ error = mem_cgroup_cache_charge(page, current->mm, GFP_KERNEL);
+ if (error)
+ goto out;
+ /*
+ * Try to preload while we can wait, to not make a habit of
+ * draining atomic reserves; but don't latch on to this cpu,
+ * it's okay if sometimes we get rescheduled after this.
+ */
+ error = radix_tree_preload(GFP_KERNEL);
+ if (error)
+ goto uncharge;
+ radix_tree_preload_end();
mutex_lock(&shmem_swaplist_mutex);
list_for_each_safe(p, next, &shmem_swaplist) {
found = shmem_unuse_inode(info, entry, page);
cond_resched();
if (found)
- goto out;
+ break;
}
mutex_unlock(&shmem_swaplist_mutex);
- /*
- * Can some race bring us here? We've been holding page lock,
- * so I think not; but would rather try again later than BUG()
- */
+
+uncharge:
+ if (!found)
+ mem_cgroup_uncharge_cache_page(page);
+ if (found < 0)
+ error = found;
+out:
unlock_page(page);
page_cache_release(page);
-out:
- return (found < 0) ? found : 0;
+ return error;
}
/*
else
swap.val = 0;
+ /*
+ * Add inode to shmem_unuse()'s list of swapped-out inodes,
+ * if it's not already there. Do it now because we cannot take
+ * mutex while holding spinlock, and must do so before the page
+ * is moved to swap cache, when its pagelock no longer protects
+ * the inode from eviction. But don't unlock the mutex until
+ * we've taken the spinlock, because shmem_unuse_inode() will
+ * prune a !swapped inode from the swaplist under both locks.
+ */
+ if (swap.val) {
+ mutex_lock(&shmem_swaplist_mutex);
+ if (list_empty(&info->swaplist))
+ list_add_tail(&info->swaplist, &shmem_swaplist);
+ }
+
spin_lock(&info->lock);
+ if (swap.val)
+ mutex_unlock(&shmem_swaplist_mutex);
+
if (index >= info->next_index) {
BUG_ON(!(info->flags & SHMEM_TRUNCATE));
goto unlock;
delete_from_page_cache(page);
shmem_swp_set(info, entry, swap.val);
shmem_swp_unmap(entry);
- if (list_empty(&info->swaplist))
- inode = igrab(inode);
- else
- inode = NULL;
spin_unlock(&info->lock);
swap_shmem_alloc(swap);
BUG_ON(page_mapped(page));
swap_writepage(page, wbc);
- if (inode) {
- mutex_lock(&shmem_swaplist_mutex);
- /* move instead of add in case we're racing */
- list_move_tail(&info->swaplist, &shmem_swaplist);
- mutex_unlock(&shmem_swaplist_mutex);
- iput(inode);
- }
return 0;
}
if (sbinfo->max_blocks) {
if (percpu_counter_compare(&sbinfo->used_blocks,
sbinfo->max_blocks) >= 0 ||
- shmem_acct_block(info->flags)) {
- spin_unlock(&info->lock);
- error = -ENOSPC;
- goto failed;
- }
+ shmem_acct_block(info->flags))
+ goto nospace;
percpu_counter_inc(&sbinfo->used_blocks);
spin_lock(&inode->i_lock);
inode->i_blocks += BLOCKS_PER_PAGE;
spin_unlock(&inode->i_lock);
- } else if (shmem_acct_block(info->flags)) {
- spin_unlock(&info->lock);
- error = -ENOSPC;
- goto failed;
- }
+ } else if (shmem_acct_block(info->flags))
+ goto nospace;
if (!filepage) {
int ret;
error = 0;
goto out;
+nospace:
+ /*
+ * Perhaps the page was brought in from swap between find_lock_page
+ * and taking info->lock? We allow for that at add_to_page_cache_lru,
+ * but must also avoid reporting a spurious ENOSPC while working on a
+ * full tmpfs. (When filepage has been passed in to shmem_getpage, it
+ * is already in page cache, which prevents this race from occurring.)
+ */
+ if (!filepage) {
+ struct page *page = find_get_page(mapping, idx);
+ if (page) {
+ spin_unlock(&info->lock);
+ page_cache_release(page);
+ goto repeat;
+ }
+ }
+ spin_unlock(&info->lock);
+ error = -ENOSPC;
failed:
if (*pagep != filepage) {
unlock_page(filepage);
* Since this is without lock semantics the protection is only against
* code executing on this cpu *not* from access by other cpus.
*/
- if (unlikely(!this_cpu_cmpxchg_double(
+ if (unlikely(!irqsafe_cpu_cmpxchg_double(
s->cpu_slab->freelist, s->cpu_slab->tid,
object, tid,
get_freepointer(s, object), next_tid(tid)))) {
set_freepointer(s, object, c->freelist);
#ifdef CONFIG_CMPXCHG_LOCAL
- if (unlikely(!this_cpu_cmpxchg_double(
+ if (unlikely(!irqsafe_cpu_cmpxchg_double(
s->cpu_slab->freelist, s->cpu_slab->tid,
c->freelist, tid,
object, next_tid(tid)))) {
if (!PageLRU(page))
return;
+ if (PageUnevictable(page))
+ return;
+
/* Some processes are using the page */
if (page_mapped(page))
return;
* back off and wait for congestion to clear because further reclaim
* will encounter the same problem
*/
- if (nr_dirty == nr_congested && nr_dirty != 0)
+ if (nr_dirty && nr_dirty == nr_congested && scanning_global_lru(sc))
zone_set_flag(zone, ZONE_CONGESTED);
free_page_list(&free_pages);
grp->nr_vlans--;
+ if (vlan->flags & VLAN_FLAG_GVRP)
+ vlan_gvrp_request_leave(dev);
+
vlan_group_set_device(grp, vlan_id, NULL);
if (!grp->killall)
synchronize_net();
struct vlan_dev_info *vlan = vlan_dev_info(dev);
struct net_device *real_dev = vlan->real_dev;
- if (vlan->flags & VLAN_FLAG_GVRP)
- vlan_gvrp_request_leave(dev);
-
dev_mc_unsync(real_dev, dev);
dev_uc_unsync(real_dev, dev);
if (dev->flags & IFF_ALLMULTI)
err = c->trans_mod->request(c, req);
if (err < 0) {
- if (err != -ERESTARTSYS)
+ if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
goto reterr;
}
}
strcpy(dirent->d_name, nameptr);
+ kfree(nameptr);
out:
return fake_pdu.offset;
int nr_pages, u8 rw)
{
uint32_t first_page_bytes = 0;
- uint32_t pdata_mapped_pages;
+ int32_t pdata_mapped_pages;
struct trans_rpage_info *rpinfo;
*pdata_off = (__force size_t)req->tc->pubuf & (PAGE_SIZE-1);
rpinfo = req->tc->private;
pdata_mapped_pages = get_user_pages_fast((unsigned long)req->tc->pubuf,
nr_pages, rw, &rpinfo->rp_data[0]);
+ if (pdata_mapped_pages <= 0)
+ return pdata_mapped_pages;
- if (pdata_mapped_pages < 0) {
- printk(KERN_ERR "get_user_pages_fast failed:%d udata:%p"
- "nr_pages:%d\n", pdata_mapped_pages,
- req->tc->pubuf, nr_pages);
- pdata_mapped_pages = 0;
- return -EIO;
- }
rpinfo->rp_nr_pages = pdata_mapped_pages;
if (*pdata_off) {
*pdata_len = first_page_bytes;
hci_req_cancel(hdev, ENODEV);
hci_req_lock(hdev);
- /* Stop timer, it might be running */
- del_timer_sync(&hdev->cmd_timer);
-
if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
+ del_timer_sync(&hdev->cmd_timer);
hci_req_unlock(hdev);
return 0;
}
/* Drop last sent command */
if (hdev->sent_cmd) {
+ del_timer_sync(&hdev->cmd_timer);
kfree_skb(hdev->sent_cmd);
hdev->sent_cmd = NULL;
}
if (!conn)
goto unlock;
- hci_conn_hold(conn);
-
conn->remote_cap = ev->capability;
conn->remote_oob = ev->oob_data;
conn->remote_auth = ev->authentication;
tx_skb = skb_clone(skb, GFP_ATOMIC);
bt_cb(skb)->retries++;
control = get_unaligned_le16(tx_skb->data + L2CAP_HDR_SIZE);
+ control &= L2CAP_CTRL_SAR;
if (pi->conn_state & L2CAP_CONN_SEND_FBIT) {
control |= L2CAP_CTRL_FINAL;
goto drop;
/* If STP is turned off, then forward */
- if (p->br->stp_enabled == BR_NO_STP)
+ if (p->br->stp_enabled == BR_NO_STP && dest[5] == 0)
goto forward;
if (NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
nf_bridge->mask |= BRNF_PKT_TYPE;
}
- if (br_parse_ip_options(skb))
+ if (pf == PF_INET && br_parse_ip_options(skb))
return NF_DROP;
/* The physdev module checks on this */
newinfo->entries_size = size;
- xt_compat_init_offsets(AF_INET, info->nentries);
+ xt_compat_init_offsets(NFPROTO_BRIDGE, info->nentries);
return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info,
entries, newinfo);
}
struct xt_match *match;
struct xt_target *wt;
void *dst = NULL;
- int off, pad = 0, ret = 0;
+ int off, pad = 0;
unsigned int size_kern, entry_offset, match_size = mwt->match_size;
strlcpy(name, mwt->u.name, sizeof(name));
break;
}
- if (!dst) {
- ret = xt_compat_add_offset(NFPROTO_BRIDGE, entry_offset,
- off + ebt_compat_entry_padsize());
- if (ret < 0)
- return ret;
- }
-
state->buf_kern_offset += match_size + off;
state->buf_user_offset += match_size;
pad = XT_ALIGN(size_kern) - size_kern;
return growth;
}
-#define EBT_COMPAT_WATCHER_ITERATE(e, fn, args...) \
-({ \
- unsigned int __i; \
- int __ret = 0; \
- struct compat_ebt_entry_mwt *__watcher; \
- \
- for (__i = e->watchers_offset; \
- __i < (e)->target_offset; \
- __i += __watcher->watcher_size + \
- sizeof(struct compat_ebt_entry_mwt)) { \
- __watcher = (void *)(e) + __i; \
- __ret = fn(__watcher , ## args); \
- if (__ret != 0) \
- break; \
- } \
- if (__ret == 0) { \
- if (__i != (e)->target_offset) \
- __ret = -EINVAL; \
- } \
- __ret; \
-})
-
-#define EBT_COMPAT_MATCH_ITERATE(e, fn, args...) \
-({ \
- unsigned int __i; \
- int __ret = 0; \
- struct compat_ebt_entry_mwt *__match; \
- \
- for (__i = sizeof(struct ebt_entry); \
- __i < (e)->watchers_offset; \
- __i += __match->match_size + \
- sizeof(struct compat_ebt_entry_mwt)) { \
- __match = (void *)(e) + __i; \
- __ret = fn(__match , ## args); \
- if (__ret != 0) \
- break; \
- } \
- if (__ret == 0) { \
- if (__i != (e)->watchers_offset) \
- __ret = -EINVAL; \
- } \
- __ret; \
-})
-
/* called for all ebt_entry structures. */
static int size_entry_mwt(struct ebt_entry *entry, const unsigned char *base,
unsigned int *total,
}
}
+ if (state->buf_kern_start == NULL) {
+ unsigned int offset = buf_start - (char *) base;
+
+ ret = xt_compat_add_offset(NFPROTO_BRIDGE, offset, new_offset);
+ if (ret < 0)
+ return ret;
+ }
+
startoff = state->buf_user_offset - startoff;
BUG_ON(*total < startoff);
xt_compat_lock(NFPROTO_BRIDGE);
+ xt_compat_init_offsets(NFPROTO_BRIDGE, tmp.nentries);
ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state);
if (ret < 0)
goto out_unlock;
static int bcm_release(struct socket *sock)
{
struct sock *sk = sock->sk;
- struct bcm_sock *bo = bcm_sk(sk);
+ struct bcm_sock *bo;
struct bcm_op *op, *next;
+ if (sk == NULL)
+ return 0;
+
+ bo = bcm_sk(sk);
+
/* remove bcm_ops, timer, rx_unregister(), etc. */
unregister_netdevice_notifier(&bo->notifier);
static int raw_release(struct socket *sock)
{
struct sock *sk = sock->sk;
- struct raw_sock *ro = raw_sk(sk);
+ struct raw_sock *ro;
+
+ if (!sk)
+ return 0;
+
+ ro = raw_sk(sk);
unregister_netdevice_notifier(&ro->notifier);
m->more_to_follow = false;
m->pool = NULL;
+ /* middle */
+ m->middle = NULL;
+
+ /* data */
+ m->nr_pages = 0;
+ m->page_alignment = 0;
+ m->pages = NULL;
+ m->pagelist = NULL;
+ m->bio = NULL;
+ m->bio_iter = NULL;
+ m->bio_seg = 0;
+ m->trail = NULL;
+
/* front */
if (front_len) {
if (front_len > PAGE_CACHE_SIZE) {
}
m->front.iov_len = front_len;
- /* middle */
- m->middle = NULL;
-
- /* data */
- m->nr_pages = 0;
- m->page_alignment = 0;
- m->pages = NULL;
- m->pagelist = NULL;
- m->bio = NULL;
- m->bio_iter = NULL;
- m->bio_seg = 0;
- m->trail = NULL;
-
dout("ceph_msg_new %p front %d\n", m, front_len);
return m;
snapc, ops,
use_mempool,
GFP_NOFS, NULL, NULL);
- if (IS_ERR(req))
- return req;
+ if (!req)
+ return NULL;
/* calculate max write size */
calc_layout(osdc, vino, layout, off, plen, req, ops);
*/
int dev_close(struct net_device *dev)
{
- LIST_HEAD(single);
+ if (dev->flags & IFF_UP) {
+ LIST_HEAD(single);
- list_add(&dev->unreg_list, &single);
- dev_close_many(&single);
- list_del(&single);
+ list_add(&dev->unreg_list, &single);
+ dev_close_many(&single);
+ list_del(&single);
+ }
return 0;
}
EXPORT_SYMBOL(dev_close);
* is never reached
*/
WARN_ON(1);
- err = -EINVAL;
+ err = -ENOTTY;
break;
}
/* Set the per device memory buffer space.
* Not applicable in our case */
case SIOCSIFLINK:
- return -EINVAL;
+ return -ENOTTY;
/*
* Unknown or private ioctl.
/* Take care of Wireless Extensions */
if (cmd >= SIOCIWFIRST && cmd <= SIOCIWLAST)
return wext_handle_ioctl(net, &ifr, cmd, arg);
- return -EINVAL;
+ return -ENOTTY;
}
}
/* Fix illegal checksum combinations */
if ((features & NETIF_F_HW_CSUM) &&
(features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
- netdev_info(dev, "mixed HW and IP checksum settings.\n");
+ netdev_warn(dev, "mixed HW and IP checksum settings.\n");
features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
}
if ((features & NETIF_F_NO_CSUM) &&
(features & (NETIF_F_HW_CSUM|NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
- netdev_info(dev, "mixed no checksumming and other settings.\n");
+ netdev_warn(dev, "mixed no checksumming and other settings.\n");
features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM|NETIF_F_HW_CSUM);
}
/* Fix illegal SG+CSUM combinations. */
if ((features & NETIF_F_SG) &&
!(features & NETIF_F_ALL_CSUM)) {
- netdev_info(dev,
- "Dropping NETIF_F_SG since no checksum feature.\n");
+ netdev_dbg(dev,
+ "Dropping NETIF_F_SG since no checksum feature.\n");
features &= ~NETIF_F_SG;
}
/* TSO requires that SG is present as well. */
if ((features & NETIF_F_ALL_TSO) && !(features & NETIF_F_SG)) {
- netdev_info(dev, "Dropping TSO features since no SG feature.\n");
+ netdev_dbg(dev, "Dropping TSO features since no SG feature.\n");
features &= ~NETIF_F_ALL_TSO;
}
/* Software GSO depends on SG. */
if ((features & NETIF_F_GSO) && !(features & NETIF_F_SG)) {
- netdev_info(dev, "Dropping NETIF_F_GSO since no SG feature.\n");
+ netdev_dbg(dev, "Dropping NETIF_F_GSO since no SG feature.\n");
features &= ~NETIF_F_GSO;
}
if (!((features & NETIF_F_GEN_CSUM) ||
(features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
== (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
- netdev_info(dev,
+ netdev_dbg(dev,
"Dropping NETIF_F_UFO since no checksum offload features.\n");
features &= ~NETIF_F_UFO;
}
if (!(features & NETIF_F_SG)) {
- netdev_info(dev,
+ netdev_dbg(dev,
"Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
features &= ~NETIF_F_UFO;
}
dev->features |= NETIF_F_SOFT_FEATURES;
dev->wanted_features = dev->features & dev->hw_features;
- /* Avoid warning from netdev_fix_features() for GSO without SG */
- if (!(dev->wanted_features & NETIF_F_SG)) {
- dev->wanted_features &= ~NETIF_F_GSO;
- dev->features &= ~NETIF_F_GSO;
- }
-
/* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
* vlan_dev_init() will do the dev->features check, so these features
* are enabled only if supported by underlying device.
case DCCPO_CHANGE_L ... DCCPO_CONFIRM_R:
if (pkt_type == DCCP_PKT_DATA) /* RFC 4340, 6 */
break;
+ if (len == 0)
+ goto out_invalid_option;
rc = dccp_feat_parse_options(sk, dreq, mandatory, opt,
*value, value + 1, len - 1);
if (rc)
default n
config NET_DSA_MV88E6131
- bool "Marvell 88E6095/6095F/6131 ethernet switch chip support"
+ bool "Marvell 88E6085/6095/6095F/6131 ethernet switch chip support"
select NET_DSA_MV88E6XXX
select NET_DSA_MV88E6XXX_NEED_PPU
select NET_DSA_TAG_DSA
---help---
- This enables support for the Marvell 88E6095/6095F/6131
+ This enables support for the Marvell 88E6085/6095/6095F/6131
ethernet switch chips.
config NET_DSA_MV88E6123_61_65
* mode, but do not enable forwarding of unknown unicasts.
*/
val = 0x0433;
- if (p == dsa_upstream_port(ds))
+ if (p == dsa_upstream_port(ds)) {
val |= 0x0104;
+ /*
+ * On 6085, unknown multicast forward is controlled
+ * here rather than in Port Control 2 register.
+ */
+ if (ps->id == ID_6085)
+ val |= 0x0008;
+ }
if (ds->dsa_port_mask & (1 << p))
val |= 0x0100;
REG_WRITE(addr, 0x04, val);
* If this is the upstream port for this switch, enable
* forwarding of unknown multicast addresses.
*/
- val = 0x0080 | dsa_upstream_port(ds);
- if (p == dsa_upstream_port(ds))
- val |= 0x0040;
- REG_WRITE(addr, 0x08, val);
+ if (ps->id == ID_6085)
+ /*
+ * on 6085, bits 3:0 are reserved, bit 6 control ARP
+ * mirroring, and multicast forward is handled in
+ * Port Control register.
+ */
+ REG_WRITE(addr, 0x08, 0x0080);
+ else {
+ val = 0x0080 | dsa_upstream_port(ds);
+ if (p == dsa_upstream_port(ds))
+ val |= 0x0040;
+ REG_WRITE(addr, 0x08, val);
+ }
/*
* Rate Control: disable ingress rate limiting.
return;
cnf->sysctl = NULL;
- unregister_sysctl_table(t->sysctl_header);
+ unregister_net_sysctl_table(t->sysctl_header);
kfree(t->dev_name);
kfree(t);
}
t = (struct trie *) tb->tb_data;
memset(t, 0, sizeof(*t));
- if (id == RT_TABLE_LOCAL)
- pr_info("IPv4 FIB: Using LC-trie version %s\n", VERSION);
-
return tb;
}
if ((qp->q.last_in & INET_FRAG_FIRST_IN) && qp->q.fragments != NULL) {
struct sk_buff *head = qp->q.fragments;
+ const struct iphdr *iph;
+ int err;
rcu_read_lock();
head->dev = dev_get_by_index_rcu(net, qp->iif);
if (!head->dev)
goto out_rcu_unlock;
+ /* skb dst is stale, drop it, and perform route lookup again */
+ skb_dst_drop(head);
+ iph = ip_hdr(head);
+ err = ip_route_input_noref(head, iph->daddr, iph->saddr,
+ iph->tos, head->dev);
+ if (err)
+ goto out_rcu_unlock;
+
/*
- * Only search router table for the head fragment,
- * when defraging timeout at PRE_ROUTING HOOK.
+ * Only an end host needs to send an ICMP
+ * "Fragment Reassembly Timeout" message, per RFC792.
*/
- if (qp->user == IP_DEFRAG_CONNTRACK_IN && !skb_dst(head)) {
- const struct iphdr *iph = ip_hdr(head);
- int err = ip_route_input(head, iph->daddr, iph->saddr,
- iph->tos, head->dev);
- if (unlikely(err))
- goto out_rcu_unlock;
-
- /*
- * Only an end host needs to send an ICMP
- * "Fragment Reassembly Timeout" message, per RFC792.
- */
- if (skb_rtable(head)->rt_type != RTN_LOCAL)
- goto out_rcu_unlock;
+ if (qp->user == IP_DEFRAG_CONNTRACK_IN &&
+ skb_rtable(head)->rt_type != RTN_LOCAL)
+ goto out_rcu_unlock;
- }
/* Send an ICMP "Fragment Reassembly Timeout" message. */
icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
{
}
+static u32 *ipv4_rt_blackhole_cow_metrics(struct dst_entry *dst,
+ unsigned long old)
+{
+ return NULL;
+}
+
static struct dst_ops ipv4_dst_blackhole_ops = {
.family = AF_INET,
.protocol = cpu_to_be16(ETH_P_IP),
.default_mtu = ipv4_blackhole_default_mtu,
.default_advmss = ipv4_default_advmss,
.update_pmtu = ipv4_rt_blackhole_update_pmtu,
+ .cow_metrics = ipv4_rt_blackhole_cow_metrics,
};
struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig)
u32 ack_cnt; /* number of acks */
u32 tcp_cwnd; /* estimated tcp cwnd */
#define ACK_RATIO_SHIFT 4
+#define ACK_RATIO_LIMIT (32u << ACK_RATIO_SHIFT)
u16 delayed_ack; /* estimate the ratio of Packets/ACKs << 4 */
u8 sample_cnt; /* number of samples to decide curr_rtt */
u8 found; /* the exit point is found? */
u32 delay;
if (icsk->icsk_ca_state == TCP_CA_Open) {
- cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
- ca->delayed_ack += cnt;
+ u32 ratio = ca->delayed_ack;
+
+ ratio -= ca->delayed_ack >> ACK_RATIO_SHIFT;
+ ratio += cnt;
+
+ ca->delayed_ack = min(ratio, ACK_RATIO_LIMIT);
}
/* Some calls are for duplicates without timetamps */
}
EXPORT_SYMBOL(xfrm4_prepare_output);
-static int xfrm4_output_finish(struct sk_buff *skb)
+int xfrm4_output_finish(struct sk_buff *skb)
{
#ifdef CONFIG_NETFILTER
if (!skb_dst(skb)->xfrm) {
int xfrm4_output(struct sk_buff *skb)
{
+ struct dst_entry *dst = skb_dst(skb);
+ struct xfrm_state *x = dst->xfrm;
+
return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, skb,
- NULL, skb_dst(skb)->dev, xfrm4_output_finish,
+ NULL, dst->dev,
+ x->outer_mode->afinfo->output_finish,
!(IPCB(skb)->flags & IPSKB_REROUTED));
}
.init_tempsel = __xfrm4_init_tempsel,
.init_temprop = xfrm4_init_temprop,
.output = xfrm4_output,
+ .output_finish = xfrm4_output_finish,
.extract_input = xfrm4_extract_input,
.extract_output = xfrm4_extract_output,
.transport_finish = xfrm4_transport_finish,
t = p->sysctl;
p->sysctl = NULL;
- unregister_sysctl_table(t->sysctl_header);
+ unregister_net_sysctl_table(t->sysctl_header);
kfree(t->dev_name);
kfree(t);
}
iv = esp_tmp_iv(aead, tmp, seqhilen);
req = esp_tmp_req(aead, iv);
asg = esp_req_sg(aead, req);
- sg = asg + 1;
+ sg = asg + sglists;
skb->ip_summed = CHECKSUM_NONE;
int tcphoff, needs_ack;
const struct ipv6hdr *oip6h = ipv6_hdr(oldskb);
struct ipv6hdr *ip6h;
+#define DEFAULT_TOS_VALUE 0x0U
+ const __u8 tclass = DEFAULT_TOS_VALUE;
struct dst_entry *dst = NULL;
u8 proto;
struct flowi6 fl6;
skb_put(nskb, sizeof(struct ipv6hdr));
skb_reset_network_header(nskb);
ip6h = ipv6_hdr(nskb);
- ip6h->version = 6;
+ *(__be32 *)ip6h = htonl(0x60000000 | (tclass << 20));
ip6h->hop_limit = ip6_dst_hoplimit(dst);
ip6h->nexthdr = IPPROTO_TCP;
ipv6_addr_copy(&ip6h->saddr, &oip6h->daddr);
{
}
+static u32 *ip6_rt_blackhole_cow_metrics(struct dst_entry *dst,
+ unsigned long old)
+{
+ return NULL;
+}
+
static struct dst_ops ip6_dst_blackhole_ops = {
.family = AF_INET6,
.protocol = cpu_to_be16(ETH_P_IPV6),
.default_mtu = ip6_blackhole_default_mtu,
.default_advmss = ip6_default_advmss,
.update_pmtu = ip6_rt_blackhole_update_pmtu,
+ .cow_metrics = ip6_rt_blackhole_cow_metrics,
};
static const u32 ip6_template_metrics[RTAX_MAX] = {
rt->dst.output = ip6_output;
rt->rt6i_dev = net->loopback_dev;
rt->rt6i_idev = idev;
- dst_metric_set(&rt->dst, RTAX_HOPLIMIT, -1);
rt->dst.obsolete = -1;
rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
skb->ip_summed = CHECKSUM_NONE;
/* Check if there is enough headroom to insert fragment header. */
- if ((skb_headroom(skb) < frag_hdr_sz) &&
+ if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
goto out;
}
EXPORT_SYMBOL(xfrm6_prepare_output);
-static int xfrm6_output_finish(struct sk_buff *skb)
+int xfrm6_output_finish(struct sk_buff *skb)
{
#ifdef CONFIG_NETFILTER
IP6CB(skb)->flags |= IP6SKB_XFRM_TRANSFORMED;
if ((x && x->props.mode == XFRM_MODE_TUNNEL) &&
((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
dst_allfrag(skb_dst(skb)))) {
- return ip6_fragment(skb, xfrm6_output_finish);
+ return ip6_fragment(skb, x->outer_mode->afinfo->output_finish);
}
- return xfrm6_output_finish(skb);
+ return x->outer_mode->afinfo->output_finish(skb);
}
int xfrm6_output(struct sk_buff *skb)
.tmpl_sort = __xfrm6_tmpl_sort,
.state_sort = __xfrm6_state_sort,
.output = xfrm6_output,
+ .output_finish = xfrm6_output_finish,
.extract_input = xfrm6_extract_input,
.extract_output = xfrm6_extract_output,
.transport_finish = xfrm6_transport_finish,
MODULE_DESCRIPTION("L2TP over IP");
MODULE_VERSION("1.0");
-/* Use the value of SOCK_DGRAM (2) directory, because __stringify does't like
+/* Use the value of SOCK_DGRAM (2) directory, because __stringify doesn't like
* enums
*/
MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 2, IPPROTO_L2TP);
enum ieee80211_smps_mode old_req;
int err;
+ lockdep_assert_held(&sdata->u.mgd.mtx);
+
old_req = sdata->u.mgd.req_smps;
sdata->u.mgd.req_smps = smps_mode;
if (sdata->vif.type != NL80211_IFTYPE_STATION)
return -EOPNOTSUPP;
- mutex_lock(&local->iflist_mtx);
+ mutex_lock(&sdata->u.mgd.mtx);
err = __ieee80211_request_smps(sdata, smps_mode);
- mutex_unlock(&local->iflist_mtx);
+ mutex_unlock(&sdata->u.mgd.mtx);
return err;
}
&local->dynamic_ps_disable_work);
}
+ /* Don't restart the timer if we're not disassociated */
+ if (!ifmgd->associated)
+ return TX_CONTINUE;
+
mod_timer(&local->dynamic_ps_timer, jiffies +
msecs_to_jiffies(local->hw.conf.dynamic_ps_timeout));
.open = ip_vs_app_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = seq_release_net,
};
#endif
-static int __net_init __ip_vs_app_init(struct net *net)
+int __net_init __ip_vs_app_init(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
return 0;
}
-static void __net_exit __ip_vs_app_cleanup(struct net *net)
+void __net_exit __ip_vs_app_cleanup(struct net *net)
{
proc_net_remove(net, "ip_vs_app");
}
-static struct pernet_operations ip_vs_app_ops = {
- .init = __ip_vs_app_init,
- .exit = __ip_vs_app_cleanup,
-};
-
int __init ip_vs_app_init(void)
{
- int rv;
-
- rv = register_pernet_subsys(&ip_vs_app_ops);
- return rv;
+ return 0;
}
void ip_vs_app_cleanup(void)
{
- unregister_pernet_subsys(&ip_vs_app_ops);
}
.open = ip_vs_conn_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = seq_release_net,
};
static const char *ip_vs_origin_name(unsigned flags)
.open = ip_vs_conn_sync_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = seq_release_net,
};
#endif
return 0;
}
-static void __net_exit __ip_vs_conn_cleanup(struct net *net)
+void __net_exit __ip_vs_conn_cleanup(struct net *net)
{
/* flush all the connection entries first */
ip_vs_conn_flush(net);
proc_net_remove(net, "ip_vs_conn");
proc_net_remove(net, "ip_vs_conn_sync");
}
-static struct pernet_operations ipvs_conn_ops = {
- .init = __ip_vs_conn_init,
- .exit = __ip_vs_conn_cleanup,
-};
int __init ip_vs_conn_init(void)
{
int idx;
- int retc;
/* Compute size and mask */
ip_vs_conn_tab_size = 1 << ip_vs_conn_tab_bits;
rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
}
- retc = register_pernet_subsys(&ipvs_conn_ops);
-
/* calculate the random value for connection hash */
get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
- return retc;
+ return 0;
}
void ip_vs_conn_cleanup(void)
{
- unregister_pernet_subsys(&ipvs_conn_ops);
/* Release the empty cache */
kmem_cache_destroy(ip_vs_conn_cachep);
vfree(ip_vs_conn_tab);
return NF_ACCEPT;
net = skb_net(skb);
+ if (!net_ipvs(net)->enable)
+ return NF_ACCEPT;
+
ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
return NF_ACCEPT; /* The packet looks wrong, ignore */
net = skb_net(skb);
+
pd = ip_vs_proto_data_get(net, cih->protocol);
if (!pd)
return NF_ACCEPT;
IP_VS_DBG_ADDR(af, &iph.daddr), hooknum);
return NF_ACCEPT;
}
+ /* ipvs enabled in this netns ? */
+ net = skb_net(skb);
+ if (!net_ipvs(net)->enable)
+ return NF_ACCEPT;
+
ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
/* Bad... Do not break raw sockets */
ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
}
- net = skb_net(skb);
/* Protocol supported? */
pd = ip_vs_proto_data_get(net, iph.protocol);
if (unlikely(!pd))
}
IP_VS_DBG_PKT(11, af, pp, skb, 0, "Incoming packet");
- net = skb_net(skb);
ipvs = net_ipvs(net);
/* Check the server status */
if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
int (*okfn)(struct sk_buff *))
{
int r;
+ struct net *net;
if (ip_hdr(skb)->protocol != IPPROTO_ICMP)
return NF_ACCEPT;
+ /* ipvs enabled in this netns ? */
+ net = skb_net(skb);
+ if (!net_ipvs(net)->enable)
+ return NF_ACCEPT;
+
return ip_vs_in_icmp(skb, &r, hooknum);
}
int (*okfn)(struct sk_buff *))
{
int r;
+ struct net *net;
if (ipv6_hdr(skb)->nexthdr != IPPROTO_ICMPV6)
return NF_ACCEPT;
+ /* ipvs enabled in this netns ? */
+ net = skb_net(skb);
+ if (!net_ipvs(net)->enable)
+ return NF_ACCEPT;
+
return ip_vs_in_icmp_v6(skb, &r, hooknum);
}
#endif
pr_err("%s(): no memory.\n", __func__);
return -ENOMEM;
}
+ /* Hold the beast until a service is registerd */
+ ipvs->enable = 0;
ipvs->net = net;
/* Counters used for creating unique names */
ipvs->gen = atomic_read(&ipvs_netns_cnt);
atomic_inc(&ipvs_netns_cnt);
net->ipvs = ipvs;
+
+ if (__ip_vs_estimator_init(net) < 0)
+ goto estimator_fail;
+
+ if (__ip_vs_control_init(net) < 0)
+ goto control_fail;
+
+ if (__ip_vs_protocol_init(net) < 0)
+ goto protocol_fail;
+
+ if (__ip_vs_app_init(net) < 0)
+ goto app_fail;
+
+ if (__ip_vs_conn_init(net) < 0)
+ goto conn_fail;
+
+ if (__ip_vs_sync_init(net) < 0)
+ goto sync_fail;
+
printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
sizeof(struct netns_ipvs), ipvs->gen);
return 0;
+/*
+ * Error handling
+ */
+
+sync_fail:
+ __ip_vs_conn_cleanup(net);
+conn_fail:
+ __ip_vs_app_cleanup(net);
+app_fail:
+ __ip_vs_protocol_cleanup(net);
+protocol_fail:
+ __ip_vs_control_cleanup(net);
+control_fail:
+ __ip_vs_estimator_cleanup(net);
+estimator_fail:
+ return -ENOMEM;
}
static void __net_exit __ip_vs_cleanup(struct net *net)
{
- IP_VS_DBG(10, "ipvs netns %d released\n", net_ipvs(net)->gen);
+ __ip_vs_service_cleanup(net); /* ip_vs_flush() with locks */
+ __ip_vs_conn_cleanup(net);
+ __ip_vs_app_cleanup(net);
+ __ip_vs_protocol_cleanup(net);
+ __ip_vs_control_cleanup(net);
+ __ip_vs_estimator_cleanup(net);
+ IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
+}
+
+static void __net_exit __ip_vs_dev_cleanup(struct net *net)
+{
+ EnterFunction(2);
+ net_ipvs(net)->enable = 0; /* Disable packet reception */
+ __ip_vs_sync_cleanup(net);
+ LeaveFunction(2);
}
static struct pernet_operations ipvs_core_ops = {
.size = sizeof(struct netns_ipvs),
};
+static struct pernet_operations ipvs_core_dev_ops = {
+ .exit = __ip_vs_dev_cleanup,
+};
+
/*
* Initialize IP Virtual Server
*/
{
int ret;
- ret = register_pernet_subsys(&ipvs_core_ops); /* Alloc ip_vs struct */
- if (ret < 0)
- return ret;
-
ip_vs_estimator_init();
ret = ip_vs_control_init();
if (ret < 0) {
goto cleanup_conn;
}
+ ret = register_pernet_subsys(&ipvs_core_ops); /* Alloc ip_vs struct */
+ if (ret < 0)
+ goto cleanup_sync;
+
+ ret = register_pernet_device(&ipvs_core_dev_ops);
+ if (ret < 0)
+ goto cleanup_sub;
+
ret = nf_register_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
if (ret < 0) {
pr_err("can't register hooks.\n");
- goto cleanup_sync;
+ goto cleanup_dev;
}
pr_info("ipvs loaded.\n");
+
return ret;
+cleanup_dev:
+ unregister_pernet_device(&ipvs_core_dev_ops);
+cleanup_sub:
+ unregister_pernet_subsys(&ipvs_core_ops);
cleanup_sync:
ip_vs_sync_cleanup();
cleanup_conn:
ip_vs_control_cleanup();
cleanup_estimator:
ip_vs_estimator_cleanup();
- unregister_pernet_subsys(&ipvs_core_ops); /* free ip_vs struct */
return ret;
}
static void __exit ip_vs_cleanup(void)
{
nf_unregister_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
+ unregister_pernet_device(&ipvs_core_dev_ops);
+ unregister_pernet_subsys(&ipvs_core_ops); /* free ip_vs struct */
ip_vs_sync_cleanup();
ip_vs_conn_cleanup();
ip_vs_app_cleanup();
ip_vs_protocol_cleanup();
ip_vs_control_cleanup();
ip_vs_estimator_cleanup();
- unregister_pernet_subsys(&ipvs_core_ops); /* free ip_vs struct */
pr_info("ipvs unloaded.\n");
}
}
#endif
+
+/* Protos */
+static void __ip_vs_del_service(struct ip_vs_service *svc);
+
+
#ifdef CONFIG_IP_VS_IPV6
/* Taken from rt6_fill_node() in net/ipv6/route.c, is there a better way? */
static int __ip_vs_addr_is_local_v6(struct net *net,
write_unlock_bh(&__ip_vs_svc_lock);
*svc_p = svc;
+ /* Now there is a service - full throttle */
+ ipvs->enable = 1;
return 0;
return 0;
}
+/*
+ * Delete service by {netns} in the service table.
+ * Called by __ip_vs_cleanup()
+ */
+void __ip_vs_service_cleanup(struct net *net)
+{
+ EnterFunction(2);
+ /* Check for "full" addressed entries */
+ mutex_lock(&__ip_vs_mutex);
+ ip_vs_flush(net);
+ mutex_unlock(&__ip_vs_mutex);
+ LeaveFunction(2);
+}
+/*
+ * Release dst hold by dst_cache
+ */
+static inline void
+__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
+{
+ spin_lock_bh(&dest->dst_lock);
+ if (dest->dst_cache && dest->dst_cache->dev == dev) {
+ IP_VS_DBG_BUF(3, "Reset dev:%s dest %s:%u ,dest->refcnt=%d\n",
+ dev->name,
+ IP_VS_DBG_ADDR(dest->af, &dest->addr),
+ ntohs(dest->port),
+ atomic_read(&dest->refcnt));
+ ip_vs_dst_reset(dest);
+ }
+ spin_unlock_bh(&dest->dst_lock);
+
+}
+/*
+ * Netdev event receiver
+ * Currently only NETDEV_UNREGISTER is handled, i.e. if we hold a reference to
+ * a device that is "unregister" it must be released.
+ */
+static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
+ void *ptr)
+{
+ struct net_device *dev = ptr;
+ struct net *net = dev_net(dev);
+ struct ip_vs_service *svc;
+ struct ip_vs_dest *dest;
+ unsigned int idx;
+
+ if (event != NETDEV_UNREGISTER)
+ return NOTIFY_DONE;
+ IP_VS_DBG(3, "%s() dev=%s\n", __func__, dev->name);
+ EnterFunction(2);
+ mutex_lock(&__ip_vs_mutex);
+ for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
+ list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
+ if (net_eq(svc->net, net)) {
+ list_for_each_entry(dest, &svc->destinations,
+ n_list) {
+ __ip_vs_dev_reset(dest, dev);
+ }
+ }
+ }
+
+ list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+ if (net_eq(svc->net, net)) {
+ list_for_each_entry(dest, &svc->destinations,
+ n_list) {
+ __ip_vs_dev_reset(dest, dev);
+ }
+ }
+
+ }
+ }
+
+ list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
+ __ip_vs_dev_reset(dest, dev);
+ }
+ mutex_unlock(&__ip_vs_mutex);
+ LeaveFunction(2);
+ return NOTIFY_DONE;
+}
/*
* Zero counters in a service or all services
.open = ip_vs_info_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = seq_release_net,
};
#endif
.open = ip_vs_stats_seq_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = single_release,
+ .release = single_release_net,
};
static int ip_vs_stats_percpu_show(struct seq_file *seq, void *v)
.open = ip_vs_stats_percpu_seq_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = single_release,
+ .release = single_release_net,
};
#endif
#endif
+static struct notifier_block ip_vs_dst_notifier = {
+ .notifier_call = ip_vs_dst_event,
+};
+
int __net_init __ip_vs_control_init(struct net *net)
{
int idx;
return -ENOMEM;
}
-static void __net_exit __ip_vs_control_cleanup(struct net *net)
+void __net_exit __ip_vs_control_cleanup(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
free_percpu(ipvs->tot_stats.cpustats);
}
-static struct pernet_operations ipvs_control_ops = {
- .init = __ip_vs_control_init,
- .exit = __ip_vs_control_cleanup,
-};
-
int __init ip_vs_control_init(void)
{
int idx;
INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
}
- ret = register_pernet_subsys(&ipvs_control_ops);
- if (ret) {
- pr_err("cannot register namespace.\n");
- goto err;
- }
-
smp_wmb(); /* Do we really need it now ? */
ret = nf_register_sockopt(&ip_vs_sockopts);
if (ret) {
pr_err("cannot register sockopt.\n");
- goto err_net;
+ goto err_sock;
}
ret = ip_vs_genl_register();
if (ret) {
pr_err("cannot register Generic Netlink interface.\n");
- nf_unregister_sockopt(&ip_vs_sockopts);
- goto err_net;
+ goto err_genl;
}
+ ret = register_netdevice_notifier(&ip_vs_dst_notifier);
+ if (ret < 0)
+ goto err_notf;
+
LeaveFunction(2);
return 0;
-err_net:
- unregister_pernet_subsys(&ipvs_control_ops);
-err:
+err_notf:
+ ip_vs_genl_unregister();
+err_genl:
+ nf_unregister_sockopt(&ip_vs_sockopts);
+err_sock:
return ret;
}
void ip_vs_control_cleanup(void)
{
EnterFunction(2);
- unregister_pernet_subsys(&ipvs_control_ops);
ip_vs_genl_unregister();
nf_unregister_sockopt(&ip_vs_sockopts);
LeaveFunction(2);
dst->outbps = (e->outbps + 0xF) >> 5;
}
-static int __net_init __ip_vs_estimator_init(struct net *net)
+int __net_init __ip_vs_estimator_init(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
return 0;
}
-static void __net_exit __ip_vs_estimator_exit(struct net *net)
+void __net_exit __ip_vs_estimator_cleanup(struct net *net)
{
del_timer_sync(&net_ipvs(net)->est_timer);
}
-static struct pernet_operations ip_vs_app_ops = {
- .init = __ip_vs_estimator_init,
- .exit = __ip_vs_estimator_exit,
-};
int __init ip_vs_estimator_init(void)
{
- int rv;
-
- rv = register_pernet_subsys(&ip_vs_app_ops);
- return rv;
+ return 0;
}
void ip_vs_estimator_cleanup(void)
{
- unregister_pernet_subsys(&ip_vs_app_ops);
}
/*
* per network name-space init
*/
-static int __net_init __ip_vs_protocol_init(struct net *net)
+int __net_init __ip_vs_protocol_init(struct net *net)
{
#ifdef CONFIG_IP_VS_PROTO_TCP
register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
return 0;
}
-static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
+void __net_exit __ip_vs_protocol_cleanup(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
struct ip_vs_proto_data *pd;
}
}
-static struct pernet_operations ipvs_proto_ops = {
- .init = __ip_vs_protocol_init,
- .exit = __ip_vs_protocol_cleanup,
-};
-
int __init ip_vs_protocol_init(void)
{
char protocols[64];
REGISTER_PROTOCOL(&ip_vs_protocol_esp);
#endif
pr_info("Registered protocols (%s)\n", &protocols[2]);
- return register_pernet_subsys(&ipvs_proto_ops);
return 0;
}
struct ip_vs_protocol *pp;
int i;
- unregister_pernet_subsys(&ipvs_proto_ops);
/* unregister all the ipvs protocols */
for (i = 0; i < IP_VS_PROTO_TAB_SIZE; i++) {
while ((pp = ip_vs_proto_table[i]) != NULL)
struct socket *sock;
int result;
- /* First create a socket */
- result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+ /* First create a socket move it to right name space later */
+ result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
if (result < 0) {
pr_err("Error during creation of socket; terminating\n");
return ERR_PTR(result);
}
-
+ /*
+ * Kernel sockets that are a part of a namespace, should not
+ * hold a reference to a namespace in order to allow to stop it.
+ * After sk_change_net should be released using sk_release_kernel.
+ */
+ sk_change_net(sock->sk, net);
result = set_mcast_if(sock->sk, ipvs->master_mcast_ifn);
if (result < 0) {
pr_err("Error setting outbound mcast interface\n");
return sock;
- error:
- sock_release(sock);
+error:
+ sk_release_kernel(sock->sk);
return ERR_PTR(result);
}
int result;
/* First create a socket */
- result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+ result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
if (result < 0) {
pr_err("Error during creation of socket; terminating\n");
return ERR_PTR(result);
}
-
+ /*
+ * Kernel sockets that are a part of a namespace, should not
+ * hold a reference to a namespace in order to allow to stop it.
+ * After sk_change_net should be released using sk_release_kernel.
+ */
+ sk_change_net(sock->sk, net);
/* it is equivalent to the REUSEADDR option in user-space */
sock->sk->sk_reuse = 1;
return sock;
- error:
- sock_release(sock);
+error:
+ sk_release_kernel(sock->sk);
return ERR_PTR(result);
}
ip_vs_sync_buff_release(sb);
/* release the sending multicast socket */
- sock_release(tinfo->sock);
+ sk_release_kernel(tinfo->sock->sk);
kfree(tinfo);
return 0;
}
/* release the sending multicast socket */
- sock_release(tinfo->sock);
+ sk_release_kernel(tinfo->sock->sk);
kfree(tinfo->buf);
kfree(tinfo);
outbuf:
kfree(buf);
outsocket:
- sock_release(sock);
+ sk_release_kernel(sock->sk);
out:
return result;
}
int stop_sync_thread(struct net *net, int state)
{
struct netns_ipvs *ipvs = net_ipvs(net);
+ int retc = -EINVAL;
IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
spin_lock_bh(&ipvs->sync_lock);
ipvs->sync_state &= ~IP_VS_STATE_MASTER;
spin_unlock_bh(&ipvs->sync_lock);
- kthread_stop(ipvs->master_thread);
+ retc = kthread_stop(ipvs->master_thread);
ipvs->master_thread = NULL;
} else if (state == IP_VS_STATE_BACKUP) {
if (!ipvs->backup_thread)
task_pid_nr(ipvs->backup_thread));
ipvs->sync_state &= ~IP_VS_STATE_BACKUP;
- kthread_stop(ipvs->backup_thread);
+ retc = kthread_stop(ipvs->backup_thread);
ipvs->backup_thread = NULL;
- } else {
- return -EINVAL;
}
/* decrease the module use count */
ip_vs_use_count_dec();
- return 0;
+ return retc;
}
/*
* Initialize data struct for each netns
*/
-static int __net_init __ip_vs_sync_init(struct net *net)
+int __net_init __ip_vs_sync_init(struct net *net)
{
struct netns_ipvs *ipvs = net_ipvs(net);
return 0;
}
-static void __ip_vs_sync_cleanup(struct net *net)
+void __ip_vs_sync_cleanup(struct net *net)
{
- stop_sync_thread(net, IP_VS_STATE_MASTER);
- stop_sync_thread(net, IP_VS_STATE_BACKUP);
-}
+ int retc;
-static struct pernet_operations ipvs_sync_ops = {
- .init = __ip_vs_sync_init,
- .exit = __ip_vs_sync_cleanup,
-};
+ retc = stop_sync_thread(net, IP_VS_STATE_MASTER);
+ if (retc && retc != -ESRCH)
+ pr_err("Failed to stop Master Daemon\n");
+ retc = stop_sync_thread(net, IP_VS_STATE_BACKUP);
+ if (retc && retc != -ESRCH)
+ pr_err("Failed to stop Backup Daemon\n");
+}
int __init ip_vs_sync_init(void)
{
- return register_pernet_subsys(&ipvs_sync_ops);
+ return 0;
}
void ip_vs_sync_cleanup(void)
{
- unregister_pernet_subsys(&ipvs_sync_ops);
}
struct nf_conn *ct;
int err = -EINVAL;
struct nf_conntrack_helper *helper;
+ struct nf_conn_tstamp *tstamp;
ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
if (IS_ERR(ct))
__set_bit(IPS_EXPECTED_BIT, &ct->status);
ct->master = master_ct;
}
+ tstamp = nf_conn_tstamp_find(ct);
+ if (tstamp)
+ tstamp->start = ktime_to_ns(ktime_get_real());
add_timer(&ct->timeout);
nf_conntrack_hash_insert(ct);
vfree(xt[af].compat_tab);
xt[af].compat_tab = NULL;
xt[af].number = 0;
+ xt[af].cur = 0;
}
}
EXPORT_SYMBOL_GPL(xt_compat_flush_offsets);
else
return mid ? tmp[mid - 1].delta : 0;
}
- WARN_ON_ONCE(1);
- return 0;
+ return left ? tmp[left - 1].delta : 0;
}
EXPORT_SYMBOL_GPL(xt_compat_calc_jump);
u_int8_t orig, nv;
orig = ipv6_get_dsfield(iph);
- nv = (orig & info->tos_mask) ^ info->tos_value;
+ nv = (orig & ~info->tos_mask) ^ info->tos_value;
if (orig != nv) {
if (!skb_make_writable(skb, sizeof(struct iphdr)))
{
int ret;
- if (strcmp(par->table, "raw") == 0) {
- pr_info("state is undetermined at the time of raw table\n");
- return -EINVAL;
- }
-
ret = nf_ct_l3proto_try_module_get(par->family);
if (ret < 0)
pr_info("cannot load conntrack support for proto=%u\n",
memcpy(&ssf->ssf_info, &chunk->sinfo, sizeof(struct sctp_sndrcvinfo));
/* Per TSVWG discussion with Randy. Allow the application to
- * resemble a fragmented message.
+ * reassemble a fragmented message.
*/
ssf->ssf_info.sinfo_flags = chunk->chunk_hdr->flags;
If unsure, say N.
config RPCSEC_GSS_KRB5
- tristate
+ tristate "Secure RPC: Kerberos V mechanism"
depends on SUNRPC && CRYPTO
- prompt "Secure RPC: Kerberos V mechanism" if !(NFS_V4 || NFSD_V4)
+ depends on CRYPTO_MD5 && CRYPTO_DES && CRYPTO_CBC && CRYPTO_CTS
+ depends on CRYPTO_ECB && CRYPTO_HMAC && CRYPTO_SHA1 && CRYPTO_AES
+ depends on CRYPTO_ARC4
default y
select SUNRPC_GSS
- select CRYPTO_MD5
- select CRYPTO_DES
- select CRYPTO_CBC
help
Choose Y here to enable Secure RPC using the Kerberos version 5
GSS-API mechanism (RFC 1964).
warn_gssd();
task->tk_timeout = 15*HZ;
rpc_sleep_on(&pipe_version_rpc_waitqueue, task, NULL);
- return 0;
+ return -EAGAIN;
}
if (IS_ERR(gss_msg)) {
err = PTR_ERR(gss_msg);
if (PTR_ERR(gss_msg) == -EAGAIN) {
err = wait_event_interruptible_timeout(pipe_version_waitqueue,
pipe_version >= 0, 15*HZ);
+ if (pipe_version < 0) {
+ warn_gssd();
+ err = -EACCES;
+ }
if (err)
goto out;
- if (pipe_version < 0)
- warn_gssd();
goto retry;
}
if (IS_ERR(gss_msg)) {
if (clnt->cl_chatty)
printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
clnt->cl_protname, clnt->cl_server);
- rpc_exit(task, -EIO);
+ if (task->tk_flags & RPC_TASK_TIMEOUT)
+ rpc_exit(task, -ETIMEDOUT);
+ else
+ rpc_exit(task, -EIO);
return;
}
}
dprintk("RPC: %5u xmit complete\n", task->tk_pid);
+ task->tk_flags |= RPC_TASK_SENT;
spin_lock_bh(&xprt->transport_lock);
xprt->ops->set_retrans_timeout(task);
int, int);
static int unix_seqpacket_sendmsg(struct kiocb *, struct socket *,
struct msghdr *, size_t);
+static int unix_seqpacket_recvmsg(struct kiocb *, struct socket *,
+ struct msghdr *, size_t, int);
static const struct proto_ops unix_stream_ops = {
.family = PF_UNIX,
.setsockopt = sock_no_setsockopt,
.getsockopt = sock_no_getsockopt,
.sendmsg = unix_seqpacket_sendmsg,
- .recvmsg = unix_dgram_recvmsg,
+ .recvmsg = unix_seqpacket_recvmsg,
.mmap = sock_no_mmap,
.sendpage = sock_no_sendpage,
};
return unix_dgram_sendmsg(kiocb, sock, msg, len);
}
+static int unix_seqpacket_recvmsg(struct kiocb *iocb, struct socket *sock,
+ struct msghdr *msg, size_t size,
+ int flags)
+{
+ struct sock *sk = sock->sk;
+
+ if (sk->sk_state != TCP_ESTABLISHED)
+ return -ENOTCONN;
+
+ return unix_dgram_recvmsg(iocb, sock, msg, size, flags);
+}
+
static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
{
struct unix_sock *u = unix_sk(sk);
struct net *net = xp_net(policy);
unsigned long now = jiffies;
struct net_device *dev;
+ struct xfrm_mode *inner_mode;
struct dst_entry *dst_prev = NULL;
struct dst_entry *dst0 = NULL;
int i = 0;
goto put_states;
}
+ if (xfrm[i]->sel.family == AF_UNSPEC) {
+ inner_mode = xfrm_ip2inner_mode(xfrm[i],
+ xfrm_af2proto(family));
+ if (!inner_mode) {
+ err = -EAFNOSUPPORT;
+ dst_release(dst);
+ goto put_states;
+ }
+ } else
+ inner_mode = xfrm[i]->inner_mode;
+
if (!dst_prev)
dst0 = dst1;
else {
dst1->lastuse = now;
dst1->input = dst_discard;
- dst1->output = xfrm[i]->outer_mode->afinfo->output;
+ dst1->output = inner_mode->afinfo->output;
dst1->next = dst_prev;
dst_prev = dst1;
if (replay_esn) {
if (replay_esn->replay_window >
- replay_esn->bmp_len * sizeof(__u32))
+ replay_esn->bmp_len * sizeof(__u32) * 8)
return -EINVAL;
+ if ((x->props.flags & XFRM_STATE_ESN) && replay_esn->replay_window == 0)
+ return -EINVAL;
+
if ((x->props.flags & XFRM_STATE_ESN) && x->replay_esn)
x->repl = &xfrm_replay_esn;
else
{
struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
+ if ((p->flags & XFRM_STATE_ESN) && !rt)
+ return -EINVAL;
+
if (!rt)
return 0;
ifdef CONFIG_FTRACE_MCOUNT_RECORD
ifdef BUILD_C_RECORDMCOUNT
+ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
+ RECORDMCOUNT_FLAGS = -w
+endif
# Due to recursion, we must skip empty.o.
# The empty.o file is created in the make process in order to determine
# the target endianness and word size. It is made before all other C
# files, including recordmcount.
sub_cmd_record_mcount = \
if [ $(@) != "scripts/mod/empty.o" ]; then \
- $(objtree)/scripts/recordmcount "$(@)"; \
+ $(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) "$(@)"; \
fi;
+recordmcount_source := $(srctree)/scripts/recordmcount.c \
+ $(srctree)/scripts/recordmcount.h
else
sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
"$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \
"$(LD)" "$(NM)" "$(RM)" "$(MV)" \
"$(if $(part-of-module),1,0)" "$(@)";
+recordmcount_source := $(srctree)/scripts/recordmcount.pl
endif
cmd_record_mcount = \
if [ "$(findstring -pg,$(_c_flags))" = "-pg" ]; then \
endef
# Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
$(call cmd,force_checksrc)
$(call if_changed_rule,cc_o_c)
# Single-part modules are special since we need to mark them in $(MODVERDIR)
-$(single-used-m): $(obj)/%.o: $(src)/%.c FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
$(call cmd,force_checksrc)
$(call if_changed_rule,cc_o_c)
@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
}
if (!child)
continue;
- if (line[strlen(line) - 1] == '?') {
+ if (line[0] && line[strlen(line) - 1] == '?') {
print_help(child);
continue;
}
return 0;
}
- if (hdr->e_shnum == 0) {
+ if (hdr->e_shnum == SHN_UNDEF) {
/*
* There are more than 64k sections,
* read count from .sh_size.
- * note: it doesn't need shndx2secindex()
*/
info->num_sections = TO_NATIVE(sechdrs[0].sh_size);
}
info->num_sections = hdr->e_shnum;
}
if (hdr->e_shstrndx == SHN_XINDEX) {
- info->secindex_strings =
- shndx2secindex(TO_NATIVE(sechdrs[0].sh_link));
+ info->secindex_strings = TO_NATIVE(sechdrs[0].sh_link);
}
else {
info->secindex_strings = hdr->e_shstrndx;
sechdrs[i].sh_offset;
info->symtab_stop = (void *)hdr +
sechdrs[i].sh_offset + sechdrs[i].sh_size;
- sh_link_idx = shndx2secindex(sechdrs[i].sh_link);
+ sh_link_idx = sechdrs[i].sh_link;
info->strtab = (void *)hdr +
sechdrs[sh_link_idx].sh_offset;
}
if (symtab_shndx_idx != ~0U) {
Elf32_Word *p;
- if (symtab_idx !=
- shndx2secindex(sechdrs[symtab_shndx_idx].sh_link))
+ if (symtab_idx != sechdrs[symtab_shndx_idx].sh_link)
fatal("%s: SYMTAB_SHNDX has bad sh_link: %u!=%u\n",
- filename,
- shndx2secindex(sechdrs[symtab_shndx_idx].sh_link),
+ filename, sechdrs[symtab_shndx_idx].sh_link,
symtab_idx);
/* Fix endianness */
for (p = info->symtab_shndx_start; p < info->symtab_shndx_stop;
Elf_Shdr *sechdr, Elf_Rela *r)
{
Elf_Shdr *sechdrs = elf->sechdrs;
- int section = shndx2secindex(sechdr->sh_info);
+ int section = sechdr->sh_info;
return (void *)elf->hdr + sechdrs[section].sh_offset +
r->r_offset;
return i != SHN_XINDEX && i >= SHN_LORESERVE && i <= SHN_HIRESERVE;
}
-/* shndx is in [0..SHN_LORESERVE) U (SHN_HIRESERVE, 0xfffffff], thus:
- * shndx == 0 <=> sechdrs[0]
- * ......
- * shndx == SHN_LORESERVE-1 <=> sechdrs[SHN_LORESERVE-1]
- * shndx == SHN_HIRESERVE+1 <=> sechdrs[SHN_LORESERVE]
- * shndx == SHN_HIRESERVE+2 <=> sechdrs[SHN_LORESERVE+1]
- * ......
- * fyi: sym->st_shndx is uint16, SHN_LORESERVE = ff00, SHN_HIRESERVE = ffff,
- * so basically we map 0000..feff -> 0000..feff
- * ff00..ffff -> (you are a bad boy, dont do it)
- * 10000..xxxx -> ff00..(xxxx-0x100)
+/*
+ * Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of
+ * the way to -256..-1, to avoid conflicting with real section
+ * indices.
*/
-static inline unsigned int shndx2secindex(unsigned int i)
-{
- if (i <= SHN_HIRESERVE)
- return i;
- return i - (SHN_HIRESERVE + 1 - SHN_LORESERVE);
-}
+#define SPECIAL(i) ((i) - (SHN_HIRESERVE + 1))
/* Accessor for sym->st_shndx, hides ugliness of "64k sections" */
static inline unsigned int get_secindex(const struct elf_info *info,
const Elf_Sym *sym)
{
+ if (is_shndx_special(sym->st_shndx))
+ return SPECIAL(sym->st_shndx);
if (sym->st_shndx != SHN_XINDEX)
return sym->st_shndx;
- return shndx2secindex(info->symtab_shndx_start[sym -
- info->symtab_start]);
+ return info->symtab_shndx_start[sym - info->symtab_start];
}
/* file2alias.c */
*/
SECTIONS {
/DISCARD/ : { *(.discard) }
+
+ __ksymtab : { *(SORT(___ksymtab+*)) }
+ __ksymtab_gpl : { *(SORT(___ksymtab_gpl+*)) }
+ __ksymtab_unused : { *(SORT(___ksymtab_unused+*)) }
+ __ksymtab_unused_gpl : { *(SORT(___ksymtab_unused_gpl+*)) }
+ __ksymtab_gpl_future : { *(SORT(___ksymtab_gpl_future+*)) }
+ __kcrctab : { *(SORT(___kcrctab+*)) }
+ __kcrctab_gpl : { *(SORT(___kcrctab_gpl+*)) }
+ __kcrctab_unused : { *(SORT(___kcrctab_unused+*)) }
+ __kcrctab_unused_gpl : { *(SORT(___kcrctab_unused_gpl+*)) }
+ __kcrctab_gpl_future : { *(SORT(___kcrctab_gpl_future+*)) }
}
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
+#include <getopt.h>
#include <elf.h>
#include <fcntl.h>
#include <setjmp.h>
static struct stat sb; /* Remember .st_size, etc. */
static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
static const char *altmcount; /* alternate mcount symbol name */
+static int warn_on_notrace_sect; /* warn when section has mcount not being recorded */
/* setjmp() return values */
enum {
ulseek(int const fd, off_t const offset, int const whence)
{
off_t const w = lseek(fd, offset, whence);
- if ((off_t)-1 == w) {
+ if (w == (off_t)-1) {
perror("lseek");
fail_file();
}
umalloc(size_t size)
{
void *const addr = malloc(size);
- if (0 == addr) {
+ if (addr == 0) {
fprintf(stderr, "malloc failed: %zu bytes\n", size);
fail_file();
}
return addr;
}
+static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
+static unsigned char *ideal_nop;
+
+static char rel_type_nop;
+
+static int (*make_nop)(void *map, size_t const offset);
+
+static int make_nop_x86(void *map, size_t const offset)
+{
+ uint32_t *ptr;
+ unsigned char *op;
+
+ /* Confirm we have 0xe8 0x0 0x0 0x0 0x0 */
+ ptr = map + offset;
+ if (*ptr != 0)
+ return -1;
+
+ op = map + offset - 1;
+ if (*op != 0xe8)
+ return -1;
+
+ /* convert to nop */
+ ulseek(fd_map, offset - 1, SEEK_SET);
+ uwrite(fd_map, ideal_nop, 5);
+ return 0;
+}
+
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
void *addr;
fd_map = open(fname, O_RDWR);
- if (0 > fd_map || 0 > fstat(fd_map, &sb)) {
+ if (fd_map < 0 || fstat(fd_map, &sb) < 0) {
perror(fname);
fail_file();
}
addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
fd_map, 0);
mmap_failed = 0;
- if (MAP_FAILED == addr) {
+ if (addr == MAP_FAILED) {
mmap_failed = 1;
addr = umalloc(sb.st_size);
uread(fd_map, addr, sb.st_size);
static int
is_mcounted_section_name(char const *const txtname)
{
- return 0 == strcmp(".text", txtname) ||
- 0 == strcmp(".ref.text", txtname) ||
- 0 == strcmp(".sched.text", txtname) ||
- 0 == strcmp(".spinlock.text", txtname) ||
- 0 == strcmp(".irqentry.text", txtname) ||
- 0 == strcmp(".text.unlikely", txtname);
+ return strcmp(".text", txtname) == 0 ||
+ strcmp(".ref.text", txtname) == 0 ||
+ strcmp(".sched.text", txtname) == 0 ||
+ strcmp(".spinlock.text", txtname) == 0 ||
+ strcmp(".irqentry.text", txtname) == 0 ||
+ strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".text.unlikely", txtname) == 0;
}
/* 32 bit and 64 bit are very similar */
w8 = w8nat;
switch (ehdr->e_ident[EI_DATA]) {
static unsigned int const endian = 1;
- default: {
+ default:
fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
ehdr->e_ident[EI_DATA], fname);
fail_file();
- } break;
- case ELFDATA2LSB: {
- if (1 != *(unsigned char const *)&endian) {
+ break;
+ case ELFDATA2LSB:
+ if (*(unsigned char const *)&endian != 1) {
/* main() is big endian, file.o is little endian. */
w = w4rev;
w2 = w2rev;
w8 = w8rev;
}
- } break;
- case ELFDATA2MSB: {
- if (0 != *(unsigned char const *)&endian) {
+ break;
+ case ELFDATA2MSB:
+ if (*(unsigned char const *)&endian != 0) {
/* main() is little endian, file.o is big endian. */
w = w4rev;
w2 = w2rev;
w8 = w8rev;
}
- } break;
+ break;
} /* end switch */
- if (0 != memcmp(ELFMAG, ehdr->e_ident, SELFMAG)
- || ET_REL != w2(ehdr->e_type)
- || EV_CURRENT != ehdr->e_ident[EI_VERSION]) {
+ if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
+ || w2(ehdr->e_type) != ET_REL
+ || ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_REL file %s\n", fname);
fail_file();
}
gpfx = 0;
switch (w2(ehdr->e_machine)) {
- default: {
+ default:
fprintf(stderr, "unrecognized e_machine %d %s\n",
w2(ehdr->e_machine), fname);
fail_file();
- } break;
- case EM_386: reltype = R_386_32; break;
+ break;
+ case EM_386:
+ reltype = R_386_32;
+ make_nop = make_nop_x86;
+ ideal_nop = ideal_nop5_x86_32;
+ mcount_adjust_32 = -1;
+ break;
case EM_ARM: reltype = R_ARM_ABS32;
altmcount = "__gnu_mcount_nc";
break;
case EM_S390: /* reltype: e_class */ gpfx = '_'; break;
case EM_SH: reltype = R_SH_DIR32; break;
case EM_SPARCV9: reltype = R_SPARC_64; gpfx = '_'; break;
- case EM_X86_64: reltype = R_X86_64_64; break;
+ case EM_X86_64:
+ make_nop = make_nop_x86;
+ ideal_nop = ideal_nop5_x86_64;
+ reltype = R_X86_64_64;
+ mcount_adjust_64 = -1;
+ break;
} /* end switch */
switch (ehdr->e_ident[EI_CLASS]) {
- default: {
+ default:
fprintf(stderr, "unrecognized ELF class %d %s\n",
ehdr->e_ident[EI_CLASS], fname);
fail_file();
- } break;
- case ELFCLASS32: {
- if (sizeof(Elf32_Ehdr) != w2(ehdr->e_ehsize)
- || sizeof(Elf32_Shdr) != w2(ehdr->e_shentsize)) {
+ break;
+ case ELFCLASS32:
+ if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+ || w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_REL file: %s\n", fname);
fail_file();
}
- if (EM_S390 == w2(ehdr->e_machine))
+ if (w2(ehdr->e_machine) == EM_S390) {
reltype = R_390_32;
- if (EM_MIPS == w2(ehdr->e_machine)) {
+ mcount_adjust_32 = -4;
+ }
+ if (w2(ehdr->e_machine) == EM_MIPS) {
reltype = R_MIPS_32;
is_fake_mcount32 = MIPS32_is_fake_mcount;
}
do32(ehdr, fname, reltype);
- } break;
+ break;
case ELFCLASS64: {
Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
- if (sizeof(Elf64_Ehdr) != w2(ghdr->e_ehsize)
- || sizeof(Elf64_Shdr) != w2(ghdr->e_shentsize)) {
+ if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
+ || w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
"unrecognized ET_REL file: %s\n", fname);
fail_file();
}
- if (EM_S390 == w2(ghdr->e_machine))
+ if (w2(ghdr->e_machine) == EM_S390) {
reltype = R_390_64;
- if (EM_MIPS == w2(ghdr->e_machine)) {
+ mcount_adjust_64 = -8;
+ }
+ if (w2(ghdr->e_machine) == EM_MIPS) {
reltype = R_MIPS_64;
Elf64_r_sym = MIPS64_r_sym;
Elf64_r_info = MIPS64_r_info;
is_fake_mcount64 = MIPS64_is_fake_mcount;
}
do64(ghdr, fname, reltype);
- } break;
+ break;
+ }
} /* end switch */
cleanup();
}
int
-main(int argc, char const *argv[])
+main(int argc, char *argv[])
{
const char ftrace[] = "/ftrace.o";
int ftrace_size = sizeof(ftrace) - 1;
int n_error = 0; /* gcc-4.3.0 false positive complaint */
+ int c;
+ int i;
+
+ while ((c = getopt(argc, argv, "w")) >= 0) {
+ switch (c) {
+ case 'w':
+ warn_on_notrace_sect = 1;
+ break;
+ default:
+ fprintf(stderr, "usage: recordmcount [-w] file.o...\n");
+ return 0;
+ }
+ }
- if (argc <= 1) {
- fprintf(stderr, "usage: recordmcount file.o...\n");
+ if ((argc - optind) < 1) {
+ fprintf(stderr, "usage: recordmcount [-w] file.o...\n");
return 0;
}
/* Process each file in turn, allowing deep failure. */
- for (--argc, ++argv; 0 < argc; --argc, ++argv) {
+ for (i = optind; i < argc; i++) {
+ char *file = argv[i];
int const sjval = setjmp(jmpenv);
int len;
* function but does not call it. Since ftrace.o should
* not be traced anyway, we just skip it.
*/
- len = strlen(argv[0]);
+ len = strlen(file);
if (len >= ftrace_size &&
- strcmp(argv[0] + (len - ftrace_size), ftrace) == 0)
+ strcmp(file + (len - ftrace_size), ftrace) == 0)
continue;
switch (sjval) {
- default: {
- fprintf(stderr, "internal error: %s\n", argv[0]);
+ default:
+ fprintf(stderr, "internal error: %s\n", file);
exit(1);
- } break;
- case SJ_SETJMP: { /* normal sequence */
+ break;
+ case SJ_SETJMP: /* normal sequence */
/* Avoid problems if early cleanup() */
fd_map = -1;
ehdr_curr = NULL;
mmap_failed = 1;
- do_file(argv[0]);
- } break;
- case SJ_FAIL: { /* error in do_file or below */
+ do_file(file);
+ break;
+ case SJ_FAIL: /* error in do_file or below */
++n_error;
- } break;
- case SJ_SUCCEED: { /* premature success */
+ break;
+ case SJ_SUCCEED: /* premature success */
/* do nothing */
- } break;
+ break;
} /* end switch */
}
return !!n_error;
#undef is_fake_mcount
#undef fn_is_fake_mcount
#undef MIPS_is_fake_mcount
+#undef mcount_adjust
#undef sift_rel_mcount
+#undef nop_mcount
#undef find_secsym_ndx
#undef __has_rel_mcount
#undef has_rel_mcount
#undef tot_relsize
+#undef get_mcountsym
+#undef get_sym_str_and_relp
#undef do_func
#undef Elf_Addr
#undef Elf_Ehdr
#ifdef RECORD_MCOUNT_64
# define append_func append64
# define sift_rel_mcount sift64_rel_mcount
+# define nop_mcount nop_mcount_64
# define find_secsym_ndx find64_secsym_ndx
# define __has_rel_mcount __has64_rel_mcount
# define has_rel_mcount has64_rel_mcount
# define tot_relsize tot64_relsize
+# define get_sym_str_and_relp get_sym_str_and_relp_64
# define do_func do64
+# define get_mcountsym get_mcountsym_64
# define is_fake_mcount is_fake_mcount64
# define fn_is_fake_mcount fn_is_fake_mcount64
# define MIPS_is_fake_mcount MIPS64_is_fake_mcount
+# define mcount_adjust mcount_adjust_64
# define Elf_Addr Elf64_Addr
# define Elf_Ehdr Elf64_Ehdr
# define Elf_Shdr Elf64_Shdr
#else
# define append_func append32
# define sift_rel_mcount sift32_rel_mcount
+# define nop_mcount nop_mcount_32
# define find_secsym_ndx find32_secsym_ndx
# define __has_rel_mcount __has32_rel_mcount
# define has_rel_mcount has32_rel_mcount
# define tot_relsize tot32_relsize
+# define get_sym_str_and_relp get_sym_str_and_relp_32
# define do_func do32
+# define get_mcountsym get_mcountsym_32
# define is_fake_mcount is_fake_mcount32
# define fn_is_fake_mcount fn_is_fake_mcount32
# define MIPS_is_fake_mcount MIPS32_is_fake_mcount
+# define mcount_adjust mcount_adjust_32
# define Elf_Addr Elf32_Addr
# define Elf_Ehdr Elf32_Ehdr
# define Elf_Shdr Elf32_Shdr
}
static void (*Elf_r_info)(Elf_Rel *const rp, unsigned sym, unsigned type) = fn_ELF_R_INFO;
+static int mcount_adjust = 0;
+
/*
* MIPS mcount long call has 2 _mcount symbols, only the position of the 1st
* _mcount symbol is needed for dynamic function tracer, with it, to disable
uwrite(fd_map, ehdr, sizeof(*ehdr));
}
+static unsigned get_mcountsym(Elf_Sym const *const sym0,
+ Elf_Rel const *relp,
+ char const *const str0)
+{
+ unsigned mcountsym = 0;
+
+ Elf_Sym const *const symp =
+ &sym0[Elf_r_sym(relp)];
+ char const *symname = &str0[w(symp->st_name)];
+ char const *mcount = gpfx == '_' ? "_mcount" : "mcount";
+
+ if (symname[0] == '.')
+ ++symname; /* ppc64 hack */
+ if (strcmp(mcount, symname) == 0 ||
+ (altmcount && strcmp(altmcount, symname) == 0))
+ mcountsym = Elf_r_sym(relp);
+
+ return mcountsym;
+}
+
+static void get_sym_str_and_relp(Elf_Shdr const *const relhdr,
+ Elf_Ehdr const *const ehdr,
+ Elf_Sym const **sym0,
+ char const **str0,
+ Elf_Rel const **relp)
+{
+ Elf_Shdr *const shdr0 = (Elf_Shdr *)(_w(ehdr->e_shoff)
+ + (void *)ehdr);
+ unsigned const symsec_sh_link = w(relhdr->sh_link);
+ Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
+ Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
+ Elf_Rel const *const rel0 = (Elf_Rel const *)(_w(relhdr->sh_offset)
+ + (void *)ehdr);
+
+ *sym0 = (Elf_Sym const *)(_w(symsec->sh_offset)
+ + (void *)ehdr);
+
+ *str0 = (char const *)(_w(strsec->sh_offset)
+ + (void *)ehdr);
+
+ *relp = rel0;
+}
+
/*
* Look at the relocations in order to find the calls to mcount.
* Accumulate the section offsets that are found, and their relocation info,
{
uint_t *const mloc0 = mlocp;
Elf_Rel *mrelp = *mrelpp;
- Elf_Shdr *const shdr0 = (Elf_Shdr *)(_w(ehdr->e_shoff)
- + (void *)ehdr);
- unsigned const symsec_sh_link = w(relhdr->sh_link);
- Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
- Elf_Sym const *const sym0 = (Elf_Sym const *)(_w(symsec->sh_offset)
- + (void *)ehdr);
-
- Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
- char const *const str0 = (char const *)(_w(strsec->sh_offset)
- + (void *)ehdr);
-
- Elf_Rel const *const rel0 = (Elf_Rel const *)(_w(relhdr->sh_offset)
- + (void *)ehdr);
+ Elf_Sym const *sym0;
+ char const *str0;
+ Elf_Rel const *relp;
unsigned rel_entsize = _w(relhdr->sh_entsize);
unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
- Elf_Rel const *relp = rel0;
-
unsigned mcountsym = 0;
unsigned t;
+ get_sym_str_and_relp(relhdr, ehdr, &sym0, &str0, &relp);
+
for (t = nrel; t; --t) {
- if (!mcountsym) {
- Elf_Sym const *const symp =
- &sym0[Elf_r_sym(relp)];
- char const *symname = &str0[w(symp->st_name)];
- char const *mcount = '_' == gpfx ? "_mcount" : "mcount";
-
- if ('.' == symname[0])
- ++symname; /* ppc64 hack */
- if (0 == strcmp(mcount, symname) ||
- (altmcount && 0 == strcmp(altmcount, symname)))
- mcountsym = Elf_r_sym(relp);
- }
+ if (!mcountsym)
+ mcountsym = get_mcountsym(sym0, relp, str0);
if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
- uint_t const addend = _w(_w(relp->r_offset) - recval);
-
+ uint_t const addend =
+ _w(_w(relp->r_offset) - recval + mcount_adjust);
mrelp->r_offset = _w(offbase
+ ((void *)mlocp - (void *)mloc0));
Elf_r_info(mrelp, recsym, reltype);
- if (sizeof(Elf_Rela) == rel_entsize) {
+ if (rel_entsize == sizeof(Elf_Rela)) {
((Elf_Rela *)mrelp)->r_addend = addend;
*mlocp++ = 0;
} else
return mlocp;
}
+/*
+ * Read the relocation table again, but this time its called on sections
+ * that are not going to be traced. The mcount calls here will be converted
+ * into nops.
+ */
+static void nop_mcount(Elf_Shdr const *const relhdr,
+ Elf_Ehdr const *const ehdr,
+ const char *const txtname)
+{
+ Elf_Shdr *const shdr0 = (Elf_Shdr *)(_w(ehdr->e_shoff)
+ + (void *)ehdr);
+ Elf_Sym const *sym0;
+ char const *str0;
+ Elf_Rel const *relp;
+ Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
+ unsigned rel_entsize = _w(relhdr->sh_entsize);
+ unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
+ unsigned mcountsym = 0;
+ unsigned t;
+ int once = 0;
+
+ get_sym_str_and_relp(relhdr, ehdr, &sym0, &str0, &relp);
+
+ for (t = nrel; t; --t) {
+ int ret = -1;
+
+ if (!mcountsym)
+ mcountsym = get_mcountsym(sym0, relp, str0);
+
+ if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
+ if (make_nop)
+ ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
+ if (warn_on_notrace_sect && !once) {
+ printf("Section %s has mcount callers being ignored\n",
+ txtname);
+ once = 1;
+ /* just warn? */
+ if (!make_nop)
+ return;
+ }
+ }
+
+ /*
+ * If we successfully removed the mcount, mark the relocation
+ * as a nop (don't do anything with it).
+ */
+ if (!ret) {
+ Elf_Rel rel;
+ rel = *(Elf_Rel *)relp;
+ Elf_r_info(&rel, Elf_r_sym(relp), rel_type_nop);
+ ulseek(fd_map, (void *)relp - (void *)ehdr, SEEK_SET);
+ uwrite(fd_map, &rel, sizeof(rel));
+ }
+ relp = (Elf_Rel const *)(rel_entsize + (void *)relp);
+ }
+}
+
/*
* Find a symbol in the given section, to be used as the base for relocating
Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
char const *const txtname = &shstrtab[w(txthdr->sh_name)];
- if (0 == strcmp("__mcount_loc", txtname)) {
+ if (strcmp("__mcount_loc", txtname) == 0) {
fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
fname);
succeed_file();
}
- if (SHT_PROGBITS != w(txthdr->sh_type) ||
- !is_mcounted_section_name(txtname))
+ if (w(txthdr->sh_type) != SHT_PROGBITS ||
+ !(w(txthdr->sh_flags) & SHF_EXECINSTR))
return NULL;
return txtname;
}
char const *const shstrtab,
char const *const fname)
{
- if (SHT_REL != w(relhdr->sh_type) && SHT_RELA != w(relhdr->sh_type))
+ if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
return NULL;
return __has_rel_mcount(relhdr, shdr0, shstrtab, fname);
}
{
unsigned totrelsz = 0;
Elf_Shdr const *shdrp = shdr0;
+ char const *txtname;
for (; nhdr; --nhdr, ++shdrp) {
- if (has_rel_mcount(shdrp, shdr0, shstrtab, fname))
+ txtname = has_rel_mcount(shdrp, shdr0, shstrtab, fname);
+ if (txtname && is_mcounted_section_name(txtname))
totrelsz += _w(shdrp->sh_size);
}
return totrelsz;
for (relhdr = shdr0, k = nhdr; k; --k, ++relhdr) {
char const *const txtname = has_rel_mcount(relhdr, shdr0,
shstrtab, fname);
- if (txtname) {
+ if (txtname && is_mcounted_section_name(txtname)) {
uint_t recval = 0;
unsigned const recsym = find_secsym_ndx(
w(relhdr->sh_info), txtname, &recval,
mlocp = sift_rel_mcount(mlocp,
(void *)mlocp - (void *)mloc0, &mrelp,
relhdr, ehdr, recsym, recval, reltype);
+ } else if (txtname && (warn_on_notrace_sect || make_nop)) {
+ /*
+ * This section is ignored by ftrace, but still
+ * has mcount calls. Convert them to nops now.
+ */
+ nop_mcount(relhdr, ehdr, txtname);
}
}
if (mloc0 != mlocp) {
".sched.text" => 1,
".spinlock.text" => 1,
".irqentry.text" => 1,
+ ".kprobes.text" => 1,
".text.unlikely" => 1,
);
$mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\smcount([+-]0x[0-9a-zA-Z]+)?\$";
$type = ".quad";
$alignment = 8;
+ $mcount_adjust = -1;
# force flags for this arch
$ld .= " -m elf_x86_64";
} elsif ($arch eq "i386") {
$alignment = 4;
+ $mcount_adjust = -1;
# force flags for this arch
$ld .= " -m elf_i386";
} elsif ($arch eq "s390" && $bits == 32) {
$mcount_regex = "^\\s*([0-9a-fA-F]+):\\s*R_390_32\\s+_mcount\$";
+ $mcount_adjust = -4;
$alignment = 4;
$ld .= " -m elf_s390";
$cc .= " -m31";
} elsif ($arch eq "s390" && $bits == 64) {
$mcount_regex = "^\\s*([0-9a-fA-F]+):\\s*R_390_(PC|PLT)32DBL\\s+_mcount\\+0x2\$";
+ $mcount_adjust = -8;
$alignment = 8;
$type = ".quad";
$ld .= " -m elf64_s390";
* @avd: access vector decisions
* @result: result from avc_has_perm_noaudit
* @a: auxiliary audit data
+ * @flags: VFS walk flags
*
* Audit the granting or denial of permissions in accordance
* with the policy. This function is typically called by
* be performed under a lock, to allow the lock to be released
* before calling the auditing code.
*/
-void avc_audit(u32 ssid, u32 tsid,
+int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
- struct av_decision *avd, int result, struct common_audit_data *a)
+ struct av_decision *avd, int result, struct common_audit_data *a,
+ unsigned flags)
{
struct common_audit_data stack_data;
u32 denied, audited;
else
audited = requested & avd->auditallow;
if (!audited)
- return;
+ return 0;
+
if (!a) {
a = &stack_data;
COMMON_AUDIT_DATA_INIT(a, NONE);
}
+
+ /*
+ * When in a RCU walk do the audit on the RCU retry. This is because
+ * the collection of the dname in an inode audit message is not RCU
+ * safe. Note this may drop some audits when the situation changes
+ * during retry. However this is logically just as if the operation
+ * happened a little later.
+ */
+ if ((a->type == LSM_AUDIT_DATA_FS) &&
+ (flags & IPERM_FLAG_RCU))
+ return -ECHILD;
+
a->selinux_audit_data.tclass = tclass;
a->selinux_audit_data.requested = requested;
a->selinux_audit_data.ssid = ssid;
a->lsm_pre_audit = avc_audit_pre_callback;
a->lsm_post_audit = avc_audit_post_callback;
common_lsm_audit(a);
+ return 0;
}
/**
* @tclass: target security class
* @requested: requested permissions, interpreted based on @tclass
* @auditdata: auxiliary audit data
+ * @flags: VFS walk flags
*
* Check the AVC to determine whether the @requested permissions are granted
* for the SID pair (@ssid, @tsid), interpreting the permissions
* permissions are granted, -%EACCES if any permissions are denied, or
* another -errno upon other errors.
*/
-int avc_has_perm(u32 ssid, u32 tsid, u16 tclass,
- u32 requested, struct common_audit_data *auditdata)
+int avc_has_perm_flags(u32 ssid, u32 tsid, u16 tclass,
+ u32 requested, struct common_audit_data *auditdata,
+ unsigned flags)
{
struct av_decision avd;
- int rc;
+ int rc, rc2;
rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);
- avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
+
+ rc2 = avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata,
+ flags);
+ if (rc2)
+ return rc2;
return rc;
}
}
rc = avc_has_perm_noaudit(sid, sid, sclass, av, 0, &avd);
- if (audit == SECURITY_CAP_AUDIT)
- avc_audit(sid, sid, sclass, av, &avd, rc, &ad);
+ if (audit == SECURITY_CAP_AUDIT) {
+ int rc2 = avc_audit(sid, sid, sclass, av, &avd, rc, &ad, 0);
+ if (rc2)
+ return rc2;
+ }
return rc;
}
static int inode_has_perm(const struct cred *cred,
struct inode *inode,
u32 perms,
- struct common_audit_data *adp)
+ struct common_audit_data *adp,
+ unsigned flags)
{
struct inode_security_struct *isec;
struct common_audit_data ad;
ad.u.fs.inode = inode;
}
- return avc_has_perm(sid, isec->sid, isec->sclass, perms, adp);
+ return avc_has_perm_flags(sid, isec->sid, isec->sclass, perms, adp, flags);
}
/* Same as inode_has_perm, but pass explicit audit data containing
COMMON_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.path.mnt = mnt;
ad.u.fs.path.dentry = dentry;
- return inode_has_perm(cred, inode, av, &ad);
+ return inode_has_perm(cred, inode, av, &ad, 0);
}
/* Check whether a task can use an open file descriptor to
/* av is zero if only checking access to the descriptor. */
rc = 0;
if (av)
- rc = inode_has_perm(cred, inode, av, &ad);
+ rc = inode_has_perm(cred, inode, av, &ad, 0);
out:
return rc;
return rc;
if (!newsid || !(sbsec->flags & SE_SBLABELSUPP)) {
- rc = security_transition_sid(sid, dsec->sid, tclass, NULL, &newsid);
+ rc = security_transition_sid(sid, dsec->sid, tclass,
+ &dentry->d_name, &newsid);
if (rc)
return rc;
}
file = file_priv->file;
inode = file->f_path.dentry->d_inode;
if (inode_has_perm(cred, inode,
- FILE__READ | FILE__WRITE, NULL)) {
+ FILE__READ | FILE__WRITE, NULL, 0)) {
drop_tty = 1;
}
}
if (!mask)
return 0;
- /* May be droppable after audit */
- if (flags & IPERM_FLAG_RCU)
- return -ECHILD;
-
COMMON_AUDIT_DATA_INIT(&ad, FS);
ad.u.fs.inode = inode;
perms = file_mask_to_av(inode->i_mode, mask);
- return inode_has_perm(cred, inode, perms, &ad);
+ return inode_has_perm(cred, inode, perms, &ad, flags);
}
static int selinux_inode_setattr(struct dentry *dentry, struct iattr *iattr)
* new inode label or new policy.
* This check is not redundant - do not remove.
*/
- return inode_has_perm(cred, inode, open_file_to_av(file), NULL);
+ return inode_has_perm(cred, inode, open_file_to_av(file), NULL, 0);
}
/* task security operations */
void __init avc_init(void);
-void avc_audit(u32 ssid, u32 tsid,
+int avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd,
int result,
- struct common_audit_data *a);
+ struct common_audit_data *a, unsigned flags);
#define AVC_STRICT 1 /* Ignore permissive mode. */
int avc_has_perm_noaudit(u32 ssid, u32 tsid,
unsigned flags,
struct av_decision *avd);
-int avc_has_perm(u32 ssid, u32 tsid,
- u16 tclass, u32 requested,
- struct common_audit_data *auditdata);
+int avc_has_perm_flags(u32 ssid, u32 tsid,
+ u16 tclass, u32 requested,
+ struct common_audit_data *auditdata,
+ unsigned);
+
+static inline int avc_has_perm(u32 ssid, u32 tsid,
+ u16 tclass, u32 requested,
+ struct common_audit_data *auditdata)
+{
+ return avc_has_perm_flags(ssid, tsid, tclass, requested, auditdata, 0);
+}
u32 avc_policy_seqno(void);
goto out;
rc = flex_array_prealloc(p->type_val_to_struct_array, 0,
- p->p_types.nprim - 1, GFP_KERNEL | __GFP_ZERO);
+ p->p_types.nprim, GFP_KERNEL | __GFP_ZERO);
if (rc)
goto out;
goto out;
rc = flex_array_prealloc(p->sym_val_to_name[i],
- 0, p->symtab[i].nprim - 1,
+ 0, p->symtab[i].nprim,
GFP_KERNEL | __GFP_ZERO);
if (rc)
goto out;
goto out;
nel = le32_to_cpu(buf[0]);
- printk(KERN_ERR "%s: nel=%d\n", __func__, nel);
-
last = p->filename_trans;
while (last && last->next)
last = last->next;
goto out;
name[len] = 0;
- printk(KERN_ERR "%s: ft=%p ft->name=%p ft->name=%s\n", __func__, ft, ft->name, ft->name);
-
rc = next_entry(buf, fp, sizeof(u32) * 4);
if (rc)
goto out;
goto bad;
/* preallocate so we don't have to worry about the put ever failing */
- rc = flex_array_prealloc(p->type_attr_map_array, 0, p->p_types.nprim - 1,
+ rc = flex_array_prealloc(p->type_attr_map_array, 0, p->p_types.nprim,
GFP_KERNEL | __GFP_ZERO);
if (rc)
goto bad;
/* analysing the volume and mixer tables shows
* that they are similar enough when we shift
* the mixer table down by 4 bits. The error
- * is minuscule, in just one item the error
+ * is miniscule, in just one item the error
* is 1, at a value of 0x07f17b (mixer table
* value is 0x07f17a) */
tmp = tas_gaintable[left];
.channels_min = 1,
.channels_max = 2,
.buffer_bytes_max = 0x10000,
- .period_bytes_min = 0x1,
+ .period_bytes_min = 0x20,
.period_bytes_max = 0x1000,
.periods_min = 2,
- .periods_max = 32,
+ .periods_max = 1024,
};
#ifndef CHIP_AU8820
SNDRV_PCM_HW_PARAM_PERIOD_BYTES)) < 0)
return err;
+ snd_pcm_hw_constraint_step(runtime, 0,
+ SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 64);
+
if (VORTEX_PCM_TYPE(substream->pcm) != VORTEX_PCM_WT) {
#ifndef CHIP_AU8820
if (VORTEX_PCM_TYPE(substream->pcm) == VORTEX_PCM_A3D) {
codec->chip_name, fix->type);
break;
}
- if (!fix[id].chained)
+ if (!fix->chained)
break;
if (++depth > 10)
break;
- id = fix[id].chain_id;
+ id = fix->chain_id;
}
}
static struct snd_pci_quirk beep_white_list[] = {
SND_PCI_QUIRK(0x1043, 0x829f, "ASUS", 1),
SND_PCI_QUIRK(0x1043, 0x83ce, "EeePC", 1),
+ SND_PCI_QUIRK(0x1043, 0x831a, "EeePC", 1),
SND_PCI_QUIRK(0x8086, 0xd613, "Intel", 1),
{}
};
SND_PCI_QUIRK(0x1071, 0x8258, "Evesham Voyaeger", ALC883_LAPTOP_EAPD),
SND_PCI_QUIRK(0x10f1, 0x2350, "TYAN-S2350", ALC888_6ST_DELL),
SND_PCI_QUIRK(0x108e, 0x534d, NULL, ALC883_3ST_6ch),
+ SND_PCI_QUIRK(0x1458, 0xa002, "Gigabyte P35 DS3R", ALC882_6ST_DIG),
SND_PCI_QUIRK(0x1462, 0x0349, "MSI", ALC883_TARGA_2ch_DIG),
SND_PCI_QUIRK(0x1462, 0x040d, "MSI", ALC883_TARGA_2ch_DIG),
PINFIX_LENOVO_Y530,
PINFIX_PB_M5210,
PINFIX_ACER_ASPIRE_7736,
- PINFIX_GIGABYTE_880GM,
};
static const struct alc_fixup alc882_fixups[] = {
.type = ALC_FIXUP_SKU,
.v.sku = ALC_FIXUP_SKU_IGNORE,
},
- [PINFIX_GIGABYTE_880GM] = {
- .type = ALC_FIXUP_PINS,
- .v.pins = (const struct alc_pincfg[]) {
- { 0x14, 0x1114410 }, /* set as speaker */
- { }
- }
- },
};
static struct snd_pci_quirk alc882_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Y530", PINFIX_LENOVO_Y530),
SND_PCI_QUIRK(0x147b, 0x107a, "Abit AW9D-MAX", PINFIX_ABIT_AW9D_MAX),
SND_PCI_QUIRK(0x1025, 0x0296, "Acer Aspire 7736z", PINFIX_ACER_ASPIRE_7736),
- SND_PCI_QUIRK(0x1458, 0xa002, "Gigabyte", PINFIX_GIGABYTE_880GM),
{}
};
ALC662_3ST_6ch_DIG),
SND_PCI_QUIRK(0x1179, 0xff6e, "Toshiba NB20x", ALC662_AUTO),
SND_PCI_QUIRK(0x144d, 0xca00, "Samsung NC10", ALC272_SAMSUNG_NC10),
+ SND_PCI_QUIRK(0x1458, 0xa002, "Gigabyte 945GCM-S2L",
+ ALC662_3ST_6ch_DIG),
SND_PCI_QUIRK(0x152d, 0x2304, "Quanta WH1", ALC663_ASUS_H13),
SND_PCI_QUIRK(0x1565, 0x820f, "Biostar TA780G M2+", ALC662_3ST_6ch_DIG),
SND_PCI_QUIRK(0x1631, 0xc10c, "PB RS65", ALC663_ASUS_M51VA),
ALC662_FIXUP_IDEAPAD,
ALC272_FIXUP_MARIO,
ALC662_FIXUP_CZC_P10T,
- ALC662_FIXUP_GIGABYTE,
+ ALC662_FIXUP_SKU_IGNORE,
};
static const struct alc_fixup alc662_fixups[] = {
{}
}
},
- [ALC662_FIXUP_GIGABYTE] = {
- .type = ALC_FIXUP_PINS,
- .v.pins = (const struct alc_pincfg[]) {
- { 0x14, 0x1114410 }, /* set as speaker */
- { }
- }
+ [ALC662_FIXUP_SKU_IGNORE] = {
+ .type = ALC_FIXUP_SKU,
+ .v.sku = ALC_FIXUP_SKU_IGNORE,
},
};
static struct snd_pci_quirk alc662_fixup_tbl[] = {
SND_PCI_QUIRK(0x1025, 0x0308, "Acer Aspire 8942G", ALC662_FIXUP_ASPIRE),
+ SND_PCI_QUIRK(0x1025, 0x031c, "Gateway NV79", ALC662_FIXUP_SKU_IGNORE),
SND_PCI_QUIRK(0x1025, 0x038b, "Acer Aspire 8943G", ALC662_FIXUP_ASPIRE),
SND_PCI_QUIRK(0x144d, 0xc051, "Samsung R720", ALC662_FIXUP_IDEAPAD),
- SND_PCI_QUIRK(0x1458, 0xa002, "Gigabyte", ALC662_FIXUP_GIGABYTE),
SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo Ideapad Y550P", ALC662_FIXUP_IDEAPAD),
SND_PCI_QUIRK(0x17aa, 0x3a0d, "Lenovo Ideapad Y550", ALC662_FIXUP_IDEAPAD),
SND_PCI_QUIRK(0x1b35, 0x2206, "CZC P10T", ALC662_FIXUP_CZC_P10T),
{
int i;
struct snd_ctl_elem_id id;
- const char *labels[] = {"Mic", "Front Mic", "Line"};
+ const char *labels[] = {"Mic", "Front Mic", "Line", "Rear Mic"};
+ struct snd_kcontrol *ctl;
memset(&id, 0, sizeof(id));
id.iface = SNDRV_CTL_ELEM_IFACE_MIXER;
for (i = 0; i < ARRAY_SIZE(labels); i++) {
sprintf(id.name, "%s Playback Volume", labels[i]);
- snd_ctl_notify(codec->bus->card, SNDRV_CTL_EVENT_MASK_VALUE,
- &id);
+ ctl = snd_hda_find_mixer_ctl(codec, id.name);
+ if (ctl)
+ snd_ctl_notify(codec->bus->card,
+ SNDRV_CTL_EVENT_MASK_VALUE,
+ &ctl->id);
}
}
SOC_DOUBLE_R("Capture Switch", SSM2602_LINVOL, SSM2602_RINVOL, 7, 1, 1),
SOC_SINGLE("Mic Boost (+20dB)", SSM2602_APANA, 0, 1, 0),
-SOC_SINGLE("Mic Boost2 (+20dB)", SSM2602_APANA, 7, 1, 0),
+SOC_SINGLE("Mic Boost2 (+20dB)", SSM2602_APANA, 8, 1, 0),
SOC_SINGLE("Mic Switch", SSM2602_APANA, 1, 1, 1),
SOC_SINGLE("Sidetone Playback Volume", SSM2602_APANA, 6, 3, 1),
.read = ssm2602_read_reg_cache,
.write = ssm2602_write,
.set_bias_level = ssm2602_set_bias_level,
- .reg_cache_size = sizeof(ssm2602_reg),
+ .reg_cache_size = ARRAY_SIZE(ssm2602_reg),
.reg_word_size = sizeof(u16),
.reg_cache_default = ssm2602_reg,
};
* low = 0x1a
* high = 0x1b
*/
-static int ssm2602_i2c_probe(struct i2c_client *i2c,
+static int __devinit ssm2602_i2c_probe(struct i2c_client *i2c,
const struct i2c_device_id *id)
{
struct ssm2602_priv *ssm2602;
return ret;
}
-static int ssm2602_i2c_remove(struct i2c_client *client)
+static int __devexit ssm2602_i2c_remove(struct i2c_client *client)
{
snd_soc_unregister_codec(&client->dev);
kfree(i2c_get_clientdata(client));
.owner = THIS_MODULE,
},
.probe = ssm2602_i2c_probe,
- .remove = ssm2602_i2c_remove,
+ .remove = __devexit_p(ssm2602_i2c_remove),
.id_table = ssm2602_i2c_id,
};
#endif
.reg_cache_step = 1,
.read = uda134x_read_reg_cache,
.write = uda134x_write,
-#ifdef POWER_OFF_ON_STANDBY
.set_bias_level = uda134x_set_bias_level,
-#endif
};
static int __devinit uda134x_codec_probe(struct platform_device *pdev)
SOC_SINGLE_TLV("DRC Startup Volume", WM8903_DRC_0, 6, 18, 0, drc_tlv_startup),
SOC_DOUBLE_R_TLV("Digital Capture Volume", WM8903_ADC_DIGITAL_VOLUME_LEFT,
- WM8903_ADC_DIGITAL_VOLUME_RIGHT, 1, 96, 0, digital_tlv),
+ WM8903_ADC_DIGITAL_VOLUME_RIGHT, 1, 120, 0, digital_tlv),
SOC_ENUM("ADC Companding Mode", adc_companding),
SOC_SINGLE("ADC Companding Switch", WM8903_AUDIO_INTERFACE_0, 3, 1, 0),
mcasp_set_bits(base + DAVINCI_MCASP_ACLKRCTL_REG, ACLKRE);
mcasp_set_bits(base + DAVINCI_MCASP_RXFMCTL_REG, AFSRE);
- mcasp_set_bits(base + DAVINCI_MCASP_PDIR_REG, (0x7 << 26));
+ mcasp_set_bits(base + DAVINCI_MCASP_PDIR_REG,
+ ACLKX | AHCLKX | AFSX);
break;
case SND_SOC_DAIFMT_CBM_CFS:
/* codec is clock master and frame slave */
- mcasp_set_bits(base + DAVINCI_MCASP_ACLKXCTL_REG, ACLKXE);
+ mcasp_clr_bits(base + DAVINCI_MCASP_ACLKXCTL_REG, ACLKXE);
mcasp_set_bits(base + DAVINCI_MCASP_TXFMCTL_REG, AFSXE);
- mcasp_set_bits(base + DAVINCI_MCASP_ACLKRCTL_REG, ACLKRE);
+ mcasp_clr_bits(base + DAVINCI_MCASP_ACLKRCTL_REG, ACLKRE);
mcasp_set_bits(base + DAVINCI_MCASP_RXFMCTL_REG, AFSRE);
- mcasp_set_bits(base + DAVINCI_MCASP_PDIR_REG, (0x2d << 26));
+ mcasp_clr_bits(base + DAVINCI_MCASP_PDIR_REG,
+ ACLKX | ACLKR);
+ mcasp_set_bits(base + DAVINCI_MCASP_PDIR_REG,
+ AFSX | AFSR);
break;
case SND_SOC_DAIFMT_CBM_CFM:
/* codec is clock and frame master */
mcasp_clr_bits(base + DAVINCI_MCASP_ACLKRCTL_REG, ACLKRE);
mcasp_clr_bits(base + DAVINCI_MCASP_RXFMCTL_REG, AFSRE);
- mcasp_clr_bits(base + DAVINCI_MCASP_PDIR_REG, (0x3f << 26));
+ mcasp_clr_bits(base + DAVINCI_MCASP_PDIR_REG,
+ ACLKX | AHCLKX | AFSX | ACLKR | AHCLKR | AFSR);
break;
default:
mcasp_set_reg(dev->base + DAVINCI_MCASP_TXTDM_REG, mask);
mcasp_set_bits(dev->base + DAVINCI_MCASP_TXFMT_REG, TXORD);
- if ((dev->tdm_slots >= 2) || (dev->tdm_slots <= 32))
+ if ((dev->tdm_slots >= 2) && (dev->tdm_slots <= 32))
mcasp_mod_bits(dev->base + DAVINCI_MCASP_TXFMCTL_REG,
FSXMOD(dev->tdm_slots), FSXMOD(0x1FF));
else
AHCLKRE);
mcasp_set_reg(dev->base + DAVINCI_MCASP_RXTDM_REG, mask);
- if ((dev->tdm_slots >= 2) || (dev->tdm_slots <= 32))
+ if ((dev->tdm_slots >= 2) && (dev->tdm_slots <= 32))
mcasp_mod_bits(dev->base + DAVINCI_MCASP_RXFMCTL_REG,
FSRMOD(dev->tdm_slots), FSRMOD(0x1FF));
else
struct jz4740_i2s *i2s = snd_soc_dai_get_drvdata(dai);
uint32_t conf;
- if (!dai->active)
+ if (dai->active)
return;
conf = jz4740_i2s_read(i2s, JZ_REG_AIC_CONF);
return 0;
}
+static int sst_platform_pcm_hw_free(struct snd_pcm_substream *substream)
+{
+ return snd_pcm_lib_free_pages(substream);
+}
+
static struct snd_pcm_ops sst_platform_ops = {
.open = sst_platform_open,
.close = sst_platform_close,
.trigger = sst_platform_pcm_trigger,
.pointer = sst_platform_pcm_pointer,
.hw_params = sst_platform_pcm_hw_params,
+ .hw_free = sst_platform_pcm_hw_free,
};
static void sst_pcm_free(struct snd_pcm *pcm)
.name = "WM8994",
.stream_name = "WM8994 HiFi",
.cpu_dai_name = "samsung-i2s.0",
- .codec_dai_name = "wm8994-hifi",
+ .codec_dai_name = "wm8994-aif1",
.platform_name = "samsung-audio",
- .codec_name = "wm8994-codec.0-0x1a",
+ .codec_name = "wm8994-codec.0-001a",
.init = goni_wm8994_init,
.ops = &goni_hifi_ops,
}, {
.name = "WM8994 Voice",
.stream_name = "Voice",
.cpu_dai_name = "goni-voice-dai",
- .codec_dai_name = "wm8994-voice",
+ .codec_dai_name = "wm8994-aif2",
.platform_name = "samsung-audio",
- .codec_name = "wm8994-codec.0-0x1a",
+ .codec_name = "wm8994-codec.0-001a",
.ops = &goni_voice_ops,
},
};
if (!card->name || !card->dev)
return -EINVAL;
+ dev_set_drvdata(card->dev, card);
+
snd_soc_initialize_card_lists(card);
soc_init_card_debugfs(card);
if (!rate)
continue;
/* C-Media CM6501 mislabels its 96 kHz altsetting */
+ /* Terratec Aureon 7.1 USB C-Media 6206, too */
if (rate == 48000 && nr_rates == 1 &&
(chip->usb_id == USB_ID(0x0d8c, 0x0201) ||
- chip->usb_id == USB_ID(0x0d8c, 0x0102)) &&
+ chip->usb_id == USB_ID(0x0d8c, 0x0102) ||
+ chip->usb_id == USB_ID(0x0ccd, 0x00b1)) &&
fp->altsetting == 5 && fp->maxpacksize == 392)
rate = 96000;
/* Creative VF0470 Live Cam reports 16 kHz instead of 8kHz */
case USB_ID(0x0d8c, 0x0102):
/* C-Media CM6206 / CM106-Like Sound Device */
+ case USB_ID(0x0ccd, 0x00b1): /* Terratec Aureon 7.1 USB */
return snd_usb_cm6206_boot_quirk(dev);
case USB_ID(0x133e, 0x0815):
field:unsigned char common_flags;
field:unsigned char common_preempt_count;
field:int common_pid;
- field:int common_lock_depth;
field:char comm[TASK_COMM_LEN];
field:pid_t pid;
field:unsigned char common_flags;
field:unsigned char common_preempt_count;
field:int common_pid;
- field:int common_lock_depth;
field:char comm[TASK_COMM_LEN];
field:pid_t pid;
Do various checks like samples ordering and lost events.
-f::
---fields
+--fields::
Comma separated list of fields to print. Options are:
comm, tid, pid, time, cpu, event, trace, sym. Field
- list must be prepended with the type, trace, sw or hw,
+ list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -f sw:comm,tid,time,sym and -f trace:time,cpu,trace
+ perf script -f <fields>
+
+ is equivalent to:
+
+ perf script -f trace:<fields> -f sw:<fields> -f hw:<fields>
+
+ i.e., the specified fields apply to all event types if the type string
+ is not given.
+
+ The arguments are processed in the order received. A later usage can
+ reset a prior request. e.g.:
+
+ -f trace: -f comm,tid,time,sym
+
+ The first -f suppresses trace events (field list is ""), but then the
+ second invocation sets the fields to comm,tid,time,sym. In this case a
+ warning is given to the user:
+
+ "Overriding previous field request for all events."
+
+ Alternativey, consider the order:
+
+ -f comm,tid,time,sym -f trace:
+
+ The first -f sets the fields for all events and the second -f
+ suppresses trace events. The user is given a warning message about
+ the override, and the result of the above is that only S/W and H/W
+ events are displayed with the given fields.
+
+ For the 'wildcard' option if a user selected field is invalid for an
+ event type, a message is displayed to the user that the option is
+ ignored for that type. For example:
+
+ $ perf script -f comm,tid,trace
+ 'trace' not valid for hardware events. Ignoring.
+ 'trace' not valid for software events. Ignoring.
+
+ Alternatively, if the type is given an invalid field is specified it
+ is an error. For example:
+
+ perf script -v -f sw:comm,tid,trace
+ 'trace' not valid for software events.
+
+ At this point usage is displayed, and perf-script exits.
+
+ Finally, a user may not set fields to none for all event types.
+ i.e., -f "" is not allowed.
+
-k::
--vmlinux=<file>::
vmlinux pathname
# The default target of this Makefile is...
all:
+include config/utilities.mak
+
ifneq ($(OUTPUT),)
# check that the output directory actually exists
OUTDIR := $(shell cd $(OUTPUT) && /bin/pwd)
# Define V to have a more verbose compile.
#
+# Define PYTHON to point to the python binary if the default
+# `python' is not correct; for example: PYTHON=python2
+#
+# Define PYTHON_CONFIG to point to the python-config binary if
+# the default `$(PYTHON)-config' is not correct.
+#
# Define ASCIIDOC8 if you want to format documentation with AsciiDoc 8
#
# Define DOCBOOK_XSL_172 if you want to format man pages with DocBook XSL v1.72.
-e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
-e s/sh[234].*/sh/ )
+CC = $(CROSS_COMPILE)gcc
+AR = $(CROSS_COMPILE)ar
+
# Additional ARCH settings for x86
ifeq ($(ARCH),i386)
ARCH := x86
endif
ifeq ($(ARCH),x86_64)
- RAW_ARCH := x86_64
- ARCH := x86
- ARCH_CFLAGS := -DARCH_X86_64
- ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S
+ ARCH := x86
+ IS_X86_64 := $(shell echo __x86_64__ | ${CC} -E -xc - | tail -n 1)
+ ifeq (${IS_X86_64}, 1)
+ RAW_ARCH := x86_64
+ ARCH_CFLAGS := -DARCH_X86_64
+ ARCH_INCLUDE = ../../arch/x86/lib/memcpy_64.S
+ endif
endif
#
export prefix bindir sharedir sysconfdir
-CC = $(CROSS_COMPILE)gcc
-AR = $(CROSS_COMPILE)ar
RM = rm -f
MKDIR = mkdir
FIND = find
# explicitly what architecture to check for. Fix this up for yours..
SPARSE_FLAGS = -D__BIG_ENDIAN__ -D__powerpc__
--include feature-tests.mak
+-include config/feature-tests.mak
ifeq ($(call try-cc,$(SOURCE_HELLO),-Werror -fstack-protector-all),y)
CFLAGS := $(CFLAGS) -fstack-protector-all
strip-libs = $(filter-out -l%,$(1))
$(OUTPUT)python/perf.so: $(PYRF_OBJS)
- $(QUIET_GEN)( \
- export CFLAGS="$(BASIC_CFLAGS)"; \
- python util/setup.py --quiet build_ext --build-lib='$(OUTPUT)python' \
- --build-temp='$(OUTPUT)python/temp' \
- )
-
+ $(QUIET_GEN)CFLAGS='$(BASIC_CFLAGS)' $(PYTHON_WORD) util/setup.py \
+ --quiet build_ext \
+ --build-lib='$(OUTPUT)python' \
+ --build-temp='$(OUTPUT)python/temp'
#
# No Perl scripts right now:
#
endif
endif
-ifdef NO_LIBPYTHON
- BASIC_CFLAGS += -DNO_LIBPYTHON
+disable-python = $(eval $(disable-python_code))
+define disable-python_code
+ BASIC_CFLAGS += -DNO_LIBPYTHON
+ $(if $(1),$(warning No $(1) was found))
+ $(warning Python support won't be built)
+endef
+
+override PYTHON := \
+ $(call get-executable-or-default,PYTHON,python)
+
+ifndef PYTHON
+ $(call disable-python,python interpreter)
+ python-clean :=
else
- PYTHON_EMBED_LDOPTS = $(shell python-config --ldflags 2>/dev/null)
- PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
- PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS))
- PYTHON_EMBED_CCOPTS = `python-config --cflags 2>/dev/null`
- FLAGS_PYTHON_EMBED=$(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
- ifneq ($(call try-cc,$(SOURCE_PYTHON_EMBED),$(FLAGS_PYTHON_EMBED)),y)
- msg := $(warning No Python.h found, install python-dev[el] to have python support in 'perf script' and to build the python bindings)
- BASIC_CFLAGS += -DNO_LIBPYTHON
- else
- ALL_LDFLAGS += $(PYTHON_EMBED_LDFLAGS)
- EXTLIBS += $(PYTHON_EMBED_LIBADD)
- LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-python.o
- LIB_OBJS += $(OUTPUT)scripts/python/Perf-Trace-Util/Context.o
- LANG_BINDINGS += $(OUTPUT)python/perf.so
- endif
+
+ PYTHON_WORD := $(call shell-wordify,$(PYTHON))
+
+ python-clean := $(PYTHON_WORD) util/setup.py clean \
+ --build-lib='$(OUTPUT)python' \
+ --build-temp='$(OUTPUT)python/temp'
+
+ ifdef NO_LIBPYTHON
+ $(call disable-python)
+ else
+
+ override PYTHON_CONFIG := \
+ $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
+
+ ifndef PYTHON_CONFIG
+ $(call disable-python,python-config tool)
+ else
+
+ PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
+
+ PYTHON_EMBED_LDOPTS := $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null)
+ PYTHON_EMBED_LDFLAGS := $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
+ PYTHON_EMBED_LIBADD := $(call grep-libs,$(PYTHON_EMBED_LDOPTS))
+ PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null)
+ FLAGS_PYTHON_EMBED := $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
+
+ ifneq ($(call try-cc,$(SOURCE_PYTHON_EMBED),$(FLAGS_PYTHON_EMBED)),y)
+ $(call disable-python,Python.h (for Python 2.x))
+ else
+
+ ifneq ($(call try-cc,$(SOURCE_PYTHON_VERSION),$(FLAGS_PYTHON_EMBED)),y)
+ $(warning Python 3 is not yet supported; please set)
+ $(warning PYTHON and/or PYTHON_CONFIG appropriately.)
+ $(warning If you also have Python 2 installed, then)
+ $(warning try something like:)
+ $(warning $(and ,))
+ $(warning $(and ,) make PYTHON=python2)
+ $(warning $(and ,))
+ $(warning Otherwise, disable Python support entirely:)
+ $(warning $(and ,))
+ $(warning $(and ,) make NO_LIBPYTHON=1)
+ $(warning $(and ,))
+ $(error $(and ,))
+ else
+ ALL_LDFLAGS += $(PYTHON_EMBED_LDFLAGS)
+ EXTLIBS += $(PYTHON_EMBED_LIBADD)
+ LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-python.o
+ LIB_OBJS += $(OUTPUT)scripts/python/Perf-Trace-Util/Context.o
+ LANG_BINDINGS += $(OUTPUT)python/perf.so
+ endif
+
+ endif
+ endif
+ endif
endif
ifdef NO_DEMANGLE
$(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope*
$(MAKE) -C Documentation/ clean
$(RM) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS
- @python util/setup.py clean --build-lib='$(OUTPUT)python' \
- --build-temp='$(OUTPUT)python/temp'
+ $(python-clean)
.PHONY: all install clean strip
.PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
{
int i;
- for (i = 0; i < evsel_list->cpus->nr; i++) {
+ for (i = 0; i < evsel_list->nr_mmaps; i++) {
if (evsel_list->mmap[i].base)
mmap_read(&evsel_list->mmap[i]);
}
};
/* default set to maintain compatibility with current format */
-static u64 output_fields[PERF_TYPE_MAX] = {
- [PERF_TYPE_HARDWARE] = PERF_OUTPUT_COMM | PERF_OUTPUT_TID | \
- PERF_OUTPUT_CPU | PERF_OUTPUT_TIME | \
- PERF_OUTPUT_EVNAME | PERF_OUTPUT_SYM,
-
- [PERF_TYPE_SOFTWARE] = PERF_OUTPUT_COMM | PERF_OUTPUT_TID | \
- PERF_OUTPUT_CPU | PERF_OUTPUT_TIME | \
- PERF_OUTPUT_EVNAME | PERF_OUTPUT_SYM,
-
- [PERF_TYPE_TRACEPOINT] = PERF_OUTPUT_COMM | PERF_OUTPUT_TID | \
- PERF_OUTPUT_CPU | PERF_OUTPUT_TIME | \
- PERF_OUTPUT_EVNAME | PERF_OUTPUT_TRACE,
+static struct {
+ bool user_set;
+ bool wildcard_set;
+ u64 fields;
+ u64 invalid_fields;
+} output[PERF_TYPE_MAX] = {
+
+ [PERF_TYPE_HARDWARE] = {
+ .user_set = false,
+
+ .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_SYM,
+
+ .invalid_fields = PERF_OUTPUT_TRACE,
+ },
+
+ [PERF_TYPE_SOFTWARE] = {
+ .user_set = false,
+
+ .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_SYM,
+
+ .invalid_fields = PERF_OUTPUT_TRACE,
+ },
+
+ [PERF_TYPE_TRACEPOINT] = {
+ .user_set = false,
+
+ .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_TRACE,
+ },
+
+ [PERF_TYPE_RAW] = {
+ .user_set = false,
+
+ .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_SYM,
+
+ .invalid_fields = PERF_OUTPUT_TRACE,
+ },
};
-static bool output_set_by_user;
+static bool output_set_by_user(void)
+{
+ int j;
+ for (j = 0; j < PERF_TYPE_MAX; ++j) {
+ if (output[j].user_set)
+ return true;
+ }
+ return false;
+}
+
+static const char *output_field2str(enum perf_output_field field)
+{
+ int i, imax = ARRAY_SIZE(all_output_options);
+ const char *str = "";
+
+ for (i = 0; i < imax; ++i) {
+ if (all_output_options[i].field == field) {
+ str = all_output_options[i].str;
+ break;
+ }
+ }
+ return str;
+}
-#define PRINT_FIELD(x) (output_fields[attr->type] & PERF_OUTPUT_##x)
+#define PRINT_FIELD(x) (output[attr->type].fields & PERF_OUTPUT_##x)
-static int perf_session__check_attr(struct perf_session *session,
- struct perf_event_attr *attr)
+static int perf_event_attr__check_stype(struct perf_event_attr *attr,
+ u64 sample_type, const char *sample_msg,
+ enum perf_output_field field)
{
+ int type = attr->type;
+ const char *evname;
+
+ if (attr->sample_type & sample_type)
+ return 0;
+
+ if (output[type].user_set) {
+ evname = __event_name(attr->type, attr->config);
+ pr_err("Samples for '%s' event do not have %s attribute set. "
+ "Cannot print '%s' field.\n",
+ evname, sample_msg, output_field2str(field));
+ return -1;
+ }
+
+ /* user did not ask for it explicitly so remove from the default list */
+ output[type].fields &= ~field;
+ evname = __event_name(attr->type, attr->config);
+ pr_debug("Samples for '%s' event do not have %s attribute set. "
+ "Skipping '%s' field.\n",
+ evname, sample_msg, output_field2str(field));
+
+ return 0;
+}
+
+static int perf_evsel__check_attr(struct perf_evsel *evsel,
+ struct perf_session *session)
+{
+ struct perf_event_attr *attr = &evsel->attr;
+
if (PRINT_FIELD(TRACE) &&
!perf_session__has_traces(session, "record -R"))
return -EINVAL;
if (PRINT_FIELD(SYM)) {
- if (!(session->sample_type & PERF_SAMPLE_IP)) {
- pr_err("Samples do not contain IP data.\n");
+ if (perf_event_attr__check_stype(attr, PERF_SAMPLE_IP, "IP",
+ PERF_OUTPUT_SYM))
return -EINVAL;
- }
+
if (!no_callchain &&
- !(session->sample_type & PERF_SAMPLE_CALLCHAIN))
+ !(attr->sample_type & PERF_SAMPLE_CALLCHAIN))
symbol_conf.use_callchain = false;
}
if ((PRINT_FIELD(PID) || PRINT_FIELD(TID)) &&
- !(session->sample_type & PERF_SAMPLE_TID)) {
- pr_err("Samples do not contain TID/PID data.\n");
+ perf_event_attr__check_stype(attr, PERF_SAMPLE_TID, "TID",
+ PERF_OUTPUT_TID|PERF_OUTPUT_PID))
return -EINVAL;
- }
if (PRINT_FIELD(TIME) &&
- !(session->sample_type & PERF_SAMPLE_TIME)) {
- pr_err("Samples do not contain timestamps.\n");
+ perf_event_attr__check_stype(attr, PERF_SAMPLE_TIME, "TIME",
+ PERF_OUTPUT_TIME))
return -EINVAL;
- }
if (PRINT_FIELD(CPU) &&
- !(session->sample_type & PERF_SAMPLE_CPU)) {
- pr_err("Samples do not contain cpu.\n");
+ perf_event_attr__check_stype(attr, PERF_SAMPLE_CPU, "CPU",
+ PERF_OUTPUT_CPU))
return -EINVAL;
+
+ return 0;
+}
+
+/*
+ * verify all user requested events exist and the samples
+ * have the expected data
+ */
+static int perf_session__check_output_opt(struct perf_session *session)
+{
+ int j;
+ struct perf_evsel *evsel;
+
+ for (j = 0; j < PERF_TYPE_MAX; ++j) {
+ evsel = perf_session__find_first_evtype(session, j);
+
+ /*
+ * even if fields is set to 0 (ie., show nothing) event must
+ * exist if user explicitly includes it on the command line
+ */
+ if (!evsel && output[j].user_set && !output[j].wildcard_set) {
+ pr_err("%s events do not exist. "
+ "Remove corresponding -f option to proceed.\n",
+ event_type(j));
+ return -1;
+ }
+
+ if (evsel && output[j].fields &&
+ perf_evsel__check_attr(evsel, session))
+ return -1;
}
return 0;
{
struct perf_event_attr *attr = &evsel->attr;
- if (output_fields[attr->type] == 0)
- return;
-
- if (perf_session__check_attr(session, attr) < 0)
+ if (output[attr->type].fields == 0)
return;
print_sample_start(sample, thread, attr);
{
char *tok;
int i, imax = sizeof(all_output_options) / sizeof(struct output_option);
+ int j;
int rc = 0;
char *str = strdup(arg);
int type = -1;
if (!str)
return -ENOMEM;
- tok = strtok(str, ":");
- if (!tok) {
- fprintf(stderr,
- "Invalid field string - not prepended with type.");
- return -EINVAL;
- }
-
- /* first word should state which event type user
- * is specifying the fields
+ /* first word can state for which event type the user is specifying
+ * the fields. If no type exists, the specified fields apply to all
+ * event types found in the file minus the invalid fields for a type.
*/
- if (!strcmp(tok, "hw"))
- type = PERF_TYPE_HARDWARE;
- else if (!strcmp(tok, "sw"))
- type = PERF_TYPE_SOFTWARE;
- else if (!strcmp(tok, "trace"))
- type = PERF_TYPE_TRACEPOINT;
- else {
- fprintf(stderr, "Invalid event type in field string.");
- return -EINVAL;
+ tok = strchr(str, ':');
+ if (tok) {
+ *tok = '\0';
+ tok++;
+ if (!strcmp(str, "hw"))
+ type = PERF_TYPE_HARDWARE;
+ else if (!strcmp(str, "sw"))
+ type = PERF_TYPE_SOFTWARE;
+ else if (!strcmp(str, "trace"))
+ type = PERF_TYPE_TRACEPOINT;
+ else if (!strcmp(str, "raw"))
+ type = PERF_TYPE_RAW;
+ else {
+ fprintf(stderr, "Invalid event type in field string.\n");
+ return -EINVAL;
+ }
+
+ if (output[type].user_set)
+ pr_warning("Overriding previous field request for %s events.\n",
+ event_type(type));
+
+ output[type].fields = 0;
+ output[type].user_set = true;
+ output[type].wildcard_set = false;
+
+ } else {
+ tok = str;
+ if (strlen(str) == 0) {
+ fprintf(stderr,
+ "Cannot set fields to 'none' for all event types.\n");
+ rc = -EINVAL;
+ goto out;
+ }
+
+ if (output_set_by_user())
+ pr_warning("Overriding previous field request for all events.\n");
+
+ for (j = 0; j < PERF_TYPE_MAX; ++j) {
+ output[j].fields = 0;
+ output[j].user_set = true;
+ output[j].wildcard_set = true;
+ }
}
- output_fields[type] = 0;
- while (1) {
- tok = strtok(NULL, ",");
- if (!tok)
- break;
+ tok = strtok(tok, ",");
+ while (tok) {
for (i = 0; i < imax; ++i) {
- if (strcmp(tok, all_output_options[i].str) == 0) {
- output_fields[type] |= all_output_options[i].field;
+ if (strcmp(tok, all_output_options[i].str) == 0)
break;
- }
}
if (i == imax) {
- fprintf(stderr, "Invalid field requested.");
+ fprintf(stderr, "Invalid field requested.\n");
rc = -EINVAL;
- break;
+ goto out;
}
- }
- if (output_fields[type] == 0) {
- pr_debug("No fields requested for %s type. "
- "Events will not be displayed\n", event_type(type));
+ if (type == -1) {
+ /* add user option to all events types for
+ * which it is valid
+ */
+ for (j = 0; j < PERF_TYPE_MAX; ++j) {
+ if (output[j].invalid_fields & all_output_options[i].field) {
+ pr_warning("\'%s\' not valid for %s events. Ignoring.\n",
+ all_output_options[i].str, event_type(j));
+ } else
+ output[j].fields |= all_output_options[i].field;
+ }
+ } else {
+ if (output[type].invalid_fields & all_output_options[i].field) {
+ fprintf(stderr, "\'%s\' not valid for %s events.\n",
+ all_output_options[i].str, event_type(type));
+
+ rc = -EINVAL;
+ goto out;
+ }
+ output[type].fields |= all_output_options[i].field;
+ }
+
+ tok = strtok(NULL, ",");
}
- output_set_by_user = true;
+ if (type >= 0) {
+ if (output[type].fields == 0) {
+ pr_debug("No fields requested for %s type. "
+ "Events will not be displayed.\n", event_type(type));
+ }
+ }
+out:
free(str);
return rc;
}
OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
"Look for files with symbols relative to this directory"),
OPT_CALLBACK('f', "fields", NULL, "str",
- "comma separated output fields prepend with 'type:'. Valid types: hw,sw,trace. Fields: comm,tid,pid,time,cpu,event,trace,sym",
+ "comma separated output fields prepend with 'type:'. Valid types: hw,sw,trace,raw. Fields: comm,tid,pid,time,cpu,event,trace,sym",
parse_output_fields),
OPT_END()
struct stat perf_stat;
int input;
- if (output_set_by_user) {
+ if (output_set_by_user()) {
fprintf(stderr,
"custom fields not supported for generated scripts");
return -1;
pr_debug("perf script started with script %s\n\n", script_name);
}
+
+ err = perf_session__check_output_opt(session);
+ if (err < 0)
+ goto out;
+
err = __cmd_script(session);
perf_session__delete(session);
*
* Sample output:
- $ perf stat ~/hackbench 10
- Time: 0.104
+ $ perf stat ./hackbench 10
- Performance counter stats for '/home/mingo/hackbench':
+ Time: 0.118
- 1255.538611 task clock ticks # 10.143 CPU utilization factor
- 54011 context switches # 0.043 M/sec
- 385 CPU migrations # 0.000 M/sec
- 17755 pagefaults # 0.014 M/sec
- 3808323185 CPU cycles # 3033.219 M/sec
- 1575111190 instructions # 1254.530 M/sec
- 17367895 cache references # 13.833 M/sec
- 7674421 cache misses # 6.112 M/sec
+ Performance counter stats for './hackbench 10':
- Wall-clock time elapsed: 123.786620 msecs
+ 1708.761321 task-clock # 11.037 CPUs utilized
+ 41,190 context-switches # 0.024 M/sec
+ 6,735 CPU-migrations # 0.004 M/sec
+ 17,318 page-faults # 0.010 M/sec
+ 5,205,202,243 cycles # 3.046 GHz
+ 3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
+ 1,600,790,871 stalled-cycles-backend # 30.75% backend cycles idle
+ 2,603,501,247 instructions # 0.50 insns per cycle
+ # 1.48 stalled cycles per insn
+ 484,357,498 branches # 283.455 M/sec
+ 6,388,934 branch-misses # 1.32% of all branches
+
+ 0.154822978 seconds time elapsed
*
- * Copyright (C) 2008, Red Hat Inc, Ingo Molnar <mingo@redhat.com>
+ * Copyright (C) 2008-2011, Red Hat Inc, Ingo Molnar <mingo@redhat.com>
*
* Improvements and fixes by:
*
#include "util/evlist.h"
#include "util/evsel.h"
#include "util/debug.h"
+#include "util/color.h"
#include "util/header.h"
#include "util/cpumap.h"
#include "util/thread.h"
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
+ { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_REFERENCES },
- { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CACHE_MISSES },
};
+/*
+ * Detailed stats (-d), covering the L1 and last level data caches:
+ */
+static struct perf_event_attr detailed_attrs[] = {
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1D << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1D << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_LL << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_LL << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+};
+
+/*
+ * Very detailed stats (-d -d), covering the instruction cache and the TLB caches:
+ */
+static struct perf_event_attr very_detailed_attrs[] = {
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1I << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1I << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_DTLB << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_DTLB << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_ITLB << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_ITLB << 0 |
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+
+};
+
+/*
+ * Very, very detailed stats (-d -d -d), adding prefetch events:
+ */
+static struct perf_event_attr very_very_detailed_attrs[] = {
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1D << 0 |
+ (PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
+
+ { .type = PERF_TYPE_HW_CACHE,
+ .config =
+ PERF_COUNT_HW_CACHE_L1D << 0 |
+ (PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
+};
+
+
+
struct perf_evlist *evsel_list;
static bool system_wide = false;
static pid_t target_tid = -1;
static pid_t child_pid = -1;
static bool null_run = false;
+static int detailed_run = 0;
+static bool sync_run = false;
static bool big_num = true;
static int big_num_opt = -1;
static const char *cpu_list;
struct stats runtime_nsecs_stats[MAX_NR_CPUS];
struct stats runtime_cycles_stats[MAX_NR_CPUS];
+struct stats runtime_stalled_cycles_front_stats[MAX_NR_CPUS];
+struct stats runtime_stalled_cycles_back_stats[MAX_NR_CPUS];
struct stats runtime_branches_stats[MAX_NR_CPUS];
+struct stats runtime_cacherefs_stats[MAX_NR_CPUS];
+struct stats runtime_l1_dcache_stats[MAX_NR_CPUS];
+struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
+struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
+struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
+struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
struct stats walltime_nsecs_stats;
static int create_perf_stat_counter(struct perf_evsel *evsel)
return 0;
}
+/*
+ * Update various tracking values we maintain to print
+ * more semantic information such as miss/hit ratios,
+ * instruction rates, etc:
+ */
+static void update_shadow_stats(struct perf_evsel *counter, u64 *count)
+{
+ if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK))
+ update_stats(&runtime_nsecs_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
+ update_stats(&runtime_cycles_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
+ update_stats(&runtime_stalled_cycles_front_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
+ update_stats(&runtime_stalled_cycles_back_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
+ update_stats(&runtime_branches_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HARDWARE, HW_CACHE_REFERENCES))
+ update_stats(&runtime_cacherefs_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1D))
+ update_stats(&runtime_l1_dcache_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_L1I))
+ update_stats(&runtime_l1_icache_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_LL))
+ update_stats(&runtime_ll_cache_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_DTLB))
+ update_stats(&runtime_dtlb_cache_stats[0], count[0]);
+ else if (perf_evsel__match(counter, HW_CACHE, HW_CACHE_ITLB))
+ update_stats(&runtime_itlb_cache_stats[0], count[0]);
+}
+
/*
* Read out the results of a single counter:
* aggregate counts across CPUs in system-wide mode
/*
* Save the full runtime - to allow normalization during printout:
*/
- if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK))
- update_stats(&runtime_nsecs_stats[0], count[0]);
- if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_stats(&runtime_cycles_stats[0], count[0]);
- if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_stats(&runtime_branches_stats[0], count[0]);
+ update_shadow_stats(counter, count);
return 0;
}
count = counter->counts->cpu[cpu].values;
- if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK))
- update_stats(&runtime_nsecs_stats[cpu], count[0]);
- if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
- update_stats(&runtime_cycles_stats[cpu], count[0]);
- if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
- update_stats(&runtime_branches_stats[cpu], count[0]);
+ update_shadow_stats(counter, count);
}
return 0;
list_for_each_entry(counter, &evsel_list->entries, node) {
if (create_perf_stat_counter(counter) < 0) {
- if (errno == -EPERM || errno == -EACCES) {
+ if (errno == EINVAL || errno == ENOSYS || errno == ENOENT) {
+ if (verbose)
+ ui__warning("%s event is not supported by the kernel.\n",
+ event_name(counter));
+ continue;
+ }
+
+ if (errno == EPERM || errno == EACCES) {
error("You may not have permission to collect %sstats.\n"
"\t Consider tweaking"
" /proc/sys/kernel/perf_event_paranoid or running as root.",
system_wide ? "system-wide " : "");
- } else if (errno == ENOENT) {
- error("%s event is not supported. ", event_name(counter));
} else {
error("open_counter returned with %d (%s). "
"/bin/dmesg may provide additional information.\n",
return WEXITSTATUS(status);
}
+static void print_noise_pct(double total, double avg)
+{
+ double pct = 0.0;
+
+ if (avg)
+ pct = 100.0*total/avg;
+
+ fprintf(stderr, " ( +-%6.2f%% )", pct);
+}
+
static void print_noise(struct perf_evsel *evsel, double avg)
{
struct perf_stat *ps;
return;
ps = evsel->priv;
- fprintf(stderr, " ( +- %7.3f%% )",
- 100 * stddev_stats(&ps->res_stats[0]) / avg);
+ print_noise_pct(stddev_stats(&ps->res_stats[0]), avg);
}
static void nsec_printout(int cpu, struct perf_evsel *evsel, double avg)
{
double msecs = avg / 1e6;
char cpustr[16] = { '\0', };
- const char *fmt = csv_output ? "%s%.6f%s%s" : "%s%18.6f%s%-24s";
+ const char *fmt = csv_output ? "%s%.6f%s%s" : "%s%18.6f%s%-25s";
if (no_aggr)
sprintf(cpustr, "CPU%*d%s",
return;
if (perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
- fprintf(stderr, " # %10.3f CPUs ",
- avg / avg_stats(&walltime_nsecs_stats));
+ fprintf(stderr, " # %8.3f CPUs utilized ", avg / avg_stats(&walltime_nsecs_stats));
+}
+
+static void print_stalled_cycles_frontend(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_cycles_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 50.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 30.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " frontend cycles idle ");
+}
+
+static void print_stalled_cycles_backend(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_cycles_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 75.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 50.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 20.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " backend cycles idle ");
+}
+
+static void print_branch_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_branches_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all branches ");
+}
+
+static void print_l1_dcache_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_l1_dcache_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all L1-dcache hits ");
+}
+
+static void print_l1_icache_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_l1_icache_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all L1-icache hits ");
+}
+
+static void print_dtlb_cache_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_dtlb_cache_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all dTLB cache hits ");
+}
+
+static void print_itlb_cache_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_itlb_cache_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all iTLB cache hits ");
+}
+
+static void print_ll_cache_misses(int cpu, struct perf_evsel *evsel __used, double avg)
+{
+ double total, ratio = 0.0;
+ const char *color;
+
+ total = avg_stats(&runtime_ll_cache_stats[cpu]);
+
+ if (total)
+ ratio = avg / total * 100.0;
+
+ color = PERF_COLOR_NORMAL;
+ if (ratio > 20.0)
+ color = PERF_COLOR_RED;
+ else if (ratio > 10.0)
+ color = PERF_COLOR_MAGENTA;
+ else if (ratio > 5.0)
+ color = PERF_COLOR_YELLOW;
+
+ fprintf(stderr, " # ");
+ color_fprintf(stderr, color, "%6.2f%%", ratio);
+ fprintf(stderr, " of all LL-cache hits ");
}
static void abs_printout(int cpu, struct perf_evsel *evsel, double avg)
if (csv_output)
fmt = "%s%.0f%s%s";
else if (big_num)
- fmt = "%s%'18.0f%s%-24s";
+ fmt = "%s%'18.0f%s%-25s";
else
- fmt = "%s%18.0f%s%-24s";
+ fmt = "%s%18.0f%s%-25s";
if (no_aggr)
sprintf(cpustr, "CPU%*d%s",
if (total)
ratio = avg / total;
- fprintf(stderr, " # %10.3f IPC ", ratio);
+ fprintf(stderr, " # %5.2f insns per cycle ", ratio);
+
+ total = avg_stats(&runtime_stalled_cycles_front_stats[cpu]);
+ total = max(total, avg_stats(&runtime_stalled_cycles_back_stats[cpu]));
+
+ if (total && avg) {
+ ratio = total / avg;
+ fprintf(stderr, "\n # %5.2f stalled cycles per insn", ratio);
+ }
+
} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES) &&
runtime_branches_stats[cpu].n != 0) {
- total = avg_stats(&runtime_branches_stats[cpu]);
+ print_branch_misses(cpu, evsel, avg);
+ } else if (
+ evsel->attr.type == PERF_TYPE_HW_CACHE &&
+ evsel->attr.config == ( PERF_COUNT_HW_CACHE_L1D |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16)) &&
+ runtime_l1_dcache_stats[cpu].n != 0) {
+ print_l1_dcache_misses(cpu, evsel, avg);
+ } else if (
+ evsel->attr.type == PERF_TYPE_HW_CACHE &&
+ evsel->attr.config == ( PERF_COUNT_HW_CACHE_L1I |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16)) &&
+ runtime_l1_icache_stats[cpu].n != 0) {
+ print_l1_icache_misses(cpu, evsel, avg);
+ } else if (
+ evsel->attr.type == PERF_TYPE_HW_CACHE &&
+ evsel->attr.config == ( PERF_COUNT_HW_CACHE_DTLB |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16)) &&
+ runtime_dtlb_cache_stats[cpu].n != 0) {
+ print_dtlb_cache_misses(cpu, evsel, avg);
+ } else if (
+ evsel->attr.type == PERF_TYPE_HW_CACHE &&
+ evsel->attr.config == ( PERF_COUNT_HW_CACHE_ITLB |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16)) &&
+ runtime_itlb_cache_stats[cpu].n != 0) {
+ print_itlb_cache_misses(cpu, evsel, avg);
+ } else if (
+ evsel->attr.type == PERF_TYPE_HW_CACHE &&
+ evsel->attr.config == ( PERF_COUNT_HW_CACHE_LL |
+ ((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
+ ((PERF_COUNT_HW_CACHE_RESULT_MISS) << 16)) &&
+ runtime_ll_cache_stats[cpu].n != 0) {
+ print_ll_cache_misses(cpu, evsel, avg);
+ } else if (perf_evsel__match(evsel, HARDWARE, HW_CACHE_MISSES) &&
+ runtime_cacherefs_stats[cpu].n != 0) {
+ total = avg_stats(&runtime_cacherefs_stats[cpu]);
if (total)
ratio = avg * 100 / total;
- fprintf(stderr, " # %10.3f %% ", ratio);
+ fprintf(stderr, " # %8.3f %% of all cache refs ", ratio);
+
+ } else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) {
+ print_stalled_cycles_frontend(cpu, evsel, avg);
+ } else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
+ print_stalled_cycles_backend(cpu, evsel, avg);
+ } else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
+ total = avg_stats(&runtime_nsecs_stats[cpu]);
+ if (total)
+ ratio = 1.0 * avg / total;
+
+ fprintf(stderr, " # %8.3f GHz ", ratio);
} else if (runtime_nsecs_stats[cpu].n != 0) {
total = avg_stats(&runtime_nsecs_stats[cpu]);
if (total)
ratio = 1000.0 * avg / total;
- fprintf(stderr, " # %10.3f M/sec", ratio);
+ fprintf(stderr, " # %8.3f M/sec ", ratio);
+ } else {
+ fprintf(stderr, " ");
}
}
avg_enabled = avg_stats(&ps->res_stats[1]);
avg_running = avg_stats(&ps->res_stats[2]);
- fprintf(stderr, " (scaled from %.2f%%)",
- 100 * avg_running / avg_enabled);
+ fprintf(stderr, " [%5.2f%%]", 100 * avg_running / avg_enabled);
}
fprintf(stderr, "\n");
}
if (!csv_output) {
print_noise(counter, 1.0);
- if (run != ena) {
- fprintf(stderr, " (scaled from %.2f%%)",
- 100.0 * run / ena);
- }
+ if (run != ena)
+ fprintf(stderr, " (%.2f%%)", 100.0 * run / ena);
}
fputc('\n', stderr);
}
}
if (!csv_output) {
- fprintf(stderr, "\n");
- fprintf(stderr, " %18.9f seconds time elapsed",
+ if (!null_run)
+ fprintf(stderr, "\n");
+ fprintf(stderr, " %17.9f seconds time elapsed",
avg_stats(&walltime_nsecs_stats)/1e9);
if (run_count > 1) {
- fprintf(stderr, " ( +- %7.3f%% )",
- 100*stddev_stats(&walltime_nsecs_stats) /
- avg_stats(&walltime_nsecs_stats));
+ fprintf(stderr, " ");
+ print_noise_pct(stddev_stats(&walltime_nsecs_stats),
+ avg_stats(&walltime_nsecs_stats));
}
fprintf(stderr, "\n\n");
}
"repeat command and print average + stddev (max: 100)"),
OPT_BOOLEAN('n', "null", &null_run,
"null run - dont start any counters"),
+ OPT_INCR('d', "detailed", &detailed_run,
+ "detailed run - start a lot of events"),
+ OPT_BOOLEAN('S', "sync", &sync_run,
+ "call sync() before starting a run"),
OPT_CALLBACK_NOOPT('B', "big-num", NULL, NULL,
"print large numbers with thousands\' separators",
stat__set_big_num),
OPT_END()
};
+/*
+ * Add default attributes, if there were no attributes specified or
+ * if -d/--detailed, -d -d or -d -d -d is used:
+ */
+static int add_default_attributes(void)
+{
+ struct perf_evsel *pos;
+ size_t attr_nr = 0;
+ size_t c;
+
+ /* Set attrs if no event is selected and !null_run: */
+ if (null_run)
+ return 0;
+
+ if (!evsel_list->nr_entries) {
+ for (c = 0; c < ARRAY_SIZE(default_attrs); c++) {
+ pos = perf_evsel__new(default_attrs + c, c + attr_nr);
+ if (pos == NULL)
+ return -1;
+ perf_evlist__add(evsel_list, pos);
+ }
+ attr_nr += c;
+ }
+
+ /* Detailed events get appended to the event list: */
+
+ if (detailed_run < 1)
+ return 0;
+
+ /* Append detailed run extra attributes: */
+ for (c = 0; c < ARRAY_SIZE(detailed_attrs); c++) {
+ pos = perf_evsel__new(detailed_attrs + c, c + attr_nr);
+ if (pos == NULL)
+ return -1;
+ perf_evlist__add(evsel_list, pos);
+ }
+ attr_nr += c;
+
+ if (detailed_run < 2)
+ return 0;
+
+ /* Append very detailed run extra attributes: */
+ for (c = 0; c < ARRAY_SIZE(very_detailed_attrs); c++) {
+ pos = perf_evsel__new(very_detailed_attrs + c, c + attr_nr);
+ if (pos == NULL)
+ return -1;
+ perf_evlist__add(evsel_list, pos);
+ }
+
+ if (detailed_run < 3)
+ return 0;
+
+ /* Append very, very detailed run extra attributes: */
+ for (c = 0; c < ARRAY_SIZE(very_very_detailed_attrs); c++) {
+ pos = perf_evsel__new(very_very_detailed_attrs + c, c + attr_nr);
+ if (pos == NULL)
+ return -1;
+ perf_evlist__add(evsel_list, pos);
+ }
+
+
+ return 0;
+}
+
int cmd_stat(int argc, const char **argv, const char *prefix __used)
{
struct perf_evsel *pos;
usage_with_options(stat_usage, options);
}
- /* Set attrs and nr_counters if no event is selected and !null_run */
- if (!null_run && !evsel_list->nr_entries) {
- size_t c;
-
- for (c = 0; c < ARRAY_SIZE(default_attrs); ++c) {
- pos = perf_evsel__new(&default_attrs[c], c);
- if (pos == NULL)
- goto out;
- perf_evlist__add(evsel_list, pos);
- }
- }
+ if (add_default_attributes())
+ goto out;
if (target_pid != -1)
target_tid = target_pid;
for (run_idx = 0; run_idx < run_count; run_idx++) {
if (run_count != 1 && verbose)
fprintf(stderr, "[ perf stat: executing run #%d ... ]\n", run_idx + 1);
+
+ if (sync_run)
+ sync();
+
status = run_perf_stat(argc, argv);
}
++foo;
}
- while ((event = perf_evlist__read_on_cpu(evlist, 0)) != NULL) {
+ while ((event = perf_evlist__mmap_read(evlist, 0)) != NULL) {
struct perf_sample sample;
if (event->header.type != PERF_RECORD_SAMPLE) {
}
}
-static void perf_session__mmap_read_cpu(struct perf_session *self, int cpu)
+static void perf_session__mmap_read_idx(struct perf_session *self, int idx)
{
struct perf_sample sample;
union perf_event *event;
- while ((event = perf_evlist__read_on_cpu(top.evlist, cpu)) != NULL) {
+ while ((event = perf_evlist__mmap_read(top.evlist, idx)) != NULL) {
perf_session__parse_sample(self, event, &sample);
if (event->header.type == PERF_RECORD_SAMPLE)
{
int i;
- for (i = 0; i < top.evlist->cpus->nr; i++)
- perf_session__mmap_read_cpu(self, i);
+ for (i = 0; i < top.evlist->nr_mmaps; i++)
+ perf_session__mmap_read_idx(self, i);
}
static void start_counters(struct perf_evlist *evlist)
--- /dev/null
+define SOURCE_HELLO
+#include <stdio.h>
+int main(void)
+{
+ return puts(\"hi\");
+}
+endef
+
+ifndef NO_DWARF
+define SOURCE_DWARF
+#include <dwarf.h>
+#include <elfutils/libdw.h>
+#include <elfutils/version.h>
+#ifndef _ELFUTILS_PREREQ
+#error
+#endif
+
+int main(void)
+{
+ Dwarf *dbg = dwarf_begin(0, DWARF_C_READ);
+ return (long)dbg;
+}
+endef
+endif
+
+define SOURCE_LIBELF
+#include <libelf.h>
+
+int main(void)
+{
+ Elf *elf = elf_begin(0, ELF_C_READ, 0);
+ return (long)elf;
+}
+endef
+
+define SOURCE_GLIBC
+#include <gnu/libc-version.h>
+
+int main(void)
+{
+ const char *version = gnu_get_libc_version();
+ return (long)version;
+}
+endef
+
+define SOURCE_ELF_MMAP
+#include <libelf.h>
+int main(void)
+{
+ Elf *elf = elf_begin(0, ELF_C_READ_MMAP, 0);
+ return (long)elf;
+}
+endef
+
+ifndef NO_NEWT
+define SOURCE_NEWT
+#include <newt.h>
+
+int main(void)
+{
+ newtInit();
+ newtCls();
+ return newtFinished();
+}
+endef
+endif
+
+ifndef NO_LIBPERL
+define SOURCE_PERL_EMBED
+#include <EXTERN.h>
+#include <perl.h>
+
+int main(void)
+{
+perl_alloc();
+return 0;
+}
+endef
+endif
+
+ifndef NO_LIBPYTHON
+define SOURCE_PYTHON_VERSION
+#include <Python.h>
+#if PY_VERSION_HEX >= 0x03000000
+ #error
+#endif
+int main(void){}
+endef
+define SOURCE_PYTHON_EMBED
+#include <Python.h>
+int main(void)
+{
+ Py_Initialize();
+ return 0;
+}
+endef
+endif
+
+define SOURCE_BFD
+#include <bfd.h>
+
+int main(void)
+{
+ bfd_demangle(0, 0, 0);
+ return 0;
+}
+endef
+
+define SOURCE_CPLUS_DEMANGLE
+extern char *cplus_demangle(const char *, int);
+
+int main(void)
+{
+ cplus_demangle(0, 0);
+ return 0;
+}
+endef
+
+define SOURCE_STRLCPY
+#include <stdlib.h>
+extern size_t strlcpy(char *dest, const char *src, size_t size);
+
+int main(void)
+{
+ strlcpy(NULL, NULL, 0);
+ return 0;
+}
+endef
--- /dev/null
+# This allows us to work with the newline character:
+define newline
+
+
+endef
+newline := $(newline)
+
+# nl-escape
+#
+# Usage: escape = $(call nl-escape[,escape])
+#
+# This is used as the common way to specify
+# what should replace a newline when escaping
+# newlines; the default is a bizarre string.
+#
+nl-escape = $(or $(1),m822df3020w6a44id34bt574ctac44eb9f4n)
+
+# escape-nl
+#
+# Usage: escaped-text = $(call escape-nl,text[,escape])
+#
+# GNU make's $(shell ...) function converts to a
+# single space each newline character in the output
+# produced during the expansion; this may not be
+# desirable.
+#
+# The only solution is to change each newline into
+# something that won't be converted, so that the
+# information can be recovered later with
+# $(call unescape-nl...)
+#
+escape-nl = $(subst $(newline),$(call nl-escape,$(2)),$(1))
+
+# unescape-nl
+#
+# Usage: text = $(call unescape-nl,escaped-text[,escape])
+#
+# See escape-nl.
+#
+unescape-nl = $(subst $(call nl-escape,$(2)),$(newline),$(1))
+
+# shell-escape-nl
+#
+# Usage: $(shell some-command | $(call shell-escape-nl[,escape]))
+#
+# Use this to escape newlines from within a shell call;
+# the default escape is a bizarre string.
+#
+# NOTE: The escape is used directly as a string constant
+# in an `awk' program that is delimited by shell
+# single-quotes, so be wary of the characters
+# that are chosen.
+#
+define shell-escape-nl
+awk 'NR==1 {t=$$0} NR>1 {t=t "$(nl-escape)" $$0} END {printf t}'
+endef
+
+# shell-unescape-nl
+#
+# Usage: $(shell some-command | $(call shell-unescape-nl[,escape]))
+#
+# Use this to unescape newlines from within a shell call;
+# the default escape is a bizarre string.
+#
+# NOTE: The escape is used directly as an extended regular
+# expression constant in an `awk' program that is
+# delimited by shell single-quotes, so be wary
+# of the characters that are chosen.
+#
+# (The bash shell has a bug where `{gsub(...),...}' is
+# misinterpreted as a brace expansion; this can be
+# overcome by putting a space between `{' and `gsub').
+#
+define shell-unescape-nl
+awk 'NR==1 {t=$$0} NR>1 {t=t "\n" $$0} END { gsub(/$(nl-escape)/,"\n",t); printf t }'
+endef
+
+# escape-for-shell-sq
+#
+# Usage: embeddable-text = $(call escape-for-shell-sq,text)
+#
+# This function produces text that is suitable for
+# embedding in a shell string that is delimited by
+# single-quotes.
+#
+escape-for-shell-sq = $(subst ','\'',$(1))
+
+# shell-sq
+#
+# Usage: single-quoted-and-escaped-text = $(call shell-sq,text)
+#
+shell-sq = '$(escape-for-shell-sq)'
+
+# shell-wordify
+#
+# Usage: wordified-text = $(call shell-wordify,text)
+#
+# For instance:
+#
+# |define text
+# |hello
+# |world
+# |endef
+# |
+# |target:
+# | echo $(call shell-wordify,$(text))
+#
+# At least GNU make gets confused by expanding a newline
+# within the context of a command line of a makefile rule
+# (this is in constrast to a `$(shell ...)' function call,
+# which can handle it just fine).
+#
+# This function avoids the problem by producing a string
+# that works as a shell word, regardless of whether or
+# not it contains a newline.
+#
+# If the text to be wordified contains a newline, then
+# an intrictate shell command substitution is constructed
+# to render the text as a single line; when the shell
+# processes the resulting escaped text, it transforms
+# it into the original unescaped text.
+#
+# If the text does not contain a newline, then this function
+# produces the same results as the `$(shell-sq)' function.
+#
+shell-wordify = $(if $(findstring $(newline),$(1)),$(_sw-esc-nl),$(shell-sq))
+define _sw-esc-nl
+"$$(echo $(call escape-nl,$(shell-sq),$(2)) | $(call shell-unescape-nl,$(2)))"
+endef
+
+# is-absolute
+#
+# Usage: bool-value = $(call is-absolute,path)
+#
+is-absolute = $(shell echo $(shell-sq) | grep ^/ -q && echo y)
+
+# lookup
+#
+# Usage: absolute-executable-path-or-empty = $(call lookup,path)
+#
+# (It's necessary to use `sh -c' because GNU make messes up by
+# trying too hard and getting things wrong).
+#
+lookup = $(call unescape-nl,$(shell sh -c $(_l-sh)))
+_l-sh = $(call shell-sq,command -v $(shell-sq) | $(call shell-escape-nl,))
+
+# is-executable
+#
+# Usage: bool-value = $(call is-executable,path)
+#
+# (It's necessary to use `sh -c' because GNU make messes up by
+# trying too hard and getting things wrong).
+#
+is-executable = $(call _is-executable-helper,$(shell-sq))
+_is-executable-helper = $(shell sh -c $(_is-executable-sh))
+_is-executable-sh = $(call shell-sq,test -f $(1) -a -x $(1) && echo y)
+
+# get-executable
+#
+# Usage: absolute-executable-path-or-empty = $(call get-executable,path)
+#
+# The goal is to get an absolute path for an executable;
+# the `command -v' is defined by POSIX, but it's not
+# necessarily very portable, so it's only used if
+# relative path resolution is requested, as determined
+# by the presence of a leading `/'.
+#
+get-executable = $(if $(1),$(if $(is-absolute),$(_ge-abspath),$(lookup)))
+_ge-abspath = $(if $(is-executable),$(1))
+
+# get-supplied-or-default-executable
+#
+# Usage: absolute-executable-path-or-empty = $(call get-executable-or-default,variable,default)
+#
+define get-executable-or-default
+$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2)))
+endef
+_ge_attempt = $(or $(get-executable),$(_gea_warn),$(call _gea_err,$(2)))
+_gea_warn = $(warning The path '$(1)' is not executable.)
+_gea_err = $(if $(1),$(error Please set '$(1)' appropriately))
+
+# try-cc
+# Usage: option = $(call try-cc, source-to-build, cc-options)
+try-cc = $(shell sh -c \
+ 'TMP="$(OUTPUT)$(TMPOUT).$$$$"; \
+ echo "$(1)" | \
+ $(CC) -x c - $(2) -o "$$TMP" > /dev/null 2>&1 && echo y; \
+ rm -f "$$TMP"')
+++ /dev/null
-define SOURCE_HELLO
-#include <stdio.h>
-int main(void)
-{
- return puts(\"hi\");
-}
-endef
-
-ifndef NO_DWARF
-define SOURCE_DWARF
-#include <dwarf.h>
-#include <elfutils/libdw.h>
-#include <elfutils/version.h>
-#ifndef _ELFUTILS_PREREQ
-#error
-#endif
-
-int main(void)
-{
- Dwarf *dbg = dwarf_begin(0, DWARF_C_READ);
- return (long)dbg;
-}
-endef
-endif
-
-define SOURCE_LIBELF
-#include <libelf.h>
-
-int main(void)
-{
- Elf *elf = elf_begin(0, ELF_C_READ, 0);
- return (long)elf;
-}
-endef
-
-define SOURCE_GLIBC
-#include <gnu/libc-version.h>
-
-int main(void)
-{
- const char *version = gnu_get_libc_version();
- return (long)version;
-}
-endef
-
-define SOURCE_ELF_MMAP
-#include <libelf.h>
-int main(void)
-{
- Elf *elf = elf_begin(0, ELF_C_READ_MMAP, 0);
- return (long)elf;
-}
-endef
-
-ifndef NO_NEWT
-define SOURCE_NEWT
-#include <newt.h>
-
-int main(void)
-{
- newtInit();
- newtCls();
- return newtFinished();
-}
-endef
-endif
-
-ifndef NO_LIBPERL
-define SOURCE_PERL_EMBED
-#include <EXTERN.h>
-#include <perl.h>
-
-int main(void)
-{
-perl_alloc();
-return 0;
-}
-endef
-endif
-
-ifndef NO_LIBPYTHON
-define SOURCE_PYTHON_EMBED
-#include <Python.h>
-
-int main(void)
-{
- Py_Initialize();
- return 0;
-}
-endef
-endif
-
-define SOURCE_BFD
-#include <bfd.h>
-
-int main(void)
-{
- bfd_demangle(0, 0, 0);
- return 0;
-}
-endef
-
-define SOURCE_CPLUS_DEMANGLE
-extern char *cplus_demangle(const char *, int);
-
-int main(void)
-{
- cplus_demangle(0, 0);
- return 0;
-}
-endef
-
-define SOURCE_STRLCPY
-#include <stdlib.h>
-extern size_t strlcpy(char *dest, const char *src, size_t size);
-
-int main(void)
-{
- strlcpy(NULL, NULL, 0);
- return 0;
-}
-endef
-
-# try-cc
-# Usage: option = $(call try-cc, source-to-build, cc-options)
-try-cc = $(shell sh -c \
- 'TMP="$(OUTPUT)$(TMPOUT).$$$$"; \
- echo "$(1)" | \
- $(CC) -x c - $(2) -o "$$TMP" > /dev/null 2>&1 && echo y; \
- rm -f "$$TMP"')
return NULL;
}
-union perf_event *perf_evlist__read_on_cpu(struct perf_evlist *evlist, int cpu)
+union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
{
/* XXX Move this to perf.c, making it generally available */
unsigned int page_size = sysconf(_SC_PAGE_SIZE);
- struct perf_mmap *md = &evlist->mmap[cpu];
+ struct perf_mmap *md = &evlist->mmap[idx];
unsigned int head = perf_mmap__read_head(md);
unsigned int old = md->prev;
unsigned char *data = md->base + page_size;
void perf_evlist__munmap(struct perf_evlist *evlist)
{
- int cpu;
+ int i;
- for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
- if (evlist->mmap[cpu].base != NULL) {
- munmap(evlist->mmap[cpu].base, evlist->mmap_len);
- evlist->mmap[cpu].base = NULL;
+ for (i = 0; i < evlist->nr_mmaps; i++) {
+ if (evlist->mmap[i].base != NULL) {
+ munmap(evlist->mmap[i].base, evlist->mmap_len);
+ evlist->mmap[i].base = NULL;
}
}
+
+ free(evlist->mmap);
+ evlist->mmap = NULL;
}
int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
{
- evlist->mmap = zalloc(evlist->cpus->nr * sizeof(struct perf_mmap));
+ evlist->nr_mmaps = evlist->cpus->nr;
+ if (evlist->cpus->map[0] == -1)
+ evlist->nr_mmaps = evlist->threads->nr;
+ evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
return evlist->mmap != NULL ? 0 : -ENOMEM;
}
static int __perf_evlist__mmap(struct perf_evlist *evlist, struct perf_evsel *evsel,
- int cpu, int prot, int mask, int fd)
+ int idx, int prot, int mask, int fd)
{
- evlist->mmap[cpu].prev = 0;
- evlist->mmap[cpu].mask = mask;
- evlist->mmap[cpu].base = mmap(NULL, evlist->mmap_len, prot,
+ evlist->mmap[idx].prev = 0;
+ evlist->mmap[idx].mask = mask;
+ evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, prot,
MAP_SHARED, fd, 0);
- if (evlist->mmap[cpu].base == MAP_FAILED) {
- if (evlist->cpus->map[cpu] == -1 && evsel->attr.inherit)
+ if (evlist->mmap[idx].base == MAP_FAILED) {
+ if (evlist->cpus->map[idx] == -1 && evsel->attr.inherit)
ui__warning("Inherit is not allowed on per-task "
"events using mmap.\n");
return -1;
return 0;
}
+static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot, int mask)
+{
+ struct perf_evsel *evsel;
+ int cpu, thread;
+
+ for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
+ int output = -1;
+
+ for (thread = 0; thread < evlist->threads->nr; thread++) {
+ list_for_each_entry(evsel, &evlist->entries, node) {
+ int fd = FD(evsel, cpu, thread);
+
+ if (output == -1) {
+ output = fd;
+ if (__perf_evlist__mmap(evlist, evsel, cpu,
+ prot, mask, output) < 0)
+ goto out_unmap;
+ } else {
+ if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, output) != 0)
+ goto out_unmap;
+ }
+
+ if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
+ perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0)
+ goto out_unmap;
+ }
+ }
+ }
+
+ return 0;
+
+out_unmap:
+ for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
+ if (evlist->mmap[cpu].base != NULL) {
+ munmap(evlist->mmap[cpu].base, evlist->mmap_len);
+ evlist->mmap[cpu].base = NULL;
+ }
+ }
+ return -1;
+}
+
+static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot, int mask)
+{
+ struct perf_evsel *evsel;
+ int thread;
+
+ for (thread = 0; thread < evlist->threads->nr; thread++) {
+ int output = -1;
+
+ list_for_each_entry(evsel, &evlist->entries, node) {
+ int fd = FD(evsel, 0, thread);
+
+ if (output == -1) {
+ output = fd;
+ if (__perf_evlist__mmap(evlist, evsel, thread,
+ prot, mask, output) < 0)
+ goto out_unmap;
+ } else {
+ if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, output) != 0)
+ goto out_unmap;
+ }
+
+ if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
+ perf_evlist__id_add_fd(evlist, evsel, 0, thread, fd) < 0)
+ goto out_unmap;
+ }
+ }
+
+ return 0;
+
+out_unmap:
+ for (thread = 0; thread < evlist->threads->nr; thread++) {
+ if (evlist->mmap[thread].base != NULL) {
+ munmap(evlist->mmap[thread].base, evlist->mmap_len);
+ evlist->mmap[thread].base = NULL;
+ }
+ }
+ return -1;
+}
+
/** perf_evlist__mmap - Create per cpu maps to receive events
*
* @evlist - list of events
int perf_evlist__mmap(struct perf_evlist *evlist, int pages, bool overwrite)
{
unsigned int page_size = sysconf(_SC_PAGE_SIZE);
- int mask = pages * page_size - 1, cpu;
- struct perf_evsel *first_evsel, *evsel;
+ int mask = pages * page_size - 1;
+ struct perf_evsel *evsel;
const struct cpu_map *cpus = evlist->cpus;
const struct thread_map *threads = evlist->threads;
- int thread, prot = PROT_READ | (overwrite ? 0 : PROT_WRITE);
+ int prot = PROT_READ | (overwrite ? 0 : PROT_WRITE);
if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
return -ENOMEM;
evlist->overwrite = overwrite;
evlist->mmap_len = (pages + 1) * page_size;
- first_evsel = list_entry(evlist->entries.next, struct perf_evsel, node);
list_for_each_entry(evsel, &evlist->entries, node) {
if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
evsel->sample_id == NULL &&
perf_evsel__alloc_id(evsel, cpus->nr, threads->nr) < 0)
return -ENOMEM;
-
- for (cpu = 0; cpu < cpus->nr; cpu++) {
- for (thread = 0; thread < threads->nr; thread++) {
- int fd = FD(evsel, cpu, thread);
-
- if (evsel->idx || thread) {
- if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT,
- FD(first_evsel, cpu, 0)) != 0)
- goto out_unmap;
- } else if (__perf_evlist__mmap(evlist, evsel, cpu,
- prot, mask, fd) < 0)
- goto out_unmap;
-
- if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
- perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0)
- goto out_unmap;
- }
- }
}
- return 0;
+ if (evlist->cpus->map[0] == -1)
+ return perf_evlist__mmap_per_thread(evlist, prot, mask);
-out_unmap:
- for (cpu = 0; cpu < cpus->nr; cpu++) {
- if (evlist->mmap[cpu].base != NULL) {
- munmap(evlist->mmap[cpu].base, evlist->mmap_len);
- evlist->mmap[cpu].base = NULL;
- }
- }
- return -1;
+ return perf_evlist__mmap_per_cpu(evlist, prot, mask);
}
int perf_evlist__create_maps(struct perf_evlist *evlist, pid_t target_pid,
if (evlist->threads == NULL)
return -1;
- if (target_tid != -1)
+ if (cpu_list == NULL && target_tid != -1)
evlist->cpus = cpu_map__dummy_new();
else
evlist->cpus = cpu_map__new(cpu_list);
struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
int nr_entries;
int nr_fds;
+ int nr_mmaps;
int mmap_len;
bool overwrite;
union perf_event event_copy;
struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
-union perf_event *perf_evlist__read_on_cpu(struct perf_evlist *self, int cpu);
+union perf_event *perf_evlist__mmap_read(struct perf_evlist *self, int idx);
int perf_evlist__alloc_mmap(struct perf_evlist *evlist);
int perf_evlist__mmap(struct perf_evlist *evlist, int pages, bool overwrite);
--- /dev/null
+#ifndef _PERF_ASM_ALTERNATIVE_ASM_H
+#define _PERF_ASM_ALTERNATIVE_ASM_H
+
+/* Just disable it so we can build arch/x86/lib/memcpy_64.S for perf bench: */
+
+#define altinstruction_entry #
+
+#endif
#define CSW(x) .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_##x
static struct event_symbol event_symbols[] = {
- { CHW(CPU_CYCLES), "cpu-cycles", "cycles" },
- { CHW(INSTRUCTIONS), "instructions", "" },
- { CHW(CACHE_REFERENCES), "cache-references", "" },
- { CHW(CACHE_MISSES), "cache-misses", "" },
- { CHW(BRANCH_INSTRUCTIONS), "branch-instructions", "branches" },
- { CHW(BRANCH_MISSES), "branch-misses", "" },
- { CHW(BUS_CYCLES), "bus-cycles", "" },
-
- { CSW(CPU_CLOCK), "cpu-clock", "" },
- { CSW(TASK_CLOCK), "task-clock", "" },
- { CSW(PAGE_FAULTS), "page-faults", "faults" },
- { CSW(PAGE_FAULTS_MIN), "minor-faults", "" },
- { CSW(PAGE_FAULTS_MAJ), "major-faults", "" },
- { CSW(CONTEXT_SWITCHES), "context-switches", "cs" },
- { CSW(CPU_MIGRATIONS), "cpu-migrations", "migrations" },
- { CSW(ALIGNMENT_FAULTS), "alignment-faults", "" },
- { CSW(EMULATION_FAULTS), "emulation-faults", "" },
+ { CHW(CPU_CYCLES), "cpu-cycles", "cycles" },
+ { CHW(STALLED_CYCLES_FRONTEND), "stalled-cycles-frontend", "idle-cycles-frontend" },
+ { CHW(STALLED_CYCLES_BACKEND), "stalled-cycles-backend", "idle-cycles-backend" },
+ { CHW(INSTRUCTIONS), "instructions", "" },
+ { CHW(CACHE_REFERENCES), "cache-references", "" },
+ { CHW(CACHE_MISSES), "cache-misses", "" },
+ { CHW(BRANCH_INSTRUCTIONS), "branch-instructions", "branches" },
+ { CHW(BRANCH_MISSES), "branch-misses", "" },
+ { CHW(BUS_CYCLES), "bus-cycles", "" },
+
+ { CSW(CPU_CLOCK), "cpu-clock", "" },
+ { CSW(TASK_CLOCK), "task-clock", "" },
+ { CSW(PAGE_FAULTS), "page-faults", "faults" },
+ { CSW(PAGE_FAULTS_MIN), "minor-faults", "" },
+ { CSW(PAGE_FAULTS_MAJ), "major-faults", "" },
+ { CSW(CONTEXT_SWITCHES), "context-switches", "cs" },
+ { CSW(CPU_MIGRATIONS), "cpu-migrations", "migrations" },
+ { CSW(ALIGNMENT_FAULTS), "alignment-faults", "" },
+ { CSW(EMULATION_FAULTS), "emulation-faults", "" },
};
#define __PERF_EVENT_FIELD(config, name) \
((config & PERF_EVENT_##name##_MASK) >> PERF_EVENT_##name##_SHIFT)
-#define PERF_EVENT_RAW(config) __PERF_EVENT_FIELD(config, RAW)
+#define PERF_EVENT_RAW(config) __PERF_EVENT_FIELD(config, RAW)
#define PERF_EVENT_CONFIG(config) __PERF_EVENT_FIELD(config, CONFIG)
-#define PERF_EVENT_TYPE(config) __PERF_EVENT_FIELD(config, TYPE)
+#define PERF_EVENT_TYPE(config) __PERF_EVENT_FIELD(config, TYPE)
#define PERF_EVENT_ID(config) __PERF_EVENT_FIELD(config, EVENT)
-static const char *hw_event_names[] = {
+static const char *hw_event_names[PERF_COUNT_HW_MAX] = {
"cycles",
"instructions",
"cache-references",
"branches",
"branch-misses",
"bus-cycles",
+ "stalled-cycles-frontend",
+ "stalled-cycles-backend",
};
-static const char *sw_event_names[] = {
- "cpu-clock-msecs",
- "task-clock-msecs",
+static const char *sw_event_names[PERF_COUNT_SW_MAX] = {
+ "cpu-clock",
+ "task-clock",
"page-faults",
"context-switches",
"CPU-migrations",
switch (type) {
case PERF_TYPE_HARDWARE:
- if (config < PERF_COUNT_HW_MAX)
+ if (config < PERF_COUNT_HW_MAX && hw_event_names[config])
return hw_event_names[config];
return "unknown-hardware";
}
case PERF_TYPE_SOFTWARE:
- if (config < PERF_COUNT_SW_MAX)
+ if (config < PERF_COUNT_SW_MAX && sw_event_names[config])
return sw_event_names[config];
return "unknown-software";
int n;
n = strlen(event_symbols[i].symbol);
- if (!strncmp(str, event_symbols[i].symbol, n))
+ if (!strncasecmp(str, event_symbols[i].symbol, n))
return n;
n = strlen(event_symbols[i].alias);
- if (n)
- if (!strncmp(str, event_symbols[i].alias, n))
+ if (n) {
+ if (!strncasecmp(str, event_symbols[i].alias, n))
return n;
+ }
+
return 0;
}
return EVT_FAILED;
}
-static enum event_result
+static int
parse_event_modifier(const char **strp, struct perf_event_attr *attr)
{
const char *str = *strp;
int exclude = 0;
int eu = 0, ek = 0, eh = 0, precise = 0;
- if (*str++ != ':')
+ if (!*str)
+ return 0;
+
+ if (*str == ',')
return 0;
+
+ if (*str++ != ':')
+ return -1;
+
while (*str) {
if (*str == 'u') {
if (!exclude)
++str;
}
- if (str >= *strp + 2) {
- *strp = str;
- attr->exclude_user = eu;
- attr->exclude_kernel = ek;
- attr->exclude_hv = eh;
- attr->precise_ip = precise;
- return 1;
- }
+ if (str < *strp + 2)
+ return -1;
+
+ *strp = str;
+
+ attr->exclude_user = eu;
+ attr->exclude_kernel = ek;
+ attr->exclude_hv = eh;
+ attr->precise_ip = precise;
+
return 0;
}
return EVT_FAILED;
modifier:
- parse_event_modifier(str, attr);
+ if (parse_event_modifier(str, attr) < 0) {
+ fprintf(stderr, "invalid event modifier: '%s'\n", *str);
+ fprintf(stderr, "Run 'perf list' for a list of valid events and modifiers\n");
+
+ return EVT_FAILED;
+ }
return ret;
}
snprintf(evt_path, MAXPATHLEN, "%s:%s",
sys_dirent.d_name, evt_dirent.d_name);
- printf(" %-42s [%s]\n", evt_path,
+ printf(" %-50s [%s]\n", evt_path,
event_type_descriptors[PERF_TYPE_TRACEPOINT]);
}
closedir(evt_dir);
else
snprintf(name, sizeof(name), "%s", syms->symbol);
- printf(" %-42s [%s]\n", name,
+ printf(" %-50s [%s]\n", name,
event_type_descriptors[type]);
}
}
for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
char *name = event_cache_name(type, op, i);
- if (event_glob != NULL &&
- !strglobmatch(name, event_glob))
+ if (event_glob != NULL && !strglobmatch(name, event_glob))
continue;
- printf(" %-42s [%s]\n", name,
+ printf(" %-50s [%s]\n", name,
event_type_descriptors[PERF_TYPE_HW_CACHE]);
++printed;
}
return printed;
}
+#define MAX_NAME_LEN 100
+
/*
* Print the help text for the event symbols:
*/
void print_events(const char *event_glob)
{
- struct event_symbol *syms = event_symbols;
unsigned int i, type, prev_type = -1, printed = 0, ntypes_printed = 0;
- char name[40];
+ struct event_symbol *syms = event_symbols;
+ char name[MAX_NAME_LEN];
printf("\n");
printf("List of pre-defined events (to be used in -e):\n");
continue;
if (strlen(syms->alias))
- sprintf(name, "%s OR %s", syms->symbol, syms->alias);
+ snprintf(name, MAX_NAME_LEN, "%s OR %s", syms->symbol, syms->alias);
else
- strcpy(name, syms->symbol);
- printf(" %-42s [%s]\n", name,
+ strncpy(name, syms->symbol, MAX_NAME_LEN);
+ printf(" %-50s [%s]\n", name,
event_type_descriptors[type]);
prev_type = type;
return;
printf("\n");
- printf(" %-42s [%s]\n",
+ printf(" %-50s [%s]\n",
"rNNN (see 'perf list --help' on how to encode it)",
event_type_descriptors[PERF_TYPE_RAW]);
printf("\n");
- printf(" %-42s [%s]\n",
+ printf(" %-50s [%s]\n",
"mem:<addr>[:access]",
event_type_descriptors[PERF_TYPE_BREAKPOINT]);
printf("\n");
return _param.retval;
}
+struct pubname_callback_param {
+ char *function;
+ char *file;
+ Dwarf_Die *cu_die;
+ Dwarf_Die *sp_die;
+ int found;
+};
+
+static int pubname_search_cb(Dwarf *dbg, Dwarf_Global *gl, void *data)
+{
+ struct pubname_callback_param *param = data;
+
+ if (dwarf_offdie(dbg, gl->die_offset, param->sp_die)) {
+ if (dwarf_tag(param->sp_die) != DW_TAG_subprogram)
+ return DWARF_CB_OK;
+
+ if (die_compare_name(param->sp_die, param->function)) {
+ if (!dwarf_offdie(dbg, gl->cu_offset, param->cu_die))
+ return DWARF_CB_OK;
+
+ if (param->file &&
+ strtailcmp(param->file, dwarf_decl_file(param->sp_die)))
+ return DWARF_CB_OK;
+
+ param->found = 1;
+ return DWARF_CB_ABORT;
+ }
+ }
+
+ return DWARF_CB_OK;
+}
+
/* Find probe points from debuginfo */
static int find_probes(int fd, struct probe_finder *pf)
{
off = 0;
line_list__init(&pf->lcache);
+
+ /* Fastpath: lookup by function name from .debug_pubnames section */
+ if (pp->function) {
+ struct pubname_callback_param pubname_param = {
+ .function = pp->function,
+ .file = pp->file,
+ .cu_die = &pf->cu_die,
+ .sp_die = &pf->sp_die,
+ .found = 0,
+ };
+ struct dwarf_callback_param probe_param = {
+ .data = pf,
+ };
+
+ dwarf_getpubnames(dbg, pubname_search_cb, &pubname_param, 0);
+ if (pubname_param.found) {
+ ret = probe_point_search_cb(&pf->sp_die, &probe_param);
+ if (ret)
+ goto found;
+ }
+ }
+
/* Loop on CUs (Compilation Unit) */
while (!dwarf_nextcu(dbg, off, &noff, &cuhl, NULL, NULL, NULL)) {
/* Get the DIE(Debugging Information Entry) of this CU */
}
off = noff;
}
+
+found:
line_list__free(&pf->lcache);
if (dwfl)
dwfl_end(dwfl);
return -EBADF;
}
+ /* Fastpath: lookup by function name from .debug_pubnames section */
+ if (lr->function) {
+ struct pubname_callback_param pubname_param = {
+ .function = lr->function, .file = lr->file,
+ .cu_die = &lf.cu_die, .sp_die = &lf.sp_die, .found = 0};
+ struct dwarf_callback_param line_range_param = {
+ .data = (void *)&lf, .retval = 0};
+
+ dwarf_getpubnames(dbg, pubname_search_cb, &pubname_param, 0);
+ if (pubname_param.found) {
+ line_range_search_cb(&lf.sp_die, &line_range_param);
+ if (lf.found)
+ goto found;
+ }
+ }
+
/* Loop on CUs (Compilation Unit) */
while (!lf.found && ret >= 0) {
if (dwarf_nextcu(dbg, off, &noff, &cuhl, NULL, NULL, NULL) != 0)
off = noff;
}
+found:
/* Store comp_dir */
if (lf.found) {
comp_dir = cu_get_comp_dir(&lf.cu_die);
Dwarf_Addr addr; /* Address */
const char *fname; /* Real file name */
Dwarf_Die cu_die; /* Current CU */
+ Dwarf_Die sp_die;
struct list_head lcache; /* Line cache for lazy match */
/* For variable searching */
int lno_s; /* Start line number */
int lno_e; /* End line number */
Dwarf_Die cu_die; /* Current CU */
+ Dwarf_Die sp_die;
int found;
};
&cpu, &sample_id_all))
return NULL;
- event = perf_evlist__read_on_cpu(evlist, cpu);
+ event = perf_evlist__mmap_read(evlist, cpu);
if (event != NULL) {
struct perf_evsel *first;
PyObject *pyevent = pyrf_event__new(event);
{ "COUNT_HW_CACHE_RESULT_ACCESS", PERF_COUNT_HW_CACHE_RESULT_ACCESS },
{ "COUNT_HW_CACHE_RESULT_MISS", PERF_COUNT_HW_CACHE_RESULT_MISS },
+ { "COUNT_HW_STALLED_CYCLES_FRONTEND", PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
+ { "COUNT_HW_STALLED_CYCLES_BACKEND", PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
+
{ "COUNT_SW_CPU_CLOCK", PERF_COUNT_SW_CPU_CLOCK },
{ "COUNT_SW_TASK_CLOCK", PERF_COUNT_SW_TASK_CLOCK },
{ "COUNT_SW_PAGE_FAULTS", PERF_COUNT_SW_PAGE_FAULTS },
return ret;
}
+struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
+ unsigned int type)
+{
+ struct perf_evsel *pos;
+
+ list_for_each_entry(pos, &session->evlist->entries, node) {
+ if (pos->attr.type == type)
+ return pos;
+ }
+ return NULL;
+}
+
void perf_session__print_symbols(union perf_event *event,
struct perf_sample *sample,
struct perf_session *session)
session->sample_id_all, sample);
}
+struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
+ unsigned int type);
+
void perf_session__print_symbols(union perf_event *event,
struct perf_sample *sample,
struct perf_session *session);
#define NT_GNU_BUILD_ID 3
#endif
-static bool dso__build_id_equal(const struct dso *self, u8 *build_id);
+static bool dso__build_id_equal(const struct dso *dso, u8 *build_id);
static int elf_read_build_id(Elf *elf, void *bf, size_t size);
static void dsos__add(struct list_head *head, struct dso *dso);
static struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
-static int dso__load_kernel_sym(struct dso *self, struct map *map,
+static int dso__load_kernel_sym(struct dso *dso, struct map *map,
symbol_filter_t filter);
-static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map,
symbol_filter_t filter);
static int vmlinux_path__nr_entries;
static char **vmlinux_path;
.symfs = "",
};
-int dso__name_len(const struct dso *self)
+int dso__name_len(const struct dso *dso)
{
if (verbose)
- return self->long_name_len;
+ return dso->long_name_len;
- return self->short_name_len;
+ return dso->short_name_len;
}
-bool dso__loaded(const struct dso *self, enum map_type type)
+bool dso__loaded(const struct dso *dso, enum map_type type)
{
- return self->loaded & (1 << type);
+ return dso->loaded & (1 << type);
}
-bool dso__sorted_by_name(const struct dso *self, enum map_type type)
+bool dso__sorted_by_name(const struct dso *dso, enum map_type type)
{
- return self->sorted_by_name & (1 << type);
+ return dso->sorted_by_name & (1 << type);
}
-static void dso__set_sorted_by_name(struct dso *self, enum map_type type)
+static void dso__set_sorted_by_name(struct dso *dso, enum map_type type)
{
- self->sorted_by_name |= (1 << type);
+ dso->sorted_by_name |= (1 << type);
}
bool symbol_type__is_a(char symbol_type, enum map_type map_type)
}
}
-static void symbols__fixup_end(struct rb_root *self)
+static void symbols__fixup_end(struct rb_root *symbols)
{
- struct rb_node *nd, *prevnd = rb_first(self);
+ struct rb_node *nd, *prevnd = rb_first(symbols);
struct symbol *curr, *prev;
if (prevnd == NULL)
curr->end = roundup(curr->start, 4096);
}
-static void __map_groups__fixup_end(struct map_groups *self, enum map_type type)
+static void __map_groups__fixup_end(struct map_groups *mg, enum map_type type)
{
struct map *prev, *curr;
- struct rb_node *nd, *prevnd = rb_first(&self->maps[type]);
+ struct rb_node *nd, *prevnd = rb_first(&mg->maps[type]);
if (prevnd == NULL)
return;
curr->end = ~0ULL;
}
-static void map_groups__fixup_end(struct map_groups *self)
+static void map_groups__fixup_end(struct map_groups *mg)
{
int i;
for (i = 0; i < MAP__NR_TYPES; ++i)
- __map_groups__fixup_end(self, i);
+ __map_groups__fixup_end(mg, i);
}
static struct symbol *symbol__new(u64 start, u64 len, u8 binding,
const char *name)
{
size_t namelen = strlen(name) + 1;
- struct symbol *self = calloc(1, (symbol_conf.priv_size +
- sizeof(*self) + namelen));
- if (self == NULL)
+ struct symbol *sym = calloc(1, (symbol_conf.priv_size +
+ sizeof(*sym) + namelen));
+ if (sym == NULL)
return NULL;
if (symbol_conf.priv_size)
- self = ((void *)self) + symbol_conf.priv_size;
-
- self->start = start;
- self->end = len ? start + len - 1 : start;
- self->binding = binding;
- self->namelen = namelen - 1;
+ sym = ((void *)sym) + symbol_conf.priv_size;
- pr_debug4("%s: %s %#" PRIx64 "-%#" PRIx64 "\n", __func__, name, start, self->end);
+ sym->start = start;
+ sym->end = len ? start + len - 1 : start;
+ sym->binding = binding;
+ sym->namelen = namelen - 1;
- memcpy(self->name, name, namelen);
+ pr_debug4("%s: %s %#" PRIx64 "-%#" PRIx64 "\n",
+ __func__, name, start, sym->end);
+ memcpy(sym->name, name, namelen);
- return self;
+ return sym;
}
-void symbol__delete(struct symbol *self)
+void symbol__delete(struct symbol *sym)
{
- free(((void *)self) - symbol_conf.priv_size);
+ free(((void *)sym) - symbol_conf.priv_size);
}
-static size_t symbol__fprintf(struct symbol *self, FILE *fp)
+static size_t symbol__fprintf(struct symbol *sym, FILE *fp)
{
return fprintf(fp, " %" PRIx64 "-%" PRIx64 " %c %s\n",
- self->start, self->end,
- self->binding == STB_GLOBAL ? 'g' :
- self->binding == STB_LOCAL ? 'l' : 'w',
- self->name);
+ sym->start, sym->end,
+ sym->binding == STB_GLOBAL ? 'g' :
+ sym->binding == STB_LOCAL ? 'l' : 'w',
+ sym->name);
}
-void dso__set_long_name(struct dso *self, char *name)
+void dso__set_long_name(struct dso *dso, char *name)
{
if (name == NULL)
return;
- self->long_name = name;
- self->long_name_len = strlen(name);
+ dso->long_name = name;
+ dso->long_name_len = strlen(name);
}
-static void dso__set_short_name(struct dso *self, const char *name)
+static void dso__set_short_name(struct dso *dso, const char *name)
{
if (name == NULL)
return;
- self->short_name = name;
- self->short_name_len = strlen(name);
+ dso->short_name = name;
+ dso->short_name_len = strlen(name);
}
-static void dso__set_basename(struct dso *self)
+static void dso__set_basename(struct dso *dso)
{
- dso__set_short_name(self, basename(self->long_name));
+ dso__set_short_name(dso, basename(dso->long_name));
}
struct dso *dso__new(const char *name)
{
- struct dso *self = calloc(1, sizeof(*self) + strlen(name) + 1);
+ struct dso *dso = calloc(1, sizeof(*dso) + strlen(name) + 1);
- if (self != NULL) {
+ if (dso != NULL) {
int i;
- strcpy(self->name, name);
- dso__set_long_name(self, self->name);
- dso__set_short_name(self, self->name);
+ strcpy(dso->name, name);
+ dso__set_long_name(dso, dso->name);
+ dso__set_short_name(dso, dso->name);
for (i = 0; i < MAP__NR_TYPES; ++i)
- self->symbols[i] = self->symbol_names[i] = RB_ROOT;
- self->symtab_type = SYMTAB__NOT_FOUND;
- self->loaded = 0;
- self->sorted_by_name = 0;
- self->has_build_id = 0;
- self->kernel = DSO_TYPE_USER;
- INIT_LIST_HEAD(&self->node);
+ dso->symbols[i] = dso->symbol_names[i] = RB_ROOT;
+ dso->symtab_type = SYMTAB__NOT_FOUND;
+ dso->loaded = 0;
+ dso->sorted_by_name = 0;
+ dso->has_build_id = 0;
+ dso->kernel = DSO_TYPE_USER;
+ INIT_LIST_HEAD(&dso->node);
}
- return self;
+ return dso;
}
-static void symbols__delete(struct rb_root *self)
+static void symbols__delete(struct rb_root *symbols)
{
struct symbol *pos;
- struct rb_node *next = rb_first(self);
+ struct rb_node *next = rb_first(symbols);
while (next) {
pos = rb_entry(next, struct symbol, rb_node);
next = rb_next(&pos->rb_node);
- rb_erase(&pos->rb_node, self);
+ rb_erase(&pos->rb_node, symbols);
symbol__delete(pos);
}
}
-void dso__delete(struct dso *self)
+void dso__delete(struct dso *dso)
{
int i;
for (i = 0; i < MAP__NR_TYPES; ++i)
- symbols__delete(&self->symbols[i]);
- if (self->sname_alloc)
- free((char *)self->short_name);
- if (self->lname_alloc)
- free(self->long_name);
- free(self);
+ symbols__delete(&dso->symbols[i]);
+ if (dso->sname_alloc)
+ free((char *)dso->short_name);
+ if (dso->lname_alloc)
+ free(dso->long_name);
+ free(dso);
}
-void dso__set_build_id(struct dso *self, void *build_id)
+void dso__set_build_id(struct dso *dso, void *build_id)
{
- memcpy(self->build_id, build_id, sizeof(self->build_id));
- self->has_build_id = 1;
+ memcpy(dso->build_id, build_id, sizeof(dso->build_id));
+ dso->has_build_id = 1;
}
-static void symbols__insert(struct rb_root *self, struct symbol *sym)
+static void symbols__insert(struct rb_root *symbols, struct symbol *sym)
{
- struct rb_node **p = &self->rb_node;
+ struct rb_node **p = &symbols->rb_node;
struct rb_node *parent = NULL;
const u64 ip = sym->start;
struct symbol *s;
p = &(*p)->rb_right;
}
rb_link_node(&sym->rb_node, parent, p);
- rb_insert_color(&sym->rb_node, self);
+ rb_insert_color(&sym->rb_node, symbols);
}
-static struct symbol *symbols__find(struct rb_root *self, u64 ip)
+static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
{
struct rb_node *n;
- if (self == NULL)
+ if (symbols == NULL)
return NULL;
- n = self->rb_node;
+ n = symbols->rb_node;
while (n) {
struct symbol *s = rb_entry(n, struct symbol, rb_node);
struct symbol sym;
};
-static void symbols__insert_by_name(struct rb_root *self, struct symbol *sym)
+static void symbols__insert_by_name(struct rb_root *symbols, struct symbol *sym)
{
- struct rb_node **p = &self->rb_node;
+ struct rb_node **p = &symbols->rb_node;
struct rb_node *parent = NULL;
struct symbol_name_rb_node *symn, *s;
p = &(*p)->rb_right;
}
rb_link_node(&symn->rb_node, parent, p);
- rb_insert_color(&symn->rb_node, self);
+ rb_insert_color(&symn->rb_node, symbols);
}
-static void symbols__sort_by_name(struct rb_root *self, struct rb_root *source)
+static void symbols__sort_by_name(struct rb_root *symbols,
+ struct rb_root *source)
{
struct rb_node *nd;
for (nd = rb_first(source); nd; nd = rb_next(nd)) {
struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
- symbols__insert_by_name(self, pos);
+ symbols__insert_by_name(symbols, pos);
}
}
-static struct symbol *symbols__find_by_name(struct rb_root *self, const char *name)
+static struct symbol *symbols__find_by_name(struct rb_root *symbols,
+ const char *name)
{
struct rb_node *n;
- if (self == NULL)
+ if (symbols == NULL)
return NULL;
- n = self->rb_node;
+ n = symbols->rb_node;
while (n) {
struct symbol_name_rb_node *s;
return NULL;
}
-struct symbol *dso__find_symbol(struct dso *self,
+struct symbol *dso__find_symbol(struct dso *dso,
enum map_type type, u64 addr)
{
- return symbols__find(&self->symbols[type], addr);
+ return symbols__find(&dso->symbols[type], addr);
}
-struct symbol *dso__find_symbol_by_name(struct dso *self, enum map_type type,
+struct symbol *dso__find_symbol_by_name(struct dso *dso, enum map_type type,
const char *name)
{
- return symbols__find_by_name(&self->symbol_names[type], name);
+ return symbols__find_by_name(&dso->symbol_names[type], name);
}
-void dso__sort_by_name(struct dso *self, enum map_type type)
+void dso__sort_by_name(struct dso *dso, enum map_type type)
{
- dso__set_sorted_by_name(self, type);
- return symbols__sort_by_name(&self->symbol_names[type],
- &self->symbols[type]);
+ dso__set_sorted_by_name(dso, type);
+ return symbols__sort_by_name(&dso->symbol_names[type],
+ &dso->symbols[type]);
}
-int build_id__sprintf(const u8 *self, int len, char *bf)
+int build_id__sprintf(const u8 *build_id, int len, char *bf)
{
char *bid = bf;
- const u8 *raw = self;
+ const u8 *raw = build_id;
int i;
for (i = 0; i < len; ++i) {
bid += 2;
}
- return raw - self;
+ return raw - build_id;
}
-size_t dso__fprintf_buildid(struct dso *self, FILE *fp)
+size_t dso__fprintf_buildid(struct dso *dso, FILE *fp)
{
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
- build_id__sprintf(self->build_id, sizeof(self->build_id), sbuild_id);
+ build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
return fprintf(fp, "%s", sbuild_id);
}
-size_t dso__fprintf_symbols_by_name(struct dso *self, enum map_type type, FILE *fp)
+size_t dso__fprintf_symbols_by_name(struct dso *dso,
+ enum map_type type, FILE *fp)
{
size_t ret = 0;
struct rb_node *nd;
struct symbol_name_rb_node *pos;
- for (nd = rb_first(&self->symbol_names[type]); nd; nd = rb_next(nd)) {
+ for (nd = rb_first(&dso->symbol_names[type]); nd; nd = rb_next(nd)) {
pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
fprintf(fp, "%s\n", pos->sym.name);
}
return ret;
}
-size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp)
+size_t dso__fprintf(struct dso *dso, enum map_type type, FILE *fp)
{
struct rb_node *nd;
- size_t ret = fprintf(fp, "dso: %s (", self->short_name);
+ size_t ret = fprintf(fp, "dso: %s (", dso->short_name);
- if (self->short_name != self->long_name)
- ret += fprintf(fp, "%s, ", self->long_name);
+ if (dso->short_name != dso->long_name)
+ ret += fprintf(fp, "%s, ", dso->long_name);
ret += fprintf(fp, "%s, %sloaded, ", map_type__name[type],
- self->loaded ? "" : "NOT ");
- ret += dso__fprintf_buildid(self, fp);
+ dso->loaded ? "" : "NOT ");
+ ret += dso__fprintf_buildid(dso, fp);
ret += fprintf(fp, ")\n");
- for (nd = rb_first(&self->symbols[type]); nd; nd = rb_next(nd)) {
+ for (nd = rb_first(&dso->symbols[type]); nd; nd = rb_next(nd)) {
struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
ret += symbol__fprintf(pos, fp);
}
* so that we can in the next step set the symbol ->end address and then
* call kernel_maps__split_kallsyms.
*/
-static int dso__load_all_kallsyms(struct dso *self, const char *filename,
+static int dso__load_all_kallsyms(struct dso *dso, const char *filename,
struct map *map)
{
- struct process_kallsyms_args args = { .map = map, .dso = self, };
+ struct process_kallsyms_args args = { .map = map, .dso = dso, };
return kallsyms__parse(filename, &args, map__process_kallsym_symbol);
}
* kernel range is broken in several maps, named [kernel].N, as we don't have
* the original ELF section names vmlinux have.
*/
-static int dso__split_kallsyms(struct dso *self, struct map *map,
+static int dso__split_kallsyms(struct dso *dso, struct map *map,
symbol_filter_t filter)
{
struct map_groups *kmaps = map__kmap(map)->kmaps;
struct map *curr_map = map;
struct symbol *pos;
int count = 0, moved = 0;
- struct rb_root *root = &self->symbols[map->type];
+ struct rb_root *root = &dso->symbols[map->type];
struct rb_node *next = rb_first(root);
int kernel_range = 0;
if (strcmp(curr_map->dso->short_name, module)) {
if (curr_map != map &&
- self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ dso->kernel == DSO_TYPE_GUEST_KERNEL &&
machine__is_default_guest(machine)) {
/*
* We assume all symbols of a module are
pos->end = curr_map->map_ip(curr_map, pos->end);
} else if (curr_map != map) {
char dso_name[PATH_MAX];
- struct dso *dso;
+ struct dso *ndso;
if (count == 0) {
curr_map = map;
goto filter_symbol;
}
- if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
snprintf(dso_name, sizeof(dso_name),
"[guest.kernel].%d",
kernel_range++);
"[kernel].%d",
kernel_range++);
- dso = dso__new(dso_name);
- if (dso == NULL)
+ ndso = dso__new(dso_name);
+ if (ndso == NULL)
return -1;
- dso->kernel = self->kernel;
+ ndso->kernel = dso->kernel;
- curr_map = map__new2(pos->start, dso, map->type);
+ curr_map = map__new2(pos->start, ndso, map->type);
if (curr_map == NULL) {
- dso__delete(dso);
+ dso__delete(ndso);
return -1;
}
}
if (curr_map != map &&
- self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ dso->kernel == DSO_TYPE_GUEST_KERNEL &&
machine__is_default_guest(kmaps->machine)) {
dso__set_loaded(curr_map->dso, curr_map->type);
}
return count + moved;
}
-int dso__load_kallsyms(struct dso *self, const char *filename,
+int dso__load_kallsyms(struct dso *dso, const char *filename,
struct map *map, symbol_filter_t filter)
{
- if (dso__load_all_kallsyms(self, filename, map) < 0)
+ if (dso__load_all_kallsyms(dso, filename, map) < 0)
return -1;
- if (self->kernel == DSO_TYPE_GUEST_KERNEL)
- self->symtab_type = SYMTAB__GUEST_KALLSYMS;
+ if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
+ dso->symtab_type = SYMTAB__GUEST_KALLSYMS;
else
- self->symtab_type = SYMTAB__KALLSYMS;
+ dso->symtab_type = SYMTAB__KALLSYMS;
- return dso__split_kallsyms(self, map, filter);
+ return dso__split_kallsyms(dso, map, filter);
}
-static int dso__load_perf_map(struct dso *self, struct map *map,
+static int dso__load_perf_map(struct dso *dso, struct map *map,
symbol_filter_t filter)
{
char *line = NULL;
FILE *file;
int nr_syms = 0;
- file = fopen(self->long_name, "r");
+ file = fopen(dso->long_name, "r");
if (file == NULL)
goto out_failure;
if (filter && filter(map, sym))
symbol__delete(sym);
else {
- symbols__insert(&self->symbols[map->type], sym);
+ symbols__insert(&dso->symbols[map->type], sym);
nr_syms++;
}
}
/**
* elf_symtab__for_each_symbol - iterate thru all the symbols
*
- * @self: struct elf_symtab instance to iterate
+ * @syms: struct elf_symtab instance to iterate
* @idx: uint32_t idx
* @sym: GElf_Sym iterator
*/
* And always look at the original dso, not at debuginfo packages, that
* have the PLT data stripped out (shdr_rel_plt.sh_type == SHT_NOBITS).
*/
-static int dso__synthesize_plt_symbols(struct dso *self, struct map *map,
+static int dso__synthesize_plt_symbols(struct dso *dso, struct map *map,
symbol_filter_t filter)
{
uint32_t nr_rel_entries, idx;
char name[PATH_MAX];
snprintf(name, sizeof(name), "%s%s",
- symbol_conf.symfs, self->long_name);
+ symbol_conf.symfs, dso->long_name);
fd = open(name, O_RDONLY);
if (fd < 0)
goto out;
if (filter && filter(map, f))
symbol__delete(f);
else {
- symbols__insert(&self->symbols[map->type], f);
+ symbols__insert(&dso->symbols[map->type], f);
++nr;
}
}
if (filter && filter(map, f))
symbol__delete(f);
else {
- symbols__insert(&self->symbols[map->type], f);
+ symbols__insert(&dso->symbols[map->type], f);
++nr;
}
}
return nr;
out:
pr_debug("%s: problems reading %s PLT info.\n",
- __func__, self->long_name);
+ __func__, dso->long_name);
return 0;
}
-static bool elf_sym__is_a(GElf_Sym *self, enum map_type type)
+static bool elf_sym__is_a(GElf_Sym *sym, enum map_type type)
{
switch (type) {
case MAP__FUNCTION:
- return elf_sym__is_function(self);
+ return elf_sym__is_function(sym);
case MAP__VARIABLE:
- return elf_sym__is_object(self);
+ return elf_sym__is_object(sym);
default:
return false;
}
}
-static bool elf_sec__is_a(GElf_Shdr *self, Elf_Data *secstrs, enum map_type type)
+static bool elf_sec__is_a(GElf_Shdr *shdr, Elf_Data *secstrs,
+ enum map_type type)
{
switch (type) {
case MAP__FUNCTION:
- return elf_sec__is_text(self, secstrs);
+ return elf_sec__is_text(shdr, secstrs);
case MAP__VARIABLE:
- return elf_sec__is_data(self, secstrs);
+ return elf_sec__is_data(shdr, secstrs);
default:
return false;
}
return -1;
}
-static int dso__load_sym(struct dso *self, struct map *map, const char *name,
+static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
int fd, symbol_filter_t filter, int kmodule,
int want_symtab)
{
- struct kmap *kmap = self->kernel ? map__kmap(map) : NULL;
+ struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
struct map *curr_map = map;
- struct dso *curr_dso = self;
+ struct dso *curr_dso = dso;
Elf_Data *symstrs, *secstrs;
uint32_t nr_syms;
int err = -1;
}
/* Always reject images with a mismatched build-id: */
- if (self->has_build_id) {
+ if (dso->has_build_id) {
u8 build_id[BUILD_ID_SIZE];
if (elf_read_build_id(elf, build_id,
BUILD_ID_SIZE) != BUILD_ID_SIZE)
goto out_elf_end;
- if (!dso__build_id_equal(self, build_id))
+ if (!dso__build_id_equal(dso, build_id))
goto out_elf_end;
}
nr_syms = shdr.sh_size / shdr.sh_entsize;
memset(&sym, 0, sizeof(sym));
- if (self->kernel == DSO_TYPE_USER) {
- self->adjust_symbols = (ehdr.e_type == ET_EXEC ||
+ if (dso->kernel == DSO_TYPE_USER) {
+ dso->adjust_symbols = (ehdr.e_type == ET_EXEC ||
elf_section_by_name(elf, &ehdr, &shdr,
".gnu.prelink_undo",
NULL) != NULL);
- } else self->adjust_symbols = 0;
-
+ } else {
+ dso->adjust_symbols = 0;
+ }
elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
struct symbol *f;
const char *elf_name = elf_sym__name(&sym, symstrs);
(sym.st_value & 1))
--sym.st_value;
- if (self->kernel != DSO_TYPE_USER || kmodule) {
+ if (dso->kernel != DSO_TYPE_USER || kmodule) {
char dso_name[PATH_MAX];
if (strcmp(section_name,
(curr_dso->short_name +
- self->short_name_len)) == 0)
+ dso->short_name_len)) == 0)
goto new_symbol;
if (strcmp(section_name, ".text") == 0) {
curr_map = map;
- curr_dso = self;
+ curr_dso = dso;
goto new_symbol;
}
snprintf(dso_name, sizeof(dso_name),
- "%s%s", self->short_name, section_name);
+ "%s%s", dso->short_name, section_name);
curr_map = map_groups__find_by_name(kmap->kmaps, map->type, dso_name);
if (curr_map == NULL) {
curr_dso = dso__new(dso_name);
if (curr_dso == NULL)
goto out_elf_end;
- curr_dso->kernel = self->kernel;
- curr_dso->long_name = self->long_name;
- curr_dso->long_name_len = self->long_name_len;
+ curr_dso->kernel = dso->kernel;
+ curr_dso->long_name = dso->long_name;
+ curr_dso->long_name_len = dso->long_name_len;
curr_map = map__new2(start, curr_dso,
map->type);
if (curr_map == NULL) {
}
curr_map->map_ip = identity__map_ip;
curr_map->unmap_ip = identity__map_ip;
- curr_dso->symtab_type = self->symtab_type;
+ curr_dso->symtab_type = dso->symtab_type;
map_groups__insert(kmap->kmaps, curr_map);
- dsos__add(&self->node, curr_dso);
+ dsos__add(&dso->node, curr_dso);
dso__set_loaded(curr_dso, map->type);
} else
curr_dso = curr_map->dso;
* For misannotated, zeroed, ASM function sizes.
*/
if (nr > 0) {
- symbols__fixup_end(&self->symbols[map->type]);
+ symbols__fixup_end(&dso->symbols[map->type]);
if (kmap) {
/*
* We need to fixup this here too because we create new
return err;
}
-static bool dso__build_id_equal(const struct dso *self, u8 *build_id)
+static bool dso__build_id_equal(const struct dso *dso, u8 *build_id)
{
- return memcmp(self->build_id, build_id, sizeof(self->build_id)) == 0;
+ return memcmp(dso->build_id, build_id, sizeof(dso->build_id)) == 0;
}
bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
return err;
}
-char dso__symtab_origin(const struct dso *self)
+char dso__symtab_origin(const struct dso *dso)
{
static const char origin[] = {
[SYMTAB__KALLSYMS] = 'k',
[SYMTAB__GUEST_KMODULE] = 'G',
};
- if (self == NULL || self->symtab_type == SYMTAB__NOT_FOUND)
+ if (dso == NULL || dso->symtab_type == SYMTAB__NOT_FOUND)
return '!';
- return origin[self->symtab_type];
+ return origin[dso->symtab_type];
}
-int dso__load(struct dso *self, struct map *map, symbol_filter_t filter)
+int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
{
int size = PATH_MAX;
char *name;
const char *root_dir;
int want_symtab;
- dso__set_loaded(self, map->type);
+ dso__set_loaded(dso, map->type);
- if (self->kernel == DSO_TYPE_KERNEL)
- return dso__load_kernel_sym(self, map, filter);
- else if (self->kernel == DSO_TYPE_GUEST_KERNEL)
- return dso__load_guest_kernel_sym(self, map, filter);
+ if (dso->kernel == DSO_TYPE_KERNEL)
+ return dso__load_kernel_sym(dso, map, filter);
+ else if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
+ return dso__load_guest_kernel_sym(dso, map, filter);
if (map->groups && map->groups->machine)
machine = map->groups->machine;
if (!name)
return -1;
- self->adjust_symbols = 0;
+ dso->adjust_symbols = 0;
- if (strncmp(self->name, "/tmp/perf-", 10) == 0) {
- ret = dso__load_perf_map(self, map, filter);
- self->symtab_type = ret > 0 ? SYMTAB__JAVA_JIT :
+ if (strncmp(dso->name, "/tmp/perf-", 10) == 0) {
+ ret = dso__load_perf_map(dso, map, filter);
+ dso->symtab_type = ret > 0 ? SYMTAB__JAVA_JIT :
SYMTAB__NOT_FOUND;
return ret;
}
*/
want_symtab = 1;
restart:
- for (self->symtab_type = SYMTAB__BUILD_ID_CACHE;
- self->symtab_type != SYMTAB__NOT_FOUND;
- self->symtab_type++) {
- switch (self->symtab_type) {
+ for (dso->symtab_type = SYMTAB__BUILD_ID_CACHE;
+ dso->symtab_type != SYMTAB__NOT_FOUND;
+ dso->symtab_type++) {
+ switch (dso->symtab_type) {
case SYMTAB__BUILD_ID_CACHE:
/* skip the locally configured cache if a symfs is given */
if (symbol_conf.symfs[0] ||
- (dso__build_id_filename(self, name, size) == NULL)) {
+ (dso__build_id_filename(dso, name, size) == NULL)) {
continue;
}
break;
case SYMTAB__FEDORA_DEBUGINFO:
snprintf(name, size, "%s/usr/lib/debug%s.debug",
- symbol_conf.symfs, self->long_name);
+ symbol_conf.symfs, dso->long_name);
break;
case SYMTAB__UBUNTU_DEBUGINFO:
snprintf(name, size, "%s/usr/lib/debug%s",
- symbol_conf.symfs, self->long_name);
+ symbol_conf.symfs, dso->long_name);
break;
case SYMTAB__BUILDID_DEBUGINFO: {
char build_id_hex[BUILD_ID_SIZE * 2 + 1];
- if (!self->has_build_id)
+ if (!dso->has_build_id)
continue;
- build_id__sprintf(self->build_id,
- sizeof(self->build_id),
+ build_id__sprintf(dso->build_id,
+ sizeof(dso->build_id),
build_id_hex);
snprintf(name, size,
"%s/usr/lib/debug/.build-id/%.2s/%s.debug",
break;
case SYMTAB__SYSTEM_PATH_DSO:
snprintf(name, size, "%s%s",
- symbol_conf.symfs, self->long_name);
+ symbol_conf.symfs, dso->long_name);
break;
case SYMTAB__GUEST_KMODULE:
if (map->groups && machine)
else
root_dir = "";
snprintf(name, size, "%s%s%s", symbol_conf.symfs,
- root_dir, self->long_name);
+ root_dir, dso->long_name);
break;
case SYMTAB__SYSTEM_PATH_KMODULE:
snprintf(name, size, "%s%s", symbol_conf.symfs,
- self->long_name);
+ dso->long_name);
break;
default:;
}
if (fd < 0)
continue;
- ret = dso__load_sym(self, map, name, fd, filter, 0,
+ ret = dso__load_sym(dso, map, name, fd, filter, 0,
want_symtab);
close(fd);
continue;
if (ret > 0) {
- int nr_plt = dso__synthesize_plt_symbols(self, map, filter);
+ int nr_plt = dso__synthesize_plt_symbols(dso, map,
+ filter);
if (nr_plt > 0)
ret += nr_plt;
break;
}
free(name);
- if (ret < 0 && strstr(self->name, " (deleted)") != NULL)
+ if (ret < 0 && strstr(dso->name, " (deleted)") != NULL)
return 0;
return ret;
}
-struct map *map_groups__find_by_name(struct map_groups *self,
+struct map *map_groups__find_by_name(struct map_groups *mg,
enum map_type type, const char *name)
{
struct rb_node *nd;
- for (nd = rb_first(&self->maps[type]); nd; nd = rb_next(nd)) {
+ for (nd = rb_first(&mg->maps[type]); nd; nd = rb_next(nd)) {
struct map *map = rb_entry(nd, struct map, rb_node);
if (map->dso && strcmp(map->dso->short_name, name) == 0)
return NULL;
}
-static int dso__kernel_module_get_build_id(struct dso *self,
- const char *root_dir)
+static int dso__kernel_module_get_build_id(struct dso *dso,
+ const char *root_dir)
{
char filename[PATH_MAX];
/*
* kernel module short names are of the form "[module]" and
* we need just "module" here.
*/
- const char *name = self->short_name + 1;
+ const char *name = dso->short_name + 1;
snprintf(filename, sizeof(filename),
"%s/sys/module/%.*s/notes/.note.gnu.build-id",
root_dir, (int)strlen(name) - 1, name);
- if (sysfs__read_build_id(filename, self->build_id,
- sizeof(self->build_id)) == 0)
- self->has_build_id = true;
+ if (sysfs__read_build_id(filename, dso->build_id,
+ sizeof(dso->build_id)) == 0)
+ dso->has_build_id = true;
return 0;
}
-static int map_groups__set_modules_path_dir(struct map_groups *self,
+static int map_groups__set_modules_path_dir(struct map_groups *mg,
const char *dir_name)
{
struct dirent *dent;
snprintf(path, sizeof(path), "%s/%s",
dir_name, dent->d_name);
- ret = map_groups__set_modules_path_dir(self, path);
+ ret = map_groups__set_modules_path_dir(mg, path);
if (ret < 0)
goto out;
} else {
(int)(dot - dent->d_name), dent->d_name);
strxfrchar(dso_name, '-', '_');
- map = map_groups__find_by_name(self, MAP__FUNCTION, dso_name);
+ map = map_groups__find_by_name(mg, MAP__FUNCTION,
+ dso_name);
if (map == NULL)
continue;
return strdup(name);
}
-static int machine__set_modules_path(struct machine *self)
+static int machine__set_modules_path(struct machine *machine)
{
char *version;
char modules_path[PATH_MAX];
- version = get_kernel_version(self->root_dir);
+ version = get_kernel_version(machine->root_dir);
if (!version)
return -1;
snprintf(modules_path, sizeof(modules_path), "%s/lib/modules/%s/kernel",
- self->root_dir, version);
+ machine->root_dir, version);
free(version);
- return map_groups__set_modules_path_dir(&self->kmaps, modules_path);
+ return map_groups__set_modules_path_dir(&machine->kmaps, modules_path);
}
/*
*/
static struct map *map__new2(u64 start, struct dso *dso, enum map_type type)
{
- struct map *self = calloc(1, (sizeof(*self) +
- (dso->kernel ? sizeof(struct kmap) : 0)));
- if (self != NULL) {
+ struct map *map = calloc(1, (sizeof(*map) +
+ (dso->kernel ? sizeof(struct kmap) : 0)));
+ if (map != NULL) {
/*
* ->end will be filled after we load all the symbols
*/
- map__init(self, type, start, 0, 0, dso);
+ map__init(map, type, start, 0, 0, dso);
}
- return self;
+ return map;
}
-struct map *machine__new_module(struct machine *self, u64 start,
+struct map *machine__new_module(struct machine *machine, u64 start,
const char *filename)
{
struct map *map;
- struct dso *dso = __dsos__findnew(&self->kernel_dsos, filename);
+ struct dso *dso = __dsos__findnew(&machine->kernel_dsos, filename);
if (dso == NULL)
return NULL;
if (map == NULL)
return NULL;
- if (machine__is_host(self))
+ if (machine__is_host(machine))
dso->symtab_type = SYMTAB__SYSTEM_PATH_KMODULE;
else
dso->symtab_type = SYMTAB__GUEST_KMODULE;
- map_groups__insert(&self->kmaps, map);
+ map_groups__insert(&machine->kmaps, map);
return map;
}
-static int machine__create_modules(struct machine *self)
+static int machine__create_modules(struct machine *machine)
{
char *line = NULL;
size_t n;
const char *modules;
char path[PATH_MAX];
- if (machine__is_default_guest(self))
+ if (machine__is_default_guest(machine))
modules = symbol_conf.default_guest_modules;
else {
- sprintf(path, "%s/proc/modules", self->root_dir);
+ sprintf(path, "%s/proc/modules", machine->root_dir);
modules = path;
}
*sep = '\0';
snprintf(name, sizeof(name), "[%s]", line);
- map = machine__new_module(self, start, name);
+ map = machine__new_module(machine, start, name);
if (map == NULL)
goto out_delete_line;
- dso__kernel_module_get_build_id(map->dso, self->root_dir);
+ dso__kernel_module_get_build_id(map->dso, machine->root_dir);
}
free(line);
fclose(file);
- return machine__set_modules_path(self);
+ return machine__set_modules_path(machine);
out_delete_line:
free(line);
return -1;
}
-int dso__load_vmlinux(struct dso *self, struct map *map,
+int dso__load_vmlinux(struct dso *dso, struct map *map,
const char *vmlinux, symbol_filter_t filter)
{
int err = -1, fd;
if (fd < 0)
return -1;
- dso__set_long_name(self, (char *)vmlinux);
- dso__set_loaded(self, map->type);
- err = dso__load_sym(self, map, symfs_vmlinux, fd, filter, 0, 0);
+ dso__set_long_name(dso, (char *)vmlinux);
+ dso__set_loaded(dso, map->type);
+ err = dso__load_sym(dso, map, symfs_vmlinux, fd, filter, 0, 0);
close(fd);
if (err > 0)
return err;
}
-int dso__load_vmlinux_path(struct dso *self, struct map *map,
+int dso__load_vmlinux_path(struct dso *dso, struct map *map,
symbol_filter_t filter)
{
int i, err = 0;
pr_debug("Looking at the vmlinux_path (%d entries long)\n",
vmlinux_path__nr_entries + 1);
- filename = dso__build_id_filename(self, NULL, 0);
+ filename = dso__build_id_filename(dso, NULL, 0);
if (filename != NULL) {
- err = dso__load_vmlinux(self, map, filename, filter);
+ err = dso__load_vmlinux(dso, map, filename, filter);
if (err > 0) {
- dso__set_long_name(self, filename);
+ dso__set_long_name(dso, filename);
goto out;
}
free(filename);
}
for (i = 0; i < vmlinux_path__nr_entries; ++i) {
- err = dso__load_vmlinux(self, map, vmlinux_path[i], filter);
+ err = dso__load_vmlinux(dso, map, vmlinux_path[i], filter);
if (err > 0) {
- dso__set_long_name(self, strdup(vmlinux_path[i]));
+ dso__set_long_name(dso, strdup(vmlinux_path[i]));
break;
}
}
return err;
}
-static int dso__load_kernel_sym(struct dso *self, struct map *map,
+static int dso__load_kernel_sym(struct dso *dso, struct map *map,
symbol_filter_t filter)
{
int err;
}
if (symbol_conf.vmlinux_name != NULL) {
- err = dso__load_vmlinux(self, map,
+ err = dso__load_vmlinux(dso, map,
symbol_conf.vmlinux_name, filter);
if (err > 0) {
- dso__set_long_name(self,
+ dso__set_long_name(dso,
strdup(symbol_conf.vmlinux_name));
goto out_fixup;
}
}
if (vmlinux_path != NULL) {
- err = dso__load_vmlinux_path(self, map, filter);
+ err = dso__load_vmlinux_path(dso, map, filter);
if (err > 0)
goto out_fixup;
}
* we have a build-id, so check if it is the same as the running kernel,
* using it if it is.
*/
- if (self->has_build_id) {
+ if (dso->has_build_id) {
u8 kallsyms_build_id[BUILD_ID_SIZE];
char sbuild_id[BUILD_ID_SIZE * 2 + 1];
if (sysfs__read_build_id("/sys/kernel/notes", kallsyms_build_id,
sizeof(kallsyms_build_id)) == 0) {
- if (dso__build_id_equal(self, kallsyms_build_id)) {
+ if (dso__build_id_equal(dso, kallsyms_build_id)) {
kallsyms_filename = "/proc/kallsyms";
goto do_kallsyms;
}
* Now look if we have it on the build-id cache in
* $HOME/.debug/[kernel.kallsyms].
*/
- build_id__sprintf(self->build_id, sizeof(self->build_id),
+ build_id__sprintf(dso->build_id, sizeof(dso->build_id),
sbuild_id);
if (asprintf(&kallsyms_allocated_filename,
}
do_kallsyms:
- err = dso__load_kallsyms(self, kallsyms_filename, map, filter);
+ err = dso__load_kallsyms(dso, kallsyms_filename, map, filter);
if (err > 0)
pr_debug("Using %s for symbols\n", kallsyms_filename);
free(kallsyms_allocated_filename);
if (err > 0) {
out_fixup:
if (kallsyms_filename != NULL)
- dso__set_long_name(self, strdup("[kernel.kallsyms]"));
+ dso__set_long_name(dso, strdup("[kernel.kallsyms]"));
map__fixup_start(map);
map__fixup_end(map);
}
return err;
}
-static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
- symbol_filter_t filter)
+static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map,
+ symbol_filter_t filter)
{
int err;
const char *kallsyms_filename = NULL;
* Or use file guest_kallsyms inputted by user on commandline
*/
if (symbol_conf.default_guest_vmlinux_name != NULL) {
- err = dso__load_vmlinux(self, map,
+ err = dso__load_vmlinux(dso, map,
symbol_conf.default_guest_vmlinux_name, filter);
goto out_try_fixup;
}
kallsyms_filename = path;
}
- err = dso__load_kallsyms(self, kallsyms_filename, map, filter);
+ err = dso__load_kallsyms(dso, kallsyms_filename, map, filter);
if (err > 0)
pr_debug("Using %s for symbols\n", kallsyms_filename);
if (err > 0) {
if (kallsyms_filename != NULL) {
machine__mmap_name(machine, path, sizeof(path));
- dso__set_long_name(self, strdup(path));
+ dso__set_long_name(dso, strdup(path));
}
map__fixup_start(map);
map__fixup_end(map);
return ret;
}
-size_t machines__fprintf_dsos(struct rb_root *self, FILE *fp)
+size_t machines__fprintf_dsos(struct rb_root *machines, FILE *fp)
{
struct rb_node *nd;
size_t ret = 0;
- for (nd = rb_first(self); nd; nd = rb_next(nd)) {
+ for (nd = rb_first(machines); nd; nd = rb_next(nd)) {
struct machine *pos = rb_entry(nd, struct machine, rb_node);
ret += __dsos__fprintf(&pos->kernel_dsos, fp);
ret += __dsos__fprintf(&pos->user_dsos, fp);
return ret;
}
-size_t machine__fprintf_dsos_buildid(struct machine *self, FILE *fp, bool with_hits)
+size_t machine__fprintf_dsos_buildid(struct machine *machine, FILE *fp,
+ bool with_hits)
{
- return __dsos__fprintf_buildid(&self->kernel_dsos, fp, with_hits) +
- __dsos__fprintf_buildid(&self->user_dsos, fp, with_hits);
+ return __dsos__fprintf_buildid(&machine->kernel_dsos, fp, with_hits) +
+ __dsos__fprintf_buildid(&machine->user_dsos, fp, with_hits);
}
-size_t machines__fprintf_dsos_buildid(struct rb_root *self, FILE *fp, bool with_hits)
+size_t machines__fprintf_dsos_buildid(struct rb_root *machines,
+ FILE *fp, bool with_hits)
{
struct rb_node *nd;
size_t ret = 0;
- for (nd = rb_first(self); nd; nd = rb_next(nd)) {
+ for (nd = rb_first(machines); nd; nd = rb_next(nd)) {
struct machine *pos = rb_entry(nd, struct machine, rb_node);
ret += machine__fprintf_dsos_buildid(pos, fp, with_hits);
}
struct dso *dso__new_kernel(const char *name)
{
- struct dso *self = dso__new(name ?: "[kernel.kallsyms]");
+ struct dso *dso = dso__new(name ?: "[kernel.kallsyms]");
- if (self != NULL) {
- dso__set_short_name(self, "[kernel]");
- self->kernel = DSO_TYPE_KERNEL;
+ if (dso != NULL) {
+ dso__set_short_name(dso, "[kernel]");
+ dso->kernel = DSO_TYPE_KERNEL;
}
- return self;
+ return dso;
}
static struct dso *dso__new_guest_kernel(struct machine *machine,
const char *name)
{
char bf[PATH_MAX];
- struct dso *self = dso__new(name ?: machine__mmap_name(machine, bf, sizeof(bf)));
-
- if (self != NULL) {
- dso__set_short_name(self, "[guest.kernel]");
- self->kernel = DSO_TYPE_GUEST_KERNEL;
+ struct dso *dso = dso__new(name ?: machine__mmap_name(machine, bf,
+ sizeof(bf)));
+ if (dso != NULL) {
+ dso__set_short_name(dso, "[guest.kernel]");
+ dso->kernel = DSO_TYPE_GUEST_KERNEL;
}
- return self;
+ return dso;
}
-void dso__read_running_kernel_build_id(struct dso *self, struct machine *machine)
+void dso__read_running_kernel_build_id(struct dso *dso, struct machine *machine)
{
char path[PATH_MAX];
if (machine__is_default_guest(machine))
return;
sprintf(path, "%s/sys/kernel/notes", machine->root_dir);
- if (sysfs__read_build_id(path, self->build_id,
- sizeof(self->build_id)) == 0)
- self->has_build_id = true;
+ if (sysfs__read_build_id(path, dso->build_id,
+ sizeof(dso->build_id)) == 0)
+ dso->has_build_id = true;
}
-static struct dso *machine__create_kernel(struct machine *self)
+static struct dso *machine__create_kernel(struct machine *machine)
{
const char *vmlinux_name = NULL;
struct dso *kernel;
- if (machine__is_host(self)) {
+ if (machine__is_host(machine)) {
vmlinux_name = symbol_conf.vmlinux_name;
kernel = dso__new_kernel(vmlinux_name);
} else {
- if (machine__is_default_guest(self))
+ if (machine__is_default_guest(machine))
vmlinux_name = symbol_conf.default_guest_vmlinux_name;
- kernel = dso__new_guest_kernel(self, vmlinux_name);
+ kernel = dso__new_guest_kernel(machine, vmlinux_name);
}
if (kernel != NULL) {
- dso__read_running_kernel_build_id(kernel, self);
- dsos__add(&self->kernel_dsos, kernel);
+ dso__read_running_kernel_build_id(kernel, machine);
+ dsos__add(&machine->kernel_dsos, kernel);
}
return kernel;
}
return args.start;
}
-int __machine__create_kernel_maps(struct machine *self, struct dso *kernel)
+int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
{
enum map_type type;
- u64 start = machine__get_kernel_start_addr(self);
+ u64 start = machine__get_kernel_start_addr(machine);
for (type = 0; type < MAP__NR_TYPES; ++type) {
struct kmap *kmap;
- self->vmlinux_maps[type] = map__new2(start, kernel, type);
- if (self->vmlinux_maps[type] == NULL)
+ machine->vmlinux_maps[type] = map__new2(start, kernel, type);
+ if (machine->vmlinux_maps[type] == NULL)
return -1;
- self->vmlinux_maps[type]->map_ip =
- self->vmlinux_maps[type]->unmap_ip = identity__map_ip;
-
- kmap = map__kmap(self->vmlinux_maps[type]);
- kmap->kmaps = &self->kmaps;
- map_groups__insert(&self->kmaps, self->vmlinux_maps[type]);
+ machine->vmlinux_maps[type]->map_ip =
+ machine->vmlinux_maps[type]->unmap_ip =
+ identity__map_ip;
+ kmap = map__kmap(machine->vmlinux_maps[type]);
+ kmap->kmaps = &machine->kmaps;
+ map_groups__insert(&machine->kmaps,
+ machine->vmlinux_maps[type]);
}
return 0;
}
-void machine__destroy_kernel_maps(struct machine *self)
+void machine__destroy_kernel_maps(struct machine *machine)
{
enum map_type type;
for (type = 0; type < MAP__NR_TYPES; ++type) {
struct kmap *kmap;
- if (self->vmlinux_maps[type] == NULL)
+ if (machine->vmlinux_maps[type] == NULL)
continue;
- kmap = map__kmap(self->vmlinux_maps[type]);
- map_groups__remove(&self->kmaps, self->vmlinux_maps[type]);
+ kmap = map__kmap(machine->vmlinux_maps[type]);
+ map_groups__remove(&machine->kmaps,
+ machine->vmlinux_maps[type]);
if (kmap->ref_reloc_sym) {
/*
* ref_reloc_sym is shared among all maps, so free just
kmap->ref_reloc_sym = NULL;
}
- map__delete(self->vmlinux_maps[type]);
- self->vmlinux_maps[type] = NULL;
+ map__delete(machine->vmlinux_maps[type]);
+ machine->vmlinux_maps[type] = NULL;
}
}
-int machine__create_kernel_maps(struct machine *self)
+int machine__create_kernel_maps(struct machine *machine)
{
- struct dso *kernel = machine__create_kernel(self);
+ struct dso *kernel = machine__create_kernel(machine);
if (kernel == NULL ||
- __machine__create_kernel_maps(self, kernel) < 0)
+ __machine__create_kernel_maps(machine, kernel) < 0)
return -1;
- if (symbol_conf.use_modules && machine__create_modules(self) < 0)
+ if (symbol_conf.use_modules && machine__create_modules(machine) < 0)
pr_debug("Problems creating module maps, continuing anyway...\n");
/*
* Now that we have all the maps created, just set the ->end of them:
*/
- map_groups__fixup_end(&self->kmaps);
+ map_groups__fixup_end(&machine->kmaps);
return 0;
}
return -1;
}
-size_t machine__fprintf_vmlinux_path(struct machine *self, FILE *fp)
+size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
{
int i;
size_t printed = 0;
- struct dso *kdso = self->vmlinux_maps[MAP__FUNCTION]->dso;
+ struct dso *kdso = machine->vmlinux_maps[MAP__FUNCTION]->dso;
if (kdso->has_build_id) {
char filename[PATH_MAX];
symbol_conf.initialized = false;
}
-int machines__create_kernel_maps(struct rb_root *self, pid_t pid)
+int machines__create_kernel_maps(struct rb_root *machines, pid_t pid)
{
- struct machine *machine = machines__findnew(self, pid);
+ struct machine *machine = machines__findnew(machines, pid);
if (machine == NULL)
return -1;
return s;
}
-int machines__create_guest_kernel_maps(struct rb_root *self)
+int machines__create_guest_kernel_maps(struct rb_root *machines)
{
int ret = 0;
struct dirent **namelist = NULL;
if (symbol_conf.default_guest_vmlinux_name ||
symbol_conf.default_guest_modules ||
symbol_conf.default_guest_kallsyms) {
- machines__create_kernel_maps(self, DEFAULT_GUEST_KERNEL_ID);
+ machines__create_kernel_maps(machines, DEFAULT_GUEST_KERNEL_ID);
}
if (symbol_conf.guestmount) {
pr_debug("Can't access file %s\n", path);
goto failure;
}
- machines__create_kernel_maps(self, pid);
+ machines__create_kernel_maps(machines, pid);
}
failure:
free(namelist);
return ret;
}
-void machines__destroy_guest_kernel_maps(struct rb_root *self)
+void machines__destroy_guest_kernel_maps(struct rb_root *machines)
{
- struct rb_node *next = rb_first(self);
+ struct rb_node *next = rb_first(machines);
while (next) {
struct machine *pos = rb_entry(next, struct machine, rb_node);
next = rb_next(&pos->rb_node);
- rb_erase(&pos->rb_node, self);
+ rb_erase(&pos->rb_node, machines);
machine__delete(pos);
}
}
-int machine__load_kallsyms(struct machine *self, const char *filename,
+int machine__load_kallsyms(struct machine *machine, const char *filename,
enum map_type type, symbol_filter_t filter)
{
- struct map *map = self->vmlinux_maps[type];
+ struct map *map = machine->vmlinux_maps[type];
int ret = dso__load_kallsyms(map->dso, filename, map, filter);
if (ret > 0) {
* kernel, with modules between them, fixup the end of all
* sections.
*/
- __map_groups__fixup_end(&self->kmaps, type);
+ __map_groups__fixup_end(&machine->kmaps, type);
}
return ret;
}
-int machine__load_vmlinux_path(struct machine *self, enum map_type type,
+int machine__load_vmlinux_path(struct machine *machine, enum map_type type,
symbol_filter_t filter)
{
- struct map *map = self->vmlinux_maps[type];
+ struct map *map = machine->vmlinux_maps[type];
int ret = dso__load_vmlinux_path(map->dso, map, filter);
if (ret > 0) {
char name[0];
};
-void symbol__delete(struct symbol *self);
+void symbol__delete(struct symbol *sym);
struct strlist;
extern struct symbol_conf symbol_conf;
-static inline void *symbol__priv(struct symbol *self)
+static inline void *symbol__priv(struct symbol *sym)
{
- return ((void *)self) - symbol_conf.priv_size;
+ return ((void *)sym) - symbol_conf.priv_size;
}
struct ref_reloc_sym {
struct dso *dso__new(const char *name);
struct dso *dso__new_kernel(const char *name);
-void dso__delete(struct dso *self);
+void dso__delete(struct dso *dso);
-int dso__name_len(const struct dso *self);
+int dso__name_len(const struct dso *dso);
-bool dso__loaded(const struct dso *self, enum map_type type);
-bool dso__sorted_by_name(const struct dso *self, enum map_type type);
+bool dso__loaded(const struct dso *dso, enum map_type type);
+bool dso__sorted_by_name(const struct dso *dso, enum map_type type);
-static inline void dso__set_loaded(struct dso *self, enum map_type type)
+static inline void dso__set_loaded(struct dso *dso, enum map_type type)
{
- self->loaded |= (1 << type);
+ dso->loaded |= (1 << type);
}
-void dso__sort_by_name(struct dso *self, enum map_type type);
+void dso__sort_by_name(struct dso *dso, enum map_type type);
struct dso *__dsos__findnew(struct list_head *head, const char *name);
-int dso__load(struct dso *self, struct map *map, symbol_filter_t filter);
-int dso__load_vmlinux(struct dso *self, struct map *map,
+int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter);
+int dso__load_vmlinux(struct dso *dso, struct map *map,
const char *vmlinux, symbol_filter_t filter);
-int dso__load_vmlinux_path(struct dso *self, struct map *map,
+int dso__load_vmlinux_path(struct dso *dso, struct map *map,
symbol_filter_t filter);
-int dso__load_kallsyms(struct dso *self, const char *filename, struct map *map,
+int dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map,
symbol_filter_t filter);
-int machine__load_kallsyms(struct machine *self, const char *filename,
+int machine__load_kallsyms(struct machine *machine, const char *filename,
enum map_type type, symbol_filter_t filter);
-int machine__load_vmlinux_path(struct machine *self, enum map_type type,
+int machine__load_vmlinux_path(struct machine *machine, enum map_type type,
symbol_filter_t filter);
size_t __dsos__fprintf(struct list_head *head, FILE *fp);
-size_t machine__fprintf_dsos_buildid(struct machine *self, FILE *fp, bool with_hits);
-size_t machines__fprintf_dsos(struct rb_root *self, FILE *fp);
-size_t machines__fprintf_dsos_buildid(struct rb_root *self, FILE *fp, bool with_hits);
-
-size_t dso__fprintf_buildid(struct dso *self, FILE *fp);
-size_t dso__fprintf_symbols_by_name(struct dso *self, enum map_type type, FILE *fp);
-size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp);
+size_t machine__fprintf_dsos_buildid(struct machine *machine,
+ FILE *fp, bool with_hits);
+size_t machines__fprintf_dsos(struct rb_root *machines, FILE *fp);
+size_t machines__fprintf_dsos_buildid(struct rb_root *machines,
+ FILE *fp, bool with_hits);
+size_t dso__fprintf_buildid(struct dso *dso, FILE *fp);
+size_t dso__fprintf_symbols_by_name(struct dso *dso,
+ enum map_type type, FILE *fp);
+size_t dso__fprintf(struct dso *dso, enum map_type type, FILE *fp);
enum symtab_type {
SYMTAB__KALLSYMS = 0,
SYMTAB__NOT_FOUND,
};
-char dso__symtab_origin(const struct dso *self);
-void dso__set_long_name(struct dso *self, char *name);
-void dso__set_build_id(struct dso *self, void *build_id);
-void dso__read_running_kernel_build_id(struct dso *self, struct machine *machine);
-struct symbol *dso__find_symbol(struct dso *self, enum map_type type, u64 addr);
-struct symbol *dso__find_symbol_by_name(struct dso *self, enum map_type type,
+char dso__symtab_origin(const struct dso *dso);
+void dso__set_long_name(struct dso *dso, char *name);
+void dso__set_build_id(struct dso *dso, void *build_id);
+void dso__read_running_kernel_build_id(struct dso *dso,
+ struct machine *machine);
+struct symbol *dso__find_symbol(struct dso *dso, enum map_type type,
+ u64 addr);
+struct symbol *dso__find_symbol_by_name(struct dso *dso, enum map_type type,
const char *name);
int filename__read_build_id(const char *filename, void *bf, size_t size);
int sysfs__read_build_id(const char *filename, void *bf, size_t size);
bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
-int build_id__sprintf(const u8 *self, int len, char *bf);
+int build_id__sprintf(const u8 *build_id, int len, char *bf);
int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
char type, u64 start, u64 end));
-void machine__destroy_kernel_maps(struct machine *self);
-int __machine__create_kernel_maps(struct machine *self, struct dso *kernel);
-int machine__create_kernel_maps(struct machine *self);
+void machine__destroy_kernel_maps(struct machine *machine);
+int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel);
+int machine__create_kernel_maps(struct machine *machine);
-int machines__create_kernel_maps(struct rb_root *self, pid_t pid);
-int machines__create_guest_kernel_maps(struct rb_root *self);
-void machines__destroy_guest_kernel_maps(struct rb_root *self);
+int machines__create_kernel_maps(struct rb_root *machines, pid_t pid);
+int machines__create_guest_kernel_maps(struct rb_root *machines);
+void machines__destroy_guest_kernel_maps(struct rb_root *machines);
int symbol__init(void);
void symbol__exit(void);
bool symbol_type__is_a(char symbol_type, enum map_type map_type);
-size_t machine__fprintf_vmlinux_path(struct machine *self, FILE *fp);
+size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
#endif /* __PERF_SYMBOL */