# A LOW OVERHEAD TEMPERATURE SENSOR FOR SELF-AWARE RECONFIGURABLE PLATFORMS

Rizwan Syed, Wenfeng Zhao, Yajun Ha and Bharadwaj Veeravalli

Department of Electrical and Computer Engineering National University of Singapore Singapore 117576 email: {*syed.rizwan,wenfengzhao,elehy,elebv*}@nus.edu.sg

### ABSTRACT

To enable self-awareness in reconfigurable platforms, FPGAs require sensors that measure various physical quantities. A physical quantity such as the temperature profile of an FPGA die allows the platform to perform thermal management by dynamically relocating workloads. Previous research has already demonstrated such self-aware and adaptive reconfigurable platforms. However, the relatively high resource utilization of their temperature sensors makes these proposals less useful. We develop a low overhead temperature sensor for reconfigurable platforms that utilizes only 7 flip-flops, 16 6-LUTs and 7 SRL32Es. This is 52% less than that of the state-of-the-art. The resolution of our temperature sensor is 0.5°C with a sampling period of only 1 ms, and the accuracy is $\pm 0.5^{\circ}$C using two-point calibration. We use the sensor in our reconfigurable platform to demonstrate its effectiveness.

### 1. INTRODUCTION

As transistor feature sizes shrink, integrated circuits achieve much higher densities and can realize a complex, high-performance system on a single chip. However, their ever-increasing power density per unit area also leads to thermal reliability issues. This holds true for high-performance CPUs, GPUs, ASICs and reconfigurable devices like FPGAs [1], and thermal management is critical for these platforms to ensure both correct functionality and a longer lifetime. Temperature sensors based on band-gap voltage references and ADCs have been reported for commercial CPUs. They are integrated on the CPU die itself for dynamic thermal monitoring and hot-spot profiling.

Temperature sensors are also needed on FPGA platforms to generate a die temperature profile and realize thermal monitoring for self-awareness. However, off-the-shelf FPGAs provide only a small, fixed number of temperature sensors, which cannot profile the whole die or be placed at flexible locations. As a result, hot spots cannot be effectively monitored. ASIC implementations of all-digital temperature sensors are reported in [2, 3]. These temperature sensors can be categorized as either time-domain (temperature-dependent delay) or frequency-domain (temperature-dependent frequency) sensors. Either type can also be implemented on FPGA platforms. The design of a temperature sensor on an FPGA involves trade-offs among resource utilization, accuracy, resolution and self-heating. To measure the die temperature profile, these sensors must be instantiated in large numbers across the entire die. As a result, each sensor should have a low resource utilization.

In this work, we propose a low overhead temperature sensor which can be instantiated in large numbers to measure the temperature profile of an FPGA die. The sensor uses 52% fewer resources than the state-of-the-art. It achieves a resolution of $0.5^{\circ}$C with a sampling period of only 1 ms, and an accuracy of $\pm 0.5^{\circ}$C using two-point calibration.

### 2. RELATED WORK

Some previous research works focused on thermal-aware application management [4, 5, 6, 7]. However, the actual performance and area requirements of the sensor were not discussed. Other works developed and demonstrated temperature sensors on FPGAs. However, due to their overheads and performance limitations, they were less useful in practice. A fully digital time-domain temperature sensor [2] was demonstrated on an FPGA using 140 logic elements. Later, the authors improved their design using a retriggerable ring oscillator [3], but that design was not optimized for FPGAs as it was targeted at ASICs. Other works [8] have also tried to improve the performance of their temperature sensors by using different techniques. However, their area overheads are significant. We demonstrate a low overhead time-domain temperature sensor using a retriggerable ring oscillator that offers comparable accuracy and improved sampling performance. It is modular and features direct digital readout in the calibrated unit.

**Fig. 1**. (a) Block Diagram of the Proposed Sensor, (b) Timing Diagram of the Temperature Sensor, (c) Layout of the Retriggerable Ring Oscillator in the target region of the FPGA, (d) Simplified Schematic of the Programmable Time Amplifier

### 3. CIRCUIT DESCRIPTION

In this section, we present our temperature sensor. Its block diagram is shown in Fig. 1(a). The complete design can be divided into three sections: (1) Main Sensor, (2) Reading Circuit, and (3) Calibration Circuit. Each section is explained in the following subsections.

#### 3.1. Main Sensor

The main sensor measures temperature through the increase in the period of the retriggerable ring oscillator. The actual increase in the period is very small and cannot be measured directly. Therefore, it is amplified using a programmable divider (Programmable Time Amplifier). The delay is accumulated over time during the division. As shown in Fig. 1, $t_{amp}$ enables the ring oscillator and the programmable divider divides the oscillation $t_{osc}$. Signal $t_{amp}$ is corrected using the programmable offset correction circuit to convert the extended period into a pulse that is directly proportional to the temperature with zero bias. In other words, at 0°C, the output is just the starting pulse. The pulse width of the output of a calibrated temperature sensor will be:

$$t_d = T \times t_{Clk} \times \alpha \times \beta + t_{Clk} \tag{1}$$

where $t_d$ is the pulse width of the output, $T$ is the temperature in the desired unit, $t_{Clk}$ is the period of the reference clock, $\alpha$ is the inverse of accuracy in the desired unit, and $\beta$ is the integration factor. It can be seen from the equation that the pulse width will be 1 clock period wider than the actual value. This is because a starting pulse is always added to the beginning of the output pulse to enable the synchronization at the reading circuit.
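As a quick numerical check of Equation (1), the sketch below expresses the pulse width in reference clock periods and inverts it to recover the temperature. The function names and the example values are illustrative assumptions: $\alpha = 2$ per °C is taken to correspond to a 0.5°C step, and $\beta = 64$ is the integration factor reported in Section 5. The reference clock frequency is not stated in the text, so widths are left in units of $t_{Clk}$.

```python
def pulse_width_clocks(temp, alpha, beta):
    """Pulse width t_d of Equation (1), expressed in reference clock periods."""
    # T * alpha * beta clock periods plus the one-period starting pulse.
    return temp * alpha * beta + 1


def temperature_from_pulse(width_clocks, alpha, beta):
    """Invert Equation (1): remove the starting pulse, then undo the scaling."""
    return (width_clocks - 1) / (alpha * beta)


# Illustrative values only (assumptions, not measured data).
ALPHA = 2   # steps per degree C, i.e. 1 / 0.5
BETA = 64   # integration factor

width = pulse_width_clocks(40, ALPHA, BETA)
print(width)                                       # 5121
print(temperature_from_pulse(width, ALPHA, BETA))  # 40.0
```

At 0°C the function returns a width of one clock period, which matches the statement that the output is then just the starting pulse.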

Unlike previous work, the retriggerable ring oscillator in our sensor uses the transport delay of the signal through the routing fabric of the FPGA, as shown in Fig. 1(c). The placement of each LUT is constrained to be near the periphery of the region so that the routing distance between LUTs is as large as possible. The exact number of buffers in the oscillator is a system parameter and can be adjusted for regions of different sizes. The placement of LUTs and the routing are matched among sensors in similarly sized regions through placement and directed routing constraints. However, we observed significant jitter in the period of the ring oscillator, which decreased the accuracy of the sensor significantly. To reduce the effect of the jitter, the output pulse from the ring oscillator is accumulated using a fixed divider, referred to as the integrator in Fig. 1(d). Its division factor is the integration factor in Equation (1). The extended pulse is then extended again by a programmable divider (Programmable Time Amplifier). This division factor depends on the resolution and the calibrated unit. It is set such that a unit increase (w.r.t. the resolution and accuracy of measurement) in pulse length is exactly one clock period ($t_{Clk}$). In other words, if the resolution is high, then the divisor will be larger. This implies a longer conversion time and a larger area for the programmable divider. Therefore, there is a trade-off between conversion time and resolution. Offset correction is performed using a masking pulse that is high only between the minimum and maximum measurable temperatures w.r.t. the output of the ring oscillator. In this case, the pulse was high only between 0°C and 80°C. The dividers and pulse generators are implemented efficiently using cascaded shift registers, as shown in Fig. 1.
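The effect of the integrator on jitter can be illustrated with a small simulation. This is only a sketch under a strong assumption that is not taken from the measurements: the period jitter is modelled as independent Gaussian noise, and the nominal period and jitter magnitude are arbitrary. The function names are hypothetical.

```python
import random
import statistics


def measured_period(rng, nominal, jitter_sigma, integration_factor):
    """Average period seen when several jittery periods are accumulated."""
    total = sum(rng.gauss(nominal, jitter_sigma)
                for _ in range(integration_factor))
    return total / integration_factor


def spread(integration_factor, trials=2000, seed=0):
    """Standard deviation of the per-period estimate over many trials."""
    rng = random.Random(seed)
    samples = [measured_period(rng, 1.0, 0.05, integration_factor)
               for _ in range(trials)]
    return statistics.pstdev(samples)


single = spread(1)        # one period per measurement
integrated = spread(64)   # 64 periods accumulated per measurement
print(single, integrated)
```

Under this model the spread of the per-period estimate shrinks roughly with the square root of the integration factor, at the cost of a proportionally longer conversion time.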

#### 3.2. Reading Circuit

The reading circuit consists only of a simple up-counter that is enabled by the output pulse of the temperature sensor and thus measures the width of that pulse. The counter value is a direct binary readout of the temperature in the unit selected during calibration. The counter is cleared at the rising edge of the output pulse. This automatic synchronization allows us to interface many temperature sensors by multiplexing the input of the reading circuit.
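The behaviour of the reading circuit can be modelled in a few lines. The sketch below is a software model, not the HDL of the sensor: it assumes the output pulse is sampled once per reference clock, and that the clock period spent clearing the counter coincides with the starting pulse, so the latched count is the pulse width minus one. The text does not state the exact latching scheme, so that detail is an assumption.

```python
def read_pulse(samples):
    """Return the counter value latched at the end of each pulse.

    `samples` holds the sensor output sampled once per reference clock (0 or 1).
    """
    readouts = []
    count = 0
    previous = 0
    for level in samples:
        if level and not previous:
            count = 0               # rising edge: clear the counter
        elif level:
            count += 1              # pulse still high: count one clock period
        elif previous:
            readouts.append(count)  # falling edge: pulse finished, latch count
        previous = level
    return readouts


# A pulse of 1 starting clock plus 5 clocks, idle time, then a 1 + 3 clock pulse.
trace = [0] + [1] * 6 + [0, 0] + [1] * 4 + [0]
print(read_pulse(trace))  # [5, 3]
```

Because each pulse resets the counter on its own rising edge, the same reader can be switched between sensors without any extra handshake, which is the property the multiplexed arrangement relies on.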

#### 3.3. Calibration Circuit

The calibration circuit is used to read the digital readout across the operating temperature range and load the corrected gain and offset coefficients into each sensor. The corrected divisors and offsets are calculated from the sensor readout and the actual temperature. This circuit is only required during calibration and can be removed by hard coding the coefficients. The calibration procedure is described in Section 4.

### 4. CALIBRATION

The actual oscillation frequency of the ring oscillator varies non-linearly with temperature over a large temperature range. However, for small ranges, the relationship can be approximated by a straight line. This allows us to use a simple counter as a reading circuit. For better accuracy, the two-point method can be used, which corrects both the gain and offset coefficients. For simpler calibration, one-point calibration can be used, which corrects only the offset at the given temperature. Since the gain is not calibrated in that case, the accuracy over the selected temperature range may be reduced.

For two-point calibration, the temperature of the FPGA is controlled using a temperature-controlled oven. Two temperatures are selected on the basis of the operating range of the FPGA, and the readouts of the sensors at those temperatures are recorded using default gain and offset coefficients. The actual operating frequency of the ring oscillator in each sensor can be calculated from the readouts at each temperature. We can then calculate the correct values of the gain and offset coefficients and load them using the calibration circuit. After verification, the calibration circuit can be completely removed and the gain and offset coefficients can be hard coded in HDL. The placement and routing of the ring oscillators are recorded in a constraint file, which preserves the calibration for that FPGA. Because of process variation among different FPGAs of the same type, the oscillation frequency of the ring oscillator is not the same, and the calculated coefficients are therefore accurate only for that FPGA. Calibration is thus required for each FPGA. To simplify the process, one-point calibration alone can be performed by calculating the offset coefficient while using the calibrated coefficients of another FPGA of the same type as defaults. Since the gain coefficient generally does not change much from one FPGA to another of the same type, the accuracy will not be badly affected.
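The arithmetic behind the two calibration methods can be sketched as a linear correction. This is a simplified software model: in the sensor itself the correction is applied through the programmable divisor and offset circuit, whereas here it is applied to the readout directly. The function names and all readout values are hypothetical, not measured data.

```python
def two_point_calibration(raw_low, temp_low, raw_high, temp_high):
    """Gain and offset such that temp = gain * raw + offset at both points."""
    gain = (temp_high - temp_low) / (raw_high - raw_low)
    offset = temp_low - gain * raw_low
    return gain, offset


def one_point_calibration(raw, temp, gain):
    """Reuse a known gain and correct only the offset at one temperature."""
    return temp - gain * raw


def corrected(raw, gain, offset):
    """Apply the linear correction to an uncalibrated readout."""
    return gain * raw + offset


# Hypothetical readouts at the 35 C and 70 C reference points.
gain, offset = two_point_calibration(raw_low=30.0, temp_low=35.0,
                                     raw_high=62.0, temp_high=70.0)
print(corrected(46.0, gain, offset))    # 52.5

# Another device, assumed to share the gain but with a shifted readout.
offset_b = one_point_calibration(raw=33.0, temp=35.0, gain=gain)
print(corrected(33.0, gain, offset_b))  # 35.0
```

The one-point variant is exact only at its reference temperature; away from it, any difference in gain between the two devices appears directly as a measurement error.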

### 5. EXPERIMENTAL RESULTS

We used the Xilinx Virtex-6 ML605 board for the implementation. The ring oscillator of the sensor used 10 LUTs. The sensors were calibrated against the on-die temperature sensor in a temperature-controlled oven from 35°C to 70°C. ChipScope Pro Virtual I/O was used to measure the readout and load the corrected divisors and offsets. Using two-point calibration and an integration factor of 64, we achieved an accuracy of $\pm 0.5$°C on two FPGAs. Table 1 compares the resource utilization and performance of our sensor with those of related works.

We further build an evaluation platform based on the AARP [9] on the Xilinx ML605 board [10]. The AARP platform allows us to instantiate applications dynamically in the FPGA. The floorplan of the implementation on the FPGA is shown in Fig. 2. As shown in Fig. 2, there are $8 \times 6$ (= 48) regions available. We develop a small dummy application that uses only one region and embed our sensor in it. We instantiate the dummy application in all the regions and calibrate the sensors with one-point calibration, using the coefficients from the first experiment as initial values. We achieve an accuracy of 0.5°C over the operating temperature range from 35°C to 70°C. We are able to successfully measure the temperature profile of the reconfigurable regions of the FPGA. In future work, this information will be used by the system manager for hot spot detection and thermal management through dynamic frequency scaling and task scheduling/relocation.

**Fig. 2**. Floorplan of Implementation

**Table 1**. Comparative Study

|            | #LE                    | Resolution (°C) | Accuracy (°C) | Samples/sec |
|------------|------------------------|-----------------|---------------|-------------|
| [2]        | 140                    | 0.06            | -1.5 to 0.8   | 3000        |
| [3]        | 48                     | 0.13            | -0.7 to 0.6   | 4400        |
| Our Sensor | 23\* [24<sup>†</sup>]  | 0.5             | ±0.5          | 1000        |

\* Resource usage on Virtex 5/6, Spartan 6, and Series 7 FPGAs: 7 FFs, 16 6-LUTs, 7 SRL32Es packed into 23 equivalent LEs.

<sup>†</sup> Resource usage on Virtex 4 and Spartan 3 FPGAs: 7 FFs, 15 4-LUTs, 9 SRL16Es, 22 MUXFX, 3 MUXF5 packed into 24 equivalent LEs.

### 6. CONCLUSION

We present a low overhead time-domain temperature sensor to enable self-awareness for reconfigurable platforms. The sensor uses 52% fewer resources than the state-of-the-art. Its resolution is $0.5^{\circ}$C with a sampling period of only 1 ms. Its accuracy is $\pm 0.5^{\circ}$C using two-point calibration.

### 7. ACKNOWLEDGEMENT

The authors acknowledge the funding support from Singapore A\*Star (grant no 1122804010).

### 8. REFERENCES

- [1] P. Mangalagiri, S. Bae, R. Krishnan, Y. Xie, and V. Narayanan, "Thermal-aware reliability analysis for platform FPGAs," in *ICCAD 2008 IEEE*, Nov. 2008, pp. 722–727.
- [2] P. Chen, M.-C. Shie, Z.-Y. Zheng, Z.-F. Zheng, and C.-Y. Chu, "A fully digital time-domain smart temperature sensor realized with 140 FPGA logic elements," *IEEE Trans. Circuits Syst. I*, vol. 54, no. 12, pp. 2661–2668, Dec. 2007.
- [3] P. Chen, S.-C. Chen, Y.-S. Shen, and Y.-J. Peng, "All-digital time-domain smart temperature sensor with an inter-batch inaccuracy of -0.7°C to 0.6°C after one-point calibration," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 5, pp. 913–920, May 2011.
- [4] P. Jones, J. Moscola, Y. Cho, and J. Lockwood, "Adaptive thermoregulation for applications on reconfigurable devices," in *FPL 2007 Intl. Conf.*, Aug. 2007, pp. 246–253.
- [5] P. H. Jones, Y. H. Cho, and J. W. Lockwood, "Dynamically optimizing FPGA applications by monitoring temperature and workloads," in *VLSID 2007 Intl. Conf.*, Jan. 2007, pp. 391–400.
- [6] D. Atienza and E. Martinez, "Inducing thermal-awareness in multicore systems using networks-on-chip," in *ISVLSI 2009 IEEE CS Symp.*, May 2009, pp. 187–192.
- [7] X. Zhang, W. Jouini, P. Leray, and J. Palicot, "Temperature-power consumption relationship and hot-spot migration for FPGA-based system," in *GreenCom 2010 IEEE/ACM Intl. Conf.*, Dec. 2010, pp. 392–397.
- [8] Z. Chen, R. Nagesh, A. Reddy, and P. Schaumont, "Increasing the sensitivity of on-chip digital thermal sensors with pre-filtering," in *VLSI 2009 IEEE CS Symp.*, May 2009, pp. 304–309.
- [9] R. Syed, Y. Ha, and B. Veeravalli, "A low overhead abstract architecture for FPGA resource management," in *HEART 2012 Intl. Workshop*, Jun. 2012.
- [10] Xilinx, "Xilinx ML605 User Guide UG534," 2008.