Reconfigurable Accelerators in Multicore Design: A Cost-Benefit Analysis
DOI:
https://doi.org/10.15628/holos.2020.9924
Keywords:
multicore, reconfigurable accelerator, design space exploration, performance, energy
Abstract
The steady evolution of software through new techniques has enabled a wide range of solutions to meet society's demands. A current example is the advance of machine learning techniques for use in autonomous vehicles, medical diagnosis, robots, and other domains. Several hardware solutions have emerged in recent years to meet this demand. Among them, systems with multiple cores, known as multicores, are one of the main trends. However, the search for hardware solutions is not aimed at high performance alone. Other aspects, such as energy efficiency and area, must also be taken into account. In this context, combining processors with reconfigurable accelerators has been widely explored, because the accelerators provide performance gains together with reduced energy consumption. In this work, we aim to contribute to multicore design by investigating different combinations of processors and reconfigurable accelerators. As a case study, we combine superscalar processors with coarse-grained reconfigurable architectures and evaluate three scenarios. The first is a combination of processors and accelerators that achieves the highest possible performance for a set of applications. The second is a combination of processors and accelerators bounded by a performance threshold, and the third is constrained by energy. The experiments show that it is possible to obtain a speedup of more than 2.5x for certain applications; to save more than 11% of energy at the cost of a 10% loss in speedup; and to reduce area by 30% while saving 20% of energy.
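The three scenarios can be read as three selection rules over a set of candidate processor and accelerator combinations. The following is a minimal sketch of that reading, not the authors' method: the `Config` type, the function names, the interpretation of the performance threshold as a minimum acceptable speedup, and all numbers in `CANDIDATES` are assumptions made purely for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One hypothetical processor + accelerator combination."""
    name: str
    speedup: float  # relative to some baseline; higher is better
    energy: float   # relative to the same baseline; lower is better
    area: float     # relative to the same baseline; lower is better


def best_performance(configs: Sequence[Config]) -> Config:
    """Scenario 1: pick the combination with the highest speedup."""
    return max(configs, key=lambda c: c.speedup)


def cheapest_meeting_speedup(configs: Sequence[Config],
                             min_speedup: float) -> Optional[Config]:
    """Scenario 2 (assumed reading): among combinations that reach a
    minimum speedup, pick the one with the lowest energy."""
    feasible = [c for c in configs if c.speedup >= min_speedup]
    return min(feasible, key=lambda c: c.energy) if feasible else None


def fastest_within_energy(configs: Sequence[Config],
                          max_energy: float) -> Optional[Config]:
    """Scenario 3 (assumed reading): among combinations within an energy
    budget, pick the one with the highest speedup."""
    feasible = [c for c in configs if c.energy <= max_energy]
    return max(feasible, key=lambda c: c.speedup) if feasible else None


# Made-up values for illustration only; NOT measurements from the paper.
CANDIDATES = [
    Config("4 cores + large CGRA", speedup=2.4, energy=1.00, area=1.00),
    Config("4 cores + small CGRA", speedup=2.1, energy=0.85, area=0.80),
    Config("2 cores + small CGRA", speedup=1.5, energy=0.70, area=0.55),
    Config("4 cores only", speedup=1.0, energy=1.10, area=0.60),
]

if __name__ == "__main__":
    print(best_performance(CANDIDATES).name)
    print(cheapest_meeting_speedup(CANDIDATES, min_speedup=2.0).name)
    print(fastest_within_energy(CANDIDATES, max_energy=0.75).name)
```

Both constrained rules return `None` when no candidate satisfies the limit, which makes an infeasible threshold or budget explicit instead of silently falling back to an arbitrary configuration.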