Distributed file storage (Nginx, ZFS, Perl)

Distributed file storage (Nginx, ZFS, Perl). Perepelitsa, Mamontov. Hall 2


Transcript of Distributed file storage (Nginx, ZFS, Perl)

  • 1. (Nginx, Perl, ZFS) Mons Anderson

2. NAS, SAN or custom? SAN: FC/iSCSI. NAS: NFS/CIFS/... Custom: *
3. ? HTTP
4. HTTP nginx ;)
5. ? User -> nginx -> SAN ? User -> nginx -> NAS ? User -> nginx -> nginx ?
6. Nginx: $ NAS: $$ SAN: $$$$$$$
7. 8 U, 24Tb 4 x ( 2U, 6Tb ( 2+10 Tb raw ) 4 x redundancy for each file
8. ? CRUD! (Create, Read, Update, Delete)
9. ? Read -> HTTP. C, U, D -> ?
10. ? Read -> HTTP. C, U, D -> HTTP!
11. WebDAV nginx*
12. * WebDAV + ACL Patch: X-ACL: 0644 X-Time: 1234567890
13. Step by step: Front
14. Front
15. Front
16. Front
17. Storage: N Tb = storage N/2, storage N/2
18. Storage: N Tb, hash, storage, storage
19. Storage: N Tb, hash, storage, storage
20. Storage: hash, storage, storage, storage, storage
21. Storage: hash, storage, storage, storage, storage
22. Resizer: hash, storage, storage, storage image_filter(), storage image_filter()
23. Resizer: hash, Low CPU, storage, storage, storage image_filter(), storage image_filter()
24. Resizer: storage, storage, storage, storage, Resize, Resize, Resize
25. Hashing: ? Hash, storage, storage, storage, storage, Resize, Resize, Resize
26. Hashing: ScaleHash(time, id), storage, storage, storage, storage, Resize, Resize, Resize
27. Hashing: URL: http:///ObjectID ObjectID = (xxx)-(xxx)- ID Time
28. Hashing: Timeline: past, upgrade, over, future, start. a b d c 0x0010 0x00FF 0x02FF 0x0FFF e 0x0000
29. Hashing: TimeHash(Time) {
      0x0000 - 0x00FF -> [ a, b ]
      0x00FF - 0x02FF -> [ a, c, d ]
      0x02FF - 0x0FFF -> [ c, d ]
      0x0FFF - 0xFFFF -> [ f ]
    } -> groups
30. Hashing: IDHash(ID, groups) -> group
31. Hashing: TimeHash(const Time) = const. IDHash(const ID, groups) = const
32. Access Control: DB. Perl: - AnyEvent::HTTPD - AnyEvent::HTTP - AnyEvent::DBI. storage, storage, group
33. Access Control: Public object -> GET /XXX, Perl handler (check perms), 200 OK
34. Access Control: Protected object
    -> GET /XXX
    -> Perl handler, X-Accel-Redirect: /access/...
    -> GET /access/...
    -> proxy_pass http://ACLD, X-Accel-Redirect: /int/..
    -> GET /int/..
    200 OK
35. Result?
36. 2+2=4, 2+2=4, 2+2=4, ! CPU Cache?
37. Cache
38. Cache is a problem :(
39. /cache proxy://resize /source/ ?q=/i/AxB /i proxy://store resize() /internal 200 OK /i proxy://storage /i/AxB /i proxy://cache /i/AxB /source handler() XAccel: /cache/i/AxB+ (mtime:ctime:status) /i/AxB+ /source/ ?q=/i/AxB cache store 200 OK HIT /source handler() /source/ ?q=/i/AxB+ /source/ ?q=/i/AxB+ XAccel: /int 200 OK 200 OK 200 OK
40. Combine? MEM CPU HDD
41. Combined! Money is saved :) Deployed on the same hosts. Full resource consumption.
42. Combined, but scalable! Cache cluster, Resize cluster, Storage cluster: + MEM + CPU + HDD
43. Create, Update, Delete? PUT DELETE OPTIONS, GET GET, storage, storage
44. Temporary loss: PUT DELETE OPTIONS, GET GET, storage, storage
45. Flashing error: GET GET, storage, storage
46. No file? Ask a friend! GET GET, 404 fallback, proxy_pass, storage, storage
47. 404 Fallback: avoiding recursion. GET, ask next unless asked, storage, storage
48. 404 Fallback: Just... proxy_next_upstream http_404, storage, storage
49. Again about PUT: PUT, storage, storage
50. How about single PUT? PUT DELETE OPTIONS, ZFS replication, storage, storage
51. /
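The two-level lookup on the hashing slides (TimeHash picks the set of storage groups that was live for the object's time component, IDHash then picks one group from that set) can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the range table is copied from the TimeHash slide, while the half-open range boundaries, the use of MD5 inside `id_hash`, and the function names are assumptions made for the example.

```python
import hashlib

# Ranges and groups copied from the TimeHash slide. Treating each range as
# half-open [start, end) is an assumption; the last bound is 0x10000 so that
# 0xFFFF itself is still covered.
TIME_RANGES = [
    (0x0000, 0x00FF, ["a", "b"]),
    (0x00FF, 0x02FF, ["a", "c", "d"]),
    (0x02FF, 0x0FFF, ["c", "d"]),
    (0x0FFF, 0x10000, ["f"]),
]


def time_hash(t):
    """Map the time component of an ObjectID to the candidate groups."""
    for start, end, groups in TIME_RANGES:
        if start <= t < end:
            return groups
    raise ValueError(f"time {t:#06x} is outside the configured ranges")


def id_hash(object_id, groups):
    """Pick one group deterministically from the candidates.

    MD5 is only a stand-in for whatever stable hash the real system uses.
    """
    digest = hashlib.md5(str(object_id).encode()).digest()
    return groups[int.from_bytes(digest[:4], "big") % len(groups)]


def locate(t, object_id):
    """Constant time and constant ID always resolve to the same group."""
    return id_hash(object_id, time_hash(t))
```

Because the table is keyed by the time stored in the ObjectID, adding storage groups only appends a new range for future objects; existing objects keep resolving to the groups they were written to.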

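The "ask a friend" 404 fallback with its recursion guard ("ask next unless asked") can be modelled in a few lines. This is a toy simulation, not nginx configuration: the `StorageNode` class, its fields, and the `asked` flag are hypothetical names, and how the flag travels in the real setup (a request header, a separate internal location, or something else) is not stated on the slides.

```python
class StorageNode:
    """Toy model of one storage host in a replicated pair."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})
        self.peer = None      # the "friend" to ask on a local miss
        self.requests = 0     # counts GETs handled, to show there is no loop

    def get(self, path, asked=False):
        """Return (status, body); `asked` marks a request that came from a peer."""
        self.requests += 1
        if path in self.files:
            return 200, self.files[path]
        # Forward a miss only once: a request that was already forwarded
        # is answered with 404 instead of being bounced back.
        if not asked and self.peer is not None:
            return self.peer.get(path, asked=True)
        return 404, None


# Two peers; the file has reached only the first one so far.
a = StorageNode("a", {"/i/1": b"data"})
b = StorageNode("b")
a.peer, b.peer = b, a
```

With this wiring, `b.get("/i/1")` is served from `a`, and a request for a file that exists nowhere costs exactly two lookups before the 404 is returned.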
52.
53.
54.
55.
56.
57. (inode) LBA LBA LBA RAID LBA RAID LBA RAID
58. Rampant layering violation ;) (zfs) (zpool)
59. stripe / mirror / raidz / ? stripe , , mirror , , raidz , , :(
60. :(
    Layout   Grouping        GB      IOPS
    stripe   1 x 100         10000   20000
    mirror   2 x 50          5000    20000
    raidz    1 x (99 + 1)    9900    200
    raidz    5 x (19 + 1)    9500    1000
    raidz    33 x (2 + 1)    6600    6600
    Disks: 100, each 100GB and 200 IOPS
61. 6TB / 2U: mirror, raidz1, raidz1, raidz1, stripe, spare. 2U: 2 x 500MB + 10 x 1TB, SATA-2
62. ! snapshot ( cow-, ) rollback clone ( ) send ( ) receive
63. ( snapshot )
64. ( send )
65. ( socketpipe )
66. ( receive )
67. ( clone )
68. ( link -fhs )
69. sleep && goto .1;
70. ?
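One round of the replication loop on slides 63 to 69 (snapshot, send, receive, clone, relink, then sleep and repeat) can be sketched as a function that only builds the command lines and executes nothing. Several things here are assumptions rather than the speaker's setup: `ssh` is swapped in as the transport in place of socketpipe, "link -fhs" is read as FreeBSD's `ln -fhs`, the clone is assumed to be mounted at `/` plus its dataset name, and all dataset, host, and path names are hypothetical.

```python
def replication_round(dataset, prev_snap, new_snap, remote_host,
                      clone_name, link_path):
    """Return the commands for one replication round as argv lists.

    Nothing is executed. The stdout of step 2 (send) is meant to be piped
    into the stdin of step 3 (receive) by whoever runs the commands.
    """
    new_full = f"{dataset}@{new_snap}"

    # 1. Take a snapshot on the source.
    steps = [["zfs", "snapshot", new_full]]

    # 2. Send it, incrementally when an earlier snapshot exists.
    send = ["zfs", "send"]
    if prev_snap:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(new_full)
    steps.append(send)

    # 3. Receive on the replica (ssh stands in for socketpipe here).
    steps.append(["ssh", remote_host, "zfs", "receive", dataset])

    # 4. Clone the received snapshot.
    steps.append(["ssh", remote_host, "zfs", "clone", new_full, clone_name])

    # 5. Repoint a symlink at the new clone (assumed mountpoint).
    steps.append(["ssh", remote_host, "ln", "-fhs", f"/{clone_name}", link_path])
    return steps
```

A driver would call this in a loop, run the steps, remember `new_snap` as the next `prev_snap`, and sleep between rounds, which corresponds to the "sleep && goto .1" slide.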

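The capacity and IOPS figures in the pool layout comparison on slide 60 follow from simple arithmetic over 100 disks of 100GB and 200 IOPS each. The sketch below reproduces them under the model the numbers imply: a stripe or mirror serves reads from every disk, while each raidz group delivers roughly the random IOPS of a single disk. The function names are hypothetical, and the model is a simplification inferred from the table, not a statement about ZFS internals.

```python
DISK_GB = 100     # per-disk capacity used on the slide
DISK_IOPS = 200   # per-disk random IOPS used on the slide


def stripe(disks):
    """Plain stripe: all capacity and all IOPS, no redundancy."""
    return disks * DISK_GB, disks * DISK_IOPS


def mirror(ways, groups):
    """N-way mirrors: one disk of capacity per group, reads from every disk."""
    return groups * DISK_GB, ways * groups * DISK_IOPS


def raidz(groups, data, parity):
    """raidz groups: data disks give capacity, each group gives one disk of IOPS."""
    # Parity disks add no usable capacity; they are listed only to make
    # the disk count per group explicit.
    assert data > 0 and parity > 0
    return groups * data * DISK_GB, groups * DISK_IOPS


layouts = {
    "stripe 1 x 100": stripe(100),
    "mirror 2 x 50": mirror(2, 50),
    "raidz 1 x (99 + 1)": raidz(1, 99, 1),
    "raidz 5 x (19 + 1)": raidz(5, 19, 1),
    "raidz 33 x (2 + 1)": raidz(33, 2, 1),
}

for name, (gb, iops) in layouts.items():
    print(f"{name:20s} {gb:6d} GB {iops:6d} IOPS")
```

The trade-off the slide points at is visible directly: fewer, wider raidz groups keep almost all the capacity but collapse random IOPS, while many narrow groups recover IOPS at the cost of a third of the space.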
71.
72.
73.
74.
75.
76. 2011 Mons Anderson
77. ?