Be Water with Spark
Uploaded by sergio-gomez · Category: Software
Be Water with Apache Spark™ in the Real World
Hi!
• Sergio Gómez
• Software Architect at
• @pulsarin
• linkedin.com/in/bedeveloper
Contents
• A Big Data project
• Lessons learned
• The search for performance
What is this about?
A project
A project
• International telco
• Data from the network probes
• 10 million registered users
Volume
• ~10 billion events per day
• 17 nodes, 360 cores, 2.4 TB RAM
• Daily runs
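A quick back-of-the-envelope calculation puts these figures in perspective. It assumes, which the slides do not state, that the ~10 billion daily events arrive evenly over 24 hours, that 2.4 TB means 2,400 GB, and that cores and RAM are spread evenly across the 17 nodes.

```latex
\begin{aligned}
\frac{10^{10}\ \text{events/day}}{86\,400\ \text{s/day}} &\approx 1.16 \times 10^{5}\ \text{events/s} \\
\frac{10^{10}\ \text{events/day}}{360\ \text{cores}} &\approx 2.8 \times 10^{7}\ \text{events per core per day} \\
\frac{2\,400\ \text{GB}}{17\ \text{nodes}} &\approx 141\ \text{GB of RAM per node} \\
\frac{360\ \text{cores}}{17\ \text{nodes}} &\approx 21\ \text{cores per node}
\end{aligned}
```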
Volume
Goals
• Monetization
• Infrastructure quality
• Detection of potential problems
Lessons learned
Design
• Think about the data
• Think about the flow
• Think about your storage
Case 1: ETL
• Daily download of the probe data
• Processing and enrichment with user data
• Storage for later processes
• Sanity metrics
Case 1: ETL
• Large data volume
• Parsing errors
• Reprocessing
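Case 1 mentions parsing errors, reprocessing and sanity metrics. One way to keep a single malformed record from killing a daily run is to set rejected lines aside instead of raising, keep them for later reprocessing, and report the error rate as a sanity metric. This is a plain-Python sketch, not Spark API; the `user_id;timestamp;bytes` record format and the names `ProbeEvent`, `parse_line` and `parse_batch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeEvent:
    user_id: str
    timestamp: int
    bytes_used: int


def parse_line(line):
    """Parse a hypothetical 'user_id;timestamp;bytes' record, raising ValueError if malformed."""
    fields = line.strip().split(";")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    user_id, timestamp, nbytes = fields
    return ProbeEvent(user_id, int(timestamp), int(nbytes))  # int() raises ValueError on bad numbers


def parse_batch(lines):
    """Return parsed events, rejected (line, reason) pairs kept for reprocessing, and sanity metrics."""
    parsed, rejected = [], []
    for line in lines:
        try:
            parsed.append(parse_line(line))
        except ValueError as exc:
            rejected.append((line, str(exc)))
    total = len(parsed) + len(rejected)
    metrics = {
        "total": total,
        "parsed": len(parsed),
        "rejected": len(rejected),
        "error_rate": len(rejected) / total if total else 0.0,
    }
    return parsed, rejected, metrics
```

In Spark, the same split could be done inside `mapPartitions` or `flatMap`, writing the rejected lines to their own output so they can be reprocessed once the parser is fixed.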
Case 1: ETL
(Chart: CPU, RAM and network usage)
Case 2: K-Means
• Many iterations
• Trial and error
• Precomputed data
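Case 2 relies on many trial-and-error iterations over precomputed data. A plain-Python sketch of Lloyd's algorithm (the standard K-Means iteration; the slides do not say which variant was used) shows why: every iteration rescans the whole feature set, so building the features once and keeping them in memory pays off. In Spark that would typically mean calling `.cache()` on the feature RDD or DataFrame before training. `build_features`, the `(user_id, bytes)` event shape and the per-user features are hypothetical.

```python
import math


def build_features(events):
    """Precompute one (event_count, total_bytes) vector per user from raw (user_id, bytes) events."""
    features = {}
    for user_id, nbytes in events:
        count, total = features.get(user_id, (0, 0))
        features[user_id] = (count + 1, total + nbytes)
    return features


def kmeans(points, centroids, max_iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid recomputation."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:  # every iteration rescans all points, hence the value of caching them
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            # mean of each dimension; an empty cluster keeps its previous centroid
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return centroids
```

For example, `kmeans(list(build_features(events).values()), initial_centroids)` would cluster users by activity; in practice the features would usually be scaled first, since byte totals dwarf event counts.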
Errors
Don't fail
• Your application cannot fail
• A single error can cost you hours…
• … or even lose data
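One common way to make sure a failure cannot lose or corrupt existing output is to write results to a temporary file and swap it in atomically only after writing has finished, similar in spirit to how Hadoop-style output committers stage task output before committing it. The slides do not say how this project handled it; `write_atomically` is a hypothetical helper for a single local file.

```python
import os
import tempfile


def write_atomically(path, lines):
    """Write lines to a temp file next to `path`, then atomically replace `path` with it.

    If writing fails midway, the previous contents of `path` are left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(line + "\n")
        os.replace(tmp_path, path)  # atomic rename when both paths are on the same filesystem
    except BaseException:
        os.remove(tmp_path)  # discard the partial temp file and re-raise
        raise
```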
Real data
• Use real samples
• Scale down the volume, then extrapolate
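Working with a real but reduced sample and scaling the measurements up can be sketched in plain Python. Spark offers the same kind of Bernoulli sampling through `RDD.sample` and `DataFrame.sample`; the helper names here are hypothetical. The linear extrapolation is an assumption: shuffles, skewed keys and memory pressure rarely scale linearly, which is why the deck also insists on testing with the real volume.

```python
import random


def bernoulli_sample(records, fraction, seed=42):
    """Keep each record independently with probability `fraction` (sampling without replacement)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible across runs
    return [r for r in records if rng.random() < fraction]


def extrapolate(measured, fraction):
    """Scale a count or cost measured on the sample up to the full dataset, assuming linear scaling."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return measured / fraction
```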
“Make failing cheap”
Real volume
• Use the real volume
• You need to be scalable
What we will avoid
• PermGen errors
• OOM (out-of-memory) errors
• Excessively long processing times
• Failing to scale
• Uneven use of the cluster
“Fail fast”
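“Fail fast” can be read as checking cheap preconditions before the expensive part of a job starts. This plain-Python sketch validates a small sample of records and aborts if too many lack required fields; `check_preconditions`, `ValidationError` and the 1% threshold are hypothetical choices for illustration.

```python
class ValidationError(Exception):
    """Raised when the input looks wrong enough that running the full job would be wasted time."""


def check_preconditions(sample_records, required_fields, max_missing_ratio=0.01):
    """Inspect a small sample up front and abort before the expensive stage if it looks broken."""
    if not sample_records:
        raise ValidationError("input sample is empty")
    missing = sum(
        1 for record in sample_records
        if any(record.get(field) is None for field in required_fields)
    )
    ratio = missing / len(sample_records)
    if ratio > max_missing_ratio:
        raise ValidationError(f"{ratio:.1%} of sampled records lack required fields")
    return ratio
```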
In search of performance
Cache
• Cache whenever possible
• Use broadcast
• Coalesce
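The broadcast and coalesce advice can be illustrated without a cluster. In Spark, `sc.broadcast(...)` ships a read-only value (such as a small user table) to each executor once, so a large dataset can be enriched with a local lookup instead of a shuffle join, and `coalesce(n)` reduces the number of partitions by merging existing ones rather than reshuffling every record. These plain-Python functions only mimic those ideas; `broadcast_enrich` and `coalesce` are hypothetical names, and the simple contiguous grouping ignores the data locality Spark takes into account.

```python
def broadcast_enrich(events, users):
    """Enrich (user_id, payload) events with a local lookup in a small `users` dict.

    In Spark, `users` would be wrapped with sc.broadcast(...) so each executor
    receives one read-only copy, and `events` never needs to be shuffled.
    """
    return [(user_id, payload, users.get(user_id, "unknown")) for user_id, payload in events]


def coalesce(partitions, n):
    """Merge adjacent partitions into at most n, moving whole partitions rather than single records."""
    if not partitions:
        return []
    # Like a shuffle-free coalesce, this can only reduce the partition count, never increase it.
    n = max(1, min(n, len(partitions)))
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i * n // len(partitions)].extend(part)
    return merged
```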
Manage the cluster
• Mesos / YARN
• Try different configurations
• JVM tuning
Shuffle
• Think about how your keys are distributed
• Partitioner
• What am I going to do with the data?
• groupByKey
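The shuffle advice comes down to two questions: where each key ends up, and how many records cross the network. This plain-Python sketch counts records per partition under hash partitioning (in the spirit of Spark's default `HashPartitioner`, which places a key by a non-negative modulo of its hash code) to expose a hot key, and contrasts a `groupByKey`-style aggregation with a `reduceByKey`-style one that combines values inside each partition before the shuffle. The function names are hypothetical; this is not the Spark API.

```python
from collections import Counter, defaultdict


def partition_sizes(keys, num_partitions):
    """Records per partition when each key goes to hash(key) mod num_partitions."""
    # Python randomizes str hashes per process, so placement varies between runs,
    # but every occurrence of a hot key always lands on the same single partition.
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def group_by_key(partitions):
    """groupByKey-style: no map-side combine, so every (key, value) pair is shuffled."""
    grouped = defaultdict(list)
    shuffled = 0
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
            shuffled += 1
    return dict(grouped), shuffled


def reduce_by_key(partitions, func):
    """reduceByKey-style: combine within each partition first, then merge the partial results."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.append(local)
    shuffled = sum(len(local) for local in partials)  # one record per distinct key per partition
    result = {}
    for local in partials:
        for key, value in local.items():
            result[key] = func(result[key], value) if key in result else value
    return result, shuffled
```

The saving grows with the number of values per key, but a hot key still ends up on one partition either way; that is where a custom `Partitioner`, or one option not mentioned in the slides such as salting the key, can help.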
SparkSQL
• Select * from… seriously?
• Good when the data is partially structured
• Work with a subset
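The “Select * from… seriously?” point is about reading only what the job needs. With partially structured input such as JSON, Spark SQL can infer a schema and fill absent fields with null, and selecting a few columns (for example `spark.read.json(path).select("user_id", "bytes")`) keeps later stages small. This plain-Python sketch mimics that projection and filtering; `select_subset` and the field names are hypothetical.

```python
import json


def select_subset(json_lines, columns, predicate=lambda row: True):
    """Parse semi-structured JSON lines, keep only `columns`, then filter the projected rows.

    Missing fields become None, roughly as Spark SQL fills absent JSON fields with null.
    """
    rows = []
    for line in json_lines:
        record = json.loads(line)
        row = {column: record.get(column) for column in columns}  # drop everything else early
        if predicate(row):
            rows.append(row)
    return rows
```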
Questions?
Links :)
• http://es.slideshare.net/pulsarin/be-watter-with-spark
• http://kcy.me/29czy
• Insults here: http://kcy.me/29d01
Thank you! @pulsarin