AI4EU Deliverable D2.4 Community Portal · The deliverable 2.4 is the initial version of the AI4EU...


Page 1: AI4EU Deliverable D2.4 Community Portal · The deliverable 2.4 is the initial version of the AI4EU community portal software. It will integrate the official website of the project

Grant Agreement N°825619

Page 1 of 9

AI4EU Deliverable D2.4

Community Portal

WP 2 Platform design and implementation

Task 2.2 Community Tools

Dissemination level (1): PU
Due delivery date: 31/06/2019
Nature (2): O
Actual delivery date: 03/09/2019
Lead beneficiary: SMI

Document

Version | Date | Author | Comments (3)
1 | 13/06/2019 | Sebastien VINCENT | Abstract content
1.1 | 17/06/2019 | Sebastien VINCENT | Add detailed content
2 | 09/08/2019 | Sebastien VINCENT | Details on website development
2.1 | 28/08/2019 | Ludivine LENOIR, Sébastien VINCENT | Correct document according to reviewer comments

(1) Dissemination level: PU = Public, PP = Restricted to other programme participants (including the JU), RE = Restricted to a group specified by the consortium (including the JU), CO = Confidential, only for members of the consortium (including the JU)

(2) Nature of the deliverable: R = Report, P = Prototype, D = Demonstrator, O = Other

(3) Creation, modification, final version for evaluation, revised version following evaluation, final

AI4EU_D2.4_M6_vfinal

Glossary

eXo Digital workplace software

FHG Fraunhofer Gesellschaft

IMT Institut Mines-Télécom

ORA Orange

THA Thales

TWE Twenty Communications

SMI Smile


Deliverable abstract

Scope of the deliverable:

This deliverable is the initial version of the AI4EU community portal software, which will first integrate the official website of the project and then provide the functionalities described in Task 2.2. The activities carried out on Task 2.2 since the beginning of the project can be split into 3 parts:

1. The infrastructure: the technical layer supporting the whole platform, in collaboration with IMT / Teralab.

2. The AI4EU internal tool: an instance of the eXo platform gathering all the partners in a common space to improve collaborative work, in collaboration with ORA.

3. The AI4EU public website: the website designed to involve the whole AI community in Europe.

Results:

1. Infrastructure

- Technical Architecture Document for the targeted platform (see attached documents) (M1)
- Technical Architecture Document for the interim platform (see attached documents) (M2)
- Interim infrastructure described in the Interim TAD (M3)

2. AI4EU internal tool

- Smile has delivered the eXo platform containing all the management tools needed to gather all partners, organize the workload and provide a single repository for all project documentation (M3): https://collab.ai4eu.eu

3. AI4EU public website

- Smile is involved in the specification of the website, to deliver the full backlog of tasks needed to start the development.
- Smile is designing and developing the backend of all sections of the website expected for its first release.
- Smile is integrating the frontend of the website.

Deliverable Review

Reviewer #1: Gabriel Gonzalez-Castane

Reviewer #2: ..........................................

For each question, each reviewer gives an Answer (Yes / No), Comments and a comment Type* (M / m / a).

1. Is the deliverable in accordance with

(i) the Description of the Action?

(ii) the international State of the Art?

2. Is the quality of the deliverable in a status

(i) that allows it to be sent to the European Commission?

(ii) that needs improvement of the writing by the originator of the deliverable?

(iii) that needs further work by the Partners responsible for the deliverable?

* Type of comments: M = Major comment; m = minor comment; a = advice


Contents

Introduction
Share
Learn
Show
Advanced search
State of the Art
Results and Analysis
Infrastructure
AI4EU Internal tool
AI4EU public website
AI4EU Process: step by step on how to build the public website
Conclusion
Annex


1. Introduction

The deliverable 2.4 is the initial version of the AI4EU community portal software. It will integrate the official website of the project and then provide the different functionalities described in Task 2.2.

The main goal of WP2 during the first 6 months of the AI4EU project is to build the community website using the Drupal CMS. The AI4EU website aims to attract a broad range of profiles and to give the community features to share, learn, show and search.

1. Share

Onboard or create communities to discuss some AI subjects, share opinions or experiences and bring people sharing the same goals together to achieve a collaborative target.

Visitors of the platform can get involved in the community and become contributors by writing posts and articles and sharing their knowledge. This is a good way to keep the state of the art up to date and bring new knowledge to all members of the community. Articles can also refer to other online articles, and users can comment on them or add more information.

Website content can be shared through common social networks to increase the visibility of the AI4EU community.

Communities can grow as people are invited or ask to join. Users will also be able to share documents and articles, discuss them and create groups around a particular topic and a specific goal.

2. Learn

By visiting AI4EU, a user can browse through articles, explore groups on various subjects and search the platform.

Visitors who want to learn more can register on AI4EU to get access to private groups, engage in discussions, join communities and expand their network by adding new people. The list of events occurring within a group gives users the opportunity to attend events, learn more and meet people with common interests.

Users will share their best practices and resources as well as coding examples. This way, any user eager to learn more and looking for quality content on specific subjects will be able to use this information safely.

3. Show

The AI4EU ecosystem will promote the work done by the Industrial Pilots partners. The 8 prototypes to be delivered will be implemented on the platform along with their documentation. They aim to explain what AI is and how AI can help people, and to foster discussions and activities based on these pilots.

New versions of pilots or other technical initiatives, based on AI4EU open calls or not, will be shared on the website.

4. Advanced search

The search function is one of the major functionalities of the AI4EU platform. There is a global feature which searches within the website and on targeted AI websites over the internet. The sort and filter capabilities of the search allow users to customise and refine their query. The search request API uses an advanced algorithm to search AI websites (previously indexed) and match the user’s query.
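As an illustration of the sort and filter capabilities, the following Python sketch refines a list of already matched results. It is not the AI4EU search API: the `IndexedPage` record, the `refine` function, the source names and the relevance scores are all hypothetical and only show the general idea of filtering by source and sorting by date or relevance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexedPage:
    """Hypothetical search result; field names are assumptions."""
    title: str
    source: str      # "ai4eu" for portal content, otherwise an external site
    published: date
    score: float     # relevance score from a hypothetical matching step


def refine(results, sources=None, newest_first=False):
    """Keep results from the given sources, then sort them.

    Sorting is by publication date when newest_first is True,
    otherwise by relevance score (highest first).
    """
    kept = [r for r in results if sources is None or r.source in sources]
    if newest_first:
        return sorted(kept, key=lambda r: r.published, reverse=True)
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Made-up sample data for the illustration.
results = [
    IndexedPage("Intro to ML", "ai4eu", date(2019, 5, 1), 0.72),
    IndexedPage("Robotics news", "external-site", date(2019, 8, 2), 0.91),
    IndexedPage("Ethics in AI", "ai4eu", date(2019, 7, 15), 0.64),
]

portal_only = refine(results, sources={"ai4eu"}, newest_first=True)
by_relevance = refine(results)
```

Here `portal_only` restricts the query to portal content and orders it by date, while `by_relevance` keeps every source and orders by score.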


2. State of the Art

The technical state of the art, related to the platform infrastructure, is described in 3 annexes:

- AI4EU - Interim Platform Technical Architecture Document V1.1.pdf

- AI4EU - Platform Technical Architecture Document - v1.3.pdf

- AI4EU - Platform Installation and Maintenance Guide - v1.0.pdf

These 3 documents are not merged into the deliverable because:

- They help understand how the platform is built and explain how it was built
- They are technically oriented
- They are mandatory documents when building a technical platform

These documents contain all the elements required to understand how the infrastructure of the AI4EU Platform is built. This includes the details of:

- the software used, its configuration and its goals
- the technologies used and their goals
- the global architecture overview
- how to operate the system
- how to maintain the system

The functional aspect of the platform and of the public website is developed using the Drupal Content Management System (CMS). This technology allowed us to set up the AI4EU website based on a catalog of features and a standard, stable architecture. As Drupal is a flexible solution, we were able to customize some components to match the specific needs of the project. Drupal is a leading open source CMS and is continuously updated to maintain the security and performance of the website.


3. Results and Analysis

The 3 major achievements of the first period of the AI4EU project relate to the infrastructure, the internal management tool and the public website.

1. Infrastructure

During the first months of the project, the internal collaborative platform (an instance of the eXo Platform) needed to be released to support all partners, so that they could start working collaboratively. We therefore built the first version of the infrastructure: the interim architecture. At the same time, we worked on the full architecture expected to host the whole AI4EU ecosystem: the target infrastructure.

The two versions of the infrastructure have been built to serve 2 specific objectives:

- The interim infrastructure: delivered at M3 to support the AI4EU internal tool. The architecture is described in the Interim TAD (see annex). The interim architecture is a minimalist version of the target one. The platform developer and the hosting partner of the AI4EU project worked closely together to select the best software components and technologies to create a robust environment in a short time (less than 3 months). This environment is fully isolated from the target one, so that technicians can maintain the interim architecture and develop the target version without any disruption between the 2 threads. It is robust enough to host the AI4EU internal tool for a first opening to the AI4EU members scheduled for M3. Some bugs were fixed to smooth this opening.

- The target infrastructure: delivered at M6 to support the whole AI4EU platform. The architecture is described in the Target TAD (see annex). The target is designed and set up to provide a scalable, high-availability environment. These are prerequisites to host the whole target platform (Acumos, public website, interoperability, search, …).

2. AI4EU Internal tool

The internal tool to manage the project was released at M3. An instance of the eXo platform has been set up on the interim architecture and some features have been customized to match the management needs.

3. AI4EU public website

There are 4 main steps in the delivery process of the website:

- Specification: the specifications were collected through several workshops.
- Back end development: Drupal customisation and development of new modules to match the specified needs.
- Front end design: creation of front end mockups and HTML pages according to the specifications.
- Front end integration: slicing the HTML and integrating it into the back end of the website to deliver a fully functional website.

The contributing partners are:

- Specifications leader: THA
- Front end designer and HTML production: TWE
- Front end integration, back end and platform architecture: SMI
- Product owner: ORA
- Scrum mastering: FHG

The website is currently in the development phase and the first version will be released in September. It will contain the following sections: People, Groups, Discussions and Search, plus some static pages presenting the objectives of AI4EU.


4. AI4EU Process: step by step on how to build the public website

The partners involved in WP2 are working together to create the AI4EU website.

Since the workshops involving all partners, Thales has been the main driver of the specifications: it creates a basic wireframe for each feature, with a detailed description. Once the wireframe is ready and validated, Twenty Communications starts working on an HTML version of the wireframe and its functionalities while Smile works on the backend development. When the HTML is validated and delivered, Smile integrates it and adds the functional features to the website.

Each feature is then developed, tested and released in a pre-production environment, then demonstrated to and validated by Thales during a demo ceremony.

5. Conclusion

The activities related to Task 2.2 during the first 6 months and the work done since the kick-off are aligned with the workload and expectations of the consortium:

- the interim platform hosting the internal collaborative tool has been set up and is running
- the target platform is up and ready to host the public website
- the first version of the public website will be released by the end of September.

The main difficulties encountered were related to the process and the communication with partners, as it is our first time working together. The process is now clear and, like our communication, continuously improving. The next steps will be easier to tackle now that the team is well accustomed to such an exercise.

6. Annex

AI4EU - Platform Technical Architecture Document - v1.3.pdf

AI4EU - Interim Platform Technical Architecture Document - v1.1.pdf

AI4EU - Platform Installation and Maintenance Guide - v1.0.pdf


Annex 1 - Platform Technical Architecture Document

AI4EU project

WP2 - Platform Technical Architecture Document

Version 1.3

Document changes

Version | Changes
DRAFT-1.0 | Document creation
DRAFT-1.1 | First TeraLab feedback (SLA, …)
DRAFT-1.2 | IAM updates
1.3 | TeraLab architecture stable; CI/CD architecture and Drupal

Revision date | Revision | Authors | Smile Reviewer | AI4EU Reviewer
17/12/2018 | DRAFT-1.0 | Patrice Ferlet, Olivier Favreau | Alain ROUEN | TeraLab (Olivier Dehoux), Orange (Thierry Nagellen)
17/01/2019 | DRAFT-1.1 | Patrice Ferlet, Olivier Favreau | Alain ROUEN | TeraLab (Olivier Dehoux), Orange (Thierry Nagellen)
20/05/2019 | DRAFT-1.2 | Patrice Ferlet | Alain ROUEN | TeraLab (Olivier Dehoux), Orange (Thierry Nagellen)
06/08/2019 | 1.3 | Patrice Ferlet | Alain ROUEN | Smile (Sebastien Vincent), Orange (Thierry Nagellen)

AI4EU - WP2 - Platform TAD  1/34

 


Document summary 

Document goal
Platform description
Platform Features
Functional architecture and dependencies
Assumptions and service level objectives
Assumptions for the “Platform block”
Assumptions for the “IaaS and managed services block”
Assumptions for the project board
Assumptions for the platform operator
Technical architecture
Architecture overview
Platform components role and dependencies
Network environment
Technical requirements
Redundancy and failover
Addressing
Security
Operational requirements
Architecture diagram
Distributed block storage service
Technical requirements
Operational requirements
Object storage service
Technical requirements
Operational requirements
Architecture diagram
Admin endpoint
Technical requirements
Operational requirements
Architecture diagram
Platform orchestrator
Technical requirements
Operational requirements
Architecture diagram
Identity and Access Management
Technical requirements
Operational requirements
Architecture diagram
Collaboration management
Technical requirements
Operational requirements
Architecture diagram
CI/CD tooling
AI project management
Summary of technical resources requirements


1. DOCUMENT GOAL 

This document describes the technical architecture and requirements of the AI4EU platform that will support the AI4EU activities:

● Mobilize the entire European AI community
● Create a leading collaborative AI European platform

Being the software integrator of this platform, Smile is the author of this document.

In line with its role, Smile will provide, by the end of its mission at the latest, the documents related to this architecture (build/setup and operational guides) to the entity in charge of operations.

2. PLATFORM DESCRIPTION 

2.1. PLATFORM FEATURES 

To fulfill the platform mission, several categories of features have been identified:

User identity and access management (IAM): a common repository for all user identities, offering a single sign-on experience across all platform features.

Collaboration management (CM): a set of collaborative features for platform users, covering topics like:

● The ability to publish and contribute to public and private content
● Content management
● Platform social network, user activities, news boards
● Subject matter expert communities

AI project management (AIPM): an AI development studio (Acumos) was identified during the project pre-sales phase. This tool will cover topics like:

● The ability to develop or collaborate on data-science projects or ML/DL projects
● The ability to train ML/DL models with platform data or external data, and export trained models for further inference usage
● A marketplace of AI/ML/DL projects to moderate, categorize, publish or use community AI/ML/DL projects

Third party integration (3RDP): the ability to extend the platform with new future features.

2.2. FUNCTIONAL ARCHITECTURE AND DEPENDENCIES 

To orchestrate and provide the platform features, the technical architecture will be based on a functional architecture made of the following blocks:

● The platform with its different features (CM, AIPM, IAM) embedded in an orchestrator
● The IaaS and managed services on which the platform relies
● Any other 3rd party provider that can interoperate with the platform

Smile will be in charge of the delivery of the platform block (“Smile scope”).

The designated hosting provider will be in charge of the delivery and the run of the infrastructure and managed services on which the platform relies (“Hosting provider scope”).

3. ASSUMPTIONS AND SERVICE LEVEL OBJECTIVES 

To design a technical architecture adapted to the platform features and functional architecture, we need to make several assumptions.

These assumptions will drive the technology and technical architecture design choices, and the achievable service level agreement.

3.1. ASSUMPTIONS FOR THE “PLATFORM BLOCK”

Assumption ID | Assumption description | Related to feature
PA1 | Standardize software component deployment and execution thanks to a common platform orchestrator | Non functional requirement
PA2 | Offer, at the platform level, a unique identity to each platform user | All
PA3 | Offer user identification and authentication mechanisms compatible with industry standards | IAM, 3RDP
PA4 | Enforce end-to-end encryption for user activities | Non functional requirement
PA5 | Guarantee the logical data integrity | Non functional requirement
PA6 | Guarantee the logical data security | Non functional requirement
PA7 | Allow logical scalability, within the limits of the hosting provider capacity | Non functional requirement
PA8 | Use web technologies to offer collaborative services and AI development studio services to platform users | IAM, CM, AIPM
PA9 | A target of 1000 unique registered users by end of 2019 | All

With regard to these assumptions, the technical architecture will be designed to match these service level objectives:

- Service availability: aligned with the hosting provider service availability (business hours)
- Maximum Recovery Time Objective (RTO): 4 business hours for incidents not related to IaaS services
- Maximum Recovery Point Objective (RPO): 24h (linked to the backup schedule, i.e. daily backups) for incidents not related to IaaS services
- Data retention period: legal aspect/regulation still to be defined and agreed by the AI4EU project board

To reach these objectives, any underlying dependencies must also meet their own objectives, and the platform will have to be operated according to the relevant operational guide.
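The link between the 24h RPO and the daily backup schedule can be checked with a short calculation. The Python sketch below is only an illustration: the function names, the nightly 02:00 backup time and the incident time are assumptions, not values taken from this document.

```python
from datetime import datetime, timedelta

# Maximum tolerated data loss, taken from the RPO objective (24h).
RPO = timedelta(hours=24)


def worst_case_data_loss(backup_times, incident_time):
    """Return the time elapsed between the latest backup and the incident."""
    earlier = [t for t in backup_times if t <= incident_time]
    if not earlier:
        raise ValueError("no backup precedes the incident")
    return incident_time - max(earlier)


def meets_rpo(backup_times, incident_time, rpo=RPO):
    """True when the data lost since the latest backup stays within the RPO."""
    return worst_case_data_loss(backup_times, incident_time) <= rpo


# Hypothetical nightly backups at 02:00 on 1, 2 and 3 September 2019.
backups = [datetime(2019, 9, day, 2, 0) for day in (1, 2, 3)]
# Hypothetical incident on 3 September 2019 at 17:30.
incident = datetime(2019, 9, 3, 17, 30)

loss = worst_case_data_loss(backups, incident)
```

With a backup every 24 hours, the loss never exceeds the RPO as long as no backup is missed; if the last nightly backup had failed, the same incident would exceed it.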

3.2. ASSUMPTIONS FOR THE “IAAS AND MANAGED SERVICES BLOCK” 

Assumption ID | Assumption description
HA1 | A disaster recovery plan and related resources are ready and tested yearly
HA2 | Guarantee of the physical data integrity
HA3 | Guarantee of the physical data security
HA4 | Able to provide public DNS and NTP service
HA5 | Able to provide compute resources compatible with the platform orchestrator (assumption PA1), with different kinds of CPU/memory profile, including a specific profile for GPU usage if required
HA6 | Guarantee of constant IOPS on direct attached storage
HA7 | Guarantee of minimum network bandwidth on virtual network interface
HA8 | Able to provide distributed storage compatible with the platform orchestrator (assumption PA1)
HA9 | Able to provide network layer, including private LAN, public IP address, L3 load balancer
HA10 | Able to provide at least one public IP address (IPv4 or IPv6)
HA11 | Ability to respond to scalability needs in constant time
HA12 | Ability to set up a monitoring and alerting solution for the client infrastructure

With regard to these assumptions, we know that the hosting provider is able to offer these service level objectives for infrastructure and managed services:

- Service availability: business hours (weekdays between 9 AM and 6 PM, GMT+1)
- Maximum Recovery Time Objective (RTO): 10 business days
- Maximum Recovery Point Objective (RPO): next business day
- Constant DAS IOPS: 70Mb/s per physical volume
- Minimum network bandwidth on virtual network interface: 10Gb/s per physical server
- Maximum time to make available a new virtual machine or distributed storage volume: 3 business days
- Data retention period: legal aspect/regulation still to be defined and agreed by the AI4EU project board
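Because several of these objectives are expressed in business days, the resulting calendar deadline depends on the weekday of the request. The Python sketch below shows one way to compute it. It assumes that business days are Monday to Friday and ignores public holidays; the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date reached after counting `days` business days from start.

    Assumption: business days are Monday to Friday, public holidays ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


# A new virtual machine requested on Friday 6 September 2019,
# with the 3 business day provisioning objective.
vm_deadline = add_business_days(date(2019, 9, 6), 3)

# An infrastructure incident on Monday 2 September 2019,
# with the 10 business day RTO.
rto_deadline = add_business_days(date(2019, 9, 2), 10)
```

In this example a request made on a Friday is due the following Wednesday, which is 5 calendar days later.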

3.3. ASSUMPTIONS FOR THE PROJECT BOARD 

Assumption ID | Assumption description
PA1 | Able to provide the required SSL certificates for end-to-end encryption (i.e. *.ai4eu.eu)
PA2 | Able to include a 3rd party provider to deploy and maintain a security solution on top of the platform (IAM, CM, AIPM, …). This solution must include a web application firewall (traffic inspection, anti-virus, ...)
PA3 | No ISO-27001 certification or any other information security management certification required

3.4. ASSUMPTIONS FOR THE PLATFORM OPERATOR

The platform operator will be in charge of:

● Monitoring the platform health and scheduled jobs
● Executing the platform scheduled maintenance plan
● Running proactive actions to maintain the platform health
● Supporting end users and resolving issues
● Cooperating with the hosting provider regarding the resources and services used by the platform and delivered by this provider

The platform operator will use and update/maintain the platform operation guides delivered during the deployment phase of this project.

The platform operator role is not yet assigned, and will have to be transferred to a designated entity during the project, as soon as one of the platform features is in production mode.

4. TECHNICAL ARCHITECTURE 

4.1. ARCHITECTURE OVERVIEW 

The technical architecture is built around the 3 main features of the platform:

● User identity and access management (IAM)
● Collaboration management (CM)
● AI project management (AIPM)

To support these features, the technical architecture will include several technical components:

● A dedicated network (i.e. hypervisor virtual network), segregated from any other client networks and secured by a firewall, which also controls incoming requests from the internet
● An object storage service (Ceph object gateway)
● A distributed block storage service (CephFS and Ceph RBD)
● A platform orchestrator (a.k.a. Platform as a Service, K8S), with different kinds of technical components and resources like:
    ○ A private Docker registry dedicated to custom platform images
    ○ A set of compute resources compatible with the software and hardware requirements of the platform features
● An admin endpoint (i.e. virtual machine), containing all administrative tooling and scheduled scripts (e.g. backup scripts)

4.1.1. Platform components role and dependencies 

| Component | Role | Dependencies | Setup ownership |
|---|---|---|---|
| Dedicated network | Allow all technical components to communicate to each other in a secure way, but also control incoming requests from the internet | None | Hosting provider scope |
| Distributed block storage service | Offer persistent block storage (mount point) at platform orchestrator level | Dedicated network | Hosting provider scope |
| Object storage service | Offer S3 like storage at platform orchestrator level | Dedicated network | Hosting provider scope |
| Admin endpoint | Allow remote platform administration and administrative tasks scheduling. Secured and restricted to administrator profile | Dedicated network | Smile scope |
| Platform orchestrator (K8S) | Orchestrate software components lifecycle (deployment, consistency, self-healing, communication) in a standard way | Dedicated network, distributed block storage service and object storage service | Smile scope |
| IAM | Common users repository at platform level. Offer standard user identification and authentication for all software components | Platform orchestrator | Smile scope |
| CM | Software component in charge of all collaborative features at platform level | Platform orchestrator, IAM | Smile scope |
| AIPM | Software component in charge of all AI project management features at platform level | Platform orchestrator, IAM | Smile scope |
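The dependency column above implies a setup order: a component can only be installed once everything it relies on is in place. As a minimal sketch (not part of the original design), the dependencies can be written as a graph and sorted topologically with Python's standard `graphlib` module. The dictionary simply transcribes the table; the function name `setup_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# Transcription of the table: component -> components it depends on.
DEPENDENCIES = {
    "Dedicated network": set(),
    "Distributed block storage service": {"Dedicated network"},
    "Object storage service": {"Dedicated network"},
    "Admin endpoint": {"Dedicated network"},
    "Platform orchestrator (K8S)": {
        "Dedicated network",
        "Distributed block storage service",
        "Object storage service",
    },
    "IAM": {"Platform orchestrator (K8S)"},
    "CM": {"Platform orchestrator (K8S)", "IAM"},
    "AIPM": {"Platform orchestrator (K8S)", "IAM"},
}


def setup_order(dependencies):
    """Return one valid setup order: each component comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = setup_order(DEPENDENCIES)
for step, component in enumerate(order, start=1):
    print(f"{step}. {component}")
```

Several orders are valid (for example, the two storage services can be set up in either order); the sorter returns one of them and raises `CycleError` if the table ever contained a circular dependency.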


4.2. NETWORK ENVIRONMENT

4.2.1. Technical requirements  

The network environment technical requirements can be divided according to the following categories :

● Redundancy and failover
● Addressing
● Security

4.2.1.1. Redundancy and failover

We expect the network environment to allow the platform traffic and compute/storage resources to be spread across different data centers. This will make the platform orchestrator aware of physical location : in case of data center maintenance or disaster recovery, workloads can then be relocated live to any healthy location.

This implicitly means that :

● The data centers are connected with high-speed and reliable network connections
● The data center network equipment is able to balance internet traffic and attach the platform public IP address(es) to any active/healthy data center

We highly recommend that the hosting provider use at least 3 data centers, so that any software component relying on cluster and quorum technology can be well balanced across these data centers. This avoids the “split-brain” issues that could otherwise occur when hardware or software fails.
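The recommendation of at least 3 data centers follows from majority-quorum arithmetic. The sketch below assumes one voting member per data center and a strict-majority quorum rule; both are illustrative assumptions, and the function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can be lost while a majority still remains."""
    return members - quorum_size(members)


for data_centers in (2, 3):
    print(
        f"{data_centers} data centers: quorum={quorum_size(data_centers)}, "
        f"tolerated failures={tolerated_failures(data_centers)}"
    )
```

With 2 data centers the quorum is 2, so losing either site (or the link between them) leaves no majority; with 3 data centers one site can fail and the two remaining ones still form a majority.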

4.2.1.2. Addressing  

We expect the network environment to provide private IPv4 addressing, with the following requirements :

● Either :
○ A private network with one private subnet with a dedicated class C network (/24 mask) extended to each data center
○ Or a private network with a dedicated subnet (class C network) attached to each data center
● An internet gateway (NAT for outgoing traffic) attached to each network subnet (default subnet gateway)
● A NAT gateway for incoming traffic from the internet. This gateway will forward this traffic to the correct private subnet with specific traffic rules that will be defined during the deployment project
● At least one public IP address attached to the NAT gateway. An optional second IP address dedicated to the admin endpoint traffic is also highly recommended.
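As an illustration of the second addressing option (one dedicated /24 subnet per data center), the sketch below carves subnets out of a private range with Python's standard `ipaddress` module. The range `10.10.0.0/22`, the data center names and the choice of the first host address as default gateway are all assumptions made for the example; the real values are to be defined by the hosting provider.

```python
import ipaddress

# Hypothetical values: the real range and data center names are not defined here.
PLATFORM_NETWORK = ipaddress.ip_network("10.10.0.0/22")
DATA_CENTERS = ["dc-a", "dc-b", "dc-c"]


def plan_subnets(network, data_centers):
    """Assign one /24 subnet per data center and pick its first host as gateway."""
    subnets = network.subnets(new_prefix=24)
    plan = {}
    for name, subnet in zip(data_centers, subnets):
        gateway = next(subnet.hosts())  # first usable address (assumed gateway)
        plan[name] = {"subnet": subnet, "gateway": gateway}
    return plan


plan = plan_subnets(PLATFORM_NETWORK, DATA_CENTERS)
for name, entry in plan.items():
    print(name, entry["subnet"], "gateway", entry["gateway"])
```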


4.2.1.3. Security 

We expect the network environment to allow the implementation of the following traffic rules :

Outgoing internet traffic :

| Source IP | Source proto/port | Destination IP | Destination proto/port | Access / comments |
|---|---|---|---|---|
| <platform subnet> | TCP/Any | 0.0.0.0/0 | TCP/80 | Allow / public HTTP |
| <platform subnet> | TCP/Any | 0.0.0.0/0 | TCP/443 | Allow / public HTTPS |
| <platform subnet> | TCP/Any | 0.0.0.0/0 | TCP/53 | Allow / TCP DNS request |
| <platform subnet> | UDP/Any | 0.0.0.0/0 | UDP/53 | Allow / UDP DNS request |
| <platform subnet> | UDP/Any | 0.0.0.0/0 | UDP/123 | Allow / NTP |
| <platform subnet> | TCP/Any | 0.0.0.0/0 | TCP/Any | Deny / default TCP deny |
| <platform subnet> | UDP/Any | 0.0.0.0/0 | UDP/Any | Deny / default UDP deny |

Incoming internet traffic :

| Source IP | Source proto/port | Destination IP | Destination proto/port | Access |
|---|---|---|---|---|
| Trusted Admin IPs | TCP/Any | <admin endpoint public IP> => NAT to => <admin endpoint private IP> | TCP/22 | Allow / SSH access |
| 0.0.0.0/0 | TCP/Any | <default public IP> => NAT to => <platform orchestrator private ip endpoints> | TCP/443 | Allow / HTTPS access to the platform |
| 0.0.0.0/0 | TCP/Any | <default public IP> | TCP/Any | Deny |
| 0.0.0.0/0 | UDP/Any | <default public IP> | UDP/Any | Deny |
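To make the effect of the outgoing rules concrete, the sketch below evaluates them in table order with first-match semantics. The document does not state how the network equipment evaluates rules, so first-match is an assumption, as are the `Rule` class and the `is_allowed` helper. The source subnet is omitted because every outgoing rule shares it.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    destination: str      # CIDR block
    protocol: str         # "TCP" or "UDP"
    port: Optional[int]   # destination port, None meaning "Any"
    allow: bool
    comment: str


# Outgoing rules from the table, in evaluation order (first match wins: assumption).
OUTGOING_RULES = [
    Rule("0.0.0.0/0", "TCP", 80, True, "public HTTP"),
    Rule("0.0.0.0/0", "TCP", 443, True, "public HTTPS"),
    Rule("0.0.0.0/0", "TCP", 53, True, "TCP DNS request"),
    Rule("0.0.0.0/0", "UDP", 53, True, "UDP DNS request"),
    Rule("0.0.0.0/0", "UDP", 123, True, "NTP"),
    Rule("0.0.0.0/0", "TCP", None, False, "default TCP deny"),
    Rule("0.0.0.0/0", "UDP", None, False, "default UDP deny"),
]


def is_allowed(rules, destination_ip, protocol, port):
    """Return the decision of the first rule matching the outgoing packet."""
    address = ipaddress.ip_address(destination_ip)
    for rule in rules:
        in_range = address in ipaddress.ip_network(rule.destination)
        port_matches = rule.port is None or rule.port == port
        if in_range and rule.protocol == protocol and port_matches:
            return rule.allow
    return False  # nothing matched: deny (assumption)


print(is_allowed(OUTGOING_RULES, "93.184.216.34", "TCP", 443))  # HTTPS
print(is_allowed(OUTGOING_RULES, "93.184.216.34", "TCP", 25))   # SMTP
```

Under these rules a platform component can reach public HTTP(S), DNS and NTP services, while any other outgoing TCP or UDP traffic, such as direct SMTP on port 25, falls through to the default deny rules.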

4.2.1.4. Operational requirements

As this component is under the hosting provider scope for its setup and operation, there are no specific operational requirements, except alignment with the expected service level objectives (cf. § Assumptions for the “IaaS and managed services block”).

4.2.2. Architecture diagram 


4.3. DISTRIBUTED BLOCK STORAGE SERVICE 

4.3.1. Technical requirements  

We expect the hosting provider to set up and maintain a distributed block storage service. This service will be based on CephFS, Ceph RBD.

We expect the service to be available from the platform private network.

We expect an initial available storage space of 500 GB. The deployed architecture should also allow further extension of the project (additional storage needs) without unplanned service disruptions.

The usage of this storage area will be dedicated to :

● Collaboration management : documents repositories
● AI project management : artifacts and trained model repositories (nexus registry)
● Platform orchestrator : software images repositories (docker registry)


The initial space allocation will be :

● 60% for Collaboration management (300 GB)
● 30% for Platform orchestrator (150 GB)
● 10% for AI project management (50 GB)

When the project requires it, the storage size will be increased according to AI project management needs.
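The allocation above is a simple percentage split of the initial 500 GB. As a small sketch (the helper name `allocate` is hypothetical), the figures can be recomputed and checked for consistency:

```python
INITIAL_CAPACITY_GB = 500

# Shares taken from the list above.
SHARES_PERCENT = {
    "Collaboration management": 60,
    "Platform orchestrator": 30,
    "AI project management": 10,
}


def allocate(capacity_gb, shares_percent):
    """Split a capacity (in GB) between usages according to percentage shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100%")
    return {
        name: capacity_gb * share // 100
        for name, share in shares_percent.items()
    }


allocation = allocate(INITIAL_CAPACITY_GB, SHARES_PERCENT)
for name, size_gb in allocation.items():
    print(f"{name}: {size_gb} GB")
```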

4.3.2. Operational requirements 

We expect the service availability and maintenance to be aligned with the service level objectives (§ Assumptions for the “IaaS and managed services block”). In that regard, we expect a daily backup of data volumes, and a restore process compatible with the defined RTO/RPO.


4.4. OBJECT STORAGE SERVICE

4.4.1. Technical requirements  

The object storage service is a software component that will be deployed on dedicated machines attached to the platform private network.

This service will be based on Ceph object gateway.

This service aims to support the S3 protocol to allow the storage of :

● All platform features backups (different from infrastructure/storage service backup)
● AI project management private data sources (for training, test, …), available from data scientist notebooks

These two needs can be seen as two different deployments and requirements :

● For the platform features backups, we expect an initial available storage space of 500 GB. The deployed architecture should also allow further extension of the project (additional storage needs) without unplanned service disruptions.
● For the AI project management, the requirements definition will wait for the detailed design of this particular platform feature. But we can already confidently estimate the storage need to be in the multi-terabyte range.

4.4.2. Operational requirements

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity
● Back up the virtual storage resources daily, with off-site archiving (to be detailed during the project steps)
● Make available the required virtual resources (compute, storage, network), attached to the platform private network

In Smile Scope the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup and deployment of this service


4.4.3. Architecture diagram 

Object storage service architecture, dedicated to platforms features backups 


4.5. ADMIN ENDPOINT

4.5.1. Technical requirements  

The admin endpoint is a bastion host dedicated to platform technical administrators.

This host will be attached to the platform private network. It will also be reachable remotely (from the Internet) through an SSH connection, authorized (firewalling, NAT) by the platform network equipment. This remote access can be restricted to a whitelist of trusted source IP addresses (to be defined during the deployment project).

To run this host, we estimate the minimal compute and storage requirement at one virtual machine with at least 2 vCPU, 8 GB of RAM and 50 GB of direct attached storage, able to run the CentOS 7 operating system.

4.5.2. Operational requirements 

This bastion host aims to support administrator tasks like :

● Getting access to the platform orchestrator administration tools (CLI and web access)
● Getting access to the platform underlying virtual machines (SSH access)
● Creating, maintaining, scheduling and triggering administrative batch jobs (platform backups, clean-up, ...)
● Installing and using Rancher as a UI for the platform orchestrator
● Maintaining platform software components (infrastructure and platform features)
● Reviewing platform logs
● Debugging any platform issues

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity
● Back up the virtual storage resources daily, with off-site archiving (to be detailed during the project steps)
● Make available the required virtual resources (compute, storage, network), attached to the platform private network, with the expected network rules

In Smile Scope the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup and deployment of this service


4.5.3. Architecture diagram 


4.6. PLATFORM ORCHESTRATOR

The platform orchestrator aims to support the execution of all platform features. This means it must :

● Create and maintain the execution environment required by each platform feature
● Create and maintain any network link required between software components and the outside world
● Spread the software workload across several compute resources according to business and technical rules (e.g. service level objective, specialized compute resources, data center awareness, …)
● Monitor and heal deployed software components in case of software or hardware failure (e.g. restart a crashed software component, move software execution from unhealthy to available hardware resources)
● Use an industry-standard software containerization technology (Docker)

4.6.1. Technical requirements  

The software solution choice for the platform orchestrator is Kubernetes.

A production-grade Kubernetes architecture requires the following resources :

● 3 virtual machines dedicated to Kubernetes server components (Master Server Components). For this platform, each virtual machine specification is :
○ 2 vCPU, 8 GB of RAM and 30 GB of direct attached storage
○ Able to run CoreOS
○ Attached to the platform private network
○ We highly recommend assigning one master server node per data center
● A set of virtual machines dedicated to workload execution (Node Server Components)

The quantity and specifications of the virtual machines dedicated to workload execution will be defined in the following sections of this document. These specifications will be adapted to each platform feature. For example, for the AI project management, we can imagine setting up virtual machines based on hardware equipped with GPUs (e.g. Nvidia GPUs) to accelerate AI computing (training/inference).

However, we already need a minimal set of worker nodes (Node Server Components) to run :

● Infrastructure services :
○ Private docker registry
○ CI/CD services
○ Internal DNS and SMTP
● IAM, CM services

These node specifications will follow two different configurations :

● 3 nodes for Infrastructure services, each with 4 vCPU and 16 GB of RAM
● 6 nodes for IAM and CM services, each with 8 vCPU and 16 GB of RAM

To give the platform operator a global overview and a UI to manage this orchestrator, the Rancher application will be deployed on the admin endpoint.

4.6.2. Operational requirements 

The K8S cluster API must be available from the admin endpoint (CLI and web access).

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity
● Make available the required virtual resources (compute, storage, network), attached to the platform private network, with the expected network rules and spread equally between the available data centers

In Smile Scope the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup and deployment of this service
● The setup and deployment of scheduled backup scripts of the K8S cluster configuration and status
● The setup of monitoring and administrative tooling for K8S supervision

A detailed description of K8S setup and configuration is available in the annexes (§ K8S detailed setup proposal).


4.6.3. Architecture diagram 


4.7. IDENTITY AND ACCESS MANAGEMENT

This technical component is in charge of the “User identity and access management (IAM)” feature.

The software component choice is WSO2 Identity Server (version 5.7.0). Commercial support for this software component is available from the software vendor.

4.7.1. Technical requirements  

With regard to the platform service level objectives, the IAM production environment requirements are the following :

● One WSO2 IS running instance
● A MariaDB instance

These software components will be packaged into containers and deployed by the platform orchestrator on worker nodes dedicated to platform services.

This service will offer public access to different APIs and user interfaces :

● OAuth2 and SAML API
● Administrative panel access (Web UI to be restricted to trusted IP sources)
● User authentication form
● User registration form and self-service (lost password, MFA/OTP registration, …)

This access will be managed by a K8S ingress rule.

In that regard, end-to-end secure communication between users (end-users, administrators) and this feature must be set up. The security requirements involve the use of an SSL/TLS certificate dedicated to these HTTPS communications.
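For illustration only, the sketch below builds a Kubernetes Ingress manifest with a TLS section as a plain Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. It follows the `networking.k8s.io/v1` schema; clusters contemporary with this document exposed an older beta Ingress API with a slightly different backend layout. Every concrete value (`iam`, `iam.example.org`, `iam-tls`, `wso2-is`, port 9443) is a hypothetical placeholder, not a decision recorded in this document.

```python
import json


def build_ingress(name, host, tls_secret, service_name, service_port):
    """Build a Kubernetes Ingress manifest (networking.k8s.io/v1) as a dict."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # TLS certificate stored in a Secret, served for the given host.
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# All values below are hypothetical placeholders.
manifest = build_ingress(
    name="iam",
    host="iam.example.org",
    tls_secret="iam-tls",
    service_name="wso2-is",
    service_port=9443,
)
print(json.dumps(manifest, indent=2))
```

Note that this rule only covers TLS between the user and the ingress. Keeping the connection encrypted up to the IAM container as well depends on the ingress controller in use and is not shown here.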

4.7.2. Operational requirements

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity
● Make available the required virtual resources (compute, storage, network), attached to the platform private network, with the expected network rules and, when possible, spread equally between the available data centers
● Provide the required SSL/TLS certificate, linked to this service host name, registered in the platform’s domain name

In Smile Scope the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup and deployment of this service
● The setup and deployment of scheduled backup scripts of the IAM configuration and database

4.7.3. Architecture diagram 


4.8. COLLABORATION MANAGEMENT

This technical component is in charge of the “Collaboration management (CM)” feature. The software component choice is Drupal CMS v8.

The year-one target assumption is 1000 active users.

4.8.1. Technical requirements  

With regard to the platform service level objectives, the production requirements of this software component are the following :

● A single server deployment with its MariaDB instance and its ElasticSearch instance
● A distributed block storage for attachments storage

These software components will be packaged into containers and deployed by the platform orchestrator on worker nodes dedicated to platform services.

This service will offer public access to the web portal and communication services. It will also rely on the IAM platform feature to obtain identified/authenticated users.

These accesses (user, IAM) will be managed by a K8S ingress rule.

In that regard, end-to-end secure communication between users (end-users, administrators) and this feature must be set up. The security requirements involve the use of an SSL/TLS certificate dedicated to these HTTPS communications.

4.8.2. Operational requirements 

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity
● Make available the required virtual resources (compute, storage, network), attached to the platform private network, with the expected network rules and, when possible, spread equally between the available data centers
● Provide the required distributed block storage volumes
● Provide the required SSL/TLS certificate, linked to this service host name, registered in the platform’s domain name

In Smile Scope the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup, custom configuration and deployment of this service
● The setup and deployment of scheduled backup scripts of the service configuration and database :
○ daily incremental backup with GFS (grandfather-father-son) scheme, following platform service level objectives requirements
○ including documents, databases and search index
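A GFS (grandfather-father-son) scheme keeps recent daily backups, a longer history of weekly backups and an even longer history of monthly backups. The sketch below shows one possible retention rule. The tier sizes (7 daily, 4 weekly, 6 monthly copies), the choice of Sunday for the weekly copy and of the first day of the month for the monthly copy are assumptions for illustration, as is the function name `gfs_tier`; the actual policy has to follow the platform service level objectives.

```python
from datetime import date, timedelta


def gfs_tier(backup_day, today, daily=7, weekly=4, monthly=6):
    """Return the GFS tier that keeps this backup, or None if it can be pruned."""
    age_days = (today - backup_day).days
    if age_days < 0:
        raise ValueError("backup is dated in the future")
    if age_days < daily:
        return "son"            # recent daily backup
    if backup_day.weekday() == 6 and age_days < weekly * 7:
        return "father"         # Sunday backup kept as the weekly copy
    if backup_day.day == 1 and age_days < monthly * 31:
        return "grandfather"    # first-of-month backup kept as the monthly copy
    return None


today = date(2019, 9, 3)
kept = {}
for offset in range(200):
    day = today - timedelta(days=offset)
    tier = gfs_tier(day, today)
    if tier is not None:
        kept[day] = tier

print(len(kept), "of 200 daily backups retained")
```

A scheduled clean-up job on the admin endpoint could call such a function for each stored backup and delete those for which it returns `None`.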

4.8.3. Architecture diagram 


4.9. CI/CD TOOLING

The CI/CD tooling is part of this platform to support the customization of software components and the standardization of their deployment.

This tooling is made of :

● A source control system (Gitea) :
○ For each platform software component that requires customization, a source code repository is created in this system
○ Any platform developer/integrator who contributes to these customizations can save their work in these repositories
○ These repositories allow versioning, change tracking and release management

● A continuous integration and deployment system (Drone) :
○ For each platform software component that requires customization, an integration pipeline is created in this system
○ An integration pipeline aims to assemble the different source code, from the source control system, into ready-to-use software
○ These pipelines include different steps like QA checks and packaging scripts
○ The final step of such a pipeline can be a deployment task, in order to get an end-to-end software engineering process
○ These pipelines can be triggered automatically or by a human action : automatic triggering is for the development phase and human triggering is for the production deployment phase

This tooling is deployed on worker nodes dedicated to infrastructure services.


The IAM and CM software components are built and deployed thanks to this tooling.

 

4.10. AI PROJECT MANAGEMENT 

The specifications of this platform component are still under discussion in other work package groups.


5. SUMMARY OF TECHNICAL RESOURCES REQUIREMENTS 

| Component | VM | vCPU / RAM | DAS | Block storage | Object storage |
|---|---|---|---|---|---|
| Admin endpoint | 1 | 2 / 8 GB | 100 GB | 0 GB | 500 GB |
| Platform orchestrator | 3 | 2 / 8 GB | 40 GB | 0 GB | 0 |
| Infrastructure services | 3 | 4 / 16 GB | 30 GB | 100 GB | 0 |
| IAM / CM | 6 | 8 / 16 GB | 30 GB | 800 GB | 0 |
| Total requirements | 13 | 68 / 176 GB | 490 GB | 900 GB | 500 GB |
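The totals row can be recomputed from the component rows. The sketch below assumes that the vCPU, RAM and DAS figures are per virtual machine (so they are multiplied by the VM count), while block and object storage are per component. This reading is inferred from the arithmetic, not stated in the document, and the helper name `totals` is hypothetical.

```python
# (component, VM count, vCPU per VM, RAM GB per VM, DAS GB per VM,
#  block storage GB per component, object storage GB per component)
ROWS = [
    ("Admin endpoint",          1, 2,  8, 100,   0, 500),
    ("Platform orchestrator",   3, 2,  8,  40,   0,   0),
    ("Infrastructure services", 3, 4, 16,  30, 100,   0),
    ("IAM / CM",                6, 8, 16,  30, 800,   0),
]


def totals(rows):
    """Sum the requirements, scaling per-VM figures by the VM count."""
    return {
        "vm": sum(vm for _, vm, *_ in rows),
        "vcpu": sum(vm * vcpu for _, vm, vcpu, *_ in rows),
        "ram_gb": sum(vm * ram for _, vm, _, ram, *_ in rows),
        "das_gb": sum(vm * das for _, vm, _, _, das, *_ in rows),
        "block_gb": sum(row[5] for row in rows),
        "object_gb": sum(row[6] for row in rows),
    }


summary = totals(ROWS)
print(summary)
```

Under this interpretation the computed values match the totals row: 13 VMs, 68 vCPU, 176 GB of RAM, 490 GB of DAS, 900 GB of block storage and 500 GB of object storage.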


AI4EU_D2.4_M6_vfinal

Annex 2 - Interim Platform Technical Architecture Document


AI4EU project 

WP2 - Interim Platform Technical Architecture Document

Version 1.1 


Document changes 

| Version | Changes |
|---|---|
| 1.1 | Document updates |

| Revision date | Revision | Authors | Smile Reviewer | AI4EU Reviewer |
|---|---|---|---|---|
| 22/01/2019 | 1.0 | Patrice Ferlet, Olivier Favreau, Alain Rouen | Sebastien Vincent | TeraLab (Olivier Dehoux), Orange (Thierry Nagellen) |
| 20/02/2019 | 1.1 | Patrice Ferlet, Alain Rouen | Sebastien Vincent | |

 AI4EU - WP2 - Interim Platform 

TAD 1/21 


Document summary

Document goal
Interim Platform description
Interim Platform Features
Functional architecture and dependencies
Assumptions and service level objectives
Assumptions for the “Platform block”
Assumptions for the “IaaS and managed services block”
Assumptions for the platform operator
Technical architecture
Architecture overview
Platform components role and dependencies
Network environment
Technical requirements
Redundancy and failover
Addressing
Security
Operational requirements
Collaboration management
Technical requirements
Operational requirements
Hosting provider
Smile
Project level
Summary of technical resources requirements
Deployment phases proposal
Appendices
ExoPlatform detailed setup proposal


1. DOCUMENT GOAL 

This document aims to describe the technical architecture and requirements of the Interim AI4EU platform. This interim platform will support these activities :

● The collaboration tool experimentation
● The collaboration tool usage by a subset of target users (early adopters, power users)

This document will also include a proposal for a migration path of this collaboration tool to the target platform (described in “AI4EU - Platform technical architecture document”).

Being the software integrator of this platform, Smile is the author of this document.

In its role, Smile will provide, at the latest at the end of its mission, the different documents related to this architecture (build/setup and operational guide) to the entity in charge of the operations.


2. INTERIM PLATFORM DESCRIPTION 

2.1. INTERIM PLATFORM FEATURES 

To fulfill the platform mission, several categories of features have been identified :

Feature category : Collaboration management (CM)

Description : A set of collaborative features for platform users, covering topics like :

● The ability to publish and contribute to public and private content
● Content management, document storage and sharing
● Platform social network, user activities, news boards
● Forums and wikis, chat and video
● Subject matter expert communities


2.2. FUNCTIONAL ARCHITECTURE AND DEPENDENCIES 

In order to orchestrate and provide the platform features, the technical architecture will be based on the following functional architecture :

This functional architecture is based on the following blocks:

● The CM platform feature
● IaaS and managed services on which the platform relies

Smile will be in charge of the delivery of the platform block (“Smile Scope”).

The designated hosting provider will be in charge of the delivery and the run of the infrastructure and managed services on which the platform relies (“TeraLab scope”).


3. ASSUMPTIONS AND SERVICE LEVEL OBJECTIVES 

With the aim of designing a technical architecture adapted to the platform features and functional architecture, we need to make several assumptions.

These assumptions will lead to technology and technical architecture design choices and to the possible service level agreement.

3.1. ASSUMPTIONS FOR THE “PLATFORM BLOCK” 

| Assumption ID | Assumption description |
|---|---|
| PA1 | Each platform user identity will only exist in the ExoPlatform instance and will not be reusable in any external software component or 3rd party service |
| PA2 | Enforce end-to-end encryption for user activities |
| PA3 | Guarantee of the logical data integrity |
| PA4 | Guarantee of the logical data security |
| PA5 | Use of web technologies to offer collaborative services to interim platform users |
| PA6 | A maximum of 200 unique registered users during the experimentation/interim period |

With regard to these assumptions, the technical architecture will be designed to match these service level objectives :

- Service availability : aligned with hosting provider service availability (Business hours)
- Maximum Recovery Time Objective (RTO) : next business day for incidents not related to IaaS services
- Maximum Recovery Point Objective (RPO) : 24h (linked to backup scheduling - daily backups) for incidents not related to IaaS services
- Backup data retention period : 6 months
- Maintenance period : due to the project organisation and planning iterations, we expect to update or change the CM feature configuration several times, in order to match project requirements. Most of these changes will require service downtime. To avoid too much service disruption, we plan to organize a weekly maintenance period during business hours. This maintenance window will be defined during the project and communicated accordingly.

To reach these objectives, any underlying dependencies must also meet their own objectives, and the platform will have to be operated according to the relevant operational guide.
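The 24h RPO is tied to the daily backup schedule: the worst-case data loss equals the longest interval not yet covered by a backup. As a minimal sketch (the function names and the sample timestamps are hypothetical), a monitoring script could check a list of backup completion times against that objective:

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=24)


def worst_case_data_loss(backup_times, now):
    """Longest interval between consecutive backups, or since the last one."""
    if not backup_times:
        raise ValueError("at least one backup is required")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times, now, rpo=RPO):
    """True if no interval without a backup exceeds the recovery point objective."""
    return worst_case_data_loss(backup_times, now) <= rpo


# Hypothetical schedule: one backup every night at 02:00.
nightly = [datetime(2019, 9, day, 2, 0) for day in (1, 2, 3)]
now = datetime(2019, 9, 3, 18, 0)
print(meets_rpo(nightly, now))

# Same schedule with the backup of 2 September missing.
with_gap = [nightly[0], nightly[2]]
print(meets_rpo(with_gap, now))
```

With one backup per night the longest uncovered interval is exactly 24 hours, which meets the objective; a single missed backup stretches it to 48 hours and breaks it.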

3.2. ASSUMPTIONS FOR THE “IAAS AND MANAGED SERVICES BLOCK” 

Assumption ID  Assumption description 

HA1  A disaster recovery plan and related resources are ready and tested yearly

HA2  Guarantee of the physical data integrity

HA3  Guarantee of the physical data security

HA4  Able to provide required SSL certificates for end-to-end encryption

HA5  Able to provide public DNS and NTP service

HA6  Able to provide compute resources compatible with the functional platform architecture

HA7  Guarantee of constant IOPS on direct attached storage

HA8  Guarantee of minimum network bandwidth on virtual network interface

HA10  Able to provide network layer, including private LAN, multiple public IP addresses and L3 load-balancer

HA11  Able to provide security layer, including firewall

HA12  Able to provide several public IP addresses (IPv4 or IPv6), floating between data centers (for failover scenarios)

HA14  Ability to set up a monitoring and alerting solution for the client infrastructure

HA15  No ISO-27001 Certification or any other information security management certifications required 

With regard to these assumptions, we know that the hosting provider is able to offer these service level objectives for infrastructure and managed services :

- Service availability : business hours (weekdays from 9AM to 6PM, GMT+1)
- Maximum Recovery Time Objective (RTO) : 10 business days
- Maximum Recovery Point Objective (RPO) : next business day
- Constant DAS IOPS : 70Mb/s per physical volume
- Minimum network bandwidth on virtual network interface : 10Gb/s per physical server
- Maximum time to make available a new virtual machine : 1 business day best effort, maximum 3 business days
- Backup data retention period : 6 months

3.3. ASSUMPTIONS FOR THE PLATFORM OPERATOR 

The platform operator will be in charge of :

● Monitoring the platform health and scheduled jobs
● Executing the platform scheduled maintenance plan
● Running proactive actions to maintain the platform health
● Supporting end-users and resolving issues
● Cooperating with the hosting provider regarding the resources and services used by the platform and delivered by this provider

The platform operator will use and update/maintain the platform operation guides delivered during the deployment phase of this project.

This platform operator role is not yet assigned, and will have to be transferred to a designated entity during the project, as soon as the CM platform feature is in production mode.


4. TECHNICAL ARCHITECTURE 

4.1. ARCHITECTURE OVERVIEW 

The technical architecture is built around the Collaboration management feature of the platform.

To support this feature, the technical architecture will include several technical components :

● A dedicated network (i.e. hypervisor virtual network), segregated from any other client networks and secured by a firewall. It also controls incoming requests from the internet

● Compute and storage resources required by the Collaboration management feature

 

4.1.1. Platform components role and dependencies 

Component  Role  Dependencies  Setup ownership

Dedicated network  Allow all technical components to communicate with each other in a secured way, but also control incoming requests from the internet  None  Hosting provider scope

CM  Software component in charge of all collaborative features at platform level  Dedicated network  Smile scope

4.2. NETWORK ENVIRONMENT  

4.2.1. Technical requirements  

The network environment technical requirements can be divided according to the following categories :

● Redundancy and failover
● Addressing
● Security

4.2.1.1. Redundancy and failover 

We expect the network environment to allow the incoming and outgoing platform traffic to be transferred from and to the dedicated compute resource that will host the Collaboration management feature.

4.2.1.2. Addressing  

We expect the network environment to provide private IPv4 addressing, with the following requirements :

● A private network with one private subnet with enough private IP addresses for the compute resource.

● An internet gateway (NAT for outgoing traffic) attached to the network subnet (default subnet gateway)

● A NAT gateway for incoming traffic from the internet. This gateway will forward this traffic to the correct private subnet with specific traffic rules that will be defined during the deployment project

● One public IP address attached to the NAT gateway.


4.2.1.3. Security 

We expect the network environment to allow the implementation of the following traffic rules :

Outgoing internet traffic : 

Source IP  Source proto/port  Destination IP  Destination proto/port  Access / comments

<platform subnet>  TCP/Any  0.0.0.0/0  TCP/80  Allow / public HTTP

<platform subnet>  TCP/Any  0.0.0.0/0  TCP/443  Allow / public HTTPS

<platform subnet>  TCP/Any  0.0.0.0/0  TCP/25  Allow / public SMTP

<platform subnet>  TCP/Any  0.0.0.0/0  TCP/53  Allow / TCP DNS request

<platform subnet>  TCP/Any  0.0.0.0/0  UDP/53  Allow / UDP DNS request

<platform subnet>  TCP/Any  0.0.0.0/0  UDP/123  Allow / NTP

<platform subnet>  TCP/Any  0.0.0.0/0  TCP/Any  Deny / default TCP deny

<platform subnet>  TCP/Any  0.0.0.0/0  UDP/Any  Deny / default UDP deny

Incoming internet traffic :

Source IP  Source proto/port  Destination IP  Destination proto/port  Access

Trusted Admin IPs  TCP/Any  <default public IP> => NAT to => <exoplatform private IP>  TCP/22  Allow / SSH access

0.0.0.0/0  TCP/Any  <default public IP> => NAT to => <exoplatform private IP>  TCP/443  Allow / HTTPS access to the platform

0.0.0.0/0  TCP/Any  <default public IP> => NAT to => <exoplatform private IP>  TCP/25  Allow / SMTP access to the platform

0.0.0.0/0  TCP/Any  <default public IP>  TCP/Any  Deny

0.0.0.0/0  UDP/Any  <default public IP>  UDP/Any  Deny
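As an illustration only, the outgoing rules can be expressed as an ordered allow-list followed by a default deny. The sketch below prints equivalent `iptables` commands without running them. The firewall is in the hosting provider scope and its actual technology is not stated in this document, so the use of iptables, the `FORWARD` chain, the function name and the `PLATFORM_SUBNET` placeholder value are all assumptions.

```shell
# Placeholder subnet (assumption): the document only names it <platform subnet>.
PLATFORM_SUBNET="${PLATFORM_SUBNET:-10.0.0.0/24}"

# Allowed outgoing destinations as proto/port, in the order of the table.
ALLOWED_OUT="tcp/80 tcp/443 tcp/25 tcp/53 udp/53 udp/123"

# Print (do not execute) one iptables command per rule.
print_outgoing_rules() {
  local rule proto port
  for rule in $ALLOWED_OUT; do
    proto=${rule%/*}
    port=${rule#*/}
    echo "iptables -A FORWARD -s $PLATFORM_SUBNET -p $proto --dport $port -j ACCEPT"
  done
  # Default deny for any other TCP or UDP traffic leaving the subnet.
  echo "iptables -A FORWARD -s $PLATFORM_SUBNET -p tcp -j DROP"
  echo "iptables -A FORWARD -s $PLATFORM_SUBNET -p udp -j DROP"
}
```

Calling `print_outgoing_rules` lists the six allow rules first and the two deny rules last, which matters because rules are evaluated in order.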

4.2.1.4. Operational requirements 

As the setup and operation of this component are under the hosting provider scope, there are no specific operational requirements, except the alignment with the expected service level objectives (cf. § Assumptions for the “IaaS and managed services block”).


4.3. COLLABORATION MANAGEMENT

This technical component is in charge of the “Collaboration management (CM)” feature. The software component chosen is ExoPlatform (Enterprise Edition).

4.3.1. Technical requirements  

With regard to the platform service level objectives, the interim ExoPlatform environment requirements are the following :

● A single server deployment with all ExoPlatform technical components side by side (ExoPlatform application server and chat server, MariaDB, MongoDB, ElasticSearch, Postfix SMTP relay)

● Software component segregation through the use of Docker containers and a non-distributed orchestration (docker-compose)

The virtual machine in charge of running these software components must be capable of running at least CentOS 7.5 x86-64. The machine specifications are the following :

● 20 vCPU, 40 GB RAM
● 4 dedicated data volumes :

○ System volume : a direct attached volume with RAID1 or equivalent, with 50 GB of available space

○ ExoPlatform file storage volume : a direct attached volume with RAID1 or equivalent resilience, with 300 GB of available space

○ ExoPlatform databases volume : a direct attached volume with RAID1 or equivalent resilience, with 300 GB of available space

○ ExoPlatform backup dumps : a direct attached volume with RAID1 or equivalent resilience, with 600 GB of available space

This service will offer public access to the ExoPlatform web portal and communication services. It will also rely on its own internal feature to identify and authenticate users.

In that regard, end-to-end secured communication between users (end-users, administrators) and this feature must be set up. The security requirement involves the use of an SSL/TLS certificate dedicated to these HTTPS communications.


4.3.2. Operational requirements 

4.3.2.1. Hosting provider 

We expect the hosting provider to :

● Monitor the health of the virtual resources (compute, storage) and report issues to the platform operator entity

● Make available the required virtual resources (compute, storage, network), attached to the platform private network, with the expected network rules

● Provide the required SSL/TLS certificate, linked to this service host name, registered in the platform’s domain name

● Include in its backup plan the volume dedicated to the ExoPlatform backup (§4.3.1 Technical requirements). We expect a daily backup of the volume

4.3.2.2. Smile 

Within the Smile scope, the following tasks are required :

● The detailed setup procedure documentation of this service
● The detailed operating guide documentation of this service
● The setup, custom configuration and deployment of this service
● The setup and deployment of scheduled backup scripts of the service configuration and database :
○ daily incremental backup with GFS scheme, following platform service level objectives requirements
○ including documents, databases and search index

● The hand-over of the ExoPlatform administrative access to the power-users in charge of interim users on-boarding

● The initial power-user training on ExoPlatform essentials
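The GFS (grandfather-father-son) scheme mentioned in the backup task keeps daily, weekly and monthly backup generations. The sketch below shows one possible way to classify a daily backup by its date. The rotation rule (first day of the month is the monthly generation, Sundays are the weekly generation, everything else is daily) and the function names are assumptions, since the document does not define the rotation calendar.

```shell
# day_of_week YYYY MM DD -> 0 (Sunday) .. 6 (Saturday), using Sakamoto's method.
day_of_week() {
  local y m d t
  y=${1#0}; m=${2#0}; d=${3#0}
  case $m in
    1) t=0 ;; 2) t=3 ;; 3) t=2 ;; 4) t=5 ;; 5) t=0 ;; 6) t=3 ;;
    7) t=5 ;; 8) t=1 ;; 9) t=4 ;; 10) t=6 ;; 11) t=2 ;; 12) t=4 ;;
  esac
  # January and February count as months of the previous year.
  if [ "$m" -lt 3 ]; then y=$((y - 1)); fi
  echo $(( (y + y / 4 - y / 100 + y / 400 + t + d) % 7 ))
}

# gfs_tier YYYY-MM-DD -> monthly | weekly | daily (assumed rotation rule).
gfs_tier() {
  local y rest m d
  y=${1%%-*}
  rest=${1#*-}
  m=${rest%%-*}
  d=${rest#*-}
  if [ "${d#0}" -eq 1 ]; then
    echo monthly
  elif [ "$(day_of_week "$y" "$m" "$d")" -eq 0 ]; then
    echo weekly
  else
    echo daily
  fi
}
```

A backup script could use the returned tier to choose the target directory and how long the dump is kept within the 6-month retention period.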

To be able to execute these tasks, Smile will require remote access to the underlying compute resource. This remote access configuration is described in §4.2.1.3 Security.

4.3.2.3. Project level

To be able to give access to the Collaboration platform, we will need from the project stakeholders :

● The official domain name that will be used to publish the collaboration platform URLs
● The different hostnames associated with the different services of the collaboration platform :
○ Portal access hostname (with or without public content)
○ Chat server hostname

● The ability to add several DNS records linked to the collaboration platform services :
○ The portal access hostname (A or CNAME record)
○ The chat server hostname (A or CNAME record)
○ Technical records for email notifications (MX record for the SMTP relay and TXT and SPF records for anti-spam setup)
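To make the DNS request to the project stakeholders concrete, the sketch below prints the kind of records involved in zone-file syntax. Every value is a placeholder: the domain, hostnames and IP address (taken from a documentation range), the choice of the portal host as mail relay and the SPF policy are assumptions that have to be replaced once the real names are decided.

```shell
# print_zone_records DOMAIN PORTAL_HOST CHAT_HOST PUBLIC_IP
# Prints example records only; nothing is sent to a DNS server.
print_zone_records() {
  local domain portal chat ip
  domain=$1
  portal=$2
  chat=$3
  ip=$4
  # Portal access hostname (A record)
  printf '%s.%s. IN A %s\n' "$portal" "$domain" "$ip"
  # Chat server hostname (CNAME to the portal host)
  printf '%s.%s. IN CNAME %s.%s.\n' "$chat" "$domain" "$portal" "$domain"
  # Mail relay for email notifications (MX record)
  printf '%s. IN MX 10 %s.%s.\n' "$domain" "$portal" "$domain"
  # SPF policy published as a TXT record
  printf '%s. IN TXT "v=spf1 mx -all"\n' "$domain"
}
```

For instance, `print_zone_records example.org portal chat 203.0.113.10` prints four records that can be handed to whoever manages the DNS zone.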


5. SUMMARY OF TECHNICAL RESOURCES REQUIREMENTS 

Component  VM  vCPU / RAM  DAS

CM  1  20 / 40 GB  1250 GB

Total requirements  1  20 / 40 GB  1250 GB
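The DAS total can be cross-checked against the four volumes listed in §4.3.1 (system, file storage, databases, backup dumps). The snippet below is a small sketch of that arithmetic; the variable and function names are illustrative.

```shell
# Volume sizes as listed in §4.3.1: system, file storage, databases, backup dumps.
VOLUME_SIZES="50 300 300 600"

# Sum the volume sizes and print the total.
total_das() {
  local total size
  total=0
  for size in $VOLUME_SIZES; do
    total=$((total + size))
  done
  echo "$total"
}
```

`total_das` prints 1250, which matches the DAS column of the table.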


6. DEPLOYMENT PHASES PROPOSAL 

To align the platform deployment to the project timeline and goals we propose to organize the following milestones and deliverables : 

Milestone  Deliverable  Owner  Description 

M0  D1  TeraLab  Network environment and computing resources 

M0 D2 Smile ExoPlatform experimentation instance

 


7. APPENDICES 

7.1. EXOPLATFORM DETAILED SETUP PROPOSAL 

ExoPlatform will start the following services :

● Galera cluster database
● MongoDB database
● Elastic Search
● ExoPlatform (tomcat)
● PostFix


AI4EU_D2.4_M6_vfinal

Annex 3 - Platform Installation and Maintenance Guide


 

AI4EU project 

Platform Installation and maintenance guide  

 Version 1.0 


Document changes 

Version  Changes 

1.0  Installation & Maintenance (Rancher, Gitea, Drone, WSO2, velero) 

   

   

   

 

Revision date  Revision  Authors  Smile Reviewer  AI4EU Reviewer 

08/08/2019  1.0  Patrice Ferlet 

 

Alain ROUEN   

         

         

         

   

AI4EU Platform - Installation and maintenance guide - V1.0 2 / 36


Document summary

Part 1 - From zero
Cluster initialisation
Rancher UI
Rancher Monitoring
Update Rancher UI
Rancher CLI
Install storageClasses
Add labels to nodes
Install private registry
Add Legacy helm chart repository
Build and push internal services Docker images
AI4EU Rancher Catalog
Install internal services
DNSMASQ
SMTPD
Startup gitea and drone
Gitea
Drone
Automatic images build
Check Part 1 finalization

Part 2 - AI4EU platform installation
Portal instances and dependencies
WSO2 Identity server
Install mariadb
Install WSO2
Portal
Prepare Database
Install Portal
Activate Drone deployment

Part 3 - Common commands to maintain and fix
Cleanup remaining jobs
Get a terminal session to launch commands
Copy files from/to containers
Port Forwarding

Part 4 - Manage Backups with Velero and Companion
Velero backups
Backup “one-shot”
Backup “Schedule”
Apply a backup
Delete Backup or Schedule
Companion backup

Part 5 - Common problems, fixes and workarounds
Rancher
Let’s Encrypt certificate problems
Registry problems
Error 500 on image push (no space left on device)


Part 1 - From zero 

This part describes how to install the cluster from zero. In this case, we consider that the nodes are empty (CoreOS installed, no Rancher, no Kubernetes).

This part lists actions to:

- install kubernetes and Rancher

- prepare storageClass

- install the registry

- prepare helm repositories (install legacy helm repository and our own chart museum)

- push our own images in the registry

- install dnsmasq and smtpd for the cluster

- install Gitea and Drone

- prepare automatic builds of charts and images

After this part is finished, we can deploy AI4EU applications in a second part.

Cluster initialisation

Get the “admin-rancher” project:

git clone [email protected]:innovation/ai4eu/admin-rancher.git /tmp/admin-rancher

cd /tmp/admin-rancher

TK: Here - cloud ini

TK Certificates

After having installed Kubernetes and Rancher, you may connect to the Rancher interface :

https://ws66-admin-ep.tl.teralab-datascience.fr:8443/

Rancher UI

Goal: Get a web interface to manage the cluster and applications, and to get monitoring

Rancher UI should be started outside the cluster, for example on the “admin machine”.

To start it:

VERSION=v2.2.6
docker pull rancher/rancher:$VERSION
# name the container after the version (rancher-v2.2.6) so that later upgrade commands can refer to it
docker run -d --restart=unless-stopped \
  --name rancher-$VERSION \
  -v $HOME/admin-rancher/data:/var/lib/rancher \
  -p 80:80 -p 443:443 \
  rancher/rancher:$VERSION

Then go to https://ws66-admin-ep.tl.teralab-datascience.fr and create the admin and other users.

Rancher Monitoring

Monitoring requires a storageClass, so set up the storage classes first (see Install storageClasses) and enable monitoring last.

Once the storage class is available, you can start monitoring. Log in to Rancher UI and go to the “ai4eu-production” cluster, then Tools » Monitoring.

Enter these values in the form:

- Data Retention: 168 hours

- Enable Node Exporter: True

- Enable Persistent Storage for Prometheus: True

- Enable Persistent Storage for Grafana: True

- Prometheus Persistent Volume Size: 50Gi

- Default StorageClass for Prometheus: teralab-ceph

- Grafana Persistent Volume Size: 10Gi

- Default StorageClass for Grafana: teralab-ceph

- Prometheus CPU Limit: 1000 MilliCPU

- Prometheus Memory Limit: 4000 MiB

- Prometheus CPU Reservation: 200 MilliCPU

- Prometheus Memory Reservation: 1000 MiB

- Node Exporter CPU Limit: 200 MilliCPU

- Node Exporter Memory Limit: 50 MiB

- Node Exporter Host Port: 9796

- Prometheus Operator Memory Limit: 100 MiB

Then press “Enable Monitoring” and wait for all the services to start (it can take several minutes).

After a while, graphs are shown on the “Cluster” page.

Update Rancher UI

To upgrade Rancher UI container you need to:

- stop the container

- save data

- start new version

- update old container to not restart

- later, you can remove old containers

Example: upgrading from v2.2.5 to v2.2.6:

# stop the old container and prevent it from restarting
docker stop rancher-v2.2.5
docker update --restart=no rancher-v2.2.5
# save data, with paths relative to $HOME so that the archive can be restored from there
cd $HOME
sudo tar cvfz $(date +"rancher-data-%Y%m%d.tgz") admin-rancher/data
# start the new version
VERSION=v2.2.6
docker run -d --restart=unless-stopped \
  --name rancher-$VERSION \
  -v $HOME/admin-rancher/data:/var/lib/rancher \
  -p 80:80 -p 443:443 \
  rancher/rancher:$VERSION

Then, if something goes wrong, you can do:

# stop the broken version
docker stop rancher-v2.2.6
docker update --restart=no rancher-v2.2.6
# revert data
cd $HOME
sudo tar xvfz rancher-data-DATE.tgz
# back to v2.2.5 (docker start takes the container name, not the image name)
docker start rancher-v2.2.5
docker update --restart=unless-stopped rancher-v2.2.5

The old version is now reverted.
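The upgrade steps can be generalised to any pair of versions. The following sketch only prints the commands so that they can be reviewed before being run. It assumes that containers are named `rancher-<version>` (for example `rancher-v2.2.5`) and that data lives in `$HOME/admin-rancher/data`; the function name is hypothetical.

```shell
# plan_rancher_upgrade OLD_VERSION NEW_VERSION
# Prints the upgrade commands in order; nothing is executed.
plan_rancher_upgrade() {
  local old new stamp
  old=$1
  new=$2
  stamp=$(date +%Y%m%d)
  echo "docker stop rancher-$old"
  echo "docker update --restart=no rancher-$old"
  echo "(cd \$HOME && sudo tar cvfz rancher-data-$stamp.tgz admin-rancher/data)"
  echo "docker pull rancher/rancher:$new"
  echo "docker run -d --restart=unless-stopped --name rancher-$new -v \$HOME/admin-rancher/data:/var/lib/rancher -p 80:80 -p 443:443 rancher/rancher:$new"
}
```

For example, `plan_rancher_upgrade v2.2.5 v2.2.6` prints the five commands of the example above; once checked, the output can be saved to a file and executed.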

Rancher CLI

Goal : Having a command-line interface to manage the cluster

Download the CLI from the link at the bottom right of the Rancher interface. Extract it and put the “rancher” command tool in your PATH.

Go to the “API & Keys” section of your profile page and press the “Add Key” button.

Give it a name (e.g. “kubectl key”) and set the scope to the cluster. Then copy the provided token.

In a terminal, type:

rancher login --token="<your token here>" https://ws66-admin-ep.tl.teralab-datascience.fr:8443/

Now, you are able to use rancher commands. To check, type:

rancher clusters

rancher nodes

rancher ps

You should see information about the cluster, nodes and running applications.

Install storageClasses

Goal : Being able to create storage with a size on demand

The storageClass will be able to create volumes on Ceph RBD with a specified size. Note that Ceph RBD does not allow one volume to be mounted on several nodes or containers.

You need the “admin-rancher” repository.

git clone [email protected]:innovation/ai4eu/admin-rancher.git /tmp/admin-rancher

cd /tmp/admin-rancher

# prepare the secrets, you need to set up user and password

# provided by hosting service for Ceph

export _USERPASS=XXX

export _ADMINPASS=YYY

# create the secrets

rancher kubectl -n kube-system create secret generic ceph-user-secret \
  --from-literal=key=$(echo $_USERPASS | base64) --type=kubernetes.io/rbd

rancher kubectl -n kube-system create secret generic ceph-admin-secret \
  --from-literal=key=$(echo $_ADMINPASS | base64) --type=kubernetes.io/rbd

unset _USERPASS

unset _ADMINPASS

Now the secrets are created, install storage class definitions:

rancher kubectl create -f ./ceph/storage-class.yml

rancher kubectl create -f ./ceph/storage-class-data.yml

Make a check:

rancher kubectl get storageclass

NAME PROVISIONER AGE

teralab-ceph kubernetes.io/rbd 0d

teralab-ceph-data (default) kubernetes.io/rbd 0d
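The check above can be automated by filtering the command output. The helper below is a sketch with a hypothetical name; it reads the `get storageclass` listing on standard input and succeeds only if the named class is present in the first column.

```shell
# has_storage_class NAME
# Reads a "kubectl get storageclass" listing on stdin, skips the header line
# and exits with status 0 when NAME is found in the first column.
has_storage_class() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

Typical usage would be `rancher kubectl get storageclass | has_storage_class teralab-ceph || echo "teralab-ceph is missing"`, repeated for `teralab-ceph-data`.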

Add labels to nodes

Goal : Have Pods deployed on specific nodes

We need to add labels on nodes to be able to select them for service affinity:

cd setup-nodes


make

Install private registry

Goal : Having internal images available for deployment from a local registry. The registry will be exposed as a NodePort so that each node can reach it at 127.0.0.1:PORT.

The private registry will be used to keep custom images. We will force the node port to be 30500.

# Install the Rancher-provided template for Docker Registry

# Note that we set nodeport to 30500 to make it possible to

# use node_ip:30500 or 127.0.0.1:30500 registry url

_REGISTRY=cattle-global-data:library-docker-registry

rancher app install --namespace registry \

--set persistence.enabled=true \

--set persistence.size=100Gi \

--set service.nodePort=30500 \

--set service.type="NodePort" \

--set podAnnotations."backup.velero.io/backup-volumes"="data" \

$_REGISTRY registry

unset _REGISTRY

Note: the pod annotation is added so that Velero can back up the images later. We also force the node port to be 30500 so that images can be pulled from 127.0.0.1:30500 on each node.

Add Legacy helm chart repository

Goal : Get more applications in the Rancher catalog, needed to install a standard Drone instance.

We need some legacy repositories that are not provided by the Rancher one. In Rancher UI, navigate to “Tools > Catalogs” and add the repository to the “Global” scope.

Wait for the catalog to be refreshed.

Build and push internal services Docker images

Goal : Get the specific Docker images needed by the AI4EU platform so that they can be deployed from the local registry

For the cluster to work as expected in the AI4EU context, we need to build some custom images.

We will build

- smtpd service to send emails from internal services

- custom DNS based on dnsmasq to avoid internal access to use public addresses

This section will also build custom images that we will install later:

- identity-provider image, a custom Docker image built for WSO2

- php image, a “source to image” (s2i) compliant Docker image used to build PHP applications from source (e.g. for the portal website, which is built with Drupal)

Note that these images will be rebuilt automatically on each update once Drone is installed in a later section (parts Automatic chart package and updates and Automatic build images).

For the initialisation, we need to build them manually.

Open a new terminal and set up a port-forward to the registry:

rancher kubectl -n registry port-forward svc/registry-docker-registry 5000:5000

Then, in a second terminal:

git clone [email protected]:innovation/ai4eu/docker-images.git /tmp/docker-images

cd /tmp/docker-images

# build all images, it can take a while

make all

# push images

make push-all

This will push custom images in the private registry.

You can now stop the port-forward pressing CTRL+C in the first terminal.

AI4EU Rancher Catalog

Goal : Having our own helm charts stored in a Gitea repository so that they are accessible from the Rancher Catalog, which manages updates, installation forms, and so on

Rancher can use a Git repository as a catalog to manage applications, and uses the “questions.yml” file as a form generator.

After having pushed the charts to https://git.ai4eu.eu/smile/charts, go to Rancher > Tools > Catalog and add the repository using the IP address and NodePort of gitea.

To get the svc NodePort:

rancher kubectl -n git get svc

Take the node port corresponding to the 3000 port.

Then add a “cluster” catalog, name it “custom” and set up:

- http://10.200.211.33:<nodeport>/smile/charts as url

- Activate “private repository”

- Give a reader user and password

- Validate

This activates the catalog. To check that it works, go to the “Apps” tab, press “Launch” and select the “Custom” catalog. The applications listed are the ones that reside in the Gitea repository.

Install internal services

DNSMASQ

Goal : Prevent DNS from returning external IP addresses for internal services

The internal dnsmasq prevents ai4eu.eu DNS requests from resolving to the public address, which is not allowed by the firewall. It is used for stub domain resolution and points to the 10.200.211.11 IP address instead.

In Rancher UI, go to Project “Default” > Apps > Launch.

Search for “dnsmasq”, which is in the “custom” catalog. Click on “View details”, then fill in the form:

Use the “infra” namespace, then press the “Launch” button at the bottom.

This starts dnsmasq on the internal IP address 10.43.50.50.

Wait for dnsmasq to start, then add this DNS server to the kube-dns service:

rancher kubectl -n kube-system create configmap kube-dns \
  --from-literal=stubDomains='{"ai4eu.eu":["10.43.50.50"]}'

Applications launched in Kubernetes will now resolve each “*.ai4eu.eu” name with the addresses configured in the “custom-dns” configMap in the “infra” namespace.

To change the mapping:

rancher kubectl -n infra edit cm custom-dns

This opens an editor with the configuration. You can add or remove entries; once saved, dnsmasq is refreshed.
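The `stubDomains` value passed to the kube-dns configMap is a small JSON object that maps a domain to a list of DNS server addresses. Typing it by hand is error-prone, so the sketch below builds it from arguments. The function name is hypothetical and no escaping is performed, so it is only suitable for plain domain names and IP addresses.

```shell
# stub_domains_json DOMAIN IP [IP...]
# Prints {"DOMAIN":["IP",...]} for use as the stubDomains literal.
stub_domains_json() {
  local domain ips ip
  domain=$1
  shift
  ips=""
  for ip in "$@"; do
    # Append each address as a quoted, comma-separated JSON string.
    ips="${ips:+$ips,}\"$ip\""
  done
  printf '{"%s":[%s]}\n' "$domain" "$ips"
}
```

With the values used in this guide, `stub_domains_json ai4eu.eu 10.43.50.50` prints `{"ai4eu.eu":["10.43.50.50"]}`, which can be passed as `--from-literal=stubDomains="$(stub_domains_json ai4eu.eu 10.43.50.50)"`.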

SMTPD

Goal : Having a local mail server to send emails to users

In Rancher UI, go to Project “Default” > Apps > Launch.

Search for “smtpd” in the “custom” catalog.

Press “View details” and change “namespace” to “infra”.

Press the “Launch” button.

Applications can now use “smtpd.infra:25” as mail server.

Startup gitea and drone

Gitea is the Git server that will hold some projects and the custom helm charts we will build and push to ChartMuseum.

Gitea

Goal : Having application source code on the local cluster, needed for Drone to build and deploy applications

Get the original chart git repository

git clone [email protected]:innovation/ai4eu/charts.git /tmp/charts

cd /tmp/charts

Install gitea from the command line in the “git” namespace:

rancher app install -n git \

--set persistence.size=1Gi \

--set persistence.storageClass=teralab-ceph \

./charts/gitea \

gitea

This creates an instance at https://git.ai4eu.eu

Important: Navigate to the URL and create a first user - it will be the administrator !

Drone

Goal : Handle Gitea triggers to build applications (as Docker images), push them to the local Docker registry and deploy them on the cluster

Drone is installed from the Legacy helm chart repository to get the latest version.

Create a file named /tmp/drone.yaml containing:

server:
  host: drone.ai4eu.eu
  protocol: https
  env:
    DRONE_RUNNER_PRIVILEGED_IMAGES: plugins/docker,plugins/ecr,metal3d/drone-plugin-s2i
ingress:
  enabled: true
  hosts:
    - drone.ai4eu.eu
  tls:
    - secretName: ai4eu-eu
      hosts:
        - drone.ai4eu.eu
sourceControl:
  provider: gitea
  gitea:
    server: https://git.ai4eu.eu
dind:
  args: '["--insecure-registry=10.0.0.0/8"]'

Then:

rancher app install -n drone \

--values=/tmp/drone.yaml c-t6c42:helm-legacy-drone drone

You can now go to https://drone.ai4eu.eu and authenticate with the user created on Gitea.

Automatic images build

On Drone, activate smile/docker-images

Each new modification of an image definition pushed to Gitea will start a new build and push the images to the internal registry.

Check Part 1 finalization

When all steps are done, check that:

- kubernetes nodes are up and running

- chart museum is deployed

- you have 3 helm chart catalogs (rancher, legacy and custom)

- there is one registry running on node port 30500

- Drone and Gitea are up and running

- there are 2 repositories in automatic build:

  - smile/docker-images

  - smile/charts

Part 2 - AI4EU platform installation 

Portal instances and dependencies

instance     namespace        url                            db
testing      portal-testing   https://k8jmlo451.ai4eu.eu     portal-testing mariadb-portal-testing-mariadb
staging      portal           http://bd73h83933ls.ai4eu.eu   portal mariadb-mariadb
production

Current WSO2 IS instance is running in “wso2” namespace and uses “commondb” mariadb

instance in the same namespace. URL is https://is.ai4eu.eu

WSO2 Identity server

To be able to install WSO2 you will need to prepare a database. Then you can use the

helm-chart named “identity-provider” to install the server.

Install mariadb

Create a namespace named “wso2” and start mariadb with replicas. You can use the Rancher application UI or this command:

rancher app install -n wso2 \
  --set db.user="root" \
  --set db.password="the password here" \
  --set slave.replicas=3 \
  --set master.persistence.enabled=true \
  --set master.persistence.size="8Gi" \
  cattle-global-data:library-mariadb commondb

Now that the database is started, get the git.smile.fr:innovation/ai4eu/charts.git repository and import the database dumps it contains:

_NS="wso2"

_POD="commondb-mariadb-master-0"

_PASS="the password here"

rancher kubectl -n $_NS exec \

-i $_POD \

-- mysql -u root --password="$_PASS" \

< charts/identity-server/mysql-scripts/mysql5.7.sql

rancher kubectl -n $_NS exec -i $_POD \

-- mysql -u root --password="$_PASS" \

< charts/identity-server/mysql-scripts/um_mysql5.7.sql

These commands create the databases:

● wso2reg_db

● wso2um_db

They also create the “wso2” user, which has permissions on these databases.
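Before moving on, it can be worth confirming that both databases really exist. The following is a minimal sketch of such a check, not part of the original procedure: the helper name missing_databases is hypothetical, and it only parses the text that “show databases;” prints.

```shell
# Hypothetical helper (not part of the charts repository): reads the output of
# "show databases;" on STDIN and prints each expected database that is missing.
missing_databases() (
  found=$(cat)
  for db in wso2reg_db wso2um_db; do
    # -x matches the whole line, -q keeps grep silent
    printf '%s\n' "$found" | grep -qx "$db" || printf '%s\n' "$db"
  done
)

# Usage against the cluster (assumes the _NS, _POD and _PASS variables set above):
#   echo "show databases;" | rancher kubectl -n "$_NS" exec -i "$_POD" \
#     -- mysql -u root --password="$_PASS" | missing_databases
```

An empty output means that both databases are present.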

After the database is ready, we can install WSO2.

Install WSO2

Use the helm-chart to install identity-server:

rancher app install -n wso2 ./charts/identity-server identity-provider

TODO: Use helm-char

The default ingress is at is.ai4eu.eu. You can now navigate to the IS server and add the configuration.

In Service Provider, click “Add”, then “File Configuration”, and choose the charts/identity-provider/ai4eu-dev-oauth2.xml file.

Press “Import” to import the configuration.

The identity provider is now integrated.

Portal

Installing portal is done in 2 steps:

- prepare database

- install application

Prepare Database

To prepare the database, go to the Rancher UI and create a mariadb application in a namespace, e.g. portal-testing.

For pre-production or production, use a replicated mariadb server.

For production, activate persistence for the database.

Name the application “mariadb-portal-testing” or any other name that corresponds to the portal you want to deploy.

After deployment, check that mariadb is OK (use the right namespace, here “portal-testing”):

rancher kubectl -n portal-testing get svc

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
mariadb-portal-testing-mariadb   ClusterIP   10.43.236.204   <none>        3306/TCP   6m47s

Get the dump to inject and start the import (change the user, password and database name):

kubectl -n portal-testing exec -i mariadb-portal-testing-mariadb-0 \
  -- mysql -uadmin --password="admin" drupal \
  < ~/Documents/ai4eu-preprod.db

Then, check if tables are injected:

$ kubectl -n portal-testing exec -i mariadb-portal-testing-mariadb-0 \

-- mysql -uadmin --password="admin" drupal <<< "show tables;" | wc -l

315

Here, the count is 315 (the output includes one header line). You need a substantial table count (more than 50) to be sure that the mysql dump was imported.
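The threshold check can also be scripted. This is a minimal sketch under stated assumptions: the helper name enough_tables is hypothetical, the default of 50 is the value suggested above, and the first output line is assumed to be the column header that mysql prints when its input is not a terminal.

```shell
# Hypothetical helper: succeeds when the "show tables;" output read on STDIN
# lists more than a minimum number of tables (default: 50).
enough_tables() (
  min=${1:-50}
  # Skip the header line, then count the remaining lines (one per table)
  count=$(awk 'NR > 1 { n++ } END { print n + 0 }')
  [ "$count" -gt "$min" ]
)

# Usage against the cluster (same pod and credentials as above):
#   echo "show tables;" | kubectl -n portal-testing exec -i \
#     mariadb-portal-testing-mariadb-0 \
#     -- mysql -uadmin --password="admin" drupal \
#     | enough_tables && echo "dump imported"
```

Returning an exit status rather than printing a number lets the check be chained with && in a deployment script.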

Install Portal

Then, to install the portal, go to the Rancher UI, Apps > Launch.

Select the “Custom” catalog and select “ai4eu-portal”.

Press View Details and change the values:

- name: portal (could be any name, but choose one that is relevant)

- Customize the namespace, then press “use existing namespace” and select the namespace where you started mariadb

- Change the “ceph path” to mount; there are mainly 3 directories:

  - uat for user acceptance tests (for testing)

  - pre-prod that is more or less stable

  - production that is made for “production” and should not be used for testing

- Docker image: choose the latest (WARNING: the “last” tag is not the latest for now). Go to drone.ai4eu.eu and check the version you want to deploy. Commonly, it is “127.0.0.1:5000/smile/portal:<tag name>”

- Put the right database host, here: mariadb-portal-testing-mariadb

- Set the database name, user and password you provided for the mariadb installation

You can take a look at the generated answers to verify your settings (select the answers.yaml file):

You can then press the “Launch” button to start the deployment.

Activate Drone deployment

This part enables Drone to deploy new versions on git events. We need to configure a service account, roles and role bindings to let Drone access the API and make changes to the deployment.

Drone deployment uses the “.drone.yml” file in the project.

There are several build events for the different steps.

To let Drone deploy, each namespace where the portal is deployed should have a “deployer” role bound to a “drone-deployer” service account, with authorization to perform some actions (list, patch, create...).

Taking the namespace named “portal-testing”:

_NS=portal-testing

rancher kubectl -n $_NS create sa drone-deployer

rancher kubectl -n $_NS create role deployer \

--resource=deployments,services,pods \

--verb=list,watch,patch,create,get

rancher kubectl -n portal-testing create rolebinding candeploy \

--role=deployer \

--serviceaccount=portal-testing:drone-deployer
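The binding can be verified before going further. The sketch below is illustrative only: the helper name sa_user is hypothetical, and the check relies on “kubectl auth can-i” with impersonation, which assumes that your own account is allowed to impersonate service accounts.

```shell
# Hypothetical helper: builds the user name under which Kubernetes identifies
# a service account, in the form system:serviceaccount:<namespace>:<name>.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage against the cluster (expected to answer "yes" once the rolebinding exists):
#   rancher kubectl -n portal-testing auth can-i patch deployments \
#     --as="$(sa_user portal-testing drone-deployer)"
```

A “no” answer usually means that the role, the rolebinding or the service account was created in a different namespace.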

You can now get the secret:

rancher kubectl -n portal-testing get secret | grep "deployer"

drone-deployer-token-6fk7j ...

# note the secret name

# here, use the secret name:

kubectl -n portal-testing describe secret drone-deployer-token-6fk7j

Name: drone-deployer-token-6fk7j

Namespace: portal-testing

#...

Type: kubernetes.io/service-account-token

Data

====

namespace: 14 bytes

token: eyJhb... # truncated => this is the token to copy

ca.crt: 1017 bytes
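To avoid copying the token by hand, the value can be extracted from the same output. This is a small sketch, assuming the “token:” line format shown above; the helper name token_from_describe is hypothetical.

```shell
# Hypothetical helper: reads "kubectl describe secret" output on STDIN and
# prints only the value that follows the "token:" label.
token_from_describe() {
  awk '$1 == "token:" { print $2 }'
}

# Usage against the cluster (secret name taken from the listing above):
#   kubectl -n portal-testing describe secret drone-deployer-token-6fk7j \
#     | token_from_describe
```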

Copy the entire “token” (truncated here for documentation) and go to drone.ai4eu.eu.

You need to add this token as a secret in the Drone project.

Go to “smile/portal” (or any repository you activated) and open the settings tab. Scroll down to “Secrets” and create a new one. Name it (here, “drone-testing-deployer”) and paste the token in the text field:

Press the “Add a secret” button.

Then, in the “.drone.yml” file, find the build steps that deploy to the corresponding namespace (here, we look for the “portal-testing” deployment) and change the “from_secret” attribute:

- name: deploy
  image: quay.io/honestbee/drone-kubernetes
  settings:
    kubernetes_server: https://10.43.0.1
    kubernetes_token:
      from_secret: drone-testing-deployer
    namespace: portal-testing
    deployment: portal-portal-testing
    container: portal
    repo: 127.0.0.1:30500/ai4eu/portal
    tag: testing-${DRONE_COMMIT}

Note:

- the kubernetes server is the local API address, get it with

kubectl -n default get svc kubernetes

- we use 127.0.0.1:30500, which is the NodePort of the private registry, get it with

echo $(kubectl -n registry get svc registry-docker-registry \
  -o jsonpath='{.spec.ports[0].nodePort}')

As a result, the “deploy” step uses the “portal-testing” namespace, finds the deployment named “portal-portal-testing” and updates the “portal” container to the newly built Docker image. It uses our “drone-testing-deployer” secret to contact the kubernetes API (without that secret, the kubernetes API refuses to let Drone change the deployment).

When the deployment is started, you can check the output with these commands:

NS=portal-testing

APP=portal-testing

POD=$(rancher kubectl -n $NS get pods \

--selector=app=portal-$APP \

--field-selector=status.phase=Running \

-o jsonpath='{.items[0].metadata.name}')

kubectl -n $NS logs -f $POD

Hit CTRL+C to stop logs.

Part 3 - Common commands to maintain and fix

You may need to run some commands to manipulate components, services or configuration.

In this section, we consider that you configured rancher CLI or Kubectl and that you can

connect to kubernetes with one of these tools.

Cleanup remaining jobs

For now, Kubernetes isn’t configured with the “TTL” feature gate, so finished jobs have to be removed manually from time to time.

To remove jobs that are completed:

for j in $(kubectl -n drone get jobs | awk '/1\/1/{print $1}'); do

kubectl -n drone delete job $j

done
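The loop above only matches jobs shown as “1/1”. A slightly more general filter can be sketched as follows; the helper name completed_jobs is hypothetical, and it assumes that the second column of “kubectl get jobs” is COMPLETIONS in the form done/wanted.

```shell
# Hypothetical helper: reads "kubectl get jobs" output on STDIN and prints the
# names of jobs whose COMPLETIONS column shows all runs done (1/1, 3/3, ...).
completed_jobs() {
  awk 'NR > 1 { split($2, c, "/"); if (c[2] > 0 && c[1] == c[2]) print $1 }'
}

# Usage against the cluster:
#   kubectl -n drone get jobs | completed_jobs | while read -r j; do
#     kubectl -n drone delete job "$j"
#   done
```

Separating the filter from the deletion makes it possible to review the list of jobs before removing anything.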

Get a terminal session to launch commands

While the Rancher UI allows you to open a terminal session in the web interface, it is sometimes necessary to use a real terminal to get STDIN/STDOUT (e.g. to dump a mariadb database to your local computer).

To do that task:

- First, get the pod name

kubectl -n <namespace> get pods

- Then use that pod name to open a terminal

kubectl -n <namespace> exec -it <podname> -- bash

(use sh instead of bash for alpine-based images)

The “-i” option keeps STDIN open, so that input and output can be piped through your own terminal.

The “-t” option starts a tty session, so that you can use keyboard shortcuts such as “CTRL+C”.

Exiting the shell session will not stop the container. It only stops the terminal session.

Keep in mind that the “>” and “<” signs are interpreted by your local shell when you use them with kubectl. For example, to get a mysql dump from the “mariadb” pod in the “default” namespace, you only need the “-i” option to use STDIN/STDOUT:

kubectl -n default exec -i mariadb -- mysqldump -u"admin" \
  --password="pwd" -b database > local.dump.sql

And you can also push dumps:

kubectl -n default exec -i mariadb -- mysql -u"admin" --password="pwd" \
  database < local.dump.sql

Copy files from/to containers

Use kubectl “cp” command

# get files from a pod

kubectl -n default cp <podname>:/path/to/file ./local/path

# push files to a pod

kubectl -n default cp ./local/path <podname>:/path/to/file

The order is “source” to “destination”

If a pod has several containers, you can specify the container name with the “-c” option:

kubectl -n default cp ./local/path <podname>:/path/to/file -c <containername>

Note: the container must have the “tar” command installed

Port Forwarding

It is sometimes useful to bind a local port to a container/service that is running on Kubernetes, for example when a web service is not exposed to the internet.

Take the example of docker-ui, which is installed on the cluster.

kubectl -n registry get svc

NAME TYPE CLUSTER-IP PORT(S)

docker-ui ClusterIP 10.43.50.251 80/TCP

...

The port for the service is 80:

kubectl -n registry port-forward svc/docker-ui 8080:80

We bind the local port “8080” to the service port “80”. Now open http://localhost:8080

The same applies if you need to use the registry to push/pull images:

kubectl -n registry get svc

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE

docker-ui ClusterIP 10.43.50.251 <none> 80/TCP 42d

registry-docker-registry NodePort 10.43.143.215 <none> 5000:30500/TCP 71d

Taking the 5000 port:

kubectl -n registry port-forward svc/registry-docker-registry 5000:5000

Try to pull image:

$ docker pull localhost:5000/ai4eu/portal

Using default tag: latest

latest: Pulling from ai4eu/portal

743f2d6c1f65: Already exists

6307e89982cc: Already exists

807218e72ce2: Already exists

5108df1d03f8: Already exists

901e0b6a7fe5: Already exists

5ffe11e7ab2c: Already exists ...

You can also forward a port of a pod, using the pod name instead of the service name.

Part 4 - Manage Backups with Velero and Companion

To make backups, we are using 2 tools:

- Velero, by Heptio, a complete tool that is able to back up resources and volumes to S3 compatible storage

- A “companion” container, launched as a Job in Kubernetes, to make specific backups

Velero backups

Velero comes with a CLI that is installed on the “admin” machine. It needs to be able to

connect to kubernetes API.

The kubectl configuration resides on the machine, and is encrypted with GPG. To decrypt:

gpg ai4eu.yaml.gpg

Type the password and you’ll get the ai4eu.yaml file.

Now export the environment variable and check if it works:

$ export KUBECONFIG=$HOME/ai4eu.yaml

$ kubectl cluster-info

Kubernetes master is running at https://10.200.211.30:6443

KubeDNS is running at https://10.200.211.30:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

List the backups and the scheduled backups:

velero backup get

velero schedule get

Backup “one-shot”

To create a backup, you need:

- a selector to know the resources to save

- a TTL, which is the time after which the backup is deleted to save space

To back up a namespace as a one shot, e.g. the portal-testing namespace:

velero backup create testing-portal --include-namespaces portal-testing

After a while:

velero backup get testing-portal

NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR

testing-portal Completed 2019-07-11 10:20:01 +0200 CEST 29d default <none>

Backup “Schedule”
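A scheduled backup combines a cron expression with the same namespace selector and a TTL. The sketch below only assembles and prints such a command, so that it can be reviewed before being run. The schedule (daily at 01:00), the TTL (720h) and the helper name schedule_cmd are assumptions for illustration, not values taken from this platform.

```shell
# Hypothetical helper: prints a "velero schedule create" command for review.
# Arguments: schedule name, namespace, cron expression, TTL (all illustrative).
schedule_cmd() {
  printf 'velero schedule create %s --include-namespaces %s --schedule="%s" --ttl %s\n' \
    "$1" "$2" "$3" "$4"
}

# Print a daily schedule for the portal-testing namespace
schedule_cmd testing-portal portal-testing "0 1 * * *" 720h0m0s
```

Once the printed command has been checked, it can be pasted into a terminal where the velero CLI is configured.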

Apply a backup

To restore a backup, you need to create a restore using the name of the backup you want to restore.

Example for the testing-portal backup

velero restore create testing-restore --from-backup testing-portal

To restore from a scheduled backup:

velero restore create testing-restore --from-schedule testing-portal

It will bring the cluster back to the state of the backup, restoring each kubernetes resource and, if provided, volume snapshots.

Delete Backup or Schedule

Deleting a backup, or schedule backup, will remove the configuration and backups in the S3

server.

To delete a backup or schedule configuration, use:

velero backup delete <name>

E.g.

velero backup delete portal-testing

Companion backup

TODO

Part 5 - Common problems, fixes and workarounds

Rancher

Let’s Encrypt certificate problems

For various reasons, the “Let’s Encrypt” certificate on Rancher can get broken. For example, if the ACME protocol cannot reach Rancher, your requests may be banned for weeks after several unsuccessful attempts.

That means that Rancher will not be accessible, and the cluster agent will fail to contact the Rancher service.

You can then use a workaround: transfer the Rancher configuration to another instance and remove the certificate validation.

Stop Rancher from admin server:

docker stop rancher

Backup the data volume

cd admin-rancher

export DATA="data-$(date +"%d%m%Y-%H%M%S")"

tar cvfz $DATA.tgz data

mv $DATA.tgz $HOME

Remove cert-cache

sudo rm -rf data/certs-cache/*

Then start a new rancher docker container:

docker run --name=rancher-temporary --restart=unless-stopped \

-d -v $(pwd)/data:/var/lib/rancher \

-p 443:443 -p 80:80 \

rancher/rancher:v2.2.1

The container will start but there will be errors for the cluster agent. You need to activate certificate validation for this self-signed certificate:

sha256sum < data/management-state/tls/ca.key | cut -f1 -d " "

Copy that sum, and edit deployment for the agent:

export EDITOR=vi

kubectl -n cattle-system edit deployment cattle-cluster-agent

Find CATTLE_CA_CHECKSUM and add a value:

- name: "CATTLE_CA_CHECKSUM"

value: "<paste the sum here>"

Save and quit. This restarts the cluster agent, and the cluster is now able to accept the Rancher self-signed certificate.
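The manual edit can also be done in one command. This is a minimal sketch, assuming the same file as above and a working kubectl context; the helper name file_checksum is hypothetical, and “kubectl set env” is used here as an alternative to editing the deployment by hand.

```shell
# Hypothetical helper: prints the SHA-256 sum of a file, without the file name.
file_checksum() {
  sha256sum < "$1" | cut -f1 -d " "
}

# Usage against the cluster (run from the admin-rancher directory):
#   kubectl -n cattle-system set env deployment/cattle-cluster-agent \
#     CATTLE_CA_CHECKSUM="$(file_checksum data/management-state/tls/ca.key)"
```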

TODO: Back to let’s encrypt

Registry problems

Error 500 on image push (no space left on device)

That error is commonly raised when there is no space left on the device. It can be seen in a Drone build when pushing a new application version to the registry. As a result, the image is not pushed, the build is in failed state and the new version cannot be deployed.

You can check errors using the following commands:

rancher kubectl -n registry get pods

rancher kubectl -n registry logs --tail=100 <podname>
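The disk usage of the registry volume can be checked in the same way. The sketch below is an assumption-laden example: the helper name used_percent is hypothetical, and /var/lib/registry is assumed to be the storage path inside the registry pod, which may differ on your installation.

```shell
# Hypothetical helper: reads "df -P <path>" output on STDIN and prints the
# use percentage of the first file system as a plain number.
used_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage against the cluster (the path is an assumption):
#   rancher kubectl -n registry exec <podname> -- df -P /var/lib/registry \
#     | used_percent
```

A value close to 100 confirms that the push failures come from a full volume rather than from another registry error.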

To solve the “no space left on device” problem, you will need to clean up the storage. You may use the “registry-cli” tool, which eases the procedure (see below).

First, ensure that the REGISTRY_STORAGE_DELETE_ENABLED variable is set to “true” in the deployment.

kubectl -n registry get deployments registry-docker-registry \

-o yaml | grep -A1 DELETE

# you should see:

- name: REGISTRY_STORAGE_DELETE_ENABLED

value: "true"

If not, edit the deployment and add the variable, or use Rancher UI to add that variable.

Open a terminal and start a port-forward to the registry service:

rancher kubectl -n registry port-forward svc/registry-docker-registry \
  5000:5000

On a second terminal, download the registry-cli tools from

https://github.com/andrey-pohilko/registry-cli :

cd /tmp

git clone https://github.com/andrey-pohilko/registry-cli

cd registry-cli

python3 -m venv v

source v/bin/activate

pip3 install -r requirements-ci.txt

chmod +x registry-cli.py

Then run this command to check that everything is OK:

./registry-cli.py -r http://127.0.0.1:5000

You should see the entire list of images.

Now, clean up the registry to remove old images, except the 10 latest (for example):

./registry-cli.py -r http://127.0.0.1:5000 --delete --num=10

Then, you can stop (CTRL+C) the port forward on the first terminal.

Finally, you need to run the garbage collector on the registry:

POD=$(kubectl -n registry get pods \
  --selector=app=docker-registry --no-headers | awk '{print $1}')

rancher kubectl -n registry exec -it $POD -- \
  bin/registry garbage-collect /etc/docker/registry/config.yml

Now the storage is cleaned up and you should have enough space to push new images.

On Drone, you can press the “restart” button on failed build tasks to retry pushing the images. Errors should disappear.
