
Utilizing Apple’s ARKit 2.0 for Augmented Reality Application Development

Ivan Permozer* and Tihomir Orehovački**

Juraj Dobrila University of Pula

* Faculty of Economics and Tourism „Dr. Mijo Mirković“, Pula, Croatia

** Faculty of Informatics, Pula, Croatia

{ipermoze, tihomir.orehovacki}@unipu.hr

Abstract - When it comes to practical augmented reality applications, mobile platform tools deserve the most credit. Thanks to the nature of mobile devices and their everyday usage, the ideal basis for this kind of content has inadvertently formed itself. Consequently, within the iOS development environment, Apple's Xcode program enables application development using the ARKit library which delivers a host of benefits. Amongst the plethora of advantages, this paper focuses on utilizing features such as the ability to measure distances between two points in space, horizontal and vertical plane detection, the ability to detect three-dimensional objects and utilize them as triggers, and the consolidated implementation of ARKit and MapKit libraries in conjunction with the Google Places API intended for displaying superimposed computer-generated content on iOS 11 and later iterations of Apple's mobile operating system.

Keywords – Apple, AR, ARKit, Augmented Reality, Computer Vision, Core ML, Custom Vision, Google Places, iOS, iPhone, Machine Learning, Map Kit, Microsoft Cognitive, POI, Points of interest, SceneKit, Swift, Xcode

I. INTRODUCTION

By virtue of recent technological advancements, computer-generated graphics have reached a level at which achieving a credible depiction of photorealistic content no longer presents a challenge, but rather serves as a stepping stone when it comes to defining the concept of reality through the convergence of real and virtual content. Therefore, it could be argued that the term Augmented Reality represents the bridge between the real and virtual worlds.

Since its inception, augmented reality has drastically changed the way information gets perceived. Although present for many years, augmented reality has flourished thanks to the emergence of sufficient hardware and software support towards the end of the 20th century. As a result, interacting with virtual objects superimposed over the real environment is no longer a matter of fiction, but a fundamental element of numerous applications and services which intend to simultaneously enhance and simplify interactions and consumption of all sorts of digital content.

Even though augmented reality gets utilized in a wide range of human activities, this paper intends to focus on a few specific and potentially practical implementations of AR within Apple's iOS operating system.

Consequently, the applications depicted in this paper are based on Apple's ARKit 2.0 library which greatly simplifies the development of AR based applications in several key aspects with the use of Apple's Swift programming language compiled through Xcode.

II. DISTANCE MEASURING APPLICATION

Thanks to recent advancements in technology, many everyday tools are gradually getting replaced and improved upon. One such example is the classic ruler or tape measure, whose role could be taken over by an application based on augmented reality.

The following application attempts to showcase that potential and bridge the gap between reality and technology by utilizing augmented reality, with the iOS device's onboard camera as an input element, in order to achieve a credible display of the distance between two desired points in space (Figure 1).

Displaying the distance was achieved by using the UILabel class which, among other properties, contains a CGRect element representing a rectangle on top of which a string of alphanumeric characters depicts the distance measured in centimeters with two or more decimal places.

Since the purpose of the application is to visually depict dots in space based on the user's input, the SCNNode class along with an SCNMaterial instance represents the key component in translating touch input into a graphical interpretation of a dot on the screen of the device.

Lastly, the inclusion of the UITapGestureRecognizer class, which is responsible for detecting the user's taps, represents the final ingredient necessary to combine the touch input with the distance measurement between the two separate points in space.

The basic idea behind the application rests on utilizing the distance formula based on the Cartesian coordinate system and applying it in three-dimensional space. Combining that concept with Apple's ARKit library, which is responsible for detecting horizontal and vertical surfaces, results in a credible distance measurement display based on the user’s input through the device's touchscreen interface.
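To make the preceding description concrete, the following sketch outlines how the tap handling, node placement and three-dimensional distance formula could fit together in Swift; it is a minimal illustration assuming a storyboard-connected ARSCNView named sceneView, not the application's actual source code.

import UIKit
import ARKit
import SceneKit

class MeasureViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!   // assumed storyboard outlet
    var dotNodes = [SCNNode]()

    override func viewDidLoad() {
        super.viewDidLoad()
        // UITapGestureRecognizer translates the user's taps into node placement.
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tap)
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)
        // Project the 2D touch onto a detected plane to obtain a 3D world position.
        guard let hit = sceneView.hitTest(location, types: .existingPlaneUsingExtent).first else { return }
        let t = hit.worldTransform.columns.3

        // A white sphere (SCNNode + SCNMaterial) marks the tapped point.
        let dot = SCNNode(geometry: SCNSphere(radius: 0.004))
        dot.geometry?.firstMaterial = SCNMaterial()
        dot.geometry?.firstMaterial?.diffuse.contents = UIColor.white
        dot.position = SCNVector3(t.x, t.y, t.z)
        sceneView.scene.rootNode.addChildNode(dot)
        dotNodes.append(dot)

        // Cartesian distance formula applied in three dimensions (ARKit units are meters).
        if dotNodes.count == 2 {
            let a = dotNodes[0].position, b = dotNodes[1].position
            let distance = sqrt(pow(b.x - a.x, 2) + pow(b.y - a.y, 2) + pow(b.z - a.z, 2))
            print(String(format: "%.2f cm", Double(distance) * 100))
        }
    }
}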

Through the process of testing the application several things stood out. Even though the desired points can be placed anywhere within the visible spectrum provided by the device's wide-angle camera, the most precise measurements are achievable in the range of 10 to 300 cm. Also worth noting is the fact that the measurements get more precise the closer the device is to the object that is being measured. Where the ARKit implementation really shines is the fact that the measurement points don't need to be visible on the screen at the same time, which ultimately translates to a high level of precision as long as the device is placed in close proximity to the object that is being measured [4] (Figure 1).

In conclusion, thanks to ARKit's tracking and mapping system, one could say that the points placed on top of the real world embody the definition of augmented reality.

Figure 1. Demonstrating the basic capability of the distance measuring application

III. OBJECT RECOGNITION APPLICATION

In contrast to the previous approach to implementing augmented reality, the following application utilizes standard elements of the ARKit and SceneKit libraries in conjunction with the Core ML and Vision libraries.

While the ARKit and SceneKit libraries deliver data related to planar detection and store user-defined parameters superimposed over the user's real environment, the Vision library introduces the ability to use Machine Learning and Computer Vision in order to determine the name of the perceived object based on the specific predefined category it belongs to.

The category assessment process is done with the help of predefined Machine Learning models such as Inceptionv3, Resnet50, MobileNet, SqueezeNet, GoogLeNetPlaces and VGG16 readily available within the Apple Developer documentation [2]. Also, along with the aforementioned Machine Learning models, a custom Croatian language based model was designed specifically for the purpose of application testing and demonstration.

The functionality of the application itself rests on several key components. One such component is VNCoreMLModel from the Core ML library which is responsible for utilizing the Machine Learning model within the code [7].
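As an illustration of that component, the following fragment sketches how a compiled Core ML model could be wrapped in a VNCoreMLModel and attached to a classification request; the ARVision class name mirrors the custom model shown in Figure 2, but the exact generated class name and surrounding code are assumptions made for this sketch.

import Vision
import CoreML

// Wrap the compiled Core ML model so the Vision framework can execute it.
guard let visionModel = try? VNCoreMLModel(for: ARVision().model) else {
    fatalError("Unable to load the Core ML model")
}

// The completion handler reads the strongest classification for the perceived image.
let classificationRequest = VNCoreMLRequest(model: visionModel) { request, error in
    guard let best = request.results?.first as? VNClassificationObservation else { return }
    print("\(best.identifier) (\(Int(best.confidence * 100))%)")
}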

Thanks to the inclusion of UIKit's UITapGestureRecognizer class, the application is able to recognize user input and translate it into a specifically defined action. In this case, similarly to the distance measuring app, the application generates white spheres with the addition of a three-dimensional text element attached to each sphere's node.

The Vision library’s role comes into play once the perceived image needs to be compared to the desired Machine Learning model which is initiated with the VNClassificationObservation component (Figure 2).

Initializing the image request is done with the help of a do-catch statement accompanied by the keyword try, which implies that the request might return an error. Following a successful do-catch statement, the Vision request gets initiated with the help of the perform function.
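A compact sketch of that step is shown below; it reuses the classification request created earlier, and the function wrapper is an assumption introduced only to keep the fragment self-contained.

import Vision
import CoreVideo

// Run the Vision request on the current camera frame inside a do-catch statement.
func classify(pixelBuffer: CVPixelBuffer, using request: VNCoreMLRequest) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}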

Figure 2. Implementation of the custom ARVision model within the code using the Xcode interface

Another thing worth noting is the fact that the image perceived by the ARKit library gets recorded in a YUV format, which in turn means it needs to be converted to RGB values in order for the device to be able to interpret the image properly and ultimately ensure future compatibility with a broader range of Machine Learning models.
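The paper does not show its conversion code, but one common approach is to route the captured YUV pixel buffer through Core Image, as in this hedged sketch.

import ARKit
import CoreImage

// Convert ARKit's YCbCr (YUV) capturedImage into an RGB CGImage via Core Image.
func rgbImage(from frame: ARFrame) -> CGImage? {
    let ciImage = CIImage(cvPixelBuffer: frame.capturedImage)
    let context = CIContext()
    return context.createCGImage(ciImage, from: ciImage.extent)
}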

For the creation of the custom Machine Learning model, Microsoft's Cognitive Services offering named Custom Vision was used. The purpose of the service is to simplify the creation and training of smaller Machine Learning models.

To ensure a more precise result, each specific object category contains a sequence of 100 to 500 photos. Since the model requires a series of repetitive training cycles, each cycle contained up to 60 different photos. The number of photos per object was determined based on their physical features, i.e. simpler objects needed fewer photos than more complex ones.

All of the uploaded photos, along with their respective tags used for determining the name of the class which they belong to, fall under the General (compact) domain within the Microsoft Custom Vision interface since that's the only way to extract the needed data in a format compatible with the iOS Core ML models.

In total, 5000 out of the maximum allowed 5000 photos were uploaded under 17 different tags (Figure 3). Also, 10 out of 10 model training iterations were used to ensure the model's capability to differentiate all 17 categories of objects it contains. Since the minimum acceptable percentage of a usable model is considered to be upwards of 60%, and the achieved precision and recall percentages of 10 consecutive training sessions reached upwards of 86%, the custom ML model was successfully created and ready to be implemented within the application [4].

Figure 3. The uploading process of a sequence of hand images with appropriate tags for the purpose of training the custom Machine Learning model

Upon launching the application for the first time, the user is greeted with a white screen while ARKit collects the parameters and positions of the surrounding surfaces, after which the application becomes ready for user input. Considering the number of characters in each tag, the recommended range of usage is between 20 and 200 cm.

More precise and reliable recognition is achieved when the device is placed closer to the object the user wants to identify by utilizing the built-in Machine Learning model. The reason for that lies in the fact that the device tends to discern objects more easily if they cover the majority of the frame, as opposed to a more complex scene with several similar shapes and objects [4].

Bearing in mind the limitations of the application, it is recommended for the objects to be static because of the way the application collects and utilizes spatial parameters necessary for precise virtualization of computer-generated content. Therefore, objects that are static have a higher probability of being recognized by the algorithm as opposed to the ones that are changing their position relative to the device’s location (Figure 4).

Figure 4. Testing the object recognition function of static objects while utilizing the Croatian machine learning model

Another prominent element of the application is the inclusion of statistical data located in the lower half of the display. The statistics include features such as a frames-per-second counter and the number of perceived nodes in the scene, along with the total number of polygons. By going through the available statistics, provided thanks to the SceneKit library, the user can determine how many elements in the current scene the device has recognized and thereby influence its perception by slightly repositioning the device [4] (Figure 5).

Figure 5. Scene statistics display based on the showStatistics function from the SceneKit library

IV. GOOGLE PLACES POINTS OF INTEREST AR APPLICATION

The purpose of this application is to identify and pinpoint the location of Google's points of interest based on the user’s current location. The display of the POI content is accomplished by combining the HDAugmentedReality library with the implementation of the Google Places API interface.

By utilizing the two aforementioned elements in combination with the device's built-in camera module, the application is able to display Google's points of interest superimposed over the surrounding environment while simultaneously binding the floating POI labels to their actual location in space regardless of the direction the device is facing.

Achieving this complex task required several separate Swift based files to be created and combined. The first two files, named MjestaUcitavanje.swift and Mjesto.swift, are responsible for returning the query results as points of interest, based on the Google Places API interface, and storing them in appropriate classes within the code. The most important class in this case is CLLocationManager since it's responsible for starting and stopping the delivery of location data required for successful execution of the application's main functionality [4] (Figure 6).

Figure 6. Location data fetching syntax using the CLLocation class
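A minimal sketch of that start/stop role is given below; the wrapper type and method names are assumptions reflecting standard Core Location usage rather than the exact contents of the paper's files.

import CoreLocation

// Minimal wrapper around CLLocationManager illustrating its start/stop role.
final class LocationService {
    private let locationManager = CLLocationManager()

    // Begin delivering location data to the supplied delegate (e.g. the ViewController).
    func start(delegate: CLLocationManagerDelegate) {
        locationManager.delegate = delegate
        locationManager.desiredAccuracy = kCLLocationAccuracyNearestTenMeters
        locationManager.requestWhenInUseAuthorization()
        locationManager.startUpdatingLocation()
    }

    // Stop the updates once a sufficiently accurate fix has been obtained.
    func stop() {
        locationManager.stopUpdatingLocation()
    }
}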


Thanks to the aforementioned part of the code, the application is able to fetch data relating to the current location of the device, which in turn prepares it to fetch points of interest through the Google Places API interface.

Since the Google Places API is in essence a paid service, its usage and implementation require a registered Google account along with a valid credit card. Upon completing the registration and payment process, the service automatically generates an API key necessary for the proper implementation of Google's services within the application [3]. In this case, the API key itself needed to be stored within the MjestaUcitavanje.swift file (Figure 7).

Figure 7. Google Places API apiURL and apiKey constants necessary for the proper execution of the code
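For illustration, the two constants could look roughly like the following; the endpoint corresponds to the public Places Nearby Search web service, while the key value and the helper function are placeholders rather than the paper's code.

import Foundation

// Base URL of the Google Places Nearby Search web service and the generated API key.
let apiURL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
let apiKey = "YOUR_GOOGLE_PLACES_API_KEY"

// Build a request URL around the device's current coordinates and a search radius in meters.
func nearbySearchURL(latitude: Double, longitude: Double, radius: Int) -> URL? {
    return URL(string: "\(apiURL)?location=\(latitude),\(longitude)&radius=\(radius)&key=\(apiKey)")
}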

In order for the points of interest to be displayed on the screen of the device, they first need to be passed to the ViewController Swift file with the help of the startedLoadingPOIs variable which is responsible for monitoring all ongoing requests (Figure 8). Since the CLLocationManagerDelegate method can initiate multiple requests after refreshing the location data, the number of concurrent requests is limited with the help of the mjesta variable which is meant to store all of the perceived points of interest.

Figure 8. The implementation of the startedLoadingPOIs and mjesta variables within the code

Taking into consideration the fact that the user might be located in an environment that doesn't have any points of interest nearby, the code contains a specific parameter to address this issue. By attaching a value of 5000 to the radius parameter, the application starts loading points of interest that are up to 5000 meters away from the device. In other words, if the returned value equals nil, the fetching radius gradually increases until a sufficient number of points of interest gets loaded into the ViewController [4] (Figure 9).

Figure 9. Part of the locationManager function responsible for determining the maximum radius of POI content fetching
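The interplay of the startedLoadingPOIs flag, the mjesta storage variable and the 5000 meter radius could be sketched as follows; the fragment is assumed to sit inside the ViewController, and the loadPOIs signature on MjestaUcitavanje is an assumption made purely for illustration.

// Guard against overlapping requests triggered by repeated location updates.
func locationManager(_ manager: CLLocationManager, didUpdateLocations locations: [CLLocation]) {
    guard let location = locations.last, !startedLoadingPOIs else { return }
    startedLoadingPOIs = true

    let loader = MjestaUcitavanje()
    loader.loadPOIs(location: location, radius: 5000) { [weak self] places in
        self?.mjesta = places              // store the perceived points of interest
        self?.startedLoadingPOIs = false   // allow the next refresh once loading finishes
    }
}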

By combining all of the above mentioned elements, the application is able to determine the user's location, load a number of nearby points of interest and store them inside their respective classes.

Since the application is meant to display the POI content in two ways, i.e. on a map and through the live view feed of the built-in camera module, the application needs to show a map with the user’s current location on the initial screen.

Implementing the map element is achieved by utilizing the MapKit library within a separate Swift file named MjestoAnotacija, the purpose of which is to bind together the fetched POI content details with their corresponding locations on the map (Figure 10).

Figure 10. Implementing MapKit's MKAnnotation protocol responsible for binding the POI data with the corresponding location on the map
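A hedged sketch of such an annotation class is shown below; the property set follows the MKAnnotation protocol, while the initializer shape is an assumption.

import MapKit

// Bind a fetched place (name, detail and coordinate) to a pin MapKit can display.
class MjestoAnotacija: NSObject, MKAnnotation {
    let coordinate: CLLocationCoordinate2D
    let title: String?
    let subtitle: String?

    init(coordinate: CLLocationCoordinate2D, title: String?, subtitle: String?) {
        self.coordinate = coordinate
        self.title = title
        self.subtitle = subtitle
        super.init()
    }
}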

With the initial map screen created, the only missing element is the actual implementation of augmented reality. For the user to be able to transition from a traditional map into the AR view, a UIButton element representing the camera shortcut needed to be created.

For the actual augmented reality part of the application to be functional, the HDAugmentedReality library needed to be utilized to simplify the process of running specific calculations geared towards determining the distance between the device and the points of interest it's aimed towards. The calculations in question imply the use of x and y coordinates of each individual POI element based on the Cartesian coordinate system as well as the use of the orthodromic distance formula which determines the precise distance between two points on a sphere. In short, the HDAugmentedReality library is responsible for keeping the POI content tied to its actual location in space rather than constantly being displayed on the device regardless of its orientation [1].
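As a point of reference, the orthodromic (great-circle) distance can be computed with the haversine formula; the sketch below only illustrates the calculation the library performs internally and is not its actual code.

import CoreLocation

// Haversine formula: great-circle distance in meters between two coordinates.
func orthodromicDistance(from a: CLLocationCoordinate2D, to b: CLLocationCoordinate2D) -> Double {
    let earthRadius = 6_371_000.0
    let phi1 = a.latitude * .pi / 180
    let phi2 = b.latitude * .pi / 180
    let deltaPhi = (b.latitude - a.latitude) * .pi / 180
    let deltaLambda = (b.longitude - a.longitude) * .pi / 180
    let h = sin(deltaPhi / 2) * sin(deltaPhi / 2) +
            cos(phi1) * cos(phi2) * sin(deltaLambda / 2) * sin(deltaLambda / 2)
    return 2 * earthRadius * atan2(sqrt(h), sqrt(1 - h))
}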

The HDAugmentedReality library itself contains a number of Swift files which are needed to identify the points of interest, create a view for the points of interest, define the basic settings and supporting methods, run calculations and control the video signal while simultaneously adding annotations into the visible spectrum. However, for the library to be usable its ARViewController needs to be implemented within the ViewController file (Figure 11).

Figure 11. ARViewController implementation within the ViewController Swift file

Another crucial element of the augmented reality experience within the application is the ARDataSource protocol which is responsible for transferring data necessary for displaying the POI content. For that purpose, a separate Swift file was created under the name AnnotationView.

Utilizing the UIKit library, the AnnotationView file contains several notable elements. The inclusion of the ARAnnotationView subclass provides access to the view element containing labels with names and distances of corresponding points of interest. Additionally, the loadUI() function is responsible for attaching specific settings to the labels in question.

For visualization purposes within the augmented reality segment of the application, three specific labels were created. While the variables under the names nazivLabel and udaljenostLabel represent the graphical background elements for the name and distance parameters respectively, the strelicaLabel variable represents the arrow underneath the annotation pointing towards the location it refers to [4].

All of the elements contained within the label were created using the CGRect structure which is commonly used for creating rectangles of various sizes. To create the arrow shape, a rectangular element was rotated by 45 degrees and placed underneath the rectangles containing the POI content. This was done using the rotationAngle parameter, which expresses the rotation in radians rather than degrees, therefore requiring the value of pi to be divided by four. Since pi, in this case, represents 180 degrees, the result equates to a rotation of 45 degrees [6].
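A short sketch of that rotation, using CGAffineTransform's rotationAngle initializer, is given below; the frame dimensions and background color are assumptions for illustration.

import UIKit

// A small square label rotated by pi/4 radians (45 degrees) forms the pointer arrow.
let strelicaLabel = UILabel(frame: CGRect(x: 0, y: 0, width: 20, height: 20))
strelicaLabel.backgroundColor = .white
strelicaLabel.transform = CGAffineTransform(rotationAngle: .pi / 4)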

The remaining elements of the label consist of two basic rectangles with two curved corners each. The curved corners were achieved thanks to the Core Animation library supported by iOS 11 and newer iterations of the iOS operating system. Through the use of an if statement containing the maskedCorners property, the layerMinXMinYCorner and layerMaxXMinYCorner constants were utilized to round off two specific corners by five units [5].
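A corresponding sketch is shown below; it assumes the nazivLabel variable from the paper is in scope and rounds only its two top corners.

import UIKit

// Round only the top-left and top-right corners by five units on iOS 11 and later.
if #available(iOS 11.0, *) {
    nazivLabel.layer.cornerRadius = 5
    nazivLabel.layer.maskedCorners = [.layerMinXMinYCorner, .layerMaxXMinYCorner]
    nazivLabel.layer.masksToBounds = true
}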

In order for all of the label elements to be displayed, they needed to be contained inside of the layoutSubviews() function followed by UIKit's touchesEnded function. This way, the application is capable of discerning whether the display is being touched so it can run processes necessary for its successful execution (Figure 12).

Figure 12. Combining the label elements with the layoutSubviews() and touchesEnded functions

Bridging the gap between the layoutSubviews() function and the actual ViewController file is the UIAlertController class presented from the UIViewController class (Figure 13). With its inclusion, the user is able to get relevant information upon tapping the desired point of interest on the device’s display.

Figure 13. UIAlertController implementation
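An illustrative way to present such information with UIAlertController is sketched below; the function name and parameters are assumptions rather than the paper's code.

import UIKit

// Present the tapped point of interest's details in a simple alert dialog.
func showDetails(for placeName: String, info: String, on viewController: UIViewController) {
    let alert = UIAlertController(title: placeName, message: info, preferredStyle: .alert)
    alert.addAction(UIAlertAction(title: "OK", style: .default))
    viewController.present(alert, animated: true)
}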

The last crucial component necessary for the successful execution of the application is the inclusion of the NSCameraUsageDescription key inside the Info.plist file. The sole purpose of this specific key is to ask for the user’s permission to use the device’s built-in camera so that the application can display the augmented reality content.

Upon running the application for the first time, the user is prompted to allow the usage of location services. Also worth noting is that the user will be prompted to activate the location services each time the application detects that the services aren’t actively running in the background. This is to ensure the accuracy and reliability of the results shown within the application.

By activating the location services, the user allows the application to use the device’s built-in GPS module to determine its location in relation to the nearby points of interest. Shortly thereafter, the points of interest start appearing in line with the previously determined radius of 5000 meters (Figure 14). Pinching the map to zoom in or out reveals additional points of interest in the vicinity, determined by their statistics fetched from the Google Maps server. Consequently, using the application requires a constant Wi-Fi or mobile data connection [4].

Figure 14. The home screen of the application displaying the user’s current location (blue) and the closest points of interest (red)


Touching the Kamera button located in the lower right corner reveals the augmented reality portion of the application. In order for it to be functional, the user gets prompted with a pop-up notification requiring them to allow the usage of the device’s built-in camera module due to the previously mentioned NSCameraUsageDescription key contained within the Info.plist file. Allowing the usage of the device’s camera module results in the display being occupied with nearby points of interest floating above their actual location, superimposed over the user’s real environment [4].

Since the display of the POI content depends on the device’s spatial orientation, the user needs to point the device in the general direction of the points of interest in order to see their actual distance displayed on the screen of the device (Figure 15).

Figure 15. Displaying the names and distances of the nearest points of interest through the use of augmented reality

By tapping the name and distance labels the user can access additional information regarding certain points of interest depending on whether there is more information available on the Google Maps server. Exiting the information notification is achieved by tapping the OK button, while exiting the AR view is achieved by tapping on the dedicated button in the upper right corner (Figure 16).

Figure 16. Displaying the additional information for a specific point of interest fetched from the Google Maps server

By moving through space the user can discover additional points of interest that weren’t initially present, depending on their popularity as determined by the Google Maps server. The Google Maps server also offers 90 different categories of POI content which can be selectively displayed by utilizing the types parameter within the code [3]. In case no category is defined, the user gets to see all of the available points of interest in their vicinity.

V. CONCLUSION

Contemporary augmented reality solutions such as Apple's ARKit are diligently upgrading their platforms and accompanying toolsets with each new iteration, thus providing developers with a broader range of tools intended to simplify tedious tasks and ultimately boost their productivity and creativity while uncovering new and innovative use-case scenarios.

Some innovative concepts of augmented reality implementation were materialized during the process of writing this paper, while their practical applications serve as a testament to the functionality and potential of the synergy between mobile technology, the iOS platform and augmented reality as a way of consuming and interacting with digital content.

Based on the presented concepts, which combine content that is of interest to users, i.e. potentially engaging and relevant, with modern technological innovations, it can be concluded that augmented reality holds undeniable potential while serving as a useful alternative to a multitude of previously established modes of interaction with the environment.

REFERENCES

[1] D. Huis, HDAugmentedReality, GitHub Inc., https://github.com/DanijelHuis/HDAugmentedReality

[2] Developer Documentation, Apple Inc., https://developer.apple.com/documentation/arkit

[3] Google Maps Platform: Overview, Google Inc., https://developers.google.com/places/web-service/intro

[4] I. Permozer, “Proširena stvarnost na Apple iOS platformi” (in Croatian), Master’s thesis, Juraj Dobrila University of Pula, Department of Economics and Tourism, September 26th 2018.

[5] P. Hudson, How to round only specific corners using maskedCorners, Hacking with Swift, 2018, https://www.hackingwithswift.com/example-code/calayer/how-to-round-only-specific-corners-using-maskedcorners

[6] P. Hudson, How to scale, stretch, move and rotate UIViews using CGAffineTransform, Hacking with Swift, 2016, https://www.hackingwithswift.com/example-code/uikit/how-to-scale-stretch-move-and-rotate-uiviews-using-cgaffinetransform

[7] Working with Core ML Models, Apple Inc., https://developer.apple.com/machine-learning/build-run-models/
