Jobsystem in Depth – Part I – Basic Theory

I have recently (and by recently I mean like 2-3 weeks ago) posted an In-depth on Unity’s Entity Component System – so it is about time to take a look at the Jobsystem as well.

Generally, it is a good idea to not see these two things as separate but rather as a combined system that can improve your game’s performance immensely. We will dive into a project that includes both and eventually get to something with really high performance. But before we can properly do that, you need to understand the basics of both of them. So if you have no clue about the Entity Component System, you should go and read that article first.

What is the Jobsystem?

If you have been following me for a while now you may have seen an article I wrote on Multi-Threading in Unity – in there I mentioned how people tend to say that you can’t Multi-Thread in Unity. This has never been the case: you were able to do it, but with the caveat that most of the Unity API can only be called from the main thread. So, just to recap really quickly – you could multi-thread different kinds of tasks as long as they didn’t need access to transforms or game objects from anything other than the main thread. Doing some Vector3 math in a separate thread, for example, was no problem.

If your Unity knowledge is up to date you might already know that parts of the engine have been multi-threaded for a while, for example the physics engine, which runs in parallel. Now, with the Jobsystem as a feature, Unity allows us to take advantage of its way of Multi-Threading.

You might be wondering why this is so exciting, and if you are, I’d highly suggest you read the article about multi-threading I linked earlier, but in short:

The Jobsystem allows us to easily write multi-threaded code, which in general is a great start to generate a high-performance gaming experience. Not only are we talking frame rate benefits here but also significant improvements to battery life when developing for mobile.

Under the hood, this new feature lets us write code that shares worker threads with the Unity engine features (for example the physics engine).

What is Multi-Threading?

If you are a pro, you already know the basics or you read the article I wrote on Multi-Threading in Unity – feel totally free to skip this part as I just want to give a “TLDR” for all the people that ignore my 1000 disclaimers to read something else first anyways.

Generally, in a single-threaded program one instruction goes in at a time and one result comes out at a time.
The performance of a program depends mainly on how long it takes to load and complete. With only a single thread that executes sequentially, this takes longer than if we had, for example, two threads working simultaneously, and that’s what we call Multi-Threading.

Multithreading takes advantage of a CPU’s capability to process many threads at the same time across multiple cores.

One thread runs at the start of a program by default. This is the “main thread”. The main thread creates new threads to handle tasks. These new threads run in parallel to one another, and usually synchronize their results with the main thread once completed.
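To make that a bit more tangible, here is a minimal sketch in plain C# (no Unity involved) of a main thread that creates two worker threads and waits for their results. The names MainAndWorkerThreads, SumRange and SumInParallel are made up for this illustration.

```csharp
using System;
using System.Threading;

public static class MainAndWorkerThreads
{
    // Sums the integers in [from, to) on whichever thread calls it.
    static long SumRange(int from, int to)
    {
        long sum = 0;
        for (int i = from; i < to; i++) sum += i;
        return sum;
    }

    public static long SumInParallel(int count)
    {
        long firstHalf = 0, secondHalf = 0;
        int middle = count / 2;

        // The main thread creates two worker threads, each writing to its own variable.
        var t1 = new Thread(() => firstHalf = SumRange(0, middle));
        var t2 = new Thread(() => secondHalf = SumRange(middle, count));
        t1.Start();
        t2.Start();

        // Join blocks the main thread until the worker is done (the synchronization step).
        t1.Join();
        t2.Join();

        return firstHalf + secondHalf;
    }

    public static void Main()
    {
        Console.WriteLine(SumInParallel(1000)); // 499500, the sum of 0..999
    }
}
```

Each worker writes only to its own variable and the main thread reads them only after Join, so there is nothing the threads could fight over.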

This approach to multithreading works well if you have a few tasks that run for a long time. However, game development code usually contains many small instructions to execute at once. If you create a thread for each one, you can end up with many threads, each with a short lifetime. That can push the limits of the processing capacity of your CPU and operating system.

It is possible to mitigate the issue of thread lifetime by having a pool of threads. However, even if you use a thread pool, you are likely to have a large number of threads active at the same time. Having more threads than CPU cores leads to the threads contending with each other for CPU resources, which causes frequent context switching as a result. Context switching is the process of saving the state of a thread part way through execution, then working on another thread, and then reconstructing the first thread, later on, to continue processing it. Context switching is resource-intensive, so you should avoid the need for it wherever possible.
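As a rough illustration of "many small tasks, few threads", the following plain C# sketch spreads a loop over pooled workers and caps them at one per logical core. This uses .NET's Parallel.For, not Unity's worker threads, and WorkerPoolDemo and SquareAll are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class WorkerPoolDemo
{
    // Squares every element, using at most one pooled worker per logical core.
    public static int[] SquareAll(int[] input)
    {
        var output = new int[input.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // Each index is written by exactly one iteration, so no two workers touch the same element.
        Parallel.For(0, input.Length, options, i => output[i] = input[i] * input[i]);
        return output;
    }

    public static void Main()
    {
        int[] result = SquareAll(new[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", result)); // 1, 4, 9, 16
    }
}
```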

What is the difference between the Jobsystem and traditional Multi-Threading?

In traditional Multi-Threading you open a thread and feed it different tasks. You not only have to be aware of when to merge your secondary thread back into the main thread, but also have to properly close it down again and so on. So, Multi-Threading comes hand in hand with a bunch of managing.

The Jobsystem has a different approach because we are not creating any threads but rather use Unity’s worker threads that span across multiple cores and feed them some tasks, or as Unity likes to call them – Jobs. It should be easy to see that this approach is much easier as you avoid any kind of difficulties that may come along with actually having to manage a thread. Not only that, but thanks to the built-in safety checks we also have far less reason to worry about race conditions.

What are race conditions?

A race condition can happen when the outcome of an operation depends on the timing of another process.
Let’s say we modify a value a in a second thread. Maybe our second thread isn’t done yet but our main thread already keeps processing a – therefore we end up dealing with the unmodified value.

Timing execution can be really tricky and trying to debug this sort of problem is a pain in the ass as debugging and using breakpoints generally results in timing differences. So there is a chance that your code performs as intended while you are debugging it but it doesn’t if it is run regularly.

In some situations this might not even result in a proper bug but just in really weird behavior. Race conditions are one of the biggest challenges that come along with traditional multi-threading.
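If you want to see one in action, here is a small sketch in plain C# (again without Unity) where several threads increment the same counter. RaceConditionDemo, CountUnsafe and CountSafe are names invented for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RaceConditionDemo
{
    // Unsafe: "counter++" is a read, an add and a write, so parallel updates can overwrite each other.
    public static int CountUnsafe(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => { counter++; });
        return counter;
    }

    // Safe: Interlocked.Increment performs the whole update as one atomic step.
    public static int CountSafe(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => { Interlocked.Increment(ref counter); });
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(CountUnsafe(1000000)); // often less than 1000000 and different on every run
        Console.WriteLine(CountSafe(1000000));   // 1000000
    }
}
```

The unsafe version is exactly the kind of bug that tends to disappear as soon as you attach a debugger, because stepping through the code changes the timing.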

 

With its built-in safety checks the Jobsystem can detect all potential race conditions and actually prevent them from occurring. It does so by sending each job a copy of the data it needs to operate on, rather than a reference to the data in the main thread. This eliminates the race condition, as the job now works on standalone data rather than on shared data behind a reference.

As a consequence, a job can only access blittable data types. These types do not need conversion when passed between managed and native code.
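The "copy instead of reference" idea is the same thing C# already does with structs. A minimal sketch in plain C#, with the made-up names JobData and CopySemanticsDemo, and not Unity's actual implementation:

```csharp
using System;

public struct JobData
{
    public int value;
}

public static class CopySemanticsDemo
{
    // The struct is passed by value, so this method works on its own copy.
    public static int DoubleValue(JobData data)
    {
        data.value *= 2;
        return data.value;
    }

    public static void Main()
    {
        var original = new JobData { value = 21 };
        int result = DoubleValue(original);

        Console.WriteLine(result);         // 42
        Console.WriteLine(original.value); // 21, the caller's data was never touched
    }
}
```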

What are blittable data types?

Blittable data types have a common representation in both managed and unmanaged memory which means that they don’t require special handling or conversion for that matter.

For example: Int16, Int32, Int64, Byte, Single and Double are blittable data types.

Unity uses the C function memcpy to copy blittable data and transfer it between the managed and native parts of Unity. Basically, when scheduling a job, Unity puts our data into native memory and gives the managed side access to that copy when the job executes.
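Here is a small plain C# sketch of what that means for your own structs. WaypointData, NotBlittable and CanPin are hypothetical names. The check relies on the .NET behaviour that pinning an object with non-blittable fields throws an ArgumentException, so treat it as a quick experiment rather than an official test.

```csharp
using System;
using System.Runtime.InteropServices;

// Blittable: only fixed-size primitive fields, same layout in managed and native memory.
[StructLayout(LayoutKind.Sequential)]
public struct WaypointData
{
    public float x, y, z;
    public int index;
}

// Not blittable: a string is a reference to a managed object.
public struct NotBlittable
{
    public string name;
}

public static class BlittableDemo
{
    // Tries to pin the boxed value in memory, which .NET only allows for blittable data.
    public static bool CanPin(object value)
    {
        try
        {
            GCHandle.Alloc(value, GCHandleType.Pinned).Free();
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<WaypointData>()); // 16 (three floats and one int, 4 bytes each)
        Console.WriteLine(CanPin(new WaypointData()));     // True
        Console.WriteLine(CanPin(new NotBlittable()));     // False
    }
}
```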

What is memcpy?

memcpy is a function that copies a block of memory. This C standard library function (also available in C++) copies the values of num bytes from a source location to a destination.
The execution results in a binary (made up of 0s and 1s) copy of the given data.

Even the dreadful context switching and CPU contention are much less of an issue to worry about, as Unity usually has one worker thread per logical CPU core and distributes the jobs among them.
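You can't call memcpy directly from safe C#, but Buffer.BlockCopy behaves in a comparable way for arrays of primitive types: it copies a number of raw bytes from one array to another. This is only an analogy to illustrate the idea, not what Unity does internally, and MemcpyDemo and CopyFloats are made-up names.

```csharp
using System;

public static class MemcpyDemo
{
    public static float[] CopyFloats(float[] source)
    {
        var destination = new float[source.Length];

        // Like memcpy's num parameter, the count is given in bytes, not in elements.
        Buffer.BlockCopy(source, 0, destination, 0, source.Length * sizeof(float));
        return destination;
    }

    public static void Main()
    {
        float[] original = { 1f, 2f, 3f };
        float[] copy = CopyFloats(original);

        copy[0] = 99f; // changing the copy leaves the original untouched
        Console.WriteLine(original[0]); // 1
        Console.WriteLine(copy[0]);     // 99
    }
}
```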

What is CPU Contention?

CPU contention is a state in which threads or processes have to wait too long for their turn on a CPU core, because more of them want to run than there are cores available. In general a scheduler distributes the CPU time between the competing threads, processes and/or virtual machines and decides in which order they get to run.
In short: CPU contention can be the reason why multi-threaded code can be slower than single threaded code when done badly.

In the Jobsystem all our Jobs are placed in a queue. Free worker threads grab jobs from that queue and execute them. To ensure that our jobs are executed in the order we need them to be, we can take advantage of job dependencies.

So what is a Job?

Basically, each Job can be described as a method call: each job has its own data and parameters that it receives upon creation and then uses for its execution. Jobs can be self-contained, meaning that it doesn’t matter to us when exactly they finish. Or, which is probably the more common case, they can have dependencies. A dependency makes our life much easier as it ensures that our code gets executed at the right time.

With Multi-Threading in general this is a big thing: you need to make sure that your execution order is right to avoid race conditions, which means nothing more than one thing having to wait for another, and that waiting can result in lag.

So basically, a dependency means that our second task, that’s dependent on our first task, won’t start executing until our first task is complete.

Syntax:

Every Job has to implement one of these three types – IJob, IJobParallelFor or IJobParallelForTransform. IJobParallelFor is used whenever a single task has to be executed many, many times in parallel. IJobParallelForTransform is the same idea, but specifically for Unity Transforms. As you may have been able to spot – these types are in fact interfaces, so as long as you don’t have an Execute function in your job, your compiler will complain. You should also never forget that your job has to be a value type, which means that it has to be a struct and can under no circumstances ever be a class. This is because job data gets copied into native memory, which does not work for managed objects like class instances.

If you’ve read part one of this article carefully you might remember me mentioning native containers. These also touch on the subject of memory allocation and deallocation. Basically, Unity created these new containers to allow us to write thread-safe code easily. We are going to take a closer look at those when we discuss scheduling of the jobs.

using Unity.Collections;
using Unity.Jobs;
using UnityEngine; // needed for Vector3

/*Jobs need to be value types, which means they have to be structs...
Every job has to implement either IJobParallelFor, IJobParallelForTransform or IJob */
public struct MyJob : IJobParallelFor {

  /*Within your job you'll have to define all the data that is necessary to execute your job as well as your outcome.
  Unity created native arrays which are basically like regular arrays, but you have to take care of the allocation and deallocation yourself
  */
  public NativeArray<Vector3> waypoints;
  public float offsetToAdd;

  /*Every Job needs an Execute function. For an IJobParallelFor it is called once per index i. */
  public void Execute(int i)
  {
    /*This function will hold your behaviour. Every variable that is necessary to execute will have to be defined at the beginning of this struct.
    Note: multiplying a Vector3 by a float scales the waypoint by offsetToAdd. */
    waypoints[i] = waypoints[i] * offsetToAdd;
  }
}

Scheduling Jobs:

Now, once we created our MyJob.cs struct how do we actually get it working? We have to schedule it.
In general this concept is rather easy but it comes with a few caveats you should be aware of. Every job needs to be scheduled. That basically means that we create the job, add its data and send it out to be queued for execution. Once this has happened there is no way for us to interrupt this process.

The general syntax reference the Unity manual gives you for jobs looks like this (note that the MyJob in this snippet is the manual’s own IJob with two floats a and b and a result array, not the waypoint job from above):

// Create a native array of a single float to store the result. This example waits for the job to complete for illustration purposes
NativeArray<float> result = new NativeArray<float>(1, Allocator.TempJob);

// Setup the job data
MyJob jobData = new MyJob();
jobData.a = 10;
jobData.b = 10;
jobData.result = result;

// Schedule the job
JobHandle handle = jobData.Schedule();

// Wait for the job to complete
handle.Complete();

// All copies of the NativeArray point to the same memory, you can access the result in "your" copy of the NativeArray
float aPlusB = result[0];

// Free the memory allocated by the result array
result.Dispose();

This is correct and it works, but it has a little downside: you are calling Complete right after Schedule, which makes the main thread wait for the job (or as it is called in the Profiler – “Idle Time”).

Instead, if you get into the habit of scheduling your jobs early and completing them late, you’ll see the wait time in the Profiler shrink and you’ll gain significant performance – at least on older machines.
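The manual snippet above never shows the job struct it schedules. Judging from the fields it sets (a, b and result) and the aPlusB variable, a minimal version could look like the sketch below. Keep in mind that this is a different struct than the waypoint MyJob from earlier, so in a real project one of the two would need another name.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Assumed shape of the job used by the manual snippet: adds two floats and stores the sum.
public struct MyJob : IJob
{
    public float a;
    public float b;

    // A NativeArray is used for the result because the job only gets a copy of the plain fields.
    public NativeArray<float> result;

    // IJob.Execute takes no index, it runs exactly once per scheduled job.
    public void Execute()
    {
        result[0] = a + b;
    }
}
```

The single-element NativeArray is there because a and b are copied into the job, so writing to a plain float field inside Execute would never be visible on the main thread.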

Scheduling Jobs Efficiently:

As discussed earlier – calling Complete right after scheduling our job is bad because we are giving our worker threads no time at all to accomplish anything. The main thread ends up idling while it waits for the job, which has an impact on performance.

Now in this example we are creating a struct that can hold the reference to our handle and our native array. Why those two? We need the handle to call Complete on our job at a later time and we need the native array as it needs to be deallocated. As I mentioned before – NativeArrays work like regular arrays with the caveat of actually having to set an Allocator (which defines how long our array is allowed to remain in memory) – we’ll use Allocator.TempJob for our example, which is meant for short-lived job data and has to be disposed within 4 frames.

We’ll also have to deallocate the memory once we have called Complete and copied over the data.

You’ll see that I am creating a JobResultAndHandle and then pass a reference to it to ScheduleJob(). This will result in my job being scheduled and its references being saved in my list.

I can then go over every entry in my list, call complete, copy my executed data and dispose the NativeArray to deallocate the memory.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Collections;
using Unity.Jobs;

public class MyJobScheduler : MonoBehaviour
{
  //Set these in the Inspector. The empty default keeps the array from being null.
  [SerializeField] Vector3[] waypoints = new Vector3[0];
  [SerializeField] float offsetForWaypoints = 1f;

  //We are keeping a list of our results and handles!
  List<JobResultAndHandle> resultsAndHandles = new List<JobResultAndHandle>();

  void Update()
  {
    /*First we complete the jobs that were scheduled earlier (last frame), so the worker threads had time to work on them.
    We loop backwards because we remove each entry from the list once it is completed. */
    for(int i = resultsAndHandles.Count - 1; i >= 0; i--){
      CompleteJob(resultsAndHandles[i]);
      resultsAndHandles.RemoveAt(i);
    }

    /*Then we create a new JobResultAndHandle whenever we need one (this does not need to be in Update - it's just an example!)
    and give the reference to our ScheduleJob method. */
    JobResultAndHandle newResultAndHandle = new JobResultAndHandle();
    ScheduleJob(ref newResultAndHandle);
  }

  /*ScheduleJob takes the reference to the JobResultAndHandle we created and initializes and schedules the job for us! */
  void ScheduleJob(ref JobResultAndHandle resultAndHandle)
  {
    //We are populating our native array and setting an appropriate allocator
    resultAndHandle.waypoints = new NativeArray<Vector3>(waypoints, Allocator.TempJob);

    //We are initializing the job and giving it the needed data.
    MyJob newJob = new MyJob
    {
      waypoints = resultAndHandle.waypoints,
      offsetToAdd = offsetForWaypoints,
    };

    //We are setting our job handle and scheduling our job.
    //MyJob is an IJobParallelFor, so Schedule needs the number of iterations and a batch size (64 is just an example value).
    resultAndHandle.handle = newJob.Schedule(resultAndHandle.waypoints.Length, 64);
    resultsAndHandles.Add(resultAndHandle);
  }

  //In CompleteJob we are copying over the processed data from the job and then disposing the native array.
  //This is necessary as we need to deallocate the memory.
  void CompleteJob(JobResultAndHandle resultAndHandle)
  {
    resultAndHandle.handle.Complete();
    resultAndHandle.waypoints.CopyTo(waypoints);
    resultAndHandle.waypoints.Dispose();
  }

  //Make sure nothing is left scheduled or allocated when this component goes away.
  void OnDestroy()
  {
    foreach(JobResultAndHandle resultAndHandle in resultsAndHandles){
      CompleteJob(resultAndHandle);
    }
    resultsAndHandles.Clear();
  }
}

struct JobResultAndHandle
{
  public NativeArray<Vector3> waypoints;
  public JobHandle handle;
}

 

JobHandles and Dependencies:

Calling the Schedule() function of a job returns a JobHandle. These are useful to keep a reference to your job, but you can also use them as dependencies for other jobs. Now what does that mean? If my job depends on the results of another job, I can simply pass the other job’s handle as a parameter to my job’s Schedule method. By doing so I am basically saying: execute my job once this other thing is complete.

As mentioned earlier in this article, race conditions, or situations where one thread had to wait for another, were a big downside of working with multi-threaded code – which is now easily avoided by just passing the handle.

(Note: if you have more than one dependency you can combine the handles with JobHandle.CombineDependencies()!)
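To show how handles and dependencies fit together, here is a minimal sketch with two independent jobs and a third one that waits for both. FillJob, AddJob and DependencyExample are made up for this illustration, and the example calls Complete right away only to keep it short.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: writes a constant into every slot of an array.
public struct FillJob : IJob
{
    public NativeArray<float> values;
    public float value;

    public void Execute()
    {
        for (int i = 0; i < values.Length; i++) values[i] = value;
    }
}

// Hypothetical job: adds two arrays element by element into a third one.
public struct AddJob : IJob
{
    [ReadOnly] public NativeArray<float> left;
    [ReadOnly] public NativeArray<float> right;
    public NativeArray<float> sum;

    public void Execute()
    {
        for (int i = 0; i < sum.Length; i++) sum[i] = left[i] + right[i];
    }
}

public class DependencyExample : MonoBehaviour
{
    void Start()
    {
        var a = new NativeArray<float>(4, Allocator.TempJob);
        var b = new NativeArray<float>(4, Allocator.TempJob);
        var sum = new NativeArray<float>(4, Allocator.TempJob);

        // The two fill jobs do not depend on each other, so they may run in parallel.
        JobHandle fillA = new FillJob { values = a, value = 1f }.Schedule();
        JobHandle fillB = new FillJob { values = b, value = 2f }.Schedule();

        // The add job needs both results, so we combine the handles and pass them as its dependency.
        JobHandle both = JobHandle.CombineDependencies(fillA, fillB);
        JobHandle add = new AddJob { left = a, right = b, sum = sum }.Schedule(both);

        // Completing the last handle also makes sure its dependencies have finished.
        add.Complete();
        Debug.Log(sum[0]); // 3

        a.Dispose();
        b.Dispose();
        sum.Dispose();
    }
}
```

In a real project you would schedule these jobs as early as possible and call Complete as late as possible, just like in the scheduler example above.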

Part II of this article will be published for Patrons of the Early Accessor Tier here!
It’ll have an example for Meshdeformation using the Jobsystem.

Liked it? Take a second to support Kristin on Patreon!
