Digital Histopathology Deployment in GSK Non-Clinical Histology Facility

Elena Miranda, PhD, Director, Non-Clinical Histology at GSK

Currently, the Pathology community examines relatively small numbers of digital images of H&E stained histological sections for internal consultation and peer review harmonization.

The Non-Clinical Histology team at GSK focused on a new digital histopathology imaging workflow that will define the standards for histopathology in drug discovery/development for the wider GSK scientific community and expectations for external collaborators. By consolidating metadata standards, governance and data entry systems, the digital histopathology imaging workflow will enable increased application of quantitative image analytic approaches like artificial intelligence to clinical and nonclinical biomarker assays.

For Research Use Only. Not for use in diagnostic procedures.

Webinar Transcription

In this presentation I will talk a little bit about how we are deploying digital histopathology in the non-clinical histology lab at GSK. Digital histopathology has the potential to improve our ability to discover disease mechanisms, to identify and categorize patients across different diseases, to predict their disease outcome, and to create more targeted therapies.

Before, the diagnosis was based on the pathologist looking down the microscope, checking the slides and deciding if it was a tumor, if it was an autoinflammatory disease, all this kind of thing. Now, diagnosis is becoming a complex process. There are multiple pieces of information that the clinician puts together to give the patient the right diagnosis and the right therapies.

Digital histopathology and histopathology in general started a long time ago. It seems very strange, but H&E was first introduced 150 years ago and the first SNE was introduced 100 years ago. That shows a little bit how we base all our knowledge on something that was created a long time ago and has remained stable for all this period.

Something that changed is how we are approaching and how we are visualizing the data that we are getting from all these slides. The term artificial intelligence was first used in 1956. And from 1956 to the 1990s, people started approaching microscopy and taking images in a slightly different way. They started putting cameras on microscopes and it was possible to get these images.

There were other terms related to artificial intelligence or image analysis that came into use in those years. For example, convolutional network, neural network and deep learning were also invented in this period. But it is only since we had the first digital scanner in 1990 that we could implement and improve what we do in artificial intelligence and image analysis. Over the last 20 to 30 years, the growth has been exponential.

We have different kinds of scanners. We have scanners that are quicker, high-throughput scanners. What is important for the patient is that the regulatory agencies approve what we do with artificial intelligence, and this happened two or three years ago when the FDA approved the first artificial intelligence tools for medical diagnostics. The digitization of whole slide images streamlines the pathology workflow. It doesn't matter where the slide is created, it doesn't matter where the pathologist is; you just need a screen and a scanner, and you can do your work. All these data are collected in libraries. This is all information that we can then use to characterize the patient and create diagnostic tools. We can apply artificial intelligence to all these images and extract information that can then be used for the patients.

With all this exponential growth, it is natural that investment has increased exponentially in recent years. Healthcare artificial intelligence projects are growing faster than those in any other sector of industry, more than mobile networking or anything else. In the last 18 months, 100 million was invested in startups that are doing something related to pathology and artificial intelligence. And in 2019, the UK government invested 60 million in the UK Innovate Initiative to develop digital histopathology in the NHS.

In 2018, the estimate was that 2 billion were invested in artificial intelligence projects. This is going to triple in 2023 and probably triple again in 2025, and it might be even higher than that. With all this investment, it is natural that the revenues from everything that is digital pathology related have been growing exponentially in the last two or three years.

Let's see a little bit how the digital histopathology workflow works in the pharma industry. We have modern digital scanners, so the quality of the slides is better. The number of slides that we can scan is higher and we can do it faster and it is almost automatic. There are more tools for artificial intelligence or image analysis that are powerful and more user friendly. There is a change in the culture as well, because the more we can see the benefits, the more we might use these tools.

The regulatory barrier is going down. At the beginning it was difficult to get something approved, but now that the first tool has been approved, it will be a little bit easier. There is a challenge in the business environment, because with all these new tools there is pressure to use them. One advantage of using digital histopathology is that it reduces the disruption from home and work-related issues. As I said, it doesn't matter where you create the slides and where the pathologist is, because you just need a screen and you can do your work.

It is easier to connect with different pathologists, to collaborate, and to have the subject matter expert involved in the diagnosis as well. And then the two major things. There is a very high focus on quantitative data: it is not only qualitative or semi-quantitative, but quantitative almost down to the single cell. And there is improved productivity and efficiency because there is automation and more investment.

As in everything, there are limitations as well. The images that we create are very, very big. The amount of data that we produce and the amount of storage space that we need is massive. Most of the time, the issue I see is the cost of the scanners that we use. It is expensive. The training set of slides required to train the algorithm is becoming bigger and more complex. As in all aspects of medicine development, when you have something that is rare, it becomes more difficult to have enough slides on that particular rare disease to train your algorithm to recognize it.

Quick adaptation is required, and it is a different way of working for pathologists. For people who learned their job on glass slides, it is difficult to adapt to digital files. But this shift is progressing. We are moving forward. With the COVID situation, all the training in the last 18 months has been virtual. The new generation of pathologists will simply be used to that.

At GSK, we have been engaged with digital histopathology projects since 2004. It was natural that at some point we would arrive at a big project to standardize this. In order to standardize a workflow, you first need to identify which kind of data or metadata you want to collect and how you are going to collect them. Then you collect all the qualitative data that you produce from these images, and you need to make sure that everything you collect goes into a sort of library so that you can reuse it.

The workflow starts from the origin: there are study metadata, protocol metadata and animal metadata, and then there is what we do in the histology lab, the sample metadata that we collect. Then the slide goes to the scanner, so there are the digital scanner metadata and the image analysis metadata. The way that I imagine this is like a train. You start at a certain point with just a locomotive, and then at each station you attach a different coach. These coaches bring new data that will be associated with your digital image, with your digital file.

All these trains, all these image files, are put together in a library. Somebody in the future will go back and search for a specific characteristic, and this person will be able to see all the digital files and all the data associated with each file. The idea is to have this sort of library for both internal and external images. That is a different level of complexity, because we also need to work with our CRO partners to get all this solved.
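The train picture above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `SlideRecord`, the stage names and the field names are hypothetical and are not taken from Prima or from the GSK workflow.

```python
from dataclasses import dataclass, field


@dataclass
class SlideRecord:
    """One digital slide file (the locomotive) plus its metadata coaches."""
    image_file: str
    metadata: dict = field(default_factory=dict)

    def attach(self, stage, values):
        # Each station in the workflow attaches one coach of metadata.
        self.metadata[stage] = dict(values)
        return self


def search(library, stage, key, value):
    """Return every record whose metadata for `stage` has `key` equal to `value`."""
    return [r for r in library if r.metadata.get(stage, {}).get(key) == value]


# Hypothetical example values, for illustration only.
record = (
    SlideRecord("example_slide.tiff")
    .attach("study", {"study_id": "S001"})
    .attach("animal", {"animal_id": "A12", "species": "rat"})
    .attach("sample", {"tissue": "liver", "stain": "H&E"})
    .attach("scanner", {"magnification": "40x"})
)
library = [record]
hits = search(library, "sample", "tissue", "liver")
```

Keeping each stage in its own block means a later search can target one coach (for example the sample metadata) without knowing what the other stages recorded.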

For our digital histopathology workflow, we use software called Prima. It is produced by an American company called Fortelinia. It allows us to have this workflow in the lab. We are still at an early stage with the software, and there is not yet a full workflow that is entirely automatic. The initial capture of all the data coming from the study and the animals is done through a CSV file that is imported into the system. Then everything that we do in the lab, starting from jar creation through the cassettes, the slides and the stained slides, is governed by a barcode. We just need to read this barcode, and the barcode carries all the information that we have on the tissue, on the organs and on the stain as well. The barcode is also read by the scanner, and then there is software that puts all this metadata together with the digital file, and we keep this digital file in long-term storage.
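A minimal sketch of the CSV import and barcode lookup described above, assuming a CSV with three hypothetical columns (`study_id`, `animal_id`, `tissue`) and a hypothetical barcode string built from them. The real column layout and barcode content used by Prima are not given in the talk.

```python
import csv
import io

# Hypothetical required columns; the real import template is not described.
REQUIRED_COLUMNS = ("study_id", "animal_id", "tissue")


def load_study_rows(csv_text):
    """Parse study/animal metadata from CSV text and reject incomplete rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        cleaned = {name: (row.get(name) or "").strip() for name in header}
        empty = [c for c in REQUIRED_COLUMNS if not cleaned[c]]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        rows.append(cleaned)
    return rows


def barcode_index(rows):
    """Map a hypothetical barcode string to the metadata row it stands for."""
    return {f"{r['study_id']}|{r['animal_id']}|{r['tissue']}": r for r in rows}


sample_csv = "study_id,animal_id,tissue\nS001,A12, liver\nS001,A13,kidney\n"
rows = load_study_rows(sample_csv)
lookup = barcode_index(rows)
```

Scanning a barcode then reduces to a dictionary lookup such as `lookup["S001|A12|liver"]`, which returns the row that was imported at the start of the study.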

These are just some of the equipment that we have in the lab and use in this workflow. We have cassette and slide printers, but don't worry if you don't have them, because you can use a label printer as well and just attach a paper label to your glass slides. We have a combination of tablets, computers and mini PCs. Some people prefer to use a tablet because it gives a little bit of mobility, and another advantage of a tablet is that it is a little bit easier if you have limited space.

I will talk about this switch box in a second. Then we have the slide printer again, and the barcode reader. We have used different barcode readers, and we have found that some things we expected to be an issue, for example the fact that cassettes are sometimes covered by wax, are not a problem. The other important point is that you need to have a scanner that suits your purpose.

What is important is that your LIMS, in our case Prima, is scanner agnostic. We have the P250 from 3DHISTECH, but other companies may have an Aperio or the NanoZoomer. In that case, we want to make sure that our system can take all the images from the different scanners. It should not matter which kind of scanner you use.

Prima software works in a way that is very similar to, for example, Leica BOND. On the Leica BOND, before you define what you are going to do with your slides, you register your reagents and you set up your protocols, and here it is very similar. You have a control panel where you standardize your tissues, your organs, and what you are going to do in a specific protocol, and then you just apply it to your slides.

There are different modules. Some are workstations, so there is the microtomy workstation and the microscopy workstation, and the other modules are for the pathologist, the lab manager, and the label creator, where you can create your own templates for the labels. This is a representation, very close to the software that we use, of the Prima control panel. We add the tissue type and the protocol, so how the blocks will be organized and which kind of block we collect for a specific tissue. Registering data about the equipment is also important, because you want to know which model you used and what condition your equipment was in, especially if you are working in a regulatory environment.

The image naming folder is the landing zone where all your images go. The CSV shown here is just a representation of what we use, something like an Excel file, to enter all our data into the system.

Lessons and learnings from our experience. Our experience with this software is based on the last two years. You need to have IT involved from the beginning, because it is fundamental. You might think that you know your equipment, but there are connections, networks, firewalls, all sorts of things, and you need to have IT involved.

It is better to have all the equipment already in place. One recommendation from my team is don't train virtually, but that was inevitable because we started all this during COVID, so it was not possible to do it differently. You need to decide where you want to use your LIMS, whether it is just toxicology or discovery as well. We have both, and something that helped us is that we started with a phased approach. We started with the regulatory tox studies first. This is the reason why we use these printer switch boxes: as not everything that we were doing in the lab was going through Prima, we needed to switch the printers between the normal, non-Prima system and the Prima system, and these boxes do exactly that. Whether to use tablets or other PCs is up to you, depending on whether you have the space and whether you want the flexibility.

The super users. We selected a few people in the team to become super users. Their role is to test each new version of the LIMS first and then to train the other people in the team. It is very important that they have strong computer skills so that they can address the problems that come up. Another thing that is critical is your barcode quality, because the barcode readers depend entirely on it.

A point that is important to stress is that you need a file naming convention, because without one it becomes difficult to search for things. The way that we approached this was to identify which data were important to read in the file name, to make sure that the pathologist or the histologist doing the analysis knew exactly what they were going to analyze. We decided on a convention that puts the study number first, then the animal number, the tissue number, the stain name, and finally the level that we were going to section. You can select the one that makes more sense for you, but you need to make sure it is something that everybody can understand.
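The naming convention above (study number, animal number, tissue number, stain name, level) could be enforced with a small helper like the one below. The underscore separator, the `L` prefix for the level, the character clean-up and the `.tiff` extension are assumptions made for this sketch; the talk only gives the order of the fields.

```python
import re


def _clean(value):
    """Keep only letters, digits and hyphens so the name is safe on any file system."""
    return re.sub(r"[^A-Za-z0-9-]", "", str(value))


def slide_file_name(study, animal, tissue, stain, level, ext="tiff"):
    """Build a file name in the order: study, animal, tissue, stain, level."""
    parts = [_clean(p) for p in (study, animal, tissue, stain, f"L{level}")]
    if not all(parts):
        raise ValueError("every field of the file name must be non-empty")
    return "_".join(parts) + "." + ext


def parse_slide_file_name(name):
    """Split a file name produced by slide_file_name back into its fields."""
    stem = name.rsplit(".", 1)[0]
    study, animal, tissue, stain, level = stem.split("_")
    return {"study": study, "animal": animal, "tissue": tissue,
            "stain": stain, "level": level}


# Hypothetical identifiers, for illustration only.
name = slide_file_name("S001", "A12", "T03", "H&E", 2)
fields = parse_slide_file_name(name)
```

Because the separator is stripped from every field before joining, a name built this way can always be split back into exactly five parts.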

Another key point is the problem with vocabularies. If you have only one company or one lab based in one single place, that is easy. But GSK is a global company and we have labs in both the UK and the US. We had problems with British English versus American English, and with different names used in different places. You can call it colon, you can call it large intestine, you never know. We also had the problem because we were trying to work with the CROs. The CROs have a different way of recording tissues, organs, species and all this kind of thing. There is always the possibility of typing errors, especially in usernames, because we were recording those. All this is something that you need to take into consideration when you approach this kind of LIMS.
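One way to handle the vocabulary problem is a synonym table that maps every incoming term to a single preferred term and flags anything it does not recognize. The table below is a hypothetical example built around the colon and large intestine case mentioned above; which term is "preferred" is an arbitrary choice here, not GSK's actual vocabulary.

```python
# Hypothetical controlled vocabulary and synonym table, for illustration only.
PREFERRED_TERMS = {"colon", "esophagus", "liver", "kidney"}
SYNONYMS = {
    "large intestine": "colon",
    "oesophagus": "esophagus",  # British spelling mapped to the American one
}


def normalise_term(term):
    """Lower-case, collapse whitespace, then map synonyms to the preferred term."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)


def check_terms(terms):
    """Return (normalised terms, terms that are not in the controlled vocabulary)."""
    normalised = [normalise_term(t) for t in terms]
    unknown = sorted({t for t in normalised if t not in PREFERRED_TERMS})
    return normalised, unknown


normalised, unknown = check_terms(["Colon", " Large  Intestine", "Oesophagus", "livr"])
```

Unknown terms such as the typing error `livr` are reported rather than silently accepted, so they can be corrected before they reach the library.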

Another thing that you need to consider is image quality. I think that everybody has seen overexposed images, out-of-focus images, images that somehow you cannot quantify because there is a problem or the contrast is too strong, and all this kind of thing. What we are trying to do is automate an image quality check to catch these problems before the images go to the final analysis.
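As a rough idea of what an automated quality check can look like, the sketch below flags overexposed and out-of-focus grayscale tiles. It is not the check used at GSK: the two measures (fraction of near-white pixels, variance of a 4-neighbour Laplacian as a focus score) are common generic choices, and the thresholds are hypothetical and would need tuning on real slides, which naturally contain a lot of white background.

```python
from statistics import pvariance


def overexposed_fraction(img, white=250):
    """Fraction of pixels at or above the `white` level (0-255 grayscale)."""
    pixels = [p for row in img for p in row]
    return sum(p >= white for p in pixels) / len(pixels)


def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian; low values suggest a blurred tile."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def quality_flags(img, max_white_fraction=0.5, min_focus=10.0):
    """Return a list of problems found in one tile (thresholds are hypothetical)."""
    flags = []
    if overexposed_fraction(img) > max_white_fraction:
        flags.append("overexposed")
    if laplacian_variance(img) < min_focus:
        flags.append("out_of_focus")
    return flags


# Tiny synthetic 8x8 tiles: a flat grey tile, a sharp checkerboard, a blank white tile.
flat = [[128] * 8 for _ in range(8)]
sharp = [[0 if (x + y) % 2 == 0 else 200 for x in range(8)] for y in range(8)]
white = [[255] * 8 for _ in range(8)]
```

Running `quality_flags` on each tile before analysis lets problem images be set aside for rescanning instead of reaching the quantification step.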

Another thing that we need to do, especially for H&E and stained slides, is use color calibration. I think that everybody has tried to transfer a protocol from one site to another, and the results are never the same. The color can always change. To have an analysis that is quantitative and specific, where you apply the same algorithm to all the images created in different places, you need to have all the colors calibrated to the same reference.

There are challenges in establishing a digital histopathology workflow, and the first one is the IT infrastructure. Sometimes it is outdated. And the problem is mainly, as I said before, the firewalls and security checks, which can be a little bit difficult when you come from a scientific background, not from an IT background. Metadata are important. I hope that I have shown you how they can differ even when two different people mean the same thing. They need to be standardized, and they need to be consistent.

You need to have a very good relationship with your CRO partners, because all the slides that they produce need to go through the same workflow. There is the problem of accreditation for GLP and GCP. We don't have that problem, but when you use clinical samples, that is something that you need to consider. And there is the user experience, because it needs to be the same as with glass slides, if not better.

There are future challenges as well. The main one is that the regulatory agencies need to accept the data generated from these tools. And we need to make sure that the trail of metadata collection and analysis is consistent. Another one is that you always need to have the right equipment and the data storage space, as I mentioned before. For example, this is CIFAR-10, a database that collects 60,000 images of different things. It can be a dog, it can be roads, a bridge, whatever. Together they amount to about 61 million pixels. And this is, let's say, not even half of the pixels that are included in one of our whole slide sections. This is another image database. It has been calculated that the number of pixels contained in the ImageNet database is equivalent to 474 whole slides of the kind that you would normally scan in a histopathology lab. I think that we can scan 474 slides in probably two or three days easily. When we create thousands of these slides, data storage becomes a massive issue.
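The pixel comparison above is easy to check with a few lines of arithmetic. CIFAR-10 images are 32 by 32 pixels, which gives the 61 million figure. The whole slide dimensions and the 3 bytes per pixel below are assumptions chosen only for illustration; real slides vary widely in size and are stored compressed.

```python
# CIFAR-10: 60,000 images of 32 x 32 pixels each.
cifar10_pixels = 60_000 * 32 * 32

# Hypothetical whole slide image dimensions (an assumption, not a figure from the talk).
slide_width, slide_height = 50_000, 40_000
slide_pixels = slide_width * slide_height

# Share of one such slide that the whole CIFAR-10 dataset would cover.
cifar10_share_of_slide = cifar10_pixels / slide_pixels

# Uncompressed size at 3 bytes per pixel (8-bit RGB), ignoring compression and pyramids.
bytes_per_pixel = 3
slide_gigabytes = slide_pixels * bytes_per_pixel / 1e9
thousand_slides_terabytes = 1_000 * slide_pixels * bytes_per_pixel / 1e12

print(f"CIFAR-10 pixels: {cifar10_pixels:,}")
print(f"One slide pixels: {slide_pixels:,}")
print(f"CIFAR-10 as share of one slide: {cifar10_share_of_slide:.1%}")
print(f"One slide uncompressed: {slide_gigabytes:.1f} GB")
print(f"1,000 slides uncompressed: {thousand_slides_terabytes:.1f} TB")
```

Under these assumptions a single slide already holds many times more pixels than the whole of CIFAR-10, which is why storage planning has to start from the number of slides scanned per study.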

In summary, I hope that I have shown you that the digital image workflow is having a renaissance in the pharma industry. There are a lot of industries that are using it now. A digital workflow makes histopathology easier and faster, and it can improve the data quality and the type of data that we can produce. Image analysis tools and artificial intelligence are potentially very useful, and they are driving the scientific robustness and the objectivity of the decision making that we do in drug discovery. Quantitative data need to be verified, and their potential in the drug industry is increasing. There are a lot of challenges that you need to face if you want to establish a fit-for-purpose digital histopathology workflow. As I mentioned before, this is not the work of one single group, but the work of many groups together. I would like to thank, first, the Prima super users in the non-clinical histology team, the technical team together with their team, the data team that helped us with the vocabularies and the definition of the renaming, and pathology and safety as well. I didn't show any animal samples, but all the ones that we use are ethically reviewed and are in accordance with all the regulations. Thank you for your attention.
