You need a browser with WebGPU support.
Check out the instructions here. Tested on Chrome Canary.
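
To check quickly whether the current browser exposes WebGPU, one option is to test for the `navigator.gpu` entry point defined by the WebGPU specification. The sketch below is only an illustration and is not taken from the demo itself; the helper name `hasWebGPU` is hypothetical.

```typescript
// Hypothetical helper (not part of the demo): reports whether a
// navigator-like object exposes the WebGPU entry point `gpu`.
function hasWebGPU(nav: { gpu?: unknown } | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
}

// In a browser, pass the global navigator. Outside a browser the global
// may be missing, in which case the check simply reports false.
const nav = (globalThis as unknown as { navigator?: { gpu?: unknown } }).navigator;
console.log(hasWebGPU(nav) ? "WebGPU available" : "WebGPU not available");
```

The presence of `navigator.gpu` does not guarantee that a usable adapter exists, so a real page would likely also request an adapter and handle failure.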
The end-to-end time includes copying data to and from the GPU.
The GPU run benchmarks invoke only the compute, without any data copy.
Comparing the benchmark runs with the end-to-end timing gives a sense of the copy overhead.
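
The comparison above amounts to a subtraction. The sketch below shows that arithmetic under the simplifying assumption that the whole difference between the two timings is copy cost. The function name `copyOverhead` and the numbers passed to it are made up for illustration and are not measurements from the demo.

```typescript
interface CopyOverhead {
  overheadMs: number; // estimated time spent outside GPU compute
  fraction: number;   // that time as a share of the end-to-end time
}

// Hypothetical helper: attributes the gap between the end-to-end time and
// the GPU-only benchmark time to data copies (an approximation).
function copyOverhead(endToEndMs: number, gpuOnlyMs: number): CopyOverhead {
  const overheadMs = Math.max(0, endToEndMs - gpuOnlyMs);
  const fraction = endToEndMs > 0 ? overheadMs / endToEndMs : 0;
  return { overheadMs, fraction };
}

// Made-up timings, for illustration only.
const example = copyOverhead(12, 9);
console.log(example.overheadMs, example.fraction);
```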
Try clicking classify multiple times;
you will notice that the classifier becomes faster.
This could be due to GPU driver execution stabilizing.
The stabilized GPU cost tells us the best performance
we can get if we run the model continuously (e.g. in an always-on detector demo).
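
One common way to measure a stabilized cost like this is to discard a few warm-up runs and then report the median of the remaining ones. The sketch below illustrates that idea and is not how the demo itself measures time. The names `median` and `steadyStateMs` and the default counts are assumptions, and `run` stands for whatever callback invokes the model.

```typescript
// Median of a non-empty list of samples.
function median(samples: number[]): number {
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs `run` untimed `warmup` times, then times `iters` further runs and
// returns their median in milliseconds. `run` must resolve only once the
// work it launched has finished; otherwise only the launch is timed.
async function steadyStateMs(
  run: () => Promise<void>,
  warmup = 3,
  iters = 10
): Promise<number> {
  for (let i = 0; i < warmup; i++) {
    await run();
  }
  const samples: number[] = [];
  for (let i = 0; i < iters; i++) {
    const start = performance.now();
    await run();
    samples.push(performance.now() - start);
  }
  return median(samples);
}
```

The median is used here rather than the mean so that a single slow run has less influence on the reported figure.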