IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 82~91
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp82-91
Journal homepage: http://ijai.iaescore.com
Video saliency-recognition by applying custom spatio temporal
fusion technique
Vinay C. Warad, Ruksar Fatima
Department of Computer Science and Engineering, KBNCE, Kalaburagi, India
Article Info

Article history:
Received Sep 24, 2022
Revised Jan 24, 2023
Accepted Mar 10, 2023

ABSTRACT
Video saliency detection is a growing field with relatively few contributions. The prevailing approach is to run saliency detection frame by frame, which leads to several complications, including temporally incoherent pixel-level saliency maps of limited practical use. This paper presents a novel solution to saliency detection and mapping: a custom spatio-temporal fusion method that combines frame-wise overall motion-colour saliency with pixel-based, consistent spatio-temporal diffusion for temporal uniformity. The proposed method section describes how the video is fragmented into groups of frames; each frame undergoes temporary diffusion and integration so that the colour saliency map can be computed. The inter-group frames are then used to form the pixel-based saliency fusion, after which the fused features, that is, pixel saliency and colour information, guide the diffusion of the spatio-temporal saliency. The result has been evaluated with five publicly available global saliency evaluation metrics, and the proposed algorithm outperforms several state-of-the-art saliency detection methods by a clear accuracy margin. All results demonstrate its robustness, reliability, versatility and accuracy.
Keywords:
Motion colour saliency
Pixel-based coherency
Saliency detection
Spatio-temporal
Video-saliency
This is an open access article under the CC BY-SA license.
Corresponding Author:
Vinay C. Warad
Department of Computer Science and Engineering, KBNCE
Kalaburagi, India
Email: vinaywarad999@gmail.com
1. INTRODUCTION
The human eye is a marvel of nature. The brain and eyes together form a powerful system that can not only distinguish about 10 million colours but also perceive roughly 50 objects per second. In general, the human eye focuses on the specific components of a picture or a video that are important to us. The brain in turn filters out the unnecessary bits of information, keeping only those that matter. Since a video is a series of images, the amount of information to be processed increases, along with the perception of dimensions.
In the technical world, attempts have been made to replicate this way of processing images and videos. Among the available saliency models for stationary images, Itti's model [1] is regarded as the most widely used. Other models include [2], which uses a Fourier transform of the phase spectrum, and [3], which uses frequency tuning for saliency detection. The commonality among these models is the use of a bottom-up visual attention mechanism. For example, the model in [3] uses a range of frequencies in the image spectrum, which highlights the important details, to obtain the saliency map. The saliency map is computed with the help of a Difference of Gaussians and by combining the outputs of several band-pass filters. Feature conspicuity maps are then constructed from low-level image features [4], [5] and merged into the final saliency map using principles such as Winner-Take-All or Inhibition of Return, which are borrowed from the visual nervous system. All of these are designed for still images, not videos. In a video, a texture feature that is salient in a still image may no longer be salient once the image moves. Thus, there is a need for saliency models and methods designed for videos.
A video is a sequence of moving images called frames. A set frame rate renders smooth motion so that the brain cannot distinguish the individual images. Video also helps in determining the position of an object with respect to another [6]. It follows that video saliency is far more complex than image saliency. Considerable research has been done in this field, broadly following two approaches: computing a space-time saliency map, or computing a motion saliency map [7]–[10]. To obtain a spatio-temporally mapped video saliency, Peters and Itti [11] fused the ideas of static and dynamic saliency mapping into a space-time saliency detection model. In [12], the authors proposed a dynamic texture model to capture motion patterns, even for dynamic scenes.
In general, most video saliency models build on bottom-up imagery, which makes them capable of handling non-stationary videos. In addition, motion information is treated as an extra saliency cue, and to exploit it many state-of-the-art methods fuse motion saliency with colour saliency. The works in [13]–[15] adopt such a fusion model, but the result is a low-level saliency. Almost all recent methods preserve temporal smoothness in the resulting saliency map, which helps improve accuracy. The works in [16], [17] even use global temporal cues to obtain a robust low-level saliency, but these methods accumulate error because the energy-minimization framework they use cannot fully manage saliency consistency over the temporal scale, which leads to wrong detections. Video saliency therefore remains a less-researched field with great room for improvement, including customized models that limit the loss of accuracy while guaranteeing temporal saliency consistency.
The general approach in video saliency algorithms is to use state-of-the-art image saliency detection as the source of basic saliency cues. In this paper, the chosen method instead involves no high-level priors or constraints, only straightforward low-contrast saliency. The hollow effect is avoided by integrating a spatio-temporal gradient map. The temporal-level global cue is taken as the appearance model, since this guides the fusion of motion saliency and colour saliency. The proposed custom spatio-temporal fusion saliency detection method thus uses a spatio-temporal gradient definition that assigns high saliency values around the foreground object while avoiding hollow effects. The efficiency and accuracy of the solution are boosted by a series of adjustments to the saliency strategies, which support the fusion of motion and colour saliencies. Temporal smoothness is first safeguarded by establishing a temporal saliency correspondence across cross-frame superpixels, which is then leveraged to further boost the accuracy of the saliency model through a one-to-one spatio-temporal saliency diffusion.
2. LITERATURE SURVEY
This section reviews the research papers that inspired the proposed custom spatio-temporal fusion saliency detection method. As previously mentioned, image saliency distinguishes the most important details in an image. There has been an exponential increase in video compression due to the traffic generated by video streaming, webinars and so on. The demand for the best video quality has driven the development of various video compression algorithms, which focus on reducing video memory footprint while keeping quality in check. Convolutional neural networks (CNNs) have also been applied in this field. A survey of learning-based video compression methods was conducted in [18], discussing the advantages and disadvantages of each method. Borji [19] surveyed the various deep saliency models, their benchmarks and datasets, in order to support the development of the still under-researched field of video saliency. That work also notes the differences between human-level and algorithm-level saliency detection accuracy and how to close the gap.
Meanwhile, in [20] three contributions are made. First, the authors introduce a new benchmark named dynamic human fixation 1K (DHF1K) that captures the fixations observed during free viewing of dynamic scenes. Second comes the attentive convolutional neural network-long short-term memory (CNN-LSTM) network (ACLNet), which augments the CNN-LSTM architecture with a supervised attention mechanism to enable fast end-to-end saliency learning, helping the CNN-LSTM learn a better temporal saliency representation across successive frames. The third contribution is extensive experimentation on three datasets, namely DHF1K, Hollywood-2, and the University of Central Florida (UCF) sports dataset. The results of these experiments are of the utmost importance for further development in the field.
The work in [21] gives a solution to reduce the errors made on smooth pursuits (SPs), a major eye-movement type that is unique to the perception of dynamic scenes. The solution employs manual annotations of SPs and algorithmically detected fixation points, and trains a slicing CNN for SP-salient-location and saliency prediction. The resulting model is then tested on three datasets against the already available methods, showing greater accuracy and efficiency. Another model uses 3D convolutional encoder-decoder subnetworks [22] for dynamic-scene saliency prediction: spatial and temporal features are first extracted by two subnetworks, after which the decoder enlarges the features in the spatial dimensions while aggregating temporal information.
High efficiency video coding (HEVC) is the current standard for video compression. The work in [23] improves HEVC by proposing a spatial saliency algorithm built on the concept of a motion vector. The CNN-based motion estimation of each block during HEVC compression is combined through adaptive dynamic fusion. There is also an algorithm for more flexible quantization parameter (QP) selection, along with another algorithm that assists rate-distortion optimization.
The work in [24] introduces a new salient-object segmentation method that combines a conditional random field (CRF) and a saliency measure. Formulated within a statistical framework using local feature contrast in colour, illumination and motion information, the resulting saliency map is fed into a CRF-based segmentation approach that defines an energy minimization and recovers well-defined salient objects. The work in [25] also combines spatial and temporal information with statistical uncertainty measures to detect visual saliency. The two spatial and temporal maps are merged using a spatio-temporally adaptive, entropy-based uncertainty weighting approach to obtain one single map.
The work in [26] introduces contrast-based saliency within a pre-defined spatio-temporal surrounding. Co-saliency detection using clustering algorithms is discussed in [27]: cluster saliency is measured using spatial, correspondence and contrast cues, and the results are obtained by fusing single- and multi-image saliency maps. In [28]–[34], robust geodesic measurements are computed to obtain the saliency mapping. The works in [35]–[40] use a superpixel-based strategy, which helps in formulating our proposed custom spatio-temporal fusion saliency detection method. The image is first segmented into superpixels and undergoes adaptive colour quantization. Next, [41], [42] measure inter-superpixel similarity based on the difference between spatial distances and histograms. The spatial sparsity and global contrast sparsity are then measured and integrated with the inter-superpixel similarities to generate the superpixel saliency map [43]–[47]. The work in [48] guided the choice of evaluation metrics and methods for saliency testing; it references the main papers and explains the metrics clearly enough for even a layman to understand. This paper has five sections. The first section is the introduction; the second names every reference that helped shape the proposed solution. The third section covers the mathematical aspects of the proposed algorithm and how each modification improves accuracy, perception and weaker result areas, while sections 4 and 5 present the results in comparison with various saliency detection methods and the conclusion.
3. PROPOSED SYSTEM
The solution proposed by this paper is based on spatio-temporal saliency fusion. The available state-of-the-art methods create saliency maps frame by frame from the frame sequence. We instead use a fusion of modelling-based and contrast-based saliencies. The two methods are briefly explained here.
3.1. Modeling based saliency adjustment
To produce a robust saliency map, colour contrast computation needs to be combined with long-term inter-batch information so that the saliency of non-salient backgrounds is reduced. We use $BM \in \mathbb{R}^{3 \times bn}$ and $FM \in \mathbb{R}^{3 \times fn}$ to represent the background and foreground appearance models, with $bn$ and $fn$ being their respective sizes; they record the RGB (red, green, blue) history of the $i$-th superpixel in all regions. We then follow (1) and (2):

$$\mathrm{intraC}_i = \exp\big(\lambda - |\varphi(MC_i) - \varphi(CM_i)|\big), \quad \lambda = 0.5 \tag{1}$$

$$\mathrm{interC}_i = \varphi\!\left(\frac{\min\lVert (R_i,G_i,B_i), BM \rVert_2 \cdot \frac{1}{bn}\sum \lVert (R_i,G_i,B_i), BM \rVert_2}{\min\lVert (R_i,G_i,B_i), FM \rVert_2 \cdot \frac{1}{fn}\sum \lVert (R_i,G_i,B_i), FM \rVert_2}\right) \tag{2}$$

Here, $\lambda$ is the upper bound on the discrepancy degree; it inverts the penalty between the motion and colour saliencies.
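As an illustration of this adjustment, the following minimal sketch computes (1) and (2) for a single superpixel under simplifying assumptions: the motion and colour saliencies are taken as already normalised, and $\varphi$ is approximated by a simple squashing of the background-to-foreground distance ratio; the function and variable names are illustrative, not part of the proposed method.

```python
import numpy as np

def appearance_consistency(rgb_i, mc_i, cm_i, BM, FM, lam=0.5):
    """Illustrative sketch of Eqs. (1)-(2).
    rgb_i      : (3,) mean RGB of the i-th superpixel
    mc_i, cm_i : its motion and colour saliency, assumed normalised to [0, 1]
    BM, FM     : (3, bn) background and (3, fn) foreground appearance models."""
    # Eq. (1): agreement between motion and colour saliency within the frame.
    intra = np.exp(lam - abs(mc_i - cm_i))

    # Eq. (2): distance of the superpixel colour to the background model versus
    # the foreground model (min times mean of the column-wise L2 distances),
    # squashed to (0, 1) as a stand-in for the normaliser phi().
    d_bg = np.linalg.norm(BM - rgb_i[:, None], axis=0)
    d_fg = np.linalg.norm(FM - rgb_i[:, None], axis=0)
    ratio = (d_bg.min() * d_bg.mean()) / (d_fg.min() * d_fg.mean() + 1e-8)
    inter = ratio / (1.0 + ratio)
    return intra, inter
```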
3.2. Contrast-based saliency mapping
This mapping method is inspired by [15]–[17], [27], with some changes to their propositions to best suit this paper's aim. Those papers analyse the video frame by frame to detect saliency, separating the video sequence into several short groups of frames $G_i = \{F_1, F_2, F_3, \ldots, F_n\}$. Each frame $F_k$ (where $k$ denotes the frame number) is processed using simple linear iterative clustering [31] and a boundary-aware smoothing method inspired by [30], which together remove computational burden and unnecessary details. The colour and motion gradient mapping from [31], [32] is used to obtain the spatio-temporal gradient map and thus the pixel-based contrast computation given by (3):

$$SMT = \lVert u_x, u_y \rVert_2 \odot \lVert \nabla(F) \rVert_2 \tag{3}$$
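The following is a minimal sketch of the gradient map in (3), assuming the optical flow is estimated with OpenCV's Farneback method and the colour gradient is approximated by Sobel gradients of the grayscale frame; the paper does not prescribe these particular operators.

```python
import cv2
import numpy as np

def spatio_temporal_gradient(prev_gray, curr_gray, curr_bgr):
    """Sketch of Eq. (3): SMT = ||u_x, u_y||_2 (elementwise) ||grad(F)||_2."""
    # Dense optical flow between consecutive frames (Farneback as one choice).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion_mag = np.linalg.norm(flow, axis=2)              # ||u_x, u_y||_2

    # Colour gradient magnitude of the current frame (Sobel on grayscale).
    gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    colour_grad = np.sqrt(gx ** 2 + gy ** 2)               # ||grad(F)||_2

    smt = motion_mag * colour_grad                          # Hadamard product
    return smt / (smt.max() + 1e-8)                         # normalise to [0, 1]
```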
Here, $(u_x, u_y)$ are the horizontal and vertical gradients of the optical flow and $\nabla(F)$ is the colour gradient map. We then calculate the $i$-th superpixel's motion contrast using (4),

$$MC_i = \sum_{a_j \in \psi_i} \frac{\lVert U_i, U_j \rVert_2}{\lVert a_i, a_j \rVert_2}, \qquad \psi_i = \{\, \tau + 1 \ge \lVert a_i, a_j \rVert_2 \ge \tau \,\} \tag{4}$$

where the $\ell_2$ norm is used, and $U$ and $a_i$ denote the optical flow gradient in the two directions and the $i$-th superpixel's position centre, respectively. $\psi_i$ denotes the computational contrast range and is calculated from the shortest Euclidean distance between the spatio-temporal map and the $i$-th superpixel:

$$\tau = \frac{r}{\lVert \Lambda(SMT) \rVert_0} \sum_{\tau \in \lVert \tau, i \rVert \le r} \lVert \Lambda(SMT_\tau) \rVert_0, \quad r = 0.5 \min\{\text{width}, \text{height}\}, \quad \Lambda \rightarrow \text{down-sampling} \tag{5}$$
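A minimal sketch of the motion contrast in (4) is given below, with the band $\psi_i$ of (5) simplified to a fixed spatial radius and the superpixel-level mean flow and centres assumed to be pre-computed from the SLIC labels; the names are illustrative.

```python
import numpy as np

def motion_contrast(flow_mean, centres, radius):
    """Sketch of Eq. (4): motion contrast of each superpixel as the
    position-weighted flow difference to its spatial neighbours.
    flow_mean : (m, 2) mean optical flow per superpixel
    centres   : (m, 2) superpixel centre coordinates
    radius    : neighbourhood range standing in for the tau band of Eq. (5)."""
    m = len(centres)
    mc = np.zeros(m)
    for i in range(m):
        d_pos = np.linalg.norm(centres - centres[i], axis=1)
        nbrs = (d_pos > 0) & (d_pos <= radius)      # computational contrast range
        d_flow = np.linalg.norm(flow_mean[nbrs] - flow_mean[i], axis=1)
        mc[i] = np.sum(d_flow / (d_pos[nbrs] + 1e-8))
    return mc / (mc.max() + 1e-8)
```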
Colour saliency is computed in the same way as the optical flow gradient, except that the red, green and blue values of the $i$-th superpixel are used:

$$CM_i = \sum_{a_j \in \psi_i} \frac{\lVert (R_i, G_i, B_i), (R_j, G_j, B_j) \rVert_2}{\lVert a_i, a_j \rVert_2} \tag{6}$$

$$CM_{k,i} \leftarrow \frac{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp\!\big(-\lVert c_{k,i}, c_{\tau,j} \rVert_1 / \mu\big) \cdot CM_{\tau,j}}{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp\!\big(-\lVert c_{k,i}, c_{\tau,j} \rVert_1 / \mu\big)} \tag{7}$$
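The temporal re-estimation in (7) can be sketched as follows, assuming the neighbourhood $\mu_\phi$ is simplified to all superpixels of the adjacent frames; again, the names and shapes are illustrative.

```python
import numpy as np

def smooth_colour_saliency(cm, colours, k, mu=0.1):
    """Sketch of Eq. (7): colour saliency of the superpixels of frame k,
    re-estimated as a colour-similarity-weighted average over the superpixels
    of frames k-1, k and k+1.
    cm      : list of (m,) colour-saliency vectors, one per frame
    colours : list of (m, 3) mean RGB per superpixel, values in [0, 1]."""
    num = np.zeros_like(cm[k])
    den = np.zeros_like(cm[k])
    for tau in (k - 1, k, k + 1):
        if tau < 0 or tau >= len(cm):
            continue
        # Pairwise L1 colour distance between frame k and frame tau superpixels.
        d = np.abs(colours[k][:, None, :] - colours[tau][None, :, :]).sum(-1)
        w = np.exp(-d / mu)
        num += (w * cm[tau][None, :]).sum(axis=1)
        den += w.sum(axis=1)
    return num / (den + 1e-8)
```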
Here, $c_{k,i}$ is the average RGB colour value of the $i$-th superpixel in the $k$-th frame, while $\mu$ controls the smoothing strength. The condition $\lVert a_{k,i}, a_{\tau,j} \rVert_2 \le \theta$ needs to be satisfied, with $\theta$ given by (8):

$$\theta = \frac{1}{m \times n} \sum_{k=1}^{n} \sum_{i=1}^{m} \left\lVert \frac{1}{m} \sum_{i=1}^{m} F(SMT_{k,i}),\; F(SMT_{k,i}) \right\rVert_1, \qquad m, n = \text{frame numbers} \tag{8}$$

$$F(SMT_i) = \begin{cases} a_i, & SMT_i \le \epsilon \times \frac{1}{m} \sum_{i=1}^{m} SMT_i \\ 0, & \text{otherwise} \end{cases}, \qquad \epsilon = \text{filter strength control} \tag{9}$$

At each frame-batch level, the smoothing rate at step $s$ is dynamically updated with (10):

$$(1 - \gamma)\,\theta_{s-1} + \gamma\,\theta_s \rightarrow \theta_s, \qquad \gamma = \text{learning weight}, \; 0.2 \tag{10}$$
Now the colour and motion saliency are integrated to obtain the pixel-based saliency map:

$$LLS = CM \odot MC \tag{11}$$

This fused saliency map increases accuracy considerably, but at a reduced rate; this is dealt with in the next section.
3.3. Accuracy boosting
An input matrix $M$ needs to be decomposed into a sparse component $S$ and a low-rank component $D$ using

$$\min_{D, S}\; \alpha \lVert S \rVert_1 + \lVert D \rVert_* \quad \text{subject to } M = S + D$$

where the nuclear norm of $D$ is used,
and this decomposition is solved with the help of robust principal component analysis (RPCA) [34], as shown in (12) and (13), where $\mathrm{svd}(Z)$ denotes the singular value decomposition of the Lagrange multiplier and $\alpha$ and $\beta$ represent the low-rank and sparse threshold parameters, respectively. Then, to reduce wrong detections due to misplaced optical flow, the superpixels contained in the given region's rough foreground are located, the feature subspace of a frame $k$ is spanned as $gI_k = \{LLS_{k,1}, LLS_{k,2}, \ldots, LLS_{k,m}\}$, and for the entire frame group we get $gB_\tau = \{gI_1, gI_2, \ldots, gI_n\}$. The rough foreground is then calculated as given in (14).

$$S \leftarrow \mathrm{sign}(M - D - S)\,\big[\, |M - D - S| - \alpha\beta \,\big]_+ \tag{12}$$

$$D \leftarrow V\,[\Sigma - \beta I]_+\,U, \qquad (V, \Sigma, U) \leftarrow \mathrm{svd}(Z) \tag{13}$$

$$RF_i = \left[\, \sum_{k=1}^{n} LLS_{k,i} - \frac{\omega}{n \times m} \sum_{k=1}^{n} \sum_{i=1}^{m} LLS_{k,i} \,\right]_+ \tag{14}$$
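A minimal sketch of the updates (12)-(13) and the rough foreground of (14) follows, written with the standard RPCA-style shrinkage and singular-value thresholding steps; the alternation schedule and the default value of $\omega$ are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage, the [.]_+ operator in Eq. (12)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def rpca_step(M, D, alpha, beta):
    """One alternating update in the spirit of Eqs. (12)-(13): sparse part by
    soft-thresholding, low-rank part by singular-value thresholding."""
    S = soft_threshold(M - D, alpha * beta)
    U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
    D = (U * np.maximum(sig - beta, 0.0)) @ Vt
    return D, S

def rough_foreground(LLS, omega=1.0):
    """Sketch of Eq. (14): per-superpixel saliency summed over the frame group,
    minus a weighted global mean, clipped at zero.
    LLS : (n_frames, m_superpixels) fused low-level saliency."""
    n, m = LLS.shape
    return np.maximum(LLS.sum(axis=0) - omega / (n * m) * LLS.sum(), 0.0)
```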
Here $\omega$ is a reliability control factor. We also obtain two subspaces for (14), spanned by $LLS$ and RGB colour, given by $SB = \{cv_1, cv_2, \ldots, cv_n\} \in \mathbb{R}^{3v \times n}$, where $cv_i = \mathrm{vec}(R_{i,1}, G_{i,1}, B_{i,1}, \ldots, R_{i,m}, G_{i,m}, B_{i,m})^K$, and $SF = \{\mathrm{vec}(LLS_1), \ldots, \mathrm{vec}(LLS_n)\} \in \mathbb{R}^{v \times n}$. This enables a one-to-one correspondence and then a pixel-based saliency-mapping infusion that is dissipated over the entire group of frames. Aligning $SB$ over $SF$ can cause disruptive foreground salient movements, and with help from [35]–[37] this issue is resolved with an alternate solution,
$$\min_{M_{c,x},\, S_{c,x},\, \vartheta,\, A \odot \vartheta} \; \lVert M_c \rVert_* + \lVert D_x \rVert_* + \lVert A \odot \vartheta \rVert_2 + \alpha_1 \lVert S_c \rVert_1 + \alpha_2 \lVert S_x \rVert_1, \qquad \lVert \cdot \rVert_* \rightarrow \text{nuclear norm},\; A \text{ is the position matrix} \tag{15}$$

$$\text{s.t. } M_c = D_c + S_c,\; M_x = D_x + S_x,\; M_c = SB \odot \vartheta,\; M_x = SF \odot \vartheta,\; \vartheta = \{E_1, E_2, \ldots, E_n\},\; E_i \in \{0,1\}^{m \times m},\; E_i \mathbf{1} = \mathbf{1}.$$
where $D_c$, $D_x$ denote the estimated pixel-based mapping features over the colour and saliency feature spaces, $\vartheta$ is the permutation matrix taken from [36], [38], and $S_c$, $S_x$ represent the sparse components of the colour feature space and the saliency feature space. This set of equations corrects the superpixel correspondences.
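For clarity, a small sketch of how the two subspaces $SB$ and $SF$ defined above can be stacked, one column per frame; the array shapes are illustrative, not prescribed by the paper.

```python
import numpy as np

def build_subspaces(colours, LLS):
    """Sketch of the subspaces defined above: SB stacks the vectorised superpixel
    RGB values of each frame as a column, SF stacks the vectorised fused saliency.
    colours : (n, m, 3) mean RGB per superpixel, LLS : (n, m) fused saliency."""
    n = len(LLS)
    SB = np.stack([colours[k].reshape(-1) for k in range(n)], axis=1)   # (3m, n)
    SF = np.stack([LLS[k].reshape(-1) for k in range(n)], axis=1)       # (m, n)
    return SB, SF
```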
3.4. Mathematical model
Equation (15) is further modified using the concept of [39] to generate a distributed version of the convex problem, represented by (16), where $Z_i$ represents the Lagrangian multiplier and $\pi$ denotes the iteration step. The optimized solution using partial derivatives is shown in (17).
$$\begin{aligned} D(M_{c,x}, S_{c,x}, \vartheta, A \odot \vartheta) = {} & \alpha_1 \lVert S_c \rVert_1 + \alpha_2 \lVert S_x \rVert_2 + \beta_1 \lVert M_c \rVert_* + \beta_2 \lVert M_x \rVert_* + \lVert A \odot \vartheta \rVert_2 \\ & + \mathrm{trace}\big(Z_1^K (M_c - D_c - S_c)\big) + \mathrm{trace}\big(Z_2^K (M_x - D_x - S_x)\big) \\ & + \frac{\pi}{2}\big(\lVert M_c - D_c - S_c \rVert^2 + \lVert M_x - D_x - S_x \rVert^2\big). \end{aligned} \tag{16}$$
$$S_{c,x}^{k+1} = \frac{1}{2} \left\lVert S_{c,x}^{k} - \left( M_{c,x}^{k} - S_{c,x}^{k} + Z_{1,2}^{k} / \pi_k \right) \right\rVert_2^2 + \min_{S_{c,x}^{k}} \; \alpha_{1,2} \lVert S_{c,x}^{k} \rVert_1 / \pi_k \tag{17}$$
$$D_{c,x}^{k+1} = \frac{1}{2} \left\lVert D_{c,x}^{k} - \left( M_{c,x}^{k} - D_{c,x}^{k} + Z_{1,2}^{k} / \pi_k \right) \right\rVert_2^2 + \min_{D_{c,x}^{k}} \; \beta_{1,2} \lVert D_{c,x}^{k} \rVert_* / \pi_k \tag{18}$$
$D_i$ is updated to become

$$D_{c,x}^{k+1} \leftarrow V\left[\Sigma - \frac{\beta_{1,2}}{\pi_k}\right] U^K, \qquad (V, \Sigma, U) \leftarrow \mathrm{svd}\!\left(M_{c,x}^{k} - S_{c,x}^{k} + \frac{Z_{1,2}^{k}}{\pi_k}\right) \tag{19}$$
Similarly, for $S_i$,

$$S_{c,x}^{k+1} \leftarrow \mathrm{sign}\!\left(\frac{|J|}{\pi_k}\right)\left[J - \frac{\alpha_{1,2}}{\pi_k}\right]_+, \qquad J = M_{c,x}^{k} - D_{c,x}^{k} + Z_{c,x}^{k} / \pi_k \tag{20}$$
Then the components that determine the value of $E$ are used to compute the norm cost $L \in \mathbb{R}^{m \times m}$:

$$l_{i,j}^{k} = \lVert O_{k,i} - H(V_1, j) \rVert_2, \qquad V_1 = H(SB, k) \odot E_k \tag{21}$$

$$l_{i,j}^{k} = \lVert O_{k,i} - H(V_2, j) \rVert_2, \qquad V_2 = H(SF, k) \odot E_k$$

where $O$ is the objective column matrix that gives the $k$-th column of $RF$:

$$O_{k,i} = S_{c,x}(k, i) + D_{c,x}(k, i) - Z_{1,2}(k, i) / \pi_k \tag{22}$$
As $\min \lVert A \odot \vartheta \rVert_2$ is hard to approximate, to calculate it we need to change $L_\tau = \{ r_{1,1}^{\tau} + d_{1,1}^{\tau}, r_{1,2}^{\tau} + d_{1,2}^{\tau}, \ldots, r_{m,m}^{\tau} + d_{m,m}^{\tau} \} \in \mathbb{R}^{m \times m}$, $k = [k-1, k+1]$, to $L_k$:

$$H(L_k, j) \leftarrow \sum_{\tau=k-1}^{k+1} \sum_{p_{t,v} \in \xi} H(L_\tau, v) \cdot \exp\!\big(-\lVert c_{\tau,v}, c_{k,j} \rVert_1 / \mu\big) \tag{23}$$
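The cost matrix $L$ built in (21)-(23) feeds a one-to-one assignment between the superpixels of neighbouring frames, which the next step solves with the algorithm of [40] (the Hungarian/Munkres method). A minimal sketch using SciPy's linear_sum_assignment is given below; the wrapper name is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_superpixels(cost):
    """Sketch of solving the cross-frame correspondence: the (m, m) cost matrix
    L of Eqs. (21)-(23) prices matching superpixel i of the reference frame to
    superpixel j of frame k; the minimising assignment gives the permutation E_k."""
    rows, cols = linear_sum_assignment(cost)   # minimum-cost one-to-one matching
    E = np.zeros_like(cost)
    E[rows, cols] = 1.0                        # binary permutation matrix, E 1 = 1
    return E
```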
The global optimization problem is handled using the algorithm in [40], thus modifying (24)-(26):

$$SF^{k+1} \leftarrow SF^{k} \odot \vartheta, \qquad SB^{k+1} \leftarrow SB^{k} \odot \vartheta \tag{24}$$

$$Z_{1,2}^{k+1} \leftarrow \pi_k \big(M_{c,x}^{k} - D_{c,x}^{k} - S_{c,x}^{k}\big) + Z_{1,2}^{k} \tag{25}$$

$$\pi_{k+1} \leftarrow \pi_k \times 1.05 \tag{26}$$
The alignment of the superpixels is now given by (27):

$$gS_i = \frac{1}{n - 1} \sum_{\tau=1, i \ne \tau}^{n} H(SF \odot \vartheta, \tau) \tag{27}$$
$SF$ is modified to reduce incorrect detections and alignments:

$$\widetilde{SF} \leftarrow SF \odot \vartheta \tag{28}$$

$$SF \leftarrow \widetilde{SF} \cdot \big(\mathbf{1}_{m \times n} - X(S_c)\big) + \rho \cdot \widetilde{SF} \cdot X(S_c) \tag{29}$$

$$\rho_{i,j} = \begin{cases} 0.5, & \frac{1}{n}\sum_{j=1}^{n} \widetilde{SF}_{i,j} < \widetilde{SF}_{i,j} \\ 2, & \text{otherwise} \end{cases} \tag{30}$$

In (29), $\rho$ is a balancing matrix. The saliency mapping result for the $i$-th video frame is then represented by (31).
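A minimal sketch of the correction in (28)-(30) follows, under the assumption that $X(S_c)$ is a binary mask of the entries where the sparse component is non-zero; this reading of $X(\cdot)$ is ours.

```python
import numpy as np

def correct_saliency_subspace(SF_aligned, Sc, rho_low=0.5, rho_high=2.0):
    """Sketch of Eqs. (28)-(30), assuming X(Sc) is a binary mask marking entries
    where the sparse component Sc is non-zero. Masked entries of the aligned
    saliency subspace are re-weighted by rho: entries above their row mean are
    damped (0.5), the rest are boosted (2)."""
    X = (np.abs(Sc) > 1e-8).astype(float)                      # mask of sparse outliers
    row_mean = SF_aligned.mean(axis=1, keepdims=True)
    rho = np.where(SF_aligned > row_mean, rho_low, rho_high)   # Eq. (30)
    return SF_aligned * (1.0 - X) + rho * SF_aligned * X       # Eq. (29)
```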
$$gS_i = \frac{H(\rho, i) - H(\rho, i) \cdot X(S_c)}{H(\rho, i)\,(n - 1)} \sum_{\tau=1, i \ne \tau}^{n} H(SF \odot \vartheta, \tau) \tag{31}$$
The inner temporal batch $x_r$ of the current group's frames is then diffused based on the degree of colour similarity. The final output is given by (32),

$$gS_{i,j} = \frac{x_r \cdot y_r + \sum_{i=1}^{n} y_i \cdot gS_{i,j}}{y_r + \sum_{i=1}^{n} y_i}, \qquad y_r = \exp\!\big(-\lVert c_{r,j}, c_{i,j} \rVert_2 / \mu\big) \tag{32}$$

where $x_r$ denotes the colour-distance-based weights.
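The final diffusion in (32) can be sketched as a colour-weighted blend across the frames of the group; interpreting $x_r$ as the current saliency of frame $r$ is our assumption.

```python
import numpy as np

def diffuse_saliency(gS, colours, r, mu=0.1):
    """Sketch of Eq. (32): the saliency of superpixel j in frame r is blended
    with the corresponding superpixels of the other frames of the group, with
    weights decaying with colour distance.
    gS      : (n_frames, m) per-frame superpixel saliency
    colours : (n_frames, m, 3) mean RGB per superpixel."""
    n, m = gS.shape
    out = np.empty(m)
    for j in range(m):
        d = np.linalg.norm(colours[:, j] - colours[r, j], axis=1)  # colour distance
        y = np.exp(-d / mu)                                        # similarity weights
        out[j] = (y[r] * gS[r, j] + np.dot(y, gS[:, j])) / (y[r] + y.sum())
    return out
```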
4. RESULTS, EXPERIMENTS AND DATABASE
4.1. Base references and comparison
Any experiment or research is incomplete without results from an actual application of the proposed solution. For this paper, the algorithm is compared against [42] as the base reference, followed by operational block description length (OBDL) [43], dynamic adaptive whitening saliency (AWS-D) [44], the object-to-motion convolutional neural network with two-layer long short-term memory (OMCNN-2CLSTM) [45], attentive CNN-LSTM (ACL) [46], the saliency-aware video compression (SAVC) algorithm, and the methods of Xu et al. and Bylinskii et al. [47], [48]. These algorithms are run on the same database as the reference base paper [42]: a high-definition eye-tracking database that is openly available on GitHub [49]. The compared algorithms are widely used, support common intermediate format (CIF) resolution, and are also evaluated on HD lossless video.
4.2. Experiment and results
For the final comparison and evaluation, 10 video sequences were taken in three resolutions: 1920×1080, 1280×720 and 832×480, as listed in Table 1. Five evaluation metrics are then used, namely the area under the receiver operating characteristic (ROC) curve (AUC), similarity (SIM), correlation coefficient (CC), normalized scanpath saliency (NSS) and Kullback-Leibler divergence (KL), sketched below; the results are shown in Table 2. Figure 1 shows the resulting saliency maps.
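For reference, minimal sketches of four of these metrics as they are commonly defined in saliency benchmarking are given below (AUC is the standard ROC area over fixated versus non-fixated pixels and is omitted); these are generic definitions, not the exact implementations of [42], [48].

```python
import numpy as np

def nss(sal, fix):
    """Normalised scanpath saliency: mean standardised saliency at fixated pixels."""
    s = (sal - sal.mean()) / (sal.std() + 1e-8)
    return s[fix > 0].mean()

def cc(sal, gt):
    """Pearson correlation coefficient between the two maps."""
    a = (sal - sal.mean()).ravel()
    b = (gt - gt.mean()).ravel()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def sim(sal, gt):
    """Similarity (histogram intersection) of the maps normalised to sum 1."""
    p = sal / (sal.sum() + 1e-8)
    q = gt / (gt.sum() + 1e-8)
    return np.minimum(p, q).sum()

def kl_div(sal, gt, eps=1e-8):
    """KL divergence of the ground-truth density from the prediction (lower is better)."""
    p = sal / (sal.sum() + eps)
    q = gt / (gt.sum() + eps)
    return np.sum(q * np.log(eps + q / (p + eps)))
```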
The comparison among all the aforementioned algorithms (OBDL [43], AWS-D [44], OMCNN-2CLSTM [45], ACL [46], SAVC [47], XU [48], the base reference [42] and our proposed algorithm) is arranged numerically in Table 2, and Figure 2 shows the graphical representation of the same. Five common saliency evaluation metrics have been used, the same as in the base reference paper [42]: area under the ROC curve, similarity (histogram intersection), Pearson's CC, NSS and KL divergence. Looking at the respective papers, the SAVC and OBDL algorithms are based on H.264, which uses a fixed-size macroblock coding strategy and is inflexible compared with HEVC; this reduces accuracy and precision. The XU algorithm is quite similar to the HEVC pipeline and so obtains better results than the previously mentioned algorithms, but complex scenes make saliency detection and mapping difficult, which yields large KL values. The same problem is found in OMCNN-2CLSTM [45] and ACL [46], with high KL values of 2.82 and 3.0642, respectively. The base paper [42] fares somewhat better than the other saliency detection methods, but its KL value is still quite high (2.4921). The proposed solution does remarkably well on the KL metric with a result of 0.862871, meaning the prediction is closer to the ground truth, and achieves a remarkable NSS value of nearly unity. The remaining metric values are closer to each other, but this paper outperforms in several aspects, making the proposed custom spatio-temporal fusion saliency detection method a more successful and viable saliency detection approach.
Table 1. Information regarding the types of videos chosen for evaluation and comparison

Type  Resolution   Name               Frame rate (Hz)
A     1920×1080    Basketball Drive   50
                   Kimono 1           24
                   Park Scene         24
B     1280×720     Johnny             60
                   Kristen And Sara   60
                   Four People        60
                   vidyo3             60
                   vidyo4             60
C     832×480      Basketball Drill   50
                   Race Horses        30
Table 2. Saliency evaluation and comparison results
Method AUC SIM CC NSS KL
OBDL [43] 0.6413 0.2982 0.2253 0.297 3.4642
AWS-D [44] 0.6635 0.3154 0.2663 0.4768 1.7144
ACL [46] 0.7673 0.3614 0.3774 0.5005 3.0642
SAVC [47] 0.5844 0.2688 0.1248 0.1889 2.0191
XU [48] 0.5881 0.305 0.2663 0.2854 1.5098
BasePaper [42] 0.7334 0.3751 0.387 0.5674 2.4921
PS-SYSTEM 0.7354 0.4644 0.44391 1.000138 0.862871
Figure 1. Video frames after applying the saliency algorithms
Figure 2. Saliency evaluation and comparison graph
5. CONCLUSION
This paper has introduced a custom spatio-temporal fusion video saliency detection method with greater accuracy and precision than the latest state-of-the-art saliency detection methods. Several changes have been made to simple calculations to solve the problems of colour contrast computation, to modify the fusion aspect of the saliency so as to boost both motion and colour values, and to add a spatio-temporal, pixel-based coherency boost for temporal-scope saliency exploration. The method has been tested against an extensive database to establish its robustness and efficiency. The results have also been compared with various state-of-the-art saliency-mapping methods, and it has come to light that the proposed solution has better accuracy and precision. All these modifications make our proposed custom spatio-temporal fusion video saliency detection method perform much better and give new hope to the field of video saliency. This algorithm should help those who continue research in this field of saliency detection, where little work is currently available.
REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998, doi: 10.1109/34.730558.
[2] C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” 26th
IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 1–8, Jun. 2008, doi: 10.1109/CVPR.2008.4587715.
[3] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in 2009 IEEE Conference on
Computer Vision and Pattern Recognition, Jun. 2010, pp. 1597–1604, doi: 10.1109/cvpr.2009.5206596.
[4] M. Cerf, E. P. Frady, and C. Koch, “Faces and text attract gaze independent of the task: Experimental data and computer model,”
Journal of Vision, vol. 9, no. 12, pp. 1–15, Nov. 2009, doi: 10.1167/9.12.10.
[5] M. Cerf, J. Harel, W. Einhäuser, and C. Koch, “Predicting human gaze using low-level saliency combined with face detection,”
Advances in Neural Information Processing Systems 20-Proceedings of the 2007 Conference, 2008.
[6] L. J. Li and L. Fei-Fei, “What, where and who? Classifying events by scene and object recognition,” Proceedings of the IEEE
International Conference on Computer Vision, 2007, doi: 10.1109/ICCV.2007.4408872.
[7] B. Scassellati, “Theory of mind for a humanoid robot,” Autonomous Robots, vol. 12, no. 1, pp. 13–24, 2002,
doi: 10.1023/A:1013298507114.
[8] S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué, “Spatio-temporal saliency model to predict eye
movements in video free viewing,” 2008 16th European Signal Processing Conference, Aug. 2008.
[9] Yu-Fei Ma and Hong-Jiang Zhang, “A model of motion attention for video skimming,” in Proceedings. International Conference
on Image Processing, 2002, vol. 1, pp. I-129-I–132, doi: 10.1109/ICIP.2002.1037976.
[10] S. Li and M. C. Lee, “Fast visual tracking using motion saliency in video,” in 2007 IEEE International Conference on Acoustics,
Speech and Signal Processing-ICASSP ’07, 2007, vol. 1, pp. I-1073-I–1076, doi: 10.1109/ICASSP.2007.366097.
[11] R. J. Peters and L. Itti, “Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial
attention,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2007, pp. 1–8,
doi: 10.1109/CVPR.2007.383337.
[12] A. C. Schütz, D. I. Braun, and K. R. Gegenfurtner, “Object recognition during foveating eye movements,” Vision Research,
vol. 49, no. 18, pp. 2241–2253, Sep. 2009, doi: 10.1016/j.visres.2009.05.022.
[13] F. Zhou, S. B. Kang, and M. F. Cohen, “Time-mapping using space-time saliency,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp. 3358–3365, doi: 10.1109/CVPR.2014.429.
[14] Z. Liu, X. Zhang, S. Luo, and O. Le Meur, “Superpixel-based spatiotemporal saliency detection,” IEEE Transactions on Circuits
and Systems for Video Technology, vol. 24, no. 9, pp. 1522–1540, Sep. 2014, doi: 10.1109/TCSVT.2014.2308642.
[15] Y. Fang, Z. Wang, W. Lin, and Z. Fang, “Video saliency incorporating spatiotemporal cues and uncertainty weighting,” IEEE
Transactions on Image Processing, vol. 23, no. 9, pp. 3910–3921, Sep. 2014, doi: 10.1109/TIP.2014.2336549.
[16] W. Wang, J. Shen, and F. Porikli, “Saliency-aware geodesic video object segmentation,” in Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, Jun. 2015, vol. 07-12-June-2015, pp. 3395–3402,
doi: 10.1109/CVPR.2015.7298961.
[17] W. Wang, J. Shen, and L. Shao, “Consistent video saliency using local gradient flow optimization and global refinement,” IEEE
Transactions on Image Processing, vol. 24, no. 11, pp. 4185–4196, Nov. 2015, doi: 10.1109/TIP.2015.2460013.
[18] T. M. Hoang and J. Zhou, “Recent trending on learning based video compression: A survey,” Cognitive Robotics, vol. 1,
pp. 145–158, 2021, doi: 10.1016/j.cogr.2021.08.003.
[19] A. Borji, “Saliency prediction in the deep learning era: successes and limitations,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 43, no. 2, pp. 679–700, Feb. 2021, doi: 10.1109/TPAMI.2019.2935715.
[20] W. Wang, J. Shen, J. Xie, M. M. Cheng, H. Ling, and A. Borji, “Revisiting video saliency prediction in the deep learning era,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 220–237, Jan. 2021,
doi: 10.1109/TPAMI.2019.2924417.
[21] M. Startsev and M. Dorr, “Supersaliency: a novel pipeline for predicting smooth pursuit-based attention improves generalisability
of video saliency,” IEEE Access, vol. 8, pp. 1276–1289, 2020, doi: 10.1109/ACCESS.2019.2961835.
[22] H. Li, F. Qi, and G. Shi, “A novel spatiotemporal 3D convolutional encoder-decoder network for dynamic saliency prediction,” IEEE
Access, vol. 9, pp. 36328–36341, 2021, doi: 10.1109/ACCESS.2021.3063372.
[23] S. Zhu, C. Liu, and Z. Xu, “High-definition video compression system based on perception guidance of salient information of a
convolutional neural network and HEVC compression domain,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 30, no. 7, pp. 1946–1959, 2020, doi: 10.1109/TCSVT.2019.2911396.
[24] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä, “Segmenting salient objects from images and videos,” in Lecture Notes in Computer
Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6315 LNCS,
no. PART 5, Springer Berlin Heidelberg, 2010, pp. 366–379.
[25] Q. Zhang, X. Wang, S. Wang, S. Li, S. Kwong, and J. Jiang, “Learning to explore intrinsic saliency for stereoscopic video,” in
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2019, vol. 2019-June,
pp. 9741–9750, doi: 10.1109/CVPR.2019.00998.
[26] H. J. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance,” Journal of Vision, vol. 9, no. 12,
pp. 15–15, Nov. 2009, doi: 10.1167/9.12.15.
[27] H. Fu, X. Cao, and Z. Tu, “Cluster-based co-saliency detection,” IEEE Transactions on Image Processing, vol. 22, no. 10,
pp. 3766–3778, Oct. 2013, doi: 10.1109/TIP.2013.2260166.
[28] Z. Wang, J. Li, and Z. Pan, “Cross complementary fusion network for video salient object detection,” IEEE Access, vol. 8,
pp. 201259–201270, 2020, doi: 10.1109/ACCESS.2020.3036533.
[29] H. Bi, D. Lu, N. Li, L. Yang, and H. Guan, “Multi-level model for video saliency detection,” in 2019 IEEE International Conference
on Image Processing (ICIP), Sep. 2019, vol. 2019-Septe, pp. 4654–4658, doi: 10.1109/ICIP.2019.8803611.
[30] E. S. L. Gastal and M. M. Oliveira, “Domain transform for edge-aware image and video processing,” ACM Transactions on
Graphics, vol. 30, no. 4, pp. 1–12, Jul. 2011, doi: 10.1145/2010324.1964964.
Int J Artif Intell ISSN: 2252-8938 
Video saliency-recognition by applying custom spatio temporal fusion technique (Vinay C. Warad)
91
[31] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel
methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, Nov. 2012,
doi: 10.1109/TPAMI.2012.120.
[32] D. Zhang, O. Javed, and M. Shah, “Video object segmentation through spatially accurate and temporally dense extraction of primary
object regions,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2013,
pp. 628–635, doi: 10.1109/CVPR.2013.87.
[33] M. Xu, P. Fu, B. Liu, and J. Li, “Multi-stream attention-aware graph convolution network for video salient object detection,” IEEE
Transactions on Image Processing, vol. 30, pp. 4183–4197, 2021, doi: 10.1109/TIP.2021.3070200.
[34] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao, “Robust principal component analysis: exact recovery of corrupted low-rank
matrices by convex optimization,” in NIPS’09: Proceedings of the 22nd International Conference on Neural Information
Processing Systems, 2009, pp. 2080–2088.
[35] X. Zhou, C. Yang, and W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, Mar. 2013, doi: 10.1109/TPAMI.2012.132.
[36] Z. Zeng, T. H. Chan, K. Jia, and D. Xu, “Finding correspondence from multiple images via sparse and low-rank decomposition,”
in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 7576 LNCS, no. PART 5, Springer Berlin Heidelberg, 2012, pp. 325–339.
[37] P. Ji, H. Li, M. Salzmann, and Y. Dai, “Robust motion segmentation with unknown correspondences,” in Lecture Notes in Computer
Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8694 LNCS,
no. PART 6, Springer International Publishing, 2014, pp. 204–219.
[38] R. Oliveira, J. Costeira, and J. Xavier, “Optimal point correspondence through the use of rank constraints,” in 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005, vol. 2, pp. 1016–1021,
doi: 10.1109/CVPR.2005.264.
[39] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction
method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010, doi: 10.1561/2200000016.
[40] J. Munkres, “Algorithms for the assignment and transportation problems,” Journal of the Society for Industrial and Applied
Mathematics, vol. 5, no. 1, pp. 32–38, Mar. 1957, doi: 10.1137/0105003.
[41] Z. Liu, L. Meur, and S. Luo, “Superpixel-based saliency detection,” in 2013 14th International Workshop on Image Analysis for
Multimedia Interactive Services (WIAMIS), Jul. 2013, pp. 1–4, doi: 10.1109/WIAMIS.2013.6616119.
[42] L. Wei, M. Wang, W. Liu, X. Wang, J. Sun, and X. Yin, “Multi-features fusion based on boolean map for video saliency detection,”
in Chinese Control Conference, CCC, Jul. 2019, vol. 2019-July, pp. 7589–7594, doi: 10.23919/ChiCC.2019.8865253.
[43] V. Leboran, A. Garcia-Diaz, X. R. Fdez-Vidal, and X. M. Pardo, “Dynamic whitening saliency,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 5, pp. 893–907, May 2017, doi: 10.1109/TPAMI.2016.2567391.
[44] F. Guo, W. Wang, Z. Shen, J. Shen, L. Shao, and D. Tao, “Motion-aware rapid video saliency detection,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4887–4898, Dec. 2020, doi: 10.1109/TCSVT.2019.2906226.
[45] L. Huang, K. Song, J. Wang, M. Niu, and Y. Yan, “Multi-graph fusion and learning for RGBT image saliency detection,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1366–1377, Mar. 2022,
doi: 10.1109/TCSVT.2021.3069812.
[46] H. Hadizadeh and I. V. Bajic, “Saliency-aware video compression,” IEEE Transactions on Image Processing, vol. 23, no. 1,
pp. 19–33, Jan. 2014, doi: 10.1109/TIP.2013.2282897.
[47] M. Xu, L. Jiang, X. Sun, Z. Ye, and Z. Wang, “Learning to detect video saliency with HEVC features,” IEEE Transactions on
Image Processing, vol. 26, no. 1, pp. 369–385, Jan. 2017, doi: 10.1109/TIP.2016.2628583.
[48] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, “What do different evaluation metrics tell us about saliency models?,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 740–757, Mar. 2019,
doi: 10.1109/TPAMI.2018.2815601.
[49] S. Park, E. Aksan, X. Zhang, and O. Hilliges, “Towards end-to-end video-based eye-tracking,” in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12357 LNCS, Springer
International Publishing, 2020, pp. 747–763.
BIOGRAPHIES OF AUTHORS
Vinay C. Warad is working as an assistant professor in the Department of Computer Science and Engineering at Khaja Bandanawaz College of Engineering (KBNCE). He has 8 years of teaching experience. His areas of interest are video saliency and image retrieval. He can be contacted at email: vinaywarad999@gmail.com.
Ruksar Fatima is a professor and head of the Department of Computer Science and Engineering, vice principal and examination in-charge at Khaja Bandanawaz College of Engineering (KBNCE), Kalaburagi, Karnataka. She can be contacted at email: ruksarf@gmail.com.
Photoplethysmogram signal reconstruction through integrated compression sensi...
 
Speaker identification under noisy conditions using hybrid convolutional neur...
Speaker identification under noisy conditions using hybrid convolutional neur...Speaker identification under noisy conditions using hybrid convolutional neur...
Speaker identification under noisy conditions using hybrid convolutional neur...
 
Multi-channel microseismic signals classification with convolutional neural n...
Multi-channel microseismic signals classification with convolutional neural n...Multi-channel microseismic signals classification with convolutional neural n...
Multi-channel microseismic signals classification with convolutional neural n...
 
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...Sophisticated face mask dataset: a novel dataset for effective coronavirus di...
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...
 
Transfer learning for epilepsy detection using spectrogram images
Transfer learning for epilepsy detection using spectrogram imagesTransfer learning for epilepsy detection using spectrogram images
Transfer learning for epilepsy detection using spectrogram images
 
Deep neural network for lateral control of self-driving cars in urban environ...
Deep neural network for lateral control of self-driving cars in urban environ...Deep neural network for lateral control of self-driving cars in urban environ...
Deep neural network for lateral control of self-driving cars in urban environ...
 
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...
 
Efficient commodity price forecasting using long short-term memory model
Efficient commodity price forecasting using long short-term memory modelEfficient commodity price forecasting using long short-term memory model
Efficient commodity price forecasting using long short-term memory model
 
1-dimensional convolutional neural networks for predicting sudden cardiac
1-dimensional convolutional neural networks for predicting sudden cardiac1-dimensional convolutional neural networks for predicting sudden cardiac
1-dimensional convolutional neural networks for predicting sudden cardiac
 
A deep learning-based approach for early detection of disease in sugarcane pl...
A deep learning-based approach for early detection of disease in sugarcane pl...A deep learning-based approach for early detection of disease in sugarcane pl...
A deep learning-based approach for early detection of disease in sugarcane pl...
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Video saliency-recognition by applying custom spatio temporal fusion technique

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 1, March 2024, pp. 82~91 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp82-91  82 Journal homepage: http://ijai.iaescore.com Video saliency-recognition by applying custom spatio temporal fusion technique Vinay C. Warad, Ruksar Fatima Department of Computer Science and Engineering, KBNCE, Kalaburagi, India Article Info ABSTRACT Article history: Received Sep 24, 2022 Revised Jan 24, 2023 Accepted Mar 10, 2023 Video saliency detection is a major growing field with quite few contributions to it. The general method available today is to conduct frame wise saliency detection and this leads to several complications, including an incoherent pixel-based saliency map, making it not so useful. This paper provides a novel solution to saliency detection and mapping with its custom spatio-temporal fusion method that uses frame wise overall motion colour saliency along with pixel-based consistent spatio-temporal diffusion for its temporal uniformity. In the proposed method section, it has been discussed how the video is fragmented into groups of frames and each frame undergoes diffusion and integration in a temporary fashion for the colour saliency mapping to be computed. Then the inter group frame are used to format the pixel-based saliency fusion, after which the features, that is, fusion of pixel saliency and colour information, guide the diffusion of the spatio temporal saliency. With this, the result has been tested with 5 publicly available global saliency evaluation metrics and it comes to conclusion that the proposed algorithm performs better than several state-of-the-art saliency detection methods with increase in accuracy with a good value margin. All the results display the robustness, reliability, versatility and accuracy. Keywords: Motion colour saliency Pixel-based coherency Saliency detection Spatio-temporal Video-saliency This is an open access article under the CC BY-SA license. Corresponding Author: Vinay C. Warad Department of Computer Science and Engineering, KBNCE Kalaburagi, India Email: vinaywarad999@gmail.com 1. INTRODUCTION The human eyes have proven to be an amazing marvel of nature. The brain and eyes together make a powerful group, which can not only see 10 million distinct colors but also have a perceptive power of 50 objects per second. In general, the human eyes can focus on specific components of a picture or a video that have an importance to us. The brain in turn filters out the unnecessary bits of information this way, keeping only those that have importance. Since a video is a series of images, the amount of information to be processed increases along with the perception of dimensions. In the technical world, this method of image and video processing has been tried to be copied or reconstructed in a different way. Looking at the available saliency models for stationary images, we have Itti’s model [1], this is regarded as the most used model for stationary image. Other models such as [2], which use Fourier transformation along the lines of phase spectrum and [3] uses frequency tuning for saliency detection. The commonality among the aforementioned models is the employment of the bottom-up visual attention mechanism. For example, [3] model uses a range of frequencies in the image spectrum, which highlights the important details, to obtain the saliency map. Then, the saliency map is computed with the help of Difference of Gaussians as well as combining several band pass filters’ results. 
Then feature conspicuity maps are constructed from all of the low-level image features [4], [5], which are then added into the final saliency map using principles such as Winner-Take-All and Inhibition-of-Return, taken from the visual nervous system.
All of these models, however, are designed for still images rather than videos. In a video, a texture that is salient in a single still frame may no longer be salient once the scene is in motion, so dedicated saliency models and methods are needed for video. A video is a sequence of moving images, called frames, played at a fixed frame rate so that the motion appears smooth and the brain cannot distinguish the individual images. Video also helps in determining the position of an object with reference to another [6]. It follows that video saliency is considerably more complex than image saliency. Research in this field has largely followed two directions: computation of a space-time saliency map and computation of a motion saliency map [7]–[10]. To obtain a spatio-temporally mapped video saliency, Peters and Itti [11] fused the ideas of static and dynamic saliency mapping into a space-time saliency detection model, while the authors of [12] proposed a dynamic texture model to capture motion patterns, even in dynamic scenes. In general, most video saliency models are built on bottom-up imagery, which is capable of handling non-stationary videos. In addition, motion information is treated as an extra saliency clue, and many state-of-the-art methods therefore fuse motion saliency with colour saliency. The works in [13]–[15] adopt this fusion model, but the result is only a low-level saliency. Almost all recent methods keep temporal smoothness in the resulting saliency map, which helps improve accuracy. The works in [16], [17] even use global temporal clues to obtain a robust low-level saliency, but these methods suffer from error accumulation because the energy-minimization framework used to manage saliency consistency over the temporal scale leads to wrong detections.
Video saliency is thus a comparatively under-researched field with ample room for improvement and for customised models, provided temporal saliency consistency can be guaranteed with only a limited drop in accuracy. The usual approach in video saliency algorithms is to employ a state-of-the-art image saliency detector to supply the basic saliency clues; in this paper, by contrast, no high-level priors or constraints are involved and only plain low-contrast saliency is used. The hollow effect is also avoided by integrating a spatio-temporal gradient map. The temporal-level global clue is taken as the appearance model, which guides the fusion of motion saliency and colour saliency. The proposed custom spatio-temporal fusion saliency detection method is therefore built on a spatio-temporal gradient definition that assigns high saliency values around the foreground object while suppressing hollow effects. The efficiency and accuracy of the solution are further boosted by a series of adjustments to the saliency strategies that support the fusion of motion and colour saliencies.
Temporal smoothness is first guarded by establishing a temporal saliency correspondence between cross-frame superpixels, which is then leveraged to further boost the accuracy of the saliency model through a one-to-one spatio-temporal saliency diffusion.

2. LITERATURE SURVEY
This section reviews the research papers that inspired the proposed custom spatio-temporal fusion saliency detection method. As mentioned previously, image saliency distinguishes the most important details in an image. Video compression has grown exponentially because of the traffic generated by video streaming, webinars and similar services, and the demand for the best video quality has led to a range of compression algorithms that reduce the memory footprint of a video while keeping its quality in check. Convolutional neural networks (CNN) have also been applied in this field. A survey of learning-based video compression methods [18] discusses the advantages and disadvantages of each method. Borji [19] has surveyed deep saliency models, their benchmarks and datasets in order to support the development of the comparatively under-researched field of video saliency; the survey also notes the gap between human-level and algorithm-level saliency detection accuracy and how it might be closed. The work in [20] makes three contributions. First, it introduces a new benchmark named dynamic human fixation 1K (DHF1K) for predicting fixations during free viewing of dynamic scenes. Second, it proposes the attentive convolutional neural network-long short-term memory (CNN-LSTM) network (ACLNet), which augments the CNN-LSTM architecture with a supervised attention mechanism to enable fast end-to-end saliency learning and a better temporal saliency representation across successive frames. Third, it reports extensive experiments on three datasets, namely DHF1K, Hollywood-2, and the University of Central Florida (UCF) sports dataset; these results are of great importance for further development in the field.
The work in [21] offers a solution for reducing the error made on smooth pursuits (SPs), a major eye-movement type that is unique to the perception of dynamic scenes. The solution employs manual annotations of SPs together with algorithmically detected fixations, and predicts SP-salient locations by training a slicing CNN; the model is tested on three datasets against the existing methods and yields greater accuracy and efficiency. Another model uses 3D convolutional encoder-decoder subnetworks [22] for dynamic scene saliency prediction: spatial and temporal features are first extracted by two subnetworks, after which the decoder enlarges the features in the spatial dimensions while aggregating temporal information. High-efficiency video coding (HEVC) is the current standard for video compression. The work in [23] improves HEVC by proposing a spatial saliency algorithm built on the concept of a motion vector: the CNN-based motion estimation of each block during HEVC compression is combined through adaptive dynamic fusion, together with an algorithm for more flexible quantization parameter (QP) selection and another for rate-distortion optimization. The authors of [24] introduce a salient object segmentation method that combines a conditional random field (CRF) with a saliency measure; formulated in a statistical framework using local feature contrast in colour, illumination and motion information, the resulting saliency map is fed into the CRF model with a segmentation approach to define an energy minimization and recover well-defined salient objects. The method in [25] also combines spatial and temporal information with statistical uncertainty measures to detect visual saliency: the spatial and temporal maps are merged into a single map using a spatiotemporally adaptive, entropy-based uncertainty weighting approach. The work in [26] introduces a contrast-based saliency computed within a pre-defined spatio-temporal surround. Co-saliency detection using clustering algorithms is discussed in [27], where cluster saliency is measured using spatial, correspondence and contrast cues and the results are obtained by fusing single- and multi-image saliency maps. In [28]–[34], a robust geodesic measurement is computed to obtain the saliency map. The works in [35]–[40] use a superpixel-based strategy, which helped shape the proposed custom spatio-temporal fusion saliency detection method: the image is first segmented into superpixels and undergoes adaptive colour quantization; next, inter-superpixel similarity is measured from the spatial distance and the difference between colour histograms [41], [42]; the spatial sparsity and global contrast sparsity are then measured and integrated with the inter-superpixel similarities to generate the superpixel saliency map [43]–[47]. Finally, [48] guided the choice of evaluation metrics and methods for saliency testing; it references the main papers and explains the metrics clearly enough even for a layman. A sketch of the superpixel preprocessing step described above is given below.
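The following is a minimal sketch, assuming scikit-image and NumPy, of the superpixel preprocessing used by the strategies in [35]–[40] and by the proposed method: SLIC segmentation followed by per-superpixel mean colour and centre extraction. The function name, parameter values and the choice of mean-colour statistics are illustrative assumptions, not the authors' exact implementation.

# A hedged sketch of the superpixel preprocessing step; parameter values are
# illustrative, not taken from the paper.
import numpy as np
from skimage import img_as_float
from skimage.segmentation import slic

def superpixel_colour_means(frame_rgb, n_segments=300, compactness=10.0):
    """Segment a frame with SLIC and return per-superpixel mean RGB and centres."""
    img = img_as_float(frame_rgb)
    labels = slic(img, n_segments=n_segments, compactness=compactness, start_label=0)
    n = labels.max() + 1
    means = np.zeros((n, 3))
    centres = np.zeros((n, 2))
    for i in range(n):
        mask = labels == i
        means[i] = img[mask].mean(axis=0)      # average (R, G, B) of superpixel i
        ys, xs = np.nonzero(mask)
        centres[i] = (xs.mean(), ys.mean())    # spatial centre a_i of superpixel i
    return labels, means, centres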
This paper has five sections. The first section is the introduction; the second names every reference that helped complete the proposed solution; the third covers the mathematical aspects of the proposed algorithm and how each modification is introduced to increase accuracy, improve perception and strengthen weak results; sections 4 and 5 present the results in comparison with various saliency detection methods and the conclusion.

3. PROPOSED SYSTEM
The solution proposed by this paper is based on spatio-temporal saliency fusion. The available state-of-the-art methods create saliency maps frame by frame over the sequence; here, a fusion of modelling-based and contrast-based saliencies is used. The two components are briefly explained below.

3.1. Modeling-based saliency adjustment
To produce a robust saliency map, colour contrast computation has to be combined with long-term inter-batch information so that the saliency of non-salient background regions is reduced. Let BM \in \mathbb{R}^{3 \times bn} and FM \in \mathbb{R}^{3 \times fn} denote the background and foreground appearance models, with bn and fn being their respective sizes; they record the RGB (red, green, blue) history of the i-th superpixel over all regions. The adjustment follows (1) and (2):

\mathrm{intraC}_i = \exp(\lambda - |\varphi(MC_i) - \varphi(CM_i)|), \quad \lambda = 0.5   (1)

\mathrm{interC}_i = \varphi\left( \frac{\min \lVert (R_i, G_i, B_i), BM \rVert_2 \cdot \frac{1}{bn} \sum \lVert (R_i, G_i, B_i), BM \rVert_2}{\min \lVert (R_i, G_i, B_i), FM \rVert_2 \cdot \frac{1}{fn} \sum \lVert (R_i, G_i, B_i), FM \rVert_2} \right)   (2)

Here λ is the upper bound on the discrepancy degree; it inverts the penalty between the motion and colour saliencies.
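A minimal sketch of the adjustment terms in (1) and (2) follows, assuming BM and FM are 3 x bn and 3 x fn arrays of background and foreground RGB history and that φ is a min-max normalisation applied over all superpixels of the frame; both the normalisation and the small stabilising constant are assumptions.

# A hedged sketch of the model-based adjustment in (1)-(2); names and the
# choice of phi as min-max normalisation are assumptions.
import numpy as np

def phi(x):
    """Min-max normalisation to [0, 1] over a whole array of per-superpixel values."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

def inter_c_raw(rgb_i, BM, FM):
    """Ratio of distances to the background model vs the foreground model, eq. (2) before phi."""
    d_b = np.linalg.norm(BM - rgb_i[:, None], axis=0)   # distances to background samples
    d_f = np.linalg.norm(FM - rgb_i[:, None], axis=0)   # distances to foreground samples
    num = d_b.min() * d_b.mean()
    den = d_f.min() * d_f.mean() + 1e-12
    return num / den                                     # pass the per-frame vector through phi

def intra_c(mc_i, cm_i, lam=0.5):
    """Eq. (1) on phi-normalised motion and colour contrast values of superpixel i."""
    return np.exp(lam - abs(mc_i - cm_i))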
3.2. Contrast-based saliency mapping
This mapping method is inspired by [15]–[17], [27], with some changes to their propositions to best suit the aim of this paper. Those works analyse saliency frame by frame, so the video sequence is first split into several short groups of frames G_i = \{F_1, F_2, F_3, \dots, F_n\}. Each frame F_k (k denotes the frame number) is pre-processed with simple linear iterative clustering (SLIC) [31] and a boundary-aware smoothing method inspired by [30], which removes unnecessary detail and reduces the computational burden. The colour and motion gradient maps from [31], [32] are used to obtain the spatio-temporal gradient map, giving the pixel-based contrast computation in (3):

SMT = \lVert u_x, u_y \rVert_2 \odot \lVert \nabla(F) \rVert_2   (3)

that is, the horizontal and vertical gradients of the optical flow combined with the colour gradient map \nabla(F). The motion contrast of the i-th superpixel is then calculated with (4),

MC_i = \sum_{a_j \in \psi_i} \frac{\lVert U_i, U_j \rVert_2}{\lVert a_i, a_j \rVert_2}, \quad \psi_i = \{\tau + 1 \ge \lVert a_i, a_j \rVert_2 \ge \tau\}   (4)

where the l2 norm is used, U denotes the optical-flow gradient in the two directions, and a_i is the centre position of the i-th superpixel. \psi_i denotes the contrast computation range and is derived from the shortest Euclidean distance between the spatio-temporal map and the i-th superpixel:

\tau = r\,\frac{\lVert \Lambda(SMT) \rVert_0}{\sum_{\tau \in \lVert \tau, i \rVert \le r} \lVert \Lambda(SMT_\tau) \rVert_0}, \quad r = 0.5 \min\{\mathrm{width}, \mathrm{height}\}, \quad \Lambda \rightarrow \text{down-sampling}   (5)

Colour saliency is computed in the same way as the optical-flow gradient, except that the red, green and blue values of the i-th superpixel are used:

CM_i = \sum_{a_j \in \psi_i} \frac{\lVert (R_i, G_i, B_i), (R_j, G_j, B_j) \rVert_2}{\lVert a_i, a_j \rVert_2}   (6)

CM_{k,i} \leftarrow \frac{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp(-\lVert c_{k,i}, c_{\tau,j} \rVert_1 / \mu) \cdot CM_{\tau,j}}{\sum_{\tau=k-1}^{k+1} \sum_{a_{\tau,j} \in \mu_\phi} \exp(-\lVert c_{k,i}, c_{\tau,j} \rVert_1 / \mu)}   (7)

Here c_{k,i} is the average RGB colour of the i-th superpixel in the k-th frame, while σ controls the smoothing strength. The condition \lVert a_{k,i}, a_{\tau,j} \rVert_2 \le \theta must be satisfied, which is enforced through μ and

\theta = \frac{1}{m \times n} \sum_{k=1}^{n} \sum_{i=1}^{m} \left\lVert \frac{1}{m} \sum_{i=1}^{m} F(SMT_{k,i}),\, F(SMT_{k,i}) \right\rVert_1, \quad m, n = \text{frame numbers}   (8)

F(SMT_i) = \begin{cases} a_i, & SMT_i \le \epsilon \times \frac{1}{m} \sum_{i=1}^{m} SMT_i \\ 0, & \text{otherwise} \end{cases}, \quad \epsilon = \text{filter strength control}   (9)

At each frame-batch level, the smoothing rate of the q-th frame is dynamically updated with (10):

(1 - \gamma)\theta_{s-1} + \gamma\theta_s \rightarrow \theta_s, \quad \gamma = 0.2 \ (\text{learning weight})   (10)

The colour and motion saliencies are now integrated to obtain the pixel-based saliency map:

LLS = CM \odot MC   (11)

This fused saliency map increases accuracy considerably but at a reduced rate, which is dealt with in the next subsection. A sketch of the gradient map in (3) and the fusion in (11) is given below.
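The following is a hedged sketch of the pixel-level spatio-temporal gradient map (3) and the colour-motion fusion (11), assuming OpenCV and NumPy. The Farneback optical flow and the simple normalisation are illustrative substitutes for the gradient maps cited in [31], [32], not the authors' exact implementation.

# A minimal sketch of eq. (3) and eq. (11); flow estimator and normalisation
# are assumptions.
import cv2
import numpy as np

def spatio_temporal_gradient(prev_gray, next_gray, frame_rgb):
    """Element-wise product of optical-flow gradient magnitude and colour gradient, eq. (3)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # gradient magnitude of the horizontal and vertical flow components
    ux = np.linalg.norm(np.gradient(flow[..., 0]), axis=0)
    uy = np.linalg.norm(np.gradient(flow[..., 1]), axis=0)
    motion_grad = np.hypot(ux, uy)
    # colour gradient magnitude summed over the three channels
    colour_grad = sum(np.hypot(*np.gradient(frame_rgb[..., c].astype(float)))
                      for c in range(3))
    smt = motion_grad * colour_grad
    return smt / (smt.max() + 1e-12)

def fuse_saliency(colour_saliency, motion_saliency):
    """Pixel-based fusion, eq. (11): LLS = CM ⊙ MC."""
    return colour_saliency * motion_saliency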
3.3. Accuracy boosting
The input matrix M is decomposed into a sparse component S and a low-rank component D by solving \min_{D,S}\, \alpha \lVert S \rVert_1 + \lVert D \rVert_* subject to M = S + D, where the nuclear norm of D is used. This decomposition is solved with robust principal component analysis (RPCA) [34] through the two updates in (12) and (13), where svd(Z) denotes the singular value decomposition of the Lagrange-multiplier term and α and β are the low-rank and sparse threshold parameters respectively:

S \leftarrow \mathrm{sign}(M - D - S)\,\big[\,|M - D - S| - \alpha\beta\,\big]_+   (12)

D \leftarrow V[\Sigma - \beta I]_+ U, \quad (V, \Sigma, U) \leftarrow \mathrm{svd}(Z)   (13)

Then, to reduce wrong detections caused by misplaced optical flow, the superpixels contained in the rough foreground of the region are located. The feature subspace of a frame k is spanned as gI_k = \{LLS_{k,1}, LLS_{k,2}, \dots, LLS_{k,m}\}, and for the whole frame group we obtain gB_\tau = \{gI_1, gI_2, \dots, gI_n\}. The rough foreground is then calculated as in (14):

RF_i = \Big[\sum_{k=1}^{n} LLS_{k,i} - \frac{\omega}{n \times m} \sum_{k=1}^{n} \sum_{i=1}^{m} LLS_{k,i}\Big]_+   (14)

Here ω is a reliability control factor. Two subspaces are also obtained for (14), spanned by LLS and by RGB colour: SB = \{cv_1, cv_2, \dots, cv_n\} \in \mathbb{R}^{3v \times n}, where cv_i = \{\mathrm{vec}(R_{i,1}, G_{i,1}, B_{i,1}, \dots, R_{i,m}, G_{i,m}, B_{i,m})\}^K, and SF = \{\mathrm{vec}(LLS_1), \dots, \mathrm{vec}(LLS_n)\} \in \mathbb{R}^{v \times n}. This establishes a one-to-one correspondence and a pixel-based saliency fusion that is diffused over the entire group of frames. Aligning SB over SF, however, causes disruptive foreground salient movements, and with help from [35]–[37] this issue is resolved with the alternative formulation in (15):

\min_{M_{c,x},\, S_{c,x},\, \vartheta} \ \lVert M_c \rVert_* + \lVert M_x \rVert_* + \lVert A \odot \vartheta \rVert^2 + \alpha_1 \lVert S_c \rVert_1 + \alpha_2 \lVert S_x \rVert_1   (15)

subject to M_c = D_c + S_c, M_x = D_x + S_x, M_c = SB \odot \vartheta, M_x = SF \odot \vartheta, \vartheta = \{E_1, E_2, \dots, E_n\}, E_i \in \{0,1\}^{m \times m}, E_i \mathbf{1} = \mathbf{1}, where \lVert \cdot \rVert_* is the nuclear norm, A is the position matrix, D_c and D_x denote the estimated low-rank mapping features over the colour and saliency feature spaces, ϑ is the permutation matrix taken from [36], [38], and S_c and S_x represent the sparse components of the colour feature space and the saliency feature space. This set of equations corrects the superpixel correspondences. A sketch of the shrinkage and singular-value-thresholding updates in (12) and (13) follows.
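Below is a hedged sketch of the two RPCA-style updates in (12) and (13): element-wise soft-thresholding for the sparse part S and singular-value thresholding for the low-rank part D. The threshold values and the simple alternating loop are illustrative assumptions.

# A minimal sketch of the updates (12)-(13); thresholds are illustrative.
import numpy as np

def soft_threshold(X, t):
    """Element-wise shrinkage: sign(X) * max(|X| - t, 0), as in the S-update (12)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def singular_value_threshold(Z, t):
    """Shrink the singular values of Z by t, as in the D-update (13)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def rpca_step(M, D, S, alpha=0.05, beta=1.0):
    """One alternating pass over M = D + S."""
    S = soft_threshold(M - D, alpha * beta)
    D = singular_value_threshold(M - S, beta)
    return D, S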
3.4. Mathematical model
Equation (15) is further modified using the concept of [39] to obtain a distributed version of the convex problem, represented by (16), where Z_{1,2} are the Lagrangian multipliers and π denotes the iteration step size; the optimised updates obtained through partial derivatives are shown in (17)–(20):

D(M_{c,x}, S_{c,x}, \vartheta) = \alpha_1 \lVert S_c \rVert_1 + \alpha_2 \lVert S_x \rVert_1 + \beta_1 \lVert M_c \rVert_* + \beta_2 \lVert M_x \rVert_* + \lVert A \odot \vartheta \rVert^2 + \mathrm{trace}\big(Z_1^K (M_c - D_c - S_c)\big) + \mathrm{trace}\big(Z_2^K (M_x - D_x - S_x)\big) + \frac{\pi}{2}\big(\lVert M_c - D_c - S_c \rVert^2 + \lVert M_x - D_x - S_x \rVert^2\big)   (16)

S_{c,x}^{k+1} = \arg\min_{S_{c,x}} \ \frac{1}{2}\big\lVert S_{c,x} - (M_{c,x}^{k} - D_{c,x}^{k} + Z_{1,2}^{k}/\pi^{k})\big\rVert_2^2 + \alpha_{1,2} \lVert S_{c,x} \rVert_1 / \pi^{k}   (17)

D_{c,x}^{k+1} = \arg\min_{D_{c,x}} \ \frac{1}{2}\big\lVert D_{c,x} - (M_{c,x}^{k} - S_{c,x}^{k} + Z_{1,2}^{k}/\pi^{k})\big\rVert_2^2 + \beta_{1,2} \lVert D_{c,x} \rVert_* / \pi^{k}   (18)

D_{c,x} is updated to become

D_{c,x}^{k+1} \leftarrow V\big[\Sigma - \beta_{1,2}/\pi^{k}\big]_+ U^{K}, \quad (V, \Sigma, U) \leftarrow \mathrm{svd}\big(M_{c,x}^{k} - S_{c,x}^{k} + Z_{1,2}^{k}/\pi^{k}\big)   (19)

and similarly for S_{c,x}:

S_{c,x}^{k+1} \leftarrow \mathrm{sign}(J)\,\big[\,|J| - \alpha_{1,2}/\pi^{k}\,\big]_+, \quad J = M_{c,x}^{k} - D_{c,x}^{k} + Z_{c,x}^{k}/\pi^{k}   (20)

The components that determine the value of E are then used to compute the norm cost L \in \mathbb{R}^{m \times m}:

l_{i,j}^{k} = \lVert O_{k,i} - H(V_1, j) \rVert_2, \quad V_1 = H(SB, k) \odot E_k; \qquad l_{i,j}^{k} = \lVert O_{k,i} - H(V_2, j) \rVert_2, \quad V_2 = H(SF, k) \odot E_k   (21)

where O is the objective column matrix giving the k-th column of RF:

O_{k,i} = S_{c,x}(k,i) + D_{c,x}(k,i) - Z_{1,2}(k,i)/\pi^{k}   (22)

Since \min \lVert A \odot \vartheta \rVert^2 is hard to approximate, L_\tau = \{r_{1,1}^{\tau} + d_{1,1}^{\tau}, r_{1,2}^{\tau} + d_{1,2}^{\tau}, \dots, r_{m,m}^{\tau} + d_{m,m}^{\tau}\} \in \mathbb{R}^{m \times m}, \tau \in [k-1, k+1], is changed into L_k:

H(L_k, j) \leftarrow \sum_{\tau=k-1}^{k+1} \sum_{p_{\tau,v} \in \xi} H(L_\tau, v) \cdot \exp(-\lVert c_{\tau,v}, c_{k,j} \rVert_1 / \mu)   (23)

The global optimization problem is handled with the assignment algorithm in [40] (see the sketch after this subsection), which modifies (24)–(26):

SF^{k+1} \leftarrow SF^{k} \odot \vartheta, \quad SB^{k+1} \leftarrow SB^{k} \odot \vartheta   (24)

Z_{1,2}^{k+1} \leftarrow \pi^{k}(M_{c,x}^{k} - D_{c,x}^{k} - S_{c,x}^{k}) + Z_{1,2}^{k}   (25)

\pi^{k+1} \leftarrow 1.05\,\pi^{k}   (26)

The alignment of the superpixels is now given by (27):

gS_i = \frac{1}{n-1} \sum_{\tau=1,\, \tau \neq i}^{n} H(SF \odot \vartheta, \tau)   (27)

SF is modified to reduce incorrect detections and alignments:

\widetilde{SF} \leftarrow SF \odot \vartheta   (28)

SF \leftarrow \widetilde{SF} \cdot (\mathbf{1}_{m \times n} - X(S_c)) + \rho \cdot \overline{SF} \cdot X(S_c)   (29)

\rho_{i,j} = \begin{cases} 0.5, & \frac{1}{n} \sum_{j=1}^{n} \widetilde{SF}_{i,j} < \widetilde{SF}_{i,j} \\ 2, & \text{otherwise} \end{cases}   (30)

Equation (29) is a balancing matrix equation. The resulting saliency map of the i-th video frame is

gS_i = \frac{H(\rho, i) - H(\rho, i) \cdot X(S_c)}{H(\rho, i)(n-1)} \sum_{\tau=1,\, \tau \neq i}^{n} H(SF \odot \vartheta, \tau)   (31)

Finally, the inner temporal batch x_r of the current group's frames is diffused according to the degree of colour similarity, and the final output is given by

gS_{i,j} = \frac{x_r \cdot y_r + \sum_{i=1}^{n} y_i \cdot gS_{i,j}}{y_r + \sum_{i=1}^{n} y_i}, \quad y_r = \exp(-\lVert c_{r,j}, c_{i,j} \rVert_2 / \mu)   (32)

where x_r denotes the colour-distance-based weights.
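The assignment step referenced above, solving the cost matrix L of (21) with the algorithm of [40] (Hungarian/Munkres) to obtain the binary permutation matrix E_k, can be sketched as follows. SciPy's linear_sum_assignment is used as a stand-in solver, and the cost construction from generic per-superpixel feature vectors is an illustrative assumption.

# A hedged sketch of the superpixel correspondence step built on eq. (21).
import numpy as np
from scipy.optimize import linear_sum_assignment

def superpixel_permutation(features_k, features_ref):
    """Match superpixels of frame k to a reference frame by minimum-cost assignment."""
    # cost l_{i,j}: Euclidean distance between feature vectors of superpixel pairs
    cost = np.linalg.norm(features_k[:, None, :] - features_ref[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    E = np.zeros_like(cost)
    E[rows, cols] = 1.0          # binary permutation matrix, one 1 per row and column
    return E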
4. RESULTS, EXPERIMENTS AND DATABASE
4.1. Base references and comparison
No experiment or piece of research is complete without results from an actual application of the proposed solution. For this paper, the algorithm is compared against [42] as the base reference, followed by operational block description length (OBDL) [43], dynamic adaptive whitening saliency (AWS-D) [44], the object-to-motion convolutional neural network with two-layer long short-term memory (OMCNN-2CLSTM) [45], the attentive CNN-LSTM (ACL) model [46], the saliency-aware video compression (SAVC) algorithm [47], and the HEVC-feature-based method of Xu et al. [48]. The database used in this paper is the same as that of the base reference paper [42]: a high-definition eye-tracking database that is openly available on GitHub [49]. These algorithms are widely used, operate well at common intermediate format (CIF) resolution, and are also based on HD non-destructive video.

4.2. Experiment and results
For the final comparison and evaluation, 10 video sequences were taken at 3 discrete resolutions of 1920×1080, 1280×720 and 832×480, as listed in Table 1. Five evaluation metrics are then used, namely the area under the receiver operating characteristic (ROC) curve (AUC), similarity (SIM), correlation coefficient (CC), normalized scanpath saliency (NSS) and Kullback-Leibler (KL) divergence. Figure 1 shows the results of the saliency algorithms. The comparison among all the aforementioned algorithms (OBDL [43], AWS-D [44], OMCNN-2CLSTM [45], ACL [46], SAVC [47], XU [48], the base reference [42], and the proposed algorithm) is arranged numerically in Table 2, and Figure 2 shows the same data graphically. The five saliency evaluation metrics are the same as those used in the base reference paper [42]: area under the ROC curve, similarity (histogram intersection), Pearson's CC, NSS and KL divergence; a sketch of these metrics is given after Table 2.

The SAVC and OBDL algorithms are based on H.264, which uses a fixed-size macroblock coding strategy and is inflexible compared with HEVC; this reduces their accuracy and precision. The XU algorithm is quite similar to the HEVC pipeline and therefore obtains better results than the previously mentioned algorithms, but complex scenes still make saliency detection and mapping difficult, which yields large values on the KL metric. The same problem is found in OMCNN-2CLSTM [45] and ACL [46], with high KL values of 2.82 and 3.0642 respectively. The base paper [42] fares somewhat better than the other saliency detection methods, but its KL value is still quite high (2.4921). The proposed solution does remarkably well on the KL metric with a result of 0.862871, meaning its output is closer to the ground truth, and it achieves a remarkable NSS value of nearly unity. The remaining metric values are closer to one another, but the proposed method outperforms in several aspects, making the custom spatio-temporal fusion saliency detection method a more successful and viable saliency detection method.

Table 1. Information regarding the types of videos chosen for evaluation and comparison
Type  Resolution   Name               Frame rate (Hz)
A     1920×1080    Basketball Drive   50
                   Kimono 1           24
                   Park Scene         24
                   Johnny             60
B     1280×720     Kristen And Sara   60
                   Four People        60
                   vidyo3             60
                   vidyo4             60
C     832×480      Basketball Drill   50
                   Race Horses        30
Table 2. Saliency evaluation and comparison results
Method           AUC     SIM     CC       NSS       KL
OBDL [43]        0.6413  0.2982  0.2253   0.297     3.4642
AWS-D [44]       0.6635  0.3154  0.2663   0.4768    1.7144
ACL [46]         0.7673  0.3614  0.3774   0.5005    3.0642
SAVC [47]        0.5844  0.2688  0.1248   0.1889    2.0191
XU [48]          0.5881  0.305   0.2663   0.2854    1.5098
Base paper [42]  0.7334  0.3751  0.387    0.5674    2.4921
PS-SYSTEM        0.7354  0.4644  0.44391  1.000138  0.862871
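As promised above, the following is a hedged sketch of four of the reported metrics (SIM, CC, NSS, KL), following the common definitions surveyed in [48] and assuming NumPy; the exact normalisation conventions used by the authors are assumptions here, and AUC is omitted for brevity.

# A minimal sketch of the SIM, CC, NSS and KL metrics; conventions follow the
# usual saliency-benchmark definitions and are assumptions, not the authors' code.
import numpy as np

def _norm_dist(m):
    m = np.asarray(m, dtype=float)
    return m / (m.sum() + 1e-12)                  # normalise to a probability map

def similarity(sal, gt_density):
    """SIM: histogram intersection of the two normalised maps."""
    return np.minimum(_norm_dist(sal), _norm_dist(gt_density)).sum()

def pearson_cc(sal, gt_density):
    """CC: Pearson correlation between saliency and ground-truth density."""
    return np.corrcoef(sal.ravel(), gt_density.ravel())[0, 1]

def nss(sal, fixation_mask):
    """NSS: mean of the z-scored saliency map at fixated pixels."""
    s = (sal - sal.mean()) / (sal.std() + 1e-12)
    return s[fixation_mask.astype(bool)].mean()

def kl_divergence(sal, gt_density, eps=1e-12):
    """KL divergence of the ground-truth density from the predicted map."""
    p, q = _norm_dist(gt_density), _norm_dist(sal)
    return np.sum(p * np.log(eps + p / (q + eps)))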
Figure 1. The video frames after applying the saliency algorithms
Figure 2. Saliency evaluation and comparison graph

5. CONCLUSION
This paper has introduced a custom spatio-temporal fusion video saliency detection method with greater accuracy and precision than the latest available state-of-the-art saliency detection methods. Several changes were made to simple calculations to address the problems of colour contrast computation, the fusion aspect of the saliency was modified so as to boost both motion and colour values, and a spatio-temporal, pixel-based coherency boost was added for temporal-scope saliency exploration. The method was tested against an extensive database to verify its robustness and efficiency. The results were also compared with various state-of-the-art saliency-mapping methods, and it has come to light that the proposed solution has better accuracy and precision. All these modifications have made our proposed
custom spatio-temporal fusion video saliency detection method perform much better and have given new hope to the field of video saliency. This algorithm will be helpful to those who continue research in this field of saliency detection, as very little research is currently available.

REFERENCES
[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, 1998, doi: 10.1109/34.730558.
[2] C. Guo, Q. Ma, and L. Zhang, "Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform," 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 1–8, Jun. 2008, doi: 10.1109/CVPR.2008.4587715.
[3] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2010, pp. 1597–1604, doi: 10.1109/cvpr.2009.5206596.
[4] M. Cerf, E. P. Frady, and C. Koch, "Faces and text attract gaze independent of the task: Experimental data and computer model," Journal of Vision, vol. 9, no. 12, pp. 1–15, Nov. 2009, doi: 10.1167/9.12.10.
[5] M. Cerf, J. Harel, W. Einhäuser, and C. Koch, "Predicting human gaze using low-level saliency combined with face detection," Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference, 2008.
[6] L. J. Li and L. Fei-Fei, "What, where and who? Classifying events by scene and object recognition," Proceedings of the IEEE International Conference on Computer Vision, 2007, doi: 10.1109/ICCV.2007.4408872.
[7] B. Scassellati, "Theory of mind for a humanoid robot," Autonomous Robots, vol. 12, no. 1, pp. 13–24, 2002, doi: 10.1023/A:1013298507114.
[8] S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué, "Spatio-temporal saliency model to predict eye movements in video free viewing," 2008 16th European Signal Processing Conference, Aug. 2008.
[9] Y.-F. Ma and H.-J. Zhang, "A model of motion attention for video skimming," in Proceedings. International Conference on Image Processing, 2002, vol. 1, pp. I-129–I-132, doi: 10.1109/ICIP.2002.1037976.
[10] S. Li and M. C. Lee, "Fast visual tracking using motion saliency in video," in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007, vol. 1, pp. I-1073–I-1076, doi: 10.1109/ICASSP.2007.366097.
[11] R. J. Peters and L. Itti, "Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention," in 2007 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2007, pp. 1–8, doi: 10.1109/CVPR.2007.383337.
[12] A. C. Schütz, D. I. Braun, and K. R. Gegenfurtner, "Object recognition during foveating eye movements," Vision Research, vol. 49, no. 18, pp. 2241–2253, Sep. 2009, doi: 10.1016/j.visres.2009.05.022.
[13] F. Zhou, S. B. Kang, and M. F. Cohen, "Time-mapping using space-time saliency," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp. 3358–3365, doi: 10.1109/CVPR.2014.429.
[14] Z. Liu, X. Zhang, S. Luo, and O. Le Meur, "Superpixel-based spatiotemporal saliency detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 9, pp. 1522–1540, Sep. 2014, doi: 10.1109/TCSVT.2014.2308642.
[15] Y. Fang, Z. Wang, W. Lin, and Z. Fang, "Video saliency incorporating spatiotemporal cues and uncertainty weighting," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3910–3921, Sep. 2014, doi: 10.1109/TIP.2014.2336549.
[16] W. Wang, J. Shen, and F. Porikli, "Saliency-aware geodesic video object segmentation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2015, pp. 3395–3402, doi: 10.1109/CVPR.2015.7298961.
[17] W. Wang, J. Shen, and L. Shao, "Consistent video saliency using local gradient flow optimization and global refinement," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4185–4196, Nov. 2015, doi: 10.1109/TIP.2015.2460013.
[18] T. M. Hoang and J. Zhou, "Recent trending on learning based video compression: A survey," Cognitive Robotics, vol. 1, pp. 145–158, 2021, doi: 10.1016/j.cogr.2021.08.003.
[19] A. Borji, "Saliency prediction in the deep learning era: successes and limitations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 679–700, Feb. 2021, doi: 10.1109/TPAMI.2019.2935715.
[20] W. Wang, J. Shen, J. Xie, M. M. Cheng, H. Ling, and A. Borji, "Revisiting video saliency prediction in the deep learning era," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 220–237, Jan. 2021, doi: 10.1109/TPAMI.2019.2924417.
[21] M. Startsev and M. Dorr, "Supersaliency: a novel pipeline for predicting smooth pursuit-based attention improves generalisability of video saliency," IEEE Access, vol. 8, pp. 1276–1289, 2020, doi: 10.1109/ACCESS.2019.2961835.
[22] H. Li, F. Qi, and G. Shi, "A novel spatio-temporal 3D convolutional encoder-decoder network for dynamic saliency prediction," IEEE Access, vol. 9, pp. 36328–36341, 2021, doi: 10.1109/ACCESS.2021.3063372.
[23] S. Zhu, C. Liu, and Z. Xu, "High-definition video compression system based on perception guidance of salient information of a convolutional neural network and HEVC compression domain," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 7, pp. 1946–1959, 2020, doi: 10.1109/TCSVT.2019.2911396.
[24] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä, "Segmenting salient objects from images and videos," in Lecture Notes in Computer Science, vol. 6315 LNCS, no. PART 5, Springer Berlin Heidelberg, 2010, pp. 366–379.
[25] Q. Zhang, X. Wang, S. Wang, S. Li, S. Kwong, and J. Jiang, "Learning to explore intrinsic saliency for stereoscopic video," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp. 9741–9750, doi: 10.1109/CVPR.2019.00998.
[26] H. J. Seo and P. Milanfar, "Static and space-time visual saliency detection by self-resemblance," Journal of Vision, vol. 9, no. 12, pp. 15–15, Nov. 2009, doi: 10.1167/9.12.15.
[27] H. Fu, X. Cao, and Z. Tu, "Cluster-based co-saliency detection," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3766–3778, Oct. 2013, doi: 10.1109/TIP.2013.2260166.
[28] Z. Wang, J. Li, and Z. Pan, "Cross complementary fusion network for video salient object detection," IEEE Access, vol. 8, pp. 201259–201270, 2020, doi: 10.1109/ACCESS.2020.3036533.
[29] H. Bi, D. Lu, N. Li, L. Yang, and H. Guan, "Multi-level model for video saliency detection," in 2019 IEEE International Conference on Image Processing (ICIP), Sep. 2019, pp. 4654–4658, doi: 10.1109/ICIP.2019.8803611.
[30] E. S. L. Gastal and M. M. Oliveira, "Domain transform for edge-aware image and video processing," ACM Transactions on Graphics, vol. 30, no. 4, pp. 1–12, Jul. 2011, doi: 10.1145/2010324.1964964.
[31] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, Nov. 2012, doi: 10.1109/TPAMI.2012.120.
[32] D. Zhang, O. Javed, and M. Shah, "Video object segmentation through spatially accurate and temporally dense extraction of primary object regions," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2013, pp. 628–635, doi: 10.1109/CVPR.2013.87.
[33] M. Xu, P. Fu, B. Liu, and J. Li, "Multi-stream attention-aware graph convolution network for video salient object detection," IEEE Transactions on Image Processing, vol. 30, pp. 4183–4197, 2021, doi: 10.1109/TIP.2021.3070200.
[34] J. Wright, Y. Peng, Y. Ma, A. Ganesh, and S. Rao, "Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization," in NIPS'09: Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009, pp. 2080–2088.
[35] X. Zhou, C. Yang, and W. Yu, "Moving object detection by detecting contiguous outliers in the low-rank representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, Mar. 2013, doi: 10.1109/TPAMI.2012.132.
[36] Z. Zeng, T. H. Chan, K. Jia, and D. Xu, "Finding correspondence from multiple images via sparse and low-rank decomposition," in Lecture Notes in Computer Science, vol. 7576 LNCS, no. PART 5, Springer Berlin Heidelberg, 2012, pp. 325–339.
[37] P. Ji, H. Li, M. Salzmann, and Y. Dai, "Robust motion segmentation with unknown correspondences," in Lecture Notes in Computer Science, vol. 8694 LNCS, no. PART 6, Springer International Publishing, 2014, pp. 204–219.
[38] R. Oliveira, J. Costeira, and J. Xavier, "Optimal point correspondence through the use of rank constraints," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, vol. 2, pp. 1016–1021, doi: 10.1109/CVPR.2005.264.
[39] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010, doi: 10.1561/2200000016.
[40] J. Munkres, "Algorithms for the assignment and transportation problems," Journal of the Society for Industrial and Applied Mathematics, vol. 5, no. 1, pp. 32–38, Mar. 1957, doi: 10.1137/0105003.
[41] Z. Liu, L. Meur, and S. Luo, "Superpixel-based saliency detection," in 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Jul. 2013, pp. 1–4, doi: 10.1109/WIAMIS.2013.6616119.
[42] L. Wei, M. Wang, W. Liu, X. Wang, J. Sun, and X. Yin, "Multi-features fusion based on boolean map for video saliency detection," in Chinese Control Conference, CCC, Jul. 2019, pp. 7589–7594, doi: 10.23919/ChiCC.2019.8865253.
[43] V. Leboran, A. Garcia-Diaz, X. R. Fdez-Vidal, and X. M. Pardo, "Dynamic whitening saliency," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 5, pp. 893–907, May 2017, doi: 10.1109/TPAMI.2016.2567391.
[44] F. Guo, W. Wang, Z. Shen, J. Shen, L. Shao, and D. Tao, "Motion-aware rapid video saliency detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4887–4898, Dec. 2020, doi: 10.1109/TCSVT.2019.2906226.
[45] L. Huang, K. Song, J. Wang, M. Niu, and Y. Yan, "Multi-graph fusion and learning for RGBT image saliency detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1366–1377, Mar. 2022, doi: 10.1109/TCSVT.2021.3069812.
[46] H. Hadizadeh and I. V. Bajic, "Saliency-aware video compression," IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 19–33, Jan. 2014, doi: 10.1109/TIP.2013.2282897.
[47] M. Xu, L. Jiang, X. Sun, Z. Ye, and Z. Wang, "Learning to detect video saliency with HEVC features," IEEE Transactions on Image Processing, vol. 26, no. 1, pp. 369–385, Jan. 2017, doi: 10.1109/TIP.2016.2628583.
[48] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand, "What do different evaluation metrics tell us about saliency models?," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 3, pp. 740–757, Mar. 2019, doi: 10.1109/TPAMI.2018.2815601.
[49] S. Park, E. Aksan, X. Zhang, and O. Hilliges, "Towards end-to-end video-based eye-tracking," in Lecture Notes in Computer Science, vol. 12357 LNCS, Springer International Publishing, 2020, pp. 747–763.

BIOGRAPHIES OF AUTHORS
Vinay C. Warad is working as an assistant professor in the Department of Computer Science and Engineering at Khaja Bandanawaz College of Engineering (KBNCE). He has 8 years of teaching experience. His areas of interest are video saliency and image retrieval. He can be contacted at email: vinaywarad999@gmail.com.
Ruksar Fatima is a professor and head of the Department of Computer Science and Engineering, vice principal and examination in-charge at Khaja Bandanawaz College of Engineering (KBNCE), Kalaburagi, Karnataka. She can be contacted at email: ruksarf@gmail.com.