NetMSA

NetMSA.NetMSA - Module

This module provides a Julia implementation of the NetMSA algorithm, which can be used for multiple sequence alignment.

NetMSA.createPeerMatrix - Method
createPeerMatrix(inputStrings::Vector{String})::Matrix{Union{Missing,Char}}

Create and return a peer matrix of characters in which each input sequence in inputStrings is represented as a column. Columns for sequences shorter than the longest one are padded at the bottom with missing.

Examples

julia> NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'
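
The construction described above can be sketched in plain Julia. This is a minimal illustration, not the package's source; the name peer_matrix_sketch is hypothetical, and the padding behaviour is inferred from the example output.

```julia
# Hypothetical stand-in for NetMSA.createPeerMatrix, for illustration only.
function peer_matrix_sketch(seqs::Vector{String})
    nrows = maximum(length, seqs)
    # Start with an all-missing matrix: one column per sequence.
    M = Matrix{Union{Missing,Char}}(missing, nrows, length(seqs))
    for (j, s) in enumerate(seqs), (i, ch) in enumerate(s)
        M[i, j] = ch   # the i-th character of sequence j goes to row i, column j
    end
    return M
end

M = peer_matrix_sketch(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
```

Each sequence fills its column from the top, so any padding always ends up in the last rows.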

NetMSA.matrixalignment - Method
matrixalignment(M)

Align the peer matrix M using the NetMSA algorithm and return the aligned matrix, in which inserted gaps are represented by '-'.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.matrixalignment(M)
9×4 Array{Union{Missing, Char},2}:
 'a'  'a'  'a'  'a'
 'b'  '-'  'b'  'b'
 'c'  'c'  'c'  'c'
 'b'  'b'  '-'  'b'
 'c'  'c'  '-'  'c'
 'd'  'f'  'h'  'j'
 'e'  'g'  'i'  'k'
 'm'  '-'  'm'  'm'
 '-'  '-'  'n'  '-'
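
Because each column of the result is one aligned sequence, the matrix can be turned back into strings column by column. The helper below is a hypothetical convenience, not part of NetMSA; it assumes that any remaining missing entries should be printed as gaps.

```julia
# Hypothetical helper: read each column of an aligned matrix back as a string.
function aligned_strings_sketch(M)
    # coalesce replaces missing entries with '-' before joining the column.
    return [join(coalesce.(M[:, j], '-')) for j in 1:size(M, 2)]
end
```

Applied to the aligned matrix shown above, the second column would read "a-cbcfg--".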

NetMSA.Particle - Type

A particle that is used for creating swarms.

Fields

  • value::Char : Value of the particle, e.g. 'b' or 'c'
  • updated::Int64 : Number of turns since the particle's best score was last updated
  • pos::Position : The original position of the particle
  • best::Position : The best local position of the particle
  • bestvalue::Float64 : Best local score
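
A minimal sketch of types with this shape is given below. The names PositionSketch and ParticleSketch, the field names of the position, and the choice of a mutable struct are assumptions made for illustration; only the particle's field list comes from the documentation above. The two-argument constructor mirrors the printed form Particle('b', 0, pos, pos, 0.0) seen in the examples.

```julia
# Hypothetical position: a row index plus the columns holding the symbol.
struct PositionSketch
    row::Int
    indexes::Vector{Int}
end

# Hypothetical particle with the five documented fields.
mutable struct ParticleSketch
    value::Char
    updated::Int
    pos::PositionSketch
    best::PositionSketch
    bestvalue::Float64
end

# A fresh particle starts with best == pos, zero turns, and a score of 0.0.
ParticleSketch(value::Char, pos::PositionSketch) =
    ParticleSketch(value, 0, pos, pos, 0.0)

p = ParticleSketch('b', PositionSketch(2, [1, 3, 4]))
```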

NetMSA.Position - Type

Store the position of a given particle. The position $x_{s_i}(r)$ of the particle $p_{s_i}$ is defined by the row $r$ that contains the symbol $s_i$, together with the indexes of the columns in which $s_i$ occurs in row $r$.

NetMSA.aligned - Method
aligned(row)::Bool

Return whether a row is aligned or not.

A row is aligned if it only contains different occurrences of the same symbol.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.aligned(M[1, :])
true

julia> NetMSA.aligned(M[2, :])
false
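
The check can be sketched as follows. The name aligned_sketch is hypothetical, and the sketch assumes that gaps ('-') and missing entries are ignored when deciding whether a row is aligned, which the text does not state explicitly.

```julia
# Hypothetical re-implementation of the "aligned" test, for illustration only.
function aligned_sketch(row)
    # Collect the distinct symbols, skipping missing entries and gaps (assumption).
    symbols = unique(x for x in row if !ismissing(x) && x != '-')
    return length(symbols) == 1
end
```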

NetMSA.createswarm - Method
createswarm(rowindex::Int64, M)

Create a swarm containing one Particle for each unique symbol in row rowindex of M.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.createswarm(2, M)
2-element Array{NetMSA.Particle,1}:
 NetMSA.Particle('c', 0, NetMSA.Position(2, [2]), NetMSA.Position(2, [2]), 0.0)
 NetMSA.Particle('b', 0, NetMSA.Position(2, [1, 3, 4]), NetMSA.Position(2, [1, 3, 4]), 0.0)
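
The grouping of a row into one entry per distinct symbol can be sketched as below. swarm_sketch is a hypothetical name, named tuples stand in for the package's Particle type, and the order of the returned entries is not meant to match the package's output.

```julia
# Hypothetical sketch: one entry per distinct symbol in the given row.
function swarm_sketch(rowindex, M)
    row = M[rowindex, :]
    symbols = unique(skipmissing(row))   # distinct symbols, missing ignored
    # For each symbol, record the row and the columns where it occurs.
    return [(value = s, row = rowindex, cols = findall(isequal(s), row))
            for s in symbols]
end
```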

NetMSA.flydown - Method
flydown(p, M; stride=1)

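Fly the given particle down by stride rows: stride gaps ('-') are inserted above the particle in every column it occupies, and the resulting matrix is returned.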

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> p = NetMSA.Particle('b', NetMSA.getposition('b', 2, M))
NetMSA.Particle('b', 0, Main.NetMSA.Position(2, [1, 3, 4]), Main.NetMSA.Position(2, [1, 3, 4]), 0.0)

julia> NetMSA.flydown(p, M)
9×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 '-'  'c'      '-'      '-'
 'b'  'b'      'b'      'b'
 'c'  'c'      'c'      'c'
 'b'  'f'      'h'      'b'
 'c'  'g'      'i'      'c'
 'd'  missing  'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.flydown(p, M; stride=3)
11×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 '-'  'c'      '-'      '-'
 '-'  'b'      '-'      '-'
 '-'  'c'      '-'      '-'
 'b'  'f'      'b'      'b'
 'c'  'g'      'c'      'c'
 'b'  missing  'h'      'b'
 'c'  missing  'i'      'c'
 'd'  missing  'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'
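
The effect shown in the two outputs above can be reproduced with a short sketch. flydown_sketch is a hypothetical name, and it takes the row and column indexes directly instead of a Particle. It assumes, as the examples suggest, that the matrix grows by stride rows and that untouched columns are padded with missing.

```julia
# Hypothetical sketch of the gap insertion performed when a particle flies down.
function flydown_sketch(M, row, cols; stride = 1)
    nrows, ncols = size(M)
    out = Matrix{Union{Missing,Char}}(missing, nrows + stride, ncols)
    for j in 1:ncols
        col = M[:, j]
        if j in cols
            # Insert `stride` gaps at `row`, shifting the rest of the column down.
            out[:, j] = vcat(col[1:row-1], fill('-', stride), col[row:end])
        else
            # Other columns keep their content; the new bottom rows stay missing.
            out[1:nrows, j] = col
        end
    end
    return out
end
```

The sketch builds a new matrix instead of modifying M, which matches the examples above, where the second call still starts from the original 8×4 matrix.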

NetMSA.full - Method
full(row)::Bool

Return whether a row is full or not.

An aligned row r is called full if no gaps ('-') have been added to it. That is, the number of occurrences of the symbol in the row is equal to the number of columns in the matrix.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.full(M[1, :])
true

julia> NetMSA.full(M[2, :])
false
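
A minimal sketch of this test follows. full_sketch is a hypothetical name, and treating missing entries like gaps, so that a row containing them is never full, is an assumption.

```julia
# Hypothetical re-implementation of the "full" test, for illustration only.
function full_sketch(row)
    isempty(row) && return false
    s = first(row)
    # A row that starts with a gap or a missing entry cannot be full.
    (ismissing(s) || s == '-') && return false
    # Full means every column holds the same symbol.
    return all(isequal(s), row)
end
```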

NetMSA.getposition - Method
getposition(value, rowindex, matrix)

Return the Position (rowindex, [colindex1, colindex2, ...]) of the Particle represented by value, at the rowindex in the matrix.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.getposition('b', 2, M)
NetMSA.Position(2, [1, 3, 4])
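
The lookup amounts to finding the columns of one row that hold the given symbol. In the sketch below, position_sketch is a hypothetical name and a named tuple stands in for the package's Position type.

```julia
# Hypothetical sketch: locate a symbol within one row of the matrix.
function position_sketch(value, rowindex, M)
    # isequal treats missing safely, so missing entries never match.
    cols = findall(isequal(value), M[rowindex, :])
    return (row = rowindex, cols = cols)
end
```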

NetMSA.mostfrequent - Method
mostfrequent(row)

Return a tuple (frequency, element) containing the frequency of the most frequent element occurring in the row, followed by the element itself.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.mostfrequent(M[2, :])
(3, 'b')
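
One way to compute such a tuple is to count occurrences in a dictionary and take the maximum. mostfrequent_sketch is a hypothetical name; skipping missing entries and the behaviour on ties are assumptions, and the sketch throws an error for a row that contains only missing entries.

```julia
# Hypothetical sketch: most frequent symbol in a row, as (frequency, symbol).
function mostfrequent_sketch(row)
    counts = Dict{Char,Int}()
    for x in skipmissing(row)
        counts[x] = get(counts, x, 0) + 1
    end
    # findmax on a Dict returns (largest value, its key).
    return findmax(counts)
end
```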

NetMSA.objective - Method
objective(M, rowindex; endindex=0)

Return the objective score of the row, calculated as follows:

\[f(x_s(r)) = \frac{A(r) \times C(r)}{1 + Gaps(r)} \times \sum_{j=r}^{k} w(j)\]

where $A(r)$ is the number of aligned rows in M from $r$ to the last row, $C(r)$ is the maximum number of matched characters in the current row, $Gaps(r)$ is the number of gaps added to the matrix M from row $r$ to the last row, and $w(r)$ is the weight of the row $r$.

endindex restricts the range of rows that is searched for gaps. If it is not provided, size(M)[1] (the last row of M) is used.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.objective(M, 2)
2.625
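
The score 2.625 can be reproduced by hand from the formula. This is a sketch under two assumptions that the text does not state explicitly: that $k$ is the last row of M, and that missing entries are ignored when deciding whether a row is aligned, so that row 8 ('m', missing, missing, 'm') counts as aligned with $n_s = 2$. With the default weights $w_1 = 0.25$ and $w_2 = 0.5$:

\[\begin{aligned} w(2) = w(3) &= 0.25 \times \tfrac{3}{4} = 0.1875 \\ w(4) = w(5) &= 0.25 \times \tfrac{2}{4} = 0.125 \\ w(6) = w(7) &= 0.25 \times \tfrac{0}{4} = 0 \\ w(8) &= 0.5 \times \tfrac{2}{4} = 0.25 \\ \sum_{j=2}^{8} w(j) &= 0.875 \\ f(x_s(2)) &= \frac{1 \times 3}{1 + 0} \times 0.875 = 2.625 \end{aligned}\]

Here $A(2) = 1$ because only row 8 is aligned, $C(2) = 3$ because 'b' occurs three times in row 2, and $Gaps(2) = 0$ because no gaps have been added yet.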

NetMSA.rowalignment - Method
rowalignment(rowindex, M)

Return the Particle whose best position aligns the given row of the matrix, that is, the one that maximizes the objective score.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> p = NetMSA.Particle('b', NetMSA.getposition('b', 2, M))
NetMSA.Particle('b', 0, Main.NetMSA.Position(2, [1, 3, 4]), Main.NetMSA.Position(2, [1, 3, 4]), 0.0)

julia> NetMSA.rowalignment(2, M)
NetMSA.Particle('c', 0, Main.NetMSA.Position(2, [2]), Main.NetMSA.Position(3, [1]), 9.0)

NetMSA.stopcriteria - Method
stopcriteria(p::Particle, newindex, M; threshold::Int=5, debug=false)

Check whether a stopping criterion has been met. Two stopping criteria are checked in this function:

  • Criteria 2: If a particle hasn't updated its best score in the last threshold turns.
  • Criteria 3: If a particle moves to a new row which already contains the same symbol as that of the particle.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> p = NetMSA.Particle('b', NetMSA.getposition('b', 2, M))
NetMSA.Particle('b', 0, Main.NetMSA.Position(2, [1, 3, 4]), Main.NetMSA.Position(2, [1, 3, 4]), 0.0)

julia> NetMSA.stopcriteria(p, 3, M; debug=true)
"Terminating because of criteria 3"
true
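
The two checks can be sketched as below. stop_sketch is a hypothetical name that takes the particle's symbol and its turn counter directly instead of a Particle. Using >= for the threshold comparison is an assumption; the package may compare differently.

```julia
# Hypothetical sketch of the two stopping checks described above.
function stop_sketch(value, updated, newindex, M; threshold = 5)
    # Criteria 2: no improvement of the best score for `threshold` turns (assumed >=).
    updated >= threshold && return true
    # Criteria 3: the target row already contains the particle's symbol.
    return any(isequal(value), M[newindex, :])
end
```

In the example above, row 3 of M already contains 'b', so the third criterion applies to a 'b' particle moving to that row.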

NetMSA.weight - Method
weight(row; w1=0.25, w2=0.5, w3=1.0)

Return the weight of the row, calculated as:

\[w(r) = \begin{cases} w_1 \times \frac{x}{c}; & \text{ if r is not aligned} \\ w_2 \times \frac{n_s}{c}; & \text{ if r is aligned} \\ w_3; & \text{ if r is full} \\ \end{cases}\]

where $n_s$ is the number of occurrences of the symbol $s$ in the aligned row $r$, and $c$ is the total number of columns in the row. The value of $x$ is equal to zero if every symbol in the row $r$ occurred at most once, otherwise $x$ is equal to the max number of occurrences (matches) of some symbol in $r$.

Examples

julia> M = NetMSA.createPeerMatrix(["abcbcdem", "acbcfg", "abchimn", "abcbcjkm"])
8×4 Array{Union{Missing, Char},2}:
 'a'  'a'      'a'      'a'
 'b'  'c'      'b'      'b'
 'c'  'b'      'c'      'c'
 'b'  'c'      'h'      'b'
 'c'  'f'      'i'      'c'
 'd'  'g'      'm'      'j'
 'e'  missing  'n'      'k'
 'm'  missing  missing  'm'

julia> NetMSA.weight(M[1, :])
1.0

julia> NetMSA.weight(M[2, :])
0.1875
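
The piecewise definition can be sketched directly. weight_sketch is a hypothetical name, and ignoring gaps and missing entries when counting symbols is an assumption. The sketch reproduces the two values shown above: row 1 is full, so its weight is $w_3 = 1.0$, and row 2 is not aligned with $x = 3$ and $c = 4$, giving $0.25 \times 3/4 = 0.1875$.

```julia
# Hypothetical sketch of the row weight w(r), for illustration only.
function weight_sketch(row; w1 = 0.25, w2 = 0.5, w3 = 1.0)
    c = length(row)
    counts = Dict{Char,Int}()
    for s in skipmissing(row)
        s == '-' && continue            # gaps are not counted (assumption)
        counts[s] = get(counts, s, 0) + 1
    end
    isempty(counts) && return 0.0
    nmax = maximum(values(counts))
    if length(counts) == 1
        # Aligned row: full if the symbol fills every column.
        return nmax == c ? w3 : w2 * nmax / c
    end
    # Not aligned: x is 0 when every symbol occurs at most once.
    x = nmax > 1 ? nmax : 0
    return w1 * x / c
end
```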