Using UFFizi involves three steps:
- Step 1: Uploading the matrix
- Step 2: Running fast UFF
- Step 3: Retrieving the results
Step 1: Uploading your matrix
UFFizi file format specifications:
- The input file should contain a delimited MxN matrix, where M stands for features (in rows) and N stands for instances (in columns).
- We accept tab or comma as delimiters.
- The matrix file is limited to 100MB, which corresponds roughly to several hundred instances and up to 60K features.
- If your file does not contain feature names in the first column, uncheck the relevant checkbox; each feature's serial number is then output in place of its name.
- To reduce upload time, you may upload a zip archive instead; the matrix data file inside the archive must be named data.txt.
For further clarification, please see the example matrix file of 2308 features by 88 instances.
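The format rules above can be sketched in a few lines of Python. This is only an illustration, not part of UFFizi: the function names `build_matrix_text` and `zip_matrix` are hypothetical, and the size check assumes that the 100MB limit refers to the uncompressed matrix file, which the specification does not state explicitly.

```python
import csv
import io
import zipfile

# Assumption: the 100MB limit is read as 100 * 1024 * 1024 bytes of
# uncompressed matrix data.
MAX_BYTES = 100 * 1024 * 1024


def build_matrix_text(feature_names, rows, delimiter="\t"):
    """Serialize an M x N matrix: one feature per row, one instance per column.

    The feature name goes in the first column of each row.
    """
    if len(feature_names) != len(rows):
        raise ValueError("need exactly one name per feature row")
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of instances")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for name, row in zip(feature_names, rows):
        writer.writerow([name, *row])
    return buf.getvalue()


def zip_matrix(text, target):
    """Write the matrix into a zip archive under the required name data.txt.

    `target` may be a file path or a writable binary file object.
    """
    data = text.encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("matrix exceeds the 100MB limit")
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.txt", data)
    return target
```

For example, `zip_matrix(build_matrix_text(["geneA", "geneB"], [[0.1, 2.5, 3.0], [1.2, 0.0, 4.4]]), "matrix.zip")` would produce an archive holding a two-feature, three-instance matrix; the feature names here are made up.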
Step 2: Running fast UFF
If the number of features matches the number you expected, simply run the algorithm.
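To know what number to expect, you can count the features and instances in your file locally before uploading. The sketch below is a hypothetical helper, not part of UFFizi. It assumes the file has no header row of instance names (subtract one from the feature count if yours has one) and guesses the delimiter from the first line.

```python
import csv
import io


def matrix_shape(text, has_feature_names=True):
    """Return (features, instances) for tab- or comma-delimited matrix text."""
    first_line = text.split("\n", 1)[0]
    # Prefer tab when present; otherwise fall back to comma.
    delimiter = "\t" if "\t" in first_line else ","
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return 0, 0
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"ragged matrix: row lengths {sorted(widths)}")
    n_columns = widths.pop()
    # The first column holds feature names, so it is not an instance.
    n_instances = n_columns - 1 if has_feature_names else n_columns
    return len(rows), n_instances
```

Reading the file with `open(path, newline="").read()` and passing the text to `matrix_shape` gives the pair to compare against the number of features shown after the upload.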
Step 3: Retrieving the results and understanding the warnings
A normal run produces no warnings; it reports only the number of features selected and a link to the results.
Please note that the results are kept for only two days.
This is an example of the UFF output of selected features.
The results may be accompanied by one or more of the following warnings:
- There are only X instances in the dataset. Results might be dubious.
This warning is issued when the dataset contains too few instances, so confidence in the results is low.
- There are only X features in the dataset. Results might be dubious.
This warning is issued when the dataset contains too few features, so confidence in the results is low.
- We have detected that the fast (approximated) UFF version might produce different ranking from the exact UFF for this dataset.
You might want to verify this with the accurate (slow) version.
This warning is issued when the entropy criterion that assesses deviation from the accurate version is met.
The accurate algorithm is described in the following paper: Novel Unsupervised Feature Filtering of Biological Data, Bioinformatics, 22 (2006), pp. e507-513.
- Score entropy criterion indicates that the results might not be significant.
This warning is issued when the score entropy criterion indicates that the dataset may not be amenable to UFF. Confidence in the results is very low.
- Score entropy criterion indicates that the results have borderline significance.
This warning is issued when a weaker form of the score entropy criterion indicates that the dataset may be only marginally amenable to UFF. Confidence in the results is low.
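If you save the warning text along with your results, a small lookup can remind you what each message implies. The sketch below is a hypothetical helper, not part of UFFizi; it matches substrings taken from the messages listed above and assumes the wording shown on this page is what the service actually emits.

```python
# Substrings come from the warning messages listed above; the notes
# paraphrase the confidence level stated for each warning.
WARNING_NOTES = [
    ("instances in the dataset", "too few instances: low confidence"),
    ("features in the dataset", "too few features: low confidence"),
    ("fast (approximated) UFF",
     "fast and exact rankings may differ: verify with the accurate version"),
    ("might not be significant",
     "dataset may not be amenable to UFF: very low confidence"),
    ("borderline significance",
     "weaker criterion met: low confidence"),
]


def explain_warning(message):
    """Return a short note for a known warning, or a fallback string."""
    for needle, note in WARNING_NOTES:
        if needle in message:
            return note
    return "unrecognized warning"
```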