Sunday, 1 March 2009

Bioinformatics Software Engineering Group Project

We got our group project brief last Tuesday and had our first meeting with the "clients" last Thursday.

The brief:
CRIC has asked us to create a prototype web-based application as a tender for a contract to build the full program. Our "company" (our group) needs to demonstrate this working prototype in 3 weeks (so we can hypothetically "win" the contract and design the full program).

Specification:

1. Papers Download
• Provide users with a means to retrieve online research papers using a search term. The system should query online resources (e.g. Direct Connect, Wiley International, Google Scholar and Web of Knowledge), using appropriate passwords supplied by the user (or retrieved from the database), and attempt to search for and download the selected papers. Allow for other services to be added in the future.
• If full papers are not available, at least the abstract and publication information should be downloaded.
- Scopus, or any other resource that works, is a feasible alternative. There is no preference for a particular resource, but something needs to be working for the prototype.
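
To get my head around the "allow for other services in the future" part of the papers download requirement, here is a minimal sketch of how each resource could sit behind one common interface. Everything in it is my own assumption and not from the brief: the names `Paper`, `PaperSource`, `StubSource` and `describe` are hypothetical, and the stub returns canned records where a real source would query an actual service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    title: str
    abstract: str
    full_text: Optional[bytes] = None  # None when only the abstract is available


class PaperSource(ABC):
    """Hypothetical interface: one subclass per online resource."""

    name: str

    @abstractmethod
    def search(self, term: str) -> List[Paper]:
        """Return the papers matching the search term."""


class StubSource(PaperSource):
    """Stand-in that filters canned records instead of querying a real service."""

    def __init__(self, name: str, papers: List[Paper]):
        self.name = name
        self._papers = papers

    def search(self, term: str) -> List[Paper]:
        term = term.lower()
        return [p for p in self._papers if term in p.title.lower()]


def describe(paper: Paper) -> str:
    """Report what could be downloaded: the full paper or only the abstract."""
    kind = "full paper" if paper.full_text is not None else "abstract only"
    return f"{paper.title} ({kind})"
```

Adding a new service would then mean writing one more `PaperSource` subclass, without touching the code that displays the results.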

2. Search for gene and/or protein IDs
• The web application should provide the user with a means of retrieving gene and protein information using a search term (e.g. Breast cancer).
• The system should give the user the flexibility to specify which online resources to query. The following bioinformatics resources are of particular interest to our researchers at the moment, though these may change: Ensembl (EBI), Swiss-Prot/UniProt, Entrez (NCBI), Protein Data Bank (PDB).
• The user should be kept informed of the results retrieval process.
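
A rough sketch of how querying the selected resources while keeping the user informed might work. This is only my guess at a design: `search_resources`, the fetcher functions and the `on_progress` callback are all hypothetical names, and no real Ensembl, UniProt, Entrez or PDB API is called here.

```python
from typing import Callable, Dict, List

# Hypothetical fetcher type: takes a search term, returns a list of IDs.
Fetcher = Callable[[str], List[str]]


def search_resources(
    term: str,
    selected: List[str],
    fetchers: Dict[str, Fetcher],
    on_progress: Callable[[str], None],
) -> Dict[str, List[str]]:
    """Query each selected resource in turn, reporting progress as we go."""
    results: Dict[str, List[str]] = {}
    for i, name in enumerate(selected, start=1):
        on_progress(f"Querying {name} ({i}/{len(selected)})")
        try:
            results[name] = fetchers[name](term)
        except Exception as exc:
            # One failing resource should not stop the others.
            on_progress(f"{name} failed: {exc}")
            results[name] = []
    on_progress("Done")
    return results
```

In the web application the callback would presumably update a status area on the page; in the sketch it can be anything that accepts a string, such as `print`.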

3. Upload and analyze custom datasets
• The system should also let the user upload custom dataset files. Once a file is uploaded, the user should be able to assign privileges for browsing, modifying and downloading the dataset to: i) only the user, ii) the user's workgroup, iii) everyone.
• The developed system should also support the integration of data analysis plugins. For the purpose of this prototype, you can adopt one or more of the data analysis scripts you have developed throughout the year (e.g. sample data and analysis script from Module 1 and/or Module 5 assignments) and make them executable through the web application.

• You are required to choose a model dataset to test your analysis script. This can be any of the datasets that we have been provided with. The dataset and analysis scripts you decide to implement will be discussed during one of the early meetings with the client.
• Modularity of the associated analysis scripts should be well respected. A detailed section describing methodologies for adding more analysis scripts should be provided with the written report.
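
One way I imagine the plugin side could stay modular is a simple registry that analysis scripts add themselves to. This is a sketch under my own assumptions: `ANALYSES`, `register` and `run_analysis` are made-up names, and the example analysis is a trivial Python function, whereas our real scripts are in R and would have to be called from the web application some other way.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical registry mapping a plugin name to the function that runs it.
ANALYSES: Dict[str, Callable[[List[float]], dict]] = {}


def register(name: str):
    """Decorator that adds an analysis function to the registry."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator


@register("summary")
def summary(values: List[float]) -> dict:
    """Toy analysis: number of values and their mean."""
    return {"n": len(values), "mean": mean(values)}


def run_analysis(name: str, values: List[float]) -> dict:
    """Look up a plugin by name and run it on the uploaded values."""
    if name not in ANALYSES:
        raise KeyError(f"No analysis plugin called {name!r}")
    return ANALYSES[name](values)
```

With something like this, adding another analysis would just be a new decorated function, which might also make the "how to add more scripts" section of the report easier to write.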


4. Local Database
• Design a database schema to store all relevant collected information using appropriate keying and indexing.
• User login details, online search results, links to the uploaded datasets, as well as data analysis output should be stored and managed within the database.

- A registration system to manage the users and their data.
- Registered users should be able to log in to their own space on the system (think of an msn or facebook space), to manage and store their experimental results.
- The system should also provide different levels of security for the stored/retrieved data.
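
A first stab at what the schema might look like, based only on the items listed above. All table and column names are my own guesses, and I've used Python's built-in `sqlite3` here purely so the sketch runs anywhere; the real thing would be in MySQL, where the types and constraint syntax differ a bit.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not from the brief.
SCHEMA = """
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    workgroup     TEXT
);
CREATE TABLE datasets (
    id         INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(id),
    file_path  TEXT NOT NULL,
    visibility TEXT NOT NULL
               CHECK (visibility IN ('user', 'workgroup', 'everyone'))
);
CREATE TABLE search_results (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    resource TEXT NOT NULL,
    term     TEXT NOT NULL,
    result   TEXT NOT NULL
);
CREATE TABLE analysis_outputs (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    script     TEXT NOT NULL,
    output     TEXT NOT NULL
);
CREATE INDEX idx_datasets_owner ON datasets(owner_id);
CREATE INDEX idx_results_user_term ON search_results(user_id, term);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and indexes and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `visibility` column is meant to cover the three privilege levels from the upload requirement, and the indexes are on the columns I'd expect to look things up by most often.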


So we need to sort out, amongst other things:
- the userspace (registration, log in, sessions, linking the user to their data)
- the permissions/security (personal, group and public settings)
- items 1, 2, and 3 of the specification.
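
For the personal/group/public settings, the check itself could be quite small. This is a sketch with hypothetical names (`User`, `Dataset`, `can_access`), and it simplifies things by treating browsing, modifying and downloading as a single privilege, whereas the brief probably wants them set separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    workgroup: Optional[str] = None


@dataclass
class Dataset:
    owner: str
    owner_workgroup: Optional[str]
    visibility: str  # "user", "workgroup" or "everyone"


def can_access(user: User, dataset: Dataset) -> bool:
    """Decide whether the user may see the dataset under its visibility setting."""
    if dataset.owner == user.name:
        return True  # owners always see their own data
    if dataset.visibility == "everyone":
        return True
    if dataset.visibility == "workgroup":
        return (
            user.workgroup is not None
            and user.workgroup == dataset.owner_workgroup
        )
    return False  # "user" (or anything unrecognised) means owner only
```

Defaulting to "no access" for anything unrecognised seems like the safer choice for a system that is supposed to offer different levels of security.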


We can use NetBeans, MySQL, R and pretty much anything free for the implementation. We can add bits and hack things in, as long as we reference them.

If you were to do this project, what approach might you take?
What do you think might be the things that take the longest time?

Any helpful pointers would be welcome, and hopefully I can post updates on how it's going if anyone is interested.
