Datasets

Real-world data can be difficult to obtain. Here I share the datasets that I have collected.
1. Mobile App User Dataset
2. RALIC Datasets


Mobile App User Dataset

I surveyed 10,208 people from more than 15 countries about their mobile app usage behavior. The countries include the USA, China, Japan, Germany, France, Brazil, the UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea.

We asked respondents about:
(1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download;
(2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income;
(3) their personality, measured using the Big Five personality traits.

This dataset contains the results of the survey.

Detailed descriptions of the project and of how I collected the data can be found in my TSE paper (3MB).

Download

The datasets are freely available for research use when acknowledged with the following reference:

(1) Soo Ling Lim, Peter J. Bentley, Natalie Kanakam, Fuyuki Ishikawa, and Shinichi Honiden (2015). Investigating Country Differences in Mobile App User Behavior and Challenges for Software Engineering. IEEE Transactions on Software Engineering (TSE), vol. 41, no. 1, pp. 40-64.

Download dataset (7MB)

If you use the data, please tell me your name, research group, and the publications that may result.

For further information please contact me at s.lim [at] cs.ucl.ac.uk



RALIC Datasets

I have collected various datasets of stakeholders and their requirements on a real software project. Detailed descriptions of the project and of how I collected the data can be found in my thesis (8MB).

The datasets consist of:

* 1714 recommendations from 61 stakeholders (OpenR)
* 839 recommendations from 50 stakeholders (ClosedR)
* 439 ratings from 76 stakeholders on 10 project objectives (RateP-Obj)
* 1514 ratings from the same 76 stakeholders on 48 requirements (RateP-Req)
* 3113 ratings from the same 76 stakeholders on 104 specific requirements (RateP-SReq)
* 262 ratings from 79 stakeholders on 10 project objectives (RankP-Obj)
* 469 ratings from the same 79 stakeholders on 51 requirements (RankP-Req)
* 1109 ratings from the same 79 stakeholders on 132 specific requirements (RankP-SReq)
* 276 ratings from 77 stakeholders on 10 project objectives (PointP-Obj)
* 670 ratings from the same 77 stakeholders on 45 requirements (PointP-Req)
* 1219 ratings from the same 77 stakeholders on 83 specific requirements (PointP-SReq)
* 410 raw textual descriptions of requirements provided by stakeholders (Raw-requirements)
* stakeholders and their roles (Stakeholders-and-roles)
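As a rough illustration of how the recommendation datasets (OpenR, ClosedR) might be explored, the Python sketch below ranks stakeholders by the total level of influence they receive from other stakeholders (a weighted in-degree). It assumes each recommendation has already been parsed into a (recommender, recommended stakeholder, level of influence) tuple; that tuple layout, the function name rank_stakeholders, and the sample names are hypothetical, and the actual file layout is documented in the dataset's README and in the thesis. This simple score is a stand-in for illustration, not the social network measures used in the thesis.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption, not the documented file format):
# (recommender, recommended stakeholder, level of influence)
Recommendation = tuple[str, str, int]


def rank_stakeholders(recs: list[Recommendation]) -> list[tuple[str, int]]:
    """Rank recommended stakeholders by the total influence they receive
    (weighted in-degree). Ties are broken alphabetically by name."""
    score: dict[str, int] = defaultdict(int)
    for _recommender, recommended, influence in recs:
        score[recommended] += influence
    # Highest total first; equal totals ordered by name.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the RALIC files.
    sample = [
        ("alice", "bob", 5),
        ("carol", "bob", 3),
        ("bob", "carol", 4),
        ("alice", "dave", 2),
    ]
    for name, total in rank_stakeholders(sample):
        print(name, total)
```

Run on the sample above, bob comes first with a total of 8, followed by carol (4) and dave (2). Note that stakeholders who are never recommended do not appear in the ranking at all.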

Download

The datasets are freely available for research use when acknowledged with the following references:

(1) Soo Ling Lim (2010). Social Networks and Collaborative Filtering in Large-Scale Requirements Elicitation. PhD Thesis. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia.

(2) Soo Ling Lim and Anthony Finkelstein (2012). StakeRare: Using Social Networks and Collaborative Filtering for Large-Scale Requirements Elicitation. IEEE Transactions on Software Engineering (TSE), vol. 38, no. 3, pp. 707-735.

Download datasets

Additional data on the cost (in person-hours) of each requirement

If you use the data, please tell me your name, research group, and the publications that may result.

For further information please contact me at s.lim [at] cs.ucl.ac.uk


21 Responses to Datasets

  1. Jonny says:

    I can't do anything without your code. Please send me the code that applies the method in your thesis and paper to your dataset. My email: tanhuu.bkhn@gmail.com
    Thanks in advance.

  2. sooling says:

    Dear Jonny,

    I used the social network algorithms in NetworkX to analyse my dataset.

    You can access their code here:

    http://networkx.github.com/

    Soo Ling

  3. charan says:

    Dear Soo Ling,

    Greetings of the day!

    I am Charan, a research scholar from Dayalbagh Educational Institute, India, working on search-based software engineering.
    At the outset, I am thankful to you for making the real datasets of the RALIC project publicly available to the research community.
    It is indeed a precious resource for all researchers working on requirements engineering.
    I read your paper “Empirical evaluation of search based requirements interaction management” and I want to apply the concepts proposed in it.
    I have developed some algorithms for multi-objective optimization and want to experiment with them on the RALIC datasets.
    I sincerely seek your help, as I am stuck on the following points:

    1. Three types of requirements are defined (objectives, requirements, specific requirements). Which of these types are considered in the paper? (The number of requirements cited in the paper for PointP and RankP, 143, does not match any of these types.)
    2. Where are the dependencies among the requirements specified? (The paper considers AND and OR dependencies among the requirements.)
    3. How can I get the cost of implementing each requirement?
    4. What weight is given to each stakeholder in the above-mentioned paper?

    I sincerely request you to kindly clarify these queries.
    Thanking you
    Yours sincerely
    Charan

  4. Sanusi says:

    Dear Soo Ling, I am a researcher in Nigeria currently conducting research on stakeholder identification in software requirements engineering. I came across your work and found it top-notch, and I sincerely want to thank you for releasing the RALIC dataset. I am solving this problem using a particle swarm optimization algorithm, but I would appreciate it if you could explain the structure of the dataset to me in more detail. Thank you

  5. sooling says:

    Dear Sanusi,

    Thank you for your interest in the work. An explanation about the structure of the dataset can be found in my thesis: https://soolinglim.files.wordpress.com/2010/07/thesis_soolinglim.pdf

    Let me know if you have more questions.

    Soo Ling

  6. franck-olivier kwan says:

    Hi Soo Ling, I've downloaded your RALIC dataset. I'm doing my DBA in Project Management at Universite du Quebec en Outaouais (Canada) and will test the dataset in my current research on semantic analysis of IT project artefacts for better risk management.

    I’ll let you know how it goes.

    Thanks,
    Franck-Olivier

  7. sooling says:

    Great! Let me know if you have questions.

    Soo Ling

  8. M Waqas says:

    Hi, I am M Waqas. I am currently working on requirements elicitation through social networks.

    I am stuck on Table 5.5 on page 123 of Dr Soo Ling's thesis.

    Any help will be highly appreciated!

  9. Sayo says:

    Dear Soo Ling,
    Thank you for making your research data publicly available.
    My name is Sayo Makinwa from Nigeria. I am working on my undergraduate research project on Multi-Objective Next Release Problem.

    There is something I do not understand in the datasets. The README.txt file states that “Recommendations are in the format . Level of influence is between 1 and 5 inclusive,” which is consistent with your thesis from page 118 onwards. However, the level of influence values in the files “ClosedR.txt” and “OpenR.txt” (the files that use that format) have an upper bound much higher than 5. This is a point of confusion for me and I would really appreciate a clarification.

    Thank you so much.
    Best regards.

  10. Sayo says:

    Dear Soo Ling,
    My name is Sayo Makinwa.
    I could not find any part of the dataset stating the cost of fulfilling each requirement. Cost is paramount in my research and I would really appreciate it if you could clarify this.

    Thank you so much.
    Kind regards.

  11. sooling says:

    Dear Sayo,

    The range for OpenR is 1 to 8. The raw level of influence for OpenR as collected during the survey is 1 to 5. There were 8 project scopes and level of influence of a stakeholder was collected for each project scope. The final data combined the level of influence for all 8 scopes into one score, which ranged from 1 to 8.

    The range for ClosedR is 1 to 10.

    You can get information about the cost for each requirement here: https://soolinglim.wordpress.com/datasets/#ralic (I have added a link to the cost file)

    Soo Ling

  12. Sayo says:

    Dear Soo Ling,

    Thank you for your kind and prompt response.

    Sayo

  13. Fadhl says:

    Dear Soo Ling,

    Greetings to you !

    I am a PhD research scholar at University Malaysia Pahang (UMP), Malaysia. I am really thankful for your effort in putting the RALIC dataset online, and I would like to use it in my proposed research, but I have one issue:

    the experience and educational background of each participating stakeholder is not provided,

    so I would be really thankful if you could provide this information, since experience and educational background are very significant in my proposed research.

    Thanks and kind regards.
    Fadhl

  14. sooling says:

    Hi Fadhl,

    Thanks for your interest in the dataset. Unfortunately, information about the experience and education background for the stakeholders was not collected so I am not able to make it available.

    Soo Ling

  15. Fadhl says:

    Dear Soo Ling

    I am really thankful for your reply. I would like to ask whether there is any way you could help me contact the stakeholders in order to get their experience and background attributes. I hope you can help with this, since I really want to use your respected datasets.

  16. sooling says:

    Hi Fadhl,

    The stakeholders have been anonymised to protect their privacy, so the names you see are not their real names and their contact details have been redacted.

    Soo Ling

  17. Fadhl says:

    Dear Soo Ling

    In that case there is not much you can do. Thanks for your reply.

    Fadhl

  18. Gina Benigno says:

    Soo Ling,

    I am a graduate student at Temple University in Philadelphia, Pennsylvania currently pursuing a degree in digital marketing. I would like to use your data in a capstone project I am working on that focuses on mobile applications. We are working with a client to assist him with developing a marketing plan for a new mobile gaming application he is creating. Part of our project involves using raw data sets and data visualizations to support our decision making. I am curious if you have any raw data sets of user demographics, etc. that I could possibly use for my project. I am having a difficult time locating these types of data online. Any assistance or advice you could offer is greatly appreciated.

    Thank you,

    Gina

  19. sooling says:

    Hi Gina,

    Thank you for your enquiry.

    You can find user demographics data in my mobile app user survey dataset here: http://www.cs.ucl.ac.uk/research/app_user_survey/mobile_app_user_dataset.xlsx

    Soo Ling

  20. dream g says:

    Sir,
    I am doing a project entitled Discovery of Ranking Fraud for Mobile Apps, so I need a dataset of mobile app ranks, ratings, and reviews.
    Can you help me?

  21. sunutha says:

    Everything is fine; I am happy with your blog. Thanks, admin, for sharing this unique content. You have done a great job, I appreciate your effort, and I hope you will get more positive comments from web users.
