Defining the Problem - Octo Consulting’s entrée into the Semantic Web
Note: This blog post was co-posted at Octoconsulting.com, with the help of our Director of Health IT, Dr. Charlie Mead.
Octo set out a few weeks ago to chart a path through a problem that is simple to explain but difficult to solve. We call it an “End-to-End Use Case.” The challenge is to find a way to discover “semantically equivalent” data that has been collected in multiple studies. Traditionally, this is a relatively straightforward task if done pre-study (although it requires considerable “top-down” governance), but very difficult once a study has been designed and executed. The inherent semantic difficulty of the task is compounded by non-semantic barriers, including differing physical data persistence and access models, and the “serialization brittleness” of XML wire formats, the lingua franca for much clinical trial data exchange. Our work is based on the overarching thesis that end-to-end Semantic Web-based representation of study metadata, data, and data transport formats largely circumvents the non-semantic barriers, thereby allowing study stakeholders to focus on the core problem: interoperable semantics.
Octo Consulting and members of the W3C’s Health Care and Life Sciences Interest Group have developed a concrete instance of our hypothesis in which study metadata and data are represented using RDFS and OWL ontologies of the HL7 Model Interchange Format (MIF) — the MIF includes the HL7 Reference Information Model (RIM), data types, and vocabulary bindings — and SNOMED-CT. Data transport will use a Semantic Web representation of the CDISC ODM (Operational Data Model) specification. SPARQL endpoints are used for data discovery and analysis.
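As a toy illustration of the kind of discovery such endpoints enable (the prefix, graph shape, and predicates here are invented for the example, not taken from the actual MIF or ODM ontologies), one could ask a SPARQL endpoint for observations across different studies that are bound to the same SNOMED-CT concept:

```sparql
# Hypothetical vocabulary; ex: terms are placeholders for this sketch.
PREFIX ex: <http://example.org/study#>

SELECT ?study ?observation
WHERE {
  ?study ex:hasObservation ?observation .
  # Same SNOMED-CT concept across studies,
  # e.g. 271649006 (systolic blood pressure)
  ?observation ex:snomedCode <http://snomed.info/id/271649006> .
}
```

Because the studies share a vocabulary binding rather than a schema, the query finds semantically equivalent data regardless of how each study was originally designed.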
Octo Semantic Web Hackathon in partnership with the W3C Health Care and Life Sciences group
Note: This blog article was co-posted at Octoconsulting.com
Octo is thrilled to announce a three-day Semantic Web “hackathon” (February 19th - 22nd), featuring guest Eric Prud’hommeaux from the World Wide Web Consortium (W3C). Eric’s role in the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is why we called upon him. While the Semantic Web is not a domain-specific technology, we are deeply interested in how Semantic Web technologies can assist healthcare and health research. We are striving to accomplish a number of objectives this week, including:
- Familiarize our consultants and technical staff with our use case: developing an “End-to-End” solution for health information trials (see Dr. Mead’s upcoming blog post on “Defining the Problem”)
- Develop deep Semantic Web talent within Octo’s solutions team of Architects, Developers, and Management Consultants
- Familiarize Octo with many of the open-source tools on the market, and build our own environment for Semantic Web research & development in our “Octo Labs” cloud
- Develop additional materials for technical training sessions on Semantic Web technologies
Look for blog posts this week from our Octo team on the ground, featuring our perspective on topics such as how the Semantic Web and legacy IT are integrated, the current progress of healthcare interoperability standards on the Semantic Web, a recap of our daily progress at the “hackathon”, business intelligence and visualization, and much more. To get in touch, tweet us at @mlamoure or @octoconsulting with questions, comments, and thoughts. Wish us luck!
Progress on Semantic Web Dashboard
I am in the process of adding another source of data to my Semantic Web Dashboard. I want to chart my number of Social Media friends over time. I started with Facebook, but Facebook doesn’t give me access to my historical friend count, only the current count. So I’m writing an app that will record the number of friends I have on a daily basis. I plan to expand it down the road to include number of postings and other things. This means that any chart I create will have limited data initially. I haven’t actually created the chart yet; it took me long enough to figure out Facebook’s Graph API and get it working properly (did I mention that I hate JSON?).
Here is the source to my PHP script that grabs my friend count:
$user_id = "7412441";
$app_id = "REMOVED";
$app_secret = "REMOVED";

// Request an application access token
$app_token_url = "https://graph.facebook.com/oauth/access_token?"
    . "client_id=" . $app_id
    . "&client_secret=" . $app_secret
    . "&grant_type=client_credentials";
$response = file_get_contents($app_token_url);

// The response is a query string ("access_token=..."), so parse it
$params = null;
parse_str($response, $params);

// Fetch my friends list from the Graph API and count it
$query_url = "https://graph.facebook.com/$user_id?fields=friends&access_token=" . $params['access_token'];
$rawdata = file_get_contents($query_url);
$friends = json_decode($rawdata, true);
$friends_count = count($friends['friends']['data']);

// Append to an RDF (Turtle) file, not complete yet.
$RDFData = "data.ttl";
$fh = fopen($RDFData, 'a');
$stringData = "";
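For the RDF-writing step that isn’t finished yet, the shape I have in mind for data.ttl is one dated observation appended per run, something like this (the dash: vocabulary and the values are my own placeholders, not a published ontology):

```turtle
@prefix dash: <http://example.org/dashboard#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# One observation per daily run of the script (values are illustrative)
dash:obs-2013-02-19 a dash:FriendCountObservation ;
    dash:date  "2013-02-19"^^xsd:date ;
    dash:count "412"^^xsd:integer .
```

Dating each observation as its own resource should make the time-series chart a simple SPARQL SELECT ordered by date.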
Building a Semantic Web Personal Dashboard
I’m taking up a pet project to develop a personal data dashboard that I will make partially public on my blog. I’m challenging myself to do this because I was looking for an achievable project to undertake using Semantic Web technologies. Here are the data sources, some of which are manual, that I’m considering using:
- Energy Usage Data (SOURCE: power bills; I wish NEST would give me specifics on this in an automated way)
- Personal Health Information including my weight (SOURCE: Fitbit scale)
- Social Trends Information (SOURCE: Facebook and Twitter)
- Personal finances (likely will hide the Y-axis!) such as net worth or retirement savings (SOURCE: iBank)
- Average TV Usage (SOURCE: My Home Automation System)
Here are the details for the plan:
- I will use RDF to store my data in a flat file
- I will use Fuseki from the Apache Jena project to serve that data using SPARQL
- I will use PHP with the EasyRdf library to query the information and convert it to JSON
- I will use Google Charts to produce the dashboard, and jQuery to load the information asynchronously
- I’ll host this on my Synology server (hopefully without having to keep a VM running on my iMac to keep it hosted, but we will see)
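As a sketch of the PHP step in the plan above, here is roughly how I expect the query layer to work: ask Fuseki for date/value pairs via EasyRdf’s SPARQL client, then reshape the bindings into the row arrays Google Charts expects. The endpoint URL, query variables, and sample values below are placeholders, not a final implementation.

```php
<?php
// Sketch: reshape SPARQL SELECT bindings into Google Charts row arrays.
// Endpoint URL, query shape, and sample values are placeholders.

function bindingsToChartRows(array $bindings): array {
    // Assumes each binding exposes 'date' and 'value' (my query's variables).
    $rows = [];
    foreach ($bindings as $b) {
        $rows[] = [$b['date'], (int) $b['value']];
    }
    return $rows;
}

// With EasyRdf (installed via Composer), the real call would look like:
//   $sparql = new EasyRdf\Sparql\Client('http://localhost:3030/dashboard/query');
//   $result = $sparql->query('SELECT ?date ?value WHERE { ... } ORDER BY ?date');
// Hard-coded bindings stand in for a Fuseki response here:
$bindings = [
    ['date' => '2013-02-19', 'value' => '412'],
    ['date' => '2013-02-20', 'value' => '413'],
];
echo json_encode(bindingsToChartRows($bindings));
// prints [["2013-02-19",412],["2013-02-20",413]]
```

The JSON array can then be handed straight to a Google Charts DataTable on the page, with jQuery fetching it asynchronously.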
I look forward to showing my colleagues at Octo Consulting my progress. I know some of the developers there may have some suggestions on how to best work with JSON, something I’m not very experienced with. Wish me luck, I’ll keep you posted on progress.
Semantic Web Activities
I’ve been going through “Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL” (Amazon link) for quite some time. Today, I am pairing that with a tutorial from Matthew Horridge. It is quite amazing how primitive the tools for the Semantic Web remain; I believe this raises the barrier to entry for established technical folks, who look at the Semantic Web as if it were a foreign language.
This is the future of healthcare data, and what will pave the way for future health IT initiatives; I’m almost convinced. I’ll post more on my progress.
Economics of the Semantic Web
Over the past few weeks we have heard about Twitter beginning to limit its programming interface (example 1, example 2, …) for third-party applications, the reason being that it wants more visitors on its own website to capitalize on ad revenue.
The unfortunate reality of selling online advertising is that it has to be done in a rich user-experience environment to be effective. Therefore, when a company sells its valuable web real estate, the goal is to drive as much traffic as possible to its own website to capitalize on that ad revenue. Websites then tend to engage in practices such as truncating their RSS feeds with “Read More…”, limiting external APIs, and more.
My question is: is there a model that can promote information sharing and linking on an internet that is advertisement-supported? If each individual website is only concerned with bringing traffic to its own fiefdom, how will we enable websites that can mash up content from different sources? This has been a goal of the semantic web, an approach supported by the W3C since 2001, but one that has met with limited success.
You can’t blame Twitter for protecting its data; it has economic value. Would an open-source alternative to Twitter be the answer? Social media and social networking are not the only areas lacking semantic linking. Would this model work for other domains?
This argument could likely be tied to net neutrality, which would allow websites alternative revenue streams beyond advertising. APIs can have revenue streams (and do!), but they seem much less lucrative than selling ads yourself.